On the Viability of Memory Forensics in Compromised Environments

Zur Praktikabilität von Hauptspeicherforensik in kompromittierten Umgebungen

Submitted to the Faculty of Engineering of Friedrich-Alexander-Universität Erlangen-Nürnberg for the degree of

DOKTOR-INGENIEUR

by

Johannes Stuettgen

from Herdecke

Approved as a dissertation by the Faculty of Engineering of Friedrich-Alexander-Universität Erlangen-Nürnberg

Date of oral examination: 28.05.2015
Chair of the doctoral committee: Prof. Dr.-Ing. habil. Marion Merklein
Reviewers: Prof. Dr.-Ing. Felix Freiling, Prof. Dr. Michael Meier

Abstract

Memory forensics has become a powerful tool for the detection and analysis of malicious software. It provides investigators with an impartial view of a system, exposing hidden processes, threads, and network connections, by acquiring and analyzing physical memory. Because malicious software must be at least partially resident in memory in order to execute, it cannot remove all its traces from RAM. However, the memory acquisition process is vulnerable to subversion in compromised environments. Malicious software can employ anti-forensic techniques to intercept the acquisition and filter memory contents while they are copied. In this thesis, we analyze 12 popular memory acquisition tools for Windows, Linux, and Mac OS X, and study their implementation with regard to how they enumerate and map memory. We find that all of the analyzed programs use the operating system to perform these tasks, and further illustrate this by implementing an open source memory acquisition framework for Mac OS X. In a survey of kernel rootkit techniques that prevent or filter physical memory access, we show that all 12 tested programs are vulnerable to anti-forensics, because they rely on the operating system for critical functions. To eliminate this vulnerability, we develop an operating system independent approach that directly utilizes the hardware to enumerate and map memory. By interacting with the PCI controller, we are able to safely avoid memory mapped device buffers while acquiring the entire physical address space. We program the page tables directly to map memory, forcing the MMU to facilitate arbitrary physical memory access from our driver's data segment. We implement our techniques in the open source memory acquisition frameworks Winpmem, Pmem, and OSXPmem, furthering the capabilities of memory acquisition software on the Windows, Linux, and Mac OS X platforms. Finally, we apply our novel technique to related problems in memory forensics.
Memory acquisition software for Linux can only be run on a system with the exact same kernel version and configuration as the system it was compiled on, due to dependencies on kernel data structures. We are able to create a minimal, kernel independent version of our module, which we inject into a compatible host module on the target. By hijacking the host's data structures, we are able to load the infected module, redirect control flow, and communicate with it using a character device. A second innovative property of our acquisition approach is that, because we can enumerate the location of memory mapped device buffers, we are able to safely access memory regions unknown to the operating system. This allows us to acquire malicious firmware during the memory acquisition process. We present a survey on firmware code and data in the physical address space, and show how we can capture the BIOS, PCI option ROMs, and the ACPI tables using our approach. We implement plugins for the open source memory analysis framework Volatility, which are able to extract the ACPI tables from memory and analyze them for malicious behavior.

Zusammenfassung (Summary)

Main memory forensics has developed into a powerful tool for the detection and analysis of malicious software. It provides investigators with an objective view of computer systems, which allows hidden artifacts, such as processes and network connections, to be uncovered by analyzing the contents of main memory. Since malicious software must be present at least partially in main memory in order to be executed, it is impossible to remove all traces of an infection from memory at runtime. In compromised environments, however, there is a risk that access to main memory is subverted by anti-forensic methods. In this thesis, we analyze the implementation of 12 widely used tools for creating main memory images on Windows, Linux, and Mac OS X. Our investigations show that all programs rely on the operating system to perform critical tasks, which we illustrate by implementing the program OSXPmem. In a study, we then give an overview of the various anti-forensic techniques at the operating system level, and show in an experiment that all 12 examined programs are vulnerable to anti-forensics. We close this gap by developing operating system independent techniques for memory acquisition. Through direct interaction with the PCI controller, we identify devices mapped into the physical address space, which enables us to safely access the remaining memory. We map this memory into the address space of our application by manipulating the data structures of the MMU. We implement our techniques in the open source programs Winpmem, Pmem, and OSXPmem, enabling investigators to use them on Windows, Linux, and Mac OS X. Finally, we use our techniques to solve two further problems of main memory forensics.
Memory acquisition programs for Linux can only run on systems with the exact same kernel version and configuration they were compiled with, because they load a kernel module that depends on the kernel's data structures. We solve this problem by injecting a minimal version of our module into a compatible "victim" module on the target system. To communicate with the kernel, we repurpose the victim's data structures, which allows us to use our program on a large number of different Linux systems without having to recompile it. The second innovative property of our approach is that we can safely access memory regions unknown to the operating system, because we know the location of the devices mapped into the address space. This allows us to access the firmware in the course of the memory investigation. We give an overview of the location of the BIOS, PCI option ROMs, and the ACPI tables in the physical address space, and implement techniques for the acquisition and analysis of firmware for the open source memory analysis software Volatility.

Acknowledgments

This thesis would not have been possible without the support of others. First and foremost, I would like to thank my supervisor Felix Freiling for his continuous advice and support during my time at the Security Research Group of the Department of Computer Science in Erlangen. Many thanks also go to Michael Meier, from the University of Bonn, for agreeing to be my second supervisor. I also thank my colleagues at the Security Research Group for a cheerful and friendly working atmosphere. I would also like to extend my thanks to the Google Incident Response Team for many interesting discussions and an exciting working environment. A special thank you goes to Michael Cohen, who, with his guidance and inspiration, facilitated two of the three papers this thesis is built upon. In addition, I want to thank the following people (in alphabetical order) for helping me proofread this thesis and forge it into something legible: Michael Gruhn, Tilo Müller, Ben Stock, Heiner Stüttgen, and Stefan Vömel. Finally, I want to thank Mathieu Suiche for his commitment to forensic tool testing and for providing me with an evaluation version of MoonSols Windows Memory Toolkit, which allowed me to also test a commercial tool for anti-forensic resilience.

Contents

1 Introduction
1.1 Contributions
1.2 Related Work
1.3 Publications
1.4 Outline

2 Technical Background
2.1 x86 Architecture
2.1.1 The Physical Address Space
2.1.2 Memory Protection
2.1.3 The PCI Express Bus
2.2 Linux Kernel Modules
2.2.1 Module Binary Organization
2.2.2 Linking and Loading
2.3 System Firmware
2.3.1 Basic Input Output System
2.3.2 (Unified) Extensible Firmware Interface
2.3.3 PCI Option ROMs
2.3.4 Advanced Configuration and Power Interface
2.4 Summary

3 Memory Acquisition
3.1 Principles of Memory Acquisition
3.1.1 Criteria for Sound Memory Acquisition
3.1.2 Correctness of Existing Memory Acquisition Tools
3.1.3 Memory Image Formats
3.2 Software Memory Acquisition Techniques
3.2.1 Memory Acquisition Challenges
3.2.2 Operating System Memory Interfaces
3.2.3 Driver-Based Memory Acquisition
3.3 Summary

4 Anti-Memory Forensics
4.1 Anti-Forensic Techniques
4.2 Attacks on Memory Acquisition Software
4.2.1 Windows
4.2.2 Mac OS X
4.2.3 Linux
4.3 Passive Anti-Forensics
4.3.1 Hidden Memory
4.3.2 Evaluation
4.4 Summary

5 Anti-Forensic Resilient Memory Acquisition
5.1 Improving Memory Acquisition
5.1.1 Hardware-based Memory Enumeration
5.1.2 Hardware-based Memory Mapping
5.1.3 Evaluation
5.2 Discussion
5.2.1 Loading of Driver
5.2.2 Interception of Data Buffers
5.2.3 Debug Registers
5.2.4 Shadow Page Tables
5.2.5 Reliability and Stability
5.3 Summary

6 Kernel Independent Memory Acquisition on Linux
6.1 Compatibility of Linux Kernel Modules With Different Kernels
6.1.1 Bypassing Module Version Checking
6.1.2 Requirements for a Stable Approach
6.2 Reliable Loading of Generic Acquisition Modules
6.2.1 Parasitizing a Compatible Module
6.2.2 Code Injection into Kernel Modules
6.3 Redirection of Control Flow
6.3.1 Interception of Module Initialization
6.3.2 Communication with User Mode
6.3.3 Selection of a Suitable Host
6.4 Implementation of a Minimal Acquisition Module
6.5 Summary

7 Acquisition and Analysis of Compromised Firmware
7.1 Rootkit Strategies for Compromising Firmware
7.1.1 BIOS- and EFI-Based Attacks
7.1.2 PCI Option ROM-Based Attacks
7.1.3 ACPI-Based Attacks
7.2 Enumeration of Firmware in the Physical Address Space
7.2.1 Enumeration of the Physical Address Space
7.2.2 Mapping of Memory and Firmware Regions
7.3 Firmware Analysis
7.4 Evaluation
7.4.1 Stability and Correctness of the Acquisition Method
7.4.2 Comparison with Available Memory Acquisition Solutions
7.4.3 Detection of ACPI Rootkits
7.5 Discussion
7.5.1 Technological Limitations
7.5.2 Anti-Forensics
7.6 Summary

8 Conclusion
8.1 Summary
8.2 Future Work

Bibliography

List of Figures

2.1 Organization of the Background Chapter
2.2 Architecture of a North- and South-Bridge Based Chipset
2.3 Architecture of a modern PCH based chipset
2.4 Memory Transaction Routing
2.5 Memory Map of a Haswell System
2.6 Virtual Address Space on an x86-64 System
2.7 Data Structures Involved in Virtual to Physical Address Translation
2.8 PCIe Protocol Layers
2.9 PCIe Architecture
2.10 PCI Configuration Space Addressing
2.11 PCIe Type 0 Configuration Space Header
2.12 PCI 32-Bit MMIO BAR Layout
2.13 PCIe Type 1 Configuration Space Header
2.14 ELF file layout
2.15 Static vs. Dynamic Linking
2.16 Loading of a Kernel Module
2.17 ACPI Architecture

3.1 Space-Time Diagram of an Atomicity Violation
3.2 Space-Time Diagram of Integrity Violations

4.1 Effects of DKOM on /proc/iomem
4.2 Hidden memory on Test System with 4 GB RAM

5.1 PTE Remapping Technique

6.1 Initialization of a Kernel Module
6.2 Relocation Hook of module->init
6.3 Relocation Hook of file_operations

7.1 Firmware Memory Ranges
7.2 Views on the Physical Address Space

List of Tables

4.1 Evaluation of Acquisition with Active Anti-Forensics

6.1 Host Modules by Kernel Version

7.1 Firmware Acquisition Capabilities of Memory Forensic Software
7.2 Classification of Operation Regions in the ACPI Test Data Set

Listings

3.1 Identifying Physical Memory Regions in /proc/kcore
3.2 Memory Mapping in OSXPmem
3.3 Accessing the Memory Map in OSXPmem
4.1 Attack on Windows Memory Management APIs
4.2 OS X Memory-Map Overwriting
4.3 DKOM Attack on Linux Memory Map
5.1 PCI BAR Sizing
6.1 Module Data Structure (The Linux Kernel Archives, 2013)

Acronyms

ACPI: Advanced Configuration and Power Interface
AMD: Advanced Micro Devices
AML: ACPI Machine Language
API: Application Programming Interface
ASL: ACPI Source Language
ASLR: Address Space Layout Randomization
ATM: Automated Teller Machine

BAR: Base Address Register
BDF: Bus, Device, Function
BIOS: Basic Input Output System
BSOD: Blue Screen of Death

CAM: Configuration Access Mechanism
CPU: Central Processing Unit
CS: Code Segment

DKOM: Direct Kernel Object Manipulation
DMA: Direct Memory Access
DMI: Direct Media Interface
DOS: Disk Operating System
DSDT: Differentiated System Description Table
DTB: Directory Table Base
DWORD: Double Word
DXE: Driver Execution Environment

EBDA: Extended BIOS Data Area
ECAM: Enhanced CAM
EFI: Extensible Firmware Interface
ELF: Executable and Linkable Format
EPROM: Erasable Programmable ROM

FADT: Fixed ACPI Description Table
FSB: Front Side Bus

GFX: Graphics
GPU: Graphics Processing Unit
GTT: Graphics (GFX) Translation Tables

HAL: Hardware Abstraction Layer
Haswell: Intel 4th Generation Core Architecture
HF: High Frequency

I/O: Input/Output
ID: Identifier
IDT: Interrupt Descriptor Table
iMC: Integrated Memory Controller
IOMMU: Input/Output MMU
IVT: Interrupt Vector Table

LFSR: Linear Feedback Shift Register
LMAP: Linux Memory Acquisition Parasite
LPC: Low Pin Count

MBR: Master Boot Record
ME: Management Engine
MMIO: Memory Mapped Input/Output
MMU: Memory Management Unit

NVRAM: Non-Volatile RAM

OS: Operating System
OS X: Apple Mac OS X

PAM: Programmable Attribute Map
PCH: Platform Controller Hub
PCI: Peripheral Component Interconnect
PCIe: Peripheral Component Interconnect Express
PD: Page Directory
PDE: PD Entry
PDPT: Page Directory Pointer Table
PDPTE: PDPT Entry
PEI: Pre-EFI Initialization
PFN: Page Frame Number
PLT: Procedure Linkage Table
PML4: Page Map Level 4
PML4E: PML4 Entry
PMM: POST Memory Manager
POST: Power On Self Test
PS: Page Size
PT: Page Table
PTE: PT Entry

RAM: Random Access Memory
ROM: Read Only Memory
RSDP: Root System Description Pointer
RSDT: Root System Description Table
RX: Receive

SATA: Serial Advanced Technology Attachment
SEC: Security
SMI: System Management Interrupt
SMM: System Management Mode
SMRAM: System Management RAM
SPI: Serial Peripheral Interface

TCP/IP: Internet protocol suite
TLB: Translation Lookaside Buffer
TOLUD: Top of Lower Usable DRAM
TOUUD: Top of Upper Usable DRAM
TSEG: Top of Main Memory Segment
TX: Transmit

UEFI: Unified Extensible Firmware Interface
UMA: Uniform Memory Access
USB: Universal Serial Bus

VFS: Virtual Filesystem
VMM: Virtual Machine Monitor

XROMBAR: Expansion ROM Base Address Register
XSDT: Extended Root System Description Table

YAML: YAML Ain’t Markup Language

Chapter 1

Introduction

In 2013, a bank in Ukraine noticed that an Automated Teller Machine (ATM) was dispensing cash for no apparent reason. At seemingly random intervals, the machine would start emptying its money supply onto the street without user interaction. Security camera footage showed that the money was being picked up by random strangers who happened to pass by at the right moment. When computer security specialists analyzed the systems at the bank, they uncovered one of the biggest bank heists in history (The New York Times, 2015). A group of criminals had attacked the computers of over 100 banks worldwide, infecting key systems with the malicious software agent Carbanak (Kaspersky Labs, 2015). Aside from manipulating ATMs to dispense money, the software was used to monitor bank employees and discover how the banks conducted their operations. By impersonating bank employees, the group managed to transfer hundreds of millions of US dollars into offshore accounts. The transfers went unnoticed for months, because the criminals manipulated the banks' internal bookkeeping records to hide the missing balance. Total financial losses are estimated to be between 300 million and 1 billion US dollars (Kaspersky Labs, 2015). Incidents like the Carbanak bank heist show that malicious software has become a significant threat to businesses worldwide. In fact, a recent study by McAfee estimates the global cost of cybercrime to be more than 400 billion US dollars (McAfee Inc., 2014). Memory forensics, the process of acquiring and analyzing the contents of a computer's RAM, has become an integral part of digital forensic investigations targeting malicious software, because it provides an impartial view of a computer system's internal state (Walters and Petroni, 2007). It can be used to detect and analyze hidden processes, network connections, and other artifacts on computers infected by malicious software (Sutherland et al., 2008).
Because malicious software must reside somewhere in memory to execute, it is impossible to hide all traces of the infection from RAM (Kornblum, 2006). To remain undetected, malicious software must subvert the memory acquisition process to present analysis software with a filtered view of physical memory. There are multiple ways of obtaining a copy of memory, each with different characteristics, constraints, and potential for subversion by malicious software. Hardware-based memory acquisition methods like memory transplantation (Halderman et al., 2008) and bus attacks (Boileau, 2006) require physical access to the target system, which is not available in remote incident response scenarios. Furthermore, new technologies pose problems to established hardware memory acquisition techniques. Memory data scrambling, as employed by DDR3 memory controllers, poses a big challenge to memory transplantation attacks, as the acquired memory is scrambled by undocumented methods that have yet to be deciphered (Gruhn and Müller, 2013; Skochinsky, 2014). Bus-based memory acquisition techniques are impeded by the introduction of the Input/Output MMU (IOMMU), which allows software to configure the memory controller to protect certain memory regions from device access (Rutkowska, 2007). Software-based memory acquisition methods do not require physical access to the target system, but have their own problems. While previous work has shown that it is possible to leverage System Management Mode (SMM) to run memory acquisition software in an environment isolated from the potentially subverted system (Wang et al., 2011), this method requires a firmware modification, which makes it platform dependent and not portable. There has also been work to leverage hardware virtualization technology for memory acquisition (Martignoni et al., 2010), which requires the processor to support hardware virtualization and can only work if no other program (including malicious software) has already made use of it. Without firmware components or virtualization extensions, software has to resort to the operating system level to acquire physical memory. Software memory acquisition techniques at the operating system level are the most versatile ones, requiring no preparation on the target system. They can be used to create a memory image on the local hard disk, send an image remotely over the network, and even perform remote, live memory analysis (Cohen et al., 2011). However, because they run with the same privileges as malicious software on a potentially infected system, they are prone to subversion by anti-forensic techniques.
This thesis aims at improving the resilience of memory acquisition software to subversion, raising the bar for criminals to hide their actions.

1.1 Contributions

This thesis consists of three major parts: First, we analyze the functionality of memory acquisition software and identify anti-forensic techniques that can subvert the acquisition process. Based on our findings, we develop a novel technique that is not subject to subversion by anti-forensics. Finally, we show how our approach can be adapted to solve two other memory forensic problems as well.

Memory Acquisition Operating System Internals

In Chapter 3, we present a survey on the current state of the art of software memory acquisition. We give an overview of the criteria we use to assess the quality of a memory image, and point out the importance of the correctness of an image for analysis. We then analyze a set of 12 popular memory acquisition frameworks for Windows, Linux, and Mac OS X, and survey their functionality with regard to two major tasks: memory enumeration and memory mapping. We find that all tested programs use the operating system to enumerate and map memory, and give an overview of the programming interfaces used for this purpose. Finally, we illustrate the technical details of memory acquisition software by implementing the OSXPmem tool for Mac OS X systems. At the time of release, this was the only software able to acquire memory on Mac OS X systems newer than version 10.8. Our results foster a better understanding of memory acquisition software and provide investigators with the much needed capability to acquire memory on recent versions of Mac OS X.
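To make the dependence on the operating system concrete, the following C sketch shows how a Linux tool might enumerate RAM from the kernel's own memory map, as exported in /proc/iomem (the interface whose manipulation by DKOM is the subject of Figure 4.1). It is a minimal sketch, not code from any of the surveyed tools. The struct and function names are hypothetical, the text is passed in as a string rather than read from the file, and nested entries are recognized only by their leading indentation. Because every range comes from a kernel-maintained data structure, a rootkit that edits that structure also controls what this code sees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative sketch; names are hypothetical.
 * An inclusive physical address range as printed in /proc/iomem. */
struct phys_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Collect the top-level "System RAM" entries from text in /proc/iomem
 * format. Indented lines describe nested regions (e.g. "Kernel code") and
 * are skipped. Stores at most max ranges in out and returns their number.
 */
static size_t parse_system_ram(const char *text, struct phys_range *out,
                               size_t max)
{
    assert(out != NULL || max == 0);
    size_t n = 0;
    const char *line = text;

    while (line != NULL && *line != '\0' && n < max) {
        const char *next = strchr(line, '\n');
        unsigned long long start, end;
        int name_off = 0;

        /* Expected layout: "<start>-<end> : <name>" with hex addresses. */
        if (line[0] != ' ' &&
            sscanf(line, "%llx-%llx : %n", &start, &end, &name_off) == 2 &&
            name_off > 0 &&
            strncmp(line + name_off, "System RAM", 10) == 0) {
            out[n].start = start;
            out[n].end = end;
            n++;
        }
        line = (next != NULL) ? next + 1 : NULL;
    }
    return n;
}
```

A caller would read /proc/iomem (as root) into a buffer and pass it to `parse_system_ram()`. For example, the input `"00001000-0009ffff : System RAM\n000a0000-000fffff : Reserved\n"` yields a single range from 0x1000 to 0x9ffff.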

Anti-Memory Forensics

Because software memory acquisition relies on operating system services to enumerate and map memory, it can be subverted by malicious software running in kernel mode. In Chapter 4, we give an overview of practical anti-forensic techniques against current memory acquisition software. We find that some tools use undocumented functions to map memory instead of standard operating system interfaces. They do this to avoid previously published anti-forensic techniques that filter the operating system memory interface (Bilby, 2006). We show that these undocumented functions can still be attacked using standard rootkit techniques like inline hooking and direct kernel object manipulation. Furthermore, we present new classes of anti-forensic attacks against memory enumeration that are able to selectively hide sections of physical memory. To prove our claims, we create proof-of-concept implementations of these attacks for Windows, Linux, and Mac OS X that disable several operating system programming interfaces used by memory acquisition software. Our evaluation shows that none of the 12 tested forensic tools is able to acquire memory on systems with anti-forensic modifications in place. This work demonstrates how easy it still is for malicious software to subvert the memory acquisition process, 6 years after the first public demonstration by the DDFY rootkit (Bilby, 2006).

Anti-Forensic Resilient Memory Acquisition

In Chapter 5, we present a new software memory acquisition technique that does not depend on operating system functionality. Instead of querying the operating system for available memory, we interact directly with the hardware to enumerate memory mapped device buffers in the physical address space. We then map the remaining regions of the physical address space into our driver's virtual address space by directly manipulating the process's page tables. Our evaluation shows that this approach is not vulnerable to the anti-forensic methods presented in the previous chapter. Our technique is implemented in the open source Winpmem (Cohen, 2012), Pmem for Linux (Cohen, 2011), and OSXPmem (Stüttgen, 2012) tools, and publicly released within the Rekall memory forensics framework (Cohen, 2014b). Our research raises the bar for malicious software to subvert memory forensic investigations, enabling investigators to detect and analyze malware that was previously invisible.
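The two hardware-level primitives described above can be modeled in C as follows. This is an illustrative user-space sketch under stated assumptions, not the Winpmem, Pmem, or OSXPmem implementation. BAR sizing is shown only for 32-bit memory BARs, using the standard procedure of writing all ones to a BAR and reading back the result; the actual configuration-space reads and writes are omitted. Page-table remapping assumes x86-64 with 4 KiB pages and models a single page table as a plain array of 512 entries. The TLB flush that real code needs is left as a comment. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch; all names are hypothetical. */

/* --- PCI BAR sizing (32-bit memory BARs only) --- */
#define BAR_IO_SPACE  0x1u     /* bit 0: 1 = I/O BAR, 0 = memory BAR */
#define BAR_MEM_MASK  (~0xFu)  /* bits 3:0 hold type and prefetch flags */

/* Size of the MMIO window decoded by a BAR, given the value read back after
 * writing 0xFFFFFFFF to it. Real code must save and restore the BAR. */
static uint32_t bar32_mem_size(uint32_t readback)
{
    assert((readback & BAR_IO_SPACE) == 0);  /* memory BARs only */
    uint32_t mask = readback & BAR_MEM_MASK;
    return mask ? ~mask + 1u : 0u;           /* 0 means: BAR not implemented */
}

/* --- PTE remapping (x86-64, 4 KiB pages) --- */
#define PTE_PRESENT   (1ull << 0)
#define PTE_WRITABLE  (1ull << 1)
#define PTE_ADDR_MASK 0x000FFFFFFFFFF000ull  /* bits 51:12: page frame */

/* Index of the page-table entry that maps vaddr (address bits 20:12). */
static unsigned pt_index(uint64_t vaddr)
{
    return (unsigned)((vaddr >> 12) & 0x1FF);
}

/* Point the PTE that maps vaddr at the physical page containing phys. */
static void remap_pte(uint64_t pt[512], uint64_t vaddr, uint64_t phys)
{
    pt[pt_index(vaddr)] = (phys & PTE_ADDR_MASK) | PTE_PRESENT | PTE_WRITABLE;
    /* On real hardware the stale TLB entry must now be flushed (invlpg). */
}

/* Software model of the last translation step the MMU performs. */
static uint64_t translate(const uint64_t pt[512], uint64_t vaddr)
{
    uint64_t pte = pt[pt_index(vaddr)];
    assert(pte & PTE_PRESENT);
    return (pte & PTE_ADDR_MASK) | (vaddr & 0xFFF);
}
```

For example, a BAR that reads back as 0xFFF00000 decodes to a 1 MiB device window, which the acquisition loop would skip. After `remap_pte(pt, va, phys)`, accesses through `va` reach the physical page containing `phys`. The page itself stays in the driver's own address space; only the target frame changes.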


Kernel Independent Linux Memory Acquisition

One of the advantages of our novel memory acquisition technique is that it does not rely on any operating system functionality. In Chapter 6, we use this property to solve a key problem in Linux memory forensics: the need to compile a memory acquisition kernel module specifically for the target system. This is necessary because Linux kernel modules are statically linked with the kernel at runtime, which requires them to be compiled with the exact same version of the kernel headers and configuration to be binary compatible. Since our method is kernel independent, we do not require binary compatibility as long as we are able to load our module and communicate with it. To achieve this goal, we implement a custom linker that is able to inject our memory acquisition module into a compatible host module on the target system. By modifying the relocation tables of the host, we instrument its data structures for communication with the kernel. This approach is stable because the host module was compiled with the correct configuration and headers, so the data structures we use are actually compatible. Our method allows us to create a memory acquisition program that can be distributed as an executable and does not need to be recompiled for the target system. This reduces the amount of preparation necessary to acquire memory on Linux systems, relieving investigators and shortening response times.
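The relocation hook can be illustrated with a small C sketch built on the ELF types from the Linux/glibc `<elf.h>` header. The hypothetical helper below rewrites the symbol index of every RELA entry that references a given symbol, while keeping the relocation type and target offset unchanged. Applied only to the relocations of the host's `.gnu.linkonce.this_module` section, this would, for instance, make the entry that fills in `struct module`'s `init` pointer resolve to the injected module's entry point instead of the host's. This is an assumption-laden sketch, not the thesis's actual linker. The symbol indices and offsets in any example are made up, and a real implementation would presumably also have to add the injected symbol to the host's symbol table.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Illustrative sketch; the helper name is hypothetical.
 *
 * Make every relocation in rela[0..count) that currently resolves against
 * symbol index old_sym resolve against new_sym instead. The relocation type
 * (e.g. R_X86_64_64) and the patched offset are left untouched, so the field
 * position still comes from the host module, which was built with matching
 * kernel headers. Returns the number of entries changed. */
static size_t retarget_symbol(Elf64_Rela *rela, size_t count,
                              Elf64_Xword old_sym, Elf64_Xword new_sym)
{
    assert(rela != NULL || count == 0);
    size_t changed = 0;
    for (size_t i = 0; i < count; i++) {
        if (ELF64_R_SYM(rela[i].r_info) != old_sym)
            continue;
        /* Swap only the symbol half of r_info; keep the type half. */
        rela[i].r_info = ELF64_R_INFO(new_sym, ELF64_R_TYPE(rela[i].r_info));
        changed++;
    }
    return changed;
}
```

The key design point is that the offset of `init` inside `struct module` never has to be computed from kernel headers. It is taken from the host module's existing relocation, which was generated with the correct headers and configuration.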

Acquisition and Analysis of Compromised Firmware

In Chapter 7, we explore a second innovative property of our new memory acquisition technique. Since our hardware-based memory enumeration method provides us with the location of memory mapped device buffers in the physical address space, we can safely access memory regions that are unknown to the operating system. This allows us to acquire code and data from the system firmware. We analyze the physical address space for firmware related regions, and conduct an experiment that shows we are able to acquire the BIOS, PCI option ROMs, and the ACPI tables this way. We develop plugins for the open source Volatility (Walters, 2014) memory analysis framework that extract the ACPI tables from a memory image and scan them for malicious behavior. We evaluate these tools by implementing a proof-of-concept ACPI rootkit, which we successfully detect using our methods. The developed tools and techniques enable investigators to acquire and analyze malware at the firmware level, which was previously impossible with memory forensic tools.
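As an example of locating firmware data in an acquired region, the sketch below scans a copy of the legacy BIOS area (physical addresses 0xE0000 to 0xFFFFF) for the ACPI Root System Description Pointer (RSDP). According to the ACPI specification, the RSDP carries the 8-byte signature `"RSD PTR "` on a 16-byte boundary, and its first 20 bytes sum to zero modulo 256. The function is hypothetical and is not the thesis's Volatility plugin. It assumes the region has already been mapped and copied into a buffer. It does not search the EBDA, the specification's other legacy search location, and it checks only the ACPI 1.0 checksum, not the extended checksum of ACPI 2.0+ structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative sketch; the function name is hypothetical.
 *
 * Scan a copy of the BIOS area for the RSDP. Returns the offset of the
 * first candidate whose ACPI 1.0 checksum is valid, or -1 if none is found. */
static long find_rsdp(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t off = 0; off + 20 <= len; off += 16) {  /* 16-byte aligned */
        if (memcmp(buf + off, "RSD PTR ", 8) != 0)
            continue;
        uint8_t sum = 0;
        for (size_t i = 0; i < 20; i++)  /* first 20 bytes must sum to 0 */
            sum += buf[off + i];
        if (sum == 0)
            return (long)off;
    }
    return -1;
}
```

Rejecting candidates with a bad checksum filters out stray copies of the signature. Once the RSDP is found, the RSDT or XSDT address it contains leads to the remaining ACPI tables, such as the DSDT.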

1.2 Related Work

This thesis focuses on software memory acquisition techniques at the operating system level. However, there has been considerable research effort on other methods, which we outline in this section.

Memory Acquisition Using Hardware Virtualization

To isolate memory acquisition software from the potentially subverted operating system, previous work has suggested leveraging hardware virtualization extensions, which are available in most recent x86 processors (Intel Corporation, 2014b). They allow the memory acquisition program to load a Virtual Machine Monitor (VMM) to isolate itself from the operating system, making it impossible for malicious software to manipulate the acquisition software. By virtualizing physical memory on the fly, it is possible to create a memory image without the inconsistencies caused by concurrent system activity. Previous work includes the Hypersleuth framework (Martignoni et al., 2010) and Vis (Yu et al., 2012). The major limitation of memory acquisition software based on hardware virtualization is the requirement of loading the VMM first. If another VMM is already active, this approach cannot work, as there can only be one VMM active at a time. This becomes a problem when malicious software uses a VMM to hide its activities from the operating system, as has been successfully demonstrated in the past (King and Chen, 2006; Rutkowska, 2006).

Firmware Assisted Memory Acquisition

Firmware can leverage SMM to perform management tasks transparently to the operating system. This mode can only be entered through a System Management Interrupt (SMI), and its code and data are protected by the memory controller in a special region of memory called System Management RAM (SMRAM). Wang et al. (2011) proposed leveraging this mode as a trusted and protected execution environment for memory acquisition software. SMMDumper (Reina et al., 2012) is a proof-of-concept implementation of this idea. It is delivered as a firmware upgrade and injects itself into SMM on system boot. It then modifies the configuration of the interrupt controller to redirect keyboard interrupts to an SMI, where they are filtered for a specific command that initiates memory acquisition. When this command is received, SMMDumper directly accesses the network card and sends the contents of physical memory to an analysis system over the network. While this method is resilient to subversion from the operating system level, a number of limitations currently make it impractical in most cases. To install the program in SMM, a firmware update is required, since SMRAM is locked before the firmware passes control to the operating system. This requires vendor support because firmware updates are cryptographically signed, and also involves a reboot to install the update. It is also a platform dependent solution, because the software has to be adapted to work with the system's firmware and must ship with custom SMM drivers for the network card. The required amount of preparation and custom development for the specific target platform disqualifies this approach for most incident response scenarios.

Cold Boot Attacks

Halderman et al. (2008) have shown that simple cooling techniques can preserve memory contents over significant periods of time without power. Because memory cells are essentially capacitors, they do not lose their state immediately when unpowered. In fact, the stored charge slowly drains over time and needs to be refreshed periodically. The time until a memory cell loses its charge can be dramatically extended by lowering its temperature. By cooling memory modules to -50 ℃ with a simple can of compressed air, over 99.9% of the data can be recovered after a period of 60 seconds without power (Halderman et al., 2008). This allows investigators to quickly transplant the memory of a computer into another system, which then copies its contents to persistent storage. However, recent advances in DRAM technology have made this technique considerably more difficult. Intel DDR3 integrated memory controllers mangle each data word with an undocumented scrambling system to reduce the effects of excessive di/dt, caused by successive 1s and 0s, on the data bus (Intel Corporation, 2013). Recent patents by Intel suggest the use of a Linear Feedback Shift Register (LFSR) (Mozak, 2011), which is randomly seeded at system boot by the system firmware. Without knowing the LFSR polynomial and seed for a specific system configuration, it is impossible to recover the original contents of memory as seen by the system. At the time of writing, there is no publicly known way of de-scrambling the contents of DDR3 modules, and further research is needed (Gruhn and Müller, 2013; Skochinsky, 2014).
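A toy model in C shows why scrambling defeats memory transplantation. It XORs each data word with the output of a linear feedback shift register, the construction suggested by the Intel patent cited above. Because XOR is its own inverse, applying the same keystream twice restores the data, while a keystream from a different seed does not. The 16-bit width, the tap mask 0xB400, and the seeds are arbitrary illustrative assumptions. The actual polynomial, word size, and seeding used by Intel memory controllers are undocumented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; width, taps, and names are illustrative assumptions,
 * NOT Intel's undocumented scrambler. */

/* One step of a 16-bit Galois LFSR with tap mask 0xB400. */
static uint16_t lfsr_step(uint16_t state)
{
    uint16_t lsb = state & 1u;
    state >>= 1;
    if (lsb)
        state ^= 0xB400u;
    return state;
}

/* XOR each 16-bit word with the LFSR keystream derived from seed. Calling
 * this twice with the same seed restores the original data. */
static void scramble(uint16_t *words, size_t n, uint16_t seed)
{
    assert(seed != 0);  /* the all-zero state never advances */
    uint16_t state = seed;
    for (size_t i = 0; i < n; i++) {
        state = lfsr_step(state);
        words[i] ^= state;
    }
}
```

In this model, a module moved to another machine is read back through that machine's keystream, which is what descrambling with a mismatched seed reproduces. Even all-zero data then comes out as pseudo-random words.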

Warm Reboot Attacks Depending on the installed firmware and system configuration, memory contents are sometimes preserved over warm reboots (Chow et al., 2005). If investigators can boot a system from a custom medium like a USB flash drive, they can force a warm reboot into a small acquisition OS that creates a memory image (Vidas, 2010). Because firmware might clear memory or at least overwrite some memory regions during boot, this method is not reliable and should only be used as a last resort. If it fails, there is no other way to obtain a memory image, as the contents of memory have been irreversibly altered.

Direct Memory Access Attacks Any device with bus master capability is able to initiate Direct Memory Access (DMA) transactions on the Peripheral Component Interconnect (PCI) bus without involving the CPU (PCI-SIG, 2002). This relieves the CPU from handling data transfers from devices to memory and vice versa. Since bus mastering essentially allows any PCI device to read arbitrary memory regions, it is possible to acquire a memory image with a specially crafted PCI device. There have been multiple proof-of-concept implementations like the Tribble (Carrier and Grand, 2004), CoPilot (Petroni et al., 2004) and FRED (BBN Technologies, 2006) PCI cards. However, such cards need to be installed in the target system prior to an incident and are not generally available. There is only one commercially available product (WindowsSCOPE, 2014), which is very expensive1. The IEEE 1394 (FireWire), Thunderbolt and ExpressCard protocols all allow for DMA and thus can be utilized for memory acquisition (Hermann, 2014). There has been significant work on reading and writing to RAM through FireWire (Becher et al., 2005; Boileau, 2006), and open source software for memory acquisition is readily available (Witherden, 2010; Maartmann-Moe, 2013). However, modern x86-64 systems implement an IOMMU that can remap and block memory transactions to and from devices (Intel Corporation, 2014d). This allows software to configure the memory controller in a way that protects arbitrary memory regions from rogue devices performing DMA (Rutkowska, 2007). Also, some operating systems disable DMA when the system is put to sleep, to prevent DMA attacks when the device is stolen. Mac OS X Lion and later implement this protection with FileVault 2 (Garrison, 2011).

1 At the time of writing, the WindowsSCOPE CaptureGUARD PCIe card cost $7999.

1.3 Publications

Parts of this thesis are based on three peer-reviewed academic papers the author has presented at international conferences over the past three years. To improve readability, we will not cite every section adapted from these papers again. This section serves as an overall reference that attributes each paper to the chapters it was included in. In our paper “Anti-Forensic Resilient Memory Acquisition” (Stüttgen and Cohen, 2013), written together with Michael Cohen, we created a software based memory acquisition technique that is resilient to current anti-forensic methods. The PCI memory enumeration technique was developed by Michael Cohen, while the author of this thesis created the anti-forensic survey as well as the memory mapping technique. The resulting research paper was mostly written by the author, with the exception of the introduction, conclusion and PCI enumeration sections. Parts of this paper are used in Chapters 3, 4, and 5, but have been significantly expanded to more thoroughly cover all aspects of software memory acquisition. Based on the techniques developed in our previous paper, we have developed a method to inject memory acquisition kernel modules into arbitrary kernels on Linux. The research paper “Robust Linux memory acquisition with minimal target impact” (Stüttgen and Cohen, 2014) was created under the guidance of Michael Cohen. All of the software development was accomplished by the author of this thesis, and, with the exception of the introduction, the resulting research paper was also written by the author of this thesis. The publication received the best paper award at the DFRWS EU conference in Amsterdam, 2014. It forms the foundation of Chapter 6. Chapter 7 applies the memory acquisition capabilities developed in our previous work to acquire and analyze malicious firmware on x86 systems.
It is based on our publication “Acquisition and Analysis of Compromised Firmware Using Memory Forensics” (Stüttgen et al., 2015), which was created together with Stefan Voemel and Michael Denzel. The memory acquisition software, the BIOS and PCI option ROM experiments and the majority of the research paper were written by the author of this thesis. Stefan Voemel composed the introduction and parts of the background section, while Michael Denzel created the ACPI Volatility plugins and performed the ACPI evaluation under the guidance of the author.

1.4 Outline

This thesis is organized as follows: Chapter 2 provides the technical background necessary to understand our techniques. In Chapter 3, we present a survey on the practical details of software memory acquisition. Chapter 4 focuses on illustrating anti-forensic techniques against current software memory acquisition methods. Chapter 5 incorporates our insights into an anti-forensic resilient memory acquisition approach. In the next two chapters, we explore the new capabilities of this approach. In Chapter 6, we create a Linux memory acquisition module that is compatible with a wide range of kernels without recompilation, by combining kernel module infection techniques normally utilized by rootkits with our new memory acquisition method. Chapter 7 focuses on using our technique to acquire malicious firmware. Finally, we conclude our work in Chapter 8 and present opportunities for future work.

Chapter 2

Technical Background

In this chapter, we will illustrate the basic concepts and techniques this thesis is built on. Our explanations provide the reader with the specialized technical knowledge necessary to understand our work. This is not intended to be a complete description of the x86-64 architecture, but a compact primer on the concepts utilized within this thesis. An exhaustive explanation of every architectural detail can be found in the work of other authors (Intel Corporation, 2014b; Corbet et al., 2005; Salihun, 2006).

Outline of the Chapter

This chapter is organized as follows: Section 2.1 presents an overview of the architecture of x86-64 systems, focusing on memory organization and management. We explain the different address spaces, as well as the components that route memory transactions through the system, most notably the Peripheral Component Interconnect Express (PCIe) bus. Section 2.2 then introduces the structure and operation of kernel modules for the Linux platform. Here we show how modules are organized, linked and loaded. Finally, Section 2.3 introduces the system firmware, its different components, and how they work. This chapter can be read selectively depending on the reader's experience and goals. Figure 2.1 depicts the requirements of each major area of the thesis. Section 2.1 illustrates core concepts utilized throughout this thesis and should therefore always be read. In addition, Chapter 6 assumes an understanding of Linux kernel modules, which requires reading Section 2.2. Chapter 7 focuses on the system firmware, which is explained in Section 2.3.

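Figure 2.1: Organization of the Background Chapter (Section 2.1 is required for Chapters 3, 4 and 5; Sections 2.2 and 2.3 are additionally required for Chapters 6 and 7, respectively)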


Figure 2.2: Architecture of a North- and South-Bridge Based Chipset (the north-bridge links the CPU, the memory over the DDR3 memory bus and the graphics card over the PCIe graphics bus; the south-bridge attaches PCI buses and their devices)

2.1 x86 Architecture

In this section, we illustrate the core components that implement memory management on Intel x86-64 Central Processing Units (CPUs) with the Intel Core architecture. We focus on x86-64 systems in particular, as all of the work done in this thesis is targeted towards this architecture. Some of the finer details, like the specific bus technology used or the exact location of some components, differ on Advanced Micro Devices (AMD) based systems. However, the data structures relevant to memory routing and mapping are standardized and apply equally to computers with an AMD based CPU and chipset. The information we provide is prerequisite to understanding the technical details of this thesis. Most of this section is based on an article by Drepper (2007), as well as the Intel System Architecture Manual (Intel Corporation, 2014b) and Intel Architecture Whitepaper (Turley, 2014). Readers interested in the architectural differences of AMD based systems are referred to the AMD64 Architecture Programmer's Manual (Advanced Micro Devices, 2011). Modern x86 computers consist of a multitude of interconnected components. There are CPU cores running the actual code, devices for input and output, and Random Access Memory (RAM). The system tying these modules together is called the chipset. For a long time, the chipset typically consisted of two components, the north- and the south-bridge. The north-bridge was responsible for high-speed devices like the graphics card and RAM. The south-bridge connected lower-speed devices like network interfaces or controllers for persistent storage. The architecture of such a system is illustrated in Figure 2.2.

Figure 2.3: Architecture of a modern PCH based chipset (CPU package with CPU cores and a host bridge containing the iMC, PCIe and DMI; the PCH connects SATA, USB, LPC, PCIe bus devices and, over SPI, the BIOS ROM)

In a classical north- and south-bridge architecture like the Intel 815, the entire chipset is located on the mainboard (Intel Corporation, 2000). The CPU is connected to the north-bridge over the Front Side Bus (FSB). North- and south-bridge are also tied together with a dedicated interface. On more modern Intel chipsets like the Platform Controller Hub (PCH), this is done using the Direct Media Interface (DMI) (Intel Corporation, 2014c).

For performance reasons, Intel started to integrate north-bridge functionality into the CPU with the PCH architecture (Intel Corporation, 2009). In the PCH architecture, all north-bridge functionality is handled by a component in the CPU called the host bridge. The host bridge consists of a PCIe controller, an Integrated Memory Controller (iMC) and a DMI interface connecting it to the PCH.


The PCH also provides PCIe functionality, to which all faster Input/Output (I/O) devices are connected. Other protocols like Serial Advanced Technology Attachment (SATA) or Universal Serial Bus (USB) are used to communicate with peripheral devices like hard disks or the keyboard. The controllers for these protocols are located in the PCH and connected to the PCIe bus. Slow legacy devices like the RS-232 serial interface and PS/2 keyboards and mice are connected to the Low Pin Count (LPC) bus on the PCH. Finally, the Basic Input Output System (BIOS)1 flash chip is attached to the PCH through the Serial Peripheral Interface (SPI). A more detailed (but still incomplete) depiction of a modern PCH based system is provided in Figure 2.3 (Intel Corporation, 2014c).

2.1.1 The Physical Address Space

Among other interfaces, each CPU is connected to its hostbridge via an address bus. While its main purpose is addressing RAM, it is also used for Memory Mapped Input/Output (MMIO) (Intel Corporation, 2014b, Chapter 14). MMIO is a form of I/O where the registers and/or memory of devices are mapped into the CPU's view of memory by the hostbridge. Thus, when the CPU reads data from a physical address, the result is not always a memory read. A large number of devices are mapped into the physical address space for performance reasons (interrupt controllers, graphics cards, firmware, network cards, etc.). The physical address space is the set of all valid addresses on the CPU's memory address bus. Note that there is also an address space called the DMA or bus address space. It refers to the physical address space as seen by devices performing DMA. Because the hostbridge can remap addresses coming from devices, a device's view of the physical address space can differ from that of the CPU (Miller et al., 2015). However, this is not important for this thesis, as we focus on software memory acquisition running on the CPU.

The Memory Bus When the CPU initiates a memory transaction, it travels through the hostbridge, which is responsible for routing the transaction to the appropriate device (Salihun, 2014). The hostbridge decodes the target of the transaction and forwards it to the memory controller if the target is memory, to the integrated Graphics Processing Unit (GPU) if it targets the GPU, or to the DMI controller if the target is unknown. The DMI connection interfaces the CPU package with the PCH, to which all lower speed devices are connected. The PCH also has memory target decoder logic, which is responsible for forwarding transactions from the DMI interface to the correct device and vice versa.

1 Since 2005, more and more vendors have been replacing the BIOS with the Unified Extensible Firmware Interface (UEFI) (Zimmer et al., 2010).


Figure 2.4: Routing of a memory transaction through the hostbridge and PCH to the sound card on a Haswell system (Salihun, 2014) (CPU package: hostbridge with memory target decoder, integrated memory controller, graphics, PCIe root complex and DMI controller; PCH: DMI controller, memory target decoder, PCIe devices such as WiFi, Ethernet and sound, SPI to the BIOS ROM, LPC and USB)

Figure 2.4 illustrates the routing of memory transactions through the chipset of an Intel 4th Generation Core Architecture (Haswell) system. The red line illustrates the path a transaction takes when the CPU reads from an address that is mapped to a buffer in the sound card. The memory target decoding logic in the hostbridge looks up the address range and finds it assigned to the PCH. The read is thus passed through the DMI interconnect to the PCH. It is then decoded and forwarded to the PCI controller, which initiates a PCI transaction for this address. Finally, the transaction is claimed by the sound card, and the data is passed back upstream until it reaches the CPU.
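The two-stage decoding described above can be sketched in Python as a lookup in two range tables, one for the hostbridge and one for the PCH. All address ranges and device names below are hypothetical; real decoders are programmed from chipset registers and device configuration rather than from static tables.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int, str]  # (start, end exclusive, target); values are hypothetical

HOSTBRIDGE_RANGES: List[Range] = [
    (0x0000_0000, 0x8000_0000, "memory controller"),  # DRAM
    (0xC000_0000, 0xD000_0000, "integrated GPU"),     # graphics aperture
]

PCH_RANGES: List[Range] = [
    (0xF7C0_0000, 0xF7C2_0000, "ethernet controller"),
    (0xF7D0_0000, 0xF7D0_4000, "sound card"),
]


def decode(ranges: List[Range], addr: int) -> Optional[str]:
    """Return the target whose range contains addr, or None if no range claims it."""
    for start, end, target in ranges:
        if start <= addr < end:
            return target
    return None


def route(addr: int) -> List[str]:
    """List the hops a CPU read of addr takes through this toy chipset."""
    target = decode(HOSTBRIDGE_RANGES, addr)
    if target is not None:
        return ["hostbridge", target]
    # Not claimed by the hostbridge: forward over DMI and let the PCH decode it.
    device = decode(PCH_RANGES, addr)
    return ["hostbridge", "DMI", "PCH", device or "no device claims it"]


print(route(0xF7D0_1000))
```

A read from the sound card's buffer therefore yields the hops hostbridge, DMI, PCH and sound card, mirroring the path highlighted in Figure 2.4.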

System Address Ranges The physical address space is the set of all addressable memory addresses and contains all system address ranges. While the x86-64 architecture defines physical addresses to be 64 bits long, the effective width of the address bus, and thus the size of the address space, is implementation specific (Intel Corporation, 2014b, Section 3.3). All physical addresses that are routed to the same device form a system address range. The exact number and location of all system address ranges depend on the chipset, firmware, and installed devices. Because of this, the physical address space layout differs between most systems. The exact location and size of these ranges is stored in registers inside the memory controller, as well as on the mapped devices themselves. A detailed explanation of how device memory is accessed and configured is the focus of Section 2.1.3.

Figure 2.5: Memory Map of a Haswell System (Intel Corporation, 2013) (physical address space, left: legacy range, OS visible main memory below 4 GB, TSEG, GFX GTT, PCI memory, APIC and flash from TOLUD to 4 GB, OS visible main memory above 4 GB, reclaim from RECLAIM BASE to TOUUD, PCI memory up to 512 GB; physical memory, right, including the ME-UMA region)

Figure 2.5 illustrates the memory map of a Haswell system with more than 4 GiB of RAM. The physical address space (as seen by the CPU) is shown on the left, the actual physical memory is on the right. The CPU has 39 address lines supporting an effective physical address space of 512 GiB. This information is taken directly from the Haswell datasheet (Intel Corporation, 2013), where further details are available if necessary.

The address space begins with the Disk Operating System (DOS) legacy range, which occupies the first 1 MiB. The first 640 KiB are always mapped to RAM, while the remainder is mapped according to the Programmable Attribute Map (PAM) registers, depending on how it is used by the system firmware (see Section 2.3).


Next to the legacy range lies a large block of directly mapped RAM. Its end is determined by the location of the Top of Main Memory Segment (TSEG) range. All physical memory above this address is inaccessible to the Operating System (OS) and needs to be remapped. The location of the TSEG range depends on the value of the TSEGMB register. TSEG is used by the system firmware and is only accessible in SMM, so the entire range is inaccessible during normal CPU operation. The Top of Lower Usable DRAM (TOLUD) register denotes the border to the first MMIO region. The hostbridge reserves everything from TOLUD to 4 GiB in the physical address space for MMIO and forwards all CPU transactions into this region to the DMI bus for processing by the PCH. Physical memory located behind this region needs to be reclaimed elsewhere in the address space, as it is inaccessible at this location. Depending on the graphics device used, the graphics card might also use some of the physical memory in this region to store the Graphics Translation Tables (GTT). Also, if the CPU features an integrated graphics (GFX) device, it will use some of the memory in this region. RAM located above 4 GiB is directly mapped into the physical address space and forms a second main memory address range. At the top of this range, RAM that was shadowed by the first MMIO range (OS invisible, reclaim) is remapped. The Top of Upper Usable DRAM (TOUUD) register marks the end of this range. The remainder of the physical address space is again used for MMIO and decoded to the DMI bus. Firmware maps device memory individually and without overlap somewhere into this range. Most systems also feature an embedded management processor, which is called the Management Engine (ME) on Intel systems. The ME can allocate part of physical memory for its own use, which is depicted as the ME-Uniform Memory Access (UMA) region. This memory is not accessible by any device other than the ME.
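The layout above can be turned into a small address classifier. The following Python sketch assumes hypothetical register values for a machine with 8 GiB of RAM (the TSEGMB, TOLUD, RECLAIM_BASE and TOUUD constants below are made up) and, for brevity, ignores graphics stolen memory, the PAM-controlled part of the legacy range and the ME-UMA region. It also shows how an address in the reclaim range maps back to the DRAM shadowed by the MMIO hole below 4 GiB.

```python
MiB = 1 << 20
GiB = 1 << 30

ADDRESS_BITS = 39                          # Haswell: 2**39 bytes = 512 GiB
# Hypothetical register values for a system with 8 GiB of installed RAM.
TSEGMB = 0xBF80_0000                       # base of TSEG (SMM only)
TOLUD = 0xC000_0000                        # start of the MMIO hole below 4 GiB
RECLAIM_BASE = 8 * GiB                     # shadowed DRAM is remapped here
TOUUD = RECLAIM_BASE + (4 * GiB - TOLUD)   # end of usable DRAM above 4 GiB


def classify(addr: int) -> str:
    """Name the system address range that contains a physical address."""
    if not 0 <= addr < 1 << ADDRESS_BITS:
        raise ValueError("address outside the physical address space")
    if addr < 1 * MiB:
        return "legacy range"
    if addr < TSEGMB:
        return "main memory below 4 GiB"
    if addr < TOLUD:
        return "TSEG"
    if addr < 4 * GiB:
        return "MMIO"
    if addr < RECLAIM_BASE:
        return "main memory above 4 GiB"
    if addr < TOUUD:
        return "reclaim"
    return "MMIO"


def dram_offset(addr: int) -> int:
    """Translate a physical address into an offset in installed DRAM."""
    region = classify(addr)
    if region == "reclaim":
        # Reclaimed addresses expose the DRAM hidden behind TOLUD..4 GiB.
        return TOLUD + (addr - RECLAIM_BASE)
    if region == "MMIO":
        raise ValueError(f"{addr:#x} is decoded to devices, not DRAM")
    return addr                            # all other ranges are mapped 1:1 here
```

An acquisition tool that naively reads every address up to TOUUD would hit the MMIO hole between TOLUD and 4 GiB; a classifier like this one is a way to see which ranges are actually backed by RAM.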

2.1.2 Memory Protection

For security and stability reasons, most modern computer architectures offer a feature called virtual memory. In this concept, each program runs inside its own “virtual address space”, isolated from all other programs. A special component in the CPU called the Memory Management Unit (MMU) is responsible for mapping all virtual address spaces into the physical address space. In this thesis we refer to this process as paging2. Paging is controlled by data structures managed by the OS, which are then parsed by the MMU. When paging is turned on, the MMU translates all addresses before they are put on the address bus. This mechanism also makes it possible to map files directly into memory or to page out unused parts of memory to disk, effectively using RAM as a cache for slower persistent storage. The MMU maintains a cache called the Translation Lookaside Buffer (TLB) to avoid having to walk the page tables repeatedly for every memory access. Intel x86-64 processors support two different implementations of address translation: two-level paging for 32 bit code (IA-32) and four-level paging for 64 bit code (IA-32e) (Intel Corporation, 2014b). Since our focus in this thesis is on 64 bit operating systems only, we will limit our explanation to IA-32e paging.

2 Every x86 CPU also supports a memory translation model called segmentation. However, segmentation is largely disabled when operating in 64 bit mode, which is why we ignore it in this thesis (Intel Corporation, 2014b, Section 3.3.3).

The Virtual Address Space The set of virtual addresses the CPU can access is called the virtual address space. On x86-64 systems all virtual addresses have a size of 64 bits, resulting in a virtual address space of 2^64 bytes (16 EiB) (Intel Corporation, 2014b, Section 3.3.7). However, implementations of the architecture can choose to use a smaller effective virtual address size to improve efficiency. Intel x86-64 implementations today use an effective address length of 48 bits, resulting in a virtual address space of 2^48 bytes (256 TiB). All unused bits in 64 bit addresses are sign extended, i.e., they must equal the most significant bit of the effective address (bit 47) (Intel Corporation, 2014b, Section 3.3.7.1). Figure 2.6 lays out the virtual address space of a typical x86-64 Linux process. The exact layout depends on the implementation; the contents of user- and kernel-space in this figure are just an example. The paging data structures are managed by the kernel, so the virtual address space can look different on systems with disparate operating systems. Canonical addressing divides the address space into two halves, separated by the non-canonical space. Memory accesses to non-canonical addresses generate a general protection fault (Intel Corporation, 2014b, Section 3.3.7.1), which is why the whole range from 0x0000800000000000 to 0xffff7fffffffffff can be considered non-existent. In higher half kernel architectures like Linux, this separation is used to divide OS and user code (Tanenbaum and Bos, 2014). The lower half is used by the process itself and is called userspace. It contains the program's code and data, as well as mapped files and libraries. Userspace spans from 0x0000000000000000 to 0x00007fffffffffff, for a total size of 128 TiB.
Because switching the address space to perform work in the kernel flushes the TLB, which causes a performance hit, operating systems like Microsoft Windows, Apple Mac OS X (OS X) and Linux use the higher half of each process address space to map a view of the kernel. Depending on the implementation, this view contains one or more kernel stacks, heaps, and the kernel code and data segments. The upper half is called kernel space and also has a size of 128 TiB.
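The canonical address rule for both halves can be checked with a few bit operations. This minimal Python sketch assumes 48 effective address bits as described above; is_canonical and sign_extend are hypothetical helper names, and the code mirrors the sign-extension rule rather than any operating system's actual implementation.

```python
VA_BITS = 48  # effective virtual address width on current Intel x86-64 CPUs


def is_canonical(va: int, bits: int = VA_BITS) -> bool:
    """True if bits 63 down to bits-1 of a 64-bit address are all 0 or all 1."""
    if not 0 <= va < 1 << 64:
        raise ValueError("not a 64-bit address")
    top = va >> (bits - 1)              # bit 47 and everything above it
    return top == 0 or top == (1 << (65 - bits)) - 1


def sign_extend(va: int, bits: int = VA_BITS) -> int:
    """Return the canonical 64-bit form of a bits-wide virtual address."""
    va &= (1 << bits) - 1
    if va >> (bits - 1):                # most significant effective bit is set
        va |= ((1 << 64) - 1) ^ ((1 << bits) - 1)
    return va


for va in (0x00007FFFFFFFFFFF, 0x0000800000000000,
           0xFFFF7FFFFFFFFFFF, 0xFFFF800000000000):
    print(f"{va:#018x} canonical={is_canonical(va)}")
```

The first and last addresses are the boundaries of user and kernel space and pass the check, while the two addresses in between fall into the non-canonical gap.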


Figure 2.6: Virtual Address Space on an x86-64 System with 48 Address Bits (user space from 0x0000000000000000 to 0x00007fffffffffff, 128 TiB, holding program code, data, heap, mapped files and stack; non-canonical addresses in between; kernel space from 0xffff800000000000 to 0xffffffffffffffff, 128 TiB, holding kernel code and data)

Code running in user space does not have access to kernel space memory. The MMU ensures this by matching the privilege level of the running code with the level of the memory region. Privileges follow a ring model, where the operating system runs in the innermost ring (0) with maximum privileges, while user-mode code runs in the outermost ring (3) with restricted access to memory. The details are out of the scope of this thesis; interested readers are referred to the Intel Software Developer's Manual (Intel Corporation, 2014b, Volume 1, Section 6.3.5).

Paging Data Structures IA-32e paging uses a 4-level paging hierarchy to translate a virtual address into a physical address. This means there are 4 different kinds of tables, which are traversed by the MMU during page translation to find the physical address for a given virtual address. The results of these lookups are cached in the TLB, so the lookup does not have to be repeated for subsequent accesses to the same page. Figure 2.7 depicts the data structures involved in translating a 4 KiB page. IA-32e paging also supports page sizes of 2 MiB and 1 GiB. For translation, the virtual address is separated into table indexes and a page offset. The indexes are used to find the relevant entry for this page in the respective data structure. After the physical page has been found, the offset is added to find the actual physical address of a specific byte.

Figure 2.7: Data Structures Involved in Virtual to Physical Address Translation (IA-32e Paging) (Intel Corporation, 2014b, Chapter 4.5) (virtual address bits 47 to 39 index the PML4, bits 38 to 30 the PDPT, bits 29 to 21 the PD and bits 20 to 12 the PT, while bits 11 to 0 form the page offset; CR3 and every entry hold a 40-bit physical base address)

The CR3 register always points to the physical address of the first level data structure, called the Page Map Level 4 (PML4). This table contains 512 PML4 Entries (PML4Es) of 64 bits each. The MMU uses the first 9 bits of the virtual address (bits 47 to 39) as an index into the PML4 to find the PML4E for this specific page. The PML4E contains, among flags and other bookkeeping data, the physical address of the Page Directory Pointer Table (PDPT) that manages the range of pages for this virtual address. The MMU then looks up the corresponding PDPT Entry (PDPTE) by using the next 9 bits from the virtual address as an index. Starting with the PDPTE, the Page Size (PS) flag becomes important. If it is set to 1, the PDPTE contains the physical address of a 1 GiB page encompassing this virtual address. If not, it references the Page Directory (PD) responsible for this address. The MMU then interprets the next 9 bits of the virtual address as an index to find the corresponding PD Entry (PDE) for this page. The PDE is then checked for its PS flag. If it is set to 1, the PDE contains the physical address of a 2 MiB page. Otherwise, it points to a Page Table (PT) for this address range. Again, 9 bits from the virtual address act as an index into the PT, selecting the corresponding PT Entry (PTE). Finally, the PTE contains the physical address of a 4 KiB page. The MMU then uses the remaining 12 bits as an offset into the page to compute the final physical address. For 2 MiB and 1 GiB pages, the offset instead comprises the remaining 21 and 30 bits, respectively.
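The walk described above can be sketched in Python by modelling physical memory as a dictionary from 8-byte-aligned physical addresses to 64-bit table entries. The sketch only checks the Present (bit 0) and PS (bit 7) flags and masks entry addresses to bits 51 to 12; access rights, the TLB and all other flags are ignored, and the table contents are made up for the example.

```python
PRESENT = 1 << 0
PS = 1 << 7                               # Page Size flag in PDPTEs and PDEs
ADDR_MASK = 0x000F_FFFF_FFFF_F000         # bits 51..12 of an entry


def translate(mem: dict, cr3: int, va: int):
    """Translate va with IA-32e paging; return None if a level is not present."""
    # Indexes for PML4, PDPT, PD and PT: 9 bits each, starting at bit 39.
    indexes = [(va >> shift) & 0x1FF for shift in (39, 30, 21, 12)]
    table = cr3 & ADDR_MASK
    for level, index in enumerate(indexes):
        entry = mem.get(table + index * 8, 0)   # each entry is 8 bytes wide
        if not entry & PRESENT:
            return None
        if level in (1, 2) and entry & PS:      # PDPTE -> 1 GiB, PDE -> 2 MiB page
            offset_bits = 30 if level == 1 else 21
            base = entry & ADDR_MASK & ~((1 << offset_bits) - 1)
            return base | (va & ((1 << offset_bits) - 1))
        table = entry & ADDR_MASK
    return table | (va & 0xFFF)                 # PTE holds the 4 KiB frame


# Build one hypothetical mapping for a 4 KiB page.
va = 0x00007F12_3456_7ABC
pml4, pdpt, pd, pt = 0x1000, 0x2000, 0x3000, 0x4000
i4, i3, i2, i1 = [(va >> s) & 0x1FF for s in (39, 30, 21, 12)]
mem = {
    pml4 + i4 * 8: pdpt | PRESENT,
    pdpt + i3 * 8: pd | PRESENT,
    pd + i2 * 8: pt | PRESENT,
    pt + i1 * 8: 0xABCD_E000 | PRESENT,
}
print(hex(translate(mem, cr3=pml4, va=va)))
```

Memory forensics tools perform essentially this walk in software when they reconstruct virtual address spaces from a physical memory image, which is why the same four indexes reappear throughout later chapters.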


Please note that we have only skimmed the surface of x86-64 paging. Most of the details used for paging memory to disk or mapping files are not important for understanding this thesis, and are implemented differently on individual operating systems. A more detailed explanation of paging, memory mapped files, shared memory and all other operating system specific memory management internals can be found in the work of other authors (Intel Corporation, 2014b; Russinovich et al., 2009; Duarte, 2009; Gorman, 2004; Levin, 2012).

2.1.3 The PCI Express Bus

The PCIe bus is a high-performance, general purpose I/O interconnect used to link most devices in modern computers with the chipset. It replaces the older parallel PCI bus with a faster serial bus. Examples of PCIe devices in computer systems include graphics and network cards, as well as SATA and USB controllers connecting peripheral devices. While the PCIe bus implements a new protocol and architecture, it is still backwards compatible with PCI and supports the legacy PCI configuration mechanism. Because Chapters 5 and 7 in particular make use of certain PCIe features, we give a short introduction to the architecture, protocol and configuration of PCIe. Our explanations are by no means complete; we deliberately skip most of the low level details like error correction, flow control and the physical link. For more detailed information, we refer the reader to the official specification, on which this section is based (PCI-SIG, 2010a).

PCIe Protocol Similar to common networking protocol stacks such as the Internet protocol suite (TCP/IP), PCIe is a layered, packet switched protocol. Figure 2.8 provides a brief overview of the different layers. Packets are formed in the transaction layer, then passed down the stack until they are actually transmitted over the wire. When they arrive at the other endpoint, the individual layers decode the packets and extract the data. On the physical layer, PCIe devices are connected through lanes, which are full duplex serial connections. A lane consists of two differential signaling pairs, one to Receive (RX) and one to Transmit (TX). Lanes can be bundled into links of 1x, 2x, 4x, 8x, 12x, 16x, and 32x lanes, where data is divided onto the lanes by bytewise striping. The data link layer handles link management and data integrity. For link management, it can construct its own packets that don't have a transaction packet embedded. For integrity, it handles error detection by attaching and verifying error detecting codes, as well as retransmitting erroneous packets. The transaction layer creates packets that communicate events such as memory reads, writes or signals. It also implements a credit based form of flow control. It supports four different address spaces for a transaction: Memory, I/O, Configuration and Message. Memory transactions are used to transfer data using MMIO, while I/O transactions use the CPU's I/O space. Configuration transactions are used to access a device's configuration space, which we explain later in this section. Finally, message transactions are used for signaling between devices, for example to trigger interrupts.

Figure 2.8: PCIe Protocol Layers (PCI-SIG, 2010a) (each side stacks a transaction, a data link and a physical layer; the physical layers are connected by RX and TX pairs)
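Bytewise striping can be illustrated with a few lines of Python: byte i of a packet is sent on lane i modulo the link width. This sketch only covers the distribution of bytes; the framing, scrambling and line encoding performed by the real physical layer are deliberately left out.

```python
LINK_WIDTHS = (1, 2, 4, 8, 12, 16, 32)


def stripe(data: bytes, lanes: int) -> list:
    """Distribute data round-robin: byte i is transmitted on lane i % lanes."""
    if lanes not in LINK_WIDTHS:
        raise ValueError(f"unsupported link width x{lanes}")
    return [data[lane::lanes] for lane in range(lanes)]


def unstripe(per_lane: list) -> bytes:
    """Reassemble the byte stream at the receiver."""
    lanes = len(per_lane)
    out = bytearray(sum(len(chunk) for chunk in per_lane))
    for lane, chunk in enumerate(per_lane):
        out[lane::lanes] = chunk
    return bytes(out)


print(stripe(b"ABCDEFGH", 4))
```

On a x4 link, for example, the bytes A and E travel on lane 0, B and F on lane 1, and so on; the receiver interleaves the lanes again to recover the original packet.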

PCIe Fabric Architecture PCIe is a point-to-point protocol. The set of all links between interconnected components is referred to as a fabric or hierarchy. An illustration of a PCIe fabric is provided in Figure 2.9. The fabric is composed of a root complex, multiple endpoints, a switch and a PCIe to PCI bridge. The root complex is the root of the PCIe hierarchy and connects the CPU to the PCIe fabric. It can support one or more PCIe root ports, each of which defines a separate hierarchy domain. Each domain in turn can be composed of one or more endpoints, switches or bridges. For example, the root complex in Figure 2.9 connects four domains: GPU, PCI, memory controller and a switch with four endpoints. Root ports do not have to be physically located in the root complex. For example, on a Haswell system the root complex is located in the CPU and provides ports to the integrated GPU, the memory controller and the PCH. However, the port for the PCH is linked through the chipset interconnect (Salihun, 2014) and physically located on the PCH (Intel Corporation, 2014c). So even if the PCH appears to have its own root complex, the PCH's root port is actually linked to the root complex in the CPU via DMI. PCI-to-PCI bridges are the “routers” of the PCIe fabric. They have a primary (ingress) and a secondary (egress) port, and can forward transactions from one port to the other in both directions. Each bridge is configured with a specific memory range and will claim all transactions that fall into that range on its ingress port.


Figure 2.9: PCIe Architecture (PCI-SIG, 2010a) (the root complex in the CPU, together with the iMC, connects a GPU endpoint, a memory endpoint, a PCIe to PCI bridge with legacy endpoints, and a switch with further PCIe endpoints)

Switches are logical assemblies of multiple virtual PCI-to-PCI bridges. To configuration software they look like multiple bridges, and they forward transactions using memory address based routing, just like PCI-to-PCI bridges. Finally, there are bridges to other protocols such as PCI. A PCIe to PCI bridge must comply with the PCIe specification on its PCIe port and connects a legacy PCI bus to the PCIe fabric.
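Memory address based routing through bridges and switches can be sketched as a recursive search. In this Python sketch, each (virtual) PCI-to-PCI bridge forwards a single memory window given by a base and an inclusive limit, and each endpoint claims one memory range; the topology, names and addresses are hypothetical, and I/O windows, prefetchable windows and upstream routing are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Endpoint:
    name: str
    base: int
    size: int

    def claims(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


@dataclass
class Bridge:
    name: str
    mem_base: int                      # start of the forwarded memory window
    mem_limit: int                     # inclusive end of the window
    children: List[Union["Bridge", Endpoint]] = field(default_factory=list)

    def claims(self, addr: int) -> bool:
        return self.mem_base <= addr <= self.mem_limit


def route_down(node: Bridge, addr: int, path: Optional[List[str]] = None) -> Optional[List[str]]:
    """Follow a downstream memory transaction; return the path or None if unclaimed."""
    path = (path or []) + [node.name]
    for child in node.children:
        if child.claims(addr):
            if isinstance(child, Bridge):
                return route_down(child, addr, path)
            return path + [child.name]
    return None


# A switch modelled as one upstream and two downstream virtual bridges.
nic = Endpoint("NIC", 0xF000_0000, 0x2_0000)
sata = Endpoint("SATA", 0xF010_0000, 0x1000)
switch = Bridge("switch upstream port", 0xF000_0000, 0xF01F_FFFF, [
    Bridge("switch downstream port 1", 0xF000_0000, 0xF00F_FFFF, [nic]),
    Bridge("switch downstream port 2", 0xF010_0000, 0xF01F_FFFF, [sata]),
])
root = Bridge("root complex", 0x0, 0xFFFF_FFFF,
              [Endpoint("GPU", 0xE000_0000, 0x1000_0000), switch])
print(route_down(root, 0xF010_0010))
```

The search mirrors the claiming rule in the text: each bridge only passes on transactions inside its configured window, so an address covered by a window but not by any device behind it is not claimed by anyone.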

PCIe Configuration PCIe configuration is supported via two different mechanisms: the legacy PCI compatible Configuration Access Mechanism (CAM) and the PCIe Enhanced CAM (ECAM). CAM is binary compatible with the old PCI configuration mechanism and is accessed through the CPU's I/O space, while ECAM is an extension that increases the size of the configuration space and is only available through MMIO. The techniques we developed in this thesis only access information that is available in both configuration spaces. For simplicity and backwards compatibility, we decided to use the CAM mechanism, which is why we will not explain the extended configuration mechanism here. Readers interested in the details of ECAM configuration can find more information in the PCI specification (PCI-SIG, 2010a).


Bit 31: Enable (EN) | Bits 30-24: Reserved | Bits 23-16: Bus Number | Bits 15-11: Device Number | Bits 10-8: Function Number | Bits 7-2: Register Number | Bits 1-0: 00

Figure 2.10: PCI Configuration Space Addressing (PCI-SIG, 2010a)

Configuration transactions follow a PCI compatible addressing scheme, in which an address consists of three parts, Bus, Device and Function (BDF), written as bus:device.function. For example, the address of the host bridge is usually 00:00.0, denoting bus 0, device 0 and function 0. The bus number refers to the legacy PCI bus topology of parallel buses linked via PCI-to-PCI bridges. This terminology has been carried over to PCIe for compatibility reasons, so buses correspond to links in the fabric managed by a specific bridge. The device number identifies exactly one device on a specific bus. A device may implement multiple independent services called functions. Each function must provide its own configuration space, which is addressed with the function number. CAM is accomplished through two Double Word (DWORD) sized registers in the system's I/O space, CONFIG_ADDRESS (0xCF8) and CONFIG_DATA (0xCFC). Software accesses data in the configuration space by first writing a configuration address to CONFIG_ADDRESS, and then reading or writing the selected DWORD through CONFIG_DATA. The format of the CONFIG_ADDRESS register follows the BDF notation of function addressing, as depicted in Figure 2.10. The most significant bit (bit 31, EN) is a flag that enables the translation of I/O reads and writes into PCI configuration space transactions by the host bridge. It must be set to 1 for all configuration space accesses. Bits 24-30 are reserved for future use. The bus number is encoded as an 8 bit integer, allowing for 256 different buses per PCI domain. The device number occupies 5 bits, for 32 devices per bus. The next 3 bits are used for the function number, for a maximum of 8 functions per device. Finally, 6 bits select the appropriate DWORD inside the configuration space. Since these bits select one of 64 DWORDs of 4 bytes each, this leads to a total of 256 bytes of configuration data. Because configuration space accesses must be DWORD aligned, the last 2 bits are always set to zero.
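To make the bit layout of Figure 2.10 concrete, the following Python sketch encodes a BDF triple and a register offset into a CONFIG_ADDRESS value. It is an illustration under stated assumptions, not the acquisition code of this thesis: the function names are hypothetical, and the actual port I/O (writing the value to 0xCF8 and then reading 0xCFC) requires kernel privileges and is deliberately omitted.

```python
def config_address(bus: int, device: int, function: int, offset: int) -> int:
    """Encode a CAM CONFIG_ADDRESS value following the layout of Figure 2.10."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus, device or function number out of range")
    if not 0 <= offset < 256:
        raise ValueError("CAM can only address the first 256 bytes")
    return (
        (1 << 31)            # bit 31 (EN): turn the I/O access into a config transaction
        | (bus << 16)        # bits 23-16: bus number
        | (device << 11)     # bits 15-11: device number
        | (function << 8)    # bits 10-8: function number
        | (offset & 0xFC)    # bits 7-2: DWORD index; bits 1-0 stay zero
    )


def parse_bdf(text: str) -> tuple:
    """Split the bus:device.function notation (hex digits), e.g. '00:1f.3'."""
    bus, rest = text.split(":")
    device, function = rest.split(".")
    return int(bus, 16), int(device, 16), int(function, 16)


# Address of the DWORD at offset 0x10 (BAR 0) of function 00:1f.3.
print(hex(config_address(*parse_bdf("00:1f.3"), 0x10)))   # prints 0x8000fb10
```

Masking the offset with 0xFC mirrors the DWORD alignment rule: a byte or word inside the selected DWORD would be picked out of CONFIG_DATA afterwards.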
Each PCIe function must implement a configuration space. While the CAM configuration space has a size of 256 bytes and ECAM even fits 4096 bytes, the only strictly defined part of the configuration space is the configuration space header, located in the first 64 bytes. The layout of the space behind the header is implementation specific and organized into a linked list of so-called capabilities, whose start is given by the capability pointer in the configuration header. There are two different types of configuration space headers: type 0 for endpoints and type 1 for the root ports of the root complex, bridges and switches. Because we are only interested in functionality related to device enumeration and memory mapping, we ignore most of the details and focus only on the relevant fields in each header. For more information on other parts of these headers and the capability list, we encourage the reader to consult the PCI specification (PCI-SIG, 2010a, Chapter 7.4).

Figure 2.11 shows a redacted version of a type 0 configuration header. The first 16 bytes are laid out identically in both header types and are used for general device control and bus enumeration. The vendor ID is a 16 bit integer assigned by the PCI Special Interest Group, which also maintains a list of all vendors and their IDs (PCI-SIG, 2015). The device ID is assigned by each vendor individually to uniquely identify each device. The command register contains flags that control the behaviour of the device, for example whether it responds to memory or I/O transactions, or whether it is allowed to issue such transactions itself (bus mastering). Finally, the header type field defines the further layout of the configuration header. Its most significant bit also specifies whether the device supports multiple functions. Memory and I/O space mapping of device memory is performed using the Base Address Register (BAR). Type 0 configuration headers have six BARs located directly after the fixed part of the header. Each BAR defines the start of a memory range that the device maps into the physical address space. The exact layout of a BAR is illustrated in Figure 2.12. The upper 28 bits (bits 31-4) hold the base address of the range. The prefetchable flag (bit 3) is hardwired by the device to indicate whether reads from this memory region have side effects on the device. If it is set, reads from this memory region are guaranteed not to cause side effects on the device. The type field (bits 2-1) indicates whether the BAR references a 32 bit (00) or a 64 bit (10) region. For 64 bit regions the BAR is extended with the next BAR in the header, which is interpreted as the most significant part of the address. Finally, the least significant bit indicates whether the BAR references an MMIO (0) or an I/O space (1) region. I/O BARs have a slightly different layout, but are not important for this thesis.

Offset  Contents (bits 31-0)
00h     Device Identifier (ID) | Vendor ID
04h     Status | Command
08h     Class Code | Revision ID
0Ch     BIST | Header Type | Latency Timer | Cache Line Size
10h     Base Address Register 0
...
24h     Base Address Register 5
28h     Cardbus CIS Pointer
...
30h     Expansion Read Only Memory (ROM) Base Address
34h     Reserved | Capabilities Pointer
...
3Ch     Deprecated | Interrupt Pin | Interrupt Line

Figure 2.11: PCIe Type 0 Configuration Space Header (PCI-SIG, 2010a)

Bits 31-4: Base Address | Bit 3: Prefetchable | Bits 2-1: Type | Bit 0: 0 (memory space)

Figure 2.12: PCI 32-Bit MMIO BAR Layout (PCI-SIG, 2002)

Offset  Contents (bits 31-0)
00h     Device ID | Vendor ID
...
10h     Base Address Register 0
14h     Base Address Register 1
18h     Secondary Latency Timer | Subordinate Bus Number | Secondary Bus Number | Primary Bus Number
...
20h     Memory Limit | Memory Base
24h     Prefetchable Memory Limit | Prefetchable Memory Base
28h     Prefetchable Base Upper 32 Bits
2Ch     Prefetchable Limit Upper 32 Bits
...
3Ch     Bridge Control | Interrupt Pin | Interrupt Line

Figure 2.13: PCIe Type 1 Configuration Space Header (PCI-SIG, 2010a)
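The BAR fields described above can be decoded with a few bit operations. The following sketch assumes that the six raw BAR DWORDs of a type 0 header have already been read from configuration space; `decode_bar` and `decode_all_bars` are hypothetical helper names, I/O BARs are only recognized rather than fully interpreted, and treating a zero base as unassigned is a simplification.

```python
def decode_bar(bars: list, index: int) -> dict:
    """Decode BAR `index` of a type 0 header from its raw 32-bit values."""
    raw = bars[index]
    if raw & 0x1:
        # Bit 0 set: I/O space BAR (layout not covered in detail here).
        return {"kind": "io", "base": raw & 0xFFFFFFFC, "slots": 1}
    mem_type = (raw >> 1) & 0x3        # bits 2-1: 00 = 32 bit, 10 = 64 bit
    prefetchable = bool(raw & 0x8)     # bit 3: reads have no side effects
    base = raw & 0xFFFFFFF0            # bits 31-4: base address
    slots = 1
    if mem_type == 0b10:
        # 64-bit BAR: the following BAR supplies the upper 32 address bits.
        base |= bars[index + 1] << 32
        slots = 2
    return {"kind": "mem", "base": base, "prefetchable": prefetchable, "slots": slots}


def decode_all_bars(bars: list) -> list:
    """Walk all BARs, skipping the upper halves of 64-bit BARs."""
    result, i = [], 0
    while i < len(bars):
        bar = decode_bar(bars, i)
        if bar["base"] != 0:           # simplification: zero base = unassigned
            result.append((i, bar))
        i += bar["slots"]
    return result
```

Skipping by `slots` matters: without it, the upper half of a 64-bit BAR would be misread as an independent 32-bit region.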

Software can determine the size of a BAR range by writing a sequence of all 1s to the BAR. Devices hardwire the low-order address bits that fall inside the range to zero, so that masking out the flag bits, performing a bitwise not operation on the value read back and then adding 0x01 yields the size of the range. For example, if software writes 0xFFFFFFFF to a BAR and then reads back 0xFFFFFFE0, it performs a bitwise not (0x0000001F) and adds 0x01 to obtain a size of 32 (0x20) bytes. Because the lower 4 bits are used as flags, the minimum size of a BAR region is 16 bytes, and those bits are set to 0 for this calculation.
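The sizing arithmetic can be expressed in a few lines. This sketch only performs the calculation on a value that is assumed to have been read back after writing 0xFFFFFFFF; a real driver would additionally save and restore the original BAR contents (and typically disable memory decoding in the command register while probing), which is not shown. It handles 32-bit memory BARs only, and the function name is hypothetical.

```python
def mem_bar_size(readback: int) -> int:
    """Size of a 32-bit memory BAR from the value read after writing all 1s.

    Returns 0 if the BAR is not implemented (reads back as zero).
    """
    address_bits = readback & 0xFFFFFFF0        # clear the 4 flag bits first
    if address_bits == 0:
        return 0
    return ((~address_bits) & 0xFFFFFFFF) + 1   # bitwise not, then add 1


# The example from the text: 0xFFFFFFE0 -> not = 0x1F -> size 0x20.
print(mem_bar_size(0xFFFFFFE0))   # prints 32
```

Masking with 0xFFFFFFFF is needed because Python integers are unbounded, so `~` alone would produce a negative number.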


The type 1 configuration header starts to differ from type 0 at byte 16. For the sake of brevity we only focus on fields related to transaction routing and memory mapping. For more details we encourage the reader to refer to the PCI-to-PCI bridge specification (PCI-SIG, 1998), which we use as our main reference for this section. A redacted depiction of a type 1 header is provided in Figure 2.13. This header type only has two BARs, which have the same meaning as in type 0 headers. The bus numbers in the next DWORD describe the bus topology in PCI notation. They are ignored by PCIe, but still set to be compatible with legacy software. The primary bus number is the number of the PCI bus to which the primary bridge interface is connected. The secondary bus number denotes the number of the bus connected to the secondary interface, while the subordinate bus number denotes the highest-numbered PCI bus behind the bridge. Memory transaction routing is performed through the memory base and limit registers. If the memory limit register is set to a lower value than the memory base register, MMIO forwarding is disabled. Otherwise, the bridge forwards all memory transactions on its ingress interface that fall into the range between base and limit to its egress interface. The minimum size of bridge MMIO ranges is 1 MiB, thus the lower 20 bits are hardwired to zero in the base and to all 1s in the limit register. The optional prefetchable memory base and limit registers work the same way, except that the memory ranges they describe have no side effects on reads. In conclusion, MMIO transactions from the physical address space to PCIe devices are routed through the PCIe fabric by PCI-to-PCI bridge compatible nodes depending on their address. Bridges and endpoints are configured by the firmware on system boot and can be relocated by the OS or drivers by programming the endpoint BARs and the corresponding bridge memory registers.
Software can enumerate this configuration by parsing the PCI configuration space, which is also available on PCIe based systems.
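The following sketch decodes the memory base and limit registers of a type 1 header and checks whether a bridge would forward a given address. It assumes the DWORD at offset 20h has already been read from configuration space (Memory Base in the low 16 bits, Memory Limit in the high 16 bits); the helper names are hypothetical, and the prefetchable window is not handled.

```python
from typing import Optional, Tuple


def bridge_memory_window(dword_20h: int) -> Optional[Tuple[int, int]]:
    """Return the (base, limit) MMIO window of a type 1 header, or None."""
    base_reg = dword_20h & 0xFFFF              # Memory Base (offset 20h)
    limit_reg = (dword_20h >> 16) & 0xFFFF     # Memory Limit (offset 22h)
    # Bits 15-4 of each register hold address bits 31-20 (1 MiB granularity).
    base = (base_reg & 0xFFF0) << 16                  # lower 20 bits are 0
    limit = ((limit_reg & 0xFFF0) << 16) | 0xFFFFF    # lower 20 bits are 1s
    if limit < base:
        return None                            # forwarding disabled
    return base, limit


def bridge_forwards(dword_20h: int, address: int) -> bool:
    """True if the bridge would claim a memory transaction to `address`."""
    window = bridge_memory_window(dword_20h)
    return window is not None and window[0] <= address <= window[1]
```

For example, a register value of 0xFE10FE00 describes the 2 MiB window from 0xFE000000 to 0xFE1FFFFF.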

2.2 Linux Kernel Modules

To foster an understanding of our Linux kernel module injection techniques in Chapter 6, we give a short overview of the anatomy of a kernel module and how it is linked and loaded. The Linux kernel does not have a dedicated driver model like Windows or OS X. Instead, drivers are either compiled directly into the kernel or linked with the kernel binary at runtime as Loadable Kernel Modules (LKMs) (Corbet et al., 2005). LKMs are stored in files with the extension .ko and loaded through the insmod and modprobe programs, which issue the init_module system call.


Linking View | Execution View
ELF Header | ELF Header
Program Header Table (optional) | Program Header Table
Section 1 ... Section n | Segment 1, Segment 2, ...
Section Header Table | Section Header Table (optional)

Figure 2.14: ELF file layout (based on TIS Committee, 1995)

2.2.1 Module Binary Organization

The executable file format on Linux systems is ELF (TIS Committee, 1995). It is a binary format composed of a generic ELF header, a number of program and section headers, and finally the actual sections/segments, which contain program code and data. The ELF header stores information on the file class, the program's architecture, endianness, entry point and other generic details. There are four types of ELF files: executables, relocatable object files, shared objects and core dumps. Executables are ready to be loaded and run, while object files are intended to be further processed by a linker. Shared objects can be dynamically linked with other objects. Finally, core dumps are created during program crashes to store debugging information (Levine, 1999). The loader relies on the segments to identify the file layout and decide which parts to map into memory with which permissions. The program header table stores information on the segment types and locations. It is therefore required in an executable but optional in an object file. The linker instead relies on sections to operate on the file. The section header table stores information on section location, type and size. The section headers are thus mandatory in an object file, but optional in an executable. Because of this dualism, there are actually two disparate views of the same ELF file, which are illustrated in Figure 2.14. Depending on which headers are consulted, the internal structure of the file is organized differently, resulting in a linking view and an execution view. Which one is relevant depends on the intended purpose.
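As an illustration of the generic header fields mentioned above, the following sketch reads the file class, byte order, type and machine from the start of an ELF file such as a kernel module. It only inspects `e_ident` and the `e_type`/`e_machine` fields, whose offsets are the same for 32 and 64 bit files; `elf_summary` is a hypothetical helper name.

```python
import struct

ELF_TYPES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_summary(data: bytes) -> dict:
    """Summarize the generic ELF header fields of `data`."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class = data[4]   # EI_CLASS: 1 = 32 bit, 2 = 64 bit
    ei_data = data[5]    # EI_DATA: 1 = little endian, 2 = big endian
    fmt = "<HH" if ei_data == 1 else ">HH"
    # e_type and e_machine directly follow the 16 byte e_ident array.
    e_type, e_machine = struct.unpack_from(fmt, data, 16)
    return {
        "class": 64 if ei_class == 2 else 32,
        "endian": "little" if ei_data == 1 else "big",
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": e_machine,   # e.g. 62 = EM_X86_64
    }
```

Passing the first 64 bytes of a .ko file to `elf_summary` should report the type ET_REL, matching the statement below that kernel modules are relocatable objects.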


Figure 2.15: Static vs. Dynamic Linking (panels: an EXEC file whose .got (Global Offset Table), .got.plt (Procedure Link Table) and .text reference a DYN object; a REL file whose relocation table .rela.text patches its .text to reference another REL object)

Linux kernel modules are relocatable ELF object files, not executables. The obvious difference is that executable ELF files are processed by a loader, while relocatable objects are intended for a linker. Dependencies on other objects in an ELF executable are resolved by dynamic linking. In this process, external symbols are referenced through the Global Offset Table (.got) and Procedure Linkage Table (.got.plt), and resolved by the dynamic linker at runtime. In contrast, relocatable ELF objects are statically linked using relocations. Each section with references to symbols in other sections or objects has a corresponding relocation table. Entries in these tables contain information on the specific symbol referenced, and on how to patch a specific code or data reference with the final address of the symbol after it has been relocated. One or more of these relocatable objects can be linked together by placing them at their final position in the final executable or address space, after which the linker applies all relocations to patch the now final addresses directly into the code.
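To illustrate what applying a relocation means, the sketch below patches a section buffer for two common x86-64 relocation types, R_X86_64_64 (S + A) and R_X86_64_PC32 (S + A - P), as defined in the x86-64 System V ABI. It is a simplified model of a linker's work, not the kernel's implementation; symbol lookup is assumed to have happened already, and `apply_rela` is a hypothetical name.

```python
import struct

R_X86_64_64 = 1     # S + A, 64-bit absolute
R_X86_64_PC32 = 2   # S + A - P, 32-bit PC-relative


def apply_rela(section: bytearray, section_addr: int, r_offset: int,
               r_type: int, sym_value: int, addend: int) -> None:
    """Patch one RELA entry into `section`, which is placed at `section_addr`.

    sym_value (S) is the final address of the referenced symbol, addend (A)
    comes from the relocation entry, and P is the address being patched.
    """
    place = section_addr + r_offset
    if r_type == R_X86_64_64:
        value = (sym_value + addend) & 0xFFFFFFFFFFFFFFFF
        struct.pack_into("<Q", section, r_offset, value)
    elif r_type == R_X86_64_PC32:
        value = sym_value + addend - place
        if not -(1 << 31) <= value < (1 << 31):
            raise OverflowError("target not reachable with a 32-bit displacement")
        struct.pack_into("<i", section, r_offset, value)
    else:
        raise NotImplementedError(f"relocation type {r_type} not handled")
```

For a call instruction (opcode E8) whose 32-bit displacement starts at offset 1, an addend of -4 makes the patched displacement relative to the end of the instruction, which is how the CPU interprets it.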

2.2.2 Linking and Loading

The actual loading process of a kernel module is started with a system call from user mode, and then handled by the kernel directly. We give a brief overview on the most important steps, as illustrated in Figure 2.16:

Figure 2.16: Loading of a Kernel Module [diagram: insmod issues init_module (1); inside sys_init_module the kernel copies the module from user space (2), checks .modinfo (3), checks the module license and __versions (4), applies relocations such as .rela.text (5), and finally calls the module's init function (6)]

1. A user-mode process (usually insmod) loads the kernel module image into memory and issues an init_module system call.
2. The system call causes the kernel to dynamically allocate memory for the module and copy it into kernel space.
3. After the kernel has checked that the module is a valid ELF file, it starts to analyze the .modinfo section of the module. This section contains information on the exact version of the kernel headers the module was compiled with. The kernel will refuse to load any module that contains incompatible version magic.

4. If CONFIG_MODVERSIONS is enabled, the kernel also checks the version magic of every individual symbol the module imports. During compilation, a list of these symbol versions is placed in the __versions section of the module. The kernel will also refuse to load a module with incompatible symbol version magic.

5. After the version check, the kernel invokes its internal linker to resolve all relocations in the module. This replaces any inter-section or external symbol references in the module with the actual addresses of these symbols in the running kernel, assimilating the module into the kernel image.

6. Finally, the kernel links the module structure provided by the module into the module list and calls the function pointer stored in module.init, which passes execution to the module's init_module function.

In the context of the Linux kernel this means that loading a kernel module is actually the same thing as linking an executable. The kernel module is linked into the kernel executable, by the kernel itself, at runtime.
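The version magic check from step 3 can be approximated as follows. The .modinfo section is a sequence of NUL-terminated key=value strings. The sketch assumes its raw bytes have already been extracted from the module and compares vermagic by exact string equality. This is a simplification: the kernel's actual comparison may differ, for example when CONFIG_MODVERSIONS is enabled. Both helper names are hypothetical.

```python
def parse_modinfo(section: bytes) -> dict:
    """Split a .modinfo section into {key: [values]}."""
    info = {}
    for entry in section.split(b"\x00"):
        if not entry:
            continue                      # skip padding between strings
        key, sep, value = entry.decode("utf-8", "replace").partition("=")
        if sep:
            info.setdefault(key, []).append(value)
    return info


def vermagic_matches(section: bytes, running_vermagic: str) -> bool:
    """Simplified version magic check: exact match of a single vermagic entry."""
    values = parse_modinfo(section).get("vermagic", [])
    return len(values) == 1 and values[0] == running_vermagic
```

A module compiled for a different kernel release fails this check, which is why a module cannot simply be loaded into an arbitrary kernel without adjusting such metadata.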


2.3 System Firmware

Because Chapter 7 focuses on the acquisition and analysis of firmware, we give a short overview of the different firmware components used on x86 systems. This section is based on work by Salihun (2006), as well as the PCI (PCI-SIG, 2002) and Advanced Configuration and Power Interface (ACPI) (Intel Corporation, 2014a) specifications. The system firmware, i.e., the BIOS or, on more modern systems, the UEFI, is the first program that runs on the CPU when a computer is turned on. Its code is stored in a non-volatile storage area, usually an EEPROM, on the mainboard of the machine. The chipset is initially configured to map the contents of this ROM into the physical address space from 0xF0000 to 0xFFFFF. The ROM is also aliased so that at least 16 bytes are mapped to the physical address 0xFFFFFFF0, the CPU's reset vector (Intel Corporation, 2014b, Chapter 9.1.4). As soon as the power supply is stable and all clocks are synchronized, the reset line of the CPU is deasserted, and execution begins at the processor's reset vector. At this time the CPU is in real mode, in which segmented addressing can normally only reach about the first megabyte of the physical address space, so this address would be out of reach. This problem is remedied by statically initializing the base address of the Code Segment (CS) register to 0xFFFF0000 on reset. The CPU thus fetches the first instructions from 0xFFFFFFF0 (instead of 0xFFFF0, as the usual real-mode calculation would suggest), until the CS base is reloaded by a far jmp or call instruction. Firmware code at this address then performs a far jump into the code residing in the mapped firmware ROM (Intel Corporation, 2014b, Chapter 9.1.4). The firmware code initially runs directly from the ROM chip. More precisely, only a small stub is executed directly at the beginning. The remaining instructions are typically stored in compressed form and only executed after being copied into RAM, because the firmware ROM is orders of magnitude slower than system RAM. The stub is responsible for setting up the memory controller as well as the individual DRAM modules.
The firmware then moves from ROM into RAM by manipulating the Programmable Attribute Map (PAM) registers, shadowing the ROM-mapped regions with RAM and decompressing its code image into memory. As one of the last steps, the system firmware creates and maintains a runtime environment with basic I/O services. It starts initializing devices on the PCI bus and maps their registers and memory into the physical address space as required. It is during this phase that the final layout of the physical address space is determined. While this procedure is similar in BIOS and UEFI based systems, the runtime environment after this step is fundamentally different.
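The reset-vector arithmetic described above can be checked with a short calculation. The sketch contrasts the usual real-mode address computation (selector × 16 + offset) with the fetch address produced by the special reset value of the CS base. The reset values used (CS selector 0xF000, CS base 0xFFFF0000, IP 0xFFF0) are the ones documented in the cited Intel manual, while the function names are hypothetical.

```python
def real_mode_address(selector: int, offset: int) -> int:
    """Usual real-mode translation: segment base = selector * 16."""
    return (selector << 4) + offset


def reset_fetch_address(cs_base: int = 0xFFFF0000, ip: int = 0xFFF0) -> int:
    """First fetch after reset: the hidden CS base is used directly."""
    return cs_base + ip


# Reset state: CS selector 0xF000, CS base 0xFFFF0000, IP 0xFFF0.
print(hex(real_mode_address(0xF000, 0xFFF0)))   # prints 0xffff0
print(hex(reset_fetch_address()))               # prints 0xfffffff0
```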

2.3.1 Basic Input Output System

The BIOS runtime environment operates in 16-bit real mode. It is responsible for creating an Interrupt Vector Table (IVT) in order to support a set of simple operations, e.g., sending output to the screen or reading data from a hard disk. The latter functionality is required to drive the bootstrapping process and to load the boot manager as well as, later on, the operating system. More precisely, the BIOS reads the code of the boot manager from the Master Boot Record (MBR) of the first hard disk into memory, and directly transfers execution to it. The boot manager, in turn, loads the operating system and further prepares the system environment. For these tasks, the primary BIOS services are used. In the last step, the operating system takes over interrupt handling by setting up an appropriate Interrupt Descriptor Table (IDT). Once the CPU switches to the new IDT, direct access to the BIOS services is lost.
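To illustrate what the BIOS expects to find in the MBR, the following sketch validates the 0x55 0xAA boot signature at the end of the 512 byte sector and decodes the four primary partition entries. The BIOS conventionally loads this sector to linear address 0x7C00 before jumping to it. Field offsets follow the conventional PC MBR layout. The function names are hypothetical, and real BIOS implementations differ in how strictly they check the signature.

```python
import struct

SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"   # bytes 510 and 511 of the MBR
PARTITION_TABLE = 446          # four 16 byte entries follow the boot code


def is_bootable_mbr(sector: bytes) -> bool:
    """True if the sector has the expected size and boot signature."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE


def partitions(sector: bytes) -> list:
    """Return (active, type, first_lba, sector_count) for non-empty entries."""
    result = []
    for i in range(4):
        entry = sector[PARTITION_TABLE + 16 * i:PARTITION_TABLE + 16 * (i + 1)]
        part_type = entry[4]
        if part_type == 0:
            continue                       # unused slot
        first_lba, count = struct.unpack_from("<II", entry, 8)
        result.append((entry[0] == 0x80, part_type, first_lba, count))
    return result
```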

2.3.2 (Unified) Extensible Firmware Interface

Contrary to BIOS-based firmware, UEFI operates in 32-bit protected mode, or in 64-bit long mode on x86-64 systems. The boot process comprises a distinct Security (SEC) phase in which the integrity of the firmware is explicitly checked, and secure is facilitated. In a second, so-called Pre-EFI Initialization (PEI) phase, tasks similar to those of early BIOS initialization are performed. However, at the end of this phase, EFI provides a structured Driver Execution Environment (DXE) for drivers and services. These are not loaded from the MBR but from the file system of a designated EFI System Partition. The location of the respective images is specified in Non-Volatile RAM (NVRAM) on the mainboard. Drivers, , and the OS can interact with the firmware through specific protocols. After the operating system has started, it still has access to some firmware interfaces through the so-called UEFI runtime services.

2.3.3 PCI Option ROMs

Because the system firmware has no internal knowledge of the functionality of attached devices, code specifically required for device initialization is provided by the devices themselves. For PCI and PCIe devices this code is located in a ROM chip on the device. For a more detailed explanation of device initialization code the reader is referred to the official specification (PCI-SIG, 2010b, Section 6.3), on which this section is based. During Power On Self Test (POST) the firmware maps these ROMs into the physical address space by configuring the Expansion ROM Base Address Register (XROMBAR) in the device's configuration space (see Section 2.1.3). Expansion ROMs can contain multiple code images, one for each supported architecture. Each image is aligned to 512 bytes and starts with a header describing its contents. The location of subsequent images depends on the size of the previous image. Firmware parses each header and selects the image appropriate for its architecture. The header consists of a 2 byte signature (0x55 0xAA), 22 bytes (offsets 02h to 17h) of architecture dependent data and a 2 byte offset to the so-called PCI Data Structure. The PCI Data Structure contains information on the architecture for which the image was built, the size of the image and the device this image was designed for. For x86 compatible images the header additionally has fields which store the offset of the INIT function, as well as the amount of memory required for initialization. POST code on x86 systems must copy the appropriate image into a writable region of memory and pass control to the INIT function. Option ROM code will then run and initialize the device. Memory allocated to option ROM code must be writable to allow initialization code to unpack and modify its own image in memory. On x86 systems the POST Memory Manager (PMM) is responsible for allocating memory to device initialization code. It uses the memory area spanning from 0xC0000 to 0xEFFFF for storage of code and data. For legacy reasons the first 64 KB of this area are reserved for video cards. Note that version 3.0 of the PCI firmware specification has changed the way memory is managed during POST. The PMM is now allowed to allocate memory above 1 MB even while the OS is running (PCI-SIG, 2010b). The location of regions outside of the previously described area is implementation specific and not standardized.
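The image chain described above can be walked with a short parser. This sketch assumes the expansion ROM contents have already been read into a buffer. It follows the offsets given in the PCI firmware specification: signature at 0h and PCI Data Structure pointer at 18h in the image header, and within the PCI Data Structure the "PCIR" signature, vendor and device IDs at 4h and 6h, the image length in 512 byte units at 10h, the code type at 14h and the last-image indicator bit at 15h. The function and key names are hypothetical.

```python
import struct


def walk_option_rom(rom: bytes) -> list:
    """List the code images contained in an expansion ROM buffer."""
    images, offset = [], 0
    while offset + 0x1A <= len(rom) and rom[offset:offset + 2] == b"\x55\xaa":
        pcir = offset + struct.unpack_from("<H", rom, offset + 0x18)[0]
        if pcir + 0x16 > len(rom) or rom[pcir:pcir + 4] != b"PCIR":
            break                                   # malformed image chain
        vendor, device = struct.unpack_from("<HH", rom, pcir + 4)
        length = struct.unpack_from("<H", rom, pcir + 0x10)[0] * 512
        code_type = rom[pcir + 0x14]                # 0 = x86 (PC-AT), 3 = EFI
        last = bool(rom[pcir + 0x15] & 0x80)        # bit 7: last image
        image = {"offset": offset, "vendor": vendor, "device": device,
                 "length": length, "code_type": code_type}
        if code_type == 0:
            # x86 header: init size (512 byte units) at 02h, INIT entry at 03h.
            image["init_size"] = rom[offset + 2] * 512
            image["init_entry"] = offset + 3
        images.append(image)
        if last or length == 0:
            break
        offset += length                            # next image follows directly
    return images
```

Because each image's position depends on the length field of its predecessor, a single corrupted length is enough to hide all subsequent images from a naive parser.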

2.3.4 Advanced Configuration and Power Interface

In this section, we give a brief overview of ACPI. Because Chapter 7 deals with the acquisition and analysis of ACPI malware from memory, we focus purely on the code execution mechanics of ACPI. For further information we refer the reader to the official ACPI specification (ACPI Promoters Corporation, 2013), on which this section is based. ACPI defines a platform independent interface between the OS and the hardware. Figure 2.17 illustrates the interaction between ACPI, the OS and the hardware. The interface consists of three components: Registers, System Description Tables, and Firmware. ACPI System Description Tables characterize the hardware and what needs to be done to make it function. They are supplied to the OS by the ACPI firmware. Additionally, ACPI tables contain Definition Blocks with code to control the hardware. This code is supplied in the ACPI Machine Language (AML), an abstract language that is executed by an AML interpreter in the OS. ACPI registers are part of the platform hardware. They refer to the part of the hardware that is constrained by the ACPI specification. The ACPI firmware, in turn, is the part of the platform firmware that implements the ACPI interface. It consists of routines that manage power and system sleep states, and is responsible for supplying the ACPI tables to the OS. When the system boots, ACPI firmware copies the ACPI tables into an arbitrary memory region. To enable the OS to find them, it must place a structure called the Root System Description Pointer (RSDP) into the first 1 KB of the Extended BIOS Data Area (EBDA) or into the firmware ROM image between 0xE0000 and 0xFFFFF³, aligned on a 16 byte boundary. This structure contains a signature ("RSD PTR "), a checksum, and a pointer to the Extended Root System Description Table (XSDT)⁴, which in turn points to all other ACPI tables. The OS scans the specified memory regions for the signature, validates the checksum, and, if successful, follows the pointer inside to locate the XSDT.

Figure 2.17: ACPI Architecture (based on ACPI Promoters Corporation, 2013) [diagram: inside the operating system kernel, ACPI drivers and the AML interpreter interact with the ACPI registers of the platform hardware and with the ACPI firmware and ACPI tables of the platform firmware]
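The RSDP search can be sketched as a scan over a memory region on 16 byte boundaries. The sketch assumes the region (e.g., the EBDA or 0xE0000-0xFFFFF) has already been read into a buffer. It uses the RSDP field offsets from the ACPI specification: the original checksum covers the first 20 bytes, and revision 2 and later add a length field at offset 20, the 64-bit XSDT address at offset 24 and an extended checksum over the whole structure. The function name is hypothetical.

```python
import struct
from typing import Optional


def find_rsdp(region: bytes, region_base: int) -> Optional[dict]:
    """Scan `region` (mapped at physical `region_base`) for a valid RSDP."""
    for off in range(0, len(region) - 20 + 1, 16):   # 16 byte aligned candidates
        if region[off:off + 8] != b"RSD PTR ":
            continue
        if sum(region[off:off + 20]) & 0xFF:          # ACPI 1.0 checksum
            continue
        revision = region[off + 15]
        rsdt = struct.unpack_from("<I", region, off + 16)[0]
        xsdt = None
        if revision >= 2 and off + 36 <= len(region):
            length = struct.unpack_from("<I", region, off + 20)[0]
            if (36 <= length and off + length <= len(region)
                    and sum(region[off:off + length]) & 0xFF == 0):
                xsdt = struct.unpack_from("<Q", region, off + 24)[0]
        return {"address": region_base + off, "revision": revision,
                "rsdt": rsdt, "xsdt": xsdt}
    return None
```

The checksum validation rules out accidental matches of the 8 byte signature in arbitrary data, which is why a scan over several kilobytes is still reliable.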

The XSDT is the root directory from which all other tables are discovered. The OS then loads tables such as the Fixed ACPI Description Table (FADT), which in turn leads to the Differentiated System Description Table (DSDT). The DSDT contains code and data in AML format, which is executed in the AML interpreter of the OS upon initialization. There are 15 other ACPI tables, some of which are optional and need not be present in every ACPI implementation. A detailed list of all available tables and their functions can be found in the ACPI specification (ACPI Promoters Corporation, 2013).
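Following the table chain from the XSDT to the DSDT can be sketched as follows. Every ACPI table starts with a 36 byte header containing the signature, the length and a checksum over the whole table. The XSDT body is an array of 64-bit table addresses, and the FADT (signature FACP) holds a 32-bit DSDT pointer at offset 40 and a 64-bit X_DSDT pointer at offset 140. The sketch assumes the tables have already been read from physical memory; the helper names are hypothetical.

```python
import struct

ACPI_HEADER_LEN = 36


def table_header(table: bytes) -> tuple:
    """Return (signature, length) after validating length and checksum."""
    signature = table[0:4]
    length = struct.unpack_from("<I", table, 4)[0]
    if length < ACPI_HEADER_LEN or length > len(table):
        raise ValueError(f"bad length in {signature!r}")
    if sum(table[:length]) & 0xFF:
        raise ValueError(f"bad checksum in {signature!r}")
    return signature, length


def xsdt_entries(xsdt: bytes) -> list:
    """Physical addresses of all tables listed in the XSDT."""
    signature, length = table_header(xsdt)
    if signature != b"XSDT":
        raise ValueError("not an XSDT")
    count = (length - ACPI_HEADER_LEN) // 8
    return list(struct.unpack_from(f"<{count}Q", xsdt, ACPI_HEADER_LEN))


def dsdt_address(fadt: bytes) -> int:
    """DSDT address from the FADT, preferring the 64-bit X_DSDT field."""
    signature, length = table_header(fadt)
    if signature != b"FACP":
        raise ValueError("not a FADT")
    if length >= 148:
        x_dsdt = struct.unpack_from("<Q", fadt, 140)[0]
        if x_dsdt:
            return x_dsdt
    return struct.unpack_from("<I", fadt, 40)[0]
```

A caller would read each address returned by `xsdt_entries`, check its signature, and pass the FACP table to `dsdt_address` to find the AML code executed by the interpreter.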

The OS uses these tables to interact with the hardware without the need for any platform specific knowledge. All hardware details are embedded in the AML code, so the OS only needs to interact with the description blocks to enumerate the hardware and interact with it.

³ On UEFI systems the RSDP is provided to the in the UEFI System Table, so there is no need to scan for it.
⁴ On 32 bit systems the legacy Root System Description Table (RSDT) is used instead of the XSDT.


2.4 Summary

In this chapter, we have introduced the concepts and technologies that drive the memory architecture of modern x86 systems. We have introduced the physical address space and explained how the PCIe protocol ties devices and memory together. Furthermore, we have presented the virtual address space and the mechanisms and data structures responsible for address translation. These concepts form the foundation on which software memory acquisition is built, and are necessary to understand memory enumeration and mapping techniques. In addition, we have given a short overview of the architecture of Linux kernel modules. This is important to understand the relinking techniques we introduce in Chapter 6 to load our acquisition module into arbitrary kernels. We have also laid out the components of the system firmware that run under the hood of typical x86 computers. This forms the basis for Chapter 7, where we focus on the acquisition of firmware code and data in the course of forensic memory acquisition.


Chapter 3

Memory Acquisition

Memory acquisition is the process of obtaining a copy of the physical memory of a system for analysis. It is the first step of a memory forensics investigation, in which insights into a computer system are gained by analyzing its physical memory. Memory forensics is very useful for the discovery of rootkits, which manipulate the OS into hiding their presence. Because a rootkit’s code, data, processes and threads need to exist somewhere in memory in order to run, it is impossible for a rootkit to remove all its traces from physical memory. This problem is referred to as the rootkit paradox, which memory analysis techniques can exploit to detect and analyze rootkits (Kornblum, 2006).

In this chapter, we give an overview of the field of memory acquisition. It serves as background and motivation for the main contributions of this thesis. We first explore the theoretical foundations of the field and then present a study of the technical principles that facilitate memory acquisition in current tools. We focus especially on the correctness of memory images, which we introduce as the main metric to measure the quality of memory acquisition techniques with regard to the rootkit threat.

Outline of the Chapter

This chapter is outlined as follows: First, we give a definition of the memory acquisition process and describe the characteristics of the resulting memory image in more detail in Section 3.1. We introduce criteria that define forensically sound memory acquisition and present an evaluation that examines popular memory acquisition tools with regard to these criteria. In Section 3.2 we examine current software memory acquisition mechanisms and tools. We describe the operating system memory interfaces of Windows, Linux and OS X and provide an overview of the technical details of a selection of third party tools. Finally, we present OSXPmem, an open source memory acquisition tool we developed for the OS X platform. Our insights further the understanding of the OS internals involved in acquiring physical memory, which serves as a foundation for understanding their attack surface.


3.1 Principles of Memory Acquisition

Modern memory analysis frameworks like Rekall (Cohen, 2014b) and Volatility (Walters, 2014) require a copy of physical memory from the system under investigation (Ligh et al., 2014). This copy is referred to as a memory image and its creation as memory acquisition. Following the work of Schatz (2007a) and Vömel and Freiling (2012), we define a memory image as follows:
Definition 1. A memory image is an exact copy of all physical memory ranges of a computer system at a specific point in time.

This implies three important requirements. The copy must be exact, meaning there must not be any errors in the image. It must be complete, which means all physical memory ranges must be copied. And finally, the copy must be created at a specific point in time, which implies the image must be taken at once, not over a long period of time.

3.1.1 Criteria for Sound Memory Acquisition

Memory analysis techniques can only be reliable if the memory image they operate on corresponds exactly to Definition 1. If the image is not an exact copy of memory at the time of acquisition, this can result in incorrect analysis results. In addition to problems resulting from errors in the acquisition program, software memory acquisition is prone to concurrency issues. Because the acquisition process is not an atomic operation and takes time to complete, the acquisition time is a time span rather than a discrete value. Software running in parallel to the acquisition program can write to parts of the system's memory while it is being copied into the image, leading to inconsistencies in the image called memory smear (Richard and Case, 2014). To be able to measure and evaluate the quality of memory acquisition procedures, and thus of the resulting memory image, several authors have proposed criteria for memory acquisition quality (Afek et al., 1993; Schatz, 2007b; Inoue et al., 2011; Vömel and Freiling, 2012). We use the model proposed by Vömel and Freiling (2012) to classify our work, because it combines all relevant aspects into three independent criteria: correctness, atomicity, and integrity. In the following, we give a short overview of these criteria, which we use to assess the methods developed in this thesis.

Correctness Regarding a set of memory regions, a memory image is correct if, for all these regions, the data captured in the memory image matches the contents of the region at the point in time it is duplicated (Vömel and Freiling, 2012). While this seems trivial, there are at least three things that can go wrong during acquisition, resulting in an incorrect image:


1. Errors in the acquisition software can result in memory regions being copied to wrong parts of the image. For example, a popular open source tool had a bug that resulted in memory regions being stored at the wrong offset in a raw image (Suiche, 2009a). As a result, virtual to physical address translation became impossible on the image, because physical pointers were no longer valid.
2. A broken memory enumeration procedure can result in some pages not being written to the memory image at all. Such an image is incorrect, as it does not include the content of those pages. For example, in a comparison of mdd (ManTech CSI, Inc., 2009) and win32dd (Suiche, 2009b), mdd produced an image that was 118 pages smaller than the win32dd image (Inoue et al., 2011). This implies that at least one of the programs did not enumerate memory correctly, resulting in an incorrect image.
3. Malicious software can subvert the acquisition process and manipulate the resulting image. For example, Milkovic (2012) developed a proof of concept capable of hiding malware artefacts from memory images. By manipulating the buffers of the image file as it is being written, the malware causes the acquisition software to write an incorrect image.
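A simple way to avoid the offset problem from the first item is a raw image layout in which the file offset of every byte equals its physical address, leaving the gaps between ranges unwritten. Such gaps read back as zeros, or stay sparse, depending on the file system. The sketch below assumes a sorted list of (start, length) ranges from some memory enumeration step and a caller-supplied `read_physical` callback. Both are hypothetical stand-ins for the OS-specific mechanisms discussed in Section 3.2, not the interface of any particular tool.

```python
from typing import BinaryIO, Callable, Iterable, Tuple


def write_raw_image(out: BinaryIO, ranges: Iterable[Tuple[int, int]],
                    read_physical: Callable[[int, int], bytes],
                    chunk_size: int = 4096) -> None:
    """Write each (start, length) range at file offset == physical address."""
    for start, length in ranges:
        out.seek(start)        # gaps before `start` are left unwritten
        for off in range(0, length, chunk_size):
            size = min(chunk_size, length - off)
            out.write(read_physical(start + off, size))
```

Seeking to the physical address before every range makes the offset bug from the first item structurally impossible, at the cost of image files that are as large as the highest acquired address.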

Atomicity As we will see in Section 3.2, software memory acquisition is not an atomic operation. Some regions of memory can be overwritten by concurrent system activity before they are copied to the image. The atomicity of a memory image is quantified by the amount of memory that is changed by concurrent system activity during the imaging operation. An atomicity violation happens when the contents of a memory region are modified by concurrent system activity before the region is acquired, but the cause of the modification is not present in the image (Vömel and Freiling, 2012). State of the art memory analysis tools are unable to identify atomicity violations in a memory image. They have to assume that they operate on an atomic image in order to function. Under the right conditions, however, this can lead to problems and even incorrect results. Vömel and Freiling (2012) illustrate concurrent activity using space-time diagrams, which visualize dependencies of memory operations over time. Each memory region is represented by a horizontal line, and dots on that line represent an operation (read or write) on that region. A simplified example is given in Figure 3.1. The illustration consists of processes p ∈ P operating on memory regions r ∈ R at times t ∈ T. A process p1 allocates a region of memory r3 at time t0 and stores some data in it at time t1. The page table entry for this allocation is stored in r1. Simultaneously, the memory acquisition software p2 starts and copies the page tables into the image. Before it can also copy the contents of the mapped page, p1 releases this region of memory (at time t2). Shortly after that (at time t3), another process p3 allocates the same memory region, with its page tables stored in r2. It copies incriminating data into r3 at time t4. Finally, p2 copies the data from the page into the image at t5.

Figure 3.1: Space-Time Diagram of an Atomicity Violation (processes p1, p2, p3; memory regions r1 to r3; times t0 to t5)

Memory analysis software operating on the image will attribute the data in r3 to p1, because the page tables show it mapped into this process. The image is correct: the page tables did contain this information at the time they were copied. The analysis is also performed correctly: the software interpreted the page tables in the appropriate way. However, the result of the analysis is wrong, because the image is not atomic. The events leading to the change in ownership of r3 at t2 and t3 are not visible in the image, whereas the incriminating data copied into r3 at t4 is present, because the acquisition process is interleaved with concurrent system activity by p1 and p3. This results in incorrect analysis results in spite of a correct image, caused by a lack of atomicity.

Integrity Similarly to atomicity, the level of integrity of an image also depends on the amount of memory that changes during the acquisition. However, it does not depend on causality. Instead, the integrity of an image is measured relative to a point in time. Intuitively, this is the time at which the acquisition started, but the definition allows for any arbitrary point in time if necessary (Vömel and Freiling, 2012). A simple example of integrity violations is the change the imaging software itself causes during acquisition. The program needs to be copied into memory, libraries have to be loaded, and copy operations fill buffers. All these operations cause memory to be overwritten after the imaging process has started. Thus, the integrity of the image is violated with respect to the starting time of the acquisition process. However, the memory imaging program is not the only thing that can affect the integrity of a memory image. Figure 3.2 illustrates system activity during a memory acquisition procedure.

Figure 3.2: Space-Time Diagram of Integrity Violations (processes p1, p2, p3; memory regions r1 to r3; times t1 and t2)

Processes p1 and p2 run in parallel with the acquisition software p3. The acquisition software is started at time t1. There are no atomicity violations, because the causality of all memory writes is preserved. However, the integrity of the image is violated with respect to t1, because the value contained in r3 at that point in time is not the value that is copied to the image. The process p2 overwrites this value before the imager copies the memory region, causing the integrity violation. Any potential evidence stored in r3 at t1 is lost.

Note that the integrity of the image is intact with respect to a second point in time t2. All values that existed in r1 to r3 at t2 are copied to the image intact. The writes that happen after t2 are irrelevant, because the respective regions have already been captured.
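The integrity criterion can be checked mechanically once every write and every copy operation carries a timestamp, as in the space-time diagrams above. The following C sketch is a simplified model under our own assumptions, not the evaluation platform discussed in Section 3.1.2: time is a logical counter, the value of a region "at time x" includes all writes with a timestamp of at most x, and `struct write_event` and `has_integrity` are hypothetical names. Under this model, an image has integrity with respect to t exactly if no region is written between t and the moment it is copied.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one write to memory region `region` at logical time `time`. */
struct write_event {
    size_t region;
    unsigned time;
};

/* Returns true if the image has integrity with respect to time t, i.e. every
 * region's captured value equals its value at time t. acq_time[r] is the
 * logical time at which the imager copied region r. A write makes the two
 * values differ exactly if it falls into (lo, hi], where lo and hi are the
 * earlier and the later of t and acq_time[r]. */
bool has_integrity(unsigned t, const unsigned *acq_time, size_t n_regions,
                   const struct write_event *writes, size_t n_writes) {
    for (size_t i = 0; i < n_writes; i++) {
        size_t r = writes[i].region;
        if (r >= n_regions)
            continue;                       /* write to an untracked region */
        unsigned lo = t < acq_time[r] ? t : acq_time[r];
        unsigned hi = t < acq_time[r] ? acq_time[r] : t;
        if (writes[i].time > lo && writes[i].time <= hi)
            return false;                   /* value at t differs from image */
    }
    return true;
}
```

With acquisition times {20, 21, 22} for r1 to r3, a write to r3 at time 15 violates integrity with respect to t = 10 but not with respect to t = 25, mirroring the difference between t1 and t2 in Figure 3.2.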

3.1.2 Correctness of Existing Memory Acquisition Tools

In a previous publication (Vömel and Stüttgen, 2013), which is not part of this thesis, we built an evaluation platform for memory acquisition software based on the criteria introduced in Section 3.1.1. The platform consists of a modified version of the Bochs x86 emulator (The Bochs Project, 2013), which has been extended with an instrumentation module. The module is capable of monitoring and logging all memory operations in the emulated environment and can even identify the responsible thread. This information can later be used to track causality and identify potential atomicity violations. We evaluated three memory acquisition tools for Windows: mdd (ManTech CSI, Inc., 2009), WinPMEM (Cohen, 2012), and Win32dd (Suiche, 2009b). We chose these tools because their source code was freely available, which simplified the integration of the platform's hypercall mechanism.


In our tests we immediately discovered errors in mdd (ManTech CSI, Inc., 2009) and Win32dd (Suiche, 2009b) that caused both programs to create an incorrect image. To avoid system instability, both programs skip MMIO regions in the physical address space (see Section 3.2.3). However, when writing subsequent memory regions to the image, they did not pad the gaps with zeroes, but instead wrote each memory region directly after the previous one. This is an interesting bug, because the image contains all memory regions and thus looks correct at first glance. The difference becomes apparent if we go back to Figure 2.5 and look at the physical address space. If we remove all regions mapped to devices and shift the memory regions to be adjacent to each other, the address of every memory region except the first one changes. This invalidates any physical memory reference that points into one of the upper regions. As shown in Section 2.1.2, all virtual address translation data structures rely on physical address references. Because memory analysis software relies on virtual to physical address translation to bridge the semantic gap in a memory image, any analysis technique more advanced than string extraction becomes impossible. Other results of our evaluation indicate that software memory acquisition has severe problems regarding atomicity and integrity. However, for the purpose of this thesis we concentrate on correctness, as this is the critical criterion with regard to rootkit manipulation. While rootkits cannot remove themselves from memory, they can subvert the memory acquisition process to remove themselves from the memory image. Memory analysis techniques aimed at rootkit discovery and analysis thus rely on an absolutely correct image to function.
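The effect of the missing padding can be quantified with a short C sketch. It compares the file offset of a physical address in a zero-padded raw image with its offset in an image whose memory runs were written back to back. The memory map used in the example (RAM from 0x0 to 0x9f000 and from 0x100000 upward) is an illustrative assumption, and `raw_offset` and `concatenated_offset` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* One run of physical memory [start, start + length). */
struct phys_run {
    uint64_t start;
    uint64_t length;
};

/* Zero-padded raw image: the file offset equals the physical address. */
uint64_t raw_offset(uint64_t paddr) {
    return paddr;
}

/* Hypothetical model of the buggy layout: runs stored back to back, so the
 * offset is the sum of all preceding run lengths plus the offset within the
 * run. `runs` must be sorted by start address. Returns UINT64_MAX if paddr
 * lies in a gap (e.g. an MMIO region). */
uint64_t concatenated_offset(const struct phys_run *runs, size_t n,
                             uint64_t paddr) {
    uint64_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        if (paddr >= runs[i].start && paddr - runs[i].start < runs[i].length)
            return offset + (paddr - runs[i].start);
        offset += runs[i].length;
    }
    return UINT64_MAX;
}
```

For a physical address of 0x200000, the padded image yields offset 0x200000, while the concatenated image yields 0x19f000. Every pointer into the second run is therefore off by the size of the preceding gap (0x61000 bytes), whereas addresses in the first run are unaffected.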

3.1.3 Memory Image Formats

Memory images can be stored in a variety of formats, the most notable differences being sparseness and the inclusion of metadata. The simplest format for a memory image is a raw image. It is a binary file that represents an exact copy of the physical address space of the system it was acquired from. Inaccessible regions are zero padded, and the resulting file has the same size as the physical address space. This is done to preserve the physical address of data in the image: each memory region is located at the same offset in the image file as in the physical address space. Physical memory references can thus be resolved by interpreting them as file offsets. Note that because of the padded MMIO regions, this file can be much larger than the amount of memory installed in the system. Because of its simplicity, the format is supported by most major memory analysis frameworks. More sophisticated formats adopt a sparse approach. By providing metadata on the physical address of memory regions and their respective file offsets in a header of the file, they do not have to carry padding for inaccessible memory regions. Because of this sparseness, they are much smaller than a raw image and roughly amount to the size of physical memory. Examples include the LiME format (Sylve, 2012), the Executable and Linkable Format (ELF) and Mach-O core files, and the Microsoft crashdump format. ELF and Mach-O core files are binary file formats for storing dumps of virtual memory and debugging information, which makes them well suited to also store physical memory images. However, at the time of writing there is no general method of storing metadata together with the memory image, and some tools invent their own formats. For example, WinPMEM (Cohen, 2012) is able to store additional metadata, like the page file or the address of the kernel's page tables, in a YAML Ain't Markup Language (YAML) footer at the end of any container file. However, no standards exist in this regard, and the YAML data generated by WinPMEM is currently usable with the Rekall (Cohen, 2014b) memory forensic framework only.
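A sparse image replaces padding with a table that maps physical ranges to file offsets. The C sketch below resolves a physical address through such a table. The table layout (`struct sparse_run` with physical start, file offset and length) and the function `sparse_lookup` are hypothetical simplifications for illustration; they do not reproduce the exact on-disk structure of LiME, ELF or crashdump files.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header entry of a sparse image: the bytes of the physical
 * range [phys_start, phys_start + length) are stored at file_offset. */
struct sparse_run {
    uint64_t phys_start;
    uint64_t file_offset;
    uint64_t length;
};

/* Translates a physical address into a file offset using the run table.
 * Returns -1 if the address lies in a region that was not captured,
 * i.e. a region a raw image would contain as zero padding. */
int64_t sparse_lookup(const struct sparse_run *runs, size_t n, uint64_t paddr) {
    for (size_t i = 0; i < n; i++) {
        if (paddr >= runs[i].phys_start &&
            paddr - runs[i].phys_start < runs[i].length)
            return (int64_t)(runs[i].file_offset + (paddr - runs[i].phys_start));
    }
    return -1;
}
```

In contrast to a raw image, an analysis tool cannot treat a physical pointer as a file offset directly; every dereference goes through this table, which is the price paid for the smaller file.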

3.2 Software Memory Acquisition Techniques

To identify the different points in the memory acquisition process where software can be intercepted by malware, we study the inner workings of open and closed source memory acquisition software on the Windows, Linux and Mac OS X platforms. To discover the most common ways of accessing memory on these platforms, we studied the implementation of all open source memory acquisition tools that were available at the time of writing and capable of acquiring memory from a 64 bit version of the respective OS:

• WinPMEM (Cohen, 2012)
• Win32dd (Suiche, 2009b)
• mdd (ManTech CSI, Inc., 2009)
• fmem (Kollár, 2010)
• pmem (Cohen, 2011)
• LiME (Sylve, 2012)

To get a complete picture, we also reverse engineered the most popular closed source memory acquisition applications that were freely available:

• FTK Imager (AccessData, 2012)
• WindowsMemoryReader (ATC-NY, 2012b)
• Memoryze (Mandiant, 2011)
• MacMemoryze (Mandiant, 2012)
• DumpIt (Suiche, 2011)
• MacMemoryReader (ATC-NY, 2012a)

3.2.1 Memory Acquisition Challenges

To create a correct image of the entire physical memory of a system, software must solve two main challenges. As illustrated in Section 2.1.2, all software started by the user runs in an isolated virtual address space with no control over physical memory. A memory acquisition program running in user space can only access its own memory, and can thus never create a complete memory image as defined in Section 3.1. This limitation can only be bypassed by code running in system mode, which means the acquisition program needs help from the OS or a driver to map all physical memory into its virtual address space. But even with direct access to the physical address space, software must enumerate the address space layout and determine the location of the physical memory regions. As shown in Section 2.1.1, this layout varies between machines, and memory regions are interleaved with MMIO regions mapped to device memory. Reading from an MMIO region can cause an interrupt on the device, leading to data corruption and system crashes. This leads us to two main challenges software must solve to successfully acquire memory:

1. Software must reliably map all physical memory regions into its virtual address space.
2. Software must enumerate the entire physical address space and identify all physical memory and MMIO regions.

These challenges equally apply to all x86 systems running in protected or long mode, regardless of the OS.
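The two challenges can be separated cleanly in code. The following C sketch assumes that enumeration (challenge 2) has already produced a sorted, page-aligned list of RAM runs, and abstracts the mapping step (challenge 1) behind a callback. The callback type `read_page_fn`, the function `acquire_raw`, the helper `flat_read_page` (which simply treats a buffer as "physical memory" so the loop can be exercised without a driver), and the 4 KiB page size are illustrative assumptions, not the interface of any tool discussed in this chapter.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PHYS_PAGE_SIZE 4096u   /* assumed page size */

/* One run of RAM [start, start + length), page aligned. */
struct phys_run {
    uint64_t start;
    uint64_t length;
};

/* Challenge 1, abstracted (hypothetical interface): copy the page of
 * physical memory at paddr into dst. Returns 0 on success. */
typedef int (*read_page_fn)(void *ctx, uint64_t paddr, unsigned char *dst);

/* Writes a zero-padded raw image of image_size bytes. Challenge 2 is assumed
 * solved by the caller: `runs` lists only RAM, sorted and page aligned.
 * Gaps (MMIO) stay zero, so every byte lands at an offset equal to its
 * physical address. Returns the number of pages that could not be copied. */
size_t acquire_raw(const struct phys_run *runs, size_t n_runs,
                   read_page_fn read_page, void *ctx,
                   unsigned char *image, size_t image_size) {
    size_t failed = 0;
    memset(image, 0, image_size);
    for (size_t i = 0; i < n_runs; i++) {
        uint64_t end = runs[i].start + runs[i].length;
        for (uint64_t p = runs[i].start; p < end; p += PHYS_PAGE_SIZE) {
            if (p > image_size || image_size - p < PHYS_PAGE_SIZE) {
                failed++;                           /* page does not fit */
                continue;
            }
            if (read_page(ctx, p, image + p) != 0) {
                memset(image + p, 0, PHYS_PAGE_SIZE);  /* keep it zeroed */
                failed++;
            }
        }
    }
    return failed;
}

/* Hypothetical test helper: a flat buffer standing in for physical memory. */
struct flat_mem {
    const unsigned char *data;
    size_t size;
};

int flat_read_page(void *ctx, uint64_t paddr, unsigned char *dst) {
    const struct flat_mem *m = ctx;
    if (paddr > m->size || m->size - paddr < PHYS_PAGE_SIZE)
        return -1;
    memcpy(dst, m->data + paddr, PHYS_PAGE_SIZE);
    return 0;
}
```

In a real acquisition tool, `read_page` and the run list would be supplied by the operating system interfaces or driver mechanisms described in the following sections, which is why both steps are potential points of interception.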

3.2.2 Operating System Memory Interfaces

Most major operating systems have dedicated interfaces to physical memory intended for debugging or legacy software. Some of these interfaces can also be used to create a memory image for forensic purposes. In the following, we give an overview of the different interfaces available on Windows, Linux and OS X.

Microsoft Windows The Microsoft Windows family of operating systems provides the section object \\.\Device\PhysicalMemory to software that needs to access physical memory. This object presents a section object interface to the physical address space. Since Windows Server 2003 Service Pack 1, this object can only be accessed from kernel space, so a driver is needed to use it (Microsoft Corporation, 2013). Memory acquisition software can simply map regions of physical memory into its own address space from this section object¹ and write them to disk or the network.

In addition, Windows has the ability to write memory dumps on system crashes. The Windows kernel adopts a fail fast policy, meaning it reacts to inconsistencies by shutting down with what is commonly known as a blue screen (Russinovich et al., 2009). When the kernel detects a problem, it calls the KeBugCheckEx function. This function first disables all interrupts on all CPUs, writes a memory dump to the region on disk occupied by the page file, and then halts the system while displaying an error message on a blue background. Because system activity is halted and the memory image is written by the OS itself, the atomicity of the image is better than with other software approaches (Vömel and Freiling, 2011). However, the memory image is incomplete by default and only contains kernel memory (Russinovich et al., 2009). While it is possible to configure the system to include all physical memory in the image, this requires reconfiguration and a reboot. Software can register hooks with the crashdump function that get called before the memory image is written (Russinovich et al., 2009), which makes it trivial for malware to hide from this mechanism. Also note that the crashdump function effectively brings down the system, which can result in loss of data and forces a reboot. Because of its limitations and the preparation required to obtain a complete memory dump on demand, this method is not well suited for most forensic scenarios.

Finally, it is possible to exploit the hibernation mode of Windows to acquire memory. When Windows goes into the suspend-to-disk state, it saves the state of the processor and memory to disk in the so-called hibernation file. While there has been work to analyze and utilize these files in the course of memory forensic investigations, they are not guaranteed to be complete, and the format varies between different Windows versions (Ruff and Suiche, 2007).

¹ Section objects can be mapped using the ZwMapViewOfSection() Application Programming Interface (API).

Linux Linux has a special character device that is usually available at /dev/mem. Similarly to the Windows physical memory section object, it provides a file-like view of the physical address space. It is used by legacy software; e.g., the X server uses it on systems where the graphics driver does not offer direct access to the video card's framebuffer and configuration registers (Lineberry, 2009). Because it can be abused to escalate privileges and install rootkits, kernel 2.6.26 introduced a new configuration option CONFIG_STRICT_DEVMEM (van de Ven, 2008). This option restricts /dev/mem access to the first megabyte of physical memory, which makes it unsuitable for memory acquisition (Lineberry, 2009).

Linux kernels that have the CONFIG_PROC_KCORE configuration option enabled provide the /proc/kcore file for kernel debugging. This file exports the kernel's virtual address space as an ELF core file. Because the Linux kernel maps all physical memory into its virtual address space on x86-64 systems (Kleen, 2004), it is possible to extract a physical memory image from the kcore file (Ligh et al., 2014). A proof of concept implementation exists in the Volatility project (Walters, 2014), and we have also implemented this method in Linux Memory Acquisition Parasite (LMAP), our Linux memory acquisition platform introduced in Chapter 6. To acquire a physical memory image from kcore, LMAP parses the ELF header of the /proc/kcore file. Listing 3.1 illustrates a simplified version of the algorithm. LMAP iterates over each ELF program header and checks the virtual address of the corresponding segment. Segments with a virtual address between 0xffff880000000000 and 0xffffc80000000000 belong to the direct kernel physical memory mapping and need to be copied. The p_offset field shows the location of this segment in the kcore file, while the p_filesz field describes the size of the segment. By iterating over all segments in the file, LMAP enumerates the physical address space and adds segments containing physical memory to its memory map. These segments are then copied into the memory image by reading from the stored file offsets in /proc/kcore.

for (size_t i = 0; i < ehdr.e_phnum; i++) {
    ...
    // Only add segment if inside the kernel's direct physical memory mapping
    // (from 0xffff880000000000 up to, but excluding, 0xffffc80000000000)
    if (phdr.p_vaddr >= 0xffff880000000000 &&
        phdr.p_vaddr < 0xffffc80000000000) {
        memory_map_append(
            mm, phdr.p_vaddr, phdr.p_filesz, phdr.p_offset
        );
    }
}

Listing 3.1: Identifying Physical Memory Regions in /proc/kcore
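Listing 3.1 omits how the headers are read. The following self-contained C sketch fills in that step for a copy of the /proc/kcore headers held in memory, using the Elf64_Ehdr and Elf64_Phdr definitions from <elf.h>. It is not LMAP's code: `kcore_collect_runs` and `struct mem_run` are hypothetical names, the sketch only considers PT_LOAD segments, and it assumes the non-randomized x86-64 direct-mapping bounds given above, so that a segment's physical address is its virtual address minus 0xffff880000000000.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Direct-mapping bounds from Listing 3.1 (assumed non-randomized layout). */
#define DIRECT_MAP_START 0xffff880000000000ULL
#define DIRECT_MAP_END   0xffffc80000000000ULL

/* Hypothetical structure: one physical memory run found in kcore. */
struct mem_run {
    uint64_t phys_addr;    /* physical start address */
    uint64_t length;       /* bytes in the segment */
    uint64_t file_offset;  /* where the bytes live in /proc/kcore */
};

/* Parses the ELF64 header and program header table in buf and stores every
 * PT_LOAD segment inside the direct mapping in out. Returns the number of
 * runs found, or -1 if buf does not hold a well-formed ELF64 header table. */
int kcore_collect_runs(const unsigned char *buf, size_t len,
                       struct mem_run *out, size_t max_runs) {
    Elf64_Ehdr ehdr;
    if (len < sizeof ehdr)
        return -1;
    memcpy(&ehdr, buf, sizeof ehdr);
    if (memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr.e_ident[EI_CLASS] != ELFCLASS64)
        return -1;

    size_t n = 0;
    for (size_t i = 0; i < ehdr.e_phnum && n < max_runs; i++) {
        size_t off = (size_t)ehdr.e_phoff + i * (size_t)ehdr.e_phentsize;
        Elf64_Phdr phdr;
        if (off > len || len - off < sizeof phdr)
            return -1;                 /* program header table truncated */
        memcpy(&phdr, buf + off, sizeof phdr);
        if (phdr.p_type != PT_LOAD)
            continue;                  /* e.g. the PT_NOTE segment */
        if (phdr.p_vaddr < DIRECT_MAP_START || phdr.p_vaddr >= DIRECT_MAP_END)
            continue;                  /* other kernel regions */
        out[n].phys_addr = phdr.p_vaddr - DIRECT_MAP_START;
        out[n].length = phdr.p_filesz;
        out[n].file_offset = phdr.p_offset;
        n++;
    }
    return (int)n;
}
```

On a live system, one would read only the ELF header and the program header table from /proc/kcore (which requires root privileges) rather than the whole file, and kernels with a randomized or relocated direct mapping need different bounds.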

Mac OS X Early versions of OS X provided the /dev/mem and /dev/kmem device files, similar to Linux systems. However, these devices were disabled by Apple with the move to the x86 architecture (Singh, 2006). They can be re-enabled by setting the kmem boot argument, which requires a reboot (Halvorsen and Clarke, 2011). Because of this, they are only useful in cases where the system can be prepared for memory acquisition before the incident. While it should also be possible to exploit the hibernation file on OS X to acquire memory, we are not aware of publicly available solutions to this problem. However, there are indications that this is being investigated (Ruff and Suiche, 2007).

3.2.3 Driver-Based Memory Acquisition

Some operating systems, like Mac OS X or Linux since kernel 2.6.26, do not offer direct physical memory access. On these systems, custom drivers are needed to access the entire physical address space and create a memory image. Also, OS physical memory devices like \\.\Device\PhysicalMemory are obvious targets for malicious software (Bilby, 2006). This warrants the use of more robust and stealthy methods to access physical memory.


Microsoft Windows The standard API for memory acquisition on Windows is the \\.\Device\PhysicalMemory section object. Because this interface is built into the OS for the purpose of physical memory access, it is considered to be the most stable approach.

However, memory acquisition drivers increasingly use alternative, partly undocumented APIs for mapping physical memory to evade interception by malicious software. These include the MmMapIoSpace function, originally intended for drivers to map the MMIO regions of devices, as well as the MmMapMemoryDumpMdl function used by the kernel's own crashdump facilities.

Memory enumeration is commonly achieved by calling the MmGetPhysicalMemoryRanges function, which returns the contents of MmPhysicalMemoryBlock. This data structure contains an array that stores the physical address and size of all available physical memory ranges in the system (Cohen, 2014a).
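The range array returned by MmGetPhysicalMemoryRanges can be walked with a simple loop. Because driver code cannot run outside the kernel, the C sketch below uses a portable stand-in for the PHYSICAL_MEMORY_RANGE structure and assumes the convention that the array ends with an entry whose base address and size are both zero. `struct physical_memory_range` and `count_ranges` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable stand-in for PHYSICAL_MEMORY_RANGE; in the real
 * structure both members are 64-bit LARGE_INTEGER-style values. */
struct physical_memory_range {
    int64_t base_address;
    int64_t number_of_bytes;
};

/* Walks a range array terminated by an all-zero entry (assumed convention)
 * and returns the number of ranges; the total size is stored in total_bytes. */
size_t count_ranges(const struct physical_memory_range *ranges,
                    uint64_t *total_bytes) {
    size_t n = 0;
    *total_bytes = 0;
    while (ranges[n].base_address != 0 || ranges[n].number_of_bytes != 0) {
        *total_bytes += (uint64_t)ranges[n].number_of_bytes;
        n++;
    }
    return n;
}
```

Note that the loop has no independent source of truth: whatever the kernel reports as the range list is exactly what the acquisition tool will copy.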

Linux The restriction of /dev/mem through the CONFIG_STRICT_DEVMEM option has forced developers to pursue an alternate route to physical memory access. The Red Hat crash utility, for example, is a debugging tool for investigating system crashes by analyzing kernel memory. It requires access to the entire physical address space and relied on /dev/mem before kernel 2.6.26. To work around CONFIG_STRICT_DEVMEM, the “crash” kernel module was developed to provide functionality similar to the unrestricted /dev/mem device (Anderson, 2008). This module can be used to access physical memory from userspace, which can then be copied into an image using a file copying tool like “dd”.

Based on this idea, multiple /dev/mem-like modules for forensic memory acquisition have been developed (Kollár, 2010; Cohen, 2011). However, with the exception of the pmem module, these tools lack any safeguards for the address space regions they read, so userspace tools reading from their device node must avoid MMIO regions themselves to ensure system stability (see Section 3.2). Software can retrieve information on the physical address space layout from /proc/iomem, which exports the memory resource tree to userspace. Regions marked as “System RAM” are guaranteed to be backed by physical memory and safe to read.
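The /proc/iomem text format lists one resource per line as a hexadecimal start-end range followed by a name, with nested resources indented. The C sketch below collects the top-level “System RAM” entries; the sample input in the test is illustrative, and `parse_iomem` and `struct ram_run` are hypothetical names. As in /proc/iomem itself, the end address is inclusive.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical structure: one "System RAM" range with an inclusive end. */
struct ram_run {
    uint64_t start;
    uint64_t end;
};

/* Scans /proc/iomem-formatted text and stores top-level "System RAM"
 * ranges in out. Indented (nested) entries such as "Kernel code" are
 * skipped. Returns the number of ranges found. */
size_t parse_iomem(const char *text, struct ram_run *out, size_t max) {
    size_t n = 0;
    while (*text != '\0' && n < max) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);
        char line[256];
        if (len >= sizeof line)
            len = sizeof line - 1;     /* truncate overly long lines */
        memcpy(line, text, len);
        line[len] = '\0';

        uint64_t start, end;
        int name_pos = 0;              /* set by %n once " : " has matched */
        if (line[0] != ' ' &&
            sscanf(line, "%" SCNx64 "-%" SCNx64 " : %n",
                   &start, &end, &name_pos) == 2 &&
            name_pos > 0 && strcmp(line + name_pos, "System RAM") == 0) {
            out[n].start = start;
            out[n].end = end;
            n++;
        }
        if (eol == NULL)
            break;
        text = eol + 1;
    }
    return n;
}
```

Because this runs in user space and reads a file provided by the kernel, it inherits all the interception risks discussed in the following paragraph.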

There are several problems with this approach. First of all, memory reading is performed with a block-wise file copying tool such as dd. This uses a lot of memory and is rather slow, because each page is first copied to userspace, and then copied back into the kernel for writing to disk or sending over the network. This also causes a lot of memory to be overwritten, which violates forensic principles and can destroy evidence resident in memory regions that have already been freed (Sylve, 2012). Furthermore, it is very easy for malicious software, even from user space, to intercept the operation and filter or modify the memory image.


Recent research on Linux memory acquisition has focused on moving most of the operation to kernel space to minimize these problems. For example, the LiME and pmem tools obtain information on the address space layout directly from kernel mode by parsing the iomem_resource tree (Sylve, 2012; Cohen, 2011). Furthermore, LiME avoids the buffer copying issues of user mode imagers by writing the image directly from kernel mode (Sylve, 2012).
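In kernel mode, the same information is available as a tree of resources rooted at iomem_resource, in which each node links to its first child and to its next sibling. The C sketch below walks a simplified user space mirror of that tree. `struct res_node` only models the fields needed here and is not the kernel's struct resource, and `collect_system_ram` is a hypothetical name rather than code from LiME or pmem.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified mirror of a kernel resource node. */
struct res_node {
    uint64_t start;
    uint64_t end;               /* inclusive */
    const char *name;
    struct res_node *sibling;   /* next node on the same level */
    struct res_node *child;     /* first nested node */
};

/* Hypothetical output record for one RAM range. */
struct range {
    uint64_t start;
    uint64_t end;
};

/* Visits the top-level children of root and stores every "System RAM"
 * range in out. Nested nodes (e.g. "Kernel code") are not visited.
 * Returns the number of ranges stored. */
size_t collect_system_ram(const struct res_node *root,
                          struct range *out, size_t max) {
    size_t n = 0;
    for (const struct res_node *p = root->child; p != NULL && n < max;
         p = p->sibling) {
        if (p->name != NULL && strcmp(p->name, "System RAM") == 0) {
            out[n].start = p->start;
            out[n].end = p->end;
            n++;
        }
    }
    return n;
}
```

Walking the tree directly avoids the user space round trip through /proc/iomem, but the tree is still a kernel data structure whose contents the acquisition tool has to trust.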

Mac OS X The Mac OS X kernel consists of multiple components that each run in system mode and thus have access to physical memory. The hardware-specific details are managed by the platform expert, which is a kernel object that interfaces other components with the system's buses (Singh, 2006). Memory and task management is ultimately performed by the Mach component of the OS X kernel, which is based on the Mach microkernel (Accetta et al., 1986). However, most other kernel functionality is implemented by a task running in system mode which is based on BSD (Singh, 2006). Finally, the IOKit provides a C++-based environment for drivers (Levin, 2012). While it is possible to load generic kernel extensions, the preferred method of loading a driver is through the IOKit.

Memory enumeration on OS X can be achieved by obtaining the firmware memory map. When the system boots, the Extensible Firmware Interface (EFI) passes information on the layout of the physical address space to the platform expert via the so-called boot arguments. Memory acquisition software can utilize this information to enumerate physical memory and avoid accessing MMIO regions. Kernel extensions can access the memory map directly through the MemoryMap member of the bootArgs symbol in the platform expert's state. The IOKit communicates with the platform expert through driver connection points called nubs (Apple Inc., 2013b, I/O Kit Architecture). It receives a copy of the boot arguments through the root nub during initialization, which it stores in the IOPlatformArgs (Singh, 2006). IOKit drivers can find the root nub using the IOService::getServiceRoot function and then get the IOPlatformArgs from there.

There are multiple APIs available for physical memory access. Software can call directly into Mach memory management functions or use the IOKit as a wrapper. However, Mach memory management symbols are not exported, so the officially sanctioned way of mapping memory is through the IOKit.
The two most commonly used IOKit interfaces for mapping memory are the IOMemoryDescriptor and IOService APIs (Apple Inc., 2009, 2013a). We go into more detail in our description of the OSXPmem kernel extension in Section 3.2.3. The first freely available tool for Mac OS X memory acquisition was MacMemoryReader (Inoue et al., 2011). It uses the DTrace framework to obtain the memory map from user mode by reading PE_state.bootArgs from the platform expert. It then loads a generic kernel extension that creates a /dev/mem character device, which maps memory using an IOMemoryDescriptor. Unfortunately, MacMemoryReader is not open source, and the project seems to have been abandoned. At the time of writing, the project website was unreachable (ATC-NY, 2012a). A free alternative to MacMemoryReader is MacMemoryze (Mandiant, 2012). It utilizes an IOKit driver to enumerate memory by getting the memory map from the IOKit root nub. It then services a /dev/mem character device using the IOService API.

page_desc = IOMemoryDescriptor::withPhysicalAddress(
    page, PAGE_SIZE, kIODirectionIn
);
...
page_map = page_desc->createMappingInTask(
    kernel_task, 0, kIODirectionIn, 0, 0
);
...
*vaddr = (void *)(page_map->getAddress());
...
uiomove64((uint64_t)vaddr_page, (uint32_t)chunk_len, uio);

Listing 3.2: Memory Mapping in OSXPmem

OSXPmem At the time of writing, only one free memory acquisition tool for OS X existed, and it was closed source (ATC-NY, 2012a). Its development has since ceased, and the old versions are not capable of acquiring memory on recent versions of OS X such as 10.9 and 10.10. To fill this gap, we developed OSXPmem, which is able to acquire memory from all recent OS X versions from 10.6 to 10.10. It consists of a user space tool that creates the memory image, as well as a generic kernel extension facilitating physical memory access and enumeration. Interaction with the kernel extension is accomplished through a character device at /dev/pmem. OSXPmem maps memory using the IOKit IOMemoryDescriptor class. We provide a simplified version of the relevant code in Listing 3.2. By creating an IOMemoryDescriptor with a physical address and then calling the function createMappingInTask, the IOKit maps the requested page into the virtual address space of the kernel's Mach task. From there, we can copy it to user space on reads from the device file using the uiomove64 function. For flexibility reasons, we do not restrict the read offsets for the device file, allowing access to the entire physical address space. To ensure system stability, the user space component must enumerate the physical address space to ensure it does not read from MMIO regions. This is accomplished by parsing the EFI memory map. Because the memory map is not available from user space, the kernel extension provides an interface for user space programs to obtain it. The relevant parts of the program are illustrated in Listing 3.3. The kernel extension first obtains a pointer to the boot arguments from the platform expert. It then stores references to the EFI memory map and its size, and uses the copyout function to copy it to userspace upon request.

boot_args *ba = (boot_args *)PE_state.bootArgs;
mmap = (EfiMemoryRange *)ba->MemoryMap;
mmap_size = ba->MemoryMapSize;
...
copyout(mmap, *((uint64_t *)buffer), mmap_size);

Listing 3.3: Accessing the Memory Map in OSXPmem

To create a memory image, the user space component of OSXPmem first loads the kernel extension and obtains the memory map. It then iterates over the memory map and reads all valid memory regions from the /dev/pmem device file. The result is written either as a raw binary file, an ELF core file, or a Mach-O core dump for analysis.
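The iteration step performed by the user space component can be sketched in C. The descriptor layout below mirrors the UEFI memory descriptor (type, padding, physical start, virtual start, number of 4 KiB pages, attributes). Iteration uses the descriptor size as stride, because firmware may use descriptors larger than this structure; we assume the caller knows that size (for instance from the descriptor size recorded in the boot arguments). Which memory types count as “valid” is an assumption of this sketch: it treats loader, boot services, runtime services, conventional and ACPI memory as RAM and skips everything else, including the two MMIO types. The names are hypothetical, and this is not OSXPmem's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EFI_PAGE_SIZE 4096u

/* Hypothetical mirror of a UEFI memory descriptor (with explicit padding). */
struct efi_memory_range {
    uint32_t type;
    uint32_t pad;
    uint64_t physical_start;
    uint64_t virtual_start;
    uint64_t number_of_pages;
    uint64_t attribute;
};

/* One run of physical memory [start, start + length). */
struct phys_run {
    uint64_t start;
    uint64_t length;
};

/* UEFI memory types treated as RAM in this sketch (an assumption). */
static bool efi_type_is_ram(uint32_t type) {
    switch (type) {
    case 1: case 2:   /* EfiLoaderCode, EfiLoaderData */
    case 3: case 4:   /* EfiBootServicesCode, EfiBootServicesData */
    case 5: case 6:   /* EfiRuntimeServicesCode, EfiRuntimeServicesData */
    case 7:           /* EfiConventionalMemory */
    case 9: case 10:  /* EfiACPIReclaimMemory, EfiACPIMemoryNVS */
        return true;
    default:          /* e.g. 0 reserved, 8 unusable, 11/12 MMIO */
        return false;
    }
}

/* Iterates a raw memory map buffer with desc_size as stride and stores all
 * RAM descriptors in out. Returns the number of runs stored. */
size_t efi_collect_runs(const unsigned char *map, size_t map_size,
                        size_t desc_size, struct phys_run *out, size_t max) {
    size_t n = 0;
    if (desc_size < sizeof(struct efi_memory_range))
        return 0;                       /* stride too small to be valid */
    for (size_t off = 0; off + desc_size <= map_size && n < max;
         off += desc_size) {
        struct efi_memory_range d;
        memcpy(&d, map + off, sizeof d);
        if (!efi_type_is_ram(d.type))
            continue;
        out[n].start = d.physical_start;
        out[n].length = d.number_of_pages * EFI_PAGE_SIZE;
        n++;
    }
    return n;
}
```

Each resulting run can then be read from the device file at an offset equal to its start address, which keeps all reads away from regions the firmware marked as MMIO.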

3.3 Summary

In this chapter, we have given an overview of the theoretical and practical details of software memory acquisition. We have introduced criteria for sound memory acquisition and given examples, as well as a short overview of an evaluation of these criteria. Furthermore, we have pointed out the importance of the correctness of memory images for the purpose of rootkit detection. Because rootkits cannot remove themselves from memory entirely, they must subvert this property to remain invisible. Finally, we have presented an overview of the inner workings of most freely available memory acquisition software. We have defined the tasks of memory access and memory enumeration as the two critical steps to obtain a correct memory image. By studying open source tools and reverse engineering other freely available memory forensic software, we have compiled an overview of how software can solve these tasks on the Windows, Linux and OS X platforms. Note that our results show that all publicly available memory acquisition software we analyzed relies completely on operating system interfaces to enumerate and access memory. This can be abused by malware to subvert the acquisition process, resulting in an incorrect image that has been cleaned of malware traces. Investigators have no way of knowing whether the created image is correct, and may therefore draw false conclusions and remain oblivious to the malware present on the system. The remainder of this thesis is dedicated to identifying such deceptive techniques and to developing methods that create correct memory images in spite of sophisticated malware with anti-forensic capabilities.

Chapter 4

Anti-Memory Forensics

Some of the most widely used kernel rootkit techniques are the interception of system call APIs (hooking) and Direct Kernel Object Manipulation (DKOM). By hooking kernel APIs directly, rootkits can filter the view of the system that is presented to detection and analysis software (Hoglund and Butler, 2005). DKOM attacks directly manipulate kernel data structures to hide processes, threads, network connections and other malware traces (Butler, 2004). Because the OS itself has been subverted in this scenario, it can no longer be trusted to deliver accurate information. Memory forensics provides a more reliable view of the system state, and is thus increasingly used to detect and analyze malicious software. Anti-memory-forensic techniques attack either the acquisition or the analysis phase of memory forensic investigations. Analysis software like the Volatility framework (Walters, 2014), which relies mostly on memory scanning, can be subverted by destroying or manipulating certain kernel data structures needed for its operation. For example, it was possible to prevent analysis by Volatility by overwriting the KdDebuggerDataBlock.OwnerTag string (Haruyama and Suzuki, 2012). Because this data structure is only used for kernel debugging, destroying it does not impact regular system operation. Other work proposes flooding the address space with thousands of fabricated data structures intended to distract and overwhelm investigators with false positives (Williams and Torres, 2014). While attacks on the analysis phase can be effective, they have the drawback that they can be overcome with sufficient effort. As investigators improve their methods to deal with these techniques, they can re-analyze previously acquired memory images and uncover evidence they missed in past analyses. Successful attacks on the acquisition phase, in contrast, are permanent because of the volatile nature of RAM.
If malware succeeds in hiding its traces from a memory image, it is very unlikely that investigators will be able to come back with improved tools and acquire another image. By the time they do, concurrent activity on the system and/or system reboots will have erased most of the data of interest. The problem with current memory acquisition software is that it relies on OS APIs to enumerate and access physical memory. Malware can use the same techniques that are already employed to subvert system calls to filter the view software has on physical memory. For example, the DDFY rootkit intercepts access to the \\.\Device\PhysicalMemory object on Windows systems by using a filter driver (Bilby, 2006). Dementia expands on this principle by filtering arbitrary traces from the file system buffers of the memory image while it is being written to disk (Milkovic, 2012). The Shadow Walker rootkit uses a hook in the page fault handler of the OS to desynchronize the data and instruction TLBs (Sparks and Butler, 2005). This approach can hide control flow modifications, because the CPU receives different data on instruction fetches than a memory acquisition tool acquires by reading from the same region of memory. In this chapter, we present an overview of the current state of the art in anti-forensics against software memory acquisition, and analyze different techniques with regard to the two main tasks we identified in the previous chapter: memory mapping and memory enumeration. Based on our analysis of memory acquisition software internals in Section 3.2.3, we develop a generic DKOM attack on memory enumeration. We develop proof-of-concept implementations for Windows, Linux and OS X that are able to subvert all publicly available memory acquisition software. We then identify a method of hiding arbitrary code and data from memory acquisition software by utilizing hidden regions of memory that are unknown to the OS.

Outline of the Chapter

This chapter is organized as follows: In Section 4.1, we classify different anti-memory-forensic techniques with regard to the part of the acquisition process they target. We show how an attacker can hide code and data from memory images by intercepting memory mapping and memory enumeration APIs. In Section 4.2, we propose practical anti-memory-forensic techniques that attack the memory mapping and enumeration APIs on Windows, Linux and OSX. We evaluate each of the developed methods on a broad selection of publicly available memory acquisition software. Finally, in Section 4.3, we introduce a novel passive technique for hiding code and data from memory acquisition software.

4.1 Anti-Forensic Techniques

As we have illustrated in Section 3.2.1, all software memory acquisition tools must solve two primary challenges: memory mapping and memory enumeration. These present key points in the memory acquisition process that anti-forensic malware can attack to hide its traces.

Attacks on Memory Mapping Memory acquisition software must map all physical memory into its virtual address space to be able to access it. As we have shown in Section 3.2.2, this is accomplished with the help of the OS. By intercepting the memory mapping APIs in the OS, rootkits can selectively replace regions of memory that contain traces of their existence with a benign copy of this region. This allows them to transparently remove evidence from the memory image.
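To make the mapping attack concrete, the following Python sketch models it in user space. It is a toy model under stated assumptions, not OS or rootkit code: physical memory is a bytearray, `map_page` is a hypothetical stand-in for an OS mapping API, and the hook returns a benign page whenever a hidden page frame number is requested.

```python
PAGE_SIZE = 4096


class ToyOS:
    """Toy stand-in for an OS: physical memory plus a mapping API."""

    def __init__(self, num_pages):
        self.phys = bytearray(num_pages * PAGE_SIZE)

    def map_page(self, pfn):
        """Return the contents of physical page `pfn` (hypothetical API)."""
        start = pfn * PAGE_SIZE
        return bytes(self.phys[start:start + PAGE_SIZE])


def install_mapping_hook(os_, hidden_pfns, benign_page):
    """Wrap map_page so that hidden pages are served from a benign copy."""
    original = os_.map_page

    def hooked(pfn):
        if pfn in hidden_pfns:
            return benign_page          # filtered: the real page is never returned
        return original(pfn)

    os_.map_page = hooked               # instance attribute shadows the method


def acquire(os_, num_pages):
    """Naive acquisition that trusts the OS mapping API for every page."""
    return b"".join(os_.map_page(pfn) for pfn in range(num_pages))
```

After the hook is installed, `acquire` still returns an image of the expected size; only the contents of the hidden pages differ, which is why this kind of filtering is hard to spot from the image alone.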


Attacks on Memory Enumeration To prevent access to MMIO regions, which can destabilize the system, software must enumerate the physical address space and identify all physical memory regions. The firmware passes this information to the OS during boot, but firmware services are generally no longer available to drivers at runtime. Software can instead query the OS for the memory map, which provides an overview of the physical address space. Rootkits can intercept these APIs to hide specific regions from memory acquisition software. While it is not possible to hide all system modifications this way, rootkits can still hide their code and data from a memory image. Depending on the implementation of the OS, it is also possible to perform a DKOM attack on the memory map directly. This is even stealthier, because it does not require any redirection of control flow and cannot be detected by code integrity checks. As we will show in Section 4.3, memory enumeration can even be defeated using passive techniques. There are small regions of memory in the physical address space that the OS does not know about. Their existence is a result of constraints during POST, and they are so small that their loss is considered acceptable by the OS. By identifying those regions, malware can move code and data out of known memory without actively interfering with either the OS or the memory acquisition software itself. Since these memory regions do not exist from the OS perspective, they will not be acquired into the image.

4.2 Attacks on Memory Acquisition Software

In this section, we demonstrate practical attacks on memory acquisition software by patching all relevant OS APIs to return an error instead of performing their intended task. This makes it impossible for software to access or enumerate physical memory. With this capability, it is also trivial to selectively hide code and data from the image by employing one of the strategies described in previous work (Bilby, 2006; Milkovic, 2012). We test our attacks against a wide range of freely available memory acquisition software on Windows, Linux and OSX.

4.2.1 Windows

In our analysis of Windows memory acquisition software in Section 3.2.2, we have identified three APIs used by software to access physical memory, as well as one API for memory enumeration. Furthermore, we noticed that some memory acquisition tools rely on debugging data structures in the OS to create memory images in crashdump format. By manipulating these data structures, we can prevent them from writing a crashdump image.

Memory Enumeration As mentioned in Section 3.2.1, memory acquisition drivers need to enumerate the physical address space prior to acquisition. On the Microsoft Windows family of operating systems, all tested drivers use the undocumented symbol MmGetPhysicalMemoryRanges() to obtain a map of the physical address space. By patching this function to always return NULL, which is its failure indicator, we prevent drivers from learning the layout of the physical address space. As reading from device memory can crash the kernel, this effectively prevents memory acquisition. Usage of this API is discouraged by Microsoft, so regular drivers don't use it, and patching it does not impact system stability. An actual rootkit could, of course, simply return a modified version of the memory map which excludes the ranges it is trying to hide. Acquisition would then appear successful while being incomplete.
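A minimal sketch of such a filtered memory map follows. It assumes ranges are represented as (start, length) pairs, similar in spirit to the base address and byte count that MmGetPhysicalMemoryRanges reports per range; the Python representation and the function name `carve_out` are hypothetical.

```python
def carve_out(ranges, hide_start, hide_len):
    """Return a copy of a (start, length) memory map with one range removed.

    Models the filtered view a rootkit could return from a hooked
    enumeration API. Ranges that do not overlap the hidden range are kept
    unchanged; overlapping ranges are split around it.
    """
    hide_end = hide_start + hide_len
    result = []
    for start, length in ranges:
        end = start + length
        if end <= hide_start or start >= hide_end:
            result.append((start, length))                 # no overlap
            continue
        if start < hide_start:
            result.append((start, hide_start - start))     # part before the hole
        if end > hide_end:
            result.append((hide_end, end - hide_end))      # part after the hole
    return result
```

The carved map still looks plausible: every remaining range is valid RAM, so an acquisition driver that walks it completes normally and simply never reads the hidden region.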

Memory Mapping To actually access physical memory, acquisition drivers need to map it into the kernel's virtual address space (see Section 3.2.1). The three kernel APIs commonly used for this purpose are ZwMapViewOfSection, MmMapIoSpace and the undocumented symbol MmMapMemoryDumpMdl. For demonstration purposes, we patch MmMapMemoryDumpMdl to return NULL. As this symbol is undocumented and its usage is discouraged, we can patch it without affecting system stability. Because the other two APIs are frequently used by regular drivers, patching them in the same way could destabilize the system. However, a more sophisticated rootkit can easily install hooks that filter only the mapping operations on hidden pages. Such a modification would also be very reliable, subverting any memory acquisition tool that uses the other two APIs.

Debugger Block hiding The static kernel structure KdDebuggerDataBlock is used by memory acquisition software to find the base address of the kernel image and several non-exported symbols. It can be found by scanning for its OwnerTag member, which is the static string “KDBG”. Haruyama and Suzuki already demonstrated that overwriting this tag is effective in thwarting analysis by frameworks like Volatility (Haruyama and Suzuki, 2012). This technique can even disrupt memory acquisition, as some drivers rely on the KDBG to resolve certain symbols.
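Why overwriting the tag works can be seen from a simplified signature scan. The sketch below only searches a memory buffer for the four-byte “KDBG” string; real scanners typically also validate neighbouring header fields, which this example omits, and the function name is hypothetical.

```python
KDBG_TAG = b"KDBG"


def find_owner_tags(image, tag=KDBG_TAG):
    """Return every offset at which the OwnerTag signature occurs."""
    hits = []
    pos = image.find(tag)
    while pos != -1:
        hits.append(pos)
        pos = image.find(tag, pos + 1)
    return hits
```

Once the four bytes are replaced (for example with “MOOF”, as in the script below), the scan finds no candidates, so nothing that depends on locating the block can proceed.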

Evaluation We have created a small kernel patcher (shown in Listing 4.1) to demonstrate these techniques. Note that because of kernel patch protection, this script will not work on 64-bit kernels without disabling PatchGuard (Microsoft Corporation, 2006). However, as more and more rootkits subvert this protection (Rusakov, 2011, 2012; Allievi, 2014), we believe it is safe to assume an attacker is able to do this. For testing purposes, we have enabled debug mode on our test systems, which disables PatchGuard. The script requires Winpmem 1.6.0 (Cohen, 2012) with write support to be loaded, which is used to get unrestricted access to physical memory. The script utilizes the Rekall memory forensic framework (Cohen, 2014b) to locate the kernel and its symbols in memory. It then patches the previously mentioned enumeration and mapping functions to return NULL, and overwrites the KdDebuggerDataBlock.OwnerTag with the meaningless string “MOOF”.

```python
from rekall import session
from rekall.plugins.overlays import windows

def KernelApiPatch(session, symbol, patch):
    session.profile.get_constant_object(
        symbol, "String"
    ).write(patch)

def PatchKDBG(session):
    session.profile.get_constant_object(
        "KdDebuggerDataBlock", "_KDDEBUGGER_DATA64"
    ).Header.OwnerTag.write("MOOF")

if __name__ == "__main__":
    session = session.Session(
        filename=r"\\.\pmem",
        autodetect=["nt_index", "pe"],
        profile_path=[
            "http://profiles.rekall-forensic.com"
        ]
    )
    shellcode = "\x48\x31\xc0\xc3"  # xor rax, rax; ret: patched function returns NULL
    KernelApiPatch(
        session, "MmGetPhysicalMemoryRanges", shellcode
    )
    KernelApiPatch(
        session, "MmMapMemoryDumpMdl", shellcode
    )
    PatchKDBG(session)
```

Listing 4.1: Attack on Windows Memory Management APIs

We evaluated our proof-of-concept techniques against several popular memory acquisition tools. For this study, we requested evaluation copies of “Moonsols Dumpit”, “HBGary Fastdump Pro”, “GMG Systems Kntdd” and “Guidance’s WinEn” for the purpose of forensic tool testing. Only Moonsols responded positively to our request. Additionally, we included open source or free tools such as Memoryze (Mandiant, 2011), FTK Imager (AccessData, 2012), WinPmem (Cohen, 2012), and WindowsMemoryReader (ATC-NY, 2012b). We believe that most other tools exhibit similar deficiencies. However, since we were unable to test these, readers are encouraged to use the provided script in Listing 4.1 to reproduce these tests themselves.


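| Program | Version | Format | KDBG | MmGetPhysicalMemoryRanges | MmMapMemoryDumpMdl |
|---|---|---|---|---|---|
| Memoryze | 2.0 | raw | ✓ | ✗ | ✓ |
| FTK Imager | 3.1.2 | raw | ✓ | ✗ | ✓ |
| Win64dd | 1.4.0 | raw | ✓/✗ | ✗ | ✗ |
| Win64dd | 1.4.0 | dmp | ✗ | ✗ | ✗ |
| DumpIt | 1.4.0 | raw | ✓ | ✗ | ✗ |
| WinPmem | 1.3.1 | raw | ✗ | ✗ | ✓ |
| WinPmem | 1.3.1 | dmp | ✗ | ✗ | ✓ |
| WMR | 1.0 | raw | ✓ | ✗ | ✓ |
| WMR | 1.0 | dmp | ✓ | ✗ | ✓ |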

Table 4.1: Evaluation of Acquisition with Active Anti-Forensics

Our test system is an x86-64 Intel computer with 8 GiB of RAM, running a fully patched Windows 7 x86-64 with Service Pack 1. We tested the tools using their default settings to produce a raw image. Where a tool could also produce an image in the crashdump format, the tests were repeated for this format. All patches in Listing 4.1 were tested individually as well as simultaneously. A summary of the evaluation results is depicted in Table 4.1. The ✓ symbol means the acquisition tool was able to create an image of memory despite the employed anti-forensic method, while ✗ signals a failed acquisition; ✓/✗ indicates that the outcome depended on how the tool was invoked. The data shows that every tested acquisition tool was subverted by at least one of the tested anti-forensic methods. After employing all anti-forensic techniques simultaneously, none of the tools were able to acquire a single byte of memory. Some tools even crashed the kernel while trying, a very undesirable effect when analysing production systems. This may be due to missing error checking within the acquisition tools, which may assume that kernel memory management APIs can never fail.

Mandiant Memoryze Destroying the KDBG Owner Tag had no impact on the performance of Memoryze. Also, hooking MmMapMemoryDumpMdl had no effect, as Memoryze only supports the \\.\Device\PhysicalMemory and MmMapIoSpace methods for mapping physical memory. Hooking MmGetPhysicalMemoryRanges caused Memoryze to crash the kernel immediately, making it impossible to acquire any memory at all and forcing the target machine to reboot without an error message.

AccessData FTK Imager Similarly to Memoryze, destroying the KDBG Owner Tag or hooking MmMapMemoryDumpMdl did not affect FTK Imager, as it maps memory by calling ZwMapViewOfSection on the \\.\Device\PhysicalMemory device. However, hooking MmGetPhysicalMemoryRanges resulted in an empty image, without any apparent warnings.

Moonsols Win64dd When creating a raw image, the destruction of the KDBG Owner Tag resulted in spontaneous reboots during acquisition with Win64dd. In our tests, an incomplete dump of 100 MB was created before the fault occurred. The log did not include any error messages. Similar behaviour was observed when hooking MmGetPhysicalMemoryRanges or MmMapMemoryDumpMdl, the latter being the default memory mapping method of Win64dd. The tool behaved in the same way when creating a crash dump (dmp). However, when all arguments were provided on the command line to create a raw image, the KDBG method no longer caused Win64dd to crash. It was still impossible to create a crashdump, though. We presume that Win64dd's interactive mode queries the driver for some information that triggers a search for the KDBG, regardless of the image format.

Moonsols DumpIt Moonsols offers a packaged version of its memory acquisition tools called DumpIt. This tool only supports the raw output format and does not seem to be affected by overwriting the KDBG Owner Tag. It is still vulnerable to the other two anti-forensic methods.

WinPmem Overwriting the KDBG Owner Tag causes WinPmem to fail. In contrast to other tools we tested, it does not crash the kernel. However, there is no error message indicating the reason for the failure. Hooking MmGetPhysicalMemoryRanges also causes an abort, with an error message reporting that the memory geometry could not be obtained. Hooking MmMapMemoryDumpMdl does not affect WinPmem, as it utilizes the \\.\Device\PhysicalMemory and MmMapIoSpace methods for memory mapping.

ATC-NY WindowsMemoryReader The KDBG method did not affect WindowsMemoryReader at all. It was even able to create a crashdump. However, the resulting dump could not be parsed completely by WinDBG, as the contained KDBG block was corrupted. Hooking MmMapMemoryDumpMdl had no effect, as it is not used by WindowsMemoryReader. The MmGetPhysicalMemoryRanges method, however, completely disabled both raw and dmp output. It caused an error in the driver that crashed the kernel, immediately rebooting the host.

4.2.2 Mac OS X

The demonstrated problems are not Windows specific. We have also conducted experiments with other operating systems, with similar results. On Mac OS X 10.8 Mountain Lion, we have tested MacMemoryReader version 3.0.2 (ATC-NY, 2012a), as well as OSXPmem (Stüttgen, 2012) version RC1. Both function in a similar way, with the same inherent problems that malicious software can exploit. On EFI-enabled systems, rather than using the BIOS interrupt 0x15 routine, memory geometry is obtained by calling an EFI boot service. The platform expert component of the OS X kernel obtains the memory map from EFI and stores a pointer to this structure in the symbol PE_state.bootArgs.MemoryMap. Zeroing this structure, or simply zeroing its size, will prevent acquisition drivers from obtaining a map of the physical address space, effectively preventing acquisition. Of course, a more sophisticated rootkit could modify this map to exclude any protected data. Acquisition will then succeed without any indication of subversion. However, hidden data will not be included in the image, reducing its evidentiary value. This procedure is very easy to implement; a possible implementation is depicted in Listing 4.2. In our tests, a simple kernel extension calling this 2-line function completely prevented OSXPmem and MacMemoryReader from acquiring even a single byte of memory.

```c
void destroy_efi_memory_map(void) {
    // Access the boot arguments through the platform expert,
    // and zero the size member of the EFI memory map.
    boot_args *ba = (boot_args *)(PE_state.bootArgs);
    ba->MemoryMapSize = 0;
}
```

Listing 4.2: OS X Memory-Map Overwriting
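Why zeroing MemoryMapSize is sufficient follows from how an EFI memory map is walked: the number of descriptors is derived from the map size divided by the descriptor size. The Python sketch below assumes the EFI_MEMORY_DESCRIPTOR layout from the UEFI specification (a 32-bit Type, padding, then 64-bit PhysicalStart, VirtualStart, NumberOfPages and Attribute) and 4 KiB EFI pages. It operates on a byte buffer rather than on kernel memory, and all function names are hypothetical.

```python
import struct

# EFI_MEMORY_DESCRIPTOR: UINT32 Type, 4 bytes padding, UINT64 PhysicalStart,
# UINT64 VirtualStart, UINT64 NumberOfPages, UINT64 Attribute (40 bytes).
EFI_DESC = struct.Struct("<IIQQQQ")
EFI_CONVENTIONAL_MEMORY = 7
EFI_PAGE_SIZE = 4096


def pack_descriptor(mem_type, phys_start, num_pages, desc_size):
    """Build one descriptor, zero-padded to the firmware's descriptor size."""
    raw = EFI_DESC.pack(mem_type, 0, phys_start, 0, num_pages, 0)
    return raw.ljust(desc_size, b"\x00")


def parse_efi_memory_map(buf, map_size, desc_size):
    """Return (type, phys_start, length) for each descriptor in the map.

    The descriptor count is map_size // desc_size, so a map whose size
    field has been zeroed yields no entries at all.
    """
    entries = []
    for i in range(map_size // desc_size):
        mem_type, _pad, phys, _virt, pages, _attr = EFI_DESC.unpack_from(buf, i * desc_size)
        entries.append((mem_type, phys, pages * EFI_PAGE_SIZE))
    return entries


def conventional_ranges(entries):
    """Keep only EfiConventionalMemory (a simplification: loader and
    boot-services regions are also RAM once the OS is running)."""
    return [(phys, length) for t, phys, length in entries if t == EFI_CONVENTIONAL_MEMORY]
```

Descriptors are stepped by the descriptor size reported alongside the map rather than by the 40-byte structure size, because firmware may use larger descriptors.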

As with Windows, physical memory mapping on OS X can also be easily subverted. On OS X, physical memory mapping is achieved by creating an IOMemoryDescriptor object and then calling its createMappingInTask() method. By hooking either the constructor or the mapping method, malicious software can perform exactly the same attacks as with the Windows memory mapping functions mentioned above. Because this API is also used by regular drivers, we refrain from disabling it as in our Windows experiment, since this would cause system instability unless fully implemented with filtering capability.

4.2.3 Linux

The Linux kernel maintains a tree of data structures called iomem_resource, describing the physical address space of the system. The tree is built during boot and subsequently refined when drivers declare responsibility for a specific region in the physical address space. Each resource identifies a specific memory region and contains information on the location of this region in the physical address space, as well as a name, some flags, and pointers to other regions in the tree. When the kernel initializes memory regions backed by RAM, it assigns the static string “System RAM” as a name. As we have shown in Section 3.2.3, this name is used in all publicly available memory acquisition tools for Linux to identify memory regions backed by RAM. Similar to OSX, this presents a choke point in the system which can be abused by malware to subvert the memory enumeration procedure.
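The name-based selection described above can be sketched as follows. This sketch parses text in the format of /proc/iomem rather than walking iomem_resource inside the kernel, so it only approximates what the tools do; the function name is hypothetical.

```python
def system_ram_ranges(iomem_text):
    """Return inclusive (start, end) pairs for regions named 'System RAM'.

    Selection is by name only, mirroring the tools described above, so a
    region renamed to 'System ROM' is silently skipped.
    """
    ranges = []
    for line in iomem_text.splitlines():
        span, sep, name = line.strip().partition(" : ")
        if sep and name == "System RAM":
            start, end = (int(x, 16) for x in span.split("-"))
            ranges.append((start, end))
    return ranges
```

Because the match is on the string alone, renaming the regions is enough to make every RAM range disappear from the result, without touching any region's addresses.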

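```c
#include <linux/ioport.h>  /* struct resource, iomem_resource */
#include <linux/string.h>  /* strcmp() */

static void rename_system_ram(void)
{
    struct resource *p = &iomem_resource;
    for (p = p->child; p != NULL; p = p->sibling) {
        if (!strcmp(p->name, "System RAM")) {
            disable_writeprotect();  /* helper (not shown): clears CR0.WP, see footnote 1 */
            /* "System RAM"[8] is 'A'; the RAM nodes share this string (cf. Figure 4.1) */
            *((char *)p->name + 8) = 'O';
            enable_writeprotect();   /* helper (not shown): restores CR0.WP */
            break;
        }
    }
}
```

Listing 4.3: DKOM Attack on Linux Memory Map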

To prove how vital this data structure is for memory enumeration, we have devised a technique that stops Linux memory acquisition tools by patching one byte of kernel data, as shown in Listing 4.3. We traverse the iomem_resource tree and locate nodes that reference the “System RAM” string. When we encounter a reference to this string, we set the byte at offset 8 to “O”¹. Figure 4.1 illustrates the iomem_resource tree before and after the modification. As a result of replacing the “A” in RAM with an “O”, all “System RAM” regions are now named “System ROM”. Memory acquisition software that uses the name of a region to identify its type is led to believe that the system has no memory installed at all, and skips the RAM regions. Naturally, this technique is not very stealthy. An investigator will find it rather hard to believe that a system has no installed memory, and will quickly notice the suspiciously large ROM regions. However, the investigator would still have a hard time acquiring memory from this system. Note that a rootkit could take this one step further and remove only a small region of memory that contains the data it wants to hide. Memory acquisition would then succeed, but the rootkit's code and data would be missing from the image. We have tested this method on a system with an Intel x86-64 CPU, 8 GiB of RAM and Ubuntu 14.04 with kernel 3.13.0-45-generic. Our test suite included pmem for Linux (Cohen, 2011) and the most recent version of LiME (Sylve et al., 2012)².

¹ Because the string is located in a read-only region of memory, we briefly disable memory write protection by modifying the WP flag in the CR0 register.
² Downloaded on February 9th, 2015.


Before:

```
00000000-00000fff : reserved
00001000-0009fbff : System RAM
0009fc00-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c8fff : Video ROM
000c9000-000c99ff : Adapter ROM
000ca000-000cc3ff : Adapter ROM
000f0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-07ffdfff : System RAM
  01000000-017344c3 : Kernel code
  017344c4-01d1e2ff : Kernel data
  01e77000-01fdffff : Kernel bss
```

After:

```
00000000-00000fff : reserved
00001000-0009fbff : System ROM
0009fc00-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000c8fff : Video ROM
000c9000-000c99ff : Adapter ROM
000ca000-000cc3ff : Adapter ROM
000f0000-000fffff : reserved
  000f0000-000fffff : System ROM
00100000-07ffdfff : System ROM
  01000000-017344c3 : Kernel code
  017344c4-01d1e2ff : Kernel data
  01e77000-01fdffff : Kernel bss
```

Figure 4.1: Effects of DKOM on /proc/iomem

Both programs were unable to acquire a single byte of memory after we modified the RAM string and simply wrote an empty image.

4.3 Passive Anti-Forensics

We use the term passive anti-forensics to describe methods that refrain from interfering with the system at all. Because they do not modify any code or data on the system, they are very hard to detect from an acquired image, even after their existence has become known.

4.3.1 Hidden Memory

The memory map provided by the system firmware does not give details on the precise location of all memory regions in the physical address space. When the OS obtains the memory map from the firmware, it can also decide to partially ignore this information. This results in regions of physical memory that are not known to the OS, which we call hidden memory. We discovered this phenomenon while implementing our hardware-based memory enumeration method detailed in Chapter 5. In Figure 4.2, we compare the actual physical address space layout of our test system with the view provided by the BIOS E820 map and the MmGetPhysicalMemoryRanges API on Windows. The blue regions represent memory that is visible in the respective view and safe to use, while the red regions must not be accessed at all because they represent MMIO. The physical memory map of the system is shown on the left and was obtained by applying our PCI-based memory enumeration method (see Section 5.1.1). This method reliably determines all MMIO regions in the physical address space (shown in red). Note that, aside from static memory regions like the BIOS and option ROM areas, we cannot determine which regions contain memory and which are not mapped at all. However, this view accurately shows which regions are safe to read and which are not.

[Figure 4.2 panels: “Physical Memory Map” (left), “BIOS E820 View” (center), “Memory Manager View” (right), covering 0x00000000 to 0xFFFFFFFF with regions labeled Memory Available, EBDA, Video Window, PCI Option ROMs, Lower BIOS, Upper BIOS, ACPI Reclaim, PCI MMIO, and APIC + BIOS ROM.]

Figure 4.2: Hidden memory on Test System with 4 GB RAM

The area in the center of the figure represents the view of the physical address space as supplied by BIOS interrupt 0x15 with AX=0xE820. This is the memory map as seen by the bootloader, which then passes it to the OS. We obtained this data by using the undocumented x86 BIOS emulator in the Hardware Abstraction Layer (HAL) of Windows (Chappell, 2010). This API allows us to directly call BIOS interrupts and obtain the BIOS memory map through the same channel as the bootloader (Chappell, 2011). The resulting view shows which regions in the physical address space contain memory that can safely be used by the OS. However, it includes neither all memory available on the system nor all MMIO regions.
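For reference, each entry returned by INT 0x15 with AX=0xE820 consists of a 64-bit base address, a 64-bit length and a 32-bit type (ACPI 3.0 firmware may append a 32-bit extended-attributes field, making entries 24 bytes). The sketch below decodes a buffer of such entries; it assumes the raw entries have already been collected, for example through the HAL emulator mentioned above, and the function name and the addresses in the test are illustrative only.

```python
import struct

# One E820 entry: 64-bit base, 64-bit length, 32-bit type (20 bytes).
E820_ENTRY = struct.Struct("<QQI")
E820_TYPES = {1: "usable", 2: "reserved", 3: "ACPI reclaimable",
              4: "ACPI NVS", 5: "unusable"}


def parse_e820(buf, entry_size=20):
    """Decode consecutive E820 entries into (base, length, type name).

    Pass entry_size=24 if the firmware returned extended attributes;
    only the first 20 bytes of each entry are decoded here.
    """
    entries = []
    for off in range(0, len(buf) - entry_size + 1, entry_size):
        base, length, kind = E820_ENTRY.unpack_from(buf, off)
        entries.append((base, length, E820_TYPES.get(kind, f"type {kind}")))
    return entries
```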


The right side of the figure represents the view of the Windows memory manager. It was obtained by calling the MmGetPhysicalMemoryRanges API on Windows. This API does not report reserved or MMIO regions, only available memory. Like the BIOS memory map, it can only be used to enumerate memory that is safe for the OS to use, and does not inform the user about MMIO regions that must be avoided.

Note that the decreasing completeness of these views leads to regions of memory that are invisible to the Windows memory manager:

• There is a range of 3072 bytes between 0x0009F000 and 0x0009FC00. The firmware claims the EBDA region behind this for its own use. Because the Windows memory manager only handles whole pages, it ignores the trailing 0xC00 bytes in the first memory region. Note that the size of the EBDA is not standardized and can vary from system to system. This means that the amount of memory hidden here also varies between systems.

• The 128 KiB range between 0x000C0000 and 0x000E0000 is reserved for PCI option ROMs. This region must be backed by RAM for option ROMs to execute (PCI-SIG, 2002). There is no guarantee that this region will not be used by option ROMs in the future, so overwriting it can destabilize the system. As reclaiming memory from this region is considered tedious, the OS ignores it.

• The BIOS memory regions between 0x000E0000 and 0x00100000 are normally also backed by RAM. While the actual mapping is controlled by the PAM registers in the memory controller, the firmware shadows itself into RAM during initialization to increase performance (Salihun, 2006), so it is very likely that this region contains memory.

• The ACPI reclaim range also consists of memory. Firmware marks this range as reclaimable to prevent its use before the OS power management has obtained the ACPI tables from it. After that, it is available for use by the system. Most systems do not spend the effort to reclaim it, though, as it is rather small. In our tests, it never appeared in the available ranges of the Windows memory manager.
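The first case in the list above is simple arithmetic: a page-granular memory manager can only use whole 4 KiB pages, so a usable range that ends at 0x0009FC00 is truncated to 0x0009F000. A minimal sketch of that calculation, assuming half-open [start, end) ranges (the helper names are hypothetical):

```python
PAGE_SIZE = 0x1000


def page_aligned(start, end):
    """Shrink the half-open range [start, end) to whole pages (or None)."""
    aligned_start = -(-start // PAGE_SIZE) * PAGE_SIZE   # round up
    aligned_end = (end // PAGE_SIZE) * PAGE_SIZE         # round down
    return (aligned_start, aligned_end) if aligned_start < aligned_end else None


def bytes_lost_to_truncation(usable_ranges):
    """Bytes reported usable by the firmware that a page-granular manager drops."""
    lost = 0
    for start, end in usable_ranges:
        kept = page_aligned(start, end)
        kept_len = kept[1] - kept[0] if kept else 0
        lost += (end - start) - kept_len
    return lost
```

For the range [0x0, 0x9FC00), the kept part is [0x0, 0x9F000), and the 0xC00 = 3072 trailing bytes become hidden memory.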

There can be other regions in the physical address space that are not mentioned in the memory maps of the BIOS or the Windows memory manager. When the BIOS sets up MMIO mappings, it can shadow regions that contain RAM with device memory. Also, some operating systems do not bother reclaiming small regions like the ACPI tables, as they only add up to a few KiB of memory. Malware can utilize these regions to store code and data outside the scope of OS-controlled memory, if it can successfully enumerate them. Since the memory acquisition tools we tested acquire only the available regions reported by the Windows memory manager in their default configuration, the hidden regions are not part of the memory image.


4.3.2 Evaluation

The size and location of hidden memory regions depend on the chipset, firmware and hardware devices, and differ from system to system. We have observed cases where only the firmware regions and a very small region in front of the EBDA exist, as well as a case with many hidden memory regions intermingled with MMIO regions. To get an overall impression of the amount of available hidden memory, we have performed experiments to acquire hidden memory on Windows with memory acquisition software.

We have located regions of hidden memory on our test system by identifying segments outside the available regions of the Windows memory manager that are not mapped by devices. We analyzed these regions with a memory probing scheme, in which we first discarded any regions that were not completely zeroed. We then wrote known values to the remaining memory regions and read them back to ensure the writes persisted. On our test system, we found 52 pages of hidden memory with this method.
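The probing scheme described above can be sketched as follows. `read` and `write` are hypothetical physical-memory accessors supplied by the caller, the test pattern is arbitrary, and restoring the zeroed contents after the check is an assumption of this sketch rather than a documented step of the original experiment.

```python
def probe_region(read, write, addr, size=4096):
    """Return True if [addr, addr + size) behaves like unused, writable RAM.

    read(addr, n) -> bytes and write(addr, data) are hypothetical
    physical-memory accessors supplied by the caller.
    """
    original = read(addr, size)
    if original != bytes(size):
        return False                      # not fully zeroed: discard region
    pattern = (b"\xa5\x5a" * (size // 2 + 1))[:size]
    write(addr, pattern)
    persisted = read(addr, size) == pattern
    write(addr, original)                 # restore the zeroes (sketch's choice)
    return persisted
```

A region that silently drops writes, as unbacked address space typically does, fails the read-back comparison and is rejected.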

We then tested whether memory acquisition software was able to acquire this memory by filling all hidden memory segments with a known string. We performed the test with the same selection of tools as in the previous evaluation for Windows, acquiring a raw memory image with each program and comparing the data at the corresponding offsets with the known string. Note that some tools provide multiple settings on which memory regions to acquire. Because the hidden memory segments lie inside regions reported as MMIO by the operating system, we acquired the test images using the most extensive setting possible. Mandiant Memoryze, AccessData FTK Imager, Winpmem and Moonsols DumpIt did not allow the acquisition of anything other than the “available” regions. In the resulting images, these regions were zero-padded, except for Memoryze, which uses the byte 0xBA for padding. The known string was not acquired.
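Checking an acquired image then amounts to comparing the bytes at each hidden segment's offset with the known string. This sketch assumes a raw image in which the file offset equals the physical address, as with the padded raw images described above; the function names are hypothetical.

```python
import mmap


def marker_hits(image, offsets, marker):
    """Map each hidden-segment offset to whether `marker` was acquired there."""
    return {off: bytes(image[off:off + len(marker)]) == marker for off in offsets}


def check_image_file(path, offsets, marker):
    """Same check on a raw image file, memory-mapped to avoid reading it all."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return marker_hits(m, offsets, marker)
```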

ATC-NY WindowsMemoryReader allows very fine-grained control over the parts of memory that are acquired. It even resolves all device DMA mappings and provides options to include them in the image. Unfortunately, it regards regions that are neither “available” nor memory-mapped IO as non-existent, so they cannot be selected for acquisition. They were zero-padded in the image, and the known string could not be acquired. When using the most extensive setting “-r” to acquire all resources, the system crashed before the entire memory could be acquired.

Moonsols Win64dd is an exception in this test, because it provides a mode that acquires the entire physical address space. In our test, this mode did acquire the known string from the first 3 hidden memory segments. However, the machine crashed and rebooted while imaging the reserved memory region containing the 4th segment. This resulted in an incomplete image file that was missing the last hidden segment.


4.4 Summary

In this chapter, we have presented a study of the most common anti-forensic techniques against software memory acquisition. We have organized the techniques according to the phase of the acquisition process they target: memory mapping or memory enumeration. We have used the knowledge gained in the last chapter to locate specific APIs on Windows, Linux, and OSX that can be intercepted to subvert the memory acquisition process. Furthermore, we have developed new techniques that target the memory map on these platforms to either selectively exclude memory from the image or even make acquisition completely impossible. We have developed a proof-of-concept implementation for 64-bit Windows 7 systems which shows that all of the tested memory acquisition tools are vulnerable to these techniques. In addition, we have provided implementations of our memory enumeration attack for Windows, Linux, and OSX, and evaluated them with all publicly available tools for their respective platform. Finally, we have shown the existence of hidden memory, which is memory in the system that is unknown to the OS. We found that the amount of hidden memory varied from around 3 KiB to over 200 KiB in our test environment. In an experiment, we have demonstrated that none of the freely available memory acquisition tools for Windows was able to acquire all of the hidden memory regions. In fact, only Win64dd was able to acquire any hidden memory at all, but it crashed the system in the middle of the acquisition process. Note that none of the presented techniques are particularly hard for an attacker to perform. We were able to disable all of the tested memory acquisition tools for Windows with a 29-line Python script; on Linux and OSX, we even succeeded by changing a single byte of kernel memory.
From this, we conclude that current software memory acquisition techniques are ill-suited for malware detection, as it is trivial for a rootkit to subvert the acquisition process and filter evidence from the image. Hardware memory acquisition techniques are not without problems either, and cannot replace software methods in all cases because of physical access requirements and their sometimes negative effects on system stability. Because of this, we see an urgent need for software memory acquisition techniques that are more resilient to malware subversion. In the next chapter, we will show that by interacting directly with the hardware, it is possible to remove the potentially subverted operating system from the acquisition process, which significantly reduces the attack surface.

Chapter 5

Anti-Forensic Resilient Memory Acquisition

As we have seen in the previous chapters, all publicly available memory acquisition software depends entirely on the operating system to achieve its two most critical tasks: memory mapping and memory enumeration. This enables malicious software to prevent memory acquisition with minimal effort on Windows, Linux and OSX. Kernel rootkits can intercept the APIs used by memory acquisition software to access and enumerate physical memory, which allows them to filter the resulting image at will (Bilby, 2006; Milkovic, 2012). In this chapter, we advance the field of forensic memory acquisition by developing software techniques to map and enumerate physical memory without relying on the potentially subverted OS. We achieve this by interacting directly with the hardware to enumerate MMIO regions in the physical address space, and then programming the MMU's data structures to map the remaining regions into the virtual address space. We show that these techniques are resilient to the attacks presented in Chapter 4, and discuss their potential attack surface.
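As background for mapping physical memory through the page tables, the following sketch shows the x86-64 4-level paging arithmetic involved: building a 4 KiB page-table entry for a physical address, re-pointing an existing entry, and splitting a virtual address into its four table indices. It is illustrative Python, not the driver implementation; the bit positions follow the Intel architecture manuals, and the function names are hypothetical.

```python
# x86-64 4-level paging with 4 KiB pages (Intel SDM, Vol. 3A, Ch. 4).
PTE_PRESENT = 1 << 0
PTE_WRITABLE = 1 << 1
ADDR_MASK = 0x000F_FFFF_FFFF_F000   # bits 51:12 hold the physical frame address


def make_pte(phys_addr, writable=True):
    """Build a PTE that maps the 4 KiB page at phys_addr."""
    if phys_addr & 0xFFF:
        raise ValueError("physical address must be 4 KiB aligned")
    pte = (phys_addr & ADDR_MASK) | PTE_PRESENT
    if writable:
        pte |= PTE_WRITABLE
    return pte


def remap(pte, new_phys):
    """Point an existing PTE at another physical page, keeping its flag bits.

    On real hardware the stale translation must be flushed afterwards
    (e.g. with INVLPG on the virtual address), otherwise the CPU may keep
    using the old mapping.
    """
    return (pte & ~ADDR_MASK) | (new_phys & ADDR_MASK)


def table_indices(vaddr):
    """Split a virtual address into its PML4, PDPT, PD and PT indices."""
    return tuple((vaddr >> shift) & 0x1FF for shift in (39, 30, 21, 12))
```

Repeatedly calling something like `remap` on a single PTE that backs a fixed virtual page would let a driver walk arbitrary physical pages through one window, without asking the OS to create each mapping.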

Outline of the Chapter

This chapter is outlined as follows: In Section 5.1, we develop memory mapping and enumeration techniques that are independent of operating system APIs. In Section 5.2, we discuss the anti-forensic resilience of our approach. We present a number of conceivable attacks against our methods and discuss possible countermeasures. We conclude the chapter with a short summary of our work in Section 5.3.

5.1 Improving Memory Acquisition

To become more resilient against anti-forensic attacks, memory acquisition software must stop relying on the potentially subverted OS to perform its task. We achieve this goal by accessing the hardware directly rather than relying on kernel APIs. Our driver is therefore not vulnerable to the simple anti-forensic techniques demonstrated in Chapter 4. Additionally, because it does not use non-standard APIs, it is harder to distinguish our acquisition driver from ordinary drivers without thorough code analysis.


5.1.1 Hardware-based Memory Enumeration

As discussed in Section 3.2, obtaining the physical memory map via firmware service routines can only be done early in the operating system's boot sequence. While Windows provides an undocumented real-mode emulator to access the BIOS (Chappell, 2010), it is trivial for malware to extend the approach we presented in Section 4.2 to filter this API as well. As we have also shown in Section 4.3, data may be hidden in reserved regions which are not used by the operating system. Forensic memory acquisition tools should aim to recover all available data, including data in reserved regions. However, the danger with reading from MMIO regions is that hardware may become activated, crash the system, corrupt data, or get physically damaged. Therefore, rather than finding the memory regions which are safe to read (e.g., via the MmGetPhysicalMemoryRanges routine), we directly enumerate the memory ranges which are not safe to read, and avoid those.

As we have shown in Section 2.1.1, the physical address space is mainly composed of regions backed by memory, as well as regions that are routed to the PCIe fabric, which are allocated dynamically by the firmware during POST. As we have illustrated in Section 2.1.3, the PCIe CAM protocol can be used to enumerate all active PCI devices and locate their MMIO regions through the BAR registers. Secondary buses behind PCI-to-PCI bridges must also reserve memory ranges, which can be read with the same method from the bridges' memory base and limit registers (PCI-SIG, 1998). For a detailed explanation of the configuration procedure, please refer to Section 2.1.3. To enumerate all MMIO regions on the PCI bus, we read the “vendor ID” field from the configuration space of each possible BDF address. The read returns the invalid ID 0xffff if there is no device at this address.
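The BDF scan described above can be illustrated with a minimal C sketch. It shows how a bus, device, function, and register offset are packed into the 32-bit CONFIG_ADDRESS value of the legacy configuration mechanism, and how every possible address is probed for a valid vendor ID. This is not the Winpmem code: the privileged port I/O (writing CONFIG_ADDRESS at port 0xCF8, then reading CONFIG_DATA at port 0xCFC) is replaced by the hypothetical callback mock_config_read, which simulates a bus with two functions carrying made-up IDs.

```c
#include <stdint.h>

/* Legacy PCI configuration access (CAM): the 32-bit value written to the
 * CONFIG_ADDRESS port. Bit 31 enables the access; bits 23-16 select the bus,
 * 15-11 the device, 10-8 the function, and 7-2 the dword-aligned register. */
static uint32_t cam_config_address(uint8_t bus, uint8_t dev, uint8_t fun,
                                   uint8_t offset)
{
    return 0x80000000u | ((uint32_t)bus << 16) |
           ((uint32_t)(dev & 0x1f) << 11) | ((uint32_t)(fun & 0x07) << 8) |
           (uint32_t)(offset & 0xfc);
}

/* Hypothetical stand-in for the privileged port I/O. It simulates two
 * functions, 00:00.0 and 00:02.0, with made-up IDs; every other address
 * reads back as all ones, as an empty slot would. */
static uint32_t mock_config_read(uint32_t address)
{
    if (address == cam_config_address(0, 0, 0, 0x00))
        return 0x11111234u; /* device ID 0x1111, vendor ID 0x1234 */
    if (address == cam_config_address(0, 2, 0, 0x00))
        return 0x22221234u; /* device ID 0x2222, vendor ID 0x1234 */
    return 0xffffffffu;
}

/* Probe every bus/device/function and count those whose vendor ID (the low
 * 16 bits of register 0x00) is not the invalid value 0xffff. */
static int count_present_functions(uint32_t (*config_read)(uint32_t))
{
    int found = 0;
    for (unsigned bus = 0; bus < 256; bus++)
        for (unsigned dev = 0; dev < 32; dev++)
            for (unsigned fun = 0; fun < 8; fun++) {
                uint32_t id = config_read(cam_config_address(
                    (uint8_t)bus, (uint8_t)dev, (uint8_t)fun, 0x00));
                if ((id & 0xffffu) != 0xffffu)
                    found++;
            }
    return found;
}
```

With the simulated bus, count_present_functions(mock_config_read) finds exactly two functions. A driver would instead pass a callback that performs the two port accesses; the scan logic itself does not change.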
If we instead get a valid ID, we parse all non-I/O BARs to determine the location and size of any MMIO regions of the device. We have to handle endpoint (type 0) and bridge (type 1) configuration space headers differently, as they implement a different number of BARs. Also, devices can implement 64-bit BARs, which in turn leads to different offsets and sizes in configuration space. In Listing 5.1, we illustrate the process of extracting the start and end address from a 32-bit BAR. The read_pci_config and write_pci_config functions translate the given bus, device, function, and configuration offset into a PCI configuration address as shown in Figure 2.10 in Section 2.1.3. They write this address to the CONFIG_ADDRESS I/O port and then read from or write to the CONFIG_DATA port to initiate a PCI configuration transaction on the bus. We use these functions to first read the contents of the BAR and the command register of this function to create a backup, as BAR sizing requires writing to both of them. Then we disable transaction decoding on the device by writing 0s to the command register. This prevents the device from claiming any transactions while we modify its BAR. We


u32 mask = 0;
u32 start = 0;
u32 end = 0;
u32 bar = read_pci_config(bus, dev, fun, offset);
u16 command = read_pci_config(bus, dev, fun, PCI_COMMAND);

// Disable transaction decoding
write_pci_config(bus, dev, fun, PCI_COMMAND, 0);

// Get BAR size
write_pci_config(bus, dev, fun, offset, 0xffffffff);
mask = read_pci_config(bus, dev, fun, offset) & 0xfffffff0;
write_pci_config(bus, dev, fun, offset, bar);

// Re-enable transaction decoding
write_pci_config(bus, dev, fun, PCI_COMMAND, command);

// Strip flags from BAR to get start of MMIO region
start = bar & 0xfffffff0;
// Get end of range by adding inverse mask
end = ~mask + start;

Listing 5.1: PCI BAR Sizing

then write a sequence of 1s to the BAR and read it back to determine the number of hardwired bits, which determine the BAR's mask. After restoring the BAR from the previously obtained backup, we re-enable transaction decoding by restoring the command register from its backup. The base of the MMIO region described by this BAR is then obtained by discarding the least significant four bits, which are used as flags and are not part of the MMIO address. The end of the region is obtained by adding the inverse mask to the base. Note that CAM is performed by interacting with the PCI root complex through port I/O, not through operating system APIs, and therefore cannot be hooked in the usual way. In addition to the discovered MMIO regions, standard memory regions that are assigned to hardware, such as the ISA bus hole, are automatically added to the list of excluded memory ranges. Also, there might be other devices that are not registered on the PCI bus but have memory mapped into the physical address space. Examples include the High Precision Event Timer (HPET) on the LPC bus, as well as local APICs, I/O APICs, and BIOS ROMs. While it is possible to locate MMIO ranges used by these devices by parsing the MP (Intel Corporation, 1997) or ACPI tables (Hewlett-Packard et al., 2011), these tables are not expected to be updated after the system has booted (Hewlett-Packard et al., 2011; Intel Corporation, 1997), making them an easy target for rootkit manipulation. While there are programming rules enforcing register alignment for reads in some of these MMIO regions, such as the HPET or APIC (Intel Corporation, 2014b), reading them does not violate any of

the documented constraints and did not cause any problems in our experiments. However, some devices might not adhere to the PCI specifications and cause problems when read. A broad evaluation of different devices should be the focus of future research.

Once we have obtained a list of all PCIe MMIO regions, we need to determine the highest addressable physical memory region in the system. While the OS stores this value internally, we do not wish to query it, as it could be compromised. Calculating or obtaining this value from the hardware is not trivial since, as described in Section 2.1.1, some regions might not be mapped to RAM at all. Because of memory reclaiming, the highest physical memory address can be much higher than the total size of installed memory. We therefore make this setting user-selectable, and prefer to acquire past the end of physical memory, which simply yields zeros on reads.
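The following C sketch shows, under simplifying assumptions, how the results of BAR sizing can be turned into a list of regions to acquire. The hypothetical function decode_bar32 repeats the arithmetic of Listing 5.1 for a 32-bit memory BAR, and the hypothetical function safe_ranges computes the complement of the excluded ranges up to a user-selected maximum address. The sketch assumes the exclusion list is already sorted by start address. It omits 64-bit BARs, bridge windows, and the configuration space accesses themselves.

```c
#include <stddef.h>
#include <stdint.h>

/* One physical address range; both ends are inclusive. */
struct range { uint64_t start; uint64_t end; };

/* Decode a 32-bit memory BAR as in Listing 5.1: `bar` is the original
 * register value, `readback` is what the BAR returned after 0xffffffff was
 * written to it. The low four bits are flags and are discarded. */
static struct range decode_bar32(uint32_t bar, uint32_t readback)
{
    uint32_t mask = readback & 0xfffffff0u;
    uint32_t start = bar & 0xfffffff0u;
    struct range r = { start, (uint64_t)start + (uint32_t)~mask };
    return r;
}

/* Given excluded ranges sorted by start address, write the ranges in
 * [0, max_addr] that are safe to read into `out` (at most `cap` entries)
 * and return how many were written. Overlapping exclusions are tolerated. */
static size_t safe_ranges(const struct range *excl, size_t n,
                          uint64_t max_addr, struct range *out, size_t cap)
{
    size_t count = 0;
    uint64_t cursor = 0; /* first address not yet classified */

    for (size_t i = 0; i < n && cursor <= max_addr; i++) {
        if (excl[i].start > cursor && count < cap) {
            uint64_t end = excl[i].start - 1;
            out[count].start = cursor;
            out[count].end = end < max_addr ? end : max_addr;
            count++;
        }
        if (excl[i].end >= cursor)
            cursor = excl[i].end + 1; /* skip the unsafe region */
    }
    if (cursor <= max_addr && count < cap) {
        out[count].start = cursor;
        out[count].end = max_addr;
        count++;
    }
    return count;
}
```

For example, excluding the ISA hole (0xA0000 to 0xFFFFF) and a 1 MiB BAR at 0xFE000000 from a 4 GiB physical address space leaves three readable ranges. Addresses past the end of installed RAM simply remain in the last range, which matches the choice to acquire past the end of physical memory.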

5.1.2 Hardware-based Memory Mapping

The main function of the kernel's memory mapping APIs is to set up the page tables used by the MMU to point to the respective page frame in physical memory (see Section 2.1.2). Memory acquisition software relies on these APIs because interfering with the kernel's management of the page tables is risky: it requires careful synchronization and a detailed understanding of the kernel's page table management. Especially on multi-core systems, race conditions can occur when different cores manipulate page table entries simultaneously. However, the kernel's memory mapping APIs are limited and, as we have shown in Chapter 4, their use makes memory acquisition software vulnerable to anti-forensic attacks. To be resilient against such attacks, it is important to directly create the page table entries that map physical memory. To avoid directly manipulating the kernel's own page tables, which would be a complex endeavor with high potential for crashes and system instability, we ask the kernel to allocate a single non-pageable page for our own use. This causes the kernel to create a PTE for our own private allocation. Since this memory is non-paged, we can be confident that the kernel will not modify the mapping while we are using it, which guarantees that our driver has exclusive access to this PTE. There are multiple methods to achieve this, depending on the operating system. On Windows, regular non-paged pool allocations are usually mapped with large pages (2 MiB), which would complicate our technique. Instead, we create an unused, page-sized, static char array for this purpose within the driver's binary. We then make sure this allocation does not get paged out while the driver is loaded by calling the MmProbeAndLockPages routine. On Linux we use vmalloc, and on OS X IOMallocAligned, for this purpose. The created page-sized mapping is further referred to as the “Rogue Page”, because we will abuse its PTE for mapping physical memory.
Rather than using the APIs the operating system offers to drivers that need to manipulate the page tables, we perform a very common operation (memory allocation), which allows us to keep a lower profile and makes it harder for malware to identify our module as a memory acquisition driver.

Figure 5.1: PTE Remapping Technique (diagram: the rogue page's virtual address is split into the 9-bit PML4 (bits 47-39), Directory Ptr. (38-30), Directory (29-21), and Page Table (20-12) indices plus a 12-bit offset; starting from CR3, the walk passes through the PML4E, PDPTE, and PDE to the PTE, whose 40-bit frame number is redirected from the allocated page to the target page, after which the address is flushed from the TLB)

The driver then walks the page tables directly, using the value of the CR3 register to find the Directory Table Base (DTB), and determines the virtual address of the responsible PTE. While page table data structures are referenced using physical addresses, most operating systems keep the page tables permanently mapped into the kernel address space for quick access. As illustrated in Figure 5.1, the driver first obtains the address of the PML4 from the CR3 register. It then uses parts of the virtual address of the rogue page to locate the corresponding PDPTE, which it uses to find the PDE and finally the PTE. The PTE, in turn, contains the Page Frame Number (PFN), which is the physical address of the page divided by the page size. For each physical page we wish to access (further referred to as the “Target Page”), the driver changes the PFN in the PTE to match the physical address of the target page. It then flushes the virtual address of the rogue page from the TLB. From then on, the MMU translates all accesses to the rogue page's virtual address to the physical target page in hardware. Once the rogue page has been locked into memory, this algorithm does not call any operating system functionality at all. We simply write to the PTE address directly and copy memory out of the rogue page to the user-space buffers. Note that, depending on the caching type in the PTE that holds the original mapping to a physical page, writing to the rogue page can cause cache incoherence and is strongly discouraged.
Thus, operating systems usually prevent the creation of an incompatible second mapping to the same physical page (Vidstrom, 2006). However, this is not a problem for the purpose of memory acquisition, as we only need to read from this mapping. Of course, reading from the rogue page may return stale data if a newer value is still held in one of the CPU caches. Because of the inherent atomicity and integrity issues that come with any software-based acquisition procedure (see Section 3.1.1), the image already contains stale data, so this is not problematic. By bypassing the operating system in the creation of the rogue mapping, this approach is even more powerful than using one of the APIs, which would refuse to create the mapping in some situations.
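The address arithmetic behind the page table walk and the PTE rewrite can be sketched in C as follows, assuming x86-64 four-level paging with 4 KiB pages. The helpers split_va, retarget_pte, and pte_pfn are hypothetical names that only perform the bit manipulation. A real driver would also dereference each table level through the kernel's mapping of the page tables, store the new PTE value, and invalidate the rogue page's TLB entry (for example with the invlpg instruction) before reading. These steps require kernel mode and are left out here.

```c
#include <stdint.h>

/* x86-64 four-level paging: every table index is 9 bits wide. */
#define PT_INDEX(va, shift) ((unsigned)(((va) >> (shift)) & 0x1ffu))

/* Bits 51:12 of a 4 KiB PTE hold the physical frame address (PFN << 12). */
#define PTE_ADDR_MASK 0x000ffffffffff000ull

struct pt_indices { unsigned pml4, pdpt, pd, pt, offset; };

/* Split a virtual address into the index the MMU uses at each level. */
static struct pt_indices split_va(uint64_t va)
{
    struct pt_indices ix = {
        PT_INDEX(va, 39), PT_INDEX(va, 30), PT_INDEX(va, 21), PT_INDEX(va, 12),
        (unsigned)(va & 0xfffu)
    };
    return ix;
}

/* Return a copy of `pte` whose frame points at the 4 KiB page containing
 * `target_phys`, keeping all flag bits (present, R/W, caching, NX, ...). */
static uint64_t retarget_pte(uint64_t pte, uint64_t target_phys)
{
    return (pte & ~PTE_ADDR_MASK) | (target_phys & PTE_ADDR_MASK);
}

/* Page frame number stored in a PTE: physical address divided by 4096. */
static uint64_t pte_pfn(uint64_t pte)
{
    return (pte & PTE_ADDR_MASK) >> 12;
}
```

Only the frame bits are replaced, so the rogue page keeps its original access and caching attributes. That is why the text warns that only reads through the remapped page are safe.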

5.1.3 Evaluation

We have integrated these techniques into the open source acquisition tool Winpmem (Cohen, 2012). We then tested it against all anti-forensic techniques presented in Chapter 4 on a Windows 7 x64 virtual machine, as well as on a physical Intel Ivy Bridge system with 4 GiB of RAM running Windows 8 x64. Both systems were equipped with Intel 510 Series solid state drives to minimize the storage bottleneck when writing the image. The tool was able to acquire the entire address space on both fully compromised systems with a broken KDBG and hooks in MmMapMemoryDumpMdl and MmGetPhysicalMemoryRanges. It also correctly acquired the contents of hidden memory. We did not witness non-deterministic stability issues like those we experienced with Win64dd and WindowsMemoryReader when acquiring the entire address space. Our approach is generally more stable than current established techniques, because it is in no danger of triggering API checks in the kernel (see Section 5.2.5). While the PCI memory enumeration technique is available optionally with a special command line switch, our hardware-based memory mapping technique has been the default setting for Winpmem on 64-bit systems since version 1.5.5¹. An exact performance comparison with other approaches is not possible, as we acquire a large amount of memory that is not accessible to other tools, so we have to read and write more data. However, our approach is significantly slower than current techniques. For example, the unpatched version of Winpmem wrote a zero-padded image of the 4.8 GiB physical address space on our test machine in 22 seconds at 218 MiB/s. Our tool created a 6.3 GiB image in 3 minutes and 20 seconds at 31.5 MiB/s, which took about 9 times as long and corresponds to a roughly 7 times lower throughput. While this does have a negative impact on the atomicity of the image, we believe it to be sufficient in real-world scenarios, given the benefits the technique provides.
Depending on the chosen storage medium, the bottleneck could also be the network or the hard disk. Furthermore, we believe that I/O throughput can be significantly improved in the future by mapping bigger ranges of memory. Our current implementation writes each page separately, which cannot utilize the large file I/O buffers of the operating system optimally.

1 See git commit 2f375a3f6e398af940f0de53cb734e27f2a872de in March 2014 (Cohen, 2014b).


5.2 Discussion

Our approach does not require complicated implementation techniques, and yet it raises the bar for anti-forensic methods significantly. Since we do not rely on the OS for mapping physical pages or enumerating memory, simple hooking techniques, such as those demonstrated in DDFY (Bilby, 2006), are ineffective. By flushing the rogue page from the TLB just before copying the memory out, we remove the effect of desynchronized TLB attacks (Sparks and Butler, 2005). Also, the technique is completely operating system independent and works on all x86 systems. We have successfully tested it on Windows, Linux, and OS X systems with implementations based on the Linux pmem (Cohen, 2011), Winpmem (Cohen, 2012), and OSXPmem (Stüttgen, 2012) drivers. The following discussion evaluates our solution against possible anti-forensic attacks that a rootkit might implement.

5.2.1 Loading of Driver

Our memory acquisition technique depends on being able to run in kernel mode. The obvious countermeasure a rootkit can implement is to prevent our driver from being loaded into kernel mode, for example by hooking the Service Control Manager interface. Although our driver requires kernel-mode access, there are few signatures that can be employed to detect its intentions. Currently, it is trivial for a rootkit to identify a memory acquisition driver simply by inspecting the module's import table. This is especially true for a driver that uses undocumented functions which are not usually imported by legitimate drivers (e.g., MmMapMemoryDumpMdl, as used by the Win64dd driver (MoonSols, 2012)). By preventing the driver from loading, the rootkit reveals its existence, so it should do this only as a last resort, when it is certain that a forensic agent is running. Since our driver does not import any special OS functions, a more thorough analysis must be conducted to determine its intentions.

5.2.2 Interception of Data Buffers

Once physical memory is accessible, memory acquisition drivers typically write it to disk or to a network socket, or copy it to user-mode buffers. A simple anti-forensic technique is to mark certain regions of memory with a magic string and then hook all kernel file operations and kernel-to-user-space copy operations, searching for the magic strings. If they are found, the rootkit has an opportunity to scrub the data. This attack can easily be circumvented by encrypting or obfuscating the raw data as it is copied to user space. We can use simple RC4 encryption to prevent the rootkit from identifying the data as it is passed from kernel space to user space.
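RC4 can be implemented in a few lines of C with no kernel dependencies, which makes it convenient inside a driver. The sketch below is a textbook RC4 (key scheduling plus keystream generation), not code taken from our drivers. How the key is shared between the driver and the user-space tool is left open and is an assumption of any real implementation. RC4 is not considered a secure cipher today, but here it only has to prevent a rootkit from recognizing magic strings in the copied data.

```c
#include <stddef.h>
#include <stdint.h>

/* RC4 state: a permutation of the values 0..255 plus two indices. */
struct rc4 { uint8_t s[256]; uint8_t i, j; };

/* Key-scheduling algorithm (KSA). `keylen` must be greater than zero. */
static void rc4_init(struct rc4 *st, const uint8_t *key, size_t keylen)
{
    for (int k = 0; k < 256; k++)
        st->s[k] = (uint8_t)k;
    uint8_t j = 0;
    for (int k = 0; k < 256; k++) {
        j = (uint8_t)(j + st->s[k] + key[k % keylen]);
        uint8_t t = st->s[k]; st->s[k] = st->s[j]; st->s[j] = t;
    }
    st->i = st->j = 0;
}

/* Keystream generation (PRGA): XOR `len` bytes of `buf` in place. Running it
 * again with a freshly initialised state and the same key undoes it. */
static void rc4_crypt(struct rc4 *st, uint8_t *buf, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        st->i = (uint8_t)(st->i + 1);
        st->j = (uint8_t)(st->j + st->s[st->i]);
        uint8_t t = st->s[st->i]; st->s[st->i] = st->s[st->j]; st->s[st->j] = t;
        buf[n] ^= st->s[(uint8_t)(st->s[st->i] + st->s[st->j])];
    }
}
```

The driver would run rc4_crypt over each page before handing it to user space, and the acquisition tool would run it again with the same key before writing the image. Because the cipher is a stream cipher, it can encrypt data page by page without padding.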


5.2.3 Debug Registers

An effective anti-forensic technique is the use of the debug registers to alert the rootkit when certain memory regions are read (Halfdead, 2008). Modern CPUs have a set of debug registers which can be used to set hardware breakpoints on memory access (Intel Corporation, 2014b). The processor can monitor four distinct breakpoint addresses stored in debug registers DR0-DR3. Ordinarily, the debug registers contain a virtual address and trap when the processor accesses the breakpoint in the virtual address space. This kind of breakpoint is ineffective against our imaging driver since, in the kernel's virtual address space, we are accessing our own private memory page. The PTE manipulation simply makes the desired physical memory page available through this virtual page. However, the Debug Control Register (DR7) can configure the breakpoint to be an I/O read or write breakpoint. This has the effect of generating a trap when the CPU executes an in or out instruction with a port address matching the breakpoint. Our acquisition process is not affected by this, since we do not use in/out instructions to read physical memory. Unfortunately, our PCI introspection routine, which is used to enumerate MMIO regions, uses these instructions to access PCI configuration space. A malicious rootkit can thus hook our PCI enumeration routine and cause a fabricated PCI device to appear in the PCIe hierarchy by returning a forged configuration space buffer when queried for a specific device ID. This configuration can claim that the fabricated device occupies a specific memory region for MMIO, causing our tool to exclude that region from the imaging process. To become resilient against debug register I/O emulation attacks like these, we could switch to the ECAM configuration method of PCIe. With ECAM, PCIe configuration space is not accessed through the I/O space, but is mapped directly into the physical address space using MMIO.
By using our direct page remapping technique to access it, our tool would no longer be susceptible to I/O breakpoints. However, we need a way to reliably locate PCIe configuration space in memory without using the OS or I/O space. We leave an implementation of this technique as future work.
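With ECAM, each function's configuration space is a 4 KiB block at a fixed offset from the base of a memory-mapped window. A register's physical address can therefore be computed directly and then read through the rogue page, without any in or out instruction. The C sketch below shows only that computation, with the hypothetical helper ecam_address. The window base ecam_base is taken as an input. On typical systems it is described by the ACPI MCFG table, which, like the other ACPI tables discussed in Section 5.1.1, a rootkit could tamper with. Locating this base without trusting the OS is exactly the open problem noted above.

```c
#include <stdint.h>

/* PCIe Enhanced Configuration Access Mechanism (ECAM): physical address of a
 * configuration register. The bus selects a 1 MiB block, the device a 32 KiB
 * block, and the function a 4 KiB block; `offset` indexes the extended
 * 4 KiB configuration space of that function. */
static uint64_t ecam_address(uint64_t ecam_base, uint8_t bus, uint8_t dev,
                             uint8_t fun, uint16_t offset)
{
    return ecam_base + (((uint64_t)bus << 20) |
                        ((uint64_t)(dev & 0x1f) << 15) |
                        ((uint64_t)(fun & 0x07) << 12) |
                        (uint64_t)(offset & 0xfff));
}
```

Because the result is an ordinary physical address, the same remapping loop used for acquisition could read it, and the only port I/O left in the tool would disappear.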

5.2.4 Shadow Page Tables

Another weakness of our technique is that it relies on the OS to find the page tables in the first place. All addresses in CR3 and the page tables are physical addresses. Hence, walking the page tables requires a physical-to-virtual translation function, which is provided by the OS. A rootkit could hook this translation function and employ a shadow-paging approach to intercept write access to PTEs. This would require removing write access to the page tables and hooking the page fault handler (Ooi, 2009). There is no way to prevent a rootkit from doing this, nor to detect it, but there is a simple solution to this problem. If the memory driver creates its own page tables and changes CR3 to point to these custom tables, it can remain in complete control of the translation process without alerting the rootkit. The details of this implementation are left for future research as well.

5.2.5 Reliability and Stability

The Windows kernel adopts a fast-fail policy to minimize data corruption in case of an error (Russinovich et al., 2009). Many checks are in place that try to detect misbehaving drivers. When an inconsistency is detected, the kernel raises a so-called bugcheck, commonly known as the Blue Screen of Death (BSOD). This immediately halts all system activity and prints an error message (see Section 3.2.2). From a forensic perspective this is undesirable, because the BSOD effectively shuts down the system and makes further memory acquisition difficult. While the bugcheck causes the kernel to write a crash dump, this dump is not complete by default and partially overwrites the page file, resulting in the destruction of evidence. It is therefore crucial that the acquisition method is as robust as possible. We believe our technique is generally more stable than others because we do not call any kernel APIs during memory acquisition. Thus, we bypass any internal checks in the kernel which could cause a BSOD. For example, page table manipulation through the kernel's APIs is not allowed at elevated interrupt levels, and attempting it causes a BSOD. Since we operate outside of the kernel's APIs, we bypass these checks and potentially avoid a number of situations which can lead to a BSOD.

5.3 Summary

In this chapter, we have presented a technique for software memory acquisition that does not rely on operating system support for its two most critical tasks: memory enumeration and memory mapping. Instead, we consult the hardware itself to enumerate all memory regions that are unsafe to read. We then map the safe regions into the driver's virtual address space by programming the data structures used by the MMU directly. We have implemented this technique in the open source memory acquisition programs WinPmem, PMEM, and OSXPmem, and used them to evaluate our approach on the Windows, Linux, and OS X platforms. Our evaluation showed that our method can reliably acquire all physical memory, even on systems subverted with the active and passive anti-forensic techniques introduced in Chapter 4. We did not witness system instability; instead, we argue that the memory mapping technique used is actually more stable than interacting with the OS, at least on Windows, because it is not subject to driver validation and interrupt level safeguards. We have discussed possible countermeasures and found that it is hard for malware to identify a driver using our technique as a forensic agent, because the driver uses no memory management APIs. The only two possibilities for intercepting our techniques on the same privilege level would be I/O port hooking using hardware debug registers, or shadow paging to sandbox page table manipulations. Both techniques require significant effort and introduce changes into the system by which the rootkit could be detected. Finally, we have discussed how to make our approach resilient against both of these attacks, by switching from CAM to ECAM configuration and using our own private page tables. Note that our technique does not depend on any operating system functionality, except for a single page-sized memory allocation. It is also not limited to acquiring memory. As long as reads from the mapped region do not have side effects, it can also be used to acquire the contents of ROM chips or other devices. The next two chapters explore these new capabilities by using our approach to implement a Linux memory acquisition kernel module that is compatible with a wide range of kernels without recompilation, and to acquire firmware code and data to detect and analyze BIOS, UEFI, option ROM, and ACPI rootkits.

Chapter 6

Kernel Independent Memory Acquisition on Linux

As we have shown in Section 3.2.2, there are two mechanisms that allow access to physical memory from user space on Linux: /dev/mem and /proc/kcore. However, the /dev/mem device is restricted on most systems, and /proc/kcore is not always enabled. Also, user-space memory acquisition tools are vulnerable to even basic malware techniques like LD_PRELOAD-based shared library rootkits (Ligh et al., 2014). Because of this, memory acquisition on Linux systems typically requires loading a kernel module. The Linux kernel checks modules for the correct version and checksums before loading, and will refuse to load a kernel module that was pre-compiled for a kernel version or configuration different from the one being acquired. This check is necessary, since the layout of internal kernel data structures varies between versions and configurations, and a module calling kernel APIs with incompatible data structures will cause system instability and potentially a crash. For incident response, this requirement makes memory acquisition problematic, since responders often do not know in advance which kernel version they will need to acquire. It is not always possible to compile the kernel module on the acquired system, which may not even have compilers or kernel headers installed. Some Linux memory acquisition solutions aim to solve this problem by maintaining a vast library of kernel modules for every possible distribution and kernel version (Raytheon Pikewerks, 2013). While this works well as long as the specific kernel is available in the library, such a library is hard to maintain and cannot cover cases where the kernel has been custom-compiled or is simply not common enough to be included. This is especially the case on mobile phones. Phone vendors often publish the kernel version they used, but the configuration and the details of all vendor-specific patches are frequently unknown, severely impeding memory acquisition (Sylve, 2012).
Rootkit authors have also encountered the same problem when trying to infect kernels whose build environment is not available. Recent work for Android shows that while it is trivial to bypass module version checking, it is still a hard problem to identify the layout of data structures in unknown binary kernels (You, 2012). In the Android case, this problem is solved by restricting dependencies to very few kernel symbols and reverse engineering their data structures on the fly using heuristics (You, 2012). A solution for data structure layout detection could be live disassembly of functions which are known to be stable and use certain members of these data structures. Recent work has shown that it is possible to dynamically determine the offsets of particular members in certain data structures used in memory management, file I/O, and the socket API (Case et al., 2010).

Kernel integrity monitoring systems face similar problems, as they have to monitor dynamic data and need to deduce its type and structure to analyze it. Since this changes with different kernel versions, these systems need to infer the kernel's data layout from external sources. The KOP (Carbone et al., 2009) and MAS (Weidong et al., 2012) frameworks are designed to monitor the integrity of dynamic kernel data structures. Their approach involves statically analyzing the kernel source code and debug symbols to infer type information for dynamic data. However, they require the source code and debug symbols for the exact running kernel to be available in advance, which is exactly the dependency we cannot guarantee in the incident response scenario.

Since the hardware-assisted memory acquisition technique we presented in Chapter 5 does not require access to kernel APIs, it is able to function in any kernel, regardless of its version or configuration. To avoid having to custom-build a kernel module for every target system, we have developed a method to load a minimal module into a running kernel using a parasitic approach. Most modern systems already contain a large number of legitimate kernel modules that were compiled specifically for the running kernel. Our approach locates a suitable existing kernel module (host module), injects a minimal memory acquisition module (parasite module) into it, and loads the combined module into the kernel. The resulting modified kernel module is fully compatible with the running kernel. All data structures accessed by the kernel are taken from the host module, and were in fact compiled with compatible kernel headers and config options. However, control flow is diverted from the host module to the parasite module by modifying static linking information. This allows the parasite module's code to use the host's data structures for communication with kernel APIs.

Outline of the Chapter

This chapter is organized as follows: In Section 6.1, we give an overview of the existing methods of loading incompatible modules into the Linux kernel, and define the requirements a module must meet to be safely loaded into an unknown kernel. In Section 6.2, we then present an approach that uses rootkit techniques to inject a memory acquisition module into an existing compatible module on the system. In Section 6.3, we develop techniques to redirect control flow and hijack data structures from existing modules by manipulating their relocation tables. Finally, in Section 6.4, we discuss the implementation of a minimal acquisition module that uses these techniques so that it can be loaded into arbitrary kernels without recompilation.


struct module {
    enum module_state state;
    struct list_head list;
    char name[MODULE_NAME_LEN];
    ...
#ifdef CONFIG_UNUSED_SYMBOLS
    ...
#endif
#ifdef CONFIG_MODULE_SIG
    bool sig_ok;
#endif
    ...
    /* Startup function. */
    int (*init)(void);
    ...
};

Listing 6.1: Module Data Structure (The Linux Kernel Archives, 2013)

6.1 Compatibility of Linux Kernel Modules With Different Kernels

As we have seen in Section 2.2, Linux kernel modules are object files that are linked directly into the running kernel. Because they run at the same privilege level as the kernel, kernel memory is not protected from their actions, which is why an error in a module can lead to kernel data corruption and thus to a kernel panic. Furthermore, because the kernel is directly linked with the module object file, it actually uses some of the module's data structures. For example, each module contains a special section called .gnu.linkonce.this_module, which holds a static data structure generated in the compilation process. The layout of this data structure is defined in the kernel headers, and the kernel uses it for bookkeeping and managing the module. An abbreviated version is shown in Listing 6.1. It is linked into the module list, and the kernel regularly accesses its members. Figure 6.1 shows the part of the kernel's code that loads the module. After loading and relocating the module, the kernel directly dereferences the module.init member to call the module's initialization function. The offset of init relative to the start of the module struct depends on three factors:

• The configuration of the compiler affects the size of the individual members, as well as the padding in between. If any of these change, the location of all following members of the structure is shifted.
• The configuration of the kernel affects which ifdef directives evaluate to true. For example, if CONFIG_MODULE_SIG is enabled, the structure contains an additional boolean member before the init pointer, shifting init to a higher offset.

75 6 Kernel Independent Memory Acquisition on Linux

Module

1 struct module __this_module 2 __attribute__((section(".gnu.linkonce.this_module"))) = { 3 .name = KBUILD_MODNAME, 4 .init = init_module, 5 ... 6 };

Kernel

1 static int do_init_module(struct module *mod) { 2 ... 3 /* Start the module*/ 4 if (mod->init != NULL) 5 ret = do_one_initcall(mod->init); 6 ... 7 /* Now it’sa first class citizen!*/ 8 mod->state = MODULE_STATE_LIVE; 9 ... 10 }

Figure 6.1: Initialization of a Kernel Module

• The struct layout can also change between different kernel versions. A new kernel version might add or remove a member of the struct in front of init, shifting its location.

Forcing the module to be compiled with the exact same kernel version, configuration, and compiler settings ensures that all APIs are compatible and that structs have exactly the same layout in both the module and the kernel. If the number or order of members, the compiler's padding settings, or the presence of conditional members differ between kernel and module, certain members (for example the init pointer) will be at a different offset than the kernel expects. The call to mod->init might then result in a call to something entirely different, such as uninitialized data or even unmapped memory. This can easily result in a kernel crash, forcing a reboot or leading to data loss or corruption.

As we have seen in this section, it is crucial to compile LKMs for the exact kernel they are to be loaded into, using the same kernel headers, configuration, and compiler that were used to build the target kernel. There are numerous safeguards in place to prevent incompatible modules from loading. Disabling or circumventing this protection can cause undefined behaviour and/or data corruption and should not be attempted.


6.1.1 Bypassing Module Version Checking

There are multiple ways to get around the version check and load a module even if it was compiled for a different kernel version. However, for the reasons given above, this should only be a last resort, as it can result in undefined behaviour, data corruption, or worse.

The kernel config option CONFIG_MODULE_FORCE_LOAD allows modules without valid version magic to be loaded by using the “--force” option of the modprobe program. In many cases this will work if the module was compiled for a very closely related kernel of the same distribution (e.g., one where only the last digit of the version differs). For larger differences, this technique can cause a kernel crash and is usually not recommended. Because it is hard to verify whether the versions are compatible without comparing the kernel headers and configuration, this option essentially amounts to a gamble with a potentially very bad outcome. The documentation clearly states that “Forced module loading sets the ‘F’ (forced) taint flag and is usually a really bad idea.” (The Linux Kernel Archives, 2013, init/Kconfig), which is why few production kernels are compiled with this option enabled.

Even without forced loading enabled, the kernel can still be tricked into accepting an incompatible module by modifying the .modinfo and __versions sections. The version magic is not cryptographically signed, so it can simply be extracted from a valid module on the target system and used to replace the incompatible magic stored in another module. Because the module now contains valid magic strings for this kernel version and for all its imported symbols, the version check will pass and the kernel will allow the module to be loaded. Nevertheless, this carries the same inherent danger as forced loading: it can result in undefined behaviour, a kernel crash, and data loss.

Finally, the kexec system call offers another way to insert code into system mode. “[K]exec is a system call that enables you to load and boot into another kernel from the currently running kernel” (The Linux man-pages, 2012). This can be used to load a custom acquisition kernel that replaces the old one, similar to the approach taken by the Body Snatcher tool (Schatz, 2007a). However, this renders the old kernel unusable, and there is no way to return the system to the state it was in before. Additionally, this system call only exists on kernels compiled with CONFIG_KEXEC enabled, so there is no guarantee that it will be available.
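To illustrate why the version magic offers no protection against tampering, the following C sketch locates the vermagic entry in a .modinfo section that has already been read into memory. It assumes the section consists of NUL-terminated key=value strings; the function name find_vermagic is hypothetical, and the __versions section with its per-symbol checksums is not handled. The sketch only reads the string and does not make the replacement any safer than described above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return a pointer to the value of the "vermagic=" entry in a .modinfo
 * section, or NULL if there is none. The section is treated as a sequence
 * of NUL-terminated "key=value" strings, possibly separated by NUL padding.
 * Locating the section in the ELF file is assumed to have happened already.
 */
const char *find_vermagic(const char *modinfo, size_t len)
{
    static const char key[] = "vermagic=";
    const size_t key_len = sizeof(key) - 1;
    size_t i = 0;

    while (i < len) {
        if (modinfo[i] == '\0') {          /* skip padding between entries */
            i++;
            continue;
        }
        const char *end = memchr(modinfo + i, '\0', len - i);
        if (end == NULL)                   /* unterminated entry: give up */
            return NULL;
        size_t n = (size_t)(end - (modinfo + i));
        if (n >= key_len && memcmp(modinfo + i, key, key_len) == 0)
            return modinfo + i + key_len;  /* value runs up to the NUL */
        i += n;
    }
    return NULL;
}
```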

6.1.2 Requirements for a Stable Approach

Multiple problems have to be solved to load an incompatible kernel module reliably without affecting system stability. The first is obtaining system mode code execution: we need the ability to insert arbitrary code into the running kernel and pass control to it. This involves bypassing the version check and handing the kernel a valid struct module with a module->init pointer under our control.

For this to work, it is also necessary to predict the layout of the kernel's data structures. The module data structure in particular is needed to gain code execution in the first place, but many kernel APIs also require the creation of specific data structures with the correct layout. For example, creating a device inode to communicate with user mode requires a kernel module to have a valid file_operations data structure with correctly positioned pointers to the relevant driver functions (such as read, write, and llseek). The more APIs a kernel module wants to employ, the more data structures it has to use, which increases the required knowledge of the layout of the running kernel's data structures. The problem therefore becomes much easier to solve if the memory acquisition module uses as few APIs as possible.

Some Linux memory acquisition solutions have a rich feature set, such as writing to disk from kernel mode or dumping memory over the network (Sylve, 2012). However, this requires knowledge of the layout of data structures used in the Virtual Filesystem (VFS) or in network sockets. Additionally, some existing tools parse the iomem_resource tree to enumerate physical memory mappings (to avoid acquiring MMIO regions, as shown in Section 3.2.1). Kernel APIs that map the virtual address space or even allocate memory can be difficult to use without detailed knowledge of the running kernel's data structures. Ideally, an acquisition module for this scenario should therefore use as few kernel APIs as possible.

6.2 Reliable Loading of Generic Acquisition Modules

A technique for loading a generic memory acquisition kernel module simplifies the acquisition process for incident responders on Linux systems. Investigators can concentrate on the incident and no longer need to worry about the exact kernel version of the target system or about prebuilding compatible kernel modules. Because the acquisition technique we developed in Chapter 5 does not use any kernel APIs, we can use it to acquire memory on kernels where the layout of data structures is unknown. We simply inject our module into a compatible module we find on the target system and then redirect the control flow.

6.2.1 Parasitizing a Compatible Module

The first step in parasitizing a compatible module is to locate a valid kernel module for the running kernel that is suitable as a host. On most distributions the directory /lib/modules/ contains a large number of kernel modules for different devices, all of which have been compiled with the correct headers and configuration and are thus compatible for linking into the running kernel. Injecting code into one of these modules not only allows us to pass the kernel version checks, but also ensures that the struct module linked into the kernel is compatible.

Parasitizing an existing kernel module is not a novel technique. It has previously been employed by malware authors as a stealthy persistence mechanism (Truff, 2003). Because a kernel module is a relocatable object, it is easy to add new code and data to it using standard tools. It can essentially be relinked with another module to combine both into a single object file, either with the linker ld or by copying individual sections using objcopy. The Adore-ng rootkit (Stealth, 2004), for example, uses this technique to hide its kernel module inside a legitimate one on infected systems and to gain code execution when the host is loaded on startup. The method is documented to work on a wide variety of kernels, from the 2.4 series (Truff, 2003) to the more current 2.6 and 3.0 kernels (Styx, 2012). To divert control flow in the infected module, the malware rewrites the symbol names of the initialization functions. By renaming init_module to something else and renaming the injected initialization routine to init_module, the kernel linker will insert the address of the injected routine into the struct module->init member when relocating the module. When the kernel initializes the loaded module, it will thus call the malware's code instead of the host's.

While this technique reliably solves the first problem, obtaining code execution, it does not address the problem of learning the struct layout of the running kernel. For our use case, we are interested in the other data structures a host module has to offer. If we can find a kernel module on the target system that contains all data structures the parasite needs in order to use the kernel APIs, we can parasitize this module and use them ourselves. Because code references in the host module's data structures are resolved by the kernel linker on load through relocations, the relocation tables of the host module contain information on the data structure layout. This can be exploited to patch pointers in relocated data structures on module load to suit our needs, without having to know anything about their layout.

6.2.2 Code Injection into Kernel Modules

Previous work used the linker ld to link code into the host module (Truff, 2003; Styx, 2012). However, this complicates the build process, because it either requires a linker on the target system, or it is necessary to first copy a suitable module from the target to a system with a suitable build environment, infect it there, and then copy the result back. Both options are undesirable when responding to an incident, as they change the target's state and increase the forensic impact. It is therefore prudent to implement a custom linker that can perform this process on the fly, in memory, when executed on the target. The linker has to be able to insert entries into the section header, symbol, and string tables and to add sections to the binary. We have created the elfrelink C library for this purpose (Stüttgen, 2014). It is able to inject ELF object files into each other and to migrate the required symbol, string, and relocation tables automatically.

6.3 Redirection of Control Flow

Once we are able to inject code into a kernel module, we need to divert the control flow away from the host to the parasite. This can be done using a technique we call “Relocation Hooking”. The same idea is commonly used to manipulate entries in the Procedure Linkage Table (PLT) to hook calls to dynamic libraries in ELF executables (Shoumikhin, 2010). The general idea is that the linker uses the information in the relocation tables to patch the program's control flow; manipulating these tables can therefore force the linker to patch a program for us.

A relocation table is an array of relocation entries, each describing the use of a symbol at a specific location in the program. Each entry specifies how this location needs to be patched to reference the actual address of the symbol as soon as the symbol has been loaded and its address is known. Because references and addressing are highly architecture dependent, a large number of different relocation types exist. On the x86-64 architecture, a relocation table is an array of Elf64_Rela structures, each storing the offset in the code where the relocation will be performed, the type of the relocation, the index of the referenced symbol, and an addend. Depending on the type of relocation, the addend has to be added to the symbol offset, for example when patching a RIP-relative reference in position independent code. There are 37 different types of relocation on x86-64 (Matz et al., 2012), of which only 5 are actually used in kernel modules (The Linux Kernel Archives, 2013, arch/x86/kernel/module.c).
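The following sketch shows how the symbol index and the relocation type are packed into the r_info field of an Elf64_Rela entry, using the Elf64_Rela type and ELF64_R_* macros from the <elf.h> header shipped with glibc. It also computes the patched value for two common x86-64 types. The helper names are hypothetical, only R_X86_64_64 and R_X86_64_PC32 are handled, and the overflow check for the 32-bit case is omitted.

```c
#include <elf.h>
#include <stdint.h>

/* Symbol index and relocation type are packed into r_info. */
static inline uint32_t rela_sym(const Elf64_Rela *r)  { return (uint32_t)ELF64_R_SYM(r->r_info); }
static inline uint32_t rela_type(const Elf64_Rela *r) { return (uint32_t)ELF64_R_TYPE(r->r_info); }

/* Make a relocation reference a different symbol while keeping its type. */
void rela_set_sym(Elf64_Rela *r, uint32_t new_sym)
{
    r->r_info = ELF64_R_INFO(new_sym, ELF64_R_TYPE(r->r_info));
}

/*
 * Value the linker writes at the patched location for two common x86-64
 * types, with S = symbol address, A = addend, P = address being patched.
 * Returns 0 on success, -1 for types this sketch does not handle.
 */
int rela_value(const Elf64_Rela *r, uint64_t S, uint64_t P, uint64_t *out)
{
    uint64_t A = (uint64_t)r->r_addend;   /* two's complement wrap is intended */
    switch (rela_type(r)) {
    case R_X86_64_64:                      /* absolute 64-bit: S + A */
        *out = S + A;
        return 0;
    case R_X86_64_PC32:                    /* RIP-relative: S + A - P, truncated to 32 bits */
        *out = (uint32_t)(S + A - P);
        return 0;
    default:
        return -1;
    }
}
```

Because the symbol index lives in the upper 32 bits of r_info, changing which symbol a relocation refers to is a single field update that leaves the patched location and the relocation type untouched.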

6.3.1 Interception of Module Initialization

Each kernel module contains a data structure called __this_module, which is automatically generated from the module source code at compile time through macro expansion. The resulting definition is available in the generated .mod.c file and is linked into the module using the relocation table for its section (.gnu.linkonce.this_module). The kernel then uses this data structure to call the initialization code pointed to by __this_module->init. The relocation table for this section has an entry that instructs the kernel to patch the address of the init_module function into this member of the struct. By modifying the symbol index in that relocation entry, we can make the linker patch any symbol we want into the struct when the module is loaded. Thus, to gain code execution it is sufficient to find this relocation entry and change its symbol index to that of the parasite's initialization function. This process is illustrated in Figure 6.2.


(Diagram: in the kernel module, the entry in .rela.gnu.linkonce.this_module that patches the .init member of .gnu.linkonce.this_module is redirected from init_module in .text to the parasite's init function.)

struct module __this_module
__attribute__((section(".gnu.linkonce.this_module"))) = {
    .name = KBUILD_MODNAME,
    .init = init_module,        /* relocation redirected to the parasite's init */
#ifdef CONFIG_MODULE_UNLOAD
    .exit = cleanup_module,
#endif
    .arch = MODULE_ARCH_INIT,
};

Figure 6.2: Relocation Hook of module->init

Note that we do not need to know anything about the layout of __this_module to do this: all information needed to patch this data structure is available in the relocation entry, and the patch itself is performed by the linker.
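A minimal sketch of the relocation hook on module->init follows, written against the <elf.h> types. It assumes that the relocation table of .gnu.linkonce.this_module, the symbol table, and the string table have already been located in the (injected) module, and that the symbol index of the parasite's init function is known. The function name hook_init_reloc is hypothetical. The sketch never looks at the layout of struct module; it only swaps the symbol index in r_info.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/*
 * Retarget every relocation in `rela` that references "init_module" so
 * that it references the symbol at index `parasite_init_idx` instead.
 * r_offset and the relocation type are preserved, so the kernel linker
 * still writes to the same (unknown to us) offset inside struct module.
 * Returns the number of patched entries (one is expected).
 */
size_t hook_init_reloc(Elf64_Rela *rela, size_t nrela,
                       const Elf64_Sym *symtab, size_t nsyms,
                       const char *strtab, Elf64_Xword parasite_init_idx)
{
    size_t patched = 0;
    for (size_t i = 0; i < nrela; i++) {
        Elf64_Xword sym = ELF64_R_SYM(rela[i].r_info);
        if (sym >= nsyms)
            continue;  /* malformed entry: leave it alone */
        if (strcmp(strtab + symtab[sym].st_name, "init_module") != 0)
            continue;
        rela[i].r_info = ELF64_R_INFO(parasite_init_idx,
                                      ELF64_R_TYPE(rela[i].r_info));
        patched++;
    }
    return patched;
}
```

Relocations that reference other symbols, such as cleanup_module, are left as they are, so the rest of __this_module is still populated exactly as the host's build intended.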

6.3.2 Communication with User Mode

Even after we have achieved code execution, we still lack a method of communicating with user space. A memory acquisition driver needs to receive instructions from user space on which physical pages to acquire, and it needs to pass these pages back to user space. One of the simplest and most commonly used methods for system- to user-mode communication in Linux is the character device. A kernel module can create a data structure called file_operations, which contains function pointers for operations like read, write, and llseek. The module then registers a major number with the kernel, which links the data structure to any inode referencing that major number. The system call mknod can be used from user space to create such an inode. Any file operations on this inode will be dispatched to the functions referenced in the corresponding file_operations data structure.

If the host module implements a character device, it must already have a compatible version of this struct in its .data or .rodata section (kernel modules usually initialize their file_operations statically at compile time). To populate the function pointers in this struct, there have to be relocation entries for this section, because the functions are placed in another section whose address is not known until the module is loaded. When the kernel loads the module, its linker relocates the sections and then places the addresses of all relevant functions into the file_operations data structure by parsing the corresponding relocation table. We can exploit this process by modifying the relocation table of the host to point to symbols of our choice instead of the original read and llseek functions implemented by the host module, as illustrated in Figure 6.3. When the parasitized module is loaded, the kernel linker will patch the data structure with function pointers to the parasite's read and llseek functions instead. The parasite can then call the register_chrdev API in the kernel with a pointer to this struct, which is guaranteed to be compatible with the running kernel. Knowledge of the file_operations layout is not necessary, because the relocation entries contain the necessary information. Our pointers will be placed at the correct offsets by the linker, and any read or llseek calls on a device inode with our major number will be dispatched to the parasite's read or llseek functions.

(Diagram: in the kernel module, the entries in .rela.data that patch the .llseek and .read members of lp_fops in .data are redirected from lp_llseek and lp_read in .text to the parasite's llseek and read functions.)

static struct file_operations lp_fops = {
    .owner  = THIS_MODULE,
    .llseek = lp_llseek,   /* relocation redirected to the parasite's llseek */
    .read   = lp_read,     /* relocation redirected to the parasite's read */
};

Figure 6.3: Relocation Hook of file_operations
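The file_operations hook can be sketched in the same way. The sketch below, again based on <elf.h> and with a hypothetical function name, retargets only those relocations whose r_offset falls inside the host's *_fops object and whose symbol name ends with a given suffix such as "_read". It assumes that the fops symbol and the relocation table refer to the same section (for example .data and .rela.data); in a relocatable object, both st_value and r_offset are offsets into that section.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if s ends with suffix. */
static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/*
 * Redirect relocations that patch a location inside the host's fops
 * object and reference a symbol ending in `suffix` to symbol `new_idx`.
 * Returns the number of patched entries.
 */
size_t hook_fops_reloc(Elf64_Rela *rela, size_t nrela,
                       const Elf64_Sym *symtab, size_t nsyms,
                       const char *strtab, const Elf64_Sym *fops_sym,
                       const char *suffix, Elf64_Xword new_idx)
{
    size_t patched = 0;
    for (size_t i = 0; i < nrela; i++) {
        Elf64_Addr off = rela[i].r_offset;
        Elf64_Xword sym = ELF64_R_SYM(rela[i].r_info);
        /* Only touch relocations that land inside the fops struct. */
        if (off < fops_sym->st_value ||
            off >= fops_sym->st_value + fops_sym->st_size || sym >= nsyms)
            continue;
        if (!ends_with(strtab + symtab[sym].st_name, suffix))
            continue;
        rela[i].r_info = ELF64_R_INFO(new_idx, ELF64_R_TYPE(rela[i].r_info));
        patched++;
    }
    return patched;
}
```

Calling it once with "_read" and once with "_llseek", each time with the corresponding parasite symbol index, would implement the redirection shown in Figure 6.3 without ever computing an offset inside struct file_operations.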

6.3.3 Selection of a Suitable Host

Due to the need for certain symbols and structs, this approach does not work with arbitrary kernel modules. However, most distributions ship with a large number of modules to handle many different hardware devices, which are found in /lib/modules/`uname -r`. We can scan this directory and select a host module that satisfies the following criteria:

• It contains a symbol with an _fops suffix in the .data or .rodata section, which indicates that it has a file_operations data structure available.
• It contains symbols with _read and _llseek suffixes, with relocation entries into the file_operations data structure. This is necessary for us to successfully patch file_operations.
• It imports the symbols register_chrdev and copy_to_user, which the parasite needs to register the file_operations struct with a major number and to copy data to user buffers when read is called.
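The symbol-level part of these criteria can be checked with a single pass over the host's symbol table. The sketch below is a hypothetical first-pass filter (the name host_is_candidate is ours): it assumes the section header indices of .data and .rodata are already known, treats undefined symbols (SHN_UNDEF) as imports, accepts either copy_to_user or _copy_to_user (see Section 6.4), and leaves out the check that the _read and _llseek relocations actually point into the file_operations object.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Return nonzero if s ends with suffix. */
static int has_suffix(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Nonzero if the symbol table satisfies the symbol-level host criteria. */
int host_is_candidate(const Elf64_Sym *symtab, size_t nsyms, const char *strtab,
                      Elf64_Half data_shndx, Elf64_Half rodata_shndx)
{
    int fops = 0, rd = 0, llseek = 0, chrdev = 0, ctu = 0;
    for (size_t i = 1; i < nsyms; i++) {          /* index 0 is the null symbol */
        const char *name = strtab + symtab[i].st_name;
        Elf64_Half shndx = symtab[i].st_shndx;
        if (shndx == SHN_UNDEF) {                 /* imported from the kernel */
            if (strcmp(name, "register_chrdev") == 0)
                chrdev = 1;
            if (strcmp(name, "copy_to_user") == 0 ||
                strcmp(name, "_copy_to_user") == 0)
                ctu = 1;
            continue;
        }
        if (has_suffix(name, "_fops") &&
            (shndx == data_shndx || shndx == rodata_shndx))
            fops = 1;
        if (has_suffix(name, "_read"))
            rd = 1;
        if (has_suffix(name, "_llseek"))
            llseek = 1;
    }
    return fops && rd && llseek && chrdev && ctu;
}
```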

If we find such a module on the target, we can load it into memory, inject the acquisition module, hook the relocations, and then pass the result to the init_module system call for linking into the kernel.


6.4 Implementation of a Minimal Acquisition Module

As mentioned in Section 6.1.2, it is important that the memory acquisition module imports as few kernel symbols as possible. While it is possible to apply the same technique to other data structures, as we did for the module and file_operations data structures, this increases the requirements on the host module. Each additional API we want to use adds a dependency that must be satisfied by some module on the target. This decreases the number of suitable modules, reducing the chance of finding a suitable host.

We have developed a minimal physical memory acquisition module, which relies only on the register_chrdev and copy_to_user symbols. The module is based on the techniques introduced in Chapter 5 and maps memory without kernel support. This is accomplished by directly editing the page tables and manually remapping parts of the module's data segment to the desired physical page. Memory acquisition modules commonly perform memory enumeration in kernel mode by parsing the iomem_resource tree (Sylve, 2012; Cohen, 2011). However, this requires knowledge of the layout of the resource data structure. We removed this functionality from the kernel module and leave the detection of the physical memory layout to the user-space imaging tool, which can achieve this by parsing /proc/iomem or by using PCI introspection as shown in Section 5.1.1.

In our original implementation of PTE Remapping, we used the preempt_disable and preempt_enable symbols to ensure that the module's thread cannot be interrupted and resumed on another CPU. Because the TLB of another CPU might still contain the old mapping for the remapped page, this could result in a corrupted image. Using these symbols would imply that we have to find a valid version magic for them on the target, which we do not want to rely on. We have replaced them by simply using the cli/sti instructions to disable interrupts for the brief period of remapping and copying a page.
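The core of the remapping step is plain bit manipulation on an x86-64 page table entry. The sketch below assumes 4 KiB pages, where bits 12 to 51 of a PTE hold the physical frame address and the remaining bits are flags (present, writable, NX, and so on); the names pte_remap and PTE_PFN_MASK are hypothetical. In the kernel module, writing the new entry would additionally require invalidating the stale TLB entry for the remapped virtual address (for example with invlpg) while interrupts are disabled with cli/sti; that part only runs in kernel mode and is omitted here.

```c
#include <stdint.h>

/* Physical frame address field of a 4 KiB x86-64 PTE: bits 12..51. */
#define PTE_PFN_MASK 0x000FFFFFFFFFF000ULL

/*
 * Return a copy of `pte` that maps the 4 KiB physical page containing
 * `phys`, preserving all flag bits of the original entry.
 */
uint64_t pte_remap(uint64_t pte, uint64_t phys)
{
    return (pte & ~PTE_PFN_MASK) | (phys & PTE_PFN_MASK);
}
```

Keeping the original flags means the remapped page stays present and writable exactly as the module's data segment was, so only the backing frame changes.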
We also removed debug logging from the module, as not every suitable host module might import printk. Furthermore, we removed all dynamic memory allocation from the pmem module and placed all data structures into the data segment. This even allows us to get rid of the kmalloc, vmalloc, kfree, and vfree symbols, as each module might use a different memory allocation API and we do not want to limit our selection of target modules this way.

Another important detail we discovered when trying to make a module as independent of the running kernel's version as possible concerns configuration options that affect APIs. For example, the copy_to_user API is an inline function that calls _copy_to_user after performing some debug bookkeeping on kernels with a specific config option enabled¹. Compiling in an environment where this option is enabled will result in a symbol dependency that kernels compiled without it cannot satisfy, thus limiting the scope where the module can be successfully loaded. This also causes problems when scanning for suitable hosts, as they import _copy_to_user when this option is enabled and copy_to_user when compiled without it. We have solved this problem by explicitly calling _copy_to_user in our module and modifying the symbol table to use the correct symbol depending on what the host uses. Since copy_to_user essentially calls _copy_to_user, this does not affect the code's correctness or stability.

¹ Kernels older than 3.0 have the CONFIG_DEBUG_SPINLOCK_SLEEP option; newer ones have CONFIG_DEBUG_ATOMIC_SLEEP.

Distribution    Kernel Version   Modules Available   Modules Suitable
Fedora 10       2.6.27           1746                4
Fedora 15       2.6.38           2280                14
Fedora 16       3.1              2384                14
Ubuntu 8.04     2.6.24           1939                6
Ubuntu 12.10    3.8              3708                14
Ubuntu 13.10    3.11             3957                15

Table 6.1: Host Modules by Kernel Version
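One possible way to perform the copy_to_user substitution described above exploits the fact that "copy_to_user" is a suffix of "_copy_to_user": pointing the parasite's undefined symbol one byte further into the string table yields the other name without adding a new string. This is an illustrative assumption, not necessarily how our implementation modifies the symbol table, and the function name match_copy_to_user is hypothetical.

```c
#include <elf.h>
#include <string.h>

/*
 * The parasite is assumed to be built against "_copy_to_user". If the host
 * imports plain "copy_to_user" instead, skip the leading underscore by
 * advancing st_name by one byte into the shared string table.
 * Returns 0 on success, -1 if `sym` is not the expected undefined import.
 */
int match_copy_to_user(Elf64_Sym *sym, const char *strtab, int host_uses_underscore)
{
    if (sym->st_shndx != SHN_UNDEF ||
        strcmp(strtab + sym->st_name, "_copy_to_user") != 0)
        return -1;
    if (!host_uses_underscore)
        sym->st_name += 1;  /* "_copy_to_user" -> "copy_to_user" */
    return 0;
}
```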

Finally, the build environment needs to be slightly tweaked, because some configuration options trigger dependencies on symbols that might not be available on the target system. For example, if the CONFIG_FUNCTION_TRACER option is enabled, all functions call the symbol __fentry__ at the beginning to enable ftrace functionality in the kernel (Rostedt, 2009). Any module compiled with this option will depend on the __fentry__ symbol, which is not available on kernels without ftrace.

We have evaluated our approach on multiple Linux distributions and kernel versions to provide data on how big the difference in kernel version can actually be while still being able to obtain a physical memory image. We compiled our parasite module on an Ubuntu system with kernel 3.8.0-34. We do not believe this technique will work on 2.4 kernels due to massive changes in module loading and relocation architecture, so we did not test these (Salzman et al., 2001).

We have tested our module on six different kernels and distributions as shown in Table 6.1. All tested systems had a number of suitable modules available, with newer kernels providing 14 to 15 different suitable host modules, and older kernels 4 to 6.

Our technique was successful in acquiring memory from all tested systems without crashes or any other major problems.


6.5 Summary

In this chapter, we have illustrated the creation of an ELF relinking library that is capable of injecting a kernel module into another module while taking care of all string, symbol, and relocation table dependencies. With a technique we have named relocation hooking, we have leveraged the information contained in a module's relocation tables to steal its data structures and use them to interact with the kernel in a stable manner. Furthermore, we have developed a physical memory acquisition kernel module that is independent of the version of the running kernel. It has been stripped down to the bare essentials and requires only two kernel APIs to function, because it uses the kernel independent memory mapping technique developed in Chapter 5. With the relinking library, we are able to load the binary module on any Linux kernel between 2.6.38 and 3.10, regardless of configuration or compiler options. Testing shows that our approach has no negative impact on system stability and provides reliable access to physical memory. This simplifies memory forensic procedures significantly and allows for physical memory acquisition even on systems where kernel headers are not available. It also minimizes the impact on the target system, as there is no need to install a build environment and compile software on the system that is to be analysed.


Chapter 7

Acquisition and Analysis of Compromised Firmware

In 2010, computer security researcher Dragos Ruiu noticed some very strange behaviour in his computers (Goodin, 2013). A number of his machines would suddenly delete data or change their configuration without prompting. He started investigating this issue and claimed to have discovered a malware species that infects the system firmware and actively propagates to other computers by modifying the firmware of connected USB devices. If a USB flash drive from an infected machine was plugged into a clean system, that system would suddenly exhibit the same symptoms. He even suspected the malware of communicating with other infected systems that were not connected to any network by means of high-frequency (HF) sound. The malware in this case was subsequently named BadBIOS. Ruiu was never able to prove his claims and present evidence of the malware's existence. And while the capabilities of BadBIOS might sound like something straight out of a science fiction movie, researchers have shown that bridging air gaps using HF sound is indeed viable (Hanspach and Goetz, 2013; O’Malley and Choo, 2014). Nohl et al. (2014) also demonstrated that it is possible to infect the firmware of USB devices with malicious software that can completely take over a system in fractions of a second without requiring user interaction. So even if BadBIOS only ever existed in Ruiu's imagination, it is possible to create such software. BIOS and UEFI have also been successfully attacked by researchers in the past (Wojtczuk and Tereshkin, 2009; Loukas, 2012). Firmware attacks have even been spotted in the wild. For example, the Mebromi malware has the ability to infect specific versions of Award BIOS to ensure its persistence on infected hosts (Giuliani, 2013). Recently leaked documents also show that state actors have been using this attack vector for a long time. The NSA internally advertises a tool called DEITYBOUNCE, capable of infecting the BIOS of Dell servers since 2007 (Schneier, 2014).
What makes this threat so dangerous is that it is extremely hard to detect. There is no anti-virus software on the firmware level, and SMM allows malware to leverage an execution environment that is completely hidden from the rest of the system. To detect and analyze malicious firmware, it is necessary to obtain the contents of the firmware ROMs. This can be accomplished either with a hardware Erasable Programmable ROM (EPROM) programmer (The Project, 2009) or with software that interacts with the ROM chip (The Flashrom Team, 2013). For example, the Copernicus project (Butterworth et al., 2013) aims at extracting malicious firmware code and data directly over the SPI bus. Because this approach is vulnerable to malicious software running in SMM, the latest implementation utilizes the Intel TXT extensions (Intel Corporation, 2014e) to isolate the acquisition module from other parts of the system (Kovah et al., 2014).

As we will show in this chapter, current memory forensic technology is also completely oblivious to malicious firmware. To close this research gap, we present a comprehensive study on current firmware rootkit techniques and the traces they leave on infected systems, and propose methods for identifying them in the course of memory forensic investigations. Utilizing the memory mapping and enumeration methods illustrated in Chapter 5, we show that it is possible to read firmware code and data from the system's memory bus. With this knowledge, we develop tools and techniques to integrate firmware acquisition into the forensic memory acquisition process. Our insights are implemented in standard open-source tools, which are published as part of the Rekall project (Cohen, 2014b). We evaluate our work using a proof-of-concept ACPI rootkit implementation and manipulated firmware images.

Outline of the Chapter

The remainder of this chapter is outlined as follows: In Section 7.1, we present a survey of current firmware rootkit techniques and their implications for memory forensics. We then describe a method for enumerating and acquiring firmware code and data from the memory bus in Section 7.2. In Section 7.3, we discuss the analysis of the acquired data, followed in Section 7.4 by an evaluation of how well these insights are already incorporated in common forensic suites and applications. Aspects and limitations that need to be considered when applying the respective concepts in real-world investigations are discussed in Section 7.5. We conclude with a short summary of our work in Section 7.6.

7.1 Rootkit Strategies for Compromising Firmware

In this section, we present a survey of the current state of the art in x86 firmware-based malware techniques. We group exploits by the technology used and point out the traces that are recoverable using memory forensics. While firmware rootkits are highly target-specific and require a lot of in-depth knowledge to develop, malware authors have demonstrated that building working prototypes is feasible, and various approaches have already been adopted by different species “in the wild” (Giuliani, 2013).

7.1.1 BIOS- and EFI-Based Attacks

As Bulygin et al. (2014) report, a large number of BIOS/EFI attacks have been carried out successfully in the past. Despite update signature verification, secure boot, and other security measures at the firmware level, many feasible attack vectors still exist. In the following, we give a brief overview of common system compromise strategies. When an x86 computer is first switched on, the ROM containing the firmware is initially writable through the SPI bus. This is necessary to permit the legitimate installation of firmware updates. Before control is handed to the operating system, however, the SPI flash must be properly locked down to prevent software from overwriting the ROM. Many vendors fail at these tasks and leave the respective areas open for manipulation (Bulygin, 2013; Bulygin et al., 2013). As a consequence, malicious code may flash the firmware ROM directly from kernel space and incorporate malevolent functionality. In addition, most BIOS update implementations do not require a cryptographic signature; they process any source file as long as it matches a given format. This flaw was exploited by the Mebromi rootkit to infect versions of Award BIOS (Bulygin et al., 2014). In contrast, modern firmware technologies based on EFI are more wary of such attack vectors and attempt to verify update requests more rigorously. However, as Wojtczuk and Tereshkin (2009) argue, the respective verification algorithms may themselves contain unintentional errors and thus be susceptible to attack. Even with all software measures perfectly implemented, a malicious adversary at an arbitrary position in the supply chain can modify a system's firmware with the help of a flash programmer. As recently outlined by Brossard (2012), the original firmware image can be replaced with a malicious one using open firmware like Coreboot (Minnich, 2014), SeaBIOS (O'Connor, 2014), or iPXE (Brown, 2014). All these attacks ultimately result in reprogramming of the firmware flash ROM. As laid out in Section 2.3, this ROM chip is mapped on the memory bus from 0xF0000 to 0xFFFFF.
It is thus possible to include this region in a memory image for analysis.
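As an illustration, here is a minimal Python sketch that slices the legacy firmware regions out of a raw memory image. It assumes a zero-padded raw image in which the file offset equals the physical address. The region table is transcribed from the test machine in Figure 7.1; the EBDA start in particular varies between machines. `LEGACY_REGIONS`, `extract_region`, and `region_of` are illustrative names and are not part of any existing tool.

```python
# Legacy regions in the first megabyte, as laid out on the test machine of
# Figure 7.1. Ranges are [start, end); the EBDA start differs between machines.
LEGACY_REGIONS = {
    "EBDA": (0x9FC00, 0xA0000),
    "Video Window": (0xA0000, 0xC0000),
    "PCI Option ROMs": (0xC0000, 0xE0000),
    "Lower BIOS": (0xE0000, 0xF0000),
    "Upper BIOS": (0xF0000, 0x100000),
}


def extract_region(image: bytes, name: str) -> bytes:
    """Return the bytes of a named legacy region from a raw, zero-padded image.

    Assumes file offset == physical address, which holds for padded raw
    images but not for sparse or crash-dump formats.
    """
    start, end = LEGACY_REGIONS[name]
    if end > len(image):
        raise ValueError(f"image too small to contain {name}")
    return image[start:end]


def region_of(address: int):
    """Name of the legacy region containing a physical address, or None."""
    for name, (start, end) in LEGACY_REGIONS.items():
        if start <= address < end:
            return name
    return None
```

The extracted Upper BIOS bytes can then be handed to a disassembler or compared against a known-good ROM image.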

7.1.2 PCI Option ROM-Based Attacks

Because some PCI devices require custom initialization, system firmware loads and executes any option ROM provided by devices during boot time. This code runs in firmware context while SPI flash is unlocked and can therefore patch the firmware ROM effortlessly. For instance, as Brossard(2012) points out, it is possible to load a bootkit over the built-in Wifi or WiMax devices of the system by flashing a malicious option ROM onto a network card. Thereby, firewalls or intrusion detection systems can be bypassed. A vulnerable firmware version can also be directly exploited over the network: Triulzi (2010) outlines techniques for remotely reflashing the firmware of specific network cards. Even worse, because PCI devices have unrestricted access to physical memory, additional malicious code may be downloaded in order to further propagate into the local network.


Last but not least, a system may also be compromised by attaching a malicious device to a hardware port and then initiating a reboot. For example, Loukas (2012) shows how an Apple computer may be infected with malware by connecting a small Ethernet adapter to the Thunderbolt port. Because Thunderbolt hardware has direct access to the PCI bus and, thus, to physical memory, the machine is prone to attack, as explained above. Additionally, Hudson (2014) demonstrates that it is possible to infect the EFI from a malicious Thunderbolt option ROM. The previously described attacks result in the introduction of one or more new PCI option ROMs into the system. Firmware maps each such ROM somewhere into the physical address space and stores a pointer to its location in PCI configuration space. Like the firmware ROM, option ROMs can also be read over the memory bus, and thus their code can also be included in a memory image. Furthermore, firmware copies option ROMs into the option ROM memory area (0xC0000 - 0xE0000) for execution (see Section 2.3.3). This area is actually RAM and should also be included in a memory image.
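Option ROMs copied into the 0xC0000 - 0xE0000 window can be located in an image by their legacy header. The sketch below makes three assumptions:

- the conventional header layout: the bytes 0x55 0xAA, followed by the ROM length in 512-byte units;
- the 2 KiB scan granularity used by legacy BIOSes;
- a padded raw image.

It additionally checks that all bytes of the ROM sum to zero modulo 256. `find_option_roms` is a hypothetical helper and does not come from Rekall or Volatility.

```python
OPTION_ROM_START = 0xC0000
OPTION_ROM_END = 0xE0000
SCAN_STEP = 0x800  # legacy BIOSes look for option ROMs on 2 KiB boundaries


def find_option_roms(image: bytes, start=OPTION_ROM_START, end=OPTION_ROM_END):
    """Return (address, size, checksum_ok) for each option ROM header found."""
    end = min(end, len(image))
    roms = []
    addr = start
    while addr + 3 <= end:
        if image[addr] == 0x55 and image[addr + 1] == 0xAA and image[addr + 2]:
            size = image[addr + 2] * 512  # length byte counts 512-byte blocks
            body = image[addr:addr + size]
            checksum_ok = len(body) == size and sum(body) % 256 == 0
            roms.append((addr, size, checksum_ok))
            # Resume after the ROM, rounded up to the next scan boundary.
            addr += -(-size // SCAN_STEP) * SCAN_STEP
        else:
            addr += SCAN_STEP
    return roms
```

A ROM whose checksum does not match is still reported. It may have been modified after being copied into RAM, which is itself worth a closer look.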

7.1.3 ACPI-Based Attacks

ACPI programs run in kernel space and therefore have full permission to operate on the physical address space. Sensitive data structures could, in theory, be protected efficiently by filtering the respective instructions in the AML virtual machine. To the best of our knowledge, however, no major operating system has implemented such restrictions yet. Neither Linux up to kernel 3.15 nor Windows up to version 8 has security measures in place to prevent ACPI programs from subverting the system core. Because the ACPI tables are provided by the firmware, they are implicitly trusted. In the presence of a skilled adversary, this assumption may prove devastating. The vulnerability we have just outlined can be exploited in several ways: First, it is possible to patch the ACPI tables directly in the firmware image. In addition, because the tables are copied to memory and must be located by the operating system, a malicious bootkit has the chance to modify them before this happens. Alternatively, a manipulated version of the tables can be placed right in front of the firmware-provided copy. Since the location of the tables is not strictly defined and must be retrieved by the operating system with the help of a signature-based scan (see Section 2.3.4), only the manipulated version is found, while the original and legitimate code is never executed. As a consequence, an ACPI rootkit may be embedded in the firmware ROM on the mainboard, in any PCI option ROM, on a connected PCI device, or even in an EFI driver module. Detection and removal of such a threat is cumbersome, and most of the described methods even survive a complete wipe of the hard disk.
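The shadowing trick works because the scan stops at the first valid match. The sketch below scans the 0xE0000 - 0xFFFFF range the way an operating system would. It assumes a padded raw image and the ACPI 1.0 RSDP layout:

- the signature "RSD PTR " on a 16-byte boundary;
- a checksum over the first 20 bytes;
- the 32-bit RSDT address at offset 16.

Real implementations also search the first KiB of the EBDA, which is omitted here. `find_rsdp` and `make_rsdp` are hypothetical helpers for illustration.

```python
import struct

RSDP_SIGNATURE = b"RSD PTR "


def find_rsdp(image, start=0xE0000, end=0x100000):
    """Return (address, rsdt_address) of the first valid RSDP in [start, end)."""
    for addr in range(start, end - 20 + 1, 16):  # RSDP is 16-byte aligned
        if image[addr:addr + 8] != RSDP_SIGNATURE:
            continue
        if sum(image[addr:addr + 20]) % 256 != 0:
            continue  # ACPI 1.0 checksum covers the first 20 bytes
        (rsdt,) = struct.unpack_from("<I", image, addr + 16)
        return addr, rsdt  # first match wins; later copies are never examined
    return None


def make_rsdp(rsdt_address):
    """Build a 20-byte ACPI 1.0 RSDP (hypothetical helper for the demo)."""
    rsdp = bytearray(RSDP_SIGNATURE + b"\x00" + b"OEMID " + b"\x00")
    rsdp += struct.pack("<I", rsdt_address)
    rsdp[8] = -sum(rsdp) % 256  # checksum byte makes the sum zero mod 256
    return bytes(rsdp)


if __name__ == "__main__":
    image = bytearray(0x100000)
    image[0xF0000:0xF0014] = make_rsdp(0x7FFF0000)  # genuine copy
    image[0xE0000:0xE0014] = make_rsdp(0x7FFE0000)  # forged copy placed in front
    print(hex(find_rsdp(image)[1]))  # prints 0x7ffe0000
```

With the forged copy at the lower address, the genuine pointer at 0xF0000 is never reached.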


A proof-of-concept implementation of an ACPI rootkit for the Linux kernel has already been published (see Heasman, 2006). The rootkit hooks all unused system calls by overwriting the sys_ni_syscall() function with the instructions call ebx; ret;. Because the ebx register is controlled by code running in user space, effectively all programs, regardless of their privilege level, are able to execute code in kernel space. The concept can be used, e.g., to illegitimately gain additional permissions or to load additional kernel rootkits even if kernel module loading has been disabled. However, at the time of this writing, we are not aware of these insights being actively abused by malicious programs “in the wild”. No matter what the original attack vector was, the ACPI tables have to be placed in RAM for the OS to find and execute them. If not already present, they should definitely be included in a memory image. If they are supplied by the firmware or a malicious option ROM, all firmware ROMs should be included in the memory image as well.
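On a 32-bit kernel, the patch described above consists of three bytes. An analyst who has already located sys_ni_syscall in the image can therefore check for it directly; symbol resolution is assumed here and not shown.

The byte encoding follows from the x86 instruction set: call r/m32 is opcode FF /2, and ModRM 0xD3 selects ebx. On x86-64, the same bytes decode as call rbx; ret. `check_ni_syscall` and `find_call_ebx_ret` are hypothetical helpers in a minimal sketch.

```python
# x86 machine code of the patch:
#   FF D3   call ebx  (opcode FF /2 with ModRM 0xD3 = register operand ebx)
#   C3      ret
CALL_EBX_RET = bytes([0xFF, 0xD3, 0xC3])


def check_ni_syscall(image: bytes, phys_addr: int) -> bool:
    """Return True if the code at phys_addr starts with 'call ebx; ret'.

    phys_addr is the physical address of sys_ni_syscall, which must be
    resolved separately (e.g., from kernel symbols); a padded raw image
    is assumed.
    """
    return image[phys_addr:phys_addr + len(CALL_EBX_RET)] == CALL_EBX_RET


def find_call_ebx_ret(image: bytes, start: int, end: int):
    """List every offset in [start, end) where the byte sequence occurs.

    The sequence can also appear legitimately, so hits outside
    sys_ni_syscall need manual review.
    """
    hits = []
    pos = image.find(CALL_EBX_RET, start, end)
    while pos != -1:
        hits.append(pos)
        pos = image.find(CALL_EBX_RET, pos + 1, end)
    return hits
```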

7.2 Enumeration of Firmware in the Physical Address Space

As we have shown in Section 7.1, there are many regions in the physical address space that contain firmware code and data. Figure 7.1 illustrates the layout of the address space on a machine we set up specifically for testing. Regions containing physical RAM are not highlighted and are marked “Memory”. These are already acquired with a standard physical memory dump (see Section 3.2). The blue regions contain firmware code or data that can be accessed through the memory bus. They have to be incorporated into the memory image if firmware analysis is to be performed. Regions marked in red represent memory-mapped I/O and must not be touched. Even reading from these regions can trigger an interrupt on the device, leading to data corruption and system crashes.

7.2.1 Enumeration of the Physical Address Space

As pointed out in Section 3.2, memory acquisition software commonly relies on the operating system to identify and map physical memory. More precisely, imaging programs copy only those parts of the address space that are explicitly marked as RAM. On Microsoft Windows, the MmGetPhysicalMemoryRanges API can be used to query the memory manager for the physical memory layout. However, other, less common methods exist. On systems with a BIOS, for instance, the firmware memory map can be queried in real mode by setting the eax register to 0xE820 and repeatedly invoking interrupt 0x15. This method is usually applied by the boot manager, and the retrieved information is passed to the operating system for further processing. At runtime, it is not advisable to manually switch to real mode from a driver, as this can cause system instabilities. Fortunately, since Windows Vista, the kernel's HAL includes an undocumented BIOS emulation module that permits drivers to access BIOS services directly (Chappell, 2010).

Physical Memory Ranges (top of the address space: 0xFFFFFFFF)

0xF03FFFFF - 0xF080C000   PCI MMIO
0xF0000000 - 0xF0020000   PCI MMIO
0xE0000000 - 0xE8000000   PCI MMIO
from 0x7FFF0000           ACPI Tables
0x00100000 - 0x7FFF0000   Memory
0x000F0000 - 0x00100000   Upper BIOS
0x000E0000 - 0x000F0000   Lower BIOS
0x000C0000 - 0x000E0000   PCI Option ROMs
0x000A0000 - 0x000C0000   Video Window
0x0009FC00 - 0x000A0000   EBDA
0x00000000 - 0x0009FC00   Memory

Figure 7.1: Firmware Memory Ranges

Each memory enumeration method provides a unique view of the physical address space. None of them is entirely accurate, though, because most devices (especially on the PCI bus) are not directly managed by the operating system but by a vendor-supplied driver. In Figure 7.2, we present a comparison of three major sources of information on the physical address space. Regions that contain firmware code or data are marked in blue, while regions that are reserved by devices and must not be read are marked in red. Unmarked regions are unknown; they might be backed by memory or mapped by devices. If the location of device MMIO regions is unknown, software must not access unmarked regions, in order to ensure system stability.

The most incomplete view of the physical address space is returned by the MmGetPhysicalMemoryRanges API of the Windows memory manager, shown on the right of the figure. As we have argued in the previous section, memory imaging programs only acquire those ranges that the operating system identifies as “available”. For safety reasons, other areas are ignored, including regions of memory that are used by the firmware. For this reason, memory images obtained through this method are not suited for firmware examinations. On one test system we analyzed, the resulting memory image contained only two ranges of physical memory. The remaining regions in the image are either zero-padded or not part of the image at all (e.g., when using the crash dump approach (Microsoft Corporation, 2011)).

Figure 7.2: Views on the Physical Address Space (columns, from left to right: Static + PCI, BIOS E820, Memory Manager)

As depicted in the center of Figure 7.2, the BIOS provides a better view of the physical address space. In addition to the memory regions identified by the memory manager, the BIOS also keeps track of memory used by ACPI. Furthermore, right at the end of the first memory region there are 3,072 bytes of memory that the operating system does not know about (hidden memory, as illustrated in Section 4.3). (U)EFI offers a service similar to the BIOS memory map. However, because it is a boot service, it is no longer available once the boot manager has handed control to the operating system. The layout and classification of memory ranges is the same, though.
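Comparing the E820 map with the memory manager's view is a simple interval subtraction. This sketch makes two assumptions:

- the classic 20-byte E820 entries (64-bit base, 64-bit length, 32-bit type, where type 1 denotes usable RAM) have already been collected into a buffer, for example by a boot loader; how they are obtained at runtime is not shown;
- the extended 24-byte ACPI 3.0 entries are not handled.

`parse_e820`, `subtract`, and `hidden_memory` are hypothetical helpers.

```python
import struct

# Classic E820 entry: 64-bit base, 64-bit length, 32-bit type (20 bytes).
E820_ENTRY = struct.Struct("<QQI")
E820_TYPES = {1: "usable", 2: "reserved", 3: "ACPI reclaimable", 4: "ACPI NVS", 5: "unusable"}


def parse_e820(raw: bytes):
    """Decode a buffer of consecutive 20-byte E820 entries into (base, length, type)."""
    return list(E820_ENTRY.iter_unpack(raw))


def subtract(ranges, holes):
    """Return the parts of (start, length) ranges that no (start, length) hole covers."""
    result = []
    for start, length in ranges:
        pieces = [(start, start + length)]
        for h_start, h_len in holes:
            h_end = h_start + h_len
            remaining = []
            for s, e in pieces:
                if h_end <= s or h_start >= e:
                    remaining.append((s, e))  # no overlap
                    continue
                if s < h_start:
                    remaining.append((s, h_start))  # part before the hole
                if h_end < e:
                    remaining.append((h_end, e))  # part after the hole
            pieces = remaining
        result.extend((s, e - s) for s, e in pieces)
    return result


def hidden_memory(e820_entries, os_ranges):
    """Usable RAM according to E820 that the OS memory manager does not report."""
    usable = [(base, length) for base, length, kind in e820_entries if kind == 1]
    return subtract(usable, os_ranges)
```

Suppose the E820 map reports usable memory up to 0x9FC00 while the memory manager's first range ends at 0x9F000, as in Figure 7.2. Then `hidden_memory` yields one range at 0x9F000 with a length of 0xC00, i.e., the 3,072 bytes mentioned above.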


The most exhaustive map of the physical address space can be constructed by combining knowledge from the architecture specifications with an enumeration of PCI configuration space. This view is illustrated on the left side of Figure 7.2. As discussed in Section 2.3, the physical address space layout in the first megabyte is well-defined. There are designated regions in the physical address space for PCI option ROM execution, the BIOS/UEFI, and the EBDA. Note that the firmware ROMs in these regions are no longer actually mapped ROMs: for performance reasons, firmware migrates into memory during initialization (see Sections 2.3 and 2.3.3). It is therefore safe to read from these addresses and acquire them just like regions that are explicitly marked as RAM. The memory layout above the first megabyte is not defined and depends on the amount of installed memory as well as on the number of installed devices. Because devices map registers and memory into this part of the address space, simply iterating through the entire area would be dangerous: the respective accesses could trigger interrupts and result in undefined behavior and loss of data. Therefore, to avoid instabilities, software needs to consult the firmware or operating system about which areas are safe to read. Because the ACPI tables lie somewhere outside of the memory regions reported by the operating system, it is prudent to acquire as much of the upper part as can be read safely. Furthermore, it is trivial for malware to hook the kernel memory enumeration APIs and hide from the acquisition. Because the real danger of accessing memory outside the available regions comes from touching PCI device memory, it is best to simply exclude all MMIO regions and acquire all remaining sections. This can be accomplished with our PCI memory enumeration technique described in Section 5.1.1.
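Section 5.1.1 is not reproduced here, but the core step of PCI-based MMIO enumeration is decoding a device's Base Address Registers (BARs). The following sketch decodes the base and size of a 32-bit memory BAR. It assumes the standard PCI sizing procedure, in which software:

1. saves the BAR;
2. writes all ones to it;
3. reads the value back;
4. restores the original value.

Configuration space access itself is not shown, 64-bit BARs (which span two registers) are rejected rather than decoded, and `decode_mem_bar` is a hypothetical name.

```python
def decode_mem_bar(original: int, readback: int):
    """Decode a 32-bit memory BAR.

    original: BAR value as read from configuration space.
    readback: value read back after writing 0xFFFFFFFF to the BAR.
    Returns (base, size), or None for I/O BARs and unimplemented BARs.
    """
    if original & 0x1:
        return None  # bit 0 set: I/O space BAR, not part of the memory map
    if (original >> 1) & 0x3 == 0x2:
        raise NotImplementedError("64-bit BAR: combine with the next register")
    mask = readback & 0xFFFFFFF0  # low 4 bits are type/prefetch flags
    if mask == 0:
        return None  # BAR not implemented by the device
    size = (~mask + 1) & 0xFFFFFFFF  # lowest writable address bit = size
    return original & 0xFFFFFFF0, size
```

Consider, for example, a BAR that reads 0xF0000000 and returns 0xFFFE0000 after the all-ones write. It would describe the 128 KiB region 0xF0000000 - 0xF0020000, which has the shape of one of the PCI MMIO ranges in Figure 7.1.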
To sum up, the non-red regions on the left of Figure 7.2 do not necessarily contain RAM. Reading from parts of the physical address space that are not mapped simply returns zeroes¹. The resulting image is significantly larger than an image that solely comprises ranges marked as “available”, but it includes the entire firmware code and data.
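The acquisition policy described above, reading everything that is not MMIO, amounts to taking the complement of the MMIO ranges. Here is a minimal sketch, assuming the MMIO ranges have already been enumerated (for example from PCI BARs) and a 32-bit physical address space as in Figure 7.1. `acquirable_ranges` is an illustrative name. Note that devices outside the PCI bus would not appear in such a list.

```python
def acquirable_ranges(mmio, limit=1 << 32):
    """Return the [start, end) gaps between MMIO ranges inside [0, limit).

    mmio is an iterable of (start, end) pairs with exclusive ends, e.g., as
    decoded from PCI BARs. Overlapping or unsorted input is tolerated.
    """
    gaps = []
    cursor = 0
    for start, end in sorted(mmio):
        start = min(max(start, 0), limit)
        end = min(max(end, 0), limit)
        if start > cursor:
            gaps.append((cursor, start))  # safe region before this MMIO range
        cursor = max(cursor, end)  # skip over the MMIO range
    if cursor < limit:
        gaps.append((cursor, limit))
    return gaps
```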

7.2.2 Mapping of Memory and Firmware Regions

Some of the firmware regions in the physical address space we have identified are actually RAM. The ACPI tables, the EBDA, and the PCI option ROM area in the first megabyte are stored in memory and can thus be accessed using conventional methods like kmap. Others, such as PCI option ROMs, are memory-mapped I/O, which can cause problems with standard kernel memory mapping functions due to caching constraints. While it is possible to use ioremap_nocache on Linux or MmMapIoSpace on Windows to access them, we prefer to bypass the operating system for accessing device memory. If an area of memory has already been mapped by a driver or even the kernel itself, care has to be taken to conform to its caching attributes to avoid memory corruption. The Windows kernel will actually prevent any attempt to map memory that has already been mapped with different caching attributes, making standard operating system memory mapping facilities unreliable (Vidstrom, 2006). We can use the PTE remapping technique described in Section 5.1.2 to map firmware memory. In fact, our implementation in Chapter 5 is already capable of acquiring firmware this way. Because our method uses a separate mapping and is guaranteed to only read from this mapping, we can avoid problems with cache coherence and alignment requirements. The operating system cannot interfere with this, because we bypass the memory management APIs and create the mapping manually. The resulting memory image contains all memory as well as firmware code and data, and can be analyzed using standard tools like Rekall (Cohen, 2014b) or Volatility (Walters, 2014).

¹ It is possible that some systems return another pattern or even data that is still on the bus from a previous read. However, we have not witnessed such behavior during our tests.
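The arithmetic at the heart of PTE remapping is small. The Python sketch below computes the new value of the rogue page's PTE for a target physical address, assuming x86-64 paging with 4 KiB pages. An actual driver would write this value to the PTE in kernel memory and then flush the stale translation (for example with invlpg); neither step can be expressed in Python. `remap_pte` and `rogue_address` are hypothetical helpers.

```python
# Bits 12..51 of an x86-64 page table entry hold the physical frame address
# (assuming 4 KiB pages and the architectural 52-bit physical address limit).
PFN_MASK = 0x000FFFFFFFFFF000
PAGE_OFFSET_MASK = 0xFFF


def remap_pte(pte: int, target_phys: int) -> int:
    """Return pte with its frame replaced by the frame containing target_phys.

    All other bits (present, writable, caching attributes, NX in bit 63, ...)
    are kept unchanged.
    """
    return (pte & ~PFN_MASK) | (target_phys & PFN_MASK)


def rogue_address(rogue_va: int, target_phys: int) -> int:
    """Virtual address at which target_phys becomes readable after the remap."""
    return rogue_va + (target_phys & PAGE_OFFSET_MASK)
```

Keeping the original flag bits means the rogue mapping retains the caching attributes it was allocated with. This is the property the text relies on when reading firmware memory through a private mapping.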

7.3 Firmware Analysis

Firmware implementations are platform dependent, and executable formats and code compression schemes vary from vendor to vendor. Presenting generic firmware code analysis and verification solutions is beyond the scope of this thesis. However, since the memory locations of firmware code are clearly defined, it is trivial to disassemble it with the Rekall dis plugin, or to extract it to the filesystem with the dump plugin for analysis with specialized software like IDA Pro (Hex-Rays, 2005). ACPI code, on the other hand, allows for more automation on the analysis side. We have created two plug-ins for the Volatility (Walters, 2014) and Rekall (Cohen, 2014b) frameworks: one for dumping the ACPI tables from a memory image, and another one for scanning these tables for potential rootkits. To acquire the ACPI tables from memory, we mirror the process used by the OS to find them (see Section 2.3.4). First, a signature-based scan is performed for the Root System Description Pointer (RSDP), which holds the address of the RSDT. When the RSDT is found, we follow the pointers inside it to locate the other ACPI tables. Our plugin then writes the tables out to the filesystem for analysis. For the analysis itself, we first decompile the tables and, in a second step, examine them for signs of malicious behavior. The central technique for manipulating kernel memory from an ACPI program is the definition of so-called operation regions, which determine which part of the address space is accessed. Our method for detecting malicious behavior is thus to identify all operation regions that reference kernel memory. An investigator can then use this information to focus on exactly those sections of the ACPI program during an investigation. The plugin uses the official AML decompiler (Intel Corporation, 2014a) to transform the AML code into ACPI Source Language (ASL). The resulting ASL code is subsequently scanned. All operation regions referencing critical memory, i.e., parts of physical memory that contain kernel code and data, are flagged as suspicious.
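The triage step can be approximated on decompiled ASL text. The following sketch is not the plug-in's actual code. It assumes simple single-line declarations of the form OperationRegion (Name, SystemMemory, Offset, Length) and a list of physical ranges known to hold kernel code and data. Regions whose offset or length is not a plain decimal or hexadecimal literal are reported as unknown, which mirrors the dynamic case discussed in the evaluation. Like the plug-in, it cannot tell legitimate ACPI memory inside these ranges apart from kernel data structures.

```python
import re

# Matches single-line declarations such as
#   OperationRegion (GNVS, SystemMemory, 0x7FFF0000, 0x0100)
OPERATION_REGION = re.compile(
    r"OperationRegion\s*\(\s*(\w+)\s*,\s*(\w+)\s*,\s*([^,]+?)\s*,\s*([^)]+?)\s*\)"
)


def literal(token: str):
    """Integer value of a decimal or hex literal; None for anything else."""
    try:
        return int(token, 0)
    except ValueError:
        return None  # names, method calls, and other dynamic expressions


def classify_operation_regions(asl: str, kernel_ranges):
    """Classify SystemMemory operation regions in decompiled ASL.

    kernel_ranges: (start, end) physical ranges, end exclusive, that hold
    kernel code and data. Returns a list of (region_name, verdict).
    """
    verdicts = []
    for name, space, offset, length in OPERATION_REGION.findall(asl):
        if space != "SystemMemory":
            continue  # I/O ports, PCI config space, etc. are out of scope here
        start, size = literal(offset), literal(length)
        if start is None or size is None:
            verdicts.append((name, "unknown"))  # target depends on runtime state
        elif any(start < k_end and k_start < start + size
                 for k_start, k_end in kernel_ranges):
            verdicts.append((name, "suspicious"))  # overlaps kernel memory
        else:
            verdicts.append((name, "benign"))
    return verdicts
```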

7.4 Evaluation

We have evaluated the created tools for stability and correctness and, in the case of the ACPI triage plug-in, for detection rate and the number of false positives and negatives. We set up several physical and virtual machines and created copies of their physical address space. The machines had the following configurations:

• A Lenovo x220 notebook with an Intel Sandy Bridge CPU and 8 GB of DDR3 RAM running Ubuntu 12.04 x64
• A Dell workstation with an Intel Ivy Bridge CPU and 8 GB of DDR3 RAM running Windows 8.1 x64
• A VirtualBox virtual machine with 4 GB of RAM running Debian 7 x64 with kernel 3.2.41
• A VirtualBox virtual machine with 2 GB of RAM running Windows 7 SP1 x64

7.4.1 Stability and Correctness of the Acquisition Method

All acquisition operations completed successfully every time. In every image, we could identify the firmware regions along with their data. We were not able to verify the firmware, though, because we had no EEPROM reprogramming hardware and thus no access to the original contents of the firmware ROM. Additionally, because most firmware implementations are compressed to save space, proper verification would require reverse engineering of the firmware compression algorithm and analysis of the decompressed ROM image. To establish correct firmware acquisition without access to the ROM nonetheless, we leveraged features of virtualization software. Specifically, qemu-kvm (Linux Kernel Organization, 2014) permits loading custom BIOS images via the -bios command line option. With the help of the -option-rom parameter, it is possible to load a custom option ROM as well. We started a qemu-kvm-based virtual machine with a version of SeaBIOS (O'Connor, 2014) and an iPXE option ROM (Brown, 2014). By acquiring memory from inside the virtual machine, we obtained an image with known BIOS and PCI option ROM code. We were able to find fragments of the iPXE and SeaBIOS images in the created memory images at their expected locations. In addition, we could attribute parts of the dumped firmware to the supplied ROM images. Other parts, however, were heavily modified and are likely to have been space-optimized in memory. Further experiments are needed to confirm these assumptions.
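Locating known ROM fragments in an image can be automated. The sketch below assumes that both the reference ROM file and a raw image are available as byte strings. It cuts the reference into fixed-size chunks and searches for each chunk in the image. Chunks consisting of a single repeated byte (typically padding) are skipped, because they would match almost anywhere. The chunk size of 256 bytes and the function names are arbitrary choices for illustration, not necessarily the procedure used in our experiments.

```python
def locate_fragments(reference: bytes, image: bytes, chunk_size: int = 256):
    """Return (offset_in_reference, offset_in_image or -1) for each chunk.

    Trailing bytes that do not fill a whole chunk are ignored.
    """
    hits = []
    for off in range(0, len(reference) - chunk_size + 1, chunk_size):
        chunk = reference[off:off + chunk_size]
        if chunk.count(chunk[:1]) == chunk_size:
            continue  # uniform padding would produce meaningless matches
        hits.append((off, image.find(chunk)))
    return hits


def coverage(hits) -> float:
    """Fraction of non-padding chunks that were found somewhere in the image."""
    if not hits:
        return 0.0
    return sum(1 for _, pos in hits if pos >= 0) / len(hits)
```

Chunks that are found at unexpected offsets, or not found at all, point to the parts that firmware has relocated or transformed in memory.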


Acquisition Tool          Firmware Acquired
Memoryze                  ✗
FTK Imager                ✗
Moonsols DumpIt           ✗
WinPmem                   ✗
WinPmem (pci)             ✓
WindowsMemoryReader       ✗
LiMe                      ✗
Pmem                      ✗
Pmem (pci)                ✓

Table 7.1: Firmware Acquisition Capabilities of Memory Forensic Software

7.4.2 Comparison with Available Memory Acquisition Solutions

We have evaluated a large group of freely available memory acquisition solutions to see whether they are capable of correctly obtaining firmware code and data. The results of our evaluation are depicted in Table 7.1. An entry with the suffix (pci) denotes a version of the program that supports PCI address space enumeration (see Section 7.2.1). As can be seen, only these two versions were able to acquire all firmware code and data. All other tools simply imaged the “available” ranges supplied by the Windows memory manager or, on Linux systems, by the iomem_resource tree (see Section 3.2.2). Their images therefore do not contain any firmware-related code or data.

7.4.3 Detection of ACPI Rootkits

We created a simple ACPI rootkit that is capable of modifying the Linux kernel and setting up a hidden backdoor, analogous to the proof of concept by Heasman (2006) described in Section 7.1.3. The rootkit was installed on five virtual machines running Fedora 19, Ubuntu 12.04, Debian 7, OpenSuse 12.3, and Windows XP, as well as on two physical Intel Sandy Bridge systems running Ubuntu 12.04. Each system was analyzed with the scanner plug-in we developed for the Volatility framework (see Section 7.3). Further tests were conducted with non-infected ACPI tables of original manufacturers, as well as with manually manipulated tables that covered a wide range of malicious accesses to kernel memory. The objective of our experiments was to examine ACPI-related data structures and automatically distinguish potentially infected components from legitimate program parts. In total, 299 operation regions were evaluated. The corresponding results are shown in Table 7.2.


             Correctly Classified   Falsely Classified   ∑
Malicious    13.0%                  16.4%                29.4%
Benign       61.9%                                       61.9%
Unknown      8.7%                                        8.7%
∑            83.6%                  16.4%                100%

Table 7.2: Classification of Operation Regions in the ACPI Test Data Set

As can be seen, the scanner flagged 29.4% of all operation regions as malicious. However, only 13.0% of all regions, i.e., fewer than half of the flagged ones, represented actual rootkit activity. The remaining 16.4% were erroneously reported due to legitimate memory accesses in the AML virtual machine. In contrast, 61.9% of the operation regions were correctly recognized as benign and do not reference any kernel memory. Last but not least, 8.7% of the regions could not be evaluated because their arguments were dynamic. If the parameters of a region depend on a variable or the result of a function call, it is impossible to determine the target of the operation with static code analysis. Evaluating these regions would require the state of the runtime environment at the time they are executed. They thus cannot be classified and have to be analyzed manually.
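A short calculation on the percentages of Table 7.2 makes the trade-off explicit. All percentages are relative to the 299 evaluated operation regions. Fewer than half of the flagged regions are true positives, and slightly more than a third of all regions still require manual review. The variable names are ours; the numbers are those of the table.

```python
# Shares of all 299 operation regions, taken from Table 7.2 (in percent).
TRUE_MALICIOUS = 13.0   # flagged and actually malicious
FALSE_MALICIOUS = 16.4  # flagged, but legitimate
UNKNOWN = 8.7           # dynamic arguments, not classifiable statically

flagged = TRUE_MALICIOUS + FALSE_MALICIOUS  # share reported as malicious
precision = TRUE_MALICIOUS / flagged        # share of reports that are real
manual_review = flagged + UNKNOWN           # regions an analyst still reads

print(f"flagged: {flagged:.1f}%, precision: {precision:.1%}, "
      f"manual review: {manual_review:.1f}%")
# prints: flagged: 29.4%, precision: 44.2%, manual review: 38.1%
```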

Our results can be summarized as follows: On the one hand, thanks to our plug-in, 61.9% of all operation regions do not need to be examined in detail and may safely be ignored in the course of an investigation. Forensic practitioners thus save considerable time and can focus on the relevant sections of an ACPI program. On the other hand, at 16.4%, the number of false positives is still rather high. As we have already indicated, these misclassifications stem from the fact that we were unable to distinguish accesses to regions that belong to legitimate ACPI memory from those that access actual kernel data structures. To decrease the false positive rate, an in-depth analysis of the kernel's ACPI environment would be necessary, which is left for future research.

7.5 Discussion

Even though our approach is capable of reliably acquiring all firmware code and data and may be easily integrated into existing memory forensic procedures, practitioners have to be aware of its technological limitations. These are briefly discussed in the following sections.


7.5.1 Technological Limitations

Some firmware rootkits cannot be detected with software-based memory forensic methods. Any rootkit that completely isolates itself from CPU-accessible memory falls into this category. SMM rootkits, for instance, patch the BIOS to inject code into System Management Mode (SMM). This code runs whenever a System Management Interrupt (SMI) is triggered. SMM has its own address space, SMRAM, which is strictly separated from accesses by kernel or user space applications. This restriction is enforced by the memory controller and cannot be bypassed once the respective configuration registers have been set up correctly. By similar reasoning, malicious programs running on the Intel Management Engine (ME) (Stewin and Bystrov, 2012) cannot be discovered. The only way of obtaining a copy of the respective memory regions would be to perform a RAM transplantation attack (Halderman et al., 2008), which requires physical access to the machine and a system reboot. On systems with DDR3 RAM, there is currently no way to do this due to data scrambling (see Section 1.2).

7.5.2 Anti-Forensics

It is also possible for firmware to hide or even wipe malicious code and data from RAM before the acquisition process commences. If the only malicious component that is still in memory at runtime resides in SMRAM, it is protected by the memory controller and will not appear in the memory image. Any bootstrapping code in the firmware can be wiped from memory after performing its designated task. In this situation, the only way of acquiring the malicious code is to either physically read the ROM chip with a flash programmer or run a tool like Copernicus (Butterworth et al., 2013) if Intel TXT is available.

7.6 Summary

In this chapter, we have discussed possibilities for rootkits and other sophisticated malicious applications to compromise x86 systems at the firmware level. Although still rarely seen “in the wild”, these types of attacks are highly dangerous. They may be particularly devastating because the base of the machine is subverted at a very early point in time, and the corresponding traces are easily overlooked during typical system investigation routines. As we have seen, common memory forensic solutions available to date fail to properly acquire the respective regions of the physical address space and are therefore ill-prepared in the course of an incident. We have adapted the techniques developed in Chapter 5 to enable investigators to acquire firmware code and data in the course of a memory forensic investigation. We have also created two plug-ins for the Volatility and Rekall forensic frameworks that facilitate inspection of the ACPI environment and help discover traces of malevolent behavior more quickly.


Chapter 8

Conclusion

Memory forensics has become a powerful tool in the arsenal of incident responders, forensic investigators, and malware analysts. It can provide an unfiltered view on the internals of operating systems and programs, uncovering artifacts hidden by malicious software, such as processes, threads, and network connections. As memory is volatile and cannot be accessed by user-mode programs directly, its contents must be made available for analysis by acquiring it into a memory image. Because of the physical access requirement and practicality issues elaborated in Section 1.2, memory acquisition is mostly performed by software. This process is vulnerable to anti-forensics by malicious software, which we try to remedy in this thesis.

8.1 Summary

In Chapter 3, we have given an overview of the current state of the art of software memory acquisition, which we define as the process of creating a copy of physical memory called a memory image. We have outlined the criteria that we use to classify the quality of memory images, and pointed out the importance of correctness with regard to obtaining a “true” and complete copy of the system's physical memory. We have analyzed the two main challenges software must solve to acquire physical memory: memory enumeration and memory mapping. Memory enumeration refers to the task of locating RAM in the physical address space. Because the physical address space is not contiguous and contains MMIO regions interleaved with regions backed by RAM, software must determine the location of all RAM regions to avoid accessing device memory, which can cause system instability. Memory mapping refers to the creation of a mapping in the virtual memory of the acquisition process. Because of memory protection, software cannot access physical memory directly, but has to create an entry in the page tables to get access to a specific physical page. In our analysis of 12 forensic memory acquisition programs for Windows, Linux, and OS X, we found that all of them rely on the operating system to enumerate and map physical memory. We have given an overview of the operating system APIs used by the software, and implemented a memory acquisition framework for OS X called OSXPmem. The reliance of memory acquisition software on the operating system for its most critical tasks makes it prone to subversion by anti-forensic software. In Chapter 4, we have given an overview of anti-forensic techniques against memory enumeration and memory mapping, as well as passive techniques that utilize unknown regions of physical memory we call hidden memory. To demonstrate the severity of the problem, we have implemented a selection of anti-forensic techniques for Windows, Linux, and OS X. Using these proof-of-concept implementations, we have evaluated the 12 memory acquisition tools introduced in the previous chapter. We found that none of the analyzed programs were able to acquire a memory image with our anti-forensic techniques in place. The techniques we have demonstrated are generic and can be extended by an attacker to selectively hide information from memory acquisition tools. The simplicity of these methods emphasizes the need for software memory acquisition techniques that are resilient against anti-forensic attacks.

To counter the attacks presented in the last chapter, we have developed a software memory acquisition technique that does not rely on the operating system for memory enumeration and mapping, which we introduced in Chapter 5. Instead of enumerating available physical memory regions, we query the hardware directly to identify all MMIO regions that are mapped into the physical address space. This enables our software to safely access the entire physical address space while avoiding reads from device memory, which can destabilize the system. We map memory by allocating a page of memory we call the rogue page. By walking the page tables, we locate the page table entry used by the MMU to map the physical frame for the rogue page. We then directly modify the frame number in this entry to point to the target page. After the rogue page has been flushed from the TLB, the MMU directs all further memory accesses for the rogue page to the target page in physical memory. Our evaluation showed that we can reliably acquire all physical memory with this technique, even on systems that have been subverted by our anti-forensic tools. Finally, we have discussed possible anti-forensic techniques that could still work against our approach. We have identified debug register rootkits and shadow paging as the only conceivable working attacks on the same privilege level, and presented ideas on how to further improve our technique to be resilient against these two methods.
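Locating the rogue page's PTE is a standard four-level page table walk. The sketch below assumes x86-64 4-level paging with 4 KiB pages at the final level. It starts from the CR3 value and reads each paging structure through a caller-supplied `read_u64(physical_address)` callback, a hypothetical accessor that a driver would implement as a read through an existing mapping. The function returns the physical address of the PTE, i.e., the location that the remapping technique overwrites. Large pages and 5-level paging are deliberately not handled.

```python
PFN_MASK = 0x000FFFFFFFFFF000
PRESENT = 1 << 0
PAGE_SIZE_BIT = 1 << 7  # PS: entry maps a 1 GiB or 2 MiB page directly


def find_pte_address(cr3: int, vaddr: int, read_u64) -> int:
    """Return the physical address of the 4 KiB PTE that maps vaddr."""
    table = cr3 & PFN_MASK  # physical base of the PML4
    # PML4 (bits 47..39), PDPT (38..30), and page directory (29..21) levels.
    for level, shift in (("PML4E", 39), ("PDPTE", 30), ("PDE", 21)):
        entry = read_u64(table + ((vaddr >> shift) & 0x1FF) * 8)
        if not entry & PRESENT:
            raise LookupError(f"{level} not present")
        if level != "PML4E" and entry & PAGE_SIZE_BIT:
            raise LookupError(f"{level} maps a large page; there is no 4 KiB PTE")
        table = entry & PFN_MASK
    # The page table (bits 20..12) holds the PTE itself.
    return table + ((vaddr >> 12) & 0x1FF) * 8
```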

One of the key benefits of the memory enumeration and mapping method introduced in the previous chapter is that it is operating system independent. This makes it ideal for solving a problem in Linux memory acquisition: the requirement to compile an acquisition kernel module on a system with exactly the same configuration as the target or, even worse, on the target itself. This is mandatory to maintain system stability, because the layout of data structures changes with different versions and configurations of the kernel. In Chapter 6, we have illustrated the creation of a minimal memory acquisition module for Linux that is independent of the kernel version and configuration used. We have adapted methods normally used in rootkits to inject this module into a compatible host module on the target, and then instrument the host module's data structures to redirect control flow and communicate with kernel APIs. This approach has allowed us to create a memory acquisition program that can be distributed as a statically linked binary. It is able to relink a dynamically selected host module on the target system on the fly, converting it into a memory acquisition module that is fully compatible with the running kernel. A second novel property of our approach is that, due to MMIO enumeration, it can safely acquire more than just physical memory. In Chapter 7, we have given an overview of memory used by the system firmware and shown that current publicly available memory acquisition software is incapable of acquiring firmware code and data. Using the memory enumeration and mapping techniques developed in Chapter 5, we were able to acquire all firmware memory regions that are not protected by SMM, including the BIOS/UEFI ROM, PCI option ROMs, and the ACPI tables. To aid investigators in their analysis of malicious firmware, we have developed plugins for memory analysis frameworks that help identify ACPI code accessing operating system memory regions.

8.2 Future Work

While the techniques we have developed in this thesis have improved the anti-forensic resilience of software memory acquisition, there is still room for improvement. To become immune to PCI device simulation based on debug registers, our memory enumeration procedure can be extended to use the PCIe Enhanced Configuration Access Mechanism (ECAM) to access PCI configuration space. It is also possible to obtain the memory geometry directly from the memory controller, making PCI enumeration unnecessary. The Intel integrated memory controller (iMC), for example, makes its memory configuration registers available through MMIO (Intel Corporation, 2013). Because these registers are directly used for memory routing, they are locked once the physical address space is configured. This makes them a reliable source of information on the address space layout, because it is impossible for malicious software to tamper with them. It also reduces the danger of accessing regions of memory mapped to devices that are not on the PCI bus, making the approach more stable.

Furthermore, instead of using a rogue page to map memory, we suggest utilizing a private page table hierarchy. While this approach is considerably more complicated, it is resilient against shadow paging and removes the need for a physical-to-virtual translation function on operating systems with kernel Address Space Layout Randomization (ASLR). It can also improve the acquisition speed of our approach, because the private page tables can map the entire physical address space, which allows us to use larger buffers and reduces the number of necessary TLB flushes.

In addition to the correctness of images, their atomicity and integrity are also problematic when they are acquired by software. Recent research has shown that low levels of atomicity in an image make it difficult to integrate the page file into the memory analysis process (Richard and Case, 2014). By hooking the page fault handler and marking all memory as non-writable, software could implement a lazy-dumping approach similar to the one used in virtualization-based memory acquisition software (Martignoni et al., 2010).

Finally, with the injection of a memory acquisition module into arbitrary Linux kernels, we have solved the kernel version problem on the acquisition side, but not on the analysis side. For sophisticated analysis of the acquired memory dump, we need information on symbols and data structures. The Rekall (Cohen, 2014b) and Volatility (Walters, 2014) projects, for example, refer to this as a profile. A profile is usually built by compiling a kernel module with debugging information for the exact kernel version on the target, which is then parsed to extract data structure layout and symbol information (Hale, 2013). When the kernel version and configuration are not known or available, this is not possible. However, information on the kernel’s data structure layout is contained in the relocation tables of modules and of the kernel binary itself. Future work can utilize this information to build a partial profile for the target system: by extracting and parsing the relocation tables, we can recover the layout of parts of certain data structures without access to kernel headers and configuration files.
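The difference between the two PCI configuration mechanisms is visible in how a register is addressed. Legacy configuration mechanism #1 goes through the I/O ports 0xCF8 and 0xCFC, which an I/O breakpoint in the debug registers can trap; ECAM instead maps the configuration space of every function into physical memory. The Python sketch below only computes the addresses and performs no hardware access; the function names and the ECAM base address are assumptions for illustration.

```python
def cf8_address(bus, device, function, register):
    """Value written to I/O port 0xCF8 (PCI configuration mechanism #1) to
    select a dword register; the data is then transferred through port 0xCFC."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8
            and 0 <= register < 256):
        raise ValueError("outside the 256-byte legacy configuration space")
    return (1 << 31) | (bus << 16) | (device << 11) | (function << 8) | (register & 0xFC)


def ecam_address(ecam_base, bus, device, function, register):
    """Physical address of a register in the PCIe ECAM window, in which every
    function owns one 4 KiB page of (extended) configuration space."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8
            and 0 <= register < 4096):
        raise ValueError("outside the 4 KiB extended configuration space")
    return ecam_base + ((bus << 20) | (device << 15) | (function << 12) | register)


# Assumed ECAM base for bus 0; real systems report it in the ACPI MCFG table.
ECAM_BASE = 0xE000_0000
print(hex(cf8_address(0, 31, 0, 0x10)))              # 0x8000f810
print(hex(ecam_address(ECAM_BASE, 0, 31, 0, 0x10)))  # 0xe00f8010
```

With ECAM, reading a configuration register becomes an ordinary memory read from the computed physical address, so a breakpoint on the two legacy I/O ports no longer observes it.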
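To make the private page table idea more concrete, the following Python sketch models a hierarchy that maps the whole physical address space with 1 GiB pages under a single PML4 slot, and walks it the way the MMU would under x86-64 4-level paging (Present = bit 0, R/W = bit 1, PS = bit 7; PML4 index in bits 47:39, PDPT index in bits 38:30). It only simulates table contents in a dictionary; installing such a hierarchy on real hardware (loading CR3, keeping the driver's own code mapped) is out of scope. The slot number 300 and the table addresses 0x1000/0x2000 are arbitrary assumptions.

```python
PRESENT, WRITABLE, PAGE_SIZE = 1 << 0, 1 << 1, 1 << 7   # x86-64 entry flag bits
GIB = 1 << 30
ADDR_MASK = ((1 << 52) - 1) & ~0xFFF                   # address bits 51:12


def window_base(pml4_index):
    """Canonical virtual address of the first byte covered by a PML4 slot."""
    va = pml4_index << 39
    return va | (0xFFFF << 48) if pml4_index >= 256 else va


def build_private_map(phys_size, pml4_index, pml4_phys, pdpt_phys):
    """Simulate a private hierarchy mapping [0, phys_size) with 1 GiB pages.
    Tables are modeled as {physical address: list of 512 entries}."""
    if phys_size > 512 * GIB:
        raise ValueError("one PDPT covers at most 512 GiB in this sketch")
    pml4, pdpt = [0] * 512, [0] * 512
    pml4[pml4_index] = pdpt_phys | PRESENT | WRITABLE
    for gib in range(-(-phys_size // GIB)):             # round up to whole GiB
        pdpt[gib] = (gib * GIB) | PRESENT | WRITABLE | PAGE_SIZE
    return {pml4_phys: pml4, pdpt_phys: pdpt}


def translate(tables, pml4_phys, va):
    """Walk the simulated tables the way the MMU walks 4-level paging."""
    pml4e = tables[pml4_phys][(va >> 39) & 0x1FF]
    if not pml4e & PRESENT:
        raise LookupError("PML4 entry not present")
    pdpte = tables[pml4e & ADDR_MASK][(va >> 30) & 0x1FF]
    if not pdpte & PRESENT:
        raise LookupError("PDPT entry not present")
    if not pdpte & PAGE_SIZE:
        raise NotImplementedError("this sketch only creates 1 GiB pages")
    return (pdpte & ADDR_MASK & ~(GIB - 1)) | (va & (GIB - 1))


# Hypothetical layout: slot 300, tables at physical 0x1000 and 0x2000.
tables = build_private_map(4 * GIB, 300, pml4_phys=0x1000, pdpt_phys=0x2000)
base = window_base(300)
print(hex(translate(tables, 0x1000, base + 0x1234_5678)))  # 0x12345678
```

Once such a window exists, physical address `pa` is simply read at `base + pa`. With a single rogue 4 KiB page, acquiring 16 GiB would instead require remapping the page and invalidating its TLB entry 16 GiB / 4 KiB = 4,194,304 times.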
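One concrete instance of recovering layout information from relocations, assuming an x86-64 relocatable kernel module (`.ko`): the module's `struct module` instance in `.gnu.linkonce.this_module` contains `init` and `exit` function pointers that are filled in by relocations against `init_module` and `cleanup_module`, typically found in `.rela.gnu.linkonce.this_module`. The `r_offset` of each such relocation is therefore the offset of that field within `struct module`. The hypothetical Python helper below only decodes already-extracted `Elf64_Rela` entries; locating the sections and resolving symbol names from the symbol and string tables are left out, and the offsets in the example are made up.

```python
import struct

R_X86_64_64 = 1                     # direct 64-bit relocation (x86-64 psABI)
ELF64_RELA = struct.Struct("<QQq")  # r_offset, r_info, r_addend (little-endian)


def field_offsets(rela_section, symbol_names, wanted=("init_module", "cleanup_module")):
    """Return {symbol: r_offset} for R_X86_64_64 relocations against `wanted`.

    rela_section: raw bytes of a SHT_RELA section (24-byte Elf64_Rela entries).
    symbol_names: list mapping symbol-table index -> symbol name.
    """
    offsets = {}
    for r_offset, r_info, _r_addend in ELF64_RELA.iter_unpack(rela_section):
        sym_index, rel_type = r_info >> 32, r_info & 0xFFFFFFFF  # ELF64_R_SYM / _TYPE
        if rel_type == R_X86_64_64 and sym_index < len(symbol_names):
            name = symbol_names[sym_index]
            if name in wanted:
                offsets[name] = r_offset
    return offsets


def pack_rela(r_offset, sym_index, rel_type=R_X86_64_64, r_addend=0):
    """Build one Elf64_Rela entry (used here only to create example input)."""
    return ELF64_RELA.pack(r_offset, (sym_index << 32) | rel_type, r_addend)


# Hypothetical input: symbol 5 is init_module, symbol 7 is cleanup_module.
names = ["", "", "", "", "", "init_module", "", "cleanup_module"]
section = pack_rela(0x138, 5) + pack_rela(0x2F8, 7)
print({name: hex(off) for name, off in field_offsets(section, names).items()})
# {'init_module': '0x138', 'cleanup_module': '0x2f8'}
```

Each recovered offset is one entry of a partial profile; collecting such facts from many relocations would describe only those fields that compiled code actually references.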

Bibliography

AccessData (2012). FTK Imager. http://www.accessdata.com/, 2012.

Accetta, Mike; Baron, Robert; Bolosky, William; Golub, David; Rashid, Richard; Tevanian, Avadis; Young, Michael (1986). Mach: A New Kernel Foundation for UNIX Development. In Proceedings of the USENIX Summer Conference (pp. 93–112), 1986.

ACPI Promoters Corporation (2013). Advanced Configuration and Power Interface Specification – Revision 5.0 Errata A. http://acpi.info/DOWNLOADS/ACPI_5_Errata%20A.pdf, 2013.

Advanced Micro Devices (2011). AMD64 Architecture Programmer’s Manual. http://developer.amd.com/resources/documentation-articles/developer-guides-manuals/, 2011.

Afek, Yehuda; Attiya, Hagit; Dolev, Danny; Gafni, Eli; Merritt, Michael; Shavit, Nir (1993). Atomic Snapshots of Shared Memory. Journal of the ACM, Volume 40(4), pp. 873–890, 1993.

Allievi, Andrea (2014). Understanding and Defeating Windows 8.1 Kernel Patch Protection. http://www.nosuchcon.org/talks/2014/D2_01_Andrea_Allievi_Win8.1_Patch_protections.pdf, 2014.

Anderson, David (2008). Red Hat Crash Utility. http://people.redhat.com/anderson/crash_whitepaper, 2008.

Apple Inc. (2009). IOKit Device Driver Design Guidelines. https://developer.apple.com/library/mac/documentation/DeviceDrivers/Conceptual/WritingDeviceDriver/, 2009.

Apple Inc. (2013a). IOMemoryDescriptor Class Reference. https://developer.apple.com/library/mac/documentation/Kernel/Reference/IOMemoryDescriptor_reference/, 2013.

Apple Inc. (2013b). Kernel Programming Guide. https://developer.apple.com/library/mac/documentation/Darwin/Conceptual/KernelProgramming, 2013.

ATC-NY (2012a). MacMemoryReader. http://cybermarshal.com/index.php/cyber-marshal-utilities/mac-memory-reader, 2012.

ATC-NY (2012b). WindowsMemoryReader. http://cybermarshal.com/index.php/cyber-marshal-utilities/windows-memory-reader, 2012.

BBN Technologies (2006). FRED: Forensic RAM Extraction Device. http://www.ir.bbn.com/~vkawadia/, 2006.

Becher, Michael; Dornseif, Maximillian; Klein, Christian N. (2005). FireWire – All Your Memory Are Belong To Us. In Proceedings of the Annual CanSecWest Applied Security Conference, 2005.

Bilby, Darren (2006). Low Down and Dirty: Anti-Forensic Rootkits. In Proceedings of Black Hat Japan, 2006.

Boileau, Adam (2006). Hit by a Bus: Physical Access Attacks with Firewire. In Proceedings of Ruxcon, 2006.

Brossard, Jonathan (2012). Hardware Backdooring is Practical. https://media.blackhat.com/bh-us-12/Briefings/Brossard/BH_US_12_Brossard_Backdoor_Hacking_Slides.pdf, 2012.

Brown, Michael (2014). iPXE. http://ipxe.org/, 2014.

Bulygin, Yuriy (2013). Evil Maid Just Got Angrier – Why Full-Disk Encryption With TPM is Insecure on Many Systems. https://cansecwest.com/slides/2013/Evil%20Maid%20Just%20Got%20Angrier.pdf, 2013.

Bulygin, Yuriy; Bazhaniuk, Oleksandr; Furtak, Andrew; Loucaides, John (2014). Summary of Attacks Against BIOS and Secure Boot. http://www.c7zero.info/stuff/DEFCON22-BIOSAttacks.pdf, 2014.

Bulygin, Yuriy; Furtak, Andrew; Bazhaniuk, Oleksandr (2013). A Tale of One Software Bypass of Windows 8 Secure Boot. https://media.blackhat.com/us-13/us-13-Bulygin-A-Tale-of-One-Software-Bypass-of-Windows-8-Secure-Boot-Slides.pdf, 2013.

Butler, Jamie (2004). DKOM (Direct Kernel Object Manipulation). Black Hat Windows Security, 2004.

Butterworth, John; Kallenberg, Corey; Kovah, Xeno; Herzog, Amy (2013). BIOS Chronomancy: Fixing the Core Root of Trust for Measurement. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security (pp. 25–36). ACM, 2013.

Carbone, Martim; Cui, Weidong; Lu, Long; Lee, Wenke; Peinado, Marcus; Jiang, Xuxian (2009). Mapping kernel objects to enable systematic integrity checking. In Proceedings of the 16th ACM conference on Computer and communications security (pp. 555–565). ACM, 2009.

Carrier, B.D.; Grand, J. (2004). A hardware-based memory acquisition procedure for digital investigations. Digital Investigation, Volume 1(1), pp. 50–60, 2004.

Case, Andrew; Marziale, Lodovico; Richard, Golden G (2010). Dynamic recreation of kernel data structures for live forensics. Digital Investigation, 7, pp. S32–S40, 2010.

Chappell, Geoff (2010). The x86 BIOS Emulator. http://www.geoffchappell.com/studies/windows/km/hal/api/x86bios/call.htm, 2010.

Chappell, Geoff (2011). Viewing the Firmware Memory Map. http://www.geoffchappell.com/studies/windows/km/hal/api/x86bios/fwmemmap.htm?tx=7, 2011.

Chow, Jim; Pfaff, Ben; Garfinkel, Tal; Rosenblum, Mendel (2005). Shredding Your Garbage: Reducing Data Lifetime through Secure Deallocation. In Proceedings of the 14th Conference on USENIX Security Symposium, 2005.

Cohen, Michael (2011). PMEM - physical memory driver. http://code.google.com/p/volatility/source/browse/branches/scudette/tools/linux, 2011.

Cohen, Michael (2012). The PMEM Memory acquisition suite. http://code.google.com/p/volatility/source/browse/branches/scudette/tools/windows/winpmem, 2012.

Cohen, Michael (2014a). How to stop memory acquisition by changing one byte. http://rekall-forensic.blogspot.de/2014/03/how-to-stop-memory-acquisition-by.html, 2014.

Cohen, Michael (2014b). Rekall Memory Forensic Framework. http://www.rekall-forensic.com, 2014.

Cohen, M.; Bilby, D.; Caronni, G. (2011). Distributed Forensics and Incident Response in the Enterprise. Digital Investigation, 8, pp. S101–S110, 2011.

Corbet, Jonathan; Rubini, Alessandro; Kroah-Hartman, Greg (2005). Linux Device Drivers. O’Reilly, third edition, 2005.

Drepper, Ulrich (2007). What every programmer should know about memory. Red Hat, Inc, 11, 2007.

Duarte, Gustavo (2009). How the Kernel Manages Your Memory. http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory/, 2009.

Garrison, Todd (2011). Mac OS Lion Forensic Memory Acquisition Using IEEE 1394. http://www.frameloss.org/wp-content/uploads/2011/09/Lion-Memory-Acquisition.pdf, 2011.

Giuliani, Marco (2013). Mebromi: The First BIOS Rootkit in the Wild. http://www.webroot.com/blog/2011/09/13/mebromi-the-first-bios-rootkit-in-the-wild/, 2013.

Goodin, Dan (2013). Meet “badBIOS”, the mysterious Mac and PC malware that jumps airgaps. http://arstechnica.com/security/2013/10/meet-badbios-the-mysterious-mac-and-pc-malware-that-jumps-airgaps/, 2013.

Gorman, Mel (2004). Understanding the Linux virtual memory manager. Prentice Hall, 2004.

Gruhn, Michael; Müller, Tilo (2013). On the Practicability of Cold Boot Attacks. In Proceedings of the 8th International Conference on Availability, Reliability and Security (ARES) (pp. 390–397). IEEE, 2013.

Halderman, J. Alex; Schoen, Seth D.; Heninger, Nadia; Clarkson, William; Paul, William; Calandrino, Joseph A.; Feldman, Ariel J.; Appelbaum, Jacob; Felten, Edward W. (2008). Lest We Remember: Cold-Boot Attacks on Encryption Keys. In Proceedings of the 17th USENIX Security Symposium, 2008.

Hale, Michael (2013). Linux Support in Volatility. http://code.google.com/p/volatility/wiki/LinuxMemoryForensics, 2013.

Halfdead (2008). Mystifying the debugger for ultimate stealthness. Phrack, 0x0c, pp. 0x08, 2008.

Halvorsen, Ole Henry; Clarke, Douglas (2011). OS X and iOS Kernel programming. Apress, 2011.

Hanspach, Michael; Goetz, Michael (2013). On Covert Acoustical Mesh Networks in Air. Journal of Communications, Volume 8(11), 2013.

Haruyama, T.; Suzuki, H. (2012). One-byte Modification for Breaking Memory Forensic Analysis. http://media.blackhat.com/bh-eu-12/Haruyama/bh-eu-12-Haruyama-Memory_Forensic-Slides.pdf, 2012.

Heasman, John (2006). Implementing and Detecting an ACPI BIOS Rootkit. http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-Heasman.pdf, 2006.

Hermann, Uwe (2014). Physical memory attacks via Firewire/DMA - Part 1: Overview and Mitigation. http://www.hermann-uwe.de/blog/physical-memory-attacks-via-firewire-dma-part-1-overview-and-mitigation, 2014.

Hewlett-Packard; Intel; Microsoft; Phoenix-Technologies; Toshiba (2011). ACPI Specification 5.0. http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf, 2011.

Hex-Rays (2005). IDA: The Interactive Disassembler. https://www.hex-rays.com, 2005.

Hoglund, Greg; Butler, James (2005). Rootkits: Subverting the Windows Kernel. Addison Wesley, 2005.

Hudson, Trammell (2014). Thunderstrike: EFI bootkits for Apple MacBooks, 2014.

Inoue, Hajime; Adelstein, Frank; Joyce, Robert A (2011). Visualization in testing a volatile memory forensic tool. Digital Investigation, 8, pp. S42–S51, 2011.

Intel Corporation (1997). MultiProcessor Specification. http://download.intel.com/design/archives/processors/pro/docs/24201606.pdf, 1997.

Intel Corporation (2000). Intel 815 Chipset Family. http://download.intel.com/design/chipsets/datashts/29068801.pdf, 2000.

Intel Corporation (2009). Intel 5 Series Platform Controller Hub (PCH) Datasheet. http://www.intel.de/content/dam/www/public/us/en/documents/datasheets/8-series-chipset-pch-datasheet.pdf, 2009.

Intel Corporation (2013). Desktop 4th Generation Intel Core Processor Family Datasheet Volume 2. http://www.intel.com/assets/pdf/datasheet/317607.pdf, 2013.

Intel Corporation (2014a). ACPI Component Architecture. https://acpica.org/, 2014.

Intel Corporation (2014b). Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3 System Programming Guide, 2014.

Intel Corporation (2014c). Intel 8 Series Platform Controller Hub (PCH) Datasheet. http://www.intel.de/content/dam/www/public/us/en/documents/datasheets/8-series-chipset-pch-datasheet.pdf, 2014.

Intel Corporation (2014d). Intel Virtualization Technology for Directed I/O: Specification. http://www.intel.com/content/www/us/en/intelligent-systems/intel-technology/vt-directed-io-spec.html, 2014.

Intel Corporation (2014e). Trusted Compute Pools with Intel Trusted Execution Technology. http://www.intel.com/content/www/us/en/architecture-and-technology/trusted-execution-technology/malware-reduction-general-technology.html, 2014.

Kaspersky Labs (2015). The Great Bank Robbery: the Carbanak APT. https://securelist.com/blog/research/68732/the-great-bank-robbery-the-carbanak-apt/, 2015.

King, Samuel T; Chen, Peter M (2006). SubVirt: Implementing malware with virtual machines. In Security and Privacy, 2006 IEEE Symposium on (pp. 14–pp). IEEE, 2006.

Kleen, Andi (2004). Virtual memory map with 4 level page tables. https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt, 2004.

Kollár, Ivor (2010). Forensic RAM dump image analyser. Department of Software Engineering, Charles University, Prague, 2010.

Kornblum, Jesse (2006). Exploiting the rootkit paradox with windows memory analysis. International Journal of Digital Evidence, Volume 5(1), pp. 1–5, 2006.

Kovah, Xeno; Butterworth, John; Kallenberg, Corey; Cornwell, Sam (2014). Copernicus 2: SENTER the Dragon. http://www.mitre.org/publications/technical-papers/copernicus-2-senter-the-dragon, 2014.

Levin, Jonathan (2012). Mac OS X and IOS Internals: To the Apple’s Core. John Wiley & Sons, 2012.

Levine, John R. (1999). Linkers and Loaders. Morgan Kaufmann, 1999.

Ligh, Michael Hale; Case, Andrew; Levy, Jamie; Walters, Aaron (2014). The Art of Memory Forensics: Detecting Malware and Threats in Windows, Linux, and Mac Memory. John Wiley & Sons, 2014.

Lineberry, Anthony (2009). Malicious Code Injection via /dev/mem. Black Hat Europe, 2009.

Linux Kernel Organization (2014). Kernel Virtual Machine. git://git.kernel.org/pub/scm/virt/kvm/kvm.git, 2014.

Loukas, K (2012). De Mysteriis Dom Jobsivs–Mac EFI Rootkits. http://ho.ax/De_Mysteriis_Dom_Jobsivs_Black_Hat_Paper.pdf, 2012.

Maartmann-Moe, Carsten (2013). Inception. http://www.breaknenter.org/projects/inception/, 2013.

Mandiant (2011). Memoryze™. http://www.mandiant.com/resources/download/memoryze, 2011.

Mandiant (2012). Memoryze™ for the Mac. https://www.mandiant.com/resources/download/mac-memoryze, 2012.

ManTech CSI, Inc. (2009). mdd. http://sourceforge.net/projects/mdd/, 2009.

Martignoni, Lorenzo; Fattori, Aristide; Paleari, Roberto; Cavallaro, Lorenzo (2010). Live and Trustworthy Forensic Analysis of Commodity Production Systems. In Proceedings of the 13th International Conference on Recent Advances in Intrusion Detection (RAID), 2010.

Matz, Michael; Hubicka, Jan; Jaeger, Andreas; Mitchell, Mark (2012). System V Application Binary Interface. http://refspecs.linuxfoundation.org/elf/x86-64-abi-0.99.pdf, 2012.

McAfee Inc. (2014). Net Losses: Estimating the Global Cost of Cybercrime. http://www.mcafee.com/de/resources/reports/rp-economic-impact-cybercrime2.pdf, 2014.

Microsoft Corporation (2006). Kernel Patch Protection: FAQ. http://msdn.microsoft.com/en-us/library/windows/hardware/gg487353.aspx, 2006.

Microsoft Corporation (2011). Windows Feature Lets You Generate a Memory Dump File by Using the Keyboard. http://support.microsoft.com/?scid=kb%3Ben-us%3B244139&x=5&y=9, 2011.

Microsoft Corporation (2013). \Device\PhysicalMemory Object. http://technet.microsoft.com/en-us/library/cc787565%28v=ws.10%29.aspx, 2013.

Milkovic, Luka (2012). Defeating Windows memory forensics. http://events.ccc.de/congress/2012/Fahrplan/events/5301.en.html, 2012.

Miller, David S.; Henderson, Richard; Jelinek, Jakub (2015). Dynamic DMA mapping Guide. https://www.kernel.org/doc/Documentation/DMA-API-HOWTO.txt, 2015.

Minnich, Ron (2014). Coreboot. http://www.coreboot.org/, 2014.

MoonSols (2012). Windows Memory Toolkit. http://moonsols.com/product, 2012.

Mozak, C.P. (2011). Suppressing power supply noise using data scrambling in double data rate memory systems. http://www.google.com/patents/US7945050, 2011. US Patent 7,945,050.

Nohl, Karsten; Krißler, Sascha; Lell, Jakob (2014). BadUSB - On accessories that turn evil. Blackhat. https://srlabs.de/blog/wp-content/uploads/2014/07/SRLabs-BadUSB-BlackHat-v1.pdf, 2014.

O’Connor, Kevin (2014). SeaBIOS. http://www.seabios.org/SeaBIOS, 2014.

O’Malley, Samuel Joseph; Choo, Kim-Kwang Raymond (2014). Bridging the Air Gap: Inaudible Data Exfiltration by Insiders. In 20th Americas Conference on Information Systems (AMCIS 2014) (pp. 7–10), 2014.

Ooi, Tsukasa (2009). Stealthy Rootkit: How bad guy fools live memory forensics? http://www.slideshare.net/a4lg/stealthy-rootkit-how-bad-guy-fools-live-memory-forensics-pacsec-2009, 2009.

PCI-SIG (1998). PCI-to-PCI Bridge Architecture Specification, 1998.

PCI-SIG (2002). PCI Local Bus Specification 3.0, 2002.

PCI-SIG (2010a). PCI Express Base Specification Revision 3.0, 2010.

PCI-SIG (2010b). PCI Firmware 3.1 Specification. https://www.pcisig.com/specifications/conventional/pci_firmware/, 2010.

PCI-SIG (2015). PCI Vendor ID Search. https://www.pcisig.com/membership/vid_search/, 2015.

Petroni, Nick L.; Fraser, Timothy; Molina, Jesus; Arbaugh, William A. (2004). Copilot – A Coprocessor-Based Kernel Runtime Integrity Monitor. In Proceedings of the 13th USENIX Security Symposium, 2004.

Raytheon Pikewerks (2013). Linux Incident Response with Second Look. http://secondlookforensics.com/linux-incident-response/, 2013.

Reina, Alessandro; Fattori, Aristide; Pagani, Fabio; Cavallaro, Lorenzo; Bruschi, Danilo (2012). When Hardware Meets Software: A Bulletproof Solution to Forensic Memory Acquisition. In Proceedings of the 28th Annual Computer Security Applications Conference, 2012.

Richard, Golden G; Case, Andrew (2014). In lieu of swap: Analyzing compressed RAM in Mac OS X and Linux. Digital Investigation, 11, pp. S3–S12, 2014.

Rostedt, Steven (2009). Debugging the kernel using Ftrace. http://lwn.net/Articles/365835/, 2009.

Ruff, Nicolas; Suiche, Matthieu (2007). Enter Sandman. In Proceedings of the 5th Annual PacSec Applied Security Conference, 2007.

Rusakov, Vyacheslav (2011). TDL4 Rootkit. http://www.securelist.com/en/analysis/204792157/TDSS_TDL_4, 2011.

Rusakov, Vyacheslav (2012). XPAJ: Reversing a Windows x64 Bootkit. http://www.securelist.com/en/analysis/204792235/XPAJ_Reversing_a_Windows_x64_Bootkit#5, 2012.

Russinovich, Mark E.; Solomon, David A.; Ionescu, Alex (2009). Microsoft Windows Internals. Microsoft Press, 5th Edition, 2009.

Rutkowska, J. (2006). Introducing Blue Pill. http://theinvisiblethings.blogspot.de/2006/06/introducing-blue-pill.html, 2006.

Rutkowska, J. (2007). Beyond the CPU: Defeating hardware based RAM acquisition. In Proceedings of BlackHat DC, 2007.

Salihun, Darmawan (2006). BIOS Disassembly Ninjutsu Uncovered. A-List Publishing, 2006.

Salihun, Darmawan (2014). System Address Map Initialization in x86/x64 Architecture Part 2: PCI Express-Based Systems. http://resources.infosecinstitute.com/system-address-map-initialization-x86x64-architecture-part-2-pci-express-based-systems/, 2014.

Salzman, Peter Jay; Burian, Michael; Pomerantz, Ori (2001). The linux kernel module programming guide. TLDP: http://tldp.org/LDP/lkmpg/2.4/html, 2001.

Schatz, B. (2007a). BodySnatcher: Towards reliable volatile memory acquisition by software. Digital Investigation, 4, pp. 126–134, 2007.

Schatz, Bradley (2007b). Recent Developments in Volatile Memory Forensics. http://www.schatzforensic.com.au/presentations/BSchatz-CERT-CSD2007.pdf, 2007.

Schneier, Bruce (2014). DEITYBOUNCE: NSA Exploit of the Day. https://www.schneier.com/blog/archives/2014/01/nsa_exploit_of.html, 2014.

Shoumikhin, Anthony (2010). Redirecting functions in shared ELF libraries. http://www.codeproject.com/KB/library/elf-redirect.aspx, 2010.

Singh, Amit (2006). Mac OS X internals: a systems approach. Addison-Wesley Professional, 2006.

Skochinsky, Igor (2014). Intel ME: Two Years Later. https://ruxconbreakpoint.com/assets/2014/slides/bpx-Breakpoint%202014%20Skochinsky.pdf, 2014.

Sparks, S.; Butler, J. (2005). Shadow Walker: Raising the bar for rootkit detection. In Proceedings of Black Hat Japan (pp. 504–533), 2005.

Stealth (2004). The Adore-Ng Rootkit. http://packetstormsecurity.com/files/32843/adore-ng-0.41.tgz.html, 2004.

Stewin, Patrick; Bystrov, Iurii (2012). Understanding DMA Malware. In Proceedings of the 9th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), 2012.

Stüttgen, Johannes (2012). OSXPmem. http://code.google.com/p/pmem/wiki/OSXPmem, 2012.

Stüttgen, Johannes (2014). Elfrelink: An ELF code injection library. https://github.com/google/rekall/tree/master/tools/linux/lmap/elfrelink, 2014.

Stüttgen, Johannes; Cohen, Michael (2013). Anti-Forensic Resilient Memory Acquisition. Digital Investigation, 10, pp. S105–S115, 2013.

Stüttgen, Johannes; Cohen, Michael (2014). Robust Linux memory acquisition with minimal target impact. Digital Investigation, 11, pp. S112–S119, 2014.

Stüttgen, Johannes; Vömel, Stefan; Denzel, Michael (2015). Acquisition and Analysis of Compromised Firmware Using Memory Forensics. In Proceedings of the 2nd Annual DFRWS Europe Conference (DFRWS-EU 2015 Dublin), 2015.

Styx (2012). Infecting loadable kernel modules, kernel versions 2.6.x/3.0.x. Phrack, Volume 0x0e(0x44), pp. 0x0b, 2012.

Suiche, Matthieu (2009a). Reply to HBGary. http://www.msuiche.net/2009/11/16/reply-to-hbgary-and-personal-notes/, 2009.

Suiche, Matthieu (2009b). Win32dd. http://www.msuiche.net/tools/win32dd-v1.2.1.20090106.zip, 2009.

Suiche, Matthieu (2011). MoonSols DumpIt goes mainstream. http://www.moonsols.com/2011/07/18/moonsols-dumpit-goes-mainstream/, 2011.

Sutherland, Iain; Evans, Jon; Tryfonas, Theodore; Blyth, Andrew (2008). Acquiring Volatile Operating System Data Tools and Techniques. ACM SIGOPS Operating Systems Review, Volume 42(3), pp. 65–73, 2008.

Sylve, Joe (2012). LiME – Linux Memory Extractor. In Proceedings of the 7th ShmooCon Conference, 2012.

Sylve, Joe; Case, Andrew; Marziale, Lodovico; Richard, Golden G (2012). Acquisition and analysis of volatile memory from android devices. Digital Investigation, Volume 8(3), pp. 175–184, 2012.

Tanenbaum, Andrew S; Bos, Herbert (2014). Modern operating systems. Prentice Hall Press, 2014.

The Bochs Project (2013). Bochs – The Cross Platform IA-32 Emulator. http://bochs.sourceforge.net/, 2013.

The Coreboot Project (2009). Developer Manual/Tools. http://www.coreboot.org/Developer_Manual/Tools, 2009.

The Flashrom Team (2013). Flashrom. http://www.flashrom.org/, 2013.

The Linux Kernel Archives (2013). The Linux Kernel Source Code. https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.12.tar.xz, 2013.

The Linux man-pages (2012). kexec_load - load a new kernel for later execution. http://man7.org/linux/man-pages/man2/kexec_load.2.html, 2012.

The New York Times (2015). Bank Hackers Steal Millions via Malware. http://www.nytimes.com/2015/02/15/world/bank-hackers-steal-millions-via-malware.html, 2015.

TIS Committee (1995). Tool Interface Standard Executable and Linking Format (ELF) Specification v1.2. http://refspecs.linuxbase.org/elf/elf.pdf, 1995.

Triulzi, Arrigo (2010). The Jedi Packet Trick Takes over the Deathstar. http://www.alchemistowl.org/arrigo/Papers/Arrigo-Triulzi-CANSEC10-Project-Maux-III.pdf, 2010.

Truff (2003). Infecting loadable kernel modules. Phrack, Volume 0x0b(0x3d), pp. 0x0a, 2003.

Turley, Jim (2014). The Basics of Intel Architecture. http://www.intel.com/content/www/us/en/intelligent-systems/embedded-systems-training/ia-introduction-basics-paper.html, 2014.

van de Ven, Arjan (2008). Introduce /dev/mem restrictions with a config option. http://lwn.net/Articles/267427/, 2008.

Vidas, Timothy (2010). Volatile Memory Acquisition via Warm Boot Memory Survivability. In Proceedings of the 43rd Hawaii International Conference on System Sciences, 2010.

Vidstrom, Arne (2006). Forensic memory dumping intricacies - PhysicalMemory, DD, and caching issues. http://ntsecurity.nu/onmymind/2006/2006-06-01.html, 2006.

Vömel, Stefan; Freiling, Felix C (2011). A survey of main memory acquisition and analysis techniques for the windows operating system. Digital Investigation, Volume 8(1), pp. 3–22, 2011.

Vömel, Stefan; Freiling, Felix C (2012). Correctness, atomicity, and integrity: Defining criteria for forensically-sound memory acquisition. Digital Investigation, Volume 9(2), 2012.

Vömel, Stefan; Stüttgen, Johannes (2013). An Evaluation Platform for Forensic Memory Acquisition Software. In Proceedings of the 13th Annual DFRWS Conference, 2013.

Walters, Aaron (2014). Volatility Framework. https://github.com/volatilityfoundation/volatility, 2014.

Walters, Aaron; Petroni, Nick L. (2007). Volatools: Integrating Volatile Memory Forensics into the Digital Investigation Process. In Proceedings of Black Hat DC, 2007.

Wang, J.; Zhang, F.; Sun, K.; Stavrou, A. (2011). Firmware-assisted Memory Acquisition and Analysis tools for Digital Forensics. In Systematic Approaches to Digital Forensic Engineering (SADFE), 2011 IEEE Sixth International Workshop on (pp. 1–5). IEEE, 2011.

Weidong, Cui; Zhilei, Xu; Marcus, Peinado; Ellick, Chan (2012). Tracking rootkit footprints with a practical memory analysis system. In Proceedings of the 21st USENIX conference on Security symposium (pp. 42–42). USENIX Association, 2012.

Williams, Jake; Torres, Alissa (2014). ADD - Complicating Memory Forensics Through Memory Disarray. http://www.mediafire.com/view/h7bmcscbtyaeb6r/ADD_Shmoocon.pdf, 2014.

WindowsSCOPE (2014). CaptureGUARD. http://www.windowsscope.com/, 2014.

Witherden, Freddie (2010). libforensic1394. https://freddie.witherden.org/tools/libforensic1394/, 2010.

Wojtczuk, Rafal; Tereshkin, Alexander (2009). Attacking Intel BIOS. https://www.blackhat.com/presentations/bh-usa-09/WOJTCZUK/BHUSA09-Wojtczuk-AtkIntelBios-SLIDES.pdf, 2009.

You, Dong-Hoon (2012). Android platform based linux kernel rootkit. Phrack, Volume 0x0e(0x44), pp. 0x06, 2012.

Yu, Miao; Lin, Qian; Li, Bingyu; Qi, Zhengwei; Guan, Haibing (2012). Vis: Virtualization Enhanced Live Forensics Acquisition for Native System. Digital Investigation, Volume 9(1), pp. 22–33, 2012.

Zimmer, Vincent; Rothman, Michael; Marisetty, Suresh (2010). Beyond BIOS: developing with the unified extensible firmware interface. Intel Press, 2010.
