All your System Memory are belong to us: From Low-Level Memory Acquisition to High-Level Forensic Event Reconstruction

Von Hauptspeicherakquise auf niedriger Systemebene zu forensischer Ereignisrekonstruktion auf hoher Abstraktionsebene

Der Technischen Fakultät der Friedrich-Alexander-Universität Erlangen-Nürnberg zur Erlangung des Doktorgrades Dr.-Ing. vorgelegt von

Tobias Latzo

aus Forchheim

Als Dissertation genehmigt von der Technischen Fakultät der Friedrich-Alexander-Universität Erlangen-Nürnberg

Tag der mündlichen Prüfung: 09.07.2021

Vorsitzender des Promotionsorgans: Prof. Dr.-Ing. habil. Andreas P. Fröba
Gutachter: Prof. Dr.-Ing. Felix Freiling, Prof. Dr. rer. nat. Hans P. Reiser

Abstract

This thesis comprises two parts. In the first part, we unveil the limitations of forensic event reconstruction with log files. To enhance forensic event reconstruction, we use system call traces treated as log files. The system call traces are obtained via virtual machine introspection, a technique that analyzes the target system’s memory. System memory analysis in general has evolved into an essential part of today’s forensic investigations. For this, memory needs to be acquired first, which is becoming more difficult with upcoming system security features. Hence, the second part of this thesis is dedicated to memory acquisition techniques. First, we survey the landscape of forensic memory acquisition techniques. Then we introduce new low-layer memory acquisition techniques and tools.

In the first part of this thesis, we calculate characteristic fingerprints for various typical administration-related events. We use different standard log files, and additionally, we make use of system call traces. While these turn out to be beneficial for event detection, they have a significant impact on performance. Hence, we investigate which system calls are discriminative and improve performance by tracing only the relevant ones.

The second part of this thesis is dedicated to low-level memory acquisition techniques, starting with a universal taxonomy and survey. On the one hand, the survey reveals that the lower the memory acquisition technique’s layer, the better. On the other hand, only a few tools run “below” the operating system. In the further course of the thesis, four techniques are introduced that operate on low layers. All techniques come with pros and cons and have their particular use cases. Our first two techniques are integrated into the computer’s firmware. UEberForensIcs acquires memory from the UEFI Shell. To acquire memory, one needs to restart the system and open the UEFI Shell. System memory is exfiltrated over the network. Another approach is to hook UEFI’s Runtime Services, which are called by the target’s operating system. The third technique we introduce makes use of Direct Memory Access (DMA). We leverage the high capabilities of Baseboard Management Controllers (BMCs), which are standard for the remote administration of servers. With BMCLeech, we introduce stealthy memory acquisition from the BMC via DMA that is compatible with the memory forensics tool PCILeech. Eventually, we use the little-known Intel Direct Connect Interface (DCI) to acquire system memory via JTAG debugging. This approach is beneficial in terms of atomicity and integrity of the resulting memory image. DCILeech is also compatible with PCILeech and thus benefits from all its features. Additionally, we show how to read the secured memory of Intel SGX enclaves.

Zusammenfassung

Diese Arbeit besteht aus zwei Teilen. Im ersten Teil zeigen wir die Grenzen der forensischen Ereignisrekonstruktion mit Logdateien auf. Um die forensische Ereignisrekonstruktion zu verbessern, verwenden wir Systemaufrufspuren, die als Logs behandelt werden. Systemaufrufspuren werden über Virtual Machine Introspection gewonnen, eine Technik, die den Speicher des Zielsystems analysiert. Generell hat sich die Analyse des Systemspeichers zu einem wesentlichen Bestandteil der heutigen forensischen Untersuchungen entwickelt. Hierfür muss zunächst der Speicher akquiriert werden, was durch zunehmende Systemsicherheitsfeatures schwieriger geworden ist. Daher ist der zweite Teil dieser Arbeit den Techniken zur Hauptspeicherakquise gewidmet. Zunächst geben wir einen Überblick über existierende Techniken. Anschließend stellen wir neue Speicherakquisetechniken auf niedriger Systemebene vor.

Im ersten Teil dieser Arbeit berechnen wir charakteristische Fingerabdrücke für verschiedene typische Ereignisse im Zusammenhang mit der Linux-Administration. Wir verwenden verschiedene Standard-Logdateien. Zusätzlich machen wir Gebrauch von Systemaufrufspuren. Diese erweisen sich zwar als hilfreich für die forensische Ereignisrekonstruktion, haben aber einen erheblichen Einfluss auf die Performance. Daher untersuchen wir, welche Systemaufrufe diskriminierend sind, und verbessern die Performance, indem wir nur die relevanten tracen.

Der zweite Teil dieser Arbeit widmet sich den Low-Level-Speicherakquisetechniken, beginnend mit einer universellen Taxonomie und einem Survey. Auf der einen Seite zeigt das Survey, dass die Speicherakquisetechniken umso besser sind, je tiefer die Ebene ist, auf der sie ausgeführt werden. Andererseits gibt es nur wenige Tools, die “unterhalb” des Betriebssystems ausgeführt werden. Im weiteren Verlauf der Arbeit werden vier Techniken vorgestellt, die auf niedrigen Schichten arbeiten. Alle Techniken haben ihre Vor- und Nachteile und ihre speziellen Anwendungsfälle.

Unsere ersten beiden Techniken werden in die Firmware des Computers integriert. UEberForensIcs sichert Speicher von der UEFI Shell. Um Speicher zu sichern, muss man das System neu starten und die UEFI Shell öffnen. Der Systemspeicher wird über das Netzwerk exfiltriert. Ein anderer Ansatz ist das Hooking der UEFI Runtime Services, die vom Betriebssystem des Zielsystems aufgerufen werden. Die dritte Technik nutzt direkten Speicherzugriff (DMA). Wir nutzen die hoch-privilegierte Anbindung von Baseboard Management Controllern (BMCs), die es für die Fernwartung von Servern gibt. Mit BMCLeech führen wir eine verdeckte Speicherakquise vom BMC über DMA ein, die mit dem Speicherforensik-Tool PCILeech kompatibel ist. Schließlich nutzen wir das wenig bekannte Intel Direct Connect Interface (DCI), um Systemspeicher über JTAG-Debugging zu sichern. Dieser Ansatz ist vorteilhaft in Bezug auf Atomarität und Integrität des Speicherabbilds. DCILeech ist auch mit PCILeech kompatibel und profitiert so von allen seinen Funktionen. Zusätzlich zeigen wir, wie man den gesicherten Speicher von Intel SGX-Enklaven auslesen kann.

Acknowledgments

I would like to thank my doctoral advisor Felix Freiling for giving me the opportunity to work with him at his Chair for IT-Security Infrastructures at the Department of Computer Science at FAU Erlangen-Nuremberg, for his time in various project meetings, and for his continuous support. I also want to thank all my colleagues at the Chair for a friendly and casual working atmosphere, especially Ralph Palutke for discussing countless research ideas and nonsense. I also have to thank my friends and especially my girlfriend Sarah for being the precious “life” in my work-life balance. Last but not least, I want to thank my parents for their encouragement and incessant support of my dissertation project as well as my entire studies.


Contents

1 Introduction
   1.1 Outline
   1.2 Contribution
   1.3 Publications

2 Background
   2.1 Calculating Characteristic Fingerprints
      2.1.1 System Call Tracing
      2.1.2 Forensic Fingerprint Calculation
   2.2 Methods for Privilege Separation
      2.2.1 Privilege Rings
      2.2.2 Virtual Memory
      2.2.3 Virtualizable Architectures and Virtualization Extensions
      2.2.4 Unified Extensible Firmware Interface
      2.2.5 System Management Mode
      2.2.6 PCI Express and DMA
      2.2.7 Hardware Memory Encryption and Intel SGX
      2.2.8 Out-Of-Band Management
      2.2.9 JTAG Debugging
   2.3 Memory Forensics

3 Limitations of Forensic Event Reconstruction based on Log Files
   3.1 Experimental Setup
      3.1.1 Scenario and Attacker Model
      3.1.2 Log Source Classification
      3.1.3 Experimental Architecture
      3.1.4 Feature Sets
      3.1.5 Events and Event Classification
   3.2 Forensic Fingerprints
      3.2.1 Handling Background Noise
      3.2.2 Non-Characteristic Fingerprints
      3.2.3 Characteristic Fingerprints
   3.3 Matching
      3.3.1 Methodology
      3.3.2 Matching Results


      3.3.3 Stability against Unknown Events
   3.4 System Calls for Forensic Event Reconstruction
      3.4.1 System Call Distribution in System Activity and Characteristic Fingerprints
      3.4.2 The Cost Function
      3.4.3 Greedy Elimination of Expensive System Calls
   3.5 Related Work
   3.6 Discussion

4 Universal Taxonomy and Survey of Forensic Memory Acquisition
   4.1 The Generic Memory Access Hierarchy
      4.1.1 System Model
      4.1.2 Memory Access Levels with Multiplexing
      4.1.3 Memory Access Levels without Multiplexing
      4.1.4 Accessibility with Hardware Memory Encryption
      4.1.5 A Generic Memory Access Hierarchy
      4.1.6 Forensic Memory Acquisition
   4.2 Taxonomy
      4.2.1 Dimension 1: Access Hierarchy Level
      4.2.2 Dimension 2: Pre- or Post-Incident Deployment
      4.2.3 Dimension 3: Terminating vs. Non-Terminating Acquisition
   4.3 Survey
      4.3.1 User Level
      4.3.2 Kernel Level
      4.3.3 Hypervisor Level
      4.3.4 Synchronous Management Level
      4.3.5 Device Level
   4.4 Discussion

5 Bringing Forensic Readiness to Modern Computer Firmware
   5.1 Architecture and Setup
      5.1.1 Hardware Setup
      5.1.2 VM Setup
   5.2 Built-in Cold Boot
      5.2.1 Implementation
      5.2.2 Evaluation
      5.2.3 Discussion
   5.3 Runtime Service Forensics
      5.3.1 Implementation
      5.3.2 Evaluation
      5.3.3 Discussion
   5.4 Related Work
   5.5 Discussion


6 Stealthy Memory Forensics from the BMC
   6.1 Implementation
      6.1.1 Architecture
      6.1.2 BMCLeech
      6.1.3 Kernel Driver
   6.2 Evaluation
      6.2.1 Methodology
      6.2.2 Hardware Setup
      6.2.3 Correctness
   6.3 Related Work
   6.4 Discussion

7 Leveraging Intel DCI for Memory Forensics
   7.1 Intel Direct Connect Interface
      7.1.1 Enabling Intel DCI
      7.1.2 OpenIPC and DAL
   7.2 DCILeech: Design and Implementation
      7.2.1 Architecture
      7.2.2 DCILeech
      7.2.3 PCILeech Patch
   7.3 Evaluation
      7.3.1 Methodology
      7.3.2 Hardware Setup
      7.3.3 Correctness
      7.3.4 Stealthiness
      7.3.5 Intel SGX
   7.4 Digital Forensic Triage with Intel DCI
   7.5 Related Work
   7.6 Discussion

8 Conclusion

Bibliography

A Supplement Material of Forensic Fingerprints

Acronyms

AES Advanced Encryption Standard
AMT Advanced Management Technology
BIOS Basic Input/Output System
BMC Baseboard Management Controller
CLI Command Line Interface
CPU Central Processing Unit
CSME Converged Security and Management Engine
DAL DFx Abstraction Layer
DCI Direct Connect Interface
DL Device Level
DMA Direct Memory Access
DXE Driver Execution Environment
EDK II EFI Development Kit II
EFI Extensible Firmware Interface
EPC Enclave Page Cache
EPT Extended Page Table
FPGA Field Programmable Gate Array
GPA Guest Physical Address
HL Hypervisor Level
HPA Host Physical Address
IC Integrated Circuit
IOMMU I/O Memory Management Unit
IoT Internet of Things
IPMI Intelligent Platform Management Interface
JTAG Joint Test Action Group
KL Kernel Level
MKTME Multi-Key Total Memory Encryption
MMU Memory Management Unit
OOB out-of-band
OS Operating System
PCB Printed Circuit Board
PCH Platform Controller Hub
PD Page Directory
PDPT Page Directory Pointer Table
PFH Page Fault Handler
PML4 Page Map Level 4
PT Page Table
PTE Page Table Entry
RAM Random-Access Memory
RC Root Complex
RTS Runtime Service
SEV Secure Encrypted Virtualization


SGX Software Guard Extensions
SIEM Security Information and Event Management
SME Secure Memory Encryption
SMI System Management Interrupt
SML Synchronous Management Level
SMM System Management Mode
SoC System-on-Chip
SPI Serial Peripheral Interface
SVT Silicon View Technology
TAP Test Access Port
TLB Translation Lookaside Buffer
TME Total Memory Encryption
TSME Transparent Secure Memory Encryption
UEFI Unified Extensible Firmware Interface
UL User Level
USB Universal Serial Bus
VM Virtual Machine
VMI Virtual Machine Introspection
VMM Virtual Machine Monitor
VMX Virtual Machine Extensions
VT Virtualization Technology
VT-d Virtualization Technology for Directed I/O


1 Introduction

In 2014, the taxicab dispatcher Philip Welsh was murdered in his home [51]. Welsh did not use any digital devices. Investigators had no indications at all of what Welsh was doing or whom he met. Eventually, the case could not be solved. Officials say that this is due in large part to the fact that there was no digital evidence. On the other hand, there are cases with potential digital evidence that cannot be accessed. In 2015, in the San Bernardino attack, two terrorists shot fourteen people. The FBI seized an iPhone 5C of one of the perpetrators and asked the NSA to decrypt it. However, the NSA was not able to do so. So the FBI asked Apple, but Apple declined. Eventually, the FBI got the phone unlocked by a third party.

The role of digital evidence in court is becoming more and more crucial. Its relevance is even compared with that of firearms, fingerprints, or DNA. In nearly all cases, at least one smartphone is seized. Classical digital forensics pre-eminently focused on hard disk forensics. Today’s digital forensics also needs to take the main memory into account. There are manifold reasons for this. With memory forensics, it is possible to get insights into the current state of the system. This state may contain much interesting data that is never written to disk, such as encryption keys, decrypted containers or documents, open network connections, attached network drives, or file-less malware. In the case of full-disk encryption, it is even possible that the investigator, after turning the device off, never gains access to the system or data again.

Today, there are many tools and techniques to acquire a system’s main memory [81]. The most popular tools require Operating System (OS) support, i.e., they are implemented as kernel drivers. Examples are Pmem [140], LiME [144], and DumpIt [142]. However, this technique comes with some drawbacks. First, the investigator needs to install the software, which changes the main memory and even the disk data. Forensic data acquisition should be forensically sound, meaning the acquired data should be as original as possible; the investigator at least needs to document any modification of the system. Second, to install software on a computer, one needs the corresponding rights, i.e., root or administrator rights, so a password is usually required. Furthermore, the investigator needs to trust the underlying operating system. Otherwise, there is the risk of anti-forensic techniques that may tamper with the acquired memory [117].
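The documentation step just mentioned can be illustrated with a small sketch: hash the acquired image immediately and record when and how it was taken, so the record can accompany the image as part of the chain of custody. This is a generic illustration; the file names and record fields are our own, not those of any specific acquisition tool.

```python
import hashlib
import json
import time

def document_acquisition(image_path: str, tool: str) -> dict:
    """Hash an acquired memory image and record basic acquisition metadata."""
    sha256 = hashlib.sha256()
    with open(image_path, "rb") as f:
        # Hash in 1 MiB chunks so arbitrarily large images fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            sha256.update(chunk)
    record = {
        "image": image_path,
        "tool": tool,
        "sha256": sha256.hexdigest(),
        "acquired_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    return record

if __name__ == "__main__":
    # Illustrative path; in practice this is the dump produced by the tool.
    print(json.dumps(document_acquisition("memdump.lime", "LiME"), indent=2))
```

The record itself should be stored separately from the image, so later modifications of the image become detectable.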

If we glance at modern smartphones, we can see a trend that may spill over to the desktop computers and servers of the future. Software can often only be installed from the corresponding app store. On smartphones, the end user has no root privileges and cannot install high-privileged software that is allowed to access the whole file system or even the main memory. Furthermore, the systems are getting smaller and smaller, and sometimes even Random-Access Memory (RAM) is integrated into the System-on-Chip (SoC), which impedes physical access like cold boot attacks [61]. Thus, there are special devices for smartphones that often use exploits to gain access to the device [15]. Recently, Apple shifted to ARM SoCs for its latest Macs, which come with integrated DRAM [4]. Additionally, applications need to be signed by Apple to be executed [152].


A posteriori deployment of memory acquisition tools is difficult and often relies on previous vulnerability exploitation. Hence, the idea is to prepare a system for forensic investigations. ISO/IEC 27043:2015 [70] defines digital forensic readiness as the “process of being prepared for a digital investigation before an incident has occurred.” Forensic readiness is related to the preparation phases in many process models of incident response and digital forensic investigations. Usually, it involves establishing a capability for securely gathering legally admissible evidence in case of an incident [131].

The beginning of the title of this thesis, “All your system memory are belong to us”, refers to a mistranslation in the Japanese arcade space shooter Zero Wing from 1989 [9]. Starting in 2000, the original mistranslation “All your base are belong to us” evolved into a well-known meme and is also widespread in the IT security community [8, 10, 137, 154]. The meme connotes that aggressive measures are applied; however, one needs to apply them carefully, or one risks becoming a hilarious meme oneself.

This thesis’s overall goal is to explore techniques that will allow forensic analysis in the future. To this end, the capabilities of common log data-based event reconstruction are researched. The logs are enriched by system call traces obtained using Virtual Machine Introspection (VMI), which turned out to be beneficial for forensic event reconstruction. Then, the landscape of existing memory acquisition techniques and tools is surveyed. One result is that the lower the layer of memory acquisition, the better. As a result, this thesis contains four techniques that acquire memory on lower layers: three actual memory acquisition tools and one proof of concept. The tools need to be installed in advance, i.e., they make the corresponding systems forensic-ready.

1.1 Outline

In the following, we provide a high-level overview of this thesis. First, in Chapter 2, we provide the background information necessary to understand the rest of this thesis. This comprises the theory of characteristic fingerprint calculation, privilege separation methods for the x86 architecture, and some fundamentals of memory forensics.

In Chapter 3, we calculate characteristic fingerprints for typical Linux events. Figure 1.1 shows the architecture of the experimental setup. To learn new events, these are executed automatically from another system. We were able to calculate characteristic fingerprints for most events. It was also possible to match events based on logs that were extended by system call traces. System call traces gained using VMI — a technique that uses memory analysis to trace system calls — have turned out to be beneficial for forensic event reconstruction. However, tracing all system calls produces significant performance overhead, which we counter by tracing only discriminative system calls.

System call tracing is not the only helpful memory analysis technique. Especially when it comes to a forensic investigation of a potential incident, memory analysis has evolved into a decisive factor. For this, memory has to be acquired first. However, increasing system security makes it harder to deploy acquisition software. Furthermore, an installation after a potential incident has occurred may also taint evidence. In Chapter 4, we survey the landscape of memory acquisition techniques and tools. We classify the techniques based on their privilege level, time of deployment, and whether they terminate the target. It turns out that the lower the layer of a memory acquisition technique, the better. Hence, the rest of this thesis


is dedicated to new memory acquisition techniques.

[Figure 1.1: Overview over the architecture we used for calculating and matching of characteristic fingerprints. Components: apps and operating system inside the target virtual machine, a forensic monitor on the server, and a parser, fingerprint calculation, and matcher on the forensic workstation processing the logs and the system call log.]

Figure 1.2 gives an overview of the

memory acquisition techniques in this thesis.

[Figure 1.2: Overview over the memory acquisition techniques in this thesis (DCILeech, BMCLeech, UEberForensIcs, and RTS acquisition) classified by atomicity (x-axis) and privilege level (y-axis). LiME is added as a reference.]

The x-axis indicates the level of atomicity. Roughly speaking, this means how much memory can be acquired at once. The acquisition duration is often a good indicator, but atomicity is also high when the CPUs are halted. The y-axis indicates the level of privilege. In this case, it does not denote the x86 privilege levels one-to-one but takes other privileges into account, e.g., direct access to special registers. As a reference, we also classify the well-known kernel acquisition tool LiME in the figure.

The first memory acquisition technique we present in Chapter 5 — UEberForensIcs —

is similar to a cold boot attack. In this case, memory acquisition software is integrated into the computer’s firmware. After restarting the computer, the investigator can trigger the memory acquisition from the firmware, and the memory is exfiltrated over the network. Running threads are stopped immediately, so this acquisition technique is rather atomic, and it is powerful because it is executed within the firmware. This chapter also shows how memory can be acquired from the so-called UEFI Runtime Services (RTSs). These are services of the Unified Extensible Firmware Interface (UEFI) that can be called by the OS. Exfiltrating data has turned out to be quite challenging with this technique. Hence, in this thesis, we provide a proof-of-concept implementation of a tracer that traces RTS calls.

In Chapter 6, we introduce BMCLeech, a DMA-based memory acquisition technique. This means an external device, in this case a Baseboard Management Controller (BMC), which can be found in nearly every server system, directly accesses the target system’s memory without the host noticing. The memory acquisition software is integrated into the firmware of the BMC. Besides its other server maintenance features, the BMC can then stealthily acquire memory. Like all DMA-based acquisition techniques, BMCLeech suffers from poor atomicity. However, the acquisition cannot be detected or tampered with by the host OS.

In Chapter 7, we use a low-cost debugging technique, namely the Intel Direct Connect Interface (DCI), to acquire memory. DCILeech allows halting the CPUs, which allows the investigator to dump memory atomically. DCI also allows reading the specially protected memory of Intel’s Software Guard Extensions (SGX) enclaves. Hence, memory acquisition with the DCILeech software presented in this chapter is powerful.

Finally, in Chapter 8, we summarize the results of this thesis.

1.2 Contribution

In this section, we provide more technical insights into our contributions. Our contributions are twofold. We (1) demonstrate the limitations of conventional Linux logs for forensic event reconstruction and show the benefits of memory analysis using system call traces. To this end, we calculate characteristic fingerprints on a Linux server. We measure the impact of various factors, like the feature set or the source set. Furthermore, we show how the fingerprints perform in matching. System call traces gained using VMI have turned out to be very beneficial. VMI is a kind of memory forensics that aims to interpret the memory of a virtual machine. When it comes to an actual forensic investigation, memory forensics in general has turned out to be essential. Before the analysis, memory needs to be acquired, which is becoming more challenging. So, we (2) survey the landscape of forensic memory acquisition techniques. The survey reveals that there are only a few tools on lower layers, although memory acquisition from lower layers is beneficial. In this thesis, we contribute four low-level techniques.
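The fingerprint idea can be illustrated as a set computation: a characteristic fingerprint keeps the log features that occur in every run of an event but not in background noise, and matching checks whether a log window contains the whole fingerprint. This is a deliberate simplification of the method developed in Chapter 3; the feature names below are made up for illustration.

```python
def characteristic_fingerprint(runs, noise):
    """Features present in every run of an event but absent from noise.

    runs  -- list of sets, one set of observed log features per event execution
    noise -- set of features also observed while the system is idle
    """
    common = set.intersection(*runs)  # features stable across all runs
    return common - noise             # drop background noise

def matches(fingerprint, observed):
    """Report an event if its whole fingerprint appears in a log window."""
    return bool(fingerprint) and fingerprint <= observed
```

An event whose fingerprint becomes empty after noise removal is, in this simplified view, undetectable from the given log sources.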

In detail, our contributions in the log analysis are as follows:

• Despite the high measurement noise, we could calculate reasonable characteristic fingerprints for various events. However, the evaluation revealed that most command line tools leave almost no traces in common log files like syslog. Detectability increases only marginally if auth.log is additionally considered.

4 1.2 Contribution

• The possibility of event reconstruction increases significantly if system call traces are logged. However, even then, it was not possible to calculate characteristic fingerprints for all events. Therefore, the reconstruction of events based on such log files is severely limited.

• Using a unified log format and parsing unstructured log sources into more struc- tured ones increases the size of characteristic fingerprints and is beneficial for event reconstruction.

• The log entries generated by complex events (administrative events or events of web applications) usually overlap those of simple events (like command line tools). If com- plex events can be excluded, the detectability of other events increases substantially.

• Our calculated characteristic fingerprints are applicable for forensic event reconstruction, and the matching results are quite good. Overall, our complete event set’s sensitivity is about 88% and reaches 100% if only detectable events are considered.

• We show that the false positive rate of unknown events is relatively low and usually only appears with similar or related events.

• We show how to use system calls for forensic event reconstruction systematically. Furthermore, we reveal which system calls are discriminative and how expensive they are in terms of performance.
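The greedy elimination of expensive system calls (Section 3.4.3) can be sketched as follows: starting from all system calls occurring in any fingerprint, repeatedly drop the most expensive one as long as every event keeps at least part of its fingerprint. The cost values and the "still detectable" criterion here are illustrative simplifications of the actual cost function.

```python
def greedy_eliminate(fingerprints, cost):
    """Drop expensive system calls while every event stays detectable.

    fingerprints -- dict: event name -> set of system call names
    cost         -- dict: system call name -> tracing overhead (higher = worse)
    Returns the set of system calls that still must be traced.
    """
    traced = set().union(*fingerprints.values())
    # Try to remove calls in order of decreasing cost.
    for sc in sorted(traced, key=cost.get, reverse=True):
        remaining = traced - {sc}
        # Keep sc only if some event would otherwise lose its whole fingerprint.
        if all(fp & remaining for fp in fingerprints.values()):
            traced = remaining
    return traced
```

The loop is greedy: once a call is kept because some event needs it, that decision is never revisited, which keeps the procedure cheap at the price of possible non-optimality.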

In the area of memory acquisition techniques, we first conducted a survey to explore the field. The main contributions are as follows:

• We develop a model that allows us to classify today’s memory acquisition techniques in a more general way. Briefly speaking, we define a partial order on acquisition methods based on the level of access to the address space a certain technique is supposed to acquire. It turns out that this model generalizes not only concrete operating systems but also specific hardware architectures. In contrast to possible categorizations that rely on the classic ring-based privilege model, this partial order also allows us to integrate seemingly unrelated execution contexts like Intel’s SGX [26], AMD’s Secure Memory Encryption (SME)/Secure Encrypted Virtualization (SEV) [75, 1], or virtualization-based techniques into a more general taxonomy. Using this taxonomy, we survey the field of today’s memory acquisition techniques.

• We present the first survey of forensic memory acquisition that is independent of operating system and hardware architecture. We classify a vast number of tools and thereby give the reader a thorough understanding of the latest memory acquisition techniques. Our analysis also points to promising fields for future work, since we observe that acquisition techniques cluster at the hypervisor level or higher, while not many tools exist for more powerful privilege levels.
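To make the taxonomy concrete, its three dimensions can be expressed as a tiny classification helper. The linear ranking of levels below is an illustrative simplification (the thesis defines a partial order, not a total one), reusing the level acronyms UL, KL, HL, SML, and DL from the survey.

```python
# Lower rank = lower layer = more powerful access (illustrative simplification).
LEVELS = {"DL": 0, "SML": 1, "HL": 2, "KL": 3, "UL": 4}

def at_least_as_low(a: str, b: str) -> bool:
    """True if a technique at level `a` runs at or below level `b`."""
    return LEVELS[a] <= LEVELS[b]

def classify(level: str, pre_incident: bool, terminating: bool) -> tuple:
    """The three taxonomy dimensions: access level, deployment time, termination."""
    return (level,
            "pre-incident" if pre_incident else "post-incident",
            "terminating" if terminating else "non-terminating")
```

For example, a kernel driver installed as part of forensic readiness that keeps the target running would classify as `("KL", "pre-incident", "non-terminating")`.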

The contributions for memory acquisition from the firmware are as follows:

• We introduce UEberForensIcs and show how to integrate forensic software that enables cold boot-like memory acquisition directly into the firmware of a computer. The evaluation in the thesis reveals that this approach can also be practically used.

5 1 Introduction

• We show how to persist code in the UEFI that is executed when the operating system is running. This code runs with kernel privileges and can also be used for memory acquisition.

• We develop an OS-independent RTS tracer; the RTSs are thereby traced in the RTS code itself. Our evaluation gives insights into which RTSs are typically called, and how often, in different scenarios.

The contributions for memory acquisition from the BMC are as follows:

• We introduce BMCLeech, the first software that brings forensic readiness onto the BMC and whose memory acquisition cannot be detected by the target host system. BMCLeech is implemented as a PCILeech device and is thus compatible with well-known memory forensic software, so no additional effort is needed to analyze a system’s memory. Even the acquisition software of an analyst does not need to be replaced.

• We provide an evaluation that demonstrates the feasibility and practicality of BMCLeech.

The contributions for memory acquisition with Intel DCI are as follows:

• We introduce DCILeech — the first low-level memory acquisition method that utilizes Intel’s DCI. This technique allows dumping system memory and producing a memory snapshot that satisfies full atomicity and full integrity. No software installation on the target is required. DCILeech benefits from its compatibility with PCILeech [46], which we demonstrate in the evaluation.

• We show how to access the decrypted memory of Intel SGX enclaves with the Debug profile. Using the DFx Abstraction Layer (DAL), we were able to access specially protected enclave memory in the Enclave Page Cache (EPC).

• We show how to break CPU-bound encryption using Intel DCI.

• We sketch how Intel DCI can be used for digital forensic triage.
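Both BMCLeech and DCILeech plug into PCILeech as devices that answer raw physical-memory reads. The following toy mock (our own Python illustration; the real PCILeech device API is in C and differs) shows the core contract such a device fulfills, with a bytearray standing in for the target's RAM.

```python
class MockPhysicalMemoryDevice:
    """Toy stand-in for a PCILeech-style device exposing raw physical reads.

    A real backend (e.g., a BMC or a DCI connection) answers these reads
    from the target's actual RAM; here a bytearray plays the target memory.
    """

    def __init__(self, ram: bytearray):
        self.ram = ram

    def read(self, phys_addr: int, length: int) -> bytes:
        if phys_addr < 0 or phys_addr + length > len(self.ram):
            raise ValueError("read beyond end of physical memory")
        return bytes(self.ram[phys_addr:phys_addr + length])

    def write(self, phys_addr: int, data: bytes) -> None:
        if phys_addr < 0 or phys_addr + len(data) > len(self.ram):
            raise ValueError("write beyond end of physical memory")
        self.ram[phys_addr:phys_addr + len(data)] = data
```

Any analysis frontend that only depends on this read/write contract works unchanged regardless of whether the bytes come over DMA, JTAG, or a mock, which is exactly why PCILeech compatibility spares the analyst additional tooling.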


1.3 Publications

Most results within this thesis are taken from publications at peer-reviewed conferences and journals. This thesis is based on the following publications:

[80] Tobias Latzo and Felix Freiling. Characterizing the Limitations of Forensic Event Reconstruction Based on Log Files. In 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications / 13th IEEE International Conference On Big Data Science And Engineering, TrustCom/BigDataSE 2019, Rotorua, New Zealand, August 5-8, 2019, pages 466-475. IEEE, 2019. doi: 10.1109/TrustCom/BigDataSE.2019.00069. URL https://doi.org/10.1109/TrustCom/BigDataSE.2019.00069.

[79] Tobias Latzo. Efficient Fingerprint Matching for Forensic Event Reconstruction. In Digital Forensics and Cyber Crime: 11th EAI International Conference, ICDF2C 2020, Boston, MA, USA, October 15-16, 2020, Proceedings, volume 351, pages 98-120. Springer, 2021. doi: 10.1007/978-3-030-68734-2_6. URL https://doi.org/10.1007/978-3-030-68734-2_6.

[81] Tobias Latzo, Ralph Palutke, and Felix Freiling. A universal taxonomy and survey of forensic memory acquisition techniques. Digital Investigation, 28(Supplement):56-69, 2019. doi: 10.1016/j.diin.2019.01.001. URL https://doi.org/10.1016/j.diin.2019.01.001.

[83] Tobias Latzo, Florian Hantke, Lukas Kotschi, and Felix Freiling. Bringing Forensic Readiness to Modern Computer Firmware. Accepted at Digital Forensics Research Workshop EU 2021, DFRWS EU 2021, 2021.

[82] Tobias Latzo, Julian Brost, and Felix Freiling. BMCLeech: Introducing Stealthy Memory Forensics to BMC. Digital Investigation, volume 32, article 300919, 2020. doi: 10.1016/j.fsidi.2020.300919. URL http://www.sciencedirect.com/science/article/pii/S2666281720300147.

[84] Tobias Latzo, Matti Schulze, and Felix Freiling. Leveraging Intel DCI for Memory Forensics. Accepted at Digital Forensics Research Workshop US 2021, DFRWS US 2021, 2021.

The publications are used in this thesis as follows:

• Chapter 3 is mostly based on our two conference publications: “Characterizing the Limitations of Forensic Event Reconstruction Based on Log Files” [80] and “Efficient Fingerprint Matching for Forensic Event Reconstruction” [79]. For the first paper, the author of this thesis conceptualized the paper, implemented the software, and performed the evaluation. The author of this thesis is the sole author of the second paper.

• Chapter 4 is based on our journal publication “A universal taxonomy and survey of forensic memory acquisition techniques” [81]. The survey is a result of joint work with Ralph Palutke, conceptualized by the author of this thesis.


• Chapter 5 is based on our conference publication "Bringing Forensic Readiness to Modern Computer Firmware" [83]. The implementation of UEberForensIcs is based on a bachelor thesis by Lukas Kotschi supervised by the author of this thesis, while the RTS proof of concept was implemented by Florian Hantke, who also contributed measurements to the evaluation.

• Chapter 6 is based on our conference publication "BMCLeech: Introducing Stealthy Memory Forensics to BMC" [82]. While the author of this thesis wrote the publication, the implementation of BMCLeech results from a project work by Julian Brost supervised by the author of this thesis.

• Chapter 7 is based on our paper "Leveraging Intel DCI for Memory Forensics" [84], which was written by the author of this thesis. It is a result of a bachelor thesis by Matti Schulze supervised by the author of this thesis.

Furthermore, the author of this thesis contributed to the following publications:

[100] Florian Menges, Fabian Böhm, Manfred Vielberth, Alexander Puchta, Benjamin Taubmann, Noëlle Rakotondravony, and Tobias Latzo. Introducing DINGfest: An architecture for next generation SIEM systems. In Hanno Langweg, Michael Meier, Bernhard C. Witt, and Delphine Reinhardt, editors, Sicherheit 2018, Beiträge der 9. Jahrestagung des Fachbereichs Sicherheit der Gesellschaft für Informatik e.V. (GI), 25.-27.4.2018, Konstanz, volume P-281 of LNI, pages 257-260. Gesellschaft für Informatik e.V., 2018. ISBN 978-3-88579-675-6.

[43] Felix Freiling, Tobias Groß, Tobias Latzo, Tilo Müller, and Ralph Palutke. Advances in Forensic Data Acquisition. IEEE Design & Test, 35(5):63-74, 2018. doi: 10.1109/MDAT.2018.2862366. URL https://doi.org/10.1109/MDAT.2018.2862366.

[101] Florian Menges, Tobias Latzo, Manfred Vielberth, Sabine Sobola, Henrich C. Pöhls, Benjamin Taubmann, Johannes Köstler, Alexander Puchta, Felix Freiling, Hans P. Reiser, and Günther Pernul. Towards GDPR-compliant data processing in modern SIEM systems. Computers & Security, 103:102165, 2021. ISSN 0167-4048. doi: 10.1016/j.cose.2020.102165. URL http://www.sciencedirect.com/science/article/pii/S0167404820304387.

2 Background

In this chapter, we briefly provide some background information necessary to understand the rest of this thesis. First, the theoretical background of calculating characteristic fingerprints is given (Section 2.1). Section 2.2 focuses on x86 privilege rings, including techniques for privilege separation, e.g., virtual memory, virtualization techniques, DMA, and Intel SGX. In Section 2.3, we focus on some principles of memory forensics, especially memory acquisition.

2.1 Calculating Characteristic Fingerprints

First, we want to describe what characteristic fingerprints are and how they are calculated. Besides conventional logs, we also use system call traces (see Section 2.1.1) that we treat as log messages. In Section 2.1.2 we describe the application of Dewald's theory of characteristic fingerprints [32] to logs.

2.1.1 System Call Tracing

System calls are service invocations of the operating system, issued by an application or the corresponding library. In GNU/Linux there are various system calls, like execve for starting new applications, open for opening a file, or connect for initiating a connection on a socket. There are many techniques for system call tracing. Traditionally, sandboxes have been used for dynamic analysis [158]. One disadvantage of using a sandbox is that it monitors the system calls of one particular process (set of threads) only. Virtualization techniques today allow us to easily trace the system calls of entire machines using Virtual Machine Introspection (VMI), e.g., using libraries like libvmtrace [145], which we also employ later. Using VMI for system call tracing has the advantage that one does not need to know in advance which thread to trace. Furthermore, system calls are also monitored that are initiated indirectly, e.g., by interprocess communication that results in system calls in different threads or processes. However, tracing all system calls can be quite resource-intensive.
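To give a concrete, purely illustrative impression of how such traces can be treated as log messages, the following Python sketch normalizes strace-style trace lines into dictionaries. The line format, regular expression, and field names are our own assumptions for illustration and do not reflect the actual output of libvmtrace:

```python
import re

# Assumed (hypothetical) trace line format: "<pid> <syscall>(<args>) = <retval>".
TRACE_LINE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<name>\w+)\((?P<args>.*)\)\s*=\s*(?P<ret>-?\d+)$"
)

def parse_trace(lines):
    """Turn raw system call trace lines into dictionaries that can be
    processed like ordinary log messages."""
    events = []
    for line in lines:
        match = TRACE_LINE.match(line)
        if match:  # silently skip lines that do not look like system calls
            events.append(match.groupdict())
    return events

trace = [
    '1042 execve("/usr/bin/ls", ["ls"], 0x7ffd) = 0',
    '1042 openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3',
    '1042 connect(4, {sa_family=AF_INET}, 16) = -1',
]
events = parse_trace(trace)  # three dictionaries, one per system call
```

Each resulting dictionary can then be mapped to a feature vector in the sense of the fingerprint calculation described next.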

2.1.2 Forensic Fingerprint Calculation

Forensic fingerprints are trace patterns left by certain events in digital evidence. Our forensic fingerprinting approach is based on techniques developed by Dewald [32], which we briefly explain now. An individual log message can be represented by a vector of features. Examples of such features are a generic type_id, the user (login name) responsible for the event, the path associated with the event, etc. While there can be many different features, we assume

there is a fixed set F of all relevant features in a specific analysis context. We define V as the set of all possible feature vectors over F.

When an event happens on a computer system, entries in log files may be written as a result of that event. Therefore, an event can be regarded as the generator of a set of feature vectors, one for every log file entry caused by the event.

Formally, we define a set Σ of all events that can happen. When an event σ ∈ Σ happens, it generates a set of feature vectors, which can be regarded as the traces left by the event in log files. For any such event σ, the Evidence Set E(σ) [32] is the set of all subsets of feature vectors in V caused by σ. Furthermore, E(σ) must be closed under subsets. Intuitively, the latter requirement expresses that partial evidence is also evidence of the event, but it is also a technical requirement to use the evidence set in calculations.

The evidence set E(σ) of an event contains all feature vectors caused by σ, and it is clear that evidence sets of different events may overlap. In a forensic context, it is interesting to know which feature vectors are actually caused by the event σ and not by any other event. This is the idea behind characteristic evidence. Formally, for an event σ ∈ Σ, the set of characteristic evidence CE(σ) with respect to a set of "other" events Σ′ ⊆ Σ is defined as follows:

CE(σ, Σ′) = E(σ) \ ⋃_{σ′∈Σ′} E(σ′)

Intuitively, CE(σ, Σ′) is the set of all feature vectors from E(σ) that remain after "subtracting" the evidence sets of all events in the set Σ′, which we call the reference set. If |CE(σ, Σ′)| and |Σ′| are sufficiently large, a feature vector from CE(σ, Σ′) can be regarded as a clear indication that σ has happened and not any other event from the reference set. It may, however, also happen that CE(σ, Σ′) is empty, i.e., σ cannot be reliably detected given the set of features F. The factors influencing this form of detectability are investigated in Chapter 3.

In the following we use the terms characteristic evidence and characteristic fingerprint synonymously.
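The set operations behind characteristic evidence translate directly into a few lines of Python. The sketch below simplifies Dewald's model by treating an evidence set as a flat set of feature vectors (ignoring closure under subsets for brevity); the events and feature values are invented for illustration:

```python
def characteristic_evidence(evidence, sigma, reference):
    """CE(sigma, reference): feature vectors in E(sigma) that are not
    contained in the evidence set of any event in the reference set."""
    other = set()
    for sigma_prime in reference:
        other |= evidence[sigma_prime]
    return evidence[sigma] - other

# Hypothetical evidence sets; feature vectors are (type_id, user, path) tuples.
E = {
    "useradd": {("t1", "root", "/etc/passwd"), ("t2", "root", "/etc/shadow")},
    "passwd":  {("t2", "root", "/etc/shadow")},
    "ls":      {("t3", "alice", "/home/alice")},
}

ce = characteristic_evidence(E, "useradd", {"passwd", "ls"})
# Only the /etc/passwd vector remains characteristic of "useradd" here;
# the /etc/shadow vector is also caused by "passwd" and is subtracted.
```

The same computation with an empty result illustrates the undetectable case: subtracting E("useradd") from E("passwd") leaves nothing, so "passwd" has no characteristic evidence against that reference set.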

2.2 Methods for Privilege Separation

In the following we describe some privilege separation techniques of x86 systems in more detail. We start with the most fundamental techniques — the privilege rings (see Section 2.2.1) and virtual memory (see Section 2.2.2). Then we continue with virtualization techniques (see Section 2.2.3). Furthermore, we briefly describe the UEFI (see Section 2.2.4), the System Management Mode (SMM) (see Section 2.2.5), DMA (see Section 2.2.6), memory encryption techniques including Intel's SGX (see Section 2.2.7), and out-of-band (OOB) management processors (see Section 2.2.8). Finally, we give some background information about Joint Test Action Group (JTAG) debugging (see Section 2.2.9).

2.2.1 Privilege Rings

Intuitively, privilege rings are hierarchical levels of protection that generalize the classical distinction between user and supervisor mode in common CPUs. When in user mode, the processor uses a restricted instruction set. During certain events (e.g., interrupts, execution


of privileged instructions, access to machine-specific registers or peripherals), the processor switches to supervisor mode. The x86-64 architecture comes with four native protection rings (ring 0 to ring 3) [21], where the lower the ring number, the higher the privilege level. Typical operating systems, e.g., Linux or Windows, use only two of these rings: the kernel operates in ring 0 (supervisor mode), while applications are executed in ring 3 (user mode). These protection rings are used to protect assets like memory regions, I/O ports, and privileged Central Processing Unit (CPU) instructions. Violating the current protection ring causes a context switch to supervisor mode, which typically allows an operating system's kernel to handle the corresponding fault. Other architectures provide very similar protection models. ARM, for example, uses exception levels that separate execution into user, system, hypervisor, and secure monitor mode. Sparc and PowerPC, however, restrict their privilege levels to user and kernel mode.

Figure 2.1: Virtual memory organization in x86-64. The MMU has to walk from the Page Map Level 4 (PML4) table through the Page Directory Pointer Table (PDPT), the Page Directory (PD) table, and the Page Table (PT) to the physical 4 KiB page with the requested value.

2.2.2 Virtual Memory

Modern processors provide hardware support for transparent memory address translation at run time. This is commonly used to separate the address spaces of the kernel and applications and to create an abstraction called virtual memory. A hardware Memory Management Unit (MMU) translates a virtual address to a physical address using a set of lookup tables in memory. The root of these lookup tables is accessed via a special CPU register (called CR3 on x86). To simplify the translation, memory is usually divided into blocks of equal size called pages that are mapped to blocks of equal size in physical memory called page frames. Isolation is achieved by using different and independent sets of lookup tables per process. The memory translation process is completely transparent to an application, so for an application it looks as if it owned the whole address space.

Figure 2.1 shows how virtual memory on x86-64 is implemented using a page size of 4 KiB. Each process holds its own CR3 value, which points to the highest-level page table in the paging hierarchy (denoted as PML4). To calculate the actual physical memory address, the MMU needs to traverse the page table hierarchy down to the final Page Table Entry (PTE). Adding the physical page offset to the address of the page frame then yields the physical address. To speed up translation, today's processors store already translated virtual-to-physical address mappings in special hardware caches called Translation Lookaside Buffers (TLBs).
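The index extraction that the MMU performs during such a page table walk can be illustrated with the following sketch (covering only the four-level, 48-bit case with 4 KiB pages; each table level is indexed by 9 bits and the page offset by the lowest 12 bits):

```python
def split_virtual_address(va: int) -> dict:
    """Split a 48-bit x86-64 virtual address into its four page table
    indices and the page offset used during a page table walk."""
    assert 0 <= va < 1 << 48, "48-bit virtual address expected"
    return {
        "pml4":   (va >> 39) & 0x1FF,  # bits 47..39 index the PML4
        "pdpt":   (va >> 30) & 0x1FF,  # bits 38..30 index the PDPT
        "pd":     (va >> 21) & 0x1FF,  # bits 29..21 index the PD
        "pt":     (va >> 12) & 0x1FF,  # bits 20..12 index the PT
        "offset": va & 0xFFF,          # bits 11..0 are the page offset
    }
```

Each 9-bit index selects one of the 512 eight-byte entries of a 4 KiB page table at the corresponding level.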

2.2.3 Virtualizable Architectures and Virtualization Extensions

While virtual memory refers to the abstraction of an address space only, the more general term virtualization describes the abstraction of a system's complete physical resources to one or more Virtual Machines (VMs). A VM serves a guest system as an isolated execution environment that is controlled by a Virtual Machine Monitor (VMM), or hypervisor. Furthermore, a VMM is responsible for scheduling physical resources between the VMs. To allow full virtualization, all instructions that access or control virtualized resources must be privileged instructions that cause a trap into the VMM when executed within a VM [124]. Since not all of these instructions can be trapped on x86-64, the instruction set is not fully virtualizable. To overcome this problem, Intel introduced virtualization extensions that we now briefly explain.

In 2005, Intel released the first CPUs implementing Intel Virtualization Technology (VT), which is also known as Virtual Machine Extensions (VMX) or VT-x [27]. VT-x provides full virtualization of a CPU and enables the execution of unmodified guest systems. To this end, it introduces two new operating modes that are orthogonal to the classic ring model: VMX root operation and VMX non-root operation. While the former refers to the mode the hypervisor operates in, the latter constitutes the restricted guest mode. Whenever a guest tries to break these restrictions, e.g., by executing certain sensitive and privileged instructions or by accessing specific registers or peripherals, the processor triggers a VM-exit and transfers control to the hypervisor. After handling the fault-causing event, the hypervisor resumes the guest by generating a VM-entry.

To prevent guests from accessing each other and the VMM, Intel provides a hardware-based second-level address translation mechanism to virtualize guest physical memory. This forces the CPU to translate a Guest Physical Address (GPA) into a Host Physical Address (HPA) through a set of Extended Page Tables (EPTs) exclusively controlled by the VMM. Like classical paging, EPTs are a hierarchically ordered set of page tables that allow the VMM to prevent a guest from accessing certain memory areas. Any attempt to violate these restrictions leads to an EPT violation fault, which switches control to the hypervisor.

Conceptually, a VMM has even more privileges than programs running in ring 0. This is why the privilege level of a hypervisor is usually termed "ring -1".

2.2.4 Unified Extensible Firmware Interface

The UEFI was introduced in 1998 as the successor of the legacy PC Basic Input/Output System (BIOS). Often the UEFI is still called BIOS; a more generic term for UEFI and BIOS is firmware. In 2014, Intel released an open source implementation of an Extensible Firmware Interface (EFI) called Tiano. Tiano evolved into the EFI Development Kit II (EDK II) and is now maintained by the TianoCore community [150].


The UEFI boots itself into protected mode (32-bit) or long mode (64-bit). UEFI implementations have evolved into tiny OSs with their own applications, network stack, etc. There are several specified stages, e.g., the Security (SEC) phase as the first stage, followed by the Pre EFI Initialization (PEI) and Driver Execution Environment (DXE) phases. When the DXE phase is reached, basically all hardware initialization has happened, and the hardware can be used. During this phase, the system resources are owned by the firmware Boot Services. When booting the actual OS, ExitBootServices() is called and control of the resources is handed over to the OS. However, there are the UEFI Runtime Services (RTSs) that can still be called by the OS, e.g., to read and write firmware parameters or to get and set the time.

2.2.5 System Management Mode

Sometimes, system architectures contain specific operating modes that are entirely independent of all normal system operations. On the Intel architecture, this mode is called System Management Mode (SMM) [23]. In contrast to other modes, the SMM is not intended to execute operating systems or user applications. Instead, the SMM is used for low-level management functionality like power management or legacy device emulation, e.g., PS/2 support for a Universal Serial Bus (USB) mouse or keyboard. An unmaskable System Management Interrupt (SMI) triggers the SMM. The CPU then switches to another address space — the SMRAM — and saves its current context. Afterward, a corresponding SMI handler is executed. SMI handlers can be installed to SMRAM as long as the D_LCK bit in the System Management RAM Control (SMRAMC) register is still unlocked. This is typically done by the BIOS before a system's boot process. After the installation, the BIOS has the chance to lock the SMRAM, preventing further modifications. In SMM, the address mode is similar to the 16-bit real mode, but it is possible to access 4 GiB of RAM. The SMRAM's physical location can be in system memory or in a separate RAM. The RSM instruction is used to restore the saved context and resume operation. The execution of the SMM is entirely transparent to the OS; it is only possible for the OS to detect its execution through timing discrepancies. Since the SMM has even more privileges than a hypervisor, e.g., interrupts are entirely disabled, its privilege level can be classified as "ring -2".

2.2.6 PCI Express and DMA

PCI Express (PCIe) is a high-speed, point-to-point bus system that serves to connect several peripheral devices in modern computer systems [121]. PCIe is known to be used for graphics cards and network cards. In contrast to conventional bus systems like PCI, PCIe is organized in a tree topology. All nodes, i.e., all peripheral PCIe devices and switches that connect further end nodes, are connected to a central unit — the Root Complex (RC). Alternatively, a PCIe device may be connected to the RC via a switch. The RC itself is connected to the CPU and the memory. DMA allows a peripheral device like a graphics card or a disk to copy data from or to the host memory without using the CPU. After the transaction is completed, the CPU can be informed that the copy operation is finished. When many bytes are copied, DMA is significantly faster than traditional Programmed I/O.


PCIe devices can perform DMA to allow faster I/O. For example, if a node in this point-to-point network wants to perform a memory transaction, it sends a Transaction Layer Packet that is routed through the point-to-point network to the RC and memory. Of course, this means that PCIe devices that are connected to the computer system must be trusted, as they can perform arbitrary operations in memory.

Intel's Virtualization Technology for Directed I/O (VT-d) [25] comes with an I/O Memory Management Unit (IOMMU) that works similarly to the common MMU used to implement virtual memory. The IOMMU translates the addresses accessed by a device to internal memory addresses. Thus, the host memory is no longer unprotected: the memory ranges that can be accessed by a specific device can be limited. Recent research [97] revealed, however, that current IOMMU implementations in Windows, Linux, and macOS do not fully prevent DMA-based attacks.

2.2.7 Hardware Memory Encryption and Intel SGX

Modern processor architectures increasingly provide direct memory encryption support that keeps the encryption key within the processor. A prominent example is AMD's SME [75] (respectively Intel's Total Memory Encryption (TME) [24]), which can encrypt memory at page granularity. If memory encryption is bound to the execution context of a particular process or thread, memory protection can even prevent higher privileged ("lower") layers from accessing the memory of less privileged ("higher") layers. Examples of such features are AMD's Secure Encrypted Virtualization (SEV) [75, 1] and Intel's SGX [26] as well as Multi-Key Total Memory Encryption (MKTME) [24, 105]. In the following, we briefly introduce these concepts.

Intel SGX extends modern x86-64 processors to protect user mode code and data from higher privileged layers, like the firmware, VMM, or OS, using so-called enclaves. Enclaves can be seen as isolated containers of a ring 3 application with encrypted memory. The CPU decrypts it using the Memory Encryption Engine and stores the decrypted pages inside a special Enclave Page Cache (EPC) that is integrated into the CPU. Enclaves can only execute user mode code, as their operation is quite restricted. For example, code running inside an enclave cannot directly call into another application or execute system calls to request kernel functionality. The privilege level of enclaves is, therefore, usually compared to a conventional ring 3 application. Code can be verified to run inside an enclave using Intel's remote attestation mechanism. First, an application launches an enclave, which then attests its integrity and confidentiality to a server. Afterward, a decryption key is sent to the enclave, used to decrypt the actual payload. Deploying one's software inside an enclave requires an attestation key from Intel.

Similar to SGX, AMD came up with its own way to encrypt a system's volatile memory with the help of several processor extensions [105]. SME enhances the x86-64 instruction set with page-granular memory encryption support using a single 128-bit Advanced Encryption Standard (AES) key. The key is randomly created by a hardware random number generator at each boot and cannot be accessed by software. SME introduces a memory encryption flag, called the C-bit, in the page tables' leaf entries that allows software to mark the corresponding pages to be encrypted. Once set, the processor's encryption engine automatically encrypts and decrypts these pages upon software read and write accesses. Memory pages that have no C-bit set are accessed directly instead. Since software can modify the C-bit, SME does not protect memory from software running on the system. However, it prevents external

devices from peeking at their memory. In contrast to SME, Transparent Secure Memory Encryption (TSME) does not rely on software intervention. It is used to transparently encrypt the entire physical memory regardless of the C-bit value. Both TSME and SME share the same AES encryption key.

SEV extends SME’s functionality to enable per VM memory encryption. In that way, the content of a VM’s memory can be isolated even from the higher privileged hypervisor. With SEV enabled, a VM uses its own private encryption key derived from SME’s ephemeral key.

Meanwhile, Intel has announced that it will enhance its processors with very similar extensions to overcome the limitations of SGX [24, 105]. While its new concept TME mostly matches AMD's SME functionality, MKTME appears to be Intel's equivalent to SEV. Contrary to SEV, MKTME allows software to encrypt memory on a per-page basis and does not require the existence of a VM.

2.2.8 Out-Of-Band Management

Today's servers, laptops, and desktop systems often come with management features that allow remote configuration and administration over protocols like IPMI [65]. These management features are usually implemented out-of-band (OOB), i.e., there is a separate co-processor that can be used for management tasks like OS deployment or remote access. Intel's OOB management technology for consumers is called Active Management Technology (AMT) [67], which is part of Intel's Converged Security and Management Engine (CSME). Note that the CSME consists of hardware, firmware, and software. The hardware, i.e., the processor, resides on the Platform Controller Hub (PCH), which is placed on the same die as the main CPU. CSME is connected to the system's main memory via a DMA engine. This allows accessing the main CPU's system memory. On server systems, there is usually a so-called Baseboard Management Controller (BMC). This is usually a co-processor on a server's mainboard that is used for maintenance. For this purpose, the Intelligent Platform Management Interface (IPMI) protocol is often used. Examples of IPMI implementations for servers are HP's Integrated Lights-Out (iLO) [63] and Dell's Remote Access Controller (DRAC) [34]. The old IPMI protocol will probably be replaced by Redfish [33].

While not all OOB approaches are implemented on the same chip as the main CPU, they all provide an active and independent processing unit that can connect to the main CPU's memory via DMA. Since these units can access the full system memory and there are no real control mechanisms, OOB management is often termed "ring -3". Other privileged DMA-based techniques also belong in that category.

2.2.9 JTAG Debugging

JTAG is used synonymously for the IEEE 1149.1 standard [94]. It is used for testing and debugging of Integrated Circuits (ICs). JTAG allows testing and debugging ICs even when they are already installed. Today, JTAG is mainly known for flashing and debugging microcontrollers. It can be regarded as a "debugger" for ICs that has nearly unrestricted access. For this reason, the JTAG interface is often disabled or secured by a password.


One main component is the Test Access Port (TAP), which is often called the JTAG interface. It comes with five data lines, four of which are mandatory. If one finds those pins on an IC, the chances are good to get a JTAG connection. The pins are labeled as follows:

• TDI: Data Input
• TDO: Data Output
• TCK: Clock
• TMS: Test Mode Select
• TRST: Reset (optional)

When it comes to x86 systems, JTAG debugging gets harder. Special ports and devices are necessary for JTAG debugging [85], and the corresponding devices and software licenses are relatively expensive. Furthermore, one needs a mainboard that comes with such a port. Since 2015, the Intel Direct Connect Interface (DCI) makes low-cost debugging of x86 systems easily possible, which we utilize in Chapter 7.

2.3 Memory Forensics

The second part of this thesis is dedicated to memory acquisition. Memory acquisition is a primary part of memory forensics. In this section, we want to provide some background information on memory forensics. Traditional digital forensics was focused on local storage. The corresponding tools are specialized in finding hidden or deleted files in a one-to-one byte-wise snapshot of the filesystem. However, some artifacts are not necessarily stored on the local disk, e.g., encryption keys, running processes, network-attached storage, memory-only malware, etc. This is where memory forensics comes into play.

Basically, memory forensics can be divided into memory acquisition and memory analysis [13]. Classically, memory acquisition tools read the target's system memory. Memory analysis tools try to extract information from a given snapshot, i.e., they search for specific patterns to identify specific data structures. It is also possible that the two steps of memory acquisition and memory analysis are performed in alternation. This happens, for example, when a tool searches for a specific signature without saving the whole memory. One prominent example is mimikatz [31], which searches for Windows login credentials in memory. Most tools and techniques we consider in this thesis first take a full snapshot of the target's memory. The advantage of taking a full snapshot first is that it is then possible to acquire system memory of higher quality. In the following we describe what quality means in terms of memory acquisition techniques. In 2012, Vömel and Freiling defined three criteria for forensically sound memory acquisition: correctness, atomicity, and integrity [156]. In the following, we briefly explain these three criteria and show how to measure them using the framework proposed by Gruhn and Freiling [57].

Vömel and Freiling [156] defined a memory snapshot to be correct if the memory acquisition tool acquires the actual content of the memory. Correctness is a necessary criterion for memory acquisition software. Due to the interleaving of memory acquisition with normal system operations, memory images might exhibit effects of system actions for which the actual cause has not been

recorded. A memory snapshot is atomic if such inconsistencies do not arise. A snapshot's atomicity is thus related to the time it takes to acquire the entire snapshot. For black-box evaluation, Gruhn and Freiling [57] therefore chose to quantify the atomicity of a snapshot by the time between the acquisition of the first memory region and the last memory region. Pagani et al. [116] showed that non-atomic snapshots do occur and are an actual problem that should be taken into account. Formally, a memory snapshot satisfies integrity if the content of memory is not changed after an analyst decides to take a snapshot. According to Vömel and Freiling [156], integrity aims at quantifying the level at which the process of taking the snapshot changes the content of memory. Gruhn and Freiling [57] quantified integrity by measuring the average time, over all memory regions, from the start of the acquisition until the time when the memory region is acquired. We, however, follow the original intention of the definition of integrity.
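As a small illustration of these black-box measures, the following sketch computes the atomicity window and the average acquisition delay from a list of (region, timestamp) pairs; the region addresses and timestamps are invented for illustration:

```python
def atomicity_window(acquisitions):
    """Time between acquiring the first and the last memory region
    (a black-box proxy for atomicity: smaller is better)."""
    times = [t for _, t in acquisitions]
    return max(times) - min(times)

def mean_acquisition_delay(acquisitions, decision_time):
    """Average time from the decision to take a snapshot until each
    region is actually acquired (used to quantify integrity)."""
    times = [t for _, t in acquisitions]
    return sum(t - decision_time for t in times) / len(times)

# Invented example: (region base address, acquisition timestamp in seconds).
snapshot = [(0x00000000, 10.0), (0x40000000, 12.0), (0x80000000, 16.0)]
window = atomicity_window(snapshot)            # 6.0 seconds
delay = mean_acquisition_delay(snapshot, 9.0)  # (1 + 3 + 7) / 3 ≈ 3.67 s
```

Both functions only formalize the measurement idea; real evaluations additionally have to record per-region timestamps inside the acquisition tool itself.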


3 Limitations of Forensic Event Reconstruction based on Log Files

One main focus of digital forensic analysis is the reconstruction of prior events based on traces left in digital evidence. Essential sources of traces on persistent storage are log files [90], which come in many different forms. Well known are, for example, the standard system log files in GNU/Linux, e.g., syslog and auth.log, which are used by many different applications to record system events [48]. In Windows, the arguably most prominent source of logging information is the Windows Event Log, a service that allows arbitrary applications to document relevant activities in a central log file [104]. In contrast to rather generic log files like syslog or the systemd journal, many specific log files are used by certain applications to record events. For example, intrusion detection systems keep track of suspicious events in log files, and web server applications like Apache keep track of external accesses to served web pages.

In this chapter, we systematically analyze the value of logging information for the recon- struction of events on an essential class of computing machines, namely the GNU/Linux server system. These machines are relevant because they often run critical server applica- tions (like web servers, blogs, or wikis), which are under constant attack from the network. Furthermore, system administrators’ activities on such machines can severely affect the functioning of an entire organization. So the reconstructability of administrative actions is also of forensic interest.

We use a formal approach [32] to answer the question of whether a prior event is recon- structable from logging information in that we empirically measure log file fingerprints of such events. Later we compute whether these fingerprints are discriminative in compar- ison to the fingerprints of other events. Because the set of comparison fingerprints will always be smaller than the set of events that may occur in practice, methodologically, our approach rather allows us to make negative statements about event reconstruction, i.e., we can confirm that a certain event is not reconstructable from the logging information we consider. However, in Section 3.3 we show in which typical cases false positives of unknown events may occur.

We further study how our characteristic fingerprints perform in matching the respective events. As an additional log source, we utilize system call traces. Since tracing all system calls is extremely expensive in terms of performance, we show how to reduce tracing overhead and detect events systematically.

Contribution. The main insights of our experiments are as follows:

• Despite the high measurement noise, we can calculate reasonable characteristic fingerprints for various events. However, the evaluation revealed that it is mostly command line tools that leave almost no traces in common log files like syslog. The detectability increases only marginally if auth.log is additionally considered.


• The possibility of event reconstruction increases significantly if system call traces are logged. However, even then it was not always possible to calculate characteristic fingerprints for all events. Therefore, the reconstruction of events based on such log files is severely limited.
• Using a unified log format and parsing unstructured log sources into more structured ones increases the size of characteristic fingerprints and is beneficial for event reconstruction.
• The log entries generated by complex events (administrative events or events of web applications) usually overlap those of simple events (like command line tools). If complex events can be excluded, the detectability of other events increases substantially.
• Our calculated characteristic fingerprints are applicable for forensic event reconstruction. The matching results are quite good. Overall, our complete event set's sensitivity is about 88% and reaches 100% if only detectable events are considered.
• We show that the false positive rate of unknown events is relatively low and usually only appears with similar or related events.
• We show how to use system calls for forensic event reconstruction systematically. Furthermore, we reveal which system calls are discriminative and how expensive these are in terms of performance.

First, in Section 3.1, we describe the setup we use for the evaluation. This also includes the logs we use, which events are regarded, and which features are considered. In Section 3.2 we show the impact of different parameters, e.g., the feature set and the source set, on the size of (characteristic) fingerprints. These fingerprints are used for matching the events in Section 3.3. In Section 3.4 we show how to improve the performance of system call tracing by only tracing the discriminative system calls. Related work can be found in Section 3.5.
Finally, in Section 3.6 the results are discussed.

3.1 Experimental Setup

We now describe the experimental setup used for the evaluation, including the scenario and attacker model, the considered events and log sources, and the feature sets we extract from the logging sources.

3.1.1 Scenario and Attacker Model

Our setup consists of a GNU/Linux server running Ubuntu 16.04 with several services enabled: a WordPress instance, an SSH server, Docker containers, and a Nextcloud instance. This resembles a typical infrastructure in small and medium-sized enterprises, where such servers are a valuable asset and an interesting target for attacks and, consequently, forensic analysis. We assume an attacker with root privileges, e.g., a malicious administrator or a regular system user who gained those privileges through a privilege escalation attack. While this makes it easier for an attacker to cover traces (and delete logging information), we assume that the system is instrumented with a Security Incident and Event Management (SIEM) system that regularly retrieves logging information from the server. Therefore, the attacker can delete all logging information that has not yet been collected and can influence outgoing logging information after compromising the system. All actions up to this point will be logged (and retrieved by the SIEM system) in an unaltered way.

3.1.2 Log Source Classification

We consider three sources of logging information that can be distinguished according to their common availability. The first class consists of highly available log sources. These sources are usually available and enabled by default on the corresponding OS. In our case, we utilize syslog and auth.log, which are located by default in the /var/log directory on GNU/Linux systems. The second class of log sources is sometimes available, i.e., it is usually available and enabled if a specific application is installed. From this class, we make use of the access.log that is fed by Apache's HTTP Server [148] and records all HTTP GET and POST requests that the server receives. The third class of log sources is usually unavailable. This refers to log sources that are typically not installed on a default operating system and do not belong to a typical server application. We use a prominent member of this class, namely system call logs, to compare and benchmark the other (more available) sources. We chose system call logs because they are known to be a rich source of behavioral information [78, 128] and because server installations based on hypervisors can be configured to collect such information. Especially in enterprise environments, system call traces might be a viable and valuable additional source of logging information if a company needs to detect specific events or attacks that cannot be disclosed using traditional logs. In such cases, system call tracing must be pre-configured as part of forensic readiness processes. While the attacker can disable or fake highly available and sometimes available log sources, he or she cannot disable or fake system call traces; this would require privileges on the hypervisor. In our attacker model, the virtual machine administrator and the system administrator are not the same person but physically different persons. Also, on standard desktop systems, many logs are generated.
Although an administrator has permissions to fake log messages, it is difficult to fake log messages consistently [42]. These three classes of logging information can be grouped into four distinct classes shown in Figure 3.1: Class 1 consists of the highly available sources only, class 2 consists of highly available and sometimes available sources, class 3 consists of highly available and usually unavailable sources, and class 4 represents the union of all sources. Note that classes 2 and 3 are incomparable since neither fully includes the other. In this sense, Figure 3.1 represents a partial order among the sets of log sources.


Figure 3.1: The different sets of log sources used for the evaluation. Set 1: auth.log, syslog; set 2: auth.log, syslog, access.log; set 3: auth.log, syslog, syscalls; set 4: auth.log, syslog, access.log, syscalls.

Figure 3.2: Architecture of our experimental setup. Events are executed on the target system; auth.log, syslog, and access.log are drained via Filebeat, while a monitor VM on the hypervisor performs VMI-based system call tracing. All log data is sent to the Forensic Fingerprint Workstation, which comprises the record and fingerprint storage, the Fingerprint Engine, and the Matching Engine.

3.1.3 Experimental Architecture

Figure 3.2 shows a simplified schema of the architecture of our experimental setup. The target system (Ubuntu 16.04) runs virtualized as a Xen guest with an x86-64 CPU and 2 GiB of RAM. System log files are drained via Filebeat [37]. A monitor VM using libvmtrace [145] instruments the target system and traces all requested system calls. Note that the tracing of system calls happens in another virtual machine, so an attacker with root privileges on the target system basically cannot disable system call tracing. By default, we trace all system calls. Note that this strongly impacts performance, which we address later (see Section 3.4). The system log files and the system calls are sent via Apache Kafka [147] to the Forensic Fingerprint Workstation.

To calculate a new fingerprint for an event σ (see also Section 2.1.2), the event is executed automatically multiple times from the Forensic Fingerprint Workstation. Most events can be executed via an open SSH session. The Recording module receives all logs (including system call traces) via Kafka and stores them in a unified log format as JSON files. To get the data in a uniform format, the log messages are parsed using regular expressions into the uniform log format, i.e., a feature vector. Afterward, the corresponding fingerprint can be calculated by the Fingerprint Engine as described in Section 2.1.

After calculating multiple fingerprints (in our case, all of them), one can calculate characteristic fingerprints (also described in Section 2.1). All resulting characteristic fingerprints are also stored as JSON files.

There are two ways to start matching. First, it is possible to perform live matching, meaning that all incoming logs (and system call traces) are processed directly by the Matching Engine. We assume that the feature vectors of any event appear within at most t time units. For each enabled (characteristic) fingerprint, the Matching Engine checks how many feature vectors of the fingerprint match the incoming log vectors within a time window of 2t (so as not to miss an event that occurs between two such time slots). The second possibility is to match against recorded logs. In this case, the Matching Engine calculates the percentage of equal feature vectors in the characteristic fingerprint and the stored log. This is the way we use the Matching Engine in this chapter, as it allows reproducibility.
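The offline matching on recorded logs can be sketched as follows (a minimal illustration; the function name, data layout, and example vectors are ours, not those of the actual Matching Engine):

```python
# Sketch of offline matching (our illustration): feature vectors are
# hashable tuples, a characteristic fingerprint is a set of them, and a
# recorded log is a list of them.

def match_score(characteristic_fingerprint, recorded_log):
    """Return the fraction of fingerprint feature vectors found in the log."""
    if not characteristic_fingerprint:
        raise ValueError("empty characteristic fingerprint")
    log_vectors = set(recorded_log)
    hits = sum(1 for fv in characteristic_fingerprint if fv in log_vectors)
    return hits / len(characteristic_fingerprint)

# Hypothetical characteristic fingerprint and recorded log for cp.
fp_cp = {("syscalls", 59, "/bin/cp"), ("syscalls", 2, "/etc/ld.so.cache")}
log = [("syscalls", 59, "/bin/cp"),
       ("syscalls", 12, ""),
       ("syscalls", 2, "/etc/ld.so.cache")]
print(match_score(fp_cp, log))  # 1.0
```

A score of 1.0 means every feature vector of the characteristic fingerprint was observed in the recorded log; in practice, a threshold on this percentage decides whether the event is reported.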

In our evaluation, we only use a single target system. However, since all log entries are sent via Kafka, it is basically also possible to have multiple target systems.

3.1.4 Feature Sets

As shown above, in theory, fingerprints are sets of feature vectors where features are taken from a finite set F. In practice, it is necessary to determine the concrete elements of F. We do this by defining a unified message format into which each piece of logging data that is sent to the fingerprinting engine is translated. Table 3.1 shows an excerpt of the unified message format we specified. The final unified message format has 15 fields, but most of the fields are usually empty. We now briefly explain the meaning of the essential features.

To begin, source specifies from which of our log sources the message comes (see Figure 3.1). One of the essential attributes of the unified log message is type_id. This ID specifies what kind of message it is. In the case of system calls, the type_id is the system call number.

Unfortunately, established Linux log formats such as syslog and auth.log have a rather loose format specification and do not contain a type_id field like Microsoft's Event ID in the Windows Event Log [104]. The logs are usually not semantically structured, i.e., no field specifies a path or user name; instead, they contain free text. Therefore, we needed to parse the log messages ourselves and assign type_ids. However, it was impossible to parse all possible log messages from our sources for all available fields for every event. We followed the approach of writing new regular expressions for unknown incoming messages whenever such a case occurred during event execution. So while we cannot guarantee that all possible log messages are parsed correctly, we can guarantee that all log messages during our measurements were correctly parsed.

The path entry is set for a log message that contains a path to a file in the current file system. For example, the execution of ls will cause, among others, the following feature vector (JSON-formatted):


{"source" : "syscalls", "type_id" : 59, "path" : "/bin/ls"} meaning that the execve system call (number 59) is executed with the path /bin/ls. Note that this system call is not performed by ls itself but by the corresponding shell.

The entry misc contains arbitrary other, potentially useful information that is often passed to logging services and could be helpful for fingerprinting but does not fit into other fields of the unified message format. Hence, this feature usually has to be parsed manually; e.g., the complete argument list of the execve system call is parsed into the misc field.
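The regex-based translation into the unified message format can be sketched as follows (an illustration only: the field names follow Table 3.1, but the concrete regular expression and the type_id value 1001 are hypothetical, not the thesis' actual rule set):

```python
import re

# Illustrative parser (our sketch): each known message shape gets a
# regular expression with named groups and a hand-assigned type_id.
RULES = [
    # sshd accepted-login line from auth.log (hypothetical type_id 1001)
    (re.compile(r"sshd\[\d+\]: Accepted \w+ for (?P<user>\S+) from"), 1001),
]

def parse(source, line):
    """Translate one raw log line into a unified-format feature vector."""
    for regex, type_id in RULES:
        m = regex.search(line)
        if m:
            vec = {"source": source, "type_id": type_id}
            vec.update(m.groupdict())  # fill user, path, etc. if captured
            return vec
    return None  # unknown message: write a new rule, as described above

vec = parse("auth.log",
            "Jan 1 12:00:00 host sshd[123]: Accepted password for alice from 10.0.0.1")
print(vec)  # {'source': 'auth.log', 'type_id': 1001, 'user': 'alice'}
```

Returning None for unknown lines mirrors the iterative workflow described above: whenever an unparsed message appears during event execution, a new rule is added.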

Table 3.1: The unified log message format.

    Name          Description
    source        The source from where the message comes.
    type_id       Describes the type of the message, e.g., the system call number.
    date          Timestamp of when the message was generated.
    path          A path, e.g., which path was opened.
    user          The user who performs the event.
    process_name  The name of the process that performs the event.
    ...           ...
    misc          Can be used for random things that do not fit into the format.

Figure 3.3: The different feature sets used for the evaluation. Feature set 1: source, type_id; feature set 2: source, type_id, misc; feature set 3: source, type_id, path; feature set 4: source, type_id, path, misc.

In general, a domain expert must select and determine an appropriate feature set for fingerprint calculation. In our model, a feature corresponds to an entry in the unified log message format. To experiment with different feature sets, we subdivided the entire feature set into three classes. The idea behind the selection of classes is as follows: Not all fields are suitable as features; e.g., the date field varies from message to message and is not specific to a type of event. So the first feature set contains only type_id and source, which most directly refer to the event.

We then consider two additional features for feature selection: Since file system paths are comparable to names, we consider the path feature to be almost as important as the above two features. As a reference, we also consider the misc feature, which is specific to every log entry but usually left blank. Other fields of the unified message format are less generic and only helpful in more specific contexts, e.g., if an event shall be connected with a specific user or IP address.

From these three classes, we construct four feature sets that are shown in Figure 3.3: The first feature set merely considers source and type_id. The second and third feature sets contain the misc feature and path, respectively. Finally, the fourth feature set contains all four features. Note that the subset relation between these feature sets is a partial order (as depicted in Figure 3.3).
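Applying a feature set then amounts to projecting each unified log message onto the selected fields (a minimal sketch; the dictionary layout and function name are ours):

```python
# Sketch (our illustration): restrict unified log messages to one of the
# four feature sets from Figure 3.3 before fingerprinting.

FEATURE_SETS = {
    1: ("source", "type_id"),
    2: ("source", "type_id", "misc"),
    3: ("source", "type_id", "path"),
    4: ("source", "type_id", "path", "misc"),
}

def project(message, feature_set):
    """Project a unified log message onto a feature set; the result is a
    hashable tuple, so fingerprints can be plain Python sets."""
    fields = FEATURE_SETS[feature_set]
    return tuple(message.get(f, "") for f in fields)

msg = {"source": "syscalls", "type_id": 59, "path": "/bin/ls", "misc": "ls -la"}
print(project(msg, 1))  # ('syscalls', 59)
print(project(msg, 3))  # ('syscalls', 59, '/bin/ls')
```

Because missing fields default to the empty string, the same projection works for sparse messages where most of the 15 fields are empty.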

3.1.5 Events and Event Classification

Table 3.2 lists the 45 events we use in our evaluation. The first class consists of executions of simple Command Line Interface (CLI) tools like ls, cp, or mv. Further, some commands require root privileges, e.g., the removal of a protected file. The next class contains web events on a WordPress server instance, e.g., logging into the website. Then there are typical Linux service events, for example, stopping the Apache web server. Another class is related to Linux kernel modules, i.e., listing, loading, and unloading kernel modules. Further, there is a larger class with many typical Docker events that either operate on containers or perform events in Docker containers; the corresponding image is always already loaded from the Internet. These events are similar to the events in the first class (CLI). The last set of events is dedicated to administrative events on a Nextcloud instance. The column t indicates how long the system waits for logs after performing the event. These values are determined empirically and generously rounded up. Note that all system calls are traced at this time, which strongly impacts performance.

3.2 Forensic Fingerprints

First, we want to focus on the calculation of (characteristic) fingerprints. To calculate a fingerprint for a new event, a script was executed multiple times, either on the target machine via an SSH session or on the Forensic Fingerprinting Workstation itself (see Figure 3.2). Web events were automated using Selenium [136]. After executing the event, the Recording module on the Forensic Fingerprinting Workstation waits for a specific time t for the corresponding log messages it receives via Kafka. Incoming messages are parsed into the unified message format (see Table 3.1). More concretely, we compare the usefulness of three classes of logging information, one of them being artificial: (1) classical syslog and auth.log files, (2) web server-specific information from Apache's access.log file, and (3) logging information available from system call tracing systems (realizable in virtualized systems through VMI [123]). We are fully aware that system call logs are unavailable in most systems. However, the rich additional information in system call logs allows us to compare the usefulness of the more classical log file information and thereby to better approximate lower bounds on event reconstruction, for example, by making statements like "even with system call tracing, this event cannot be reliably detected." As mentioned above, the monitoring system (especially full system call tracing through VMI) caused considerable overhead. Table 3.2 also indicates how long an event takes until all corresponding log messages arrive at the fingerprinting workstation. The value t was used as a timeout for the respective events. We determined t empirically by executing


Table 3.2: List of events used for the evaluation. The corresponding event is recorded for t seconds.

    Class           Name                  Description                                  t
    CLI             ls                    Lists files                                  5
                    cp                    Copies file                                  5
                    mv                    Moves file                                   5
                    cat                   Cats file                                    5
                    vmstat                Virtual memory statistics                    5
                    netstat               Network statistics                           5
                    tar                   Creates compressed tar archive               5
                    rm                    Removes file                                 5
                    shred                 Shreds file                                  5
                    curl                  Downloads file                               5
    CLI Root        tailShadow            Reads /etc/shadow                            10
                    catCredentials        Shows credentials of a Wordpress config     10
                    vimHosts              Opens /etc/hosts in Vim                      10
                    rmSudo                Removes file with sudo                       10
                    shredSudo             Shreds file with sudo                        10
    Web             wordpressLogin        Wordpress login                              20
                    wordpressSearch       Wordpress search                             20
                    wordpressOpen         Opens Wordpress website                      20
    Service         sshLogin              SSH login (server side)                      30
                    apacheStop            Stops apache web server                      110
                    mysqlWp               Login into Wordpress DB via command line     20
    Kernel Modules  lsmod                 Lists loaded kernel modules                  10
                    insmod                Loads kernel module                          5
                    rmmod                 Unloads kernel module                        5
    Docker          dockerHelloWorld      Starts docker hello world example            105
                    dockerUbuntuLog       Starts docker ubuntu and shows log           110
                    dockerImages          Lists all docker images                      10
                    dockerPs              Lists all running dockers                    10
                    dockerPSA             Lists all docker containers                  10
                    dockerUbuntuSleep     Starts docker in background                  100
                    dockerRm              Removes all docker containers                15
                    dockerNginx           Runs nginx docker and curls it               80
                    dockerUbuntuBash      Attaches bash of container                   15
                    dockerPrune           Removes unused containers                    60
                    dockerPruneVolumes    Removes unused objects and volumes           60
                    dockerRmImages        Removes all images                           60
                    dockerUbuntuBashCp    Attaches container and runs cp               95
                    dockerUbuntuBashMv    Attaches container and runs mv               95
                    dockerUbuntuBashRm    Attaches container and runs rm               95
                    dockerUbuntuBashCat   Attaches container and runs cat              95
    Nextcloud       nextcloudStatus       Shows Nextcloud status                       35
                    nextcloudAppList      Lists Nextcloud apps                         40
                    nextcloudUserList     Lists Nextcloud users                        40
                    nextcloudUserAdd      Adds new Nextcloud user                      65
                    nextcloudGroupList    Lists Nextcloud groups                       40

the event multiple times and manually analyzing the log files. For tiny events like common CLI events, 5 s turned out to be a reasonable value, as did 25 s for web events. Relatively complex events have a large t value. Although the overhead looks quite bad and is probably not acceptable in productive use, we believe that the slow timing did not negatively influence the fingerprint calculations. In Section 3.4 we show how the performance issue of system call tracing can be handled.

3.2.1 Handling Background Noise

Figure 3.4: Fingerprint of cp with the features source and type_id. The bar plot shows the occurrences of the different system call numbers in the fingerprint.

To illustrate the output of a fingerprint measurement, Figure 3.4 shows an excerpt of a fingerprint of the cp event (see also Table 3.2) using the first feature set, depicted as a histogram. Interestingly, the fingerprint consists of system calls only. Many system calls appear that do not belong to the cp event itself but are caused by background operating system behavior. Such background noise appears because it is impossible to associate log events with particular processes in our architecture. Even if this were possible and we restricted our focus to log messages generated by a particular process ID, log entries that are indirectly caused by the event would go unnoticed. We therefore decided to overapproximate the fingerprints and eliminate noise afterward. Note also that the fingerprint corresponds to the evidence set E. To compute the characteristic fingerprint, we would need to subtract from that set the evidence sets of all other events (as well as subtracting the noise).

Instrumenting a whole system to export log information — especially tracing all system calls — produces much noise because many events without user interaction occur concurrently in today’s multitasking server systems. Examples are periodic jobs like cron or network services running in the background.

To eliminate noise in our measurements, we employed two mechanisms:

1. We executed and recorded events n times for a specific amount of time t (derived empirically, see above), and

2. we applied a threshold p out of n for feature vectors to become part of the fingerprint.
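The two mechanisms above can be sketched as follows (a minimal illustration in Python; the data layout and names are ours, not the actual Fingerprint Engine's): a feature vector enters the fingerprint only if it occurs in at least p·n of the n recorded runs.

```python
from collections import Counter

# Sketch of the noise-eliminating fingerprint computation (our
# illustration): `runs` is a list of n recordings, each a list of
# feature-vector tuples.

def fingerprint(runs, p=0.8):
    """Keep feature vectors occurring in at least p * n of the n runs."""
    n = len(runs)
    counts = Counter()
    for run in runs:
        counts.update(set(run))  # count each vector at most once per run
    return {fv for fv, c in counts.items() if c >= p * n}

# Hypothetical recordings of the same event; vectors are feature set 1.
runs = [
    [("syscalls", 59), ("syscalls", 2), ("syscalls", 35)],   # run 1
    [("syscalls", 59), ("syscalls", 2)],                     # run 2
    [("syscalls", 59), ("syscalls", 2), ("syscalls", 202)],  # run 3
    [("syscalls", 59), ("syscalls", 2)],                     # run 4
    [("syscalls", 59), ("syscalls", 2), ("syscalls", 35)],   # run 5
]
print(sorted(fingerprint(runs)))  # [('syscalls', 2), ('syscalls', 59)]
```

With p = 0.8 and n = 5, a vector must appear in at least four runs; the sporadic vectors 35 and 202 are treated as background noise and dropped.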

For choosing a reasonable threshold p and a number of rounds n, we executed the events from different classes 100 times. Figure 3.5 shows the distributions of occurrences of feature vectors over 100 rounds for three different events (using feature set 1). Some plateaus can be identified in Figures 3.5a and 3.5b, which probably stem from periodic tasks.

Figure 3.5: Distributions of occurrences of feature vectors in different rounds for (a) ls, (b) wordpressSearch, and (c) sshLogin. Features are source and type_id. The threshold to become part of the fingerprint is set to 0.8; so, if a feature vector occurs more than 80 times, it is added to the fingerprint.

The measurements show that a threshold of 0.8, i.e., 80%, is a reasonable value. Later, in Section 3.3, we show how to eliminate further noise. Even though many feature vectors occur in every measurement round, we set the threshold to 0.8 (and not 1.0) to account for possible measurement inaccuracies. Note that most feature vectors occurring in every round are caused by the high-frequency noise floor. Regarding the number of rounds n, a basic observation is that it should be as small as possible and as large as necessary. A small number of rounds keeps the records small and requires less effort (records can be checked manually, making errors in the automatic event execution more obvious). Additionally, executing an event multiple times takes time. To get an impression of how n affects the fingerprints, we calculated the fingerprints for the events ls, wordpressSearch, and sshLogin for different values of n. Table 3.3 shows the resulting number of feature vectors. One can see that the smaller and shorter the event, the smaller the variation; long events cause large variations. After some manual analysis and considering the values from Table 3.3, we set n = 40. Note that the values in Table 3.3 do not match those in Table A.1 because n was determined in our first paper [80] and then used for all further calculations.

Table 3.3: The impact of changing n on the number of feature vectors.

    Name              n=5     n=10    n=20    n=40    n=60    n=80    n=100
    ls                1103    1089    1105    1088    1100    1106    1096
    wordpressSearch   10510   8067    7980    7720    7768    7761    7682
    sshLogin          28039   25809   20060   15994   17104   18105   19069

The evaluation in Section 3.2 considers different measurement points for calculating and relating fingerprints based on log files for different events.

3.2.2 Non-Characteristic Fingerprints

After presenting the experimental setup, we now report on the results of our measurements, in which we set the number of rounds per event to n = 40 and the threshold to 0.8. Our interest is to find out which factors affect the quality and size of (characteristic) fingerprints. The first part of the evaluation focuses on the calculation of non-characteristic fingerprints. It gives a first impression of how these fingerprints look and which feature sets and sources are relevant.


3.2.2.1 Influence of Log Source and Feature Set

The fingerprint of cp shown in Figure 3.4 gives a first impression of what a fingerprint looks like. In this case, the only used source is syscalls, together with the first feature set. Our experiments revealed that by far the largest part of the fingerprints comes from syscalls. These appear in every fingerprint (usually as a five-digit number of feature vectors). Table A.1 in the appendix gives deeper insights into how fingerprints depend on the used feature set and the sources to which the feature vectors belong. It turns out that rather small events (i.e., the CLI user events) have a similar amount of feature vectors, except tar and curl. These events also do not have any feature vector that belongs to any source other than syscalls. The CLI sudo events, in contrast, also leave traces in other sources, namely in auth.log; although these commands have a complexity quite similar to the CLI user events, their fingerprints are nearly twice as large. As expected, the web events leave traces in access.log, and they also have feature vectors in auth.log and syslog. The Linux kernel events behave similarly. Some Docker events also produce logs in syslog, and the Nextcloud events leave traces in auth.log as well. Our experiments show that the events leave traces in the respective log sources depending on the event class. When comparing the number of feature vectors for different feature sets, one notices that the feature set does not matter very much for fingerprint calculation. The amount of feature vectors decreases as the selected feature set becomes more restrictive, but the total differences are marginal.

3.2.2.2 Similarities of Fingerprints

The heat map in Figure 3.6 illustrates to what extent an evidence set E(σ) of an event σ (to the left) overlaps with the evidence set E(σ′) of another event σ′ (at the top). The corresponding fingerprints were calculated using the last feature set, which contains all four relevant features (source, type_id, path, and misc), and all log sources, i.e., source set 4. A white X indicates a full overlap. The heat map shows that small events like the CLI user events are quite similar; even a single fingerprint of another event subtracts a significant proportion of an original fingerprint. It is also evident that fingerprints of bigger events nearly completely overlap fingerprints of smaller events. The fingerprint of the CLI event tar differs from the other events of its class, probably because this event does a lot of different things, i.e., reading, writing, and compression; it is therefore a bit out of the ordinary for the CLI class. There are some events (sshLogin, apacheStop, and some Docker events) that cover other fingerprints to a considerable proportion (indicated by dark blue columns), and some fingerprints are fully covered by another fingerprint. What is also striking: even though the web events' fingerprints are larger in numbers than the CLI user fingerprints, they do not cover that much of the CLI user events. This fact is promising for the existence of characteristic fingerprints since, for a characteristic fingerprint to exist, at least one feature vector must remain after subtracting the evidence sets of the other events.
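The overlap measure behind the heat map can be expressed as the proportion of one fingerprint's feature vectors that also appear in another (a minimal sketch; the function name and example vectors are ours):

```python
# Sketch (our illustration) of the pairwise overlap underlying the
# heat map: the fraction of fingerprint F1 that is covered by F2.

def overlap(f1, f2):
    """Return |F1 intersect F2| / |F1|, i.e., the share of F1 also in F2."""
    if not f1:
        raise ValueError("empty fingerprint")
    return len(f1 & f2) / len(f1)

# Hypothetical fingerprints of two small CLI events.
f_ls = {("syscalls", 59), ("syscalls", 2), ("syscalls", 3)}
f_cp = {("syscalls", 59), ("syscalls", 2), ("syscalls", 0), ("syscalls", 1)}
print(overlap(f_ls, f_cp))  # 2/3 of the ls fingerprint is covered by cp
```

Note that the measure is asymmetric: a large fingerprint can fully cover a small one (overlap 1.0, the white X in the figure) while being covered only partially in return.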

3.2.3 Characteristic Fingerprints

With characteristic fingerprints, it is possible to distinguish between a given event σ and the events in the reference set Σ′ (see also Section 2.1.2). Thus, it is essential to calculate characteristic fingerprints. When calculating CE(σ, Σ′), the sets E(σ′) for all σ′ ∈ Σ′ have to be calculated using the same feature set F as the one used for E(σ). In the following, the impact of the feature sets and reference sets as well as the acquired log sources is evaluated.

Figure 3.6: Similarities of fingerprints. The heat map shows what proportion of the fingerprint F1 (rows) is overlapped by F2 (columns); a white X indicates a full overlap.
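Given evidence sets represented as sets of feature vectors, the computation of CE(σ, Σ′) reduces to set subtraction (a minimal sketch; the function name and example data are ours):

```python
# Sketch (our illustration): CE(sigma, Sigma') = E(sigma) minus the union
# of E(sigma') over all sigma' in the reference set, all evidence sets
# having been computed over the same feature set F.

def characteristic_fingerprint(e_sigma, reference_sets):
    """Subtract the union of all reference evidence sets from E(sigma)."""
    others = set().union(*reference_sets) if reference_sets else set()
    return e_sigma - others

# Hypothetical evidence sets (feature set 3: source, type_id, path).
e_rm    = {("syscalls", 59, "/bin/rm"), ("syscalls", 87, "/tmp/x"), ("syscalls", 2, "")}
e_ls    = {("syscalls", 59, "/bin/ls"), ("syscalls", 2, "")}
e_shred = {("syscalls", 59, "/usr/bin/shred"), ("syscalls", 2, "")}

ce = characteristic_fingerprint(e_rm, [e_ls, e_shred])
print(sorted(ce))  # only the vectors unique to rm survive
```

If the subtraction leaves the empty set, no characteristic fingerprint exists for that event and reference set, which is exactly the situation discussed for some feature/source set combinations below.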

3.2.3.1 Influence of the Log Sources and Feature Set

Feature vectors coming from system call traces often dominate the calculated fingerprints. Now, the characteristic fingerprints of all events are calculated with all other events in the reference set. The heat map in Figure 3.6 already indicates that there are many overlaps with other events. As a result, the corresponding characteristic fingerprints have significantly fewer feature vectors.

For some combinations of feature set and source set, there is no characteristic fingerprint at all. Table A.2 in the appendix shows the raw data. It contains many zero entries, meaning that no feature vector comes from the corresponding log source. Calculating characteristic fingerprints shrinks the total amount of feature vectors massively (the raw data can be seen in the appendix in Table A.1 and Table A.2). The characteristic fingerprints of the CLI user events do not contain any feature vector from a source other than syscalls. Since this class's original fingerprints did not contain any other feature vectors either, this is an expected result. However, it shows that some events do not leave any traces in common log sources. Further, the CLI Root events have only a single feature vector for specific feature sets. For the other event classes, the contributions from other sources are relatively small, so these events also suffer from low detectability through log files. While the feature set did not matter for the non-characteristic fingerprints, it has a significant impact on the size of characteristic fingerprints. Feature sets 1 and 2 behave similarly and have a similar impact on the characteristic fingerprint. The same applies to feature sets 3 and 4, which result in significantly larger characteristic fingerprints. Note that these two feature sets contain the path as a feature. This means that including the path makes a feature vector considerably more characteristic, i.e., more resistant to other events' feature vectors.

Figure 3.7: Average size of the characteristic fingerprint depending on the log source set and the feature set.

Figure 3.7 summarizes the impact of the source set and the feature set on the characteristic fingerprint size. The plot shows the average size of characteristic fingerprints for different source sets and feature sets; in each case, the reference set consists of all other events. The figure clearly shows that only with source sets higher than 2, i.e., when also using usually unavailable log sources such as syscalls, does the average characteristic fingerprint grow significantly larger. This is especially true for feature sets 3 and 4, i.e., those that include the path. Both combined (feature sets 3 or 4 with source sets 3 or 4) result in quite large average characteristic fingerprints. For feature set 1, it is clear that the average size of characteristic fingerprints increases with source sets 3 and 4; similar behavior can be observed for feature set 2. Figure 3.8 shows the average size of characteristic fingerprints per class. Source sets 3 and 4 lead to the largest characteristic fingerprints in all classes. Source sets 1 and 2 produce much smaller characteristic fingerprints, and some classes like CLI do not benefit from those logs at all; only the Web, CLI Root, Service, and Nextcloud events have (small) average characteristic fingerprints with source sets 1 and 2. These graphs confirm the results in Figure 3.7.


Figure 3.8: Average characteristic fingerprint size per class and source set: (a) CLI, (b) CLI Root, (c) Web, (d) Service, (e) Kernel Modules, (f) Docker, (g) Nextcloud.

3.2.3.2 Influence of the Reference Set and Feature Set

Previous sections showed that fingerprints of complex events often overlap fingerprints of simple events. Hence, characteristic fingerprints with all other events in the reference set are quite small or sometimes do not even exist. Nevertheless, it may not always be necessary to distinguish an event from all other events, but only from some of them. For this reason, we now study the impact of the reference set on the size of the characteristic fingerprint. To do this, we calculated the characteristic fingerprints where the reference set consists of the events from the same class only, but considering all sources (including syscalls). The measurements (the full data set can be seen in Table A.3 in the appendix) reveal that the reference set has the most impact on medium-sized events, e.g., tar, catCredentials, and vimHosts. In the case of tar, for example, the characteristic fingerprint's size increases to about 136 feature vectors if only CLI events are in the reference set; with all other events in the reference set, the size of the characteristic fingerprint was between 1 and 5. The increase in size affects events even for the most basic feature set. In five cases, reducing the reference set to the same class of events causes characteristic fingerprints to exist where the events were undetectable before. Overall, reducing the reference set's size makes a big difference in the detectability of events and should be considered by a domain expert.
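The effect of shrinking the reference set can be illustrated with a small sketch. Modeling fingerprints as sets of hashable feature vectors, the characteristic fingerprint is the set difference between an event's fingerprint and the union of all fingerprints in the reference set. All identifiers and the toy feature vectors below are illustrative assumptions, not taken from our implementation:

```python
def characteristic_fingerprint(fingerprint, reference_set):
    """Feature vectors of `fingerprint` that occur in no fingerprint
    of `reference_set` (a collection of feature-vector sets)."""
    others = set().union(*reference_set) if reference_set else set()
    return fingerprint - others

# Toy example: restricting the reference set to the same class
# can make a characteristic fingerprint larger (or non-empty) again.
tar = {("execve", "/bin/tar"), ("open", "backup.tar"), ("read", "log")}
all_events = [{("read", "log")}, {("open", "backup.tar")}]
same_class = [{("read", "log")}]

assert characteristic_fingerprint(tar, all_events) == {("execve", "/bin/tar")}
assert len(characteristic_fingerprint(tar, same_class)) == 2
```

Against all events, only one feature vector survives the subtraction; against the same class only, two survive, mirroring the growth observed for tar above.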

3.3 Matching

We could show that it is possible to calculate characteristic fingerprints CE(σ, Σ′) for most events, even for a large reference set Σ′. This section analyzes how the characteristic fingerprints perform in terms of event reconstruction by matching them against event traces.


3.3.1 Methodology

To evaluate matching with characteristic fingerprints, we executed each event ten times (test set) and saved the corresponding traces as described in Section 3.1. Then, the Matching Engine calculates a score for each trace. Let T(σ′) be the trace of the event σ′; then the score is calculated as follows:

score(CE(σ, Σ′), T(σ′)) = |CE(σ, Σ′) ∩ T(σ′)| / |CE(σ, Σ′)|

This means that the score of a characteristic fingerprint CE(σ, Σ′) is the proportion of its feature vectors that are matched by T(σ′). The term matched means that all relevant features F that were used for calculating the fingerprint are the same in the trace vector and in the fingerprint vector. Ideally, score(CE(σ, Σ′), T(σ)) = 1 while score(CE(σ, Σ′), T(σ′)) = 0 for σ′ ̸= σ.
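The score computation can be sketched directly from this definition (the function name and the toy fingerprint are illustrative, not part of the actual Matching Engine):

```python
def score(char_fp, trace):
    """Fraction of a characteristic fingerprint's feature vectors
    that also appear in the event trace (both modeled as sets of
    hashable feature vectors)."""
    if not char_fp:
        raise ValueError("score is undefined for an empty fingerprint")
    return len(char_fp & set(trace)) / len(char_fp)

# Toy fingerprint of a tailShadow-like event:
fp = {("open", "/etc/shadow"), ("read", "/etc/shadow")}
trace_same = [("execve", "/usr/bin/tail"), ("open", "/etc/shadow"),
              ("read", "/etc/shadow")]
trace_other = [("execve", "/bin/ls")]

assert score(fp, trace_same) == 1.0   # ideally 1 for the matching event
assert score(fp, trace_other) == 0.0  # and 0 for any other event
```

Note that the trace may contain arbitrarily many additional feature vectors without lowering the score; only missing fingerprint vectors reduce it.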

3.3.2 Matching Results


Figure 3.9: The heatmap shows a matching matrix before subtracting noise from the characteristic fingerprints. The y-axis lists the ground truth, i.e., the actual event traces, while the x-axis lists the corresponding characteristic fingerprints. Each cell shows the average matching score of ten traces; the darker the coloring, the higher the score.


Figure 3.9 shows the results of our matching experiment. The heatmap shows the average scores (the formula above gives the score for one execution) for ten executions of each event. The higher the average matching score, the darker the cell. The matching results in Figure 3.9 look quite decent: for all existing characteristic fingerprints, the average matching score is relatively high, while the matching scores of the wrong events are almost everywhere below 0.1. The last line in the matrix shows the matching of one hour of noise, i.e., an hour in which no user interacted with the system and only standard background tasks were running. These matching scores are also very low, with one exception (vmstat). Optimally, the matrix would contain ones on the diagonal and zeros everywhere else. Since we use characteristic fingerprints for matching, a similar picture is expected.

Figure 3.9 also shows that some characteristic fingerprints produce false matches (even if the score is usually relatively low). However, these characteristic fingerprints usually also match noise (again with rather low scores). Investigating the false matches reveals that the matching feature vectors in these cases stem from noise still present in the corresponding characteristic fingerprints. For example, there are some system calls related to Filebeat [37] (the software we use to drain log files) and to Xymon [139] (a system monitoring software).


Figure 3.10: The heatmap shows a matching matrix after subtracting noise from the characteristic fingerprints. The y-axis lists the ground truth, i.e., the actual event traces, while the x-axis lists the corresponding characteristic fingerprints. Each cell shows the average matching score of ten traces; the darker the coloring, the higher the score.


To eliminate the matching noise in the characteristic fingerprints, we subtract the noise from the characteristic fingerprints and perform the matching again. The results can be seen in Figure 3.10. There are far fewer false matching entries, and the corresponding matching scores are also much lower. Thus, subtracting noise from characteristic fingerprints has turned out to be beneficial. Only vimHosts and sshLogin still seem to contain some noise.
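In the set model used above, subtracting noise is again a plain set difference against the feature vectors observed in the noise recording. A minimal sketch (the names and the Xymon-style toy vector are illustrative assumptions):

```python
def subtract_noise(char_fp, noise_trace):
    """Remove from a characteristic fingerprint all feature vectors
    that were also observed in a noise-only recording."""
    return char_fp - set(noise_trace)

fp = {("open", "/etc/hosts"), ("stat", "/var/log/xymon")}
noise = [("stat", "/var/log/xymon")]  # e.g., monitoring background activity

assert subtract_noise(fp, noise) == {("open", "/etc/hosts")}
```

This removes exactly the vectors that caused fingerprints to match the noise recording, at the risk of also discarding vectors the event shares with background activity.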

[Figure 3.11 plots the true positive rate (sensitivity) against the false positive rate. Legend: all events, with noise: AUC = 0.91; all events, subtracting noise: AUC = 0.92; only detectable events, with noise: AUC = 1.00; only detectable events, subtracting noise: AUC = 1.00.]

Figure 3.11: ROC curves of matching with and without noise and for all events and only detectable events.

Figure 3.11 shows the corresponding Receiver Operating Characteristic (ROC) curves. The ROC curves compare the true positive rate, also called sensitivity, with the false positive rate while varying the matching threshold. The threshold determines above which score a matching result is interpreted as a match. Using Figure 3.10 in particular, it is easily possible to determine a perfect threshold, i.e., one with a sensitivity of 1 and a false positive rate of 0. An example of such a threshold is 0.7. In Figure 3.9, one can see that using this threshold would produce one false match (noise). However, this single false positive has such a small impact on the false positive rate that it is not visible in the corresponding ROC curves. Figure 3.11 also shows the ROC curves when only considering detectable events. Since we can then find a perfect threshold (when subtracting noise), the corresponding Area Under the Curve (AUC) is 1.0.
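One point of such an ROC curve can be derived from a matching matrix as follows. The sketch below (all names are illustrative; the 2x2 matrix is a toy stand-in for the matrices in Figures 3.9 and 3.10) counts diagonal entries above the threshold as true positives and off-diagonal entries above it as false positives:

```python
def roc_point(scores, truth, threshold):
    """True/false positive rates for one matching threshold.

    `scores[i][j]` is the score of trace i against fingerprint j;
    `truth[i]` is the index of the fingerprint belonging to trace i.
    """
    tp = fp = pos = neg = 0
    for i, row in enumerate(scores):
        for j, s in enumerate(row):
            if j == truth[i]:
                pos += 1
                tp += s >= threshold   # correct fingerprint matched
            else:
                neg += 1
                fp += s >= threshold   # wrong fingerprint matched
    return tp / pos, fp / neg

scores = [[1.0, 0.1], [0.0, 0.9]]  # toy 2x2 matching matrix
tpr, fpr = roc_point(scores, truth=[0, 1], threshold=0.7)
assert (tpr, fpr) == (1.0, 0.0)  # 0.7 is a perfect threshold here
```

Sweeping the threshold from 1 down to 0 and plotting the resulting (fpr, tpr) pairs yields the ROC curve; the AUC is the area under that curve.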

3.3.3 Stability against Unknown Events

Until now, we had a kind of closed-world assumption: we pretended to know all events in the system. Now, we want to investigate what impact the execution of unknown events has on matching. For this, we calculate new characteristic fingerprints for every event. However, the reference set now excludes, in each case, the event whose log trace we match, i.e., we pretend not to know σ (and the corresponding characteristic fingerprint). We then calculate the score for each characteristic fingerprint that no longer has σ in its reference set. Ideally, the score would be 0 everywhere. Figure 3.12 shows the average results of matching ten execution logs. One can see that there are some matches with high scores, i.e., these would be false positives. More precisely, six characteristic fingerprints match with more than 90 %. It is striking that the corresponding events are somehow related or are even a subset of another event.



Figure 3.12: The heatmap shows a matching matrix. The x-axis corresponds to characteristic fingerprints whose reference sets do not contain the fingerprint of the event listed on the y-axis. Fields that are not filled do not match any feature vector at all, i.e., the score is 0.

For example, the trace of dockerUbuntuBashCat matches the cat fingerprint, and the trace of wordpressOpen matches the wordpressLogin fingerprint.

Since the events whose characteristic fingerprints match with high scores are closely related, our approach to calculating characteristic fingerprints is quite promising: we do not match completely different events. However, an analyst should keep in mind that even if the matching score is 1.0, it is possible that another (albeit similar) event is matching. For example, an event that was allegedly performed on the host could also have been performed in a Docker container.

3.4 System Calls for Forensic Event Reconstruction

The previous sections show that system calls dominate characteristic fingerprints and play an important role in forensic event reconstruction: many events could not be detected using standard system log sources. However, the t values in Table 3.2 indicate that tracing all system calls makes the system nearly unusable. In this section, we give an overview of the distribution of system calls occurring on a system. Based on this distribution, we define a cost function for system calls that makes a statement about the cost of a characteristic fingerprint. Furthermore, this section shows which set of system calls is discriminative, i.e., useful for forensic event reconstruction.

3.4.1 System Call Distribution in System Activity and Characteristic Fingerprints

To get the distribution of system call occurrences, we record one hour of noise in the system, i.e., all system calls are traced for an hour while no user interaction is happening. All records of the events (40 rounds per event) and the noise recording are merged into a representative list A of system calls occurring on the system. This list contains more than 41 million system calls. Figure 3.13 shows a histogram of the 20 most frequent system calls in this set.

Figure 3.13: Typical distribution of system calls in a system (using reference set A).

The most frequent system call (24%) is clock_gettime which, as the name suggests, returns the current time. futex is used for synchronization, while the next system calls read, fstat, close, and open all relate to basic file operations. Since Linux is a file-based system, it is no surprise that these kinds of system calls are quite frequent. The histogram in Figure 3.14 shows the absolute numbers of system calls in characteristic fingerprints. Here, we see that file-related system calls like open and stat are also important in characteristic fingerprints. Table 3.4 lists these system calls and describes their purpose in more detail. File-related system calls are quite generic and occur in many characteristic fingerprints. Other system calls in this list serve more special purposes, like finit_module and delete_module, which load and unload Linux kernel modules and thus clearly belong to the corresponding events (see also Table 3.2).



Figure 3.14: Absolute occurrences of system calls in characteristic fingerprints.

In the following section, we want to show how to make the system call tracing more efficient by tracing fewer system calls based on the distributions of system calls in the system and characteristic fingerprints.

3.4.2 The Cost Function

In this section, we develop a cost function for characteristic fingerprints. This function shall be used to assess how expensive it is to trace all system calls that are used in a characteristic fingerprint.

Basically, libvmtrace [145] (the software we use for system call tracing) allows us to trace a defined set of system calls, i.e., only this specific set of system calls causes a trap into the hypervisor. Traps into the hypervisor are quite expensive, so tracing many system calls causes many traps and context switches, which strongly decreases performance. We assume that every traced system call has the same constant cost, as the most expensive part, trapping into the hypervisor, is the same for every system call. However, this is only a rough estimation.

Let s be a system call that should be traced, and A a representative list of system calls that occur over a long period of time in the system. Then we define the cost c(s) of s as the relative frequency of the system call in the representative system activity A, i.e., the number of times that s occurs in A divided by the total length of the list. Overall, we traced about 320 different system calls. Table 3.4 lists the system calls that appear in characteristic fingerprints, describes their purpose, and shows their (rounded) costs.

Table 3.4: Costs to trace system calls that are part of characteristic fingerprints, ordered by their frequency in characteristic fingerprints. The descriptions are taken from the Linux Programmer's Manual.

System Call      Description                                      Costs
open             Opens a file for reading or writing              0.0442
stat             Retrieves a file's status                        0.0791
sendfile         Transfers data between file descriptors          0.0003
execve           Executes a program                               0.0009
setfsuid         Sets user identity used for filesystem checks    0.0000
setfsgid         Sets group identity used for filesystem checks   0.0000
io_submit        Submits asynchronous I/O blocks for processing   0.0000
connect          Initiates a connection on a socket               0.0007
setreuid         Sets a real and/or effective user or group ID    0.0000
setregid         Sets a real and/or effective user or group ID    0.0000
bind             Binds a name to a socket                         0.0002
creat            Opens and possibly creates a file                0.0000
chmod            Changes permissions of a file                    0.0000
chroot           Changes root directory                           0.0000
delete_module    Unloads a kernel module                          0.0000
inotify_init1    Initializes an inotify instance                  0.0000
finit_module     Loads a kernel module                            0.0000

It is striking that most system calls that are part of characteristic fingerprints should barely impact performance. However, the two most frequent system calls in characteristic fingerprints (open and stat) are also the most expensive in this table. The table also shows that tracing most system calls is not necessary.

Based on the cost definition above, we can define a cost function for characteristic fingerprints. Let CE(σ, Σ′) be a characteristic fingerprint; then its cost is defined as follows:

c(CE(σ, Σ′)) = Σ_{s ∈ CE(σ, Σ′)} c(s)

So, the cost of a characteristic fingerprint CE(σ, Σ′) is the sum of the relative frequencies of the system calls that are part of it. The Cost column of Table 3.5 gives the cost of each characteristic fingerprint when tracing all of its system calls. It shows that the maximum cost of a characteristic fingerprint is only about 0.125. Nearly 88% of all occurring system calls do not need to be traced, which would already increase performance a lot. In the following, we show how to reduce the costs of characteristic fingerprints and what impact this has on the size of the characteristic fingerprints.
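Both definitions can be sketched in a few lines (all names and the toy activity list standing in for the 41-million-entry list A are illustrative assumptions):

```python
from collections import Counter

def syscall_costs(representative_trace):
    """Relative frequency of each system call in the activity list A."""
    counts = Counter(representative_trace)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}

def fingerprint_cost(syscalls_in_fp, costs):
    """Sum of the relative frequencies of the system calls that
    appear in a characteristic fingerprint."""
    return sum(costs.get(s, 0.0) for s in syscalls_in_fp)

# Toy activity list A with 50 entries:
A = ["clock_gettime"] * 24 + ["read"] * 20 + ["open"] * 5 + ["stat"] * 1
costs = syscall_costs(A)
assert abs(fingerprint_cost({"open", "stat"}, costs) - 6 / 50) < 1e-9
```

A fingerprint using only open and stat thus costs 0.12 in this toy model, i.e., tracing it would intercept about 12% of all system call activity.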

3.4.3 Greedy Elimination of Expensive System Calls

About 88% of the occurring system calls do not need to be traced since they are not part of any characteristic fingerprint anyway. This alone should already increase performance considerably. However, this is only a rough estimation; the actual overhead depends on the event and can be much higher (see also Table 3.6). In the following, we show how to reduce the costs of characteristic fingerprints further and what impact this has on the size of the characteristic fingerprints.



Figure 3.15: The accumulated frequency of system calls occurrences (in blue) and the factor of reduction of the size of characteristic fingerprints when successively tracing fewer system calls.

The blue line in Figure 3.15 shows the accumulated frequency of the 20 most frequent system calls in our representative system call set. It can be seen that these 20 system calls together make up nearly 90% of all system calls. The red line shows the average characteristic fingerprint size when these system calls are successively removed from the characteristic fingerprints. One can spot two steps in the graph: one when stat is no longer traced and a bigger one when open is no longer traced. This is in accordance with Table 3.4, which shows that only these two system calls are expensive. The average size of characteristic fingerprints shrinks by a factor of 0.57.
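The successive elimination can be sketched as a simple greedy loop that repeatedly drops the currently most expensive system call from tracing until the remaining tracing cost falls below a budget. The budget parameter and all names are illustrative assumptions, not part of our implementation; we assume feature vectors carry the system call name as their first component:

```python
def greedy_eliminate(fingerprints, costs, budget):
    """Drop the most expensive system calls from tracing until the
    total tracing cost is at most `budget`; return the surviving
    set of traced system calls."""
    # Collect the syscall names (first tuple component) used anywhere.
    traced = {s for fp in fingerprints for (s, *_rest) in fp}
    while traced and sum(costs.get(s, 0.0) for s in traced) > budget:
        traced.remove(max(traced, key=lambda s: costs.get(s, 0.0)))
    return traced

costs = {"stat": 0.0791, "open": 0.0442, "execve": 0.0009}
fps = [{("stat", "/etc"), ("execve", "/bin/tar")}, {("open", "/etc/hosts")}]

# With a budget of 0.04, the greedy loop removes stat first, then open.
assert greedy_eliminate(fps, costs, budget=0.04) == {"execve"}
```

The trade-off is exactly the one visible in the red line of Figure 3.15: each eliminated expensive system call also shrinks the characteristic fingerprints that contain it.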

Now we want to investigate in more detail how the sizes of characteristic fingerprints decrease when not tracing stat and open. Table 3.5 shows the size of the characteristic fingerprints when tracing all system calls, all but stat, all but open, and all but stat and open. The table shows (as does Figure 3.15) that not tracing stat shrinks the characteristic fingerprints by about 20% on average; the same applies to open with about 27%. However, even though the table reveals quite large reduction rates for some fingerprints, only two events are left without any vectors after the removal of stat and open from the characteristic fingerprints.

Table 3.5: Loss of vectors in characteristic fingerprints when not tracing stat or open or both.

Class           Name                   Cost    Before  w/o stat       w/o open       w/o stat,open
CLI             ls                     0.001   1       1 (-0.00)      1 (-0.00)      1 (-0.00)
                cp                     0.124   4       2 (-0.50)      3 (-0.25)      1 (-0.75)
                mv                     0.08    2       1 (-0.50)      2 (-0.00)      1 (-0.50)
                cat                    0       0       0 (-0.00)      0 (-0.00)      0 (-0.00)
                vmstat                 0.045   6       6 (-0.00)      1 (-0.83)      1 (-0.83)
                netstat                0.045   15      15 (-0.00)     1 (-0.93)      1 (-0.93)
                tar                    0.08    5       4 (-0.20)      5 (-0.00)      4 (-0.20)
                rm                     0.001   1       1 (-0.00)      1 (-0.00)      1 (-0.00)
                shred                  0.045   2       2 (-0.00)      1 (-0.50)      1 (-0.50)
                curl                   0.044   1       1 (-0.00)      0 (-1.00)      0 (-1.00)
CLI Root        tailShadow             0.08    7       3 (-0.57)      7 (-0.00)      3 (-0.57)
                catCredentials         0.045   4       4 (-0.00)      3 (-0.25)      3 (-0.25)
                vimHosts               0.124   220     97 (-0.56)     127 (-0.42)    4 (-0.98)
                rmSudo                 0.001   2       2 (-0.00)      2 (-0.00)      2 (-0.00)
                shredSudo              0.124   9       5 (-0.44)      8 (-0.11)      4 (-0.56)
Web             wordpressLogin         0.123   63      38 (-0.40)     41 (-0.35)     16 (-0.75)
                wordpressSearch        0.123   3       2 (-0.33)      2 (-0.33)      1 (-0.67)
                wordpressOpen          0       0       0 (-0.00)      0 (-0.00)      0 (-0.00)
Service         sshLogin               0.125   2219    1381 (-0.38)   1305 (-0.41)   467 (-0.79)
                apacheStop             0.124   1712    1697 (-0.01)   31 (-0.98)     16 (-0.99)
                mysqlWp                0.125   47      14 (-0.70)     35 (-0.26)     2 (-0.96)
Kernel Modules  lsmod                  0.045   251     251 (-0.00)    1 (-0.99)      1 (-0.99)
                insmod                 0.124   10      5 (-0.50)      8 (-0.20)      3 (-0.70)
                rmmod                  0.125   12      6 (-0.50)      9 (-0.25)      3 (-0.75)
Docker          dockerHelloWorld       0.045   28      28 (-0.00)     3 (-0.89)      3 (-0.89)
                dockerUbuntuLog        0.124   23      12 (-0.48)     16 (-0.30)     5 (-0.78)
                dockerImages           0.001   1       1 (-0.00)      1 (-0.00)      1 (-0.00)
                dockerPs               0       0       0 (-0.00)      0 (-0.00)      0 (-0.00)
                dockerPSA              0       0       0 (-0.00)      0 (-0.00)      0 (-0.00)
                dockerUbuntuSleep      0.001   2       2 (-0.00)      2 (-0.00)      2 (-0.00)
                dockerRm               0       0       0 (-0.00)      0 (-0.00)      0 (-0.00)
                dockerNginx            0.124   65      45 (-0.31)     29 (-0.55)     9 (-0.86)
                dockerUbuntuBash       0       0       0 (-0.00)      0 (-0.00)      0 (-0.00)
                dockerPrune            0.001   1       1 (-0.00)      1 (-0.00)      1 (-0.00)
                dockerPruneVolumes     0.001   1       1 (-0.00)      1 (-0.00)      1 (-0.00)
                dockerRmImages         0.001   2       2 (-0.00)      2 (-0.00)      2 (-0.00)
                dockerUbuntuBashCp     0       0       0 (-0.00)      0 (-0.00)      0 (-0.00)
                dockerUbuntuBashMv     0.124   18      6 (-0.67)      13 (-0.28)     1 (-0.94)
                dockerUbuntuBashRm     0.08    3       1 (-0.67)      3 (-0.00)      1 (-0.67)
                dockerUbuntuBashCat    0.044   24      24 (-0.00)     0 (-1.00)      0 (-1.00)
Nextcloud       nextcloudStatus        0.001   3       3 (-0.00)      3 (-0.00)      3 (-0.00)
                nextcloudAppList       0.124   44      4 (-0.91)      43 (-0.02)     3 (-0.93)
                nextcloudUserList      0.001   3       3 (-0.00)      3 (-0.00)      3 (-0.00)
                nextcloudUserAdd       0.045   103     103 (-0.00)    17 (-0.83)     17 (-0.83)
                nextcloudGroupList     0.045   5       5 (-0.00)      3 (-0.40)      3 (-0.40)
Average                                0.056   109     83 (-0.24)     38 (-0.65)     13 (-0.88)

The costs in Table 3.4 also show that not tracing stat and open should increase performance a lot. Table 3.6 gives an impression of the performance benefits of tracing fewer system calls. There, the average overheads for three events are calculated for different sets of traced system calls. Note that the "No Tracing" values may differ from the corresponding t values since the t values were determined more pessimistically. While tracing all system calls can cause an overhead factor of 90, which makes interactive usage nearly impossible, the overheads when tracing only the system calls that appear in characteristic fingerprints are a lot better. Removing stat and open from tracing has a huge impact: when not tracing stat and open, the worst performance overhead we measure is a factor of 1.6.


Table 3.6: Overhead of system call tracing. The values in parentheses are overhead factors relative to "No Tracing".

Event Name         No Tracing   All Syscalls   Only Occurring   w/o open     w/o stat      w/o open, stat
tar                0.05s        4.5s (90)      0.2s (4)         0.1s (2)     0.18s (3.6)   0.08s (1.6)
sshLogin           0.5s         21s (42)       1.0s (2)         0.6s (1.2)   0.8s (1.6)    0.6s (1.2)
dockerHelloWorld   0.9s         70s (78)       8s (8.9)         4s (4.4)     6.5s (7)      1.5s (1.6)

3.5 Related Work

There exists a considerable amount of previous work on incident detection using log files. Much of it focuses on the detection of malicious events in the context of intrusion detection. For example, UCLog [88] (Unified Correlated LOGging architecture for intrusion detection) and later UCLog+ [162] correlate different log types to generate alerts. To this end, the authors utilize different loggers, e.g., a kernel API logger (system calls), a network logger (received packet headers), and file system loggers. The data is stored in a unified structure. Monitors receive these audit data and try to detect abnormal behavior. As mentioned above, the focus was on computer intrusions such as infections by e-mail viruses.

Marrington et al. [98] introduced an approach called computer profiling [6] that is able to infer events based on simple cause-effect rules specified by the examiner. This approach is, therefore, only partly automatable. In a similar direction, Khan et al. [76] use a neural-network approach to learn signatures based on file system metadata. Because of the necessary amount of training data and its probabilistic nature, the approach does not scale and cannot be used to establish lower bounds. Another approach is to make use of profile hidden Markov models. These are applied to determine NFS events from network traces [159] or to analyze malware behavior [127]. For Windows, there is an ontology-based approach that generates timelines from a huge amount of data sources [16] without a specific focus on log files.

In the context of forensic investigations, Gladyshev and Patel [50] pioneered the idea of formalizing event reconstruction based on finite state machine representations of computing systems and of detecting events by applying formal logic [49]. This work has most recently evolved into a more practical approach that can compute signatures of system events [73] to infer previous actions. However, these signatures are still rather complex and probabilistic, and they were not applied specifically to reconstruction based on log files. Like Gladyshev, Dewald [32] developed a theory of forensic event reconstruction; it is slightly simpler and is therefore the one applied to log files in this chapter.

Liao and Langweg [89] performed a cost-benefit analysis of system call tracing for forensic readiness. They used kernel-level techniques like strace to trace system calls. Costs were measured in terms of performance impact and storage consumption; benefit was measured in terms of the coverage of the system call traces. The detectability of specific events was not measured. System call tracing, especially with strace, also turned out to be very expensive and not applicable to production systems. Rakotondravony et al. [125] introduced reconfigurable data collection and cost prediction for the visualization of monitoring data in distributed systems. The authors focused on VMI-based data collection. Functionalities that come with a high performance impact, such as debugging, can be enabled or disabled, and there is a cost prediction for such services. The cost prediction calculates an overhead factor based on the number of occurrences of an event (or system call) in a sampling time interval and the additional runtime of tracing such an event with a specific monitoring mechanism.


3.6 Discussion

Almost every user interaction can be logged in a log file. Often it depends on the log level and on which logs are enabled or disabled. Even CLI user events can be logged in shell history files, e.g., .bash_history. However, our attacker model assumes that a powerful attacker would use another shell when doing something malicious. So, we did not consider such log sources. The evaluation revealed that it is hard to detect the usage of command line programs without system calls. Thus, tracing some system calls for specific events is beneficial if one needs to detect certain events. One significant advantage of using system calls is that faking them is very hard and probably makes no sense. Furthermore, even a strong attacker cannot deactivate the tracing. He or she should not even be able to detect that system call tracing is activated. This is possible because virtual machine introspection is used for tracing system calls. Another advantage of system calls is that they are very generic and are generated for nearly all imaginable events. However, we are aware that tracing all system calls is highly expensive and not realistic.

Another insight of the evaluation is that the selected feature set matters for characteristic fingerprint calculation. Feature vectors become more characteristic the bigger the feature set is. Thus, these feature vectors will be preserved when subtracting other events’ fingerprints. In general, it is beneficial to parse as much of the unstructured log data as possible into the unified message format. However, the danger then increases that the calculated fingerprint is too characteristic for the event. Too characteristic means that the fingerprint will only match in the same environment in which it was recorded and fingerprinted. For example, it will only match for certain paths or certain arguments. Particularly for small events with a small number of feature vectors in the characteristic fingerprint, the proportion of too specific feature vectors might be big.
It is considered future work to analyze the impact of the feature set on matching. We showed that these characteristic fingerprints can be used to perform matching and reconstruct events based on log files (including system call traces). Furthermore, we analyzed the possibilities and impact of system calls in characteristic fingerprints and showed how to systematically reduce overhead by tracing only necessary system calls. We furthermore named discriminating system calls on which future work can build. Future work should also consider calculating the costs of system calls more precisely, e.g., by measuring the actual costs of specific system calls as in [125]. Our measurements also revealed that there is more overlap between feature sets with more events and a bigger reference set. So, for some events, we could not calculate a characteristic fingerprint. To do so, it would be necessary to make the events more characteristic, which can be achieved by increasing the feature set or the set of log sources. It would also be interesting to see how other program versions influence characteristic fingerprints.


4 Universal Taxonomy and Survey of Forensic Memory Acquisition

Chapter 3 revealed some limitations of forensic event reconstruction using log files. For this reason, we treated system call traces as log messages, which increased the size of characteristic fingerprints and thus the capabilities for forensic event reconstruction. To trace system calls, we used libvmtrace [145], which makes use of Virtual Machine Introspection (VMI). Thereby, the target virtual machine’s memory is accessed, i.e., read and written, by the tracer to inject breakpoints that are trapped by the hypervisor. There are far more artifacts in memory that might be interesting for digital forensics: the increased use of encryption and remote storage techniques impedes digital evidence recovery. In such cases, it is necessary to perform live memory acquisition and analysis to retrieve cryptographic keys or track the location of network storage areas. Also, the analysis of malware that exclusively resides in volatile memory necessitates such activities [113, 47]. Therefore, it is equally important to advance both practical knowledge and best practices on forensic memory acquisition and to increase the body of scientific knowledge on this subject.

Based on the diverse individual and specialized results in the area, Vömel and Freiling [155] published a comprehensive survey of memory acquisition software for Microsoft Windows. While being an important application domain, many aspects of Microsoft Windows are not generalizable to other systems. This is why Vömel and Freiling [156] subsequently developed a set of generalized quality metrics that apply to any type of volatile storage system. Using the three metrics of correctness, atomicity, and integrity, it is possible to assess and compare the results of memory acquisition tools in a general way. Intuitively, a memory snapshot is atomic if it shows no signs of concurrent system activity, i.e., it is equivalent to a snapshot taken after “halting” the system. Integrity refers to the destructive influence of the acquisition procedure on the memory snapshot itself, e.g., if parts of memory are overwritten before the snapshot is taken. Experimental results by Vömel and Stüttgen [157], and by Gruhn and Freiling [57], showed that common tools generally have high correctness but differ considerably in their level of atomicity and integrity.

The focus of all these studies was on software tools like mdd [95], tools based on DMA like Inception [93], the Windows process monitor, crash dumps and cold boot attacks [60]. Modern systems, however, offer many other access methods to volatile storage, the most prominent being hypervisor-based acquisition techniques [161]. Nevertheless, there are also other methods that can potentially be leveraged to create forensic copies of main memory, such as the Intel Converged Security and Management Engine (CSME) or the Baseboard Management Controller (BMC) on servers. While it is certainly possible to compare memory snapshots from all these tools using Vömel and Freiling’s metrics, it is unclear whether this comparison is fair since these methods appear to operate on much higher levels of privilege. For example, hypervisor-based memory snapshots will generally have better atomicity and integrity values than software-based tools like FTK Imager [29] as the latter run on the same level as the system that is to be imaged.


Contribution The main contributions in this chapter are as follows:

• We develop a model that allows us to classify today’s memory acquisition techniques in a more general way. Put briefly, we define a partial order on acquisition methods based on the level of access to the address space a certain technique is supposed to acquire. It turns out that this model generalizes not only over concrete operating systems but also over specific hardware architectures. In contrast to possible categorizations that rely on the classic ring-based privilege model, this partial order also allows the integration of seemingly unrelated execution contexts like Intel’s SGX [26], AMD’s SME/SEV [75, 1], or virtualization-based techniques into a more general taxonomy. Using this taxonomy, we survey the field of today’s memory acquisition techniques.

• We present the first survey of forensic memory acquisition that is operating system and hardware architecture independent. We classify a vast number of tools and give the reader a thorough understanding of the latest memory acquisition techniques. Our analysis also points to promising fields for future work since we observe particular clusters of acquisition techniques (at the hypervisor level or higher), whereas not many tools exist for the more powerful privilege levels.

This chapter is structured as follows. In Section 4.1 we develop a generic memory access hierarchy that generalizes the classic memory access model of most modern architectures. Based on the resulting hierarchy, we present our taxonomy in Section 4.2. Hereupon, Section 4.3 surveys the landscape of forensic memory acquisition methods and tools. Finally, there is a discussion in Section 4.4.

4.1 The Generic Memory Access Hierarchy

The ring model of Intel, AMD, and many other architectures is a strictly hierarchical order of operating privilege levels that can be used to categorize memory acquisition methods. User applications run on the user level, whereas the operating system runs on the kernel level. Below, there is the hypervisor level, where multiple OSs can be multiplexed. The System Management Mode (SMM) is an instance of yet another level below, which we call the synchronous management level. Finally, we call the lowest level the device level. On this level, the acquisition software is not executed by the (main) CPU but by an external microprocessor or interface. However, not all acquisition methods are tied to one specific ring (especially software-based methods). To achieve a more comprehensive model that categorizes state-of-the-art memory acquisition techniques, we introduce an abstract accessibility relation that can help compare different acquisition techniques. The accessibility relation can also be used to derive conditions for acquisition that is forensically sound [14].

4.1.1 System Model

In the following, we assume that we examine a standard computer system with one or more CPUs. The CPUs are connected to memory, i.e., RAM (be it volatile or persistent). CPUs execute machine instructions that are loaded from memory together with the data they process. We focus on synchronous memory, i.e., a machine instruction has to wait until the required value is loaded. Asynchronous memory, as used in hard drives, is not considered. An address space is a set of synchronous memory available for, e.g., a program or the kernel and is designed to be accessed homogeneously. Virtual memory systems allow the construction of virtual address spaces that simplify memory usage and — combined with ring protection — enable memory isolation.

Figure 4.1: Address space accessibility relation for system/user space multiplexing.

We say that address space A is accessible from address space B when a program that runs in address space B can read memory in address space A. The definition is independent of the number and type of the CPUs that operate in address spaces A or B. In our context, reading memory means being able to interpret it, i.e., if A is encrypted and the encryption key is not available within the context of B, then A is not accessible from B. Although the CPU would not prevent the read access per se, only encrypted memory could be read. More formally, let A be the set of all address spaces. The accessibility relation ≤ is a subset of A × A. So if A and B are address spaces (elements of A) and if A is accessible from B, then we write A ≤ B. If for any A, B ∈ A we have both A ≤ B and B ≤ A, we write A = B. If A ≤ B but B ≰ A, we write A < B. The accessibility relation ≤ is reflexive and transitive.
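To make the definitions concrete, the accessibility relation can be modeled as a set of ordered pairs whose reflexive-transitive closure is then queried. The following is a minimal sketch under the Figure 4.1 scenario; the address space names are purely illustrative:

```python
# Sketch: the accessibility relation <= over address spaces, modeled as
# a set of pairs (A, B) meaning "A is accessible from B". The closure
# makes the relation reflexive and transitive, as required.
from itertools import product

def closure(pairs, spaces):
    """Reflexive-transitive closure of the accessibility relation."""
    rel = set(pairs) | {(s, s) for s in spaces}   # reflexivity
    changed = True
    while changed:                                 # transitivity
        changed = False
        for (a, b), (c, d) in product(list(rel), list(rel)):
            if b == c and (a, d) not in rel:
                rel.add((a, d))
                changed = True
    return rel

spaces = {"app1", "app2", "kernel"}
# The kernel can access both application address spaces (Figure 4.1).
rel = closure({("app1", "kernel"), ("app2", "kernel")}, spaces)

accessible = lambda a, b: (a, b) in rel            # A <= B
assert accessible("app1", "kernel")       # app1 <= kernel
assert not accessible("kernel", "app1")   # kernel memory is isolated
assert not accessible("app1", "app2")     # horizontal isolation
```

Note that the tree-shaped "fan out" of the multiplexing case appears directly in the relation: both applications are below the kernel, but unrelated to each other.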

4.1.2 Memory Access Levels with Multiplexing

For example, consider a typical x86-64 system environment where the kernel isolates multiple user processes via virtual address spaces. While applications (ring 3) are restricted to their own address space, the kernel (ring 0) can access the entire physical memory to maintain paging structures and isolate the address spaces. If an application tries to access ring 0 memory, the MMU will cause a trap into the kernel. This scenario’s accessibility relation is visualized in Figure 4.1, where an arrow A ← B denotes A < B. Technically, the kernel multiplexes virtual memory by switching address spaces depending on the currently active process. This is performed within the scheduler, the kernel component responsible for selecting and switching to the next process. The scheduler changes the processor register that references the root of the active paging hierarchy (e.g., CR3 on x86 and TTBR0 on ARM, respectively) to the root PML4 table of the corresponding address space. In general, the accessibility relation from Figure 4.1 repeats in other contexts, e.g., it is structurally equivalent to one VMM running in ring -1 that separates the address spaces of different guest VMs. Whenever address space multiplexing occurs, the accessibility relation will have a tree structure, i.e., it will “fan out” towards accessible address spaces.
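The multiplexing step can be illustrated in a few lines: each process owns its own page table (here heavily simplified to a single level), and a context switch merely repoints the register that holds the active table root. All names in this sketch are illustrative, not real kernel code:

```python
# Sketch: address-space multiplexing by switching the paging root.
# Each "process" has its own page table mapping virtual page -> frame.
page_tables = {
    "proc_a": {0: 100, 1: 101},   # virtual page 0 -> physical frame 100
    "proc_b": {0: 200, 1: 201},
}

cr3 = None  # stand-in for the register referencing the paging root

def schedule(proc):
    """The scheduler switches address spaces by reloading 'CR3'."""
    global cr3
    cr3 = page_tables[proc]

def translate(vpage):
    """Resolve a virtual page through the currently active table."""
    return cr3[vpage]

schedule("proc_a")
assert translate(0) == 100
schedule("proc_b")          # same virtual page, different frame:
assert translate(0) == 200  # isolation via multiplexing
```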

4.1.3 Memory Access Levels without Multiplexing

There are also cases where the access hierarchy is linear and does not fan out. Consider the SMM of the x86 architecture, which operates even below a VMM, on ring -2.



Figure 4.2: Address space accessibility relation for the hypervisor layer and below.

Since the SMM is not restricted by any memory protection mechanism, it has full access to the memory of all lower privileged instances (higher rings) plus the SMRAM. Since there is no hardware support to trap into the SMM in order to schedule several VMMs or kernels, the SMM is not able to perform memory multiplexing, and so the hierarchy is “straight” instead of fanning out. Similarly, out-of-band access at the chipset level (e.g., Intel CSME) can reach the SMM but cannot perform multiplexing. The corresponding accessibility relation is depicted in Figure 4.2.

4.1.4 Accessibility with Hardware Memory Encryption

Not all access relations match the two cases above. For example, Figure 4.3 depicts the accessibility relation of processor-based memory encryption technologies. Despite being part of an application’s address space, the memory of an SGX enclave cannot be accessed by any higher privileged software like the kernel (see also Section 2.2.7). However, the enclave is able to access the memory of the user process it is part of. Further memory encryption technologies like AMD’s Secure Memory Encryption (SME) or Intel’s Total Memory Encryption (TME) prevent the system’s memory from being accessed by external DMA devices (see also Section 2.2.7). Besides, Secure Encrypted Virtualization (SEV) and Multi-Key Total Memory Encryption (MKTME) allow individual virtual machines to encrypt their guest memory to protect it from higher privileged layers. Our generic access model also considers such special cases.

4.1.5 A Generic Memory Access Hierarchy

In general, the accessibility relation will be a partial order. This follows from the “horizontal” isolation common in virtual memory systems, as was described earlier. The address spaces A1 and A2 of two distinct processes cannot access each other, whereas the kernel’s address space can access both. A fan out in the partial order happens when there is a higher privileged instance (e.g., the kernel) multiplexing between the lower privileged instances (e.g., the applications). Theoretically, there might even be a join in the partial order. For this to happen, e.g., two processes would have to share the same address space on two distinct VMs that are controlled by the same VMM. Since this is not a typical scenario, the partial order will usually have a tree structure.

Figure 4.4 shows the accessibility relation of modern architectures’ privilege levels. We now argue that the depicted accessibility tree is universal and can be mapped to many


Figure 4.3: The accessibility relation of systems that use processor-supported encryption of volatile memory. Processor extensions like AMD’s SME or Intel’s TME prevent external devices (e.g., DMA devices) from meaningfully accessing a system’s memory. Extensions like SEV or MKTME can protect a VM’s memory from higher privileged software like the hypervisor.

(if not all) hardware architectures today. Although this survey mainly focuses on x86-64, today’s most relevant architectures like ARM, Sparc, or PowerPC can all be mapped to our generic classification model. ARM, e.g., provides exception levels that are very similar to x86’s protection rings. Besides EL0 (user mode), EL1 (kernel mode), and EL2 (hypervisor mode), there exists a so-called Secure Monitor Mode (SMM), respectively EL3, that is responsible for switching from the normal world to the secure world and vice versa. The latter is a special operation mode that allows the secure execution of sensitive software. On mobile ARM devices, this TrustZone is often used to execute software that handles sensitive data like user fingerprints, banking information, or encryption keys. Other architectures like PowerPC or Sparc restrict their privilege levels mainly to user and kernel mode. To abstract from a specific architecture’s privilege levels, we chose generic names for each layer. In Figure 4.4 we map the privilege levels of both x86-64 (blue) and ARM (green) to our generic layers (red). We will use these layer names as one central dimension of our taxonomy.

It is important to note that our defined hierarchy is independent of the number of CPUs concurrently accessing a system’s memory. It is a general way of expressing memory access privileges, and the hierarchy exists for every physical memory.



Figure 4.4: General memory accessibility levels (red) which map to x86-64 protection rings (blue) and ARM’s exception levels (green).

4.1.6 Forensic Memory Acquisition

Assume we have a program P that is of forensic interest. Investigators therefore aim to acquire the content of the address space A of P. This address space can be on an arbitrary level in the memory access hierarchy. To acquire memory, we assume an acquisition method Q that runs in address space B on some level of the memory access hierarchy. Method Q is supposed to be used to acquire the memory of program P (i.e., address space A).

The first insight from this discussion is a necessary condition for correct [156] forensic memory acquisition, namely that B can access the address space A, i.e., A ≤ B. Otherwise (in case B < A), it would not be possible for Q to acquire forensic evidence from P (apart maybe from tapping side channels [92, 77]). In case B is on the same level as A (i.e., A = B), we have a potential problem of anti-forensic activities, and so cases where A < B (i.e., where Q has a privilege level that is strictly higher than P) are preferable in practice. Despite the different levels of integrity and atomicity [156] that memory snapshots can have in this case, they are closer to what is considered forensically sound than in the case A = B. In general, however, A = B does not exclude admissibility of the collected evidence in court; it just needs more justification [14].
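The necessary condition and the preference for strictly higher privileges can be phrased as a small decision helper. This is a sketch over an explicitly given toy relation; the level names are hypothetical:

```python
# Sketch: deciding whether acquisition method Q (address space B) can
# soundly acquire target P (address space A), given the relation <=.
def acquisition_verdict(a, b, leq):
    """leq(x, y) returns True iff x <= y (x accessible from y)."""
    if not leq(a, b):
        return "impossible"     # necessary condition A <= B violated
    if leq(b, a):
        return "same level"     # A = B: anti-forensics possible
    return "preferable"         # A < B: strictly higher privilege

# Toy relation: app < kernel < hypervisor (with transitive pairs).
order = {("app", "kernel"), ("app", "hypervisor"),
         ("kernel", "hypervisor")}
leq = lambda x, y: x == y or (x, y) in order

assert acquisition_verdict("app", "hypervisor", leq) == "preferable"
assert acquisition_verdict("app", "app", leq) == "same level"
assert acquisition_verdict("hypervisor", "kernel", leq) == "impossible"
```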

4.2 Taxonomy

Based on the generic memory access hierarchy described above, we now develop a taxonomy of forensic memory acquisition techniques that neither relies on operating system specifics nor specific hardware architecture specifics. As said before, we focus on synchronous

memory and assume that the system in question matches the general memory access hierarchy mentioned above.

4.2.1 Dimension 1: Access Hierarchy Level

Consider again program P with address space A and program Q with address space B. Assuming that the necessary condition of forensic memory acquisition is met (i.e., A ≤ B), the first dimension of our taxonomy is Q’s privilege level in the memory access hierarchy. Since some of the acquisition methods like cold boot and SGX are tied to a physical layer, we use the level of the general access hierarchy as the first dimension of our taxonomy, i.e., we consider the following ordered list of levels:

• User Level (UL)

• Kernel Level (KL)

• Hypervisor Level (HL)

• Synchronous Management Level (SML)

• Device Level (DL)

We say that an acquisition software Q is resident on level x if Q’s highest privileged code to access P’s memory is executed at level x. This also includes code that Q can invoke via interfaces like system calls and that thus does not directly belong to its codebase. For example, the user level debugger GDB (see Section 4.3.2.4) makes use of the ptrace system call. As the corresponding system call handler is part of the kernel, our taxonomy classifies GDB as a kernel level acquisition tool.
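The notion of residence can be made explicit: a tool is classified by the most privileged level at which any of its (possibly indirectly invoked) code runs. A sketch using the ordered level list from above:

```python
# Sketch: classifying a tool by the most privileged level its code
# (including code invoked via interfaces such as system calls) runs at.
LEVELS = ["UL", "KL", "HL", "SML", "DL"]   # least to most privileged

def resident_level(code_levels):
    """Return the most privileged level among the given ones."""
    return max(code_levels, key=LEVELS.index)

# GDB itself runs at user level but acquires memory through the
# ptrace system call, whose handler is kernel code:
assert resident_level(["UL", "KL"]) == "KL"
# A pure user-level acquisition method stays at UL:
assert resident_level(["UL"]) == "UL"
```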

4.2.2 Dimension 2: Pre- or Post-Incident Deployment


Figure 4.5: Timeline of a system’s life when memory acquisition comes to play.

Another essential aspect is the acquisition method’s installation time. Incident response teams are often forced to deploy their tools after the occurrence of an incident. However, the installation of acquisition software often alters the memory, e.g., when a kernel driver is installed and loaded, which reduces integrity. In our taxonomy, we distinguish the following cases: If the memory acquisition software has to be installed before an incident (i.e., between points 1 and 2 in Figure 4.5), we say the software requires pre-incident deployment. Otherwise (after point 2), we say the acquisition technique can be installed post-incident.


Table 4.1: Classification of today’s state-of-the-art memory acquisition techniques (with references to relevant sections of this thesis), grouped by access hierarchy level from user level (UL) down to device level (DL):

• User Level (UL): Emulators (Sect. 4.3.1.1), e.g., Bochs [86]; Built-in Debugging Interfaces (Sect. 4.3.1.2)

• Kernel Level (KL): Kernel Support (Sect. 4.3.2.1), e.g., Pmem [140], LiMe [161], FTK Imager [29], DumpIt [142], ProcDump [133]; Crash Dumps (Sect. 4.3.2.2), e.g., Kdump [149], minidump [103], core dump [91]; Hibernation Files (Sect. 4.3.2.3); Debuggers (Sect. 4.3.2.4), e.g., GDB [40], WinDbg [102]; UEFI RTS (Sect. 5)

• Hypervisor Level (HL): VMM Tools (Sect. 4.3.3.1), e.g., libVMI [119], dumpvmcore [115], vmss2core [28]; Forensic Hypervisors (Sect. 4.3.3.2), e.g., Hypersleuth [99], Vis [161]

• Synchronous Management Level (SML): SMM Rootkits (Sect. 4.3.4), e.g., SMMBackdoor [112], TrustDump [143]

• Device Level (DL): DMA-based (Sect. 4.3.5.1), e.g., Tribble [12] (pre-incident), Inception [93] and PCILeech [46] (post-incident), BMCLeech (Sect. 6); Low-Level Debugging Interfaces (Sect. 4.3.5.2), e.g., DCILeech (Sect. 7); Cold Boot (Sect. 4.3.5.3), including UEberForensIcs (Sect. 5)

4.2.3 Dimension 3: Terminating vs. Non-Terminating Acquisition

Our survey’s third dimension refers to the abort behavior of program P after the deployment of the acquisition method Q. If, due to the use of Q, there is a time t after which program P aborts — see point 4 in Figure 4.5 — we say that the acquisition method Q terminates program P. Most of our analyzed acquisition methods do not terminate P. We call these methods non-terminating. In general, program abort is not desired. However, acquisition methods that terminate the target program often achieve a higher level of atomicity than non-terminating approaches.

4.3 Survey

Based on the taxonomy described in the previous section, we now proceed with presenting our comprehensive operating system and hardware architecture agnostic survey of memory acquisition techniques. Table 4.1 presents an overview of the forensic data acquisition methods (in italics) and tools that we discuss in the following sections. Tools or techniques printed in bold are tools or techniques introduced in this thesis and are described in the corresponding sections. Entries are ordered according to the three dimensions introduced in Section 4.2. We use the access hierarchy level as the main ordering criterion and work from higher layers (least privileges) down to lower layers.

4.3.1 User Level

Since the user level is the least privileged level, there exist only a few memory acquisition techniques here. As A = B, it is only possible to dump the memory of the same virtual address space. The first presented technology makes use of emulation, virtually lowering the target application’s privileges. Thereby, a user level acquisition method gains full control over a target running on the same privilege level.


4.3.1.1 Emulators

As already mentioned, one possibility to acquire memory from within the user level are software emulators. Thereby, the emulator integrates the acquisition method Q, and the guest program corresponds to P. As the emulator provides the guest memory, it is trivial to create snapshots that preserve atomicity and integrity. In the following, Bochs is described as a typical example of an emulator.

Bochs [86] is an open source x86-64 emulator. There exist implementations for x86, x86-64, PowerPC, Alpha, Sun, and MIPS hosts. However, only the x86 (as well as the x86-64) architecture can be emulated. Emulators like Bochs usually run at the user level, which does not require any special kernel driver. Due to the instruction-wise interpretation, there is a huge performance overhead, however. Because of this performance impact, common programs are usually not executed within an emulated environment. Therefore, emulators are mostly used for software development and debugging.

An emulator tries to mimic the original system exactly. Therefore, it fetches, decodes, and executes one instruction after another. Usually, registers are implemented as global variables, which are modified by the corresponding instructions. The same applies to the guest’s memory, which is usually implemented as an array. An instruction is typically represented as a function of the respective programming language.

The nature of the implementation of the guest’s memory allows Bochs to create a dump efficiently. The emulator itself integrates the acquisition software Q, which has full control over the target’s address space, which is, in this case, the emulator’s own address space. Bochs integrates a debugger, enabling the possibility to read memory from both virtual and physical (in the context of the virtual environment) addresses. Since the emulator has to run P, this acquisition technique belongs to the pre-incident class and is non-terminating.
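Because an emulator holds guest memory as an ordinary array in its own address space, an atomic snapshot is a plain copy taken between two instruction steps. The following toy sketch illustrates the principle (it is not Bochs code; all names are illustrative):

```python
# Sketch: guest physical memory as a flat array inside the emulator.
guest_mem = bytearray(4096)        # toy guest with 4 KiB of "RAM"

def execute_instruction(addr, value):
    """Stand-in for one fetch-decode-execute step: a memory store."""
    guest_mem[addr] = value

def dump_memory():
    """Snapshot between two instructions: trivially atomic, since the
    guest cannot run concurrently with the emulator's own code."""
    return bytes(guest_mem)

execute_instruction(0x10, 0x41)
snapshot = dump_memory()
execute_instruction(0x10, 0x42)    # later guest activity...
assert snapshot[0x10] == 0x41      # ...does not affect the snapshot
```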

4.3.1.2 Built-in Debugging Interfaces

One could also imagine a program P that comes with some kind of debugging interface that allows dumping P ’s own address space. So, in this case, Q is a part of P . However, we have not found such software or libraries yet.

4.3.2 Kernel Level

As the kernel is responsible for maintaining memory, it is a widely used location to integrate memory acquisition techniques. Most are implemented as kernel drivers. Nowadays, all relevant desktop operating systems come with an integrated memory acquisition functionality — a crash dump. Thereby, an image of a process’s virtual memory or even the entire physical memory is created whenever the system encounters a critical state. When a system is hibernated, a corresponding file that holds the system’s current state is allocated. This includes the system’s RAM, CPU registers, etc. Furthermore, software debuggers also make use of kernel functionality to control the debuggee and access its memory. It is possible to dump P’s memory from the kernel level if P runs on either UL or KL. Often, the whole known and accessible physical RAM is dumped. In the following, we elaborate on the most important techniques that use Kernel level privileges to acquire a target’s memory.


4.3.2.1 Kernel Support

Most memory acquisition tools running at Kernel Level are implemented as kernel drivers. This comes with a few advantages. Compared to techniques directly incorporated within the kernel, a kernel driver is easier to develop and faster to deploy. Besides, investigators have the chance to install the acquisition technique even after a potential incident has happened. In the following, we provide insight into Pmem [140], one of today’s most sophisticated physical memory dumpers.

Pmem is part of the well-known memory forensic framework Rekall which is supported on Linux, Windows, and macOS. Pmem features physical memory acquisition by either requesting kernel support or by manually mapping physical frames by itself. Therefore, it is capable of acquiring the physical memory of user processes as well as the kernel.

Pmem is implemented as a kernel driver, which allows it to be deployed during the runtime of the system. To install Pmem into the kernel space, an investigator uses the corresponding loading mechanism of the particular operating system. On Linux, e.g., this is done by using the insmod command, which internally requests the kernel to load the driver by invoking the init_module system call. After finishing the acquisition, Pmem can simply be unloaded without interfering with the system’s execution. Therefore, it is a non-terminating memory acquisition method that can be deployed even after a possible incident. Pmem is split into a kernel and a user-mode component. While the first provides the necessary functionality to access and dump physical memory, the second offers a control interface to the investigator. Both components communicate with the help of operating system-specific channels. In the case of Linux, this is done via the ioctl interface. To defend against malicious hooks that could tamper with the acquisition process, Pmem barely uses any kernel APIs. Depending on the specific acquisition mode (see below), it can map physical memory by itself instead of relying on existing kernel functionality. However, tasks like loading the driver or copying physical memory from kernel space to Pmem’s user space buffers still require kernel support. Therefore, the integrity of its memory dumps is still prone to anti-forensic attacks.

In contrast to Windows and macOS, Linux comes with a wide variety of possible kernel configurations. To be able to load the driver into these kernels, it needs to be built


Figure 4.6: Pmem systematically maps each frame of the physical memory to a specific virtual rogue page by modifying the rogue page’s corresponding PTE. The procedure barely relies on operating system support, which hardens the acquisition process against kernel level subversion.

appropriately. Due to different circumstances like the absence of compatible build tools, this is not always possible. To overcome this problem, Stüttgen and Cohen [141] presented LMAP, a tool suite to load a stripped-down and precompiled version of the Linux Pmem driver — called minpmem — on a wide variety of Linux kernel configurations. To install the acquisition tool, LMAP injects minpmem into a compatible host driver on the system before loading both to kernel space. Therefore, minpmem’s code needs to be statically relinked to work with the host module, requiring manual relocations.

As already mentioned, Pmem supports two different memory acquisition modes. The first depends on the functionality offered by the respective operating system. On Linux, e.g., Pmem acquires all memory regions that are marked as System RAM in the iomem_resource tree by using the kernel’s kmap functionality. Alternatively, Pmem uses /proc/kcore to access physical memory. As a second option, Pmem can map physical memory entirely independently of kernel APIs. To do so, it allocates a single page in non-pageable memory, called the rogue page. Instead of trusting kernel functionality to obtain the rogue page’s PTE, Pmem walks the paging hierarchy itself. This ensures protection against malicious modifications during the translation process. Pmem then has the chance to map arbitrary physical memory by remapping the PTE’s page frame number to the required physical frame. Its content can afterward be accessed through the virtual address of the rogue page. The remapping is done for each physical frame that should be included in the dump. To prevent Pmem from acquiring a cached page instead of the currently mapped frame, it flushes the TLB before accessing the rogue page. All further reads from the rogue page’s virtual address will then be redirected to the physical target page by the system’s MMU. Finally, Pmem simply copies memory out of the rogue page to its user-space buffers. Figure 4.6 depicts Pmem’s process of remapping physical memory through the rogue page.
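The page-walk indices that Pmem computes itself, and the effect of remapping the rogue page’s PTE, can be sketched as follows. This is a simulation of the idea, not driver code; the toy frame contents are invented:

```python
# Sketch: x86-64 4-level page walk index extraction and the rogue-page
# remapping idea. A 48-bit virtual address splits into four 9-bit table
# indices (PML4, PDPT, PD, PT) plus a 12-bit page offset.
def walk_indices(vaddr):
    return [(vaddr >> shift) & 0x1FF for shift in (39, 30, 21, 12)]

assert walk_indices(0x7FFFFFFFF000) == [255, 511, 511, 511]

# Remapping: point the rogue page's PTE at an arbitrary frame, then
# read that frame's content through the rogue page's virtual address.
frames = {0: b"kernel..", 1: b"target.."}   # toy physical memory
rogue_pte = {"pfn": 0}                      # page frame number in PTE

def read_rogue_page():
    """The MMU redirects the read to the mapped frame (after the TLB
    has been flushed, which the real driver does explicitly)."""
    return frames[rogue_pte["pfn"]]

rogue_pte["pfn"] = 1                        # remap to the target frame
assert read_rogue_page() == b"target.."
```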

As much of an advantage as it is to be able to install Pmem at any given time, the tool also suffers from major drawbacks. First, because the system is not halted during the acquisition phase, Pmem cannot create atomic dumps: the state of the system is likely to be altered while Pmem acquires physical memory. As many production systems are not allowed to be interrupted, even temporarily halting the machine may not be an option. Second, Pmem's initial installation routine slightly alters the kernel's memory, which reduces the integrity of the memory dump. However, introducing minor modifications to the target system's state is common to almost all acquisition methods at this level. Finally, there is always the chance that an already compromised kernel subverts Pmem's acquisition process using anti-forensic techniques. On Linux systems, e.g., a rootkit could subvert the memory copying process between kernel level and user level by hooking the kernel API _copy_to_user. For the sake of completeness, it should be noted that Pmem restricts its dumps to regular memory regions. However, a system's physical memory cannot be seen as one contiguous address space but instead is interleaved with device memory. Since devices do not always claim these regions entirely, this can result in special offcuts — called hidden memory — that are backed by accessible RAM but neither used by the kernel nor any other device [140]. Palutke and Freiling [117] recently showed that hidden memory can serve advanced malware as a hideout to subvert Pmem's acquisition process.

Additional tools that use a kernel driver to acquire physical memory are LiME [144], FTK Imager [29], DumpIt [142], and ProcDump [133]. However, these tools usually make use of built-in kernel functionality to access the memory, while Pmem comes with its own code to access the address space A.


4.3.2.2 Crash Dumps

If a system or application crashes, developers often rely on a memory dump to investigate its cause. For that reason, all relevant operating systems come with crash dump capabilities. Usually, the dump can be restricted to the kernel's address space, an application, or the entire physical address space. The crashed program or system is the program of forensic interest P, while the crash dump component corresponds to the program Q. In the following, Kdump [149], a Linux crash dump facility, is described.

Figure 4.7: Kdump's memory acquisition and analysis process.

Kdump is built into the kernel; no additional installation is required. Figure 4.7 shows the memory acquisition process of Kdump. If a Linux kernel crashes, a second Linux kernel – the dump-capture kernel – is started, and the old kernel's memory is preserved. Kdump is based on the kexec system call; hence, no bootloader or hardware initialization is required. The system call is executed when a kernel panic occurs. Kexec runs the new kernel in a reserved memory area, which guarantees that ongoing DMA does not corrupt it. Information about the crashed system is stored in the preserved area in a pre-generated ELF core file. The physical address of the ELF header is passed via a boot parameter. From the dump-capture kernel, it is possible to access the memory image via /proc/vmcore. The output can be copied using standard tools and analyzed using GDB.
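Since /proc/vmcore is an ordinary ELF core file, enumerating the preserved physical memory ranges only requires standard ELF parsing. The following minimal sketch (not part of Kdump itself) extracts the PT_LOAD segments of a little-endian ELF64 image; in a vmcore, their p_paddr fields carry the physical addresses of the dumped RAM regions:

```python
import struct

PT_LOAD = 1  # segment type used for the memory ranges in a vmcore

def parse_elf64_load_segments(data: bytes):
    """Return (file offset, physical address, size) for each PT_LOAD
    segment of a little-endian ELF64 file such as /proc/vmcore."""
    assert data[:4] == b"\x7fELF", "not an ELF file"
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)          # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    segments = []
    for i in range(e_phnum):
        p_type, _flags, p_offset, _vaddr, p_paddr, p_filesz = \
            struct.unpack_from("<IIQQQQ", data, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_offset, p_paddr, p_filesz))
    return segments
```

Iterating over these segments and copying p_filesz bytes from each p_offset reconstructs the preserved physical memory, which is essentially what tools like makedumpfile or cp do with /proc/vmcore.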

As crash dumps are only produced when a system undergoes a crash, they are inherently terminating acquisition methods. As Kdump is part of the Linux kernel, it is also available before the occurrence of a potential incident. However, the kexec system call is disabled on some systems for security reasons, which means that Kdump might not work there.

There are also other crash dump mechanisms, e.g., minidump [103] for Windows or Linux's core dump [91], that can also be used when an application crashes (e.g., due to a segmentation fault).


Figure 4.8: A simplified version of GDB's debugging workflow.

4.3.2.3 Hibernation Files

When a computer hibernates (also known as ACPI state S4 or suspend to disk), major parts of volatile memory are written to disk so that the system's state can be restored even after power was cut. To use this mechanism for memory acquisition, program P has to be part of the dumped memory (which can be assumed if P is running at the time of hibernation).

Technically, hibernation is a built-in feature of most OS kernels, which implement it in slightly different manners. On Windows systems, the hibernation file is called hiberfil.sys and is located in the C:\ directory. The file can be converted and analyzed using standard memory forensic tools like Volatility [41]. However, to analyze the file, one needs access to it, which is impossible if full disk encryption is used and the key is unknown. Furthermore, the hibernation mode has to be enabled in the first place. So, while a hibernation file can be a helpful find for an investigator, hibernation files are a rather unreliable forensic acquisition method.
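A first triage step for a recovered hiberfil.sys is to inspect its four-byte header signature; according to public analyses of the Windows hibernation format, it reads "hibr"/"HIBR" for valid hibernation data and "wake"/"WAKE" (or is zeroed) once the system has resumed. A minimal sketch under that assumption:

```python
def classify_hiberfil(header: bytes) -> str:
    """Rough triage of a Windows hibernation file from its first four bytes.
    Signature values as documented in public hibernation-format analyses."""
    sig = header[:4]
    if sig in (b"hibr", b"HIBR"):
        return "valid"      # hibernation data present
    if sig in (b"wake", b"WAKE"):
        return "resumed"    # header invalidated after resume
    if sig == b"\x00\x00\x00\x00":
        return "zeroed"     # header wiped; page data may still remain
    return "unknown"
```

Even a "resumed" or "zeroed" file can be worth carving, since only the header is invalidated while compressed page data may survive further into the file.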

4.3.2.4 Debuggers

Typical software debuggers make use of a kernel's debugging support. The debugger (Q) can fully access (i.e., read and write) the debuggee's (P) memory. Modern operating systems like Windows and Linux come with debugging system calls that serve this purpose. Obviously, a debugger does not terminate the debuggee (P) but is able to interrupt it for an arbitrary time. What follows is a short overview of the functionality and memory acquisition capabilities of the GNU Project Debugger (GDB) [40]. GDB, released by Stallman in 1986, is probably the most well-known debugger for Unix-like systems. It makes use of the ptrace system call to debug a process (running at user level).


GDB is implemented as a standard user level application. Figure 4.8 shows the typical procedure of a debug session using ptrace. First, the debugger creates a child process using the fork system call. Then, the child process requests a PTRACE_ME command via ptrace. The newly spawned child process usually proceeds with an execve system call to launch the actual target process — the debuggee. From now on, the debuggee's signals are diverted to the debugger. GDB's typical workflow can be summarized as follows: (1) wait for a signal from the debuggee, (2) handle the signal, and (3) continue the execution of the debuggee.

Using ptrace, GDB (Q) requests the kernel to vicariously access the debuggee's (P) memory. Since ptrace is a standard system call provided by every Linux kernel, no additional code is installed at kernel level. However, because GDB makes use of ptrace's corresponding system call handler, which is part of the kernel, we classify GDB as a kernel-level memory acquisition tool. Although GDB is mainly used for classic debugging tasks (e.g., examining a program's execution), it is still feasible to create memory dumps using GDB.
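On Linux, the same kernel facility that backs ptrace-based memory access is also exposed via /proc/&lt;pid&gt;/mem, which GDB itself uses to read larger memory blocks. The following sketch illustrates the idea; for a foreign process, the reader must be ptrace-attached or suitably privileged, so the demonstration reads the calling process's own memory:

```python
import ctypes
import os

def read_process_memory(pid: int, address: int, length: int) -> bytes:
    """Read `length` bytes at `address` from a process via /proc/<pid>/mem.
    For a foreign process, the caller must be attached with ptrace (as a
    debugger is) or have suitable privileges."""
    with open(f"/proc/{pid}/mem", "rb", buffering=0) as mem:
        mem.seek(address)
        return mem.read(length)

# Demonstration on our own address space: place a known marker and read
# it back "forensically" through the proc interface.
buf = ctypes.create_string_buffer(b"forensic marker")
data = read_process_memory(os.getpid(), ctypes.addressof(buf), 15)
```

Dumping an entire process this way amounts to parsing /proc/&lt;pid&gt;/maps for the mapped ranges and reading each range through /proc/&lt;pid&gt;/mem.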

Most classic debuggers work in a similar way. A widely used debugger for the Windows platform is WinDbg [102]. Windows offers dedicated functions, like ReadProcessMemory or WriteProcessMemory, to access an application's memory.

Using the presented technique, it is only possible to dump memory from user level applications. Besides launching the debuggee from the debugger (pre-incident), it is also possible to attach to a running process (post-incident); in that case, however, elevated privileges may be required. Like emulators (as described in Section 4.3.1.1), running an application in a debugger comes with a significant performance overhead. While not irrelevant to forensic memory acquisition, debuggers are of limited interest in this context.

4.3.2.5 UEFI RTS

The UEFI Runtime Services (RTSs) are services offered by the firmware to the OS. These services are used, for example, for reading or setting the time or other firmware parameters. However, the RTSs are not as privileged as the System Management Mode (SMM) but are called in the context of the OS kernel. In Chapter 5 we introduce a proof of concept that shows how the RTSs can be used for non-terminating forensic memory acquisition. Basically, such software needs to be deployed before a potential incident.

4.3.3 Hypervisor Level

Common memory acquisition tools like Pmem or LiME require OS kernel functionality, i.e., they are implemented as drivers. However, if malware succeeds in gaining kernel-level privileges, it can subvert the acquisition process using anti-forensic techniques. For example, direct kernel object manipulation techniques can modify management structures of the kernel to tamper with the output of monitoring tools like the Windows Process Explorer [132]. As an example, the FU Rootkit [138] unlinks process objects from the linked list of active processes. After the unlinking, the respective processes disappear from the output of Windows Process Explorer. The possibilities for such manipulations in kernel space are numerous. Eventually, tools that rely on internal kernel objects or structures cannot be fully trusted anymore.


Shadow Walker [138] was introduced at Black Hat Japan in 2005. It conceals itself by subverting the paging process. The rootkit declares the PTEs that reference its pages as not present. To hide, Shadow Walker patches the Page Fault Handler (PFH) to distinguish between read/write and execute accesses. If the instruction pointer contains a corresponding address, Shadow Walker infers an execute access; otherwise, it assumes a read or write access. To tamper with the results of read/write accesses, the modified PFH returns faked versions of the rootkit's pages. As a result, it can prevent even techniques leveraging kernel-level privileges from acquiring correct memory images. To counter techniques like Shadow Walker, analysts require approaches that operate with even higher privileges than an operating system's kernel.

4.3.3.1 Virtualization Tools and Virtual Machine Introspection

Standard virtualization solutions like VMware, KVM, or Microsoft's Hyper-V usually integrate functionality to acquire a guest's memory. A guest cannot hide its memory from the more privileged hypervisor. After dumping the (physical) memory, it can be analyzed, e.g., using forensic tool suites like Rekall. Often it is sufficient to dump only particular parts of memory. This is where Virtual Machine Introspection (VMI) comes into play. There are two classes of VMI techniques: (1) synchronous and (2) asynchronous VMI [71]. Tools from the first category interfere with the program flow of the target OS; they are commonly used by the VMM to prevent security violations. Tools that belong to the asynchronous class inspect the memory of a concurrently running guest, usually in the form of a snapshot or with the help of a read-only or copy-on-write view of the guest's system memory. The analyzed program running within the guest system is P, while the VMI software integrated into the VMM represents Q. A can be the address space of a user level application, the kernel, or any arbitrary memory region within the VM. In the following, we provide an overview of LibVMI [119], a well-known VMI library. In contrast to acquisition software running within the kernel, tools that operate on the hypervisor level generally do not have any contextual information for interpreting a guest process's address space. This problem is usually characterized as the semantic gap between a hypervisor and its guests [17]. LibVMI constitutes an API for VMI that uses several heuristics to overcome the semantic gap. The framework provides support for KVM and Xen as well as the creation of static memory snapshots. LibVMI can access both physical and virtual addresses as well as kernel symbols. LibVMI evolved from the XenAccess project [120]. It is executed from the privileged host system. There is a C API and a Python API, called PyVMI.
LibVMI integrates into the VMM, e.g., as a patch to KVM, and must be configured for different types of guests using a configuration file. It contains basic information on the underlying OS as well as important address offsets. In the case of a Linux guest, the configuration also contains the path to the System.map file. This file contains the symbol table of the Linux kernel; it is generated when the kernel is built. In the case of a Windows guest, the Program Database (often abbreviated as PDB) information is used. If only a dump of the entire physical memory is needed, there is no need for such configuration files. However, LibVMI makes use of these configuration files to overcome the semantic gap. While LibVMI can retrieve information about an OS's kernel symbols, it cannot introspect user level symbols. Furthermore, it requires the respective symbol tables, which, in the case of Linux, cannot always be generated since this requires the possibility to compile the target kernel oneself, a futile endeavor without the correct configuration files.
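Resolving a kernel symbol from a System.map file, as LibVMI does to bridge the semantic gap, is a matter of parsing its "address type name" lines. A minimal illustrative sketch (the sample addresses below are hypothetical):

```python
def load_system_map(text: str) -> dict:
    """Parse System.map-style lines ("<hex address> <type> <symbol>")
    into a symbol-name -> address dictionary."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            addr, _sym_type, name = parts
            symbols[name] = int(addr, 16)
    return symbols

# Hypothetical sample lines in System.map format:
sample = """ffffffff81000000 T _text
ffffffff82011280 D init_task"""
syms = load_system_map(sample)
```

Given such a table, an introspection tool can translate a symbol like init_task into a guest virtual address and then walk the guest's paging structures to read the corresponding physical memory.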

Often virtual machine software comes with an integrated memory dump capability. One example is VirtualBox's dumpvmcore [115], which dumps the guest's memory into an ELF binary. Another example is VMware's vmss2core [28], which converts the proprietary VMware checkpoint state files or suspended state files into well-known core files that can be analyzed by debuggers.

4.3.3.2 Forensic Post-Incident Hypervisors

While the last section focused on a priori acquisition techniques, the following VMMs can be installed after a potential incident. As in the previous section, the program P has to be part of the running guest system. The VMM integrates the acquisition software Q.

HyperSleuth [99] is a framework for forensic live analysis leveraging Intel's VT-x. HyperSleuth comes with a memory acquisition tool, a lie detector, and a system call tracer. It operates below the OS kernel, i.e., on the hypervisor level. The original OS is virtualized at run time and becomes a guest that can be controlled by the VMM. Since an investigator cannot even trust the system's kernel, HyperSleuth does not rely on any guest functionality after its installation has been completed.

HyperSleuth can be installed a posteriori, similar to a kernel driver. After the installation, it creates a Dynamic Root of Trust (DRT). An external trusted host attests — using a challenge-response protocol — that the DRT is established and thus that HyperSleuth is installed and working correctly. As a result, the authors claim the data returned from the VMM to be trustworthy. However, we do not believe this mechanism is able to guarantee the hypervisor's integrity, as the kernel could manipulate its installation process.

HyperSleuth is implemented as a thin hypervisor that makes use of Intel's VT-x technology. The VMM limits guest interceptions to a minimal set of events necessary for the forensic analysis process. Since the running host system is migrated to a VM during HyperSleuth's installation process, there is no support for multiple guests or sharing of I/O devices.

The main feature of HyperSleuth is its memory acquisition functionality. It uses processing cycles during a system's idle state to dump memory pages (dump-on-idle). An investigator can also request HyperSleuth to provide an atomic dump of the guest's physical memory. This feature (called dump-on-write) works similarly to the copy-on-write mechanism that many operating systems implement to spawn new processes, i.e., the VMM immediately dumps all pages to which the guest wishes to write. The VMM utilizes the MMU by maintaining shadow page tables (SPTs) for the guest. To prevent the guest from accessing the VMM's memory, the SPTs do not map it.

More specifically, the dump-on-write mechanism works as follows (see also Figure 4.9): (1) HyperSleuth intercepts write accesses to the CR3 register, i.e., when the root page table (PML4) address changes. This usually happens during a context switch. (2) The VMM synchronizes the page tables with the SPTs and (3) removes write privileges in the PTEs of all writable pages. (4) Then, the actual CR3 change happens. (5) Whenever the guest tries to access such a page, a page fault transfers control to the VMM. (6) In case this page was not dumped before, HyperSleuth now acquires its content. (7) Afterward, the data from the guest is written to memory. Furthermore, every time the guest executes a hlt


Figure 4.9: HyperSleuth's dump-on-write mechanism.

instruction, i.e., the CPU is sent to the idle state, the VMM dumps remaining memory pages in the background.
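The essence of dump-on-write can be illustrated with a toy model: before a page is first written, its pristine content is copied into the dump, and pages that are never written are collected later "on idle". The following is a purely illustrative sketch, not HyperSleuth's implementation:

```python
class DumpOnWrite:
    """Toy model of dump-on-write: snapshot a page's original content on the
    guest's first write to it, so the finished dump reflects the memory state
    at acquisition time while the system keeps running."""

    def __init__(self, memory):
        self.memory = memory   # page number -> bytes (live "guest" memory)
        self.dump = {}         # page number -> bytes at acquisition time

    def guest_write(self, page, data):
        # Page-fault path: dump the original content before the first write.
        if page not in self.dump:
            self.dump[page] = self.memory[page]
        self.memory[page] = data   # then let the write proceed

    def dump_on_idle(self):
        # Background path: copy all pages that were never written.
        for page, content in self.memory.items():
            self.dump.setdefault(page, content)
        return self.dump

mem = {0: b"AAAA", 1: b"BBBB"}
d = DumpOnWrite(dict(mem))
d.guest_write(0, b"XXXX")          # guest modifies page 0 after acquisition start
snapshot = d.dump_on_idle()        # snapshot still shows the original content
```

In the real system, the "first write" is detected via write-protected PTEs in the shadow page tables rather than a Python dictionary, but the resulting atomicity argument is the same.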

Typically, kernel-level rootkits tamper with the output of system monitors, i.e., process lists, network connections, loaded drivers, etc. HyperSleuth implements a lie detector to counter these malicious activities. Since it has access to the entire machine, HyperSleuth can collect relevant data from outside the VM without support from the guest OS. In addition, HyperSleuth loads an in-guest tool to retrieve the corresponding data using the in-guest interfaces. Eventually, HyperSleuth compares these two data sets. Any difference indicates the existence of kernel level malware tampering with some data.

HyperSleuth also comes with a system call tracer. Traditional system calls generate software interrupts to request kernel functionality. A VMM can configure VT-x to natively trigger a VM-exit on the occurrence of an interrupt. HyperSleuth can then simply extract the required information from the guest. The newer fast system calls using sysenter and sysexit cannot be directly intercepted by VT-x. However, system calls using this mechanism access the address defined in the machine-specific register SYSENTER_EIP. HyperSleuth assigns a non-existent address to the register, which ensures that a page fault is generated whenever the guest requests a system call. Thereupon, the hypervisor has the chance to determine the reason for the page fault. In case a system call was issued, HyperSleuth extracts useful information like the parameters and the system call number.

As already described, HyperSleuth makes use of the dump-on-write and dump-on-idle mechanism. Using these techniques, it can acquire the guest’s memory at the actual time

of the request. Therefore, the dump's atomicity is guaranteed while the underlying system — and thus our program P — is still running.

However, forensic hypervisors, and HyperSleuth in particular, also come with some limitations. VT-x can only be used by one VMM at a time. Hence, if another hypervisor has already claimed the virtualization extensions, HyperSleuth cannot virtualize the system. Moreover, HyperSleuth does not consider scenarios in which the guest tries to detect being virtualized; for example, trying to install another VMM could potentially raise suspicion. Furthermore, HyperSleuth's code was never published.

Another very similar project is Vis [161], which also comes with an a posteriori installation capability. Vis can be considered the next generation of HyperSleuth since it uses Intel's EPT instead of maintaining shadow page tables.

4.3.4 Synchronous Management Level

Memory acquisition at the SML is quite beneficial because it is independent of the OS and of a VMM. Even in the presence of a malicious OS kernel or a subverted VMM, it is possible to dump the memory of the system reliably. However, as already described in Section 2.2.5, the SMM is set up by the BIOS. Hence, the development and installation process is very different compared to common forensic software. Instead of deploying the acquisition technique into the SMRAM, it could also be integrated into the BIOS itself. Similar to a cold boot attack, the BIOS could then acquire memory after resetting the system. To the best of our knowledge, no standard BIOS includes such a feature. In Chapter 5 we introduce UEberForensIcs, which basically works as just described.

The program P has to be part of a less privileged level, i.e., Hypervisor, Kernel, or User, while the forensic memory acquisition tool Q is implemented as an SMI handler or a BIOS/UEFI module, respectively. In the following, we illustrate SML memory acquisition by looking at the SmmBackdoor [112] project, the only system we are aware of that offers SML memory acquisition capabilities.

SmmBackdoor is an open-source proof-of-concept implementation of a backdoor operating within the SMM of the x86-64 architecture. Even though the term backdoor often refers to malicious software, SmmBackdoor can rather be seen as a universal tool. One of its main features is to acquire the system's memory. SmmBackdoor is installed via the UEFI, where it sets up an SMI handler that dumps the system's memory. Because the project's current state is just a proof-of-concept implementation, it has to be installed using hardware tools. First, the motherboard's firmware is dumped via a hardware serial peripheral interface (SPI) programmer. A Python tool then extends the functionality of the periodic timer's SMI handler with the acquisition code. The modified image is subsequently written back, once again using the SPI programmer.

To control SmmBackdoor from the user level, an application writes a predefined request as well as the corresponding arguments to certain general-purpose registers. To prevent other software from accidentally requesting a memory dump, SmmBackdoor also requires specific values, i.e., magic values, in some registers. The application then waits for the SMI handler to finish by executing an infinite loop. When the SMM command has finished, the handler changes the saved instruction pointer to terminate the infinite loop.
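The register-based command interface described above can be sketched as a toy model. All constants below (magic value, command codes, register assignments) are hypothetical; the real SmmBackdoor defines its own protocol:

```python
# Hypothetical magic and command values -- purely illustrative,
# not SmmBackdoor's actual protocol constants.
MAGIC = 0x1337C0DE
CMD_READ_PHYS = 0

def smi_dispatch(regs: dict):
    """Toy model of a register-based SMI command interface: the handler only
    acts when the magic value is present, then dispatches the command."""
    if regs.get("r8") != MAGIC:
        return None  # unrelated SMI (e.g., periodic timer): do nothing special
    if regs.get("rax") == CMD_READ_PHYS:
        # In the real handler, this would copy physical memory out of SMM.
        return ("read_phys", regs["rdx"], regs["rcx"])  # (address, length)
    return ("unsupported", regs.get("rax"))
```

The magic-value check is what keeps a periodically firing SMI handler from misinterpreting arbitrary register contents of ordinary software as dump requests.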


The hooked SMI handler finally dispatches the application's request. Among others, there are commands for reading and writing physical or virtual memory. Data can be exfiltrated via the RS-232 port. The SMRAM area should always be locked after the boot process. Thus, SML memory acquisition software has to be installed a priori; otherwise, a BIOS/UEFI vulnerability would have to be exploited. Eventually, the installation process is very hardware- and firmware-dependent. This requires SMM-based tools to be adapted to a specific system. For ARM, there is TrustDump [143], which acquires RAM from ARM's TrustZone. In Chapter 5 we introduce UEberForensIcs, which we also classify as an SML acquisition tool. UEberForensIcs is similar to a cold boot attack (see also Section 4.3.5.3). However, the acquisition software is part of the firmware. Furthermore, it terminates the running OS.

4.3.5 Device Level

All acquisition methods above the DL make use of the main CPU to obtain the target memory. For forensic investigations, it is advantageous to obtain memory without installing any software, simply by plugging in external hardware. Now, the address space A can be all physical RAM, and B is still completely unrelated to A because Q runs on external hardware. Note that A < B still applies.

4.3.5.1 DMA-based

In contrast to previous techniques, acquisition tools that use DMA to create memory dumps often do not require running on the target system, because DMA requests can be sent from an external device. Compared to conventional acquisition approaches, DMA does not interfere with the main CPU's operation, resulting in less system pollution. Particularly for production systems, this can be a significant advantage. In the following, we provide details about PCILeech [44, 46], a well-known representative of DMA-based memory acquisition tools. PCILeech is an open source project developed by Ulf Frisk, which uses DMA over PCI Express to read and write a target system's memory. Due to its hot-plug capability, PCI Express offers the possibility to deploy PCILeech during the target system's execution. Similarly, the tool can be unplugged without causing any interruptions. Therefore, we classify PCILeech as a non-terminating acquisition tool that can be installed even after an incident. Currently, PCILeech supports a variety of different hardware configurations that need to be flashed with dedicated firmware. Most configurations are either based on the USB3380 development board or on special Field Programmable Gate Arrays (FPGAs). The hardware is connected via USB 3 (alternatively USB 2) to an external controlling system running the PCILeech host software. The controlling software is implemented for Windows, Linux, and Android. Supported target systems are the x86-64 versions of Windows, macOS, FreeBSD, UEFI, and Linux. The installation simply requires flashing the PCILeech firmware to the PCI Express device. Depending on the specific PCI Express card used, some host systems need appropriate drivers. No drivers are required on the target side, however. Due to the different form factors of PCI Express (e.g., PCIe, mPCIe, ExpressCard, Thunderbolt), some system

configurations may require specific adapters to connect the PCILeech hardware to the target system.

Via native DMA, the USB3380 hardware can only acquire up to 4 GiB of memory at around 150 MiB/s. In contrast, the FPGA version comes with native 64-bit DMA support, allowing it to read entire 64-bit physical address spaces with up to 150 MiB/s. Apart from native DMA, PCILeech offers the possibility to inject kernel implants into the target system. Since kernel implants run in the target system's kernel context, they can fully access its memory and are not restricted to the first 4 GiB, independently of the PCILeech hardware. However, injecting a kernel implant implies slight modifications to the target system's memory, which reduces the integrity and, therefore, the forensic soundness of the acquisition process.

To insert a kernel implant into the target system, PCILeech uses a three-stage approach that works as follows: It starts by locating the very end of either the kernel or a driver within the target system's memory. Usually, there is some free space that is then used by PCILeech to map its stage 2 code (about 500 bytes). As these pages are already marked as executable, the kernel's page tables do not require any modifications. After writing the stage 2 code to these slack areas, PCILeech searches for a suitable kernel function to hijack the system's control flow by inserting an inline hook via DMA. To do so, it overwrites the function's first few bytes with a call into the stage 2 code. The next time a thread starts to execute the hooked function, control flow detours to the second stage of the insertion process. The first thread that enters stage 2 immediately removes the stage 1 hook to prevent multiple executions of the setup. Thus, all subsequent threads call the original version of the function and resume normal execution. The first thread, however, searches for kernel functions that allow allocating two additional pages. The first page is used to establish a DMA communication channel from the controlling system to the kernel implant. To propagate the command buffer's base address to the host system, the thread writes its physical address to a previously arranged location. That way, the host program can receive the buffer address via DMA by polling the known spot. On the second page, PCILeech writes a small stub that initializes stage 3. Thereupon, it proceeds to the third stage by launching a new kernel thread. Stage 3 starts by allocating a data buffer with a size between 4 and 16 MiB. It then enters a busy loop that awaits additional stage 3 code from the PCILeech program. Eventually, the thread enters a final loop that waits for further instructions from the host program.
Supported commands are basically reading or writing memory, executing additional code, or simply exiting. Besides acquiring memory, kernel implants also allow writing to the target's memory, accessing its file system, or executing arbitrary code like system shells. In addition, kernel implants enable an analyst to mount live RAM or pull files from the target system. The biggest downside of PCILeech is that it cannot access the target system's memory if an IOMMU is in use and properly configured. In this case, the target kernel can restrict DMA accesses and successfully prevent PCILeech from acquiring any memory. Another primary concern is that PCILeech lacks the ability to acquire dumps atomically, because the target system's execution is not interrupted during the acquisition phase. Therefore, the memory dump does not equal the target system's exact state when the acquisition initially started.

Inception [93] is yet another DMA-based acquisition and attack framework. It allows bypassing the Windows lock screen by searching for the corresponding code in RAM and short-circuiting the password prompt. It also exhibits the typical atomicity drawbacks of this class. Gruhn and Freiling [57] showed that Inception had the worst atomicity and integrity compared to other acquisition tools. That is primarily because of the initialization of


FireWire and the acquisition of the entire visible physical memory, even if it contains unused RAM. In this thesis, we introduce BMCLeech (see also Chapter 6), which is also DMA-based.

4.3.5.2 Low-Level Debugging Interfaces

Some systems provide low-level debugging interfaces such as JTAG (see also Section 2.2.9). Using JTAG, it is basically possible to arbitrarily read and write memory and registers. This class of memory acquisition techniques does not terminate the target. However, in some cases the interface needs to be enabled first. In Chapter 7 we introduce DCILeech, which leverages the Intel Direct Connect Interface (DCI) for forensic memory acquisition.

4.3.5.3 Cold Boot

Cold boot [60] is a special acquisition technique that exploits the remanence effect of DRAM to access memory. After a reset, the DRAM is not immediately erased; instead, its contents gradually fade away. The same holds after power is cut. As a result, data remains in memory for a short period of time and can be read out. This time can even be extended by cooling down the DRAM.

Basically, there are numerous possibilities to acquire memory using cold boot. First, it is possible to boot a tiny dumping software using PXE. Second, one could boot from a USB flash drive. Third, the memory acquisition routine can be included in the BIOS/UEFI. If one cannot change the BIOS settings, e.g., because of a BIOS password or similar, it is also possible to transfer the DRAM modules into another system. This often requires the use of coolant spray, however.

The typical scenario of a cold boot attack is to extract a full-disk encryption key. In this scenario, the encryption key has to reside somewhere in A, and the dumping routine Q must not overwrite it. After dumping the memory, the location of the encryption key has to be found. Unfortunately, cold boot is prone to bit errors. Software that identifies keys in RAM must therefore be able to quickly test key candidates using brute force. To recover, e.g., an AES key from a RAM image, the user level application aeskeyfind [62] can be used. The tool extracts the key if it finds the AES key schedule in the RAM dump.
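The core idea behind aeskeyfind can be sketched compactly: every AES-128 key determines its full 176-byte key schedule, so a 16-byte window whose following 160 bytes equal that window's expanded schedule is almost certainly a key. The following illustrative sketch implements an exact-match variant (the real tool additionally tolerates the bit errors typical of cold boot images):

```python
def _rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF

def _build_sbox():
    """Compute the AES S-box (multiplicative inverse in GF(2^8) followed by
    the affine transformation), avoiding a hard-coded 256-entry table."""
    sbox = [0] * 256
    p = q = 1
    while True:
        p = (p ^ (p << 1) ^ (0x1B if p & 0x80 else 0)) & 0xFF  # p *= 3
        q ^= (q << 1) & 0xFF                                   # q /= 3
        q ^= (q << 2) & 0xFF
        q ^= (q << 4) & 0xFF
        if q & 0x80:
            q ^= 0x09
        sbox[p] = q ^ _rotl8(q, 1) ^ _rotl8(q, 2) ^ _rotl8(q, 3) ^ _rotl8(q, 4) ^ 0x63
        if p == 1:
            break
    sbox[0] = 0x63
    return sbox

SBOX = _build_sbox()

def expand_key_128(key: bytes) -> bytes:
    """AES-128 key schedule: expand a 16-byte key to 176 bytes."""
    w = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:
            t = [SBOX[b] for b in t[1:] + t[:1]]  # RotWord followed by SubWord
            t[0] ^= rcon
            rcon = ((rcon << 1) ^ 0x1B) & 0xFF if rcon & 0x80 else rcon << 1
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return bytes(b for word in w for b in word)

def find_aes128_keys(image: bytes):
    """Slide a 16-byte window over the image and report offsets where the
    next 160 bytes equal the window's key schedule (aeskeyfind's idea,
    without its bit-error tolerance)."""
    return [off for off in range(len(image) - 175)
            if expand_key_128(image[off:off + 16])[16:] == image[off + 16:off + 176]]
```

Because a random 160-byte sequence matching a derived schedule is astronomically unlikely, any hit is a genuine round-key block left behind by an AES implementation that precomputed its schedule.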

Cold boot attacks allow creating atomic dumps of the entire physical memory. Since the target system is not running at the time of the acquisition, rootkits have no chance to use anti-forensic techniques to tamper with the acquired memory. However, cold boot also comes with limitations. Newer memory technologies like DDR3 and DDR4 have relatively short remanence times and scramble memory contents to avoid bit flips due to parasitic electromagnetic effects. Nevertheless, cold boot is feasible even on these systems [7, 160]. Furthermore, memory encryption technologies like Intel's SGX or even software-based encryption techniques like TRESOR [107] render cold boot impractical.

4 Universal Taxonomy and Survey of Forensic Memory Acquisition

4.4 Discussion

This chapter presented a taxonomy that categorizes the field of forensic memory acquisition independently of the operating system and the hardware architecture. The taxonomy's main basis is a partial order on acquisition techniques that generalizes the memory access privilege model of modern hardware architectures. The survey in this chapter is based on three dimensions: (1) the privilege level in the memory access hierarchy, (2) the point in time the acquisition technique is deployed (pre- or post-incident), and (3) the potential termination of the analyzed program by the acquisition technique. As a result, the survey revealed that the lower the execution layer of the acquisition method, the higher the quality of the acquired memory image. However, with increasing privilege level, acquisition techniques also become increasingly difficult to deploy. Overall, memory acquisition is still a hassle on PC systems, and even more so on embedded systems and smartphones, where usually no specific acquisition software can be deployed and physical access to RAM is cumbersome, if not impossible, without destroying the device. Our survey also showed developments towards hardware memory encryption that escapes traditional memory access hierarchies and makes any known form of memory acquisition infeasible. Nevertheless, a wide variety of techniques allow acquiring the memory of today's most important architectures. Furthermore, modern desktop and server systems come with onboard management chips, creating new possibilities for forensic purposes on the Device Level. Integrating forensic software into these layers could therefore allow acquiring memory in a much less intrusive way.

5 Bringing Forensic Readiness to Modern Computer Firmware

The UEFI was introduced as the successor of the now nearly 40-year-old PC-BIOS and is a "pre-installed" software, albeit not for forensic purposes. UEFI allows starting the OS in long mode, supports Secure Boot, and even custom EFI applications can be executed in the UEFI Shell. Most UEFI implementations come with a full network stack and sometimes even with a web browser. UEFI also specifies RTSs that can be called by the OS. These allow, for example, reading and setting UEFI variables or updating the firmware. There is a UEFI reference implementation called EDK II [150]. In this chapter, we exploit the capabilities of modern computer firmware and bring forensic readiness to the UEFI. For this, we introduce UEFI built-in memory forensics (abbreviated as UEberForensIcs), which integrates forensic memory acquisition software into the firmware that can be used during the boot process. The memory acquisition is based on the concept of cold boot described in Section 4.3.5.3. Furthermore, we show how to persist code in the UEFI RTSs and gain code execution that can also be used for forensic software. Additionally, we have built a tracer that traces calls of UEFI RTSs. Basically, UEberForensIcs is a kind of cold boot attack. The software has to be flashed to the SPI flash. Since this is a rather deep intervention into the system, this has to be done in advance. Hence, we classify UEberForensIcs as pre-incident, terminating, and running on the Synchronous Management Level (SML) (see Chapter 4). However, UEberForensIcs is actually not running on ring -2 (SMM) but in long mode after reset, before the operating system is running. We think that the SML still fits best because software on the Device Level (DL) is usually executed on another processor, and Hypervisor Level or Kernel Level would be misleading. The technique of memory acquisition using the UEFI RTSs can be classified as pre-incident, non-terminating, and is executed on the Kernel Level.
In this case, the software also has to be flashed before a potential incident. RTS calls do not terminate any program.

Contribution The main contributions of this chapter are as follows:

• We introduce UEberForensIcs and show how to integrate forensic software that enables cold boot-like memory acquisition directly into the firmware of a computer. The evaluation in this thesis reveals that this approach is also practically usable.

• We show how to persist code in the UEFI that is executed while the operating system is running. This code runs with kernel privileges and can also be used for memory acquisition.

• We develop an OS-independent RTS tracer. The RTSs are thereby traced within the RTSs themselves. Our evaluation gives insights into which and how often specific RTSs are typically called in different scenarios.


In Section 5.1 we give an overview of the architecture and setup. Then, in Section 5.2 we introduce UEberForensIcs. In Section 5.3 we show how to use the UEFI RTSs for forensics. Related work is discussed in Section 5.4. Finally, there is a discussion in Section 5.5.

5.1 Architecture and Setup

Developing and debugging firmware is an intricate affair and usually requires a special setup. We now describe the setup in which we developed our system and performed the experiments. Figure 5.1 shows a simplified graph of the architecture we use for our experiments. Filled boxes indicate that these modules are our developments. The target is running in a virtual machine with the QEMU hypervisor. This makes development easier because, in this case, we do not need to reprogram an SPI chip for every change. Furthermore, it simplifies debugging. The right side of the graph is dedicated to the built-in cold boot part (see also Section 5.2), while the left side is dedicated to the runtime forensics part (see also Section 5.3).

5.1.1 Hardware Setup


Figure 5.1: Simplified architecture of UEberForensIcs and the RTS tracer.

The entire research was conducted on a standard laptop with an Intel Core i5-5200U CPU (2.20GHz, 2 cores) and 8 GiB of RAM. It runs Ubuntu Linux 18.04.4 with kernel version 4.15.0-118. The installed QEMU version is 4.2.92.

5.1.2 VM Setup

Usually, virtual machine monitors come with their own specialized firmware implementations. In most cases, emulating the firmware of a physical machine is not intended. For virtual machines, there is a target for EDK II called OVMF. This port supports QEMU's virtual hardware. Listing 5.1 shows the command to start the corresponding QEMU VM. The VM is running Ubuntu Linux 20.04 and has 2 GiB of RAM.


qemu -bios edk2/Build/OvmfX64/RELEASE_GCC5/FV/OVMF.fd \
     -drive format=raw,file=ubuntu-linux.raw \
     -drive format=raw,file=fat:rw:vm-content \
     -global virtio-net-pci.romfile="" \
     -nic tap,model=virtio-net-pci \
     -m 2048M \
     -debugcon file:debug.log \
     -enable-kvm -cpu host \
     -cdrom ubuntu-20.04-desktop-amd64.iso

Listing 5.1: Start of the virtual machine using QEMU.

Table 5.1 shows the physical memory map of our virtual machine. In this case, memory is quite contiguous. Memory range #3 is by far the largest memory region. A real system's RAM is usually more fragmented. There is only a single memory hole, spanning from range #2 to the start of range #3.

Table 5.1: Memory map of the virtual machine we used for our experiments.

#   Start        End          Pages    Size       Purpose
1   0x00000000   0x0009ffff   160      640 kiB    System RAM
2   0x000a0000   0x000bffff   32       128 kiB    PCI Bus
3   0x00100000   0x7fffffff   524032   2047 MiB   System RAM
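The page counts and sizes in Table 5.1 follow directly from the inclusive address ranges. A quick consistency check (a Python sketch; 4 KiB pages assumed, as usual on x86):

```python
PAGE = 4096  # page size in bytes (4 KiB)

# Physical memory map of the VM from Table 5.1: (start, end, purpose).
memory_map = [
    (0x00000000, 0x0009ffff, "System RAM"),  # 160 pages, 640 kiB
    (0x000a0000, 0x000bffff, "PCI Bus"),     # 32 pages, 128 kiB
    (0x00100000, 0x7fffffff, "System RAM"),  # 524032 pages, 2047 MiB
]

def pages(start, end):
    """Number of pages in an inclusive physical address range."""
    return (end - start + 1) // PAGE

for start, end, purpose in memory_map:
    print(f"{start:#010x}-{end:#010x}: {pages(start, end):>6} pages ({purpose})")
```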

5.2 Built-in Cold Boot

In this section, we give insights into the implementation of UEberForensIcs. UEberForensIcs is a forensic cold boot-like acquisition software that is integrated into the firmware.

The use case of UEberForensIcs is that it is pre-installed in the firmware of a computer. While the OS is running, a potential incident happens, and an incident responder is alerted. The incident responder wants to analyze what happened on the system using memory analysis. For the acquisition, he or she restarts the computer into the EFI Shell and performs memory acquisition using UEberForensIcs. Thus, the analyst needs no special equipment or installed tools on the host. The dump is transferred via the network to the Forensic Workstation, where it can be analyzed.

5.2.1 Implementation

UEberForensIcs can be used as a standalone application or a dynamic command. For the evaluation, we used the latter variant. Basically, UEberForensIcs is implemented as a DXE driver.

We do not save the memory dump on the local drive because that would corrupt potential evidence on disk. Instead, we exfiltrate the data via the network (see also Figure 5.1). UEberForensIcs therefore requires an active network connection to the Forensic Workstation, for which we use EDK II's TCP stack. The IP address is obtained via DHCP. When the connection is established, UEberForensIcs traverses the memory and sends it page-wise to the Forensic Workstation.
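On the Forensic Workstation, a receiver only has to reassemble the page stream into a raw image. The following Python sketch assumes the simplest possible framing, namely raw 4 KiB pages sent back-to-back in physical address order; the actual wire format of UEberForensIcs is not specified here, so this framing is an assumption for illustration:

```python
import socket

PAGE_SIZE = 4096  # assumed page granularity of the transfer

def recv_exact(sock, n):
    """Read exactly n bytes; TCP may deliver a page in several chunks."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed mid-transfer")
        buf.extend(chunk)
    return bytes(buf)

def receive_dump(sock, out_path, num_pages):
    """Write num_pages incoming pages to a raw memory image file."""
    with open(out_path, "wb") as f:
        for _ in range(num_pages):
            f.write(recv_exact(sock, PAGE_SIZE))
```

A real receiver would additionally record the physical address of each page (memory holes like the PCI region in Table 5.1 make the stream non-contiguous) and hash the image for integrity.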


5.2.2 Evaluation

Gruhn and Freiling [57] provided a framework to evaluate memory acquisition tools in terms of correctness, atomicity, and integrity [156]. Cold boot-like attacks are basically performed atomically, i.e., the RAM module is removed, or there is a hard reset. So in this evaluation, we focus on correctness and integrity. For this purpose, we have generated the following four memory dumps, shown in Figure 5.2.


Figure 5.2: Timeline of the system with the four memory dumps.

The dumps are acquired in the following way:

Q1 The first dump is acquired using QEMU’s pmemsave feature while the OS is running. Before pmemsave is started, the system is paused. We consider this dump as the ground truth.

Q2 The second dump is also performed using QEMU’s pmemsave after the reset when the EFI Shell is started. To acquire memory atomically, the system is also paused. Furthermore, the OS is not running anymore. This means that processes that have run before do not alter memory anymore. However, EDK II also overwrites some smaller parts of memory.

UF The third dump is generated with our tool. Since the dump is performed on the running system itself, we cannot pause the system. However, we consider it to be atomic because the processes and all other threads of the OS are not running anymore. The only running activities belong to EDK II and are therefore negligible.

Q3 The last dump is acquired using QEMU’s pmemsave after the UEberForensIcs dump is completed.

The evaluation of correctness and integrity is based on the analysis of differing bytes and pages between dumps. Table 5.2 shows the results of pairwise dump comparisons. A visualization of the page-wise diffs can be found in Figure 5.3. A blue pixel indicates that the corresponding page is the same in both dumps. A red pixel indicates that the corresponding pages differ in at least one byte. There are 1024 rows with 512 pages per row, i.e., 2 MiB per row. Addresses grow from left to right and from bottom to top.
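The page-wise comparison underlying Table 5.2 and Figure 5.3 reduces to the following (a minimal Python sketch; the dumps are equally sized raw images):

```python
PAGE_SIZE = 4096

def diff_pages(dump_a, dump_b):
    """Indices of 4 KiB pages that differ in at least one byte."""
    assert len(dump_a) == len(dump_b), "dumps must have equal size"
    return [i // PAGE_SIZE
            for i in range(0, len(dump_a), PAGE_SIZE)
            if dump_a[i:i + PAGE_SIZE] != dump_b[i:i + PAGE_SIZE]]

def report(dump_a, dump_b):
    """Number of differing pages, their total size in bytes, and the
    proportion of total memory they make up."""
    d = diff_pages(dump_a, dump_b)
    total_pages = len(dump_a) // PAGE_SIZE
    return len(d), len(d) * PAGE_SIZE, len(d) / total_pages
```

The page indices returned by diff_pages can be plotted directly onto the 512-pages-per-row grid used in Figure 5.3.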

In the following two sections, we use these results to show to what extent UEberForensIcs affects memory and argue that UEberForensIcs works properly.


Table 5.2: The table shows the results of differing bytes of the dumps and the corresponding proportion of total memory that is changed.

#   Dump 1   Dump 2   Total Pages   Total Size   Proportion
1   Q1       Q2        8143         24.6 MiB     1.2 %
2   Q1       UF       10245         29.1 MiB     1.4 %
3   Q1       Q3       10260         32.7 MiB     1.6 %
4   Q2       UF        2634          4.9 MiB     0.2 %
5   Q2       Q3        2568          8.5 MiB     0.4 %
6   UF       Q3        2588          5.8 MiB     0.3 %

(a) diff (Q1, Q2): 24.6 MiB (b) diff (Q1, UF): 29.1 MiB (c) diff (Q1, Q3): 32.7 MiB

(d) diff (Q2, UF): 4.9 MiB (e) diff (Q2, Q3): 8.5 MiB (f) diff (UF, Q3): 5.8 MiB

Figure 5.3: Visualization of the page-wise diffs.


5.2.2.1 Correctness

First, we want to show that UEberForensIcs works properly. To show this, we compare the ground truth (Q1) with the UEberForensIcs dump (UF), i.e., diff (Q1, UF), and with the dump Q3, i.e., diff (Q1, Q3). It is striking that the total numbers of differing pages are of the same order of magnitude (see also Table 5.2). Furthermore, the visualizations of the corresponding diffs in Figure 5.3b and Figure 5.3c show that the diffs are very similar. The ranges of differing memory regions are basically the same for all dumps. Note that the dump UF is made sequentially and transferred via the network, while the dump Q3 is acquired atomically after the execution of UEberForensIcs. So, these diffs are not completely the same.

The diff (Q2, UF) shows that the QEMU pmemsave dump and the UEberForensIcs dump differ in about 5 MiB. The corresponding visualization in Figure 5.3d also reveals that the differing memory regions are basically the same. For diff (UF, Q3), the diff is around 5.8 MiB. However, as Figure 5.3f shows, the differing pages are in the upper memory regions, as before. The comparisons of the diffs showed that the dump of UEberForensIcs looks reasonable. Basically, the only differing pages are located in the upper memory regions, which we can also observe with QEMU's built-in pmemsave.

5.2.2.2 Integrity

Memory acquisition using UEberForensIcs is performed on the target system. This means that we do change memory. In this section, we show how much memory is changed and which parts of the memory are changed by UEberForensIcs. Figure 5.3 gives a good impression of what and how much memory is changed when using UEberForensIcs. All diffs with the dump taken while the OS was running (Q1) show that some memory is overwritten in the lower memory regions (basically starting at 0x1000000) when the computer is reset. This region has a size of about 7 MiB. When the system is restarted, there is no further change in this memory region (see also Figure 5.3d, Figure 5.3e and Figure 5.3f). Furthermore, Figure 5.3 also shows that the reboot of the system has the largest impact. The execution of UEberForensIcs also has an impact (see also Figure 5.3d, Figure 5.3e and Figure 5.3f). However, most of these memory regions are changed because of the reboot anyway. Overall, we can say that the execution of UEberForensIcs changes about 32 MiB of the whole memory. Most of this is overwritten by the reboot that loads the firmware. The majority of the firmware in our environment was located in the upper memory regions. In the middle of the memory, there was not a single differing byte at all.

5.2.3 Discussion

Writing software for UEFI is much easier than for the former PC-BIOS. EDK II code is written in C, and there are many features, like a full network stack, that facilitate the development of custom software.


The evaluation showed that UEberForensIcs works properly. However, the whole acquisition process using UEberForensIcs changes about 30 MiB of RAM. This may differ from setup to setup, depending on the firmware's footprint. In our setup, the memory ranges that were changed are located at the upper and lower borders of the RAM, and so we argue that memory acquisition using UEberForensIcs is practical since we can acquire most memory atomically and do not rely on software on the host that may be manipulated by malware. However, there are also countermeasures against cold boot attacks: resetting RAM on reboot [58], memory scrambling [7], or locking the firmware so that an adversary cannot boot from their own device are three examples. In our scenario, we control the firmware, and so we can ensure that such countermeasures are not implemented or are only effective during normal reboots and not when an analyst is present. The presence of an analyst could be indicated by a special hardware device.

5.3 Runtime Service Forensics

In the previous section, we have seen how our UEFI driver can perform cold boot attacks. Now, we want to provide first steps towards forensic memory acquisition using UEFI drivers at runtime. Thus, incident response teams could extract memory without rebooting or installing any memory acquisition software on the target that would change evidence. In the following sections, we describe how to persist forensic tools in the UEFI RTSs and gain code execution. For this, we provide a proof-of-concept tool that traces all RTS calls. Basically, this technique can also be used to perform memory acquisition in the RTSs.

5.3.1 Implementation

Like the former tool, the runtime tool is implemented as a DXE driver. However, this time the driver needs to continue execution even after ExitBootServices() is called, a requirement that common drivers do not fulfill. Therefore, the driver needs to be of the type DXE_RUNTIME_DRIVER. Runtime drivers start in the DXE phase and continue executing after the boot process is finished, when the OS is running. Also, they have access to both RTSs and boot services. While boot services stop working after the OS loader calls ExitBootServices(), RTSs persist after the DXE phase. Their pointers are converted from physical addresses to virtual ones when SetVirtualAddressMap() is called. In order to execute code after the DXE phase, the driver needs to be called by another instance, such as the OS. The OS in our research is an instance we do not control. Hence, we decided to take RTSs that are called by the OS as a trigger for code execution. To activate the code execution, we implemented hooks for all RTSs and set them in the DXE phase when our driver is initialized. For this, the driver stores the original service pointer and replaces its address table entry with a pointer to our hook. Furthermore, it registers a notifier to react when the OS calls SetVirtualAddressMap() and converts all pointers. The hooks allow us to execute arbitrary code at runtime, which we developed further into an open-source RTS tracer. With the tracer, we can follow the called services and view their arguments to analyze the behavior of the UEFI thoroughly. The tracer outputs its information in JSON format.
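The hooking pattern itself is independent of UEFI and can be sketched abstractly. In the Python sketch below, a dictionary stands in for the RTS address table; the names and values are illustrative, not EDK II API. The original pointer is saved, and the table entry is replaced by a wrapper that logs the call before forwarding it:

```python
import json

# Stand-in for the UEFI runtime service address table (illustrative).
runtime_services = {
    "GetTime": lambda: {"Year": 2020, "Month": 9, "Day": 22},
}

original = {}   # saved original "pointers"
trace_log = []  # JSON lines emitted by the tracer

def install_hook(name):
    """Save the original table entry and replace it with a tracing wrapper."""
    original[name] = runtime_services[name]
    def hook(*args, **kwargs):
        result = original[name](*args, **kwargs)  # forward to the original
        trace_log.append(json.dumps(
            {"service": name, "type": "OUT", "data": result}))
        return result
    runtime_services[name] = hook

install_hook("GetTime")
runtime_services["GetTime"]()  # the "OS" calls the service; the hook logs it
```

In the actual driver, this replacement happens on the C function pointer table in the DXE phase, and the saved pointers must additionally be converted when SetVirtualAddressMap() switches the table to virtual addresses.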


[RTSTracer]{
  'service': 'GetTime',
  'id': 0,
  'type': 'OUT',
  'argument': 'Time',
  'data': {
    'Year': 2020, 'Month': 9, 'Day': 22,
    'Hour': 16, 'Minute': 12, 'Second': 49,
    'Pad1': 0, 'Nanoseconds': 0,
    'TimeZone': 2047, 'Daylight': 0, 'Pad2': 0
  }
}
[RTSTracer]{
  'service': 'GetTime',
  'id': 0,
  'type': 'OUT',
  'argument': 'Capabilities',
  'data': {
    'Resolution': 0, 'Accuracy': 0, 'SetsToZero': 0
  }
}

Listing 5.2: The log shows an example result provided by the RTS tracer.

An example call can be seen in Listing 5.2. Every JSON object contains one argument of the called service and its data. The JSON output is limited to a maximum of 255 characters, which is why not all arguments fit into one object. Additionally, every argument is either of type INput or OUTput and is accordingly listed before or after the original call. The example shows two output arguments of the service GetTime.
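Since every argument is logged as a separate JSON object, counting calls requires grouping objects that belong to the same invocation. A minimal version of this counting step might look like the following sketch (the regular expression matches the log format of Listing 5.2; the use of the (service, id) pair to deduplicate multi-object calls is an assumption about the id field's semantics):

```python
import re
from collections import Counter

# One match per logged JSON object: capture the service name and call id.
CALL_RE = re.compile(
    r"\[RTSTracer\]\{\s*'service'\s*:\s*'(\w+)'\s*,\s*'id'\s*:\s*(\d+)")

def count_rts_calls(log_text):
    """Count calls per runtime service. Each argument is its own JSON
    object, so unique (service, id) pairs are counted, not raw objects."""
    calls = {(m.group(1), int(m.group(2)))
             for m in CALL_RE.finditer(log_text)}
    return Counter(service for service, _ in calls)
```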

5.3.2 Evaluation

The evaluation of the RTS section is split into two parts. First, we show that we have arbitrary code execution at any time at runtime. Second, we evaluate the data created by the RTS tracer and compare various scenarios. As mentioned before, we make use of runtime service hooks to execute code in UEFI. UEFI code execution at runtime could be used by incident response teams to extract memory without rebooting the OS. A requirement for this is that code can be executed at any time. However, our trigger depends on RTSs being called, which is not often the case after

the user logged in. Nevertheless, our tests show that the OS calls the RTS GetVariable whenever the user reads the efivars (/sys/firmware/efi/efivars/). To prove this, we successfully modified our hook to force System_Reset as soon as we read the efivars. Thus, we fulfill the requirement to execute code at any time at runtime, which finishes the first part of the evaluation.

For the second part, we evaluate the RTS tracer. For this, we recorded the RTS calls on our Ubuntu VM in six different scenarios:

• Boot - We started the VM but did not log in.
• Login - We started the VM and logged in as the user.
• Working - We started the VM, logged in as the user, and performed standard working tasks for 15 minutes. These tasks were reading, writing, and configuring OS settings.
• Hour - We started the VM, logged in as the user, and let it run for one hour. The power save mode caused a lock screen, which we unlocked in the end.
• Switch - We started the VM and logged in as the user. Afterward, we switched the user.
• Reboot - We started the VM, logged in as the user, and rebooted the machine. Then we logged in again.

For every scenario, our RTS tracer collected data. We wrote a parser in Python, which is also included in the RTS tracer, to analyze and interpret the information. Table 5.3 shows the summarized results. It shows how often which runtime service was called in each scenario.

Table 5.3: The table shows the number of RTS calls in various scenarios.

Runtime Service       Boot   Login   Working   Hour   Switch   Reboot
GetTime                 46      46        46     46       46       92
GetVariable            754     786       786    786      850     1617
SetVariable            110     110       110    110      110      165
GetNextVariableName    499     499       499    499      499     1067
ConvertPointer          91      91        91     91       91      182
Total                 1500    1532      1532   1532     1596     3123

The table lists only five of the 14 runtime services that are available according to Section 5 of the EDK II UEFI Driver Writer's Guide [151]. Even though the guide lists more services, we only observed these five in all scenarios. When we look at the different scenarios, we can see that Login, Working, and Hour have the same number of service calls. Going into more detail, we can see that the arguments of the calls are the same every time. This means that the RTSs used during the startup routine remain the same. Moreover, even without studying the EDK II source, we can conclude from the identical call numbers in these three scenarios that during standard OS usage, no RTS is used after the login. The login and logout processes, on the other hand, do make use of RTSs. This is shown by the difference in counted calls between the scenarios Boot, Login, and Switch. In the results, we see that 32 additional calls are registered on login and 32 more on logout. All of them are GetVariable calls that request either the OsIndicationsSupported or the


OsIndications variable. Both variables tell the OS which UEFI firmware features are supported and activated.

The last scenario, Reboot, differs from the previous ones in that considerably more calls are counted. For GetTime and ConvertPointer, the numbers are twice as large compared to the Login scenario. This makes sense, as we boot the system two times. On the other hand, we count 45 more GetVariable calls, 69 more GetNextVariableName calls, and 55 fewer SetVariable calls in the second boot process. The reason for this is that the second boot process does not register every variable again but uses already initialized variables from the first boot process. The variable OsIndications, for instance, is only set in the first boot process. Afterwards, it is requested 45 times during the first boot and 46 times during the second boot. As shown above, the RTS tracer works well and gives clear insights into the usage of RTSs.

5.3.3 Discussion

In this section, we showed how to gain code execution from the UEFI during OS runtime. As a proof-of-concept, we implemented an RTS tracer. The corresponding code is not resident on the hard drive but on the SPI flash chip and is copied to RAM by the system's firmware. However, in contrast to SMM-based approaches [111], RTSs are not executed on a higher privilege level but on the same level as the OS. Developing code for the RTSs is much easier than developing the SMM's 16-bit real mode code. Basically, it is also possible to perform memory acquisition from the RTSs. However, the exfiltration of memory is more difficult than in UEberForensIcs. The OS manages the network stack and has configured the network interface card. Another possibility is to use persistent storage. However, similar to the network interface, the OS manages the hard drives. So it is not easy to use the RTSs for memory exfiltration without adapting the OS kernel or drivers.

5.4 Related Work

Besides cold boot (see Section 4.3.5.3), there is also related work regarding the UEFI. We are aware of a Master's thesis [96] whose authors made use of a signed UEFI application to dump physical memory to a USB flash drive. They focused on building a static chain of trust whereby the trust anchor is the firmware. Furthermore, there is a blog post by Frisk, who used the UEFI RTSs to circumvent the 4 GiB DMA limitation of PCILeech [45]. Usually, the Linux kernel is mapped into the upper physical memory, so PCILeech cannot inject code via 32-bit DMA. To get around this limitation, he manipulated the UEFI RTS function pointer table, which is located in lower memory regions, to inject his code.

5.5 Discussion

In this chapter, we introduced UEberForensIcs, which brings forensic memory acquisition to modern computer firmware. With UEberForensIcs, an analyst is able to perform simple cold boot attacks without any craftsmanship. Additionally, the dump can be considered to

be atomic. The only precondition is that the UEFI is forensic-ready, i.e., UEberForensIcs must be part of the UEFI before the need for memory acquisition arises. Our evaluation showed that only small, distinct parts of the memory get overwritten by the firmware. For the development and evaluation, we used a QEMU VM. However, future work should also consider compatibility and other possible memory layouts on actual physical systems. Furthermore, we demonstrated how to gain code execution with kernel privileges without injecting code into the kernel and without any persistent file on the hard drive. To this end, we hook the UEFI RTSs. As a proof-of-concept, we developed an RTS tracer that traces all occurring RTS calls of the OS kernel. The discussion in Section 5.3.3 yielded that integrating memory acquisition software into the RTSs can be beneficial. However, it is hard to exfiltrate data from there.


6 Stealthy Memory Forensics from the BMC

Our survey (see Chapter 4) revealed that the lower the privilege ring, the more powerful the acquisition software is, since it is usually possible for lower layers to access memory of higher layers but not vice versa. The ideal acquisition method resides at the Device Level (ring -3), is available on all systems without deploying software on demand, and does not terminate the execution of the software running at higher levels. Pre-incident deployment ensures that no unnecessary interaction with the target system has to take place. Even DMA-based memory acquisition techniques can lead to pop-up windows indicating that a new device is connected [57]. The target system should keep running after the memory acquisition, e.g., to allow an investigator to do some live analysis. Several technologies have been introduced to leverage the privileges of the Device Level. One such prominent technology for server systems is the Baseboard Management Controller (BMC), a co-processor with its own firmware that allows an administrator to monitor and administer a server remotely, e.g., via protocols like IPMI [65]. Prominent examples of BMCs are Dell's iDRAC [30] or Hewlett Packard's iLO [63]. However, we are not aware of any such system that implements generic techniques for forensic memory acquisition. In this chapter, we introduce BMCLeech, the first pre-installable memory acquisition software for BMCs. BMCLeech runs on BMCs and exploits the fact that BMCs are usually attached to the PCIe bus, which allows DMA to host memory. DMA through the BMC is arguably rather stealthy because the BMC is a standard device in many systems, and the host, therefore, cannot distinguish between "good" BMC activities (like server administration) and "bad" ones (taking a memory snapshot).
Given that the BMC commonly comes with the server motherboard, we consider BMCLeech equivalent to a pre-installed acquisition software that can be used by incident response teams, similar to the Rekall Agent Server [19].

Contribution Our implementation of BMCLeech runs on an ASPEED AST2500 BMC [5]. This kind of BMC is widely deployed and often used by servers that belong to the Open Compute Project (OCP) [114]. Furthermore, there are servers (like Facebook Tioga Pass [163]) that already support OpenBMC [87]. Our BMCLeech implementation runs on OpenBMC, utilizing a customized kernel driver that performs the DMA access to the host memory. Basically, the host operating system cannot detect whether the BMC is retrieving host memory or not. While the host system can observe that there is a BMC connected to the system, this is nothing suspicious because a server usually ships with a BMC. In this sense, BMCLeech is more stealthy than specialized forensic devices [12]. To summarize, the contributions of this chapter are twofold:

• We introduce BMCLeech, the first software that brings forensic readiness onto the BMC and whose memory acquisition cannot be detected by the target host system. BMCLeech is implemented as a PCILeech device and thus compatible with well-known memory forensic software, so there is no additional effort needed to analyze a system's memory. Even the acquisition software of an analyst does not need to be replaced.

• We provide an evaluation that demonstrates the feasibility and practicality of BMCLeech.

First, in Section 6.1 we provide some insights into the implementation of BMCLeech. In Section 6.2 we evaluate the tool. Related work is considered in Section 6.3. Finally, there is a discussion in Section 6.4.

6.1 Implementation

In the following section, we provide insights into the implementation and architecture of BMCLeech. First, an overview of the architecture is given. Then, the components that were implemented are briefly introduced, and their functionality is explained.

6.1.1 Architecture


Figure 6.1: The architecture of BMCLeech.

Figure 6.1 shows the overall architecture with all relevant components during a forensic analysis. On the right side, one can see the Forensic Workstation. When an analyst wants to acquire memory from the target system, he or she can retrieve the corresponding snapshot of the memory by using PCILeech. Given the IP address, the port, and the desired memory range, PCILeech connects to the given BMCLeech device. BMCLeech is implemented as a PCILeech device. Our BMCLeech implementation currently works on an ASPEED AST2500 BMC and OpenBMC. Basically, it is also possible to use another BMC; then only the connection to the driver has to be adapted. We implemented the libaspeedxdma library to abstract access to the kernel driver (aspeed-xdma), so other applications on the BMC can access the host memory more easily. For testing purposes, for example, we implemented the getmem application that writes the content of the desired physical address of host memory to stdout. In the end, the kernel driver accesses the memory via DMA. Eventually, BMCLeech sends the retrieved memory to PCILeech on the Forensic Workstation, where the actual memory dump is stored. The memory can be analyzed using common memory forensic analysis tools like Rekall [146] or Volatility [41].

6.1.2 BMCLeech

BMCLeech implements a PCILeech rawtcp device. When the BMCLeech daemon is started on the BMC, it expects as argument the port on which it waits for commands from PCILeech on the Forensic Workstation. After a connection is established, PCILeech sends a request to BMCLeech. The requests are sent as a rawtcp_msg that is structured as shown in Listing 6.1.

    enum rawtcp_cmd {
        STATUS,    /* is device ready? */
        MEM_READ,  /* read from memory */
        MEM_WRITE  /* write to memory */
    };

    struct rawtcp_msg {
        enum rawtcp_cmd cmd;
        uint64_t addr;  /* the address */
        uint64_t cb;    /* the length */
    };

Listing 6.1: The definitions of the rawtcp_cmd enum and rawtcp_msg struct.
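The request serialization can be illustrated with a short Python sketch. Note that the exact wire layout (endianness, enum width, padding) is an assumption made here for illustration; the actual framing is determined by the C structures in Listing 6.1 and the compiler's layout rules.

```python
import struct

# Command values mirroring the rawtcp_cmd enum.
STATUS, MEM_READ, MEM_WRITE = 0, 1, 2

# Assumed layout: 32-bit little-endian enum followed by two
# unsigned 64-bit fields (addr, cb), packed without padding.
RAWTCP_MSG = struct.Struct("<IQQ")

def pack_request(cmd: int, addr: int, cb: int) -> bytes:
    """Serialize a rawtcp_msg for transmission over TCP."""
    return RAWTCP_MSG.pack(cmd, addr, cb)

def unpack_request(data: bytes):
    """Parse a rawtcp_msg received from the socket."""
    return RAWTCP_MSG.unpack(data)

# Example: request a 4 KiB read at physical address 0x100000.
msg = pack_request(MEM_READ, 0x100000, 0x1000)
assert unpack_request(msg) == (MEM_READ, 0x100000, 0x1000)
```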

If the status is requested, BMCLeech always answers that it is ready, because everything is initialized before it starts listening on the port. If PCILeech sends a read request, BMCLeech utilizes the underlying library to read from host memory; libaspeedxdma in turn utilizes the kernel driver to perform the actual DMA operation. Then, the result is sent to the PCILeech client on the Forensic Workstation. If PCILeech wants to write to the host memory, BMCLeech waits to receive the corresponding payload. Again, the payload is written to the host memory using the DMA kernel driver.

It is also possible to use libaspeedxdma without PCILeech. This allows performing analysis or data and code injection directly on the BMC. Since data is not sent back to a Forensic Workstation, this approach will probably be much faster. In principle, running PCILeech directly on the BMC alongside BMCLeech is also possible, but storage on the BMC is limited.

6.1.3 Kernel Driver

Our implementation of BMCLeech currently targets ASPEED AST2500 BMCs running OpenBMC. Nevertheless, BMCLeech can easily be adapted to other DMA engines and Linux distributions. We use Eddie James’ ASPEED AST2500 XDMA kernel module implementation [72] with small modifications. As of this writing, James’ patches have not yet been merged into the upstream kernel. The driver is used to initialize the XDMA hardware and to perform the actual DMA read and write operations. The driver provides a device in /dev/aspeed-xdma that is used by libaspeedxdma to communicate with the kernel module.
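The role of the getmem helper described earlier can be sketched roughly as follows. This is a hypothetical analog: the real tool goes through libaspeedxdma and /dev/aspeed-xdma, and how the driver maps read offsets to physical host addresses is specific to the kernel module; here a regular file merely stands in for the device node.

```python
import tempfile, os

def getmem(dev_path: str, phys_addr: int, length: int) -> bytes:
    """Read `length` bytes at offset `phys_addr` from a memory device.
    With the real aspeed-xdma driver, such reads are backed by DMA."""
    with open(dev_path, "rb") as dev:
        dev.seek(phys_addr)
        return dev.read(length)

# Demonstration with a regular file standing in for the device node.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 0x1000 + b"host memory")
    path = f.name
print(getmem(path, 0x1000, 11))  # b'host memory'
os.unlink(path)
```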

81 6 Stealthy Memory Forensics from the BMC

6.2 Evaluation

As described in Section 2.3, Gruhn and Freiling provide a framework to evaluate memory acquisition tools in terms of correctness, atomicity, and integrity [57]. Since DMA-based memory acquisition showed a startlingly low level of atomicity in their evaluation (compared, for example, to software-based approaches) and BMCLeech is based on DMA, we did not reevaluate atomicity. However, since BMCLeech does not change memory during the acquisition at all, it fully preserves integrity according to the textual definition of integrity by Vömel and Freiling [156]. In our evaluation we therefore focus on the correctness property [156], demonstrating (1) BMCLeech’s feasibility and practicality and (2) its compatibility with PCILeech. The evaluation also exhibits how memory changes over time without load and how much may happen in RAM during memory acquisition.

6.2.1 Methodology

For the evaluation, we compare the memory acquired using BMCLeech with that of a software-based technique that also acquires the physical memory and does not halt the system. Since our target is a Linux system, we consider LiME [144] a proper tool to serve as a kind of ground truth. Since LiME operates from the kernel level and does not halt the system, its atomicity and integrity are limited; they can probably be compared with the Windows kernel-level software acquisition tools that all behave similarly in Gruhn and Freiling’s evaluation [57]. We started the acquisition process with BMCLeech and LiME simultaneously, five times. Figure 6.2 shows the timeline of memory acquisition. Using BMCLeech takes significantly longer than LiME.

[Figure: timeline showing the BMCLeech acquisition starts b0 to b4 and the LiME acquisition starts l0 to l4, spaced Δt apart.]

Figure 6.2: The timeline of memory acquisition processes with the corresponding starts of the acquisition processes of BMCLeech b0 to b4 and of LiME l0 to l4.

For the analysis, we calculate and visualize the differences (diffs) of several snapshots. We calculate the diffs per 4 KiB page and per byte. For practical reasons, only the page-wise diffs are visualized. The Δt between the times when the snapshots are taken is fixed. This is the time it took to acquire the memory using BMCLeech, rounded up to the next full minute. The following diffs are calculated:

1. The diffs between sequentially following BMCLeech dumps: diff(b0, b1), diff(b1, b2), diff(b2, b3), diff(b3, b4).

2. The diffs between sequentially following LiME dumps: diff(l0, l1), diff(l1, l2), diff(l2, l3), diff(l3, l4).

3. The diffs between the dumps acquired by BMCLeech and LiME simultaneously: diff(b0, l0), diff(b1, l1), diff(b2, l2), diff(b3, l3), diff(b4, l4).


4. The diffs between the dumps acquired by BMCLeech and LiME shifted by one: diff(b0, l1), diff(b1, l2), diff(b2, l3), diff(b3, l4). Since the acquisition of the BMCLeech dumps takes significantly longer, we also compare a BMCLeech dump with the following LiME dump, because the time when bx finishes is nearer to the start of lx+1 than to that of lx.
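The page-wise and byte-wise diffs described above can be computed with a few lines of Python. This is a simplified sketch assuming both dump files cover identical address ranges:

```python
PAGE = 4096  # 4 KiB pages

def diff_dumps(a: bytes, b: bytes):
    """Return (differing_pages, differing_bytes) between two equally
    sized memory dumps, compared per 4 KiB page and per byte."""
    assert len(a) == len(b)
    pages = nbytes = 0
    for off in range(0, len(a), PAGE):
        pa, pb = a[off:off + PAGE], b[off:off + PAGE]
        if pa != pb:
            pages += 1
            nbytes += sum(x != y for x, y in zip(pa, pb))
    return pages, nbytes

# Example: two 3-page dumps where one byte of the middle page differs.
d1 = bytearray(3 * PAGE)
d2 = bytearray(d1)
d2[PAGE + 7] = 0xFF
print(diff_dumps(bytes(d1), bytes(d2)))  # (1, 1)
```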

6.2.2 Hardware Setup

Our implementation of BMCLeech runs on OpenBMC and the ASPEED AST2500 BMC, which is common in the OCP market. However, it is quite hard to find such a system that supports OpenBMC and is not costly. Thus, for the evaluation, we were kindly granted remote access to a Facebook Tioga Pass [163] server hosted by the OCP Solution Provider Circle B (https://circleb.eu/). The server comes with an Intel Xeon Gold 6130 CPU at 2.10 GHz (16 cores) and 32 GiB of RAM running Ubuntu Linux 18.04.2 LTS with kernel 4.18.0-16.

During the implementation and evaluation on this system, we found that we could not access memory above 4 GiB. Since the AST2500 XDMA engine supports 64-bit addressing of the host memory, we assume this is due to an internal PCIe connection. For this reason, we restricted the system to use only the lowest four gigabytes of its memory via the kernel command line. As a result, the effective memory size is about 1.6 GiB, i.e., 1731760128 bytes and 422793 pages (4 KiB). These are divided into four memory ranges (see Table 6.1) that map to physical memory.

Table 6.1: The memory ranges in our evaluation environment.

    Start        End          Size
    0x1000       0x9ffff      636 KiB
    0x100000     0x66cf9fff   1683432 KiB
    0x68dfa000   0x68ff7fff   2040 KiB
    0x6f30e000   0x6f7fffff   5064 KiB
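The figures given above are consistent, as a quick computation over the ranges in Table 6.1 shows (end addresses are inclusive):

```python
# (start, end) pairs from Table 6.1; end addresses are inclusive.
ranges = [
    (0x1000,     0x9ffff),
    (0x100000,   0x66cf9fff),
    (0x68dfa000, 0x68ff7fff),
    (0x6f30e000, 0x6f7fffff),
]

total = sum(end + 1 - start for start, end in ranges)
print(total)           # 1731760128 bytes, about 1.6 GiB
print(total // 4096)   # 422793 pages of 4 KiB
```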

Using BMCLeech, the acquisition process over the Internet takes about six minutes, which corresponds to about 5 MiB/s. Local tests on the BMC revealed that the BMC is able to read the memory at about 50 MiB/s. However, we were not able to exfiltrate the data at that speed over the Ethernet port. We were connected via the 100 Mbit Network Controller Sideband Interface (NC-SI) port, so this port, i.e., the interconnection to it, is probably the bottleneck in this case. In principle, it should be possible to get considerably better transfer rates using a 1 Gbit port; in our case, however, OpenBMC did not work with the dedicated 1 Gbit port. LiME acquires the memory in about 17 seconds. To get more comparable results, there was no major interaction or load on the target system during the acquisition processes.
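These transfer rates are consistent with the measured durations, as a back-of-the-envelope check over the effective memory size from above shows:

```python
MiB = 1024 * 1024
total = 1731760128  # effective memory size in bytes (see Table 6.1)

# BMCLeech: about six minutes over the 100 Mbit NC-SI port.
print(round(total / (6 * 60) / MiB, 1))  # ~4.6 MiB/s, i.e., about 5 MiB/s

# LiME: about 17 seconds on the local system.
print(round(total / 17 / MiB))           # ~97 MiB/s
```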

6.2.3 Correctness

Vömel and Freiling [156] define a snapshot to be correct if the actual memory values are acquired at the time the snapshot is taken (see Section 2.3). Basically, correctness can be


taken as granted [57]. Since we introduce a new acquisition tool, though, we think we have to show that it works properly.

DMA memory acquisition works asynchronously, and we perform a black-box evaluation on a native system. It is therefore not possible to determine what the actual content of the memory was when it was read. However, we can compare the BMCLeech snapshots with the LiME snapshots and check whether the diffs are reasonable. Additionally, we test several PCILeech payloads to demonstrate BMCLeech’s compatibility.

6.2.3.1 Quantitative Analysis

Figure 6.3 shows the results of the diffs that are described in Section 6.2.1. The byte-wise and page-wise diff counts are printed above each visualized page-wise diff. Note that the visualized diffs show the four memory ranges of Table 6.1 as one contiguous memory region. Blue pixels in the visualized diffs indicate that the pages have the same content, while red pixels indicate that the corresponding pages differ. To visualize the memory in a reasonable aspect ratio, we introduced some padding pages at the end of the memory dump in grey (which are hardly visible). The memory address grows from the bottom left to the top right.

First, only looking at the relative differences, one can see that the page-wise diffs are overall bigger than the byte-wise diffs. A single differing byte results in the whole corresponding page being counted as different. Our analysis revealed that, on average, about 1200 bytes differ per differing page. However, the average value is not very meaningful in this case. In Figure 6.4 one can see the distribution of differing bytes per differing page. The horizontal axis shows the number of differing bytes, while the vertical axis shows the number of pages where x bytes differ. One insight is that many pages differ only by a small number of bytes.

Basically, all diffs in Figure 6.3 appear to be quite similar. This can already be regarded as a first indicator that BMCLeech works as intended. Furthermore, the most interesting rows, i.e., rows two and three — the rows where the BMCLeech snapshots are compared with the LiME snapshots — do not show any significant differences between the snapshots. The byte-wise diffs reach a maximum of about 2.5 %. Because the acquisition with BMCLeech takes significantly longer than with LiME, these values seem to be sound. Besides that, the diffs between the LiME snapshots (in the last row) appear to be similar to the diffs between BMCLeech and LiME. As expected, the diffs of the snapshots that are started simultaneously (third row) have more differences in the higher addresses than in the lower addresses, while the diffs of the shifted snapshots (fourth row) have more differences in the lower addresses. This is because both tools — LiME and BMCLeech — acquire memory from the lowest to the highest address. The time when a high address of bx is acquired is nearer to the time when this address is acquired by lx+1 than by lx (see also Figure 6.2). Comparing the diffs of the BMCLeech snapshots, one can see that these come with the biggest differences overall. This is also an expected result since the time between the first byte of bx and the last byte of bx+1 is the longest in our evaluation. However, the extent of the differences of these diffs is comparable.

As a result, the differences between the diffs can be regarded as the distinctive volatility of the system’s memory within Δt. There is no evidence in the diffs that militates against the correctness of BMCLeech.


[Figure: four rows of visualized page-wise diffs of the memory snapshots, each annotated with its page-wise and byte-wise diff counts; page-wise differences range from about 4.25 % to 8.90 % of the pages, byte-wise differences from about 0.93 % to 2.57 % of the bytes.]

Figure 6.3: Visualization of the diffs of the memory snapshots taken by BMCLeech and LiME. Addresses are growing from the lower left to the upper right.


[Figure: histogram of the number of differing bytes per page (horizontal axis, 0 to about 4096) against the number of pages, together with the accumulated number of differing pages.]

Figure 6.4: Distribution of the different bytes per page.

What is also striking in Figure 6.3 is that some memory regions seem to be more volatile than others. In particular, there is a conspicuous red bar in the upper area of every diff. There are also other areas that appear to be more red or blue, e.g., in the upper area of the diffs there is a bigger blue area without any red pixels. We want to mention these observations but do not analyze them further.

6.2.3.2 PCILeech Payloads

The comparison of BMCLeech with LiME showed that the snapshots are very similar. As BMCLeech acts as a device for PCILeech, one can not only acquire memory but also write to it. PCILeech already comes with a bunch of payloads. We are aware that writing to host memory can basically not be considered forensically sound. Nevertheless, we also evaluate the writing capabilities. In total, we used the following PCILeech features for the evaluation:

• Memory snapshot

The first feature is the basic memory acquisition feature dump. As a parameter, dump expects the corresponding memory range. All our snapshots are made with this feature. Furthermore, we opened a new text file in Vim and wrote some random content into it. The corresponding content could be found in the memory dump using strings.

• Kernel module injection

PCILeech also supports writing to memory. This allows injecting a kernel module into the host operating system via kmdload. PCILeech first searches for the Linux kernel base and is then able to inject the given kernel module. We injected a PCILeech kernel module that makes it easier to perform further analysis. Finally, the address of the kernel module is communicated to the analyst. Note that this kernel module


can also be used to acquire memory that cannot be accessed by DMA, e.g., in our case when only the lower four gigabytes can be accessed.

• File retrieval This payload relies on the kernel module injection above. The injected kernel module is now used to pull files from the target system. For this, one needs the address where the PCILeech kernel module is loaded. Then, one can download the desired file using lx64_filepull. For the evaluation, we pulled /etc/passwd, which was retrieved correctly.

6.3 Related Work

It is well known that BMCs come with high capabilities and are thus also a valuable target for attackers. Farmer [38] revealed some critical issues and vulnerabilities in the IPMI protocol that is used to control BMCs and maintain the server systems behind them. In the same year, Rapid7 released a guide to pentesting BMCs and their IPMI stacks [126]. The fact that BMCs are highly privileged and are often directly connected to the Internet — often only protected by a default password — made them ideal attack targets. A good overview of BMC security vulnerabilities and countermeasures was presented at the 2019 Open Source Firmware Conference [2]. Rather astonishingly, most of today’s BMCs do not use secure boot and are therefore vulnerable to bootkit attacks. This has raised some awareness among vendors and clients regarding the security of the BMC and similar co-processors. In 2018 and 2021, Bloomberg published articles about how China uses a tiny chip to infiltrate U.S. companies [129, 130]. While there is still no academically acknowledged evidence for this “hack”, it received a lot of attention regarding the security implications of co-processors. This was reinforced when security experts demonstrated that this kind of attack is practically possible [64]. Today, leading technology companies like Microsoft, Google, Facebook, Intel, IBM, and Dropbox invest much effort into making the BMC open source with OpenBMC [87], a Linux distribution for BMCs. One goal is to have complete control over the BMC because currently the BMC is controlled by the hardware manufacturer, and the firmware is often an opaque, proprietary binary blob. Dropbox recently even started to standardize the BMC hardware with the RunBMC project [134]. Recently, Synacktiv — a French IT security company — made use of an exploit in Hewlett Packard’s iLO software to install DMA acquisition software on iLO 4 [122]. They also implemented their acquisition software as a PCILeech device.
However, in contrast to their work, BMCLeech does not rely on an exploit but runs on OpenBMC. Furthermore, BMCLeech is faster in reading the host’s memory via DMA.

6.4 Discussion

In this chapter, we presented BMCLeech, a tool that brings forensic-readiness features into the BMC. Its compatibility with the PCILeech framework and all its features makes it a valuable tool for forensic investigations. For instance, it is possible to acquire the host’s memory, inject code into the kernel, or pull files from the host’s file system. In

this chapter, we gave insights into the implementation of BMCLeech and showed what steps are necessary to work with PCILeech. The evaluation revealed that this approach is valuable and practical. For this, we calculated diffs between the snapshots taken by BMCLeech and by a common software-based memory acquisition tool (LiME) and interpreted the results quantitatively and visually. However, the evaluation also revealed some limitations of our implementation of BMCLeech. First, on the tested platform, we were only able to acquire memory in the 32-bit memory range, i.e., below four gigabytes. This is not a limitation of the BMC itself, as the AST2500 supports 64-bit DMA access, but probably of the interconnection of the BMC in the system. Future work should check which platforms are affected and how it is possible to access the whole memory. Another insight is that the performance (and therefore the atomicity and the formal notion of integrity [156]) of BMCLeech should be improved. While we were able to read the memory at higher speeds on the BMC (≈ 50 MiB/s), we could not send the dump at that speed over the network. All DMA-based acquisition tools require that the IOMMU, a feature of Intel’s Virtualization Technology for Directed I/O (VT-d) [25], does not block access to the RAM. However, Markettos et al. [97] also showed that setting up the IOMMU is complicated and that layers communicating with a (malicious) device are usually not hardened. In our scenario, where we do not perform attacks but install BMCLeech to make the BMC forensic-ready, we can assume that if an IOMMU is used, it is configured in a way that lets BMCLeech work appropriately. BMCLeech also demonstrates the high capabilities of internal (and also external) devices that are connected to a computer system. Often these devices are connected via PCIe and thus have DMA access. In our use case, the BMC is intended to be used benignly by incident response teams in a company.
However, one should also be aware that such components can also be exploited for malicious purposes.

7 Leveraging Intel DCI for Memory Forensics

Most memory acquisition techniques do not read memory atomically, which can lead to wrong results [116]. Another problem with standard memory acquisition tools is that they need to be installed first, which inherently impacts the integrity of the RAM and the file system.

Intel DCI is an interface that allows debugging Intel CPUs via a USB 3 port without opening the chassis. It was introduced with Intel Skylake in 2015 [52]. The interface provides JTAG debugging capabilities for x86 systems even on a small budget. On such a low level, it is even possible to debug software that is not debuggable with common tools, e.g., the firmware, SMM, or a VMM. Before accessing memory, the CPU needs to be stopped, which is beneficial for atomicity. Furthermore, one does not need to install any software on the target, but DCI must be enabled beforehand.

CPU registers are quite volatile and not easy to acquire. Debugging the CPU with Intel DCI allows one to stop the CPU and to read from arbitrary registers, including the debug registers. This breaks CPU-bound encryption [107], which holds the encryption key in special registers. Furthermore, it is also possible to read and write memory.

Contribution In this chapter, we research the capabilities of low-level memory acquisition using Intel’s DCI interface. Our main contributions are as follows:

• We introduce DCILeech — the first low-level memory acquisition method that utilizes Intel’s DCI. This technique allows dumping system memory and producing a memory snapshot that satisfies full atomicity and full integrity. No software installation on the target is required. DCILeech benefits from its compatibility with PCILeech [46], which we demonstrate in the evaluation.

• We show how to access the decrypted memory of Intel SGX enclaves with the Debug profile. Using DAL, we were able to access specially protected enclave memory in the EPC.

• We show how to break CPU-bound encryption using Intel DCI.

• We sketch how Intel DCI can be used for digital forensic triage.

In the following (see Section 7.1), we introduce some basics of Intel DCI. Then, in Section 7.2, we give some insights into the implementation of DCILeech, which is evaluated in Section 7.3. In Section 7.4 we sketch how DCI can enhance digital forensic triage. Related work can be found in Section 7.5. Finally, in Section 7.6, we conclude our results and discuss in which cases memory acquisition using Intel DCI is helpful.


7.1 Intel Direct Connect Interface

In this section, we introduce the basics of Intel DCI. DCI allows low-cost closed-chassis debugging. There are two possibilities to connect host and target. First, there is Intel Silicon View Technology (SVT) — a device connected between the host and the target. Alternatively, host and target can be connected directly via a USB 3 debug cable. The debug cable we used is a USB 3 A-to-A cable [68]. It is similar to crossover network cables and can also be hand-crafted from an ordinary USB A-to-A cable.

Debugging via Intel DCI is possible using Intel System Studio, which is also offered as a free trial version. This software is needed because it integrates the software stack used for JTAG debugging via DCI. Python-based command line interfaces for the software stacks also come with Intel System Studio.

7.1.1 Enabling Intel DCI

Intel DCI is a powerful debug feature that can access arbitrary data of arbitrary processes. All security measures, such as the ring privileges, are circumvented. For this reason, manufacturers usually disable Intel DCI. In this section, we show how to enable it. Basically, some flags have to be set in the firmware settings to activate Intel DCI. These flags are usually hidden in the firmware settings and written during the boot process into the corresponding registers. However, some firmware implementations let the user enable DCI in the firmware settings, e.g., on some Intel NUC systems [135]. The flags in the firmware that enable Intel DCI [36] are shown in Table 7.1. Table 7.2 describes the values in the two relevant CPU registers that need to be set to enable DCI debugging.

Table 7.1: Flags to enable Intel DCI.

    Flag                       Value   Description
    Debug Interface            1       Enables Silicon debug features
    Debug Interface Lock       0       Allows changes of the MSR
    Direct Connect Interface   1       Enables DCI
    DCI Enable (HDCIEN)        1       Indicates that DCI is enabled

Table 7.2: Fields in registers when DCI is enabled [18].

    Register               Field                  Value   Description
    IA32_DEBUG_INTERFACE   Enable (R/W)           1       Enables debug features
                           Lock (R/W)             0       Unlocks the MSR
                           Debug Occurred (R/O)   1       Status of Enable bit
    ECTRL                  DCI Enable (HDCIEN)    1       DCI Debug is enabled
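The IA32_DEBUG_INTERFACE fields can be decoded from the raw MSR value. The bit positions used in this sketch (Enable = bit 0, Lock = bit 30, Debug Occurred = bit 31, MSR address 0xC80) follow the Intel SDM and should be verified against the current edition:

```python
IA32_DEBUG_INTERFACE = 0xC80  # MSR address (per Intel SDM)

def decode_debug_interface(msr: int) -> dict:
    """Decode the IA32_DEBUG_INTERFACE fields listed in Table 7.2."""
    return {
        "enable":         bool(msr & (1 << 0)),   # silicon debug enabled
        "lock":           bool(msr & (1 << 30)),  # MSR locked against changes
        "debug_occurred": bool(msr & (1 << 31)),  # sticky status of Enable
    }

# State that Table 7.2 requires for DCI debugging: enabled, unlocked.
print(decode_debug_interface(1 << 0))
# {'enable': True, 'lock': False, 'debug_occurred': False}
```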

For some systems it appears to be possible to change those flags using the mm command [153] of the UEFI Shell [52]. In our experiments, however, this approach did not work. For this reason, we enabled DCI in the following way on our analysis machine (described below in Section 7.3.2):


1. The firmware was dumped from the EEPROM by directly connecting to it via Serial Peripheral Interface (SPI). For this, the corresponding clip was attached to the appropriate pins of a Raspberry Pi. We used flashrom [39] to perform the dump.

2. To modify the firmware values, we used the AMI BIOS Configuration Program [3]. The values were set in accordance with Table 7.1. This tool allows reading and modifying even hidden firmware settings in the corresponding snapshot.

3. Afterward, the firmware is saved and flashed to the EEPROM using flashrom. Then, one needs to Reset to Default in the firmware settings. This is necessary because the current settings are stored on the CMOS chip; restoring to default causes the values from the flash chip to be used.

7.1.2 OpenIPC and DAL

When using Intel System Studio, one can choose between two providers for DCI debugging. The most recent versions use OpenIPC; older Intel System Studio versions also support DAL. Both offer software stacks for JTAG implementations of IEEE 1149.1 and IEEE 1149.7 [52]. These interfaces can also be used from a Python command line interface. For DCILeech we make use of the OpenIPC interface. The following line of Python code, for example, shows how to read from a register:

    ipc.threads[0].arch_register(REGISTER)

So it is possible to read from debug registers or other special-purpose registers. This affects the security of CPU-bound encryption, where the encryption key is kept in registers [107]. Similarly, one can read from memory; one only needs to specify the corresponding physical address. Note that it is also possible to specify a desired virtual address. The differences between OpenIPC and DAL are not well documented. One difference we know of is that there is a library called ITP that comes with the Python frontend ITPII using the DAL software stack. With this library it is possible to read memory from Intel SGX enclaves (see also Section 7.3.5).
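The call pattern for reading registers and memory via the OpenIPC Python interface then looks roughly as follows. Since the real ipc object requires a DCI-connected target, a stub thread object is used here purely to illustrate the interface; arch_register and memblock are the calls named in the text, but their exact signatures and return types may differ in the real stack:

```python
class StubThread:
    """Stand-in for ipc.threads[n]; a DCI-connected target is
    needed for the real OpenIPC object."""

    def arch_register(self, name):
        # The real implementation returns the register content via JTAG;
        # the value below is a made-up placeholder.
        return {"dr7": 0x400}.get(name, 0)

    def memblock(self, address, length):
        # The real implementation reads physical memory via DCI.
        return b"\x00" * length

thread = StubThread()  # with OpenIPC: thread = ipc.threads[0]
print(hex(thread.arch_register("dr7")))    # read a debug register
print(len(thread.memblock(0x100000, 16)))  # read 16 bytes of memory
```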

7.2 DCILeech: Design and Implementation

This section gives some insights into the design and implementation of DCILeech. First, we provide a brief overview of the architecture. Then, we explain what steps were necessary to make DCILeech compatible with PCILeech [46].

7.2.1 Architecture

Figure 7.1 shows an overview of the setup we used. On the left side, one can see the Target System and on the right side the Forensic Workstation. DCILeech also benefits from PCILeech’s capabilities [44]. PCILeech is connected to the DCILeech server via a TCP connection, i.e., DCILeech is implemented as a PCILeech device. DCILeech itself uses the command line interface of Intel System Debugger, which can perform debugging on the target system.

91 7 Leveraging Intel DCI for Memory Forensics

[Figure: on the target system, the PCH’s JTAG handler connects the CPU TAP and PCH TAP and accesses memory via DMA; the DCI debug cable connects the target to the Forensic Workstation, where DCILeech uses the Intel System Debugger / IPC CLI and PCILeech connects to DCILeech via TCP.]

Figure 7.1: Schematic overview of the setup for DCILeech.

On the target system, the USB 3 port is connected to the Platform Control Hub (PCH). The PCH comes with a JTAG handler which is connected to all TAPs (see also Section 2.2.9). All TAPs that are connected and enabled can be used for debugging the corresponding component.

7.2.2 DCILeech

DCILeech implements a PCILeech rawtcp device and can be started within the Intel System Studio GUI or the corresponding Python debug shell. It opens a TCP socket and waits for a connection. Once a connection is established, DCILeech immediately halts the target’s CPU. This is needed for OpenIPC to read and write memory. A pleasant side effect is that this allows atomic memory dumps. Afterward, PCILeech’s requests are served. The requests are sent as a rawtcp_msg that is structured as shown in Listing 7.1.

    enum rawtcp_cmd {
        STATUS,    // is device ready?
        MEM_READ,  // read from memory
        MEM_WRITE, // write to memory
        DCI_GO,    // continue CPU
        DCI_HALT   // halt CPU
    };

    struct rawtcp_msg {
        enum rawtcp_cmd cmd;
        uint64_t addr;  // the address
        uint64_t cb;    // the length
    };

Listing 7.1: The modified definitions of the rawtcp_cmd enum and rawtcp_msg struct.


If the status is requested, DCILeech always indicates that it is ready because everything is already initialized. If PCILeech sends a read request, DCILeech uses the ipc.threads[0].memblock function to read memory via DCI. If PCILeech wants to write to the host memory, DCILeech waits to receive the corresponding payload, which is then written to the physical memory of the target system, again using the memblock function.

7.2.3 PCILeech Patch

As one can see in Listing 7.1, we extended rawtcp_cmd by two further commands:

1. DCI_GO: continues all CPU threads, and

2. DCI_HALT: halts all CPU threads.

DCILeech was designed to be compatible with PCILeech. However, sometimes PCILeech expects injected code to be executed before it can continue. This applies when it comes to kernel module injection: PCILeech waits for a specific physical address to be written by the injected code. Since the CPU is halted, this write never happens. Thus, at this point, the DCI_GO command is sent to let the CPU run and execute the injected code. After a second, the CPU is halted again.
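The go-then-halt handling for injected code can be sketched as follows; go_cpu and halt_cpu are stand-ins for the corresponding OpenIPC operations triggered by DCI_GO and DCI_HALT:

```python
import time

def run_injected_code(go_cpu, halt_cpu, seconds=1.0):
    """Briefly resume all CPU threads so injected code (e.g., a kernel
    module loader stub) can execute and write its completion marker,
    then halt them again to restore the atomic view of memory."""
    go_cpu()             # DCI_GO: continue CPU
    time.sleep(seconds)  # give the injected code time to run
    halt_cpu()           # DCI_HALT: halt CPU again

# Demonstration with stub callbacks recording the order of events.
events = []
run_injected_code(lambda: events.append("go"),
                  lambda: events.append("halt"),
                  seconds=0.0)
print(events)  # ['go', 'halt']
```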

7.3 Evaluation

In this section, we evaluate how DCILeech performs regarding the three criteria of memory acquisition defined by Vömel and Freiling [156]: (1) Correctness, (2) atomicity and (3) integrity (see Section 2.3). Furthermore, we discuss the stealthiness (see Section 7.3.4) and demonstrate that we can read data of SGX enclaves (see Section 7.3.5).

Before reading memory, DCILeech halts the CPU, leading to fully atomic dumps. Note that this does not mean that the dump does not show any signs of “interruption”. For example, if the CPU is halted during a critical write, the dump may contain inconsistencies stemming from the fact that some parts of the write operation have already been performed while others have not. However, halting the CPU will always avoid inconsistencies that violate causality (such as the effect of an activity being recorded but not its cause) [156], and the chances of other inconsistencies are relatively low compared to a CPU that keeps running during the dump. DCILeech does not require any driver on the target system, and no code has to be executed on the target system. So we argue that snapshots acquired with DCILeech satisfy full integrity. Subsequent dumps with DCILeech therefore yield completely identical results. In the following, we focus on the evaluation of the correctness of DCILeech.

7.3.1 Methodology

For the evaluation, we compare the physical memory acquired using DCILeech and LiME [144]. First, we dump the memory using LiME. Note that during the acquisition with LiME, the CPU is running and writing to memory. LiME operates from the kernel level and does not halt the system. Thus, the atomicity and integrity are limited and

can probably be compared with the Windows kernel-level software acquisition tools that all behave similarly in Gruhn and Freiling’s evaluation [57]. After the acquisition with LiME, we dump the memory using DCILeech. During the acquisition with DCILeech, the CPU is halted. Afterward, we calculate the diffs of the dumps. The dumps are compared byte-wise and page-wise (4 KiB). We also show that DCILeech works by demonstrating that PCILeech payloads work properly.

7.3.2 Hardware Setup

For our experiments, we used a Fujitsu Esprimo Q957 with an Intel Core i5-7500T (4 cores) and 8 GiB of RAM running Ubuntu with kernel version 5.4.0-42-generic as our target system. Additionally, for testing compatibility with PCILeech payloads (see Section 7.3.3.2), we installed Windows 10 in a dual-boot setup. For the evaluation, however, we had to limit the usable RAM to 1 GiB using the kernel's mem parameter set via GRUB. This is because the speed of memory acquisition is low (≈ 70 KiB/s), and we wanted to avoid waiting several days for the outcome of an acquisition operation; acquiring 1 GiB of RAM already took more than four hours. Additionally, DCI debugging is not very stable: in our experiments, the acquisition regularly stopped because the system crashed. Table 7.3 shows the memory ranges of our test setup in detail.
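One way to apply such a limit on an Ubuntu system is via the kernel's mem boot parameter in the GRUB configuration; the snippet below is a sketch assuming a standard GRUB 2 setup:

```shell
# /etc/default/grub -- boot the kernel with only the first 1 GiB of RAM usable
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash mem=1G"
```

After editing, the configuration has to be regenerated (e.g., sudo update-grub) and the system rebooted.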

Table 7.3: The memory ranges in our evaluation environment.

    Start      End          Size
    0x1000     0x9c3ff      621 KiB
    0x100000   0x3fffffff   1032192 KiB

7.3.3 Correctness

A snapshot is considered correct if the acquired memory values are the values that are actually stored in memory [156]. For existing memory acquisition tools, correctness can be taken for granted [57]. DCILeech, however, is a new acquisition tool, so we need to show that it works correctly. DCILeech uses hardware features that are not supported by any emulator we know of, so we perform a black-box evaluation: since we do not know the actual content of the physical memory, we use a LiME dump as ground truth and compare it with the DCILeech dump. This allows a quantitative discussion of the correctness of DCILeech. Besides, we test different PCILeech features to demonstrate the compatibility of DCILeech.

7.3.3.1 Quantitative Analysis

Figure 7.2 visualizes a page-wise (4 KiB) diff of a LiME dump and a DCILeech dump. Addresses grow from the bottom left to the top right. Blue pixels indicate that there is no difference; the more reddish a pixel, the more bytes differ in the corresponding page. The scale is shown on the right side. Gray pixels indicate unmapped space, which is barely visible in the last row.

Figure 7.2: Visualization of the diff of the memory snapshots taken by DCILeech and LiME. Blue pixels indicate no change of the corresponding page. The more reddish the pixel, the more bytes are different in the corresponding 4 KiB page.

One can see that differing pages are spread over the memory space, with more reddish areas in the upper memory regions. A more detailed analysis revealed that about 50,000 pages (18.76 %) differ. However, a byte-wise comparison showed that in total only 38 MiB (3.76 %) differ. So, many pages are affected, but in total not many bytes. The quantitative analysis thus indicates that DCILeech works correctly.

7.3.3.2 PCILeech Payloads

The DCILeech and LiME snapshots are thus largely identical (96.24 % byte-wise). Since DCILeech is implemented as a PCILeech device, it also implements write access to physical memory. To show that DCILeech is compatible with PCILeech, we tested some payloads similarly to Section 6.2. We are aware that writing memory is not forensically sound, but it can be useful during a live analysis; if memory is dumped beforehand, it may be beneficial to use some of the more advanced PCILeech features for the analysis. To evaluate the compatibility with PCILeech, we successfully performed the following operations:


• Memory snapshot. The first feature is the basic memory acquisition feature dump. As a parameter, dump expects the corresponding memory range. All our snapshots were made with this feature.

• Kernel module injection. Since we can write to physical memory, we can exploit this to inject a kernel module into the host operating system via kmdload. PCILeech first searches for the Linux kernel base and is then able to inject the given kernel module. We injected a PCILeech kernel module that allows us to perform further analysis more comfortably. Finally, the address of the kernel module is communicated to the analyst.

• File retrieval and file pushing. This payload relies on the kernel module injection above. The injected kernel module is used to pull files from the target system. For this, one needs the address where the PCILeech kernel module is loaded. Then, one can download the desired file using lx64_filepull. Pushing files was also performed, using lx64_filepush; the file was subsequently found on the target system.

• Windows 10 unlock. This payload was tested with Windows 10 as a target. The payload wx64_unlock searches the memory for the code of the lock screen and “shorts” the password query, so one can log in with an empty password.
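For illustration, typical invocations of these payloads look roughly as follows. The device name dcileech and the kernel module address 0x7fffe000 are placeholders, and the exact flags may differ between PCILeech versions (consult pcileech -h):

```shell
# Dump physical memory in the range of Table 7.3
pcileech dump -min 0x1000 -max 0x40000000 -device dcileech

# Inject the PCILeech kernel module into a Linux x64 target;
# PCILeech prints the module's load address on success
pcileech kmdload -kmd LINUX_X64_48 -device dcileech

# Pull a file from the target using the injected module (address from kmdload)
pcileech lx64_filepull -kmd 0x7fffe000 -s /etc/passwd -out passwd -device dcileech

# Patch the Windows 10 lock screen to accept an empty password
pcileech wx64_unlock -kmd WIN10_X64 -device dcileech
```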

7.3.4 Stealthiness

For the user in front of the computer, debugging or memory acquisition using Intel DCI is not stealthy. First, the analyst needs to connect to the system via a USB cable. Furthermore, the CPU is halted, which looks like a freeze. The OS can also detect that it was debugged; after some experiments using Arch Linux, the following messages appeared:

INFO: rcu_preempt detected stalls on CPUs/tasks: [...]
NMI watchdog: Watchdog detected hard LOCKUP on cpu [...]

Furthermore, it can be detected that DCI is enabled. The CHIPSEC framework comes with a DCI module that checks the registers listed in Table 7.2 [18]. If DCI is enabled, this is displayed in the corresponding report.
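Such a check can, for instance, be run with CHIPSEC's debug module; the module name below is the one shipped with recent CHIPSEC releases and may vary between versions:

```shell
# Reports a warning/failure if debug features such as DCI are enabled
python chipsec_main.py -m common.debugenabled
```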

7.3.5 Intel SGX

In the following, we evaluate DCI's ability to read the memory of SGX enclaves. To do this, we wrote a small program using the Intel SGX SDK [69]. Note that the corresponding enclave runs in the Debug profile. In the enclave, we allocate memory and write the well-known Lena test image, with a size of 88 KiB, into the enclave's memory.


Now, we need to find the address of the EPC that contains the data. This can be done using cpuid [22]:

cpuid -l 0x12 -s 0x2

Now it is possible to read the enclave's data via ITPII. The corresponding function is called edbgread [66]; it dumps rather slowly at about 4 KiB/s. After dumping some memory, we could find the image in the enclave's memory. Note that when reading from the EPC without edbgread, OpenIPC returns 0xffffffffffffffff, and reading from the corresponding address with LiME returns seemingly random values because this memory is encrypted. This small experiment shows that we can read decrypted memory from SGX enclaves. Note that testing real-world SGX applications, i.e., with the Release profile, was not in this work's scope. However, we think that the chances are good because there is the set_debugoptin function that can be called via ITPII [66]. This function sets the debug opt-in flag in the SGX Thread Control Structure so that the enclave can be debugged. Future work should consider reading the memory of SGX enclaves with the Release profile.
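For illustration, the EPC base address and size can be decoded from the register values returned by cpuid. The sketch below assumes the encoding documented in the Intel SDM for leaf 0x12, sub-leaf ≥ 2 (EAX[3:0] = 1 marks a valid EPC section; base and size are split over bits 31:12 of EAX/ECX and bits 19:0 of EBX/EDX); the register values used as examples are made up:

```python
def decode_epc_section(eax: int, ebx: int, ecx: int, edx: int):
    """Decode one EPC section from CPUID leaf 0x12, sub-leaf >= 2.

    Returns (physical_base, size) in bytes, or None if the sub-leaf
    does not describe an EPC section.
    """
    if eax & 0xF != 1:
        return None  # EAX[3:0] != 1: no EPC section in this sub-leaf
    base = (eax & 0xFFFFF000) | ((ebx & 0xFFFFF) << 32)  # bits 51:12 of base
    size = (ecx & 0xFFFFF000) | ((edx & 0xFFFFF) << 32)  # bits 51:12 of size
    return base, size
```

For example, register values eax=0x70200001, ebx=0, ecx=0x05D80001, edx=0 would describe an EPC section at physical address 0x70200000 of size 0x5D80000 bytes.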

7.4 Digital Forensic Triage with Intel DCI

Memory acquisition using Intel DCI is quite promising. However, it is hard to apply in practice. First, one needs a system that has Intel DCI enabled. Second, the acquisition speed is low; our evaluation also revealed that after hours of memory acquisition, system crashes become likely. In this section, we sketch possibilities of using Intel DCI for digital forensic triage.

Inspired by triage in medicine, digital forensic triage aims to prioritize the preservation of evidence [106]. In the case of digital evidence, this means that the most volatile memory should be saved first. CPU registers can be regarded as the most volatile memory in a computer. Intel DCI allows reading registers without starting and loading special software that would overwrite register contents. System memory is also quite volatile and should likewise be acquired as fast as possible. In Figure 7.3 we propose a way of performing digital forensic triage with Intel DCI.

Probably the most challenging part of DCI-based memory acquisition is enabling DCI debugging. We found four ways to achieve this. First (1a), there are systems in the wild that allow enabling DCI during runtime [108, 109, 110], which is actually a security vulnerability. Second (1b), some systems allow enabling DCI from the UEFI Shell [52]; for this, the computer needs to be restarted. The third possibility (1c) is to modify the system firmware, as we did in Section 7.1.1. Another possibility (1d) is to exploit the firmware: recently, security researchers observed the TrickBot malware scanning for UEFI vulnerabilities that could allow malware to persist in UEFI in the future [20]. Using this technique, it should also be possible to enable DCI. Note that most of these techniques require restarting the target system, limiting the capabilities of Intel DCI for digital forensic triage. However, researchers also showed that evil maid attacks should be considered [35].
Figure 7.3: Digital forensic triage with Intel DCI.

If DCI is enabled, the analyst should first save all register contents (2), including the debug registers. This breaks CPU-bound encryption [107]. Next, if present, Intel SGX enclave memory should be saved (3). Since memory acquisition via DCI is slow, it is recommended to inject acquisition software that exfiltrates system memory via the network or a USB thumb drive. One first needs to save the pages that are later used by the acquisition software (4a). Then, the acquisition software can be injected (4b). The acquisition software should preserve atomicity. This means the original threads must not be dispatched, e.g., by injecting a special OS that only dumps system memory; after injection, the context has to be set up so that all interrupts are received by the new OS. Another possibility is to virtualize the target OS on-the-fly [118].

Afterward, volatile memory is saved, and the investigator can start with the live analysis, including saving data from network-attached storage (5a). Eventually, the investigator can acquire local storage (5b).

7.5 Related Work

Intel DCI appears to be less well explored than other memory acquisition techniques, with only a moderate number of publications in the area. Most notably, Goryachy and Ermolov [52, 53, 54, 55] pioneered the field of DCI research and demonstrated how to use Intel DCI to debug the CPU or PCH. We are aware of only one other public talk on the use of DCI to perform firmware debugging on Intel CPUs [74]. Leveraging Intel DCI for forensic memory acquisition appears not to have been explored yet.

Forensic imaging using JTAG is more common in the area of smartphones and other embedded devices. In 2006, researchers showed how to use the JTAG boundary scan to create a bitwise image of the memory of an embedded device [11]. This is still a popular technique when it comes to debugging Internet of Things (IoT) devices. Manufacturers of such devices often want to prevent reverse engineering of their products, so they do not label the corresponding pins or distribute them over the whole PCB; the JTAGulator [56] helps to find the pin assignment. Furthermore, JTAG turned out to be beneficial for Android rootkit detection [59]: the authors extracted a smartphone's kernel memory and reconstructed it for further analysis.


7.6 Discussion

In this chapter, we introduced DCILeech, which combines two powerful technologies: PCILeech and Intel DCI. No installation on the target is necessary. Furthermore, the CPU is halted; hence, the OS cannot prevent a debug session, and the dump is performed atomically. We were also able to read registers, e.g., drX, xmmX, and ymmX, which breaks CPU-bound encryption.

However, the evaluation in Section 7.3 also revealed some shortcomings. First, the acquisition speed is rather low, only about 70 KiB/s, and we experienced some crashes on the target side during long acquisition sessions. Future work should consider injecting an acquisition tool that can dump at higher speed; register contents and the required memory pages can be dumped via DCI beforehand. Hence, such a hybrid approach could also dump memory atomically, provided it is guaranteed that other memory regions are not affected. This would probably be a terminating memory acquisition technique because the original OS must not be running while the acquisition is in progress.

Another problem is deployment. While no software needs to be installed on the target system, DCI debugging has to be enabled, and for security reasons, manufacturers disable it. However, it also happens that it can be enabled during runtime [108, 109, 110]. Our approach to activating DCI debugging might not be applicable for on-site forensic investigations. However, previous work showed that it is possible to enable DCI from the UEFI Shell [52], and other researchers showed that an evil maid attack is also possible [35]: they modified the firmware of a computer in about four minutes.

Even though memory acquisition techniques using DCI are very beneficial in terms of integrity and atomicity, we would not recommend enabling DCI by default for forensic readiness; it is too powerful and can also be misused. However, it might be beneficial to make it possible to enable DCI in a secured way. DCI can also be used for offensive research, because one can obtain “ground truths” and insights into software components that are usually not accessible. Also, the ability to read volatile register values is unique.


8 Conclusion

In this thesis, we unveiled the limitations of forensic event reconstruction when only standard Linux log files are considered. Therefore, we traced all system calls of the system using VMI and treated them as structured log files. While this was quite beneficial for the size of the corresponding characteristic fingerprints, it significantly impacted performance. However, it is unnecessary to trace all system calls to get good results for the characteristic fingerprints and matching, so we showed how to increase performance by only tracing relevant system calls. We were able to calculate characteristic fingerprints for most events. However, there are events and digital evidence that cannot be detected with such an approach.

For system call tracing, we already made use of memory analysis. The system memory may contain far more digital evidence, such as encryption keys. Before memory analysis, the memory needs to be acquired first. We researched the landscape of memory acquisition techniques and created a universal taxonomy and survey of them. The survey revealed that the lower the layer, and thus the higher the technique's privileges, the more powerful the technique. Most popular tools operate on Kernel Level (KL) or sometimes Hypervisor Level (HL). Hence, the rest of this thesis was about researching new forensic memory acquisition methods on low layers.

First, we introduced UEberForensIcs, which is similar to cold boot attacks. It integrates forensic readiness into the firmware of a computer. Furthermore, we showed how to use UEFI RTS for memory acquisition; in this case, data exfiltration is challenging. Servers usually come with a BMC, which is used for remote maintenance tasks. Some BMCs are very powerfully connected and have DMA access. BMCLeech is a small memory acquisition tool that brings stealthy memory acquisition to the BMC.
BMCLeech comes with the same pros and cons as other DMA-based tools, i.e., it is quite stealthy, but such techniques suffer from bad atomicity. The last memory acquisition technique we introduced is based on Intel DCI. With our software DCILeech and a cheap USB cable, we could debug the target system using JTAG. During the debug session, the CPU is halted, so memory is acquired atomically. With this technique, we were also able to save register values. Hence, this is the most powerful technique introduced in this thesis.

This thesis showed that memory acquisition on lower layers is beneficial. Today's computer hardware often comes with highly privileged co-processors or external interfaces that can be exploited for forensic investigations. However, increasing system security also makes it harder to deploy such software a posteriori. The tools and techniques introduced in this thesis have in common that they need to be installed a priori. Especially for companies that are interested in a forensically sound investigation when an incident occurs, this might be the way to go: it allows a robust forensic investigation while not relying on “obscure” exploits or other measures that threaten the integrity of the target. A

good memory acquisition technique is powerful and cannot be deceived. However, with high power and high privileges, one also needs to act carefully and protect such a technique from unauthorized access. Otherwise, it might become a hilarious meme.

Bibliography

[1] Advanced Micro Devices, Inc. Secure Encrypted Virtualization API, 2019. URL https://developer.amd.com/wp-content/resources/55766.PDF. Accessed: 2021-02-15.

[2] Rick Altherr. Common BMC Vulnerabilities And How to Avoid Repeating Them. In Open Source Firmware Conference, 2019. URL https://2019.osfc.io/uploads/talk/paper/39/Common_BMC_vulnerabilities_and_how_to_avoid_repeating_them.pdf. Accessed: 2021-02-15.

[3] American Megatrends Incorporation. UEFI/BIOS Utilities, 2021. URL https://www.ami.com/products/firmware-tools-and-utilities/bios-uefi-utilities/. Accessed: 2021-02-12.

[4] Apple Inc. Apple M1, 2020. URL https://www.apple.com/mac/m1/. Accessed: 2021-02-15.

[5] ASPEED Technology Inc. AST2500, 2021. URL https://www.aspeedtech.com/server_ast2500/. Accessed: 2021-02-15.

[6] Shadi Al Awawdeh, Ibrahim M. Baggili, Andrew Marrington, and Farkhund Iqbal. CAT Record (computer activity timeline record): A unified agent based approach for real time computer forensic evidence collection. In Eighth International Workshop on Systematic Approaches to Digital Forensic Engineering, SADFE 2013, Hong Kong, China, November 21-22, 2013, pages 1–8, 2013. doi: 10.1109/SADFE.2013.6911539. URL https://doi.org/10.1109/SADFE.2013.6911539.

[7] Johannes Bauer, Michael Gruhn, and Felix Freiling. Lest we forget: Cold-boot attacks on scrambled DDR3 memory. Digital Investigation, 16:65–74, 2016. doi: 10.1016/j.diin.2016.01.009. URL https://dx.doi.org/10.1016/j.diin.2016.01.009.

[8] Michael Becher, Maximillian Dornseif, and Christian N. Klein. FireWire: all your memory are belong to us. Proceedings of CanSecWest, 2005. URL https://cansecwest.com/core05/2005-firewire-cansecwest.pdf. Accessed: 2021-02-15.

[9] Jeffrey Benner. When Gamer Humor Attacks. Wired, 2001. URL https://www.wired.com/2001/02/when-gamer-humor-attacks/. Accessed: 2021-02-15.

[10] Leyla Bilge, Thorsten Strufe, Davide Balzarotti, and Engin Kirda. All your contacts are belong to us: automated identity theft attacks on social networks. In Proceedings of the 18th international conference on World wide web, pages 551–560, 2009.

[11] Ing. M. F. Breeuwsma. Forensic imaging of embedded systems using JTAG (boundary-scan). Digital Investigation, 3(1):32–42, 2006. doi: 10.1016/j.diin.2006.01.003. URL https://doi.org/10.1016/j.diin.2006.01.003.

103 Bibliography

[12] Brian D. Carrier and Joe Grand. A hardware-based memory acquisition procedure for digital investigations. Digital Investigation, 1(1):50–60, 2004. doi: 10.1016/j.diin.2003.12.001. URL https://doi.org/10.1016/j.diin.2003.12.001.

[13] Andrew Case and Golden G. Richard III. Memory forensics: The path forward. Digital Investigation, 20:23–33, 2017. doi: 10.1016/j.diin.2016.12.004. URL https://doi.org/10.1016/j.diin.2016.12.004.

[14] Eoghan Casey. What does “forensically sound” really mean? Digital Investigation, 4(2):49–50, 2007. ISSN 1742-2876. doi: 10.1016/j.diin.2007.05.001. URL http://www.sciencedirect.com/science/article/pii/S1742287607000333.

[15] Cellebrite Mobile Synchronization Ltd. UFED Hardware Platforms, 2021. URL https://www.cellebrite.com/en/platforms/. Accessed: 2021-02-15.

[16] Yoan Chabot, Aurélie Bertaux, Christophe Nicolle, and M. Tahar Kechadi. An ontology-based approach for the reconstruction and analysis of digital incidents timelines. Digital Investigation, 15:83–100, 2015. doi: 10.1016/j.diin.2015.07.005. URL https://doi.org/10.1016/j.diin.2015.07.005.

[17] P. M. Chen and B. D. Noble. When virtual is better than real [operating system relocation to virtual machines]. In Proceedings Eighth Workshop on Hot Topics in Operating Systems, pages 133–138, 2001. doi: 10.1109/HOTOS.2001.990073.

[18] Chipsec. CHIPSEC: Platform Security Assessment Framework, 2014. URL https://github.com/chipsec/chipsec. Accessed: 2021-02-04.

[19] Michael Cohen. Rekall Agent User Manual, 2019. URL http://www.rekall-forensic.com/documentation-1/rekall-documentation/user-manual. Accessed: 2021-02-16.

[20] Lucian Constantin. TrickBot gets new UEFI attack capability that makes recovery incredibly hard, 2020. URL https://www.csoonline.com/article/3599908/trickbot-gets-new-uefi-attack-capability-that-makes-recovery-incredibly-hard.html. Accessed: 2021-02-05.

[21] Intel Corporation. Intel 64 and IA-32 Architectures Software Developer’s Manual, volume 1 Basic Architecture, chapter 6.3.5 Calls to Other Privilege Levels. Intel Corporation, 2006.

[22] Intel Corporation. Intel 64 and IA-32 Architectures Software Developer’s Manual, volume 3D: System Programming Guide, Part 4, chapter 36. Intel Corporation, 2015.

[23] Intel Corporation. Intel 64 and IA-32 Architectures Software Developer’s Manual, volume 3A: System Programming Guide, Part 1, chapter 34: System Management Mode. Intel Corporation, 2016.

[24] Intel Corporation. Memory Encryption Technologies Specification, 2017. URL https://software.intel.com/sites/default/files/managed/a5/16/Multi-Key-Total-Memory-Encryption-Spec.pdf. Rev: 1.1. Accessed: 2021-02-22.

[25] Intel Corporation. Intel Virtualization Technology for Directed I/O - Architecture Specification, 2019. URL https://software.intel.com/sites/default/files/managed/c5/15/vt-directed-io-spec.pdf. Accessed: 2021-02-15.

104 Bibliography

[26] Intel Corporation. Intel Software Guard Extensions (Intel SGX), 2021. URL https://software.intel.com/en-us/sgx. Accessed: 2021-02-22.

[27] Intel Corporation. Intel Virtualization Technology (Intel VT), 2021. URL https://www.intel.com/content/www/us/en/virtualization/virtualization-technology/intel-virtualization-technology.html. Accessed: 2021-02-22.

[28] Doug Covelli. Vmss2core, 2017. URL https://labs.vmware.com/flings/vmss2core. Accessed: 2021-02-22.

[29] Access Data. FTK Imager, 2017. URL http://accessdata.com/product-download/ftk-imager-version-4.1.1. Accessed: 2021-02-22.

[30] Dell Inc. Integrated Dell Remote Access Controller 8 (iDRAC8) Version 2.00.00.00 User's Guide, 2021. URL https://www.dell.com/support/manuals/en-us/integrated-dell-remote-access-cntrllr-8-with-lifecycle-controller-v2.00.00.00/idrac8_ug_pub-v1/overview?guid=guid-1442cb67-030e-474c-8cfd-2e12dd4cb7db. Accessed: 2021-02-22.

[31] Benjamin Delpy. mimikatz, 2014. URL https://github.com/gentilkiwi/mimikatz. Accessed: 2021-01-29.

[32] Andreas Dewald. Characteristic evidence, counter evidence and reconstruction problems in forensic computing. it - Information Technology, 57(6):339–346, 2015.

[33] Distributed Management Task Force. Redfish, 2021. URL https://www.dmtf.org/standards/redfish. Accessed: 2021-02-26.

[34] Donnie Bell, Lance Osborne, and Jon McGary. DRAC: Dell Remote Access Card for Server Management, 2002. URL http://www1.euro.dell.com/content/topics/global.aspx/power/en/ps2q02_bell. Accessed: 2021-02-22.

[35] Eclypsium Incorporation. Eclypsium Evil Maid Attack Demo, 2018. URL https://www.youtube.com/watch?v=loBX_vEXxVA. Accessed: 2021-02-22.

[36] eiselekd. Enable DCI debugging on Gigabyte-BKi5HA-7200, 2020. URL https://gist.github.com/eiselekd/d235b52a1615c79d3c6b3912731ab9b2.

[37] Elasticsearch B.V. Filebeat - Lightweight Shipper for Logs, 2021. URL https://www.elastic.co/products/beats/filebeat. Accessed: 2021-02-12.

[38] Dan Farmer. Sold down the River, 2013. URL http://fish2.com/ipmi/river.pdf. Accessed: 2021-02-22.

[39] flashrom team. flashrom, 2020. URL https://www.flashrom.org/Flashrom. Accessed: 2021-02-04.

[40] Free Software Foundation. GDB: The GNU Project Debugger, 2017. URL https://www.gnu.org/software/gdb. Accessed: 2021-02-22.

[41] Volatility Foundation. Volatility Framework - Volatile memory extraction utility framework, 2016. URL https://github.com/volatilityfoundation/volatility. Accessed: 2021-02-26.

[42] Felix Freiling and Leonhard Hösch. Controlled experiments in digital evidence tampering. Digital Investigation, 24:83–92, 2018. doi: 10.1016/j.diin.2018.01.011. URL https://doi.org/10.1016/j.diin.2018.01.011.

105 Bibliography

[43] Felix Freiling, Tobias Groß, Tobias Latzo, Tilo Müller, and Ralph Palutke. Advances in Forensic Data Acquisition. IEEE Design & Test, 35(5):63–74, 2018. doi: 10.1109/ MDAT.2018.2862366. URL https://doi.org/10.1109/MDAT.2018.2862366.

[44] Ulf Frisk. Direct Memory Attack the KERNEL, 2016. URL https://media.defcon.org/DEF%20CON%2024/DEF%20CON%2024%20presentations/DEFCON-24-Ulf-Frisk-Direct-Memory-Attack-the-Kernel.pdf. Accessed: 2021-02-22.

[45] Ulf Frisk. Attacking UEFI Runtime Services and Linux, 2017. URL http://blog. frizk.net/2017/01/attacking-uefi-and-linux.html. Accessed: 2021-02-22.

[46] Ulf Frisk. PCILeech, 2021. URL https://github.com/ufrisk/pcileech. Accessed: 2021-02-22.

[47] Simson L. Garfinkel. Digital forensics research: The next 10 years. Digital Investigation, 7:64–73, 2010. ISSN 1742-2876. doi: 10.1016/j.diin.2010.05.009. URL http://www.sciencedirect.com/science/article/pii/S1742287610000368.

[48] Rainer Gerhards. The syslog protocol. RFC 5424, Internet Engineering Task Force, 2009. URL https://tools.ietf.org/html/rfc5424. Accessed: 2021-02-12.

[49] Pavel Gladyshev and Andreas Enbacka. Rigorous Development of Automated Inconsistency Checks for Digital Evidence Using the B Method. IJDE, 6(2), 2007. URL http://www.utica.edu/academic/institutes/ecii/publications/articles/1C35450B-E896-6876-9E80DA0F9FEEF98B.pdf.

[50] Pavel Gladyshev and Ahmed Patel. Finite state machine approach to digital event reconstruction. Digital Investigation, 1(2):130–149, 2004. doi: 10.1016/j.diin.2004.03. 001. URL https://doi.org/10.1016/j.diin.2004.03.001.

[51] Sean E Goodison, Robert C Davis, and Brian A Jackson. Digital evidence and the US criminal justice system. Identifying Technology and Other Needs to More Effectively Acquire and Utilize Digital Evidence. Priority Criminal Justice Needs Initiative. Rand Corporation, 2015.

[52] Maxim Goryachy and Mark Ermolov. Tapping into the Core. 33rd Chaos Communication Congress, 2016.

[53] Maxim Goryachy and Mark Ermolov. Intel DCI Secrets. The 8th Annual HITB Security Conference in The Netherlands, 2017. URL https://conference.hitb.org/hitbsecconf2017ams/materials/D2T4%20-%20Maxim%20Goryachy%20and%20Mark%20Ermalov%20-%20Intel%20DCI%20Secrets.pdf.

[54] Maxim Goryachy and Mark Ermolov. Inside Intel Management Engine. 34th Chaos Communication Congress, 2017.

[55] Maxim Goryachy and Mark Ermolov. Where there’s a JTAG, there’s a way: obtaining full system access via USB, 2017. URL https://www.ptsecurity.com/ww-en/ analytics/where-theres-a-jtag-theres-a-way/. Accessed: 2021-02-05.

[56] Joe Grand. JTAGulator, 2021. URL http://www.grandideastudio.com/ jtagulator/. Accessed: 2021-02-04.

106 Bibliography

[57] Michael Gruhn and Felix Freiling. Evaluating atomicity, and integrity of correct memory acquisition methods. Digital Investigation, 16(Supplement):1–10, 2016. ISSN 1742-2876. doi: 10.1016/j.diin.2016.01.003. URL http://www.sciencedirect.com/science/article/pii/S1742287616000049.

[58] Michael Gruhn and Tilo Müller. On the Practicability of Cold Boot Attacks. In 2013 International Conference on Availability, Reliability and Security, ARES 2013, Regensburg, Germany, September 2-6, 2013, pages 390–397. IEEE Computer Society, 2013. doi: 10.1109/ARES.2013.52.

[59] Mordechai Guri, Yuri Poliak, Bracha Shapira, and Yuval Elovici. JoKER: Trusted Detection of Kernel Rootkits in Android Devices via JTAG Interface. In 2015 IEEE TrustCom/BigDataSE/ISPA, Helsinki, Finland, August 20-22, 2015, Volume 1, pages 65–73. IEEE, 2015. doi: 10.1109/Trustcom.2015.358. URL https://doi.org/10.1109/Trustcom.2015.358.

[60] J. Alex Halderman, Seth D. Schoen, Nadia Heninger, William Clarkson, William Paul, Joseph A. Calandrino, Ariel J. Feldman, Jacob Appelbaum, and Edward W. Felten. Lest We Remember: Cold Boot Attacks on Encryption Keys. In Paul C. van Oorschot, editor, Proceedings of the 17th USENIX Security Symposium, July 28-August 1, 2008, San Jose, CA, USA, pages 45–60. USENIX Association, 2008. URL http://www.usenix.org/events/sec08/tech/full%5Fpapers/halderman/halderman.pdf.

[61] J. Alex Halderman, Seth D. Schoen, Nadia Heninger, William Clarkson, William Paul, Joseph A. Calandrino, Ariel J. Feldman, Jacob Appelbaum, and Edward W. Felten. Lest We Remember: Cold Boot Attacks on Encryption Keys. In Proceedings of the 17th USENIX Security Symposium, July 28-August 1, 2008, San Jose, CA, USA, pages 45–60. USENIX Association, 2008. URL http://www.usenix.org/events/sec08/tech/full_papers/halderman/halderman.pdf.

[62] Nadia Heninger and Ariel Feldman. AESKeyFinder 1.0, 2008. URL https://github.com/eugenekolo/sec-tools/tree/master/crypto/aeskeyfind/aeskeyfind. Accessed: 2021-02-22.

[63] Hewlett Packard Enterprise Company. HPE Integrated Lights Out (iLO), 2021. URL https://www.hpe.com/de/de/servers/integrated-lights-out-ilo.html. Accessed: 2021-02-22.

[64] Trammel Hudson. Modchips of the state. 35th Chaos Communication Congress, 2018. URL https://trmm.net/Modchips. Accessed: 2021-02-22.

[65] Intel, Hewlett-Packard, NEC, and Dell. IPMI Specification v2.0, 2013. URL https://www.intel.de/content/www/de/de/products/docs/servers/ipmi/ipmi-second-gen-interface-spec-v2-rev1-1.html. Accessed: 2021-02-22.

[66] Intel Corporation. Intel DFx Abstraction Layer Python Command Line Interface, 2016. Documentation is part of Intel System Studio.

[67] Intel Corporation. Intel Active Management Technology, 2021. URL https://www.intel.com/content/www/us/en/architecture-and-technology/intel-active-management-technology.html. Accessed: 2021-02-26.

[68] Intel Corporation. C01 - Intel SVT DCI DbC2/3 A-to-A Debug Cable 1 Meter, 2021. URL https://designintools.intel.com/SVT_DCI_DbC2_3_A_to_A_Debug_Cable_1_Meter_p/itpdciamam1m.htm. Accessed: 2021-02-04.

107 Bibliography

[69] Intel Corporation. Intel Software Guard Extensions for Linux OS, 2021. URL https://github.com/intel/linux-sgx. Accessed: 2021-02-06.

[70] International Organization for Standardization. ISO/IEC 27043:2015: Information technology – Security techniques – Incident investigation principles and processes, 2015.

[71] Bhushan Jain, Mirza Basim Baig, Dongli Zhang, Donald E. Porter, and Radu Sion. SoK: Introspections on Trust and the Semantic Gap. In 2014 IEEE Symposium on Security and Privacy, SP 2014, Berkeley, CA, USA, May 18-21, 2014, pages 605–620. IEEE Computer Society, 2014. doi: 10.1109/SP.2014.45. URL https://doi.org/ 10.1109/SP.2014.45.

[72] Eddie James. Add Aspeed XDMA Engine Driver, 2019. URL https://lkml.org/ lkml/2019/7/1/748. Accessed: 2021-02-22.

[73] Joshua I. James and Pavel Gladyshev. Automated inference of past action instances in digital investigations. Int. J. Inf. Sec., 14(3):249–261, 2015. doi: 10.1007/s10207- 014-0249-6. URL https://doi.org/10.1007/s10207-014-0249-6.

[74] Maggie Jauregui. Intro to Closed Chassis Debugging. 2nd Open Source Firmware Conference, 2019. URL https://2019.osfc.io/uploads/talk/paper/18/Debugging_Intel_Firmware_using_DCI___USB_3.0.pdf. Accessed: 2021-02-22.

[75] David Kaplan. AMD x86 Memory Encryption Technologies, August 2016. URL https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/kaplan. Accessed: 2021-02-22.

[76] M. N. A. Khan, Chris R. Chatwin, and Rupert C. D. Young. A framework for post-event timeline reconstruction using neural networks. Digital Investigation, 4(3-4):146–157, 2007. doi: 10.1016/j.diin.2007.11.001. URL https://doi.org/10.1016/j.diin.2007.11.001.

[77] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. Spectre attacks: exploiting speculative execution. Communications of the ACM, 63(7):93–101, 2020. doi: 10.1145/3399742. URL https://doi.org/10.1145/3399742.

[78] Andrea Lanzi, Davide Balzarotti, Christopher Kruegel, Mihai Christodorescu, and Engin Kirda. AccessMiner: using system-centric models for malware protection. In Ehab Al-Shaer, Angelos D. Keromytis, and Vitaly Shmatikov, editors, Proceedings of the 17th ACM Conference on Computer and Communications Security, CCS 2010, Chicago, Illinois, USA, October 4-8, 2010, pages 399–412. ACM, 2010. doi: 10.1145/1866307.1866353. URL https://doi.org/10.1145/1866307.1866353.

[79] Tobias Latzo. Efficient Fingerprint Matching for Forensic Event Reconstruction. In Digital Forensics and Cyber Crime. 11th EAI International Conference, ICDF2C 2020, Boston, MA, USA, October 15-16, 2020, Proceedings, volume 351, pages 98–120. Springer, 2021. doi: 10.1007/978-3-030-68734-2_6. URL https://doi.org/10.1007/978-3-030-68734-2_6.


[80] Tobias Latzo and Felix Freiling. Characterizing the Limitations of Forensic Event Reconstruction Based on Log Files. In 18th IEEE International Conference On Trust, Security And Privacy In Computing And Communications / 13th IEEE International Conference On Big Data Science And Engineering, TrustCom/BigDataSE 2019, Rotorua, New Zealand, August 5-8, 2019, pages 466–475. IEEE, 2019. doi: 10.1109/TrustCom/BigDataSE.2019.00069. URL https://doi.org/10.1109/TrustCom/BigDataSE.2019.00069.

[81] Tobias Latzo, Ralph Palutke, and Felix Freiling. A universal taxonomy and survey of forensic memory acquisition techniques. Digital Investigation, 28(Supplement):56–69, 2019. doi: 10.1016/j.diin.2019.01.001. URL https://doi.org/10.1016/j.diin.2019.01.001.

[82] Tobias Latzo, Julian Brost, and Felix Freiling. BMCLeech: Introducing Stealthy Memory Forensics to BMC. In Digital Forensics Research Workshop EU 2020, DFRWS EU 2020, volume 32, page 300919. Elsevier, 2020. doi: 10.1016/j.fsidi.2020.300919. URL http://www.sciencedirect.com/science/article/pii/S2666281720300147.

[83] Tobias Latzo, Florian Hantke, Lukas Kotschi, and Felix Freiling. Bringing Forensic Readiness to Modern Computer Firmware. In Digital Forensics Research Workshop EU 2021, DFRWS EU 2021, 2021. URL https://dfrws.org/wp-content/uploads/2021/03/Bringing-Forensic-Readiness-to-Modern-Computer-Firmware.pdf.

[84] Tobias Latzo, Matti Schulze, and Felix Freiling. Leveraging Intel DCI for Memory Forensics. In Digital Forensics Research Workshop US 2021, DFRWS US 2021, 2021. URL https://dfrws.org/wp-content/uploads/2021/05/2021_USA_paper-leveraging_intel_dci_for_memory_forensics.pdf.

[85] Lauterbach GmbH. Intel x86/x64 Debugger, 2020. URL https://www2.lauterbach.com/pdf/debugger_x86.pdf. Accessed: 2021-02-06.

[86] Kevin P Lawton. Bochs: A Portable PC Emulator for Unix/X. Linux Journal, 1996 (29es):7, 1996.

[87] LF Projects. Defining a Standard Baseboard Management Controller Firmware Stack, 2021. URL https://www.openbmc.org/. Accessed: 2021-02-12.

[88] Zhenmin Li, Jed Taylor, Elizabeth Partridge, Yuanyuan Zhou, William Yurcik, Cristina Abad, James J Barlow, and Jeff Rosendale. UCLog: A unified, correlated logging architecture for intrusion detection. In the 12th International Conference on Telecommunication Systems-Modeling and Analysis (ICTSM), 2004.

[89] Yi-Ching Liao and Hanno Langweg. Cost-benefit analysis of kernel tracing systems for forensic readiness. In Noureddine Boudriga and Slim Rekhis, editors, Proceedings of the 2nd International Workshop on Security and Forensics in Communication Systems, SFCS 2014, Kyoto, Japan, June 3, 2014, pages 25–36. ACM, 2014. doi: 10.1145/2598918.2598921. URL https://doi.org/10.1145/2598918.2598921.

[90] Xiaodong Lin. Log Analysis. In Introductory Computer Forensics: A Hands-on Practical Approach, pages 305–332. Springer International Publishing, Cham, 2018. ISBN 978-3-030-00581-8.

[91] Linux man-pages project. core - core dump file, 2020. URL https://man7.org/linux/man-pages/man5/core.5.html. Accessed: 2021-02-16.


[92] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, Mike Hamburg, and Raoul Strackx. Meltdown: reading kernel memory from user space. Communications of the ACM, 63(6):46–56, 2020. doi: 10.1145/3357033. URL https://doi.org/10.1145/3357033.

[93] Carsten Maartmann-Moe. Inception, 2018. URL https://github.com/carmaa/inception. Accessed: 2021-02-22.

[94] Marcel Mangel and Sebastian Bicchi. JTAG. In Praktische Einführung in Hardware Hacking, chapter 4.4.1, page 106. mitp Verlag, 2020.

[95] ManTech CSI, Inc. Memory DD, 2009. URL http://sourceforge.net/projects/mdd/files/. Accessed: 2021-02-22.

[96] Michel Markanovic and Simeon Persson. Trusted memory acquisition using UEFI, 2014. URL http://www.diva-portal.org/smash/get/diva2:830892/FULLTEXT01.pdf. Accessed: 2021-02-22.

[97] A. Theodore Markettos, Colin Rothwell, Brett F. Gutstein, Allison Pearce, Peter G. Neumann, Simon W. Moore, and Robert N. M. Watson. Thunderclap: Exploring Vulnerabilities in Operating System IOMMU Protection via DMA from Untrustworthy Peripherals. In 26th Annual Network and Distributed System Security Symposium, NDSS 2019, San Diego, California, USA, February 24-27, 2019. The Internet Society, 2019. URL https://www.ndss-symposium.org/ndss-paper/thunderclap-exploring-vulnerabilities-in-operating-system-iommu-protection-via-dma-from-untrustworthy-peripherals/.

[98] Andrew Marrington, George M. Mohay, Hasmukh Morarji, and Andrew J. Clark. A Model for Computer Profiling. In ARES 2010, Fifth International Conference on Availability, Reliability and Security, 15-18 February 2010, Krakow, Poland, pages 635–640, 2010. doi: 10.1109/ARES.2010.95. URL https://doi.org/10.1109/ARES.2010.95.

[99] Lorenzo Martignoni, Aristide Fattori, Roberto Paleari, and Lorenzo Cavallaro. Live and Trustworthy Forensic Analysis of Commodity Production Systems. In Somesh Jha, Robin Sommer, and Christian Kreibich, editors, Recent Advances in Intrusion Detection, 13th International Symposium, RAID 2010, Ottawa, Ontario, Canada, September 15-17, 2010. Proceedings, volume 6307 of Lecture Notes in Computer Science, pages 297–316. Springer, 2010. doi: 10.1007/978-3-642-15512-3_16. URL https://doi.org/10.1007/978-3-642-15512-3_16.

[100] Florian Menges, Fabian Böhm, Manfred Vielberth, Alexander Puchta, Benjamin Taubmann, Noëlle Rakotondravony, and Tobias Latzo. Introducing DINGfest: An architecture for next generation SIEM systems. In Hanno Langweg, Michael Meier, Bernhard C. Witt, and Delphine Reinhardt, editors, Sicherheit 2018, Beiträge der 9. Jahrestagung des Fachbereichs Sicherheit der Gesellschaft für Informatik e.V. (GI), 25.-27.4.2018, Konstanz, volume P-281 of LNI, pages 257–260. Gesellschaft für Informatik e.V., 2018. doi: 10.18420/sicherheit2018_21. URL https://doi.org/10.18420/sicherheit2018_21.


[101] Florian Menges, Tobias Latzo, Manfred Vielberth, Sabine Sobola, Henrich C. Pöhls, Benjamin Taubmann, Johannes Köstler, Alexander Puchta, Felix Freiling, Hans P. Reiser, and Günther Pernul. Towards GDPR-compliant data processing in modern SIEM systems. Computers & Security, 103:102165, 2021. ISSN 0167-4048. doi: 10.1016/j.cose.2020.102165. URL http://www.sciencedirect.com/science/article/pii/S0167404820304387.

[102] Microsoft Corporation. Debugging Tools for Windows (WinDbg, KD, CDB, NTSD), 2017. URL https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/. Accessed: 2021-02-22.

[103] Microsoft Corporation. Minidump Files, 2018. URL https://msdn.microsoft.com/en-us/library/windows/desktop/ms680369(v=vs.85).aspx. Accessed: 2021-02-22.

[104] Microsoft Corporation. Event Logging, 2018. URL https://docs.microsoft.com/en-us/windows/desktop/msi/event-logging. Accessed: 2021-02-12.

[105] Saeid Mofrad, Fengwei Zhang, Shiyong Lu, and Weidong Shi. A comparison study of Intel SGX and AMD memory encryption technology. In Proceedings of the 7th International Workshop on Hardware and Architectural Support for Security and Privacy, HASP@ISCA 2018, Los Angeles, CA, USA, June 02-02, 2018, pages 9:1–9:8. ACM, 2018. doi: 10.1145/3214292.3214301. URL https://doi.org/10.1145/3214292.3214301.

[106] Andreas Moser and Michael I. Cohen. Hunting in the enterprise: Forensic triage and incident response. Digital Investigation, 10(2):89–98, 2013. doi: 10.1016/j.diin.2013.03.003. URL https://doi.org/10.1016/j.diin.2013.03.003.

[107] Tilo Müller, Felix Freiling, and Andreas Dewald. TRESOR Runs Encryption Securely Outside RAM. In 20th USENIX Security Symposium, San Francisco, CA, USA, August 8-12, 2011, Proceedings, 2011. URL http://static.usenix.org/events/sec11/tech/full%5Fpapers/Muller.pdf.

[108] National Institute of Standards and Technology. CVE-2017-5684, 2017. URL https://nvd.nist.gov/vuln/detail/CVE-2017-5684. Accessed: 2021-02-12.

[109] National Institute of Standards and Technology. CVE-2017-5685, 2017. URL https://nvd.nist.gov/vuln/detail/CVE-2017-5685. Accessed: 2021-02-12.

[110] National Institute of Standards and Technology. CVE-2017-5686, 2017. URL https://nvd.nist.gov/vuln/detail/CVE-2017-5686. Accessed: 2021-02-12.

[111] Dmytro Oleksiuk. Building reliable SMM backdoor for UEFI based platform, 2015. URL http://blog.cr4.sh/2015/07/building-reliable-smm-backdoor-for-uefi.html. Accessed: 2021-02-12.

[112] Dmytro Oleksiuk. SmmBackdoor, 2016. URL https://github.com/Cr4sh/SmmBackdoor. Accessed: 2021-02-12.

[113] Liam O’Murchu and Fred P. Gutierrez. The evolution of the fileless click-fraud malware Poweliks. Symantec Security Response Version 1.0, Symantec Corp., 2015.

[114] Open Compute Project. Open Compute Project, 2021. URL https://www.opencompute.org/. Accessed: 2021-02-15.


[115] Oracle Corporation. VBoxManage debugvm. In Oracle Corporation, editor, Oracle VM Virtual Box User Manual, chapter 8.40. Oracle Corporation, 2018. URL https://www.virtualbox.org/manual/. Accessed: 2021-02-22.

[116] Fabio Pagani, Oleksii Fedorov, and Davide Balzarotti. Introducing the Temporal Dimension to Memory Forensics. ACM Transactions on Privacy and Security, 22(2): 9:1–9:21, 2019. doi: 10.1145/3310355. URL https://doi.org/10.1145/3310355.

[117] Ralph Palutke and Felix Freiling. Styx: Countering robust memory acquisition. Digital Investigation, 24:18–28, 2018. doi: 10.1016/j.diin.2018.01.004. URL https://doi.org/10.1016/j.diin.2018.01.004.

[118] Ralph Palutke, Simon Ruderich, Matthias Wild, and Felix Freiling. HyperLeech: Stealthy System Virtualization with Minimal Target Impact through DMA-Based Hypervisor Injection. In 23rd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2020), pages 165–179, San Sebastian, October 2020. USENIX Association. ISBN 978-1-939133-18-2. URL https://www.usenix.org/conference/raid2020/presentation/palutke.

[119] Bryan D Payne. Simplifying virtual machine introspection using libvmi. Sandia report, 2012. URL https://prod-ng.sandia.gov/techlib-noauth/access-control.cgi/2012/127818.pdf. Accessed: 2021-02-22.

[120] Bryan D Payne, DP de A Martim, and Wenke Lee. Secure and flexible monitoring of virtual machines. In Computer Security Applications Conference, 2007. ACSAC 2007. Twenty-Third Annual, pages 385–397. IEEE, 2007.

[121] PCI-SIG. PCI Express Base Specification Revision 3.0, 2010.

[122] Fabien Perigaud. Using your BMC as a DMA device: plugging PCILeech to HPE iLO 4, 2018. URL https://www.synacktiv.com/posts/exploit/using-your-bmc-as-a-dma-device-plugging-pcileech-to-hpe-ilo-4.html. Accessed: 2021-02-26.

[123] Jonas Pfoh, Christian A. Schneider, and Claudia Eckert. Nitro: Hardware-Based System Call Tracing for Virtual Machines. In Tetsu Iwata and Masakatsu Nishigaki, editors, Advances in Information and Computer Security - 6th International Workshop, IWSEC 2011, Tokyo, Japan, November 8-10, 2011. Proceedings, volume 7038 of Lecture Notes in Computer Science, pages 96–112. Springer, 2011. doi: 10.1007/978-3-642-25141-2_7. URL https://doi.org/10.1007/978-3-642-25141-2_7.

[124] Gerald J. Popek and Robert P. Goldberg. Formal Requirements for Virtualizable Third Generation Architectures. Communications of the ACM, 17(7):412–421, 1974. doi: 10.1145/361011.361073. URL http://doi.acm.org/10.1145/361011.361073.

[125] Noëlle Rakotondravony, Johannes Köstler, and Hans P. Reiser. Towards a Generic Architecture for Interactive Cost-Aware Visualization of Monitoring Data in Distributed Systems. In Proceedings of the 4th Workshop on Security in Highly Connected IT Systems, SHCIS@DAIS 2017, Neuchâtel, Switzerland, June 21 - 22, 2017, pages 25–30. ACM, 2017. doi: 10.1145/3099012.3099017. URL https://doi.org/10.1145/3099012.3099017.

[126] Rapid 7. A Penetration Tester’s Guide to IPMI and BMCs, 2013. URL https://blog.rapid7.com/2013/07/02/a-penetration-testers-guide-to-ipmi/. Accessed: 2021-02-22.


[127] Saradha Ravi, N Balakrishnan, and Bharath Venkatesh. Behavior-based Malware analysis using profile hidden Markov models. In 2013 International Conference on Security and Cryptography (SECRYPT), pages 1–12. IEEE, 2013.

[128] Konrad Rieck, Thorsten Holz, Carsten Willems, Patrick Düssel, and Pavel Laskov. Learning and Classification of Malware Behavior. In Detection of Intrusions and Malware, and Vulnerability Assessment, 5th International Conference, DIMVA 2008, Paris, France, July 10-11, 2008. Proceedings, volume 5137 of Lecture Notes in Computer Science, pages 108–125. Springer, 2008. doi: 10.1007/978-3-540-70542-0_6. URL https://doi.org/10.1007/978-3-540-70542-0_6.

[129] Jordan Robertson and Michael Riley. The Big Hack: How China Used a Tiny Chip to Infiltrate U.S. Companies. Bloomberg Businessweek, 4, 2018. URL https://www.bloomberg.com/news/features/2018-10-04/the-big-hack-how-china-used-a-tiny-chip-to-infiltrate-america-s-top-companies. Accessed: 2021-02-17.

[130] Jordan Robertson and Michael Riley. The Long Hack: How China Exploited a U.S. Tech Supplier. Bloomberg, 2021. URL https://www.bloomberg.com/features/2021-supermicro/. Accessed: 2021-02-17.

[131] Robert Rowlingson. A Ten Step Process for Forensic Readiness. International Journal of Digital Evidence, 2(3), 2004. URL http://www.utica.edu/academic/institutes/ecii/publications/articles/A0B13342-B4E0-1F6A-156F501C49CF5F51.pdf.

[132] Mark Russinovich. Process Explorer v16.32, 2020. URL https://docs.microsoft.com/en-us/sysinternals/downloads/process-explorer. Accessed: 2021-02-22.

[133] Mark Russinovich and Andrew Richard. ProcDump v10.0, 2020. URL https://docs.microsoft.com/en-us/sysinternals/downloads/procdump. Accessed: 2021-02-22.

[134] Eric Shobe and Jared Mednick. RunBMC: OCP hardware spec solves data center BMC pain points, 2019. URL https://blogs.dropbox.com/tech/2019/08/runbmc-ocp-hardware-spec-solves-data-center-bmc-pain-points/. Accessed: 2021-02-22.

[135] Soflen. DCI Connection Problems, 2018. URL https://community.intel.com/t5/Intel-System-Studio/DCI-Connection-Problems/td-p/1160475. Accessed: 2021-02-22.

[136] Software Freedom Conservancy. Selenium - Web Browser Automation, 2021. URL https://www.selenium.dev/. Accessed: 2021-02-12.

[137] Juraj Somorovsky, Mario Heiderich, Meiko Jensen, Jörg Schwenk, Nils Gruschka, and Luigi Lo Iacono. All your clouds are belong to us: security analysis of cloud management interfaces. In Proceedings of the 3rd ACM workshop on Cloud computing security workshop, pages 3–14, 2011.

[138] Sherri Sparks and Jamie Butler. Shadow walker: Raising the bar for rootkit detection. Black Hat Japan, 11(63):504–533, 2005.

[139] Henrik Stoerner. The Xymon Monitor, 2009. URL https://xymon.sourceforge.io/. Accessed: 2021-02-22.


[140] Johannes Stüttgen and Michael Cohen. Anti-forensic resilient memory acquisition. Digital Investigation, 10:105–115, 2013. ISSN 1742-2876. doi: https://doi.org/10.1016/j.diin.2013.06.012. URL https://www.sciencedirect.com/science/article/pii/S1742287613000583.

[141] Johannes Stüttgen and Michael Cohen. Robust Linux memory acquisition with minimal target impact. Digital Investigation, 11(1):112–119, 2014. doi: 10.1016/j.diin.2014.03.014. URL https://doi.org/10.1016/j.diin.2014.03.014.

[142] Matt Suiche. DumpIt, 2017. URL https://www.comae.com/dumpit-memory-forensics-malware-analysis/. Accessed: 2021-02-16.

[143] He Sun, Kun Sun, Yuewu Wang, Jiwu Jing, and Sushil Jajodia. TrustDump: Reliable Memory Acquisition on Smartphones. In Computer Security - ESORICS 2014 - 19th European Symposium on Research in Computer Security, Wroclaw, Poland, September 7-11, 2014. Proceedings, Part I, pages 202–218, 2014. doi: 10.1007/978-3-319-11203-9_12. URL https://doi.org/10.1007/978-3-319-11203-9%5F12.

[144] Joe Sylve. LiME - Linux Memory Extractor, 2012. URL https://github.com/504ensicsLabs/LiME. Accessed: 2021-02-22.

[145] Benjamin Taubmann and Bojan Kolosnjaji. Architecture for Resource-Aware VMI-based Cloud Malware Analysis. In Proceedings of the 4th Workshop on Security in Highly Connected IT Systems, SHCIS@DAIS 2017, Neuchâtel, Switzerland, June 21 - 22, 2017, pages 43–48. ACM, 2017. doi: 10.1145/3099012.3099015. URL https://doi.org/10.1145/3099012.3099015.

[146] Rekall Team. Rekall Memory Forensic Framework: About the Rekall Memory Forensic Framework, 2015. URL http://www.rekall-forensic.com. Accessed: 2021-02-22.

[147] The Apache Software Foundation. Apache Kafka - A distributed streaming platform, 2019. URL https://kafka.apache.org/. Accessed: 2021-02-12.

[148] The Apache Software Foundation. Apache HTTP Server Project, 2020. URL https://httpd.apache.org/. Accessed: 2021-02-22.

[149] The kernel development community. Documentation for Kdump - The kexec-based Crash Dumping Solution, 2017. URL https://www.kernel.org/doc/html/latest/admin-guide/kdump/kdump.html. Accessed: 2021-02-22.

[150] TianoCore. EDK II Project, 2020. URL https://github.com/tianocore/edk2. Accessed: 2021-02-22.

[151] TianoCore. EDK II Driver Writer’s Guide, 2021. URL https://edk2-docs.gitbook.io/edk-ii-uefi-driver-writer-s-guide/5_uefi_services. Accessed: 2021-02-22.

[152] Robert Triggs. Don’t be duped by performance, Apple’s M1 silicon is all about platform control, 2020. URL https://www.androidauthority.com/apple-m1-chip-platform-control-1178210/. Accessed: 2021-02-15.

[153] Unified EFI Forum. UEFI Shell Specification, 2008. URL https://www.uefi.org/sites/default/files/resources/UEFI_Shell_Spec_2_0.pdf. Accessed: 2021-02-22.


[154] Timothy Vidas, Daniel Votipka, and Nicolas Christin. All Your Droid Are Belong to Us: A Survey of Current Android Attacks. In WOOT, pages 81–90, 2011.

[155] Stefan Vömel and Felix Freiling. A survey of main memory acquisition and analysis techniques for the Windows operating system. Digital Investigation, 8(1):3–22, 2011. doi: 10.1016/j.diin.2011.06.002. URL https://doi.org/10.1016/j.diin.2011.06.002.

[156] Stefan Vömel and Felix Freiling. Correctness, atomicity, and integrity: Defining criteria for forensically-sound memory acquisition. Digital Investigation, 9(2):125–137, 2012. doi: 10.1016/j.diin.2012.04.005. URL https://doi.org/10.1016/j.diin.2012.04.005.

[157] Stefan Vömel and Johannes Stüttgen. An evaluation platform for forensic memory acquisition software. Digital Investigation, 10:30–40, 2013. ISSN 1742-2876. doi: http://dx.doi.org/10.1016/j.diin.2013.06.004. URL http://www.sciencedirect.com/science/article/pii/S1742287613000509.

[158] Carsten Willems, Thorsten Holz, and Felix Freiling. Toward Automated Dynamic Malware Analysis Using CWSandbox. IEEE Security & Privacy, 5(2):32–39, 2007. doi: 10.1109/MSP.2007.45. URL https://doi.org/10.1109/MSP.2007.45.

[159] Neeraja J Yadwadkar, Chiranjib Bhattacharyya, Kanchi Gopinath, Thirumale Niranjan, and Sai Susarla. Discovery of Application Workloads from Network File Traces. In FAST, pages 183–196, 2010.

[160] Salessawi Ferede Yitbarek, Misiker Tadesse Aga, Reetuparna Das, and Todd Austin. Cold Boot Attacks are Still Hot: Security Analysis of Memory Scramblers in Modern Processors. In High Performance Computer Architecture (HPCA), 2017 IEEE International Symposium on, pages 313–324. IEEE, 2017.

[161] Miao Yu, Zhengwei Qi, Qian Lin, Xianming Zhong, Bingyu Li, and Haibing Guan. Vis: Virtualization enhanced live forensics acquisition for native system. Digital Investigation, 9(1):22–33, 2012.

[162] William Yurcik, Cristina Abad, Ragib Hasan, Moazzam Saleem, and Shyama Sridharan. UCLog+: A security data management system for correlating alerts, incidents, and raw data from remote logs. arXiv preprint cs/0607111, 2006.

[163] Whitney Zhao and Jia Ning. Facebook 2S Server Tioga Pass Rev 1.0, 2018. URL https://www.opencompute.org/documents/facebook-2s-server-tioga-pass-specification. Accessed: 2021-01-30.


A Supplement Material of Forensic Fingerprints

In this chapter, we provide the raw data of the (non-)characteristic fingerprints from Chapter 3. Table A.1 shows the influence of the feature set and the source on non-characteristic fingerprints. Table A.2 shows the influence of the feature set and the source on characteristic fingerprints. Table A.3 shows the influence of the reference set and the feature set on characteristic fingerprints.

Table A.1: The table shows the amount of feature vectors using different feature sets (see Figure 3.3). For each event, the four counts per source correspond to Feature Sets 1–4; the total is the sum over all sources. Source rows that are all zero are omitted.

ls: syscalls 1518 1518 1468 1468; total 1518 1518 1468 1468
cp: syscalls 1563 1563 1548 1548; total 1563 1563 1548 1548
mv: syscalls 1482 1482 1476 1476; total 1482 1482 1476 1476
cat: syscalls 1461 1461 1463 1463; total 1461 1461 1463 1463
vmstat: syscalls 1630 1630 1626 1626; total 1630 1630 1626 1626
netstat: syscalls 1708 1708 1708 1708; total 1708 1708 1708 1708
tar: syscalls 3630 3630 3622 3622; total 3630 3630 3622 3622
rm: syscalls 1457 1457 1433 1433; total 1457 1457 1433 1433
shred: syscalls 1500 1500 1497 1497; total 1500 1500 1497 1497
curl: syscalls 2923 2923 2774 2774; total 2923 2923 2774 2774
tailShadow: auth.log 4 4 4 4; syscalls 5190 5190 5176 5176; total 5194 5194 5180 5180
catCredentials: auth.log 4 4 4 4; syscalls 5576 5576 5224 5224; total 5580 5580 5228 5228
vimHosts: auth.log 3 3 3 3; syscalls 6230 6230 6192 6192; total 6233 6233 6195 6195
rmSudo: auth.log 3 3 2 2; syscalls 5971 5970 5442 5442; total 5974 5973 5444 5444
shredSudo: auth.log 6 6 6 6; syscalls 5853 5853 5725 5725; total 5859 5859 5731 5731
wordpressLogin: access.log 19 19 19 19; syscalls 10284 10284 9891 9891; total 10303 10303 9910 9910
wordpressSearch: access.log 10 10 10 10; syscalls 8534 8534 8491 8491; total 8544 8544 8501 8501
wordpressOpen: access.log 10 10 10 10; syscalls 7945 7945 7935 7935; total 7955 7955 7945 7945
sshLogin: syslog 1 1 1 1; auth.log 4 4 4 4; syscalls 46591 46590 45957 45956; total 46596 46595 45962 45961
apacheStop: auth.log 4 4 4 4; syscalls 85990 85978 82822 82822; total 85994 85982 82826 82826
mysqlWp: syscalls 7382 7382 7375 7375; total 7382 7382 7375 7375
lsmod: syscalls 6128 6128 6122 6122; total 6128 6128 6122 6122
insmod: syslog 1 1 1 1; auth.log 3 3 2 2; syscalls 3516 3516 3485 3485; total 3520 3520 3488 3488
rmmod: syslog 1 1 1 1; auth.log 2 2 1 1; syscalls 3506 3506 3474 3474; total 3509 3509 3476 3476
dockerHelloWorld: syslog 24 8 23 7; syscalls 83921 83900 82221 82200; total 83945 83908 82244 82207
dockerUbuntuLog: syslog 24 8 23 7; syscalls 88206 88184 86643 86621; total 88230 88192 86666 86628
dockerImages: syscalls 4737 4737 4752 4752; total 4737 4737 4752 4752
dockerPs: syscalls 4684 4684 4495 4495; total 4684 4684 4495 4495
dockerPSA: syscalls 7131 7131 6839 6839; total 7131 7131 6839 6839
dockerUbuntuSleep: syslog 14 5 13 4; syscalls 68427 68412 67220 67205; total 68441 68417 67233 67209
dockerRm: syscalls 6978 6978 6971 6971; total 6978 6978 6971 6971
dockerNginx: syslog 12 4 11 3; syscalls 57415 57398 55640 55627; total 57427 57402 55651 55630
dockerUbuntuBash: syscalls 8505 8501 8116 8116; total 8505 8501 8116 8116
dockerPrune: syscalls 23925 23925 23447 23447; total 23925 23925 23447 23447
dockerPruneVolumes: syscalls 25656 25656 25650 25650; total 25656 25656 25650 25650
dockerRmImages: syscalls 24784 24784 24780 24780; total 24784 24784 24780 24780
dockerUbuntuBashCp: syscalls 54123 54118 53099 53099; total 54123 54118 53099 53099
dockerUbuntuBashMv: syslog 24 8 23 7; syscalls 77792 77771 76665 76644; total 77816 77779 76688 76651
dockerUbuntuBashRm: syslog 14 5 13 4; syscalls 71234 71197 69774 69752; total 71248 71202 69787 69756
dockerUbuntuBashCat: syslog 14 5 13 4; syscalls 74314 74277 73054 73032; total 74328 74282 73067 73036
nextcloudStatus: auth.log 4 4 4 4; syscalls 25670 25670 25522 25522; total 25674 25674 25526 25526
nextcloudAppList: auth.log 4 4 4 4; syscalls 30021 30021 29785 29785; total 30025 30025 29789 29789
nextcloudUserList: auth.log 4 4 4 4; syscalls 25675 25675 25107 25107; total 25679 25679 25111 25111
nextcloudUserAdd: auth.log 4 4 4 4; syscalls 39760 39760 39182 39182; total 39764 39764 39186 39186
nextcloudGroupList: auth.log 4 4 4 4; syscalls 25260 25260 25163 25163; total 25264 25264 25167 25167


Table A.2: The table shows the impact of feature set and source on characteristic fingerprints. Per event and feature set (FS1–FS4), the counts are listed in the order syslog, auth.log, access.log, syscalls, total.

ls: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 0 0; FS4 0 0 0 1 1
cp: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 4 4; FS4 0 0 0 4 4
mv: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 1 1; FS4 0 0 0 2 2
cat: FS1–FS4 all 0 0 0 0 0
vmstat: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 6 6; FS4 0 0 0 6 6
netstat: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 15 15; FS4 0 0 0 15 15
tar: FS1 0 0 0 1 1; FS2 0 0 0 4 4; FS3 0 0 0 5 5; FS4 0 0 0 5 5
rm: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 0 0; FS4 0 0 0 1 1
shred: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 1 1; FS4 0 0 0 2 2
curl: FS1 0 0 0 0 0; FS2 0 0 0 0 0; FS3 0 0 0 1 1; FS4 0 0 0 1 1
tailShadow: FS1 0 0 0 0 0; FS2 0 0 0 2 2; FS3 0 1 0 5 6; FS4 0 1 0 6 7
catCredentials: FS1 0 0 0 0 0; FS2 0 0 0 2 2; FS3 0 1 0 1 2; FS4 0 1 0 3 4
vimHosts: FS1 0 0 0 1 1; FS2 0 0 0 3 3; FS3 0 1 0 218 219; FS4 0 1 0 219 220
rmSudo: FS1 0 0 0 0 0; FS2 0 0 0 2 2; FS3 0 0 0 0 0; FS4 0 0 0 2 2
shredSudo: FS1 0 1 0 0 1; FS2 0 1 0 2 3; FS3 0 2 0 5 7; FS4 0 2 0 7 9
wordpressLogin: FS1 0 0 2 8 10; FS2 0 0 2 8 10; FS3 0 0 8 54 62; FS4 0 0 8 54 62
wordpressSearch: FS1 0 0 0 0 0; FS2 0 0 0 0 0; FS3 0 0 1 1 2; FS4 0 0 1 1 2
wordpressOpen: FS1–FS4 all 0 0 0 0 0
sshLogin: FS1 1 3 0 408 412; FS2 1 3 0 462 466; FS3 1 3 0 2202 2206; FS4 1 3 0 2215 2219
apacheStop: FS1 0 0 0 1 1; FS2 0 0 0 15 15; FS3 0 1 0 1706 1707; FS4 0 1 0 1711 1712
mysqlWp: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 57 57; FS4 0 0 0 57 57
lsmod: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 251 251; FS4 0 0 0 251 251
insmod: FS1 0 0 0 1 1; FS2 0 0 0 3 3; FS3 0 0 0 9 9; FS4 0 0 0 10 10
rmmod: FS1 0 0 0 1 1; FS2 0 0 0 3 3; FS3 0 0 0 11 11; FS4 0 0 0 12 12
dockerHelloWorld: FS1 0 0 0 1 1; FS2 0 0 0 3 3; FS3 0 0 0 28 28; FS4 0 0 0 28 28
dockerUbuntuLog: FS1 0 0 0 0 0; FS2 0 0 0 5 5; FS3 0 0 0 20 20; FS4 0 0 0 23 23
dockerImages: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 0 0; FS4 0 0 0 1 1
dockerPs: FS1–FS4 all 0 0 0 0 0
dockerPSA: FS1–FS4 all 0 0 0 0 0
dockerUbuntuSleep: FS1 0 0 0 0 0; FS2 0 0 0 2 2; FS3 0 0 0 1 1; FS4 0 0 0 2 2
dockerRm: FS1–FS4 all 0 0 0 0 0
dockerNginx: FS1 0 0 0 0 0; FS2 0 0 0 8 8; FS3 0 0 0 64 64; FS4 0 0 0 65 65
dockerUbuntuBash: FS1–FS4 all 0 0 0 0 0
dockerPrune: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 0 0; FS4 0 0 0 1 1
dockerPruneVolumes: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 0 0; FS4 0 0 0 1 1
dockerRmImages: FS1 0 0 0 0 0; FS2 0 0 0 2 2; FS3 0 0 0 0 0; FS4 0 0 0 2 2
dockerUbuntuBashCp: FS1–FS4 all 0 0 0 0 0
dockerUbuntuBashMv: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 17 17; FS4 0 0 0 18 18
dockerUbuntuBashRm: FS1 0 0 0 0 0; FS2 0 0 0 1 1; FS3 0 0 0 2 2; FS4 0 0 0 3 3
dockerUbuntuBashCat: FS1 0 0 0 0 0; FS2 0 0 0 0 0; FS3 0 0 0 25 25; FS4 0 0 0 24 24
nextcloudStatus: FS1 0 0 0 0 0; FS2 0 0 0 2 2; FS3 0 1 0 0 1; FS4 0 1 0 2 3
nextcloudAppList: FS1 0 0 0 0 0; FS2 0 0 0 2 2; FS3 0 1 0 41 42; FS4 0 1 0 43 44
nextcloudUserList: FS1 0 0 0 0 0; FS2 0 0 0 2 2; FS3 0 1 0 0 1; FS4 0 1 0 2 3
nextcloudUserAdd: FS1 0 0 0 0 0; FS2 0 0 0 16 16; FS3 0 1 0 86 87; FS4 0 1 0 102 103
nextcloudGroupList: FS1 0 0 0 0 0; FS2 0 0 0 2 2; FS3 0 1 0 2 3; FS4 0 1 0 4 5


Table A.3: The table shows the amount of feature vectors of characteristic fingerprints using different feature sets and reference sets. Per event, each pair gives (Σ′: Same Class, Σ′: All) for Feature Sets 1–4.

ls: FS1 0 0; FS2 1 1; FS3 1 0; FS4 1 1
cp: FS1 0 0; FS2 1 1; FS3 5 4; FS4 5 4
mv: FS1 0 0; FS2 1 1; FS3 2 1; FS4 2 2
cat: FS1 0 0; FS2 1 0; FS3 1 0; FS4 1 0
vmstat: FS1 1 0; FS2 2 1; FS3 12 6; FS4 12 6
netstat: FS1 0 0; FS2 1 1; FS3 50 15; FS4 50 15
tar: FS1 132 1; FS2 135 4; FS3 136 5; FS4 136 5
rm: FS1 2 0; FS2 3 1; FS3 3 0; FS4 3 1
shred: FS1 20 0; FS2 21 1; FS3 23 1; FS4 23 2
curl: FS1 0 0; FS2 0 0; FS3 1 1; FS4 1 1
tailShadow: FS1 19 0; FS2 21 2; FS3 61 6; FS4 62 7
catCredentials: FS1 1 0; FS2 3 2; FS3 12 2; FS4 13 4
vimHosts: FS1 72 1; FS2 74 3; FS3 306 219; FS4 307 220
rmSudo: FS1 1 0; FS2 3 2; FS3 8 0; FS4 9 2
shredSudo: FS1 43 1; FS2 45 3; FS3 50 7; FS4 51 9
wordpressLogin: FS1 80 10; FS2 80 10; FS3 133 62; FS4 133 62
wordpressSearch: FS1 0 0; FS2 0 0; FS3 2 2; FS4 2 2
wordpressOpen: FS1–FS4 all 0 0
sshLogin: FS1 630 412; FS2 685 466; FS3 2610 2206; FS4 2611 2219
apacheStop: FS1 235 1; FS2 249 15; FS3 2389 1707; FS4 2392 1712
mysqlWp: FS1 0 0; FS2 1 1; FS3 62 57; FS4 62 57
lsmod: FS1 25 0; FS2 26 1; FS3 293 251; FS4 293 251
insmod: FS1 2 1; FS2 4 3; FS3 10 9; FS4 11 10
rmmod: FS1 1 1; FS2 3 3; FS3 12 11; FS4 13 12
dockerHelloWorld: FS1 1 1; FS2 3 3; FS3 38 28; FS4 38 28
dockerUbuntuLog: FS1 1 0; FS2 6 5; FS3 35 20; FS4 37 23
dockerImages: FS1 0 0; FS2 1 1; FS3 0 0; FS4 1 1
dockerPs: FS1–FS4 all 0 0
dockerPSA: FS1–FS4 all 0 0
dockerUbuntuSleep: FS1 0 0; FS2 2 2; FS3 1 1; FS4 2 2
dockerRm: FS1–FS4 all 0 0
dockerNginx: FS1 3 0; FS2 11 8; FS3 278 64; FS4 279 65
dockerUbuntuBash: FS1–FS4 all 0 0
dockerPrune: FS1 0 0; FS2 1 1; FS3 0 0; FS4 1 1
dockerPruneVolumes: FS1 0 0; FS2 1 1; FS3 0 0; FS4 1 1
dockerRmImages: FS1 0 0; FS2 2 2; FS3 0 0; FS4 2 2
dockerUbuntuBashCp: FS1 0 0; FS2 0 0; FS3 1 0; FS4 1 0
dockerUbuntuBashMv: FS1 0 0; FS2 1 1; FS3 18 17; FS4 18 18
dockerUbuntuBashRm: FS1 0 0; FS2 1 1; FS3 14 2; FS4 14 3
dockerUbuntuBashCat: FS1 1 0; FS2 2 0; FS3 38 25; FS4 37 24
nextcloudStatus: FS1 3 0; FS2 5 2; FS3 5 1; FS4 7 3
nextcloudAppList: FS1 0 0; FS2 2 2; FS3 42 42; FS4 44 44
nextcloudUserList: FS1 0 0; FS2 2 2; FS3 1 1; FS4 3 3
nextcloudUserAdd: FS1 23 0; FS2 39 16; FS3 116 87; FS4 132 103
nextcloudGroupList: FS1 0 0; FS2 2 2; FS3 3 3; FS4 5 5
