Coverage-Guided Fuzzing of Embedded Firmware with Avatar2
Università degli Studi di Padova
Dipartimento di Matematica "Tullio Levi-Civita"
Corso di Laurea in Informatica

Coverage-guided fuzzing of embedded firmware with avatar2

Supervisor: Prof. Mauro Conti, Università di Padova
Candidate: Andrea Biondo
Co-supervisor: Marius Muench, EURECOM
Tutor: Prof. Claudio Enrico Palazzi, Università di Padova

To my family, who always supports me (and bears with the 3am keyboard noise)

Abstract

Since the eighties, fuzz testing has been used to stress applications and find problems in them. The basic idea is to feed malformed inputs to the program, with the goal of stimulating buggy code paths that produce incorrect behavior. Early fuzzers used primitive methods to generate such inputs, mostly based on random generation and mutation. Modern fuzzers have reached high levels of efficiency, often by leveraging feedback information about how a program's control flow is influenced by the inputs, and have uncovered a large number of bugs, many with security implications. However, those fuzzers are targeted towards normal, general-purpose systems. The state of fuzzing on embedded devices is not as developed. Due to distinctive traits of this kind of device, such as atypical operating systems, low-level peripheral access and limited memory protection, fuzzing tools do not reach the same efficiency, and fuzzing jobs are more expensive to set up.

In this work, we present AFLtar, a coverage-guided fuzzer for embedded firmware. AFLtar leverages avatar2, an orchestration framework for dynamic analysis, along with the American Fuzzy Lop coverage-guided fuzzer and the AFL-Unicorn CPU emulator. The goal of AFLtar is to reduce the cost of embedded fuzzing by providing a platform that can be used to quickly set up a firmware fuzzing job, while reaping the benefits of modern, feedback-driven fuzzing strategies.

Sommario

Since the eighties, fuzzing has been used to stress applications and find problems within them. The basic idea is to feed malformed inputs to the program, with the goal of stimulating problematic code paths that produce incorrect behavior. Early fuzzers used rudimentary methods to generate these inputs, mostly based on random generation and mutation. Modern fuzzers have reached high levels of efficiency, often by exploiting feedback information about how the input influences the program's control flow, and have discovered large numbers of bugs, many of which have security implications. These fuzzers, however, are built for normal, general-purpose systems. In the embedded domain, the state of fuzzing is not as developed. Due to some distinctive traits of these devices, such as atypical operating systems, low-level peripheral access, and limited memory protection, fuzzing tools do not reach the same efficiency, and the process is more expensive. In this work, we present AFLtar, a coverage-guided fuzzer for embedded firmware. AFLtar leverages avatar2, an orchestration framework for dynamic analysis, together with the coverage-guided fuzzer American Fuzzy Lop and the AFL-Unicorn CPU emulator. The goal of AFLtar is to reduce the cost of fuzzing embedded firmware by providing a platform that can be used to quickly start fuzzing jobs, while at the same time reaping the benefits of modern, feedback-driven fuzzing strategies.

Contents

1 Introduction
  1.1 Contribution
  1.2 Organization
2 Background
  2.1 Program analysis
    2.1.1 Control flow analysis
    2.1.2 Coverage
    2.1.3 Static analysis
    2.1.4 Dynamic analysis
  2.2 Fuzzing
  2.3 Embedded devices
3 Technologies
  3.1 American Fuzzy Lop
    3.1.1 Coverage measurement
    3.1.2 Test case evolution
    3.1.3 Culling and trimming
    3.1.4 Mutation strategies
    3.1.5 Crash reporting
    3.1.6 The forkserver
    3.1.7 QEMU mode
  3.2 Unicorn
    3.2.1 Overview
    3.2.2 Instrumentation
  3.3 AFL-Unicorn
    3.3.1 Overview
    3.3.2 Driver workflow
  3.4 Avatar2
    3.4.1 Architecture
4 AFLtar
  4.1 avatar2 API
    4.1.1 Top-level API
    4.1.2 Execution protocol
    4.1.3 Memory protocol
    4.1.4 Register protocol
    4.1.5 Remote memory protocol
    4.1.6 Target
  4.2 Unicorn API
    4.2.1 Execution, registers and memory
    4.2.2 Hooks
  4.3 Unicorn bugs
    4.3.1 Issue A: wrong PC after stopping from hook
    4.3.2 Issue B: cannot stop from different thread while in hook
    4.3.3 Issue C: crash when stopping from different thread while in hook
  4.4 Design
    4.4.1 Message passing
    4.4.2 Hooks
    4.4.3 Breakpoint and watchpoint handling
    4.4.4 Emulation
    4.4.5 Memory forwarding
    4.4.6 Additions to the standard API
    4.4.7 Fuzzing driver
5 Evaluation
  5.1 Experiment design
  5.2 Results
  5.3 Discussion
  5.4 Future work
6 Conclusion
References

Listing of figures

2.1 The line_length function's control flow graph (compiled for x86_64).
2.2 The line_length control flow graph without instructions.
2.3 Verified property sets in sound and complete static analysis, compared to real program behavior.
3.1 AFL architecture.
3.2 Example CFG fragment.
3.3 Example traces used in the text.
3.4 Workflow of AFL's QEMU mode.
3.5 Workflow of AFL-Unicorn's Unicorn mode.
3.6 Workflow of a typical AFL-Unicorn driver.
3.7 avatar2 architecture.
4.1 General AFLtar architecture.
4.2 Sequence diagram for Unicorn protocol message passing.
4.3 Flow chart for Unicorn hook handling.
4.4 Flow chart for the breakpoint hook.
4.5 Flow chart for emulation start (protocol side).
4.6 Flow chart for emulation start (endpoint side).
5.1 Experimental hardware. On the right, the STMicroelectronics NUCLEO-L152RE board, which integrates an STM32L152RE microcontroller (bottom) and an ST-LINK/V2-1 programming and debugging interface (top). On the left, an RS232-USB converter based on the FTDI FT232RL chip, connected to the microcontroller's UART interface.
5.2 Total executions vs. total paths in our experiments.

Listing of tables

5.1 Experimental results.
There are three principal means of acquiring knowledge available to us: observation of nature, reflection, and experimentation. Observation collects facts; reflection combines them; experimentation verifies the result of that combination.
Denis Diderot

1 Introduction

Fall of 1988. It was a dark and stormy night over the city of Madison, Wisconsin. The weather was particularly bothersome for Prof. Barton Miller, who was trying to connect to his office's Unix system from his apartment. The rain was causing constant noise on his 1200-baud line, which made it hard to type commands into the shell. He was not surprised by the noise itself, but by how the corrupted input made common Unix utilities crash. Could noisy, garbled or random input be used as a testing tool to find bugs in software? Miller decided to study this phenomenon, and assigned it as a project for students in his Advanced Operating System class: he dubbed it The Fuzz Generator. One group wrote a fuzzer that crashed a quarter to a third of the utilities across seven different Unix variants [1]. Later, Miller discovered that this idea was not new: in 1983, Steve Capps at Apple had written The Monkey, a testing tool that generated random GUI events to test Macintosh applications [2].

Nowadays, fuzzing is a commonly employed testing technique that has seen extensive improvement over the years. In particular, fuzzing has a significant impact on security testing. Smart fuzzers that build feedback by observing the program's internal behavior in response to inputs have identified a large number of vulnerabilities in complex, high-value software [3, 4]. However, the situation is not so encouraging in the embedded world, whose security is becoming increasingly important