Advanced Systems Security: Symbolic Execution

Systems and Internet Infrastructure Security
Network and Security Research Center
Department of Computer Science and Engineering
Pennsylvania State University, University Park PA

Trent Jaeger
Systems and Internet Infrastructure Security (SIIS) Lab
Computer Science and Engineering Department
Pennsylvania State University

Problem
• Programs have flaws
  ‣ Can we find (and fix) them before adversaries can reach and exploit them?
• Then, they become “vulnerabilities”

Vulnerabilities
• A program vulnerability consists of three elements:
  ‣ A flaw
  ‣ Accessible to an adversary
  ‣ Adversary has the capability to exploit the flaw
• Which one should we focus on to find and fix vulnerabilities?
  ‣ Different methods are used for each

Is It a Flaw?
• Problem: Are these flaws?
  ‣ Failing to check the bounds on a buffer write?
  ‣ printf(char *n);
  ‣ strcpy(char *ptr, char *user_data);
  ‣ gets(char *ptr)
    • From man page: “Never use gets().”
  ‣ sprintf(char *s, const char *template, ...)
    • From gnu.org: “Warning: The sprintf function can be dangerous because it can potentially output more characters than can fit in the allocation size of the string s.”
• Should you fix all of these?

Is It a Flaw?
• Problem: Are these flaws?
  ‣ open( filename, O_CREAT );
    • Need O_EXCL to ensure only new file is created
  ‣ stat, then open
    • To check access/ownership before open
  ‣ mkdir, then chmod
    • Change perms after making a directory
  ‣ x = x + n; (all ints)
    • Yes, this is just addition…
  ‣ strncpy(filename, user_data, n), then open(filename)
• Should you fix all of these?
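Three of the items on the second "Is It a Flaw?" slide (the int addition, strncpy followed by open, and open with O_CREAT) can be made concrete with a small sketch. The helper names checked_add, copy_name, and create_new_file are hypothetical and do not come from the slides; the sketch assumes a POSIX system and a NUL-terminated user_data string, and shows one possible defensive form of each operation rather than a definitive fix.

```c
#include <assert.h>
#include <fcntl.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* "x = x + n" on ints: signed overflow is undefined behavior in C,
 * so test the range before adding instead of inspecting the result. */
bool checked_add(int x, int n, int *sum)
{
    assert(sum != NULL);
    if ((n > 0 && x > INT_MAX - n) || (n < 0 && x < INT_MIN - n))
        return false;           /* would overflow; *sum is left untouched */
    *sum = x + n;
    return true;
}

/* strncpy() does not NUL-terminate when the source holds n or more bytes,
 * so reject over-long names instead of handing a truncated or
 * unterminated buffer to open(). Assumes user_data is NUL-terminated. */
bool copy_name(char *dst, size_t dst_size, const char *user_data)
{
    assert(dst != NULL && user_data != NULL);
    if (dst_size == 0 || strlen(user_data) >= dst_size)
        return false;
    strncpy(dst, user_data, dst_size - 1);
    dst[dst_size - 1] = '\0';
    return true;
}

/* open() with O_CREAT alone also succeeds on a file that already exists;
 * adding O_EXCL makes the call fail with EEXIST unless this call
 * actually created the file. Returns a file descriptor or -1. */
int create_new_file(const char *filename)
{
    assert(filename != NULL);
    return open(filename, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);
}
```

Whether the unchecked originals count as vulnerabilities still depends on the other two elements from the "Vulnerabilities" slide: whether an adversary can reach the call and control n, user_data, or the file system namespace.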
Finding Flaws
• Researchers have explored program analysis methods to find flaws that may lead to vulnerabilities
• Dynamic analysis
  ‣ Dynamic program analysis is the analysis of computer software that is performed by running the program
  ‣ Can perform dynamic analysis easily on binaries
• Static analysis
  ‣ Static program analysis is the analysis of computer software that is performed without running the program
  ‣ Generally, use the source code, but not required

Dynamic Analysis
• Given the code, find whether the code may reach an execution state that fails an invariant
  ‣ Choose sequences of inputs to provide to the program
    • The choice of sequences can include random inputs (fuzzing)
  ‣ See what happens – does it violate an invariant?
    • Does it crash?
  ‣ Keep trying – must try all paths on all inputs
    • Not much in the way of feedback usually
• Not as much theory here
  ‣ Can abstract conditions under which result holds

Dynamic Analysis – Limits
• Have both access control policy and program system calls
• Inherently results in false negatives
  ‣ Attacks require special conditions
    • Current working directory, links, …
    • Can’t run all cases…
• Still, many false positives
  ‣ Program code might defend itself
• Manual audits impractical
  ‣ In our study, only 13% of adversary-accessible name resolutions are actually vulnerable

Code shown on the slide:

    /* filename = /var/mail/root */
    /* First, check if file already exists */
    fd = open (filename, flg);
    if (fd == -1) {
        /* Create the file */
        fd = open(filename, O_CREAT|O_EXCL);
        if (fd < 0) {
            return errno;
        }
    }
    /* We now have a file. Make sure
       we did not open a symlink. */
    struct stat fdbuf, filebuf;
    if (fstat (fd, &fdbuf) == -1)
        return errno;
    if (lstat (filename, &filebuf) == -1)
        return errno;
    /* Now check if file and fd reference the same file,
       file only has one link, file is plain file. */
    if ((fdbuf.st_dev != filebuf.st_dev
        || fdbuf.st_ino != filebuf.st_ino
        || fdbuf.st_nlink != 1
        || filebuf.st_nlink != 1
        || (fdbuf.st_mode & S_IFMT) != S_IFREG)) {
        error (_("%s must be a plain file with one link"), filename);
        close (fd);
        return EINVAL;
    }
    /* If we get here, all checks passed.
       Start using the file */
    read(fd, ...)

Static Analysis
• Given the code, find whether the code may reach an execution state that fails an invariant
  ‣ grep – find existence of a dangerous command
    • Syntactic
  ‣ Examine possible executions
    • Semantic
    • “Run program in aggregate” – maintain an approximation of the possible values that can reach a statement
    • “Run in a non-standard way” – evaluate each function separately and stitch their executions together
• There is a lot of theory and experience on this!

Static Analysis 2
• “Run program in aggregate” – maintain an approximation of the possible values that can reach a statement
• “Run in a non-standard way” – evaluate each function separately and stitch their executions together

[Excerpt from the KLEE paper shown on the slide]

     1: void expand(char *arg, unsigned char *buffer) {     8
     2:   int i, ac;                                         9
     3:   while (*arg) {                                     10*
     4:     if (*arg == '\\') {                              11*
     5:       arg++;
     6:       i = ac = 0;
     7:       if (*arg >= '0' && *arg <= '7') {
     8:         do {
     9:           ac = (ac << 3) + *arg++ - '0';
    10:           i++;
    11:         } while (i < 4 && *arg >= '0' && *arg <= '7');
    12:         *buffer++ = ac;
    13:       } else if (*arg != '\0')
    14:         *buffer++ = *arg++;
    15:     } else if (*arg == '[') {                        12*
    16:       arg++;                                         13
    17:       i = *arg++;                                    14
    18:       if (*arg++ != '-') {                           15!
    19:         *buffer++ = '[';
    20:         arg -= 2;
    21:         continue;
    22:       }
    23:       ac = *arg++;
    24:       while (i <= ac) *buffer++ = i++;
    25:       arg++; /* Skip ']' */
    26:     } else
    27:       *buffer++ = *arg++;
    28:   }
    29: }
    30: ...
    31: int main(int argc, char* argv[]) {                   1
    32:   int index = 1;                                     2
    33:   if (argc > 1 && argv[index][0] == '-') {           3*
    34:     ...                                              4
    35:   }                                                  5
    36:   ...                                                6
    37:   expand(argv[index++], index);                      7
    38:   ...
    39: }

Figure 1: Code snippet from MINIX’s tr, representative of the programs checked in this paper: tricky, non-obvious, difficult to verify by inspection or testing. The order of the statements on the path to the error at line 18 are numbered on the right hand side.

Environmental Dependencies. Most of the code is controlled by values derived from environmental input. Command line arguments determine what procedures execute, input values determine which way if-statements trigger, and the program depends on the ability to read from the file system. Since inputs can be invalid (or even malicious), the code must handle these cases gracefully. It is not trivial to test all important values and boundary cases.

The code illustrates two additional common features. First, it has bugs, which KLEE finds and generates test cases for. Second, KLEE quickly achieves good code coverage: in two minutes it generates 37 tests that cover all executable statements. (footnote 2)

KLEE has two goals: (1) hit every line of executable code in the program and (2) detect at each dangerous operation (e.g., dereference, assertion) if any input value exists that could cause an error. KLEE does so by running programs symbolically: unlike normal execution, where operations produce concrete values from their operands, here they generate constraints that exactly describe the set of values possible on a given path. When KLEE detects an error or when a path reaches an exit call, KLEE solves the current path’s constraints (called its path condition) to produce a test case that will follow the same path when rerun on an unmodified version of the checked program (e.g, compiled with gcc).

KLEE is designed so that the paths followed by the unmodified program will always follow the same path KLEE took (i.e., there are no false positives). However, non-determinism in checked code and bugs in KLEE or its models have produced false positives in practice. The ability to rerun tests outside of KLEE, in conjunction with standard tools such as gdb and gcov is invaluable for diagnosing such errors and for validating our results.

We next show how to use KLEE, then give an overview of how it works.

2.1 Usage

A user can start checking many real programs with KLEE in seconds: KLEE typically requires no source modifications or manual work. Users first compile their code to bytecode using the publicly-available LLVM compiler [33] for GNU C. We compiled tr using:

    llvm-gcc --emit-llvm -c tr.c -o tr.bc

Users then run KLEE on the generated bytecode, optionally stating the number, size, and type of symbolic inputs to test the code on. For tr we used the command:

    klee --max-time 2 --sym-args 1 10 10 --sym-files 2 2000 --max-fail 1 tr.bc

The first option, --max-time, tells KLEE to check tr.bc for at most two minutes. The rest describe the symbolic inputs. The option --sym-args 1 10 10 says to use zero to three command line arguments, the first 1 character long, the others 10 characters long. (footnote 3) The option --sym-files 2 2000 says to use standard input and one file, each holding 2000 bytes of symbolic data. The option --max-fail 1 says to fail at most one system call along each program path (see § 4.2).

2.2 Symbolic execution with KLEE

When KLEE runs on tr, it finds a buffer overflow error at line 18 in Figure 1 and then produces a concrete test

Footnote 2: The program has one line of dead code, an unreachable return statement, which, reassuringly, KLEE cannot run.
Footnote 3: Since strings in C are zero terminated, this essentially generates arguments of up to that size.

Static Analysis – Limits
• Analyze program to find potentially vulnerable
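The KLEE excerpt says that solving a path condition yields a concrete test case that reaches the error at line 18 of Figure 1. To show what such a test case looks like, the sketch below uses a much cruder technique: it is not symbolic execution, but a bounded brute-force search over short strings, combined with an instrumented re-implementation of how expand() moves its arg pointer. The names reads_past_end and find_overrun_input, the READ macro, and the five-character alphabet are assumptions made for this illustration; writes to buffer are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Bounds-checked stand-in for "*arg": report an overrun instead of reading. */
#define READ(dst) \
    do { if (p >= len) return true; (dst) = (unsigned char)arg[p]; } while (0)

/* Returns true if expand() from Figure 1 would read arg[] past its
 * terminating NUL. Only pointer movement is modeled; writes to buffer
 * are omitted. */
bool reads_past_end(const char *arg)
{
    assert(arg != NULL);
    size_t len = strlen(arg) + 1;   /* valid bytes, including the NUL */
    size_t p = 0;                   /* index standing in for the arg pointer */
    int c, i;

    for (;;) {
        READ(c);
        if (c == '\0')                          /* line 3: while (*arg) */
            return false;
        if (c == '\\') {                        /* line 4 */
            p++;                                /* line 5 */
            i = 0;
            READ(c);
            if (c >= '0' && c <= '7') {         /* lines 7-11: octal escape */
                do {
                    p++;
                    i++;
                    READ(c);
                } while (i < 4 && c >= '0' && c <= '7');
            } else if (c != '\0') {             /* lines 13-14 */
                p++;
            }
        } else if (c == '[') {                  /* line 15 */
            p++;                                /* line 16 */
            READ(c); p++;                       /* line 17: i = *arg++ */
            READ(c); p++;                       /* line 18: *arg++ != '-' */
            if (c != '-') {
                p -= 2;                         /* line 20 */
                continue;                       /* line 21 */
            }
            READ(c); p++;                       /* line 23: ac = *arg++ */
            p++;                                /* line 25: skip ']' */
        } else {
            p++;                                /* line 27 */
        }
    }
}

static const char ALPHABET[] = "[-a\\7";        /* assumed: chars that steer branches */

/* Tries every string over ALPHABET, shortest first, up to max_len chars.
 * On success writes the failing input to out (needs max_len + 1 bytes). */
bool find_overrun_input(char *out, size_t max_len)
{
    assert(out != NULL);
    const size_t n_sym = sizeof ALPHABET - 1;
    size_t total = 1;                           /* n_sym^len candidates */

    for (size_t len = 0; len <= max_len; len++, total *= n_sym) {
        for (size_t k = 0; k < total; k++) {
            size_t code = k;
            for (size_t j = 0; j < len; j++) {  /* decode k in base n_sym */
                out[j] = ALPHABET[code % n_sym];
                code /= n_sym;
            }
            out[len] = '\0';
            if (reads_past_end(out))
                return true;
        }
    }
    out[0] = '\0';
    return false;
}
```

Calling find_overrun_input(input, 3) with a char input[4] buffer stops at the one-character string "[": line 17 consumes the terminating NUL, so the read at line 18 falls outside the string. Enumeration grows exponentially with input length, which is the cost that constraint solving over path conditions is meant to avoid.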
