Error Propagation Analysis for File Systems!
Total Page:16
File Type:pdf, Size:1020Kb
Error Propagation Analysis for File Systems! Cindy Rubio-González, Haryadi S. Gunawi, Ben Liblit, Remzi H. Arpaci-Dusseau, and Andrea C. Arpaci-Dusseau! University of Wisconsin-Madison PLDI’09 ECS 289C Seminar on Program Analysis February 17th, 2015 Motivation! • Systems software plays an important role! o Designed to operate the computer hardware! o Provides platform for running application software! • Particular focus on file systems! o Store massive amounts of data! o Used everywhere!! § Home use: photos, movies, tax returns! § Servers: network file servers, search engines! • Incorrect error handling → serious consequences! o Silent data corruption, data loss, system crashes, etc.! • Broken systems software → Broken user applications!! ! 2 Error Handling! Error Propagation + Error Recovery! USER-LEVEL APPLICATION SYSTEMS SOFTWARE Error Recovery:! ERROR ERROR • Logging! • Notifying! • Re-trying! Run-time ERROR ERROR Errors! HARDWARE Incorrect error handling → longstanding problem! 3 Return-Code Idiom! • Widely used in systems software written in C! • Also used in large C++ applications! • Run-time errors represented as integer values! • Propagated through variable assignments and function return values! 4 Error Codes in Linux! 5 Example of Error-Propagation Bug! 1 int txCommit(...) { 2 ... 3 if (isReadOnly(...)) { 4 rc = -EROFS; read-only file-system error! 5 ... 6 goto TheEnd; 7 } ... 8 if (rc = diWrite(...)) diWrite may return EIO error! 9 txAbort(...); 10 TheEnd: return rc; function returns variable rc 11 } 13 int diFree(...) { 14 ... 15 rc = txCommit(...); 16 ... rc not acknowledged by error 17 return 0; reporting/recovery code! Silent data 18 } error stored in rc is lost!! loss! Incorrect error propagation in IBM JFS Linux file system! 6 High-Level Approach! Source code Source code + Error code definitions! Static Program Analysis! Error Propagaon Analysis Set of error codes each variable may contain at each program point! Second pass over code to find:! Find Error-Propagaon 1) Overwritten errors! Bugs 2) Out-of-scope errors! 3) Unsaved errors! Bug Reports Sample path for each bug found! 7 Error Propagation Analysis! • Interprocedural, flow- and context-sensitive static program analysis! • Finds set of error codes each variable may contain at each program point! • Formulated and solved using weighted pushdown systems (WPDS)! 8 Weighted Pushdown Systems! • Control flow is encoded using Pushdown System (PDS)! • Data flow encoded as transfer functions (or weights) on PDS rules! o Elements of a set S that constitutes a bounded idempotent semiring! S =(D, , , 0¯, 1)¯ ⊕ ⌦ • Computes “meet-over-all-paths” solution! 9 Mappings Describing Transfer Functions! Definition! Example! V CV C D D== wwww: V: V C C 2 [2 [ { | [ ! } 1 if (...) { { | [ ! } 2 x = EROFS; V: set of variables ! 3 } C: set of error-code constants ! 4 else { 5 x = EIO; 6 } 7 y = x; Example of a mapping! w x y EIO EROFS OK x y EIO EROFS OK Ident[x ↦ {EROFS}]! 10 Combining Mappings: ! Modeling merging of control flow paths ! ⊕ Definition of combine operator! Example! (w w )(e) w (e) w (e) 1 if (...) { 1 ⊕ 2 ⌘ 1 [ 2 2 x = EROFS; 3 } MERGING where! w1,w2 D and! e V C BRANCHES! where 2 and 2 [ 4 else { 5 x = EIO; 6 } 7 y = x; x y EIO EROFS OK x y EIO EROFS OK x y EIO EROFS OK x y EIO EROFS OK x y EIO EROFS OK x y EIO EROFS OK w1 w2 w1 w2 Ident[x ↦ {EROFS}]! Ident[x ↦ {EIO}]! Ident[x ↦ {EROFS,⊕ EIO}]! 11 Extending Mappings: ! Modeling sequential control flow! ⌦ Definition of extend operator! Example! 1 if (...) { ((ww1 w2w)(e))(e) w1(e0) w (e0) 1⌦ 2 ⌘ 1 2 x = EROFS; ⌦ e0 ⌘w2(e) 3 } 2[e0 w2(e) 2[ 4 else { EXTENDING! where! w1,wwhere 2 D and! e V C w1,w22 D e2 V[ C 5 x = EIO; 2 2 [ 6 } 7 y = x; x y EIO EROFS OK x y EIO EROFS OK ww11 Ident[x ↦ {EIO, EROFS}]! w Ident[y ↦ {x}]! w22 x y EIO EROFS OK x y EIO EROFS OK w1 w2 Ident[x ↦ {EIO, EROFS},⌦ y ↦ {EIO, EROFS}]! 12 Detecting Handled Errors! • Handled errors vs. unhandled errors! •! Adopt simple definition of “correct handling”! ! Example! 1 if (...) { x y EIO EROFS OK 2 x = EROFS; 3 } 4 else { 5 x = EIO; 6 } x y EIO EROFS OK 7 y = x; Ident[y ↦ {OK}]! logging error! 8 printk(“error %d\n”, y); • Error-handling functions file-system specific! o ext3_error, jfs_error, etc.! 13 Analysis Result! • Mappings representing execution from the beginning of the program to any program point! o Meet over all paths value ! o Errors each variable may contain at each program point! • For each mapping a subset of inspected paths whose combine is equal to ! o Explains why value is as claimed! 14 Dropped Errors! Error-code instances that vanish before being handled OVERWRITTEN! Error overwritten with new value! OUT OF SCOPE! Variable storing error goes out of scope! UNSAVED! Error returned by a function is not saved by the caller! 15 Program Transformations! 1. Transform unsaved errors into out-of-scope errors! 1 int bar(...) { 2 return –EIO; EIO is returned! 3 } 5 int foo(...) { 6 OUT OF SCOPE! 7 ... 8 bar(); EIO is not saved! 9 ... 10 11 UNSAVED! 12 return ...; 13} 16 Program Transformations! 1. Transform unsaved errors into out-of-scope errors! 1 int bar(...) { 2 return –EIO; 3 } 5 int foo(...) { 6 OUT OF SCOPE! 7 ... 8 int tmp = bar(); EIO is saved in 9 ... temporary variable! 10 11 UNSAVED! 12 return ...; tmp is out of scope ! 13} 17 Program Transformations! 2. Transform out-of-scope errors into overwritten errors! 1 int bar(...) { OVERWRITTEN! 2 return –EIO; 3 } 5 int foo(...) { foo(...) { 6 OUT OF SCOPE! 7 ... 8 int tmp = bar(); 9 ... 10 11 tmp = OK; EIOAssigning is overwritten OK at the ! end UNSAVED! 12 return ...; oftmp the isfunction out of scope! ! 13} 18 Finding Dropped Errors! 1. Transform all error-propagation bugs into overwrites! ! 2. Apply error-propagation analysis! 3. Find whether each assignment may overwrite an error! ! • Before each assignment t = s, retrieve the associated mapping w! ! • Let T = w(t), S = w ( s ) , and E be the set of all error codes! (1) If T E = No error overwrite ! \ {} (2) If T E = S = e OK to overwrite same error \ { } (3) Otherwise Error overwrite!! 19 Sample Output! 1 int nextId(){ 2 static int id; 3 return id++; “result” has unchecked error 4 } 5 6 int getError(){ 7 return –EIO; unchecked error “EIO” is returned 8 } 9 10 int load(){ 11 int status, result = 0; 12 13 if (nextId()) 14 status = getError(); “status” receives unchecked error from function “getError” 15 16 result = status; “result” receives unchecked error from “status” 17 18 if (nextId()) “result” has unchecked error 19 result = -EPIPE; overwriting unchecked error in “result” 20 21 return result; 22 } 20 Sample Output! 1 int nextId(){ 2 static int id; 3 return id++; example.c:7: uncHecked error “EIO” is returned 4 } 5 example.c:14: “status” receives uncHecked error from funcEon “getError” 6 int getError(){ example.c:16: “result” receives uncHecked error from “status” 7 return –EIO; example.c:18: “result” Has uncHecked error 8 } example.c:3: “result” Has uncHecked error 9 example.c:18: “result” Has uncHecked error 10 int load(){ example.c:19: overwriEng uncHecked error in “result” 11 int status, result = 0; 12 Complete diagnostic path trace 13 if (nextId()) 14 status = getError(); 15 example.c:7: uncHecked error “EIO” is returned 16 result = status; example.c:14: “status” receives uncHecked error from funcEon “getError” 17 example.c:16: “result” receives uncHecked error from “status” 18 if (nextId()) 19 result = -EPIPE; example.c:19: overwriEng uncHecked error in “result” 20 Diagnostic path slice 21 return result; 22 } 21 Experimental Results! • Case studies: CIFS, ext3, ext4, IBM JFS, ReiserFS and VFS in Linux 2.6.27 kernel! • 501 bug reports in total! Many errors actually dropped! True Bugs Per Category False Posi2ves Per Category 25 18 44 8% 6% 23% Overwrien Overwrien Out of Scope 97 Out of Scope 51% 269 Unsaved 48 Unsaved 86% 26% 312 reports 189 reports 22 Unsaved Errors! • 269 true bugs, 97 false positives! Unsaved True Bugs per File System 12 24 ext4 68 ext3 38 IBM JFS VFS ReiserFS 58 69 cifs • Most common unsaved errors: EIO, ENOSPC and ENOMEM! 23 Inconsistencies! • Inconsistent use of function return values! ! • For example, a function whose return value is: ! o Saved at 17 call sites! o Unsaved at 35 other call sites! § 9 out of 35 identified as true bugs! § The rest are false positives (errors are intentionally dropped!)! ! • Some developers have suggested to use annotations to mark intentionally dropped errors! 24 False Positives – Unsaved Errors! • Redundant error reporting! 1 buffer_head *b = NULL; 2 ... 3 if (buffer_dirty(b)) 4 sync_dirty_buffer(b); An error may not be saved 85% 5 6 if (!buffer_uptodate(b)) { Checking “b” 7 reiserfs_warning(...); 8 retval = -EIO; 9 } • Error paths: already failing! 15% • Met preconditions: cannot possibly fail! 25 (Updated) Running Time! • 3.2 GHz Intel Pentium 4 processor workstation with 3 GB RAM! Running Time (in minutes) 2.5 2 1.5 1 0.5 0 CIFS ext3 ext4 IBM JFS ReiserFS KLOC: 91 83 97 93 88 26 (Updated) Memory Usage! • 3.2 GHz Intel Pentium 4 processor workstation with 3 GB RAM! Memory Usage (in MB) 400 300 200 100 0 CIFS ext3 ext4 IBM JFS ReiserFS KLOC: 91 83 97 93 88 27 Other File Systems! • 43 other Linux file systems! o Bug reports to be examined! • Different Linux versions! o Significant evolution in each release! • NASA/JPL Laboratory for Reliable Software! o Using tool to check code in the Mars Science Laboratory! o Found one real error propagation bug so far “We’ve found a legitimate problem…We call a non-void function (that can return some critical error codes) and don’t assign the return value, dropping some nice things such as EASSERT, EABOUND, and EEBADHDR on the ground.