DART and CUTE: Concolic Testing
Koushik Sen, University of California, Berkeley
Joint work with Gul Agha, Patrice Godefroid, Nils Klarlund, Rupak Majumdar, Darko Marinov
Used and adapted by Jonathan Aldrich, with permission, for 17-355/17-655 Program Analysis
3/28/2017

Today, QA is mostly testing
“50% of my company employees are testers, and the rest spends 50% of their time testing!” (Bill Gates, 1995)

Software Reliability Importance
Software is pervading every aspect of life: software is everywhere.
Software failures were estimated to cost the US economy about $60 billion annually [NIST 2002].
Improvements in software testing infrastructure may save one-third of this cost.
Testing accounts for an estimated 50%-80% of the cost of software development [Beizer 1990].

Big Picture
[Diagram labels]
    Safe Programming Languages and Type Systems
    Model Checking; Testing; Runtime Monitoring; Static Program Analysis; Dynamic Program Analysis; Theorem Proving
    Model Based Software Development and Analysis

A Familiar Program: QuickSort
    void quicksort (int[] a, int lo, int hi) {
        int i=lo, j=hi, h;
        int x=a[(lo+hi)/2];
        // partition
        do {
            while (a[i] ...
        // recursion
        if (lo ...
Test QuickSort:
    Create an array.
    Initialize the elements of the array.
    Execute the program on ...

Automated Test Generation
Studied since the 70's: King 76, Myers 79.
30 years have passed, and yet there is no effective solution. What happened?
    Program-analysis techniques were expensive.
    Automated theorem proving and constraint solving techniques were not efficient.
In recent years we have seen remarkable progress in static program analysis and constraint solving: SLAM, BLAST, ESP, Bandera, Saturn, MAGIC.
Question: Can we use similar techniques in automated testing?

Systematic Automated Testing
[Diagram labels: Testing; Static Analysis; Model-Checking and Formal Methods; Automated Theorem Proving; Constraint Solving]

CUTE and DART
Combine random testing (concrete execution) and symbolic testing (symbolic execution) [PLDI’05, FSE’05, FASE’06, CAV’06, ISSTA’07, ICSE’07]
Concrete + Symbolic = Concolic

Goal
Automated unit testing of real-world C and Java programs:
    Generate test inputs.
    Execute the unit under test on the generated test inputs, so that all reachable statements are executed.
    Any assertion violation gets caught.
Sub-problem: Input Generation
    Given a statement s in program P, compute an input i such that P(i) executes s.
    This formulation is due to Wolfram Schulte.

Execution Paths of a Program
Can be seen as a binary tree with possibly infinite depth: the computation tree.
    Each node represents the execution of an “if then else” statement.
    Each edge represents the execution of a sequence of non-conditional statements.
    Each path in the tree represents an equivalence class of inputs.

Example of Computation Tree
    void test_me(int x, int y) {
        if (2*x == y) {
            if (x != y+10) {
                printf("I am fine here");
            } else {
                printf("I should not reach here");
                ERROR;
            }
        }
    }
Tree: the root tests 2*x==y (N/Y). Its Y child tests x!=y+10 (N/Y). The N edge of x!=y+10 leads to ERROR.

Concolic Testing: Finding Security and Safety Bugs
Key: add checks automatically and perform concolic testing.
    Divide by 0 error:   x = 3 / i;   becomes   if (i != 0) x = 3 / i; else ERROR;
    Buffer overflow:     a[i] = 4;    becomes   if (0 <= i && i < a.length) a[i] = 4; else ERROR;

Existing Approach I: Random Testing
    Generate random inputs.
    Execute the program on the generated inputs.
    test_me(int x) {
        if (x == 94389) {
            ERROR;
        }
    }
The probability of reaching an error can be astronomically low: here, the probability of hitting ERROR is 1/2^32.

Existing Approach II: Symbolic Execution
    Use symbolic values for input variables.
    Execute the program symbolically on symbolic input values.
    Collect symbolic path constraints.
    Use a theorem prover to check if a branch can be taken.
Limitations:
    Does not scale for large programs.
    Cannot solve all constraints.
Example 1: symbolic execution cannot solve for x, y. Result: missed error (or false positive).
    test_me(int x, int y) {
        if ((x%10)*4 == y) {
            ERROR;
        } else {
            // OK
        }
    }
Example 2: symbolic execution cannot look inside the bbox function. Result: missed error (or false positive).
    test_me(int x, int y) {
        if (bbox(x) == y) {
            ERROR;
        } else {
            // OK
        }
    }

Concolic Execution
    Randomly choose x and y the first time.
    Use the concrete value of x and the result of running bbox(x) to solve for y.
    Finds the error!
    test_me(int x, int y) {
        if (bbox(x) == y) {
            ERROR;
        } else {
            // OK
        }
    }

Concolic Testing Approach
    int double (int v) {
        return 2*v;
    }
    void testme (int x, int y) {
        z = double (y);
        if (z == x) {
            if (x > y+10) {
                ERROR;
            }
        }
    }
Random test driver: random values for x and y. The probability of reaching ERROR is extremely low.
Concolic testing executes the program concretely and symbolically at the same time, tracking a concrete state, a symbolic state, and a path condition.
Run 1
    Concrete state: x = 22, y = 7, z = 14
    Symbolic state: x = x0, y = y0, z = 2*y0
    Path condition: 2*y0 != x0
    Solve: 2*y0 == x0
    Solution: x0 = 2, y0 = 1
Run 2
    Concrete state: x = 2, y = 1, z = 2
    Symbolic state: x = x0, y = y0, z = 2*y0
    Path condition: 2*y0 == x0, x0 ≤ y0+10
    Solve: (2*y0 == x0) ∧ (x0 > y0+10)
    Solution: x0 = 30, y0 = 15
Run 3
    Concrete state: x = 30, y = 15
    Symbolic state: x = x0, y = y0
    Path condition: 2*y0 == x0, x0 > y0+10
    Program Error

Explicit Path (not State) Model Checking
Traverse all execution paths one by one to detect errors:
    assertion violations
    program crash
    uncaught exceptions
Combine with valgrind to discover memory errors.
[Animation: the T/F branches of the computation tree are explored one path at a time]

Novelty: Simultaneous Concrete and Symbolic Execution
    int foo (int v) {
        return (v*v) % 50;
    }
    void testme (int x, int y) {
        z = foo (y);
        if (z == x) {
            if (x > y+10) {
                ERROR;
            }
        }
    }
Run 1
    Concrete state: x = 22, y = 7, z = 49
    Symbolic state: x = x0, y = y0, z = (y0*y0)%50
    Path condition: (y0*y0)%50 != x0
    Solve: (y0*y0)%50 == x0
    Don’t know how to solve! Stuck?
    The same happens when the source of foo is unavailable: the symbolic state is z = foo(y0), the path condition is foo(y0) != x0, and foo(y0) == x0 cannot be solved.
    Not stuck! Use the concrete state: replace y0 by 7 (sound).
    The symbolic state becomes z = 49 and the path condition becomes 49 != x0.
    Solve: 49 == x0
    Solution: x0 = 49, y0 = 7
Run 2
    Concrete state: x = 49, y = 7, z = 49
    Symbolic state: x = x0, y = y0, z = 49
    Path condition: 49 == x0, x0 > y0+10
    Program Error

Concolic Testing: A Middle Approach
    Random Testing          Concolic Testing          Symbolic Testing
    + Complex programs      + Complex programs        - Simple programs
    + Efficient             +/- Somewhat efficient    - Not efficient
    - Less coverage         + High coverage           + High coverage
    + No false positive     + No false positive       - False positive

Implementations
    DART and CUTE for C programs
    jCUTE for Java programs
    Go to http://srl.cs.berkeley.edu/~ksen/ for CUTE and jCUTE binaries
    MSR has four
    implementations: SAGE, PEX, YOGI, Vigilante
    Similar tool: EXE at Stanford
    Easiest way to use and to develop on top of CUTE
    Implement concolic testing yourself

Testing Data Structures
(joint work with Darko Marinov and Gul Agha)

Example
    typedef struct cell {
        int v;
        struct cell *next;
    } cell;

    int f(int v) {
        return 2*v + 1;
    }

    int testme(cell *p, int x) {
        if (x > 0)
            if (p != NULL)
                if (f(x) == p->v)
                    if (p->next == p)
                        abort();
        return 0;
    }
Random test driver:
    random memory graph reachable from p
    random value for x
Probability of reaching abort() is extremely low.

CUTE Approach
Each run of testme tracks a concrete state, a symbolic state, and the constraints collected along the path.
Run 1
    Concrete state: p = NULL, x = 236
    Symbolic state: p = p0, x = x0
    Constraints: x0 > 0, !(p0 ≠ NULL), that is, p0 = NULL
    Solve: x0 > 0 and p0 ≠ NULL
    Solution: x0 = 236, p0 -> cell {v = 634, next = NULL}
Run 2
    Concrete state: x = 236, p -> cell {v = 634, next = NULL}
    Symbolic state: p = p0, x = x0, p->v = v0, p->next = n0
    Constraints: x0 > 0, p0 ≠ NULL, 2*x0+1 ≠ v0
    Solve: x0 > 0 and p0 ≠ NULL and 2*x0+1 = v0
    Solution: x0 = 1, p0 -> cell {v = 3, next = NULL}
Run 3
    Concrete state: x = 1, p -> cell {v = 3, next = NULL}
    Symbolic state: p = p0, x = x0, p->v = v0, p->next = n0
    Constraints: x0 > 0, p0 ≠ NULL, 2*x0+1 = v0, n0 ≠ p0
    Solve: x0 > 0 and p0 ≠ NULL and 2*x0+1 = v0 and n0 = p0
    Solution: x0 = 1, p0 -> cell {v = 3, next = p0}
Run 4
    Concrete state: x = 1, p -> cell {v = 3, next = p}
    Symbolic state: p = p0, x = x0, p->v = v0, p->next = n0
    Constraints: x0 > 0, p0 ≠ NULL, 2*x0+1 = v0, n0 = p0
    Program Error: abort() is reached.

Pointer Inputs: Input Graph
[Figure: input graph for the pointer argument p of testme]

CUTE in a Nutshell
Generate concrete inputs one by one:
    each input leads the program along a different path.
On each input, execute the program both concretely and symbolically.
Both cooperate with each other:
    concrete execution guides the symbolic execution;
    concrete execution enables symbolic execution to overcome incompleteness of the theorem prover:
        replace symbolic expressions by concrete values if symbolic expressions become complex
        resolve aliases for pointers using concrete values
        handle arrays naturally
Symbolic execution, in turn, helps to generate the concrete input for the next execution, which increases coverage.

Handling Pointer Casting and Arithmetic
    Reasoning about dynamic data is easy.
    Due to limitation of alias analysis, “static analyzers” ...
    struct foo {
        int i;
        char c;
    };
    void * memset(void *s, char c, size_t n){
        for(int i=0;i ...

Library Functions With Side-effects
    struct foo {
        int i;
        char c;
    };
    bar(struct foo *a){
        if(a && a->c==1){
            memset(a,0,sizeof(struct foo));
            if(a->c==1) {
                ERROR;
            }
        }
    }
Reasoning about functions with side-effects:
    A sound static tool will give a false alarm.
    CUTE will not report the error, because it cannot generate a concrete input that will make the program hit ERROR.

Underlying Random Testing Helps
    1  foobar(int x, int y){
    2    if (x*x*x > 0){
    3      if (x>0 && y==10){
    4        ERROR;
    5      }
    6    } else {
    7      if (x>0 && y==20){
    8        ERROR;
    9      }
    10   }
    11 }
Static-analysis-based model-checkers would consider both branches:
    both ERROR statements are reported reachable, which is a false alarm.
Symbolic execution gets stuck at line number 2, or warns that both ERRORs are reachable.
CUTE finds the only error.

Data-structure Testing: Solving Data-structure Invariants
    testme(slist *L, slist *e){
        CUTE_assume(isSortedSlist(L));
        sglib_slist_add(&L,e);
        CUTE_assert(isSortedSlist(L));
    }

    int isSortedSlist(slist * head) {
        slist * cur, *tmp;
        int i,j;
        if (head == 0) return 1;
        i=j=0;
        for (cur = head; cur!=0; cur = cur->next){
            i++;
            j=1;
            for (tmp = head; j<i; tmp = tmp->next){
                j++;
                if(cur==tmp) return 0;
            }
        }
        for (cur = head; cur->next!=0; cur = cur->next){
            if(cur->i > cur->next->i) return 0;
        }
        return 1;
    }

Data-structure Testing: Generating Call Sequence
    for (i=1; i<10; i++) {
        CU_input(toss);
        CU_input(e);
        switch(toss){
        case 2:
            sglib_hashed_ilist_add_if_not_member(htab,e,&m); break;
        case 3:
            sglib_hashed_ilist_delete_if_member(htab,e,&m); break;
        case 4:
            sglib_hashed_ilist_delete(htab,e); break;
        case 5:
            sglib_hashed_ilist_is_member(htab,e); break;
        case 6:
            sglib_hashed_ilist_find_member(htab,e); break;
        }
    }

Instrumentation

Instrumented Program
Maintains a symbolic state:
    maps each memory location used by the program to symbolic expressions over symbolic input variables
    Arithmetic expressions: a1x1+…+anxn+c
    Pointer expressions: x
Maintains a path constraint:
    a sequence of constraints C1, C2, …, Ck
    such that the Ci are symbolic constraints over symbolic input variables
    C1 ∧ C2 ∧ … ∧ Ck holds over the path currently executed by the program
    Arithmetic constraints: a1x1+…+anxn ⋈ 0, where ⋈ is a comparison operator
    Pointer constraints: x=y, x≠y, x=NULL, x≠NULL

Constraint Solving
Path constraint: C1 ∧ C2 ∧ … ∧ Ck ∧ … Cn
Solve C1 ∧ C2 ∧ … ∧ ¬Ck to generate the next input.
If Ck is a pointer constraint, then invoke the pointer solver on
    Ci1 ∧ Ci2 ∧ … ∧ ¬Ck, where ij ∈ [1..k-1] and Cij is a pointer constraint.
If Ck is an arithmetic constraint, then invoke the linear solver on
    Ci1 ∧ Ci2 ∧ … ∧ ¬Ck, where ij ∈ [1..k-1] and Cij is an arithmetic constraint.

Optimizations
Fast un-satisfiability check of C1 ∧ C2 ∧ … ∧ ¬Ck
Eliminate same constraints:
    (y>2) ∧ (y>2) ∧ (y==4) reduces to (y>2) ∧ (y==4)
    that is, C1 ∧ C2 ∧ … Ci … Cj … ∧ Ck = C1 ∧ C2 ∧ … Ci … ∧ Ck if Ci = Cj
Eliminate non-dependent constraints:
    (x==1) ∧ (y>2) ∧ (y==4) reduces to (y>2) ∧ (y==4)

SGLIB: popular library for C data-structures
Used in Xrefactory, a commercial tool for refactoring C/C++ programs.
Found two bugs in sglib 1.0.1:
    reported them to the authors
    fixed in sglib 1.0.2
Bug 1: doubly-linked list library
    segmentation fault occurs when a non-zero length list is concatenated with a zero-length list
    discovered in 140 iterations (< 1 second)
Bug 2: hash-table
    an infinite loop in the hash table is-member function
    193 iterations (1 second)

SGLIB: popular library for C data-structures
    Name                 Run time    # of         # of Branches   % Branch    # of Functions   # of Bugs
                         (seconds)   Iterations   Explored        Coverage    Tested           Found
    Array Quick Sort     2           732          43              97.73       2                0
    Array Heap Sort      4           1764         36              100.00      2                0
    Linked List          2           570          100             96.15       12               0
    Sorted List          2           1020         110             96.49       11               0
    Doubly Linked List   3           1317         224             99.12       17               1
    Hash Table           1           193          46              85.19       8                1
    Red Black Tree       2629        1,000,000    242             71.18       17               0

Experiments: Needham-Schroeder Protocol
Tested a C implementation of a security protocol (Needham-Schroeder) with a known attack:
    600 lines of code
    took less than 13 seconds on a machine with a 1.8GHz Xeon processor and 2 GB RAM to discover the middle-man attack
In contrast, a software model-checker (VeriSoft) took 8 hours.

Testing Data-structures of CUTE itself
Unit tested several non-standard data-structures implemented for the CUTE tool:
    cu_depend (used to determine dependency during constraint solving using a graph algorithm)
    cu_linear (linear symbolic expressions)
    cu_pointer (pointer symbolic expressions)
Discovered a few memory
leaks and a couple of segmentation faults:
    these errors did not show up in other uses of CUTE
    for memory leaks we used CUTE in conjunction with Valgrind

Limitations
Path space of a large program is huge: the path explosion problem.
[Figure: the entire computation tree, with the portion explored by concolic testing marked]

Hybrid Concolic Testing
(joint work with Rupak Majumdar)

Limitations: A Comparative View
    Concolic: broad, shallow
    Random: narrow, deep

Hybrid Concolic Testing
Interleave random testing and concolic testing to increase coverage.
Motivated by a similar idea used in VLSI design validation: Ganai et al. 1999, Ho et al. 2000.
    while (not required coverage) {
        while (not saturation)
            perform random testing;
        Checkpoint;
        while (not increase in coverage)
            perform concolic testing;
        Restore;
    }
Hybrid search: deep, broad search.
    4x more coverage than random testing
    2x more coverage than concolic testing

Summary
[Diagram labels: Static Analysis; Model-Checking and Formal Methods; Program Verification; Testing; Automated Theorem Proving; Constraint Solving]

References
DART: Directed Automated Random Testing. Patrice Godefroid, Nils Klarlund, and Koushik Sen (PLDI 2005)
CUTE: A Concolic Unit Testing Engine for C.
Koushik Sen, Darko Marinov, and Gul Agha (ESEC/FSE 2005)
1. “Dynamic Test Input Generation for Database Applications,” Michael Emmi, Rupak Majumdar, and Koushik Sen, International Symposium on Software Testing and Analysis (ISSTA'07), ACM, 2007.
2. “Hybrid Concolic Testing,” Rupak Majumdar and Koushik Sen, 29th International Conference on Software Engineering (ICSE'07), IEEE, 2007.
3. “A Race-Detection and Flipping Algorithm for Automated Testing of Multi-Threaded Programs,” Koushik Sen and Gul Agha, Haifa Verification Conference 2006 (HVC'06), Lecture Notes in Computer Science, Springer, 2006.
4. “CUTE and jCUTE: Concolic Unit Testing and Explicit Path Model-Checking Tools,” Koushik Sen and Gul Agha, 18th International Conference on Computer Aided Verification (CAV'06), Lecture Notes in Computer Science, Springer, 2006. (Tool Paper)
5. “Automated Systematic Testing of Open Distributed Programs,” Koushik Sen and Gul Agha, International Conference on Fundamental Approaches to Software Engineering (FASE'06) (ETAPS'06 conference), Lecture Notes in Computer Science, Springer, 2006.
6. “EXE: Automatically Generating Inputs of Death,” Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, Dawson R. Engler, 13th ACM Conference on Computer and Communications Security, 2006.
7. “Automatically generating malicious disks using symbolic execution,” Junfeng Yang, Can Sar, Paul Twohey, Cristian Cadar, and Dawson Engler, IEEE Security and Privacy, 2006.