DART and CUTE: Concolic Testing

Koushik Sen University of California, Berkeley

Joint work with Gul Agha, Patrice Godefroid, Nils Klarlund, Rupak Majumdar, Darko Marinov

Used and adapted by Jonathan Aldrich, with permission, for 17-355/17-655 Program Analysis

3/28/2017

1 Today, QA is mostly testing

“50% of my company employees are testers, and the rest spends 50% of their time testing!” Bill Gates, 1995

2 Software Reliability Importance

• Software is pervading every aspect of life
  • Software everywhere
• Software failures were estimated to cost the US economy about $60 billion annually [NIST 2002]
  • Improvements in infrastructure may save one-third of this cost
• Testing accounts for an estimated 50%-80% of the cost of software development [Beizer 1990]

4 Big Picture

Safe Programming Languages and Type systems

Model Checking and Theorem Proving
Testing
Runtime Monitoring
Static Program Analysis
Dynamic Program Analysis

Model Based Software Development and Analysis

5 A Familiar Program: QuickSort

void quicksort (int[] a, int lo, int hi) {
  int i=lo, j=hi, h;
  int x=a[(lo+hi)/2];

  // partition
  do {
    while (a[i]<x) i++;
    while (a[j]>x) j--;
    if (i<=j) {
      h=a[i];
      a[i]=a[j];
      a[j]=h;
      i++;
      j--;
    }
  } while (i<=j);

  // recursion
  if (lo<j) quicksort(a, lo, j);
  if (i<hi) quicksort(a, i, hi);
}

6 A Familiar Program: QuickSort

• Test QuickSort
  • Create an array
  • Initialize the elements of the array
  • Execute the program on this array
• How much confidence do I have in this testing method?
• Is my test suite *Complete*?
• Can someone generate a small and *Complete* test suite for me?

10 Automated Test Generation
• Studied since the 70’s
  • King 76, Myers 79
• 30 years have passed, and yet no effective solution
• What Happened???
  • Program-analysis techniques were expensive
  • Automated theorem proving and constraint solving techniques were not efficient
• In recent years we have seen remarkable progress in static program-analysis and constraint solving
  • SLAM, BLAST, ESP, Bandera, Saturn, MAGIC

Question: Can we use similar techniques in Automated Testing?

11 Systematic Automated Testing

Static Analysis
Model-Checking and Formal Methods

Testing

Automated Theorem Proving
Constraint Solving

12 CUTE and DART

• Combine random testing (concrete execution) and symbolic testing (symbolic execution) [PLDI’05, FSE’05, FASE’06, CAV’06, ISSTA’07, ICSE’07]

Concrete + Symbolic = Concolic

13 Goal
• Automated testing of real-world C and Java programs
  • Generate test inputs
  • Execute unit under test on generated test inputs
    • so that all reachable statements are executed
  • Any assertion violation gets caught

• Sub-problem: Input Generation
  • Given a statement s in program P, compute input i, such that P(i) executes s
  • This formulation is due to Wolfram Schulte

14 Execution Paths of a Program

• Can be seen as a binary tree with possibly infinite depth
  • Computation tree
• Each node represents the execution of an “if then else” statement
• Each edge represents the execution of a sequence of non-conditional statements
• Each path in the tree represents an equivalence class of inputs

(Figure: binary computation tree with edges labeled 0 and 1)

15 Example of Computation Tree

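void test_me(int x, int y) {
  if(2*x==y){
    if(x != y+10){
      printf("I am fine here");
    } else {
      printf("I should not reach here");
      ERROR;
    }
  }
}

(Figure: computation tree. The root branches on 2*x==y (N/Y); its Y child branches on x!=y+10 (N/Y); the N edge of that second node leads to ERROR.)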
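The idea that each path stands for an equivalence class of inputs can be made concrete with a small sketch. The helper `path_of` below is hypothetical (it is not part of the slides); it mirrors the branch structure of test_me and returns the sequence of branch outcomes instead of printing.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns the branch outcomes along the executed path.
   First letter: outcome of 2*x==y. Second letter (if any): outcome of x!=y+10. */
const char *path_of(int x, int y) {
    if (2 * x == y) {
        if (x != y + 10) {
            return "YY";   /* the "I am fine here" leaf */
        }
        return "YN";       /* the ERROR leaf */
    }
    return "N";
}
```

All inputs that return the same string belong to the same equivalence class. Under this sketch, (1, 2) and (3, 6) both map to "YY", while reaching "YN" requires 2*x==y and x==y+10 at once, for example x = -10, y = -20.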

16 Concolic Testing: Finding Security and Safety Bugs

Divide by 0 Error:
  x = 3 / i;

Buffer Overflow:
  a[i] = 4;

17 Concolic Testing: Finding Security and Safety Bugs

Key: Add Checks Automatically and Perform Concolic Testing

Divide by 0 Error:
  if (i != 0)
    x = 3 / i;
  else
    ERROR;

Buffer Overflow:
  if (0<=i && i < a.length)
    a[i] = 4;
  else
    ERROR;
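A minimal C sketch of what such inserted checks could look like. The function names `checked_div` and `checked_store` are illustrative assumptions, and since C arrays carry no `a.length`, the length is passed explicitly. Each function returns false at the point where the slide writes ERROR, which turns the bug into an ordinary branch that path exploration can target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Guarded division: returns false where the slide writes ERROR. */
bool checked_div(int i, int *x) {
    if (i != 0) {
        *x = 3 / i;
        return true;
    }
    return false;
}

/* Guarded array store: len plays the role of a.length on the slide. */
bool checked_store(int *a, size_t len, long i) {
    if (0 <= i && (size_t)i < len) {
        a[i] = 4;
        return true;
    }
    return false;
}
```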

18 Existing Approach I

• Random testing
  • generate random inputs
  • execute the program on generated inputs
• Probability of reaching an error can be astronomically small

test_me(int x){
  if(x==94389){
    ERROR;
  }
}

Probability of hitting ERROR = 1/2^32
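To see how slowly this improves with more runs, a short worked estimate, assuming x is drawn uniformly and independently from all 32-bit values on each of N runs (these sampling assumptions are not stated on the slide):

```latex
P(\text{ERROR hit in } N \text{ runs})
  = 1 - \left(1 - 2^{-32}\right)^{N}
  \approx \frac{N}{2^{32}} \qquad (N \ll 2^{32})
```

Under these assumptions, even N = 10^6 random runs reach ERROR with probability of only about 2.3 × 10^-4.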

19 Existing Approach II

• Symbolic Execution
  • use symbolic values for input variables
  • execute the program symbolically on symbolic input values
  • collect symbolic path constraints
  • use theorem prover to check if a branch can be taken
• Does not scale for large programs
• Cannot solve all constraints

test_me(int x, int y){
  if((x%10)*4==y){
    ERROR;
  } else {
    // OK
  }
}

Symbolic execution cannot solve for x, y: missed error (or false positive)

20 Existing Approach II

• Symbolic Execution
  • use symbolic values for input variables
  • execute the program symbolically on symbolic input values
  • collect symbolic path constraints
  • use theorem prover to check if a branch can be taken
• Does not scale for large programs
• Cannot solve all constraints

test_me(int x, int y){
  if(bbox(x)==y){
    ERROR;
  } else {
    // OK
  }
}

Symbolic execution cannot look inside the bbox function: missed error (or false positive)

21 Concolic Execution

• Concolic Execution
  • Randomly choose x and y the first time
  • Use concrete value of x and result of running bbox(x) to solve for y
  • Finds the error!

test_me(int x, int y){
  if(bbox(x)==y){
    ERROR;
  } else {
    // OK
  }
}

22 Concolic Testing Approach

int double (int v) {
  return 2*v;
}

void testme (int x, int y) {
  z = double (y);
  if (z == x) {
    if (x > y+10) {
      ERROR;
    }
  }
}

• Random Test Driver:
  • random value for x and y
• Probability of reaching ERROR is extremely low

23 Concolic Testing Approach (slides 23-33: step-by-step trace of testme from slide 22)

Each step lists the concrete state, the symbolic state, and the path condition collected so far.

Run 1 (random inputs)
  start of testme          concrete: x = 22, y = 7            symbolic: x = x0, y = y0
  after z = double (y)     concrete: x = 22, y = 7, z = 14    symbolic: x = x0, y = y0, z = 2*y0
  if (z == x) not taken    path condition: 2*y0 != x0
  Solve: 2*y0 == x0
  Solution: x0 = 2, y0 = 1

Run 2
  start of testme          concrete: x = 2, y = 1             symbolic: x = x0, y = y0
  after z = double (y)     concrete: x = 2, y = 1, z = 2      symbolic: x = x0, y = y0, z = 2*y0
  if (z == x) taken        path condition: 2*y0 == x0
  if (x > y+10) not taken  path condition: (2*y0 == x0) ∧ (x0 <= y0+10)
  Solve: (2*y0 == x0) ∧ (x0 > y0+10)
  Solution: x0 = 30, y0 = 15

Run 3
  start of testme          concrete: x = 30, y = 15           symbolic: x = x0, y = y0
  if (z == x) taken        path condition: 2*y0 == x0
  if (x > y+10) taken      path condition: (2*y0 == x0) ∧ (x0 > y0+10)
  ERROR reached: Program Error
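The three runs above can be replayed with a small self-contained sketch. Everything here is an illustrative assumption rather than the DART or CUTE implementation: `double` is inlined as `2 * y` because it is a reserved word in C, the two branch conditions are hard-coded in `cond`, and a bounded brute-force search stands in for the constraint solver. The search always negates the last branch of the path just executed.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_DEPTH = 2 };

/* Program under test. Records each branch outcome in taken[] and
   returns true where the slide writes ERROR. */
static bool testme(int x, int y, bool taken[MAX_DEPTH], int *depth) {
    int z = 2 * y;                       /* z = double(y) on the slide */
    *depth = 0;
    taken[(*depth)++] = (z == x);
    if (z == x) {
        taken[(*depth)++] = (x > y + 10);
        if (x > y + 10) return true;
    }
    return false;
}

/* Symbolic condition of branch k, evaluated on candidate inputs. */
static bool cond(int k, int x0, int y0) {
    return k == 0 ? (2 * y0 == x0) : (x0 > y0 + 10);
}

/* Stand-in "solver": bounded search for inputs that agree with the
   recorded outcomes of branches 0..k-1 and flip the outcome of branch k. */
static bool solve(const bool taken[], int k, int *x, int *y) {
    for (int y0 = -50; y0 <= 50; y0++) {
        for (int x0 = -100; x0 <= 100; x0++) {
            bool ok = true;
            for (int i = 0; i < k && ok; i++)
                ok = (cond(i, x0, y0) == taken[i]);
            if (ok && cond(k, x0, y0) != taken[k]) {
                *x = x0;
                *y = y0;
                return true;
            }
        }
    }
    return false;
}

/* Returns the number of runs needed to reach ERROR, or -1 if not reached. */
int concolic_search(int x, int y, int max_runs) {
    bool taken[MAX_DEPTH];
    int depth;
    for (int run = 1; run <= max_runs; run++) {
        if (testme(x, y, taken, &depth)) return run;
        if (!solve(taken, depth - 1, &x, &y)) return -1;
    }
    return -1;
}
```

Starting from the slide's random inputs, `concolic_search(22, 7, 10)` needs three runs, matching the trace. The intermediate inputs differ from the slide's (the brute-force search returns the first solution in its range, not x0 = 2, y0 = 1), which is expected: any solution of the same constraint follows the same path.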

34 Explicit Path (not State) Model Checking

• Traverse all execution paths one by one to detect errors
  • assertion violations
  • program crash
  • uncaught exceptions
• combine with valgrind to discover memory errors

(Figure: computation tree with branches labeled T and F)

40 Novelty: Simultaneous Concrete and Symbolic Execution (slides 40-46: step-by-step trace)

int foo (int v) {
  return (v*v) % 50;
}

void testme (int x, int y) {
  z = foo (y);
  if (z == x) {
    if (x > y+10) {
      ERROR;
    }
  }
}

Run 1 (random inputs)
  start of testme          concrete: x = 22, y = 7            symbolic: x = x0, y = y0
  after z = foo (y)        concrete: x = 22, y = 7, z = 49    symbolic: x = x0, y = y0, z = (y0*y0)%50
  if (z == x) not taken    path condition: (y0*y0)%50 != x0
  Solve: (y0*y0)%50 == x0
  Don’t know how to solve! Stuck?

  The same happens when the body of foo is not available: z = foo (y0), path condition foo (y0) != x0, and Solve: foo (y0) == x0 cannot be handled either.

  Not stuck! Use the concrete state: replace y0 by 7 (sound)
  symbolic: x = x0, y = y0, z = 49
  path condition: 49 != x0
  Solve: 49 == x0
  Solution: x0 = 49, y0 = 7

Run 2
  start of testme          concrete: x = 49, y = 7            symbolic: x = x0, y = y0
  after z = foo (y)        concrete: x = 49, y = 7, z = 49    symbolic: x = x0, y = y0, z = 49
  if (z == x) taken        path condition: 49 == x0
  if (x > y+10) taken      path condition: (49 == x0) ∧ (x0 > y0+10)
  ERROR reached: Program Error
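The concretization step can be sketched in a few lines of C. The function `solve_with_concrete_y` is a hypothetical name, and the "solving" is trivial here because fixing y0 to its concrete value turns (y0*y0)%50 == x0 into a constant equality on x0.

```c
#include <assert.h>
#include <stdbool.h>

int foo(int v) { return (v * v) % 50; }

/* Returns true where the slide writes ERROR. */
bool testme(int x, int y) {
    int z = foo(y);
    return z == x && x > y + 10;
}

/* Concretization fallback (sketch): the solver cannot handle
   (y0*y0)%50 == x0, so y0 is pinned to the value observed in the last
   run and the constraint collapses to foo(y_concrete) == x0. */
void solve_with_concrete_y(int y_concrete, int *x0, int *y0) {
    *y0 = y_concrete;
    *x0 = foo(y_concrete);
}
```

With the slide's first inputs (x = 22, y = 7), the fallback yields x0 = 49, y0 = 7, and the second run reaches ERROR. The price of pinning y0 is that only inputs with y = 7 are considered for this branch, so some feasible paths may be missed.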

47 Concolic Testing: A Middle Approach

Random Testing        Concolic Testing          Symbolic Testing
+ Complex programs    + Complex programs        - Simple programs
+ Efficient           +/- Somewhat efficient    - Not efficient
- Less coverage       + High coverage           + High coverage
+ No false positive   + No false positive       - False positive

48 Implementations
• DART and CUTE for C programs
• jCUTE for Java programs
  • Go to http://srl.cs.berkeley.edu/~ksen/ for CUTE and jCUTE binaries
• MSR has four implementations
  • SAGE, PEX, YOGI, Vigilante
• Similar tool: EXE at Stanford
• Easiest way to use and to develop on top of CUTE
  • Implement concolic testing yourself

49 Testing Data Structures (joint work with Darko Marinov and Gul Agha)

50 Example

typedef struct cell {
  int v;
  struct cell *next;
} cell;

int f(int v) {
  return 2*v + 1;
}

int testme(cell *p, int x) {
  if (x > 0)
    if (p != NULL)
      if (f(x) == p->v)
        if (p->next == p)
          abort();
  return 0;
}

• Random Test Driver:
  • random memory graph reachable from p
  • random value for x
• Probability of reaching abort( ) is extremely low

51 CUTE Approach (slides 51-76: step-by-step trace of testme from slide 50)

Each run lists the concrete input, the symbolic state, and the constraint collected at each branch.

Run 1
  concrete input: p = NULL, x = 236
  symbolic state: p = p0, x = x0
  if (x > 0) taken:              x0 > 0
  if (p != NULL) not taken:      !(p0 != NULL), i.e. p0 = NULL
  Solve: x0 > 0 and p0 ≠ NULL
  Solution: x0 = 236, p0 points to a new cell with v = 634 and next = NULL

Run 2
  concrete input: p points to a cell (v = 634, next = NULL), x = 236
  symbolic state: p = p0, x = x0, p->v = v0, p->next = n0
  if (x > 0) taken:              x0 > 0
  if (p != NULL) taken:          p0 ≠ NULL
  if (f(x) == p->v) not taken:   2x0+1 ≠ v0
  Solve: x0 > 0 and p0 ≠ NULL and 2x0+1 = v0
  Solution: x0 = 1, p0 points to a cell with v = 3 and next = NULL

Run 3
  concrete input: p points to a cell (v = 3, next = NULL), x = 1
  symbolic state: p = p0, x = x0, p->v = v0, p->next = n0
  if (x > 0) taken:              x0 > 0
  if (p != NULL) taken:          p0 ≠ NULL
  if (f(x) == p->v) taken:       2x0+1 = v0
  if (p->next == p) not taken:   n0 ≠ p0
  Solve: x0 > 0 and p0 ≠ NULL and 2x0+1 = v0 and n0 = p0
  Solution: x0 = 1, p0 points to a cell with v = 3 whose next points back to the cell itself

Run 4
  concrete input: p points to a cell (v = 3, next = the cell itself), x = 1
  symbolic state: p = p0, x = x0, p->v = v0, p->next = n0
  if (x > 0) taken:              x0 > 0
  if (p != NULL) taken:          p0 ≠ NULL
  if (f(x) == p->v) taken:       2x0+1 = v0
  if (p->next == p) taken:       n0 = p0
  abort() reached: Program Error

77 Pointer Inputs: Input Graph

typedef struct cell {
  int v;
  struct cell *next;
} cell;

int f(int v) {
  return 2*v + 1;
}

int testme(cell *p, int x) {
  if (x > 0)
    if (p != NULL)
      if (f(x) == p->v)
        if (p->next == p)
          abort();
  return 0;
}
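The four input graphs that the CUTE trace generates for this program (NULL, a cell holding 634, a cell holding 3, and a self-linked cell holding 3) can be checked with a short sketch. `testme_depth` is a hypothetical variant of testme that reports how many nested conditions held instead of calling abort(); it is not part of CUTE.

```c
#include <assert.h>
#include <stddef.h>

typedef struct cell {
    int v;
    struct cell *next;
} cell;

int f(int v) { return 2 * v + 1; }

/* Same branch structure as testme on the slide, but returns the number of
   nested conditions that held; 4 means abort() would have been reached. */
int testme_depth(cell *p, int x) {
    if (x > 0) {
        if (p != NULL) {
            if (f(x) == p->v) {
                if (p->next == p) return 4;
                return 3;
            }
            return 2;
        }
        return 1;
    }
    return 0;
}
```

Each successive input graph gets exactly one branch deeper: NULL with x = 236 stops after the first condition, the cell holding 634 after the second, the cell holding 3 (with x = 1) after the third, and the self-linked cell reaches the abort() position.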

82 CUTE in a Nutshell
• Generate concrete inputs one by one
  • each input leads program along a different path
• On each input execute program both concretely and symbolically
• Both cooperate with each other
  • concrete execution guides the symbolic execution
  • concrete execution enables symbolic execution to overcome incompleteness of theorem prover
    • replace symbolic expressions by concrete values if symbolic expressions become complex
    • resolve aliases for pointer using concrete values
    • handle arrays naturally
  • symbolic execution helps to generate concrete input for next execution
    • increases coverage

83 Handling Pointer Casting and Arithmetic

struct foo {
  int i;
  char c;
};

void * memset(void *s,char c,size_t n){
  for(int i=0;i […]
}

bar(struct foo *a){
  if(a && a->c==1){
    memset(a,0,sizeof(struct foo));
    if(a->c!= 1) {
      ERROR;
    }
  }
}

• Reasoning about dynamic data is easy
• Due to limitation of alias analysis “static analyzers” […] “a->c” has been rewritten
  • BLAST would infer that the program is safe
• practical
  • CUTE finds the error

84 Library Functions With Side-effects

struct foo {
  int i;
  char c;
};

bar(struct foo *a){
  if(a && a->c==1){
    memset(a,0,sizeof(struct foo));
    if(a->c== 1) {
      ERROR;
    }
  }
}

• Reasoning about function with side-effects
  • Sound static tool will give false alarm
  • CUTE will not report the error
    • because it cannot generate a concrete input that will make the program hit ERROR.

85 Underlying Random Testing Helps

1  foobar(int x, int y){
2    if (x*x*x > 0){
3      if (x>0 && y==10){
4        ERROR;
5      }
6    } else {
7      if (x>0 && y==20){
8        ERROR;
9      }
10   }
11 }

• static analysis based model-checkers would consider both branches
  • both ERROR statements are reachable
  • false alarm
• Symbolic execution
  • gets stuck at line number 2
  • or warn that both ERRORs are reachable
• CUTE finds the only error

86 Data-structure Testing: Solving Data-structure Invariants

testme(slist *L,slist *e){
  CUTE_assume(isSortedSlist(L));
  sglib_slist_add(&L,e);
  CUTE_assert(isSortedSlist(L));
}

int isSortedSlist(slist * head) {
  slist * cur, *tmp;
  int i,j;
  if (head == 0) return 1;
  i=j=0;
  // reject lists that contain a cycle
  for (cur = head; cur!=0; cur = cur->next){
    i++;
    j=1;
    for (tmp = head; j<i; tmp = tmp->next){
      j++;
      if(cur==tmp) return 0;
    }
  }
  // check that the elements are in non-decreasing order
  for (cur = head; cur->next!=0; cur = cur->next){
    if(cur->i > cur->next->i) return 0;
  }
  return 1;
}

87 Data-structure Testing: Generating Call Sequence

for (i=1; i<10; i++) {
  CU_input(toss);
  CU_input(e);
  switch(toss){
  case 2: sglib_hashed_ilist_add_if_not_member(htab,e,&m); break;
  case 3: sglib_hashed_ilist_delete_if_member(htab,e,&m); break;
  case 4: sglib_hashed_ilist_delete(htab,e); break;
  case 5: sglib_hashed_ilist_is_member(htab,e); break;
  case 6: sglib_hashed_ilist_find_member(htab,e); break;
  }
}

88 Instrumentation

91 Instrumented Program

• Maintains a symbolic state
  • maps each memory location used by the program to symbolic expressions over symbolic input variables
• Maintains a path constraint
  • a sequence of constraints C1, C2, …, Ck
  • such that the Ci are symbolic constraints over symbolic input variables
  • C1 ∧ C2 ∧ … ∧ Ck holds over the path currently executed by the program

• Arithmetic Expressions
  • a1x1+…+anxn+c
• Pointer expressions
  • x
• Arithmetic Constraints
  • a1x1+…+anxn ⋈ 0, where ⋈ is a comparison operator
• Pointer Constraints
  • x=y
  • x≠y
  • x=NULL
  • x≠NULL
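A linear symbolic expression of the form a1x1+…+anxn+c can be stored as a coefficient vector plus a constant. The sketch below is an assumed representation for illustration, not the data structure used inside CUTE (the slides only name a `cu_linear` module); `linexpr` and the `lin_*` helpers are hypothetical names, and the number of input variables is fixed at two (x0 and y0).

```c
#include <assert.h>

enum { NVARS = 2 };                 /* index 0 is x0, index 1 is y0 */

/* coef[0]*x0 + coef[1]*y0 + c */
typedef struct {
    int coef[NVARS];
    int c;
} linexpr;

linexpr lin_const(int c) {
    linexpr e = { {0}, c };
    return e;
}

linexpr lin_var(int i) {
    linexpr e = { {0}, 0 };
    e.coef[i] = 1;
    return e;
}

/* Sum of two linear expressions is again linear. */
linexpr lin_add(linexpr a, linexpr b) {
    for (int i = 0; i < NVARS; i++) a.coef[i] += b.coef[i];
    a.c += b.c;
    return a;
}

/* Multiplication by a constant keeps the expression linear. */
linexpr lin_scale(linexpr a, int k) {
    for (int i = 0; i < NVARS; i++) a.coef[i] *= k;
    a.c *= k;
    return a;
}

/* Evaluate the expression on concrete input values. */
int lin_eval(linexpr e, const int vals[NVARS]) {
    int r = e.c;
    for (int i = 0; i < NVARS; i++) r += e.coef[i] * vals[i];
    return r;
}
```

For example, the symbolic value z = 2*y0 from the earlier trace is `lin_scale(lin_var(1), 2)`, and evaluating it on the concrete inputs x = 22, y = 7 gives 14, the concrete value of z. A product of two non-constant expressions has no such representation, which is the situation where a concolic tester falls back to concrete values.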

94 Constraint Solving

Path constraint
  C1 ∧ C2 ∧ … ∧ Ck ∧ … ∧ Cn

Solve
  C1 ∧ C2 ∧ … ∧ ¬Ck
to generate next input

If Ck is a pointer constraint then invoke the pointer solver on
  Ci1 ∧ Ci2 ∧ … ∧ ¬Ck, where ij ∈ [1..k-1] and each Cij is a pointer constraint
If Ck is an arithmetic constraint then invoke the linear solver on
  Ci1 ∧ Ci2 ∧ … ∧ ¬Ck, where ij ∈ [1..k-1] and each Cij is an arithmetic constraint
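Building the query C1 ∧ … ∧ ¬Ck from a recorded path constraint is mechanical. The sketch below assumes a simplified constraint form a*x0 + b*y0 + c ⋈ 0 over two inputs; the type and function names are hypothetical, and the split into pointer and arithmetic constraints is left out.

```c
#include <assert.h>

typedef enum { OP_EQ, OP_NE, OP_LT, OP_GE, OP_GT, OP_LE } cmp_op;

/* One constraint: a*x0 + b*y0 + c  op  0 */
typedef struct {
    int a, b, c;
    cmp_op op;
} constraint;

/* Negating a constraint only changes its comparison operator. */
cmp_op negate_op(cmp_op op) {
    switch (op) {
    case OP_EQ: return OP_NE;
    case OP_NE: return OP_EQ;
    case OP_LT: return OP_GE;
    case OP_GE: return OP_LT;
    case OP_GT: return OP_LE;
    default:    return OP_GT;      /* OP_LE */
    }
}

/* Copies the first k-1 constraints of pc into out and appends the
   negation of the k-th one. Returns the number of constraints written. */
int next_query(const constraint pc[], int k, constraint out[]) {
    for (int i = 0; i < k - 1; i++) out[i] = pc[i];
    out[k - 1] = pc[k - 1];
    out[k - 1].op = negate_op(pc[k - 1].op);
    return k;
}
```

For the second run of the earlier testme trace, the path constraint (2*y0 - x0 == 0) ∧ (x0 - y0 - 10 <= 0) becomes the query (2*y0 - x0 == 0) ∧ (x0 - y0 - 10 > 0), which is the formula the slide solves to obtain x0 = 30, y0 = 15.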

97 Optimizations

• Fast un-satisfiability check
  C1 ∧ C2 ∧ … ∧ ¬Ck
• Eliminate same constraints
  (y>2) ∧ (y>2) ∧ (y==4) ⇒ (y>2) ∧ (y==4)
  that is
  C1 ∧ C2 ∧ … Ci … Cj … ∧ ¬Ck = C1 ∧ C2 ∧ … Ci … ∧ ¬Ck if Ci = Cj
• Eliminate non-dependent constraints
  (x==1) ∧ (y>2) ∧ (y==4) ⇒ (y>2) ∧ (y==4)
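The non-dependent constraint elimination can be sketched as a reachability computation over shared variables. This is an illustrative assumption about how such a filter might work, not CUTE's actual code: each constraint is summarized by a bitmask of the input variables it mentions, and `select_dependent` (a hypothetical name) keeps only the constraints connected, directly or transitively, to the last one.

```c
#include <assert.h>
#include <stdbool.h>

/* vars[i] is a bitmask of the input variables used by constraint i
   (bit 0 for x, bit 1 for y, and so on). The last constraint, index k-1,
   is the one being negated. Sets keep[i] for every constraint that shares
   a variable with it, directly or through other kept constraints, and
   returns how many constraints are kept. */
int select_dependent(const unsigned vars[], int k, bool keep[]) {
    unsigned reach = vars[k - 1];
    int kept = 1;
    bool changed = true;

    for (int i = 0; i < k; i++) keep[i] = false;
    keep[k - 1] = true;

    while (changed) {                 /* iterate until no new constraint joins */
        changed = false;
        for (int i = 0; i < k - 1; i++) {
            if (!keep[i] && (vars[i] & reach) != 0) {
                keep[i] = true;
                reach |= vars[i];
                kept++;
                changed = true;
            }
        }
    }
    return kept;
}
```

On the slide's example, (x==1) mentions only x while (y>2) and (y==4) mention only y, so with masks {1, 2, 2} the filter drops the first constraint and keeps the other two. The dropped constraints are still satisfied by reusing the previous input values for their variables.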

98 SGLIB: popular library for C data-structures

• Used in Xrefactory, a commercial tool for refactoring C/C++ programs
• Found two bugs in sglib 1.0.1
  • reported them to authors
  • fixed in sglib 1.0.2
• Bug 1:
  • doubly-linked list library
  • segmentation fault occurs when a non-zero length list is concatenated with a zero-length list
  • discovered in 140 iterations (< 1 second)
• Bug 2:
  • hash-table
  • an infinite loop in the hash table is_member function
  • 193 iterations (1 second)

99 SGLIB: popular library for C data-structures

Name                 Run time (s)  # of Iterations  # of Branches Explored  % Branch Coverage  # of Functions Tested  # of Bugs Found
Array Quick Sort     2             732              43                      97.73              2                      0
Array Heap Sort      4             1764             36                      100.00             2                      0
Linked List          2             570              100                     96.15              12                     0
Sorted List          2             1020             110                     96.49              11                     0
Doubly Linked List   3             1317             224                     99.12              17                     1
Hash Table           1             193              46                      85.19              8                      1
Red Black Tree       2629          1,000,000        242                     71.18              17                     0

100 Experiments: Needham-Schroeder Protocol

• Tested a C implementation of a security protocol (Needham-Schroeder) with a known attack
  • 600 lines of code
  • Took less than 13 seconds on a machine with 1.8GHz Xeon processor and 2 GB RAM to discover the man-in-the-middle attack
• In contrast, a software model-checker (VeriSoft) took 8 hours

101 Testing Data-structures of CUTE itself
• Unit tested several non-standard data-structures implemented for the CUTE tool
  • cu_depend (used to determine dependency during constraint solving using graph algorithm)
  • cu_linear (linear symbolic expressions)
  • cu_pointer (pointer symbolic expressions)
• Discovered a few memory leaks and a couple of segmentation faults
  • these errors did not show up in other uses of CUTE
  • for memory leaks we used CUTE in conjunction with Valgrind

104 Limitations

• Path Space of a Large Program is Huge
  • Path Explosion Problem

Entire Computation Tree

Explored by Concolic Testing

105 Hybrid Concolic Testing (joint work with Rupak Majumdar)

106 Limitations: A Comparative View

Concolic: Broad, shallow
Random: Narrow, deep

107 Hybrid Concolic Testing
• Interleave Random Testing and Concolic Testing to increase coverage

Motivated by similar idea used in VLSI design validation: Ganai et al. 1999, Ho et al. 2000

109 Hybrid Concolic Testing
• Interleave Random Testing and Concolic Testing to increase coverage

while (not required coverage) {
  while (not saturation)
    perform random testing;
  Checkpoint;
  while (not increase in coverage)
    perform concolic testing;
  Restore;
}

Hybrid Search: deep, broad search

110 Hybrid Concolic Testing

• 4x more coverage than random testing
• 2x more coverage than concolic testing

111 Results

112 Results

114 Summary

Static Analysis
Model-Checking and Formal Methods
Program Verification

Testing

Automated Theorem Proving
Constraint Solving

115 References
• DART: Directed Automated Random Testing. Patrice Godefroid, Nils Klarlund, and Koushik Sen (PLDI 2005)
• CUTE: A Concolic Unit Testing Engine for C. Koushik Sen, Darko Marinov, and Gul Agha (ESEC/FSE 2005)
1. “Dynamic Test Input Generation for Database Applications,” Michael Emmi, Rupak Majumdar, and Koushik Sen, International Symposium on Software Testing and Analysis (ISSTA'07), ACM, 2007.
2. “Hybrid Concolic Testing,” Rupak Majumdar and Koushik Sen, 29th International Conference on Software Engineering (ICSE'07), IEEE, 2007.
3. “A Race-Detection and Flipping Algorithm for Automated Testing of Multi-Threaded Programs,” Koushik Sen and Gul Agha, Haifa Verification Conference 2006 (HVC'06), Lecture Notes in Computer Science, Springer, 2006.
4. “CUTE and jCUTE: Concolic Unit Testing and Explicit Path Model-Checking Tools,” Koushik Sen and Gul Agha, 18th International Conference on Computer Aided Verification (CAV'06), Lecture Notes in Computer Science, Springer, 2006. (Tool Paper).
5. “Automated Systematic Testing of Open Distributed Programs,” Koushik Sen and Gul Agha, International Conference on Fundamental Approaches to Software Engineering (FASE'06) (ETAPS'06 conference), Lecture Notes in Computer Science, Springer, 2006.
6. “EXE: Automatically Generating Inputs of Death,” Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler, 13th ACM Conference on Computer and Communications Security, 2006.
7. “Automatically Generating Malicious Disks Using Symbolic Execution,” Junfeng Yang, Can Sar, Paul Twohey, Cristian Cadar, and Dawson Engler, IEEE Security and Privacy, 2006.
