Modern Symbolic Execution: DART, EGT, CUTE, jCUTE, EXE, KLEE, CREST, CATG

Cristian Cadar, Department of Computing, Imperial College London
Koushik Sen, EECS Department, University of California, Berkeley

Today, QA is mostly testing

“50% of my company employees are testers, and the rest spends 50% of their time testing!” (1995)


A Familiar Program: QuickSort

    void quicksort (int[] a, int lo, int hi) {
        int i=lo, j=hi, h;
        int x=a[(lo+hi)/2];
        // partition
        do {
            while (a[i]<x) i++;
            while (a[j]>x) j--;
            if (i<=j) {
                h=a[i];
                a[i]=a[j];
                a[j]=h;
                i++;
                j--;
            }
        } while (i<=j);
        // recursion
        if (lo …

- Test QuickSort
  - Create an array
  - Initialize the elements of the array
  - Execute the program on this array
- How much confidence do I have in this testing method?
- Is my test suite *Complete*?
- Can someone generate a small and *Complete* test suite for me?


Automated Test Generation
- Studied since 70’s
  - King 76, Myers 79
- 30 years have passed, and yet no effective solution
- What Happened???
  - Program-analysis techniques were expensive
  - Automated theorem proving and constraint solving techniques were not efficient
- In recent years we have seen remarkable progress in static program analysis and constraint solving
  - SLAM, BLAST, ESP, Bandera, Saturn, MAGIC

Question: Can we use techniques from axiomatic semantics in Automated Test Generation?


Goal
- Automated Unit Testing of real-world C and Java Programs
  - Generate test inputs
  - Execute unit under test on generated test inputs
    - so that all reachable statements are executed
  - Any assertion violation gets caught
- Our Approach:
  - Explore all execution paths of a Unit for all possible inputs
    - Exploring all execution paths ensures that all reachable statements are executed

Symbolic Execution

Execution Paths of a Program
- Can be seen as a binary tree with possibly infinite depth
  - Computation tree
- Each node represents the execution of an “if then else” statement
- Each edge represents the execution of a sequence of non-conditional statements
- Each path in the tree represents an equivalence class of inputs


Example of Computation Tree

    int double (int v) {
        return 2*v;
    }

    void testMe (int x, int y) {
        z = double (y);
        if (z == x) {
            if (x > y+10) {
                ERROR;
            }
        }
    }

Computation tree, with one input for each path:

    z == x
        N:  x=0, y=1
        Y:  x > y+10
                N:  x=0, y=0
                Y:  x=22, y=11  -> ERROR
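The claim that each path stands for an equivalence class of inputs can be checked directly on this example. The following is a minimal sketch, not part of the original slides: the enum labels and the function name `test_me_path` are made up for illustration, and the helper is called `twice` because `double` is a reserved word in C.

```c
/* Hypothetical labels for the three leaves of the computation tree. */
enum path { PATH_N = 0, PATH_Y_N = 1, PATH_Y_Y_ERROR = 2 };

/* `double` is a reserved word in C, so the helper is named twice() here. */
static int twice(int v) { return 2 * v; }

/* Returns the leaf of the computation tree reached by the input (x, y). */
enum path test_me_path(int x, int y) {
    int z = twice(y);
    if (z == x) {
        if (x > y + 10) {
            return PATH_Y_Y_ERROR;   /* the ERROR statement */
        }
        return PATH_Y_N;             /* z == x, but x <= y+10 */
    }
    return PATH_N;                   /* z != x */
}
```

Every input that maps to the same label follows the same path, so one representative per label (for instance the three inputs shown in the tree) is enough to cover all statements of `testMe`.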


Computation Tree

    void testMe1(int x) {
        for (int j=0; j < 2; j++) {
            if (x==j) {
                printf("Good\n");
            }
        }
    }

The computation tree alternates the tests j < 2 and x==j.

How many feasible execution paths do you have in this program?
☐ 3
☐ 4
☐ 5
☐ more than 100
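One way to explore the question is to record which `x==j` tests succeed during a run and count the distinct outcomes over a range of inputs. This is a rough sketch under stated assumptions: the function names are hypothetical, and sampling a range of inputs only gives evidence about the number of feasible paths, not a proof.

```c
/* Encodes the branch outcomes of one run of testMe1 as a bit string:
   bit j is set when the test x==j was true in iteration j. */
unsigned test_me1_signature(int x) {
    unsigned sig = 0;
    for (int j = 0; j < 2; j++) {
        if (x == j) {
            sig |= 1u << j;
        }
    }
    return sig;
}

/* Counts the distinct signatures seen for the inputs lo..hi (inclusive). */
int count_feasible_paths(int lo, int hi) {
    int seen[4] = {0, 0, 0, 0};   /* two tests give at most 4 signatures */
    int count = 0;
    for (int x = lo; x <= hi; x++) {
        unsigned sig = test_me1_signature(x);
        if (!seen[sig]) {
            seen[sig] = 1;
            count++;
        }
    }
    return count;
}
```

The signature with both bits set would need x==0 and x==1 in the same run, so that path is syntactically present in the tree but infeasible.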


Computation Tree

    void testMe2(int x, unsigned int N) {
        for (int j=0; j < N; j++) {
            if (x==j) {
                printf("Good\n");
            }
        }
    }

The tests j < N and x==j are repeated many times in the computation tree.

How many feasible execution paths do you have in this program?
☐ 3
☐ 4
☐ 5
☐ more than 100


Existing Approach I
- Random testing
  - generate random inputs
  - execute the program on generated inputs
- Probability of reaching an error can be astronomically small

    testMe(int x){
        if(x == 94389){
            ERROR;
        }
    }

Probability of reaching ERROR = 1/2^32

Random Testing Approach

    int double (int v) {
        return 2*v;
    }

    void testMe (int x, int y) {
        z = double (y);
        if (z == x) {
            if (x > y+10) {
                ERROR;
            }
        }
    }

- Random Test Driver:
  - random value for x and y
- Probability of hitting ERROR is extremely low
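To put the 1/2^32 figure in perspective, the following back-of-the-envelope calculation assumes that x is a 32-bit integer drawn uniformly and independently for each test; both assumptions are made here for illustration and are not stated on the slide.

```latex
\[
  P(\text{ERROR}) = \Pr[x = 94389] = \frac{1}{2^{32}} \approx 2.3 \times 10^{-10}
\]
\[
  P(\text{at least one hit in } n \text{ tests})
    = 1 - \left(1 - 2^{-32}\right)^{n} \approx \frac{n}{2^{32}},
  \qquad
  n = 10^{6} \;\Rightarrow\; P \approx 2.3 \times 10^{-4}
\]
```

Under these assumptions even a million random tests reach the ERROR statement with a probability of roughly 0.02%.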


Existing Approach II
- Symbolic Execution
  - use symbolic values for input variables
  - execute the program symbolically on symbolic input values
  - collect symbolic path constraints
  - use theorem prover to check if a branch can be taken

Concrete Execution

    int double (int v) {
        return 2*v;
    }

    void testMe (int x, int y) {
        z = double (y);
        if (z == x) {
            if (x > y+10) {
                ERROR;
            }
        }
    }

Input: x = 30, y = 15


Concrete Execution of testMe, step by step

    statement              concrete state / branch outcome
    (input)                x = 30, y = 15
    z = double (y);        x = 30, y = 15, z = 30
    if (z == x) {          z == x is true
    if (x > y+10) {        x > y + 10 is true
    ERROR;                 ERROR is reached


Symbolic Execution

    int double (int v) {
        return 2*v;
    }

    void testMe (int x, int y) {
        z = double (y);
        if (z == x) {
            if (x > y+10) {
                ERROR;
            }
        }
    }

Symbolic state and path constraints (check path feasibility using an SMT solver):

    (input)              x = x0, y = y0                 path constraint: true
    z = double (y);      x = x0, y = y0, z = 2y0
    if (z == x)
        N:  2y0 != x0
        Y:  2y0 == x0
            if (x > y+10)
                N:  2y0 == x0 && x0 <= y0+10
                Y:  2y0 == x0 && x0 > y0+10     -> ERROR

Solve these constraints (a.k.a. path constraints) to generate inputs for each path.


Symbolic Execution

Solve these constraints (a.k.a. path constraints) to generate inputs for each path:

    path constraint                      generated input
    2y0 != x0                            x=0, y=1
    2y0 == x0 && x0 <= y0+10             x=0, y=0
    2y0 == x0 && x0 > y0+10              x=22, y=11  -> ERROR

Existing Approach II
- Symbolic Execution
  - use symbolic values for input variables
  - execute the program symbolically on symbolic input values
  - collect symbolic path constraints
  - use theorem prover to check if a branch can be taken
- Does not scale for large programs

    testMe(int x){
        if(pickEven(x) == 17) {
            ERROR;
        } else {
            ERROR;
        }
    }

Symbolic execution will say both branches are reachable: False positive
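The step from path constraints to test inputs can be sketched as follows. This is an illustration only: the exhaustive search below is a toy stand-in for the SMT solver or theorem prover the slides refer to, and the names `path_constraint`, `struct model`, `solve` and the `pc_*` functions are assumptions made for this example.

```c
#include <stdbool.h>

/* A path constraint of testMe over the symbolic inputs x0 and y0. */
typedef bool (*path_constraint)(int x0, int y0);

bool pc_not_equal(int x0, int y0)   { return 2 * y0 != x0; }
bool pc_equal_small(int x0, int y0) { return 2 * y0 == x0 && x0 <= y0 + 10; }
bool pc_error(int x0, int y0)       { return 2 * y0 == x0 && x0 > y0 + 10; }

/* Result of a solver call: sat tells whether x0, y0 hold a model. */
struct model { bool sat; int x0, y0; };

/* Toy stand-in for an SMT solver: exhaustive search over
   [-bound, bound] x [-bound, bound]. A real solver reasons symbolically
   and is not limited to a small box. */
struct model solve(path_constraint pc, int bound) {
    struct model m = { false, 0, 0 };
    for (int y = -bound; y <= bound; y++) {
        for (int x = -bound; x <= bound; x++) {
            if (pc(x, y)) {
                m.sat = true;
                m.x0 = x;
                m.y0 = y;
                return m;
            }
        }
    }
    return m;   /* no model inside the search box */
}
```

Any satisfying assignment is a valid test input for its path, so the values found depend on the search order. With this particular order and a bound of 64, the model for the ERROR path is x0 = 22, y0 = 11.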


Modern Symbolic Execution Approaches

Symbolic Execution (Dynamic)
- Concolic Testing: DART, CUTE, jCUTE, CREST, CATG
  - Handling imprecision
- Execution Generated Testing: EGT, EXE, KLEE
  - Handling imprecision

Concolic Testing Approach

    int double (int v) {
        return 2*v;
    }

    void testme (int x, int y) {
        z = double (y);
        if (z == x) {
            if (x > y+10) {
                ERROR;
            }
        }
    }

    Concrete Execution      Symbolic Execution
    concrete state          symbolic state        path condition
    x = 22, y = 7           x = x0, y = y0


Concolic Testing Approach, step by step

Run 1

    statement            concrete state           symbolic state               path condition
    (input)              x = 22, y = 7            x = x0, y = y0
    z = double (y);      x = 22, y = 7, z = 14    x = x0, y = y0, z = 2*y0
    if (z == x) {        false                                                 2*y0 != x0

    Solve: 2*y0 == x0
    Solution: x0 = 2, y0 = 1

Run 2

    statement            concrete state           symbolic state               path condition
    (input)              x = 2, y = 1             x = x0, y = y0
    z = double (y);      x = 2, y = 1, z = 2      x = x0, y = y0, z = 2*y0
    if (z == x) {        true                                                  2*y0 == x0
    if (x > y+10) {      false                                                 x0 <= y0+10

    Solve: (2*y0 == x0) ∧ (x0 > y0 + 10)
    Solution: x0 = 30, y0 = 15

Run 3

    statement            concrete state           symbolic state               path condition
    (input)              x = 30, y = 15           x = x0, y = y0
    if (z == x) {        true                                                  2*y0 == x0
    if (x > y+10) {      true                                                  x0 > y0+10
    ERROR;               Program Error
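The three runs above follow a simple loop: execute concretely while recording the path, negate the last branch, solve, and rerun. The sketch below illustrates that loop under simplifying assumptions that are not from the slides: the branch conditions are hard-coded rather than collected by instrumentation, only the last branch is ever flipped (real tools search the whole tree), and an exhaustive search stands in for the constraint solver. All names are hypothetical.

```c
#include <stdbool.h>

#define MAX_BRANCHES 2

/* Branch outcomes observed during one concrete run of testme. */
struct trace {
    int  len;                    /* number of branches executed */
    bool taken[MAX_BRANCHES];    /* outcome of each branch      */
    bool error;                  /* true if ERROR was reached   */
};

/* Symbolic branch conditions over the inputs (x0, y0):
   branch 0 is z == x, i.e. 2*y0 == x0; branch 1 is x0 > y0 + 10. */
static bool branch_cond(int i, int x0, int y0) {
    return i == 0 ? (2 * y0 == x0) : (x0 > y0 + 10);
}

/* Concrete execution, instrumented by hand to record the path taken. */
struct trace run_testme(int x, int y) {
    struct trace t = { 0, { false, false }, false };
    int z = 2 * y;                          /* z = double(y) */
    t.taken[t.len++] = (z == x);
    if (z == x) {
        t.taken[t.len++] = (x > y + 10);
        if (x > y + 10) {
            t.error = true;                 /* ERROR */
        }
    }
    return t;
}

/* Toy solver (exhaustive search, not an SMT solver): find inputs that
   follow the trace up to its last branch and flip that last branch. */
static bool solve_flipped(const struct trace *t, int bound, int *x, int *y) {
    for (int b = -bound; b <= bound; b++) {
        for (int a = -bound; a <= bound; a++) {
            bool ok = true;
            for (int i = 0; i < t->len && ok; i++) {
                bool want = (i == t->len - 1) ? !t->taken[i] : t->taken[i];
                ok = (branch_cond(i, a, b) == want);
            }
            if (ok) { *x = a; *y = b; return true; }
        }
    }
    return false;
}

/* Concolic loop: returns the number of runs needed to reach ERROR from
   the seed input, or -1 if the search gives up. */
int concolic_search(int x, int y, int max_runs) {
    for (int run = 1; run <= max_runs; run++) {
        struct trace t = run_testme(x, y);
        if (t.error) return run;
        if (!solve_flipped(&t, 64, &x, &y)) return -1;
    }
    return -1;
}
```

Starting from the seed x = 22, y = 7, this sketch also needs three runs. The intermediate inputs differ from the ones on the slides because any solution of the negated constraint is acceptable.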


Explicit Path (not State) Model Checking
- Traverse all execution paths one by one to detect errors
  - assertion violations
  - program crash
  - uncaught exceptions
- combine with valgrind to discover memory errors

(Figure: computation tree with T/F branches, traversed one path at a time)


Modern Symbolic Execution Approaches

Symbolic Execution (Dynamic)
- Concolic Testing: DART, CUTE, jCUTE, CREST, CATG
  - Handling imprecision
- Execution Generated Testing: EGT, EXE, KLEE
  - Handling imprecision

Novelty: Simultaneous Concrete and Symbolic Execution

    int foo (int v) {
        return (v*v) % 50;
    }

    void testme (int x, int y) {
        z = foo (y);
        if (z == x) {
            if (x > y+10) {
                ERROR;
            }
        }
    }

    Concrete Execution      Symbolic Execution
    concrete state          symbolic state        path condition
    x = 22, y = 7           x = x0, y = y0


Novelty: Simultaneous Concrete and Symbolic Execution, step by step

Run 1

    statement            concrete state           symbolic state                      path condition
    (input)              x = 22, y = 7            x = x0, y = y0
    z = foo (y);         x = 22, y = 7, z = 49    x = x0, y = y0, z = (y0*y0)%50
    if (z == x) {        false                                                        (y0*y0)%50 != x0

    Solve: (y0*y0)%50 == x0
    Don’t know how to solve! Stuck?

The same happens when foo is treated as a black box: the symbolic state is z = foo (y0), the path condition is foo (y0) != x0, and the constraint to solve is foo (y0) == x0.

Not Stuck! Use concrete state: replace y0 by 7 (sound)

    symbolic state: x = x0, y = y0, z = 49
    path condition: 49 != x0
    Solve: 49 == x0
    Solution: x0 = 49, y0 = 7

Run 2

    statement            concrete state           symbolic state                      path condition
    (input)              x = 49, y = 7            x = x0, y = y0
    z = foo (y);         x = 49, y = 7, z = 49    x = x0, y = y0, z = 49
    if (z == x) {        true                                                         49 == x0
    if (x > y+10) {      true                                                         x0 > y0+10
    ERROR;               Program Error
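The fallback to the concrete state can be sketched in a few lines. This is a simplified illustration, not the implementation of any of the tools named on the slides: `struct input` and `flip_first_branch` are hypothetical names, and only the first branch of `testme` is handled.

```c
#include <stdbool.h>

/* Non-linear, so the "solver" below does not reason about it symbolically. */
int foo(int v) { return (v * v) % 50; }

/* Returns true when ERROR is reached. */
bool testme(int x, int y) {
    int z = foo(y);
    if (z == x) {
        if (x > y + 10) {
            return true;   /* ERROR */
        }
    }
    return false;
}

struct input { int x, y; };

/* One concolic step for the branch z == x. The constraint
   foo(y0) == x0 is not solved directly: y0 is replaced by its concrete
   value, which turns the constraint into c == x0 for a constant c. */
struct input flip_first_branch(int y_concrete) {
    struct input next;
    int c = foo(y_concrete);   /* concrete value of z, 49 for y = 7 */
    next.y = y_concrete;       /* y0 stays at its concrete value    */
    next.x = c;                /* the solution of c == x0           */
    return next;
}
```

With the seed x = 22, y = 7 the next input is x = 49, y = 7, which reaches ERROR because 49 > 7 + 10. The price of the simplification is that inputs with other values of y are no longer considered on this path.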


Concolic Testing: A Middle Approach

    Random Testing          Concolic Testing           Symbolic Testing
    + Complex programs      + Complex programs         - Simple programs
    + Efficient             +/- Somewhat efficient     - Not efficient
    - Less coverage         + High coverage            + High coverage
    + No false positive     + No false positive        - False positive

Summary: Pointers and Data-Structures
- Logical Input Map to symbolically represent Memory Graph pointed by an input Pointer
  - Example: a node holding 236 whose next field points back to the node is represented as {0 → 1, 1 → 236, 2 → 1}
- Pointer Constraints
  - p ≠ NULL
  - p = NULL
  - p ≠ q
  - p = q
- Solving Pointer Constraints
  - Construct equivalence class [p] for each pointer input p
  - p ≠ NULL → Add a node and point [p] to it
  - p = NULL → Delete node pointed by [p]
  - p = q → Make [p] and [q] point to same node
  - p ≠ q → Add a node and point [p] or [q] to it

Modern Symbolic Execution Approaches

Symbolic Execution (Dynamic)
- Concolic Testing: DART, CUTE, jCUTE, CREST, CATG
  - Handling imprecision
- Execution Generated Testing: EGT, EXE, KLEE
  - Handling imprecision

Execution-Generated Testing: Novelty

    void testme (int x, int y) {
        z = foo (y);
        if (z == x) {
            if (x > y+10) {
                ERROR;
            }
        }
    }

Initial symbolic state: x = x0, y = y0


Execution-Generated Testing: Novelty, step by step

    void testme (int x, int y) {
        z = foo (y);
        if (z == x) {
            if (x > y+10) {
                ERROR;
            }
        }
    }

Step 1: x = x0, y = y0
    z = foo (y) would give z = foo(y0).
    Cannot execute foo() symbolically. Source of foo is missing.

Step 2: concretize all arguments of foo using SMT solver (solve path-constraint so-far)
    x = x0, y = 11
    z = foo (y) runs concretely: x = x0, y = 11, z = 22

Step 3: check path feasibility using an SMT solver

    (after z = foo (y))   x = x0, y = 11, z = 22        path constraint: true
    if (z == x)
        N:  22 != x0                                     x=0, y=11
        Y:  22 == x0
            if (x > y+10)
                N:  22 == x0 && x0 <= 11+10              infeasible
                Y:  22 == x0 && x0 > 11+10               x=22, y=11  -> ERROR
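The exploration after concretizing y can be sketched as follows. Several things here are assumptions made for the example: `foo` is given a body that doubles its argument only so that foo(11) is 22 as on the slide (the slide's point is that the source of foo is unavailable to the tool), feasibility is checked by exhaustive search over x0 rather than by an SMT solver, and `struct egt_result` and `egt_explore` are hypothetical names.

```c
#include <stdbool.h>

/* Stand-in for a function the tool can only call, not analyze. */
int foo(int v) { return 2 * v; }

struct egt_result {
    int  feasible_paths;   /* paths with at least one model for x0     */
    bool found_error;      /* true if some x0 reaches ERROR            */
    int  error_x;          /* such an x0, meaningful if found_error    */
};

/* y is concrete, foo runs concretely, and only x stays symbolic.
   Each of the three paths is checked for feasibility by trying every
   x0 in [-bound, bound]. */
struct egt_result egt_explore(int y, int bound) {
    struct egt_result r = { 0, false, 0 };
    bool path_ne = false, path_eq_le = false, path_eq_gt = false;
    int z = foo(y);                    /* concrete call, 22 for y = 11 */
    for (int x0 = -bound; x0 <= bound; x0++) {
        if (z != x0) {
            path_ne = true;            /* z != x0                       */
        } else if (x0 > y + 10) {
            path_eq_gt = true;         /* z == x0 && x0 > y+10: ERROR   */
            r.found_error = true;
            r.error_x = x0;
        } else {
            path_eq_le = true;         /* z == x0 && x0 <= y+10         */
        }
    }
    r.feasible_paths = (int)path_ne + (int)path_eq_le + (int)path_eq_gt;
    return r;
}
```

For y = 11 only two of the three paths are feasible, which matches the tree above: 22 == x0 together with x0 <= 21 has no solution.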

Symbolic Execution: Finding Security and Safety Bugs

Key: Add Checks Automatically and Perform Symbolic Execution

    Divide by 0 Error              Buffer Overflow

    x = 3 / i;                     a[i] = 4;

With the checks added automatically:

    Divide by 0 Error              Buffer Overflow

    if (i != 0)                    if (0<=i && i < a.length)
        x = 3 / i;                     a[i] = 4;
    else                           else
        ERROR;                         ERROR;
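Written out in C, the two added checks might look like the sketch below. The function names are hypothetical, the `len` parameter plays the role of `a.length`, and returning `false` stands for reaching ERROR. Once the checks are explicit branches, the ERROR side becomes an ordinary path whose constraint (`i == 0`, or `i < 0 || i >= len`) a symbolic executor can try to satisfy.

```c
#include <stdbool.h>
#include <stddef.h>

/* Division with the divide-by-zero check made explicit.
   Returns false where the slide says ERROR. */
bool checked_divide(int i, int *x) {
    if (i != 0) {
        *x = 3 / i;
        return true;
    }
    return false;   /* ERROR: division by zero */
}

/* Array store with the bounds check made explicit. */
bool checked_store(int *a, size_t len, long i) {
    if (0 <= i && (size_t)i < len) {
        a[i] = 4;
        return true;
    }
    return false;   /* ERROR: buffer overflow */
}
```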


A Big Advantage
- Simplify symbolic expressions on the fly
  - Execute x = y*y + x + 10 from the state x = (x0,5) and y = (y0,4)
  - x = x0+26 if our decision procedure cannot handle non-linear arithmetic
  - x = y0*y0+x0+10 if our decision procedure can handle non-linear arithmetic

Concolic: CUTE, jCUTE, CREST
- CUTE for C and jCUTE for Java
  - 5000+ downloads
  - used in both academia and industry
- CREST
  - extensible open-source tool for C
  - 1500+ downloads since mid-2008 release
  - Used to
    - augment existing test suites
    - detect SQL injection vulnerabilities
    - modified to run distributed on a cluster for testing a flash storage platform
    - more sophisticated concolic search heuristics
    - used in teaching courses at some universities
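The x = x0+26 case from the A Big Advantage slide can be reproduced with a small concolic value type that keeps a concrete value next to a linear symbolic expression. This is a sketch under assumptions: the representation `cx*x0 + cy*y0 + k` and all names are invented for the example, and a non-linear product is replaced entirely by its concrete value, as the slide's result suggests.

```c
#include <stdbool.h>

/* A concolic value: a concrete int plus a linear symbolic expression
   cx*x0 + cy*y0 + k over the inputs x0 and y0. */
struct cval {
    int concrete;
    int cx, cy, k;
};

static bool is_constant(struct cval v) { return v.cx == 0 && v.cy == 0; }

struct cval cv_const(int c)   { struct cval v = { c, 0, 0, c }; return v; }
struct cval cv_input_x(int c) { struct cval v = { c, 1, 0, 0 }; return v; }
struct cval cv_input_y(int c) { struct cval v = { c, 0, 1, 0 }; return v; }

struct cval cv_add(struct cval a, struct cval b) {
    struct cval v = { a.concrete + b.concrete,
                      a.cx + b.cx, a.cy + b.cy, a.k + b.k };
    return v;
}

/* A product stays linear only if one operand is a constant. Otherwise
   it is replaced by its concrete value (simplification on the fly). */
struct cval cv_mul(struct cval a, struct cval b) {
    if (is_constant(a)) {
        struct cval v = { a.concrete * b.concrete,
                          a.k * b.cx, a.k * b.cy, a.k * b.k };
        return v;
    }
    if (is_constant(b)) {
        return cv_mul(b, a);
    }
    return cv_const(a.concrete * b.concrete);   /* non-linear: concretize */
}

/* The statement x = y*y + x + 10 from the slide. */
struct cval example(struct cval x, struct cval y) {
    return cv_add(cv_add(cv_mul(y, y), x), cv_const(10));
}
```

Evaluating `example(cv_input_x(5), cv_input_y(4))` gives the concrete value 31 and the symbolic expression 1*x0 + 0*y0 + 26, that is x0+26: the product y*y became the constant 16.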


Execution-Generated Tests: EXE and KLEE

Targeted at low-level systems code. Found bugs (including security vulnerabilities) in:

    UNIX file systems        ext2, ext3, JFS
    UNIX utilities           Coreutils, Busybox, Minix
    MINIX device drivers     pci, lance, sb16
    Library code             PCRE, uClibc, Pintos
    Packet filters           FreeBSD BPF, Linux BPF
    Networking servers       udhcpd, Bonjour, Avahi, WsMp3
    Operating Systems        HiStar kernel
    Computer vision code     OpenCV

Concolic Testing: SAGE
- SAGE found many new expensive security bugs in Windows applications
  - Cost of each Microsoft Security Bulletin: $Millions
  - Cost due to worms (Slammer, CodeRed, Blaster, etc.): $Billions
- Apps: image processors, media players, file decoders, …
- Many bugs triaged as “security critical, severity 1, priority 1” (would trigger Microsoft security bulletin if known outside MS)
- Bugs missed by black-box fuzzers or static analysis
- Used daily in various Microsoft groups

Slide Source: Corina Pasareanu

Underlying Random Testing Helps

    1  foobar(int x, int y){
    2    if (x*x*x > 0){
    3      if (x>0 && y==10){
    4        ERROR;
    5      }
    6    } else {
    7      if (x>0 && y==20){
    8        ERROR;
    9      }
    10   }
    11 }

- static analysis based model-checkers would consider both branches
  - both ERROR statements are reachable
  - false alarm
- Symbolic execution
  - gets stuck at line number 2
  - or warns that both ERRORs are reachable
- CUTE finds the only error

Limitations
- Path Space of a Large Program is Huge
  - Path Explosion Problem

(Figure: Entire Computation Tree)


Limitations
- Path Space of a Large Program is Huge
  - Path Explosion Problem

(Figure: Entire Computation Tree, with the part explored by Concolic Testing marked)

Hybrid Concolic Testing
- Concolic: Broad, shallow
- Random: Narrow, deep


Hybrid Concolic Testing
- Interleave Random Testing and Concolic Testing to increase coverage [Majumdar & Sen, ICSE’07]

    while (not required coverage) {
        while (not saturation)
            perform random testing;
        Checkpoint;
        while (not increase in coverage)
            perform concolic testing;
        Restore;
    }

- Hybrid Search: deep, broad search

Challenges
- Scalability
  - Heuristic search [Burnim & Sen, ASE’08]
  - Compositional techniques [Godefroid, POPL’07]
  - Pruning redundant paths [Boonstoppel et al, TACAS’08]
  - Parallel techniques [Siddiqui & Khurshid, ICSTE’10] [Staats & Pasareanu, ISSTA’10]
  - Incremental techniques [Person et al, PLDI’11]
- Complex non-linear mathematical constraints
  - Undecidable or hard to solve
  - Heuristic solving [Lakhotia et al., ICTSS’10] [Souza et al, NFM’11]
- Testing web applications and security problems
  - String constraints [Bjorner et al, 2009] …
  - Mixed numeric and string constraints

CATG for Java
- CATG: concolic testing for Java
  - Thanks to NTT Multimedia Communications Laboratories, INC.
  - Open-source BSD license
- You can join the project and extend CATG
  - http://github.com/ksen007/janala2
  - Support for several constraint solvers:
    - Yices, CVC Lite, Choco Solver
  - Open framework for Dynamic Analysis of Java
    - Implement concolic testing
    - Implement concurrency bug finding tools: data race detector, deadlock detector, etc.

Thank you

For more information visit
http://www.doc.ic.ac.uk/~cristic/
http://srl.cs.berkeley.edu/~ksen/


CATG Architecture

Input: Java Class Files

- janala.instrument.*
  - Instruments Java class files on the fly and links janala.logger.DJVM class
- janala.logger.*
  - Generates a log of instructions executed and values read from local variables and heap locations. Stores in “trace” file and “trace.aux”
- janala.interpreters.*
  - Reads “trace” and “trace.aux” to perform concolic execution and generate path constraints
- janala.solvers.*
  - Maintains path history, i.e. sequence of branches being executed, in file “history”. Uses a solver (e.g. ChocoSolver or YicesSolver) to generate new inputs (i.e. generates the file “input”)
- Execute program (reads “input”)

Implement Concolic Testing using at most 300 lines of Ruby or Python Code


How to implement CUTE in a day?
- Go to http://srl.cs.berkeley.edu/~ksen/
- Search for CUTE homework for Java
- Download CUTE homework infrastructure
  - Contains SOOT for Java Instrumentation
  - Contains Yices for constraint solving
  - Contains instructions to implement CUTE
- Experience with this homework:
  - Many students in my class implemented CUTE using at most 300 lines of Ruby or Python code
  - Developed new search strategies to significantly improve test generation


Problem
- Your Program: Trace -> Input
- You must compute overall branch coverage

Sample Program

    public class Testme {
        static int dbl(int x) {
            return 2 * x;
        }

        public static void main(String[] args){
            int x; int y;
            x = Concolic.input.Integer();
            y = Concolic.input.Integer();

            int z = dbl(x);
            if(z==y){
                if(x != y+10){
                    System.out.println("I am fine here");
                } else {
                    System.out.println("I should not reach here");
                }
            }
        }
    }


Instrumentation

Sample Trace

    (65537,_) = (int)(x1,_)
    (65538,_) = (int)(x2,_)
    (131073,_) = (65537,_)
    (131091,_) = (0,2) * (131073,0)
    (65539,_) = (131091,_)
    else:537 (65539,0) != (65538,0)
    (65553,_) = (65538,0) + (0,10)
    else:539 (65537,0) == (65553,10)


Sample Input

    sat
    (= x1 -10)
    (= x2 -20)

PEX
- Pex is a Visual Studio 2010 Power Tool
  - http://msdn.microsoft.com/en-us/vstudio/bb980963.aspx
  - Power Tools are a set of enhancements, tools and command-line utilities
- Used by several groups within Microsoft
- Externally, available under academic and commercial licenses
- Downloaded > 40,000 times
- Anyone can try out Pex in the browser
  - http://pexforfun.com
  - > 250,000 programs analyzed within the first 5 months of the launch of the website

Slide Source: Corina Pasareanu


String Constraint Solver for Database
- Emmi, Majumdar, Sen 2007
- Assume conjunctive constraints $\sigma_1 = \sigma_2$, $\sigma_1 \neq \sigma_2$, $\sigma \in L(R)$
- Normalize to $C = \bigwedge_i \sigma_i \in R_i$
- Define $\sigma_1 \equiv \sigma_2$ when $(\sigma_1 = \sigma_2) \in C$
- Take quotient $\{p_1, \ldots, p_n\} = \{\sigma_i\}_i / \equiv$
- Define $L_p = \bigwedge_{\sigma \in p} R$ for $R$ constraining $\sigma$
- Find $w_1 \in L_{p_1}, \ldots, w_n \in L_{p_n}$
  - $w_i \neq w_j$ when $(\sigma_k \neq \sigma_l) \in C$ and $\sigma_k \in p_i$, $\sigma_l \in p_j$. Enumerate through shortest words in each: PSPACE
- PSPACE-complete

String Constraint Solvers
- String Constraints
  - Concatenation (Word Equations)
  - Regular Language Membership
  - String Length
  - Equality
  - Multiple String Variables
  - Boolean and Integer Logic

(Table: support for each feature in [DPRLE’09], [HAMPI’09] and [PEX’09])

Slide Source: Prateek Saxena


Floating Point Constraints
- In principle: can encode IEEE 754 format into Boolean constraints [Kroening, Ivancic]
  - hard to scale
- One Idea: Treat all floating point computations as interval arithmetic computations [Majumdar et al. 10]
  - Actual values guaranteed to lie within the interval
  - intervals can become quite large
- We need better solvers

WISE: Performance-Directed Testing
- Testing has focused on correctness bugs.
- Goal: Apply to software performance.
  - Find performance bottlenecks, algorithmic denial-of-service.
- WISE: Computational complexity testing.
  - How slow is an operation in the worst case?
  - Does a function meet its algorithmic complexity spec?

Slide Source: Rupak Majumdar
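The interval idea from the Floating Point Constraints slide, and the reason intervals can become quite large, can be illustrated with a few lines of interval arithmetic. This is a simplified sketch with hypothetical names: a rigorous implementation would also round the lower bound down and the upper bound up after every operation, which is omitted here.

```c
/* Closed interval [lo, hi] that is meant to contain the actual value. */
struct interval { double lo, hi; };

struct interval iv_make(double lo, double hi) {
    struct interval r = { lo, hi };
    return r;
}

static double min2(double a, double b) { return a < b ? a : b; }
static double max2(double a, double b) { return a > b ? a : b; }

struct interval iv_add(struct interval a, struct interval b) {
    return iv_make(a.lo + b.lo, a.hi + b.hi);
}

struct interval iv_sub(struct interval a, struct interval b) {
    return iv_make(a.lo - b.hi, a.hi - b.lo);
}

/* The product ranges over the four corner products. */
struct interval iv_mul(struct interval a, struct interval b) {
    double p1 = a.lo * b.lo, p2 = a.lo * b.hi;
    double p3 = a.hi * b.lo, p4 = a.hi * b.hi;
    return iv_make(min2(min2(p1, p2), min2(p3, p4)),
                   max2(max2(p1, p2), max2(p3, p4)));
}
```

For x in [1, 2] the expression x - x is always 0, but `iv_sub` applied to the same interval twice yields [-1, 1], because the dependency between the two operands is lost. Effects like this are one reason the computed intervals grow.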

Performance-Directed Testing
- Example: Performance bug in Jar!
  - Reported by Sun on May 15, 2009
  - update method O(N^2) instead of O(N)
  - O(N) look-up on every file, rather than O(1)
  - wasted 75% of run-time building rt.jar!

Incremental Constraint Solving
- Observation: one constraint is negated at each execution
  - C1 ∧ C2 ∧ … ∧ Ck has a satisfying assignment
  - Need to solve C1 ∧ C2 ∧ … ∧ ¬Ck
  - Previous solution more or less similar to current solution
  - Eliminate non-dependent constraints: (x==1) ∧ (y>2) ∧ ¬(y==4) to (y>2) ∧ ¬(y==4)
- Incremental Solving
  - 100-1000 times faster than a naïve solver
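Eliminating non-dependent constraints amounts to keeping only the constraints that share variables, directly or transitively, with the negated one. The sketch below shows one way to compute that slice; the bit-mask encoding of variable sets and the function name `dependent_slice` are assumptions made for this example, not the implementation of any tool named in these slides.

```c
/* Each constraint is summarized by the set of variables it mentions,
   encoded as a bit mask (bit 0 = x, bit 1 = y, ...). The last entry
   vars[n-1] belongs to the negated constraint.
   Returns a bit mask over constraint indices: bit i is set when
   constraint i must be passed to the solver. Supports up to 32
   constraints. */
unsigned dependent_slice(const unsigned vars[], int n) {
    if (n <= 0 || n > 32) {
        return 0u;
    }
    unsigned kept  = 1u << (n - 1);   /* always keep the negated constraint */
    unsigned reach = vars[n - 1];     /* variables reachable so far         */
    int changed = 1;
    while (changed) {                 /* iterate to a fixed point           */
        changed = 0;
        for (int i = 0; i < n; i++) {
            if (!(kept & (1u << i)) && (vars[i] & reach)) {
                kept  |= 1u << i;
                reach |= vars[i];
                changed = 1;
            }
        }
    }
    return kept;
}
```

For the example above the masks are {x} = 1, {y} = 2, {y} = 2, and the result 0x6 keeps only (y>2) and ¬(y==4). The value of x from the previous solution can be reused unchanged.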
