Koushik Sen's Research Statement
Research Overview (2006-2011)
Koushik Sen

1 Research

A primary goal of my research is to improve software reliability and quality through the development of tools and techniques based on formal methods and programming language theory. In my research I have developed specification and testing techniques that use sophisticated program analysis and formal methods internally while providing a familiar usage model that everyday programmers can readily use.

Motivation behind my research

Today’s software systems suffer from poor reliability. NIST estimates that $60 billion is lost annually in the US due to software errors, and such errors in transportation, medicine, and other areas can put lives at risk. This indicates that our techniques to ensure software reliability are far from the level of maturity attained by other engineering disciplines that create critical infrastructure. The situation is likely to get worse, as the complexity of software systems increases without a matching increase in the effectiveness of software quality tools and techniques. Testing is the predominant technique used by the software industry to make software reliable. Studies show that testing accounts for more than half of the total software development cost in industry. Although testing is a widely used and well-established technique for building reliable software, existing techniques for testing are mostly ad hoc and ineffective: serious bugs are often exposed only after deployment. My research aims to make real-world software specification and testing more usable, effective, scalable, and rigorous.

Concolic Testing and Directed Automated Random Testing

Prior to 2006 (dissertation work), I worked on several novel approaches to improving software reliability; these approaches dramatically improve the testing and runtime monitoring of software, provide a method for quantifying the correctness of software, and enable the verification of software using machine learning. We proposed Concolic Testing (also called DART: Directed Automated Random Testing), an efficient way to automatically and systematically test programs. We showed how a combination of runtime symbolic execution and automated theorem proving could generate non-redundant and exhaustive test inputs [PLDI’05, FSE’05, CAV’06a, FASE’06, HVC’06]. The success of concolic testing in scalably and exhaustively testing real-world software was a major milestone in the ad hoc world of software testing, and it has inspired the development of several industrial and academic automated testing and security tools, such as PEX, SAGE, YOGI, and Vigilante at Microsoft, Apollo at IBM, and SPLAT, BitBlaze, jFuzz, Oasis, and SmartFuzz in academia. (My two key papers on the subject have been cited over 1000 times and have won three awards: an ACM SIGSOFT Distinguished Paper Award at ESEC/FSE’05, an NSF CAREER Award in 2008, and a Haifa Verification Conference Award in 2009.) A central reason behind the wide adoption of concolic testing is that, while it uses program analysis and automated theorem proving internally, it exposes a testing usage model that is familiar to most software developers. I further proposed combining concolic testing with formal specifications by using runtime monitors [FSE’03, ICSE’04, VMCAI’04, TACAS’04]. While concolic testing and runtime monitoring are suitable for finding bugs, they cannot guarantee the correctness of software.
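To make the concolic-testing loop concrete, the following is a minimal, self-contained Java sketch of the idea rather than the actual CUTE/jCUTE implementation: the program under test is hand-instrumented so that every branch records a constraint over the symbolic input, and after each concrete run the last branch constraint is negated and "solved" (here by brute force instead of an automated theorem prover) to obtain an input that drives execution down a new path. All class and method names below are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class ConcolicSketch {

    // A branch constraint over the single symbolic input x, of the form (x OP k),
    // together with the direction the concrete execution actually took.
    record Constraint(String op, int k, boolean taken) {}

    static final List<Constraint> path = new ArrayList<>();

    // Program under test, hand-instrumented so every branch records a constraint.
    static void testee(int x) {
        boolean b1 = x > 10;
        path.add(new Constraint(">", 10, b1));
        if (b1) {
            boolean b2 = x == 42;
            path.add(new Constraint("==", 42, b2));
            if (b2) {
                throw new AssertionError("bug reached with x = " + x);
            }
        }
    }

    // Toy constraint "solver": brute-force search for an input satisfying
    // every recorded constraint (a real tool would call a theorem prover).
    static Integer solve(List<Constraint> cs) {
        for (int x = -1000; x <= 1000; x++) {
            boolean ok = true;
            for (Constraint c : cs) {
                boolean holds = c.op().equals(">") ? x > c.k() : x == c.k();
                if (holds != c.taken()) { ok = false; break; }
            }
            if (ok) return x;
        }
        return null;
    }

    public static void main(String[] args) {
        int input = 0;  // start from an arbitrary concrete input
        for (int iter = 0; iter < 10; iter++) {
            path.clear();
            try {
                testee(input);
                System.out.println("input " + input + ": no bug on this path");
            } catch (AssertionError e) {
                System.out.println(e.getMessage());
                return;
            }
            // Negate the last branch constraint of the executed path and solve,
            // steering the next concrete run down an unexplored path.
            List<Constraint> negated = new ArrayList<>(path);
            Constraint last = negated.remove(negated.size() - 1);
            negated.add(new Constraint(last.op(), last.k(), !last.taken()));
            Integer next = solve(negated);
            if (next == null) break;  // the alternative branch is infeasible
            input = next;
        }
    }
}
```

Starting from the arbitrary input 0, the sketch successively generates 11 and then 42, reaching the buggy branch in three runs; the real tools follow the same negate-and-solve discipline but instrument bytecode or C code automatically and discharge the path constraints with an automated theorem prover.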
Since testing and monitoring alone cannot guarantee correctness, I proposed statistical model checking [CAV’04, CAV’05, QEST’05] and learning-to-verify [ICFEM’04, FSTTCS’04, TACAS’05] to quantify and prove the correctness of software.

After 2006 (at Berkeley), I worked on a number of techniques to adapt concolic testing to larger programs and a wider variety of software and testing domains. The number of test inputs that must be generated to exhaustively test a large program is usually astronomically large, and CUTE or jCUTE fails to generate all of those inputs within a reasonable time budget. I have worked on a variety of techniques [ISSTA’07, ICSE’07, ASE’07b, ISSTA’08, ASE’08b, SAS’08, ICSE’09b, ICSE’11, SAT’11] to make automated test generation scalable and exhaustive. We have proposed hybrid concolic testing [ICSE’07], which combines random testing, a fast but non-exhaustive method, with concolic testing, an exhaustive but slow technique. We have also developed a novel strategy for concolic test generation that quickly achieves high code coverage [ASE’08b]; in this strategy, test generation is guided by the static control-flow graph of the program under test. These two heuristic techniques have enabled us to scale concolic testing significantly. We have invented predictive testing to amplify the effectiveness of testing with manually written test cases by combining static and dynamic program analysis [FSE’07, ASE’08a]. We have extended concolic testing to test the asymptotic ("Big O") runtime complexity of programs [ICSE’09b] and to automatically test database applications [ISSTA’07].

Specifying Correctness of Parallel Programs

My research focus has shifted from sequential program correctness to that of parallel programs, as the spread of multicore and manycore processors and graphics processing units (GPUs) has greatly increased the need for parallel correctness tools. Reasoning about multithreaded programs is significantly more difficult than reasoning about sequential programs because of the nondeterministic interleaving of parallel threads. We believe that the only way to tackle this complexity is to find ways to separately specify, test, and reason about the correctness of a program’s use of parallelism and the program’s functional correctness. Driven by this belief, we have developed two fundamental techniques for separating the parallelization-correctness aspect of a program from the program’s sequential functional correctness.

Our first technique is based on the observation that a key simplifying assumption available to the developer of sequential code is the determinism of the execution model. In contrast, parallel programs exhibit nondeterministic behavior due to the interleaving of parallel threads. This nondeterminism is essential to harness the power of parallel chips, yet programmers often strive to preserve the determinism of their applications in the face of it; that is, to always compute the same output for the same input, no matter how the parallel threads of execution are scheduled. We argue that programmers should be provided with a framework that allows them to express the natural, intended deterministic behavior of parallel programs directly and easily. Thus, we proposed extending programming languages with constructs for writing specifications, called bridge assertions, that relate the outcomes of two parallel executions differing only in their thread interleavings [CACM’10, FSE’09, ICSE’10, ASPLOS’11]. A minimal sketch of this idea appears below.
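As a rough illustration of what such a specification amounts to in practice, here is a small Java harness; this is a sketch only, not the framework from [FSE’09] (which provides deterministic-block constructs and a controlled scheduler), and the names in it are invented for this example. The harness re-executes a parallel block on the same input and asserts that every observed schedule produces the same result as the first, approximating the bridge assertion that any two interleavings yield equivalent outcomes.

```java
import java.util.concurrent.atomic.AtomicLong;

public class DeterminismSketch {

    // Parallel block under specification: two threads sum halves of an array.
    // Because the shared counter is an AtomicLong, every schedule yields the
    // same sum; replacing it with an unsynchronized long counter would let the
    // bridge assertion below fail under some interleavings.
    static long parallelSum(int[] a) throws InterruptedException {
        AtomicLong sum = new AtomicLong();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < a.length / 2; i++) sum.addAndGet(a[i]);
        });
        Thread t2 = new Thread(() -> {
            for (int i = a.length / 2; i < a.length; i++) sum.addAndGet(a[i]);
        });
        t1.start(); t2.start();
        t1.join();  t2.join();
        return sum.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int[] input = new int[10_000];
        for (int i = 0; i < input.length; i++) input[i] = i;

        // Bridge assertion, approximated by repeated execution: the outcome of
        // every observed schedule must equal the outcome of the first one.
        // (Run with `java -ea DeterminismSketch` so assertions are enabled.)
        long reference = parallelSum(input);
        for (int run = 0; run < 100; run++) {
            long observed = parallelSum(input);
            assert observed == reference
                : "nondeterministic result: " + observed + " vs " + reference;
        }
        System.out.println("deterministic across 100 runs: sum = " + reference);
    }
}
```

Note that the assertion constrains only how two schedules relate to each other; it says nothing about whether the computed sum is functionally correct, which is exactly the separation of parallelism correctness from functional correctness described above.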
We evaluated the utility of these assertions by manually adding deterministic specifications to a number of parallel Java benchmarks. We found it fairly easy to specify the correct deterministic behavior of the benchmark programs using our assertions, even though in most cases we were unable to write traditional invariants or functional-correctness assertions. Our papers on this work won an ACM SIGSOFT Distinguished Paper Award [FSE’09] and the 2010 IFIP TC2 Manfred Paul Award for Excellence in Software: Theory and Practice [ICSE’10]. Our research has been featured in CACM’s Research Highlights [CACM’10].

Our second technique, in the same spirit of decomposing our correctness efforts, is suggested by the observation that manually parallelized programs often employ algorithms that differ from their sequential versions. In some cases, the algorithm itself is nondeterministic. Examples of such nondeterministic algorithms include branch-and-bound algorithms, which may produce multiple valid outputs for the same input. Such algorithmic nondeterminism is often tightly coupled with the functional correctness of the code, violating the premise of our determinism-checking technique. To address these situations, we proposed to separate reasoning about the algorithmic and the scheduler sources of nondeterminism. For such cases, we provide a specification technique in which a nondeterministic sequential (NDSEQ) version of a parallel program serves as the specification for the program’s use of parallelism [HotPar’10, PLDI’11]. Using this technique, the programmer expresses the intended algorithmic nondeterminism in his or her program, and the program’s parallelism is correct if the nondeterministic thread scheduling adds no program behaviors beyond those of the NDSEQ version. Testing, debugging, and reasoning about functional correctness can then be performed on the NDSEQ version, with controlled nondeterminism but with no interleaving of parallel threads. We developed an automatic technique [PLDI’11] to test parallelism correctness by checking that an observed parallel execution obeys its nondeterministic sequential specification.

Active Testing of Concurrent Programs

To check parallel correctness, we have developed active testing, a new scalable automated method for testing and debugging parallel programs. Active testing combines the power of imprecise program analysis with the precision of software testing to quickly discover concurrency bugs and to reproduce discovered bugs on demand [PLDI’08, FSE’08, PLDI’09, CAV’09, ICSE’09a, FSE’10a, FSE’10b, HotPar’11, TACAS’11, ISSTA’11, SC’11]. The key idea behind active testing is to control the thread