Maple: A Coverage-Driven Testing Tool for Multithreaded Programs

Jie Yu, Satish Narayanasamy
University of Michigan
{jieyu, natish}@umich.edu

Cristiano Pereira, Gilles Pokam
Intel Corporation
{cristiano.l.pereira, gilles.a.pokam}@intel.com

Abstract

Testing multithreaded programs is a hard problem, because it is challenging to expose those rare interleavings that can trigger a concurrency bug. We propose a new thread interleaving coverage-driven testing tool called Maple that seeks to expose untested thread interleavings as much as possible. It memoizes tested interleavings and actively seeks to expose untested interleavings for a given test input to increase interleaving coverage.

We discuss several solutions to realize the above goal. First, we discuss a coverage metric based on a set of interleaving idioms. Second, we discuss an online technique to predict untested interleavings that can potentially be exposed for a given test input. Finally, the predicted untested interleavings are exposed by actively controlling the thread schedule while executing for the test input. We discuss our experiences in using the tool to expose several known and unknown bugs in real-world applications such as Apache and MySQL.

Categories and Subject Descriptors D.2.5 [Software Engineering]: Testing and Debugging

General Terms Design, Reliability

Keywords Testing, Debugging, Concurrency, Coverage, Idioms

1. Introduction

Testing a shared-memory multithreaded program and exposing concurrency bugs is a hard problem. For most concurrency bugs, the thread interleavings that can expose them manifest only rarely during an unperturbed execution. Even if a programmer manages to construct a test input that can trigger a concurrency bug, it is often difficult to expose the infrequently occurring buggy thread interleaving, because there can be many correct interleavings for that input.

One common practice for exposing concurrency bugs is stress-testing, where a parallel program is subjected to extreme scenarios during a test run. This method is clearly inadequate, because naively executing a program again and again over an input tends to unnecessarily test similar thread interleavings and is less likely to expose a rare buggy interleaving. An alternative to stress testing is systematic testing [13], where the thread scheduler systematically explores all legal thread interleavings for a given test input. Though the number of thread schedules could be reduced by using partial-order reduction [10, 12] and by bounding the number of context-switches [32], this approach does not scale well for long running programs.

Another recent development is active testing [36, 38, 50]. Active testing tools use approximate bug detectors such as static data-race detectors [7, 43] to predict buggy thread interleavings. Using a test input, an active scheduler would try to exercise a suspected buggy thread interleaving in a real execution and produce a failed test run to validate that the suspected bug is a true positive. Active testing tools target specific bug types such as data-races [38] or atomicity violations [17, 23, 35, 36, 40], and therefore are not generic. For a given test input, after actively testing for all the predicted buggy thread interleavings, a programmer may not be able to determine whether she should continue testing other thread interleavings for the same input or proceed to test a different input.

In this paper, we propose a tool called Maple that employs a coverage-driven approach for testing multithreaded programs. An interleaving coverage-driven approach has the potential to find different types of concurrency bugs, and also provides a metric for programmers to understand the quality of their tests.
While previous studies have attempted to define coverage metrics for multithreaded programs based on synchronization operations [2] and inter-thread memory dependencies [22, 24, 39], synergistic testing tools that can help programmers achieve higher coverage for those metrics have been lacking.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
OOPSLA'12, October 19–26, 2012, Tucson, Arizona, USA.
Copyright © 2012 ACM 978-1-4503-1561-6/12/10. $10.00

The first contribution of this paper is the set of interleaving idioms which we use to define coverage for multithreaded programs. An interleaving idiom is a pattern of inter-thread dependencies through shared-memory accesses. An instance of an interleaving idiom is called an iRoot, which is represented using a set of static memory instructions. The goal of Maple is to expose as many new iRoots as possible during testing.

We define our set of interleaving idioms based on two hypotheses. One is the well-known small scope hypothesis [18, 29] and the other is what we refer to as the value-independence hypothesis. The small scope hypothesis [18, 29] states that most concurrency bugs can be exposed using a small number of preemptions. CHESS [32] exploits this observation to bound the number of preemptions to reduce the test space. We apply the same principle to bound the number of inter-thread memory dependencies in our interleaving patterns to two. Our empirical analysis of several concurrency bugs in real applications shows that a majority of them can be triggered if at most two inter-thread memory dependencies are exposed in an execution.

Our value-independence hypothesis is that a majority of concurrency bugs get triggered if the erroneous inter-thread memory dependencies are exposed, irrespective of the data values of the shared variables involved in the dependencies. We leverage this hypothesis to test for an iRoot only once, and avoid testing the same thread interleaving (iRoot) again and again across different test inputs. Thus, the number of thread interleavings to test would progressively reduce as we test for more inputs.

A critical challenge is in exposing untested iRoots for a given test input. To this end, we built the Maple testing infrastructure comprised of an online profiler and an active scheduler shown in Figure 1.

Figure 1. Overview of the framework.

Maple's online profiler examines an execution for a test input, and predicts the set of candidate iRoots that are feasible for that input but have not yet been exposed in any prior test runs. Predicted untested iRoots are given as input to Maple's active scheduler. The active scheduler takes the test input and orchestrates the thread interleaving to realize the predicted iRoot in an actual execution using a set of novel heuristics. If the iRoot gets successfully exposed, then it is memoized by storing it in a database of iRoots tested for the program. We also consider the possibility that certain iRoots may never be feasible for any input. We progressively learn these iRoots and store them in a separate database. These iRoots are given a lower priority when there is only limited time available for testing.

When the active scheduler for an iRoot triggers a concurrency bug causing the program to produce an incorrect result, Maple generates a bug report that contains the iRoot. Our active scheduler orchestrates thread schedules on a uniprocessor, and therefore recording the order of the thread schedule along with other non-deterministic system input, if any, could allow a programmer to reproduce the failed execution exposed by Maple.

We envision two usage models for Maple. One usage scenario is when a programmer has a test input and wants to test her program with it. In this scenario, Maple will help the programmer actively expose thread interleavings that were not tested in the past. Also, a programmer can determine how long to test for an input, because Maple's predictor would produce a finite number of iRoots for testing.

Another usage scenario is when a programmer accidentally exposed a bug for some input, but is unable to reproduce the failed execution. A programmer could use Maple with the bug triggering input to quickly expose the buggy interleaving. We helped a developer at Intel in a similar situation to expose an unknown bug using Maple.

We built a dynamic analysis framework using PIN [28] for analyzing concurrent programs. Using this framework, we built several concurrency testing tools including Maple, a systematic testing tool called CHESS [32] and tools such as PCT [3] that rely on randomized thread schedulers, which we compare in our experiments.

We perform several experiments using open-source applications (Apache, MySQL, Memcached, etc.). Though Maple does not provide hard guarantees similar to CHESS [29] and PCT [3], it is effective in achieving higher iRoot coverage faster than those tools in practice. We also show that Maple is effective in exposing 13 documented bugs faster than these prior methods, which provides evidence that achieving higher coverage for our metric based on interleaving idioms is effective in exposing concurrency bugs. We also discuss our experiences in using Maple to find 3 unknown bugs in aget, glibc, and CNC.

Our dynamic analysis framework for concurrent programs and all the testing tools we developed are made available to the public under the Apache 2.0 license. They can be downloaded from https://github.com/jieyu/maple.

2. Coverage-Driven Testing Based on Interleaving Idioms

In this section we discuss a coverage-driven testing methodology for multithreaded programs. For sequential programs, metrics such as program statement coverage are commonly used to understand the effectiveness of a test suite and determine if additional testing is required.

[Figure: panels Idiom1, Idiom2, Idiom3, Idiom4, Idiom5, Idiom6]
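As a rough illustration of the bookkeeping described in Section 1 (memoizing exposed iRoots and giving lower priority to iRoots that may be infeasible), the following Python sketch may help. It is not Maple's implementation: the names `IRoot`, `IRootDatabase`, `plan`, and `record` are hypothetical, static instructions are stood in for by plain string labels, and the rule that a failed exposure attempt marks an iRoot as suspected infeasible is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRoot:
    """Hypothetical iRoot record: an idiom label plus the static memory
    instructions involved. Data values are deliberately not stored, in
    the spirit of the value-independence hypothesis."""
    idiom: str
    instructions: tuple


class IRootDatabase:
    """Hypothetical stand-in for the two databases: iRoots already
    exposed, and iRoots suspected to be infeasible."""

    def __init__(self):
        self.tested = set()
        self.suspected_infeasible = set()

    def plan(self, predicted):
        """Drop iRoots that were already exposed and move suspected
        infeasible ones to the back of the queue."""
        untested = [r for r in predicted if r not in self.tested]
        # sorted() is stable and False orders before True, so iRoots not
        # suspected infeasible keep their relative order and come first.
        return sorted(untested, key=lambda r: r in self.suspected_infeasible)

    def record(self, iroot, exposed):
        """Store the outcome of one active-scheduling attempt."""
        if exposed:
            self.tested.add(iroot)
            self.suspected_infeasible.discard(iroot)
        else:
            # Assumption for this sketch: a failed attempt is enough to
            # lower the iRoot's priority.
            self.suspected_infeasible.add(iroot)


db = IRootDatabase()
a = IRoot("idiom1", ("I1", "I2"))
b = IRoot("idiom1", ("I3", "I4"))
c = IRoot("idiom1", ("I5", "I6"))

db.record(a, exposed=True)   # exposed during an earlier test input
db.record(b, exposed=False)  # attempt failed, so lower its priority

# For a later input whose profile predicts a, b and c:
# a is skipped, c is tried first, b is tried last.
queue = db.plan([a, b, c])
```

Because `IRoot` is a frozen dataclass, two records with the same idiom and instruction labels compare equal and hash identically, so an iRoot exposed under one test input is recognized and skipped under any later input.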