Arxiv:2007.15943V1 [Cs.SE] 31 Jul 2020
Total Page:16
File Type:pdf, Size:1020Kb
MUZZ: Thread-aware Grey-box Fuzzing for Effective Bug Hunting in Multithreaded Programs Hongxu Chen§† Shengjian Guo‡ Yinxing Xue§∗ Yulei Sui¶ Cen Zhang† Yuekang Li† Haijun Wang# Yang Liu† †Nanyang Technological University ‡Baidu Security ¶University of Technology Sydney §University of Science and Technology of China #Ant Financial Services Group Abstract software performance. A typical computing paradigm of mul- tithreaded programs is to accept a set of inputs, distribute Grey-box fuzz testing has revealed thousands of vulner- computing jobs to threads, and orchestrate their progress ac- abilities in real-world software owing to its lightweight cordingly. Compared to sequential programs, however, multi- instrumentation, fast coverage feedback, and dynamic adjust- threaded programs are more prone to severe software faults. ing strategies. However, directly applying grey-box fuzzing On the one hand, the non-deterministic thread-interleavings to input-dependent multithreaded programs can be extremely give rise to concurrency-bugs like data-races, deadlocks, inefficient. In practice, multithreading-relevant bugs are usu- etc [32]. These bugs may cause the program to end up with ab- ally buried in the sophisticated program flows. Meanwhile, normal results or unexpected hangs. On the other hand, bugs existing grey-box fuzzing techniques do not stress thread- that appear under specific inputs and interleavings may lead interleavings that affect execution states in multithreaded pro- to concurrency-vulnerabilities [5, 30], resulting in memory grams. Therefore, mainstream grey-box fuzzers cannot ade- corruptions, information leakage, etc. quately test problematic segments in multithreaded software, although they might obtain high code coverage statistics. There exist a line of works on detecting bugs and vulner- To this end, we propose MUZZ, a new grey-box fuzzing abilities in multithreaded programs. Static concurrency-bug technique that hunts for bugs in multithreaded programs. predictors [2, 40, 45, 50] aim to approximate the runtime MUZZ owns three novel thread-aware instrumentations, behaviors of a program without actual concurrent execution. namely coverage-oriented instrumentation, thread-context However, they typically serve as a complementary solution instrumentation, and schedule-intervention instrumentation. due to the high percentage of false alarms [19]. Dynamic During fuzzing, these instrumentations engender runtime feed- detectors detect concurrency-violations by reasoning memory back to accentuate execution states caused by thread inter- read/write and synchronization events in a particular execu- leavings. By leveraging such feedback in the dynamic seed tion trace [5, 12, 21, 41, 42, 49, 58]. Several techniques like selection and execution strategies, MUZZ preserves more valu- ThreadSanitizer (a.k.a., TSan) [42] and Helgrind [49] have able seeds that expose bugs under a multithreading context. been widely used in practice. However, these approaches by We evaluate MUZZ on twelve real-world multithreaded themselves do not automatically generate new test inputs to programs. Experiments show that MUZZ outperforms exercise different paths in multithreaded programs. arXiv:2007.15943v1 [cs.SE] 31 Jul 2020 AFL in both multithreading-relevant seed generation and Meanwhile, grey-box fuzzing is effective in generating test concurrency-vulnerability detection. Further, by replaying inputs to expose vulnerabilities [34, 36]. It is reported that the target programs against the generated seeds, MUZZ also grey-box fuzzers (GBFs) such as AFL [63] and libFuzzer [31] reveals more concurrency-bugs (e.g., data-races, thread-leaks) have detected more than 16,000 vulnerabilities in hundreds than AFL. In total, MUZZ detected eight new concurrency- of real-world software projects [16, 31, 63]. vulnerabilities and nineteen new concurrency-bugs. At the Despite the great success of GBFs in detecting vulner- time of writing, four reported issues have received CVE IDs. abilities, there are few efforts on fuzzing user-space multi- threaded programs. General-purpose GBFs usually cannot 1 Introduction explore thread-interleaving introduced execution states due to their unawareness of multithreading. Therefore, they can- Multithreading has been popular in modern software systems not effectively detect concurrency-vulnerabilities inherently since it substantially utilizes the hardware resources to boost buried in sophisticated program flows [30]. In a discussion in 2015 [64], the author of AFL, Michal Zalewski, even suggests ∗Corresponding Author. that “it’s generally better to have a single thread”. In fact, due 1 to the difficulty and inefficiency, the fuzzing driver programs Algorithm 1: Grey-box Fuzzing Workflow in Google’s continuous fuzzing platform OSS-fuzz are all input :program Po, initial seed queue QS tested in single-threaded mode [15]. Also, by matching unions output :final seed queue QS, vulnerable seed files TC of keyword patterns “race*”, “concurren*” and “thread*” in 1 P f instrument(Po); // instrumentation the MITRE CVE database [48], we found that only 202 CVE 2 TC 0/; records are relevant to concurrency-vulnerabilities out of the 3 while True do 70438 assigned CVE IDs ranging from CVE-2014-* to CVE- 4 t select_next_seed(QS); // seed selection 2018-*. In particular, we observed that, theoretically, at most 5 M get_mutation_chance(P f , t) ; // seed scheduling 4 CVE records could be detected by grey-box fuzzers that 6 for i 2 1:::M do 0 work on user-space programs. 7 t mutated_input(t) ; // seed mutation As a result, there are no practical fuzzing techniques to 8 res run(P f , t’, Nc); // repeated execution 9 if is_crash(res) then // seed triaging test input-dependent user-space multithreaded programs and 0 10 T T [ ft g ; // report vulnerable seeds detect bugs or vulnerabilities inside them. To this end, we C C 11 else if cov_new_trace(t’, res) then present a dedicated grey-box fuzzing technique, MUZZ, to 0 12 Q Q ⊕t ; // preserve “effective” seeds reveal bugs by exercising input-dependent and interleaving- S S dependent paths. We categorize the targeted multithreading- relevant bugs into two major groups: • concurrency-vulnerabilities (Vm): they correspond to memory corruption vulnerabilities that occur in a multi- seeds QS, a GBF first utilizes instrumentation to track the cov- threading context. These vulnerabilities can be detected erage information in Po. Then it enters the fuzzing loop: 1) during the fuzzing phase. Seed selection decides which seed to be selected next; 2) Seed scheduling decides how many mutations will be applied • concurrency-bugs (Bm): they correspond to the bugs like M data-races, atomicity-violations, deadlocks, etc. We detect on the selected seed t; 3) Seed mutation applies mutations 0 them by replaying the seeds generated by MUZZ with state- on seed t to generate a new seed t ; 4) During repeated ex- 0 of-the-art concurrency-bug detectors such as TSan. ecution, for each new seed t , the fuzzer executes against it Note that B may not be revealed during fuzzing since they Nc times to get its execution statistics; 5) Seed triaging eval- m 0 do not necessarily result in memory corruption crashes. In the uates t based on the statistics and the coverage feedback remaining sections, when referring to multithreading-relevant from instrumentation, to determine whether the seed leads bugs, we always mean the combination of concurrency-bugs to a vulnerability, or whether it is “effective” and should be preserved in the seed queue for subsequent fuzzing. Here, and concurrency-vulnerabilities, i.e., Vm [ Bm. We summarize the contributions of our work as follows: steps 3), 4), 5) are continuously processed M times. Notably, 1) We develop three novel thread-aware instrumentations for Nc times of repeated executions are necessary since a GBF needs to collect statistics such as average execution time for grey-box fuzzing that can distinguish the execution states 0 caused by thread-interleavings. t , which will be used to calculate mutation times M for seed scheduling in the next iteration. In essence, the effectiveness 2) We optimize seed selection and execution strategies based of grey-box fuzzing relies on the feedback collected from on the runtime feedback provided by the instrumentations, the instrumentation. Specifically, the result of cov_new_trace which help generate more effective seeds concerning the mul- (line 11) is determined by the coverage feedback. tithreading context. 3) We integrate these analyses into MUZZ for an effective bug hunting in multithreaded programs. Experiments on 12 real- 2.2 The Challenge in Fuzzing Multithreaded world programs show that MUZZ outperforms other fuzzers Programs and Our Solution like AFL and MOPT in detecting concurrency-vulnerabilities and revealing concurrency-bugs. Figure1 is an abstracted multithreaded program that accepts 4) MUZZ detected 8 new concurrency-vulnerabilities and 19 a certain input file and distributes computing jobs to threads. new concurrency-bugs, with 4 CVE IDs assigned. Consider- Practically it may behave like compressors/decompressors ing the small portion of concurrency-vulnerabilities recorded (e.g., lbzip2, pbzip2), image processors (e.g., ImageMagick, in the CVE database, the results are promising. GraphicsMagick), encoders/decoders (e.g., WebM, libvpx), etc. After reading the input content buf, it does an initial validity check inside the function check. It exits immediately if the 2 Background and Motivation buffer does not satisfy certain properties.