Effective Testing for Concurrency Bugs

Thesis for obtaining the title of

Doctor of Engineering Science of the Faculties of Natural Science and Technology I of Saarland University

by Pedro Fonseca

Saarbrücken, 13 April 2015

Date of Colloquium: 24/06/2015
Dean of Faculty: Univ.-Prof. Dr. Markus Bläser

Chair of the Committee: Prof. Dr. Andreas Zeller
First Reviewer: Prof. Dr. Rodrigo Rodrigues
Second Reviewer: Prof. Dr. Peter Druschel
Third Reviewer: Prof. Dr. George Candea
Fourth Reviewer: Dr. Björn Brandenburg
Academic Assistant: Dr. Przemyslaw Grabowicz

© 2015 Pedro Fonseca. ALL RIGHTS RESERVED.


Dedicated to my parents and brother

This work was supported by the Foundation for Science and Technology of Portugal (SFRH/BD/45309/2008) and the Max Planck Society.

Abstract

In the current multi-core era, concurrency bugs are a serious threat to software reliability. As hardware becomes more parallel, concurrent programming will become increasingly pervasive. However, correct concurrent programming is known to be extremely challenging for developers and can easily lead to the introduction of concurrency bugs. This dissertation addresses this challenge by proposing novel techniques to help developers expose and detect concurrency bugs. We conducted a bug study to better understand the external and internal effects of real-world concurrency bugs. Our study revealed that a significant fraction of concurrency bugs qualify as semantic or latent bugs, which are two particularly challenging classes of concurrency bugs. Based on the insights from the study, we propose a concurrency bug detector, PIKE, that analyzes the behavior of program executions to infer whether concurrency bugs have been triggered during a concurrent execution. In addition, we present the design of a testing tool, SKI, that allows developers to test operating system kernels for concurrency bugs in a practical manner. SKI bridges the gap between user-mode testing and kernel-mode testing by enabling the systematic exploration of the kernel thread interleaving space. Our evaluation shows that both PIKE and SKI are effective at finding concurrency bugs.

Kurzdarstellung

Im gegenwärtigen Multicore-Zeitalter sind Fehler aufgrund von Nebenläufigkeit eine ernsthafte Bedrohung der Zuverlässigkeit von Software. Mit der wachsenden Parallelisierung von Hardware wird nebenläufiges Programmieren nach und nach allgegenwärtig. Diese Art von Programmieren ist jedoch als äußerst schwierig bekannt und kann leicht zu Programmierfehlern führen. Die vorliegende Dissertation nimmt sich dieser Herausforderung an, indem sie neuartige Techniken vorschlägt, die Entwicklern beim Aufdecken von Nebenläufigkeitsfehlern helfen. Wir führen eine Studie von Fehlern durch, um die externen und internen Effekte von in der Praxis vorkommenden Nebenläufigkeitsfehlern besser zu verstehen. Diese ergibt, dass ein bedeutender Anteil von solchen Fehlern als semantisch bzw. latent zu charakterisieren ist – zwei besonders herausfordernde Klassen von Nebenläufigkeitsfehlern. Basierend auf den Erkenntnissen der Studie entwickeln wir einen Detektor (PIKE), der Programmausführungen daraufhin analysiert, ob Nebenläufigkeitsfehler aufgetreten sind. Weiterhin präsentieren wir das Design eines Testtools (SKI), das es Entwicklern ermöglicht, Betriebssystemkerne praktikabel auf Nebenläufigkeitsfehler zu überprüfen. SKI füllt die Lücke zwischen Testen im Benutzermodus und Testen im Kernelmodus, indem es die systematische Erkundung der Kernel-Thread-Verschachtelungen erlaubt. Unsere Auswertung zeigt, dass sowohl PIKE als auch SKI effektiv Nebenläufigkeitsfehler finden.


Acknowledgements

I'm greatly indebted to my advisor, Rodrigo Rodrigues, for his continuous effort over the years to support and encourage me. His exceptional energy and constructive optimism were absolutely essential to the success of this work. His altruism made me a better person.

This thesis would not exist without the contributions of my collaborators: Björn Brandenburg, Cheng Li and Vishal Singhal. Special thanks to Björn for the feedback and support provided during the last few years of my work.

I won't forget that, during my first years at MPI-SWS, I learned a lot from the perseverance and experience of Atul Singh and Andreas Haeberlen. The energy of Pramod Bhatotia and Cheng Li was contagious. Thank you all. My friends and colleagues at MPI-SWS, in particular the members of the SysNets group, played a crucial role in creating an inspiring research environment and provided invaluable feedback that was critical to this research. Living in Saarbrücken while working on my dissertation was an enjoyable experience, in large part thanks to my friends.

I am grateful to my parents and my brother for their permanent support throughout my life.


Contents

List of Figures

List of Tables

1 Introduction
  1.1 Hardware parallelism: A blessing and a curse
  1.2 The threat of concurrency bugs
  1.3 Problem statement
  1.4 Contributions
  1.5 Outline

2 Related work
  2.1 Studies of bugs
  2.2 Testing software
    2.2.1 Detecting concurrency bugs
    2.2.2 Exposing concurrency bugs
  2.3 Deterministic replay
  2.4 Virtual machine introspection (VMI)

3 Background
  3.1 Systematic exploration of the interleaving space
    3.1.1 PCT scheduling algorithm
  3.2 Database management systems
    3.2.1 MySQL overview
    3.2.2 Internal structure
    3.2.3 Concurrent programming
    3.2.4 Request vs. transaction concurrency
    3.2.5 Storage engines
  3.3 Operating system kernels
    3.3.1 Concurrency and synchronization mechanisms


    3.3.2 Thread affinity

4 The effects of concurrency bugs
  4.1 Overview
  4.2 Study methodology
    4.2.1 Selection of concurrency bugs
    4.2.2 Manual analysis of bug reports
  4.3 Results
    4.3.1 Evolution of concurrency bugs
    4.3.2 External effects
    4.3.3 Latent bugs
    4.3.4 Characteristics of semantic bugs
    4.3.5 Internal effects of latent bugs
    4.3.6 Recovering from latent errors
    4.3.7 Severity and fixing complexity
  4.4 Limitations and discussion
  4.5 Summary

5 Detecting latent and semantic bugs
  5.1 Overview
  5.2 Semantic and latent concurrency bugs
    5.2.1 Linearizability: The spec from within
    5.2.2 Capturing application state
    5.2.3 Maintaining the summary functions
  5.3 PIKE: A concurrency bug finding tool
    5.3.1 Handling false positives
    5.3.2 Implementation
  5.4 Experience with MySQL
    5.4.1 State summary functions
    5.4.2 Input generation
  5.5 Results
    5.5.1 Development effort
    5.5.2 Bugs found
    5.5.3 False positives
  5.6 Limitations and discussion
    5.6.1 Reliance on modifications to tested software
    5.6.2 Extending traditional test suites


  5.7 Summary

6 Exposing concurrency bugs in kernels
  6.1 Overview
  6.2 Kernel API and kernel modifications
  6.3 SKI: Exploring kernel interleavings
    6.3.1 Overview
    6.3.2 Exercising control over threads
    6.3.3 Inferring liveness
    6.3.4 Scheduling algorithm
    6.3.5 Discussion
  6.4 Scaling to real-world kernels
  6.5 Implementation
    6.5.1 Overview
    6.5.2 Runnable contexts
    6.5.3 Helper testing framework
    6.5.4 Optimizations and parallelization
    6.5.5 Bug detectors
    6.5.6 Traces and bug diagnosis
  6.6 Results
    6.6.1 Configuration
    6.6.2 Reproducing concurrency bugs
    6.6.3 Exposing unknown concurrency bugs
  6.7 Limitations and discussion
    6.7.1 Reliance on thread pinning
    6.7.2 Reliance on VMM
  6.8 Summary

7 Testing software in the real-world
  7.1 Properties of testing tools
    7.1.1 Scope
    7.1.2 Manual effort
    7.1.3 Performance
    7.1.4 Computational resources
  7.2 Users of testing tools
    7.2.1 Developers of new features
    7.2.2 Internal dedicated testers


    7.2.3 External dedicated testers
    7.2.4 Software end-users
    7.2.5 Researchers of testing tools
    7.2.6 Discussion
  7.3 Experience interacting with potential users
    7.3.1 Reproducing bugs
    7.3.2 Reporting bugs
    7.3.3 User feedback
  7.4 The scalability of testing concurrent software
    7.4.1 Prioritizing the testing space exploration
    7.4.2 Reasoning over multiple executions
    7.4.3 Discussion
  7.5 Towards reliable software

8 Future research directions
  8.1 Practical extraction of application state using its API
  8.2 Exposing concurrency bugs due to weak-memory models
  8.3 Exploring the input space
    8.3.1 Profiling-based generation of tests
    8.3.2 Leveraging bug reports

9 Conclusions

Bibliography

List of Figures

4.1 Evolution of bugs (by open date)
4.2 Evolution of bugs (by close date)

5.1 Checking for linearizability of state and outputs of two concurrent requests
5.2 Overall architecture of PIKE
5.3 Sample semantic bug
5.4 Sample error bug
5.5 Sample latent bug

6.1 Two examples illustrating schedules produced by SKI
6.2 Examples of equivalent and non-equivalent schedules


List of Tables

4.1 Main findings of the bug study and their implications
4.2 Concurrency-related keywords used for selecting bug reports
4.3 Bug counts for different stages of the analysis
4.4 External effects of concurrency bugs
4.5 Effects of latent concurrency bugs
4.6 Most frequent data structures involved in latent bugs
4.7 Recovery mechanisms for latent concurrency bugs
4.8 Average severity of concurrency bugs according to their immediacy and category
4.9 Complexity of fixing concurrency bugs according to their immediacy and category

5.1 Number of inputs found to trigger concurrency bugs according to latency and external effects
5.2 Properties of the triggered concurrency bugs that PIKE found
5.3 Requests responsible for triggering the sample semantic bug
5.4 Requests responsible for triggering the sample error bug
5.5 Requests responsible for triggering the sample latent bug

6.1 SKI's throughput running test cases to reproduce known concurrency bugs
6.2 Percentage of schedules that triggered the liveness heuristics
6.3 Average number of times the liveness heuristics were triggered per schedule
6.4 Effectiveness of the communication points optimization
6.5 Bugs discovered by SKI in recent versions of the Linux kernel
6.6 Types of race reports found by SKI
6.7 SKI's throughput with different bug detectors
6.8 Known bugs reproduced with SKI

7.1 Summary of the properties of testing tools
7.2 Summary of the profiles of different classes of users


1 Introduction

1.1 Hardware parallelism: A blessing and a curse

There is an ongoing and fundamental change in hardware. In the past few decades, the performance of processors has been consistently increasing at exponential rates [EBA+13], mostly driven by substantial increases in the density of transistors and in the operating frequency of processors [BC11]. However, since the mid-2000s hardware architects have been facing serious challenges that threaten the processor performance growth that consumers and system designers have come to expect and rely upon. The frequency of processors has reached a point at which energy consumption and heat dissipation have become major obstacles hindering frequency scaling [Ant14]. Because of these obstacles, hardware architects have been unable to continue ramping up the frequency of processors as they had been doing for decades, and have instead been forced to rely on other strategies to sustain performance growth.

To address these challenges, hardware architects have been increasing hardware parallelism by shifting from a frequency scaling strategy to a multicore scaling strategy. Consequently, most commodity processors already consist of several cores, and major processor manufacturers have plans to further increase the number of cores in subsequent generations of processors [ABD+09]. But shifting to multicore scaling is a significant departure from previous growth strategies because, in contrast with other design strategies (e.g., frequency scaling), hardware parallelism is not transparent to software.

The lack of transparency of hardware parallelism is, as expected, having a profound impact on software and on programmers: the multi-core era represents an unprecedented paradigm shift in the way software is built, maintained and executed. To take advantage of parallel hardware, software is expected to become increasingly parallel itself [HM08]. However, developing parallel software is well known to be a very challenging task for programmers and prone to concurrency bugs [SBN+97].


1.2 The threat of concurrency bugs

It is challenging to develop parallel software. Developing parallel software requires the programmer to keep track of all the possible communication patterns arising from the typically extremely large number of possible interleavings between different tasks. While it is possible to create parallel applications using different communication paradigms, in this work we focus on the multi-threading paradigm, which relies on threads and the shared-memory model and has proven to be a practical and popular programming model. For multi-threaded applications, the communication between threads and the large set of possible interleavings significantly increase the complexity of software development. When programmers fail to correctly program a multi-threaded application, they run the risk of introducing concurrency bugs, which are a particularly challenging class of bugs.

Concurrency bugs constitute an important class of bugs. Concurrency bugs are only triggered when multi-threaded software simultaneously receives certain inputs and is executed under certain interleavings of instructions, as chosen by the operating system scheduler. It is this dependency on the interleaving of instructions that sets concurrency bugs apart from non-concurrency bugs. This key difference is responsible for the non-deterministic nature of concurrency bugs, and it is the reason why concurrency bugs are also referred to as Heisenbugs [Gra86].

The non-deterministic nature of concurrency bugs makes them hard to find, hard to reproduce and hard to fix. Because these bugs are non-deterministic, it is hard for developers to come across concurrency bugs using traditional testing methodologies. Even when developers do witness the application's misbehavior, it is hard for them to reproduce the bug by re-executing the application, given that reproduction requires the operating system to again choose the same or equivalent thread interleavings while keeping the original environment. And when developers are able to find and reproduce concurrency bugs, it is still challenging to apply a fix and, unfortunately, easy to (re)introduce the same or different concurrency bugs.

In addition to being challenging for developers, concurrency bugs are also known to cause significant damage to users and are considered a serious threat to software reliability. Prominent examples of serious harm caused by concurrency bugs include the 2003 large-scale US blackout [GEE], which affected millions of people, and the fatal Therac-25 incident [LT93], which caused loss of human lives. Given that parallel software is expected to have an increasingly prominent role in computer systems, it is imperative to provide programmers with tools and methodologies to help them better handle concurrency bugs [ABD+09].

1.3 Problem statement

This dissertation addresses two important problems that arise in the context of testing large-scale software for concurrency bugs: detecting bugs and exposing bugs.

Detecting bugs consists of distinguishing correct software behavior from incorrect behavior. Some bugs have obvious manifestations, which simplifies detection during testing; for instance, bugs that lead to the execution of illegal instructions are trivially detected. However, because other bugs have subtle effects, such as providing a syntactically correct response but with incorrect content, bug detection can be exceptionally hard. Because of the size and complexity of current software, it is in general not feasible to ask developers to manually write the entire specification of the software. Therefore, detecting concurrency bugs using the explicit specification approach, which would be the most effective solution assuming a correct and complete specification, is in practice infeasible for the most complex software.

An alternative to manually specifying the correct behavior of software is to rely on manual inspection of its behavior by the programmers themselves. Unfortunately, some concurrency bugs are not easily detected by programmers because they lead to non-obvious failures or because it is hard to witness the failures. For example, the application might simply return stale results to users, as opposed to returning error or warning messages. In these difficult cases, without automatic bug detection mechanisms, the programmer does not easily notice that an instance of incorrect behavior occurred. Similarly, even the software users may be unable to promptly detect such concurrency bugs, even when they are triggered. For this reason, effective bug detection mechanisms are important for testing concurrent software.

In this work we developed a new systematic concurrency bug detection approach that targets particularly challenging classes of bugs. The basic idea of our approach is to detect concurrency bugs by executing multi-threaded applications and analyzing their behavior under different thread interleavings. More specifically, our approach compares a given execution, in which certain requests are executed concurrently, with the corresponding linearized executions [HW90] of the same requests. To understand the viability of this approach and to guide the development of our bug detector, we conducted a study of real-world concurrency bugs.
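To make the comparison concrete, the following self-contained sketch applies this idea to a toy application: a shared counter whose "requests" perform a racy read-modify-write. The harness, the request implementation, and the state summary function are all illustrative assumptions for exposition; PIKE applies the same check to full server executions rather than to a toy like this.

```c
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static int state;                 /* the toy application's internal state */

static void run_request(void) {   /* one client "request"                 */
    int tmp = state;              /* racy read-modify-write               */
    sched_yield();                /* widen the race window                */
    state = tmp + 1;
}

static void *worker(void *arg) { (void)arg; run_request(); return NULL; }

static int digest(void) { return state; }   /* state summary function     */

int main(void) {
    /* Linearized executions: request A then B, and B then A. For this
     * toy application both serial orders yield the same final digest.   */
    state = 0; run_request(); run_request();
    int serial = digest();

    /* Concurrent execution of the same two requests. */
    state = 0;
    pthread_t a, b;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    /* The check: the concurrent digest must match some serial order. */
    if (digest() != serial)
        printf("linearizability violation: state %d, expected %d\n",
               digest(), serial);
    return 0;
}
```

Under an unlucky interleaving, the concurrent run loses an update and finishes with state 1 rather than 2, matching neither serial order; this is exactly the kind of violation the approach flags.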


Our study resulted in the discovery of several cases of concurrency bugs that manifest themselves in non-obvious ways, i.e., by providing wrong results to the user (semantic concurrency bugs). This type of misbehavior contrasts with that of concurrency bugs that produce immediate effects that are easily detected by the developer and even by users; a typical example of an easily detectable misbehavior is an application crash. In addition, the study also led to the discovery of several cases of concurrency bugs that are not immediately exposed to the user when triggered (latent concurrency bugs). This latter class of concurrency bugs is also hard to detect because it is difficult to expose the effects of these bugs to developers: even after being triggered, their effects may not be externally visible.

Because these two classes of concurrency bugs have been found to impair real-world applications and because there is a lack of effective approaches to detect them, it is important to develop better detection approaches. This dissertation addresses this problem by presenting a novel detection technique that targets semantic and latent concurrency bugs and has the advantage of not requiring the programmer to specify the correct behavior of the application. Our approach consists of implicitly extracting a specification of the application by testing whether the application obeys linearizable semantics [HW90]. Intuitively, linearizability means that the application, when executing concurrent requests, behaves as if the requests were executed serially, in some order that is consistent with the real-time ordering of the invocations and replies to the requests. By systematically testing whether linearizability is upheld, our approach is able to find subtle violations of the application semantics. Furthermore, by checking whether both the output and the internal state of the application obey the inferred semantics, our approach is capable of identifying not only the bugs that manifest themselves immediately as a wrong output (semantic bugs), but also those that silently corrupt internal state (latent bugs).

Apart from the problem of detecting concurrency bugs during an execution, this dissertation addresses a second important problem: the need to expose concurrency bugs, that is, the need to generate executions that cause concurrency bugs to become visible to bug detectors. To generate such executions, and given the schedule dependency of concurrency bugs, concurrency testing techniques have to explore the interleaving space of the application. In general, exploring the thread interleaving space is challenging because of its vast size; for complex applications it is not feasible to perform this exploration exhaustively. Instead, better approaches perform the interleaving space exploration cleverly, by avoiding redundancies and prioritizing the most important points in this space.


Systematic approaches have been proposed for testing user-mode applications. Unfortunately, however, we are not aware of any previous tool that accomplishes systematic exploration of the interleaving space for real-world kernels, despite the fact that kernels are typically highly complex and absolutely critical for the reliability of systems. While user-mode tools exist for exploring the thread interleaving space, it is challenging to apply these tools directly to the kernel because such an approach would require modifications to the kernel. We believe that requiring developers to modify the kernel would become a major obstacle to the adoption of these testing tools, as it would require an initial development effort and also a significant maintenance effort as the kernel evolves. Instead, this dissertation explores a different design by proposing a practical kernel testing approach that is effective and simultaneously avoids the need for kernel modifications.

This kernel testing approach controls the interleaving of kernel threads by relying on a modified virtual machine monitor. Using standard OS mechanisms to associate threads with specific virtual processors (i.e., processor affinity), it uses the virtual machine monitor to control the relative speed of different threads by independently suspending and resuming the execution of machine instructions on each of the virtual processors. This is a convenient mechanism to control the progress of threads, but it is not by itself sufficient to implement state-of-the-art thread interleaving exploration algorithms [BKMN10, MQB+08]. Existing algorithms were not intended to be applied to kernels and so do not address interrupts, which are an important source of concurrency in kernels; these algorithms therefore need to be extended to be effective for testing kernels. Another challenge arises from the fact that existing algorithms, apart from requiring a method to exercise control over the progress of threads, additionally require information about the liveness of individual threads. To gather this information, our approach relies on heuristics that analyze the execution of instructions to infer the state of threads (e.g., whether the CPU is actually making progress or whether it is waiting on a spin-lock).
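As an illustration of the processor-affinity building block mentioned above, the sketch below pins the calling thread to a given virtual CPU using the standard Linux affinity API. It shows only the generic OS mechanism, under the assumption of a Linux guest; it is not code from SKI itself.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to one virtual CPU, so that a VMM can control the
 * thread's progress by suspending and resuming that virtual CPU alone. */
static int pin_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```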

1.4 Contributions

This dissertation makes the following main contributions to the state-of-the-art:

• A study of the effects of real-world concurrency bugs, including the definition of, and results that show the existence of, two complex and orthogonal types of concurrency bugs: semantic concurrency bugs and latent concurrency bugs.


• A set of guidelines to annotate the state of complex applications that enables state comparison between different executions of the application.

• A method to infer the specification of complex applications by analyzing the behavior of the application when executed under different thread interleavings. The inferred specification can then be used to detect concurrency bugs during application executions. In addition, we found concrete evidence that MySQL, an important and complex application, is in the general case expected to provide linearizable semantics at the level of user requests.

• A method, based on virtual machines, to systematically explore the interleaving space of unmodified operating system kernels, and a set of techniques to efficiently achieve fine-grained control over thread interleavings.

Furthermore, in the context of the work presented in this dissertation, two systems were implemented:

• PIKE: A bug detector that detects both semantic concurrency bugs and latent concurrency bugs.

• SKI: A testing tool that exposes concurrency bugs in operating system kernels using a virtual machine monitor.

1.5 Outline

The rest of this dissertation¹ is structured as follows. Chapter 2 discusses related work and Chapter 3 discusses important background. Chapter 4 presents an overview of the methodology and the results of the study on the effects of real-world concurrency bugs. Chapter 5 presents our bug detection proposal that targets semantic and latent concurrency bugs. Chapter 6 presents our approach to systematically explore the interleaving space of kernels. Chapter 7 discusses several important aspects of testing tools in general and describes our experience interacting with developers and applying our tools in the real world. Chapter 8 discusses future research directions, and finally we conclude the dissertation with Chapter 9.

¹This dissertation incorporates and extends work previously published [FLSR10, FLR11, FRB14]. Namely, this work additionally includes Chapter 7, Chapter 8 and a general consolidation and revision of the text.

2 Related work

This chapter relates the work presented in this dissertation to previous work. First, we discuss existing work on the characteristics of bugs and on detecting and exposing bugs. We then discuss previous work on deterministic replay and virtual machine introspection techniques, which is specifically related to SKI.

2.1 Studies of bugs

Chapter 4 presents a study of real-world concurrency bugs. A series of related studies have been conducted before and after ours. In particular, there is a large body of literature about the propagation [VT06, PLM+10a] and even the prediction [NZHZ07, NB05, RKBD14, LLS+13] of bugs in source code. Some of these studies use the revision control system to understand the behavior of programmers and its effects on software reliability (e.g., which components or source code files are most prone to errors). This work is complementary to the study conducted in this dissertation, which is focused on a specific class of bugs (i.e., concurrency bugs) and on understanding their consequences.

In previous work, researchers analyzed the consequences of bugs for three different database systems [VBLM07]. However, the authors did not distinguish between concurrency and non-concurrency bugs, and only evaluated whether they caused crash or Byzantine faults, since that work was focused on presenting a replication architecture rather than on studying bugs.

Chandra and Chen [CC00] looked at the bug databases of three open-source applications (the Apache web server, the GNOME desktop environment and the MySQL database) but the focus of their work was quite different from ours. They analyzed all bugs (among which only 12 were concurrency bugs) and focused exclusively on determining whether generic recovery techniques such as process pairs would be effective in tolerating them. In their case, concurrency bugs were only one possible type of bug that fell into the category for which such techniques are effective. In contrast, we focus on a narrower class of bugs by limiting ourselves to concurrency bugs, but provide a broader analysis taking into consideration several characteristics of these bugs.


More recently, in a study focused on file systems [LADADL13], researchers analyzed the development patches, which include but are not limited to bug fixes, of six file systems to gather information about several broad aspects of their development. Among these aspects, the authors analyzed the type of patches (e.g., whether they contribute documentation, new features or bug fixes). Their analysis of bug-fixing patches included semantic bugs and concurrency bugs, but not concurrency semantic bugs nor concurrency latent bugs, which are an important aspect analyzed in our study.

The reliability of older database systems was carefully studied by Sullivan and Chillarege [SC92]. In this study, Sullivan and Chillarege analyzed the DB2 and IMS database systems, as well as the MVS operating system, by inspecting reports filed by dedicated field personnel. The focus of this study was not on concurrency, but concurrency was also analyzed and the authors concluded that the user interface of the database was a typical source of concurrency bugs.

Farchi et al. analyzed concurrency bugs, but by artificially creating them [FNU03]. The methodology adopted by the study was to ask programmers to write programs containing concurrency bugs, which arguably may not lead to bugs that are representative of real-world problems. In contrast, we analyze a database of bugs in a widely used, well-maintained application.

Lu et al. [LPSZ08] studied real concurrency bugs that were found in four open-source applications. Using the respective bug report databases, the authors analyzed a total of 105 concurrency bugs. Their study focused on several aspects of the causes of concurrency bugs, and the study of their effects was limited to determining whether they caused deadlocks or not. We build on this study, in particular by using a very similar methodology for deciding which bugs to analyze, but provide a complementary angle by studying the effects of concurrency bugs (e.g., whether concurrency bugs are latent or not, or what types of failures they cause).

Sahoo et al. analyzed the reproducibility of bugs [SCA09, SCA10]. While the main focus of their study was not concurrency bugs, the authors distinguished concurrency bugs from non-concurrency bugs when trying to characterize their reproducibility. The study analyzed several server applications, focusing on the properties of the inputs that are required to trigger bugs. Interestingly, their study concluded that a large percentage of bugs, 77%, can be triggered by providing just one input after the client establishes a session with the server.

Lin et al. [TLL+14] analyzed bugs found in the Linux kernel and in two user-mode applications (the Mozilla browser and the Apache web server). Their study concluded that the kernel has a significantly higher fraction of concurrency bugs (13.6%) than the user-mode applications studied (1.2% for Mozilla and 5.2% for Apache), and suggests that this may be caused by the complexity of kernel synchronization. Furthermore, the authors found that 10.2% of all kernel bugs studied are related to interrupt handling, such as missing instructions to enable or disable interrupts at the appropriate locations.

Other researchers have studied bugs in the context of other concurrency paradigms, namely large-scale data processing frameworks (e.g., MapReduce [LZL+13a] and SCOPE [KTGN10]). In contrast, the focus of this dissertation is on shared-memory multi-threaded applications.

2.2 Testing software

Testing software is a very difficult task and many different approaches have been proposed in the past to address this challenge. Testing tools can be broadly divided into static analysis and dynamic analysis tools depending, respectively, on whether they simply analyze the source code or actually execute the code. Furthermore, a few hybrid tools attempt to leverage the advantages of both of these approaches by having both a static and a dynamic analysis component [KZC13, CLL+02, EQT07, SC07].

Static tools such as RacerX [EA03] and others [NAW06, BLR02] have the advantage of not being limited in their analysis to the execution path determined by the input. In contrast, dynamic analysis tools, since they actually run the code, have more information about the context of the execution and can therefore potentially achieve higher accuracy (i.e., fewer false positives). SKI and PIKE are examples of dynamic analysis tools. Other examples include FastTrack [FF09], LiteRace [MMN09] and Eraser [SBN+97].

Because dynamic testing tools run the tested applications, and bugs might only manifest under certain inputs and/or under certain interleavings of threads, dynamic testing approaches require a method to explore different control and data paths [RW85], and a method to analyze those executions to detect bugs. Section 2.2.1 discusses in detail several proposed techniques to detect concurrency bugs and Section 2.2.2 discusses previous work in the context of exposing concurrency bugs.

2.2.1 Detecting concurrency bugs

Given the specifics of concurrency bugs, researchers have developed different types of bug detectors to handle this special class of bugs. In this section, we discuss existing concurrency bug detectors, including data race detectors and linearizability checkers, which are two important classes of concurrency bug detectors.

Data race detectors

Data races occur when a pair of memory accesses to the same data, at least one of which is a write, is not synchronized and the accesses can execute at the same time. Although data races are not necessarily bugs and not all concurrency bugs are data races, experience has shown that data races are dangerous and can easily lead to application behavior that is unintended by the developers, namely concurrency bugs. Thus, many data race detectors have been proposed with the goal of detecting concurrency bugs [SBN+97, VCFN11, LTQZ06, FF04, YRC05, BCM10, MMN09, FLR11, LOCb, EMBO10].

Data race detectors can be roughly divided into two types depending on which algorithm they use. The first type relies on the lockset algorithm [SBN+97] to infer whether the programmer protected all accesses to a specific shared variable with a common lock (a sketch of this refinement follows below). The second type relies on the happens-before algorithm [FF09, MMN09]. Recently, Erickson et al. proposed a different data race detector, DataCollider, that is not based on either of these algorithms but is instead based on sampling and the use of breakpoints [EMBO10]. In Section 6.5.5 we explain how SKI combines the approach of DataCollider, to detect kernel data races, with a novel method to expose kernel concurrency bugs.

Like PIKE, data race detectors are tools that are able to detect concurrency bugs; however, they have distinct features. First, data race detectors detect data races instead of directly detecting concurrency bugs. Since programs often contain benign data races, simply detecting data races easily leads to false positives. Furthermore, the absence of data races is not a guarantee of correct synchronization [AHB03, LTQZ06], and hence false negatives can result. Another difference is that race detectors typically operate at the lower level of individual memory accesses. In contrast, PIKE analyzes the actual output of the application as well as a high-level digest of the state, potentially uncovering bugs that are not triggered by low-level data races and also facilitating the process of inspecting the results.

In order to reduce the number of false positives in data race finding tools, and thus reduce the burden on testers, researchers have developed heuristics. By using heuristics, some systems attempt to identify scenarios that frequently lead to false positives. DataCollider [EMBO10], for example, tries to detect benign data races caused by counters and by accesses to different bits of the same variable. One approach is to use heuristics that rely on looking at the instructions at or near the problematic accesses or on manually white-listing variables. The disadvantage of this approach is that it also increases the risk of missing erroneous data races. Another interesting approach to distinguish erroneous data races from benign data races relies on replaying the execution [NWT+07]: it attempts to trigger the opposite outcome of the data race and then compares the low-level results obtained with both outcomes. This approach, however, still aims at finding low-level data races. RedFlag [UBH09] is another example of a concurrency bug detector for the kernel; it combines a block-based atomicity checker [WS06] with a lockset-based data race detector [SBN+97].
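The core of the lockset algorithm referenced above can be sketched compactly. The snippet below is a simplified illustration of the Eraser-style refinement; it omits the state machine for thread-local and read-shared variables that real detectors use, and all names are illustrative rather than taken from any actual tool.

```c
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int lockset_t;    /* bit i set => lock i is in the set      */

typedef struct {
    lockset_t candidates;          /* C(v): locks held on every access so far */
    bool accessed;                 /* has v been accessed yet?                */
} shadow_t;

/* Called on each access to the shared variable shadowed by v, by a thread
 * currently holding the locks in 'held'. Returns true when the candidate
 * set becomes empty, i.e., no common lock protects the variable. */
static bool on_access(shadow_t *v, lockset_t held) {
    if (!v->accessed) {
        v->candidates = held;      /* first access: C(v) := locks_held       */
        v->accessed = true;
    } else {
        v->candidates &= held;     /* refine: C(v) := C(v) intersect held    */
    }
    return v->candidates == 0;
}

int main(void) {
    shadow_t v = { 0, false };
    on_access(&v, 0x1);            /* thread 1 accesses v holding lock 0     */
    if (on_access(&v, 0x2))        /* thread 2 accesses v holding only lock 1 */
        printf("potential data race: no common lock protects v\n");
    return 0;
}
```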

Linearizability checkers

PIKE uses linearizability as a specification and differs from previous proposals [BDMT10, XBH05a, VYY09, Vaf10] that use this approach in two ways. First, previous approaches typically ignore the internal state of the application, which is important for the detection of latent bugs. Second, previous approaches check for the atomicity of smaller sections of code, such as code blocks or library calls, which poses fewer challenges than testing the linearizability of large server applications and does not allow for the detection of some concurrency bugs.

AVIO [LTQZ06] detects atomicity violations at the level of individual memory accesses. AVIO achieves this by learning the valid memory access patterns (e.g., when two consecutive accesses from a thread are allowed to be interleaved by an access from another thread) from a large set of runs that are assumed to be correct. AVIO shares our goal of finding concurrency bugs without relying on finding data races but, in contrast, works at a low level and relies on training.

Other detectors

Huang et al. proposed a bug detector [HHS13] that identifies pairs of critical sections that non-deterministically change the contents of shared memory depending on their execution order. This analysis, in essence, relies on detecting whether critical sections are commutative.

CAFA [HYN+14] finds concurrency bugs in the context of event-driven programming frameworks for mobile devices. CAFA leverages a causality model of the underlying mobile operating system to find concurrency bugs caused by use-after-free violations. The causality model enables the tool to understand the dependencies between events and, therefore, limits the false positives. Other researchers have addressed the problem of detecting concurrency bugs in different types of event-based frameworks [PVSD12, RVS13].

2.2.2 Exposing concurrency bugs

Dynamic testing requires executing non-concurrent applications with a diverse set of inputs to ensure high testing coverage and, therefore, increase the chances of exposing bugs. In addition to testing with a diverse set of inputs, concurrent applications also need to be tested with a diverse set of thread interleavings given that concurrency bugs are only exposed under a subset of all possible interleavings. This section describes the related work in the context of exposing concurrency bugs, with regard to both interleaving and input space exploration.

Interleaving space exploration

When a multi-threaded application runs natively, the operating system tends to choose similar thread interleavings across different executions. Because interleaving diversity is important for the effectiveness of testing concurrent applications, many approaches have been proposed to address this problem. The traditional approach for increasing the diversity of interleavings is to rely on stress testing – i.e., repeatedly running the tested application many times in the hope that eventually a large and representative set of interleavings is explored. To further increase the effectiveness of this technique, several methods have been proposed that augment stress testing with noise generators [BAEFU06b, BAEFU06a, Sto02, PLZ09, EMBO10, BFM+05, Sen08], which disturb the scheduling of threads, for example by introducing random sleeps, to further increase the coverage of the interleaving space. The stress testing approach has several advantages, such as typically requiring a simple setup and having low run-time overheads. However, because these approaches do not have fine control over the interleavings that are explored, their common limitation is that they do not systematically explore the thread interleaving space.

To address this limitation, a different class of tools has been proposed to test for concurrency bugs [MQB+08, BKMN10, NBMM12a]. This approach relies on taking full control of the scheduling of threads to avoid redundant interleavings and, therefore, increases the effectiveness of testing [BKMN10]. A previous attempt [Blu12] to systematically test kernel code focused on small-scale educational kernels and relied on modifications to the tested kernels. SKI follows the systematic approach, but distinguishes itself from existing tools by being applicable to kernel code and by scaling to real-world kernels.

Because the interleaving space is extremely large, systematic tools take advantage of different techniques to restrict the interleaving exploration while still ensuring effectiveness. Examples used in the context of user-mode testing include preemption bounding [MQB+08], reschedule bounding [BKMN10] and the elimination of redundant interleavings [God97]. Other work has proposed limiting the valid run-time schedules by reducing or eliminating interleaving non-determinism [WTH+12, YCW+14, CWTY10, LCB11, DLCO09, CSL+13, CWG+11]. Restricting the interleavings by applying these techniques could further increase the effectiveness of systematic testing approaches.

Symbolic execution [Kin76, Cla76, CDE08, CKC11] is an analysis technique that systematically explores the application execution path space by keeping track, during execution, of symbolic values instead of concrete values. Symbolic execution has been applied to multi-threaded applications by implementing a custom user-mode scheduler [KZC12]. More recently, SymDrive [RKS12] has been successful at testing kernel device drivers using symbolic execution, although it requires modifications to the kernel and does not target concurrency bugs. Similarly, SWIFT [CRCM12] uses symbolic execution to test kernel file system checkers but does not target concurrency bugs.

Similarly to shared-memory systems, which are the focus of this dissertation, distributed systems are also prone to schedule-dependent bugs [Liu07, LZL+13b, RKW+06], and the complexity of distributed systems also justifies the need for dedicated techniques that scale to real-world applications. For example, CrystalBall [YKKK09] proposes model checking live systems and steering their execution away from states that trigger bugs. By exploring states based on snapshots of live systems, CrystalBall is able to explore states that are more likely to be relevant to the current execution than would be possible by conducting the entire exploration from a single initial state. MoDist [YCW+09] also finds bugs in distributed systems but does so transparently, without requiring implementations to be written in special languages. MoDist is able to scale to complex implementations by judiciously simulating events that typically trigger bugs, such as the reordering of messages and the expiration of timers.

Input space exploration

Dynamic testing techniques require running the tested software and providing it with testing input. The traditional approach has relied on manually writing test cases [GHK+01], but more sophisticated approaches have been proposed to address this challenge. Such approaches include blackbox fuzzers [BM83], semantically-aware fuzzers [TRI, Ait02] and symbolic execution techniques [GKS05, CDE08, Kin76].

Because file systems have a particularly large input space and are critical components in the system, file system testing has been a particularly active area of research [ADADB+06, YSE06, YTEM04, CRCM12, MDADAD13]. Even though the focus of PIKE and SKI is on the detection of concurrency bugs and on the exploration of the interleaving space, respectively, to evaluate our testing tools we explored the input space using adapted test suites.

2.3 Deterministic replay

Determinism is valuable for diagnosing concurrency bugs [AWHF10, LCB11, BYLN09, DLCO09, CSL+13, CWG+11, SSV13], but ensuring determinism is orthogonal to the problems that PIKE and SKI address. Given the same, fixed testing parameters, SKI, like its user-mode counterparts, can deterministically re-execute the same schedule, provided the kernel is given identical input in each run. Currently, SKI does not ensure that the same hardware input is provided to the kernel (e.g., low-granularity timer values). However, such a guarantee could be provided by augmenting SKI with a deterministic layer running below the VMM [LCB11], or by modifying the VMM [DLFC08, XMS+07, SSV13]. In the case of PIKE, determinism could be guaranteed by ensuring that the application receives the same input from the operating system on every execution [LVN10, SKAZ04].

2.4 Virtual machine introspection (VMI)

Several VMM mechanisms have been proposed to infer high-level information about virtual machines [CN01]. In many cases the purpose of these mechanisms is to increase performance. Examples include improving host memory usage by inferring which guest memory is actively being used [CLC13], improving IO performance by anticipating IO requests [JADAD06] and improving the scalability of virtual machine monitors by inferring whether the virtual machine is executing critical sections [WLC+11, ULSD04]. In addition, VMI techniques have been leveraged to gather information about virtual machines in security contexts [NBH08].

Using similar introspection techniques, SKI infers the liveness of threads for the purpose of achieving fine-grained control over thread schedules. For example, SKI leverages the observation that the PAUSE instruction is typically associated with spin-locks, as does the work of Wang et al. [WLC+11] in the context of increasing VMM performance.
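As a minimal illustration of this kind of instruction-level heuristic, the sketch below checks for the x86 PAUSE instruction, whose two-byte encoding (F3 90, i.e., REP NOP) typically marks a spin-wait loop. This shows only the idea; SKI's actual heuristics live inside a modified VMM and are more elaborate.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if the instruction bytes at 'insn' encode PAUSE (F3 90).
 * A VMM that decodes guest instructions can use this to mark a virtual CPU
 * as spinning (waiting on a lock) rather than making real progress. */
static bool is_pause(const uint8_t *insn) {
    return insn[0] == 0xF3 && insn[1] == 0x90;
}
```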


3 Background

This chapter provides background on the approach of systematically exploring the interleaving space of software and, specifically, describes a state-of-the-art algorithm that follows this approach for testing user-mode applications: the PCT algorithm [BKMN10]. We leverage PCT, in Chapter 5, to test user-mode applications and we generalize PCT, in Chapter 6, to test kernels.

In addition, this chapter discusses two important and typically particularly complex types of software, which are analyzed and tested in this dissertation: database management systems and operating system kernels. Besides being highly complex, both types of software have a critical role in our societies. In fact, databases are fundamental to supporting the core business functions of many companies, and the reliability of computer systems typically depends on operating systems. Section 3.2 provides background on database management systems and Section 3.3 provides background on operating system kernels.

3.1 Systematic exploration of the interleaving space

The systematic exploration of the interleaving space, in contrast with the stress testing approach, relies on judiciously controlling the thread schedule for each execution of the software under test to maximize the coverage of the interleaving space. To achieve systematic control over interleavings, these approaches rely on a custom thread scheduler that implements two basic mechanisms:

• Liveness inference. The first mechanism infers thread liveness to understand which schedules it can choose. Liveness can be inferred by intercepting and under- standing the semantics of the synchronization functions.

• Schedule enforcement. The second mechanism overrides the regular scheduler by allowing only a specifically chosen thread to make progress at any point in time. The schedule can be enforced by creating additional scheduling constraints that reduce the freedom of the standard scheduler.


Often, both of these essential mechanisms are portably implemented through a proxy layer (e.g., through LD_PRELOAD or ptrace) that intercepts all relevant synchronization primitives to infer and override the liveness state of each thread [MQB+08, BKMN10, NBMM12a]. In addition to liveness inference and schedule enforcement, the systematic approach also requires a scheduling algorithm to select the schedules explored during each execution. In the following section, we discuss in detail an important algorithm that is leveraged in this dissertation – the PCT algorithm.
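Before turning to PCT, the sketch below illustrates the interposition technique just mentioned: a shared library loaded via LD_PRELOAD that wraps pthread_mutex_lock(), forwarding each call to the real implementation after a hook point where a custom scheduler could intervene. This is a generic sketch of the mechanism, not code from any of the cited tools.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <pthread.h>

/* Build: gcc -shared -fPIC -o wrap.so wrap.c -ldl -lpthread
 * Run:   LD_PRELOAD=./wrap.so ./tested_application */
int pthread_mutex_lock(pthread_mutex_t *mutex) {
    static int (*real_lock)(pthread_mutex_t *);
    if (!real_lock)    /* resolve the real symbol once */
        real_lock = (int (*)(pthread_mutex_t *))
                        dlsym(RTLD_NEXT, "pthread_mutex_lock");
    /* Hook point: a custom scheduler would record that this thread may
     * block here and decide which thread is allowed to proceed next. */
    return real_lock(mutex);
}
```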

3.1.1 PCT scheduling algorithm

Both PIKE and SKI rely on the PCT scheduling algorithm. As discussed in Section 2.2.2, PCT belongs to a class of testing algorithms that systematically explore the interleaving space of user-mode applications. As opposed to stress testing approaches, systematic approaches increase the effectiveness of testing by avoiding redundant interleavings and by prioritizing interleavings that are more likely to expose bugs, e.g., those that differ more from interleavings that have already been explored. PCT has been shown to be more effective, both analytically and empirically, than traditional ad hoc approaches [BKMN10].

Algorithm

Conceptually, PCT implements a custom scheduler that executes instructions sequentially, one by one; that is, at any point during the execution, only one of the live threads is allowed to progress, and the eligibility of a thread to execute another instruction is re-evaluated after each instruction. Through this process, the scheduler is able to effectively control the chosen interleaving.

To achieve a good diversity of schedules across different runs, the scheduler uses two strategies. The first strategy is to randomly assign initial priorities to the threads, and to use these priorities instead of a fixed order to determine the thread that should run at each instant – this is the thread with the highest priority among those that are not blocked. The second strategy consists of reducing, at random points during the execution of a test, the priority of the thread that is scheduled. If the priority decrease is large enough, this will cause another thread to become the one with the highest priority, and therefore this other thread will be scheduled to run. By varying both the initial priorities and the location of such reschedule points in a controlled way, the scheduler is able to control the range of tested schedules.

The reschedule points are chosen prior to each run by randomly selecting a set of offsets from the start of the test (in terms of the total number of instructions executed) within a certain range. Then, during the execution, whenever the total number of instructions executed reaches one of these offsets, the priority of the currently scheduled thread is lowered so that it becomes the lowest-priority thread, and thus another runnable thread is selected for execution in the next step. The set of reschedule points is determined according to two parameters: the expected number of execution steps k and the desired number of reschedule points p, with the simple interpretation that there will be up to p reschedules within the first k instructions of the execution of the test (and none thereafter, should the test execute for more than k instructions). That is, for a given k and p, the set of p reschedule points is selected by choosing uniformly at random p offsets from the range [0, k].
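The following sketch condenses this priority-based selection into a simplified simulator, in which abstract "steps" stand in for executed instructions and every thread is assumed to be always runnable. It is an illustrative rendering of the published algorithm, not code from PIKE or SKI.

```c
#include <stdio.h>
#include <stdlib.h>

#define N 4      /* number of threads             */
#define K 1000   /* expected number of steps      */
#define P 3      /* number of reschedule points   */

int main(void) {
    int prio[N];
    long reschedule[P];
    int demotions = 0;

    /* Initial priorities all above the range [0, P) reserved for demoted
     * threads; a real implementation would randomly permute them (and
     * seed the random generator, omitted here). */
    for (int i = 0; i < N; i++)
        prio[i] = P + 1 + i;

    /* Choose P reschedule points uniformly at random from [0, K)
     * (duplicates are ignored for simplicity). */
    for (int i = 0; i < P; i++)
        reschedule[i] = rand() % K;

    for (long step = 0; step < K; step++) {
        /* Schedule the runnable thread with the highest priority. */
        int t = 0;
        for (int i = 1; i < N; i++)
            if (prio[i] > prio[t]) t = i;

        /* ... execute one instruction of thread t here ... */

        /* At a reschedule point, demote the running thread below all
         * initial priorities so another thread takes over. */
        for (int i = 0; i < P; i++)
            if (reschedule[i] == step) {
                printf("step %ld: demote thread %d\n", step, t);
                prio[t] = demotions++;
            }
    }
    return 0;
}
```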

Effectiveness

Burckhardt et al. define the concept of bug depth, in addition to proposing the PCT algorithm. The authors of PCT define the depth of a bug as "the minimum number of scheduling constraints that are sufficient to find the bug" and postulate that, in practice, most concurrency bugs have a small depth [BKMN10]. In fact, the empirical evaluation that the authors conducted suggests that concurrency bugs often have depths limited to 1 or 2. The PCT algorithm has been shown to be empirically effective, in particular more effective than CHESS [MQB+08], and additionally has the advantage of providing a probabilistic guarantee: PCT guarantees that a concurrency bug of a given depth p, in the context of a single run of a program with n threads and k instructions, is exposed by the scheduler with a probability of at least 1/(n·k^(p−1)).
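As a worked instance of this bound, using illustrative parameter values rather than figures from the PCT paper or from our evaluation:

```latex
\Pr[\text{depth-}p\text{ bug exposed in one run}] \;\ge\; \frac{1}{n\,k^{\,p-1}},
\qquad\text{e.g., } n = 2,\ k = 10^4,\ p = 2:
\quad \frac{1}{2 \cdot 10^4} = 5 \times 10^{-5}.
```

For these values, on the order of 2 × 10⁴ independent runs are therefore needed in expectation to expose a depth-2 bug, which illustrates why such tests are repeated many times.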

3.2 Database management systems

This dissertation considers MySQL, a widely adopted implementation of a relational database management system (RDBMS), for the purpose of analyzing real-world concurrency bugs and for evaluating PIKE. MySQL supports the SQL interface, a complex interface specified by hundreds of pages [SQL] that is the de facto standard for interacting with RDBMSs. In addition to implementing support for this complex interface, databases often have other important constraints, such as the need to provide fast performance and the need to satisfy the ACID properties, that further increase the challenge of ensuring a correct database implementation. The rest of this section provides background information on MySQL that is relevant for Chapter 4 and Chapter 5.

3.2.1 MySQL overview

MySQL represents an interesting and challenging case study for our work on analyzing and detecting concurrency bugs for several reasons. First, it is a large, complex codebase, with about 360,000 lines of (mostly C and C++) code and rich application semantics. Second, databases are a critical component of the IT infrastructure of many organizations and it is therefore important to maintain and improve their robustness. In particular, MySQL represents a 40% share of the database market [MYSc], and is by far the most popular open-source database server. Finally, MySQL is a mature application with a high-quality development and maintenance process.

3.2.2 Internal structure

MySQL is a complex codebase in which the state of the server is spread across multiple data structures that are stored both in memory and persistently. In this section, we describe some of the main data structures that will be referred to in later sections.

The query cache structure contains pairs of recent instructions that read the state of the database (SELECT statements) and their respective results. This structure has been found by its developers to be critical for servers to achieve good performance in many common scenarios. The query cache, as one would expect from a cache, should invalidate the relevant entries when they become obsolete due to subsequent and conflicting writes. If the invalidation logic in the application is incorrect, it is likely that such mistakes will lead to bugs in which the application returns wrong results to clients.

The table cache stores a set of descriptors, each of which is an in-memory representation of a table schema. When a new thread wants to manipulate a table, it first queries the table cache to get a table instance directly if available. Otherwise, in the case of a miss, the table schema is loaded from disk and a new entry is inserted into the table cache structure.

Other important data structures are the data files. A data file is a critical data structure that stores the actual records for a particular table and is maintained in persistent storage.

To quickly perform searches and find the relevant records in a table, avoiding sequentially scanning the whole table, MySQL also maintains for each table an index file, which consists of a set of indices. Each entry in the index file consists of a pair of elements: the first element is a key (or a group of keys), while the second element is a pointer to the appropriate record in the data file.

The key cache is a repository for frequently used blocks from the index files of all tables. An index block is loaded into the key cache before the first access to a table. From that moment on, all subsequent operations are performed on key cache data and are flushed back to disk at the appropriate time.

Finally, the binary log (binlog) is another important data structure in MySQL. It stores a sequence of all operations that changed the database state, in their order of execution. This structure is critical for replication: replicas keep their state in sync by shipping the binary log between them and re-executing the requests in the order in which they appear. Missing entries, wrong entries or entries in the wrong order will likely cause replicas to diverge and can therefore seriously affect the correctness of the service. Additionally, the binary log is important for recovery purposes.

3.2.3 Concurrent programming

The use of concurrency in MySQL is typical of a server application. Clients issue requests to the database server, and these requests are grouped into sessions (called connections). Each connection is handled by a separate thread on the server side, and different threads contend for access to many shared data structures, such as the ones mentioned above. To synchronize access to these structures, threads mostly resort to locks, but they also use condition variables.

3.2.4 Request vs. transaction concurrency

To correctly understand the meaning of concurrency bugs, the distinction between request-level and transaction-level concurrency needs to be clear. In a database system, client operations are logically grouped into transactions, each of which consists of a sequence of requests (e.g., requests to begin a transaction, read or write to the database, and commit or abort the transaction). There is often some confusion between the notion of concurrent transactions and concurrent requests, and which types of concurrency bugs are the target.

Both for our concurrency bug study and for PIKE, we decided to focus on bugs that are triggered by concurrent individual requests, since these are the ones that reflect the traditional concurrency problems that arise in parallel programs. Bugs that are triggered by concurrent transactions but can be reproduced deterministically by a given sequence of requests are not considered concurrency bugs. Thus, we define a concurrency bug as one where the application deviates from the intended behavior, given a certain pattern of inputs, but only under specific thread interleavings. This definition is general enough to include both safety problems (e.g., a server crash or wrong replies) and liveness problems (e.g., deadlocks or even performance bugs).

3.2.5 Storage engines

One of the characteristics of MySQL is that it supports different mechanisms, called storage engines, for internally representing and manipulating the state of the database. Users can choose the storage engine dynamically, by parameterizing certain requests at runtime (e.g., CREATE TABLE), or through configuration options set by an administrator. Storage engines represent a significant fraction of the source code of MySQL and implement important parts of the database functionality, such as support for indexes and caches, the granularity of locks, and support for compression, replication, or encryption.

One of the most important storage engines in MySQL is MyISAM. MyISAM [MYI] is considered to be one of the most popular storage engines of MySQL [STO] and has also traditionally been the default storage engine [MYI]. In comparison to other engines, MyISAM is optimized for throughput and is distinctive in that it does not provide the ability to group multiple operations into transactions; instead, users have at their disposal explicit locking mechanisms to enforce consistency among groups of operations.

3.3 Operating system kernels

Many important operating system kernels (e.g., Windows, Linux, FreeBSD, and OS X) are extremely complex, comprising millions of lines of source code. This complexity is explained by the large set of functionalities that commonplace monolithic kernels are expected to implement, and it makes kernel testing particularly challenging. In addition, because kernels operate at the lowest level of the software stack, it is generally harder to instrument them. The rest of this section provides background on kernels that is relevant for the work on SKI (Chapter 6).


3.3.1 Concurrency and synchronization mechanisms

Concurrency arises in the kernel from multiple sources. The first source is the interrupt mechanism. The second source arises from in-kernel preemptions, i.e., when processes are preempted while executing system calls or exception handlers in kernel mode. In-kernel preemptions are even more significant in fully preemptible kernels [Lov10] (such as many versions of Linux), where involuntary preemptions may be enacted by interrupts between any two kernel instructions (unless explicitly disabled in short critical sections). Finally, the third major source of concurrency stems from the parallel execution of kernel code on several processors in multiprocessor systems.

To ensure correct semantics, kernel developers can restrict all these forms of concurrency, for example, by using locks. In fact, kernel developers typically have at their disposal a large range of synchronization solutions (e.g., per-CPU variables, atomic operations, barriers, spin locks, semaphores, seqlocks, enabling/disabling interrupts, enabling/disabling softirqs), as sketched below. However, restricting concurrency has a negative performance impact, in particular on modern multicore platforms. Kernel developers are thus under significant pressure to increase the supported degree of concurrency, in an effort to prevent the kernel from becoming a performance bottleneck. For example, representative efforts in the past have included switching to locks with finer granularity, adopting new synchronization mechanisms (e.g., RCU [MS98]), and rewriting key internal algorithms of the kernel.

Experience shows that such major revamping efforts of concurrent code are rarely bug-free on the first attempt. Thus, concurrency bugs in kernels are likely to continue to exist, and they represent an important threat to the stability and correctness of computer systems in general. We therefore consider it important to develop effective tools, such as SKI, that aid developers in finding kernel concurrency bugs.
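To make the synchronization toolbox above concrete, the following minimal sketch (assuming the Linux kernel API; the function and variable names are our own illustrative choices, not from any particular kernel subsystem) shows how a single primitive can restrict two of the concurrency sources at once: spin_lock_irqsave() takes a spin lock, guarding against code running on other CPUs, and disables local interrupts, guarding against interrupt handlers and in-kernel preemption.

    /* Sketch: guarding a shared counter against the kernel concurrency
     * sources described above; illustrative names only. */
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(stats_lock);
    static unsigned long packet_count;

    void count_packet(void)
    {
            unsigned long flags;

            /* Takes the lock (excluding other CPUs) and disables local
             * interrupts (excluding interrupt handlers and preemption). */
            spin_lock_irqsave(&stats_lock, flags);
            packet_count++;                 /* short critical section */
            spin_unlock_irqrestore(&stats_lock, flags);
    }

As the text notes, such restrictions carry a performance cost, which is why developers keep critical sections like this one as short as possible.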

3.3.2 Thread affinity

SKI leverages the kernel thread affinity mechanism, provided by most modern operating systems (e.g., Linux, Windows, MacOS, FreeBSD), to achieve fine-grained control over the interleaving of the tested kernel. Thread affinity allows user-mode applications to impose scheduling constraints and is commonly used to improve application performance [VZ91]. In our work, SKI leverages thread affinity for testing purposes by imposing a one-to-one mapping between threads, running inside the virtual machine, and CPUs.
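As an illustration, the following sketch (assuming Linux's GNU-extension API pthread_setaffinity_np(); the helper name is our own) shows how a user-mode program can pin a thread to a single CPU, the building block of the one-to-one thread-to-CPU mapping described above.

    /* Sketch: pin a thread to one CPU using the Linux affinity API. */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    int pin_to_cpu(pthread_t thread, int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);      /* allow exactly one CPU */
        return pthread_setaffinity_np(thread, sizeof(set), &set);
    }

With such a one-to-one mapping in place, controlling which CPU advances amounts to controlling which thread advances, which is the property SKI exploits for testing.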


4 The effects of concurrency bugs

4.1 Overview

To improve the methods for addressing concurrency bugs, it is important to have a thorough understanding of the characteristics of these bugs. While a few studies of concurrency bugs exist [LPSZ08, FNU03, CC00], they either focus on artificially injected bugs or, in the few cases where real applications were studied, they mostly focus on the causes of these bugs and limit the study of their effects to whether they cause deadlocks or not. Such studies are useful for determining what kinds of programming mistakes are typical of such applications, and can drive the design of program analysis tools for finding these bugs [PLZ09].

However, understanding the effects of concurrency bugs is important for a different set of reasons than studying their causes. Analyzing the effects allows us to assess how effectively existing detection approaches handle these bugs. And, more importantly, it can serve as a guide for the further development not only of tools and methodologies that detect concurrency bugs, but also of those designed to tolerate and recover from the faults and errors caused by such bugs. To give a simple example, it is important to understand how often concurrency bugs cause failure modes where the server returns incorrect replies, as opposed to not providing a reply at all, in order to gauge the effectiveness of using multi-threaded replicas to ensure fault diversity in a Byzantine-fault-tolerant replication scheme [CL99].

In this part of the dissertation we provide the complementary angle of studying the effects of concurrency bugs that affect parallel applications. In particular, we exhaustively study real concurrency bugs that were found in MySQL [MYSa], a mature, widely-used database server application.

Our study produced several interesting findings. First, we found a non-negligible number of latent concurrency bugs. Latent concurrency bugs, when triggered, do not become immediately visible to users. Instead, these concurrency bugs first silently corrupt internal data structures, and only potentially much later cause an application failure to become externally visible.¹

Latent concurrency bugs have been anecdotally reported [EA03], but we are the first to study their extent and their internal and external effects in detail. A second finding is related to bugs that cause the application to fail in ways other than silently crashing – semantic concurrency bugs. Some of our findings were surprising, such as the fact that these bugs cause subtle changes in the output that would be difficult to find using existing run-time monitoring tools, or the fact that there exists a strong correlation between semantic bugs and latent bugs. Our findings have implications for the design of tools and methodologies that address concurrency bugs. For the convenience of the reader, we present a summary of our main findings together with their implications in Table 4.1.

The remainder of this chapter is organized as follows. In Section 4.2 we describe our methodology. The results of our study are presented in Section 4.3, and in Section 4.4 we discuss their implications. We present a summary of this chapter in Section 4.5.

4.2 Study methodology

In this section we present the methodology that we adopted to find and analyze concurrency bugs; our methodology is similar to the one used in previous work [LPSZ08, ZAH11, JSS+12, NJT13]. We decided to study concurrency bugs in MySQL because it is a critical component for industry and because it is a mature, open-source, and highly concurrent application. Focusing on one important application allows us to study the concurrency bugs within this scope in greater depth, although it also requires greater care in generalizing the results, as we discuss in Section 4.4.

4.2.1 Selection of concurrency bugs

The MySQL versions affected by the bugs reported in the bug database range from version 3.x to 6.x, and the oldest bug reports date back to 2003. The MySQL bug report database contains a very large number of bugs. Therefore, to make the task feasible, we automatically filtered out bugs that are not likely to be relevant by performing a search query on the bug report database. Our search query filtered bugs based on (1) the keywords contained in the bug description, (2) the status of the bug, and (3) the bug category.

¹ The term latent bug is used in other work [BE04, KWLM09, HP04] with an unrelated meaning – that of a bug that went undetected by the programmer.


Evolution of concurrency bugs
Finding: According to the opening dates of our sampled bugs, the proportion of fixed bugs that involved concurrency more than doubled over the last 6 years.
Implication: This suggests an increasing need for new tools and methodologies to handle concurrency bugs.

External effects of concurrency bugs
Finding: We found slightly more non-deadlock bugs (63%) than deadlock bugs (40%).
Implication: Having good tools to handle deadlock bugs is not enough – non-deadlock bugs also need to be handled.
Finding: We found a significant fraction of semantic bugs (15%).
Implication: Byzantine-fault-tolerance (BFT) techniques can potentially handle a considerable fraction of concurrency bugs.

Immediacy of effects
Finding: Latent concurrency bugs were also found in significant numbers (15%).
Implication: Tools and methodologies such as proactive recovery can be leveraged to mask errors caused by a significant number of concurrency bugs.
Finding: Of the latent concurrency bugs analyzed, 92% were semantic bugs and, conversely, 92% of the semantic bugs were also latent bugs.
Implication: Given the high correlation between these classes of bugs, techniques that handle one class should also handle the other.

Semantic concurrency bugs
Finding: The vast majority of semantic bugs (92%) generated subtle violations of application semantics.
Implication: Run-time monitoring and testing tools will have to devise complex application-specific checks to detect the presence of semantic bugs.

Internal data structures
Finding: Most of the examined latent bugs (92%) corrupted multiple data structures.
Implication: Techniques that detect inconsistencies among data structures could be used to detect latent bugs; analyzing data structures individually might not suffice.

Severity and fixing complexity of bugs
Finding: Latent bugs were found to be slightly more severe than non-latent bugs.
Implication: Latent bugs are an important threat to software reliability and, therefore, should also be addressed.
Finding: Latent bugs were found to be easier to fix than non-latent bugs.
Implication: Further studies should be performed to analyze the reasons for this difference.

Table 4.1: Main findings of this study and their implications. The methodology for collecting the data presented here is described in Section 4.2 and the results are explained in detail in Section 4.3.

We searched the MySQL bug report database for bug reports with descriptions containing keywords commonly associated with concurrency bugs (Table 4.2). In addition to searching by keywords, we searched for bugs whose status was closed (i.e., bugs that are no longer under analysis by the developers). It would have been interesting to also consider bugs with other statuses (such as won't fix and can't repeat), but these bug reports are not likely to have detailed discussions and, more importantly, in general they will not contain patches. Without reasonably complete bug reports it would not be possible to thoroughly understand the bugs they describe.


Keywords: atomic, compete, concurrency, deadlock, lock, mutex, race, synchronization

Table 4.2: Concurrency-related keywords used for selecting bug reports.

Phase                                    Number of bugs
Total MySQL server closed bugs           12.5k
Concurrency-related keyword matches      583
Sampled bugs                             347
Concurrency bugs analyzed                80

Table 4.3: Bug counts for different stages of the analysis.

Next, to exclude bugs from stand-alone utilities that are unrelated to the multi-threaded server, our search query also limited the search to bugs related to the MySQL Server, including those within the Storage Engines category [Pac07]. Finally, we randomly sampled a subset of the bugs that matched our search query and manually analyzed them. The manual inspection revealed that some of the bugs that matched the search query were not concurrency bugs, and we excluded them as well. In addition, we excluded bug reports that did not contain enough information to analyze them. After filtering, we obtained a final set of 80 concurrency bugs – a number that is close to (or even higher than) the number of bugs analyzed in previous concurrency bug studies [LPSZ08, CC00]. Table 4.3 shows the bug count across the different stages of the bug selection process.

Note that the selection process has two main limitations. First, the search query can miss some actual concurrency bugs. However, a concurrency bug report that does not contain any of the main keywords associated with concurrency is also more likely to be incomplete and therefore more difficult to analyze successfully. Second, concurrency bugs are likely to be underreported, which would explain why, out of a total of about 12.5k bugs in the bug database, we only found 80 concurrency bugs.


4.2.2 Manual analysis of bug reports

We manually analyzed the bug reports of the sampled list of bugs, focusing on understanding the effects of the bugs. We analyzed the bugs using the information contained in the bug reports (including the patches), as well as the source code of the application. Bug reports contain several types of information that are useful for filtering out non-concurrency bugs and for understanding their characteristics. In particular, bug reports contain not only the description of the bug, but also discussions among the developers and users about how to diagnose and solve the problem. The information contained in these discussions is often important for understanding the bugs, in particular for determining whether they are concurrency bugs and for understanding their effects. Typically the bug report also includes the patch, and even the method to reproduce the bug; sometimes more than one patch attempt is made before developers agree on a definitive patch. Bug reports also include additional fields such as the perceived severity, the status, and the software version affected.

We used all these types of information to gain an understanding of how bugs are triggered and what their effects are.² In addition, some of this information was also used to estimate the complexity of fixing concurrency bugs and their severity.

4.3 Results

In this section we present the results of our analysis of the 80 concurrency bugs that we found in the MySQL bug database. A summary of these results and their main implications is also presented in Table 4.1.

4.3.1 Evolution of concurrency bugs

We investigated the proportion of concurrency bugs present in the bug database and how this proportion evolves over time. We were interested in knowing whether concurrency bugs are becoming more prevalent. To determine this, we identified the opening and closing year of the concurrency bugs that we analyzed, as well as of all closed bugs within the MySQL server category. To obtain the set containing all bugs, we excluded the keyword part of the search together with the sampling phase explained in Section 4.2. For each year we counted the number of concurrency bugs and their proportion (compared with generic bugs). We looked at both the opening date and the closing date because programmers

² The raw data gathered from this manual analysis can be found at http://www.mpi-sws.org/~pfonseca/dsn2010-bug-study.tgz

typically require a significant amount of time (i.e., many months) to solve the bugs under analysis. The results are presented in Figures 4.1 and 4.2. From these results we can see that there has been a trend of an increasing number and proportion of concurrency bugs over the years, although this trend does not seem to be very pronounced. The data that we collected does not allow us to determine the causes underlying this finding; however, we can think of two possible reasons for this slight increase. One possible explanation is that the advent of multi-core hardware causes users and developers to stumble upon these bugs more often than they used to in the past. Another explanation that we cannot rule out is that developers, while trying to further parallelize the code, actually introduce more concurrency bugs.

Of the concurrency bugs that we sampled, the oldest was opened on March 2, 2003, while the most recent was closed on September 16, 2009. Therefore, to make the comparison fair, we excluded the bugs outside this range from the list of generic bugs used to compute the proportions. To interpret these results it should also be taken into consideration that, as we show in Section 4.3.7, the time it takes to close a concurrency bug can be quite long (e.g., some bugs took more than a year to fix). This explains why the absolute number of bugs opened in the last year is low: many concurrency bugs potentially discovered in 2009 had not been fixed at the time our analysis took place (early 2010), which means they were not yet closed and were, therefore, not accounted for in this study.

4.3.2 External effects

We analyzed the concurrency bugs with respect to the external effects that are exposed to clients and divided these effects into six categories. The results are presented in Table 4.4. Note that the sum of all occurrences is larger than the total number of bugs because some bugs fit into more than one category. We can see that slightly more bugs cause non-deadlock conditions (63%) than deadlock conditions (40%), and among the non-deadlock bugs the most prevalent consequences are either causing the server to crash (28%) or providing wrong results to the user, which we term semantic bugs (15%).

Semantic bugs cause applications to provide users with results that violate the intended semantics of the application. This is an interesting class of bugs, since masking their effects requires sophisticated (and possibly expensive) techniques, such as Byzantine-fault-tolerant replication [CL99] or run-time verification of the behavior of the application against a specification of the system [Sch95]. We discuss these bugs in more detail in Section 4.3.4.


External effect    Number of bugs
Crash              22
Deadlock           32
Error              7
Hang               6
Performance        5
Semantic           12

Table 4.4: External effects of concurrency bugs.

The high percentage of deadlock bugs that we encountered leads us to believe that, despite significant research on addressing deadlock bugs, in practice this class of bugs still constitutes a significant problem for the robustness of software. The percentage of deadlock bugs that our study found is in line with results from other studies. For example, Lu et al. [LPSZ08] found that up to 30% of the concurrency bugs analyzed were deadlock bugs.

The remaining three classes of external effects were slightly less prevalent. The first is error messages (9%), which we distinguish from the class of semantic bugs, despite the fact that when an error message is provided to the user an unexpected result is also returned. We distinguish error bugs from semantic bugs by the fact that an error is detected by the server and, therefore, is explicitly flagged in the reply to the client request and can be handled appropriately by the client application. For instance, in one bug (bug #42519), when a restore operation is performed concurrently with an insert operation, a generic error message is returned to the user. We also found a number of bugs (8%) in which client requests hang (the client does not receive a reply), which differs from a deadlock situation, where one thread or a series of threads wait in a circular dependency. Typically, these hang bugs are caused by a thread that fails to release a certain lock, causing another thread that tries to acquire it to wait forever, as the sketch below illustrates. Finally, we found a few (6%) concurrency bugs that caused performance degradation (e.g., memory leaks that increase the number of page faults the server incurs).
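The following toy sketch (our own illustration, not taken from any MySQL bug report) shows the hang pattern just described: an early return on an error path skips the unlock, so the next thread that requests the lock waits forever, while the rest of the server keeps running.

    /* Sketch: a hang bug caused by a lock that is never released. */
    #include <pthread.h>

    static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

    int update_cache(int value)
    {
        pthread_mutex_lock(&cache_lock);
        if (value < 0)
            return -1;          /* BUG: error path returns while
                                   still holding cache_lock */
        /* ... update the shared cache ... */
        pthread_mutex_unlock(&cache_lock);
        return 0;
    }

Note that no circular wait is involved: a single forgotten unlock suffices, which is why such bugs manifest as hangs rather than deadlocks.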

4.3.3 Latent bugs

Next, we analyzed whether the bugs caused latent errors. We define a latent bug as one where the (concurrent) requests that cause the erroneous state differ from the request (or requests) that causes the external effects of the bug to be exposed to the clients (i.e., the violation of the application's specification). In other words, latent bugs cause internal data structures to be silently corrupted (i.e., an error) but do not

immediately cause a wrong output (i.e., a failure). A failure is only triggered by a subsequent request, which may not have to run concurrently with any other requests.

We found that a relevant fraction of the concurrency bugs in our study were latent (15%, versus 85% non-latent bugs). This result was somewhat surprising and has an interesting implication: the fraction is large enough that we believe there is value in developing tools that try to recover the internal state of the concurrent application. Performing such a recovery could prevent concurrency bugs from affecting the correct behavior of the application, even after the concurrent requests that cause the error have already been executed and the application state is corrupt.

We also analyzed how latent bugs were categorized according to the previous analysis of their external effects. The results in Table 4.5 show a very high correlation between latent and semantic bugs: 92% of the latent bugs manifest themselves by returning wrong results to the client, and conversely 92% of the semantic bugs are latent. (The fact that these values are exactly the same is only a consequence of the relatively small sample size.)

We see two possible consequences of the high correlation between latent and semantic bugs. On the one hand, methods that address the problems caused by latent bugs will have to take into account that these bugs manifest themselves through violations of the application semantics (rather than crashing or halting), which raises the bar for detecting when a latent error is activated and becomes a failure. On the other hand, this opens an opportunity for methods that handle non-crash faults to heal the state of the application in the background instead of masking the effects of these faults in the foreground. For instance, rather than tolerating semantic errors using Byzantine-fault-tolerance (BFT) replication, where the output of each request is voted upon, one might be able to obtain similar results by having a foreground replica that issues the reply and a background replica that checks and recovers the service state.

A concrete example of a latent bug helps illustrate some of the typical patterns surrounding bugs that are both latent and semantic. Bug #14262 involved concurrent requests updating both the contents of the database (e.g., table contents) and the binlog structure. The bug is caused by the code not enforcing the same order for concurrent requests that update both the table contents and the binlog. Thus, when a specific set of statements is sent to the primary replica, the primary replica updates the table data by executing the statements in one order but, depending on the exact interleaving of threads, may write those statements to the binlog in the reverse order. The effect of this bug on the client is only visible after a failure of the primary replica occurs (or when clients otherwise contact the backup replicas). In this case, one of the backups takes over with a state that diverges from the previously observed state (in that it reflects a different sequence of transaction execution), and subsequent results will be incoherent with those previously returned.


External effect    Number of bugs
Crash              1
Deadlock           0
Error              0
Hang               0
Performance        1
Semantic           11

Table 4.5: Effects of latent concurrency bugs.

In the remainder of this section we analyze semantic and latent bugs in more detail. The reason for our focus is twofold. First, we found these bugs to have a relevant (and perhaps unexpected) prevalence. Second, and more importantly, although existing tools are very effective at handling application crashes (e.g., Rx [QTSZ05]) and deadlocks (e.g., Dimmunix [JTZC08]), they are not as effective at handling the remaining, more subtle types of failures. Thus, there is a research opportunity for improving methods that address these types of concurrency bugs.

4.3.4 Characteristics of semantic bugs

We further analyzed the incorrect outputs returned by semantic bugs in order to determine how difficult it is to detect them, e.g., using a run-time monitoring tool [Sch95], which would avoid the use of more expensive techniques such as BFT replication [CL99]. Out of all the semantic bugs, we found only one with a self-inconsistent output, meaning that the buggy output clearly deviated from the expected reply. In this particular bug, the wrong reply returned to clients contains information about the contents of a certain table, but at the same time the reply also contains information indicating that the table does not exist in the database. None of the remaining bugs were self-inconsistent, implying that there are limited benefits from detection techniques that try to validate the correctness of the application by analyzing the replies.

We further analyzed these results and categorized the outputs of semantic bugs into two groups; some of the bugs did not fit into either group. The first group, containing 58% of these bugs, corresponds to outputs that reflect an ordering of previously executed transactions that is inconsistent with the ordering implied in previous replies.

The latent bug we described before, where binlog entries were logged in the wrong order, is an example of such a bug: after the primary becomes faulty, the output of the system reflects the order in which transactions were recorded in the binlog, which differs from the order in which they had originally been executed. The second group, containing 25% of the bugs, corresponds to violations of transactional semantics, in particular of the isolation property of transactions. This means that transaction A could see the intermediate effects of a concurrent transaction B (e.g., some of the updates made by transaction B, but not all of them). Finally, 17% of the semantic bugs did not fall into either of the previous two categories.

4.3.5 Internal effects of latent bugs

We also analyzed the set of latent bugs in more detail. In our analysis, we paid close attention to how the internal state was being corrupted, so that we could gain a better understanding of the kinds of techniques that can be useful for detecting the errors before they are exposed to the user and for recovering the internal state of the application.

First, we determined whether each bug corrupted a single high-level data structure or modified two or more data structures in an inconsistent way (leaving them in an incorrect state relative to each other). Only 8% of the latent concurrency bugs involve a single data structure; the remaining 92% involve inconsistencies between separate structures.

Next, we analyzed whether the data structures involved are persistent structures stored on disk or volatile structures kept in memory. Table 4.6 shows that the three most affected data structures are persistent, namely the files that contain the database contents, the respective indices, and the aforementioned binlog file. We also found a large number of bugs involving caches that are only stored in main memory.

Note, however, that these results do not allow us to draw conclusions about the probability that accesses to these data structures trigger bugs, given that we do not know how often different structures are accessed (and we also cannot claim to have a perfectly representative sample of the existing bugs). Note also that the numbers in Table 4.6 do not add up to the total number of latent bugs because certain bugs affected more than one data structure, as explained before.


Data structure     Number of bugs    Persistent?
Data file          11                Yes
Index file         9                 Yes
Definition file    8                 Yes
Query cache        7                 No
Key cache          6                 No∗
Binlog             5                 Yes

Table 4.6: Most frequent data structures involved in latent bugs. ∗ The contents of this cache can also be written back to disk.

                                  Number of bugs
No simple recovery mechanism      8
Allow for simple recovery:        4
    Server restart                4
    Other mechanisms              3

Table 4.7: Recovery mechanisms for latent concurrency bugs.

4.3.6 Recovering from latent errors

We looked at the ability of the application to recover from latent bugs after they have caused an error (i.e., corrupted the internal state). The recovery mechanisms we consider in this section are relatively simple ones: we identified the latent errors that can be recovered from by a server restart or by other simple mechanisms (e.g., reloading indexes) that do not require writing extensive recovery-specific code. We present the results in Table 4.7. Note that some bugs allow more than one simple recovery mechanism. We found that in one third of the cases it is possible to use simple mechanisms to recover from latent errors such that they go completely unnoticed by users. This increases the appeal of proactive recovery techniques.

4.3.7 Severity and fixing complexity

Finally, we compared concurrency bugs belonging to different categories with respect to their severity and to the complexity of fixing them, according to the bug report fields that specify these properties. Additionally, we also compared non-latent bugs against latent bugs with respect to these two properties. The average severity of bugs is compared in Table 4.8. The results show that latent bugs were considered, on average, to be slightly more severe than non-latent bugs. In the ranking of severity by external effect, crash bugs were found to be the most severe while, as expected, performance bugs were found to be the least severe.


Bug immediacy    Severity
Latent           2
Non-latent       2.2

Bug category    Severity
Deadlock        2.3
Crash           1.7
Error           2.4
Hang            2
Performance     3
Semantic        2.2

Table 4.8: Average severity of concurrency bugs according to their immediacy and category. Maximum severity is rated as 1 (i.e., critical bug) while minimum severity is rated as 5.

Bug immediacy    Time      Patches    Files    Discussion
Latent           114/79    3.8/2      2.3/1    10.4/7.5
Non-latent       137/90    2.7/2      3.9/1    11.6/9

Bug category    Time       Patches    Files    Discussion
Deadlock        125/90     1.9/2      1.5/1    9.3/9
Crash           128/83     3.5/2      7.7/3    12.9/11
Error           150/94     3.0/2      4.4/4    17.0/11
Hang            210/116    4.5/2      3.8/2    13.2/11
Performance     125/92     1.4/2.5    1.8/2    8.2/6
Semantic        108/67     3.8/2      2.2/1    10.5/8

Table 4.9: Complexity of fixing concurrency bugs according to their immediacy and category. For each class of bugs we present the average/median for each of the four metrics: time in days, number of patches, number of files in the patches and the number of comments in the discussion.

To assess the complexity of fixing concurrency bugs we used four metrics extracted from the bug reports: the time to fix the bug, the number of patching attempts, the number of files changed in the final patch, and the number of comments exchanged in the bug report. Although none of these metrics is perfect, in combination they help us estimate the complexity of fixing these bugs. We present a comparison of the four complexity metrics in Table 4.9. Since some of these fields contain significant outliers, in addition to the average we also present the median for all four metrics. Our analysis of the fixing complexity revealed a surprising result: non-latent bugs were found to be more complex to fix than latent bugs in all metrics except the number of patches. We currently do not have a clear explanation for this fact.


4.4 Limitations and discussion

One result of our study is that the percentage of concurrency bugs present in the bug database is low. This is not very surprising, since it has long been believed that concurrency bugs are underrepresented. The fact that concurrency bugs are hard to observe and reproduce (they are commonly referred to as Heisenbugs [Gra86]) is likely to contribute to their underrepresentation in bug databases, for three main reasons. First, when users encounter the bug a single time, they may not even be sure that it is a problem with the software and might not report it at all. Second, even when users are able to reproduce bugs on their machines, it might not be possible to reproduce them in the developer's environment, due to small differences between the environments. Third, even if developers manage to reproduce the bug, they might not be able to reproduce it systematically using traditional debugging methods, since some debugging tools and methods interfere with the reproducibility of the bug.

In this study we focused our attention on concurrency bugs found in MySQL. A previous study compared concurrency and non-concurrency bugs of three different database systems, including MySQL [VBLM07], and concluded that the three systems exhibited a very similar proportion of crash vs. non-crash faults (i.e., a bit over half of the bugs led to non-crash faults in each database system). While not conclusive, this observation leads us to believe that the bug patterns we found in MySQL might also apply to other database systems. More analyses are required to confirm whether this is in fact the case.

On the other hand, it seems less likely that these results can be generalized to arbitrary multi-threaded applications. Applications can be very different (e.g., some have graphical user interfaces while others do not; some use the client-server model while others do not). As an example, from the data collected in another study [LPSZ08] that compared different applications, about half of the deadlocks found in MySQL involved the synchronization of accesses to only one variable, while almost all of the deadlocks found in Mozilla involved two or more variables. Given the very different characteristics of applications, we believe that the conclusions we present here are unlikely to generalize to arbitrary multi-threaded applications.

The number of bugs analyzed in this study is comparable to the number analyzed in other related studies – Lu et al. studied 105 concurrency bugs [LPSZ08], Sahoo et al. studied 30 concurrency bugs [SCA10] and, more recently, Lin et al. studied 90 concurrency bugs [TLL+14]. However, it is worth noting that our results could potentially suffer from two sources of bias. First, our sample, in absolute terms, is small.


Obviously, this limits the confidence in the results, but at the same time it is a limitation that is difficult to overcome, given the time required to gather the data and the amount of data available. (This is a limitation shared by previous studies.) Second, we only analyzed bugs that were documented and fixed. This means we did not account for bugs that were not fixed (or even found), nor for bugs that were fixed but not documented. We believe that these biases are very difficult to overcome given the nature of bugs in general, and of concurrency bugs in particular. Nevertheless, more studies are desirable to improve our understanding of concurrency bugs.

4.5 Summary

To gain a better understanding of real-world concurrency bugs, this chapter presented a study of concurrency bugs in MySQL. In contrast to previous studies, our study focused on the effects of concurrency bugs rather than on their causes, an aspect that is important for the development of bug detection tools. Studying how bugs manifest enabled us to reach some important findings, such as the high prevalence of latent bugs and the strong correlation between latent bugs and the semantic bugs that cause silent failures. These findings motivated the work presented in Chapter 5, which aims at automatically detecting either of these classes of bugs in complex applications.


Figure 4.1: Evolution of bugs (by open date).

[Plot: for each year from 2003 to 2009, the number of concurrency bugs and their proportion (per 1000 bugs).]

Figure 4.2: Evolution of bugs (by close date).

5 Detecting latent and semantic bugs

5.1 Overview

In this part of the work we propose a new technique for detecting latent and/or semantic concurrency bugs that does not rely on data race detectors or on assertions. Our thesis is that it is possible to implicitly extract a specification, even for large multi-threaded server applications, by testing whether the application obeys linearizable semantics [HW90]. Intuitively, linearizability means that concurrent requests behave as if they were executed serially, in some order that is consistent with the real-time ordering of the invocations of and replies to the requests.

While similar ideas have been applied to the design of tools for testing concurrency bugs, they have been limited to testing the atomicity of small sections of the program or of library functions with at most hundreds of lines of code [BDMT10, XBH05a, Vaf10]. We push this idea to an extreme by postulating that even a complex multi-threaded server with hundreds of thousands of lines of code can come close to obeying linearizable semantics. By systematically testing whether linearizability is upheld, we can find subtle violations of the application semantics without having to write a specification for each concurrent application. Furthermore, by checking whether both the output and the internal state of the application obey the inferred semantics, we can identify not only the bugs that manifest themselves immediately as a wrong output, but also those that silently corrupt internal state.

However, achieving a meaningful state comparison requires abstracting away many of the low-level details of the state representation. We accomplish this by means of simple annotations provided by the tester. This approach also allows the tester to progressively increase the chances of finding latent concurrency bugs by incrementally annotating the state.

We implemented PIKE, a testing tool that brings together these principles and state-of-the-art techniques for the systematic exploration of thread interleavings. We describe the design and implementation of PIKE, and our experience in applying it to MySQL. Our experience demonstrates that, despite the size and complexity of MySQL, in practice the semantics it provides are sufficiently similar to linearizability for our detector

to be effective. Although we used only a simple battery of inputs for testing (based on the testing inputs that shipped with the application), we were able to find a considerable number of concurrency bugs in a stable version of the database. Furthermore, the effort to provide the required annotations was small, and after installing simple filters we also found the number of false positives to be modest. All of this was achieved without having to determine the correct outputs (or final states) for any given inputs, since PIKE automatically extracts a specification by comparing the outputs and states of different interleavings.

The remainder of this chapter is organized as follows. Section 5.2 presents the problem and gives an overview of our approach. In Section 5.3 we introduce PIKE, the tool that we built to find concurrency bugs. Section 5.4 describes our experience applying PIKE to MySQL. Section 5.5 presents the results that we obtained from this experience, and Section 5.7 provides a summary of this chapter.

5.2 Semantic and latent concurrency bugs

This section gives an overview of our main insight for automatically extracting the application specification and explains our approach to analyzing the application state.

5.2.1 Linearizability: The spec from within

For testing tools to detect semantic or latent bugs, they need some form of specification for the expected output and state obtained after running each test, in order to determine whether a given test run uncovered a bug. To address the absence of a manually written specification capturing deviations from the intended application semantics, we propose extracting such a specification from the behavior of the same application but under different conditions. In particular, our hypothesis is that, even in the case of a complex server application with hundreds of thousands of lines of code, the semantics intended by the programmer are normally close enough to linearizability [HW90] that we can use it as a good first approximation of a specification.

To formally define linearizability [HW90], we must first define the notion of history, which is a finite sequence of events that can be either invocations of operations or responses to operations. A history is classified as sequential if its first event is an invocation, each invocation is immediately followed by a matching response, and each response is followed by an invocation. Two histories H and H′ are defined as equivalent if, for every process P, the sequence of invocations and responses performed by P is the same, i.e., H|P = H′|P. A history H induces an irreflexive partial order <H on operations, where op1 <H op2 if the response to op1 precedes the invocation of op2 in H. A history H is then linearizable if it can be extended (by appending responses to pending invocations) to a history that is equivalent to some legal sequential history S whose order is compatible with <H.

5.2.2 Capturing application state

As we mentioned, to be able to find latent bugs we need to compare both the output and the state of different executions of the application.


[Figure: a concurrent execution Conc C interleaves requests R1 and R2; its output and state are compared against the two sequential executions, Seq A (R1 then R2) and Seq B (R2 then R1). The check asks whether Output C and State C match those of Seq A or of Seq B.]

Figure 5.1: Checking for linearizability of state and outputs of two concurrent requests.

While outputs are fairly straightforward to compare, the same cannot be said about the state of the application. In particular, the naïve approach of simply comparing the state of the various executions bit-by-bit is doomed to fail. The reason is that, by changing thread interleavings, the low-level state of the executions will quickly diverge. For instance, if we consider operations such as dynamic memory allocation, slight changes in the thread interleaving could easily change the relative order of allocation requests, and therefore the memory layout of the allocated heap space would likely differ as well.

We address this by asking the tester or the developer of the application to provide a state summary function, which captures an abstract notion of the state in a way that takes the semantics of the state into consideration and allows for a logical, rather than a low-level physical, comparison. As an example, a data structure that represents a set of elements should be compared across executions in a way that is oblivious not only to the memory layout but also to the order in which the elements happen to be stored: a set may be kept in a structure that implies an ordering, such as a list, yet the order of a set's elements is semantically irrelevant, so the state summary function must ignore it.

While writing this extra code could be a burden for the tester or the developer, we found that in practice these functions are simple to write, in part because the internal interface of the application we analyzed is reasonably well defined. Additionally, we provide a small library that assists programmers in writing state summary functions for the most common types of data structures. Finally, we note that in our testing framework the state summary functions are always scheduled without preemptions, given our custom scheduler (Section 5.3), and therefore they do not have to be synchronized with respect to the existing application code.

5.2.3 Maintaining the summary functions

Annotating the application undoubtedly requires some effort from testers. During the life-cycle of the application it might not suffice to annotate it once: testers may need to revise the annotations when new versions of the application are released. Major updates (which typically involve substantial code rewrites) are likely to require some effort to update the summaries. However, in practice, we expect many upgrades to preserve most of the properties of the data structures as well as the interface used to access them; in these cases, no changes to the annotations are required.

5.3 PIKE: A concurrency bug finding tool

In this section we describe how we combine our linearizability-checking approach, which analyzes both the output and the state of different interleavings for linearizability violations, with state-of-the-art testing techniques. The result is a bug-finding tool geared towards finding concurrency bugs that are traditionally hard to detect.

5.3.1 Handling false positives

One of the challenges we expected to face when deploying PIKE is that linearizability does not necessarily hold for a large, complex application with rich semantics and hundreds of thousands of lines of code. These cases, if not appropriately dealt with, could lead to the tool outputting a large number of false positives.

An example of a data structure that we found to sometimes violate linearizability is an application-level cache. In particular, this occurred in situations where the application logic detected that two requests were being handled concurrently and that the cache entry created by one of them would therefore be invalidated. In these cases, the application conservatively refrains from inserting that entry into the cache. This behavior might have an impact on performance, but it does not affect correctness: an application can always choose not to insert an entry into a cache. However, if the application executed the same requests sequentially, no conflict would exist, and the entry of the last request would be inserted into the cache.


[Figure: overall architecture of PIKE. A scheduler runs the application on the tests; the state summary function and the output-and-state comparison produce results; the developer inspects flagged results and either fixes the bug (patch) or, for a false positive, inserts a filter and re-runs PIKE.]

Figure 5.2: Overall architecture of PIKE. The system receives as inputs a multi-threaded application and a test suite, and contains a feedback loop that can be used by testers to insert filters to avoid false positives when the application deliberately violates linearizability.

To handle these cases, the state summary functions break the state up into separate components; e.g., an application-level cache would be an individual component. Furthermore, we allow the tester to write a rule that makes the linearizability test check for inclusion, instead of equality, among the set of entries in some of the state components. In the case of the application-level cache, such a rule might check whether the set of elements in the cache of the concurrent execution is contained in the set of elements in the cache of at least one of the sequential executions (a sketch of such a check appears below). We found this approach to work well in practice, reducing the number of false positives to a reasonable level. Therefore, our final system design contains a feedback loop in which testers can add rules that describe such exceptions to linearizability, thus avoiding most false positives and making the problem tractable.

Figure 5.2 illustrates the overall process. Developers provide PIKE with the application and the testing inputs. PIKE then runs the application multiple times, exploring different thread interleavings and checking both state and output for linearizability. To decide whether a bug was found, the developer inspects the results produced by PIKE, which include the output, the state, and information about the interleaving of the various executions. If the developer finds several similar false positives, they can simply insert a rule to adjust the comparison functions and re-run PIKE.
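The following sketch (using our own illustrative names; it assumes each cache entry has already been reduced to a hash, as described in Section 5.4.1) shows what such an inclusion rule can look like: the concurrent run's cache entries must form a subset of the entries of at least one sequential run.

    /* Sketch: inclusion (rather than equality) check for a cache
     * component; entries are element hashes. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    static bool subset(const uint64_t *a, size_t na,
                       const uint64_t *b, size_t nb)
    {
        for (size_t i = 0; i < na; i++) {
            bool found = false;
            for (size_t j = 0; j < nb; j++)
                if (a[i] == b[j]) { found = true; break; }
            if (!found)
                return false;   /* entry of a missing from b */
        }
        return true;
    }

    bool cache_linearizable(const uint64_t *conc, size_t nc,
                            const uint64_t *seqA, size_t na,
                            const uint64_t *seqB, size_t nb)
    {
        /* The concurrent cache may drop entries, but must never
         * contain an entry absent from both sequential runs. */
        return subset(conc, nc, seqA, na) || subset(conc, nc, seqB, nb);
    }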


5.3.2 Implementation

As Figure 5.2 also shows, the implementation of PIKE is composed of three components: the scheduler, the state summary function, and the component that compares the state and output of the application. PIKE combines our proposed linearizability detector with a custom scheduler that implements the PCT algorithm (Section 3.1.1) proposed by Burckhardt et al. [BKMN10]. The scheduler enables PIKE to effectively explore the interleaving space during testing.

We implemented the scheduler in about 3,000 lines of C code. Our scheduler controls the thread interleaving by intercepting the library calls of the target application and forcing a single thread to run at a time, chosen randomly according to the PCT algorithm. Our scheduler takes control of the application using the LD_PRELOAD environment variable and intercepts the pthread library calls made by the application; i.e., the scheduling granularity is at the level of pthread library calls (a sketch of this interception technique appears below). Similar levels of granularity have previously been found to produce good results at finding concurrency bugs [MQB+08].

We require application writers to identify the locations where the handling code of each request begins and ends. The scheduler needs to know about these locations to force sequential interleavings, i.e., interleavings that execute each request without preemptions from other concurrent requests. This information also helps in debugging the application when bugs are flagged.

Since our scheduler only takes control of the application when the application makes pthread calls, it could happen that the running thread (i.e., the runnable thread with the highest priority) invokes a system call that does not return. In such a situation, the entire application would block: the highest-priority thread would be blocked on a system call and the other threads would have previously been blocked by the scheduler. One situation where this would occur is the location where the main thread of MySQL spawns new threads to handle new client sessions. To avoid this, we make the scheduler aware of that particular location in the MySQL code and have the scheduler block the main thread as soon as it creates all the expected client-session threads (the number of which depends on the input). In comparison with the effort to annotate the application for the purpose of capturing its state, the effort required to identify these three locations was negligible.

The PCT scheduling algorithm requires the definition of the variables k and p, as discussed in Section 3.1.1. In our experiments we used the value 50,000 for the variable k, the maximum number of execution steps per run (after the database initialization phase), and we used p = 1, i.e., a single reschedule point. We found empirically that these values produced good results for the application we studied.

The PCT scheduling algorithm also requires an anti-starvation mechanism. Without this mechanism, if the highest-priority thread entered a busy-wait cycle it would never relinquish the processor and would prevent the entire application from progressing. Examples where such situations could occur are the instances where ad-hoc synchronization methods are used [XPZ+10]. We implemented the anti-starvation mechanism simply by reducing the priority of the running thread if it runs uninterrupted for more than a certain number of execution steps. We found this mechanism to be particularly useful during initialization periods.

We have built a generic library to assist in capturing the state of the application; however, the exact code to capture the state is dependent on the application.
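The following sketch (not PIKE's actual code) illustrates the LD_PRELOAD interception technique for one pthread call; scheduler_yield_point() is a hypothetical stand-in for the logic that records an execution step and blocks the calling thread until the PCT scheduler selects it to run.

    /* Sketch: interposing on pthread_mutex_lock via LD_PRELOAD. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <pthread.h>

    /* Hypothetical hook into the user-space test scheduler. */
    void scheduler_yield_point(void);

    static int (*real_lock)(pthread_mutex_t *);

    int pthread_mutex_lock(pthread_mutex_t *m)
    {
        if (!real_lock)          /* resolve the real symbol once */
            real_lock = (int (*)(pthread_mutex_t *))
                        dlsym(RTLD_NEXT, "pthread_mutex_lock");
        scheduler_yield_point(); /* potential reschedule point */
        return real_lock(m);
    }

Compiled into a shared library and loaded with LD_PRELOAD, wrappers like this one give the scheduler control at every intercepted pthread call without modifying or recompiling the target application.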

5.4 Experience with MySQL

This section reports on the experience of applying PIKE to find concurrency bugs in MySQL.

5.4.1 State summary functions

As explained in Section 5.2, PIKE checks whether the application exhibits linearizable behavior by comparing the internal state of different executions of the application. Achieving this goal requires producing a high-level representation of the internal state, which, in practice, is achieved by implementing application-specific state summary functions. To write these summary functions, each of which captures a different part of the state, we analyzed five important state components of MySQL (described in Section 3.2) and classified them into two categories according to the type of data structure they represent. Most of the analyzed data structures fall into the set category, since they are collections of elements whose order does not matter. The exception is the binary log structure, which consists of an append-only sequence where the order in which elements are added needs to be captured by the summary function.

Starting with the state components that describe sets, their summary function needs to be invoked at all places in the source code where elements are added to, removed from, or modified in any of these data structures. Despite the complexity of the state, locating these places turned out not to be too complicated, since the source code of MySQL

is reasonably well structured and there are functions that encapsulate these operations and are called from different points in the code. At each of these points we invoke a generic summary function for sets, which is designed to provide efficient update and comparison operations. This function maintains a cumulative hash value for the set (S), which is initialized to zero at the beginning of the execution. Then, upon adding or removing an element e, the summary function computes a hash of the deterministic parts of the element being added or removed (H_e). In this step it is important to remove sources of non-determinism, such as timestamps, that would lead to state divergence. In addition, since some of the annotated data structures contain elements consisting of pointers, in these cases, instead of hashing the pointers, we hash the elements they point to.

Then, the value of H_e is either added to or removed from the cumulative set value S. Both adding and removing are done by XORing the new value with the previous cumulative value, i.e.:

S_new = S_old ⊕ H_e.   (5.1)

This leads to a compact representation of the state of the set that allows for a trivial comparison operator: hashes can simply be compared for equality. In addition, operations that modify elements are handled by treating them as a sequence of an add and a remove operation.

For the binary log, this representation does not work because it does not capture the order in which elements were added to the sequence. Therefore, in this case, because the structure represents a sequence, we capture the order of the elements by hashing the concatenation of the previous cumulative value with the new element:

S_new = SHA1(S_old || e).   (5.2)

Finally, we also needed to extend this scheme to support containment checks, instead of equality checks, for sets. This can be achieved by replacing the cumulative XOR of the hash values with a counting Bloom filter [BMP+06]. Alternatively, we can simply list all the elements in the set and compare them exhaustively, which is what our implementation does.
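The following is a minimal sketch of the two summary functions; the FNV-1a hash here merely stands in for whatever element hash H_e the implementation actually uses (the text above mentions SHA1 for the sequence case).

    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative 64-bit FNV-1a hash standing in for the element hash H_e. */
    static uint64_t elem_hash(const void *data, size_t len)
    {
        const unsigned char *p = data;
        uint64_t h = 14695981039346656037ULL;
        for (size_t i = 0; i < len; i++) {
            h ^= p[i];
            h *= 1099511628211ULL;
        }
        return h;
    }

    /* Set summary: order-independent; adding and removing are the same XOR.
       S_new = S_old XOR H_e  (5.1) */
    static void set_update(uint64_t *S, const void *elem, size_t len)
    {
        *S ^= elem_hash(elem, len);
    }

    /* Sequence summary: order-sensitive; chains the previous cumulative value
       with the new element, as the binary log requires.
       S_new = H(S_old || e)  (5.2), with FNV standing in for SHA1 */
    static void seq_append(uint64_t *S, const void *elem, size_t len)
    {
        uint64_t buf[2] = { *S, elem_hash(elem, len) };
        *S = elem_hash(buf, sizeof buf);
    }

Because XOR is its own inverse, a modification can be summarized by XORing out the hash of the old element and XORing in the hash of the new one, in either order.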

5.4.2 Input generation

Like other dynamic bug-finding tools, our testing technique requires exploring different inputs in an attempt to find situations in which the application behaves incorrectly. Therefore, we must find a diverse set of concurrent database operations that stand a

good chance of triggering bugs. Again, the rich semantics and wide interface of MySQL make this particularly challenging, given that we can only practically explore a small subset of all possible inputs.

We considered different options for generating test inputs. The obvious option is to generate the inputs manually; however, this can be tedious and impractical for applications like MySQL. Another option is to randomly generate inputs, possibly with the aid of grammars that steer the input generation towards inputs that are considered more useful. This option suffers from the problem that it is not straightforward to instrument the grammar in such a way that it creates multiple concurrent requests that are likely to cause contention for some particular part of the state of the application. A third option is to use tools that analyze the application, try to understand its behavior, and then attempt to automatically generate useful inputs [GKS05, CDE08]. However, while these tools work well for small and medium-sized applications, it is unclear if they can currently scale to a codebase the size of MySQL.

Therefore we pursued a fourth option. MySQL already contains a large test suite, which has been manually created by the developers and testers of the application. Some of the tests were added specifically to prevent previous bugs from recurring in subsequent versions of the application. However, these tests are sequential tests and therefore would not be useful for finding concurrency bugs. Our solution was to convert these sequential tests into concurrent tests by breaking up the sequence of requests contained in a test and executing them concurrently by separate clients.

When deciding how many concurrent clients to use in our tests, we took into account that studies show that a significant fraction of the concurrency bugs found only require a small number of threads to be triggered (typically two) [LPSZ08]. A separate study also showed that only a small number of requests is sufficient to expose bugs [SCA09]. Taking these factors into consideration, and to make the process more efficient, we generated tests involving two clients and with a limited number of requests per client (typically fewer than ten requests and starting from an empty database).

The original complete test suite contained approximately 50,000 requests, as counted by the number of semicolons. Using our approach we manually converted around 5% of those requests into concurrency tests, thus generating 1550 pairs of inputs from concurrent threads. Other approaches could be developed for generating inputs to test concurrent software; in Chapter 8, we discuss a future research direction to address the input generation aspect of concurrent testing.
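As a toy illustration of this conversion, the following sketch (not the tooling we actually used) alternately assigns each ';'-terminated request of a sequential test to one of two clients; it ignores SQL quoting and escaping for brevity.

    #include <stdio.h>
    #include <string.h>

    /* Assign the i-th request of a sequential test alternately to client A
       or client B, yielding one two-client concurrent test. The input string
       is modified in place by strtok(). */
    static void split_test(char *requests)
    {
        int i = 0;
        for (char *req = strtok(requests, ";"); req != NULL;
             req = strtok(NULL, ";"), i++)
            printf("client %c: %s;\n", (i % 2 == 0) ? 'A' : 'B', req);
    }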


5.5 Results

In this section we present the results of our experience of applying PIKE to MySQL.

5.5.1 Development effort

The first result we report on is the amount of effort needed to understand the code of MySQL and to develop the state summary functions. The annotations we inserted added up to 600 lines of code, as counted by the number of semicolons. This represents less than 0.2% of the number of semicolons in the MySQL source code.

While annotating the source code of MySQL, most of the effort was spent understanding the source code. We spent a total of about two man-months in the process of understanding both the structure and semantics of the application and annotating the source code.

5.5.2 Bugs found

We ran PIKE on MySQL opportunistically in a shared cluster using multiple machines (up to 15). Each machine in the cluster had an AMD Opteron 2.6 GHz processor and 3 GB of RAM, and ran a Linux distribution with kernel version 2.6.32.12. We tested MySQL by running it on 1550 inputs, and for each input we configured PIKE to explore 400 different interleavings using its scheduler. The experiment lasted for about one month.

Our implementation of PIKE could be optimized to reduce the computational cost in several ways. In particular, we could avoid going through the initialization phase of MySQL for each run by taking advantage of snapshotting techniques. Another way of speeding up testing could be to run PIKE on the target application previously compiled with optimization flags. The few inputs for which suspicious behavior is observed could then be re-executed with additional debugging support (on a version of the target application that is not optimized and has application-level debugging options enabled).

During our testing experiments PIKE was able to identify a total of 12 inputs that triggered concurrency bugs. Table 5.1 presents an overview of the inputs that we found to trigger incorrect behavior, and in the following subsections we present our findings in more detail for different types of bugs, categorized according to their effects.

In some cases, different inputs triggered bugs that showed similar effects. Because it was difficult for us to classify whether they correspond to the same bug or not, we decided to present the results in a more objective way by presenting in detail all

of the inputs and effects of the bugs we found, instead of trying to count the number of distinct bugs. We then speculate about which of those inputs are likely to be triggering what could be considered the same bug.

Table 5.2 lists the various inputs that were flagged as positives by PIKE and that we confirmed to be caused by concurrency bugs. The table presents the requests that were concurrently executed in the test cases that triggered concurrency bugs, together with the number of distinct thread schedules in which the program exhibited the incorrect behavior. Additionally, we compared the output of the concurrent execution against the output of each sequential execution (O_A and O_B), and the state at the end of the concurrent execution against the state at the end of each sequential execution (S_A and S_B).

Given the linearization assumption, PIKE flags a concurrent execution as having triggered a concurrency bug if it cannot find a sequential execution X that produces both a matching output and a matching final state (i.e., that has O_X = "Yes" and S_X = "Yes"). All of the entries in Table 5.2 failed to meet this condition.

In addition to discrepancies in the output or the state of the different interleavings, we also found some cases where the execution of the application blocked, which might have been caused by deadlocks, and cases where the application crashed. We have not analyzed these cases, but they are less interesting from our standpoint, since these potential bugs would also have been found by other tools, such as CHESS [MQB+08] or tools that are designed to find deadlock bugs [NPSG09].

In our experiments, we did not come across non-concurrency bugs, and this is not surprising for two reasons. First, we used inputs that were based on the existing regression tests contained in the MySQL source code, and therefore MySQL should have been previously tested for these or very similar inputs. Second, a non-concurrency bug, if triggered, would likely have produced the same wrong results in all interleavings, regardless of whether the interleaving was sequential or not, and therefore our detector would not have flagged it.

One point we would like to highlight about these results is that we used a testing suite that has been applied repeatedly, albeit in a way that runs inputs sequentially. We postulate that it might be possible to be even more effective with a different set of inputs. The downside of focusing on what is not the latest version of MySQL is that some of the bugs had already been fixed, as we will detail next.

Next, we analyze in more detail the results for the two categories of bugs that our technique is aimed at: violations of the application semantics, and latent bugs.


External effect    Non-latent    Latent    Total
Error                       2         0        2
Semantic                    2         8       10

Table 5.1: Number of inputs found to trigger concurrency bugs, broken down by latency and external effect.

We further divide the first category into semantic bugs and error bugs, depending on whether the violation of the intended semantics corresponds to an incorrect but non-error reply, or to a more explicit error.

Requests                                                    #    Effect     Latency
------------------------------------------------------------------------------------
CREATE TABLE t2 LIKE t1; ‡                                  9    Error      Non-latent
  INSERT INTO t2 SELECT * FROM t1; ‡

INSERT INTO t3 VALUES (1,'1'),(2,'2');                      1    Semantic   Latent
  SELECT DISTINCT t3.b FROM t3,t2,t1 WHERE t3.a=t1.b; †

CREATE TABLE t2 LIKE t1; ‡                                  2    Error      Non-latent
  INSERT INTO t2 SELECT * FROM t1; ‡

TRUNCATE TABLE t1;                                          35   Semantic   Latent
  SELECT * FROM t2;

INSERT INTO t1 (a) VALUES (10),(11),(12);                   2    Semantic   Latent
  SELECT a FROM t1;

INSERT INTO t2 VALUES (2,0);                                3    Semantic   Latent
  SELECT STRAIGHT_JOIN * FROM t1, t2 FORCE (PRIMARY); †

DROP TABLE t1;                                              238  Semantic   Non-latent
  SHOW TABLE STATUS LIKE 't1';

INSERT INTO t1 VALUES (1,1,"00:06:15"); †                   1    Semantic   Latent
  SELECT a,SEC_TO_TIME(SUM(t)) FROM t1 GROUP a,b; †

CREATE TABLE t2 SELECT * FROM t1;                           17   Semantic   Non-latent
  DROP TABLE t2;

INSERT INTO t1 (a) VALUES (REPEAT('a', 20));                3    Semantic   Latent
  SELECT LENGTH(a) FROM t1;

INSERT INTO t1 VALUES (80,'pendant');                       2    Semantic   Latent
  SELECT COUNT(*) FROM t1 WHERE LIKE '%NDAN%'; †

OPTIMIZE TABLE t1;                                          25   Semantic   Latent
  DROP TABLE t1;

Table 5.2: Properties of the triggered concurrency bugs that PIKE found. For each input, the table presents the pair of requests that were executed concurrently, the number of concurrent executions that were flagged as positive (#), the external effect of the bug, and whether the bug is latent. Requests marked with † have been simplified for presentation purposes; the two identical pairs of requests marked with ‡ operate on distinct states.


Request 1: SHOW TABLE STATUS LIKE 't1';
Request 2: DROP TABLE t1;

Table 5.3: Requests responsible for triggering the sample semantic bug

Semantic bugs

Figure 5.3 illustrates a representative example of a semantic concurrency bug in MySQL that was found by our detector. This bug is triggered when the server concurrently receives a specific SHOW TABLE request and a DROP request, as shown in Table 5.3. Figure 5.3 shows a simplified snippet of the source code involved in this concurrency bug; the marked point indicates the interleaving that triggers the bug.

The first thread, while executing the SHOW TABLE request, obtains a list of names of tables. According to the semantics of the database, this returned list should contain the names of all the tables in the database whose name contains the string "t1". But if the second thread is able to execute the remove_table() function before the first thread processes the list of table names, the open table list becomes obsolete. This in turn means that when the first thread resumes execution it will call the open_tables() function with an argument that contains obsolete data, and it will not be able to access the table that was dropped. The result is that the second thread returns to the user a success message for the DROP request, while the first thread returns an entry for the now non-existent table, indicating that it exists but with some of the fields containing the value NULL.

We note that this particular instance of a semantic bug was eventually reported in the MySQL bug report database and patched in a version that succeeded the one we tested. However, we did not use that information during the process of generating inputs.

Other semantic bugs provided wrong results in even more subtle ways. For example, there were bugs where the application would simply provide wrong results based on stale data.

Error bugs

A sub-class of the semantic bugs that we found can be labeled as error bugs. We con- sidered bugs to be error bugs if they manifest themselves by returning to the client an explicit error message that is not appropriate given the requests that were executed. During our experiments we found two cases in which error concurrency bugs were trig- gered.


Thread 1:

    void show_table_status(thd) {
        ...
        /* Gets a list of existing tables */
        table_list = get_tables_list(thd);

            /* <-- Thread 2 executes drop_table() here */

        /* Returns an error for the non-existing table */
        res = open_tables(table_list);
        /* Ignores the previous error */
        get_schema_record(table_list);
        ...
    }

Thread 2:

    int drop_table(table) {
        ...
        remove_table(table);
        ...
    }

Figure 5.3: Sample semantic bug

Request 1: CREATE TABLE t2 LIKE t1;
Request 2: INSERT INTO t2 SELECT * FROM t1;

Table 5.4: Requests responsible for triggering the sample error bug

Table 5.4 presents the concurrent requests that were found to be responsible for one of the error bugs. This bug is triggered when one thread executes a CREATE LIKE request, which is supposed to create a new and empty table with a schema identical to that of an existing table, while another thread executes a specific INSERT request that copies data from the existing table into the new table.

As illustrated in Figure 5.4, the first thread, while handling the CREATE request, first copies the definition file containing the schema of the existing table. According to the synchronization logic in MySQL, the second thread is allowed to execute the INSERT request even before the first thread creates the index file and the data file. Because of this, while executing the INSERT, the second thread is unable to open the data file and returns an error to the user stating that the data file does not exist, instead of either succeeding (by writing the data) or returning a different error stating that the table does not exist.

This example illustrates an important point: error bugs can also be subtle and difficult to distinguish from a correct execution, despite the fact that they return an error. One reason for this is that an error message is often a legitimate outcome of the operation, but the concurrent execution returns the wrong error message. Therefore, and unlike a situation where the application crashes or an assertion fails, we must know application-specific semantics to determine whether an error reply is incorrect or not, and PIKE has proven to be effective in determining this.


Thread 1:

    int create_table_like(src, dst) {
        ...
        /* Copies the definition file */
        my_copy(src, dst, ...);

            /* <-- Thread 2 executes mysql_insert() here */

        /* Creates the index and data file */
        create_table(dst_path, ...);
        ...
    }

Thread 2:

    int mysql_insert(data, table) {
        ...
        /* Successfully opens definition file */
        open_and_lock_tables(table);
        write_to_table(data, table);
        ...
    }

Figure 5.4: Sample error bug

Latent bugs

Surprisingly, PIKE was able to find eight different situations that triggered latent concurrency bugs. All of the latent bugs we found had the external effect of providing wrong results in subtle ways, and all involved the query cache structure. As we will describe in Section 5.5.3, we also found situations where the binary log appeared to contain an incorrect state, but we were not confident that these represented bugs (i.e., that the incorrect state would lead to incorrect behavior visible to users) and so we did not flag them as such.

As an example, one of the cases where a latent concurrency bug is triggered occurs when the requests in Table 5.5 are executed concurrently. The simplified source code relevant to this example is shown in Figure 5.5. While executing the SELECT request, the first thread opens the table, locks it, and in the process makes a private copy of the state of the table. The logic of the application allows the second thread to then concurrently insert entries at the logical end of the table. However, when the first thread resumes execution it relies on its local (and now stale) copy of the state of that table to fetch data. In the process, the first thread skips the newly inserted entries and provides the old results to the client, without immediately violating the semantics of the application (i.e., the returned value is consistent with the first thread having executed before the second thread).

In this bug, the actual semantic violation arises from the fact that the first thread also stores the stale data in the query cache, and the second thread does not invalidate it. This means that a third thread could, at a later point in time, read the stale data from the query cache and expose it to the clients, violating the expected semantics of the application.


Request 1: SELECT a FROM t1;
Request 2: INSERT INTO t1 (a) VALUES (10), (11), (12);

Table 5.5: Requests responsible for triggering the sample latent bug

Thread 1:

    int select(sql, table_name) {
        ...
        /* Gets a local table copy */
        table = open_and_lock_table();
        lock = READ;

            /* <-- Thread 2 executes mysql_insert() here */

        /* The local file length doesn't change */
        result = fetch_data(table, table->data_file_length);
        /* Writes stale data to query cache */
        query_cache.store(result);
        ...
    }

Thread 2:

    int mysql_insert(data, table) {
        ...
        /* Upgrades lock to concurrent insert */
        write_to_table(data, table);
        invalidate_query_cache();
        ...
    }

Figure 5.5: Sample latent bug

We saw the same pattern of latent bugs causing stale entries to be left in the query cache in other test cases, and we again stress that some of the situations that triggered latent concurrency bugs are likely to be triggering what could be considered the same bug. However, given the complexity of the application logic that both invalidates the query cache and prevents certain specific concurrent requests from simultaneously inserting entries into the query cache, it is hard to state objectively whether we are dealing with the same bugs. Nevertheless, it should be noted that the various cases that triggered latent bugs can be caused by very distinct types of requests, as can be seen in Table 5.2.

5.5.3 False positives

After the initial tests, approximately one third of the inputs generated potential false positives. Since this high fraction of false positives would make the analysis of the results impractical, we inserted two filters, which proved to be very effective at reducing the number of false positives. These filters allow testers to avoid false positives when the application deliberately violates linearizability.


The first filter we inserted was related to the table cache. Concurrent requests that try to open the same table concurrently will create distinct but identical entries in the table cache, whereas the same requests executing in sequence can reuse each other's entry. Therefore, our filter checks that the table cache contents of the linearized execution are contained in the table cache contents of the concurrent execution.

The second filter was related to the query cache and the fact that MySQL sometimes conservatively decides not to cache entries in the query cache when two concurrent requests are executed, one of them being a query and the other one invalidating the entry for that query in the query cache. In this case, our filter ensures that a concurrent execution is flagged as negative (with respect to the query cache) if the query cache entries in the concurrent execution are contained in the set of entries in the linearization. Note that these cases may be considered performance bugs (or, at least, missed opportunities for a performance optimization), which shows that PIKE might also be useful for analyzing and improving performance issues that affect the application.

After inserting these two filters, the total number of false positives reported was 27. Of these, 22 were related to unexpected interactions between the framework and the application. In particular, some requests took a long time to complete, which, in turn, caused an execution timeout in our framework to expire. In other cases, false positives were caused by non-determinism in the reply that we had not caught (e.g., calls to the current time or random number generation). A third type of false positive was caused by timeouts in the NFS volume to which our results were written, which affected the output. All of these types of false positives were reasonably easy for us to diagnose.

The remaining five false positives required a more careful analysis. These were caused by binary log entries being reordered (i.e., MySQL would change some internal structures in one order and the binary log in another order). This turned out to be acceptable under some circumstances. Typically this happened with pairs of concurrent requests in which one of the requests executed an optimization or maintenance task (e.g., OPTIMIZE and FLUSH requests). The fact that these operations affect the performance but not the results implies that, when the binary log state is required (normally when a replica recovers from a fault), repeating these entries in the wrong order will not affect the output of the operations, but only the point in the re-execution sequence at which the performance optimizations are performed.
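A minimal sketch of such a containment check follows, assuming the summaries are kept as sorted arrays of element hashes (the exhaustive representation mentioned in Section 5.4.1) rather than a single XOR value; the names are illustrative.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Returns true if every hash in `lin` (the linearized execution) also
       appears in `conc` (the concurrent execution); both arrays must be
       sorted in ascending order. */
    static bool is_contained(const uint64_t *lin, size_t nlin,
                             const uint64_t *conc, size_t nconc)
    {
        size_t i = 0, j = 0;
        while (i < nlin && j < nconc) {
            if (lin[i] == conc[j]) { i++; j++; }  /* matched entry */
            else if (lin[i] > conc[j]) j++;       /* extra concurrent entry */
            else return false;                    /* missing entry */
        }
        return i == nlin;
    }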


5.6 Limitations and discussion

This section discusses the consequences of relying on modifications to the tested software and the potential of PIKE to extend existing test suites.

5.6.1 Reliance on modifications to tested software

As mentioned in Section 5.2, the current design of PIKE requires changes to the tested applications, namely to implement the state summary functions. Unfortunately, this requirement constitutes a disadvantage for two main reasons, which we discuss next.

First, software modifications require work from developers, leading to an increase in the adoption cost, which, in turn, dissuades the adoption of the testing tool. However, our experience with PIKE showed that, given the specific changes required by our approach, the software modifications required to test MySQL involved only a moderate effort (Section 5.5.1). In addition to the limited effort, because PIKE allows the modifications to be performed incrementally on a per-data-structure basis, our proposed testing approach mitigates this problem by allowing developers to control the tradeoff between the scope of testing (i.e., the amount of state analyzed) and the effort required for testing.

Second, modifications to the tested software prevent developers from achieving their real goal, which is to test the original software. In some cases this difference can be problematic because software modifications can potentially introduce bugs or simply affect the application behavior in a manner that masks existing bugs. However, in the case of PIKE, because the major modifications required (the state summary functions) are expected to run only at the end of the execution¹, after the output has been sent to users, it is unlikely that such changes would negatively affect the observed output, even though they could still crash the application or produce wrong information about the state of the tested application.

Despite the limited impact of the specific modifications required by PIKE, our real-world experience testing software allowed us to better understand the importance of avoiding this requirement. In this context, Chapter 6 proposes a testing tool, SKI, that entirely forgoes the requirement to modify the tested software by means of a dedicated design. Section 7.1.2 takes a step back to discuss the general problem of modifying tested software in a broader context. In addition, Section 8.1 discusses a

¹The small modifications required by PIKE to identify three locations in the source code (Section 5.3.2), for the purpose of controlling the interleavings, represent negligible changes given their size and simplicity.

possible research direction to improve the design of PIKE with respect to the need to modify the application.

5.6.2 Extending traditional test suites

PIKE could be leveraged to extend traditional testing approaches, which generally focus on non-concurrency bugs. Existing test suites could adopt PIKE to find concurrency bugs, in addition to non-concurrency bugs, by including both PIKE's component to detect concurrency bugs and a custom scheduler to explore the interleaving space. Importantly, PIKE would spare developers from having to specify the correct output produced by each of the concurrent tests, which is a burdensome, albeit common, practice in the case of traditional non-concurrent tests. Furthermore, because PIKE analyzes the application's internal state, in addition to the output, it is able to detect latent concurrency bugs.

5.7 Summary

This chapter presented PIKE, a tool for testing concurrent applications that finds two particularly challenging types of concurrency bugs: semantic concurrency bugs and latent concurrency bugs. We applied PIKE to a mature version of MySQL and, in the process, we were able to find several semantic and latent concurrency bugs. In addition, we found that it was simple to write the necessary annotations to capture an abstract view of the service state, and that it was easy to make the number of false positives tractable by writing simple filtering rules for common violations of linearizability at the level of the application state. This work allowed us to conclude that it is feasible to extract a specification from a complex application, such as MySQL, by presuming that, in the general case, developers intend to provide linearizable semantics.

6 Exposing concurrency bugs in kernels

6.1 Overview

In the current multi-core era, kernel developers are under permanent pressure to continually increase the performance of kernels through concurrency. Examples of such efforts include reducing the granularity of locking [Rus], rewriting subsystems to use parallel algorithms [CKZ13a], and using non-traditional and optimistic synchronization primitives (such as RCU [MS98] and lock-free data structures [Val94]). Unfortunately, previous experience has shown that all these efforts are error-prone and can easily lead to kernel concurrency bugs, i.e., bugs that are only exposed by a subset of the possible thread interleavings.

In practice, kernel developers find concurrency bugs mostly through manual code inspection [WJKT05, Hol14] and stress testing [BAEFU06a, Sto02] (i.e., applying intense workloads to increase the chances of triggering concurrency bugs). While useful, both approaches have significant shortcomings: code inspection is labor-intensive and requires significant skill and experience, while stress testing, despite having low overhead and being amenable to automation, offers no guarantees and can easily fail to uncover difficult-to-find concurrency bugs, i.e., edge cases that are only triggered by a tiny subset of the interleavings. It thus stands to reason that kernel developers could benefit from tools without these limitations.

To this end, we propose a complementary testing approach for automatically finding kernel concurrency bugs. Our approach explores the kernel interleaving space in a systematic way by taking full control over the kernel thread interleavings. This approach has been explored for user-mode applications, namely by PIKE (Chapter 5), and existing literature has shown that it can yield good results [MQB+08, BKMN10, NBMM12a]. Unfortunately, however, the systematic approach has not yet been applied to commodity kernels, because achieving control over the thread interleavings of kernels involves several challenges. First, to be practical, a concurrency testing tool must be generally applicable, rather than being specific to a particular kernel or kernel version, which precludes kernel-specific modifications. Second, the kernel is the software layer that implements

its own thread scheduler, as well as the thread abstraction itself, making the external control of thread interleavings non-trivial. Finally, to be effective, such a tool must be able to control kernel interleavings while introducing low overhead.

In this chapter, we report on the design and evaluation of SKI¹, the first tool for the systematic exploration of kernel interleavings, and show how it overcomes these challenges. To achieve control over kernel interleavings in a portable way, SKI uses an adapted virtual machine monitor that (1) determines the status of the various threads of execution, in terms of being blocked or ready to run, to understand the scheduling restrictions, and (2) selectively blocks a subset of these threads in order to enforce the desired schedule. Notably, these key tasks are achieved without any modification to the kernel and without specific knowledge of the semantics of the kernel's internal synchronization primitives. Furthermore, we propose several optimizations, both at the algorithmic and at the implementation level, that we found to be important for scaling SKI to real-world concurrency bugs.

We evaluated SKI by testing several file systems in recent versions of the Linux kernel, and we found 11 previously unknown concurrency bugs. Several of these concurrency bugs can cause serious data loss in important file systems (ext4 and btrfs). We also show how SKI can be used to reproduce concurrency bugs that have been previously reported in two different operating systems (Linux and FreeBSD), and compare SKI's performance against the traditional stress testing approach.

We believe that SKI is an important step towards increased kernel reliability on multicore platforms. Nonetheless, there remains significant room for exploiting domain- and kernel-specific knowledge. For instance, in this dissertation we propose a scheduling algorithm, which generalizes PCT (Section 3.1.1), that is generic in the sense that it makes no assumptions about the kernel under test. However, based on the SKI infrastructure, other kernel-specific scheduling algorithms could be implemented, for example, to restrict the interleavings explored to those that affect specific kernel instructions, such as code that was recently modified. Thus, we believe that SKI can provide benefits even beyond those described in this dissertation, since it can serve as an experimentation framework for different systematic techniques.

The rest of the chapter is organized as follows. Section 6.2 motivates the need for better kernel testing tools. Section 6.3 presents the design of SKI. Section 6.4 proposes several optimizations to make SKI scale to real-world concurrency bugs. Section 6.5 describes the details of our implementation and Section 6.6 evaluates SKI on commodity kernels.

¹Systematic Kernel Interleaving explorer


Section 6.7 discusses some of its limitations and Section 6.8 presents a summary of the chapter.

6.2 Kernel API and kernel modifications

Knowing which threads are live and being able to exercise control over the interleavings are two essential mechanisms for systematically exploring the interleaving space (Section 3.1). In the case of user-mode applications, both of these essential mechanisms can be easily and portably implemented in a proxy layer (e.g., through LD_PRELOAD or ptrace) by intercepting all relevant synchronization primitives to infer and override the liveness state of each thread [MQB+08, BKMN10, NBMM12a]. For example, PIKE uses LD_PRELOAD to achieve this level of control over applications and to implement the PCT scheduling algorithm. Unfortunately, this approach does not work for kernels, because kernels do not provide a portable interface that would allow a testing application such fine-grained control over their threads. An alternative would be to include the testing tool within the tested kernel by modifying the kernel itself, but this approach would suffer from several disadvantages:

• Lack of portability and API instability. Any dependency on kernel-internal APIs would a priori limit the portability of the envisioned testing tool, preventing its seamless application across different kernels and even across different versions of the same kernel. In contrast to well-documented, standardized user-space interfaces (e.g., the pthreads API), the internal API of most kernels is not guaranteed to be stable, and in fact typically changes from version to version. In particular, given the current trend towards increased hardware parallelism, kernel synchronization has generally been an active area of development in Linux and other kernels [MS98, CKZ13a].

• Complexity of the internal interface. An additional problem with the internal API of the kernel, also noted in previous work [EMBO10], is that the semantics of in-kernel synchronization operations are particularly complex. Furthermore, the exact semantics of such operations tend to differ from kernel to kernel. This calls for solutions that do not require a detailed understanding of these semantics.

• Other forms of concurrency. Interrupts are pervasive and critical to kernel code. However, exercising fine-level control over their timing from within the


kernel itself would be particularly challenging, as interrupts are scheduled by the hardware.²

• Intrusive testing. Requiring modifications to the tested software goes against the principles of testing [KL93]: testing modified versions of the software can potentially introduce or mask bugs.

In the next section we explain how SKI overcomes these challenges while enabling the systematic exploration of kernel thread interleavings.

6.3 SKI: Exploring kernel interleavings

This section presents the design of SKI. We start by providing an overview of our solution (Section 6.3.1), and then we describe how SKI exercises control over thread interleavings (Section 6.3.2) and how it gathers the necessary liveness information (Section 6.3.3). We conclude this section with a description of the scheduling algorithms employed, i.e., the interleavings chosen for each run (Section 6.3.4).

6.3.1 Overview

The inputs given to SKI are the initial state of the system under test and the kernel input that is to be tested concurrently (i.e., two or more concurrent system calls). Given these inputs, SKI carries out several test runs corresponding to different concurrent executions, where each test run is fully serialized, i.e., the tool enables only a single thread to execute at each instant. This enables precise control over which interleavings are executed, and allows SKI's scheduler to choose successive runs so as to improve the coverage of the interleaving space. Either during or after each test run, a bug detector is used to determine whether the test has flagged a possible bug. Such bug detectors can perform simple, generic actions, like detecting crashes, or complex, application-specific actions, like running a system integrity check after the test run.

As mentioned in the previous section, for SKI's scheduler to gain control over the interleavings executed by the kernel, it must perform two key tasks: inferring thread liveness and overriding the scheduler. To accomplish both without modifying the OS kernel under test, we implement the scheduler of SKI at the level of a modified virtual machine monitor (VMM), taking as input a virtual machine (VM) image that incorporates the initial state of the kernel immediately before the system calls are invoked

²While user-mode signals are similar to interrupts, many programs do not use signals, and therefore existing user-mode tools, including PIKE, do not handle them [MQB+08, BKMN10].

concurrently. Implementing the scheduler at the VMM level enables it to both observe and control the kernel under test. This advantage comes, however, at the cost of making it more difficult to implement the two aforementioned key tasks. This is because, at the VMM level, the hypervisor observes a stream of machine instructions to be executed and has direct access only to the physical resources of the underlying hardware (such as registers or memory contents). These low-level concepts are distant from the abstractions that are implemented by the kernel in software, such as threads and their respective contexts. Furthermore, it would intuitively seem necessary to have access to these abstractions in order to suspend the execution of a thread and replace it with another thread.

6.3.2 Exercising control over threads

To control the progress of threads, SKI relies on the observation that many important kernels (e.g., Linux, Windows, MacOS X, FreeBSD) include a mechanism that allows applications to pin threads to individual CPUs (i.e., to specify the thread affinity), as described in Section 3.3. This mechanism, provided by kernels to user-mode applications for performance reasons, can be exploited to create a 1:1 mapping between threads (a kernel abstraction) and virtual CPUs (an ISA component, controllable by the VMM). This mapping in turn allows SKI to block and resume a thread's execution by simply suspending and resuming the corresponding virtual CPU's execution of machine instructions.

To implement the mapping between threads and CPUs, SKI includes, in addition to the modified VMM, a user-mode component that runs inside the VM and issues system calls to pin threads to virtual CPUs (see Section 6.5.3). Note that, for scalability reasons, each test generally involves only a few threads, and hence it suffices to configure a small number of virtual CPUs. Section 6.7.1 discusses the limitations of this mapping approach regarding a particular category of threads, the kernel threads, and the implications for testing the scheduler code itself.
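On Linux, for example, such pinning can be performed with the standard sched_setaffinity() call; the following is a minimal sketch of the kind of helper that a user-mode component can use to establish the 1:1 mapping.

    #define _GNU_SOURCE
    #include <sched.h>

    /* Pin the calling thread to a single (virtual) CPU, creating the 1:1
       thread-to-CPU mapping that SKI exploits. */
    static int pin_to_cpu(int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        /* pid 0 selects the calling thread */
        return sched_setaffinity(0, sizeof set, &set);
    }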

6.3.3 Inferring liveness

To explore the interleaving space, SKI requires information about whether threads are blocked or able to progress, analogously to what is required by the existing user-mode tools [BKMN10, MQB+08, NBMM12a]. This requires SKI to be able to identify constructs such as spin-locks or barriers, where a CPU executes a tight loop, constantly checking the value of a memory location for changes. SKI would be impractical if it

were not able to detect such constructs, for several reasons. First, executions would take longer because more instructions would be executed (e.g., iterations of a spin loop). Second, because more instructions would be executed, the space of possible interleavings would increase significantly, since the number of possible interleavings is exponential in the length of the test. Third, and most importantly, given the scheduling algorithm that we describe in Section 6.3.4, two interleavings could be considered different even when they only differ in the number of iterations executed by the polling loop of a spin lock. This would be detrimental to the efficiency of SKI, since many of the explored schedules would be effectively equivalent.

The difficulty in inferring thread liveness is that, from the point of view of the VMM, CPUs are constantly executing instructions. As such, it is difficult to distinguish the normal execution of a program from a polling loop. One possible solution that we considered, but ultimately rejected, relies on annotating the kernel by specifying the locations within the kernel code where the CPU executes instructions without making any actual progress, namely situations where the kernel is waiting for some event external to the CPU (such as an action performed by some other CPU or a device notification). However, this approach would be laborious, error-prone, and non-portable. Instead, we found several simple heuristics, independent of the kernel code, that enable the VMM to infer whether a CPU is making progress.

H1: Halt heuristic. The first heuristic flags the CPU as non-live when it executes the halt instruction (HLT).³ According to the instruction set specification, HLT marks the CPU as waiting for interrupts. This instruction is typically used by kernels to implement, in an energy-efficient way, the idle thread when the kernel scheduler has no other threads to run. When the CPU subsequently receives an interrupt, it is marked as live again.

H2: Pause heuristic. The second heuristic relies on the observation that kernels use the pause instruction (PAUSE) to efficiently implement spin-locks. In the x86 architecture, the pause instruction was introduced to avoid wasting bandwidth on the memory bus when a CPU goes into a tight polling loop, and therefore its execution is a good indication that the CPU is spinning on a lock. Thus, when our modified VMM detects the execution of two nearby pause instructions, i.e., within an instruction window of size h2, it considers the CPU to be non-live and takes note of the memory read-set associated with the instructions executed between the two pause instructions.

³We focus on the ubiquitous x86 architecture in this dissertation; the presented ideas, however, can be similarly applied to other architectures.


Pause instructions in close proximity are detected by the VMM by checking, at every pause instruction, whether another pause instruction was recently executed. Later on, when another CPU changes one of the addresses in the read-set, the non-live CPU is optimistically marked as live again.

H3: Loop heuristic. The third heuristic detects situations where the CPU is waiting for some external event but that are not caught by the second heuristic. This could happen if, for example, a spin-lock were implemented without the pause instruction. To detect CPUs stuck in a polling loop, our modified VMM maintains a window, of size h3, of the last few instructions executed by each CPU. If a CPU repeatedly executes the same instructions (i.e., if it executes a loop), and if an instruction in the loop repeatedly reads the same value from the same memory address, the executing context is flagged as non-live after a certain number of loop iterations. Again, SKI takes note of the read-set of detected polling loops to later re-enable the CPU.

H4: Starvation heuristic. As a last resort, in case the above heuristics are not able to detect situations where there is no progress, SKI keeps a count of the number of instructions executed continuously by the current CPU and, if it exceeds a threshold

(h4), it conservatively presumes that the CPU is no longer making progress. The CPU is marked live again after a certain number of instructions have been executed by the other CPUs. This heuristic ensures the detection, for example, of loops that are missed by H3 when h3 is set smaller than the loop size.

We determined the values for the thresholds of these heuristics, which remained constant throughout all our tests, through simple experimentation. In our experience, these mechanisms were sufficient to ensure the effectiveness of SKI for a wide range of kernel versions, both at reproducing previously known bugs and at finding unknown bugs.
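As an illustration of how such a heuristic can be tracked, the following is a minimal sketch of heuristic H2, assuming a hypothetical on_pause() callback invoked by the instrumented VMM at every PAUSE instruction; the window size shown is purely illustrative, not the value used in our tests.

    #include <stdbool.h>
    #include <stdint.h>

    #define H2_WINDOW 64   /* illustrative value for the window size h2 */

    struct vcpu_state {
        uint64_t icount;       /* instructions executed by this virtual CPU */
        uint64_t last_pause;   /* icount at the previous PAUSE instruction */
        bool live;
    };

    /* Hypothetical callback: invoked whenever a virtual CPU executes PAUSE. */
    static void on_pause(struct vcpu_state *c)
    {
        if (c->last_pause != 0 && c->icount - c->last_pause <= H2_WINDOW)
            c->live = false;   /* two nearby PAUSEs: presume a spin loop */
        c->last_pause = c->icount;
    }

In the real tool, marking the CPU as non-live would also record the read-set of the instructions between the two PAUSEs, so that a write by another CPU to one of those addresses can optimistically mark the CPU as live again.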

6.3.4 Scheduling algorithm

SKI executes a VM multiple times under different schedules to ensure interleaving diversity across the runs. To select and prioritize the interleavings that are to be explored, SKI needs to implement a scheduling algorithm. To this end, we have generalized the PCT algorithm (Section 3.1.1) to support testing operating system kernels. In particular, SKI generalizes PCT by supporting interrupts, which is a fundamental requirement for testing kernels.

We consider the proposed algorithm to be just one instance from a range of possible algorithms (albeit one that, in our experience, happens to work well), and developers that make use of the tool might consider adding other, more refined algorithms. For example,

it may be possible to develop effective scheduling algorithms that exploit specific characteristics of kernel code. This section presents our generalization of the algorithm, and assumes that the reader is familiar with the original PCT algorithm, which is described in detail in Section 3.1.1.

The scheduling algorithm we propose differs from the PCT algorithm in two respects. First, given that SKI operates at the level of the virtual machine monitor, our algorithm does not have access to the thread abstraction. Thus, instead of scheduling threads, our algorithm schedules CPUs. Second, given the interrupt mechanism within the kernel context, our scheduler has to decide when to dispatch interrupts. Interrupts do not appear in the context of user-mode programs, but we need to control their schedule when testing the kernel for two different reasons. First, concurrency bugs may depend on the interleaving of interrupts, so our algorithm should be able to explore this part of the interleaving space. Second, interrupts are in some cases required for the successful completion of system calls, and therefore interrupts need to be scheduled to conclude the execution of the tests. For example, some system calls are only able to finish if, during their execution, other CPUs handle the TLB flush interrupt. Thus, since the scheduler of SKI must handle both threads and interrupts, we will refer to contexts, instead of threads, throughout the description of our generalized scheduling algorithm.

As the scheduler needs to consider when interrupts are handled, each CPU is tracked as being in one of two different contexts: it may either execute in the context of an interrupt handler (interrupt-context), or it may execute outside the context of any interrupt handler (CPU-context). Each interrupt-context is defined by the CPU on which it arrived and by the interrupt number that it represents. From the point of view of the scheduler, interrupt-contexts are created, and therefore become schedulable, when the corresponding interrupt arrives on its specific CPU. These execution contexts are, to our scheduler, the equivalent of threads in other systematic exploration algorithms, and as such they need to be detected by the scheduling logic. SKI infers the current context by tracking interrupt handler dispatches and invocations of the IRET instruction (which is used to return from interrupt handlers). Figure 6.1 shows examples of schedules involving two CPU-contexts and one interrupt-context.

To achieve further control over the tests, SKI allows the user to specify the set of execution contexts that are allowed to run during the test. Placing restrictions on the set of eligible execution contexts may be useful in specific testing scenarios to restrict the scheduling space that is explored.

68 6.3 SKI: Exploring kernel interleavings

[Figure 6.1: two example schedule diagrams, each interleaving instructions from three contexts: CPU1, INT1, and CPU2.]

Figure 6.1: Two examples illustrating schedules produced by SKI. Each schedule involves three contexts: two CPU-contexts and one interrupt-context. Both schedules start with the same initial context priorities; however, Schedule 2 differs from Schedule 1 because it contains one reschedule point.

6.3.5 Discussion

The design of SKI ensures correctness, meaning that SKI never causes the kernel to exhibit a behavior that could not occur during a normal execution of the kernel, because SKI exercises control over the kernel schedule by temporarily suspending the execution of instructions on chosen CPUs. Correct kernels have to be able to handle this mechanism because the hardware specification does not provide guarantees about the speed of CPUs. Furthermore, modern kernels are expected to work well inside virtual machines, where the apparent speed of CPUs is likely to be irregular simply because the host system might be under heavy load. Despite this correctness guarantee, some bug detectors may still produce false positives (e.g., data race detectors). In such cases, regardless of how the interleaving space is explored, the obtained results require further analysis specific to the employed bug detector.


6.4 Scaling to real-world kernels

The total number of possible schedules grows exponentially with the length of the code under test. For most programs, including the kernel, it is not practical to exhaustively explore all interleavings, and therefore it is important for concurrency testing tools to include mechanisms for increased scalability.

The p parameter, used by the scheduling algorithm (Section 6.3.4), constrains the schedules that may be explored and therefore improves scalability by bounding the number of possible schedules. This is done without much impact on the effectiveness of the testing tool, given the observation that, in practice, most bugs can be triggered with few reschedule points [BKMN10]. Similarly, it has been shown that many concurrency bugs can be triggered with a small number of threads [LPSZ08] and with a small number of concurrent requests [SCA09]; PIKE also leverages these observations. Based on these observations, we configured SKI in our tests to use small values for these three dimensions (reschedule points, number of CPUs, and number of system calls).

Despite these optimizations, we noticed in our initial tests that SKI's scalability was limited by the fact that even a single system call can execute a large number of instructions (typically many thousands or even millions). This implied that, even if we limited SKI to p = 1, the number of runs required to explore all schedules would be on the same order of magnitude as the number of instructions, which is impractically large.

To address this scalability issue, SKI relies on a technique first proposed by Godefroid [God97] that exploits the fact that some schedules are equivalent and thus redundant, as illustrated in Figure 6.2. In particular, we rely on the observations that (1) schedules that do not differ in terms of the relative order of communication points (where threads see the effects of each other) are observationally equivalent from the standpoint of the interleaved threads, and that (2) most kernel instructions do not constitute communication points between CPUs. Taken together, these two observations allow us to significantly improve SKI's scalability by restricting reschedules to occur only at communication points.

More precisely, we define a communication point as an instruction that accesses a memory location that is also accessed by another CPU during the test, where at least one of the accesses is a write. Such concurrent memory accesses can influence the final outcome of the execution: in the case of two concurrent writes, the last value to be written prevails, and in the case of a write concurrent with a read, the value read may or may not reflect the write, depending on the schedule. Other tools have also tried to


    Schedule 1            Schedule 2            Schedule 3
    CPU1      CPU2        CPU1      CPU2        CPU1      CPU2
    A=1                   A=1                   A=1
              A=0         B=1                   B=1
              D=A+1                 A=0         C=B+A
    B=1                             D=A+1                 A=0
    C=B+A                 C=B+A                           D=A+1
    PRINT C               PRINT C               PRINT C

Figure 6.2: Example showing two equivalent schedules (Schedules 1 and 2) and one schedule that is not equivalent to either of the others (Schedule 3). In this example, only variable A is used for communication between CPUs. Because variable B is accessed by only one CPU, placing the reschedule point immediately before (Schedule 1) or immediately after (Schedule 2) the statement B=1 does not change the result of the execution.

avoid equivalent schedules, but they rely instead on identifying and preempting threads at locations involved in possible data races or at locations invoking synchronization primitives [MQB+08].

SKI gathers the locations of possible communication points by monitoring memory accesses during the tests. During each run, it tracks the locations of the memory accesses, the CPU responsible for the accesses, and the type of each access (read or write). After each run, SKI generates a set of program addresses that are potential communication points, and merges this information with an accumulated set of potential communication points for that specific test case. Note that this process does not rely on sample runs: every run monitors the memory accesses and, therefore, potentially learns new communication points. As this accumulated set is constructed, it is used in subsequent runs of the same test case to decide which schedules are equivalent, thereby limiting the set of instructions that qualify as reschedule points. During the initial run, because no communication points are known yet, SKI learns new communication points for subsequent runs without executing reschedule points.

In our experiments, we observed that, as expected, both data and synchronization accesses were identified as communication points. To give some examples, data accesses occur when both CPUs try to modify the same field in a shared structure (e.g., a file reference count), and synchronization accesses occur when both CPUs try to acquire the same lock. An advantage of SKI's dynamic approach is that whether or not an

instruction qualifies as a reschedule point depends on the code that both CPUs actually execute (e.g., the specific system calls or interrupt handlers that are invoked). As a result, if two CPUs acquire different locks unrelated to the tested functionality, such accesses will not be considered communication points (in the context of the current test case).

In practice, SKI estimates the expected number of instructions, k (recall Section 3.1.1), based on previous runs. With the communication-point optimization, instead of considering individual instructions when placing reschedule points, we consider only communicating instructions, and thus let the algorithm take coarser-grained steps in its exploration of the interleaving space. That is, by limiting the set of reschedule-point candidates, the magnitude of the parameter k is effectively reduced. During the initial run, because no communication points are known, the initial value of k is ignored.

In addition to these algorithmic optimizations, SKI includes several optimizations at the implementation level to ensure its effectiveness (Section 6.5.4).
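The following is a minimal sketch of the communication-point bookkeeping described above, assuming a hypothetical per-address table keyed by physical address; lookup_or_insert() stands in for whatever lookup structure the VMM actually uses.

    #include <stdbool.h>
    #include <stdint.h>

    struct access_info {
        uint8_t cpu_mask;     /* bit i set if CPU i accessed the address */
        bool written;         /* true if any access was a write */
    };

    /* Hypothetical helper: returns the (possibly new) record for paddr. */
    struct access_info *lookup_or_insert(uint64_t paddr);

    /* Record one memory access; returns true if the address now qualifies
       as a communication point: accessed by more than one CPU, with at
       least one write. An 8-bit mask suffices for the few virtual CPUs
       that each test uses. */
    static bool note_access(uint64_t paddr, int cpu, bool is_write)
    {
        struct access_info *ai = lookup_or_insert(paddr);
        ai->cpu_mask |= (uint8_t)(1u << cpu);
        ai->written |= is_write;
        /* (mask & (mask - 1)) != 0 tests for more than one bit set */
        return ai->written && (ai->cpu_mask & (ai->cpu_mask - 1));
    }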

6.5 Implementation

We implemented SKI by modifying QEMU, a mature and open-source VMM, and its JIT compiler. In total, our implementation added 13,542 lines of source code to QEMU. We also built a user-mode testing framework, consisting of 674 lines of source code, to help users write test cases for SKI (Section 6.5.3). In addition, we implemented various scripts to set up and automate tests and to analyze the gathered information.

6.5.1 Overview

SKI provides a helper tool that allows kernel developers to specify the concurrent system calls by building a VM containing the corresponding test case (Section 6.5.3). When executed under SKI, this VM first goes through an initialization phase, performing test-specific actions to configure the system, and then signals the beginning of the test to the VMM using hypercalls (i.e., calls between the VM and the VMM). When all virtual CPUs have received the signal, the SKI scheduler is activated. SKI's first action is to take a snapshot of the VM. The VM snapshot includes the entire machine state (memory state, disk state, CPU state, etc.) and thus allows SKI to run multiple executions from an identical initial state.

Starting from this VM snapshot, SKI places reschedule points and assigns starting priorities as described in Sections 6.3.4 and 6.4, and then resumes the execution of the

To mark the end of the test, the user-mode component inside the VM issues a hypercall to the VMM. Afterwards, the VM is allowed to run normally (i.e., without schedule restrictions) until the testing application asks to terminate the execution. This last phase is useful to let the user-mode component execute test-specific diagnostics (such as a file system check) inside the VM.
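To make the begin/end signaling concrete, the sketch below shows one plausible guest-side hypercall wrapper. The mechanism (a write to a magic I/O port trapped by the VMM), the port number, and the command encoding are all assumptions for illustration; they are not SKI's actual hypercall interface.

    /* Hypothetical guest-side hypercall wrapper (Linux, x86).  The magic
     * port and command numbers are assumptions; the real interface may
     * use a different trapping mechanism entirely. */
    #include <sys/io.h>     /* iopl(), outw() */

    #define SKI_PORT       0x510   /* assumed port intercepted by the VMM */
    #define SKI_TEST_BEGIN 1
    #define SKI_TEST_END   2

    static void ski_hypercall(unsigned short cmd)
    {
        outw(cmd, SKI_PORT);        /* trapped and interpreted by the VMM */
    }

    /* Callers must first gain port access, e.g., with iopl(3). */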

6.5.2 Runnable contexts

The scheduler of SKI allows, at any point in time, only the live and active context with the highest priority to run. The liveness of a context is inferred by the VMM according to the heuristics explained in Section 6.3.3; the criterion for determining whether a context is active depends on the type of context. A CPU-context is considered active if it has not reached the end of the test, which is flagged by the user-mode component using a hypercall, as discussed above, whereas an interrupt-context is considered active only after it has been triggered by the respective hardware device and before the corresponding IRET instruction has been executed.
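The selection rule just described amounts to a simple maximum over the live and active contexts; the sketch below illustrates it with hypothetical types (SKI's internal representation is certainly richer).

    /* Illustrative selection of the next context to run: the live and
     * active context with the highest priority.  Types are hypothetical. */
    #include <stdbool.h>
    #include <stddef.h>

    struct context {
        int  priority;
        bool live;    /* inferred via the liveness heuristics (Sec. 6.3.3) */
        bool active;  /* CPU-context: test not yet ended;
                         interrupt-context: triggered, IRET not executed */
    };

    struct context *pick_next(struct context *ctx, int n)
    {
        struct context *best = NULL;
        for (int i = 0; i < n; i++)
            if (ctx[i].live && ctx[i].active &&
                (best == NULL || ctx[i].priority > best->priority))
                best = &ctx[i];
        return best;   /* NULL if no context is currently runnable */
    }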

6.5.3 Helper testing framework

We built a user-mode helper framework that allows users to easily build a testing VM ready to be used by SKI. It includes a user-mode application that runs inside the testing VM for the purpose of setting up the kernel and for providing the required test input (e.g., system calls). The user-mode test framework automatically creates the testing threads/processes, pins each thread/process to a dedicated virtual CPU, issues the hypercalls to mark the beginning of the test (right before the test function is called) and the ending of the test (right after the test function returns), and finally requests the termination of the VM (when all post-test functions have completed). This framework can be used both to manually create test cases (Section 6.6.2) and to adapt existing test suites to leverage SKI for the interleaving exploration (Section 6.6.3). We first implemented the framework targeting Linux and subsequently ported it to FreeBSD, and have been using it to conduct tests on both operating systems. The helper framework itself was easily ported because only a few of the system/library calls it relies on are not part of the POSIX standard (namely the calls to pin threads/processes, which have slightly different interfaces).
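For concreteness, a test case built on top of this framework might look roughly like the sketch below: one POSIX thread per virtual CPU, pinned with the Linux-specific pthread_setaffinity_np(), with the commented-out ski_test_begin()/ski_test_end() calls standing in for the framework's hypercalls. The framework API shown is an assumption for illustration, not its actual interface.

    /* Hypothetical SKI test case: two pinned threads issuing system calls
     * concurrently.  ski_test_begin()/ski_test_end() stand in for the
     * framework's hypercalls and are assumptions for illustration. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <pthread.h>
    #include <sched.h>
    #include <unistd.h>

    static void pin_self(int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    static void *cpu0(void *arg)      /* e.g., a concurrent open() ... */
    {
        (void)arg;
        pin_self(0);
        /* ski_test_begin(); */
        open("/tmp/testfile", O_CREAT | O_RDWR, 0600);
        /* ski_test_end(); */
        return NULL;
    }

    static void *cpu1(void *arg)      /* ... racing with an unlink() */
    {
        (void)arg;
        pin_self(1);
        /* ski_test_begin(); */
        unlink("/tmp/testfile");
        /* ski_test_end(); */
        return NULL;
    }

    int main(void)
    {
        pthread_t t0, t1;
        pthread_create(&t0, NULL, cpu0, NULL);
        pthread_create(&t1, NULL, cpu1, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        return 0;
    }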


6.5.4 Optimizations and parallelization

In addition to the algorithmic optimizations described in Section 6.4, we have implemented several other optimizations to improve the performance of SKI. One of our main optimizations avoids resuming from a snapshot for each tested execution, which takes a few seconds in the original version of QEMU. Instead, we have implemented in SKI a multi-threaded forking mechanism to take advantage of the copy-on-write semantics offered by the host OS, amortizing the cost of resuming from a snapshot over multiple executions. This mechanism has been implemented, without modifying the host kernel, by issuing a fork system call from a specific VMM thread (which only forks the calling thread) and subsequently manually spawning the other threads as well as reconstructing their respective stacks. To ensure that the fork system call is actually efficient, despite the large address space of SKI, our implementation additionally leverages large memory pages (2 MB on x86-64 systems) at the host level. The benefit of forking executions is not limited to executions that test the same input because we allow the testing application to receive, through a hypercall, a parameter that specifies the testing input. Thus, from a single snapshot, SKI can explore different inputs and different interleavings, making the overall cost of creating and resuming from a snapshot negligible. In addition, given that in our testing scenario after each execution we discard most of the state of the VM (e.g., VM RAM and disk contents), we optimized SKI by converting several file system operations, performed by the original QEMU on the host, into memory operations. Given that our workload is parallelizable, SKI takes advantage of multicore host machines by spawning multiple VMs to perform multiple concurrent tests. We have also implemented a testing infrastructure to distribute the workload across multiple machines, further increasing the testing throughput.
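The core of the forking optimization can be illustrated with the sketch below: the parent process holds the resumed snapshot state, and each tested execution runs in a forked child whose pages are shared copy-on-write and discarded on exit. The re-spawning of the other VMM threads and the stack reconstruction described above are elided, and the function names are illustrative.

    /* Sketch of snapshot reuse via fork(): the parent keeps the pristine
     * state; each execution runs in a child that shares pages CoW and
     * discards its modifications on exit.  Thread re-spawning and stack
     * reconstruction (see above) are elided. */
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    void run_executions(int n, void (*run_one)(int))
    {
        for (int i = 0; i < n; i++) {
            pid_t pid = fork();      /* forks only the calling thread */
            if (pid == 0) {
                run_one(i);          /* one schedule/input exploration */
                _exit(0);            /* all CoW-modified pages vanish */
            }
            if (pid > 0)
                waitpid(pid, NULL, 0);   /* parent state stays pristine */
        }
    }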

6.5.5 Bug detectors

Section 6.3 presented the algorithms and mechanisms that SKI employs to explore the thread interleaving space of the kernel. However, to find concurrency bugs, an orthogonal problem needs to be addressed — it is necessary to identify which of these executions triggered bugs. In Section 6.6 we show how SKI can be combined with different types of bug detectors — we evaluate SKI using bug detectors to detect crashes, assertion violations, data races and file system inconsistencies. Our implementation detects crashes and assertion violations by monitoring the console output at the VMM level.

The detection of data races is also performed at the VMM level by recording racing memory accesses, similarly to DataCollider [EMBO10]. File system inconsistencies, in contrast, are detected by running existing file system checkers inside the VM itself after each test.

6.5.6 Traces and bug diagnosis

To enable the implementation of external bug detectors and to allow the diagnosis of bugs through manual inspection, SKI is able to produce detailed logs of the executions. These traces contain the exact ordering of instructions and the identity of the context responsible for the instructions. In addition, SKI can be configured to produce traces with all the memory accesses and the values of the main CPU registers. We built some analysis tools that parse these traces to provide useful information. One of our tools extracts source code information (assuming the kernel is compiled with debugging symbols) and disassembles the kernel binary to annotate the trace with both source code and assembly instructions. We also implemented another diagnosis tool that generates the call graph for each execution. While none of these tools is conceptually particularly challenging, in our experience, they complement each other well and make the rich information collected by SKI much more accessible. Apart from the traces produced by SKI, the bug detectors we built are another important source of diagnostic information. For example, the data race detector that we implemented identifies the exact memory address as well as the instruction addresses involved. As another example, the crash reports produced by the Linux kernel include a detailed stack trace that is very convenient for developers to diagnose bugs.
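A minimal version of the source-annotation step could be a thin wrapper around addr2line, as sketched below; the binary name (vmlinux) and the output handling are assumptions, and the actual analysis tools are considerably more elaborate.

    /* Sketch: resolve a trace address to function and file:line using the
     * kernel's debugging symbols via addr2line.  Paths are assumptions. */
    #include <stdio.h>

    void annotate_address(unsigned long long addr)
    {
        char cmd[128], line[512];

        snprintf(cmd, sizeof(cmd),
                 "addr2line -f -e vmlinux 0x%llx", addr);
        FILE *p = popen(cmd, "r");   /* emits function name, then file:line */
        if (p == NULL)
            return;
        while (fgets(line, sizeof(line), p) != NULL)
            printf("    %s", line);
        pclose(p);
    }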

6.6 Results

This section evaluates the effectiveness of SKI in exposing real-world kernel concurrency bugs. After describing the configuration that we employed in our experiments, we report on our experiments using SKI to reproduce previously known bugs and compare it with traditional approaches (Section 6.6.2). We then report our experience in applying SKI to recent and stable versions of the Linux kernel, which resulted in the discovery of several previously unknown concurrency bugs (Section 6.6.3).


6.6.1 Configuration

We conducted our experiments on host machines with dual Intel Xeon X5650 processors, each with 12 hardware threads, and 48 GB of RAM running Linux 3.2.48.1 as the host kernel. To increase the testing throughput, we configured SKI to run 22 testing executions in parallel on each machine and we ran our experiments on up to 12 machines at a time. For each test case reported in this chapter, we configured SKI to use p = 2, and we explored 200 schedules in the large-scale experiments to find new bugs (Section 6.6.3) and 50,000 schedules in the experiments to reproduce known bugs (Section 6.6.2). SKI's liveness heuristics used h2 = 30, h3 = 20 and h4 = 500,000 (Section 6.3.3). We tested several different versions of Linux, ranging from 2.6.28 to 3.13.5, depending on the experiment, and one of the experiments described tested FreeBSD, version 8.0. Importantly, the same configuration of SKI was used in all tests: we did not have to modify any settings to adjust SKI to a particular tested kernel version, and we also did not have to modify the kernels under test.

6.6.2 Reproducing concurrency bugs

We evaluated the effectiveness of SKI in reproducing previously reported kernel concurrency bugs. To find typical bug reports, we searched the kernel Bugzilla databases, the kernel development histories (i.e., the git changelogs), and the mailing list archives. From these sources, we selected four independently confirmed kernel concurrency bugs. We opted for a diverse set of bugs that were particularly well documented. Furthermore, to enable a direct comparison, we considered only bug reports that included instructions for triggering the reported bugs through stress testing. As listed in Table 6.8, the selected bugs exhibited different types of failures in various kernel components. Bug A causes a memory access violation (an "Oops" in Linux parlance) in the pipe communication mechanism, which can occur during concurrent open and close calls on anonymous pipes. Bug B also results in a memory access violation and is triggered on some interleavings when a FAT32-formatted partition is unmounted concurrently with the removal of an inotify watch4 associated with the same partition. Bug C does not result in a crash, but rather causes a read system call to return corrupted values. Finally, bug D affects FreeBSD and is triggered by concurrent calls on sockets that cause the kernel to incorrectly return error values.

4. Linux's inotify interface allows processes to receive change notifications for file system objects such as files, directories, or mount points.


Based on these four bug reports, we determined the system calls that would expose the bugs and produced the corresponding SKI test cases, as described in Section 6.5.3. For the bugs that had semantic manifestations, i.e., system calls that returned wrong results, we implemented bug-specific detectors, according to the information provided in the bug reports. SKI exposed bugs A and B by triggering the crash after exploring 28 and 53 schedules, respectively. Bugs C and D were exposed after 51 and 3519 schedules, respectively, causing wrong results to be returned. Given that SKI requires few executions to trigger concurrency bugs, with a suitable test suite (e.g., regression test suites [GHK+01]), SKI's throughput is sufficient to reproduce on the order of hundreds of such concurrency bugs per hour (Table 6.1). These experiments confirm that SKI is effective at reproducing real-world concurrency bugs. Most importantly, it should be noted that the reproduced bugs stem from two different OS code bases (FreeBSD and Linux) and from a wide range of Linux kernel versions spanning several years of intense development. In fact, even if we ignore the cumulative number of lines changed (i.e., the churn rate) and take into consideration only the increase in the total number of lines of source code, the Linux kernel grew by an impressive 60% from version 2.6.28 (10M SLOC) to version 3.6.1 (16M SLOC). SKI handled the different versions of the Linux kernel and the FreeBSD kernel without requiring any changes to the VMM itself or its configuration, which provides evidence for the considerable versatility intrinsic to SKI's design.

Comparison with stress testing

In the discussions that led to the resolution of these four bugs, the kernel developers proposed non-systematic methods to reproduce them. In particular, they provided simple stress tests, which continuously execute the same operations in a tight loop, waiting until a buggy interleaving occurs. We executed the original stress tests proposed by the developers to compare SKI to a traditional approach. For this purpose, we ran the stress tests in an unmodified VMM, i.e., without making use of SKI. Note that without a deep knowledge of the kernel code, in the general case, it is hard to generate stress tests for the bugs that SKI discovered in Section 6.6.3. The reason for this is that it is not straightforward to ensure that, for every one of the various iterations of the stress test, the state of the kernel is such that it can trigger the concurrency bugs. (SKI avoids this problem because it automatically restores the initial state through snapshotting.) Thus, to ensure a more objective comparison between the two approaches, we chose to use stress tests produced by the kernel developers themselves, since these are the ones offering better effectiveness guarantees.


Bug    Throughput
A      302.0
B      169.3
C      218.7
D      501.4

Table 6.1: SKI's throughput for each machine. Throughput is presented in thousand executions per hour.

As expected, and consistent with earlier comparisons of systematic and unsystematic user-mode concurrency testing approaches [MQB+08, BKMN10], SKI proved to be much more effective in reproducing concurrency bugs than the non-systematic approaches. Despite the fact that we gave each stress test up to 24 hours to complete, bug A and bug D were not triggered at all by their corresponding stress tests. While the stress tests for bugs B and C did eventually trigger their corresponding bugs, they required significantly more executions (and time) than SKI: the stress tests required more than 200,000 iterations (4 hours) to reproduce bug B and more than 800 iterations (1 minute) to trigger bug C, compared to 53 and 51 iterations (both a few seconds), respectively, under SKI. Overall, the relative difficulty of reproducing bugs with simple stress tests is not surprising given prior comparisons of systematic approaches and stress testing in the context of user-mode applications [BKMN10]. Furthermore, this difficulty was also reported by the kernel developers themselves. For example, in the case of bug A (which the stress test failed to reproduce in our experiments) the developer stated that the "failure window is quite small" [BA2] and recommended introducing a carefully placed sleep statement in the kernel to trigger the bug.

Liveness heuristics

We instrumented SKI to log the activation of SKI's heuristics. Using this data, we calculated the percentage of schedules that triggered each of the heuristics (Table 6.2) and the average number of times each heuristic was triggered per schedule (Table 6.3). The results show that some of the schedules do not trigger heuristics. This is expected to happen when SKI chooses schedules in which threads do not experience lock contention and is more likely to occur in operating systems that are well optimized for scalability.


Bug    H1        H2        H3       H4        H*
A      1.72%     0.61%     5.71%    0.57%     7.97%
B      88.80%    49.70%    0.05%    13.73%    88.93%
C      1.50%     23.56%    0.00%    0.00%     25.06%
D      0.53%     2.66%     0.00%    0.00%     3.05%

Table 6.2: Percentage of schedules that triggered the liveness heuristics. H* refers to the percentage of schedules that trigger any heuristic.

Bug    H1       H2      H3       H4      H*
A      0.08     0.01    0.06     0.01    0.17
B      14.97    1.59    36.38    0.14    53.08
C      0.01     0.44    0.00     0.00    0.45
D      0.01     0.03    0.00     0.00    0.03

Table 6.3: Average number of times the liveness heuristics were triggered per schedule. H* refers to the average number of times any heuristic was triggered.

Even though not all of the tests activate all heuristics, all heuristics were activated in at least one of the test cases. In addition, we observed that in these tests the heuristics were triggered at 167 distinct instruction addresses. The large number of distinct addresses is indicative of the challenges that would result from manually annotating the kernel to infer thread liveness, as opposed to relying on the four simple heuristics implemented by SKI.

Effectiveness of communication points

To evaluate the effectiveness of the optimization of keeping track of communication points and allowing reschedules to occur only at these points (described in Section 6.4), we calculated for each test case the average number of instructions and the average number of communication points executed per run. As shown in Table 6.4, this optimization reduced the number of potential reschedule points by up to an order of magnitude in our experiments, thereby avoiding the wasteful exploration of redundant, effectively equivalent schedules. These results show the importance of this optimization to the scalability of SKI.

6.6.3 Exposing unknown concurrency bugs

To demonstrate the effectiveness of SKI in finding real-world concurrency bugs, we tested several file systems from recent versions of the Linux kernel.


Bug    I           CP         I/CP
A      87673.5     12511.2    7.00
B      210693.0    23432.8    8.99
C      65126.9     6372.3     10.22
D      22641.3     6503.2     3.48

Table 6.4: Effectiveness of the communication points optimization described in Sec- tion 6.4. The table shows for each reproduced bug the average number of instructions executed per run (I) and the average communication points executed per run (CP). The last column characterizes the optimization’s effectiveness as the ratio of the two metrics.

To create the inputs that form the various tests, we modified fsstress [Lar02], adding calls to SKI's hypercalls to flag the beginning and the end of the tests, and we modified the test suite to issue concurrent system calls. For convenience, we also converted some of the debugging messages to use SKI's own debugging hypercalls. Because one of the file systems (btrfs) supports several operations that were not supported by the original fsstress, we also added support for twelve of those file system operations (e.g., snapshot/sub-volume operations and dynamic addition/removal of devices). In total, we added or modified 900 lines of code in fsstress, of which 700 lines are related to the btrfs operations.

Bug detectors

We ran SKI with three bug detectors. The first detector simply monitors the console output to detect crashes, assertion violations and kernel warning messages. We implemented this bug detector by configuring the kernel to redirect the console output to a serial device and by storing and analyzing its contents on the host. Despite its simplicity, in addition to flagging the executions that potentially triggered concurrency bugs, this bug detector can provide very useful kernel-specific diagnosis information, such as stack traces, warning messages and error messages. The second detector uses file system checkers (fsck), which are specific to each file system and are only supported/mature in the case of some file systems, to detect file system corruption. This type of bug detector runs inside the VM, in contrast with the others, which are implemented within the VMM. To limit the performance impact of running fsck after each execution, we created small file systems (300 MB) and we mounted the file system in memory using loop + tmpfs (in addition to leveraging the optimizations described in Section 6.5.4).
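The console-based detector essentially reduces to scanning the captured serial log for the kernel's standard failure markers, roughly as sketched below. The log path is an assumption; the marker strings are the usual Linux console prefixes.

    /* Sketch of the console-based detector: scan the serial log captured
     * on the host for standard Linux crash/warning markers. */
    #include <stdio.h>
    #include <string.h>

    int console_flags_bug(const char *log_path)
    {
        static const char *markers[] =
            { "Oops:", "BUG:", "WARNING:", "Kernel panic" };
        char line[1024];
        FILE *f = fopen(log_path, "r");

        if (f == NULL)
            return 0;
        while (fgets(line, sizeof(line), f) != NULL)
            for (unsigned i = 0; i < sizeof(markers) / sizeof(markers[0]); i++)
                if (strstr(line, markers[i])) {
                    fclose(f);
                    return 1;        /* execution potentially buggy */
                }
        fclose(f);
        return 0;
    }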


Bug   Kernel     FS      Function                  Detector / Failure              E     FS      Status
1     3.11.1     Btrfs   btrfs_find_all_root()     Crash: Null-pointer             41    0.030   Fixed
2     3.11.1     Btrfs   run_clustered_refs()      Crash: Null-pointer + Warning   26    0.020   Fixed
3     3.11.1     Btrfs   record_one_backref()      Warning                         74    0.030   Fixed
4     3.11.1     Btrfs   NA                        Fsck: Refs. not found           11    0.200   Rep.
5     3.12.2+p   Btrfs   btrfs_find_all_root()     Crash: Null pointer             61    0.060   Fixed
6     3.12.2     Btrfs   inode_tree_add()          Warning                         53    0.010   Fixed
7     3.13.5     Logfs   indirect_write_alias()    Crash: Null pointer             31    0.065   Rep.
8     3.13.5     Logfs   btree_write_alias()       Crash: Invalid paging           142   0.020   Rep.
9     3.13.5     Jfs     lbmIODone()               Crash: Assertion                74    0.005   Rep.
10    3.13.5     Ext4    ext4_do_update_inode()    Data race                       32    0.005   Fixed
11    3.13.5     VFS     generic_fillattr()        Data race                       125   0.005   Rep.

Table 6.5: Bugs that have been discovered by SKI in recent versions of the Linux kernel and that we have reported to developers. For the specific input that triggered each bug, we show the number of schedules that were required to expose the bug (E) and the fraction of schedules that triggered the bug (FS). Eventually we found out that bug #3 had previously been reported. A patched version of the kernel, expected to solve bug #1, was tested on request from the developers but SKI revealed that the kernel could still crash in a different location of the same function (bug #5).

The third bug detector consists of a data race detector that we implemented, which analyzes all memory accesses, without sampling. Similarly to other data race detectors [EMBO10], our detector finds racing memory accesses without distinguishing whether those accesses are performed by synchronization functions. The main challenge in this case is filtering out the false positives [EMBO10, SBN+97, MMN09, YRC05]. False positives can be of two classes: false data races (memory accesses that do not constitute data races) and benign data races (memory accesses that constitute data races but that are not considered harmful by the developers).
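At its core, the detector's notion of a racing pair can be expressed as the predicate sketched below: same address, different CPUs, at least one write. Note that, consistent with the description above, no attempt is made to recognize synchronization accesses, which is precisely why the resulting reports must then be filtered for false positives; the types are illustrative.

    /* Illustrative racing-pair predicate; the struct fields are
     * hypothetical, not the detector's actual representation. */
    #include <stdbool.h>
    #include <stdint.h>

    struct access {
        uint64_t addr;      /* memory address touched */
        int      cpu;       /* CPU performing the access */
        bool     is_write;  /* write (true) or read (false) */
    };

    bool races(struct access a, struct access b)
    {
        return a.addr == b.addr &&
               a.cpu != b.cpu &&
               (a.is_write || b.is_write);
    }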

Report type            Reports
False data race        76
Data race (benign)     90
Data race (harmful)    24

Table 6.6: Types of race reports found during our experiments. The numbers displayed refer to the number of reports after associating related races. Note that a single bug may be involved in multiple data races (e.g., if it affects multiple variables).


                   Btrfs   Ext4   Jfs    Logfs
SKI                34.7    62.6   61.6   61.2
SKI + DR           32.1    61.9   59.5   58.8
SKI + Fsck         6.4     20.8   18.2   N/A
SKI + Fsck + DR    6.1     20.6   17.9   N/A

Table 6.7: SKI’s throughput (for each machine) with different bug detectors. Through- put is given in thousands of executions per hour. DR denotes the data race detector. Fsck tests on logfs are absent due to the lack of compatible mature checkers.

Bug      Kernel         OS Component              Failure    E      FS
A [BA]   Linux 2.6.28   Anonymous pipes           Crash      28     0.00572
B [BB]   Linux 3.2      Inotify + FAT32           Crash      53     0.13770
C [BC]   Linux 3.6.1    Proc file system + Ext4   Semantic   51     0.01004
D [BD]   FreeBSD 8.0    Sockets                   Semantic   3519   0.00014

Table 6.8: Known bugs reproduced with SKI. For each bug, the table shows the kernel version under which we reproduced it, the OS components involved, the type of failure that the bug causes, the number of schedules that were required to expose the bug (E) and the fraction of schedules that triggered the bug (FS).

In order to facilitate the manual process of analyzing the false positives produced by the data race detector, our tool groups together distinct pairs of racing instructions that were found to race directly or transitively. Using this method, we were able to group 3114 pairs of races into 190 race reports. Filtering out race reports that were not data races was straightforward – it consisted of ruling out the races that occur within the synchronization mechanisms, which we accomplished using the kernel debugging symbols that map instructions to source code locations. In contrast, the difficult part was separating real data races into benign and harmful ones. In some cases, this process requires careful analysis of the code and documentation and, ultimately, it may require asking the developers – who may not even agree among themselves. Heuristics could have been used to analyze the results, but unfortunately these typically offer limited help for the more complicated cases. Given this complexity, we gathered some reports (not included in Table 6.5) that may constitute bugs but are still under analysis, and for which, in some cases, we are still waiting for feedback from the developers. Table 6.6 shows the number of race reports that we obtained in the file system tests, according to their type.
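Grouping races that are related directly or transitively is a textbook use of a union-find (disjoint-set) structure, as the sketch below illustrates; the instruction identifiers and sizes are hypothetical.

    /* Sketch: group racing instruction pairs into reports with union-find.
     * Pairs that share an instruction, directly or transitively, end up
     * in the same group. */
    #define MAX_INSN 4096

    static int parent[MAX_INSN];

    static int find(int x)          /* root lookup with path halving */
    {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    static void unite(int a, int b) { parent[find(a)] = find(b); }

    void group_races(const int (*pairs)[2], int n_pairs)
    {
        for (int i = 0; i < MAX_INSN; i++)
            parent[i] = i;
        for (int i = 0; i < n_pairs; i++)
            unite(pairs[i][0], pairs[i][1]);
        /* instructions with the same find() result now form one report */
    }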


Results

The results in Table 6.5 show that SKI was able to find several unknown concurrency bugs in mature versions of the Linux kernel. One of the bugs found affects the widely used ext4 file system and six bugs affect the btrfs file system – which is expected to soon become the default file system in some distributions [OPE]. We have reported the 11 bugs listed in Table 6.5; of those, 6 have already been fixed. Furthermore, although FS-related system calls tend to be expensive, SKI was able to achieve a testing throughput that reached 62 thousand executions per hour on each machine (Table 6.7). Even though the current performance of SKI proved to be effective, significant performance improvements may still be achievable by using more efficient virtual machine monitors, possibly using hardware acceleration, or even by building SKI using binary instrumentation frameworks. It is worth pointing out that many of the bugs found by SKI are serious – six of the bugs cause the kernel to crash and most of the bugs found cause persistent data loss. For example, the ext4 bug, which is due to improper synchronization while updating the inodes, causes the field i_disksize (containing information about the size of the inodes) to become corrupted. To fix this bug, developers applied patches that involved refactoring the code and the introduction of additional synchronization.

6.7 Limitations and discussion

6.7.1 Reliance on thread pinning

SKI establishes a one-to-one mapping between threads and virtual CPUs by pinning threads. However, this technique has two limitations. First, the process of pinning threads is restricted to certain types of threads. Second, pinning threads imposes restrictions on the kernel execution paths that can be tested. This section discusses both of these restrictions and their implications for testing kernels using SKI's approach. Applications cause kernel code to be executed by invoking system calls in user space or, similarly, by triggering exceptions (e.g., divisions by zero, page faults). Given that system calls are an important entry point into kernel code and often the mechanism used to drive the kernel during testing [Lar02], the work presented in the previous sections focused on demonstrating the effectiveness of SKI when testing kernels through the invocation of system calls. Nevertheless, not all kernel code is directly executed through the invocation of system calls by applications.


In addition to applications, kernel threads [KER] are also responsible for executing kernel code. Kernel threads are a thread abstraction, implemented at the kernel level, that does not have an associated user-mode component5. This mechanism is used by kernel programmers to asynchronously execute tasks that perform computations on the internal state of the kernel (without the direct involvement of applications). For example, in Linux the pdflush kernel thread is responsible for managing the page cache and the ksoftirqd kernel thread is responsible for part of the interrupt handling under high-load situations. Despite not being associated with user-mode processes, kernel threads can still, in many cases, be pinned to different CPUs. Pinning kernel threads simply requires that an application, constituting a component of the testing infrastructure that runs within the virtual machine, invokes the thread affinity system call and identifies the target kernel thread using its process identifier (PID). However, there are cases of kernel threads that cannot be pinned to arbitrary CPUs for OS-specific reasons. For example, in Linux a ksoftirqd kernel thread is spawned for each of the CPUs and is pinned within the kernel itself to its respective CPU. This has the advantage, from the kernel programmer's point of view, that it is possible to rely exclusively on CPU-local variables to maintain the state of these threads, and, therefore, the synchronization of the accesses to their state is simpler. For the subset of kernel threads that cannot be pinned to arbitrary CPUs, SKI is not able to systematically control their schedule and, therefore, has to leave the schedule control to the native kernel scheduler. Another consequence of pinning the tested threads to CPUs is that it restricts the kernel functionalities that can be tested with SKI. As an example, SKI is not able to test the scheduler code responsible for migrating threads between CPUs because migration is necessarily prevented when threads are pinned to a single CPU. In essence, the requirement to pin the tested threads constitutes a limitation on the input space that can be tested with SKI, namely by preventing tests from considering an arbitrary affinity set for the tested threads, which consequently restricts the testable execution paths. However, we expect this limitation to have a minor impact on the overall applicability of the proposed testing approach, which would mostly affect the scheduler logic, because the kernel scheduler constitutes a relatively small proportion of the entire kernel code. Therefore, the proportion of bugs for which this approach is not applicable is expected to be similarly small.

5. The term kernel thread is used with a different meaning in other work [McC02] to distinguish a thread abstraction implemented at the user-space level (user thread) from a thread abstraction implemented at the kernel-space level.
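Pinning a kernel thread from the in-VM testing component boils down to a sched_setaffinity() call on the kernel thread's PID, as sketched below; the helper name is hypothetical, and, as discussed above, the call fails for kernel threads that the kernel itself keeps bound to a specific CPU.

    /* Sketch: pin a (kernel) thread identified by PID to a given CPU.
     * sched_setaffinity() accepts kernel-thread PIDs as well; it fails
     * for threads the kernel keeps CPU-bound (e.g., ksoftirqd). */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <sys/types.h>

    int pin_thread(pid_t pid, int cpu)
    {
        cpu_set_t set;

        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        if (sched_setaffinity(pid, sizeof(set), &set) != 0) {
            perror("sched_setaffinity");
            return -1;
        }
        return 0;
    }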


6.7.2 Reliance on VMM

SKI proposes a VMM-based scheduler. In this section, we discuss some of the implications of this choice. A limitation of relying on a VMM is that the kernel running inside a virtual machine is limited to using the hardware virtualized by the VMM. For a testing tool, this means that it is not possible to reproduce bugs that require hardware that is not virtualized by the VMM. In the case of our implementation, this problem is partially mitigated by the fact that, in comparison with other VMMs, QEMU supports an unusually large number of devices (over 250). Nonetheless, SKI cannot be used to diagnose concurrency bugs in drivers for devices not virtualized by QEMU. However, we believe this does not detract significantly from SKI's practical value because the size of the device-independent kernel core is already considerable. Further, it may be possible to overcome the VMM dependency by building an equivalent tool based on kernel binary instrumentation, which is an active area of research [FBG12]. The choice of a VMM-based approach has another important consequence. Because the VMM emulates one instruction at a time, and propagates its effects to all other CPUs immediately afterwards, concurrency bugs that arise from wrongly assuming a strong memory model are not necessarily exposed. This is because some CPUs offer weaker memory models, which can have very complex semantics, to the point where official specifications have been found to not match the observed semantics of hardware [AFI+08]. This is a complex problem — significant effort has been directed at simply studying the semantics of CPUs with relaxed memory models [SSN+09] — and we believe that effectively diagnosing this type of concurrency bug will likely require more specialized tools. Such bugs are currently not the target of SKI.

6.8 Summary

This chapter introduced SKI, the first practical testing tool to systematically explore the interleaving space of real-world kernel code. SKI does not require any modifications to tested kernels, nor does it require knowledge of the semantics of any kernel synchronization primitives. We detailed key optimizations that make SKI scale to real-world code, and we have shown that SKI is effective at finding buggy schedules in both FreeBSD and various versions of the Linux kernel, without changing or annotating the tested kernel.


7 Testing software in the real world

This chapter takes a step back to analyze several important aspects of testing tools, taking into consideration our experience designing, developing and evaluating PIKE and SKI. Section 7.1 enumerates and discusses the different properties of testing tools and Section 7.2 discusses the different classes of potential users of testing tools. Section 7.3 reports our experience interacting with software developers. Section 7.4 analyzes the problem of scalability in the context of testing concurrency bugs and, finally, Section 7.5 provides a reflection on the limits and future of software testing.

7.1 Properties of testing tools

There are several properties that, combined, determine the value of testing tools. We have identified four important classes of properties (Table 7.1) that should be taken into consideration both by developers, when deciding which tools to adopt, and by researchers, when designing tools. Unfortunately, since testing tools have many different dimensions with regard to their properties, there does not seem to be a single best testing tool; in fact, developers can (and do) benefit from simultaneously using several testing tools that complement each other.

7.1.1 Scope

Bugs tested. The scope of bugs that are testable by a given testing tool is a property generally inherent to its design. The scope of bugs testable refers to several aspects, such as their external effects (crash, semantic, hang) and their internal effects (latent, non-latent). In addition, testing tools can have restrictions with regard to the location of the bugs within the code; for example, certain portions of the kernel scheduler code cannot be tested with SKI (Section 6.7). Testing tools may additionally have more subtle restrictions, namely regarding the type of concurrency considered; for instance, SKI considers instructions to be executed atomically and, therefore, only exposes concurrency bugs that do not depend on weak memory models (Section 6.7).


Properties
1. Scope                    Bugs; Software
2. Manual effort            Setup; Analysis
3. Performance
4. Hardware requirements

Table 7.1: Summary of the properties of testing tools.

Software tested. Applicability, in terms of the type of software that can be tested, is another important aspect of testing tools. For instance, certain testing tools are geared towards testing kernels (e.g., SKI), others are geared towards testing user-mode applications, yet others are applicable to both types of software. Testing tools can also have limitations in terms of the programming languages supported, especially if static analysis techniques or source-code level instrumentation is leveraged.

7.1.2 Manual effort

Setup. Setup costs can arise from different sources. Certain testing tools, such as PIKE, require developers to modify or to annotate the tested software. Such a requirement constitutes a cost per tested software and, potentially, a cost per software update, to maintain the modifications or annotations. In practice, the cost of modifying the tested software depends strongly on the required modifications and can be significantly mitigated by building reusable libraries that reduce the complexity of implementing such modifications. Dynamic testing tools, which depend on the execution of the tested software, have another type of setup cost – test cases need to be provided to steer the execution of the software during testing. In general, it is considered a good practice to write test cases [Viz07], regardless of which testing tool is used, so the dependence on test cases does not necessarily translate into an extra burden for the developers because it may be possible to simply reuse (or adapt) pre-existing test cases. While conducting the work presented in this dissertation, we had the opportunity to interact with developers, who stressed the importance of the usability of the tool. One aspect to which developers attributed significant importance was the ease of creation of test cases and the quality of the documentation. In fact, developers explicitly mentioned that testing tools should provide template test cases and a clear methodology to diagnose problems, in case anomalies occur during testing due to user error (e.g., constructing incomplete test cases or providing incompatible testing parameters).


Analysis. The quality of the results determines the effort that users have to invest to extract useful information. Testing tools that provide reports with detailed information allow developers to easily diagnose the causes of bugs and to implement a fix. For example, the test cases that we have built with SKI, in addition to exploring the interleaving space systematically, have the advantage of reporting exactly which system calls were invoked concurrently and with which parameters. Furthermore, SKI is able to provide developers with the exact memory state of the machine before the system calls were invoked and detailed execution traces. Stress tests, on the other hand, generally provide significantly less information – in fact, given the inherent lack of control over the interleavings executed, enabling precise logging can interfere with the interleavings executed by stress tests and, therefore, potentially reduce their effectiveness. Another factor that determines the quality of the results produced by testing tools is the frequency of false positives – certain types of testing tools are prone to producing spurious bug reports in cases where the software is correct. False positives can become a serious problem if they overwhelm the user or if they hurt the confidence developers have in the testing tool. Certain types of tools are likely to produce false positives by design – data race detectors are one example – but even in these cases, testing tools may include mechanisms that mitigate the problem by leveraging heuristics to rule out the cases that are likely to be false positives. In addition to the amount of information available in the bug reports and its accuracy, the form of the results is also important, albeit in a more subtle manner – developers have a preference for reports that use a familiar format. This preference becomes more relevant in collaborative development environments, in which developers have to interact with other developers to diagnose problems. In such common environments, having a familiar report format avoids the need to educate many developers with regard to the format of the results (which could also require explaining the inner workings of the testing tool). This fact motivated our decision to implement the crash detector in SKI by monitoring the console output and by capturing the bug reports produced by the standard kernel methods (as opposed to detecting the execution of certain instructions or the occurrence of architecture-level exceptions). Another important aspect, to ease the process of analyzing results and fixing bugs, is the reproducibility of results. Reproducibility allows developers to run tests repeatedly to rule out transient external causes (e.g., a faulty machine) and allows different developers to witness the same results (in their own testing environment).

But even more useful is the ability to achieve a certain level of stability of the results, i.e., to reproduce tests even if the software has been (slightly) modified. Tools that have this property allow developers to modify the tested applications, perhaps in subtle ways, to explore different hypotheses regarding the cause of the problems under analysis or to acquire additional debugging information, using instrumentation.

7.1.3 Performance

The time it takes for the test results to be returned is important to developers – faster tools encourage frequent testing and allow developers to find bugs earlier, which is well known to reduce the fixing cost [BP88]. The speed of testing is dependent on the testing tool used, as well as on the computational resources available (e.g., CPU speed and number of cores). However, testing tools that are parallelizable (such as PIKE and SKI) are more flexible with regard to the tradeoff between speed and hardware resources available – parallelizable tools are able to produce results faster if more cores or machines (or faster machines) are provided, whereas tools that are not parallelizable are bounded by the CPU speed of the machine used.

7.1.4 Computational resources

The computational resources required by testing tools vary widely. For example, some static tools require few computational resources and do not need more than a single machine to produce results within a few seconds. On the other end of the spectrum, dynamic testing tools generally have higher computational costs and potentially require large testing infrastructures supported by several testing machines to produce results within hours or days. Obviously, the availability of computational resources has an impact on the performance of the testing tools (Section 7.1.3). Approaches that are on the higher end of the spectrum, in terms of computational resource requirements, are meant to be used differently than less demanding approaches. Less demanding approaches can be integrated into the compilation phase, during development, testing each individual developer's private working copy of the software. In contrast, more demanding approaches might be too expensive to be used in this situation and are instead meant to run tests in the background, by testing a snapshot of the latest version of the software (within the company) or even the deployed version.

In addition to the amount of computational resources required, testing tools may require dedicated hardware. For example, dynamic tools may require testing machines with specific CPU features or architectures (e.g., DataCollider leverages hardware watchpoints), while dynamic tools based on virtual machines, like SKI, tend to be more flexible with regard to the physical CPU. Researchers have also developed tools that leverage and require, for functional or performance reasons, GPUs [RSSK14, DZW+14] and FPGAs [TER].

User class                    FS       FT     EC
Developers of new features    High     Low    High
Internal dedicated testers    Medium   High   Medium
External dedicated testers    Low      High   Low
Software end-users            Low      Low    Low
Researchers                   Low      High   Low

Table 7.2: Summary of the profiles of different classes of users. The table presents for each user class the expected familiarity with the tested software (FS), familiarity with the testing tool (FT) and ease of communication with the developers (EC).


7.2 Users of testing tools

For presentation purposes, the previous sections refer to the users of testing tools using the generic term developer. In practice, however, the individuals responsible for developing software and incorporating new software features are not the only possible users of testing tools. Table 7.2 provides a summary of the profiles of different user classes and the rest of this section analyzes the importance of different features for each user class.

7.2.1 Developers of new features

Software developers responsible for implementing new software features are an obvious class of users of testing tools. This class of users is part of the development team and, therefore, has in-depth knowledge of the tested software. For this reason, tasks that involve modifying the tested software or gathering information about the semantics of the tested software are easier for this class of users than for other classes of users (as discussed later in this section). As a consequence, this class of users is in a good position to implement the state abstraction functions that PIKE requires.

91 7 Testing software in the real-world

However, this class of users is not dedicated to testing and often has other pressing tasks to perform, namely implementing new software features. Therefore, the interface of the testing tool should be particularly easy to learn and use, as this class of users cannot easily amortize the learning costs.

7.2.2 Internal dedicated testers

Some software development organizations have teams responsible for testing software that can leverage different types of testing tools. These teams are not meant to replace the basic testing conducted by software developers; instead, they are meant to complement it. This class of users differs from the developer class in that it is less familiar with the code being tested; in particular, when problems are suspected, these users may need to communicate with the developers to confirm the problems. On the positive side, this class of users has the advantage that it is dedicated to testing and so can afford to learn the inner workings of different testing tools and might be able to spend additional effort fine-tuning them to achieve higher effectiveness.

7.2.3 External dedicated testers

Organizations specialized in testing (e.g., Coverity [COV]) that offer their testing services to software development organizations are another potential class of users. Likewise, testers in the context of crowd-testing (crowd-sourced testing [uTe, LLKB12]) – essentially, individuals (presumably proficient at testing) with a diversified background that offer their testing services to other organizations – are also part of this potential class of users. In practice, in both cases software developers outsource part of their testing responsibility to external individuals or organizations. Because this class of users is dedicated to testing, its members are expected to be familiar with different testing methodologies and, similarly to the internal dedicated testers, are expected to be more willing to dedicate substantial effort to adopting novel testing approaches. However, in contrast with users from the software development organization, external testers are expected to have less knowledge about the internals of the tested software, and the communication barrier with the software developers is higher.

7.2.4 Software end-users

Software end-users can also be users of testing tools. This scenario is more likely to occur in the context of open-source projects (since testing becomes harder without access to source code) and when the end-users are tech-savvy individuals or organizations with significant interest in the software.

End-users are generally neither proficient at testing nor familiar with the internals of the software. However, especially in the case of software systems that are popular, end-users can provide important feedback to developers, even if just by performing simple forms of diagnosis, given the sheer number of individuals and the quantity of information they can collect.

7.2.5 Researchers of testing tools

The researchers (and developers) of testing tools are also users of testing tools. Usually, their immediate motivation for using testing tools is to validate their design and, indirectly and in the long term, to improve the robustness of software. Given their role, it is expected that this class of users is familiar with the internals of the testing tools but less familiar with the software tested. Given their lack of familiarity with the tested software, this class of users can provide an estimate of the upper bound on the effort required to make application-specific changes for testing purposes, such as those we made to MySQL when validating PIKE.

7.2.6 Discussion

The classes of users that we discussed in this section can work collaboratively toward the common goal of testing a given software system. In fact, a compelling usage scenario, for both PIKE and SKI, is to have a shared testing infrastructure, maintained by a few dedicated testers (either internal or external), with test cases submitted by all classes of users. This model permits all classes of users to contribute to testing the software and is a model followed, for instance, by the Linux Test Project [Lar02]. The potential disadvantage of this model is that it requires a certain coordination effort and that, in some cases, users might not be comfortable sharing test cases, due to internal organizational policies or business secrecy concerns.

7.3 Experience interacting with potential users

This section reports our experience interacting with potential users in the process of reproducing known bugs (Section 7.3.1) and reporting bugs found by our testing tools (Section 7.3.2). In addition, Section 7.3.3 discusses the feedback we received from potential users regarding the design of SKI. We believe that most of the individuals we interacted with were software developers responsible for building new features into the software, who also had testing responsibilities.


7.3.1 Reproducing bugs

One of the approaches we used to evaluate SKI consisted of reproducing already known bugs in old versions of the kernels (Section 6.6.2). This requires creating SKI test cases that provide a particular input to the tested kernel, by invoking certain system calls with certain parameters. However, obtaining all the necessary information by simply analyzing bug reports, which often do not state the required input explicitly, proved to be a difficult task. For instance, many crash bug reports included a stack trace that was sufficient to infer one of the system calls, but did not specify which other system calls need to be executed concurrently. Likewise, the exact system call parameters, the initial state of the kernel, the kernel version, the build configuration and the run-time configuration were in many cases not explicitly stated in the bug reports. In fact, reproducing bugs using information from bug reports, regardless of whether they are concurrency bugs or not, is a challenging problem (orthogonal to the problems targeted by our proposed tools), which has received the attention of other researchers [VFS14, BPSZ10, GZNM11]. Our initial strategy to find the missing information was to ask the developers involved in the bug diagnosis process to provide us with additional information. Unfortunately, this strategy did not work well – we were able to engage in conversations with the developers but, due to several factors, we were not able to acquire significant information this way. In several cases the bugs were too old (i.e., months or years old) for developers to easily recollect the details. We believe that the fact that many of the concurrency bugs were considerably complex also contributed to this problem, by making it harder for developers to recall the intricate details involved and also by requiring more time from the (already busy) developers to reanalyze the bugs. Curiously, in one instance, a developer informed us that he faced a constraint of a different nature – the developer was not allowed to provide additional information to reproduce a concurrency bug, due to internal organizational policies, because it was considered a security bug and could be triggered by any OS user without special privileges. Given these challenges, we finally opted for the strategy of limiting our evaluation to bugs that had very well documented bug reports and that included stress tests. This strategy proved to be successful – we were able to reproduce several concurrency bugs with the stress tests and, subsequently, we were able to confirm that SKI is able to reproduce them effectively (Section 6.6.2).

Taking into consideration the challenges involved in analyzing bug reports and the limited value of reproducing already known (and fixed) bugs, we focused most of the evaluation effort on finding unknown bugs.

7.3.2 Reporting bugs

We reported the concurrency bugs found by SKI to the developers of the Linux kernel. All bugs found (Table 6.5) were confirmed by the developers to be concurrency bugs and more than half of them (6 out of 11 bugs) have since been fixed. We observed that, in general, the bugs that were fixed pertain to components of the kernel that are currently under active development and that have active development teams, such as the BTRFS team. Conversely, bugs that were not fixed, in general, involved components that have recently seen less active development. Analyzing bug reports and fixing bugs is understandably a complex task that demands significant effort and time from developers; for this reason, it can be difficult to gather the attention of developers and ensure that bug reports receive a positive response that leads to a fix. Our interaction with the developers is consistent with a correlation between the level of activity of the development teams and the likelihood of getting confirmation from developers and their involvement in producing a patch. However, in this process we realized that there are other important factors that affect the chances of engaging with the developers and getting a positive response from them. These factors include the form of communication used (e.g., development mailing lists or direct contact with individual developers), the exact point of contact (e.g., which developer), the software version used, the external effects of the bug and the completeness of the bug report. Throughout our evaluation process, we relied on different forms of communication with the developers, depending on the circumstances. Development mailing lists were used on several occasions to report and diagnose the concurrency bugs found and, in the absence of active mailing lists or in the absence of a response, we also contacted individual developers directly. When contacting developers directly, we favored developers that were officially assigned as maintainers of the component affected and the developers that last modified the kernel source code that was relevant for the respective bug (e.g., the crash or data race locations). We also observed that bug reports pertaining to older versions of the software tend to receive less attention from the developers and, in such cases, the bug reporters may be asked to re-run the tests on the most recent versions. A few minor versions behind the latest released version may be sufficient to cast doubt on the report. For instance, MySQL reporting guidelines instruct bug reporters to verify that the reported bugs affect a product version that is no "more than two revisions older than the latest version" [Cor].

This requirement is justified by the fear that such bug reports might have become obsolete simply because the affected software functionality might have been modified or because the bug might have been fixed in the meantime. Bug reports that have serious effects tend to easily capture the attention of developers. For example, crashes or situations that lead to data loss are generally more important to users than performance degradation bugs and, therefore, bug reports describing such situations are naturally prioritized by software developers. A related aspect that developers expect in bug reports is clear evidence that the reported situation can cause harm to users. For concurrency bugs, bug reports involving data races are a typical case that can raise doubts about whether the reported situation actually constitutes a bug that causes harm to users. Our experience analyzing and discussing situations involving data races with kernel developers showed that developers expect a clear reason to fix the code in order to avoid a data race. In fact, developers expect a concrete description of the process by which the suspected kernel bug can cause an application to break. Several reasons contribute to the conservative approach of developers when deciding whether a bug should be fixed, including the fact that the potential fixing strategy could increase the software complexity, slow down the software or potentially create more serious problems (if the fix is incorrectly implemented). Furthermore, organizational factors may also play an important role – in a collaborative development environment, such as that of the Linux kernel, a developer that proposes a bug fix needs to be sufficiently confident about its correctness and needs to be able to justify his or her actions to other developers.

7.3.3 User feedback

We explicitly requested feedback from the kernel developers regarding the design of SKI and the problem of testing kernels for concurrency bugs. The developers we contacted confirmed that, using their current practices, testing for concurrency bugs is a challenging task and that stress testing is the best method they rely upon for this purpose. In addition, they also expressed dissatisfaction with stress testing and explicitly mentioned that stress tests are not executed often because such tests can "take a day or two to run". When informed of the development of SKI, the developers expressed strong enthusiasm about it and a desire to use SKI. Regarding the design, the developers emphasized the importance of ease of use with respect to setup and execution, as well as the process of test case construction.


7.4 The scalability of testing concurrent software

Testing software systems for concurrency bugs, in the simplest case of testing two concurrent threads, can be interpreted as exploring a large three-dimensional space. According to this view, the testing space has two dimensions, which correspond to the input provided to each of the two tested threads, and a third dimension, which corresponds to the interleaving of instructions that is executed. As such, testing concurrent software essentially consists of executing the software under different circumstances, corresponding to different points in the space, and applying bug detectors that analyze each execution to uncover potential violations of the intended semantics. Since, under the usage scenario we envision, software is expected to be reasonably well tested, the vast majority of the points in this three-dimensional space is expected to represent executions that comply with the application specification; therefore, only a tiny fraction of the points in the testing space is expected to expose concurrency bugs.

In stark contrast, testing for non-concurrency bugs involves a dramatically smaller space, even though it is still challenging – testing for non-concurrency bugs can be interpreted as exploring a one-dimensional space that simply corresponds to the input request provided to the (single) thread being tested. In comparison with non-concurrency testing, the fact that concurrency bug testing has two extra dimensions, together with the fact that non-concurrency testing is already a very challenging task, means that the scalability of testing needs to be carefully addressed to ensure effectiveness.

Apart from improvements to the performance of the testing infrastructure, in this work we followed two high-level strategies to deal with the scalability concern. The first strategy consists of prioritizing, or limiting, the points explored by leveraging testing heuristics (Section 7.4.1), while the second strategy consists of analyzing several points from the testing space simultaneously (Section 7.4.2).
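The following minimal sketch makes this view of the testing space concrete. It is purely illustrative, and all names (thread_a_inputs, interleavings, run_test, bug_detectors) are hypothetical placeholders rather than part of PIKE or SKI:

    # Exploring the three-dimensional testing space described above.
    for input_a in thread_a_inputs:            # dimension 1: input to thread A
        for input_b in thread_b_inputs:        # dimension 2: input to thread B
            for schedule in interleavings(input_a, input_b):  # dimension 3
                execution = run_test(input_a, input_b, schedule)
                for detector in bug_detectors:
                    detector.check(execution)  # flag violations of the intended semantics

Even for modest input sets, the interleaving dimension makes this space explode combinatorially, which is what motivates the two strategies discussed next.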

7.4.1 Prioritizing the testing space exploration

The strategy of giving priority to a subset of the testing space is motivated by the fact that, in general, for large software systems there is no expectation of exploring the entire space. Consequently, it is particularly important to apply the available testing resources effectively by exploring first the points in the testing space that are more likely to yield positive results.

The tools that we propose in this dissertation follow this strategy by including several heuristics. Notably, with regard to the exploration of the interleaving dimension, both PIKE and SKI bound the number of reschedule points. Furthermore, both tools select the location of the reschedule points and the initial priorities of the execution contexts using uniformly random approaches. Another example of the implementation of this strategy arises in the context of the selection of the testing input – our experiments implicitly use heuristics by leveraging existing test suites that we converted into concurrent test suites. Test suites, such as those we leveraged, implicitly encode heuristics by including tests that were created based on the programmers' intuition, taking into consideration their knowledge of the software, or based on previous bug reports, taking into consideration previous bug patterns.
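As a rough illustration of the bounded-randomness heuristic, the sketch below picks a small, fixed number of reschedule points uniformly at random and assigns random initial priorities to the execution contexts. The helper names and the bound K are hypothetical and do not reflect the exact mechanisms of PIKE or SKI:

    import random

    K = 4  # bound on the number of reschedule points per execution

    def plan_schedule(num_instructions, contexts, rng=random):
        # Choose K reschedule points uniformly at random over the execution...
        points = sorted(rng.sample(range(num_instructions), K))
        # ...and assign random initial priorities to the execution contexts.
        priorities = {ctx: rng.random() for ctx in contexts}
        return points, priorities

    points, priorities = plan_schedule(10_000, ["cpu0", "cpu1"])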

7.4.2 Reasoning over multiple executions

The second strategy to improve the scalability of testing consists of exploring, or reasoning about, several points within the testing space simultaneously. In contrast with the previously discussed strategy, this strategy has the advantage of providing stronger guarantees, because it does not rule out the analysis of certain points of the testing space, but it can be harder to implement.

Our implementation of SKI analyzes the communication between different CPUs (Section 6.4) and, based on this information, SKI is able to conclude that certain classes of interleavings are equivalent and that, therefore, only one member of each class needs to be executed. This technique allows SKI to significantly reduce the testing space with regard to the interleaving dimension (Section 6.6.2).

A prime example of an implementation of this strategy is the general class of symbolic execution techniques, discussed in more detail in Section 2.2.2. Symbolic execution techniques allow the testing infrastructure to reason about several concrete executions, which correspond to the same execution path, in a single symbolic execution. The typical source of concern regarding this approach is the management of the path explosion problem, for which several practical methods have been proposed. However, the strategy of reasoning over multiple executions is broader than symbolic testing, since it is not limited to the simultaneous execution of points that share the same execution path. In fact, it is sufficient if the bug detectors are capable of reasoning about several executions by analyzing a single one.

Although with limited gains, a race detector, such as the one that was leveraged in Section 6.5.5, illustrates well how a bug detector can reason about multiple executions from a single execution. After a data race detector detects two racing instructions, in principle, it is no longer necessary to execute the point in the testing space that corresponds to the alternative data race outcome to be able to conclude that those instructions race. Nevertheless, the execution of the alternative data race outcome might still have value for bug diagnosis purposes (i.e., to allow the programmer to understand both outcomes and more easily conclude whether the effects of the data race are harmful or not).
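The following highly simplified sketch conveys only the intuition behind interleaving equivalence: it deduplicates schedules after the fact by comparing their inter-CPU communication orders, whereas SKI's actual mechanism prunes equivalent schedules during exploration. All helper names are hypothetical:

    seen_signatures = set()

    def signature(trace):
        # Keep only communication events (accesses to shared locations in
        # which at least one of the accesses is a write); schedules that
        # order these events identically are considered equivalent.
        return tuple(e for e in trace if e.is_communication)

    for schedule in candidate_schedules:
        trace = run(schedule)
        sig = signature(trace)
        if sig in seen_signatures:
            continue  # an equivalent interleaving was already analyzed
        seen_signatures.add(sig)
        analyze(trace)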

7.4.3 Discussion

Despite the scalability benefits of the strategies discussed in the previous sections, testing concurrent applications remains computationally challenging. For example, testing a file system using the methodology that we describe in Section 6.6.3 can take more than a day on a testing infrastructure backed by a ten-machine cluster, even with our most optimized implementation of SKI. In addition to the computational cost, the development and configuration of the testing tools themselves also constitute a cost, in terms of developer time and effort. However, on the positive side, automated testing tools have the important advantage of freeing developers from manual analysis and constitute an opportunity to find bugs earlier in the development cycle, which can greatly contribute to the reliability of software. As such, this section discusses the cost-benefit ratio of the automated testing techniques proposed in this dissertation.

In comparison with traditional non-automated testing approaches (e.g., manual analysis of application-specific logs or collection of information from bug reports), automated approaches, like those proposed in this dissertation, enable developers to create testing infrastructures that, once they are set up, require only very limited interaction from developers. This difference is important because it converts the labor cost associated with testing from a variable cost (manual analysis of each bug report or each tested software version) into a predominantly fixed cost (development and configuration of the testing infrastructure). Freeing developers not only translates into saving an extremely valuable resource, but also has the strategic advantage of encouraging systematic software testing. In addition, the diagnosis information provided by our tools (e.g., SKI produces detailed execution traces) is able to further reduce the labor costs of analyzing and fixing concurrency bugs. It is not unusual for concurrency bugs to take weeks to diagnose, even after being reported by experienced users and despite the intense efforts and the competence of the developers that try to fix them [LOCa], so tools that provide detailed diagnosis information are an important asset to developers. Regarding the effective computational cost of our tools, it is also worth mentioning that this cost is lower than it may appear because our tools can run the tests opportunistically, by using resources when they are not being used for other purposes (e.g., a cluster with low load).


The testing methodologies that we proposed in this dissertation also have two strategic advantages that derive from being embarrassingly parallel. First, being embarrassingly parallel allows tests to run faster in the future, regardless of whether hardware architects adopt a frequency scaling or a multicore scaling strategy. Second, this property enables users to run tests (virtually) arbitrarily fast, assuming that enough machines are available, thus increasing the opportunities to integrate the testing methodology into the development cycle of programmers (e.g., allowing tests on new functionality to finish in time for fixes to be applied before the software is released to clients).
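As a minimal sketch of why this parallelism is trivial to exploit, the snippet below fans independent test points out to worker processes; run_one_test is a hypothetical helper that restores the initial snapshot, replays one schedule derived from its seed and applies the bug detectors:

    from multiprocessing import Pool

    def run_one_test(seed):
        # Hypothetical: restore the snapshot, derive and replay the schedule
        # for this seed, then run the bug detectors on the execution.
        ...

    if __name__ == "__main__":
        # Each point is independent, so workers (or machines) scale linearly.
        with Pool() as pool:
            pool.map(run_one_test, range(100_000))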

7.5 Towards reliable software

Writing software, by developers using programming languages, is comparable to writing text, by writers using natural languages1. In both cases, humans have a task that requires them to encode information, although using different types of languages, and in both cases the authors want their artifacts to be defect free.

Because both types of tasks are very challenging, several automated approaches have been developed to aid the authors in both contexts. In the case of text writing, word processors often include spelling or grammar checkers and other tools to check the formatting of documents (an example of the latter are the tools used by conference manuscript submission websites to check compliance with formatting guidelines). Similarly, many tools have been developed to help software developers. Examples range from the errors and warnings produced by compilers to more complex tools, like those proposed in this dissertation.

The paramount complexity of ensuring the correctness of artifacts becomes clear when considering the extremely high degree of freedom that authors have and that any producible artifact is potentially the one (and only one) that the author intended. Consequently, an ideal testing tool – meaning a tool capable of automatically detecting all bugs – needs to be able to distinguish any two, even if seemingly valid, implementations and automatically conclude which one of them is intended by the developer. For example, in the extreme case, a developer could end up writing an artifact, such as a kernel, when he or she instead intended to write a completely different artifact, such as a database manager. In this scenario, an ideal testing tool would have to detect the (gross) mistake of the developer and, unfortunately, achieving this formidable goal amounts to building a tool that guesses the developer's intention to an extent that is not currently foreseeable.

1It is even arguable that books and software are equivalent because any software can be encoded in a book (e.g., by printing its source code [Zim95]) and likewise any book can be encoded in software (e.g., using e-book readers [Gol08]).


Verification [Hol97] and program synthesis [MW71] approaches, despite providing strong guarantees, are also unable to bypass the challenges that the ideal testing tool would have to overcome. These approaches typically provide guarantees about software with regard to a specification, but a correct specification still needs to be written by someone. In essence, even the existing tools that provide the strongest guarantees cannot overcome the fact that whether or not a certain aspect of the software is a bug ultimately depends on the intentions of the developers.

In practice, existing approaches to improve the reliability of software rely on two methods to circumvent, with some compromises, this challenge. The first method, which is used by verification and synthesis tools, consists of explicitly asking developers for information regarding what constitutes correct behavior. This is achieved by requiring developers to write software, or part of it, either in a higher-level language (e.g., a specification) or otherwise in a redundant manner (e.g., n-version programming [AC77]). The second method relies on encoding heuristics about the correctness of software within the tool. This method relies on patterns of correctness that apply to a broad range of software. A simple example is the heuristic used by crash detectors – it is virtually universal that programmers do not want their software to crash. PIKE, however, uses a hybrid approach that combines both methods – PIKE leverages application-specific information to capture the state, which is achieved through the use of annotations, and additionally encodes a heuristic that assumes that correct applications generally have linearizable semantics.

However, even if the ideal testing tool existed, ensuring end-to-end reliability would still require ensuring the correctness of hardware (both at the design and manufacturing levels). This is an equally challenging problem (if not more so) – ensuring hardware correctness is particularly hard given that modern hardware has become extremely complex and is, in some cases, expected to operate under extreme conditions (e.g., high frequency, high temperature, low voltage). Hardware design verification techniques have made important progress, but manufacturing defects can still occur and cause serious problems [INTa, KK98].

Recognizing the magnitude of the software reliability problem should not discourage researchers and developers from doing better. We believe there is significant progress that can be made with regard to testing tools, namely by improving the properties of the testing tools listed in Section 7.1, and that these improvements will have an important impact on the reliability of our computer systems. The following chapter discusses in more detail several research directions that directly follow up on the work presented in this dissertation.


8 Future research directions

This chapter discusses three main research directions that follow up on the work presented in this dissertation.

8.1 Practical extraction of application state using its API

PIKE in its current form requires application-specific annotations to test applications. This requirement forces developers to dedicate some effort before they can use the tool, which is a disadvantage. We believe it may be possible to overcome this limitation by refining the design of PIKE.

Nevertheless, it is worth noting that the impact of the specific annotations required by PIKE's current design is mitigated by several factors.1 First, the modifications required by PIKE do not need to modify the state of the application and, consequently, incorrectly implemented annotations are unlikely to introduce new bugs in the application; instead, mistakes in the annotations would have an impact limited to the state summaries produced. In fact, this concern could even be entirely avoided by isolating the annotations, for example, by using the ptrace mechanism to gather the state of the tested application. Second, the annotations can be implemented selectively and incrementally, which allows the developer to control the tradeoff between manual effort and testing scope. Third, our experience showed that the approach taken by PIKE can be effective despite a limited amount of programmer effort, and despite the fact that we considered a particularly complex type of application (a database manager) with which we were not especially familiar.

Regardless of these mitigating factors, an interesting research direction consists of exploring refined designs of PIKE that forgo the need to provide application-specific annotations. More specifically, gathering the summary of the state of the application may be possible by leveraging, exclusively, the application API. For example, to test for query cache latent bugs, PIKE could simply invoke at the end of the test

1In the context of the alternative design considered for SKI (Section 6.3.1), because none of these factors apply, we ultimately rejected the alternative design and, instead, proposed an entirely transparent design.


SELECT requests that match the requests that were previously invoked concurrently, to test whether the cached results are wrong. Another example would be to use administrative SQL statements, such as SHOW requests, to expose information about the application state that is not made externally visible by non-administrative requests (see the sketch after the list below). The external API approach would still require some testing effort, but it would have several important advantages in comparison with using annotations that rely on the internal interface of the tested application:

• Documentation. External APIs are usually better documented than internal interfaces. This has two advantages. First, better documentation eases the implementation of the component that extracts the application state. Second, using a documented interface simplifies bug reporting and developer interaction because the state information can be provided in a format that is familiar to the developers.

• Stability. External APIs are significantly more stable than the internal interfaces of the applications because APIs constitute a (long-term) implicit contract with users, who over time develop dependencies on them. As a consequence, the burden of updating the state extraction implementation, over the development lifetime of the application, is reduced.

• Portability. External APIs are frequently shared by different applications. For example, SQL is a popular interface and is nearly ubiquitously supported by relational database management systems. Relying exclusively on APIs allows the implementations of the state extraction mechanism to be reused among the various applications that support the respective APIs.
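The sketch referenced above outlines what API-based state extraction could look like for the query-cache scenario. It is only a hypothetical illustration: the helper names are invented, and it assumes the MySQL Connector/Python package rather than any interface that PIKE actually provides:

    import mysql.connector  # assumed driver; any SQL connector would do

    def extract_state_summary(conn, concurrent_queries):
        cursor = conn.cursor()
        summary = []
        # Re-issue each SELECT that previously ran concurrently; a corrupted
        # query cache would make these results disagree with those obtained
        # in the linearized (sequential) executions of the same requests.
        for query in concurrent_queries:
            cursor.execute(query)
            summary.append(cursor.fetchall())
        # Administrative statements can expose internal state that regular,
        # non-administrative requests do not make externally visible.
        cursor.execute("SHOW STATUS LIKE 'Qcache%'")
        summary.append(cursor.fetchall())
        return summary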

8.2 Exposing concurrency bugs due to weak-memory models

Many widely used multiprocessor architectures, such as the x86 architecture, do not enforce a global order of instructions, for performance reasons. Instead, these processors implement memory models that are weaker than sequential consistency. As a consequence, there is a class of concurrency bugs that arises from these weaker memory models and that cannot be exposed with our implementation of SKI, given that SKI currently implements a strong memory model. This type of concurrency bug is particularly challenging, both for developers and for testing tools, because these bugs depend on the subtle details of the processor architecture, which are not always well specified and are often hard to reason about.


Concurrency bugs of this type can easily be introduced when shared memory accesses are not synchronized. Therefore, software that does not comply with the best practices of concurrent programming [Adv10], which strongly discourage the deliberate use of unsynchronized shared memory accesses, is particularly susceptible to this class of concurrency bugs. In practice, developers avoid synchronization because it incurs significant performance costs and try, instead, to develop alternatives with better performance that they believe to be correct – especially with regard to performance-critical components. Researchers have shown that, unfortunately, such ad-hoc solutions are often incorrect [XPZ+10].

When testing the kernel to evaluate SKI, we noticed that kernel developers rely extensively on data races, which developers expect to be benign. This observation leads us to believe that kernels may be significantly vulnerable to this type of concurrency bugs and that developing effective testing techniques for this class of concurrency bugs would constitute an important contribution to the robustness of kernels.
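The classic store-buffering litmus test illustrates this class of bugs. The sketch below enumerates all sequentially consistent interleavings of two unsynchronized threads and shows that the outcome r1 == r2 == 0 never appears; on x86-TSO, however, store buffers allow exactly that outcome, which is why a tool assuming a strong memory model cannot expose it:

    # Store-buffering litmus test: thread A does X=1; r1=Y, while thread B
    # does Y=1; r2=X. Enumerate every sequentially consistent interleaving.
    thread_a = [("store", "X", 1), ("load", "Y", "r1")]
    thread_b = [("store", "Y", 1), ("load", "X", "r2")]

    def interleavings(a, b):
        if not a or not b:
            yield a + b
            return
        for rest in interleavings(a[1:], b):
            yield [a[0]] + rest
        for rest in interleavings(a, b[1:]):
            yield [b[0]] + rest

    outcomes = set()
    for schedule in interleavings(thread_a, thread_b):
        mem, regs = {"X": 0, "Y": 0}, {}
        for op, loc, arg in schedule:
            if op == "store":
                mem[loc] = arg
            else:
                regs[arg] = mem[loc]
        outcomes.add((regs["r1"], regs["r2"]))

    print(outcomes)  # (0, 0) is absent: seeing it requires a weaker model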

8.3 Exploring the input space

In addition to detecting bugs and exploring the interleaving space, dynamic testing also requires exploring the input space. As discussed in Section 7.4, exploring the input space is, in general, a hard problem, and it becomes harder in the context of concurrency bugs, given that multiple threads are expected to receive input. In this dissertation, to evaluate our tools, we conducted the exploration of the input space using naive techniques that rely on adapting existing test cases. However, developing effective techniques to explore the input space is an orthogonal problem whose solution could further increase the effectiveness of testing significantly. In this section we briefly discuss two approaches to exploring the input space in the context of concurrent testing.

8.3.1 Profiling-based generation of tests

An approach that we believe to be promising consists of exploring the possibility of leveraging systematic tools, like SKI, to also effectively explore the input space of the tested software. SKI achieves a very high degree of control over the executed tests, and this is an important property that may enable the discovery of clever algorithms to explore the input space. More specifically, the fact that SKI tests always start from the exact same initial state, by leveraging the snapshot mechanism, combined with the fact that SKI tests are nearly deterministic2, could open an opportunity for profiling-based techniques.

In the context of testing kernels, one possible profiling-based approach consists of leveraging an existing non-concurrent test suite (composed of an array of tests that issue system calls from a single thread) and profiling each test when executed by a single CPU from a certain initial state. This profiling phase would have linear complexity – each test would be profiled once for a given initial state – and would serve to gather information about the instructions that are executed and the memory that is accessed. Subsequently, according to suitable algorithms, yet to be developed, pairs of tests would be selected to execute concurrently under SKI to expose concurrency bugs.
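One conceivable instantiation of such an algorithm, sketched below purely as an illustration (profile and run_with_ski are hypothetical helpers), pairs tests whose profiled memory footprints overlap, on the assumption that only tests touching common memory can interleave in interesting ways:

    # Linear profiling phase: each test is profiled once from the snapshot.
    profiles = {test: profile(test) for test in test_suite}

    # Pair tests by the size of their shared memory footprint.
    pairs = []
    for i, t1 in enumerate(test_suite):
        for t2 in test_suite[i:]:
            shared = profiles[t1].memory_accessed & profiles[t2].memory_accessed
            if shared:
                pairs.append((t1, t2, len(shared)))

    # Run the most promising pairs concurrently under SKI first.
    for t1, t2, _ in sorted(pairs, key=lambda p: -p[2]):
        run_with_ski(t1, t2)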

8.3.2 Leveraging bug reports

Bug reports are important for developers to find and diagnose bugs. However, bug reports are often incomplete. For example, our experience reproducing concurrency bugs revealed that it is typical for concurrency crash bug reports to include the stack trace, which indirectly informs developers about one of the system calls invoked, but to provide little or no information regarding the other, concurrent system call.

In the context of diagnosing reported bugs, we envision a testing infrastructure that automatically leverages the information contained in bug reports to select tests. For example, the testing infrastructure could automatically parse bug reports to identify stack traces and generate test cases that could reproduce the respective bugs. Additionally, it might be possible to automatically leverage other information from bug reports, either by using natural language processing techniques or by analyzing fields of the bug report system (e.g., Bugzilla) that may be relevant to reproduce the bug (e.g., the software version or the identification of the affected component).
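As a minimal sketch of the first step, the snippet below extracts function names from the stack-trace lines of a raw kernel bug report; the regular expression is illustrative only, since real oops formats vary across kernel versions:

    import re

    # Matches frames such as "[<ffffffff811d2f40>] pipe_read+0x30/0x2a0".
    FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([A-Za-z_][\w.]*)\+0x")

    def stack_frames(report_text):
        return FRAME_RE.findall(report_text)

    report = """
    [<ffffffff811d2f40>] pipe_read+0x30/0x2a0
    [<ffffffff811c8e71>] vfs_read+0x91/0x160
    """
    print(stack_frames(report))  # ['pipe_read', 'vfs_read']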

2Even though currently SKI does not ensure deterministic hardware input, the analysis of experimental results showed that, in practice, this aspect of our implementation has limited impact on the execution of the tests. Furthermore, support for deterministic hardware input can be implemented if deemed necessary to support additional testing techniques.

9 Conclusions

This work improves our understanding of real-world concurrency bugs by presenting a study on their effects. Our study analyzed concurrency bugs in MySQL, a particularly complex and critical application, and provided concrete evidence of the existence of two classes of particularly hard-to-detect bugs: semantic concurrency bugs and latent concurrency bugs. Taking these findings into consideration, this dissertation proposes a novel approach that detects both of these classes of concurrency bugs. The key idea behind our approach is to use linearizability as a way to infer the specification of the application. We show that this approach is applicable to complex applications, namely MySQL, and we describe effective methods to compare the state of different executions and to address false positives.

In addition, this dissertation extends to kernel code the general approach of systematic testing for concurrency bugs. We propose a virtual machine based systematic approach that achieves fine-grained control over the kernel thread interleavings and, according to a generalized scheduling algorithm that we propose, effectively explores the kernel interleaving space. In addition, we developed several critical optimizations to ensure efficiency and we validated our approach by reproducing known concurrency bugs and by finding unknown concurrency bugs in large-scale commodity kernels, namely Linux and FreeBSD.

Despite the contributions of this dissertation, it is clear that ensuring the reliability of software, especially with regard to concurrent and complex software, will remain a hard problem. Therefore, we believe that software developers, and users, will greatly benefit from additional improvements within this field of research.


Bibliography

[ABD+09] Krste Asanovic, Rastislav Bodik, James Demmel, Tony Keaveny, Kurt Keutzer, John Kubiatowicz, Nelson Morgan, David Patterson, Koushik Sen, John Wawrzynek, David Wessel, and Katherine Yelick. A view of the parallel computing landscape. Commun. ACM, 52(10):56–67, October 2009.

[AC77] A. Avizienis and Liming Chen. On the implementation of n-version programming for software fault tolerance during execution. In Proc. of IEEE COMPSAC, 1977.

[ADADB+06] Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Lakshmi N. Bairavasundaram, Timothy E. Denehy, Florentina I. Popovici, Vijayan Prabhakaran, and Muthian Sivathanu. Semantically-smart disk systems: Past, present, and future. SIGMETRICS Perform. Eval. Rev., 33(4):29–35, March 2006.

[Adv10] Sarita Adve. Data races are evil with no exceptions: Technical perspec- tive. Commun. ACM, 53(11):84–84, November 2010.

[AFI+08] Jade Alglave, Anthony Fox, Samin Ishtiaq, Magnus O. Myreen, Susmit Sarkar, Peter Sewell, and Francesco Zappa Nardelli. The semantics of POWER and ARM multiprocessor machine code. In Proc. of Workshop on Declarative Aspects of Multicore Programming (DAMP), 2008.

[AH09] John Admanski and Steve Howard. Autotest – Testing the untestable. In Proc. of Ottawa Linux Symposium (OLS), 2009.

[AHB03] Cyrille Artho, Klaus Havelund, and Armin Biere. High-level data races. Software Testing, Verification and Reliability, 13(4):207–227, 2003.

[Ait02] Dave Aitel. The advantages of block-based protocol analysis for security testing. Technical report, Immunity, Inc., 2002.


[ALRL04] A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr. Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing, 1(1):11–33, 2004.

[Ant14] Gary Anthes. Researchers simplify parallel programming. Commun. ACM, 57(11):13–15, October 2014.

[API] MultiProcessor Specification. http://www.intel.com/design/archives/processors/pro/docs/242016.htm.

[AWHF10] Amittai Aviram, Shu-Chun Weng, Sen Hu, and Bryan Ford. Efficient system-enforced deterministic parallelism. In Proc. of Operating System Design and Implementation (OSDI), 2010.

[BA] Bug 14416 - Null pointer dereference in fs/pipe.c. http://bugzilla.kernel.org/show_bug.cgi?id=14416.

[BA2] FS: pipe.c null pointer dereference. https://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/tree/queue-2.6.31/fs-pipe.c-null-pointer-dereference.patch?id=36e97dec52821f76536a25b763e320eb7434c2a5.

[BAAS09] Robert L. Bocchino, Vikram S. Adve, Sarita V. Adve, and Marc Snir. Parallel programming must be deterministic by default. In Proc. of Workshop on Hot Topics in Parallelism (HotPar), 2009.

[BAEFU06a] Yosi Ben-Asher, Yaniv Eytani, Eitan Farchi, and Shmuel Ur. Noise makers need to know where to be silent – Producing schedules that find bugs. In Proc. of International Symposium on Leveraging Applications of Formal Methods, Verification and Validation (ISoLA), 2006.

[BAEFU06b] Yosi Ben-Asher, Yaniv Eytani, Eitan Farchi, and Shmuel Ur. Producing scheduling that causes concurrent programs to fail. In Proc. of Parallel and Distributed Systems: Testing and Debugging (PADTAD), 2006.

[BB] Bug 22602 - Oops while unmounting an USB key with a FAT filesystem. https://bugzilla.kernel.org/show_bug.cgi?id=22602.

[BBC+06] Thomas Ball, Ella Bounimova, Byron Cook, Vladimir Levin, Jakob Lichtenberg, Con McGarvey, Bohus Ondrusek, Sriram K. Rajamani, and Abdullah Ustuner. Thorough static analysis of device drivers. In Proc. of European Conference on Computer Systems (EuroSys), 2006.


[BC] Patch "ext4: fix crash when accessing /proc/mounts concurrently" has been added to the 3.6-stable tree. http://www.mail-archive.com/[email protected]/msg19380.html.

[BC11] Shekhar Borkar and Andrew A. Chien. The future of microprocessors. Commun. ACM, 54(5):67–77, May 2011.

[BCM10] Michael D. Bond, Katherine E. Coons, and Kathryn S. McKinley. PACER: Proportional detection of data races. In Proc. of Programming Languages Design and Implementation (PLDI), 2010.

[BD] Bug 144061 - [socket] race on unix socket close. https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=144061.

[BDMT10] Sebastian Burckhardt, Chris Dern, Madanlal Musuvathi, and Roy Tan. Line-up: A complete and automatic linearizability checker. SIGPLAN Not., 45(6):330–340, 2010.

[BE04] Yuriy Brun and Michael D. Ernst. Finding latent code errors via machine learning over program executions. In Proc. of International Conference on Software Engineering (ICSE), 2004.

[Bel05] Fabrice Bellard. QEMU, a fast and portable dynamic translator. In Proc. of Annual Technical Conference (ATC), 2005.

[BFM+05] Arkady Bron, Eitan Farchi, Yonit Magid, Yarden Nir, and Shmuel Ur. Applications of synchronization coverage. In Proc. of Symposium on Principles and Practice of Parallel Programming (PPoPP), 2005.

[BH07] Luiz André Barroso and Urs Hölzle. The case for energy-proportional computing. Computer, 40:33–37, December 2007.

[BKMN10] Sebastian Burckhardt, Pravesh Kothari, Madanlal Musuvathi, and Santosh Nagarakatte. A randomized scheduler with probabilistic guarantees of finding bugs. In Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2010.

[BLA] GE Energy acknowledges blackout bug. http://www.securityfocus.com/news/8032.


[BLR02] Chandrasekhar Boyapati, Robert Lee, and Martin Rinard. Ownership types for safe programming: Preventing data races and deadlocks. In Proc. of Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2002.

[Blu12] Ben Blum. Landslide: Systematic dynamic race detection in kernel space. MS thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, May 2012. http://www.pdl.cmu.edu/PDL-FTP/associated/CMU-CS-12-118.pdf.

[BM83] D. L. Bird and C. U. Munoz. Automatic generation of random self- checking test cases. IBM Syst. J., 22(3):229–245, September 1983.

[BMP+06] Flavio Bonomi, Michael Mitzenmacher, Rina Panigrahy, Sushil Singh, and George Varghese. An improved construction for counting bloom filters. Lecture Notes in Computer Science, 4168:684–695, 2006.

[BP88] B. W. Boehm and P. N. Papaccio. Understanding and controlling software costs. IEEE Trans. Softw. Eng., 14(10):1462–1477, October 1988.

[BP05] Daniel P. Bovet and Marco Cesati. Understanding the Linux Kernel. O'Reilly Media, third edition, 2005.

[BPSZ10] Silvia Breu, Rahul Premraj, Jonathan Sillito, and Thomas Zimmermann. Information needs in bug reports: Improving cooperation between developers and users. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work, CSCW '10, pages 301–310, New York, NY, USA, 2010. ACM.

[BS09] Jacob Burnim and Koushik Sen. Asserting and checking determinism for multithreaded programs. In Proc. of European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE), 2009.

[BSADAD09] Lakshmi N. Bairavasundaram, Swaminathan Sundararaman, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Tolerating file-system mistakes with EnvyFS. In Proc. of Annual Technical Conference (ATC), 2009.


[BYLN09] Emery D. Berger, Ting Yang, Tongping Liu, and Gene Novark. Grace: Safe multithreaded programming for C/C++. In Proc. of Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2009.

[CBM10] Katherine E. Coons, Sebastian Burckhardt, and Madanlal Musuvathi. Gambit: Effective unit testing for concurrency libraries. In Proc. of Symposium on Principles and Practice of Parallel Programming (PPoPP), 2010.

[CC00] Subhachandra Chandra and Peter M. Chen. Whither generic recovery from application faults? A fault study using open-source software. In Proc. of International Conference on Dependable Systems and Networks (DSN), 2000.

[CDE08] Cristian Cadar, Daniel Dunbar, and Dawson Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proc. of Operating System Design and Implementation (OSDI), 2008.

[CF04] George Candea and Armando Fox. End-user effects of microreboots in three-tiered internet systems. ArXiv Computer Science e-prints, March 2004.

[CFS] Linux: The Completely Fair Scheduler. http://kerneltrap.org/node/8059.

[CGP+08] Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. EXE: Automatically generating inputs of death. ACM Trans. Inf. Syst. Secur., 12(2):1–38, 2008.

[CHWL09] Chun-Ting Chen, Chun-Chen Hsu, Jan-Jan Wu, and Pangfeng Liu. GFS: A distributed file system with multi-source data access and replication for grid computing. In Proc. of International Conference on Advances in Grid and Pervasive Computing (GPC), 2009.

[CKC11] Vitaly Chipounov, Volodymyr Kuznetsov, and George Candea. S2E: A platform for in-vivo multi-path analysis of software systems. In Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2011.


[CKF+04] George Candea, Shinichi Kawamoto, Yuichi Fujiki, Greg Friedman, and Armando Fox. Microreboot – A technique for cheap recovery. In Proc. of Operating System Design and Implementation (OSDI), 2004.

[CKZ13a] Austin T. Clements, M. Frans Kaashoek, and Nickolai Zeldovich. RadixVM: Scalable address spaces for multithreaded applications. In Proc. of European Conference on Computer Systems (EuroSys), 2013.

[CKZ+13b] Austin T. Clements, M. Frans Kaashoek, Nickolai Zeldovich, Robert T. Morris, and Eddie Kohler. The scalable commutativity rule: Designing scalable software for multicore processors. In Proc. of Symposium on Operating System Principles (SOSP), 2013.

[CL99] Miguel Castro and Barbara Liskov. Practical Byzantine fault tolerance. In Proc. of Operating System Design and Implementation (OSDI), 1999.

[Cla76] Lori A. Clarke. A program testing system. In Proceedings of the 1976 Annual Conference, ACM ’76, pages 488–491, New York, NY, USA, 1976. ACM.

[CLC13] Jui-Hao Chiang, Han-Lin Li, and Tzi-cker Chiueh. Introspection-based memory de-duplication and migration. In Proc. of International Conference on Virtual Execution Environments (VEE), 2013.

[CLL+02] Jong-Deok Choi, Keunwoo Lee, Alexey Loginov, Robert O'Callahan, Vivek Sarkar, and Manu Sridharan. Efficient and precise datarace detection for multithreaded object-oriented programs. In Proc. of Programming Languages Design and Implementation (PLDI), 2002.

[CN01] Peter M. Chen and Brian D. Noble. When virtual is better than real. In Proc. of Hot Topics in Operating Systems (HotOS), 2001.

[Cor] Oracle Corporation. MySQL Bugs: How to Report a Bug. https://bugs.mysql.com/how-to-report.php.

[COV] Software testing and static analysis tools — Coverity. http://www.coverity.com/.

[CRCM12] João Carreira, Rodrigo Rodrigues, George Candea, and Rupak Majumdar. Scalable testing of file system checkers. In Proc. of European Conference on Computer Systems (EuroSys), 2012.


[CRL03] Miguel Castro, Rodrigo Rodrigues, and Barbara Liskov. BASE: Using abstraction to improve fault tolerance. ACM Trans. Comput. Syst., 21:236–269, August 2003.

[CSL+13] Heming Cui, Jiri Simsa, Yi-Hong Lin, Hao Li, Ben Blum, Xinan Xu, Junfeng Yang, Garth A. Gibson, and Randal E. Bryant. Parrot: A practical runtime for deterministic, stable, and reliable threads. In Proc. of Symposium on Operating System Principles (SOSP), 2013.

[CWG+11] Heming Cui, Jingyue Wu, John Gallagher, Huayang Guo, and Junfeng Yang. Efficient deterministic multithreading through schedule relaxation. In Proc. of Symposium on Operating System Principles (SOSP), 2011.

[CWTY10] Heming Cui, Jingyue Wu, Chia-Che Tsai, and Junfeng Yang. Stable deterministic multithreading through schedule memoization. In Proc. of Operating System Design and Implementation (OSDI), 2010.

[CYC+01] Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler. An empirical study of operating systems errors. In Proc. of Symposium on Operating System Principles (SOSP), 2001.

[DHH09] Daniel Dawson, Nathan Hawes, and Christian Hoermann. Finding bugs in open source kernels using Parfait. In Proc. of Kernel Conference Australia (KCA), 2009.

[DLCO09] Joseph Devietti, Brandon Lucia, Luis Ceze, and Mark Oskin. DMP: Deterministic shared memory multiprocessing. In Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2009.

[DLFC08] George W. Dunlap, Dominic G. Lucchetti, Michael A. Fetterman, and Peter M. Chen. Execution replay of multiprocessor virtual machines. In Proc. of International Conference on Virtual Execution Environments (VEE), 2008.

[DZW+14] Zhuofang Dai, Zheng Zhang, Haojun Wang, Yi Li, and Weihua Zhang. Parallelized race detection based on GPU architecture. In Junjie Wu, Haibo Chen, and Xingwei Wang, editors, Advanced Computer Architecture, volume 451 of Communications in Computer and Information Science, pages 113–127. Springer Berlin Heidelberg, 2014.


[EA03] Dawson Engler and Ken Ashcraft. RacerX: Effective, static detection of race conditions and deadlocks. SIGOPS Operating Systems Review, 37(5):237–252, 2003.

[EBA+13] Hadi Esmaeilzadeh, Emily Blem, Renée St. Amant, Karthikeyan Sankaralingam, and Doug Burger. Power challenges may end the multicore era. Commun. ACM, 56(2):93–102, February 2013.

[EBNS13] Tayfun Elmas, Jacob Burnim, George C. Necula, and Koushik Sen. CONCURRIT: A domain specific language for concurrency bugs. In Proc. of Programming Languages Design and Implementation (PLDI), 2013.

[EFG+03] Orit Edelstein, Eitan Farchi, Evgeny Goldin, Yarden Nir, Gil Ratsaby, and Shmuel Ur. Framework for testing multi-threaded Java programs. Concurrency - Practice and Experience, 15(3-5):485–499, 2003.

[EMBO10] John Erickson, Madanlal Musuvathi, Sebastian Burckhardt, and Kirk Olynyk. Effective data-race detection for the kernel. In Proc. of Operating System Design and Implementation (OSDI), 2010.

[EQT07] Tayfun Elmas, Shaz Qadeer, and Serdar Tasiran. Goldilocks: A race and transaction-aware Java runtime. In Proc. of Programming Languages Design and Implementation (PLDI), 2007.

[EU04] Yaniv Eytani and Shmuel Ur. Compiling a benchmark of documented multi-threaded bugs. Proc. of International Parallel and Distributed Processing Symposium (IPDPS), 17:266a, 2004.

[FBG12] Peter Feiner, Angela Demke Brown, and Ashvin Goel. Comprehensive kernel instrumentation via dynamic binary translation. In Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2012.

[FF04] Cormac Flanagan and Stephen N Freund. Atomizer: A dynamic atomicity checker for multithreaded programs. In Proc. of Symposium on Principles of Programming Languages (POPL), 2004.

[FF09] Cormac Flanagan and Stephen N. Freund. FastTrack: Efficient and precise dynamic race detection. SIGPLAN Not., 44(6):121–133, 2009.


[FLR11] Pedro Fonseca, Cheng Li, and Rodrigo Rodrigues. Finding complex concurrency bugs in large multi-threaded applications. In Proc. of European Conference on Computer Systems (EuroSys), 2011.

[FLSR10] Pedro Fonseca, Cheng Li, Vishal Singhal, and Rodrigo Rodrigues. A study of the internal and external effects of concurrency bugs. In Proc. of International Conference on Dependable Systems and Networks (DSN), 2010.

[FNU03] Eitan Farchi, Yarden Nir, and Shmuel Ur. Concurrent bug patterns and how to test them. In Proc. of International Parallel and Distributed Processing Symposium (IPDPS), 2003.

[FRB14] Pedro Fonseca, Rodrigo Rodrigues, and Björn B. Brandenburg. SKI: Exposing kernel concurrency bugs through systematic schedule exploration. In Proc. of Operating System Design and Implementation (OSDI), 2014.

[FSS] Fsstress source code. http://cvs.sourceforge.net/viewcvs.py/ltp/ltp/testcases/kernel/fs/fsstress/.

[FZ10] Gordon Fraser and Andreas Zeller. Mutation-driven generation of unit tests and oracles. In Proc. of International Symposium on Software Testing and Analysis (ISSTA), 2010.

[GEE] GE Energy acknowledges blackout bug. http://www.securityfocus.com/news/8032.

[GHK+01] Todd L. Graves, Mary Jean Harrold, Jung-Min Kim, Adam Porter, and Gregg Rothermel. An empirical study of regression test selection techniques. ACM Trans. Softw. Eng. Methodol., 10(2):184–208, April 2001.

[GKRY03] Weining Gu, Zbigniew Kalbarczyk, Ravishankar K. Iyer, and Zhenyu Yang. Characterization of Linux kernel behavior under errors. In Proc. of International Conference on Dependable Systems and Networks (DSN), 2003.

[GKS05] Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: Directed automated random testing. SIGPLAN Not., 40(6):213–223, 2005.

[God97] Patrice Godefroid. Model checking for programming languages using VeriSoft. In Proc. of Symposium on Principles of Programming Languages (POPL), 1997.


[Gol08] Gene Golovchinsky. Reading in the office. In Proceedings of the 2008 ACM Workshop on Research Advances in Large Digital Book Repositories, BooksOnline '08, pages 21–24, New York, NY, USA, 2008. ACM.

[Gra86] Jim Gray. Why do computers stop and what can be done about it? In Proc. of Reliability in Distributed Software and Database Systems (SRDS), 1986.

[GZNM11] Philip J. Guo, Thomas Zimmermann, Nachiappan Nagappan, and Brendan Murphy. "Not my bug!" and other reasons for software bug report reassignments. In Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work, CSCW '11, pages 395–404, New York, NY, USA, 2011. ACM.

[GZTQ09] Qi Gao, Wenbin Zhang, Yan Tang, and Feng Qin. First-aid: Surviving and preventing memory management bugs during production runs. In Proc. of European Conference on Computer Systems (EuroSys), 2009.

[HC14] Jennia Hizver and Tzi-cker Chiueh. Real-time deep virtual machine introspection and its applications. In Proc. of International Conference on Virtual Execution Environments (VEE), 2014.

[HHS13] Ruirui Huang, Erik Halberg, and G. Edward Suh. Non-race concurrency bug detection through order-sensitive critical sections. In Proc. of International Symposium on Computer Architecture (ISCA), 2013.

[HM93] Maurice Herlihy and J. Eliot B. Moss. Transactional memory: Architectural support for lock-free data structures. SIGARCH Computer Architecture News, 21(2):289–300, 1993.

[HM08] Mark D. Hill and Michael R. Marty. Amdahl’s law in the multicore era. Computer, 41(7):33–38, July 2008.

[Hol97] Gerard J. Holzmann. The model checker SPIN. IEEE Transactions on Software Engineering, 23(5):279–295, 1997.

[Hol14] Gerard J. Holzmann. Mars code. Commun. ACM, 57(2):64–73, 2014.

[HP04] David Hovemeyer and William Pugh. Finding bugs is easy. SIGPLAN Notices, 39(12):92–106, 2004.


[HW90] Maurice P. Herlihy and Jeannette M. Wing. Linearizability: A correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst., 12(3):463–492, 1990.

[HYN+14] Chun-Hung Hsiao, Jie Yu, Satish Narayanasamy, Ziyun Kong, Cristiano L. Pereira, Gilles A. Pokam, Peter M. Chen, and Jason Flinn. Race detection for event-driven mobile applications. In Proc. of Programming Languages Design and Implementation (PLDI), 2014.

[IEE94] IEEE. IEEE Standard for Information Technology - Portable Operating System Interface (POSIX): System Application Program Interface (API), Amendment 1: Realtime Extension (C Language), IEEE Std 1003.1b- 1993. IEEE Standards Office, New York, NY, USA, 1994.

[IEE95] IEEE. IEEE Standard for Information Technology - Portable Operating System Interface (POSIX): System Application Program Interface (API), Amendment 2: Thread Extensions, IEEE Std 1003.1c-1995. IEEE Standards Office, New York, NY, USA, 1995.

[IEE08] IEEE. IEEE Standard for Information Technology - Portable Operating System Interface (POSIX) - Base Specifications, IEEE Std 1003.1-2008. IEEE Standards Office, New York, NY, USA, 2008.

[INTa] Intel's billion-dollar mistake: Why chip flaws are so hard to fix. http://venturebeat.com/2011/01/31/intels-billion-dollar-mistake-why-chip-flaws-are-so-hard-to-fix/.

[INTb] Intel Previews Intel Xeon 'Nehalem-EX' Processor. http://www.intel.com/pressroom/archive/releases/2009/20090526comp.htm.

[JADAD06] Stephen T. Jones, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Antfarm: Tracking processes in a virtual machine environment. In Proc. of Annual Technical Conference (ATC), 2006.

[JSS+12] Guoliang Jin, Linhai Song, Xiaoming Shi, Joel Scherpelz, and Shan Lu. Understanding and detecting real-world performance bugs. In Proc. of Programming Languages Design and Implementation (PLDI), 2012.

[JTZC08] Horatiu Jula, Daniel Tralamazza, Cristian Zamfir, and George Candea. Deadlock immunity: Enabling systems to defend against deadlocks. In Proc. of Operating System Design and Implementation (OSDI), 2008.


[KCC10] Volodymyr Kuznetsov, Vitaly Chipounov, and George Candea. Testing closed-source binary device drivers with DDT. In Proc. of Annual Technical Conference (ATC), 2010.

[KER] Kernel threads made easy. http://lwn.net/Articles/65178/.

[Kin76] James C. King. Symbolic execution and program testing. Commun. ACM, 19(7):385–394, July 1976.

[Kiv07] Avi Kivity. KVM: The Linux virtual machine monitor. In Proc. of Ottawa Linux Symposium (OLS), 2007.

[KK98] I. Koren and Z. Koren. Defect tolerance in VLSI circuits: Techniques and yield analysis. Proceedings of the IEEE, 86(9):1819–1838, Sep 1998.

[KL93] Harry Koehnemann and Timothy E. Lindquist. Towards target-level testing and debugging tools for embedded software. In Proc. of TRI-Ada, 1993.

[KTGN10] Soila Kavulya, Jiaqi Tan, Rajeev Gandhi, and Priya Narasimhan. An analysis of traces from a production MapReduce cluster. In Proc. of International Conference on Cluster, Cloud and Grid Computing (CCGRID), 2010.

[KWLM09] Terence Kelly, Yin Wang, Stephane Lafortune, and Scott Mahlke. Eliminating concurrency bugs with control engineering. IEEE Computer, 99(1), 2009.

[KZC12] Baris Kasikci, Cristian Zamfir, and George Candea. Data races vs. data race bugs: Telling the difference with Portend. In Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2012.

[KZC13] Baris Kasikci, Cristian Zamfir, and George Candea. RaceMob: Crowdsourced data race detection. In Proc. of Symposium on Operating System Principles (SOSP), 2013.

[LADADL13] Lanyue Lu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, and Shan Lu. A study of Linux file system evolution. In Proc. of Conference on File and Storage Technologies (FAST), 2013.


[Lam98] Leslie Lamport. The part-time parliament. ACM Trans. Comput. Syst., 16:133–169, May 1998.

[Lar02] Paul Larson. Testing Linux with the Linux Test Project. In Proc. of Ottawa Linux Symposium (OLS), 2002.

[LCB11] Tongping Liu, Charlie Curtsinger, and Emery D. Berger. Dthreads: Efficient deterministic multithreading. In Proc. of Symposium on Operating System Principles (SOSP), 2011.

[LCS10] Brandon Lucia, Luis Ceze, and Karin Strauss. ColorSafe: Architectural support for debugging and dynamically avoiding multi-variable atomicity violations. In Proc. of International Symposium on Computer Architecture (ISCA), 2010.

[Liu07] Xuezheng Liu. WiDS checker: Combating bugs in distributed systems. In Proc. of Networked Systems Design and Implementation (NSDI), 2007.

[LLKB12] Di Liu, Matthew Lease, Rebecca Kuipers, and Randolph G. Bias. Crowdsourcing for usability testing. CoRR, 2012.

[LLS+13] Chris Lewis, Zhongpeng Lin, Caitlin Sadowski, Xiaoyan Zhu, Rong Ou, and E. James Whitehead Jr. Does bug prediction support human developers? Findings from a Google case study. In Proc. of International Conference on Software Engineering (ICSE), 2013.

[LOCa] Nasty Lockup Issue Still Being Investigated For Linux 3.18. http://www.phoronix.com/scan.php?page=news_item&px=MTg1MDc.

[LOCb] ANNOUNCE: Lock validator. http://lwn.net/Articles/185605/.

[Lov10] Robert Love. Linux Kernel Development. Addison-Wesley Professional, 3rd edition, 2010.

[LPSZ08] Shan Lu, Soyeon Park, Eunsoo Seo, and Yuanyuan Zhou. Learning from mistakes: A comprehensive study on real world concurrency bug characteristics. SIGARCH Computer Architecture News, 36(1):329–339, 2008.

[LT93] Nancy G. Leveson and Clark S. Turner. An investigation of the Therac-25 accidents. Computer, 26(7):18–41, July 1993.


[LTQZ06] Shan Lu, Joseph Tucek, Feng Qin, and Yuanyuan Zhou. AVIO: Detecting atomicity violations via access interleaving invariants. In Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2006.

[LTW+06] Zhenmin Li, Lin Tan, Xuanhui Wang, Shan Lu, Yuanyuan Zhou, and Chengxiang Zhai. Have things changed now? An empirical study of bug characteristics in modern open-source software. In Proc. of Architectural and System Support for Improving Software Dependability (ASID), 2006.

[LVN10] Oren Laadan, Nicolas Viennot, and Jason Nieh. Transparent, lightweight application execution replay on commodity multiprocessor operating systems. In Proc. of International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), 2010.

[LVT+11] Oren Laadan, Nicolas Viennot, Chia-Che Tsai, Chris Blinn, Junfeng Yang, and Jason Nieh. Pervasive detection of process races in deployed systems. In Proc. of Symposium on Operating System Principles (SOSP), 2011.

[LZL+13a] Sihan Li, Hucheng Zhou, Haoxiang Lin, Tian Xiao, Haibo Lin, Wei Lin, and Tao Xie. A characteristic study on failures of production distributed data-parallel programs. In Proc. of International Conference on Software Engineering (ICSE), 2013.

[LZL+13b] Sihan Li, Hucheng Zhou, Haoxiang Lin, Tian Xiao, Haibo Lin, Wei Lin, and Tao Xie. A characteristic study on failures of production distributed data-parallel programs. In Proc. of International Conference on Software Engineering (ICSE), 2013.

[LZL+13c] Sihan Li, Hucheng Zhou, Haoxiang Lin, Tian Xiao, Haibo Lin, Wei Lin, and Tao Xie. A characteristic study on failures of production distributed data-parallel programs. In Proc. of International Conference on Software Engineering (ICSE), 2013.

[McC02] Dave McCracken. POSIX threads and the Linux kernel. In Proc. of Ottawa Linux Symposium (OLS), 2002.

[MDADAD13] Ao Ma, Chris Dragga, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau. Ffsck: The fast file system checker. In Proc. of Conference on File and Storage Technologies (FAST), 2013.


[Mic] Microsoft. Generating test data for databases by using data generators. http://msdn.microsoft.com/en-us/library/dd193262.aspx.

[MKZ08] Chaitanya Mishra, Nick Koudas, and Calisto Zuzarte. Generating targeted queries for database testing. In Proc. of International Conference on Management of Data (SIGMOD), pages 499–510, 2008.

[MMN09] Daniel Marino, Madanlal Musuvathi, and Satish Narayanasamy. LiteRace: Effective sampling for lightweight data-race detection. In Proc. of Programming Languages Design and Implementation (PLDI), 2009.

[MQB+08] Madanlal Musuvathi, Shaz Qadeer, Thomas Ball, Gérard Basler, Piramanayagam A. Nainar, and Iulian Neamtiu. Finding and reproducing heisenbugs in concurrent programs. In Proc. of Operating System Design and Implementation (OSDI), 2008.

[MS98] Paul E. McKenney and John D. Slingwine. Read-copy update: Using execution history to solve concurrency problems. In Proc. of International Conference on Parallel and Distributed Computing and Systems (PDCS), pages 509–518, Las Vegas, NV, October 1998.

[MW71] Zohar Manna and Richard J. Waldinger. Toward automatic program synthesis. Commun. ACM, 14(3):151–165, March 1971.

[MYI] The MyISAM storage engine. http://dev.mysql.com/doc/refman/5.0/en/myisam-storage-engine.html.

[MYSa] MySQL :: The world's most popular open-source database. http://www.mysql.com.

[MYSb] MySQL Bugs. http://bugs.mysql.com.

[MYSc] MySQL :: Market Share. http://www.mysql.com/why-mysql/marketshare/.

[NAW06] Mayur Naik, Alex Aiken, and John Whaley. Effective static race detection for Java. In Proc. of Programming Languages Design and Implementation (PLDI), 2006.

[NB05] Nachiappan Nagappan and Thomas Ball. Static analysis tools as early indicators of pre-release defect density. In Proc. of International Conference on Software Engineering (ICSE), 2005.


[NBH08] Kara Nance, Matt Bishop, and Brian Hay. Virtual machine introspection: Observation or interference? IEEE Security and Privacy, 6(5):32–37, September 2008.

[NBMM12a] Santosh Nagarakatte, Sebastian Burckhardt, Milo M.K. Martin, and Madanlal Musuvathi. Multicore acceleration of priority-based schedulers for concurrency bug detection. In Proc. of Programming Languages Design and Implementation (PLDI), 2012.

[NBMM12b] Santosh Nagarakatte, Sebastian Burckhardt, Milo M.K. Martin, and Madanlal Musuvathi. Multicore acceleration of priority-based schedulers for concurrency bug detection. In Proc. of Programming Languages Design and Implementation (PLDI), 2012.

[NJT13] Adrian Nistor, Tian Jiang, and Lin Tan. Discovering, reporting, and fixing performance bugs. In Proc. of Working Conference on Mining Software Repositories (MSR), 2013.

[NPCF08] Edmund B. Nightingale, Daniel Peek, Peter M. Chen, and Jason Flinn. Parallelizing security checks on commodity hardware. In Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2008.

[NPSG09] Mayur Naik, Chang-Seo Park, Koushik Sen, and David Gay. Effective static deadlock detection. In Proc. of International Conference on Software Engineering (ICSE), 2009.

[NVI] GeForce GTX 295. http://www.nvidia.com/object/product_geforce_gtx_295_us.html.

[NWT+07] Satish Narayanasamy, Zhenghao Wang, Jordan Tigani, Andrew Edwards, and Brad Calder. Automatically classifying benign and harmful data races using replay analysis. In Proc. of Programming Languages Design and Implementation (PLDI), 2007.

[NZHZ07] Stephan Neuhaus, Thomas Zimmermann, Christian Holler, and Andreas Zeller. Predicting vulnerable software components. In Proc. of Conference on Computer and Communications Security (CCS), 2007.

[OPE] OpenSUSE News. https://news.opensuse.org/2014/03/19/development-for-13-2-kicks-off/.


[OWB05] Thomas J. Ostrand, Elaine J. Weyuker, and Robert M. Bell. Predicting the location and number of faults in large software systems. IEEE Transactions on Software Engineering (TSE), 31(4):340–355, April 2005.

[Pac07] Sasha Pachev. Understanding MySQL internals. O’Reilly Media, Inc., 2007.

[PFMA04] Nick L. Petroni, Jr., Timothy Fraser, Jesus Molina, and William A. Arbaugh. Copilot - A coprocessor-based kernel runtime integrity monitor. In Proc. of Conference on USENIX Security Symposium (SSYM), 2004.

[PLM+10a] Wei-Feng Pan, Bing Li, Yu-Tao Ma, Ye-Yi Qin, and Xiao-Yan Zhou. Measuring structural quality of object-oriented softwares via bug propagation analysis on weighted software networks. Journal of Computer Science and Technology, 25(6):1202–1213, 2010.

[PLM+10b] Xing Pu, Ling Liu, Yiduo Mei, Sankaran Sivathanu, Younggyun Koh, and Calton Pu. Understanding performance interference of I/O workload in virtualized cloud environments. In Proc. of International Conference on Cloud Computing (CLOUD), 2010.

[PLZ09] Soyeon Park, Shan Lu, and Yuanyuan Zhou. CTrigger: Exposing atomicity violation bugs from their hiding places. In Proc. of International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2009.

[PSST08] Giovanni Pacifici, Wolfgang Segmuller, Mike Spreitzer, and Asser Tantawi. CPU demand for web serving: Measurement analysis and dynamic estimation. Perform. Eval., 65:531–553, June 2008.

[PVSD12] Boris Petrov, Martin Vechev, Manu Sridharan, and Julian Dolby. Race detection for web applications. In Proc. of Programming Languages Design and Implementation (PLDI), 2012.

[PZX+09] Soyeon Park, Yuanyuan Zhou, Weiwei Xiong, Zuoning Yin, Rini Kaushik, Kyu H. Lee, and Shan Lu. PRES: Probabilistic replay with execution sketching on multiprocessors. In Proc. of Symposium on Operating System Principles (SOSP), 2009.

[QTSZ05] Feng Qin, Joseph Tucek, Jagadeesan Sundaresan, and Yuanyuan Zhou. Rx: Treating bugs as allergies—A safe method to survive software failures. In Proc. of Symposium on Operating System Principles (SOSP), 2005.

[RCKH09] Leonid Ryzhyk, Peter Chubb, Ihor Kuz, and Gernot Heiser. Dingo: Taming device drivers. In Proc. of European Conference on Computer Systems (EuroSys), 2009.

[RKBD14] Foyzur Rahman, Sameer Khatri, Earl T. Barr, and Premkumar Devanbu. Comparing static bug finders and statistical prediction. In Proc. of International Conference on Software Engineering (ICSE), 2014.

[RKS12] Matthew J. Renzelmann, Asim Kadav, and Michael M. Swift. SymDrive: Testing drivers without devices. In Proc. of Operating System Design and Implementation (OSDI), 2012.

[RKW+06] Patrick Reynolds, Charles Killian, Janet L. Wiener, Jeffrey C. Mogul, Mehul A. Shah, and Amin Vahdat. Pip: Detecting the unexpected in distributed systems. In Proc. of Networked Systems Design and Implementation (NSDI), 2006.

[RLT78] Brian Randell, Peter Lee, and Philip C. Treleaven. Reliability issues in computing system design. ACM Computing Surveys, 10(2):123–165, 1978.

[RSSK14] Ajitha Rajan, Subodh Sharma, Peter Schrammel, and Daniel Kroening. Accelerated test execution using GPUs. In Proc. of International Conference on Automated Software Engineering (ASE), pages 97–102, 2014.

[Rus] Paul Rusty Russell. Unreliable Guide To Locking. http://kernelbook.sourceforge.net/kernel-locking.pdf.

[RVS13] Veselin Raychev, Martin Vechev, and Manu Sridharan. Effective race detection for event-driven programs. In Proc. of Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2013.

[RW85] Sandra Rapps and Elaine J. Weyuker. Selecting software test data using data flow information. IEEE Trans. Softw. Eng., 11(4):367–375, April 1985.

[Ryz10] Leonid Ryzhyk. On the Construction of Reliable Device Drivers. PhD thesis, School of Computer Science and Engineering, University of NSW, Sydney 2052, Australia, Jan 2010.

[SBN+97] Stefan Savage, Michael Burrows, Greg Nelson, Patrick Sobalvarro, and Thomas Anderson. Eraser: A dynamic data race detector for multithreaded programs. SIGOPS Oper. Syst. Rev., 31(5):27–37, 1997.

[SC92] Mark Sullivan and Ram Chillarege. A comparison of software defects in database management systems and operating systems. In Proc. of International Symposium on Fault-Tolerant Computing (FTCS), 1992.

[SC07] Yannis Smaragdakis and Christoph Csallner. Combining static and dynamic reasoning for bug detection. In Tests and Proofs, volume 4454 of Lecture Notes in Computer Science, pages 1–16. Springer Berlin Heidelberg, 2007.

[SCA09] Swarup K. Sahoo, John Criswell, and Vikram S. Adve. An empirical study of reported bugs in server software with implications for automated bug diagnosis. Tech. Report 2142/13697, University of Illinois, 2009.

[SCA10] Swarup Kumar Sahoo, John Criswell, and Vikram Adve. An empirical study of reported bugs in server software with implications for automated bug diagnosis. In Proc. of International Conference on Software Engineering (ICSE), 2010.

[Sch95] Beth A. Schroeder. On-line monitoring: A tutorial. IEEE Computer, 28(6):72–78, Jun 1995.

[Sen08] Koushik Sen. Race directed random testing of concurrent programs. In Proc. of Programming Languages Design and Implementation (PLDI), 2008.

[SKAZ04] Sudarshan M. Srinivasan, Srikanth Kandula, Christopher R. Andrews, and Yuanyuan Zhou. Flashback: A lightweight extension for rollback and deterministic replay for software debugging. In Proc. of Annual Technical Conference (ATC), 2004.

[SQL] SQL Standards. http://www.jcc.com/resources/sql-standards.

[SSN+09] Susmit Sarkar, Peter Sewell, Francesco Zappa Nardelli, Scott Owens, Tom Ridge, Thomas Braibant, Magnus O. Myreen, and Jade Alglave. The semantics of x86-CC multiprocessor machine code. In Proc. of Symposium on Principles of Programming Languages (POPL), 2009.

[SSV13] João M. Silva, José Simão, and Luís Veiga. Ditto – deterministic execution replayability-as-a-service for Java VM on multiprocessors. In David Eyers and Karsten Schwan, editors, Middleware 2013, volume 8275 of Lecture Notes in Computer Science, pages 405–424. Springer Berlin Heidelberg, 2013.

[STO] Storage engine poll. http://dev.mysql.com/doc/refman/5.0/en/storage-engines.html.

[Sto02] Scott D. Stoller. Testing concurrent Java programs using randomized scheduling. In Proc. of Workshop on Runtime Verification (RV), 2002.

[SWS+09] Martin Süßkraut, Stefan Weigert, Ute Schiffel, Thomas Knauth, Martin Nowack, Diogo Becker de Brum, and Christof Fetzer. Speculation for parallelizing runtime checks. In Proc. of International Symposium on Stabilization, Safety, and Security of Distributed Systems (SSS), 2009.

[TER] Tera-BLAST White Paper - TimeLogic. http://www.timelogic.com/documents/TimeLogic_Tera-BLAST_whitepaper_v1.0.pdf.

[TLL+14] Lin Tan, Chen Liu, Zhenmin Li, Xuanhui Wang, Yuanyuan Zhou, and Chengxiang Zhai. Bug characteristics in open-source software. Empirical Software Engineering, 19(6):1665–1705, 2014.

[TRI] Trinity: A Linux system call fuzzer. http://codemonkey.org.uk/projects/trinity/.

[TUYT07] Rachel Tzoref, Shmuel Ur, and Elad Yom-Tov. Instrumenting where it hurts: An automatic concurrent debugging technique. In Proc. of International Symposium on Software Testing and Analysis (ISSTA), 2007.

[UBH09] Iñigo Urteaga, Kevin Barnhart, and Qi Han. REDFLAG: A run-time, distributed, flexible, lightweight, and generic fault detection service for data-driven wireless sensor applications. Pervasive Mob. Comput., 5:432–446, October 2009.

[ULSD04] Volkmar Uhlig, Joshua LeVasseur, Espen Skoglund, and Uwe Dannowski. Towards scalable multiprocessor virtual machines. In Proc. of Conference on Virtual Machine Research And Technology Symposium (VM), 2004.

[uTe] uTest. White paper: Crowdsourced usability testing. http://alexcrockett.com/wp-content/uploads/downloads/Books/Crowdsourced_Usability_Testing.pdf.

[Vaf10] Viktor Vafeiadis. Automatically proving linearizability. In Proc. of International Conference on Computer Aided Verification (CAV), 2010.

[Val94] John D. Valois. Implementing lock-free queues. In Proc. of International Conference on Parallel and Distributed Computing and Systems (PDCS), 1994.

[VBLM07] Ben Vandiver, Hari Balakrishnan, Barbara Liskov, and Sam Madden. Tolerating Byzantine faults in transaction processing systems using commit barrier scheduling. In Proc. of Symposium on Operating System Principles (SOSP), 2007.

[VCFN11] Kaushik Veeraraghavan, Peter M. Chen, Jason Flinn, and Satish Narayanasamy. Detecting and surviving data races using complementary schedules. In Proc. of Symposium on Operating System Principles (SOSP), 2011.

[VEG] Azul Systems - Industry’s Leading Azul Compute Appliances. http://www.azulsystems.com/products/compute_appliance.htm.

[VFS14] Dhaval Vyas, Thomas Fritz, and David Shepherd. Bug reproduction: A collaborative practice within software maintenance activities. In C. Rossitto, Luigina Ciolfi, David Martin, and B. Conein, editors, COOP 2014 - Proceedings of the 11th International Conference on the Design of Cooperative Systems, IFIP – The International Federation for Information Processing. Springer, Nice, France, May 2014.

[Viz07] Mike Vizard. The yin and yang of software development. Queue, 5(4), May 2007.

[VJL07] Jan Wen Voung, Ranjit Jhala, and Sorin Lerner. RELAY: Static race detection on millions of lines of code. In Proc. of European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE), 2007.

[VKM02] Udaykiran Vallamsetty, Krishna Kant, and Prasant Mohapatra. Characterization of e-commerce traffic. In Proc. of Advanced Issues of E-Commerce and Web-Based Information Systems (WECWIS), 2002.

[VT06] Lucian Voinea and Alexandru Telea. How do changes in buggy Mozilla files propagate? In Proc. of Symposium on Software Visualization (SoftVis), 2006.

[VYY09] Martin Vechev, Eran Yahav, and Greta Yorsh. Experience with model checking linearizability. In Proc. of SPIN Workshop on Model Checking Software (SPIN), 2009.

[VZ91] Raj Vaswani and John Zahorjan. The implications of cache affinity on processor scheduling for multiprogrammed, shared memory multiprocessors. SIGOPS Oper. Syst. Rev., 25(5):26–40, September 1991.

[WJKT05] Stefan Wagner, Jan Jürjens, Claudia Koller, and Peter Trischberger. Comparing bug finding tools with reviews and tests. In Proc. of International Conference on Testing of Communicating Systems (Testcom), 2005.

[WLC+11] Zhaoguo Wang, Ran Liu, Yufei Chen, Xi Wu, Haibo Chen, Weihua Zhang, and Binyu Zang. COREMU: A scalable and portable parallel full-system emulator. In Proc. of Symposium on Principles and Practice of Parallel Programming (PPoPP), 2011.

[WS06] Liqiang Wang and Scott D. Stoller. Runtime analysis of atomicity for multi-threaded programs. IEEE Transactions on Software Engineering, 32(2):93–110, February 2006.

[WTH+12] Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, and Junfeng Yang. Sound and precise analysis of parallel programs through schedule specialization. In Proc. of Programming Languages Design and Implementation (PLDI), 2012.

[XBH05a] Min Xu, Rastislav Bodík, and Mark D. Hill. A serializability violation detector for shared-memory server programs. SIGPLAN Not., 40(6):1–14, 2005.

[XBH05b] Min Xu, Rastislav Bodík, and Mark D. Hill. A serializability violation detector for shared-memory server programs. In Proc. of Programming Languages Design and Implementation (PLDI), 2005.

[XMS+07] Min Xu, Vyacheslav Malyugin, Jeffrey Sheldon, Ganesh Venkitachalam, and Boris Weissman. ReTrace: Collecting execution trace with virtual machine deterministic replay. In Proc. of Annual Workshop on Modeling, Benchmarking and Simulation (MoBS), 2007.

[XPZ+10] Weiwei Xiong, Soyeon Park, Jiaqi Zhang, Yuanyuan Zhou, and Zhiqiang Ma. Ad hoc synchronization considered harmful. In Proc. of Operating System Design and Implementation (OSDI), 2010.

[YCW+09] Junfeng Yang, Tisheng Chen, Ming Wu, Zhilei Xu, Xuezheng Liu, Haoxiang Lin, Mao Yang, Fan Long, Lintao Zhang, and Lidong Zhou. MODIST: Transparent model checking of unmodified distributed systems. In Proc. of Networked Systems Design and Implementation (NSDI), 2009.

[YCW+14] Junfeng Yang, Heming Cui, Jingyue Wu, Yang Tang, and Gang Hu. Determinism is not enough: Making parallel programs reliable with stable multithreading. Communications of the ACM, 2014.

[YKKK09] Maysam Yabandeh, Nikola Knezevic, Dejan Kostic, and Viktor Kuncak. CrystalBall: Predicting and preventing inconsistencies in deployed distributed systems. In Proc. of Networked Systems Design and Implementation (NSDI), 2009.

[YMZ+11] Zuoning Yin, Xiao Ma, Jing Zheng, Yuanyuan Zhou, Lakshmi N. Bairavasundaram, and Shankar Pasupathy. An empirical study on configuration errors in commercial and open source systems. In Proc. of Symposium on Operating System Principles (SOSP), 2011.

[YNPP12] Jie Yu, Satish Narayanasamy, Cristiano Pereira, and Gilles Pokam. Maple: A coverage-driven testing tool for multithreaded programs. In Proc. of Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), 2012.

[YRC05] Yuan Yu, Tom Rodeheffer, and Wei Chen. RaceTrack: Efficient detection of data race conditions via adaptive tracking. In Proc. of Symposium on Operating System Principles (SOSP), 2005.

[YSE06] Junfeng Yang, Can Sar, and Dawson Engler. EXPLODE: A lightweight, general system for finding serious storage system errors. In Proc. of Operating System Design and Implementation (OSDI), 2006.

[YTEM04] Junfeng Yang, Paul Twohey, Dawson Engler, and Madanlal Musuvathi. Using model checking to find serious file system errors. In Proc. of Operating System Design and Implementation (OSDI), 2004.

[ZACS11] Cristian Zamfir, Gautam Altekar, George Candea, and Ion Stoica. Debug determinism: The sweet spot for replay-based debugging. In Proc. of Hot Topics in Operating Systems (HotOS), 2011.

[ZAH11] Shahed Zaman, Bram Adams, and Ahmed E. Hassan. Security versus performance bugs: A case study on Firefox. In Proc. of Working Conference on Mining Software Repositories (MSR), 2011.

[Zim95] Philip Zimmermann. PGP Source Code and Internals. MIT Press, Cambridge, MA, USA, 1995.
