Pure User Mode Deterministic Replay on Windows
Total Page:16
File Type:pdf, Size:1020Kb
Pure User Mode Deterministic Replay on Windows Atle Nærum Eriksen Thesis submitted for the degree of Master in Informatics: programming and networks 60 credits Department of Informatics Faculty of mathematics and natural sciences UNIVERSITY OF OSLO Autumn 2017 Pure User Mode Deterministic Replay on Windows Atle Nærum Eriksen August 1, 2017 Abstract The ability to record and replay program executions has many interesting applications such as debugging in the backwards direction, discovering and fixing the source of non-deterministic bugs and data races and retracing the steps of a system intrusion. Unfortunately, the power of deterministic replay tools is underutilized by the general public as the tools are either too difficult to deploy or unable to fulfill the performance and log size requirements of the user. As it happens, the majority of the research has been aimed at implementing such tools for Linux, and other platforms, including Windows, have mostly been neglected. In this thesis we look at whether it is possible to implement a determin- istic replay tool for the Windows platform that is easily deployable (user mode only without operating system support { this entails no OS modifi- cations or drivers), can record all system calls and their side-effects (even if unknown), works on large programs (1 GB+ RAM), and has a high recording performance (≈2x slowdown). We found that the challenges deterministic replay tools are facing in user mode are exacerbated on Windows due to a lack of documentation and a more restrictive API. Despite this we came up with a design proposal that solves all the problems necessary to implement a deterministic replay tool that satisfies all our requirements for the Windows platform. We present novel techniques to record thread scheduling and non-deterministic instructions. We also describe in detail how to recreate the address space of a recorded program in which code can be executed and access resources directly without instrumentation or modifications just like in the original program. An alternative novel approach to this technique is also suggested. None of the methods rely on operating system support. Although the design proposal remains theoretical we have implemented two partial prototypes that were used to experiment on a small dummy pro- gram. Our findings show that it is reasonable to expect a recording slowdown on real programs in the range of 1-5x that will stay consistent even on pro- grams with high memory usage. Regardless, the results are not conclusive and should be taken with a grain of salt. Acknowledgements I would like to express my sincere gratitude to my advisors Trond Arne Sørby and P˚alHalvorsen for giving me the opportunity to work on this amazing project. Considering the circumstances this was not a given, and I wanted to let you know that I am eternally grateful, in particular for giving me the freedom to run my own show. I experienced significant growth as a programmer and I consider the project to be one of the most difficult I have ever taken on { and it was totally worth it! I appreciated all of your input even though I did not have the time to act upon all of it. This thesis helped me grow as a person in unexpected ways and I thank you for this life-changing opportunity. I also want to thank Duncan Ogilvie, known in the community as mrex- odia, for starting and maintaining the x64dbg project, the debugger I have been using while implementing this project. Let me just say this is a seri- ously good debugger. I have spent, or should I say "time wasted debugging", literally weeks of my life staring at assembly code in the time I have been working on this thesis, and the other debuggers I would normally use just kept crashing over and over due to the abnormal behavior of my program (think malware), but x64dbg failed me only a few times in the timespan of a whole year. I can guarantee you this thesis would not exist without this debugger. A shout-out to all the developers on this project! I dedicate this thesis to Lisa, Erika and Miku. Thank you for your never- ending love and support. i Contents 1 Introduction 1 1.1 Background and Motivation . 1 1.2 Problem Statement . 2 1.3 Limitations and Scope . 5 1.4 Research Method . 6 1.5 Main Contributions . 6 1.6 Outline . 8 2 Technical Background 10 2.1 Windows Operating System Background . 10 2.1.1 Processes and Threads . 10 2.1.2 Handles . 15 2.1.3 Virtual Memory . 16 2.1.4 Libraries . 20 2.1.5 WOW64 . 23 2.1.6 Asynchronous Procedure Calls . 25 2.1.7 User Mode Callbacks . 28 2.1.8 Exception Handling . 29 2.1.9 System Calls . 32 2.2 Program Analysis Background . 34 2.2.1 Hooking . 34 2.2.2 Dynamic Binary Instrumentation . 36 2.2.3 Software Emulation . 37 2.2.4 DLL Injection . 39 3 Overview of Deterministic Replay 41 3.1 Program Tracing . 42 3.2 Deterministic Replay . 42 ii 3.3 Sources of Non-Determinism . 43 3.4 Taxonomy of Deterministic Replay . 46 3.4.1 Technical Difficulties of Deterministic Replay . 46 3.4.2 Single-Threaded vs Multi-Threaded Schemes . 49 3.4.3 Single-Processor vs Multi-Processor Schemes . 50 3.4.4 Recording Granularity . 51 3.4.5 Abstraction Levels . 52 3.4.6 Fidelity . 55 3.4.7 Apples and Oranges . 56 3.5 Applications of Deterministic Replay . 57 3.5.1 Online Program Analysis . 57 3.5.2 Fault Tolerance . 58 3.5.3 Intrusion Analysis . 58 3.6 Debugging Using Deterministic Replay . 60 3.6.1 Cyclic Debugging . 60 3.6.2 Time-Travel Debugging . 61 3.6.3 Automated Analysis . 63 3.6.4 Characteristics of a Deterministic Replay Debugger . 64 4 Deterministic Replay in User Mode 66 4.1 Operating System Modifications . 67 4.1.1 Upsides . 67 4.1.2 Downsides . 68 4.1.3 Conclusion . 69 4.2 User Mode Virtual Machines . 70 4.2.1 Recording in User Mode Virtual Machines . 71 4.2.2 Conclusion . 72 4.3 Instrumentation . 72 4.3.1 Instrumentation in User Mode Virtual Machines . 72 4.3.2 Instrumentation in Native Programs . 73 4.4 Challenges of Deterministic Replay in User Mode . 76 4.4.1 Transparency . 76 4.4.2 Recording Non-Deterministic Instructions . 78 4.4.3 Recording Thread Scheduling . 80 4.4.4 Replaying Thread Scheduling . 81 4.4.5 Recording System Calls . 86 iii 5 Dragoon: A Deterministic Replay Tool Design Proposal for Windows 89 5.1 Design Requirements and Limitations . 92 5.1.1 Availability . 92 5.1.2 Flexibility . 93 5.1.3 Performance . 95 5.1.4 Self-Modifying Code . 96 5.1.5 Limitations . 96 5.2 Design Overview . 97 5.2.1 User Space as a Self-Contained Black Box . 97 5.2.2 Unrecorded Zones . 99 5.2.3 System Components . 100 5.2.4 Logging . 102 5.2.5 Transparency . 104 5.3 Recording Component . 105 5.3.1 Recording System Calls . 105 5.3.2 Recording Callbacks . 110 5.3.3 Recording Thread Scheduling . 111 5.3.4 Recording Non-Deterministic Instructions . 114 5.3.5 Recording SharedUserData . 114 5.3.6 Recording Self-Modifying Code . 115 5.3.7 Recording GUI, User Input, Graphics and Audio . 115 5.4 Replay Component . 117 5.4.1 Replay Component Implementations . 117 5.4.2 Replaying Events . 122 5.4.3 Replaying Allocations . 126 5.4.4 Replaying Self-Modifying Code . 129 5.4.5 Post-Processing . 129 6 Evaluation 131 6.1 Dummy Program . 132 6.2 Memory Usage . 133 6.2.1 Conclusion . 133 6.3 Log Size . 134 6.3.1 Conclusion . 135 6.4 Performance . 135 6.4.1 Allocations . 136 6.4.2 Checksumming . 137 iv 6.4.3 Manually Handling System Calls . 139 6.4.4 Conclusion . 141 7 Conclusion 143 7.1 Summary . 143 7.2 Main Contributions . 145 7.3 Future Work . 147 Appendices 149 A Source Code 150 v List of Tables 6.1 Recording slowdown factor for varying system call spread in- tervals. 141 vi Chapter 1 Introduction 1.1 Background and Motivation Computer programs are deterministic; if given the same inputs on every ex- ecution, they will always produce the same outputs. This statement remains true even if we consider sources of randomness such as rand() and rdtsc, be- cause the condition states: given the same inputs { ergo if rand() returns 42 twice the program will output the exact same result both times. The skeleton of the program remains static after compile-time and it is only the inputs that can change, and sources of randomness are considered as inputs to the program for the reason that they could not be predicted and optimized at compile-time. The property of determinism helps us in formally evaluating a program, because the sequence of steps taken inside the program and the resulting outcome may be predicted based on the given inputs. Unfortunately, this is not the case in practice. Any program of reasonable size will contain sources of non-determinism such as user input, network- and disk I/O, OS scheduling, and current-time-retrieval. Any kind of program input that can not be predicted beforehand can be classified as non-deterministic. Even in the case where no such inputs have been added by the programmer it may have been added by the compiler. If the program's execution path can not be predicted, it can be very difficult to formally assess the correctness, robustness, safety, and resource usage of the program as required by its specification.