Virtual Breakpoints for X86/64

Virtual Breakpoints for x86/64 Gregory Price1∗y August 22, 2019 Abstract Debuggers [1] and other instrumentation tools [3][4][7][11] are becoming increasingly complex as Efficient, reliable trapping of execution in a pro- malware deploys equally complex anti-analysis gram at the desired location is a linchpin tech- techniques. Many techniques require special mem- nique for dynamic malware analysis. The pro- ory allocators, compilers, or complex systems to gression of debuggers and malware is akin to a control the \view" of memory [10, 2]. In some game of cat and mouse - each are constantly in cases, the efficiency or robustness of these tech- a state of trying to thwart one another. At the niques rely on architecture-level quirks that are core of most efficient debuggers today is a com- more of a happy accident, rather than intentional bination of virtual machines and traditional bi- design. For example, SPIDER breakpoints [2] nary modification breakpoints (int3). In this pa- rely on a caching quirk of Intel CPU's to remain per, we present a design for Virtual Breakpoints efficient, and many dynamic binary instrumenta- | a modification to the x86 MMU which brings tion systems [10] are a mess of trampolines and breakpoint management into hardware alongside runtime hooking. page tables. In this paper we demonstrate the Despite the growing complexity, all of these fundamental abstraction failures of current trap- systems must trade between efficiency, reliabil- ping methods, and design a new mechanism from ity, and transparency [4]. If a trapping mech- the hardware up. This design incorporates lessons anism introduces significant overhead, analysis learned from 50 years of virtualization and de- becomes tedious. Traps that can be bypassed bugger design to deliver fast, reliable trapping via detection or eviction inherently unreliable. without the pitfalls of traditional binary modifi- Traps that modify a target program's memory cation. can cause corruption. These trade-offs create a game of whack-a-mole, which suggests we are 1 Introduction simply treating symptoms, rather than address- ing the disease. Trapping solutions such as single-stepping, arXiv:1801.09250v3 [cs.OS] 20 Aug 2019 In the modern age of malware analysis and \fuzzing for dollars", security researchers still seek a ro- debug registers, and full system emulation have bust, transparent, efficient trapping system. The been tried, but all fail to meet the flexibility field heavily leverages virtualization technologies and efficiency requirements to reach wide-spread [2, 5, 6, 7, 8, 9, 14, 10]. Unfortunately, these tech- adoption. Single stepping a large, complex pro- nologies inherit the failings of traditional trap- gram (like an OS) is extremely inefficient [1][2]. ping methods while adding the complexity of Usage of debug registers can be detected[2]. Fi- architecture-specific implementations. nally, pure-emulation is simply no longer suffi- cient to run and debug modern operating sys- ∗ *This work was supported by Raytheon CSI tems. y1Gregory Price is price.gr at husky.neu.edu Researchers have made significant headway formation is stored on a byte-per-byte basis with in solving reliability and efficiency problems by the data frame. leveraging virtualization[4, 2, 3, 8], but most still For each byte read from a breakpointed page, rely on some form of binary modification. The an 8-bit value is retrieved from the breakpoint difficulty with binary modification for debugging buddy-frame and used to determine if an inter- hostile programs is the lack of assurance the mod- rupt is generated. This byte implements stan- ifications are not a) detected, b) bypassed, or dard read/write/execute breakpoint settings, and c) introducing undefined behavior. This is a generates a debug break prior to executing the result of these trapping mechanisms being de- target instruction if the conditions are met. The signed prior to the advent of modern security remaining bits remain open for future develop- and reverse engineering requirements. ment. \Cutting edge" x86/64 debugging systems such By doing away with binary modification as as SPIDER [2] may accomplish (to an extent) the de facto standard of breakpointing, we gain the goal of stealth and efficiency, but still fail reliability, transparency, and guaranteed correct- to mitigate the corruption and reliability prob- ness of target program execution | all without lem with binary modification breakpoints. Bi- sacrificing flexibility or efficiency. nary modification systems carry a presumption of well-behaved programs and lack of user error. 2 Background For example, if a 5-byte instruction is located at memory address 0x10, and a trap instruction is Debuggers and dynamic analysis tools tradition- inserted at memory address 0x12, the resulting ally implement breakpoints in one of three ways: behavior could be a range of unintended con- single-stepping, debug registers, and binary mod- sequences (Illegal Instructions, Jumps to Ran- ification [1]. Each mechanism is not without dom Memory, Memory Corruption, etc). SPI- their flaws, and much research [3, 4, 5, 6, 7, 8, 11] DER cannot handle this modification or user- has been done on the topic of mitigating the ver- error case directly, handling the degenerate case itable list of issues. by simply evicting the trap and falling back to In a review of debugging technology published emulation (trading reliability for efficiency). in 1990[16], Vernon Paxson lays out a list of In this paper, we propose a memory man- techniques for debugging software that is eerily agement unit extension for virtual machine de- familiar. Despite almost 30 years of research bugging that addresses each of these core issues. since then, few if any new hardware-supported We claim that the corruption issue is the criti- debugging capabilities have been developed by cal piece of the puzzle that, if solved, will lead major hardware manufacturers. Even the \new" to robust, efficient debuggers at both the system ARMv8.5 Memory Tagging Extensions (MTE) and virtual machine level. The current field of are not truly new, with Paxson discussing fully debuggers attempt to fix this fundamentally un- tagged architectures existing as far back as 1982 solvable problem by way of building \a better (possibly earlier). mousetrap", but the problem they are attempt- Despite modern tools trying to mitigate the ing to solve reduces to the halting problem. flaws of dated trapping mechanisms, each subse- Our proposed solution introduces a \break- quent system increases complexity, reduces effi- point buddy-frame" and adds a \breakpoint bit" ciency, or narrows in applicability. All still fall to page table entries. When a byte on a page prey to anti-debugging techniques such as timing with this bit set is accessed, the breakpoint frame attacks, code integrity checks, and just-in-time is referenced (in hardware) to determine if that compilation or code relocation. address has a breakpoint set. This breakpoint in- 2.1 Traditional Trapping Methods ical headaches, namely that it is easily detected and may cause corruption. Much research has Each established method of trapping exhibits their been done to hide these breakpoints, but only own unique failures. Each battles with trying to recently has an efficient solution emerged [2]. achieve flexibility, efficiency, transparency, and When Intel and AMD released virtualized Page reliability - but none seem to solve for all four. Table support (known as Extended or Nested Single-stepping approaches make executing Page Tables [12, 13]), the idea of transparent bi- large, complex programs unbearably slow. Typi- nary modification came to fruition with SPIDER cally implemented via the use of the eflags/rflags [2]. SPIDER introduced a binary modification register, a full context switch between debug- mechanism that made use of extended page ta- ger and debuggee is required on each instruc- bles to split the \read-write" view of guest data tion. This can push execution time to be orders from the \execute" view of data. By doing so, a of magnitude longer than the original program guest cannot view a trap set via binary modifi- [2]. Even emulation approaches, which can be cation, because the guest may only read from a viewed as a form of binary interpreter, are sim- sanitized view of memory. ply too slow for general use. Unfortunately, this transparent breakpoint- Debug Registers are an efficient but finite re- ing system still fails to be sufficiently flexible source that are not practical nor easily virtu- and reliable. First, it falls victim to the \Crit- alized. On x86 there are only 4 debug regis- ical Byte Problem" which will be described in ters (DR0-DR4), limiting a debugger to 4 to- the next section. Second, because the host must tal watch/break points. Further, because this is maintain consistency between data and execu- a physical register limitation, the debuggee can tion views, the guest still has a mechanism (writ- typically detect whether these registers are being ing to its code pages) with which to for a trap used [2][10]. eviction. Finally, while binary modification meets the Moving forward, we will demonstrate that requirement of efficient execution, it is easily de- systems relying on binary modification cannot tected by a debuggee which monitors the integrity achieve perfect flexibility and reliability. If we of its own code [2, 10, 5, 6]. On x86, these traps hope to accomplish truly transparent and effi- are implemented by placing an \int3" (debug cient breakpointing, then we must design it from break) instruction at the given address. To see the hardware up. the problem with this technique, imagine a debuggee that periodically hashes its entire read- only codebase. It would be able to detect this 3 Overview change, and modify its behavior. It is still not apparent whether these tradi- In this section we discuss the goals of an \opti- tional breakpointing methods can meet the re- mal" trapping system.

Load more