<<

Virtual Breakpoints for x86/64

Gregory Price1∗†

August 22, 2019

Abstract [1] and other instrumentation tools [3][4][7][11] are becoming increasingly complex as Efficient, reliable trapping of execution in a pro- malware deploys equally complex anti-analysis gram at the desired location is a linchpin tech- techniques. Many techniques require special mem- nique for dynamic malware analysis. The pro- ory allocators, compilers, or complex systems to gression of debuggers and malware is akin to a control the “view” of memory [10, 2]. In some game of cat and mouse - each are constantly in cases, the efficiency or robustness of these tech- a state of trying to thwart one another. At the niques rely on architecture-level quirks that are core of most efficient debuggers today is a com- more of a happy accident, rather than intentional bination of virtual machines and traditional bi- design. For example, SPIDER breakpoints [2] nary modification breakpoints (int3). In this pa- rely on a caching quirk of Intel CPU’s to remain per, we present a design for Virtual Breakpoints efficient, and many dynamic binary instrumenta- — a modification to the x86 MMU which brings tion systems [10] are a mess of trampolines and breakpoint management into hardware alongside runtime hooking. page tables. In this paper we demonstrate the Despite the growing complexity, all of these fundamental abstraction failures of current trap- systems must trade between efficiency, reliabil- ping methods, and design a new mechanism from ity, and transparency [4]. If a trapping mech- the hardware up. This design incorporates lessons anism introduces significant overhead, analysis learned from 50 years of virtualization and de- becomes tedious. Traps that can be bypassed bugger design to deliver fast, reliable trapping via detection or eviction inherently unreliable. without the pitfalls of traditional binary modifi- Traps that modify a target program’s memory cation. can cause corruption. These trade-offs create a game of whack-a-mole, which suggests we are 1 Introduction simply treating symptoms, rather than address- ing the disease. Trapping solutions such as single-stepping, arXiv:1801.09250v3 [cs.OS] 20 Aug 2019 In the modern age of malware analysis and “fuzzing for dollars”, security researchers still seek a ro- debug registers, and full system emulation have bust, transparent, efficient trapping system. The been tried, but all fail to meet the flexibility field heavily leverages virtualization technologies and efficiency requirements to reach wide-spread [2, 5, 6, 7, 8, 9, 14, 10]. Unfortunately, these tech- adoption. Single stepping a large, complex pro- nologies inherit the failings of traditional trap- gram (like an OS) is extremely inefficient [1][2]. ping methods while adding the complexity of Usage of debug registers can be detected[2]. Fi- architecture-specific implementations. nally, pure-emulation is simply no longer suffi- cient to run and debug modern operating sys- ∗ *This work was supported by Raytheon CSI tems. †1Gregory Price is price.gr at husky.neu.edu Researchers have made significant headway formation is stored on a byte-per-byte basis with in solving reliability and efficiency problems by the data frame. leveraging virtualization[4, 2, 3, 8], but most still For each byte read from a breakpointed page, rely on some form of binary modification. The an 8-bit value is retrieved from the breakpoint difficulty with binary modification for buddy-frame and used to determine if an inter- hostile programs is the lack of assurance the mod- rupt is generated. This byte implements stan- ifications are not a) detected, b) bypassed, or dard read/write/execute breakpoint settings, and c) introducing undefined behavior. This is a generates a debug break prior to executing the result of these trapping mechanisms being de- target instruction if the conditions are met. The signed prior to the advent of modern security remaining bits remain open for future develop- and reverse engineering requirements. ment. “Cutting edge” x86/64 debugging systems such By doing away with binary modification as as SPIDER [2] may accomplish (to an extent) the de facto standard of breakpointing, we gain the goal of stealth and efficiency, but still fail reliability, transparency, and guaranteed correct- to mitigate the corruption and reliability prob- ness of target program execution — all without lem with binary modification breakpoints. Bi- sacrificing flexibility or efficiency. nary modification systems carry a presumption of well-behaved programs and lack of user error. 2 Background For example, if a 5-byte instruction is located at memory address 0x10, and a trap instruction is Debuggers and dynamic analysis tools tradition- inserted at memory address 0x12, the resulting ally implement breakpoints in one of three ways: behavior could be a range of unintended con- single-stepping, debug registers, and binary mod- sequences (Illegal Instructions, Jumps to Ran- ification [1]. Each mechanism is not without dom Memory, Memory Corruption, etc). SPI- their flaws, and much research [3, 4, 5, 6, 7, 8, 11] DER cannot handle this modification or user- has been done on the topic of mitigating the ver- error case directly, handling the degenerate case itable list of issues. by simply evicting the trap and falling back to In a review of debugging technology published emulation (trading reliability for efficiency). in 1990[16], Vernon Paxson lays out a list of In this paper, we propose a memory man- techniques for debugging software that is eerily agement unit extension for virtual machine de- familiar. Despite almost 30 years of research bugging that addresses each of these core issues. since then, few if any new hardware-supported We claim that the corruption issue is the criti- debugging capabilities have been developed by cal piece of the puzzle that, if solved, will lead major hardware manufacturers. Even the “new” to robust, efficient debuggers at both the system ARMv8.5 Memory Tagging Extensions (MTE) and virtual machine level. The current field of are not truly new, with Paxson discussing fully debuggers attempt to fix this fundamentally un- tagged architectures existing as far back as 1982 solvable problem by way of building “a better (possibly earlier). mousetrap”, but the problem they are attempt- Despite modern tools trying to mitigate the ing to solve reduces to the halting problem. flaws of dated trapping mechanisms, each subse- Our proposed solution introduces a “break- quent system increases complexity, reduces effi- point buddy-frame” and adds a “breakpoint bit” ciency, or narrows in applicability. All still fall to page table entries. When a byte on a page prey to anti-debugging techniques such as timing with this bit set is accessed, the breakpoint frame attacks, code integrity checks, and just-in-time is referenced (in hardware) to determine if that compilation or code relocation. address has a breakpoint set. This breakpoint in- 2.1 Traditional Trapping Methods ical headaches, namely that it is easily detected and may cause corruption. Much research has Each established method of trapping exhibits their been done to hide these breakpoints, but only own unique failures. Each battles with trying to recently has an efficient solution emerged [2]. achieve flexibility, efficiency, transparency, and When Intel and AMD released virtualized Page reliability - but none seem to solve for all four. Table support (known as Extended or Nested Single-stepping approaches make executing Page Tables [12, 13]), the idea of transparent bi- large, complex programs unbearably slow. Typi- nary modification came to fruition with SPIDER cally implemented via the use of the eflags/rflags [2]. SPIDER introduced a binary modification register, a full between debug- mechanism that made use of extended page ta- ger and debuggee is required on each instruc- bles to split the “read-write” view of guest data tion. This can push execution time to be orders from the “execute” view of data. By doing so, a of magnitude longer than the original program guest cannot view a trap set via binary modifi- [2]. Even emulation approaches, which can be cation, because the guest may only read from a viewed as a form of binary , are sim- sanitized view of memory. ply too slow for general use. Unfortunately, this transparent breakpoint- Debug Registers are an efficient but finite re- ing system still fails to be sufficiently flexible source that are not practical nor easily virtu- and reliable. First, it falls victim to the “Crit- alized. On x86 there are only 4 debug regis- ical Byte Problem” which will be described in ters (DR0-DR4), limiting a to 4 to- the next section. Second, because the host must tal watch/break points. Further, because this is maintain consistency between data and execu- a physical register limitation, the debuggee can tion views, the guest still has a mechanism (writ- typically detect whether these registers are being ing to its code pages) with which to for a trap used [2][10]. eviction. Finally, while binary modification meets the Moving forward, we will demonstrate that requirement of efficient execution, it is easily de- systems relying on binary modification cannot tected by a debuggee which monitors the integrity achieve perfect flexibility and reliability. If we of its own code [2, 10, 5, 6]. On x86, these traps hope to accomplish truly transparent and effi- are implemented by placing an “int3” (debug cient breakpointing, then we must design it from break) instruction at the given address. To see the hardware up. the problem with this technique, imagine a de- buggee that periodically hashes its entire read- only codebase. It would be able to detect this 3 Overview change, and modify its behavior. It is still not apparent whether these tradi- In this section we discuss the goals of an “opti- tional breakpointing methods can meet the re- mal” trapping system. Next, we will construct quirements of an “optimal” solution. In-fact, we an abstraction of how present-day trapping sys- claim binary modification produces an abstrac- tems are implemented and identify violations of tion that may guarantee it can never provide an our goals. What we find is that current systems “optimal” solution. are attempting to solve an impossible problem.

2.2 “Stealthy” VM Breakpoints 3.1 Goals The first virtual machine breakpointing systems To start, we borrow the requirements laid out by utilized binary modification as a primary trap- SPIDER[2], with one modification made to the ping mechanism. This carried with it all the typ- requirement of Flexibility. 1. Flexibility : Ability to set a trap at any memory address.

2. Efficiency : Maintain high performance

3. Transparency : The target program should not be able to detect breakpoints

4. Reliability : The target program should not be able to bypass or tamper with break- points.

When we modify flexibility to state any mem- ory address instead of any instruction, we find that no binary modification solution can accom- plish all four requirements. Even the best sys- Figure 1: Standard x86 processor Read-Eval- tem, given a degenerate case, falls back on al- Execute loop. Breakpoints fire on execute. ternate trapping methods which violate at least one requirement. For example, when a debuggee overwrites instrumented code under SPIDER, bi- nary modification traps must be evicted from the behavior and the full length of the instruction. modified page to avoid introducing undefined be- When a trap is aligned correctly, the instruction havior. We have dubbed this issue the ”Critical is effectively replaced in its entirety. When a Byte Problem”. trap is misaligned, the 0xCC byte is decoded as part of the jump address. 3.2 The “Critical Byte Problem” Binary modification breakpoints depend on the execution of an instruction to trigger a break- point (Figure 1). On x86, this is ac- complished by replacing the first byte of an in- struction with an int3 instruction (binary 0xCC) (Figure 2). This dependence on execution of de- buggee code to determine debugger behavior is a Figure 2: Misaligned breakpoints cause unde- form of implicit trust. The debugger trusts the fined behavior binary will behavior nicely, while the debuggee trusts that the modification will not introduce What the figure 2 shows is that our “optimal undefined behavior. system” should not depend on the execution of This causes what we have dubbed the “Criti- debuggee instructions, nor should it modify de- cal Byte Problem”. If a binary modification trap buggee data. We should avoid trusting the de- is set on any byte other than the first of an in- buggee to provide debugger functionality, and we struction, it introduces undefined behavior. Fig- should maintain the integrity of debuggee data ure 2 shows an example of how a jmp instruction to guarantee correctness. can be overwritten with an int3 instruction, or Systems such as SPIDER attempt to build modified to jump to the incorrect place. complex solutions to this problem, but only end The processor reads and evaluates the first up trading efficiency for reliability. Their sys- byte of an instruction to determine the general tem does not correct for user-error, and when instrumented code pages are modified, a “Code machines. Extended (Intel) or Nested (AMD) Modification Handler” is invoked — usually to Page Tables, as in figure 4, create two layers of evict the trap and fall back to emulation. Fur- page tables, where each guest memory reference ther, no binary modification breakpoint system must go through a second layer of translation to can programmatically re-set a breakpoint over reach the physical memory frame. These page data that has been overwritten by the debuggee, tables are appropriately named the “Guest Page as determining where in an arbitrary program is Table” and the “Host Page Table”. “safe” to instrument maps directly to the halt- ing problem (the problem depends entirely on the behavior of the program). What we find is that the critical byte problem is a symptom of a larger abstraction failure. The next section will de-construct this failure, and show how providing a proper virtualization layer can mitigate the problem.

3.3 Memory Abstractions It’s useful to step back and look at how data in a virtualized system is segregated by owner- ship. First, in Figure 3, we examine virtual mem- ory implementation on x86. The page tables, in Figure 4: Extended Page Tables still share a software or in hardware, can be viewed as being single frame for data and breakpoints owned by the . Conversely, the When we build a similar abstraction model contents of the data frame can be viewed as being for SPIDER (figure 5 ), we see a pattern emerge owned by the software. Most Operating Systems when we focus on “ownership” of resources. do not allow programs to directly modify page table contents for security reasons. This concept of will be useful in designing our optimal breakpoint solution later.

Figure 5: SPIDER also uses a frame containing data and breakpoints

Figure 3: Virtual Memory on x86 In all three abstractions, we can see that de- bugger and debuggee data are contained within Virtualization extensions for x86 on both In- the same frame when binary modification traps tel [12] and AMD [13] processors provide a sec- are introduced. A userland debugger, such as ond layer of memory virtualization for virtual GDB, places int3 instructions directly into the program’s data frame. The same mechanism is unit of a processor, some care must be taken used in naive virtual machine debugging, as seen to extend it in such a way that is reasonably in Figure 4. Likewise under SPIDER, despite the friendly to existing operating systems. Our de- read/write view of program data being sanitized sign is no exception, so we must identify a solu- of debugger data, the execution view still holds tion that leverages existing Operating Systems both debugger and debuggee data. structures. Likewise, we must retain backwards Given this type of trap and debugger design compatibility with all existing trapping mecha- design, an instrumented program can trivially nisms. read and/or interact with breakpoints — or cause a change in debugger behavior (via eviction). A 4.1 Breakpoint Virtualization Layer program debugged under SPIDER, despite not being able to see the breakpoints, can still force Unlike developing software breakpoint systems, an eviction by blindly overwriting its own code a similar hardware-based breakpoint virtualiza- pages (a simple code-migration technique may tion system has extremely strict limitations and be sufficient). There exists no mechanism with requirements. We claim any MMU modification which to prevent this, as the program is trusted that supports a separate view of data and break- with complete control over its data. points must adhere to (at a minimum) the fol- In summary, even the most effective mech- lowing three requirements to have a chance at anism today fails to fully mitigate breakpoint being adopted: eviction (either outright, or efficiently) due to 1. TLB Compatibility placing debugger and debuggee data in the same data frame. When a breakpoint overwrite oc- 2. Hardware-Friendly Translation Mechanism curs, current solutions must either fall back to another execution method, use a more complex 3. Flexible Allocation and Use trapping mechanism, or simply evict the break- point all together. To be TLB compatible, any design must be In the following section, we will propose a limited to using only the machine physical ad- system which entirely separates debuggee data dress and the page size. When using a virtual from debugger data (breakpoints). This simpli- machine, the use of host virtual addresses is pro- fies the handling of degenerate cases, mitigates hibited because the TLB stores only a direct the “Critical Byte Problem”, and inherently pro- translation from guest virtual to machine phys- vides transparency. ical address[12]. We are afforded the page size from the MMU being set up by the operating system to walk a predetermined structure, how- 4 Design ever this could also be configurable via a new Model Specific Register. When new hardware-supported trap mechanisms Next, the mechanism of translation from a stopped appearing, malware analysis was not a data address to a breakpoint address must be global industry, and the challenges and require- simple and fast. Requiring multiple additional ments related to the work hardly dreamt of. Given memory dereferences during an instruction or this, it is worth exploring extensions which may data fetch may not be feasible. require new hardware support. Our focus will Finally, the mechanism must be optional. Re- be on solving the “Critical-Byte Problem” with quiring constant use on all pages would effec- a new virtualization extension to the x86 Mem- tively halve the size of usable memory and in- ory Management Unit (MMU). troduce significant overhead. While this may be When modifying the memory management considered a fair trade off for a specialized sys- tem, a general purpose system must allow op- tional use.

Figure 7: Virtualization layer to translate phys- ical address to buddy frame addresses

to use pages larger than the standard 4KB as long as this information is retrievable in hard- ware by the MMU. The last requirement dictates an agreement between the operating system and the MMU about how page frames will be allocated and managed. Figure 6: Individually contiguous buddy frames This is accomplished through the use of a page avoid the need for large contiguous areas of mem- table entry which has software-defined bits. ory

When applying the first two requirements strictly, we find a design that uses physically contigu- ous page frames to be the obvious solution. We call this a Buddy Frame. Implementing buddy frames on a per-frame basis, rather than in bulk, allows us to avoid allocating large contiguous areas memory (figure 6 ). This frame could be global (1 per physical frame) or per-cpu (1 buddy frame per physical or virtual cpu). A per-cpu buddy frame system would require the use of the cpu id (0-n) during address transla- tion to determine the offset from the data frame Figure 8: Virtual Breakpoints. Data is to buddy frame. It may require additional TLB “debuggee-only”, Breakpoints are “debugger- and Cache invalidation as well. There are some only”. complexities with a per-cpu option that are not considered within this work. We leave this de- We propose the addition of a “Breakpoint sign extension up to an implementer to experi- Bit” to the page table entry (figure 8). Much ment with. For the remainder of this work, we like the “present bit” determines whether the will work with a single buddy frame design. MMU produces a page-fault, the breakpoint-bit The translation from data address to break- would determine whether the MMU would do a point address is then a simple addition of the subsequent breakpoint-frame lookup for an ac- page-size (figure 7 ). This also makes our solu- cessed address. Breakpoint info would be stored tion extensible to operating systems configured in byte-for-byte parity with the data frame. Fig- ure 8 also shows that our new “ownership” ab- Finally, in a naive system, care must be taken straction has resolved the issue of debugger and not to swap out buddy frames when their data- debuggee data living within the same data frame. frame counterparts are still in memory (and vice- This design can be seen as applying the idea versa). The operating system’s swapping algo- of Virtual Memory (Figure 3) to the concept of rithm could account for this by pinning the frame breakpoints. This is the provenance of the name or by utilizing an additional bit in the Page Ta- Virtual Breakpoint. ble Entry that denotes whether a given frame is itself a breakpoint frame. This work leaves this 4.2 OS Modifications as future work for OS and debugger designers. Given the above hardware implementation, the 4.3 Analysis of Design OS modifications to support virtual breakpoints are straight-forward. An Operating System must Using the above design, we re-examine the orig- provide four new mechanisms: inal design goals we set out to accomplish.

1. Buddy Frame Allocation • Flexibility: Breakpoints can be set on any address, and the allocation of buddy-frames 2. Page Table Entry “Breakpoint Bit” is limited to just those pages where break- 3. Breakpoint Manipulation API points are set. At worst the number of frames can increases 4. Buddy Frame Swap Compatibility linearly with the number of breakpoints. The first two mechanisms are solved by im- This can double memory usage. However, plementing a new kernel allocator option. This we note that SPIDER breakpoints suffer option would allocate two physically contiguous from the same problem [2]. frames, and set the breakpoint bit on the data • Efficiency: Debuggee execution now occurs frame’s page table entry. Buddy Frame Alloca- purely in bare-metal without requiring ad- tion may be flexibly applied if Copy-On-Write ditional instrumentation for misbehaved pro- is required for an instrumented page. The in- grams. heriting task may either automatically allocate its own buddy page during Copy-On-Write, or it A debuggee now executes and performs I/O may leave it behind. The author notes that this on the same data frame without incurring is an example of how such a hardware extension a context switch. A context switch on sys- may open new opportunities for debugger design. tems such as SPIDER can take thousands The third mechanism is a new kernel API of instructions [2], and occurs twice per which allows for the breakpoint frame to be ac- fault (guest exit and entry). Instead, it is cessed in a pre-defined manner by the debugger. possible this solution may reduce this cost A per-byte breakpoint API could be used for pre- by distributing a small number of cycles cision, and a per-page optimization could be pro- per instruction on an instrumented page. vided if large breakpoints are needed. This API When the number of breakpoints is small gives debugger and instrumentation developers a and concentrated to a small portion of code standardized way to set breakpoints. (a typical use-case), performance suffers only This system places no limitation set on acces- when utilizing an instrumented page. In sibility of debuggee data by the debugger, tradi- fact, performance should be no worse than tional binary modification breakpoints can still SPIDER’s stealthy breakpoints for self-modifying be used. The retains backward compatibility for code given the similar design. As the num- all existing debugging platforms. ber of breakpoint frames grow, the number of memory references grows by a factor of • raise an exception when breakpoint terms at most 2 (one for data, one for breakpoint are met information). 5.2 Modified Linux Kernel • Transparency: Complete segregation of de- buggee and debugger data ensures the de- • page table entry breakpoint-bit modifica- buggee has no avenue with which to see tion breakpoint information. • breakpoint page allocation option/interface • Reliability: Complete segregation of debuggee for buddy frames and debugger data ensures the debuggee has no avenue with which to affect break- • breakpoint management interface (set break- point information. point on given address) • virtual breakpoint interrupt handler In the worst case scenario, a breakpoint is set on every page of a target. This doubles memory • add memory mapping modifier (modify non- usage and causes an additional memory refer- breakpointed page to add virtual break- ence to occur per memory reference. In both point page) cases, existing systems exhibit at least as bad performance, if not worse. SPIDER breakpoints • memory swap algorithm modification to avoid double memory usage per utilized page and can swapping buddy-pages cause thousands of instructions of overhead per memory access on an instrumented page. 5.3 Modified Debugger • add/remove virtual breakpoint page com- 5 Implementation mand

This section provides a roadmap for implement- • add/remove virtual breakpoint command ing a prototype of a virtual breakpoint system. • add/remove virtual watchpoint command At first, we propose implementing the solution in an emulator to prove feasibility, and then ex- • virtual breakpoint signal handler tending an existing hardware platform (poten- tially via FPGA) to prove efficiency. However, 6 Future Work notably, neither implementation would be useful beyond further academic exploration of the prob- By solving the critical-byte problem, we open up lem, since extending an emulator would lead to a number of possibilities - including new types severe inefficiencies and existing open-architecturesof breakpoints, and implementing more efficient tend not to exhibit the critical byte problem. code coverage systems. This section discusses this potential extensions. 5.1 Modify QEMU i386 MMU • tlb pte breakpoint bit checking 6.1 New breakpoint types

• page table walking bit checking • Break on Instruction Fetch Presently, instruction-fetch breakpoints are • buddy frame lookups when breakpoint bit not possible (except for work-around like is set debug registers and page table permission twiddling). A bit in the 8-bit breakpoint field on a buddy-page could be utilized to [6] [14] employ software to modify shadow page cause a breakpoint interrupt to fire on in- tables, extended page tables, and interrupt han- struction fetch. dlers to mask certain data from guest machines. Overshadow[14] provides a mechanism with • Nesting which to hide (encrypt) the contents of a guest- Implementing the virtualization layer such ’s memory from the guest kernel. The that a guest can set up another layer of researchers goals were to provide a mechanism virtual breakpointing. The would allow for with which a guest process could execute securely, introspection of a hostile virtual machine even if the guest operating system was actively within another virtual machine. hostile. Overshadow accomplished this by ex- tending the original form of page table virtual- • Coverage/Taint ization released by Intel and AMD - Shadow page A taint-propagation debugger can be de- tables. veloped that copies breakpoint-bits when- Overshadow implemented a shadow page ta- ever data is transferred from one area of ble containing multiple mappings (encrypted and memory to another. For example, When unencrypted) of a guest’s physical memory, and external data is injected into memory (via actively tracks the “identity” of the guest pro- inbound network traffic), the storage lo- cess attempting to read a page. When the ac- cation has coverage bits set in its buddy cessing process does not have the correct iden- frame. As this data propagates to other tity, an encrypted page is presented (on read) or areas of memory, the coverage bits propa- the machine is terminated (on write). When the gate with them. accessing process does have the correct identity, an unencrypted page is presented with the per- • ”Hook Points” missions originally granted to the page. Unfor- A specialized interrupt handler could be tunately, its dependence on shadow page tables registered with the operating system that means it’s likely to be too inefficient [15] for mod- is called whenever a “hook point” interrupt ern operating systems and heavy load systems. is fired. The “Hook Point” is set on an ad- VAMPiRE[5] takes advantage of virtual mem- dress, and the address is entered into a map ory page permissions to be executed via alternate accessed by the interrupt handler. Upon methods. In particular, the developers chose to executing / accessing the hooked address, implement a single-step handler which executes the debuggee exits directly to the interrupt any instruction (or accesses any data) falling on handler for dispatching. This is exception- a page which contains a breakpoint. This is ally useful for virtual machines, where dis- done to ensure any reads and overwrites of break- patching could be handled directly upon points are not possible, and so the integrity of VMExit (similar to a vmcall). guest data is preserved. Unfortunately, relying This could be useful for recording inbound on single-stepping and what are effectively page- or outbound non-deterministic events for length breakpoints limits the flexibility of the checkpoint/restart and replay systems (such breakpoint system. For example, if a program as clock reads via RDTSC). is contained entirely within a single page, or a breakpointed page contains “hot” code, then speed will suffer dramatically. 7 Related Work Finally, Spider[2] comes the closest to im- plementing a solution that maximizes efficiency, The idea of segregating “views” of data is not while retaining the flexibility of traditional bi- necessarily novel. A number of technologies[2] [5] nary modification breakpoints. Similar to VAM- accessible only by the debugger. Unlike other so- PiRE, it leverages virtual page permissions to de- lutions which make heavy use of binary modifica- termine what “view” of memory should be pro- tion, our system retains integrity of breakpoints vided to the debuggee. On read/write, a “sani- and correctness of execution by not requiring the tized” view of memory (sans breakpoints) is pro- modification of debuggee data. vided to hardware to prevent detection. On exe- We believe that this Virtual Breakpointing cute, the modified page (containing int3 instruc- design the solution that virtual machine intro- tions) is provided for execution. spection and interposition systems have been look- Spider maintains efficiency by leveraging a ing for. It presents opportunities for the devel- quirk in caching mechanisms, which allows both opment of brand new types of breakpoints, and views to be cached (one in the instruction cache, is efficient enough to run commodity operating one in the data cache), limiting the number of systems under test. VM Exits required to expose the correct data on a given access. Unfortunately, as previously dis- References cussed, Spider still falls prey to some forms of classic anti-debugging techniques (such as over- [1] Gdb. http://www.gnu.org/software/gdb/. writing) due to its reliance on binary modifica- tion. [2] Deng, Z., Zhang, X., & Xu, D. (2013, De- Each of these works provide a trend of re- cember). Spider: Stealthy binary program searchers attempting to split the view of data instrumentation and debugging via hard- based on whether the debuggee is trusted. In ware virtualization. In Proceedings of the Overshadow, some parts of the debuggee are trusted 29th Annual Computer Security Applica- but not others. In VAMPiRE and Spider, the tions Conference (pp. 289-298). ACM. debuggee is not trusted to read or write instru- mented memory. Recognizing the trust violation [3] U. Bayer, C. Kruegel, and E. Kirda. Tt- was the primary inspiration for producing fully analyze: A tool for analyzing malware. In segregated data and breakpoint frames. EICAR06 [4] D. Bruening. Efficient, transparent, and 8 Conclusion comprehensive runtime code manipulation. PhD thesis, 2004. In this paper we demonstrated the problems with traditional and modern introspection systems to [5] Vasudevan, A., & Yerraballi, R. (2005, De- show a clear need for a fully virtualized break- cember). Stealth breakpoints. In Computer point solution. As we broke down various break- security applications conference, 21st An- pointing techniques, we identified two fundamen- nual (pp. 10-pp). IEEE. tal issues with current solutions. Either these so- [6] Vasudevan, A. (2009, October). Re-inforced lutions were inefficient, or they traded integrity stealth breakpoints. In Risks and Security and correctness of execution for efficiency. of Internet and Systems (CRiSIS), 2009 We proposed a virtualization extension for Fourth International Conference on (pp. 59- the . We build this 66). IEEE. from the ground up with the goals of an optimal system in mind. Our solution is novel in that it [7] Dinaburg, A., Royal, P., Sharif, M., & Lee, treats debuggee execution as an fundamentally W. (2008, October). Ether: malware anal- untrusted action, and segregates all debug re- ysis via hardware virtualization extensions. lated information into a separate buddy-frame In Proceedings of the 15th ACM conference on Computer and communications security for x86 virtualization. ACM SIGOPS Oper- (pp. 51-62). ACM. ating Systems Review, 40(5), 2-13.

[8] Willems, C., Hund, R., Fobian, A., Felsch, [16] Paxson, V. (1990). A Survey of Support for D., Holz, T., & Vasudevan, A. (2012, De- Implementing Debuggers. cember). Down to the bare metal: Using processor features for binary analysis. In [17] Jeffk. Proceedings of the 28th Annual Computer [18] Albert. Security Applications Conference (pp. 189- 198). ACM. [9] Vogl, S., & Eckert, C. (2012, April). Using hardware performance events for instruction-level monitoring on the x86 ar- chitecture. In Proceedings of the 2012 Euro- pean Workshop on System Security EuroSec (Vol. 12). [10] R. R. Branco, G. N. Barbosa, and P. D. Neto. Scientific but not academ- ical overview of malware anti-debugging, anti-disassembly and anti-vm technologies. Blackhat USA12. [11] P. Feiner, A. D. Brown, and A. Goel. Com- prehensive kernel instrumentation via dy- namic binary translation. In ASPLOS12 [12] Intel, Intel. ”Intel(R) 64 and IA-32 architec- tures software developers manual.” Volume 3A: System Programming Guide, Part 1.64 (64). [13] Amd, Amd. ”Developer Guides, Manuals, & ISA Documents.” https://developer.amd.com/resources/developer- guides-manuals/. [14] Chen, X., Garfinkel, T., Lewis, E. C., Sub- rahmanyam, P., Waldspurger, C. A., Boneh, D., ... & Ports, D. R. (2008, March). Over- shadow: a virtualization-based approach to retrofitting protection in commodity oper- ating systems. In ACM SIGARCH Com- puter Architecture News (Vol. 36, No. 1, pp. 2-13). ACM. [15] Adams, K., & Agesen, O. (2006). A com- parison of software and hardware techniques