Tdb: A Source-level Debugger for Dynamically Translated Programs
Naveen Kumar†, Bruce R. Childers†, and Mary Lou Soffa‡
†Department of Computer Science ‡Department of Computer Science University of Pittsburgh University of Virginia Pittsburgh, Pennsylvania 15260 Charlottesville, Virginia 22904 {naveen, childers}@cs.pitt.edu [email protected]
Abstract single stepping through program execution, watching for par- ticular conditions and requests to add and remove breakpoints. Debugging techniques have evolved over the years in response In order to respond, the debugger has to map the values and to changes in programming languages, implementation tech- statements that the user expects using the source program niques, and user needs. A new type of implementation vehicle viewpoint, to the actual values and locations of the statements for software has emerged that, once again, requires new as found in the executable program. debugging techniques. Software dynamic translation (SDT) As programming languages have evolved, new debugging has received much attention due to compelling applications of techniques have been developed. For example, checkpointing the technology, including software security checking, binary and time stamping techniques have been developed for lan- translation, and dynamic optimization. Using SDT, program guages with concurrent constructs [6,19,30]. The pervasive code changes dynamically, and thus, debugging techniques use of code optimizations to improve performance has necessi- developed for statically generated code cannot be used to tated techniques that can respond to queries even though the debug these applications. In this paper, we describe a new optimization may have changed the number of statement debug architecture for applications executing with SDT sys- instances and the order of execution [15,25,29]. tems. The architecture provides features that create the illusion Currently, a new implementation vehicle, software that the source program is being debugged, while allowing the dynamic translation (SDT), is being increasingly used for SDT system to modify the executing code. We incorporated this important applications, including software security [17,22], architecture in a new tool, called tdb, that integrates a SDT dynamic code optimization [1,2,4], binary translation of one system, Strata, with a widely used debugger, gdb. We evaluated instruction set to another [10,11,12,23], host machine virtual- tdb in the context of a code security checker. The results show ization [28], and computer architecture simulation [8,28]. A that a dynamically translated program can be debugged at the SDT system translates (generates) target code fragments/traces source level and that the approach does not overly increase the dynamically and stores the generated code in a fragment cache run-time performance or memory demands of the debugger. from which it executes. “Trampoline” code is put in the trans- lated code to switch between the fragment cache and the dynamic translator for code generation. A SDT system may Categories and Subject Descriptors also insert instrumentation code into a translated program to D.2.5. [Software Engineering]: Testing and Debugging— gather run-time information and monitor program behavior. Debugging aids; D.3.3. [Programming Languages]: Lan- The motivation to debug dynamically translated programs guage Constructs and Features—Program instrumentation, arises from two sources. First, some environments may have a run-time environments dynamic translator tightly integrated with the host machine such that all applications must be translated before execution. Therefore, a developer may not have the luxury of debugging General Terms the application without dynamic translation. Security systems enforcing security policies by checking an application’s binary Languages, Performance, Algorithms code fall in this category [5]. Secondly, SDT systems may result in code transformations that expose bugs that are not Keywords manifested in the absence of the transformation. An example in this category is a dynamic optimizer [1,2,4]. Debugging, Dynamic Binary Translation, Dynamic Instrumen- SDT creates problems in debugging that cannot be solved tation by current techniques. In particular, debuggers targeted for statically generated code produce debug information at com- 1 Introduction pile-time. The debug information consists of mappings that can be used to relate source code and data to the executable. Although the importance of debugging techniques to assist Since the target code is generated dynamically, this static users in finding software bugs has long been recognized, new debug information for relating the untranslated code state- debugging techniques continue to be needed as programming ments and data values to the executing code, is insufficient. languages, user demands, and implementation strategies Another problem is that a translated statement can be modified change. The goal of a debugger is to accurately respond to user during execution, or a statement may be translated several queries and commands from the viewpoint of the source pro- times such that the number of statement instances can change gram. Debugging activities include requesting variable values, throughout execution. Yet another problem that current tech- niques cannot handle is a user may put a breakpoint, using the Permission to make digital or hard copies of all or part of this work for source program, in code that has not been translated yet. Solv- personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that ing these problems requires techniques that can relate the copies bear this notice and the full citation on the first page. To copy source code to the executing code, as the code is dynamically otherwise, or republish, to post on servers or to redistribute to lists, generated and modified. requires prior specific permission and/or a fee. In SDT systems, the trampoline code, as well as instrumen- AADEBUG’05, September 19–21, 2005, Monterey, California, USA. tation code, is executed as part of the application code. How- Copyright 2005 ACM 1-59593-050-7/05/0009...$5.00. ever, this code should be transparent to the user debugging the application. Similarly, the translation actions should also be hidden from debug users. Therefore, debugging techniques Application Binary need to be developed that hide execution of everything that is Context Dynamic Translator unrelated to source code. Capture Finally, due to both the growing importance of SDT and New No New Cached? the difficulty of developing SDT systems, several frameworks, PC Fragment including Strata [21], Dynamo/RIO [4], and Walkabout [7], Yes have attempted to enable dynamic translator reconfigurability Fetch and retargetability to ease the burden of SDT development. As Decode TranslateTranslate Context Yes a result, the SDT system can easily be targeted to a different Finished? Next PC Switch SDT application or machine platform. Hence, there is a strong No need to decouple the debugger as much from the SDT system e lat ns OS ra as possible, so that debugging capabilities can be quickly real- T & nt me ru st ized for a new SDT system. In In this paper, we present a new debug architecture for SDT CPU systems that achieves transparency while supporting the usual Figure 1: A software dynamic translator source-level debug queries including step, watch, and set and remove breakpoints. The debug architecture also provides nec- control a program, including pausing and stepping through essary decoupling between the SDT system and the debugger. execution and inspecting and modifying data values. To This paper provides a reference implementation of the answer source-level queries, a debugger maps a program’s debug architecture. The dynamic code changes that we con- source code to its binary executable instructions and data loca- sider result from basic translations, overhead reduction trans- tions using information provided by the compiler. A debug- formations, and dynamic instrumentation. The basic dynamic ger’s job is difficult when programs are modified dynamically translations include generating a new instruction, inserting because information provided by the compiler can become multiple instructions for a single program statement during inconsistent with the code modifications. translation, ignoring and not generating instructions for a pro- A SDT is an execution environment that changes code gram statement, deletion (flushing) of previously translated dynamically. A basic SDT system, shown in Figure 1, is a soft- instructions, and the duplication of program instructions in the ware layer between the application and the underlying operat- translated code. We also consider several overhead reduction ing system. Thus, given an application’s source program, a transformations, including instruction trace formation, condi- compiler first generates some type of code, including binary or tional branch linking, indirect branch translation caching, par- a higher level representation such as byte code. This code, tial inlining of unconditional branches and calls, and fast which we call “untranslated” code, is then translated by the return handling [21]. Finally, we consider the effect of inser- SDT to a binary executable. Translated instructions are held in tion and removal of instrumentation in the translated code. a software cache, called the fragment cache. The cache holds To demonstrate the effectiveness of our techniques, we “code fragments” (or instruction traces), which are instruction developed a new debugger, tdb, based on our debug architec- sequences that execute sequentially, ending with a conditional ture. Tdb extends the GNU gdb debugger and supports all of or indirect control transfer. gdb’s source-level commands and queries. We integrated the The working of a SDT system is shown in Figure 1. A SDT SDT framework Strata [21] into the debug architecture and system first captures the context of an application (e.g., PC, evaluated the debugging capabilities in the context of a soft- registers and condition codes) and saves it. Following context ware security checker. capture, the SDT system processes the subsequent application This paper makes a number of contributions, including: code to be executed.The SDT system checks to see whether • A debug architecture that enables source-level debug- the code to be executed has been previously translated and ging for dynamically translated programs and an imple- cached. If yes, a context switch restores the application context mentation of this architecture in a new tool, tdb; and begins executing translated application instructions on the • An interface that enables SDT debugging tools to be host CPU. If the code is not cached, the instructions are easily written without regard to machine specific details fetched and translated one-by-one. When an instruction is and SDT system specific details; translated, it is placed in a fragment within the fragment cache. • An interface that enables existing debuggers to be easily Instructions are translated until an end of fragment condition is extended to support dynamically translated programs; encountered (e.g., a branch or an indirect jump). A control • Debug techniques that provide source-level transparency transfer that terminates a fragment returns control to the SDT even though instrumentation and trampoline code are layer through a trampoline because the code at the control executed as part of the application code; transfer’s target may not have been translated yet. When trans- • Debug techniques that handle the dynamic transforma- lation of a fragment is complete, the SDT layer performs a tions that are used to reduce translation overhead by context switch to the code cache for execution. Thus, the avoiding context switches; and, dynamic translator gets control after each fragment executes to • A verification and experimental evaluation that shows translate subsequent fragments. our approach works well and has minimal overhead. Because the SDT layer executes different code than the In the next section, we describe background and the chal- untranslated code, the source-to-binary debug mappings main- lenges of debugging dynamically translated programs. Section tained with static information are insufficient by themselves 3 describes an overview of our debug architecture. Section 4 for debugging translated code. As an example, consider a sim- describes the components and interfaces in the debug architec- ple translation that copies a binary instruction from the ture, including techniques for generating and using debug untranslated code to exactly one location in the fragment information, for a basic SDT system. Section 5 describes the cache. We assume that the mappings from source to untrans- generation of debug information for a more complex SDT sys- lated code are available from the compiler. During a debug tem. Validation and experimental results of tdb are presented session, when a user places a breakpoint at a source program in Section 6. Finally, Section 7 describes related work and Sec- instruction, the debugger cannot locate the translated instruc- tion 8 summarizes and concludes with future work. tion where the breakpoint should actually be placed because of the translation. One complexity of debugging an application executed with 2 Background SDT involves the translation of a branch. A branch is handled Debuggers use breakpoints to enable a user to understand and with a trampoline that is used to re-invoke the dynamic trans- lator to translate the branch’s actual target fragment(s). In a debug session, the location of the source branch is mapped to Native Debugger the untranslated branch; however, the executable branch instruction has changed which can affect the reportability of mapAddress inser/delete breakpoint debug queries. When debugging a dynamically modified pro- writeValue exceptionCallback gram with a traditional debugger, the user has to be aware of the trampoline code and the subsequent entry into the transla- Mapping Breakpoint Breakpoint Repository Mapper Repository tor. Thus, if a user issues a “next” command, he/she has to be Manager aware that the stopping point may be the trampoline code. Another problem occurs when the translator deletes a Mapping DEBUG instruction, reorders instructions, or introduces more instances Generator of an instruction than were in the untranslated code. Also, the ENGINE number of instruction instances may change during execution. In these cases, the debugger has to detect the correct instruc- translation tion instance being executed. Another case happens when the read/write exception dynamic translator flushes the fragment cache to free space. operations The debugger needs to recognize that the debug information for the flushed instructions are no longer in effect. Dynamic Translator Fragment Cache Even more challenging is when the translator applies over- head reduction transformations. Most dynamic translators use several overhead reduction transformations to avoid context Figure 2: SDT Debug Architecture switches between the dynamic translator and the code cache, The SDT system is minimally altered to communicate the including fragment linking, indirect branch translation caching translation operations to the debug engine. The architecture and chaining, instruction trace formation, fast return handling, does not impose restrictions on the SDT system, such as trans- and partial inlining of target blocks for unconditional control formations that should not be performed in a debug session. transfers [2,4,11,21]. The application being debugged needs to be compiled with the Fragment linking rewrites conditional branches to avoid static debug flag turned on, as is traditionally done to debug at invoking the dynamic translator when the code at the actual source level. The debug architecture does not require any other target of a branch is already translated. Linking two fragments changes to theprogram. requires rewriting the existing branch to directly transfer con- Debug Engine. The debug engine consists of three compo- trol to the translated target (thus, by-passing trampoline code). nents: a mapping generator, a mapper, and a breakpoint man- If the debugger is unaware of this code transformation, then it ager, and two repositories, the mapping repository and the may report incorrect information about the translated branch breakpoint repository. The components and repositories are or the trampoline code. Similarly, instructions are emitted to shown in Figure 2. minimize the context-switches into the SDT system from frag- The mapping generator computes debug information, con- ments with indirect branches [2,21]. sisting of dynamic debug mappings, to relate source program Another overhead reduction technique forms instruction locations to translated program locations and the vice-versa. It traces, chains of code fragments, to improve cache locality and uses information provided by the SDT system for generating simplify branch handling. An instruction trace can be, for and updating mappings. The mappings are stored in the map- example, a sequence of instructions on a hot path. Instruction ping repository. traces present a challenge for a debugger because the traces The mapper uses the debug information from the generator change at run-time and incur code duplication, code flushes to map untranslated and translated code. The output of the from the fragment cache, branch inversions and fragment link- mappings can be used by either the native debugger or the ing. The debugger must update its knowledge of duplication, SDT system. The mapper also interacts with the breakpoint flushes, and dynamic linking and keep these modifications manager to determine if a breakpoint needs to be placed in transparent from the user. The other overhead reduction tech- freshly translated code. niques and dynamic instrumentation also perform code modi- The breakpoint manager keeps track of all active break- fications that need to be tracked by the debugger. points (and watchpoints) for the executing program, in the breakpoint repository. Information about the breakpoints is 3 Overview of the SDT Debug Architecture communicated to the breakpoint manager by the native debug- In this section, we describe a debug architecture that can be ger. The breakpoint manager is also responsible for inserting used for source-level debugging of dynamically translated pro- breakpoints in translated code. When breakpoints are hit in the grams. The structure of this architecture is shown in Figure 2. translated code, the breakpoint manager is notified. The debug architecture has three components: a SDT system, a To understand the flow of information through the debug native debugger and a debug engine. The arrows in the figure engine, consider a typical debug session when a user tries to represent the invocation of one component (or an action in a insert a breakpoint at a source location. The native debugger repository) by another component. For example, the arrow computes the corresponding untranslated program location between the native debugger and the mapper denotes initiation and invokes the breakpoint manager. The breakpoint manager of operations in the mapper. consults the mapper to determine the corresponding translated The debug engine is the key component of the debug archi- locations and inserts breakpoints at each of these locations. tecture, which computes debug information at run-time. This The breakpoint manager also saves the breakpoint information information is used to hide code generation and modification in the breakpoint repository. in the SDT system from the native debugger. When a breakpoint is hit in translated code, an exception is The native debugger is similar to a conventional debugger, raised which is handled by the breakpoint manager. This han- except that it performs debug actions by targeting the debug dler overrides the exception handler in the native debugger. engine, rather than the executing program. For example, if a The breakpoint manager invokes the mapper, which then uses user sets a breakpoint at a source location, the native debugger the mapping repository to determine corresponding untrans- computes the corresponding untranslated location (binary or lated location. The untranslated location is passed to the native bytecode) and notifies the debug engine to set a breakpoint at debugger with the information that a breakpoint has just been that location. This is in contrast to the traditional scenario in hit at that location. which the debugger would itself set the breakpoint. 4 The Debug Engine Operations that affect the location of data-values, such as those resulting from statement reordering in a dynamic opti- This section describes how the debug engine orchestrates mizer, can be described in a similar way. Such operations are information flow between the native debugger and the SDT not shown in this paper, although the debug architecture system. We first discuss how mappings are generated. Then, described above can be used to generate debug information for we discuss how the mapper and the breakpoint manager use these operations. the mappings to enable debug queries and actions. 4.1.1 Dynamic Debug Mappings 4.1 Generating Debug Mappings Before we describe the generation and update of mappings, we The mapping generator uses information received from the discuss the different categories of mappings that are used for SDT system to compute debug mappings. It stores the map- debugging applications translated by a basic SDT system. For pings in the mapping repository. The debug architecture does brevity, we do not make a distinction between an instruction not specify how the mappings are stored in the repository. and its location and use the term “instruction” to refer to its The communication from the SDT system to the mapping “instruction location” in the following discussion. There are generator is performed through a set of API’s, shown in Table four types of debug mappings: 1. SDT systems may perform a number of different transfor- • Untranslated-to-translated code mapping (U-T): This mations on the generated code. For example, a dynamic opti- mapping relates untranslated code to translated code. It mizer may reorder statements [2]. The number of is bidirectional; i.e., given an untranslated instruction, a transformation operations performed during dynamic transla- corresponding translated instruction can be found, and tion is diverse and depends on the purpose of translation, such given a translated instruction, a corresponding untrans- as optimization, security checks, or binary translation, among lated instruction can be found. A U-T mapping is nota- others. While describing the API’s for all of these operations is tionally shown as u↔D, where u is the untranslated out of scope for this paper, we discuss the API’s for operations statement location and D is a set of translated locations. performed by a basic SDT system. • Translated-to-translated code mapping (T-T): This type A basic SDT system generates code, one fragment at a time relates translated instructions to translated instructions. (as shown in Figure 1). There are five basic translation opera- It is used when a dynamic code modification results in tions that are done in a SDT system. Other operations can be several new instructions for one translated instruction. A handled similarly or using a combination of these operations. T-T mapping is unidirectional and shown as t →{t }, A regular operation corresponds to the translation of one 1 2 instruction to another. An API, as shown in the first row of where t1 and t2 are translated instructions. Table 1, is used to describe this operation. In this API, u is the • Translated-to-untranslated code mapping (T-U): This untranslated location, t is the translated location, the operation type relates a translated instruction to an instruction in type is regular, and the transformation is applied on code, as the untranslated code. It is used to map instructions in opposed to data. the fragment cache that do not have a direct relationship to untranslated instructions (e.g., as in trampoline code). This mapping is unidirectional and shown as t→{u}, operation API where t is a translated location and u is a location in the regular regular(Loc u, Loc t, Trans code) untranslated code. Initially, before an untranslated statement, u, is translated, delete delete(Loc u, Loc t, Trans code) it has the default U-T mapping, u↔∅. During translation, an many many(Loc u, Loc t, Trans code) instruction’s U-T mappings are updated and T-T and T-U map- pings are generated/updated. An update modifies the destina- flush flush(Loc t, Trans code) tions of a mapping. The dynamic debug mappings are trampoline trampoline(Loc u, Loc t1, Loc t2, Trans code) generated/updated differently based on how the code is dynamically transformed. For example, a translation of an Table 1: API’s for different translation operations instruction into another instruction results in update of a single U-T mapping, while the translation of an instruction to two A delete operation happens when an untranslated instruc- instructions causes the update of a U-T mapping and the gen- tion does not result in any translated code (i.e., the untrans- eration of a new T-T mapping. Side effects that result from lated instruction is “deleted” because it is eliminated in the translation, such as a fragment cache flush also generate and fragment cache). The second row in Table 1 shows an API that update mappings. describes this operation. In this API, u, represents untranslated location, t represents the translated location for the next 4.1.2 Basic Translation Operations instruction to be translated (e.g., statement location u+1), the operation is delete, and the transformation is applied to code. The mapping algorithms for the five basic translation opera- A many operation involves an untranslated instruction that tions, shown in Table 1, are described below. For each opera- is translated into multiple new instructions. An API for this tion, an algorithm is used to update and generate the dynamic operation is shown in the third row of Table 1, which describes mappings. In the algorithms, a mapping for a program or trans- the case when translation of a statement leads to generation of lated instruction is represented as , where =s→R (or two statements. In this example, u represents the untranslated = s↔R), s is an instruction and R is a set of destination location and t is the location for the additional statement. instruction(s). For notational convenience, “*” is the set of A flush operation evicts translated instructions from the destinations for a mapping (e.g., if =s↔R in a U-T map- fragment cache. The API shown in the fourth row of Table 1 ping, then *=R). The symbol u is an untranslated instruction can be used to describe a flush operation, in which t is a trans- location and the symbol t is a location in the fragment cache. lated location containing the statement being flushed. The normal set operators, such as union and difference, can be Finally, a trampoline operation handles control transfers applied to the mappings. that result in trampoline code that transfers control from one Regular. For this operation, the mapping generator updates location to another. The last row of Table 1 shows an API for a a U-T mapping for a program instruction as shown in Figure trampoline operation, in which u is the untranslated branch 3(a)1. Consider an untranslated location u that is translated to a location and t1and t2 are the not-taken and taken trampoline new location t. In this case, the U-T mapping is updated locations. to include the new destination t in u’s set of translated destina- tions (*). Figure 3(b) shows this mapping between the pro- many(Loc u, Loc t, Trans code) { gram instruction u and translated location t. // t is an additional instruction; e.g., // t2 or t3 in the example (b) if (instruction(u) is not a branch) regular(Loc u, Loc t, Trans code) {
Execution Time (secs.) Total Breakpoints Number of Mappings Memory Program GDB TDB GDB TDB U-T T-U T-T (kilobytes) mcf 183 283 1,941,434 4,280,539 6,081 1,701 186 56 gcc 244 354 2,737,719 6,222,586 174,796 52,065 6,806 1,634 gzip 164 234 1,680,855 3,736,738 8,154 1,930 230 74 bzip 135 192 1,511,400 3,195,215 9,060 2,210 220 82 twolf 191 379 1,996,974 5,382,900 42,580 8,568 1,999 382 vortex 156 329 1,736,651 5,557,198 116,013 21,074 683 1,015 vpr 153 231 1,774,162 3,843,540 29,424 6,340 1,548 267 Table 4: Run-time performance, number of breakpoints, number of mappings, and memory overhead entries in the mapping table is considerably less than the maxi- lines, instrumentation probe) which would not be handled by mum number of instructions. For example, mcf has a maxi- this approach. Other work on optimized code has proposed mum of 7,968 entries at any time and gcc has 233,667 entries, code location mappings [9,13], which are also not suitable in a as shown in the table. The average number of entries is 23,425 dynamic translator. There has been much work on the data- across all benchmarks. The total amount of memory for the value problem in debugging [3,9,13,14,15,25,29]. The data- mapping table varies from 56 KB to 1.3 MB (average of 501 value problem refers to the challenge of finding the appropri- KB). The breakpoint table’s size is minimal as the table holds ate location (memory or register) of a source-level data con- only active breakpoints. The memory requirement was less struct, when program code is optimized. While this problem is than 1KB in all benchmarks. very difficult, a basic dynamic translator does not have this problem (unless dynamic optimizations are applied). 7 Related Work SDT has been used in a variety of scenarios, including architecture simulation [8,28], dynamic optimization [1,2,4], A number of debuggers for statically optimized code have security checking [17,22] and binary translation [10,12,23]. been developed [3,9,13,14,15,25,29], some of which deal with Transmeta’s Crusoe and Efficeon processors use code mor- both data value and code location problems (reordering/dele- phing, which performs dynamic binary translation of x86 tion of statements). The code location problem in statically instructions to VLIW instructions in hardware [10]. In this optimized code entail debug similar mappings to that proposed system, insertion of a breakpoint causes re-translation of the in this paper. However, these mappings are generated assum- code containing the breakpoint, which raises an exception to ing all of the optimizations have been applied statically. transfer control to the debugger. In effect, the debugger modi- Dynamic mapping must handle having translations done as the fies the behavior of the translator itself. One of the advantages code executes as well as having trampoline and instrumenta- from a debugger’s perspective is that dynamic translation is tion code. The translation itself also needs to be hidden from kept completely transparent to the user. Some other SDT the debug users. Jaramillo et al. used mappings, generated stat- frameworks, such as Dynamo [2] and Dynamo-RIO [4], pro- icaly, that are powerful and capture path-sensitive information vide low-level debug constructs to analyze translated code and by analyzing program [15]. However, the mappings require the data. In these cases, the user must understand the working of entire optimized code to be available for mappings. The map- the translator and debugging is not transparent. Indeed, this ping techniques are also not well-suited to being updated, debug mechanism is provided primarily for SDT developers. which is needed for dynamically changing code. Wu et al. The Java Platform Debug Architecture (JPDA) provides an used statically generated code location mappings to insert architecture similar to that proposed in this paper [Ref]. How- invisible breakpoints and techniques of selective emulation ever, the JDPA is very tightly integrated with the underlying and forward recovery for data reportability [29]. The anchor just-in-time compiler. This is because, the JDPA targets points described by Wu are similar to our U-T mappings. exactly one Virtual Machine (Java), which is not intended to Tabatabai et al. proposed code location mappings similar to be reconfigured for different purposes, unlike SDT systems. the basic translation mappings proposed in this paper [23,24]. Further, the JDPA does not provide techniques to generate However, these mappings are insufficient for handling the debug information for SDT systems, where code is generated overhead reduction techniques and dynamic instrumentation. and modified at a much finer granularity than JIT compilers. Tice et al. proposed changing the source code so that code location mappings better relate the source code and optimized binary [27]. However, a dynamic translator often generates code that has no relation with the source code (e.g., trampo- 8 Summary [14] U. Holzle, C. Chambers and D. Ungar, “Debugging opti- mized code with dynamic deoptimization”, Conf. on In this paper, we propose a debug architecture that enables Programming Language Design and Implementation, source-level debugging of dynamically translated code. We 27(7), pages 32–43, 1992. present code mappings that are constructed during dynamic [15] C. Jaramillo, R. Gupta, and M. L. Soffa, “FULLDOC: A translation and used by the debugger to relate dynamically full reporting debugger for optimized code”, Proc. of generated code to source code. We provide a reference imple- Static Analysis Symposium, vol. 1824, pages 240–259, mentation of the debug architecture for a retargetable and 2000. reconfigurable SDT system, Strata. We also show how to gen- [16] P. Kessler, “Fast breakpoints: Design and implementa- erate mappings for several basic code translations, overhead tion”, Conf. on Programming Language Design and reduction techniques, and instrumentation probes. Our map- Implementation, 25(6), pages 78–84, 1990. pings can be used by a debugger to keep the user unaware of [17] V. Kiriansky, D. Bruening and S. Amarasinghe, “Secure the SDT’s existence. We developed a reference implementa- execution via program shepherding”, USENIX Security tion of our debug architecture as a new debugger, called tdb, Symposium, pages 191–206, 2002. and evaluated it. Our experiments show that the debugger has [18] B. P. Miller, M. D. Callaghan, J. M. Cargille, J. K. Holl- reasonable performance and memory overheads. It is our ingsworth, R. Irvin, K. Karavanic, K. Kunchithapadam, intention to contribute tdb to the public domain for other SDT Tia Newhallt, “The Paradyn parallel performance mea- users and developers. surement tools”, IEEE Computer, 8(11), pages 37–46, 1995. 9 References [19] M. Prvulovic and J. Torrellas, “ReEnact: Using Thread- Level Speculation Mechanisms to Debug Data Races in [1] M. Arnold, S. Fink, D. Grove, M. Hind and P. Sweeney, Multithreaded Codes”, Int’l. Symp. on Computer Archi- “Adaptive optimization in the Jalapeño JVM”, Conf. on tecture, 31(2), pages 110–121, June 2003. Object-Oriented Programming, Systems, Languages and [20] N. Ramsey and D. Hanson, “A retargetable debugger”, Applications, 8(10), pages 47–65, 2000. Conf. on Programming Language Design and Imple- [2] V. Bala, E. Duesterwald, and S. Banerjia, “Dynamo: A mentation, 27(7), pages 22–31, 1992. transparent dynamic optimization system”, Conf. on [21] K. Scott, N. Kumar, S. Veluswamy, B. Childers, J. Programming Language Design and Implementation, Davidson, M. L. Soffa, “Reconfigurable and retargetable 35(5), pages. 1–12, 2000. software dynamic translation”, Int’l. Symp. on Code [3] G. Brooks, G. Hansen and S. Simmons, “A new Generation and Optimization, pages 36–47, 2003. approach to debugging optimized code”, Conf. on Pro- [22] K. Scott and J. Davidson, “Safe virtual execution using gramming Language Design and Implementation, 27(7), software dynamic translation”, Annual Computer Secu- pages 1–11, 1992. rity Applications Conference, page 209, 2002. [4] D. Bruening, T. Garnett and S. Amarasinghe, “An infra- [23] A. Srivastava and A. Edwards, “Vulcan: Binary transfor- structure for adaptive dynamic optimization”, Int’l. mation in a distributed environment”, Microsoft Symp. on Code Generation and Optimization, pages Research, MSR–TR–2001–50, 2001. 265–275, 2003. [24] R. M. Stallman and R. H. Pesch, “Using GDB: A guide [5] D. Bruening, S. Amarsinghe, “Maintaining consistency to the GNU source-level debugger”, GDB v4.0, Free and bounding capacity of software code caches”, Int’l. Software Foundation, Cambridge, MA, 1991. Symp. on Code Generation and Optimization, pages 74– [25] A. Tabatabai and T. Gross, “Source-level debugging of 85, 2005. scalar optimized code”, Conf. on Programming Lan- [6] J. Choi and S. Min, “Race Frontier: Reproducing Data guage Design and Implementation, 31(5), pages 33–43, Races in Parallel-Program Debugging”, ACM Sympo- 1996. sium on Principles and Practice of Parallel Program- [26] A. Tabatabai, “Source-Level Debugging of Globally ming, 26(7), pages 145–154, 1991. Optimized Code”, PhD Dissertation, 1996. [7] C. Cifuentes, B. Lewis and D. Ung, “Walkabout: A [27] C. Tice and S. Graham. “Optview: A new approach for retargetable dynamic binary translation framework”, examining optimized code”, Workshop on Program Workshop on Binary Translation, 2002. Analysis For Software Tools and Engineering, 2000. [8] R. Cmelik and D. Keppel, “Shade: A fast instruction-set [28] E. Witchel and M. Rosenblum, “Embra: Fast and flexi- simulator for execution profiling”, Technical Report 93– ble machine simulation”, Conf. on Measurement and 06–06, University of Washington, 1993. Modeling of Computer Systems, 24(1), pages 68–79, [9] M. Copperman, “Debugging optimized code without 1996. being Misled”, ACM Transactions on Programming [29] L. Wu, R. Mirani, H. Patil, B. Olsen and W. Hwu, “A Languages and Systems, 16(3), pages 387–427, 1994. new framework for debugging globally optimized [10] J. C. Dehnert, B. K. Grant, J. P. Banning, R. Johnson, T. code”, Conf. on Programming Language Design and Kistler, A. Klaiber, and J. Mattson, “The Transmeta Implementation, 34(5), pages 181–191, 1999. Code Morphing Software: Using speculation, recovery, [30] M. Xu, R. Bodik, and M. Hill, “A “Flight Data and adaptive retranslation to address real-life chal- Recorder” for Enabling Full-system Multiprocessor lenges”, Int’l. Symp. on Code Optimization and Genera- Deterministic Replay”, Int’l. Symp. on Computer Archi- tion, pages 15–24, 2003. tecture, 31(2), pages 122–135, 2003. [11] G. Desoli, N. Mateev, E. Duesterwald, P. Faraboschi, and [31] Java Platform Debugger Architecture, Sun Microsys- J. Fisher, “DELI: A new runtime control point”, Int’l. tems, http://java.sun.com/products/jpda/index.jsp, 2005. Symp. on Microarchitecture, 40(6), pages 201–212, 2002. [12] K. Ebcioglu and E. Altman, “DAISY: Dynamic compila- tion for 100% architectural compatibility”, Int’l. Sympo- sium on Computer Architecture, 25(2), pages 26–37, 1997. [13] J. Hennessy, “Symbolic debugging of optimized code”, ACM Transactions on Programming Languages and Systems, 4(3), pages 323–244, 1982.