Tdb: a Source-Level Debugger for Dynamically Translated Programs

Naveen Kumar†, Bruce R. Childers†, and Mary Lou Soffa‡
†Department of Computer Science, University of Pittsburgh, Pittsburgh, Pennsylvania 15260
‡Department of Computer Science, University of Virginia, Charlottesville, Virginia 22904
{naveen, childers}@cs.pitt.edu [email protected]

Abstract

Debugging techniques have evolved over the years in response to changes in programming languages, implementation techniques, and user needs. A new type of implementation vehicle for software has emerged that, once again, requires new debugging techniques. Software dynamic translation (SDT) has received much attention due to compelling applications of the technology, including software security checking, binary translation, and dynamic optimization. Using SDT, program code changes dynamically, and thus, debugging techniques developed for statically generated code cannot be used to debug these applications. In this paper, we describe a new debug architecture for applications executing with SDT systems. The architecture provides features that create the illusion that the source program is being debugged, while allowing the SDT system to modify the executing code. We incorporated this architecture in a new tool, called tdb, that integrates a SDT system, Strata, with a widely used debugger, gdb. We evaluated tdb in the context of a code security checker. The results show that a dynamically translated program can be debugged at the source level and that the approach does not overly increase the run-time performance or memory demands of the debugger.

Categories and Subject Descriptors

D.2.5. [Software Engineering]: Testing and Debugging - Debugging aids; D.3.3. [Programming Languages]: Language Constructs and Features - Program instrumentation, run-time environments

General Terms

Languages, Performance, Algorithms

Keywords

Debugging, Dynamic Binary Translation, Dynamic Instrumentation

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
AADEBUG’05, September 19–21, 2005, Monterey, California, USA.
Copyright 2005 ACM 1-59593-050-7/05/0009...$5.00.

1 Introduction

Although the importance of debugging techniques to assist users in finding software bugs has long been recognized, new debugging techniques continue to be needed as programming languages, user demands, and implementation strategies change. The goal of a debugger is to accurately respond to user queries and commands from the viewpoint of the source program. Debugging activities include requesting variable values, single stepping through program execution, watching for particular conditions and requests to add and remove breakpoints. In order to respond, the debugger has to map the values and statements that the user expects using the source program viewpoint, to the actual values and locations of the statements as found in the executable program.

As programming languages have evolved, new debugging techniques have been developed. For example, checkpointing and time stamping techniques have been developed for languages with concurrent constructs [6,19,30]. The pervasive use of code optimizations to improve performance has necessitated techniques that can respond to queries even though the optimization may have changed the number of statement instances and the order of execution [15,25,29].

Currently, a new implementation vehicle, software dynamic translation (SDT), is being increasingly used for important applications, including software security [17,22], dynamic code optimization [1,2,4], binary translation of one instruction set to another [10,11,12,23], host machine virtualization [28], and computer architecture simulation [8,28]. A SDT system translates (generates) target code fragments/traces dynamically and stores the generated code in a fragment cache from which it executes. “Trampoline” code is put in the translated code to switch between the fragment cache and the dynamic translator for code generation. A SDT system may also insert instrumentation code into a translated program to gather run-time information and monitor program behavior.

The motivation to debug dynamically translated programs arises from two sources. First, some environments may have a dynamic translator tightly integrated with the host machine such that all applications must be translated before execution. Therefore, a developer may not have the luxury of debugging the application without dynamic translation. Security systems enforcing security policies by checking an application’s binary code fall in this category [5]. Secondly, SDT systems may result in code transformations that expose bugs that are not manifested in the absence of the transformation. An example in this category is a dynamic optimizer [1,2,4].

SDT creates problems in debugging that cannot be solved by current techniques. In particular, debuggers targeted for statically generated code produce debug information at compile-time. The debug information consists of mappings that can be used to relate source code and data to the executable. Since the target code is generated dynamically, this static debug information for relating the untranslated code statements and data values to the executing code, is insufficient. Another problem is that a translated statement can be modified during execution, or a statement may be translated several times such that the number of statement instances can change throughout execution. Yet another problem that current techniques cannot handle is a user may put a breakpoint, using the source program, in code that has not been translated yet. Solving these problems requires techniques that can relate the source code to the executing code, as the code is dynamically generated and modified.

In SDT systems, the trampoline code, as well as instrumentation code, is executed as part of the application code. However, this code should be transparent to the user debugging the application. Similarly, the translation actions should also be hidden from debug users. Therefore, debugging techniques need to be developed that hide execution of everything that is unrelated to source code.

Finally, due to both the growing importance of SDT and the difficulty of developing SDT systems, several frameworks, including Strata [21], Dynamo/RIO [4], and Walkabout [7], have attempted to enable dynamic translator reconfigurability and retargetability to ease the burden of SDT development. As a result, the SDT system can easily be targeted to a different SDT application or machine platform. Hence, there is a strong need to decouple the debugger as much from the SDT system as possible, so that debugging capabilities can be quickly realized for a new SDT system.

In this paper, we present a new debug architecture for SDT systems that achieves transparency while supporting the usual source-level debug queries including step, watch, and set and remove breakpoints. The debug architecture also provides necessary decoupling between the SDT system and the debugger.

This paper provides a reference implementation of the debug architecture. The dynamic code changes that we consider result from basic translations, overhead reduction transformations, and dynamic instrumentation. The basic dynamic translations include generating a new instruction, inserting multiple instructions for a single program statement during translation, ignoring and not generating instructions for a program statement, deletion (flushing) of previously translated instructions, and the duplication of program instructions in the translated code. We also consider several overhead reduction transformations, including instruction trace formation, conditional branch linking, indirect branch translation caching, partial inlining of unconditional branches and calls, and fast return handling [21]. Finally, we consider the effect of insertion and removal of instrumentation in the translated code. To demonstrate the effectiveness [...]

[Figure 1: A software dynamic translator. Diagram labels: Application Binary, Dynamic Translator, Context Capture, New PC, Cached?, New Fragment, Fetch, Decode, Translate, Finished?, Next PC, Context Switch, OS, CPU.]

[...] control a program, including pausing and stepping through execution and inspecting and modifying data values. To answer source-level queries, a debugger maps a program’s source code to its binary executable instructions and data locations using information provided by the compiler. A debugger’s job is difficult when programs are modified dynamically because information provided by the compiler can become inconsistent with the code modifications.

A SDT is an execution environment that changes code dynamically. A basic SDT system, shown in Figure 1, is a software layer between the application and the underlying operating system. Thus, given an application’s source program, a compiler first generates some type of code, including binary or a higher level representation such as byte code. This code, which we call “untranslated” code, is then translated by the SDT to a binary executable. Translated instructions are held in a software cache, called the fragment cache. The cache holds
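As an illustration only, the translate-and-execute loop that Figure 1 depicts can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions, not Strata's implementation: the class name ToySDT, the method translate_fragment, and the four-opcode tuple "instruction set" are all hypothetical, context capture and context switching are not modeled, and the branch that ends each fragment merely stands in for a trampoline back to the translator.

```python
# Toy model of the loop in Figure 1: look up the next PC in the fragment
# cache, translate on a miss, then execute from the cache.
# All names and the tuple-based instruction set are hypothetical.

class ToySDT:
    def __init__(self, program):
        self.program = program      # untranslated code: pc -> (opcode, operand)
        self.fragment_cache = {}    # untranslated pc -> translated fragment
        self.translations = 0       # number of fragments built so far

    def translate_fragment(self, pc):
        """Fetch and decode instructions until a control transfer ends the fragment."""
        body = []
        while True:
            op, arg = self.program[pc]
            if op in ("jmp", "jnz", "halt"):
                # The terminating branch stands in for a trampoline: it hands
                # the next untranslated pc back to the dispatch loop in run().
                return body, (op, arg, pc + 1)
            body.append((op, arg))
            pc += 1

    def run(self, pc=0):
        acc = 0
        while pc is not None:
            fragment = self.fragment_cache.get(pc)
            if fragment is None:                  # "Cached?" answered No
                fragment = self.translate_fragment(pc)
                self.fragment_cache[pc] = fragment
                self.translations += 1
            body, (op, target, fallthrough) = fragment
            for body_op, value in body:           # execute out of the cache
                if body_op == "add":
                    acc += value
            if op == "halt":
                pc = None
            elif op == "jmp":
                pc = target
            else:                                 # "jnz": taken while acc != 0
                pc = target if acc != 0 else fallthrough
        return acc


# Usage: a countdown loop whose body starts at pc 1.
program = {0: ("add", 3), 1: ("add", -1), 2: ("jnz", 1), 3: ("halt", None)}
sdt = ToySDT(program)
result = sdt.run()
print(result, sdt.translations, sorted(sdt.fragment_cache))  # prints: 0 3 [0, 1, 3]
```

In this toy run the instruction at pc 1 is translated twice, once inside the fragment that starts at pc 0 and once in the fragment that starts at pc 1. This is the kind of duplication of program instructions in translated code that static debug information, which assumes one location per instruction, does not describe.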

