Instruction-Level Reverse Execution for Debugging
Tankut Akgul and Vincent J. Mooney III
School of Electrical and Computer Engineering
Georgia Institute of Technology, Atlanta, GA 30332

ABSTRACT
The ability to execute a program in reverse is advantageous for shortening debug time. This paper presents a reverse execution methodology at the assembly instruction-level with low memory and time overheads. The core idea of this approach is to generate a reverse program able to undo, in almost all cases, normal forward execution of an assembly instruction in the program being debugged. The methodology has been implemented on a PowerPC processor in a custom made debugger. Compared to previous work, all of which use a variety of state saving techniques, the experimental results show 2.5X to 400X memory overhead reduction for the tested benchmarks. Furthermore, the results with the same benchmarks show an average of 4.1X to 5.7X time overhead reduction.

Categories and Subject Descriptors
D.2.5 [Software Engineering]: Testing and Debugging

General Terms
Algorithms

Keywords
Reverse code generation, reverse execution, debugging

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PASTE'02, November 18–19, 2002, Charleston, SC, USA. Copyright 2002 ACM 1-58113-479-7/02/0011 ...$5.00.

1. INTRODUCTION
As human beings are quite prone to making mistakes, it is very difficult for a programmer to write an error-free program before testing it. For this reason, debugging is an important and inevitable part of software development.

Locating bugs by just looking at the source code is quite difficult. Consequently, a run-time interaction with the program is very useful for debugging. Unfortunately, many of the bugs in programs do not cause errors immediately, but instead show their effects much later in program execution. For this reason, even the most careful programmer equipped with a state of the art debugger might well miss the first occurrence of a bug and might have to restart the program. Furthermore, for difficult to find bugs, this process might have to be repeated multiple times. Even worse, for intermittent bugs due to rare timing behaviors, the bug might not reappear right away when the program is restarted.

Reverse execution provides the programmer with the ability to return to a particular previous state in program execution. By reverse execution, program re-executions can be localized around a bug in a program. When the programmer misses a bug location by over-executing a program, he/she can roll back to a point where the processor state is known to be correct and then re-execute from that point on without having to restart the program. This eliminates the requirement to re-execute unnecessary parts of the program every time a bug location is missed, thus, potentially reducing the overall debugging time significantly.

In this paper, a novel reverse execution methodology in software is proposed. The proposed methodology is unique in the sense that it provides reverse execution at the assembly instruction-level granularity and yet still has reasonable memory and time overheads when the program is being executed. Note that in the rest of this paper, the word "instruction" refers to an assembly instruction.

2. BACKGROUND AND MOTIVATION
An execution of a program \(T\) on a processor \(P\) can be represented by a transition among a series of states \(S = (S_0, S_1, \ldots, S_n)\) where a state \(S_i\) can be written as a combination of the program counter (\(PC_i\)), memory (\(M_i\)) and register (\(R_i\)) values of \(P\). From this representation, reverse execution of a program can be defined as follows:

Definition 1. Reverse Execution: Reverse execution of a program \(T\) running on a processor \(P\) can be defined as taking processor \(P\) from its current state \(S_i = (PC_i, M_i, R_i)\) to a previous state \(S_j = (PC_j, M_j, R_j)\), \((0 \le j < i \le n)\). The closest achievable distance between \(S_i\) and \(S_j\) determines the granularity of the reverse execution. If state \(S_j\) is allowed to be as early as one instruction before state \(S_i\), then the reverse execution is said to be at the instruction-level granularity.

The simplest method for obtaining a previous state would be saving that state before it is destroyed. Saving a state during forward execution of a program introduces two overheads: memory and time. Several approaches have been proposed for state saving. In [3, 7], processor states are recorded periodically at certain checkpoints during forward execution of the program. Then, a previous state at a checkpoint can be recovered by restoring that state from the record. However, a previous state at an arbitrary point cannot immediately be recovered, which results in a coarser granularity reverse execution. In incremental state saving [10], on the other hand, instead of recording the whole state, only the modified parts of a state are recorded. However, if modified state space is large, memory and time overheads of incremental state saving might again exceed affordable limits.

In program animation [4, 6], a virtual machine with a reversible set of instructions is constructed. Since these instructions are reversible, the program can be run backwards. However, in program animation, a program can only be interpreted, which slows down the animation considerably, and makes it impossible to execute the program using native machine instructions directly, not even in the forward direction. Moreover, since reversible instructions are usually constructed as stack operations, a significant amount of stack memory may be required in program animation.

Another approach introduced is the source transformation approach [5]. In source transformation, the source code (e.g., in C) is transformed to a reversible source code version excluding destructive statements such as direct assignments. For destructive statements, state saving is applied. Consequently, the execution time and memory requirement of the transformed code are increased. Source transformation does not provide reverse execution at the instruction-level granularity, but instead at the source code (e.g., C) granularity. Moreover, since the original source code is transformed, the program being debugged is no longer the original code, but the transformed code instead. This might be a serious problem in real-time computing where a small change in program code can ruin the timing behavior of the code.

Therefore, our goal is to achieve reverse execution at the native instruction level with low memory and time overheads, which will add a missing feature to state of the art debuggers and will also enable reverse execution of low-level programs such as device drivers.

3. METHODOLOGY
In order to achieve reverse execution of a program at the instruction-level, a new term instruction-level reverse program is introduced as follows:

Definition 2. Instruction-level Reverse Program: Suppose that a processor \(P\) attains the series of states \(S = (S_0, S_1, \ldots, S_n)\)

a single instruction because more than one reverse instruction may need to be generated for reversing the effect of an instruction in \(I\). Then, the effect of the complete sequence \(I\) can be taken back by an ordered completion of the sequence of sets \(I' = (I'_n, I'_{n-1}, \ldots, I'_1)\) where \(I'_k \in I'\) reverses the effect of \(i_k \in I\). Therefore, it is sufficient to perform the following three steps to generate a reverse program \(T'\) for a program \(T\):

1. Extract the completion order of the instructions in \(T\)
2. Generate the sets of reverse instructions of the instructions in \(T\)
3. Combine the generated sets of reverse instructions in a way to make those sets complete in the opposite completion order of the instructions in \(T\)

Given a program \(T\) as an input, the reverse code generation (RCG) algorithm presented in this paper generates an instruction-level reverse program \(T'\) for \(T\) by performing the three steps mentioned. Let us call these three steps the RCG steps. Reverse code is generated for every procedure/function in \(T\) separately and reverses of procedures/functions are combined by a state saving approach [2]. Therefore, the rest of the paper focuses on the generation of the reverse procedures/functions.

A procedure/function within a program for which a reverse program will be generated is statically analyzed. This static analysis occurs assembly instruction by assembly instruction in the order the instructions are placed by the compiler (lexical order). After an instruction is analyzed, the algorithm goes over the RCG steps consecutively to grow the reverse procedure/function. Some instructions within a loop are analyzed by more than one pass (at most three passes) over the loop body before the reverse code is generated for those instructions.

Note that this paper focuses on assembly level analysis as a first step to provide reverse execution capabilities. However, this research can be extended in a straightforward way to be integrated into a compiler and to connect the reverse assembly execution to reverse source code execution.

In the following subsections, we will explain how the RCG steps are handled in the RCG algorithm. We begin with the explanation of the first RCG step in the next subsection.
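
As an illustration only, the three RCG steps can be sketched on a toy straight-line program. Everything in this sketch is an assumption made for the example and is not the RCG algorithm of this paper: the three-opcode register machine (addi, xori, li), the restore pseudo-instruction, the saved-value stack used for the destructive li instruction, and the helper names reverse_set, generate_reverse and run. For straight-line code the completion order is simply the lexical order, so step 1 is trivial here.

```python
from typing import Dict, List, Tuple

# Hypothetical three-field instruction: (opcode, register, immediate).
Instr = Tuple[str, str, int]


def reverse_set(instr: Instr) -> List[Instr]:
    """Return the set of reverse instructions for one forward instruction."""
    op, reg, imm = instr
    if op == "addi":   # constructive: undone by adding the negated immediate
        return [("addi", reg, -imm)]
    if op == "xori":   # constructive: XOR with the same immediate undoes itself
        return [("xori", reg, imm)]
    if op == "li":     # destructive: the old value is lost, so this sketch saves it
        return [("restore", reg, 0)]
    raise ValueError(f"unsupported opcode: {op}")


def generate_reverse(program: List[Instr]) -> List[Instr]:
    """Build a reverse program for a straight-line forward program."""
    # Step 1: completion order (equal to lexical order for straight-line code).
    completion_order = list(program)
    # Step 2: one set of reverse instructions per forward instruction.
    reverse_sets = [reverse_set(i) for i in completion_order]
    # Step 3: emit the sets in the opposite completion order.
    return [r for rset in reversed(reverse_sets) for r in rset]


def run(program: List[Instr], regs: Dict[str, int], saved: List[int]) -> None:
    """Interpret forward or reverse instructions, updating regs in place."""
    for op, reg, imm in program:
        if op == "addi":
            regs[reg] += imm
        elif op == "xori":
            regs[reg] ^= imm
        elif op == "li":
            saved.append(regs[reg])   # state saving for the destroyed value
            regs[reg] = imm
        elif op == "restore":
            regs[reg] = saved.pop()
        else:
            raise ValueError(f"unsupported opcode: {op}")


forward = [("addi", "r1", 5), ("li", "r2", 7), ("xori", "r1", 3), ("addi", "r2", -2)]
backward = generate_reverse(forward)

regs = {"r1": 10, "r2": 20}
saved: List[int] = []
run(forward, regs, saved)        # forward execution to the final state
run(backward[:1], regs, saved)   # undo only the last instruction
run(backward[1:], regs, saved)   # undo the remaining instructions
```

Running only a prefix of the reverse program takes the toy machine back by that many instructions, which corresponds to the instruction-level granularity of Definition 1. Only the destructive li instruction costs saved state in this sketch; the two constructive instructions are undone by computation alone.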