Debugging Using DWARF

Introduction to the DWARF Debugging Format Michael J. Eager, Eager Consulting February, 2007 It would be wonderful if we could write quence of simple operations, registers, difficult time connecting its manipulations programs that were guaranteed to work memory addresses, and binary values of low-level code to the original source correctly and never needed to be debugged. which the processor actually understands. which generated it. Until that halcyon day, the normal pro- After all, the processor really doesn©t care The second challenge is how to describe gramming cycle is going to involve writing whether you used object oriented program- the executable program and its relationship a program, compiling it, executing it, and ming, templates, or smart pointers; it only to the original source with enough detail to then the (somewhat) dreaded scourge of understands a very simple set of operations allow a debugger to provide the program- debugging it. And then repeat until the pro- on a limited number of registers and mem- mer useful information. At the same time, gram works as expected. ory locations containing binary values. the description has to be concise enough so It is possible to debug programs by in- As a compiler reads and parses the that it does not take up an extreme amount serting code that prints values of various in- source of a program, it collects a variety of of space or require significant processor teresting variables. Indeed, in some situa- information about the program, such as the time to interpret. This is where the DWARF tions, such as debugging kernel drivers, this line numbers where a variable or function Debugging Format comes in: it is a compact may be the preferred method. There are is declared or used. Semantic analysis ex- representation of the relationship between low-level debuggers that allow you to step tends this information to fill in details such the executable program and the source in a through the executable program, instruc- as the types of variables and arguments of way that is reasonably efficient for a debug- tion by instruction, displaying registers and functions. Optimizations may move parts of ger to process. memory contents in binary. the program around, combine similar pieces, expand inline functions, or remove But it is much easier to use a source-lev- parts which are unneeded. Finally, code The Debugging el debugger which allows you to step generation takes this internal representa- Process through a program©s source, set break- tion of the program and generates the actu- points, print variable values, and perhaps a hen a programmer runs a program al machine instructions. Often, there is an- few other functions such as allowing you to under a debugger, there are some other pass over the machine code to per- W call a function in your program while in the common operations which he or she may form what are called "peephole" optimiza- debugger. The problem is how to coordi- want to do. The most common of these are tions that may further rearrange or modify nate two completely different programs, setting a breakpoint to stop the debugger at the code, for example, to eliminate dupli- the compiler and the debugger, so that the a particular point in the source, either by cate instructions. program can be debugged. specifying the line number or a function All-in-all, the compiler©s task is to take name. When this breakpoint is hit, then the well-crafted and understandable source the programmer usually would like to dis- Translating from code and convert it into efficient but essen- play the values of local or global variables, Source to tially unintelligible machine language. The or the arguments to the function. Display- better the compiler achieves the goal of cre- ing the call stack lets the programmer know Executable ating tight and fast code, the more likely it how the program arrived at the breakpoint he process of compiling a program is that the result will be difficult to under- in cases where there are multiple execution Tfrom human-readable form into the bi- stand. paths. After reviewing this information, the nary that a processor executes is quite com- programmer can ask the debugger to con- plex, but it essentially involves successively During this translation process, the tinue execution of the program under test. recasting the source into simpler and sim- compiler collects information about the There are a number of additional opera- pler forms, discarding information at each program which will be useful later when tions that are useful in debugging. For ex- step until, eventually, the result is the se- the program is debugged. There are two challenges in doing this well. The first is ample, it may be helpful to be able to step through a program line by line, either en- Michael Eager is Principal Consultant at that in the later parts of this process, it may tering or stepping over called functions. Eager Consulting (www.eagercon.com), be difficult for the compiler to relate the Setting a breakpoint at every instance of a specializing in development tools for changes it is making to the program to the template or inline function can be impor- embedded systems. He was a member original source code that the programmer tant for debugging C++ programs. It can of PLSIG©s DWARF standardization com- wrote. For example, the peephole optimizer be helpful to stop just before the end of a mittee and has been Chair of the may remove an instruction because it was function so that the return value can be dis- DWARF Standards Committee since able to switch around the order of a test in played or changed. Sometimes the pro- 1999. Michael can be contacted at code that was generated by an inline func- grammer may want to bypass execution of [email protected]. tion in the instantiation of a C++ template. a function, returning a known value instead © Eager Consulting, 2006, 2007 By the time it gets its metaphorical hands on the program, the optimizer may have a of what the function would have (possibly Sun extensions. Nonetheless, stabs is still A Brief History of incorrectly) computed. widely used. DWARF2 There are also data related operations COFF stands for Common Object File that are useful. For example, displaying Format and originated with Unix System V the type of a variable can avoid having to Release 3. Rudimentary debugging infor- DWARF 1 ─ Unix SVR4 sdb look up the type in the source files. Dis- mation was defined with the COFF format, and PLSIG playing the value of a variable in different but since COFF includes support for named WARF originated with the C compiler formats, or displaying a memory or register sections, a variety of different debugging Dand sdb debugger in Unix System V in a specified format is helpful. formats such as stabs have been used with Release 4 (SVR4) developed by Bell Labs in COFF. The most significant problem with There are some operations which might the mid-1980s. The Programming Lan- COFF is that despite the Common in its be called advanced debugging functions: guages Special Interest Group (PLSIG), part name, it isn't the same in each architecture for example, being able to debug multi- of Unix International (UI), documented the which uses the format. There are many threaded programs or programs stored in DWARF generated by SVR4 as DWARF 1 in variations in COFF, including XCOFF (used write-only memory. One might want a de- 1989. Although the original DWARF had on IBM RS/6000), ECOFF (used on MIPS bugger (or some other program analysis several clear shortcomings, most notably and Alpha), and the Windows PE-COFF. tool) to keep track of whether certain sec- that it was not very compact, the PLSIG de- Documentation of these variants is avail- tions of code had been executed or not. cided to standardize the SVR4 format with able to varying degrees but neither the ob- Some debuggers allow the programmer to only minimal modification. It was widely ject module format nor the debugging in- call functions in the program being tested. adopted within the embedded sector where formation is standardized. In the not-so-distant past, debugging pro- it continues to be used today, especially for grams that had been optimized would have PE-COFF is the object module format small processors. been considered an advanced feature. used by Microsoft Windows beginning with Windows 95. It is based on the COFF for- DWARF 2 ─ PLSIG The task of a debugger is to provide the mat and contains both COFF debugging programmer with a view of the executing he PLSIG continued on to develop and data and Microsoft's own proprietary Code- program in as natural and understandable document extensions to DWARF to ad- View or CV4 debugging data format. Docu- T fashion as possible, while permitting a wide dress several issues, the most important of mentation on the debugging format is both range of control over its execution. This which was to reduce the amount of data sketchy and difficult to obtain. means that the debugger has to essentially that were generated. There were also addi- reverse much of the compiler's carefully OMF stands for Object Module Format tions to support new languages such as the crafted transformations, converting the pro- and is the object file format used in CP/M, up-and-coming C++ language. DWARF 2 gram's data and state back into the terms DOS and OS/2 systems, as well as a small was released as a draft standard in 1990. that the programmer originally used in the number of embedded systems.

Load more