Introduction to the DWARF Debugging Format Michael J

Introduction to the DWARF Debugging Format Michael J. Eager, Eager Consulting April, 2012 It would be wonderful if we could write memory addresses, and binary values of low-level code to the original source programs that were guaranteed to work which the processor actually understands. which generated it. correctly and never needed to be debugged. After all, the processor really doesn©t care The second challenge is how to describe Until that halcyon day, the normal pro- whether you used object oriented program- the executable program and its relationship gramming cycle is going to involve writing ming, templates, or smart pointers; it only to the original source with enough detail to a program, compiling it, executing it, and understands a very simple set of operations allow a debugger to provide the program- then the (somewhat) dreaded scourge of on a limited number of registers and mem- mer useful information. At the same time, debugging it. And then repeat until the pro- ory locations containing binary values. the description has to be concise enough so gram works as expected. As a compiler reads and parses the that it does not take up an extreme amount It is possible to debug programs by in- source of a program, it collects a variety of of space or require significant processor serting code that prints values of selected information about the program, such as the time to interpret. This is where the DWARF interesting variables. Indeed, in some situa- line numbers where a variable or function Debugging Format comes in: it is a compact tions, such as debugging kernel drivers, this is declared or used. Semantic analysis ex- representation of the relationship between may be the preferred method. There are tends this information to fill in details such the executable program and the source in a low-level debuggers that allow you to step as the types of variables and arguments of way that is reasonably efficient for a debug- through the executable program, instruc- functions. Optimizations may move parts of ger to process. tion by instruction, displaying registers and the program around, combine similar memory contents in binary. pieces, expand inline functions, or remove parts which are unneeded. Finally, code The Debugging But it is much easier to use a source-lev- generation takes this internal representa- Process el debugger which allows you to step tion of the program and generates the actu- through a program©s source, set break- hen a programmer runs a program al machine instructions. Often, there is an- points, print variable values, and perhaps a under a debugger, there are some other pass over the machine code to per- W few other functions such as allowing you to common operations which he or she may form what are called "peephole" optimiza- call a function in your program while in the want to do. The most common of these are tions that may further rearrange or modify debugger. The problem is how to coordi- setting a breakpoint to stop the debugger at the code, for example, to eliminate dupli- nate two completely different programs, a particular point in the source, either by cate instructions. the compiler and the debugger, so that the specifying the line number or a function program can be debugged. All-in-all, the compiler©s task is to take name. When this breakpoint is hit, the pro- the well-crafted and understandable source grammer usually would like to display the code and convert it into efficient but essen- values of local or global variables, or the ar- Translating from tially unintelligible machine language. The guments to the function. Displaying the Source to Executable better the compiler achieves the goal of cre- call stack lets the programmer know how the program arrived at the breakpoint in he process of compiling a program ating tight and fast code, the more likely it cases where there are multiple execution from human-readable form into the bi- is that the result will be difficult to under- T paths. After reviewing this information, the nary form that a processor executes is quite stand. programmer can ask the debugger to con- complex, but it essentially involves succes- During this translation process, the tinue execution of the program under test. sively recasting the source into simpler and compiler collects information about the simpler forms, discarding information at program which will be useful later when There are a number of additional opera- each step until, eventually, the result is the the program is debugged. There are two tions that are useful in debugging. For ex- sequence of simple operations, registers, challenges to doing this well. The first is ample, it may be helpful to be able to step that in the later parts of this process, it may through a program line by line, either en- tering or stepping over called functions. Michael Eager is Principal Consultant at be difficult for the compiler to relate the Setting a breakpoint at every instance of a Eager Consulting (www.eagercon.com), changes it is making to the program to the template or inline function can be impor- specializing in development tools for original source code that the programmer tant for debugging C++ programs. It can embedded systems. He was a member wrote. For example, the peephole optimizer be helpful to stop just before the end of a of PLSIG©s DWARF standardization com- may remove an instruction because it was function so that the return value can be dis- mittee and has been Chair of the able to switch around the order of a test in played or changed. Sometimes the pro- DWARF Standards Committee since code that was generated by an inline func- grammer may want to bypass execution of 1999. Michael can be contacted at tion in the instantiation of a C++ template. a function, returning a known value instead [email protected]. By the time it gets its metaphorical hands of what the function would have (possibly © Eager Consulting, 2006, 2007, 2012 on the program, the optimizer may have a difficult time connecting its manipulations incorrectly) computed. There are also data related operations while attempting to reverse engineer the A Brief History of that are useful. For example, displaying Sun extensions. Nonetheless, stabs is still the type of a variable can avoid having to widely used. DWARF look up the type in the source files. Dis- COFF stands for Common Object File playing the value of a variable in different Format and originated with Unix System V DWARF 1 ─ Unix SVR4 sdb formats, or displaying a memory or register Release 3. Rudimentary debugging infor- in a specified format is helpful. and PLSIG mation was defined with the COFF format, WARF3 was developed by Brian Rus- There are some operations which might but since COFF includes support for named Dsell, Ph.D., at Bell Labs in 1988 for use be called advanced debugging functions: sections, a variety of different debugging with the C compiler and sdb debugger in for example, being able to debug multi- formats such as stabs have been used with Unix System V Release 4 (SVR4). The Pro- threaded programs or programs stored in COFF. The most significant problem with gramming Languages Special Interest read-only memory. One might want a de- COFF is that despite the Common in its Group (PLSIG), part of Unix International bugger (or some other program analysis name, it isn't the same in each architecture (UI), documented the DWARF generated by tool) to keep track of whether certain sec- which uses the format. There are many SVR4 as DWARF Version 1 in 1992. Al- tions of code had been executed or not. variations in COFF, including XCOFF (used though the original DWARF had several Some debuggers allow the programmer to on IBM RS/6000), ECOFF (used on MIPS clear shortcomings, most notably that it call functions in the program being tested. and Alpha), and Windows PE-COFF. Docu- was not very compact, the PLSIG decided to In the not-so-distant past, debugging pro- mentation of these variants is available to standardize the SVR4 format with only grams that had been optimized would have varying degrees but neither the object mod- minimal modification. It was widely adopt- been considered an advanced feature. ule format nor the debugging information ed within the embedded sector where it is standardized. The task of a debugger is to provide the continues to be used today, especially for programmer with a view of the executing PE-COFF is the object module format small processors. program in as natural and understandable used by Microsoft Windows beginning with fashion as possible, while permitting a wide Windows 95. It is based on the COFF for- DWARF 2 ─ PLSIG range of control over its execution. This mat and contains both COFF debugging he PLSIG continued to develop and means that the debugger has to essentially data and Microsoft's own proprietary Code- document extensions to DWARF to ad- reverse much of the compiler's carefully View or CV4 debugging data format. Docu- T dress several issues, the most important of crafted transformations, converting the pro- mentation on the debugging format is both which was to reduce the size of debugging gram's data and state back into the terms sketchy and difficult to obtain. data that were generated. There were also that the programmer originally used in the OMF stands for Object Module Format additions to support new languages such as program's source. and is the object file format used in CP/M, the up-and-coming C++ language. DWARF The challenge of a debugging data for- DOS and OS/2 systems, as well as a small Version 2 was released as a draft standard mat, like DWARF, is to make this possible number of embedded systems.

Load more