Pmac Binary Instrumentation Library for Powerpc/AIX

PMaC Binary Instrumentation Library for PowerPC/AIX Mustafa M. Tikir, Michael Laurenzano, Laura Carrington, Allan Snavely Performance Modeling and Characterization Lab San Diego Supercomputer Center 9500 Gilman Drive, La Jolla, CA 92093 {mtikir,michaell,lcarring,allans}@sdsc.edu Abstract In this paper we introduce PMaCinst, the PMaC binary instrumentation library. PMaCinst enables The decommissioning of Alpha AXP-based systems instrumentation of XCOFF [5] executables on the AIX carrying the ATOM toolkit has left the need for an operating system on PowerPC processors [6]. PMaCinst is efficient, flexible binary instrumentation tool-building a tool-building system similar in nature to popular toolkits framework for another platform. PMaCinst is a binary such as Dyninst [2], ATOM and Pin [3]. It is designed to instrumentation toolkit that operates on XCOFF binaries provide functionality that is similar to these tools in that it on AIX for PowerPC processors. PMaCinst has a C++ allows arbitrary functions to be inserted at arbitrary points API that provides the means to inject code and data into a in executables. Furthermore, it provides the facilities to binary file In this paper, we first present the mechanisms insert hand optimized low-level assembly code, which can for performing these modifications along with the key help keep instrumentation overhead minimal. PMaCinst parts of the API. We then present three example rewriting works on any compiled XCOFF binary (both 32 and 64- tools that have been built using PMaCinst, which help to bit), independent of the compiler and language system highlight some of the correctness and efficiency issues that used to generate the binary. can be encountered by tool writers. Finally, we show that programs instrumented with PMaCinst slow down at rates We also present several rewriting tools implemented that are comparable to equivalently instrumented using the PMaCinst library. These tools include a basic programs created with ATOM. block tracing tool that prints function name and line number information of basic blocks as they are executed, a 1. Introduction basic block counting tool that counts the number of times each basic block is executed during a program run, and a Program instrumentation tools [1-3] provide a unique and cache simulation tool that can simulate multiple memory important source of information for understanding and hierarchies with different numbers of cache levels using tuning the execution behavior of applications. These tools the address stream of the program. The PMaCinst have proven to be effective during every stage of an instrumentation library source code and the previously application’s development cycle. They have been mentioned rewriting tools are available for download at extensively used to evaluate how a program will perform http://www.sdsc.edu/PMaC/PMaCinst/. on new systems, to identify performance bottlenecks during program execution, to gather information on 2. PMaCinst API underlying systems for profile-driven optimizations, and to identify bugs in programs [9]. PMaCinst library provides an Application Programmer Interface (API) in C++ for building program analysis and In the Performance Modeling and Characterization rewriting tools. This API includes functionality to parse an (PMaC) Lab at the San Diego Supercomputer Center, we XCOFF executable and generate program objects, to inject focus on developing methods and tools for understanding instrumentation code and data into the executable, and to and predicting the performance of scientific applications generate a modified XCOFF executable that includes on existing and future HPC systems. One of these tools, instrumentation code and the accompanying changes to called the MetaSim Tracer[4], provides the user with a program objects. The general framework for PMaCinst is detailed summary of the fundamental operations carried shown in Figure 1. out by the application. MetaSim Tracer is a binary rewriting tool currently implemented on top of the ATOM 2.1 Parsing the Executable toolkit [1] for Tru64 Unix on Alpha AXP processors. However, due to the decommissioning of current Alpha- The PMaCinst API provides class definitions for XCOFF based systems, the need has emerged for an efficient, file objects such as the file header, section header, symbol flexible instrumentation toolkit that operates on a different table, relocation table and line information table, and platform. provides class definitions for program objects such as instructions, basic blocks, functions, control flow graphs address space. Each traceback table starts with a null word (CFG), memory operations, and natural loops. (with a value of 0 also indicating an invalid instruction), which can then be used to mark the end of its The method of the class (shown in parse XCoffFile accompanying function. Figure 1) parses an input executable file, creates XCOFF objects, and creates program objects for the entire To generate the CFG for a function with indirect executable. The parse method parses every object at jumps, we first identify whether the jump instruction is a startup instead of incremental parsing, where program multi-target lookup table jump generated by a high level objects would be parsed as needed with a potentially code construct such as a switch statement or whether it is smaller overhead. Parsing every object at startup allows simply an indirect call to a function. To identify multi- PMaCinst to assign a unique and persistent ID, called a target jumps, we search for compiler-specific instruction hashcode, to each program object encountered in the sequence patterns generated for lookup table jumps similar executable. This hashcode is based on a hierarchy of to general peephole optimization techniques used in container objects (such as instructions in a basic block, compilers. Such patterns contain instruction sequences to basic blocks in a function, functions in a code section, compare the switch value with the lookup table size as etc.), and parsing the executable in its entirety at startup well as instruction sequences to calculate the jump enables PMaCinst to assign the same hashcode to each addresses. Using the information available in these object every time the executable is parsed. Unlike parsing, instruction sequences, we identify the locations of lookup much of the remaining functionality of PMaCinst is tables in the executable as well as the number of entries in invoked on demand, such as constructing loops or them. Later we parse the lookup table to generate the generating line number information for a particular potential target addresses for the indirect jump instruction. function or for all functions in a code section. Since we search for instruction sequence patterns to identify the multi-target jump instructions, PMaCinst // Step I : Executable parsing requires prior knowledge about all potential patterns that XCoffFile file(execName); may be generated by the available compilers on the file.parse(); system. In this early version of the PMaCinst library, we handle common patterns for the GNU and native compilers for C, C++, and Fortran. We plan to extend these patterns // Step II : Code/Data Injection in future versions of PMaCinst to include other languages XCoffFileGen fileGen(file); fileGen.instrument(); and compilers. PMaCinst also provides on-demand methods that discover and interface to more complex program // Step III: New Executable information such as dominator information, natural loops fileGen.dump(); and line number information. For loop discovery, we use the algorithms presented in [12] along with the linear- Figure 1. General PMaCinst Framework complexity dominator finding method presented by Executable parsing is fairly straightforward as XCOFF Lenguar and Tarjan in [13]. is a well defined and self contained object file format. Necessary line information can be found in the line However, special processing is required in two notable information and symbol tables of the XCOFF executable cases: identifying the sizes of some functions, and when debugging information is present, but finding generating the CFG for functions with indirect jumps relevant information from these sources can involve time- resulting from high-level constructs such as switch wasting searches. For example, determining the source file statements (multi-target jumps that use lookup tables). from which a particular instruction came from would PMaCinst does not require that executables contain typically involve searching the line information table to debugging information. The minimal symbol table for an determine which symbol (function) is associated with that XCOFF file, which is generated when the program is instruction then searching backwards from this function in compiled without the debug flag (e.g. –g), provides the symbol table to find the file symbol that contains the information about the sizes of all control sections (CSECT function. Such searching is inefficient and can be [5]) in the executable. It also provides size information for incredibly redundant when many such lookups are being some functions. However, a CSECT may contain several done. Instead of working from scratch for each such functions for which the symbol table only provides search, the user can invoke the setLineInfoFinder information about the containing CSECT. For these method of the XCoffFile class to create data structures functions, we use the start address of the function’s that maintain intermediate information about how the traceback table to determine

Pmac Binary Instrumentation Library for Powerpc/AIX

The “Stabs” Debug Format

The IAR XLINK Linker™ and IAR XLIB Librarian™ Reference Guide

Packet Structure Definition

Debugging with GDB

Portable Executable File Format

Common Object File Format (COFF)

CS429: Computer Organization and Architecture Linking I & II

IBM AIX Version 6.1 Differences Guide

Chapter 6: the Linker

IAR Linker and Library Tools Reference Guide

Using Ld the GNU Linker

CPE 323 Introduction to Software Reverse Engineering in Embedded Computer Systems