A Shared Library and Dynamic Linking Monitor
Total Page:16
File Type:pdf, Size:1020Kb
MONDO: A Shared Library and Dynamic Linking Monitor C. Donour Sizemore, Jacob R. Lilly, and David M. Beazley Department of Computer Science University of Chicago Chicago, Illinois 60637 {donour,jrlilly,beazley}@cs.uchicago.edu March 15, 2003 Abstract ically loadable extension modules. Similarly, dy- namic modules are commonly used to extend web Dynamic modules are one of the most attractive servers, browsers, and other large programming en- features of modern scripting languages. Dynamic vironments. modules motivate programmers to build their ap- Although dynamic linking offers many benefits plications as small components, then glue them to- such as increased modularity, simplified mainte- gether. They rely on the runtime linker to assemble nance, and extensibility, it has also produced a con- components and shared libraries as the application siderable amount of programmer confusion. Few runs. MONDO, a new debugging tool, provides pro- programmers would claim to really understand how grammers with the ability to monitor in real-time dynamic linking actually works. Moreover, the run- the dynamic linking of a program. MONDO sup- time linking process depends heavily on the system plies programmers with a graphical interface show- configuration, environment variables, subtle com- ing library dependencies, listing symbol bindings, piler and linker options, and the flags passed to low- and providing linking information used to uncover level dynamic loader. The interaction of these pieces subtle programming errors related to the use of tends to produce a very obscured picture of what shared libraries. The use of MONDO requires no is actually going on inside an application that uses modification to existing code or any changes to the shared libraries. Unfortunately, traditional debug- dynamic linker. MONDO can be used with any ap- ging tools are of little assistance since they are pri- plication that uses shared libraries. marily concerned with errors in program logic rather than errors that arise from the way in which a pro- gram is constructed and linked. 1 Introduction To address this limitation, we present a special purpose debugger, MONDO (Monitor of Dynamic For the past 10-15 years, shared libraries and dy- Objects), that is designed to help programmers dis- namic linking have changed the way application pro- cover problems related to the run-time linking of grammers write software. Instead of writing huge an application. MONDO works by watching real- monolithic applications, it is quite common to exe- time debug traces generated by the run-time linker cute smaller software components, extension mod- (ld.so.1) and using the output information to con- ules, and plugins. For example, when programmers struct a global view of an application, its libraries, use scripting languages like Python or Tcl, they also and dynamically loaded modules (if any). Using this tend to use a collection of loosely-coupled dynam- information, it is possible examine symbol bindings, library dependencies, as well as to uncover subtle % ldd a.out programming problems related to linking that are libsocket.so.1 => /usr/lib/libsocket.so.1 otherwise hidden from the programmer by the lim- libthread.so.1 => /usr/lib/libthread.so.1 itations of classic debugging tools. libdl.so.1 => /usr/lib/libdl.so.1 In the first part of this paper, we illustrate dy- libc.so.1 => /usr/lib/libc.so.1 namic linking and some of the common program- ming errors that arise. Next, we describe how run- When a dynamically linked program runs, con- time linking information can be extracted from the trol is first passed to a special runtime loader (often loader. This information is used to construct the called ld.so.1). The loader is then responsible for MONDO debugger and how it can be used by pro- locating, loading, and binding library dependencies grammers. to the the executable–a process that occurs at run- time. To improve performance, most of the runtime binding occurs lazily. Library symbols are not actu- 2 An Overview of Dynamic Linking ally bound until they are used for the first time by a program. This binding is managed by routing proce- To create an executable, a “linker” is used to dure calls through a special procedure linking table assemble the appropriate object files and libraries (PLT), a jump table use to match procedures with into a runnable program [2]. Linking is nothing memory addresses. Initially, the PLT is set up to more than the binding of symbolic names to mem- pass control back to the dynamic loader when a pro- ory addresses. For example, if you have some proce- cedure is called for the first time. When this occurs, dure foo() in your application, the linker calculates the dynamic linker locates the symbol in a library where in memory foo() will reside and patches all and patches the PLT to point to the newly bound references to foo() making sure they point to the symbol. Many programmers are unaware that link- memory address of foo(). Most programmers view ing occurs throughout the execution of a program. linking as merely the final step of building a pro- For instance, when a user selects certain applica- gram. That is,can one simply runs the linker and it tion features, the linker quietly runs underneath the collects all of the object files and library functions program–binding all of the newly used symbols as into some huge “bundle of bits” that is the program. they are needed. A popular misconceptions of program state is that In addition to automatically binding libraries, once linked the program runs statically without any the runtime loader can also be used to explicitly dynamic changes (i.e., one just executes the pro- load new program modules through a special dy- gram). namic loading API. Using calls such as dlopen() In reality, the situation is much more compli- and dysym(), programmers can load shared object cated and convoluted than the previously described files, search for symbols, and invoke functions. This model. Most modern systems use dynamic linking– API is the basis for systems that support dynami- a technique in which much of the actual linking is cally loadable extension modules and plugins. deferred until runtime. For example, when you cre- ate a program like this, 3 Enter MONDO % ld $(OBJS) -lsocket -lthread -ldl Unlike in conventional programming langauges The linker merely records the names of libraries in (C, C++, etc), users routinely change the ”struc- the executable instead of linking them. A command ture” of their program when they import modules. such as ldd can be used to list these library and their In scripting languages errors in program construc- dependencies. For example: tion can be considered programming errors. A single module may introduce many libraries with no clear 4 Data Collection way of watching what is happening to a process as they are linked. Here a mistake in program logic As previously mentioned, MONDO collects infor- results in an incorrectly built program. mation from ld.so.1 using its debugging mode. The runtime linker (ld.so.1) on many systems can This is enabled by setting the LD DEBUG and supply real-time updates as it binds symbols from LD DEBUG OUTPUT environment variables of a dynamic objects. The LD DEBUG environment process. Debugging data is redirected to a file where variable allows the user to collect information about it is read and parsed by MONDO as the process exe- libraries, dependencies, bindings, and relocations as cutes. Information of interest includes library search they are processed by ld.so.1 during program exe- paths, library loading, library dependencies, symbol cution. Figure 1 illustrates some of this output for bindings, and relocations. In addition to collecting Solaris. the trace data, MONDO also collects information from the library files themselves. This is used to The ld.so.1 debugging trace contains a wealth build a model of what libraries have been loaded, of information about almost everything the runtime what symbols are contained in those libraries, and linker is doing during program execution. Moreover, which symbols have actually been bound by the run- this information is presented in real-time–making time linker. it qualitatively different than the information pro- MONDO can simultaneously monitor any number vided by static library diagnostic tools such as ldd, of programs and is limited only by the underlying nm, or elfdump. However, the debug trace is often operating system (number of process, open file han- unreadable and contains too much unordered infor- dles, etc.). Since tracing is enabled by the environ- mation to be easily usable by programmers. ment (and environments are preserved after calls to MONDO is a tool that synthesizes real-time data exec()), any process spawned by a traced program from the dynamic loader with the information that will be traced itself. Process ID is provided inline is available from existing library diagnostic tools with traces data (first column in fig.1) and ld.so.1 (e.g., nm, ldd, etc.). Moreover, it tries to present is able to dump trace data to a file corresponding this information to the user in a coherent manner the PID. MONDO can follow the growth of a pro- using an interactive graphical user interface. The cess tree without the need to do system call tracing. system consists of two components; a real-time data Function overloading, Namespaces, etc. do not collector that extracts data from the ld.so.1 trace exists at the object level. To support these fea- and an analysis tool that assembles this data, com- tures C++ compiles mangle type names. The man- bines it with static library information, and presents gled names reflect typing and scope information. it to the user. MONDO uses the demangingle code from GCC to To use MONDO, a programmer simply types demangle these symbols names.