Outline Executable/Object File Formats Brief History of Binary File Formats
Total Page:16
File Type:pdf, Size:1020Kb
Outline CSci 5980/8980 ELF basics Manual and Automated Binary Reverse Engineering Slides 5: The ELF Binary File Format Stephen McCamant Static and dynamic linking University of Minnesota Executable/object file formats Brief history of binary file formats (Unix) Modern systems usually use a common format for Early Unix had a simple a.out format relocatable object files during compilation and final Lasted until early days of free Linux/BSD, now obsolete executables AT&T’s second try was named COFF Mostly binary data representing code and data Still limited, but widely adopted with changes Plus metadata allowing the data to be linked and AT&T’s third try was ELF, now used in almost all Unix loaded systems Brief history of binary file formats (non-Unix) Compile-time and run-time Early DOS and Windows had several limited formats Some file features are used during compilation Since the 32-bit era, Windows uses the PE (Portable Typically first created by assembler, then used/modified Executable) format by the linker Partially derived from COFF Other features are used when the program runs OS X era Apple (including iOS, etc) uses a format By the OS when the program starts named Mach-O And now also by runtime linking First developed for the Mach microkernel used on the NeXT Static and dynamic/shared linking ELF Traditional “static” linking happens all at compile time Executable (or Extensible) and Linking (or Linkable) Libraries become indistinguishable from the rest of the Format program First appeared in System V Release 4 Unix, c. 1989 For efficiency and flexibility, it is now more common to postpone library linking until runtime Linux switched to ELF c. 1995 At startup, or later in execution In part because they’d chosen a hard-to-use approach to Library code stays separate, so its memory can be a.out shared libraries shared BSD switched later, c. 1998-2000 Segments and sections High-level ELF file structure A fixed-size header with a magic number \x7fELF The run-time structure of an ELF executable divides it into segments and basic information The table describing segments is the program headers The program headers The compile-time structure of an ELF file divides it Code and data that are loaded when the program into sections runs The table describing segments is the section headers Data that isn’t normally loaded, like debugging Commonly several sections make up a segment symbols The section headers Kinds of ELF files The two main segments The code segment contains executable code, and .o .obj Relocatable object files (Windows: ) read-only data .exe Executables (Windows: ) The data segment contains writeable data .so .dll Dynamic/shared object files (Windows: ) Both are type LOAD Core dumps The segments are arranged this way so only two memory mappings are needed Other common segments The main sections INTERP: holds pathname of dynamic loader .text: most code DYNAMIC: information used by dynamic linking .rodata: read-only data like string constants (GNU )STACK: specifies stack permissions .data: initialized data (values stored in file) NOTE: miscellaneous data; in core dumps, register .bss: zero-initialized data (zeros not stored) values Other common sections Outline .init/.fini: startup/cleanup code .rel.*/.rela.*: relocation information ELF basics .comment: compiler version number .eh frame: exception-handling metadata Static and dynamic linking .debug *: debugging information Static linking Static linking vs. PIE Standard fixed addresses: At compile time, combining .o files into an x86-32: Starting at 0x08480000 executable binary x86-64: Code at 0x400000, data at 0x600000 The Unix linker is traditionally called ld Recent systems default to making even the main Generally, content from the same section of each executable position independent object file is grouped together PIE = position-independent executable PIE binaries look like shared libraries, and look on Traditionally, the linker chooses a fixed address for disk like they start at address 0 the binary to be loaded at At run-time, these offsets added to a random base Static libraries Relocation Unix static libraries end in .a, and are just archives Content in .o files must be fixed up when final of .o files locations chosen Program ar was once a relative of tar The relocation table tells the linker how The .o file are the unit of code inclusion Gives location, target symbol, machine-specific type So, often one per API function An additional offset (“addend”) may be stored in the Transitive requirements and ordering are not original bytes or in the table automatic Relocations are always size-preserving Symbol table Static program startup Static programs are loaded just by the kernel, and ELF files that define symbols list them in a symbol fairly simple table section .symtab Code and data regions are mapped as if by mmap Can examine with nm as well as objdump (demand paged) By default, finished executables include the symbol Stack is initialized with arguments, environment table variables, and auxiliary vector auxv But it is removed by strip Execution starts at the entry point Kinds of dynamic linking Dynamic program startup Automatic: libraries chosen at compile time, loaded The kernel loads both the program and the dynamic before main loader ld.so See list with ldd Full name, e.g., /lib64/ld-linux-x86-64.so.2 Linking process transparent to code, like static ld.so On-demand: requested by program runs first and performs linking Then returns control to the main program Open library with dlopen, lookup symbol with dlsym (in library libdl) You can also invoke ld.so manually Used for things like plugins Dynamic linking structures The PLT and GOT Dynamic linking uses ELF file structures separate but The Procedure Linkage Table (PLT) contains a code analogous to static linking, e.g.: stub for each called function from an external library printf@plt .dynamic section is DYNAMIC segment The static linker makes calls to, e.g., , in place Dynamic symbols .dynsym (c.f. nm -D) of a static function address Dynamic relocations .rela.dyn PLT stubs reference function pointers stored in the Calls to dynamically-linked functions use code stubs Global Offset Table (GOT) E.g. a pointer holding the location of printf in the C in the PLT referencing pointers in the GOT library Lazy resolution vs. RELRO To save startup time, symbol lookup for a function is often delayed until the first call, “lazy” On the other hand, dynamic linker structures are useful for attackers Writeable function pointers with a standard layout RELRO (relocation read-only) configurations make more dynamic linking state read-only after startup Full RELRO disables lazy resolution.