
CSci 5980/8980 Manual and Automated Binary Reverse Engineering
Slides 5: The ELF Binary Format
Stephen McCamant
University of Minnesota

Outline

ELF basics
Static and dynamic linking

Executable/object file formats

Modern systems usually use a common format for relocatable object files during compilation and final executables
Mostly binary data representing code and data
Plus metadata allowing the data to be linked and loaded

Brief history of binary file formats (Unix)

Early Unix had a simple a.out format
  Lasted until early days of free Linux/BSD, now obsolete
AT&T's second try was named COFF
  Still limited, but widely adopted with changes
AT&T's third try was ELF, now used in almost all Unix systems

Brief history of binary file formats (non-Unix)

Early DOS and Windows had several limited formats
Since the 32-bit era, Windows uses the PE (Portable Executable) format
  Partially derived from COFF
OS X era Apple (including iOS, etc.) uses a format named Mach-O
  First developed for the Mach microkernel used on the NeXT

Compile- and run-time

Some file features are used during compilation
  Typically first created by assembler, then used/modified by the linker
Other features are used when the program runs
  By the OS when the program starts
  And now also by runtime linking

Static and dynamic/shared linking

Traditional "static" linking happens all at compile time
  Libraries become indistinguishable from the rest of the program
For efficiency and flexibility, it is now common to postpone linking until runtime
  At startup, or later in execution
  Library code stays separate, so its memory can be shared

ELF

Executable (or Extensible) and Linking (or Linkable) Format
First appeared in System V Release 4 Unix, c. 1989
Linux switched to ELF c. 1995
  In part because they'd chosen a hard-to-use approach to a.out shared libraries
BSD switched later, c. 1998-2000

Segments and sections

The run-time structure of an ELF executable divides it into segments
  The table describing segments is the program headers
The compile-time structure of an ELF file divides it into sections
  The table describing sections is the section headers
Commonly several sections make up a segment

High-level ELF file structure

A fixed-size header with a magic number \x7fELF and basic information
The program headers
Code and data that are loaded when the program runs
Data that isn't normally loaded, like debugging symbols
The section headers
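As a rough illustration of the fixed-size header, the following Python sketch decodes a few of its fields using only the standard struct module. The function name parse_elf_header and the dictionary keys it returns are made up for this example. The field order follows the standard ELF header layout, and the sketch ignores everything past the header.

```python
import struct

ELF_MAGIC = b"\x7fELF"

# e_type values defined by the ELF specification
ET_NAMES = {1: "REL", 2: "EXEC", 3: "DYN", 4: "CORE"}


def parse_elf_header(data):
    """Decode selected fields of an ELF file header from raw bytes."""
    if data[:4] != ELF_MAGIC:
        raise ValueError("missing \\x7fELF magic number")
    ei_class, ei_data = data[4], data[5]
    if ei_class not in (1, 2) or ei_data not in (1, 2):
        raise ValueError("unknown ELF class or byte order")
    endian = "<" if ei_data == 1 else ">"
    # Address-sized fields are 4 bytes in ELF32 and 8 bytes in ELF64.
    word = "I" if ei_class == 1 else "Q"
    fmt = endian + "16sHHI" + word * 3 + "IHHHHHH"
    (_ident, e_type, e_machine, _version, e_entry, e_phoff, e_shoff,
     _flags, _ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
     _shstrndx) = struct.unpack_from(fmt, data)
    return {
        "bits": 32 if ei_class == 1 else 64,
        "type": ET_NAMES.get(e_type, hex(e_type)),
        "machine": e_machine,
        "entry": e_entry,
        # Location and shape of the program header table (segments)
        "phoff": e_phoff, "phnum": e_phnum, "phentsize": e_phentsize,
        # Location and shape of the section header table (sections)
        "shoff": e_shoff, "shnum": e_shnum, "shentsize": e_shentsize,
    }
```

On a Linux system one might try it as parse_elf_header(open("/bin/ls", "rb").read(64)); the path is only an example.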

Kinds of ELF files

Relocatable object files .o (Windows: .obj)
Executables (Windows: .exe)
Dynamic/shared object files .so (Windows: .dll)
Core dumps

The two main segments

The text segment contains executable code, and read-only data
The data segment contains writeable data
Both are of type LOAD
The segments are arranged this way so only two memory mappings are needed

Other common segments

INTERP: holds pathname of dynamic loader
DYNAMIC: information used by dynamic linking
(GNU_)STACK: specifies stack permissions
NOTE: miscellaneous data; in core dumps, register values

The main sections

.text: code
.rodata: read-only data like string constants
.data: initialized data (values stored in file)
.bss: zero-initialized data (zeros not stored)
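To make the segment types more concrete, here is a small Python sketch that walks the program headers and reports each segment's type and permissions. It assumes a 64-bit little-endian file. The name list_segments is invented for this example, and the table of type names covers only a handful of common values.

```python
import struct

# A few common p_type values; others are shown in hex.
PT_NAMES = {
    0: "NULL", 1: "LOAD", 2: "DYNAMIC", 3: "INTERP", 4: "NOTE",
    6: "PHDR", 7: "TLS",
    0x6474E550: "GNU_EH_FRAME", 0x6474E551: "GNU_STACK",
    0x6474E552: "GNU_RELRO",
}


def list_segments(data):
    """Return (type, perms, vaddr, filesz, memsz) for each program header.

    Assumption: data holds a 64-bit little-endian ELF file.
    """
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF file")
    # Offsets of e_phoff, e_phentsize and e_phnum within the ELF64 header.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    segments = []
    for i in range(e_phnum):
        (p_type, p_flags, _offset, p_vaddr, _paddr,
         p_filesz, p_memsz, _align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        # Flag bits: PF_X = 1, PF_W = 2, PF_R = 4
        perms = (("r" if p_flags & 4 else "-")
                 + ("w" if p_flags & 2 else "-")
                 + ("x" if p_flags & 1 else "-"))
        segments.append((PT_NAMES.get(p_type, hex(p_type)), perms,
                         p_vaddr, p_filesz, p_memsz))
    return segments
```

In a typical executable the writeable LOAD segment has memsz larger than filesz; the difference is the .bss data whose zeros are not stored in the file.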

Other common sections

.init/.fini: startup/cleanup code
.rel.*/.rela.*: relocation information
.comment: compiler version number
.eh_frame: exception-handling metadata
.debug_*: debugging information

Outline

ELF basics
Static and dynamic linking

Static linking

At compile time, combining .o files into an executable binary
The Unix linker is traditionally called ld
Generally, content from the same section of each object file is grouped together
Traditionally, the linker chooses a fixed address for the binary to be loaded at

Static linking vs. PIE

Standard fixed addresses:
  x86-32: Starting at 0x08048000
  x86-64: Code at 0x400000, data at 0x600000
Recent systems default to making even the main executable position independent
  PIE = position-independent executable
PIE binaries look like shared libraries, and look on disk like they start at address 0
  At run-time, these offsets are added to a random base
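The grouping of same-named sections, and the difference between a fixed load address and a PIE-style base of 0, can be sketched with a toy layout function. This is a deliberately simplified model and not how ld is implemented: it ignores alignment and the split into separate segments, and the input format (a list of object names with section sizes) is invented for the example.

```python
def layout(objects, base=0x400000,
           section_order=(".text", ".rodata", ".data", ".bss")):
    """Toy static-link layout: place same-named sections next to each other.

    objects: list of (object_name, {section_name: size}) pairs
             (a hypothetical input format for this sketch).
    Returns {(object_name, section_name): address}.
    """
    addresses = {}
    cursor = base
    for section in section_order:
        # All .text pieces first, then all .rodata pieces, and so on.
        for obj_name, sizes in objects:
            size = sizes.get(section, 0)
            if size:
                addresses[(obj_name, section)] = cursor
                cursor += size
    return addresses


objects = [
    ("a.o", {".text": 0x30, ".data": 0x8}),
    ("b.o", {".text": 0x10, ".data": 0x4}),
]

# Traditional fixed-address layout
fixed = layout(objects)

# PIE-style layout: offsets from 0 on disk, plus a base chosen at run time.
pie = layout(objects, base=0)
load_base = 0x555555554000  # example value only; the real base is random
runtime_addr = load_base + pie[("b.o", ".text")]
```

Here the .text of b.o lands directly after the .text of a.o, ahead of either object's .data, which is the grouping the linker performs.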

Static libraries

Unix static libraries end in .a, and are just archives of .o files
  Program ar was once a relative of tar
The .o files are the unit of code inclusion
  So, often one per API function
Transitive requirements and ordering are not automatic

Relocation

Content in .o files must be fixed up when final locations are chosen
The relocation table tells the linker how
  Gives location, target symbol, machine-specific type
  An additional offset ("addend") may be stored in the original location or in the table
Relocations are always size-preserving
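As a worked example of a relocation entry (location, target symbol, type, addend), the sketch below applies two common x86-64 relocation types to a section held in a bytearray. The function name apply_relocation and its parameters are invented for illustration; the formulas S + A and S + A - P are the standard ones for R_X86_64_64 and R_X86_64_PC32, where S is the symbol address, A the addend, and P the address of the place being patched.

```python
import struct

R_X86_64_64 = 1    # 64-bit absolute address: S + A
R_X86_64_PC32 = 2  # 32-bit PC-relative offset: S + A - P


def apply_relocation(section, section_addr, r_offset, r_type,
                     sym_addr, addend):
    """Patch one relocation in place; the section never changes size.

    section:      bytearray with the section contents
    section_addr: final address chosen for the start of the section
    r_offset:     offset of the field to patch within the section
    """
    place = section_addr + r_offset
    if r_type == R_X86_64_64:
        value = (sym_addr + addend) & 0xFFFFFFFFFFFFFFFF
        struct.pack_into("<Q", section, r_offset, value)
    elif r_type == R_X86_64_PC32:
        value = sym_addr + addend - place
        if not -2**31 <= value < 2**31:
            raise OverflowError("target out of 32-bit PC-relative range")
        struct.pack_into("<i", section, r_offset, value)
    else:
        raise NotImplementedError("relocation type %d" % r_type)


# A call instruction (opcode e8) with a zeroed 4-byte operand at offset 1.
code = bytearray(b"\xe8\x00\x00\x00\x00")
# The addend -4 accounts for the operand being relative to the next
# instruction, which starts 4 bytes after the patched field.
apply_relocation(code, 0x401000, 1, R_X86_64_PC32, 0x401020, -4)
```

After the call, the operand holds 0x1b, the distance from the instruction following the call (0x401005) to the target (0x401020).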

Symbol table

ELF files that define symbols list them in a symbol table section .symtab
  Can examine with nm (as well as tools such as readelf)
By default, finished executables include the symbol table
  But it is removed by strip

Static program startup

Static programs are loaded just by the kernel, and loading is fairly simple
Code and data regions are mapped as if by mmap (demand paged)
Stack is initialized with arguments, environment variables, and auxiliary vector auxv
Execution starts at the entry point
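For a sense of what a symbol table entry holds, this sketch decodes 64-bit little-endian symbol entries from raw bytes. It assumes the contents of .symtab and of its string table (normally .strtab) have already been extracted, which is not shown. The name parse_symtab and the dictionary keys are invented for the example.

```python
import struct

SYM_FMT = "<IBBHQQ"  # Elf64_Sym: name, info, other, shndx, value, size
SYM_SIZE = struct.calcsize(SYM_FMT)  # 24 bytes per entry

BINDS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}


def parse_symtab(symtab, strtab):
    """Decode symbol entries given raw .symtab and string table bytes."""
    symbols = []
    for off in range(0, len(symtab) - SYM_SIZE + 1, SYM_SIZE):
        (st_name, st_info, _other, st_shndx,
         st_value, st_size) = struct.unpack_from(SYM_FMT, symtab, off)
        # st_name is an offset into the string table; names end with NUL.
        end = strtab.index(b"\0", st_name)
        symbols.append({
            "name": strtab[st_name:end].decode("ascii", "replace"),
            "bind": BINDS.get(st_info >> 4, st_info >> 4),
            "type": TYPES.get(st_info & 0xF, st_info & 0xF),
            "shndx": st_shndx,
            "value": st_value,
            "size": st_size,
        })
    return symbols
```

The first entry of a real symbol table is an all-zero placeholder, so it decodes to an empty name.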

Kinds of dynamic linking

Automatic: libraries chosen at compile time, loaded before main
  See list with ldd
  Linking transparent to code, like static
On-demand: requested by program
  Load library with dlopen, lookup symbol with dlsym (in library libdl)
  Used for things like plugins

Dynamic program startup

The kernel loads both the program and the dynamic loader ld.so
  Full name, e.g., /lib64/ld-linux-x86-64.so.2
ld.so runs first and performs linking
  Then returns control to the main program
You can also invoke ld.so manually

Dynamic linking structures

Dynamic linking uses ELF file structures separate but analogous to static linking, e.g.:
  .dynamic section is DYNAMIC segment
  Dynamic symbols .dynsym (cf. nm -D)
  Dynamic relocations .rela.dyn
Calls to dynamically-linked functions use code stubs in the PLT referencing pointers in the GOT

The PLT and GOT

The Procedure Linkage Table (PLT) contains a code stub for each called function from an external library
  The static linker makes calls to, e.g., printf@plt, in place of a static function address
PLT stubs reference function pointers stored in the Global Offset Table (GOT)
  E.g. a pointer holding the location of printf in the C library
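The on-demand style of dynamic linking (dlopen followed by dlsym) can be tried from Python, whose ctypes module wraps those calls on Unix-like systems. This is a minimal sketch assuming Linux or a similar POSIX system. The wrapper name c_strlen is invented, and strlen is used only because it is available in any C library.

```python
import ctypes

# Passing None asks dlopen for a handle covering the running program and
# the libraries already loaded into it (POSIX behavior; assumption: this
# is not run on Windows). A plugin loader would pass a library path.
handle = ctypes.CDLL(None)

# Attribute access performs the symbol lookup (the dlsym step).
strlen = handle.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Call the C library's strlen through the looked-up pointer."""
    return strlen(text)
```

For example, c_strlen(b"ELF") calls into the C library and returns 3.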

Lazy resolution vs. RELRO

To save startup time, symbol lookup for a function is often delayed until the first call ("lazy" resolution)
On the other hand, these structures are useful for attackers
  Writeable function pointers with a standard layout
RELRO (relocation read-only) configurations make more dynamic linking state read-only after startup
  Full RELRO disables lazy resolution
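The trade-off between lazy resolution and full RELRO can be modeled with a toy table of function pointers. This is only a simulation in Python, not real loader code: the class LazyGot and its methods are invented, a dictionary stands in for the shared library, and a read-only mapping stands in for memory that has been made non-writeable.

```python
from types import MappingProxyType


class LazyGot:
    """Toy model of GOT entries filled lazily or eagerly (simulation only)."""

    def __init__(self, library):
        self.library = library  # symbol name -> function ("shared library")
        self.got = {}           # resolved "function pointers"
        self.lookups = 0        # symbol lookups done by the "dynamic linker"

    def _resolve(self, name):
        self.lookups += 1
        return self.library[name]

    def call(self, name, *args):
        # Like a PLT stub: go through the table entry, resolving it on
        # first use and reusing the stored pointer afterwards.
        if name not in self.got:
            self.got[name] = self._resolve(name)
        return self.got[name](*args)

    def bind_now(self):
        # Like full RELRO: resolve everything at startup, then freeze.
        resolved = {name: self._resolve(name) for name in self.library}
        self.got = MappingProxyType(resolved)


library = {"abs": abs, "max": max}

lazy = LazyGot(library)
lazy.call("abs", -3)
lazy.call("abs", -4)   # second call reuses the stored pointer

eager = LazyGot(library)
eager.bind_now()       # pays for every lookup up front
```

In the lazy case the table stays writeable, so code that can overwrite lazy.got["abs"] redirects later calls. In the eager case the same assignment raises TypeError, at the cost of looking up every symbol at startup.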