Carnegie Mellon Carnegie Mellon
Today
¢ Linking Linking ¢ Case study: Library interposi oning
15-213 / 18-213: Introduc on to Computer Systems 12th Lecture, Feb. 23, 2012
Instructors: Todd C. Mowry & Anthony Rowe
1 2
Carnegie Mellon Carnegie Mellon
Example C Program Sta c Linking
¢ Programs are translated and linked using a compiler driver: main.c swap.c § unix> gcc -O2 -g -o p main.c swap.c int buf[2] = {1, 2}; extern int buf[]; § unix> ./p
int main() int *bufp0 = &buf[0]; { static int *bufp1; main.c swap.c Source files swap(); return 0; void swap() Translators Translators } { (cpp, cc1, as) (cpp, cc1, as) int temp;
Separately compiled bufp1 = &buf[1]; main.o swap.o relocatable object files temp = *bufp0; *bufp0 = *bufp1; Linker (ld) *bufp1 = temp; } Fully linked executable object file p (contains code and data for all func ons defined in main.c and swap.c)
3 4
1 Carnegie Mellon Carnegie Mellon
Why Linkers? Why Linkers? (cont)
¢ Reason 1: Modularity ¢ Reason 2: Efficiency
§ Program can be wri en as a collec on of smaller source files, § Time: Separate compila on rather than one monolithic mass. § Change one source file, compile, and then relink. § No need to recompile other source files. § Can build libraries of common func ons (more on this later) § e.g., Math library, standard C library § Space: Libraries § Common func ons can be aggregated into a single file... § Yet executable files and running memory images contain only code for the func ons they actually use.
5 6
Carnegie Mellon Carnegie Mellon
What Do Linkers Do? What Do Linkers Do? (cont)
¢ Step 1. Symbol resolu on ¢ Step 2. Reloca on
§ Programs define and reference symbols (variables and func ons): § Merges separate code and data sec ons into single sec ons § void swap() {…} /* define symbol swap */ § swap(); /* reference symbol a */ § Relocates symbols from their rela ve loca ons in the .o files to § int *xp = &x; /* define symbol xp, reference x */ their final absolute memory loca ons in the executable.
§ Symbol defini ons are stored (by compiler) in symbol table. § Updates all references to these symbols to reflect their new § Symbol table is an array of structs posi ons. § Each entry includes name, size, and loca on of symbol.
§ Linker associates each symbol reference with exactly one symbol defini on.
7 8
2 Carnegie Mellon Carnegie Mellon
Three Kinds of Object Files (Modules) Executable and Linkable Format (ELF)
¢ Relocatable object file (.o file) ¢ Standard binary format for object files § Contains code and data in a form that can be combined with other ¢ Originally proposed by AT&T System V Unix relocatable object files to form executable object file. § Later adopted by BSD Unix variants and Linux § Each .o file is produced from exactly one source (.c) file ¢ One unified format for § Relocatable object files (.o), ¢ Executable object file (a.out file) § Executable object files (a.out) § Contains code and data in a form that can be copied directly into § Shared object files (.so) memory and then executed.
¢ Generic name: ELF binaries ¢ Shared object file (.so file) § Special type of relocatable object file that can be loaded into memory and linked dynamically, at either load me or run- me. § Called Dynamic Link Libraries (DLLs) by Windows
9 10
Carnegie Mellon Carnegie Mellon ELF Object File Format ELF Object File Format (cont.) ¢ Elf header § Word size, byte ordering, file type (.o, ¢ .symtab sec on 0 0 exec, .so), machine type, etc. ELF header § Symbol table ELF header § Procedure and sta c variable names ¢ Segment header table Segment header table Segment header table § Sec on names and loca ons § Page size, virtual addresses memory segments (required for executables) (required for executables) (sec ons), segment sizes. .text sec on ¢ .rel.text sec on .text sec on § Reloca on info for .text sec on ¢ .text sec on .rodata sec on § Addresses of instruc ons that will need to be .rodata sec on § Code .data sec on modified in the executable .data sec on § Instruc ons for modifying. ¢ .rodata sec on .bss sec on .bss sec on § Read only data: jump tables, ... ¢ .rel.data sec on .symtab sec on § Reloca on info for .data sec on .symtab sec on ¢ .data sec on .rel.txt sec on § Addresses of pointer data that will need to be .rel.txt sec on § Ini alized global variables modified in the merged executable .rel.data sec on .rel.data sec on ¢ .bss sec on ¢ .debug sec on .debug sec on .debug sec on § Unini alized global variables § Info for symbolic debugging (gcc -g) § “Block Started by Symbol” Sec on header table ¢ Sec on header table Sec on header table § “Be er Save Space” § Offsets and sizes of each sec on § Has sec on header but occupies no space 11 12
3 Carnegie Mellon Carnegie Mellon
Linker Symbols Resolving Symbols
Global External Local ¢ Global symbols Global § Symbols defined by module m that can be referenced by other modules. int buf[2] = {1, 2}; extern int buf[]; § E.g.: non-static C func ons and non-static global variables. int main() int *bufp0 = &buf[0]; { static int *bufp1; ¢ External symbols swap(); § Global symbols that are referenced by module m but defined by some return 0; void swap() Global other module. } main.c { int temp;
¢ Local symbols External Linker knows bufp1 = &buf[1]; § Symbols that are defined and referenced exclusively by module m. nothing of temp temp = *bufp0; § E.g.: C func ons and variables defined with the static a ribute. *bufp0 = *bufp1; *bufp1 = temp; § Local linker symbols are not local program variables } swap.c
13 14
Carnegie Mellon Carnegie Mellon Reloca ng Code and Data Reloca on Info (main)
Relocatable Object Files Executable Object File main.c main.o int buf[2] = 0000000
4 Carnegie Mellon Carnegie Mellon Reloca on Info (swap, .text) Reloca on Info (swap, .data) swap.c swap.o swap.c extern int buf[]; Disassembly of section .text: extern int buf[]; Disassembly of section .data: 00000000
17 18
Carnegie Mellon Carnegie Mellon
0: 8b 15 00 00 00 00 mov 0x0,%edx Executable Before/A er Reloca on (.text) 2: R_386_32 buf 0000000
5 Carnegie Mellon Carnegie Mellon
Executable A er Reloca on (.data) Strong and Weak Symbols
¢ Program symbols are either strong or weak Disassembly of section .data: § Strong: procedures and ini alized globals 08049620
08049628
strong p1() { p2() { strong } }
21 22
Carnegie Mellon Carnegie Mellon
Linker’s Symbol Rules Linker Puzzles int x; Link me error: two strong symbols (p1) ¢ Rule 1: Mul ple strong symbols are not allowed p1() {} p1() {} § Each item can be defined only once § Otherwise: Linker error int x; int x; References to x will refer to the same p1() {} p2() {} unini alized int. Is this what you really want?
¢ Rule 2: Given a strong symbol and mul ple weak symbol, int x; double x; int y; p2() {} Writes to x in p2 might overwrite y! choose the strong symbol p1() {} Evil! § References to the weak symbol resolve to the strong symbol int x=7; double x; Writes to x in p2 will overwrite y! int y=5; p2() {} Nasty! p1() {} ¢ Rule 3: If there are mul ple weak symbols, pick an arbitrary one int x=7; int x; References to x will refer to the same ini alized § Can override this with gcc –fno-common p1() {} p2() {} variable.
Nightmare scenario: two iden cal weak structs, compiled by different compilers with different alignment rules. 23 24
6 Carnegie Mellon Carnegie Mellon
Role of .h Files Running Preprocessor global.h global.h c1.c c1.c #ifdef INITIALIZE #include "global.h" #ifdef INITIALIZE #include "global.h" int g = 23; int g = 23;
static int init = 1; int f() { static int init = 1; int f() { #else return g+1; #else return g+1; int g; } int g; } static int init = 0; static int init = 0; #endif #endif c2.c -DINITIALIZE #include
int main() { int g = 23; int g; if (!init) static int init = 1; static int init = 0; g = 37; int f() { int f() { int t = f(); return g+1; return g+1; printf("Calling f yields %d\n", t); } } return 0; } #include causes C preprocessor to insert file verba m 25 26
Carnegie Mellon Carnegie Mellon
Role of .h Files (cont.) Global Variables global.h c1.c #ifdef INITIALIZE ¢ Avoid if you can #include "global.h" int g = 23; static int init = 1; int f() { #else ¢ Otherwise return g+1; int g; } static int init = 0; § Use static if you can #endif § Ini alize if you define a global variable c2.c § Use extern if you use external global variable #include
int main() { ?? if (!init) gcc -o p c1.c c2.c \ g = 37; -DINITIALIZE int t = f(); ?? printf("Calling f yields %d\n", t); return 0; } 27 28
7 Carnegie Mellon Carnegie Mellon
Packaging Commonly Used Func ons Solu on: Sta c Libraries
¢ How to package func ons commonly used by programmers? ¢ Sta c libraries (.a archive files) § Math, I/O, memory management, string manipula on, etc. § Concatenate related relocatable object files into a single file with an index (called an archive).
¢ Awkward, given the linker framework so far: § Enhance linker so that it tries to resolve unresolved external references § Op on 1: Put all func ons into a single source file by looking for the symbols in one or more archives. § Programmers link big object file into their programs
§ Space and me inefficient § If an archive member file resolves reference, link it into the executable. § Op on 2: Put each func on in a separate source file
§ Programmers explicitly link appropriate binaries into their programs § More efficient, but burdensome on the programmer
29 30
Carnegie Mellon Carnegie Mellon
Crea ng Sta c Libraries Commonly Used Libraries libc.a (the C standard library) § 8 MB archive of 1392 object files. atoi.c printf.c random.c § I/O, memory alloca on, signal handling, string handling, data and me, random numbers, integer math Translator Translator ... Translator libm.a (the C math library) § 1 MB archive of 401 object files. atoi.o printf.o random.o § floa ng point math (sin, cos, tan, log, exp, sqrt, …)
% ar -t /usr/lib/libc.a | sort % ar -t /usr/lib/libm.a | sort Archiver (ar) unix> ar rs libc.a \ atoi.o printf.o … random.o … … fork.o e_acos.o … e_acosf.o libc.a C standard library fprintf.o e_acosh.o fpu_control.o e_acoshf.o fputc.o e_acoshl.o freopen.o e_acosl.o fscanf.o e_asin.o ¢ Archiver allows incremental updates fseek.o e_asinf.o ¢ Recompile func on that changes and replace .o file in archive. fstab.o e_asinl.o … …
31 32
8 Carnegie Mellon Carnegie Mellon Linking with Sta c Libraries Using Sta c Libraries
addvec.o multvec.o ¢ Linker’s algorithm for resolving external references: § Scan .o files and .a files in the command line order. § During the scan, keep a list of the current unresolved references. main2.c vector.h Archiver § (ar) As each new .o or .a file, obj, is encountered, try to resolve each unresolved reference in the list against the symbols defined in obj. Translators § If any entries in the unresolved list at end of scan, then error. (cpp, cc1, as) libvector.a libc.a Sta c libraries
and any other ¢ Problem: Relocatable main2.o addvec.o printf.o object files modules called by printf.o § Command line order ma ers! § Moral: put libraries at the end of the command line. Linker (ld) unix> gcc -L. libtest.o -lmine unix> gcc -L. -lmine libtest.o Fully linked p2 libtest.o: In function `main': executable object file libtest.o(.text+0x4): undefined reference to `libfun'
33 34
Carnegie Mellon Carnegie Mellon Loading Executable Object Files Shared Libraries Memory outside 32-bit Executable Object File Kernel virtual memory 0 address space ¢ Sta c libraries have the following disadvantages: ELF header 0x100000000 User stack § Duplica on in the stored executables (every func on need std libc) Program header table (created at run me) (required for executables) %esp § Duplica on in the running executables (stack .init sec on pointer) § Minor bug fixes of system libraries require each applica on to explicitly relink .text sec on Memory-mapped region for shared libraries .rodata sec on 0xf7e9ddc0 .data sec on ¢ Modern solu on: Shared Libraries
.bss sec on brk § Object files that contain code and data that are loaded and linked into .symtab Run- me heap an applica on dynamically, at either load- me or run- me (created by malloc) .debug § Also called: dynamic link libraries, DLLs, .so files Read/write segment Loaded .line (.data, .bss) from the .strtab Read-only segment executable Sec on header table (.init, .text, .rodata) file (required for relocatables) 0x08048000 Unused 0 35 36
9 Carnegie Mellon Carnegie Mellon
Shared Libraries (cont.) Dynamic Linking at Load- me main2.c vector.h unix> gcc -shared -o libvector.so \ addvec.c multvec.c ¢ Dynamic linking can occur when executable is first loaded and run (load- me linking). Translators (cpp, cc1, as) libc.so § Common case for Linux, handled automa cally by the dynamic linker libvector.so (ld-linux.so). Relocatable main2.o Reloca on and symbol § Standard C library (libc.so) usually dynamically linked. object file table info Linker (ld) ¢ Dynamic linking can also occur a er program has begun (run- me linking). Par ally linked p2 § In Linux, this is done by calls to the dlopen() interface. executable object file § Distribu ng so ware. Loader § High-performance web servers. libc.so (execve) libvector.so § Run me library interposi oning. Code and data ¢ Shared library rou nes can be shared by mul ple processes. Fully linked § More on this when we learn about virtual memory executable Dynamic linker (ld-linux.so) in memory 37 38
Carnegie Mellon Carnegie Mellon Dynamic Linking at Run- me Dynamic Linking at Run- me
#include
/* dynamically load the shared lib that contains addvec() */ /* unload the shared library */ handle = dlopen("./libvector.so", RTLD_LAZY); if (dlclose(handle) < 0) { if (!handle) { fprintf(stderr, "%s\n", dlerror()); fprintf(stderr, "%s\n", dlerror()); exit(1); exit(1); } } return 0; }
39 40
10 Carnegie Mellon Carnegie Mellon
Today Case Study: Library Interposi oning
¢ Linking ¢ Library interposi oning : powerful linking technique that ¢ Case study: Library interposi oning allows programmers to intercept calls to arbitrary func ons ¢ Interposi oning can occur at: § Compile me: When the source code is compiled § Link me: When the relocatable object files are sta cally linked to form an executable object file § Load/run me: When an executable object file is loaded into memory, dynamically linked, and then executed.
41 42
Carnegie Mellon Carnegie Mellon
Some Interposi oning Applica ons Example program
¢ Security #include
43 44
11 Carnegie Mellon Carnegie Mellon
Compile- me Interposi oning Compile- me Interposi oning
#ifdef COMPILETIME #define malloc(size) mymalloc(size, __FILE__, __LINE__ ) /* Compile-time interposition of malloc and free using C #define free(ptr) myfree(ptr, __FILE__, __LINE__ ) * preprocessor. A local malloc.h file defines malloc (free)
* as wrappers mymalloc (myfree) respectively. void *mymalloc(size_t size, char *file, int line); */ void myfree(void *ptr, char *file, int line);
malloc.h #include
Carnegie Mellon Carnegie Mellon
Link- me Interposi oning Link- me Interposi oning
linux> make hellol #ifdef LINKTIME gcc -O2 -Wall -DLINKTIME -c mymalloc.c /* Link-time interposition of malloc and free using the gcc -O2 -Wall -Wl,--wrap,malloc -Wl,--wrap,free \ static linker's (ld) "--wrap symbol" flag. */ -o hellol hello.c mymalloc.o linux> make runl #include
/* * __wrap_malloc - malloc wrapper function ¢ The “-Wl” flag passes argument to linker */ ¢ Telling linker “--wrap,malloc ” tells it to resolve void *__wrap_malloc(size_t size) { references in a special way: void *ptr = __real_malloc(size); § Refs to malloc should be resolved as __wrap_malloc printf("malloc(%d) = %p\n", (int)size, ptr); § Refs to __real_malloc should be resolved as malloc return ptr; } mymalloc.c 47 48
12 Carnegie Mellon Carnegie Mellon #ifdef RUNTIME /* Run-time interposition of malloc and free based on * dynamic linker's (ld-linux.so) LD_PRELOAD mechanism */ Load/Run- me Interposi oning #define _GNU_SOURCE #include
/* get address of libc malloc */ if (!mallocp) { mallocp = dlsym(RTLD_NEXT, "malloc"); ¢ The LD_PRELOAD environment variable tells the dynamic if ((error = dlerror()) != NULL) { fputs(error, stderr); linker to resolve unresolved refs (e.g., to malloc)by looking exit(1); in libdl.so and mymalloc.so first. } § } libdl.so necessary to resolve references to the dlopen ptr = mallocp(size); func ons. printf("malloc(%d) = %p\n", (int)size, ptr); return ptr; } mymalloc.c 49 50
Carnegie Mellon
Interposi oning Recap
¢ Compile Time § Apparent calls to malloc/free get macro-expanded into calls to mymalloc/myfree ¢ Link Time § Use linker trick to have special name resolu ons § malloc à __wrap_malloc § __real_malloc à malloc ¢ Compile Time § Implement custom version of malloc/free that use dynamic linking to load library malloc/free under different names
51
13