Linking Today Example C Program Stapc Linking

Carnegie Mellon Carnegie Mellon Today ¢ Linking Linking ¢ Case study: Library interposioning 15-213 / 18-213: Introduc2on to Computer Systems 12th Lecture, Feb. 23, 2012 Instructors: Todd C. Mowry & Anthony Rowe 1 2 Carnegie Mellon Carnegie Mellon Example C Program Sta7c Linking ¢ Programs are translated and linked using a compiler driver: main.c swap.c § unix> gcc -O2 -g -o p main.c swap.c int buf[2] = {1, 2}; extern int buf[]; § unix> ./p int main() int *bufp0 = &buf[0]; { static int *bufp1; main.c swap.c Source files swap(); return 0; void swap() Translators Translators } { (cpp, cc1, as) (cpp, cc1, as) int temp; Separately compiled bufp1 = &buf[1]; main.o swap.o relocatable object files temp = *bufp0; *bufp0 = *bufp1; Linker (ld) *bufp1 = temp; } Fully linked executable object file p (contains code and data for all func;ons defined in main.c and swap.c) 3 4 1 Carnegie Mellon Carnegie Mellon Why Linkers? Why Linkers? (cont) ¢ Reason 1: Modularity ¢ Reason 2: Efficiency § Program can be wriKen as a coLLec2on of smalLer source fiLes, § Time: Separate compiLaon rather than one monoLithic mass. § Change one source fiLe, compiLe, and then relink. § No need to recompiLe other source fiLes. § Can buiLd Libraries of common func2ons (more on this Later) § e.g., Math Library, standard C Library § Space: Libraries § Common func2ons can be aggregated into a singLe fiLe... § Yet executabLe fiLes and running memory images contain onLy code for the func2ons they actualLy use. 5 6 Carnegie Mellon Carnegie Mellon What Do Linkers Do? What Do Linkers Do? (cont) ¢ Step 1. Symbol resoluon ¢ Step 2. Reloca7on § Programs define and reference symbols (variabLes and func2ons): § Merges separate code and data sec2ons into singLe sec2ons § void swap() {…} /* define symbol swap */ § swap(); /* reference symbol a */ § ReLocates symboLs from their reLave Locaons in the .o fiLes to § int *xp = &x; /* define symbol xp, reference x */ their final absoLute memory Locaons in the executabLe. § SymboL defini2ons are stored (by compiLer) in symbol table. § Updates alL references to these symboLs to reflect their new § SymboL tabLe is an array of structs posions. § Each entry incLudes name, size, and Locaon of symboL. § Linker associates each symboL reference with exactly one symboL defini2on. 7 8 2 Carnegie Mellon Carnegie Mellon Three Kinds of Object Files (Modules) Executable and Linkable Format (ELF) ¢ Relocatable object file (.o file) ¢ Standard binary format for object files § Contains code and data in a form that can be combined with other ¢ Originally proposed by AT&T System V Unix reLocatabLe object fiLes to form executabLe object fiLe. § Later adopted by BSD Unix variants and Linux § Each .o fiLe is produced from exactly one source (.c) file ¢ One unified format for § ReLocatabLe object fiLes (.o), ¢ Executable object file (a.out file) § ExecutabLe object fiLes (a.out) § Contains code and data in a form that can be copied directly into § Shared object fiLes (.so) memory and then executed. ¢ Generic name: ELF binaries ¢ Shared object file (.so file) § Special type of reLocatabLe object fiLe that can be Loaded into memory and Linked dynamicalLy, at either Load 2me or run-2me. § CalLed Dynamic Link Libraries (DLLs) by Windows 9 10 Carnegie Mellon Carnegie Mellon ELF Object File Format ELF Object File Format (cont.) ¢ Elf header § Word size, byte ordering, fiLe type (.o, ¢ .symtab secon 0 0 exec, .so), machine type, etc. ELF header § SymboL tabLe ELF header § Procedure and stac variabLe names ¢ Segment header table Segment header table Segment header table § Sec2on names and Locaons § Page size, virtual addresses memory segments (required for executables) (required for executables) (sec2ons), segment sizes. .text secon ¢ .rel.text secon .text secon § ReLocaon info for .text secon ¢ .text secon .rodata secon § Addresses of instruc2ons that wiLL need to be .rodata secon § Code .data secon modified in the executabLe .data secon § Instruc2ons for modifying. ¢ .rodata sec7on .bss secon .bss secon § Read onLy data: jump tabLes, ... ¢ .rel.data secon .symtab sec7on § ReLocaon info for .data secon .symtab sec7on ¢ .data secon .rel.txt sec7on § Addresses of pointer data that wiLL need to be .rel.txt sec7on § Ini2alized gLobal variabLes modified in the merged executabLe .rel.data sec7on .rel.data sec7on ¢ .bss secon ¢ .debug secon .debug sec7on .debug sec7on § Unini2alized gLobal variabLes § Info for symboLic debugging (gcc -g) § “BLock Started by SymboL” Sec7on header table ¢ Sec7on header table Sec7on header table § “BeKer Save Space” § Offsets and sizes of each sec2on § Has sec2on header but occupies no space 11 12 3 Carnegie Mellon Carnegie Mellon Linker Symbols Resolving Symbols Global External Local ¢ Global symbols Global § Symbols defined by module m that can be referenced by other moduLes. int buf[2] = {1, 2}; extern int buf[]; § E.g.: non-static C func2ons and non-static gLobal variabLes. int main() int *bufp0 = &buf[0]; { static int *bufp1; ¢ External symbols swap(); § GLobal symboLs that are referenced by moduLe m but defined by some return 0; void swap() Global other moduLe. } main.c { int temp; ¢ Local symbols External Linker knows bufp1 = &buf[1]; § SymboLs that are defined and referenced excLusiveLy by moduLe m. nothing of temp temp = *bufp0; § E.g.: C func2ons and variabLes defined with the static aribute. *bufp0 = *bufp1; *bufp1 = temp; § Local linker symbols are not local program variables } swap.c 13 14 Carnegie Mellon Carnegie Mellon Reloca7ng Code and Data Reloca7on Info (main) Relocatable Object Files Executable Object File main.c main.o int buf[2] = 0000000 <main>: {1,2}; 0: 8d 4c 24 04 lea 0x4(%esp),%ecx 4: 83 e4 f0 and $0xfffffff0,%esp System code .text 0 Headers int main() 7: ff 71 fc pushl 0xfffffffc(%ecx) a: 55 push %ebp System data .data System code { b: 89 e5 mov %esp,%ebp swap(); d: 51 push %ecx main() return 0; e: 83 ec 04 sub $0x4,%esp .text main.o } 11: e8 fc ff ff ff call 12 <main+0x12> swap() 12: R_386_PC32 swap main() .text 16: 83 c4 04 add $0x4,%esp More system code 19: 31 c0 xor %eax,%eax int buf[2]={1,2} .data 1b: 59 pop %ecx System data 1c: 5d pop %ebp swap.o int buf[2]={1,2} .data 1d: 8d 61 fc lea 0xfffffffc(%ecx),%esp int *bufp0=&buf[0] 20: c3 ret swap() .text int *bufp1 .bss int *bufp0=&buf[0] .data .symtab .debug Disassembly of section .data: static int *bufp1 .bss Source: objdump –r -d 00000000 <buf>: Even though private to swap, requires allocaon in .bss 0: 01 00 00 00 02 00 00 00 15 16 4 Carnegie Mellon Carnegie Mellon Reloca7on Info (swap, .text) Reloca7on Info (swap, .data) swap.c swap.o swap.c extern int buf[]; Disassembly of section .text: Disassembly of section .data: extern int buf[]; 00000000 <swap>: int 00000000 <bufp0>: *bufp0 = &buf[0]; 0: 8b 15 00 00 00 00 mov 0x0,%edx int *bufp0 = 2: R_386_32 buf &buf[0]; 0: 00 00 00 00 6: a1 04 00 00 00 mov 0x4,%eax static int *bufp1; static int *bufp1; 7: R_386_32 buf 0: R_386_32 buf b: 55 push %ebp void swap() c: 89 e5 mov %esp,%ebp void swap() { e: c7 05 00 00 00 00 04 movl $0x4,0x0 { int temp; 15: 00 00 00 int temp; 10: R_386_32 .bss bufp1 = &buf[1]; 14: R_386_32 buf bufp1 = &buf[1]; 18: 8b 08 mov (%eax),%ecx temp = *bufp0; temp = *bufp0; 1a: 89 10 mov %edx,(%eax) *bufp0 = *bufp1; *bufp0 = *bufp1; 1c: 5d pop %ebp *bufp1 = temp; *bufp1 = temp; 1d: 89 0d 04 00 00 00 mov %ecx,0x4 } 1f: R_386_32 buf } 23: c3 ret 17 18 Carnegie Mellon Carnegie Mellon 0: 8b 15 00 00 00 00 mov 0x0,%edx Executable Before/Aer Reloca7on (.text) 2: R_386_32 buf 0000000 <main>: 6: a1 04 00 00 00 mov 0x4,%eax . 0x8048396 + 0x1a 7: R_386_32 buf e: 83 ec 04 sub $0x4,%esp ... 11: e8 fc ff ff ff call 12 <main+0x12> = 0x80483b0 e: c7 05 00 00 00 00 04 movl $0x4,0x0 12: R_386_PC32 swap 15: 00 00 00 16: 83 c4 04 add $0x4,%esp 10: R_386_32 .bss . 14: R_386_32 buf . 08048380 <main>: 1d: 89 0d 04 00 00 00 mov %ecx,0x4 8048380: 8d 4c 24 04 lea 0x4(%esp),%ecx 1f: R_386_32 buf 8048384: 83 e4 f0 and $0xfffffff0,%esp 23: c3 ret 8048387: ff 71 fc pushl 0xfffffffc(%ecx) 804838a: 55 push %ebp 080483b0 <swap>: 804838b: 89 e5 mov %esp,%ebp 80483b0: 8b 15 20 96 04 08 mov 0x8049620,%edx 804838d: 51 push %ecx 80483b6: a1 24 96 04 08 mov 0x8049624,%eax 804838e: 83 ec 04 sub $0x4,%esp 80483bb: 55 push %ebp 8048391: e8 1a 00 00 00 call 80483b0 <swap> 80483bc: 89 e5 mov %esp,%ebp 8048396: 83 c4 04 add $0x4,%esp 80483be: c7 05 30 96 04 08 24 movl $0x8049624,0x8049630 8048399: 31 c0 xor %eax,%eax 80483c5: 96 04 08 804839b: 59 pop %ecx 80483c8: 8b 08 mov (%eax),%ecx 804839c: 5d pop %ebp 80483ca: 89 10 mov %edx,(%eax) 804839d: 8d 61 fc lea 0xfffffffc(%ecx),%esp 80483cc: 5d pop %ebp 80483a0: c3 ret 80483cd: 89 0d 24 96 04 08 mov %ecx,0x8049624 80483d3: c3 ret 19 20 5 Carnegie Mellon Carnegie Mellon Executable Aer Reloca7on (.data) Strong and Weak Symbols ¢ Program symbols are either strong or weak Disassembly of section .data: § Strong: procedures and ini2alized gLobals 08049620 <buf>: § Weak: unini2alized gLobals 8049620: 01 00 00 00 02 00 00 00 08049628 <bufp0>: 8049628: 20 96 04 08 p1.c p2.c strong int foo=5; int foo; weak strong p1() { p2() { strong } } 21 22 Carnegie Mellon Carnegie Mellon Linker’s Symbol Rules Linker Puzzles int x; Link 2me error: two strong symboLs (p1) ¢ Rule 1: Mul7ple strong symbols are not allowed p1() {} p1() {} § Each item can be defined onLy once § Otherwise: Linker error int x; int x; References to x wiLL refer to the same p1() {} p2() {} unini2alized int.

Linking Today Example C Program Stapc Linking

The “Stabs” Debug Format

Portable Executable File Format

Architectural Support for Dynamic Linking Varun Agrawal, Abhiroop Dabral, Tapti Palit, Yongming Shen, Michael Ferdman COMPAS Stony Brook University

Common Object File Format (COFF)

CS429: Computer Organization and Architecture Linking I & II

Chapter 6: the Linker

Protecting Against Unexpected System Calls

Cray XT Programming Environment's Implementation of Dynamic Shared Libraries

Dynamic Linking Considered Harmful

Linking Basics.Docx Page 1 of 35 Initial Creation: March, 24, 2015 Date Updated: October 4, 2015

Linker Script Guide Emprog

Chapter 13 ARM Image Format