
Carnegie Mellon

Linking
15-213 / 18-213: Introduction to Computer Systems
12th Lecture, Feb. 23, 2012
Instructors: Todd C. Mowry & Anthony Rowe

Today
- Linking
- Case study: Library interpositioning

Example C Program

main.c:

    void swap();   /* declaration, so main.c compiles under C99 and later */

    int buf[2] = {1, 2};

    int main()
    {
        swap();
        return 0;
    }

swap.c:

    extern int buf[];

    int *bufp0 = &buf[0];
    static int *bufp1;

    void swap()
    {
        int temp;

        bufp1 = &buf[1];
        temp = *bufp0;
        *bufp0 = *bufp1;
        *bufp1 = temp;
    }

Static Linking
- Programs are translated and linked using a compiler driver:
    unix> gcc -O2 -g -o p main.c swap.c
    unix> ./p

    main.c              swap.c              Source files
      |                   |
    Translators         Translators
    (cpp, cc1, as)      (cpp, cc1, as)
      |                   |
    main.o              swap.o              Separately compiled relocatable object files
       \                 /
          Linker (ld)
              |
              p                             Fully linked executable object file
                                            (contains code and data for all functions
                                            defined in main.c and swap.c)

Why Linkers?
- Reason 1: Modularity
  - Program can be written as a collection of smaller source files, rather than one monolithic mass.
  - Can build libraries of common functions (more on this later)
    - e.g., math library, standard C library

Why Linkers? (cont)
- Reason 2: Efficiency
  - Time: Separate compilation
    - Change one source file, compile, and then relink.
    - No need to recompile other source files.
  - Space: Libraries
    - Common functions can be aggregated into a single file...
    - Yet executable files and running memory images contain only code for the functions they actually use.

What Do Linkers Do?
- Step 1. Symbol resolution
  - Programs define and reference symbols (variables and functions):
      void swap() {…}    /* define symbol swap */
      swap();            /* reference symbol swap */
      int *xp = &x;      /* define symbol xp, reference x */
  - Symbol definitions are stored (by the compiler) in a symbol table.
    - The symbol table is an array of structs.
    - Each entry includes the name, size, and location of a symbol.
  - The linker associates each symbol reference with exactly one symbol definition.

What Do Linkers Do? (cont)
- Step 2. Relocation
  - Merges separate code and data sections into single sections.
  - Relocates symbols from their relative locations in the .o files to their final absolute memory locations in the executable.
  - Updates all references to these symbols to reflect their new positions.

Three Kinds of Object Files (Modules)
- Relocatable object file (.o file)
  - Contains code and data in a form that can be combined with other relocatable object files to form an executable object file.
    - Each .o file is produced from exactly one source (.c) file.
- Executable object file (a.out file)
  - Contains code and data in a form that can be copied directly into memory and then executed.
- Shared object file (.so file)
  - Special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run-time.
  - Called Dynamic Link Libraries (DLLs) by Windows.

Executable and Linkable Format (ELF)
- Standard binary format for object files
- Originally proposed by AT&T System V Unix
  - Later adopted by BSD Unix variants and Linux
- One unified format for
  - Relocatable object files (.o)
  - Executable object files (a.out)
  - Shared object files (.so)
- Generic name: ELF binaries

ELF Object File Format
- ELF header
  - Word size, byte ordering, file type (.o, exec, .so), machine type, etc.
- Segment header table
  - Page size, virtual addresses of memory segments (sections), segment sizes.
- .text section
  - Code
- .rodata section
  - Read-only data: jump tables, ...
- .data section
  - Initialized global variables
- .bss section
  - Uninitialized global variables
  - "Block Started by Symbol"
  - "Better Save Space"
  - Has a section header but occupies no space

File layout (offset 0 at the top):

    ELF header
    Segment header table (required for executables)
    .text section
    .rodata section
    .data section
    .bss section
    .symtab section
    .rel.text section
    .rel.data section
    .debug section
    Section header table

ELF Object File Format (cont.)
- .symtab section
  - Symbol table
  - Procedure and static variable names
  - Section names and locations
- .rel.text section
  - Relocation info for the .text section
  - Addresses of instructions that will need to be modified in the executable
  - Instructions for modifying.
- .rel.data section
  - Relocation info for the .data section
  - Addresses of pointer data that will need to be modified in the merged executable
- .debug section
  - Info for symbolic debugging (gcc -g)
- Section header table
  - Offsets and sizes of each section

Linker Symbols
- Global symbols
  - Symbols defined by module m that can be referenced by other modules.
  - E.g.: non-static C functions and non-static global variables.
- External symbols
  - Global symbols that are referenced by module m but defined by some other module.
- Local symbols
  - Symbols that are defined and referenced exclusively by module m.
  - E.g.: C functions and variables defined with the static attribute.
  - Local linker symbols are not local program variables.

Resolving Symbols

main.c:

    int buf[2] = {1, 2};      /* buf: Global */

    int main()
    {
        swap();               /* swap: External */
        return 0;
    }

swap.c:

    extern int buf[];         /* buf: External */

    int *bufp0 = &buf[0];     /* bufp0: Global */
    static int *bufp1;        /* bufp1: Local */

    void swap()               /* swap: Global */
    {
        int temp;             /* Linker knows nothing of temp */

        bufp1 = &buf[1];
        temp = *bufp0;
        *bufp0 = *bufp1;
        *bufp1 = temp;
    }

Relocating Code and Data

Relocatable Object Files:

    System code                      .text
    System data                      .data
    main.o:  main()                  .text
             int buf[2]={1,2}        .data
    swap.o:  swap()                  .text
             int *bufp0=&buf[0]      .data
             static int *bufp1       .bss

Executable Object File (offset 0 at the top):

    Headers
    System code, main(), swap(), more system code     .text
    System data, int buf[2]={1,2}, int *bufp0=&buf[0] .data
    int *bufp1                                        .bss
    .symtab
    .debug

- Even though bufp1 is private to swap, it requires allocation in .bss.

Relocation Info (main)

main.c:

    int buf[2] = {1,2};

    int main()
    {
        swap();
        return 0;
    }

main.o:

    0000000 <main>:
       0:  8d 4c 24 04       lea    0x4(%esp),%ecx
       4:  83 e4 f0          and    $0xfffffff0,%esp
       7:  ff 71 fc          pushl  0xfffffffc(%ecx)
       a:  55                push   %ebp
       b:  89 e5             mov    %esp,%ebp
       d:  51                push   %ecx
       e:  83 ec 04          sub    $0x4,%esp
      11:  e8 fc ff ff ff    call   12 <main+0x12>
                     12: R_386_PC32 swap
      16:  83 c4 04          add    $0x4,%esp
      19:  31 c0             xor    %eax,%eax
      1b:  59                pop    %ecx
      1c:  5d                pop    %ebp
      1d:  8d 61 fc          lea    0xfffffffc(%ecx),%esp
      20:  c3                ret

    Disassembly of section .data:
    00000000 <buf>:
       0:  01 00 00 00 02 00 00 00

Source: objdump -r -d

Relocation Info (swap, .text)

swap.o (for the swap.c source shown above):

    Disassembly of section .text:
    00000000 <swap>:
       0:  8b 15 00 00 00 00       mov    0x0,%edx
                     2: R_386_32 buf
       6:  a1 04 00 00 00          mov    0x4,%eax
                     7: R_386_32 buf
       b:  55                      push   %ebp
       c:  89 e5                   mov    %esp,%ebp
       e:  c7 05 00 00 00 00 04    movl   $0x4,0x0
      15:  00 00 00
                     10: R_386_32 .bss
                     14: R_386_32 buf
      18:  8b 08                   mov    (%eax),%ecx
      1a:  89 10                   mov    %edx,(%eax)
      1c:  5d                      pop    %ebp
      1d:  89 0d 04 00 00 00       mov    %ecx,0x4
                     1f: R_386_32 buf
      23:  c3                      ret

Relocation Info (swap, .data)

swap.o:

    Disassembly of section .data:
    00000000 <bufp0>:
       0:  00 00 00 00
                     0: R_386_32 buf

Executable Before/After Relocation (.text)

main, before relocation:

    0000000 <main>:
      ...
       e:  83 ec 04          sub    $0x4,%esp
      11:  e8 fc ff ff ff    call   12 <main+0x12>
                     12: R_386_PC32 swap
      16:  83 c4 04          add    $0x4,%esp
      ...

main, after relocation (call target: 0x8048396 + 0x1a = 0x80483b0):

    08048380 <main>:
     8048380:  8d 4c 24 04       lea    0x4(%esp),%ecx
     8048384:  83 e4 f0          and    $0xfffffff0,%esp
     8048387:  ff 71 fc          pushl  0xfffffffc(%ecx)
     804838a:  55                push   %ebp
     804838b:  89 e5             mov    %esp,%ebp
     804838d:  51                push   %ecx
     804838e:  83 ec 04          sub    $0x4,%esp
     8048391:  e8 1a 00 00 00    call   80483b0 <swap>
     8048396:  83 c4 04          add    $0x4,%esp
     8048399:  31 c0             xor    %eax,%eax
     804839b:  59                pop    %ecx
     804839c:  5d                pop    %ebp
     804839d:  8d 61 fc          lea    0xfffffffc(%ecx),%esp
     80483a0:  c3                ret

swap, before relocation:

       0:  8b 15 00 00 00 00       mov    0x0,%edx
                     2: R_386_32 buf
       6:  a1 04 00 00 00          mov    0x4,%eax
                     7: R_386_32 buf
      ...
       e:  c7 05 00 00 00 00 04    movl   $0x4,0x0
      15:  00 00 00
                     10: R_386_32 .bss
                     14: R_386_32 buf
      ...
      1d:  89 0d 04 00 00 00       mov    %ecx,0x4
                     1f: R_386_32 buf
      23:  c3                      ret

swap, after relocation:

    080483b0 <swap>:
     80483b0:  8b 15 20 96 04 08       mov    0x8049620,%edx
     80483b6:  a1 24 96 04 08          mov    0x8049624,%eax
     80483bb:  55                      push   %ebp
     80483bc:  89 e5                   mov    %esp,%ebp
     80483be:  c7 05 30 96 04 08 24    movl   $0x8049624,0x8049630
     80483c5:  96 04 08
     80483c8:  8b 08                   mov    (%eax),%ecx
     80483ca:  89 10                   mov    %edx,(%eax)
     80483cc:  5d                      pop    %ebp
     80483cd:  89 0d 24 96 04 08       mov    %ecx,0x8049624
     80483d3:  c3                      ret

Executable After Relocation (.data)

    Disassembly of section .data:
    08049620 <buf>:
     8049620:  01 00 00 00 02 00 00 00
    08049628 <bufp0>:
     8049628:  20 96 04 08

Strong and Weak Symbols
- Program symbols are either strong or weak
  - Strong: procedures and initialized globals
  - Weak: uninitialized globals

    p1.c                          p2.c
    int foo=5;    (strong)        int foo;      (weak)
    p1() { }      (strong)        p2() { }      (strong)

Linker's Symbol Rules
- Rule 1: Multiple strong symbols are not allowed
  - Each item can be defined only once
  - Otherwise: Linker error

Linker Puzzles

    Module 1            Module 2            Result
    int x;              p1() {}             Link time error: two strong symbols (p1)
    p1() {}

    int x;              int x;              References to x will refer to the same
    p1() {}             p2() {}             uninitialized int.
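
The two puzzles above can be replayed with a toy resolver. This is only a minimal sketch, not how ld is implemented: the `sym_def` struct, the `resolve_symbol` function, and the two result codes are hypothetical names made up for illustration. Rule 1 comes from the slide. Letting a strong definition override weak ones, and keeping the first weak definition when there is no strong one, are assumptions based on the conventional linker behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified record of one symbol definition seen by the linker. */
typedef struct {
    const char *name;    /* symbol name, e.g. "x" or "p1" */
    const char *module;  /* defining module (illustrative only) */
    int is_strong;       /* 1: procedure or initialized global; 0: uninitialized global */
} sym_def;

enum { RESOLVE_UNDEFINED = -1, RESOLVE_DUPLICATE_STRONG = -2 };

/*
 * Returns the index in defs[] of the definition chosen for `name`,
 * RESOLVE_UNDEFINED if no module defines it, or
 * RESOLVE_DUPLICATE_STRONG if Rule 1 is violated.
 */
int resolve_symbol(const sym_def *defs, size_t n, const char *name)
{
    int chosen = RESOLVE_UNDEFINED;
    int strong_seen = 0;

    assert(defs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(defs[i].name, name) != 0)
            continue;
        if (defs[i].is_strong) {
            if (strong_seen)
                return RESOLVE_DUPLICATE_STRONG;  /* Rule 1: two strong definitions */
            strong_seen = 1;
            chosen = (int)i;   /* assumption: a strong definition overrides weak ones */
        } else if (chosen == RESOLVE_UNDEFINED) {
            chosen = (int)i;   /* assumption: with only weak ones, keep the first */
        }
    }
    return chosen;
}
```

Feeding it the definitions from the first puzzle (p1 defined as a procedure in both modules) yields `RESOLVE_DUPLICATE_STRONG`. For the second puzzle, both definitions of x are weak, so every reference to x resolves to the same entry. Recent GCC releases default to `-fno-common`, under which the second puzzle is also reported as a duplicate definition unless `-fcommon` is passed.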
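
Going back to the relocation slides, the patched bytes in the executable can be checked by hand. The sketch below assumes the usual i386 formulas: a PC-relative reference (R_386_PC32) stores symbol address + addend - address of the reference, and an absolute reference (R_386_32) stores symbol address + addend, where the addend is the value already sitting in the unrelocated bytes. The function names are hypothetical and the addresses are the ones printed in the disassembly above.

```c
#include <assert.h>
#include <stdint.h>

/* Value stored at a PC-relative reference (R_386_PC32 style). */
uint32_t reloc_pc32(uint32_t sym_addr, int32_t addend, uint32_t ref_addr)
{
    return sym_addr + (uint32_t)addend - ref_addr;
}

/* Value stored at an absolute reference (R_386_32 style). */
uint32_t reloc_abs32(uint32_t sym_addr, int32_t addend)
{
    return sym_addr + (uint32_t)addend;
}

/* Address a call transfers to: address of the next instruction plus displacement. */
uint32_t call_target(uint32_t next_pc, uint32_t disp)
{
    return next_pc + disp;
}

/* Re-derives the numbers shown in the disassembly of the linked executable. */
void check_slide_values(void)
{
    const uint32_t main_addr = 0x8048380u;  /* <main> after relocation */
    const uint32_t swap_addr = 0x80483b0u;  /* <swap> after relocation */
    const uint32_t buf_addr  = 0x8049620u;  /* <buf> in .data */

    /* call in main: reference at offset 0x12, original bytes fc ff ff ff = -4 */
    uint32_t disp = reloc_pc32(swap_addr, -4, main_addr + 0x12u);
    assert(disp == 0x1au);
    assert(call_target(0x8048396u, disp) == swap_addr);

    /* mov 0x4,%eax in swap: R_386_32 buf with addend 4, i.e. &buf[1] */
    assert(reloc_abs32(buf_addr, 4) == 0x8049624u);

    /* bufp0 in .data: R_386_32 buf with addend 0, i.e. &buf[0] */
    assert(reloc_abs32(buf_addr, 0) == 0x8049620u);
}
```

Calling `check_slide_values()` passes silently when the formulas reproduce the slide's bytes: the call displacement 0x1a, and the little-endian words 24 96 04 08 and 20 96 04 08.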