<<

Summer 2013 CSE2421 Systems1 Introduction to Low-Level Programming and Computer Organization Kitty Reeves MTWR 10:20-12:25pm

1 Drivers = GCC

When you invoke GCC, it normally does preprocessing, compilation, assembly and linking, as needed, on behalf of the user accepts options and file names as operands % gcc –O1 -g -o prog main.c swap.c % man gcc

Pre-processor (cpp)  .c to .i Compiler (cc1)  .i file to .s Compiler Assembler (as)  .s to “relocatable” .o driver (ld)  creates object file called a.out as default

Loader  the shell invokes a function in the OS to call the Copies the code and data in the executable file into memory Transfers control to the beginning of the program

Remember the flow chart of this info that we’ve seen multiple times already?

2 Overview example

3 Compiling, linking…

What happens when you use the compiler driver? gcc -O2 -g -o test main.c swap.c

Step1 – Preprocessor (cpp) The C preprocessor translates the C source file main.c into an ASCII intermediate file main.i cpp [other args] main.c /tmp/main.i

Step2 – C compiler (ccl) Driver runs C compiler cc1  translates main.i into an ASCII file main.s cc1 /tmp/main.i main.c -O2 [other args] -o /tmp/main.s

4 Compiling, Linking… (cont)

Step3 – Assembler (as) driver runs assembler as to translate main.s into relocatable object file main.o as [other args] -o /tmp/main.o /tmp/main.s Same 3 steps are executed for swap.c

Step4 – Linker (ld) driver runs linker ld to combine main.o and swap.o into the executable file test ld -o test [sys obj files and args] /tmp/main.o /tmp/swap.o

5 So far…

test

6 Compiling, Linking… Executing

unix> ./test shell invokes function in OS called loader copy code and data of executable test into memory then transfers control to the beginning of the *main* function

7 Classic Unix

Linker is inside of gcc command Loader is part of exec Executable image contains all object and modules needed by program Entire image is loaded at once Every image contains copy of common library routines Every loaded program contain duplicate copy of library routines

8 Object Files

Putting it all together: Compiler and assembler generate relocatable object files, including shared object files. Linker generates executable object files. Object formats first Unix systems used a.out format (the term is still used) early System V used Common Object (COFF) and other modern unix systems use Executable and Linkable Format (ELF)***

9 Types of object files (file.o)

Object files, created by the assembler and link editor, are binary representations of programs intended to be executed directly on a processor A relocatable object file holds code and data suitable for linking with other object files to create an executable or a shared object file.  Each .o file is produced from exactly one source (.c) file

An executable object file (default is a.out) holds code and data that can be copied directly into memory then executed.

A shared object file (.so file) is a special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time.  Load time: the link editor processes the shared object file with other relocatable and shared object files to create another object file  Run time: the combines it with an executable file and other shared objects to create a process image

10 Linkage

Combining a set of programs, including library routines, to create a loadable image Resolving symbols defined within the set Listing symbols needing to be resolved by loader After the individual source files comprising a program are compiled, the object files are linked together with functions from one or more libraries to form the executable program When the same identifier appears in more than one source file, do they refer to the same entity or two different entities?

11 Static linking

What does the static linker do? takes as input a collection of relocatable object files and command-line arguments generates fully linked executable object file that can be loaded and run

Linker performs two tasks: symbol resolution: associates each symbol reference with exactly one symbol definition : compiler and assembler generate code and data sections that start at address 0.  Linker relocates these sections: associates memory location with each symbol definition, then updates all references to point to these locations.

12 Static linking – What do linkers do?

Step 1. Symbol resolution Programs define and reference “symbols” (variables and functions):  void swap() {…} // define symbol swap  swap(); // reference to a symbol swap  int *xp = &x; // define symbol xp, reference x Symbol definitions are stored (by compiler) in a “”  A symbol table is an array of structs  Each entry includes name, size, and location of symbol Linker associates each symbol reference with exactly one symbol definition  % –d –t hello // -t option gives symbol table

13 Static linking – What do linkers do?

Step 2. Relocation Merges separate code and data sections into single sections Relocates symbols from their relative locations in the .o files to their final absolute memory locations in the executable. Updates all references to these symbols to reflect their new positions.

14 Linking Summary

Linker – key part of OS Combines object files and libraries into a “standard” format that the OS loader can interpret Resolves references and does static relocation of addresses Creates information for loader to complete binding process Supports dynamic shared libraries

15 Loading

Copying the loadable image into memory, connecting it with any other programs already loaded, and updating addresses as needed (in unix) interpreting file to initialize the process address space (in all systems) kernel image is special (own format)

16 From source code to a process

Process = Systems 2

17 Static Linking and Loading

#include int main() { printf("Hello World"); return 0; } //HelloWorld.c

r

18

Routine is not loaded until it is called Better memory-space utilization; unused routine is never loaded. Useful when large amounts of code are needed to handle infrequently occurring cases.

19 Shared Libraries

Observation – “everyone” links to standard libraries (libc.a, etc.) Consume space in every executable image every process memory at runtime Would it be possible to the common libraries? Automatically load at runtime? Libraries designated as “shared” .so, .dll, etc. Supported by corresponding “.a” libraries containing symbol information Linker sets up symbols to be resolved at runtime Loader: Is library already in memory? If so, map into new process space If not, load and then map

20 Run-time Linking/Loading

21 Executable files

The OS expects executable files to have a specific format Header info  Code locations  Data locations Code & data Symbol Table  List of names of things defined in your program and where they are defined  List of names of things defined elsewhere that are used by your program, and where they are used.

22 Example

23 Example (cont)

24 0000000000000000

: Hello World 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp revisited 4: bf 00 00 00 00 mov $0x0,%edi 9: b8 00 00 00 00 mov $0x0,%eax #include Object file e: e8 00 00 00 00 callq 13 int main() { has not 13: b8 00 00 00 00 mov $0x0,%eax printf("Hello World"); been 18: c9 leaveq return 0; linked yet 19: c3 retq } //hello.c % gcc -S -m32 hello.c OR %gcc -c -m32 hello.c then % objdump -d hello.o %gcc -m32 -o hello hello.c % objdump -d hello 00000000004004cc
: 4004cc: 55 push %rbp Relocatable 4004cd: 48 89 e5 mov %rsp,%rbp 4004d0: bf dc 05 40 00 mov $0x4005dc,%edi vs 4004d5: b8 00 00 00 00 mov $0x0,%eax Executable 4004da: e8 e1 fe ff ff callq 4003c0 4004df: b8 00 00 00 00 mov $0x0,%eax 4004e4: c9 leaveq 4004e5: c3 retq 25 ELF - Executable and Linkable Format

From wiki: The segments contain information that is necessary for runtime execution of the file, LOADING LINKING while sections contain important data for linking and relocation. Any byte in the entire file can be owned by at most one section, and there can be orphan bytes which are not owned by any section. ELF describes two separate ‘views’ of an executable, a linking view and a loading view Linking view is used at static link time to combine relocatable files Loading view is used at run time to load and execute program

26 ELF Link View

Not all sections needed at run time Information used for linking information

Difference between link time and run time

27 Program Linkage Table

Stored in the .plt section Allows to call functions that aren’t present at compile time Shared library functions (e.g printf()) Set of function stubs Relocations point them to real location of the functions Normally relocated ‘lazily’

28 ELF Loading View w/ segment types

Much simpler view, divides executable into ‘Segments’ Describes Parts of file to be loaded into memory at run time Locations of important data at run time Segments have: A simple type Requested memory location Permissions (R/W/X) Size (in file and in memory)

29 ELF Linking to Loading

Semantics of section table (Linking View) are irrelevant in Loading View Section information can be removed from executable routines to load executable and begin execution

30 Object File Format/Organization The object file formats provide parallel views of a file's contents, reflecting the differing needs of those activities. ELF header (executable and linkable Linking View format) resides at the beginning and holds a “road map” describing the file's organization. Program header table Tells the system how to create a process image  Files used to build a process image (execute a program) must have a program header table; relocatable files do not need one. Loading View

31 Object File Format/Organization (cont) Section header table Contains information describing the file's sections Every section has an entry in the table  each entry gives information such as the section name, the section size, and so on. Sections Hold the bulk of object file information for the linking view: instructions, data, symbol table, relocation information, etc. Files used during linking must have a section header table; other object files may or may not have one.

FYI: Although the figure shows the program header table immediately after the ELF header, and the section header table following the sections, actual files may differ. Moreover, sections and segments have no specified order. Only the ELF header has a fixed position in the file. 32 ELF Object File Format (details)

33 ELF Object File Format (cont)

34 Processes (section 8.2)

What is a process? Provides each program with the illusion that is has exclusive use of the processor and memory What is a process image? Process address space

35 Linker Symbols (reminders)

Global symbols Symbols defined by module m that can be referenced by other modules External symbols Global symbols that are referenced by module m but defined by some other module Local symbols Symbols that are defined and referenced exclusively by module m  Ex. C functions and variables defined with the “static” attribute Linker symbol locations are different than locations

36 Resolving Symbols

37 Relocating Code and Data

38 Practice problem 7.1 (pg 662)

SYMBOL TABLE .symtab Info about functions and global variables that are defined and referenced in a program Does not contain entries for local variables

Understanding the relationship between linker symbols and C variables/functions… notice that the C local variable temp does NOT have a symbol table entry. Why? It goes on the stack!

Symbol swap.o .symtab entry? symbol type module where defined section buf yes extern main.o .data bufp0 yes global swap.o .data bufp1 yes globallocal swap.o .bss swap yes global swap.o .text temp no ------39 Swap relocatable symbol table

% objdump -r -d -t swap.o swap.o: file format elf32-i386 SYMBOL TABLE: 00000000 l df *ABS* 00000000 swap.c 00000000 l d .text 00000000 .text 00000000 l d .data 00000000 .data 00000000 l d .bss 00000000 .bss 00000000 l O .bss 00000004 bufp1 00000000 g O .data 00000004 bufp0 00000000 *UND* 00000000 buf 00000000 g F .text 00000035 swap

O = object F = function d = debug f = file l = local g = global

40 Symbols and Symbol Tables

Local linker symbols != local program variables .symtab does not contain any symbols that correspond to local non-static program variables. These are managed at run time on the stack and are not of interest to the linker However… local procedure variables that are defined with the C static attribute (EXCEPTION) are not managed on the stack  The compiler allocates space in .data or .bss for each and created a local linker symbol in the symbol table with a unique name

FYI: ’s lifetime extends across the entire run of the program where local variables are allocated and deallocated on the stack

41 SYMBOL TABLES

Built by assemblers using symbols exported by the compiler into the .s file An ELF symbol table is contained in the .symtab section It contains an array of entries where each entry contains: Symbol’s value i.e. address Flag bits (l=local, g = global, F=function, etc)  Characters and spaces – up to 7 bits Section or  *ABS* (absolute – not in any section)  *UND* if referenced but not defined Alignment or size Symbol’s name

42 Relocation

Relocation merges the input modules and assigns run- time addresses to each symbol When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory or the locations of any externally defined functions or global variables referenced by the module A “relocation entry” is generated when the assembler encounters a reference to an object whose ultimate location is unknown 2 types R_386_PC32 R_386_32

43 2 Relocation types

R_386_PC32 relocate a reference that uses a 32-bit PC-relative address. Effective address = PC + instruction encoded addr R_386_32 Absolute addressing Uses the value encoded in the instruction

44 Relocation Info

% gcc –c –m32 main.c NOTE: relocation % objdump -r -tdata main.o entries and instructions are SYMBOL TABLE: stored in different 00000000 l df *ABS* 00000000 main.c 00000000 l d .text 00000000 .text sections of the object 00000000 l d .data 00000000 .data file  .rel.txt vs .text 00000000 l d .bss 00000000 .bss 00000000 g O .data 00000008 buf .text + .offset 00000000 g F .text 00000014 main % gcc -o main -m32 main.c 0x80483b4 + 0x7 00000000 *UND* 00000000 swap = 0x80483bb = ref addr /tmp/ccEVrEUg.o: In function `main': Disassembly of section .text: 0x80483b4 main.c:(.text+0x7): undefined 00000000

: 0x80483c8 reference to `swap' 0: 55 push %ebp 1: 89 e5 mov %esp,%ebp collect2: call swap 3: 83 e4 f0 and $0xfffffff0,%esp ld returned 1 exit status 6: e8 fc ff ff ff call 7 7: R_386_PC32 swap Relocation Call instruction @ offset 0x6, b: b8 00 00 00 00 mov $0x0,%eax entry opcode e8, 32-bit ref (-4) 10: 89 ec mov %ebp,%esp Offset = 0x7 Symbol = swap 12: 5d pop %ebp .symbol swap + 32-bit ref – ref addr Type = R_386_PC32 13: c3 ret 0x80483c8 + -4 - 0x80483bb = 0x9 45 Executable before/after relocation

Return address R_386_PC32 relocate a Location of swap function reference that uses a 32-bit PC-relative address.

Effective address = PC + instruction encoded address

.text + .offset = 0x8048380 + 0x12 = 0x8048392  reference address Call + 32-bit ref – reference address = 0x80483b0 + -4 – 0x8048392 = 0x1a 46 Relocation Info .text and .data

[0]

[1]

47 Executable After Relocation and External Reference Resolution

48 Chp. 7 Linking – building a program vs program execution which runs a program

EFFICIENCY issue: Small change requires recompilation i.e. text file MODULARITY issue: hard to share common functions (i.e. printf) SOLUTION  static LINKER

Compile each C program like LAB3 separately to create .o file: % gcc –c m.c % gcc –c a.c

Combined .o files into single executable, then run: % gcc m.o a.o –o p % p 49 Relocating Symbols and Resolving External References (another example)

50 Relocation information (m.c)

51 Relocation information (a.c)

52 Executable after relocation with external reference resolution

53