Summer 2013 CSE2421 Systems1 Introduction to Low-Level Programming and Computer Organization Kitty Reeves MTWR 10:20-12:25pm
1 Compiler Drivers = GCC
When you invoke GCC, it normally does preprocessing, compilation, assembly and linking, as needed, on behalf of the user.
It accepts options and file names as operands:
  % gcc -O1 -g -o prog main.c swap.c
  % man gcc
The compiler driver runs these tools in turn:
  Pre-processor (cpp): .c to .i
  Compiler (cc1): .i file to .s
  Assembler (as): .s to “relocatable” object file .o
  Linker (ld): creates executable object file, called a.out by default
Loader the unix shell invokes a function in the OS to call the loader Copies the code and data in the executable file into memory Transfers control to the beginning of the program
Remember the flow chart of this info that we’ve seen multiple times already?
2 Overview example
3 Compiling, linking…
What happens when you use the compiler driver? gcc -O2 -g -o test main.c swap.c
Step 1: Preprocessor (cpp)
  The C preprocessor translates the C source file main.c into an ASCII intermediate file main.i
  cpp [other args] main.c /tmp/main.i
Step 2: C compiler (cc1)
  The driver runs the C compiler cc1, which translates main.i into an ASCII assembly language file main.s
  cc1 /tmp/main.i main.c -O2 [other args] -o /tmp/main.s
4 Compiling, Linking… (cont)
Step 3: Assembler (as)
  The driver runs the assembler as to translate main.s into the relocatable object file main.o
  as [other args] -o /tmp/main.o /tmp/main.s
  The same 3 steps are executed for swap.c
Step 4: Linker (ld)
  The driver runs the linker ld to combine main.o and swap.o into the executable file test
  ld -o test [sys obj files and args] /tmp/main.o /tmp/swap.o
5 So far…
test
6 Compiling, Linking… Executing
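unix> ./test
The shell invokes a function in the OS called the loader, which copies the code and data of the executable test into memory, then transfers control to the beginning of the program (the startup code at the entry point, which in turn calls the *main* function)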
7 Classic Unix
Linker is inside of gcc command
Loader is part of exec system call
Executable image contains all object and library modules needed by program
Entire image is loaded at once
Every image contains a copy of common library routines
Every loaded program contains a duplicate copy of library routines
8 Object Files
Putting it all together:
  Compiler and assembler generate relocatable object files, including shared object files.
  Linker generates executable object files.
Object formats
  First Unix systems used the a.out format (the term is still used)
  Early System V used the Common Object File Format (COFF)
  Linux and other modern unix systems use the Executable and Linkable Format (ELF)
9 Types of object files (file.o)
Object files, created by the assembler and link editor, are binary representations of programs intended to be executed directly on a processor A relocatable object file holds code and data suitable for linking with other object files to create an executable or a shared object file. Each .o file is produced from exactly one source (.c) file
An executable object file (default is a.out) holds code and data that can be copied directly into memory then executed.
A shared object file (.so file) is a special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time. Load time: the link editor processes the shared object file with other relocatable and shared object files to create another object file Run time: the dynamic linker combines it with an executable file and other shared objects to create a process image
10 Linkage
Combining a set of programs, including library routines, to create a loadable image Resolving symbols defined within the set Listing symbols needing to be resolved by loader After the individual source files comprising a program are compiled, the object files are linked together with functions from one or more libraries to form the executable program When the same identifier appears in more than one source file, do they refer to the same entity or two different entities?
11 Static linking
What does the static linker do? takes as input a collection of relocatable object files and command-line arguments generates fully linked executable object file that can be loaded and run
Linker performs two tasks: symbol resolution: associates each symbol reference with exactly one symbol definition relocation: compiler and assembler generate code and data sections that start at address 0. Linker relocates these sections: associates memory location with each symbol definition, then updates all references to point to these locations.
12 Static linking – What do linkers do?
Step 1. Symbol resolution
Programs define and reference “symbols” (variables and functions):
  void swap() {…}   // define symbol swap
  swap();           // reference to a symbol swap
  int *xp = &x;     // define symbol xp, reference x
Symbol definitions are stored (by compiler) in a “symbol table”
  A symbol table is an array of structs
  Each entry includes name, size, and location of symbol
Linker associates each symbol reference with exactly one symbol definition
  % objdump -d -t hello   // -t option gives symbol table
13 Static linking – What do linkers do?
Step 2. Relocation Merges separate code and data sections into single sections Relocates symbols from their relative locations in the .o files to their final absolute memory locations in the executable. Updates all references to these symbols to reflect their new positions.
14 Linking Summary
Linker – key part of OS Combines object files and libraries into a “standard” format that the OS loader can interpret Resolves references and does static relocation of addresses Creates information for loader to complete binding process Supports dynamic shared libraries
15 Loading
Copying the loadable image into memory, connecting it with any other programs already loaded, and updating addresses as needed (in unix) interpreting file to initialize the process address space (in all systems) kernel image is special (own format)
16 From source code to a process
Process = Systems 2
17 Static Linking and Loading
Routine is not loaded until it is called Better memory-space utilization; unused routine is never loaded. Useful when large amounts of code are needed to handle infrequently occurring cases.
19 Shared Libraries
Observation: “everyone” links to standard libraries (libc.a, etc.)
  These consume space in every executable image and in every process's memory at runtime
Would it be possible to share the common libraries? Automatically load at runtime?
Libraries designated as “shared”: .so, .dll, etc.
  Supported by corresponding “.a” libraries containing symbol information
Linker sets up symbols to be resolved at runtime
Loader: Is library already in memory?
  If so, map into new process space
  If not, load and then map
20 Run-time Linking/Loading
21 Executable files
The OS expects executable files to have a specific format Header info Code locations Data locations Code & data Symbol Table List of names of things defined in your program and where they are defined List of names of things defined elsewhere that are used by your program, and where they are used.
22 Example
23 Example (cont)
24
From wiki: The segments contain information that is necessary for runtime execution of the file (LOADING), while sections contain important data for linking and relocation (LINKING). Any byte in the entire file can be owned by at most one section, and there can be orphan bytes which are not owned by any section.
ELF describes two separate ‘views’ of an executable, a linking view and a loading view
  Linking view is used at static link time to combine relocatable files
  Loading view is used at run time to load and execute program
26 ELF Link View
Not all sections needed at run time Information used for linking Debugging information
Difference between link time and run time
27 Program Linkage Table
Stored in the .plt section Allows executables to call functions that aren’t present at compile time Shared library functions (e.g printf()) Set of function stubs Relocations point them to real location of the functions Normally relocated ‘lazily’
28 ELF Loading View w/ segment types
Much simpler view, divides executable into ‘Segments’ Describes Parts of file to be loaded into memory at run time Locations of important data at run time Segments have: A simple type Requested memory location Permissions (R/W/X) Size (in file and in memory)
29 ELF Linking to Loading
Semantics of section table (Linking View) are irrelevant in Loading View Section information can be removed from executable Operating system routines to load executable and begin execution
30 Object File Format/Organization
The object file formats provide parallel views of a file's contents (Linking View and Loading View), reflecting the differing needs of linking and execution.
ELF header (executable and linkable format)
  Resides at the beginning and holds a “road map” describing the file's organization.
Program header table (Loading View)
  Tells the system how to create a process image
  Files used to build a process image (execute a program) must have a program header table; relocatable files do not need one.
31 Object File Format/Organization (cont)
Section header table
  Contains information describing the file's sections
  Every section has an entry in the table; each entry gives information such as the section name, the section size, and so on.
Sections
  Hold the bulk of object file information for the linking view: instructions, data, symbol table, relocation information, etc.
Files used during linking must have a section header table; other object files may or may not have one.
FYI: Although the figure shows the program header table immediately after the ELF header, and the section header table following the sections, actual files may differ. Moreover, sections and segments have no specified order. Only the ELF header has a fixed position in the file.
32 ELF Object File Format (details)
33 ELF Object File Format (cont)
34 Processes (section 8.2)
What is a process?
  Provides each program with the illusion that it has exclusive use of the processor and memory
What is a process image?
  Process address space
35 Linker Symbols (reminders)
Global symbols
  Symbols defined by module m that can be referenced by other modules
External symbols
  Global symbols that are referenced by module m but defined by some other module
Local symbols
  Symbols that are defined and referenced exclusively by module m
  Ex. C functions and variables defined with the “static” attribute
Linker symbol locations are different than scope locations
36 Resolving Symbols
37 Relocating Code and Data
38 Practice problem 7.1 (pg 662)
SYMBOL TABLE .symtab Info about functions and global variables that are defined and referenced in a program Does not contain entries for local variables
Understanding the relationship between linker symbols and C variables/functions… notice that the C local variable temp does NOT have a symbol table entry. Why? It goes on the stack!
Symbol   swap.o .symtab entry?   symbol type   module where defined   section
buf      yes                     extern        main.o                 .data
bufp0    yes                     global        swap.o                 .data
bufp1    yes                     local         swap.o                 .bss
swap     yes                     global        swap.o                 .text
temp     no                      ---           ---                    ---
39 Swap relocatable symbol table
% objdump -r -d -t swap.o
swap.o: file format elf32-i386
SYMBOL TABLE:
00000000 l    df *ABS*  00000000 swap.c
00000000 l    d  .text  00000000 .text
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000000 l     O .bss   00000004 bufp1
00000000 g     O .data  00000004 bufp0
00000000         *UND*  00000000 buf
00000000 g     F .text  00000035 swap
O = object F = function d = debug f = file l = local g = global
40 Symbols and Symbol Tables
Local linker symbols != local program variables
.symtab does not contain any symbols that correspond to local non-static program variables. These are managed at run time on the stack and are not of interest to the linker
However… local procedure variables that are defined with the C static attribute (EXCEPTION) are not managed on the stack
  The compiler allocates space in .data or .bss for each and creates a local linker symbol in the symbol table with a unique name
FYI: A static variable’s lifetime extends across the entire run of the program, whereas ordinary local variables are allocated and deallocated on the stack
41 SYMBOL TABLES
Built by assemblers using symbols exported by the compiler into the .s file
An ELF symbol table is contained in the .symtab section
It contains an array of entries where each entry contains:
  Symbol’s value, i.e. address
  Flag bits (l = local, g = global, F = function, etc.): characters and spaces, up to 7 bits
  Section, or *ABS* (absolute, not in any section), or *UND* if referenced but not defined
  Alignment or size
  Symbol’s name
42 Relocation
Relocation merges the input modules and assigns run-time addresses to each symbol
When an assembler generates an object module, it does not know where the code and data will ultimately be stored in memory or the locations of any externally defined functions or global variables referenced by the module
A “relocation entry” is generated when the assembler encounters a reference to an object whose ultimate location is unknown
2 types
  R_386_PC32
  R_386_32
43 2 Relocation types
R_386_PC32
  Relocate a reference that uses a 32-bit PC-relative address.
  Effective address = PC + instruction encoded addr
R_386_32
  Absolute addressing
  Uses the value encoded in the instruction
44 Relocation Info
% gcc -c -m32 main.c
% objdump -r -tdata main.o
SYMBOL TABLE:
00000000 l    df *ABS*  00000000 main.c
00000000 l    d  .text  00000000 .text
00000000 l    d  .data  00000000 .data
00000000 l    d  .bss   00000000 .bss
00000000 g     O .data  00000008 buf
00000000 g     F .text  00000014 main
00000000         *UND*  00000000 swap
Disassembly of section .text:
00000000

NOTE: relocation entries and instructions are stored in different sections of the object file (.rel.text vs .text)
.text + .offset = 0x80483b4 + 0x7 = 0x80483bb = ref addr

% gcc -o main -m32 main.c
/tmp/ccEVrEUg.o: In function `main':
main.c:(.text+0x7): undefined
R_386_PC32: relocate a reference that uses a 32-bit PC-relative address.
[Figure labels: Return address; Location of swap function]
Effective address = PC + instruction encoded address
.text + .offset = 0x8048380 + 0x12 = 0x8048392 (reference address)
Call + 32-bit ref - reference address = 0x80483b0 + (-4) - 0x8048392 = 0x1a
46 Relocation Info .text and .data
[0]
[1]
47 Executable After Relocation and External Reference Resolution
48 Chp. 7 Linking – building a program vs program execution which runs a program
EFFICIENCY issue: a small change requires recompiling everything (i.e. the whole text file)
MODULARITY issue: hard to share common functions (e.g. printf)
SOLUTION: static LINKER
Compile each C program like LAB3 separately to create .o file:
  % gcc -c m.c
  % gcc -c a.c
Combine .o files into single executable, then run:
  % gcc m.o a.o -o p
  % p
49 Relocating Symbols and Resolving External References (another example)
50 Relocation information (m.c)
51 Relocation information (a.c)
52 Executable after relocation with external reference resolution