<<

Compilation Pipeline

(gcc): .c Æ .s o translates high-level language to • Assembler (as): .s Æ .o Assembler and Linker o translates assembly language to machine language • Archiver (ar): .o Æ .a o collects object files into a single library CS 217 • Linker (ld): .o + .a Æ a.out o builds an executable file from a collection of object files • Execution (execlp) o loads an executable file into memory and starts it

1 2

Compilation Pipeline Assembler

•Purpose .c o Translates assembly language into machine language Store result in object file (.o) Compiler o .s • Assembly language o A symbolic representation of machine instructions Assembler • Machine language .o Contains everything needed to Archiver o link, load, and execute the program .a

Linker/Loader a.out Execution

3 4 Translating to General IA32 Instruction Format • Assembly language: Instruction Opcode ModR/M SIB Displacement Immediate prefixes leal (%eax,%eax,4), %eax Up to 4 1 or 2 byte 1 byte 1 byte 0, 1, 2, 0, 1, 2, • Machine code: prefixes of opcode (if required) (if required) or 4 bytes or 4 bytes 1 byte each (optional) 1000 1101 7 6 5 3 2 0 7 6 5 3 2 0 o Byte 1: 8D (opcode LEA) Reg/ Mod R/M Scale Index Base Opcode o Byte 2: 04 (Dest %eax, with SIB) 0000 0100 • Prefixes: lock, rep, repne/repnz, repe/repz, segment overwrite, o Byte 3: 80 (base=%eax, index %eax *4) 1000 0000 -size overwrite, address-size overwrite • Opcode: see Intel manual • ModR/M and SIB: most memory need these • Displacement and immediate: depending on opcode, ModR/M and SIB • IA32’s byte order is little endian 5 6

Assembly Language Statements Main Task: Symbol Manipulation

• Imperative statements specify instructions .text .globl loop o Typically map 1 imperative statement to 1 machine instruction ... loop: • Synthetic instructions movl count, %eax cmpl %edx, %eax o They are mapped to one or more machine instructions ... .data jge done • Declarative statements specify assembly time actions count: pushl %edx o Reserve space (.comm, .lcomm, … ) .word 0 o Define symbols ( .globl Foo, … ) ... call foo Identify segments ( .text, .rodata, …) o loop o Initialize data (they do not yield machine instructions but they may add information to the object file that is used by the linker) done:

Create labels and remember their addresses Deal with the “forward reference problem” 7 8 Dealing with Forward References Implementing an Assembler

• Most assemblers have two passes o Pass 1: symbol definition o Pass 2: instruction assembly instruction • Or, alternatively, symbol table o Pass 1: instruction assembly instruction .s file data section o Pass 2: patch the cross-reference input assemble output .o file

I will illustrate this technique instruction text section

bss instruction section

disk in memory in memory disk structure structure 9 10

Input Functions Input Functions

• Read assembly language and produce • Lexical analyzer list of instructions o Group a stream of characters into tokens add %g1 , 10 , %g2

instruction • Syntactic analyzer symbol table o Check the syntax of the program instruction .s file data section .o file input assemble output • Instruction list producer Produce an in-memory list of instruction data structures instruction text section o

bss instruction section instruction

instruction

11 12 Instruction Assembly Symbol Table

... loop def loop 0 done disp10 done 2 loop: disp21 foo 5 cmpl %edx, %eax D 0 3 9 [0] disp10 loop 10 .globl loop def done 15 jge done Disp? 7 D [2] loop: pushl %edx 5 2 [4] cmpl %edx, %eax D 0 3 9 [0] jge done Disp10 7 D [2] call foo Disp? E 8 [5] pushl %edx 5 2 [4] jmp loop Disp? E 9 [10] call foo Disp21 E 8 [5] done: [15] jmp loop Disp10 E 9 [10] done: [15] 13 14

Filling in Local Addresses Relocation Records

loop def loop 0 ... done disp10 done 2 .globl loop disp21 foo 5 def loop 0 loop: disp10 loop 10 disp21 foo 5 cmpl %edx, %eax .globl loop def done 15 D 0 3 9 [0] jge done loop: +13 7 D [2] pushl %edx cmpl %edx, %eax D 0 3 9 [0] 5 2 [4] call foo jge done +13 7 D [2] Disp21 E 8 [5] jmp loop pushl %edx 5 2 [4] -10 E 9 [10] done: call foo Disp21 E 8 [5] [15] jmp loop -10 E 9 [10] done: [15] 15 16 Assembler Directives Assemble into Sections

• Delineate segments • Process instructions and directives to produce object file o .section output structures o may need multiple location counters (one per segment) • Allocate/initialize data and bss segments o .word .half .byte .ascii .asciz o instruction o .align .skip symbol table • Make symbols in text externally visible instruction .s file data section o .global input assemble output .o file

instruction text section

bss instruction section 17 18

Output Functions ELF: Executable and Linking Format

• Machine language output • Format of .o and a.out files Write symbol table and sections into object file o o Output by the assembler o Input and output of linker

instruction symbol table ELF Header instruction .s file data section .o file Program Hdr input assemble output optional for .o files Table

instruction text section Section 1 . . . bss Section n instruction section optional for files Section Hdr a.out Table

19 20 ELF (cont) ELF (cont)

• Section Header Table: array of… E_ident[EI_CLASS]=ELFCLASS32 typedef struct { .text E_ident[EI_DATA]=ELFDATA2LSB ELF Header Elf32_Word sh_name; .data typedef struct { Elf32_Word sh_type; .bss unsigned char e_ident[EI_NIDENT]; Elf32_Word sh_flags; Elf32_Half e_type; ET_REL Elf32_Addr sh_addr; Elf32_Half e_machine; ET_EXEC ET_DYN Elf32_Off sh_offset; Elf32_Word e_version; SHT_SYMTAB ET_CORE Elf32_Word sh_size; Elf32_Addr e_entry; SHT_RELA Elf32_Off e_phoff; Elf32_Word sh_link; SHT_PROGBITS Elf32_Off e_shoff; ...... EM_386 } Elf32_Shdr; SHT_NOBITS } Elf32_Ehdr;

21 22

Invoking the Linker Multiple Object Files

• ld bar.o main.o –l libc.a –o a.out main.o bar.o start main 0 compiled library output def foo 8 def loop 0 program (contains (also in “.o” disp21 loop 15 disp21 foo 5 modules more format, but [0] D 0 3 9 [0] .o files) no undefined [2] +13 7 D [2] symbols) [4] 5 2 [4] [7] Disp21 E 8 [5] • Invoked automatically by gcc, [8] -10 E 9 [10] • but you can call it directly if you like. [12] [15] Disp21 [15] [20]

23 24 Step 1: Pick An Order Step 2: Patch main.o bar.o main.o bar.o

start main 15+0 start main 15+0 def foo 15+8 def loop 0 def foo 15+8 def loop 0 disp21 loop 15+15 disp21 foo 5 disp21 loop 15+15 disp21 foo 5 [15] D 0 3 9 [0] [15] D 0 3 9 [0] [17] +11 7 D [2] [17] +11 7 D [2] [19] 5 2 [4] [19] 5 2 [4] [22] Disp21 E 8 [5] [22] 15+8-5=13 E 8 [5] [23] -15 E 9 [10] [23] -15 E 9 [10] [27] [15] [27] [15] Disp21 [30] 0-(15+15)=-30 [30] [35] [35]

25 26

Step 3: Concatenate Summary

• Assember a.out o Read assembly language start main 15+0 o Two-pass execution (resolve symbols) [0] D 0 3 9 o Produce object file +13 7 D [2] 5 2 [4] •Linker +13 E 8 [5] o Relocation records [10] -10 E 9 o Order object codes [15] Patch and resolve displacements [17] o Produce executable [19] o [22] [23] [27] -30 [30] [35]

27 28