Compiler construction 2015
x86: assembly for a real machine

Lecture 6
- x86 architecture
- Calling conventions
- Some x86 instructions
- From LLVM to assembler
- Instruction selection

High-level view of x86
- Not a stack machine; no direct correspondence to operand stacks.
- Arithmetic etc. is done with values in registers.
- Much more limited support for function calls; you need to handle return addresses, jumps, allocation of stack frames etc. yourself.
- Your code is assembled and run; no further optimization.
- CISC architecture with few registers. Straightforward code will run slowly.

x86 assembler, a first example

Javalette (or C):

    > more ex1.c
    int f (int x, int y) {
        int z = x + y;
        return z;
    }

This might be compiled to the NASM assembly code below.

Example explained

NASM code, commented:

    segment .text          ; code area
    global f               ; f has external scope
    f:                     ; entry point for f
        push dword ebp     ; save caller's fp
        mov ebp, esp       ; set our fp
        sub esp, 4         ; allocate space for z
        mov eax, [ebp+12]  ; move y to eax
        add eax, [ebp+8]   ; add x to eax
        mov [ebp-4], eax   ; move eax to z
        mov eax, [ebp-4]   ; return value to eax
        leave              ; restore caller's fp/sp
        ret                ; pop return addr, jump

Intel x86 architectures

Long history:
- 8086, 1978. First IBM PCs. 16-bit registers, real mode.
- 80286, 1982. AT, Windows. Protected mode.
- 80386, 1985. 32-bit registers, virtual memory.
- 80486, Pentium, Pentium II, III, IV, 1989 – 2003. Math coprocessor, pipelining, caches, SSE ...
- Intel Core 2, 2006. Multi-core.
- Core i3/i5/i7, 2009/10.

Backwards compatibility is important, leading to a large set of opcodes.

Which version should you target?

When speaking of the x86 architecture, one generally means the register/instruction set of the 80386 (with floating-point ops). You can compile code which would run on a 386, or you may use SSE2 operations for a more recent version.

Not only Intel offers x86 processors; AMD is also in the market.

x86 registers

General purpose registers (32-bit):
- EAX, EBX, ECX, EDX, EBP, ESP, ESI, EDI.
- Conventional use: EBP and ESP for frame pointer and stack pointer.

Segment registers:
- Legacy from the old segmented addressing architecture. Can be ignored in Javalette.

Floating-point registers:
- Eight 80-bit registers ST0 – ST7, organised as a stack.

Flag registers:
- Status registers with bits for results of comparisons, etc. We discuss these later.

Data area for parameters and local variables

Runtime stack:
- A contiguous memory area, growing from high addresses downwards.
- EBP contains the current base pointer (= frame pointer); ESP contains the current stack pointer.

AR layout illustrated:

    High address
      Caller's base pointer   \
      Local vars              /  caller's AR
      Parameters Pn ... P1
      Return address
      Caller's base pointer   <- EBP  \
      Local vars                      /  callee's AR
                              <- ESP
    (stack growth: downwards)

Note: We need to store the return address (the address of the instruction to jump to on return).

Calling convention

Caller, before call:
- Push params (in reverse order).
- Push return address.
- Jump to callee entry.

Code pattern:

    push dword paramn
    ...
    push dword param1
    call f

Callee, on entry:
- Push caller's base pointer.
- Update current base pointer.
- Allocate space for locals.

Code pattern:

    push dword ebp
    mov ebp, esp
    sub esp, localbytes

Callee, on exit:
- Restore base and stack pointer.
- Pop return address and jump.

Code pattern:

    leave
    ret

Caller, after call:
- Pop parameters.

Code pattern:

    add esp, parambytes

Parameters, local variables and return values

Parameters:
- In the callee code, integer parameter 1 has address ebp+8, parameter 2 ebp+12, etc. Double parameters require 8 bytes.
- Parameter values are accessed with indirect addressing: [ebp+8], etc.
- Here ebp+n means "(address stored in ebp) + n".

Local variables:
- The first local var is at address ebp-4, etc.
- Local vars are conventionally addressed relative to ebp, not esp.
- Again, refer to vars by indirect addressing: [ebp-4], etc.

Return values:
- Integer and boolean values are returned in eax, doubles in st0.
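To make the patterns concrete, here is a sketch of a complete call of the function f from the first example, following this convention (the literal arguments are ours, chosen only for illustration):

    push dword 4      ; second argument (y), pushed first
    push dword 3      ; first argument (x)
    call f            ; pushes return address and jumps to f
    add esp, 8        ; caller pops the two 4-byte parameters
                      ; the result f(3,4) is now in eax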

Register usage

Scratch registers (caller save):
- EAX, ECX and EDX must be saved by the caller before a call, if needed; they can be freely used by the callee.

Callee save registers:
- EBX, ESI, EDI, EBP, ESP.
- For EBP and ESP, this is handled in the code patterns.

Note: What we have described is one common calling convention for 32-bit x86, called cdecl. Other conventions exist, but we omit them.

Assemblers for x86

Several alternatives:
- Several assemblers for x86 exist, with different syntax.
- We will use NASM, the Netwide Assembler, which is available for several platforms.
- We also recommend Paul Carter's book and examples. Follow the link from the course web site.

Some syntax differences from the GNU assembler:
- GNU uses %eax etc. as register names.
- For two-argument instructions, the operands have the opposite order(!).
- Different syntax for indirect addressing.
- If you use gcc -S ex.c, you will get GNU syntax.
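For instance, one and the same instruction in the two syntaxes (our own illustration):

    NASM:  mov eax, [ebp+8]     ; load parameter 1 into eax
    GNU:   movl 8(%ebp), %eax   ; same instruction: % on register names,
                                ; reversed operands, 8(%ebp) for [ebp+8]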

Example: GNU syntax

First example, revisited:

    > gcc -c ex1.c
    > objdump -d ex1.o

    ex1.o:     file format elf32-i386

    Disassembly of section .text:

    00000000 <f>:
       0: 55         push %ebp
       1: 89 e5      mov  %esp,%ebp
       3: 8b 45 0c   mov  0xc(%ebp),%eax
       6: 03 45 08   add  0x8(%ebp),%eax
       9: c9         leave
       a: c3         ret

Integer arithmetic; two-address code

Addition, subtraction and multiplication:

    add dest, src    ; dest := dest + src
    sub dest, src    ; dest := dest - src
    imul dest, src   ; dest := dest * src

Operands can be values in registers or in memory; src can also be a literal.

Division – one-address code:

    idiv denom       ; (eax,edx) := ((edx:eax)/denom, (edx:eax)%denom)

- The numerator is the 64-bit value edx:eax (no other choices).
- Both div and mod are performed; the results end up in eax and edx, respectively.
- edx must be initialized before division; for signed division, cdq sign-extends eax into edx:eax.
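As a sketch of how division is set up in practice (our own example: numerator and divisor taken from the first two parameters; ecx chosen because it is a scratch register):

    mov eax, [ebp+8]    ; numerator into eax
    cdq                 ; sign-extend eax into edx:eax
    mov ecx, [ebp+12]   ; divisor must be in a register or in memory
    idiv ecx            ; quotient in eax, remainder in edx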

Example

Javalette program:

    int main () {
        printString "Input a number: ";
        int n = readInt();
        printInt (2*n);
        return 0;
    }

The above code could be translated as follows (slightly optimized to fit on a slide).

Code for main:

    push dword ebp
    mov ebp, esp
    push str1
    call printString
    add esp, 4
    call readInt
    imul eax, 2
    push eax
    call printInt
    add esp, 4
    mov eax, 0
    leave
    ret

Example, continued

Complete file:

    extern printString, printInt
    extern readInt
    segment .data
    str1 db "Input a number: "
    segment .text
    global main
    main:
        ; code from the previous slide

Comments:
- IO functions are external; we come back to that.
- The .data segment contains constants such as str1.
- The .text segment contains code.
- The global declaration gives main external scope (it can be called from code outside this file).

Floating-point arithmetic in x86

Moving numbers (selection):
- fld src: pushes the value in src on the fp stack.
- fild src: pushes the integer value in src on the fp stack.
- fstp dest: stores the top of the fp stack in dest and pops.
- src and dest can be an fp register or a memory reference.

Arithmetic (selection):
- fadd src: src is added to ST0.
- fadd to dest: ST0 is added to dest.
- faddp dest: ST0 is added to dest, then the fp stack is popped.
- Similar variants exist for fsub, fmul and fdiv.

Floating-point arithmetic in SSE2

New registers:
- 128-bit registers XMM0–XMM7 (later also XMM8–XMM15).
- Each can hold two double-precision floats or four single-precision floats.
- SIMD operations for arithmetic.

Arithmetic instructions:
- Two-address code: ADDSD, MULSD, etc.
- SSE2 fp code is similar to integer arithmetic.
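A minimal sketch of a function body computing x + y for two double parameters (our own example; doubles take 8 bytes, so the parameters sit at ebp+8 and ebp+16):

    ; x87 version; the result ends up in st0, as cdecl requires
    fld qword [ebp+8]     ; push x on the fp stack
    fadd qword [ebp+16]   ; st0 := x + y

    ; SSE2 version of the same addition
    movsd xmm0, [ebp+8]   ; xmm0 := x
    addsd xmm0, [ebp+16]  ; xmm0 := x + y (under cdecl, a move to st0
                          ; would still be needed to return the value)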

Control flow

Integer comparisons:

    cmp v1, v2

v1 - v2 is computed and bits in the flag registers are set:
- ZF is set iff the value is zero.
- OF is set iff the result overflows.
- SF is set iff the result is negative.

Branch instructions (selection):
- JZ lab branches if ZF is set.
- JL lab branches if SF ≠ OF, i.e. if v1 < v2 as signed values.
- Similarly for the other relations between v1 and v2.
- fcomi st0, src compares st0 and src and sets flags; it can be followed by branching as above (note that fcomi sets CF and ZF, so the unsigned branches jb, ja, je are the ones to use).

One more example

Javalette (or C):

    int sum(int n) {
        int res = 0;
        int i = 0;
        while (i < n) {
            res = res + i;
            i++;
        }
        return res;
    }

Naive assembler:

    sum:  push dword ebp
          mov ebp, esp
          sub esp, 8
          mov dword [ebp-4], 0   ; res := 0
          mov dword [ebp-8], 0   ; i := 0
          jmp L2
    L3:   mov eax, [ebp-8]
          add [ebp-4], eax       ; res := res + i
          inc dword [ebp-8]      ; i++
    L2:   mov eax, [ebp-8]
          cmp eax, [ebp+8]       ; i < n ?
          jl L3
          mov eax, [ebp-4]       ; return res
          leave
          ret

(NASM needs the dword size specifiers above, since neither operand fixes the operand size.)
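A minimal sketch of a double comparison x < y (our own example; the label Ltrue is assumed, and the fp stack should be popped again afterwards):

    fld qword [ebp+16]   ; push y
    fld qword [ebp+8]    ; push x; now st0 = x, st1 = y
    fcomi st0, st1       ; compare x with y; sets CF iff x < y
    jb Ltrue             ; branch iff x < y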

How to do an x86 backend

Starting point, two alternatives:
- From LLVM code (requires your basic backend to generate LLVM code as a data structure, not directly as strings). Will generate many local vars.
- From ASTs generated by the frontend (means a lot of code in common with the LLVM backend).

Variables:
In either case, your code will contain a lot of variables/virtual registers. Possible approaches:
- Treat these as local vars, storing to and fetching from the stack at each access. Gives really slow code.
- Do (linear scan) register allocation. Much better code.

Input and output

A simple proposal:
- Define printInt, readInt etc. in C. Then link this file together with your object files using gcc.
- Alternative: compile runtime.ll with llvm-as and llc to get runtime.s; this can be given to gcc as below.

Linux building:
- To assemble a NASM file to file.o: nasm -f elf file.asm
- To link: gcc file.o runtime.c
- The result is an executable a.out.

More info:
Paul Carter's book (link on the course web site) gives more info.
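Spelled out, the alternative route via runtime.ll might look like this (a sketch; the file names are examples only):

    llvm-as runtime.ll            # gives runtime.bc
    llc runtime.bc -o runtime.s   # native assembly for the runtime
    nasm -f elf file.asm          # assemble your backend's output to file.o
    gcc file.o runtime.s          # link; the result is a.out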

From LLVM to assembler

More complications:
So far, we have ignored some important concerns in code generation:
- The instruction set in real-world processors typically offers many different ways to achieve the same effect. Thus, when translating an IR program to native code, we must do instruction selection, i.e. choose between the available alternatives.
- Often an instruction sequence contains independent parts that can be executed in arbitrary order. Different orders may take very different time; thus the code generator must do instruction scheduling.

Both these tasks are complex, and they interact with register allocation.

Native code generation, revisited

Several stages:
- Instruction selection.
- Instruction scheduling.
- SSA-based optimizations.
- Register allocation.
- Prolog/epilog code (AR management).
- Late optimizations.
- Code emission.

Target-independent generation:
Much of this is done in target-independent ways, using general algorithms operating on target descriptions. In LLVM, these tasks are done by the native code generator llc and by the JIT in lli.

Instruction selection

Further observations:
- Instruction selection for RISC machines is generally simpler than for CISC machines.
- The number of translation possibilities grows (combinatorially) as one considers larger chunks of IR code for translation.

Pattern matching:
Instruction selection can be seen as a pattern matching problem: the native instructions are seen as patterns; instruction selection is the problem of covering the IR code by patterns.

Two approaches:
- Tree pattern matching: think of the IR code as a tree.
- Peephole matching: think of the IR code as a sequence.

Tree pattern matching, an example

a[i] := x as tree IR code (from Appel):

    MOVE(MEM(+(MEM(+(FP, CONST a)), *(TEMP i, CONST 4))),
         MEM(+(FP, CONST x)))

- a and x are local vars, i is in a register.
- a is a pointer to the first array element.

Algorithm outline:
- Represent native instructions as patterns, or tree fragments.
- Tile the IR tree using these patterns, so that all nodes in the tree are covered.
- Output the sequence of instructions corresponding to the tiling.

Two variants:
- Greedy algorithm (top down).
- Dynamic programming (bottom up), based on cost estimates.

A simple instruction set

Notes: We consider only arithmetic and memory instructions (no jumps!). Assume a special register r0, which is always 0.

    ADD    ri ← rj + rk
    MUL    ri ← rj ∗ rk
    SUB    ri ← rj − rk
    DIV    ri ← rj / rk
    ADDI   ri ← rj + c
    SUBI   ri ← rj − c
    LOAD   ri ← M[rj + c]
    STORE  M[rj + c] ← ri
    MOVEM  M[rj] ← M[ri]

Identifying patterns (incomplete)

[Tile diagrams omitted: each instruction corresponds to one or more tree patterns over +, ∗, CONST, MEM and MOVE nodes; e.g. ADDI matches +(e, CONST c), +(CONST c, e) and a bare CONST c, while LOAD matches MEM nodes with or without an address computation.]

Example done in class.
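Sketch of the example done in class: one possible tiling of the a[i] := x tree above, using this instruction set (assuming i sits in register ri; along the lines of Appel's treatment):

    LOAD  r1 ← M[fp + a]    ; MEM(+(FP, CONST a)): fetch the pointer a
    ADDI  r2 ← r0 + 4       ; CONST 4 (r0 is always 0)
    MUL   r2 ← ri ∗ r2      ; i * 4
    ADD   r1 ← r1 + r2      ; address of a[i]
    ADDI  r2 ← fp + x       ; address of the local x
    MOVEM M[r1] ← M[r2]     ; move x into a[i]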

Peephole matching

Recall: code improvement by local simplification of the code within a small sliding window of instructions.

Can be used for instruction selection:
- Often there is one further intermediate language between IR and native code; peephole simplification is done for that language.

Retargetable compilers:
- The instruction selection part of the compiler is generated from a description of the target instruction set (code generator generators).

Instruction scheduling, background

Simple-minded, old-fashioned view of a processor:
Fetch an instruction, decode it, fetch operands, perform the operation, store the result. Then fetch the next instruction, ...

Modern processors:
- Several instructions are under execution concurrently.
- The memory system causes delays, with operations waiting for data.
- Similar problems for results from arithmetic operations, which may take several cycles.

Consequence:
It is important to understand data dependencies and to order instructions advantageously.

Instruction scheduling, example

Example (from Cooper): w = w * 2 * x * y * z
- A memory op takes 3 cycles, a mult 2 cycles, an add one cycle.
- One instruction can be issued each cycle, if its data is available.

Schedule 1:

    r1 <- M [fp + @w]
    r1 <- r1 + r1
    r2 <- M [fp + @x]
    r1 <- r1 * r2
    r2 <- M [fp + @y]
    r1 <- r1 * r2
    r2 <- M [fp + @z]
    r1 <- r1 * r2
    M [fp + @w] <- r1

Schedule 2:

    r1 <- M [fp + @w]
    r2 <- M [fp + @x]
    r3 <- M [fp + @y]
    r1 <- r1 + r1
    r1 <- r1 * r2
    r2 <- M [fp + @z]
    r1 <- r1 * r3
    r1 <- r1 * r2
    M [fp + @w] <- r1

Instruction scheduling

Comments:
- The problem is NP-complete for realistic architectures.
- A common technique is list scheduling: a greedy algorithm for scheduling a basic block.
- It builds a graph describing the data dependencies between instructions, and schedules instructions from a ready list of instructions with available operands.

Interaction:
Despite the interaction between selection, scheduling and register allocation, these are typically handled independently (and in this order).