PA5 Overview (for real this time) Lecture 15

lol, let’s not target Windows executables

March 7, 2018 PA5 Overview

“[PA5] is by far the most difficult assignment... you have to get almost everything right which makes it difficult to estimate how much progress you’re making...” “[PA5] was the most time intensive and the hardest of the assignments in both classes. There was (1) just a lot more stuff and (2) a lot more creativity required...” “[PA5] needs a lot to come together to get a basic program working.”

Compiler Construction 2/31 Compiler Construction 3/31 Considerations for PA5

É Stack generation É Without a sensible register allocator, we must make sure to use a fixed number of registers for everything. É Vtables É Object layout É Interfacing with external libraries

Compiler Construction 4/31 -64 Cool Assembly CISC — Complex É É RISC — Reduced Instruction Set Instruction Set É Lots of Only 8 registers opportunities for É Fewer optimization optimizations É opportunities É Nightmarish É Reference compiler interfacing with has helpful external libraries debugging É Must use external capabilities tools (e.g., gdb É If you don’t like instead of the the reference reference compiler) compiler.... well, sorry Amazing CV É Easier to complete booster É Compiler Construction 5/31 Cool Assembly

Compiler Construction 6/31 x86-64 Assembly

X86/WIN32 REVERSE ENGINEERING CHEAT­SHEET Registers Instructions

GENERAL PURPOSE 32­BIT REGISTERS ADD , Adds to . may be a register or memory. may EAX Contains the return value of a function call. Be a register, memory or immediate value. ECX Used as a loop counter. "this" pointer in C++. CALL Call a function and return to the next instruction when finished. EBX General Purpose may be a relative offset from the current location, a register or memory addr. EDX General Purpose CMP , Compare with . Similar to SUB instruction but does not ESI Source index pointer Modify the operand with the result of the subtraction. EDI Destination index pointer DEC Subtract 1 from . may be a register or memory. ESP Stack pointer DIV Divide the EDX:EAX registers (64‐bit combo) by . may be EBP Stack base pointer a register or memory. SEGMENT REGISTERS INC Add 1 to . may be a register or memory. CS Code segment JE Jump if Equal (ZF=1) to . SS Stack segment JG Jump if Greater (ZF=0 and SF=OF) to . DS Data segment JGE Jump if Greater or Equal (SF=OF) to . ES Extra data segment JLE Jump is Less or Equal (SF<>OF) to . FS Points to Thread Information Block (TIB) JMP Jump to . Unconditional. GS Extra data segment JNE Jump if Not Equal (ZF=0) to . MISC. REGISTERS JNZ Jump if Not Zero (ZF=0) to . EIP Instruction pointer JZ Jump if Zero (ZF=1) to . EFLAGS status flags. LEA , Load Effective Address. Gets a pointer to the memory expression STATUS FLAGS and stores it in . ZF Zero: Operation resulted in Zero MOV , Move data from to . may be an immediate value, CF Carry: source > destination in subtract register, or a memory address. Dest may be either a memory address or a SF Sign: Operation resulted in a negative # register. Both and may not be memory addresses. OF Overflow: result too large for destination MUL Multiply the EDX:EAX registers (64‐bit combo) by . may 16­BIT AND 8­BIT REGISTERS be a register or memory. The four primary general purpose registers (EAX, EBX, POP Take a 32‐bit value from the stack and store it in . ESP is incremented ECX and EDX) have 16 and 8 bit overlapping aliases. by 4. may be a register, including segment registers, or memory. EAX 32‐bit PUSH Adds a 32‐bit value to the top of the stack. Decrements ESP by 4. AX 16‐bit may be a register, segment register, memory or immediate value. AH AL 8‐bit ROL , Bitwise Rotate Left the value in by bits. may be a register or memory address. may be immediate or CL register. ROR , Bitwise Rotate Right the value in by bits. may be a The or memory address. may be immediate or CL register. Low Empty SHL , Bitwise Shift Left the value in by bits. Zero bits added to Addresses the least significant bits. may be reg. or mem. is imm. or CL. Local Variables <‐ESP points here SHR , Bitwise Shift Left the value in by bits. Zero bits added to the least significant bits. may be reg. or mem. is imm. or CL. ↑ EBP‐x SUB , Subtract from . may be immediate, memory or a <‐EBP points here ↓ EBP+x Saved EBP register. may be memory or a register. (source = dest)‐>ZF=1, Return Pointer (source > dest)‐>CF=1, (source < dest)‐>CF=0 and ZF=0 Parameters TEST , Performs a logical OR operation but does not modify the value in the Parent function's operand. (source = dest)‐>ZF=1, (source <> dest)‐>ZF=0. data XCHG Exchange the contents of and . Operands may be register High Grand‐parent or memory. Both operands may not be memory. Addresses function's data XOR , Bitwise XOR the value in with the value in , storing the result in . may be reg or mem and may be reg, mem or imm. Assembly Language Terminology and Formulas

Instruction listings contain at least a mnemonic, which Pointer to Raw Data Offset of section data within the executable file. is the operation to be performed. Many instructions Size of Raw Data Amount of section data within the executable file. will take operands. Instructions with multiple RVA Relative Virtual Address. Memory offset from the beginning of the executable. operands list the destination operand first and the Virtual Address (VA) Absolute Memory Address (RVA + Base). The PE Header fields named source operand second (, ). Assembler VirtualAddress actually contain Relative Virtual Addresses. directives may also be listed which appear similar to Virtual Size Amount of section data in memory. instructions. Base Address Offset in memory that the executable module is loaded. ASSEMBLER DIRECTIVES ImageBase Base Address requested in the PE header of a module. DB Define Byte. Reserves an explicit Module An PE formatted file loaded into memory. Typically EXE or DLL. byte of memory at the current Pointer A memory address location. Initialized to value. Entry Point The address of the first instruction to be executed when the module is loaded. DW Define Word. 2‐Bytes Import DLL functions required for use by an executable module. DD Define DWord. 4‐Bytes Export Functions provided by a DLL which may be Imported by another module. OPERAND TYPES RVA‐>Raw Conversion Raw = (RVA ‐ SectionStartRVA) + (SectionStartRVA ‐ SectionStartPtrToRaw) Immediate A numeric operand, hard coded RVA‐>VA Conversion VA = RVA + BaseAddress Register A general purpose register VA‐>RVA Conversion RVA = VA ‐ BaseAddress Memory Memory address w/ brackets [ ] Raw‐>VA Conversion VA = (Raw ‐ SectionStartPtrToRaw) + (SectionStartRVA + ImageBase) Copyright © 2009 Nick Harbour www.rnicrosoft.net Compiler Construction 7/31 Stack Machines

É A simple evaluation model É No variables or registers (aside from temporary storage) É A stack of values for intermediate results

Compiler Construction 8/31 Example Stack Machine Program

É Consider two instructions É push i place integer i on top of the stack É add pop two elements, add them and put the result back on the stack

É A program to compute 7 + 5: push 7 push 5 add

Compiler Construction 9/31 Stack Machine Example

5 7 7 ⊕ 12 stack ...... push 7 push 5 add

É Each instruction: É Takes its operands from the top of the stack É Removes those operands from the stack É Computes the required operation on them É Pushes the result on the stack

Compiler Construction 10/31 Why Stack Machines?

É Each operation takes operands from the same place and puts the results in the same place (i.e., fixed offsets from the top of the stack)

É This means a uniform compilation scheme É To do an add, always get sp[0] and sp[1], add them, store result at sp[0]

É And thus a simpler compiler É This is the easiest way to do PA5 É Register allocation is more complex!

Compiler Construction 11/31 Why Stack Machines?

É Location of the operands is implicit É Always on the top of the stack É No need to specify operands explicitly É You can load fixed temporary registers with operands from stack!

É example discipline: load operand 1 by popping sp[0] to r4, load operand 2 by popping sp[1] to r5 É No need to specify result location É e.g., always put result in r6, then push r6 É Can represent instruction as add instead of add r1, r2 É Smaller program size (sometimes faster: why?) É Java uses a stack evaluation model! É uses register allocation Compiler Construction 12/31 É Idea: the top of the stack is accessed frequently É Keep an in a fixed register É Implement add as: É accumulator accumulator + top_of_stack ← É Only one memory operation!

Remarks on Stack Machines

É The add instruction performs 3 memory operations

É Two reads and one write to the stack

É How can we improve this? É Hint: fold from functional programming

Compiler Construction 13/31 Remarks on Stack Machines

É The add instruction performs 3 memory operations

É Two reads and one write to the stack

É How can we improve this? É Hint: fold from functional programming

É Idea: the top of the stack is accessed frequently É Keep an accumulator in a fixed register É Implement add as: É accumulator accumulator + top_of_stack ← É Only one memory operation!

Compiler Construction 13/31 Stack Machine with Accumulator From before:7 + 5

acc 7 5 12 ⊕

7 7 stack ......

acc 7 acc 5 acc acc + top push← 7 ← pop←

Compiler Construction 14/31 Example: 3 + (7 + 5)

Code Acc Stack acc 3 3 init ←

Compiler Construction 15/31 Example: 3 + (7 + 5)

Code Acc Stack acc 3 3 init push← acc 3 init, 3

Compiler Construction 15/31 Example: 3 + (7 + 5)

Code Acc Stack acc 3 3 init push← acc 3 init, 3 acc 7 7 init, 3 ←

Compiler Construction 15/31 Example: 3 + (7 + 5)

Code Acc Stack acc 3 3 init push← acc 3 init, 3 acc 7 7 init, 3 push← acc 7 init, 3, 7

Compiler Construction 15/31 Example: 3 + (7 + 5)

Code Acc Stack acc 3 3 init push← acc 3 init, 3 acc 7 7 init, 3 push← acc 7 init, 3, 7 acc 5 5 init, 3, 7 ←

Compiler Construction 15/31 Example: 3 + (7 + 5)

Code Acc Stack acc 3 3 init push← acc 3 init, 3 acc 7 7 init, 3 push← acc 7 init, 3, 7 acc 5 5 init, 3, 7 acc ← acc + top 12 init, 3, 7 ←

Compiler Construction 15/31 Example: 3 + (7 + 5)

Code Acc Stack acc 3 3 init push← acc 3 init, 3 acc 7 7 init, 3 push← acc 7 init, 3, 7 acc 5 5 init, 3, 7 acc ← acc + top 12 init, 3, 7 pop← 12 init, 3

Compiler Construction 15/31 Example: 3 + (7 + 5)

Code Acc Stack acc 3 3 init push← acc 3 init, 3 acc 7 7 init, 3 push← acc 7 init, 3, 7 acc 5 5 init, 3, 7 acc ← acc + top 12 init, 3, 7 pop← 12 init, 3 acc acc + top 15 init, 3 ←

Compiler Construction 15/31 Example: 3 + (7 + 5)

Code Acc Stack acc 3 3 init push← acc 3 init, 3 acc 7 7 init, 3 push← acc 7 init, 3, 7 acc 5 5 init, 3, 7 acc ← acc + top 12 init, 3, 7 pop← 12 init, 3 acc acc + top 15 init, 3 pop← 15 init

Compiler Construction 15/31 Notes É It is critical that the stack is preserved across the evaluation of a subexpression É Stack before evaluating7 + 5 is init, 3 É Stack after evaluating7 + 5 is init, 3 É The first operand is top of the stack

Compiler Construction 16/31 RISC-y Business

É We’ve been talking about some abstract notion of a stack machine

É The compiler actually generates assembly code for a specify ISA

É Briefly: COOL-ASM is a RISC-style assembly language

É Untyped, unsafe, low-level, few primitives É 8 general registers, 3 special registers sp, fp, and ra É load-store architecture: bring values into registers from memory to operate on them

Compiler Construction 17/31 Drink your Cool-aid

É Sample COOL-ASM instructions (more in CRM)

1 add r2 <- r5 r2; r2= r5+ r2 2 li r5 <- 183; r5= 183 3 ld r2 <- r1[5]; r2=*(r1+ 5) 4 st r1[6] <- r7;*(r1+6)= r7 5 my_label :; label my_label 6 push r1;*sp= r1; sp--; 7 sub r1 <- r1 1; r1= r1-1 8 bnz r1 my_label; if(r1 != 0) goto my_label

Compiler Construction 18/31 Simulating a Stack Machine

É The compiler generates COOL-ASM (or x86-64)

É Emit code to implement a stack machine! É Convention: accumulator is kept in register r1 É Stack is stored in memory É We have more memory than registers É Both x86-64 and COOL-ASM have supply stacks that grow downwards É The address of the next unused location on the stack is kept in register sp

É i.e., the top of the stack is at address sp+1 ld r1 <- sp[1] ; load top of stack into acc É Word size = 1 in COOL-ASM (8 in x86-64)

Compiler Construction 19/31 COOL-ASM for 7+5 Stack Machine COOL-ASM acc <- 7 li r1 7 push acc push r1 acc <- 5 li r1 5 acc <- acc + top pop r2 add r1 <- r1 r2 pop pop r2

Compiler Construction 20/31 Generalizing Code Generation

É Cool is an expression-based language... É We want to take a principled approach that lets us generate code for every type of expression, no matter where the expression appears in the code, and regardless of how complex the expressions are

É We make a cgen(e) function that generates code for an expression in the AAST

É Follow a discipline: cgen(e) should compute the value of e and store it in r1 (the accumulator) É Preserve sp and the contents of the stack É No matter what the expression must do to the stack, leave it the way you found it

Compiler Construction 21/31 Code Generation: Constants

cgen(123) = li r1 123

Just move the constant into the accumulator! This also preserves the stack (doesn’t touch it)

Compiler Construction 22/31 Code generation: Addition

cgen(e1 + e2) = cgen(e1) push r1 cgen(e2) ;; e2 now in r1 pop r2 ;; r2 contains the saved result of evaluating r1! add r1 <- r1 r2 Why can’t we use r2 directly after cgen(e1) to save a push and pop?

Compiler Construction 23/31 Code generation notes

É The cgen code for + is a template with holes for code that evaluates e1 and e2

É Stack Machine code generation is recursive

É Code for e1 + e2 consists of code for e1 and e2 glued together

É Most critically! Code generation can be written as a recursive-descent tree walk of the AAST

É At least, for expressions

Compiler Construction 24/31 Code Generation: If cgen(if e1 = e2 then e3 else e4fi) = cgen(e1) push r1 cgen(e2) pop r2 beq r1 r2 true_branch ;; else fall through bne r1 r2 else_branch true_branch: cgen(e3) jmp after_if else_branch: cgen(e4) jmp after_if

Compilerafter_if: Construction 25/31 Code Generation: Variables É “Variables” could be function parameters, let bindings... É Parameters get pushed on the stack É Recall: fp is established by the function prologue, sp can change during function execution Consider: f(x1, x2)

Compiler Construction 26/31 Summary

É Activation record is co-designed with the code generator

É Code generation is a recursive traversal of the AAST

É Stack Machine code is simpler for PA5 (although you’re encouraged to try register allocation!)

Compiler Construction 27/31 Object Oriented Code Generation

É “Variables” are references to objects É What are the semantics of addition in Cool?

É Liskov Substitution Principle: If B is a subclass of A, then an object of class B can be used wherever an object of class A is expected

É This means that code in class A must work unmodified on an object of class B

Compiler Construction 28/31 Two issues

É How are objects represented in memory? É How is dynamic dispatch implemented?

Compiler Construction 29/31 Object Layout

É An object is like a struct in C. The reference foo.field is an index into a foo struct at an offset corresponding to field

É Objects are laid out in contiguous memory É Each attribute stored at a fixed offset in object É When a method is invoked, the object becomes self and the fields are the object’s attributes

Compiler Construction 30/31 Cool Object Layout

É Class tag É An integer that identifies the class of the object (Int=1, Bool=2, ...)

É Object size É Number of words occupied by the object

É Dispatch pointer É Address to a table of methods

É Attributes É Each attribute laid out in contiguous memory

Compiler Construction 31/31