
Where we are at:

Assembler (Chapter 6)

Building a Modern Computer From First Principles
www.nand2tetris.org

[Figure: the abstraction hierarchy of the course. Software hierarchy: Human Thought, Abstract Design (Chapters 9, 12), High-Level Language & Operating System (Chapters 10 - 11), Virtual Machine (Chapters 7 - 8), Assembly Language. Hardware hierarchy: Machine Language and Computer Architecture (Chapters 4 - 5), Hardware Platform and Gate Logic (Chapters 1 - 3), Chips & Logic Gates, Electrical Engineering, Physics. Neighboring levels are separated by abstract interfaces; the translators named in the figure are the VM Translator and the Assembler (Chapter 6).]

Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 6: Assembler

Why care about assemblers?

- Assemblers employ nifty programming tricks
- Assemblers are the first rung up the software hierarchy ladder
- An assembler is a translator of a simple language
- Writing an assembler = low-impact practice for writing more challenging translators, such as compilers.

Assembly example

Source code (example):

    // Computes 1+...+RAM[0]
    // And stores the sum in RAM[1]
    @i
    M=1     // i = 1
    @sum
    M=0     // sum = 0
    (LOOP)
    @i      // if i>RAM[0] goto WRITE
    D=M
    @R0
    D=D-M
    @WRITE
    D;JGT
    ...     // Etc.

Target code (a text file of binary Hack code):

    0000000000010000
    1110111111001000
    0000000000010001
    1110101010001000
    0000000000010000
    1111110000010000
    0000000000000000
    1111010011010000
    0000000000010010
    1110001100000001
    0000000000010000
    1111110000010000
    0000000000010001
    ...

The assembler turns the source code into the target code (assemble); the computer then runs the target code (execute).

The program translation challenge

- Extract the program's semantics from the source program, using the syntax rules of the source language
- Re-express the program's semantics in the target language, using the syntax rules of the target language
- (Slide annotation: cross-platform compiling)

Assembler = simple translator

- Translates each assembly command into one or more binary machine instructions
- Handles symbols (e.g. i, sum, LOOP, ...).

Revisiting Hack low-level programming: an example

Assembly program (sum.asm):

    // Computes 1+...+RAM[0]
    // And stores the sum in RAM[1].
    @i
    M=1     // i = 1
    @sum
    M=0     // sum = 0
    (LOOP)
    @i      // if i>RAM[0] goto WRITE
    D=M
    @R0
    D=D-M
    @WRITE
    D;JGT
    @i      // sum += i
    D=M
    @sum
    M=D+M
    @i      // i++
    M=M+1
    @LOOP   // goto LOOP
    0;JMP
    (WRITE)
    @sum
    D=M
    @R1
    M=D     // RAM[1] = the sum
    (END)
    @END
    0;JMP

[Figure: CPU emulator screenshot after running this program, marking the user-supplied input and the program-generated output.]

The CPU emulator allows loading and executing symbolic Hack code. It resolves all the symbols to memory locations, and executes the code.



The assembler's view of an assembly program

Assembly program = a stream of text lines, each being one of the following:

- White space
  - Empty lines/indentation
  - Line comments
  - In-line comments
- Instructions
  - A-instruction
  - C-instruction
- Symbols
  - References
  - Label declaration (XXX)

Assume that there is no symbol for now! With every symbol reference replaced by its numeric address, and the label declarations dropped, the example program reads:

    // Computes 1+...+RAM[0]
    // And stores the sum in RAM[1].
    @16
    M=1     // i = 1
    @17
    M=0     // sum = 0
    @16     // if i>RAM[0] goto WRITE
    D=M
    @0
    D=D-M
    @18
    D;JGT
    @16     // sum += i
    D=M
    @17
    M=D+M
    @16     // i++
    M=M+1
    @4      // goto LOOP
    0;JMP
    @17
    D=M
    @1
    M=D     // RAM[1] = the sum
    @22
    0;JMP


White space → ignore/remove them

With empty lines, indentation, line comments, and in-line comments removed, only the instructions remain:

    @16
    M=1
    @17
    M=0
    @16
    D=M
    @0
    D=D-M
    @18
    D;JGT
    @16
    D=M
    @17
    M=D+M
    @16
    M=M+1
    @4
    0;JMP
    @17
    D=M
    @1
    M=D
    @22
    0;JMP
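The white-space step can be sketched in a few lines of Python. This is a minimal illustration, not the book's implementation: the function name `strip_whitespace` is hypothetical, and it assumes that spaces inside a command (as in `D; JGT`) carry no meaning and may be dropped.

```python
def strip_whitespace(lines):
    """Return only the real commands: no comments, blank lines, or indentation."""
    commands = []
    for line in lines:
        code = line.split("//", 1)[0]  # drop a line comment or an in-line comment
        code = "".join(code.split())   # drop indentation and spaces inside the command
        if code:                       # skip lines that are now empty
            commands.append(code)
    return commands


source = [
    "// Computes 1+...+RAM[0]",
    "",
    "   @16",
    "M=1   // i = 1",
    "D; JGT",
]
print(strip_whitespace(source))  # ['@16', 'M=1', 'D;JGT']
```

Label declarations such as `(LOOP)` pass through this step unchanged; they are handled later, during symbol resolution.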

Instructions → binary encoding

Translating / assembling A-instructions

Symbolic: @value    // Where value is either a non-negative decimal number
                    // or a symbol referring to such a number.

Binary: 0 v v v v v v v v v v v v v v v    (the v bits hold value; each v = 0 or 1)

Translation to binary:

- If value is a non-negative decimal number, simple, e.g. @16
- If value is a symbol, later.
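For the numeric case, the translation amounts to writing the value as a 15-bit binary number behind a leading 0. A minimal Python sketch follows; the function name `encode_a_instruction` is hypothetical, and symbols are deliberately rejected here because they are handled later.

```python
def encode_a_instruction(command):
    """Translate a numeric A-instruction such as '@16' into 16 binary digits."""
    if not command.startswith("@"):
        raise ValueError("not an A-instruction: " + command)
    value = int(command[1:])          # raises ValueError if the operand is a symbol
    if value not in range(2 ** 15):   # only 15 bits are available for the value
        raise ValueError("value out of range: " + command)
    return format(value, "016b")      # leading 0 followed by the 15 value bits


print(encode_a_instruction("@16"))  # 0000000000010000
print(encode_a_instruction("@0"))   # 0000000000000000
```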


Translating / assembling C-instructions

Symbolic: dest=comp;jump    // Either the dest or jump fields may be empty.
                            // If dest is empty, the "=" is omitted;
                            // If jump is empty, the ";" is omitted.

Binary: 1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3

        comp: a c1 c2 c3 c4 c5 c6    dest: d1 d2 d3    jump: j1 j2 j3

Example: MD=D+1
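A C-instruction can be translated by splitting it into its three fields and looking each one up in a table. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the bit tables are taken from the Hack machine language specification (they appear as images on the slides, not in this text). The `a` bit is 1 when the computation reads M instead of A.

```python
# comp codes c1..c6 for the a=0 (A register) forms; M forms reuse them with a=1.
COMP = {
    "0": "101010", "1": "111111", "-1": "111010", "D": "001100",
    "A": "110000", "!D": "001101", "!A": "110001", "-D": "001111",
    "-A": "110011", "D+1": "011111", "A+1": "110111", "D-1": "001110",
    "A-1": "110010", "D+A": "000010", "D-A": "010011", "A-D": "000111",
    "D&A": "000000", "D|A": "010101",
}
DEST = {"": "000", "M": "001", "D": "010", "MD": "011",
        "A": "100", "AM": "101", "AD": "110", "AMD": "111"}
JUMP = {"": "000", "JGT": "001", "JEQ": "010", "JGE": "011",
        "JLT": "100", "JNE": "101", "JLE": "110", "JMP": "111"}


def parse_c_instruction(command):
    """Split 'dest=comp;jump' into (dest, comp, jump); missing fields become ''."""
    dest, _, rest = command.rpartition("=")  # no "=" means an empty dest
    comp, _, jump = rest.partition(";")      # no ";" means an empty jump
    return dest, comp, jump


def encode_c_instruction(command):
    """Translate a C-instruction into its 16-bit binary form."""
    dest, comp, jump = parse_c_instruction(command)
    a = "1" if "M" in comp else "0"
    return "111" + a + COMP[comp.replace("M", "A")] + DEST[dest] + JUMP[jump]


print(encode_c_instruction("MD=D+1"))  # 1110011111011000
print(encode_c_instruction("D;JGT"))   # 1110001100000001
```

Keeping the tables as plain dictionaries mirrors the idea of a separate Code module: the parser knows only the syntax, and the tables know only the bit patterns.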

Example with an empty dest field: D;JGT

The overall assembly logic

For each (real) command:

- Parse the command, i.e. break it into its underlying fields
- A-instruction: replace the symbolic reference (if any) with the corresponding memory address, which is a number (how to do it, later)
- C-instruction: for each field in the instruction, generate the corresponding binary code
- Assemble the translated binary codes into a complete 16-bit machine instruction
- Write the 16-bit instruction to the output file.

[The slide shows the example assembly program (Computes 1+...+RAM[0] and stores the sum in RAM[1]) next to these steps.]


Handling symbols (aka symbol resolution)

Assembly programs typically have many symbols:

- Labels that mark destinations of goto commands
- Labels that mark special memory locations
- Variables

These symbols fall into two categories:

- User-defined symbols (created by programmers)
  - Variables
  - Labels (forward reference could be a problem)
- Pre-defined symbols (used by the Hack platform).

Typical symbolic Hack assembly code:

    @R0
    D=M
    @END
    D;JLE
    @counter
    M=D
    @SCREEN
    D=A
    @x
    M=D
    (LOOP)
    @x
    A=M
    M=-1
    @x
    D=M
    @32
    D=D+A
    @x
    M=D
    @counter
    MD=M-1
    @LOOP
    D;JGT
    (END)
    @END
    0;JMP

Handling symbols: user-defined symbols

Label symbols: Used to label destinations of goto commands. Declared by the pseudo-command (XXX). This directive defines the symbol XXX to refer to the instruction memory location holding the next command in the program. (The assembler needs to maintain instrCtr.)

Variable symbols: Any user-defined symbol xxx appearing in an assembly program that is not defined elsewhere using the (xxx) directive is treated as a variable, and is automatically assigned a unique RAM address, starting at RAM address 16. (The assembler needs to maintain nextAddr. Why start at 16? Later.)

By convention, Hack programmers use lower-case and upper-case to represent variable and label names, respectively.

Handling symbols: pre-defined symbols

Virtual registers: The symbols R0,…, R15 are automatically predefined to refer to RAM addresses 0,…,15.

I/O pointers: The symbols SCREEN and KBD are automatically predefined to refer to RAM addresses 16384 and 24576, respectively (base addresses of the screen and keyboard memory maps).

VM control pointers: The symbols SP, LCL, ARG, THIS, and THAT (that don't appear in the typical Hack code example) are automatically predefined to refer to RAM addresses 0 to 4, respectively.

(The VM control pointers, which overlap R0,…, R4, will come into play in the virtual machine implementation, covered in the next lecture.)

Handling symbols: symbol table

Symbol table for the example source code (the program that computes 1+...+RAM[0] and stores the sum in RAM[1]):

    Symbol   Address
    R0       0
    R1       1
    R2       2
    ...      ...
    R15      15
    SCREEN   16384
    KBD      24576
    SP       0
    LCL      1
    ARG      2
    THIS     3
    THAT     4
    LOOP     4
    WRITE    18
    END      22
    i        16
    sum      17

This symbol table is generated by the assembler, and used to translate the symbolic code into binary code.
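The pre-defined part of the symbol table can be built once, before any source line is read. A minimal Python sketch, assuming a plain dictionary is an acceptable representation (the function name `new_symbol_table` is hypothetical):

```python
def new_symbol_table():
    """Create a symbol table holding only the pre-defined Hack symbols."""
    table = {"R" + str(n): n for n in range(16)}         # virtual registers R0..R15
    table.update({"SCREEN": 16384, "KBD": 24576})        # I/O pointers
    table.update({"SP": 0, "LCL": 1, "ARG": 2,
                  "THIS": 3, "THAT": 4})                 # VM control pointers
    return table


table = new_symbol_table()
print(len(table))                     # 23
print(table["R15"], table["SCREEN"])  # 15 16384
```

Note that SP, LCL, ARG, THIS, and THAT map to the same addresses as R0 through R4, so several names may legitimately share one address.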


Handling symbols: constructing the symbol table

Initialization: create an empty symbol table and populate it with all the pre-defined symbols:

    Symbol   Address
    R0       0
    R1       1
    R2       2
    ...      ...
    R15      15
    SCREEN   16384
    KBD      24576
    SP       0
    LCL      1
    ARG      2
    THIS     3
    THAT     4

First pass: go through the entire source code, and add all the user-defined label symbols to the symbol table (without generating any code). For the example source code, the first pass adds:

    LOOP     4
    WRITE    18
    END      22
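The first pass can be sketched as a single loop that counts real instructions and records each label at the current count. This is an illustrative sketch: the names `first_pass` and `instr_ctr` are hypothetical (the latter echoes the slides' instrCtr), and the input is assumed to be already stripped of white space and comments.

```python
# The example program, already stripped of comments and white space.
SUM_ASM = """@i M=1 @sum M=0 (LOOP) @i D=M @R0 D=D-M @WRITE D;JGT
@i D=M @sum M=D+M @i M=M+1 @LOOP 0;JMP
(WRITE) @sum D=M @R1 M=D (END) @END 0;JMP""".split()


def first_pass(commands, table):
    """Add every label declaration (XXX) to the table; generate no code."""
    instr_ctr = 0  # address of the next real instruction
    for command in commands:
        if command.startswith("(") and command.endswith(")"):
            table[command[1:-1]] = instr_ctr  # a label names the NEXT instruction
        else:
            instr_ctr += 1                    # only real instructions occupy memory
    return table


print(first_pass(SUM_ASM, {}))  # {'LOOP': 4, 'WRITE': 18, 'END': 22}
```

Because labels are collected before any translation happens, a forward reference such as `@WRITE` (used before `(WRITE)` is declared) causes no trouble in the later pass.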

Second pass: go again through the source code, and use the symbol table to translate all the commands. In the process, handle all the user-defined variable symbols. For the example source code, the second pass adds:

    i        16
    sum      17

Handling symbols: constructing the symbol table (one-pass solution?)


The assembly process (detailed)

- Initialization: create the symbol table and initialize it with the pre-defined symbols
- First pass: march through the source code without generating any code. For each label declaration (LABEL) that appears in the source code, add the pair (LABEL, n) to the symbol table, where n is the instruction memory address of the command that follows the declaration.
- Second pass: march again through the source, and process each line:
  - If the line is a C-instruction, simple
  - If the line is @xxx where xxx is a number, simple
  - If the line is @xxx and xxx is a symbol, look it up in the symbol table:
    - If the symbol is found, replace it with its numeric value and complete the command's translation
    - If the symbol is not found, then it must represent a new variable: add the pair (xxx, n) to the symbol table, where n is the next available RAM address, and complete the command's translation.
    - (Platform design decision: the allocated RAM addresses are running, starting at address 16.)
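The three stages can be put together in one short Python sketch that turns the symbolic example into its symbol-free form. It is a sketch under assumptions rather than the book's design: the function name `resolve_symbols` is hypothetical, the input is assumed to be stripped of white space and comments, and the final binary encoding of each instruction is left out.

```python
SUM_ASM = """@i M=1 @sum M=0 (LOOP) @i D=M @R0 D=D-M @WRITE D;JGT
@i D=M @sum M=D+M @i M=M+1 @LOOP 0;JMP
(WRITE) @sum D=M @R1 M=D (END) @END 0;JMP""".split()


def resolve_symbols(commands):
    """Return (symbol-free commands, symbol table) using the two-pass scheme."""
    # Initialization: the pre-defined symbols.
    table = {"R" + str(n): n for n in range(16)}
    table.update(SCREEN=16384, KBD=24576, SP=0, LCL=1, ARG=2, THIS=3, THAT=4)

    # First pass: record each label at the address of the next instruction.
    instr_ctr = 0
    for command in commands:
        if command.startswith("("):
            table[command[1:-1]] = instr_ctr
        else:
            instr_ctr += 1

    # Second pass: replace symbols; unknown symbols become new variables.
    next_addr = 16  # platform design decision: variables start at RAM address 16
    resolved = []
    for command in commands:
        if command.startswith("("):
            continue  # label declarations generate no code
        if command.startswith("@") and not command[1:].isdigit():
            symbol = command[1:]
            if symbol not in table:
                table[symbol] = next_addr
                next_addr += 1
            command = "@" + str(table[symbol])
        resolved.append(command)
    return resolved, table


resolved, table = resolve_symbols(SUM_ASM)
print(resolved[:4])             # ['@16', 'M=1', '@17', 'M=0']
print(table["i"], table["sum"]) # 16 17
```

Allocating variables only in the second pass matters: by then every label is already in the table, so a symbol that is still unknown can safely be treated as a variable.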

The result ...

    Source code (example)                     Target code
    // Computes 1+...+RAM[0]
    // And stores the sum in RAM[1]
    @i                                        0000000000010000
    M=1     // i = 1                          1110111111001000
    @sum                                      0000000000010001
    M=0     // sum = 0                        1110101010001000
    (LOOP)
    @i      // if i>RAM[0] goto WRITE         0000000000010000
    D=M                                       1111110000010000
    @R0                                       0000000000000000
    D=D-M                                     1111010011010000
    @WRITE                                    0000000000010010
    D;JGT                                     1110001100000001
    @i      // sum += i                       0000000000010000
    D=M                                       1111110000010000
    @sum                                      0000000000010001
    M=D+M                                     1111000010001000
    @i      // i++                            0000000000010000
    M=M+1                                     1111110111001000
    @LOOP   // goto LOOP                      0000000000000100
    0;JMP                                     1110101010000111
    (WRITE)
    @sum                                      0000000000010001
    D=M                                       1111110000010000
    @R1                                       0000000000000001
    M=D     // RAM[1] = the sum               1110001100001000
    (END)
    @END                                      0000000000010110
    0;JMP                                     1110101010000111

Note that comment lines and pseudo-commands (label declarations) generate no code.

Proposed assembler implementation

An assembler program can be written in any high-level language (and could be run on other platforms: cross-platform compiling).

The book proposes a language-independent design, as follows.

Software modules:

- Parser: Unpacks each command into its underlying fields
- Code: Translates each field into its corresponding binary value, and assembles the resulting values
- SymbolTable: Manages the symbol table
- Main: Initializes I/O files and drives the show.


Perspective

- Simple machine language, simple assembler
- Most assemblers are not stand-alone, but rather encapsulated in a translator of a higher order
- C programmers who understand the code generated by a C compiler can improve their code considerably
- C programming (e.g. for real-time systems) may involve re-writing critical segments in assembly, for optimization
- Writing an assembler is excellent practice for writing more challenging translators, e.g. a VM Translator and a compiler, as we will do in the next lectures.
