Where we are at:
Assembler Human Abstract design Software abstract interface Thought Chapters 9, 12 hierarchy H.L. Language Compiler & abstract interface Chapters 10 - 11 Operating Sys. Virtual VM Translator abstract interface Machine Chapters 7 - 8 Assembly Language
Assembler
Chapter 6
abstract interface Computer Building a Modern Computer From First Principles Machine Architecture abstract interface Language Chapters 4 - 5 www.nand2tetris.org Hardware Gate Logic abstract interface Platform Chapters 1 - 3 Electrical Chips & Engineering Hardware Physics hierarchy Logic Gates
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 1 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 2
Why care about assemblers? Assembly example Assemblers employ nifty programming tricks Target code (a text file of Source code (example) binary Hack code) // Computes 1+...+RAM[0] 0000000000010000 Assemblers are the first rung up the software hierarchy ladder // And stored the sum in RAM[1] 1110111111001000 @i 0000000000010001 M=1 // i = 1 1110101010001000 An assembler is a translator of a simple language @sum 0000000000010000 M=0 // sum = 0 assemble 1111110000010000 execute (LOOP) 0000000000000000 @i // if i>RAM[0] goto WRITE 1111010011010000 Writing an assembler = low-impact practice for writing compilers. D=M 0000000000010010 @R0 1110001100000001 D=D‐M 0000000000010000 @WRITE 1111110000010000 D;JGT 0000000000010001 ... // Etc. ...
The program translation challenge Extract the program’s semantics from the source program, using the syntax rules of the source language Re-express the program’s semantics in the target language, using the syntax rules of the target language cross-platform Assembler = simple translator compiling Translates each assembly command into one or more binary machine instructions Handles symbols (e.g. i, sum, LOOP, …).
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 3 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 4 Revisiting Hack low-level programming: an example The assembler’s view of an assembly program
Assembly program (sum.asm) CPU emulator screen shot Assembly program // Computes 1+...+RAM[0] after running this program // Computes 1+...+RAM[0] // And stores the sum in RAM[1]. // And stores the sum in RAM[1]. Assembly program = @i @i a stream of text lines, each being M=1 // i= 1 M=1 // i= 1 one of the following: @sum @sum M=0 // sum = 0 M=0 // sum = 0 user (LOOP) supplied (LOOP) @i // if i>RAM[0] goto WRITE input @i // if i>RAM[0] goto WRITE D=M D=M @R0 @R0 D=D‐M program D=D‐M generated @WRITE output @WRITE D;JGT D;JGT @i // sum += i @i // sum += i D=M D=M @sum @sum M=D+M M=D+M @i // i++ @i // i++ M=M+1 M=M+1 @LOOP // goto LOOP @LOOP // goto LOOP 0;JMP 0;JMP (WRITE) (WRITE) @sum @sum D=M D=M @R1 @R1 M=D // RAM[1] = the sum M=D // RAM[1] = the sum (END) The CPU emulator allows loading and executing (END) @END @END 0;JMP symbolic Hack code. It resolves all the symbolic 0;JMP symbols to memory locations, and executes the code.
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 5 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 6
The assembler’s view of an assembly program The assembler’s view of an assembly program
Assembly program Assembly program // Computes 1+...+RAM[0] // Computes 1+...+RAM[0] // And stores the sum in RAM[1]. Assembly program = // And stores the sum in RAM[1]. Assembly program = @i a stream of text lines, each being @i a stream of text lines, each being M=1 // i= 1 one of the following: M=1 // i= 1 one of the following: @sum @sum M=0 // sum = 0 M=0 // sum = 0 (LOOP) White space (LOOP) White space @i // if i>RAM[0] goto WRITE Empty lines/indentation @i // if i>RAM[0] goto WRITE Empty lines/indentation D=M D=M @R0 Line comments @R0 Line comments D=D‐M In-line comments D=D‐M In-line comments @WRITE @WRITE D;JGT D;JGT @i // sum += i @i // sum += i Instructions D=M D=M A-instruction @sum @sum M=D+M M=D+M C-instruction @i // i++ @i // i++ M=M+1 M=M+1 @LOOP // goto LOOP @LOOP // goto LOOP 0;JMP 0;JMP (WRITE) (WRITE) @sum @sum D=M D=M @R1 @R1 M=D // RAM[1] = the sum M=D // RAM[1] = the sum (END) (END) @END @END 0;JMP 0;JMP
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 7 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 8 The assembler’s view of an assembly program The assembler’s view of an assembly program
Assembly program Assembly program // Computes 1+...+RAM[0] // Computes 1+...+RAM[0] // And stores the sum in RAM[1]. Assembly program = // And stores the sum in RAM[1]. Assembly program = @i a stream of text lines, each being @16 a stream of text lines, each being M=1 // i= 1 one of the following: M=1 // i= 1 one of the following: @sum @17 M=0 // sum = 0 M=0 // sum = 0 (LOOP) White space White space @i // if i>RAM[0] goto WRITE Empty lines/indentation @16 // if i>RAM[0] goto WRITE Empty lines/indentation D=M D=M @R0 Line comments @0 Line comments D=D‐M In-line comments D=D‐M In-line comments @WRITE @18 D;JGT D;JGT @i // sum += i Instructions @16 // sum += i Instructions D=M A-instruction D=M A-instruction @sum @17 M=D+M C-instruction M=D+M C-instruction @i // i++ @16 // i++ M=M+1 Symbols M=M+1 Symbols @LOOP // goto LOOP references @4 // goto LOOP references 0;JMP 0;JMP (WRITE) Label declaration (XXX) Label declaration (XXX) @sum @17 D=M D=M Assume that there is @R1 @1 M=D // RAM[1] = the sum M=D // RAM[1] = the sum no symbol for now! (END) @END @22 0;JMP 0;JMP
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 9 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 10
White space White space → ignore/remove them
Assembly program Assembly program // Computes 1+...+RAM[0] @16 // And stores the sum in RAM[1]. Assembly program = M=1 Assembly program = @16 a stream of text lines, each being @17 a stream of text lines, each being M=1 // i= 1 one of the following: M=0 one of the following: @17 @16 M=0 // sum = 0 D=M White space @0 White space @16 // if i>RAM[0] goto WRITE Empty lines/indentation D=D‐M Empty lines/indentation D=M @18 @0 Line comments D;JGT Line comments D=D‐M In-line comments @16 In-line comments @18 D=M D;JGT @17 @16 // sum += i Instructions M=D+M Instructions D=M A-instruction @16 A-instruction @17 M=M+1 M=D+M C-instruction @4 C-instruction @16 // i++ 0;JMP M=M+1 Symbols @17 Symbols @4 // goto LOOP references D=M references 0;JMP @1 Label declaration (XXX) M=D Label declaration (XXX) @17 @22 D=M 0;JMP @1 M=D // RAM[1] = the sum
@22 0;JMP
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 11 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 12 Instructions → binary encoding Translating / assembling A-instructions
Assembly program @16 Symbolic: @value // Where value is either a non-negative decimal number M=1 Assembly program = // or a symbol referring to such number. @17 a stream of text lines, each being M=0 one of the following: @16 value (v = 0 or 1) D=M @0 White space D=D‐M Empty lines/indentation Binary: 0 v v v v v v v v v v v v v v v @18 D;JGT Line comments @16 In-line comments D=M @17 M=D+M Instructions @16 A-instruction Translation to binary: M=M+1 @4 C-instruction 0;JMP If value is a non-negative decimal number, simple, e.g. @16 @17 Symbols D=M references If value is a symbol, later. @1 M=D Label declaration (XXX) @22 0;JMP
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 13 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 14
Translating / assembling C-instructions Translating / assembling C-instructions
Symbolic: dest=comp;jump // Either the dest or jump fields may be empty. Symbolic: dest=comp;jump // Either the dest or jump fields may be empty. // If dest is empty, the "=" is ommitted; // If dest is empty, the "=" is ommitted; // If jump is empty, the ";" is omitted. // If jump is empty, the ";" is omitted.
comp dest jump Example: MD=D+1 comp dest jump
Binary: 1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3 Binary: 1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 15 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 16 Translating / assembling C-instructions The overall assembly logic
Symbolic: dest=comp;jump // Either the dest or jump fields may be empty. Assembly program // If dest is empty, the "=" is ommitted; // Computes 1+...+RAM[0] For each (real) command // If jump is empty, the ";" is omitted. // And stores the sum in RAM[1]. @i Parse the command, Example: D; JGT comp dest jump M=1 // i = 1 i.e. break it into its underlying fields @sum M=0 // sum = 0 A-instruction: replace the symbolic Binary: 1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3 (LOOP) reference (if any) with the @i // if i>RAM[0] goto WRITE D=M corresponding memory address, @0 which is a number D=D‐M @WRITE (how to do it, later) D;JGT @i // sum += i C-instruction: for each field in the D=M @sum instruction, generate the M=D+M corresponding binary code @i // i++ M=M+1 Assemble the translated binary codes @LOOP // goto LOOP 0;JMP into a complete 16-bit machine (WRITE) instruction @sum D=M Write the 16-bit instruction to the @1 output file. M=D // RAM[1] = the sum (END) @END 0;JMP
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 17 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 18
Typical symbolic Hack Typical symbolic Hack Handling symbols (aka symbol resolution) assembly code: Handling symbols: user-defined symbols assembly code: @R0 @R0 D=M Label symbols: Used to label destinations of goto D=M Assembly programs typically have many symbols: @END @END D;JLE commands. Declared by the pseudo-command D;JLE Labels that mark destinations of goto commands @counter (XXX). This directive defines the symbol XXX to @counter M=D refer to the instruction memory location holding M=D @SCREEN @SCREEN Labels that mark special memory locations D=A the next command in the program. (the assembler D=A @x needs to maintain instrCtr) @x Variables M=D M=D (LOOP) (LOOP) @x Variable symbols: Any user-defined symbol xxx @x These symbols fall into two categories: A=M appearing in an assembly program that is not A=M M=‐1 defined elsewhere using the (xxx) directive is M=‐1 @x @x User–defined symbols (created by programmers) D=M treated as a variable, and is automatically D=M @32 assigned a unique RAM address, starting at RAM @32 Variables D=D+A D=D+A @x address 16 (the assembler needs to maintain @x Labels (forward reference could be a problem) M=D nextAddr) M=D @counter @counter MD=M‐1 MD=M‐1 Pre-defined symbols (used by the Hack platform). @LOOP (why start at 16? Later.) @LOOP D;JGT D;JGT (END) By convention, Hack programmers use lower-case (END) @END and upper-case to represent variable and label @END 0;JMP names, respectively 0;JMP
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 19 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 20 Typical symbolic Hack Handling symbols: pre-defined symbols assembly code: Handling symbols: symbol table @R0 Virtual registers: D=M Source code (example) Symbol table @END // Computes 1+...+RAM[0] R0 0 The symbols R0,…, R15 are automatically D;JLE // And stored the sum in RAM[1] R1 1 predefined to refer to RAM addresses 0,…,15 @counter @i R2 2 M=D M=1 // i = 1 ...... @sum @SCREEN R15 15 I/O pointers: The symbols SCREEN and KBD are M=0 // sum = 0 SCREEN 16384 D=A automatically predefined to refer to RAM (LOOP) KBD 24576 @x @i // if i>RAM[0] goto WRITE SP 0 addresses 16384 and 24576, respectively M=D D=M LCL 1 @R0 (base addresses of the screen and keyboard (LOOP) ARG 2 @x D=D‐M THIS 3 @WRITE THAT 4 memory maps) A=M D;JGT LOOP 4 M=‐1 @i // sum += i WRITE 18 VM control pointers: the symbols SP, LCL, ARG, @x D=M D=M @sum END 22 THIS THAT i16 , and (that don’t appear in the code @32 M=D+M sum 17 example on the right) are automatically D=D+A @i // i++ M=M+1 @x predefined to refer to RAM addresses 0 to 4, @LOOP // goto LOOP M=D 0;JMP respectively @counter (WRITE) MD=M‐1 @sum This symbol table is generated by the (The VM control pointers, which overlap R0,…, R4 @LOOP D=M assembler, and used to translate the D;JGT @R1 will come to play in the virtual machine M=D // RAM[1] = the sum symbolic code into binary code. (END) (END) implementation, covered in the next lecture) @END @END 0;JMP 0;JMP
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 21 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 22
Handling symbols: constructing the symbol table Handling symbols: constructing the symbol table
Source code (example) Symbol table Source code (example) Symbol table // Computes 1+...+RAM[0] R0 0 // Computes 1+...+RAM[0] R0 0 // And stored the sum in RAM[1] R1 1 // And stored the sum in RAM[1] R1 1 @i R2 2 @i R2 2 M=1 // i = 1 ... M=1 // i = 1 ... @sum R15 15 @sum R15 15 M=0 // sum = 0 SCREEN 16384 M=0 // sum = 0 SCREEN 16384 (LOOP) KBD 24576 (LOOP) KBD 24576 @i // if i>RAM[0] goto WRITE SP 0 @i // if i>RAM[0] goto WRITE SP 0 D=M LCL 1 D=M LCL 1 @R0 ARG 2 @R0 ARG 2 D=D‐M THIS 3 D=D‐M THIS 3 @WRITE @WRITE THAT 4 THAT 4 D;JGT D;JGT LOOP 4 @i // sum += i @i // sum += i WRITE 18 D=M D=M @sum @sum END 22 M=D+M M=D+M @i // i++ @i // i++ M=M+1 Initialization: create an empty symbol table and M=M+1 Initialization: create an empty symbol table and @LOOP // goto LOOP populate it with all the pre-defined symbols @LOOP // goto LOOP populate it with all the pre-defined symbols 0;JMP 0;JMP (WRITE) (WRITE) First pass: go through the entire source code, @sum @sum and add all the user-defined label symbols to D=M D=M @R1 @R1 the symbol table (without generating any code) M=D // RAM[1] = the sum M=D // RAM[1] = the sum (END) (END) @END @END 0;JMP 0;JMP
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 23 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 24 Handling symbols: constructing the symbol table Handling symbols: constructing the symbol table (one-pass solution?)
Source code (example) Symbol table Source code (example) Symbol table // Computes 1+...+RAM[0] R0 0 // Computes 1+...+RAM[0] R0 0 // And stored the sum in RAM[1] R1 1 // And stored the sum in RAM[1] R1 1 @i R2 2 @i R2 2 M=1 // i = 1 ... M=1 // i = 1 ... @sum R15 15 @sum R15 15 M=0 // sum = 0 SCREEN 16384 M=0 // sum = 0 SCREEN 16384 (LOOP) KBD 24576 (LOOP) KBD 24576 @i // if i>RAM[0] goto WRITE SP 0 @i // if i>RAM[0] goto WRITE SP 0 D=M LCL 1 D=M LCL 1 @R0 ARG 2 @R0 ARG 2 D=D‐M THIS 3 D=D‐M THIS 3 @WRITE @WRITE THAT 4 THAT 4 D;JGT D;JGT LOOP 4 @i // sum += i @i // sum += i WRITE 18 D=M D=M @sum END 22 @sum M=D+M i16 M=D+M @i // i++ sum 17 @i // i++ M=M+1 Initialization: create an empty symbol table and M=M+1 @LOOP // goto LOOP populate it with all the pre-defined symbols @LOOP // goto LOOP 0;JMP 0;JMP (WRITE) First pass: go through the entire source code, (WRITE) @sum and add all the user-defined label symbols to @sum D=M D=M @R1 the symbol table (without generating any code) @R1 M=D // RAM[1] = the sum M=D // RAM[1] = the sum (END) Second pass: go again through the source code, (END) @END and use the symbol table to translate all the @END 0;JMP commands. In the process, handle all the user- 0;JMP defined variable symbols.
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 25 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 26
The assembly process (detailed) The assembly process (detailed)
Initialization: create the symbol table and initialize it with the pre- Second pass: march again through the source, and process each line: defined symbols If the line is a C-instruction, simple First pass: march through the source code without generating any If the line is @xxx where xxx is a number, simple code. (LABEL) For each label declaration that appears in the source code, If the line is @xxx and xxx is a symbol, look it up in the symbol add the pair
If the symbol is found, replace it with its numeric value and complete the command’s translation
If the symbol is not found, then it must represent a new variable: add the pair
(Platform design decision: the allocated RAM addresses are running, starting at address 16).
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 27 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 28 The result ... Proposed assembler implementation
Source code (example) Target code // Computes 1+...+RAM[0] 0000000000010000 An assembler program can be written in any high-level language. (and // And stored the sum in RAM[1] 1110111111001000 could be run in the other platforms, cross-platform compiling) @i 0000000000010001 M=1 // i = 1 1110101010001000 @sum 0000000000010000 The book proposes a language-independent design, as follows. M=0 // sum = 0 1111110000010000 (LOOP) 0000000000000000 @i // if i>RAM[0] goto WRITE 1111010011010000 Software modules: D=M 0000000000010010 @R0 1110001100000001 D=D‐M 0000000000010000 Parser: Unpacks each command into its underlying fields @WRITE assemble 1111110000010000 D;JGT 0000000000010001 @i // sum += i 1111000010001000 D=M 0000000000010000 Code: Translates each field into its corresponding binary value, @sum 1111110111001000 and assembles the resulting values M=D+M 0000000000000100 @i // i++ 1110101010000111 M=M+1 0000000000010001 SymbolTable: Manages the symbol table @LOOP // goto LOOP 1111110000010000 0;JMP 0000000000000001 (WRITE) 1110001100001000 @sum 0000000000010110 Main: Initializes I/O files and drives the show. D=M 1110101010000111 @R1 M=D // RAM[1] = the sum (END) @END Note that comment lines and pseudo-commands 0;JMP (label declarations) generate no code.
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 29 Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 30
Perspective
Simple machine language, simple assembler Most assemblers are not stand-alone, but rather encapsulated in a translator of a higher order C programmers that understand the code generated by a C compiler can improve their code considerably C programming (e.g. for real-time systems) may involve re-writing critical segments in assembly, for optimization Writing an assembler is an excellent practice for writing more challenging translators, e.g. a VM Translator and a compiler, as we will do in the next lectures.
Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org , Chapter 6: Assembler slide 31