The SPARC Assembly Language

SPARC
- Load/Store architectures
- Machine code, assembly, and high level languages
- The SPARC assembly language
  - Structure of an assembly program
  - SPARC register pool
  - Directives
  - Basic instructions
- Pipelining in the SPARC architecture
  - Branching instructions
  - Mapping of control flow instructions to assembly language

Prof. Gustavo Alonso Computer Science Department ETH Zürich [email protected] http://www.inf.ethz.ch/department/IS/iks/

©Gustavo Alonso, ETH Zürich. Programming in Assembly 2

Load/Store architectures

- Processors are just a complex collection of digital gates and registers processing electronic signals that can be interpreted in binary form: 1 or 0
- The instruction decoder takes an instruction and generates (how it does so does not matter) the necessary signals to perform operations across the different components:
  - load data (bring some data into a register)
  - store data (move data from a register to somewhere else)
  - operate on data (add, shift, compare, etc.)
- When the operations are done on a set of registers rather than directly in memory, the architecture is called a load/store architecture

Machine code

- Machine code is the binary representation of the instructions understood by the processor (by the instruction decoder in the processor)
- How the machine code is processed determines many characteristics of the processor:
  - CISC (complex instruction set computer): the machine code contains many different operations, including complex ones (e.g., matrix multiplication). The execution often involves executing micro-code
  - RISC (reduced instruction set computer): machine code contains only instructions that can be quickly executed. The number of instructions is small (reduced)

Assembly language

- Machine code is binary and, therefore, unsuitable for direct manipulation by humans
- To program at the machine code level, one uses an assembly language. The assembly language is simply a textual representation of machine code plus some syntactic rules that can be interpreted by the assembler. The assembler is the program that takes assembly code as input and produces machine code as output
- Assembly code is closely tied to the underlying processor architecture. Its basic instruction set is the machine code instruction set of the processor. Each assembler adds a dialect the programmer can use to build real programs in assembly language

Assembly code (input to the assembler):

            .data
    a:      .word 1
    b:      .word 2
            .text
    start:  set a, %r1
            ld [%r1], %r1
            set b, %r2
            ld [%r2], %r2
            add %r1, %r2, %r3
    end:    ta 0

Machine code (output of the assembler): binary data, shown on the slide as a block of unreadable bytes.

High level programming languages

- Advantages of assembly language:
  - very efficient
  - allows manipulating the hardware almost directly (necessary for writing drivers and low level components of the operating system)
- Disadvantages of assembly language:
  - machine dependent (the language works on that processor and nowhere else)
  - programs are cumbersome and cryptic
  - very repetitive programs
- Higher level languages try to solve these problems by providing better abstractions

The same program in C:

    #include <stdio.h>

    int main(void)
    {
        int a = 1;
        int b = 2;
        int c;
        c = a + b;
        printf("%d\n", c);
        return 0;
    }

and in assembly:

            .data
    a:      .word 1
    b:      .word 2
            .text
    start:  set a, %r1
            ld [%r1], %r1
            set b, %r2
            ld [%r2], %r2
            add %r1, %r2, %r3
    end:    ta 0

A toy assembly language

    LOADA mem  - Load register A from memory address mem
    LOADB mem  - Load register B from memory address mem
    CONB con   - Load a constant value into register B
    SAVEB mem  - Save register B to memory address mem
    SAVEC mem  - Save register C to memory address mem
    ADD        - Add register A and register B and store the result in register C
    SUB        - Subtract register B from register A and store the result in register C
    MUL        - Multiply register A and register B and store the result in register C
    DIV        - Divide register A by register B and store the result in register C
    COM        - Compare register A and register B and store the result in register test
    JUMP addr  - Jump to address addr
    JEQ addr   - Jump to address addr if the previous comparison was equal (register test is 0)
    JNEQ addr  - Jump to address addr if the previous comparison was not equal (register test is not 0)
    JG addr    - Jump to address addr if Greater than (result is in register test)
    JGE addr   - Jump to address addr if Greater than or equal (result is in register test)
    JL addr    - Jump to address addr if Less than (result is in register test)
    JLE addr   - Jump to address addr if Less than or equal (result is in register test)
    STOP       - Stop execution

Structure of an assembly program

- Assembly language programs are line-oriented (the assembler translates an assembly program one line at a time). The assembler recognizes four types of lines: empty lines, label definition lines, directive lines, and instruction lines.
  - A line that only has spaces or tabs (i.e., white space) is an empty line. Empty lines are ignored by the assembler.
  - A label definition line consists of a label definition. A label definition consists of an identifier followed by a colon (“:”). As in most programming languages, an identifier must start with a letter (or an underscore) and may be followed by any number of letters, underscores, and digits.
  - A directive line consists of an optional label definition, followed by the name of an assembler directive, followed by the arguments for the directive.
  - An instruction line consists of an optional label definition, followed by the name of an operation, followed by the operands.
- Comments within a line begin with the character “!”. C-style comments (spanning several lines) are allowed using /* … */
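The toy instruction set is small enough to simulate, which makes the roles of the registers A, B, C and test concrete. The following C sketch is an illustration only and is not part of the lecture: the names `toy_op`, `toy_insn` and `toy_run`, the instruction encoding, and the choice that COM stores -1, 0 or 1 in the test register are all assumptions. Memory addresses are assumed to be valid indices into `mem`.

```c
#include <stddef.h>

/* Hypothetical encoding of the toy instructions (illustrative names). */
enum toy_op { LOADA, LOADB, CONB, SAVEB, SAVEC, ADD, SUB, MUL, DIV,
              COM, JUMP, JEQ, JNEQ, JG, JGE, JL, JLE, STOP };

struct toy_insn { enum toy_op op; int arg; };  /* arg: mem, con or addr */

/* Runs prog until STOP, the end of the program, or max_steps.
 * Returns the number of instructions executed, or -1 on division by zero.
 * Assumption: COM leaves -1, 0 or 1 in the test register (A <, ==, > B). */
int toy_run(const struct toy_insn *prog, size_t len, int mem[], int max_steps)
{
    int a = 0, b = 0, c = 0, test = 0, steps = 0;
    size_t pc = 0;

    while (pc < len && steps < max_steps) {
        struct toy_insn in = prog[pc++];
        steps++;
        switch (in.op) {
        case LOADA: a = mem[in.arg]; break;
        case LOADB: b = mem[in.arg]; break;
        case CONB:  b = in.arg; break;
        case SAVEB: mem[in.arg] = b; break;
        case SAVEC: mem[in.arg] = c; break;
        case ADD:   c = a + b; break;
        case SUB:   c = a - b; break;
        case MUL:   c = a * b; break;
        case DIV:   if (b == 0) return -1; c = a / b; break;
        case COM:   test = (a > b) - (a < b); break;
        case JUMP:  pc = (size_t)in.arg; break;
        case JEQ:   if (test == 0) pc = (size_t)in.arg; break;
        case JNEQ:  if (test != 0) pc = (size_t)in.arg; break;
        case JG:    if (test > 0)  pc = (size_t)in.arg; break;
        case JGE:   if (test >= 0) pc = (size_t)in.arg; break;
        case JL:    if (test < 0)  pc = (size_t)in.arg; break;
        case JLE:   if (test <= 0) pc = (size_t)in.arg; break;
        case STOP:  return steps;
        }
    }
    return steps;
}
```

With this encoding, the sequence LOADA 0, LOADB 1, ADD, SAVEC 2, STOP adds the words at addresses 0 and 1 and stores the sum at address 2, the toy counterpart of the c = a + b program shown earlier.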

Segments and statements

- An assembly program is organized in three segments:
  - data segment: constants and data necessary for the program
  - text segment: the instructions of the program
  - BSS segment: (Block Storage Segment or Block Started by Symbol) space for dynamic data and non-initialized global variables
- A statement:

        label: instruction

        label:
               instruction

- A label is a symbol or a single digit. An instruction is a pseudo-op (assembler directive), a synthetic instruction, or an instruction.
- Internally, machine code is binary and it is processed in binary form
- In assembly, one can work with different number systems:
  - hexadecimal (0x…)
  - octal (0…)
  - decimal
- Later on we will discuss these different systems. However, keep in mind that we will use hexadecimal and octal more often than decimal
- Also keep in mind that load/store architectures are register based. Most of the instructions involve manipulating one or more registers

Assembly program (example 1)

            .data                       ! variables
    a:      .word 0x42                  ! a initialized to 0x42
    b:      .word 0x43                  ! b initialized to 0x43
    c:      .word 0x44                  ! c initialized to 0x44
    d:      .word 0x45                  ! d initialized to 0x45
            .text                       ! Instructions a = (a+b) - (c-d)
    start:  set a, %r1
            ld [%r1], %r2               ! a --> %r2
            set b, %r1
            ld [%r1], %r3               ! b --> %r3
            set c, %r1
            ld [%r1], %r4               ! c --> %r4
            set d, %r1
            ld [%r1], %r5               ! d --> %r5
            add %r2, %r3, %r2           ! a+b --> %r2
            sub %r4, %r5, %r3           ! c-d --> %r3
            sub %r2, %r3, %r2           ! (a+b)-(c-d) --> %r2
            set a, %r1
            st %r2, [%r1]               ! (a+b)-(c-d) --> a
    end:    ta 0


SPARC register pool

- The assembly language we will learn is the assembly language of the SPARC V8 architecture (the current version is V9)
- This is a RISC architecture with 32 integer registers. Each integer register holds 32 bits. The integer registers are called %r0 through %r31. In addition to the names %r0 through %r31, the integer registers have alternative names:
  - global registers (%g0-%g7) correspond to registers %r0-%r7
  - output registers (%o0-%o7) correspond to registers %r8-%r15
  - local registers (%l0-%l7) correspond to registers %r16-%r23
  - input registers (%i0-%i7) correspond to registers %r24-%r31
- Global registers are used for global variables
  - %r0 is a special register that always holds the value 0 and cannot be modified
- Output registers are used for local data and arguments to/from subroutines
  - %r14 (%sp, %o6) is the stack pointer
  - %r15 is the return address of the called subroutine
- Local registers are for general use (local variables)
- Input registers are used for argument passing from subroutines
  - %r30 (%fp, %i6) is the frame pointer
  - %r31 is the subroutine return address

Directives

- The different sections of the program are marked with the following directives (also called pseudo-operations):
  - .text for the program code,
  - .data for the global writeable initialized data,
  - .bss for the global uninitialized data
- .ascii string1, ..., stringn
  Represents a sequence of bytes, initialized with the ASCII encoding of the strings, in sequence, without string terminators (.asciz adds \0)
- .global label
  Makes a label global
- .byte value1, ..., valuen
  Represents a sequence of bytes, initialized with the given data, in sequence. The values must fit within 8 bits each
- .halfword value1, ..., valuen
  Represents a sequence of halfwords, initialized with the given data, in sequence. The values must fit within 16 bits each
- .word value1, ..., valuen
  Represents a sequence of words, initialized with the given data, in sequence. The values must fit within 32 bits each
- .include “file_name”
  Used to add additional definitions from other files

Basic instructions (1)

Program begin
- The beginning of a program is indicated with the .text directive. The accepted format is

            .text
    start:  first instruction
            second instruction
            …

Program termination
- You should terminate the execution of a program by executing the instruction ta. This is a trap instruction that calls the operating system with a request encoded in register %g1

    end:    ta 0

set
- The set operation loads a constant into a register

            set 0x42, %r2
            set x, %r3

clear
- The clr (clear) operation sets a particular location in memory to 0 (this operation is synthetic, works with registers and labels denoting memory locations)

            clr [%r3]
            clr a       /* a declared in the .data segment */

Basic instructions (2)

load
- The load instruction brings a word from memory into a register

            ld [%r2], %r3

store
- The store instruction copies the value in a register (a word) into a location in memory

            st %r3, [%r2]

- In SPARC assembly language instructions, the destination is always specified as the last operand.
- load and store operate with different addressing modes, not just registers
- [%r2] means: interpret the contents of %r2 as a memory address

move
- The mov (move) instruction copies the contents of a register or a small integer into another register (this instruction is synthetic)

            mov %r1, %r2
            mov 1, %r2

add, sub
- The add and sub operators take three arguments: two operands, and a destination for the result. The operands can be either two registers or a register and a signed small constant (must fit in 13 bits)

            add %r3, %r4, %r5   ! %r5 = %r3 + %r4
            sub %r3, 1, %r3     ! %r3 = %r3 - 1
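The alias scheme and the 13-bit limit on constants are both simple arithmetic, which the following C sketch spells out. It is an illustration under stated assumptions, not assembler code: the function names `sparc_reg_number` and `fits_simm13` are hypothetical, and only the names listed on the slides (%r0-%r31, %g, %o, %l, %i, %sp, %fp) are recognised.

```c
#include <stdlib.h>
#include <string.h>

/* Maps a register name such as "%o6", "%sp" or "%r14" to its number 0..31.
 * Returns -1 for names this sketch does not recognise. */
int sparc_reg_number(const char *name)
{
    static const char banks[] = "goli";  /* global, output, local, input */
    const char *bank;

    if (name == NULL || name[0] != '%' || name[1] == '\0')
        return -1;
    if (strcmp(name, "%sp") == 0) return 14;  /* stack pointer = %o6 */
    if (strcmp(name, "%fp") == 0) return 30;  /* frame pointer = %i6 */

    if (name[1] == 'r') {                     /* %r0 .. %r31 */
        char *end;
        long n;
        if (name[2] < '0' || name[2] > '9')
            return -1;
        n = strtol(name + 2, &end, 10);
        return (*end == '\0' && n <= 31) ? (int)n : -1;
    }

    bank = strchr(banks, name[1]);            /* %g, %o, %l or %i */
    if (bank == NULL || name[2] < '0' || name[2] > '7' || name[3] != '\0')
        return -1;
    return (int)(bank - banks) * 8 + (name[2] - '0');
}

/* A signed 13-bit constant covers the range -4096 .. 4095. */
int fits_simm13(long value)
{
    return value >= -4096 && value <= 4095;
}
```

For example, `sparc_reg_number("%l0")` yields 16, and a constant such as 5000 does not pass `fits_simm13`, so it would first have to be loaded into a register (for instance with set) before being used in add or sub.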

Basic instructions (3)

signed and unsigned multiplication
- The integer multiplication operations multiply two 32-bit source values and produce a 64-bit result. The most significant 32 bits of the result are stored in the Y register (%y) and the remaining 32 bits are stored in one of the integer registers. The second operand can be a small integer

            smul %r1, %r2, %r3
            umul %r1, 10, %r3

signed and unsigned division
- The integer division operations divide a 64-bit value by a 32-bit value and produce a 32-bit result. The Y register provides the most significant 32 bits of the 64-bit dividend. One of the source values provides the least significant 32 bits, while the other provides the 32-bit divisor

            sdiv %r1, %r2, %r3  ! %r3 = {%y,%r1}/%r2
            udiv %r1, 10, %r3   ! %r3 = {%y,%r1}/10

null operation
- The nop operation skips a cycle without doing anything

            nop

Example 2

            .data
    a:      .word 0x40
    b:      .word 0x0A
    c:      .word 0x04
            .text                       ! a = (a*b)/c
    start:  set a, %r1
            ld [%r1], %r2
            set b, %r1
            ld [%r1], %r3
            set c, %r1
            ld [%r1], %r4
            smul %r2, %r3, %r2          ! a * b --> %y, %r2
            sdiv %r2, %r4, %r2          ! %y, %r2 / c --> %r2
            set a, %r1
            st %r2, [%r1]               ! %r2 --> a
    end:    ta 0
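The way the 64-bit product and dividend are split between %y and an integer register can be modelled with ordinary 64-bit arithmetic. The C sketch below is a simplified model, not a description of the hardware: the names `y_pair`, `model_smul` and `model_sdiv` are hypothetical, the divisor is assumed to be non-zero, the quotient is assumed to fit in 32 bits, and two's complement conversions are assumed. Traps and overflow handling are not modelled.

```c
#include <stdint.h>

/* The two halves of a 64-bit product: upper half in %y, lower half in rd. */
struct y_pair { uint32_t y; uint32_t rd; };

/* Model of smul: signed 32 x 32 -> 64-bit multiplication. */
struct y_pair model_smul(int32_t rs1, int32_t rs2)
{
    uint64_t product = (uint64_t)((int64_t)rs1 * (int64_t)rs2);
    struct y_pair out;
    out.y  = (uint32_t)(product >> 32);          /* most significant 32 bits */
    out.rd = (uint32_t)(product & 0xFFFFFFFFu);  /* least significant 32 bits */
    return out;
}

/* Model of sdiv: the 64-bit dividend is {y, rs1}, the divisor is rs2.
 * Assumes rs2 != 0 and a quotient that fits in 32 bits. */
int32_t model_sdiv(uint32_t y, uint32_t rs1, int32_t rs2)
{
    int64_t dividend = (int64_t)(((uint64_t)y << 32) | rs1);
    return (int32_t)(dividend / rs2);
}
```

With the values of Example 2, `model_smul(0x40, 0x0A)` gives y = 0 and rd = 640, and `model_sdiv(0, 640, 4)` gives 160 (0xA0). The model also shows why the division in Example 2 works directly after the multiplication: smul leaves the upper half of the product in %y, which is exactly what sdiv expects as the upper half of its dividend.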

Example 3

            .data
    a2:     .word 1
    a1:     .word 5
    a0:     .word 7
    x:      .word 9
    y:      .word 0
            .text                       /* y = (x-a2)*(x-a1)/(x-a0) */
    start:  set x, %r1
            ld [%r1], %r1
            set a2, %r2
            ld [%r2], %r2
            set a1, %r3
            ld [%r3], %r3
            set a0, %r4
            ld [%r4], %r4
            set y, %r7
            sub %r1, %r2, %r5           ! %r5 = (x-a2)
            sub %r1, %r3, %r6           ! %r6 = (x-a1)
            smul %r6, %r5, %r5          ! %r5 = (x-a2)*(x-a1)
            sub %r1, %r4, %r6           ! %r6 = (x-a0)
            sdiv %r5, %r6, %r5          ! %r5 = %r5 / %r6
            st %r5, [%r7]               ! y = %r5
    end:    ta 0

Instruction pipelining

- To speed up the processing of instructions, most modern architectures use pipelining
- In a non-pipelined architecture, there is one single instruction executed at any given time
- In a pipelined architecture, an instruction is broken up in different parts and executed separately. At any given point in time there are several instructions being executed
- In the SPARC architecture:
  - pipeline of depth 5
  - 2 instructions concurrently being executed are visible:
    - %pc
    - %npc

[Figure: the five stages F (Fetch), D (Decode), O (Operand fetch), E (Execute), S (Store); non-pipelined execution of an instruction compared with pipelined execution of six instructions; horizontal axis: time in machine cycles]
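The gain from pipelining can be estimated with a short calculation. The C sketch below assumes an ideal pipeline (one stage per machine cycle, no stalls), which is a simplification; the function names are hypothetical. Without pipelining, each instruction occupies the processor for all of its stages; with pipelining, the first instruction needs `depth` cycles and every further instruction completes one cycle later.

```c
/* Cycles needed to run n instructions of `depth` stages each, one at a time. */
unsigned long cycles_non_pipelined(unsigned long n, unsigned long depth)
{
    return n * depth;
}

/* Cycles needed in an ideal pipeline: fill the pipeline once, then one
 * instruction completes per cycle. Assumes no stalls. */
unsigned long cycles_pipelined(unsigned long n, unsigned long depth)
{
    return n == 0 ? 0 : depth + (n - 1);
}
```

For six instructions and the five stages F, D, O, E, S this gives 30 cycles without pipelining and 10 cycles with it.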


Problems with pipelining

- Pipelining is a great idea but not as easy as it looks due to several problems (hazards)
- Hazard = when an instruction’s stage in the pipeline is unable to execute during the current cycle. Hazards occur in several situations:
  - (data) data dependencies: the data needed is not ready
  - (structural) shared resources: the functional unit needed is currently being used
  - (control) branches: we don’t know what instruction will be executed next
- Hazards result in stalls, i.e., delays in the pipeline

[Figure: pipelined execution of an instruction with stages F, D, O, E, S; a stall marks an instruction that is unable to continue; horizontal axis: time in machine cycles]

Solving hazards

- Structural hazards are solved by replicating functional units so that there are always enough of them to perform all steps of the pipeline
  - modern SPARC systems have 4 integer ALUs and 2 floating point ALUs (and a pipeline depth of 14)
- Data hazards are typically solved by
  - forwarding: a hardware technique whereby a pipeline stage can access the results of another pipeline stage (rather than waiting for the instruction to complete)
  - code reordering: change the order of instructions to avoid data dependencies (this technique can be applied by the programmer or the compiler)
- Importance of code reordering:
  - Accessing memory is much slower than accessing registers. In the following sequence the add needs %r3 before the load has delivered it, so the pipeline stalls:

            ld [%r2], %r3
            add %r3, %r4, %r4           ! stall
            ld [%r1], %r5

  - Moving the second load between the two dependent instructions avoids the stall:

            ld [%r2], %r3
            ld [%r1], %r5
            add %r3, %r4, %r4
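The effect of code reordering can be checked mechanically with a very simplified hazard model. The C sketch below assumes that the only data hazard is a load followed immediately by an instruction that reads the loaded register, and that each such pair costs one stall; real pipelines are more involved. The `insn` structure and the function name are hypothetical.

```c
#include <stddef.h>

/* Minimal description of an instruction: register numbers only.
 * src2 is -1 when the instruction has a single source register. */
struct insn { int is_load; int dst; int src1; int src2; };

/* Counts the load-use stalls under the simplified model described above. */
int count_load_use_stalls(const struct insn *prog, size_t n)
{
    int stalls = 0;
    for (size_t i = 1; i < n; i++) {
        const struct insn *prev = &prog[i - 1];
        if (prev->is_load &&
            (prog[i].src1 == prev->dst || prog[i].src2 == prev->dst))
            stalls++;
    }
    return stalls;
}
```

Applied to the two sequences of the slide (ld, add, ld versus ld, ld, add), this model reports one stall for the original order and none for the reordered one.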

Branching hazards

- Branching creates its own set of problems with the pipeline:
  - when we reach the branching instruction, we don’t know the result, i.e., we don’t know what instruction should be executed next
  - the pipeline is automatically filled with the next instruction, which will be executed anyway (%pc, %npc) …
  - … but if we branch, the instruction executed is invalid (it should not have been executed since the flow of control went somewhere else)
- The easiest solution is to include a nop after a branch instruction
- The branch delay slot can be used to execute instructions (more efficiency) but makes it difficult to understand the code

Branching in assembly language

- The SPARC architecture has a condition code (cc) register that can be used to test certain characteristics of the result of an operation. This special register has 4 bits:
  - Z (Zero): set to 1 if the result was 0
  - N (Negative): set to 1 if the result was negative
  - C (Carry): carry bit of the MSB of the result
  - V (oVerflow): indicates whether the result was too big to fit in one register
- Not all operations set these bits (special operations are needed):

            addcc, subcc
            smulcc, sdivcc

- The target of a branch is a label. The branch is taken only if the condition is true; otherwise execution continues after the branch instruction
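How the four bits are derived from a subtraction, and how signed branches read them, can be sketched in C. This is an illustrative model: the names `icc`, `model_subcc`, `cond_bg`, `cond_ble` and `cond_bl` are hypothetical, and the branch conditions follow the usual SPARC V8 definitions for signed comparisons (bl tests N xor V, ble tests Z or (N xor V), bg is the negation of ble).

```c
#include <stddef.h>
#include <stdint.h>

/* The four integer condition code bits. */
struct icc { int n, z, v, c; };

/* Model of "subcc a, b, result": computes a - b and the condition codes.
 * result may be NULL, which mirrors writing the result to %g0. */
struct icc model_subcc(uint32_t a, uint32_t b, uint32_t *result)
{
    uint32_t r = a - b;
    struct icc cc;
    cc.n = (int)((r >> 31) & 1u);                    /* sign bit of the result */
    cc.z = (r == 0);                                 /* result is zero */
    cc.v = (int)((((a ^ b) & (a ^ r)) >> 31) & 1u);  /* signed overflow */
    cc.c = (a < b);                                  /* borrow */
    if (result != NULL)
        *result = r;
    return cc;
}

/* Signed branch conditions evaluated on the condition codes. */
int cond_bl(struct icc cc)  { return cc.n ^ cc.v; }
int cond_ble(struct icc cc) { return cc.z || (cc.n ^ cc.v); }
int cond_bg(struct icc cc)  { return !cond_ble(cc); }
```

For instance, `model_subcc(3, 17, NULL)` sets N and C, so `cond_ble` holds and `cond_bg` does not. The V bit matters when the subtraction overflows: comparing the most negative 32-bit value with 1 gives a positive-looking result, and N xor V still reports "less than".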

While loop

- The while loop is a basic construct in computer programs. It is important to implement it as efficiently as possible in assembly (the compiler does this implementation for you)
- while loop in C:

        int temp;
        int x = 0;
        int y = 0x9;
        int z = 0x42;

        temp = y;
        while (temp > 0) {
            x = x + z;
            temp = temp - 1;
        }

            .data                       /* While loop in assembly (version 1) */
    x:      .word 0
    y:      .word 0x9
    z:      .word 0x42
            .text
    start:  set y, %r1
            ld [%r1], %r2               ! %r2 = temp
            set z, %r1
            ld [%r1], %r3               ! %r3 = z
            mov %r0, %r4                ! %r4 = x = 0
            add %r2, 1, %r2             ! set up for decrement
            ba test                     ! test the loop condition
            nop                         ! BRANCH DELAY SLOT
    top:    add %r4, %r3, %r4           ! x + z --> x
    test:   subcc %r2, 1, %r2           ! temp - 1 --> temp
            bg top                      ! temp > 0 ?
            nop                         ! BRANCH DELAY SLOT
            set x, %r1
            st %r4, [%r1]               ! store x
    end:    ta 0

While loop (optimization)

- Why does the code look so complicated?
- The problem is that the instruction after the branch is always executed. Thus, this would not work:

            add %r2, 1, %r2             ! set up for decrement
    top:    subcc %r2, 1, %r2           ! temp - 1 --> temp
            bg top                      ! test the loop condition
            add %r4, %r3, %r4           ! x + z --> x

- The addition takes place in the branch delay slot (good!!) but it happens one too many times (bad!!)
- To allow using the branch delay slot and yet avoid this problem, one can nullify (annul) the branch delay slot: if the branch is not taken, the instruction in the branch delay slot is “undone”

            .data                       /* While loop in assembly (version 2) */
    x:      .word 0
    y:      .word 0x9
    z:      .word 0x42
            .text
    start:  set y, %r1
            ld [%r1], %r2               ! %r2 = temp
            set z, %r1
            ld [%r1], %r3               ! %r3 = z
            mov %r0, %r4                ! %r4 = x = 0
            add %r2, 1, %r2             ! set up for decrement
    top:    subcc %r2, 1, %r2           ! temp - 1 --> temp
            bg,a top                    ! temp > 0 ?
            add %r4, %r3, %r4           ! x + z --> x
            set x, %r1
            st %r4, [%r1]               ! store x
    end:    ta 0
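The "one too many times" problem and its fix by annulling can be reproduced with a small C model of the loop. This is a sketch of the control flow only, with a hypothetical function name; the `annul` flag stands for the ",a" suffix of the branch, and the model assumes the delay slot instruction is executed whenever the branch is taken, and also when it is not taken unless the branch is annulled.

```c
/* Models the loop
 *         add   temp, 1, temp
 *   top:  subcc temp, 1, temp
 *         bg    top          (bg,a top when annul is non-zero)
 *         add   x, z, x      (branch delay slot)
 * and returns the final value of x. */
int run_delay_slot_loop(int y, int z, int annul)
{
    int temp = y + 1;                 /* set up for decrement */
    int x = 0;

    for (;;) {
        int taken;
        temp = temp - 1;              /* subcc */
        taken = (temp > 0);           /* bg evaluates the condition codes */
        if (taken || !annul)
            x = x + z;                /* delay slot instruction */
        if (!taken)
            break;                    /* fall through to the code after the loop */
    }
    return x;
}
```

With y = 0x9 and z = 0x42, the annulled version returns 594 (nine additions, as in the C while loop), while the version without annulling returns 660 (ten additions).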

More on while loops

- When the condition is more complex (not simply a comparison with 0), we need an extra instruction:

            cmp register1, register2
            cmp register2, const

            /* cmp is synthetic */
            subcc register1, register2, %g0
            subcc register2, const, %g0

            /* compare is implemented by subtracting the values with subcc and
               then checking the sign. Note that %g0 is register %r0, it
               cannot be modified */

        int main(void) {
            int a = 0;
            int b = 3;
            while (a <= 17) {
                a = a + b;
            }
        }

While loops with cmp

            .data                       ! First attempt
    a:      .word 0
    b:      .word 3
            .global _main
            .text
    _main:  set a, %r1
            ld [%r1], %r2               ! %r2 = a
            set b, %r1
            ld [%r1], %r3               ! %r3 = b
    loop:   cmp %r2, 17                 ! a > 17 ?
            bg store
            nop                         ! 1. delay slot
            add %r2, %r3, %r2
            ba loop
            nop                         ! 2. delay slot
    store:  set a, %r1
            st %r2, [%r1]
    end:    ta 0

            .data                       ! Optimized version
    a:      .word 0
    b:      .word 3
            .global _main
            .text
    _main:  set a, %r1
            ld [%r1], %r2
            set b, %r1
            ld [%r1], %r3
    loop:   cmp %r2, 17
            ble,a loop
            add %r2, %r3, %r2
    store:  set a, %r1
            st %r2, [%r1]
    end:    ta 0

do while loops

- The difference between while loops and do-while loops is that, in the latter, the loop body is executed at least once

        int main(void) {
            int a = 0;
            int b = 3;
            do {
                a = a + b;
            } while (a <= 17);
        }

            .data
    a:      .word 0
    b:      .word 3
            .global _main
            .text
    _main:  set a, %r1
            ld [%r1], %r2
            set b, %r1
            ld [%r1], %r3
            add %r2, %r3, %r2
    loop:   cmp %r2, 17
            ble,a loop
            add %r2, %r3, %r2
    store:  set a, %r1
            st %r2, [%r1]
    end:    ta 0

for loops

        int main(void) {
            int a = 0;
            int b = 3;
            int i;
            for (i = 1; i <= 15; i++) {
                a = a + b;
            }
        }

        int main(void) {    /* equivalent program */
            int a = 0;
            int b = 3;
            int i = 1;
            while (i <= 15) {
                a = a + b;
                i++;
            }
        }

            .data
    a:      .word 0
    b:      .word 3
    i:      .word 1
    con:    .word 15
            .global _main
            .text
    _main:  set a, %r1
            ld [%r1], %r2               ! %r2 = a
            set b, %r1
            ld [%r1], %r3               ! %r3 = b
            set i, %r1
            ld [%r1], %r4               ! %r4 = i
            set con, %r1
            ld [%r1], %r5               ! %r5 = 15
            ba test                     ! branch always
    loop:   cmp %r4, %r5                ! delay slot of ba; also the loop entry
            add %r2, %r3, %r2           ! a = a + b
    test:   ble,a loop
            inc %r4                     ! delay slot: i++ (annulled if not taken)
    store:  set a, %r1
            st %r2, [%r1]
    end:    ta 0

If-then-else

        if ((a + b) >= c) {
            a += b;
            c++;
        } else {
            a -= b;
            c--;
        }
        c += 10;

            ! a -> %r2, b -> %r3, c -> %r4
            add %r2, %r3, %r6
            cmp %r6, %r4                ! if
            bl,a else
            sub %r2, %r3, %r2           ! 1. instruction of else
            add %r2, %r3, %r2           ! 1. instruction of then
            inc %r4
            ba store
            add %r4, 10, %r4
    else:   dec %r4
            add %r4, 10, %r4
    store:  set a, %r1
            st %r2, [%r1]
    end:    ta 0

switch

- A switch statement involves a complex set of branching instructions
- It is implemented using a table. In the table, we put the different instructions for each case
- Since the cases must be integers (now you know why), we use the indexing variable to calculate where into the table we must branch to continue execution
- The important points to remember are:
  - default if no matching case is found
  - what to do in case of break

        switch (i) {
            case 1:  i += 1;
                     break;
            case 2:  i += 2;
                     break;
            case 15: i += 15;
            case 3:  i += 3;
                     break;
            case 4:  i += 4;
            case 6:  i += 6;
                     break;
            case 5:  i += 5;
                     break;
            default: i--;
        }

switch table

            .data
            .align 4
    table:  .word L1,L2,L3,L4,L5,L6,L7,L8,L9
            .word L10,L11,L12,L13,L14,L15
            .text
            .align 4
    start:  set i, %r1
            ld [%r1], %l0
            ld [%r1], %o0               ! %o0 = i
            cmp %o0, 1
            blu default                 ! expression too small
            cmp %o0, 15
            bgu default                 ! too large
            set table, %o1              ! jump table
            sub %o0, 1, %o0             ! case 1 is entry 0 of the table
            sll %o0, 2, %o0             ! %o0 x 4 (words)
            add %o1, %o0, %o0           ! %o0 points to case in table
            ld [%o0], %o0               ! %o0 = address of the case label
            jmpl %o0, %g0               ! transfer control
            nop

    L1:     ba end                      ! first case in jump table
            add %l0, 1, %l0             ! i++
    L2:     ba end
            add %l0, 2, %l0             ! i += 2
    L15:    add %l0, 15, %l0            ! i += 15; no break
    L3:     ba end
            add %l0, 3, %l0             ! i += 3
    L4:     add %l0, 4, %l0             ! i += 4; no break
    L6:     ba end
            add %l0, 6, %l0             ! i += 6
    L5:     ba end
            add %l0, 5, %l0             ! i += 5
    L7:     nop
    L8:
            …                           ! All other labels (lack of space in foil)
    L13:
    L14:
    default: sub %l0, 1, %l0            ! i--
    end:    ta 0
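The same dispatch scheme (range check, table lookup, then straight-line code with or without break) can be modelled in C without using a switch statement. The sketch below is an illustration with hypothetical names (`step`, `steps`, `table`, `switch_via_table`): each entry of `steps` stands for one labelled code fragment in the order used in the assembly listing, and `table` plays the role of the jump table, mapping case values 1 to 15 to the position of their fragment.

```c
/* One code fragment: what it adds to i and whether it ends with a break. */
struct step { int delta; int stop; };

/* Fragments in the same order as the labels in the assembly listing. */
static const struct step steps[] = {
    {  1, 1 },   /* 0: L1,  case 1                          */
    {  2, 1 },   /* 1: L2,  case 2                          */
    { 15, 0 },   /* 2: L15, case 15, falls through to L3    */
    {  3, 1 },   /* 3: L3,  case 3                          */
    {  4, 0 },   /* 4: L4,  case 4, falls through to L6     */
    {  6, 1 },   /* 5: L6,  case 6                          */
    {  5, 1 },   /* 6: L5,  case 5                          */
    { -1, 1 },   /* 7: default                              */
};

enum { DEFAULT_STEP = 7 };

/* Jump table: entry k holds the fragment for case value k + 1.
 * Cases 7 to 14 have no code of their own and lead to default. */
static const int table[15] = { 0, 1, 3, 4, 6, 5, 7, 7, 7, 7, 7, 7, 7, 7, 2 };

int switch_via_table(int i)
{
    /* One unsigned comparison covers both "too small" and "too large". */
    unsigned idx = (unsigned)i - 1u;
    int pos = (idx < 15u) ? table[idx] : DEFAULT_STEP;

    for (;;) {
        i += steps[pos].delta;
        if (steps[pos].stop)
            return i;
        pos++;                 /* no break: continue with the next fragment */
    }
}
```

For i = 15 the function returns 33 (15 + 15 + 3, because case 15 falls through into case 3), for i = 4 it returns 14, and any value outside 1 to 6 and 15 takes the default path and is decremented.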