Chapter 3: Arithmetic for Computers

182.092 Computer Architecture Chapter 3: Arithmetic for Computers Adapted from Computer Organization and Design, 4th Edition, Patterson & Hennessy, © 2008, Morgan Kaufmann Publishers and Mary Jane Irwin (www.cse.psu.edu/research/mdl/mji) 182.092 Chapter 3.1 Herbert Grünbacher, TU Vienna, 2010 Review: MIPS (RISC) Design Principles Simplicity favors regularity fixed size instructions small number of instruction formats opcode always the first 6 bits Smaller is faster limited instruction set limited number of registers in register file limited number of addressing modes Make the common case fast arithmetic operands from the register file (load-store machine) allow instructions to contain immediate operands Good design demands good compromises three instruction formats (R, I, J) 182.092 Chapter 3.2 Herbert Grünbacher, TU Vienna, 2010 Review: MIPS Addressing Modes Illustrated 1. Register addressing op rs rt rd funct Register word operand 2. Base (displacement) addressing op rs rt offset Memory word or byte operand base register 3. Immediate addressing op rs rt operand 4. PC-relative addressing op rs rt offset Memory branch destination instruction Program Counter (PC) 5. Pseudo-direct addressing op jump address Memory || jump destination instruction Program Counter (PC) 182.092 Chapter 3.3 Herbert Grünbacher, TU Vienna, 2010 Number Representations 32-bit signed numbers (2’s complement): 0000 0000 0000 0000 0000 0000 0000 0000two = 0ten maxint 0000 0000 0000 0000 0000 0000 0000 0001two = + 1ten ... 0111 1111 1111 1111 1111 1111 1111 1110two = + 2,147,483,646ten 0111 1111 1111 1111 1111 1111 1111 1111two = + 2,147,483,647ten 1000 0000 0000 0000 0000 0000 0000 0000two = – 2,147,483,648ten 1000 0000 0000 0000 0000 0000 0000 0001two = – 2,147,483,647ten ... 1111 1111 1111 1111 1111 1111 1111 1110two = – 2ten minint MSB 1111 1111 1111 1111 1111 1111 1111 1111two = – 1ten LSB Converting <32-bit values into 32-bit values copy the most significant bit (the sign bit) into the “empty” bits 0010 -> 0000 0010 1010 -> 1111 1010 sign extend versus zero extend (lb vs. lbu) 182.092 Chapter 3.4 Herbert Grünbacher, TU Vienna, 2010 MIPS Arithmetic Logic Unit (ALU) zero ovf Must support the Arithmetic/Logic operations of the ISA 1 1 A add, addi, addiu, addu 32 sub, subu ALU result 32 mult, multu, div, divu B sqrt 32 4 and, andi, nor, or, ori, xor, xori m (operation) beq, bne, slt, slti, sltiu, sltu With special handling for sign extend – addi, addiu, slti, sltiu zero extend – andi, ori, xori overflow detection – add, addi, sub 182.092 Chapter 3.5 Herbert Grünbacher, TU Vienna, 2010 1-bit MIPS ALU 182.092 Chapter 3.6 Herbert Grünbacher, TU Vienna, 2010 Final 32-bit ALU 182.092 Chapter 3.7 Herbert Grünbacher, TU Vienna, 2010 Dealing with Overflow Overflow occurs when the result of an operation cannot be represented in 32-bits, i.e., when the sign bit contains a value bit of the result and not the proper sign bit When adding operands with different signs or when subtracting operands with the same sign, overflow can never occur Operation Operand A Operand B Result indicating overflow A + B ≥ 0 ≥ 0 < 0 A + B < 0 < 0 ≥ 0 A - B ≥ 0 < 0 < 0 A - B < 0 ≥ 0 ≥ 0 MIPS signals overflow with an exception (aka interrupt) – an unscheduled procedure call where the EPC contains the address of the instruction that caused the exception 182.092 Chapter 3.8 Herbert Grünbacher, TU Vienna, 2010 But What about Performance? Critical path of n-bit ripple-carry adder is n*CP CarryIn0 A0 1-bit result0 ALU B0 CarryOut0 CarryIn1 A1 1-bit result1 ALU B1 CarryOut1 CarryIn2 A2 1-bit result2 ALU B2 CarryOut2 CarryIn3 A3 1-bit result3 ALU B3 CarryOut3 Design trick – throw hardware at it (Carry Lookahead) 182.092 Chapter 3.9 Herbert Grünbacher, TU Vienna, 2010 Multiply Binary multiplication is just a bunch of right shifts and adds n multiplicand multiplier partial can be formed in parallel product n and added in parallel for array faster multiplication double precision product 2n 182.092 Chapter 3.10 Herbert Grünbacher, TU Vienna, 2010 Long-multiplication approach multiplicand 1000 multiplier × 1001 1000 0000 0000 1000 product 1001000 Length of product is the sum of operand lengths 182.092 Chapter 3.11 Herbert Grünbacher, TU Vienna, 2010 Long-multiplication approach Initially 0 182.092 Chapter 3.12 Herbert Grünbacher, TU Vienna, 2010 Add and Right Shift Multiplier Hardware 0 1 1 0 = 6 multiplicand 32-bit ALU add shift right product multiplier Control 0 0 0 0 0 1 0 1 = 5 add 0 1 1 0 0 1 0 1 0 0 1 1 0 0 1 0 add 0 0 1 1 0 0 1 0 0 0 0 1 1 0 0 1 add 0 1 1 1 1 0 0 1 0 0 1 1 1 1 0 0 add 0 0 1 1 1 1 0 0 0 0 0 1 1 1 1 0 = 30 182.092 Chapter 3.14 Herbert Grünbacher, TU Vienna, 2010 MIPS Multiply Instruction Multiply (mult and multu) produces a double precision product mult $s0, $s1 # hi||lo = $s0 * $s1 0 16 17 0 0 0x18 Low-order word of the product is left in processor register lo and the high-order word is left in register hi Instructions mfhi rd and mflo rd are provided to move the product to (user accessible) registers in the register file Multiplies are usually done by fast, dedicated hardware and are much more complex (and slower) than adders 182.092 Chapter 3.15 Herbert Grünbacher, TU Vienna, 2010 Fast Multiplication Hardware Can build a faster multiplier by using a parallel tree of adders with one 32-bit adder for each bit of the multiplier at the base ‘ier1*’icand ‘ier7*’icand ‘ier5*’icand ‘ier3*’icand ‘ier30*’icand ‘ier6*’icand ‘ier4*’icand ‘ier2*’icand ‘ier0*’icand 32-bit ALU . 32-bit ALU 32-bit ALU 32-bit ALU 32-bit ALU 33-bit ALU 33-bit ALU 33-bit ALU . 36-bit ALU 43-bit ALU product3 product2 product1 59-bit ALU product4 product0 182.092 Chapter 3.16 Herbert Grünbacher, TU Vienna, 2010 product63 . product5 Division Division is just a bunch of quotient digit guesses and left shifts and subtracts dividend = quotient x divisor + remainder n n quotient 0 0 0 dividend divisor 0 partial 0 remainder array 0 remainder n 182.092 Chapter 3.17 Herbert Grünbacher, TU Vienna, 2010 Algorithm for hardware division (restoring) Do n times: {left shift A and Q by 1 bit A A – M; If A < 0 (an-1 = 1), then q0 0, A A + M (restore) else q0 1 } 182.092 Chapter 3.19 Herbert Grünbacher, TU Vienna, 2010 Left Shift and Subtract Division Hardware 0 0 1 0 = 2 divisor 32-bit ALU subtract shift left dividend remainder quotient Control 0 0 0 0 0 1 1 0 = 6 0 0 0 0 1 1 0 0 sub 1 1 1 0 1 1 0 0 rem neg, so ‘ient bit = 0 0 0 0 0 1 1 0 0 restore remainder 0 0 0 1 1 0 0 0 sub 1 1 1 1 1 1 0 0 rem neg, so ‘ient bit = 0 0 0 0 1 1 0 0 0 restore remainder 0 0 1 1 0 0 0 0 sub 0 0 0 1 0 0 0 1 rem pos, so ‘ient bit = 1 0 0 1 0 0 0 1 0 sub 0 0 0 0 0 0 1 1 rem pos, so ‘ient bit = 1 = 3 with 0 remainder 182.092 Chapter 3.20 Herbert Grünbacher, TU Vienna, 2010 82 Beispiel: 2 33 [M] 0011 [A] 0000 [Q] 1000 left shift [A][Q] 0001 000_ A =[A] - [M] + 1101 A < 0 1110 0000 Subtract via [A ] = [A] + [M] + 0011 2s complement add 0001 0000 left shift [A][Q] 0010 000_ A =[A] - [M] + 1101 A < 0 1111 0000 [A ] = [A] + [M] + 0011 0010 0000 left shift [A][Q] 010o 000_ A =[A] - [M] + 1101 A < 0 0001 0001 left shift [A][Q] 0010 001_ A =[A] - [M] + 1101 A < 0 1111 0010 [A ] = [A] + [M] + 0011 0010 = 2 0010 = 2 8/3 = 2 reminder 2 182.092 Chapter 3.21 Herbert Grünbacher, TU Vienna, 2010 Faster Division Can’t use parallel hardware as in multiplier Subtraction is conditional on sign of remainder Faster dividers (e.g. SRT division) generate multiple quotient bits per step Still require multiple steps 182.092 Chapter 3.22 Herbert Grünbacher, TU Vienna, 2010 MIPS Divide Instruction Divide (div and divu) generates the reminder in hi and the quotient in lo div $s0, $s1 # lo = $s0 / $s1 # hi = $s0 mod $s1 0 16 17 0 0 0x1A Instructions mfhi rd and mflo rd are provided to move the quotient and reminder to (user accessible) registers in the register file As with multiply, divide ignores overflow so software must determine if the quotient is too large. Software must also check the divisor to avoid division by 0. 182.092 Chapter 3.23 Herbert Grünbacher, TU Vienna, 2010 Representing Big (and Small) Numbers Like scientific notation –2.34 × 1056 normalized +0.002 × 10–4 not normalized +987.02 × 109 There is no way we can encode either of the above in a 32-bit integer. Floating point representation (-1)sign x F x 2E Still have to fit everything in 32 bits (single precision) s E (exponent) F (fraction) 1 bit 8 bits 23 bits The base (2, not 10) is hardwired in the design of the FPALU More bits in the fraction (F) or the exponent (E) is a trade-off between precision (accuracy of the number) and range (size of the number) 182.092 Chapter 3.24 Herbert Grünbacher, TU Vienna, 2010 Exception Events in Floating Point Overflow (floating point) happens when a positive exponent becomes too large to fit in the exponent field Underflow (floating point) happens when a negative exponent becomes too large to fit in the exponent field -∞ +∞ - largestE -smallestF - largestE +smallestF + largestE -largestF + largestE +largestF One way to reduce the chance of underflow or overflow is to offer another format that has a larger exponent field Double precision – takes two MIPS words s E (exponent) F (fraction) 1 bit 11 bits 20 bits F (fraction continued) 32 bits 182.092 Chapter 3.25 Herbert Grünbacher, TU Vienna, 2010 IEEE 754 FP Standard Most (all?) computers these days conform to the IEEE 754 floating point standard (-1)sign x (1+F) x 2E-bias Formats for both single and double precision F is stored in normalized format where the

Chapter 3: Arithmetic for Computers

Memory Centric Characterization and Analysis of SPEC CPU2017 Suite

Overview of the SPEC Benchmarks

3 — Arithmetic for Computers 2 MIPS Arithmetic Logic Unit (ALU) Zero Ovf

Asustek Computer Inc.: Asus P6T

Modeling and Analyzing CPU Power and Performance: Metrics, Methods, and Abstractions

Continuous Profiling: Where Have All the Cycles Gone?

Specfp Benchmark Disclosure

Intel Corporation: Lenovo T400 (Intel Core 2 Duo T9900)

Historical Perspective and Further Reading 4.7

An Evaluation of the TRIPS Computer System

Dell Inc.: Poweredge T610 (Intel Xeon X5690, 3.46 Ghz)

A Lightweight Processor Benchmark Mark Claypool Worcester Polytechnic Institute, [email protected]