Arithmetic
Logical, Integer & Subtraction
Chapter 3.1-3.3
EEC170 FQ 2005

MIPS Integer Representation
■ 32-bit signed integers, e.g., for numeric operations
  • 2’s complement: one representation for zero, nearly balanced range (one more negative value than positive), allows add/subtract to be treated uniformly
■ 32-bit unsigned integers, e.g., for address operations
  • Address considered 32-bit unsigned integer
■ Provides distinct instructions for signed/unsigned:
  • ADD, ADDI: add signed register, add signed immediate
    → causes exception on overflow
  • ADDU, ADDIU: add unsigned register, add unsigned immediate
    → no exception on overflow

Instruction formats:
  ADD/ADDU:    | OP | Rs | Rt | Rd | 0 | ADD/U |
  ADDI/ADDIU:  | ADDI/U | Rs | Rt | immediate data |
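The two instruction layouts above can be packed into 32-bit words as in the following minimal sketch. The field widths (6/5/5/5/5/6 bits for the register form, 6/5/5/16 for the immediate form) and the opcode/function numbers are the standard MIPS32 values and are not given on the slides; `encode_r` and `encode_i` are hypothetical helper names.

```python
# Assumption: standard MIPS32 field widths and opcode/funct numbers,
# which the slides do not list explicitly.
FUNCT = {"add": 0x20, "addu": 0x21}
OPCODE = {"addi": 0x08, "addiu": 0x09}

def encode_r(funct, rs, rt, rd, shamt=0):
    """Register form: OP (=0) | Rs | Rt | Rd | 0 | function code."""
    return (0 << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(opcode, rs, rt, imm):
    """Immediate form: OP | Rs | Rt | 16-bit immediate (two's complement)."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

# add $8, $9, $10  (Rd=8, Rs=9, Rt=10)
word_add = encode_r(FUNCT["add"], rs=9, rt=10, rd=8)
# addi $8, $9, -4  (Rt=8 is the destination in the immediate form)
word_addi = encode_i(OPCODE["addi"], rs=9, rt=8, imm=-4)

print(hex(word_add), hex(word_addi))
```

Note that the signed and unsigned variants differ only in the function code or opcode; the register fields are laid out identically.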

Comparison
■ Distinct instructions for comparison of signed/unsigned integers
  • Which is larger: 1111...1111 or 0000...0000? Depends on type, signed or unsigned
■ Two versions of slt for signed/unsigned:
  • slt, sltu: set less than signed, unsigned
    | OP | Rs | Rt | Rd | 0 | SLT/U |
■ Two versions of immediate comparison also provided:
  • slti, sltiu: set less than immediate signed, unsigned
    | SLTI/U | Rs | Rt | immediate data |

Sign Extension
■ Sign of immediate data extended to form 32-bit representation:
    1111111111111111 1001010101010100
    0000000000000000 0101010101010100
■ Thus, ALU always uses 32-bit operands
■ Extension occurs for signed and unsigned arithmetic
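To make the two slides above concrete, here is a small sketch of sign extension and of the signed/unsigned comparison. The helper names (`sign_extend16`, `to_signed`, `slt`, `sltu`) are illustrative only; the bit patterns are the ones shown in the sign-extension example.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16):
    """Copy bit 15 of a 16-bit immediate into bits 31..16."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def to_signed(word):
    """Interpret a 32-bit pattern as a two's complement integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word

def slt(a, b):
    """Set less than, operands treated as signed."""
    return 1 if to_signed(a) < to_signed(b) else 0

def sltu(a, b):
    """Set less than, operands treated as unsigned."""
    return 1 if (a & MASK32) < (b & MASK32) else 0

print(hex(sign_extend16(0x9554)))  # upper half filled with ones
print(hex(sign_extend16(0x5554)))  # upper half filled with zeros
print(slt(0xFFFFFFFF, 0x00000000), sltu(0xFFFFFFFF, 0x00000000))
```

The last line answers the question on the Comparison slide: 1111...1111 is -1 when signed (so it is less than zero) and the largest value when unsigned (so it is not).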

Overflow
■ MIPS has no flag (status) register
  • a flag register would complicate the pipeline (see Chapter 6)
■ Overflow (underflow):
  • For addition, occurs if operands are same sign, result is different sign.
  • Can be checked in software if necessary
  • MIPS generates interrupt on overflow for signed arithmetic to notify program
  • C compiler only generates unsigned arithmetic instructions (avoids interrupts)
    → Representation of result is the same if signed/unsigned

Computer System, Top Level View
[Figure: Compiler, Input, Output, Memory, and the processor's Control and Datapath]
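The overflow rule from the Overflow slide (operands with the same sign, result with a different sign) can be checked in software along these lines. This is a sketch, not the hardware mechanism: `OverflowTrap` is a hypothetical stand-in for the MIPS overflow exception, and `addu`/`add` only model the wrap-around versus trapping behaviour of ADDU and ADD.

```python
MASK32 = 0xFFFFFFFF
SIGN = 0x80000000

class OverflowTrap(Exception):
    """Hypothetical stand-in for the overflow exception raised by ADD/ADDI."""

def addu(a, b):
    """ADDU-style add: wrap around modulo 2**32, never trap."""
    return (a + b) & MASK32

def add_overflows(a, b):
    """True if both operands share a sign and the 32-bit sum has the other sign."""
    result = addu(a, b)
    return ((a ^ result) & (b ^ result) & SIGN) != 0

def add(a, b):
    """ADD-style add: same bit pattern as addu, but trap on signed overflow."""
    if add_overflows(a, b):
        raise OverflowTrap(f"{a:#010x} + {b:#010x}")
    return addu(a, b)

print(hex(addu(0x7FFFFFFF, 1)))              # wraps to the most negative value
print(add_overflows(0x7FFFFFFF, 1))          # positive + positive gave negative
print(add_overflows(0x7FFFFFFF, 0x80000000)) # mixed signs can never overflow
```

Because `add` and `addu` return the same bit pattern whenever no trap occurs, a compiler can use the unsigned forms throughout, as the slide notes.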

Simple Processor Datapath
■ Includes registers and ALU
  • Cycle 1: Register Fetch
  • Cycle 2: ALU Operation
[Figure: Registers supply Operand1 and Operand2 to the ALU; Operation selects the ALU function; Result returns to Registers]

Simple Processor Datapath
■ Includes registers and ALU
  • Cycle 3: Register Write Back
[Figure: Registers supply Operand1 and Operand2 to the ALU; Operation selects the ALU function; Result returns to Registers]

Processor Control
■ Control directs actions in the data path. Instruction is top level of control
    | OP | Rs | Rt | Rd | 0 | ADD |
[Figure: the instruction fields select operand1, operand2, and result in Registers and the ALU Operation]
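The three datapath cycles (register fetch, ALU operation, register write back) can be mimicked in a few lines. This is only a behavioural sketch: the `ALU_OPS` table and the `execute` function are hypothetical, and real control logic would derive the operation from the instruction fields.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical operation table; the slides only name the ALU's "Operation" input.
ALU_OPS = {
    "add": lambda x, y: (x + y) & MASK32,
    "sub": lambda x, y: (x - y) & MASK32,
    "and": lambda x, y: x & y,
    "or": lambda x, y: x | y,
}

def execute(regs, operation, rs, rt, rd):
    """Run one register-register instruction through the three cycles."""
    # Cycle 1: Register Fetch
    operand1, operand2 = regs[rs], regs[rt]
    # Cycle 2: ALU Operation
    result = ALU_OPS[operation](operand1, operand2)
    # Cycle 3: Register Write Back
    regs[rd] = result
    return result

regs = [0] * 32
regs[9], regs[10] = 5, 7
execute(regs, "add", rs=9, rt=10, rd=8)   # like ADD with Rs=9, Rt=10, Rd=8
print(regs[8])
```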

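Inside the ALU
[Figure: Operand1 and Operand2 (32 bits each) feed the add, sub, slt, and, or, and eq/ne units; a mux controlled by Operation selects the 32-bit result]

Boolean Operations
■ Each pair of inputs resolved using a corresponding gate
[Figure: Operand1 and Operand2 feed a 32-bit AND Unit]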

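One Bit Full Adder
■ Multiple-bit adder can be built from series of one-bit full adders
[Figure: one-bit full adder with inputs a, b, Carry In and outputs Sum, Carry Out]

Full Adder Truth Table
■ Adder truth table maps inputs to the outputs

    Input                 Output
    a   b   Carry_In      Carry_Out   Sum
    0   0   0             0           0
    0   0   1             0           1
    0   1   0             0           1
    0   1   1             1           0
    1   0   0             0           1
    1   0   1             1           0
    1   1   0             1           0
    1   1   1             1           1

$\mathrm{Carry\_Out} = b \cdot \mathrm{Carry\_In} + a \cdot \mathrm{Carry\_In} + a \cdot b$

$\mathrm{Sum} = \bar{a} \cdot \bar{b} \cdot \mathrm{Carry\_In} + \bar{a} \cdot b \cdot \overline{\mathrm{Carry\_In}} + a \cdot \bar{b} \cdot \overline{\mathrm{Carry\_In}} + a \cdot b \cdot \mathrm{Carry\_In}$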
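The full adder equations can be checked directly against the truth table, and a multiple-bit adder follows by chaining the cells. The sketch below assumes bits are plain 0/1 integers; `full_adder` and `ripple_add` are illustrative names.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder written as the sum-of-products forms of the truth table."""
    na, nb, ncin = a ^ 1, b ^ 1, carry_in ^ 1   # complemented inputs
    carry_out = (b & carry_in) | (a & carry_in) | (a & b)
    total = ((na & nb & carry_in) | (na & b & ncin)
             | (a & nb & ncin) | (a & b & carry_in))
    return carry_out, total

def ripple_add(x, y, width=32, carry_in=0):
    """Chain `width` full adders; the carry ripples from bit 0 upward."""
    result, carry = 0, carry_in
    for i in range(width):
        carry, bit = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry

# Print the truth table rows as (a, b, Carry_In, Carry_Out, Sum).
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            print(a, b, cin, *full_adder(a, b, cin))
```

For example, `ripple_add(0xFFFFFFFF, 1)` produces a zero sum with a carry out of the top bit, since the carry has to travel through all 32 cells.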

Carry_Out Implementation
[Figure: gate-level circuit with inputs a, b, Carry In and output Carry Out]

Two’s Complement Subtraction
■ For two’s complement arithmetic the following is true: $X + \bar{X} = -1$
  Proof: result of $X + \bar{X}$ is all ones
    101001 + 010110 = 111111
  $X + (\bar{X} + 1) = 0$
  $-X = \bar{X} + 1$
■ Our strategy for computing A - B: convert to A + (-B) and use the above rule for -B
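The negation rule and the A + (-B) strategy can be tried out on 32-bit patterns as follows. This is a sketch with illustrative helper names (`invert`, `negate`, `subtract`); Python integers are unbounded, so every result is masked back to 32 bits.

```python
MASK32 = 0xFFFFFFFF

def invert(x):
    """Bitwise complement of a 32-bit pattern."""
    return ~x & MASK32

def negate(x):
    """Two's complement negation: invert all bits, then add 1."""
    return (invert(x) + 1) & MASK32

def subtract(a, b):
    """A - B computed as A + (inverted B) + 1, using only addition."""
    return (a + invert(b) + 1) & MASK32

x = 0b101001
print(bin(x + (~x & 0b111111)))   # 6-bit example from the slide: all ones
print(hex(negate(1)))             # the 32-bit pattern for -1
print(hex(subtract(7, 10)))       # the 32-bit pattern for -3
```

The extra `+ 1` in `subtract` is what a hardware adder can supply for free through the carry input of the least significant bit.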

One Bit Add/Subtract Cell
■ Modify the adder cell so that the B input is selectable
[Figure: Binvert drives a 0/1 mux that selects b or its complement; inputs a and Carry In; outputs Sum and Carry Out]

Single Bit ALU
■ Combine gates with add/subtract unit to form 1-bit ALU
  • Operation selects which result is sent out
  • 1-bit ALUs will be connected in a chain to form 32-bit ALU
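A behavioural sketch of the one-bit cell described above follows. The numbering of the `operation` values (0 = AND, 1 = OR, 2 = add) is an assumption made for illustration, and `alu_bit` is a hypothetical name; the slides only say that Operation selects which result is sent out.

```python
AND, OR, ADD = 0, 1, 2   # assumed encoding of the Operation select

def alu_bit(a, b, carry_in, binvert, operation):
    """One-bit ALU: gates plus an add/subtract cell, result chosen by `operation`."""
    b_sel = b ^ binvert                      # mux: b when binvert=0, complement when 1
    and_out = a & b_sel
    or_out = a | b_sel
    sum_out = a ^ b_sel ^ carry_in           # full adder Sum
    carry_out = (a & b_sel) | (a & carry_in) | (b_sel & carry_in)
    result = (and_out, or_out, sum_out)[operation]
    return result, carry_out

print(alu_bit(1, 1, 0, binvert=0, operation=ADD))  # 1 + 1: sum 0, carry 1
print(alu_bit(1, 0, 1, binvert=1, operation=ADD))  # 1 - 0 at bit 0: Carry In = 1
```

All three results are computed every time and the select simply picks one, which mirrors how the mux in the cell behaves.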

Support for Set Less Than
■ Add an input Less to 1-bit ALU for SLT
  • Input connections will be seen soon

Most Significant 1-Bit ALU
■ Specialize ALU Bit 31 to produce Overflow and Set (for set less than)
  • What’s inside the overflow detection box?

32-Bit ALU
■ 32 one-bit ALUs are cascaded to form 32-bit ALU
■ SLT condition is computed using A-B, Set is sign-bit of result
  • All other Less inputs set to 0
  • Set is not accurate when there’s overflow, why? How is this corrected?

Support for BEQ/BNE
■ Determine EQ/NE using subtract operation
  • All zero result is EQ, else NE
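The cascaded ALU, the Set signal, and the zero test for BEQ/BNE can be modelled together as below. Two points are assumptions rather than slide content: the `operation` encoding, and the correction for Set, which here takes the sign bit XOR the overflow signal (one common answer to the question posed on the slide). The function name `alu32` is illustrative.

```python
WIDTH = 32
AND, OR, ADD, SLT = 0, 1, 2, 3   # assumed encoding of the Operation select

def alu32(a, b, binvert, operation):
    """Cascade of 32 one-bit ALUs. Returns (result, zero, overflow)."""
    carry = binvert            # Carry In of bit 0 supplies the "+1" for A - B
    result = 0
    for i in range(WIDTH):
        a_i = (a >> i) & 1
        b_i = ((b >> i) & 1) ^ binvert
        s = a_i ^ b_i ^ carry
        carry = (a_i & b_i) | (a_i & carry) | (b_i & carry)
        if operation == AND:
            bit = a_i & b_i
        elif operation == OR:
            bit = a_i | b_i
        elif operation == ADD:
            bit = s
        else:
            bit = 0            # Less inputs are 0; bit 0 is patched with Set below
        result |= bit << i
    # After the loop a_i, b_i, s hold the bit-31 values.
    # Overflow: adder inputs share a sign, sum has the other sign.
    overflow = int(a_i == b_i and s != a_i)
    if operation == SLT:
        # The sign of A - B is wrong exactly when the subtraction overflowed.
        result = s ^ overflow
    zero = int(result == 0)
    return result, zero, overflow

print(alu32(7, 7, 1, ADD))            # subtract: zero flag set means EQ
print(alu32(3, 5, 1, SLT))            # 3 < 5
print(alu32(0x80000000, 1, 1, SLT))   # most negative value < 1, despite overflow
```

In the last call A - B overflows and the raw sign bit is 0, so an uncorrected Set would wrongly report that A is not less than B.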

Ripple Carry Adder Performance
■ This Ripple Carry Adder is very slow. Each stage causes 3 two-input gate delays:
    $c_{out} = a \cdot b + a \cdot c_{in} + b \cdot c_{in}$
■ For a 32-bit adder this would be 96 two-input gate delays, much too slow
■ Can speed up addition by using more hardware
  • Carry-lookahead adder

Vector Arithmetic
■ Allows arithmetic to be done in parallel on subcomponents of a register, e.g., arithmetic on bytes:

      31..24  23..16  15..8   7..0
        X3      X2      X1     X0
      + Y3      Y2      Y1     Y0
      ============================
        Z3      Z2      Z1     Z0

Vector Arithmetic (cont.)
■ Vector arithmetic has been used for decades in supercomputers for scientific applications
■ Also used for architecture extensions to facilitate multimedia processing, for RISC processors and in SIMD extensions such as MMX, e.g.:
  • The intensity of each display dot (pixel) is often represented by 1 byte
  • Arithmetic on a vector of bytes allows parallel pixel arithmetic, e.g., adding two scenes together pixel by pixel
■ Simple extensions to our MIPS architecture and the 32-bit adder implementation will allow both 32-bit arithmetic and byte-parallel arithmetic:
    Addv Rd,Rs,Rt
    Subv Rd,Rs,Rt

Multimedia Arithmetic
■ Multimedia applications require unconventional types of arithmetic. Multimedia architecture extensions include new arithmetic operations
  • E.g., saturating addition/subtraction:
    → when altering the intensity of a pixel (e.g., adding two scenes, increasing overall intensity by 2x), we do not want overflow or underflow to cause wrap around (bright pixel becomes dark).
    → Overflow or underflow detection/special processing is not feasible
    → Rather, arithmetic result should saturate at maximum or minimum value that can be represented. Thus for 8-bit representation: 243 + 124 = 255
■ New multimedia operations require additional ALU changes
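The byte-parallel and saturating operations described above can be sketched as follows. The semantics are assumptions: `addv` models Addv as four independent unsigned 8-bit lanes whose carries do not cross byte boundaries, and `addv_saturating` is a hypothetical saturating variant that clamps each lane at 255.

```python
LANES = 4   # four 8-bit subcomponents in a 32-bit register

def addv(x, y):
    """Byte-parallel add: each lane wraps around on its own."""
    result = 0
    for lane in range(LANES):
        shift = 8 * lane
        xs = (x >> shift) & 0xFF
        ys = (y >> shift) & 0xFF
        result |= ((xs + ys) & 0xFF) << shift   # carry out of the lane is dropped
    return result

def addv_saturating(x, y):
    """Byte-parallel add that clamps each unsigned lane at 255 instead of wrapping."""
    result = 0
    for lane in range(LANES):
        shift = 8 * lane
        xs = (x >> shift) & 0xFF
        ys = (y >> shift) & 0xFF
        result |= min(xs + ys, 0xFF) << shift
    return result

print(hex(addv(0x01FF01FF, 0x01010101)))             # bright lanes wrap to 0x00
print(hex(addv_saturating(0x01FF01FF, 0x01010101)))  # bright lanes stay at 0xff
print((243 + 124) & 0xFF, min(243 + 124, 255))       # wrap-around vs saturation
```

The last line reproduces the slide's example: with wrap-around a bright pixel turns dark, while saturation holds it at the maximum intensity.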