
EE141

EE141 - Spring 2006
Digital Integrated Circuits

Design of an Execution Unit
Luke Tsai, AMD

Outline

- Introduction
- What is the Execution Unit (EX)?
- High Level Design Considerations
- Circuit Design of a Barrel Shifter
- “Real Life” Designs


Introduction

- If you love EE141…
  - Consider a career in design
  - All aspects and variety of circuit design
  - Maximum complexity
  - Leading-edge technology

What is an Execution Unit (EX)?


A Classical Block Diagram

[Figure: block diagram with Instruction Fetch (IF), Decode (DE), Scheduler (SC), Execution Unit (EX), Load-Store (LS), Floating Point (FPU), and Memory (L2 Cache)]

The EX Unit Implements the Integer Instruction Set

[Figure: the same block diagram, with the EX unit executing the instructions below]

- Add* R1, R2
- Sub R1, R2
- Mult R1, R2
- Div R1, R2
- ROL R1, R2
- SAR R1, R2
- CLZ R1

* Notation: the first register is both a source and the destination.
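A minimal Python model of the two-operand notation above may make it concrete. It assumes 64-bit registers held in a dictionary; the names `execute`, `to_signed`, and `MASK64`, the modulo-64 treatment of the ROL/SAR count, and the unsigned DIV are illustrative assumptions, not AMD's design or exact x86 semantics.

```python
from typing import Dict, Optional

MASK64 = (1 << 64) - 1  # assumption: 64-bit registers


def to_signed(x: int) -> int:
    """Interpret a 64-bit pattern as a two's-complement integer."""
    return x - (1 << 64) if x >> 63 else x


def execute(regs: Dict[str, int], op: str, dst: str, src: Optional[str] = None) -> None:
    """Apply one instruction; dst is both a source and the destination (the "*" notation)."""
    a = regs[dst]
    b = regs[src] if src is not None else 0
    if op == "ADD":
        r = a + b
    elif op == "SUB":
        r = a - b
    elif op == "MULT":
        r = a * b  # the final mask keeps only the low 64 bits
    elif op == "DIV":
        r = a // b  # assumption: unsigned quotient (raises ZeroDivisionError if b == 0)
    elif op == "ROL":
        n = b & 63  # assumption: count taken modulo 64
        r = (a << n) | (a >> (64 - n)) if n else a
    elif op == "SAR":
        n = b & 63
        r = to_signed(a) >> n  # Python's >> on a negative int replicates the sign
    elif op == "CLZ":
        r = 64 - a.bit_length()  # count leading zeros; CLZ of 0 gives 64 here
    else:
        raise ValueError(f"unsupported op: {op}")
    regs[dst] = r & MASK64
```

For example, `Add R1, R2` corresponds to `execute(regs, "ADD", "R1", "R2")`, which overwrites R1 with the sum.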


Interface to the SC

- The SC issues instructions to the EX
  - An out-of-order SC needs to check for source dependencies

[Figure: Add R1, R2 followed by Sub R3, R1 (Dependency) and Mult R4, R2 (No Dependency, Can Issue in Parallel), shown next to the IF/DE/SC/EX/LS/FPU/Memory (L2 Cache) block diagram]

Interface to the LS

- For Load/Store ops, the EX generates the address for the LS, which in turn sends/receives data to/from the EX
- The path from address generation to load-data return is a classical critical path in processor design

[Figure: Add R1, [R2] (Load); Sub [R3], R1 (Store); Mult [R4], [R2] (Load-Op-Store), shown next to the same block diagram]
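The source-dependency check an out-of-order SC performs can be sketched as follows, using the same two-operand convention (the first register is both read and written). This is a hypothetical sketch: it checks only read-after-write dependencies and assumes register renaming removes the other hazards; `depends_on` and `can_issue_with` are illustrative names.

```python
from typing import List, Set, Tuple

# (opcode, first register, second register); the first register is also the destination
Instr = Tuple[str, str, str]


def sources(instr: Instr) -> Set[str]:
    """Registers read by a two-operand instruction (both operands are sources)."""
    _, r1, r2 = instr
    return {r1, r2}


def dest(instr: Instr) -> str:
    """Register written by the instruction (the first operand)."""
    return instr[1]


def depends_on(later: Instr, earlier: Instr) -> bool:
    """True if `later` reads the register `earlier` writes (read-after-write)."""
    return dest(earlier) in sources(later)


def can_issue_with(first: Instr, candidates: List[Instr]) -> List[Instr]:
    """Candidates with no source dependency on `first` may issue alongside it."""
    return [c for c in candidates if not depends_on(c, first)]
```

With the slide's example, `can_issue_with(("ADD", "R1", "R2"), [("SUB", "R3", "R1"), ("MULT", "R4", "R2")])` keeps only the Mult: the Sub must wait for R1, while the Mult can issue in parallel.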


A Typical Block Diagram of EX

[Figure: Multi-ported Operand Bypass; ALU0; Mult; Shifter; ALU1..N; AGen1..N; Div/CLZ/Popcnt; Result Bus]

Execution Unit High Level Design Considerations


Meeting the Performance Target

- IPC: how each instruction is executed
  - Which EX units to build, and how many of each
- Frequency
  - What type of circuit style
- Power
  - How much energy per operation
- Area
  - Silicon real estate is expensive
- The design point is based on trade-offs among the above criteria

Micro-Architecture Considerations

- Pipeline
- Interface with the Scheduler
  - How to handle out-of-order execution
- Interface with the LS unit
  - How many cycles for the AGen-to-data loop?
  - How to suppress execution when load data is invalid?
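The IPC, frequency, and power criteria combine through the standard first-order relations below (textbook identities, not numbers from these slides), where f_clk is the clock frequency, E_op is the average energy per operation, and leakage is ignored:

```latex
\text{Performance} = \frac{\text{instructions}}{\text{second}} = \text{IPC} \times f_{\text{clk}}
\qquad
P_{\text{dyn}} \approx E_{\text{op}} \times \text{IPC} \times f_{\text{clk}}
```

A circuit style that raises f_clk but lowers IPC (for example, through extra pipeline stages) or raises E_op may not improve performance or performance per watt, which is one example of the trade-offs among these criteria.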


Physical Design Considerations

- Operand Bypass
  - A bypass condition occurs when an operand of an instruction scheduled to execute in cycle n is generated in the immediately preceding cycle (n-1).
  - The data for this operand does not yet reside in the register file and must be bypassed from one of the result buses.

[Figure: Add* R1, R2; Sub R3, R1 (Bypass Condition); Mult R4, R2]

* Actual execution sequence (not program order)

Physical Design Considerations

- Floorplan
  - The floorplan of an EX unit is a crucial design decision. It impacts:
    - Bus length (frequency, power)
    - Pitch (frequency, power, area)
    - Bypass scheme (area, power)
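The Operand Bypass condition can be expressed as a simple check on the execution sequence. This sketch is hypothetical: it assumes single-cycle latency for every instruction, a single result bus, and that results written two or more cycles earlier already sit in the register file. A real multi-ported bypass compares each source against every result bus.

```python
from typing import Dict, List, Optional, Tuple

# (opcode, first register, second register); the first register is also the destination
Instr = Tuple[str, str, str]


def operand_sources(schedule: List[Optional[Instr]], n: int) -> Dict[str, str]:
    """Where each source operand of the instruction executing in cycle n comes from.

    schedule[k] is the instruction executed in cycle k (None = idle cycle), in
    actual execution order rather than program order. Assumes single-cycle
    latency and a single result bus.
    """
    _, r1, r2 = schedule[n]
    prev = schedule[n - 1] if n > 0 else None
    written_last_cycle = prev[1] if prev is not None else None
    return {
        reg: "result bus (bypass)" if reg == written_last_cycle else "register file"
        for reg in (r1, r2)
    }
```

For the slide's sequence, the Sub in cycle 1 reads R1 from the result bus, while the Mult in cycle 2 reads both operands from the register file.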


Circuit Design of a Barrel Shifter

What is a Barrel Shifter?

- Performs a shift or rotate on the full/partial data
  - Example: 8-bit shifter

Input bit position          7  6  5  4  3  2  1  0
Rotate Left 1               6  5  4  3  2  1  0  7
Rotate Right 1              0  7  6  5  4  3  2  1
Logical Shift Left 2        5  4  3  2  1  0  L  L   (= multiply by 4)
Arithmetic Shift Left 2     5  4  3  2  1  0  L  L   (same as above)
Logical Shift Right 3       L  L  L  7  6  5  4  3
Arithmetic Shift Right 3    7  7  7  7  6  5  4  3

L = Low (zero)
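The table can be reproduced with a small behavioral model that reports, for each output position, which input bit lands there. It is a sketch for the 8-bit example only; the mnemonics ROL/ROR/SHL/SAL/SHR/SAR are x86 names used as labels, and `barrel` is a hypothetical helper rather than a circuit description.

```python
from typing import List, Union


def barrel(op: str, n: int, width: int = 8) -> List[Union[int, str]]:
    """Return, MSB first, which input bit lands in each output position.

    'L' marks a position filled with zero (Low), matching the table's notation.
    """
    out: List[Union[int, str]] = []
    for pos in range(width - 1, -1, -1):  # output bit positions, MSB first
        if op == "ROL":
            src = (pos - n) % width  # bits leaving the top re-enter at the bottom
        elif op == "ROR":
            src = (pos + n) % width
        elif op in ("SHL", "SAL"):  # logical and arithmetic left shifts are identical
            src = pos - n if pos - n >= 0 else "L"
        elif op == "SHR":
            src = pos + n if pos + n < width else "L"
        elif op == "SAR":
            src = min(pos + n, width - 1)  # the sign bit (MSB) is replicated
        else:
            raise ValueError(f"unsupported op: {op}")
        out.append(src)
    return out
```

Each row of the table is one call, e.g. `barrel("SAR", 3)` for Arithmetic Shift Right 3.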


Barrel Shifter Design

- Observe: any input bit could be passed to ALL output bit positions.
  - Therefore, the shifter is nothing but a giant NxN mux, where N is the width of the data.
  - The mux select is the one-hot decode of the shift amount.

[Figure: 8x8 mux array connecting input bits 7..0 to output bits 7..0]

Barrel Shifter Implementations

1. Single-stage NxN mux
   - Fewest gates between input and output
   - Most select signals (largest load on the shift-amount decode)
2. Multi-stage mux
   - More stages = more gates between input and output
   - The reduction in select signals shows diminishing returns
     - For 64-bit shifts:
       - 1 stage = 64 selects
       - 2 stages (8x8) = 16 selects (75% reduction)
       - 3 stages (4x4x4) = 12 selects (25% reduction)
3. Mux implementation
   - Low-swing passgate
   - Full-swing domino
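A behavioral sketch of the mux view: each stage is a mux driven by a one-hot select, and cascading stages whose radices multiply to N rotates by the full shift amount. The function names are hypothetical, the model covers rotate-left only, and it counts select lines rather than gates or wire load.

```python
from math import prod
from typing import List


def one_hot(value: int, size: int) -> List[int]:
    """One-hot decode: exactly one of `size` select lines is high."""
    return [1 if i == value else 0 for i in range(size)]


def mux_rotate_left(bits: List[int], selects: List[int], step: int) -> List[int]:
    """One mux stage: if select k is high, rotate left by k * step.

    bits[i] holds bit position i (LSB at index 0).
    """
    width = len(bits)
    k = selects.index(1)
    return [bits[(i - k * step) % width] for i in range(width)]


def multi_stage_rotate(bits: List[int], amount: int, radices: List[int]) -> List[int]:
    """Rotate left through cascaded mux stages; radices=[N] is the single-stage NxN mux."""
    assert prod(radices) == len(bits), "stage radices must multiply to the data width"
    step = 1
    for r in radices:
        digit = (amount // step) % r  # this stage's share of the shift amount
        bits = mux_rotate_left(bits, one_hot(digit, r), step)
        step *= r
    return bits


def select_count(radices: List[int]) -> int:
    """Total one-hot select lines the shift-amount decoder must drive."""
    return sum(radices)
```

With `radices=[8, 8]`, the first stage handles the low three bits of the amount and the second stage the high three, so the decoder drives 8 + 8 = 16 selects instead of 64, matching the counts above.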


Barrel Shifter Array

[Figure: One-Stage Mux (Inputs, Selects, Outputs) and Two-Stage Mux (Inputs, Selects, Intermediate Connection turned 90°, Outputs)]

Barrel Shifter Additional Complexity

1. Partial shifts/rotates
   - The x86 instruction set supports 8 (L/H)/16/32/64-bit shifts
2. Shift differs from rotate
   - Shifts fill in zeros or the sign bit
   => How do you build a barrel shifter that does both shift and rotate?
3. Rotate could include the Carry bit
   - x86 supports RCL/RCR (Rotate through Carry Left/Right)
   => A 64-bit RCL requires a 65-bit barrel shifter!
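The 65-bit requirement for RCL follows from treating the carry flag and the data as one (width + 1)-bit word that is simply rotated. The sketch below takes the count modulo width + 1 and does not attempt to model x86's count-masking or flag-update rules; `rcl` is an illustrative name.

```python
from typing import Tuple


def rcl(value: int, carry: int, count: int, width: int = 64) -> Tuple[int, int]:
    """Rotate `value` left through the carry flag by `count` positions.

    Carry and data form one (width + 1)-bit word, so a 64-bit RCL needs a
    65-bit rotator. Returns (new_value, new_carry).
    """
    n = width + 1                          # 65 bits for a 64-bit RCL
    data_mask = (1 << width) - 1
    word_mask = (1 << n) - 1
    word = ((carry & 1) << width) | (value & data_mask)  # carry sits just above the MSB
    count %= n                             # rotating by n positions is a no-op
    rotated = ((word << count) | (word >> (n - count))) & word_mask
    return rotated & data_mask, rotated >> width
```

Passing `width=8`, 16, or 32 models the partial-width cases from item 1, which in this model need 9-, 17-, and 33-bit rotators respectively.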


“Real Life” Designs

Robustness and Reliability

- Robustness: higher yield = higher profit margin
  - The circuit needs to function across PVT (process, voltage, temperature) variation
  - A chip target yield of 70% could require an EX yield of 99%
  - What works in SPICE (w/o PVT) may not work in real life
- Reliability
  - In addition to simulation for speed, real designs also check:
    - Noise
    - IR drop
    - Electromigration
    - Inductive effects
    - …
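One way to read the 70%/99% figure: if the chip works only when every one of several independent blocks works, chip yield is the product of the block yields. The block count of about 35 used below is an assumption for illustration, not a number from the slide, and the helper names are hypothetical.

```python
import math


def chip_yield(block_yield: float, n_blocks: int) -> float:
    """Chip yield if every one of n independent, equal-yield blocks must work."""
    return block_yield ** n_blocks


def required_block_yield(target_chip_yield: float, n_blocks: int) -> float:
    """Per-block yield needed for n independent blocks to reach the chip target."""
    return target_chip_yield ** (1.0 / n_blocks)


def blocks_supported(target_chip_yield: float, block_yield: float) -> float:
    """How many equal blocks of a given yield fit within the chip target."""
    return math.log(target_chip_yield) / math.log(block_yield)
```

Under these assumptions, about 35 blocks at 99% each multiply out to roughly 70% for the chip, which is why a single unit such as the EX must be held to a much higher yield than the chip target.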


Process Variation

- Major culprits: threshold voltage, channel length, channel width
  - In 45nm, Vth ~ ±150mV, ΔL ~ ±15%, ΔW ~ ±10% (for minimum-size devices). (The Idsat/Idoff relationships to variation are non-linear; try it in SPICE.)
  - Matching devices/paths: sense-amp, analog, memory cell stability, clock tree
  - Increases leakage: 80% of chip leakage is caused by 20% of devices, which limits the use of dynamic circuits
  - Slows down critical paths
  - Worse hold-time requirements

Voltage/Temperature Variations

- Introduce more timing variations
- Increase noise
- Worsen cross-chip matching (e.g. clock tree)
- Degrade reliability

[Figure: on-die supply voltage map with contours at 1.072 V, 1.103 V, 1.134 V, 1.164 V, 1.194 V, and 1.224 V]
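The leakage skew follows from the exponential dependence of subthreshold current on Vth: a Gaussian Vth spread produces a log-normal, long-tailed leakage distribution. The Monte Carlo sketch below is illustrative only; `sigma_vth` (50 mV, one possible reading of ±150 mV as a ±3σ range) and `n_vt` (about 1.5 × kT/q at room temperature) are assumptions, and the share it returns is not a claim about any real process.

```python
import math
import random


def leakage_share_of_top(fraction: float, n_devices: int = 100_000,
                         sigma_vth: float = 0.05, n_vt: float = 0.039,
                         seed: int = 1) -> float:
    """Share of total off-current contributed by the leakiest `fraction` of devices.

    Subthreshold leakage is modeled as I_off ~ exp(-dVth / n_vt), where dVth is a
    Gaussian Vth offset (std. dev. sigma_vth, volts) and n_vt = n*kT/q (volts).
    Both parameters are illustrative assumptions, not process data.
    """
    rng = random.Random(seed)
    leak = [math.exp(-rng.gauss(0.0, sigma_vth) / n_vt) for _ in range(n_devices)]
    leak.sort(reverse=True)  # leakiest devices first
    top = int(n_devices * fraction)
    return sum(leak[:top]) / sum(leak)
```

Raising `sigma_vth` concentrates more of the total leakage in fewer devices, which is the mechanism behind the 80%/20% observation.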
