Recap

● RISC vs. CISC
– Load, Store instructions

● Caches
– Locality of reference
– It is small and it is fast

● Is it fast because it is small?

● Why is it small?

[Figure: abstraction layers, top to bottom: Application, Algorithm, Programming Language, OS/VM, ISA, Microarchitecture, RTL, Gates, Circuits, Devices, Physics]

Technology Trends

CO403 – Advanced Microprocessors

IS860 – HPC for Security

Outline

● Semiconductors, MOSFET

● Moore's Law: Process, Feature Size, Scaling

● Power, Energy

● Performance Metrics

Semiconductors

● Conductivity of Si changes over many orders of magnitude depending on the concentration of dopants

Metal Oxide Semiconductor

● gate, source, drain, body

● Gate–oxide–body stack looks like a capacitor
– Gate and body are conductors
– SiO2 (oxide) is a very good insulator

nMOS and pMOS

[Figure: cross-sections and circuit symbols of an nMOS transistor (n+ source and drain in p-type bulk Si) and a pMOS transistor (p+ source and drain in n-type bulk Si); each has a polysilicon gate over SiO2, with terminals g, s, d]

Field-Effect Transistor

● MOSFET is a voltage-controlled switch

● Gate voltage creates an electric field that turns ON or OFF a connection between the source and drain.

I-V Characteristics of nMOS

$I_{ds} \propto (V_{DD} - V_t)^{\alpha}, \quad 1 < \alpha < 2$

$V_{DD}$: Supply Voltage

$V_t$: Threshold Voltage (transistor turns ON at $V_t$)

$\text{Switching Speed} \propto (V_{DD} - V_t) \cdot \dfrac{1}{\text{Channel Length}}$
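As a rough illustration of the two proportionalities above, the following Python sketch evaluates them in arbitrary units. The function names, the constant `k`, the default α = 1.3, and the example voltages and channel lengths are assumptions chosen for illustration, not values from the slides.

```python
def drive_current(v_dd, v_t, alpha=1.3, k=1.0):
    """Relative drain current: I_ds = k * (V_DD - V_t)**alpha (arbitrary units)."""
    if not 1.0 < alpha < 2.0:
        raise ValueError("alpha is expected to lie between 1 and 2")
    overdrive = max(v_dd - v_t, 0.0)  # below threshold the switch is treated as OFF
    return k * overdrive ** alpha


def switching_speed(v_dd, v_t, channel_length):
    """Relative switching speed: proportional to (V_DD - V_t) / channel length."""
    return max(v_dd - v_t, 0.0) / channel_length


# Hypothetical operating points: 1.0 V supply, 0.3 V threshold, 32 nm channel.
base = switching_speed(1.0, 0.3, channel_length=32e-9)
shorter = switching_speed(1.0, 0.3, channel_length=22e-9)  # shorter channel
low_vt = switching_speed(1.0, 0.2, channel_length=32e-9)   # lower threshold

print(f"shorter channel speedup: {shorter / base:.2f}x")
print(f"lower V_t speedup:       {low_vt / base:.2f}x")
```

Under these assumed numbers, shrinking the channel from 32 nm to 22 nm gives roughly a 1.45x speedup, and lowering V_t from 0.3 V to 0.2 V gives roughly 1.14x.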

● Characteristic of “process generation”

● Channel length reduces by a 'scaling factor' every generation
– Continued miniaturization

● Improves switching speed, power/transistor, area (cost)/transistor

● Reduces transistor reliability

● Transistor density (transistors/cm²) doubles every 18 months

[Figure: two transistor cross-sections (n+ source and drain in p-type bulk Si), labeled 32nm and Channel Length (22nm)]

Moore's Law (Technology Scaling)

Parameter                 Value in Current Generation   Value in the New Generation
Area of the Transistor    A                             A/S²
Supply Voltage (VDD)      V                             V/S
Frequency of operation    f                             f*S
Power consumption         P                             P/S²

S: Scaling factor (~√2)
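The scaling rules in the table can be checked numerically. The sketch below applies them for one generation with S = √2; the function name `next_generation` and the normalized starting values are assumptions made for illustration.

```python
import math


def next_generation(area, v_dd, freq, power, s=math.sqrt(2)):
    """Apply the table's ideal scaling rules for one process generation."""
    return {
        "area": area / s**2,    # A -> A / S^2
        "v_dd": v_dd / s,       # V -> V / S
        "freq": freq * s,       # f -> f * S
        "power": power / s**2,  # P -> P / S^2
    }


# Hypothetical starting point, normalized so every parameter is 1.0.
current = {"area": 1.0, "v_dd": 1.0, "freq": 1.0, "power": 1.0}
scaled = next_generation(**current)

for name, value in scaled.items():
    print(f"{name:>5}: {current[name]:.3f} -> {value:.3f}")

# Power per unit area is unchanged under these ideal rules.
density_ratio = (scaled["power"] / scaled["area"]) / (current["power"] / current["area"])
print(f"power density ratio: {density_ratio:.3f}")
```

With S = √2, area and power per transistor halve each generation while frequency rises by about 41%, so power density stays constant as long as the ideal rules hold.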

Power

$P_{total} = P_{dynamic} + P_{static} + P_{leakage}$

● Dynamic Power
– while transistors are switching

● Static and Leakage Power
– while transistors are not switching (idle), or off

Static Power

● Transistors consume power when idle.

● Static and Leakage
– Subthreshold conduction, leakage, tunneling through gate oxide

● Leakage increases at high temperature

● Switching Speed vs. Leakage tradeoff

– Vt↓, Switching Speed↑

– Vt↓, Leakage↑ exponentially

$P_{leakage} = V_{DD} \cdot I_{leakage}$

$P_{static} = V_{DD} \cdot I_{static}$

Dynamic Power

$P_{dyn} = \alpha \cdot C \cdot V^2 \cdot f$

● Activity Factor or Switching factor (α)
– Fraction of clock cycles in which the circuit element transitions from 0 to 1 or 1 to 0 during operation
– 0 <= α <= 1

● Capacitance (C)
– Load capacitance: wires, gate capacitance

● Supply Voltage (Vdd)

● Frequency of operation, $f = \dfrac{1}{T_{\text{clock cycle}}}$
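A minimal sketch of the dynamic power formula, assuming SI units throughout. The function names and the example values (activity factor 0.1, 1 nF of switched capacitance, 1.0 V, 1 ns clock cycle) are hypothetical and only serve to show how the terms combine.

```python
def frequency(t_clock_cycle):
    """Frequency of operation: f = 1 / T_clock_cycle."""
    return 1.0 / t_clock_cycle


def dynamic_power(alpha, capacitance, v_dd, freq):
    """P_dyn = alpha * C * V^2 * f (watts when inputs are in SI units)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("activity factor must satisfy 0 <= alpha <= 1")
    return alpha * capacitance * v_dd**2 * freq


# Hypothetical chip: alpha = 0.1, C = 1 nF, V_dd = 1.0 V, clock cycle = 1 ns.
f = frequency(1e-9)
p_base = dynamic_power(0.1, 1e-9, 1.0, f)
p_low_v = dynamic_power(0.1, 1e-9, 0.9, f)  # same chip at 0.9 V

print(f"P_dyn at 1.0 V: {p_base:.3f} W")
print(f"P_dyn at 0.9 V: {p_low_v:.3f} W ({p_low_v / p_base:.2f}x)")
```

Because voltage enters squared, a 10% lower supply voltage alone cuts dynamic power to about 0.81x in this example.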

Clock Cycle

● Clock is a special signal to hardware

● A well-defined indication of when an event starts and completes.

[Figure: clock waveform against time (ns, ticks at 1.0, 1.5, 2.0, 2.5), marking the rising edge, the falling edge, and the levels]

[Figure: clock waveform annotated with an addition. Both operands are ready; the addition completes in this duration; the result is ready and is consumed by another block.]

[Figure: clock waveform annotated with a memory access. The address is ready; the memory access happens here; the data is ready and is consumed by another block.]

Power (Problem)

[Figure: power across process generations: 1.5μ, 1.0μ, 0.7μ, 0.5μ, 0.35μ, 0.25μ, 0.18μ, 0.13μ, 0.1μ, 0.07μ]

Reducing Dynamic Power

$P_{dyn} = \alpha \cdot C \cdot V^2 \cdot f$

● Reduce capacitance (C)
– Smaller transistors

● Reduce voltage (Vdd)
– Quadratic reduction in energy consumption!
– But also slows transistors

● Reduce frequency (f)
– Slower clock frequency (reduces power but not energy). Why?

● Reduce switching
– Clock gating

Dynamic Frequency Scaling (DFS)

Total Power = 100 W. Leakage = 25%. Execution time = 100 s. Original Frequency = 1 GHz.

Scaled Frequency = 0.5 GHz. Total Power = ? Execution time = ? Energy = ?

● Program is not entirely CPU bound!
– Decreases memory latency in cycles.
– Increases IPC.
– Execution time is < 200 s.

[Figure: execution time split into CPU and Memory portions, shown for 1 GHz and for 0.5 GHz]
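One way to work through the DFS exercise is sketched below. It assumes that dynamic power scales linearly with frequency (voltage unchanged), that leakage power stays constant, and that only the CPU-bound share of the run time stretches when the clock slows down. The function name `dfs` and the `cpu_bound_fraction` parameter (including the 0.8 value) are hypothetical and not part of the original exercise.

```python
def dfs(total_power, leakage_fraction, exec_time, f_old, f_new,
        cpu_bound_fraction=1.0):
    """Estimate (power, time, energy) after scaling only the clock frequency."""
    leakage = total_power * leakage_fraction  # ASSUMPTION: unaffected by frequency
    dynamic = total_power - leakage           # scales linearly with f
    ratio = f_new / f_old

    new_power = dynamic * ratio + leakage

    cpu_time = exec_time * cpu_bound_fraction  # stretches as the clock slows
    mem_time = exec_time - cpu_time            # ASSUMPTION: fixed in wall-clock time
    new_time = cpu_time / ratio + mem_time

    return new_power, new_time, new_power * new_time


original_energy = 100.0 * 100.0  # 100 W for 100 s = 10,000 J

# Fully CPU-bound program.
power, time, energy = dfs(100.0, 0.25, 100.0, 1.0e9, 0.5e9)
print(f"CPU bound:     {power} W, {time} s, {energy} J")

# Hypothetical program that spends 20% of its time waiting on memory.
power_m, time_m, energy_m = dfs(100.0, 0.25, 100.0, 1.0e9, 0.5e9,
                                cpu_bound_fraction=0.8)
print(f"80% CPU bound: {power_m} W, {time_m} s, {energy_m} J")
print(f"original energy: {original_energy} J")
```

Under these assumptions the fully CPU-bound case drops to 62.5 W but takes 200 s, so energy rises from 10,000 J to 12,500 J because leakage is paid for twice as long. With the assumed 80% CPU-bound split, the run takes 180 s and uses 11,250 J.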

DVFS

Intel XScale: 1 GHz → 200 MHz reduces energy used by 30x. 5x slower. 5 x 200 MHz in parallel, use 1/6th the energy!

Power has driven the industry towards multi-core.

Total Power = 100 W. Leakage = 25%. Execution time = 100 s. Voltage = 1 V.

New Voltage = 0.9 V. New Delay = ? New Frequency = ? Power = ? Energy = ?

Power and Energy

Energy = Power × Time
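Applying this relation to the DVFS exercise (100 W total, 25% leakage, 100 s at 1 V, lowered to 0.9 V) requires a model the slides do not fully specify, so the sketch below states its own assumptions. Frequency is taken to scale with (V_DD − V_t), dynamic power with V²·f, leakage power with V_DD at a fixed leakage current, and the program is treated as fully CPU bound. The function name `dvfs` and the default `v_t = 0.0` (which reduces to frequency proportional to voltage) are hypothetical choices.

```python
def dvfs(total_power, leakage_fraction, exec_time, v_old, v_new, v_t=0.0):
    """Estimate delay, frequency, power, and energy after lowering V_DD."""
    leakage = total_power * leakage_fraction
    dynamic = total_power - leakage

    v_scale = v_new / v_old
    # ASSUMPTION: switching speed (hence frequency) tracks V_DD - V_t.
    f_scale = (v_new - v_t) / (v_old - v_t)

    new_dynamic = dynamic * v_scale**2 * f_scale  # alpha * C * V^2 * f
    new_leakage = leakage * v_scale               # V_DD * I_leakage, I_leakage fixed
    new_power = new_dynamic + new_leakage
    new_time = exec_time / f_scale                # ASSUMPTION: fully CPU bound

    return {
        "delay_scale": 1.0 / f_scale,
        "freq_scale": f_scale,
        "power": new_power,
        "time": new_time,
        "energy": new_power * new_time,  # Energy = Power x Time
    }


result = dvfs(100.0, 0.25, 100.0, v_old=1.0, v_new=0.9)
for key, value in result.items():
    print(f"{key:>11}: {value:.3f}")
```

With these assumptions, power falls to about 77.2 W, execution time grows to about 111 s, and energy drops from 10,000 J to about 8,575 J. A nonzero `v_t` makes the slowdown larger for the same voltage reduction.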

● Power poses constraints
– System can only work fast enough to max out the power delivery or cooling solution

● Energy is the ultimate metric
– True “cost” of performing a fixed task
– E.g., Instructions per Joule, Bits per Joule

Processor A consumes 1.2x the power consumed by Processor B. Processor A completes execution in 30% of the time taken by Processor B. Which one would you choose for your application?
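The Processor A versus Processor B question can be approached with Energy = Power × Time. The sketch below normalizes Processor B to 1.0 power and 1.0 time; the function and variable names are illustrative only.

```python
def energy(power, time):
    """Energy = Power x Time."""
    return power * time


# Processor B is the baseline; A draws 1.2x the power for 30% of the time.
energy_b = energy(1.0, 1.0)
energy_a = energy(1.2, 0.30)
ratio = energy_a / energy_b

print(f"Processor A uses {ratio:.2f}x the energy of Processor B")
```

Processor A uses about 0.36x the energy of Processor B and finishes sooner, so it is the better choice on energy as long as the power delivery and cooling solution can sustain its 1.2x higher power.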

Power and Energy Metrics

● Energy (Joules)
– Measure of battery lifetime (mobile scenarios), operating costs (non-mobile environments)

● Power (Watts, Joules per second)
– Measure of current delivery and voltage regulation on-chip

● Power density (Power per Area, Watts/cm²)
– Thermal studies; 200 W/64 cm² vs. 200 W/4 cm²

● Energy-per-Instruction (EPI)
– Reducing energy at the expense of performance may not be acceptable
– Improves if one of the factors improves while holding the other constant

Metrics

● Energy-Delay Product (EDP)
– Equal weight for energy and performance degradation

$EDP \propto \dfrac{1}{\text{MIPS}^2 / \text{Watt}}$
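The relation between EDP and MIPS²/Watt can be checked with a short sketch. The two designs below (their power and run time) and the instruction count are hypothetical numbers chosen only to show how energy and EDP can rank designs differently.

```python
def energy(power, time):
    """Energy = Power x Time (joules)."""
    return power * time


def edp(power, time):
    """Energy-Delay Product = Energy x Time."""
    return energy(power, time) * time


def mips_squared_per_watt(million_instructions, power, time):
    """MIPS^2 / Watt for a run of the given length."""
    mips = million_instructions / time
    return mips**2 / power


# Hypothetical designs running the same 1000 million instructions.
n = 1000.0
designs = {"fast": (100.0, 1.0), "slow": (40.0, 2.0)}  # (watts, seconds)

for name, (p, t) in designs.items():
    print(f"{name}: energy={energy(p, t)} J, EDP={edp(p, t)} J*s, "
          f"MIPS^2/W={mips_squared_per_watt(n, p, t)}")
```

In this example the slow design uses less energy (80 J against 100 J) but has a worse EDP (160 J·s against 100 J·s). For a fixed instruction count, EDP times MIPS²/Watt is the same constant for both designs, which is the inverse proportionality stated above.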

Gochman et al., “The Intel Pentium M Processor: Microarchitecture and Performance,” Intel Tech. J., 2003.

Summary

● I-V Characteristics of a Transistor

● Moore's Law

● Effects of scaling

● Power, Energy, Energy Delay

References

● CS6810 – Rajeev Balasubramonian. University of Utah.

● CIS 501 – Milo Martin and Amir Roth. University of Pennsylvania.

● Kaxiras and Martonosi, Computer Architecture Techniques for Power Efficiency, Synthesis Lectures on Computer Architecture #4, Morgan and Claypool.

● Weste and Harris. CMOS Digital VLSI. 3e. Pearson-AW.

● Harris and Harris. Digital Design and Computer Architecture. MK.

● Hennessy and Patterson. Computer Architecture. 5e. MK.

Backup

Working of an nMOS Transistor


Ideal I-V Characteristics

Ideal I-V Characteristics of nMOS

Pipeline Stage

Wire

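$R \propto \dfrac{\text{resistivity} \cdot \text{Length}}{\text{Height} \cdot \text{Width}}$

$C \propto \dfrac{\epsilon \cdot \text{Length}}{\text{Pitch}}$

$\text{Delay}_{wire} \propto \text{Length}^2$

● Long wires are getting slow relative to transistors

● And they take relatively longer to cross relatively larger chips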
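The quadratic dependence of wire delay on length follows from multiplying the R and C proportionalities above. The sketch below uses unit-valued, hypothetical constants (resistivity, permittivity, height, width, pitch all default to 1.0), so only ratios are meaningful.

```python
def wire_resistance(length, height=1.0, width=1.0, resistivity=1.0):
    """R proportional to resistivity * length / (height * width)."""
    return resistivity * length / (height * width)


def wire_capacitance(length, pitch=1.0, permittivity=1.0):
    """C proportional to permittivity * length / pitch."""
    return permittivity * length / pitch


def wire_delay(length, height=1.0, width=1.0, pitch=1.0):
    """Delay proportional to R * C, which grows with length squared."""
    return wire_resistance(length, height, width) * wire_capacitance(length, pitch)


ratio = wire_delay(2.0) / wire_delay(1.0)
print(f"doubling the wire length multiplies delay by {ratio:.1f}")
```

Doubling the length doubles both R and C, so the delay grows by a factor of four.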

Complementary MOS

● nMOS transistors pass 0’s well but pass 1’s poorly.

● pMOS transistors pass 1’s well but 0’s poorly.

● Manufacturing processes that provide both flavors of transistors are called Complementary MOS, or CMOS.

Complementary CMOS gates

Switching Speed of a Transistor

● Delay through an electrical component ~ RC

● Resistance (R): $R \propto \dfrac{L}{A}$

● Capacitance (C)
– ~ length * area / distance-to-other-plate

● Voltage (VDD)

● Threshold Voltage (Vt)
– Voltage at which a transistor turns “on”
– Property of transistor based on fabrication technology

● Switching time $\propto \dfrac{R \cdot C}{V_{DD} - V_t}$

Reducing Static Power

$P_{static} \propto V_{DD} \cdot e^{-V_t / T}$

● Disable transistors
– “Power gating”: disable power to unused parts (long latency to power up)

● Reduce voltage (V)
– Linear reduction in static energy consumption
– Slows transistors

● Dual Vt – use a mixture of high and low Vt transistors
– Use slow, low-leak transistors in SRAM arrays
– Requires extra fabrication steps (cost)

● Low-leakage transistors
– High-K/Metal-Gates in Intel’s 45nm process, “tri-gate” in Intel’s 22nm

● Reducing frequency can hurt energy efficiency due to leakage power