Lecture 30
Perspectives

18nm FinFET
Double-gate structure + raised source/drain
[Figure: FinFET with gates on both sides of a silicon fin (the fin is the body), source and drain, on buried oxide (BOX); measured Id [µA/µm] vs. Vd [V] for gate voltages from -0.25 V to -1.50 V. X. Huang, et al., 1999 IEDM, pp. 67-70.]
Digital Integrated Circuits Perspectives © Prentice Hall 2000
Administrivia
• Final on Friday, December 14, 2001, 8 am. Location: 180 Tan Hall
• Topics: everything covered in class
• Review session: TBA
• Lab and homework scores will be posted on the web; please check that they are correct and that nothing is missing
• Superb job on the posters!
• FEEDBACK ON THE COURSE IS EXTREMELY WELCOME!

Power Density
• With Vdd ~1.2V, these devices are quite fast: FO4 delay is <5ps
• If we continue with today's architectures, we could run digital circuits at 30GHz
• But we would end up with a power density of 20kW/cm²
• Lowering the supply to 0.6V brings us down to 5kW/cm²
• Speeds will be somewhat lower too (FO4 = 10ps), lowering the frequencies to ~10GHz [Tang, ISSCC'01] and further lowering power
• Assume that a high-performance DG or bulk FET can be designed with 1kW/cm², with FO4 = 10ps [Frank, Proc IEEE, 3/01]
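The Power Density numbers follow the dynamic-power relation P ∝ C·V²·f. A minimal sketch, assuming the switched capacitance per cm² stays fixed so that only supply and clock change (the function name is ours, not from the slide):

```python
def scaled_power_density(p0, v0, v1, f0=1.0, f1=1.0):
    """Scale a dynamic power density p0 (kW/cm^2) from supply v0 to v1 and
    clock f0 to f1, assuming P ~ C * V^2 * f with C per cm^2 held fixed."""
    return p0 * (v1 / v0) ** 2 * (f1 / f0)

# 1.2 V -> 0.6 V at the same clock
print(scaled_power_density(20.0, 1.2, 0.6))                         # 5.0
# Same supply change, clock also dropped from 30 GHz to ~10 GHz
print(round(scaled_power_density(20.0, 1.2, 0.6, 30.0, 10.0), 2))   # 1.67
```

The first line reproduces the slide's 5kW/cm²; also cutting the clock to ~10GHz gives roughly 1.7kW/cm², the same order as the 1kW/cm² design point assumed from [Frank].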
Transistor Count
[Figure: transistors per chip (millions, log scale) vs. year for the 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium® proc and P6, extrapolated to 200M, 425M, 900M and 1.8B. Source: S. Borkar.]
200M–1.8B transistors on the lead microprocessor

Power will be a problem
[Figure: power (W, log scale) vs. year for the same processor line, extrapolated to 500W, 1.5KW, 5KW and 18KW. Source: S. Borkar.]
Power delivery and dissipation will be prohibitive
Power is a Limiting Factor
• If we have a 2cm x 2cm die in a high-performance microprocessor, we will end up with 4kW of power dissipation.
• If power has to be limited to 180W, we can afford to have only 4.5% of these devices with a 0.6V supply on the die, given that nothing else dissipates power.

Microprocessor Design
• The core datapath will be running at 7-10GHz
• Requires fast devices: low thresholds with 0.5-0.6V power supplies
• Lowest NMOS VTh ~ -0.1V to get swing in CMOS.
• Assume a threshold of 0-0.1V. The devices will be very leaky; a second threshold will be used to control leakage power.
• With the second threshold set for 10x less leakage, 90% of the devices off critical paths can be made high-threshold.
• Power limits the size of the µP core to 5-10% of the die (today's transistor count, just shrunk) and 30-50% of the total power budget.
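The arithmetic behind these two slides can be checked directly. A sketch under our own assumptions: every device dissipates at the 1kW/cm² design point from the Power Density slide, leakage is uniform per device, and "90%" is read as 90% of all devices (the helper names are hypothetical):

```python
# Assumptions (ours): 1 kW/cm^2 for every device on the die, uniform
# leakage per device, and "90%" read as 90% of all devices.

def die_power_w(die_area_cm2, density_w_per_cm2):
    """Total power if every device on the die runs at the given density."""
    return die_area_cm2 * density_w_per_cm2

def affordable_fraction(budget_w, full_die_w):
    """Fraction of devices the budget allows, if nothing else dissipates."""
    return budget_w / full_die_w

def relative_leakage(frac_high_vt, leakage_reduction):
    """Leakage relative to an all-low-threshold design when frac_high_vt of
    the devices use a threshold that leaks leakage_reduction times less."""
    return (1.0 - frac_high_vt) + frac_high_vt / leakage_reduction

full = die_power_w(2.0 * 2.0, 1000.0)
print(full)                                     # 4000.0
print(affordable_fraction(180.0, full))         # 0.045, i.e. 4.5%
print(round(relative_leakage(0.9, 10.0), 2))    # 0.19
```

Under these assumptions the second threshold cuts total leakage about 5x (1/0.19), not 10x: the 10% of devices left at the low threshold contribute more than half of what remains.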
Possible Scenario
• Example: 0.5% of devices will be of the highest performance
• 35% of their power is leakage (assume: 20% drain, 10% gate, 5% drain-to-body)
• 65% is active power; if just 0.5% of the devices are of this kind, active (CV²) power = 13W and leakage = 7W
• What would the other 99.5% of the devices populating the 2cm x 2cm die look like?
• They will run at 10x lower frequency, at 0.5-0.7 of the processor VDD = 0.25-0.35V
• Threshold for critical paths: VTh = 150mV
• Need leakage power management: another threshold or control of VT

Add Dedicated Datapath
• Can execute, e.g., a DIVX decoder or graphics
[Figure: one logic block vs. two parallel logic blocks at half the supply and half the clock]
Parameter        One block    Two parallel blocks
Vdd              1            0.5 (Vdd/2)
Freq             1            0.5
Throughput       1            1
Power            1            0.25
Area             1            2
Pwr Den          1            0.125
Parallel version: Leakage Curr. = 2
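Both slides reduce to short calculations. A sketch, assuming the 4kW full-die figure from the Power is a Limiting Factor slide and P ∝ C·V²·f with all quantities normalized to one reference block (the helper names are hypothetical):

```python
FULL_DIE_POWER_W = 4000.0  # 2 cm x 2 cm die at ~1 kW/cm^2 (from earlier slides)

def scenario_power(frac_devices, leakage_shares):
    """Split the power of the fastest fraction of devices into active and
    leakage parts; leakage_shares are fractions of that power per mechanism."""
    total = FULL_DIE_POWER_W * frac_devices
    leakage = {name: total * share for name, share in leakage_shares.items()}
    active = total - sum(leakage.values())
    return total, active, leakage

def parallel_blocks(n, vdd, freq):
    """Metrics for n copies of a logic block, each at relative supply vdd and
    clock freq, normalized to one block at vdd = freq = 1 (P ~ C * V^2 * f)."""
    power = n * vdd ** 2 * freq
    return {
        "throughput": n * freq,
        "power": power,
        "area": n,
        "power_density": power / n,
        "leakage_current": n,  # counted simply as proportional to device count
    }

total, active, leakage = scenario_power(
    0.005, {"drain": 0.20, "gate": 0.10, "drain-to-body": 0.05})
print(round(total, 1), round(active, 1), round(sum(leakage.values()), 1))  # 20.0 13.0 7.0
print(parallel_blocks(2, 0.5, 0.5))
# {'throughput': 1.0, 'power': 0.25, 'area': 2, 'power_density': 0.125, 'leakage_current': 2}
```

The second call reproduces the table: halving Vdd and the clock quarters power per unit of throughput, while doubling area and device count, which is why leakage current doubles.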
Microprocessors
[Figure: power and area shares of the µP core and cache.]

180W Gives Us:
[Figure: die composition today → 20nm: µP core, memory, dedicated logic and dedicated datapath; core clock 2GHz today, 7-10 GHz at 20nm.]
Today's Design Methodologies
The Deep Sub-Micron (DSM) Effect (≤ 0.25µ)

"Microscopic Problems" (∝ DSM): Everything Looks a Little Different?
• Wiring Load Management
• Noise, Crosstalk
• Reliability, Manufacturability
• Complexity: LRC, ERC
• Accurate Power Prediction
• Accurate Delay Prediction
• etc.

"Macroscopic Issues" (∝ 1/DSM): …and There's a Lot of Them!
• Time-to-Market
• Millions of Gates
• High-Level Abstractions
• Reuse & IP: Portability
• Predictability
• etc.

Memory Will Not Scale Much Further
• Density is the key requirement
• Will occupy 70-80% of the die
• Low leakage
• Low activity: inherently low active power and low power density (at least 10x less than logic)
• Needs a higher VTh ~ 0.5V and a higher supply, 0.8-1V (?)
Systems-on-a-Chip
Today → 20nm
[Figure: Broadcom set-top box SoC today: 25M transistors, 3MB embedded SRAM, MIPS core @ 100MHz, DSP @ 144MHz, 7 PLLs, 12 ADC, DACs, 100 clocks, 1.4W; projection to 20nm with a radio block (60GHz (?), CMOS ?).]

The Productivity Gap
[Figure: logic transistors per chip and transistors per staff-month vs. year (log scales), with technology nodes 2.5µ, .35µ and .10µ marked. Complexity grows at a 58%/yr compound rate; productivity grows at a 21%/yr compound rate. Source: SEMATECH.]
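The two compound rates on the Productivity Gap chart imply that the gap itself compounds. A small sketch, assuming the SEMATECH rates stay constant over the whole period (our simplification):

```python
import math

COMPLEXITY_GROWTH = 0.58    # logic transistors/chip, per year (compound)
PRODUCTIVITY_GROWTH = 0.21  # transistors/staff-month, per year (compound)

def gap_after(years):
    """Factor by which design complexity outgrows designer productivity."""
    return ((1 + COMPLEXITY_GROWTH) / (1 + PRODUCTIVITY_GROWTH)) ** years

yearly = gap_after(1)
doubling_years = math.log(2) / math.log(yearly)
print(round(yearly, 3))          # 1.306
print(round(doubling_years, 1))  # 2.6
print(round(gap_after(10), 1))   # 14.4
```

At these rates the gap doubles roughly every 2.6 years, so a decade leaves complexity about 14x ahead of productivity.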
Transistor Requirements
• Will need different kinds of transistors:
  » Datapaths (speed, leakage)
  » Dedicated DSP (power, leakage)
  » Memory (density is the main concern)
  » Analog (?)
• Power and leakage determine the size ratios between these blocks
• The number of different transistor types is determined by parameter spread
• Fewer device types could solve the problem, but that requires control of the threshold (4th terminal), with a strong transfer function

Implementation Methodologies
Digital Circuit Implementation Approaches
» Custom
» Semi-custom
   » Cell-Based: Standard Cells, Compiled Cells, Macro Cells
   » Array-Based: Pre-diffused (Gate Arrays), Pre-wired (FPGA)
Custom Design – Layout Editor
[Figure: Magic Layout Editor (UC Berkeley).]

Gate Array – Sea-of-gates
[Figure: rows of uncommitted cells separated by routing channels; an uncommitted cell (polysilicon, metal, contacts, VDD and GND rails) and a committed cell wired as a 4-input NOR with inputs In1–In4 and output Out.]
Standard Cell - Example
[Figure: 3-input NAND cell (from Mississippi State Library) characterized for fanout of 4 and for three different technologies.]

Sea-of-gate Primitive Cells
[Figure: primitive cells with PMOS and NMOS transistor rows, using oxide-isolation and using gate-isolation.]
Synthesis
1. Describe your circuit in HDL (VHDL, Verilog).
2. Synthesis programs map it onto a standard cell library. Set the constraints (timing, area).
3. Get a gate-level netlist; run automatic place and route.
4. Insert the clock.
5. Extract the netlist from the layout.
6. Does it meet the constraints? If not, go back to steps 1, 2, 3 or 4. This iteration is called "design closure": timing closure, power closure.

Sea-of-gates
[Figure: LSI Logic LEA300K (0.6 µm CMOS) sea-of-gates chip, with regions of random logic and a memory subsystem.]
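For timing, step 6 of the synthesis flow ("Does it meet the constraints?") is commonly answered by static timing analysis: find the longest path through the gate-level netlist and compare it with the clock period. A toy sketch, assuming a hypothetical netlist given as a DAG with a fixed delay per gate and no wire delay; real tools also model interconnect, slews and setup/hold. Requires Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

def critical_path_delay(fanin, delay):
    """Longest arrival time at any node.
    fanin: gate -> list of driving nodes (primary inputs need no entry)
    delay: gate -> intrinsic delay in ps (hypothetical values)"""
    arrival = {}
    # static_order() yields every node after all of its drivers
    for gate in TopologicalSorter(fanin).static_order():
        start = max((arrival[d] for d in fanin.get(gate, [])), default=0.0)
        arrival[gate] = start + delay.get(gate, 0.0)
    return max(arrival.values())

def meets_timing(fanin, delay, clock_period_ps):
    """Return (meets, slack): slack < 0 means go back and iterate."""
    slack = clock_period_ps - critical_path_delay(fanin, delay)
    return slack >= 0, slack

# Hypothetical netlist: primary inputs a, b, c
fanin = {"n1": ["a", "b"], "n2": ["n1", "c"], "out": ["n2", "n1"]}
delay = {"n1": 20.0, "n2": 30.0, "out": 25.0}
print(critical_path_delay(fanin, delay))   # 75.0
print(meets_timing(fanin, delay, 100.0))   # (True, 25.0)
print(meets_timing(fanin, delay, 60.0))    # (False, -15.0)
```

The slide also lists power closure; this sketch covers only the timing half of the loop.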
Prewired Arrays
Categories of prewired arrays (or field-programmable devices):
• Fuse-based (program-once)
• Non-volatile EPROM based
• RAM based

Field-Programmable Gate Arrays – Fuse-based
[Figure: standard-cell-like floorplan: rows of logic modules separated by routing channels, vertical routes, I/O buffers on all four sides, and program/test/diagnostics circuitry.]
Programmable Logic Devices
[Figure: PAL, PLA and PROM structures.]

Interconnect
[Figure: programming interconnect using anti-fuses: a cell with an input/output pin connected through horizontal and vertical tracks; a programmed interconnection is made by an antifuse.]
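A PLA computes sum-of-products logic with a programmable AND plane feeding a programmable OR plane; a PAL fixes the OR plane, and a PROM fixes the AND plane as a full address decoder. A minimal behavioral model, using a hypothetical encoding in which each product term maps an input index to its required value:

```python
from typing import Dict, List, Sequence

ProductTerm = Dict[int, int]  # input index -> required bit; absent = don't care

def pla_eval(inputs: Sequence[int],
             and_plane: List[ProductTerm],
             or_plane: List[List[int]]) -> List[int]:
    """AND plane forms the product terms; each output ORs its selected terms."""
    products = [int(all(inputs[i] == v for i, v in term.items()))
                for term in and_plane]
    return [int(any(products[t] for t in terms)) for terms in or_plane]

# "Program" a half adder on inputs (a, b)
and_plane = [{0: 1, 1: 0}, {0: 0, 1: 1}, {0: 1, 1: 1}]  # a·b', a'·b, a·b
or_plane = [[0, 1], [2]]                                # sum = a⊕b, carry = a·b
for a in (0, 1):
    for b in (0, 1):
        print(a, b, pla_eval([a, b], and_plane, or_plane))
```

Reprogramming the device amounts to changing `and_plane` and `or_plane`; the evaluation structure stays fixed, much as the silicon does.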
EPLD Block Diagram
[Figure: EPLD block diagram: primary inputs routed to macrocells. Courtesy Altera Corp.]

Field-Programmable Gate Arrays – RAM-based
[Figure: array of CLBs connected by horizontal and vertical routing channels, switching matrices and interconnect points.]
RAM-based FPGA Basic Cell (CLB)
[Figure: Xilinx CLB with combinational logic and storage elements: two function generators F and G, each implementing any function of up to 4 variables, and two D flip-flops (outputs Q1, Q2) with clock enable CE and a common clock; inputs A, B/Q1/Q2, C/Q1/Q2, D, E. Courtesy of Xilinx.]

Architecture ReUse
• Silicon System Platform
  » Flexible architecture for hardware and software
  » Specific (programmable) components
  » Network architecture
  » Software modules
  » Rules and guidelines for design of HW and SW
• Has been successful in PCs
  » Dominance of a few players who specify and control the architecture
• Application-domain specific (difference in constraints)
  » Speed (compute power)
  » Dissipation
  » Costs
  » Real / non-real time data
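In the RAM-based FPGA basic cell (CLB) above, "any function of up to 4 variables" is realized as a lookup table: 16 configuration bits stored in SRAM and addressed by the inputs. A minimal sketch of the idea (the class name and bit ordering are our assumptions, not Xilinx's configuration format), programmed here as the 4-input NOR from the sea-of-gates slide:

```python
from typing import Callable

class Lut4:
    """16x1 lookup table: configuration bits indexed by the 4 input bits."""

    def __init__(self, func: Callable[[int, int, int, int], int]):
        # "Configure" the LUT by storing func's truth table.
        # Bit i holds func(a, b, c, d) with i = a + 2b + 4c + 8d (our ordering).
        self.bits = 0
        for i in range(16):
            a, b, c, d = i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
            if func(a, b, c, d):
                self.bits |= 1 << i

    def __call__(self, a: int, b: int, c: int, d: int) -> int:
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.bits >> index) & 1

nor4 = Lut4(lambda a, b, c, d: int(not (a or b or c or d)))
print(hex(nor4.bits))                         # 0x1: only the all-zeros entry is 1
print(nor4(0, 0, 0, 0), nor4(1, 0, 0, 0))     # 1 0
```

Any other 4-input function is just a different 16-bit pattern, e.g. `Lut4(lambda a, b, c, d: a ^ b ^ c ^ d)` for parity, which is why the same cell can be reconfigured by rewriting its RAM.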
Platform-Based Design
“Only the consumer gets freedom of choice; designers need freedom from choice” (Orfali, et al, 1996, p.522)
• A platform is a restriction on the space of possible implementation choices, providing a well-defined abstraction of the underlying technology for the application developer
• New platforms will be defined at the architecture-micro-architecture boundary
• They will be component-based, and will provide a range of choices from structured-custom to fully programmable implementations
• Key to such approaches is the representation of communication in the platform model
Source: R. Newton

RAM-based FPGA
[Figure: Xilinx XC4025.]
Addressing the Design Complexity Issue: Architecture Reuse
Reuse comes in generations:
Generation   Reuse element    Status
1st          Standard cells   Well established
2nd          IP blocks        Being introduced
3rd          Architecture     Emerging
4th          IC               Early research
Source: Theo Claasen (Philips) – DAC 00

Design at a crossroad: System-on-a-Chip
[Figure: example SoC combining a 500k-gate FPGA, 1 Gbit DRAM, a multi-spectral imager with analog preprocessing, a 64-SIMD processor array + SRAM, a µC system, 100 GOPS image conditioning, recognition, and 2 Gbit DRAM.]
• Embedded applications where cost, performance, and energy are the real issues!
• DSP and control intensive
• Mixed-mode
• Combines programmable and application-specific modules
• Software plays a crucial role
EE 141 Summary
• Digital CMOS design is alive and kicking
• Some major challenges down the road, caused by deep sub-micron:
  » Super-GHz design
  » Power consumption!!!!
  » Reliability: making it work
  » Device variations
  Some new circuit solutions are bound to emerge
• Who can afford design in the years to come? Some major design methodology change is in the making!