Design of Clock Distribution in High Performance Processors
Ian Young
Intel Senior Fellow and Director, Advanced Circuits and Technology Integration (ACTI)
Technology Manufacturing Group (TMG) Intel Corporation, Hillsboro, Oregon
Clock Distribution in Microprocessors, I. Young, 3/30/2005

Desktop Clock Frequency Trend
[Figure: clock frequency (log scale, 10 MHz to 10 GHz) versus year (1985 to 2005), with data points for the 486, Pentium, PII, PIII, and P4.]

Outline
Introduction to Synchronous Logic and clock distribution
Manufacturing effects
Early history of clocking: 80486, Pentium and Pentium II
Itanium active deskewing clock distribution
Pentium4 clock distribution
Montecito (Itanium family next gen.) clock distribution
Summary
Synchronous Logic
Logic progresses at a rate controlled by the clock
- Retiming removes the effects of different logic and wire delays
- Slows down signals that arrive too fast
Requires a state element
- Latch stores the input when the clock is low
- Flip-flop stores the input when the clock rises
Requires a precise clock
Enables CPU pipelining and high-throughput CPUs
Flip-Flop: Set-up and Hold Times
Setup time: the time before the clock edge during which the data signal must be valid in order to be stored.
Hold time: the time after the clock edge during which the data signal must remain valid in order to be stored.
[Figure: timing diagram of Data In, Clock, and Data Out, with the setup and hold times marked around the clock edge.]
The Clock Distribution Problem
Deliver the clock signal from the source (PLL) to all the receivers with the best timing precision.
Clock skew is the inaccuracy of the same clock edge arriving at various locations in the chip (spatial separation).
Clock jitter is the inaccuracy of consecutive clock edges arriving at the same location (temporal separation).
[Figure: a PLL driving clock points A, B, and C, illustrating skew between locations and jitter between consecutive edges.]
Clock and Logic Structure & Operation
[Figure: flip-flops U1 and U2 connected on the data path through computation block C. The clock circuitry drives U1 through buffer B1 (clock path 1) and U2 through buffer B2 (clock path 2); each buffer controls data capture and transmit for its flip-flop.]
Clock Delay 1 = delay from Clock to the clock input of U1.
Clock Delay 2 = delay from Clock to the clock input of U2.
Clock Skew = Clock Delay 2 - Clock Delay 1 (should be zero).
SETUP violation: Data arrives at U2 too late and is not captured in the intended U2 clock cycle.
[Figure: U1 and U2 separated by combinational logic C, with Data In, Clock, and Data Out waveforms for each flip-flop; the data arrives too late relative to the skewed U2 clock.]
HOLD violation: Data arrives at U2 too soon and is not held long enough to be captured correctly by the U2 clock.
[Figure: U1 and U2 separated by combinational logic C, with Data In, Clock, and Data Out waveforms for each flip-flop; the data arrives too soon relative to the skewed U2 clock.]
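The setup and hold violations above can be expressed as two slack checks on the U1-to-U2 path. The following is a minimal sketch, not taken from the talk: it uses the skew definition from the earlier slide (Clock Delay 2 minus Clock Delay 1), ignores jitter, and all parameter names and example numbers are assumptions chosen for illustration.

```python
def setup_slack(period, clk_delay_1, clk_delay_2, t_clk_to_q, t_logic_max, t_setup):
    """Setup slack at U2 (negative means a setup violation). All times in ps."""
    skew = clk_delay_2 - clk_delay_1  # Clock Skew as defined on the slide
    # Data launched by U1 must arrive t_setup before the NEXT clock edge at U2.
    return (period + skew) - (t_clk_to_q + t_logic_max + t_setup)


def hold_slack(clk_delay_1, clk_delay_2, t_clk_to_q, t_logic_min, t_hold):
    """Hold slack at U2 (negative means a hold violation). All times in ps."""
    skew = clk_delay_2 - clk_delay_1
    # New data must not reach U2 until t_hold after the SAME clock edge at U2.
    return (t_clk_to_q + t_logic_min) - (t_hold + skew)


# Hypothetical numbers: U2's clock arrives 60 ps early, so a long path fails setup.
print(setup_slack(period=500, clk_delay_1=100, clk_delay_2=40,
                  t_clk_to_q=50, t_logic_max=380, t_setup=30))
# Hypothetical numbers: U2's clock arrives 60 ps late, so a short path fails hold.
print(hold_slack(clk_delay_1=40, clk_delay_2=100,
                 t_clk_to_q=50, t_logic_min=20, t_hold=30))
```

Under this model, skew of either sign costs margin: an early U2 clock eats into the setup budget, while a late U2 clock eats into the hold budget.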
Challenges for Microprocessor Clock Design
Rapid increases in core clock frequencies
- 1991: 100 MHz (0.8 µm)
- 1997: 400 MHz (0.35 µm)
- 2001: 2.0 GHz (0.13 µm)
- 2005: 3.8 GHz (90 nm)
Increasing clock load, as indicated by total transistors per die
- 1991: 1.2 million transistors (0.8 µm)
- 1997: 7.5 million transistors (0.35 µm)
- 2001: 42 million transistors (0.13 µm)
- 2005: 1.7 billion transistors (90 nm)
Worsening within-die process variations
- Lithography and etch
- Supply noise
- Hot spots
Clock Distribution H-Tree (2 level): Global / Local Skew
[Figure: an external 100 MHz clock feeds a 2 GHz PLL, which drives an H-tree with levels labeled L1, L2, and L3.]
Poly Gate Transistor Length Variation Sources
Long-range within-die
- Stepper lens aberrations
Proximity effects (systematic)
- Nested or isolated
Random component
- Stepper lens
- Poly gate line edge roughness
- Threshold voltage
[Figure: transistor showing the gate length Lgate between source and drain.]
Transistor Vth Variation Sources
Die-to-die
Random component — From Random Dopant Fluctuations f (W, L)
Short channel component.
Well and Halo doping
Vth Variation Model
Model relates variability to device size:
$$\sigma(\Delta V_t) = C_1 + \frac{C_2}{\sqrt{W_e L_e}}$$
where $W_e$ and $L_e$ are the effective device width and length. $C_1$ and $C_2$ relate to physical device parameters such as Tox, junction depth, etc.
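As a rough illustration of how such a model behaves, the sketch below evaluates a size-dependent Vth spread. It assumes the common inverse-square-root-of-area form for the size-dependent term; the constants passed in are arbitrary placeholders, not values from the talk.

```python
import math


def sigma_delta_vt(c1, c2, w_e, l_e):
    """Vth mismatch spread for effective width w_e and length l_e.

    Assumes sigma = c1 + c2 / sqrt(w_e * l_e); c1 and c2 are
    hypothetical fitting constants supplied by the caller.
    """
    return c1 + c2 / math.sqrt(w_e * l_e)


# Placeholder constants: quadrupling the gate area halves the size-dependent term.
print(sigma_delta_vt(c1=2.0, c2=3.0, w_e=1.0, l_e=1.0))
print(sigma_delta_vt(c1=2.0, c2=3.0, w_e=4.0, l_e=1.0))
```

The practical consequence for clock buffers is that small devices see the largest random Vth spread, which is one reason within-die variation shows up as clock skew.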
Interconnect Variability
Conductor width and space
Conductor Thickness
Dielectric Thickness
Inductance — needs to be analyzed and modeled for busses
Interconnect Variability
[Figure: cross-section of the metal stack, M1 through M7, showing metal lines, vias, and insulating dielectric; M7 carries power and global signals. Annotations: conductor thickness, dielectric thickness, and inductance (needs to be analyzed and modeled).]
The 50 MHz 80486 Microprocessor (1991)
• 3-layer metal
• 2-phase clocking
• RC clock skew
• On-chip PLL
[Figure: chip view with the PLL labeled.]
80486 Microprocessor: PLL
[Figure: PLL block diagram. Ext Clk (Clkin) and Fb Clk feed a phase/frequency detector, followed by a charge pump (CP), a VCO controlled by Vcntl, a divide-by-2, and clock buffers producing Ph1/Ph2 and Clkout.]
50% duty-cycle 2φ clocks; Clkin and Clkout compatible with the prior generation
Internal clock skews between chips reduced by ~2 ns
Enables 0-1 ns hold time for frequency scalability
VCO designed with:
- Wide frequency range (5-130 MHz)
- Supply voltage noise rejection

Timing         Conventional   With PLL
Setup          3 ns           1.5 ns
Hold           2.5 ns         1.0 ns
Output valid   9.0 ns         7.5 ns
Pentium Microprocessor Clock Distribution
Single 50% duty-cycle clock
66-133 MHz internal frequency
Internal clock frequency vs. external bus frequency: ratios of 2:1, 3:2, 1:1
[Figure: clock distribution from the PLL clock generator (T = 0 ns) through serpentines for RC matching to the global clock drivers located in the pad ring, with local clock enables.]
Pentium II (P6) Microprocessor Clock Distribution
Buffer network designed for > 300 MHz
Minimized the propagation delay
Minimized global clock skew
Global clock power down
Supply di/dt noise reduction
- Vdd / Vss decoupling capacitance
- Minimize Vdd / Vss DC resistance (IR drop)
- Minimize Vdd / Vss AC resistance and inductance
[Figure: clock distribution network. The PLL feeds the global clock drivers, which drive gclk# to the local clock drivers; the local clock drivers are gated by Clock_en and produce clk. Delays of 1.2 ns and 2 ns are annotated.]
Pentium II Global Clock Skew: measured test chip results
[Figure: floor plan of measured skew, from the input point to the local buffers with clock gating. A 5-level driver for a 500 pF load feeds the M4 global distribution with an M4 metal strapping ring.]
SK = skew relative to the feedback point from the local buffer
Measured SK values: -564 ps, -592 ps, -488 ps, -476 ps, -460 ps, -424 ps, -548 ps
Skew across the M4 global distribution = 140 ps
Itanium Processor Clocking
IA-64 architecture
0.18 µm CMOS
6 metal layers
25.4M transistors
800 MHz frequency
[Figure: die floorplan showing IA-32 Control, FPU, IA-64 Control, TLB, Integer Units, Instr. Fetch & Decode, Cache, and Bus.]
Ref: S. Tam, S. Rusu, U. Desai, R. Kim, J. Zhang, I. Young JSSC Nov 2000
Clock Distribution Hierarchy
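[Figure: three-level hierarchy. Global distribution: the PLL sends the main clock (CLKP/CLKN) and the reference clock to the deskew buffers (DSK). Regional distribution: each DSK feeds Regional Clock Drivers (RCD). Local distribution: local buffers labeled DLCLK and OTB.]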
Itanium Global Clock Distribution
Balanced H-tree routed in M5 and M6
Lateral shielding
Distributes both main and reference clock
Optimized to account for inductive effects
[Figure: H-tree from the PLL to eight deskew clusters (DSK).]
Itanium Regional Clock Distribution
Distributed array of deskew buffers to reduce process-related skew
- 8 deskew clusters, each holding up to 4 buffers
Regional clock grids driven by modular Regional Clock Drivers
- M4-M5 grid tailored for the clock load density of the underlying block
- Full support for scan and clock gating
[Figure: eight DSK clusters arranged around the CDC. DSK = cluster of 4 deskew buffers; CDC = Central Deskew Controller.]
Deskew Buffer: Block Diagram
[Figure: the global clock passes through the deskew buffer delay circuit to an RCD that drives the regional clock grid. A local controller compares the reference clock (through its own RCD) with the regional feedback clock and sets the delay; a TAP interface connects to the deskew buffer.]
Deskew covers the entire clock distribution up to the input of the local clock buffer
Deskew Buffer: Delay Circuit
[Figure: delay circuit between Input and Output with an Enable, controlled by a 20-bit Delay Control Register accessed through the TAP interface.]
Step size = 8.5 ps
Deskew range = 170 ps
Small step size enables fine-granularity skew control over a wide range
TAP read/write access to the Control Register enables faster timing debug and performance tuning
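The numbers on this slide are consistent with each register bit contributing one 8.5 ps step (20 x 8.5 ps = 170 ps). The sketch below models the register that way; the one-step-per-bit encoding is an assumption inferred from that arithmetic, not something the slide states, and the function name is hypothetical.

```python
STEP_PS = 8.5        # step size from the slide
REGISTER_BITS = 20   # width of the Delay Control Register from the slide


def deskew_delay_ps(control_word: int) -> float:
    """Added delay for a control word, assuming each set bit adds one step."""
    if not 0 <= control_word < (1 << REGISTER_BITS):
        raise ValueError("control word does not fit in the 20-bit register")
    enabled_steps = bin(control_word).count("1")
    return enabled_steps * STEP_PS


print(deskew_delay_ps(0))                          # no added delay
print(deskew_delay_ps((1 << REGISTER_BITS) - 1))   # full deskew range
```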
Deskew Buffer: Local Controller
[Figure: a phase detector compares the reference clock with the feedback clock; its lead/lag output passes through a digital low-pass filter to the deskew buffer register. A 16-to-1 counter generates the enable.]
Phase detector output sampled every 16 core cycles
6-tap digital low-pass filter reduces comparison noise
Local controller ensures stable deskew operation
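A behavioral sketch of such a controller follows. It samples a lead/lag bit every 16 cycles and moves the delay setting by one step only when the last six samples agree. Modeling the 6-tap filter as a unanimous six-sample window is an assumption made for illustration, as are the class name, the starting setting, and the 0 to 20 setting range.

```python
from collections import deque

SAMPLE_INTERVAL = 16  # core cycles between phase detector samples (from the slide)
FILTER_TAPS = 6       # filter length (from the slide)
MAX_SETTING = 20      # assumed number of delay steps available


class LocalController:
    """Hypothetical bang-bang deskew controller with a simple digital filter."""

    def __init__(self, setting: int = 10):
        self.setting = setting
        self.history = deque(maxlen=FILTER_TAPS)
        self.cycle = 0

    def clock(self, feedback_leads: bool) -> int:
        """Advance one core cycle; feedback_leads is the phase detector output."""
        self.cycle += 1
        if self.cycle % SAMPLE_INTERVAL != 0:
            return self.setting  # phase detector is not sampled this cycle
        self.history.append(1 if feedback_leads else -1)
        if len(self.history) == FILTER_TAPS and abs(sum(self.history)) == FILTER_TAPS:
            # All six samples agree: add delay if the feedback clock leads,
            # remove delay if it lags, then start a fresh window.
            step = 1 if self.history[0] > 0 else -1
            self.setting = min(MAX_SETTING, max(0, self.setting + step))
            self.history.clear()
        return self.setting


ctrl = LocalController()
for _ in range(SAMPLE_INTERVAL * FILTER_TAPS):
    ctrl.clock(feedback_leads=True)
print(ctrl.setting)
```

Filtering before stepping keeps a noisy comparison from toggling the delay setting back and forth, which is the stability property the slide attributes to the local controller.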
Itanium Skew Measurements
[Figure: bar chart of distribution skew (ps, 0 to 120) for each regional clock, R01 through R71.]
Projected max skew without deskew mechanism = 110 ps
Max skew with deskew mechanism = 28 ps
Clock Skew Timing Budgets
[Figure and table: clock paths through DSK, RCD, LCB, and CB from the main and reference clocks, with four skew budget categories (Ts1 through Ts4) assigned according to the point the two clock paths have in common, from a common LCB (Ts1) outward.]
Ts1 < Ts2 < Ts3 < Ts4
Multiple skew budgets minimize the skew penalty and enable timing optimization
Pentium® 4 Clock Challenges
Enable the NetBurst™ micro-architecture for 0.18 µm technology
≥ 2 GHz clock for the Hyper Pipelined Technology core
≥ 4 GHz clock for the Rapid Execution Engine
≥ 400 MHz I/O clock for fast data transfer
- < 10% clock inaccuracy
Enable clock gating for low power
Clock observability and controllability for fast debug
Reference: Kurd, Barkatullah, Dizon, Fletcher, Madland JSSC Nov 2001
Clock Generation & Distribution
[Figure: the 100 MHz system clocks feed two PLLs. The core clock PLL and distribution drive local clock drivers (LCDs) producing core clocks at 2x (4 GHz), 1x (2 GHz), and ½x (1 GHz). The I/O clock PLL and distribution drive LCDs producing I/O clocks at 1x (100 MHz), 2x (200 MHz), and 4x (400 MHz).]
Clock Distribution
[Figure: the system clocks feed the core clock PLL and the I/O clock PLL; each PLL's distribution network drives the local clock drivers (LCDs) for the core clocks (2x, 1x, ½x) and the I/O clocks (1x, 2x, 4x).]
Binary distribution tree in three spines
[Figure: binary distribution tree from the PLL feeding three clock spines.]
Triple Clock Spines
[Figure: from the PLL, a binary distribution tree feeds the top, middle, and bottom spines; a skew optimizer sits with the spines, which drive the local clock drivers.]
Skew Optimization Scheme
[Figure: the PLL drives a binary tree of clock repeaters into adjustable domain buffers DB1 through DB47. Each domain buffer feeds local clock macros whose local clock drivers (LCD) clock the sequential elements (SE). Phase detectors (PD), supplied from a filtered local VCC, compare neighboring clock domains and report to the skew optimizer, which connects to the Test Access Port.]
Skew Profile Graph
[Figure: relative timing (ps, -40 to +40) across the die from left to right for the top, middle, and bottom spines, shown before and after skew optimization against the reference.]
Skew Control Scheme
For the particular die example shown
- Pre-compensation max skew ~ ±32 ps
- Post-compensation max skew ±8 ps
Side benefits
- Provides a within-die skew profile
- Clocks can be deliberately skewed for performance: a 200 MHz frequency increase was obtained
Montecito Clock System Floorplan
[Figure: floorplan showing Core 0 and Core 1 with their DFDs and RVDs, the FSB DFDs, the bus logic DFD, the Foxton controller DFD, and the PLL / translation table / clock control block.]
References: ISSCC 2005 Paper 16.1, T. Fischer et al.; ISSCC 2005 Paper 16.2, P. Mahoney et al.

Clock System Architecture
[Figure: the bus clock, pins, and fuses select frequency divisors through a translation table; the PLL output is sent over a balanced-tree clock distribution to the DFDs. DFDs for Core0, Core1, Foxton, the I/Os, and the bus logic each feed an SLCB, CVD, and gater chain. The cores sit on the variable supply with RVDs and RADs (phase aligners); the fixed-supply domains use 1/N, 1/M, and 1/1 dividers.]

Montecito Clock Distribution Summary (1)
Core clock frequency is controlled in real time based on the DC and transient power supply voltage
- Montecito/Foxton power delivery sets the DC supply voltage based on a power dissipation of 100 W (temperature sensors around the chip). No worst-case power.
- L·di/dt supply noise transients slow critical paths and reduce the operating frequency for a few cycles. The constantly varying frequency responds to the transient behavior of the core supply voltage.
The Regional Active Deskewing system reduces the process, voltage, and temperature sources of skew across the 21.5 mm x 27.7 mm die.
Clock Vernier Devices (CVD) inserted at each local clock buffer allow 70 ps of adjustment via scan control.
The clock distribution system consumes less than 25 W for the 30 mm route from the PLL through the clock tree to all the latches.
Overview with block diagram
[Figure: the clock path from the bus clock and PLL through repeaters (L0 route) to the DFDs, then to the SLCBs (L1 route), to the CVDs (L2 route), and through the gaters to the latches (L3 route), shown for core0, core1, Foxton, the IOs, and the bus logic, with RADs on the cores. Annotations distinguish fixed-frequency from variable-frequency sections, low-voltage-swing from full-rail transitions, and differential from single-ended routing.]

Montecito Clock Distribution Summary (2)
The L0 clock route is differential, with a 400 mV swing and a resistive load at the end of each line (length = 20 mm)
- Line width tapering is used
- A self-biased differential amplifier is the repeater
The L1 route (length = 2 mm), from the Digital Frequency Divider (DFD) to the Second Level Clock Buffer (SLCB), is distributed as half-frequency 0°/90° clocks that are XOR'd in the SLCB and are not duty-cycle sensitive
The L2 route (length = 3 mm), from the SLCBs to the Clock Vernier Devices (CVDs), has < 6 ps skew using optimization with a CAD timing tool
The L3 route (length = 2 mm), from the CVDs through the clock gaters to the latches, provides an overall clock skew adjustment to < 10 ps at the latches, controlled by test scan
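The L1 scheme relies on the fact that XOR-ing two half-frequency square waves that are 90° apart yields a full-frequency clock. The sketch below checks this with ideal 50% duty-cycle inputs on an integer time grid; the function names and the 8-unit period are assumptions for illustration only.

```python
def square_wave(t, period, phase_deg=0.0):
    """Ideal 50% duty-cycle square wave: 1 during the first half of each period."""
    shifted = (t - period * phase_deg / 360.0) % period
    return 1 if shifted < period / 2 else 0


def xor_clock(t, half_rate_period):
    """Full-rate clock rebuilt from half-rate 0-degree and 90-degree clocks."""
    clk_0 = square_wave(t, half_rate_period, 0.0)
    clk_90 = square_wave(t, half_rate_period, 90.0)
    return clk_0 ^ clk_90


# One half-rate period of 8 time units contains two full-rate periods of 4 units.
print([xor_clock(t, 8) for t in range(8)])
```

Routing at half frequency halves the toggle rate on the L1 wires, and the full-rate clock is only reconstructed locally in the SLCB.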
Digital Frequency Divider (DFD) Block Diagram
[Figure: the differential PLL clock input feeds a 16-phase DLL, phase interpolation, and divide-by-2 stages under a state machine that applies period adjustments of +2 to -1 in response to RVD up/down requests. Outputs are the ½-frequency quadrature differential clock routes to the SLCBs and the full-frequency differential "utility" clock routes to the same-core ODCS and PCSMs. Startup, clock system control, scan, and trigger signals connect to the control logic.]
Voltage-to-Frequency Conversion (VFC) Per Core
[Figure: the L0 clock route from the PLL feeds three DFDs (DFD0, DFD1, DFD2). Each DFD is served by four RVDs (RVD0-3, RVD4-7, RVD8-11) and drives an L1 clock route to the SLCBs plus utility clocks.]

Variable Frequency Mode: CMOS Critical Path Scaling
[Figure: delay normalized to 1100 mV (0.5 to 2.5) versus supply voltage (600 to 1200 mV) for circuit1 through circuit5.]
RVD Coarse Delay Element
[Figure: transistor-level schematic of the RVD coarse delay element, including a Metal 1 serpentine resistor.]

RVD Block Diagram
[Figure: two pairs of delay lines (0A/0B and 1A/1B) with inputs dly0in and dly1in and outputs dly0out and dly1out, evaluated on eval0 and eval1 by the RVD FSM, which is clocked by clk and produces signals labeled HOLD and DOWN. An additional delay creates a deadzone.]
Example VFC Supply Droop Response
[Figure: waveforms of the DFD output clock (cycles 1 to 5), Vcore, the RVD delay line clock, and the period "UP" signal to the DFD.]
- A droop in Vcore increases the RVD delay line delay
- The increased delay asserts period "UP" for one cycle
- The clock period is increased
- No adjustment is needed in the following cycle
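The droop response can be sketched as a per-cycle comparison between the RVD delay line and the current clock period. This is a simplified behavioral model under stated assumptions: the step is taken as 1/64 of the nominal period (close to the 1.6% steps quoted for the DFD), the deadzone is modeled as a fixed margin before a "DOWN" request, and every name and number here is hypothetical.

```python
STEP_FRACTION = 1 / 64  # assumed period step, roughly 1.6% of nominal


def vfc_request(delay_line_ps, period_ps, deadzone_ps):
    """Return the period request for one cycle: 'UP', 'DOWN', or 'HOLD'."""
    if delay_line_ps > period_ps:
        return "UP"      # critical-path replica no longer fits in the period
    if delay_line_ps + deadzone_ps < period_ps:
        return "DOWN"    # margin exceeds the deadzone, so the period can shrink
    return "HOLD"


def next_period(period_ps, nominal_ps, request):
    """Apply one request to the clock period."""
    step = nominal_ps * STEP_FRACTION
    if request == "UP":
        return period_ps + step
    if request == "DOWN":
        return period_ps - step
    return period_ps


# Hypothetical droop: the delay line stretches from 630 ps to 645 ps.
period = 640.0
for delay in (630.0, 645.0, 645.0, 630.0, 630.0):
    request = vfc_request(delay, period, deadzone_ps=15.0)
    period = next_period(period, nominal_ps=640.0, request=request)
    print(delay, request, period)
```

In this model a droop produces a single "UP" request followed by "HOLD", matching the one-cycle adjustment shown in the figure; the deadzone keeps the loop from oscillating between "UP" and "DOWN".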
SLCB Block Diagram
[Figure: inputs ina, inb, inc, and ind feed a 128-bit delay element followed by an output buffer and duty-cycle control that produces SLCBO. Inputs PCa and PCb drive zone control, summer, and filter blocks that set a 128-bit thermometer shifter with a PRESET input. Scan registers hold the set-back and duty-cycle settings.]
CVD Circuit and Operation
[Figure: CVD circuit with input SLCBO, internal node SLCBOx, and output cvdo, with low, mid, and high delay settings.]
Drive fight with feedback is attenuated with a pass gate; the low, mid, and high settings change the delay as desired
SPICE simulations show the low, mid, and high delay settings for SLCBOx (top graph) and CVDO (bottom graph)
Route Statistics and Power
Total CPU route statistics

Route   Terminals    Distance    Delay
L0      14           20 mm       640 ps
L1      71           5 mm        215 ps
L2      14500        2-3.3 mm    60 ps
L3      ~5 million   0-1.5 mm    12 ps

[Figure: power dissipation contribution by route (L0, L1, L2, L3).]
• Highest load and most power dissipated in the L3 route.
• Future research into low-power clock distribution should focus on the last section of the route.
Evolution in Clock Distribution
1991 • 486: PLL on-chip to remove the large clock distribution delay (zero-delay buffer). Clock RC skew minimized across the chip with metal.
1993 • Pentium: Clock tree with length "tuning" for skew balancing.
1995 • Pentium II: Binary clock tree in a center spine, with branch length "tuning" to the local clock buffers for skew balancing.
1998 • Itanium I: Lightly loaded "balanced" reference clock routed with the highly loaded "unbalanced" clock tree; clock buffer delay is actively adjusted for low skew (at product test).
2000 • Pentium 4: Three binary-tree spines with "tunable delays" and phase detectors distributed across the die. Fuses are blown based on a compare algorithm at test.
2004 • Itanium (next gen.):
- Differential global clock distribution (20 mm).
- Digital Frequency Divider (synthesizer) adjusts frequency in 1.6% steps within 2 cycles, based on the measured local supply.
- Regional Active Deskew adjusts the Second Level Clock Buffer delay for low skew (done during test).
- Local clock buffer variable delay adjusts to the load (flip-flops) by design (time borrowing).
Summary / Future Directions
Clocking systems have evolved with ever more complex electrical methods
- Trimming and active feedback de-skewing circuits developed
- Transient frequency adjustment based on local supply voltage
Design the micro-architecture with interconnect delay in mind
Exploit locality for frequency scaling
- Logic / clock domains
Clock distribution power will take a larger % of the total chip power
Acknowledge the contributions of
Keng Wong, TMG/LTD Design
Simon Tam and the Itanium clock design team
Nasser Kurd and the Pentium 4 clock design team
Patrick Mahoney, Tim Fischer, and the Montecito clock team