Design of Clock Distribution in High Performance Processors

Ian Young

Intel Senior Fellow and Director, Advanced Circuits and Technology Integration (ACTI)

Technology Manufacturing Group (TMG) Intel Corporation, Hillsboro, Oregon

Desktop Clock Frequency Trend

[Figure: clock frequency (log scale, 10 MHz to 10 GHz) versus year (1985 to 2005), with data points for the 486, Pentium, PII, PIII and P4.]

Clock Distribution in Microprocessors I. Young 3/30/2005

Outline

• Introduction to Synchronous Logic and clock distribution
• Manufacturing effects
• Early history of clocking: 80486, Pentium and Pentium II
• Itanium active deskewing clock distribution
• Pentium4 clock distribution
• Montecito (Itanium family next gen.) clock distribution
• Summary

Synchronous Logic

• Logic progresses at a rate controlled by the clock
  - Retiming removes the effects of different logic and wire delays
  - Slows down signals that arrive too fast
• Requires a state element
  - Latch stores Input when clock is low
  - Flip-Flop stores Input when clock rises
• Requires a precise clock
• Enables CPU pipelining and high-throughput CPUs

Flip-Flop: Set-up and Hold Times

• Setup Time: time before the clock signal that a data signal must be valid in order to be stored.
• Hold Time: time after the clock signal that a data signal must be valid in order to be stored.

[Figure: timing diagram of Data In, Clock and Data Out, marking the setup time and the hold time.]

The Clock Distribution Problem

• Deliver the clock signal from the source (PLL) to all the receivers with the best timing precision.
• CLOCK SKEW is the inaccuracy of the same clock edge arriving at various locations in the chip (spatial separation).
• CLOCK JITTER is the inaccuracy of consecutive clock edges arriving at the same location (temporal separation).

[Figure: a PLL driving points A, B and C, with waveforms illustrating skew and jitter.]
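The two definitions above can be made concrete with a small calculation. This is a minimal sketch, not a method from the talk: the arrival times and edge timestamps are hypothetical, and cycle-to-cycle jitter is only one of several common jitter metrics, chosen here as an assumption.

```python
def clock_skew(arrival_ps):
    """Skew: spread in arrival time of the SAME clock edge across locations."""
    return max(arrival_ps.values()) - min(arrival_ps.values())


def cycle_to_cycle_jitter(edge_times_ps):
    """Jitter: largest change between consecutive periods at ONE location."""
    periods = [b - a for a, b in zip(edge_times_ps, edge_times_ps[1:])]
    return max(abs(q - p) for p, q in zip(periods, periods[1:]))


# Hypothetical data: one edge observed at three receivers (ps after the PLL edge).
arrivals = {"A": 400.0, "B": 412.0, "C": 405.0}
# Hypothetical rising-edge timestamps at receiver C, nominal period 500 ps.
edges_at_c = [0.0, 500.0, 1003.0, 1499.0, 2000.0]

print(clock_skew(arrivals))               # 12.0 ps between the earliest and latest receiver
print(cycle_to_cycle_jitter(edges_at_c))  # 7.0 ps (periods 503 ps then 496 ps)
```

Skew compares different places at one instant; jitter compares different instants at one place, which is why the two functions take differently shaped inputs.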

Clock and Logic Structure & Operation

[Figure: two flip-flops, U1 and U2, connected by a data path through computation logic C. The clock circuitry drives U1 through buffer B1 (clock path 1) and U2 through buffer B2 (clock path 2); each buffer controls the data capture and transmit of its flip-flop.]

Clock Delay 1 = delay from Clock to the Clock input of U1.
Clock Delay 2 = delay from Clock to the Clock input of U2.
Clock Skew = Clock Delay 2 - Clock Delay 1 (should be zero).

SETUP violation: data arrives at U2 too late and is not captured in the intended U2 clock cycle.

[Figure: U1 and U2 connected through logic C, with DATA IN, CLOCK and DATA OUT waveforms for each flip-flop; in the presence of clock skew the data arrives at U2 too late.]

HOLD violation: data arrives at U2 too soon and is not held long enough to be captured in the intended U2 clock cycle.

[Figure: U1 and U2 connected through combinational logic C, with DATA IN, CLOCK and DATA OUT waveforms for each flip-flop; in the presence of clock skew the data arrives at U2 too soon.]
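The setup and hold failures described on these slides can be checked with the standard two-flip-flop timing inequalities. The sketch below is illustrative only: the function names and all numeric values are hypothetical, and skew follows the slide's sign convention (Clock Delay 2 minus Clock Delay 1).

```python
def setup_slack(period, clk_to_q, logic_max, t_setup, skew):
    """Slack for capture at U2 on the NEXT clock edge (negative = setup violation).

    Positive skew (U2 clock later than U1 clock) gives the data more time.
    """
    return (period + skew) - (clk_to_q + logic_max + t_setup)


def hold_slack(clk_to_q, logic_min, t_hold, skew):
    """Slack against racing through on the SAME clock edge (negative = hold violation).

    Positive skew makes hold harder: U2 is still sampling when new data arrives.
    """
    return (clk_to_q + logic_min) - (t_hold + skew)


# Hypothetical path, all values in ps.
print(setup_slack(period=500, clk_to_q=50, logic_max=420, t_setup=40, skew=-20))  # -30
print(hold_slack(clk_to_q=50, logic_min=10, t_hold=30, skew=40))                  # -10
```

Note that a setup violation can be cured by slowing the clock, while a hold violation is independent of the period, so it persists at any frequency.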

Challenges for Clock Design

• Rapid increases in Core Clock Frequencies
  - 1991: 100 MHz (0.8µm)
  - 1997: 400 MHz (0.35µm)
  - 2001: 2.0 GHz (0.13µm)
  - 2005: 3.8 GHz (90nm)
• Increasing Clock Load, as indicated by total transistors/die
  - 1991: 1.2 million transistors (0.8µm)
  - 1997: 7.5 million transistors (0.35µm)
  - 2001: 42 million transistors (0.13µm)
  - 2005: 1.7 billion transistors (90nm)
• Worsening within-die variations
  - Lithography and Etch
  - Supply Noise
  - Hot Spots

Clock Distribution H-Tree (2 level)

[Figure: H-tree showing global and local skew. An external 100MHz clock (Ext Clk) feeds the 2 GHz PLL, which drives tree levels L1, L2 and L3.]

Poly Gate Transistor Length Variation Sources

• Long-range Within-die
  - Stepper lens aberrations
• Proximity effects (systematic)
  - Nested or isolated
• Random component
  - Stepper lens
  - Poly gate line edge roughness
  - Threshold voltage

[Figure: transistor cross-section labeling the gate length (Lgate), Source and Drain.]

Transistor Vth Variation Sources

• Die-to-die
• Random component
  - From Random Dopant Fluctuations, f(W, L)
• Short channel component
• Well and Halo doping

Vth Variation Model

• Model relates variability to device size:

  $\sigma(\Delta V_t) = C_1 + \dfrac{C_2}{\sqrt{W_e L_e}}$

• Where $W_e$ and $L_e$ are the effective device width and length. $C_1$ and $C_2$ relate to physical device parameters such as Tox, junction depth, etc.
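To show how such a model ties mismatch to device size, here is a minimal sketch. It assumes the area-scaling form sigma = C1 + C2 / sqrt(We * Le), which is a reading of the slide's formula rather than a confirmed transcription, and the constants C1 = 2 mV and C2 = 4 mV·µm are hypothetical placeholders, not values from the talk.

```python
import math


def sigma_delta_vt(w_e_um, l_e_um, c1_mv, c2_mv_um):
    """Standard deviation of the Vt mismatch (mV) for a device of effective size We x Le.

    Assumed model: sigma = C1 + C2 / sqrt(We * Le).
    """
    return c1_mv + c2_mv_um / math.sqrt(w_e_um * l_e_um)


# Hypothetical constants, for illustration only.
C1_MV, C2_MV_UM = 2.0, 4.0

small = sigma_delta_vt(1.0, 0.25, C1_MV, C2_MV_UM)   # 1 um x 0.25 um device
large = sigma_delta_vt(4.0, 0.25, C1_MV, C2_MV_UM)   # 4x wider device

print(small, large)  # 10.0 6.0
```

Under this assumed form, quadrupling the gate area halves only the size-dependent term, so upsizing clock buffers reduces random skew with diminishing returns.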

Interconnect Variability

• Conductor width and space
• Conductor Thickness
• Dielectric Thickness
• Inductance: needs to be analyzed and modeled for busses, power and global signals

[Figure: cross-section of the metal stack, layers M1 through M7, showing metal lines, vias and insulating dielectric.]

The 50MHz 80486 Microprocessor (1991)

• 3 Layer Metal
• 2 Phase Clocking
• RC clock skew
• On-chip PLL

[Figure: chip image with the PLL labeled.]

80486 Microprocessor: PLL

[Figure: PLL block diagram. Ext Clk (Clkin) and Fb Clk feed a phase/frequency detector, followed by a charge pump (CP), the control voltage Vcntl, a VCO, a divide-by-2 and clock buffers producing Ph1/Ph2 and Clkout.]

• 50% duty-cycle 2φ clocks; Clkin and Clkout compatible with prior generation
• Internal Clock Skews Between Chips Reduced by ~2ns
• Enables 0-1ns Hold Time for frequency Scalability
• VCO Designed with:
  - Wide Frequency Range (5-130MHz)
  - Supply Voltage Noise Rejection

                Conventional   With PLL
Setup           3ns            1.5ns
Hold            2.5ns          1.0ns
Output Valid    9.0ns          7.5ns

Pentium Microprocessor Clock Distribution

• Single 50% Duty-cycle Clock
• 66-133MHz Internal Frequency
• Internal clock freq. vs. External freq. Ratios of 2:1, 3:2, 1:1

[Figure: floorplan showing the PLL clock generator (T=0ns), serpentines for RC matching on the routes to the global clock drivers, the global clock drivers located in the pad ring, and the local clock enables.]

Pentium II (P6) Microprocessor Clock Distribution

• Buffer Network designed for > 300 MHz
• Minimized the Propagation Delay
• Minimized Global Clock Skew
• Global Clock Power Down
• Supply di/dt noise reduction
  - Vdd / Vss decoupling capacitance
  - Minimize Vdd / Vss DC Resistance (IR drop)
  - Minimize Vdd / Vss AC Resistance and Inductance

[Figure: Clock Distribution Network. The PLL output (clk) feeds the global clock drivers, which drive gclk# to the local clock drivers gated by Clock_en; delays of 1.2 ns and 2 ns are annotated.]

Pentium II Global Clock Skew measured test chip results

• Floor Plan of measured skew
• SK = Skew relative to feedback point from local buffer
• Skew across M4 Global Distribution = 140ps

[Figure: floorplan showing the input point to the local buffers with clock gating and a 5-level driver for a 500pF load with an M4 metal strapping ring. Measured values: SK = -564ps, -592ps, -488ps, -476ps, -460ps, -424ps, -548ps.]

Itanium Clocking

• IA-64 architecture
• 0.18µm CMOS
• 6 metal layers
• 25.4M transistors
• 800MHz frequency

[Figure: die floorplan with IA-32 Control, FPU, IA-64 Control, TLB, Integer Units, Instr. Fetch & Decode, Cache and Bus blocks.]

Ref: S. Tam, S. Rusu, U. Desai, R. Kim, J. Zhang, I. Young, JSSC, Nov 2000

Clock Distribution Hierarchy

[Figure: clock hierarchy in three stages: Global Distribution, Regional Distribution and Local Distribution. Labeled elements: PLL, Main Clock, Reference Clock, CLKP, CLKN, VCC/2, DSK, RCD, DLCLK, OTB.]

Itanium Global Clock Distribution

• Balanced H-tree routed in M5 and M6
• Lateral shielding
• Distributes both main and reference clock
• Optimized to account for inductive effects

[Figure: H-tree from the PLL to eight deskew clusters (DSK).]

Itanium Regional Clock Distribution

• Distributed array of deskew buffers to reduce process related skew
  - 8 deskew clusters each holding up to 4 buffers
• Regional clock grids driven by modular Regional Clock Drivers
  - M4-M5 grid tailored for the clock load density of the underlying block
  - Full support for scan and clock gating

[Figure: floorplan with eight DSK clusters and the CDC.]

DSK = Cluster of 4 deskew buffers
CDC = Central Deskew Controller

Deskew Buffer: Block Diagram

[Figure: the global clock passes through the deskew buffer delay circuit and an RCD onto the regional clock grid. The reference clock (through an RCD) and the regional feedback clock feed the local controller; a TAP I/F is attached to the deskew buffer.]

• Deskew covers the entire clock distribution up to the input of the local clock buffer

Deskew Buffer: Delay Circuit

[Figure: delay circuit between Input and Output with an Enable, controlled by a 20-bit Delay Control Register accessed through the TAP I/F.]

Step size = 8.5ps
Deskew range = 170ps

• Small step size enables fine granularity skew control over a wide range
• TAP read/write access to Control Register enables faster timing debug and performance tuning
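The step size and range quoted above are consistent with 20 equal steps (20 x 8.5ps = 170ps). The sketch below quantizes a requested delay to that grid. It assumes, without confirmation from the slides, that the 20-bit register is thermometer coded with one bit per step; the function names are hypothetical.

```python
STEP_PS = 8.5   # from the slide
N_STEPS = 20    # assumption: one step per register bit, 20 * 8.5 ps = 170 ps


def delay_setting(target_ps):
    """Return (steps, achieved_delay_ps) closest to target_ps within the deskew range."""
    steps = round(target_ps / STEP_PS)
    steps = max(0, min(N_STEPS, steps))   # clamp to the available range
    return steps, steps * STEP_PS


def thermometer_code(steps):
    """Assumed register encoding: the lowest `steps` bits are set."""
    return (1 << steps) - 1


print(delay_setting(60.0))        # (7, 59.5)
print(delay_setting(500.0))       # (20, 170.0), clamped at full range
print(bin(thermometer_code(3)))   # 0b111
```

With a uniform grid, the worst-case quantization error of any in-range request is half a step, about 4.25ps.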

Deskew Buffer: Local Controller

[Figure: the reference clock and feedback clock feed a phase detector; its lead/lag output passes through a digital low-pass filter, with a 16-to-1 enable, to the deskew buffer register.]

• Phase detector output sampled every 16 core cycles
• 6-tap digital low pass filter reduces comparison noise
• Local controller ensures stable deskew operation
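A bang-bang loop of this kind can be sketched as follows. The filter rule is an assumption, not taken from the talk: the delay register moves one 8.5ps step only when the six most recent lead/lag samples agree. Each loop iteration stands for one phase comparison (every 16 core cycles in the slide), and all names and values are hypothetical.

```python
from collections import deque

STEP_PS = 8.5    # delay step, from the previous slide
MAX_STEPS = 20   # assumed number of delay steps
TAPS = 6         # filter length, from the slide


def run_deskew(initial_error_ps, n_samples, start_steps=10):
    """Return (steps, residual_error_ps) after n_samples phase comparisons.

    initial_error_ps is feedback-clock arrival minus reference-clock arrival
    while the delay register holds start_steps.
    """
    steps = start_steps
    history = deque(maxlen=TAPS)
    for _ in range(n_samples):
        error = initial_error_ps + (steps - start_steps) * STEP_PS
        history.append(1 if error > 0 else -1)   # +1: feedback clock is late
        if len(history) == TAPS and abs(sum(history)) == TAPS:
            # Assumed rule: all six taps agree, so move one step and restart the filter.
            steps = max(0, min(MAX_STEPS, steps - history[0]))
            history.clear()
    return steps, initial_error_ps + (steps - start_steps) * STEP_PS


steps, residual = run_deskew(initial_error_ps=30.0, n_samples=24)
print(steps, residual)   # 6 -4.0
```

In this simplified form the loop settles to within one step of zero error and then dithers between two adjacent settings; a real controller would likely add a dead zone to avoid that, which this sketch does not model.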

Itanium Skew Measurements

[Figure: distribution skew (ps, 0 to 120) per regional clock, R01 through R71.]

• Projected max skew without deskew mechanism = 110ps
• Max skew with deskew mechanism = 28ps

Clock Skew Timing Budgets

[Figure and table: clock paths through DSK, RCD, LCB and CB from the main and reference clocks, with a skew budget (Ts1 to Ts4) assigned per category; the categories include Common LCB, Common RCD, Common Reference and Common Main Clock.]

Ts1 < Ts2 < Ts3 < Ts4

Multiple skew budgets minimize the skew penalty and enable timing optimization

Pentium®4 Clock Challenges

• Enable Netburst™ micro-architecture for 0.18µm technology
  - ≥ 2GHz clock for Hyper Pipelined Technology core
  - ≥ 4GHz clock for Rapid Execution Engine
  - ≥ 400MHz I/O clock for fast data transfer
  - < 10% clock inaccuracy
• Enable clock gating for low power
• Clock observability and controllability for fast debug

Reference: Kurd, Barkatullah, Dizon, Fletcher, Madland, JSSC, Nov 2001

Clock Generation & Distribution

[Figure: the 100MHz system clock feeds a Core Clock PLL and an I/O Clock PLL. The core PLL distribution drives Local Clock Drivers (LCDs) that produce the core clocks: 2x (4GHz), 1x (2GHz) and ½x (1GHz). The I/O PLL distribution drives LCDs that produce the I/O clocks: 1x (100MHz), 2x (200MHz) and 4x (400MHz).]

Binary distribution tree in three spines

[Figure: binary clock tree from the PLL feeding three spines.]

Triple Clock Spines

[Figure: from the PLL, a binary distribution tree feeds the Top, Middle and Bottom spines; the spines include skew optimizers and drive the local clock drivers.]

Skew Optimization Scheme

[Figure: the PLL drives a binary tree of clock repeaters into adjustable domain buffers DB1 through DB47. Each domain buffer feeds a local clock macro containing local clock drivers (LCD) and sequential elements (SE). Phase detectors (PD) sit between domains and connect to the skew optimizer, which is reached through the Test Access Port; a filtered VCC supply is also shown.]

Skew Profile Graph

[Figure: relative timing (ps, -40 to +40) across the die from left to right for the Top, Middle and Bottom spines, BEFORE and AFTER skew optimization, plotted against the REFERENCE.]

Skew Control Scheme

• For the particular die example shown
  - Pre-compensation max skew ~ ±32ps
  - Post-compensation max skew ±8ps
• Side benefits
  - Provides a within-die skew profile
  - Deliberately skew clocks for performance: 200MHz frequency increase obtained

Montecito Clock System Floorplan

[Figure: floorplan with CORE 0 and CORE 1, each with core DFDs and RVDs, the PLL / translation table / clock control block, the Foxton Controller with its DFD, the Bus Logic with its DFD, and the FSB DFDs.]

References: ISSCC 2005 Paper 16.1, T. Fischer et al.; ISSCC 2005 Paper 16.2, P. Mahoney et al.

Clock System Architecture

[Figure: block diagram. Labeled elements: Bus Clock pins, PLL, Fuses, Frequency Divisors, Translation Table, Balanced Tree Clock Distribution, DFDs for Core0, Core1, Foxton, I/Os and Bus Logic (divide ratios 1/N, 1/M and 1/1), RVD, RAD, SLCB, CVD, Gater, Phase Aligner, and the Variable Supply and Fixed Supply domains.]

Montecito Clock Distribution Summary (1)

• Core Clock Frequency controlled real-time based upon DC and transient power supply voltage
  - Montecito/Foxton Power delivery sets DC supply voltage based upon power dissipation (temp. sensors around the chip) of 100W. No worst case power.
  - Ldi/dt supply noise transients slow critical paths and reduce the operating frequency for a few cycles. Constantly varying frequency responds to the core supply voltage transient behavior.
• Regional Active Deskewing system reduces the process, voltage and temperature sources of skew across the 21.5mm x 27.7mm die.
• Clock Vernier Devices (CVD) inserted at each local clock buffer allow 70ps of adjustment via Scan control.
• The clock distribution system consumes less than 25W for the 30mm route from PLL through the clock tree to all the Latches.

Overview with block diagram

[Figure: clock route block diagram. L0 route: Bus Clock PLL through repeaters to the DFDs (core0, core1, Foxton, IOs, Bus Logic). L1 route: DFD to SLCB. L2 route: SLCB to CVD. L3 route: CVD through the gaters to the latches. RADs are shown on the core paths. The legend distinguishes fixed from variable frequency, low voltage swings from full rail transitions, and differential from single-ended routing.]

Montecito Clock Distribution Summary (2)

• The L0 clock route is differential with 400mV swing and a resistive load at the end of each line (length = 20mm)
  - Line width tapering is used
  - A self-biased differential amplifier is the repeater
• L1 route (length = 2mm) from the Digital Frequency Divider (DFD) to the Second Level Clock Buffer (SLCB) is distributed as half-frequency 0° / 90° clocks that are XOR'd in the SLCB and are not duty cycle sensitive
• L2 route (length = 3mm) from the SLCBs to the Clock Vernier Device (CVD) has < 6ps skew using optimization with a CAD timing tool.
• L3 route (length = 2mm), from the CVDs through the clock gaters to the Latches, provides an overall clock skew adjustment to < 10ps to the Latches, controlled by test scan
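The L1 scheme, two half-frequency clocks 90° apart recombined by XOR, can be illustrated with sampled square waves. This is a sketch with ideal 50% duty-cycle inputs only; it shows the frequency doubling and makes no attempt to model the duty-cycle behavior the slide mentions. The sample counts and function names are hypothetical.

```python
def square_wave(n_samples, period, phase_offset=0):
    """Sampled 50% duty-cycle clock: 1 for the first half of each period, else 0."""
    return [1 if ((i - phase_offset) % period) < period // 2 else 0
            for i in range(n_samples)]


def rising_edges(wave):
    """Sample indices where the wave goes from 0 to 1."""
    return [i for i in range(1, len(wave)) if wave[i - 1] == 0 and wave[i] == 1]


HALF_RATE_PERIOD = 16             # samples per half-frequency clock period
QUARTER = HALF_RATE_PERIOD // 4   # 90 degrees of phase

clk0 = square_wave(64, HALF_RATE_PERIOD)             # 0 degree clock
clk90 = square_wave(64, HALF_RATE_PERIOD, QUARTER)   # 90 degree clock
full_rate = [a ^ b for a, b in zip(clk0, clk90)]     # XOR as in the SLCB

print(rising_edges(clk0)[:2])        # [16, 32]: one edge per 16 samples
print(rising_edges(full_rate)[:3])   # [8, 16, 24]: one edge per 8 samples
```

Routing at half frequency means every long wire toggles half as often as the final clock, and the full-rate clock only exists after the XOR near the load.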

Digital Frequency Divider (DFD) Block Diagram

[Figure: block diagram. Labeled elements: differential PLL clock input, 16-phase DLL, phase interpolation (64 phases), state machine, period adjust (+2 to -1), RVD up/down requests, divide by 2, ½-frequency quadrature differential clock routes to the SLCBs, full-frequency differential "utility" clock routes, PCSM startup and clock system control to/from same-core PCSMs, ODCS control, and scan and triggers.]

Voltage-to-Frequency Conversion (VFC) Per Core

[Figure: three DFDs (DFD0 to DFD2) fed by the L0 clock route from the PLL. Each DFD is grouped with four RVDs (RVD0-3, RVD4-7, RVD8-11), connected by utility clocks, and drives an L1 clock route to the SLCBs.]

Variable Frequency Mode: CMOS Critical Path Scaling

[Figure: delay, normalized to 1100 mV, versus supply voltage (600 to 1200 mV) for circuit1 through circuit5; the delay axis spans 0.5 to 2.5.]

RVD Coarse Delay Element

[Figure: transistor-level schematic of the coarse delay element, including a Metal 1 serpentine resistor. Signal names include in, out, nout, run, nrun, clear, nclear, even, nodd, fbp, fbn, fet, nfet and config_fet.]

RVD Block Diagram

[Figure: two pairs of delay lines (0A/0B with dly0in/dly0out, 1A/1B with dly1in/dly1out) evaluated on eval0 and eval1 by the RVD FSM, which is clocked by clk and produces HOLD and DOWN. An additional delay creates a deadzone.]

Example VFC Supply Droop Response

[Figure: waveforms of the DFD output clock (cycles 1 to 5), Vcore, the RVD delay line clock and the period "UP" signal to the DFD. A droop on Vcore increases the RVD delay line delay; the increased delay asserts period "UP" for one cycle; the clock period is increased; no adjustment is needed in the following cycle.]

SLCB Block Diagram

[Figure: block diagram. Inputs ina, inb, inc and ind feed a 128-bit delay element, followed by an output buffer and duty cycle control producing SLCBO. Control blocks: PCa and PCb inputs, zone control, summer, filter, 128-bit thermometer shifter with PRESET, and scan registers for set-back and duty cycle.]

CVD Circuit and Operation

[Figure: CVD circuit with input SLCBO, internal node SLCBOx and output cvdo.]

• Drive fight with feedback is attenuated with pass gate
• Low, mid and high settings change the delay as desired

[Figure: SPICE simulations showing low, mid and high delay settings for SLCBOx (top graph) and CVDO (bottom graph).]

Route Statistics and Power

Route   Terminals    Distance    Delay
L0      14           20mm        640ps
L1      71           5mm         215ps
L2      14500        2-3.3mm     60ps
L3      ~5 million   0-1.5mm     12ps

[Figure: total CPU route statistics and power dissipation contribution by route (L0 to L3).]

• Highest load and most power dissipated in the L3 route.
• Future research into low-power clock distribution should focus on last section of route.

Evolution in Clock Distribution

1991 • 486: PLL on-chip to remove the large clock distribution delay (zero delay buffer). Clock RC skew minimized across chip with metal.
1993 • Pentium: Clock tree with length “tuning” for skew balancing.
1995 • Pentium II: Clock Binary Tree in center Spine with branch length “tuning” to local clock buffer for skew balancing.
1998 • Itanium I: Lightly loaded “balanced” reference clock routed with the highly loaded “unbalanced” clock tree; clock buffer delay actively adjusted for low skew (at product test).
2000 • Pentium IV: Three binary tree Spines with “tunable delays” and Phase Detectors distributed across the die. Blow fuses (based upon compare algorithm at test).
2004 • Itanium (Next Gen):
  - Differential global clock distribution (20mm).
  - Digital Frequency Divider (synthesizer) adjusts frequency in 1.6% steps within 2 cycles based upon measured local supply.
  - Regional Active Deskew adjusts Second Level Clock Buffer delay for low skew (done during test).
  - Local clock buffer variable delay adjusts to load (flip-flop) by design (time borrowing).

[Figure label: Complexity]

Summary / Future Directions

• Clocking systems have evolved with even more complex electrical methods
  - Trimming and active feedback de-skewing circuits developed
  - Transient Frequency adjust based upon local supply voltage
• Design the micro-architecture with interconnect delay in mind
• Exploit locality for frequency scaling
  - Logic / clock domains
• Clock Distribution Power will take a larger % of the total chip power

Acknowledge the contributions of

• Keng Wong, TMG/LTD Design
• Simon Tam and the Itanium clock design team
• Nasser Kurd and the Pentium 4 clock design team
• Patrick Mahoney, Tim Fischer and the Montecito clock team