High-Bandwidth Memory Interface Design

Chulwoo Kim [email protected] Dept. of Electrical Engineering Korea University, Seoul, Korea

February 17, 2013

Chulwoo Kim 1 of 86 Outline

• Introduction
• Clock Generation and Distribution
• Transceiver Design
• TSV Interface for DRAM
• Summary
• References

Outline

• Introduction
  • DRAM 101
  • Simplified DRAM Architecture and Operation
  • Differences of DRAM (DDRx, GDDRx, LPDDRx)
  • Trend
  • Memory Interface: Differences and Issues
• Clock Generation and Distribution
• Transceiver Design
• TSV Interface for DRAM
• Summary
• References

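DRAM 101

SDRAM: Synchronous Dynamic Random Access Memory
• SDR (Single Data Rate): one data bit (D) on DQ per CLK cycle
• DDR (Double Data Rate): two data bits (D D) on DQ per CLK cycle

Memory types
• Main memory (DDRx): PC, notebook
• Graphics memory (GDDRx): graphic card, console
• Mobile memory (LPDDRx): phone, tablet PC

[Figure: the MCU sends CLK, command, and data to the SDRAM. After a command (C), the data burst (D) appears on DQ after the CAS* latency; the number of data bits per command is the burst length.]

* CAS: Column Address Strobe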

DRAM DDR4 Die Photo

[Die photo: 16 banks, Bank 0 to Bank 15]

Supply Voltage: VDD=1.2V, VPP=2.5V
Process: 38nm CMOS / 3-metal
Banks: 4 bank groups, 16 banks
Data Rate: 2400 Mbps
Number of I/Os: ×4 / ×8

[1] K. B. Koo et al., ISSCC 2012, pp. 40-41

Simplified DRAM Architecture

[Figure: four banks around a peripheral circuit. Each bank shows the cell array (WL, BLT/BLB), word line driver, row decoder, row repair fuse, column decoder, column repair fuse, BLSA*, and write driver / read amplifier. The peripheral circuit shows the DQ RX and DQ TX, serial-to-parallel and parallel-to-serial conversion, CLK/ADD/CMD buffer, CMD controller, DLL (ICLK, DCLK), and generator.]

* BLSA: Bit line sense amplifier

Concept of DRAM operation

• WRITE: serial to parallel (DQ → GIO)
• READ: parallel to serial (GIO → DQ)

[Figure: the DQ RX and DQ TX in the peripheral circuit carry Ndq bits; the GIO between the serial/parallel converters and the bank BLSAs carries Np×Ndq bits.]

* BLSA: Bit line sense amplifier
* Np: Number of pre-fetch
* Ndq: Number of DQ
* GIO: Global I/O

Pre-fetch Timing (DDR1, BL*=2)

[Timing diagram: CLK; two RD commands with tCCD*=1; one GIO access per RD; after CL*, DQS toggles and DQ carries two bursts of BL*=2 (bits 0, 1 each)]

• Number of GIO channels = Np×Ndq = 2×8 = 16 (DDR1 ×8)

* tCCD: CAS to CAS delay
* CL: CAS latency
* BL: Burst length
[2] JEDEC, JESD79F, pp. 24-29

Pre-fetch Diagram (DDR1)

[Figure: eight banks sharing the GIO; number of GIO channels = 2×Ndq]

• Pre-fetch operation
  • 2-bit pre-fetch → [2×Ndq] data access
    (If the output data rate is 400Mbps, the internal data rate is 200Mbps)

Pre-fetch Timing (DDR2, BL=4)

[Timing diagram: CLK; two RD commands with tCCD=2; one GIO access per RD; after RL*, DQS toggles and DQ carries two bursts of BL=4 (bits 0 to 3 each)]

• Number of GIO channels = Np×Ndq = 4×8 = 32 (DDR2 ×8)

* RL: READ latency
[3] JEDEC, JESD79-2F, p. 35

Pre-fetch Diagram (DDR2)

[Figure: eight banks sharing the GIO; number of GIO channels = 4×Ndq]

• Pre-fetch operation
  • 4-bit pre-fetch → [4×Ndq] data access
    (If the output data rate is 800Mbps, the internal data rate is 200Mbps, same as DDR1)

Pre-fetch Timing (DDR3, BL=8)

[Timing diagram: CLK; two RD commands with tCCD=4; one GIO access per RD; after RL, DQS toggles and DQ carries two bursts of BL=8 (bits 0 to 7 each)]

• Number of GIO channels = Np×Ndq = 8×8 = 64 (DDR3 ×8)

[4] JEDEC, JESD79-3F, p. 62

Pre-fetch Diagram (DDR3)

[Figure: eight banks sharing the GIO; number of GIO channels = 8×Ndq]

• Pre-fetch operation
  • 8-bit pre-fetch → [8×Ndq] data access
    (If the output data rate is 1.6Gbps, the internal data rate is 200Mbps, same as DDR1)
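The pre-fetch relations on the preceding slides (GIO channels = Np×Ndq, internal data rate = output data rate / Np) can be checked with a short script. This is only an illustrative sketch: the function name `prefetch_summary` and the table layout are assumptions made for the example, while the numbers are the ×8 cases quoted on the slides.

```python
def prefetch_summary(name, n_prefetch, n_dq, output_rate_mbps):
    """Apply the slide relations: GIO = Np x Ndq, internal rate = output rate / Np."""
    return {
        "name": name,
        "gio_channels": n_prefetch * n_dq,
        "internal_rate_mbps": output_rate_mbps / n_prefetch,
    }

# (name, Np, Ndq, output data rate in Mbps) for the x8 examples on the slides
GENERATIONS = [
    ("DDR1", 2, 8, 400),
    ("DDR2", 4, 8, 800),
    ("DDR3", 8, 8, 1600),
]

SUMMARY = [prefetch_summary(*gen) for gen in GENERATIONS]

for row in SUMMARY:
    print(f'{row["name"]}: {row["gio_channels"]} GIO channels, '
          f'{row["internal_rate_mbps"]:.0f} Mbps internal')
```

Each generation doubles the pre-fetch depth, so the core keeps running at 200 Mbps while the pin rate doubles; the cost is a doubling of the GIO channel count.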

Bank Grouping Timing (DDR4, BL=8)

tCCD_S=4, tCCD_L=5

[Timing diagram: CLK; three RD commands addressed to bank groups G0, G1, G1; per-group GIO buses GIO_BG0 to GIO_BG3; after RL, DQS toggles and DQ carries three bursts of BL=8 (bits 0 to 7 each)]

• Number of GIO channels = Np×Ndq×Ngroup = 8×8×4 = 256 (DDR4 ×8)

[5] JEDEC, JESD79-4, pp. 77-78
[6] T. Y. Oh et al., ISSCC 2010, pp. 434-435

Pre-fetch & Bank Grouping (DDR4)

[Figure: banks arranged in Group0 to Group3, connected through a GIO MUX; number of GIO channels = 8×Ndq]

• Pre-fetch operation
  • 8-bit pre-fetch
• Bank grouping

[1] K. B. Koo et al., ISSCC 2012, pp. 40-41

Differences of DDRx, GDDRx, LPDDRx

                 DDRx         GDDRx          LPDDRx
Application      PC/Server    Graphic card   Mobile/Consumer
Socket           DIMM         On board       MCP*/PoP*/SiP*
IO               ×4/×8        ×16/×32        ×16/×32

Architecture: [Figure: placement of the PADs relative to the banks for each type]

Unique function
• GDDRx: single uni-directional WDQS, RDQS; VDDQ termination; CRC, DBI; ABI
• LPDDRx: no DLL; DPD*; PASR*; TCSR*

* MCP: Multi chip package
* PoP: Package on package
* SiP: System in package
* DPD: Deep power down
* PASR: Partial array self refresh
* TCSR: Temperature compensated self refresh

DDR Comparison

                      DDR1        DDR2        DDR3        DDR4
VDD [V]               2.5         1.8         1.5         1.2
Data Rate [bps/pin]   200M~400M   400M~800M   800M~2.1G   1.6G~3.2G
Pre-Fetch             2 bit       4 bit       8 bit       8 bit
Interface             SSTL_2      SSTL_18     SSTL_15     POD_12

STROBE: single DQS (DDR1); differential DQS, DQSB (later generations)

New features
• DDR2: OCD calibration, ODT
• DDR3: dynamic ODT, ZQ calibration, write leveling
• DDR4: CA parity, DBI*, CRC*, gear down, CAL*, PDA*, FGREF*, TCAR*, bank grouping

* DBI: Data bus inversion
* CRC: Cyclic redundancy check
* CAL: Command address latency
* PDA: Per DRAM addressability
* FGREF: Fine granularity refresh
* TCAR: Temperature controlled array refresh

GDDR Comparison

                      GDDR1      gDDR2     GDDR3       GDDR4       GDDR5
VDD [V]               2.5        1.8       1.5         1.5         1.5/1.35
Data Rate [bps/pin]   300~900M   800M~1G   700M~2.6G   2.0G~3.0G   3.6G~7.0G
Pre-Fetch             2 bit      4 bit     4 bit       8 bit       8 bit
Interface             SSTL_2     SSTL_2    POD-18      POD-15      POD-15

STROBE (in generation order): single DQS*; differential bi-directional DQS, DQSB; single uni-directional WDQS, RDQS

New features (in generation order): OCD* calibration, ODT*; ZQ, RDQS (option); DBI, parity (option); no DLL, PLL (option), WCK/WCKB, CRC, ABI*, bank grouping

* DQS: DQ strobe signal (DQ is the data I/O pin)
* ODT: On die termination
* OCD: Off chip driver
* ABI: Address bus inversion

LPDDR Comparison

                      LPDDR1      LPDDR2         LPDDR3
VDD [V]               1.8         1.2            1.2
Data Rate [bps/pin]   200M~400M   200M~1066M     333M~1600M
Pre-Fetch             2 bit       4 bit          8 bit
STROBE                DQS         DQS_T, DQS_C   DQS_T, DQS_C
Interface             SSTL_18*    HSUL_12*       HSUL_12*
DLL                   X           X              X
New Feature                       CA pin         ODT (high tapped termination)

* SSTL: Stub series terminated logic
* HSUL: High speed un-terminated logic

Trend

Although all types of DRAMs are reaching their limits in supply voltage, the demand for high-bandwidth memory keeps increasing.

[Figure: VDD [V] versus data rate [Gbps] (0.2 to 7.0). 2.5 V: DDR1, GDDR1. 1.8 V: LPDDR1, gDDR2, DDR2. 1.5 V: GDDR3, GDDR4, GDDR5, DDR3. 1.2 V: LPDDR2, DDR4, LPDDR3.]

Memory Interface

[Figure: DRAM devices connected to a CPU and to a GPU]

• System features
  • Single-ended / high speed
  • Many channels (weak against coupling effects)
  • DDR: multi-drop (multi rank, multi DIMM)
  • GDDR: point to point
  • Poor transistor performance
• Issues
  • Reflection
  • Inter-symbol interference
  • Simultaneous switching output noise
  • Pin-to-pin skew
  • Impedance discontinuities (stubs, connector, via, etc.)

Outline

• Introduction
• Clock Generation and Distribution
  • Delay-locked loop (DLL)
  • Duty cycle corrector (DCC)
  • Clock distribution
• Transceiver Design
• TSV
• Conclusions
• References

Basic DLL Architecture

[Figure: the external clock reaches I_CLK after the input delay tD1. I_CLK passes through the variable delay line (tDVDL) to O_CLK, which clocks the data from the memory core out of the DRAM after the output delay tD2. A replica delay (tDREP) produces FB_CLK, and the phase detector (PD) compares I_CLK with FB_CLK to drive the controller.]

tCK × N = tDVDL + tDREP
tDREP ≈ tD1 + tD2
tCK × N = tDVDL + tD1 + tD2 + γ
γ = tDREP − (tD1 + tD2)

Replica Delay Mismatch

[Figure: γ variation [ps] versus supply voltage [V], with the valid data window and tDQSCK* (or tAC) drawn within tCK for the three cases γ ≈ 0, γ < 0, and γ > 0 across VDD, HVDD, and LVDD.]

* tDQSCK (or tAC): DQS output access time from CK/CKb

Locking Range Considerations

[Figure: I_CLK and FB_CLK waveforms for long and short tCK, showing tDINIT + tDREP, tDREQUIRED, and the bird's beak in tDQSCK (or tAC)]

tDINIT = tDVDL(0) + tDREP
N × tCK > tDVDL(0) + tDREP
tCK = tDVDL + tDREP + tΔ

[7] H.-W. Lee et al., submitted to TVLSI
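The lock condition N × tCK = tDVDL + tDREP together with the range constraint N × tCK > tDVDL(0) + tDREP can be sketched numerically. The function name `dll_lock_point` and all delay values below are hypothetical, chosen only to illustrate how a shorter tCK forces the loop to lock over more clock cycles.

```python
import math

def dll_lock_point(t_ck_ps, t_dvdl_min_ps, t_drep_ps):
    """Return (N, tDVDL) such that N*tCK = tDVDL + tDREP and tDVDL >= tDVDL(0).

    t_dvdl_min_ps plays the role of tDVDL(0), the intrinsic (minimum)
    delay of the variable delay line.
    """
    t_init = t_dvdl_min_ps + t_drep_ps            # tDINIT on the slide
    n = max(1, math.ceil(t_init / t_ck_ps))       # smallest N with N*tCK >= tDINIT
    t_dvdl = n * t_ck_ps - t_drep_ps              # delay the line must provide
    return n, t_dvdl

# Hypothetical numbers: tDVDL(0) = 300 ps, tDREP = 1500 ps
SLOW = dll_lock_point(t_ck_ps=2500, t_dvdl_min_ps=300, t_drep_ps=1500)
FAST = dll_lock_point(t_ck_ps=1250, t_dvdl_min_ps=300, t_drep_ps=1500)
print("tCK=2500 ps ->", SLOW)
print("tCK=1250 ps ->", FAST)
```

With these assumed values the slow clock locks within one cycle, whereas the fast clock needs N = 2 because tDINIT exceeds a single tCK.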

Synchronous Mirror Delay (SMD)

[Figure: the clock reaches I_CLK after tD1, then passes through the replica delay (tD1+tD2), the measure delay line (tD3), the replicate delay line (tD3), and the output path (tD2) to OUT.]

• Basic operation
  • Measure and replicate the delay
  • No feedback
  • Match delay in two cycles

[8] T. Saeki et al., ISSCC 1996, pp. 374-375

Disadvantages of SMD

• Disadvantages
  • Mismatch between the replica delay and the input buffer & clock distribution
  • Coarse resolution
  • Input jitter multiplication

[Figure: SMD timing without and with input jitter. Without jitter, the measure and replicate delay lines each contribute tCK−(tD1+tD2). With input jitter of −Δ and +Δ on successive edges, the measured and replicated delay becomes tCK−(tD1+tD2)+2Δ, so an input pk-pk jitter of ±Δ becomes an output pk-pk jitter of ±2Δ.]

Register Controlled DLL

[Figure: a main delay line of unit delays tD (fan-out=1) and a sub delay line of unit delays tD+Δ (fan-out=2), connected between IN and OUT through switches SW0 to SW(n)]

• Locking information is stored digitally in a register
• Vernier-type delay line increases resolution

[9] A. Hatakeyama et al., ISSCC 1997, pp. 72-73

Single Register Controlled Delay Line

[Figure: the coarse delay line, controlled by CSL1 to CSL3 from a controller driven by the PD's UP/DN* signal, produces IN1 and IN2 from I_CLK, separated by one unit delay tUD. The fine delay is a phase mixer that weights IN1 by 1−K and IN2 by K, so OUT12 falls between OUT1 and OUT2.]

* DN = Down

Boundary Switching Problem

[Figure: on a shift left, the phase mixer output OUT12 = IN1×(1−K) + IN2×K is at K=0.9 between IN1 and IN2 while the path changes from passing through 3 UDCs* to passing through 4 UDCs, a step of tUD.]

• Coarse shift and fine reset do not occur simultaneously

* UDC = Unit delay cell

Seamless Boundary Switching

[Figure: a dual coarse delay line of unit delay cells feeds IN1 and IN2 of the phase mixer, OUT12 = IN1×(1−K) + IN2×K with 0≤K≤1. On a shift left, K moves from 0.9 to 1.0 first; the roles of IN1 and IN2 then alternate as the coarse line shifts by tUD.]

• Dual coarse delay line
• Fine set first, then coarse shift

[10] J.-T. Kwak et al., VLSI 2003, pp. 283-284
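The difference between the boundary switching problem and the seamless scheme can be illustrated with an idealized linear mixer model, OUT12 = IN1×(1−K) + IN2×K. This is a simplified sketch under stated assumptions: the unit delay `T_UD_PS`, the tap indices, and the function `mixer_delay` are hypothetical, and a real phase mixer is not perfectly linear.

```python
T_UD_PS = 100.0  # hypothetical unit delay of one coarse delay cell, in ps

def mixer_delay(tap_in1, tap_in2, k):
    """Idealized delay of OUT12 = IN1*(1-K) + IN2*K.

    tap_in1 / tap_in2 are the numbers of unit delay cells that IN1 and IN2
    pass through; k is the fine mixing weight (0 <= k <= 1).
    """
    return (1.0 - k) * tap_in1 * T_UD_PS + k * tap_in2 * T_UD_PS

# Conventional line: the coarse taps shift while K is still 0.9,
# so the output jumps by a full unit delay.
GLITCH_PS = mixer_delay(4, 5, 0.9) - mixer_delay(3, 4, 0.9)

# Seamless scheme: fine set first (K = 1.0, IN1 has zero weight), then the
# unused input IN1 is shifted by two cells; the output does not move.
BEFORE_SHIFT_PS = mixer_delay(3, 4, 1.0)
AFTER_SHIFT_PS = mixer_delay(5, 4, 1.0)

print("conventional jump:", GLITCH_PS, "ps")
print("seamless before/after:", BEFORE_SHIFT_PS, AFTER_SHIFT_PS, "ps")
```

After the shift, reducing K from 1 toward 0 moves the output smoothly from tap 4 to tap 5, which is why the order "fine set, then coarse shift" avoids the jump.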

Adaptive Bandwidth DLL w/ SDVS*

[Figure: DLL with variable delay line, replica delay, PD, and controller. An update pulse generator driven by I_CLK and FB_CLK sets the update period of the controller; NCODE<0:N> is sent to the upper block.]

Update period: m×tCK − tDREP + tDREP = m×tCK
With m=2, BW_DLL = 1/(2×tCK)

Fine unit delay vs. mode
• Low-speed mode (DN): 15.9 ps
• Base mode (BASE): 10.2 ps
• High-speed mode (UP): 7.8 ps

* SDVS: Self-dynamic voltage scaling
[11] H.-W. Lee et al., ISSCC 2011, pp. 502-504

Duty Cycle Corrector (DCC)

• DCC
  • Reduces duty cycle error
  • Enlarges the valid data window for DDR
  • Needs to correct ±15% duty error at max speed
  • Can be implemented as either an analog or a digital type

• DCC design issues
  • Location of the DCC (before/after the DLL)
  • Embedded in the DLL or not
  • Power consumption
  • Area
  • Operating frequency range
  • Locking time in the case of a digital DCC
  • Offset of the duty cycle detector

Digital DCC

[Figure: three digital DCC types, each producing a 50% duty-cycle OUT from IN:
 (1) invert and delay: an invert-delay clock generator followed by a phase mixer;
 (2) a pulse width controller with a duty cycle detector;
 (3) a half-cycle delayed clock generator (HD_IN) followed by an edge combiner.]

DCC in GDDR5 Clk Distribution

[Figure: WCK/WCKb enter a 4-phase clock RX and divider, with a selectable PLL (PLL sel.); a CML-only global network carries the clock to the DQ, where a CML2CMOS stage produces rxclk/rxclkb. The duty cycle corrector consists of a duty cycle detector, an adder-based counter (up/dn), a control pulse generator (hclk & lclk), a decoder (c<1:5>, s<1:4>, sw), and a duty-cycle adjuster (DCA) with X8, X4, X2, X1 weighted legs acting through CML_bias.]

• DCA is not in the clock path
• No jitter addition

[12] D. Shin et al., VLSI 2009, pp. 138-139

DLL-related Parameters & Reference

                  DDR1         DDR2         DDR3/DDR3L       GDDR3             GDDR4
VDD               2.5V         1.8V         1.5V/1.35V       1.8V              1.5V
Lock time         200 cycles   200 cycles   512 cycles       2~5K cycles       2~20K cycles
Max. tDQSCK       600ps        300ps        225ps            180ps             140ps
Nominal speed     166MHz       333MHz       333MHz~800MHz    600MHz~1.37GHz    1.6GHz
Max. tCK          12ns         8ns          3.3ns            3.3ns             2.5ns
tXPDLL* (tXARD)   1×tCK        2×tCK        10×tCK           7×tCK+tIS         9×tCK+tIS

* tXPDLL (tXARD): Timing for exit precharge power-down to any non-READ command

Related area and references (type: [ ] digital, [ ]* mixed, [ ]** analog)
• DCC block: [13][14][15]**[16][17]**[18][19]**[20][21]**
• Variable delay line: [14][18][19]**[20][22][23]**[24][25]*[26][27][28]**[29][30]
• Delay control logic: [13][14][15]**[16][18][20][21]**[23]*[25]*[26][28]**[29][30]**[31][32]*[33]**[34]*[35]*
• Replica: [27][28]**[30]**[32]
• Low jitter: [14][15]**[16][17]**[19]**[24][26][27][32]*[36]*

Clock Distribution

[Figure: die layout with CK/CKB entering a global clock buffer that distributes the clock to 16 DQs]

• Clock distribution issues
  • Clock skew among DQs
  • Low power
  • Robust under PVT variations
  • CML to CMOS converter jitter

[37] S.-J. Bae et al., ISSCC, 2011, pp. 498-500

CML to CMOS Converter

[Figure: the global clock buffer drives CLKP/CLKN over 1700μm to the CML to CMOS converter (OUTP/OUTN), which produces CLKOUT for the DQ]

• Global clock buffer
  • Current mode logic (CML): high-speed clock
• CML to CMOS converter issues
  • Susceptible to noise
  • Jitter

Outline

• Introduction
• Clock Generation and Distribution
• Transceiver Design
  • Pre-emphasis
  • Equalizer
  • Crosstalk and skew
  • Training
  • Input buffer
  • Output driver
  • DBI/CRC
• TSV Interface for DRAM
• Summary
• References

[Figure: two transceivers connected through a channel (CH), each with pre-emphasis, output driver, input buffer, equalizer, training, and DBI/CRC blocks]

Channel Characteristics

[Figure: a GPU connected directly to GDDRx devices; a CPU in a socket connected to DIMMs through DIMM slots]

• GDDRx
  • Point-to-point connection
  • Performance target
    • High data rate
  • Few reflection components
    • PCB vias
• DDRx
  • Multidrop
  • Performance and power
  • Many reflection components
    • PCB vias, DIMM connector, etc.

Emphasis for Channel Compensation

[Figure, top: amplitude versus time. The original signal is distorted after passing through the channel.]
[Figure, bottom: D(in) passes through an FFE and the channel to D(out). Amplitude versus frequency up to fdata/2 for the FFE, the channel, and the FFE and channel combined.]

Pre-emphasis vs. De-emphasis

[Figure: output amplitude Va versus time for 1-tap pre-emphasis, no emphasis, and 1-tap de-emphasis]

• Pre-emphasis: transition bit boosting
• De-emphasis: non-transition bit suppression

Basic De-emphasis Circuit

[Figure: 1-tap de-emphasis model. Din X(n) is weighted by K0; a copy delayed by one unit delay (D flip-flop) is weighted by −K1; the sum is Dout Y(n).]

• The number of taps
  • Depends on the channel quality
  • Usually from one to three taps
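The 1-tap de-emphasis model, Y(n) = K0·X(n) − K1·X(n−1), can be sketched as a two-tap FIR filter. The coefficient values K0 = 0.75 and K1 = 0.25, the ±1 symbol mapping, and the function name `de_emphasis` are assumptions made for this example.

```python
def de_emphasis(bits, k0=0.75, k1=0.25):
    """1-tap de-emphasis: Y(n) = K0*X(n) - K1*X(n-1) on +1/-1 symbols.

    The line is assumed to have been holding the first symbol before the
    sequence starts, so the first output is a non-transition level.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    previous = symbols[0]
    out = []
    for x in symbols:
        out.append(k0 * x - k1 * previous)
        previous = x
    return out

LEVELS = de_emphasis([0, 0, 1, 1, 1, 0])
print(LEVELS)
```

Transition bits keep the full swing (±1.0) while repeated bits are suppressed to ±0.5, which is the "non-transition bit suppression" behaviour named on the previous slide.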

Pre-emphasis Circuit [1/2]

[Figure: Din(n), Din(n−1), and Din(n−2) each pass through 4:2 and 2:1 serializers; a pre-emphasis stage is applied before the final driver, which drives DQ/DQB. Voltage [V] versus time [psec] (0 to 400) compares no pre-emphasis, conventional pre-emphasis, and the proposed pre-emphasis.]

• Cascaded pre-emphasis
  • Internal node ISI arises from limited transistor performance at high speed
  • The internal node pre-emphasis ratio is not affected by the channel
  • Less sensitive to the system environment or channel variations

[38] K.-H. Kim et al., JSSC, Jan 2006, pp. 127-134

Pre-emphasis Circuit [2/2]

[Figure: Din drives a main driver and a pre-emphasis driver in the TX; the pre-emphasis driver couples to Dout through a boosting capacitor CC in series with RC. The channel (CH) connects to the GPU. The equivalent linear model uses RT, RC, CC, CP, and CL.]

• Voltage mode driver pre-emphasis
  • Additional zero by CC
  • Time-continuous pre-emphasis

[39] H. Partovi et al., ISSCC, 2009, pp. 136-137

Decision Feedback Equalization (DFE)

[Figure: four amplitude-versus-time plots (A) to (D) of a received pulse over 1 UI: the ISI in the received signal, the emulated ISI, and the result with no ISI after subtraction]

• DFE cancels ISI without noise amplification
• Clock must be provided by a DLL or PLL
• Critical path (feedback path) is important

[40] Y. Hidaka, CMOS Emerging Technologies Workshop, May 2010
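The DFE principle above (subtract the emulated ISI of the previous decision before slicing) can be sketched with a one-tap behavioural model. The channel model rx(n) = x(n) + α·x(n−1), the value α = 0.4, and the names `channel` and `dfe_1tap` are assumptions made for illustration only.

```python
def channel(symbols, alpha, initial=-1):
    """Toy channel: each received sample carries alpha times the previous symbol."""
    received, previous = [], initial
    for x in symbols:
        received.append(x + alpha * previous)
        previous = x
    return received

def dfe_1tap(samples, alpha, initial=-1, vref=0.0):
    """1-tap DFE: subtract alpha * previous decision, then slice against vref."""
    decisions, corrected_samples, previous = [], [], initial
    for s in samples:
        corrected = s - alpha * previous      # cancel the emulated ISI
        decision = 1 if corrected > vref else -1
        corrected_samples.append(corrected)
        decisions.append(decision)
        previous = decision                   # feedback must settle within 1 UI
    return decisions, corrected_samples

TX = [1, 1, -1, 1, -1, -1, 1]
RX = channel(TX, alpha=0.4)
DECISIONS, CORRECTED = dfe_1tap(RX, alpha=0.4)
print("received :", RX)
print("corrected:", CORRECTED)
```

In this toy model the corrected samples return to approximately ±1, so the slicer margin lost to ISI is restored without amplifying noise; the feedback of `previous` is the critical path that has to complete within one UI.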

Fast Feedback 1-tap DFE

[Figure: four DFE sense amplifiers (DFE SA), each followed by an SR latch, sample DQ against Vref on the four phases WCK/2_0, WCK/2_90, WCK/2_180, and WCK/2_270, producing D0, D90, D180, and D270. Each DFE SA receives the previous phase's output (for example P270/P270b for the WCK/2_0 sampler) scaled by α. The timing diagram shows alternating precharge and evaluation phases, with the feedback time TFB subtracting the ISI.]

• The previously captured data must be fed back to the receiver within 1UI
• TFB = TSA < 1UI

[41] S.-J. Bae et al., ISSCC, 2008, pp. 278-279

Crosstalk

[Figure: two adjacent lines coupled by mutual capacitance Cm and mutual inductance Lm, with currents ICm and ILm flowing toward the near and far ends]

Inear = ICm + ILm
Ifar = ICm − ILm

[Figure: timing effect. The input signal produces near-end crosstalk and far-end crosstalk, which causes timing jitter at the far end.]

• Crosstalk is the coupling of energy from one line to another

Staggered Memory Bus

[Figure: staggered memory bus between the MCU and the DRAM, with adjacent channels offset in time (τ)]

• No discrepancy of propagation delay due to crosstalk
• The difference of transition points is τ/2
• The distance between channels with the same transition point is increased
• Jitter due to coupling from the adjacent channel is reduced

[42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232

Glitch Canceller

[Figure: TX2 is the victim between aggressors TX1 and TX3. Transition detectors on DTX1 and DTX3 report rise/fall events, and the TX2 driver current becomes IBIAS + ICOMP.]

• Compensates for the glitch by adding or subtracting current
  • Rise: ICOMP is added to the main driver
  • Fall: ICOMP is subtracted from the main driver

[42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232

Crosstalk Equalizer (TX)

[Figure: crosstalk equalizing driver. DO[0] drives DQ[0]; the neighboring data DO[1:3] control enable signals EN[0:5] of the driver for a window Δt.]

• Crosstalk equalization at the transmitter
• Cancels the crosstalk through the impedance calibration

[37] S.-J. Bae et al., ISSCC, Feb. 2011, pp. 498-500

Skew

[Figure: the MCU/GPU sends CLK, command, address, DQS, and DQ to the DRAM with different flight times (TD, TD', TD''). Inside the DRAM, the peripheral circuit (CMD controller, DLL, serial/parallel conversion, generator) sits between the banks.]

• Differences of flight time between signals
• Skew can cause timing errors
• Key design criterion in high-speed systems

Pre/De-skew with Preamble Signal

[Figure: the external clock drives a PLL that generates the sampling clock. 8-bit skewed data passes through delay lines and a mux to give de-skewed data; a skew estimator on Data[n] stores 3-bit values in skew register files that select the delays.]

• A skew cancellation circuit is put in each DRAM
• With the estimated skew information
  • De-skew the data during write mode
  • Pre-skew the data during read mode

[43] S. H. Wang et al., JSSC, Apr. 2001, pp. 648-657

Fly-by Topology for DDR3

• T-branch topology
  • CLK/CMD/Address are applied to each DRAM in parallel
  • Small skew between CLK and DQS
• Fly-by topology
  • Better signal integrity by reducing the number of stubs and the stub length
  • Easy to apply a single termination (VTT) at the end of the signal
  • DQ and DQS are applied to each DRAM at the same time
  • Large skew between CLK and DQS
  • Need to calibrate skew

[Figure: for each topology, DRAM #1 to #8 with the routing of DQ & DQS and of CLK, CMD, Address, and a plot of skew [s] versus DRAM #1 to #8]

[4] JEDEC, JESD79-3E, pp. 56-59

Write Leveling for DDR3

[Timing diagram: at the source, CK/CK# edges T0 to T7 and diff_DQS are aligned. At the destination, CK/CK# arrives later (Tn, T0 to T6) than diff_DQS, so DQ returns 0. DQS is pushed until the 0-to-1 transition is captured, after which DQ returns 1.]

• Write leveling
  • Compensates the timing mismatch between CLK and DQS
  • Write leveling is applied to each DRAM individually

[4] JEDEC, JESD79-3F, pp. 56-59
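The write leveling procedure can be sketched as a delay sweep: the DRAM samples CK with the rising edge of DQS and returns the result on DQ, and the controller pushes DQS until it sees the 0-to-1 transition. The arrival times, step size, and function names below are hypothetical, and a 50% duty-cycle CK is assumed.

```python
def ck_level(t_ps, t_ck_ps):
    """CK level seen at the DRAM, t_ps after a rising CK edge (50% duty assumed)."""
    return 1 if (t_ps % t_ck_ps) < t_ck_ps / 2 else 0

def write_leveling(ck_arrival_ps, dqs_arrival_ps, t_ck_ps, step_ps, max_steps):
    """Increase the DQS delay until the DRAM feedback on DQ goes from 0 to 1."""
    previous = None
    for step in range(max_steps):
        delay = step * step_ps
        # The DRAM samples CK at the (delayed) DQS rising edge and returns it on DQ.
        sample = ck_level(dqs_arrival_ps + delay - ck_arrival_ps, t_ck_ps)
        if previous == 0 and sample == 1:
            return delay          # DQS rising edge now aligned to CK rising edge
        previous = sample
    return None                   # no transition found within the sweep range

# Hypothetical fly-by case: CK arrives 700 ps later than DQS at this DRAM.
LEVELED_DELAY_PS = write_leveling(ck_arrival_ps=900, dqs_arrival_ps=200,
                                  t_ck_ps=1250, step_ps=25, max_steps=100)
print("DQS delay after leveling:", LEVELED_DELAY_PS, "ps")
```

Because the CK flight time differs for every DRAM on a fly-by bus, the sweep is repeated per DRAM and each device ends up with its own DQS delay.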

Training for GDDR5

[Figure: GDDR5 timing after training, showing CK, CMD, ADDR, WCK, and DQ]

• Adaptive interface training
  • Ensures the widest timing margins for all signals
  • Controlled by the MCU

[44] W. Hubert et al., ATS, 2008, pp. 24-27

Training Sequence for GDDR5

1. Power up
   • Detect the configuration and mirror function
   • ODT setting
2. Address training (optional)
   • Optimize the address input data eye
3. WCK2CK alignment training
   • Clock alignment
   • Ready for read/write
4. READ training
   • Search for the best read data eye
   • Detect burst boundaries of the read stream
5. WRITE training
   • Search for the best write data eye
   • Detect burst boundaries of the write stream
6. Exit

[45] JEDEC, JESD212, pp. 23-39

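Training Example: Write Training

[Figure, before training: the memory controller launches all data eyes at t0; at the GDDR5 device the write data eyes arrive at t0 and with offsets t1 and t2.]
[Figure, after training: the memory controller launches the data eyes at t0, t0 + t1, and t0 − t2 so that the write data eyes at the GDDR5 device are aligned.]

[44] W. Hubert et al., ATS, 2008, pp. 24-27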
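One common way to "search for the best data eye" is to sweep a per-pin delay, record pass/fail at each setting, and pick the center of the widest passing window. The sketch below illustrates that generic idea only; it is not taken from the GDDR5 specification, and the function names and the pass/fail pattern are hypothetical.

```python
def widest_passing_window(results):
    """Return (first, last) tap indices of the longest run of passing taps."""
    best, start = None, None
    # A trailing False closes a window that runs to the end of the sweep.
    for i, ok in enumerate(list(results) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if best is None or (i - start) > (best[1] - best[0] + 1):
                best = (start, i - 1)
            start = None
    return best

def eye_center(results):
    """Pick the delay tap in the middle of the widest passing window."""
    window = widest_passing_window(results)
    if window is None:
        return None
    return (window[0] + window[1]) // 2

# Hypothetical sweep result for one DQ pin (True = written data read back correctly)
SWEEP = [False, False, True, True, True, True, True, False, False, True, False]
print("window:", widest_passing_window(SWEEP), "center tap:", eye_center(SWEEP))
```

Repeating the sweep per pin yields the individual launch offsets that the figure labels t1 and t2.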

Input Buffer

[Figure: the MCU/GPU drives CLK, command (4), address (m*), DQS, and DQ (n) into the DRAM input buffers; inside the DRAM, the peripheral circuit (CMD controller, DLL, serial/parallel conversion, generator) sits between the banks.]

* m: The number of address channels, which depends on the kind of memory and its density

• Converts the attenuated external signal to a rail-to-rail signal
• Trade-off between high-speed operation and power consumption

Input Buffer Comparison

• Differential type (inputs In and Vref, enable En, output OUT)
  • Complex circuit
  • High-speed input
  • Robust to noise
  • Stable threshold
  • Commonly used

• CMOS type (input In, enable En, output OUT)
  • Simple circuit
  • Low-speed input (CKE)
  • Susceptible to noise
  • Unstable threshold

DDR4 Input Buffer

• Gain enhanced buffer (In and Vref inputs, with a transition detector controlling the bias I)
  • A signal transition detector is added
  • The bias level (I) is controlled
  • Sensitivity can be enhanced at higher frequencies

• Wide common-mode range DQ buffer (first amplifier with CMFB*, followed by a second-stage amplifier)
  • Delivers stable inputs to the second-stage amplifier
  • The feedback network reduces the output common-mode variation

* CMFB: Common-mode feedback
[46] K. Sohn et al., ISSCC, 2012, pp. 38-40

Pseudo Open Drain (POD)

[Figure: pull-up and pull-down drivers controlled by Din drive the channel to the I/O buffer; the pull-up and pull-down legs are each 240Ω]

• Impedance calibration
  • Manual vs. automatic
  • External resistor

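Impedance Calibration

[Figure: the ZQ pad is connected to an external resistor. A comparator against Vref updates a register (REG, enabled by En) that outputs the n-bit codes PUcon and PDcon. The codes switch parallel pull-up legs (width WP, resistor R) and pull-down legs (width WN, resistor R), and the same codes are applied to the output driver (Din, PU, PD, Dout).]

• Thermometer code control

[47] C. Park et al., JSSC, Apr. 2006, pp. 831-838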
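A thermometer-coded calibration loop can be sketched as follows: identical pull-up legs are enabled one by one until the ZQ pad voltage, set by the divider with the external resistor, reaches Vref. This is a behavioural sketch only; the leg resistance, VDDQ value, leg count, and function names are hypothetical, and only the 240Ω external resistor is taken from the slides.

```python
def calibrate_pull_up(r_leg_ohm, r_ext_ohm=240.0, vddq=1.5, max_legs=32):
    """Enable identical parallel legs until the pad voltage reaches Vref = VDDQ/2.

    Returns (enabled_legs, pull_up_resistance_ohm).
    """
    vref = vddq / 2.0
    for legs in range(1, max_legs + 1):
        r_pull_up = r_leg_ohm / legs                       # legs in parallel
        v_pad = vddq * r_ext_ohm / (r_pull_up + r_ext_ohm)  # divider at the ZQ pad
        if v_pad >= vref:                                  # comparator trips
            return legs, r_pull_up
    return max_legs, r_leg_ohm / max_legs                  # out of range

def thermometer_code(enabled_legs, max_legs=32):
    """Thermometer code: one bit per leg, the first enabled_legs bits set."""
    return "1" * enabled_legs + "0" * (max_legs - enabled_legs)

# Hypothetical process corner: each leg is 2 kOhm
LEGS, R_PULL_UP = calibrate_pull_up(r_leg_ohm=2000.0)
print(LEGS, round(R_PULL_UP, 1), thermometer_code(LEGS))
```

Since every leg has the same weight, each code step changes the impedance monotonically, at the cost of one control bit per leg.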

Multi Slew-rate Output Driver

[Figure: the ZQ pad is connected to an external 240Ω resistor. Comparators against Vref feed digital filters (DF), which generate the n-bit codes PUcon and PDcon. The codes switch binary-weighted pull-up legs (WP/4, WP/2, WP, ... 32WP with 128R, 64R, 32R, ... R) and pull-down legs (WN/4, WN/2, WN, ... 32WN). Driver strengths of 60Ω, 120Ω, and 240Ω are indicated.]

DF = Digital LPF + UP/DOWN Counter

• Binary-weighted code control

[48] D. U. Lee et al., ISSCC, 2008, pp. 280-613

Global ZQ Calibration

[Figure: a global ZQ calibration block generates a global reference signal (Zcal, i0cal) that is distributed to DQ0 and DQn (n=1~31). Each DQ has a local ODT calibration block with a comparator (CP), pre-amplifier (PA), local PVT sensor (LS), and local controller (LO).]

CP: Comparator
PA: Pre-amplifier
LS: Local PVT sensor
LO: Local controller

• Global impedance mismatch error < 1%
• PVT variation sensor

[49] J. Koo et al., CICC, 2009, pp. 717-720

Data Bus Inversion (DBI)

• Power reduction technique independent of data pattern
• Dominant power (I/O buffer)
  • P = α × CPCB × VDD²
  • α < 0.5
• For high-BW memory, inversion time + CRC can be a bottleneck

[50] S.-S. Yoon et al., ASSCC 2008, pp. 249-252
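The bound α < 0.5 follows from inverting a byte whenever more than half of its data lines would toggle. The sketch below shows one such transition-minimizing encoding (often called DBI-AC); it is an illustrative assumption rather than the exact rule used in the cited work, and the function names and the all-zero initial bus state are hypothetical.

```python
def dbi_ac_encode(data_bytes, initial_bus=0x00):
    """Invert a byte when more than 4 of its 8 data lines would toggle.

    Returns a list of (transmitted_byte, dbi_flag) pairs.
    """
    previous = initial_bus
    encoded = []
    for byte in data_bytes:
        toggles = bin((byte ^ previous) & 0xFF).count("1")
        if toggles > 4:
            sent, flag = (~byte) & 0xFF, 1    # send the inverted byte
        else:
            sent, flag = byte, 0
        encoded.append((sent, flag))
        previous = sent                       # bus state seen by the next byte
    return encoded

def dbi_decode(encoded):
    """Undo the inversion using the DBI flag sent alongside each byte."""
    return [((~sent) & 0xFF) if flag else sent for sent, flag in encoded]

DATA = [0xFF, 0x00, 0xFF, 0x0F]
ENCODED = dbi_ac_encode(DATA)
print(ENCODED, dbi_decode(ENCODED) == DATA)
```

With this rule at most 4 of the 8 data lines toggle per transfer, at the cost of one extra DBI line and the inversion decision time noted on the slide.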

Cyclic Redundancy Check (CRC)

Error type            Detection rate
Random single bit     100%
Random double bit     100%
Random odd count      100%
Burst ≤ 8             100%

• Data error check for every unit interval (64 bits, data only)
• Redundancy bit : 1 bit/
• Speed bottleneck for high-BW
• Time (READ DBI + READ CRC + CRC calculator) < 9 periods

[50] S.-S. Yoon et al., ASSCC 2008, pp. 249-252

CRC (cont’d)

• X^8 + X^2 + X^1 + 1 with an initial value of '0'
• Algorithm for GDDR5: ATM-0x83
• The logic for the algorithm takes a long time
• To increase CRC speed
  • XOR logic optimization
• CRC calculation time < TCRC
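The polynomial X^8 + X^2 + X + 1 with an initial value of 0 can be sketched as a bit-serial CRC-8. This sketch assumes MSB-first processing with no reflection and no final XOR; the bit and lane ordering actually used on a GDDR5 bus is not modelled, and hardware typically flattens the loop into parallel XOR trees, which is the "XOR logic optimization" mentioned above.

```python
def crc8_atm(data, init=0x00):
    """Bit-serial CRC-8 for x^8 + x^2 + x + 1 (0x07), MSB first, initial value 0."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF   # shift and subtract the polynomial
            else:
                crc = (crc << 1) & 0xFF
    return crc

# 64 data bits, matching the "64 bits, data only" unit on the previous slide
BLOCK = bytes([0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0])
CHECK = crc8_atm(BLOCK)

# A single flipped bit changes the CRC, so the error is detected.
corrupted = bytearray(BLOCK)
corrupted[3] ^= 0x10
print(hex(CHECK), hex(crc8_atm(bytes(corrupted))))
```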

Outline

• Introduction
• Clock Generation and Distribution
• Transceiver Design
• TSV Interface for DRAM
  • Bandwidth requirement
  • DRAM with TSV
  • TSV DRAM type
  • DRAM stacking type
  • Data confliction issue & solution
  • Failed TSV issue & solution
• Summary
• References

Bandwidth Requirements

[Figure: DDRx / GDDRx data rate/pin trend. Data rate/pin [Gbps] (0 to 12) versus year (2000 to 2015) for DDR, DDR2, DDR3, DDR4 and GDDR1, GDDR3, GDDR4, GDDR5, with the next generation marked "?".]

           Gb/s/chip   Gb/s/pin
GDDR1      32          1
GDDR3      51.2        1.6
GDDR4      102.4       3.2
GDDR5      224         7
GDDR?      448 (?)     14 (?)

• Requirement
  • The next GDDR will require a data rate over 10Gb/s/pin
• Restrictions
  • Very difficult over 10Gb/s/pin
  • Cost for performance improvements
  • Power consumption

DRAM with TSV

[Figure: memory dies stacked with TSVs and connected to the MCU/GPU through wide I/O, either stacked directly or side by side on an interposer]

• Advantages of DRAM with TSV
  • Higher density per area
  • Shorter interconnection: lower power, faster flight time
  • Higher bandwidth with wide I/O

• Wide I/O can reach the bandwidth expected of the next GDDR (448 Gb/s/chip) at a low per-pin rate
  (Example: 800 Mb/s/pin × 512 I/O ≈ 410 Gb/s/chip)
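The wide-I/O argument is simple arithmetic: chip bandwidth is the per-pin rate times the I/O count. The helper names below are hypothetical; the figures are the ones quoted on the bandwidth slides (7 Gb/s/pin × 32 pins for GDDR5, 512 I/Os for wide I/O, and a 448 Gb/s/chip target).

```python
def chip_bandwidth_gbps(rate_per_pin_mbps, n_io):
    """Aggregate bandwidth in Gb/s for n_io pins at rate_per_pin_mbps each."""
    return rate_per_pin_mbps * n_io / 1000.0

def required_rate_per_pin_mbps(target_gbps, n_io):
    """Per-pin rate in Mb/s needed to reach target_gbps over n_io pins."""
    return target_gbps * 1000.0 / n_io

GDDR5_GBPS = chip_bandwidth_gbps(7000, 32)            # 7 Gb/s/pin, x32
WIDE_IO_GBPS = chip_bandwidth_gbps(800, 512)          # 800 Mb/s/pin, 512 I/O
RATE_FOR_448 = required_rate_per_pin_mbps(448, 512)   # target of the next GDDR

print(GDDR5_GBPS, WIDE_IO_GBPS, RATE_FOR_448)
```

Reaching 448 Gb/s/chip over 512 I/Os needs 875 Mb/s/pin, compared with 14 Gb/s/pin over 32 pins.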

TSV DRAM Type

Type           Main Memory    Mobile          Graphics
No. of TSV     500~1000 EA    1000~1500 EA    2000~3000 EA

Architecture: [Figure: stacked DRAM for each type, with labels GPU, Controller, Package, and Interposer]

Features listed on the slide: low power, low power, max bandwidth (first row, in column order); multi channel, high speed, multi channel, wide I/O (remaining entries)

Stacking Type

Type           Homogeneous     Heterogeneous
Architecture   [stack figure]  [stack of Slave, Slave, Slave, Master]
Feature        • Same chips    • Slave: only cells
               • Low cost      • Master: with peripheral

Data Confliction Issue

[Figure: CHIP 0 to CHIP 3 share a DQ pin through a TSV; each chip has an output driver (MP0/MN0 with EN0 and /EN0, ... MP3/MN3 with EN3 and /EN3). Under PVT variations the fastest chip and the slowest chip drive DQ at different times, so one chip drives HIGH while another drives LOW (data confliction).]

• PVT variations cause the data skew
• Data confliction increases the short-circuit current

[51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50

Separate Data Bus per Group

[Figure: four stacked ranks (Rank 0 to Rank 3); each rank has banks in Group A and Group B, and each group has its own TSV array]

• Separate data bus per bank group
• Less dependent on the PVT variation

[52] U. Kang et al., ISSCC, 2009, pp. 130-131

DLL-Based Self-Aligner

[Figure: CHIP 0 to CHIP 3 stacked through TSVs with a dummy pin alongside the DQS or DQ pin. Each chip contains a skew detector (PD1, PD2, replica, TSV model; RFBCLK, TFBCLK, TRCLK, READ/READb) and a skew compensator (fine aligner on CK with UP/DN control, MODE select between real path and C_CLK, pipe latches), turning the data into aligned data.]

• Data alignment to the external clock or to the clock of the slowest chip

[51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50

Failed TSV Issue

[Figure: failed TSVs: (a) TSV plating defect, (b) pinch-off]

• Decreases the assembly yield
• Increases the total cost

[53] D. Malta et al., ECTC, 2010, pp. 1779-1775

TSV Check

[Figure: test signal generating circuits at the sender end drive In_0 to In_4 into TSV_0 to TSV_4; scan chain based testing circuits at the receiver end observe Out_0 to Out_4]

• A TSV connectivity check using the internal circuit

[54] A.-C. Hsieh et al., TVLSI, Apr. 2012, pp. 711-722

TSV Repair

[Figure: signals A to D on Chip1 connect to A' to D' on Chip2. Conventional: TSVs a to d carry the signals and dedicated redundant TSVs r1 and r2 replace failed ones. Proposed: TSVs a to f, where each signal can shift to a neighboring TSV.]

• Redundant TSVs for failed TSVs
  • Conventional: redundant TSVs are dedicated and fixed
  • Proposed: a failed TSV is repaired with a neighboring TSV

[52] U. Kang et al., ISSCC, 2009, pp. 130-131
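Neighbor-based repair can be sketched as a shift: each signal takes the next working TSV in order, so a failure only moves signals to adjacent TSVs. This is a simplified illustration and not the circuit of the cited work; the function name, the 4-signal / 6-TSV sizes, and the failure pattern are hypothetical.

```python
def assign_signals(n_signals, n_tsvs, failed):
    """Map each signal to the next working TSV in order, skipping failed ones.

    Returns {signal_index: tsv_index}, or None if too few working TSVs remain.
    """
    working = [tsv for tsv in range(n_tsvs) if tsv not in failed]
    if len(working) < n_signals:
        return None
    return {signal: working[signal] for signal in range(n_signals)}

# Signals A..D (0..3) over TSVs a..f (0..5), with TSV b (index 1) failed
MAPPING = assign_signals(n_signals=4, n_tsvs=6, failed={1})
MAX_SHIFT = max(tsv - signal for signal, tsv in MAPPING.items())
print(MAPPING, "largest shift:", MAX_SHIFT)
```

In this model a signal never shifts by more than the number of failed TSVs, so the routing detour stays short regardless of where the failure occurs.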

Outline

• Introduction
• Clock Generation and Distribution
• Transceiver Design
• TSV Interface for DRAM
• Summary
• References

Summary

• Although all types of DRAMs are reaching their limits in supply voltage, the demand for high-bandwidth memory keeps increasing

• For synchronization of the external clock and the output of the DRAM, low power, small area, and low skew are important design parameters

• To achieve high-BW memory, many design techniques have been and will be adopted from other high-speed wireline transceivers

• A TSV interface for DRAM might be a good solution to achieve high bandwidth and low power

Suggested Papers to See

• 17.1 “A 6.4Gb/s near-ground single-ended transceiver for dual-rank DIMM memory interface systems”

• 17.2 “A 27% reduction in transceiver power for single-ended point-to-point DRAM interface with the termination resistance of 4×Z0 at both TX and RX”

• 17.3 “A 5.7mW/Gb/s 24-to-240Ω 1.6Gb/s thin-oxide DDR transmitter with 1.9-to-7.6V/ns clock-feathering slew-rate control in 22nm CMOS”

• 17.4 “An adaptive-bandwidth PLL for avoiding noise interference and DFE-less fast precharge sampling for over 10Gb/s/pin graphics DRAM interface”

References

[1] K. Koo et al., “A 1.2V 38nm 2.4Gb/s/pin 2Gb DDR4 SDRAM with bank group and ×4 half-page architecture,” in IEEE ISSCC Dig. Tech. Papers, pp. 40-41, 2012.
[2] JEDEC, JESD79F.
[3] JEDEC, JESD79-2F.
[4] JEDEC, JESD79-3F.
[5] JEDEC, JESD79-4.
[6] T.-Y. Oh et al., “A 7Gb/s/pin GDDR5 SDRAM with 2.5ns bank-to-bank active time and no bank-group restriction,” in IEEE ISSCC Dig. Tech. Papers, pp. 434-435, 2010.
[7] H.-W. Lee et al., “Survey and analysis of delay-locked loops used in DRAM interfaces,” submitted to IEEE Trans. VLSI Syst.
[8] T. Saeki et al., “A 2.5 ns clock access 250 MHz 256 Mb SDRAM with a synchronous mirror delay,” in IEEE ISSCC Dig. Tech. Papers, pp. 374-375, 1996.
[9] A. Hatakeyama et al., “A 256 Mb SDRAM using a register-controlled digital DLL,” in IEEE ISSCC Dig. Tech. Papers, pp. 72-73, 1997.
[10] J.-T. Kwak et al., “A low cost high performance register-controlled digital DLL for 1Gbps x32 DDR SDRAM,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 283-284, 2003.
[11] H.-W. Lee et al., “A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm CMOS technology,” in IEEE ISSCC Dig. Tech. Papers, pp. 502-504, 2011.
[12] D. Shin et al., “Wide-range fast-lock duty-cycle corrector with offset-tolerant duty-cycle detection scheme for 54nm 7Gb/s GDDR5 DRAM interface,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 138-139, 2009.
[13] W.-J. Yun et al., “A 3.57 Gb/s/pin low jitter all-digital DLL with dual DCC circuit for GDDR3 DRAM in 54-nm CMOS technology,” IEEE Trans. VLSI Sys., vol. 19, no. 9, pp. 1718-1722, Nov. 2011.
[14] H.-W. Lee et al., “A 7.7mW/1.0ns/1.35V delay locked loop with racing mode and OA-DCC for DRAM interface,” in Proc. of Int. Symp. Circuits and Syst., pp. 3861-3864, 2010.
[15] B.-G. Kim et al., “A DLL with jitter reduction techniques and quadrature phase generation for DRAM interfaces,” IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1522-1530, May 2009.

[16] W.-J. Yun et al., “A 0.1-to-1.5GHz 4.2mW all-digital DLL with dual duty-cycle correction circuit and update gear circuit for DRAM in 66nm CMOS technology,” in IEEE ISSCC Dig. Tech. Papers, pp. 282-283, 2008.
[17] S. Kim et al., “A low jitter, fast recoverable, fully analog DLL using tracking ADC for high speed and low stand-by power DDR I/O interface,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 285-286, 2003.
[18] T. Matano et al., “A 1-Gb/s/pin 512-Mb DDRII SDRAM using a digital DLL and a slew-rate-controlled output buffer,” IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 762-768, May 2003.
[19] K.-H. Kim et al., “Built-in duty cycle corrector using coded phase blending scheme for DDR/DDR2 synchronous DRAM application,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 287-288, 2003.
[20] J.-T. Kwak et al., “A low cost high performance register-controlled digital DLL for 1 Gbps x32 DDR SDRAM,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 283-284, 2003.
[21] O. Okuda et al., “A 66-400 MHz, adaptive-lock-mode DLL circuit with duty-cycle error correction [for SDRAMs],” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 37-38, 2001.
[22] F. Lin et al., “A wide-range mixed-mode DLL for a combination 512 Mb 2.0 Gb/s/pin GDDR3 and 2.5 Gb/s/pin GDDR4 SDRAM,” IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 631-641, Mar. 2008.
[23] K.-W. Kim et al., “A 1.5-V 3.2 Gb/s/pin Graphic DDR4 SDRAM with dual-clock system, four-phase input strobing, and low-jitter fully analog DLL,” IEEE J. Solid-State Circuits, vol. 42, no. 11, pp. 2369-2377, Nov. 2007.
[24] D.-U. Lee et al., “A 2.5Gb/s/pin 256Mb GDDR3 SDRAM with series pipelined CAS latency control and dual-loop digital DLL,” in IEEE ISSCC Dig. Tech. Papers, pp. 547-548, 2006.
[25] S.-J. Bae et al., “A 3Gb/s 8b single-ended transceiver for 4-drop DRAM interface with digital calibration of equalization skew and offset coefficients,” in IEEE ISSCC Dig. Tech. Papers, pp. 520-521, 2005.
[26] Y.-J. Jeon et al., “A 66-333-MHz 12-mW register-controlled DLL with a single delay line and adaptive-duty-cycle clock dividers for production DDR SDRAMs,” IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 2087-2092, Nov. 2004.
[27] T. Hamamoto et al., “A 667-Mb/s operating digital DLL architecture for 512-Mb DDR,” IEEE J. Solid-State Circuits, vol. 39, no. 1, pp. 194-206, Jan. 2004.

[28] S. Kim et al., “A low-jitter wide-range skew-calibrated dual-loop DLL using antifuse circuitry for high-speed DRAM,” IEEE J. Solid-State Circuits, vol. 37, no. 6, pp. 726-734, Jun. 2002.
[29] J.-B. Lee et al., “Digitally-controlled DLL and I/O circuits for 500 Mb/s/pin x16 DDR SDRAM,” in IEEE ISSCC Dig. Tech. Papers, pp. 68-69, 2001.
[30] S. Kuge et al., “A 0.18μm 256-Mb DDR-SDRAM with low-cost post-mold tuning method for DLL replica,” IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 726-734, Nov. 2000.
[31] H.-W. Lee et al., “A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm CMOS technology,” IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 131-140, Jan. 2012.
[32] Y. K. Kim et al., “A 1.5V, 1.6Gb/s/pin, 1Gb DDR3 SDRAM with an address queuing scheme and bang-bang jitter reduced DLL scheme,” in IEEE Symp. VLSI Dig. Tech. Papers, pp. 182-183, 2007.
[33] K.-H. Kim et al., “A 1.4 Gb/s DLL using 2nd order charge-pump scheme with low phase/duty error for high-speed DRAM application,” in IEEE ISSCC Dig. Tech. Papers, pp. 213-214, 2004.
[34] J.-H. Lee et al., “A 330 MHz low-jitter and fast-locking direct skew compensation DLL,” in IEEE ISSCC Dig. Tech. Papers, pp. 352-353, 2000.
[35] J. Kim et al., “A low-jitter mixed-mode DLL for high-speed DRAM applications,” IEEE J. Solid-State Circuits, vol. 35, no. 10, pp. 1430-1436, Oct. 2000.
[36] H.-W. Lee et al., “A 1.6V 3.3Gb/s GDDR3 DRAM with dual-mode phase- and delay-locked loop using power-noise management with unregulated power supply in 54nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, pp. 140-141, 2009.
[37] S.-J. Bae et al., “A 40nm 2Gb 7Gb/s/pin GDDR5 SDRAM with a programmable DQ ordering crosstalk equalizer and adjustable clock-tracking BW,” in IEEE ISSCC Dig. Tech. Papers, pp. 498-500, 2011.
[38] K.-H. Kim et al., “A 20-Gb/s 256-Mb DRAM with an inductorless quadrature PLL and a cascaded pre-emphasis transmitter,” IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 127-134, Jan. 2006.
[39] H. Partovi et al., “Single-ended transceiver design techniques for 5.33Gb/s graphics applications,” in IEEE ISSCC Dig. Tech. Papers, pp. 136-137, 2009.
[40] Y. Hidaka, “Sign-based-zero-forcing adaptive equalizer control,” in CMOS Emerging Technologies Workshop, May 2010.

[41] S.-J. Bae et al., “A 60nm 6Gb/s/pin GDDR5 graphics DRAM with multifaceted clocking and ISI/SSN-reduction techniques,” in IEEE ISSCC Dig. Tech. Papers, pp. 278-279, 2008.
[42] K.-I. Oh et al., “A 5-Gb/s/pin transceiver for DDR memory interface with a crosstalk suppression scheme,” IEEE J. Solid-State Circuits, vol. 44, no. 8, pp. 2222-2232, Aug. 2009.
[43] S. H. Wang et al., “A 500-Mb/s quadruple data rate SDRAM interface using a skew cancellation technique,” IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 648-657, Apr. 2001.
[44] W. Hubert et al., “GDDR5 training-challenges and solution for ATE-based test,” in Asian Test Symposium, pp. 24-27, Nov. 2008.
[45] JEDEC, JESD212.
[46] K. Sohn et al., “A 1.2V 30nm 3.2Gb/s/pin 4Gb DDR4 SDRAM with dual-error detection and PVT-tolerant data-fetch scheme,” in IEEE ISSCC Dig. Tech. Papers, pp. 38-40, 2012.
[47] C. Park et al., “A 512-Mb DDR3 SDRAM prototype with CIO minimization and self-calibration techniques,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 831-838, Apr. 2006.
[48] D. Lee et al., “Multi-slew-rate output driver and optimized impedance-calibration circuit for 66nm 3.0Gb/s/pin DRAM interface,” in IEEE ISSCC Dig. Tech. Papers, pp. 280-613, 2008.
[49] J. Koo et al., “Small-area high-accuracy ODT/OCD by calibration of global on-chip for 512M GDDR5 application,” in Proc. IEEE CICC, pp. 717-720, Sep. 2009.
[50] S.-S. Yoon et al., “A fast GDDR5 read CRC calculation circuit with read DBI operation,” in IEEE Asian Solid-State Circuits Conference, pp. 249-252, 2008.
[51] H.-W. Lee et al., “A 283.2μW 800Mb/s/pin DLL-based data self-aligner for through silicon via (TSV) interface,” in IEEE ISSCC Dig. Tech. Papers, pp. 48-50, 2012.
[52] U. Kang et al., “8Gb 3D DDR3 DRAM using through-silicon-via technology,” in IEEE ISSCC Dig. Tech. Papers, pp. 130-131, 2009.
[53] D. Malta et al., “Integrated process for defect-free copper plating and chemical-mechanical polishing of through-silicon vias for 3D interconnects,” in ECTC, pp. 1769-1775, 2010.
[54] A.-C. Hsieh et al., “TSV redundancy: architecture and design issues in 3-D IC,” IEEE Trans. VLSI Syst., pp. 711-722, Apr. 2012.
