High-Bandwidth Memory Interface Design
Chulwoo Kim, Dept. of Electrical Engineering, Korea University, Seoul, Korea
February 17, 2013
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
Summary
References
Outline
Introduction
    DRAM 101
    Simplified DRAM Architecture and Operation
    Differences of DRAM (DDRx, GDDRx, LPDDRx)
    Trend
    Memory Interface: Differences and Issues
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
Summary
References
DRAM 101
SDRAM (Synchronous Dynamic Random Access Memory)
• SDR (Single Data Rate): one data word (D) on DQ per CLK cycle
• DDR (Double Data Rate): two data words (D D) on DQ per CLK cycle
    • Main memory, DDRx: PC, notebook, server
    • Graphics memory, GDDRx: graphics card, console
    • Mobile memory, LPDDRx: phone, tablet PC
[Figure: the MCU sends CLK and a command to the SDRAM; after the CAS* latency, the SDRAM returns a burst of data on DQ whose length is the burst length.]
* CAS: Column address strobe
DRAM DDR4 Die Photo
[Die photo: 16 banks, Bank 0 to Bank 15]
Supply voltage: VDD=1.2V, VPP=2.5V
Process: 38nm CMOS, 3-metal
Banks: 4 bank groups, 16 banks
Data rate: 2400 Mbps
Number of I/Os: ×4 / ×8
[1] K. B. Koo et al., ISSCC 2012, pp. 40-41
Simplified DRAM Architecture
[Figure: four banks around a peripheral circuit. Each bank contains a cell array (WL, BLT/BLB), word line driver, row decoder, row repair fuse, column decoder, column repair fuse, BLSA*, and write driver / read amplifier. The peripheral circuit contains the CLK/ADD/CMD buffer, CMD controller, generator, DLL (DCLK, ICLK), DQ RX with serial-to-parallel conversion, and DQ TX with parallel-to-serial conversion.]
* BLSA: Bit line sense amplifier
Concept of DRAM operation
[Figure: four banks (with BLSA*) connected to the peripheral circuit through the GIO*. DQ RX and DQ TX each carry Ndq bits; the GIO carries Np×Ndq bits.]
• WRITE: serial to parallel (DQ → GIO)
• READ: parallel to serial (GIO → DQ)
* BLSA: Bit line sense amplifier
* GIO: Global I/O
* Np: Number of pre-fetch
* Ndq: Number of DQ
Pre-fetch Timing (DDR1, BL*=2)
[Timing diagram: CLK, RD commands, GIO, DQS, and DQ with tCCD*=1; after CL*, each READ returns a burst of 2 on DQ (0 1).]
Number of GIO channels = Np×Ndq = 2×8 = 16 (DDR1 x8)
* tCCD: CAS to CAS delay
* CL: CAS latency
* BL: Burst length
[2] JEDEC, JESD79F, pp. 24-29
Pre-fetch Diagram (DDR1)
[Figure: eight banks sharing the GIO; number of GIO channels = 2×Ndq]
Pre-fetch operation
• 2-bit pre-fetch: [2×Ndq] data access (if the output data rate is 400 Mbps, the internal data rate is 200 Mbps)
Pre-fetch Timing (DDR2, BL=4)
[Timing diagram: CLK, RD commands, GIO, DQS, and DQ with tCCD=2; after RL*, each READ returns a burst of 4 on DQ (0 1 2 3).]
Number of GIO channels = Np×Ndq = 4×8 = 32 (DDR2 x8)
* RL: READ latency
[3] JEDEC, JESD79-2F, pp. 35
Pre-fetch Diagram (DDR2)
[Figure: eight banks sharing the GIO; number of GIO channels = 4×Ndq]
Pre-fetch operation
• 4-bit pre-fetch: [4×Ndq] data access (if the output data rate is 800 Mbps, the internal data rate is 200 Mbps, same as DDR1)
Pre-fetch Timing (DDR3, BL=8)
[Timing diagram: CLK, RD commands, GIO, DQS, and DQ with tCCD=4; after RL, each READ returns a burst of 8 on DQ (0 1 2 3 4 5 6 7).]
Number of GIO channels = Np×Ndq = 8×8 = 64 (DDR3 x8)
[4] JEDEC, JESD79-3F, pp. 62
Pre-fetch Diagram (DDR3)
[Figure: eight banks sharing the GIO; number of GIO channels = 8×Ndq]
Pre-fetch operation
• 8-bit pre-fetch: [8×Ndq] data access (if the output data rate is 1.6 Gbps, the internal data rate is 200 Mbps, same as DDR1)
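The pre-fetch arithmetic on the DDR1 to DDR3 slides can be checked with a short script. This is only a sketch of the two relations stated above (GIO channels = Np×Ndq, internal rate = output rate / Np); the function names and the table of example rates are illustrative choices, not part of any DRAM tool.

```python
# Sketch: pre-fetch arithmetic from the DDR1/DDR2/DDR3 slides.
# Function names and the GENERATIONS table are illustrative assumptions.

def gio_channels(n_prefetch: int, n_dq: int) -> int:
    """Number of GIO channels = Np x Ndq."""
    return n_prefetch * n_dq


def internal_rate_mbps(output_rate_mbps: int, n_prefetch: int) -> float:
    """Internal (core) data rate when Np bits are fetched in parallel."""
    return output_rate_mbps / n_prefetch


# (pre-fetch depth Np, example output data rate in Mbps) per generation
GENERATIONS = {"DDR1": (2, 400), "DDR2": (4, 800), "DDR3": (8, 1600)}

for name, (n_pre, rate) in GENERATIONS.items():
    print(name, gio_channels(n_pre, 8), internal_rate_mbps(rate, n_pre))
```

For an x8 device the GIO width doubles each generation while the core rate stays at 200 Mbps, which is the point the three diagrams make.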
Bank Grouping Timing (DDR4, BL=8)
[Timing diagram: CLK, RD commands to bank groups G0, G1, G1, per-group GIO (GIO_BG0 to GIO_BG3), DQS, and DQ with tCCD_S=4 and tCCD_L=5; after RL, each READ returns a burst of 8 on DQ (0 1 2 3 4 5 6 7).]
Number of GIO channels = Np×Ndq×Ngroup = 8×8×4 = 256 (DDR4 x8)
[5] JEDEC, JESD79-4, pp. 77-78
[6] T. Y. Oh et al., ISSCC 2010, pp. 434-435
Pre-fetch & Bank Grouping (DDR4)
[Figure: bank groups Group0 to Group3 connected through a GIO MUX; number of GIO channels = 8×Ndq]
Pre-fetch operation
• 8-bit pre-fetch
• Bank grouping
[1] K. B. Koo et al., ISSCC 2012, pp. 40-41
Differences of DDRx, GDDRx, LPDDRx
                DDRx          GDDRx            LPDDRx
Application     PC/Server     Graphics card    Mobile/Consumer
Socket          DIMM          On board         MCP*/PoP*/SiP*
IO              ×4/×8         ×16/×32          ×16/×32
[Figure: architecture row showing bank and pad placement for each type]
Unique functions
• GDDRx: single uni-directional WDQS, RDQS; VDDQ termination; CRC, DBI; ABI
• LPDDRx: no DLL; DPD*; PASR*; TCSR*
* MCP: Multi chip package
* PoP: Package on package
* SiP: System in package
* DPD: Deep power down
* PASR: Partial array self refresh
* TCSR: Temperature compensated self refresh
DDR Comparison
                      DDR1          DDR2          DDR3          DDR4
VDD [V]               2.5           1.8           1.5           1.2
Data rate [bps/pin]   200M~400M     400M~800M     800M~2.1G     1.6G~3.2G
Pre-fetch             2 bit         4 bit         8 bit         8 bit
Strobe                Single DQS    Differential DQS, DQSB (DDR2 onward)
Interface             SSTL_2        SSTL_18       SSTL_15       POD_12
New features
• DDR2: OCD calibration, ODT
• DDR3: dynamic ODT, ZQ calibration, write leveling
• DDR4: CA parity, DBI*, CRC*, gear down, CAL*, PDA*, FGREF*, TCAR*, bank grouping
* DBI: Data bus inversion
* CRC: Cyclic redundancy check
* CAL: Command address latency
* PDA: Per DRAM addressability
* FGREF: Fine granularity refresh
* TCAR: Temperature controlled array refresh
GDDR Comparison
                      GDDR1       gDDR2       GDDR3        GDDR4        GDDR5
VDD [V]               2.5         1.8         1.5          1.5          1.5/1.35
Data rate [bps/pin]   300~900M    800M~1G     700M~2.6G    2.0G~3.0G    3.6G~7.0G
Pre-fetch             2 bit       4 bit       4 bit        8 bit        8 bit
Interface             SSTL_2      SSTL_2      POD-18       POD-15       POD-15
Strobe (in order of generation): single DQS; differential bi-directional DQS*, DQSB; single uni-directional WDQS, RDQS
New features (in order of generation): OCD* calibration, ODT*; ZQ, RDQS (option); DBI, parity (option); no DLL, PLL (option), WCK/WCKB, CRC, ABI*, bank grouping
* DQS: DQ strobe signal (DQ is the data I/O pin)
* ODT: On die termination
* OCD: Off chip driver
* ABI: Address bus inversion
LPDDR Comparison
                      LPDDR1       LPDDR2          LPDDR3
VDD [V]               1.8          1.2             1.2
Data rate [bps/pin]   200M~400M    200M~1066M      333M~1600M
Pre-fetch             2 bit        4 bit           8 bit
Strobe                DQS          DQS_T, DQS_C    DQS_T, DQS_C
Interface             SSTL_18*     HSUL_12*        HSUL_12*
DLL                   X            X               X
New feature                        CA pin          ODT (high tapped termination)
* SSTL: Stub series terminated logic
* HSUL: High speed un-terminated logic
Trend
Although all types of DRAM are reaching their limits in supply voltage, the demand for high-bandwidth memory keeps increasing.
[Figure: VDD [V] versus data rate [Gbps] (0.2 to 7.0) for DDR1 to DDR4, GDDR1 to GDDR5, and LPDDR1 to LPDDR3; VDD falls from 2.5 V to 1.2 V as the data rate rises.]
Memory Interface
[Figure: a CPU connected to DIMMs populated with many DRAMs, and a GPU surrounded by on-board DRAMs]
System Feature
• Single-ended / high speed
• Many channels (weak for coupling effect)
• DDR: multi-drop (multi rank, multi DIMM)
• GDDR: point to point
• Impedance discontinuities (stubs, connector, via, etc.)
Issue
• Reflection
• Inter-symbol interference
• Simultaneous switching output noise
• Pin to pin skew
• Poor transistor performance
Outline
Introduction
Clock Generation and Distribution
    Delay-locked loop (DLL)
    Duty cycle corrector (DCC)
    Clock distribution
Transceiver Design
TSV
Conclusions
References
Basic DLL Architecture
[Figure: the external clock passes the input path (tD1) to become I_CLK, then the variable delay line (tDVDL) to produce O_CLK, which clocks data from the memory core through the output path (tD2). A replica delay (tDREP) returns the delayed clock as FB_CLK to the phase detector (PD), which drives the controller of the variable delay line.]
tCK · N = tDVDL + tDREP
tDREP ≈ tD1 + tD2
tCK · N = tDVDL + tD1 + tD2 + γ
γ = tDREP − (tD1 + tD2)
Replica Delay Mismatch
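The locking condition tCK · N = tDVDL + tDREP and the mismatch term γ = tDREP − (tD1 + tD2) can be exercised numerically. The sketch below is an illustration only: the function names and the picosecond values are assumptions chosen for the example, not figures from the slides.

```python
# Sketch: DLL locking condition and replica mismatch, all times in integer ps.
# Function names and numeric values are illustrative assumptions.

def locked_vdl_delay(t_ck: int, t_drep: int, n_max: int = 8):
    """Return the smallest N and tDVDL >= 0 with N * tCK = tDVDL + tDREP."""
    for n in range(1, n_max + 1):
        t_dvdl = n * t_ck - t_drep
        if t_dvdl >= 0:
            return n, t_dvdl
    raise ValueError("no lock point within n_max cycles")


def replica_mismatch(t_drep: int, t_d1: int, t_d2: int) -> int:
    """gamma = tDREP - (tD1 + tD2)."""
    return t_drep - (t_d1 + t_d2)


t_d1, t_d2, t_drep = 300, 500, 820            # hypothetical path delays
gamma = replica_mismatch(t_drep, t_d1, t_d2)  # replica is 20 ps too long
n, t_dvdl = locked_vdl_delay(1250, t_drep)
# At lock, N * tCK = tDVDL + tD1 + tD2 + gamma holds by construction.
print(n, t_dvdl, gamma)
```

When tCK is shorter than tDREP the loop has to lock over more than one cycle (N > 1), which is why the condition is written with N rather than a single period.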
[Figure: γ variation [ps] versus supply voltage [V], with the cases γ ≈ 0, γ < 0, and γ > 0 across HVDD, VDD, and LVDD; the valid data window within tCK shifts with tDQSCK* (or tAC).]
* tDQSCK (or tAC): DQS output access time for CK/CKb
Locking Range Considerations
[Figure: I_CLK and FB_CLK for long and short tCK, showing tDINIT + tDREP against tDREQUIRED, and the "bird's beak" in tDQSCK (or tAC) versus tCK]
tDINIT = tDVDL(0) + tDREP
N×tCK > tDVDL(0) + tDREP
tCK = tDVDL + tDREP + tΔ
[7] H.-W. Lee et al., submitted to TVLSI
Synchronous Mirror Delay (SMD)
[Figure: the clock passes the input path (tD1) to become I_CLK, then a replica delay (tD1+tD2) and the measure delay line (tD3); the replicate delay line repeats tD3 and the output path adds tD2 to produce OUT.]
Basic Operation
• Measure and replicate the delay
• No feedback
• Match delay in two cycles
[8] T. Saeki et al., ISSCC 1996, pp. 374-375
Disadvantages of SMD
Disadvantages
• Mismatch between replica delay and input buffer & clock distribution
• Coarse resolution
• Input jitter multiplication
[Figure: SMD timing without and with input jitter. Without jitter the measured delay is tCK−(tD1+tD2); with input edges displaced by −Δ and +Δ it becomes tCK−(tD1+tD2)+2Δ, so an input pk-pk jitter of ±Δ appears as an output pk-pk jitter of ±2Δ.]
Register Controlled DLL
[Figure: vernier delay line. The sub delay line uses cells of delay tD+Δ (fan-out=2) and the main delay line uses cells of delay tD (fan-out=1); switches SW0 to SW(n) select the tap between IN and OUT.]
• Locking information is stored digitally in a register
• Vernier-type delay line increases resolution
[9] A. Hatakeyama et al., ISSCC 1997, pp. 72-73
Single Register Controlled Delay Line
[Figure: I_CLK feeds a coarse delay line of unit delays (tUD) selected by CSL1 to CSL3 under a controller driven by UP/DN* from the PD. Two adjacent taps, IN1 and IN2, feed a phase mixer (fine delay) with weights 1−K and K to produce OUT12 between OUT1 and OUT2.]
* DN: Down
Boundary Switching Problem
[Figure: on a shift left, the phase mixer output OUT12 = IN1×(1−K) + IN2×K with K=0.9 passes through 3 UDCs* before the coarse shift and through 4 UDCs after it.]
• Coarse shift and fine reset do not occur simultaneously
* UDC: Unit delay cell
Seamless Boundary Switching
[Figure: a dual coarse delay line of unit delay cells feeds IN1 and IN2 of the phase mixer, OUT12 = IN1×(1−K) + IN2×K with 0≤K≤1. On a shift left, K moves from 0.9 to 1.0 and the roles of IN1 and IN2 swap.]
• Fine set first and then coarse shift
[10] J.-T. Kwak et al., VLSI 2003, pp. 283-284
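The difference between the boundary switching problem and the seamless scheme can be illustrated with an idealized linear phase mixer, OUT12 = IN1×(1−K) + IN2×K. The sketch below is a behavioral model under that linearity assumption; the unit delay value, the ten-step K resolution, and the function name are hypothetical.

```python
# Sketch: idealized phase-mixer delay, OUT12 = IN1*(1-K) + IN2*K.
# T_UD, K_STEPS, and mixed_delay are illustrative assumptions.

T_UD = 100      # unit delay cell delay in ps (hypothetical)
K_STEPS = 10    # K is expressed as k / K_STEPS to keep the math in integers


def mixed_delay(tap1: int, tap2: int, k: int) -> int:
    """Output delay in ps when IN1 and IN2 come from coarse taps tap1 and tap2."""
    return ((K_STEPS - k) * tap1 + k * tap2) * T_UD // K_STEPS


# Single delay line: the coarse shift lands before the fine code is reset,
# so the output momentarily jumps by a full unit delay.
naive = [mixed_delay(3, 4, 9), mixed_delay(4, 5, 9)]

# Dual delay line: set K to 1 first (only IN2 matters), move IN1 by two taps
# while it carries zero weight, then walk K back down toward IN1.
seamless = [
    mixed_delay(3, 4, 9),
    mixed_delay(3, 4, 10),
    mixed_delay(5, 4, 10),
    mixed_delay(5, 4, 9),
]
print(naive, seamless)
```

In the seamless sequence every step changes the delay by at most one fine step, because the coarse tap that moves is always the one with zero mixing weight.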
Adaptive Bandwidth DLL w/ SDVS*
[Figure: DLL (variable delay line, replica delay, PD, controller) with an update period pulse generator; the update pulse gates the controller, which sends NCODE<0:N> to the upper block. Timing: I_CLK, update pulse, and FB_CLK over one update period.]
Update period: m×tCK − tDREP + tDREP = m×tCK; for m=2, BW_DLL = 1/(2×tCK)
Fine unit delay vs. mode [ps]: 15.9 ps (low-speed mode, DN), 10.2 ps (base mode, BASE), 7.8 ps (high-speed mode, UP)
* SDVS: Self-dynamic voltage scaling
[11] H.-W. Lee et al., ISSCC 2011, pp. 502-504
Duty Cycle Corrector (DCC)
DCC
• Reduces duty cycle error
• Enlarges valid data window for DDR
• Needs to correct ±15% duty error at max speed
• Can be implemented in either analog or digital form
DCC Design Issues
• Location of DCC (before/after DLL)
• Embedded in DLL or not
• Power consumption
• Area
• Operating frequency range
• Locking time in case of digital DCC
• Offset of duty cycle detector
Digital DCC
[Figure: three digital DCC types, each producing a 50% duty output: (1) invert and delay, using an invert-delay clock generator and a phase mixer; (2) pulse width controller with a duty cycle detector; (3) half-cycle delayed clock generator (HD_IN) with an edge combiner.]
DCC in GDDR5 Clk Distribution
[Figure: GDDR5 clock distribution. WCK/WCKb enter the RX and reach the DQs through a CML-only network with PLL select, divider, and a CML2CMOS stage producing the 4-phase clock (rxclk, rxclkb). A duty cycle detector, adder-based counter (up/dn), and decoder (c<1:5>, s<1:4>) control a duty-cycle adjuster (DCA) with X8/X4/X2/X1 legs through CML_bias.]
• DCA is not in clock path
• No jitter addition
[12] D. Shin et al., VLSI 2009, pp. 138-139
DLL-related Parameters & Reference
                   DDR1          DDR2          DDR3/DDR3L       GDDR3             GDDR4
VDD                2.5V          1.8V          1.5V/1.35V       1.8V              1.5V
Lock time          200 cycles    200 cycles    512 cycles       2~5K cycles       2~20K cycles
Max. tDQSCK        600ps         300ps         225ps            180ps             140ps
Nominal speed      166MHz        333MHz        333MHz~800MHz    600MHz~1.37GHz    1.6GHz
Max. tCK           12ns          8ns           3.3ns            3.3ns             2.5ns
tXPDLL* (tXARD)    1×tCK         2×tCK         10×tCK           7×tCK+tIS         9×tCK+tIS
* tXPDLL (tXARD): Timing for exit precharge power-down to any non-READ command
Related area and references (type: [ ] digital, [ ]* mixed, [ ]** analog)
• DCC block: [13] [14] [15]** [16] [17]** [18] [19]** [20] [21]**
• Variable delay line: [14] [18] [19]** [20] [22] [23]** [24] [25]* [26] [27] [28]** [29] [30]
• Delay control logic: [13] [14] [15]** [16] [18] [20] [21]** [23]* [25]* [26] [28]** [29] [30]** [31] [32]* [33]** [34]* [35]*
• Replica: [27] [28]** [30]** [32]
• Low jitter: [14] [15]** [16] [17]** [19]** [24] [26] [27] [32]* [36]*
Clock Distribution
[Figure: a global clock buffer distributes CK/CKB to 16 DQ pads; dimension labels 93,750μm and 1,200μm]
Clock Distribution Issues
• Clock skew among DQs
• Low power
• Robust under PVT variations
• CML to CMOS converter jitter
[37] S.-J. Bae, et al., ISSCC, 2011, pp. 498-500
CML to CMOS Converter
[Figure: the global clock buffer drives CLKP/CLKN over 1700μm to a CML to CMOS converter (OUTP, OUTN), which produces CLKOUT for the DQ.]
Global Clock Buffer
• Current mode logic (CML): high-speed clock
CML to CMOS Converter Issue
• Susceptible to noise
• Jitter
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
    Channel
    Pre-emphasis
    Equalizer
    Crosstalk and skew
    Training
    Input buffer
    Output driver
    DBI/CRC
TSV Interface for DRAM
Summary
References
Channel Characteristics
[Figure: a CPU with socket and DIMM slots (DDRx), and a GPU with on-board GDDRx]
DDRx
• Multidrop
• Performance and power
• Many reflection components: PCB vias, DIMM connector, ...
GDDRx
• Point to point connection
• Performance target: high data rate
• Few reflection components: PCB vias
Emphasis for Channel Compensation
[Figure: amplitude vs. time for the original signal and for the signal distorted by the channel. With a feed-forward equalizer (FFE) between D(in) and D(out), amplitude vs. frequency plots show the FFE response, the channel response, and the combined FFE + channel response up to fdata/2.]
Pre-emphasis vs. De-emphasis
[Figure: output voltage vs. time for 1-tap pre-emphasis, no emphasis, and 1-tap de-emphasis, each relative to the level Va]
• Pre-emphasis: transition bit boosting
• De-emphasis: non-transition bit suppression
Basic De-emphasis Circuit
[Figure: 1-tap de-emphasis model. Din X(n) is scaled by K0; the output of a unit delay (D flip-flop) is scaled by −K1; the sum is Dout Y(n).]
The Number of Taps
• Depends on the channel quality and bit rate
• Usually from one to three taps
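The 1-tap de-emphasis model can be written as the FIR relation Y(n) = K0·X(n) − K1·X(n−1). The sketch below assumes symbols of ±1, example tap weights K0 = 0.75 and K1 = 0.25, and a line that idles at the first symbol; all three are illustrative choices rather than values from the slides.

```python
# Sketch: 1-tap de-emphasis, Y(n) = K0*X(n) - K1*X(n-1), with +/-1 symbols.
# Tap weights and the idle-line assumption are illustrative.

def de_emphasis(bits, k0=0.75, k1=0.25):
    """Return the driver output level for each bit."""
    symbols = [1 if b else -1 for b in bits]
    prev = symbols[0]  # assume the line was idling at the first symbol's level
    out = []
    for x in symbols:
        out.append(k0 * x - k1 * prev)
        prev = x
    return out


levels = de_emphasis([0, 0, 1, 1, 1, 0])
print(levels)
```

Transition bits come out at full swing (±(K0+K1)) while repeated bits are suppressed to ±(K0−K1), which matches the "non-transition bit suppression" description of de-emphasis.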
Pre-emphasis Circuit [1/2]
[Figure: Din(n), Din(n-1), and Din(n-2) pass through 4:2 and 2:1 serializers to the driver (DQ, DQB), with a pre-emphasis block at the internal node. Plot: voltage [V] vs. time [psec] (0 to 400) comparing no pre-emphasis, conventional pre-emphasis, and the proposed pre-emphasis.]
Cascaded Pre-emphasis
• Internal node ISI arises from limited transistor performance at high speed
• Internal node pre-emphasis ratio would not be affected by the channel
• Less sensitive to the system environment or channel variations
[38] K.-H. Kim et al., JSSC, Jan 2006, pp. 127-134
Pre-emphasis Circuit [2/2]
[Figure: TX with a main driver and a parallel pre-emphasis driver coupled through a boosting capacitor (CC), driving the channel to the GPU; equivalent linear model with RT, RC, CC, CP, and CL between Din and Dout.]
Voltage Mode Driver Pre-emphasis
• Additional zero by CC
• Time-continuous pre-emphasis
[39] H. Partovi et al., ISSCC, 2009, pp.136-137
Decision Feedback Equalization (DFE)
[Figure: amplitude vs. time, panels (A) to (D): a received pulse with ISI extending beyond 1 UI, the emulated ISI, and the result with no ISI.]
• DFE cancels ISI without noise amplification
• Clock must be provided by DLL or PLL
• Critical path (feedback path) is important
[40] Y. Hidaka, CMOS Emerging Technologies Workshop, May 2010
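The way a DFE removes ISI by subtracting an emulated copy of it can be sketched with a 1-tap behavioral model. The channel model (one post-cursor of weight a), the tap weight α = a, the ±1 symbols, and the function names are all assumptions made for illustration.

```python
# Sketch: 1-tap DFE on a channel with a single post-cursor ISI term.
# Channel model, tap values, and names are illustrative assumptions.

def channel(symbols, a=0.5, idle=-1):
    """Received sample r[n] = x[n] + a * x[n-1]."""
    prev, out = idle, []
    for x in symbols:
        out.append(x + a * prev)
        prev = x
    return out


def dfe_receive(received, alpha=0.5, idle=-1):
    """Slice r[n] - alpha * d[n-1]; return decisions and slicer input margins."""
    prev, decisions, margins = idle, [], []
    for r in received:
        corrected = r - alpha * prev   # subtract the emulated ISI
        d = 1 if corrected >= 0 else -1
        decisions.append(d)
        margins.append(abs(corrected))
        prev = d                       # feedback must settle within 1 UI
    return decisions, margins


tx = [1, -1, -1, 1, 1, -1]
rx = channel(tx)
decisions, margins = dfe_receive(rx)
print(min(abs(r) for r in rx), min(margins))
```

Because the correction uses past decisions instead of amplifying the received signal, the margin is restored without boosting noise, but only if the previous decision is available in time, which is the critical-path concern noted above.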
Fast Feedback 1-tap DFE
[Figure: four DFE sense amplifiers (SA) with SR latches sample DQ against Vref on WCK/2_0, WCK/2_90, WCK/2_180, and WCK/2_270, producing D0, D90, D180, and D270. Each SA receives the previous phase's SA output (for example P270/P270b into the WCK/2_0 slice) scaled by α. Timing: each SA alternates between precharge and evaluation, and the feedback removes the ISI within 1 UI.]
• The previously captured data must be fed back to the receiver within 1 UI
• TFB = TSA < 1 UI
[41] S.-J. Bae et al., ISSCC, 2008, pp. 278-279
Crosstalk
[Figure: two coupled lines with mutual capacitance Cm and mutual inductance Lm, which induce currents ICm and ILm at the near and far ends; waveforms of the input signal, near-end crosstalk, and far-end crosstalk.]
Inear = ICm + ILm
Ifar = ICm − ILm
• Timing effect: timing jitter at the far end
• Signal integrity: near-end crosstalk and far-end crosstalk
• Crosstalk is coupling of energy from one line to another
Staggered Memory Bus
[Figure: staggered memory bus between MCU and DRAM; adjacent channels are offset by τ]
• No discrepancy of propagation delay due to the crosstalk
• Difference of transition point is τ/2
• Distance between channels with the same transition is increased
• Jitter due to coupling from the adjacent channel is reduced
[42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232
Glitch Canceller
[Figure: victim driver TX2 between aggressors TX1 and TX3. A transition detector on DTX1 and DTX3 senses rise/fall and adjusts the TX2 driver current to IBIAS+ICOMP.]
Compensation for glitch by adding or subtracting current
• Rise: ICOMP is added to the main driver
• Fall: ICOMP is subtracted from the main driver
[42] K.-I. Oh et al., JSSC, Aug. 2009, pp. 2222-2232
Crosstalk Equalizer (TX)
[Figure: crosstalk equalizing driver. DO[0] drives DQ[0] while the neighboring data DO[1:3] control the enable signals EN[0:5] of additional driver legs during the interval ∆t.]
• Crosstalk equalization at transmitter
• Cancel the crosstalk by the impedance calibration
[37] S.-J. Bae et al., ISSCC, Feb. 2011, pp. 498-500
Skew
[Figure: MCU/GPU to DRAM links for CLK, CMD, address, DQS, and DQ into the DRAM peripheral circuit, with flight times TD, TD', and TD'' that differ between signals.]
• Differences of flight time between signals
• Skew can cause timing errors
• Key design criterion in high-speed systems
Pre/De-skew with Preamble Signal
[Figure: 8-bit skewed data passes through delay lines and a mux to become de-skewed data; a PLL on the external clock provides the sampling clock, and a skew estimator stores 3-bit skew values per Data[n] in register files.]
• Skew cancellation circuit is put in each DRAM
• With estimated skew information
    • De-skew the data during write mode
    • Pre-skew the data during read mode
[43] S. H. Wang et al., JSSC, Apr. 2001, pp. 648-657
Fly-by Topology for DDR3
T-branch Topology
• CLK/CMD/Address are applied to each DRAM in parallel
• Small skew between CLK and DQS
Fly-by Topology
• Better signal integrity by reducing the number of stubs and the stub length
• Easy to apply a single termination (VTT) at the end of the signal
• DQ and DQS are applied to each DRAM at the same time
• Large skew between CLK and DQS
• Need to calibrate skew
[Figure: skew [s] across DRAM #1 to #8 for DQ & DQS and for CLK, CMD, Address in each topology]
[4] JEDEC, JESD79-3E, pp. 56-59
Write Leveling for DDR3
[Timing diagram: CK/CK# and diff_DQS at the source (T0 to T7) and at the destination (Tn, T0 to T6). While DQS samples CK low, DQ returns 0; after DQS is pushed to capture the 0-1 transition, DQ returns 1.]
Write Leveling
• Timing mismatch compensation between CLK and DQS
• Write leveling is applied to each DRAM individually
[4] JEDEC, JESD79-3F, pp. 56-59
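The "push DQS to capture the 0-1 transition" procedure can be sketched as a controller-side sweep. This is a behavioral illustration only: the DRAM is modeled as sampling an ideal 50% duty CK with the DQS rising edge, and the function names, step size, and timing values are assumptions.

```python
# Sketch: write leveling as a DQS delay sweep, times in integer ps.
# The DRAM model, names, and numbers are illustrative assumptions.

def dram_sample_ck(dqs_delay: int, ck_arrival: int, t_ck: int) -> int:
    """DRAM samples CK with the DQS rising edge and returns the value on DQ."""
    phase = (dqs_delay - ck_arrival) % t_ck
    return 1 if phase < t_ck // 2 else 0   # CK is high for the first half cycle


def write_level(ck_arrival: int, t_ck: int, step: int = 10) -> int:
    """Increase the DQS delay until the DQ feedback goes from 0 to 1."""
    prev = dram_sample_ck(0, ck_arrival, t_ck)
    for delay in range(step, 2 * t_ck, step):
        cur = dram_sample_ck(delay, ck_arrival, t_ck)
        if prev == 0 and cur == 1:
            return delay                   # DQS edge now sits at the CK edge
        prev = cur
    raise RuntimeError("no 0-to-1 transition found")


print(write_level(ck_arrival=430, t_ck=1250))
```

In a fly-by system the sweep is repeated per DRAM, since each device sees a different CK arrival time.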
Training for GDDR5
[Timing diagram: GDDR5 timing after training for CK, CMD, ADDR, WCK, and DQ]
Adaptive Interface Training
• Ensure the widest timing margins for all signals
• Controlled by MCU
[44] W. Hubert et al., ATS, 2008, pp. 24-27
Training Sequence for GDDR5
1. Power up: detect the configuration and mirror function; ODT setting
2. Address training (optional): optimize address input data eye
3. WCK2CK alignment training: clock alignment; ready for read/write
4. READ training: search for best read data eye; detect burst boundaries of read stream
5. WRITE training: search for best write data eye; detect burst boundaries of write stream
6. Exit
[45] JEDEC, JESD212, pp. 23-39
Training Example: Write Training
[Figure: write training example with timing labels t0, t1, t2, t0 + t1, and t0 − t2]
[44] W. Hubert et al., ATS, 2008, pp. 24-27
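The "search for best data eye" step of the training sequence is commonly realized as a delay sweep followed by centering on the widest passing window. The sketch below shows that generic idea only; it is not taken from the cited paper or from JESD212, and the function name and the pass/fail input format are assumptions.

```python
# Sketch: pick the center of the widest passing window from a delay sweep.
# Generic illustration; names and data format are assumptions.

def find_eye_center(passes):
    """passes[i] is True if the test pattern was written correctly at delay step i."""
    best_start, best_len, start = 0, 0, None
    for i, ok in enumerate(list(passes) + [False]):  # sentinel closes the last run
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    if best_len == 0:
        raise ValueError("no passing delay setting")
    return best_start + (best_len - 1) // 2


sweep = [False, False, True, True, True, True, True, False, False]
print(find_eye_center(sweep))
```

Centering in the widest window gives equal setup and hold margin on both sides, which is the sense in which training ensures the widest timing margins.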
Input Buffer
[Figure: input paths from the MCU/GPU into the DRAM peripheral circuit: CLK, CMD (4), address (m*), DQS, and DQ (n), feeding the command controller, DLL, serial/parallel converters, and generator.]
* m: The number of address channels, which depends on the kind of memory and its density
• Convert attenuated external signal to rail-to-rail signal
• Trade-off between high-speed operation and power consumption
Input Buffer Comparison
Differential Type (In compared with Vref, enabled by En)
• Complex circuit
• High-speed input
• Robust to noise
• Stable threshold
• Commonly used
CMOS Type (In, enabled by En)
• Simple circuit
• Low-speed input (CKE)
• Susceptible to noise
• Unstable threshold
DDR4 Input Buffer
Gain Enhanced Buffer
[Figure: input buffer comparing In with Vref, with a transition detector controlling the bias current I]
• Signal transition detector is added
• The bias level (I) is controlled
• Sensitivity can be enhanced at higher frequencies
Wide Common-Mode Range DQ Buffer
[Figure: first stage amp. with CMFB* comparing In with Vref, followed by a second stage amp.]
• Delivers stable inputs to the second stage amp.
• Feedback network reduces the output common-mode variation
* CMFB: Common-mode feedback
[46] K. Sohn et al., ISSCC, 2012, pp. 38-40
Pseudo Open Drain (POD)
[Figure: output buffer with pull-up and pull-down legs (240Ω each) driven by Din, connected to the I/O channel]
Impedance Calibration
• Manual vs. automatic
• External resistor
Impedance Calibration
[Figure: the ZQ pad connects to an external resistor outside the DRAM. Comparators against Vref drive registers (REG, enabled by En) that produce PUcon and PDcon; each code enables n parallel pull-up (WP, R) or pull-down (WN, R) legs, and the same codes control the Dout driver fed by Din.]
Thermometer Code Control
[47] C. Park et al., JSSC, Apr. 2006, pp. 831-838
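A thermometer-code calibration loop of this kind can be sketched as: enable one more identical pull-up leg per step until the ZQ pad voltage reaches Vref. The sketch assumes a 240Ω external resistor to ground, Vref = VDDQ/2, and hypothetical leg resistances; it models only the pull-up half and is not the circuit of the cited paper.

```python
# Sketch: pull-up impedance calibration with a thermometer code.
# Resistor values, VDDQ, and names are illustrative assumptions.
from fractions import Fraction

R_EXT = Fraction(240)    # external reference resistor at the ZQ pad [ohm]
VDDQ = Fraction(12, 10)  # supply [V]
VREF = VDDQ / 2          # comparator reference


def calibrate_pullup(r_leg, max_code=16):
    """Enable identical parallel legs until the ZQ pad voltage reaches Vref."""
    for code in range(1, max_code + 1):
        r_pu = Fraction(r_leg) / code            # 'code' legs in parallel
        v_zq = VDDQ * R_EXT / (r_pu + R_EXT)     # divider: pull-up vs. R_EXT
        if v_zq >= VREF:                         # comparator trips
            return code, r_pu
    raise RuntimeError("calibration range exceeded")


print(calibrate_pullup(1920))   # nominal leg resistance
print(calibrate_pullup(2300))   # a slow process corner needs more legs
```

The pull-down code would be found the same way against the calibrated pull-up replica; the final code is what the registers hold for the output driver.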
Multi Slew-rate Output Driver
[Figure: ZQ calibration with an external resistor outside the DRAM (60Ω, 120Ω, or 240Ω), comparators against Vref, and a DF block producing PUcon and PDcon; the driver legs are binary weighted (WP/4, WP/2, WP, ..., 32WP and WN/4, WN/2, WN, ..., 32WN with 128R, 64R, 32R, ..., R).]
DF = Digital LPF + UP/DOWN counter
Binary-weighted Code Control
[48] D. U. Lee et al., ISSCC, 2008, pp. 280-613
Global ZQ Calibration
[Figure: a global calibration block at the ZQ point distributes a global reference signal (Zcal, i0cal) to DQ0 and DQn (n=1~31); each DQ has a local block with CP, PA, LS, and LO.]
CP: Comparator
PA: Pre-amplifier
LS: Local PVT sensor
LO: Local controller
• Global impedance mismatch error < 1%
• PVT variation sensor
[49] J. Koo et al., CICC, 2009, pp. 717-720
Data Bus Inversion (DBI)
• Power reduction technique independent of data pattern
• Dominant power (I/O buffer): P = α × C_PCB × VDD², with α < 0.5
• For high-BW memory, inversion time + CRC can be a bottleneck
[50] S.-S. Yoon et al., ASSCC 2008, pp.249-252
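One way to keep the switching activity α at or below 0.5 is a transition-minimizing inversion rule: invert the byte whenever more than half of the lines would toggle. The sketch below assumes that variant (often called DBI-AC); the slide does not say which DBI variant is meant, and the function names are hypothetical.

```python
# Sketch: transition-minimizing data bus inversion for one byte lane.
# Assumes a DBI-AC style rule; names are illustrative.

def dbi_encode(prev_bus: int, data: int):
    """Return (bus value to drive, DBI flag) for an 8-bit lane."""
    toggles = bin((prev_bus ^ data) & 0xFF).count("1")
    if toggles > 4:                    # more than half the lines would switch
        return (~data) & 0xFF, 1
    return data & 0xFF, 0


def dbi_decode(bus: int, flag: int) -> int:
    """Recover the original byte at the receiver."""
    return (bus ^ 0xFF) if flag else bus


bus, flag = dbi_encode(0x00, 0xFF)     # 8 toggles would occur without DBI
print(hex(bus), flag, hex(dbi_decode(bus, flag)))
```

With this rule at most 4 of the 8 data lines toggle per transfer, at the cost of one extra flag line and the encode time that the slide identifies as a possible bottleneck.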
Cyclic Redundancy Check (CRC)
Error type Detection rate
random single bit 100%
random double bit 100%
random odd count 100%
burst ≤ 8 100%
• Data error check for every unit interval (64 bits, data only)
• Redundancy bit: 1 bit/byte
• Speed bottleneck for high-BW
• Time (READ DBI + READ CRC + CRC calculator) < 9 periods
[50] S.-S. Yoon et al., ASSCC 2008, pp.249-252
CRC (cont'd)
• Algorithm for GDDR5: X^8+X^2+X^1+1 with an initial value of '0' (ATM-0M83)
• Logic for the algorithm takes a long time
• To increase CRC speed: XOR logic optimization
CRC calculation time < TCRC
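The polynomial X^8+X^2+X^1+1 with an initial value of 0 corresponds to the generator 0x07 in the usual MSB-first notation. The bit-serial sketch below shows the arithmetic only; the byte-wise input format and MSB-first bit order are assumptions, and the mapping of DQ lanes and burst positions onto the CRC input in GDDR5 is not modeled.

```python
# Sketch: bit-serial CRC-8 with polynomial x^8 + x^2 + x + 1 (0x07), init 0.
# Input ordering is an assumption; GDDR5 lane mapping is not modeled.

def crc8(data: bytes, init: int = 0x00) -> int:
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):             # shift out one bit per iteration
            if crc & 0x80:
                crc = ((crc << 1) ^ 0x07) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


payload = bytes(range(8))              # 64 data bits, hypothetical burst
flipped = bytes([payload[0] ^ 0x10]) + payload[1:]
print(hex(crc8(payload)), crc8(payload) != crc8(flipped))
```

A hardware implementation unrolls this loop into a parallel XOR tree over all input bits, which is where the XOR logic optimization mentioned above applies.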
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
    Bandwidth requirement
    DRAM with TSV
    TSV DRAM type
    DRAM stacking type
    Data confliction issue & solution
    Failed TSV issue & solution
Summary
References
Bandwidth Requirements
[Figure: DDRx / GDDRx data rate per pin [Gbps] trend from 2000 to 2015 (0 to 12 Gbps) for DDR to DDR4 and GDDR3 to GDDR5]
            Gb/s/chip    Gb/s/pin
GDDR1       32           1
GDDR3       51.2         1.6
GDDR4       102.4        3.2
GDDR5       224          7
GDDR?       448 (?)      14 (?)
Requirement
• Next GDDR will require over 10Gb/s/pin data rate
Restrictions
• Very difficult over 10Gb/s/pin
• Cost for performance improvements
• Power consumption
DRAM with TSV
[Figure: wide I/O memory stacked with TSVs on the MCU/GPU, and memory stacks placed beside the MCU/GPU on an interposer]
Advantages of DRAM with TSV
• Higher density per area
• Shorter interconnection: lower power, faster flight time
• Higher bandwidth with wide I/O
Wide I/O easily reaches the bandwidth class of the next GDDR (448 Gb/s/chip). Example: 800 Mb/s/pin × 512 I/O = 409.6 Gb/s/chip; 875 Mb/s/pin × 512 I/O = 448 Gb/s/chip.
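The per-chip bandwidth figures are the product of the per-pin rate and the I/O count. The sketch below redoes that arithmetic in integer Mb/s; the ×32 width for the GDDR rows is inferred from the ratio of the Gb/s/chip and Gb/s/pin columns, and the function names are illustrative.

```python
# Sketch: chip bandwidth = per-pin rate x number of I/Os, in integer Mb/s.
# The x32 GDDR width is inferred from the table ratios; names are illustrative.

def chip_bandwidth_mbps(rate_mbps_per_pin: int, n_io: int) -> int:
    return rate_mbps_per_pin * n_io


def required_pin_rate_mbps(target_mbps: int, n_io: int) -> float:
    return target_mbps / n_io


gddr5 = chip_bandwidth_mbps(7000, 32)         # 7 Gb/s/pin, x32
next_gddr = chip_bandwidth_mbps(14000, 32)    # 14 Gb/s/pin, x32
wide_io = chip_bandwidth_mbps(800, 512)       # 800 Mb/s/pin, 512 I/O
print(gddr5, next_gddr, wide_io, required_pin_rate_mbps(next_gddr, 512))
```

At 800 Mb/s/pin a 512-bit interface delivers 409.6 Gb/s, in the neighborhood of the 448 Gb/s target, and 875 Mb/s/pin would match it exactly; either way the per-pin rate is more than an order of magnitude below 14 Gb/s.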
TSV DRAM Type
                Main Memory     Mobile           Graphics
No. of TSV      500~1000 EA     1000~1500 EA     2000~3000 EA
[Figure: architecture row showing stacks with a controller, a package, and a GPU on an interposer]
Features (as listed across the three types): low power; low power; max bandwidth; multi channel; high speed; multi channel; wide I/O
Stacking Type
Homogeneous
[Figure: stack of identical chips]
• Same chips
• Low cost
Heterogeneous
[Figure: slave chips stacked on a master chip]
• Slave: only cells
• Master: with peripheral
Data Confliction Issue
[Figure: the DQ drivers of CHIP 0 (MP0/MN0, /EN0, EN0) and CHIP 3 (MP3/MN3, /EN3, EN3) share a DQ pin through TSVs; with PVT variations the DQ timing of the fastest and slowest chips differs, so one chip can drive HIGH while another drives LOW.]
• PVT variations cause the data skew
• Data confliction increases the short-circuit current
[51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50
Separate Data Bus per Group
[Figure: four stacked ranks (Rank 0 to Rank 3); in each rank the banks of Group A and Group B connect to separate TSV arrays]
• Separate data bus per bank group
• Less dependent on the PVT variation
[52] U. Kang et al., ISSCC, 2009, pp. 130-131
DLL-Based Self-Aligner
[Figure: CHIP 0 to CHIP 3 stacked through TSVs. Each chip has a skew detector (PD1, PD2, replica, TSV model; RFBCLK, TFBCLK, TRCLK, CLKOUT) and a skew compensator (fine aligner with UP/DN control and SAM pipe latches clocked by C_CLK) that turns the read data into aligned data on the DQS or dummy pin.]
• Data alignment to the external clock or to the clock of the slowest chip
[51] H.-W. Lee et al., ISSCC, 2012, pp. 48-50
Failed TSV Issue
[Figure: failed TSVs: a. TSV plating defect, b. pinch-off]
• Decreasing the assembly yield
• Increasing the total cost
[53] D. Malta et al., ECTC, 2010, pp. 1779-1775
TSV Check
[Figure: test signal generating circuits at the sender end drive In_0 to In_4 through TSV_0 to TSV_4; scan chain based testing circuits at the receiver end observe Out_0 to Out_4.]
• TSV connectivity check using the internal circuit
[54] A.-C. Hsieh et al., TVLSI, Apr. 2012, pp. 711-722
TSV Repair
[Figure: signals A to D on Chip1 connect to A' to D' on Chip2. Conventional: TSVs a to d plus dedicated redundant TSVs r1 and r2. Proposed: TSVs a to f, with signals shifted to neighboring TSVs.]
Redundant TSVs for Failed TSV
• Conventional: redundant TSVs are dedicated and fixed
• Proposed: failed TSV is repaired with a neighboring TSV
[52] U. Kang et al., ISSCC, 2009, pp. 130-131
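Repairing a failed TSV with a neighboring TSV can be pictured as a shift: every signal after the failure moves over by one position, and spare positions at the end of the group absorb the shift. The sketch below is a simplified illustration of that idea under the assumption of a linear group with spares at the end; it is not the circuit of the cited paper, and the function name is hypothetical.

```python
# Sketch: shift-style TSV repair in a group of signal + spare TSVs.
# Simplified illustration; topology and names are assumptions.

def shift_repair(n_signals: int, tsv_ok):
    """Map signal i to the i-th working TSV; tsv_ok[j] is False for a failed TSV."""
    mapping, tsv = [], 0
    for _ in range(n_signals):
        while tsv < len(tsv_ok) and not tsv_ok[tsv]:
            tsv += 1                   # skip the failed TSV, use its neighbor
        if tsv == len(tsv_ok):
            raise RuntimeError("not enough working TSVs in this group")
        mapping.append(tsv)
        tsv += 1
    return mapping


# 4 signals (A-D) on 6 TSVs (a-f); TSV index 1 ('b') has failed
print(shift_repair(4, [True, False, True, True, True, True]))
```

Each signal moves by at most the number of failures before it, so routing stays local, in contrast to the conventional scheme where a failed signal is rerouted to a fixed redundant TSV.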
Outline
Introduction
Clock Generation and Distribution
Transceiver Design
TSV Interface for DRAM
Summary
References
Summary
• Although all types of DRAM are reaching their limits in supply voltage, the demand for high-bandwidth memory keeps increasing
• For synchronization of the external clock and the DRAM output, low power, small area, and low skew are important design parameters
• To achieve high-BW memory, many design techniques have been and will be adopted from other high-speed wireline transceivers
• TSV interface for DRAM might be a good solution to achieve high bandwidth and low power
Suggested Papers to See
17.1 “A 6.4Gb/s near-ground single-ended transceiver for dual-rank DIMM memory interface systems”
17.2 “A 27% reduction in transceiver power for single-ended point-to-point DRAM interface with the termination resistance of 4×Z0 at both TX and RX”
17.3 “A 5.7mW/Gb/s 24-to-240Ω 1.6Gb/s thin-oxide DDR transmitter with 1.9-to-7.6V/ns clock-feathering slew-rate control in 22nm CMOS”
17.4 “An adaptive-bandwidth PLL for avoiding noise interference and DFE-less fast precharge sampling for over 10Gb/s/pin graphics DRAM interface”
References
[1] K. Koo et al., “A 1.2V 38nm 2.4Gb/s/pin 2Gb DDR4 SDRAM with bank group and ×4 half-page architecture”, in IEEE ISSCC Dig. Tech. Papers, pp. 40–41, 2012. [2] JEDEC, JESD79F. [3] JEDEC, JESD79-2F. [4] JEDEC, JESD79-3F. [5] JEDEC, JESD79-4. [6] T.-Y. Oh et al., “A 7Gb/s/pin GDDR5 SDRAM with 2.5ns bank-to-bank active time and no bank-group restriction”, in IEEE ISSCC Dig. Tech. Papers, pp. 434–435, 2010. [7] H.-W. Lee et al., “Survey and analysis of delay-locked loops used in DRAM interfaces”, submitted to IEEE Trans. VLSI Syst. [8] T. Saeki et al., “A 2.5 ns clock access 250 MHz 256 Mb SDRAM with a synchronous mirror delay”, in IEEE ISSCC Dig. Tech. Papers, pp. 374-375, 1996. [9] A. Hatakeyama et al., “A 256 Mb SDRAM using a register-controlled digital DLL”, in IEEE ISSCC Dig. Tech. Papers, pp. 72-73, 1997. [10] J.-T. Kwak et al., “A low cost high performance register-controlled digital DLL for 1Gbps x32 DDR SDRAM”, in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 283-284, 2003. [11] H.-W. Lee et al., “A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm CMOS technology”, in IEEE ISSCC Dig. Tech. Papers, pp. 502-504, 2011. [12] D. Shin et al., “Wide-range fast-lock duty-cycle corrector with offset-tolerant duty-cycle detection scheme for 54nm 7Gb/s GDDR5 DRAM interface”, in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 138-139, 2009. [13] W.-J. Yun et al., “A 3.57 Gb/s/pin low jitter all-digital DLL with dual DCC circuit for GDDR3 DRAM in 54-nm CMOS technology,” IEEE Trans. VLSI Sys., vol. 19, no. 9, pp. 1718-1722, Nov. 2011. [14] H.–W. Lee et al., “A 7.7mW/1.0ns/1.35V delay locked loop with racing mode and OA-DCC for DRAM interface,” in Proc. of Int. Symp. Circuits and Syst., pp. 3861-3864, 2010. [15] B.-G. Kim et al., “A DLL with jitter reduction techniques and quadrature phase generation for DRAM interfaces,” IEEE J. Solid-State Circuits, vol. 44, no. 5, pp. 1522-1530, May 2009.
[16] W.-J. Yun et al., “A 0.1-to-1.5GHz 4.2mW all-digital DLL with dual duty-cycle correction circuit and update gear circuit for DRAM in 66nm CMOS technology,” in IEEE ISSCC Dig. Tech. Papers, pp. 282-283, 2008.
[17] S. Kim et al., “A low jitter, fast recoverable, fully analog DLL using tracking ADC for high speed and low stand-by power DDR I/O interface,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 285-286, 2003.
[18] T. Matano et al., “A 1-Gb/s/pin 512-Mb DDRII SDRAM using a digital DLL and a slew-rate-controlled output buffer,” IEEE J. Solid-State Circuits, vol. 38, no. 5, pp. 762-768, May 2003.
[19] K.-H. Kim et al., “Built-in duty cycle corrector using coded phase blending scheme for DDR/DDR2 synchronous DRAM application,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 287-288, 2003.
[20] J.-T. Kwak et al., “A low cost high performance register-controlled digital DLL for 1 Gbps x32 DDR SDRAM,” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 283-284, 2003.
[21] O. Okuda et al., “A 66-400 MHz, adaptive-lock-mode DLL circuit with duty-cycle error correction [for SDRAMs],” in IEEE Symp. VLSI Circuits Dig. Tech. Papers, pp. 37-38, 2001.
[22] F. Lin et al., “A wide-range mixed-mode DLL for a combination 512 Mb 2.0 Gb/s/pin GDDR3 and 2.5 Gb/s/pin GDDR4 SDRAM,” IEEE J. Solid-State Circuits, vol. 43, no. 3, pp. 631-641, Mar. 2008.
[23] K.-W. Kim et al., “A 1.5-V 3.2 Gb/s/pin Graphic DDR4 SDRAM with dual-clock system, four-phase input strobing, and low-jitter fully analog DLL,” IEEE J. Solid-State Circuits, vol. 42, no. 11, pp. 2369-2377, Nov. 2007.
[24] D.-U. Lee et al., “A 2.5Gb/s/pin 256Mb GDDR3 SDRAM with series pipelined CAS latency control and dual-loop digital DLL,” in IEEE ISSCC Dig. Tech. Papers, pp. 547-548, 2006.
[25] S.-J. Bae et al., “A 3Gb/s 8b single-ended transceiver for 4-drop DRAM interface with digital calibration of equalization skew and offset coefficients,” in IEEE ISSCC Dig. Tech. Papers, pp. 520-521, 2005.
[26] Y.-J. Jeon et al., “A 66-333-MHz 12-mW register-controlled DLL with a single delay line and adaptive-duty-cycle clock dividers for production DDR SDRAMs,” IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 2087-2092, Nov. 2004.
[27] T. Hamamoto et al., “A 667-Mb/s operating digital DLL architecture for 512-Mb DDR,” IEEE J. Solid-State Circuits, vol. 39, no. 1, pp. 194-206, Jan. 2004.
[28] S. Kim et al., “A low-jitter wide-range skew-calibrated dual-loop DLL using antifuse circuitry for high-speed DRAM,” IEEE J. Solid-State Circuits, vol. 37, no. 6, pp. 726-734, Jun. 2002.
[29] J.-B. Lee et al., “Digitally-controlled DLL and I/O circuits for 500 Mb/s/pin x16 DDR SDRAM,” in IEEE ISSCC Dig. Tech. Papers, pp. 68-69, 2001.
[30] S. Kuge et al., “A 0.18μm 256-Mb DDR-SDRAM with low-cost post-mold tuning method for DLL replica,” IEEE J. Solid-State Circuits, vol. 35, no. 11, pp. 726-734, Nov. 2000.
[31] H.-W. Lee et al., “A 1.6V 1.4Gb/s/pin consumer DRAM with self-dynamic voltage-scaling technique in 44nm CMOS technology,” IEEE J. Solid-State Circuits, vol. 47, no. 1, pp. 131-140, Jan. 2012.
[32] Y. K. Kim et al., “A 1.5V, 1.6Gb/s/pin, 1Gb DDR3 SDRAM with an address queuing scheme and bang-bang jitter reduced DLL scheme,” in IEEE Symp. VLSI Dig. Tech. Papers, pp. 182-183, 2007.
[33] K.-H. Kim et al., “A 1.4 Gb/s DLL using 2nd order charge-pump scheme with low phase/duty error for high-speed DRAM application,” in IEEE ISSCC Dig. Tech. Papers, pp. 213-214, 2004.
[34] J.-H. Lee et al., “A 330 MHz low-jitter and fast-locking direct skew compensation DLL,” in IEEE ISSCC Dig. Tech. Papers, pp. 352-353, 2000.
[35] J. Kim et al., “A low-jitter mixed-mode DLL for high-speed DRAM applications,” IEEE J. Solid-State Circuits, vol. 35, no. 10, pp. 1430-1436, Oct. 2000.
[36] H.-W. Lee et al., “A 1.6V 3.3Gb/s GDDR3 DRAM with dual-mode phase- and delay-locked loop using power-noise management with unregulated power supply in 54nm CMOS,” in IEEE ISSCC Dig. Tech. Papers, pp. 140-141, 2009.
[37] S.-J. Bae et al., “A 40nm 2Gb 7Gb/s/pin GDDR5 SDRAM with a programmable DQ ordering crosstalk equalizer and adjustable clock-tracing BW,” in IEEE ISSCC Dig. Tech. Papers, pp. 498-500, 2011.
[38] K.-H. Kim et al., “A 20-Gb/s 256-Mb DRAM with an inductorless quadrature PLL and a cascaded pre-emphasis transmitter,” IEEE J. Solid-State Circuits, vol. 41, no. 1, pp. 127-134, Jan. 2006.
[39] H. Partovi et al., “Single-ended transceiver design techniques for 5.33Gb/s graphics applications,” in IEEE ISSCC Dig. Tech. Papers, pp. 136-137, 2009.
[40] Y. Hidaka, “Sign-based-zero-forcing adaptive equalizer control,” in CMOS Emerging Technologies Workshop, May 2010.
[41] S.-J. Bae et al., “A 60nm 6Gb/s/pin GDDR5 graphics DRAM with multifaceted clocking and ISI/SSN-reduction techniques,” in IEEE ISSCC Dig. Tech. Papers, pp. 278-279, 2008.
[42] K.-I. Oh et al., “A 5-Gb/s/pin transceiver for DDR memory interface with a crosstalk suppression scheme,” IEEE J. Solid-State Circuits, vol. 44, no. 8, pp. 2222-2232, Aug. 2009.
[43] S. H. Wang et al., “A 500-Mb/s quadruple data rate SDRAM interface using a skew cancellation technique,” IEEE J. Solid-State Circuits, vol. 36, no. 4, pp. 648-657, Apr. 2001.
[44] W. Hubert et al., “GDDR5 training-challenges and solution for ATE-based test,” in Asian Test Symposium, pp. 24-27, Nov. 2008.
[45] JEDEC, JESD212.
[46] K. Sohn et al., “A 1.2V 30nm 3.2Gb/s/pin 4Gb DDR4 SDRAM with dual-error detection and PVT-tolerant data-fetch scheme,” in IEEE ISSCC Dig. Tech. Papers, pp. 38-40, 2012.
[47] C. Park et al., “A 512-Mb DDR3 SDRAM prototype with CIO minimization and self-calibration techniques,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 831-838, Apr. 2006.
[48] D. Lee et al., “Multi-slew-rate output driver and optimized impedance-calibration circuit for 66nm 3.0Gb/s/pin DRAM interface,” in IEEE ISSCC Dig. Tech. Papers, pp. 280-613, 2008.
[49] J. Koo et al., “Small-area high-accuracy ODT/OCD by calibration of global on-chip for 512M GDDR5 application,” in Proc. IEEE CICC, pp. 717-720, Sep. 2009.
[50] S.-S. Yoon et al., “A fast GDDR5 read CRC calculation circuit with read DBI operation,” in IEEE Asian Solid-State Circuits Conference, pp. 249-252, 2008.
[51] H.-W. Lee et al., “A 283.2μW 800Mb/s/pin DLL-based data self-aligner for through silicon via (TSV) interface,” in IEEE ISSCC Dig. Tech. Papers, pp. 48-50, 2012.
[52] U. Kang et al., “8Gb 3D DDR3 DRAM using through-silicon-via technology,” in IEEE ISSCC Dig. Tech. Papers, pp. 130-131, 2009.
[53] D. Malta et al., “Integrated process for defect-free copper plating and chemical-mechanical polishing of through-silicon vias for 3D interconnects,” in ECTC, pp. 1769-1775, 2010.
[54] A.-C. Hsieh et al., “TSV redundancy: architecture and design issues in 3-D IC,” IEEE Trans. VLSI Systems, pp. 711-722, Apr. 2012.