A 16nm 256-bit Wide 89.6GByte/s Total Bandwidth In-Package Interconnect with 0.3V Swing and 0.062pJ/bit Power in InFO Package
Mu-Shan Lin, Chien-Chun Tsai, Cheng-Hsiang Hsieh, Wen-Hung Huang, Yu-Chi Chen, Shu-Chun Yang, Chin- Ming Fu, Hao-Jie Zhan, Jinn-Yeh Chien, Shao-Yu Li, Y.-H. Chen, C.-C. Kuo, Shih-Peng Tai and Kazuyoshi Yamada Taiwan Semiconductor Manufacturing Company, Ltd. DTP TSMC Property Taiwan
© 2016 TSMC, Ltd Outline Security C – TSMC Property Secret Motivation In-Package Interconnect Applications Advance Package Solutions Introduction InFO Process Low Power Design Concept (2013-VLSI) System Architecture Circuit Descriptions Experimental Results Conclusion 1
© 2016 TSMC, Ltd In-Package Interconnect Applications Security C – TSMC Property Secret LIPINCONTM: Low-voltage-In-Package-INter-CONnect
High performance computing
Limit die size for yield In package memory Higher level cache die Heterogeneous integration chip die chip High speed SERDES PCB Smaller form factor Shorter interconnect trace
2
© 2016 TSMC, Ltd High Performance Computing (HPC) Security C – TSMC Property Secret Yield goes down exponentially as die size gets large Building large dice is quite difficult and very costly
~30mm
CORE ~15mm
CORE CORE CORE CORE
CORE CORE CORE CORE ~5mm
CORE CORE CORE CORE ~20mm CORE CORE CORE CORE CORE
3 [1] Intel, 2015-ISSCC, “The Xeon Processor..”
© 2016 TSMC, Ltd In Package Memory (IPM) Security C – TSMC Property Secret Additional memory hierarchy between on-chip SRAM and off-chip DDR
Smaller capacity; higher bandwidth; faster latency Advantages DRAM is more cost-effective than eDRAM Non-TSV allows fine pitch RDL and shorten trace between AP and IPM [2] Google, 2015-VLSI, “System Challenges..” Short trace enable termination-less IO design Save main DRAM bandwidth by 60%
4
© 2016 TSMC, Ltd Memory Interface Development Trend Security C – TSMC Property Secret Higher Bandwidth; Lower Power; Lower Cost (TSV-less)?
?
?
[3] TSMC, 2013-OIP “SoC Memory..”
5
© 2016 TSMC, Ltd Advanced Package Solutions Security C – TSMC Property Secret Why we choose FO-WLP? Ultra-fine pitch RDL (W/S < 5um/5um) Ultra-thin package (~0.6mm including BGA) Smaller die size (no-TSV) Suit for smartphones, tablets, and wearables
[4] 2012 Business Update “3DIC & TSV interconnects” 6
© 2016 TSMC, Ltd TSMC Package Solution Security C – TSMC Property Secret InFO -WLP (INtegrated Fan-Out Wafer-Level-Package) vs FC-BGA More pin counts with smaller die size Lower parasitic resistance (thicker copper traces) Lower substrate loss (molding compound) Better thermal behavior (smaller form factor)
7 [5] TSMC, 2012-IEDM, “High-Performance..”
© 2016 TSMC, Ltd Project Target Security C – TSMC Property Secret Demonstrate an in-package interconnect for in-package memory in InFO package
Less capable of memory process is considered Low-Power Low-Latency Small Area
Try to be transparent as possible 2Gbit/s/pin; 64Gbyte/s total bandwidth Power efficient IO Prompt and automatic timing-calibration scheme
8
© 2016 TSMC, Ltd Introduction Security C – TSMC Property Secret TSMC InFO side-by-side 560um thickness of packaged chip including 3*RDL & BGA
D2P Bump : Die-to-Probe Bump D2D Bump : Die-to-Die Bump BGA Ball D2D Bump D2P Bump InFO Trace BGA ball BGA ball UBM UBM RDL-3 RDL-3 RDL-1/RDL-2/RDL-3 Height: ~560um SOC Molding MEM Substrate Compound Substrate
9
© 2016 TSMC, Ltd Introduction Security C – TSMC Property Secret Low power design concept 2013-VLSI by TSMC
Reduced-power TX VDDQ Low swing with lower VDDQ
Termination-less PAD VDDQ=0.3V Dynamic power only
VSSQ VSSQ
[6] TSMC, 2013-VLSI, “An extra low-power..” 10 [7] JESD8-28 © 2016 TSMC, Ltd Introduction Security C – TSMC Property Secret Low power design concept 2013-VLSI by TSMC
Reduced-power RX VDD VDD DQ IO Clock-based sense amplifier CKB CKB No balance buffer DQS PLL/ Termination-less VREF IN DLL CKB Dynamic power only De-skew
VSS
[6] TSMC, 2013-VLSI, “An extra low-power..” 11
© 2016 TSMC, Ltd Introduction Security C – TSMC Property Secret Low power design concept 2013-VLSI by TSMC
Simplified design in DRAM
Get rid of DLL/PLL in DRAM SOC Memory DQ IO
0/180 PLL DQS 90/270 DCDL De-skew DLL
[6] TSMC, 2013-VLSI, “An extra low-power..” 12
© 2016 TSMC, Ltd System Architecture Security C – TSMC Property Secret
InFO SOC Interconnect MEM 64GByte/s
WRDATA[63:0] SUB- DQ[15:0] SUB- P2L_DQ[63:0] SUBSUB- - SUBSUB- - 256-DQ RDDATA[63:0] ChannelSUB- DMI[1:0] ChannelSUB- L2P_DQ[63:0] ChannelChannelChannel ChannelChannelChannel [0] DQS_t/DQS_c [0] Bi-directional DFI_CA[43:0] CA[10:0] P2L_CA[43:0] (Memory DFI_CLK SUB- SUB- P2L_CK Size Data RST_n/CS_n/CKE Data DQ/DQS CA CA 524288= Align Align 512 address* Channel CK_t/CK_c Channel 64 bit* Modular design 4 cell* SUB- CAL[9:0] SUB- 4 ch) CAL CAL Easily scalable Channel Channel DLL DLLDLL Physical balance Channel[0] Channel[0]
PLL BIST IIC IIC BIST 500Mbps/ Slice 2Gbps/ Slice 500Mbps/SRAM SDR DDR SDR 13
© 2016 TSMC, Ltd Circuit Description Security C – TSMC Property Secret Sub-DQ-channel architecture SOC MEM 16-DQ share one Sub-Channel[*] Sub-Channel[*] Logic Near Pad Logic Low swing IO Low swing IO Near Pad Logic Logic differential DQS DQ[15:0] TX DMI[1:0] Per byte DMI FIFO
combines RX Data_mask (DM) & FIFO
Data_bus_inversion (*16+2) (*16+2) (DBI) DQS_t[*]/ DCDL DCDL DQS_c[*] (RPT) (read) SCLK From CK DCDL (write)
14
© 2016 TSMC, Ltd Circuit Description Security C – TSMC Property Secret Sub-DQ-channel architecture SOC MEM Sub-Channel[*] Sub-Channel[*] Logic Near Pad Logic Low swing IO Low swing IO Near Pad Logic Logic WRITE-path DQ[15:0] TX DMI[1:0] [6] DLL-TX FIFO
RX FIFO
(*16+2) (*16+2) DQS_t[*]/ DCDL DCDL DQS_c[*] De-skew (RPT) (read) SCLK From CK DCDL (write) Shift 270o [6] TSMC, 2013-VLSI, “An extra low-power..” 15
© 2016 TSMC, Ltd Circuit Description Security C – TSMC Property Secret Sub-DQ-channel architecture SOC MEM Sub-Channel[*] Sub-Channel[*] Logic Near Pad Logic Low swing IO Low swing IO Near Pad Logic Logic READ-path DQ[15:0] TX DMI[1:0] [6] DLL-RX FIFO
RXFIFO RX DLL-RPT FIFO Asynchronous (*16+2) (*16+2) DQS_t[*]/ clock between DCDL DCDL De-skew DQS_c[*] SCLK & RDQS (RPT) (read) o SCLK Shift 270 From CK DCDL (write)
[6] TSMC, 2013-VLSI, “An extra low-power..” 16
© 2016 TSMC, Ltd Circuit Description Security C – TSMC Property Secret Sub-CAL(calibration)-channel architecture
SOC InFO MEM 3 DLLs (TX/RX/RPT) CK CAL_NDW_CK_t/c Prompt and automatic 0/270 DLL-RX 270 CTS PLL Lock time < 1us (1:18) read CAL_ADW_CK_t/c 10-pin overhead shared NPL PHYC CAL_NDR_CK_t/c by 64-DQ CTS Replica (1:18) PHYM CTS Replica But scalable DLL-TX NPL write CAL_ADR_CK_t/c CTS
SCLK RDQS CTS DCDL CAL_PABR_DQS_t/c DLL-RPT (1:18) (read) Read Pre- amble DQS_t/c 17 Note: “RPT”=“Read-Preamble-Training”
© 2016 TSMC, Ltd Latency-cost Degrades System Efficiency Security C – TSMC Property Secret
(1T=500MHz) InFO SOC Interconnect MEM 4.75T 1.5T
WRDATA[63:0] SUB- DQ[15:0] SUB- P2L_DQ[63:0] SUBSUB- - SUBSUB- - RDDATA[63:0] ChannelSUB- DMI[1:0] ChannelSUB- L2P_DQ[63:0] ChannelChannelChannel ChannelChannelChannel De-/Serialization [0] DQS_t/DQS_c [0]
ratio: 4 DFI_CA[43:0] CA[10:0] P2L_CA[43:0] (Memory DFI_CLK SUB- SUB- P2L_CK Size Data RST_n/CS_n/CKE Data CA CA 524288= DBI enable Align Align 512 address* Channel CK_t/CK_c Channel 64 bit* 4 cell* (data-bus-inversion) SUB- CAL[9:0] SUB- 4 ch) CAL CAL Channel Channel DLL DLLDLL 1.875TChannel[0] Channel[0] 2T
PLL BIST IIC IIC BIST Slice Slice SRAM
18
© 2016 TSMC, Ltd Temperature Drift Monitoring Scheme Security C – TSMC Property Secret PWR Ramp ~PD_PHYC Temperature drift Reset DLL_LD LOCK_CODE Timing variation DLL Lock Performance degradation Issue PHYC Reset 1. Periodic check (1ms ) Leveraging DLL architecture 2. Exit Power Gating to IDLE
Periodic check (1ms) in background MC DLL BG Re-Check Alert issue if drift exceed the tolerance Error DLL BG Error > threshold , Re-Check Done Raise Error Compare LOCK_CODE, 19 BG_CODE © 2016 TSMC, Ltd Pad Plan Security C – TSMC Property Secret Compact pad plan with perfect-match trace length RDL routing density: 1*RDL/10um D2P-Bump D2D-Bump 550um D2D-Bump D2P-Bump 20 © 2016 TSMC, Ltd Boundary Scan Strategy Security C – TSMC Property Secret KGD (Known-Good-Die) – Contactless IO Compatible with IEEE1149.1 Each die builds in self TAP controller Support contactless LIPINCON-IO open/short test individually InFO D2D PAD …... Boundary Scan Cell BGA ball BGA ball UBM UBM Core RDL-3 RDL-3 RDL-1/RDL-2/RDL-3 TAP D2P Bump D2D Bump D2D Bump D2P Bump …... SOC Molding MEM Substrate Compound Substrate InFO D2P PAD : InFO Die-to-Probe PAD (Contactable) : InFO Die-to-Die PAD (Contactless) 21 © 2016 TSMC, Ltd Boundary Scan Strategy Security C – TSMC Property Secret KGS (Known-Good-Stack) – Interconnect Cascade TAP structure and control in sequence Support inter-connectivity test through TAP/Bscan cells Capable of per pin diagnosis …... Core …... Boundary Scan Cell TAP BGA ball BGA ball UBM UBM RDL-3 RDL-3 TDI TMS TDO RDL-1/RDL-2/RDL-3 InFO D2D PAD D2P Bump D2D Bump D2D Bump D2P Bump …... SOC Molding MEM Substrate Compound Substrate Core Boundary TAP …... Scan Cell 22 InFO D2P PAD TDI TMS TDO © 2016 TSMC, Ltd Packaged-Die Photo After InFO Security C – TSMC Property Secret Fan-out to get the required direct-access BGA balls [BGA ball side] [7mm*7mm] SOC MEM Top-die Top-die (16FF) (16FF) 23 © 2016 TSMC, Ltd KGD Shmoo Plot Security C – TSMC Property Secret SOC/MEM-PHY loopback BIST (Logic VDD vs. Frequency) VDDQ=0.3V Freq. Max=2.8Gbps @ VDD=0.8V 256-DQ toggle +40% Speed DBI enable Freq +10% Freq. Target=2Gbps PRBS VDD -10% 24 Logic VDD Target=0.8V © 2016 TSMC, Ltd KGS Shmoo Plot Security C – TSMC Property Secret SOC-to-MEM Write/Read BIST (Logic VDD vs. Frequency) Freq. Max=3.6Gbps @ VDD=1.0V VDDQ=0.3V 256-DQ toggle Freq. Max=2.8Gbps Same speed as KGD DBI enable Freq +10% Freq. Target=2Gbps PRBS VDD -10% 25 Logic VDD Target = 0.8V © 2016 TSMC, Ltd KGS Shmoo Plot Security C – TSMC Property Secret SOC-to-MEM Write/Read BIST (IO-VDDQ vs. Frequency) -50% VDDQ (150mV) Max Speed VDD=0.8V Limited by Logic V Freq. Max=2.8Gbps 256-DQ toggle -50% VDDQ DBI enable Freq. Target=2Gbps PRBS IO VDDQ Min.=0.15V IO VDDQ Target=0.3V 26 © 2016 TSMC, Ltd Capable of Eye-ploting Security C – TSMC Property Secret 0.3V Swing is critical to SSO under wide bus application How to ensure signal integrity on the un-probed interconnect IO? X-axis: DCDL 0.3V code (7ps) Eye Height (225mV) Y-axis: VREF Adjusted Capture code (9.5mV) 0.15V VDD=0.8V Eye Width (420ps~0.84UI) Org. Capture 256-DQ toggle Code] [VREF 0.04V(Noise Ground) DBI enable 0V PRBS [DCDL Code] 27 © 2016 TSMC, Ltd Power Breakdown Security C – TSMC Property Secret Base on simulation results with 100% toggling conditions Power Breakdown LIPINCON-IO power efficiency: 0.8% 0.062pJ/bit (consider one single-end IO) 9.2% LIPINCON-PHY power efficiency: 0.424pJ/bit 14.5% Digital (0.8V) LIPINCON-IO (0.8V/0.3V) Power Power DCDL (0.8V) Blocks Percentage Consumption Efficiency 50.7% NPL (0.8V) Digital (0.8V) 110.00 0.215 50.7% Replica cell (0.8V) LIPINCON-IO (0.8V/0.3V) 53.97 0.105 24.9% 24.9% DCDL (0.8V) 31.52 0.062 14.5% NPL (0.8V) 20.00 0.039 9.2% Replica cell (0.8V) 1.65 0.003 0.8% SUM 217.141 0.424 100.0% (mW) (mW/Gb/s) 28 © 2016 TSMC, Ltd Conclusion Security C – TSMC Property Secret An in-package interconnect for in-package memory application in InFO package has been demonstrated Technology: TSMC 16FF + InFO 89.6GByte/s total bandwidth is achieved with 256-DQ operating in 2.8Gbit/s and 0.3V-swing Low power: IO (0.062pJ/bit); PHY (0.424pJ/bit) Low latency: Write (4.75T+1.5T=6.25T); Read (2+1.875=3.875T) 0.3V signal integrity on the un-probed IO has been clarified 420ps (0.84UI) Eye width; 225mV (75%) Eye height Prompt and automatic timing-calibration scheme 29 © 2016 TSMC, Ltd Acknowledgement Security C – TSMC Property Secret The authors would like to thank TSMC “InFO-IP” team for InFO back-end support, TSMC “IPPM” team for measurement support, and “System-BD” team for business help. The authors would also like to specially thank “Mentor Graphics” for boundary scan design collaboration. 30 © 2016 TSMC, Ltd Reference Security C – TSMC Property Secret [1] Intel, 2015-ISSCC, “The Xeon® Processor E5-2600 v3: A 22nm 18-Core Product Family” [2] Google, 2015-VLSI, “System Challenges and Hardware Requirements for Future Consumer Devices: From Wearable to ChromeBooks and Devices in-between” [3] TSMC, 2013-OIP “SoC Memory Interfaces. Today and tomorrow at TSMC” [4] 2012 Business Update “3DIC & TSV interconnects” [5] TSMC, 2012-IEDM, “High-Performance Integrated Fan-Out Wafer Level Packaging (InFO-WLP): Technology and System Integration” [6] TSMC, 2013-VLSI, “An extra low-power 1Tbit/s bandwidth PLL/DLL-less eDRAM PHY using 0.3V low-swing IO for 2.5D CoWoS application” [7] JESD8-28 “300mV interface” 31 © 2016 TSMC, Ltd