A 16nm 256-bit Wide 89.6GByte/s Total Bandwidth In-Package Interconnect with 0.3V Swing and 0.062pJ/bit Power in InFO Package

Mu-Shan Lin, Chien-Chun Tsai, Cheng-Hsiang Hsieh, Wen-Hung Huang, Yu-Chi Chen, Shu-Chun Yang, Chin- Ming Fu, Hao-Jie Zhan, Jinn-Yeh Chien, Shao-Yu Li, Y.-H. Chen, C.-C. Kuo, Shih-Peng Tai and Kazuyoshi Yamada Manufacturing Company, Ltd. DTP TSMC Property Taiwan

© 2016 TSMC, Ltd Outline Security C – TSMC Property Secret  Motivation  In-Package Interconnect Applications  Advance Package Solutions  Introduction  InFO Process  Low Power Design Concept (2013-VLSI)  System Architecture  Circuit Descriptions  Experimental Results  Conclusion 1

© 2016 TSMC, Ltd In-Package Interconnect Applications Security C – TSMC Property Secret  LIPINCONTM: Low-voltage-In-Package-INter-CONnect

 High performance computing

 Limit die size for yield  In package memory  Higher level cache die  Heterogeneous integration chip die chip  High speed SERDES PCB  Smaller form factor  Shorter interconnect trace

2

© 2016 TSMC, Ltd High Performance Computing (HPC) Security C – TSMC Property Secret  Yield goes down exponentially as die size gets large  Building large dice is quite difficult and very costly

~30mm

CORE ~15mm

CORE CORE CORE CORE

CORE CORE CORE CORE ~5mm

CORE CORE CORE CORE ~20mm CORE CORE CORE CORE CORE

3 [1] , 2015-ISSCC, “The Xeon Processor..”

© 2016 TSMC, Ltd In Package Memory (IPM) Security C – TSMC Property Secret  Additional memory hierarchy between on-chip SRAM and off-chip DDR

 Smaller capacity; higher bandwidth; faster latency  Advantages  DRAM is more cost-effective than eDRAM  Non-TSV allows fine pitch RDL and shorten trace between AP and IPM [2] Google, 2015-VLSI, “System Challenges..”  Short trace enable termination-less IO design  Save main DRAM bandwidth by 60%

4

© 2016 TSMC, Ltd Memory Interface Development Trend Security C – TSMC Property Secret  Higher Bandwidth; Lower Power; Lower Cost (TSV-less)?

?

?

[3] TSMC, 2013-OIP “SoC Memory..”

5

© 2016 TSMC, Ltd Advanced Package Solutions Security C – TSMC Property Secret  Why we choose FO-WLP?  Ultra-fine pitch RDL (W/S < 5um/5um)  Ultra-thin package (~0.6mm including BGA)  Smaller die size (no-TSV)  Suit for smartphones, tablets, and wearables

[4] 2012 Business Update “3DIC & TSV interconnects” 6

© 2016 TSMC, Ltd TSMC Package Solution Security C – TSMC Property Secret  InFO -WLP (INtegrated Fan-Out -Level-Package) vs FC-BGA  More pin counts with smaller die size  Lower parasitic resistance (thicker copper traces)  Lower substrate loss (molding compound)  Better thermal behavior (smaller form factor)

7 [5] TSMC, 2012-IEDM, “High-Performance..”

© 2016 TSMC, Ltd Project Target Security C – TSMC Property Secret  Demonstrate an in-package interconnect for in-package memory in InFO package

 Less capable of memory process is considered  Low-Power  Low-Latency  Small Area

 Try to be transparent as possible  2Gbit/s/pin; 64Gbyte/s total bandwidth  Power efficient IO  Prompt and automatic timing-calibration scheme

8

© 2016 TSMC, Ltd Introduction Security C – TSMC Property Secret  TSMC InFO side-by-side  560um thickness of packaged chip including 3*RDL & BGA

D2P Bump : Die-to-Probe Bump D2D Bump : Die-to-Die Bump BGA Ball D2D Bump D2P Bump InFO Trace BGA ball BGA ball UBM UBM RDL-3 RDL-3 RDL-1/RDL-2/RDL-3 Height: ~560um SOC Molding MEM Substrate Compound Substrate

9

© 2016 TSMC, Ltd Introduction Security C – TSMC Property Secret  Low power design concept 2013-VLSI by TSMC

Reduced-power TX VDDQ  Low swing with lower VDDQ

 Termination-less PAD VDDQ=0.3V  Dynamic power only

VSSQ VSSQ

[6] TSMC, 2013-VLSI, “An extra low-power..” 10 [7] JESD8-28 © 2016 TSMC, Ltd Introduction Security C – TSMC Property Secret  Low power design concept 2013-VLSI by TSMC

Reduced-power RX VDD VDD DQ IO  Clock-based sense amplifier CKB CKB  No balance buffer DQS PLL/  Termination-less VREF IN DLL CKB  Dynamic power only De-skew

VSS

[6] TSMC, 2013-VLSI, “An extra low-power..” 11

© 2016 TSMC, Ltd Introduction Security C – TSMC Property Secret  Low power design concept 2013-VLSI by TSMC

Simplified design in DRAM

 Get rid of DLL/PLL in DRAM SOC Memory DQ IO

0/180 PLL DQS 90/270 DCDL De-skew DLL

[6] TSMC, 2013-VLSI, “An extra low-power..” 12

© 2016 TSMC, Ltd System Architecture Security C – TSMC Property Secret

InFO SOC Interconnect MEM  64GByte/s

WRDATA[63:0] SUB- DQ[15:0] SUB- P2L_DQ[63:0] SUBSUB- - SUBSUB- -  256-DQ RDDATA[63:0] ChannelSUB- DMI[1:0] ChannelSUB- L2P_DQ[63:0] ChannelChannelChannel ChannelChannelChannel [0] DQS_t/DQS_c [0]  Bi-directional DFI_CA[43:0] CA[10:0] P2L_CA[43:0] (Memory DFI_CLK SUB- SUB- P2L_CK Size Data RST_n/CS_n/CKE Data DQ/DQS CA CA 524288= Align Align 512 address* Channel CK_t/CK_c Channel 64 bit*  Modular design 4 cell* SUB- CAL[9:0] SUB- 4 ch) CAL CAL  Easily scalable Channel Channel DLL DLLDLL  Physical balance Channel[0] Channel[0]

PLL BIST IIC IIC BIST 500Mbps/ Slice 2Gbps/ Slice 500Mbps/SRAM SDR DDR SDR 13

© 2016 TSMC, Ltd Circuit Description Security C – TSMC Property Secret  Sub-DQ-channel architecture SOC MEM  16-DQ share one Sub-Channel[*] Sub-Channel[*] Logic Near Pad Logic Low swing IO Low swing IO Near Pad Logic Logic differential DQS DQ[15:0] TX DMI[1:0]  Per byte DMI FIFO

combines RX Data_mask (DM) & FIFO

Data_bus_inversion (*16+2) (*16+2) (DBI) DQS_t[*]/ DCDL DCDL DQS_c[*] (RPT) (read) SCLK From CK DCDL (write)

14

© 2016 TSMC, Ltd Circuit Description Security C – TSMC Property Secret  Sub-DQ-channel architecture SOC MEM Sub-Channel[*] Sub-Channel[*] Logic Near Pad Logic Low swing IO Low swing IO Near Pad Logic Logic  WRITE-path DQ[15:0] TX DMI[1:0] [6]  DLL-TX FIFO

RX FIFO

(*16+2) (*16+2) DQS_t[*]/ DCDL DCDL DQS_c[*] De-skew (RPT) (read) SCLK From CK DCDL (write) Shift 270o [6] TSMC, 2013-VLSI, “An extra low-power..” 15

© 2016 TSMC, Ltd Circuit Description Security C – TSMC Property Secret  Sub-DQ-channel architecture SOC MEM Sub-Channel[*] Sub-Channel[*] Logic Near Pad Logic Low swing IO Low swing IO Near Pad Logic Logic  READ-path DQ[15:0] TX DMI[1:0] [6]  DLL-RX FIFO

 RXFIFO  RX DLL-RPT FIFO  Asynchronous (*16+2) (*16+2) DQS_t[*]/ clock between DCDL DCDL De-skew DQS_c[*] SCLK & RDQS (RPT) (read) o SCLK Shift 270 From CK DCDL (write)

[6] TSMC, 2013-VLSI, “An extra low-power..” 16

© 2016 TSMC, Ltd Circuit Description Security C – TSMC Property Secret  Sub-CAL(calibration)-channel architecture

SOC InFO MEM  3 DLLs (TX/RX/RPT) CK CAL_NDW_CK_t/c  Prompt and automatic 0/270 DLL-RX 270 CTS PLL  Lock time < 1us (1:18) read CAL_ADW_CK_t/c  10-pin overhead shared NPL PHYC CAL_NDR_CK_t/c by 64-DQ CTS Replica (1:18) PHYM CTS Replica  But scalable DLL-TX NPL write CAL_ADR_CK_t/c CTS

SCLK RDQS CTS DCDL CAL_PABR_DQS_t/c DLL-RPT (1:18) (read) Read Pre- amble DQS_t/c 17 Note: “RPT”=“Read-Preamble-Training”

© 2016 TSMC, Ltd Latency-cost Degrades System Efficiency Security C – TSMC Property Secret

(1T=500MHz) InFO SOC Interconnect MEM 4.75T 1.5T

WRDATA[63:0] SUB- DQ[15:0] SUB- P2L_DQ[63:0] SUBSUB- - SUBSUB- - RDDATA[63:0] ChannelSUB- DMI[1:0] ChannelSUB- L2P_DQ[63:0] ChannelChannelChannel ChannelChannelChannel  De-/Serialization [0] DQS_t/DQS_c [0]

ratio: 4 DFI_CA[43:0] CA[10:0] P2L_CA[43:0] (Memory DFI_CLK SUB- SUB- P2L_CK Size Data RST_n/CS_n/CKE Data CA CA 524288=  DBI enable Align Align 512 address* Channel CK_t/CK_c Channel 64 bit* 4 cell* (data-bus-inversion) SUB- CAL[9:0] SUB- 4 ch) CAL CAL Channel Channel DLL DLLDLL 1.875TChannel[0] Channel[0] 2T

PLL BIST IIC IIC BIST Slice Slice SRAM

18

© 2016 TSMC, Ltd Temperature Drift Monitoring Scheme Security C – TSMC Property Secret PWR Ramp ~PD_PHYC  Temperature drift  Reset DLL_LD LOCK_CODE Timing variation  DLL Lock Performance degradation Issue PHYC Reset 1. Periodic check (1ms )  Leveraging DLL architecture 2. Exit Power Gating to IDLE

 Periodic check (1ms) in background MC DLL BG Re-Check  Alert issue if drift exceed the tolerance Error

DLL BG Error > threshold , Re-Check Done Raise Error

Compare LOCK_CODE, 19 BG_CODE

© 2016 TSMC, Ltd Pad Plan Security C – TSMC Property Secret  Compact pad plan with perfect-match trace length  RDL routing density: 1*RDL/10um

D2P-Bump D2D-Bump 550um

D2D-Bump D2P-Bump

20

© 2016 TSMC, Ltd Boundary Scan Strategy Security C – TSMC Property Secret  KGD (Known-Good-Die) – Contactless IO  Compatible with IEEE1149.1  Each die builds in self TAP controller  Support contactless LIPINCON-IO open/short test individually

InFO D2D PAD …... Boundary Scan Cell BGA ball BGA ball UBM UBM Core RDL-3 RDL-3 RDL-1/RDL-2/RDL-3

TAP D2P Bump D2D Bump D2D Bump D2P Bump …... SOC Molding MEM Substrate Compound Substrate InFO D2P PAD : InFO Die-to-Probe PAD (Contactable)

: InFO Die-to-Die PAD (Contactless)

21

© 2016 TSMC, Ltd Boundary Scan Strategy Security C – TSMC Property Secret  KGS (Known-Good-Stack) – Interconnect  Cascade TAP structure and control in sequence  Support inter-connectivity test through TAP/Bscan cells  Capable of per pin diagnosis

…... Core …... Boundary Scan Cell TAP BGA ball BGA ball UBM UBM RDL-3 RDL-3 TDI TMS TDO RDL-1/RDL-2/RDL-3 InFO D2D PAD D2P Bump D2D Bump D2D Bump D2P Bump …... SOC Molding MEM Substrate Compound Substrate Core Boundary TAP …... Scan Cell

22 InFO D2P PAD TDI TMS TDO

© 2016 TSMC, Ltd Packaged-Die Photo After InFO Security C – TSMC Property Secret  Fan-out to get the required direct-access BGA balls [BGA ball side] [7mm*7mm]

SOC MEM Top-die Top-die (16FF) (16FF)

23

© 2016 TSMC, Ltd KGD Shmoo Plot Security C – TSMC Property Secret  SOC/MEM-PHY loopback BIST (Logic VDD vs. Frequency)

 VDDQ=0.3V Freq. Max=2.8Gbps @ VDD=0.8V  256-DQ toggle +40% Speed  DBI enable Freq +10% Freq. Target=2Gbps  PRBS VDD -10%

24 Logic VDD Target=0.8V

© 2016 TSMC, Ltd KGS Shmoo Plot Security C – TSMC Property Secret  SOC-to-MEM Write/Read BIST (Logic VDD vs. Frequency)

Freq. Max=3.6Gbps @ VDD=1.0V  VDDQ=0.3V  256-DQ toggle Freq. Max=2.8Gbps Same speed as KGD  DBI enable Freq +10% Freq. Target=2Gbps  PRBS VDD -10%

25 Logic VDD Target = 0.8V

© 2016 TSMC, Ltd KGS Shmoo Plot Security C – TSMC Property Secret  SOC-to-MEM Write/Read BIST (IO-VDDQ vs. Frequency)

-50% VDDQ (150mV) Max Speed  VDD=0.8V Limited by Logic V Freq. Max=2.8Gbps  256-DQ toggle -50% VDDQ

 DBI enable Freq. Target=2Gbps  PRBS

IO VDDQ Min.=0.15V IO VDDQ Target=0.3V 26

© 2016 TSMC, Ltd Capable of Eye-ploting Security C – TSMC Property Secret  0.3V Swing is critical to SSO under wide bus application  How to ensure signal integrity on the un-probed interconnect IO?

 X-axis: DCDL 0.3V code (7ps) Eye Height (225mV)  Y-axis: VREF Adjusted Capture code (9.5mV) 0.15V  VDD=0.8V Eye Width (420ps~0.84UI) Org. Capture

 256-DQ toggle Code] [VREF 0.04V(Noise Ground)  DBI enable 0V  PRBS [DCDL Code]

27

© 2016 TSMC, Ltd Power Breakdown Security C – TSMC Property Secret  Base on simulation results with 100% toggling conditions

Power Breakdown  LIPINCON-IO power efficiency: 0.8% 0.062pJ/bit (consider one single-end IO) 9.2%  LIPINCON-PHY power efficiency: 0.424pJ/bit 14.5% Digital (0.8V) LIPINCON-IO (0.8V/0.3V)

Power Power DCDL (0.8V) Blocks Percentage Consumption Efficiency 50.7% NPL (0.8V) Digital (0.8V) 110.00 0.215 50.7% Replica cell (0.8V) LIPINCON-IO (0.8V/0.3V) 53.97 0.105 24.9% 24.9% DCDL (0.8V) 31.52 0.062 14.5%

NPL (0.8V) 20.00 0.039 9.2%

Replica cell (0.8V) 1.65 0.003 0.8%

SUM 217.141 0.424 100.0% (mW) (mW/Gb/s)

28

© 2016 TSMC, Ltd Conclusion Security C – TSMC Property Secret  An in-package interconnect for in-package memory application in InFO package has been demonstrated

 Technology: TSMC 16FF + InFO  89.6GByte/s total bandwidth is achieved with 256-DQ operating in 2.8Gbit/s and 0.3V-swing

 Low power: IO (0.062pJ/bit); PHY (0.424pJ/bit)  Low latency: Write (4.75T+1.5T=6.25T); Read (2+1.875=3.875T)  0.3V signal integrity on the un-probed IO has been clarified

 420ps (0.84UI) Eye width; 225mV (75%) Eye height  Prompt and automatic timing-calibration scheme

29

© 2016 TSMC, Ltd Acknowledgement Security C – TSMC Property Secret

The authors would like to thank TSMC “InFO-IP” team for InFO back-end support, TSMC “IPPM” team for measurement support, and “System-BD” team for business help. The authors would also like to specially thank “Mentor Graphics” for boundary scan design collaboration.

30

© 2016 TSMC, Ltd Reference Security C – TSMC Property Secret  [1] Intel, 2015-ISSCC, “The Xeon® Processor E5-2600 v3: A 22nm 18-Core Product Family”  [2] Google, 2015-VLSI, “System Challenges and Hardware Requirements for Future Consumer Devices: From Wearable to ChromeBooks and Devices in-between”  [3] TSMC, 2013-OIP “SoC Memory Interfaces. Today and tomorrow at TSMC”  [4] 2012 Business Update “3DIC & TSV interconnects”  [5] TSMC, 2012-IEDM, “High-Performance Integrated Fan-Out Wafer Level Packaging (InFO-WLP): Technology and System Integration”  [6] TSMC, 2013-VLSI, “An extra low-power 1Tbit/s bandwidth PLL/DLL-less eDRAM PHY using 0.3V low-swing IO for 2.5D CoWoS application”  [7] JESD8-28 “300mV interface”

31

© 2016 TSMC, Ltd