New Directions in 2.5D / 3D Heterogeneous Integration of FPGAs

Dr. Farhana Sheikh, Senior Staff Scientist, Intel Corporation

Collaborators: Ramune Nagisetty, David Kehlet, Ankireddy Nalamalpu, Mahesh Iyer, Tanay Karnik, Jose Alvarez

ISSCC 2021 F5 Forum Presentation

February 18, 2021

Presenter Biography

Farhana Sheikh (IEEE SM’14, IEEE M’93) received the B.Eng. degree in Systems and Engineering with high distinction and Chancellor’s Medal from Carleton University, Ottawa, Canada, in 1993 and the M.Sc. and Ph.D. degrees in Electrical Engineering and Computer Sciences from the University of California, Berkeley, in 1996 and 2008, respectively. From 1993 to 1994 she worked at Nortel Networks as a software engineer in firmware and design. From 1996 to 2001, she was at Cadabra Design Automation as a software engineer and senior manager. Since joining Intel in 2008, Farhana has worked for over 11 years in Intel Labs in various roles as a senior researcher and manager in digital circuits for cryptography, graphics, and next generation wireless systems. Currently, Dr. Sheikh is a Senior Staff Scientist in Intel’s Programmable Solutions Group Technology Innovation and Strategy Office, where her research focuses on 2-D/3-D heterogeneous integration, mm-Wave and THz distributed/non-distributed massive MIMO circuits and architectures, high-frequency wireless control for quantum , adaptive and deep-learning based circuits/architectures for next generation intelligent wireless systems, cryogenic CMOS circuits, and 2D/3D multi- integration. She has published 48 papers, filed 22 patents, and received multiple academic/industry/publication awards, including two prestigious ISSCC Lewis Best Paper Awards for work published at ISSCC 2012 and ISSCC 2019. Farhana is the IEEE Solid-State Circuits Society Oregon Chapter Chair and holds technical program committee and co-chair positions in multiple solid-state circuits and signal processing conferences. At ISSCC 2020, she chaired the first SSCS Women in Circuits “Rising Stars Workshop”.

Legal Disclosures and Disclaimers

This presentation contains the general insights and opinions of Intel Corporation (“Intel”). The information in this presentation is provided for information only and is not to be relied upon for any other purpose than educational. Statements in this document that refer to Intel’s plans and expectations for the quarter, the year, and the future, are forward-looking statements that involve a number of risks and uncertainties. A detailed discussion of the factors that could affect Intel’s results and plans is included in Intel’s SEC filings, including the annual report on Form 10-K. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at www.intel.com. Results have been estimated or simulated using internal Intel analysis or architecture simulation or modeling and provided to you for informational purposes. Any differences in your system hardware, software or configuration may affect your actual performance. † Tests measure performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks. Intel, the Intel logo, the Intel. Experience What's Inside logo, eASIC, and Stratix are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as the property of others

Convergence of Intelligence, Compute, and Communications

• Number of connected devices to reach 500 billion by 2030 [1]
• 59X larger than expected world population
• AR / VR, machine-to-machine connectivity, imaging, sensing
• Objects dominant users of next generation communications
• Approach perception of humans → high performance targets
• AI to improve performance, network optimization, energy-efficiency, detection
• Tb/s wireless capacity to enable seamless streaming

[Figure: 5G vs. 6G requirements chart. Axes: User Experience Data Rate (Gbps), Peak Data Rate (Gbps), Mobility Support (km/hr), Energy Efficiency, Reliability, Connection Density (devices/km2), Spectral Efficiency (bps/Hz), Air Latency (ms), Intelligence. Ref: [2], [3]]

[Figure: Heterogeneous intelligent compute and communicate across cloud, core, base station, mobile devices, and personal devices: flexibility and intelligence. Source: IEEE]

Intelligent Compute + Communicate

[Figure: Scalar, Vector, Matrix, and Spatial (FPGA) compute architectures]

FPGA: Comprehensive edge through networks to cloud; built for reconfigurability, low latency, energy efficiency and scalability

• Next set of problems are complex
• Requires co-design & co-optimization
• System-level design

Ref: Intel Architecture Day, 2020

February 18, 2021 Farhana Sheikh, Intel Corporation 5 Outline

 Introduction and Motivation  Moore’s Law: Application to FPGAs  Evolution of FPGAs: Heterogeneity and Customized Flexibility  The “New” Wave in Scaling: Macro-2D/3D HI for FPGAs  What About Micro-3D and Nano-3D?  The Path Towards Future FPGA Systems: Leveraging the 2D/3D Toolset  2D/3D FPGA Scaling  2D/3D FPGAs + chiplets  New interface chiplets for 2D/3D FPGAs: optical, sub-THz wireless  Conclusions and Summary  Acknowledgements

February 18, 2021 Farhana Sheikh, Intel Corporation 6 Moore’s Law: Modularity, Flexibility, Communications

Electronics 38:8, April 19, 1965, Gordon Moore: Cramming More Components onto Integrated Circuits

Source: Intel 2D and 3D Macro-scale Heterogeneous Integration

Intel FPGA Optical I/O

Source: Intel, IEEE Micro Mar/Apr 2020 Source: Intel ISSCC 2019 – Intel’s 71-76GHz 64- element phased-array transceiver module with 2x2 element TXVR ICs

February 18, 2021 Farhana Sheikh, Intel Corporation 7 Moore’s Law: Flexibility from FPGAs

MEM FPGA

MACs + ALUs

A. Abnous and J. Rabaey, IEEE VLSI Signal ARM8 CORE Processing Workshop 1996

“Embedded FPGAs are a more of a niche; even in 2023 they’ll only be worth $4.1bn, but that’s still more than double the 2018 addressable market. Updatable hardware, application-specific accelerators and heterogeneous computing in the data center ISSCC 2000 will be the growth drivers, ….which will encourage the growing market for pre-tested UC Berkeley + ST Micro blocks of intellectual property which can be dropped into an FPGA design.” – Bill Ray, Oct 2019, Gartner

February 18, 2021 Farhana Sheikh, Intel Corporation 8 Moore’s Law: Modularity to Extend FPGA

ISSCC 2014 C. Erdmann et al., Xilinx R. Mahajan, Intel EMIB patent, 2008 Stacked Silicon Interconnect and Die Stacking, H. Braunisch et al., ESSDERC 2012 IEEE EPEPS 2011 L. Madden et al., Xilinx

February 18, 2021 Farhana Sheikh, Intel Corporation 9 Moore’s Law: Phased Arrays + FPGAs

ISSCC 2019 S. Pellerano et. al., Intel

Power FPGA sequencing (16 channels)

ADCs

4-ch rcv-only frontend boards

Breakout board

2020-2030: phased arrays and IEEE Trans. MTT, 2016 Source: A. Niknejad/B. Nikolic, directive communication S. Zihir et al., UCSD BWRC/UC Berkeley Ref. G. Lacaille et al., ICC 2020

Evolution of FPGAs: Heterogeneity and Customized Flexibility

Before FPGA-Chiplet Ecosystem: SoC FPGA

Altera ships its first Cyclone V SoC devices
By Tony McConnel, Embedded.com, December 12, 2012
Altera Corporation is shipping the first of its 28 nm Cyclone V SoC devices, which combine a dual-core ARM Cortex-A9 system with FPGA logic on a single chip. The new SoCs are targeted for wireless communications, industrial, video surveillance, automotive and medical equipment markets. They are designed to enable creation of custom SoC variants optimized for system power, board space, performance, and cost requirements.
Source: Embedded.com

Similarly, in 2011 Xilinx combined PowerPC core (embedded) and transceivers (SSI technology) to expand capabilities and lower power (ref: L. Madden et al., ESSDERC 2012)

[Figure: 1GOPS Reconfig. Signal Proc. IC with Embedded FPGA, ST Micro. M. Borgatti et al., ISSCC 2003]

Today: Convergence of RF, DSP, and Intelligence

Homogeneous Integration: leads to sub-optimality
RF/mm-Wave and high-speed I/O pay the price in a digitally optimized technology

[Figure: I/O chiplets. Source: Intel]

Specialized Nodes: transistor diversity

Technology for Digital Logic: FPGA Scaling

[Figure: Intel-Altera FPGA Technology Scaling. Log-log plot of ALM (LE), DSP Mult Blocks, and BRAM (Kbits) (1E+00 to 1E+07) versus Technology Node (nm) (1 to 1000). Families shown: 1996 Flex 8000, 2002 Stratix, 2005 Stratix II, 2006 Stratix III, 2008 Stratix IV, 2010 Stratix V, 2013 - 2018 Stratix 10, 2019 Agilex. Nodes shown: 600nm, 420nm, 300nm, 150nm, 130nm, 90nm, 65nm, 40nm, 28nm, 14nm, 10nm]

FPGAs: programmable digital logic with configurable interconnect
• Most sensitive to device scaling: speed, density, memory bandwidth limits performance
• Ideal for yield optimization studies

Family of FPGAs may remain on same node – transceivers/chiplets can be updated, reducing platform cost

Technology for High-speed IO and RF

[Figure: Intel FPGA (Intel 14nm) with two Ayar Labs TeraPHY™ optical I/O chiplets (GF45CLO). Source: Intel. Ref: IEEE Micro March/April 2020, HotChips 2019]

I/O devices require higher voltage, thick gate oxide device, high Rout, wide gate-pitch

Technology differentiation, wider gate-pitch for high-frequency signal escape, metal thickness and number of layers

Technology for Analog-Mixed Signal

[Figure: die photo showing ADC Core, Memory, and Scan chain. UCB ADC in Intel 22FFL: mixed-signal circuits co-located with memory. Source: B. Nikolic, BWRC/UC Berkeley]

• Benefits from digital logic scaling
• Sensitive to passive/active device matching

The “New” Wave in Scaling: Macro-2D/3D HI for FPGAs

Technology Innovations to Fuel Future FPGA Platforms

“Necessity is the Mother of Invention”

2013 Challenge:
• Die disaggregation – FPGA + Chiplets
• Transceivers remain in TSMC 20nm: Design re-use
• Digitally optimized 14-nm + Intel EMIB

Intel to make 14-nm FPGAs for Altera
By Rick Merritt, 02.26.2013
SAN JOSE, Calif. – Intel Corp. will build FPGAs for Altera Corp. using its 14-nm FinFET process technology in a deal that turns up the heat on TSMC in foundry and Xilinx in high-end FPGAs.
Source: EE Times

Altera and Intel Extend Manufacturing Partnership to Include Development of Multi-Die Devices
Collaboration Will Optimize Integration of 14 nm Tri-Gate Stratix 10 FPGAs with Heterogeneous Technologies into a Single System-in-a-Package
San Jose and Santa Clara, Calif., March 26, 2014 – Altera Corporation (Nasdaq: ALTR) and Intel Corporation today announced their collaboration on the development of multi-die devices that leverage Intel’s world-class package and assembly capabilities and Altera’s leading-edge programmable logic technology.
Source: Intel

Chiplets and FPGAs

• 14nm 1GHz FPGA with 2.5D 20nm transceiver integration
• Transceiver roadmap becomes independent of FPGA roadmap

D. Greenhill et al., Intel, ISSCC 2017

Macro-2D Integration: Packaging Technologies

AGILEX FPGA

STANDARD PACKAGE
Typical organic package (FCBGA)
• Bump pitch – 100 um
• Bump density – 100/mm2
• Energy/bit – 1.7 pJ/bit

EMBEDDED BRIDGE (2.5D)
Embedding die with dense wiring in a package cavity
• Bump pitch – 55-36 um
• Bump density – 330-772/mm2
• Energy/bit – 0.50 pJ/bit
• 3.4X ↓ in I/O energy/bit

[Figure: Intel FPGA + Foundry IO Chiplets + HBM]

Package            IO/mm/lyr     Bump pitch
Standard           32 → 48       100 µm (bump)
Embedded bridge    250 → 1000    55 µm (μbump)

Ref: R. Mahajan et al., IEEE Trans. on Components, Packaging and Manufacturing Technology, 9:10, Oct. 2019
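
The 3.4X figure follows from the two energy-per-bit values quoted on this slide. The sketch below reproduces that ratio and converts energy per bit into I/O power. The 1 Tb/s aggregate bandwidth and the function names are illustrative assumptions, not values from the presentation.

```python
# Energy per bit for each package type, as quoted on the slide (pJ/bit).
ENERGY_PJ_PER_BIT = {
    "standard_package": 1.7,
    "embedded_bridge": 0.50,
}

# Hypothetical aggregate die-to-die bandwidth, chosen only for illustration.
ASSUMED_BANDWIDTH_GBPS = 1000.0


def io_power_watts(energy_pj_per_bit: float, bandwidth_gbps: float) -> float:
    """I/O power = energy per bit (J) x bits per second."""
    return energy_pj_per_bit * 1e-12 * bandwidth_gbps * 1e9


def energy_reduction(baseline: str, improved: str) -> float:
    """Ratio of baseline to improved energy per bit."""
    return ENERGY_PJ_PER_BIT[baseline] / ENERGY_PJ_PER_BIT[improved]


if __name__ == "__main__":
    ratio = energy_reduction("standard_package", "embedded_bridge")
    print(f"Energy/bit reduction: {ratio:.1f}X")
    for name, energy in ENERGY_PJ_PER_BIT.items():
        power = io_power_watts(energy, ASSUMED_BANDWIDTH_GBPS)
        print(f"{name}: {power:.2f} W at {ASSUMED_BANDWIDTH_GBPS:.0f} Gb/s")
```

Because power scales linearly with energy per bit at a fixed bandwidth, the same ratio applies to I/O power for any assumed bandwidth.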

Macro-3D Logic-on-Logic Integration

Foveros enables nearly 1,000 IO/mm2
• Bump pitch – 50-25 um
• Bump density – 400-1,600/mm2
• Energy/bit – 0.15 pJ/bit

[Figure: LAKEFIELD 3D stack, Intel Silicon on Multiple Nodes]
• Die to wafer stacking, face to face connection
• Fine micro-bumps optimized for dense 3D and low power
• Silicon to package standard C4 connection
• Coarse TSV optimized for power delivery and off-package I/O

Ref: Intel Foveros, D. Ingerly et al., IEDM 2019; W. Gomes, Intel, ISSCC 2020

Next Gen FPGAs Co-EMIB: Blending 2D and 3D

[Figure: multiple top die tiles on an active or passive base die (<50µm die-die interconnect pitch), connected through EMIB to a companion die and HBM]

• Architecture enables larger-than-reticle sized base & high-density connections to companion die and stacked die complexes
• Increased partitioning opportunities

Omni-Directional Interconnect (ODI)

Macro-3D: FOVEROS
Denser Macro-3D Integration: ODI

Enables flexible design with maximum performance

• Smaller TSV die area
• Direct power delivery
• High bandwidth interconnects

Ref: A. A. Elsherbini et al., “Heterogeneous Integration Using Omni-Directional Interconnect Packaging,” IEEE IEDM 2019

Macro-2D/3D Heterogeneous Integration

[Figure: Interconnect Density versus Energy Efficiency for four packaging technologies]

Technology                             Bump pitch      Bump density        Energy/bit
STANDARD PACKAGE                       100 um          100/mm2             1.7 pJ/bit
EMBEDDED BRIDGE                        55-36 um        330-772/mm2         0.50 pJ/bit
FOVEROS TECHNOLOGY                     50-25 um        400-1,600/mm2       0.15 pJ/bit
FUTURE: HIGH-DENSITY INTERCONNECTS     < 10 microns    > 10,000/mm2        < 0.05 pJ/bit
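
The bump densities quoted for each technology are close to what a uniform square bump grid would give, where density is the inverse square of the pitch. The sketch below checks that relationship. The square-grid layout is an assumption made for illustration; the presentation does not state the bump arrangement.

```python
def bump_density_per_mm2(pitch_um: float) -> float:
    """Bumps per mm^2 for a uniform square grid with the given pitch.

    Assumes one bump per pitch x pitch cell (hypothetical layout).
    """
    bumps_per_mm = 1000.0 / pitch_um  # 1 mm = 1000 um
    return bumps_per_mm ** 2


if __name__ == "__main__":
    # Pitches quoted on the slide, in micrometres.
    for pitch in (100, 55, 36, 50, 25, 10):
        density = bump_density_per_mm2(pitch)
        print(f"pitch {pitch:>3} um: about {density:,.0f} bumps/mm^2")
```

Halving the pitch quadruples the density, which is why the step from 50 um to 25 um on the slide corresponds to 400 versus 1,600 bumps/mm2.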

What About Micro-3D and Nano-3D HI?

Scaling of 2D and 3D HI Technologies

[Figure: 2D and 3D HI technology scaling, from "Broad commercial availability, e.g. Intel EMIB, Foveros" to "Growing commercial availability". Source: Paul Fischer, Intel]

• 3D HI technologies enable improved performance, power and area
• Reduced/eliminated die-die RLC, placement granularity
• Monolithic 3D (M3D) HI is an emerging opportunity for THz applications

Micro-3D HI: Hybrid Bonding

Macro-3D: FOVEROS TECHNOLOGY
• Lakefield, 50 um pitch (top view)
• 400 bumps/mm2
• Energy/bit – 0.15 pJ/bit

Micro-3D: HIGH-DENSITY INTERCONNECTS: HYBRID BONDING
• Hybrid Bonding, 10 um pitch (top view)
• 10000 bumps/mm2
• Energy/bit – < 0.05 pJ/bit

Denser 3D integration: area scales with bump pitch
• Smaller, simpler circuits
• Lower capacitance
• Lower power

Potential for enhanced FPGA performance:
• ↓ pJ/bit
• ↑ density
• Memory closer to logic

Ref: I. Cutress, “Intel Next-Gen 10-micron Stacking: Going 3D Beyond Foveros”, anandtech.com, August 2020

Nano-3D: Driving New Possibilities

[Figure: cross-section of a Si transistor (top device layer) stacked over a GaN transistor (bottom device layer) with a bonding oxide between them, on a 300mm high-resistivity (HR) Si (111) substrate. Labels: N+ S/D epi, top gate, top gate metal, top contact, top contact metal, top ILD, top buffer, GaN buffer, access via, access contact, bottom T-gate, raised N+ S/D epi. Ref. H. Then et al., IEDM 2019. Source: Paul Fischer, Intel, DARPA ERI 2020]

Nano-3D: Efficient, Compact Power Delivery and RF

Example Multi-Chip Solutions with GaN Today
• Power delivery: GaN NFET only; companion CMOS chip needed for gate driver and controller (CMOS)
• RF: GaN NFET only for PA, LNA, and switch; companion CMOS chip(s) needed for CMOS biasing, envelope tracker, analog circuits (freq. divider, LO buffer, PLL, VGA), and CMOS digital control logic

3D HI Power Delivery
• Si PMOS + GaN NMOS gate driver, with controller (CMOS)

3D HI 5G RFFE
• CMOS biasing, ET
• Si PMOS + GaN NMOS for LNA, switch, and PA
• (CMOS) analog circuits
• CMOS digital control logic

Finer pitches and reduced die-to-die interface RLC improve power, performance, area

Source: Paul Fischer, Intel, DARPA ERI 2020. Ref: H. Then et al., IEDM 2019

The Macro-/Micro-/Nano-3D Tradeoffs

Scales compared: Nano, Micro, Macro

Source: Paul Fischer, Intel, DARPA ERI 2020

Metric                                       M3D HI    HB 3D HI [WoW]    HB 3D HI [DoW, DoD]    uBump (e.g. Intel EMIB, Foveros)
Inter-strata pitch, integration density      High      Med               Med                    Low
Power density                                High      Med               Med                    Low
Inter-tier signal delay                      Low       Med               Med                    High
Known Good Die                               No        No                Yes                    Yes
Same die size required                       Yes       Yes               No                     No
Technology maturity                          Low       Med               Med                    High
Heterogeneous process co-development         Yes       No                No                     No

The Path Towards Future FPGA Systems: Leveraging the 2D/3D Toolset

2.5D / 3D Scaling of FPGAs

• Leverage lower energy/bit and increase in density to reduce system power
• Enhanced number of FPGA cores, closer proximity of memory to FPGA → lower latency
• 2.5D: active on passive / EMIB package, μbump or TSV (horizontal integration) (28nm, 20nm, 16nm)
  • Thermals, TSV-induced stress not a big issue
  • Intel 2.5D integration solution: EMIB package
• 3D: active on active, μbump or TSV (vertical integration) (7nm)
  • Thermals, TSV-induced stress
  • Intel 3D integration solution: Foveros, Hybrid Bonding
• 3.5D: active on active on passive, μbump or TSV (H+V) (Beyond 7nm)
  • Thermals, TSV-induced stress
  • Intel 3.5D integration solution: Co-EMIB, ODI
• Tools and Design Methodologies: catching up
  • Early trials by all vendors in progress
  • When automated design tools address needs → products

Ref: S. Burke, Xilinx SNUG March 2019

2D/3D FPGA + Chiplets

• “In CHIPS, we’re working with Intel on some integration strategies… so that you can do composable design.” – DARPA’s ERI Director Bill Chappell, IEEE Spectrum July 2018
• Standardization: AIB – CHIPS Alliance standard and industry de facto die-to-die standard

HIGH PERFORMANCE AI MATRIX BLOCKS
• Up to 15X more INT8 compute performance than today’s Stratix 10 MX for AI workloads
• Hardware programmable for AI with customized workloads

ABUNDANT NEAR-COMPUTE MEMORY
• Embedded and customizable memory hierarchy for model persistence
• Integrated HBM for high memory bandwidth

HIGH BANDWIDTH NETWORKING
• Up to 57.8G PAM4 transceivers and hard Intel Ethernet blocks for high efficiency
• Flexible and customizable interconnect to scale across multiple nodes

EXTENSIBLE
• Chiplets enable easier interface customization and ASIC extensions

Open Source AIB and AIB Generators

Advanced Interface Bus (AIB) Specification, Rev 2.0DRAFT3, 14 July 2020, CHIPS Alliance and Intel Corporation, https://github.com/chipsalliance/

Feature                                  AIB 1.0    AIB 2.0
Bandwidth/wire (Gbps)                    2          Up to 6.4
Bump pitch (um)                          55         55/45/36
Bandwidth/mm shoreline (Gbps/mm)         256        1638
IO Voltage (V)                           0.90       0.90/0.40
Energy/bit (pJ/bit)                      0.85       0.50
Backward Compatibility                   n/a        1.0

[Figure: high-density die-to-die interconnects between Die 1 and Die 2 through the package. Wiring density depends on wire width, spacing, and dielectrics between wires]

AIB Generator available at: github.com/chipsalliance
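
Two derived quantities help compare the AIB generations in the table: the number of data wires implied per millimetre of die edge, and the I/O power per millimetre of shoreline at full bandwidth. The sketch below computes both from the table values. The derivation (dividing shoreline bandwidth by per-wire bandwidth, and multiplying bandwidth by energy per bit) is an illustrative assumption and is not taken from the AIB specification.

```python
# Table values from the slide: (Gbps per wire, Gbps per mm shoreline, pJ/bit).
AIB = {
    "AIB 1.0": (2.0, 256.0, 0.85),
    "AIB 2.0": (6.4, 1638.0, 0.50),
}


def wires_per_mm(gbps_per_mm: float, gbps_per_wire: float) -> float:
    """Implied number of data wires per mm of shoreline."""
    return gbps_per_mm / gbps_per_wire


def power_mw_per_mm(gbps_per_mm: float, pj_per_bit: float) -> float:
    """I/O power per mm of shoreline: Gbps x pJ/bit gives mW."""
    return gbps_per_mm * pj_per_bit


if __name__ == "__main__":
    for name, (per_wire, per_mm, energy) in AIB.items():
        wires = wires_per_mm(per_mm, per_wire)
        power = power_mw_per_mm(per_mm, energy)
        print(f"{name}: about {wires:.0f} wires/mm, {power:.1f} mW/mm")
```

Under these assumptions AIB 2.0 roughly doubles the wire count per millimetre while more than tripling the per-wire rate, so total shoreline power rises even though energy per bit falls.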

Optical Chiplets and FPGAs

[Figure: Intel FPGA (Intel 14nm) with two Ayar Labs TeraPHY™ optical I/O chiplets (GF45CLO). Source: Intel and Ayar Labs, HotChips 2019 presentation. Ref: IEEE Micro, March/April 2020]

• In-package optics + FPGA
• Support for new high-performance compute architectures
• Enables new compute-communicate architectures for “intelligence at the edge”

What About Wireless and FPGAs?

[Figure: 100-Antenna + RF Front-End + FPGA + ASIC: world’s first massive MIMO <6GHz system. Antenna array with 160 dual polarized patch antennas (100 of 320 ports connected); two 19” racks of 24 U each; height 1.5 m, depth 0.8 m, width 1.2 m. Lund University, IEEE GlobeCom 2014]

[Figure: 128-channel 70GHz Massive MU-MIMO Testbed, 16 users, with FPGA, power sequencing (16 channels), ADCs, 4-ch rcv-only frontend boards, and breakout board. Source: A. Niknejad/B. Nikolic, BWRC/UC Berkeley. Ref. G. Lacaille et al., ICC 2020]

[Figure: Intel 64-element mm-wave (71-76GHz) phased array system (RFIC only – no digital baseband) with analog beamforming, ISSCC 2019]

T. M. Hancock and J. C. Demmin, Int’l 3D Systems Integ. Conference (3DIC) 2019

Next Generation sub-THz Wireless Systems

[Figure: sub-THz TX and RX block diagrams. Example: 16-QAM, 8Gbps.
TX: DSP / ML (DBF), xN I/Q DACs (BW ≥ 2GHz; 4-6b for comms, 8-10b for imaging), 0°/90° mixing with LO1 and LO2 through a 15-30GHz IF, xM PAs at fc ≥ 140GHz, BW ≥ 2GHz/channel.
RX: LNAs at fc ≥ 140GHz, 0°/90° mixing with LO1 and LO2 through a 15-30GHz IF, xN I/Q ADCs (BW ≥ 2GHz; 4-6b for comms, 8-10b for imaging), DSP / ML (DBF), BW ≥ 2GHz/channel.
LO1 = fc/6, LO2 = 5 x fc/6. Chip-to-chip I/O: ≥ 1GHz operation.
Legend: Advanced scaled CMOS; Analog / RF CMOS; III-V Devices; Chip-to-chip I/O]

I/O and power issue exacerbated with:
• Higher frequency (e.g., 210GHz)
• Larger bandwidth/throughput
• All-digital beamforming
• Fully connected large phased array

Ref: F. Sheikh, DARPA ERI 2020
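
The frequency plan on this slide (LO1 = fc/6, LO2 = 5 x fc/6) and the 16-QAM, 8Gbps example can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and taking the minimum bandwidth as roughly equal to the symbol rate ignores pulse-shaping overhead.

```python
import math


def lo_plan_ghz(fc_ghz: float) -> tuple[float, float]:
    """Return (LO1, LO2) in GHz using LO1 = fc/6 and LO2 = 5 x fc/6."""
    return fc_ghz / 6.0, 5.0 * fc_ghz / 6.0


def symbol_rate_gbaud(bit_rate_gbps: float, qam_order: int) -> float:
    """Symbol rate for M-QAM: bit rate divided by log2(M) bits per symbol."""
    bits_per_symbol = math.log2(qam_order)
    return bit_rate_gbps / bits_per_symbol


if __name__ == "__main__":
    for fc in (140.0, 210.0):  # carrier frequencies mentioned on the slide
        lo1, lo2 = lo_plan_ghz(fc)
        print(f"fc = {fc:.0f} GHz: LO1 = {lo1:.2f} GHz, LO2 = {lo2:.2f} GHz")
    rate = symbol_rate_gbaud(8.0, 16)
    print(f"16-QAM at 8 Gbps: {rate:.1f} GBaud")
```

The two LO frequencies sum to the carrier, and for fc = 140GHz LO1 falls inside the 15-30GHz IF range shown in the diagram. The 2 GBaud symbol rate is consistent with the BW ≥ 2GHz converter requirement.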

Opportunity for Macro-, Micro-, Nano-3D

[Figure: the 16-QAM, 8Gbps TX chain (DSP / ML, DBF, xN I/Q DACs, 0°/90° mixing with LO1 and LO2, xM PAs) annotated with candidate Macro-3D, Micro-3D, and Nano-3D integration boundaries. Legend: Advanced scaled CMOS; Analog / RF CMOS; III-V Devices; Chip-to-chip I/O]

[Figure: Intelligent THz Radio Cube: IF Architecture. Stack of DSP / ML, Memory, ADC/DAC, RFIC, Package Substrate, and Antennas, joined by Micro-3D, Macro-3D, and Nano-3D integration. Concept only – not to scale]

3D Integration pros and cons:
+ Integration density
+ Yield
+ Optimal design/tech. for components
+ Power, performance, area (PPA)
− Thermals, power density
− Interference/noise (elect./thermal)
∘ H and V NoC and inter-die I/O
∘ Power delivery (3D power grid?)

System Architecture → 3D SoC Design

[Figure: TX chain (DSP / ML, DBF, xN I/Q DACs, 0°/90° mixing with LO1, PA) and RX chain (LNA, 0°/90° mixing with LO1, xN I/Q ADCs, DBF, DSP / ML) annotated with Macro-3D, Micro-/Nano-3D, and Micro-/Macro-3D integration boundaries. Legend: Advanced scaled CMOS; Analog / RF CMOS; Chip-to-chip I/O]

[Figure: Intelligent THz Radio Cube: Direct-digital Architecture. Stack of DSP / ML and RFIC TX/RX, ADC/DAC on a package substrate, joined by Micro-3D, Nano-3D, and Macro-3D integration]

2.5D integration with FPGA
• FPGA provides programmability
• Intelligent adaptation
• Varied applications – same system

[Figure: DSP / ML, AIB-3D PHY, RFIC TX/RX, ADC/DAC, and AIB PHY stack next to an Intel FPGA, connected through EMIB on a package substrate]

Large Scale Phased Arrays: The I/O Problem

[Figure: multiple RF TX and RX chains, each with DAC and ADC, feeding adaptive digital beamforming, digital baseband DSP, and memory]

Digital Beamforming
• Highly flexible
• Highly scalable
• Adaptive (ML)
• 3D HI reduces I/O power

[Figure: Intelligent sub-THz or THz with digital beamforming. Stack of DSP / ML + Memory, AIB-3D PHY, ADC/DAC, RFIC TXVR (PA, LNA, LO, Filters), and AIB PHY next to an FPGA, connected through EMIB on a package substrate. High I/O power reduced by 3D HI]

Modular design allows low-cost interchange of analog / hybrid / digital beamformers dependent on the application

• Intel VTC 2019 paper: I/O between digital and RF >3W for 16-element, 16-user MIMO array at mm-Wave, >50% of power dissipation of entire 16-element front-end
• Increasing to 128 elements, interface power grows to >20W!
• EMIB integration: ~6X ↓ in I/O power; Foveros 3D stacking: ~13X ↓ in I/O power

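The interface power figures quoted for the 16-element and 128-element arrays can be related with a simple scaling estimate. The sketch below assumes that digital-to-RF I/O power grows linearly with the number of elements, which is an assumption made here for illustration (it is consistent with the >3W and >20W figures, but the presentation does not state the scaling law). The baseline is the quoted lower bound, so the results are lower bounds too.

```python
BASE_ELEMENTS = 16
BASE_IO_POWER_W = 3.0  # quoted as ">3W", so treat as a lower bound

# Approximate I/O power reduction factors quoted on the slide.
REDUCTION_FACTOR = {"EMIB": 6.0, "Foveros": 13.0}


def scaled_io_power_w(elements: int) -> float:
    """I/O power assuming linear scaling with element count (assumption)."""
    return BASE_IO_POWER_W * elements / BASE_ELEMENTS


def integrated_io_power_w(elements: int, technology: str) -> float:
    """I/O power after applying the quoted reduction for a packaging option."""
    return scaled_io_power_w(elements) / REDUCTION_FACTOR[technology]


if __name__ == "__main__":
    n = 128
    print(f"{n} elements, board-level I/O: {scaled_io_power_w(n):.1f} W")
    for tech in REDUCTION_FACTOR:
        power = integrated_io_power_w(n, tech)
        print(f"{n} elements with {tech}: {power:.2f} W")
```

With these assumptions the 128-element interface drops from tens of watts to a few watts, which is the motivation for co-packaging the ADC/DAC array with the digital baseband.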
Antenna Integration

• Antennas need to be close to RF transceivers
• N-antenna AiP size ∝ antenna spacing → package shrinks with higher frequency as ∝ λ²
• Lower mm-Wave: RF IC < AiP tile
• Higher mm-Wave and THz: RF IC area does not shrink as λ² → Sub-THz/THz RF IC > AiP package tile
• Co-integration of ADC/DAC array + digital baseband needed to alleviate digital-to-RF I/O → RF/analog + digital IC >> AiP package tile (!)
  • Requires use of dummy antennas to fit IC
  • Can Nano-3D ICs overcome the integration issue?

[Figure: antenna-in-package tiles at low mm-Wave and at sub-THz to THz. Ref: B. Sadhu, IEEE Microwave Magazine, Dec. 2019]
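
The λ² scaling of the antenna tile can be made concrete with a short calculation. The sketch below assumes half-wavelength element spacing in free space, a common rule of thumb that the presentation does not state explicitly, and uses 28GHz as a hypothetical example of a lower mm-Wave band next to the 140GHz carrier mentioned earlier in the deck.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_mm(freq_ghz: float) -> float:
    """Free-space wavelength in millimetres."""
    return SPEED_OF_LIGHT_M_PER_S / (freq_ghz * 1e9) * 1e3


def tile_area_mm2(freq_ghz: float, spacing_in_wavelengths: float = 0.5) -> float:
    """Area of one antenna tile, assuming a square cell of the given spacing."""
    side_mm = spacing_in_wavelengths * wavelength_mm(freq_ghz)
    return side_mm ** 2


if __name__ == "__main__":
    for f in (28.0, 140.0):  # 28 GHz is an illustrative choice
        print(f"{f:.0f} GHz: tile side {0.5 * wavelength_mm(f):.2f} mm, "
              f"area {tile_area_mm2(f):.2f} mm^2")
    ratio = tile_area_mm2(28.0) / tile_area_mm2(140.0)
    print(f"Tile area shrinks by about {ratio:.0f}x from 28 GHz to 140 GHz")
```

A 5x increase in frequency shrinks the per-element tile area by 25x, while the RF IC area per element does not shrink at the same rate, which is the mismatch the slide highlights.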

Power Delivery and System Design

Power Delivery
• Move to on-package power delivery: power conversion near load
• Leverage advances in GaN, heterogeneous integration, passives integration
• Embedded power passives

Network-on-Chip (NoC) and I/Os
• Reduce I/O overhead between digital ASIC (or FPGA), ADC/DACs, and RF ICs
• NoC reliability: structures for robust NoC – fault tolerance (e.g., hexNoC)
• Adaptive algorithms: package, IC, NoC co-design required.

[Figure: Intel ODI direct power delivery: granular power supply, higher efficiency, miniature modules, short power delivery network, reduced impedance]

[Figure: HexNoC, S. Moriam and G. Fettweis, Euromicro Conf. Aug. 2016]

System Integration and Thermal Management

[Figure: IC power density in Si-based phased arrays. B. Sadhu, IEEE Microwave Magazine, Dec. 2019]

[Figure: heat dissipation mechanism with hot plate; TIM = thermal interface material. G. Fettweis et al., IEEE Proceedings Jan. 2019]

[Figure: 4-tier M3D IC heat flow vs. tier thickness (uniform power profile assumed). K. Banerjee et al., J. EDS, June 2019]

Power density is a growing problem for high-frequency systems – 3D ICs exacerbate the issue
• 3D ICs produce significant heat > chip package thermal design power (TDP)
• Design of heat sink / heat spreaders and their location, cost of cooling

Design techniques to reduce heat dissipation and interference
• Limit power dissipation in layers far from heat sink
• Use memory elements to shield processing elements
• Stagger processing elements so they are not on top of each other → impacts NoC

Standards, Modularity, and Flexibility Help!

[Figure: Intel FPGA connected over EMIB / AIB to RF transceiver, ADC/DAC array, and DSP / ML / ASIC chiplets, and to SERDES or optical I/O]

[Figure: stacks of DSP / ML, AIB-3D PHY, RFIC TX/RX, ADC/DAC, and AIB PHY next to Intel FPGAs on an EMIB package substrate. Integration options shown: Hybrid Bonding; Foveros, Co-EMIB; Nano-3D, ODI]

Ideal scenario: proximity of FPGA, compute, memory, wireless

[Figure: 128-channel 70GHz Massive MU-MIMO Testbed, 16 users. BWRC/UC Berkeley, 2019 JUMP ComSenTer Review. Ref. G. Lacaille et al., ICC 2020]

Conclusions and Summary

• Convergence of intelligence, compute, and communications
  • Heterogeneous systems design: FPGAs provide programmability at low latency
• 2D/3D packaging innovations seen as tools to enhance FPGA platforms
  • Multiple FPGA integration (2D/3D) – higher performance, parallelism
  • FPGA + memory (3D) – programmable compute near memory for AI/ML
• Chiplet integration to enhance FPGA functionality – separate chiplet / FPGA roadmaps
  • Cost reduction and faster time to market
  • Standardized 2D/3D PHY interfaces (e.g., AIB, AIB-3D) + EMIB / Foveros packaging
  • Ease of customization through 3rd party chiplets
• Denser integration of RF + digital + FPGA → “RF-FPGA”, “Intelligent Radio Cubes”
  • Enabled through nano-/micro-/macro-3D heterogeneous integration
  • Necessary ingredient for 6G, sub-THz wireless systems
• Entering a new dimension of Moore’s Law with 2D/3D FPGAs + chiplets

Acknowledgements

• Collaborators: Ramune Nagisetty, David Kehlet, Ankireddy Nalamalpu, Mahesh Iyer, Tanay Karnik, Jose Alvarez
• Feedback: Jose Alvarez, Patrick Dorsey, Scott Weber, Joannie Fu, Jason Lawley, Michael Buerchner, Bruce Fienberg, Vivek De, Eric Karl, Fatih Hamzaoglu, Robert Munoz

References

1. Cisco Edge-to-Enterprise IoT Analytics for Electrical Utilities Solution Overview
2. 6G Vision Whitepaper,
3. M. Z. Chowdhury et al., IEEE Open Journal of the Communication Society
4. Intel Architecture Day, https://newsroom.intel.com/press-kits/architecture-day-2020/, August 2020
5. G. Moore, “Cramming More Components onto Integrated Circuits”, Electronics, April 19, 1965, pp. 114-117.
6. D. Greenhill, et al., “A 14nm 1GHz FPGA with 2.5D Transceiver Integration”, 2017 International Solid-State Circuits Conference, San Francisco, Feb. 2017.
7. IEEE Electronics Packaging Society Heterogeneous Integration Roadmap, 2019 Edition, https://eps.ieee.org/technology/heterogeneous-integration-roadmap/2019-edition.html.
8. Kaby Lake G, Hot Chips 2019: https://fuse.wikichip.org/news/1634/hot-chips-30-intel-kaby-lake-g/.
9. M. Wade et al. (Ayar Labs), S. Y. Shumarayev et al. (Intel), “TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Package Optical I/O”, IEEE Micro, March/April 2020.
10. W. Gomes, et al., “Lakefield: 3D stacked 10 nm and 22 FFL System in 12x12, 1 mm POP Package,” 2020 International Solid-State Circuits Conference, San Francisco, Feb. 2020.
11. I. Cutress, “Intel Lakefield and Foveros”, Hot Chips 2019, August 20, 2019.
12. H-J. Lee et al., “Multi-die Integration Using Advanced Packaging Technologies”, IEEE CICC 2020, March 2020.
13. S. Pellerano, et al., “A Scalable 71-to-76GHz 64-Element Phased-Array Transceiver Module with 2x2 Direct-Conversion IC in 22nm FinFET CMOS Technology”, 2019 International Solid-State Circuits Conference, San Francisco, February 2019.
14. S. Zihir et al., “60-GHz 64- and 256-element Wafer-Scale Phased Array Transmitters Using Full-Reticle and Sub-Reticle Stitching Techniques”, IEEE Transactions on Microwave Theory and Techniques, Vol. 64, No. 12, Dec. 2016.
15. O. Orhan et al., “A Power Efficient Fully Digital Beamforming Architecture for mmWave Communications”, 2019 IEEE 89th Vehicular Technology Conference (VTC-2019), Kuala Lumpur, Malaysia, April 28 – May 1, 2019.
16. R. Mahajan, et al., “Embedded Multi-Die Interconnect Bridge (EMIB) – A Localized, High Density Multi-Chip Packaging (MCP) Interconnect,” IEEE Transactions on Components, Packaging and Manufacturing Technology, Volume: 9, Issue: 10, Oct. 2019, pp. 1952-1962.
17. Intel Foveros: https://hub.packtpub.com/intel-unveils-the-first-3d-logic-chip-packaging-technology-foveros-powering-its-new-10nm-chips-sunny-cove/
18. B. Sell, et al., “22FFL: A high performance and ultra low power FinFET technology for mobile and RF applications,” 2017 International Electron Devices Meeting, San Francisco, Dec. 2017.
19. H-J Lee, et al., “Intel 22nm FinFET (22FFL) Process Technology for RF and mmWave Applications and Circuit Design Optimization for FinFET Technology,” 2018 International Electron Devices Meeting, San Francisco, CA, Dec. 2018.
20. H-J Lee, et al., “Implementation of high-power RF devices with hybrid work function and oxide thickness in 22nm low-power FinFET technology,” 2019 International Electron Devices Meeting, San Francisco, CA, Dec. 2019.
21. G. Fettweis, et al., “Architecture and Advanced Electronics Pathways Toward Highly Adaptive Energy-Efficient Computing”, Proceedings of the IEEE, Vol. 107, No. 1, pp. 204 – 231, January 2019.
22. G. Lacaille, et al., “Design and Demonstration of a Scalable Massive MIMO Uplink at E-Band”, 2020 IEEE International Conference on Communications Workshops (ICC 2020), Dublin, Ireland, June 7-11, 2020.
23. S. Malkowsky, et al., “The World’s First Real-Time Testbed for Massive MIMO: Design, Implementation, and Validation”, IEEE Access, vol. 5, pp. 9073-9088, 2017.
24. Advanced Interface Bus (AIB) Specification, Rev 2.0DRAFT3, 14 July 2020, CHIPS Alliance and Intel Corporation, https://github.com/chipsalliance/AIB-specification/blob/master/AIB_Specification%202_0_DRAFT3.pdf
25. P. Sagazio et al., “Architecture and Circuit Choices for 5G Millimeter-Wave Beamforming Transceivers”, IEEE Communications Magazine, December 2018.
26. https://www.intel.com/content/www/us/en/products/programmable/fpga/agilex.html
27. F. Sheikh et al., “Channel-Adaptive Complex K-Best MIMO Detection Using Lattice Reduction”, 2014 IEEE Workshop on Signal Processing Systems (SiPS), Belfast, Ireland, September 2014.
28. A. Abnous and J. Rabaey, “Ultra-low-power Domain-specific Multimedia Processors”, IEEE VLSI Signal Processing Workshop, San Francisco, Oct. 30 – Nov. 1, 1996.
29. H. Zhang et al., “A 1V Heterogeneous Reconfigurable Processor IC for Baseband Wireless Applications”, IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, Feb. 2000.
30. H. Braunisch et al., “High-speed Performance of Silicon Bridge Die-to-Die Interconnects”, IEEE 20th Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS), San Jose, CA, Oct. 23-26, 2011.
31. L. Madden et al., “Advanced High Performance Heterogeneous Integration Through Die Stacking”, IEEE European Solid-State Device Research Conference (ESSDERC), Bordeaux, France, Sept. 17-21, 2012.
32. C. Erdmann et al., “A Heterogeneous 3D-IC Consisting of Two 28nm FPGA Die and 32 Reconfigurable High-Performance Data Converters”, IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, Feb. 2014.
33. A. A. Elsherbini et al., “Heterogeneous Integration Using Omni-Directional Interconnect Packaging”, IEEE International Electron Devices Meeting (IEDM), Dec. 7-11, 2019.
34. D. B. Ingerly et al., “Foveros: 3D Integration and the Use of Face-to-Face Chip Stacking for Logic Devices”, IEEE International Electron Devices Meeting (IEDM), Dec. 7-11, 2019.
35. I. Cutress, “Intel Next-Gen 10-micron Stacking: Going 3D Beyond Foveros”, anandtech.com, August 2020.
36. H. W. Then et al., “3D Heterogeneous Integration of High-Performance High-K Metal GaN NMOS and Si PMOS Transistors on 300mm High-Resistivity Si Substrate for Energy-Efficient and Compact Power Delivery, RF (5G and beyond) and SoC Applications”, IEEE International Electron Devices Meeting (IEDM), Dec. 7-11, 2019.
37. F. Sheikh and P. Fischer, “New Directions in Heterogeneous Integration: Nano-, Micro-, and Macro-3D ICs”, DARPA ERI Summit 2020.
38. S.
Burke, “Onwards and Upwards: Xilinx Next Generation Multi-die Construction Flows”, Synopsys Users Group Silicon Valley (SNUG), March 20-21, 2019. 39. S. K. Moore, “Intel Drives New Bus for Future Chiplets”, IEEE Spectrum, July 2018.

February 18, 2021 Farhana Sheikh, Intel Corporation 49 References

40. T. M. Hancock and J. C. Demmin, “Heterogeneous and 3D Integration at DARPA”, 2019 IEEE International 3D Systems Integration Conference (3DIC), Sendai, Japan, Oct. 8-10, 2019. 41. L. Shannon et. al., “Technology Scaling in FPGAs: Trends in Applications and Architectures”, 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines, Vancouver, BC Canada, May 2-6, 2015. 42. M. Borgatti et. al., “A 1GOPS Reconfigurable Signal Processing IC with Embedded FPGA and 3-Port 1.2GB/s System”, IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, Feb. 2003. 43. C. C. Wang et. al., “A Multi-Granularity FPGA with Hierarchical Interconnects for Efficient and Flexible ”, IEEE International Solid- State Circuits Conference (ISSCC), San Francisco, Feb. 2014. 44. M. Natsui et. al., “An FPGA-Accelerated Fully Nonvolatile Unit for Sensor-Node Applications in 40nm CMOS/MTJ- Hybrid Technology Achieving 47.14μW Operation at 200MHz”, IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, Feb. 2019. 45. S. Naffziger et. al., “Chiplet Architecture for High-Performance Server and Desktop Products”, IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, Feb. 2020. 46. P. Vivet et. al., “A 220GOPS 96-core processor with 6 chiplets 3D-stacked on an active interposer offering 0.6ns/mm latency, 3Tb/s/mm2 inter-chiplet interconnects and 156mW/mm2 @ 82%-peak-efficiency dc-dc converters”, IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, Feb. 2020. 47. M-S. Lin et. al., “A 7-nm 4-GHz Arm®-Core-Based CoWoS® Chiplet Design for High-Performance Computing”, IEEE Journal of Solid-State Circuits, Vol. 55, No. 4, April 2020, pp. 956-966.

February 18, 2021 Farhana Sheikh, Intel Corporation 50