Intel Stratix 10 MX Devices Solve the Memory Bandwidth Challenge for Key End Markets and Applications
WHITE PAPER | FPGA

Intel® Stratix® 10 MX Devices with Samsung* HBM2 Solve the Memory Bandwidth Challenge

The Intel Stratix 10 MX device family helps customers efficiently meet their most demanding memory bandwidth requirements.

Authors
Manish Deo, Senior Product Marketing Manager, Intel Programmable Solutions Group
Jeffrey Schulz, In-Package I/O Implementation Lead, Intel Programmable Solutions Group
Lance Brown, Senior Strategic and Technical Marketing Manager, Intel Programmable Solutions Group

Table of Contents
Introduction
The Memory Bandwidth Challenge
DRAM Memory Landscape
Intel Stratix 10 MX (DRAM SiP) Devices
Intel Stratix 10 MX Device Key Benefits
Data Center Applications
Algorithmic Acceleration for Streaming Cyber Security Analytics
Conclusion
References
Where to Get More Information

Introduction

Conventional memory solutions have limitations that make it difficult for them to meet next-generation memory bandwidth requirements. This white paper describes the emerging memory landscape that addresses these limitations. The Intel® Stratix® 10 MX DRAM system-in-package (SiP) family combines a 1 GHz high-performance monolithic FPGA fabric, state-of-the-art Intel Embedded Multi-die Interconnect Bridge (EMIB) technology, and High Bandwidth Memory (HBM2), all in a single package. The Intel Stratix 10 MX family helps customers efficiently meet their most demanding memory bandwidth requirements, which cannot be met using conventional memory solutions. This white paper describes the solution and explains how Intel Stratix 10 MX devices solve the memory bandwidth challenge for key end markets and applications.

The Memory Bandwidth Challenge

Memory bandwidth is a critical bottleneck for next-generation platforms. The critical path in any system's performance is the system's ability to process large amounts of data quickly. Computing elements (such as the FPGA or CPU) must read or write large chunks of data to and from memory efficiently.

A host of end markets and applications (including data center, high-performance computing systems, broadcast (8K), wireline networking, data analytics, and Internet of Things (IoT)) drive memory bandwidth requirements and process an ever-increasing amount of data. Data centers and networking platforms channel this data, and strive to increase efficiency and accelerate workloads simultaneously to keep up with the bandwidth explosion. Total data center traffic is projected to hit 10.4 zettabytes (ZB) in 2019.

Figure 1. Data Center Traffic Projection by 2019 (total data center traffic grows from 3.4 ZB in 2014 to a projected 10.4 ZB in 2019; one zettabyte equals a sextillion bytes, or a trillion gigabytes. Source: Cisco VNI Global IP Traffic Forecast, 2014 - 2019)

To meet exploding memory bandwidth requirements, system designers have attempted to use currently available conventional technologies. However, these conventional technologies pose a number of challenges.

Challenge #1: Limited I/O Bandwidth

I/O bandwidth scaling has not kept up with bandwidth requirements. Simply put, it is not physically possible to have enough I/O pins to support a memory bus wide enough to deliver the required bandwidth, and adding more components cannot solve the problem because of the increased power and form factor impact.

The wireline networking market provides a good example of this particular challenge. The aggregate full-duplex bandwidth required for packet storage and retrieval is a function of the traffic load, and system designers must provision enough margin to ensure sustained line-rate performance (100G, 200G, etc.). Figure 2 illustrates the 200G traffic-load inflection point in the wireline networking space: such an application requires over 700 I/O pins and five DDR4 (x72, 3200 Mbps) DIMMs to meet basic data plane memory functionality. As Figure 2 shows, 400G systems will need over 1,100 I/O pins and 8 DDR4 (x72, 3200 Mbps) DIMMs. Going forward, it is not feasible to keep increasing package I/O counts to meet these bandwidth requirements.

Figure 2. I/O Bandwidth Limited (number of required DDR I/O pins versus offered traffic load of 100G, 200G, and 400G)

Challenge #2: Flat Power Budgets

Power budgets are a key challenge area. To meet memory bandwidth requirements, system designers must assemble an increasing number of discrete memories (components, DIMMs, etc.) that connect to computing elements using standard PCB traces. A standard DDR4 x72 bit interface consumes roughly 130 parallel PCB traces, and the large I/O buffers that drive these long traces consume significant energy per bit. An application requiring 256 GBps of memory bandwidth needs an estimated 10 DDR4 3,200 Mbps DIMMs consuming an estimated 40 W of total power.(1) The solution quickly becomes power limited for applications requiring the highest bandwidth levels. At the same time, system-level power budgets have stayed flat or even dropped, so system designers are increasingly challenged to obtain the highest bandwidth per watt at a system level.

(1) Assuming 30% RD/WR, single rank configuration. I/O and controller power included.
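The DIMM, pin, and power figures quoted in Challenges #1 and #2 follow from simple back-of-the-envelope arithmetic. The sketch below reproduces that arithmetic; the per-DIMM bandwidth (about 25.6 GB/s for DDR4-3200 with a 64-bit data path), the roughly 140 package pins per x72 interface, the ~5x line-rate provisioning factor for packet buffering, and the ~4 W per DIMM interface are illustrative assumptions of ours rather than numbers stated by the paper, whose Figure 2 reflects Intel's own, more detailed analysis.

```python
import math

# Back-of-the-envelope model for Challenges #1 and #2.
# Assumed (not from the paper): ~25.6 GB/s peak per DDR4-3200 DIMM (3200 MT/s x 8 B),
# ~140 package pins per x72 interface, ~5x line-rate provisioning for packet
# buffering, and ~4 W per DIMM interface at the footnoted 30% RD/WR profile.

PER_DIMM_GB_S = 25.6
PINS_PER_IFACE = 140
WATTS_PER_DIMM = 4.0

def dimms_for_bandwidth(gbytes_per_s):
    """DDR4-3200 DIMMs needed to provision a given memory bandwidth."""
    return math.ceil(gbytes_per_s / PER_DIMM_GB_S)

def packet_buffer_estimate(line_rate_gbps, provisioning_factor=5.0):
    """DIMMs and package I/O pins for a wireline packet buffer at a given line rate."""
    dimms = dimms_for_bandwidth((line_rate_gbps / 8) * provisioning_factor)
    return dimms, dimms * PINS_PER_IFACE

# Challenge #1: a 200G data plane lands near 5 DIMMs and ~700 package pins.
print(packet_buffer_estimate(200))    # -> (5, 700)

# Challenge #2: 256 GB/s of raw bandwidth maps to ~10 DIMMs and ~40 W.
dimms = dimms_for_bandwidth(256)
print(dimms, dimms * WATTS_PER_DIMM)  # -> 10, 40.0
```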
Challenge #3: Smaller Form Factor

Attempting to meet increasing memory bandwidth requirements using conventional technologies often means assembling more discrete memory devices on the PCB. To ensure proper system-level margins, designers follow specific board layout guidelines for trace lengths, termination resistors, and routing layers. These rules limit how compact the design can be and how closely the devices can be placed. As memory bandwidth requirements grow, it will be increasingly difficult to meet memory bandwidth targets while satisfying form factor constraints using conventional solutions such as DDR. Figure 3 illustrates the form factor impact as the memory bandwidth requirement goes from 80 GBps to 256 GBps.

Figure 3. Form Factor Limitations with Conventional Solutions (PCB area for an 80 GBps system, roughly 4 DDR4 3200 Mbps DIMMs, versus a 256 GBps system, roughly 10 DDR4 3200 Mbps DIMMs; not drawn to scale)

Challenge #4: JEDEC DDR Bandwidth Sustainability

Conventional technologies such as DDR are finding it increasingly difficult to scale to meet future memory bandwidth requirements. DDR technologies have scaled every generation for the past decade, and this scaling is nearing its end. Projecting out beyond DDR4 and assuming a continued 2X rate of increase, one can estimate bandwidth (per DIMM) in the range of 40 GBps (DDR5). More importantly, the memory bandwidth requirements for next-generation applications are expected to far exceed the trends from the past decade. See Figure 4.

Figure 4. DDR (DIMM) Bandwidth Projection (per-DIMM bandwidth for DDR through a projected DDR5, assuming the 2X historical rate of increase continues)
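The Figure 4 curve is essentially doubling arithmetic. The sketch below is our illustration of that reasoning, walking the peak per-DIMM data bandwidth forward by 2X per generation from DDR-400's 3.2 GB/s; the exact landing point for the generation after DDR4 depends on which DDR4 speed grade is doubled, which is why the paper states the DDR5 estimate only loosely as being in the range of 40 GBps.

```python
# Illustrative 2X-per-generation projection of peak per-DIMM bandwidth (GB/s).
# Starting values are the standard x64 peak rates (data rate in MT/s x 8 bytes);
# the final entry is simply double the preceding generation, not a JEDEC figure.
generations = ["DDR-400", "DDR2-800", "DDR3-1600", "DDR4-3200"]
bandwidth_gb_s = 3.2
for name in generations:
    print(f"{name:22s} ~{bandwidth_gb_s:5.1f} GB/s")
    bandwidth_gb_s *= 2
print(f"{'DDR5 (2X projection)':22s} ~{bandwidth_gb_s:5.1f} GB/s")
```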
Projecting out beyond DDR4 and assuming • Medium bandwidth/medium power efficiency—DDR3 and a continued 2X rate of increase, one can estimate bandwidth DDR4 have been the workhorses of the memory landscape (per DIMM) in the range of 40 GBps (DDR5). More importantly, for more than a decade. Wide I/O 2 (WIO2) uses 3D stacking the memory bandwidth requirements for next-generation to stack the memory on top of the computing element, applications are expected to far exceed the trends from the providing higher bandwidth with superior power efficiency. past decade. See Figure 4. 40 DRAM Memory Landscape The memory industry has recognized that conventional 0 technologies cannot meet future bandwidth requirements. As a result, the memory landscape is evolving, with multiple 0 competing solutions attempting to address the challenge. 0 1 As shown in Figure 5, basic requirements span both control 1 X istorial and data plane memories. Control plane or fast path 14 ate o memories (such as SRAM) typically provide the highest Inrease† 1 performance (random transaction rates) and low latency. 1 Data plane memories (such as DRAM) have high capacity and 10 8 bandwidth. andwidth ps 6 Legacy products include standard DDR-based memory 6 4 solutions. 3D-based memory solutions evolved to meet the 2 3 high bandwidth, low power, and small form factor challenges. These solutions stack multiple DRAM layers using through- 00 004 00 01 01-01 silicon via (TSV) technology. Because the memories are DDR DDR2 DDR3 4 DDR5 stacked vertically, 3D memory solutions provide maximum capacity with a smaller form factor. Figure 4. DDR (DIMM) Bandwidth Projection The following 3D memories communicate with the computing element using high-speed serial transceivers or high-density Serial parallel GPIO: I/O Interae • Hybrid Memory Cube (HMC) is a 3D DRAM memories with a ./ serial interface. apable Memory • MoSys bandwidth engine (BE) is a DRAM memory with a Control and ata Parallel ide I/O serial interface. lane Memory Interae AM • High Bandwidth Memory is a 3D DRAM memory with a parallel I/O interface. Legay roduts Figure 6 plots power efficiency as a function of bandwidth for various memory types. Figure 5. Emerging Memory Landscape • Low bandwidth/highest power efficiency—Low-power DDR (LPDDR) provides the highest power efficiency and is a great fit for mobile end markets. LPDDR Low-ower IO ide I/O M ybrid Memory ube 4 LPDDR3 IO igh andwidth Memory Power Eficiency M /4 Low andwidth Medium andwidth ighest andwidth ighest ower fiieny Medium-igh ower fiieny Medium-igh ower fiieny Not Drawn to Scale Bandwidth Figure 6. Comparing Power Efficiency vs. Bandwidth 3 White Paper | Intel Stratix 10 MX Devices Integrating Samsung* HBM2 Solve the Memory Bandwidth Challenge • High bandwidth/medium to high power efficiency—High Bandwdith Memory (such as Samsung HBM2) and HMC are competing new technologies. A Intel Stratix