Big Data, AI, and the Future of Memory

Total Page:16

File Type:pdf, Size:1020Kb

Big Data, AI, and the Future of Memory Big Data, AI, and the Future of Memory Steven Woo Fellow and Distinguished Inventor, Rambus Inc. May 15, 2019 Memory Advancements Through Time 1990’s 2000’s 2010’s 2020’s Synchronous Memory Graphics Memory Low Power Memory Ultra High Bandwidth for PCs for Gaming for Mobile Memory for AI Faster Compute + Big Data Enabling Explosive Growth in AI 1980s Annual Size of the Global Datasphere – 1990s Now 175 ZB More 180 Accuracy Compute Neural Networks 160 140 120 Other Approaches 100 Zettabytes 80 60 40 20 Scale (Data Size, Model Size) 2010 2015 2020 2025 Source: Adapted from Jeff Dean, “Recent Advances in Artificial Intelligence and the Source: Adapted from Data Age 2025, sponsored by Seagate Implications for Computer System Design,” HotChips 29 Keynote, August 2017 with data from IDC Global DataSphere, Nov 2018 Key challenges: Moore’s Law ending, energy efficiency growing in importance ©2019 Rambus Inc. 3 AI Accelerators Need Memory Bandwidth Google TPU v1 1000 TPU Roofline Inference on newer silicon (Google TPU K80 Roofline HSW Roofline v1) built for AI processing largely limited LSTM0 by memory bandwidth LSTM1 10 MLP1 MLP0 v nVidia K80 CNN0 Inference on older, general purpose Intel Haswell CNN1 hardware (Haswell, K80) limited by LSTM0 1 LSTM1 compute and memory bandwidth TeraOps/sec (log scale) (log TeraOps/sec MLP1 MLP0 CNN0 0.1 CNN1 LSTM0 1 10 100 1000 Memory bandwidth is a critical LSTM1 Ops/weight byte (log scale) resource for AI applications = Google TPU v1 = nVidia K80 = Intel Haswell N. Jouppi, et.al., “In-Datacenter Performance Analysis of a Tensor Processing Unit™,” https://arxiv.org/ftp/arxiv/papers/1704/1704.04760.pdf ©2019 Rambus Inc. 4 Common Memory Systems for AI Applications On-Chip Memory HBM GDDR On-chip Memory Highest Bandwidth Very High Bandwidth Good tradeoff between bandwidth, and Power Efficiency and Density power efficiency, cost, and reliability Apple A11 Graphcore nVidia AMD Radeon nVidia GeForce AMD Radeon Bionic Processor IPU Tesla V100 RX Vega 56 RTX 2080Ti RX580 Multiple options for different needs, but more bandwidth is needed ©2019 Rambus Inc. 5 Power Dominated by Data Access and Data Movement 2013: Computation Will Take Less Energy 2014: Computation Energy Extremely Small Than On-Chip Data Movement by 2018 Compared to Data Access and Movement FLOPs will cost less than on- chip data movement (NUMA) Source: Horst Simon, “Why we need Exascale and why we Source: Mark Horowitz, ISSCC “Computing's energy problem won’t get there by 2020,” 2013 (and what we can do about it),” 2014. Data movement is a critical challenge to improving power-efficiency ©2019 Rambus Inc. 6 Thank you.
Recommended publications
  • PATENT PLEDGES Jorge L. Contreras*
    PATENT PLEDGES Jorge L. Contreras* ABSTRACT An increasing number of firms are making public pledges to limit the enforcement of their patents. In doing so, they are entering a little- understood middle ground between the public domain and exclusive property rights. The best-known of these patent pledges are FRAND commitments, in which patent holders commit to license their patents to manufacturers of standardized products on terms that are “fair, reasonable and non-discriminatory.” But patent pledges have been appearing in settings well beyond standard-setting, including open source software, green technology and the life sciences. As a result, this increasingly prevalent private ordering mechanism is beginning to reshape the role and function of patents in the economy. Despite their proliferation, little scholarship has explored the phenomenon of patent pledges beyond FRAND commitments and standard- setting. This article fills this gap by providing the first comprehensive descriptive account of patent pledges across the board. It offers a four-part taxonomy of patent pledges based on the factors that motivate patent holders to make them and the effect they are intended to have on other market actors. Using this classification system, it argues that pledges likely to induce reliance in other market actors should be treated as “actionable” * Associate Professor, S.J. Quinney College of Law, University of Utah and Senior Policy Fellow, American University Washington College of Law. The author thanks Jonas Anderson, Clark Asay, Marc Sandy Block, Mark Bohannon, Matthew Bye, Michael Carrier, Michael Carroll, Colleen Chien, Thomas Cotter, Carter Eltzroth, Carissa Hessick, Meredith Jacob, Jay Kesan, Anne Layne-Farrar, Irina Manta, Sean Pager, Gideon Parchomovsky, Arti Rai, Amelia Rinehart, Cliff Rosky, Daniel Sokol and Duane Valz for their helpful comments, suggestions and discussion of this article and contributions of data to the Patent Pledge Database at American University.
    [Show full text]
  • Access Order and Effective Bandwidth for Streams on a Direct Rambus Memory Sung I
    Access Order and Effective Bandwidth for Streams on a Direct Rambus Memory Sung I. Hong, Sally A. McKee†, Maximo H. Salinas, Robert H. Klenke, James H. Aylor, Wm. A. Wulf Dept. of Electrical and Computer Engineering †Dept. of Computer Science University of Virginia University of Utah Charlottesville, VA 22903 Salt Lake City, Utah 84112 Abstract current DRAM page forces a new page to be accessed. The Processor speeds are increasing rapidly, and memory speeds are overhead time required to do this makes servicing such a request not keeping up. Streaming computations (such as multi-media or significantly slower than one that hits the current page. The order of scientific applications) are among those whose performance is requests affects the performance of all such components. Access most limited by the memory bottleneck. Rambus hopes to bridge the order also affects bus utilization and how well the available processor/memory performance gap with a recently introduced parallelism can be exploited in memories with multiple banks. DRAM that can deliver up to 1.6Gbytes/sec. We analyze the These three observations — the inefficiency of traditional, performance of these interesting new memory devices on the inner dynamic caching for streaming computations; the high advertised loops of streaming computations, both for traditional memory bandwidth of Direct Rambus DRAMs; and the order-sensitive controllers that treat all DRAM transactions as random cacheline performance of modern DRAMs — motivated our investigation of accesses, and for controllers augmented with streaming hardware. a hardware streaming mechanism that dynamically reorders For our benchmarks, we find that accessing unit-stride streams in memory accesses in a Rambus-based memory system.
    [Show full text]
  • Download Attachment
    NON-CONFIDENTIAL 2010-1556 UNITED STATES COURT OF APPEALS FOR THE FEDERAL CIRCUIT ASUSTEK COMPUTER INC., ASUS COMPUTER INTERNATIONAL, INC., BFG TECHNOLOGIES, INC., BIOSTAR MICROTECH (U.S.A.) CORP., BIOSTAR MICROTECH INTERNATIONAL CORP., DIABLOTEK, INC., EVGA CORP., G.B.T., INC., GIGA-BYTE TECHNOLOGY CO., LTD., HEWLETT-PACKARD COMPANY, MSI COMPUTER CORP., MICRO-STAR INTERNATIONAL COMPANY, LTD., GRACOM TECHNOLOGIES LLC (FORMERLY KNOWN AS PALIT MULTIMEDIA, INC.), PALIT MICROSYSTEMS LTD., PINE TECHNOLOGY (MACAO COMMERCIAL OFFSHORE) LTD., AND SPARKLE COMPUTER COMPANY, LTD. Appellants, — v. — INTERNATIONAL TRADE COMMISSION, Appellee, and RAMBUS, INC., Intervenor, and NVIDIA CORPORATION, Intervenor. ______________ 2010-1557 ______________ NVIDIA CORPORATION, Appellant, — v. — INTERNATIONAL TRADE COMMISSION, Appellee, and RAMBUS, INC., Intervenor. ______________ ON APPEAL FROM THE UNITED STATES INTERNATIONAL TRADE COMMISSION IN INVESTIGATION NO. 337-TA-661 ______________ NON-CONFIDENTIAL REPLY BRIEF OF APPELLANTS NVIDIA CORPORATION ET AL. _______________ *Caption Continued on Next Page COMPANION CASES TO: 2010-1483 RAMBUS, INC., Appellant, — v. — INTERNATIONAL TRADE COMMISSION, Appellee, and NVIDIA CORPORATION ET AL., Intervenors. ______________ RUFFIN B. CORDELL I. NEEL CHATTERJEE MARK S. DAVIES ANDREW R. KOPSIDAS RICHARD S. SWOPE RACHEL M. MCKENZIE FISH & RICHARDSON P.C. NITIN GAMBHIR LAUREN J. PARKER 1425 K Street, NW, 11th Floor ORRICK, HERRINGTON ORRICK, HERRINGTON Washington, DC 20005 & SUTCLIFFE LLP & SUTCLIFFE LLP Tel. No. 202-626-6449
    [Show full text]
  • Appendix to Brief of Appellee and Cross-Appellant Rambus Inc
    PUBLIC UNITED STATES OF AMERICA BEFORE FEDERAL TRADE COMMISSION COMMISSIONERS: Deborah Platt Majoras, Chairman Orson Swindle Thomas B. Leary Pamela Jones Harbour Jon Leibowitz ) In the Matter of ) RAMBUS INCORPORATED, ) ) a corporation. ) Docket No. 9302 ) ) ) APPENDIX TO BRIEF OF APPELLEE AND CROSS-APPELLANT RAMBUS INC. Pursuant to the Commission’s October 4, 2004 Order granting Rambus leave to file an appendix, Rambus submits this appendix to its appeal brief containing a glossary of terms. -1- US1DOCS 4782131v1 Glossary of Terms Auto precharge: DRAMs store information as minute quantities of electrical charge in memory cells – no charge is interpreted as “0" and positive charge as a “1.” Sense amplifiers are circuits on the DRAM that sense the charge in a memory cell and amplify it when information is to be read from the DRAM. Before the sense amplifiers can perform this function, they must be “precharged” to an intermediate charged state. “Auto precharge” is a feature that was originally found in RDRAMs and later adopted by SDRAMs and DDR SDRAMs that allows the controller to determine whether the sense amplifiers are to be automatically precharged – that is, precharged without the need for a separate precharge command – at the end of a read or write operation. Bit/Byte: A bit or “binary digit” is the unit of information used by digital computers that takes on only two values – “0" or “1." Each memory cell in a DRAM stores a single bit. A “byte” usually refers to eight bits. Since each bit in a byte can take on two values, a byte can take on 28, or 256, possible values.
    [Show full text]
  • The Intel Random Number Generator
    ® THE INTEL RANDOM NUMBER GENERATOR CRYPTOGRAPHY RESEARCH, INC. WHITE PAPER PREPARED FOR INTEL CORPORATION Benjamin Jun and Paul Kocher April 22, 1999 Information in this white paper is provided without guarantee or warranty of any kind. This review represents the opinions of Cryptography Research and may or may not reflect opinions of Intel Corporation. Characteristics of the Intel RNG may vary with design or process changes. © 1999 by Cryptography Research, Inc. and Intel Corporation. 1. Introduction n = − H K∑ pi log pi , Good cryptography requires good random i=1 numbers. This paper evaluates the hardware- where pi is the probability of state i out of n based Intel Random Number Generator (RNG) possible states and K is an optional constant to for use in cryptographic applications. 1 provide units (e.g., log(2) bit). In the case of a Almost all cryptographic protocols require random number generator that produces a k-bit the generation and use of secret values that must binary result, pi is the probability that an output be unknown to attackers. For example, random will equal i, where 0 ≤ i < 2k . Thus, for a number generators are required to generate -k perfect random number generator, pi = 2 and public/private keypairs for asymmetric (public the entropy of the output is equal to k bits. This key) algorithms including RSA, DSA, and means that all possible outcomes are equally Diffie-Hellman. Keys for symmetric and hybrid (un)likely, and on average the information cryptosystems are also generated randomly. present in the output cannot be represented in a RNGs are also used to create challenges, nonces sequence shorter than k bits.
    [Show full text]
  • Rambus Buys Into CXL Interconnect Ecosystem with Two New Deals June 17 2021
    Market Insight Report Reprint Rambus buys into CXL interconnect ecosystem with two new deals June 17 2021 John Abbott The addition of PLDA and AnalogX will speed up Rambus’ initiative to enable memory expansion and pooling in disaggregated infrastructure through the emerging Compute Express Link interconnect ecosystem. CXL 2.0 is a PCIe-based technology intended to make it easier to connect CPUs with memory and specialist accelerators, and to separate memory from physical servers to improve memory bandwidth, capacity and efficiency. This report, licensed to Rambus, developed and as provided by S&P Global Market Intelligence (S&P), was published as part of S&P’s syndicated market insight subscription service. It shall be owned in its entirety by S&P. This report is solely intended for use by the recipient and may not be reproduced or re-posted, in whole or in part, by the recipient without express permission from S&P. Market Insight Report Reprint Introduction Snapshot Rambus has acquired two companies, PDLA and Acquirer Rambus AnalogX. The silicon memory IP company is looking to boost its portfolio of new products and IP through Targets PLDA, AnalogX contributing to the emerging CXL (Compute Express Subsector Semiconductor IP Link) interconnect ecosystem. CXL 2.0, launched last November, is a PCIe-based technology intended to Deal value Not disclosed make it easier to connect CPUs with memory and Date announced 16-Jun-21 specialist accelerators, and to separate memory Closing date, Q3 2021 from physical servers to improve memory bandwidth, expected capacity and efficiency. Advisers None disclosed Rambus intends to combine its existing serial connection, memory and security skills, and IP with the newly acquired IP and engineers from the two companies to produce CXL buffering chips and memory controllers, powering next-generation PCIe 6.0 and CXL 3.0 devices.
    [Show full text]
  • ASUS P3C-E Rambus™ Motherboard
    R P3C-E Rambus™ Motherboard USER’S MANUAL USER'S NOTICE No part of this manual, including the products and software described in it, may be repro- duced, transmitted, transcribed, stored in a retrieval system, or translated into any language in any form or by any means, except documentation kept by the purchaser for backup purposes, without the express written permission of ASUSTeK COMPUTER INC. (“ASUS”). ASUS PROVIDES THIS MANUAL “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OR CONDITIONS OF MERCHANTABILITY OR FITNESS FOR A PAR- TICULAR PURPOSE. IN NO EVENT SHALL ASUS, ITS DIRECTORS, OFFICERS, EMPLOYEES OR AGENTS BE LIABLE FOR ANY INDIRECT, SPECIAL, INCIDEN- TAL, OR CONSEQUENTIAL DAMAGES (INCLUDING DAMAGES FOR LOSS OF PROFITS, LOSS OF BUSINESS, LOSS OF USE OR DATA, INTERRUPTION OF BUSI- NESS AND THE LIKE), EVEN IF ASUS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES ARISING FROM ANY DEFECT OR ERROR IN THIS MANUAL OR PRODUCT. Product warranty or service will not be extended if: (1) the product is repaired, modified or altered, unless such repair, modification of alteration is authorized in writing by ASUS; or (2) the serial number of the product is defaced or missing. Products and corporate names appearing in this manual may or may not be registered trade- marks or copyrights of their respective companies, and are used only for identification or explanation and to the owners’ benefit, without intent to infringe. • Adobe and Acrobat are registered trademarks of Adobe Systems Incorporated. • Intel, LANDesk, and Pentium are registered trademarks of Intel Corporation.
    [Show full text]
  • Challenges and Solutions for Future Main Memory
    White Paper May 26, 2009 Challenges and Solutions for Future Main Memory DDR3 SDRAM is used in many computing systems today and offers data rates of up to 1600Mbps. To achieve performance levels beyond DDR3, future main memory subsystems must attain faster data rates while maintaining low power, high access efficiency and competitive cost. This whitepaper will outline some of the key challenges facing next generation main memory and the Rambus innovations that can be applied to advance the main memory roadmap. Challenges and Solutions for Future Main Memory 1 Introduction Rambus Innovations Enable: Demand for an enriched end-user • 3200Mbps Data Rates experience and increased performance in • 40% Lower Power next-generation computing applications is • Up to 50% Higher never-ending. Driven by recent multi-core computing, virtualization and processor Throughput integration trends, the computing industry is • 2-4x Higher Capacity in need of a next-generation main memory solution with anticipated data rates of up to In order to address these challenges and 3200 Megabits per second (Mbps) at a meet the increased performance demanded similar or lower power envelope as that of by recent multi-core computing, DDR3-1600 solutions. The divergence of virtualization and chip integration trends, these two requirements – increased Rambus has developed an architectural performance and reduced power – presents concept for future main memory. The a difficult challenge for the design of future architecture builds upon existing Rambus power-efficient memories. innovations and designs, such as FlexPhase™ circuitry, FlexClocking™ and In addition to power-efficiency challenges, Dynamic Point-to-Point (DPP) technologies, future memory solutions face potential and introduces new concepts, which include bottlenecks in access efficiency and Near Ground Signaling and Module capacity, both of which become more Threading.
    [Show full text]
  • Dynamic Random Access Memory Topics
    Dynamic Random Access Memory Topics Simple DRAM Fast Page Mode (FPM) DRAM Extended Data Out (EDO) DRAM Burst EDO (BEDO) DRAM Synchronous DRAM (SDRAM) Rambus DRAM (RDRAM) Double Data Rate (DDR) SDRAM One capacitor and transistor of power, the discharge y Leaks a smallcapacitor amount slowly Simplicit refresh Requires top sk de in ed Us le ti la o v General DRAM Formats • DRAM is produced as integrated circuits (ICs) bonded and mounted into plastic packages with metal pins for connection to control signals and buses • In early use individual DRAM ICs were usually either installed directly to the motherboard or on ISA expansion cards • later they were assembled into multi-chip plug-in modules (DIMMs, SIMMs, etc.) General DRAM formats • Some Standard Module Type: • DRAM chips (Integrated Circuit or IC) • Dual in-line Package (DIP) • DRAM (memory) modules • Single in-line in Package(SIPP) • Single In-line Memory Module (SIMM) • Dual In-line Memory Module (DIMM) • Rambus In-line Memory Module (RIMM) • Small outline DIMM (SO-DIMM) Dual in-line Package (DIP) • is an electronic component package with a rectangular housing and two parallel rows of electrical connecting pins • 14 pins Single in-line in Package (SIPP) • It consisted of a small printed circuit board upon which were mounted a number of memory chips. • It had 30 pins along one edge which mated with matching holes in the motherboard of the computer. Single In-line Memory Module (SIMM) SIMM can be a 30 pin memory module or a 72 pin Dual In-line Memory Module (DIMM) Two types of DIMMs: a 168-pin SDRAM module and a 184-pin DDR SDRAM module.
    [Show full text]
  • Dram Technology
    7 DRAM TECHNOLOGY OVERVIEW DRAM (Dynamic Random Access Memory) is the main memory used for all desktop and larger computers. Each elementary DRAM cell is made up of a single MOS transistor and a storage capacitor (Figure 7-1). Each storage cell contains one bit of information. This charge, however, leaks off the capacitor due to the sub-threshold current of the cell transistor. Therefore, the charge must be refreshed several times each second. Bit Line Word Line Transistor Capacitor Plate Source: ICE, "Memory 1997" 19941 Figure 7-1. DRAM Cell HOW THE DEVICE WORKS The memory cell is written to by placing a “1” or “0” charge into the capacitor cell. This is done during a write cycle by opening the cell transistor (gate to power supply or VCC) and presenting either VCC or 0V (ground) at the capacitor. The word line (gate of the transistor) is then held at ground to isolate the capacitor charge. This capacitor will be accessed for either a new write, a read, or a refresh. Figure 7-2 shows a simplified DRAM diagram. The gates of the memory cells are tied to the rows. The read (or write) of a DRAM is done in two main steps as illustrated in Figure 7-3. The row (X) and column (Y) addresses are presented on the same pads and multiplexed. The first step consists of validating the row addresses and the second step consists of validating the column addresses. INTEGRATED CIRCUIT ENGINEERING CORPORATION 7-1 DRAM Technology Address X Row Decode Pads CAS RAS Y Data Sense Data Amplifier Data Sense Data Amplifier Output Dout Buffer Data Y Decode Sense Data Amplifier Input D in Buffer Data Sense Data Amplifier Source: ICE, "Memory 1997" 22430 Figure 7-2.
    [Show full text]
  • RAMBUS INC. (Exact Name of Registrant As Specified in Its Charter) ______Delaware 94-3112828 (State Or Other Jurisdiction of (I.R.S
    Table of Contents UNITED STATES SECURITIES AND EXCHANGE COMMISSION Washington, D.C. 20549 ________________________________________ Form 10-K ________________________________________ (Mark One) ☑ ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the fiscal year ended December 31, 2020 Or ☐ TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 For the transition period from to Commission file number: 000-22339 ________________________________________ RAMBUS INC. (Exact name of registrant as specified in its charter) ________________________________________ Delaware 94-3112828 (State or other jurisdiction of (I.R.S. Employer Identification No.) incorporation or organization) 4453 North First Street Suite 100 San Jose , California 95134 (Address of principal executive offices) (Zip Code) Registrant’s telephone number, including area code: (408) 462-8000 ________________________________________ Securities registered pursuant to Section 12(b) of the Act: Title of Each Class Trading Symbol(s) Name of Each Exchange on Which Registered Common Stock, $.001 Par Value RMBS The NASDAQ Stock Market LLC (The NASDAQ Global Select Market) Securities registered pursuant to Section 12(g) of the Act: None ________________________________________ Indicate by check mark if the registrant is a well-known seasoned issuer, as defined in Rule 405 of the Securities Act. Yes ☑ No ☐ Indicate by check mark if the registrant is not required to file reports pursuant to Section 13 or Section 15(d) of the Act. Yes ☐ No ☑ Indicate by check mark whether the registrant (1) has filed all reports required to be filed by Section 13 or 15(d) of the Securities Exchange Act of 1934 during the preceding 12 months (or for such shorter period that the registrant was required to file such reports), and (2) has been subject to such filing requirements for the past 90 days.
    [Show full text]
  • Inside the Failed Patent Litigation Strategy of Rambus, Inc. by Jan Wolfe Corporate Counsel | April 9, 2012
    Inside the Failed Patent Litigation Strategy of Rambus, Inc. By Jan Wolfe Corporate Counsel | April 9, 2012 Few Silicon Valley companies inspire a wider range of emotions than the computer chip designer Rambus, Inc. In the view of its leaders—as well as fiercely loyal shareholders who have flown around the country to watch its lawyers in court—the company is a visionary turned victim. Rambus revolutionized the computer industry in the 1990s, they say, when the company debuted state-of-the-art chip technology and licensed it to chip manufacturers. Rambus hasn't been fairly compensated for those innovations, its defenders maintain, because rivals copied the technology and refused to pay up. Confronted with brazen collusion and theft, Rambus had no choice but to seek relief in court by suing its rivals. "We really didn't plan on being a litigation company," says Thomas Lavelle, Rambus's general counsel since 2006. "Our founders had breakthrough innovations and great patents, and they believed that would lead to success. I don't think it occurred to them that—as cynical as the world can be—the better product might not be adopted." But some of Rambus's rivals have a very different take. They've argued that the company delivered a flawed product; that it defrauded the Joint Electron Device Engineering Council (the body tasked with creating industry standards for computer memory); and that it ambushed JEDEC participants with unreasonable demands for royalty payments. When that plan stalled, Rambus allegedly turned to a Plan B: litigation against rivals Hynix Semiconductor Inc.; Infineon Technologies AG; Micron Technology, Inc.; and Samsung Electronics Co., Ltd.
    [Show full text]