Understanding and Improving the Latency of DRAM-Based Memory Systems


Understanding and Improving the Latency of DRAM-Based Memory Systems

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering

Kevin K. Chang
M.S., Electrical & Computer Engineering, Carnegie Mellon University
B.S., Electrical & Computer Engineering, Carnegie Mellon University

arXiv:1712.08304v1 [cs.AR] 22 Dec 2017

Carnegie Mellon University, Pittsburgh, PA
May 2017

Copyright © 2017, Kevin K. Chang

Abstract

Over the past two decades, the storage capacity and access bandwidth of main memory have improved tremendously, by 128x and 20x, respectively. These improvements are mainly due to the continuous technology scaling of DRAM (dynamic random-access memory), which has been used as the physical substrate for main memory. In stark contrast with capacity and bandwidth, DRAM latency has remained almost constant, reducing by only 1.3x in the same time frame. Therefore, long DRAM latency continues to be a critical performance bottleneck in modern systems. Increasing core counts and the emergence of increasingly more data-intensive and latency-critical applications further stress the importance of providing low-latency memory access.

In this dissertation, we identify three main problems that contribute significantly to the long latency of DRAM accesses. To address these problems, we present a series of new techniques. Our new techniques significantly improve both system performance and energy efficiency. We also examine the critical relationship between supply voltage and latency in modern DRAM chips and develop new mechanisms that exploit this voltage-latency trade-off to improve energy efficiency.

First, while bulk data movement is a key operation in many applications and operating systems, contemporary systems perform this movement inefficiently, by transferring data from DRAM to the processor, and then back to DRAM, across a narrow off-chip channel. The use of this narrow channel for bulk data movement results in high latency and high energy consumption. This dissertation introduces a new DRAM design, Low-cost Inter-linked SubArrays (LISA), which provides fast and energy-efficient bulk data movement across subarrays in a DRAM chip. We show that the LISA substrate is very powerful and versatile by demonstrating that it efficiently enables several new architectural mechanisms, including low-latency data copying, reduced DRAM access latency for frequently-accessed data, and reduced preparation latency for subsequent accesses to a DRAM bank.

Second, DRAM needs to be periodically refreshed to prevent data loss due to leakage. Unfortunately, while DRAM is being refreshed, a part of it becomes unavailable to serve memory requests, which degrades system performance. To address this refresh interference problem, we propose two access-refresh parallelization techniques that enable more overlapping of accesses with refreshes inside DRAM, at the cost of very modest changes to the memory controllers and DRAM chips. These two techniques together achieve performance close to an idealized system that does not require refresh.

Third, we find, for the first time, that there is significant latency variation in accessing different cells of a single DRAM chip due to the irregularity in the DRAM manufacturing process. As a result, some DRAM cells are inherently faster to access, while others are inherently slower.
Unfortunately, existing systems do not exploit this variation and use a fixed latency value based on the slowest cell across all DRAM chips. To exploit latency variation within the DRAM chip, we experimentally characterize and understand the behavior of the variation that exists in real commodity DRAM chips. Based on our characterization, we propose Flexible-LatencY DRAM (FLY-DRAM), a mechanism to reduce DRAM latency by categorizing the DRAM cells into fast and slow regions, and accessing the fast regions with a reduced latency, thereby improving system performance significantly. Our extensive experimental characterization and analysis of latency variation in DRAM chips can also enable the development of other new techniques to improve performance or reliability.

Fourth, this dissertation, for the first time, develops an understanding of the latency behavior due to another important factor, supply voltage, which significantly impacts DRAM performance, energy consumption, and reliability. We take an experimental approach to understanding and exploiting the behavior of modern DRAM chips under different supply voltage values. Our detailed characterization of real commodity DRAM chips demonstrates that memory access latency can be reliably reduced by increasing the DRAM array supply voltage. Based on our characterization, we propose Voltron, a new mechanism that improves system energy efficiency by dynamically adjusting the DRAM supply voltage using a new performance model. Our extensive experimental data on the relationship between DRAM supply voltage, latency, and reliability can further enable the development of other new mechanisms that improve latency, energy efficiency, or reliability.

The key conclusion of this dissertation is that augmenting the DRAM architecture with simple and low-cost features, and developing a better understanding of manufactured DRAM chips, together lead to significant memory latency reduction as well as energy efficiency improvement. We hope and believe that the proposed architectural techniques and the detailed experimental data and observations on real commodity DRAM chips presented in this dissertation will enable the development of other new mechanisms to improve the performance, energy efficiency, or reliability of future memory systems.

Acknowledgments

The pursuit of a Ph.D. has been a period of fruitful learning for me, not only in the academic arena, but also on a personal level. I would like to reflect on the many people who have supported and helped me become who I am today.

First and foremost, I would like to thank my advisor, Prof. Onur Mutlu, who has taught me how to think critically, speak clearly, and write thoroughly. Onur generously provided the resources and the open environment that enabled me to carry out my research. I am also very thankful to Onur for giving me the opportunities to collaborate with students and researchers from other institutions. They have broadened my knowledge and improved my research.

I am grateful to the members of my thesis committee, Prof. James Hoe, Prof. Kayvon Fatahalian, Prof. Moinuddin Qureshi, and Prof. Steve Keckler, for serving on my defense. They provided me with valuable comments and helped make the final stretch of my Ph.D. very smooth. I would like to especially thank Prof. James Hoe for introducing me to computer architecture and providing me with numerous pieces of advice throughout my education.
I would like to thank my internship mentors, who provided the guidance to make my work successful for both sides: Gabriel Loh, Mithuna Thottethodi, Yasuko Eckert, Mike O'Connor, Srilatha Manne, Lisa Hsu, Zeshan Chishti, Alaa Alameldeen, Chris Wilkerson, Shih-Lien Lu, and Manu Awasthi. I thank the Intel Corporation and the Semiconductor Research Corporation (SRC) for their generous financial support.

During graduate school, I have met many wonderful fellow graduate students and friends to whom I am grateful. Members of the SAFARI research group have been both great friends and colleagues to me. Rachata Ausavarungnirun was my great cubicle mate who supported and tolerated me for many years. Donghyuk Lee was our DRAM guru who introduced expert DRAM knowledge to many of us. Saugata Ghose was my collaborator and mentor who assisted me in many ways. Hongyi Xin has been a good friend who taught me a great deal about bioinformatics and amused me with his great sense of humor. Yoongu Kim taught me how to conduct research and think about problems from different perspectives. Samira Khan provided me with insightful academic and life advice. Gennady Pekhimenko was always helpful when I was in need. Vivek Seshadri is someone I aspire to be because of his creative and methodical approach to problem solving and thinking. Lavanya Subramanian was always warm and welcoming when I approached her with ideas and problems. Chris Fallin's critical feedback during the early years of my research was extremely helpful. Kevin Hsieh was always willing to listen to my problems in school and life, and provided me with the right advice. Nandita Vijaykumar was a strong-willed person who gave me unwavering advice. I am grateful to other SAFARI members for their companionship: Justin Meza, Jamie, Ben, HanBin Yoon, Yang Li, Minesh Patel, Jeremie Kim, Amirali Boroumand, and Damla Senol. I also thank the graduate interns and visitors who have assisted me in my research: Hasan Hassan, Abhijith Kashyap, and Abdullah Giray Yaglikci.

Graduate school is a long and lonely journey that sometimes hits you the hardest when working at midnight. I feel very grateful for the many friends who were there to help me grind through it. I want to thank Richard Wang, Yvonne Yu, Eugene Wang, and Hongyi Xin for their friendship and for bringing joy during my low times.

I would like to thank my family for their enormous support and the sacrifices that they made. My mother, Lili, has been a pillar of support throughout my life. She is the epitome of love, strength, and sacrifice. I am grateful for her strong belief in education. My sister, Nadine, has always been there, kindly supporting me with unwavering love. I owe all of my success to these two strong women in my life. I would like to thank my late father, Pei-Min, for his love and full support. My childhood was full of joy because of him. I am sorry that he did not live to see me finish my Ph.D. I also thank my stepfather, Stephen, for his encouragement and support. Lastly, I thank my girlfriend, Sherry Huang, for all her love and understanding.

Finally, I am grateful to the Semiconductor Research Corporation and Intel for providing me a fellowship. I would like to acknowledge the support of Google, Intel, NVIDIA, Samsung, VMware, and the United States Department of Energy.
Recommended publications
  • Improving DRAM Performance by Parallelizing Refreshes
    Improving DRAM Performance by Parallelizing Refreshes with Accesses
    Kevin Kai-Wei Chang, Donghyuk Lee, Zeshan Chishti†, Alaa R. Alameldeen†, Chris Wilkerson†, Yoongu Kim, Onur Mutlu
    Carnegie Mellon University; †Intel Labs

    Abstract: Modern DRAM cells are periodically refreshed to prevent data loss due to leakage. Commodity DDR (double data rate) DRAM refreshes cells at the rank level. This degrades performance significantly because it prevents an entire DRAM rank from serving memory requests while being refreshed. DRAM designed for mobile platforms, LPDDR (low power DDR) DRAM, supports an enhanced mode, called per-bank refresh, that refreshes cells at the bank level. This enables a bank to be accessed while another in the same rank is being refreshed, alleviating part of the negative performance impact of refreshes.

    Each DRAM cell must be refreshed periodically every refresh interval, as specified by the DRAM standards [11, 14]. The exact refresh interval time depends on the DRAM type (e.g., DDR or LPDDR) and the operating temperature. While DRAM is being refreshed, it becomes unavailable to serve memory requests. As a result, refresh latency significantly degrades system performance [24, 31, 33, 41] by delaying in-flight memory requests. This problem will become more prevalent as DRAM density increases, leading to more DRAM rows to be refreshed within the same refresh interval. DRAM chip density is expected to increase from 8Gb to 32Gb by 2020 as
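    To make the refresh overhead concrete, here is a small back-of-the-envelope sketch (not taken from the paper; the timing values are typical DDR4-style datasheet figures used only for illustration): a refresh command must be issued every tREFI, and the rank is busy for tRFC per command, so the fraction of time a rank is unavailable is roughly tRFC / tREFI, and it grows with chip density.

        # Rough estimate of how much time a DRAM rank spends refreshing.
        # Timing values are illustrative DDR4-style figures, not taken from the paper.
        tREFI_ns = 7800          # refresh command interval: 64 ms retention / 8192 commands
        tRFC_ns = {"4Gb": 260,   # time the rank is blocked per refresh command,
                   "8Gb": 350,   # for a few representative chip densities
                   "16Gb": 550}

        for density, trfc in tRFC_ns.items():
            busy = trfc / tREFI_ns
            print(f"{density}: rank unavailable ~{busy:.1%} of the time")
        # Higher-density chips have more rows to refresh per command (larger tRFC),
        # so the refresh penalty grows even though tREFI stays the same.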
  • NON-CONFIDENTIAL for Publication COMP
    EUROPEAN COMMISSION, Brussels, 15.1.2010, SG-Greffe(2010) D/275, C(2010) 150
    Subject: Case COMP/C-3/38 636 Rambus (Please quote this reference in all correspondence) […]*

    1. I refer to Hynix' complaint to the Commission of 18 December 2002, lodged jointly with Infineon pursuant to Article 3 of Council Regulation No. 17/62 (1), against Rambus Inc. ("Rambus"), an undertaking incorporated in 1990 in California and reincorporated in Delaware, USA, in 1997, with its principal place of business in Los Altos, California, regarding alleged violations of Article 101 and Article 102 of the Treaty on the Functioning of the European Union ("TFEU") (2) in connection with computer memory chips which are known as synchronous DRAM chips ("the Complaint"). I also refer to the letters listed here below, by which Hynix provided additional information/explanations on the above matter, as well as the Commission's letter of 13 October 2009 ["Article 7 letter"] addressed to Hynix in that matter and the response to the Article 7 letter of 12 November 2009. […]

    2. For the reasons set out below, the Commission considers that there is no sufficient degree of Community interest for conducting a further investigation into the alleged infringement and rejects your complaint pursuant to Article 7(2) of the Commission Regulation (EC) 773/2004 (3).

    * This version of the Commission Decision of 15.1.2010 does not contain any business secrets or other confidential information.
    (1) Regulation No. 17 of the Council of 6 February 1962, First Regulation implementing Articles 85 and 86 of the Treaty (OJ No. 13, 21.2.1962, p.
  • Refresh Now and Then Seungjae Baek, Member, IEEE, Sangyeun Cho, Senior Member, IEEE, and Rami Melhem, Fellow, IEEE
    IEEE TRANSACTIONS ON COMPUTERS: Refresh Now and Then
    Seungjae Baek, Member, IEEE, Sangyeun Cho, Senior Member, IEEE, and Rami Melhem, Fellow, IEEE

    Abstract—DRAM stores information in electric charge. Because DRAM cells lose stored charge over time due to leakage, they have to be "refreshed" in a periodic manner to retain the stored information. This refresh activity is a source of increased energy consumption as the DRAM density grows. It also incurs non-trivial performance loss due to the unavailability of memory arrays during refresh. This paper first presents a comprehensive measurement-based characterization study of the cell-level data retention behavior of modern low-power DRAM chips. 99.7% of the cells could retain the stored information for longer than 1 second at a high temperature. This average cell retention behavior strongly indicates that we can deeply reduce the energy and performance penalty of DRAM refreshing with proper system support. The second part of this paper, accordingly, develops two practical techniques to reduce the frequency of DRAM refresh operations by excluding a few leaky memory cells from use and by skipping refreshing of unused DRAM regions. We have implemented the proposed techniques completely in the Linux OS for experimentation, and measured a performance improvement of up to 17.2% with a refresh operation reduction of 93.8% on smartphone-like low-power platforms.

    Index Terms—SDRAM, refresh operation, power consumption, performance improvement.

    1 INTRODUCTION
    DRAM is commonly used in a computer system's main memory. […] Refresh period, the time interval within which we must walk through all rows once, is typically 32 ms or
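    As a rough, hypothetical illustration of how the paper's two techniques can compound (the fractions below are assumptions for illustration, not measurements or the paper's methodology): if only part of DRAM holds live data and excluding a few leaky cells lets the refresh period be safely lengthened, the number of refresh operations falls multiplicatively.

        # Hypothetical illustration of combining the two refresh-reduction techniques.
        # The fractions below are assumptions, not the paper's data.
        used_fraction = 0.25        # assume only 25% of DRAM regions hold live data
        period_scaling = 4          # assume the refresh period can be extended 4x
                                    # (e.g., 32 ms -> 128 ms) once leaky cells are excluded

        refresh_ops_remaining = used_fraction * (1 / period_scaling)
        reduction = 1 - refresh_ops_remaining
        print(f"Refresh operations reduced by ~{reduction:.1%}")
        # ~94% under these assumed numbers, the same order of magnitude as the
        # 93.8% reduction the paper reports on its platforms.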
  • Zynq-7000 All Programmable Soc and 7 Series Devices Memory Interface Solutions User Guide (UG586) [Ref 2]
    Zynq-7000 AP SoC and 7 Series Devices Memory Interface Solutions (v4.1)
    DS176, April 4, 2018. Advance Product Specification.

    Introduction
    The Xilinx® Zynq®-7000 All Programmable SoC and 7 series FPGAs memory interface solutions cores provide high-performance connections to DDR3 and DDR2 SDRAMs, QDR II+ SRAM, RLDRAM II/RLDRAM 3, and LPDDR2 SDRAM.

    DDR3 and DDR2 SDRAMs
    This section discusses the features, applications, and functional description of Xilinx 7 series FPGAs memory interface solutions in DDR3 and DDR2 SDRAMs. These solutions are available with an optional AXI4 slave interface.

    DDR3 SDRAM Features
    • Component support for interface widths up to 72 bits
    • Single and dual rank UDIMM, RDIMM, and SODIMM support
    • DDR3 (1.5V) and DDR3L (1.35V)
    • 1, 2, 4, and 8 Gb density device support
    • 8-bank support
    • x8 and x16 device support
    • I/O Power Reduction option reduces average I/O power by automatically disabling DQ/DQS IBUFs and internal terminations during writes and periods of inactivity
    • Internal VREF support
    • Multicontroller support for up to eight controllers
    • Two controller request processing modes:
      o Normal: reorder requests to optimize system throughput and latency
      o Strict: memory requests are processed in the order received

    LogiCORE™ IP Facts Table (Core Specifics)
    Supported Device Family: Zynq®-7000 All Programmable SoC (1), 7 series (2) FPGAs
    Memory support: DDR3 Component and DIMM, DDR2 Component and DIMM, QDR II+, RLDRAM II, RLDRAM 3, and LPDDR2 SDRAM components
    Resources: See Table 1.
    Provided with Core: Documentation, Product Specification
  • Memory & Storage Challenges and Solutions
    Memory & Storage Challenges and Solutions
    GSA 2019
    Jinman Han, Senior Vice President, Memory Product Planning & Application Engineering

    Legal Disclaimer
    This presentation is intended to provide information concerning the SSD and memory industry. We do our best to make sure that the information presented is accurate and fully up-to-date. However, the presentation may be subject to technical inaccuracies, information that is not up-to-date, or typographical errors. As a consequence, Samsung does not in any way guarantee the accuracy or completeness of the information provided in this presentation. The information in this presentation or accompanying oral statements may include forward-looking statements. These forward-looking statements include all matters that are not historical facts, statements regarding Samsung Electronics' intentions, beliefs, or current expectations concerning, among other things, market prospects, growth, strategies, and the industry in which Samsung operates. By their nature, forward-looking statements involve risks and uncertainties, because they relate to events and depend on circumstances that may or may not occur in the future. Samsung cautions you that forward-looking statements are not guarantees of future performance and that the actual developments of Samsung, the market, or the industry in which Samsung operates may differ materially from those made or suggested by the forward-looking statements contained in this presentation or in the accompanying oral statements. In addition, even if the information contained herein or in the oral statements is shown to be accurate, those developments may not be indicative of developments in future periods.

    Abstract
    Memory-centric system innovation is the overarching theme of modern semiconductor technology and is one of the crucial driving forces of the future IT world.
  • Micron Technology Inc
    MICRON TECHNOLOGY INC, FORM 10-K (Annual Report)
    Filed 10/26/10 for the Period Ending 09/02/10
    Address: 8000 S Federal Way, PO Box 6, Boise, ID 83716-9632
    Telephone: (208) 368-4000
    CIK: 0000723125
    Symbol: MU
    SIC Code: 3674 (Semiconductors and Related Devices)
    Industry: Semiconductors
    Sector: Technology
    Fiscal Year: 03/10
    http://www.edgar-online.com
    © Copyright 2010, EDGAR Online, Inc. All Rights Reserved. Distribution and use of this document restricted under EDGAR Online, Inc. Terms of Use.

    UNITED STATES SECURITIES AND EXCHANGE COMMISSION, Washington, D.C. 20549
    FORM 10-K
    (Mark One) ANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934, for the fiscal year ended September 2, 2010, OR TRANSITION REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934, for the transition period from to .
    Commission file number 1-10658
    Micron Technology, Inc. (Exact name of registrant as specified in its charter)
    Delaware (State or other jurisdiction of incorporation or organization); 75-1618004 (IRS Employer Identification No.)
    8000 S. Federal Way, Boise, Idaho 83716-9632 (Address of principal executive offices) (Zip Code)
    Registrant's telephone number, including area code: (208) 368-4000
    Securities registered pursuant to Section 12(b) of the Act: Common Stock, par value $.10 per share; NASDAQ Global Select Market
    Securities registered pursuant to Section 12(g) of the Act: None (Title of Class)
    Indicate by check mark if the registrant is a well-known seasoned issuer, as defined in Rule 405 of the Securities Act.
  • Low Power DRAM Evolution Osamu Nagashima Executive Professional Micron Memory Japan
    Low Power DRAM Evolution
    Osamu Nagashima, Executive Professional, Micron Memory Japan
    JEDEC Mobile & IOT Forum. Copyright © 2016 Micron Technology, Inc.

    How We Got Here
    • Low Power DRAM evolved from a lower-voltage, lower-performance version of PC-DRAM designed for mobile packages to become one of the highest bandwidth-per-pin DRAMs available
    • High resolution displays, high-resolution cameras, and 3D rendered content are the primary drivers for increased bandwidth in mobile devices
    [Chart: Mainstream DRAM data rate by type and year of introduction, LPDDR vs. PC-DDR, 2006 to 2017]
    [Chart: Evolution of mainstream DRAM energy (pJ/bit) vs. data rate (Mbps) for DDR2, DDR3, DDR4, LPDDR2, LPDDR3, LPDDR4]

    Typical Mobile Device Usage
    • The percentage of active usage has greatly increased in recent years, driving an increase in memory bandwidth
    • This has shifted limitations from standby battery life to active battery life and thermal limits
    [Chart: energy breakdown for heavy vs. light users across read, write, activate, standby, refresh, and self-refresh energy]

    Near-Term Future
    • This evolution of system limitations is driving future LPDRAM architectures, beginning with the evolution of the LPDDR4 standard
    • Responding to the need for lower power, JEDEC is developing a reduced-I/O-power version of LPDDR4, called LPDDR4X
    • LPDDR4X will reduce the Vddq level from 1.1V to 0.6V
    • Signaling swing will remain similar to LPDDR4, which allows the same receiver designs and specifications to be used for both LPDDR4 and LPDDR4X

    LPDDR4X: I/O Energy Reduction
    • Reducing Vddq from 1.1V to 0.6V produces about 40% I/O energy savings, LPDDR4 vs. LPDDR4X (see the sketch below)
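    A rough sanity check on that last figure (my own back-of-the-envelope estimate, not Micron's math): with the signaling swing held roughly constant, per-bit driver energy scales approximately with the supply the output stage draws charge from, so the saving is about 1 - Vddq_new/Vddq_old.

        # Back-of-the-envelope check of the LPDDR4X I/O energy claim (my estimate,
        # not from the presentation): with signaling swing held similar, per-bit
        # driver energy scales roughly with Vddq, so the saving is ~1 - new/old.
        vddq_lpddr4 = 1.1    # volts
        vddq_lpddr4x = 0.6   # volts
        saving = 1 - vddq_lpddr4x / vddq_lpddr4
        print(f"Estimated I/O energy saving: ~{saving:.0%}")
        # ~45%, in the same ballpark as the ~40% quoted in the slides.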
  • SDRAM Memory Systems: Architecture Overview and Design Verification SDRAM Memory Systems: Architecture Overview and Design Verification Primer
    Primer: SDRAM Memory Systems: Architecture Overview and Design Verification

    Table of Contents: Introduction (DRAM Trends); DRAM; SDRAM (DDR SDRAM, DDR2 SDRAM, DDR3 SDRAM, DDR4 SDRAM, GDDR and LPDDR); DIMMs (DIMM Physical Size, DIMM Data Width, DIMM Rank, DIMM Memory Size & Speed, DIMM Architecture, Serial Presence Detect); Memory System Design (Design Simulation, Design Verification, Verification Strategy, SDRAM Verification); Glossary.

    Introduction
    DRAM (Dynamic Random Access Memory) is attractive to designers because it provides a broad range of performance and is used in a wide variety of memory system designs for computers and embedded systems. This DRAM memory primer provides an overview of DRAM concepts, presents potential future DRAM developments, and offers an overview of memory design improvement through verification.

    DRAM Trends
    There is a continual demand for computer memories to be larger, faster, lower powered, and physically smaller. Memory needs to be compatible with a wide variety of memory controller hubs used by the computer manufacturers. Memory needs to work when a mixture of different manufacturers' memories is used in the same memory system of the computer. Open memory standards are useful in helping to ensure memory compatibility. On the other hand, embedded systems typically use a fixed memory configuration, meaning the user does not modify the memory system after purchasing the product.
  • Memory Expert Knowledge
    Memory Expert Knowledge

    Memory Experts
    The friendly team of Memory Experts at Mr Memory have on average 9 years' experience. They are fully knowledgeable on all things memory. Have a read through just some of that knowledge...

    What is DRAM?
    DRAM stands for Dynamic Random Access Memory. This is the memory that you use in Desktops, Laptops, Servers and other devices. A computer stores quickly accessible data in the form of 0's and 1's. It is dynamic because it refreshes its capacitor charge periodically with new electricity, compared to the older SRAM (Static Random Access Memory). Data is stored in the capacitors within an integrated circuit (also known as a Chip). DRAM memory is a volatile memory, which means it loses its data very quickly once the power is removed. This is different from NAND Flash Memory used in...

    As the density increased throughout the 1970's and 1980's, Japanese DRAM manufacturers, who could sell higher quality modules at lower prices, overtook IBM. Traditionally, DRAM modules have an asynchronous interface, which means that they respond as quickly as possible to changes in control inputs.

    SDRAM History
    Synchronous Dynamic Random Access Memory (SDRAM) is memory that is synchronised with the System Bus (which connects all the major components of a computer system). Unlike DRAM, SDRAM waits for a clock signal before responding to control inputs. SDRAM is organised into a grid structure (Banks) with rows (Word lines) and columns (Bit lines). Data is stored in blocks and is defined by the coordinates of the row and column of the specific information. SDRAM was universally accepted in 1993, with the invention of the Samsung KM48SL2000.
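    The bank/row/column grid described above is easy to see in how a memory controller splits a physical address into coordinates. Below is a minimal, hypothetical sketch; the field widths are made up for illustration and vary between real devices.

        # Minimal, hypothetical address split for an SDRAM-style bank/row/column grid.
        # The field widths below are made up for illustration; real devices differ.
        COL_BITS, BANK_BITS, ROW_BITS = 10, 3, 15

        def split_address(addr: int):
            col = addr & ((1 << COL_BITS) - 1)
            addr >>= COL_BITS
            bank = addr & ((1 << BANK_BITS) - 1)
            addr >>= BANK_BITS
            row = addr & ((1 << ROW_BITS) - 1)
            return bank, row, col

        bank, row, col = split_address(0x1234567)
        print(f"bank={bank}, row={row}, col={col}")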
  • LPDDR4 Is Expected to Crossover LPDDR3 in 2018
    S32 DDR Tools Suite
    Radu Ivan, Jacques Landry
    October 2019 | Session #AMF-AUT-T3843
    NXP, the NXP logo, and NXP secure connections for a smarter world are trademarks of NXP B.V. All other product or service names are the property of their respective owners. © 2019 NXP B.V.

    Agenda
    • Industry Trends
    • Basic DDR SDRAM Structure
    • Next Generation of DDR Subsystems
    • S32 DDR Tool

    Industry Trends
    • DDR4 DRAM pricing is lower than or the same as DDR3/3L. The pricing crossover occurred around Q4 of 2015. Similarly, LPDDR4 is expected to cross over LPDDR3 in 2018.
    • Production DDR4 DRAM, DIMMs and LPDDR4 are available from most DRAM vendors.
    • The first NXP device with DDR4 support, the T104x product, taped out in Q4 2013. The LS1043A also supports DDR4. Nearly 4 years of product experience with DDR4.
      - Many current and all future QorIQ products including T1, LS1, and LS2 products will support DDR4.
    • The first NXP device with LPDDR4 support is the i.MX8 Family.

    DDR3, DDR4 and LPDDR: Major Vendors
    Supported by all major memory vendors.

    DRAM Migration Roadmap
              2016   2017   2018   2019
    DDR       1%     1%     1%     1%
    DDR2      2%     1%     1%     1%
    DDR3      32%    20%    14%    8%
    DDR4      65%    78%    84%    91%

    DDR3/DDR3L/DDR4 Power Saving
    • DDR3 DRAM provides 20% power savings over DDR2
    • DDR3L DRAM provides 10% power savings over DDR3
    • DDR4 DRAM provides 37% power savings over DDR3L

    LPDDR2/LPDDR3/LPDDR4 Power Saving
    Reductions in operating voltage: 1.8V (LPSDR, LPDDR); 1.2V (LPDDR2, LPDDR3); 1.1V, 0.6V (LPDDR4/LPDDR4X)
    • LPDDR2 DRAM provides 36% power savings over LPDDR1
    • LPDDR3 DRAM provides 10% power savings over LPDDR2
    • LPDDR4 DRAM provides 37% power savings over LPDDR3
    (The per-generation savings compound; see the sketch below.)

    DDR SDRAM Highlights and Comparison
    Feature/Category   DDR3       DDR4       LPDDR4
    Package            BGA only   BGA only   BGA
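    To illustrate how those generation-over-generation figures compound, here is simple arithmetic on the percentages quoted above (my own calculation, not a number from the slides):

        # Compound the per-generation power-saving figures quoted in the slides.
        # This is plain arithmetic on those percentages, not an NXP-provided result.
        def compound(savings):
            remaining = 1.0
            for s in savings:
                remaining *= (1 - s)
            return 1 - remaining

        ddr = compound([0.20, 0.10, 0.37])   # DDR2 -> DDR3 -> DDR3L -> DDR4
        lp  = compound([0.36, 0.10, 0.37])   # LPDDR1 -> LPDDR2 -> LPDDR3 -> LPDDR4
        print(f"DDR4 vs. DDR2:     ~{ddr:.0%} lower power")   # ~55%
        print(f"LPDDR4 vs. LPDDR1: ~{lp:.0%} lower power")    # ~64%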
  • Understanding and Exploiting Design-Induced Latency Variation in Modern DRAM Chips
    Understanding and Exploiting Design-Induced Latency Variation in Modern DRAM Chips
    Donghyuk Lee (Carnegie Mellon University, NVIDIA), Samira Khan (University of Virginia), Lavanya Subramanian (Carnegie Mellon University), Saugata Ghose (Carnegie Mellon University), Rachata Ausavarungnirun (Carnegie Mellon University), Gennady Pekhimenko (Carnegie Mellon University, Microsoft Research), Vivek Seshadri (Carnegie Mellon University, Microsoft Research), Onur Mutlu (Carnegie Mellon University, ETH Zürich)

    ABSTRACT
    Variation has been shown to exist across the cells within a modern DRAM chip. Prior work has studied and exploited several forms of variation, such as manufacturing-process- or temperature-induced variation. We empirically demonstrate a new form of variation that exists within a real DRAM chip, induced by the design and placement of different components in the DRAM chip: different regions in DRAM, based on their relative distances from the peripheral structures, require different minimum access latencies for reliable operation. In particular, we show that in most real DRAM chips, cells closer to the peripheral structures can be accessed much faster than cells that are farther. We call this phenomenon design-induced variation in DRAM. […] 40.0%/60.5%, which translates to an overall system performance improvement of 14.7%/13.7%/13.8% (in 2-/4-/8-core systems) across a variety of workloads, while ensuring reliable operation.

    1 INTRODUCTION
    In modern systems, DRAM-based main memory is significantly slower than the processor. Consequently, processors spend a long time waiting to access data from main memory [5, 66], making the long main memory access latency one of the most critical bottlenecks in achieving high performance [48, 64, 67]. Unfortunately, the latency of DRAM has remained almost constant in the past
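    One natural way to act on this observation is for the memory controller to keep a per-region timing table and issue accesses to rows near the peripheral structures with tighter timings. The sketch below is purely illustrative; the region map, timing values, and function names are assumptions, not the paper's mechanism or measured numbers.

        # Illustrative (not the paper's) per-region DRAM timing selection:
        # rows closer to the sense amplifiers / wordline drivers get tighter timings.
        DEFAULT_TIMINGS = {"tRCD": 13.75, "tRAS": 35.0}   # ns, nominal DDR3-like values
        FAST_TIMINGS    = {"tRCD": 10.0,  "tRAS": 27.5}   # ns, hypothetical reduced values

        def timings_for_row(row: int, rows_per_subarray: int = 512):
            """Pick timings based on a row's assumed position within its subarray."""
            offset = row % rows_per_subarray
            # Assume rows near either edge of a subarray sit close to the sense amps.
            near_edge = offset < 64 or offset >= rows_per_subarray - 64
            return FAST_TIMINGS if near_edge else DEFAULT_TIMINGS

        print(timings_for_row(5))     # near an edge -> hypothetical fast timings
        print(timings_for_row(300))   # middle of the subarray -> default timings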
  • Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access ∗
    Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access ∗
    Niladrish Chatterjee‡, Manjunath Shevgoor‡, Rajeev Balasubramonian‡, Al Davis‡, Zhen Fang§†, Ramesh Illikkal∞, Ravi Iyer∞
    ‡University of Utah, §Nvidia Corporation, ∞Intel Labs
    {nil,shevgoor,rajeev,ald}@cs.utah.edu, {ramesh.g.illikkal,ravishankar.iyer}@intel.com

    Abstract
    The DRAM main memory system in modern servers is largely homogeneous. In recent years, DRAM manufacturers have produced chips with vastly differing latency and energy characteristics. This provides the opportunity to build a heterogeneous main memory system where different parts of the address space can yield different latencies and energy per access. The limited prior work in this area has explored smart placement of pages with high activities. In this paper, we propose a novel alternative to exploit DRAM heterogeneity. We observe that the critical word in a cache line can be easily recognized beforehand and placed in a low-latency region of the […] of DRAM chips to build and exploit a heterogeneous memory system. This paper takes an important step in uncovering the potential of such a heterogeneous DRAM memory system.

    The DRAM industry already produces chips with varying properties. Micron offers a Reduced Latency DRAM (RLDRAM) product that offers lower latency and lower capacity, and is targeted at high performance routers, switches, and network processing [7]. Micron also offers a Low Power DRAM (LPDRAM) product that offers lower energy and longer latencies and that has typically been employed in the mobile market segment [6]. Our work explores innovations that can exploit a main memory that includes regular DDR chips as well as RLDRAM and LPDRAM chips.
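    As a purely illustrative sketch of the core idea (the mapping scheme and names below are my assumptions, not the paper's design): the word of each cache line that is predicted to be critical can be steered to a low-latency region, with the remaining words stored in regular DDR, so the critical word returns first.

        # Purely illustrative sketch (not the paper's design): place the predicted
        # critical word of each cache line in a low-latency region (e.g., RLDRAM-like
        # chips) and the remaining words in regular DDR.
        WORDS_PER_LINE = 8   # a 64-byte cache line holds eight 8-byte words

        def place_line(line_addr: int, critical_word: int):
            """Return where each word of one cache line is stored (hypothetical mapping)."""
            placement = {}
            for w in range(WORDS_PER_LINE):
                region = "fast" if w == critical_word else "ddr"
                placement[w] = (region, line_addr * WORDS_PER_LINE + w)
            return placement

        layout = place_line(line_addr=42, critical_word=3)
        print(layout[3])  # ('fast', 339): the critical word is served from the fast region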