Vincent Coffey, Mike Albert


Overview
● Origins of the performance gap
● The separation of CPU and memory performance
● The reason for CPU performance increases
● The reason memory lags behind
● Measures already taken to close the gap (cache)
● How to counteract the problem
● Industry

Origins of the Gap
● Memory: focus on cost/size
  ○ Caused by the early lack of storage space
  ○ Visible in early game development, where space was at a premium
● CPU: focus on performance
  ○ Limitations on single-core performance
  ○ Multi-core architecture

Limiting Factors
Bandwidth
● Buses limit the amount of data that can be transmitted
Latency/Distance (proximity)
● Space on the chip is at a premium
  ○ Faster memory takes more space (SRAM vs DRAM)
● Need for more memory storage

Attempts to Address the Gap: Reduction
HBM (High Bandwidth Memory)
● Trades higher latency for higher bandwidth and lower power consumption
● Developed by AMD, adopted on high-end video cards
HMC (Hybrid Memory Cube)
● Similar to HBM, but slower to evolve and not adopted in the mainstream

Attempts to Address the Gap: Hiding
Faster Cache
● Z-RAM: Zero Capacitor RAM
● T-RAM: Thyristor RAM
Faster Non-volatile/Permanent Memory
● 3D XPoint
● NVRAM
  ○ MRAM: Magnetoresistive RAM
  ○ FeRAM: Ferroelectric RAM

Zero Capacitor RAM (Z-RAM)
● Developed by Innovative Silicon
● Smaller cell size
  ○ 5x cell density
● Uses the floating-body effect of SOI (silicon on insulator)
  ○ Well suited to chips produced on SOI
  ○ 1.8 ns cell read time

Thyristor RAM (T-RAM)
Uses a thyristor to replace the 6-transistor SRAM cell
● Developed by T-RAM Semiconductor
● Read speed < 1.7 ns, write speed < 2.0 ns
  ○ At 170 nm
● 4x cell density

3D XPoint Memory
Fills the gap between NAND permanent storage and main memory (RAM)
● 3D XPoint
  ○ 95k IOPS
  ○ 9 us latency
● Flash
  ○ 13.4k IOPS
  ○ 73 us latency

MRAM and FeRAM
● MRAM
  ○ In development since the 1990s
  ○ Suffers from low density
  ○ Access times similar to SRAM
  ○ Density similar to DRAM
● FeRAM
  ○ In production
  ○ Integrated into the TI MSP430 chip
    ■ Replaces EEPROM and Flash
    ■ 126 uA/MHz power consumption, vs 200 uA/MHz with Flash/EEPROM

If the Gap Widens...
● Is it worth improving CPU performance?
● Bottlenecked CPU performance
● Need for highly parallelizable programs
● Smaller single-core performance gains

Summary
● The different needs of CPU and memory
● CPU performance bottlenecking

Questions?
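The Limiting Factors and HBM slides hinge on the fact that peak bandwidth is bus width times per-pin data rate. A minimal Python sketch of that arithmetic follows. The per-pin rates (1 Gbit/s for a first-generation HBM stack with a 1024-bit interface, 7 Gbit/s for a 32-bit GDDR5 chip) are commonly quoted figures assumed here for illustration, and the result is a theoretical ceiling that ignores refresh and protocol overhead.

```python
def peak_bandwidth_gbytes(bus_width_bits: int, gbits_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    Every data pin moves `gbits_per_pin` Gbit/s and 8 bits make a byte.
    Refresh, bus turnaround and command overhead are ignored.
    """
    return bus_width_bits * gbits_per_pin / 8


# Assumed, commonly quoted first-generation figures (approximate):
hbm1_stack = peak_bandwidth_gbytes(1024, 1.0)  # one HBM stack: wide and slow
gddr5_chip = peak_bandwidth_gbytes(32, 7.0)    # one GDDR5 chip: narrow and fast

print(f"HBM stack:  {hbm1_stack:.0f} GB/s")
print(f"GDDR5 chip: {gddr5_chip:.0f} GB/s")
```

Under these assumptions the wide but slowly clocked HBM interface moves more than four times the data of a narrow but fast GDDR5 chip, which is how HBM can keep its per-pin rate low.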
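The Hiding slide relies on caches to mask main-memory latency. The usual way to quantify that is the average memory access time, AMAT = hit time + miss rate × miss penalty. The sketch below uses hypothetical latencies and miss rates, and it assumes (this is not a measured result) that the higher cell density claimed for Z-RAM or T-RAM would be spent on a larger cache with a lower miss rate.

```python
def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time for a single cache level:
    AMAT = hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical numbers, chosen only for illustration.
DRAM_PENALTY_NS = 80.0
baseline = amat(hit_time_ns=2.0, miss_rate=0.10, miss_penalty_ns=DRAM_PENALTY_NS)

# Assumption: a denser cell allows a larger cache in the same area,
# which halves the miss rate while keeping the hit time unchanged.
denser = amat(hit_time_ns=2.0, miss_rate=0.05, miss_penalty_ns=DRAM_PENALTY_NS)

print(f"baseline: {baseline:.1f} ns, larger cache: {denser:.1f} ns")
```

Because the miss penalty dominates, lowering the miss rate does more here than shaving the hit time would, which is the sense in which a cache "hides" the gap rather than closing it.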
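The 3D XPoint slide's IOPS and latency figures can be cross-checked with Little's law: requests in flight = throughput × latency, so at a queue depth of 1, IOPS ≈ 1 / latency. The sketch assumes the slide's numbers were measured at queue depth 1, which the slide does not state.

```python
def iops_from_latency(latency_us: float, queue_depth: int = 1) -> float:
    """Little's law: outstanding requests = throughput * latency,
    so throughput (IOPS) = queue_depth / latency in seconds."""
    return queue_depth / (latency_us * 1e-6)


# Latencies and quoted IOPS taken from the slide; queue depth 1 is assumed.
for name, latency_us, quoted_iops in [("3D XPoint", 9.0, 95_000),
                                      ("Flash", 73.0, 13_400)]:
    est = iops_from_latency(latency_us)
    print(f"{name}: {est:,.0f} IOPS estimated at QD1 vs {quoted_iops:,} quoted")
```

The Flash entry only fits its 13.4k IOPS when its latency is read in microseconds; a 73 ms latency would cap a single outstanding request at roughly 14 IOPS. The 3D XPoint estimate (about 111k) is in the same range as the quoted 95k but not identical, so treat this as a sanity check rather than a reconstruction of the original measurement.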
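The MSP430 figures on the MRAM and FeRAM slide are quoted per MHz, so the actual supply current depends on the clock. A small sketch converting them, at a hypothetical 8 MHz clock chosen only for illustration:

```python
def active_current_ma(ua_per_mhz: float, clock_mhz: float) -> float:
    """Active-mode supply current in mA for a part rated in uA/MHz."""
    return ua_per_mhz * clock_mhz / 1000.0


CLOCK_MHZ = 8.0  # hypothetical clock frequency

fram = active_current_ma(126.0, CLOCK_MHZ)   # FRAM-based figure from the slide
flash = active_current_ma(200.0, CLOCK_MHZ)  # Flash/EEPROM figure from the slide
saving = 1.0 - fram / flash                  # fraction of current saved

print(f"FRAM: {fram:.3f} mA, Flash/EEPROM: {flash:.3f} mA, saving: {saving:.0%}")
```

Because both figures scale linearly with the clock in this simple model, the roughly 37% saving is the same at any frequency; real parts also draw static current, which this ignores.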
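The "Bottlenecked CPU performance" point can be made concrete with the roofline model, a standard analysis technique brought in here for illustration (the slides do not use it): attainable throughput is the lower of the compute peak and memory bandwidth × arithmetic intensity, where intensity is FLOPs performed per byte moved. The machine numbers below are hypothetical.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float,
                      flops_per_byte: float) -> float:
    """Roofline model: performance is capped either by the compute peak
    or by how fast memory can deliver operands."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical machine: 500 GFLOP/s compute peak, 50 GB/s memory bandwidth.
PEAK, BW = 500.0, 50.0

for intensity in (0.25, 2.0, 10.0, 40.0):
    perf = attainable_gflops(PEAK, BW, intensity)
    # At or beyond the ridge point the compute peak is the limit.
    bound = "memory" if BW * intensity < PEAK else "compute"
    print(f"{intensity:5.2f} FLOP/byte -> {perf:6.1f} GFLOP/s ({bound}-bound)")

# Doubling the compute peak does nothing for a kernel at 2 FLOP/byte:
print(attainable_gflops(2 * PEAK, BW, 2.0))  # 100.0, still memory-bound
```

Kernels below the ridge point (here 500 / 50 = 10 FLOP/byte) gain nothing from a faster core, which is the situation behind the question of whether improving CPU performance is still worthwhile.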
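The need for highly parallelizable programs is usually framed with Amdahl's law, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n the number of cores. This is a standard textbook formula used here for illustration, and the fractions below are hypothetical.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: the serial part (1 - p) runs at single-core speed
    no matter how many cores are added."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


for p in (0.50, 0.90, 0.99):
    print(f"p={p:.2f}: 8 cores -> {amdahl_speedup(p, 8):.2f}x, "
          f"64 cores -> {amdahl_speedup(p, 64):.2f}x")
```

Even at p = 0.90, 64 cores yield less than 9x, so slower single-core gains are only offset by adding cores when nearly all of a program runs in parallel.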
