Memory Architecture
Recommended publications
Class Notes Class: IX Topic: INPUT, OUTPUT, MEMORY and STORAGE DEVICES of a COMPUTER SYSTEM Subject: INFORMATION TECHNOLOGY
Q1. A collection of eight bits is called a BYTE.
Q2. Which of the following is an example of non-volatile memory? a) ROM b) RAM c) LSI d) VLSI
Q3. Which of the following is a unit of measurement used with computer systems? a) Byte b) Megabyte c) Gigabyte d) All of the above
Q4. Which of the following statements is false? a) Secondary storage is non-volatile. b) Primary storage is volatile. c) When the computer is turned off, data and instructions stored in primary storage are erased. d) None of the above.
Q5. Secondary storage devices can only store data; they cannot perform a) Arithmetic operations b) Logic operations c) Fetch operations d) Any of the above
Q6. Which of the following does not represent an I/O device? a) Speaker which beeps b) Plotter c) Joystick d) ALU
Q7. Which of the following is a correct definition of volatile memory? a) It loses its contents at high temperatures. b) It must be kept in airtight boxes. c) It loses its contents on failure of the power supply. d) It does not lose its contents on failure of the power supply.
Q8. One thousand bytes represent a a) Megabyte b) Gigabyte c) Kilobyte d) None of these
Q9. What does a storage unit provide? a) A place to show data b) A place to store currently worked-on information c) A place to store information
Q10. What are the four basic components of a computer? a) Input devices, output devices, printing and typing b) Input devices, processing unit, storage and output devices c) Input devices, CPU, output devices and RAM
Q11. …
The Era of Expeditious NanoRAM-Based Computers: Enhancement of Operating System Performance in a Nanotechnology Environment
International Journal of Applied Engineering Research, ISSN 0973-4562, Volume 13, Number 1 (2018), pp. 375-384. © Research India Publications. http://www.ripublication.com

The Era of Expeditious NanoRAM-Based Computers: Enhancement of Operating System Performance in a Nanotechnology Environment

Mona Nabil ElGohary, Ph.D. Student, Computer Science Department, Faculty of Computers and Information, Helwan University, Cairo, Egypt. ORCID: 0000-0002-1996-4673
Dr. Wessam ElBehaidy, Assistant Professor, Computer Science Department, Faculty of Computers and Information, Helwan University, Cairo, Egypt.
Assoc. Prof. Hala Abdel-Galil, Associate Professor, Computer Science Department, Faculty of Computers and Information, Helwan University, Cairo, Egypt.
Prof. Dr. Mostafa-Sami M. Mostafa, Professor of Computer Science, Faculty of Computers and Information, Helwan University, Cairo, Egypt.

Abstract: The availability of a new generation of memory that is 1000 times faster than traditional DDRAM, that can deliver terabytes of storage capacity, and that consumes very little power has the potential to change the future of the computer's operating system. This paper studies the changes that will arise in operating system functions, memory management and job scheduling (especially context switching), when NanoRAM is integrated into the computer system. It also looks forward to evaluating the possible enhancements to the computer's performance with NanoRAM.

It has been announced that the first NanoRAM will be produced by 2018. This new NanoRAM has many properties that would make it an excellent replacement for current DDRAM: it is non-volatile, has large capacity, and offers high-speed read/write cycles. All of these properties are introduced in the next section. Replacing DDRAM with NanoRAM will affect the functionality of the operating system, such as main memory management, virtual memory, job scheduling, and secondary storage management, and thus the efficiency of the …
Data Storage
Data Storage:
- Disks: hard disk (HDD), solid state drive (SSD)
- Random Access Memory: dynamic RAM (DRAM), static RAM (SRAM)
- Registers: %rax, %rbx, ...

The CPU-Memory Gap: [chart omitted: log-scale access times (ns) from 1985 to 2015 for disk seek time, SSD access time, DRAM access time, SRAM access time, CPU cycle time, and effective CPU cycle time, showing the widening gap between processor and memory/storage speeds]

Caching: a smaller, faster, more expensive memory caches a subset of the blocks of a larger, slower, cheaper memory, which is viewed as partitioned into "blocks". Data is copied between levels in block-sized transfer units (a toy hit/miss simulator follows this entry).

Cache Hit: [diagram omitted: a request for block 14 finds the block already present in the cache]

Cache Miss: [diagram omitted: a request for block 12 misses; the block is fetched from memory into the cache, replacing a cached block]

Locality:
- Temporal locality: a recently referenced item is likely to be referenced again in the near future.
- Spatial locality: items with nearby addresses tend to be referenced close together in time.

Locality Example (1): stride-1 access through an array.

```c
sum = 0;
for (i = 0; i < n; i++)
    sum += a[i];
return sum;
```

Locality Example (2): row-major traversal (good spatial locality in C).

```c
int sum_array_rows(int a[M][N])
{
    int i, j, sum = 0;

    for (i = 0; i < M; i++)
        for (j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}
```

Locality Example (3): column-major traversal (poor spatial locality in C).

```c
int sum_array_cols(int a[M][N])
{
    int i, j, sum = 0;

    for (j = 0; j < N; j++)
        for (i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
```

The Memory Hierarchy:
- Registers (on chip): 1 cycle to access; instructions can access them directly
- L1, L2 cache(s) (SRAM, on chip): ~10's of cycles to access
- Main memory (DRAM): ~100 cycles to access
- Flash SSD / local secondary storage (disk), local network: ~100 M cycles to access
- Remote secondary storage (tapes, web servers / Internet): slower to access than local disk
Smaller, faster, and costlier per byte toward the top; larger, slower, and cheaper per byte toward the bottom.
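The hit/miss behavior sketched above can be made concrete with a toy simulator. The following sketch is not from the slides: it models a minimal direct-mapped cache in which block b may live only in line b mod 4; the function name, cache size, and request trace are all invented for illustration.

```c
#include <stdio.h>
#include <string.h>

#define NUM_LINES 4   /* tiny cache: 4 block-sized lines */

/* One cache line: a valid bit and the block number it currently holds. */
struct line { int valid; int block; };

/* Direct-mapped lookup: block b can live only in line b % NUM_LINES. */
int access_block(struct line cache[], int b) {
    struct line *l = &cache[b % NUM_LINES];
    if (l->valid && l->block == b)
        return 1;              /* hit: data is already cached */
    l->valid = 1;              /* miss: fetch block, evicting the old one */
    l->block = b;
    return 0;
}

int main(void) {
    struct line cache[NUM_LINES];
    memset(cache, 0, sizeof cache);
    int trace[] = {14, 14, 12, 14, 12};   /* mirrors the slides' requests */
    for (int i = 0; i < 5; i++)
        printf("block %2d -> %s\n", trace[i],
               access_block(cache, trace[i]) ? "hit" : "miss");
    return 0;
}
```

A real cache indexes by address bits and stores tags plus data; this model keeps only block numbers so the hit/miss logic stands out.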
Managing the Memory Hierarchy
Managing the Memory Hierarchy
Jeffrey S. Vetter; Sparsh Mittal, Joel Denny, Seyong Lee
Presented to SOS20, Asheville, 24 Mar 2016. ORNL is managed by UT-Battelle for the US Department of Energy. http://ft.ornl.gov

Exascale architecture targets circa 2009: at the 2009 Exascale Challenges Workshop in San Diego, attendees envisioned two possible architectural swim lanes: 1. a homogeneous many-core thin-node system; 2. a heterogeneous (accelerator + CPU) fat-node system. Where the table below lists two values in a cell, they correspond to these two swim lanes.

| System attribute | 2009 | "Pre-Exascale" | "Exascale" |
|---|---|---|---|
| System peak | 2 PF | 100-200 PF/s | 1 exaflop/s |
| Power | 6 MW | 15 MW | 20 MW |
| System memory | 0.3 PB | 5 PB | 32–64 PB |
| Storage | 15 PB | 150 PB | 500 PB |
| Node performance | 125 GF | 0.5 TF / 7 TF | 1 TF / 10 TF |
| Node memory BW | 25 GB/s | 0.1 TB/s / 1 TB/s | 0.4 TB/s / 4 TB/s |
| Node concurrency | 12 | O(100) / O(1,000) | O(1,000) / O(10,000) |
| System size (nodes) | 18,700 | 500,000 / 50,000 | 1,000,000 / 100,000 |
| Node interconnect BW | 1.5 GB/s | 150 GB/s / 1 TB/s | 250 GB/s / 2 TB/s |
| IO bandwidth | 0.2 TB/s | 10 TB/s | 30-60 TB/s |
| MTTI | day | O(1 day) | O(0.1 day) |

Memory Systems:
- Multimode memories: fused, shared memory; scratchpads; write-through, write-back, etc. (the two write policies are contrasted in the sketch after this entry); virtual vs. physical addressing and paging strategies; consistency and coherence protocols
- 2.5D, 3D stacking
- HMC, HBM/2/3, LPDDR4, GDDR5, WIDEIO2, etc.
- New devices (ReRAM, PCRAM, Xpoint)

J.S. …
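The write-through and write-back policies named in the "Multimode memories" bullet differ in when a store reaches the backing memory. The following minimal sketch is an illustration under invented names and sizes, not code from the talk: a one-line cache where write-through updates memory on every store, while write-back defers the update until the dirty line is evicted.

```c
#include <stdio.h>
#include <stdbool.h>

/* Toy one-line cache contrasting the two write policies.
 * Everything here (names, sizes) is invented for illustration. */
enum policy { WRITE_THROUGH, WRITE_BACK };

static int memory[16];                 /* backing "DRAM" */
static int cached_addr = -1, cached_val;
static bool dirty;

void store(enum policy p, int addr, int val) {
    cached_addr = addr;
    cached_val = val;
    if (p == WRITE_THROUGH)
        memory[addr] = val;            /* memory updated on every store */
    else
        dirty = true;                  /* deferred: memory updated at eviction */
}

void evict(void) {
    if (dirty && cached_addr >= 0)
        memory[cached_addr] = cached_val;  /* write-back of the dirty line */
    cached_addr = -1;
    dirty = false;
}

int main(void) {
    store(WRITE_BACK, 3, 42);
    printf("before evict: memory[3] = %d\n", memory[3]);  /* still 0 */
    evict();
    printf("after evict:  memory[3] = %d\n", memory[3]);  /* now 42 */
    return 0;
}
```

Write-back trades memory traffic for bookkeeping (the dirty bit) and is why coherence protocols, the next item in that bullet, must track which cache holds the freshest copy.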
Let's Talk About Storage & Recovery Methods for Non-Volatile Memory Database Systems
Joy Arulraj (Carnegie Mellon University), Andrew Pavlo (Carnegie Mellon University), Subramanya R. Dulloor (Intel Labs)

ABSTRACT: The advent of non-volatile memory (NVM) will fundamentally change the dichotomy between memory and durable storage in database management systems (DBMSs). These new NVM devices are almost as fast as DRAM, but all writes to them are potentially persistent even after power loss. Existing DBMSs are unable to take full advantage of this technology because their internal architectures are predicated on the assumption that memory is volatile. With NVM, many of the components of legacy DBMSs are unnecessary and will degrade the performance of data-intensive applications. To better understand these issues, we implemented three engines in a modular DBMS testbed that are based on different storage management architectures: (1) in-place updates, (2) copy-on-write updates, and (3) log-structured updates (the first and third are sketched after this excerpt).

[From the introduction, truncated:] … of power, the DBMS must write that data to a non-volatile device, such as an SSD or HDD. Such devices only support slow, bulk data transfers as blocks. Contrast this with volatile DRAM, where a DBMS can quickly read and write a single byte from these devices, but all data is lost once power is lost. In addition, there are inherent physical limitations that prevent DRAM from scaling to capacities beyond today's levels [46]. Using a large amount of DRAM also consumes a lot of energy, since it requires periodic refreshing to preserve data even if it is not actively used. Studies have shown that DRAM consumes about 40% of the overall power consumed by a server [42]. Although flash-based SSDs have better storage capacities and use less energy than DRAM, they have other issues that make them less …
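As a loose illustration of how the first and third architectures differ, consider the toy record store below. This is not the paper's testbed: the structures and names are invented, and NVM durability concerns (e.g., cache-line flushes and fences) are reduced to a comment.

```c
#include <stdio.h>

/* Invented toy record store contrasting two of the paper's three
 * architectures: in-place updates vs. log-structured updates. */
#define NRECS 4

struct rec { int id; int value; };

static struct rec table[NRECS];        /* heap file: one slot per record */

/* (1) In-place engine: overwrite the record where it lives. On real NVM
 * this store would be made durable with a cache-line flush plus fence. */
void update_in_place(int id, int value) {
    table[id].value = value;
}

/* (3) Log-structured engine: append the new version; the newest log
 * entry wins, so reads scan backward for the latest version. */
static struct rec log_[64];
static int log_len;

void update_log(int id, int value) {
    log_[log_len++] = (struct rec){ id, value };
}

int read_log(int id) {
    for (int i = log_len - 1; i >= 0; i--)
        if (log_[i].id == id) return log_[i].value;
    return -1;                          /* not found */
}

int main(void) {
    update_in_place(2, 10);
    update_log(2, 10);
    update_log(2, 20);                  /* newer version supersedes the old */
    printf("in-place: %d, log-structured: %d\n", table[2].value, read_log(2));
    return 0;
}
```

The trade-off the sketch exposes: in-place updates read fast but overwrite data (complicating recovery), while log appends preserve old versions at the cost of longer reads and eventual compaction.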
Embedded DRAM
Embedded DRAM
Raviprasad Kuloor, Semiconductor Research and Development Centre, Bangalore; IBM Systems and Technology Group

Topics:
- Introduction to memory
- DRAM basics and bitcell array
- eDRAM operational details (case study)
- Noise concerns
- Wordline driver (WLDRV) and level translators (LT)
- Challenges in eDRAM
- Understanding a timing diagram: an example
- References

Acknowledgement: John Barth, IBM SRDC, for most of the slide content; Madabusi Govindarajan; Subramanian S. Iyer; many others.

Memory classification revisited. [slide figure omitted]

Motivation for a memory hierarchy:
- With an infinitely fast, infinitely large memory store, the cycles per instruction (CPI), i.e., the number of processor clock cycles required per instruction, is simply CPI = CPI[∞ cache].
- With finite memory speed (but infinite size), CPI = CPI[∞ cache] + FCP, where FCP is the finite cache penalty. (A worked example follows this entry.)

Locality of reference, spatial and temporal:
- Temporal: if you access something now, you'll need it again soon (e.g., loops).
- Spatial: if you accessed something, you'll also need its neighbor (e.g., arrays).
Exploit this to divide memory into a hierarchy: the processor hits in a small, fast L1 and, on a miss, falls back to a larger, slower L2. [diagram omitted]

Cache size impacts cycles-per-instruction: as cache size grows, the access rate to slower memory falls, so slower memory beyond the cache is sufficient. For a 5 GHz …
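The finite cache penalty can be made concrete with a small worked example. The numbers below are invented for illustration, not taken from the slides: an ideal CPI of 1.0, 1.3 memory references per instruction, a 2% miss rate, and a 100-cycle miss penalty give FCP = 1.3 × 0.02 × 100 = 2.6 and an effective CPI of 3.6.

```c
#include <stdio.h>

/* Effective CPI with a finite memory system:
 *   CPI = CPI[infinite cache] + FCP
 * where FCP is the extra cycles per instruction spent waiting on
 * cache misses. All parameter values below are made up. */
int main(void) {
    double cpi_inf   = 1.0;   /* CPI with an ideal (infinite) cache */
    double refs      = 1.3;   /* memory references per instruction */
    double miss_rate = 0.02;  /* fraction of references that miss */
    double penalty   = 100;   /* cycles to service one miss */

    double fcp = refs * miss_rate * penalty;   /* finite cache penalty */
    printf("FCP = %.2f cycles/instr, effective CPI = %.2f\n",
           fcp, cpi_inf + fcp);
    return 0;
}
```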
Computer Conservation Society
Issue Number 88, Winter 2019/20

Computer Conservation Society: Aims and Objectives

The Computer Conservation Society (CCS) is a co-operative venture between BCS, The Chartered Institute for IT; the Science Museum of London; and the Science and Industry Museum (SIM) in Manchester. The CCS was constituted in September 1989 as a Specialist Group of the British Computer Society. It is thus covered by the Royal Charter and charitable status of BCS.

The objects of the Computer Conservation Society ("Society") are:
- To promote the conservation, restoration and reconstruction of historic computing systems and to identify existing computing systems which may need to be archived in the future;
- To develop awareness of the importance of historic computing systems;
- To develop expertise in the conservation, restoration and reconstruction of historic computing systems;
- To represent the interests of the Society with other bodies;
- To promote the study of historic computing systems, their use and the history of the computer industry;
- To publish information of relevance to these objectives for the information of Society members and the wider public.

Membership is open to anyone interested in computer conservation and the history of computing. The CCS is funded and supported by a grant from BCS and from donations. There are a number of active projects on specific computer restorations and early computer technologies and software. Younger people are especially encouraged to take part in order to achieve skills transfer. The CCS also enjoys a close relationship with the National Museum of Computing.

Resurrection, the Journal of the Computer Conservation Society. ISSN 0958-7403. Number 88, Winter 2019/20. Contents:
- Society Activity (p. 2)
- News Round-Up (p. 9)
- The Data Curator, Paul Cockshott (p. 10)
- From Tea Shops to Computer Company: The Improbable Story of LEO, John Aeberhard (p. 15)
- Book Review: Early Computing in Britain: Ferranti Ltd. …
The Memory Hierarchy
Chapter 6: The Memory Hierarchy

To this point in our study of systems, we have relied on a simple model of a computer system as a CPU that executes instructions and a memory system that holds instructions and data for the CPU. In our simple model, the memory system is a linear array of bytes, and the CPU can access each memory location in a constant amount of time. While this is an effective model as far as it goes, it does not reflect the way that modern systems really work.

In practice, a memory system is a hierarchy of storage devices with different capacities, costs, and access times. CPU registers hold the most frequently used data. Small, fast cache memories nearby the CPU act as staging areas for a subset of the data and instructions stored in the relatively slow main memory. The main memory stages data stored on large, slow disks, which in turn often serve as staging areas for data stored on the disks or tapes of other machines connected by networks.

Memory hierarchies work because well-written programs tend to access the storage at any particular level more frequently than they access the storage at the next lower level. So the storage at the next level can be slower, and thus larger and cheaper per bit. The overall effect is a large pool of memory that costs as much as the cheap storage near the bottom of the hierarchy, but that serves data to programs at the rate of the fast storage near the top of the hierarchy.
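The chapter's claim, that the hierarchy serves data at roughly the rate of its fastest level, can be checked with a short average-access-time calculation. The latencies and hit rate below are assumptions chosen for illustration, not figures from the book.

```c
#include <stdio.h>

/* Average memory access time for a two-level hierarchy:
 *   AMAT = hit_time + miss_rate * miss_penalty
 * Illustrative numbers only: a 1 ns cache in front of a 100 ns
 * main memory, with a 95% hit rate. */
int main(void) {
    double hit_time     = 1.0;    /* ns, fast level */
    double miss_penalty = 100.0;  /* ns, slow level */
    double miss_rate    = 0.05;

    double amat = hit_time + miss_rate * miss_penalty;
    printf("AMAT = %.1f ns (vs. %.1f ns without the cache)\n",
           amat, miss_penalty);
    return 0;
}
```

With these assumed numbers the hierarchy averages 6 ns per access, far closer to the 1 ns cache than to the 100 ns memory, which is exactly the locality effect the paragraph above describes.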
Lecture 10: Memory Hierarchy -- Memory Technology and Principle of Locality
Lecture 10: Memory Hierarchy -- Memory Technology and Principle of Locality
CSCE 513 Computer Architecture, Department of Computer Science and Engineering, Yonghong Yan, https://passlab.github.io/CSCE513

Topics for Memory Hierarchy:
- Memory technology and principle of locality (CAQA: 2.1, 2.2, B.1; COD: 5.1, 5.2)
- Cache organization and performance (CAQA: B.1, B.2; COD: 5.2, 5.3)
- Cache optimization: 6 basic techniques (CAQA: B.3) and 10 advanced techniques (CAQA: 2.3)
- Virtual memory and virtual machines (CAQA: B.4, 2.4; COD: 5.6, 5.7); skipped in this course

The Big Picture: Where Are We Now? A computer comprises a processor (control and datapath), memory, input, and output. The memory system must supply data on time for computation (speed) and be large enough to hold everything needed (capacity).

Overview:
- Programmers want unlimited amounts of memory with low latency.
- Fast memory technology is more expensive per bit than slower memory.
- Solution: organize the memory system into a hierarchy. The entire addressable memory space is available in the largest, slowest memory; incrementally smaller and faster memories, each containing a subset of the memory below it, proceed in steps up toward the processor.
- Temporal and spatial locality ensure that nearly all references can be found in smaller memories, giving the illusion of a large, fast memory being presented to the processor.

Memory hierarchy. [diagram omitted]

Memory technology:
- Random access: access time is the same for all locations.
- DRAM (dynamic random access memory): high density, low power, cheap, slow; dynamic: needs to be "refreshed" regularly (quantified in the sketch after this entry); 50-70 ns, $20-$75 per GB.
- SRAM (static random access memory): low density, high power, expensive, fast; static: content lasts "forever" (until power is lost); 0.5-2.5 ns, $2,000-$5,000 per GB.
- Magnetic disk: 5-20 ms, $0.20-$2 per GB.
- Ideal memory: the access time of SRAM with the capacity and cost/GB of disk.

Static RAM (SRAM): a 6-transistor cell stores 1 bit, with a word line for row select and complementary bit lines. [cell diagram and write steps omitted; truncated]
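The "refreshed regularly" point for DRAM can be quantified with a back-of-the-envelope calculation. The row count, retention interval, and per-row refresh time below are plausible round numbers assumed for illustration, not figures from the lecture.

```c
#include <stdio.h>

/* Rough DRAM refresh overhead: every row must be refreshed once per
 * retention interval. All parameters are illustrative assumptions. */
int main(void) {
    double rows        = 8192;     /* rows per bank */
    double interval_ms = 64.0;     /* required refresh interval */
    double row_time_ns = 50.0;     /* time to refresh one row */

    double busy_ns  = rows * row_time_ns;   /* time the bank spends refreshing */
    double total_ns = interval_ms * 1e6;
    printf("refresh overhead = %.2f%% of bank time\n",
           100.0 * busy_ns / total_ns);
    return 0;
}
```

Under these assumptions refresh steals well under 1% of bank time, which is why DRAM tolerates it; the real costs are the design complexity and the power spent refreshing data that may never be read again.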
Architecting Byte-Addressable Non-Volatile Memories for Main Memory (Poremba dissertation)
The Pennsylvania State University, The Graduate School

ARCHITECTING BYTE-ADDRESSABLE NON-VOLATILE MEMORIES FOR MAIN MEMORY

A Dissertation in Computer Science and Engineering by Matthew Poremba. © 2015 Matthew Poremba. Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy, May 2015.

The dissertation of Matthew Poremba was reviewed and approved* by the following:
- Yuan Xie, Professor of Computer Science and Engineering, Dissertation Co-Advisor, Co-Chair of Committee
- John Sampson, Assistant Professor of Computer Science and Engineering, Dissertation Co-Advisor, Co-Chair of Committee
- Mary Jane Irwin, Professor of Computer Science and Engineering, Robert E. Noll Professor, Evan Pugh Professor
- Vijaykrishnan Narayanan, Professor of Computer Science and Engineering
- Kenneth Jenkins, Professor of Electrical Engineering
- Lee Coraor, Associate Professor of Computer Science and Engineering, Director of Academic Affairs
*Signatures are on file in the Graduate School.

Abstract: New breakthroughs in memory technology in recent years have led to increased research efforts in so-called byte-addressable non-volatile memories (NVM). As a result, questions of how and where these types of NVMs can be used have been raised. Simultaneously, semiconductor scaling has led to an increased number of CPU cores on a processor die as a way to utilize the area. This has increased the pressure on the memory system, causing growth in the amount of main memory that is available in a computer system. This growth has escalated the amount of power consumed by the de facto DRAM-type memory. Moreover, DRAM memories have run into physical limitations on scalability due to the nature of their operation.
Chapter 2: Memory Hierarchy Design
Computer Architecture: A Quantitative Approach, Fifth Edition. Chapter 2: Memory Hierarchy Design.

Contents:
1. Memory hierarchy: basic concepts; design techniques
2. Caches: types of caches (fully associative, direct mapped, set associative; see the address-decomposition sketch after this entry); ten optimization techniques
3. Main memory: memory technology; memory optimization; power consumption
4. Memory hierarchy case studies: Opteron, Pentium, i7
5. Virtual memory
6. Problem solving

Introduction:
- Programmers want very large memory with low latency.
- Fast memory technology is more expensive per bit than slower memory.
- Solution: organize the memory system into a hierarchy. The entire addressable memory space is available in the largest, slowest memory; incrementally smaller and faster memories, each containing a subset of the memory below it, proceed in steps up toward the processor.
- Temporal and spatial locality ensure that nearly all references can be found in smaller memories, giving the illusion of a large, fast memory being presented to the processor.

Memory hierarchy: processor, L1 cache, L2 cache, L3 cache, main memory, then hard drive or flash; latency increases and capacity (KB, MB, GB, TB) grows at each level away from the processor. [diagram omitted]

Caches: I-cache = instruction cache; D-cache = data cache; U-cache = unified cache. A typical hierarchy splits L1 into I-cache and D-cache, with unified L2 and L3 behind them and main memory last. Different functional units fetch information from the I-cache and D-cache: the decoder and scheduler operate with the I-cache, while the integer execution unit and floating-point unit communicate with the D-cache.
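The cache types listed in the contents differ in how many places a given block may be placed. The sketch below is an invented illustration, not material from the slides: it shows how an address splits into tag, set index, and block offset for a small set-associative cache. A direct-mapped cache is the one-way special case; a fully associative cache has a single set and therefore no index bits.

```c
#include <stdio.h>
#include <stdint.h>

/* Address decomposition for a toy set-associative cache with
 * 64-byte blocks and 128 sets. Parameters are illustrative
 * assumptions, not taken from any real processor. */
#define BLOCK_BITS 6   /* 64-byte blocks -> 6 offset bits */
#define SET_BITS   7   /* 128 sets       -> 7 index bits  */

int main(void) {
    uint64_t addr = 0x7ffd1234;   /* arbitrary example address */

    uint64_t offset = addr & ((1u << BLOCK_BITS) - 1);
    uint64_t index  = (addr >> BLOCK_BITS) & ((1u << SET_BITS) - 1);
    uint64_t tag    = addr >> (BLOCK_BITS + SET_BITS);

    printf("addr 0x%llx -> tag 0x%llx, set %llu, offset %llu\n",
           (unsigned long long)addr, (unsigned long long)tag,
           (unsigned long long)index, (unsigned long long)offset);
    return 0;
}
```

On a lookup, the index selects one set, every way in that set compares its stored tag against the address tag in parallel, and the offset picks the byte within the matching block.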
Protecting Non-Volatile Memory Against Both Hard and Soft Errors
FREE-p: Protecting Non-Volatile Memory against both Hard and Soft Errors

Doe Hyun Yoon†, Naveen Muralimanohar‡, Jichuan Chang‡, Parthasarathy Ranganathan‡, Norman P. Jouppi‡, Mattan Erez†
†The University of Texas at Austin, Electrical and Computer Engineering Dept. ‡Hewlett-Packard Labs, Intelligent Infrastructure Lab.

Abstract: Emerging non-volatile memories such as phase-change RAM (PCRAM) offer significant advantages but suffer from write endurance problems. However, prior solutions are oblivious to soft errors (recently raised as a potential issue even for PCRAM) and are incompatible with high-level fault tolerance techniques such as chipkill. To additionally address such failures requires unnecessarily high costs for techniques that focus singularly on wear-out tolerance. We propose Fine-grained Remapping with ECC and Embedded-Pointers (FREE-p) to address all three problems. Fine-grained remapping nearly eliminates storage overhead for avoiding wear-out errors. Our unique error checking and correcting (ECC) component can tolerate wear-out errors, soft errors, and device failures.

[From the introduction, truncated; on the problems with prior art:] … relies on integrating custom error-tolerance functionality within memory devices, an idea that the memory industry is historically loath to accept because of strong demand to optimize cost per bit; (2) it ignores soft errors (in both peripheral circuits and cells), which can cause errors in NVRAM as shown in recent studies; and (3) it requires extra storage to support chipkill, which enables a memory DIMM to function even when a device fails. In this paper, we propose fine-grained remapping with ECC and embedded pointers (FREE-p). FREE-p remaps fine-grained worn-out NVRAM blocks without …
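As a highly simplified illustration of the remapping idea (invented here; the paper's actual mechanism embeds the pointer alongside ECC within the failed block itself), the sketch below retires a worn-out block by leaving a forwarding pointer to a spare block, so reads chase the pointer instead of consulting a large dedicated remap table.

```c
#include <stdio.h>
#include <stdbool.h>

/* Invented toy model of pointer-based block remapping. A worn-out
 * block stores a forwarding pointer to its replacement; this sketches
 * only the general idea, not FREE-p's ECC-protected encoding. */
#define NBLOCKS 8

struct block {
    bool worn;       /* block has failed wear-out checks */
    int  forward;    /* if worn: index of the replacement block */
    int  data;       /* otherwise: the stored value */
};

static struct block mem[NBLOCKS];

int read_block(int i) {
    while (mem[i].worn)        /* chase embedded forwarding pointers */
        i = mem[i].forward;
    return mem[i].data;
}

int main(void) {
    mem[2].data = 99;
    /* block 2 wears out: migrate its payload to spare block 7 and
     * leave a forwarding pointer behind in the retired block */
    mem[7].data = mem[2].data;
    mem[2].worn = true;
    mem[2].forward = 7;

    printf("read(2) = %d\n", read_block(2));   /* transparently returns 99 */
    return 0;
}
```

The appeal of embedding the pointer in the failed block is that the common case (an un-worn block) pays no lookup cost, while remapping storage is consumed only by blocks that have actually failed.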