In-DRAM Cache Management for Low Latency and Low Power 3D-Stacked DRAMs

Ho Hyun Shin 1,2 and Eui-Young Chung 2,*
1 Samsung Electronics Company, Ltd., Hwasung 18448, Korea; [email protected]
2 School of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Korea
* Correspondence: [email protected]; Tel.: +82-2-2123-5866
Received: 24 December 2018; Accepted: 5 February 2019; Published: 14 February 2019
Micromachines 2019, 10, 124; doi:10.3390/mi10020124

Abstract: Recently, 3D-stacked dynamic random access memory (DRAM) has become a promising solution for ultra-high-capacity, high-bandwidth memory implementations. However, like typical 2D DRAMs, it suffers from the memory wall problem caused by long access latency. Although various cache management techniques and latency-hiding schemes reduce DRAM access time, in a high-performance system using high-capacity 3D-stacked DRAM it is ultimately essential to reduce the latency of the DRAM itself. To address this problem, various asymmetric in-DRAM cache structures have recently been proposed; they are particularly attractive for high-capacity DRAMs because they can be implemented at low cost in 3D-stacked devices. However, most prior work focuses on the architecture of the in-DRAM cache itself and pays little attention to proper management methods. In this paper, we propose two new management algorithms for in-DRAM caches to achieve a low-latency and low-power 3D-stacked DRAM device. Through computing-system simulation, we demonstrate an improvement in energy-delay product of up to 67%.

Keywords: 3D-stacked; DRAM; in-DRAM cache; low-latency; low-power

1. Introduction

The latency of dynamic random access memory (DRAM) has been a critical issue for two primary reasons [1]. First, while the processing speed of the central processing unit (CPU) has improved continuously, DRAM latency has remained relatively unchanged for decades. This speed gap, called the memory wall, causes significant bottlenecks in overall computing performance [2,3]. As shown in Figure 1a, while capacity and bandwidth have increased 16 and 6 times over time, respectively, the timing constraints that represent DRAM latency, the row address to column address delay (tRCD) and the row cycle time (tRC), have only improved by 11.2% and 20.0%, respectively [4–7]. Second, the processing speed of big-data workloads is affected by memory latency as well as bandwidth. Russell et al. showed that the instructions per cycle of applications dealing with big data can be significantly improved by reducing DRAM latency [8], because the data stream of a big-data workload is likely to have strong dependencies between its elements. In particular, on-line transaction processing (OLTP), which supports transaction-oriented applications, is a representative example of a latency-sensitive workload [9]. In addition, recent AI applications require large amounts of memory to handle large amounts of data, and require low latency to provide real-time data processing. In other words, we expect to see an increasing number of applications that simultaneously demand high capacity and low latency.

Figure 1. Comparison of dynamic random access memory (DRAM) capacity, bandwidth, and latency improvement by DRAM generation [4–7]. (a) Capacity and bandwidth of DRAM. (b) DRAM access latency: row address to column address delay (tRCD) and row cycle time (tRC).
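The following back-of-the-envelope sketch is ours, not part of the article; it simply restates the trend of Figure 1 numerically. The per-generation capacity, bandwidth, tRCD, and tRC values are illustrative approximations chosen to reproduce the growth factors and improvement percentages quoted above, not exact JEDEC figures.

```python
# Illustrative sketch: how little DRAM latency improved compared to
# capacity and bandwidth (values approximate Figure 1, not exact JEDEC specs).

generations   = ["DDR-400", "DDR2-800", "DDR3-1600", "DDR4-2400"]
capacity_gb   = [0.5, 1, 4, 8]            # typical module capacity (illustrative)
bandwidth_gbs = [3.2, 6.4, 12.8, 19.2]    # peak bandwidth per channel
trcd_ns       = [15.0, 15.0, 13.75, 13.32]  # row-to-column delay (approximate)
trc_ns        = [55.0, 54.0, 48.75, 44.0]   # row cycle time (approximate)

def improvement(first, last):
    """Relative improvement from the oldest to the newest generation."""
    return (first - last) / first

print(f"Capacity grew  {capacity_gb[-1] / capacity_gb[0]:.0f}x")
print(f"Bandwidth grew {bandwidth_gbs[-1] / bandwidth_gbs[0]:.0f}x")
print(f"tRCD improved  {improvement(trcd_ns[0], trcd_ns[-1]):.1%}")
print(f"tRC  improved  {improvement(trc_ns[0], trc_ns[-1]):.1%}")
```

With these assumed values the script reports a 16x capacity and 6x bandwidth increase but only an 11.2% and 20.0% latency improvement, matching the trend the figure illustrates.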
DRAM devices are being transformed into various structures as a result of recent advances in die stacking with through-silicon vias (TSVs) [10]. For example, stacking homogeneous DRAM dies extends capacity without power or performance loss [11,12]. Moreover, a heterogeneous combination of logic and DRAM dies, as in high-bandwidth memory (HBM) or the hybrid memory cube (HMC), increases the data bandwidth without a significant power overhead [13,14]. Power here means power relative to the delivered capacity and bandwidth: when Graphics Double Data Rate 5 (GDDR5) and HBM are compared at the same capacity and bandwidth, HBM consumes significantly less power. However, although these devices have enhanced the memory sub-system in terms of capacity and bandwidth, latency improvements have been neglected.

To overcome the long latency of DRAM, modern computers embed numerous caches in the CPU. A cache not only hides the long latency of DRAM, but also exploits the locality of pre-fetched pages, and thus offers large bandwidth locally within the CPU. However, since a typical cache is implemented with static random access memory (SRAM), it incurs a large area cost and consumes a high amount of leakage power. As a result, it is essential to reduce the DRAM latency itself to improve memory access latency. (In this paper, DRAM latency refers to the time required for a DRAM controller to read or write data in a DRAM device, while memory access latency refers to the latency required for processor instructions to access data in the cache or DRAM.)

The in-DRAM cache, which is embedded in a DRAM device, has several unique characteristics that differ from the processor cache [15]. First, the cache itself is placed in the DRAM, but its operation is managed by the DRAM controller. This is because the interface between the controller and the DRAM follows the DRAM timing constraints specified by the Joint Electron Device Engineering Council (JEDEC), which maintains high compatibility with current computing systems. There are, of course, other ways to implement the in-DRAM cache and its manager, such as operating system (OS) or processor modifications, but such methods require many changes to the current computing system and eventually degrade compatibility. We therefore placed the manager in the DRAM controller, so that the proposed method follows the JEDEC specification, and implemented the in-DRAM cache in the DRAM device.

Second, the capacity of the in-DRAM cache increases in proportion to the DRAM capacity and is much larger than that of the processor cache. For example, when hundreds of gigabytes of DRAM are mounted in a system, the processor cache remains constant at several hundred megabytes, while the in-DRAM cache can grow to tens of gigabytes. However, such a large-capacity in-DRAM cache requires a larger tag store, which results in long tag access latency and, in turn, a longer overall memory access latency. To overcome this problem, the data transfer granularity between the DRAM and the in-DRAM cache, called the cache block size, must be increased: a larger block means fewer blocks, and therefore fewer tag entries, for the same cache capacity. However, a large block size causes significant power consumption, because more data is transferred on every cache fill.
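To make the tag-size argument concrete, here is a small model of our own (not from the paper) that estimates the tag store required by an in-DRAM cache as the cache block size grows. The direct-mapped organization, 8 GiB cache capacity, 40-bit physical address, and valid/dirty bits are all assumptions made for the sketch.

```python
import math

def tag_store_bytes(cache_bytes, block_bytes, phys_addr_bits=40,
                    valid_dirty_bits=2):
    """Estimate the tag-store size of a direct-mapped cache.

    Each block needs one tag entry: the address bits left over after
    removing the block-offset and index bits, plus valid/dirty bits.
    All parameters are illustrative assumptions, not values from the paper.
    """
    num_blocks = cache_bytes // block_bytes
    offset_bits = int(math.log2(block_bytes))
    index_bits = int(math.log2(num_blocks))
    tag_bits = phys_addr_bits - offset_bits - index_bits
    entry_bits = tag_bits + valid_dirty_bits
    return num_blocks * entry_bits / 8  # bytes

cache_size = 8 * 2**30  # assume an 8 GiB in-DRAM cache
for block in (64, 2**10, 2**12, 2**16):  # 64 B line up to a 64 KiB block
    mib = tag_store_bytes(cache_size, block) / 2**20
    print(f"block = {block:>6} B  ->  tag store ~ {mib:8.2f} MiB")
```

In this toy model the tag store shrinks in proportion to the number of blocks: roughly 144 MiB with 64 B blocks, but well under 1 MiB with 64 KiB blocks. This is why a coarse cache block size is attractive, even though each fill then moves far more data and therefore consumes more power.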
Power issues in DRAM matter not only for minimizing the energy consumed by the DRAM chip itself, but also as a critical design parameter for 3D-stacked DRAMs from a thermal point of view. Since a 3D-stacked DRAM chip consists of several dies, it is very difficult to dissipate the heat generated inside the chip. This heat degrades the retention characteristics of the DRAM cells, so the DRAM requires a shorter refresh cycle. However, shortening the refresh cycle of a high-capacity 3D-stacked DRAM generates more heat, which reduces the retention time of the DRAM cells again. Thermal problems in 3D-stacked DRAMs are therefore highly sensitive design parameters that must be managed.

Considering these properties of the in-DRAM cache, this paper proposes two new in-DRAM cache management algorithms for data replacement, aimed at maximizing the cache's efficiency and minimizing its energy consumption. The proposed management algorithms are not tied to a specific in-DRAM cache architecture and can be adapted to general architectures.

2. Background and In-Dynamic Random Access Memory (DRAM) Cache Architecture

A DRAM chip consists of the DRAM cell array area and peripheral circuits, including several in-out ports (Figure 2). The cell array region is composed of a number of sub-arrays, each containing DRAM cells and bit-line sense amplifiers. As mentioned in Section 1, DRAM latency improvements have been slow, mainly because latency reduction is directly tied to cost and power consumption [16,17]. To reduce the sensing and pre-charge time, for example, the number of cells connected per bit-line should be reduced [18]. However, this increases the number of bit-line sense amplifiers and thus the chip size. Moreover, timing constraints such as the CAS latency (tCL) are mainly determined by the speed of the data path. To improve this speed, the capacitive metal loading of the data-path signals must be decreased or their driver strength increased; both approaches may increase cost or power consumption. Consequently, the latency of a DRAM device must be optimized while simultaneously considering multiple side-effects. In this paper, we focus on the in-DRAM cache among the various techniques for reducing DRAM latency, and discuss its management method.

Figure 2. Organization of a DRAM chip: DRAM cell arrays with bit-line sense amplifiers and word lines, an in-out port, and peripheral circuits.
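As a rough illustration of the cost side of the trade-off described above, the sketch below (ours; the row count and bit-line lengths are assumed, not taken from the paper) shows how shortening the bit-lines to speed up sensing and pre-charging multiplies the number of sense-amplifier stripes in a bank.

```python
def sense_amp_stripes(rows_per_bank, cells_per_bitline):
    """Number of bit-line sense-amplifier stripes needed in one bank.

    Each sub-array holds `cells_per_bitline` rows and needs its own
    stripe of sense amplifiers, so halving the bit-line length doubles
    the stripe count (illustrative model; parameters are assumptions).
    """
    return rows_per_bank // cells_per_bitline

ROWS_PER_BANK = 65536  # assumed 64K rows per bank
baseline = sense_amp_stripes(ROWS_PER_BANK, 512)

for cells in (512, 256, 128):
    stripes = sense_amp_stripes(ROWS_PER_BANK, cells)
    print(f"{cells:>3} cells/bit-line -> {stripes:>4} stripes "
          f"(~{stripes / baseline:.0f}x the baseline sense-amp area)")
```

Halving the number of cells per bit-line roughly doubles the sense-amplifier area in this model, which illustrates the chip-size penalty described above.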
Recommended publications
  • Zynq-7000 All Programmable SoC and 7 Series Devices Memory Interface Solutions User Guide (UG586) [Ref 2]
  • Micron Technology Inc
  • CPU Clock Rate DRAM Access Latency Growing
  • Understanding and Exploiting Design-Induced Latency Variation in Modern DRAM Chips
  • Leveraging Heterogeneity in DRAM Main Memories to Accelerate Critical Word Access ∗
  • 4Gb: X16 DDR4 SDRAM Features DDR4 SDRAM EDY4016A - 256Mb X 16
  • Dynamic Rams from Asynchrounos to DDR4
  • Memory Systems Cache, DRAM, Disk
  • MOCA: Memory Object Classification and Allocation in Heterogeneous Memory Systems
  • Techniques For
  • External Memory Interface Handbook Volume 2: Design Guidelines