pLUTo: In-DRAM Lookup Tables to Enable Massively Parallel General-Purpose Computation


João Dinis Ferreira§  Gabriel Falcao†  Juan Gómez-Luna§  Mohammed Alser§  Lois Orosa§  Mohammad Sadrosadati‡§  Jeremie S. Kim§  Geraldo F. Oliveira§  Taha Shahroodi§  Anant Nori⋆  Onur Mutlu§

§ETH Zürich   †IT, University of Coimbra   ‡Institute for Research in Fundamental Sciences   ⋆Intel

arXiv:2104.07699v1 [cs.AR] 15 Apr 2021

Data movement between main memory and the processor is a significant contributor to the execution time and energy consumption of memory-intensive applications. This data movement bottleneck can be alleviated using Processing-in-Memory (PiM), which enables computation inside the memory chip. However, existing PiM architectures often lack support for complex operations, since supporting these operations increases design complexity, chip area, and power consumption.

We introduce pLUTo (processing-in-memory with lookup table (LUT) operations), a new DRAM substrate that leverages the high area density of DRAM to enable the massively parallel storing and querying of lookup tables (LUTs). The use of LUTs enables the efficient execution of complex operations in-memory, which has been a long-standing challenge in the domain of PiM. When running a state-of-the-art binary neural network in a single DRAM subarray, pLUTo outperforms the baseline CPU and GPU implementations by 33× and 8×, respectively, while simultaneously achieving energy savings of 110× and 80×.

1. Introduction

For decades, DRAM has been the predominant technology for manufacturing main memory, due to its low cost and high capacity. Despite recent efforts to create technologies to replace it [64,69,75,96,117], DRAM is expected to continue to be the de facto main memory technology for the foreseeable future. However, despite its high density, DRAM's latency and bandwidth have not kept pace with the rapid improvements to processor core speed. This divide creates a bottleneck to system performance, which has become increasingly limiting in recent years due to the rapidly growing sizes of the working sets used by many modern applications [25,63,95]. Indeed, recent surveys show that the movement of data between main memory and the processor is responsible for up to 60% of the energy consumption of modern memory-intensive applications [25,30,79,90,102,121,124,127].

Processing-in-Memory (PiM) is a promising paradigm that aims to alleviate this data movement bottleneck. In a PiM-enabled device, the system's main memory is augmented with some form of compute capability [46,48,89,90,116]. This augmentation both 1) alleviates computational pressure from the CPU, and 2) reduces the movement of data between main memory and the CPU.

Recent works divide DRAM-based PiM architectures [48] into two categories: 1) Processing-near-Memory (PnM), where computation occurs near the memory array [33,59,78,84,103,107], and 2) Processing-using-Memory (PuM), where computation occurs within the memory array, by exploiting intrinsic properties of the memory technology [31,44,82,108,110].

In PnM architectures, data is transferred from the DRAM to nearby processors or specialized accelerators, which are either 1) a part of the DRAM chip, but separate from the memory array [33], or 2) integrated into the logic layer of 3D-stacked memories [59,84]. PnM enables the design of flexible substrates that support a diverse range of operations. However, PnM architectures are limited in functionality and scalability: the design and fabrication of memory chips that integrate specialized processing units (such as [33]) has proven to be challenging, and 3D-stacked memories are bound by strict thermal and area limitations.

In contrast, PuM architectures enable computation to occur within the memory array. Impactful works in this domain have proposed mechanisms for the execution of bitwise operations (e.g., AND/OR/XOR) [44,108,110], arithmetic operations [31,32,82,119], and basic LUT-based operations [32,45]. Operations in PuM are usually performed between multiple memory rows. This fact, combined with the use of vertical data layouts and bit-serial computing algorithms [4,35,115], enables a very high degree of parallelism, since there can be as many execution lanes as there are bits in each memory row.
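To make this bit-serial, vertical-layout style of PuM computation concrete, the sketch below is a software analogue added here for illustration (it is not from the paper; the 8-lane width and the function names are ours). It models each DRAM row as a bit vector and adds two sets of 4-bit operands using only row-wide AND, OR, and XOR, the kind of bulk bitwise operations that prior PuM substrates provide; each bit position of a row acts as one independent execution lane.

```python
# Bit-serial addition over a "vertical" data layout: a minimal software sketch.
# Each W-bit Python integer models one DRAM row; bit j of every row is lane j.
# Operand bits are spread across rows (row i holds bit i of every element), so
# one row-wide AND/OR/XOR advances all W lanes at once.

W = 8          # lanes per row (a real subarray row has thousands of bitlines)
MASK = (1 << W) - 1

def to_rows(values, bits):
    """Transpose a list of W elements into `bits` rows (vertical layout)."""
    return [sum(((v >> i) & 1) << lane for lane, v in enumerate(values)) & MASK
            for i in range(bits)]

def from_rows(rows):
    """Transpose rows back into per-lane integers."""
    return [sum(((row >> lane) & 1) << i for i, row in enumerate(rows))
            for lane in range(W)]

def bitserial_add(a_rows, b_rows):
    """Ripple-carry add two vertically laid-out operands using row-wide ops."""
    carry, out = 0, []
    for a, b in zip(a_rows, b_rows):
        out.append(a ^ b ^ carry)                    # sum bit for all lanes
        carry = (a & b) | (a & carry) | (b & carry)  # majority = carry out
    out.append(carry)                                # final carry row
    return out

a = [3, 5, 7, 2, 9, 1, 6, 4]
b = [1, 2, 3, 4, 5, 6, 7, 8]
result = from_rows(bitserial_add(to_rows(a, 4), to_rows(b, 4)))
assert result == [x + y for x, y in zip(a, b)]
```

The point of the sketch is the cost model: each row-wide operation takes constant time regardless of how many lanes it advances, which is why bit-serial PuM computation scales with the number of bitlines per row.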
However, the flexibility of PuM architectures is limited by the range of operations they support, and by the difficulty of using these operations to express higher-level algorithms of interest with sufficiently low latency and energy costs.

We leverage LUT-based computing through the use of LUT-based operations that synergize well with existing work to enable more complex PuM-based functions. This allows pLUTo to perform a wider range of operations than prior works, while enjoying similar performance and energy efficiency metrics.

Our goal is to enable the execution of complex operations in-memory with simple changes to commodity DRAM that synergize well with available PuM-based operations [28,31,68,82,110]. To this end, we propose pLUTo: processing-in-memory with lookup table (LUT) operations, a DRAM substrate that enables massively parallel in-DRAM LUT queries. pLUTo extends current PuM-enabled DRAM substrates [31,82] by integrating a novel LUT-querying mechanism that can be used to more efficiently perform arithmetic operations (e.g., multiplication, division), transcendental functions (e.g., binarization, exponentiation), and access precomputed results (e.g., memoization, LUT queries in cryptographic algorithms). pLUTo stands out from prior works by being the first work to enable the massively parallel bulk querying of LUTs inside the DRAM array, which is our main contribution.

pLUTo's careful design enables these LUTs, which can be stored and queried directly inside memory, to express complex operations (e.g., multiplication, division, transcendental, memoization) and enables two critical LUT-based capabilities: 1) the querying of LUTs of arbitrary size and 2) the pipelining of LUT operations, which significantly synergize with and enhance existing PuM mechanisms (e.g., [28,82,110]). Furthermore, LUTs are an integral component of many widespread algorithms, including AES, Blowfish, RC4, and CRC and Huffman codes [40,86,87,118,126].
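As an illustration of the computational pattern that such in-DRAM LUT queries accelerate, the sketch below (added here; it mirrors the concept only, not pLUTo's hardware mechanism, and the table sizes and names are ours) precomputes a 256-entry table for an 8-bit function and then applies it to a data row with one lookup per element. In pLUTo, the table lives in a DRAM subarray and the lookups for an entire row of elements are performed in bulk.

```python
# LUT-based computation: a software analogue of a bulk in-DRAM LUT query.
# A complex per-element function over 8-bit inputs is precomputed once into a
# 256-entry table; applying it to a data row then reduces to one lookup per
# element (in pLUTo, one massively parallel query per subarray row).

import math

BITS = 8

def build_lut(fn):
    """Precompute fn for every possible 8-bit input value."""
    return [fn(x) for x in range(1 << BITS)]

# Example "complex" operations expressed as LUTs.
binarize_lut = build_lut(lambda x: 1 if x >= 128 else 0)
exp_lut      = build_lut(lambda x: min(255, int(round(math.exp(x / 32.0)))))

def bulk_query(lut, row):
    """Apply a LUT to every element of a data row (the bulk-query step)."""
    return [lut[x] for x in row]

row = [0, 17, 64, 127, 128, 200, 255]
print(bulk_query(binarize_lut, row))   # [0, 0, 0, 0, 1, 1, 1]
print(bulk_query(exp_lut, row))        # exponentiation via table lookup
```

The same pattern covers memoization and the table lookups in algorithms such as AES or CRC: the cost of the operation is paid once when the table is built, and each subsequent query is a memory read.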
We evaluate pLUTo's performance on a number of workloads against CPU-, GPU-, and PnM-based baselines. Our evaluations show that pLUTo consistently outperforms the considered baselines, especially when normalizing to area overhead. We also show that LUT-based computing is an efficient paradigm to execute bulk bitwise, arithmetic and transcendental functions (e.g., binarization, exponentiation) with high throughput and energy efficiency. For example, pLUTo outperforms existing PuM designs [32,82,110] by up to 3.5× in execution time for XOR and XNOR bitwise operations.

In this paper, we make the following contributions:

• We introduce pLUTo, a PuM substrate that enables new lookup table operations. These operations synergize well with available PuM-based operations to enable more complex operations that are commonly used in modern applications.

• We propose three designs for pLUTo with different trade-offs in area cost, energy efficiency, and performance, depending on the system designer's needs.

• We evaluate pLUTo using a set of real-world cryptography, image processing and neural network workloads. We compare against state-of-the-art GPU implementations and find that pLUTo outperforms the baseline CPU and GPU implementations by up to 33× and 8×, respectively, while simultaneously achieving energy savings of 110× and 80×.

2. Background

In this section we describe the hierarchical organization of DRAM and provide an overview of relevant prior work.

2.1. DRAM Background

A DRAM chip contains multiple memory banks (8 for DDR3, 16 for DDR4), and I/O circuitry. Each bank is organized into subarrays of DRAM cells; each cell consists of an access transistor and a capacitor, which stores a single bit (0 or 1) in the form of stored electrical charge. The memory cell transistor connects the capacitor to the bitline wire. Each bitline is shared by all the memory cells in a column, and connects them to a sense amplifier. The set of sense amplifiers in a subarray makes up the local row buffer.

Figure 1: The internal organization of DRAM banks.

Reading and writing data in DRAM occurs over three phases: 1) Activation, 2) Reading/Writing, 3) Precharging. During Activation, the wordline of the accessed row is driven high. This turns on the row's access transistors and creates a path for charge to be shared between each memory cell and its bitline. This charge sharing process induces a fluctuation (δ) in the voltage level of the bitline, which is originally set at VDD/2. If the cell is charged, the bitline voltage becomes VDD/2 + δ. If the cell is discharged, the bitline voltage becomes VDD/2 − δ. To read the value of the cell, the sense amplifiers in the local row buffer amplify the fluctuation (±δ) induced in the bitline during Activation. Simultaneously, the desired charge level is restored to the capacitor in the memory cell. After reading, the data is sent to the host CPU through the DRAM chip's I/O circuitry and the system memory bus. During Precharging, the access transistors are turned off, and the voltage level of all the bitlines is reset to VDD/2. This ensures the correct operation of subsequent activations.

2.2. DRAM Extensions

pLUTo optimizes key operations by incorporating the following previous proposals for enhanced DRAM architectures.

Inter-Subarray Data Copy. The LISA-RBM (Row Buffer Movement) operation, introduced in [28], copies the contents of a row buffer to the row buffer of another subarray, without making use of the external memory channel. This is achieved by linking neighboring subarrays with isolation transistors. LISA-RBM commands are issued by the memory controller. The total area overhead of LISA is 0.8%.

Subarray-Level Parallelism. MASA [68] is a mechanism that introduces support for subarray-level parallelism by allowing multiple subarrays within the same bank to be activated and accessed concurrently.
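To tie the background together, the following toy functional model is our own illustration (class names, sizes, and method names are ours, and all analog behavior is abstracted away). It captures the structures referred to above: subarrays with local row buffers that latch a row on activation and are cleared on precharge (Section 2.1), and a LISA-RBM-style copy that moves one subarray's row buffer into a neighboring subarray's without using the external memory channel (Section 2.2).

```python
# A toy functional model of the DRAM structures described above: subarrays
# with local row buffers, an activate/precharge cycle, and a LISA-RBM-style
# row-buffer-to-row-buffer copy between subarrays that never touches the
# external memory channel. Sizes and class names are illustrative only.

class Subarray:
    def __init__(self, rows=4, row_bits=8):
        self.rows = [[0] * row_bits for _ in range(rows)]
        self.row_buffer = None          # contents of the local sense amplifiers

    def activate(self, row_idx):
        """Sense a row: charge sharing followed by amplification and restore."""
        self.row_buffer = list(self.rows[row_idx])

    def write_back(self, row_idx):
        """Overwrite a row with the current row buffer contents."""
        self.rows[row_idx] = list(self.row_buffer)

    def precharge(self):
        """Reset the bitlines (here: drop the latched row) before the next activation."""
        self.row_buffer = None

def lisa_rbm_copy(src: Subarray, dst: Subarray):
    """LISA-RBM-style copy: move one row buffer into a neighboring subarray's."""
    dst.row_buffer = list(src.row_buffer)

# Copy row 2 of subarray A into row 0 of subarray B, entirely "in DRAM".
a, b = Subarray(), Subarray()
a.rows[2] = [1, 0, 1, 1, 0, 0, 1, 0]
a.activate(2)
lisa_rbm_copy(a, b)
b.write_back(0)
a.precharge(); b.precharge()
assert b.rows[0] == [1, 0, 1, 1, 0, 0, 1, 0]
```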