ABSTRACT

Title of dissertation: MODERN DRAM MEMORY SYSTEMS: PERFORMANCE ANALYSIS AND A HIGH PERFORMANCE, POWER-CONSTRAINED DRAM SCHEDULING ALGORITHM

David Tawei Wang, Doctor of Philosophy, 2005

Dissertation directed by: Associate Professor Bruce L. Jacob, Department of Electrical and Computer Engineering, and Institute for Advanced Computer Studies

The performance characteristics of modern DRAM memory systems are impacted by two primary attributes: device datarate and row cycle time. Modern DRAM device datarates and row cycle times are scaling at different rates with each successive generation of DRAM devices. As a result, the performance characteristics of modern DRAM memory systems are becoming more difficult to evaluate at the same time that they are increasingly limiting the performance of modern computer systems. In this work, a performance evaluation framework that enables abstract performance analysis of DRAM memory systems is presented. The performance evaluation framework enables the performance characterization of memory systems while fully accounting for the effects of datarates, row cycle times, protocol overheads, device power constraints, and memory system organizations.

This dissertation utilizes the described evaluation framework to examine the performance impact of the number of banks per DRAM device, the effects of relatively static DRAM row cycle times and increasing DRAM device datarates, power limitation constraints, and data burst lengths in future generations of DRAM devices. Simulation results obtained in the analysis provide insights into DRAM memory system performance characteristics including, but not limited to, the following observations.

• The performance benefit of having 16 banks over 8 banks increases with increasing datarate. The average performance benefit reaches 18% at 1 Gbps for both open-page and close-page systems.
• Close-page systems are greatly limited by DRAM device power constraints, while open-page systems are less sensitive to DRAM device power constraints.
• Increasing burst lengths of future DRAM devices can adversely impact cache-limited processors despite the increasing bandwidth. Performance losses of greater than 50% are observed.

Finally, this dissertation presents a unique rank hopping DRAM command-scheduling algorithm designed to alleviate the bandwidth constraints in DDR2 and future DDRx SDRAM memory systems. The proposed rank hopping scheduling algorithm schedules DRAM transactions and command sequences to avoid the power limiting constraints and amortizes the rank-to-rank switching overhead. Execution-based simulations show that some workloads are able to fully utilize the additional bandwidth, and significant performance improvements are observed across a range of workloads.

MODERN DRAM MEMORY SYSTEMS: PERFORMANCE ANALYSIS AND SCHEDULING ALGORITHM

by

David Tawei Wang

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2005

Advisory Committee:
Associate Professor Bruce L. Jacob, Chair
Associate Professor Shuvra S. Bhattacharyya
Associate Professor Tsung Chin
Associate Professor Donald Yeung
Associate Professor Charles B. Silio Jr.

© Copyright by David Tawei Wang 2005

Table of Contents

CHAPTER 1 Introduction
  1.1 Problem Description
  1.2 Contributions and Significance
  1.3 Organization of Dissertation
CHAPTER 2 DRAM Device: Basic Circuits and Architecture
  2.1 Introduction
  2.2 DRAM Device Organization
  2.3 DRAM Storage Cells
    2.3.1 Cell Capacitance, Leakage and Refresh
  2.4 DRAM Array Structures
  2.5 Differential Sense Amplifier
    2.5.1 Functionality of Sense Amplifiers in DRAM Devices
    2.5.2 Circuit Diagram of a Basic Sense Amplifier
    2.5.3 Basic Sense Amplifier Operation
    2.5.4 Voltage Waveform of Basic Sense Amplifier Operation
    2.5.5 Writing into DRAM Array
  2.6 DRAM Device Control Logic
    2.6.1 Mode Register Based Programmability
  2.7 DRAM Device Configuration
    2.7.1 Device Configuration Trade-offs
  2.8 Data I/O
    2.8.1 Burst Lengths and Burst Ordering
    2.8.2 N-bit Prefetch
  2.9 DRAM Device Packaging
  2.10 A 256 Mbit SDRAM Device
    2.10.1 SDRAM Device Block Diagram
    2.10.2 Pin Assignment and Functionality
  2.11 Process Technology and Scaling Considerations
    2.11.1 Cost Considerations
    2.11.2 DRAM-versus-Logic Optimized Process Technologies
CHAPTER 3 DRAM Memory System Organization
  3.1 Conventional Memory System
  3.2 Basic Nomenclature
    3.2.1 Channel
    3.2.2 Rank
    3.2.3 Bank
    3.2.4 Row
    3.2.5 Column
    3.2.6 Memory System Organization: An Example
  3.3 Memory Modules
    3.3.1 Single In-line Memory Module (SIMM)
    3.3.2 Dual In-line Memory Module (DIMM)
    3.3.3 Registered Memory Module
    3.3.4 Memory Module Organization
    3.3.5 Serial Presence Detect (SPD)
  3.4 Memory System Topology
    3.4.1 Direct RDRAM System Topology
CHAPTER 4 DRAM Memory Access Protocol
  4.1 Basic DRAM Commands
    4.1.1 Generic DRAM Command Format
    4.1.2 Summary of Timing Parameters
    4.1.3 Row Access Command
    4.1.4 Column Read Command
    4.1.5 Column Write Command
    4.1.6 Precharge Command
    4.1.7 Refresh Command
    4.1.8 A Read Cycle
    4.1.9 Complex Commands
  4.2 DRAM Command Interactions
    4.2.1 Consecutive Reads to Same Rank
    4.2.2 Consecutive Reads to Different Rows of Same Bank
    4.2.3 Consecutive Reads to Different Banks: Bank Conflict
    4.2.4 Consecutive Read Requests to Different Ranks
    4.2.5 Consecutive Write Requests: Open Banks
    4.2.6 Consecutive Write Requests: Bank Conflicts
    4.2.7 Write Request Following Read Request: Open Banks
    4.2.8 Write Following Read: Same Bank, Conflict, Best Case
    4.2.9 Write Following Read: Different Banks, Conflict, Best Case
    4.2.10 Read Following Write to Same Rank, Open Banks
    4.2.11 Read Following Write to Different Ranks, Open Banks
    4.2.12 Read Following Write to Same Bank, Bank Conflict
    4.2.13 Read Following Write: Different Banks Same Rank, Conflict: Best Case
  4.3 Minimum Scheduling Distances
  4.4 Additional Constraints: Power
    4.4.1 tRRD: Row to Row (activation) Delay