Understanding and Improving the Latency of DRAM-Based Memory Systems
Total Page:16
File Type:pdf, Size:1020Kb
Understanding and Improving the Latency of DRAM-Based Memory Systems Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical and Computer Engineering Kevin K. Chang M.S., Electrical & Computer Engineering, Carnegie Mellon University B.S., Electrical & Computer Engineering, Carnegie Mellon University arXiv:1712.08304v1 [cs.AR] 22 Dec 2017 Carnegie Mellon University Pittsburgh, PA May, 2017 Copyright ©2017, Kevin K. Chang Abstract Over the past two decades, the storage capacity and access bandwidth of main memory have improved tremendously, by 128x and 20x, respectively. These improvements are mainly due to the continuous technology scaling of DRAM (dynamic random-access memory), which has been used as the physical substrate for main memory. In stark contrast with capacity and bandwidth, DRAM latency has remained almost constant, reducing by only 1.3x in the same time frame. Therefore, long DRAM latency continues to be a critical performance bot- tleneck in modern systems. Increasing core counts, and the emergence of increasingly more data-intensive and latency-critical applications further stress the importance of providing low-latency memory access. In this dissertation, we identify three main problems that contribute significantly to long latency of DRAM accesses. To address these problems, we present a series of new techniques. Our new techniques significantly improve both system performance and energy efficiency. We also examine the critical relationship between supply voltage and latency in modern DRAM chips and develop new mechanisms that exploit this voltage-latency trade-off to improve energy efficiency. First, while bulk data movement is a key operation in many applications and operating systems, contemporary systems perform this movement inefficiently, by transferring data from DRAM to the processor, and then back to DRAM, across a narrow off-chip channel. The use of this narrow channel for bulk data movement results in high latency and high en- ergy consumption. This dissertation introduces a new DRAM design, Low-cost Inter-linked SubArrays (LISA), which provides fast and energy-efficient bulk data movement across sub- arrays in a DRAM chip. We show that the LISA substrate is very powerful and versatile by demonstrating that it efficiently enables several new architectural mechanisms, including low-latency data copying, reduced DRAM access latency for frequently-accessed data, and reduced preparation latency for subsequent accesses to a DRAM bank. Second, DRAM needs to be periodically refreshed to prevent data loss due to leakage. Unfortunately, while DRAM is being refreshed, a part of it becomes unavailable to serve memory requests, which degrades system performance. To address this refresh interference problem, we propose two access-refresh parallelization techniques that enable more overlap- ping of accesses with refreshes inside DRAM, at the cost of very modest changes to the memory controllers and DRAM chips. These two techniques together achieve performance close to an idealized system that does not require refresh. Third, we find, for the first time, that there is significant latency variation in accessing different cells of a single DRAM chip due to the irregularity in the DRAM manufacturing process. As a result, some DRAM cells are inherently faster to access, while others are in- i herently slower. Unfortunately, existing systems do not exploit this variation and use a fixed latency value based on the slowest cell across all DRAM chips. To exploit latency variation within the DRAM chip, we experimentally characterize and understand the behavior of the variation that exists in real commodity DRAM chips. Based on our characterization, we propose Flexible-LatencY DRAM (FLY-DRAM), a mechanism to reduce DRAM latency by categorizing the DRAM cells into fast and slow regions, and accessing the fast regions with a reduced latency, thereby improving system performance significantly. Our extensive exper- imental characterization and analysis of latency variation in DRAM chips can also enable the development of other new techniques to improve performance or reliability. Fourth, this dissertation, for the first time, develops an understanding of the latency be- havior due to another important factor { supply voltage, which significantly impacts DRAM performance, energy consumption, and reliability. We take an experimental approach to understanding and exploiting the behavior of modern DRAM chips under different supply voltage values. Our detailed characterization of real commodity DRAM chips demonstrates that memory access latency can be reliably reduced by increasing the DRAM array supply voltage. Based on our characterization, we propose Voltron, a new mechanism that im- proves system energy efficiency by dynamically adjusting the DRAM supply voltage using a new performance model. Our extensive experimental data on the relationship between DRAM supply voltage, latency, and reliability can further enable developments of other new mechanisms that improve latency, energy efficiency, or reliability. The key conclusion of this dissertation is that augmenting DRAM architecture with sim- ple and low-cost features, and developing a better understanding of manufactured DRAM chips together lead to significant memory latency reduction as well as energy efficiency im- provement. We hope and believe that the proposed architectural techniques and the detailed experimental data and observations on real commodity DRAM chips presented in this dis- sertation will enable development of other new mechanisms to improve the performance, energy efficiency, or reliability of future memory systems. ii Acknowledgments The pursuit of Ph.D. has been a period of fruitful learning experience for me, not only in the academic arena, but also on a personal level. I would like to reflect on the many people who have supported and helped me to become who I am today. First and foremost, I would like to thank my advisor, Prof. Onur Mutlu, who has taught me how to think critically, speak clearly, and write thoroughly. Onur generously provided the resources and the open environment that enabled me to carry out my research. I am also very thankful to Onur for giving me the opportunities to collaborate with students and researchers from other institutions. They have broadened my knowledge and improved my research. I am grateful to the members of my thesis committee: Prof. James Hoe, Prof. Kayvon Fatahalian, Prof. Moinuddin Qureshi, and Prof. Steve Keckler for serving on my defense. They provided me valuable comments and helped make the final stretch of my Ph.D. very smooth. I would like to especially thank Prof. James Hoe for introducing me to computer architecture and providing me numerous pieces of advice throughout my education. I would like to thank my internship mentors, who provided the guidance to make my work successful for both sides: Gabriel Loh, Mithuna Thottethodi, Yasuko Eckert, Mike O'Connor, Srilatha Manne, Lisa Hsu, Zeshan Chishti, Alaa Alameldeen, Chris Wilkerson, Shih-Lien Lu, and Manu Awasthi. I thank the Intel Corporation and Semiconductor Research Corporation (SRC) for their generous financial support. During graduate school, I have met many wonderful fellow graduate students and friends whom I am grateful to. Members in SAFARI research group have been both great friends and colleagues to me. Rachata Ausavarungnirun was my great cubic mate who supported and tolerated me for many years. Donghyuk Lee was our DRAM guru who introduced expert DRAM knowledge to many of us. Saugata Ghose was my collaborator and mentor who assisted me in many ways. Hongyi Xin has been a good friend who taught me a great deal about bioinformatics and amused me with his great sense of humor. Yoongu Kim taught me how to conduct research and think about problems from different perspectives. Samira Khan provided me insightful academic and life advice. Gennady Pekhimenko was always helpful when I am in need. Vivek Seshadri is someone I aspire to be because of his creative and methodical approach to problem solving and thinking. Lavanya Subramanian was always warm and welcoming when I approached her with ideas and problems. Chris Fallin's critical feedback on research during the early years of my research was extremely helpful. Kevin Hsieh was always willing to listen to my problems in school and life, and provided me with the right advice. Nandita Vijaykumar was a strong-willed person who gave me unwavering advice. I am grateful to other SAFARI members for their companionship: Justin Meza, Jamie, Ben, HanBin Yoon, Yang Li, Minesh Patel, Jeremie Kim, Amirali Boroumand, and iii Damla Senol. I also thank graduate interns and visitors who have assisted me in my research: Hasan Hassan, Abhijith Kashyap, and Abdullah Giray Yaglikci. Graduate school is a long and lonely journey that sometimes hits you the hardest when working at midnight. I feel very grateful for the many friends who were there to help me grind through it. I want to thank Richard Wang, Yvonne Yu, Eugene Wang, and Hongyi Xin for their friendship and bringing joy during my low time. I would like to thank my family for their enormous support and sacrifices that they made. My mother, Lili, has been a pillar of support throughout my life. She is the epitome of love, strength, and sacrifice. I am grateful to her strong belief in education. My sister, Nadine, has always been there kindly supporting me with unwavering love. I owe all of my success to these two strong women in my life. I would like to thank my late father, Pei-Min, for his love and full support. My childhood was full of joy because of him. I am sorry that he has not lived to see me finish my Ph.D. I also thank my step father, Stephen, for his encouragement and support. Lastly, I thank my girlfriend Sherry Huang, for all her love and understanding. Finally, I am grateful to the Semiconductor Research Corporation and Intel for providing me a fellowship. I would like to acknowledge the support of Google, Intel, NVIDIA, Samsung, VMware, and the United States Department of Energy.