System-level Energy Optimisation Methodologies for DRAM Memory of Embedded Systems

by Su Myat Min Shwe

A Thesis Submitted in Accordance with the Requirements for the Degree of Doctor of Philosophy

School of Computer Science and Engineering
The University of New South Wales

November 2013

© Copyright by Su Myat Min Shwe, 2013. All Rights Reserved.


Thesis Publications

• S. M. Min, H. Javaid, A. Ignjatovic and S. Parameswaran. A Case Study on Exploration of Last-level Cache for Energy Reduction in DDR3 DRAM. In 2nd Mediterranean Conference on Embedded Computing, MECO 2013 & ECyPS'2013, Budva, Montenegro.

• S. M. Min, H. Javaid and S. Parameswaran. XDRA: Exploration and Optimization of Last-level Cache for Energy Reduction in DDR DRAMs. In Design Automation Conference, DAC'13, USA, June 2013.

• S. M. Min, H. Javaid and S. Parameswaran. RExCache: Rapid Exploration of Unified Last-level Cache. In Asia and South Pacific Design Automation Conference, ASP-DAC’13, Japan, Jan 2013.

• S. M. Min, J. Peddersen, and S. Parameswaran. Realising Cycle Accurate Processor-Memory Simulation via Interface Abstraction. In 24th International Conference on VLSI Design, VLSI'11, India, Jan 2011.

Contributions of this Thesis

• A novel interface abstraction layer between the processor and memory system is proposed to implement a cycle-accurate processor-memory system simulator, so that detailed statistics of the memory system, such as performance and power consumption, can be captured cycle-accurately.

• A novel methodology for estimating the execution time and energy consumption of the memory system is proposed.

• A rapid exploration framework is presented to quickly estimate a suitable last- level cache configuration which enables maximum power savings with negligible performance degradation of the memory system. This framework integrates the cycle-accurate processor-memory simulator, cache simulator and proposed execution time/energy estimators in order to greatly reduce the simulation time.

• An improved power mode controller to efficiently manage the DRAM power modes for DRAM energy reduction is presented.

• A DRAM energy reduction estimator, derived using a small number of cycle-accurate simulations, is proposed to obtain the DRAM system's energy savings accurately and rapidly for a specific last-level cache.

• An exploration framework is presented to explore the last-level cache design space for maximum DRAM energy reduction. The framework uses novel analysis techniques for computing the proposed DRAM energy reduction estimator's parameters which do not require cycle-accurate simulations of all the last-level cache configurations, and thus enables fast exploration of the large design space.

Acknowledgements

I would like to express my deepest gratitude to my supervisor Prof. Sri Parameswaran for his continuous support, patience, motivation, constant encouragement, and immense knowledge. Without his insightful guidance and kind support, this dissertation would not have been possible. I greatly appreciate all the support he has provided to me throughout my candidature.

I would also like to express my deepest appreciation to my dear husband, Win. Without his love, constant motivation, understanding, and endless support, I would not have made it this far. I am greatly indebted to him and I cannot find words to express my gratitude to him.

Special gratitude goes to Dr. Haris Javaid for his kind support, guidance, helpful suggestions and sharing of his great technical knowledge. I owe him my heartfelt appreciation.

I would also like to thank my joint-supervisor, Dr. Aleksander Ignjatovic, for his support and sharing of mathematical knowledge related to my research. Many thanks to the academic committee members for reviewing my research: Dr. Oliver Diessel, Dr. Bruno Gaeta, Dr. Annie Guo, and Dr. Hui Wu. Their comments and feedback always guided my research in the right direction. My sincere thanks go to Dr. Jorgen Peddersen for his valuable comments and useful advice. I am also truly grateful to all the members of the Embedded Systems group: Dr. Jude Angelo Ambrose, Liang, Tuo, Josef, Haseeb, Babak, Dr. Xin He and Dr. Krutartha Patel for their cooperation, continuous moral support, and all forms of help throughout the study. I would also like to extend my thanks to members of the Computer Science and Engineering Department, UNSW, for the various forms of support I received over my candidature.

Furthermore, I would like to take this opportunity to thank my lovely sister, Dr. Thazin Aung, for being supportive and for her invaluable suggestions throughout my life. I am lucky to be her sister and I will forever be thankful to her. Last,

but not least, I would like to thank my family for their unconditional love and encouragement. My humble apologies to anyone whose name I might not have mentioned here, but I appreciate your support from the bottom of my heart. To all, whom I have mentioned and whom I have forgotten to mention, I would like to dedicate this work.

Abstract

Managing power/energy consumption in complex SoC (System on Chip) systems and Application Specific Instruction set Processors (ASIPs) is emerging as a major concern in the design of embedded systems. In these systems, especially in battery-operated portable devices, the performance of the system is measured not only by the speed and functionality that the system provides but also by the lifetime of the battery, which is directly proportional to the power/energy consumption of the system. Among the different components of the system, DRAM is one of the largest power consumers. The increasing demand for long battery life requires power/energy aware methodologies and a comprehensive design process flow to optimise the DRAM power/energy consumption of such power-hungry devices.

For power/energy estimation purposes, a high level system simulation guided approach is necessary, because RTL/gate-level performance and power estimation is a time consuming process. A two-step simulation approach (in which memory trace sequences are captured with a processor simulator or a hardware-assisted approach in the first step, and the collected traces are used in a second-step memory system simulation) yields inaccurate results due to the lack of feedback from one memory request to the next. This thesis presents a design methodology with a seamless interface layer that glues the processor component and memory component together to build a one-step system level processor-memory simulation framework, so that every memory request from the processor component can be sent directly to the memory component for on-the-fly memory simulation. Over six mediabench benchmarks, our one-step simulation approach provides greater accuracy than trace-driven memory simulation, which exhibited an 80% variation in the choice of fixed memory latency required to achieve the most accurate power consumption.

Exploiting the last-level cache is a well-known technique that reduces the DRAM memory traffic. The last-level cache is inserted just before the DRAM level in the

memory hierarchy design in order to improve the performance of the system. Estimating the performance improvement for a last-level cache configuration (cache size, cache line size and associativity) with a cycle-accurate simulation approach is exorbitantly slow. Thus, cycle-accurate simulation of a large last-level cache configuration design space is not a feasible option for obtaining highly accurate estimates. This thesis introduces a technique to rapidly determine the performance and energy consumption of the whole system for different last-level cache configurations. The proposed technique utilises a combination of a one-time slow cycle-accurate simulation and a large number of fast trace-based simulations for all the configurations, and thus reduces the total simulation time (in the largest case, for the h264 Enc application, from 257 days to 21 hours). Our methodology significantly reduces the turnaround time to obtain highly accurate execution time numbers with reasonably accurate energy numbers (average absolute accuracy of 99.74% in execution time and 80.31% in energy consumption for nine multimedia applications).

DRAM's energy consumption is a very important component of the total energy consumption of a system design. Exploiting both the last-level cache and DRAM's power modes together creates an opportunity to reduce the energy of the DRAM memory system. However, the increase or reduction in energy consumption depends on the application's request pattern and the last-level cache configuration. Selecting a suitable last-level cache configuration (from a large design space) for a target application to obtain the maximum energy savings takes a great amount of time. Thus, we developed a design framework and an energy reduction estimator to quickly explore for a suitable configuration for maximum DRAM energy reduction. First, we analysed the energy increase/reduction of test configurations chosen with Latin Hypercube Sampling, a well-known design-of-experiments technique. Based on this analysis, we proposed an energy reduction estimator that captures the dependence of the memory system's energy reduction on certain parameters such as memory traffic, power mode switching time, etc. The energy

reduction estimator of the DRAM system is modelled by capturing the relationship between the energy reduction and highly correlated DRAM parameters, using the Kriging prediction method. We show that our technique is able to predict the DRAM energy reduction for 330 last-level cache configurations in several days (with an average error within 4.4%) for 11 applications from the mediabench and SPEC2000 suites, whereas cycle-accurate simulation took several months.

Contents

Statement of Originality
Copyright Statement
Authenticity Statement
Thesis Publications
Contributions of this Thesis
Acknowledgements
Abstract
Table of Contents
List of Tables
List of Figures

1 Introduction
1.1 The Importance of Memory System
1.2 SoC Design Issues for Battery-operated Devices
1.3 Energy Awareness in DRAM Memory System
1.4 Design Automation with SoC Design Challenges
1.5 The Need for High Level Modelling
1.6 Simulation-assisted Analytical Modelling
1.7 Research Aims and Contributions
1.8 Thesis Overview

2 DRAM Background
2.1 Main Memory System Types
2.2 Overview of DRAM
2.3 DRAM Execution Time and Power Consumption
2.4 DRAM Controller Organisation
2.5 DDR3 SDRAM
2.5.1 DDR3 Device Configuration

3 Literature Review
3.1 Introduction
3.2 High Level System Simulation
3.2.1 Instruction Set Simulation
3.2.2 Memory Simulation
3.2.3 System Simulation
3.2.4 The Need for Execution-Driven System Simulation
3.2.5 Summary of System Simulation Research
3.3 Exploitation of Last-level Cache
3.3.1 Last-level Cache Exploration
3.3.2 DRAM Performance Improvement
3.3.3 DRAM Energy-Aware Scheme
3.3.4 Summary: Exploitation of Last-level Cache Work
3.4 DRAM Power/Energy Management
3.4.1 Compiler-directed DRAM Power/Energy Management
3.4.2 OS-level DRAM Power/Energy Management
3.4.3 System-level DRAM Power/Energy Management
3.4.4 DRAM Power/Energy Estimation
3.4.5 Summary of DRAM Power/Energy Management Research

4 Interface Abstraction Layer
4.1 Introduction
4.2 Motivation
4.3 Contribution
4.4 Background
4.4.1 Processor Simulator Component
4.4.2 Memory Simulator Component
4.5 Proposed Integration Methodology
4.5.1 Case Study
4.6 Experimental Tests and Results
4.7 Summary

5 Rapid Exploration of Unified Last-level Cache
5.1 Introduction
5.2 Problem Statement
5.3 RExCache Framework
5.3.1 Application Trace Generation
5.3.2 Cache Simulation
5.3.3 Cache Exploration
5.4 Experimental Methodology
5.5 Results and Analysis
5.6 Advantages and Limitations
5.7 Summary

6 Effects of Last-level Cache Configurations
6.1 Introduction
6.2 Power Mode Controller
6.3 Results
6.4 Fast Design Space Exploration
6.5 Summary

7 Energy Reduction in DDR DRAMs
7.1 Introduction
7.2 Problem Statement
7.3 DRAM Energy Reduction Estimator
7.3.1 Kriging Model
7.4 XDRA Framework
7.5 Experiments and Results
7.6 Advantages and Limitations
7.7 Summary

8 Conclusions
8.1 Future Work

Bibliography


List of Tables

2.1 DDR3 1 Gb Device Configuration from Micron [1]
2.2 256MB and 4GB DDR3 DRAM Device Configuration
4.1 Configuration Settings of Processor and Memory
4.2 Average Simulation Time [seconds]
4.3 Total Execution Time [clock cycles]
4.4 Average Power Consumption [mW]
5.1 Number of Memory Accesses for Last-level Cache Policies
5.2 Detailed Analysis of Execution-Time and Energy Estimators [s/m/d in Simulation Time represents seconds/minutes/days respectively]
5.3 Cache Configurations w.r.t. Minimum Execution Time and Minimum Energy Consumption from RExCache
5.4 Exploration Time Comparison of Cycle-accurate Simulations, Traditional Method and RExCache
6.1 Power Modes of Micron DDR3 DRAM. SB and PD stand for StandBy and PowerDown respectively
6.2 Optimal Cache Configurations and their Area Footprints
7.1 Power Modes of Micron DDR3 DRAM. SB and PD stand for StandBy and PowerDown respectively
7.2 L2 Cache Configurations with Maximum DRAM Energy Reduction (BC PD) from XDRA for different DRAM sizes. The numbers in parentheses are area footprints in mm2
7.3 Time Comparison of Cycle-accurate Processor-memory Simulator and XDRA for 256MB DRAM
7.4 Time Comparison of Cycle-accurate Processor-memory Simulator and XDRA for 4GB DRAM

List of Figures

1.1 Embedded Systems and General Purpose Computing System Shipments
1.2 Typical Embedded System
1.3 Power Consumption Trends for SoC
2.1 Evolution of DRAM Architecture from Conventional DRAM through the state-of-the-art DDR3 DRAM
2.2 DRAM Organisation and Terminology
2.3 Different Row Buffer Policies of DRAM with Latency Effect
2.4 Overview of Memory Controller Design
2.5 DRAM Address Mapping
2.6 Detailed Memory Controller Design
3.1 A Taxonomy of Architecture Simulation Tools from Different Aspects
3.2 Write-induced Interference
3.3 Loop Fusion Transformation
3.4 Loop Fission Transformation
3.5 Loop Tiling Transformation
4.1 Available Power Modes and Approximate Power Consumption in Specific Power Mode of DDR3 SDRAM [2]
4.2 Memory Latency [clock cycles] to obtain the closest values to correct Execution Time and Average Power Consumption
4.3 Detailed Simulation Framework
4.4 Interface Abstraction Layer
4.5 Applied IAL simulator framework in building a Functional, Cycle-accurate Processor-memory Simulator
4.6 Error Rate for Total Execution Time
4.7 Error Rate for Average Power Consumption
5.1 Execution Time and Energy Consumption of different L2 (last-level) Cache Configurations for g721 Enc application's execution on Target System of Figure 5.2
5.2 An example of a Target System
5.3 RExCache Framework. Dotted-lined rectangles and broken arrows show our novel contributions
5.4 An example of LCI period for Target System of Figure 5.2 (L2 is the last-level cache)
5.5 An example of Execution Time estimation for Target System of Figure 5.2 (L2 is the last-level cache)
5.6 Execution Time of different Cache Configurations normalised to Common Cache (CC) Configuration
5.7 Energy of different Cache Configurations normalised to Common Cache (CC) Configuration
6.1 An example of Target System
6.2 Energy Saving and Performance Degradation of PMC system w.r.t. NoPMC system
6.3 Mixed Impact of Poorly Chosen Cache Configuration
7.1 Power Consumption Breakdown in a Uniprocessor System with on-chip L1 cache, off-chip L2 cache and DRAM memory
7.2
7.3 Total Cache and DRAM Energy Reduction for different L2 (last-level) Cache Configurations
7.4 Effect of two distinct L2 (last-level) Cache Configurations on DRAM idle periods. PD stands for PowerDown
7.5 An example of Target System
7.6 Correlation Coefficients of most common Parameters, averaged over 2 DRAM sizes and 11 Applications
7.7 XDRA Framework. Dotted-lined rectangles and broken arrows show our novel contributions
7.8 An example of LCI period for Target System of Figure 7.5 (L2 is the last-level Cache)
7.9 An example of calculating PDcycles and PDCnt
7.10 Average Error in estimated Energy Reduction for 256MB DRAM
7.11 Average Error in estimated Energy Reduction for 4GB DRAM
7.12 Error in Energy Reduction from Cache Configurations selected under differing Area Constraints for 256MB DRAM
7.13 Error in Energy Reduction from Cache Configurations selected under differing Area Constraints for 4GB DRAM
7.14 Normalised DRAM Energy Consumption Breakdown of adpcmEnc for different L2 caches and DRAM sizes
7.15 Normalised DRAM Energy Consumption Breakdown of adpcmDec for different L2 caches and DRAM sizes
7.16 Normalised DRAM Energy Consumption Breakdown of jpegEnc for different L2 caches and DRAM sizes
7.17 Normalised DRAM Energy Consumption Breakdown of jpegDec for different L2 caches and DRAM sizes
7.18 Normalised DRAM Energy Consumption Breakdown of g721Enc for different L2 caches and DRAM sizes
7.19 Normalised DRAM Energy Consumption Breakdown of g721Dec for different L2 caches and DRAM sizes
7.20 Normalised DRAM Energy Consumption Breakdown of mpeg2Enc for different L2 caches and DRAM sizes
7.21 Normalised DRAM Energy Consumption Breakdown of mpeg2Dec for different L2 caches and DRAM sizes
7.22 Normalised DRAM Energy Consumption Breakdown of spec vpr for different L2 caches and DRAM sizes
7.23 Normalised DRAM Energy Consumption Breakdown of spec bzip2 for different L2 caches and DRAM sizes
7.24 Normalised DRAM Energy Consumption Breakdown of spec gzip for different L2 caches and DRAM sizes

Chapter 1

Introduction

The use of embedded systems is pervasive in almost all modern devices. Embedded systems are widespread in automotive electronics, aircraft electronics, trains, telecommunication, medical systems, military applications, authentication systems, consumer electronics, fabrication equipment, smart buildings, robotics and many other areas, which demonstrates the huge variety of embedded systems in our daily environment.

It has been estimated that approximately 79% of all processors are used in embedded systems [3]. In 2010, over 16 billion embedded devices were sold [4]. The market is estimated to grow at an annual rate of 15% and to reach more than 40 billion embedded devices by 2020 [4]. Figure 1.1 illustrates the trend in shipment units of embedded systems and general purpose computing systems, as recently reported by the International Data Corporation (IDC) research firm [5]. BCC Research estimated the market for embedded technology at $113 billion in 2010 and expected it to grow at 7% annually over the following five years, reaching $158.6 billion by 2015 [6].

Embedded systems are the origins of ubiquitous computing [7] (providing "information anytime, anywhere") or pervasive computing [8] (focusing on practical aspects and exploitation of already available technology) [3]. Ubiquitous computing systems are now invading every aspect of our daily life and are profoundly driving the development of embedded systems.


Figure 1.1: Embedded Systems and General Purpose Computing System Shipments, sourced from [5]

Unlike a general purpose system, an embedded system is highly specialised to the application domain and designed to perform a specific task or a class of tasks. A general purpose computing system is a combination of generic hardware and a general purpose operating system for executing a variety of applications, whereas an embedded system is a combination of special purpose hardware and an embedded OS/firmware for executing a specific set of applications. Hence, the design of an embedded system can be optimised to the application's needs.

Embedded systems are more constrained in hardware usage and/or software functionality than a general purpose system. Hardware constraints include processing performance, power consumption, memory, display size, hardware functionality, and so on. Software limitations include scaled-down applications, no operating system (OS) or a limited OS, less abstraction-level code, and so on. Besides the functional constraints, many other design constraints of embedded systems, such as timeliness, fault tolerance, security and safety, vary from one application domain to another. For example, real-time constraints are extremely important in aircraft electronics systems, and high performance constraints are necessary for consumer electronics systems such as digital cameras and game consoles. In other embedded systems, such as telecommunication devices like mobile phones, low power design constraints are a must.

A typical embedded system, shown in Figure 1.2, is usually constituted by one or more dedicated chip controllers such as a microprocessor (Intel 80486 [9]), a microcontroller (Microchip's PIC32 [10]), a Field Programmable Gate Array (FPGA) device (Xilinx's Spartan-6 [11]), a Digital Signal Processor (DSP) (TI's TMS320C6000 [12]) or an Application Specific Integrated Circuit (ASIC) (for example, a custom-made IC designed to monitor a human's heart beat). The chip controller and all I/O communications are controlled by firmware which typically resides in the main memory.

Figure 1.2: Typical Embedded System (a system core such as an FPGA, ASIC, DSP, SoC or embedded microprocessor/microcontroller, connected to memory, a communication interface (USB), input ports (sensors), output ports (actuators), and other integrated circuits and subsystems), adapted from [3]

Rapidly improving silicon technology allows us to integrate one or more processors on a single chip along with memories, application specific circuits, and interconnect infrastructure. Ever-increasing computational demands from the application side, together with the requirement to integrate numerous components onto a single integrated chip, make systems-on-chip (SoCs) necessary. Many of today's embedded systems are based on SoC platforms [13]. An example of a SoC-based embedded system is one of today's most successful embedded devices, the mobile phone.

1.1 The Importance of Memory System

In today's advanced SoC embedded systems (from general purpose microprocessors to customised application specific systems), memory (whether static RAM or dynamic RAM) plays a dominant role in determining the performance, power consumption and cost of the system [14–16]. According to Moore's law, processor performance increases on average by 60% annually; however, memory performance increases by only roughly 10% annually [17]. This processor-memory performance gap leads to the memory wall problem [18], which limits system performance because memory performance cannot keep up with that of the processor. As memory accesses become slower with respect to the processor speed, and consume more power with increasing memory size, the consideration of memory performance, energy consumption and area cost becomes increasingly important. In modern computing systems, roughly 90% of a computer system's time is spent not computing but waiting for the memory system [19]. This delayed response of the memory system indicates that the performance of the whole system is influenced by the latency of the memory system. The memory system (on-chip or off-chip) also heavily dominates the power consumption (50-70% of the total system power [15, 20]) in the design of electronic systems, especially embedded systems. Moreover, memory occupies a significant fraction of the available die area in nearly all SoC designs, as reported in [21]. Obviously, optimisation of an SoC design under various design constraints must take the memory system into consideration. Overall, capturing the correct memory system behaviour is the most important factor not only for processor performance evaluations but also for determining the performance of the whole system. Since the application is known a priori in an embedded system, it is possible to customise the memory system for the specific application under the desired constraints.

1.2 SoC Design Issues for Battery-operated Devices

Recent trends, motivated by user preferences towards carrying less, have focused on portable and feature-rich devices. The rapid emergence of portable consumer devices (such as cell phones, pagers, MP3 players, PDAs, laptops, cameras, camcorders and portable GPS units) places greater emphasis on the power/energy consumption of embedded electronic systems.

Figure 1.3 shows the trends of power consumption and power requirements for portable SoCs over time. The widening gap between the power consumption trend and the power requirements trend shows that it is necessary to reduce the power consumption of SoC designs. Consequently, power/energy consumption has become one of the most critical design constraints for SoCs due to limited battery lifetime, heat dissipation, size constraints (which directly impact static power), and cost [22]. Low power/energy optimisations have been performed at different stages of the design flow: from the high level (associated with the functional/behavioural description) to RTL (including registers and logical operations), logic/gate level (with flip-flops and logic cells) and layout level (related to physical layout connections).

Figure 1.3: Power Consumption Trends for SoC, sourced from [27]

According to the authors of [23, 24], more power-saving opportunities exist at the high (system) level of design abstraction, since small changes at the lower levels require more effort to validate the design, and thus increase the complexity of optimisation.

Targeting energy-constrained, battery-equipped systems, energy (power consumption over a time period) minimisation is crucial. Even though energy efficiency is a dominant factor for battery-powered devices, performance and area footprint have to be considered as well and traded off as necessary. Design metrics such as the energy-delay product are sometimes used to evaluate alternative designs that are equivalent in total energy dissipation but differ in performance [25, 26]. The area metric has a huge impact on profit margins due to manufacturing costs, as well as a proportional effect on power consumption. Specifically, today's SoC design practices increasingly focus on the requirements of the end application and target markets, rather than just overall system optimisation.

1.3 Energy Awareness in DRAM Memory System

High storage density and low manufacturing costs are making DRAM (Dynamic Random Access Memory) a primary option for today's SoC memory design. This is especially true for data-hungry products such as multimedia-rich consumer SoCs, due to the ever-increasing demand for enormous amounts of data storage. Generally, the larger the DRAM size, the higher the power consumption of the memory system, and thus there is a growing need to improve DRAM energy efficiency. DRAM power can be categorised into two groups: static power, representing leakage current, and dynamic power, accounting for switching activity. Static power (also known as leakage power) is dissipated regardless of activity, whereas dynamic power (also known as switching power) is consumed when memory access patterns (such as read/write operations) are applied to DRAM. Since static power is always being dissipated, performance (execution time) improvement will lead to system-wide energy savings. Hence, the energy efficiency of the system can be enhanced either by improving energy-aware performance efficiency or by minimising the DRAM power consumption of its memory system [16]. Much research has recently been done to optimise the energy consumption of DRAM by minimising static/dynamic power consumption in various domains such as the compiler domain [28, 29], the operating system (OS) domain [30, 31], and the system architecture domain [32, 33].

Compiler-directed approaches statically analyse application code to detect memory idle periods in addition to data access patterns, in order to optimally place both code and data in DRAM. OS-level approaches, on the other hand, manage memory traffic at the kernel layer through page migration, power-aware page allocation and similar policies. Both compiler-level and OS-level DRAM power optimisation work at a much coarser granularity than the system architecture approach. The system architecture approach has greater control over finer-grained components to optimise the characteristics of the system memory such as performance, area, and energy consumption. As a result, many researchers have worked on system-level optimisation techniques to reduce DRAM energy consumption by exploiting the underlying memory architecture and utilising power-aware mechanisms. Examples of such DRAM system-level optimisation approaches include memory controller optimisation [34], memory-aware cache algorithm modification [35] and DRAM design scheme changes [33]. Memory controller optimisation consists of memory access scheduling [36], memory request reordering [37], power-saving mechanisms that utilise DRAM's different power modes [38], etc. Cache algorithm modifications typically design cache replacement policies with awareness of DRAM characteristics. Finally, DRAM design scheme changes include modifications to the microarchitecture of the DRAM chip itself. Additionally, tuning caches for a given application is also a good way to reduce power consumption, since the power consumption of the memory system is application dependent. All these techniques improve DRAM energy efficiency through energy-aware performance speedup or by explicitly controlling the reduction of DRAM power consumption.

1.4 Design Automation with SoC Design Challenges

The ever-shortening time-to-market is critical for success in today's SoC market. One of the key enablers for reducing time-to-market with efficient SoC design is implementing the system with a major advance in productivity. To improve SoC design productivity, many reusable hardware blocks or intellectual properties (IPs) are being utilised [39–41]. For example, the IBM Blue Logic Library [42] and CoreConnect Architecture [43] provide reusable designed cores and interconnection modules respectively, which enable SoC designers to rapidly realise a system. Even though the utilisation of reusable components can shorten the design cycle time to some extent, making the right choices of design components can be difficult. The right hardware components and the best interconnection scheme must be selected in consideration of functional and other constraints such as performance, power consumption, area cost, etc. Early design decisions are crucial in the SoC development process, since design errors can be costly and making changes later in the development cycle becomes more problematic and expensive. As a result of increased design complexity over the years, reduced price demands and rapid deployment, design automation is increasingly important for obtaining quantitative insights before deploying a design on the target.

Different types of prediction techniques (heuristic, analytical and simulation approaches) are applied in the design automation process [44]. Heuristic methods [45] use general guidelines or empirical studies to model the design process. Analytical approaches [46, 47] employ mathematical formulations to predict system behaviour. Finally, simulation systems [48] model the execution of the design activity to capture detailed system behaviour before the actual system is constructed. Functional and timing correctness can be ensured by means of simulation and modelling.

1.5 The Need for High Level Modelling

Most SoC designers use hardware modelling at the register transfer level (RTL), where the design is synthesised at the gate level [49]. However, a gate-level hardware implementation (RTL description) is often very large and its simulation requires a considerable amount of time. In the RTL design process, register requirements, signal integrity, chip size, pin assignments and other details must be identified. Additionally, various RTL design issues (such as concurrency, synchronisation and scheduling) make RTL modelling tedious and difficult. These factors increase the complexity of RTL modelling at an early analysis phase of the SoC design process. Thus, system level modelling (modelling at a higher level of abstraction in advance of RTL design) is adopted due to the demand for shortened design turnaround time, reduced time-to-market pressure, lower cost requirements, quick functional verification, and easy architectural simulation/exploration. Additionally, higher abstraction-level modelling allows verification to discover problems much earlier in the design process. Typically, designers begin by developing a performance model of the target architecture (system level), followed by the actual logic design (RTL). After that, the logic specifications are converted into circuits by a circuit designer, and the circuits are then positioned into the SoC floor plan by a layout engineer [48]. At the system level, the models are programmed in C/C++, whereas hardware designs use Verilog/VHDL languages. Basically, system level architecture simulation is generally used for system analysis to achieve the desired goal with the specified constraints by exploring different design alternatives in the early stages of the design cycle [50].

1.6 Simulation-assisted Analytical Modelling

Architecture simulators can be classified as functional simulators and timing/performance simulators. Functional simulators emphasise the functionalities of the modelled components, while timing/performance simulators accurately incorporate the timing features of the target modules in addition to their functionalities. Timing simulators model the target architecture on a cycle-by-cycle basis, and thus the accuracy of the model is higher in timing/performance simulations. Cycle-accurate simulations are extensively used due to the need for detailed study of the target model before building the prototype [51]. In terms of input sets, a simulator is classified as trace-driven (address traces of the execution are fed into the simulator) or execution-driven (the application is directly fed into the simulator). Trace-driven simulation is mostly used in memory system simulations, in which the trace, captured from a processor simulator or hardware device, is used as the input. Though trace-driven simulation is much simpler and easier to understand, it lacks the timing feedback information which is necessary to accurately predict the execution time. Therefore, cycle-accurate execution-driven simulation is needed to properly evaluate system performance [52]. However, the more detailed the model, the slower the simulation, especially for large-scale workloads and the exploration of large design spaces. In such cases, simulation-assisted analytical modelling is used to estimate the model accurately as well as to speed up the modelling time. Analytical models alone cannot capture the detailed system architecture, and hence the accuracy of an analytical model may not be sufficient for design decisions. In simulation-assisted analytical modelling, cycle-accurate simulation is necessary only for a partial design space, to expose information which is then utilised in the analytical model. Such a technique involves capturing the activity by running a few detailed cycle-accurate simulations and feeding the results to the analytical estimation technique. Examples of this technique are statistical regression based power modelling [41] and estimation of power consumption with an efficient power model [53].
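To make the simulation-assisted analytical modelling idea concrete, the sketch below fits a simple linear power model to a handful of cycle-accurate samples and then predicts power for configurations that were only run through a fast simulator. It is a minimal illustration only: the feature names (reads, writes, row activations, idle cycles), the sample values and the least-squares model are assumptions for this example, not the specific models used in [41] or [53].

import numpy as np

# Activity counters from a few slow cycle-accurate simulations
# (hypothetical features: reads, writes, row activations, idle cycles).
X = np.array([
    [1.2e6, 0.4e6, 0.20e6, 3.0e6],
    [0.8e6, 0.3e6, 0.15e6, 4.1e6],
    [2.1e6, 0.9e6, 0.35e6, 1.9e6],
    [1.6e6, 0.6e6, 0.28e6, 2.5e6],
    [0.5e6, 0.2e6, 0.09e6, 5.2e6],
    [1.9e6, 0.7e6, 0.31e6, 2.2e6],
])
# Average power (mW) reported by those same cycle-accurate runs (made-up values).
y = np.array([310.0, 265.0, 402.0, 351.0, 231.0, 384.0])

# Fit power ~ w0 + w1*reads + w2*writes + w3*activations + w4*idle by least squares.
A = np.hstack([np.ones((X.shape[0], 1)), X])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_power_mw(reads, writes, activations, idle):
    """Estimate average power for a configuration explored only with a
    fast trace-driven simulator (no cycle-accurate run needed)."""
    return float(coeffs @ np.array([1.0, reads, writes, activations, idle]))

print(predict_power_mw(1.0e6, 0.5e6, 0.22e6, 3.2e6))

The same pattern (a few detailed simulations feeding an analytical predictor) underlies the estimators developed later in this thesis, although they use different model forms.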

1.7 Research Aims and Contributions

The aim of this thesis is to improve the energy efficiency of DRAM memory with highly accurate estimation techniques and a design methodology that quickly explore a large design space of last-level cache configurations. The key aspects of this thesis are:

1. Interface Abstraction Layer for High Level System Simulation. The complexity of SoC designs has increased drastically over time, which requires early testing and verification of the system in order to reduce time-to-market windows. However, testing the system directly on a physical prototype at an early stage of design is not a good option, due to the high cost and time involved in prototyping. Hence, there is a need for software simulation paradigms to explore a wide range of design alternatives rapidly and in a cost-effective manner at a very early design stage. These can substantially reduce the time-to-market pressure for building a system with the rich features

of functionality requirements. To simulate a processor with a memory system, a two-step approach is typically applied, where memory traces are captured with the processor simulator in the first step and processed with the memory simulator in the second step. In the trace generation (first step), a fixed memory latency and fixed memory power consumption are used for all memory requests, since the processor simulator has no knowledge of them. Using fixed memory latency and power consumption degrades the correctness of the memory simulation (which in turn affects the correctness of the whole system), because memory latency and power consumption vary from one memory request to another based on the state and configuration settings of the memory module. Thus, we describe a generic methodology for implementing a combined simulator which contains both the processor and memory models in order to provide accurate timing and memory power information. We show how an interface component can be added to communicate between a processor simulator and a memory simulator.

2. Performance/Energy Estimation of the System including Last-level Cache. Exploiting a last-level cache is a well-known technique for improving the performance of the system as well as its energy efficiency, by reducing expensive DRAM traffic. Cache exploration with slow cycle-accurate simulation is a very time-consuming process, since the design space of last-level cache configurations is typically very large (reaching up to 330 configurations per application in our experiments). An alternative is to use fast trace-driven simulation to produce cache hit and miss statistics for all the cache configurations, which are then used with an analytical model to estimate execution time and power consumption. However, cache statistics alone do not contain sufficient timing information for accurate estimation. Hence, we propose a simple, absolutely accurate execution time estimator and a simple, reasonably accurate energy estimator for a system whose processor module

uses a fixed-latency and fixed-power-consumption memory model. Additionally, we provide an exploration framework to quickly estimate the execution time and energy consumption of the system with distinct cache configurations, using a minimal number of slow full-system cycle-accurate simulations, by combining a cycle-accurate simulator, a trace-driven cache simulator and our novel execution time and energy estimators.

3. Efficient DRAM Power Mode Control with Last-level Cache Exploitation. With the availability of computationally complex embedded systems, the demand for larger DRAM sizes in these systems is growing. Along with increased DRAM sizes, memory power consumption is a major concern, especially for power-hungry portable devices. Although modern DDRx (DDR, DDR2, DDR3, etc.) DRAM devices include an internal default power management mechanism, it controls the power consumption of DRAM by setting the DRAM device into a low power consuming mode after the processing of each memory request. In this default mechanism, a high power mode switching overhead is incurred due to the consecutive power mode switching activities, since the DRAM device is required to switch back to the high power consuming mode (the active power mode) before processing the next memory request. We study a DRAM power mode control mechanism which sets the DRAM device into the lowest power consuming mode only after a predefined threshold period of DRAM idle time, so that the power mode switching overhead can be reduced. In order to create enough DRAM idle time to gain energy reduction benefits, the last-level cache is utilised to keep the DRAM device idle while memory requests are being satisfied from the last-level cache. We propose a DRAM power control algorithm to manage the transitions between the DRAM active state (for read/write operations) and the lowest power consuming state (a minimal sketch of such a threshold-based controller is shown after this list).

4. DRAM Energy Reduction Estimation Using a Regression Analysis Energy Model. DRAM energy consumption can be significantly reduced with last-level cache exploitation and our proposed efficient power mode control technique. However, the amount of energy reduction varies from one last-level cache configuration to another, as well as across target applications. An estimation technique is applied in order to determine the positive/negative energy reduction for a particular last-level cache configuration of a specific target application. One of the estimation methods, regression analysis, is capable of capturing the relationship between the energy reduction and certain DRAM parameters such as DRAM memory traffic, DRAM power mode switching time, DRAM idle time, etc. Our proposed highly accurate energy reduction estimator uses the regression-analysis-based Kriging model, which considers the spatial correlation between the current design point and the initial training data. The Kriging-based energy estimation model is built with training data collected from a small number of cycle-accurate simulations for a certain set of cache configurations.

5. Rapid Design Space Exploration of Last-level Cache Configurations for Maximum DRAM Energy Reduction. The energy reduction of the DRAM memory while utilising the last-level cache can be accurately determined with our proposed DRAM energy estimator. Some cache configurations for a specific application do not gain a significant DRAM energy reduction. Thus, exploration of all the last-level cache configurations is necessary to obtain the cache configuration which gives the maximum DRAM energy reduction. Such exploration of a large design space of last-level cache configurations is an exhaustive search, which would be a very time-consuming process. To solve this problem, we propose a methodology which utilises a one-time slow cycle-accurate simulation and fast trace-driven cache simulations of all the cache configurations, together with our novel DRAM energy reduction estimator. Our proposed design flow does not require cycle-accurate simulations

of all the cache configurations for computation of the estimator parameters, and thus enables fast exploration of last-level cache configurations.
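The following sketch illustrates the kind of threshold-based power mode controller described in contribution 3 above. It is a simplified, hypothetical model: the power mode names, the idle-cycle threshold and the transition bookkeeping are illustrative assumptions, not the exact controller implemented in this thesis.

# Hypothetical threshold-based DRAM power mode controller (illustrative only).
ACTIVE, POWER_DOWN = "ACTIVE", "POWER_DOWN"

class PowerModeController:
    def __init__(self, idle_threshold_cycles):
        self.idle_threshold = idle_threshold_cycles  # wait this long before powering down
        self.mode = ACTIVE
        self.idle_cycles = 0
        self.switch_count = 0  # number of mode transitions (a proxy for switching overhead)

    def tick(self, dram_busy):
        """Call once per DRAM clock cycle; dram_busy is True while a request is in flight."""
        if dram_busy:
            if self.mode == POWER_DOWN:
                self.mode = ACTIVE        # must wake up before servicing the request
                self.switch_count += 1
            self.idle_cycles = 0
        else:
            self.idle_cycles += 1
            # Enter the low power mode only after a sufficiently long idle period,
            # so that short gaps between requests do not cause switching overhead.
            if self.mode == ACTIVE and self.idle_cycles >= self.idle_threshold:
                self.mode = POWER_DOWN
                self.switch_count += 1

# Example: requests served by the last-level cache leave DRAM idle long enough
# for the controller to power the device down.
pmc = PowerModeController(idle_threshold_cycles=100)
for cycle in range(500):
    pmc.tick(dram_busy=(cycle < 50))  # a busy burst, then a long idle stretch
print(pmc.mode, pmc.switch_count)

Compared with the default per-request power-down mechanism, such a controller trades a small amount of idle-time energy for far fewer mode transitions, which is the behaviour studied in Chapter 6.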

1.8 Thesis Overview

The remainder of the thesis is organised as follows. Chapter 2 presents DRAM background information, terminology of the DRAM architecture, DRAM system organisation and information about the specific DRAM type that this thesis focuses on. Chapter 3 conducts a detailed literature survey of simulation frameworks used to estimate the performance and energy consumption of DRAM memory. Chapter 3 also reviews DRAM power/energy management mechanisms and performance/energy estimation techniques for improving DRAM energy efficiency. Chapter 4 elaborates on the cycle-accurate processor-memory simulator. Chapter 5 presents the rapid exploration of the last-level cache together with the accurate estimation of the performance and energy consumption of the system. Chapter 6 describes the efficient power mode controller used to explicitly control the DRAM power consumption and shows the effects of last-level cache configurations on the DRAM energy savings. Chapter 7 provides the DRAM energy reduction estimator and a design flow to rapidly explore for the last-level cache configuration with maximum DRAM energy reduction. Chapter 8 concludes the thesis by describing the overall contribution of the research, followed by directions for future research.

Chapter 2

DRAM Background

2.1 Main Memory System Types

Many types of memory devices, such as Random Access Memory (RAM), Read-Only Memory (ROM) and flash memory, are available in modern computer systems. Amongst them, RAM is mainly used for the main memory. RAM devices can be further classified as static RAM (SRAM) and dynamic RAM (DRAM). The main difference between SRAM and DRAM is the lifetime of the data that they store. The data inside SRAM remains as long as power is provided. Additionally, DRAM density is higher than that of SRAM, and DRAM is cheaper. As such, DRAM is mainly used for main memory storage and SRAM is used as cache for speed. Synchronous DRAM (SDRAM) is a type of DRAM which has a synchronous interface (waiting for a clock signal before responding to control inputs) to the system bus. Newer generations of SDRAM are DDR SDRAM (Double Data Rate SDRAM), DDR2 SDRAM and DDR3 SDRAM, with higher transfer rates and lower power consumption from one generation to the next. DDR3 SDRAM is the state-of-the-art memory device and all the experiments in this thesis are based on DDR3 SDRAM. The evolution of the DRAM architecture is illustrated in Figure 2.1.


[Figure 2.1 summarises the evolution: conventional DRAM; SDRAM (3.3 V, max. transfer rate 200 Mbps); DDR SDRAM (double data rate, 2.5 V, max. 400 Mbps); DDR2 SDRAM (1.8 V, max. 800 Mbps); DDR3 SDRAM (1.5 V, max. 1600 Mbps).]

Figure 2.1: Evolution of DRAM Architecture from Conventional DRAM through the state-of-the-art DDR3 DRAM.

2.2 Overview of DRAM

The basic DRAM memory cell is a single-transistor cell with a small capacitor. The content of the memory cell is indicated by the presence or absence of charge on the capacitor. The two main types of DRAM are asynchronous DRAM (conventional DRAM) and synchronous DRAM (SDRAM). The operations of asynchronous DRAM are not based on a clock, while those of SDRAM are driven by a clock.

The structure of DRAM (whether conventional DRAM or SDRAM) is illustrated in Figure 2.2. The DRAM device consists of one or more ranks, and each rank is composed of one or more DRAM chips. A DRAM chip is formed from a number of banks, each of which has a separate row buffer (sense amplifier, also called the DRAM page). If the required data comes from different banks, the multiple row buffers of the different banks can be accessed independently. The status of the row buffer is classified as close page (if there is no data inside the row buffer) or open page (if the row buffer is occupied with data from one of the DRAM rows). Depending on the status of the row buffer and the location of the requested data, the DRAM access latency and power consumption differ from one memory operation to another.

Figure 2.2: DRAM Organisation and Terminology

2.3 DRAM Execution Time and Power Consumption

There are three sequential steps needed to access memory data: row precharging, row activation and column access. Row activation occurs when a particular row of data from the bank needs to be brought into the row buffer (because the row buffer is in the close status) before the column data can be accessed. If data from another row of the bank needs to be accessed, the data in the row buffer has to be written back to its DRAM location before another data row can be activated (the row precharging process), because of the conflict between the requested data and the existing data in the row buffer, even though the row buffer is in the open page status. Thus, a memory request can result in a row hit, a row conflict or a row closed access. The DRAM latency effect and the DRAM command sequence generated for each row buffer status are shown in Figure 2.3.

[Figure 2.3 summarises the three cases: (a) row buffer closed, latency = tRAS + tCAS (commands RAS, CAS); (b) row buffer open with a row hit, latency = tCAS (command CAS); (c) row buffer conflict, latency = tRP + tRAS + tCAS (commands PRE, RAS, CAS). Here tRAS is the row access latency, tCAS the column access latency and tRP the row precharge latency.]

Figure 2.3: Different Row Buffer Policies of DRAM with Latency Effect

In Figure 2.3(a), both the row access latency (row activation) and the column access latency are incurred because the row buffer is closed (row closed). Only the column access latency is incurred in Figure 2.3(b), because the row buffer is open and the requested data hits inside the row buffer (row hit). In contrast, the combination of the row precharge latency, the row access latency and the column access latency is required in Figure 2.3(c), due to the data conflict in the open row buffer (row conflict). As row hits require only column accesses, they have the shortest latency of the three row buffer situations described.
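As a concrete illustration of the three cases above, the small sketch below computes the access latency from the row buffer state. The timing values are placeholders chosen only to make the example runnable; real values come from the device datasheet (e.g., Micron DDR3 [1]).

# Illustrative DRAM access latency model based on row buffer state.
# Timing parameters (in memory clock cycles) are placeholder values.
T_RP, T_RAS, T_CAS = 9, 24, 9  # row precharge, row access, column access

def access_latency(open_row, requested_row):
    """open_row is the row currently held in the bank's row buffer
    (None if the row buffer is closed)."""
    if open_row is None:                 # row closed: activate, then read the column
        return T_RAS + T_CAS
    if open_row == requested_row:        # row hit: column access only
        return T_CAS
    return T_RP + T_RAS + T_CAS          # row conflict: precharge, activate, read

print(access_latency(None, 7))   # 33 cycles (row closed)
print(access_latency(7, 7))      # 9 cycles  (row hit)
print(access_latency(3, 7))      # 42 cycles (row conflict)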

The power consumption of the DRAM device is based on the power defined in the device specification (specification power), the command scheduling conditions (scheduled power) and the actual operating VDD and clock frequency (system power) [54]. First, the specification power of each subcomponent is calculated from the predefined specifications. After that, the specification power is derated based on the command scheduling conditions. Finally, the system power is calculated by derating the scheduled power to the system's actual voltage and frequency. The total system power is the sum of the system power of the background power, activation power, read power, write power, I/O termination power and refresh power. The details of the power calculation for DDR3 DRAM can be obtained from Micron's system power calculation documentation [54]. Estimating accurate power for a specific application is essential, as the expected power consumption affects how much heat is produced by the circuit and what kind of power supply it requires. For example, even for the same application, a device configuration with a larger row size will consume more current than one with a smaller row size, since each row activation draws significantly more current.
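The sketch below shows the general shape of such a calculation: each component's scheduled power is derated to the operating voltage and frequency and the components are summed into a total system power. The component values, the quadratic voltage scaling and the linear frequency scaling are illustrative assumptions made only for this example; the authoritative procedure is Micron's power calculation [54].

# Illustrative DDR3 system power roll-up (assumed derating model, placeholder numbers).
SPEC_VDD, SPEC_FREQ_MHZ = 1.5, 800.0   # specification operating point

# Scheduled power per component in mW (already adjusted for command scheduling).
scheduled_mw = {
    "background": 60.0,
    "activation": 110.0,
    "read": 70.0,
    "write": 55.0,
    "io_termination": 45.0,
    "refresh": 20.0,
}

def system_power_mw(actual_vdd, actual_freq_mhz):
    """Derate scheduled power to the actual voltage and frequency, then sum.
    Assumes power scales with (VDD/VDD_spec)^2 and linearly with frequency."""
    v_scale = (actual_vdd / SPEC_VDD) ** 2
    f_scale = actual_freq_mhz / SPEC_FREQ_MHZ
    return sum(p * v_scale * f_scale for p in scheduled_mw.values())

total_mw = system_power_mw(actual_vdd=1.35, actual_freq_mhz=667.0)
energy_mj = total_mw * 0.5   # power (mW) x time (s) = energy (mJ) over a 0.5 s run
print(round(total_mw, 1), "mW,", round(energy_mj, 1), "mJ")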

2.4 DRAM Controller Organisation

The memory controller serves as a mediator between the processor(s) and the DRAM device(s). A channel connects the memory controller and the DRAM device (as shown in Figure 2.2), while the system bus is used between the processor and the memory controller. The DRAM controller manages the movement of data into and out of the DRAM device while ensuring that the timing and resource constraints of the DRAM device are met. Figure 2.4 illustrates the abstract DRAM controller, whose typical operations include transaction scheduling, address translation (DRAM address mapping), command scheduling and DRAM row buffer/bank management.

Figure 2.4: Overview of Memory Controller Design.

The memory controller performs transaction scheduling for the incoming requests (from one or more processors and one or more I/O devices). In the transaction scheduling stage, a transaction ordering policy (First Come First Serve, Priority, Read over Write, etc. [36]) is applied in order to reduce latency and improve memory bandwidth utilisation. The next stage is address translation, in which the arbiter interface converts the physical address to the logical address (the DRAM-specific location in terms of channel ID, rank ID, bank ID, row ID and column ID). After the address mapping is performed, the request transaction is placed in the specific bank queue according to the bank ID of that transaction. An example of memory address translation (a physical address such as 0x40000004 being decomposed into channel, rank, bank, row and column) is shown in Figure 2.5.

Figure 2.5: DRAM Address Mapping

Moreover, each memory request transaction is scheduled as a sequence of DRAM commands (command scheduling) in order to access the physical DRAM device. Additionally, the memory controller manages the operation of the DRAM page (row buffer) based on a predefined row buffer management policy (typically an open page or close page policy). In the open page policy, the row buffer remains open after a memory request has been processed in order to maximise row buffer hits. In the close page policy, the row buffer is closed after the processing of each memory request. The detailed flow of a typical memory controller is illustrated in Figure 2.6. Incoming memory requests are queued in the bus interface queue of the memory controller. The memory controller then processes the memory requests from the bus interface queue, going through the transaction scheduling, address translation, DRAM command scheduling, and DRAM row buffer and bank management stages. When the processing of a memory request is finished, the memory controller notifies the caller that the transaction is complete.
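The bit-slicing below sketches how a physical address can be decomposed into the DRAM coordinates described above. The field widths (1 channel, 1 rank, 8 banks, 8K rows, 1K columns of 4 bytes) and the assumed DRAM base address are hypothetical values chosen only to make the example self-contained; a real controller derives them from the device configuration and its address mapping policy.

# Illustrative physical-to-DRAM address mapping (hypothetical layout).
# Assumed memory map: DRAM starts at 0x40000000; 32-bit bus (4-byte columns),
# 1K columns, 8 banks, 8K rows, 1 rank, 1 channel.
DRAM_BASE   = 0x40000000
OFFSET_BITS = 2    # 4 bytes per column
COL_BITS    = 10   # 1K columns
BANK_BITS   = 3    # 8 banks
ROW_BITS    = 13   # 8K rows

def map_address(paddr):
    addr = (paddr - DRAM_BASE) >> OFFSET_BITS
    col  = addr & ((1 << COL_BITS) - 1);  addr >>= COL_BITS
    bank = addr & ((1 << BANK_BITS) - 1); addr >>= BANK_BITS
    row  = addr & ((1 << ROW_BITS) - 1)
    return {"channel": 0, "rank": 0, "bank": bank, "row": row, "column": col}

print(map_address(0x40000004))
# {'channel': 0, 'rank': 0, 'bank': 0, 'row': 0, 'column': 1}

Different mapping policies simply permute which address bits feed the bank, row and column fields, which changes how consecutive requests spread across banks and rows.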

2.5 DDR3 SDRAM

The DDR3 SDRAM uses a double data rate architecture to achieve high performance operation. It uses an 8n-prefetch architecture: for every 8n-bit-wide access at the internal memory core, eight corresponding n-bit-wide data transfers occur at the interface I/O pins. The DRAM device is internally configured as an 8-bank DRAM, and read/write accesses to the DDR3 SDRAM are burst-oriented, meaning that accesses start at a selected location and continue for a programmed number of locations in a programmed sequence. The DDR3 SDRAM allows a pipelined, multibank mechanism for concurrent operations, which can provide high bandwidth by hiding the time of row precharging and activation.
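To make the double data rate and 8n-prefetch ideas concrete, the arithmetic below computes the data moved per burst and the peak bandwidth for an assumed DDR3 interface (the 800 MHz bus clock and 32-bit bus width are illustrative values, not a statement about the devices used later in this thesis).

# Illustrative DDR3 throughput arithmetic (assumed interface parameters).
bus_clock_mhz       = 800   # I/O bus clock
transfers_per_clock = 2     # double data rate: data on both clock edges
bus_width_bits      = 32    # external data bus width
burst_length        = 8     # 8n-prefetch: eight n-bit transfers per access

data_rate_mtps  = bus_clock_mhz * transfers_per_clock     # mega-transfers per second
bytes_per_burst = burst_length * bus_width_bits // 8      # data moved per burst
peak_bw_mb_s    = data_rate_mtps * bus_width_bits // 8    # peak bandwidth

print(data_rate_mtps, "MT/s;", bytes_per_burst, "bytes/burst;", peak_bw_mb_s, "MB/s")
# 1600 MT/s; 32 bytes/burst; 6400 MB/s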

[Figure 2.6 shows a memory request entering the bus interface/transaction queue, passing through transaction scheduling (round robin, priority, etc.), physical-to-DRAM address mapping (mapping policy), per-bank command queues (Bank 0 to Bank N) with bank and buffer management, and command scheduling, after which DRAM commands are issued and the completed transaction is returned to the caller.]

Figure 2.6: Detailed Memory Controller Design.

2.5.1 DDR3 Device Configuration

DDR3 SDRAM [1] is a high speed dynamic random access memory. The number of rows and the number of columns differ according to the internal data bus width. Table 2.1 describes the three DDR3 device configurations (256M x 4, 128M x 8 and 64M x 16) provided by Micron [1] for 1Gb DDR3 devices. In each device configuration description (256M x 4, 128M x 8 or 64M x 16), the first part (256M, 128M or 64M) represents the DRAM depth (the number of addressable locations) while the second part (4, 8 or 16) represents the width of the data bus (the number of bits in one column), so that the total DRAM size for each configuration is 1Gb. In the 256M x 4 DDR3 1Gb device configuration, the total number of rows is 16K (16384 rows), so 14 address pins are needed to select the row address. As the total number of columns is 2K (2048 columns), 11 address bits are needed to select the column location. As the 128M x 8 configuration contains 16384 rows and 1024 columns, it needs 14 address lines for row access and 10 address lines for column access.
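As a quick consistency check on the 256M x 4 organisation quoted above (this arithmetic uses only the figures given in the text), the per-chip capacity is

\[ 8\ \text{banks} \times 2^{14}\ \text{rows} \times 2^{11}\ \text{columns} \times 4\ \text{bits} = 2^{30}\ \text{bits} = 1\ \text{Gb}, \]

and correspondingly \( \log_2 16384 = 14 \) row address bits and \( \log_2 2048 = 11 \) column address bits are required.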

Table 2.1: DDR3 1 Gb Device Configuration from Micron [1]

    Device Configuration     256M x 4       128M x 8      64M x 16
    No. of Banks             8              8             8
    No. of Rows              16K [13:0]     16K [13:0]    8K [12:0]
    No. of Columns           2K [11,9:0]    1K [9:0]      1K [9:0]
    Data Bus Width (bits)    4              8             16

In the 64M x 16 device format, however, the total number of rows is only 8K (8192 rows) and the total number of columns is 1K (1024 columns). For all the experiments in this thesis, a 256MB DRAM and a 4GB DRAM, each with a 32-bit bus interface, are built using Micron's 64M x 16 and 256M x 4 DDR3 1Gb device configurations respectively. Table 2.2 shows the configurations for the 256MB and 4GB DRAM sizes utilised in this thesis. The 256MB DRAM is composed of two 1Gb (64M x 16) chips forming one rank on one channel. The 4GB DRAM uses eight 1Gb (256M x 4) chips per rank, with four ranks on one channel.

Table 2.2: 256MB and 4GB DDR3 DRAM Device Configuration

    DRAM Size              256MB         4GB
    DRAM chips per rank    2             8
    No. of Ranks           1             4
    No. of Channels        1             1
    DRAM configuration     (64M x 16)    (256M x 4)

Chapter 3

Literature Review

3.1 Introduction

Memory power/energy consumption has become a major concern in portable embedded SoC designs, since the memory system is a major consumer of energy and therefore a key determinant of battery life. Analysing the power/energy consumption of the memory system only at the final design stage, in order to meet a battery lifetime target, is very costly and is not a sound approach to system design. As such, early design estimation is necessary, and simulation is the best approach to facilitate design and functional validation against the target specification before real hardware becomes available. Furthermore, simulation is also the most widely used method to evaluate memory system performance. This chapter gives a broad overview of the different types of simulation approaches used to generate detailed and accurate memory statistics. A discussion then follows on existing DRAM power/energy management techniques. This chapter also describes estimation techniques for the performance and/or energy savings of the DRAM memory system.


3.2 High Level System Simulation

System architecture simulators are indispensable tools for performance evaluation, both in design space exploration and in computer system design activities such as software development and optimisation. Architecture simulators can be classified from three perspectives: scope, detail and input. From the scope perspective, a simulator is named according to which component/module it models, for example a processor, memory or I/O module. The detail perspective considers how the simulator models the target throughout the simulation (functionally, or with accurate timing). The last factor, the input perspective, considers how the simulator is driven (by a sequence of memory references from an application, or by the application itself). This taxonomy of architecture simulators is illustrated in Figure 3.1, and more details are given in the following paragraphs.

3.2.1 Instruction Set Simulation

There are several types of processor, memory, and system simulators. These can be divided into functional and timing (or performance) simulators. Functional simulators test only the functionality of the simulated module, while timing/performance simulators take the timing behaviour of the target into account in addition to its functionality. Basically, a functional simulator does not model the real architecture on each individual processor cycle. Thus, a functional simulator cannot be used to measure the performance of the system, although it is useful for functional tests, module verification, and obtaining basic information such as the number of executed instructions, the frequency of branch instructions, etc. The most popular functional simulators are Sim-fast [55], Sim-safe [55], and SimCore [56]. Sim-fast and Sim-safe are optimised subset simulators of the SimpleScalar toolset [55] intended to speed up simulation. They perform only functional simulation and do not account for the behaviour of the microarchitecture. The difference between Sim-safe and Sim-fast is that the former checks the correct alignment and the access permissions of each memory reference, while the latter does not. SimCore [56] is derived from the SimAlpha [57] system simulator and offers high readability and high-speed execution. SimCore has almost all the same functions as Sim-fast with improved speed, though the target architecture of SimCore is fixed to the Alpha processor.

[Figure 3.1: A Taxonomy of Architecture Simulation Tools from Different Aspects. Simulators are classified by scope (processor, memory, or full system simulator), by detail (functional or timing, cycle-accurate or non-cycle-accurate), and by input (trace-driven or execution-driven).]

Timing (or performance) simulators, on the other hand, measure the performance of the system by keeping track of individual clock cycles. Timing simulators can further be divided into cycle-accurate and non-cycle-accurate simulators. Examples of cycle-accurate processor simulators include FaCSim [58] for the ARM926EJ-S core, PTLSim [59] for x86-64 architectures, the Xtensa ISS for Tensilica platforms [60], the RealView ARMulator ISS [61] for ARM architectures, and the C64x+ CPU simulator for the Texas Instruments platform [62]. Cycle-accurate processor simulators involve full modelling of the microarchitecture on a cycle-by-cycle basis. Non-cycle-accurate processor simulators (also known as instruction set simulators [63]) are represented by Sim-outorder [55] for the SimpleScalar architecture (a close derivative of the MIPS architecture [64], which can also emulate the Alpha, PISA, ARM and x86 instruction sets), and Sim-Alpha [65] for the Alpha instruction set platform [66]. Sim-outorder from the SimpleScalar tool is a widely used instruction-level performance simulator with detailed microarchitecture implementations, including branch prediction, caches, speculative execution support, ALUs, and external memory. Sim-Alpha [65] is an extension of the SimpleScalar tool targeting the Alpha 21264 microarchitecture. The strength of Sim-Alpha (as compared to other simulators such as SimpleScalar and SimOS) lies in its validation against actual hardware. FaCSim accurately models an ARM926EJ-S based embedded system with an interpretive simulation technique (it computes elapsed cycles and incrementally adds them to advance the core clock instead of performing cycle-by-cycle simulation) in order to achieve flexibility while maintaining high speed. FaCSim is divided into a functional front-end and a cycle-accurate back-end that run in parallel by exploiting functional/timing model decoupling techniques [67]. Most simulators have been built to simulate only one fixed architecture. In order to expose various architectures in a processor simulation environment, CPU Sim [68] was designed to facilitate hands-on experience in a learning environment. In CPU Sim, the detailed implementations of microinstructions, machine instructions, and the instructions of the target architecture need to be specified at the register-transfer level (RTL). MikroSim [69] is another processor simulation tool for educational use. Similar to CPU Sim, the MikroSim approach is based on implementation at the RTL, and its ease of use is the main attraction of MikroSim.

3.2.2 Memory Simulation

Several simulators, such as Sim-Cache [55], Sim-Cheetah [55], LDA [70], DCMSim [71], CacheSim [72], MSCSim [73], Dinero [74], and SuSeSim [75], are known in the context of memory simulation. Sim-Cache and Sim-Cheetah are from the SimpleScalar simulation suite and generate cache statistics/profiles for a single cache configuration and for multiple cache configurations in a single program execution, respectively. Some memory simulators, such as MSCSim, DCMSim, and CacheSim, are proposed for didactic purposes. MSCSim (Multilevel and Split Cache Simulator) [73] is a memory hierarchy simulation tool whose main feature is the ability to simulate the behaviour of various memory components, such as unified caches, split caches, multi-level caches, and virtual memory. DCMSim (Didactic Cache Memory Simulator) [71] serves as a learning aid by providing detailed step-by-step execution with visualisation in its interface. Additionally, DCMSim follows an object-oriented design, so that the objects representing the logic blocks of cache memory systems can be reused if needed. The CacheSim simulator [72] has features similar to DCMSim, but CacheSim provides a fully graphical interface and a modular implementation instead of an object-oriented approach. The latency of data access varies from one memory level to another in the memory hierarchy. Thus, the Latency-of-Data-Access (LDA) model [70] was developed to take into account multiple hierarchical memory levels with different latencies and cost models. Moreover, LDA incorporates contention analysis based on queuing theory or event-driven simulation, while considering the latency of the memory accesses of concurrent tasks. Cache behaviour is also modelled by the well-known Dinero cache simulator [74], which supports the sub-block placement technique [76] (splitting blocks into smaller units of transfer) in order to reduce the miss penalty. The ACME simulator [77] exploits adaptive caching techniques to manage cache replacement policies, which can further improve hit rates. Most cache simulators, including Dinero, are multi-pass simulators; that is, the simulator cannot process multiple cache configurations in a single program execution (the input sequence has to be read for each and every cache configuration). The method of Janapsatya [78], the CRCB algorithm [79] and SuSeSim [80] are single-pass cache simulators (the input needs to be read only once) which can process multiple cache configurations at one time. The purpose of simulating multiple cache configurations simultaneously is to reduce the cache simulation time. These cache simulators are neither functional simulators nor timing simulators. Moreover, they examine cache behaviour only and do not provide timing simulation of the memory system. Alternatively, Cacti [81] is the most widely used design tool for modelling the access time and power consumption of caches and other memories. Though Cacti can provide some statistics for DRAM memory, it does not accurately simulate the timing behaviour of a DRAM system. Detailed timing simulation of memory and cache behaviour is also provided by Ruby [82] (from the GEMS toolset [82]).
Though Ruby provides a flexible infrastructure capable of accurately simulating a wide variety of cache-coherent memory systems, Ruby does not faithfully simulate DRAM behaviour [83]. Thus, the authors of [84] proposed a detailed and cycle-accurate DRAM simulator known as DRAMsim, which uses accurate power and timing models to provide useful power consumption and delay information for the DRAM memory.
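To make the multi-pass versus single-pass distinction discussed above concrete, the following minimal sketch (in C++, with hypothetical class and function names; it is not the algorithm of [78-80], which rely on more sophisticated data structures) evaluates several direct-mapped cache configurations in a single pass over an address trace read from standard input:

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // One direct-mapped cache model: tags only, since hit/miss counting needs no data.
    struct DirectMappedCache {
        uint32_t line_bits, set_bits;          // log2(line size), log2(number of sets)
        std::vector<uint64_t> tags;
        std::vector<bool> valid;
        uint64_t hits = 0, misses = 0;

        DirectMappedCache(uint32_t lb, uint32_t sb)
            : line_bits(lb), set_bits(sb), tags(1u << sb, 0), valid(1u << sb, false) {}

        void access(uint64_t addr) {
            uint64_t set = (addr >> line_bits) & ((1u << set_bits) - 1);
            uint64_t tag = addr >> (line_bits + set_bits);
            if (valid[set] && tags[set] == tag) { ++hits; }
            else { ++misses; valid[set] = true; tags[set] = tag; }
        }
    };

    int main() {
        // Candidate configurations: (line size, number of sets) pairs.
        std::vector<DirectMappedCache> configs;
        for (uint32_t lb = 4; lb <= 6; ++lb)       // 16B, 32B, 64B lines
            for (uint32_t sb = 6; sb <= 10; ++sb)  // 64 to 1024 sets
                configs.emplace_back(lb, sb);

        // Single pass: every trace address updates every candidate configuration.
        uint64_t addr;
        while (std::cin >> std::hex >> addr)
            for (auto &c : configs) c.access(addr);

        for (auto &c : configs) {
            uint64_t total = c.hits + c.misses;
            std::cout << (1u << c.line_bits) << "B x " << (1u << c.set_bits)
                      << " sets: miss rate "
                      << (total ? double(c.misses) / double(total) : 0.0) << "\n";
        }
    }

A multi-pass simulator would instead re-read the whole trace once per configuration, which is where the single-pass approaches gain their speed.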

3.2.3 System Simulation

The behaviour of the processor depends on the memory system, and the behaviour of the memory system depends on the processor, due to the interactions between these two components. Hence, a system simulator becomes necessary to capture the timing-dependent effects between them. Several full system simulators have been previously proposed, such as SimOS [85, 86], Simics [87], and Mambo [88]. SimOS (implemented for SPARC and MIPS based machines) simulates the computer system using the services provided by the underlying operating system (OS). Examples of such OS services include the OS's process abstraction and signals to simulate a CPU, file mapping operations to simulate the memory management unit, and separate processes to simulate I/O devices [85, 86]. SimOS models the complete hardware of the machine, and thus can be used for studying operating systems or realistic workloads. Simics, on the other hand, is a multi-processor full system simulator for various ISAs such as SPARC, Alpha, x86, PowerPC, MIPS, and ARM. The Simics Central tool coordinates the connection of machines with heterogeneous processors (different processor types). New device models can easily be plugged into the Simics simulator due to its extensibility. The SimFlex simulation tool [89] has been built on the Virtutech Simics system simulator in order to provide fast, accurate and flexible simulation of large-scale systems. SimFlex uses well-defined component interfaces for flexibility and statistical sampling theory [90] (estimating performance without executing the complete program) to reduce simulation time while achieving high accuracy. Another system simulator based on the Simics functional simulator is FeS2, which adds timing simulation on top of Simics. FeS2 targets the x86 platform and its memory model leverages GEMS's Ruby memory timing model. Mambo [88] is also a full system simulator, specific to PowerPC based systems. Simics and Mambo provide functional simulation to quickly simulate the system as well as timing model simulation to obtain highly accurate results. AMD's SimNow simulator [91] is a fast and configurable simulator of x86 and AMD64 platforms for AMD's family of processors. A SimNow simulation can be saved at any point to a media file, from which the simulation can be re-run at a later time. Based on the functional SimNow simulator, AMD and HP jointly developed a full system simulation infrastructure called COTSon [92] to model the complete software stack together with complete hardware models. All of these full system simulators focus more on functional behaviour and do not attempt to calculate accurate timing information and power consumption of the memory system. To enable timing modelling in system simulation, the GEMS toolset [82] (which uses the Simics functional simulator as its foundation) was released with a detailed microarchitectural processor timing model, Opal, and a memory system timing simulator, Ruby. Although GEMS decouples simulation functionality and timing events in simulator development, the memory system of GEMS, Ruby, does not model the detailed features of the DRAM module. Another simulator, called M5 [93], has been designed to include all the features of SimOS and GEMS plus detailed network I/O and models of multiple networked systems. M5 provides a highly configurable object-oriented simulation framework supporting the Alpha, SPARC, and MIPS ISAs.
Since M5 does not support an accurate memory timing model, the gem5 full system simulator [94] was developed by merging the best aspects of the M5 and GEMS simulators. The key feature of the gem5 simulator is its flexibility in choosing the CPU model (simple functional, in-order, out-of-order, or timing processor model), the system mode (user-level mode, or full system mode with both user-level and OS kernel-level services), and the memory system (M5's fast and simple memory system or GEMS's Ruby memory model) according to the specific simulation requirements. The gem5 simulator supports most commercial ISAs, such as ARM, Alpha, MIPS, Power, SPARC, and x86.

3.2.4 The Need for Execution-Driven System Simulation

In terms of the input to the simulator, simulation can be trace-driven [95] or execution-driven [96]. Trace-driven simulation uses a simulator whose input is a memory trace (a sequence of memory references). The input memory trace can be collected by hardware-based snooping (for example, DragonHead [97]), microcode modification (modifying the machine-level code to write the addresses of all memory references into a reserved area), an instruction-set emulator/processor simulator (modifying the emulator/simulator so that an application generates address traces), static code annotation/software-based instrumentation (modifying the program to emit a memory trace, for example with PIN [98]), or single-step execution at the operating system level (recording memory references while stepping through a program one instruction at a time) [99]. In execution-driven simulation, the input application is executed directly on the simulator.
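As a minimal illustration of the trace-driven style (the record layout and names below are hypothetical, not those of any particular tool), a replay driver simply streams recorded requests into a memory model:

    #include <cstdint>
    #include <fstream>
    #include <iostream>

    // A hypothetical trace record: address, read/write flag, and the cycle it was issued.
    struct TraceRecord {
        uint64_t address;
        uint8_t  is_write;
        uint64_t issue_cycle;
    };

    int main(int argc, char **argv) {
        if (argc < 2) return 1;
        std::ifstream trace(argv[1], std::ios::binary);
        TraceRecord rec;
        uint64_t reads = 0, writes = 0;
        // Trace replay: records are fed to the memory model in recorded order; the
        // model's response time cannot influence which request comes next.
        while (trace.read(reinterpret_cast<char *>(&rec), sizeof(rec))) {
            (rec.is_write ? writes : reads)++;
            // memory_model.enqueue(rec.address, rec.is_write, rec.issue_cycle);  // hypothetical hook
        }
        std::cout << reads << " reads, " << writes << " writes replayed\n";
    }

The comment on the fixed request ordering is precisely the limitation discussed in the next paragraph: the trace cannot react to the memory system's timing.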

Memory performance and power estimation techniques typically rely on trace-driven simulation: the memory trace is collected with one of the aforementioned trace collection methods (the reference generation step) and the collected sequence of memory requests is executed by the memory simulator (the trace simulation step). A reference generator produces a sequence of memory requests (a reference trace), and a trace simulator models the memory system by processing each memory request in the reference trace. Trace-driven simulation offers portability, since a trace can be reused across multiple simulations. However, trace-driven memory simulation is not adequate for simulating the behaviour of the memory system correctly and accurately, since it lacks the feedback (the response time of one memory request affects the processing of the next) necessary to accurately predict execution time. As a result, the interaction between the CPU pipeline and the memory system cannot be modelled exactly in trace-driven memory simulation [100]. Hence, a cycle-accurate processor simulator is a vital component, alongside a detailed cycle-accurate memory simulator, for capturing accurate memory statistics (on-the-fly simulation). The on-the-fly simulation technique also avoids the reference generation overhead, which is significant when traces have large storage requirements. As a result, we need to bring together a processor simulator with a detailed timing and power memory model. Researchers have combined the Sim-alpha [65] and GEMS [101] platforms with DRAMsim, but these processor simulators are not cycle-accurate. Additionally, these integrations are each targeted at a specific instruction set architecture (for instance, Sim-alpha simulates only the Alpha processor and GEMS targets the SPARC processor) and cannot be re-targeted to different processors without extensive work. A designer may therefore need to evaluate a different processor instruction set architecture with a different memory model, a combination which cannot be handled directly by existing full system simulators. Hence, there is a need for a generic integration methodology for a detailed, cycle-accurate, functional and performance processor-memory simulator. This thesis describes a generic methodology which can be used to integrate a cycle-accurate processor simulator with a DRAM memory simulator that takes memory addresses as input. The output of the DRAM simulator can be timing information and/or power/energy information. Note that the DRAM simulator does not need to manipulate data values (this is typical of most memory simulators). However, for cycle-accurate simulation, data values are important, and a mechanism must be provided for manipulating these values.
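The sketch below shows one way such an integration could be structured in C++; it is only an outline of the general idea (the class and method names are hypothetical) and not the interface abstraction layer actually proposed later in this thesis.

    #include <cstdint>
    #include <functional>
    #include <queue>

    // Hypothetical abstraction: the processor simulator only sees this interface,
    // so any memory simulator that implements it can be plugged in behind it.
    class MemoryInterface {
    public:
        using Callback = std::function<void(uint64_t /*address*/)>;
        virtual ~MemoryInterface() = default;
        virtual void sendRequest(uint64_t address, bool is_write, Callback done) = 0;
        virtual void tick() = 0;   // advance the memory model by one clock cycle
    };

    // A toy fixed-latency memory standing in for a real cycle-accurate DRAM simulator.
    class FixedLatencyMemory : public MemoryInterface {
        struct Pending { uint64_t address; uint64_t ready_cycle; Callback done; };
        std::queue<Pending> pending_;
        uint64_t cycle_ = 0;
        uint64_t latency_;
    public:
        explicit FixedLatencyMemory(uint64_t latency) : latency_(latency) {}
        void sendRequest(uint64_t address, bool, Callback done) override {
            pending_.push({address, cycle_ + latency_, std::move(done)});
        }
        void tick() override {
            ++cycle_;
            while (!pending_.empty() && pending_.front().ready_cycle <= cycle_) {
                pending_.front().done(pending_.front().address);  // notify the processor model
                pending_.pop();
            }
        }
    };

In such a scheme the processor simulator calls sendRequest() when a load or store leaves its cache hierarchy and stalls the corresponding instruction until the callback fires, while a top-level loop ticks both simulators every cycle; swapping FixedLatencyMemory for a wrapper around a cycle-accurate DRAM simulator would leave the processor side unchanged.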

3.2.5 Summary of System Simulation Research

Much of the early research on system simulation comes from the area of simulation and modelling. Various simulation techniques exist for different system components to obtain performance statistics, such as the execution time and power consumption of the target component. High level system simulation is increasingly popular as a way to shorten the time-to-market of embedded systems. A large number of instruction set simulators, memory module simulators and full system simulators (in which the processor and memory components are simulated together) are available in the area of architectural high-level simulation. In order to capture the timing dependencies between the processor and memory components, full system simulation is necessary so that detailed statistics of both components can be obtained accurately. Most existing full system simulation models are non-cycle-accurate functional simulators, and thus a cycle-accurate full system simulator for a target platform is necessary if accurate design trade-offs are to be measured. The interface abstraction methodology we propose in this thesis does not depend on any such simulator, making it suitable for integrating existing processor and memory simulator models into a seamless cycle-accurate processor-memory full system simulator. In this thesis, we present an approach to building a high level cycle-accurate processor-memory simulation model for embedded systems. We investigate a technique that seamlessly integrates the processor component and the memory component in a cycle-accurate manner to provide detailed statistics of both. We experimentally show the accuracy that can be achieved by utilising our proposed interface abstraction layer to integrate an existing processor simulation model and a memory simulation model.

3.3 Exploitation of Last-level Cache

DRAM performance is critical in modern computing systems, particularly with the widespread use of data-centric applications (for example, streaming applications). Cache memories have been used extensively to reduce DRAM access times. In a cache memory hierarchy, the last-level cache plays an important role in reducing the access times of the DRAM system [102]. Thus, caches act as a bridge to close the performance gap between the processor and the DRAM memory system. A suitable last-level cache configuration for a particular system can achieve high performance at low cost, especially for memory-bound applications [100].

3.3.1 Last-level Cache Exploration

To explore the behaviour of the last-level cache, Wang et al. [100] proposed a trace-driven hardware/software co-simulation approach, in which hardware is used for cycle-accurate extraction of long traces and software simulation is used to simulate the collected trace. There are two classes of simulation, according to the input fed to the simulator: trace-driven simulation and execution-driven simulation (more details are in Section 3.2.4). In general, the results of trace-driven simulation are not as accurate as those of execution-driven simulation, because trace-driven simulation does not consider the interaction between the CPU pipeline and the memory system. Execution-driven cycle-accurate simulation (system simulation including at least a processor module and a memory module together) captures system behaviour more accurately [103], but is extremely slow. Thus, the co-simulation technique proposed in [100] tries to overcome the slowness of execution-driven simulation by obtaining accurate and flexible trace-driven simulation for the exploration of last-level caches. This approach captures the memory request transactions and timestamp information of the front side bus (FSB) by snooping on the FSB signals via a logic analyser. The collected trace is then simulated with an enhanced CASPER cache simulator (enhanced by adding support for long traces and timing information to CASPER [104]). The approach presented in [100] thus utilises hardware/software co-simulation to reflect the actual behaviour of applications in the exploration of last-level caches. In contrast, this thesis proposes system-level software simulation approaches (without needing any hardware-related process) for the exploration of a large design space of last-level caches for early verification of the system design.

3.3.2 DRAM Performance Improvement

DRAM memory requests (arising from last-level cache misses) exhibit repetitive patterns due to temporal locality of reference [105] (there is a high probability that recently accessed data will be accessed again in the near future). Exploiting this temporal locality, Scavenger [106] proposed a technique to reduce DRAM latency by retaining the most frequently missed blocks in the last-level cache. In Scavenger, the last-level cache is divided into a conventional cache portion and a victim file portion. The conventional portion operates as the last-level cache, while the victim portion keeps the most frequently missed blocks from the conventional part of the cache. Blocks evicted from the conventional cache which have been missed most frequently in the past (and are therefore more likely to be used in the future) are placed in the victim part of the last-level cache. When a miss occurs in the conventional cache part, Scavenger [106] calculates the block's miss frequency and places the missing block in the victim part if its miss frequency is higher than that of all the blocks in the conventional part as well as all the blocks in the victim part. Scavenger reduces DRAM latency by preventing the repeated eviction of the same block addresses from the last-level cache.

One way to increase DRAM throughput is to manage the writes from the last-level cache effectively. Write requests can significantly interfere with the processing of read requests, which degrades overall system performance because the read requests are delayed [107]. The interference of write operations with read requests in a continuous memory sequence is called write-induced interference [107]. Write-induced interference (shown in Figure 3.2) increases the latency of read operations because of the timing gaps required between consecutive read and write operations in the DRAM memory. Additionally, the interference incurs the memory data bus's read-to-write or write-to-read switching penalty. Previous studies [107–112] have researched ways of managing last-level cache writeback events in order to improve DRAM efficiency.

Figure 3.2: Write-induced Interference, sourced from [112]

Eager Writeback [108] is a technique for managing write events to DRAM. Instead of sending the write data to DRAM at the actual eviction time, the eager writeback approach sends the eventually-evicted last-level cache lines to an intermediate write buffer in advance, by modifying the traditional writeback policy [103]. The traditional writeback policy issues the write to DRAM immediately when a dirty cache line is evicted, while the eager writeback approach issues the write to the write buffer whenever the bus is idle (prior to the actual eviction to DRAM). Eager writeback of the dirty data in the last-level cache is triggered when the current memory request results in a writeback event, and the writeback to DRAM is performed when the intermediate writeback buffer is full. Even though the eager writeback approach improves overall system performance by redistributing the writing of dirty cache lines, it is limited by the size of the write buffer used to service the writeback events in advance.

Thus, the authors of [109] proposed a Virtual Write Queue (VWQ) which coordinates the writeback traffic of the last-level cache with the scheduling policies of the DRAM memory controller to significantly improve both system performance and energy savings. The VWQ technique modifies the memory controller to schedule the writeback events of the last-level cache with the intention of reducing write-induced interference [107]. The VWQ approach instructs the cache to transfer specific cache lines to the VWQ of the memory controller. The cache lines are selected based on a specific mapping to DRAM resources (such as a mapping to a specific rank/bank) in order to generate a burst of write operations, so that the bus turnaround penalty can be reduced. The VWQ approach improves DRAM performance and power efficiency by increasing the row buffer/page mode locality [21], since successive accesses go to the same DRAM page instead of different, random pages.
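The buffering idea common to these writeback-management schemes can be sketched as follows (a deliberately simplified C++ illustration with hypothetical names; it is not the actual Eager Writeback or VWQ mechanism, both of which interact with real cache and memory controller state):

    #include <cstdint>
    #include <vector>

    // A pending dirty line waiting to be written back to DRAM.
    struct DirtyLine { uint64_t address; };

    class WritebackBuffer {
        std::vector<DirtyLine> buffer_;
        std::size_t capacity_;
    public:
        explicit WritebackBuffer(std::size_t capacity) : capacity_(capacity) {}

        // Called when the bus is idle: stage a dirty line early instead of waiting
        // for its eviction from the last-level cache.
        bool stageEarly(const DirtyLine &line) {
            if (buffer_.size() >= capacity_) return false;   // buffer full, cannot stage
            buffer_.push_back(line);
            return true;
        }

        bool full() const { return buffer_.size() >= capacity_; }

        // Called when the buffer fills (or a drain is otherwise triggered): issue the
        // buffered writes to DRAM as one batch to amortise the bus turnaround penalty.
        template <typename IssueFn>
        void drain(IssueFn issueToDram) {
            for (const auto &line : buffer_) issueToDram(line.address);
            buffer_.clear();
        }
    };

The schemes differ mainly in when stageEarly() and drain() are invoked and in how the lines are chosen (bus idleness, rank idleness, or mapping to the same DRAM page).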

The technique of driving last-level cache writeback operations to reduce write-induced interference is also discussed in [111]. In contrast to the Eager Writeback [108] and VWQ [109] approaches, [111] utilises the arrival times of the memory requests so that it can avoid the large penalty of a read request arriving soon after a write request. The authors of [111] proposed a rank idle time predictor to predict long rank idle periods. During the predicted rank idle cycles, a sequence of writeback events is serviced so that the delay of a read request following a write operation (write-induced interference) can be minimised. The rank idle time predictor uses the address of the last-level cache miss block and the DRAM address mapping policy to accurately predict whether this block will be accessed again before being evicted. As soon as a rank becomes idle, the predictor estimates whether a read request is likely to arrive at that rank within the next m cycles. If no read request is expected within that time (m cycles), a sequence of dirty cache blocks is serviced within it. The approach in [111] reduces write-induced interference by exploiting rank idle time to service sequences of read requests and write requests separately.

Similar to the eager writeback approach [108], Zhe et al. [112] proposed early scheduling of the eventually-evicted last-level cache writeback blocks. In the eager writeback approach, the early eviction is triggered whenever the bus is idle. In contrast, the approach in [112] triggers the eviction of a writeback block from the last-level cache only when there is a possibility of that writeback event happening within a predefined time after a specific read memory request. In order to send a batch of writeback events to DRAM and thereby increase DRAM page locality, the approach in [112] uses an evicted block buffer to keep the predicted eventually-evicted last-level cache blocks. In [112], the write requests have higher priority than the read requests. A sequence of write requests from the evicted block buffer to DRAM is performed when the actual writeback event of a block inside the buffer occurs or when the evicted block buffer is full. The early eviction of the eventually-evicted last-level cache blocks in [112] improves DRAM efficiency by effectively redistributing the write memory requests.

The authors of [107] also proposed a DRAM-aware last-level cache writeback policy. The DRAM-aware writeback approach of [107] aggressively sends a sequence of additional last-level cache writeback data that resides in the same DRAM page as the current writeback event. This technique improves DRAM performance and energy efficiency not only by exploiting DRAM row buffer locality for write operations, but also by reducing the bus switching penalty of read-to-write or write-to-read transitions. The same authors proposed a DRAM-aware last-level cache replacement policy in [110]. The two proposed replacement policies are the Latency and Parallelism-Aware (LPA) and the Write-caused Interference-Aware (WIA) replacement policies. The LPA policy exploits DRAM bank-level parallelism (by servicing multiple writeback requests concurrently in different DRAM banks) when sending a batch of last-level cache writeback data. The WIA policy, on the other hand, uses the same technique as [107], in which a sequence of additional last-level cache writeback data residing in the same DRAM page is sent together with the writeback event. The proposed LPA and WIA replacement policies enhance system performance by exploiting bank-level parallelism and row-buffer locality, and by reducing write-induced interference.

3.3.3 DRAM Energy-Aware Scheme

Lee et al. [113] proposed a DRAM energy-aware data prefetching scheme. The method of [113] prefetches data from DRAM into the last-level cache to increase DRAM idle times by clustering DRAM accesses. The authors of [113] utilise a stride-based prefetching mechanism to predict the patterns of consecutive memory references that have a regular address gap arising from looping structures. They prefetch the data several iterations ahead to completely hide memory latencies. A reference prediction table (indexed like a direct-mapped instruction cache) is used to keep the previously prefetched addresses, the stride (the difference between the previous address and the current address), and the lookahead distance (used to generate the next prefetch address). The next prefetch address is calculated from the information in the associated entry of the reference prediction table (accessed with an index in the same way as a direct-mapped instruction cache). With this DRAM access clustering scheme, the authors of [113] improve DRAM performance and energy savings significantly.
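A simplified reference-prediction-table stride prefetcher of this general kind could be sketched as follows (the structure and field names are hypothetical, and the lookahead handling and confirmation state machine in [113] are more elaborate):

    #include <cstdint>
    #include <optional>
    #include <vector>

    // One reference prediction table entry, tagged by the PC of the load instruction.
    struct RptEntry {
        uint64_t pc = 0;
        uint64_t last_address = 0;
        int64_t  stride = 0;
        bool     confirmed = false;   // stride observed at least twice in a row
    };

    class StridePrefetcher {
        std::vector<RptEntry> table_;
        uint64_t lookahead_;
    public:
        StridePrefetcher(std::size_t entries, uint64_t lookahead)
            : table_(entries), lookahead_(lookahead) {}

        // Called on every load; returns a prefetch address when a stable stride is seen.
        std::optional<uint64_t> observe(uint64_t pc, uint64_t address) {
            RptEntry &e = table_[pc % table_.size()];        // direct-mapped indexing by PC
            if (e.pc != pc) { e = {pc, address, 0, false}; return std::nullopt; }
            int64_t stride = int64_t(address) - int64_t(e.last_address);
            e.confirmed = (stride != 0 && stride == e.stride);
            e.stride = stride;
            e.last_address = address;
            if (!e.confirmed) return std::nullopt;
            // Prefetch several iterations ahead to hide the DRAM latency.
            return address + uint64_t(e.stride) * lookahead_;
        }
    };

Issuing the returned addresses in bursts is what clusters the DRAM accesses and lengthens the idle periods between them.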

3.3.4 Summary: Exploitation of Last-level Cache Work

The techniques discussed above show that exploitation of the last-level cache plays a vital role in improving the performance and power efficiency of the DRAM system. However, most of the prior research related to last-level cache exploitation focuses mainly on improving DRAM performance. In this thesis, we exploit the last-level cache to reduce the background power consumption of the DRAM system. Longer DRAM idle periods can be achieved if successive memory requests are satisfied consecutively from the last-level cache, and the background power of the DRAM system can be reduced if the DRAM device is switched to a lower power consuming mode while the DRAM is idle. The pattern of DRAM idle periods for a specific application differs from one last-level cache configuration (cache size, cache line size, associativity, replacement policy, etc.) to another. Although many researchers have shown that performance and/or energy efficiency can be improved with the aid of a last-level cache, these works use only predefined last-level cache configurations. Some last-level cache configurations even increase energy consumption and degrade performance to some extent. We show that only a suitable last-level cache can achieve a significant amount of DRAM energy efficiency. The approach used in this thesis provides a mechanism for setting the DRAM device into the lowest power consuming mode during the long DRAM idle periods that arise from the use of a suitable last-level cache. We also present an accurate and quick exploration mechanism to find an application-specific suitable last-level cache from the hundreds of such configurations which make up a very large design space.

3.4 DRAM Power/Energy Management

Memory components in embedded systems, particularly in embedded systems running data-intensive applications, are among the main contributors to system power consumption [114]. Efforts to reduce the energy consumed by the memory system have been made at various levels: the compiler level, the operating system (OS) level, and the system level. Compiler-directed approaches [28, 29, 115–118] statically analyse the behaviour of the application and system software to detect the idle times during which memory modules (banks/ranks) can be put into a lower power mode. To prolong the idleness of the memory modules (the longer the idleness, the longer the memory modules can stay in a lower power mode), array data are analysed and rearranged, either by placing data with similar lifetime patterns into the same modules or by re-ordering the computation so that the data accessed within a specified time window is located in a small number of memory modules. OS-level approaches [30, 31, 119, 120] (for example, page migration and power-aware page allocation mechanisms) consider code and data placement at the kernel layer. The compiler-level and OS-level approaches are generally not aware of the system architecture, which offers greater control over finer-grained components for optimising the performance and the energy consumption of the system memory. Several system-level energy optimisation techniques [32, 33, 35, 121–123] have been proposed to reduce energy consumption by exploiting the underlying memory architecture and utilising power-aware techniques.

Some previous DRAM power/energy optimisation techniques exploit the locality in the row buffer, while others take advantage of bank-level or rank-level parallelism to reduce the power/energy consumption of DRAM. Each DRAM device consists of one or more ranks, and each rank is composed of one or more DRAM chips. Each DRAM chip contains a number of banks, each of which has a separate row buffer (page). Data accesses served from the current row buffer (requests to the same page) have shorter access latency and lower power consumption than accesses to different pages (since the data must then be fetched from the DRAM array into the row buffer again). Thus, successive accesses to the current row buffer (row buffer locality) can improve DRAM performance and energy efficiency to some extent. Alternatively, DRAM efficiency can be increased by accessing data from different banks (bank-level parallelism) or different ranks (rank-level parallelism) simultaneously [28]. In optimising the energy/power consumption of DRAM, some research utilises the different background power modes/states supported by DRAM, since the background power is the largest component of total memory power consumption [35]. More details on the DRAM architecture and the available power states are given in Chapter 2. Even though DRAM can be set to lower power consuming modes to reduce the background power, the DRAM device has to be in the active state before any operation such as a read or write can be performed. There is a penalty (power mode switching cost, or resynchronisation overhead) for switching the device from a lower power background state to the active state. Consequently, the trade-off between the energy saved while in the lower power mode and the resynchronisation cost has to be considered, since a mode that saves more energy incurs a higher resynchronisation cost. Thus, the idle period of the memory bank should be long enough to amortise the power mode switching penalty in order to achieve significant energy savings.
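This trade-off can be captured by a simple break-even condition (an illustrative formulation; the symbols are generic rather than taken from a particular DRAM datasheet). Switching a module into a low power mode is only worthwhile when the idle period satisfies

\[ (P_{\mathrm{standby}} - P_{\mathrm{low}})\, t_{\mathrm{idle}} > E_{\mathrm{switch}} \quad\Longrightarrow\quad t_{\mathrm{idle}} > t_{\mathrm{be}} = \frac{E_{\mathrm{switch}}}{P_{\mathrm{standby}} - P_{\mathrm{low}}}, \]

where \( P_{\mathrm{standby}} \) and \( P_{\mathrm{low}} \) are the background powers in the standby and low power states, and \( E_{\mathrm{switch}} \) is the total energy overhead of entering and exiting the low power mode (including resynchronisation). A deeper power mode lowers \( P_{\mathrm{low}} \) but raises \( E_{\mathrm{switch}} \), and hence raises the break-even idle time \( t_{\mathrm{be}} \).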

3.4.1 Compiler-directed DRAM Power/Energy Management

Significant energy savings and performance improvements can be obtained by exploiting memory operating modes and banked memories simultaneously [28]. Examples of banked memory architectures are Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate SDRAM (DDR SDRAM), and Rambus DRAM (RDRAM). Kandemir et al. [115] proposed a mechanism to restructure the execution order of loop iterations to optimise the energy consumption of the banked memory. The array data and the underlying bank structure are analysed, and the execution order of the loops is restructured according to the number of simultaneous accesses to different banks. After reordering the loop iterations, the data that needs to be accessed within a certain period is located in a small number of memory banks; during this time, the unused banks are set to a lower power mode. Zhong et al. [28] proposed a graph modelling approach to partition the variables located across different memory banks so as to maximally improve system performance and power consumption. If the variables can be grouped into a few memory banks, the other, unused memory banks can be switched to a lower power consuming mode so that the overall energy consumption of the memory system is reduced. In the graph modelling approach of [28], a memory access graph, including the memory location of each variable (which bank/row of memory), is built after generating the data dependencies of all the variables in the program's control flow graph. Based on the memory access graph, the mechanism of [28] iteratively finds an optimal variable partition that achieves the maximum energy saving while satisfying the given performance constraint.

Kandemir discussed a data layout transformation strategy for banked memory in [29] to reduce the energy consumption of the memory system. Array layouts are transformed to group data access patterns into the same bank so that the other, unused memory banks can be placed into a lower power consuming mode. The author of [29] utilised a two-step mapping strategy in the data layout transformations to obtain longer idle times for the unused memory banks. Array indices are mapped to a virtual bank in the first step, and the virtual bank is mapped to a physical bank in the second step. In this approach, only the second step (virtual bank to physical memory bank) needs to be modified when the underlying memory architecture changes.

Luz et al. [124] also proposed a dynamic data migration strategy (DM) that places arrays with temporal proximity (if two or more data items have been accessed together in the past, it is likely that they will be accessed together again in the near future) into the same set of banks to achieve energy reduction in multi-banked memory systems. DM captures the temporal access relations between arrays (by counting the temporally proximate accesses between them) and, from time to time, dynamically migrates arrays with a high number of temporally proximate accesses into the same memory banks. DM samples proximity accesses at regular intervals to capture the number of times two arrays are accessed together within the sampling window. The DM mechanism incurs a high energy overhead as well as an execution time overhead for the migration process. Additionally, the memory controller operations must be modified to track the new location of an array after the migration process.

Delaluz et al. [125] proposed a technique to cluster array variables for DRAM energy reduction. The approach of [125] uses a three-step process: clustering related array variables; generating a bank access profile; and inserting power mode switching instructions into the application code. Related array variables (variables which have similar lifetime access patterns) are grouped together (by modifying the variable declaration order, since variables are placed in the physical memory modules according to the declaration order) so that they can be placed in the same memory module. Different heuristics (such as grouping array variables with the same first-usage program point, or grouping array variables with the same last-usage program point) are applied while clustering the array variables. After that, the bank access profiles are generated based on the clustered variables and the physical memory configuration. The bank access profile records which banks are accessed in each program phase (for example, bank zero and bank one are accessed in phase one while bank two and bank three are idle). With the bank access profile, the points of power mode transition are determined and annotated in the application code.

Delaluz et al. [125] also proposed hardware-assisted run-time memory management (self-monitored) techniques to reduce DRAM energy consumption. The self-monitored approach automatically detects the idleness of the memory modules using a prediction technique and transitions to different low power modes according to the length of the idle period. The self-monitored approach explores three hardware predictors (an adaptive threshold predictor, a constant threshold predictor, and a history-based predictor), each of which predicts the threshold, i.e. the waiting period before switching to a specific low power mode, so as to compensate for the power mode switching overhead. The adaptive threshold predictor (ATP) starts with an initial threshold and transitions to the lower power consuming mode if the module is not accessed within this threshold. If the next access comes within a short period after switching to the low power mode, ATP doubles the threshold for the next power mode switch to improve the actual energy savings. In the constant threshold predictor (CTP), a constant threshold for each low power mode is used for the whole execution (for example, switch to standby mode after x cycles of idleness, and switch to the next lower power consuming mode after (x+y) cycles of idleness). The history-based predictor (HBP) predicts the threshold for the next power mode switch based on the history of the memory access patterns and the thresholds used within a predefined history time window.

The authors of [126, 127] proposed loop-level transformation techniques for multi-bank memories in order to obtain energy benefits for array-dominated applications, such as image and video processing applications. The approach presented in [126] mainly targets the loop transformation techniques and does not apply different low power modes. However, the techniques of [127] utilise the hardware-assisted constant-threshold power management (CTP) proposed in [121] (setting idle memory banks into a suitable low power mode depending on the monitored idle duration). In [126, 127], the application is modified to apply loop-level transformations using loop fusion/fission [128]. Loop fusion (shown in Figure 3.3) combines two loops into a single loop in order to group array references which access the same memory banks within the same loop. However, loop fusion sometimes cannot achieve actual energy savings, due to the extra bank activations which result from mismatched cache line sizes when larger amounts of data are accessed inside one loop [127].


Figure 3.3: Loop Fusion Transformation, sourced from [127]
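Since the loop bodies of Figure 3.3 are not reproduced here, the following generic C++ example (an illustration of the transformation, not the exact code from [127]) shows the idea: two loops that traverse arrays a and b separately are fused so that both arrays are touched in the same iteration.

    #include <cstddef>

    void before_fusion(int *a, int *b, std::size_t N) {
        // Two separate loops: each streams through a different array, so the banks
        // holding a and b are activated in two separate phases.
        for (std::size_t i = 0; i < N; ++i)
            a[i] += 1;
        for (std::size_t i = 0; i < N; ++i)
            b[i] *= 2;
    }

    void after_fusion(int *a, int *b, std::size_t N) {
        // Fused loop: both arrays are accessed in every iteration of a single loop.
        for (std::size_t i = 0; i < N; ++i) {
            a[i] += 1;
            b[i] *= 2;
        }
    }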

Loop fission (also known as loop splitting or loop distribution [129]), shown in Figure 3.4, on the other hand splits a given loop into two or more loops, placing different arrays in separate loops so that the number of bank activations for a given loop is minimised. With the loop fission technique, if the access period of a specific bank can be prolonged (because the different arrays reside in separate memory banks), the other, unused banks can stay in a lower power mode for longer. Thus, loop fusion improves the cache's spatial locality (the likelihood of referencing data when nearby data was just referenced), whereas loop fission exploits bank locality (accessing data from the same bank).


Figure 3.4: Loop Fission Transformation, sourced from [127]
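Again as a generic illustration (not the code of [127]), loop fission splits one loop into per-array loops so that only one array's bank needs to be active at a time, assuming a and b reside in different banks.

    #include <cstddef>

    void before_fission(int *a, int *b, std::size_t N) {
        // Original loop: a and b are accessed in every iteration, so both banks
        // must stay active for the whole loop.
        for (std::size_t i = 0; i < N; ++i) {
            a[i] += 1;
            b[i] *= 2;
        }
    }

    void after_fission(int *a, int *b, std::size_t N) {
        // After fission: while the a-loop runs, the bank holding b can remain in a
        // low power mode, and vice versa.
        for (std::size_t i = 0; i < N; ++i)
            a[i] += 1;
        for (std::size_t i = 0; i < N; ++i)
            b[i] *= 2;
    }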

Loop tiling, presented in [127, 130] and shown in Figure 3.5, is another technique which partitions a loop into smaller uniform blocks and schedules their execution so as to enhance temporal cache locality (if data is referenced at one point, the same data is likely to be accessed again in the near future) across multiple loops. Figure 3.5 shows how an array can be accessed in blocks of size T, so that memory traffic is reduced because the loop data stays in the cache until it is reused.


Figure 3.5: Loop Tiling Transformation, sourced from [127]
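A generic sketch of loop tiling (not the exact code of [127, 130]): the i loop is broken into blocks of size T so that the block of a currently being updated stays resident in the cache while it is reused across the whole j loop.

    #include <cstddef>

    void tiled_update(int *a, const int *b, std::size_t N, std::size_t T) {
        // The iteration space of i is partitioned into tiles of size T; each tile of a
        // is small enough to stay in the cache while the inner loops reuse it.
        for (std::size_t ii = 0; ii < N; ii += T) {
            std::size_t upper = (ii + T < N) ? ii + T : N;
            for (std::size_t j = 0; j < N; ++j)          // b[j] is reused across the tile
                for (std::size_t i = ii; i < upper; ++i)
                    a[i] += b[j];
        }
    }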

Koc et al. [131] also proposed a DRAM energy reduction technique which performs extra computation for data located in a low-power memory bank so that the power mode switching penalty (from the lower power mode to the active power mode for read/write operations) can be avoided. If data from the low-power memory bank is needed, the requested data is recomputed from the values stored in the active banks instead of reactivating the low-power bank. However, energy savings for the memory system cannot always be achieved using the data recomputation of [131], since the recomputation itself adds execution time overhead; the extra computation sometimes has a large negative impact on the energy savings of the memory system.

3.4.2 OS-level DRAM Power/Energy Management

Lebeck et al. [119] explored the interaction of the page allocation scheme with static and dynamic hardware power management policies at the OS layer. The static power management scheme uniformly places the whole memory chip into a single low power state, whereas the dynamic power management scheme places each memory chip into a different low power state according to its idle time. The dynamic power management scheme proposed in [119] is similar to the constant threshold predictor employed in [125] (a compiler-directed DRAM power control approach), discussed in Section 3.4.1. The dynamic policy exploits the idle time between accesses to a memory module and sets the module into a lower power state if it is not accessed within a predefined threshold time (say, x cycles). If the module has still not been accessed within the next (x+y) cycles, it is set to the next lower power state. To maximise energy efficiency, the authors of [119] also modified the OS kernel to allocate physical pages to the minimum number of DRAM chips and to set the DRAM chips unused within a predefined time window into low power states using the static/dynamic power management policies.

Tolentino et al. [132] discussed three OS-level page shaping techniques (reactive shaping, proactive shaping, and hybrid shaping) which minimise the number of active memory devices; the non-active memory devices can then be placed into a lower power mode, which in turn reduces the energy consumption of the memory system. In the reactive shaping policy, the traditionally scattered allocated page frames are reactively aggregated into a subset of memory modules through page migration. To avoid the page migration overhead, the proactive shaping approach modifies the page allocator to allocate pages from a subset of memory modules, occupying only the minimal number of modules at allocation time. However, the proactive shaping approach leads to page fragmentation over time. The hybrid approach therefore combines the allocation-time page placement of the proactive shaping policy (to occupy the minimal number of modules at allocation time) with the migration feature of the reactive shaping policy (to perform periodic page aggregation into a minimal set of memory modules).

Mingsong et al. [133] proposed a power-mode switching delay hiding mechanism at the OS level which performs the power-mode switching of DRAM ranks early (from active mode to low power mode, or from low power mode to active mode). The early power mode switching (switching the device to/from the low power mode just in time, before the actual switching point) is managed through the OS-controlled buffer cache [134], which has a direct mapping to DRAM for I/O accesses. Most accesses to the buffer cache are made through the OS's system calls, and thus the OS has direct control over the early power mode switching. The mechanism of [133] completely hides the switching delay by setting the DRAM device into the low power state as soon as the I/O processing finishes, and switching the device back into the active power state just before the memory request arrives at the memory controller.

Delaluz et al. [135] proposed an OS-level power mode control strategy that keeps track of the number of accesses to DRAM banks by each process. The method of [135] switches the memory banks that are unused by the next process into low power mode at context switching time (the period during which the current process is preempted and control is transferred to the next process). The unused memory banks of the next process are predicted from the number of memory bank accesses made by each process within a previous predefined time window, as recorded in the OS page table. The OS updates the bank usage information of each process based on where the physical page resides in DRAM, the mapping of the physical page to the virtual page, and the status of the virtual pages inside the page table.

Similar to the approach of [135], which controls the power mode at bank-level granularity, Huang et al. [136] proposed a power management technique, called power-aware virtual memory (PAVM), at rank-level granularity. The PAVM mechanism keeps track of the usage of ranks by different processes and sets the ranks that are inactive for the next process into low power mode at process scheduling time (context-switching time). PAVM also aggregates the application data related to each process into a subset of active ranks, so that the number of ranks required to be in the active power mode for each process is minimised. PABC (Power-Aware Buffer Cache management) [31], another DRAM power management technique, takes into consideration the OS-related data space as well as the application's data space when aggregating the process-related data into a subset of active ranks.

Min et al. [137] also proposed a power mode control mechanism similar to those of [135, 136]. However, [137] decides when to place a memory module into a lower power consuming mode based on the number of consecutive non-accesses to that module, instead of setting all unused memory modules into the lower power mode at context-switching time as in [135, 136]. In [137], the OS kernel keeps track of the number of consecutive accesses and non-accesses for all memory modules by updating an access/non-access counter for each module at every memory access. If the number of consecutive non-accesses to a memory module reaches a predefined threshold, that module is switched into the lower power mode, on the assumption that temporal locality will prevail (if the module has not been accessed for some time, it is unlikely to be accessed again in the near future).

Huang et al. [120] described a memory traffic reshaping mechanism at the OS level that prolongs the idle periods of DRAM ranks so that they can remain longer in the low power mode. The idea of creating longer idle periods in the DRAM device proposed in [120] is similar to the ideas of [28, 29, 115] discussed in Section 3.4.1. The proposed reshaping mechanism of [120] classifies each rank as either a cold rank (for placing infrequently used pages) or a hot rank (for placing frequently accessed pages). The OS kernel keeps track of the number of times each page is accessed and performs page migration to move frequently accessed pages from the cold ranks to the hot ranks. By moving frequently accessed pages from the cold ranks to the hot ranks, the cold ranks can be kept in low power mode longer and energy savings can be achieved, since frequent power mode switching no longer occurs for the cold ranks.

Moshnyaga et al. [138] also proposed an OS-level page allocation mechanism. Unlike the page allocation schemes proposed in [119, 120, 132], which mainly aim to obtain some unused DRAM modules by aggregating pages into a subset of memory modules, the approach of [138] allocates OS pages so as to reduce the number of DRAM refresh operations. Generally, unused memory modules do not need to be refreshed at the periodic refresh points of the DRAM memory. The operating system has complete information on the unused pages and their associated DRAM locations. The approach presented in [138] therefore allocates pages so as to minimise the number of occupied DRAM locations and skips the refresh operations for the DRAM locations that hold only unused pages. DRAM energy consumption can thus be reduced by controlling the refresh processing, which consumes an amount of energy that depends on the size of the region being refreshed.

Bathen et al. [139] proposed a power-aware page allocation mechanism, ViPZonE, to reduce the DRAM power consumption. ViPZonE considers the utilisation and the power consumption of DRAM operations for the various DIMMs (dual in-line memory modules consisting of a series of DRAMs). ViPZonE inserts code annotations at the application level (to generate special memory allocation requests to the OS kernel) and modifies the standard memory allocation algorithm of the GLIBC (GNU C) library (to identify the preferred DRAM location). With the ViPZonE approach, the OS kernel places the allocations requested by the annotated code inside the specified DIMMs. Thus, ViPZonE optimises the DRAM energy consumption by allocating from certain DIMMs so that other, unused DIMMs can be set to a lower power mode (within a specified time window).

3.4.3 System-level DRAM Power/Energy Management

Several researchers have worked to reduce DRAM energy consumption by setting the DRAM device into one of the low power states at the system level. The default power management mechanism inside the memory controller sets the DRAM device into low power mode after every memory request. Therefore, the default power management scheme of DRAM can significantly degrade performance, since every memory access incurs the power mode switching cost [133].

Liu et al. [123] introduced a DRAM power management technique that buffers DRAM write operations in an intermediate buffer (called the page hit aware write buffer, PHA-WB) to improve the hit rate of the row buffer (also known as the sense amplifier or DRAM page). If the requested data is in the row buffer, row precharging and activation (which must otherwise be performed to bring new data from the DRAM array into the row buffer) are not needed and hence the DRAM activate power can be reduced. The authors of [123] intended to reduce the frequent DRAM page conflicts caused by write operations (the write-induced interference [107] discussed in Section 3.3). Although read operations cannot be delayed, since the processor requires their responses before executing other instructions, write operations can be delayed without impacting the performance of the system [111]. Thus, the PHA-WB approach buffers a write operation if it targets a location different from the locations currently in the row buffer. At every row activation, the buffered write operations whose target locations match the locations in the row buffer are triggered. The PHA-WB approach reduces the activate power (by shifting the write operations) as well as the read power (by serving the response from the PHA-WB instead of accessing the slow DRAM when the requested data resides in the PHA-WB).

Although the PHA-WB approach can reduce the activate power and read power, the extra power consumed by the PHA-WB itself (which depends on the number of entries) adversely affects the DRAM power savings. Thus, Liu et al. [123] proposed a throughput-aware PHA-WB (TAP) mechanism which dynamically adjusts the size of the PHA-WB (using clock gating [140]) according to the DRAM access patterns. The TAP approach keeps track of the number of write operations within a sequence of consecutive DRAM accesses in a predefined interval and modifies the number of active entries in the PHA-WB accordingly. Different from the PHA-WB approach, which reduces the activate power and read power, this thesis utilises only a last-level intermediate cache in order to reduce the background power, read power, and write power of the DRAM system.

The authors of [32] described a queue-aware power-down mechanism and a power-aware memory scheduler to optimise the power consumption of DRAM by managing the commands inside the command queue of the DRAM controller. The queue-aware power-down mechanism monitors all the commands inside the command queue and sets the idle DRAM ranks (those which do not have to service any commands in the command queue) into low power mode. The power-aware memory scheduler, on the other hand, groups the commands inside the command queue according to the target rank.
The idle time of each DRAM rank is increased by servicing same-rank commands consecutively, which reduces the number of power-mode switching events. In order to further prolong the idle time (power-down duration) of the memory modules, [32] estimated a delay time during which all the memory commands are blocked. In [32], the acceptable delay time is estimated with a linear-regression-based delay estimator which uses the DRAM state information and a predefined power threshold.

Kim et al. [141] proposed a power-aware scheduling scheme that modifies the command scheduler of the memory controller in order to minimise the number of power state transitions. The power-aware scheduler of [32] groups the commands which target the same ranks by considering all the commands in the queue, whereas [141] groups only the write commands which target the same ranks. In [141], a batch of write transactions from the command queue is issued when the number of pending write transactions is larger than a predefined threshold. The batch of write transactions is selected from the pending write transactions based on the power mode switching cost from the current power state of the target rank to the active state: the lower the power mode switching cost, the higher the chance that the write operation is selected to be issued to DRAM.

Kim et al. [141] also proposed a DRAM power-aware rank scheduler that modifies the cache replacement policy of the last-level cache in order to reduce the number of write operations to DRAM (and thus the active power). In [141], the cache blocks are divided into two regions (a least-recently-used (LRU) region and a non-LRU region), and two techniques (clean block selection and target-rank power-state-based block selection) are applied to select the replacement block in the write-back cache. Clean block selection chooses the replacement block from the non-dirty blocks (whose data has not been modified) of the LRU region, since no write-back operation is triggered when a non-dirty block is replaced. If a clean block cannot be found in the LRU region, the target-rank power-state-based block selection method is applied, in which the replacement block is selected based on the power state of the target rank of the required write-back data. The rank selection is prioritised according to the power mode switching cost from the current power state to the active power state (the lower the switching cost of the target rank, the higher the chance that the cache block is selected).

Wu et al. [142] proposed a system-level memory design called RAMZzz to optimise the energy consumption of DRAM. In RAMZzz, the ranks are categorised as hot ranks and cold ranks according to the rank idle time. The concept of hot and cold ranks is also used in [143] (discussed in Section 3.4.2), where the ranks are divided based on the access frequency of the OS pages. In RAMZzz, for a specific memory request pattern, the hot ranks exhibit a large number of short idle periods whereas the cold ranks exhibit a small number of long idle periods. RAMZzz periodically migrates the DRAM pages that have short idle periods into the hot ranks so that the cold ranks can be kept longer in a low power consuming mode. In the RAMZzz memory system, the memory controller also monitors the memory access locality periodically and groups DRAM pages with similar access locality (based on the effect of the idle period pattern on the ranks) into the same rank, so that the pages in the same rank have roughly the same level of hotness [142].

Trajkovic et al. [144] proposed a DRAM energy reduction technique that exploits a small prefetch buffer and a write buffer inside the memory controller. An intermediate write buffer is also used in the PHA-WB approach [123], which targets an increase in the page hit rate to reduce activate power. The technique of [144] aims to minimise both the activate power (due to the activation and precharging processes) and the active power (due to active operations such as reading and writing). On every read miss, additional cache lines are fetched together with the missed cache line into the prefetch buffer. A predefined number of consecutive write operations is buffered, and the group of buffered write operations is sent only when the DRAM is not busy servicing any other read or write request. The approach presented in [144] uses fixed configurations (16-byte line size with full associativity) for both the prefetch buffer and the write buffer. In this thesis, we also exploit a last-level cache (used as a prefetch buffer), but our approach targets the reduction of the background power consumption in addition to the reduction of the activate power and active power.

Amin et al. [35] also proposed a modification of the cache replacement policy, called the rank-aware last-level cache replacement policy (RARE), to reduce the background power consumption of DRAM. RARE prevents the cache blocks of pre-chosen prioritised ranks from being replaced so that the prioritised ranks can stay longer in the low power mode, thereby reducing the background power for these ranks. In order to avoid cache pollution (eviction of frequently required data from the cache) due to the prolonged prioritisation of ranks, RARE updates the set of prioritised ranks in a round-robin fashion after a predetermined time period. The utilisation of DRAM's deepest low power background mode (the self-refresh power-down mode) in RARE is somewhat similar to the approach presented in this thesis; however, our work provides an insightful evaluation of the self-refresh mode setting with different last-level cache configurations.

Amin et al. [35] also proposed the RAWB (rank-aware write buffer) approach. In the RAWB approach, an intermediate write buffer (similar to [123, 144]) is utilised to buffer write transactions that target memory ranks in low power mode, so as to avoid the frequent power mode switching caused by the write requests. The idea of utilising an intermediate write buffer is also proposed by Liu et al. [123] (the page hit aware write buffer approach), where the write transactions are buffered and sent later as a batch of write operations to improve the page hit rate. The RAWB approach sends a batch of write requests to a certain rank only when a cache read miss to that rank occurs.

RAIDR [145] (Retention-Aware Intelligent DRAM Refresh) is proposed to reduce the number of refresh operations (which in turn reduces the refresh power) by skipping unnecessary refreshes of the DRAM device. DRAM refresh operations delay the processing of memory requests and thus degrade system performance and incur high power consumption. Reducing the number of refreshes is also proposed in [138], which skips the refresh operations for unused memory modules through OS-level DRAM power control, as discussed in Section 3.4.2. In contrast, RAIDR groups DRAM rows into different retention time bins based on their retention time (the interval within which a row must be refreshed) and applies a different refresh rate to each bin. RAIDR profiles the retention time of each DRAM row by monitoring the time interval until one of its bits first changes after the previous refresh operation. If the retention time of a row is longer than the normal periodic refresh interval, that row does not need to be refreshed at every regular refresh point. After profiling the retention times, RAIDR groups the DRAM rows into the retention time bins and performs refresh operations only for the rows in each bin that actually need to be refreshed, at the refresh rate required for that bin.

An approach to reduce refresh power is also proposed in the smart refresh mechanism by Ghosh et al. [146]. The smart refresh technique avoids performing periodic refresh operations on DRAM rows which have recently been read or written. The smart refresh approach uses a counter for each DRAM row to determine whether the row needs to be refreshed at the periodic refresh points. The counters of rows being read or written are reset to the periodic refresh interval value, and the next periodic refresh operation is skipped for rows whose counters still hold a non-zero value. RAIDR [145] uses the retention-bin approach, whereas the smart refresh technique [146] uses the counter approach to reduce refresh power.

Huang et al. [143] proposed a cooperative OS-level and system-level power management mechanism to efficiently manage DRAM power consumption. In the cooperative technique of [143], a context-aware power management unit (PMU) is implemented in the system-level memory controller, which uses information on the different memory access behaviours on a per-process basis given by the OS kernel. The PMU keeps the history of rank access information for each process within a predefined time interval. Based on the rank access information of each process, the PMU determines a suitable threshold value for each rank (the waiting time before triggering the power mode switching process) so as to switch the device into low power mode. The PMU approach also sets the unused memory ranks (determined by checking the rank-access timing pattern of each process in the PMU) into low power mode.

3.4.4 DRAM Power/Energy Estimation

DRAM power/energy estimation techniques are proposed in [147, 148]. Kadayif et al. [147] proposed a compiler-directed energy-aware compilation framework (EAC) to estimate and optimise the energy consumption of DRAM for a specific target application. EAC estimates the energy consumption of the system in order to assess the energy impact of data optimisations such as loop tiling [127, 130] and data transformations [29] (which were discussed in Section 3.4.1). EAC computes the energy consumption of the system by extracting, at the compiler level, the information required to compute the power consumed in the datapath, cache memory, main memory, buses, and clock network. The information extracted by EAC includes the number and type of each instruction (branch instruction, addition instruction, etc.) to compute the datapath energy, the number of hits/misses and reads/writes to compute the cache energy, the number of execution cycles and memory stall cycles for the clock energy, and the number of accesses to compute the memory energy consumption [147]. In this thesis, we estimate the total energy savings by using different memory-related information, such as the total idle (power-down) duration, the total number of refresh operations, and the total number of DRAM requests.
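The decomposition used by compiler-level estimators such as EAC can be summarised by a generic additive model of the following form (a restatement of the description above, not the exact EAC formulation):

E_system ≈ E_datapath + E_cache + E_memory + E_bus + E_clock,
where, for example,
E_cache  ≈ N_hit · E_hit + N_miss · E_miss,
E_memory ≈ N_access · E_access,
E_clock  ≈ (N_exec_cycles + N_stall_cycles) · E_cycle.

Each per-event energy term (E_hit, E_miss, E_access, E_cycle) is a technology-dependent constant, and the event counts are extracted by the compiler from the optimised code.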

An estimation model to predict the idle period distribution pattern is also proposed in RAMZzz [142]. RAMZzz's estimation model uses the history of the access locality of each page of the memory ranks within the previous predefined time slot to estimate the idle duration patterns for the next time slot. RAMZzz also determines the idle waiting period before performing a power mode transition, since some of the short idle periods do not yield actual power savings. In contrast, the estimation model presented in this thesis estimates the total energy savings based on the application-specific idle period pattern.

Recently, Thomas et al. [148] proposed a prediction-based power saving policy (PSP) that uses a history-based predictor to forecast the duration of the idle period and employs a suitable low-power mode (either self-refresh or power-down mode, or a combination of both). PSP predicts the waiting time threshold (the idle time to wait before switching to the low power mode) as well as the time at which the DRAM modules should be switched back to the active power mode, in order to minimise the power mode switching costs. Instead of predicting the duration of the idle period within a specified time window at run time, this thesis processes a sequence of memory requests to obtain an estimate of the total idle (power-down) duration.

3.4.5 Summary of DRAM Power/Energy Management Research

DRAM power consumption is one of the most important components of the power consumption of the whole system, and thus its reduction is targeted during the power-aware high-level optimisation process. Current power optimisation approaches at the compiler level are limited by the requirement that the source code be available. Additionally, a compiler-directed approach has control only over the data portion and does not have direct access to optimise instructions. Moreover, compiler-directed approaches are unaware of the underlying system architecture. Hence, it is not possible to obtain fine-grained power optimisation of the DRAM system with compiler-directed power management techniques.

OS-directed power management approaches are transparent to the system architecture, but OS-level approaches mostly manage memory optimisation with OS-level paging units. Thus, OS-level power optimisation techniques only achieve coarse-grained power optimisation, which cannot realise the maximum energy benefits of the DRAM system. System-level power optimisation approaches, on the other hand, can fully utilise the available resources and have complete control over the fine-grained components. Therefore, system-level power management is well suited to optimising the power/energy consumption of the whole memory chip at once.

Most of the previous approaches do not exploit the lowest power consuming mode of DRAM memory, because of the high power mode switching cost incurred by frequent power mode switching events. In this thesis, we utilise the lowest power consuming mode (the self-refresh power-down mode) for the DDR(x)-DRAM system (DDR(x) refers to DDR, DDR2, and DDR3). Our work differs from all of the above system-level approaches in that we explicitly exploit spatial locality by inserting a last-level cache, utilise DRAM's lowest power background mode, and examine the effects of the configurations of this last-level cache on the DRAM energy savings. We also show a quick estimation method to predict the energy savings of the DRAM system for each last-level cache configuration, out of hundreds of last-level cache design space configurations, for a typical embedded system. Finally, we show how to utilise the proposed estimation method to select a suitable last-level cache configuration in order to achieve the maximum energy benefits of the DRAM system. This thesis will thus help in providing a DRAM power-aware design methodology at the system level with reduced design time.

Chapter 4

Interface Abstraction Layer

4.1 Introduction

Embedded systems are heavily utilised in all aspects of modern life. Most modern embedded systems are built using processors and include memory. Presently, the most commonly used and economical memory type is DDR SDRAM. The channel is the connection between the DRAM device and the memory controller. The DRAM device may contain one or more ranks, each of which is composed of one or more DRAM chips. Each DRAM chip contains a number of banks, and each bank contains a number of rows of memory elements. Within a bank, a single row is brought into what is known as the "row buffer" in a process known as activation. This activation process costs time and energy. Once the row is in the row buffer, numerous accesses can be made to it in rapid succession. If another row is needed, then the data currently in the row buffer must first be written back to its original row (known as precharge), and the new row is then activated by writing its data into the row buffer. Since there are multiple banks in the memory, multiple row buffers can be filled and made ready to be read or written. Thus, accessing different data from the same row takes far less time than accessing data from different rows.


Up to 90% of the non-I/O energy is consumed by the memory in embedded systems [114]. A number of different modes are available in a typical modern memory system. Figure 4.1 shows the modes available and the respective power consumption in each of the modes (for a 1Gb DDR3 device from Micron Inc. [2]). In Figure 4.1, the ovals refer to the background power states whereas the rectangular boxes refer to the specific activities (reading, writing, activating or precharging). Thus, the energy consumed will vary (especially the background energy) based upon which modes are visited for an access. In summary, the performance and power consumption of the memory system depend mostly on the status and the power mode of the device, respectively.

[Figure omitted: DDR3 power-mode state diagram. Background states and their approximate currents: Precharge Standby (55 mA), Precharge Power-Down (12 mA slow exit, 35 mA fast exit), Active Power-Down (35 mA), Active Standby (57 mA), and Self-refresh Power-Down (6 mA). The activity boxes are activating, reading, writing and precharging, with the exit latencies (in clock cycles) annotated on the transitions. PD: Power Down, SE: Slow Exit, FE: Fast Exit, SB: Standby.]

Figure 4.1: Available Power Modes and Approximate Power Consumption in Specific Power Modes of DDR3 SDRAM [2]
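As a rough illustration of the gap between the background modes (assuming the nominal DDR3 supply voltage of 1.5 V and the currents shown in Figure 4.1; the actual background power also includes other current components):

P_background ≈ I_DD × V_DD
Precharge Standby:        ≈ 55 mA × 1.5 V ≈ 82.5 mW
Self-refresh Power-Down:  ≈  6 mA × 1.5 V ≈  9.0 mW

so keeping the device in the deepest background mode reduces the background power by roughly an order of magnitude.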

Testing the application on the real target platform is nearly impossible at design time. If an erroneous outcome of the system design is observed only after the target product is finalised, the design and manufacturing cost will be much higher. Thus, there is a need for simulation at design time. To simulate a processor with a memory system, typically an instruction set simulator is used, and a fixed number of clock cycles is assumed for memory accesses. To simulate and obtain memory performance and power consumption figures, the designer typically uses a two-pass approach: the first pass generates a memory trace from the Instruction Set Simulator (ISS); and the second pass processes the memory trace from the first pass in a memory simulator. In the two-pass approach, the processor simulator has no knowledge of the memory latency and power consumption of the memory module. Thus, the processor component uses a fixed latency and a fixed power consumption for all memory requests. Using a fixed memory latency to generate the memory trace degrades the correctness of the system memory simulation, as the memory latency and the consumed power vary from one memory request to another based on the state of the memory system and the memory configuration settings. This error is more acute in modern memories such as DDR2 and DDR3, where different accesses can take quite different amounts of time and power.

4.2 Motivation

Advances in system memory technology have helped to reduce the memory wall problem [18]. However, the memory latency, which varies from one memory request to another in more advanced memory devices such as DDR memories, leads to inaccurate performance outcomes in the simulation phase. Figure 4.2 shows the fixed latency that should be used to obtain the closest execution time and power consumption values for six different benchmarks using the Xtensa processor [60] and DDR3-SDRAM memory [2]. As can be seen in Figure 4.2, not only do the best fixed clock latency values vary for the same benchmark, they also vary between benchmarks, in order to obtain the correct results for different metrics (in this case, execution time and power consumption).

[Figure omitted: bar chart of the fixed memory latency (in clock cycles) that gives the closest match to the correct value for each benchmark and metric:

                         adpcm Enc   adpcm Dec   jpeg Enc   jpeg Dec   g721 Enc   g721 Dec
For Execution Time              32          33         36         31         33         33
For Power Consumption           36          33         31         32         25         20  ]

Figure 4.2: Memory Latency [clock cycles] to obtain the closest values to the correct Execution Time and Average Power Consumption

By using an incorrect simulator (say, one using fixed cycle counts, which at best can only be guessed at), a designer can easily make significant errors, particularly for real-time systems. Correct simulation, in contrast, guides the designer towards an optimal system. In this research, we study the problem of memory simulation in the two-pass approach and propose a novel methodology for integrating the processor module and memory module that overcomes the performance inaccuracies of the typical two-pass approach to system architecture simulation.

The rest of this chapter is organised as follows. Section 4.3 presents our contributions in this chapter. Section 4.4 introduces the underlying concepts of a processor-memory simulator. The proposed design methodology for integrating the processor simulator and memory simulator is discussed in Section 4.5. Section 4.6 compares the results of our approach with the typical two-pass simulation approach. Finally, the summary of this chapter is presented in Section 4.7.

4.3 Contribution

The main contributions of this chapter can be summarised as follows:

• For the first time, we provide a generic methodology to integrate a cycle-accurate processor simulator with a DRAM simulator to provide timing and memory power information;

• As a proof of concept, we provide a case study which utilises the above methodology to integrate Tensilica's cycle-accurate ISS with DRAMsim's timing and power model; and,

• We show the usefulness of such a cycle accurate simulation model over fixed latency simulators for gaining accurate results.

4.4 Background

The novel generic integration component presented in this chapter acts as a bridge between the functional cycle-accurate processor module and the memory module, in order to obtain accurate performance and power metrics. This section provides background on processor and memory simulation and highlights the problems involved in the typical separate simulation.

4.4.1 Processor Simulator Component

Traditional full-system simulation techniques first generate a trace file of the accessed memory elements and then feed the trace into a memory component to obtain the memory statistics. However, this leads to large inaccuracies due to the lack of feedback from the memory system to the processor simulator (for example, even though different requests take different amounts of time, the processor simulator is not able to utilise these values). The initial processor simulation assumes a fixed average delay for every memory access within the system. In reality, the memory delays are variable, and this variance can give erroneous results, especially where dynamic power optimisation techniques are being applied. Many factors contribute to this variation, such as the latency of switching memory banks or the reordering of requests by the memory control hardware to reduce power consumption. Therefore, it is necessary to create a full-system simulator to achieve accurate results that can be used to properly analyse the system.

4.4.2 Memory Simulator Component

Most memory simulators are not functional simulators and do not manipulate the stored values. All the system memory requests (mainly consisting of memory addresses and timing information) are handled by the memory controller, which manages the flow of data going to and from the main memory. Generally, the memory controller performs transaction scheduling, address translation, command scheduling, and buffer and bank management. The transaction scheduling component schedules the memory request transactions according to transaction policies such as First Come First Served (FCFS), Priority Scheduling, First Ready-FCFS (FR-FCFS), and Read Over Write. The physical address given by the processor has to be translated into the DRAM memory address in the form of channel, rank, bank, row and column. Typically, the memory controller uses one of two policies, open page or close page, to control the DRAM row buffer: the open page policy keeps a row open after reading/writing data to/from the row buffer, whereas in the close page policy the row buffer is closed immediately after the data has been processed. In command scheduling, the memory controller arranges the DRAM access protocol signals, such as the row address strobe (RAS) and the column address strobe (CAS). In general, the variation in memory latency depends on the states of the above four typical phases of the memory controller. As the system performance and power utilisation mainly rely on the memory responses to the processor, there is a need for detailed and accurate memory simulation to analyse performance and power consumption tradeoffs.
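A minimal sketch of the address translation phase is given below; the geometry (one channel, two ranks, eight banks, 16384 rows, 1024 columns, 4-byte column granularity) and the bit ordering are illustrative assumptions, since real controllers such as the one in DRAMsim support several configurable mapping policies.

#include <stdint.h>

typedef struct {
    unsigned channel, rank, bank, row, col;
} dram_addr_t;

/* Translate a flat physical address into a DRAM address for the assumed
 * geometry. Field order from least to most significant: column, bank,
 * rank, row, channel.                                                   */
dram_addr_t translate(uint32_t phys_addr)
{
    dram_addr_t a;
    uint32_t x = phys_addr >> 2;        /* drop the byte offset within a column */
    a.col     = x % 1024;  x /= 1024;
    a.bank    = x % 8;     x /= 8;
    a.rank    = x % 2;     x /= 2;
    a.row     = x % 16384; x /= 16384;
    a.channel = x;                      /* single channel in this example       */
    return a;
}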

4.5 Proposed Integration Methodology

Central to the novel methodology implemented in this work is the Interface Abstraction Layer (IAL), which allows the processor core to communicate with the system memory for all memory transactions. In our proposed approach, we build the memory model into the IAL so that the processor component and memory component can be simulated together to obtain performance statistics with high accuracy. It should be noted that functional simulators need to manipulate the values in a storage model; such manipulation of values is not necessary for mere performance evaluation. All the data/instruction requests (cache misses) from the processor determine their individual latency via a cycle-accurate memory model instead of using a fixed latency for every memory request. The IAL has to take care of the functionality of the memory model, as the processor core needs to actually retrieve/store values in the memory. If the memory request operations were not handled accurately, the application would not be truly cycle-accurate within the system simulator (for example, if there are data-dependent loops within the program, then the number of times the loops execute depends on the values stored). To make certain that unnecessary blocking does not occur in the system performing the simulation, the processor simulation, the IAL and the memory simulation are executed as separate processes. Thus, the processor component can continue processing items which are independent of the result of the requested memory operations. The detailed data flows of the processor component and memory component across the IAL are illustrated in Figure 4.3. There are five subcomponents in the IAL, as shown in Figure 4.4: the memory request handler; the transaction controller; the location index controller; the memory simulation controller; and the timer callback handler. The memory request handler and the timer callback handler are the integral components for communicating between the processor component and the memory module.

[Figure omitted: detailed simulation framework. The Xtensa core (running the benchmark application with its cache model) posts cache misses and load/store requests to the IAL, whose memory request handler, transaction controller, timer callback and DRAM storage update logic forward transactions to the DRAMsim memory simulator (bus interface request queue, physical-to-DRAM address mapping, command scheduling and per-bank command queues), and OK/error responses are returned for completed transactions.]

Figure 4.3: Detailed Simulation Framework

The timer callback handler must be triggered to perform the interface layer processing and memory simulation once for every simulated processor clock cycle. The memory request handler accepts the memory requests (cache misses) from the processor module: whenever a cache miss occurs, the processor simulator sends the memory request to the memory request handler. In some systems, the size of the memory request coming from the processor simulator may not be exactly the same as the interface bus (front side bus) width of the system memory.

[Figure omitted: internal structure of the Interface Abstraction Layer. Cache misses from the processor module trigger the memory request handler; the transaction controller issues the (possibly block) memory requests to the memory module, the DRAM location controller performs the read/write on the DRAM storage for completed transactions, the memory simulation controller drives the memory module once per clock cycle, and the IAL then waits for the next clock-cycle trigger before responding OK/error to the processor.]

Figure 4.4: Interface Abstraction Layer

Generally, the size of the memory request is larger than the front side bus width. When such a case occurs, the transaction controller in the IAL has to issue multiple memory requests according to the requested size and the interface bus width of the system memory (this is referred to as a block memory request). For example, the transaction controller must issue two memory requests when the requested size is 64 bits and the interface bus width is only 32 bits. As a consequence, the transaction controller has to collect multiple responses for a block read request. Similarly, for a block write request, the processor core is notified only after the last transaction of the block write request has been processed. The timer callback handler is a critical component for driving the memory simulation: in every processor clock cycle it is triggered and, in turn, sends a notification to the memory simulation controller to perform the necessary memory simulation actions. The memory requests from the transaction controller are processed during this memory simulation, according to the transaction handling policy of the system memory simulator.
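How the transaction controller might derive the number of bus transfers for a block request can be sketched as follows; the function name is hypothetical and not part of the actual IAL code.

/* Split one processor request into bus-wide memory transactions.
 * request_bits: size of the cache-line refill request (e.g. 64).
 * bus_bits:     width of the memory interface bus (e.g. 32).      */
static inline unsigned num_transactions(unsigned request_bits, unsigned bus_bits)
{
    return (request_bits + bus_bits - 1) / bus_bits;    /* ceiling division */
}

/* Example: num_transactions(64, 32) == 2, so the block read is issued as two
 * memory transactions, and the processor is answered only after the second
 * (last) transaction completes.                                             */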

After the memory simulation for the current clock cycle has finished, the transaction controller has to check whether there is any completed memory request transaction from the memory simulator. The memory request simulation may be only partially completed if the initial memory request is a block request: for a block read/write memory request, even though the current memory request simulation has finished, the block request that generated it may not be complete, in which case the transaction controller needs to begin memory simulation of the next transaction of the block request, continuing until all transactions in the block request have been completed. The IAL location controller manages the read/write operation on the DRAM storage after getting the transaction-complete signal from the transaction controller. The location controller has to convert the logical address requested by the processor into a physical DRAM address (in the form of channel id, rank id, bank id, row id, column id), which in turn is translated into the storage location index. After getting the storage index from the location index module, the data read/write operation is performed on the simulated DRAM storage and the completed-transaction signal is sent to the processor core. Finally, the IAL component waits to perform the simulation again during the next processor clock cycle.
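The per-cycle behaviour described above can be summarised by the following skeleton; the helper functions and the transaction record are hypothetical stand-ins for the IAL subcomponents, and the real IAL is driven by the simulator's timer callback rather than by a plain function call.

#include <stdint.h>

typedef struct transaction {            /* hypothetical completed-transaction record      */
    struct block_req *parent;           /* the block request this transaction belongs to  */
    unsigned channel, rank, bank, row, col;
    int      is_write;
    uint32_t data;
} transaction_t;

/* Hypothetical helpers standing in for the IAL subcomponents. */
extern void           memory_simulate_one_cycle(void);
extern transaction_t *get_completed_transaction(void);
extern int            block_request_finished(struct block_req *b);
extern void           issue_next_transaction(struct block_req *b);
extern uint32_t      *location_index(unsigned ch, unsigned rk, unsigned bk,
                                     unsigned row, unsigned col);
extern void           respond_to_core(struct block_req *b);

/* Invoked once per simulated processor clock cycle by the timer callback handler. */
void ial_tick(void)
{
    transaction_t *t;

    memory_simulate_one_cycle();                  /* advance the DRAM model by one cycle  */

    while ((t = get_completed_transaction()) != NULL) {
        if (!block_request_finished(t->parent)) {
            issue_next_transaction(t->parent);    /* continue the block request           */
            continue;
        }
        /* whole block done: perform the read/write on the simulated DRAM storage */
        uint32_t *loc = location_index(t->channel, t->rank, t->bank, t->row, t->col);
        if (t->is_write) *loc = t->data;
        else             t->data = *loc;
        respond_to_core(t->parent);               /* unblock the processor's request      */
    }
    /* control then returns to the simulator until the next clock-cycle trigger */
}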

4.5.1 Case Study

The case study provided in this chapter is based on an existing processor simulator (Tensilica's Xtensa tool suite) and an existing DDR3-DRAM memory simulator. The interface abstraction component presented in this chapter is applied to connect these two simulators in order to showcase the full system simulation. While any cycle-accurate processor simulator and any memory model can be used in our technique, for the purpose of illustration we have used the Tensilica Xtensa tool suite, specifically the Tensilica Xtensa LX2 core simulator [60], and the DRAMsim memory simulator [84] in the experiments of this chapter. The processor model is constructed using the Tensilica Xtensa tool suite [149], which includes a C/C++ compiler, Instruction Set Simulator (ISS), Tensilica Instruction Extension (TIE), Xtensa PRocessor Extension Synthesis (XPRES) [150], Xtensa SystemC (XTSC) and the Xtensa Modelling Protocol (XTMP). XTSC (a SystemC simulation environment) and XTMP (a C/C++ simulation environment) are the modelling protocols for performing simulation at the system level when designing custom memories, hardware devices, multiple Xtensa cores, etc. The Xtensa family of processors [60] provides an extensible processor platform for creating different processor configurations. The designer can vary the processor bus width, off-chip memory size, on-chip memory size, instruction/data cache size and instruction/data cache line size, etc., to tailor each processor implementation to the SoC's target application requirements. To acquire detailed memory timing and power information, we employ the widely used DRAMsim memory simulator [84]. DRAMsim includes detailed timing and power consumption models for a variety of memory architectures such as SDRAM, DDR, DDR2, and DRDRAM. In DRAMsim [84], all the memory requests (cache misses) coming from the processor side are put into the transaction queue via the bus interface unit (BIU). The DRAMsim system controller selects a memory transaction from the BIU and sends it to the transaction queue, where the physical address is mapped to the DRAM memory address. After that, the DRAMsim system controller generates the DRAM command sequences based on the row buffer management policy selected at system initialisation time. We utilised the IAL in combination with the processor module and memory module to create the full system simulator; the detailed flow of the IAL-based system simulator is shown in Figure 4.5. The targeted simulation environment is a cycle-accurate full system simulator for an embedded system.

Being an execution-driven simulation approach, our proposed simulator takes program executables as its input, as opposed to the traditional trace-driven simulation approach whose input is modelled only as a trace representing the instruction sequence, generated by a functional simulator or by trace generators from a target machine. As the Tensilica Xtensa XTMP simulation environment allows the use of a multi-threading mechanism, we created the IAL component and memory component in a separate process context, called the IAL context, in addition to the original context, which we call the core context. As can be seen in Figure 4.4 and Figure 4.5, the open-headed arrow processing is done in the core context while the closed-headed arrow processing is done in the IAL process context. The timer callback handler (the process with the clock icon within the IAL in Figure 4.4) is registered when the IAL process is created, and triggers on every processor clock cycle. The memory request handler is attached to the processor component to receive all the cache misses by using the XTMP protocol feature of the Xtensa core. Whenever a cache miss occurs, the core sends the memory request to the memory request handler. The transaction controller takes care of all the block memory request processing and sends the memory requests to the bus interface queue of the memory component.

[Figure omitted: the IAL-based simulator framework. The benchmark application is compiled with the Xtensa C compiler and executed on the Xtensa LX2 core ISS (with the configured instruction and data caches); cache misses and load/store requests pass through the Interface Abstraction Layer, which queues transactions into DRAMsim (bus interface, transaction scheduling, physical-to-DRAM address mapping, command scheduling, buffer and bank management) according to the memory configuration, and the performance and power consumption statistics are collected.]

Figure 4.5: Applied IAL simulator framework in building a Functional, Cycle-accurate Processor-memory simulator

Whenever the IAL timer is triggered, the memory simulation controller starts the memory simulation action by processing the queued transactions sent by the transaction controller, according to the transaction scheduling policy (FCFS, FR-FCFS, etc.) of the memory simulator. The details of how the memory simulator works can be found in [84].

Algorithm 1: (DRAM Memory Location Index Retrieval)

 1  UINT *ddr3Mem[chCnt][rankCnt][bankCnt];
 2  UINT* getMemLoc(chId, rankId, bankId, rowId, colId)
 3  {
 4      UINT **loc = NULL;
 5      UINT s = 0;
 6      loc = &(ddr3Mem[chId][rankId][bankId]);
 7      if ( NULL == *loc ) then
 8          bankSize = rowCnt * colCnt;
 9          s = sizeof(UINT);
10          *loc = (UINT*) malloc(bankSize * s);
11      index = (colCnt * rowId) + colId;
12      return (*loc + index);
13  }

For the completed transaction requests, the DRAM location address is calculated in order to perform the actual read/write operation on the DRAM storage in the IAL. The physical location in this study is retrieved from the address mapping module of DRAMsim.

As shown in Algorithm 1, we implemented the DDR3 storage as a five-dimensional array structure which can be accessed with the physical DRAM address. The term chCnt refers to the total number of channels, rankCnt is the total number of ranks, bankCnt is the number of banks per chip, rowCnt refers to the total number of rows, and colCnt is the total number of columns. It should be noted that the data type will differ based on the data bus width and the number of chips in one rank, as the total amount of data per read/write request is obtained by multiplying the data bus width by the number of chips per rank.

In order to handle memory efficiently, we dynamically allocate the space for a bank (identified by chId, rankId and bankId for its channel, rank and bank) only when it is first accessed. UINT in Algorithm 1 refers to the unsigned integer type, which is 32 bits for our system, calculated from the 4-bit data bus per chip and the 8 chips per rank. The assignment to loc at line 6 of Algorithm 1 obtains the location address for the requested chId, rankId and bankId, while line 10 allocates the total bank space at this location. Based on the requested rowId and colId, the index into the allocated space is then computed. After the DRAM read/write operation is performed, the IAL location controller responds to the core for the completed transaction and moves to the next state (the state with the clock icon in Figure 4.4) until the next clock-cycle event is triggered.
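For illustration, a sample access through Algorithm 1 could look as follows; the wrapper function and the chosen coordinates are hypothetical, while the 32-bit width of UINT follows from the geometry used in this chapter.

void example_access(void)
{
    /* One read/write request returns (data bus width per chip) x (chips per rank)
     * = 4 bits x 8 chips = 32 bits, hence UINT is a 32-bit unsigned integer.      */
    UINT *p = getMemLoc(0 /*ch*/, 0 /*rank*/, 0 /*bank*/, 5 /*row*/, 17 /*col*/);

    *p = 0xDEADBEEF;          /* store performed by the IAL location controller    */
    UINT value = *p;          /* a later load to the same location returns it      */
    (void) value;
}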

4.6 Experimental Tests and Results

Benchmarks from the MediaBench multimedia benchmark suite [151] were utilised to compare the separate simulation of the processor and memory simulators with the combined simulator built using the IAL. Comparison results for the two different approaches are shown below. The first is the combined approach, where the simulator simultaneously simulates the processor module and memory module. In the second approach, called the two-pass approach, a fixed delay is allocated in the processor simulator for each memory access and the total time taken is recorded; the trace from the processor simulator is then fed to the memory simulator separately to obtain the power consumption figures. We simulate the two-pass approach with 31 different fixed memory latency values, ranging from 10 clock cycles to 40 clock cycles. By varying these memory latencies, we seek to find, for each benchmark, the memory latency at which the processor simulation gives a result closest to that of the combined approach. Note that the combined approach was verified for correctness at the RTL level by the use of models provided by the memory vendors. The configuration settings that we use for the processor core and memory component are described in Table 4.1.

For the Xtensa Instruction Set Architecture (ISA), which employs 32-bit addressing (a 2^32 address space), we configure the simulation for a 4 GB memory storage based on the Micron 1Gb x4 DDR3 SDRAM chip [2]. Having 8 chips in each rank, with each chip providing a 4-bit data bus, creates a memory system with a 32-bit data bus.

Processor:
  Architecture                      Tensilica LX2
  Interface Bus Width               32 bits
  Processor Speed                   563 MHz
  Instruction & Data Cache Size     2KB & 1KB
  Cache Line Size                   32 bits

Memory:
  Architecture                      DDR3
  Speed Grade                       -187E
  Clock Frequency                   533 MHz
  Memory Transaction Selection      First Come First Served
  Buffer Management                 open-page
  Refresh Period                    64 ms

Table 4.1: Configuration Settings of Processor and Memory

We automate the test steps with a Perl script to apply different configuration settings (cache size, timing parameters, transaction selection policy, etc.), compile the benchmark application for the targeted platform, simulate the system as if the application were running on the target machine, and collect the output statistics. All the experiments were conducted on an Opteron quad-core machine running at 2.15 GHz with 8 GB of RAM. All the configuration settings and the simulation environment of the two approaches were verified to be the same so that the output statistics could be compared between the two approaches; the only difference between the two approaches is the memory latency. As our simulation is based on the cycle-accurate Tensilica processor simulator [60] and the well-proven cycle-accurate DRAMsim memory simulator [84], the execution time and power consumption figures provided by the combined approach can be taken to be the accurate values.

The total simulation time taken by the combined approach, as well as by the two-pass approach with differing amounts of memory latency, is presented in Table 4.2. The IAL-based (combined) approach takes more time than the average simulation time of the two-pass approach, in exchange for more accurate results. Note that the results for the g721 benchmarks do not include the complete simulations (see the next paragraph). The average increase in simulation time for the one-pass approach, over those benchmarks that were able to complete the power simulation, is 13.5%. Thus, the combined approach is feasible in terms of cost and simulation time.

              One-pass Approach    Two-pass Approach
adpcm Enc                    81                   75
adpcm Dec                    93                   83
jpeg Enc                    923                  754
jpeg Dec                    303                  271
g721 Enc                 13,479               7,664*
g721 Dec                 16,736               8,201*

* only for the two simulations with fixed latencies of 20 and 25

Table 4.2: Average Simulation Time [second]

Table 4.3 and Table 4.4 give the results of the experiments conducted to find the best fixed memory latency. The memory latencies shown are those whose execution time or power consumption results are closest to the results provided by the combined approach. The first column in both tables indexes the results in the other columns, and columns 2 to 7 show the results for the benchmarks named in the first row. In Table 4.3, ActET is the actual execution time and clk x is the total execution time obtained with a fixed memory latency of x clock cycles. Similarly, in Table 4.4, ActPC is the actual power consumption and clk x is the average power consumption obtained with a fixed memory latency of x clock cycles. OptClk in both Table 4.3 and Table 4.4 refers to the optimal fixed memory latency, i.e. the one that should be used to get closest to the actual value. For example, a memory latency of 36 clock cycles would be the best choice for the jpeg Enc application when execution time is considered, and 31 clock cycles when power consumption is considered. The power results for memory latencies greater than 25 clock cycles could not be obtained for the g721 applications because of memory request timing errors.

              adpcm Enc    adpcm Dec       jpeg Enc      jpeg Dec        g721 Enc        g721 Dec
ActET        21,717,249   26,005,447    269,752,716    89,149,087   4,219,051,350   5,099,543,721
clk 20       16,884,353   18,426,664    156,134,953    59,208,924   2,698,605,371   3,543,221,347
clk 25       18,834,969   21,178,108    191,877,067    72,877,698   3,289,939,246   3,949,336,138
clk 31       21,183,023   24,511,276    234,812,580    89,301,512   4,000,190,880   4,212,374,325
clk 32       21,576,250   25,103,382    241,970,934    92,040,304   4,118,567,059   4,958,760,816
clk 33       21,970,090   25,774,264    249,129,408    94,779,607   4,237,014,693   5,103,038,959
clk 36       23,153,271   27,809,243    270,605,461   102,997,562   4,954,813,221   5,374,217,421
OptClk               32           33             36            31              33              33

Table 4.3: Total Execution Time [clock cycles]

              adpcm Enc    adpcm Dec     jpeg Enc     jpeg Dec     g721 Enc     g721 Dec
ActPC            950.62       982.81       981.39      1002.08      1002.23      1006.27
clk 20          1006.57      1064.04      1017.99      1114.88      1117.25      1004.51
clk 25          1001.37      1051.67      1015.11      1079.32      1089.18      1095.27
clk 31           970.55      1006.96       978.07      1011.56            –            –
clk 32           965.08       998.74       971.08      1000.65            –            –
clk 33           959.76       989.99       964.48       990.32            –            –
clk 36           945.33       960.29       945.79       963.06            –            –
OptClk               36           33           31           32           25           20

Table 4.4: Average Power Consumption [mW]

In order to maintain the contents of the DRAM, it needs to be refreshed at particular intervals (64 ms in our experiment). At this stage the memory simulator is unable to produce reliable results for large applications in which the progress of the refresh commands cannot be tracked because of memory request timing conflicts. Thus, we were unable to obtain results for the g721 Enc and g721 Dec applications for memory latencies of 31 clock cycles and higher. We hope to rectify this problem in the future; however, it does not in any way reduce the need for a combined simulator. The error rates of the execution time and power consumption of the two-pass approach compared to the accurate one-pass results are shown in Figure 4.6 and Figure 4.7 respectively. The vertical axes in both graphs refer to the error rate percentage, and the horizontal axes show the benchmark applications run with the different fixed memory latencies.
For example, an error rate of 22.3% in the total execution time is obtained for adpcm Enc with a 20 clock cycle memory latency, as shown in the leftmost bar of adpcm Enc in Figure 4.6. An error rate of approximately 5.9% in the power consumption of adpcm Enc with a 20 clock cycle memory latency can be seen in the first (leftmost) bar of adpcm Enc in Figure 4.7. As can be seen from the results, the optimal fixed memory latency for the two-pass approach differs from one benchmark application to another. Furthermore, the fixed memory latency also depends on the particular metric, such as execution time or power consumption; that is, the chosen memory latency may achieve accurate performance statistics but provide erroneous results for power consumption. The errors in the performance figures when varying the memory latency can be observed in Table 4.3. The optimal memory latency for the adpcm Enc application (32 clock cycles) does not perform well for the jpeg Enc application: as shown in Figure 4.6, adpcm Enc achieves the closest match to the actual execution time with an error rate of only 0.6%, while there is a 10.3% error rate for jpeg Enc if the fixed memory latency of 32 clock cycles is used. Similarly, the memory latency of 36 clock cycles is the best for the jpeg Enc application, incurring only a 0.3% error rate, but it is not the optimal latency for the other benchmarks.
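The error rates in Figures 4.6 and 4.7 are relative errors with respect to the combined (one-pass) result, which is consistent with the reported values; for instance, using the Table 4.3 values for adpcm Enc with a 20 clock cycle latency:

error = |T_fixed - T_combined| / T_combined × 100%
      = |16,884,353 - 21,717,249| / 21,717,249 × 100% ≈ 22.3%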

[Figure omitted: bar chart of the relative error rate (%) in total execution time for each benchmark under the different fixed memory latencies:

                 adpcm Enc   adpcm Dec   jpeg Enc   jpeg Dec   g721 Enc   g721 Dec
20 clk cycles         22.3        29.1       42.1       33.6       36.0       30.5
25 clk cycles         13.3        18.6       28.9       18.3       22.0       22.6
31 clk cycles          2.5         5.7       13.0        0.2        5.2       17.4
32 clk cycles          0.6         3.5       10.3        3.2        2.4        2.8
33 clk cycles          1.2         0.9        7.6        6.3        0.4        0.1
36 clk cycles          6.6         6.9        0.3       15.5       17.4        5.4  ]

Figure 4.6: Error Rate for Total Execution Time

[Figure omitted: bar chart of the relative error rate (%) in average power consumption for each benchmark under the different fixed memory latencies (no results for g721 above 25 clock cycles):

                 adpcm Enc   adpcm Dec   jpeg Enc   jpeg Dec   g721 Enc   g721 Dec
20 clk cycles          5.9         8.3        3.7       11.3       11.5        0.2
25 clk cycles          5.3         7.0        3.4        7.7        8.7        8.8
31 clk cycles          2.1         2.5        0.3        0.9          –          –
32 clk cycles          1.5         1.6        1.1        0.1          –          –
33 clk cycles          1.0         0.7        1.7        1.2          –          –
36 clk cycles          0.6         2.3        3.6        3.9          –          –  ]

Figure 4.7: Error Rate for Average Power Consumption

Note that even a single clock cycle difference in the memory latency can lead to large differences in execution time. This can be clearly seen in Table 4.3, in which a difference of 118,376,179 clock cycles separates the simulations using 31 and 32 clock cycles of memory latency for the g721 Enc application. The inaccuracy is more significant in larger applications such as jpeg or g721. Based on the error rate values for execution time and power consumption (Figure 4.6 and Figure 4.7), there is no single memory latency that is optimal for all applications and for all metrics (execution speed and power consumption). The optimal memory latency for both execution time and power consumption may be very close for small applications such as adpcm, but there are applications, such as jpeg Enc, for which the optimal memory latency differs between the two metrics: 36 clock cycles for execution time and 31 clock cycles for power consumption, as shown in Table 4.3 and Table 4.4. If we use 31 clock cycles as the memory latency, the error rate of the execution time increases from 0.3% to 13%, an increment of 35,792,881 clock cycles; if we choose 36 clock cycles, the error rate of the power consumption goes up from 0.3% to 3.6%, an increase of 32.28 mW.

Therefore, it is difficult to estimate an average optimal memory latency for the system, since it is entirely application-specific. We observed that the inaccuracy in average power consumption is not very significant for almost all the benchmarks we tested; the reason is that the current system does not include any low power consumption techniques. If low power mechanisms were implemented to control the power modes based on the application access history, the inaccuracy would be higher. Thus, it can be seen that the fixed clock cycle counts (if a two-pass approach were to be used) that give accurate results for power consumption and for execution time of an application can be quite different from one another. In a realistic situation, a designer trying to guess the fixed clock latency for memory accesses will necessarily fail. Thus, a combined approach, from which accurate results can be obtained, is necessary, and we believe that the systematic integration layer methodology proposed in this work is helpful in building such a cycle-accurate full system simulator.

4.7 Summary

In this chapter, we described the shortcomings that occur when the processor and memory components are simulated separately. In order to overcome the performance inaccuracies encountered in the two-pass simulation approach, a novel systematic integration layer methodology to build a full system simulator was proposed. Additionally, an applied study of the proposed method, building a cycle-accurate system simulator for an embedded system from a processor simulator and a memory simulator, was provided. With the proposed integration layer component, the combined simulator can be used to analyse the system performance and power utilisation of an application very efficiently. Moreover, the combined system provides accurate results at design time, which is not possible in a typical two-pass method due to memory latency assumptions.

Chapter 5

Rapid Exploration of Unified Last-level Cache

5.1 Introduction

Current processor-based embedded systems heavily rely on caches to bridge the speed gap between processors and main memories. Typical embedded systems contain an embedded processor with one (or two) level(s) of on-chip caches, an off-chip unified (for both instructions and data) last-level cache and an off-chip main memory (DRAM memory). The deployment of caches results in significant improvements in both system performance and energy efficiency. However, an incorrect cache configuration may increase energy consumption and reduce performance when compared to a system with an appropriate cache configuration.

A lot of research has been done to rapidly explore and find the optimal on-chip L1 cache configuration (cache size, line size and associativity) [75,78,79] for a given application or a set of applications. A few researchers have also focused on simultaneous exploration of both on-chip L1 and L2 caches [152–154]. However, not much attention has been paid to exploration of (on-/off-chip) unified last-level caches. In particular, little or no attention has been paid to the estimation of execution time and energy consumption in the presence of a configurable last-level cache.

Use of inaccurate execution time estimators often leads to over-design of a real-time system, in the hope of meeting real-time performance constraints. Therefore, accurate estimation of execution time will allow early, more realistic exploration of the last-level cache in real-time systems. Earlier research has shown that optimisation of caches (excluding last-level caches) can save up to 40% [155] of energy consumption. A suitable last-level cache configuration can further improve energy efficiency, because: 1) most of the memory transactions will be serviced by the last-level cache, thus leaving main memory idle for longer periods; and 2) the main memory can be transitioned to low-power modes during those idle periods [32,144,156].

Therefore, in this chapter, we focus on the rapid exploration and optimisation of a unified last-level cache with respect to performance and energy consumption. We assume that the last-level cache is configurable and off-chip, while the other caches are on-chip and preconfigured. Note that the exploration of on- and off-chip last-level caches is identical. Ideally, it would be best to explore all cache levels simultaneously; however, that is beyond the scope of this thesis.

Motivation and Contribution. Rapid cache exploration is hampered by slow full-system cycle-accurate simulations [157]. Thus, trace-driven cache simulation [74,75,78,79] is an attractive alternative to cycle-accurate simulation of all the cache configurations. Trace-driven cache simulators take the application trace as input and capture cache hit and miss statistics of all the cache configurations, which are then used with an analytical model to estimate execution time and energy consumption [78,158–160]. Although this is a fast method, cache statistics alone do not contain sufficient timing information for accurate estimation of performance and energy consumption. Hence, trace-driven cache simulations have typically focused on exploration of on-chip L1 caches only [78,158–160]. If the analytical models for execution time and energy consumption from [78,158–160] are extended to include last-level caches (referred to as traditional estimators in this thesis), then they result in quite inaccurate estimations (see Section 5.5). This is particularly due to the fact that at some instants the processor executes without even accessing the last-level cache or the memory. Consider the g721 Enc application (from mediabench [151]) running on a uniprocessor system with on-chip separate L1 instruction and data caches, an off-chip unified L2 cache and memory. Figure 5.1 plots the execution time and energy consumption of the system against the cache hits of different L2 cache configurations. It is clear from Figure 5.1 that the execution time of the last-level cache configuration with maximum hits is significantly different from that of the cache configuration with minimum execution time. The same applies to the last-level cache configuration with minimum energy consumption. This illustrates the fact that cache hits/misses alone are not sufficient to estimate execution time and energy consumption.

[Two scatter plots: execution time (cycles) and energy consumption (uJ) versus L2 cache hits, marking the configurations with maximum hits, minimum execution time and minimum energy; the configuration with maximum hits yields neither the minimum execution time nor the minimum energy consumption.]

Figure 5.1: Execution Time and Energy Consumption of different L2 (last-level) Cache Configurations for g721 Enc application's execution on Target System of Figure 5.2.

One might fall back to cycle-accurate simulations of all last-level cache configurations instead of a trace-based approach. However, this is not a feasible option, as cycle-accurate simulations are exorbitantly slow. Hence, the challenge is to quantify the effect of last-level cache configurations on system execution time and energy consumption with a minimal number of cycle-accurate simulations. To this end, we make the following contributions:

• Novel execution time estimator. The execution time estimator uses cache line size, memory accesses, and Last-level Cache Idle (LCI) periods. An LCI period refers to an execution period during which the last-level cache is idle, and hence captures those memory transactions that hit in lower level caches. Our results indicate a worst average error of only 0.26%. Although our estimator is simple (because it uses little micro-architectural knowledge of the system), its high absolute accuracy makes it suitable for rapid exploration of last-level cache configurations, particularly for real-time systems.

• An energy estimator. The energy estimator uses the power consumption of the processor and on-chip caches (base power consumption, measured through cycle-accurate simulation of the processor with on-chip caches and the largest last-level cache) and execution time (obtained from the above execution time estimator) to compute energy consumption, which is then adjusted by the addition of the energy consumption of the last-level cache itself. Our results indicate a worst average error of 19.69%. Although the estimator is simple, it has reasonable absolute accuracy, making it suitable for rapid exploration of last-level cache configurations.

• RExCache framework for quick exploration of last-level cache. RExCache integrates a cycle-accurate simulator and a trace-driven cache simulator with our novel execution time estimator and energy estimator. RExCache simulates an application cycle-accurately only once to capture the LCI profile and base power consumption (see Section 5.3 for details). A cache simulator and CACTI [81] are then used to provide memory accesses, cache line size and LCI periods to the execution time estimator, and base power consumption, execution time and cache energy to the energy estimator. Once execution time and energy estimates are available, RExCache chooses the best cache configuration (minimum execution time, minimum energy, etc.).

In summary, we propose the exploration of a unified last-level cache to improve the performance and energy efficiency of a uniprocessor system. To facilitate such an exploration, we propose the RExCache framework, based on a novel execution time estimator and energy estimator, which can be used to select the most suitable cache configuration for a real-time application. Unlike a general-purpose system, such an exploration is possible for an embedded system because design space exploration is often performed to optimise it for an application or a class of applications [157].

The rest of this chapter is organised as follows. Section 5.2 describes our target system and states the specific problem we aim to solve. Section 5.3 presents our proposed rapid exploration framework for a unified last-level cache. The experimental setup and the results are explained in Section 5.4 and Section 5.5 respectively. Advantages and limitations of our proposed RExCache framework are discussed in Section 5.6. Finally, Section 5.7 summarises this chapter.

5.2 Problem Statement

We target a uniprocessor system with a multi-level cache hierarchy and DRAM memory where the last-level cache is unified and off-chip (we assume that the on-chip caches are preconfigured and unchangeable). An example system is shown in Figure 5.2, where the processor has on-chip separate L1 instruction and data caches, which are connected to a unified off-chip L2 cache. The L2 cache is interfaced with DRAM memory that contains both application instructions and data. Here, L2 is the last-level cache and we use this system as an example throughout the thesis in addition to its use in our experiments.

Our goal is to determine which last-level cache configuration will maximally reduce the execution time and/or energy consumption of a uniprocessor system executing an application, given that other architectural parameters such as processor type, memory type, lower level caches, etc. remain unchanged. The parameters varied in the last-level cache are its size, line size and associativity.

[Block diagram: a CPU with on-die L1 I-cache and L1 D-cache, connected to an off-chip unified L2 cache, which is in turn connected to DRAM.]

Figure 5.2: An example of a Target System.

5.3 RExCache Framework

Our framework, RExCache, which quickly explores the last-level cache, is shown in Figure 5.3. The input to RExCache consists of applications and last-level cache configurations. At a high level, RExCache cycle-accurately simulates each application to capture its memory trace. This memory trace is then fed to a cache simulator to record statistics for all the last-level cache configurations. Those statistics, along with the memory trace of the application, are used in the estimators to estimate the execution time and energy of all the cache configurations. Finally, RExCache chooses the best cache configuration (minimum execution time, minimum energy, etc.) for each application. The following paragraphs explain the core components of RExCache in greater detail from the perspective of one application.

5.3.1 Application Trace Generation

An application is simulated in a cycle-accurate simulator with the largest last-level cache to record two entities at the interface between the second-last-level and last-level caches: 1) the Last-level Cache (LC) memory trace; and 2) the Last-level Cache Idle (LCI) profile. The LC trace will contain only those memory requests that miss in the lower level caches. For instance, the memory trace captured at the L1–L2 interface in Figure 5.2 will only contain L1 cache misses. Since the LC trace will be the same regardless of the last-level cache configuration, we use the largest cache to capture the LC trace due to its lower simulation time.

[Block diagram of the RExCache framework: applications and last-level cache configurations are the inputs; a cycle-accurate processor simulator (the base system with on-chip caches and the largest off-chip last-level cache) produces the LC trace, LCI profile and base power; a cache simulator produces the cache profile and cache line size, and CACTI produces the cache energy; the cache exploration step combines these to select the best cache configuration for each application.]

Figure 5.3: RExCache Framework. Dotted-lined rectangles and broken arrows show our novel contributions.

An LCI period refers to an application's execution period that does not access the last-level cache. For instance, at the L1–L2 interface in Figure 5.2, LCI periods will be the execution periods with no memory requests from the application (consecutive non-load and non-store instructions) and execution periods with memory requests that hit in the L1 caches. Figure 5.4 illustrates such an LCI period, where the L2 cache will not be accessed due to non-load and non-store instructions, and hits in the L1 caches. Hence, the L2 cache will be idle during the marked LCI period. An LCI profile captures all LCI periods, in clock cycles, from the execution of an application. For an in-order processor, which is typical of embedded processors, an LCI profile of the application will not change across different last-level cache configurations, because the processor pipeline is stalled during each memory request and the lower level caches remain unchanged. Hence, LCI periods will contribute to the application execution time irrespective of which last-level cache configuration is used.

ld  R5, R1+20    L1 CM
ld  R6, R1+21    L1 CM
add R7, R6, R5   N/A    \
sub R8, R6, R5   N/A     |  Last-level Cache
xor R9, R8, R7   N/A     |  Idle (LCI) period
st  R9, R1+20    L1 CH  /
ld  R5, R1+40    L1 CM
...

L1 CH: L1 Cache Hit   L1 CM: L1 Cache Miss   N/A: Not Applicable

Figure 5.4: An example of LCI period for Target System of Figure 5.2 (L2 is the last-level cache).

Note that an LCI profile captures micro-architectural events from the actual application execution without the need for detailed analytical modelling of the system's micro-architecture. This is the only step in RExCache where an application is cycle-accurately simulated. Thus, despite the fact that hundreds of last-level cache configurations are possible, only one of them is cycle-accurately simulated.
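Although no implementation is listed here, the LCI profile can be derived from the timestamps recorded at the second-last-level/last-level cache interface. The following sketch is our own illustration under that assumption; the trace record and its field names are hypothetical:

    from dataclasses import dataclass

    @dataclass
    class LCRequest:
        arrival_cycle: int     # cycle at which the request reaches the last-level cache
        complete_cycle: int    # cycle at which the request has been serviced

    def build_lci_profile(lc_trace):
        """Return the LCI periods (in cycles): the idle gaps between the completion
        of one last-level cache request and the arrival of the next."""
        lci_periods = []
        for prev, curr in zip(lc_trace, lc_trace[1:]):
            idle = curr.arrival_cycle - prev.complete_cycle
            if idle > 0:   # cache idle: non-load/store instructions or L1 hits
                lci_periods.append(idle)
        return lci_periods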

5.3.2 Cache Simulation

In this step, we feed the LC trace to a cache simulator to generate a cache profile for all the last-level cache configurations. A cache profile reports whether the memory will be accessed or not for each memory request in the LC trace. A memory request in an LC trace is considered a memory access or non-access in the cache profile depending on the cache policy and whether the request hits or misses the cache. For example, a read miss in a cache with a write-back policy may result in one or two memory accesses. The decision depends on the dirtiness of the cache data to be replaced; that is, if the cache data is dirty then it will be written to memory first and then the missed data will be read from memory, resulting in two memory accesses; otherwise the missed data will only be read from memory, resulting in only one memory access. Likewise, a write miss in a cache with write-back and write-allocate policies will result in a memory access only if the cache data to be replaced is dirty. Table 5.1 reports the number of memory accesses required for a read miss, write hit and write miss under different cache policies.

Policy          Read Miss                  Write Miss (allocate)      Write Miss (no-allocate)   Write Hit
write-through   1                          1                          1                          1
write-back      2 if dirty, 1 otherwise    1 if dirty, 0 otherwise    1                          0

Table 5.1: Number of Memory Accesses for Last-level Cache Policies.
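The policy-dependent counts of Table 5.1 map directly onto a small decision function. The sketch below is a minimal illustration of how a cache simulator could derive memory accesses per request; the function and parameter names are ours, not those of the modified tool from [75]:

    def memory_accesses(policy, event, dirty_victim=False, write_allocate=True):
        """Number of main-memory accesses caused by one last-level cache event,
        following Table 5.1. policy: 'write-through' or 'write-back';
        event: 'read_miss', 'write_miss' or 'write_hit'."""
        if policy == "write-through":
            return 1                             # every listed event goes to memory
        # write-back
        if event == "read_miss":
            return 2 if dirty_victim else 1      # write back dirty victim, then fetch
        if event == "write_miss":
            if write_allocate:
                return 1 if dirty_victim else 0  # only a dirty victim reaches memory
            return 1                             # no-allocate: write goes to memory
        return 0                                 # write hit is absorbed by the cache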

5.3.3 Cache Exploration

Recall from Section 5.1 that the last-level cache profile (hits/misses) alone is not sufficient to estimate execution time because it lacks the necessary timing information. We transform the memory access and non-access information from the cache profile into a more useful representation, the Application Execution (AE) profile. An AE profile captures an application's execution from a given cache profile and an LCI profile. We insert LCI periods from the LCI profile between the appropriate memory accesses and non-accesses in the cache profile to capture the exact execution of an application for a given last-level cache profile. The execution time can then be calculated by summing up the cycles spent in Memory Accesses (MAs) and Non-Accesses (MNAs), and the LCI profile. This leads to the following execution time estimator:

ET = LCIcycles + (MNAs × CacheHL) + (MAs × CacheLS × CacheML)

where CacheLS, CacheHL and CacheML refer to the cache line size, cache hit latency and cache miss latency respectively. The LCIcycles are computed from the LCI profile by summing up all the LCI periods. These cycles contribute to the execution time irrespective of the last-level cache configuration. The second factor estimates the cycles spent in accessing instructions/data from the last-level cache in the case of hits. In the third factor, the product of the last-level cache line size and the total number of memory accesses measures the amount of traffic going to memory during the execution of an application. The memory traffic is then converted into cycles by multiplying it by the last-level cache miss latency. These three factors capture almost the whole application execution. Figure 5.5 shows an example of how the execution time of an application can be estimated from a given cache profile and LCI profile.

[Worked example: a cache profile with memory requests MR1–MR10, of which six are memory accesses and four are memory non-accesses; an LCI profile with idle periods of 20, 110, 70 and 20 cycles (LCI cycles = 220); and the AE profile obtained by inserting each LCI period after the appropriate memory request. With a 30 cycle memory access latency, a 4 cycle memory non-access latency and a 1 word cache line size, ET = 220 + 4 × 4 + 6 × 1 × 30 = 416 cycles.]

Figure 5.5: An example of Execution Time estimation for Target System of Figure 5.2 (L2 is the last-level cache).

The cache profile reports memory accesses and non-accesses (for the sake of simplicity, the example assumes that a memory request results in at most one memory access), while the LCI profile reports the L2 cache idle periods. Each LCI period is inserted after the appropriate Memory Request (MR) in the cache profile. For example, the second LCI period occurs after MR3 and hence is inserted between MR3 and MR4 in the AE profile. The AE profile shows the exact execution of the application. The execution time, 416 cycles in this example, is then calculated using a 4 cycle memory non-access latency (L2 cache hit latency) and a 30 cycle memory access latency (L2 cache miss latency). Note that the construction of an AE profile is not required, but is used here for visual illustration of how an exact application execution is captured using the LCI and cache profiles.
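The execution time estimator itself is only a few lines of code. The sketch below is our own rendering (names and argument conventions are assumptions, not the RExCache implementation) and reproduces the worked numbers of Figure 5.5:

    def estimate_execution_time(lci_periods, mem_non_accesses, mem_accesses,
                                cache_hit_latency, cache_line_size_words,
                                cache_miss_latency):
        """ET = LCIcycles + MNAs*CacheHL + MAs*CacheLS*CacheML (Section 5.3.3)."""
        lci_cycles = sum(lci_periods)
        return (lci_cycles
                + mem_non_accesses * cache_hit_latency
                + mem_accesses * cache_line_size_words * cache_miss_latency)

    # Figure 5.5 example: LCI periods of 20, 110, 70 and 20 cycles, 4 non-accesses,
    # 6 accesses, 4-cycle hit latency, 1-word line, 30-cycle miss latency.
    assert estimate_execution_time([20, 110, 70, 20], 4, 6, 4, 1, 30) == 416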

The energy estimator uses the power consumption of the processor and on-chip caches (BasePower), the execution time and the cache energy consumption. Energy consumption is estimated by multiplying BasePower by the execution time (obtained from the execution time estimator), which is then adjusted by the addition of the energy consumption of the last-level cache itself. Hence, the energy estimator is:

E = (BasePower × ET) + CacheE

where ET is the execution time and CacheE is the energy consumption of the last-level cache. We measure the BasePower by cycle-accurate simulation of the system with the largest last-level cache (the same system configuration that is used in the application trace generation step). BasePower contains both the static and dynamic power of the processor and on-chip caches. The cache energy also contains both static and dynamic energy, where the static energy is estimated by multiplying the static power (obtained from CACTI [81]) by the execution time, and the dynamic energy is estimated by multiplying the numbers of hits and misses by the energy per hit and per miss (obtained from CACTI [81]) respectively.
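The energy estimator can be sketched in the same style. Again, this is an illustration with our own naming and SI-unit conventions rather than the actual RExCache code; the static power and per-hit/per-miss energies stand in for the CACTI outputs:

    def estimate_energy(base_power_w, exec_time_s,
                        cache_static_power_w, cache_hits, cache_misses,
                        energy_per_hit_j, energy_per_miss_j):
        """E = BasePower * ET + CacheE (Section 5.3.3), all quantities in SI units.
        CacheE = static power * ET + dynamic energy of hits and misses."""
        cache_energy = (cache_static_power_w * exec_time_s
                        + cache_hits * energy_per_hit_j
                        + cache_misses * energy_per_miss_j)
        return base_power_w * exec_time_s + cache_energy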

Cache exploration computes the execution time and energy consumption of all the last-level cache configurations using the estimators. The execution time estimator uses LCIcycles, which are computed only once for an application, and the memory accesses and cache line size, which are provided by the cache simulator. On a similar note, the energy estimator uses BasePower, which is computed only once for an application, and the execution time and cache energy, which are provided by the execution time estimator and CACTI [81] respectively. Since the estimators require only one cycle-accurate simulation of an application and little micro-architectural knowledge of the system, they are simple, fast and hence suitable for rapid exploration of last-level cache configurations. Once the estimates are available, RExCache chooses the cache configuration with minimum execution time or minimum energy consumption. The exploration time of RExCache is dominated by the application trace generation step, especially when applications are large, rather than by the cache simulation and/or our estimators. The application trace generation step can be sped up by trace sampling [161]; however, such a technique will not capture the whole application execution and will introduce errors. Hence, we argue that an exhaustive search of the design space to find an optimal cache configuration is better than a heuristic search to find a near-optimal cache configuration, because 1) the estimators are simple and fast (see Section 5.5), and 2) the number of last-level cache configurations is in the hundreds rather than thousands or millions (see Section 5.4).
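Combining the two estimators, the exploration step amounts to an exhaustive loop over the cache configurations. A simplified sketch, reusing the two estimator functions above and assuming each configuration record carries the cache-simulator and CACTI outputs (the field names are ours):

    def explore(configs, lci_periods, base_power_w, clock_period_s, objective="energy"):
        """Exhaustively score every last-level cache configuration and return the
        best one for the chosen objective (energy or execution time)."""
        best_cfg, best_score = None, float("inf")
        for cfg in configs:
            cycles = estimate_execution_time(lci_periods, cfg.mem_non_accesses,
                                             cfg.mem_accesses, cfg.hit_latency,
                                             cfg.line_size_words, cfg.miss_latency)
            energy = estimate_energy(base_power_w, cycles * clock_period_s,
                                     cfg.static_power_w, cfg.hits, cfg.misses,
                                     cfg.energy_per_hit_j, cfg.energy_per_miss_j)
            score = energy if objective == "energy" else cycles
            if score < best_score:
                best_cfg, best_score = cfg, score
        return best_cfg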

5.4 Experimental Methodology

We evaluated the RExCache framework by exploring L2 cache configurations in the target system of Figure 5.2. The target system is implemented using Tensilica's LX4 processor [60] with 2KB L1 instruction and 1KB L1 data caches, both direct-mapped with a 4B line size. We considered 330 off-chip L2 cache configurations by changing the cache size from 4KB to 4MB, the line size from 4B to 128B and the associativity from 1 to 16. These configurations are typically explored in an embedded system's design [75,78,157]. We used a 4 cycle and a 30 cycle latency for L2 cache hit and miss respectively, which are typical of an embedded system [152]. In our experiments, the L2 cache was configured for Least Recently Used (LRU), write-back and write-allocate policies.

We used an Xtensa instruction set simulator (in the XTensa Modelling Protocol environment) and the tool from [75]1 as the cycle-accurate simulator and cache simulator in RExCache. In addition, CACTI 6.5 [81] was configured for 90nm technology to obtain the energy consumption of L2 cache configurations. For evaluation purposes, we used the adpcm Enc/Dec, jpeg Enc/Dec, g721 Enc/Dec, mpeg2 Enc/Dec and H.264 Enc applications from mediabench [151]. All experiments were conducted on an Intel Xeon 64 core machine with 256GB RAM.

5.5 Results and Analysis

Analysis of Estimators. We evaluated the execution time and energy estimators by comparing the estimated values with actual values from cycle-accurate simulations. Table 5.2 reports the absolute accuracy of our estimators in its second major column. The execution time estimator exhibited a worst average absolute error of only 0.26%. Our execution time estimator is simple, yet very accurate, and hence eminently suitable for rapid exploration of last-level cache configurations in real-time systems. Our energy estimator is not as accurate as the execution time estimator because the BasePower is fairly, but not entirely, constant across different last-level cache configurations. Additionally, the error in the estimated execution time also affects the energy estimation. A worst average absolute error of 19.69% is observed for the adpcm Dec application.

Table 5.2: Detailed Analysis of Execution-Time and Energy Estimators. [s/m/d in Simulation Time represent seconds/minutes/days respectively.]
[The table reports, for each application, the average and maximum absolute errors (%) in execution time and energy, and the simulation time, for both RExCache and the traditional estimators [160].]

1 We modified their tool to generate memory accesses and non-accesses rather than cache hits and misses. Note that their tool is much faster than DineroIV [74].

We also compared our estimators to traditional estimators that are based on an execution time estimator proposed for on-chip L1 separate instruction and data caches in [160]2, which uses cache hits, cache misses and average NCPI (Net Clock Cycles per Instruction) to estimate execution time. As explained in [160,162,163], the average NCPI is obtained from cycle-accurate simulations of a number of cache configurations [160], comprising: 1) differing cache sizes with the smallest/largest cache line size and associativity; 2) differing cache line sizes with the smallest/largest cache size and associativity; and 3) differing associativities with the smallest/largest cache size and line size. This results in tens of simulations for a reasonably accurate value of average NCPI. Since the estimator in [160] cannot be applied directly, we extended it as follows to include off-chip L2 cache statistics:

(L1-Hits × L1-HL) + (Total Instructions × average NCPI) + (L2-Hits × L2-HL) + (L2-Misses × L2-LS × L2-ML)

where HL, ML and LS refer to hit latency, miss latency and cache line size respectively. The first factor estimates the time spent in fetching instructions/data from the L1 caches, while the second factor estimates the net time to execute the fetched instructions. The remaining two factors estimate the time spent in the L2 cache and memory using the numbers of L2 hits and misses. Note that the LCI cycles used in our estimator are estimated by the first and second factors of the traditional execution time estimator. The traditional energy estimator is the same as our energy estimator (see Section 5.3.3) except that the execution time is obtained from the traditional execution time estimator. The third major column of Table 5.2 reports the absolute accuracy of the traditional estimators. The results indicate worst average errors of 3.9% and 21.28% for the traditional execution time and energy estimators respectively. This is because L1 hits and average NCPI do not capture all the LCI periods of an application execution, thus introducing more errors than our estimators.
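For reference, the extended traditional estimator can be written in the same style; this is our own sketch of the formula above, not code from [160]:

    def traditional_execution_time(l1_hits, l1_hit_latency, total_instructions,
                                   average_ncpi, l2_hits, l2_hit_latency,
                                   l2_misses, l2_line_size_words, l2_miss_latency):
        """Extended traditional estimator: L1 access time + net compute time
        + L2 hit time + memory traffic time."""
        return (l1_hits * l1_hit_latency
                + total_instructions * average_ncpi
                + l2_hits * l2_hit_latency
                + l2_misses * l2_line_size_words * l2_miss_latency)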

2 The estimator in [160] had, on average, 80% and 21% better absolute accuracy in execution time and energy consumption respectively, than the estimator in [78].

The improved absolute accuracy of our estimators will allow better design space exploration under real-time constraints. We applied 100 different execution time and energy constraints to all the applications, and then searched for the minimum energy and minimum execution time cache configurations respectively. The traditional method selected incorrect cache configurations in 31 cases, while the cache configurations from RExCache were similar to the ones from the cycle-accurate method in all 100 cases. This result indicates that our estimators have greater applicability than the traditional estimators due to their higher accuracy. Furthermore, the simulation times of the RExCache and Traditional approaches are also compared in Table 5.2 (shown in the last column under each approach). RExCache outperforms the Traditional approach with at least a 97% simulation time reduction.

Selection of Cache Configurations. In RExCache, we explored the L2 cache's design space to find the cache configurations with minimum execution time and minimum energy consumption for different applications. The selected configurations are reported in Table 5.3, denoted as [cache size, cache line size, cache associativity]. These results indicate that the minimum execution time configuration will not necessarily have minimum energy consumption. To quantify the execution time and energy efficiency improvements of RExCache, we selected the following four cache configurations:

• CC: System with a reasonably Common Cache configuration (64KB, 64B line size, direct-mapped L2 cache), which has been reported in [152,154].
• BC ET: System with the Best Cache configuration w.r.t. Execution Time.
• BC E: System with the Best Cache configuration w.r.t. Energy consumption.
• LC: System with the Largest Cache, [4MB, 128B, 16A].

Figure 5.6 plots the execution time of BC ET, BC E and LC normalised to CC for all the applications. The BC ET configuration reduced execution time by up to 50% (H.264 Enc) compared to the CC configuration. Hence, the use of an arbitrary last-level cache configuration is not a suitable design choice.

Application   Cache Configuration with
              Minimum Execution Time   Minimum Energy Consumption
adpcm Enc     [16KB, 4B, 8A]           [8KB, 64B, 16A]
adpcm Dec     [8KB, 4B, 16A]           [4KB, 64B, 16A]
jpeg Enc      [512KB, 4B, 1A]          [16KB, 16B, 16A]
jpeg Dec      [128KB, 4B, 4A]          [16KB, 16B, 16A]
g721 Enc      [32KB, 4B, 8A]           [8KB, 64B, 8A]
g721 Dec      [16KB, 4B, 8A]           [4KB, 16B, 16A]
mpeg2 Enc     [2MB, 16B, 1A]           [8KB, 64B, 4A]
mpeg2 Dec     [512KB, 16B, 1A]         [4KB, 16B, 16A]
H.264 Enc     [4MB, 4B, 4A]            [32KB, 16B, 16A]

Table 5.3: Cache Configurations w.r.t. minimum Execution Time and minimum Energy Consumption from RExCache.

The BC E configuration has little performance degradation compared to CC, with a maximum of only 1% for mpeg2 Dec. In addition, BC E configurations always had higher execution times than BC ET configurations because BC E configurations minimise energy consumption rather than execution time. The LC configuration's execution time is almost the same as BC ET for all the applications because it reduces cache misses maximally. However, the LC configuration consumes a large amount of energy due to its large area (see Figure 5.7). The energy consumption of BC ET, BC E and LC normalised to CC is plotted in Figure 5.7 for all the applications, where the energy consumption includes that of the processor, the on-chip L1 caches and the off-chip unified L2 cache, but excludes the energy consumption of memory. As expected, the energy consumption of LC is the highest, considerably more than all the other configurations (at least 2.3 times the energy consumption of the CC configuration) because of its large area. The maximum energy savings are achieved by the BC E configurations in all the applications, reporting a saving of up to 35% (g721 Dec application) compared to the CC configuration. These improvements illustrate the significance of exploring last-level cache configurations to optimise a system for execution time or energy consumption.

[Bar chart: for each application (adpcm Enc/Dec, jpeg Enc/Dec, g721 Enc/Dec, mpeg2 Enc/Dec, H.264 Enc), the execution times of the BC ET, BC E and LC systems normalised to the CC configuration.]

Figure 5.6: Execution Time of different Cache Configurations normalised to Common Cache (CC) Configuration.

Exploration Time of RExCache. A major concern in exploration frameworks like RExCache is the exploration time. Table 5.4 reports the time taken by the cycle-accurate simulator, the traditional method and RExCache for exploration of 330 L2 cache configurations for each of the nine applications. For the cycle-accurate simulator, we simulated all the L2 cache configurations to measure the simulation time. The traditional method column reports the time for generation and processing of the application trace to compute the average NCPI (TGP), the time for simulation and exploration of the cache configurations (CE), and the total exploration time. The RExCache column also reports the aforementioned times, except that the TGP sub-column refers to the generation and processing of the application trace to compute the LCI cycles.

[Bar chart: for each application, the energy consumption of the BC ET, BC E and LC systems normalised to the CC configuration; the LC bars are the highest in every case.]

Figure 5.7: Energy of different Cache Configurations normalised to Common Cache (CC) configuration.

RExCache reduced the exploration time from several days to a few hours compared to the cycle-accurate simulations, enabling quick exploration of the last-level cache due to the simple, fast execution time and energy estimators. Additionally, RExCache takes much less time than the traditional method because it simulates an application only once, compared to the tens of simulations needed in the traditional method (to compute the average NCPI). Consequently, RExCache reduced the simulation time by 98% on average, while providing better absolute accuracy.

Application   Cycle-Accurate   Traditional [160]           RExCache
              Simulator        TGP     CE    Total         TGP    CE    Total
adpcm Enc     3h               22m     8s    22.1m         39s    8s    39s
adpcm Dec     2h               14.3m   8s    14.4m         20s    8s    28s
jpeg Enc      10h              2h      13s   2h            2m     13s   2.3m
jpeg Dec      4h               30.7m   8s    30.8m         42s    8s    50s
g721 Enc      7d               1d      6m    1d            35m    6m    41m
g721 Dec      8d               1d      6m    1d            37m    6m    43m
mpeg2 Enc     116d             16.4d   16m   16.7d         9h     16m   9.3h
mpeg2 Dec     32d              4.1d    8m    4.1d          3h     8m    3.1h
H.264 Enc     257d             35.8d   2h    35.9d         19h    2h    21h

Table 5.4: Exploration Time comparison of Cycle-accurate Simulations, Traditional Method and RExCache.

5.6 Advantages and Limitations

RExCache features several advantages over cycle-accurate simulations and standalone trace-driven cache simulators. RExCache: 1) is fast as it uses only one cycle-accurate simulation per application; and 2) integrates cache simulation and our simple, fast estimators to quickly provide the execution time and energy consumption of all last-level cache configurations. Although we use an exhaustive search to find the optimal cache configuration (because there are only hundreds of cache configurations), a designer can use a heuristic instead to search larger design spaces.

RExCache is very flexible, as any cycle-accurate simulator can be used if the memory trace at the interface between the second-last-level and last-level caches can be captured. Likewise, any cache simulator can be used if the cache profile described in Section 5.3.2 can be produced. Such a profile requires only minor modifications to already available cache simulators [74,75,79]. Note that RExCache can explore the last-level cache of a uniprocessor system with any number of levels in the cache hierarchy, irrespective of whether the last-level cache is on-chip or off-chip. However, in this work, we applied RExCache to an off-chip unified last-level cache only.

RExCache can also be used to find the best last-level cache configuration for a class of applications. In this case, the trace from the combined execution of the applications should be captured and used as input to the cache simulator. A truly representative trace from multiple applications' execution might not be possible due to indeterminism; however, this is a different problem and is not the focus of this thesis. RExCache will explore last-level cache configurations to find the best one for a given trace, irrespective of whether that trace captures the execution of a single application or multiple applications.

RExCache as such cannot explore last-level caches in a multi-processor system, because the LCI periods will be different across different last-level cache configurations (unlike a uniprocessor system, where LCI periods are the same across different last-level cache configurations) due to inter-processor dependencies and cache coherency. In future, we will look into extending RExCache for multi-processor systems.

5.7 Summary

In this chapter, we have proposed rapid exploration of the unified last-level cache in a uniprocessor system to improve performance and energy efficiency. To this end, we proposed the RExCache framework, which integrates cycle-accurate and cache simulators with a simple, fast and highly accurate execution time estimator and a simple, fast and reasonably accurate energy estimator, to avoid slow cycle-accurate simulations of all cache configurations. The cache configurations selected by RExCache for nine mediabench applications reduced execution time by up to 50% and energy consumption by up to 35% compared to a common cache configuration ([64KB, 64B, 1A], reported in [152,154]). RExCache took only a few hours (21 hours for H.264 Enc) to explore 330 cache configurations, compared to several days (257 days for H.264 Enc) of cycle-accurate simulations, illustrating the usefulness of our estimators and framework. When compared to a traditional method (based on [160]), RExCache provided 4% and 2% improvements in the accuracy of execution time and energy estimation with 98% less simulation time.3

3 This research was supported under the Australian Research Council's Discovery Projects funding scheme (project number DP120104158).

Chapter 6

Effects of Last-level Cache Configurations

6.1 Introduction

As embedded systems become computationally more complex, there is a greater demand for increased main memory in those systems. At the same time, main memory in such systems can consume up to 80% of the system's energy [114]. Therefore, the use of DDR3 DRAMs as the main memory in embedded systems is becoming common due to their higher operational frequency and lower power consumption compared to previous generations of DDR DRAMs. Additionally, DDR3 DRAMs offer a greater number of low-power modes, which provides opportunities for better power/energy optimisation.

In general, a DDR3 DRAM consumes three main types of power: background, active and refresh power. Background power is consumed continuously, while active power is consumed during read/write operations. Refresh power is consumed periodically when the DDR3 has to refresh itself to retain its data. A DDR3 DRAM can be transitioned to one of its low-power modes to save background power; however, there is a cost (wake-up latency) associated with transitioning it back to the active state (for read/write operations).

The different low-power modes and their wake-up latencies for DDR3 DRAM [2] are shown in Table 6.1. Energy reduction cannot be achieved if the DDR3 DRAM is set to a low-power mode whose wake-up latency is longer than the idle period of the DRAM. The wake-up latency varies with the low-power mode; the low-power mode with minimum power consumption incurs the highest wake-up latency. For example, the self-refresh power down mode consumes the least amount of power in DDR3 DRAMs and incurs the highest wake-up latency (9× less power than the active state, with a wake-up latency of 64 clock cycles [2]). Maximum energy reduction is possible when the DDR3 is transitioned into the self-refresh power down mode and rarely switched back to active mode (for read/write operations). If a last-level cache can capture the spatial and temporal locality of memory requests to a good extent, then the DRAM will be idle for longer periods, improving the possibility of higher energy savings with the use of the self-refresh power down mode.

Low-power Mode             Current (mA)   Wakeup Latency (clock cycles)
Active SB (active state)   57             0
Precharge SB               55             3
Active PD                  35             4
Precharge PD               12             13
Self-refresh PD            6              64

Table 6.1: Power Modes of Micron DDR3 DRAM. SB and PD stand for StandBy and PowerDown respectively.
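Whether the self-refresh power down mode pays off for a given idle period can be judged with a rough break-even check using the currents of Table 6.1. The following sketch is our own back-of-the-envelope illustration: it uses charge (current × cycles) as a proxy for energy under a constant supply voltage, compares against staying in precharge standby, and charges the extra wake-up cycles at the standby current:

    def self_refresh_saves_energy(idle_cycles, wakeup_cycles=64,
                                  standby_current_ma=55, self_refresh_current_ma=6):
        """Rough check: is it worth spending an idle period in self-refresh PD
        instead of a shallow mode (precharge standby here)?"""
        stay_in_standby = idle_cycles * standby_current_ma
        use_self_refresh = (idle_cycles * self_refresh_current_ma
                            + wakeup_cycles * standby_current_ma)
        return use_self_refresh < stay_in_standby

    print(self_refresh_saves_energy(50))    # False: idle period too short
    print(self_refresh_saves_energy(200))   # True: long enough to amortise the wake-up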

In this chapter, we present a case study on how different last-level cache configurations affect the idle periods of a DDR3 DRAM, and show that a suitable last-level cache configuration combined with the self-refresh power down mode can result in significant savings in the energy consumption of DDR3 DRAM. We propose a power mode controller for DDR3 DRAM which adaptively transitions the DRAM to self-refresh power down mode when a memory request hits in the last-level cache, and switches it back to the active state (for read/write operations) when a memory request misses in the cache. Our power mode controller works with the last-level cache controller and the memory controller to handle the use of the self-refresh power down mode for DDR3 DRAM. We performed simulations with hundreds of last-level cache configurations and observed that energy savings of up to 89% are possible when a suitable last-level cache with self-refresh power down mode is used.

We target a uniprocessor system with a multi-level cache hierarchy and DDR3 DRAM. An example system is shown in Figure 6.1, where the processor has on-chip separate L1 instruction and data caches, which are connected to a unified off-chip L2 cache. The L2 cache is interfaced to the DRAM memory through a memory controller. The DRAM contains both application instructions and data. Here, L2 is the last-level cache and we use this system as an example throughout the chapter in addition to its use in our experiments. The DDR3 DRAM contains an internal mode controller which automatically sets it into one of the low-power modes, except the self-refresh power down mode, whenever it is idle. The Power Mode Controller (PMC) is another module, typically embedded into the memory controller firmware, that controls the use of low-power modes [144].

[Block diagram: a CPU with on-die L1 I-cache and L1 D-cache connected to an off-chip unified L2 cache; the L2 cache controller signals L1/L2 cache misses to the PMC inside the off-chip memory controller, which issues memory control operations to the DRAM.]

Figure 6.1: An example of Target System.

In this chapter, the PMC transitions the DRAM into either the active state or the self-refresh power down mode based upon whether memory requests hit or miss in the last-level cache. The PMC obtains the hit/miss status of a memory request from the last-level cache controller using an external connection (dotted line in Figure 6.1). The proposed PMC can be implemented either as software in the memory controller firmware or as a separate controller.

6.2 Power Mode Controller

The Power Mode Controller (PMC) algorithm is described in Algorithm 2. In general, the PMC tries to transition the DRAM into self-refresh power down mode whenever possible to enable maximum energy saving. Therefore, whenever there is a hit in the last-level cache, the PMC switches the DRAM to self-refresh power down mode if the DRAM is in the active state (lines 1–4). However, before the transition, the PMC waits for any ongoing DRAM transactions due to previous memory requests so that those requests are not interrupted. If a memory request misses the last-level cache, then the PMC has to activate the DRAM when it is in one of the low-power modes (lines 5–9). During this transition, the DRAM incurs a wake-up latency (64 cycles for DDR3 DRAM [2]) which affects performance.

Algorithm 2: (Power Mode Controller Algorithm)

1  if (last-level cache hit) then
2      if (DRAM is in active state) AND (no ongoing DRAM transactions) then
3          Switch DRAM to self-refresh power down mode
4  else
5      if (DRAM is in low-power mode) then
6          Switch DRAM to active state (incurs wake-up latency)
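A software rendering of Algorithm 2, for example inside a memory-controller model, could look like the sketch below. The DRAM interface (state attribute, pending-transaction check, mode-switch and wake-up hooks) is an assumption of ours, not the actual firmware:

    ACTIVE = "active"
    SELF_REFRESH_PD = "self_refresh_pd"

    class PowerModeController:
        """Minimal model of Algorithm 2: enter self-refresh PD on a last-level
        cache hit, wake the DRAM up on a miss."""

        def __init__(self, dram):
            # `dram` is assumed to expose .state, .has_pending_transactions(),
            # .switch_mode(mode) and .wake_up() -- hypothetical hooks.
            self.dram = dram

        def on_last_level_cache_access(self, hit):
            if hit:
                # lines 1-3: only transition once outstanding transactions finish
                if self.dram.state == ACTIVE and not self.dram.has_pending_transactions():
                    self.dram.switch_mode(SELF_REFRESH_PD)
            else:
                # lines 5-6: a miss needs the DRAM active (pays the wake-up latency)
                if self.dram.state != ACTIVE:
                    self.dram.wake_up()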

For a given application, the idle periods of the DRAM vary from one last-level cache configuration to another. Since the PMC transitions the DRAM into self-refresh power down mode whenever possible, the energy savings might be offset by the high wake-up latency when idle periods are not sufficiently long. Therefore, in this chapter, we explore different last-level cache configurations to select the one which maximally reduces the energy consumption of both the last-level cache and DRAM.


[Bar chart: for each application (adpcmEnc/Dec, jpegEnc/Dec, g721Enc/Dec, mpeg2Enc/Dec) and for the SC, LC and OC cache configurations, the energy saving (%) of the PMC system on the left y-axis and its performance degradation (%) on the right y-axis.]

Figure 6.2: Energy Saving and Performance Degradation of PMC system w.r.t. NoPMC system.

We use the target system of Figure 6.1 with LRU and write-back policies for the L2 cache. The target system is implemented using Tensilica's LX2 processor [60] with 2KB L1 instruction and 1KB L1 data caches, both direct-mapped with a 4B line size. We used -125E DDR3 DRAM (1600 Million Transfers per second) from Micron [2] to create a system with 4GB (4 ranks) of memory. The DRAM had an interface width of 4B and used an open-page row buffer policy and an internal refresh time of 64 ms. A wake-up latency of 64 cycles was used for the self-refresh power down mode. We used our in-house cycle-accurate simulator, which integrates DRAMSim [84] with LX4's simulator to create a detailed processor-memory simulator (presented in Chapter 4). The low-level power estimation tool from Micron [2] is used to calibrate the DDR3 DRAM power model in our simulator. In addition, CACTI 6.5 [81] was configured for 90nm technology to obtain the energy consumption and area of L2 cache configurations. The L2 cache design space consisted of 140 configurations, constituted by changing the cache size from 2KB to 128KB, the line size from 8B to 128B and the associativity from 1 to 8. For evaluation purposes, we used the adpcmenc/dec, jpegenc/dec, g721enc/dec and mpeg2enc/dec applications from mediabench. All experiments were conducted on an Intel Xeon 64 core machine with 256GB RAM.

6.3 Results

We consider two flavours of the target system for comparison purposes:

• NoPMC, where only a standard memory controller is used, without the PMC and self-refresh power down mode; and

• PMC, where the PMC with self-refresh power down mode is used.

The standard memory controller in the NoPMC system only uses one of the shallow low-power modes rather than the self-refresh power down mode. In essence, NoPMC takes advantage of only the cache, whereas PMC exploits the prolonged idle periods (due to the spatial and temporal locality captured by the cache) through the use of self-refresh power down mode. Figure 6.2 reports the energy saving (on left y-axis) and performance degradation (on right y-axis) of the PMC system over the NoPMC system.

The percentage of energy savings (Energ Sav) and the percentage of performance degradation (Perf Degr) obtained with the PMC system over the NoPMC system are given in Equation 6.1 and Equation 6.2 respectively. Energ Consum in Equation 6.1 refers to the energy consumption and Execu Time in Equation 6.2 refers to the execution time.

Energ Sav (%) = ((Energ Consum(NoPMC) − Energ Consum(PMC)) / Energ Consum(NoPMC)) × 100        (6.1)

Perf Degr (%) = ((Execu Time(NoPMC) − Execu Time(PMC)) / Execu Time(NoPMC)) × 100        (6.2)

In Equation 6.1, the energy consumption of the PMC system is computed using the energy consumption of both the L2 cache and the DDR3 DRAM. The energy consumption of the L2 cache (Energ Consum L2) is calculated according to Equation 6.3. In Equation 6.3, the leakage power (consumed in circuit blocks [164]) and the dynamic energy per read/write access are retrieved from CACTI [81], while the total execution time and the total numbers of read/write accesses to the L2 cache are obtained from the cycle-accurate simulator.

Energ Consum L2 = (leakage power × execution time)
                + (dynamic energy per read access × total read accesses)
                + (dynamic energy per write access × total write accesses)        (6.3)
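Equations 6.1 to 6.3 translate directly into code. The sketch below uses our own variable names; the leakage power and per-access energies stand in for the CACTI outputs, and the remaining inputs for values reported by the cycle-accurate simulator:

    def l2_energy(leakage_power_w, exec_time_s, e_read_j, reads, e_write_j, writes):
        """Equation 6.3: leakage (static) energy plus dynamic read/write energy."""
        return leakage_power_w * exec_time_s + e_read_j * reads + e_write_j * writes

    def energy_saving_pct(energy_nopmc_j, energy_pmc_j):
        """Equation 6.1."""
        return (energy_nopmc_j - energy_pmc_j) / energy_nopmc_j * 100.0

    def perf_degradation_pct(time_nopmc_s, time_pmc_s):
        """Equation 6.2, exactly as defined in the text."""
        return (time_nopmc_s - time_pmc_s) / time_nopmc_s * 100.0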

For each application, we have plotted the energy saving and performance degradation for three different cache configurations: SC, the system with the smallest cache configuration; LC, the system with the largest cache configuration; and OC, the system with the optimal cache configuration (minimum total energy consumption of both the L2 cache and DDR3 DRAM). The smallest and largest caches are chosen to report the extreme points of the design space. The SC system reduced energy consumption for the relatively small applications, by around 60% for adpcmEnc and adpcmDec, and by around 2% for jpegEnc and jpegDec. For the remaining, relatively large applications, the SC system increased energy consumption by up to 32% (mpeg2Dec). This is because the use of the self-refresh power down mode is not suitable when the idle periods are short, due to the high wake-up latency. This fact can also be observed through the performance degradation, where the SC system degrades the performance by up to 52% (g721Enc) for the relatively large applications. Both the LC and OC systems significantly improve the energy saving compared to the SC system. This is because most of the memory requests are serviced by the cache, which increases the DRAM idle periods and reduces the frequency of DRAM transitions to self-refresh power down mode. The reduction in the frequency of DRAM transitions reduces the overhead of the wake-up latency, and hence the degradation of performance. Both the LC and OC systems degraded the performance by a maximum of only 2%. It is interesting to note that the OC system saves more energy than the LC system (g721Enc and g721Dec) because the largest cache consumes a relatively large amount of energy itself, which offsets the energy saving of the DDR3 DRAM. In essence, the largest cache may result in an overdesigned system. In summary, the OC system reduced energy consumption by up to 89% with a maximum performance degradation of only 2%. These results point to the fact that a suitable last-level cache, without overdesigning the system, should be used to exploit the full potential of the self-refresh power down mode for DDR3 DRAM.

Table 6.2 reports the cache configurations used in the OC systems for all eight applications, along with the area footprints of those cache configurations. It should be noted that the optimal cache configurations vary from one application to another because of the different memory access patterns. More importantly, the area footprint of the optimal cache configurations is significantly smaller than the area footprint of the largest cache (0.26 mm2). This result again points to the fact that use of the largest cache can result in an overdesigned system.

Performance degradation can be overwhelming, which in turn can increase the energy consumption, if the wrong intermediate buffer configuration is selected. Thus, there is a need to choose the right buffer configuration at design time in order to achieve the optimal energy consumption with the proposed PMC module.

Application   Optimal Cache Configuration               Area Footprint (mm2)
              [cache size, line size, associativity]
adpcm Enc     [16KB, 128B, 8A]                          0.035
adpcm Dec     [4KB, 32B, 8A]                            0.023
jpeg Enc      [128KB, 128B, 2A]                         0.22
jpeg Dec      [128KB, 128B, 2A]                         0.22
g721 Enc      [16KB, 16B, 2A]                           0.08
g721 Dec      [16KB, 128B, 8A]                          0.04
mpeg2 Enc     [64KB, 128B, 4A]                          0.11
mpeg2 Dec     [128KB, 128B, 2A]                         0.22

Table 6.2: Optimal Cache Configurations and their Area Footprints.

The impact on energy savings and performance of a poorly selected configuration (16KB prefetch buffer size, 32 word block size, 8-way associativity) can be seen in Figure 6.3. The data representation on the X-axis and the two Y-axes is similar to that in Figure 6.2. Minimal energy savings can be seen for jpegEnc and jpegDec (less than 3%), with performance penalties of 6% and 7% respectively, for the selected configuration. In such a case, it may not be useful to employ the PMC due to the additional overhead.

[Bar chart, titled "16KB Buffer Size, 32 Word Block Size, 8 Associativity Configuration": for each application, the energy savings (%) on the left y-axis and the performance slow down (%) on the right y-axis.]

Figure 6.3: Mixed Impact of Poorly Chosen Cache Configuration

6.4 Fast Design Space Exploration

Although our experiments show that exploration of the last-level cache design space for selection of the optimal cache configuration can yield significant energy savings with little performance degradation, such an exploration can be slow and time-consuming.

In this chapter, we used cycle-accurate simulations to obtain the energy savings of all the cache configurations; however, cycle-accurate simulations will be slow for larger applications and larger design spaces. In such a situation, design space exploration can be sped up by:

• Using analytical models or estimation methods to predict the energy saving for a given cache configuration; or,

• Using heuristics to cycle-accurately simulate/search a subset of design space.

Additionally, design space exploration can also include constrained optimisation, where an optimal cache configuration is searched for under different user constraints such as area, energy, execution time, etc.

6.5 Summary

In this chapter, we conducted a study on how the last-level cache can affect the idle periods of a DRAM, and how those idle periods can be exploited by the use of the self-refresh power down mode of DDR3 DRAM to enable maximum energy saving. We proposed a power mode controller to adaptively transition the DRAM to self-refresh power down mode when a memory request hits the last-level cache, and to activate the DRAM when a memory request misses the last-level cache. The proposed power mode controller works in conjunction with the cache controller and the memory controller. Our experiments with eight mediabench applications illustrate that the use of a suitable last-level cache with self-refresh power down mode can save up to 89% of energy consumption, with a maximum performance degradation of only 2%, compared to a system with just the standard memory controller. Therefore, we conclude that exploration of the last-level cache in the context of DDR3 DRAMs has significant potential for energy optimisation of the memory subsystem.

Chapter 7

Energy Reduction in DDR DRAMs

7.1 Introduction

Better energy efficiency increases reliability and improves battery life of embedded systems. Main memories in embedded systems consume up to 80% of total system power (excluding I/O power) [114] (also illustrated by our experiments reported in Figure 7.1). Thus, reducing energy consumption of the main memory can have a great impact on the overall energy efficiency of the system.

Figure 7.2, drawn to scale, shows the current drawn (which in turn governs the power) by a modern DDR3 DRAM memory. The activate current (which consists of the activation and precharge currents) is shown with a (red) dashed line, whereas the read and write current is shown with a (blue) dotted line. The former is used to activate the DRAM, whereas the latter is used to perform read/write operations. Much like other DDR DRAMs, DDR3 has to be refreshed periodically, which consumes the refresh current shown with a (purple) dashed-dotted line. The (green) solid line shows the background current, which is drawn continuously. However, the amount of background current varies depending upon the memory mode. Figure 7.2 shows this background current for the different low-power modes (see Table 7.1).


[Stacked bar chart: power consumption (mW) of the processor, L2 cache, DRAM background, DRAM active and DRAM refresh components for adpcmenc/dec, jpegenc/dec, g721enc/dec and mpeg2enc/dec; the background power is 90% of the DRAM power and 70% of the total system power.]

Figure 7.1: Power Consumption Breakdown in a Uniprocessor System with on-chip L1 cache, off-chip L2 cache and DRAM memory.

These modes are used when the DDR3 DRAM is not performing any read/write operations, and hence is idle. According to our experiments, it is the background power (due to background current) that consumes 90% of DRAM power and about 70% of total system power (see Figure 7.1). In this thesis, we focus on the use of the self-refresh Power Down (PD) mode to reduce this background power consumption, because it is the deepest low-power mode, and consumes approximately only a tenth of the background power of the shallowest low-power mode, Active StandBy (SB) mode (see Table 7.1).

Low-power Mode     Current (mA)    Wakeup Latency (clock cycles)
Active SB          57              0
Precharge SB       55              3
Active PD          35              4
Precharge PD       12              13
Self-refresh PD    6               64

Table 7.1: Power modes of Micron DDR3 DRAM. SB and PD stand for StandBy and PowerDown respectively.

Transition to any of the low-power modes comes at the cost of a wakeup latency, which is reported in Table 7.1 for the different low-power modes of DDR3 DRAM.

[Figure: memory command timeline (ACT, R/W, PRE, REF, entry into self-refresh mode) against clock cycles, showing the read/write current, activate current, refresh current and background current for the Active SB, Precharge SB, Active PD, Precharge PD and Self-refresh PD modes.]

Figure 7.2: Different Currents drawn by a DDR3 DRAM, adapted from Micron [2].

Efficient exploitation of a low-power mode requires DRAM idle periods to be long enough to amortise the performance penalty due to the wakeup latency. The high wakeup latency of the self-refresh PD mode (highest amongst all the low-power modes) has typically hampered its use, and only two works [35, 148] are known to date which employ this mode. The authors of [148] proposed a history-based predictor to forecast the durations of DRAM idle periods, and a memory controller policy for selective use of precharge PD and self-refresh PD modes. They focused on maximal exploitation of DRAM idle periods, without attempting to prolong those periods for complete use of the self-refresh PD mode. On the other hand, Amin et al. [35] proposed a cache replacement policy to skew cache-memory traffic in such a way that idle periods for some of the DRAM ranks (a DRAM consists of multiple ranks, where a rank is the smallest granularity at which low-power modes are applied) are prolonged to enable the use of the self-refresh PD mode for those ranks. Such a policy, as shown in [35], is effective, but requires the use of a completely new cache policy, and thus significant hardware modification.

In this thesis, we also focus on prolonging the DRAM idle periods; however, for the first time, we propose to do so by the use of a suitable last-level cache configuration (cache size, line size and associativity), which is obtained by exploration of the last-level cache design space. The selected cache configuration is the one which maximally reduces the total energy consumption of the last-level cache and DRAM. Our proposal is motivated by the fact that many modern embedded processors such as ARM [61], Xtensa [60], and Microblaze [165] allow configuration of the cache so that one can choose the cache configuration to be implemented. Additionally, since embedded systems execute an application or a class of applications repeatedly, one can tune cache parameters for better energy efficiency. In fact, the authors of [166] also advocated the importance of last-level cache exploration for a given application; however, they focused on energy reduction of only the cache rather than both the cache and DRAM.

The design space of the last-level cache can be explored either using full-system cycle-accurate simulations (including processor, caches and memory) or cache simulators. Cycle-accurate simulations are exorbitantly slow [157], especially when multi-level caches and DRAM are involved. Hence, they are not feasible for exploration of hundreds of cache configurations, which are typical of an embedded system design [157, 167]. Fast cache simulators [167], an alternative option, use the memory trace of an application to output the number of hits and misses for all the cache configurations. Calculation of DDR3 DRAM energy consumption is a complex process [2, 168], and using only the number of last-level cache hits/misses or the cache size is not sufficient. Figure 7.3 plots total energy reduction of cache and DRAM against the number of hits for different L2 cache configurations in a uniprocessor system with two levels of cache, executing the mpeg2Enc application, compared to a system without a second level cache. The cache configurations with maximum energy reduction (MaxER) and maximum number of hits (MaxHits) are marked along with the largest cache configuration. It can be seen that the MaxER configuration is different from both MaxHits and the largest cache.

[Figure: total cache and DRAM energy reduction (J) plotted against the number of cache hits, with the Max. Energy Reduction, Max. Hits and Largest Cache configurations marked.]

Figure 7.3: Total Cache and DRAM Energy Reduction for different L2 (last-level) Cache Configurations.

When the cache area footprint is compared, MaxER (0.052 mm2) is significantly smaller than MaxHits (3.22 mm2) and the largest cache (6.21 mm2). This plot illustrates the fact that cache hits or cache sizes alone are not sufficient for selection of a suitable cache configuration. Hence, the challenge is to quantify the effect of last-level cache configurations on DRAM energy consumption with a minimal number of cycle-accurate simulations. Therefore, in this chapter, we propose a novel DRAM energy reduction estimator to quickly predict energy reductions for differing cache configurations, and a novel framework around the estimator for quick exploration and selection of a suitable last-level cache configuration.

Novel Contributions. In particular, we make the following contributions in this chapter.

• A novel DRAM energy reduction estimator. The estimator uses five parameters: self-refresh PD cycles, self-refresh PD count, row buffer conflicts, refresh count and DRAM traffic to predict the energy reduction of DRAM for a given last-level cache configuration. The estimator is based upon a Kriging model that is derived through a small number of cycle-accurate simulations. Our estimator has an average error of less than 4.4% when tested on several applications from the mediabench and SPEC2000 suites, and on 256MB and 4GB DRAMs.

• XDRA framework. Our framework integrates processor-memory and cache simulators with our estimator with the help of novel analysis techniques. These techniques do not require cycle-accurate simulations of all the cache configurations for computation of the estimator parameters, and thus enable fast exploration of last-level cache configurations. When used to select cache configurations with maximal reduction in total energy of last-level cache and DRAM, XDRA found the same configurations as cycle-accurate simulations in 20 out of 22 cases. The suboptimal configurations were off by a maximum of 3.9% from their optimal counterparts.

Motivational Example. To illustrate how differing cache configurations can affect DRAM idle periods and improve its energy efficiency, let us consider a JPEG encoder (jpegEnc) application running on a uniprocessor system with L1 and L2 caches and a DRAM memory. Figure 7.4 reports DRAM activity for several thousand cycles, where a value of 1 means the DRAM is accessed and a value of 0 means that it is idle. Two L2 cache configurations are used: 1) a 4KB, 4B line size, direct-mapped cache, denoted as [4KB, 4B, 1A] and shown in the lower plot; and 2) a 128KB, 8B line size and 2-way associative cache, denoted as [128KB, 8B, 2A] and shown in the upper plot. The values inside the parentheses report DRAM idle cycles and the number of idle periods. For instance, in the [128KB, 8B, 2A] L2 cache system, DRAM was idle for 27.6 million cycles, distributed across 7,700 idle periods. The [128KB, 8B, 2A] L2 cache increased DRAM idle cycles by 22%, which was expected due to its larger size. The number of idle periods reduced from 270,000 to only 7,700, and this reduction will be advantageous in reducing DRAM energy consumption because: 1) idle periods will be longer, enabling DRAM's transition to self-refresh PD mode; 2) the DRAM can remain in self-refresh PD mode for longer periods; and 3) fewer wakeups from self-refresh PD mode mean less performance penalty. The plot for the [128KB, 8B, 2A] cache corroborates our belief, where several short idle periods of the [4KB, 4B, 1A] cache that are not suitable for self-refresh PD mode have mostly been consolidated into two longer idle periods which in fact are long enough to transition DRAM into self-refresh PD mode. In this example, the [128KB, 8B, 2A] cache reduced 38% more DRAM energy consumption than the [4KB, 4B, 1A] cache. This example shows the importance of exploring the last-level cache: choosing a suitable configuration in the first place can tremendously reduce DRAM energy consumption.

[Figure: DRAM activity (0 = idle, 1 = accessed) over clock cycles for the [128KB, 8B, 2A] cache (27.6M idle cycles, 7.7K idle periods) and the [4KB, 4B, 1A] cache (22.6M idle cycles, 270K idle periods); long idle periods increase the chance of using self-refresh PD mode, whereas short idle periods are not suitable for it.]

Figure 7.4: Effect of two distinct L2 (last-level) Cache Configurations on DRAM idle periods. PD stands for PowerDown.

The rest of this chapter is organised as follows. Section 7.2 describes our target system and the specific problem we aim to solve. Section 7.3 presents our proposed DRAM energy reduction estimator. Section 7.4 explains our proposed quick exploration framework to obtain a suitable last-level cache for maximum DRAM energy reduction. The experimental results are explained in Section 7.5. Advantages and limitations of our proposed XDRA framework are discussed in Section 7.6. Finally, Section 7.7 summarises this chapter.

7.2 Problem Statement

We target a uniprocessor system with a multi-level cache hierarchy and a DDR3 DRAM. An example system is shown in Figure 7.5, where the processor has separate on-chip L1 instruction and data caches, which are connected to a unified off-chip L2 cache.

The L2 cache is interfaced to the DRAM memory through a memory controller. The DRAM contains both application instructions and data. Here, L2 is the last-level cache and we use this system as an example throughout the thesis in addition to its use in our experiments.

[Figure: the CPU with on-chip L1 instruction and data caches connects to a unified off-chip L2 cache; the L2 cache connects through a Memory Controller (containing the PMC, which issues memory control operations) to the DRAM of the memory system.]

Figure 7.5: An example of Target System.

The DDR3 DRAM contains an internal mode controller which automatically sets it into one of the shallow or deep low-power modes whenever it is idle. The Power Mode Controller (PMC) is another module that is typically embedded into the memory controller firmware to control the use of low-power modes [144]. In our work, the PMC transitions the DRAM into the deepest low-power mode, self-refresh PD, if it is idle for at least a threshold number of clock cycles, as used in [32, 35]. This is done to avoid significant degradation of performance that may arise from greedy use of self-refresh PD mode for every idle period. We use 30 cycles as the self-refresh PD threshold (PDthreshold), which was obtained through experimentation [32, 35].

Given an application, a uniprocessor system with unified last-level cache and DRAM, a PDthreshold and last-level cache configurations (size, line size and associativity), our goal is to determine, for all the last-level cache configurations, the reduction in combined energy consumption of the last-level cache and DRAM compared to the system without last-level cache. Using the aforementioned energy reductions, we aim to select the cache configuration with maximal energy reduction with/without a constraint on last-level cache area footprint.
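To make the selection criterion explicit, the goal can be stated as a small optimisation problem. This is only a minimal formalisation of the statement above; the symbols ER(c), E_DRAM(c), E_cache(c), Area(c) and A_max are introduced here purely for illustration:

$$ER(c) = E^{\mathrm{noLC}}_{\mathrm{DRAM}} - \bigl(E_{\mathrm{DRAM}}(c) + E_{\mathrm{cache}}(c)\bigr), \qquad c^{*} = \arg\max_{c \in \mathcal{C}} ER(c) \;\; \text{subject to (optionally)} \;\; Area(c) \le A_{\mathrm{max}},$$

where $\mathcal{C}$ is the set of last-level cache configurations (size, line size, associativity), $E^{\mathrm{noLC}}_{\mathrm{DRAM}}$ is the DRAM energy consumption of the system without a last-level cache, and $A_{\mathrm{max}}$ is the optional area budget.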

7.3 DRAM Energy Reduction Estimator

Estimation of DRAM energy consumption is a complex process [2, 168] due to the involvement of different DRAM states and their transitions during the execution of an application. In this chapter, we build a DRAM energy reduction estimator based upon the concept of Kriging models [169]. Note that CACTI [81] does not consider the detailed DRAM architecture, and hence is not suitable for estimation of DRAM energy consumption [168].

Estimator Parameters. The parameters of an estimator are a key choice as they directly influence the accuracy, and hence the usefulness, of an estimator. One should only include the most influential parameters, because other parameters with little influence increase an estimator's complexity without any useful benefit. We conducted several experiments with differing last-level cache configurations to record DRAM energy reduction, and used a correlation-based technique to obtain the significance of the following most common parameters (some of which are explained in more detail later) on the amount of energy reduction:

• Application parameters: total memory requests (TMR).

• Cache parameters: size (CS), line size (CLS), associativity (CA), hits (CH) and misses (CM).

• DRAM parameters: accesses (DAs), non-accesses (DNAs), memory traffic (MT), self-refresh PD cycles (PDcycles), number of times self-refresh PD mode is used (PDCnt), row buffer conflicts (RBConf) and number of times DRAM is refreshed (RefCnt).

Figure 7.6 depicts the average value of correlation coefficients for all the parameters over 2 DRAM sizes (256MB and 4GB) and 11 applications (from the mediabench and SPEC2000 benchmarks). A value of +1 signifies a perfect positive correlation (that is, an increase in the parameter value will increase energy reduction), whereas a value of -1 signifies a perfect negative correlation (that is, an increase in the parameter value will decrease energy reduction). A value close to zero means no correlation. It can be seen from the figure that PDcycles, PDCnt, RBConf, RefCnt and MT are the most significant parameters (marked with rectangles) as their correlation values are either greater than 0.8 or less than -0.8. An intuitive reasoning for these parameters is as follows:

• PDcycles: the total number of cycles for which DRAM is in PD mode during the execution of an application. This parameter depends on the total number of DRAM idle cycles and the PDthreshold. More PDcycles mean DRAM remains in PD mode for longer periods, providing more energy reduction.

• PDCnt: the total number of times DRAM is transitioned to self-refresh PD mode. More transitions to self-refresh PD mode mean more overhead, which results in less energy reduction.

• RBConf: In DRAM, data is brought to a row buffer before it can be accessed [2]. A row buffer conflict occurs when the requested data is not present in the row buffer, resulting in its reloading (precharging and/or activation), which increases DRAM energy consumption.

• RefCnt: DRAM must be refreshed periodically to retain its contents. Refreshing DRAM consumes refresh power, which in turn increases DRAM energy consumption.

• MT: The product of the last-level cache line size and the total number of DRAM accesses measures the amount of traffic going to DRAM during the execution of an application. More traffic to DRAM means that it will be active for a longer time, increasing its energy consumption.
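A minimal sketch of the correlation-based screening described above, assuming the per-configuration parameter values and measured energy reductions from the experiments are already available as arrays. The function and variable names are illustrative and not part of the thesis tooling:

```python
import numpy as np

def screen_parameters(samples: dict, energy_reduction: np.ndarray, cutoff: float = 0.8):
    """Rank candidate parameters by their correlation with DRAM energy reduction.

    samples maps a parameter name (e.g. 'PDcycles', 'MT') to a 1-D array holding one
    value per simulated cache configuration; energy_reduction holds the measured
    energy reduction for the same configurations.
    """
    significant = {}
    for name, values in samples.items():
        # Pearson correlation coefficient between the parameter and energy reduction.
        r = np.corrcoef(values, energy_reduction)[0, 1]
        if abs(r) >= cutoff:            # keep only strongly correlated parameters
            significant[name] = r
    return dict(sorted(significant.items(), key=lambda kv: -abs(kv[1])))
```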

7.3.1 Kriging Model

Kriging models take into account the spatial correlation between the current design point, x_i, and an initial set of design points (training set), x, to estimate the output at x_i. We chose the Kriging model because it allows various polynomial functions and correlation functions, and does not restrict to just linear regression models. Additionally, Kriging models can capture complex interactions between parameters due to the use of spatial correlations, and thus perform better than linear regression models [170]. Our experiments reveal similar results, which are detailed in Section 7.5.

[Figure: bar chart of correlation coefficients, ranging from -1 to +1, for CS, CLS, CA, CH, CM, TMR, DAs, DNAs, MT, PDCycles, PDCnt, RBConf and RefCnt; PDCycles, PDCnt, RBConf, RefCnt and MT are highlighted as the most significant parameters.]

Figure 7.6: Correlation Coefficients of most common Parameters, averaged over 2 DRAM sizes and 11 Applications.

More formally, a Kriging model combines a global model with local trends in the form of:

$$y(x_i) = f(x_i) + z(x_i)$$

where y(x) is to be estimated, f(x) is a known approximation function, and z(x) is a stochastic process with mean zero, variance $\sigma^2$, and non-zero covariance. While f(x) globally approximates the design space, z(x) creates local deviations which are interpolated by the Kriging model with the use of spatial correlations. In other words, the regression function f(x) captures the global impact of the parameters, whereas the correlation function z(x) captures the local impact of the parameters. In many cases, f(x) is taken as either a constant or a linear polynomial in x's parameters [171]. Additionally, many correlation functions such as exponential, gaussian, etc. are available to be used as z(x), where the correlation of two points depends upon the weighted distance between them. For example, the widely used exponential correlation function is:

$$\mathrm{corr}(x_i, x_j) = \prod_{h=1}^{p} e^{-\theta_h \lvert x_i(h) - x_j(h) \rvert^{s_h}}$$

where p is the total number of parameters in a design point, and $\theta_h$ and $s_h$ are unknowns that govern the correlation distance weights. $\theta_h$ represents the significance of parameter h, whereas $s_h$ represents the smoothness of the function in the direction of parameter h. Given a training set, the coefficients of f(x) and the weights of z(x) are estimated using the maximum likelihood technique [172]. Once these are known, the final model is written as:

$$\hat{y}(x_i) = \hat{f}(x) + r(x)^{\prime} R^{-1} \bigl(y - \hat{f}(x)\bigr)$$

where r(x) is the correlation vector between $x_i$ and x, R is the correlation matrix representing the correlation between all the pairs of design points in x, and y contains the output values at the design points in the training set x.
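The thesis builds its Kriging models with the DACE toolbox for MATLAB (see Section 7.5). Purely as an illustration of the same idea, the sketch below fits a Kriging-style model (a Gaussian process with a constant trend) on the five-parameter training set using scikit-learn; the use of GaussianProcessRegressor, the RBF kernel and the file names are assumptions for illustration, not the thesis's configuration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

# X_train: one row per training configuration, columns [PDcycles, PDCnt, RBConf, RefCnt, MT];
# y_train: measured DRAM energy reduction for each training configuration (cycle-accurate runs).
X_train = np.loadtxt("training_parameters.csv", delimiter=",")        # placeholder input files
y_train = np.loadtxt("training_energy_reduction.csv", delimiter=",")

# Constant trend (the regression part f(x)) times a smooth correlation function (the z(x) part);
# one length-scale per parameter plays the role of the theta_h weights above.
kernel = ConstantKernel(1.0) * RBF(length_scale=np.ones(X_train.shape[1]))
model = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
model.fit(X_train, y_train)

# Predict the energy reduction for the parameters of an unexplored cache configuration.
x_new = np.array([[1.2e7, 5400, 3100, 180, 2.6e6]])                   # illustrative values only
predicted_er = model.predict(x_new)
```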

7.4 XDRA Framework

Our framework, XDRA, which quickly explores the last-level cache in the context of DRAM energy reduction, is shown in Figure 7.7. XDRA integrates our estimator with a cycle-accurate processor-memory simulator and a cache simulator, and employs our novel analysis techniques to quickly compute the parameters used in the estimator. The following paragraphs explain the core components of XDRA in more detail.

Last-level Cache Idle (LCI) Profile Generation. An application is simulated in a cycle-accurate processor-memory simulator without the last-level cache and the PMC's power down mechanism to record two entities at the second-last-level cache and memory interface: 1) the No Last-level Cache (NoLC) memory trace; and 2) the LCI profile. The NoLC trace will contain only those memory requests that miss in the lower level caches. For instance, in Figure 7.5 without the L2 cache, the memory trace captured at the L1–memory interface will only contain L1 cache misses. An LCI period refers to an application's execution period that does not access DRAM. For instance, at the L1–memory interface without the L2 cache in Figure 7.5, LCI periods will be the execution periods with no memory requests from the application (consecutive non-load and non-store instructions) and the execution periods with memory requests that hit in the L1 caches. Hence, the L2 cache will be idle during LCI periods.

[Figure: the application and the last-level cache configurations feed a processor-memory simulator (producing the NoLC trace and LCI profile of the base system), a cache simulator (producing a cache profile per configuration) and CACTI (producing cache energy and area); cache profile analysis computes PDCycles, PDCnt, RBConf, RefCnt and MT under the given PDthreshold, which drive the DRAM energy reduction estimator and cache exploration to output the best cache configuration. NoLC: No Last-level Cache; LCI: Last-level Cache Idle; the base system excludes the last-level cache and the PD mechanism.]

Figure 7.7: XDRA Framework. Dotted-lined rectangles and broken arrows show our novel contributions.

Figure 7.8 illustrates such an LCI period. An LCI profile captures all LCI periods in clock cycles from the execution of an application. For an in-order processor, which is typical of embedded processors, an LCI profile of the application will not change across different last-level cache configurations because the processor pipeline is stalled during each memory request and the lower level caches remain unchanged. Note that DRAM will be idle during all LCI periods and hence these periods will contribute to DRAM idle periods. The DRAM energy consumption from this simulation is used as the reference point for calculation of energy reductions for all the last-level cache configurations. This is the only step in XDRA where an application is cycle-accurately simulated.
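As a rough illustration of what the LCI profile contains, the sketch below derives LCI periods from a per-request trace captured at the L1–memory interface. The trace format (issue cycle plus a flag marking requests that go to memory) and all names are assumptions made for this sketch only:

```python
from typing import List, Tuple

def build_lci_profile(trace: List[Tuple[int, bool]], total_cycles: int) -> List[Tuple[int, int]]:
    """Return LCI periods as (start_cycle, length_in_cycles) pairs.

    trace: (cycle, goes_to_memory) entries in program order; goes_to_memory is True
    only for requests that miss all lower-level caches. The cycles between two such
    misses (non-memory instructions and L1 hits) form an LCI period.
    """
    lci_periods = []
    prev_miss_cycle = 0
    for cycle, goes_to_memory in trace:
        if goes_to_memory:
            gap = cycle - prev_miss_cycle
            if gap > 0:
                lci_periods.append((prev_miss_cycle, gap))   # last-level cache idle in this gap
            prev_miss_cycle = cycle
    if total_cycles > prev_miss_cycle:                        # trailing idle period, if any
        lci_periods.append((prev_miss_cycle, total_cycles - prev_miss_cycle))
    return lci_periods
```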

Cache Simulation. In this step, we feed the NoLC trace to a cache simulator to generate a cache profile for each of the last-level cache configurations. A cache profile reports whether DRAM will be accessed or not for each memory request in the NoLC trace. A memory request in the NoLC trace is considered a DRAM access or non-access in the cache profile depending upon the cache policy (such as write-back, write-allocate, etc.) and whether the request hits or misses the cache. For more details, readers are referred to [173].

[Figure: instruction sequence with per-instruction L1 outcome: ld R2, R1+20 (L1 CM); shl R2, 1 (N/A); ld R3, R1+10 (L1 CM); neg R3 (N/A); xor R2, R3 (N/A); inc R4 (N/A); mov R3, R4 (L1 CH); ld R5, R1+40 (L1 CM). The run of non-memory instructions and L1 cache hits between two L1 cache misses forms a Last-level Cache Idle (LCI) period. L1 CH: L1 Cache Hit; L1 CM: L1 Cache Miss; N/A: Not Applicable.]

Figure 7.8: An example of an LCI period for the Target System of Figure 7.5 (L2 is the last-level Cache).
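A minimal sketch of how a cache profile could be derived from the NoLC trace, assuming a write-back, write-allocate last-level cache; cache.access is an assumed cache-simulator hook introduced only for this sketch, and the handling of dirty evictions is a simplification:

```python
def cache_profile(nolc_trace, cache):
    """Classify each NoLC-trace request as a DRAM access (DA) or non-access (DNA).

    For a write-back, write-allocate last-level cache: hits are DNAs, misses are DAs
    (line fill), and evicting a dirty victim adds one extra DA for the write-back.
    cache.access(addr, is_write) is assumed to return (hit, dirty_victim_evicted).
    """
    profile = []
    for addr, is_write in nolc_trace:
        hit, dirty_victim = cache.access(addr, is_write)
        if hit:
            profile.append("DNA")
        else:
            profile.append("DA")            # line fill from DRAM
            if dirty_victim:
                profile.append("DA")        # write-back of the evicted dirty line
    return profile
```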

Cache Profile Analysis. This step analyses the cache profile of each last-level cache configuration to compute the five parameters that are used in our estimator. A DRAM Idle (DI) profile captures the duration of each DI period from a given cache profile and the LCI profile of the application, where a DI period refers to consecutive cycles of DRAM idleness. First, we insert LCI periods from the LCI profile between the appropriate DRAM accesses and non-accesses in the cache profile. Then, all DI periods are marked, where an idle period consists of consecutive DRAM non-accesses and LCI periods. PDcycles are computed by summing all DI periods, whereas PDCnt is equal to the number of DI periods that are greater than the PDthreshold. Note that this step allows a designer to apply any threshold that he/she deems suitable.

Figure 7.9 shows an example of how PDcycles and PDCnt are derived from a cache profile and an LCI profile. The cache profile reports DRAM accesses and non-accesses while the LCI profile reports L2 cache idle periods. First, each LCI period from the LCI profile is inserted after the appropriate Memory Request (MR) in the cache profile. For example, the second LCI period occurs after MR3 and hence is inserted between MR3 and MR4 in the DI profile. After merging these two profiles, the DI profile shows all the DI periods marked in dotted-lined rectangles. The cycles in each DI period are calculated using a 4 cycle DRAM non-access latency (the L2 cache hit latency). For example, for the first DI period, there are 20 and 110 cycles from two LCI periods and 16 cycles from 4 DRAM non-accesses, totalling 146 cycles. Finally, the DI profile is converted to PDcycles by applying the PDthreshold, that is, the initial threshold number of cycles in each DI period will not contribute to PDcycles. Note that the last DI period will be filtered out since its duration is only 28 cycles. For this example, DRAM will remain in self-refresh PD mode for 156 cycles, whereas PDCnt will be 2.
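A minimal sketch of this thresholding step, mirroring the example of Figure 7.9. It assumes the DI period lengths have already been computed by merging the cache and LCI profiles (with 4 cycles charged per DRAM non-access); the function name is illustrative:

```python
def pd_cycles_and_count(di_periods, pd_threshold=30):
    """Apply the PD threshold to DRAM Idle (DI) period lengths given in cycles.

    The first pd_threshold cycles of each DI period do not contribute to PDcycles,
    and DI periods no longer than the threshold never enter self-refresh PD mode.
    """
    pd_cycles = 0
    pd_count = 0
    for length in di_periods:
        if length > pd_threshold:
            pd_cycles += length - pd_threshold
            pd_count += 1
    return pd_cycles, pd_count

# The example of Figure 7.9: DI periods of 146, 70 and 28 cycles give 156 PD cycles and PDCnt = 2.
assert pd_cycles_and_count([146, 70, 28]) == (156, 2)
```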

The number of row buffer conflicts, RBConf, depends upon the DRAM address mapping and the row buffer policy (such as open-page, closed-page, etc.). A row buffer conflict occurs when the requested data is not present in the row buffer, which requires its loading from DRAM through the process of activation. We calculate the location (rank, bank and row ids [2]) of each DRAM access using the DRAM address mapping, which is known a priori. Afterwards, if the rank and bank ids of two successive DRAM accesses are the same, but the row ids are different, then we record a row buffer conflict. Depending upon the row buffer policy, the row buffer may be empty (closed) between DRAM accesses to the same bank, which is accounted for using the information from the row buffer policy. Note that the aforementioned method does not capture all the row buffer conflicts because DRAM needs to be refreshed periodically to retain its contents. During the refresh process, the contents of the row buffer are lost, which results in a row buffer conflict for the next DRAM access. Thus, the inaccuracies in RBConf depend upon RefCnt, the number of times DRAM is refreshed during the execution of an application.

[Figure: the cache profile (MR1-MR10 marked as DRAM Access (DA) or DRAM Non-Access (DNA)) is merged with the LCI profile (LCI periods of 20, 110, 70 and 20 cycles inserted after MR1, MR3, MR5 and MR8) to form the DI profile; with a 4-cycle DNA latency and a PDthreshold of 30 cycles, DI periods of 146, 70 and 28 cycles yield PD periods of 116, 40 and 0 cycles, so PDcycles = 116 + 40 = 156 cycles and PDCnt = 2. MR: Memory Request; DA: DRAM Access; DNA: DRAM Non-Access; LCI: Last-level Cache Idle; DI: DRAM Idle; PD: Power Down.]

Figure 7.9: An Example of calculating PDcycles and PDCnt.

The parameter RefCnt depends upon the time for which DRAM is active. Accurate computation of DRAM's active time is not possible because DRAM accesses take differing numbers of cycles depending upon the DRAM state [156], which is unknown unless cycle-accurate simulation is performed. Therefore, we estimate the DRAM active time as [DRAM accesses × fixed DRAM access latency], where the fixed latency is an average of the latencies of all DRAM accesses from the cycle-accurate simulation performed during the 'LCI profile generation' step. This DRAM active time is then divided by the refresh period of DRAM, which is known a priori (for example, 64 ms for DDR3 from Micron [2]), to get an estimate of RefCnt. Since both RBConf and RefCnt are estimated, they might cause inaccuracies in the estimator; however, our results (see Section 7.5) show that such inaccuracies are small.
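A minimal sketch of these two estimates; the address-decoding helper, the clock constant and all names are placeholders and depend on the actual DRAM address mapping, row buffer policy and clock of the system under design:

```python
def estimate_refcnt(dram_accesses: int, avg_access_latency_cycles: float,
                    clock_period_ns: float, refresh_period_ms: float = 64.0) -> int:
    """Estimate RefCnt as [estimated DRAM active time / refresh period]."""
    active_time_ns = dram_accesses * avg_access_latency_cycles * clock_period_ns
    return int(active_time_ns / (refresh_period_ms * 1e6))

def estimate_rbconf(accesses, decode):
    """Count row-buffer conflicts: same rank and bank but a different row on successive accesses.

    accesses: DRAM addresses in access order; decode maps an address to (rank, bank, row)
    using the address mapping known a priori.
    """
    conflicts = 0
    last_row = {}                                  # (rank, bank) -> last open row
    for addr in accesses:
        rank, bank, row = decode(addr)
        if last_row.get((rank, bank), row) != row:
            conflicts += 1
        last_row[(rank, bank)] = row
    return conflicts
```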

The parameter MT measures the total traffic going to DRAM during the execution of an application. MT is calculated in bytes by multiplying the number of DRAM accesses by the last-level cache line size. In this way, we analyse the cache profile of each last-level cache configuration to calculate the values of the five parameters for use in our estimator, without the need for cycle-accurate simulations of those configurations.

Cache Exploration. At the final step, DRAM energy reduction is computed for each last-level cache configuration using the PDcycles, PDCnt, RBConf, RefCnt and MT in our estimator. The predicted energy reductions are adjusted by subtracting the energy consumptions of the respective cache configurations, which are obtained from CACTI [81]. Once the net DRAM energy reductions for all last-level cache configurations are available, the cache configuration with maximum reduction is selected. Here, the decision to search all the cache configurations for the global optimum or to use a heuristic search is left to the designer. We chose to use the former approach in this thesis.
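The final selection step can be summarised by a short sketch (illustrative only: estimator.predict stands in for the Kriging model of Section 7.3, while cache_energy and cache_area stand in for the CACTI figures):

```python
def explore(configs, estimator, cache_energy, cache_area, area_limit=None):
    """Return the cache configuration with maximum net energy reduction.

    configs maps a configuration (size, line_size, assoc) to its five estimator
    parameters [PDcycles, PDCnt, RBConf, RefCnt, MT]; cache_energy and cache_area
    map the same configurations to energy (J) and area (mm^2) values.
    """
    best, best_net_er = None, float("-inf")
    for cfg, params in configs.items():
        if area_limit is not None and cache_area[cfg] > area_limit:
            continue                                   # optional area constraint
        net_er = estimator.predict([params])[0] - cache_energy[cfg]
        if net_er > best_net_er:
            best, best_net_er = cfg, net_er
    return best, best_net_er
```

This exhaustive scan mirrors the global search chosen in the thesis; a heuristic search would simply restrict the set of configurations visited.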

7.5 Experiments and Results

We use the target system of Figure 7.5 with LRU and write-back policies for the L2 cache. The target system is implemented using Tensilica's LX4 processor [60] with 2KB L1 instruction and 1KB L1 data caches, both direct-mapped with 4B line size. We used -125E DDR3 DRAM (1600 Million Transfers per second) from Micron [2] to implement the DRAM memory. We created two target systems with 256MB (1 rank) and 4GB (4 ranks) memories to observe the applicability of our estimator. These DRAMs had an interface width of 4B and used an open-page row buffer policy and an internal refresh time of 64 ms. A threshold of 30 cycles and a wakeup latency of 64 cycles were used for the self-refresh Power Down (PD) mode.

We used the cycle-accurate simulator presented in Chapter 4 [156] (which integrated DRAMSim [84] with the LX4's simulator to create a detailed processor-memory simulator) and the tool from [167] as the processor-memory simulator and cache simulator in XDRA. In addition, CACTI 6.5 [81] was configured for a 90nm technology to obtain the energy consumption and area of the L2 cache configurations. The L2 cache design space consisted of 330 configurations, constituted by changing the cache size from 4KB to 4MB, the line size from 4B to 128B, and the associativity from 1 to 16. These configurations are typically explored in an embedded system's design [157, 167]. For evaluation purposes, we used the adpcmEnc, adpcmDec, jpegEnc, jpegDec, g721Enc, g721Dec, mpeg2Enc and mpeg2Dec applications from mediabench, and the memory-bound vpr, gzip and bzip2 applications from SPEC2000 [174]. For the SPEC2000 applications, the first 500 million memory requests were used [175]. Please note that the power consumption breakdown for applications from the SPEC2000 benchmark is not reported in Figure 7.1. This is because we used the first 500 million memory requests for the SPEC2000 applications, and stopped the cycle-accurate simulation afterwards. Stopping a simulation in Tensilica's XTensa MultiProcessor (XTMP) environment does not generate a partial energy consumption report for the processor and caches. All experiments were conducted on an Intel Xeon 64 core machine with 256GB RAM.

For selection of design points in the training set, we use a well-known design of experiments technique - Latin Hypercube Sampling (LHS). LHS is a statistical technique that evenly distributes multiple points in the design space so that all segments of the design space's dimensions are represented. The minimum requirement of LHS for the number of design points is the summation of the ranges of all the design space dimensions, which in our case is 22 (11 cache sizes, 6 line sizes and 5 associativities). Additionally, we used 8 corner design points [176] from the design space to create a training set with 30 design points. The training set was cycle-accurately simulated to capture the actual energy reductions and the values of the parameters in the Kriging model. We created Kriging models using the DACE toolbox [171] for MATLAB, where constant and linear polynomials were used for the regression function f(x), while linear, gaussian, exponential, spherical and spline functions were used as the correlation function z(x). Note that training of a Kriging model takes a few seconds, and hence several different functions can be evaluated to best fit the training set data.
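As an illustration of the training-set construction, the sketch below draws a 22-point LHS sample over the three cache dimensions and snaps it to the discrete configuration grid; the use of scipy.stats.qmc and the snapping helper are assumptions made for this sketch (the thesis does not prescribe a particular LHS implementation), while the grids match the design space stated above:

```python
import numpy as np
from scipy.stats import qmc

cache_sizes   = [4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096]   # KB (11 values)
line_sizes    = [4, 8, 16, 32, 64, 128]                               # bytes (6 values)
associativity = [1, 2, 4, 8, 16]                                      # ways (5 values)

sampler = qmc.LatinHypercube(d=3, seed=0)
unit_points = sampler.random(n=22)            # 11 + 6 + 5 = 22 LHS points in the unit cube

def snap(u, grid):
    # Map a unit-interval coordinate to the nearest index of a discrete dimension.
    return grid[min(int(u * len(grid)), len(grid) - 1)]

training_set = {(snap(p[0], cache_sizes), snap(p[1], line_sizes), snap(p[2], associativity))
                for p in unit_points}
# The 8 corner design points (e.g. the smallest and largest configurations) are added
# afterwards to reach the 30-point training set described above.
```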

LinReg and Simple Estimators. For rigorous evaluation of our estimator, we created two other estimators: 1) LinReg, based on a linear regression model; and 2) Simple, based on a constant energy per DRAM access model.

The first estimator is based on linear regression where the same parameters as the Kriging estimator are used. We refer to it as the LinReg estimator, and it is written as:

$$ER = \beta_0\,\mathrm{PDCycles} + \beta_1\,\mathrm{PDCnt} + \beta_2\,\mathrm{RBConf} + \beta_3\,\mathrm{RefCnt} + \beta_4\,\mathrm{MT}$$

The coefficients of the LinReg estimator were computed using the same training set as used for the Kriging estimator.
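A minimal sketch of this coefficient fit via ordinary least squares; the file names are placeholders for the training data described above:

```python
import numpy as np

# One row per training configuration: [PDcycles, PDCnt, RBConf, RefCnt, MT];
# y holds the measured energy reductions from the cycle-accurate training runs.
X = np.loadtxt("training_parameters.csv", delimiter=",")
y = np.loadtxt("training_energy_reduction.csv", delimiter=",")

beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # beta_0 ... beta_4 of the LinReg estimator
predict = lambda params: np.asarray(params) @ beta
```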

The Simple estimator is written as:

$$ER = \bigl(DA_{wc} - DA_{c}\bigr) \times \frac{\mathrm{DRAMEnergy}_{wc}}{DA_{wc}}$$

where DAwc and DAc are the number of DRAM accesses without last-level cache and with a given last-level cache configuration respectively. DRAMEnergywc is the energy consumption of DRAM in a system without last-level cache and is computed from the cycle-accurate simulation of such a system. The first factor estimates the number of DRAM accesses reduced by a given cache configuration, which is then multiplied by a constant energy consumption per DRAM access (second factor) to compute overall DRAM energy reduction.

Estimator Accuracy. We evaluated the accuracy of the three estimators by calculating average errors using the estimated energy reductions and the actual energy reductions from cycle-accurate simulations for all the cache configurations. Figure 7.10 reports the average errors when 256MB DRAM is used for all the applications. Our estimator had a worst average error of 2.2% across all the applications. On the other hand, LinReg and Simple had maximum average errors of 15.3% and 6.8% respectively. For 4GB DRAM, the errors are 3.4%, 26.2% and 99.4% for the Kriging, LinReg and Simple estimators, which are illustrated in Figure 7.11. It is noteworthy that the average error of our estimator is consistently low compared to the significant variations for LinReg and Simple, which means better applicability for our estimator.

To further stress the accuracy of the estimators, we performed constrained optimisation on the estimated cache design spaces. We searched for a cache configuration with maximum energy reduction under a constraint on the area footprint of the cache, which is typical of embedded systems. For each application, we applied several constraints, ranging from the largest to the smallest cache area. The cache configurations selected by the estimators were compared to the corresponding cache configurations from the actual design space (obtained using the same constraints) to record the minimum and maximum errors in energy reduction over all the constraints. The results for 256MB DRAM are illustrated in Figure 7.12 for all the applications. Again, our estimator performs consistently well with a maximum error of less than 4.4% over all the applications. The LinReg and Simple estimators had significant variations in the selection of cache configurations, with errors up to 74.1% and 92.3% respectively, which renders them unreliable compared to our estimator. Results for 4GB DRAM are reported in Figure 7.13, where the Kriging, LinReg and Simple estimators had errors up to 3.3%, 67.9% and 7% respectively. These results show that our estimator is better at consistently modelling DRAM energy reductions with differing last-level cache configurations, and thus the rest of this section reports results for our estimator only.

[Figure: average error (%) in estimated energy reduction of the Kriging, LinReg and Simple estimators for each application (adpcmEnc, adpcmDec, jpegEnc, jpegDec, g721Enc, g721Dec, mpeg2Enc, mpeg2Dec, spec_vpr, spec_gzip, spec_bzip2) with 256MB DRAM.]

Figure 7.10: Average Error in estimated Energy Reduction for 256MB DRAM.

[Figure: average error (%) in estimated energy reduction of the Kriging, LinReg and Simple estimators for each application with 4GB DRAM.]

Figure 7.11: Average Error in estimated Energy Reduction for 4GB DRAM.

7.5. EXPERIMENTS AND RESULTS 141

spec_gzip

spec_bzip2

spec_vpr

mpeg2Dec

mpeg2Enc

g721Dec g721Enc

ConstEnrg Model jpegDec

jpegEnc

adpcmDec

adpcmEnc

spec_gzip

spec_bzip2

spec_vpr

mpeg2Dec

mpeg2Enc

g721Dec

g721Enc jpegDec

LinReg ModelLinReg jpegEnc

adpcmDec

adpcmEnc

spec_gzip

spec_bzip2

spec_vpr

mpeg2Dec

mpeg2Enc

g721Dec

g721Enc jpegDec

KrigingModel jpegEnc

Error in Energy Reduction from Cache Configurations selected under differing Area Constraints for 256MB

Min. Max. adpcmDec adpcmEnc 0 80 60 40 20 100 Figure 7.12: DRAM.

142 CHAPTER 7. ENERGY REDUCTION IN DDR DRAMS

spec_gzip

spec_bzip2

spec_vpr

mpeg2Dec

mpeg2Enc

g721Dec

g721Enc jpegDec

ConstEnrg Model

jpegEnc

adpcmDec

adpcmEnc

spec_gzip

spec_bzip2

spec_vpr

mpeg2Dec

mpeg2Enc

g721Dec

g721Enc

jpegDec jpegEnc

LinReg Model LinReg

adpcmDec

adpcmEnc

spec_gzip

spec_bzip2

spec_vpr

mpeg2Dec

mpeg2Enc

g721Dec

g721Enc jpegDec

Kriging Model Kriging

jpegEnc Error in Energy Reduction from Cache Configurations selected under differing Area Constraints for 4GB

Min. Max. adpcmDec adpcmEnc 0 70 60 50 40 30 20 10 Figure 7.13: DRAM. 7.5. EXPERIMENTS AND RESULTS 143

Cache Selection with XDRA. For each application, we used XDRA to explore the L2 cache and select the cache configuration with maximum DRAM energy reduction. Table 7.2 reports the L2 cache configurations selected by XDRA together with their area footprints. Out of 22 cache configurations (two configurations for two different DRAM sizes per application), XDRA found the same configurations as the cycle-accurate processor-memory simulations in 20 cases. The suboptimal configurations had a maximum increase of just 3.9% in total energy consumption of the cache and DRAM.

Application    256MB DRAM                   4GB DRAM
adpcm Enc      [4KB, 16B, 16A] (0.03)       [16KB, 32B, 8A] (0.05)
adpcm Dec      [4KB, 16B, 16A] (0.03)       [16KB, 32B, 4A] (0.05)
jpeg Enc       [256KB, 128B, 1A] (0.5)      [256KB, 128B, 1A] (0.5)
jpeg Dec       [256KB, 128B, 1A] (0.5)      [64KB, 16B, 16A] (0.31)
g721 Enc       [8KB, 16B, 16A] (0.05)       [32KB, 4B, 1A] (0.5)
g721 Dec       [8KB, 64B, 8A] (0.03)        [8KB, 16B, 16A] (0.05)
mpeg2 Enc      [32KB, 128B, 8A] (0.05)      [1MB, 128B, 8A] (1.5)
mpeg2 Dec      [16KB, 16B, 16A] (0.08)      [512KB, 128B, 1A] (0.86)
spec vpr       [64KB, 16B, 16A] (0.31)      [256KB, 128B, 8A] (0.5)
spec bzip2     [512KB, 128B, 4A] (0.9)      [1MB, 128B, 2A] (1.5)
spec gzip      [256KB, 128B, 8A] (0.52)     [512KB, 32B, 4A] (1.6)

Table 7.2: L2 Cache Configurations with maximum DRAM Energy Reduction (BC PD) from XDRA for different DRAM sizes. The numbers in parentheses are area footprint in mm2.

Note that, for comparison with Table 7.2, the area footprint of the CC PD configuration (defined below) is 0.145 mm2. Once the best cache configuration was known from XDRA, for comparison purposes, we simulated the following three systems in the processor-memory simulator to obtain their actual DRAM energy reduction:

• BS: Base system without L2 cache and self-refresh PD mechanism, but with the PD mechanism of the DDR3 internal mode controller.

• CC PD: System with a reasonable Common Cache configuration (64KB, 64B line size, direct-mapped L2 cache, which has been reported in [152]) and the PD mechanism.

• BC PD: System with the Best Cache configuration and the PD mechanism.

[Figure: normalised energy consumption of adpcmEnc broken down into DRAM_Refresh, DRAM_Active, DRAM_BckGrnd and L2 cache for the BaseSys, CC_PD and BestC_PD systems with 256MB and 4GB DRAM.]

Figure 7.14: Normalised DRAM Energy Consumption Breakdown of adpcmEnc for different L2 caches and DRAM sizes.

[Figure: the corresponding breakdown for adpcmDec.]

Figure 7.15: Normalised DRAM Energy Consumption Breakdown of adpcmDec for different L2 caches and DRAM sizes.

The energy reduction comparison of the above three systems for all 11 applications with 256MB and 4GB DRAMs is reported in Figures 7.14–7.24. Only the result of the vpr application (Figure 7.22) is discussed, since the results for the other applications are similar.

[Figure: the corresponding breakdown for jpegEnc.]

Figure 7.16: Normalised DRAM Energy Consumption Breakdown of jpegEnc for different L2 caches and DRAM sizes.

[Figure: the corresponding breakdown for jpegDec.]

Figure 7.17: Normalised DRAM Energy Consumption Breakdown of jpegDec for different L2 caches and DRAM sizes.

Figure 7.22 reports the normalised energy consumption breakdown for the 'vpr' application from SPEC2000. Both the CC PD and BC PD systems significantly reduce the energy consumption of DRAM; however, BC PD is more energy efficient than CC PD, by a factor of 12× and 24× for 256MB and 4GB DRAMs respectively. These results show that a suitable cache configuration with self-refresh PD reduces both the active and background power of DRAM (although the reduction in background power is more significant than in active power) because: 1) most of the memory requests are serviced by the cache, which reduces DRAM activity; and 2) the DRAM can be put into self-refresh PD mode more often. In summary, the BC PD system from XDRA reduced, on average, 3.6× and 4× more cache and DRAM energy compared to CC PD for 256MB and 4GB DRAMs respectively. These results indicate the usefulness of XDRA in selecting a suitable cache configuration; the use of an arbitrary cache configuration is not always the most energy efficient choice.

[Figure: the corresponding breakdown for g721Enc.]

Figure 7.18: Normalised DRAM Energy Consumption Breakdown of g721Enc for different L2 caches and DRAM sizes.

[Figure: the corresponding breakdown for g721Dec.]

Figure 7.19: Normalised DRAM Energy Consumption Breakdown of g721Dec for different L2 caches and DRAM sizes.

[Figure: the corresponding breakdown for mpeg2Enc.]

Figure 7.20: Normalised DRAM Energy Consumption Breakdown of mpeg2Enc for different L2 caches and DRAM sizes.

[Figure: the corresponding breakdown for mpeg2Dec.]

Figure 7.21: Normalised DRAM Energy Consumption Breakdown of mpeg2Dec for different L2 caches and DRAM sizes.

The performance penalty due to the wakeup latency of the self-refresh PD mode is measured by comparing the execution times of BC PD with a similar system but without the self-refresh PD mechanism. In our experiments, a maximum penalty of 2% was observed due to the self-refresh PD mode. This result illustrates the fact that a suitable cache not only increases DRAM idle periods but also consolidates them into longer periods, making them suitable for the self-refresh PD mode and reducing the number of DRAM wakeups, and hence reducing the overall performance penalty. It is important to note that none of the BC PD systems incurred any performance penalty compared to the BS system, because the reduction in execution time due to cache hits amortised the wakeup latency of the self-refresh PD mode. The above results point to the fact that a suitable last-level cache with the self-refresh PD mode can tremendously increase DRAM energy efficiency with: 1) marginal performance penalty compared to a similar system but without the PD mechanism; and 2) performance improvement compared to a similar system but without any cache.

[Figure: the corresponding breakdown for spec vpr.]

Figure 7.22: Normalised DRAM Energy Consumption Breakdown of spec vpr for different L2 caches and DRAM sizes.

[Figure: the corresponding breakdown for spec bzip2.]

Figure 7.23: Normalised DRAM Energy Consumption Breakdown of spec bzip2 for different L2 Caches and DRAM sizes.

[Figure: the corresponding breakdown for spec gzip.]

Figure 7.24: Normalised DRAM Energy Consumption Breakdown of spec gzip for different L2 Caches and DRAM sizes.

Table 7.3 reports the time taken by the cycle-accurate processor-memory simulator and by XDRA for exploration of the 330 L2 cache configurations with 256MB DRAM. The total time for XDRA has been broken down into: the time to train our estimator (TS); the time to generate the LCI profile and run the cache simulation (LCI); and the time for cache profile analysis and cache exploration (CPA). XDRA reduces the exploration time from several hundred days to a few days, resulting in savings of at least 86% of the simulation time, which enables quick exploration of the last-level cache. Note that the total time of XDRA with the LinReg estimator is the same as with our estimator because the same training set is used. If the Simple estimator is used in XDRA, then its time will be equal to the LCI time only, making it faster than our estimator. However, as shown before, the Simple estimator had average errors as high as 99.4% and does not perform consistently across various applications, which limits its practical use. Results for 4GB DRAM reveal similar savings, and are reported in Table 7.4.

Application    Cycle-Accurate Simulator    XDRA TS    XDRA LCI    XDRA CPA    XDRA Total
adpcm Enc      3.2h                        17m        45s         6m          23m
adpcm Dec      2.1h                        12m        33s         6m          17m
jpeg Enc       21.4h                       2h         5m          23m         2.4h
jpeg Dec       7.9h                        43m        2m          9m          55m
g721 Enc       13.8d                       1d         1h          7h          1.5d
g721 Dec       14.6d                       1d         1h          7h          1.7d
mpeg2 Enc      155.2d                      14d        12h         1d          15.5d
mpeg2 Dec      41.7d                       4d         4h          13h         4.5d
spec vpr       103.7d                      9d         9h          2d          11.3d
spec bzip2     143.9d                      13d        12h         2d          15.1d
spec gzip      124.1d                      11d        10h         1d          13.4d

Table 7.3: Time Comparison of Cycle-accurate Processor-memory Simulator and XDRA for 256MB DRAM.

Application    Cycle-Accurate Simulator    XDRA TS    XDRA LCI    XDRA CPA    XDRA Total
adpcm Enc      4.9h                        27m        1m          6m          32m
adpcm Dec      3.5h                        19m        47s         6m          25m
jpeg Enc       1.3d                        3h         6m          22m         3.3h
jpeg Dec       12.8h                       1h         3m          8m          1.3h
g721 Enc       19.5d                       2d         2h          17h         2.1d
g721 Dec       14.4d                       1d         1h          19h         1.7d
mpeg2 Enc      183.6d                      17d        14h         3d          18.6d
mpeg2 Dec      46.9d                       4d         4h          1d          4.9d
spec vpr       129.8d                      12d        11h         2d          14.1d
spec bzip2     192d                        18d        15h         22h         19.3d
spec gzip      295.7d                      27d        23h         2d          29.5d

Table 7.4: Time Comparison of Cycle-accurate Processor-memory Simulator and XDRA for 4GB DRAM.

7.6 Advantages and Limitations

XDRA features several advantages: 1) XDRA is fast as it only uses one cycle-accurate simulation per application; 2) it integrates the processor-memory and cache simulators with analysis techniques to quickly compute parameters for the DRAM energy reduction estimator; and 3) it uses the estimator for fast exploration of last-level cache configurations. Although several cycle-accurate simulations are required to build the estimator, the number of simulations is less than 10% (30 out of 330, see Section 7.5) of the whole design space.

XDRA is very flexible, as any processor-memory and cache simulators can be used provided the profiles described earlier can be produced, which would require minor modifications to existing simulators. Finally, a designer has the flexibility to use the cache policies, PDthreshold, DRAM address mapping, row buffer policies, etc. of his/her system under design, although we did not test all the possible combinations of such options.

XDRA can also be used to find the best last-level cache configuration for a class of applications. In this case, the trace from the combined execution of the applications should be captured and fed to the cache simulation. A truly representative trace from multiple applications' execution might not be possible due to indeterminism; however, this is a different problem and not the focus of our work. XDRA will explore last-level cache configurations to find the best one for a given trace irrespective of whether that trace captures the execution of a single application or of multiple applications.

XDRA as such cannot explore the last-level cache in a multi-processor system, because the LCI periods will be different across different last-level cache configurations (unlike a uniprocessor system where the LCI periods are the same across different last-level configurations) due to inter-processor dependencies and cache coherency. In future, we will look into extending XDRA for multi-processor systems.

7.7 Summary

This thesis for the first time proposes an estimator to predict the energy reductions when last-level caches are used and the main memory power consumption is reduced by aggressively switching the memory to self-refresh power mode whenever the memory is idle. The estimator is Kriging based and is used within a framework containing a processor-memory simulator, a cache simulator and novel analysis techniques. The predictor is accurate to within 4.4% on average for 11 applications from the mediabench and SPEC2000 suites and two DRAM sizes. We used the estimated energy reductions to find suitable cache configurations, and were able to do so in 20 of the 22 cases. In the two cases which were non-optimal, the errors were within 4% of the optimal values. XDRA is comparatively fast, and reduces 85% of the time taken by complete simulation.

Chapter 8

Conclusions

Power estimation and reduction have been the primary concerns across the various abstraction levels in computer systems, ranging from the high-level application layer to low-level layout design. Among the different system components, main memory consumes a high amount of power, and memory power consumption is becoming the limiting factor in achieving overall energy savings. Plenty of research on memory power reduction has been carried out in the compiler, operating system and system architecture areas. This thesis has explored efficient techniques for memory power estimation and reduction at the system architecture level.

Increasing adoption of high level simulation in the design flow demands power estimation and power reduction methodologies with high level modelling techniques. However, accuracy and efficiency are two main concerns in the modelling approaches, especially in the estimation of power utilisation. This thesis presented a cycle-accurate simulation platform to obtain the detailed statistics of the memory system. In this simulation platform, a novel interface abstraction layer (IAL) is proposed to glue the processor system component and the DRAM memory system component seamlessly. The latency of processing each memory request differs from one memory operation to another. Previously proposed memory simulators apply a two-pass trace-driven mechanism in which the memory trace sequences are captured with a typical processor/system simulator (using a fixed memory latency) and fed into the simulator. Even though the variance of memory latencies is taken into consideration in these trace-driven memory simulations, inaccurate memory statistics are produced because the effect of one memory request's response time on subsequent memory requests is not considered. In this thesis, the processor-memory simulation with IAL uses an on-the-fly one-pass approach, in which an incoming memory request to the memory component is processed directly instead of being captured as a trace, to achieve correct memory performance figures in the memory simulation step. As a result, the proposed framework with IAL provides far more accurate results than trace-driven memory simulations, whose accuracy varies by up to 80% with the choice of fixed memory latency used to obtain power consumption figures.

In order to reduce the power consumption and improve the performance of the system, one of the techniques at the architecture level is utilising the last-level cache residing just before the DRAM memory level in the memory hierarchy. The last-level cache plays an important role in avoiding expensive memory accesses to the off-chip DRAM system. Finding a suitable last-level cache configuration for a specific target application, to achieve the maximum power savings and/or minimum execution time of the whole system, is a key concern in a large design space of cache configurations. Some prior research used a last-level cache exploitation approach in order to reduce DRAM's long latency. These previous approaches applied an exhaustive searching mechanism in order to explore the behaviour of last-level caches. In typical embedded systems, the last-level cache configuration (cache size, cache line size and associativity) ranges from [4KB cache size, 4 byte cache line size, 1-way associativity] to [128KB cache size, 128 byte line size, 16-way associativity]. With the availability of various cache configurations and a huge design space to explore, efficient methodologies and design flows are necessary to help in faster architecture exploration, leading to reduced design time and increased productivity. This thesis proposed a rapid exploration methodology, including a fast and highly accurate performance simulator and a reasonably accurate energy estimator, to tune a unified last-level cache according to the application running on it. This proposed rapid exploration mechanism avoids cycle-accurate simulations for all the possible cache configurations by performing the slow cycle-accurate simulation only once and fast cache simulations for all the last-level cache configurations. The proposed execution time and energy estimators had absolute accuracy of at least 99% and 83% respectively, with a 98% simulation time reduction when compared to the previous approach.

This thesis also presented a power reduction technique by proposing a DRAM power mode controller. The proposed power mode controller utilises an additional buffer as the last-level cache in the memory hierarchy. In this design, the last-level cache is modelled as an intermediate prefetch buffer which prefetches a batch of data from the DRAM memory such that the next memory request can be serviced from the prefetch buffer. The time to service memory requests from the prefetch buffer becomes DRAM's idle time, during which the proposed power mode controller sets the DRAM device into a low power mode. The model used in our experiments has four main power states of memory - active, standby, sleep and deep sleep. The proposed power mode controller sets the DRAM device into the lowest power consuming mode (the deep sleep power state) whenever DRAM is idle and the DRAM's idle time is larger than a predefined threshold period, in order to avoid setting the device to low power mode within short idle times. The default internal power management system of DRAM controls the DRAM device to go into the lower power consuming modes (standby and sleep power states) after every memory request. Before processing the next memory request, the device is switched back to active mode from the low power mode with a power mode switching overhead. Thus, the internal power management scheme does not gain power saving benefits, due to the frequent power mode switching overhead. In contrast, the proposed power mode controller achieved significant power reductions with the exploitation of DRAM's lowest power consuming mode, and the last-level cache which reduces the power mode switching overhead.

Though DRAM energy consumption is reduced with the last-level cache and DRAM's deep sleep power state, not all the last-level cache configurations are suitable to achieve the highest power savings. Some of the last-level cache configurations make DRAM energy consumption even higher. With the price reduction trend of SRAM and the demand for faster processing, the design space of last-level cache configurations is becoming larger and larger, even for area-constrained embedded systems. Finding a suitable last-level cache configuration to obtain maximum energy savings with an exhaustive searching mechanism is not a feasible solution, especially in such a large design space. This thesis developed a design framework, together with a proposed novel energy reduction estimator, which can rapidly find a maximum energy savings cache configuration. The energy reduction estimator proposed in this thesis is a Kriging based model, which is trained through a small set of data chosen by using the Latin Hypercube Sampling technique. This Kriging based energy reduction estimator utilises various DRAM parameters (which correlate strongly with the amount of energy reduction) as the estimator's variables, compared to the constant DRAM energy based characterisation used in previous approaches. It also gives better visibility of DRAM's power consumption, because this approach helps in relating the activity of DRAM's power states to power consumption. This proposed rapid exploration framework is verified on a variety of benchmarks from the mediabench and SPEC2000 suites, and simulated with a Tensilica processor model and Micron's DDR3 memory model for a uniprocessor system. An elaborate discussion is also provided on how to improve the estimator when a class of applications or a multi-processor system is used. It was shown that, with a reasonably large number of data samples, the proposed DRAM energy saving estimator is accurate to within 4.4% on average with respect to cycle-accurate simulation. Also, the presented framework design is comparatively fast, and reduces 85% of the time taken by complete simulation.

8.1 Future Work

This thesis presented power/energy estimation and reduction techniques for system architecture based design frameworks. There are many possible future directions, some of which are outlined below.

1. We presented the interface abstraction layer (IAL) which supports communication between the processor simulation and the memory simulation. Even though our target system is a uniprocessor system, this approach can be extended to multiprocessor systems-on-chip (MPSoCs) by keeping data consistent across the local caches of the different processors. In other words, a cache coherence mechanism for MPSoCs needs to be incorporated into the IAL so that all the local caches of the different processors hold only up-to-date data. Additionally, memory consistency is an important factor for shared-memory multiprocessor systems. The memory consistency model allows multiple processors to concurrently read/write data from/to the shared memory in a synchronised manner. Our proposed IAL can therefore be extended to multiprocessor systems by applying cache coherence and memory consistency mechanisms, as sketched below.
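A hypothetical, much simplified view of such an extension follows: on a write, the extended layer invalidates the line in every other processor's local cache before forwarding the request to the shared memory model. The class and method names (invalidate, hit, fill, issue_read, issue_write) are assumptions for the sketch and do not correspond to the actual IAL interface.

```python
# Hypothetical sketch of invalidation-based coherence inside an IAL-like
# layer for an MPSoC simulator. The cache and memory model objects, and
# their methods, are assumed placeholders.

class CoherentInterfaceLayer:
    def __init__(self, local_caches, memory_model):
        self.caches = local_caches      # one private cache model per processor
        self.memory = memory_model      # shared (e.g. cycle-accurate DRAM) model

    def write(self, cpu_id, addr, data):
        # Invalidate every other processor's copy so only up-to-date data remains.
        for other_id, cache in enumerate(self.caches):
            if other_id != cpu_id:
                cache.invalidate(addr)
        self.caches[cpu_id].write(addr, data)
        self.memory.issue_write(addr, data)

    def read(self, cpu_id, addr):
        cache = self.caches[cpu_id]
        if cache.hit(addr):
            return cache.read(addr)     # serviced locally, DRAM stays idle
        data = self.memory.issue_read(addr)
        cache.fill(addr, data)
        return data
```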

2. The DRAM power mode control presented in Chapter 6 can be extended in many ways. One possible extension is a threshold-adaptive mechanism for dynamic (run-time) control of different power modes by monitoring the history of previous memory requests. The threshold-based power mode control presented in this thesis uses a constant threshold value. However, further power savings could be achieved by utilising multiple power modes (only the lowest power consuming mode is applied in this thesis) with different threshold values for setting the device into a low power mode. The selection of the power mode for the next round would depend on information such as the length of the DRAM's idle periods and the frequency of idle times, which can be obtained from the history of previous memory requests within a certain time frame; a sketch of such a history-driven selector follows this item. A second possibility is to extend the power mode control to the granularity of the DRAM's banks (a DRAM chip is composed of multiple banks). Instead of applying the same low power mode to the whole DRAM chip, as presented in this thesis, a different power saving mode could be applied to each bank, considering the different workloads directed to that bank. The workload information can be analysed off-line with captured memory access sequences or with run-time history information. Additionally, DDR3 DRAM energy consumption can be further reduced by the use of multiple low-power modes after the selection of the optimal cache configuration, rather than just the self-refresh power-down mode. It is possible that some of the DRAM idle periods might not be long enough for the self-refresh power-down mode even with the optimal cache configuration. In such a scenario, the PMC should select one of the other low-power modes suitable for the DRAM idle period at hand, to avoid the overhead of using the self-refresh power-down mode at all times.
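The sketch below illustrates the history-driven selection idea mentioned in this item: recent idle-period lengths are recorded, and the deepest power mode whose (assumed) break-even time the predicted idle period exceeds is chosen. The mode list, the break-even values and the median-based prediction are placeholders, not measured DDR3 parameters.

```python
# Illustrative history-driven power-mode selector (a possible extension,
# not the controller of Chapter 6). Break-even values are assumed numbers.

from collections import deque

# (mode, break_even_cycles): deeper modes save more power but cost more to exit.
MODES = [("deep_sleep", 2000), ("sleep", 500), ("standby", 100)]

class AdaptivePowerModeSelector:
    def __init__(self, history_len=32):
        self.idle_history = deque(maxlen=history_len)

    def record_idle_period(self, length_cycles):
        # Called whenever an idle period ends, with its observed length.
        self.idle_history.append(length_cycles)

    def select_mode(self):
        if not self.idle_history:
            return "standby"
        # Conservative prediction: median of the recent idle-period lengths.
        predicted = sorted(self.idle_history)[len(self.idle_history) // 2]
        for mode, break_even in MODES:
            if predicted >= break_even:
                return mode
        return "active"   # idle gaps too short to justify any low-power mode
```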

3. Another possible direction is to find a suitable code/data placement in the DDR3 DRAM to prolong the idle periods of DRAM ranks/banks. The DRAM accesses of an application can be captured by offline analysis and then analysed to place code/data according to the spatial and temporal locality of those accesses; a simple illustration of such an analysis is sketched below. After code/data placement, a valid mapping between the last-level cache and the DRAM would be either created or selected from existing address mapping policies [177].
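As a toy example of such an offline analysis, the sketch below counts DRAM accesses per data object from a captured trace and packs the hottest objects into the lowest-numbered banks, so that the remaining banks see longer idle periods. The trace format, the notion of an object identifier and the per-bank capacity are assumptions of the sketch, not part of the thesis framework.

```python
# Toy offline placement analysis: concentrate the hottest objects onto a few
# banks so that other banks receive fewer accesses and stay idle longer.
# Trace format (cycle, object_id) and objects_per_bank are assumed.

from collections import Counter

def pack_hot_objects(trace, num_banks, objects_per_bank):
    """trace: iterable of (cycle, object_id) DRAM accesses.
    Returns a dict mapping object_id -> bank index."""
    counts = Counter(obj for _, obj in trace)
    placement = {}
    # Hottest objects fill bank 0 first, then bank 1, and so on.
    for rank, (obj, _) in enumerate(counts.most_common()):
        placement[obj] = min(rank // objects_per_bank, num_banks - 1)
    return placement
```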

4. The performance/energy estimators utilising the last-level cache presented in Chapters 5 and 7 can be improved for a class of applications, since some embedded systems mainly target multiple applications. The last-level cache should then be optimised for a class of applications, which has not been considered in this thesis. In the proposed energy reduction estimator, the training data guiding the estimator construction is collected per application. Training data drawn from multiple applications could help the estimation model capture a wider variety of power reduction patterns.

5. Finally, our estimation methods were not extensively targeted at multiprocessor systems. It will be interesting to see how the interactions between different processors impact the DRAM's idle periods, which are required by our proposed energy reduction estimators. The variation of the DRAM's idle periods across different last-level cache configurations is due to interprocessor dependencies and cache coherency. A good mechanism to quickly explore the DRAM's idle periods for each cache configuration would help in extending the energy reduction estimator to a multiprocessor system.

Bibliography

[1] Micron Technology, Inc. http://download.micron.com/pdf/datasheets/dram/ddr3/1Gb_DDR3_SDRAM.pdf.

[2] Micron, Inc., “Micron DDR3.” http://www.micron.com/products/dram/ddr3/.

[3] P. Marwedel, Embedded System Design. Kluwer Academic, 2003.

[4] “esfacts.” Available at: http://www.artemis-ju.eu/embedded systems.

[5] “IDC.” Available at: http://www.idc.com/.

[6] “Embedded systems: Technologies and markets.” Available at: http://www.bccresearch.com/report/embedded-systems-technologies- markets-ift016d.html.

[7] M. Weiser, “Hot topics-ubiquitous computing,” Computer, vol. 26, pp. 71 –72, oct 1993.

[8] M. Satyanarayanan, “Pervasive computing: vision and challenges,” Personal Communications, IEEE, vol. 8, pp. 10 –17, aug 2001.

[9] N. Srinath, 8085 Microprocessor: Programming and Interfacing. Prentice-Hall Of India Pvt. Limited, 2005.

[10] “Microchip’s 32-bit microcontrollers.” Available at: http://www.microchip.com/pagehandler/en-us/family/32bit/.

[11] “All programmable fpgas.” Available at: http://www.xilinx.com/products/silicon-devices/fpga/index.htm?from=hpsb.

[12] “Digital signal processors.” Available at: http://www.ti.com/lsds/ti/dsp/overview.page.

[13] A. D. Pimentel, C. Erbas, and S. Polstra, “A systematic approach to exploring embedded system architectures at multiple abstraction levels,” IEEE Trans. Comput., vol. 55, pp. 99–112, Feb. 2006.


[14] S. Przybylski, “Morning tutorial: Sorting out the new DRAMs,” 1997.

[15] K.-B. Lee and T.-S. Chang, SoC MEMORY SYSTEM DESIGN. Springer, 2006.

[16] S. C. Jung Ho Ahn and S. O, Energy Awareness in Contemporary Memory Systems. Springer, 2011.

[17] P. Grun, N. D. Dutt, and A. Nicolau, Memory architecture exploration for programmable embedded systems. Kluwer, 2003.

[18] W. A. Wulf and S. A. McKee, “Hitting the memory wall: Implications of the obvious,” SIGARCH Comput. Archit. News, vol. 23, pp. 20–24, 1995.

[19] B. L. Jacob, The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It. Synthesis Lectures on Computer Architecture, Morgan & Claypool Publishers, 2009.

[20] F. Catthoor, E. d. Greef, and S. Suytack, Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design. Norwell, MA, USA: Kluwer Academic Publishers, 1998.

[21] B. L. Jacob, S. W. Ng, and D. T. Wang, Memory Systems: Cache, DRAM, Disk. Morgan Kaufmann, 2008.

[22] A. Crone and G. Chidolue, “Functional verification of low power designs at rtl,” vol. 4644, pp. 288–299, 2007.

[23] A. Raghunathan, N. K. Jha, and S. Dey, High-Level Power Analysis and Op- timization. Norwell, MA, USA: Kluwer Academic Publishers, 1998.

[24] O. S. Unsal and I. Koren, “System-level power-aware design techniques in real-time systems,” in Proceedings of the IEEE, pp. 1055–1069, 2003.

[25] P. R. Panda, A. Shrivastava, B. Silpa, and K. Gummidipudi, Power-efficient System Design. Springer, 2010.

[26] J. Rabaey, Low Power Design Essentials. Engineering (Springer-11647), Springer, 2009.

[27] “A practical guide to low-power design, user experience with cpf, silicon inte- gration initiative, inc (si2).” Available at: http://www.si2.org/?page=1061.

[28] Z. Wang and X. S. Hu, “Power aware variable partitioning and instruction scheduling for multiple memory banks,” in Proceedings of the conference on Design, automation and test in Europe - Volume 1, DATE ’04, (Washington, DC, USA), pp. 10312–, IEEE Computer Society, 2004.

[29] M. Kandemir, “Impact of data transformations on memory bank locality,” in Proceedings of the conference on Design, automation and test in Europe - Volume 1, DATE ’04, (Washington, DC, USA), pp. 10506–, IEEE Computer Society, 2004.

[30] Y.-H. Lu, L. Benini, and G. De Micheli, “Operating-system directed power reduction,” in Proceedings of the 2000 international symposium on Low power electronics and design, ISLPED ’00, (New York, NY, USA), pp. 37–42, ACM, 2000.

[31] M. Lee, E. Seo, J. Lee, and J. soo Kim, “Pabc: Power-aware buffer cache management for low power consumption,” IEEE Transactions on Computers, vol. 56, pp. 488–501, 2007.

[32] I. Hur and C. Lin, “A comprehensive approach to dram power management,” in HPCA, pp. 305–316, 2008.

[33] H. Zheng, J. Lin, Z. Zhang, E. Gorbatov, H. David, and Z. Zhu, “Mini-rank: Adaptive dram architecture for improving memory power efficiency,” Microar- chitecture, IEEE/ACM International Symposium on, vol. 0, pp. 210–221, 2008.

[34] X. Fan, C. S. Ellis, and A. R. Lebeck, “Memory controller policies for dram power management,” in In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED, pp. 129–134, 2001.

[35] A. M. Amin and Z. A. Chishti, “Rank-aware cache replacement and write buffering to improve dram energy efficiency,” in Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design, ISLPED ’10, (New York, NY, USA), pp. 383–388, ACM, 2010.

[36] S. Rixner, W. Dally, U. Kapasi, P. Mattson, and J. Owens, “Memory access scheduling,” in Computer Architecture, 2000. Proceedings of the 27th Interna- tional Symposium on, pp. 128 –138, june 2000.

[37] J. Shao and B. T. Davis, “A burst scheduling access reordering mechanism.,” in HPCA, pp. 285–294, IEEE Computer Society, 2007.

[38] I. Hur and C. Lin, “A comprehensive approach to dram power management,” in HPCA, pp. 305–316, 2008.

[39] J. A. Darringer, R. A. Bergamaschi, S. Bhattacharya, D. Brand, A. Herkersdorf, J. K. Morrell, I. Nair, P. Sagmeister, and Y. Shin, “Early analysis tools for system-on-a-chip design,” IBM J. Res. Dev., vol. 46, pp. 691–707, Nov. 2002.

[40] W. Kruijtzer, P. van der Wolf, E. de Kock, J. Stuyt, W. Ecker, A. Mayer, S. Hustin, C. Amerijckx, S. de Paoli, and E. Vaumorin, “Industrial ip integration flows based on ip-xact standards,” in Proceedings of the conference on Design, automation and test in Europe, DATE ’08, (New York, NY, USA), pp. 32–37, ACM, 2008.

[41] S. Ahuja, A. Lakshminarayana, and S. Shukla, Low Power Design with High-Level Power Estimation and Power-Aware Synthesis. Springer, 2011.

[42] “IBM Blue Logic Technology.” Available at: http://www-3.ibm.com/chips/bluelogic/.

[43] “IBM CoreConnect Bus Architecture White Paper.” Available at: www-3.ibm.com/chips/products/coreconnect/index.html.

[44] A. Dewey, Design Automation Technology. Circuits & Filters Handbook 3e, CRC Press, 2009.

[45] D. A. Menascé, E. Casalicchio, and V. Dubey, “A heuristic approach to optimal service selection in service oriented architectures,” in Proceedings of the 7th international workshop on Software and performance, WOSP ’08, (New York, NY, USA), pp. 13–24, ACM, 2008.

[46] D. R. Rice, “An analytical model for computer system performance evaluation,” SIGMETRICS Perform. Eval. Rev., vol. 2, pp. 14–30, June 1973.

[47] Y. C. Tay, Analytical Performance Modeling for Computer Systems. Morgan and Claypool Publishers, 1st ed., 2010.

[48] S. S. Mukherjee, S. V. Adve, T. Austin, J. Emer, and P. S. Magnusson, “Performance simulation tools,” Computer, vol. 35, pp. 38–39, Feb. 2002.

[49] A. Raghunathan, S. Dey, and N. K. Jha, “Register-transfer level estimation techniques for switching activity and power consumption,” in Proc. Int. Conf. Computer-Aided Design, pp. 158–165, 1996.

[50] K. Olukotun, M. Heinrich, and D. Ofelt, “Digital system simulation: Methodologies and examples,” in Proc. of the 35th Design Automation Conference (DAC’98), pp. 658–663, ACM/IEEE, 1998.

[51] J. J. Yi and D. J. Lilja, “Simulation of computer architectures: Simulators, benchmarks, methodologies, and recommendations,” IEEE Trans. Comput., vol. 55, pp. 268–280, Mar. 2006.

[52] M. Heinrich, “Memory system simulation and the right energy metric,” 2008.

[53] L. Negri and A. Chiarini, “Power simulation of communication protocols with statec,” pp. 277–294, 2006.

[54] Micron Technology, Inc. http://download.micron.com/pdf/technotes/ddr3/TN41_01DDR3%20Power.pdf.

[55] D. Burger, “The simplescalar tool set, version 2.0,” tech. rep., 1997.

[56] K. Kise, T. Katagiri, H. Honda, and T. Yuba, “The simcore/alpha functional simulator,” in Proceedings of the 2004 workshop on Computer architecture ed- ucation: held in conjunction with the 31st International Symposium on Com- puter Architecture, WCAE ’04, (New York, NY, USA), ACM, 2004.

[57] K. Kise, H. Honda, and T. Yuba, “Simalpha version 1.0: Simple and readable alpha processor simulator,” in Asia-Pacific Computer Systems Architecture Conference, pp. 122–136, 2003.

[58] J. Lee, J. Kim, C. Jang, S. Kim, B. Egger, K. Kim, and S. Han, “Facsim: a fast and cycle-accurate architecture simulator for embedded systems,” in LCTES ’08: Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems, (New York, NY, USA), pp. 89–100, ACM, 2008.

[59] M. T. Yourst, “Ptlsim: A cycle accurate full system x86-64 microarchitectural simulator,” in ISPASS, pp. 23–34, 2007.

[60] Tensilica, Inc., “Xtensa Configurable Processors.” http://www.tensilica. com.

[61] ARM Limited, “RealView ARMulator ISS User Guide, Version 1.4.3.” http: //infocenter.arm.com, 2007.

[62] Texas Instruments., “C64x+ CPU cycle-accurate simulator.” http://www.ti. com/.

[63] M. Poncino and J. Zhu, “Dynamosim: a trace-based dynamically compiled instruction set simulator,” in ICCAD, pp. 131–136, 2004.

[64] T. R. Gross, J. L. Hennessy, S. A. Przybylski, and C. Rowen, “Measurement and evaluation of the mips architecture and processor.,” ACM Trans. Comput. Syst., vol. 6, no. 3, pp. 229–257, 1988.

[65] R. D. Doug, D. Burger, S. W. Keckler, and T. Austin, “Sim-alpha: a validated, execution-driven alpha 21264 simulator,” 2001.

[66] R. E. Bryant, “Alpha assembly language guide,” 1998.

[67] C. J. Mauer, M. D. Hill, and D. A. Wood, “Full-system timing-first simulation,” in SIGMETRICS, pp. 108–116, ACM, 2002.

[68] D. Skrien, “Cpu sim 3.1: A tool for simulating computer architectures for computer organization classes,” J. Educ. Resour. Comput., vol. 1, pp. 46–59, Dec. 2001.

[69] MikroSim., “Mikrocodesimulator MikroSim 2010.” http://www. mikrocodesimulator.de/index_eng.php.

[70] J. Simon and J.-M. Wierum, “The latency-of-data-access model for analyzing parallel computation,” Inf. Process. Lett., vol. 66, no. 5, pp. 255–261, 1998.

[71] E. Cordeiro, I. Stefani, T. Soares, and C. Martins, “Dcmsim: didactic cache memory simulator,” in Frontiers in Education, 2003. FIE 2003 33rd Annual, vol. 2, pp. F1C – 14–19 Vol.2, nov. 2003.

[72] M. L. C. Cabeza, M. I. G. Clemente, and M. L. Rubio, “Cachesim: a cache simulator for teaching memory hierarchy behaviour.,” in ITiCSE (C. Erickson, T. Wilusz, M. Daniels, R. McCauley, and B. Z. Manaris, eds.), p. 181, ACM, 1999.

[73] L. Coutinho, J. Mendes, and C. Martins, “Mscsim -multilevel and split cache simulator,” pp. 7 –12, oct. 2006.

[74] J. Edler and M. D. Hill, “Dinero iv trace-driven uniprocessor cache simulator.” http://pages.cs.wisc.edu/~markhill/DineroIV/.

[75] M. S. Haque, A. Janapsatya, and S. Parameswaran, “Susesim: a fast sim- ulation strategy to find optimal l1 cache configuration for embedded sys- tems,” in Proceedings of the 7th IEEE/ACM international conference on Hard- ware/software codesign and system synthesis, CODES+ISSS ’09, (New York, NY, USA), pp. 295–304, ACM, 2009.

[76] U. Choudhary, P. Phadke, V. Puttagunta, and S. Udayashankar, “Analysis of sub-block placement and victim caching techniques.”

[77] I. Ari, A. Amer, R. Gramacy, E. L. Miller, S. A. Brandt, and D. D. E. Long, “Acme: Adaptive caching using multiple experts,” in IN PROCEEDINGS IN INFORMATICS, pp. 143–158, 2002.

[78] A. Janapsatya, A. Ignjatovic, and S. Parameswaran, “Finding optimal l1 cache configuration for embedded systems,” in ASP-DAC, pp. 796–801, 2006.

[79] N. Tojo, N. Togawa, M. Yanagisawa, and T. Ohtsuki, “Exact and fast l1 cache simulation for embedded systems,” in Proceedings of the 2009 Asia and South Pacific Design Automation Conference, ASP-DAC ’09, (Piscataway, NJ, USA), pp. 817–822, IEEE Press, 2009.

[80] M. S. Haque, A. Janapsatya, and S. Parameswaran, “Susesim: a fast simula- tion strategy to find optimal l1 cache configuration for embedded systems,” in CODES+ISSS, pp. 295–304, 2009.

[81] HP, “Cacti 6.5.” http://www.hpl.hp.com/research/cacti/.

[82] M. M. K. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu, A. R. Alameldeen, K. E. Moore, M. D. Hill, and D. A. Wood, “Multifacet’s general execution-driven multiprocessor simulator (gems) toolset,” SIGARCH Comput. Archit. News, vol. 33, no. 4, pp. 92–99, 2005.

[83] M. Ghosh and H.-H. S. Lee, “Dram decay: Using decay counters to reduce energy consumption in drams.”

[84] D. Wang, B. Ganesh, N. Tuaycharoen, K. Baynes, A. Jaleel, and B. Jacob, “Dramsim: a memory system simulator,” SIGARCH Comput. Archit. News, vol. 33, pp. 100–107, Nov. 2005.

[85] M. Rosenblum, S. A. Herrod, E. Witchel, and A. Gupta, “Complete computer system simulation: The simos approach,” IEEE Parallel Distrib. Technol., vol. 3, no. 4, pp. 34–43, 1995.

[86] IBM Austin Research Lab., “SimOS-PowerPC web page.” http://www.research.ibm.com/arl/projects/SimOSppc.html, 2000.

[87] P. Magnusson, M. Christensson, J. Eskilson, D. Forsgren, G. Hallberg, J. Hog- berg, F. Larsson, A. Moestedt, and B. Werner, “Simics: A full system simu- lation platform,” Computer, vol. 35, pp. 50 –58, feb 2002.

[88] P. Bohrer, J. Peterson, M. Elnozahy, R. Rajamony, A. Gheith, R. Rock- hold, C. Lefurgy, H. Shafi, T. Nakra, R. Simpson, E. Speight, K. Sudeep, E. Van Hensbergen, and L. Zhang, “Mambo: a full system simulator for the powerpc architecture,” SIGMETRICS Perform. Eval. Rev., vol. 31, pp. 8–12, Mar. 2004.

[89] N. Hardavellas, S. Somogyi, T. F. Wenisch, E. Wunderlich, S. Chen, J. Kim, B. Falsafi, J. C. Hoe, and A. G. Nowatzyk, “Simflex: A fast, accurate, flexible full-system simulation framework for performance evaluation of server archi- tecture,” SIGMETRICS Performance Evaluation Review, vol. 31, pp. 31–35, 2004.

[90] T. F. Wenisch, R. E. Wunderlich, M. Ferdman, A. Ailamaki, B. Falsafi, and J. C. Hoe, “Simflex: Statistical sampling of computer system simulation,” IEEE Micro, vol. 26, no. 4, pp. 18–31, 2006.

[91] “AMD SimNow Simulator,” tech. rep., 11 2010.

[92] E. Argollo, A. Falcón, P. Faraboschi, M. Monchiero, and D. Ortega, “Cotson: Infrastructure for full system simulation,” Operating Systems Review, 2009.

[93] N. L. Binkert, R. G. Dreslinski, L. R. Hsu, K. T. Lim, A. G. Saidi, and S. K. Reinhardt, “The m5 simulator: Modeling networked systems,” IEEE Micro, vol. 26, pp. 52–60, July 2006.

[94] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hes- tness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, “The gem5 simulator,” SIGARCH Comput. Archit. News, vol. 39, pp. 1–7, Aug. 2011.

[95] H. Khalid, “A trace-driven simulation methodology,” SIGARCH Comput. Ar- chit. News, vol. 23, pp. 27–33, Dec. 1995.

[96] R. Covington, S. Dwarkada, J. R. Jump, J. B. Sinclair, and S. Madala, “The efficient simulation of parallel computer systems,” in International Journal in Computer Simulation, pp. 31–58, 1991.

[97] W. Li, E. Li, A. Jaleel, J. Shan, Y. Chen, Q. Wang, R. R. Iyer, R. Illikkal, Y. Zhang, D. Liu, M. Liao, W. Wei, and J. Du, “Understanding the mem- ory performance of data-mining workloads on small, medium, and large-scale cmps using hardware-software co-simulation.,” in ISPASS, pp. 35–43, IEEE Computer Society, 2007.

[98] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. Janapa Reddi, and K. Hazelwood, “Pin: building customized program analysis tools with dynamic instrumentation,” in PLDI ’05: Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, pp. 190–200, ACM Press, 2005.

[99] R. A. Uhlig and T. N. Mudge, “Trace-driven memory simulation: A survey,” ACM Computing Surveys, vol. 29, pp. 128–170, 2004.

[100] T. Wang, Q. Wang, D. Liu, M. Liao, K. Wang, L. Cao, L. Zhao, R. Iyer, R. Il- likkal, L. Wang, and J. Du, “Hardware/software co-simulation for last level cache exploration,” in Proceedings of the 2009 IEEE International Conference on Networking, Architecture, and Storage, NAS ’09, (Washington, DC, USA), pp. 371–378, IEEE Computer Society, 2009.

[101] M. M. K. Martin, D. J. Sorin, B. M. Beckmann, M. R. Marty, M. Xu, A. R. Alameldeen, K. E. Moore, M. D. Hill, and D. A. Wood, “Multifacet’s general execution-driven multiprocessor simulator (gems) toolset,” SIGARCH Comput. Archit. News, vol. 33, no. 4, pp. 92–99, 2005.

[102] M. Lodde, J. Flich, and M. E. Acacio, “Dynamic last-level cache allocation to reduce area and power overhead in directory coherence protocols,” in Proceed- ings of the 18th international conference on Parallel Processing, Euro-Par’12, (Berlin, Heidelberg), pp. 206–218, Springer-Verlag, 2012.

[103] J. L. Hennessy and D. A. Patterson, Computer Architecture, Fourth Edition: A Quantitative Approach. San Francisco, CA, USA: Morgan Kaufmann Pub- lishers Inc., 2006.

[104] R. R. Iyer, “On modeling and analyzing cache hierarchies using casper,” in MASCOTS, pp. 182–187, 2003.

[105] J.-H. Lee, J.-S. Lee, and S.-D. Kim, “A new cache architecture based on temporal and spatial locality,” Journal of Systems Architecture, vol. 46, no. 15, pp. 1451 – 1467, 2000.

[106] A. Basu and N. Kirman, “Scavenger: A new last level cache architecture with global block priority,” in In Proceedings of the 40th International Symposium on Microarchitecture, 2007.

[107] C. J. Lee, V. Narasiman, E. Ebrahimi, O. Mutlu, and Y. N. Patt, “Dram-aware last level cache writeback: Reducing write-caused interference in memory system,” tech. rep., TR-HPS-2010-002, 2010.

[108] H.-H. S. Lee and G. Tyson, “Eager writeback - a technique for improving bandwidth utilization,” in Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, pp. 11–21, ACM Press, 2000.

[109] J. Stuecheli, D. Kaseridis, D. Daly, H. C. Hunter, and L. K. John, “The virtual write queue: coordinating dram and last-level cache policies.,” in ISCA (A. Seznec, U. C. Weiser, and R. Ronen, eds.), pp. 72–82, ACM, 2010.

[110] C. J. Lee, E. Ebrahimi, V. Narasiman, O. Mutlu, and Y. N. Patt, “Dram-aware last-level cache replacement,” tech. rep., TR-HPS-2010-007, 2010.

[111] Z. Wang, S. M. Khan, and D. A. Jiménez, “Rank idle time prediction driven last-level cache writeback,” in Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, MSPC ’12, (New York, NY, USA), pp. 21–29, ACM, 2012.

[112] Z. Wang, S. M. Khan, and D. A. Jiménez, “Improving writeback efficiency with decoupled last-write prediction,” in Proceedings of the 39th International Symposium on Computer Architecture, ISCA ’12, (Piscataway, NJ, USA), pp. 309–320, IEEE Press, 2012.

[113] Y. Lee and S. Kim, “Dram energy reduction by prefetching-based memory traffic clustering,” in Proceedings of the 21st edition of the great lakes sym- posium on Great lakes symposium on VLSI, GLSVLSI ’11, (New York, NY, USA), pp. 103–108, ACM, 2011.

[114] F. Catthoor, E. d. Greef, and S. Suytack, Custom Memory Management Methodology: Exploration of Memory Organisation for Embedded Multimedia System Design. Norwell, MA, USA: Kluwer Academic Publishers, 1998.

[115] M. Kandemir, U. Sezer, and V. Delaluz, “Improving memory energy using access pattern classification,” in Computer Aided Design, 2001. ICCAD 2001. IEEE/ACM International Conference on, pp. 201 –206, 2001.

[116] C.-G. Lyuh and T. Kim, “Memory access scheduling and binding considering energy minimization in multi-bank memory systems,” in Proceedings of the 41st annual Design Automation Conference, DAC ’04, (New York, NY, USA), pp. 81–86, ACM, 2004.

[117] G. Chen, M. T. Kandemir, H. Saputra, and M. J. Irwin, “Exploiting bank locality in multi-bank memories.,” in CASES (J. H. Moreno, P. K. Murthy, T. M. Conte, and P. Faraboschi, eds.), pp. 287–297, ACM, 2003.

[118] G. Chen, F. Li, and M. Kandemir, “Compiler-directed channel allocation for saving power in on-chip networks,” SIGPLAN Not., vol. 41, pp. 194–205, January 2006.

[119] A. R. Lebeck, X. Fan, H. Zeng, and C. Ellis, “Power aware page allocation,” SIGPLAN Not., vol. 35, pp. 105–116, November 2000.

[120] H. Huang, K. G. Shin, C. Lefurgy, and T. Keller, “Improving energy efficiency by making dram less randomly accessed,” in Proceedings of the 2005 inter- national symposium on Low power electronics and design, ISLPED ’05, (New York, NY, USA), pp. 393–398, ACM, 2005.

[121] V. Delaluz, M. Kandemir, N. Vijaykrishnan, A. Sivasubramaniam, and M. J. Irwin, “Dram energy management using software and hardware directed power mode control,” in Proceedings of the 7th International Symposium on High- Performance Computer Architecture, HPCA ’01, (Washington, DC, USA), pp. 159–, IEEE Computer Society, 2001.

[122] X. Fan, C. Ellis, and A. Lebeck, “Memory controller policies for dram power management,” in Proceedings of the 2001 international symposium on Low power electronics and design, ISLPED ’01, (New York, NY, USA), pp. 129–134, ACM, 2001.

[123] S. Liu, S. Ogrenci Memik, Y. Zhang, and G. Memik, “An approach for adaptive dram temperature and power management,” in Proceedings of the 22nd annual international conference on Supercomputing, ICS ’08, (New York, NY, USA), pp. 63–72, ACM, 2008.

[124] V. D. L. Luz, M. Kandemir, and I. Kolcu, “Automatic data migration for reducing energy consumption in multi-bank memory systems,” in Proceedings of the 39th annual Design Automation Conference, DAC ’02, (New York, NY, USA), pp. 213–218, ACM, 2002.

[125] V. Delaluz, M. Kandemir, N. Vijaykrishnan, A. Sivasubramaniam, and M. J. Irwin, “Hardware and software techniques for controlling dram power modes,” IEEE TRANSACTIONS ON COMPUTERS, vol. 50, pp. 1154–1173, 2001.

[126] V. Delaluz, M. T. Kandemir, N. Vijaykrishnan, and M. J. Irwin, “Energy- oriented compiler optimizations for partitioned memory architectures.,” in CASES, pp. 138–147, 2000.

[127] M. T. Kandemir, I. Kolcu, and I. Kadayif, “Influence of loop optimizations on energy consumption of multi-bank memory systems.,” in CC (R. N. Horspool, ed.), vol. 2304 of Lecture Notes in Computer Science, pp. 276–292, Springer, 2002.

[128] A. Fraboulet, K. Kodary, and A. Mignotte, “Loop fusion for memory space optimization,” in Proceedings of the 14th international symposium on Systems synthesis, ISSS ’01, (New York, NY, USA), pp. 95–100, ACM, 2001.

[129] M. Wolfe, High performance compilers for parallel computing. Addison-Wesley, 1996.

[130] Q. Huang, J. Xue, and X. Vera, “Code tiling for improving the cache perfor- mance of pde solvers.,” in ICPP, pp. 615–, IEEE Computer Society, 2003.

[131] H. Koc, O. Ozturk, M. Kandemir, S. H. K. Narayanan, and E. Ercanli, “Min- imizing energy consumption of banked memories using data recomputation,” in Proceedings of the 2006 international symposium on Low power electronics and design, ISLPED ’06, (New York, NY, USA), pp. 358–362, ACM, 2006.

[132] M. Tolentino, J. Turner, and K. Cameron, “An implementation of page allo- cation shaping for energy efficiency,” in Proceedings of the 3rd workshop on High-performance, Power-aware Computing (HPPAC), 2007.

[133] M. Bi, R. Duan, and C. Gniady, “Delay-hiding energy management mechanisms for dram,” in HPCA, pp. 1–10, 2010.

[134] S. T. Jones, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau, “Geiger: moni- toring the buffer cache in a virtual machine environment,” SIGARCH Comput. Archit. News, vol. 34, pp. 14–24, Oct. 2006.

[135] V. Delaluz, A. Sivasubramaniam, M. Kandemir, N. Vijaykrishnan, and M. J. Irwin, “Scheduler-based dram energy management,” in IN PROCEEDINGS OF THE 39TH CONFERENCE ON DESIGN AUTOMATION, pp. 697–702, ACM Press, 2002.

[136] H. Huang, P. Pillai, and K. G. Shin, “Design and implementation of power- aware virtual memory,” in Proceedings of the annual conference on USENIX Annual Technical Conference, ATEC ’03, (Berkeley, CA, USA), pp. 5–5, USENIX Association, 2003.

[137] J.-H. Min, H. Cha, and V. P. Srini, “Dynamic power management of dram using accessed physical addresses,” Microprocess. Microsyst., vol. 31, pp. 15– 24, Feb. 2007.

[138] V. Moshnyaga, H. Vo, G. Reinman, and M. Potkonjak, “Reducing energy of dram/flash memory system by os-controlled data refresh,” in Circuits and Systems, 2007. ISCAS 2007. IEEE International Symposium on, pp. 2108 – 2111, may 2007.

[139] L. A. D. Bathen, M. Gottscho, N. Dutt, A. Nicolau, and P. Gupta, “Vip- zone: Os-level memory variability-driven physical address zoning for energy savings,” in Proceedings of the eighth IEEE/ACM/IFIP international confer- ence on Hardware/software codesign and system synthesis, CODES+ISSS ’12, (New York, NY, USA), pp. 33–42, ACM, 2012.

[140] Y. Luo, J. Yu, J. Yang, and L. Bhuyan, “Low power network processor design using clock gating,” in Proceedings of the 42nd annual Design Automation Conference, DAC ’05, (New York, NY, USA), pp. 712–715, ACM, 2005.

[141] S. Kim, S. Kim, and Y. Lee, “Dram power-aware rank scheduling,” in Pro- ceedings of the 2012 ACM/IEEE international symposium on Low power elec- tronics and design, ISLPED ’12, (New York, NY, USA), pp. 397–402, ACM, 2012.

[142] D. Wu, B. He, X. Tang, J. Xu, and M. Guo, “Ramzzz: rank-aware dram power management with dynamic migrations and demotions,” in Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’12, (Los Alamitos, CA, USA), pp. 32:1–32:11, IEEE Computer Society Press, 2012.

[143] H. Huang and K. G. Shin, “Co-operative software-hardware power management for main memory,” in Proceedings of PACS, 2004.

[144] J. Trajkovic, A. V. Veidenbaum, and A. Kejariwal, “Improving sdram ac- cess energy efficiency for low-power embedded systems,” ACM Trans. Embed. Comput. Syst., vol. 7, pp. 24:1–24:21, May 2008.

[145] J. Liu, B. Jaiyen, R. Veras, and O. Mutlu, “Raidr: Retention-aware intelligent dram refresh,” in Proceedings of the 39th Annual International Symposium on Computer Architecture, ISCA ’12, (Washington, DC, USA), pp. 1–12, IEEE Computer Society, 2012.

[146] M. Ghosh and H.-H. S. Lee, “Smart refresh: An enhanced memory controller design for reducing energy in conventional and 3d die-stacked drams,” in Pro- ceedings of the 40th Annual IEEE/ACM International Symposium on Microar- chitecture, MICRO 40, (Washington, DC, USA), pp. 134–145, IEEE Computer Society, 2007.

[147] I. Kadayif, M. Kandemir, N. Vijaykrishnan, M. J. Irwin, and A. Sivasubra- maniam, “Eac: A compiler framework for high-level energy estimation and optimization,” in Proceedings of the 2002 Design, Automation and Test in Europe Conference and, pp. 436–442, 2002.

[148] G. Thomas, K. Chandrasekar, B. Akesson, B. Juurlink, and K. Goossens, “A predictor-based power-saving policy for dram memories,” in Proc. 15th Euromicro Conference on Digital System Design, (Izmir, Turkey), September 2012.

[149] C. Rowen and D. Maydan, “Automated processor generation for system-on-chip,” tech. rep., June 2000.

[150] Tensilica, “XPRES Compiler.” Available at: http://www.tensilica.com/products/devtools/hw dev/xpres/, 2008.

[151] C. Lee, M. Potkonjak, and W. H. Mangione-Smith, “Mediabench: a tool for evaluating and synthesizing multimedia and communications systems,” in MICRO 30: Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, (Washington, DC, USA), pp. 330–335, IEEE Computer Society, 1997.

[152] A. Gordon-Ross, F. Vahid, and N. D. Dutt, “Fast configurable-cache tuning with a unified second-level cache,” IEEE Trans. Very Large Scale Integr. Syst., vol. 17, pp. 80–91, Jan. 2009.

[153] W. Zang and A. Gordon-Ross, “T-spacs: a two-level single-pass cache simulation methodology,” in Proceedings of the 16th Asia and South Pacific Design Automation Conference, ASPDAC ’11, (Piscataway, NJ, USA), pp. 419–424, IEEE Press, 2011.

[154] A. G. Silva-Filho and F. R. Cordeiro, “A combined optimization method for tuning two-level memory hierarchy considering energy consumption,” EURASIP J. Embedded Syst., vol. 2011, pp. 2:1–2:12, Jan. 2011.

[155] C. Zhang, F. Vahid, and W. Najjar, “A highly configurable cache architecture for embedded systems,” SIGARCH Comput. Archit. News, vol. 31, pp. 136– 146, May 2003.

[156] S. Min, J. Peddersen, and S. Parameswaran, “Realising cycle accurate pro- cessor memory simulation via interface abstraction,” in VLSI Design (VLSI Design), 2011 24th International Conference on, pp. 141 –146, jan. 2011.

[157] L. Benini, A. Macii, and M. Poncino, “Energy-aware design of embedded mem- ories: A survey of technologies, architectures, and optimization techniques,” ACM Trans. Embed. Comput. Syst., vol. 2, pp. 5–32, February 2003.

[158] Y. Li and J. Henkel, “A framework for estimation and minimizing energy dissipation of embedded hw/sw systems,” in Proceedings of the 35th annual Design Automation Conference, DAC ’98, (New York, NY, USA), pp. 188–193, ACM, 1998.

[159] W.-T. Shiue, S. Udayanarayanan, and C. Chakrabarti, “Data memory design and exploration for low-power embedded systems,” ACM Trans. Des. Autom. Electron. Syst., vol. 6, pp. 553–568, Oct. 2001.

[160] H. Javaid, A. Janapsatya, M. Haque, and S. Parameswaran, “Rapid runtime estimation methods for pipelined mpsocs,” in Design, Automation Test in Europe Conference Exhibition (DATE), 2010, pp. 363 –368, march 2010.

[161] T. Sherwood, E. Perelman, G. Hamerly, and B. Calder, “Automatically char- acterizing large scale program behavior,” SIGARCH Comput. Archit. News, vol. 30, pp. 45–57, Oct. 2002.

[162] J. J. Yi, D. J. Lilja, and D. M. Hawkins, “A statistically rigorous approach for improving simulation methodology,” in Proceedings of the 9th International Symposium on High-Performance Computer Architecture, HPCA ’03, (Wash- ington, DC, USA), pp. 281–, IEEE Computer Society, 2003.

[163] B. C. Lee and D. M. Brooks, “Accurate and efficient regression modeling for microarchitectural performance and power prediction,” SIGARCH Comput. Archit. News, vol. 34, pp. 185–194, Oct. 2006.

[164] J. H. Anderson, F. N. Najm, and T. Tuan, “Active leakage power optimization for fpgas,” in Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays, FPGA ’04, (New York, NY, USA), pp. 33–41, ACM, 2004.

[165] “Microblaze soft processor core.” Available at: http://www.xilinx.com/tools/microblaze.htm.

[166] K. Swaminathan, E. Kultursay, V. Saripalli, V. Narayanan, and M. Kandemir, “Design space exploration of workload-specific last-level caches,” in Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design, ISLPED ’12, (New York, NY, USA), pp. 243–248, ACM, 2012.

[167] M. Haque, J. Peddersen, A. Janapsatya, and S. Parameswaran, “Dew: A fast level 1 cache simulation approach for embedded processors with fifo replacement policy,” in Design, Automation Test in Europe Conference Exhibition (DATE), 2010, pp. 496–501, March 2010.

[168] K. Chandrasekar, B. Akesson, and K. Goossens, “Improved power modeling of ddr sdrams,” in DSD, pp. 99–108, 2011.

[169] T. J. Santner, B. J. Williams, and W. I. Notz, The Design and Analysis of Computer Experiments. Springer-Verlag, 2003.

[170] G. Mariani, A. Brankovic, G. Palermo, J. Jovic, V. Zaccaria, and C. Silvano, “A correlation-based design space exploration methodology for multi-processor systems-on-chip,” in Proceedings of the 47th Design Automation Conference, DAC ’10, (New York, NY, USA), pp. 120–125, ACM, 2010.

[171] “Kriging toolbox for matlab.” http://www2.imm.dtu.dk/~hbni/dace/.

[172] D. R. Jones, M. Schonlau, and W. J. Welch, “Efficient global optimization of expensive black-box functions,” J. of Global Optimization, vol. 13, pp. 455–492, Dec. 1998.

[173] J. L. Hennessy and D. A. Patterson, Computer Architecture, Fourth Edition: A Quantitative Approach. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2006.

[174] A. Jaleel, “Memory characterization of workloads using instrumentation-driven simulation.” http://www.jaleels.org/ajaleel/workload/SPECanalysis.pdf.

[175] S. Sair and M. Charney, “Memory Behavior of the SPEC2000 Benchmark Suite,” tech. rep., IBM T.J. Watson Research Center, Oct 2000.

[176] R. Baysal, B. Nelson, and J. Staum, “Response surface methodology for simulating hedging and trading strategies,” in Simulation Conference, 2008. WSC 2008. Winter, pp. 629–637, Dec. 2008.

[177] J. Shao and B. T. Davis, “The bit-reversal sdram address mapping,” in SCOPES (K. M. Kavi and R. Cytron, eds.), ACM International Conference Proceeding Series, pp. 62–71, 2005.