System-Level Energy Optimisation Methodologies for DRAM Memory of Embedded Systems

by Su Myat Min Shwe

A Thesis Submitted in Accordance with the Requirements for the Degree of Doctor of Philosophy

School of Computer Science and Engineering
The University of New South Wales
Nov 2013

© Copyright by Su Myat Min Shwe 2013
All Rights Reserved

Thesis Publications

• S. M. Min, H. Javaid, A. Ignjatovic and S. Parameswaran. A Case Study on Exploration of Last-level Cache for Energy Reduction in DDR3 DRAM. In 2nd Mediterranean Conference on Embedded Computing, MECO 2013 & ECyPS2013, Budva, Montenegro.
• S. M. Min, H. Javaid and S. Parameswaran. XDRA: Exploration and Optimization of Last-level Cache for Energy Reduction in DDR DRAMs. In Design Automation Conference, DAC'13, USA, June 2013.
• S. M. Min, H. Javaid and S. Parameswaran. RExCache: Rapid Exploration of Unified Last-level Cache. In Asia and South Pacific Design Automation Conference, ASP-DAC'13, Japan, Jan 2013.
• S. M. Min, J. Peddersen, and S. Parameswaran. Realising Cycle Accurate Processor-Memory Simulation via Interface Abstraction. In 24th International Conference on VLSI Design, VLSI'11, India, Jan 2011.

Contributions of this Thesis

• A novel interface abstraction layer between the processor and memory system is proposed to implement a cycle-accurate processor-memory system simulator, so that detailed statistics of the memory system, such as performance and power consumption, can be captured cycle-accurately.
• A novel estimation methodology of the execution time and energy consumption of the memory system is proposed.
• A rapid exploration framework is presented to quickly estimate a suitable last-level cache configuration which enables maximum power savings with negligible performance degradation of the memory system.
  This framework integrates the cycle-accurate processor-memory simulator, the cache simulator and the proposed execution time/energy estimators in order to greatly reduce the simulation time.
• An improved power mode controller that efficiently manages the DRAM power modes for DRAM energy reduction is presented.
• A DRAM energy reduction estimator, derived from a small number of cycle-accurate simulations, is proposed to obtain the energy savings of the DRAM system accurately and rapidly for a specific last-level cache.
• An exploration framework is presented to explore the last-level cache design space for maximum DRAM energy reduction. The framework uses novel analysis techniques to compute the parameters of the proposed DRAM energy reduction estimator without requiring cycle-accurate simulations of all the last-level cache configurations, and thus enables fast exploration of the large design space.

Acknowledgements

I would like to express my deepest gratitude to my supervisor Prof. Sri Parameswaran for his continuous support, patience, motivation, constant encouragement, and immense knowledge. Without his insightful guidance and kind support, this dissertation would not have been possible. I greatly appreciate all the support he has provided to me throughout my candidature.

I would also like to express my deepest appreciation to my dear husband, Win. Without his love, constant motivation, understanding, and endless support, I would not have made it this far. I am greatly indebted to him and I cannot find words to express my gratitude to him.

Special gratitude goes to Dr. Haris Javaid for his kind support, guidance, helpful suggestions, and sharing of his great technical knowledge. I owe him my heartfelt appreciation. I would also like to thank my joint-supervisor, Dr. Aleksander Ignjatovic, for his support and sharing of mathematical knowledge related to my research.
Many thanks to the academic committee members for reviewing my research: Dr. Oliver Diessel, Dr. Bruno Gaeta, Dr. Annie Guo, and Dr. Hui Wu. Their comments and feedback always guided my research in the right direction. My sincere thanks go to Dr. Jorgen Peddersen for his valuable comments and useful advice. I am also truly grateful to all the members of the Embedded Systems group: Dr. Jude Angelo Ambrose, Liang, Tuo, Josef, Haseeb, Babak, Dr. Xin He and Dr. Krutartha Patel for their cooperation, continuous moral support, and for all forms of help throughout the study. I would also like to extend my thanks to members of the Computer Science and Engineering Department, UNSW, for the varied support that I received over my candidature.

Furthermore, I would like to take this opportunity to thank my lovely sister, Dr. Thazin Aung, for being supportive and for her invaluable suggestions for my life. I was lucky to be her sister and I will forever be thankful to her. Last, but not least, I would like to thank my family for their unconditional love and encouragement.

My humble apologies to anyone whose name I might not have mentioned here, but I appreciate your support from the bottom of my heart. To all, whom I have mentioned and whom I have forgotten to mention, I would like to dedicate this work.

Abstract

Managing power/energy consumption in complex SoC (System On Chip) systems and Application Specific Instruction set Processors (ASIPs) is emerging as a major concern in the design of embedded systems. In these systems, especially in battery-operated portable devices, the performance of the system is measured not only by its speed and functionality but also by the lifetime of the battery, which depends inversely on the power/energy consumption of the system. Among the different components of the system, DRAM is one of the larger power consumers.
The increasing demand for long battery life requires power/energy-aware methodologies and a comprehensive design process flow to optimise the DRAM power/energy consumption of such power-hungry devices.

For power/energy estimation purposes, an approach guided by high-level system simulation is necessary because RTL/gate-level performance and power estimation is time-consuming. A two-step simulation approach (memory trace sequences are captured with the processor simulator or a hardware-assisted approach in the first step, and the collected traces drive the memory system simulation in the second step) yields inaccurate results because there is no feedback from one memory request to the next. This thesis presents a design methodology with a seamless interface layer that glues the processor component to the memory component, building a one-step system-level processor-memory simulation framework in which every memory request from the processor component is sent directly to the memory component for on-the-fly memory simulation. Over six mediabench benchmarks, our one-step simulation approach provides greater accuracy than trace-driven memory simulation, for which the fixed memory latency that yields the most accurate power consumption varied by 80% across the benchmarks.

Exploiting the last-level cache is a well-known technique that reduces DRAM memory traffic. The last-level cache is inserted just before the DRAM level in the memory hierarchy in order to improve the performance of the system. Estimating the performance improvement for a last-level cache configuration (cache size, cache line size and associativity) with cycle-accurate simulation is exorbitantly slow. Thus, cycle-accurate simulation of a large last-level cache design space is not a feasible way to obtain highly accurate estimates.
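To make the size of such a design space concrete, the following Python sketch enumerates last-level cache configurations over the three parameters named above (cache size, cache line size and associativity). The parameter ranges, the validity rule and the per-simulation cost are illustrative assumptions, not values taken from the thesis.

```python
from itertools import product

# Hypothetical parameter ranges, chosen only for illustration.
CACHE_SIZES_KB = [64, 128, 256, 512, 1024, 2048]
LINE_SIZES_B = [32, 64, 128]
ASSOCIATIVITIES = [1, 2, 4, 8, 16]


def is_valid(size_kb, line_b, assoc):
    """Accept a configuration only if it divides evenly into at least one set."""
    total_lines, remainder = divmod(size_kb * 1024, line_b)
    return remainder == 0 and total_lines >= assoc and total_lines % assoc == 0


def design_space():
    """Return every valid (size_kb, line_b, assoc) tuple in the assumed ranges."""
    grid = product(CACHE_SIZES_KB, LINE_SIZES_B, ASSOCIATIVITIES)
    return [cfg for cfg in grid if is_valid(*cfg)]


def exhaustive_hours(n_configs, hours_per_simulation):
    """Total cost of simulating every configuration cycle-accurately, one by one."""
    return n_configs * hours_per_simulation


if __name__ == "__main__":
    configs = design_space()
    # hours_per_simulation is a made-up figure used only to show the scaling.
    print(len(configs), exhaustive_hours(len(configs), hours_per_simulation=24.0))
```

The total cost grows with the product of the three parameter ranges, which is why exhaustive cycle-accurate exploration quickly becomes impractical as the ranges widen.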
This thesis introduces a technique to rapidly estimate the performance and energy consumption of the whole system under different last-level cache configurations. The proposed technique combines a single slow cycle-accurate simulation with a large number of fast trace-based simulations covering all the configurations, and thus reduces the total simulation time (in the best case, from 257 days to 21 hours for the h264 Enc application). Our methodology significantly reduces the turnaround time needed to obtain highly accurate execution time numbers together with reasonably accurate energy numbers (average absolute accuracy of 99.74% in execution time and 80.31% in energy consumption for nine multimedia applications).

DRAM's energy consumption is a very important component of the total energy consumption of a system design. Exploiting the last-level cache and DRAM's power modes together creates an opportunity to reduce energy in the DRAM memory system. However, whether energy consumption increases or decreases depends on the application's request pattern and the last-level cache configuration. Selecting a suitable last-level cache configuration (from a large design space) for a target application to obtain the maximum energy savings takes a great amount of time. Thus, we developed a design framework and an energy reduction estimator to quickly find a suitable configuration for maximum DRAM energy reduction. First, we analysed the energy increase/reduction of test configurations chosen with Latin Hypercube Sampling, a well-known design-of-experiments technique. Based on this analysis, we proposed an energy reduction estimator that captures the dependence of the memory system's energy reduction on certain parameters such as memory traffic, power mode switching time, etc.
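Latin Hypercube Sampling, mentioned above, can be sketched in a few lines of Python. This is a minimal, generic version of the technique using only the standard library; it is not the thesis's implementation, and the mapping onto cache parameter levels in `to_config` is a hypothetical helper added for illustration.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in the unit cube, one per stratum in each dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # Draw one point uniformly inside each of n_samples equal-width strata.
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so that the strata are paired randomly across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def to_config(point, sizes_kb, line_sizes_b, associativities):
    """Map a unit-cube point onto discrete cache parameter levels (hypothetical helper)."""
    levels = (sizes_kb, line_sizes_b, associativities)
    return tuple(
        lv[min(int(u * len(lv)), len(lv) - 1)] for u, lv in zip(point, levels)
    )


if __name__ == "__main__":
    # Illustrative parameter levels, not taken from the thesis.
    for p in latin_hypercube(n_samples=8, n_dims=3, seed=1):
        print(to_config(p, [64, 128, 256, 512], [32, 64, 128], [1, 2, 4, 8]))
```

Each dimension is covered evenly because every stratum contributes exactly one sample, so a small number of test configurations still spans the whole range of each parameter. When a parameter has fewer discrete levels than there are samples, several samples necessarily share a level.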
The energy reduction estimator of the DRAM system is modelled by capturing the relationship between the energy reduction and highly correlated DRAM parameters, using the Kriging prediction method. We show that our technique is able to predict the DRAM energy reduction for 330 last-level cache configurations in several days (accurate to within 4.4% on average) for 11 applications from the mediabench and SPEC2000 suites, whereas cycle-accurate simulation took several months.

Contents

Statement of Originality
Copyright Statement
Authenticity Statement
Thesis Publications
Contributions of this Thesis
Acknowledgements
Abstract
Table of Contents