An Improved Instruction-Level Power and Energy Model for RISC Microprocessors
UNIVERSITY OF SOUTHAMPTON
FACULTY OF PHYSICAL SCIENCES AND ENGINEERING
Electronics and Computer Science

An Improved Instruction-Level Power and Energy Model for RISC Microprocessors

by Wei Wang

Thesis for the degree of Doctor of Philosophy
March 2017

ABSTRACT

Recently, the power and energy consumed by a chip have become a primary design constraint for embedded systems, and both are largely affected by software. Because aims vary with the application domain, the best program is sometimes the most power- or energy-efficient one rather than the fastest. However, there is a gap between software and hardware that makes it hard to predict which code consumes the least power without measurement. Therefore, it is vital to discover which factors affect a program's power and energy consumption.

In this thesis we present an instruction-level model to estimate the power and energy consumed by a program. Firstly, instead of studying the different instructions individually, we cluster instructions into three groups: ALU, load and store. The power is affected by the percentage of each group in the program. Secondly, the power is affected by the instructions per cycle (IPC) of the program, since IPC reflects how fast the processor runs.

This method has three advantages. The first is conciseness: it does not treat the overhead energy as an independent factor, nor does it consider the operand Hamming distance between two consecutive instructions. The second is accuracy: the errors of our method across different benchmarks, run on different processors on development boards, are all less than 10%.
The last and most important advantage of this method is that it applies to different processors, such as an OpenRISC processor, an ARM11, an ARM Cortex-A8, and a dual-core ARM Cortex-A9 processor. We have demonstrated that the previous instruction-level power/energy model cannot be extended to superscalar and multi-core processors.

Contents

Acknowledgements
Abbreviations
Nomenclature

1 Introduction
1.1 Motivation and Objects
1.2 Contribution
1.3 Thesis Organization
1.4 List of Publications

2 Literature Review
2.1 Basic Model
2.1.1 The Base Power/Energy Cost
2.1.2 The effect of two adjacent instructions
2.1.3 The Basic Instruction Level Model
2.1.4 The extension of the basic model
2.2 Nop Model
2.3 Clustering Instructions Model
2.4 Linear Regression Method to Analyze the Power/Energy
2.4.1 Introduction to linear regression
2.4.2 Linear Regression Model
2.4.3 Data Dependent Model
2.4.3.1 The Effect of Data
2.4.3.2 Different Data Dependent Models
2.4.4 Cycle-accurate Model
2.5 Functional-level Power Model
2.6 Architecture-level Estimation
2.7 System-level Estimation
2.8 Some Other Related Research
2.8.1 The Effect of Cache
2.8.2 The Effect of The Different Hazards
2.8.3 The Effect of Memory and The Various Address Models
2.9 Conclusion
2.9.1 Summary of The Previous Work
2.9.2 Trends and Changes

3 OpenRISC
3.1 Target Processor
3.2 Experimental Tools
3.2.1 Synthesis tool: Design Compiler
3.2.2 CMOS power dissipation and power analysis tool: Primetime
3.2.2.1 CMOS power dissipation
3.2.2.2 Power analysis tool: Primetime
3.3 Experimental Methodology
3.4 Power Analysis of Basic Test
3.4.1 Design Of The Tests
3.4.2 Test Results
3.5 Instruction Level Modeling
3.6 Estimation And Analysis
3.6.1 Design of the tests
3.6.2 Test Result
3.7 Comparison
3.8 Conclusions
3.9 Limitation of the work

4 ARM11
4.1 Target Processor
4.2 Experimental Methodology
4.3 Basic Power Consumption of Different Instructions
4.4 The Power Consumption of Different Hamming Distances
4.5 The Overhead Power Cost
4.6 Instruction level power analysis and modeling
4.7 Validation
4.8 Energy model
4.8.1 Comparison with Previous Work
4.8.2 Comparison with the Basic Energy Model
4.8.3 Discussion: Low Energy Software
4.9 Conclusions

5 ARM Cortex-A8
5.1 Target Processor
5.2 Experimental Methodology
5.3 The Power Consumption of Different Instructions When Dual-Issue
5.3.1 The IPC for Each Dual-Issue Test
5.3.2 The Power Consumption of Arithmetic and Logic Instructions
5.3.3 The Power Consumption of Load&Store and MUL
5.4 The Power Consumption of Dual-issue Restrictions
5.4.1 The IPC of the Dual-issue Restriction Tests
5.4.2 The Power Consumption of Dual-issue Restrictions
5.5 The Power Consumption with Different Hamming Distances
5.6 The Overhead Power Cost
5.7 The Power Consumption of Data Cache
5.7.1 Design of the experiment
5.7.2 The Power Consumption And IPC of The Data Cache Experiment
5.8 Instruction Level Power Analysis and Modeling
5.8.1 The Power Analysis of the Combined Tests
5.8.2 Instruction Level Model
5.9 Validation of the Power Model
5.10 Comparison and Discussion
5.11 Discussion: Low Energy Software
5.12 Conclusion

6 ARM Cortex-A9 Dual-core Processor
6.1 Target Processor
6.2 Experimental Methodology
6.3 Instruction Level Power Model Analysis for a Dual-core
6.4 Experimental Design
6.5 Experimental Results
6.5.1 The Test Results of the Best Case
6.5.2 The Test Result of the Worst Case
6.5.3 Summary of the Experimental Results
6.6 Instruction Level Power Modeling
6.7 Validation
6.8 Discussion: Energy Consumption and Performance
6.8.1 Discussion: EPI vs IPC for a Dual-core Processor
6.8.2 Discussion: the Energy of a Single Thread Program vs a Multi Thread Program
6.9 Conclusion

7 How To Apply The Model To New Processors
7.1 The limitation of the method
7.2 How to apply the model to new processors
7.3 Conclusion

8 Conclusion and Future Work
8.1 Conclusion
8.2 Future Work
8.2.1 Apply to more complex systems
8.2.2 Reduce the energy consumption of the system
8.2.3 Static program analysis

A Design Codes and Benchmarks
A.1 The Test Code of Chapter 3
A.1.1 The code of Section 3.3
A.1.2 The code of Section 3.4
A.1.3 The code of Section 3.6
A.1.4 The code of Section 3.7
A.2 The Test Code of Chapter 4
A.2.1 The code of Section 4.3
A.2.2 The code of Section 4.4
A.2.3 The code of Section 4.5
A.2.4 The code of Section 4.6
A.2.5 The code of Section 4.7
A.3 The Test Code of Chapter 5
A.3.1 The code of Section 5.4
A.3.2 The code of Section 5.7
A.3.3 The code of Section 5.8
A.3.4 The code of Section 5.9
A.4 The Test Code of Chapter 6
A.4.1 The code of Section