Energy Efficient Branch Prediction

Michael Andrew Hicks

A thesis submitted in partial fulfilment of the requirements of the University of Hertfordshire for the degree of Doctor of Philosophy

December 2007

Provided by University of Hertfordshire Research Archive (via CORE, core.ac.uk)

To my family and friends.

Contents

1 Introduction
  1.1 Thesis Statement
  1.2 Motivation and Energy Efficiency
  1.3 Branch Prediction
  1.4 Contributions
  1.5 Dissertation Structure
2 Energy Efficiency in Modern Processor Design
  2.1 Transistor Level Power Dissipation
    2.1.1 Static Dissipation
    2.1.2 Dynamic Dissipation
    2.1.3 Energy Efficiency Metrics
  2.2 Transistor Level Energy Efficiency Techniques
    2.2.1 Clock Gating and Vdd Gating
    2.2.2 Technology Scaling
    2.2.3 Voltage Scaling
    2.2.4 Logic Optimisation
  2.3 Architecture & Software Level Efficiency Techniques
    2.3.1 Activity Factor Reduction
    2.3.2 Delay Reduction
    2.3.3 Low Power Scheduling
    2.3.4 Frequency Scaling
  2.4 Branch Prediction
    2.4.1 The Branch Problem
    2.4.2 Dynamic and Static Prediction
    2.4.3 Dynamic Predictors
    2.4.4 Power Consumption
  2.5 Summary
3 Related Techniques
  3.1 The Prediction Probe Detector (Hardware)
    3.1.1 Implementation
    3.1.2 Pipeline Gating
  3.2 Software Based Approaches
    3.2.1 Hinting and Hint Instructions
  3.3 Analysis and Summary
4 Initial Investigation and Preliminary Research
  4.1 Research Question Focus
  4.2 Static Methods to Avoid Dynamic Branch Prediction
    4.2.1 Delay Region Scheduling
    4.2.2 Static Prediction and Instruction Hints
    4.2.3 Guarded Execution
  4.3 Hardware Multithreading
  4.4 Initial Experiments
    4.4.1 Removing Dynamic Branch Predictors
    4.4.2 Instruction Stream Research (HTracer)
    4.4.3 I-Cache Experimentation
  4.5 Summary
5 The Combined Approach
  5.1 Local Delay Region Scheduling
  5.2 Profiling
    5.2.1 Assigning a Static Branch Behaviour
    5.2.2 Adaptive Branch Bias Measurement (ABBM)
  5.3 The Combined Algorithm
  5.4 Hardware Implementation
    5.4.1 Instruction Set Modifications
    5.4.2 Hardware Modifications
  5.5 Summary
6 Simulation Tools
  6.1 Introduction
  6.2 Simulator (HWattch)
    6.2.1 Architecture Model
    6.2.2 Architecture Modifications
    6.2.3 Profiling Enhancement
    6.2.4 Instruction Set (PISA)
    6.2.5 Compiler (Custom GCC)
  6.3 Scheduler and Static Prediction Assigner (HACA)
    6.3.1 Combined Algorithm: Practical Implementation
  6.4 EEMBC
    6.4.1 Sub-Suites and Benchmarks
    6.4.2 Bespoke Build System for the Combined Algorithm
  6.5 Summary
7 Simulations and Results
  7.1 Introduction
  7.2 The Baseline Models
    7.2.1 The Branch Predictor
    7.2.2 Scalar Processor
    7.2.3 Multiple Instruction Issue Processor
  7.3 Preamble To Results
    7.3.1 Metrics
    7.3.2 Calculation of Averages and ‘Weighted Averages’
    7.3.3 Important Summary Notes
  7.4 Scalar Processor Results
    7.4.1 Benchmark Breakdown
    7.4.2 Averages
  7.5 Two Instruction Issue Processor Results
    7.5.1 Benchmark Breakdown
    7.5.2 Averages
  7.6 Sixteen Instruction Issue Processor Results
    7.6.1 Benchmark Breakdown
    7.6.2 Averages
  7.7 Overall Analysis
    7.7.1 Results Summary
8 Comparisons and Enhancements
  8.1 Comparison of ABBM with Fixed Bias Level and Compiler Heuristics
    8.1.1 Results and Analysis
  8.2 Reducing Set Associativity in the Branch Target Buffer
    8.2.1 Results and Analysis
  8.3 Summary
9 Conclusion and Discussion
  9.1 Thesis Summary
    9.1.1 Key Novelties and Contributions
  9.2 Generalisation
  9.3 Critique
    9.3.1 Local Delay Region
    9.3.2 Hint Bits
    9.3.3 Timing Issues
    9.3.4 Profiling Duration
    9.3.5 Profiling on a ‘Real’ Architecture
    9.3.6 Dependency on Datasets
  9.4 Related Work Comparison
    9.4.1 Prediction Probe Detector
  9.5 Future Work
    9.5.1 Maximising the Fetch Window of Wide Issue Processors
    9.5.2 Hinting Libraries
    9.5.3 Combining with the Prediction Probe Detector
    9.5.4 Hints and Context Switching
    9.5.5 Profiling and Processor-Wide Power Saving
  9.6 Concluding Remarks
Bibliography
Glossary
Appendix A: Published Papers
  i Towards an Energy Efficient Branch Prediction Scheme
  ii Reducing the Branch Power Cost In Embedded Processors
  iii HTracer: A Dynamic Instruction Stream Research Tool
  iv Enhancing the I-cache to Reduce the Power Consumption
Appendix B: Technical Reports
  i An Introduction to Power Consumption Issues in Processor Design
  ii HTracer V0.5: A User Guide
Appendix C: Additional Background
Appendix D: Raw Data

List of Figures

2.1 An example five stage processor pipeline
2.2 An example of a modern dynamic predictor architecture
3.1 The Prediction Probe Detector
5.1 An example of local delayed branch scheduling
5.2 The basic structure of profiling
5.3 Block model of the profiling and hinting regime
5.4 Hardware modifications required in the instruction fetch stage
6.1 The Wattch simulator in relation to SimpleScalar
6.2 The Wattch simulator pipeline
6.3 A logical representation of the IF stage hint-bits showing ‘1,1’
6.4 A logical representation of the EXE stage hint-bits showing ‘1,0’
6.5 The PISA instruction format
6.6 The location of the two hint-bits within the branch instruction format
7.1 Scalar baseline global power savings (%) compared with ideal (free) prediction
7.2 Scalar baseline average global power savings (%) compared with ideal (free) prediction
7.3 2-way issue baseline global power savings (%) compared with ideal (free) prediction
7.4 2-way issue baseline average power savings (%) compared with ideal (free) prediction
7.5 16-way issue baseline global power savings (%) compared with ideal (free) prediction
7.6 16-way issue baseline average power savings compared with ideal (free) prediction
8.1 Average Change in the dynamic instruction stream after resizing the BTB from four-way to two-way set-associativity
8.2 Additional power saving after resizing the BTB
9.1 Series and parallel i-cache/branch predictor access. (1) and (2) represent the direction and target address predictors, respectively

List of Tables

5.1 Static and dynamic branch occurrence for each PISA branch, and its occurrence across the whole EEMBC benchmark suite
6.1 Static and dynamic branch occurrence for each PISA branch, and its occurrence across the whole EEMBC benchmark suite
6.2 The full EEMBC benchmark suite with descriptions
7.1 Scalar Processor Baseline Configuration
7.2 2-Way Issue Processor Baseline Configuration
7.3 16-Way Issue Processor Baseline Configuration
7.4 Benchmark breakdown results for scalar baseline processor
7.5 Average benchmark results for scalar baseline processor
7.6 Benchmark breakdown results for two-way issue baseline processor
7.7 Average benchmark results for two-way issue baseline processor
7.8 Benchmark breakdown results for sixteen-way issue baseline processor ...
