Dynamic Voltage and Frequency Scaling for 3D Graphics Applications on the State-Of-The-Art Mobile GPUs
A Dissertation Presented
by
Navid Farazmand
to
The Department of Electrical and Computer Engineering
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Computer Engineering

Northeastern University
Boston, Massachusetts
February 2018

To my beautiful wife, Sanam, and our little angel, Eliana Nour...

Contents

List of Figures
List of Tables
List of Acronyms
Acknowledgments
Abstract of the Dissertation
1 Introduction
  1.1 Low-power design techniques
    1.1.1 Dynamic voltage and frequency scaling
  1.2 Contributions of this thesis
  1.3 Organization of this thesis
2 Background and Related Work
  2.1 Power consumption in CMOS circuits
    2.1.1 Dynamic power
    2.1.2 Static power
  2.2 GPU programming
    2.2.1 3D graphics programming model
  2.3 Dynamic voltage and frequency scaling
    2.3.1 Desktop CPU
    2.3.2 Desktop GPUs
    2.3.3 Mobile SoC
3 Fine-Grained DVFS Analysis Framework
  3.1 Performance measurements
    3.1.1 Pipeline idle insertion
  3.2 Power measurement setup
  3.3 Performance-power data alignment
    3.3.1 Idle energy calculation
  3.4 Putting it all together
    3.4.1 API log capture and playback
4 Workload Analysis and Design Space Exploration
  4.1 Graphics workload performance and energy consumption analysis
  4.2 Design space exploration
    4.2.1 The ideal DVFS
    4.2.2 The effect of HW and SW configuration parameters on the DVFS
  4.3 Proposed OPP selection algorithms
  4.4 Conclusions
5 QoS-Aware DVFS
  5.1 Algorithm core logic
  5.2 Proactively avoiding deadline misses
  5.3 Single step, multilevel frequency change
  5.4 QoS awareness
  5.5 Evaluation
    5.5.1 Implementation
    5.5.2 Applications
    5.5.3 Results
  5.6 Summary
6 Energy-Aware DVFS
  6.1 Performance model
  6.2 Energy model
  6.3 Experimental setup and evaluation results
    6.3.1 Correlation analysis
    6.3.2 Models
    6.3.3 Evaluation results
7 Conclusion and future work
  7.1 Conclusion
  7.2 Future work
Bibliography

List of Figures

2.1 Energy consumption in CMOS circuits during 0 → 1 and 1 → 0 transition on the output node [1]
2.2 a) GPU architecture, b) CPU-GPU communication diagram, and c) the command-buffer (ring-buffer)
2.3 OpenGL ES 2.0 graphics programming pipeline [2]
2.4 eglSwapBuffers is used to identify frame boundaries: (A) the same pace between rendering and display (B) rendering faster than display (C) rendering slower than display: a jank (frame miss) occurs
3.1 Instrumenting command stream by inserting performance counter profiling instructions into the command stream
3.2 The steps involved in the UMD to instrument command stream for performance profiling
3.3 Power measurement lab setup
3.4 Assertion of the GPIO pin, acquired with other voltage rails
3.5 Idle power calculation based on the notion of slack time
3.6 Total frame time (a) and graphics rail energy consumption (b) measured iteratively for all DDR frequencies
4.1 Frame-to-frame workload variation
4.2 Relationship between OPP voltage and energy consumption
4.3 Total energy consumption for graphics and DDR power rails for EgyptHD, across DDR frequencies
4.4 Total energy consumption for graphics and DDR power rails for EgyptHD across GPU frequencies
4.5 Energy consumption at the battery rail across GPU frequencies
4.6 Aggregated energy consumption across all frames for each OPP
4.7 Normalized change in the energy consumption with the Graphics Processing Unit (GPU) and/or Double Data Rate memory (DDR) frequency transitions
4.8 Performance-frequency relationship
4.9 Selecting optimal OPP for frame 200, EgyptHD in ideal DVFS algorithm
4.10 Ideal DVFS output: OPP transitions and performance/energy profile
4.11 Ideal Dynamic Voltage and Frequency Scaling (DVFS): performance/energy profile across all configurations
4.12 Summary of the effect of hardware and software configuration parameters on the energy/performance
4.13 Optimizing for minimum energy vs. minimum performance
4.14 Design space exploration: OPP configurations and OPP residency
4.15 Effect of OPP configurations on the performance and energy consumption
5.1 (a) Utilization-based algorithm: utilization calculation @DVFS execution time (b) QoS-aware algorithm: separation between statistics tracking (frame rendering time) and DVFS execution
5.2 Multicontext multi-FPS rendering
5.3 (a) Normalized and (b) percent of frame deadline violations
5.4 Normalized (a) energy and (b) EV²P
6.1 Components of (a) existing utilization-based DVFS algorithms, and (b) the proposed DVFS solution
6.2 (a, b) Relationship between software pipeline stage operations and performance/energy consumption. (c) Consolidated hardware pipeline of a mobile GPU with unified shaders centered around major processing blocks performing computation and data transfer.
6.3 Correlation coefficients (a) and scatter plots between energy (b, c) and frame draw time (d), and workload event performance counters. Left column shows data only from one representative OPP (GPU Freq. = 401.8 MHz, DDR Freq. = 1296 MHz).
6.4 Correlation coefficients (a) and scatter plots between energy (b, c), and frame draw time (d, e), and GPU/DDR voltage and frequency. Left column shows data from only a single randomly selected frame of a randomly selected test.
6.5 Performance model deadline-guarantee prediction accuracy
6.6 Energy model-based DVFS accuracy
6.7 Scatter plot for all combinations of the Perf. models, threshold values, and energy models: (a) Energy-Draw Time, (b) Energy-Deadline Guarantee
6.8 DVFS frequency prediction accuracy: Percentage of frames for which (a) GPU frequency, (b) DDR frequency, (c) Both GPU & DDR frequencies are correctly predicted
6.9 Mean Absolute Percentage Error (MAPE) for AP SoppSpr and EnSoppSpr energy models, and DT Sopp performance model (mode,outcome variable) across OPPs (GPUFreqMHz DDRFreqMHz categories on the x-axis)
6.10 10-fold cross-validation scatter plot; energy models: (a) GFXDDRSepPerOPP, (b) AvgPowerSepPerOPP. In both charts, the performance model is MeetsLogitPerOPP. A data point from the training/test data split (Figure 6.7) for the same model configuration, for each test (and the average across tests), is highlighted with the solid fill color.

List of Tables

1.1 Main reasons for low-power design across different consumer electronic device categories
1.2 Representative examples of various low-power design techniques
3.1 Command Stream instrumentation. Inserting profiling instructions at the eof (End of Frame) and sof (Start of Frame).
4.1 Summary of the hardware and software configuration parameters affecting the quality of a DVFS algorithm
4.2 An example of OPP energy consumption leading to suboptimal decisions
5.1 Applications used for evaluation
6.1 Design space of our energy models
6.2 Applications used for evaluation
6.3 List of the performance counters used to profile workload parameters
6.4 Predictors and outcome variable for the energy and performance models we have explored. Some models use transformed predictors and/or outcome variable that require preprocessing for both offline training and runtime prediction.
6.5 Breakdown of the top configurations in Figure 6.7
6.6 Goodness of fit parameters for our energy and performance models

Listings

3.1 OpenGL ES API instrumentation for API log capture
4.1 OPP selection algorithm, criteria: Energy efficiency (simple)
4.2 OPP selection algorithm, criteria: Energy efficiency
4.3 OPP selection algorithm, criteria: number of OPPs (simple)
4.4 OPP selection algorithm, criteria: number of OPPs