Thermal-Aware Scheduling in OpenMP
ARTUR PODOBAS
KTH Information and Communication Technology
Master of Science Thesis
Stockholm, Sweden 2010
TRITA-ICT-EX-2010:114

I would like to thank my supervisor Mats Brorsson for the freedom I had in this work, for his time, help, advice and direction when I was stuck, and for his mentorship. I would also like to offer thanks to the SICS multicore groups for useful feedback in the different stages of this work. Simics and GEMS forum users have my thanks for all the questions answered by reading through their posts. Finally, my thanks to the guys at Barcelona Supercomputing Center for their support with Nanos++ and the Mercurium compiler.

Contents

I Abbreviations
II Introduction
III Background Study
  1 Introduction
  2 Effects
    2.1 Introduction
    2.2 Electromigration
    2.3 Stressmigration
    2.4 Time-Dependent Dielectric Breakdown
    2.5 Resistor-Capacitor-Delays / Propagation Delays
    2.6 Ohmic drop
    2.7 Package and Lifetime Cost
  3 The transistor and the inverter
    3.1 Introduction
    3.2 The MOSFET Transistor
    3.3 The Inverter
  4 Power calculation
    4.1 Introduction
    4.2 Static Power
    4.3 Dynamic Power
  5 Power management
    5.1 Introduction
    5.2 Voltage and frequency scaling
    5.3 Clock Gating
    5.4 Power-efficient logic
    5.5 Thread, Task and Process migration
    5.6 Shutting down modules
    5.7 Cool-loops
  6 Scheduling and Thermal modeling
    6.1 Introduction
    6.2 Predictive Dynamic Thermal Management for Multicore Systems
      6.2.1 Introduction
      6.2.2 Results
      6.2.3 Model: Application-Based-Thermal-Modeling
      6.2.4 Model: Core-Based-Thermal-Modeling
      6.2.5 Scheduler: Predictive Dynamic Thermal Management
    6.3 Temperature-aware Scheduler Based on Thermal Behavior grouping in Multicore systems
      6.3.1 Introduction
      6.3.2 Results
      6.3.3 Model: Core-Based-Thermal-Modeling
      6.3.4 Scheduler: Prediction Based Thermal Management
    6.4 Temperature Aware Task Scheduling in MPSoCs
      6.4.1 Introduction
      6.4.2 Results
      6.4.3 Scheduler: Temperature-aware Task Scheduling in MPSoCs
    6.5 Temperature-aware MPSoC scheduling for reducing hot spots and gradients
      6.5.1 Introduction
      6.5.2 Results
      6.5.3 Model: Task-Graph optimum ILP model
      6.5.4 Scheduler: Hybrid Temperature-aware scheduler
    6.6 Proactive temperature balancing for low cost thermal management in MPSoCs
      6.6.1 Introduction
      6.6.2 Results
      6.6.3 Model: ARMA model
      6.6.4 Scheduler: Proactive Temperature Balancing Scheduler
IV Methodology
  7 Introduction
  8 System
    8.1 CPU
      8.1.1 Introduction
      8.1.2 Simulator
      8.1.3 Power consumption
    8.2 Caches
      8.2.1 Introduction
      8.2.2 MSI
      8.2.3 Simulator
      8.2.4 Power Consumptions
    8.3 Network and Routers
      8.3.1 Introduction
      8.3.2 Dimension-Order routing
      8.3.3 Simulator
      8.3.4 Power Consumption
    8.4 On-Chip Temperature Sensor
      8.4.1 Introduction
      8.4.2 Simulator
      8.4.3 Power Consumption
    8.5 Hardware Counters and Monitors
      8.5.1 Introduction
      8.5.2 List of Magic-Instructions
      8.5.3 Using the hardware counters and monitors
  9 Software
    9.1 OpenMP
    9.2 Mercurium and Nanos++
  10 Verification
    10.1 System temperature test
    10.2 System hardware counters test
    10.3 Cache coherence verification
V Benchmarks
  11 Introduction
  12 Fibonacci
  13 nQueens
  14 Multisort
VI Implementation
  15 Introduction
  16 Case study: Task-descriptors in Mercurium and Nanos++
  17 Case study: Cilk-scheduler in Mercurium and Nanos++
  18 Breadth-first Scheduler
  19 Work-first Scheduler
  20 Temperature-aware Scheduler
VII Results
    20.1 Metrics
    20.2 Benchmark results
    20.3 Fibonacci
    20.4 Queens
    20.5 Multisort
VIII Conclusion
IX Future work
    20.6 Simulation platform
    20.7 Power regulation control
    20.8 Scheduler
References

List of Figures

1 Chip-interconnect worn down by EM. Picture from Sanya Semiconductor.
2 Gate-oxide breakdown (shows as a void) shown on a circuit. Picture from Sanya Semiconductor.
3 Showing the propagation delay of the signal between the output of inverter one and the input of inverter two.
4 Ohmic-drop across a system's power-rails.
5 CMOS Inverter.
6 Leakage parasitics shown on the CMOS inverter.
7 Energy and Dynamic power consumption. a) Charging the output node. b) Discharging the output node. c) Short-cut between Vdd and GND.
8 Example DVFS implementation.
9 Power distribution of a CPU (caches not included)
10 Example gated Clock tree.
11 Inductive voltage drop.
12 Flow-graph of the Predictive Dynamic Thermal Management scheduler.
13 Thermal group and pattern.
14 Flow-graph of the Prediction Based Thermal Management scheduler.
15 Tlow and Tthr shown in the sliding window. Picture from [12]
16 Flow-graph of the Temperature-aware task scheduler.
17 Flow-graph of the Hybrid Temperature-aware task scheduler.
18 ARMA model order impact on accuracy.
19 Overview of the system.
20 CPU instruction power weights
21 CPU power trace using two processors.
22 MSI Cache state transitions. Picture taken from NCSU library.
23 Cache latencies and energy consumption
24 Dimension order X-Y routing
25 Example router power trace
26 Floorplan for one tile
27 Tile64 chip layout.
28 Doing a magic call using C.
29 Function for reading a processor's temperature.
30 System temperature test
31 Temperature image of the test-program.
32 Ruby reacting to magic instructions
33 Example tester statistic dump
34 Case-1 coherence scenario
35 Case-2 coherence scenario
36 Case-3 coherence scenario
37 Source code for Fibonacci
38 Fibonacci for n=4
39 Queens task graph for a 4x4 table.
40 Multisort using parallel sort and merge.
41 Task spawn and Taskwait in Fibonacci
42 Translated source-code for spawning a task and syncing using MCC and Nanos++
43 Scheduler class template in Nanos++
44 The function for queuing tasks onto threads inside the Cilk-scheduler.
45 atSubmit function in the Cilk-scheduler
46 atIdle() function in the Cilk-scheduler.