KIT PPT Master
Total Page:16
File Type:pdf, Size:1020Kb
Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Andreas Merkel, Jan Stoess, Frank Bellosa System Architecture Group KIT – The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe (TH) Memory Contention – a Problem on Multicores CPU Memory 2 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Memory Contention – a Problem on Multicores CPU Memory 3 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Memory Contention – a Problem on Multicores CPU CPU Memory 4 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Memory Contention – a Problem on Multicores CPU CPU CPU CPU Memory 5 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Memory Contention – a Problem on Multicores CPU CPU CPU CPU Memory CPU CPU CPU CPU 6 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Memory Contention – a Problem on Multicores CPU CPU CPU CPU CPU CPU CPU CPU Memory CPU CPU CPU CPU CPU CPU CPU CPU 7 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Memory Contention Intel Core2 Quad core0 core1 Bottleneck: memory bus stream idle core2 core3 Stall cycles, increased runtime idle idle core0 core1 stream stream 4.5 core2 core3 idle idle 4 e core0 core1 m i 3.5 stream stream t e n c core2 core3 u n 3 stream stream r a 1 instance t d s 2.5 e 2 instances n on 4 cores z i i l } r 2 4 instances a e p m r 1.5 o n 1 0.5 0 stream memory benchmark 8 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Impact of Resource Contention on Energy Efficiency Longer time to halt More static power Increasing importance of leakage 9 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Achieving Energy Efficiency by Scheduling Scheduler decides When Where In which combination At which frequency setting to execute tasks. What ist the most energy-efficient schedule? 10 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Achieving Energy Efficiency via Co-Scheduling Combination of tasks running together determines performance and energy efficiency Memory-bound + memory-bound: low energy efficiency Avoid memory bottleneck by combining memory- bound with compute bound tasks ➔ Co-schedule tasks with different characteristics energy avoid mem + comp efficiency contention 11 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Achieving Energy Efficiency via DVFS DVFS: Dynamic Voltage and Frequency Scaling Adapt processor frequency and voltage to task characteristics Memory-bound tasks: low frequency/voltage Compute-bound tasks: high frequency/voltage Multicore hardware limits options for frequency/voltage selection Often shared frequency/voltage domains ➔ Co-schedule similar tasks to select common best frequency and voltage energy use mem + mem efficiency DVFS 12 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Achieving Energy Efficiency use mem + mem DVFS energy efficiency avoid mem + comp contention 13 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Achieving Energy Efficiency use mem + mem DVFS energy efficiency avoid mem + comp contention 14 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Outline Analysis Resource contention Shared frequency/voltage domains Resource-conscious scheduling for energy efficiency OS task scheduling VM scheduling Frequency selection Evaluation Reduction of resource contention Increase in energy efficiency by 10 to 20% 15 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Analysis of Resource Contention on the Intel Core2 Quad Q6600 Contention for shared resources reduces energy efficiency Shared L2 caches (two cores) Shared memory interconnect (four cores) 16 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors 17 Resource Contention SPEC CPU 2006 CPU Contention SPEC Resource normalized runtime per instance 0.5 1.5 2.5 3.5 4.5 0 1 2 3 4 Multicore Processors Multicore Resource-ConsciousScheduling for Efficiency Energy on hmmer libquantum 4instances caches shared 2instances caches separate 2instances 1instance 18 Resource Contention SPEC CPU 2006 CPU Contention SPEC Resource normalized runtime per instance 0.5 1.5 2.5 3.5 4.5 0 1 2 3 4 Multicore Processors Multicore Resource-ConsciousScheduling for Efficiency Energy on hmmer libquantum 4instances caches shared 2instances caches separate 2instances 1instance 19 Resource Contention SPEC CPU 2006 CPU Contention SPEC Resource normalized runtime per instance 0.5 1.5 2.5 3.5 4.5 0 1 2 3 4 Multicore Processors Multicore Resource-ConsciousScheduling for Efficiency Energy on hmmer libquantum 4instances caches shared 2instances caches separate 2instances 1instance 20 Resource Contention SPEC CPU 2006 CPU Contention SPEC Resource normalized runtime per instance 0.5 1.5 2.5 3.5 4.5 0 1 2 3 4 Multicore Processors Multicore Resource-ConsciousScheduling for Efficiency Energy on hmmer libquantum 4instances caches shared 2instances caches separate 2instances 1instance 21 Resource Contention SPEC CPU 2006 CPU Contention SPEC Resource normalized runtime per instance 0.5 1.5 2.5 3.5 4.5 0 1 2 3 4 Multicore Processors Multicore Resource-ConsciousScheduling for Efficiency Energy on hmmer libquantum 4instances caches shared 2instances caches separate 2instances 1instance 22 0.5 1.5 2.5 3.5 Resource Contention SPEC CPU 2006 CPU Contention SPEC Resource 2 3 0 1 caches 2 instances separate 1 instance gromacs 4 instances caches 2 instances shared Multicore Processors Multicore Resource-ConsciousScheduling for Efficiency Energy on namd compute-bound povray hmmer gamess h264ref sjeng gobmk dealII tonto zeusmp bzip2 astar xalancbmk memory-bound leslie3d bwaves sphinx3 omnetpp mcf soplex GemsFDTD milc libquantum lbm Resource Contention SPEC CPU 2006 Compute-bound benchmarks Little resource contention Memory-bound benchmarks Severe slowdown caused by memory contention Huge increase in memory demands since SPEC 2000 Cache contention of comparatively little importance 23 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors 0.2 0.4 0.6 0.8 1.2 1.4 1.6 1.8 24 0 1 Energy Efficiency under DVFS under Efficiency Energy Reducing paysoff the frequency for memory intensive tasks instances4 of benchmark Comparison of to 1.6GHz gromacs namd Multicore Processors Multicore Resource-ConsciousScheduling for Efficiency Energy on compute-bound povray hmmer gamess h264ref sjeng gobmk dealII tonto 2.4GHz zeusmp bzip2 astar xalancbmk leslie3d memory-bound bwaves sphinx3 omnetpp mcf soplex GemsFDTD milc edp energy time libquantum lbm 0.5 1.5 2.5 3.5 0.2 0.4 0.6 0.8 1.2 1.4 1.6 1.8 0 1 2 3 0 1 gromacs gromacs 4 instances caches 2 instances shared caches 2 instances separate 1 instance namd namd povray povray hmmer hmmer gamess gamess h264ref h264ref sjeng sjeng gobmk gobmk contention dealII dealII DVFS tonto tonto zeusmp zeusmp bzip2 bzip2 astar astar xalancbmk xalancbmk leslie3d leslie3d bwaves bwaves sphinx3 sphinx3 omnetpp omnetpp mcf mcf soplex soplex GemsFDTD GemsFDTD milc milc edp energy time libquantum libquantum lbm lbm 0 1 2 3 4 5 6 7 8 9 26 Contention and Resource DVFS under Efficiency Energy gromacs namd povray Multicore Processors Multicore Resource-ConsciousScheduling for Efficiency Energy on hmmer gamess calculix h264ref sjeng perlbench gobmk cactusADM dealII tonto zeusmp wrf bzip2 astar xalancbmk leslie3d bwaves sphinx3 omnetpp mcf soplex GemsFDTD EDP energy time gcc milc libquantum lbm Energy-Efficient Co-Scheduling use mem + mem DVFS energy efficiency avoid mem + comp contention 27 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Energy-Efficient Co-Scheduling energy efficiency avoid mem + comp contention 28 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Energy-Efficient Co-Scheduling Avoiding resource contention Requires knowledge of task characteristics Requires coordination of task selection across cores Merkel and Bellosa, EuroSys 2008 Task characterization Execution of tasks in a defined order (runqueue sorting) Used for mitigating thermal effects Take advantage of runqueue sorting to provide coordination with low overhead 29 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Sorted Co-Scheduling Group cores in pairs Sort runqueues by critical resource (memory bandwidth) Coordinate processing of runqueues Co-schedule tasks with complementary resource demands core 1 core 0 time 30 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Sorted Co-Scheduling Dealing with unequeal runqueue lengths Example: core 0 executes one task more than core 1 Time needed to process runqueues does not even out → increase length of timeslices on core 1 core 1 core 0 31 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Sorted Co-Scheduling Shift runqueues of additional cores Avoid running most memory intensive tasks together core 1 core 0 core 3 core 2 32 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Resource-Conscious Load Balancing Sorting requires tasks with different characteristics on each core Migrate task if variance among tasks in runqueue is increased core 0 core 1 core 0 core 1 33 Resource-Conscious Scheduling for Energy Efficiency on Multicore Processors Virtual Machine Scheduling Leverage workload diversity of several physical machines Extend