Shutdown Policies with Power Capping for Large Scale Computing Systems : the ELCI Solution
Issam Raïs (1), Anne Benoit (1), Laurent Lefevre (1), Anne-Cécile Orgerie (2)
(1) Inria, University of Lyon, France. (2) CNRS, IRISA, Rennes, France.
PRACE Booth, 16 November 2017

Issam Raïs and Laurent Lefevre, Shutdown policies with power capping for large scale computing systems : the ELCI solution

Energy, a global concern for large scale platforms

An energy driven world ...
- MW, MW, MW...
- Data centers are responsible for 2% of global carbon emissions
- Need better flops per watt!

... needs a "greener" usage of infrastructures
- Free cooling
- Low-power processors
- Recover lost energy (e.g. reuse of water cooling)
- Shutdown techniques

Make computing centers energy efficient

Shutdown techniques are:
- Extensively studied in the literature
- Not used by computing center administrators

Why are we interested in this leverage?
- Computing units are not power-proportional
- Infrastructures are over-provisioned
- Both lead to non-negligible energy consumption when a device is idle
- One of the most promising leverages

Shutdown techniques, disadvantage

Risk
Turning a node on or off is free in neither energy nor time!
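The risk stated above can be made concrete with a small break-even sketch. It is only an illustration and not the authors' model: the `NodeProfile` fields, the helper names and every numeric value are hypothetical. It assumes that each transition has a fixed duration and a fixed energy cost, and that the node draws a constant power when idle or off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeProfile:
    """Hypothetical per-node figures; none of these values come from the slides."""
    p_idle: float    # W, power drawn while idle
    p_off: float     # W, residual power drawn while off
    t_on_off: float  # s, duration of the On->Off transition
    e_on_off: float  # J, energy spent during On->Off
    t_off_on: float  # s, duration of the Off->On transition
    e_off_on: float  # J, energy spent during Off->On


def shutdown_fits(node: NodeProfile, gap_s: float) -> bool:
    """True if both transitions fit inside an idle gap of gap_s seconds."""
    return gap_s >= node.t_on_off + node.t_off_on


def shutdown_energy(node: NodeProfile, gap_s: float) -> float:
    """Energy (J) used over the gap if the node is switched off and back on."""
    off_time = gap_s - node.t_on_off - node.t_off_on
    return node.e_on_off + node.e_off_on + node.p_off * off_time


def idle_energy(node: NodeProfile, gap_s: float) -> float:
    """Energy (J) used over the gap if the node simply stays idle."""
    return node.p_idle * gap_s


def shutdown_saves_energy(node: NodeProfile, gap_s: float) -> bool:
    """True if switching off fits in the gap and costs less than idling."""
    return shutdown_fits(node, gap_s) and (
        shutdown_energy(node, gap_s) < idle_energy(node, gap_s)
    )


# Made-up profile used only to exercise the functions above.
EXAMPLE_NODE = NodeProfile(
    p_idle=100.0, p_off=10.0,
    t_on_off=10.0, e_on_off=1_000.0,
    t_off_on=150.0, e_off_on=20_000.0,
)

if __name__ == "__main__":
    for gap in (100.0, 200.0, 600.0):
        print(gap, shutdown_fits(EXAMPLE_NODE, gap),
              shutdown_saves_energy(EXAMPLE_NODE, gap))
```

With these made-up numbers, a 200 s gap is long enough to hold both transitions, yet switching off would cost more energy than idling. The shutdown only pays off for gaps longer than roughly 216 s.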

Models inputs

[Figure: power (Watt) of one node over time, showing the On->Off and Off->On transitions and the P_Idle and P_Off levels]

$Seq_i = \{(t_0, AvrgP_0), \ldots, (t_n, AvrgP_n)\}$

Monitored input for the models
- $Seq_i$: sequence of node $i$
- $t_k$: timestamp at second $k$ of $Seq_i$
- $AvrgP_k$: power consumption at second $k$ of $Seq_i$

Models usage

Models aim
At a given time $T$, a model aims at
- Answering whether the device can switch state
- While respecting the imposed constraints

Models scope
Can be used at different scales
- On one device
- On a subset of devices
- On all devices

Model hypothesis
At current time $T_c$, we know
- Node reservations
- State of every node (Working, Idle, Off)
- A window on the state of nodes from $T_c$ to $T_c + T_s$

Models definition

Basic models
Used by most papers in the literature
- No-OnOff model: the nodes are never shut down
- LB-ZeroCost-OnOff: no cost to shut down or wake up nodes

Sequence-Aware models
Make sure that a sequence:
- Time constrained: fits in time
- Energy constrained: is beneficial in energy

PowerCapping aware models
Aim at maintaining an average power budget
- PC_Min: lower limit for power usage
- PC_Max: upper limit for power usage

Two detailed traces

Grid’5000
- Large-scale and versatile testbed
- Experiment-driven research in all areas of computer science
- High heterogeneity across 8 different sites
- Fine-grained trace (power consumption in Watts, recorded every second)
- One-week and one-month traces

Simulation : LB-ZeroCost-OnOff, Seq-Aw-T and Seq-Aw-E models

[Figure: power (Watt) from 25 Oct to 1 Nov; curves: NO, SAT, LB, SAE]

Grid’5000 trace, 1 week
Model                Total energy consumed   # cycles   % Saved
No-OnOff                     6,083,698,688          0      0.0
LB-ZeroCost-OnOff            3,983,408,384      1,794     34.52
Seq-Aw-T                     4,015,736,064        964     33.99
Seq-Aw-E                     4,015,201,024        844     34.00

Simulation : Power-Cap model

[Figure: power (Watt) from 25 Oct to 1 Nov; curves: NO, PC2000min, PC4000min, PC6000min, SAT]

Model                Total energy consumed   # cycles   % Saved
No-OnOff                     6,083,698,688          0      0.0
Seq-Aw-T                     4,015,736,064        964     33.99
Power-Cap2000 min            4,401,067,520        855     27.65
Power-Cap4000 min            4,593,668,096        761     24.49
Power-Cap6000 min            5,059,857,408        617     16.82

Scalability validated

Grid’5000 trace, 1 week
Model                Total energy consumed   # cycles   % Saved
No-OnOff                     6,083,698,688          0      0.0
LB-ZeroCost-OnOff            3,983,408,384      1,794     34.52
Seq-Aw-T                     4,015,736,064        964     33.99
Seq-Aw-E                     4,015,201,024        844     34.00
Power-Cap2000 min            4,401,067,520        855     27.65
Power-Cap4000 min            4,593,668,096        761     24.49
Power-Cap6000 min            5,059,857,408        617     16.82

Grid’5000 trace, 1 month
Model                Total energy consumed   # cycles   % Saved
No-OnOff                    22,866,315,264          0      0.0
LB-ZeroCost-OnOff           12,935,132,160      5,559     43.43
Seq-Aw-T                    13,038,270,464      3,819     42.9804
Seq-Aw-E                    13,037,558,784      3,605     42.9835
Power-Cap4000 min           17,864,194,048      2,376     21.87

Apply realistic shutdown on platforms!
Come to discuss and take a flyer!
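The "% Saved" column of the one-week results can be rechecked with a few lines of arithmetic. The sketch below assumes that the saving is measured against the No-OnOff total. The slides do not state this explicitly, but the reported figures are consistent with it. The totals are copied from the one-week Grid’5000 table; the function and variable names are illustrative only.

```python
# Totals copied from the one-week Grid'5000 results (unit as reported in the slides).
WEEK_TOTALS = {
    "No-OnOff": 6_083_698_688,
    "LB-ZeroCost-OnOff": 3_983_408_384,
    "Seq-Aw-T": 4_015_736_064,
    "Seq-Aw-E": 4_015_201_024,
    "Power-Cap2000 min": 4_401_067_520,
    "Power-Cap4000 min": 4_593_668_096,
    "Power-Cap6000 min": 5_059_857_408,
}


def percent_saved(total: float, baseline: float) -> float:
    """Relative saving, in percent, of `total` with respect to `baseline`."""
    return 100.0 * (1.0 - total / baseline)


def savings_table(totals: dict, baseline_model: str = "No-OnOff") -> dict:
    """Map each model name to its saving relative to the baseline model.

    Assumption: the baseline is the No-OnOff run (nodes never shut down).
    """
    baseline = totals[baseline_model]
    return {model: percent_saved(total, baseline) for model, total in totals.items()}


if __name__ == "__main__":
    for model, saved in savings_table(WEEK_TOTALS).items():
        print(f"{model:<20} {saved:7.3f}")
```

Under that assumption, the recomputed values agree with the reported column to within about 0.01 percentage point.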

Conclusion

In this paper, models for the OnOff leverage are proposed.
- Express physical constraints easily
- Usage "à la carte"
- Provide a clear answer on whether to change the state of a device, while taking various factors into account
- Wide range of possible usages, one of which was simulated
- Generic models that can be adapted to every device that can be shut down and woken up

Future work [Benoit et al., 2017]
- Deeper analysis of combinations of models
- Studying the behavior of switching nodes, to control the impact on the cooling system
- Studying the specific case of external interactions: renewable energy

References

Benoit, A., Lefèvre, L., Orgerie, A.-C., and Raïs, I. (2017). Reducing the energy consumption of large scale computing systems through combined shutdown policies with multiple constraints. International Journal of High Performance Computing Applications.

Week replay statistics

[Figure: power (Watt) from 25 Oct to 1 Nov; curve: NO]

Day                      #Jobs   Average job power cons. (W)   Average job size (s)
Oct. 24 (7PM to 12AM)       33                        157.91              50,401.24
Oct. 25 (Full day)         144                        155.08              23,002.74
Oct. 26 (Full day)         277                        159.79              12,299.06
Oct. 27 (Full day)         353                        154.11              13,819.43
Oct. 28 (Full day)         318                        159.96              27,286.17
Oct. 29 (Full day)         171                        174.11              41,525.71
Oct. 30 (Full day)         180                        174.04              39,453.67
Oct. 31 (Full day)         563                        173.39              12,821.24
Nov. 1 (12AM to 8AM)        48                        179.25              17,179.17

SAE - Ts