Shutdown policies with power capping for large scale computing systems : the ELCI solution
Issam Raïs (1), Anne Benoit (1), Laurent Lefevre (1), Anne-Cécile Orgerie (2)
(1) Inria, University of Lyon, France.
(2) CNRS, IRISA, Rennes, France.
PRACE Booth, 16 November 2017
Issam Raïs and Laurent Lefevre, Shutdown policies with power capping for large scale computing systems: the ELCI solution
Energy, a global concern for large scale platforms
An energy driven world ...
- MW, MW, MW ...
- Data centers are responsible for 2% of global carbon emissions
- Need better flops per watt!
... needs a "greener" usage of infrastructures
- Free cooling
- Low-power processors
- Recover lost energy (e.g., reuse of water cooling)
- Shutdown techniques
Make computing centers energy efficient
Shutdown techniques are:
- Extensively studied in the literature
- Not used by computing center administrators
Why are we interested in this leverage?
- Non-proportional computing units
- And over-provisioning of infrastructures
- Lead to non-negligible energy consumption when a device is idle
- One of the most promising leverages
ShutDown techniques, disadvantage
Risk
Turning a node on or off is not free in energy or time!
Models inputs
[Figure: power (Watt) of one node over time (seconds, offset +1.4528503e9), showing the On->Off and Off->On sequences and the PIdle and POff levels.]
Seq_i = {(t_0; AvrgP_0), ..., (t_n; AvrgP_n)}
Monitored input for the models
- Seq_i: sequence of node i
- t_k: timestamp at second k of Seq_i
- AvrgP_k: power consumption at second k of Seq_i
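As an illustration of how such a monitored sequence can be used, the sketch below stores Seq_i as a list of (timestamp, average power) pairs and approximates the energy of the sequence. The function name `sequence_energy` and the sample values are hypothetical, and holding each power sample until the next timestamp is an assumption of this sketch, not something stated on the slide.

```python
from typing import List, Tuple

# One monitored sample: (timestamp in seconds, average power in watts).
Sample = Tuple[float, float]


def sequence_energy(seq: List[Sample]) -> float:
    """Approximate the energy (joules) of a sequence as sum(power * duration).

    Assumption: each sample's power holds until the next timestamp.
    The last sample has no successor, so it contributes nothing.
    """
    energy = 0.0
    for (t_cur, p_cur), (t_next, _) in zip(seq, seq[1:]):
        energy += p_cur * (t_next - t_cur)
    return energy


# Hypothetical On->Off sequence for one node, sampled every second.
seq_off = [(0, 100.0), (1, 120.0), (2, 60.0), (3, 10.0)]
print(sequence_energy(seq_off))  # 280.0
```

The same helper could be applied to an Off->On sequence to obtain the energy cost of waking a node up.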
Models usage
Models aim
At a given time T, a model aims at
- Answering whether the device can switch state
- While respecting imposed constraints
Models scope
Could be used at different scales
- On one device
- On a subset of devices
- On all devices
Model hypothesis
At current time Tc, we know
- Node reservations
- State of every node (Working, Idle, Off)
- A window on the state of nodes from Tc to Tc + Ts
Models definition
Basic models
Used by most papers in the literature
- No-OnOff model: the nodes are never shut down
- LB-ZeroCost-OnOff: no cost to shut down or wake up nodes
Sequence-Aware models
Make sure that a sequence:
- Time constrained: fits in time
- Energy constrained: is beneficial in energy
PowerCapping aware models
Aim at maintaining an average power budget
- PC_Min: lower limit for power usage
- PC_Max: upper limit for power usage
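The three kinds of constraints can be sketched as simple predicates. This is a minimal sketch under stated assumptions: the slide does not give the formulas, so the `NodeProfile` fields, the function names, and the exact inequalities below are hypothetical readings of "fits in time", "is beneficial in energy", and "lower limit for power usage".

```python
from dataclasses import dataclass


@dataclass
class NodeProfile:
    """Hypothetical per-node figures derived from monitored sequences."""
    p_idle: float  # watts drawn while idle
    p_off: float   # watts drawn while off
    t_off: float   # seconds taken by the On->Off sequence
    t_on: float    # seconds taken by the Off->On sequence
    e_off: float   # joules consumed by the On->Off sequence
    e_on: float    # joules consumed by the Off->On sequence


def fits_in_time(node: NodeProfile, idle_window: float) -> bool:
    """Time constrained (assumed form): both sequences fit in the idle window."""
    return node.t_off + node.t_on <= idle_window


def saves_energy(node: NodeProfile, idle_window: float) -> bool:
    """Energy constrained (assumed form): the off/on cycle costs less than idling."""
    if not fits_in_time(node, idle_window):
        return False
    time_off = idle_window - node.t_off - node.t_on
    e_cycle = node.e_off + node.p_off * time_off + node.e_on
    e_idle = node.p_idle * idle_window
    return e_cycle < e_idle


def respects_pc_min(platform_power: float, node: NodeProfile, pc_min: float) -> bool:
    """PC_Min (assumed reading): shutting the node down keeps power above the floor."""
    return platform_power - (node.p_idle - node.p_off) >= pc_min


# Hypothetical node: slow and costly to boot, cheap to stop.
node = NodeProfile(p_idle=100.0, p_off=10.0, t_off=10.0, t_on=150.0,
                   e_off=1000.0, e_on=25000.0)
print(fits_in_time(node, 100.0))             # False: 160 s of switching > 100 s
print(saves_energy(node, 200.0))             # False: 26400 J cycle vs 20000 J idle
print(saves_energy(node, 300.0))             # True: 27400 J cycle vs 30000 J idle
print(respects_pc_min(4050.0, node, 4000.0)) # False: would drop to 3960 W
```

With these hypothetical figures, a short idle window is rejected on time, a medium one on energy, and only the longest one justifies a shutdown, which is the behaviour the sequence-aware models are meant to capture.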
Two detailed traces
Grid’5000
- Large-scale and versatile testbed
- Experiment-driven research in all areas of computer science
- High heterogeneity across 8 different sites
- Fine-grained traces (power in watts recorded every second)
- One-week and one-month traces
Simulation: LB-ZeroCost-OnOff, Seq-Aw-T and Seq-Aw-E models
[Figure: platform power (Watt, 0 to 14000) from 25 Oct to 1 Nov for the NO, LB, SAT and SAE models.]
Grid’5000 trace, 1 week

| Model | Total energy consumed | # cycles | % Saved |
|---|---|---|---|
| No-OnOff | 6,083,698,688 | 0 | 0.0 |
| LB-ZeroCost-OnOff | 3,983,408,384 | 1794 | 34.52 |
| Seq-Aw-T | 4,015,736,064 | 964 | 33.99 |
| Seq-Aw-E | 4,015,201,024 | 844 | 34.00 |
Simulation: Power-Cap model
[Figure: platform power (Watt, 0 to 14000) from 25 Oct to 1 Nov for the NO, PC2000min, PC4000min, PC6000min and SAT models.]

| Model | Total energy consumed | # cycles | % Saved |
|---|---|---|---|
| No-OnOff | 6,083,698,688 | 0 | 0.0 |
| Seq-Aw-T | 4,015,736,064 | 964 | 33.99 |
| Power-Cap 2000 min | 4,401,067,520 | 855 | 27.65 |
| Power-Cap 4000 min | 4,593,668,096 | 761 | 24.49 |
| Power-Cap 6000 min | 5,059,857,408 | 617 | 16.82 |
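The "% Saved" column can be recomputed from the energy totals, taking No-OnOff as the baseline. The sketch below does this for the one-week figures above; the function name `percent_saved` is hypothetical, and the formula (relative reduction against the baseline) is an assumption that happens to reproduce the reported values to within about 0.01 percentage point.

```python
def percent_saved(baseline: float, consumed: float) -> float:
    """Relative energy reduction against the baseline, in percent."""
    return 100.0 * (baseline - consumed) / baseline


# One-week totals copied from the table above (No-OnOff is the baseline).
no_onoff = 6_083_698_688
week_results = {
    "Seq-Aw-T": 4_015_736_064,
    "Power-Cap 2000 min": 4_401_067_520,
    "Power-Cap 4000 min": 4_593_668_096,
    "Power-Cap 6000 min": 5_059_857_408,
}

for model, energy in week_results.items():
    print(f"{model}: {percent_saved(no_onoff, energy):.2f} % saved")
```

Read this way, raising the PC_Min floor from 2000 to 6000 lowers both the number of off/on cycles and the energy saved, since fewer shutdowns are allowed.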
Scalability validated
Grid’5000 trace, 1 week

| Model | Total energy consumed | # cycles | % Saved |
|---|---|---|---|
| No-OnOff | 6,083,698,688 | 0 | 0.0 |
| LB-ZeroCost-OnOff | 3,983,408,384 | 1794 | 34.52 |
| Seq-Aw-T | 4,015,736,064 | 964 | 33.99 |
| Seq-Aw-E | 4,015,201,024 | 844 | 34.00 |
| Power-Cap 2000 min | 4,401,067,520 | 855 | 27.65 |
| Power-Cap 4000 min | 4,593,668,096 | 761 | 24.49 |
| Power-Cap 6000 min | 5,059,857,408 | 617 | 16.82 |

Grid’5000 trace, 1 month

| Model | Total energy consumed | # cycles | % Saved |
|---|---|---|---|
| No-OnOff | 22,866,315,264 | 0 | 0.0 |
| LB-ZeroCost-OnOff | 12,935,132,160 | 5,559 | 43.43 |
| Seq-Aw-T | 13,038,270,464 | 3,819 | 42.9804 |
| Seq-Aw-E | 13,037,558,784 | 3,605 | 42.9835 |
| Power-Cap 4000 min | 17,864,194,048 | 2,376 | 21.87 |
Apply realistic shutdown on platforms!
Come to discuss and take a flyer!
Conclusion
In this paper, models for the OnOff leverage are proposed.
- Express physical constraints easily
- Usage "à la carte"
- Provide a clear answer on whether to change the state of a device, while taking various factors into account
- Many possible usages, one of which was simulated
- Generic models that can be adapted to every device that can be shut down and woken up
Future work [Benoit et al., 2017]
- Deeper analysis of combinations of models
- Studying the behavior of switching nodes to control the impact on the cooling system
- Studying the specific case of external interactions: renewable energy
References
Benoit, A., Lefèvre, L., Orgerie, A.-C., and Rais, I. (2017). Reducing the energy consumption of large scale computing systems through combined shutdown policies with multiple constraints. International Journal of High Performance Computing Applications.
Week replay statistics
[Figure: platform power (Watt, 0 to 14000) from 25 Oct to 1 Nov for the NO model.]

| Day | #Jobs | Average job power cons. (W) | Average job size (s) |
|---|---|---|---|
| Oct. 24 (7PM to 12AM) | 33 | 157.91 | 50,401.24 |
| Oct. 25 (Full day) | 144 | 155.08 | 23,002.74 |
| Oct. 26 (Full day) | 277 | 159.79 | 12,299.06 |
| Oct. 27 (Full day) | 353 | 154.11 | 13,819.43 |
| Oct. 28 (Full day) | 318 | 159.96 | 27,286.17 |
| Oct. 29 (Full day) | 171 | 174.11 | 41,525.71 |
| Oct. 30 (Full day) | 180 | 174.04 | 39,453.67 |
| Oct. 31 (Full day) | 563 | 173.39 | 12,821.24 |
| Nov. 1 (12AM to 8AM) | 48 | 179.25 | 17,179.17 |
SAE - Ts