
Shutdown policies with power capping for large scale computing systems: the ELCI solution

Issam Raïs¹, Anne Benoit¹, Laurent Lefevre¹, Anne-Cécile Orgerie²

¹ Inria, University of Lyon, France.

² CNRS, IRISA, Rennes, France.

PRACE Booth, 16 November 2017

Issam Raïs and Laurent Lefevre, Shutdown policies with power capping for large scale computing systems: the ELCI solution

Energy, a global concern for large scale platforms

An energy-driven world...

- MW, MW, MW...
- Data centers are responsible for 2% of global carbon emissions
- Need better flops per watt!

...needs a "greener" usage of infrastructures

- Free cooling
- Low-power processors
- Reuse of lost energy (e.g., water cooling)
- Shutdown techniques

Energy-efficient computing centers

Shutdown techniques are:

- Extensively studied in the literature
- Not used by computing center administrators

Why are we interested in this leverage?

- Non-proportional computing units
- And over-provisioning of infrastructures
- Lead to non-negligible energy consumption when devices are idle
- One of the most promising leverages

Shutdown techniques: disadvantage

Risk: turning a device on or off is not energy-free!

Model inputs

[Figure: power (Watt) of one node over time (s), showing the Off->On and On->Off transitions and the $P_{Idle}$ and $P_{Off}$ power levels]

$Seq_i = \{(t_0, AvrgP_0), \ldots, (t_n, AvrgP_n)\}$: monitored input for the models

- $Seq_i$: sequence of node $i$
- $t_k$: timestamp of second $k$ of $Seq_i$
- $AvrgP_k$: average power consumption (W) at second $k$ of $Seq_i$
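As an illustration of how a monitored sequence $Seq_i$ can be used, the minimal Python sketch below integrates it into an energy value. It assumes the samples are stored as `(t_k, AvrgP_k)` pairs and that each power value holds until the next timestamp; the names `Sequence` and `energy_joules` are hypothetical and not part of the ELCI tooling.

```python
from typing import List, Tuple

# Hypothetical encoding of Seq_i: (t_k in seconds, AvrgP_k in watts)
Sequence = List[Tuple[int, float]]


def energy_joules(seq: Sequence) -> float:
    """Energy (J) of a monitored sequence Seq_i.

    Assumes AvrgP_k holds during [t_k, t_{k+1}); the last sample is
    counted for one second, matching the one-sample-per-second trace.
    """
    total = 0.0
    for k, (t_k, p_k) in enumerate(seq):
        dt = seq[k + 1][0] - t_k if k + 1 < len(seq) else 1
        total += p_k * dt
    return total


if __name__ == "__main__":
    seq_i = [(0, 100.0), (1, 100.0), (2, 150.0)]
    print(energy_joules(seq_i))
```

With exactly one sample per second, this reduces to summing the $AvrgP_k$ values; the `dt` term only matters if some seconds are missing from the trace.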

Model usage

Model aim: at a given time T, a model aims at

- Answering whether the device can switch state
- While respecting imposed constraints

Model scope: can be used at different scales

- On one device
- On a subset of devices
- On all devices

Model hypotheses

At current time $T_c$, we know:

- Node reservations
- The state of every node (Working, Idle, Off)
- A window on the state of nodes from $T_c$ to $T_c + T_s$
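Given the window on node states from $T_c$ to $T_c + T_s$, a model first needs the idle periods it could exploit. The sketch below assumes a hypothetical encoding of that window as one state string per second for a single node, and returns each maximal idle run as `(start_time, length_s)`:

```python
from typing import List, Optional, Tuple


def idle_periods(states: List[str], t_c: int) -> List[Tuple[int, int]]:
    """Maximal Idle runs in a node's state window.

    states[j] is the (hypothetical) known state of the node at second
    t_c + j, for j in [0, T_s). Returns (start_time, length_s) pairs.
    """
    periods: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for j, s in enumerate(states):
        if s == "Idle" and start is None:
            start = j  # an idle run begins
        elif s != "Idle" and start is not None:
            periods.append((t_c + start, j - start))  # the run ends
            start = None
    if start is not None:  # run still open at T_c + T_s
        periods.append((t_c + start, len(states) - start))
    return periods


if __name__ == "__main__":
    window = ["Working", "Idle", "Idle", "Working", "Idle"]
    print(idle_periods(window, t_c=100))
```

An idle run that reaches the end of the window may in fact continue beyond $T_c + T_s$; treating it as ending there is the conservative choice.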

Model definitions

Basic models: used by most papers in the literature

- No-OnOff model: the nodes are never shut down
- LB-ZeroCost-OnOff: no cost to shut down or wake up nodes

Sequence-aware models: make sure that a sequence:

- Time constrained (Seq-Aw-T): fits in time
- Energy constrained (Seq-Aw-E): is beneficial in energy

Power-capping-aware models: aim at maintaining an average power budget

- PC_Min: lower limit for power usage
- PC_Max: upper limit for power usage
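These models can be sketched as predicates over an idle period of length $T_{idle}$. This is a hypothetical formulation, not the exact one of [Benoit et al., 2017]: it assumes each node is described by its idle and off power and by the duration and energy of its On->Off and Off->On transitions, and it approximates the power-cap checks instantaneously with the steady-state difference $P_{Idle} - P_{Off}$ (transition peaks and averaging over time are ignored).

```python
from dataclasses import dataclass


@dataclass
class NodeParams:
    """Hypothetical per-node parameters, e.g. measured from a trace like Seq_i."""
    p_idle: float    # W, power drawn when idle
    p_off: float     # W, power drawn when switched off
    t_on_off: float  # s, duration of the On->Off transition
    t_off_on: float  # s, duration of the Off->On transition
    e_on_off: float  # J, energy of the On->Off transition
    e_off_on: float  # J, energy of the Off->On transition


def no_onoff(n: NodeParams, idle_s: float) -> bool:
    return False  # nodes are never shut down


def lb_zerocost(n: NodeParams, idle_s: float) -> bool:
    return idle_s > 0  # switching is assumed to cost nothing


def seq_aw_t(n: NodeParams, idle_s: float) -> bool:
    # Time constraint: both transitions must fit in the idle period
    return idle_s >= n.t_on_off + n.t_off_on


def seq_aw_e(n: NodeParams, idle_s: float) -> bool:
    # Energy constraint: switching off must cost less than staying idle
    if not seq_aw_t(n, idle_s):
        return False
    off_s = idle_s - n.t_on_off - n.t_off_on
    e_switch = n.e_on_off + n.e_off_on + n.p_off * off_s
    return e_switch < n.p_idle * idle_s


def respects_pc_min(platform_w: float, n: NodeParams, pc_min_w: float) -> bool:
    # Switching a node off lowers platform power by about p_idle - p_off
    return platform_w - (n.p_idle - n.p_off) >= pc_min_w


def respects_pc_max(platform_w: float, n: NodeParams, pc_max_w: float) -> bool:
    # Waking a node up raises platform power by about p_idle - p_off
    return platform_w + (n.p_idle - n.p_off) <= pc_max_w


if __name__ == "__main__":
    # Illustrative values only, not measurements from the slides
    node = NodeParams(p_idle=100.0, p_off=10.0, t_on_off=10.0,
                      t_off_on=150.0, e_on_off=1_000.0, e_off_on=20_000.0)
    for idle in (100, 200, 300):
        print(idle, seq_aw_t(node, idle), seq_aw_e(node, idle))
```

With these illustrative values, a 100 s idle period is rejected by both sequence-aware checks, a 200 s period fits in time (Seq-Aw-T) but is not beneficial in energy (Seq-Aw-E), and a 300 s period passes both. This is why Seq-Aw-E performs fewer switching cycles than Seq-Aw-T.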

Two detailed traces

Grid'5000

- Large-scale and versatile testbed
- Experiment-driven research in all areas of computer science
- High heterogeneity across 8 different sites
- Fine-grained traces (every Watt consumed, every second)
- One-week and one-month traces

Simulation: LB-ZeroCost-OnOff, Seq-Aw-T and Seq-Aw-E models

[Figure: platform power consumption (Watt) from 25 Oct to 1 Nov for the NO (No-OnOff), LB (LB-ZeroCost-OnOff), SAT (Seq-Aw-T) and SAE (Seq-Aw-E) models]

Grid'5000 trace, 1 week

| Model | Total energy consumed (J) | # cycles | % saved |
| --- | ---: | ---: | ---: |
| No-OnOff | 6,083,698,688 | 0 | 0.0 |
| LB-ZeroCost-OnOff | 3,983,408,384 | 1,794 | 34.52 |
| Seq-Aw-T | 4,015,736,064 | 964 | 33.99 |
| Seq-Aw-E | 4,015,201,024 | 844 | 34.00 |
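The "% saved" column can be recomputed from the energy column as $100 \times (1 - E_{model} / E_{No\text{-}OnOff})$. The short Python check below uses the 1-week values copied from the table; `percent_saved` is a hypothetical helper name.

```python
# Total energy per model, 1-week Grid'5000 trace (values from the table)
week = {
    "No-OnOff": 6_083_698_688,
    "LB-ZeroCost-OnOff": 3_983_408_384,
    "Seq-Aw-T": 4_015_736_064,
    "Seq-Aw-E": 4_015_201_024,
}


def percent_saved(energy: float, baseline: float) -> float:
    """Percentage of energy saved with respect to the baseline model."""
    return 100.0 * (1.0 - energy / baseline)


baseline = week["No-OnOff"]
for model, energy in week.items():
    print(f"{model:18s} {percent_saved(energy, baseline):6.2f} %")
```

The recomputed values agree with the reported percentages to within 0.01.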

Simulation: Power-Cap model

[Figure: platform power consumption (Watt) from 25 Oct to 1 Nov for the NO (No-OnOff), SAT (Seq-Aw-T) and Power-Cap models with PC_Min = 2000, 4000 and 6000 W]

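| Model | Total energy consumed (J) | # cycles | % saved |
| --- | ---: | ---: | ---: |
| No-OnOff | 6,083,698,688 | 0 | 0.0 |
| Seq-Aw-T | 4,015,736,064 | 964 | 33.99 |
| Power-Cap, PC_Min = 2000 W | 4,401,067,520 | 855 | 27.65 |
| Power-Cap, PC_Min = 4000 W | 4,593,668,096 | 761 | 24.49 |
| Power-Cap, PC_Min = 6000 W | 5,059,857,408 | 617 | 16.82 |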

Scalability validated

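Grid'5000 trace, 1 week

| Model | Total energy consumed (J) | # cycles | % saved |
| --- | ---: | ---: | ---: |
| No-OnOff | 6,083,698,688 | 0 | 0.0 |
| LB-ZeroCost-OnOff | 3,983,408,384 | 1,794 | 34.52 |
| Seq-Aw-T | 4,015,736,064 | 964 | 33.99 |
| Seq-Aw-E | 4,015,201,024 | 844 | 34.00 |
| Power-Cap, PC_Min = 2000 W | 4,401,067,520 | 855 | 27.65 |
| Power-Cap, PC_Min = 4000 W | 4,593,668,096 | 761 | 24.49 |
| Power-Cap, PC_Min = 6000 W | 5,059,857,408 | 617 | 16.82 |

Grid'5000 trace, 1 month

| Model | Total energy consumed (J) | # cycles | % saved |
| --- | ---: | ---: | ---: |
| No-OnOff | 22,866,315,264 | 0 | 0.0 |
| LB-ZeroCost-OnOff | 12,935,132,160 | 5,559 | 43.43 |
| Seq-Aw-T | 13,038,270,464 | 3,819 | 42.9804 |
| Seq-Aw-E | 13,037,558,784 | 3,605 | 42.9835 |
| Power-Cap, PC_Min = 4000 W | 17,864,194,048 | 2,376 | 21.87 |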
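One way to read these results: compared with LB-ZeroCost-OnOff, Seq-Aw-E gives up about half a percentage point of savings while performing roughly 53% fewer switching cycles on the week trace and about 35% fewer on the month trace. The Python snippet below recomputes both quantities from the table values; `compare` is a hypothetical helper.

```python
# (total energy, # cycles) per model, copied from the tables
week = {"No-OnOff": (6_083_698_688, 0),
        "LB-ZeroCost-OnOff": (3_983_408_384, 1_794),
        "Seq-Aw-E": (4_015_201_024, 844)}
month = {"No-OnOff": (22_866_315_264, 0),
         "LB-ZeroCost-OnOff": (12_935_132_160, 5_559),
         "Seq-Aw-E": (13_037_558_784, 3_605)}


def compare(trace):
    """Savings gap (percentage points) and cycle reduction (%) of Seq-Aw-E vs LB."""
    base = trace["No-OnOff"][0]
    lb_e, lb_c = trace["LB-ZeroCost-OnOff"]
    sae_e, sae_c = trace["Seq-Aw-E"]
    gap_pp = 100.0 * (sae_e - lb_e) / base        # savings lost vs. the lower bound
    fewer_cycles = 100.0 * (1.0 - sae_c / lb_c)   # relative reduction in cycles
    return gap_pp, fewer_cycles


for name, trace in (("week", week), ("month", month)):
    gap, fewer = compare(trace)
    print(f"{name}: {gap:.2f} pp less saved, {fewer:.1f} % fewer cycles")
```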

Apply realistic shutdown on platforms!

Come and discuss with us, and take a flyer!

Conclusion

In this paper, models for the OnOff leverage are proposed. They:

- Express physical constraints easily
- Allow "à la carte" usage
- Provide a clear answer on whether to change the state of a device, while taking into account various factors
- Offer many possible uses; one was simulated here
- Are generic and can be adapted to any device that can be shut down and woken up

Future work [Benoit et al., 2017]

- Deeper analysis of combinations of models
- Studying the behavior of switching nodes to control the impact on the cooling system
- Studying the specific case of external interactions: renewable energy

References

Benoit, A., Lefèvre, L., Orgerie, A.-C., and Rais, I. (2017). Reducing the energy consumption of large scale computing systems through combined shutdown policies with multiple constraints. International Journal of High Performance Computing Applications.

Week replay statistics

[Figure: platform power consumption (Watt) from 25 Oct to 1 Nov for the NO (No-OnOff) model]

| Day | # jobs | Average power cons. (W) | Average job size (s) |
| --- | ---: | ---: | ---: |
| Oct. 24 (7PM to 12AM) | 33 | 157.91 | 50,401.24 |
| Oct. 25 (full day) | 144 | 155.08 | 23,002.74 |
| Oct. 26 (full day) | 277 | 159.79 | 12,299.06 |
| Oct. 27 (full day) | 353 | 154.11 | 13,819.43 |
| Oct. 28 (full day) | 318 | 159.96 | 27,286.17 |
| Oct. 29 (full day) | 171 | 174.11 | 41,525.71 |
| Oct. 30 (full day) | 180 | 174.04 | 39,453.67 |
| Oct. 31 (full day) | 563 | 173.39 | 12,821.24 |
| Nov. 1 (12AM to 8AM) | 48 | 179.25 | 17,179.17 |
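Simple aggregates of the replayed week can be derived from the job counts alone: 2,087 jobs in total, with Oct. 31 the busiest day and about 287 jobs per full day on average. The Python sketch below recomputes them from the "#Jobs" column only; the shortened day labels are a hypothetical encoding.

```python
# Number of jobs per (partial) day, from the week replay statistics table
jobs = {
    "Oct. 24 (7PM to 12AM)": 33, "Oct. 25": 144, "Oct. 26": 277,
    "Oct. 27": 353, "Oct. 28": 318, "Oct. 29": 171, "Oct. 30": 180,
    "Oct. 31": 563, "Nov. 1 (12AM to 8AM)": 48,
}

total = sum(jobs.values())
busiest = max(jobs, key=jobs.get)
# Partial days are the ones whose label carries a time range
full_days = [n for day, n in jobs.items() if "(" not in day]
mean_per_full_day = sum(full_days) / len(full_days)

print(total, busiest, round(mean_per_full_day, 1))
```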

SAE - Ts
