Pileup job with limited memory and zswap

IT-DI-WLCG UP, 3 May 2018, ATLAS workflow meeting

The test

• Running pileup with ATLAS v20.7.8.12
  – CentOS 7 (3.10.0-693.21.1.el7)
  – 2 x E5-2630v3 (32 logical cores with SMT on)
  – 64GiB RAM, 32GiB swap backed on 2 x Intel DC 3610 800GB SSD
  – MP=8, 4 x same job (but separate copies of data) to fill the machine
  – Memory is limited at start by a program that allocates, locks and writes to a configurable amount of memory
  – Kernel swappiness is 60: this controls the priority at which anonymous pages (e.g. application-allocated memory) vs file pages (already on disk) are reclaimed

ZSWAP
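The swap-related settings named in this talk (kernel swappiness, and zswap's enabled state, pool size limit, zpool and compressor) can be read back from a running host. The following is a minimal sketch, not the tooling used for the test: the function names are hypothetical, and it assumes the usual Linux locations `/proc/sys/vm/swappiness` and `/sys/module/zswap/parameters`. Which zswap parameter files exist depends on the kernel version, so missing files are reported rather than treated as errors.

```python
from pathlib import Path

# Usual Linux locations (assumption: not every file exists on every kernel).
SWAPPINESS = Path("/proc/sys/vm/swappiness")
ZSWAP_PARAMS = Path("/sys/module/zswap/parameters")


def read_text_or_none(path):
    """Return the stripped contents of a small text file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def swap_settings(swappiness_file=SWAPPINESS, zswap_dir=ZSWAP_PARAMS):
    """Collect swappiness and the zswap module parameters into a dict of strings."""
    settings = {"swappiness": read_text_or_none(swappiness_file)}
    for name in ("enabled", "max_pool_percent", "zpool", "compressor"):
        settings["zswap." + name] = read_text_or_none(Path(zswap_dir) / name)
    return settings


if __name__ == "__main__":
    for key, value in swap_settings().items():
        print(f"{key}: {value if value is not None else 'not available'}")
```

Recording this output alongside each run would make it easier to check later which zswap pool size a given measurement was taken with.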

• Is a feature to enhance swap functionality
  – When swapping out a page, zswap first considers storing it compressed in a dynamically sized RAM cache
  – The zswap RAM cache is managed by a policy called the zpool: default is zbud
  – If the page does not compress well enough, or there are already too many compressed pages per page in the cache, the new page is swapped straight to disk. Zbud allows 2 compressed pages per physical page in the cache
  – When the cache reaches a maximum size, previously compressed pages may be uncompressed and sent to disk to make room
  – Zswap has different functionality available by default depending on kernel version
    • On CentOS 7: zswap can only be enabled or disabled at boot time, zbud is the default zpool, and the compression method is lzo
    • Default zswap RAM cache size is up to 20% of total RAM; it can be changed at boot time

Results

Wallclock against available memory

[Plot: wallclock time / s (×10³) against approx. memory per core, (61.6 - reserved)/32 / GiB. Series: ZSwap OFF, ZSwap 4%, ZSwap 9,10%, ZSwap 20%.]

Wallclock against available memory (zoom)

[Plot: same axes and series, zoomed in on the lower wallclock range.]

Conclusion

• Zswap does not seem to help in this situation
• However, the job behaves quite well even at <2GiB per core, down to less than 700MiB
  – With SSD backing
• Possible directions for follow up (discuss?)
  – Different types of job with restricted memory
  – Swap backing device with lower performance (e.g. approximating a HD)
  – Probably not practical: newer options available for zswap (e.g. z3fold zpool for up to 3 compressed pages per cache page, lz4 compressor, runtime change of zswap parameters)
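The per-core figures quoted here follow the x-axis definition used in the plots, (61.6 - reserved)/32 GiB. The arithmetic can be sketched as below. This is only an illustration: the function names are hypothetical, and it assumes that 61.6 is the usable memory of the 64GiB machine in GiB and that 32 is the logical core count.

```python
USABLE_GIB = 61.6     # figure taken from the plot axis label (assumed: usable RAM in GiB)
LOGICAL_CORES = 32    # 2 x E5-2630v3 with SMT on


def memory_per_core_gib(reserved_gib):
    """Approximate memory per core in GiB: (61.6 - reserved) / 32."""
    return (USABLE_GIB - reserved_gib) / LOGICAL_CORES


def reserved_for_target_gib(per_core_gib):
    """Memory in GiB to reserve so that each core is left with per_core_gib."""
    return USABLE_GIB - per_core_gib * LOGICAL_CORES


if __name__ == "__main__":
    print(f"nothing reserved: {memory_per_core_gib(0):.3f} GiB per core")
    for per_core_mib in (1536, 1024, 700):
        reserved = reserved_for_target_gib(per_core_mib / 1024)
        print(f"{per_core_mib} MiB per core: reserve about {reserved:.1f} GiB")
```

Under these assumptions, nothing reserved corresponds to roughly 1.9GiB per core, and leaving 700MiB per core means reserving close to 40GiB of the machine's memory.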