Pileup job with limited memory and zswap
IT-DI-WLCG UP
3 May 2018, Atlas workflow meeting

The test
• Running pileup with ATLAS v20.7.8.12
  – CentOS 7 (kernel 3.10.0-693.21.1.el7)
  – 2 x E5-2630v3 (32 logical cores with SMT on)
  – 64GiB RAM, 32GiB swap backed on 2 x Intel DC 3610 800GB SSD
  – MP=8, 4 x same job (but separate copies of data) to fill the machine
  – Memory is limited at start by a process that allocates, locks and writes to a configurable amount of memory
  – Kernel swappiness is 60: this controls the priority at which anonymous pages (e.g. application-allocated memory) vs file pages (already on disk, e.g. page cache) are reclaimed

ZSWAP
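As an illustration of the memory restriction used in the test setup (a process that allocates, locks and writes to a configurable amount of memory), a minimal Linux-only sketch could look like the following. The function name `reserve_memory` and the use of Python with `ctypes` are assumptions made for illustration; the slides do not say how the actual limiter was implemented.

```python
import ctypes
import ctypes.util
import mmap

PAGE_SIZE = mmap.PAGESIZE


def reserve_memory(n_bytes):
    """Allocate, try to lock, and write to n_bytes of anonymous memory.

    Hypothetical helper: returns (buffer, locked). Keep the buffer
    referenced for as long as the memory should stay unavailable.
    """
    # Allocate: private anonymous mapping, not backed by any file.
    buf = mmap.mmap(-1, n_bytes, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

    # Lock: ask the kernel to keep these pages resident (never swapped out).
    # mlock can fail if RLIMIT_MEMLOCK is too low or privileges are missing;
    # in that case we report locked=False instead of raising.
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    locked = libc.mlock(addr, n_bytes) == 0

    # Write: touch one byte per page so every page is really backed by RAM.
    for offset in range(0, n_bytes, PAGE_SIZE):
        buf[offset] = 1

    return buf, locked
```

In a real run the reserving process would have to stay alive (for example sleeping) for the whole duration of the jobs, and locking tens of GiB normally requires root or a raised memlock limit.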
• Is a feature to enhance swap functionality
  – When swapping out a page, first considers storing it compressed in a dynamically sized RAM cache
  – The zswap RAM cache is managed by a policy called the zpool: the default is zbud
  – If the page does not compress well enough, or there are already too many compressed pages per physical page in the cache, the new page is swapped straight to disk. Zbud allows 2 compressed pages per physical page in the cache
  – When the cache reaches its maximum size, previously compressed pages may be uncompressed and sent to disk to make room
  – Zswap has different functionality available by default depending on kernel version
• On CentOS 7: zswap can only be enabled or disabled at boot time, zbud is the default zpool, and the compression method is lzo
• Default zswap RAM cache size is up to 20% of total RAM; this can be changed at boot time

Results
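The zswap behaviour described in the ZSWAP bullets (compress first, pair at most two compressed pages per physical page, go to disk when a page compresses poorly, write back old entries when the cache is full) can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyZswapPool` is a hypothetical class, `zlib` stands in for the lzo compressor, "compresses well enough" is taken to mean "smaller than one page", and eviction is simply oldest-first. The real kernel code differs in all of these details.

```python
import zlib

PAGE_SIZE = 4096   # bytes per page (typical on x86_64)
MAX_BUDDIES = 2    # zbud: at most two compressed pages per physical page


class ToyZswapPool:
    """Toy model of a zbud-like compressed cache; not the kernel algorithm."""

    def __init__(self, max_pool_pages):
        self.max_pool_pages = max_pool_pages  # cache size limit, in physical pages
        self.slots = []        # one list of compressed sizes per physical page
        self.written_back = 0  # compressed pages evicted from the cache to disk

    def store(self, page):
        """Return "cache" if the page is kept compressed in RAM, else "disk"."""
        if self.max_pool_pages == 0:
            return "disk"
        size = len(zlib.compress(page))  # zlib used here in place of lzo
        if size >= PAGE_SIZE:
            return "disk"  # assumed threshold: no gain from compression
        # Try to pair the page with an existing half-used physical page.
        for slot in self.slots:
            if len(slot) < MAX_BUDDIES and sum(slot) + size <= PAGE_SIZE:
                slot.append(size)
                return "cache"
        # A new physical page is needed; if the cache is at its limit,
        # write the oldest physical page's contents back to disk first.
        if len(self.slots) >= self.max_pool_pages:
            evicted = self.slots.pop(0)
            self.written_back += len(evicted)
        self.slots.append([size])
        return "cache"
```

Feeding the model highly compressible pages (for example all zeros) fills each physical page with two entries, while pages of random bytes are rejected and counted as going straight to disk.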
[Figure: Wallclock against available memory. x-axis: approx. memory per core, (61.6 - reserved)/32, in GiB; y-axis: wallclock time in s (×10³); series: ZSwap OFF, ZSwap 4%, ZSwap 9,10%, ZSwap 20%.]

[Figure: Wallclock against available memory (zoom). Same axes and series as above.]

Conclusion
• Zswap does not seem to help in this situation
• However, the job behaves quite well even at <2GiB per core, down to less than 700MiB
  – With SSD backing
• Possible directions for follow-up (discuss?)
  – Different types of job with restricted memory
  – Swap backing device with lower performance (e.g. approximating a HD)
  – Probably not practical: newer options available for zswap (e.g. z3fold zpool for up to 3 compressed pages per cache page, lz4 compressor, runtime change of zswap parameters)
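To relate the per-core figures quoted in the conclusion to the amount of memory taken away by the limiter, the axis formula from the plots, (61.6 - reserved)/32 in GiB, can be evaluated directly. The sketch below assumes that 61.6 GiB is the usable memory on the 64GiB machine (the value is taken from the axis label, its origin is not stated) and that 32 is the number of logical cores. The function names are hypothetical.

```python
USABLE_GIB = 61.6     # from the plot axis label; assumed to be usable RAM
LOGICAL_CORES = 32    # 2 x E5-2630v3 with SMT on


def memory_per_core(reserved_gib):
    """Approximate memory per core in GiB for a given reserved amount."""
    return (USABLE_GIB - reserved_gib) / LOGICAL_CORES


def reserved_for(per_core_gib):
    """Inverse: GiB that must be reserved to reach a target per-core value."""
    return USABLE_GIB - per_core_gib * LOGICAL_CORES
```

With nothing reserved the formula gives about 1.925 GiB per core, and an illustrative target of 0.7 GiB per core corresponds to reserving about 39.2 GiB.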