LHCb’s Puppet 3.5 to Puppet 4.9 migration: a three-week-long hackathon

H. Mohamed (1), F. Sborzacchi (2), T. Colombo (1), L. Brarda (1), N. Neufeld (1), R. Schwemmer (1), M. Daoudi (1)
(1) CERN, Geneva, Switzerland; (2) INFN, Rome, Italy
[email protected]

Objectives

• Migrate our entire code base from Puppet 3.5 to Puppet 4.9
• Put high availability on every component of our Puppet infrastructure
• Scale our Puppet infrastructure so that a site-wide run can be initiated in a predictable time
• Scale out our MCollective installation
• Do all of the above in the time frame of a month!

Performance tuning and other considerations

Below are some of the performance tunings we used in our infrastructure:

1. Test for yourself whether the deadline, cfq, or noop I/O scheduler performs better in your environment
2. Enable the G1GC garbage collector with heaps of 4 GB and above
3. Monitor PuppetDB for performance drops and regularly initiate a vacuum [2]
4. When running behind Passenger, ensure the maximum number of instances is spawned from startup (process spawning is expensive) [3]
5. If using Passenger with Apache, use the worker MPM with a large number of workers (due to blocking I/O)
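The first three tunings can be sketched as a small shell fragment. This is a minimal illustration, not our exact setup: the device name (sda), the defaults-file path, and the database name "puppetdb" are assumptions to adjust for your environment.

```shell
#!/bin/sh
# 1. Inspect the active I/O scheduler (the active one is shown in brackets):
sched_file=/sys/block/sda/queue/scheduler
[ -r "$sched_file" ] && cat "$sched_file"
# To switch schedulers (as root): echo deadline > "$sched_file"

# 2. Give puppetserver a 4 GB heap with G1GC. A demo file is written here;
# the real defaults file is typically /etc/sysconfig/puppetserver (RHEL)
# or /etc/default/puppetserver (Debian):
conf=${TMPDIR:-/tmp}/puppetserver.defaults
echo 'JAVA_ARGS="-Xms4g -Xmx4g -XX:+UseG1GC"' > "$conf"
grep UseG1GC "$conf"

# 3. Vacuum PuppetDB's PostgreSQL database regularly, e.g. from cron:
# vacuumdb --analyze puppetdb
```

Equal -Xms/-Xmx values avoid heap resizing pauses, which matters once G1GC is managing a multi-gigabyte heap.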

Introduction

Up until September 2017, LHCb Online was running Puppet 3.5 in a non-redundant master/client architecture. As a result, we had problems with outages, both planned and unplanned, as well as with scalability (we were unable to predict the amount of time needed to complete a site-wide run). What's more, our code base was very large, aging, and starting to lack compatibility with newer modules. Furthermore, Puppet 5.0 was stable and around the corner!

New Infrastructure

[Architecture diagram: clients reach redundant Puppet Servers/Smart Proxies and Foreman instances through a load balancer; the Puppet CA sits behind its own load balancer; Memcached, MCollective, the database backends, and PuppetDB/PostgreSQL are likewise duplicated, and all masters share a common code repository.]

Performance tuning of Puppet Masters - JRuby

1. Puppet uses JRuby instances. The default value is between 1 and 4 (num-cpus - 1) [4]. We see that num-cpus performs better (CFQ scheduler):

   JRuby instances   Time
   45                21.01
   48                19.06
   51                21.04
   54                22.65
   57                22.86
   60                23.04

   Figure 2: Performance split over 3 Puppet Masters, 16 hardware threads each (96 clients, 48 hardware threads)

2. Allow a minimum of 512 MB of memory per JRuby instance with light catalogs / a small number of environments
3. Ensure your system has adequate entropy
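The sizing rules above can be sketched as a shell fragment that derives the JRuby pool size and heap from the host. The config paths are assumptions; max-active-instances is the puppetserver setting that controls the JRuby pool size.

```shell
#!/bin/sh
# Derive JRuby pool size and heap from the host, per the notes above.
cpus=$(getconf _NPROCESSORS_ONLN)
jrubies=$cpus                       # default would be min(num-cpus - 1, 4)
heap_mb=$((jrubies * 512))          # >= 512 MB per JRuby (light catalogs)

# The resulting puppetserver.conf (HOCON) and JVM settings:
echo "jruby-puppet: { max-active-instances: $jrubies }"
echo "JAVA_ARGS=\"-Xms${heap_mb}m -Xmx${heap_mb}m -XX:+UseG1GC\""

# Catalog signing stalls when entropy runs low; check it on Linux with:
{ [ -r /proc/sys/kernel/random/entropy_avail ] && cat /proc/sys/kernel/random/entropy_avail; } || true
```

On our 16-hardware-thread masters this rule gives 16 JRuby instances per master, matching the 48-instance optimum in Figure 2 across three masters.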

Results

1. We now have over 2500 hosts running against our highly available Puppet infrastructure
2. We can tolerate failure in every component
3. On average, our infrastructure needs 17-20 seconds of catalog compilation time
4. A full Puppet run on our entire infrastructure takes less than 30 minutes!

Updating the Code base

For the redo of our code base, we used the following principles:

• Always use modules from the Forge (if available), never write your own
• (All) Specific data for our infrastructure goes into Hiera
• Write every module like an API
• Reuse profiles whenever possible
• Start using a CI/CD pipeline

Conclusion

Puppet's initial release was in 2005. Since then it has come a long way to become a complete configuration management solution. Nowadays, a successful Puppet Open Source deployment requires multiple components, as well as adherence to software engineering principles.
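The "all data goes to Hiera" principle from our code-base redo can be sketched as follows, assuming Hiera 5. The directory, hierarchy levels, and key name are illustrative only, not our actual data.

```shell
#!/bin/sh
# Build a minimal Hiera 5 layout: per-node data overrides common data,
# so manifests stay free of site-specific values.
demo=${TMPDIR:-/tmp}/hiera-demo
mkdir -p "$demo/data/nodes"

cat > "$demo/hiera.yaml" <<'EOF'
---
version: 5
defaults:
  datadir: data
  data_hash: yaml_data
hierarchy:
  - name: "Per-node data"
    path: "nodes/%{trusted.certname}.yaml"
  - name: "Common data"
    path: "common.yaml"
EOF

cat > "$demo/data/common.yaml" <<'EOF'
---
profile::database::host: db.example.org
EOF

grep -c 'path:' "$demo/hiera.yaml"
```

With this layout, a profile class simply declares a parameter (e.g. profile::database::host) and automatic parameter lookup resolves it from Hiera, which is what lets the same module code be reused across environments.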

References

[1] Christopher Pisano. Journey to high availability. 2016.
[2] Greg Smith, Robert Treat, and Christopher Browne. Tuning your PostgreSQL server. PostgreSQL wiki, 2018.
[3] Phusion Passenger. Phusion Passenger documentation. 2018.
[4] PuppetLabs. PuppetLabs documentation. 2018.
[5] Júlio Brettas. Poster template - Escola de Modelos de Regressão. 2018.

[Figure 1: Code reduction in lines of code (YAML, PP files, templates, and all files; Puppet 3.5 vs Puppet 4.9)]

Additional materials used

Additional information and resources used:

• Pro Puppet. James Turnbull and Jeffrey McCune
• http://www.brendangregg.com/linuxperf.html. Brendan Gregg