Quattor in 2015 (Part 2)
James Adams
Scientific Computing Department
STFC Rutherford Appleton Laboratory

Hello!
● Configuration System Architect
  – 8 years at RAL
  – wLCG Tier 1
  – Distributed Services Team
● Quattor Release Manager
  – Since Sept 2013
● +20% worrying about Ceph

What is Quattor
● A distribution of system administration tools
● A configuration management system
● A provisioning system
● An inventory database
● A service orchestration tool
● Open source, community developed
● Support for: EL{5..7}, Fedora(s), Solaris 11

What is Quattor
● Tested configuration modules with defined interfaces for:
  accounts, afsclt, aiiserver, altlogrotate, amandaserver, apel, authconfig, autofs, bacula, ccm, cdp, ceph, chkconfig, collectl, condorconfig, cron, ctdb, cups, dcache, devicemapper, directoryservices, dirperm, django, download, dpmlfc, etcservices, filecopy, filesystems, fmonagent, frontiersquid, fsprobe, fstab, gacl, ganesha, ganglia, gip2, glitestartup, globuscfg, gmetad, gmond, gold, gpfs, graphite, graylog2, gridmapdir, grub, gsissh, hostsaccess, hostsfile, httpd, icinga, interactivelimits, ipmi, iptables, kerberos, lbconfig, lcas, lcgbdii, lcgmonjob, lcmaps, ldconf, libvirtd, logstash, maui, mcx, metaconfig, mkgridmap, moab, modprobe, mrtg, myproxy, mysql, nagios, named, network, nfs, nginx, nrpe, nsca, nscd, nss, ntpd, ofed, openldap, opennebula, openvpn, pakiti, pam, pbsclient, pbsknownhosts, pbsserver, perfsonar, pnp4nagios, postfix, postgresql, profile, puppet, resolver, rpcidmapd, rsync, sendmail, shorewall, snmp, spma, ssh, sudo, symlink, sysconfig, sysctl, syslog, syslogng, udev, useraccess, vomrs, vomsclient, wlconfig, wmsclient, wmslb, xinetd, xrootd, yaim, yaim_usersconf, zookeeper
http://quattor-core.readthedocs.org

What is Quattor
Quattor is a Community

Community
● Community drives everything
● No full-time developers or commercial interests, just site admins
● ~15 active institutes
● Just celebrated the 20th Quattor workshop
  – LAL, Orsay, Paris
● A few new contributors in the last six months
  – 21 registered “developers” on GitHub (+2 robots)

VUB & ULB (Brussels)
● Two Quattor managed clusters
  – IceCube: local, non-grid
  – T2_BE_IIHE: grid cluster supporting CMS
    ● 2200 job slots and 2.2 TB dCache storage
● Two OpenNebula clouds
  – One for Flemish users (funded by VSC)
  – VMs can be created with Quattor
    ● ncm-opennebula and aii-opennebula

VUB & ULB (Brussels)
● Very interested in Ceph as a grid storage element
  – Would prefer to use Quattor to deploy and manage this
● Investigating adoption of Aquilon
● More collaboration meetings with UGent, who are using Quattor for their large HPC clusters

Paris-Sud University
● Started with about 10 hosts
  – Including a small Open Linked Data infrastructure
  – Part of the University's Scientific Computing Infrastructure
● Previously no central system administration tool
● Migrating to more hosts
  – Planning to re-install for OS upgrades
● Still evaluating roll-out to ~250 more hosts
  – University services
    ● e-mail, teaching, HR, management, etc.

RAL (UK)
● Migration to Aquilon in progress
  – Worker nodes migrated
    ● ~700 hosts
  – Now managing 1619 hosts (was 860)
    ● vs. 922 in SCDB (was 1610)
● OpenNebula Private Cloud
● Three Ceph clusters
  – Automagic CRUSH map generation
● Hundreds of service hosts

RAL (UK)
● Starting to be adopted more widely across STFC
  – ISIS Neutron Source Computing Infrastructure
  – Research Infrastructure Group (HPC resources)
● CASTOR team has retired Puppet
  – But has not started the move to Aquilon
● Starting to experiment with services being users of Quattor
  – Making changes to configuration
  – Deploying new instances

RAL (UK)
● Batch farm and OpenNebula are “users” of Aquilon
  – Change personalities of hosts as needed
  – Automatically clean up when done

MS (Global)
● Managing ~38,000 hosts worldwide
  – Including VMware ESX clusters and NetApp filers
● Introducing concept of personality versions
  – Three options: previous, current, next
  – Allows for better large-scale testing in large infrastructures
● Starting to see some scaling issues in Aquilon broker
  – Investigating more parallelism through a multiprocess architecture

UAM (Madrid)
● Managing ~250 hosts
  – SCDB + Quattor 15.4.0
● Internal services:
  – Web sites for teaching, twikis, KVM hypervisor with VMs for testing, SVN server, mail server, nagios, ganglia, desktops
● T2 services:
  – ~1PB of storage in dCache, SRM and doors: webdav, gridftp, dcap, gsidcap, xrootd, FAX
  – ~500 core farm: Torque, maui and CREAM CE
● Other grid services: bdiisite, argus service, apel accounting, perfsonar, frontier-squid cache system

UAM (Madrid)
● Working on:
  – Moving some machines to 10G fiber
  – Migration to EL7, starting with worker nodes
  – Setting up a High Availability environment using Red Hat cluster suite
    ● KVM for virtualization, GFS2
    ● Start moving grid services to it next year
● Future:
  – Evaluating & migrating to Condor batch system and ARC CE
“I am the only sysadmin - administration would be just IMPOSSIBLE without Quattor and the great and close support I get from the community.”

Common Activities
● Many sites building out private clouds
  – Mostly OpenNebula, but also OpenStack
● Lots of interest in Ceph
  – Both for block devices for VMs and general storage
● Everyone wants to move from SCDB to Aquilon
  – Renewed focus on improving installation experience
● FreeIPA (Red Hat IdM) becoming popular
  – Identity management and distribution of secrets
    ● Kerberos required for Aquilon

Development
● Contributions continue to increase
● Growing number of active contributors
● Moving to GitHub transformed development
● Automated testing! Test all of the things!
[Chart: Commits per Year, 2008 to 2014; annotation: "Quattor moved to GitHub"]
Development
● Investment in testing continues to grow
  – Unit tests for all of the things
  – Increased from ~19000 to ~36000 tests since Spring HEPiX
  – Jenkins testing every pull request
  – Tests now enforced by GitHub
● Focusing on simplifying codebase
  – Cleaning up lower layers of code
● Quality of code keeps improving
● Commit rate and LOC steadily increasing

Development
[Chart: Lines of Code by Date, 2008 to 2015, axis from 0 to 350,000; annotations: "Moved to GitHub", "Clean up legacy code"]

New and interesting things
● Support for systemd
● New ccm CLI
● Private clouds being built
  – Ceph, OpenNebula and OpenStack components

New ccm CLI
● Client side utility for querying host profile data
  – Deprecates ancient ncm-query
● Multiple output formats
  – json, pan, pancxml, tabcompletion (bash), yaml
● For example...
    # ccm --format json --profpath /system/network/ --show
    {"default_gateway":"1.2.3.4", "domainname":"example.org",
     "hostname":"host", "interfaces":{"eth0":{"broadcast":"1.2.3.255",
     "driver":"e1000", "ip":"1.2.3.10", "netmask":"255.255.255.0"}},
     "nameserver":["1.2.3.243"], "nozeroconf":true, "set_hwaddr":true}

Challenges
● Large number of repositories
  – 23 repositories with cross-dependencies
  – Opportunities to merge/retire some
● Size of releases continues to increase
  – Fantastic, but makes release process more difficult
● How to build initial images for VMs and containers
  – Build from scratch, not just contextualise
  – Some ideas here, but nothing concrete yet

Thanks! Questions?
http://www.quattor.org
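
Appendix: consuming the ccm JSON output
The JSON printed by the ccm example in the "New ccm CLI" slide is easy to consume from a script. The following is a minimal sketch only: it assumes the output has already been captured as a string (the sample text is copied from the slide), and the helper name interface_addresses is hypothetical, not part of Quattor.

```python
import json

# Sample output copied from the ccm example on the "New ccm CLI" slide.
# Assumption: a real script would capture this text from the command
# instead of hard-coding it.
CCM_OUTPUT = """
{"default_gateway":"1.2.3.4", "domainname":"example.org", "hostname":"host",
 "interfaces":{"eth0":{"broadcast":"1.2.3.255", "driver":"e1000",
 "ip":"1.2.3.10", "netmask":"255.255.255.0"}},
 "nameserver":["1.2.3.243"], "nozeroconf":true, "set_hwaddr":true}
"""


def interface_addresses(profile_json):
    """Hypothetical helper: map interface name to IP address.

    Expects the JSON text of a /system/network subtree shaped like the
    sample above.
    """
    network = json.loads(profile_json)
    return {name: cfg["ip"] for name, cfg in network["interfaces"].items()}


network = json.loads(CCM_OUTPUT)
# Build the fully qualified name from the two profile fields.
fqdn = "{}.{}".format(network["hostname"], network["domainname"])
addresses = interface_addresses(CCM_OUTPUT)
print(fqdn, addresses)
```

Because the profile subtree arrives as ordinary JSON, no Quattor-specific library is needed on the consuming side; the yaml format listed on the slide could presumably be handled the same way with a YAML parser.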