Quattor-Fall-2015-Jrha.Pdf

Quattor in 2015 (Part 2)
James Adams
Scientific Computing Department
STFC Rutherford Appleton Laboratory

Hello!
● Configuration System Architect
  – 8 years at RAL
  – wLCG Tier 1
  – Distributed Services Team
● Quattor Release Manager
  – Since Sept 2013
● +20% worrying about Ceph

What is Quattor
● A distribution of system administration tools
● A configuration management system
● A provisioning system
● An inventory database
● A service orchestration tool
● Open source, community developed
● Support for: EL{5..7}, Fedora(s), Solaris 11

What is Quattor
● Tested configuration modules with defined interfaces for:
  accounts, afsclt, aiiserver, altlogrotate, amandaserver, apel, authconfig, autofs, bacula, ccm, cdp, ceph, chkconfig, collectl, condorconfig, cron, ctdb, cups, dcache, devicemapper, directoryservices, dirperm, django, download, dpmlfc, etcservices, filecopy, filesystems, fmonagent, frontiersquid, fsprobe, fstab, gacl, ganesha, ganglia, gip2, glitestartup, globuscfg, gmetad, gmond, gold, gpfs, graphite, graylog2, gridmapdir, grub, gsissh, hostsaccess, hostsfile, httpd, icinga, interactivelimits, ipmi, iptables, kerberos, lbconfig, lcas, lcgbdii, lcgmonjob, lcmaps, ldconf, libvirtd, logstash, maui, mcx, metaconfig, mkgridmap, moab, modprobe, mrtg, myproxy, mysql, nagios, named, network, nfs, nginx, nrpe, nsca, nscd, nss, ntpd, ofed, openldap, opennebula, openvpn, pakiti, pam, pbsclient, pbsknownhosts, pbsserver, perfsonar, pnp4nagios, postfix, postgresql, profile, puppet, resolver, rpcidmapd, rsync, sendmail, shorewall, snmp, spma, ssh, sudo, symlink, sysconfig, sysctl, syslog, syslogng, udev, useraccess, vomrs, vomsclient, wlconfig, wmsclient, wmslb, xinetd, xrootd, yaim, yaim_usersconf, zookeeper
  http://quattor-core.readthedocs.org

What is Quattor
Quattor is a Community

Community
● Community drives everything
● No full-time developers or commercial interests, just site admins
● ~15 active institutes
● Just celebrated the 20th Quattor workshop
  – LAL, Orsay, Paris
● A few new contributors in the last six months
  – 21 registered “developers” on GitHub (+2 robots)

VUB & ULB (Brussels)
● Two Quattor managed clusters
  – IceCube - local non-grid
  – T2_BE_IIHE - grid-cluster supporting CMS
      ● 2200 job slots and 2.2 TB dCache storage.
● Two OpenNebula clouds
  – One for Flemish users (funded by VSC)
  – VMs can be created with Quattor
      ● ncm-opennebula and aii-opennebula

VUB & ULB (Brussels)
● Very interested in Ceph as a grid storage element
  – Would prefer to use Quattor to deploy and manage this
● Investigating adoption of Aquilon
● More collaboration meetings with UGent, who are using Quattor for their large HPC clusters

Paris-Sud University
● Started with about 10 hosts
  – Including a small Open Linked Data infrastructure
  – Part of the University's Scientific Computing Infrastructure
● Previously no central system administration tool
● Migrating to more hosts
  – Planning to re-install for OS upgrades
● Still evaluating roll-out to ~250 more hosts
  – University services
      ● e-mail, teaching, HR, management etc.

RAL (UK)
● Migration to Aquilon in progress
  – Worker nodes migrated
      ● ~700 hosts
  – Now managing 860 1619 hosts
      ● vs. 1610 922 in SCDB
● OpenNebula Private Cloud
● Three Ceph clusters
  – Automagic CRUSH map generation
● Hundreds of service hosts

RAL (UK)
● Starting to be adopted more widely across STFC
  – ISIS Neutron Source Computing Infrastructure
  – Research Infrastructure Group (HPC resources)
● CASTOR team has retired Puppet
  – But not started move to Aquilon
● Starting to experiment with services being users of Quattor
  – Making changes to configuration
  – Deploying new instances

RAL (UK)
● Batch farm and OpenNebula are “users” of Aquilon
  – Change personalities of hosts as needed
  – Automatically clean up when done

MS (Global)
● Managing ~38,000 hosts worldwide
  – Including VMware ESX clusters and NetApp filers
● Introducing concept of personality versions
  – Three options: previous, current, next
  – Allows for better large scale testing in large infrastructures
● Starting to see some scaling issues in Aquilon broker
  – Investigating more parallelism through multiprocess architecture

UAM (Madrid)
● Managing ~250 hosts
  – SCDB + Quattor 15.4.0.
● Internal services:
  – Web sites for teaching, twikis, KVM hypervisor with VMs for testing, SVN server, mail server, nagios, ganglia, desktops.
● T2 services:
  – ~1PB of storage in dCache, SRM and doors: webdav, gridftp, dcap, gsidcap, xrootd, FAX.
  – ~500 core farm: Torque, maui and CREAM CE.
● Other grid services: bdiisite, argus service, apel accounting, perfsonar, frontier-squid cache system.

UAM (Madrid)
● Working on:
  – Moving some machines to 10G fiber
  – Migration to EL7 - Starting with worker nodes
  – Setup a High Availability environment using Red Hat cluster suite
      ● KVM for virtualization, GFS2
      ● Start moving grid services to it next year
● Future:
  – Evaluating & migrating to Condor batch system and ARC CE.

“I am the only sysadmin - administration would be just IMPOSSIBLE without Quattor and the great and close support I get from the community.”

Common Activities
● Many sites building out private clouds
  – Mostly OpenNebula, but also OpenStack
● Lots of interest in Ceph
  – Both for block devices for VMs and general storage
● Everyone wants to move from SCDB to Aquilon
  – Renewed focus on improving installation experience
● FreeIPA (Red Hat IdM) becoming popular
  – Identity management and distribution of secrets
      ● Kerberos required for Aquilon

Development
● Contributions continue to increase
● Increased number of more active contributors
● Moving to GitHub transformed development
● Automated testing! Test all of the things!

[Chart: commits per year, 2008 to 2014, annotated “Quattor moved to GitHub”]

It's free for you from me you see.

Development
● Investment in testing continues to grow
  – Unit tests for all of the things
  – Increased from ~19000 to ~36000 tests since Spring HEPiX
  – Jenkins testing every pull request
  – Tests now enforced by GitHub
● Focusing on simplifying codebase
  – Cleaning up lower layers of code
● Quality of code keeps improving
● LOC and commit rate keeps steadily increasing

Development
[Chart: lines of code by date, 2008 to 2015, axis from 0 to 350,000, annotated “Moved to GitHub” and “Clean up legacy code”]

New and interesting things
● Support for systemd
● New ccm CLI
● Private clouds being built
  – Ceph, OpenNebula and OpenStack components

New ccm CLI
● Client side utility for querying host profile data
  – Deprecates ancient ncm-query
● Multiple output formats
  – json, pan, pancxml, tabcompletion (bash), yaml
● For example...

    # ccm --format json --profpath /system/network/ --show
    {"default_gateway":"1.2.3.4", "domainname":"example.org", "hostname":"host",
     "interfaces":{"eth0":{"broadcast":"1.2.3.255", "driver":"e1000", "ip":"1.2.3.10",
     "netmask":"255.255.255.0"}}, "nameserver":["1.2.3.243"], "nozeroconf":true,
     "set_hwaddr":true}

Challenges
● Large number of repositories
  – 23 repositories with cross-dependencies
  – Opportunities to merge/retire some
● Size of releases continues to increase
  – Fantastic, but makes release process more difficult
● How to build initial images for VMs and containers
  – Build from scratch, not just contextualise
  – Some ideas here, but nothing concrete yet

Thanks!
Questions?
http://www.quattor.org
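
Following up on the New ccm CLI example: because the json output format is plain JSON, a script can consume it directly. The sketch below is not part of the talk; it assumes the output has exactly the shape shown in the example, embeds that sample as a string instead of invoking ccm, and uses a hypothetical helper name (interface_addresses).

```python
import json

# Sample copied from the `ccm --format json --profpath /system/network/ --show`
# example in the slides; a real script would read this from the command's stdout.
SAMPLE = """
{"default_gateway":"1.2.3.4", "domainname":"example.org", "hostname":"host",
 "interfaces":{"eth0":{"broadcast":"1.2.3.255", "driver":"e1000", "ip":"1.2.3.10",
 "netmask":"255.255.255.0"}}, "nameserver":["1.2.3.243"], "nozeroconf":true,
 "set_hwaddr":true}
"""


def interface_addresses(network):
    """Map each interface name to its IP address (hypothetical helper).

    Interfaces without an "ip" key are skipped.
    """
    interfaces = network.get("interfaces", {})
    return {name: cfg["ip"] for name, cfg in interfaces.items() if "ip" in cfg}


network = json.loads(SAMPLE)

# Fully qualified host name assembled from two profile fields.
fqdn = network["hostname"] + "." + network["domainname"]
print(fqdn)
print(interface_addresses(network))
```

The same approach should carry over to any other profile path whose json output is a single object, though that is an assumption and has not been checked against other paths.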
