Technical Review - Sustaining NEP-101
Colin Leavett-Brown, University of Victoria
Technical Review - Deliverables
Technical Review - Glint
● Code refactored:
  – Removed the Django database abstraction
  – Now built on Paste, WebOb, and SQLAlchemy
  – Consistent with the rest of the Glance code
● Reviewing Glance tasks for data movement
● Testing and documentation ongoing
● Ongoing maintenance
Technical Review - Shoal
● Ongoing maintenance
Technical Review - Maintenance
● ATLAS and Belle-II production
  – Highlights a number of issues
Technical Review - Maintenance
● Moved BelleCS server:
  – Root volume moved from ephemeral (Glance image) to persistent (Cinder volume) storage
  – No performance impact
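One way to get a persistent root disk is to boot from a Cinder volume created from the Glance image. The sketch below builds the JSON body for a Nova `POST /servers` request that does this through `block_device_mapping_v2`. It is a minimal illustration under assumptions, not the procedure actually used to move BelleCS; the function name, server name, image and flavor IDs, and volume size are hypothetical placeholders.

```python
import json


def boot_from_volume_body(name, image_id, flavor_id, volume_size_gb,
                          delete_on_termination=False):
    """Hypothetical helper: build a Nova 'create server' body whose root
    disk is a new Cinder volume populated from a Glance image."""
    return {
        "server": {
            "name": name,
            "flavorRef": flavor_id,
            # No top-level imageRef: the image is only the source for the
            # root volume, so the VM does not use ephemeral root storage.
            "block_device_mapping_v2": [{
                "boot_index": 0,
                "uuid": image_id,
                "source_type": "image",
                "destination_type": "volume",
                "volume_size": volume_size_gb,
                # False keeps the volume (and its data) if the VM is deleted.
                "delete_on_termination": delete_on_termination,
            }],
        }
    }


if __name__ == "__main__":
    # Placeholder IDs; substitute real image and flavor IDs for a given cloud.
    body = boot_from_volume_body("service-vm", "IMAGE-ID", "FLAVOR-ID", 40)
    print(json.dumps(body, indent=2))
```

Leaving `delete_on_termination` at False is what lets the root disk outlive the instance, which is the point of moving off ephemeral storage.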
Technical Review - Maintenance
● cern.ch cloud issue:
  – 8-core VMs starting with only one virtual CPU
    ● VMs would start but would not process any jobs
      – 10 to 50% of VMs affected
    ● udev patch for the VM kernel
      – VMs hung during boot
      – 10% of VMs affected
    ● KVM-patched VM kernel
      – Problem mitigated: 0.5% of VMs hang during boot
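Because affected VMs booted and looked healthy but ran no jobs, a cheap guard is to compare the CPUs visible inside the guest with what the flavor promised before accepting work. This is a minimal sketch, assuming the expected count (8 here, from the flavor above) is known to the VM, e.g. through its contextualization; the function name is hypothetical and this is not part of the deployed fix.

```python
import os


def vcpu_check(expected_vcpus, seen_vcpus=None):
    """Hypothetical guard: return (ok, seen), where `seen` is the number of
    CPUs the guest OS reports and `ok` is True if it meets the expectation."""
    if seen_vcpus is None:
        # os.cpu_count() may return None if the count cannot be determined.
        seen_vcpus = os.cpu_count() or 0
    return seen_vcpus >= expected_vcpus, seen_vcpus


if __name__ == "__main__":
    ok, seen = vcpu_check(expected_vcpus=8)
    print("vCPU check", "passed" if ok else "FAILED",
          f"({seen} visible, 8 expected)")
```

A VM showing the fault described above would report `(False, 1)` and could retire itself instead of sitting idle in the pool.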
Technical Review - Maintenance
● Challenges with new clouds:
  – Each new cloud is different:
    ● S/W versions, services, configuration, resources, etc.
    ● Takes time and effort to incorporate
  – Chameleon OpenStack cloud:
    ● Uses Blazar, a new reservation system
      – Cannot instantiate VMs without a reservation ID
      – Requires Python 2.7, Nova API v2 and Keystone API v3
      – Reviewing impact on Cloud Scheduler
  – Cybera OpenStack cloud:
    ● Under-utilized, so backfill via over-allocation and prioritization
    ● How does it perform?
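Since Chameleon will not launch an instance without a reservation, Cloud Scheduler would need to carry a reservation ID through to the boot request. This is a minimal sketch, assuming a Blazar host reservation whose ID is passed to Nova as the `reservation` scheduler hint (our understanding of Chameleon's `openstack server create --hint reservation=...` usage). The helper name and IDs are hypothetical, and obtaining the lease itself is not shown.

```python
def reserved_server_body(name, image_id, flavor_id, reservation_id):
    """Hypothetical helper: build a Nova 'create server' body that targets
    hosts reserved through a Blazar lease."""
    if not reservation_id:
        # Mirrors the Chameleon constraint: no reservation, no instance.
        raise ValueError("a reservation ID is required to launch on this cloud")
    return {
        "server": {"name": name, "imageRef": image_id, "flavorRef": flavor_id},
        # In the Nova API, scheduler hints sit beside "server", not inside it.
        "os:scheduler_hints": {"reservation": reservation_id},
    }
```

Failing early on a missing ID keeps the error on the Cloud Scheduler side rather than surfacing as a scheduling failure on the cloud.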
Technical Review - Maintenance
● Cybera cloud – hardware similar to other clouds, but only 50% of the performance:
  – Measured using HEP-SPEC06 (HS06): http://w3.hepix.org/benchmarks/doku.php/
  – Issue still ongoing
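The 50% figure is a ratio of benchmark scores. This is a minimal sketch of that arithmetic, assuming scores are compared per core so that VMs of different sizes are comparable; the function names are hypothetical and the numbers in the usage example are illustrative, not measured results.

```python
def hs06_per_core(total_hs06, cores):
    """Normalize a VM's total HS06 score by its number of cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_hs06 / cores


def relative_performance(measured_per_core, reference_per_core):
    """Fraction of the reference per-core score achieved (1.0 = parity)."""
    if reference_per_core <= 0:
        raise ValueError("reference score must be positive")
    return measured_per_core / reference_per_core


if __name__ == "__main__":
    # Illustrative values only: an 8-core VM scoring 40 HS06 against an
    # 8-core reference VM scoring 80 HS06.
    ratio = relative_performance(hs06_per_core(40.0, 8), hs06_per_core(80.0, 8))
    print(f"relative performance: {ratio:.0%}")  # prints "relative performance: 50%"
```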
Technical Review - Maintenance
● Mouse cloud – why the spread?
  – Small hump before the peak caused by CPU over-allocation on a single compute node
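On Mouse, where hypervisor state is visible, this kind of over-allocation can be spotted by comparing allocated vCPUs with the CPUs each compute node reports. This is a minimal sketch, assuming records shaped like the `hypervisor_hostname`, `vcpus` and `vcpus_used` fields of Nova's hypervisor detail listing in older compute API microversions, and a default limit of 1.0 (no over-allocation); the function name is hypothetical.

```python
def overcommitted_nodes(hypervisors, max_ratio=1.0):
    """Return (hostname, ratio) for compute nodes whose allocated vCPUs
    exceed max_ratio times the CPUs the node reports."""
    flagged = []
    for hv in hypervisors:
        if hv["vcpus"] <= 0:
            continue  # skip nodes reporting no CPUs
        ratio = hv["vcpus_used"] / hv["vcpus"]
        if ratio > max_ratio:
            flagged.append((hv["hypervisor_hostname"], ratio))
    return flagged
```

A node flagged here would run its VMs slower than its peers, which is the kind of effect that shows up as a secondary hump in a per-VM performance distribution.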
Technical Review - Maintenance
● Challenges with monitoring:
  – Can see anomalies from job output
  – Can diagnose anomalies on Mouse (our development cloud)
    ● Can view hypervisor state
  – No access to hypervisor information on other clouds
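Where only job output is available, one simple way to surface anomalies is to compare each VM's job throughput with the rest of the fleet. This is a minimal sketch, assuming a per-VM rate (e.g. events per second) has already been extracted from job output and treating anything under 75% of the median as suspect; the metric, the threshold, and the function name are all hypothetical.

```python
from statistics import median


def slow_vms(throughput_by_vm, fraction=0.75):
    """Return, sorted, the VM names whose throughput falls below
    `fraction` of the median throughput across all VMs."""
    if not throughput_by_vm:
        return []
    typical = median(throughput_by_vm.values())
    return sorted(vm for vm, rate in throughput_by_vm.items()
                  if rate < fraction * typical)
```

The median is used rather than the mean so that the slow VMs being hunted do not drag the baseline down with them.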
Technical Review - UGR
● Moved production server to the Compute Canada cloud:
  – Storage moved from Glance images (ephemeral) to Cinder volumes (persistent)
● Ongoing maintenance
Technical Review - Summary
● Activities over the last two months:
  – Improved code base for Glint
  – Incorporation of new clouds
  – Cloud problem resolution
  – Cloud performance monitoring and diagnosis
  – Improved service-VM hosting: CS, UGR
  – Review of required service changes: monitoring, CS, Glint