
Technical Review - Sustaining NEP-101

Colin Leavett-Brown, University of Victoria

Technical Review - Deliverables

Technical Review - Glint

● Code refactored:
  – Removed Django database abstraction
  – Using Paste, WebOb, and SQLAlchemy building blocks
  – Consistent with other Glance code
● Reviewing Glance tasks for data movement
● Testing and documentation ongoing
● Ongoing maintenance

Technical Review - Shoal

● Ongoing maintenance

Technical Review - Maintenance

● ATLAS and Belle-II production – highlights a number of issues

Technical Review - Maintenance

● Moved BelleCS server:
  – Root volume moved from ephemeral (Glance image) to persistent (Cinder volume) storage
  – No performance impact

Technical Review - Maintenance

● cern.ch cloud issue:
  – 8-core VMs starting with only one virtual CPU
    ● VM would start but would not run any jobs – 10 to 50% of VMs
    ● udev patch for VM kernel – VM hang during boot – 10% of VMs
    ● KVM patched VM kernel – problem mitigated – 0.5% of VMs hang during boot
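The single-vCPU symptom described above could, in principle, be caught by a boot-time sanity check on the VM. The following is a minimal sketch only, not the fix that was applied (that was a kernel patch): the function name `has_expected_cpus` and the idea of running it before accepting jobs are assumptions made for illustration.

```python
import os


def has_expected_cpus(expected, detected=None):
    """Return True when the VM sees at least `expected` CPUs.

    `detected` can be passed explicitly for testing; otherwise the
    count reported by the operating system is used.
    (Hypothetical helper, not part of any existing tool.)
    """
    if detected is None:
        # os.cpu_count() may return None if the count is undetermined.
        detected = os.cpu_count() or 0
    return detected >= expected


if __name__ == "__main__":
    # An 8-core flavour is assumed here, matching the VMs in the slide.
    if not has_expected_cpus(8):
        print("CPU count below flavour size; VM should not accept jobs")
```

A check of this kind only detects the condition; it does not explain or repair it.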

Technical Review - Maintenance

● Challenges with new clouds:
  – Each new cloud is different:
    ● S/W versions, services, configuration, resources, etc.
    ● Takes time and effort to incorporate
  – Chameleon OpenStack cloud:
    ● Uses Blazar, a new reservation system
    ● Can't instantiate without a reservation ID
    ● Requires Python 2.7, Nova API v2 and Keystone API v3
    ● Reviewing impact on Cloud Scheduler
  – Cybera OpenStack cloud:
    ● Underutilized, so backfill via over-allocation and prioritization
    ● How does it perform?

Technical Review - Maintenance

● Cybera cloud – similar hardware, 50% performant:
  – Using HEP-SPEC06 (HS06) - http://w3.hepix.org/benchmarks/doku.php/
  – Issue still ongoing

Technical Review - Maintenance

● Mouse cloud – why the spread?


  – Small hump before the peak caused by an over-allocation of CPU on a single compute node
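One way to trace a secondary hump back to a single compute node is to group benchmark scores by the node that hosted each VM and flag nodes whose median falls well below the overall median. The sketch below assumes that per-node scores are available (as they are on a development cloud with hypervisor access); the function name, the 0.8 threshold, and the node names and scores in the usage example are all made up for illustration.

```python
from statistics import median


def flag_slow_nodes(scores_by_node, threshold=0.8):
    """Return the nodes whose median score is below threshold * overall median.

    scores_by_node maps a compute-node name to the list of benchmark
    scores measured on VMs hosted by that node.
    (Hypothetical helper; the threshold is an arbitrary choice.)
    """
    all_scores = [s for scores in scores_by_node.values() for s in scores]
    if not all_scores:
        return []
    overall = median(all_scores)
    return sorted(
        node
        for node, scores in scores_by_node.items()
        if scores and median(scores) < threshold * overall
    )


if __name__ == "__main__":
    # Illustrative values only, not measured results.
    sample = {
        "node-a": [10.0, 10.2, 9.8],
        "node-b": [10.1, 9.9],
        "node-c": [6.0, 6.2, 5.9],
    }
    print(flag_slow_nodes(sample))
```

Using medians rather than means keeps a single outlier VM from flagging an otherwise healthy node.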

Technical Review - Maintenance

● Challenges with monitoring:
  – Can see anomalies from job output
  – Can diagnose anomalies on Mouse (our development cloud)
    ● Can view state
  – No hypervisor information access for other clouds

Technical Review - UGR

● Moved production server to Compute Canada cloud:
  – Glance images (ephemeral) to Cinder volumes (persistent)
● Ongoing maintenance

Technical Review - Summary

● Activities over the last two months:
  – Improved code base for Glint
  – Incorporation of new clouds
  – Cloud problem resolution
  – Cloud performance monitoring and diagnosis
  – Improved service-VM hosting - CS, UGR
  – Review of required service changes - Monitoring, CS, Glint
