Archer Update - Andrew Washbrook, University of Edinburgh
HPC working group meeting, 10th December 2014

Archer Details
• Archer is the UK's primary academic research supercomputer
• Operational since November 2013
• NEW: Phase 2 upgrade completed in November 2014
• Cray XC30 system
• Each compute node comprises:
  • 2 x 12-core 2.7 GHz Ivy Bridge processors
  • At least 64 GB of DDR3-1833 MHz main memory
• Cray Aries interconnect (multi-tier all-to-all connectivity)
• 4.4 PB scratch storage (Lustre)
• 3008 → 4920 compute nodes (72,192 → 118,080 cores)
• 1.56 → >2 Petaflops of theoretical peak performance

Questions for ATLAS Weekly
How much are we using for G4 and what are the prospects for using more?
• We currently have access to Archer via a nominal allocation pledged to University of Edinburgh researchers (Archer Director's Time)
• A proof of concept demonstrating effective use of opportunistic slots would help bolster the case for requesting additional resources
What are the main challenges for using HPCs in general?
• Specific challenges are covered in the following slides
• Some of these will be Archer specific
• (Hopefully) some of these will have been addressed at other HPC sites

Compute Node Connectivity
Challenge: outgoing connectivity is (in general) required throughout the lifetime of a job
• Up until Archer Phase 2 there was no external connectivity available on the compute nodes
Approaches
• External connectivity may now be possible through the new Cray Realm-Specific IP addressing (RSIP) for compute nodes
  • Allows compute and service nodes to share the IP addresses configured on the external Gigabit Ethernet interfaces of network nodes
  • Included on Archer for third-party software licence validation
  • Not meant for large-scale data transfer
  • Scaling tests are underway to determine limitations (if any)
Alternatives
• ssh port forwarding to selected servers on the Archer network

Software Delivery
Challenge: availability of ATLAS software on HPC compute nodes
Approaches
• CVMFS is not available from compute nodes
  • External connectivity issues (see previous slide)
  • Installing the CVMFS software, FUSE and autofs on compute nodes is problematic
• Currently have a local snapshot of CVMFS as a pragmatic option (see the sketch after this slide)
  • The CVMFS directory is rsynced to a machine resident at a Tier-2 site
  • The repository copy is cleansed of absolute paths, which are replaced with local paths
  • The copy is rsynced over to Archer shared storage
  • For now only selected releases are extracted during the testing phase
Alternatives
• Archer will allow an external filesystem resident on an edge server to be mounted on scheduling nodes
  • Not clear how this would be beneficial; could try a CVMFS-over-NFS solution
• Parrot/CVMFS
• Pacman
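The snapshot workflow above lends itself to a short script. Below is a minimal sketch in Python of the rsync, cleanse, rsync cycle; the hostnames, paths and file suffixes are illustrative assumptions rather than the actual production setup, and binary files with embedded /cvmfs paths would need separate handling.

```python
#!/usr/bin/env python
"""Local CVMFS snapshot for Archer: a minimal sketch of the
rsync -> cleanse -> rsync workflow described on the slide above.
All hostnames and paths here are illustrative placeholders."""
import os
import subprocess

CVMFS_SRC = "/cvmfs/atlas.cern.ch/repo/sw"                         # repository as mounted on the Tier-2 host
SNAPSHOT = "/scratch/atlas/cvmfs-snapshot/repo/sw"                 # local working copy (hypothetical)
ARCHER_DEST = "user@login.archer.ac.uk:/work/atlas/cvmfs/repo/sw"  # hypothetical Archer shared-storage target
ARCHER_PREFIX = "/work/atlas/cvmfs/repo/sw"                        # prefix the snapshot will have on Archer

TEXT_SUFFIXES = (".sh", ".csh", ".py", ".cfg", ".xml", ".txt")     # only rewrite plain-text files


def rsync(src, dest):
    """Mirror a directory tree with rsync in archive mode."""
    subprocess.check_call(["rsync", "-a", "--delete", src + "/", dest])


def cleanse(root):
    """Replace absolute /cvmfs paths with the Archer prefix in text files."""
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(TEXT_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            with open(path) as fh:
                contents = fh.read()
            if CVMFS_SRC in contents:
                with open(path, "w") as fh:
                    fh.write(contents.replace(CVMFS_SRC, ARCHER_PREFIX))


if __name__ == "__main__":
    if not os.path.isdir(SNAPSHOT):
        os.makedirs(SNAPSHOT)
    rsync(CVMFS_SRC, SNAPSHOT)     # 1. snapshot selected releases locally
    cleanse(SNAPSHOT)              # 2. rewrite absolute paths
    rsync(SNAPSHOT, ARCHER_DEST)   # 3. push the cleansed copy to Archer
```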
Job Environment
Challenge: define the HPC compute node environment before each job
Approaches
• c/w the Grid environment script + worker node tarball solution used for some shared Tier-2 facilities (e.g. ECDF)
• HEP-specific libraries (covered by the HEP_OSlibs meta RPM) have to be made available without the RPM installation method
• Need to address potential conflicts with common tools and libraries used by other HPC users (gcc, python)
• Favoured the use of a TCL module to define the path setup, to be consistent with the software management of other HPC applications on Archer
• The asetup command works with some tweaking of absolute paths

HPC Pilots
Challenge: a pilot submitted to an HPC batch system will (in general) have to request and manage workload across many compute nodes
• The pilot executes on a scheduling node and then requests compute resources using aprun
Approaches
• The easiest approach would be just to submit whole-node pilots
• Archer queue limitations: maximum 16 queued jobs, 8 running jobs per user
• Assuming no MPI implementation, the pilot would need to handle many instances of whole-node jobs
  • The job finishes at the speed of the slowest node if all whole-node jobs are launched simultaneously
  • Inefficient resource allocation over time, and burns through the quota
Alternatives
• The Yoda Event Service approach makes sense to use given the job throughput limitations on Archer
• Note that this challenge is not isolated to HEP: an alternative HPC pilot solution (on top of SAGA) is being used for the Molecular Dynamics simulation framework (ExTASY) on Archer
  • RADICAL-Pilot: http://radical-cybertools.github.io/radical-pilot/index.html

Workload Scheduling and Backfilling
Challenge: assuming that pilots handle multi-node workloads, how many compute resources (and how much wallclock time) should a pilot request at any given time?
Approaches
• We could just estimate a fixed value that has a reasonable chance of being scheduled
• Would like to follow the approach taken on Titan
  • The pilot polls the scheduler directly to determine the most efficient resource allocation based on backfilling information
  • Unfortunately we cannot do this on Archer due to a different scheduler implementation
  • Titan has PBSpro and access to the "showbf" command; Archer has Cray ALPS
• Ongoing discussions with Cray on alternatives: the information could be derived from a superset of the information given by the apstat, qstat and xtnodestat commands, but this is less trivial than the direct method

Other Challenges and Outlook
Grid storage interaction
• May need a two-stage copy to move job output across to local storage
• Could be done via a post-processing node outside of the job lifetime (a minimal sketch follows after this slide)
Dynamic shared libraries vs static libraries
• aprun normally expects static libraries
• Need to explore whether this is a real issue for the ATLAS workload
Outlook
• No major blockers on progress (for now)
• External connectivity was a long-standing issue but could be resolved in the Phase 2 setup
• A troubleshooting session is scheduled with the Archer admins to resolve the remaining issues in the validation exercise
• A dormant ARC CE is connected to Archer; will revive the service to allow low-level test jobs if suitable pilots are available
• Would like scheduling and resource allocation to be as efficient as possible
• The Yoda Event Service could be a more efficient use of resources; will perform initial testing in step with deployment
• Aiming for a reliable service early next year if the challenges can be addressed (or at least mitigated)
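For the grid storage point above, this is a minimal sketch of what a two-stage copy could look like: stage one pushes job output from Archer scratch to an intermediate post-processing host, and stage two (run later on that host, outside the job lifetime) copies it on to grid storage. The hostnames, paths and the final transfer tool (xrdcp here) are illustrative assumptions, not an agreed setup.

```python
#!/usr/bin/env python
"""Two-stage copy of job output:
Archer scratch -> post-processing host -> grid storage.
A minimal sketch only; hostnames, paths and the final transfer tool
(xrdcp) are illustrative assumptions rather than an agreed setup."""
import subprocess
import sys

STAGING_HOST = "atlas-staging.example.ac.uk"          # hypothetical post-processing node
STAGING_DIR = "/data/atlas/archer-output/"            # hypothetical landing area on that node
GRID_DEST = "root://se.example.ac.uk//atlas/archer/"  # hypothetical storage element path


def stage_one(local_file):
    """Stage 1 (run from Archer): push the output file to the staging host."""
    subprocess.check_call(
        ["rsync", "-av", local_file, "%s:%s" % (STAGING_HOST, STAGING_DIR)])


def stage_two(filename):
    """Stage 2 (run later on the staging host, outside the job lifetime):
    copy the file on to grid storage."""
    subprocess.check_call(
        ["xrdcp", STAGING_DIR + filename, GRID_DEST + filename])


if __name__ == "__main__":
    # Usage on Archer: python stageout.py <output file>
    stage_one(sys.argv[1])
```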