Creating Abstractions for Piz Daint and its Ecosystem
Sadaf Alam, Chief Technology Officer, Swiss National Supercomputing Centre (CSCS)
November 16, 2017

CSCS in a Nutshell

§ A unit of the Swiss Federal Institute of Technology in Zurich (ETH Zurich)
  § Founded in 1991 in Manno; relocated to Lugano in 2012
§ Develops and promotes technical and scientific services for the Swiss research community in the field of high-performance computing
§ Enables world-class scientific research by pioneering, operating and supporting leading-edge supercomputing technologies
§ Employs 90+ people from more than 15 different nations

Piz Daint and the User Lab
§ http://www.cscs.ch/uploads/tx_factsheet/FSPizDaint_2017_EN.pdf
§ http://www.cscs.ch/publications/highlights/
§ http://www.cscs.ch/uploads/tx_factsheet/AR2016_Online.pdf

Model: Cray XC40/XC50
XC50 compute nodes: Intel® Xeon® E5-2690 v3 @ 2.60 GHz (12 cores, 64 GB RAM) and NVIDIA® Tesla® P100 16 GB
XC40 compute nodes: Intel® Xeon® E5-2695 v4 @ 2.10 GHz (18 cores, 64/128 GB RAM)
Interconnect configuration: Aries routing and communications ASIC, and Dragonfly network topology
Scratch capacity: ~9 + 2.7 PB

Piz Daint (2013 → 2016 →)

2013 (Cray XC30)
§ 5,272 hybrid nodes: NVIDIA Tesla K20X (6 GB GDDR5), Intel Xeon E5-2670, 32 GB DDR3
§ No multi-core nodes
§ Cray Aries dragonfly interconnect, ~33 TB/s bisection bandwidth
§ Sonexion Lustre file system, 2.7 PB (Sonexion 1600)

2016 → (Cray XC50/XC40)
§ 5,320 hybrid nodes (Cray XC50): NVIDIA Tesla P100 (16 GB HBM2), Intel Xeon E5-2690 v3, 64 GB DDR4
§ 1,431 multi-core nodes (Cray XC40): 2 x Intel Xeon E5-2695 v4, 64 and 128 GB DDR4
§ Cray Aries dragonfly interconnect, ~36 TB/s bisection bandwidth, fully provisioned for 28 cabinets
§ Sonexion Lustre file system, ~9 PB (Sonexion 3000) & 2.7 PB (Sonexion 1600)
§ External GPFS on selected nodes

Piz Daint: More Versatile Than Before

§ 2013: Computing, Visualization
§ 2016 → adds: Data analysis, Pre-/post-processing, Data mover, DataWarp, Machine learning, Deep learning

Overview of Hardware Infrastructure

[Infrastructure diagram: SWITCHlan (100 Gbit Ethernet), CSCS LAN, dedicated platforms, and the data center network (InfiniBand, Ethernet)]

LHConCray Collaborative Project

§ Team members from CHIPP (Swiss Institute of Particle Physics) and CSCS
§ Aim: to explore the efficiency of LHC workflows in a shared environment (Piz Daint)
§ Goals: full transparency to users while sustaining efficiency metrics, and supporting monitoring and accounting tools (complete workflow mapping for multiple experiments)

§ Publications, presentations and community meetings
  § G. Sciacca, "ATLAS and LHC Computing on Cray", CHEP, 2016
  § L. Benedicic, M. Gila et al., "Opportunities for container environments on Cray XC30 with GPU devices", Proceedings of the Cray User Group meeting, 2015
  § Status (Jan 2017): https://wiki.chipp.ch/twiki/pub/LCGTier2/MeetingLHConCRAY20170127/20170127.CSCS_CHIPP_F2F.pdf
  § Status (Aug 2017): https://wiki.chipp.ch/twiki/pub/LCGTier2/BlogAcceptanceTests2017/LHConCRAY-Run4_CSCS.pdf

§ Community tools (in production since April 2017)
  § https://wiki.chipp.ch/twiki/bin/view/LCGTier2/LHConCRAYMonitoring
  § http://ganglia.lcg.cscs.ch/ganglia/sltop_lhconcray.html

LHConCray Project

WLCG platform statistics (http://wlcg.web.cern.ch/tools)
§ ~170 sites in 40 countries
§ 350,000+ cores
§ 500+ PB
§ 2+ million jobs per day
§ 10-100 Gb links


Status of the LHConCray Project

§ Operational since April 2017
§ Monitor status and progress at http://wlcg.web.cern.ch/tools and https://wiki.chipp.ch/twiki/bin/view/LCGTier2/LHConCRAYMonitoring

§ Statistics
  § ~20% of total job submissions (<0.4% of total compute resources)
  § Over 90% of Docker/Shifter image pull requests are for LHC software (see the sketch below)
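As a concrete illustration of the container workflow behind those image pull requests, the following is a minimal, hypothetical Python sketch of pulling a container image with Shifter and launching it on a compute node through Slurm. The image name, srun flags and payload command are illustrative placeholders, and site-specific setup (modules, plugin configuration) is omitted; this is not the project's actual submission machinery.

    # Hypothetical sketch only: pull an image via the Shifter image gateway and
    # run it on a compute node through the Slurm scheduler.
    import subprocess

    IMAGE = "docker:cern/slc6-base:latest"  # placeholder LHC base image

    # Stage the image so it becomes available on the compute nodes.
    subprocess.run(["shifterimg", "pull", IMAGE], check=True)

    # Launch a containerized payload on one node via the scheduler.
    subprocess.run(
        ["srun", "--nodes=1",
         "shifter", "--image=" + IMAGE,
         "/bin/bash", "-c", "echo 'LHC payload would run here'"],
        check=True,
    )

In production the same pull-then-run pattern would typically be driven from batch scripts submitted by the experiments' pilot frameworks; the sketch only shows the pattern the statistics above refer to.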

§ Open items (tuning and optimization)
  § A data corruption patch introduced an issue for the swap test case
  § Continued investigation into DVS and DWS optimization and tuning

Bridging the Gap → Creating New Abstractions

§ Light-weight operating system (SLES-based)
  § Possible solution: containers or other virtualization interfaces
§ Diskless compute nodes
  § Possible solution: exploit burst buffer or tiered storage hierarchies
§ Compute node connectivity (high-speed Aries interconnect)
  § Possible solution: web services access with no address translation overhead (see the sketch below)
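To make the last point more tangible, here is a minimal sketch of the kind of web-service abstraction meant above: a client submits work to an HTTPS gateway that fronts the compute resources instead of talking to the Aries fabric directly. The endpoint URL, token handling and JSON payload are assumptions for illustration only, not an existing CSCS API.

    # Hypothetical sketch: job submission through a REST gateway in front of the
    # HPC system. Endpoint, authentication and payload format are assumed.
    import os
    import requests

    GATEWAY = "https://gateway.example.org/api/v1"      # placeholder endpoint
    TOKEN = os.environ.get("GATEWAY_TOKEN", "")         # delegated credential (assumed)

    resp = requests.post(
        GATEWAY + "/jobs",
        headers={"Authorization": "Bearer " + TOKEN},
        json={"script": "#!/bin/bash\nsrun ./app\n", "partition": "normal"},
        timeout=30,
    )
    resp.raise_for_status()
    print("submitted job:", resp.json().get("job_id"))

The design intent is that credentials, address translation and network isolation stay at the gateway, so external clients never need direct access to the internal interconnect.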

Future Policy and Technical Considerations
Convergence of HPC and Data Science Workflows

§ Resource Management Systems (job schedulers)
  § Too many jobs (relative to the HPC job mix)
  § Fine-grained control and interactive access

§ Resource Specialization (multi-level heterogeneity)
  § Subsets of nodes with special operating conditions, e.g. node sharing

§ Resource Access (authentication, authorization and accounting)
  § Delegation of access (service and user account mappings)

§ Resource Accessibility and Interoperability (middleware services)
  § Secure and efficient access through web services
  § Interoperability with multiple storage targets (POSIX & object); see the sketch below
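The sketch below illustrates the POSIX/object interoperability point: the same result file is written to a parallel file system path and pushed to an S3-compatible object store. The paths, bucket name and endpoint are illustrative assumptions, not actual CSCS targets.

    # Hypothetical sketch: write one result to a POSIX target and to an
    # S3-compatible object store. All names and endpoints are placeholders.
    import pathlib
    import boto3

    data = b"simulation output"

    # POSIX target, e.g. a scratch file system path (placeholder).
    posix_path = pathlib.Path("/scratch/project/results/run001.dat")
    posix_path.parent.mkdir(parents=True, exist_ok=True)
    posix_path.write_bytes(data)

    # Object target via an S3-compatible endpoint (placeholder endpoint and bucket).
    s3 = boto3.client("s3", endpoint_url="https://object.example.org")
    s3.put_object(Bucket="results", Key="run001.dat", Body=data)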

Thank you for your attention.