Creating Abstractions for Piz Daint and its Ecosystem

Sadaf Alam
Chief Technology Officer, Swiss National Supercomputing Centre
November 16, 2017

CSCS in a Nutshell
§ A unit of the Swiss Federal Institute of Technology in Zurich (ETH Zurich)
  § Founded in 1991 in Manno; relocated to Lugano in 2012
§ Develops and promotes technical and scientific services for the Swiss research community in the field of high-performance computing
§ Enables world-class scientific research by pioneering, operating and supporting leading-edge supercomputing technologies
§ Employs 90+ people from 15+ different nations
Supercomputing, 2017

Piz Daint and the User Lab
http://www.cscs.ch/uploads/tx_factsheet/FSPizDaint_2017_EN.pdf
http://www.cscs.ch/publications/highlights/
http://www.cscs.ch/uploads/tx_factsheet/AR2016_Online.pdf
Model                Cray XC40/XC50
XC50 compute nodes   Intel® Xeon® E5-2690 v3 @ 2.60 GHz (12 cores, 64 GB RAM) and NVIDIA® Tesla® P100 (16 GB)
XC40 compute nodes   Intel® Xeon® E5-2695 v4 @ 2.10 GHz (18 cores, 64/128 GB RAM)
Interconnect         Aries routing and communications ASIC, Dragonfly network topology
Scratch capacity     ~9 + 2.7 PB
Piz Daint (2013 → 2016 →)
2013 (Cray XC30):
§ 5,272 hybrid nodes (Cray XC30)
  § Nvidia Tesla K20x, 6 GB GDDR5
  § Intel Xeon E5-2670, 32 GB DDR3
§ No multi-core nodes
§ Cray Aries dragonfly interconnect, ~33 TB/s bisection bandwidth
§ Sonexion Lustre file system, 2.7 PB (Sonexion 1600)

2016 →:
§ 5,320 hybrid nodes (Cray XC50)
  § Nvidia Tesla P100, 16 GB HBM2
  § Intel Xeon E5-2690 v3, 64 GB DDR4
§ 1,431 multi-core nodes (Cray XC40)
  § 2 x Intel Xeon E5-2695 v4, 64 and 128 GB DDR4
§ Cray Aries dragonfly interconnect, ~36 TB/s bisection bandwidth, fully provisioned for 28 cabinets
§ Sonexion Lustre file system, ~9 PB (Sonexion 3000) & 2.7 PB (Sonexion 1600)
§ External GPFS on selected nodes
Piz Daint – More Versatile Than Before
2013:
§ Computing
§ Visualization
§ Data analysis
§ Pre-/post-processing

2016 → (in addition):
§ Data mover
§ DataWarp
§ Machine learning
§ Deep learning
Overview of Hardware Infrastructure
[Diagram: SWITCHlan (100 Gbit Ethernet) connecting to the CSCS LAN, dedicated platforms, and the Data Center Network (InfiniBand, Ethernet)]
LHConCray Collaborative Project
§ Team members from CHIPP (Swiss Institute of Particle Physics) and CSCS
§ Aim: to explore the efficiency of LHC workflows in a shared environment (Piz Daint)
§ Goals: full transparency to users while sustaining efficiency metrics and supporting monitoring and accounting tools (complete workflow mapping for multiple experiments)
§ Publications, presentations and community meetings
  § G. Sciacca, "ATLAS and LHC Computing on Cray", CHEP, 2016
  § L. Benedicic, M. Gila et al., "Opportunities for container environments on Cray XC30 with GPU devices", Proceedings of the Cray User Group meeting, 2015
§ Status (Jan 2017): https://wiki.chipp.ch/twiki/pub/LCGTier2/MeetingLHConCRAY20170127/20170127.CSCS_CHIPP_F2F.pdf
§ Status (Aug 2017): https://wiki.chipp.ch/twiki/pub/LCGTier2/BlogAcceptanceTests2017/LHConCRAY-Run4_CSCS.pdf
§ Community tools (in production since April 2017)
  § https://wiki.chipp.ch/twiki/bin/view/LCGTier2/LHConCRAYMonitoring
  § http://ganglia.lcg.cscs.ch/ganglia/sltop_lhconcray.html
LHConCray Project
WLCG platform statistics (http://wlcg.web.cern.ch/tools):
§ ~170 sites in 40 countries
§ 350,000+ cores
§ 500+ PB
§ 2+ million jobs per day
§ 10–100 Gb links
Status of LHConCray Project
§ Operational since April 2017
§ Monitor status and progress at http://wlcg.web.cern.ch/tools and https://wiki.chipp.ch/twiki/bin/view/LCGTier2/LHConCRAYMonitoring
§ Statistics
  § ~20% of total job submissions (<0.4% of total compute resources)
  § Over 90% of Docker/Shifter image pull requests are for LHC software
§ Open items (tuning and optimization)
  § A data corruption patch generated an issue for the swap test case
  § Continued investigation into DVS and DWS optimization and tuning
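Most of those image pulls go through Shifter, which runs Docker images natively on the Cray compute nodes. As a sketch only (the job name, image and command are placeholders, and the exact options depend on the site's Shifter/Slurm plugin configuration), a containerized batch job looks roughly like this:

```shell
#!/bin/bash
#SBATCH --job-name=lhc-payload        # placeholder job name
#SBATCH --nodes=1
#SBATCH --time=00:30:00
#SBATCH --image=docker:centos:7       # image pulled beforehand with: shifterimg pull docker:centos:7

# Launch the containerized payload through the Shifter runtime on the compute node
srun shifter /bin/bash -c "cat /etc/os-release"
```

The point of the abstraction is that the user's workflow sees a familiar Linux userland inside the image, while the host keeps its lightweight Cray OS.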
Bridging the Gap → Creating New Abstractions

§ Light-weight operating system (SLES based)
  § Possible solution: containers or other virtualization interfaces
§ Diskless compute nodes
  § Possible solution: exploit burst buffers or tiered storage hierarchies
§ Compute node connectivity (high-speed Aries interconnect)
  § Possible solution: web-services access with no address-translation overhead
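For the diskless-node gap, a Cray DataWarp (burst buffer) allocation can be requested directly from the batch script. A minimal sketch, assuming the Slurm burst-buffer plugin is configured; capacities and the /scratch paths are placeholders:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=01:00:00

# Request a per-job DataWarp (burst buffer) allocation
#DW jobdw capacity=200GiB access_mode=striped type=scratch

# Stage input into the burst buffer before the job starts,
# and stage results back to the parallel file system afterwards
#DW stage_in  source=/scratch/user/input    destination=$DW_JOB_STRIPED/input  type=directory
#DW stage_out source=$DW_JOB_STRIPED/output destination=/scratch/user/output   type=directory

srun ./app --in "$DW_JOB_STRIPED/input" --out "$DW_JOB_STRIPED/output"
```

The job sees fast node-local-style storage under $DW_JOB_STRIPED without any local disks on the compute nodes.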
Future Policy and Technical Considerations: Convergence of HPC and Data Science Workflows
§ Resource Management Systems (job schedulers)
  § Too many jobs (relative to the HPC job mix)
  § Fine-grained control and interactive access
§ Resource Specialization (multi-level heterogeneity)
  § Subsets of nodes with special operating conditions, e.g. node sharing
§ Resource Access (authentication, authorization and accounting)
  § Delegation of access (service and user account mappings)
§ Resource Accessibility and Interoperability (middleware services)
  § Secure and efficient access through web services
  § Interoperability with multiple storage targets (POSIX & object)
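The POSIX-versus-object interoperability point can be made concrete with a small sketch. The class and function names below are hypothetical (not part of any CSCS middleware); they just illustrate one way a middleware layer might expose a uniform put/get API over a POSIX directory and an object-store-style target:

```python
import tempfile
from pathlib import Path

class PosixTarget:
    """Stores payloads as files under a root directory (POSIX semantics)."""
    def __init__(self, root):
        self.root = Path(root)

    def put(self, key: str, data: bytes):
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)  # keys may contain '/'
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()

class ObjectTarget:
    """Stands in for an object store (S3-style flat key space), here in memory."""
    def __init__(self):
        self._bucket = {}

    def put(self, key: str, data: bytes):
        self._bucket[key] = data

    def get(self, key: str) -> bytes:
        return self._bucket[key]

def replicate(key: str, data: bytes, targets):
    """Write the same payload to every configured storage target."""
    for target in targets:
        target.put(key, data)

targets = [PosixTarget(tempfile.mkdtemp()), ObjectTarget()]
replicate("results/run1.dat", b"payload", targets)
print(all(t.get("results/run1.dat") == b"payload" for t in targets))  # True
```

The same pattern extends to staging between targets, which is where delegation and accounting hooks would attach in a real middleware service.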
Thank you for your attention.