Convergence between HPC and Big Data: The Day After Tomorrow
Maxime Martinasso
Swiss National Supercomputing Centre (CSCS / ETH Zurich)
Supercomputing 2018

Piz Daint and the User Lab

Model: XC40/XC50
XC50 Compute Nodes: Intel® Xeon® E5-2690 v3 @ 2.60GHz (12 cores, 64GB RAM) and NVIDIA® Tesla® P100 16GB
XC40 Compute Nodes: Intel® Xeon® E5-2695 v4 @ 2.10GHz (18 cores, 64/128 GB RAM)
Interconnect Configuration: Aries routing and communications ASIC, Dragonfly network topology
Scratch: ~9 + 2.7 PB capacity

New requirements to access HPC hardware

§ Definition of users and their capabilities
  § Workflow, scientific devices, web portal
  § User-defined software stack

§ Connectivity of HPC
  § Increasing connectivity of compute nodes (Internet, third-party services)
  § An HPC service needs an interface that other services can use (see the sketch below)
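A minimal sketch of what such an interface could look like from a third-party service's point of view. The gateway URL, payload fields, response format, and token handling are all assumptions for illustration, not an existing CSCS API.

```python
# Sketch: a third-party service submits a job through a hypothetical HPC REST API.
# Endpoint, payload, and response fields are illustrative assumptions.
import requests

API = "https://hpc.example.org/api/v1"   # hypothetical gateway URL
TOKEN = "..."                            # bearer token from an identity provider

resp = requests.post(
    f"{API}/jobs",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "script": "#!/bin/bash\nsrun ./simulate --steps 1000\n",
        "nodes": 4,
        "time_limit": "01:00:00",
    },
    timeout=30,
)
resp.raise_for_status()
job_id = resp.json()["job_id"]           # assumed response field
print(f"submitted job {job_id}")
```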

§ Interactivity
  § Jupyter notebooks (JHub, JLab, …), e.g. spawned through the batch system (configuration sketch below)
  § Community portal
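One common way to attach Jupyter notebooks to compute nodes is JupyterHub with the batchspawner package. A minimal jupyterhub_config.py sketch follows; the partition, runtime, and memory values are placeholders, not CSCS settings.

```python
# jupyterhub_config.py -- sketch of spawning notebook servers as Slurm jobs
# via the batchspawner package. All resource values are placeholders.
c.JupyterHub.spawner_class = "batchspawner.SlurmSpawner"

c.SlurmSpawner.req_partition = "interactive"   # assumed partition for notebook jobs
c.SlurmSpawner.req_runtime = "02:00:00"        # wall-clock limit per notebook session
c.SlurmSpawner.req_memory = "16G"              # memory requested for the session
```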

§ Data management
  § Data mover/broker service
  § Select physical storage: SSD and in-memory storage on compute nodes, Scratch, Archive (staging sketch below)
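A staging step at job start could, for example, copy hot input data from the shared scratch file system to node-local SSD or a RAM-backed tmpfs so the application reads locally. A minimal sketch with placeholder paths:

```python
# Sketch: stage input data from the shared parallel file system to
# node-local storage before compute; paths are placeholders.
import os
import shutil

SCRATCH = os.environ.get("SCRATCH", "/scratch/user")  # shared parallel FS
LOCAL = "/tmp"                                        # node-local SSD or tmpfs

src = os.path.join(SCRATCH, "dataset.h5")
dst = os.path.join(LOCAL, "dataset.h5")
shutil.copyfile(src, dst)  # stage in; stage results back out after the run
```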

Paul Scherrer Institute

§ PSI Mission
  § Study the internal structure of a wide range of different materials
  § Research facilities: the Swiss Light Source (SLS), the free-electron X-ray laser SwissFEL, the SINQ neutron source and the SμS muon source

§ PSI facility users reserve a scientific device for a period of time
  § Compute power should also be available during that period
  § Storage and archive availability during the experiment
  § Data retrievable after the experiment by the users of PSI facilities (not PSI)

§ Proposal to interface Piz Daint with their workflow (see the sketch after this list)
  § Use an API to access compute and data services (job scheduler, data mover)
  § Create a reservation service to reserve compute nodes
  § Provide a portal running on OpenStack to let PSI users access archived data at CSCS
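A sketch of how the proposed services could chain together for a PSI beamtime, assuming hypothetical reservation, job, and data-mover endpoints. Every URL, path, and field name here is illustrative, not an implemented interface.

```python
# Sketch of the proposed PSI workflow against hypothetical CSCS services:
# reserve nodes for a beamtime slot, run jobs inside the reservation, then
# ask a data mover to archive the results. All endpoints are illustrative.
import requests

API = "https://hpc.example.org/api/v1"
HDRS = {"Authorization": "Bearer ..."}   # token tied to the facility, not a person

# 1) Reserve compute capacity for the duration of the experiment.
r = requests.post(f"{API}/reservations", headers=HDRS, json={
    "nodes": 8, "start": "2018-11-15T08:00:00Z", "duration": "08:00:00",
}, timeout=30)
r.raise_for_status()
reservation = r.json()["name"]           # assumed response field

# 2) Submit analysis jobs into the reservation while the detector runs.
requests.post(f"{API}/jobs", headers=HDRS, json={
    "script": "#!/bin/bash\nsrun ./reconstruct frames/\n",
    "reservation": reservation,
}, timeout=30)

# 3) After the experiment, archive results so PSI users can retrieve them.
requests.post(f"{API}/datamover/archive", headers=HDRS, json={
    "source": "/scratch/psi/run-042", "target": "archive://psi/run-042",
}, timeout=30)
```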

Convergence of HPC and Big Data Science Workflows

[Architecture diagram: users reach CSCS from the Internet through a gateway with public IPs and a login node; APIs front the compute nodes (containers, local storage and in-memory data, interactive compute) and data science workflows; storage spans the HPC parallel file system, local storage, and archives, all on a software-defined infrastructure]

Challenges

§ Authentication and Authorization infrastructure
  § Enable multiple identity providers (not only users known by the HPC centre)
  § Identify “who” (workflow, scientific device, web portal, …) is authorized to use HPC services (see the sketch below)
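One way to handle tokens from several identity providers is to select the verification key by issuer before validating the signature. A minimal sketch with PyJWT; the issuer URLs, keys, and audience value are placeholders, and a real deployment would fetch keys from each provider's JWKS endpoint.

```python
# Sketch: accept JWTs from multiple trusted identity providers by looking up
# the issuer's public key, then verifying signature and audience with PyJWT.
import jwt  # PyJWT

TRUSTED_ISSUERS = {  # placeholder issuers and keys
    "https://idp.cscs.example/": "-----BEGIN PUBLIC KEY-----\n...",
    "https://idp.psi.example/": "-----BEGIN PUBLIC KEY-----\n...",
}

def authenticate(token: str) -> dict:
    # Read the issuer claim without trusting the token yet.
    unverified = jwt.decode(token, options={"verify_signature": False})
    key = TRUSTED_ISSUERS[unverified["iss"]]  # unknown issuer -> KeyError, reject
    # Full verification: signature, expiry, and intended audience.
    return jwt.decode(token, key, algorithms=["RS256"], audience="hpc-api")
```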

§ Data management
  § Complex data ownership and security with multiple identity providers
  § Automated staging in/out and transformation of data (POSIX to OpenStack Swift), as sketched below
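For the POSIX-to-Swift direction, a staging step could upload files from the parallel file system into object storage with python-swiftclient. A sketch with placeholder credentials, container, and paths:

```python
# Sketch: upload a file from the POSIX parallel file system into an
# OpenStack Swift container using python-swiftclient. All names are
# placeholders; a real setup would use the site's auth endpoint.
from swiftclient.client import Connection

conn = Connection(
    authurl="https://swift.example.org/auth/v1.0",  # placeholder auth endpoint
    user="project:user",
    key="secret",
)

conn.put_container("run-042")  # ensure the target container exists
with open("/scratch/psi/run-042/frames.h5", "rb") as f:
    conn.put_object("run-042", "frames.h5", contents=f)
```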

§ Workflow systems
  § Which workflow engines or standards to support?
  § Enable access to HPC services via a REST API (compute, data, reservation)
  § Interactive service and batch scheduling (preemption, priority), as sketched below
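Interactive and batch work can coexist by mapping interactive sessions to a higher-priority QOS that is allowed to preempt batch jobs. A sketch using Slurm's sbatch; the QOS names and the preemption policy behind them are site configuration assumed for illustration.

```python
# Sketch: route interactive sessions to a high-priority QOS and batch work to
# a normal one; preemption between them is configured site-side in Slurm.
import subprocess

def submit(script: str, interactive: bool) -> str:
    qos = "interactive" if interactive else "normal"  # assumed QOS names
    out = subprocess.run(
        ["sbatch", "--parsable", f"--qos={qos}", script],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()  # --parsable prints the job id

job_id = submit("notebook_server.sbatch", interactive=True)
```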

Thank you for your attention.