DOE and NSF: Present and Upcoming HPC Facilities


Doug Benjamin, Argonne National Lab

WLCG–HSF–OSG meeting, JLab

Introduction and acknowledgements
. This talk contains:
– An evolution time-line, intentionally vague (from several months to several years)
– Selected examples of how HPCs are evolving
– Details on existing machines in the US and their future plans
– A future plan for Japan

2 Current DOE machines
. OLCF
– Titan (shutting down this summer)
. ALCF
– Mira
– Theta
. NERSC
– Cori

3 NSF XSEDE

4 TACC Stampede2

5 DOE - OLCF
. At OLCF, Titan is shutting down this summer
. The new machine, Summit, is online now. Available through the INCITE/ALCC allocation process

6 DOE - ALCF

Theta System Configuration
• 24 racks
• 4,392 nodes
• 281,088 cores
• 70.272 TB MCDRAM
• 843.264 TB DDR4
• 562.176 TB SSD
• Aries interconnect with Dragonfly configuration
• 10 PB Lustre file system
• Peak performance of 11.69 petaflops
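The aggregate figures divide evenly by the node count. As a quick sanity check, the short script below recovers the commonly quoted per-node KNL configuration; note that the per-node values are an assumption here, not something stated on the slide.

# Sanity check: Theta's aggregate specs follow from 4,392 nodes,
# assuming the commonly quoted per-node KNL node configuration.
nodes = 4392
assert nodes * 64 == 281_088    # 64 cores per node
assert nodes * 16 == 70_272     # 16 GB MCDRAM per node -> 70.272 TB
assert nodes * 192 == 843_264   # 192 GB DDR4 per node  -> 843.264 TB
assert nodes * 128 == 562_176   # 128 GB SSD per node   -> 562.176 TB
print(11.69e3 / nodes)          # ~2.66 TF peak per node, consistent with KNL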

7 DOE – NERSC – Cori

Two phases
• Phase 1 – Intel Xeon (Haswell)
• Phase 2 – Intel Xeon Phi (KNL, Knights Landing)

8 Big US HPC TeraFlop/s (Top500.org)

Nov 2018 Top500 ranking (TeraFlop/s):
#1  Summit – Power9/GPU     143,500
#12 Cori – Haswell & KNL     14,014.7
#17 Stampede2                10,680.7
#21 Mira – IBM BG/Q           8,586.612
#24 Theta – KNL               6,920.9

9 Wide Area Data Transfers – common tool - Globus

Common tool recommended by all large HPC centers – Globus
• Web interface for users
• Python SDK for programmatic access
– Used by ATLAS for ~2 years
– Note: not integrated with Rucio

10 Is there a common tool for job submission?
. In a word: NO.
. Missing: a common API for submitting jobs to the HPC job scheduler from outside the HPC center.

. Globus provides a common API/service to transfer files between HPCs and to other locations.
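As a rough sketch of the programmatic route, a minimal transfer with the Globus Python SDK looks roughly like the code below. The client ID, endpoint UUIDs, and paths are placeholders, and the actual ATLAS integration is more involved than this.

# Minimal Globus transfer sketch using the globus-sdk Python package.
# CLIENT_ID and the endpoint UUIDs are placeholders, not real values.
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"

# Interactive native-app login: print a URL, paste back the auth code.
auth = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth.oauth2_start_flow()
print("Log in at:", auth.oauth2_get_authorize_url())
tokens = auth.oauth2_exchange_code_for_tokens(input("Auth code: ").strip())
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

# Submit an asynchronous transfer between two Globus endpoints.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token))
tdata = globus_sdk.TransferData(tc, "SRC-ENDPOINT-UUID", "DST-ENDPOINT-UUID",
                                label="example transfer")
tdata.add_item("/path/on/source/file.dat", "/path/on/dest/file.dat")
task = tc.submit_transfer(tdata)
print("Submitted task:", task["task_id"])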

. Proposal:
– The HEP experiments come up with a common set of criteria.
– We approach the HPC centers with our problem and see how we can collaborate with them on a solution.
• We cannot dictate our solution to them, but we can work with them toward one.
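To make the gap concrete, the sketch below is a hypothetical wrapper, not an existing tool: it already has to special-case each center's batch system, and even then it only works when run inside the center, since none of them expose a standard remote submission API the way Globus does for transfers.

# Hypothetical wrapper illustrating the missing common job-submission API:
# each center runs a different batch system with its own submit command.
import subprocess

SUBMIT_COMMANDS = {
    "slurm":  ["sbatch"],   # e.g. NERSC Cori
    "cobalt": ["qsub"],     # e.g. ALCF Theta/Mira
    "lsf":    ["bsub"],     # e.g. OLCF Summit
}                           # (LSF conventionally reads the script on stdin:
                            #  bsub < job.sh; bsub also accepts a script argument)

def submit(scheduler: str, script: str) -> str:
    """Submit a batch script locally; return the scheduler's stdout."""
    try:
        cmd = SUBMIT_COMMANDS[scheduler] + [script]
    except KeyError:
        raise ValueError(f"unknown scheduler: {scheduler}")
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout    # output format (job id, etc.) also differs per system

# submit("slurm", "job.sh")  # still requires a shell *inside* the HPC center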

11 NEW FUTURE MACHINES

12 NSF

13 Evolution in the Bay Area (i.e. NERSC) – Perlmutter
. To be delivered in 2020
. Heterogeneous system
. 3 times more powerful than Cori (Cori = 14 PF), i.e. ~40 PF
. CPU cores for CPU-only nodes
. CPU–GPU nodes for ML and the like


15 Aurora

“The foundation of the Aurora supercomputer will be new technologies designed specifically for the convergence of artificial intelligence and high-performance computing at extreme computing scale. These include a future generation of Intel® Xeon® Scalable processor, Intel’s Xe compute architecture, a future generation of Intel® Optane™ DC Persistent Memory and Intel’s One API software. Aurora will use Cray’s next-generation supercomputer system, code-named “Shasta,” which will comprise more than 200 cabinets and include Cray’s Slingshot™ high-performance scalable interconnect and the Shasta software stack optimized for Intel architecture.”
https://www.anl.gov/article/us-department-of-energy-and-intel-to-deliver-first-exascale-supercomputer

16 What is Intel’s Xe Compute Architecture?
“The architecture is a key element of this bold, new engineering vision that we announced in December. Intel Xe architecture spans multiple computing and graphics market segments and will include a range of implementations that will allow us to address a wide range of markets and workloads, from mainstream notebooks to enthusiast game systems, to powerful computing solutions for the data center.”

David Blythe, Chief Architect, Xe – https://itpeernetwork.intel.com/intel-xe-compute

17 Timeline to exascale in the US (ALCF and OLCF)
[Timeline figures for ALCF and OLCF, each starting from now]
Each machine will have a different architecture.

18 Japan – Post-K – exascale – many low-power cores

ARM – many low-power cores

19 Future HPC machines are a very heterogeneous lot
– Low-power cores – ARM
– Non-GPU accelerators
– Powerful GPU machines

Conclusions:
• Worldwide, the biggest HPCs will all be different
• Should HEP experiments make it a priority to target software ports to different architectures (for example ARM or accelerators)?
• Where will the effort come from?
• How we access the HPCs is evolving
• Using these big resources is a software issue

Extra slides

22 NSF XSEDE Startup Allocations

23 References
. HPC ASIA Conference: http://sighpc.ipsj.or.jp/HPCAsia2018/
– Invited speaker, “An Overview of Post-K Development”, Dr. Yutaka Ishikawa (RIKEN AICS): http://sighpc.ipsj.or.jp/HPCAsia2018/images/YutakaIshikawa_slides.pdf
. Fujitsu talks and press releases on Post-K
– http://www.fujitsu.com/global/about/resources/news/press-releases/2018/0621-01.html
– ISC 2018: http://www.fujitsu.com/global/solutions/business-technology/tc/events/isc18/
– http://www.fujitsu.com/global/Images/post-k-computer-development.pdf
– http://www.fujitsu.com/global/Images/post-k-supercomputer-for-application-performance.pdf
– SC 2018: http://www.fujitsu.com/global/solutions/business-technology/tc/events/sc18/index.html
– http://www.fujitsu.com/global/Images/post-k_supercomputer_with_fujitsu%27s_original_cpu_a64fx_powered_by_arm_isa.pdf

References (2)
. Summit @ OLCF
– https://www.olcf.ornl.gov/summit/
– https://www.olcf.ornl.gov/olcf-resources/compute-systems/summit/
. Cori/Perlmutter @ NERSC
– https://www.nersc.gov/systems/perlmutter/
– https://www.nersc.gov/users/computational-systems/cori/
– https://www.energy.gov/articles/doe-build-next-generation-supercomputer-lawrence-berkeley-national-laboratory
. Exascale at US HPCs
– https://www.alcf.anl.gov/alcf-aurora-2021-early-science-program-data-and-learning-call-proposals
– https://www.olcf.ornl.gov/2018/02/13/frontier-olcfs-exascale-future/
. Theta/Aurora @ ANL
– https://www.anl.gov/article/us-department-of-energy-and-intel-to-deliver-first-exascale-supercomputer
– https://www.tomshardware.com/news/intel-exascale-aurora-supercomputer-xe-graphics,38851.html
– https://www.alcf.anl.gov/theta
– https://itpeernetwork.intel.com/intel-xe-compute
. Stampede2/Frontera @ TACC
– https://www.tacc.utexas.edu/systems/stampede2
– https://www.tacc.utexas.edu/systems/frontera

References (3)
. Data transfer and management
– https://portal.xsede.org/data-management
– https://www.olcf.ornl.gov/for-users/system-user-guides/summit/summit-user-guide/#data-storage-&-transfers
– https://www.nersc.gov/users/data-analytics/data-transfer/
– https://www.alcf.anl.gov/user-guides/data-transfer
– https://www.globus.org/

. Miscellaneous links
– https://www.hpcwire.com/2018/11/16/dell-emcs-hpc-chief-on-strategy-and-emerging-processor-diversity/
– https://www.hpcwire.com/2018/11/12/us-leads-number-one-number-two-petascale-arm/
– https://www.hpcwire.com/2018/11/08/ceas-pick-of-thunderx2-based-atos-system-boosts-arm/ (Europe)
. AMD EPYC
– https://www.hpcwire.com/2018/11/16/at-sc18-amd-sets-up-for-epyc-epoch/
. NVIDIA Tensor Core + GPU
– https://www.nvidia.com/en-us/data-center/tensorcore/