The Effect of InfiniBand In-Network Computing on CAE Simulations
HPC-AI Advisory Council
nafems.org/americas Simulation in the Automotive Industry: Creating the Next Generation Vehicle November 14th, 2019 | Troy, MI

The HPC-AI Advisory Council
• World-wide HPC non-profit organization
• More than 400 member companies / universities / organizations
• Bridges the gap between HPC-AI usage and its potential
• Provides best practices and a support/development center
• Explores future technologies and future developments
• Leading edge solutions and technology demonstrations

October 1st, 2019 | Columbus, OH

HPC Advisory Council Members

HPC-AI Advisory Council Cluster Center (Examples)
• Supermicro / Foxconn 32-node cluster
  – Dual Socket Intel(R) Xeon(R) Gold 6138 CPU @ 2.00GHz
• Dell™ PowerEdge™ R730/R630 36-node cluster
  – Dual Socket Intel® Xeon® 16-core CPUs E5-2697A V4 @ 2.60 GHz
• AMD Daytona_X
  – Dual Socket AMD Rome 128 core 8-node cluster @ 2.25GHz

Multiple Applications Best Practices Published
• Abaqus • ABySS • AcuSolve • Amber • AMG • AMR • ANSYS CFX • ANSYS FLUENT • ANSYS Mechanical • BQCD • BSMBench • CAM-SE • CCSM • CESM • COSMO • CP2K
• CPMD • Dacapo • Desmond • DL-POLY • Eclipse • FLOW-3D • GADGET-2 • Graph500 • GROMACS • Himeno • HIT3D • HOOMD-blue • HPCC • HPCG • HYCOM • ICON
• Lattice QCD • LAMMPS • LS-DYNA • miniFE • MILC • MSC Nastran • MrBayes • MM5 • MPQC • NAMD • Nekbone • NEMO • NWChem • Octopus • OpenAtom
• OpenFOAM • OpenMX • OptiStruct • PARATEC • PFA • PFLOTRAN • Quantum ESPRESSO • RADIOSS • SNAP • SPECFEM3D • STAR-CCM+ • STAR-CD • VASP • VSP • WRF

HPC-AI Advisory Council Activities
• HPC-AI Advisory Council
  – More than 400 members, http://www.hpcadvisorycouncil.com/
  – Application best practices, case studies
  – Development and benchmarking center with remote access for users
  – World-wide conferences
• Conferences
  – USA (Stanford University) – February
  – Switzerland (CSCS) – April
  – Student Cluster Competition (ISC) – July
  – China (HPC China) – August
  – Australia – August
  – UK – September
  – China – November
• Competitions
  – APAC HPC-AI Competition – March
  – China – 6th Annual RDMA Competition – May
  – ISC Germany – Annual Student Cluster Competition – June
• For more information
  – www.hpcadvisorycouncil.com
  – [email protected]

HPC|Works Community

Computing Evolution – Compute Centric to Data Centric
• Compute-Centric: Von Neumann Machine
• Data-Centric: Neural Networks

The Need for Intelligent Data Center
• CPU-Centric (Onload)
  – Move Data to the Compute
  – Must Wait for the Data
  – Creates Performance Bottlenecks
• Data-Centric (Offload)
  – Move Compute to the Data
  – Analyze Data Everywhere
  – Higher Performance and Scale
[Diagram labels: CPU, GPU, Onload Network]

Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)
• Reliable Scalable General Purpose Primitive
  – In-network Tree based aggregation mechanism
  – Large number of groups
  – Multiple simultaneous outstanding operations
• Applicable to Multiple Use-cases
  – HPC Applications using MPI / SHMEM
  – Distributed Machine Learning applications
• Scalable High Performance Collective Offload
  – Barrier, Reduce, All-Reduce, Broadcast and more
  – Sum, Min, Max, Min-loc, max-loc, OR, XOR, AND
  – Integer and Floating-Point, 16/32/64 bits
[Diagram labels: Host, Switch, Data, Aggregated Data, Aggregated Result]

SHARP AllReduce Performance Advantages (128 Nodes)

SHARP AllReduce Performance Advantages
1500 Nodes, 60K MPI Ranks, Dragonfly+ Topology
The Niagara Supercomputer – University of Toronto

OpenFOAM
• Open source toolbox of CFD applications that can simulate
  – Complex fluid flows involving chemical reactions, turbulence, and heat transfer
  – Solid dynamics
  – Electromagnetics
  – The pricing of financial options

OpenFOAM Profiling – MPI/User Time Ratio
• The OpenFOAM simpleFoam solver uses mainly non-blocking communications
• 23% of overall runtime is spent on MPI communication at 16 nodes / 640 MPI cores
• Intel MPI and HPC-X spend the same share of overall runtime on MPI communications
• Most of the MPI time is spent in non-blocking communications (MPI_Waitall 47%, MPI_Isend 47%)
• Most of the MPI calls made by OpenFOAM are MPI_Waitall

OpenFOAM Profiling – MPI Time
• MPI profiler shows the type of underlying MPI network communications
  – The majority of communications are non-blocking
• The majority of the MPI time is spent on non-blocking communications at 32 nodes
  – MPI_Waitall (11% wall), 8-byte MPI_Recv (1.4% wall), 1-byte MPI_Recv (0.7% wall)
  – Only 14% of the overall runtime is spent on MPI communications at 32 nodes (when EDR is used)

OpenFOAM Profiling – MPI Communication Topology
• Communication topology shows communication patterns among MPI ranks
• MPI processes mainly communicate with neighbors, but some other patterns also appear
[Figure panels: 4 Nodes, 8 Nodes, 16 Nodes, 32 Nodes]

OpenFOAM Performance
E5-2697A v4 @ 2.60GHz, HDR100
[Chart annotation: 23%]

OpenFOAM Performance
E5-2697A v4 @ 2.60GHz, HDR100
[Chart annotation: 50%]

OpenFOAM Performance Using (HPC-X 2.5 MPI)
[Chart annotation: 35%]

LS-DYNA
• LS-DYNA
  – A general purpose structural and fluid analysis simulation software package capable of simulating complex real world problems
  – Developed by the Livermore Software Technology Corporation (LSTC)
• LS-DYNA used by
  – Automobile
  – Aerospace
  – Construction
  – Military
  – Manufacturing
  – Bioengineering

2019, 28 - 29 October 35th INTERNATIONAL CAE CONFERENCE AND EXHIBITION

LS-DYNA Performance
Intel Xeon Gold 6138 CPU 2.00GHz, HDR100

LS-DYNA Performance
Intel Xeon Gold 6138 CPU 2.00GHz, HDR100
[Chart annotation: 39%]

ANSYS Fluent
• Computational Fluid Dynamics (CFD)
  – Enables the study of the dynamics of things that flow
  – Enables better understanding of qualitative and quantitative physical phenomena in the flow, which is used to improve engineering design
• CFD brings together a number of different disciplines
  – Fluid dynamics, mathematical theory of partial differential systems, computational geometry, numerical analysis, computer science
• ANSYS FLUENT is a leading CFD application from ANSYS
  – Widely used in almost every industry sector and manufactured product

ANSYS Fluent
E5-2697A v4 @ 2.60GHz, HDR100
[Chart annotation: 26%]

ANSYS Fluent
E5-2697A v4 @ 2.60GHz, HDR100
[Chart annotation: 15%]
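
Supplementary sketch: tree-based aggregation
The SHARP slide earlier in this deck describes an in-network, tree-based aggregation mechanism in which switches combine data from the hosts and return one aggregated result. The Python sketch below is only a toy model of that idea, not the SHARP protocol or any Mellanox/NVIDIA API. The function name tree_allreduce, the radix parameter (children per switch), and the balanced-tree layout are all assumptions made for illustration.

```python
from functools import reduce
import operator


def tree_allreduce(host_values, radix=2, op=operator.add):
    """Toy model of an all-reduce over a tree of switches.

    host_values: one value per host (hypothetical input).
    radix:       assumed number of children each switch aggregates.
    op:          associative reduction such as operator.add, min, or max.

    Returns (values_seen_by_hosts, switch_levels_traversed).
    """
    if not host_values:
        raise ValueError("need at least one host value")
    if radix < 2:
        raise ValueError("radix must be at least 2")

    level = list(host_values)
    levels = 0
    # Each pass models one layer of switches: every switch combines the
    # values arriving from up to `radix` children into a single value.
    while len(level) > 1:
        level = [
            reduce(op, level[i:i + radix])
            for i in range(0, len(level), radix)
        ]
        levels += 1

    # The root's aggregated result is sent back down to every host.
    result = level[0]
    return [result] * len(host_values), levels
```

For example, tree_allreduce([1, 2, 3, 4, 5]) returns ([15, 15, 15, 15, 15], 3): five hosts need three switch levels at radix 2, and every host receives the sum. The number of levels grows logarithmically with the host count, which is the property a tree-based reduction relies on; the real protocol's data types, group handling, and reliability mechanisms are not modeled here.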