Understanding the Usage of MPI in Exascale Proxy Applications

Understanding the Usage of MPI in Exascale Proxy Applications Nawrin Sultana Anthony Skjellum Purushotham Bangalore Auburn University University of Tennessee at Chattanooga University of Alabama at Birmingham Auburn, AL Chattanooga, TA Birmingham, AL [email protected] [email protected] [email protected] Ignacio Laguna Kathryn Mohror Lawrence Livermore National Laboratory Lawrence Livermore National Laboratory Livermore, CA Livermore, CA [email protected] [email protected] Abstract—The Exascale Computing Project (ECP) focuses on suite [5] consists of proxies of ECP projects that represent the the development of future exascale-capable applications. Most key characteristics of these exascale applications. ECP applications use the Message Passing Interface (MPI) as In this paper, we explore the usage patterns of MPI among their parallel programming model and create mini-apps to serve as proxies. This paper explores the explicit usage of MPI in ECP proxy applications. We explore fourteen proxy appli- ECP proxy applications. For our study, we empirically analyze cations in particular from the ECP Proxy Apps Suite and fourteen proxy applications from the ECP Proxy Apps Suite. Our summarize the MPI functionality used in each. An earlier result shows that a small subset of features from MPI is com- survey [6] on different U.S. DOE applications shows that monly used in the proxies of exascale capable applications, even the HPC community will continue to use MPI in future when they reference third-party libraries. Our study contributes to a better understanding of the use of MPI in current exascale exascale systems for its flexibility, portability, and efficiency. applications. This finding can help focus software investments We study the proxies of those applications to understand how made for exascale systems in the MPI middleware including much and what features of the MPI standard they are using. optimization, fault-tolerance, tuning, and hardware-offload. This provides appropriate guidance to developers of full MPI Index Terms—Exascale Proxy Applications, MPI, Empirical implementations as well as extremely efficient subsets for Analysis future systems. There are multiple open source MPI implementations avail- I. INTRODUCTION able (e.g., OpenMPI [7], MPICH [8], and MVAPICH [9]), In high-performance computing (HPC) systems, MPI is an and many high-end vendors base their commercial versions extensively used API for data communication [1]. A con- on these source bases (e.g., Intel, Cray, and IBM). All of siderable number of large-scale scientific applications have the currently available MPI implementations are monoliths. been written based on MPI. These applications are success- However, sometimes a small subset of MPI is capable of fully running in current petascale systems with 1,000’s to running the proxy of a real application [10]. Large-scale appli- 100,000’s of processors [2] and will continue to run in future cations often use only a subset of MPI functionality. Our work exascale systems [3]. To increase the applicability of MPI in contributes to a better understanding of MPI usage in mini- next-generation exascale systems, the MPI Forum is actively apps that represent real exascale applications. It shows that working on updating and incorporating new constructs to MPI a small subset of features from MPI standard are commonly standard. used in the proxies of future exascale capable applications. The Exascale Computing Project (ECP) [4] is an effort to The remainder of this paper is organized as follows: Section accelerate the development and delivery of a capable exascale II discusses the overview of the proxy apps used in our study. computing ecosystem. Different ECP applications use different Section III demonstrates the MPI usage patterns among those parallel programming models. One of the focuses of ECP is to applications. Finally, we offer conclusions in section IV. provide exascale capability and continued relevance to MPI. These ECP applications are typically large and complex with II. OVERVIEW OF APPLICATIONS thousands to millions lines of code. As a means to access For our study, we targeted ECP Proxy Applications Suite 2.0 their performance and capabilities, most of the applications [5], which contains 15 proxy apps. We focused on applications create “mini-apps” to serve as their proxies or exemplars. that use MPI. We found 14 applications in the suite that use Proxy application represents the key characteristics of the real MPI for communication. Some of these applications use a application without sharing the actual details. As part of ECP, hybrid model (OpenMP and MPI) of parallel programming multiple proxy apps have been developed. The ECP proxy apps where OpenMP is used for parallelism within a node while TABLE I APPLICATION OVERVIEW Application Language Description Third-party Library Programming Model Parallel algebric multigrid solver for linear systems AMG C hypre MPI, OpenMP arising from problems on unstructured grids Represent highly simplified communication patterns Ember C N/A MPI relevant to DOE application workloads Proxy application for Molecular Dynamics with ExaMiniMD C++ N/A MPI a Modular design Solves Euler equation of compressible gas dynamics Laghos C++ hypre, MFEM, Metis MPI using unstructured high-order finite elements Multi-purpose, scalable I/O proxy application that MACSio C HDF5, Silo MPI mimics I/O worklods of real applications Applies a 3D stencil calculation on a unit cube miniAMR C N/A MPI, OpenMP computational domain– divided into blocks Designed to evaluate different programming models miniQMC C++ N/A MPI, OpenMP for performance portability Detects graph community by implementing Louvain miniVite C++ N/A MPI, OpenMP method in distributed memory Thermal Hydraulics mini-app that solves a standard NEKbone Fortran N/A MPI Poisson equation Portrays the computational loads and dataflow of PICSARlite Fortran N/A MPI complex Particle-In-Cell codes Solves the seismic wave equations in Cartesian SW4lite C, Fortran N/A MPI, OpenMP coordinates Run 3D distributed memory discrete fast Fourier SWFFT C FFTW3 (MPI interface not used) MPI transform Solves radiative trasnfer equation in a multi-group thornado-mini Fortran HDF5 MPI two-moment approximation Represents a key computational kernel of the XSBench C N/A MPI, OpenMP Monte Carlo neutronics application MPI is used for parallelism among nodes. We empirically The majority of applications (93%) of Table I use analyzed these fourteen applications to understand how they MPI_Init for initialization. Only one application (PICSAR- use MPI. For the purpose of our study, we only did static code lite) uses MPI thread based initialization. In PICSARlite, analysis. MPI_Init_thread is called with MPI_THREAD_SINGLE Table I provides an overview of the applications. Although and MPI_THREAD_FUNNELED. In both cases, only one some of the applications are written in C++, they all use C or thread makes the MPI calls. Fortran to call MPI routines. A number of applications (36%) are dependent on third-party software libraries. However, not B. MPI Communication all of these libraries are MPI-based. Four of the applications— The MPI standard [13], [14] provides different techniques AMG, Laghos, MACSio, and thornado-mini use MPI-based for communication among processes. A set of processes are third-party numerical and I/O libraries. managed using “communicator” – an object that allows com- Almost 50% of the applications (6) use both MPI and munication isolation for a group of processes. OpenMP as their parallel programming model. The hypre [11] Point-to-point. Transmit message between a pair of pro- library also uses OpenMP along with MPI. The data model cesses where sender and receiver cooperate with each other, library, HDF5 [12] uses “pthread” as its parallel execution which is referred to as “two-sided” communication. To com- model. municate a message, the source process calls a send operation and the target process must call a receive operation. MPI III. USAGE OF MPI IN ECP PROXY APPS provides both blocking and non-blocking forms of point-to- point communications. In this section, we analyze the proxy applications to eluci- Wildcard. MPI allows the receive and probe operations to date their MPI usage patterns. specify a wildcard value for source and/or tag. It indicates that a process will accept a message from any source and/or A. MPI Initialization tag. The source wildcard is MPI_ANY_SOURCE and the tag All MPI programs must call MPI_Init or wildcard is MPI_ANY_TAG. The scope of these wildcards is MPI_Init_thread to initialize the MPI execution limited to the processes of the specified communicator. environment [13]. MPI can be initialized at most once. Collective. A communication that involves participation of In addition to initializing MPI, MPI_Init_thread also all processes of a given communicator. All processes of the initializes the MPI thread environment. It requests the desired given communicator need to make the collective call. Collec- level of thread support using the argument “required”. tive communications do not use tag. Collective communica- TABLE II MPI CALLS USED FOR COMMUNICATIONS Point-to-point Collective Application Blocking Non-blocking Blocking Non-blocking MPI Allreduce, MPI Reduce MPI Allgatherfvg, MPI Gatherfvg MPI Isend MPI Send MPI Alltoall AMG, Laghos MPI Irsend N/A MPI Recv MPI Barrier, MPI Bcast MPI Irecv MPI Scan MPI Scatterfvg MPI Send MPI Isend Ember MPI Barrier N/A MPI Recv MPI Irecv MPI Allreduce ExaMiniMD MPI Send MPI Irecv MPI Barrier

Load more