Implementation of a Message Passing Interface Into a Cloud-Resolving Model for Massively Parallel Computing

Hann-Ming Henry Juang¹, Wei-Kuo Tao², Xiping Zeng²,³, Chung-Lin Shie²,³, Joanne Simpson² and Steve Lang⁴

¹ Environmental Modeling Center, NCEP, NOAA, Washington, DC
² Laboratory for Atmospheres, NASA/Goddard Space Flight Center, Greenbelt, MD 20771, USA
³ Goddard Earth Sciences and Technology Center, University of Maryland, Baltimore County, Baltimore, MD
⁴ Science Systems and Applications Inc., Lanham, MD 20706

June 2, 2004

Mon. Wea. Rev.

Corresponding author address: Dr. Hann-Ming Henry Juang, Environmental Modeling Center, NCEP, NOAA, W/NP5 Room 204, 5200 Auth Road, Camp Springs, MD 20746

Abstract

The capability for massively parallel programming (MPP) using a message passing interface (MPI) has been implemented into a three-dimensional version of the Goddard Cumulus Ensemble (GCE) model. The design for the MPP with MPI maintains a similar code structure for the whole domain and for the portions after decomposition, so the model follows the same integration for single and multiple tasks (CPUs). It also requires minimal changes to the original code, so the code is easily modified and/or managed by model developers and users who have little knowledge of MPP. The entire model domain can be sliced into a one- or two-dimensional decomposition with a halo regime, which is overlaid on the partial domains. The halo regime requires that no data be fetched across tasks during the computational stage, but the halo must be updated before the next computational stage through data exchange via MPI. For reproducibility, transposing data among tasks is required for the spectral transform (Fast Fourier Transform, FFT), which is used in the anelastic version of the model for solving the pressure equation. The performance of the MPI-implemented codes (i.e., the compressible and anelastic versions) was tested on three different computing platforms.
The major results are: 1) both versions have speedups of about 99% up to 256 tasks but not for 512 tasks; 2) the anelastic version has better speedup and efficiency because it requires more computations than the compressible version; 3) equal or approximately equal numbers of slices in the x- and y-directions provide the fastest integration due to fewer data exchanges; and 4) one-dimensional slices in the x-direction result in the slowest integration due to the need for more memory relocation for computation.

1. Introduction

Cloud-resolving models (CRMs), which are based on the non-hydrostatic equations of motion, have been extensively applied to cloud-scale and mesoscale processes during the past four decades (see a brief review by Tao 2003). Table 1 lists the major foci and some (not all) of the key contributors to CRM development over the past four decades. Because cloud-scale dynamics are treated explicitly, uncertainties stemming from convection that has to be parameterized in large-scale (hydrostatic) models are obviated, or at least mitigated, in CRMs. Also, CRMs solve the equations of motion with much higher spatial and temporal resolution and use more sophisticated and physically realistic parameterizations of cloud microphysical processes (although these are by no means perfect yet). CRMs also allow explicit interactions between clouds, radiation and surface processes. For this reason, the Global Energy and Water Cycle Experiment (GEWEX) formed the GEWEX Cloud System Study (GCSS), which chose CRMs as the primary approach to improve the representation of moist processes in large-scale models (GCSS Science Plan 1993; Randall et al. 2003). Global models will use a non-hydrostatic framework with horizontal resolutions of 5-10 km one to two decades from now.

In recent years, exponentially increasing computer power has extended CRM integrations from hours to months (e.g., Wu et al. 1998) and the number of computational grid points from less than a thousand to close to ten million (Grabowski and Moncrieff 2001). Three-dimensional CRMs are now more prevalent. Much attention is being devoted to precipitating cloud systems where the crucial 1 km scales are resolved in horizontal domains as large as 10,000 km in two dimensions and 1,000 x 1,000 km² in three dimensions. However, many CRMs need to be re-programmed (re-coded) in order to fully utilize the fast advancement of computing technology (e.g., massively parallel processors)¹.

This paper presents the design for massively parallel programming (MPP) with a message passing interface (MPI) as implemented in a three-dimensional (3D) version of a CRM, the Goddard Cumulus Ensemble (GCE) model. It covers the concept of the MPI implementation, along with the method of domain decomposition and data communication to avoid the aforementioned risks. Section 2 gives a brief description of the GCE model. Section 3 describes the MPI implementation. Section 4 presents the performance of the model with the MPI implementation with regard to the model dynamics (anelastic and compressible), stability, speedup, efficiency, reproducibility and wall-clock comparisons among different decompositions, tasks and dimensions on three different computing platforms. The summary and conclusions are given in section 5, along with future model developments.

¹ IBM Blue Gene Lite supercomputer (estimated by IBM representatives to be 5-10 times more powerful than the Japanese Earth Simulator) will be delivered in 12 months with 65,000 CPU nodes and as many as 1

2. Goddard Cumulus Ensemble (GCE) model

2.1 GCE model description and applications

The Goddard Cumulus Ensemble (GCE) model has been developed and improved at NASA/Goddard Space Flight Center over the past two decades.
The development and main features of the GCE model have been extensively described by Tao and Simpson (1993) and Tao et al. (2003a). Recent improvements and testing were presented in Ferrier (1994), Tao et al. (1996), Wang et al. (1996), Lynn et al. (1998), Baker et al. (2001) and Tao et al. (2003b). A Kessler-type two-category liquid water (cloud water and rain) microphysical formulation is mainly used with a choice of two three-class ice formulations (3ICE), namely that by Lin et al. (1983) and the Lin scheme modified to adopt slower graupel fall speeds as reported by Rutledge and Hobbs (1984). An improved four-class, multiple-moment ice scheme (4ICE) has also been developed (Ferrier 1994) and tested for several convective systems in different geographic locations (Ferrier et al. 1995). The 4ICE scheme requires only minimal tuning compared to the 3ICE schemes.

655,361 CPU nodes with optimal parallel efficiency leaving a lot of room for future supercomputer

Recently, two detailed spectral-bin microphysical schemes (Khain et al. 2000; Chen and Lamb 1999) were also implemented into the GCE model. The formulation for the explicit spectral-bin microphysical processes is based on solving stochastic kinetic equations for the size distribution functions of water droplets and several types of ice particles. Each type is described by a special size distribution function containing many categories (i.e., 33 bins). Atmospheric aerosols are also described using number density size-distribution functions. Significant computation is required in applying this explicit spectral-bin microphysics to study cloud-aerosol interactions and nucleation scavenging of aerosols, as well as the impact of different concentrations and size distributions of aerosol particles upon cloud formation.
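As a rough illustration of what a binned size distribution looks like in code, the sketch below stores one particle type as 33 per-bin number counts and diagnoses total number and total mass from them. It is not the GCE or the Khain et al. formulation: the mass-doubling bin spacing, the smallest bin mass `M0_KG`, the helper names and the sample spectrum are all assumptions made for illustration; only the bin count of 33 comes from the text.

```python
# Minimal sketch of one binned size distribution (one particle type).
# Assumptions, not taken from the source: the bins are mass-doubling,
# the smallest mass M0_KG is arbitrary, and the sample counts are made up.

N_BINS = 33        # bin count mentioned in the text
M0_KG = 1.0e-14    # hypothetical mass of a particle in the first bin (kg)


def make_mass_grid(n_bins=N_BINS, m0=M0_KG):
    """Return one representative particle mass per bin, doubling each bin."""
    return [m0 * 2.0 ** k for k in range(n_bins)]


def number_concentration(counts):
    """Total number of particles per unit volume (sum over bins)."""
    return sum(counts)


def mass_content(counts, masses):
    """Total particle mass per unit volume (sum of count times bin mass)."""
    return sum(n * m for n, m in zip(counts, masses))


masses = make_mass_grid()

# Hypothetical spectrum: particles only in bins 10 to 14.
counts = [0.0] * N_BINS
for k in range(10, 15):
    counts[k] = 1.0e6

total_number = number_concentration(counts)
total_mass = mass_content(counts, masses)
print(total_number, total_mass)
```

In this representation both totals are derived from the same per-bin counts, so they stay mutually consistent as long as no bin count becomes negative. The cost grows with the number of bins and particle types carried at every grid point, which is one way to see why the text calls the computation significant.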
These new microphysics, however, require the use of a multi-dimensional Positive Definite Advection Transport Algorithm (MPDATA, Smolarkiewicz and Grabowski 1990) to avoid "decoupling" between mass and number concentration². The positive definite advection scheme also produces more light precipitation, which is in better agreement with observations (Johnson et al. 2002; Lang et al. 2003). Solar and infrared radiative transfer processes (Chou and Suarez 1999; Chou et al. 1999) have been included, and their impact on cloud development as well as several hypotheses associated with cloud-radiation interaction have been assessed (Tao et al. 1996; Sui et al. 1998). A sophisticated seven-layer soil/vegetation land process model has also been implemented into the GCE model (Lynn et al. 1998). Subgrid-scale (turbulent) processes in the GCE model are parameterized using a scheme based on Klemp and Wilhelmson (1978) and Soong and Ogura (1980), and the effects of both dry and moist processes on the generation of subgrid-scale kinetic energy have been incorporated. Table 2 shows the major characteristics of the GCE model.

development with microchips.

² Decoupling means that a grid point has mass without number concentration or has number concentration without mass. The decoupling is caused by large phase errors associated with the spatially centered (second or fourth order) advection scheme.

The application of the GCE model to the study of precipitation processes can be generalized into fourteen categories (see Table 2 in Tao 2003). They are: 1) the mechanisms associated with cloud-cloud interactions and mergers (Tao and Simpson 1984, 1989a), 2) Q1 and Q2 budgets and their individual components in different geographic locations (Soong and Tao 1980; Tao and Soong 1986), 3) statistical characteristics of clouds, convective updrafts and downdrafts (Tao et al. 1987), 4) the role of the horizontal pressure gradient force in momentum transport and budget (Soong and Tao 1984; Tao et al. 1999), 5) ice processes and their role in stratiform rain formation and the associated mass, Q1 and Q2 budgets (Tao and Simpson 1989b; Tao et al. 1989), 6) the redistribution of trace gases by convection and enhancement of O3 production in the tropics (Scala et al. 1990; Pickering et al. 1992a,b; and a review by Thompson et al. 1997), 7) precipitation efficiency (Ferrier et al. 1996; Tao et al. 2004), 8) cloud-radiation interaction and its impact on the diurnal variation of precipitation (Tao et al. 1996), 9) the horizontal transport of hydrometeors and water vapor from convective towers into the stratiform region (Tao et al.
