

sbPOM: A parallel implementation of Princeton Ocean Model

Antoni Jordi (1) and Dong-Ping Wang (2)

(1) Institut Mediterrani d’Estudis Avançats, IMEDEA (UIB-CSIC), Miquel Marqués 21, 07190 Esporles, Illes Balears, Spain

(2) School of Marine and Atmospheric Sciences, Stony Brook University, Stony Brook, NY 11794, USA

Corresponding author: Antoni Jordi

Institut Mediterrani d’Estudis Avançats, IMEDEA (UIB-CSIC), Miquel Marqués 21, 07190 Esporles, Illes Balears, Spain

E-mail: [email protected]

Phone: +34 971 611 910

Fax: +34 971 611 761

Abstract

This paper presents the Stony Brook Parallel Ocean Model (sbPOM) for execution on workstations, Linux clusters and massively parallel supercomputers. The sbPOM is derived from the Princeton Ocean Model (POM), a widely used community ocean circulation model. Two-dimensional data decomposition of the horizontal domain is used with a halo of ghost cells to minimize communication between processors. Communication consists of the exchange of information between neighboring processors based on the Message Passing Interface (MPI) standard. The Parallel-NetCDF library is also implemented to achieve highly efficient input and output (I/O). Parallel performance is tested on an IBM Blue Gene/L massively parallel supercomputer, and the efficiency using up to 2048 processors remains very good.

Keywords: Ocean circulation; Numerical model; Parallel computing

Software availability

- Software name: sbPOM

- Developers: Antoni Jordi and Dong-Ping Wang

- Contact: [email protected]

- Software requirements: Linux or Unix, Fortran compiler, MPICH2, Parallel-NetCDF

- Programming language: Fortran 77

- Availability and cost: Test cases are available at http://imedea.uib-csic.es/users/toni/sbpom at no cost

1. Overview

Ocean circulation models are an integral part of the ocean observing system and play a central role in the study of global climate and biogeochemical cycles. There are many ocean models that differ in their approaches to spatial discretization and vertical coordinate treatment, their numerical algorithms for time-stepping, advection and the pressure gradient, and their subgrid-scale parameterizations. Reviews of ocean models and recent developments can be found in Griffies et al. (2000) and Ezer et al. (2002). Among them, the Princeton Ocean Model (POM) is widely adopted (http://www.aos.princeton.edu/WWWPUBLIC/htdocs.pom/). The POM community has developed several coupled models for biogeochemical (Chau, 2003; Vichi et al., 1998), sediment (Liu and Huang, 2009; Xu et al., 2010), and operational applications (Nittis et al., 2006; Price et al., 2006). Recent advances include a wetting and drying scheme (Oey, 2005) and surface wave coupling (Mellor et al., 2008). In addition, several parallel POM codes have been described (Boukas et al., 1999; Giunta et al., 2007). However, these codes are not publicly available, are limited to specific hardware architectures, or are extensively modified from the serial code. In response to these limitations, we developed a new parallel version of POM (sbPOM), available to the general public, which minimizes modifications to the POM code and achieves good scalability over a wide range of processor counts.

2. Model description

POM is parallelized by implementing a message-passing code based on two-dimensional data decomposition of the horizontal domain. This approach ensures portability across a large variety of parallel machines and allows the numerical algorithms of the serial code to be retained. The horizontal global domain is partitioned into two-dimensional local domains using a Cartesian decomposition, and the vertical dimension is not divided. Each processor independently integrates the code on its local domain, and the computation applied to each local domain is the same as that applied to the entire global domain with the serial code. The horizontal arrays assigned to each local domain are expanded by one grid point in each horizontal dimension, creating a halo of ghost cells (Fig. 1). The two- and three-dimensional arrays in these ghost cells are exchanged between neighboring local domains using the MPI standard interface (http://www.mcs.anl.gov/research/projects/mpich2/).
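As an illustration of this strategy, the following minimal sketch (with hypothetical array sizes and variable names, not an excerpt of the sbPOM source) builds a two-dimensional Cartesian processor grid with MPI and performs one east-west halo exchange using MPI_SENDRECV.

      program halo_demo
c     Illustrative sketch (not the sbPOM source): 2-D Cartesian
c     decomposition and one east-west halo exchange with MPI.
c     Array sizes and variable names are hypothetical.
      implicit none
      include 'mpif.h'
      integer im, jm
      parameter (im=66, jm=52)
      real a(im,jm), sbuf(jm), rbuf(jm)
      integer ierr, nproc, comm2d, dims(2)
      integer east, west, north, south, i, j
      integer stat(MPI_STATUS_SIZE)
      logical periods(2)

      call MPI_INIT(ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nproc, ierr)

c     Build a 2-D (non-periodic) Cartesian grid of processors.
      dims(1) = 0
      dims(2) = 0
      call MPI_DIMS_CREATE(nproc, 2, dims, ierr)
      periods(1) = .false.
      periods(2) = .false.
      call MPI_CART_CREATE(MPI_COMM_WORLD, 2, dims, periods,
     &                     .true., comm2d, ierr)
      call MPI_CART_SHIFT(comm2d, 0, 1, west, east, ierr)
      call MPI_CART_SHIFT(comm2d, 1, 1, south, north, ierr)

c     Local computation would fill the interior a(2:im-1,2:jm-1);
c     here the array is simply zeroed.
      do j = 1, jm
        do i = 1, im
          a(i,j) = 0.0
        end do
      end do

c     Send the easternmost interior column to the eastern neighbor
c     and receive the western ghost column from the western one.
      do j = 1, jm
        sbuf(j) = a(im-1,j)
      end do
      call MPI_SENDRECV(sbuf, jm, MPI_REAL, east, 1,
     &                  rbuf, jm, MPI_REAL, west, 1,
     &                  comm2d, stat, ierr)
      if (west .ne. MPI_PROC_NULL) then
        do j = 1, jm
          a(1,j) = rbuf(j)
        end do
      end if
c     The west-to-east, south-to-north and north-to-south exchanges
c     follow the same pattern, as do the three-dimensional arrays.

      call MPI_FINALIZE(ierr)
      end

The remaining exchange directions and the three-dimensional arrays follow the same send/receive pattern.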

The sbPOM implements Parallel-NetCDF (http://trac.mcs.anl.gov/projects/parallel-netcdf), which provides high-performance parallel I/O while maintaining file-format compatibility with Unidata's NetCDF (http://www.unidata.ucar.edu/software/netcdf/). This makes the output files space-efficient, self-describing and machine independent. NetCDF is also recognized by many graphics and post-processing utilities.
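As an illustration of this I/O strategy, the sketch below shows how one two-dimensional field could be written collectively through the Parallel-NetCDF Fortran 77 interface (the nfmpi_* routines). The subroutine, file, dimension and variable names are assumptions made for the example and do not reproduce the actual sbPOM output routines.

      subroutine write_field(comm2d, aint, iml, jml, ioff, joff,
     &                       img, jmg)
c     Illustrative sketch (not the sbPOM output code) of a collective
c     Parallel-NetCDF write.  aint holds the interior points of one
c     local domain (no ghost cells); ioff and joff are the global
c     offsets of that block; img and jmg are the global grid sizes.
c     All names are hypothetical.
      implicit none
      include 'mpif.h'
      include 'pnetcdf.inc'
      integer comm2d, iml, jml, ioff, joff, img, jmg
      real aint(iml,jml)
      integer ierr, ncid, xdid, ydid, varid, dimids(2)
      integer(kind=MPI_OFFSET_KIND) gx, gy, start(2), count(2)

c     Every processor creates/opens the same file collectively.
      ierr = nfmpi_create(comm2d, 'field.nc', nf_clobber,
     &                    MPI_INFO_NULL, ncid)
      gx = img
      gy = jmg
      ierr = nfmpi_def_dim(ncid, 'x', gx, xdid)
      ierr = nfmpi_def_dim(ncid, 'y', gy, ydid)
      dimids(1) = xdid
      dimids(2) = ydid
      ierr = nfmpi_def_var(ncid, 'a', nf_float, 2, dimids, varid)
      ierr = nfmpi_enddef(ncid)

c     Each processor writes its interior block into the matching
c     region of the global array in the shared file.
      start(1) = ioff + 1
      start(2) = joff + 1
      count(1) = iml
      count(2) = jml
      ierr = nfmpi_put_vara_real_all(ncid, varid, start, count, aint)
      ierr = nfmpi_close(ncid)
      end

Because all processors take part in the collective write, a single NetCDF file covering the whole global domain is produced, and no post-processing step is needed to merge per-processor output.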

In order to assess the performance of sbPOM, we used the seamount test case (Ezer et al., 2002; Mellor et al., 1998), which is made available with the code. It is a stratified Taylor column problem that simulates the flow across a seamount. The test case has 1026 × 770 global grid points and 31 vertical sigma levels. The simulation time is measured while varying the number of processors on a Blue Gene/L massively parallel supercomputer housed in the New York Center for Computational Sciences (http://www.newyorkccs.org/). The problem size of the test case is too large to run on one or two processors because of memory limitations. Simulations are therefore run with the number of processors ranging from 4 to 2,048 (2K), and the parallel efficiency is measured with respect to the four-processor run as E = (4 t4)/(nproc tnproc), where tnproc is the simulation time using nproc processors and t4 is the time using four processors. The efficiency is very high even for 2K processors (E ~ 0.8, i.e., a speedup of about 410 (0.8 × 512) relative to the four-processor run) (Fig. 2).

3. Concluding remarks

POM is a relatively simple community model with very few options for numerics, physics, boundary conditions, and I/O. In this regard, sbPOM is also a relatively simple parallelization that maintains the POM legacy. This should provide POM developers and users with a familiar code that they can easily use or modify. The user community is encouraged to parallelize recent advances in POM and to develop new parallel techniques. Despite its simplicity, sbPOM test results on the IBM Blue Gene/L show a very high efficiency. The enormous computing power of massively parallel supercomputers opens a new era for high-resolution ocean modeling. One example is the application of sbPOM at high spatial resolution to simulate submesoscale surface frontogenesis at the Mid-Atlantic Bight (USA) shelfbreak front (Wang and Jordi, 2011).

Acknowledgments

This research utilized resources at the New York Center for Computational Sciences at Stony Brook University/Brookhaven National Laboratory, which was supported by the U.S. Department of Energy under Contract No. DE-AC02-98CH10886 and by the State of New York. A. Jordi’s work was supported by a Ramón y Cajal grant from MICINN.

References

Boukas, L.A., Mimikou, N.T., Missirlis, N.M., Mellor, G.L., Lascaratos, A., Korres, G., 1999. The Parallelization of the Princeton Ocean Model. In: Amestoy, P. (Ed.), Lecture Notes in Computer Science. Springer-Verlag, Berlin, pp. 1395-1402.

Chau, K.W., 2003. Manipulation of numerical coastal flow and water quality models. Environmental Modelling & Software 18(2) 99-108.

Ezer, T., Arango, H., Shchepetkin, A.F., 2002. Developments in terrain-following ocean models: intercomparisons of numerical aspects. Ocean Modelling 4(3-4) 249-267.

Giunta, G., Mariani, P., Montella, R., Riccio, A., 2007. pPOM: A nested, scalable, parallel and Fortran 90 implementation of the Princeton Ocean Model. Environmental Modelling & Software 22(1) 117-122.

Griffies, S.M., Boning, C., Bryan, F., Chassignet, E.P., Gerdes, R., Hasumi, H., Hirst, A., Treguier, A.M., Webb, D., 2000. Developments in ocean climate modelling. Ocean Modelling 2 123-192.

Liu, X.H., Huang, W.R., 2009. Modeling sediment resuspension and transport induced by storm wind in Apalachicola Bay, USA. Environmental Modelling & Software 24(11) 1302-1313.

Mellor, G.L., Donelan, M.A., Oey, L.Y., 2008. A Surface Wave Model for Coupling with Numerical Ocean Circulation Models. Journal of Atmospheric and Oceanic Technology 25(10) 1785-1807.

Mellor, G.L., Oey, L.Y., Ezer, T., 1998. Sigma coordinate pressure gradient errors and the Seamount problem. Journal of Atmospheric and Oceanic Technology 15(5) 1122-1131.

Nittis, K., Perivoliotis, L., Korres, G., Tziavos, C., Thanos, I., 2006. Operational monitoring and forecasting for marine environmental applications in the Aegean Sea. Environmental Modelling & Software 21(2) 243-257.

Oey, L.Y., 2005. A wetting and drying scheme for POM. Ocean Modelling 9(2) 133-150.

Price, J.M., Reed, M., Howard, M.K., Johnson, W.R., Ji, Z.G., Marshall, C.F., Guinasso, N.L., Rainey, G.B., 2006. Preliminary assessment of an oil-spill trajectory model using satellite-tracked, oil-spill-simulating drifters. Environmental Modelling & Software 21(2) 258-270.

Vichi, M., Pinardi, N., Zavatarelli, M., Matteucci, G., Marcaccio, M., Bergamini, M.C., Frascari, F., 1998. One-dimensional ecosystem model tests in the Po Prodelta area (Northern Adriatic Sea). Environmental Modelling & Software 13(5-6) 471-481.

Wang, D.P., Jordi, A., 2011. Surface frontogenesis and thermohaline intrusion in a shelfbreak front. Ocean Modelling 38(1-2) 161-170.

Xu, F.H., Wang, D.P., Riemer, N., 2010. An idealized model study of flocculation on sediment trapping in an estuarine turbidity maximum. Continental Shelf Research 30(12) 1314-1323.

Figure captions

Fig. 1 Example of the data decomposition scheme for a global size of 14 x 17, local sizes of 6 x 7, and 9 processors. Crosses represent the boundaries of the global domain, shaded areas are the ghost cells (or local boundaries), and arrows indicate the communication between local domains to exchange variables at the ghost cells.

Fig. 2 Performance for the test case. (a) Time (seconds per simulation day) as a function of the number of processors. (b) Parallel efficiency relative to the performance on four processors. The black line (in both panels) corresponds to an efficiency of one.
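As a worked example of the decomposition shown in Fig. 1, the short program below (illustrative only; the variable names are not taken from the sbPOM source) prints the global index range covered by each of the nine local domains, using the fact that neighboring domains share their ghost cells so that the interiors advance by im_local-2 and jm_local-2 grid points.

      program decomp_map
c     Illustrative sketch only: prints the local-to-global index
c     ranges implied by Fig. 1 (14 x 17 global grid, 3 x 3 processor
c     grid, 6 x 7 local arrays including one ghost cell on each side).
c     Variable names are hypothetical, not those of the sbPOM source.
      implicit none
      integer im_local, jm_local, nproc_i, nproc_j
      parameter (im_local=6, jm_local=7, nproc_i=3, nproc_j=3)
      integer iproc, jproc, i1, i2, j1, j2

      do jproc = 0, nproc_j-1
        do iproc = 0, nproc_i-1
c         Neighboring domains share their ghost cells, so interiors
c         advance by (im_local-2) and (jm_local-2) grid points.
          i1 = 1 + iproc*(im_local-2)
          i2 = i1 + im_local - 1
          j1 = 1 + jproc*(jm_local-2)
          j2 = j1 + jm_local - 1
          write(*,'(a,i2,a,i2,a,i3,a,i3,a,i3,a,i3)')
     &      ' proc (', iproc, ',', jproc, '): i_global ', i1,
     &      ' to', i2, ', j_global ', j1, ' to', j2
        end do
      end do
      end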

[Figure 1: schematic of the data decomposition, with local and global (i, j) indices for the nine local domains.]

[Figure 2: (a) time and (b) parallel efficiency versus number of processors (4 to 2K).]