Noname manuscript No.
(will be inserted by the editor)

Parallel simulation of Brownian dynamics on shared memory systems with OpenMP and Unified Parallel C

Carlos Teijeiro · Godehard Sutmann · Guillermo L. Taboada · Juan Touriño

Received: date / Accepted: date

Abstract  The simulation of particle dynamics is an essential method to analyze and predict the behavior of molecules in a given medium. This work presents the design and implementation of a parallel simulation of Brownian dynamics with hydrodynamic interactions for shared memory systems using two approaches: (1) OpenMP directives and (2) the Partitioned Global Address Space (PGAS) paradigm with the Unified Parallel C (UPC) language. The structure of the code is described, and different techniques for work distribution are analyzed in terms of efficiency, in order to select the most suitable strategy for each part of the simulation. Additionally, performance results have been collected from two representative NUMA systems, and they are studied and compared against the original sequential code.

Keywords  Brownian dynamics · parallel simulation · OpenMP · PGAS · UPC · shared memory

C. Teijeiro, G. L. Taboada, J. Touriño
Computer Architecture Group, University of A Coruña, 15071 A Coruña, Spain
E-mail: {cteijeiro,taboada,juan}@udc.es

G. Sutmann
Jülich Supercomputing Centre (JSC), Institute for Advanced Simulation, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
ICAMS, Ruhr-University Bochum, D-44801 Bochum, Germany
E-mail: [email protected]

1 Introduction

Particle based simulation methods have been continuously used in physics and biology to model the behavior of different elements (e.g., molecules, cells) in a medium (e.g., fluid, gas) under thermodynamical conditions (e.g., temperature, density). These methods represent a simplification of the real-world scenario but often provide enough details to model and predict the state and evolution of a system on a given time and length scale. Among them, Brownian dynamics describes the movement of particles in solution on a diffusive time scale, where the solvent particles are taken into account only by their statistical properties, i.e. the interactions between solute and solvent are modeled as a stochastic process.

The main goal of this work is to provide a clear and efficient parallelization of the Brownian dynamics simulation, using two different approaches: OpenMP and the PGAS paradigm. On the one hand, OpenMP facilitates the parallelization of codes by simply introducing directives (pragmas) to create parallel regions in the code (with private and shared variables) and distribute their associated workload between threads. Here the access to shared variables is concurrent for all threads, thus the programmer must control possible data dependencies and race conditions. On the other hand, PGAS is an emerging paradigm that treats memory as a global address space divided into private and shared areas. The shared area is logically partitioned into chunks with affinity to different threads. Using the UPC language [15] (which extends ANSI C with PGAS support), the programmer can deal directly with workload distribution and data storage by means of different constructs, such as assignments to shared variables, collective functions or parallel loop definitions.
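As a simple illustration of the two programming styles (a generic sketch, not code from the simulation; the array names, sizes and block factor are arbitrary assumptions, and the UPC fragment belongs to a separate source file compiled with a UPC compiler), the same element-wise update could be expressed as follows:

/* OpenMP: the directive splits the iterations of the loop among the
   threads of the parallel region; a and b are ordinary C arrays.      */
void update_omp(int n, double *a, const double *b)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] += b[i];
}

/* UPC: the arrays live in the shared space with a blocked layout, and
   upc_forall runs iteration i on the thread with affinity to &a[i].   */
#include <upc.h>
#define BLOCK 256
shared [BLOCK] double a[BLOCK*THREADS], b[BLOCK*THREADS];

void update_upc(void)
{
    int i;
    upc_forall (i = 0; i < BLOCK*THREADS; i++; &a[i])
        a[i] += b[i];
}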
The rest of the paper presents the design and implementation of the parallel code. First, some related work in this area is presented, and then a mathematical and computational description of the different parts of the sequential code is provided. Next, the most relevant design decisions taken for the parallel implementation of the code, according to the nature of the problem, are discussed. Finally, a performance analysis of the parallel code is accomplished, and the main conclusions are drawn.

2 Related work

There are multiple works on simulations based on Brownian dynamics, e.g. several simulation tools such as BrownDye [7] and the BROWNFLEX program included in the SIMUFLEX suite [5], and many other studies oriented to specific fields, such as DNA research [8] or copolymers [10]. Although most of these works focus on sequential codes, some of them have developed parallel implementations with OpenMP, applied to smoothed particle hydrodynamics (SPH) [6] and the transport of particles in a fluid [16]. The UPC language has also been used in a few works related to particle simulations, such as the implementation of the particle-in-cell algorithm [12]. However, comparative analyses between parallel programming approaches on shared memory with OpenMP and UPC are mainly restricted to computational kernels for different purposes [11,17], without considering large simulation applications.

The most relevant recent work on parallel Brownian dynamics simulation is BD BOX [2], which supports simulations on CPU using codes written with MPI and OpenMP, and also on GPU with CUDA. However, there is still little information published about the actual algorithmic implementation of these simulations, and their performance has not been thoroughly studied, especially under periodic boundary conditions. In order to address this and provide an efficient parallel solution on different NUMA shared memory systems, we have analyzed the Brownian dynamics simulation for different problem sizes, and described how to address the main performance issues of the code using OpenMP and a PGAS-based approach with UPC.

3 Theoretical description of the simulation

The present work focuses on the simulation of Brownian motion of particles including hydrodynamic interactions. The time evolution in configuration space has been stated by Ermak and McCammon [3] (based on the Fokker-Planck and Langevin descriptions), which includes the direct interactions between particles via systematic forces as well as solvent mediated effects via a correlated random process:

$$\mathbf{r}_i(t+\Delta t) = \mathbf{r}_i(t) + \sum_j \frac{\partial D_{ij}(t)}{\partial \mathbf{r}_j}\,\Delta t + \frac{1}{k_B T}\sum_j D_{ij}(t)\,\mathbf{F}_j(t)\,\Delta t + \mathbf{R}_i(t+\Delta t) \quad (1)$$

Here the underlying stochastic process can be discretized in different ways, leading e.g. to the Milstein [9] or Runge-Kutta [14] schemes, but in Eq. 1 the time integration method follows a simple Euler scheme.
In a system that contains N particles, the trajectory $\{\mathbf{r}_i(t);\, t \in [0, t_{\max}]\}$ of particle i is calculated as a succession of small, fixed time step increments $\Delta t$, defined as the sum of the interactions on the particles during each time step (that is, the partial derivative of the initial diffusion tensor $D_{ij}^0$ with respect to the position component and the product of the forces F and the diffusion tensor at the beginning of a time step, where $k_B$ is Boltzmann's constant and T the absolute temperature), and a random displacement $\mathbf{R}_i(\Delta t)$ associated with the possible correlations between displacements for each particle. The random displacement vector R is Gaussian distributed with average value $\langle \mathbf{R}_i \rangle = 0$ and covariance $\langle \mathbf{R}_i(t+\Delta t)\,\mathbf{R}_j^T(t+\Delta t) \rangle = 2\,D_{ij}(t)\,\Delta t$. The diffusion tensor D thereby contains information about hydrodynamic interactions, which, formally, can be neglected by considering only the diagonal components $D_{ii}$. Choosing appropriate models for the diffusion tensor, the partial derivative of D drops out of Eq. 1, which is true for the Rotne-Prager tensor [13] used in this work. Therefore, Eq. 1 reduces to

$$\Delta \mathbf{r} = \frac{1}{kT}\, D\, \mathbf{F}\, \Delta t + \sqrt{2\Delta t}\; S\, \boldsymbol{\xi} \quad (2)$$

where the expression $\mathbf{R} = \sqrt{2\Delta t}\, S\, \boldsymbol{\xi}$ holds and $D = S S^T$, which relates the stochastic process to the diffusion matrix. Therefore, S may be calculated via a Cholesky decomposition of D or via the square root of D. Both approaches are very CPU time consuming and have a complexity of $\mathcal{O}(N^3)$. A faster approach that approximates the random displacement vector R without constructing S explicitly was introduced by Fixman [4] using an expansion of R in terms of Chebyshev polynomials, which has a complexity of $\mathcal{O}(N^{2.25})$.

The interaction model for the computation of forces in the system (which here considers periodic boundary conditions) consists of (1) direct systematic interactions, which are computed as Lennard-Jones type interactions and evaluated within a fixed cutoff radius, and (2) hydrodynamic interactions, which have long range character. The latter are reflected in the off-diagonal components of the diffusion tensor D and therefore contribute to both terms of the right-hand side of Eq. 2. Due to their long range nature, the diffusion tensor is calculated via the Ewald summation method [1].

4 Design of the parallel simulation

The sequential code considers a unity 3-dimensional box with periodic boundary conditions that contains a set of N equal particles. The main part of this code is a for loop in which each iteration represents a time step. Following the formulas presented in Section 3, each time step of the simulation consists of two main parts: (1) the calculation of the forces that act on each particle (function calc_force), and (2) the computation of random displacements for every particle in the system (function covar).

The main loop of time steps is not parallelizable, as each iteration depends on the previous one. However, the workload of each iteration can be executed in parallel through a domain decomposition of the work. Here OpenMP requires separate parallel regions for each function, as a work distribution is only possible if the parallel region is defined inside the target function; otherwise, its arguments would be treated mandatorily as private variables. Nevertheless, UPC targets a balanced workload distribution between threads and an efficient data access, exploiting the NUMA architecture through the maximization of data locality (processing the data stored in the thread's affine space). A more detailed description of both OpenMP and UPC parallelizations is given in the next subsections.
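For reference, the per-time-step position update of Eq. 2, which combines the forces from calc_force and the correlated random displacements from covar, can be sketched as follows (an illustrative sketch only; the function and variable names are assumptions, not the original code):

#include <math.h>

/* One Euler step per Eq. (2) for n = 3*N coordinates:
   dr = (1/kT) D F dt + sqrt(2 dt) S xi, with D and S stored row-major,
   D = S S^T, and xi a vector of independent standard Gaussian numbers
   (all assumed to be precomputed elsewhere).                           */
void euler_step(int n, double dt, double kT,
                const double *D, const double *S,
                const double *F, const double *xi, double *r)
{
    for (int i = 0; i < n; i++) {
        double drift = 0.0, random = 0.0;
        for (int j = 0; j < n; j++) {
            drift  += D[i*n + j] * F[j];   /* deterministic part  */
            random += S[i*n + j] * xi[j];  /* correlated noise    */
        }
        r[i] += drift * dt / kT + sqrt(2.0 * dt) * random;
    }
}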
4.1 Calculation of interactions

The main part of the computation in function calc_force consists in updating the diffusion tensor D, which is defined in the code as a symmetric square pNDIM × pNDIM matrix (called D), where pNDIM = 3 × N (that is, one value per dimension and per pair of particles). For each element in matrix D, the Ewald summation is split into (1) a short range part, where elements are calculated according to pairwise distances within a cutoff radius, and (2) a long range part, which is evaluated in Fourier k-space. Both computations are performed individually for each value in matrix D using the current coordinates of each particle in the system, with two separate loops. In the sequential code, these loops operate only on the upper triangular part of D, because of its symmetry.

According to the previous definitions, the independent computation of the values of matrix D defines a straightforward parallelization of calc_force in the simulation. The OpenMP implementation uses an omp for directive to parallelize the computation of short and long range interactions, using a dynamic scheduling of iterations because of the different amount of values calculated per particle/dimension, alongside atomic directives to obtain the average overall speed and pressure values. The UPC code performs an analogous processing, but using a 1D storage for matrix D: its elements are divided into THREADS equal chunks (THREADS being the number of threads in the UPC program), and each thread computes its associated values for each particle for all dimensions.

4.2 Computation of random displacements

The random displacements are calculated for each particle as Gaussian random numbers with prescribed covariances according to the diffusion tensor matrix. The simulation code includes two alternative methods to obtain the covariance terms: (1) a Cholesky decomposition of D and (2) Fixman's approximation method.

The Cholesky decomposition of matrix D obtains a lower triangular matrix L, whose values for each row are multiplied by a random value generated for the corresponding particle and dimension. The final result is the sum of all these products, as stated in Eq. 3, where row i represents particle i div DIM in dimension i mod DIM (with DIM = 3, the number of dimensions), and xic is the array of pNDIM random values:

$$R_i(t+\Delta t) = \sum_{j=1}^{i} L[i][j] \cdot xic[j] \quad (3)$$

The original sequential algorithm calculates the first three values of the lower triangular result matrix (that is, the elements of the first two rows), and then the remaining values are computed row by row in a loop. Here the data dependencies between the values of L computed in each iteration and the ones computed in previous iterations prevent the direct parallelization of the loop.
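To make the dependency explicit, a generic row-by-row version of this computation (an illustrative sketch with assumed names and data layout, not the original source) looks as follows; each row i of L requires all previously completed rows, so the outer loop cannot simply be split among threads:

#include <math.h>

/* Row-by-row Cholesky factorization of D (n x n, row-major) and
   accumulation of the random displacements R as in Eq. (3).
   Row i of L depends on rows 0..i-1, which blocks a naive
   parallelization of the outer loop.                             */
void cholesky_displacements(int n, const double *D, double *L,
                            const double *xic, double *R)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j <= i; j++) {
            double s = D[i*n + j];
            for (int k = 0; k < j; k++)
                s -= L[i*n + k] * L[j*n + k];
            L[i*n + j] = (i == j) ? sqrt(s) : s / L[j*n + j];
        }
        R[i] = 0.0;
        for (int j = 0; j <= i; j++)      /* Eq. (3) */
            R[i] += L[i*n + j] * xic[j];
    }
}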
However, UPC takes advantage of its data manipulation facilities and computes the values of L as a sum (reduction) of different contributions calculated by each thread: after a row of matrix L is computed, its elements are used to obtain partial computations of the random displacements associated with the unprocessed rows, thus maximizing parallelism. This functionality is implemented in UPC using the upc_forall construct to distribute the loop iterations between threads, and a broadcast collective to send partial computations. Listing 1 presents a pseudocode of this algorithm.

for every row 'i' in matrix D
    if the row is assigned to MYTHREAD
        obtain values of row 'i' in L
        calculate random displacement 'i'
    endif
    broadcast values in row 'i' to all threads
    upc_forall rows 'j' located after 'i' in matrix D
        calculate partial values for elements in row 'j'
        calculate partial value for displacement 'j'
    endfor
endfor

Listing 1  UPC pseudocode for Cholesky decomposition

Regarding OpenMP, and given the difficulties for an efficient parallelization of the original algorithm commented on previously, an alternative iterative algorithm is proposed: the first column of the result matrix is calculated by dividing the values of the first row of the source matrix by its diagonal value (parallelizable for loop), then these values are used to compute a partial contribution to the rest of the elements in the matrix (also parallelizable for loop). Finally, these two steps are repeated to obtain the rest of the columns of the result matrix. Listing 2 shows the source code of the iterative algorithm: a static scheduling is used in the first two loops, whereas the last two use a dynamic one because of the different workload associated with their iterations.

L[0][0] = sqrt(D[0][0]);
#pragma omp parallel private(i,j,k)
{
    #pragma omp for schedule(static)
    for (j = 1; j < pNDIM; j++)
        L[j][0] = D[0][j] / L[0][0];
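As a sketch of the full column-by-column scheme just described (an illustrative right-looking formulation with assumed names, loop structure and in-place update of D, which need not match the original Listing 2), the factorization and the accumulation of the random displacements of Eq. (3) can be organized as follows:

#include <math.h>

/* Column-by-column (right-looking) Cholesky factorization of D
   (n x n, row-major), overwriting the trailing part of D with the
   partial contributions and accumulating the random displacements R
   (assumed zero-initialized) column by column as in Eq. (3).         */
void cholesky_iterative_omp(int n, double *D, double *L,
                            const double *xic, double *R)
{
    for (int k = 0; k < n; k++) {
        L[k*n + k] = sqrt(D[k*n + k]);

        #pragma omp parallel
        {
            /* column k of L: uniform work per iteration -> static schedule */
            #pragma omp for schedule(static)
            for (int i = k + 1; i < n; i++)
                L[i*n + k] = D[i*n + k] / L[k*n + k];

            /* partial contribution of column k to R and to the remaining
               elements of D: triangular workload -> dynamic schedule      */
            #pragma omp for schedule(dynamic)
            for (int i = k; i < n; i++) {
                R[i] += L[i*n + k] * xic[k];
                for (int j = k + 1; j <= i; j++)
                    D[i*n + j] -= L[i*n + k] * L[j*n + k];
            }
        }
    }
}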

Figure 1  Performance results with 128 particles (Fixman and Cholesky) [panels "Brownian Dynamics - 128 particles": execution time (s) and speedup versus number of threads (1-24) for OpenMP and UPC on NEH and MGC]

Figure 2 shows the simulation results for a large number of particles (4096) and 5 time steps. Only the results of covar with Fixman's algorithm are shown, because of its higher performance. The results up to 8 threads present similar performance, which is mainly due to the heavy computation of long range interactions compared to the relatively small computation of random displacements. In fact, the larger problem size provides higher speedups than those of Figure 1. However, the algorithmic complexity of calculating the diffusion tensor D is $\mathcal{O}(N^2)$, whereas Fixman's algorithm is $\mathcal{O}(N^{2.25})$; thus, when the problem size increases, the generation of random displacements represents a larger percentage of the total simulation time. As a result of this, and also given the parallelization issues commented on in Section 4.2, the speedup is slightly limited for 16 or more threads, mainly for OpenMP (also due to the use of SMT in NEH). However, considering the distance to the ideal speedup, we can conclude that both systems present reasonably good speedups for this code.
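As a rough, back-of-the-envelope check of this scaling argument (not a measurement reported here), the cost of the random displacements relative to the diffusion tensor computation grows as

$$\frac{T_{\mathrm{Fixman}}(N)}{T_{D}(N)} \propto \frac{N^{2.25}}{N^{2}} = N^{0.25}, \qquad \left(\frac{4096}{128}\right)^{0.25} = 32^{0.25} \approx 2.4,$$

so moving from 128 to 4096 particles increases the relative weight of the less scalable covar part by roughly a factor of 2.4, which is consistent with the speedup limitation observed for higher thread counts.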

Figure 2  Performance results with 4096 particles (Fixman) [panels "Brownian Dynamics - 4096 particles - Fixman": execution time (s) and speedup versus number of threads (1-24) for OpenMP and UPC on NEH and MGC]

6 Conclusions

This work has presented the parallel implementation of a Brownian dynamics simulation code on shared memory using OpenMP directives and a PGAS-based approach with the UPC language. According to both programming paradigms, different workload distributions have been presented and tested for the main functions in the code, trying to obtain good performance as the number of threads increases. The scheduling techniques in for loops, the convenient use of reduction or critical directives, and the control of the stack size in parallel function calls have been the most relevant features to tune the OpenMP code, whereas the efficient work distribution and a wise optimization of data management have been key factors for UPC. The results of the parallel implementation have shown that there is a high dependency of the code on the part where random displacements are generated for each particle, and here Fixman's approximation method has proven to be the best choice. Regarding both programming paradigms, UPC has obtained better results than OpenMP, mainly because of the more efficient access to the data set, as the PGAS memory model presents a more straightforward adaptation to current NUMA multicore-based systems, although at the cost of a slightly higher programming effort. OpenMP has provided a more compact implementation in terms of SLOCs and a more straightforward approach to parallel programming but, in general, the trade-off between programmability and performance has been favorable for the UPC code.

Acknowledgements

This work was funded by the Ministry of Science and Innovation of Spain (Project TIN2010-16735), the Spanish network CAPAP-H3 (Project TIN2010-12011-E), and by the Galician Government (Pre-doctoral Program "María Barbeito" and Program for the Consolidation of Competitive Research Groups, ref. 2010/6).

References

1. Beenakker, C.: Ewald Sum of the Rotne-Prager Tensor. Journal of Chemical Physics 85, 1581–1582 (1986)
2. Długosz, M., Zieliński, P., Trylska, J.: Brownian Dynamics Simulations on CPU and GPU with BD BOX. Journal of Computational Chemistry 32(12), 2734–2744 (2011)
3. Ermak, D.L., McCammon, J.A.: Brownian Dynamics with Hydrodynamic Interactions. Journal of Chemical Physics 69(4), 1352–1360 (1978)
4. Fixman, M.: Construction of Langevin Forces in the Simulation of Hydrodynamic Interaction. Macromolecules 19(4), 1204–1207 (1986)
5. García de la Torre, J., Hernández Cifre, J.G., Ortega, A., Schmidt, R.R., Fernandes, M.X., Pérez Sánchez, H.E., Pamies, R.: SIMUFLEX: Algorithms and Tools for Simulation of the Conformation and Dynamics of Flexible Molecules and Nanoparticles in Dilute Solution. Journal of Chemical Theory and Computation 5(10), 2606–2618 (2009)
6. Hipp, M., Rosenstiel, W.: Parallel Hybrid Particle Simulations Using MPI and OpenMP.
In: 10th International Euro-Par Conference (Euro-Par'04), Pisa (Italy), pp. 189–197 (2004)
7. Huber, G.A., McCammon, J.A.: Browndye: A Software Package for Brownian Dynamics. Computer Physics Communications 181(11), 1896–1905 (2010)
8. Klenin, K., Merlitz, H., Langowski, J.: A Brownian Dynamics Program for the Simulation of Linear and Circular DNA and Other Wormlike Chain Polyelectrolytes. Biophysical Journal 74(2 Pt 1), 780–788 (1998)
9. Kloeden, P.E., Platen, E.: Numerical Solution of Stochastic Differential Equations. Springer, Berlin (1992)
10. Li, B., Zhu, Y.L., Liu, H., Lu, Z.Y.: Brownian Dynamics Simulation Study on the Self-assembly of Incompatible Star-like Block Copolymers in Dilute Solution. Physical Chemistry Chemical Physics 14, 4964–4970 (2012)
11. Mallón, D.A., Taboada, G.L., Teijeiro, C., Touriño, J., Fraguela, B.B., Gómez, A., Doallo, R., Mouriño, J.C.: Performance Evaluation of MPI, UPC and OpenMP on Multicore Architectures. In: 16th European PVM/MPI Users' Group Meeting (EuroPVM/MPI'09), Espoo (Finland), pp. 174–184 (2009)
12. Markidis, S., Lapenta, G.: Development and Performance Analysis of a UPC Particle-in-Cell Code. In: 4th Conference on Partitioned Global Address Space Programming Models (PGAS'10), New York (NY, USA), pp. 10:1–10:9. ACM (2010)
13. Rotne, J., Prager, S.: Variational Treatment of Hydrodynamic Interaction in Polymers. Journal of Chemical Physics 50(11), 4831–4837 (1969)
14. Tocino, A., Vigo-Aguiar, J.: New Itô–Taylor Expansions. Journal of Computational and Applied Mathematics 158(1), 169–185 (2003)
15. Unified Parallel C at George Washington University: http://upc.gwu.edu. Accessed 31 October 2012
16. Wei, W., Al-Khayat, O., Cai, X.: An OpenMP-enabled Parallel Simulator for Particle Transport in Fluid Flows. In: 11th International Conference on Computational Science (ICCS'11), Singapore, pp. 1475–1484 (2011)
17. Zhao, W., Wang, Z.: ScaleUPC: A UPC Compiler for Multi-core Systems. In: 3rd Conference on Partitioned Global Address Space Programming Models (PGAS'09), Ashburn (VA, USA), pp. 11:1–11:8. ACM (2009)