
Parareal Algorithm Implementation and Simulation in Julia

Tyler M. Masthay, The Institute for Computational Engineering and Sciences, The University of Texas at Austin, Austin, Texas, [email protected]
Saverio Perugini, Department of Computer Science, University of Dayton, Dayton, Ohio, [email protected]

ABSTRACT

We present a full implementation of the parareal algorithm—an integration technique to solve differential equations in parallel—in the Julia programming language for a fully general, first-order, initial-value problem. We provide a brief overview of Julia—a concurrent programming language for scientific computing. Our implementation of the parareal algorithm accepts both coarse and fine integrators as functional arguments. We use Euler's method and another Runge-Kutta integration technique as the integrators in our experiments. We also present a simulation of the algorithm for purposes of pedagogy and as a tool for investigating the performance of the algorithm.

KEYWORDS

Concurrent programming, Euler's method, Julia, Runge-Kutta methods, parareal algorithm, ordinary differential equations.

1 INTRODUCTION

The parareal algorithm was first proposed in 2001 by Lions, Maday, and Turinici [6] as an integration technique to solve differential equations in parallel. We present a full implementation of the parareal algorithm in the Julia programming language (https://julialang.org) [7] for a fully general, first-order, initial-value problem. Furthermore, we present a simulation of the algorithm for purposes of pedagogy and as a tool for investigating the performance of the algorithm. Our implementation accepts both coarse and fine integrators as functional arguments. We use Euler's method and another Runge-Kutta integration technique as the integrators in our experiments. We start with a brief introduction to the Julia programming language.

2 AN INTRODUCTION TO JULIA: DYNAMIC, YET EFFICIENT, SCIENTIFIC/NUMERICAL PROGRAMMING

Julia is a multi-paradigm language designed for scientific computing; it supports multidimensional arrays, concurrency, and metaprogramming. Due to both Julia's LLVM-based Just-In-Time compiler and the language design, Julia programs run computationally efficiently—approaching and sometimes matching the speed of languages like C. See [1] for a graph depicting the relative performance of Julia compared to other common languages for scientific computing on a set of micro-benchmarks.

2.1 Coroutines and Channels in Julia

Coroutines are typically referred to as tasks in Julia, and are not scheduled to run on separate CPU cores. Channels in Julia can be either synchronous or asynchronous, and can be typed. However, if no type is specified in the definition of a channel, then values of any type can be written to that channel, much like pipes. Messages are passed between coroutines through channels with the put! and take!() functions. To add tasks to be automatically scheduled, use the schedule() function, or the @schedule and @sync macros. Of course, coroutines have little overhead, but will always run on the same cpu:

    The current version of Julia multiplexes all tasks onto a single os
    thread. Thus, while tasks involving i/o operations benefit from
    parallel execution, compute bound tasks are effectively executed
    sequentially on a single os thread. Future versions of Julia may
    support scheduling of tasks on multiple threads, in which case
    compute bound tasks will see benefits of parallel execution too [2].
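To make these mechanics concrete, the following is a minimal sketch (our addition, written against the pre-1.0 Julia API used throughout this paper) of two tasks communicating through a typed channel:

    ch = Channel{Int}(2)          # buffered channel; only Int values may be written

    @sync begin
        @async begin              # producer task
            for i = 1:5
                put!(ch, i*i)     # blocks whenever the buffer is full
            end
            close(ch)             # signals the consumer that no more values arrive
        end
        @async begin              # consumer task
            for v in ch           # implicitly take!s values until the channel closes
                println("received ", v)
            end
        end
    end

Both tasks are multiplexed onto a single os thread, so this sketch demonstrates task switching rather than parallel speedup, consistent with the documentation quoted above.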

2.2 Parallel Computing

In addition to tasks, Julia supports parallel computing—functions running on multiple cpus or distributed computers. New processes are spawned with addprocs(<n>), where <n> is the number of processes desired. The function addprocs returns the pids of the created processes. The function workers returns a list of the processes. Alternatively, the Julia interpreter can be started with the -p <n> option, where <n> is the number of processes desired. For instance:

    $ julia
    julia> addprocs(3)
    3-element Array{Int64,1}:
     2
     3
     4

    julia> workers()
    3-element Array{Int64,1}:
     2
     3
     4

    ^D
    $ julia -p 3
    julia> workers()
    3-element Array{Int64,1}:
     2
     3
     4

    ^D
    $

Note that the process ids start at 2 because the Julia REPL shell is process 1. Processes in Julia, which are either locally running or remotely distributed, communicate with each other through message passing. The function remotecall(<function>, <pid>, <args...>) executes <function> on worker <pid> and returns a value of the Future type, which contains a reference to a location from which the return value can be retrieved, once <function> has completed its execution. The Future value can be extracted with the function fetch(<future>), which blocks until the result is available. Thus, the function remotecall is used to send a message while the function fetch is used to receive a message. For instance:

    julia> addprocs(2)

    julia> future = remotecall(sqrt, 2, 4)
    Future(2,1,3,Nullable{Any}())

    julia> fetch(future)
    2.0

After the function remotecall is run, the worker process simply waits for the next call to remotecall. For example, given a counter-producing helper new_counter (assumed here to be a closure-returning function defined on all workers):

    julia> counter1 = new_counter(3)
    (::#1) (generic function with 1 method)

    julia> future = remotecall(counter1, 2)
    Future(2,1,23,Nullable{Any}())

    julia> fetch(future)
    4

There are also remote channels, which are writable, for more control over synchronizing processes. The Julia macro @spawn simplifies this message-passing protocol for the programmer and obviates the need for explicit use of the low-level remotecall function. Similarly, the macro @parallel can be used to run each iteration of a (for) loop in its own process.

    julia> future = @spawn sqrt(4)

    julia> fetch(future)
    2.0

    julia> addprocs(2)
    2-element Array{Int64,1}:
     3
     4

    julia> @everywhere function fib(n)
               if (n < 2)
                   return n
               else
                   return fib(n-1) + fib(n-2)
               end
           end

    julia> @everywhere function fib_parallel(n)
               if (n < 35)
                   return fib(n)
               else
                   x = @spawn fib_parallel(n-1)
                   y = fib_parallel(n-2)
                   return fetch(x) + y
               end
           end

    julia> @time fib(42)
      2.271563 seconds (793 allocations: 40.718 KB)
    267914296

    julia> @time fib_parallel(42)
      3.483601 seconds (344.48 k allocations: 15.344 MB, 0.25% gc time)

2.3 Multidimensional Arrays

Julia supports multidimensional arrays, an important data structure in scientific computing applications, with a simple syntax and efficient creation and interpretation over many dimensions [4]. The function call ArrayType(<dimensions>) creates an array, where the nth argument of <dimensions> specifies the size of the nth dimension of the array. Similarly, the programmer manipulates these arrays using function calls that support infinite-dimensional arrays given only limitations on computational time. In summary, Julia incorporates concepts and mechanisms—particularly concurrency and multidimensional arrays—which support efficient scientific computing.
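As a small illustration of the array syntax just described (our addition; the calls follow the same pre-1.0 Julia API as our listings):

    A = zeros(Float64, 2, 3, 4)              # 2x3x4 array of Float64 zeros
    size(A)                                  # (2, 3, 4)
    A[1, 2, 3] = 7.0                         # element access indexed by dimension
    B = ones(3, 3)                           # 3x3 array of ones
    C = B * B'                               # matrix product with the transpose of B
    D = reshape(collect(linspace(0, 1, 12)), 3, 4)  # 3x4 grid of equally spaced points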

3 THE PARAREAL ALGORITHM

The parareal algorithm is designed to perform parallel-in-time integration for a first-order initial-value problem. The algorithm involves two integration techniques, often known as the 'coarse' integrator and the 'fine' integrator. For the algorithm to be effective, the coarse integrator must be of substantially lower computational cost than the fine integrator. The reason will become apparent later in this section. Consider the differential equation (1) given by

    y′(t) = f(t, y(t)),   t ∈ [a, b]                    (1)

with its associated initial-value problem (2)

    y(t*) = y*,   t* ∈ [a, b].                          (2)

For simplicity, let us assume t* = a, so that the solution only extends rightward. To obtain an approximate solution to equation (1) satisfying the initial condition (2), we partition our domain into [t_0 = a, ..., t_N = b] with uniform step size ∆. We now precisely define an 'integrator' as a function from (0, ∞) × ℝ² × 𝓡 to ℝ, where 𝓡 is the set of all Riemann integrable functions. For example, the integrator I given by

    I(δ, x_0, y_0, g) = y_0 + g(x_0, y_0)δ

is the integrator corresponding to Euler's method with step size δ. Let C and F be the coarse and fine integrators, respectively. Define

    y_{0,1} = y(t_0) = y*
    y_{n+1,1} = y(t_{n+1}) = C(∆, t_n, y_{n,1}, f).
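As a concrete illustration of one coarse sweep step (our addition, not part of the original derivation), take f(t, y) = y, y* = 1, a = 0, and ∆ = 0.5. With the Euler integrator as C, the first coarse step gives

    y_{1,1} = C(0.5, t_0, y_{0,1}, f) = 1 + f(0, 1) · 0.5 = 1.5,

and each subsequent y_{n+1,1} is produced the same way from y_{n,1}, so the coarse sweep costs only one cheap integrator evaluation per subinterval.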

Since y_{n+1,1} depends on y_{n,1}, this algorithm is inherently sequential. Partition [t_n, t_{n+1}] into {t_n^0 = t_n, ..., t_n^m, ..., t_n^M = t_{n+1}} with uniform step size δ < ∆. Define

    z_{n,1}^0 = y(t_n^0) = y_{n,1}
    z_{n,1}^{m+1} = y(t_n^{m+1}) = F(δ, t_n^m, z_{n,1}^m, f).

This yields an approximate solution {z_{n,1}^0, ..., z_{n,1}^M} to (1) over [t_n, t_{n+1}] with initial condition y(t_n) = y_{n,1}. Since z_{n_1,1}^{m_1} does not depend on z_{n_2,1}^{m_2} for n_1 ≠ n_2, we can compute these approximations in parallel. After the last subproblem is solved, we simply combine the solutions on each subdomain to obtain a solution over the whole interval. However, our values {y_{1,1}, ..., y_{N,1}} are relatively inaccurate. The vertical spikes in the orange graph separating the coarse and fine predictions in Figure 1 illustrate this error.

Figure 1: Right endpoint error.

However, z_{n−1,1}^M is a better approximation for φ(t_n), where φ is the exact solution to the differential equation. We use this to obtain a better set of points {y_{n,2}} for the coarse approximation. We do this by first defining w_{n,1} = y_{n,1} and then defining

    w_{1,2} = y_{1,1} = y_{1,2} = y*
    w_{n,2} = C(∆, t_{n−1}, y_{n−1,2}, f)
    y_{n,2} = (w_{n,2} − w_{n,1}) + z_{n−1,1}^M.

Thus, w_{n+1,2} serves as a new prediction given a more accurate previous prediction from y_{n,2}, since z_{n−1,1}^M has now been taken into account in calculating y_{n,2}. In general, we continue evaluating so that, for k > 1, we have

    w_{1,k} = y_{1,k} = y*
    w_{n,k} = C(∆, t_{n−1}, y_{n−1,k−1}, f)
    y_{n,k} = (w_{n,k} − w_{n,k−1}) + z_{n−1,k−1}^M.

Note that since y_{n,k} is dependent on w_{n,k}, this step must be done sequentially. As k increases, w_{n,k} − w_{n,k−1} → 0, which means that y_{n,k} converges to the value that the fine integrator would predict if fine integration were simply done sequentially. Thus, each k denotes fine integration over the whole interval. This means that the total computation performed is much greater than if fine integration were performed sequentially. However, the time efficiency of each iteration has the potential to be improved through concurrency. Since fine integration is more computationally intensive, this improvement in the run-time efficiency may compensate for the cumulative computation performed. Let K be the total number of iterations necessary to achieve a desired accuracy of solution and P be the number of subintervals into which we divide according to the coarse integrator. If K = 1, then we achieve perfect parallel efficiency. If K = P, then we likely slowed the computation down. The parareal algorithm is guaranteed to converge to the solution given by the sequential fine integrator within P iterations. For a more complete treatment of this convergence analysis, we refer the reader to [5]. For fully general pseudocode, we refer the reader to [3, 8].
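A rough cost estimate makes this trade-off explicit (our addition, assuming the coarse sweep and communication costs are negligible next to fine integration). If each of the P subintervals requires M fine steps of cost τ_F, sequential fine integration costs P · M · τ_F, while each parareal iteration takes only about M · τ_F of wall-clock time because the P subintervals are integrated concurrently; hence

    speedup ≈ (P · M · τ_F) / (K · M · τ_F) = P / K,

which recovers the two extremes above: perfect efficiency at K = 1 and no gain at K = P.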
4 PARAREAL ALGORITHM IMPLEMENTATION IN JULIA

Listing 1 presents an implementation of the parareal algorithm (from the prior section) in Julia. The @async macro within the loop causes the program to evaluate the first expression to its right as a concurrent task (i.e., the for loop assigning values to sub). The @sync macro causes the main program thread to wait until all tasks (spawned in the first expression to its right with an @async or @parallel macro) complete. Once all concurrent tasks are complete, execution of the program proceeds sequentially. Given the semantics of these macros, the program in Listing 1 correctly performs concurrent integration. The sequential and parallel versions of this implementation have no significant differences in run-time efficiency. However, if a sleep statement is placed in the argument of fineIntegrator, the parallel version runs much faster; a sketch of this experiment follows Listing 1. This demonstrates that use of those two macros does lead to concurrent program execution.

5 GRAPHICAL ALGORITHM SIMULATION

The function simulate in Listing 2 creates a graphical simulator of the parareal algorithm. This function can be used to introduce the parareal algorithm to students in a course. The first line gets the sequential solution from the fine integrator (the 'ideal' solution) and the second line gets the history of the computations that took place during the parareal execution. The main loop over the variable k then displays the inner workings of the algorithm. The ideal solution is plotted, with a scatter plot of the points obtained from the coarse integrator. To simulate the parallel nature of the algorithm, random progress is made on randomly selected subdomains. Thus, the plot dynamically makes partial progress on different subdomains until all subdomains are finished with the fine integration. After this, the plots are connected into the current iteration's approximation. During the next iteration, the previous guesses from the coarse integrator are displayed in red and the new guesses from the coarse integrator are displayed in green. As k increases, these guesses converge to the ideal solution. In addition to the use of this function for pedagogical purposes, it can be used to investigate the types of curves for which the parareal algorithm might be practical.

Listing 1: Implementation of the parareal algorithm in Julia.

@everywhere function parareal(a,b,nC,nF,K,y0,f,coarseIntegrator,fineIntegrator)
    # initialize coarse information
    xC = linspace(a,b,nC+1);
    yC = zeros(size(xC,1),K);
    deltaC = xC[2] - xC[1];
    yC[1,:] = y0;

    # "coarse integrator partially evaluated"
    ciPEvaled = ((x1,y1) -> coarseIntegrator(deltaC,x1,y1,f));

    # get initial coarse integration solution
    for i=2:(nC+1)
        yC[i,1] = ciPEvaled(xC[i-1],yC[i-1,1]);
    end
    correctC = copy(yC);

    # initialize fine information
    xF = zeros(nC,nF+1);
    for i=1:nC
        xF[i,:] = linspace(xC[i],xC[i+1],nF+1);
    end
    sub = zeros(nC,nF+1,K);
    deltaF = xF[1,2] - xF[1,1];

    # "fine integrator partially evaluated"
    fiPEvaled = ((x1,y1) -> fineIntegrator(deltaF,x1,y1,f));

    for k=2:K
        # run fine integration on each subdomain
        tic();
        @sync for i=1:nC
            sub[i,1,k] = correctC[i,k-1];
            @async for j=2:(nF+1)
                sub[i,j,k] = fiPEvaled(xF[i,j-1],sub[i,j-1,k]);
            end
        end
        toc();

        # predict and correct
        for i=1:nC
            yC[i+1,k] = ciPEvaled(xC[i],correctC[i,k]);
            correctC[i+1,k] = yC[i+1,k] - yC[i+1,k-1] + sub[i,nF+1,k];
        end
    end
    yF = zeros(nC*(nF+1),K-1);
    for k=2:K
        yF[:,k-1] = reshape(sub[:,:,k]',nC*(nF+1));
    end
    return reshape(xF',nC*(nF+1)),reshape(sub[:,:,K]',nC*(nF+1)),yF,sub,xC,correctC,yC;
end
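The timing experiment described in Section 4 can be reproduced along the following lines (our addition; slowEuler, the artificial delay, and the parameter values are illustrative assumptions, with euler as defined after Listing 2):

    # A fine integrator whose cost is dominated by an artificial delay, making
    # the concurrency provided by @sync/@async visible in the tic()/toc() output.
    @everywhere function slowEuler(delta, x0, y0, f)
        sleep(0.001)              # stand-in for an expensive right-hand side
        return y0 + delta*f(x0, y0);
    end

    f = (t, y) -> sin(t*y);
    # 10 coarse subintervals, 50 fine steps each, up to 5 parareal iterations
    x, y, yF, sub, xC, correctC, yC =
        parareal(-20.0, 20.0, 10, 50, 5, 10.0, f, euler, slowEuler);

Because each @async task yields to the scheduler while its fine integrator sleeps, the per-iteration wall-clock time grows with the number of fine steps per subdomain rather than with the total number of fine steps.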

Figure 2: Slow parareal example. (left) Solution after first iteration with Euler's method. (right) Solution after ninth iteration with Euler's method.

Listing 2: Implementation of a graphical simulator of the parareal algorithm in Julia.

@everywhere function fullMethod(n,a,b,y0,f,integrator)
    # set up domain and range space
    x = linspace(a,b,n+1);
    deltaX = x[2] - x[1];
    y = ones(n+1,1);

    # initialize left endpoint
    y[1] = y0;

    # integrate each point
    for i=1:n
        y[i+1] = integrator(deltaX,x[i],y[i],f);
    end
    return x,y;
end

function simulate(a,b,N,M,K,y0,f,coarseInt,fineInt,showPrev)
    x1,y1 = fullMethod(N*(M+1),a,b,y0,f,fineInt);
    x,y,yF,sub,xC,yC,iC = parareal(a,b,N,M,K,y0,f,coarseInt,fineInt);
    xF = (reshape(x,M+1,N))';
    fine = M+1;
    for k=2:K
        display(plot(x1,y1));
        if (showPrev && k > 2)
            display(scatter!(xC,yC[:,k-2],color="red",legend=false));
        end
        display(scatter!(xC,yC[:,k-1],color="green",legend=false));
        done = zeros(Int64,N,1);
        workingSubdomains = 1:N;
        while (done != (M+1)*ones(N,1))
            index = Int64(ceil(size(workingSubdomains,1)*rand()));
            currThread = workingSubdomains[index];
            while (done[currThread] == M+1)
                currThread = Int64(ceil(N*rand()));
            end
            currThreadPlot = Int64(ceil(fine*rand()));
            totalAdvance = done[currThread] + currThreadPlot;
            if (totalAdvance > fine)
                totalAdvance = fine;
            end
            newP = (done[currThread]+1):totalAdvance;
            display(plot!(xF[currThread,newP],sub[currThread,newP,k],color="black"));
            done[currThread] = totalAdvance;
            workingSubdomains = find(((x) -> x != M+1), done);
            print(join(["Working on subdomain #", currThread, "...",
                        "Pending Subdomains: ", workingSubdomains', "\n"]));
        end
        display(plot!(x,yF[:,k-1],color="orange"));
        sleep(5);
    end
end

# Implementations of integration schemes.
function euler(delta, x0, y0, f)
    return y0 + delta*f(x0, y0);
end

function rungeKutta(delta, x0, y0, f)
    k1 = f(x0, y0);
    k2 = f(x0 + delta/2, y0 + (delta/2)*k1);
    k3 = f(x0 + delta/2, y0 + (delta/2)*k2);
    k4 = f(x0 + delta, y0 + delta*k3);
    return y0 + (delta/6)*(k1 + 2*k2 + 2*k3 + k4);
end

For instance, consider the differential equation

    y′(x) = sin(xy),   x ∈ [−20, 20]

with y(−20) = 10, ∆ = 4 (10 points), and δ = 0.008 (500 points). Figure 2 shows the first and ninth iterations. The ninth iteration's large error on the right end of the interval shows that this is an example where parareal convergence is slow. This is as inefficient as possible, needing as many iterations as subdomains in order for the solution to converge. However, the simulation also shows that if f(x, y) = sin(x)eˣ, then the solution converges after just one iteration. These two examples show that the algorithm's efficiency can be highly dependent on the integrand. Below the simulation function are Euler's method and another Runge-Kutta integration technique that can be passed as first-class functions, as coarse or fine integration techniques, to the parareal or simulate functions. A Git repository of both the implementation and graphical simulation is available on BitBucket at https://bitbucket.org/sperugin/parareal-implementation-and-simulation-in-julia. All of the graphical plots are generated with the Julia Plots package available at https://juliaplots.github.io/. A video describing this application of Julia is available on YouTube at https://www.youtube.com/watch?v=MtgbeLO6ZM4.
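The slow-converging experiment above can be launched with a call along these lines (our addition; the argument values approximate ∆ = 4 and δ = 0.008 described in the text, i.e., N = 10 subdomains with M = 500 fine points each, and showPrev = true displays the previous iteration's coarse guesses in red):

    f = (x, y) -> sin(x*y);
    simulate(-20.0, 20.0, 10, 500, 10, 10.0, f, euler, rungeKutta, true)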

REFERENCES

[1] Julia Micro-Benchmarks. Available: https://julialang.org/benchmarks/ [Last accessed: 14 November 2018].
[2] Julia v1.0 Documentation: Parallel Computing. Available: https://docs.julialang.org/en/v1/manual/parallel-computing/ [Last accessed: 14 November 2018].
[3] E. Aubanel. 2011. Scheduling of tasks in the parareal algorithm. Parallel Computing 37, 3 (2011), 172–182.
[4] J. Bezanson, J. Chen, S. Karpinski, V. Shah, and A. Edelman. 2014. Array Operators Using Multiple Dispatch: A Design Methodology for Array Implementations in Dynamic Languages. In Proceedings of the ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming. ACM Press, New York, NY, 56–61.
[5] M.J. Gander and S. Vandewalle. 2007. Analysis of the parareal time-parallel time-integration method. SIAM Journal on Scientific Computing 29, 2 (2007), 556–578.
[6] J.-L. Lions, Y. Maday, and G. Turinici. 2001. A "parareal" in time discretization of PDE's. Comptes Rendus de l'Académie des Sciences - Series I - Mathematics 332 (2001), 661–668.
[7] J. Moffit and B.A. Tate. 2014. Julia. In Seven More Languages in Seven Weeks: Languages That Are Shaping the Future, B.A. Tate, F. Daoud, I. Dees, and J. Moffit (Eds.). Pragmatic Bookshelf, Dallas, TX, Chapter 5, 171–207.
[8] A.S. Nielsen. 2012. Feasibility study of the parareal algorithm. Master's thesis. Technical University of Denmark.