Compiling First-Order Functions to Session-Typed Parallel Code

David Castro-Perez (Imperial College London, London, UK, [email protected])
Nobuko Yoshida (Imperial College London, London, UK, [email protected])

Abstract
Building correct and efficient message-passing parallel programs still poses many challenges. The incorrect use of message-passing constructs can introduce deadlocks, and a bad task decomposition will not achieve good speedups. Current approaches focus either on correctness or on efficiency, but limited work has been done on ensuring both. In this paper, we propose a new parallel programming framework, PAlg, which is a first-order language with participant annotations that ensures deadlock-freedom by construction. PAlg programs are coupled with an abstraction of their communication structure, a global type from the theory of multiparty session types (MPST). This global type serves as an output for the programmer to assess the efficiency of their achieved parallelisation. PAlg is implemented as an EDSL in Haskell, from which we: 1. compile to low-level message-passing code; 2. compile to sequential C code, or interpret as sequential Haskell functions; and 3. infer the communication protocol followed by the compiled message-passing program. We use the properties of global types to perform message reordering optimisations on the compiled C code. We prove the extensional equivalence of the compiled code, as well as protocol compliance. We achieve linear speedups on a shared-memory 12-core machine, and a speedup of 16 on a 2-node, 24-core NUMA machine.

Keywords: multiparty session types, parallelism, arrows

CC '20, February 22–23, 2020, San Diego, CA, USA. ACM ISBN 978-1-4503-7120-9/20/02. https://doi.org/10.1145/3377555.3377889

1 Introduction
Structured parallel programming is a technique for parallel programming that requires the use of high-level parallel constructs, rather than low-level send/receive operations [52; 62]. A popular approach to structured parallelism is the use of algorithmic skeletons [20; 36], i.e. higher-order functions that implement common patterns of parallelism. Programming in terms of high-level constructs rather than low-level send/receive operations is a successful way to avoid common concurrency bugs by construction [38]. One limitation of structured parallelism is that it restricts programmers to a set of fixed, predefined parallel constructs. This is problematic if a function does not match one of the available parallel constructs, or if a program needs to be ported to an architecture where some of the skeletons have not been implemented. Unlike previous structured parallelism approaches, we do not require the existence of an underlying library or implementation of common patterns of parallelism.

In this paper, we propose a structured parallel programming framework whose front-end language is a first-order language based on the algebra of programming [2; 3]. The algebra of programming is a mathematical framework that codifies the basic laws of algorithmics, and it has been successfully applied to, e.g., program calculation techniques [4], datatype-generic programming [35], and parallel computing [66]. Our framework produces message-passing parallel code from program specifications written in the front-end language. The programmer controls how the program is parallelised by annotating the code with participant identifiers. To make sure that the achieved parallelisation is satisfactory, we produce as an output a formal description of the communication protocol achieved by a particular parallelisation. This formal description is a global type, introduced by Honda et al. [42] in the theory of Multiparty Session Types (MPST). We prove that the parallelisation, and any optimisation performed on the low-level code, respects the inferred protocol. The properties of global types justify the message reordering done by our back-end. In particular, we permute send and receive operations whenever sending does not depend on the values received. This is called asynchronous optimisation [57], and it removes unnecessary synchronisation while remaining communication-safe.

1.1 Overview
[Figure 1. Overview — PAlg (§3) is the input; code generation produces Parallel Code (§5) and protocol inference produces an MPST global type (§4); typability relates Parallel Code and MPST, and the global type drives the optimise step.]

Our framework has three layers: (1) Parallel Algebraic Language (PAlg), a point-free first-order language with participant annotations, which describe which process is in charge of executing which part of the computation; (2) Message Passing Monad (Mp), a monadic language that represents low-level message-passing parallel code, from which we generate parallel C code; and (3) global types (from MPST), a formal description of the protocol followed by the output Mp code. Fig. 1 shows how these layers interact. PAlg, highlighted in green, is the input to our framework; Mp and global types (MPST), highlighted in yellow, are the outputs. We prove that the generated code behaves as prescribed by the global type, and any low-level optimisation performed on the generated code must respect the protocol. As an example, we show below a parallel mergesort.

  msort :: (CVal a, CAlg f) => Int -> f [a] [a]
  msort n = fix n $ \ms x -> vlet (vsize x) $ \sz ->
    if sz <= 1 then x
    else vlet (sz / 2) $ \sz2 ->
         vlet (par ms $ vtake sz2 x) $ \xl ->
         vlet (par ms $ vdrop sz2 x) $ \xr ->
         app merge $ pair (sz, pair (xl, xr))

The return type of msort, f [a] [a], is the type of first-order programs that take lists of values [a], and return [a]. Constraint CAlg restricts the kind of operations that are allowed in the function definition. The integer parameter to function fix is used for rewriting the input programs, limiting the depth of recursion unrolling. par is used to annotate the functions that we want to run at different processes, and function app is used to run functions at the same participant as their inputs. In case this input comes from different participants, all values are first gathered at any of them, and then the function is applied. We can instantiate f either as a sequential program, as a parallel program, or as an MPST protocol. We prove that the sequential program and the output parallel programs are extensionally equal, and that the output parallel program complies with the inferred protocol. For example, interpreting msort 1 as a parallel program produces C code that is extensionally equal to its sequential interpretation, and behaves as the following protocol:

[Diagram: p1 performs a choice (⊕) and interacts with p2 and p3, which reply to p1.]

This is a depth-1 divide-and-conquer, where p1 divides the task, sends the sub-tasks to p2 and p3, and combines the results. If the input is small, p1 produces the result directly. Our prototype implementation is a tagless-final encoding [9] in Haskell of a point-free language. Constraint CAlg is a first-order form of arrows [45; 61], with a syntactic sugar layer that allows us to write code closer to (point-wise) idiomatic Haskell. The remainder of the paper focuses on the language underlying CAlg.

Why Multiparty Session Types There are both practical and theoretical advantages. On the theoretical side, the theory of multiparty session types ensures deadlock-freedom and protocol compliance. The MPST theory guarantees that the code that we generate complies with the inferred protocol (Theorem 5.2), which greatly simplifies the proof of extensional equivalence (Theorem 5.3), by allowing us to focus on representative traces, instead of all possible interleavings of actions. On the practical side, we perform message reordering optimisations based on the global types [57]. Moreover, an explicit representation of the communication protocol is a valuable output for programmers, since it can be used to assess a parallelisation (Fig. 4).

1.2 Outline and Contributions
§2 defines the Algebraic Functional Language (Alg), a language inspired by the algebra of programming, that we use as a basis for our work; §3 proposes the Parallel Algebraic Language (PAlg), our front-end language, as an extension of Alg with participant annotations; §4 introduces a protocol inference relation that associates PAlg expressions with MPST protocols, specified as global types. We prove that the inferred protocols are deadlock-free, i.e. every send has a matching receive. Moreover, we use the global types to justify message reordering optimisations, while preserving communication safety; §5 develops a translation scheme which generates message-passing code from PAlg, which we prove to preserve the extensionality of the input programs; §6 demonstrates our approach using a number of examples. We will provide as an artifact our working prototype implementation, and the examples that we used in §6, with instructions on how to replicate our experiments.

2 Algebraic Functional Language
This section describes the Algebraic Functional Language (Alg) and its combinators. In functional programming languages, it is common to provide these combinators as abstractions defined in a base language. For example, one such combinator is the split function (△), also known as fanout, or (&&&), in the arrow literature [45] and the Control.Arrow Haskell package [61]. Programming in terms of these combinators, avoiding explicit mention of variables, is known as point-free programming. Another approach is to translate code written in a pointed style, i.e. with explicit use of variables, to a point-free style [23; 44]. This translation can be fully automated [23; 29]. In our approach, we define common point-free combinators as syntactic constructs of Alg, and require programs to be implemented in this style. Our implementation provides a layer of syntactic sugar for programmers to refer to variables explicitly, as shown in msort in §1, but it builds internally a point-free representation.

2.1 Syntax
  F1, F2 ::= I | Ka | F1 + F2 | F1 × F2
  a, b   ::= 1 | int | ... | a → b | a + b | a × b | F a | µF
  e1, e2 ::= f | v | const e | id | e1 ◦ e2 | πi | e1 △ e2 | ιi | e1 ▽ e2
           | F e | inF | outF | recF e1 e2
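To make the point-free style concrete before studying this syntax in detail, the following plain-Haskell sketch (our own; not part of the paper's EDSL) renders split (△) and case (▽) at the function arrow, using (&&&) from Control.Arrow as mentioned above:

  import Control.Arrow ((&&&), (>>>))

  -- Alg's split (△): apply two functions to the same input.
  split :: (a -> b) -> (a -> c) -> a -> (b, c)
  split f g = f &&& g                       -- (f △ g) x = (f x, g x)

  -- Alg's case (▽), with coproducts encoded as Either.
  caseOf :: (a -> c) -> (b -> c) -> Either a b -> c
  caseOf = either                           -- (f1 ▽ f2) (ιi x) = fi x

  -- A small point-free program in this style: the mean of a list.
  mean :: [Double] -> Double
  mean = (sum &&& (fromIntegral . length)) >>> uncurry (/)

Restricting programs to first-order compositions of such combinators is what makes the dataflow, and hence the communication structure, explicit.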

In our syntax, f1, f2, ... capture atomic functions, which are functions of which we only know their types; v1, v2, ... are values of primitive types (e.g. integer and boolean); e1, e2, ... represent expressions; F1, F2, ... are functors; and a, b, ... are types. The syntax and semantics are standard [34; 53].

Constant and identity functions, and function composition, are const, id and ◦ respectively. Products are represented using the standard pair notation: if x : a and y : b, then (x, y) : a × b. The functions on product types are πi and △; they represent, respectively, the projections and the split operation: (f △ g)(x) = (f x, g x). Coproducts have two constructors, the injections ιi, which build values of type a + b. The ▽ combinator is the case operation: (f1 ▽ f2)(ιi x) = fi x. Products and coproducts can be generalised to multiple arguments: a × b × c is isomorphic to a × (b × c), and to (a × b) × c. We use Π_{i∈[1,n]} ai as notation for the product of more than two types; similarly, we use Σ for coproducts. This notation binds tighter than any other construct. Whenever ∀i, j ∈ I, ai = aj = a, we use the notation Π_n a as a synonym for Π_{i∈[1,n]} ai.

Functors are objects that take types to types, and functions to functions, such that identities and compositions are preserved. In this work, we focus on polynomial functors [31], which are defined inductively: I is the identity functor, and takes a type a to itself; Kb is the constant functor, and takes any type to b; F1 × F2 is the product functor, and takes a type a to F1 a × F2 a; F1 + F2 is the coproduct functor, and takes a type to a coproduct type. A term F e behaves as mapping the term e over the I positions in F. For example, if F = Ka × I × I, then applying F e to (x, y, z) yields (x, e y, e z).

Recursion is captured by the combinators in, out, rec, and the type µF. We use standard isorecursive types [31; 47; 53], where µF is isomorphic to F µF, and the isomorphism is given by the combinators inF (roll) and outF (unroll). For any polynomial functor F, the type µF and strict functions inF and outF are guaranteed to exist. In our implementation, inF is just a constructor (like inji). Recursion is recF e1 e2, and it is known as a hylomorphism [53]. A hylomorphism captures a divide-and-conquer algorithm, with a structure described by F, where e1 is the conquer term and e2 the divide term. Using hylomorphisms requires us to work in a semantic interpretation with algebraic compactness, i.e. one in which the carriers of initial F-algebras and terminal F-coalgebras coincide (or are isomorphic). Hylomorphisms and exponentials ap : (a → b) × a → b allow the definition of a general fixpoint operator [54]. Working with hylomorphisms implies that our input programs may not terminate. We guarantee that, given a terminating input program, we will not produce a non-terminating parallelisation (Theorem 5.3).

Example 2.1 (MergeSort in Alg). Assume a type Ls of lists of elements of type a. The functor T = K(Ls) + I × I captures the recursive structure of ms : Ls → Ls. When splitting some l : Ls, we may find one of the two cases described by T: an empty or singleton list, Ls, or a list of size ≥ 2, which can be split in two halves Ls × Ls. Assume a function spl : Ls → T Ls, and a function mrg : Ls × Ls → Ls. We define ms = recT (id ▽ mrg) spl. By the definition of rec:

  ms = recT (id ▽ mrg) spl
     = (id ▽ mrg) ◦ T (recT (id ▽ mrg) spl) ◦ spl
     = (id ▽ mrg) ◦ (id + ms × ms) ◦ spl
     = (id ▽ (mrg ◦ (ms × ms))) ◦ spl

Function ms first applies spl. Then, if the list was empty or a singleton, it returns the input unmodified. Otherwise, ms applies recursively to the first and second halves. Finally, mrg merges the resulting pair of sorted lists.

3 Parallel Algebraic Language
In the previous section we introduced Alg, a point-free functional language. In this section, we extend this language with participant annotations. Annotations occur both at the type and expression levels: at the type level, annotations represent where the data of the respective type is; at the expression level, they represent by whom the computation is performed. This language extension is called PAlg.

The implicit dataflow of the Alg (or PAlg) constructs determines which interactions must take place to evaluate an annotated program. To illustrate this, we use the Cooley-Tukey Fast Fourier Transform algorithm [21]. The Cooley-Tukey algorithm is based on the observation that an FFT of size n, fftn, can be described as the combination of two FFTs of size n/2. We focus on its high-level structure:

  (add@p1 △ sub@p2) ◦ ((fft_{n/2}@p3 ◦ π1) △ ((exp ◦ fft_{n/2})@p4 ◦ π2))

Assume that the input is a pair of vectors that contain the deinterleaved input, i.e. elements at even positions on the left, and odd positions on the right. We first compute the FFTs of size n/2 of the even and odd elements, at p3 and p4 respectively. Then, the first half of the output is produced by adding the results pairwise (at p1), and the second half by subtracting them (at p2). In order to evaluate this expression, we need to know where the input data is. This is specified by the programmer as an annotated type, which we call an interface. Suppose that the interface specifies that the even elements are at p, and the odd elements at p′. The interface that represents this scenario is (vec × vec)@(p × p′), i.e. an annotated pair of vectors, with the first component at p, and the second component at p′. By keeping track of the locations of the data, we obtain the type (vec × vec)@(p1 × p2), which is the output (or codomain) interface of the PAlg expression. We also refer to the annotations (e.g. p1 × p2) as interfaces, whenever there is no ambiguity. We write fftn : (vec × vec)@(p × p′) → (vec × vec)@(p1 × p2) to represent the input and output interfaces of fftn.

Consider now e1@p1 ▽ e2@p2. The output interface of this expression is either p1 or p2, depending on whether the input is the result of applying ι1 or ι2. We represent such interfaces using unions: e1@p1 ▽ e2@p2 : (a + b)@p → c@(p1 ∪ p2). Since p contains a value of a sum type a + b, p is responsible for notifying both p1 and p2 which branch needs to be taken in the control flow.
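As a concrete rendering of Example 2.1 and the rec combinator of §2, the following executable Haskell sketch (our own; the paper's recF works for any polynomial functor, here specialised to T = K(Ls) + I × I) shows the hylomorphism and the mergesort built from it:

  -- T = K(Ls) + I × I, and its functor action "T e".
  data T a r = Done [a] | Split r r

  mapT :: (r -> s) -> T a r -> T a s
  mapT _ (Done xs)   = Done xs
  mapT f (Split l r) = Split (f l) (f r)

  -- Hylomorphism: recT conquer divide = conquer ∘ T (recT conquer divide) ∘ divide.
  recT :: (T a b -> b) -> ([a] -> T a [a]) -> [a] -> b
  recT conquer divide = conquer . mapT (recT conquer divide) . divide

  spl :: [a] -> T a [a]                 -- divide: empty/singleton, or two halves
  spl xs | length xs <= 1 = Done xs
         | otherwise      = uncurry Split (splitAt (length xs `div` 2) xs)

  conq :: Ord a => T a [a] -> [a]       -- conquer: id ▽ mrg
  conq (Done xs)   = xs
  conq (Split l r) = mrg (l, r)

  mrg :: Ord a => ([a], [a]) -> [a]     -- merge two sorted lists
  mrg ([], ys) = ys
  mrg (xs, []) = xs
  mrg (x:xs, y:ys)
    | x <= y    = x : mrg (xs, y:ys)
    | otherwise = y : mrg (x:xs, ys)

  ms :: Ord a => [a] -> [a]             -- ms = recT (id ▽ mrg) spl
  ms = recT conq spl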

Incorrectly notifying the necessary participants will produce incorrect parallelisations that might deadlock. For example, consider the expression e0@p0 ◦ (e1@p1 ▽ e2@p2). Assuming that the input is at p, p needs to notify p0; otherwise p0 will be stuck. To avoid such cases, and to compute the interfaces of an expression, we define a type system for PAlg.

3.1 Syntax of PAlg
  I ::= p | ιi I | I × I        R ::= I | R ∪pì R        P ::= R → R
  e ::= e@p | [p ⊕ pì] | id | e ◦ e | πi | e △ e | ιi | e ▽ e

The syntax of PAlg is that of Alg, extended with participant annotations. Note that certain Alg constructs can only occur under annotations (e@p), e.g. in, out and rec. This implies that recursive functions need to be annotated at a single participant. To parallelise recursive functions, they first need to be rewritten into a suitable form, and the resulting expression then annotated. At the moment, we support automatic recursion unrolling up to a user-specified depth. We provide an overview of the main syntactic constructs of PAlg: annotations, interfaces, and annotated functions.

Annotations are ranged over by R, R′, ... We define them in two layers: I, or simple annotations, which cannot contain choices (∪); and R. This way, we ensure that choices only occur at the topmost level. Simple annotations are: participant ids p, which identify processes; products of interfaces I1 × I2; and tagged interfaces ιi I, which keep track of the branch of the choice that led to I. A choice R1 ∪pì R2 describes a scenario that is the result of a branch in the control flow, where a value can be found at either R1 or R2. Here, pì = p1 ··· pn are the participants whose behaviour depends on the path taken in the control flow. Finally, arrows P of the form R1 → R2 represent the input/output annotations of a parallel program.

Interfaces are annotated types. They range over A, B, ..., and are of the form a@R, which means that values of type a are distributed across R. We require annotated types to be well-formed, WF(a@R), which implies that the structure of a matches that of R. We write I to represent one-hole contexts for interfaces, with I[p] representing the interface that results from placing p at the hole in I.

Annotated functions are ranged over by e, e′. Annotations are introduced using e@p, where e is an unannotated Alg expression, and p is a single participant identifier. These annotations need to be set by the programmer, but their introduction can also be automated. Additionally, we introduce the choice point annotations [p ⊕ pì]. This annotation specifies that p performs a choice, and notifies pì. Choice points can be introduced fully automatically by collecting all participants whose behaviour depends on the value of a sum type.

3.2 Interfaces
An interface represents a state in a concurrent system: the set of participants, and the types of the values that they contain. We use mappings from participants to values to represent such states: V ::= [p ↦ v]_{p∈P}. The programmer, in addition to writing an Alg (PAlg) expression, needs to provide an input interface, i.e. where the input to the parallel program is. Consider, for example, the interface int@pi. Given a concurrent system with participants p0 ··· pn, we know that pi contains a value of type int: [··· pi ↦ 42 ···]. An interface with a product of participants, (a × b)@(p1 × p2), represents a state in which p1 contains an element of type a, and p2 an element of type b; e.g. a possible state represented by (int × vec)@(p1 × p2) is [··· p1 ↦ 42 ··· p2 ↦ [1, 1, 2, ...] ···]. An interface ιi I represents the same state as interface I, but we statically know that this state was reached after an i-th injection. Then, if a participant requires the value at I, this participant will apply the necessary injections to the received values. Finally, an interface a@(R1 ∪pì R2) means that the state might be either R1 or R2, and that all participants pì should be notified of the state.

Well-formedness The above examples are well-formed interfaces: int@pi, (int × vec)@(p1 × p2). Well-formedness ensures that interfaces represent valid states. Generally, a@R is well-formed if a matches the structure of R. For example, int@(p1 × p2) is ill-formed, since a single integer cannot be at two different participants. An interface a@(R1 ∪ R2) requires that both a@R1 and a@R2 are well-formed. So, (vec × vec)@((p1 × p2) ∪ p3) is well-formed, because we can have vec@p1 and vec@p2, or (vec × vec)@p3. However, int@((p1 × p2) ∪ p3) is ill-formed, because int@(p1 × p2) is ill-formed.

3.3 Typing of Parallel Algebraic Language
We introduce a relation that associates Alg expressions with potential parallelisations (PAlg expressions) and their interfaces. This relation can be seen as a type system for both Alg and PAlg. As a type system for PAlg, this relation provides a way to check or infer the output interface of some e. By using this relation as a type system for Alg, we can explore potential parallelisations of some input expression e. Additionally, the type system ensures that all choice point annotations contain every participant that depends on each particular choice.

Typing Rules A judgement of the form ⊢ e ⇒ e : A → B means that the PAlg expression e is one potential parallelisation of the Alg expression e, with domain interface A and codomain interface B. The intuition of a judgement ⊢ e ⇒ e : a@R1 → b@R2 is that the participants in e collectively apply the computation e to the value of type a distributed across R1, and produce a value of type b distributed across R2. We sometimes omit e and write ⊢ e : A → B. We ensure that, given any e and e typeable against interfaces a@Ra → b@Rb, e must have type a → b.

Lemma 3.1. If ⊢ e ⇒ e : a@Ra → b@Rb, then e : a → b.

The typing rules (Fig. 2) must ensure that the participants involved in a choice are notified, and that Alg expressions are correctly expanded.
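The well-formedness check of §3.2 is purely structural, and can be sketched in a few lines of Haskell. The representation below is hypothetical (our own constructor names; the pì annotation on choices is elided, since WF does not inspect it):

  type Pid = Int
  data Ty  = TInt | TVec | TPair Ty Ty | TSum Ty Ty
  data Ann = At Pid                -- p
           | Tag Int Ann           -- ιi I
           | AnnPair Ann Ann       -- I × I
           | Choice Ann Ann        -- R1 ∪pì R2

  wf :: Ty -> Ann -> Bool
  wf _           (At _)        = True              -- any type can sit at one participant
  wf (TPair a b) (AnnPair r s) = wf a r && wf b s  -- structure must match
  wf (TSum a _)  (Tag 1 r)     = wf a r
  wf (TSum _ b)  (Tag 2 r)     = wf b r
  wf t           (Choice r s)  = wf t r && wf t s  -- both branches must be WF
  wf _           _             = False             -- e.g. int@(p1 × p2) is rejected

  -- wf (TPair TVec TVec) (Choice (AnnPair (At 1) (At 2)) (At 3)) == True
  -- wf TInt (Choice (AnnPair (At 1) (At 2)) (At 3))              == False

The two comments at the end mirror the paper's examples: (vec × vec)@((p1 × p2) ∪ p3) is well-formed, while int@((p1 × p2) ∪ p3) is not.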

Rule Choice specifies that a choice point may be introduced at any point when a participant contains a value of a sum type. In such cases, p sends the tag of the sum-type value to any other participant whose behaviour depends on it. After the choice point, the interface is I[ι1 p] ∪pì I[ι2 p], with the constraint that the participants in I[p] must be in pì. Rule Alt specifies that e must be the parallelisation of e, considering both A1 and A2 as input interfaces. The output interface is the union of B1 and B2. Any participant in e must be notified of the choice (pids(e) ⊆ pì), to make sure that they perform the interactions that correspond to the correct Ai. Rule Alg specifies that, given any e and participant p, e@p is a valid parallelisation, with output interface b@p. Finally, rule Ext is crucial for exploring potential parallelisations. It states that if e is the parallelisation of e2, and e2 is extensionally equal to e1, then e is also a parallelisation of e1. The undecidability of this rule requires that the programmer specify rewriting strategies, both for checking and for inference.

  Alg:    ⊢ e : a → b   ⟹   ⊢ e ⇒ e@p : a@I → b@p
  Ext:    ⊢ e2 ⇒ e : a@I → B   and   e1 =ext e2   ⟹   ⊢ e1 ⇒ e : a@I → B
  Alt:    ⊢ e ⇒ e : A1 → B1,   ⊢ e ⇒ e : A2 → B2,   A1 ≠ A2,   pids(e) ⊆ pì
          ⟹   ⊢ e ⇒ e : A1 ∪pì A2 → B1 ∪pì B2
  Choice: pids(I[p]) ⊆ pì,   WF(a@I[ιi p]) for i ∈ [1, 2],   ⊢ e ⇒ e : a@(I[ι1 p] ∪pì I[ι2 p]) → B
          ⟹   ⊢ e ⇒ e ◦ [p ⊕ pì] : a@I[p] → B

  Figure 2. Typing rules of PAlg (selected)

Rewriting and Annotation Strategies We use rewriting strategies when exploring potential parallelisations of functions; this is inference problem (2) below. Let ?i be metavariables. The two inference problems that we are interested in are: 1. solving ⊢ e ⇒ e : A → ?0, which obtains the output interface for e, with input interface A; and 2. solving ⊢ e ⇒ ?0 : A → ?1, whose solutions are potential parallelisations of e, together with their output interfaces. Solving (1) is straightforward. Problem (2) requires deciding how to introduce role annotations (rule Alg), how to perform rewritings (rule Ext), and where to introduce choice points (rule Choice). Introducing choice points is straightforward: we introduce them as early as possible, as soon as an input interface contains a sum type at a participant. For introducing annotations and doing Alg rewritings, the programmer has to specify annotation and rewriting strategies. At the moment, our tool allows the developer to introduce annotations explicitly, or to select sub-expressions that will be annotated with fresh new participants. The rewriting strategies that our current implementation supports are unrollings of recursive definitions. However, our tool is extensible: the equivalences used in the rewritings are a parameter.

Example 3.2 (Mergesort). Consider the mergesort definition ms = recT (id ▽ mrg) spl. Solutions to the inference problem ⊢ ms ⇒ ?0 : Ls@p0 → ?1 provide the alternative parallelisations of ms. By choosing a rewriting strategy that unrolls ms once, and annotates any remaining instances of ms at fresh new participants, we produce the following PAlg expression:

  ⊢ (id ▽ (mrg ◦ (ms × ms))) ◦ spl
  ⇒ (id ▽ (mrg@p1 ◦ ((ms@p2 ◦ π1@p1) △ (ms@p3 ◦ π2@p1)))) ◦ [p1 ⊕ p1p2p3] ◦ spl@p1
  : Ls@p0 → Ls@p1 ∪p1p2p3 Ls@p1

4 Multiparty Session Types for PAlg
The dataflow of the PAlg constructs determines the communication protocol of the annotated expression. However, it is hard to manually check what this communication structure is. Recall the mergesort PAlg expression of §3, ms, and suppose that we want to produce a parallelisation for a 32-core machine. Then, we might be interested in using a 5-unfolding of ms, so that we have ms executing concurrently on all of the cores. How do we know, in such cases, that we produced a sensible parallelisation? As an example, suppose we use an annotation strategy that produces the following code:

  (id ▽ (mrg@p1 ◦ ((ms@p2 ◦ π1@p1) △ (ms@p2 ◦ π2@p1)))) ◦ [p1 ⊕ p1p2] ◦ spl@p1
  : Ls@p0 → Ls@p1 ∪p1p2 Ls@p1

Notice that this example will run correctly, and produce the expected result. However, the achieved PAlg expression is not parallel! If we represent the implicit dataflow of this expression as explicit communication, the reason becomes apparent. We use global types from multiparty session types to provide an explicit representation of the communication structure of the program:

  p0 → p1 : Ls. p1 → p2 : {ι1. end;
                           ι2. p1 → p2 : Ls. p1 → p2 : Ls. p2 → p1 : Ls × Ls. end}

This global type represents the following protocol: 1. participant p0 sends a list to p1; 2. p1 sends to p2 either ι1 or ι2, and if the label is ι1, the protocol ends; 3. if p1 sent ι2, then p1 sends to p2 two lists, in two different interactions; and 4. p2 replies with a message to p1 with a pair of lists. It is clear from this protocol that p1 and p2 depend on each other's messages, and that p2 cannot perform any computation in parallel. The larger the expression, the harder avoiding such wrong annotations becomes. By changing the annotation strategy, we produce the following parallel structure, where p2 and p3 can operate in parallel:

  p0 → p1 : Ls. p1 → {p2 p3} : {ι1. end;
                                ι2. p1 → p2 : Ls. p1 → p3 : Ls. p2 → p1 : Ls. p3 → p1 : Ls. end}

This abstraction of the communication protocol of an achieved parallelisation is therefore useful as an output for the programmer. Additionally, these global types are a contract that can be enforced on the generated code.
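In terms of the recT sketch from §2, the one-step unrolling used in Example 3.2 is just the hylomorphism law applied once; the two remaining recursive occurrences are then the sub-terms one annotates with fresh participants (reusing recT, mapT, conq and spl from that sketch):

  -- ms = (id ▽ (mrg ∘ (ms × ms))) ∘ spl, unrolled once but extensionally equal:
  msUnrolled :: Ord a => [a] -> [a]
  msUnrolled = conq . mapT (recT conq spl) . spl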

We use this both for proving that our back-end is correct, and for applying low-level code optimisations (e.g. message reordering) guided by the global type, ensuring that they do not introduce any run-time errors. For example, when we find p1 → p2. p2 → p3 in a global type, we mark the send/receive actions of p2 as a point of potential optimisation. If the messages exchanged do not depend on each other, we permute them, performing the send action first, so that p2 is not blocked by a receive action. This is known as asynchronous optimisation [57].

4.1 Multiparty Session Types
Our global types are based on the most commonly used in the literature [22]. We start with a set of participant identifiers, p1, p2, ..., and a set of labels, ι1, ι2, .... These are considered as natural numbers: participant identifiers uniquely identify an independent unit of computation, e.g. thread or process ids; and labels are tags that differentiate branches in the data/control flow. The syntax of global (G) and local (L) types in MPST is given as:

  G ::= p1 → p2 : a. G | p1 → {pj}_{j∈[2,n]} : {ιi. Gi}_{i∈I} | µX. G | X | end
  L ::= p!⟨a⟩. L | p?(a). L | p & {ιi. Li}_{i∈I} | {pj}_{j∈[2,n]} ⊕ {ιi. Li}_{i∈I} | µX. L | X | end

The global type p1 → p2 : a. G denotes a data interaction from p1 to p2 with a value of type a. Branching is represented by p1 → {pj}_{j∈[2,n]} : {ιi. Gi}_{i∈I}, with actions ιi from p1 to all pj, j ∈ [2, n]. end represents the termination of the protocol. µX. G represents a recursive protocol, which is equivalent to [µX. G/X]G. We assume recursive types are guarded.

Each participant in G represents a different participant in a parallel process. Local session types represent the communication actions performed by each participant, i.e. the role of the participant. Since each participant has a unique role, we sometimes refer to them interchangeably. The send type p!⟨a⟩. L expresses the action of sending a value of type a to p, followed by the interactions specified by L. The receive type p?(a). L is its dual, where a value of type a is received from p. The selection type represents the transmission to all pj of a label ιi chosen from the set of labels (i ∈ I), followed by Li. The branching type is its dual. pids(G)/pids(L) denote the set of participants that occur in G/L.

Projection We use a standard definition of projection that uses the full merging operator [24; 27], which allows more well-formed global types than the original projection rules [42]. We write G ↾ p for the projection of G onto the role of p. We illustrate projection with an interaction p0 → p1 : a. G. The projection onto p0 is p1!⟨a⟩. (G ↾ p0), the projection onto p1 is p0?(a). (G ↾ p1), and the projection onto any other role p is G ↾ p. Projection on choices is similar, with the difference that, whenever the role is not at the receiving or sending end of the choice, the different branches must be merged. Two local types can be merged when they are the same, or when they branch on the same role and their continuations can be merged.

We use a standard definition of well-formedness, which states that a global type is well-formed if its projection on all of its roles is defined. We denote: WF(G) = ∀p ∈ pids(G), ∃L. G ↾ p = L.

4.2 Protocol Relation
We now introduce the set of rules that associate a PAlg expression and a domain interface with their global type (Fig. 3). We extend the syntax of global types with G1 ∪pì G2 to represent external choices, i.e. the Gi are the continuations for both branches of a previous choice that affects pì. We also extend the local types, the projection rules ((G1 ∪pì G2) ↾ p = G1 ↾ p ∪pì G2 ↾ p), and the notion of well-formedness. We say that an external choice is well-formed, WF(G1 ∪pì G2), if WF(G1), WF(G2), and, for all p ∉ pì, G1 ↾ p = G2 ↾ p. We omit the annotation of the participants involved in the choice whenever it is not needed. The relation ⊨ e ⇐ A ∼ (G, B) specifies that the parallel code for e and input interface A will behave as the global type G, with output interface B (Fig. 3). The rules are similar to the typing rules of PAlg.

  Choice: ⊨ [p ⊕ pì] ⇐ a[b + c]@I[p] ∼ p → {pì \ p} : {ι1. end; ι2. end}
  Alt:    ⊨ e ⇐ A1 ∼ G1   and   ⊨ e ⇐ A2 ∼ G2   ⟹   ⊨ e ⇐ A1 ∪pì A2 ∼ G1 ∪pì G2
  Alg:    ⊢ e : a → b   ⟹   ⊨ e@p ⇐ a@I ∼ [a@I ⇝ p]
  where:  [a@p1 ⇝ p2] = p1 → p2 : a. end, if p1 ≠ p2;   [a@p ⇝ p] = end;
          [(a × b)@(Ia × Ib) ⇝ p] = [a@Ia ⇝ p] # [b@Ib ⇝ p];   [(a1 + a2)@(ιi I) ⇝ p] = [ai@I ⇝ p]

  Figure 3. Protocol Relation (selected)

Example 4.1 (Mergesort Protocol). The protocol for Example 3.2 is obtained by solving:
⊨ (id ▽ (mrg@p1 ◦ ((ms@p2 ◦ π1@p1) △ (ms@p3 ◦ π2@p1)))) ◦ [p1 ⊕ p1p2p3] ◦ spl@p1 ⇐ Ls@p1 ∼ ?0.

  p1 → {p2 p3} : {ι1. end;
                  ι2. p1 → p2 : Ls. p1 → p3 : Ls. end}
  # (end ∪^{p1 ⊕ p2 p3} (p2 → p1 : Ls. p3 → p1 : Ls. end))
  = p1 → {p2 p3} : {ι1. end;
                    ι2. p1 → p2 : Ls. p1 → p3 : Ls. p2 → p1 : Ls. p3 → p1 : Ls. end}

4.3 Correctness
We guarantee that, for any e s.t. ⊢ e ⇒ e : A → B, with A and B well-formed, there exists a protocol G, and that it is well-formed and deadlock-free.
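As an illustration of §4.1, the following Haskell sketch gives a simplified projection function. Recursion (µX. G) is omitted, and full merging is approximated by merging only branches whose projections coincide (our simplification, strictly weaker than the merging operator of [24; 27]):

  type Role  = String
  type Label = Int

  data GTy = GMsg Role Role String GTy          -- p1 → p2 : a. G
           | GBra Role [Role] [(Label, GTy)]    -- p → {pj} : {ιi. Gi}
           | GEnd

  data LTy = LSend Role String LTy              -- p!⟨a⟩. L
           | LRecv Role String LTy              -- p?(a). L
           | LSel [Role] [(Label, LTy)]         -- {pj} ⊕ {ιi. Li}
           | LBrn Role [(Label, LTy)]           -- p & {ιi. Li}
           | LEnd
           deriving Eq

  project :: GTy -> Role -> Maybe LTy
  project GEnd _ = Just LEnd
  project (GMsg p q a g) r
    | r == p    = LSend q a <$> project g r     -- sender: emit a send
    | r == q    = LRecv p a <$> project g r     -- receiver: emit a receive
    | otherwise = project g r                   -- uninvolved: skip
  project (GBra p qs bs) r
    | r == p      = LSel qs <$> traverse proj bs
    | r `elem` qs = LBrn p <$> traverse proj bs
    | otherwise   = case traverse (\(_, g) -> project g r) bs of
        Just (l : ls) | all (== l) ls -> Just l -- merge: identical branches only
        _                             -> Nothing
    where
      proj (l, g) = (,) l <$> project g r

Under this reading, WF(G) amounts to project returning a defined local type for every role in pids(G).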

Lemma 4.2 (Existence of Associated Global Type). For all WF(A), if ⊢ e : A → B, then there exists G s.t. ⊨ e ⇐ A ∼ G.

Lemma 4.3 (Protocol Deadlock-Freedom). For all WF(A), if ⊢ e : A → B and ⊨ e ⇐ A ∼ G, then WF(G).

Remark. Since local types abstract the behaviour of multiparty typed processes, a well-formed global type ensures that the end-point processes (programs) typed by that global type are guaranteed to satisfy the properties (such as safety and deadlock-freedom) of local types [27; 43].

5 Code Generation
This section addresses the problem of generating low-level parallel code from PAlg expressions. We prove that the generated code complies with its inferred protocol, which has several implications: (1) code generation does not introduce any concurrency errors, and the parallel code is therefore deadlock-free; and (2) we can prove that the generated code is extensionally equal to the input expression by considering only a representative trace, since any valid interleaving of actions must respect this protocol. The target language of our tool is an indexed monad, the Message Passing Monad (Mp). From Mp, we implement our low-level C backend. We implement an untyped version of Mp as a deep embedding in Haskell, and session typing on top of it. This is suitable for code generation: we only generate parallel code if the monadic actions are typeable against the respective local types. Our definition of Mp has significant differences to other embeddings of session types in Haskell, such as the Session monad by Neubauer and Thiemann [58]. First, our Mp monad is deeply embedded in Haskell; and second, we use type indices instead of an encoding of session types in terms of type classes. Our approach is better suited for compilation, since we manipulate session types, and postpone session typing until code generation.

5.1 Message Passing Monad
Mp comprises four basic operations: send, receive, choice and branching, with a standard (asynchronous) semantics. Additionally, for composing actions that depend on the same choice, we introduce case expressions. Our definition of Mp is based on the free monad construction:

  v ::= x | (v, v) | ιi v | ··· | e v
  m ::= ret v | send p v m | recv p a f | sel pì v f1 f2 | brn p m1 m2 | case f1 f2
  f ::= λx. m

Values v are either primitive values, tagged values ιi v, pairs of values, or the result of applying an Alg expression e to a value. We use standard notation for the monadic unit (ret), bind (≫=) and Kleisli composition: f1 >=> f2 = λx. f1 x ≫= f2. The message-passing constructs are standard, except sel, brn and case, which are used for performing choices and composing actions that depend on the same choice.

Each monadic computation f or m has a type m : Mp L a, where a is the return type of m, and L is the type index of Mp, representing the local type that corresponds to the behaviour of the term m. There is almost a one-to-one correspondence between the terms L and the monadic actions m, so we omit the full definition. The types of the constructs that deal with choices use a new type, ⊎, that is isomorphic to sum types, but that can only be constructed and eliminated by using the corresponding monadic constructs:

  sel pì : a + b → (a → Mp L1 c1) → (b → Mp L2 c2) → Mp (pì ⊕ {ι1. L1; ι2. L2}) (c1 ⊎ c2)
  brn p  : Mp L1 a1 → Mp L2 a2 → Mp (p & {ι1. L1; ι2. L2}) (a1 ⊎ a2)
  case   : (a → Mp L1 c) → (b → Mp L2 d) → a ⊎ b → Mp (L1 ∪ L2) (c ⊎ d)

These constructs ensure that the tag used to build a ⊎ b indeed corresponds to the correct branch of the right choice. We use case to compose actions that depend on a previous choice. While this treatment of ⊎ leads to unnecessary code duplication, our back-end easily optimises cases of the form case f f to avoid code duplication.

Parallel programs We define the basic constructs of PAlg in a bottom-up way by manipulating parallel programs. Parallel programs are mappings from participants to their monadic actions: E ::= [pi ↦ mi]_{i∈I}. If mi : Mp Li ai for all i ∈ I, then we write [pi ↦ mi]_{i∈I} : Mp [pi ↦ Li]_{i∈I} [pi ↦ ai]_{i∈I}. The semantics of both local types and monadic actions is defined in terms of such collections of actions or local types, and shared queues of values W, or queues of types Q; e.g. ⟨E, W⟩ →ℓ ⟨E′, W′⟩ is a transition from E to E′, and from shared queues W to W′, with observable action ℓ. We prove a standard safety theorem (Theorem 5.1 below) that guarantees that if a participant does a transition with some observable action, then so does the type index.

Theorem 5.1 (Soundness). Assume E : Mp C A, m : Mp L a and W : Q. Suppose ⟨E[r ↦ m], W⟩ →ℓ ⟨E[r ↦ m′], W′⟩. Then there exists ⟨C[r ↦ L], Q⟩ →ℓ ⟨C[r ↦ L′], Q′⟩ such that W′ : Q′ and m′ : Mp L′ a.

Mp code generation The translation scheme for Mp code generation works recursively on the structure of PAlg expressions. It takes a PAlg expression e and an interface A, and produces a mapping from all participants in e and A to their respective monadic continuations. We write ⟦e⟧(A), and guarantee that ⟦e⟧(A) : A → Mp G B if ⊨ e ⇐ A ∼ (G, B). This means that if e induces protocol G with interfaces A → B, then the generated code behaves as G, with interfaces A and B. Code generation follows a similar structure to global type inference, and is defined by building PAlg constructs as Mp parallel programs. For example, the translation of e@p : a@I → B requires defining the interactions from an interface I that gather a value of type a at p: ⦅a@I ⇝ p⦆ : a@I → Mp [a@I ⇝ p] (a@p).
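The untyped deep embedding of Mp can be sketched as an ordinary free monad in Haskell. The following is our approximation (Val and the constructor names are our rendering of the grammar above, and the session-type index that the paper layers on top is omitted); as a usage example, we include the behaviour of mergesort's participant p1 from Example 5.4 below, with splV, mrgV and msV standing for Val-level versions of spl, mrg and ms:

  import Control.Monad (ap, liftM, (>=>))

  type Pid = Int
  data Val = VInt Int | VPair Val Val | VTag Int Val deriving Show

  data Mp a = Ret a
            | Send Pid Val (Mp a)        -- send p v m
            | Recv Pid (Val -> Mp a)     -- recv p a f
            | Sel [Pid] Int (Mp a)       -- sel: broadcast the chosen label
            | Brn Pid (Mp a) (Mp a)      -- brn: branch on p's label

  instance Functor Mp     where fmap = liftM
  instance Applicative Mp where pure = Ret; (<*>) = ap
  instance Monad Mp where
    Ret x       >>= k = k x
    Send p v m  >>= k = Send p v (m >>= k)
    Recv p f    >>= k = Recv p (f >=> k)
    Sel ps l m  >>= k = Sel ps l (m >>= k)
    Brn p m1 m2 >>= k = Brn p (m1 >>= k) (m2 >>= k)

  -- Participant p1 of the mergesort: choose, scatter, gather, merge.
  p1Act :: (Val -> Val) -> (Val -> Val) -> Val -> Mp Val
  p1Act splV mrgV x = case splV x of
    VTag 1 v           -> Sel [2, 3] 1 (Ret v)        -- base case: no further comms
    VTag _ (VPair l r) -> Sel [2, 3] 2 $ Send 2 l $ Send 3 r $
                          Recv 2 $ \l' -> Recv 3 $ \r' ->
                          Ret (mrgV (VPair l' r'))
    v                  -> Ret v

  -- Participants p2 and p3: branch on p1's choice; in the ι2 branch, sort and reply.
  workerAct :: (Val -> Val) -> Val -> Mp Val
  workerAct msV x = Brn 1 (Ret x)
                          (Recv 1 $ \v -> Send 1 (msV v) (Ret v))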

The definition is analogous to that of [a@I ⇝ p]. The remainder of the translation is straightforward, so we skip the details.

We prove two main correctness results. We guarantee that the generated code behaves as its inferred protocol (Theorem 5.2). We also guarantee that, regardless of the annotations and interfaces chosen for e, the parallel code always produces the same result as the sequential implementation (Theorem 5.3).

Theorem 5.2 (Protocol Conformance of the Generated Code). If ⊨ e ⇐ A ∼ G, then ⟦e⟧(A) complies with protocol G.

Theorem 5.3 (Extensionality). Assume ⊢ e ⇒ e : a@p → b@R and x : a initially at p. If e x = y, then the execution of ⟦e⟧(p) also produces y, distributed across R.

Example 5.4 (MergeSort Code Generation). We show below the code generation for ms (Example 3.2), with p1 as the domain interface:

  p1   ↦ λx. sel {p2, p3} (spl x) (λx. ret x)
             (λx. send p2 (π1 x) ≫= λ_. send p3 (π2 x) ≫= λ_.
                  recv p2 Ls ≫= λx. recv p3 Ls ≫= λy. ret (mrg (x, y)))
  p2,3 ↦ λx. brn p1 (ret x) (recv p1 Ls ≫= λx. send p1 (ms x))

6 Parallel Algorithms and Evaluation
We evaluate our approach using a number of parallel algorithms derived from Alg expressions, and the speedups achieved. The purpose of this is twofold: (i) showing that our approach achieves speedups for an input sequential algorithm, with naïve annotation strategies and limited optimisations (Fig. 5); and (ii) illustrating the practical value of providing a global type that describes the parallel strategy achieved by a particular annotation strategy (Fig. 4). We run all our experiments on 2 NUMA nodes, with 12 cores per node and 62GB of memory, using Intel Xeon CPU E5-2650 v4 @ 2.20GHz chips. We first run our experiments restricting the execution to a single node, to avoid NUMA effects, and then on the 2 NUMA nodes.

6.1 Benchmarks
Mergesort Mergesort is the usual divide-and-conquer algorithm, using a tree-like parallel reduce.

Cooley-Tukey FFT We use a recursive Cooley-Tukey algorithm. The algorithm starts by splitting the elements of the list into those at even and those at odd positions. Then, it recursively computes the FFTs of both, and finally combines the results. To generate a butterfly pattern, we use: products of size n, to store the results of the subsequent interleavings; product associativity, to produce a perfect tree; and asynchronous optimisations.

Dot Product The dot product algorithm zips the inputs, multiplies them pairwise, and then adds them by folding the result. We use products of size n to derive a scatter-gather.

  Mergesort: p0 → {p1, p2} : {ι1. p0 → p1 : 1 + int. p1 → p0 : µL. end;
                              ι2. p0 → p1 : µL. p0 → p2 : µL. p2 → p1 : µL. p1 → p0 : µL. end}
  FFT:       p0 → p1 : µL. p0 → p2 : µL. p0 → p3 : µL. p0 → p4 : µL.
             p1 → p3 : µL. p3 → p1 : µL. p2 → p4 : µL. p4 → p2 : µL.
             p2 → p1 : µL. p1 → p2 : µL. p3 → p4 : µL. p4 → p3 : µL. end
  Dot Prod.: p0 → p1 : µL. p0 → p2 : µL. p0 → p3 : µL.
             p2 → p3 : int. p1 → p3 : int. p3 → p0 : int. end

  Figure 4. Benchmarks: potential parallelisations.

Additional Algorithms We implemented scalar prod, which recursively splits a matrix into sub-matrices, distributes them to different workers, and then multiplies their elements by a scalar; and quicksort, with a divide-and-conquer structure.
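For reference, the sequential structure of the dot product benchmark is the classic zip–multiply–fold, which can be written point-free (our sketch, not the benchmark's actual source):

  import Control.Arrow ((>>>))

  -- Zip the inputs, multiply pairwise, and fold the results with (+).
  dotProd :: ([Double], [Double]) -> Double
  dotProd = uncurry zip >>> map (uncurry (*)) >>> sum

In the parallelisation of Fig. 4, p0 scatters chunks of the zipped input, the partial sums are combined at p3, and the result is returned to p0.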

6.2 Evaluation
We translate Mp monadic actions to C using pthreads and shared buffers for communication, and we have a preliminary compilation of the first-order sequential terms to C. We compile the generated C code using gcc version 4.8.5. We take the average of 50 repetitions for each benchmark. Our benchmarks achieve reasonable speedups against the sequential C implementations. Fig. 5 presents the speedups against the number of participants for different input sizes, and Fig. 6 presents a summary of our speedups for large inputs of size > 10^9. We show below an analysis of these results, by plotting the speedups against two factors: 1. the number of participants (threads) produced by a particular annotation and recursion unrolling, named K; and 2. the input size, e.g. the number of elements in the input list.

[Figure 5. Benchmark speedups, run in 2 NUMA nodes with 12 cores each. The X-axis is the number of workers (K) of the parallel program generated from a set of annotations and recursion unrolling. We show the results for 4 different input sizes. Panels: DotProd (inputs 2^24–2^30), FFT (inputs 2^20–2^26) and Mergesort (inputs 2^24–2^30); one row of plots on 12 cores (speedups up to 12), one on 2 × 12 cores (speedups up to 18).]

Increasing the number of threads (parameter K) increases the speedups obtained, up to a certain value that depends on the number of available cores and the input size. For benchmarks that work better with dynamic task creation, our tool does not currently achieve good performance (e.g. quicksort). For FFT, our tool produces the usual butterfly pattern from a straightforward recursive definition, with which we achieve a speedup of 12 when running on a single shared-memory node. The rest of the examples are limited either by Amdahl's law (justified by their global types in Fig. 4), or by the overhead of the communication and pthread creation with respect to the cost of the computations, but they still achieve speedups of up to 7 and 8 on 12 cores. We can observe that there is a slowdown after creating a much larger number of participants than required. This usually depends on how evenly we can distribute the data amongst workers, and on whether the workers can be evenly scheduled to different cores. We observe that we can achieve further speedups when running our benchmarks on the 2 NUMA nodes. Overall, we observe that our annotation strategies enable good speedups over the sequential implementation, with relatively little effort.

[Figure 6. Achieved speedups: bar chart comparing 12 cores ("12 C") against 2 nodes ("2 x 12 C") for dot, fft, ms, qs and scalar.]

Global types can be used to detect optimisation opportunities that yield efficient parallelisations, such as the butterfly topology in Fig. 4. Without message reordering based on the session types, FFT participant p3 would need to wait for p1's message before sending its own part to p1, i.e. p3's local type would be p1?(µL). p1!⟨µL⟩. ...; this means that p3's local computation would only become available to p1 after p1 finishes its own local computation, thus sequentialising the code. Asynchronous permutations [16; 57] allow us to permute such actions, and still have communication safety, i.e. p1!⟨µL⟩. p1?(µL). .... Global types capture the structure of the parallelisation, which can in some cases be used to justify the achieved speedups. For example, we can observe that the mergesort global type contains a part that needs to happen sequentially (p0 and the last merging point in p1), and this will prevent us from achieving linear speedups.

7 Related Work
López et al. [50] develop a verification framework for MPI/C inspired by MPST, by translating parameterised protocol specifications to protocols in VCC [19]. They focus on verification, not on code or protocol generation. Ng et al. [59; 60] use parameterised MPST [25] to generate an MPI backbone in C that encapsulates the whole protocol (i.e., every endpoint), and merge it with user-supplied computation kernels. Several authors (e.g. [10]) generate skeleton APIs from extensions of Scribble (www.scribble.org). Their approach requires the protocol to be specified beforehand; it is not extracted from sequential code. Unlike ours, none of the above work formally defines code generation or proves its correctness.

Structured parallelism includes the use of high-level constructs in languages with implicit/data parallelism [5; 12–15; 46; 64], algorithmic skeletons [1; 18; 20; 36; 48], and DSLs/APIs that compile to parallel code [8; 11; 28; 63; 69]. Besides safety, such approaches are often highly optimised. However, most rely on a fixed, predetermined range of patterns, typically by design with respect to their application domains. By contrast, our work only relies on send/receive operations, which makes it highly portable, and it can easily be extended to support further parallel structures by extending the annotation strategies. Optimisations for structured parallel approaches also require studying and defining sets of equivalences between patterns [6; 7; 41]. In contrast, our approach does not require the definition of new sets of equivalences, since these are derived from program equivalences.

Lift is a new language for portable parallel code generation, based on a small set of expressive parallel primitives [40; 67; 68]. Currently, its backend focuses on generating high-performance OpenCL code, while our approach focuses on placing computations on different participants of a concurrent/distributed system. Both approaches could be combined: annotations can be used to generate a high-level message-passing layer that distributes tasks to multiple nodes in a GPU cluster, using the global type to minimise communication costs; then, the code at each participant can be compiled to high-performance GPU code using Lift.

Elliott exploits the idea of giving functional programs multiple interpretations in different categories, and shows examples of applications to multiple domains, including parallelism [29; 30]. Our approach is similar in the sense that we allow the specifications of first-order functional programs to have multiple different interpretations, but we focus on generating parallel code, and provide finer-grained control over the parallelisations by adding participant annotations. There is a large body of literature on using program equivalences to derive parallel implementations, e.g. [17; 32; 37; 39; 49; 51; 55; 56; 65; 66]. Our framework is orthogonal, in that we focus on tying a low-level C back-end to global types. Our front-end, however, supports some basic forms of rewriting, and we plan to extend it in the future with more interesting ones from the literature.

8 Conclusions and Future Work
We have presented a novel approach to protocol inference and code generation. By using this approach, we can reason about the extensionality of parallel programs, and about alternative mappings of computations to participants. We produce the parallel program's global type, i.e. its communication protocol, which acts as a contract for the low-level code, and can be used to pin-point potential optimisations, or to assess the suitability of a parallelisation. This approach has several benefits: 1. our message-passing code is deadlock-free by construction, since it follows the data-flow of the program, and the optimisations must respect the global type; 2. we prove that our parallelisations are extensionally equivalent to the input function. Additionally, PAlg code could be used for further purposes, such as parallel GPU/FPGA code generation, by combining our approach with other state-of-the-art code generation techniques. We will study this in future work.

Though our approach can already generate representative parallel protocols, our framework is extensible. E.g., we can extend our framework with dynamic participants to handle dynamic task generation [26], and we plan to use this to capture a wider range of communication patterns for parallel computing, such as load-balancing or work-stealing. We plan to study the extension of our back-end to heterogeneous architectures, e.g. GPU clusters or FPGAs. While our prototype generates code that can achieve speedups against sequential implementations, the optimisations that we support are very basic, and our generated code can be very large. We plan to introduce optimisations that reduce the number of messages exchanged, further message reorderings guided by the global type, and optimisations of the size of the generated code. Finally, we plan to study the instrumentation of global types to estimate statically the speedups of different parallelisations, and to optimise communication costs.

Acknowledgements
We thank Shuhao Zhang for his contributions to the C backend, described in [70]. We thank Francisco Ferreira for the helpful discussions in the early stages of this work. This work was supported in part by EPSRC projects EP/K011715/1, EP/K034413/1, EP/L00058X/1, EP/N027833/1, EP/N028201/1, and EP/T006544/1.

References
[1] Marco Aldinucci, Marco Danelutto, Peter Kilpatrick, Massimiliano Meneghin, and Massimo Torquati. 2011. Accelerating Code on Multi-cores with FastFlow. In Proc. of 17th International Euro-Par Conference (Euro-Par 2011) (LNCS), Vol. 6853. Springer, 170–181.
[2] John Backus. 1978. Can Programming Be Liberated from the Von Neumann Style?: A Functional Style and Its Algebra of Programs. Commun. ACM 21, 8 (Aug. 1978), 613–641.
[3] Richard Bird and Oege De Moor. 1996. The algebra of programming. In NATO ASI DPD. 167–203.
[4] R. S. Bird. 1989. Algebraic Identities for Program Calculation. Comput. J. 32, 2 (1989), 122–126.
[5] Guy E. Blelloch. 1996. Programming Parallel Algorithms. Commun. ACM 39, 3 (1996), 85–97.
[6] Christopher Brown, Marco Danelutto, Kevin Hammond, Peter Kilpatrick, and Archibald Elliott. 2014. Cost-Directed Refactoring for Parallel Erlang Programs. International Journal of Parallel Programming 42, 4 (Aug 2014), 564–582.
[7] Christopher Brown, Kevin Hammond, Marco Danelutto, and Peter Kilpatrick. 2012. A Language-independent Parallel Refactoring Framework. In Proc. of the 5th Workshop on Refactoring Tools (WRT '12). ACM, New York, NY, USA, 54–58.
[8] Kevin J. Brown, Arvind K. Sujeeth, HyoukJoong Lee, Tiark Rompf, Hassan Chafi, Martin Odersky, and Kunle Olukotun. 2011. A Heterogeneous Parallel Framework for Domain-Specific Languages. In Proc. of 2011 Intl. Conf. on Parallel Architectures and Compilation Techniques (PACT'11). IEEE, 89–100.
[9] Jacques Carette, Oleg Kiselyov, and Chung-chieh Shan. 2009. Finally Tagless, Partially Evaluated: Tagless Staged Interpreters for Simpler Typed Languages. J. Funct. Program. 19, 5 (2009), 509–543.

[10] David Castro, Raymond Hu, Sung-Shik Jongmans, Nicholas Ng, and Nobuko Yoshida. 2019. Distributed Programming Using Role Parametric Session Types in Go. In 46th Symp. on Principles of Programming Languages (POPL'19), Vol. 3. ACM, 29:1–29:30.
[11] Hassan Chafi, Arvind K. Sujeeth, Kevin J. Brown, HyoukJoong Lee, Anand R. Atreya, and Kunle Olukotun. 2011. A Domain-Specific Approach to Heterogeneous Parallelism. In Proc. of the 16th Symp. on Principles and Practice of Parallel Programming (PPoPP'11). ACM, 35–46.
[12] Manuel M. T. Chakravarty, Roman Leshchinskiy, Simon L. Peyton Jones, Gabriele Keller, and Simon Marlow. 2007. Data Parallel Haskell: a Status Report. In Proc. of the POPL 2007 Workshop on Declarative Aspects of Multicore Programming (DAMP'07). ACM, 10–18.
[13] Bradford L. Chamberlain, David Callahan, and Hans P. Zima. 2007. Parallel Programmability and the Chapel Language. IJHPCA 21, 3 (2007), 291–312.
[14] Rohit Chandra. 2001. Parallel Programming in OpenMP. Morgan Kaufmann.
[15] Philippe Charles, Christian Grothoff, Vijay A. Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar. 2005. X10: an Object-Oriented Approach to Non-Uniform Cluster Computing. In Proc. of the 20th Conf. on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'05). ACM, San Diego, CA, USA, 519–538.
[16] Tzu-chun Chen, Mariangiola Dezani-Ciancaglini, Alceste Scalas, and Nobuko Yoshida. 2017. On the Preciseness of Subtyping in Session Types. Logical Methods in Computer Science 13, 2 (June 2017).
[17] Yun-Yan Chi and Shin-Cheng Mu. 2011. Constructing List Homomorphisms from Proofs. In Proc. APLAS '11: Asian Symposium on Programming Languages & Systems. 74–88.
[18] Philipp Ciechanowicz and Herbert Kuchen. 2010. Enhancing Muesli's Data Parallel Skeletons for Multi-core Computer Architectures. In 12th Intl. Conf. on High Performance Computing and Communications (HPCC'10). IEEE, 108–113.
[19] Ernie Cohen, Markus Dahlweid, Mark A. Hillebrand, Dirk Leinenbach, Michal Moskal, Thomas Santen, Wolfram Schulte, and Stephan Tobies. 2009. VCC: A Practical System for Verifying Concurrent C. In Proc. of the 22nd Intl. Conf. on Theorem Proving in Higher Order Logics (TPHOLs 2009) (LNCS), Vol. 5674. Springer, 23–42.
[20] Murray Cole. 1988. Algorithmic skeletons: a structured approach to the management of parallel computation. Ph.D. Dissertation. University of Edinburgh, UK.
[21] James W. Cooley and John W. Tukey. 1965. An Algorithm for the Machine Calculation of Complex Fourier Series. Mathematics of Computation 19, 90 (1965), 297–301.
[22] Mario Coppo, Mariangiola Dezani-Ciancaglini, Luca Padovani, and Nobuko Yoshida. 2015. A Gentle Introduction to Multiparty Asynchronous Session Types. In 15th International School on Formal Methods for the Design of Computer, Communication and Software Systems: Multicore Programming (LNCS), Vol. 9104. Springer, 146–178.
[23] Alcino Cunha, Jorge Sousa Pinto, and José Proença. 2006. A Framework for Point-Free Program Transformation. In Proc. of 17th Intl. Workshop on Implementation and Application of Functional Languages (IFL 2005). Springer, 1–18.
[24] Romain Demangeon and Kohei Honda. 2012. Nested Protocols in Session Types. In CONCUR 2012 – Concurrency Theory. Springer, 272–286.
[25] Pierre-Malo Deniélou, Nobuko Yoshida, Andi Bejleri, and Raymond Hu. 2012. Parameterised Multiparty Session Types. Logical Methods in Computer Science 8, 4 (2012).
[26] Pierre-Malo Deniélou and Nobuko Yoshida. 2011. Dynamic Multirole Session Types. In POPL'11. ACM, New York, NY, USA, 435–446.
[27] Pierre-Malo Deniélou and Nobuko Yoshida. 2013. Multiparty Compatibility in Communicating Automata: Characterisation and Synthesis of Global Session Types. In 40th International Colloquium on Automata, Languages and Programming (LNCS), Vol. 7966. Springer, 174–186.
[28] Zach DeVito, Niels Joubert, Francisco Palacios, Stephen Oakley, Montserrat Medina, Mike Barrientos, Erich Elsen, Frank Ham, Alex Aiken, Karthik Duraisamy, Eric Darve, Juan Alonso, and Pat Hanrahan. 2011. Liszt: a Domain Specific Language for Building Portable Mesh-based PDE Solvers. In Conference on High Performance Computing Networking, Storage and Analysis, SC 2011. ACM, 9:1–9:12.
[29] Conal Elliott. 2017. Compiling to Categories. Proc. ACM Program. Lang. 1, ICFP, Article 27 (Aug. 2017), 27 pages.
[30] Conal Elliott. 2017. Generic functional parallel algorithms: Scan and FFT. Proc. ACM Program. Lang. 1, ICFP, Article 48 (Sept. 2017), 24 pages.
[31] Maarten M. Fokkinga and Erik Meijer. 1991. Program Calculation Properties of Continuous Algebras. Number CS-R9104 in Report / Department of Computer Science. CWI.
[32] Jeremy Gibbons. 1996. Computing Downwards Accumulations on Trees Quickly. Theoretical Computer Science 169, 1 (1996), 67–80.
[33] Jeremy Gibbons. 1996. The Third Homomorphism Theorem. Journal of Functional Programming (JFP) 6, 4 (1996), 657–665.
[34] Jeremy Gibbons. 2002. Calculating Functional Programs. In Algebraic and Coalgebraic Methods in the Mathematics of Program Construction. Springer, Chapter 5, 151–203.
[35] Jeremy Gibbons. 2007. Datatype-Generic Programming. In Datatype-Generic Programming. Springer, 1–71.
[36] Horacio González-Vélez and Mario Leyton. 2010. A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers. Softw., Pract. Exper. 40, 12 (2010), 1135–1160.
[37] Sergei Gorlatch. 1999. Extracting and Implementing List Homomorphisms in Parallel Program Development. Science of Computer Programming 33, 1 (1999), 1–27.
[38] Sergei Gorlatch. 2004. Send-receive Considered Harmful: Myths and Realities of Message Passing. ACM Trans. Program. Lang. Syst. 26, 1 (Jan. 2004), 47–56.
[39] Sergei Gorlatch and Christian Lengauer. 1995. Parallelization of Divide-and-Conquer in the Bird-Meertens Formalism. Formal Aspects of Computing 7, 6 (1995), 663–682.
[40] Bastian Hagedorn, Larisa Stoltzfus, Michel Steuwer, Sergei Gorlatch, and Christophe Dubach. 2018. High Performance Stencil Code Generation with Lift. In Proc. of the 2018 Intl. Symp. on Code Generation and Optimization (CGO 2018). ACM, New York, NY, USA, 100–112.
[41] Kevin Hammond, Marco Aldinucci, Christopher Brown, Francesco Cesarini, Marco Danelutto, Horacio González-Vélez, Peter Kilpatrick, Rainer Keller, Michael Rossbory, and Gilad Shainer. 2013. The ParaPhrase Project: Parallel Patterns for Adaptive Heterogeneous Multicore Systems. Springer, 218–236.
[42] Kohei Honda, Nobuko Yoshida, and Marco Carbone. 2008. Multiparty Asynchronous Session Types. In Proc. of 35th Symp. on Princ. of Prog. Lang. (POPL '08). ACM, New York, NY, USA, 273–284.
[43] Kohei Honda, Nobuko Yoshida, and Marco Carbone. 2016. Multiparty Asynchronous Session Types. J. ACM 63, 1 (2016), 9:1–9:67.
[44] Zhenjiang Hu, Hideya Iwasaki, and Masato Takeichi. 1996. Deriving Structural Hylomorphisms from Recursive Definitions. In Proc. ICFP '96: ACM Int. Conf. on Functional Programming (ICFP '96). ACM, New York, NY, USA, 73–82.
[45] John Hughes. 2000. Generalising monads to arrows. Science of Computer Programming 37, 1 (2000), 67–111.
[46] Guy L. Steele Jr. 2005. Parallel Programming and Parallel Abstractions in Fortress. In 14th International Conference on Parallel Architecture and Compilation Techniques (PACT 2005), St. Louis, MO, USA. IEEE Computer Society, 157.
[47] Daniel J. Lehmann and Michael B. Smyth. 1981. Algebraic Specification of Data Types: A Synthetic Approach. Mathematical Systems Theory 14, 1 (Dec. 1981), 97–139.

[48] Mario Leyton and José M. Piquer. 2010. Skandium: Multi-core Programming with Algorithmic Skeletons. In Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP 2010), Pisa, Italy. IEEE Computer Society, 289–296.
[49] Yu Liu, Zhenjiang Hu, and Kiminori Matsuzaki. 2011. Towards Systematic Parallel Programming over MapReduce. In Proc. Euro-Par 2011: European Conference on Parallelism. 39–50.
[50] Hugo A. López, Eduardo R. B. Marques, Francisco Martins, Nicholas Ng, César Santos, Vasco Thudichum Vasconcelos, and Nobuko Yoshida. 2015. Protocol-Based Verification of Message-Passing Parallel Programs. In Proc. of the 2015 Intl. Conf. on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA'15). ACM, 280–298.
[51] Frédéric Loulergue, Wadoud Bousdira, and Julien Tesson. 2017. Calculating Parallel Programs in Coq Using List Homomorphisms. International Journal of Parallel Programming 45, 2 (Apr. 2017), 300–319.
[52] Michael McCool, Arch Robison, and James Reinders. 2012. Structured parallel programming: patterns for efficient computation. Elsevier.
[53] Erik Meijer, Maarten Fokkinga, and Ross Paterson. 1991. Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire. In Functional Programming Languages and Computer Architecture. Springer, 124–144.
[54] Erik Meijer and Graham Hutton. 1995. Bananas in Space: Extending Fold and Unfold to Exponential Types. In Proceedings of the Seventh International Conference on Functional Programming Languages and Computer Architecture (FPCA '95). ACM, New York, NY, USA, 324–333.
[55] Akimasa Morihata. 2013. A Short Cut to Parallelization Theorems. In Proc. ICFP 2013: 18th Int. Conf. on Functional Programming. 245–256.
[56] Akimasa Morihata and Kiminori Matsuzaki. 2010. Automatic Parallelization of Recursive Functions using Quantifier Elimination. In Proc. FLOPS '10: Functional and Logic Programming. 321–336.
[57] Dimitris Mostrous and Nobuko Yoshida. 2009. Session-Based Communication Optimisation for Higher-Order Mobile Processes. In Proc. of the 9th Intl. Conf. on Typed Lambda Calculi and Applications (LNCS), Vol. 5608. Springer, 203–218.
[58] Matthias Neubauer and Peter Thiemann. 2004. An Implementation of Session Types. In Practical Aspects of Declarative Languages, 6th International Symposium (PADL 2004), Dallas, TX, USA. 56–70.
[59] Nicholas Ng, José Gabriel de Figueiredo Coutinho, and Nobuko Yoshida. 2015. Protocols by Default - Safe MPI Code Generation Based on Session Types. In Proc. of the 24th Intl. Conf. on Compiler Construction (LNCS), Vol. 9031. Springer, 212–232.
[60] Nicholas Ng and Nobuko Yoshida. 2015. Pabble: parameterised Scribble. Service Oriented Computing and Applications 9, 3-4 (2015), 269–284.
[61] Ross Paterson. 2018. arrows: Arrow classes and transformers. http://hackage.haskell.org/package/arrows-0.4.4.2.
[62] Fethi A. Rabhi and Sergei Gorlatch (Eds.). 2003. Patterns and Skeletons for Parallel and Distributed Computing. Springer.
[63] Jonathan Ragan-Kelley, Connelly Barnes, Andrew Adams, Sylvain Paris, Frédo Durand, and Saman P. Amarasinghe. 2013. Halide: a Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines. In Proc. of Conf. on Programming Language Design and Implementation (PLDI '13). ACM, 519–530.
[64] James Reinders. 2007. Intel threading building blocks - outfitting C++ for multi-core processor parallelism. O'Reilly.
[65] Rodrigo C. O. Rocha, Luís Fabrício Wanderley Góes, and Fernando Magno Quintão Pereira. 2016. An Algebraic Framework for Parallelizing Recurrence in Functional Programming. In Proc. of the 20th Brazilian Symposium on Programming Languages (SBLP 2016) (LNCS), Vol. 9889. Springer, 140–155.
[66] David B. Skillicorn. 1993. The Bird-Meertens Formalism as a Parallel Model. Springer.
[67] Michel Steuwer, Christian Fensch, Sam Lindley, and Christophe Dubach. 2015. Generating Performance Portable Code Using Rewrite Rules. In Proc. ICFP 2015: 20th ACM Conf. on Functional Prog. Lang. and Comp. Arch. 205–217.
[68] Michel Steuwer, Toomas Remmelg, and Christophe Dubach. 2017. Lift: a functional data-parallel IR for high-performance GPU code generation. In Proceedings of the 2017 International Symposium on Code Generation and Optimization (CGO 2017), Austin, TX, USA. ACM, 74–85.
[69] Arvind K. Sujeeth, HyoukJoong Lee, Kevin J. Brown, Tiark Rompf, Hassan Chafi, Michael Wu, Anand R. Atreya, Martin Odersky, and Kunle Olukotun. 2011. OptiML: An Implicitly Parallel Domain-Specific Language for Machine Learning. In Proc. of the 28th Intl. Conf. on Machine Learning (ICML'11). Omnipress, 609–616.
[70] Shuhao Zhang. 2019. Session Arrows: A Session-Type Based Framework For Parallel Code Generation. Master's thesis. Imperial College London.

A Further Definitions
A.1 Algebraic Functional Language

  F1, F2 ::= I | Ka | F1 + F2 | F1 × F2
  a, b   ::= 1 | int | ... | a → b | a + b | a × b | F a | µF
  e1, e2 ::= f | v | const e | id | e1 ◦ e2 | πi | e1 △ e2 | ιi | e1 ▽ e2 | F e | inF | outF | recF e1 e2

  ⊢ f : a → b              if f : a → b ∈ Γ
  ⊢ const e : b → a        if ⊢ e : a
  ⊢ id : a → a
  ⊢ inF : F µF → µF        ⊢ outF : µF → F µF
  ⊢ e1 ◦ e2 : a → c        if ⊢ e1 : b → c and ⊢ e2 : a → b
  ⊢ πi : a1 × a2 → ai      (i ∈ [1, 2])
  ⊢ e1 △ e2 : a → b × c    if ⊢ e1 : a → b and ⊢ e2 : a → c
  ⊢ ιi : ai → a1 + a2      (i ∈ [1, 2])
  ⊢ e1 ▽ e2 : a + b → c    if ⊢ e1 : a → c and ⊢ e2 : b → c
  ⊢ F e : F a → F b        if ⊢ e : a → b
  ⊢ recF e1 e2 : a → b     if ⊢ e1 : F b → b and ⊢ e2 : a → F a

Figure 7. Syntax and types of Alg.

A.1.1 Properties of Alg Constructs
Alg constructs are characterised by well-known properties. Sum and product functions are uniquely determined by their universal properties. Composition and identity must satisfy the associativity and cancellation properties. These basic properties are summarised in Fig. 9. Functors preserve identities and composition, and rec satisfies the hylomorphism laws (Fig. 9b).
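The combinators of Fig. 7 can be read as ordinary functions. The following is a minimal Haskell sketch of the main ones (products, coproducts, and recF as a hylomorphism), illustrating the semantics given in Fig. 8 below; it is our own illustration, not the paper's actual EDSL, and all names are ours.

  -- Products: e1 △ e2 pairs the results of both functions;
  -- e1 × e2 acts componentwise, as in Fig. 8.
  split :: (a -> b) -> (a -> c) -> a -> (b, c)
  split e1 e2 x = (e1 x, e2 x)

  prod :: (a -> b) -> (c -> d) -> (a, c) -> (b, d)
  prod e1 e2 = split (e1 . fst) (e2 . snd)

  -- Coproducts: e1 ▽ e2 eliminates a sum; e1 + e2 maps over one.
  caseSum :: (a -> c) -> (b -> c) -> Either a b -> c
  caseSum = either

  coprod :: (a -> b) -> (c -> d) -> Either a c -> Either b d
  coprod e1 e2 = caseSum (Left . e1) (Right . e2)

  -- rec_F e1 e2 is the hylomorphism: the least f with f = e1 ◦ F f ◦ e2.
  hylo :: Functor f => (f b -> b) -> (a -> f a) -> a -> b
  hylo e1 e2 = e1 . fmap (hylo e1 e2) . e2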

The laws of hylomorphisms can be used to perform some common program optimisations. For example, the well-known deforestation transformation can be derived from the hylomorphism equation and the properties of functors; in particular, it is an instance of Equations A.11 and A.7 in Fig. 9. From the universal properties of △ and ▽ (Fig. 9a), a number of equivalences can be derived, e.g.: πi ◦ (e1 △ e2) = ei; (π1 ◦ e) △ (π2 ◦ e) = e; π1 △ π2 = id; (e1 × e2) ◦ (e3 △ e4) = (e1 ◦ e3) △ (e2 ◦ e4); (e1 ▽ e2) ◦ ιi = ei; (e ◦ ι1) ▽ (e ◦ ι2) = e; ι1 ▽ ι2 = id; and (e1 ▽ e2) ◦ (e3 + e4) = (e1 ◦ e3) ▽ (e2 ◦ e4), where i ∈ {1, 2}. The properties of the combinators provide a formal framework for equational reasoning that can be used as a basis for program transformations [31; 34; 53]. These properties have been used for parallelising functions, e.g. [17; 33; 55]. In this paper, we use =ext for the equations in Fig. 9, to distinguish them from syntactic equality (=).

Constant, Identity and Composition:
  const e = λx. e    id = λx. x    e1 ◦ e2 = λx. e1 (e2 x)
Products:
  πi = λ(x1, x2). xi, i ∈ [1, 2]    e1 △ e2 = λx. (e1 x, e2 x)    e1 × e2 = (e1 ◦ π1) △ (e2 ◦ π2)
Coproducts:
  ιi = λx. inji x    e1 ▽ e2 = λ(inji x). ei x    e1 + e2 = (ι1 ◦ e1) ▽ (ι2 ◦ e2)
Functors:
  I a = a    Ka b = a    (F1 † F2) a = F1 a † F2 a, † ∈ {+, ×}
  I e = e    Ka e = id    (F1 † F2) e = F1 e † F2 e
Recursion:
  inF = λx. inF x    outF = λ(inF x). x    recF e1 e2 = f where f = e1 ◦ F f ◦ e2

Figure 8. Semantics of Alg combinators.

A.2 Parallel Algebraic Language
A.2.1 Typing rules
Rules Id, Comp, Proji and Split are standard. The main feature of this type system is the use of eta-expanded sum-types and unions of interfaces to deal with choices. Rule Choice specifies that a choice point may be introduced at any point where a participant contains a value of a sum-type. In such cases, p sends the tag of the sum-type value to any other participant whose behaviour depends on it. After the choice point, the interface is I[ι1 p] ∪pì I[ι2 p], with the constraint that the participants in I[p] must be in pì. Rule Alt specifies that e must be the parallelisation of e, considering both A1 and A2 as input interfaces. The output interface is the union of B1 and B2. Any participant in e must be notified of the choice, pids(e) ⊆ pì, to make sure that they perform the interactions that correspond to the correct Ai. Rule Join is the same as rule Alt, but we do not require the participants in e to be notified of the choice, since the input interface is the same in both branches of the choice. Rule Inji is used to tag an interface with the i-th injection. Then, rule Casei specifies that if ei is the parallelisation of ei, then e1 ▽ e2 is a parallelisation of e1 ▽ e2, given the tagged input interface ιi I. Note that ej, with j ≠ i, is free in rule Casei. We solve these cases by unification (see rule Alt). If this is not possible, then no branch in the control flow will ever reach ej, and so we can set it to any arbitrary parallelisation of ej, or even optimise ej away. Rule Alg specifies that, given any e and participant p, e@p is a valid parallelisation, with output interface b@p. Finally, rule Ext is crucial for exploring potential parallelisations. It states that if e is the parallelisation of e2, and e2 is extensionally equal to e1, then e is also a parallelisation of e1. The undecidability of this rule requires the programmer to specify rewriting strategies, both for checking and for inference.
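Before the worked example below, the following sequential Haskell sketch fixes intuitions for ms = recT mrg spl, with base functor T a x = (1 + a) + x × x. The concrete definitions of spl and mrg are our own illustration, not code from the paper.

  {-# LANGUAGE DeriveFunctor #-}
  -- T a x = (1 + a) + x × x: empty and singleton lists are leaves;
  -- longer lists are split into two halves.
  data T a x = Leaf (Maybe a) | Node x x deriving Functor

  spl :: [a] -> T a [a]
  spl []  = Leaf Nothing
  spl [x] = Leaf (Just x)
  spl xs  = let (l, r) = splitAt (length xs `div` 2) xs in Node l r

  mrg :: Ord a => T a [a] -> [a]
  mrg (Leaf Nothing)  = []
  mrg (Leaf (Just x)) = [x]
  mrg (Node xs ys)    = merge xs ys
    where
      merge xs' []  = xs'
      merge [] ys'  = ys'
      merge (x:xs') (y:ys')
        | x <= y    = x : merge xs' (y:ys')
        | otherwise = y : merge (x:xs') ys'

  -- hylo as in the sketch after Fig. 7; unrolling once gives
  -- ms = mrg ◦ T ms ◦ spl = mrg ◦ (id + ms × ms) ◦ spl.
  hylo :: Functor f => (f b -> b) -> (a -> f a) -> a -> b
  hylo e1 e2 = e1 . fmap (hylo e1 e2) . e2

  ms :: Ord a => [a] -> [a]
  ms = hylo mrg spl

The participant annotations of the example below decide where spl, mrg and the recursive calls run; this sequential meaning is what Theorem 5.3 preserves.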

(a) Basic properties of combinators:
  id ◦ e = e ◦ id = e                                        (A.1)
  F (e1 ◦ e2) = F e1 ◦ F e2                                  (A.2)
  F id = id                                                  (A.3)
  e1 ◦ (e2 ◦ e3) = (e1 ◦ e2) ◦ e3                            (A.4)
  e = e1 △ e2 ⇔ π1 ◦ e = e1 ∧ π2 ◦ e = e2                    (A.5)
  e = e1 ▽ e2 ⇔ e ◦ ι1 = e1 ∧ e ◦ ι2 = e2                    (A.6)
(b) Hylomorphism laws:
  e3 ◦ e4 = id ⇒ (recF e1 e3) ◦ (recF e4 e2) = recF e1 e2    (A.7)
  η : F1 → F2 ⇒ recF1 (e1 ◦ η) e2 = recF2 e1 (η ◦ e2)        (A.8)
  recF in out = idµF                                         (A.9)
  e1, e2 strict ⇒ recF e1 e2 strict                          (A.10)
  e1 strict ∧ e1 ◦ e3 = e5 ◦ F e1 ∧ e4 ◦ e2 = F e2 ◦ e6 ⇒ e1 ◦ (recF e3 e4) ◦ e2 = recF e5 e6   (A.11)

Figure 9. Properties of point-free combinators.

Example A.1 (Rewriting and annotation strategies). We illustrate how rewriting and annotation strategies work using the mergesort (ms) example. Consider the mergesort definition ms = recT mrg spl. Solutions to the inference problem ⊢ ms ⇒ ?0 : Ls@p0 → ?1 provide the alternative parallelisations of ms. The only two rules that can be applied are Alg or Ext. By rule Alg, we can annotate ms at some p1: ⊢ ms ⇒ ms@p1 : Ls@p0 → Ls@p1. Alternatively, we can use the hylomorphism equation, and apply rule Ext:

  ms = recT mrg spl = mrg ◦ T ms ◦ spl = mrg ◦ (id + ms × ms) ◦ spl

We decide which of the rules to apply by querying a collection of rewriting hints, which we call a rewriting strategy. This collection of hints is of the form [e1 : rw1, ...], and must be specified by the programmer. The rewritings rwi are essentially proofs that ei =ext ei′, obtained by applying the equations in Fig. 9. Once a hint is used, it is removed from the collection of hints. For the mergesort example, if we use the rewriting strategy [ms : unroll 1], we apply rule Ext, unroll the hylomorphism equation once, and continue with an empty strategy []. That is, from ⊢ mrg ◦ (id + ms × ms) ◦ spl ⇒ ?0 : Ls@p0 → ?1 and ms =ext mrg ◦ (id + ms × ms) ◦ spl, rule Ext gives ⊢ ms ⇒ ?0 : Ls@p0 → ?1.

With an empty rewriting strategy, the only possibility once we find the atomic function spl is to use rule Alg. To select a participant, we query the annotation strategy. The annotation strategy is a collection of expressions that we require to place at distinct participants. Suppose that our annotation strategy is {spl}. Then, we would need to select a fresh participant p1: ⊢ spl ⇒ spl@p1 : Ls@p0 → ((1 + a) + Ls × Ls)@p1. If the annotation strategy does not contain spl, then we would select any participant from the input interface, to minimise the number of messages exchanged: ⊢ spl ⇒ spl@p0 : Ls@p0 → ((1 + a) + Ls × Ls)@p0. Suppose that the annotation strategy is {spl}. Then, after spl, we have a sum type at p1. This requires us to introduce a choice point. From ⊢ mrg ◦ (id + ms × ms) ⇒ ?0 : ((1 + a) + Ls × Ls)@(ι1 p1 ∪?3 ι2 p1) → ?1 ∪?3 ?2 and the constraint {p1} ⊆ ?3, rule Choice gives:

  ⊢ mrg ◦ (id + ms × ms) ⇒ ?0 ◦ [p1 ⊕ ?3] : ((1 + a) + Ls × Ls)@p1 → ?1 ∪?3 ?2

By collecting all constraints of the form {p1} ⊆ ?3, we can fully determine the minimum list of participants ?3 that we require. To conclude the example, we show the end result of applying the rest of the rules. The final structure follows a divide-and-conquer parallel structure, which may bypass p2 and p3 if the input list is empty or a singleton:

  ⊢ mrg ◦ (id + ms × ms) ◦ spl
  ⇒ mrg@p1 ◦ (id + (ms@p2 ◦ π1@p1) △ (ms@p3 ◦ π2@p1)) ◦ [p1 ⊕ p1p2p3] ◦ spl@p1
  : Ls@p0 → Ls@p1 ∪p1p2p3 Ls@p1

Join:   ⊢ e ⇒ e : A → B  implies  ⊢ e ⇒ e : A ∪pì A → B ∪pì B
Alt:    ⊢ e ⇒ e : A1 → B1, ⊢ e ⇒ e : A2 → B2, A1 ≠ A2 and pids(e) ⊆ pì  implies  ⊢ e ⇒ e : A1 ∪pì A2 → B1 ∪pì B2
Alg:    ⊢ e : a → b  implies  ⊢ e ⇒ e@p : a@I → b@p
Ext:    ⊢ e2 ⇒ e : a@I → B and e1 =ext e2  implies  ⊢ e1 ⇒ e : a@I → B
Inji:   ⊢ ιi ⇒ ιi : a@I → a@(ιi I)
Id:     ⊢ id ⇒ id : A → A
Proji:  i ∈ [1, 2]  implies  ⊢ πi ⇒ πi : (a1 × a2)@(I1 × I2) → ai@Ii
Comp:   ⊢ e1 ⇒ e1 : B → C and ⊢ e2 ⇒ e2 : A → B  implies  ⊢ e1 ◦ e2 ⇒ e1 ◦ e2 : A → C
Casei:  ⊢ ei ⇒ ei : ai@I → B  implies  ⊢ e1 ▽ e2 ⇒ e1 ▽ e2 : (a1 + a2)@(ιi I) → B
Split:  ⊢ e1 ⇒ e1 : a@I → B and ⊢ e2 ⇒ e2 : a@I → C  implies  ⊢ e1 △ e2 ⇒ e1 △ e2 : a@I → B × C
Choice: ⊢ e ⇒ e : a@(I[ι1 p] ∪pì I[ι2 p]) → B, pids(I[p]) ⊆ pì and tyAt(I, a) = b + c  implies  ⊢ e ⇒ e ◦ [p ⊕ pì] : a@I[p] → B

Figure 10. Typing rules of PAlg.

Definition A.2 (Product and Injection of Choice Interfaces). We sometimes write A × B to represent the product of interfaces that contain choices. We take the product of the respective interfaces, performing first the choices in A, and then the choices in B:

  (R1 ∪pì R2) × R3 = (R1 × R3) ∪pì (R2 × R3)
  I1 × (R1 ∪pì R2) = (I1 × R1) ∪pì (I1 × R2)

We also write injections of interfaces that contain choices:

  ιi (R1 ∪pì R2) = ιi R1 ∪pì ιi R2

Definition A.3 (Well-formedness of interfaces: WF(a@R)). Interface a@R is well formed if R matches the structure of a:

  WF(a@p)
  WF((a1 + a2)@(ιi I))   if WF(ai@I)
  WF((a × b)@(I1 × I2))  if WF(a@I1) and WF(b@I2)
  WF(a@(R1 ∪pì R2))      if WF(a@R1) and WF(a@R2)

For well-formed interfaces, we sometimes propagate the annotation down the type structure, e.g. a@R1 × b@R2 is notation for (a × b)@(R1 × R2). We also define sums of interfaces, a@R1 + b@R2, as notation for (a + b)@(ι1 R1 ∪pì ι2 R2).

Definition A.4 (Projection Rules (G ↾ p) and Merging (L1 ⊓ L2)). Projection defines how to obtain the local type L of a participant p in a global type G, written G ↾ p:

  (p1 → p2 : a. G) ↾ p = p2!⟨a⟩. (G ↾ p)   if p = p1 ≠ p2
                       = p1?(a). (G ↾ p)   if p = p2 ≠ p1
                       = G ↾ p             otherwise

  (p1 → p2 : {li. Gi}i∈I) ↾ p = p2 ⊕ {li. Gi ↾ p}   if p = p1
                              = p1 & {li. Gi ↾ p}   if p = p2
                              = ⊓i∈I (Gi ↾ p)       otherwise

  (µX. G) ↾ p = µX. (G ↾ p)   if G ↾ p ≠ X, for all X
              = end           otherwise
  X ↾ p = X
  end ↾ p = end
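The following Haskell sketch renders projection (Def. A.4) for global types restricted to binary choices and without recursion; Nothing models an undefined projection, and merging is simplified to the branchwise, binary-label case (a degenerate form of the label-set union defined next). The datatypes and names are our own, not the paper's implementation.

  type Role = String
  type Ty   = String

  data GT = GMsg Role Role Ty GT    -- p1 → p2 : a. G
          | GBra Role Role GT GT    -- p1 → p2 {ι1. G1, ι2. G2}
          | GEnd

  data LT = LSend Role Ty LT | LRecv Role Ty LT
          | LSel Role LT LT  | LBrn Role LT LT | LEnd
          deriving Eq

  -- G ↾ p: uninvolved participants of a choice must have mergeable
  -- projections of both branches.
  proj :: GT -> Role -> Maybe LT
  proj GEnd _ = Just LEnd
  proj (GMsg p1 p2 a g) p
    | p == p1   = LSend p2 a <$> proj g p
    | p == p2   = LRecv p1 a <$> proj g p
    | otherwise = proj g p
  proj (GBra p1 p2 g1 g2) p
    | p == p1   = LSel p2 <$> proj g1 p <*> proj g2 p
    | p == p2   = LBrn p1 <$> proj g1 p <*> proj g2 p
    | otherwise = do l1 <- proj g1 p
                     l2 <- proj g2 p
                     mergeL l1 l2

  -- L1 ⊓ L2: identical local types merge; branches on the same role
  -- merge branchwise.
  mergeL :: LT -> LT -> Maybe LT
  mergeL l1 l2 | l1 == l2 = Just l1
  mergeL (LBrn p l1 l2) (LBrn q m1 m2)
    | p == q = LBrn p <$> mergeL l1 m1 <*> mergeL l2 m2
  mergeL _ _ = Nothing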

Merging is defined as follows:

  p & {li. Li}i∈I ⊓ p & {lj. Lj′}j∈J = p & ({lk. Lk ⊓ Lk′}k∈I∩J ∪ {ll. Ll}l∈I\J ∪ {lm. Lm′}m∈J\I)
  µX. L1 ⊓ µX. L2 = µX. (L1 ⊓ L2)
  L ⊓ L = L

Projection onto a role is not necessarily defined. In particular, projecting p1 → p2 : {li. Gi} onto p, with p ≠ p1 and p ≠ p2, is only defined if the projections of all Gi onto p can be merged. Two local types can be merged only if they are the same, or if they branch on the same role and their continuations can be merged. For example, p3's local type for the global type µX. p1 → p2 {l1. p2 → p3 : l2. p1 → p3 : l3. X, l4. p2 → p3 : l5. p1 → p3 : l6. end} is µX. p2 & {l2. p1 & {l3. X}, l5. p1 & {l6. end}}.

Definition A.5 (LTS for Local Type Configurations). ⟨C, Q⟩ →ℓ ⟨C′, Q′⟩, where C = [pi ↦ Li]i∈I and Q = [pi pj ↦ w]i∈I,j∈I:

  ⟨C[p0 ↦ p1!⟨a⟩. L], Q[p0p1 ↦ w]⟩ →p0p1!⟨a⟩ ⟨C[p0 ↦ L], Q[p0p1 ↦ a · w]⟩
  ⟨C[p0 ↦ p1?(a). L], Q[p1p0 ↦ w · a]⟩ →p0p1?(a) ⟨C[p0 ↦ L], Q[p1p0 ↦ w]⟩
  ⟨C[p0 ↦ p1 ⊕ {ιi. Li}i∈I], Q[p0p1 ↦ w]⟩ →p0p1⊕li ⟨C[p0 ↦ Li], Q[p0p1 ↦ li · w]⟩
  ⟨C[p0 ↦ p1 & {ιi. Li}i∈I], Q[p1p0 ↦ w · li]⟩ →p0p1&li ⟨C[p0 ↦ Li], Q[p1p0 ↦ w]⟩

Definition A.6 (Interface projection: A ↾ p). The projection of interface A onto role p is the part of interface A that is located at p. We define the projection for a@R inductively on the structure of R:

  a@p0 ↾ p1 = a if p0 = p1, and 1 otherwise
  (a × b)@(R1 × R2) ↾ p = (a@R1 ↾ p) × (b@R2 ↾ p)
  (a1 + a2)@(ιi R) ↾ p = ai@R ↾ p
  a@(R1 ∪pì R2) ↾ p = (a@R1 ↾ p) ⊎ (a@R2 ↾ p)   if p ∈ pì
                    = a′                        if a′ = (a@R1) ↾ p = (a@R2) ↾ p

A.3 MPST
Definition A.7 (Label Broadcasting). We define a macro that represents the broadcasting of a label to multiple participants in a choice. We write:
• p → {pj}j∈[1,n] : {ιi. Gi}i∈I for p → p1{ιi. p → p2{ιi. ... p → pn{ιi. Gi} ...}}i∈I; and
• {pj}j∈[1,n] ⊕ {ιi. Li}i∈I for p1 ⊕ {ιi. p2 ⊕ {ιi. ... pn ⊕ {ιi. Li} ...}}i∈I.

It is straightforward to show that (p1 → {pj}j∈J : {ιi. Gi}i∈I) ↾ p2 = {pj}j∈J ⊕ {ιi. Gi ↾ p2}i∈I if p1 = p2, and (p1 → {pj}j∈J : {ιi. Gi}i∈I) ↾ p2 = p1 & {ιi. Gi ↾ p2}i∈I if p2 = pj for some j ∈ J.

The relation ⊨ e ⇐ A ∼ G associates e and A with a global type G (Fig. 11). Rules Id, Inji, Proji, and Casei are straightforward. Rule Comp associates two PAlg expressions with the sequencing of their respective global types, G1 # G2. The sequencing produces the global type that results from performing first G1, and then G2, taking into account branching and choices:

  end # G = G
  (p1 → p2 : a. G1) # G2 = p1 → p2 : a. (G1 # G2)
  (p1 → p2{ιi. G1}) # G2 = p1 → p2{ιi. G1 # G2}
  (p1 → p2{ι1. G1; ι2. G2}) # (G3 ∪ G4) = p1 → p2{ι1. G1 # G3; ι2. G2 # G4}
  (G1 ∪ G2) # (G3 ∪ G4) = (G1 # G3) ∪ (G2 # G4)

Rule Choice turns a choice point of p with dependencies pì into a global type choice: p notifies all participants in pì of the branch of the protocol that they need to take. Rule Alt associates e with two protocols, G1 and G2, whenever the input interface is a choice of A1 and A2. Each Gi is the continuation that corresponds to the i-th branch of the choice that led to interface Ai. Note that, contrary to the typing rules of Fig. 2, there is no rule Join. This is because Join was only used to avoid adding too many participants to a choice; at this point, these participants are known, and specified at the choice points. Rule Split associates a split of PAlg expressions with the sequencing of the respective interactions. The rule uses [G2/end]G1 instead of #, because if G1 contains a global type choice, then the interactions described by G2 must be performed after every branch in G1. Since both G1 and G2 start from the same input interface, any choice in either Gi must be independent of any choice in the other Gj. Finally, rule Alg specifies that if the input interface is a@I, and the expression is e@p, then the protocol comprises the sequence of interactions from all participants in I to participant p.
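On the union-free fragment, sequencing specialises to substituting G2 for end in every branch of G1, i.e. [G2/end]G1. A Haskell sketch, reusing the GT type from the projection sketch above (again our own names, not the paper's implementation):

  -- G1 # G2 without interface unions: G2 runs after every branch of G1.
  seqG :: GT -> GT -> GT
  seqG GEnd             g2 = g2
  seqG (GMsg p q a g)   g2 = GMsg p q a (seqG g g2)
  seqG (GBra p q gl gr) g2 = GBra p q (seqG gl g2) (seqG gr g2)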
Example A.8 (Mergesort Protocol). Recall the PAlg expression inferred for ms in Example A.1:

  ⊢ mrg ◦ (id + ms × ms) ◦ spl
  ⇒ mrg@p1 ◦ (id + (ms@p2 ◦ π1@p1) △ (ms@p3 ◦ π2@p1)) ◦ [p1 ⊕ p1p2p3] ◦ spl@p1
  : Ls@p0 → Ls@p1 ∪p1p2p3 Ls@p1

We need to solve ⊨ mrg@p1 ◦ (id + (ms@p2 ◦ π1@p1) △ (ms@p3 ◦ π2@p1)) ◦ [p1 ⊕ p1p2p3] ◦ spl@p1 ⇐ Ls@p0 ∼ ?0. The first step is a straightforward application of Comp. The case for [p1 ⊕ p1p2p3] is the result of applying rule Choice. To aid readability, we treat the left and the right branches of the choice separately:

  ⊨ [p1 ⊕ p1p2p3] ⇐ ((1 + a) + Ls × Ls)@p1 ∼ p1 → {p2p3}{ι1. end; ι2. end}

At this point, the input interface is ((1 + a) + Ls × Ls)@(ι1 p1) ∪p1p2p3 ((1 + a) + Ls × Ls)@(ι2 p1). This means that we need to obtain two sub-protocols, for the left and the right branches respectively. The left branch is solved by applying rule Inj1, while the right branch is solved by rules Case, Inj2, Split, Comp and Alg:

  ⊨ id + (ms@p2 ◦ π1@p1) △ (ms@p3 ◦ π2@p1) ⇐ ((1 + a) + Ls × Ls)@(ι1 p1) ∼ end
  ⊨ id + (ms@p2 ◦ π1@p1) △ (ms@p3 ◦ π2@p1) ⇐ ((1 + a) + Ls × Ls)@(ι2 p1) ∼ p1 → p2 : Ls. p1 → p3 : Ls. end

The interface at this point is ((1 + a) + Ls × Ls)@(ι1 p1 ∪p1p2p3 ι2 (p2 × p3)). For the last expression, mrg@p1, we produce the protocol end ∪ p2 → p1 : Ls. p3 → p1 : Ls. end. This is because, on the left branch, the input is still at p1, so no communication is required; on the right branch, the input is a product of lists, one at p2 and one at p3, so both need to communicate with p1. The final protocol is obtained by applying sequencing:

  p1 → {p2p3} {ι1. end; ι2. p1 → p2 : Ls. p1 → p3 : Ls. end}
  # (end ∪ (p2 → p1 : Ls. p3 → p1 : Ls. end))
  = p1 → {p2p3} {ι1. end; ι2. p1 → p2 : Ls. p1 → p3 : Ls. p2 → p1 : Ls. p3 → p1 : Ls. end}

Id:      ⊨ id ⇐ a@I ∼ end
Inji:    ⊨ ιi ⇐ ai@I ∼ end
Proji:   i ∈ [1, 2]  implies  ⊨ πi ⇐ (a1 × a2)@(I1 × I2) ∼ end
Choice:  ⊨ [p ⊕ pì] ⇐ a[b + c]@I[p] ∼ p → {pì \ p}{ι1. end; ι2. end}
Casei:   ⊨ ei ⇐ ai@I ∼ G  implies  ⊨ e1 ▽ e2 ⇐ (a1 + a2)@(ιi I) ∼ G
Alt:     ⊨ e ⇐ A1 ∼ G1 and ⊨ e ⇐ A2 ∼ G2  implies  ⊨ e ⇐ A1 ∪pì A2 ∼ G1 ∪pì G2
Alg:     ⊢ e : a → b  implies  ⊨ e@p ⇐ a@I ∼ [a@I ⇝ p]
Comp:    ⊨ e2 ⇐ A ∼ G2, ⊨ e1 ⇐ B ∼ G1 and ⊢ e2 : A → B  implies  ⊨ e1 ◦ e2 ⇐ A ∼ G2 # G1
Split:   ⊨ ei ⇐ a@I ∼ Gi (i = 1, 2)  implies  ⊨ e1 △ e2 ⇐ a@I ∼ [G2/end]G1

where [a@p1 ⇝ p2] = p1 → p2 : a. end if p1 ≠ p2;  [a@p ⇝ p] = end;
      [(a × b)@(Ia × Ib) ⇝ p] = [a@Ia ⇝ p] # [b@Ib ⇝ p];
      [(a1 + a2)@(ιi I) ⇝ p] = [ai@I ⇝ p].

Figure 11. Protocol relation.

A.4 Mp
Mp comprises four basic operations: send, receive, choice and branching, with a standard (asynchronous) semantics. Additionally, for composing actions that depend on the same choice, we introduce case expressions.

  v ::= x | (v, v) | ιi v | ··· | e v
  m ::= ret v | m ≫= f | send p v | recv p a | sel pì v f1 f2 | brn p m1 m2 | case f1 f2
  f ::= λx. m

Values v are either primitive values, tagged values ιi v, pairs of values, or the result of applying an Alg expression e to a value. We use standard notation for the monadic unit (ret) and bind (≫=). The term λx. m is a monadic continuation. We write λ_. m when the continuation discards the result of the previous monadic action. We use the standard Kleisli composition: f1 >=> f2 = λx. f1 x ≫= f2.

The message-passing constructs are standard, except sel, brn and case, which are used for performing choices, and for composing actions that depend on the same choice. We explain them in detail below. We include select and branching as syntactic constructs to simplify the typeability of parallel code against local types, but their semantics can be defined in terms of standard pattern matching, plus send and receive operations.

Each monadic computation f or m has a type m : Mp L a, where a is the return type of m, and L is the type index of Mp: it represents the local type that corresponds to the behaviour of the term m. There is an almost one-to-one correspondence between the terms L and the monadic actions m, so we refer the reader to Fig. 16 for the full definition.

Composing Choices. The types of the constructs that deal with choices use a new type, ⊎, that is isomorphic to sum types, but that can only be constructed and eliminated by using the following monadic constructs:

  sel pì : a + b → (a → Mp L1 c1) → (b → Mp L2 c2) → Mp (pì ⊕ {ι1. L1; ι2. L2}) (c1 ⊎ c2)
  brn p  : Mp L1 a1 → Mp L2 a2 → Mp (p & {ι1. L1; ι2. L2}) (a1 ⊎ a2)
  case   : (a → Mp L1 c) → (b → Mp L2 d) → a ⊎ b → Mp (L1 ∪ L2) (c ⊎ d)

These constructs ensure that the tag used to build a ⊎ b indeed corresponds to the correct branch of the right choice. We use case to compose actions that depend on a previous choice. It may seem that this treatment of ⊎ leads to unnecessary code duplication, e.g. the only possibility for composing a single action f after a branch is using case: brn p m1 m2 ≫= case f f. Our back-end easily optimises those cases to avoid code duplication.
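To give operational intuition for send, recv, sel and brn, here is an IO-based Haskell sketch over asynchronous channels. It drops the local-type index of Mp entirely, and every name in it is our own, not the paper's API.

  import Control.Concurrent.Chan

  type Role = String
  data Msg  = Val Int | Label Int          -- simplified payloads
  type Net  = (Role, Role) -> Chan Msg     -- one FIFO per ordered pair, like W

  sendTo :: Net -> Role -> Role -> Msg -> IO ()
  sendTo net from to = writeChan (net (from, to))

  recvFrom :: Net -> Role -> Role -> IO Msg
  recvFrom net from to = readChan (net (from, to))

  -- sel: broadcast the tag of a sum value to the dependent participants,
  -- then run the continuation of the chosen branch.
  sel :: Net -> Role -> [Role] -> Either a b
      -> (a -> IO c) -> (b -> IO c) -> IO c
  sel net me ps v f1 f2 = do
    mapM_ (\p -> sendTo net me p (Label (either (const 1) (const 2) v))) ps
    either f1 f2 v

  -- brn: wait for the tag from the chooser, then take that branch.
  brn :: Net -> Role -> Role -> IO c -> IO c -> IO c
  brn net me p m1 m2 = do
    t <- recvFrom net p me
    case t of
      Label 1 -> m1
      Label 2 -> m2
      _       -> ioError (userError "protocol violation")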

By the definition of the monadic bind, when we compose a branch or select with a case expression, the final local type cannot contain ∪. To illustrate this, consider m1 : Mp L1 (a ⊎ b) and f2 : a ⊎ b → Mp (L2 ∪ L3) (c ⊎ d). The local type of m1 ≫= f2 must be L1 # (L2 ∪ L3). But that is only defined if L1 contains a branch or select. Therefore, m1 ≫= f2 is only well-typed if f2 is a case expression on the tag introduced by the topmost branch or select of m1.

Parallel programs. We define the basic constructs of PAlg in a bottom-up way, by manipulating parallel programs. Parallel programs are mappings from participants to their monadic actions: E ::= [pi ↦ mi]i∈I. If mi : Mp Li ai for all i ∈ I, then we write [pi ↦ mi]i∈I : Mp [pi ↦ Li]i∈I [pi ↦ ai]i∈I.

The semantics of both local types and monadic actions is defined in terms of such collections of actions or local types, and shared queues of values W, or queues of types Q; e.g. ⟨E, W⟩ ⇝ℓ ⟨E′, W′⟩ is a transition from E to E′, and from shared queues W to W′, with observable action ℓ. We prove a standard safety theorem (Theorem 5.1 below), which guarantees that if a participant takes a transition with some observable action, then so does the type index.

Theorem 5.1 (Soundness). Assume E : Mp C A, m : Mp L a and W : Q. Suppose ⟨E[r ↦ m], W⟩ ⇝ℓ ⟨E[r ↦ m′], W′⟩. Then there exists ⟨C[r ↦ L], Q⟩ →ℓ ⟨C[r ↦ L′], Q′⟩ such that W′ : Q′ and m′ : Mp L′ a.

Notations and Operations for Parallel Programs. We simplify the notation for E when all Li are projections of the same global type, and the ai are projections of the same interface. We define the projection of an interface at a participant, A ↾ p, to be the part of A that is at p (Definition A.6). Whenever we have mp : Mp (G ↾ p) (A ↾ p) for all participants p ∈ G, we use the notation [p ↦ mp]p∈pids(G) : Mp G A. This means that the collection of all actions mp behaves as prescribed by G, and produces its result in interface A. Finally, if we have E = [p ↦ fp : A ↾ p → Mp (G ↾ p) (B ↾ p)]p∈pids(G), we write E : A → Mp G B.

Parallel programs have a default value for participants that are not in their domain. Unless otherwise specified, this default value is the identity. For example, E(p) = f if E = E′[p ↦ f], and E(p) = λx. ret x if p ∉ E. We specify the default value using the underscore character as a key in the mapping from participants to monadic actions: [_ ↦ f].

Distributed Values and Execution. We define the execution of a parallel program on a distributed value below. A distributed value V : a@R is a mapping from participants to the value that they hold in the respective interface: [pi ↦ (vi : (a@R) ↾ pi)]i∈I : a@R. Additionally, we require unit to be the default value, so if p ∉ pids(R), then V(p) = ().

Definition A.9 (Execution). Given E = [pi ↦ fi]i∈I and X = [pj ↦ xj]j∈J, we define E(X) = [pi ↦ fi X(pi)]i∈I, with X(pi) = xi if i ∈ J, and X(pi) = () otherwise. Given Y = [pk ↦ yk]k∈K, we say that E(X) executes to Y, written E(X) ⇝* Y, if there is a trace ⟨E(X), ∅⟩ ⇝* ⟨[pi ↦ ret Y(pi)]i∈I, ∅⟩. We write E(X) = Y whenever there is a unique Y s.t. for all Z, E(X) ⇝* Z implies that Z = Y.

Composition and Identity. Composition is the standard Kleisli composition, extended to parallel programs as follows: E1 >=> E2 = [p ↦ E1(p) >=> E2(p)]p∈E1∪E2. Then, E2 ◦ E1 = E1 >=> E2. Identity is simply the empty program with just the default value: id = [].

Split and Projection. The split operation is the participant-wise split, and the i-th projection is the program with the i-th projection as the default value:

  E1 △ E2 = [p ↦ λx. E1(p) x ≫= λy. E2(p) x ≫= λz. ret (y, z)]p∈E1∪E2
  πi = [_ ↦ λx. ret (πi x)]

Case and Injection. Case expressions never occur during code generation, since they are resolved by choices. Injections only tag a branch in the protocol, so we define them as the identity: ιi = [].

Choices. Choices are performed by the participant holding a value of a sum-type, and the tag is notified to the list of participants that depend on it. The definition uses the functions getI(x) and putI(y, x) to extract the value of a sum-type from the hole of a one-hole context I (§3.1), and to replace the value at the hole, respectively:

  [p0 ⊕ p0p1···pn] =
    [p0 ↦ λx. sel (p1···pn) (getI(x)) (λy. ret (putI(y, x))) (λy. ret (putI(y, x))),
     p1 ↦ λx. brn p0 (ret x) (ret x),
     ...,
     pn ↦ λx. brn p0 (ret x) (ret x)]

The presence of the type ⊎ means that we might need to perform a case expression to inspect the result of a previous choice: we define E1 ⊎pì E2 for this.

  E1 ⊎pì E2 = [p ↦ λx. case E1(p) E2(p) x]p∈pì ∪ (E1 ∪ E2)\{pì}

The definition of ⊎pì means that the participants involved in a choice perform a case expression to inspect which branch they need to take, while the rest of the participants continue as specified by either E1 or E2. Note that (E1 ∪ E2)(p) produces E2(p) if p ∈ E1 ∩ E2. This is not an issue during code generation: any participant that is not involved in a choice has the same continuation in both branches.

A.5 Mp code generation
The translation scheme for Mp code generation (Fig. 12) is defined recursively on the structure of PAlg expressions. It takes a PAlg expression e and an interface A, and produces a mapping from all participants in e and A to their respective monadic continuations. We write ⟦e⟧(A), and guarantee that ⟦e⟧(A) : A → Mp G B, if ⊨ e ⇐ A ∼ (G, B). This means that if e induces protocol G with interfaces A → B, then the generated code behaves as G, with interfaces A and B.
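Concretely, a parallel program can be represented as a finite map from participants to Kleisli arrows. The sketch below shows participant-wise composition and split in IO, ignoring the default-value convention and the session-type index; all names are ours.

  import Control.Monad ((>=>))
  import qualified Data.Map as M

  type Role = String
  type Prog a b = M.Map Role (a -> IO b)

  -- E1 >=> E2, assuming both programs mention the same participants
  -- (the paper completes missing entries with a default identity).
  composeP :: Prog a b -> Prog b c -> Prog a c
  composeP = M.intersectionWith (>=>)

  -- E1 △ E2: every participant runs both actions on its input and
  -- pairs the results, as in the definition above.
  splitP :: Prog a b -> Prog a c -> Prog a (b, c)
  splitP = M.intersectionWith (\f g x -> (,) <$> f x <*> g x)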

Code generation follows a similar structure to global type inference. For code generation, we require a partial function cod(e, A) that infers the codomain interface of e using Fig. 2. The translation to Mp requires defining the interactions from an interface I that gathers a type a at p: ⦅a@I ⇝ p⦆ : a@I → Mp [a@I ⇝ p] (a@p). The definition of ⦅·⦆ is analogous to that of [a@I ⇝ p]. The remainder of the translation is straightforward, built on top of the previous definitions.

  ⦅a@p1 ⇝ p0⦆ = [p1 ↦ λx. send p0 x, p0 ↦ λ_. recv p1 a]
  ⦅(a1 + a2)@(ιi I) ⇝ p⦆ = ⦅ai@I ⇝ p⦆ >=> [p ↦ λx. ret (ιi x)]
  ⦅(a × b)@(I1 × I2) ⇝ p⦆ = (⦅a@I1 ⇝ p⦆ × ⦅b@I2 ⇝ p⦆) >=> [pi ↦ λ_. ret ()]pi∈pids(I1×I2)\{p}

  ⟦id⟧(a@I) = []        ⟦ιi⟧(a@I) = []
  ⟦e@p⟧(a@I) = ⦅a@I ⇝ p⦆ >=> [p ↦ λx. ret (e x)]
  ⟦e1 △ e2⟧(a@I) = ⟦e1⟧(a@I) △ ⟦e2⟧(a@I)
  ⟦e1 ◦ e2⟧(A) = ⟦e2⟧(A) >=> ⟦e1⟧(cod(e2, A))
  ⟦e1 ▽ e2⟧((a1 + a2)@(ιi I)) = ⟦ei⟧(ai@I)
  ⟦πi⟧((a × b)@(I1 × I2)) = πi
  ⟦[p ⊕ pì]⟧(a@I[p]) = [p ⊕ pì]
  ⟦e⟧(a@(R1 ∪pì R2)) = ⟦e⟧(a@R1) ⊎pì ⟦e⟧(a@R2)

Figure 12. Translating PAlg to Mp code.

Protocol Compliance. Theorem 5.2 guarantees that the generated code follows the protocol inferred using the relation in Fig. 3. This fact is enough to guarantee that the generated code is deadlock-free. Moreover, we can use it to prove that the generated code is extensionally equal to the input Alg expression. We state this in Theorem 5.3.

Theorem 5.2 (Protocol Conformance of the Generated Code). If ⊨ e ⇐ A ∼ G, then ⟦e⟧(A) complies with protocol G.

This theorem is proved by induction on the structure of the derivation of ⊨, and by the definition of ↾. This result guarantees that the generated code corresponds to the protocol inferred from e. Since the protocol inferred from e is deadlock-free, so is the generated code. See Appendix B.3.

Extensional Equivalence. In addition to deadlock-freedom and protocol compliance, we prove that if e is the annotation of e, then running the code generated from e on x produces the same result as evaluating e on x. This guarantees that, regardless of the annotations and interfaces chosen for e, the parallel code always produces the same result as the sequential implementation. We state this below, in Theorem 5.3, and refer to Appendix C for the full proof.

We specify the extensionality theorem on runnable parallel programs, which are those with a single entry/exit point, i.e. a master worker pm that starts the computation and gathers the results. Given any e, we can guarantee that pm is the entry point by setting it to be the domain interface: ⊢ e : a@pm → b@R. To make it the exit point, we need to make sure that it is also the codomain interface. We can do this by forcing the participants in R to send their values to pm as follows: ⊢ id@pm ◦ e : a@pm → b@pm. However, due to the presence of choices, the codomain interface may contain ∪. For example, if e : a@pm → b@(I1 ∪pì I2), with I1 ≠ I2, then ⊢ id@pm ◦ e : a@pm → b@(pm ∪pì pm), with pm ∈ pì. This means that b@(pm ∪pì pm) ↾ pm = b ⊎ b. To obtain a single value of type b, we use the function join : a ⊎ a → a, which is equivalent to id ▽ id for regular sum-types. We lift it to a monadic action, join(R), to join all branches in R:

  join(I) = []
  join(R1 ∪pì R2) = (join(R1) ⊎pì join(R2)) ≫= [p ↦ λx. ret (join x)]p∈pì

Note that join(R1 ∪pì R2) is only defined if (a@R1 ↾ p) = (a@R2 ↾ p) for all roles. The runnable parallel program for e : a@pm → b@R, written J⟦e⟧(pm), is defined as follows:

  J⟦e⟧(pm) = ⟦id@pm ◦ e⟧(a@pm) ≫= [pm ↦ join(R)]

Our extensionality statement specifies that executing the runnable parallel program for e, with master pm and value x, produces value e x at pm.

Theorem 5.3 (Extensionality). Assume ⊢ e ⇒ e : a@p → b@R and x : a initially at p. If e x = y, then the execution of J⟦e⟧(p) also produces y, distributed across R.

Example A.10 (MergeSort Code Generation). We start with the annotated ms from Example A.1, and use p1 as the master role, to avoid the initial communication from p0 to p1:

  pms = mrg@p1 ◦ (id + (ms@p2 ◦ π1@p1) △ (ms@p3 ◦ π2@p1)) ◦ [p1 ⊕ p1p2p3] ◦ spl@p1
      : Ls@p1 → Ls@p1 ∪p1p2p3 Ls@p1

Note that ⟦id@p1 ◦ pms⟧ would be equivalent to ⟦pms⟧, because the output interface is already of the form p1 ∪ p1. Therefore, for simplicity, we show ⟦pms⟧, and use it to produce the runnable parallel program. Fig. 14 shows the code generation process as a table, where the i-th column is the current code for participant pi, and the last column shows the expression and input interface that are translated next.

Figure 14. Example translation. Each row lists the current code for p1, p2 and p3, followed by the expression and input interface that are translated next:

  (1) p1: λx. ret x    p2: λx. ret x    p3: λx. ret x
      next: ⟦spl@p1⟧(Ls@p1)
  (2) p1: λx. ret (spl x)    p2: λx. ret x    p3: λx. ret x
      next: ⟦[p1 ⊕ p1p2p3]⟧(((1 + a) + Ls × Ls)@p1)
  (3) p1: λx. ret (spl x) ≫= λx. sel {p2, p3} x (λx. ret x) (λx. ret x)
      p2: λx. brn p1 (ret x) (ret x)    p3: λx. brn p1 (ret x) (ret x)
      next: ⟦id + (ms@p2 ◦ π1@p1) △ (ms@p3 ◦ π2@p1)⟧(((1 + a) + Ls × Ls)@(ι1 p1 ∪p1p2p3 ι2 p1))

From this point, we need to produce the code for the two branches. The left branch is straightforward: it is simply λx. ret x for all participants. The result for the right branch is shown in Fig. 15.

Figure 15. Right branch for mergesort, ⟦(ms@p2 ◦ π1@p1) △ (ms@p3 ◦ π2@p1)⟧((Ls × Ls)@p1):

  p1: λx. ret (π1 x) ≫= λy. send p2 y ≫= λz. ret (π2 x) ≫= λt. send p3 t ≫= λr. ret (z, r)
  p2: λ_. recv p1 Ls ≫= λx. ret (ms x) ≫= λx. ret (x, ())
  p3: λ_. recv p1 Ls ≫= λx. ret (ms x) ≫= λx. ret ((), x)

Next, we combine both branches using case, and compose them with the previous result. To avoid unnecessary pattern-matching expressions such as case, we optimise the code using rules of the form sel pì f1 f2 ≫= case f3 f4 = sel pì (f1 ≫= f3) (f2 ≫= f4). Additionally, we optimise away all instances of e.g. (x, ()) using the fact that 1 × a ≅ a × 1 ≅ a. We show below the code for all pi, after composing with join(p1 ∪p1p2p3 p1) and applying these optimisations; the two alternatives passed to sel and brn correspond to the two branches of the protocol:

  p1 ↦ λx. sel {p2, p3} (spl x)
           (λx. ret (mrg (ι1 x)))
           (λx. send p2 (π1 x) ≫= λy. send p3 (π2 x) ≫= λ_.
                recv p2 Ls ≫= λx. recv p3 Ls ≫= λy. ret (mrg (ι2 (x, y))))
         ≫= λx. ret (join x)
  p2 ↦ λx. brn p1 (ret x) (recv p1 Ls ≫= λx. ret (ms x) ≫= λx. send p1 x) ≫= λx. ret (join x)
  p3 ↦ λx. brn p1 (ret x) (recv p1 Ls ≫= λx. ret (ms x) ≫= λx. send p1 x) ≫= λx. ret (join x)

We show in Figure 13 the step-by-step execution of this code on the distributed value [p1 ↦ v], with spl v = ι2 (v1, v2). The final result is equal to p1 applying ms directly to the input.

Figure 13. Step-by-step execution of the parallel code for ms. The input is [p1 ↦ v], with spl v = ι2 (v1, v2):

Step 1:
  p1 ↦ sel {p2, p3} (spl v) (λx. ret (mrg (ι1 x)))
         (λx. send p2 (π1 x) ≫= λy. send p3 (π2 x) ≫= λ_.
              recv p2 Ls ≫= λx. recv p3 Ls ≫= λy. ret (mrg (ι2 (x, y)))) ≫= λx. ret (join x)
  p2 ↦ brn p1 (ret x) (recv p1 Ls ≫= λx. ret (ms x) ≫= λx. send p1 x) ≫= λx. ret (join x)
  p3 ↦ brn p1 (ret x) (recv p1 Ls ≫= λx. ret (ms x) ≫= λx. send p1 x) ≫= λx. ret (join x)

Step 2 (p1 selects the right branch, since spl v = ι2 (v1, v2)):
  p1 ↦ send p2 (π1 (v1, v2)) ≫= λy. send p3 (π2 (v1, v2)) ≫= λ_.
         recv p2 Ls ≫= λx. recv p3 Ls ≫= λy. ret (br2 (mrg (ι2 (x, y)))) ≫= λx. ret (join x)
  p2 ↦ recv p1 Ls ≫= λx. ret (ms x) ≫= λx. send p1 x ≫= λx. ret (br2 x) ≫= λx. ret (join x)
  p3 ↦ recv p1 Ls ≫= λx. ret (ms x) ≫= λx. send p1 x ≫= λx. ret (br2 x) ≫= λx. ret (join x)

Step 3 (after the halves are delivered):
  p1 ↦ recv p2 Ls ≫= λx. recv p3 Ls ≫= λy. ret (br2 (mrg (ι2 (x, y)))) ≫= λx. ret (join x)
  p2 ↦ ret (ms v1) ≫= λx. send p1 x ≫= λx. ret (br2 x) ≫= λx. ret (join x)
  p3 ↦ ret (ms v2) ≫= λx. send p1 x ≫= λx. ret (br2 x) ≫= λx. ret (join x)

Step 4 (final):
  p1 ↦ ret (mrg (ι2 (ms v1, ms v2))) = ret (mrg ((id + ms × ms) (spl v))) = ret (ms v)
  p2 ↦ ret ()        p3 ↦ ret ()

Typing Mp against Local Types. We define a relation between Mp code and the local type that captures its communication behaviour (Fig. 16). We define a judgement of the form Γ ⊢ m : Mp L a, where Mp L a is the type of an Mp expression that conforms to protocol L and returns a value of type a. The types are parameterised by a variable l that represents a local type continuation. The rules in Fig. 16 are straightforward, since they relate in a one-to-one way to the constructs of local types.

Semantics. The operational semantics of Mp terms is standard, and mirrors that of the local type configurations in [27]. The operational semantics is defined as an LTS with transitions of the form ⟨[pi ↦ mi]i∈I, W⟩ ⇝ℓ ⟨[pi ↦ mi′]i∈I, W′⟩. Here, ℓ ::= p0p1!⟨a⟩ | p0p1?(a) | p0p1 ⊕ ιi | p0p1 & ιi is the observable action that takes place, and represents, respectively: p0 sends a value of type a to p1; p1 receives from p0; p0 sends label i to p1; and p1 receives label i from p0. We use the special symbol ϵ to represent that no communication took place. Finally, W is a mapping from ordered pairs of roles to unbounded buffers that contain the data sent between participants.

Definition A.11 (LTS for Mp Terms). ⟨P, W⟩ ⇝ℓ ⟨P′, W′⟩ denotes that P, W transitions to P′, W′ with action ℓ. The transition rules are defined in Fig. 17.

Similarly, we define ⟨C, Q⟩ →ℓ ⟨C′, Q′⟩ for the LTS of local type configurations (Definition A.5). Here, C is a collection of local types, C = [pi ↦ Li]i∈I, and Q is a mapping from ordered pairs of roles to unbounded buffers that contain the types of the data exchanged. We also say that W is compatible with Q, written W : Q, if for all pairs p1p2, whenever w1 ··· wn = W(p1p2), then a1 ··· an = Q(p1p2), and for all i, wi : ai.

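As a concrete rendering of a single transition of Def. A.11, here is a Haskell sketch of one asynchronous step for a send/receive fragment of Mp, with configurations of processes and queues as in the definition; the datatypes and names are our own simplification.

  import qualified Data.Map as M

  type Role = String
  data Act  = Ret Int | Send Role Int Act | Recv Role (Int -> Act)
  type Conf = (M.Map Role Act, M.Map (Role, Role) [Int])

  -- One step of participant p: a send enqueues and never blocks
  -- (asynchrony); a receive dequeues only when a value is available.
  step :: Role -> Conf -> Maybe Conf
  step p (procs, queues) = case M.lookup p procs of
    Just (Send q v k) ->
      Just ( M.insert p k procs
           , M.insertWith (\new old -> old ++ new) (p, q) [v] queues )
    Just (Recv q k) -> case M.findWithDefault [] (q, p) queues of
      (v:vs) -> Just (M.insert p (k v) procs, M.insert (q, p) vs queues)
      []     -> Nothing      -- blocked: the queue from q is empty
    _ -> Nothing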
Ret:    Γ ⊢ v : a  implies  Γ ⊢ ret v : Mp end a
Send:   Γ ⊢ v : a  implies  Γ ⊢ send r v : Mp (r!⟨a⟩. end) 1
Recv:   Γ ⊢ recv r a : Mp (r?(a). end) a
Abs:    Γ, x : a ⊢ m : Mp L b  implies  Γ ⊢ λx. m : a → Mp L b
Bind:   Γ ⊢ m : Mp L1 a and Γ ⊢ f : a → Mp L2 b  implies  Γ ⊢ m ≫= f : Mp (L1 # L2) b
Branch: Γ ⊢ m1 : Mp L1 a1 and Γ ⊢ m2 : Mp L2 a2  implies  Γ ⊢ brn r m1 m2 : Mp (r & {ι1. L1; ι2. L2}) (a1 ⊎ a2)
Select: Γ ⊢ v : a + b, Γ ⊢ f1 : a → Mp L1 c1 and Γ ⊢ f2 : b → Mp L2 c2  implies  Γ ⊢ sel v {rj}j∈J f1 f2 : Mp ({rj}j∈J ⊕ {ι1. L1; ι2. L2}) (c1 ⊎ c2)
Case:   Γ ⊢ f1 : a1 → Mp L1 b1 and Γ ⊢ f2 : a2 → Mp L2 b2  implies  Γ ⊢ case f1 f2 : a1 ⊎ a2 → Mp (L1 ∪ L2) (b1 ⊎ b2)

Figure 16. Typing rules for Mp code.

Whenever a role performs a choice, we have a type with one hole, a@I[p], with a sum-type at the hole, (b + c)@p. The code for p requires extracting the sum type from the type a, and setting the value at the hole pointed to by I. This is because p may occur deep in I[p], and therefore a@I[p] ↾ p may differ from b + c. We achieve this with the following families of functions, getI : a → typeAt(I, a) and putI : c × a → substTy(I, a, c), where typeAt and substTy get and set the type at the hole in I; their definitions are given in Fig. 18.

B Deadlock-Freedom
B.1 Proof of Lemma 4.2
Lemma 4.2 (Existence of Associated Global Type). For all WF(A), if ⊢ e : A → B, then there exists G s.t. ⊨ e ⇐ A ∼ G.

Proof. By induction on the structure of the derivation ⊢ e : A → B.

Case Join. ⊢ e ⇒ e : A ∪pì A → B ∪pì B. By the IH, ⊨ e ⇐ A ∼ G. By rule Alt, ⊨ e ⇐ A ∪pì A ∼ G ∪ G.

Case Alg. ⊢ e ⇒ e@p : a@I → b@p. By Alg, ⊨ e@p ⇐ a@I ∼ [a@I ⇝ p].

Case Inji. By Inji, ⊨ ιi ⇐ A ∼ end.

Case Alt. ⊢ e ⇒ e : A1 ∪pì A2 → B1 ∪pì B2, with pids(e) ⊆ pì. By the IH, ⊨ e ⇐ A1 ∼ G1 and ⊨ e ⇐ A2 ∼ G2. By Alt, ⊨ e ⇐ A1 ∪pì A2 ∼ G1 ∪pì G2.

Case Id. By ⊨ id ⇐ a@I ∼ end.

Case Choice. ⊢ e ⇒ [p ⊕ pì] : a@I[p] → a@(I[ι1 p] ∪pì I[ι2 p]). By rule Choice, ⊨ [p ⊕ pì] ⇐ a@I[p] ∼ p → {pì}{ι1. end; ι2. end}.

Case Proji. By rule Proji, ⊨ πi ⇐ A1 × A2 ∼ end.

Case Comp. ⊢ e1 ◦ e2 ⇒ e1 ◦ e2 : A → C. This implies that ⊢ e1 ⇒ e1 : B → C and ⊢ e2 ⇒ e2 : A → B. By the IH, ⊨ e2 ⇐ A ∼ G2 and ⊨ e1 ⇐ B ∼ G1. We proceed by induction on the size of B. In the base case, B = b@I, so G2 contains no unions and no choices; therefore G2 # G1 is defined, and equal to [G1/end]G2. If B = b@(R1 ∪pì R2), then G1 = G11 ∪ G12, with ⊨ e1 ⇐ b@Ri ∼ G1i. There are two cases: (1) G2 = G21 ∪ G22, or (2) there is a choice in G2, i.e. G2 = [p → pì{ι1. G21; ι2. G22}/end]G2′. In the first case, each G2i # G1i is defined by the IH, so (G21 ∪ G22) # (G11 ∪ G12) = (G21 # G11) ∪ (G22 # G12). In the second case, there must be a sub-expression e2′ of e2 following the choice point [p ⊕ pì], with e2′ : d@I[ιi p] → b@Ri and ⊨ e2′ ⇐ d@I[ιi p] ∼ G2i. By the IH, each G2i # G1i is defined, which implies that p → pì{ι1. G21; ι2. G22} # (G11 ∪ G12) is defined, and therefore G2 # G1 is also defined.

Case Casei. ⊢ e1 ▽ e2 ⇒ e1 ▽ e2 : ιi A → B. By inversion, ⊢ ei ⇒ ei : A → B. By the IH, ⊨ ei ⇐ A ∼ G. By Casei, ⊨ e1 ▽ e2 ⇐ ιi A ∼ G.

Case Split. ⊢ e1 △ e2 ⇒ e1 △ e2 : a@I → B × C. By inversion, ⊢ e1 ⇒ e1 : a@I → B and ⊢ e2 ⇒ e2 : a@I → C. By the IH, ⊨ e1 ⇐ a@I ∼ G1 and ⊨ e2 ⇐ a@I ∼ G2. Therefore, ⊨ e1 △ e2 ⇐ a@I ∼ [G2/end]G1. □

B.2 Proof of Lemma 4.3
Lemma 4.3 (Protocol Deadlock-Freedom). For all WF(A), if ⊢ e : A → B and ⊨ e ⇐ A ∼ G, then WF(G).

Proof. By induction on the structure of the derivation ⊢ e : A → B.

Case Join. ⊢ e ⇒ e : A ∪pì A → B ∪pì B. By the Join typing rule, ⊢ e ⇒ e : A → B. By case analysis, the only possibility for the protocol is ⊨ e ⇐ A ∪pì A ∼ G ∪ G, with ⊨ e ⇐ A ∼ G. By the IH, WF(G), therefore WF(G ∪ G).

Case Alg. ⊢ e ⇒ e@p : a@I → b@p. By case analysis, ⊨ e@p ⇐ a@I ∼ [a@I ⇝ p]. By definition, WF([a@I ⇝ p]).

Case Inji. ⊨ ιi ⇐ a@I ∼ end. Trivially, WF(end).

Case Alt. ⊢ e ⇒ e : A1 ∪pì A2 → B1 ∪pì B2, with pids(e) ⊆ pì, ⊢ e ⇒ e : A1 → B1, ⊢ e ⇒ e : A2 → B2, and A1 ≠ A2. We know, by a straightforward induction on the typing rules of PAlg, that if ⊢ e : A1 ∪pì A2 → B and A1 ≠ A2, then pids(Ai) ⊆ pì. By the Alt protocol rule, ⊨ e ⇐ A1 ∪pì A2 ∼ G1 ∪pì G2, with ⊨ e ⇐ A1 ∼ G1 and ⊨ e ⇐ A2 ∼ G2. By the IH, WF(G1) and WF(G2).

⟨P,W⟩ ⇝ℓ ⟨P′,W′⟩, where P = [pi ↦ mi]i∈I and W = [pi pj ↦ w]i∈I,j∈I:

  ⟨P[p ↦ m], W⟩ ⇝ℓ ⟨P[p ↦ m′], W′⟩  implies  ⟨P[p ↦ m ≫= f], W⟩ ⇝ℓ ⟨P[p ↦ m′ ≫= f], W′⟩
  ⟨P[p ↦ ret v ≫= f], W⟩ ⇝ϵ ⟨P[p ↦ f v], W⟩
  ⟨P[p1 ↦ send p2 (v : a)], W[p1p2 ↦ w]⟩ ⇝p1p2!⟨a⟩ ⟨P[p1 ↦ ret ()], W[p1p2 ↦ v · w]⟩
  ⟨P[p1 ↦ recv p2 a], W[p2p1 ↦ w · v]⟩ ⇝p2p1?(a) ⟨P[p1 ↦ ret v], W[p2p1 ↦ w]⟩
  ⟨P[p0 ↦ sel (ιi v) {} f1 f2], W⟩ ⇝ϵ ⟨P[p0 ↦ fi v ≫= λx. ret (bri x)], W⟩
  ⟨P[p0 ↦ sel (ιi v) {p1 ··· pn} f1 f2], W[p0p1 ↦ w]⟩ ⇝p0p1⊕ιi ⟨P[p0 ↦ sel (ιi v) {p2 ··· pn} f1 f2], W[p0p1 ↦ li · w]⟩
  ⟨P[p1 ↦ brn p2 m1 m2], W[p2p1 ↦ w · li]⟩ ⇝p2p1&ιi ⟨P[p1 ↦ mi ≫= λx. ret (bri x)], W[p2p1 ↦ w]⟩
  ⟨P[p1 ↦ case f1 f2 (bri v)], W⟩ ⇝ϵ ⟨P[p1 ↦ fi v], W⟩

Figure 17. Rules for the LTS of Mp terms.

  get[](x) = x      get(I×I′)(x) = getI(π1 x)      get(I′×I)(x) = getI(π2 x)      get(ιi I)(x) = getI(x)
  put[](x, y) = x   put(I×I′)(x, y) = (putI(x, π1 y), π2 y)   put(I′×I)(x, y) = (π1 y, putI(x, π2 y))   put(ιi I)(x, y) = putI(x, y)

Figure 18. Get/Set for Types from One-Hole Contexts.

Case Id. Trivial by WF(end).

Case Choice. ⊢ e ⇒ e ◦ [p ⊕ p⃗] : a@I[p] → B1 ∪p⃗ B2, where pids(e) ⊆ p⃗ and pids(I[p]) ⊆ p⃗. By the Choice and Alt typing rules, ⊢ e ⇒ e : a@I[ιi p] → Bi. By inversion, the protocol rule must also be Choice: ⊨ [p ⊕ p⃗] ⇐ a@I[p] ∼ p → p⃗ {ιi. Gi}i∈[1,2]. By the Choice protocol rule, ⊨ e ⇐ a@I[ιi p] ∼ Gi. By the IH, WF(Gi). Since pids(Gi) ⊆ pids(e) ∪ pids(I[p]) ⊆ p⃗, for all p′ ∈ pids(Gi), (p → p⃗ {ιi. Gi}i∈[1,2]) ↾ p′ must be defined. Therefore, WF(p → p⃗ {ιi. Gi}i∈[1,2]).

Case Proji. Trivial by WF(end).

Case Comp. ⊢ e1 ◦ e2 ⇒ e1 ◦ e2 : A → C, with ⊢ e1 ⇒ e1 : B → C and ⊢ e2 ⇒ e2 : A → B. By inversion, the only possible protocol rule is also Comp. Therefore, ⊨ e1 ◦ e2 ⇐ A ∼ G2 # G1, with ⊨ e2 ⇐ A ∼ G2 and ⊨ e1 ⇐ B ∼ G1. By the IH, WF(G1) and WF(G2). Also, by induction on the derivation of ⊢, we know that for A1 ∪p⃗ A2, if A1 ≠ A2, then pids(Ai) ⊆ p⃗. This implies that if G1 is G11 ∪ G12, then either the projection of G1i onto p is the same for both i, or p ∈ p⃗. By the Choice rule, G1 must be of the form G1′[p → p⃗ {ιi. Gi}i∈[1,2]]; therefore, for all p′ ∈ pids(G1i), the projection (G2 # (G11 ∪ G12)) ↾ p′ must be defined, which implies that G2 # G1 is defined.

Case Casei. ⊢ e1 ▽ e2 ⇒ e1 ▽ e2 : a@(ιi I) → B and ⊨ e1 ▽ e2 ⇐ a@(ιi I) ∼ G. By the Casei protocol and typing rules, ⊨ ei ⇐ A ∼ G and ⊢ ei ⇒ ei : A → B. We conclude by the IH that WF(G).

Case Split. ⊢ e1 △ e2 ⇒ e1 △ e2 : A → (b × c)@(R1 × R2) and ⊨ e1 △ e2 ⇐ A ∼ [G2/end]G1. By the IH, we know that WF(G1) and WF(G2). By straightforward induction on the structure of G1, if WF(Gi), then WF([G2/end]G1). □

B.3 Proof of Theorem 5.2

Theorem 5.2. [Protocol Conformance of the Generated Code] If ⊨ e ⇐ A ∼ G, then ⟦e⟧(A) complies with protocol G.

Proof. By induction on the structure of the derivation ⊨ e ⇐ A ∼ G.

Case Alt. ⊨ e ⇐ A1 ∪p⃗ A2 ∼ G1 ∪p⃗ G2, with ⊨ e ⇐ A1 ∼ G1 and ⊨ e ⇐ A2 ∼ G2. By the IH, ⟦e⟧(Ai) : (Ai ↾ p) → Mp (Gi ↾ p) (Bi ↾ p). Moreover, we know that if p ∉ p⃗, then ⟦e⟧(A1)(p) = ⟦e⟧(A2)(p) and G1 ↾ p = G2 ↾ p. Therefore, by the definition of E1 ⊎ E2, ⟦e⟧(A1) ⊎p⃗ ⟦e⟧(A2) : Mp (G1 ∪p⃗ G2) (B1 ∪p⃗ B2).

Case Id. ⊨ id ⇐ a@I ∼ end. By the definition of ⟦·⟧, ⟦id⟧(a@I) = [] : a@I → Mp end a@I.

Case Inji. ⊨ ιi ⇐ a@I ∼ end. By definition, ⟦ιi⟧(A) = [] : a@I → Mp end (a@(ιi I)).

Case Alg. ⊨ e@pe ⇐ a@I ∼ [a@I ↝ pe], with e : a → b. We prove by straightforward induction on the structure of I that f = ⟦a@I ↝ pe⟧ : a@I → Mp [a@I ↝ pe] (a@pe): if I = p, then f = [p ↦ λx. send pe x, pe ↦ λ_. recv p a], which clearly follows [a@p ↝ pe]; if I = I1 × I2, then a must be a1 × a2, and we have by the IH that ⟦ai@Ii ↝ pe⟧ : ai@Ii → Mp [ai@Ii ↝ pe] (ai@pe); and, finally, if I = ιi I′, then a = a1 + a2, and ⟦ai@I′ ↝ pe⟧ : Mp [ai@I′ ↝ pe] (ai@pe), which, composed with [pe ↦ λx. ret (ιi x)], has type Mp [(a1 + a2)@(ιi I′) ↝ pe] ((a1 + a2)@(ιi pe)).

Case Comp. ⊨ e1 ◦ e2 ⇐ A ∼ G2 # G1. By the Comp rule, ⊨ e2 ⇐ A ∼ G2 and ⊨ e1 ⇐ B ∼ G1. By the IH, ⟦e2⟧(A) : A → Mp G2 B and ⟦e1⟧(B) : B → Mp G1 C. Since G2 # G1 is well-formed, ⟦e2⟧(A) >=> ⟦e1⟧(B) : A → Mp (G2 # G1) C, since for all p, ⟦e2⟧(A) : A ↾ p → Mp G2 (B ↾ p) and ⟦e1⟧(B) : B ↾ p → Mp G1 (C ↾ p), so ⟦e2⟧(A) >=> ⟦e1⟧(B) : A ↾ p → Mp (G2 # G1) (C ↾ p).
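Figure 18 reads directly as a recursive program, and the getI/putI pair is exactly what the Choice case below manipulates. The following Haskell transcription is our own sketch, with types simplified to a single universe of values:

    -- One-hole contexts: [] | I × I' (hole left) | I' × I (hole right) | ιi I
    data Ctx = Hole | PL Ctx | PR Ctx | InjC Ctx

    data Val = Atom Int | Pair Val Val | Inj Int Val
      deriving Show

    getC :: Ctx -> Val -> Val
    getC Hole     x          = x          -- get_[](x) = x
    getC (PL i)   (Pair x _) = getC i x   -- get_(I×I')(x) = get_I (π1 x)
    getC (PR i)   (Pair _ y) = getC i y   -- get_(I'×I)(x) = get_I (π2 x)
    getC (InjC i) (Inj _ x)  = getC i x   -- get_(ιi I)(x) = get_I (x)
    getC _        _          = error "value does not match context"

    putC :: Ctx -> Val -> Val -> Val
    putC Hole     v _          = v                    -- put_[](v, y) = v
    putC (PL i)   v (Pair x y) = Pair (putC i v x) y  -- hole on the left
    putC (PR i)   v (Pair x y) = Pair x (putC i v y)  -- hole on the right
    putC (InjC i) v (Inj k x)  = Inj k (putC i v x)
    putC _        _ _          = error "value does not match context"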

Case Choice. ⊨ [p ⊕ p⃗] ⇐ a@I[p] ∼ p → p⃗ {ιi. end}i∈[1,2]. By the definition of ⟦[p ⊕ p⃗]⟧, p ↦ λx. sel {p⃗} (getI(x)) (λy. ret (putI(y, x))) (λy. ret (putI(y, x))), which has type a@I[p] ↾ p → Mp ({p⃗} ⊕ {ιi. end}i∈[1,2]), and, for p′ ∈ p⃗, p′ ↦ λx. brn p (ret x) (ret x), which has type a@I[p] ↾ p′ → Mp (p & {ιi. end}i∈[1,2]). Therefore, ⟦[p ⊕ p⃗]⟧ : a@I[p] → Mp (p → p⃗ {ιi. end}i∈[1,2]) a@(I[ι1 p] ∪p⃗ I[ι2 p]).

Case Casei. ⊨ e1 ▽ e2 ⇐ (a1 + a2)@(ιi I) ∼ G. Then, ⊨ ei ⇐ ai@I ∼ G. By the IH, ⟦ei⟧(ai@I) : (ai@I) → Mp G B. By the definition of ⟦·⟧, ⟦e1 ▽ e2⟧((a1 + a2)@(ιi I)) : ((a1 + a2)@(ιi I)) → Mp G B.

Case Proji. ⊨ πi ⇐ A1 × A2 ∼ end. By definition, ⟦πi⟧(A1 × A2) = [_ ↦ λx. ret (πi x)] : A1 × A2 → Mp end Ai.

Case Split. ⊨ e1 △ e2 ⇐ A ∼ [G2/end]G1. Then, ⊨ e1 ⇐ A ∼ G1 and ⊨ e2 ⇐ A ∼ G2. By the IH, ⟦e1⟧(A) : A → Mp G1 B and ⟦e2⟧(A) : A → Mp G2 C. By definition, ⟦e1 △ e2⟧(A) = ⟦e1⟧(A) △ ⟦e2⟧(A) : A → Mp ([G2/end]G1) (B × C). □

C Extensionality

Each monadic term m represents the code for an individual process. The parallel composition of the set of monadic actions generated from a PAlg expression e : A → B, each applied to the corresponding value of type v : A ↾ p, represents an execution of the parallel algorithm on an input of type a, if a@R = A. Recall from Sec. 5 that the transitions are of the form ⟨P,W⟩ ⇝ℓ ⟨P′,W′⟩, where P is an environment that contains the code executed by all roles that collaborate to compute the parallel algorithm, and W represents the shared unbounded buffers used by each pair of participants to communicate. We write w for such buffers, where ∅ is the empty buffer, v · w is the buffer w extended with value v at the leftmost position, and w · v is the buffer w extended with value v at the rightmost position.

Definition C.1 (Type buffers). We write Q = [pi pj ↦ q]i∈I,j∈I, where q is a buffer of types, which can be either ∅, a · q or q · a. Note that values include singleton types that represent labels: li : li. We say that a buffer w = v1 ··· vn contains types q = a1 ··· am, written w : q, if n = m and vi : ai for all i ∈ [1, n]. We say that W : Q if, for all pairs of roles pi pj, W(pi pj) : Q(pi pj).
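The buffer discipline just described (and used by the LTS of Figure 17) is plain FIFO: a send extends a buffer on the left, a receive consumes from the right. A small executable model, our own sketch with Data.Map standing in for W:

    import qualified Data.Map as M

    type Buf v = [v]                   -- head of the list = leftmost slot
    type W p v = M.Map (p, p) (Buf v)  -- one buffer per ordered pair p1 p2

    -- send from p1 to p2: v · w, i.e. extend the buffer on the left
    sendW :: Ord p => p -> p -> v -> W p v -> W p v
    sendW p1 p2 v = M.insertWith (++) (p1, p2) [v]

    -- recv at p1 from p2: take the rightmost (oldest) value of w · v
    recvW :: Ord p => p -> p -> W p v -> Maybe (v, W p v)
    recvW p2 p1 w = case M.findWithDefault [] (p2, p1) w of
      []  -> Nothing                   -- empty buffer: the receiver blocks
      buf -> Just (last buf, M.insert (p2, p1) (init buf) w)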
5 that the transitions are of the I1×I2 I1 I2 p∈pids(I1×I2) 2346 ℓ ′ ′ δ (ι x) = δ (x) 2401 form ⟨P,W ⟩ { ⟨P ,W ⟩, where P is an environment that ιi I i I 2347 2402 contains the code executed by all roles that collaborate to 2348 We proceed by induction on the structure of the derivation 2403 compute the parallel algorithm, andW represents the shared 2349 ⊢ e ⇒ e : A → B: 2404 unbounded buffers used by each pair of participants tocom- 2350 • Case Join. We have ⊢ e ⇒ e : A ∪pì A → B ∪pì B with ⊢ 2405 municate. We write w for such buffers, where ∅ is the empty 2351 e ⇒ e : A → B. By definition, e (A∪pìA) = e (A)⊎pì 2406 buffer, v · w is the buffer w extended with value v at the J K J K 2352 e ( ) i ·iì ( ) ( iì ( )) 2407 leftmost position, and w · v is the buffer w extended with A . We have that δ pì x = bfi1 δA x . Then, 2353 J K A∪ A 2408 value v at the rightmost position. ì 2354 by the induction hypothesis, there exists j s.t. 2409 ì 2355 e (A ∪pì A)(δ i ·i (x)) 2410 Definition C.1 (Type buffers). We write Q = [pi pj → A∪pìA 2356 J K ì pì ì 2411 q]i ∈I ,j ∈I , where q is a buffer of types, that can be either ∅, ( ( ) ⊎p ( )) ( i ( )) = e A e A bri δA x 2357 J pìK J Kì 2412 a · q or q · a. Note that values include singleton types that = e ( )( i ( )) 2358 bri A δA x 2413 represent labels: li : li . We say that a buffer w = v1 ···vn pì J jìK 2359 = br (δ ( e x)) 2414 contains types q = a1 ··· am, w : q if: n = m and vi : ai for i B ·ì J K 2360 ∈ [ ] = δ i j ( e x) 2415 all i 1, n . We say that W : Q if for all pairs of roles, pi pj , B∪pì B 2361 J K 2416 W (pi pj ) : Q(pi pj ). • ⊢ e ⇒ e A ∪pì A → B ∪pì B 2362 Case Alt. We have : 1 2 1 2 2417 with ⊢ e ⇒ e : A → B , ⊢ e ⇒ e : A → B and A 2363 Theorem 5.1. E Mp CA m Mp L a 1 1 2 2 1 , 2418 [Soundness] Assume : , : ì ℓ ′ ′ e ( ∪pì )( i ·i ( )) ( e ( ) ⊎pì 2364 and W : Q. Suppose ⟨E[r 7→ m],W ⟩ { ⟨E[r 7→ m ],W ⟩. A2. Then, A1 A2 δ pì x = A1 2419 J K A1∪ A2 J K 2365 22 2420 Compiling First-Order Functions to Session-Typed Parallel Code CC ’20, February 22–23, 2020, San Diego, CA, USA

• Case Alt. We have ⊢ e ⇒ e : A1 ∪p⃗ A2 → B1 ∪p⃗ B2 with ⊢ e1 ⇒ e1 : A1 → B1, ⊢ e2 ⇒ e2 : A2 → B2, and A1 ≠ A2. Then, ⟦e⟧(A1 ∪p⃗ A2)(δ_{A1∪p⃗A2}^{i·i⃗}(x)) = (⟦e⟧(A1) ⊎p⃗ ⟦e⟧(A2))(br_i(δ_{Ai}^i⃗(x))) = br_i(⟦e⟧(Ai)(δ_{Ai}^i⃗(x))). Finally, by the induction hypothesis, there exists j⃗ s.t. br_i(⟦e⟧(Ai)(δ_{Ai}^i⃗(x))) = br_i(δ_{Bi}^j⃗(⟦e⟧ x)) = δ_{B1∪p⃗B2}^{i·j⃗}(⟦e⟧ x).

• Case Alg. We have ⊢ e ⇒ e@p : a@I → b@p, with ⊢ e : a → b. Then, ⟦e@p⟧(a@I)(δI(x)) = [pi ↦ ⟦a@I ↝ p⟧] δI(x). By straightforward induction on I, there exists a trace ⟨([pi ↦ ⟦a@I ↝ p⟧(pi)] ≫= [p ↦ λx. ret (e x)]) δI(x), W⟩ ⇝ℓ1···ℓm ⟨[pi ↦ ret vi], W′⟩, with vi = () for all i s.t. pi ≠ p, and vj = ⟦e⟧ x for pj = p. By Theorem 5.2, the only possible interleavings of actions of ⟦e@p⟧ must follow the protocol [a@I ↝ p]. Since this implies that send/receive operations must happen respecting the data dependencies, any possible trace must yield the same result.

• Case Inj. ⊢ ιi ⇒ ιi : A → ιi A is straightforward, since ⟦ιi⟧(A)(δA(x)) = δ_{ιi A}(ιi x).

• Case Id. ⊢ id ⇒ id : A → A is straightforward, since ⟦id⟧(A)(δA(x)) = δA(id x).

• Case Proj. ⊢ πi ⇒ πi : A1 × A2 → Ai is straightforward, since ⟦πi⟧(A1 × A2)(δ_{A1×A2}(x)) = δ_{Ai}(πi x).

• Case Comp. ⊢ e1 ◦ e2 ⇒ e1 ◦ e2 : A → C with ⊢ e1 ⇒ e1 : B → C and ⊢ e2 ⇒ e2 : A → B. A straightforward consequence of Theorem 5.2 is that if E1 behaves as G1 and E2 as G2, then (E1 # E2)(X) = E2(E1(X)), since the permutations of actions of E1 # E2 must respect G1 # G2. Then, by the definition of ⟦·⟧, ⟦e1 ◦ e2⟧(A)(δ_A^i⃗(x)) = ⟦e1⟧(B)(⟦e2⟧(A)(δ_A^i⃗(x))). By the induction hypothesis: ⟦e1⟧(B)(⟦e2⟧(A)(δ_A^i⃗(x))) = ⟦e1⟧(B)(δ_B^j⃗(⟦e2⟧ x)) = δ_C^k⃗(⟦e1⟧ (⟦e2⟧ x)) = δ_C^k⃗(⟦e1 ◦ e2⟧ x).

• Case Casei. We have ⊢ e1 ▽ e2 ⇒ e1 ▽ e2 : ιi A → B, with ⊢ ei ⇒ ei : A → B. Note that δ_{ιi A}(x) is only defined if x = ιi x′. Then, by definition, ⟦e1 ▽ e2⟧(ιi A)(δ_{ιi A}(ιi x′)) = ⟦ei⟧(A)(δA(x′)). By the IH, ⟦ei⟧(A)(δA(x′)) = δ_B(⟦ei⟧ x′) = δ_B(⟦e1 ▽ e2⟧ (ιi x′)) = δ_B(⟦e1 ▽ e2⟧ x).

• Case Split. We have ⊢ e1 △ e2 ⇒ e1 △ e2 : A → B × C, with ⊢ e1 ⇒ e1 : A → B and ⊢ e2 ⇒ e2 : A → C. By definition, ⟦e1 △ e2⟧(A) = ⟦e1⟧(A) △ ⟦e2⟧(A). By Theorem 5.2, we know that this behaves as G1 # G2, if p1 ∼ G1 and p2 ∼ G2. Therefore, we assume again that the interleavings of the subtraces do not affect the data dependencies. Then, (⟦e1⟧(A)(δA(x))) △ (⟦e2⟧(A)(δA(x))) = δ_B^j⃗(⟦e1⟧ x) △ δ_C^k⃗(⟦e2⟧ x) = δ_{B×C}^{j⃗·k⃗}(⟦e1 △ e2⟧ x).

• Case Choice. We have ⊢ e ⇒ [p ⊕ p⃗] : a@I[p] → a@(I[ι1 p] ∪p⃗ I[ι2 p]). We have two cases:
  1. p ↦ λx. sel (getI(x)) {p⃗} (λy. putI(y, x)) (λy. putI(y, x))
  2. ∀p′, p′ ≠ p ∧ p′ ∈ p⃗, p′ ↦ λx. brn p (ret x) (ret x)
By case analysis, if getI(x) = ιi v, then we have:
  1. p ↦ bri(putI(v, x))
  2. ∀p′, p′ ≠ p ∧ p′ ∈ p⃗, p′ ↦ ret (bri x)
This is clearly ⟦[p ⊕ p⃗]⟧(δ_{I[p]}(x)) = bri^p⃗(δ_{I[ιi p]}(x)) = δ_{I[ι1 p]∪p⃗I[ι2 p]}^i(x). □

D Generated Code

We now show the generated code for the FFT, unrolling the recursive function once.

D.1 Input Alg expression

fftTree :: forall n. SINat n -> Int
        -> Tree n (D [Complex Double])
           :=> Tree n (D [Complex Double])
fftTree SZ w
  = lift (intlit SZ &&& (lit w &&& id)
          >>> prim "baseFFT")
fftTree (SS x) w
  = withCDict (cdictTree @(D [Complex Double]) x) $
    {- Recursive FFT to EVENS and ODDS -}
    (fftTree x w
     *** fftTree x (w + 2 ^ toInteger x))
    {- Multiply right side by exponential -}
    >>> id
    *** mapTree x (lit ps2x &&& id >>> mapExp)
    >>>
    {- zipWith add (swap arguments to force butterfly
       pattern) &&& zipWith sub -}
    zipTree x True lvl w addc
    &&& zipTree x False lvl (w + 2 ^ toInteger x) subc
  where
    lvl :: Int
    lvl = fromInteger (toInteger (SS x) + 1)
    ps2x :: Int
    ps2x = 2 ^ toInteger (SS x)

fft :: SINat n -> (D [Complex Double]) :=> D [Complex Double]
fft n =
  withCDict (cdictTree @(D [Complex Double]) n) $
  tsplit n deinterleave >>> fftTree n 0 >>> tfold n (append @@ 0)

fft5 :: D [Complex Double] :=> D [Complex Double]
fft5 = withSize 5 fft

Listing 1. Fragment of FFT.hs
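The generated header in §D.3 declares fft0 through fft8, which suggests that each fftK comes from the same definition instantiated at a different size. Hypothetically, following the pattern of fft5 above:

    -- Hypothetical companions to fft5 (not shown in the paper), assuming
    -- each fftK is fft instantiated at size K:
    fft3 :: D [Complex Double] :=> D [Complex Double]
    fft3 = withSize 3 fft

    fft8 :: D [Complex Double] :=> D [Complex Double]
    fft8 = withSize 8 fft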

D.2 Main C Code and Atomic Functions

These need to be implemented by the programmer.

#include "FFT.h"
/* The header names were lost in extraction; these are the usual suspects
   for the calls below (printf, malloc, memcpy, cexp, log2, time, ...). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <complex.h>
#include <time.h>

#define REPETITIONS 50

#define BENCHMARKSEQ(s, f) { \
  time = 0; \
  time_diff = 0; \
  time_old = 0; \
  var = 0; \
  for(int i=0; i<REPETITIONS; i++){ \
  /* ... remainder of the timing macro lost in extraction ... */

vec_cplx_t zip_add(
  pair_pair_int_int_pair_vec_cplx_vec_cplx_t in)
{
  int lvl = in.fst.fst;
  int wid = in.fst.snd;
  vec_cplx_t l = in.snd.fst;
  vec_cplx_t r = in.snd.snd;
  vec_cplx_t lout = stages[lvl][wid];
  for(int i = 0; i < l.size; i++){
    lout.elems[i] = l.elems[i] + r.elems[i];
  }
  return lout;
}

vec_cplx_t zip_sub(
  pair_pair_int_int_pair_vec_cplx_vec_cplx_t in)
{
  int lvl = in.fst.fst;
  int wid = in.fst.snd;
  vec_cplx_t l = in.snd.fst;
  vec_cplx_t r = in.snd.snd;
  vec_cplx_t lout = stages[lvl][wid];
  /* tail reconstructed by analogy with zip_add; the original was lost
     at a page break */
  for(int i = 0; i < l.size; i++){
    lout.elems[i] = l.elems[i] - r.elems[i];
  }
  return lout;
}

/* The opening of show was lost at a page break; reconstructed from the
   parallel definition of showstep below. */
void show(const char * s, vec_cplx_t in) {
  printf("%s", s);
  for (int i = 0; i < in.size; i++)
    if (!cimag(in.elems[i]))
      printf("%g ", creal(in.elems[i]));
    else
      printf("(%g, %g) ", creal(in.elems[i]), cimag(in.elems[i]));
  printf("\n");
}

void showstep(int stp, const char * s, vec_cplx_t in) {
  printf("%s", s);
  for (int i = 0; i < in.size; i += stp)
    if (!cimag(in.elems[i]))
      printf("%g ", creal(in.elems[i]));
    else
      printf("(%g, %g) ", creal(in.elems[i]), cimag(in.elems[i]));
  printf("\n");
}

vec_cplx_t baseFFT(pair_int_pair_int_vec_cplx_t in)
{
  int lvl = in.fst;
  int wid = in.snd.fst;
  cplx_t *buf = stages[lvl][wid].elems;
  int n = in.snd.snd.size;

  _fft(buf, in.snd.snd.elems, n, 1);

  return stages[lvl][wid];
}

vec_cplx_t seqfft(vec_cplx_t in)
{
  pair_int_pair_int_vec_cplx_t i = {1, {0, in}};
  return baseFFT(i);
}

vec_cplx_t map_exp(pair_int_pair_int_vec_cplx_t iv){
  int i = iv.snd.fst;
  int ps2x = iv.fst;
  vec_cplx_t in = iv.snd.snd;
  int step = i * in.size;
  for(int k = 0; k < in.size; k++){
    in.elems[k] = in.elems[k] * cexp(2 * -I * PI * (k + step) / (ps2x * in.size));
  }
  return in;
}

void free_fftvec(){
  for(int i = 0; i < num_stages; i++){
    free(stages[i][0].elems);
    free(stages[i]);
  }
  free(stages);
}

pair_vec_cplx_vec_cplx_t deinterleave(pair_int_int_t iin){
  int wl = iin.fst;
  int wr = iin.snd;

  int mid = stages[0][wl].size / 2;

  stages[1][wl].size = mid;
  stages[1][wr].size = mid;
  stages[1][wr].elems = stages[1][wl].elems + mid;

  for(int i = 0; i < stages[0][wl].size; i += 2){
    stages[1][wl].elems[i/2] = stages[0][wl].elems[i];
    stages[1][wr].elems[i/2] = stages[0][wl].elems[i+1];
  }
  memcpy(stages[0][wl].elems, stages[1][wl].elems,
         stages[0][wl].size * sizeof(cplx_t));
  stages[0][wr].elems = stages[0][wl].elems + mid;
  stages[0][wr].size = mid;

  for (int i = 2; i < num_stages; i++){
    memcpy(stages[i][wl].elems, stages[1][wl].elems,
           stages[0][wl].size * sizeof(cplx_t));
    stages[i][wl].size = mid;
    stages[i][wr].elems = stages[i][wl].elems + mid;
    stages[i][wr].size = mid;
  }
  stages[0][wl].size = mid;
  return (pair_vec_cplx_vec_cplx_t) { stages[0][wl], stages[0][wr] };
}

vec_cplx_t randvec(int depth, size_t s){
  num_workers = depth <= 1 ? 1 : 1 << (depth - 1);
  num_stages = depth <= 1 ? 2 : 1 + depth;
  stages = (vec_cplx_t **)malloc(num_stages * sizeof(vec_cplx_t *));
  for (int i = 0; i < num_stages; i++){
    stages[i] = (vec_cplx_t *)malloc(num_workers * sizeof(vec_cplx_t));
    stages[i][0].elems = (cplx_t *)calloc(s, sizeof(cplx_t));
  }
  stages[0][0].size = s;

  srand(time(NULL));

  for (int i = 0; i < s; i++) {
    double rand_r = (double)rand() / (double)RAND_MAX;
    double rand_i = (double)rand() / (double)RAND_MAX;
    stages[0][0].elems[i] = rand_r + rand_i * I;
  }

  for(int j = 1; j < num_stages; j++) {
    memcpy(stages[j][0].elems, stages[0][0].elems,
           s * sizeof(vec_cplx_t));
    stages[j][0].size = s / num_workers;
  }

  for(int i = 0; i < num_stages; i++) {
    for(int j = 1; j < num_workers; j++) {
      stages[i][j] = stages[i][j-1];
    }
  }

  return stages[0][0];
}

void usage(const char *nm){
  printf("Usage: %s <size>\n", nm);  /* argument placeholder reconstructed */
  exit(-1);
}

int main(int argc, const char *argv[]) {
  setbuf(stdout, NULL);
  if (argc <= 1) {
    usage(argv[0]);
  }
  char *endptr = NULL;
  errno = 0;
  size_t size = strtoimax(argv[1], &endptr, 10);
  size = (size_t) 1 << (long)ceil(log2(size));
  size = size < 256 ? 256 : size;
  if (errno != 0) {
    printf("%s", strerror(errno));
    usage(argv[0]);
  }
  if (endptr != NULL && *endptr != 0) {
    usage(argv[0]);
  }

  vec_cplx_t in, out;
  /* allocate memory */
  in = randvec(size, size);

  /* calling generated fft5 */
  out = fft5(in);
  show("Result: ", out);

  free_fftvec();
}

D.3 Automatically Generated C Code

#ifndef __FFT__
#define __FFT__

/* Header names lost in extraction; pthread.h, stdlib.h, stdio.h and
   complex.h match the declarations below. */
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <complex.h>

typedef double _Complex cplx_t;

typedef struct vec_cplx {
  cplx_t * elems; size_t size;
} vec_cplx_t;

typedef struct q_vec_cplx {
  volatile unsigned int q_size;
  int q_head;
  int q_tail;
  pthread_mutex_t q_mutex;
  pthread_cond_t q_full;
  pthread_cond_t q_empty;
  vec_cplx_t q_mem[1];
} q_vec_cplx_t;

void q_vec_cplx_put(q_vec_cplx_t *, vec_cplx_t);

vec_cplx_t q_vec_cplx_get(q_vec_cplx_t *);

typedef enum unit {
  Unit
} unit_t;

typedef struct pair_int_vec_cplx {
  int fst; vec_cplx_t snd;
} pair_int_vec_cplx_t;

typedef struct pair_int_pair_int_vec_cplx {
  int fst; pair_int_vec_cplx_t snd;
} pair_int_pair_int_vec_cplx_t;

vec_cplx_t baseFFT(pair_int_pair_int_vec_cplx_t);

vec_cplx_t fft0(vec_cplx_t);

vec_cplx_t fft1(vec_cplx_t);

typedef struct pair_int_int {
  int fst; int snd;
} pair_int_int_t;

typedef struct pair_vec_cplx_vec_cplx {
  vec_cplx_t fst; vec_cplx_t snd;
} pair_vec_cplx_vec_cplx_t;

pair_vec_cplx_vec_cplx_t deinterleave(pair_int_int_t);

vec_cplx_t cat(pair_vec_cplx_vec_cplx_t);

typedef struct pair_pair_int_int_pair_vec_cplx_vec_cplx {
  pair_int_int_t fst;
  pair_vec_cplx_vec_cplx_t snd;
} pair_pair_int_int_pair_vec_cplx_vec_cplx_t;

vec_cplx_t zip_add(pair_pair_int_int_pair_vec_cplx_vec_cplx_t);

vec_cplx_t map_exp(pair_int_pair_int_vec_cplx_t);

vec_cplx_t zip_sub(pair_pair_int_int_pair_vec_cplx_vec_cplx_t);

vec_cplx_t fft2(vec_cplx_t);

vec_cplx_t fft3(vec_cplx_t);

vec_cplx_t fft4(vec_cplx_t);

vec_cplx_t fft5(vec_cplx_t);

vec_cplx_t fft6(vec_cplx_t);

vec_cplx_t fft7(vec_cplx_t);

vec_cplx_t fft8(vec_cplx_t);

#endif

Listing 2. Generated FFT.h

#include "FFT.h"

q_vec_cplx_t ch0 = { 0, 0, 0, { } };
q_vec_cplx_t ch1 = { 0, 0, 0, { } };  /* declaration missing from the
                                         extracted text; ch1 is used below */
q_vec_cplx_t ch2 = { 0, 0, 0, { } };
q_vec_cplx_t ch3 = { 0, 0, 0, { } };

vec_cplx_t fft2_part_0(vec_cplx_t v_s)
{
  pair_int_int_t v_t;
  v_t.fst = 0;
  v_t.snd = 1;
  pair_vec_cplx_vec_cplx_t v_u;
  v_u = deinterleave(v_t);
  vec_cplx_t v_v;
  v_v = v_u.fst;
  q_vec_cplx_put(&ch0, v_v);
  vec_cplx_t v_w;
  v_w = v_u.snd;
  q_vec_cplx_put(&ch2, v_w);
  vec_cplx_t v_x;
  v_x = q_vec_cplx_get(&ch1);
  vec_cplx_t v_y;
  v_y = q_vec_cplx_get(&ch3);
  pair_vec_cplx_vec_cplx_t v_z;
  v_z.fst = v_x;
  v_z.snd = v_y;
  vec_cplx_t v_aa;
  v_aa = cat(v_z);
  return v_aa;
}

q_vec_cplx_t ch4 = { 0, 0, 0, { } };

q_vec_cplx_t ch5 = { 0, 0, 0, { } };

unit_t fft2_part_1()
{
  vec_cplx_t v_ba;
  v_ba = q_vec_cplx_get(&ch0);
  pair_int_pair_int_vec_cplx_t v_ca;
  v_ca.fst = 1;
  pair_int_vec_cplx_t v_da;
  v_da.fst = 0;
  v_da.snd = v_ba;
  v_ca.snd = v_da;
  vec_cplx_t v_ea;
  v_ea = baseFFT(v_ca);
  q_vec_cplx_put(&ch4, v_ea);
  vec_cplx_t v_fa;
  v_fa = q_vec_cplx_get(&ch5);
  pair_pair_int_int_pair_vec_cplx_vec_cplx_t v_ga;
  pair_int_int_t v_ha;
  v_ha.fst = 2;
  v_ha.snd = 0;
  v_ga.fst = v_ha;
  pair_vec_cplx_vec_cplx_t v_ia;
  v_ia.fst = v_ea;
  v_ia.snd = v_fa;
  v_ga.snd = v_ia;

  vec_cplx_t v_ja;
  v_ja = zip_add(v_ga);
  q_vec_cplx_put(&ch1, v_ja);
  return Unit;
}

unit_t fft2_part_2()
{
  vec_cplx_t v_ka;
  v_ka = q_vec_cplx_get(&ch2);
  pair_int_pair_int_vec_cplx_t v_la;
  v_la.fst = 1;
  pair_int_vec_cplx_t v_ma;
  v_ma.fst = 1;
  v_ma.snd = v_ka;
  v_la.snd = v_ma;
  vec_cplx_t v_na;
  v_na = baseFFT(v_la);
  pair_int_pair_int_vec_cplx_t v_oa;
  v_oa.fst = 2;
  pair_int_vec_cplx_t v_pa;
  v_pa.fst = 0;
  v_pa.snd = v_na;
  v_oa.snd = v_pa;
  vec_cplx_t v_qa;
  v_qa = map_exp(v_oa);
  q_vec_cplx_put(&ch5, v_qa);
  vec_cplx_t v_ra;
  v_ra = q_vec_cplx_get(&ch4);
  pair_pair_int_int_pair_vec_cplx_vec_cplx_t v_sa;
  pair_int_int_t v_ta;
  v_ta.fst = 2;
  v_ta.snd = 1;
  v_sa.fst = v_ta;
  pair_vec_cplx_vec_cplx_t v_ua;
  v_ua.fst = v_ra;
  v_ua.snd = v_qa;
  v_sa.snd = v_ua;
  vec_cplx_t v_va;
  v_va = zip_sub(v_sa);
  q_vec_cplx_put(&ch3, v_va);
  return Unit;
}

void * fun_thread_1_1(void * arg)
{
  fft2_part_1();
  return NULL;
}

void * fun_thread_2(void * arg)
{
  fft2_part_2();
  return NULL;
}

vec_cplx_t fft2(vec_cplx_t v_wa)
{
  vec_cplx_t v_xa;
  pthread_t thread1;
  pthread_t thread2;
  pthread_create(&thread1, NULL, fun_thread_1_1, NULL);
  pthread_create(&thread2, NULL, fun_thread_2, NULL);
  v_xa = fft2_part_0(v_wa);
  pthread_join(thread1, NULL);
  pthread_join(thread2, NULL);
  return v_xa;
}

Listing 3. Fragment of generated FFT.c

E Artifact Appendix

E.1 Abstract

This artifact provides a prototype implementation of PAlg, embedded in Haskell, along with a number of benchmarks used to test the scalability of our approach. We provide scripts to regenerate the execution time measurements that we used in our paper. This allows our results to be evaluated on any multi-core shared-memory architecture.

We also provide a small tutorial that is meant to guide a programmer, step by step, through the implementation of a message-passing parallel algorithm using our library. The tutorial includes a guide on how to visualise the global types that correspond to the achieved parallelisations, as well as any asynchronous optimisations applicable to the generated message-passing code.

E.2 Artifact check-list (meta-information)

• Algorithm: Message-passing C code generation from first-order Haskell functions. Global type inference of the communication protocol followed by the parallelisation.
• Program: Haskell libraries Language.CAlg, Language.CAlg.CSyn and dependencies, as well as session-arrc, to compile to C the Haskell functions built using such libraries.
• Compilation: GHC >= 8.6 && < 8.8, and a C compiler that supports C11.
• Transformations: Compilation to C, and an asynchronous optimisation pass.
• Binary: Source code and scripts included to generate the binaries from the sources.
• Data set: Original run-time measurements included for comparison.
• Hardware: We used a 12-core Intel Xeon CPU E5-2650 v4 @ 2.20GHz. We recommend a shared-memory architecture, with uniform access times, to measure the overheads of our approach, not message latencies.

• Execution: We include a script to run the benchmarks.
• Output: Benchmark execution times.
• Experiments: Small, representative benchmarks of common parallel algorithms.
• How much memory required (approximately)?: 64GB for the maximum benchmark input size.
• How much time is needed to complete experiments (approximately)?: 5 days on the hardware stated in §E.3.2.
• Publicly available?: Yes.
• Code licenses (if publicly available)?: BSD-3.

E.3 Description

E.3.1 How delivered

We provide a docker image with the necessary dependencies: https://imperialcollegelondon.box.com/v/cc20-artifact-p43. After downloading, the image can be loaded using:

$ sudo docker load -i cc20-artifact-p43.docker

To run the image, run the command:

$ sudo docker run -ti cc20-artifact-p43

File README.md inside the docker image contains additional instructions. Our benchmarks, source code and scripts are also publicly available on Github, at https://github.com/session-arr/session-arr.

E.3.2 Hardware dependencies

We used a 12-core Intel Xeon CPU E5-2650 v4 @ 2.20GHz. We recommend using a shared-memory architecture, with uniform access times, to measure the overheads of our approach, not message latencies.

E.3.3 Software dependencies

All our dependencies are listed in the Dockerfile in our public repository. We list them below. To compile our tool:
1. GHC >= 8.6 (not tested with GHC >= 8.8)
2. stack Version 1.9.1
To run our experiments:
1. C compiler that supports C11 (tested with GCC >= 4.8 && < 8.3)
2. glibc (tested with versions >= 2.17 && < 2.29)
3. numactl
To generate the graphs:
1. python (== 2.7)
2. python-matplotlib (== 2)
3. python-pint (== 0.7)

E.3.4 Data sets

We include as part of the artifact the raw data that we obtained for our benchmarks. These are included under benchmarks/<benchmark>/data/t_<cores>, where <cores> is either 12 or 24. There is additionally a file t_48, which uses all 24 cores plus hyperthreading. The structure of the files is:

size: <size>
K: seq
mean: <mean>
stddev: <stddev>
K: 1
mean: ...
stddev: ...
...

Keyword size denotes the size of the inputs for the particular benchmark. Keyword mean is the average execution time. Keyword stddev is the standard deviation. We write K: <n> to denote the number of recursion unfoldings used to produce the parallel version.

Examples of global types for each benchmark are under benchmarks/<benchmark>/protocol/<function>_<K>.mpst, where <function> is the function name in <benchmark>.hs that corresponds to this protocol.

E.4 Installation

Note: this section can be omitted if using our docker image. We recommend using Stack (https://docs.haskellstack.org/en/stable/README/#how-to-install). To build our tool:

$ git clone \
    https://github.com/session-arr/session-arr
$ cd session-arr
$ stack build

There is no need to install the tool. However, to install it, run:

$ stack install

This will copy the binary session-arrc to a local directory, usually ${HOME}/.local/bin.

Manual compilation and installation using GHC is also possible, but we discourage it. Read session-arr/package.yaml to find out which Haskell packages are required.

E.5 Experiment workflow

E.5.1 Automatic

We include the script session-arr/benchmark.sh to compile and run all the benchmarks used in the paper. To customise the number of cores, the number of repetitions per experiment and the maximum input size, run:

$ CORES=<c> REPETITIONS=<r> \
  MAXSIZE=<m> ./benchmark.sh

The defaults are:
1. CORES: number of physical cores on your machine
2. REPETITIONS: 50
3. MAXSIZE: 30

The script requires that MAXSIZE ≥ 15.

Note: using MAXSIZE=30 requires a machine with a large amount of memory. We ran our experiments on a machine with 64GB of memory.
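For post-processing outside plot.py, the t_<cores> layout described in §E.3.4 is easy to parse. A throwaway Haskell sketch (ours, not part of the artifact) that groups the records by input size:

    import Data.List (stripPrefix, isPrefixOf)

    -- Parse a t_<cores> file into (size, [(K, mean, stddev)]) records.
    parseFile :: String -> [(Integer, [(String, Double, Double)])]
    parseFile = go . lines
      where
        go [] = []
        go (l : ls) = case stripPrefix "size: " l of
          Just s ->
            let (body, rest) = break ("size: " `isPrefixOf`) ls
            in (read s, triples body) : go rest
          Nothing -> go ls
        -- each run is three consecutive lines: K, mean, stddev
        triples (k : m : d : xs)
          | Just kv <- stripPrefix "K: " k
          , Just mv <- stripPrefix "mean: " m
          , Just dv <- stripPrefix "stddev: " d
          = (kv, read mv, read dv) : triples xs
        triples _ = []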

E.5.2 Manual

Clone and build the repository:

$ git clone \
    https://github.com/session-arr/session-arr
$ cd session-arr
$ stack build

Navigate to one of the benchmarks:

$ cd examples/FFT

Here, there should be two files: FFT.hs and main.c.

$ ls
FFT.hs main.c run.sh

To run our tool, run session-arrc using stack, with the .hs file as input:

$ stack exec session-arrc -- FFT.hs

The tool should output the list of functions found in module FFT.hs that are going to be compiled to C, and produce two files, FFT.c and FFT.h. The interface file contains the type definitions and function signatures of the functions in FFT.c. Finally, compile main.c:

$ gcc FFT.c main.c -o bench -lpthread -lm

To configure the number of repetitions, recompile the benchmark as follows:

$ gcc FFT.c main.c -DREPETITIONS=<r> \
    -o bench -lpthread -lm

You may use run.sh to run the benchmark on a range of inputs. The usage is:

$ ./run.sh <num_sizes> <max_size>

For example, ./run.sh 2 10 will run the benchmark with sizes 2^9 and 2^10. The maximum size must be > 9. To generate the graphs, you need measurements for at least 7 different sizes, i.e. the maximum size must be > 14.

Running each benchmark manually. Pass a valid input size to bench; the output looks as follows (run on a 4-core machine):

$ ./bench $((2**17))
K: seq
mean: 0.039446
stddev: 0.000713
...
K: 4
mean: 0.011952
stddev: 0.000636
...

Save all execution times to files with the format described in §E.3.4, as follows:

$ mkdir data
$ echo "size: <size1>" >> data/t_<cores>
$ ./bench <size1> >> data/t_<cores>
$ ...
$ echo "size: <size2>" >> data/t_<cores>
$ ./bench <size2> >> data/t_<cores>

Ensure that there are measurements for at least N > 14 sizes.

Plotting the speedups. Navigate to examples/. The speedups can be plotted using the scripts plotall.sh and plot.py; these will regenerate the graphs used in our paper. The usage is ./plotall.sh BENCHMARK_DIR CORES, where CORES is the number of cores used for the experimental workflow. For example:

$ ./plotall.sh FFT 4

This will generate the graphs for FFT run on 4 cores under examples/plots.

E.6 Evaluation and expected result

If you followed the experiment workflow, you should find under examples/plots a series of graphs with the speedups for each benchmark. To visualise them, we recommend copying them to a local directory, by running docker cp from outside the docker container:

$ docker cp \
    <container>:/home/cc20-artifact/session-arr/examples/plots \
    <dest>

Here, <container> is the container name obtained via docker ps -a, and <dest> is the destination path.

Outcome. When run on hardware similar to the one that we describe in the paper, following our workflow, speedups and scalability comparable to the ones that we reported in the paper should be observed.

Note: for more reliable results, execution should be done outside the docker container. Use the container to generate all C code, then copy it, along with the necessary scripts (run.sh), by running docker cp from outside the container, and proceed locally. If you decide to run the experiments locally, please check §E.3.3 and ensure that you have all required software.

E.7 Experiment customization

Several aspects can be customised in the benchmark source code, execution scripts and compilation options:

• Annotating functions differently in the .hs files should produce different parallelisations.
• Files main.c can be compiled with different numbers of repetitions using -DREPETITIONS=<r>.
• The script benchmark.sh can be run with a given number of repetitions, to reduce execution times. The maximum input size for the benchmarks and the number of cores can also be customised:

$ cd session-arr
$ REPETITIONS=<r> CORES=<c> \
  MAXSIZE=<m> ./benchmark.sh

E.8 Methodology

Submission, reviewing and badging methodology:
• http://cTuning.org/ae/submission-20190109.html
• http://cTuning.org/ae/reviewing-20190109.html
• https://www.acm.org/publications/policies/artifact-review-badging