New Abstractions for Data Parallel Programming

James C. Brodman, Basilio B. Fraguela†, María J. Garzarán, and David Padua

Department of Computer Science, University of Illinois at Urbana-Champaign
brodman2, garzaran, [email protected]
†Universidade da Coruña, Spain
[email protected]

Abstract

Developing applications is becoming increasingly difficult due to recent growth in machine complexity along many dimensions, especially that of parallelism. We are studying data types that can be used to represent data parallel operations. Developing parallel programs with these data types has numerous advantages, and such a strategy should facilitate parallel programming and enable portability across machine classes and machine generations without significant performance degradation.

In this paper, we discuss our vision of data parallel programming with powerful abstractions. We first discuss earlier work on data parallel programming and list some of its limitations. Then, we introduce several dimensions along which it is possible to develop more powerful data parallel programming abstractions. Finally, we present two simple examples of data parallel programs that make use of operators developed as part of our studies.

1 Introduction

The extensive research in parallel computing of the last several decades produced important results, but there is still much room, and much need, for advances in parallel programming, including language development. New programming notations and tools are sorely needed to facilitate the control of parallelism, locality, processor load, and communication costs.

In this paper, we present preliminary ideas on data types (data structures and operators) that can be used to facilitate the representation of data parallel computations. A data parallel operation acts on the elements of a collection of objects. With these operations, it is possible to represent parallel computations as conventional programs with the parallelism encapsulated within the operations.

This is of course an old idea, but we believe it is also an idea with much room for advances. We have chosen to study data parallel notations because most parallel programs of importance can be represented as a sequence of data parallel operations. Furthermore, scalable programs, which are the ones that will drive the evolution of machines, must be data parallel. The strategy of representing parallel computations as a sequence of data parallel operations has several advantages:

• Conventional notation. Data parallel programs written as sequences of data parallel operations can be understood as conventional programs by ignoring parallelism. Although understanding how the parallelism is exploited is necessary to analyze performance, in this paradigm it is not necessary for understanding the semantics of the program. We believe that using a conventional notation reduces the likelihood of errors, facilitates maintenance, and shortens the learning process. These benefits of a conventional notation were the main motivation behind the work on autoparallelization [23].

• Higher level of abstraction. A judicious selection of operators should lead to very readable programs where powerful operators encapsulate parallelism.

• Control of determinacy. Whenever the data parallel operators are implemented as pure functions, the programs will be guaranteed to be determinate, although this comes at the cost of having an implicit barrier after each data parallel operator. Avoiding these barriers may require compiler transformations. If non-determinacy is desired, it can be encapsulated inside the parallel operators by allowing interaction with a global state.

• Portability. Data parallel programs written as a sequence of operations can be ported across classes of parallel machines just by implementing the operators in different ways. Thus, the same program could be executed on shared-memory and distributed-memory multiprocessors as well as on SIMD machines. In the same way that parallelism is encapsulated by the operations, so can be the nature of the target machine. Programmers would of course need to consider portability when developing a program and must choose algorithms that perform well across machine classes.

• Good performance abstractions. By understanding the performance of each operation it is possible to determine the overall performance of the program.
The rest of this paper is organized as follows. In Section 2, we discuss the data parallel operators of the past. Possible directions of evolution for data parallel operators are discussed in Section 3, and two examples of data parallel codes built with some of the data types we have developed are presented in Section 4. Conclusions are presented in Section 5.

2 Data Parallel Programming

There is an extensive collection of data parallel operators developed during the last several decades. This collection arises from several sources. First, many of today's data parallel operators were initially conceived as operators on collections. Parallelism seems to have been an afterthought. Examples include the map [21] and reduce [26] functions of LISP, the operations on sets of SETL [25], and the array, set, and tree operators of APL [17]. The reason why these operators can be used to represent parallel computation is that many parallel computation patterns can be represented as element-by-element operations on arrays or other collections or as reduction operations. Furthermore, parallel communication patterns found in message passing (e.g. MPI) parallel programs correspond to operations found in APL, and more recently Fortran 90, such as transposition or circular shifts. Most of these operations were part of the languages just mentioned.

The vector instructions of SIMD machines such as the early array and vector processors, including Illiac IV [3], TI ASC [27], and CDC Star [14], are a second source of data parallel operators. Array instructions are still found today in modern machines, including vector supercomputers, as extensions to the instruction set of conventional microprocessors (SSE [16] and Altivec [9]), and as GPU hardware accelerators [20], with their hundreds of processors specialized in performing repetitive operations on large arrays of data.

The data parallel operators of high-level languages and the libraries developed to encapsulate parallel computations are a third source of data parallel operators. Early examples of data parallel languages include the vector languages of Illiac IV such as Illiac IV Fortran and IVTRAN [22]. Recent examples include High Performance Fortran [19, 12], which represented data parallel operations with array operations [1] and data distribution directives. The functional data parallel language NESL [5] made use of dynamic partitioning of collections and nested parallelism. Data parallel extensions of Lisp (*Lisp) were developed by Thinking Machines. Data parallel operations on sets were presented as an extension to SETL [15] and discussed in the context of the Connection Machine [13], but it seems there is not much more about the use of data parallel operations on sets in the literature. The recent design of MapReduce [8], a data parallel operation combining the map and reduce operators of Lisp, has received much attention.

In the numerically oriented high-level languages, data parallel programming often took the form of arithmetic operations on linear arrays, perhaps controlled by a mask. Most often, the operations performed were either element-by-element operations or reductions across arrays. An example from Illiac IV Fortran is A(*) = B(*) + C(*), which adds, making use of the parallelism of the machine, vectors B and C and assigns the result to vector A. In Fortran 90 and MATLAB the same expression is represented by replacing * with :. In IVTRAN, the range of subscripts was controlled with the do for all statement (the predecessor of today's forall). Reductions were represented with intrinsic functions such as sum, prod, any, and first.
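For comparison with today's mainstream languages, the same two patterns, an element-by-element operation and a reduction, can be sketched with the C++17 parallel algorithms. The function below is only an illustration of the style (its name, add_and_reduce, and the choice of double vectors are ours); as in the array statements above, the parallelism is encapsulated in the library calls.

#include <algorithm>
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

// A(*) = B(*) + C(*) followed by a sum reduction, written with the C++17
// parallel algorithms. The execution policy asks the library to run the
// element-by-element work and the reduction in parallel.
void add_and_reduce(std::vector<double>& A, const std::vector<double>& B,
                    const std::vector<double>& C, double& total) {
  // Element-by-element operation: A[i] = B[i] + C[i] for every i.
  std::transform(std::execution::par, B.begin(), B.end(), C.begin(),
                 A.begin(), std::plus<>{});
  // Reduction in the style of the sum intrinsic.
  total = std::reduce(std::execution::par, A.begin(), A.end(), 0.0);
}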
Two important characteristics of the operations on collections and data parallel constructs of the languages described above are:

• Flat data types. In practically all cases, there is no internal hierarchy in the data structures. Arrays, sets, and sequences are typically flat data structures. An exception is NESL, which makes use of sequences of sequences for nested parallelism. HPF accepted declarations specifying data partitioning and alignment, but these were instructions to the compiler and were not reflected directly in the executable instructions.

• Fully independent parallelism. In all cases, parallelism is either fully independent or, if there is interaction, takes the form of reduction or scan operations.

Despite the great advantages mentioned in Section 1, there is much less experience with the expression of parallelism using operators on collections than with other forms of parallelism. Parallel programming in recent times has mostly targeted MIMD machines and relied on SPMD notation, task spawning operations, and parallel loops. Even for the popular GPUs, the notation of choice today, CUDA, is SPMD. The most significant experience with data parallel operators has been in the context of vector machines and vector extensions (SSE/Altivec), where data parallel operators are limited by the target machine, as in the Illiac IV days, to element-by-element simple arithmetic or boolean vector operations.

3 Extending the Data Parallel Model

Advances in the design of data parallel operators should build on earlier work, some of which was described in Section 2. However, to move forward it is also necessary to consider parallel programming patterns that demonstrate their value for programmability or performance. An example is tiling, which occurs frequently in all forms of parallel programs: for data distribution in distributed-memory machines, to control loop scheduling in shared-memory machines, or to organize the computation in GPU programs. Another example is the dynamic partitioning of data for linear algebra computations or sorting operations. A third example is data parallel operations in which operations on different elements of a collection interact with each other or must be executed in a particular order. Interactions and ordering between operations on elements of a collection have traditionally been associated with task parallelism, but they can also be implemented within data parallel operations.

In our experience, specific evolution directions for the functionality of data parallel operators include:

• New classes of collections. Although there have been proposals and a few experimental systems that apply data parallel operators to general collections of objects, most of the implementations, and hence most of the experience, have targeted dense arrays. However, sparse computations and operations on sets (and perhaps other objects such as graphs) must be considered if data parallel operators are to become universally useful.
• Static and dynamic partitioning or tiling. To enable the direct control of data distribution and scheduling, it must be possible to partition, dynamically and statically, the collections that the data parallel operators manipulate. It must also be possible to directly operate on these partitions. In the case of arrays, the simplest and most useful partitions are tiles. In fact, the numerous algorithms developed using static and dynamic tiling seem to indicate that tiling is a fundamental operation on collections [10]. Tiling can be applied at one or multiple levels. When tiling at multiple levels, the outermost levels of tiling can be used to distribute the work between the threads in a parallel program, while the innermost levels are used to enhance locality. When the target is a distributed-memory system, tiles can make communication explicit, because computations involving elements from different tiles result in data movement.

We have studied partitioned collections including tiled dense and sparse arrays and partitioned sets. To operate on arrays, we developed the Hierarchically Tiled Array (HTA) data type. We programmed several codes, including the NAS benchmarks and some well-known numerical algorithms such as LU, and found that the programs written using HTAs resulted in shorter and more readable codes than their MPI counterparts, while obtaining similar performance results [4, 11]. We have obtained similar results when comparing HTAs to SMP libraries such as TBB [2, 7].

We are currently investigating how to add partitions or tiles to data parallel operations on sets, to obtain a tiled (or hierarchically tiled) set. A tiled set is similar to an HTA, with the only difference that in this case the programmer needs to define a potentially complex mapping function to determine the tile to which an element of the set belongs. In the case of arrays, tiling can be defined by hyperplanes along each dimension.

Tile size can be used to control how the load is distributed among processors. While partitioning a dense array in equal chunks is easy, the partition of a sparse array or a set into equal-size chunks is not trivial. In the case of tiled sparse arrays, the programmer may choose different tiling strategies depending on the structure of the sparse arrays. In the case of tiled sets, different mapping functions can result in different distributions. When a good tile size cannot be determined a priori, a possible solution to ameliorate the problem of load imbalance is to create more tiles than threads and let the runtime system map the tiles to the threads, following a task stealing strategy similar to the one implemented in Cilk [6], Intel Threading Building Blocks [24], or Charm++ [18].

• Parallel operations with interactions. Another example of patterns that have not been traditionally programmed using data parallel operators is operations on collections with interactions between the computations on different elements, thus requiring the usage of locks or other forms of synchronization for correctness. As an example, consider A(I) += V, where A, I and V are vectors. This expression means that for every i, 1≤i≤n, where n is the length of I and V, A(I(i)) must be updated by adding to it the value of V(i). As mentioned in the previous section, traditional data parallel operators require the computation to be fully parallel, so vector I cannot have two elements with the same value. However, in many situations, two or more elements of I may have the same value, so that some elements of A could be augmented repetitively with elements from V. Thus, to obtain a correct result, the elements of A must be atomically updated (a sketch of such an atomic scatter update appears at the end of this section). Otherwise, a mechanism to impose an order of execution would be needed.

In principle, assuming that the potential rounding errors of floating-point operations can be disregarded, updates to an element of A could be reordered. In this case, we will have a non-determinate computation with the results varying, perhaps only slightly, depending on the order of updates. There are of course numerous other cases where non-determinacy could arise. The difference between results could be significant without affecting correctness. However, in these cases, non-determinacy could be encapsulated within the data parallel operation, facilitating analysis.

In other cases, the order of operation is relevant. For example, assume A represents the balances of bank accounts, and V a sequence of amounts to be drawn from (negative) or credited to (positive) the accounts indicated by the corresponding index in I, with the transactions in V sorted chronologically. If the bank imposes a penalty for negative balances, the transactions associated with each individual account cannot be reordered. Imposing an order of execution of the element operations within a collective operation is an important strategy that can enable the implementation of classes of parallel computations typically associated with task parallel implementations. An example is pipelined computations, which can be represented with a doacross schedule within a data parallel operation.
• Implementation. The implementation of data types for data parallelism is also an important consideration. While it may seem natural to design the data types and their operators as part of a programming language, in our work we have chosen to use class libraries initially. Creating a new language requires a compiler for each target machine on which the programs are run. Libraries only require a regular host compiler and are therefore easier to develop, enabling fast testing of the usability and performance limitations of new operators. However, compilers are likely to be needed for a production system. Building semantic information into a compiler would allow optimizations that are difficult or impossible with a library, such as eliminating unnecessary copying or temporaries. Furthermore, the high-level notation typical of data parallel operators presents the opportunity for advanced optimizations such as data structure selection, static autotuning, and data-dependent dynamic optimizations. Object-oriented languages are the natural implementation language for a class library, since what we are designing are new data types. Useful language features are polymorphism and operator overloading, as they allow the representation of operations in a more convenient notation.

• Portability. Judiciously designed data parallel data types can be implemented for all classes of parallel machines: distributed-memory clusters, shared-memory multiprocessors, SIMD machines, and machines with hybrid architectures such as distributed-memory machines with SMP nodes and array extensions. The performance optimality of an implementation will likely differ across machine classes, but the code will at least be portable across machine classes. The desired degree of portability should be taken into account when selecting the algorithms during program development.

• Performance abstractions. Determining efficiency of execution or approximate running times by inspecting a program is useful for manual tuning and for obtaining guarantees on the execution time of the codes. Estimating execution time is a difficult task even for conventional programs. However, the encapsulation of computations inside operations on collections could help facilitate performance prediction. It would certainly be useful to associate with data parallel operators an execution model for each class of target machines. Although performance abstractions would be complicated by the application of the advanced optimizations just mentioned, it should be possible to either give lower bounds on performance or give formulas that specify behavior as a function of parameters representing different execution conditions, including characteristics of the data and data placement at the beginning of the execution of an operator.
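As an illustration of the atomic update discussed in the parallel-operations-with-interactions item above, the following sketch expresses A(I) += V with the C++17 parallel algorithms and std::atomic. The name scatter_add and the use of long values are our own choices; a real data parallel operator would hide this machinery inside its implementation.

#include <algorithm>
#include <atomic>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// A(I) += V with possibly repeated indices in I. Each update is atomic, so
// the final sums are correct no matter how the parallel loop interleaves
// them; only the order of the additions is left unspecified.
void scatter_add(std::vector<std::atomic<long>>& A,
                 const std::vector<std::size_t>& I,
                 const std::vector<long>& V) {
  std::vector<std::size_t> k(I.size());
  std::iota(k.begin(), k.end(), 0);  // positions 0 .. n-1 of I and V
  std::for_each(std::execution::par, k.begin(), k.end(),
                [&](std::size_t i) {
                  A[I[i]].fetch_add(V[i], std::memory_order_relaxed);
                });
}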

4 Examples of Data Parallel Programs

In this section we show two code examples with data parallel operators on tiled data structures. Section 4.1 describes merge sort using tiled arrays. Section 4.2 describes a graph breadth-first search algorithm that uses tiled sets.

4.1 Merge Sort

Merge sort is used to illustrate the use of tiled arrays and nested parallelism. For tiled arrays we used the HTA data type described in [4, 11]. HTAs are arrays whose components are tiles, which can be either lower level HTAs or conventional arrays. Nested parallelism could proceed recursively across the different levels of the tile hierarchy. This hierarchy can be specified statically, or dynamically when the tiling structure depends on the input characteristics.

Merge(HTA output, HTA input1, HTA input2)
  if (output.size() < THRESHOLD)
    SerialMerge(output, input1, input2)
  else
    i = input1.size() / 2
    input1.addPartition(i)
    j = input2.location_first_gt(input1[i])
    input2.addPartition(j)
    output.addPartition(i+j)
    hmap(Merge(), output, input1, input2)

Figure 1. Parallel Merge

Figure 1 illustrates the use of a dynamic partitioning operator (addPartition) to produce nested parallelism in merge sort. As the figure shows, HTA input1 is first split in half. Then, the location of the first element greater than the midpoint element of input1 is found in input2 and used to partition it. Then output is partitioned such that its tiles can accommodate the respective tiles from the two inputs that are going to be merged. Finally, an hmap recursively calls the Merge operation on the newly created left tiles of the two input arrays as well as on the right tiles.

hmap takes as arguments a function, a tiled structure (array or set), and, optionally, additional structures with the same arrangement of tiles. The function is applied to corresponding tiles, and this application can take place in parallel across corresponding tiles. hmap can perform basic element-by-element operations, but it can also be used to perform more complex user-defined actions upon elements or tiles. More examples of HTAs can be found in [4, 11].
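To make the per-tile parallelism that hmap encapsulates concrete, the following is a greatly simplified sketch in C++17, not the interface of the actual HTA library: a one-level tiled array is modeled as a vector of tiles, and a user-supplied function is applied to corresponding tiles of two such structures in parallel. The names Tiled and hmap_pair are ours.

#include <algorithm>
#include <cstddef>
#include <execution>
#include <numeric>
#include <vector>

// A one-level "tiled array" modeled as a vector of tiles.
template <typename T>
using Tiled = std::vector<std::vector<T>>;

// hmap-like helper (illustrative only): apply op to corresponding tiles of
// two tiled arrays; the tile pairs may be processed in parallel.
template <typename T, typename Op>
void hmap_pair(Tiled<T>& out, const Tiled<T>& in, Op op) {
  std::vector<std::size_t> tiles(out.size());
  std::iota(tiles.begin(), tiles.end(), 0);
  std::for_each(std::execution::par, tiles.begin(), tiles.end(),
                [&](std::size_t t) { op(out[t], in[t]); });
}

A call such as hmap_pair(a, b, op) then plays the role of an hmap invocation with one additional argument, with op doing the element-by-element or tile-level work inside each pair of tiles; creating more tiles than threads leaves the runtime room to balance the load, as discussed in Section 3.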
4.2 Breadth-First Search

Breadth-First Search (BFS) illustrates the use of sets to implement a graph search algorithm that traverses the neighboring nodes starting at the root node. For each node, it visits its unvisited neighbor nodes, and it continues in this way until it finds the goal node. Our example, shown in Figure 2, uses a BFS strategy to label the nodes in the graph by level, where the level is the shortest path length from the initial node.

Our data parallel implementation uses tiled sets, meaning that sets of nodes are partitioned into tiles and a mapping function is used to determine to which tile a node belongs. A tiled set is similar to an HTA in functionality, the only difference being that the underlying primitive data type is a set instead of an array. This algorithm uses the following data structures:

• work_list - a set of nodes partitioned into tiles. Tiles can be processed in parallel with each other. mapping_function(e) computes the tile, i, of work_list to which node e belongs.

• neighbors - a tiled set that represents the neighbors of the nodes in work_list in each iteration of the while loop. Tile i of neighbors contains the neighbors of tile i of work_list. However, nodes in tile i of neighbors do not necessarily belong to tile i of work_list.

• rearranged_neighbors - a tiled set that consists of the elements of neighbors rearranged using the mapping_function used to control tiling within the work_list set.

• adj - a tiled set that holds the adjacency information of the graph on which the search is performed. Each element is a pair consisting of a node and a set of neighbors that can be reached from that node. Given a node e, adj(e) is the set of its neighbors. Pairs are mapped by applying to the first element of adj the same mapping_function used for the work_list set.

TiledSet work_list[# of Tiles]
TiledSet neighbors[# of Tiles]
TiledSet adj[# of Tiles]
TiledSet rearranged_neighbors[# of Tiles]

BFS(TiledSet work_list, TiledSet neighbors)
  work_list[0].add(0)
  level = 0
  while (work_list[:] not empty)
    hmap(find_neighbors(), work_list, neighbors)
    rearranged_neighbors =
      hmap_reduce(mapping_function(), Union(), neighbors)
    hmap(mark_neighbors(), rearranged_neighbors, work_list, level)
    level++

find_neighbors(TiledSet work_list_tile, TiledSet neighbors_tile)
  foreach element e in work_list_tile
    Set ns = adj(e)
    foreach element s in ns
      neighbors_tile.add(s)

mark_neighbors(TiledSet rearranged_neighbors_tile,
               TiledSet work_list_tile, int level)
  foreach element e in rearranged_neighbors_tile
    if (unmarked(e))
      mark(e, level + 1)
      work_list_tile.add(e)

Figure 2. Breadth-First Search
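The mapping_function used above can be very simple. A hypothetical example in C++ (the signature is ours, not part of the tiled set design) hashes the node identifier onto the available tiles; a more elaborate mapping, for instance one derived from a graph partitioner, would simply replace the body.

#include <cstddef>
#include <functional>

// Hypothetical mapping function for a tiled set of graph nodes:
// hash the node id and fold it onto the available tiles.
std::size_t mapping_function(int node, std::size_t num_tiles) {
  return std::hash<int>{}(node) % num_tiles;
}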

5 Conclusions

Although numerous parallel programming paradigms have been developed during the past several decades, there is consensus that a notation with the most desirable characteristics is yet to be developed. Problems of modularity, structure, and portability remain to be solved.

Data types for data parallel programming have the potential of addressing these problems. Well designed data types should enable the development of highly structured, modular programs that resemble their sequential counterparts in the quality of their structure and at the same time enable portability across classes of parallel machines while maintaining efficiency. Although experience with data parallel programming models has been limited in scope and quantity, our own experience with this approach has convinced us that it is promising. We have yet to see a computational problem that does not succumb to the data parallel programming paradigm. Much remains to be done for programmability and performance. New data types and powerful methods for old data types need to be designed, and tools for optimization must be developed, but there is no question that significant advances are coming and that data parallel programming will have a significant role in the future of computing.

Acknowledgements

This material is based upon work supported by the National Science Foundation under Awards CCF 0702260 and CNS 0509432, and by the Universal Parallel Computing Research Center at the University of Illinois at Urbana-Champaign, sponsored by Intel Corporation and Microsoft Corporation. Basilio B. Fraguela was partially supported by the Xunta de Galicia under project INCITE08PXIB105161PR and the Ministry of Education and Science of Spain, FEDER funds of the European Union (Project TIN2007-67537-C03-02).
References

[1] J. C. Adams, W. S. Brainerd, J. T. Martin, B. T. Smith, and J. L. Wagener. Fortran 90 Handbook. McGraw-Hill, 1992.
[2] D. Andrade, J. Brodman, B. Fraguela, and D. Padua. Hierarchically Tiled Arrays Vs. Intel Threading Building Blocks for Programming Multicore Systems. In Programmability Issues for Multi-Core Computers, 2008.
[3] G. H. Barnes, R. M. Brown, M. Kato, D. Kuck, D. Slotnick, and R. Stokes. The ILLIAC IV Computer. IEEE Transactions on Computers, 8(17):746–757, 1968.
[4] G. Bikshandi, J. Guo, D. Hoeflinger, G. Almasi, B. B. Fraguela, M. J. Garzarán, D. Padua, and C. von Praun. Programming for Parallelism and Locality with Hierarchically Tiled Arrays. In Proc. of the ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 48–57, 2006.
[5] G. E. Blelloch. NESL: A Nested Data-Parallel Language. Technical report, Pittsburgh, PA, USA, 1992.
[6] R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall, and Y. Zhou. Cilk: An Efficient Multithreaded Runtime System. In Proc. of the ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 207–216, 1995.
[7] J. Brodman, B. Fraguela, M. J. Garzarán, and D. Padua. Design Issues in Parallel Array Languages for Shared Memory. In 8th Int. Workshop on Systems, Architectures, Modeling, and Simulation, 2008.
[8] J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM, 51(1):107–113, 2008.
[9] S. Fuller. Motorola's Altivec Technology. Technical report, Motorola, Inc., 1998.
[10] J. A. Gunnels, F. G. Gustavson, G. M. Henry, and R. A. van de Geijn. FLAME: Formal Linear Algebra Methods Environment. ACM Trans. Math. Softw., 27(4):422–455, 2001.
[11] J. Guo, G. Bikshandi, B. B. Fraguela, M. J. Garzarán, and D. Padua. Programming with Tiles. In Proc. of the ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming, pages 111–122, Feb 2008.
[12] High Performance Fortran Forum. High Performance Fortran specification version 2.0, January 1997.
[13] W. D. Hillis. The Connection Machine. MIT Press series in artificial intelligence, 1985.
[14] R. G. Hintz and D. P. Tate. Control Data STAR-100 Processor Design. In Proc. of COMPCON, pages 1–4, 1972.
[15] R. Hummel, R. Kelly, and S. Flynn Hummel. A Set-based Language for Prototyping Parallel Algorithms. In Proceedings of the Computer Architecture for Machine Perception '91, 1991.
[16] Intel Corporation. IA32 Intel Architecture Software Developer's Manual (Volume 1: Basic Architecture), 2004.
[17] K. E. Iverson. A Programming Language. John Wiley & Sons, Inc., New York, NY, USA, 1962.
[18] L. Kale and S. Krishnan. CHARM++: A Portable Concurrent Object Oriented System Based on C++. In Proceedings of the Conference on Object Oriented Programming Systems, Languages and Applications, pages 91–108, 1993.
[19] C. Koelbel and P. Mehrotra. An Overview of High Performance Fortran. SIGPLAN Fortran Forum, 11(4):9–16, 1992.
[20] D. Luebke, M. Harris, J. Krüger, T. Purcell, N. Govindaraju, I. Buck, C. Woolley, and A. Lefohn. GPGPU: General Purpose Computation on Graphics Hardware. In ACM SIGGRAPH 2004 Course Notes, page 33, 2004.
[21] J. McCarthy. LISP 1.5 Programmer's Manual. The MIT Press, 1962.
[22] R. Millstein and C. Muntz. The Illiac IV Fortran Compiler. 10(3):1–8, March 1975.
[23] D. A. Padua and M. J. Wolfe. Advanced Compiler Optimizations for Supercomputers. Commun. ACM, 29(12):1184–1201, 1986.
[24] J. Reinders. Intel Threading Building Blocks: Outfitting C++ for Multi-core Processor Parallelism. O'Reilly, 1st edition, July 2007.
[25] J. Schwartz. Set Theory as a Language for Program Specification and Programming, 1970.
[26] G. Steele. Common Lisp: The Language. Digital Press, Newton, MA, USA, 1990.
[27] W. Watson. The TI-ASC, A Highly Modular and Flexible Super Computer Architecture. In Proc. AFIPS, pages 221–228, 1972.
