A Multidimensional Array Slicing DSL for Stream Programming

2010 International Conference on Complex, Intelligent and Software Intensive Systems
978-0-7695-3967-6/10 $26.00 © 2010 IEEE. DOI 10.1109/CISIS.2010.135

Pablo de Oliveira Castro¹, Stéphane Louise¹ and Denis Barthou²

¹ CEA LIST, Embedded Real Time Systems Laboratory, Point Courrier 94, Gif-sur-Yvette, F-91191 France
² University of Bordeaux - Labri / INRIA, 351, cours de la Libération, Talence, F-33405 France

{pablo.de-oliveira-castro, [email protected] [email protected]

Abstract—Stream languages offer a simple multi-core programming model and achieve good performance. Yet expressing data rearrangement patterns (like a matrix block decomposition) in these languages is verbose and error prone. In this paper, we propose a high-level programming language to elegantly describe n-dimensional data reorganization patterns. We show how to compile it to stream languages.

1 INTRODUCTION

Stream programming languages [1][2][3] are particularly well-suited to writing efficient parallel programs for multi-core architectures. Fork-join parallelism and pipelines are explicitly described by the stream graph, and task memory requirements and communication costs may be statically extracted from the stream representation, enabling powerful optimization strategies [4][5] for high performance.

Languages such as StreamIt [1] or ΣC [2] are examples of stream languages with optimizing compilers. These compilers analyze the stream communication patterns and simplify them, breaking useless dependencies. These optimizations rely in particular on the fact that all communication patterns use Split, Duplicate and Join nodes. While very expressive, this low-level representation of dataflow reorganizations is very verbose and error prone.

In this paper we propose a high-level language for the description of stream reorganizations. In this language, streams are structured through iterators, enabling the construction of complex patterns of communication/reorganization. We show that these iterators and patterns can then be compiled efficiently into stream graphs using Split, Duplicate and Join nodes. The language can be seen as an extension to stream languages; as such, we show how it can be integrated with the StreamIt language (but it could easily be adapted to other stream languages). We have implemented a compiler for this language that produces stream graphs.

1.1 Stream languages

Stream languages model parallel programs with stream graphs. In this dataflow representation, nodes represent either data reorganization operations between streams or filters, and arcs are communications between nodes. Each time a node is fired it will consume a fixed number of elements on its inputs and produce a fixed number of elements on its outputs.

Filters are particular nodes that have only one input and one output; they represent computation nodes, possibly keeping a state through successive firings. Split, Dup and Join nodes are nodes that dispatch data through the application. Since the focus of this paper is on data reorganization, we will concentrate on Split, Dup and Join nodes. We recall below the main types of data reorganization nodes:

Join round-robin $J(c_1 \ldots c_n)$: a join round-robin has $n$ inputs and one output. We associate to each input $i$ a consumption rate $c_i \in \mathbb{N}^*$. The node fires periodically. In its $k$-th firing the node takes $c_u$ elements, where $u = (k \bmod n) + 1$, on its $u$-th input and writes them on its output. As in a classic Cyclo Static Data-Flow [6] model, nodes only fire when there are enough elements on their input.

Split round-robin $S(p_1 \ldots p_m)$: a split round-robin has $m$ outputs and one input. We associate to each output $j$ a production rate $p_j \in \mathbb{N}^*$. In its $k$-th firing the node takes $p_v$ elements, where $v = (k \bmod m) + 1$, on its input, and writes them to the $v$-th output.

Duplicate $D(m)$ has one input and $m$ outputs. Each time this node is fired, it takes one element on the input and writes it to every output, duplicating its input $m$ times.

Besides, stream graphs have sources and sinks:

Source $I(l)$: a source models a program input. It has an associated size $l$. The source node is fired only once and writes $l$ elements to its single output. If all the elements in the source are the same, the source is constant and denoted by the node $C(l)$.

Sink $O$: a sink models a program output, consuming all elements on its single input. If we never observe the consumed elements, we say the sink is trash and we write the node $T$.

1.2 Motivating Example

As a motivating example we present an excerpt from a matrix multiplication program that is shipped with StreamIt distribution 2.1.1 (cf. figure 1).

In StreamIt the stream graph is described hierarchically, in a textual form:
• add is used to chain subgraphs.
• split duplicate splits the previous output through a Duplicate node.
• split roundrobin splits the previous output through a Split round-robin node.
• join roundrobin joins the previous outputs with a Join round-robin node.

As we can observe in figure 1, describing the reorganization of 2D data in StreamIt is quite tedious.

    float->float pipeline MatrixMultiply(int x0, int y0, int x1, int y1) {
      add RearrangeDuplicateBoth(x0, y0, x1, y1);
      add MultiplyAccParallel(x0, x0);
    }

    float->float splitjoin RearrangeDuplicateBoth(int x0, int y0, int x1, int y1) {
      split roundrobin(x0 * y0, x1 * y1);
      // the first matrix just needs to get duplicated
      add DuplicateRows(x1, x0);
      // the second matrix needs to be transposed first
      // and then duplicated
      add RearrangeDuplicate(x0, y0, x1, y1);
      join roundrobin;
    }

    float->float pipeline RearrangeDuplicate(int x0, int y0, int x1, int y1) {
      add Transpose(x1, y1);
      add DuplicateRows(y0, x1 * y1);
    }

    float->float splitjoin Transpose(int x, int y) {
      split roundrobin;
      for (int i = 0; i < x; i++) add Identity<float>();
      join roundrobin(y);
    }

    float->float pipeline DuplicateRows(int times, int length) {
      split duplicate;
      for (int i = 0; i < times; i++) add Identity<float>();
      join roundrobin(length);
    }

Fig. 1. StreamIt program for matrix multiplication

2 HIGH-LEVEL LANGUAGE

We propose a high-level language that describes data reorganization operations on data streams, through the manipulation of shapes and slicing patterns. The language is built around five concepts: Shapes, Grids, Blocks, Iterators described in this section.

2.1 Shapes

The language restructures input streams into multidimensional patterns with shape types. These shapes correspond to a multidimensional indexing of the stream elements.

In the following example, the two input streams, identified by the numbers 0 and 1 and accessed using the keyword "input", are structured into 3 shapes:

    shape[10] A = input 0
    shape[15,10] B = input 1
    shape[3,3,3] C = input 0

Stream 0 is viewed in A as a stream of vectors of length 10 and in C as a stream of 3 × 3 × 3 cubes; stream 1 is viewed in B as a stream of 15 × 10 matrices.

2.2 Grids

On instances of type shape we can apply the grid operator, which is defined by giving on each dimension $i$ three parameters $(l_i, h_i, \delta_i)$:
• $l_i$ is the lower bound of the grid for dimension $i$.
• $h_i$ is the upper bound of the grid for dimension $i$.
• $\delta_i$ is the stride of the grid for dimension $i$.

For each dimension $i$, we consider the set of points:

$$G_i = \left\{ \delta_i \cdot k \cdot \vec{e}_i \;:\; \forall k \in [|\, \tfrac{l_i}{\delta_i} ; \tfrac{h_i}{\delta_i} \,|] \right\}$$

The elements of a grid are constructed by computing the Cartesian product of the $G_i$:

$$G = G_1 \otimes \cdots \otimes G_d$$

They are lexicographically ordered. This ordering defines a grid iterator $G(n)$, where $G(0)$ is the first element, $G(1)$ the second, etc.

The grid operator uses a standard slicing notation where $l_i$, $h_i$, $\delta_i$ are separated by colons and each dimension is separated by commas, $[l_1{:}h_1{:}\delta_1, \ldots, l_d{:}h_d{:}\delta_d]$. The points described by the grid B[2:15:5,0:8:3], for instance, are represented on figure 2(a). If the dimensions of a grid are not the same as the dimensions of the shape on which it is applied, a type error is raised. For simplicity, it is possible to omit one or more values of the triplet; missing values are replaced by sensible default values (0 in place of $l_i$, $s_i$, the size of the shape in dimension $i$, in place of $h_i$, and 1 in place of $\delta_i$). For instance, the above example could be written B[2::5,:8:3].

2.3 Blocks

The block operator can only be applied to a grid type. A block is a d-dimensional box parametrized by its min and max coordinates on each dimension: $(a_1{:}b_1, \ldots, a_d{:}b_d)$ with $a_i, b_i \in \mathbb{Z}$.

$(-1{:}1, 0{:}1)$ defines a 3 × 2 block $B$. The points in $B$ are lexicographically ordered, giving an ordered set:

$$B = \{(-1,0), (0,0), (1,0), (-1,1), (0,1), (1,1)\}_{lex}$$

Blocks must always be applied to a grid of the same dimension using the product (×) operator,

    B[2::5,:8:3] x (-1:1, 0:1)

which describes the points in figure 2(b). If a block does not have the same dimension as the grid to which it is applied, a type error is raised.

To apply a block on a grid, we center the block around each point of the grid and take the resulting set of points. The resulting points, in order, are defined by the following iterator of ordered sets,
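The block ordering of section 2.3 and the step of centering a block on one grid point can be illustrated with a short Python sketch. It is not part of the language or of its compiler: the function names `block_points` and `center_block` and the sample point (5, 3) are hypothetical, chosen only for illustration. The sketch assumes, following the ordered set given for (−1:1, 0:1), that the first coordinate varies fastest, and it takes the grid point as an explicit argument rather than deriving it from a grid expression.

```python
from itertools import product


def block_points(bounds):
    """Enumerate the points of a block given as [(a1, b1), ..., (ad, bd)].

    Bounds are inclusive. The first coordinate varies fastest, which
    reproduces the ordered set listed in the text for (-1:1, 0:1).
    """
    # itertools.product varies its LAST argument fastest, so the
    # dimensions are reversed before the product and each resulting
    # tuple is reversed back afterwards.
    ranges = [range(a, b + 1) for a, b in reversed(bounds)]
    return [tuple(reversed(p)) for p in product(*ranges)]


def center_block(bounds, grid_point):
    """Translate every block point by one grid point (assumed given)."""
    return [
        tuple(g + o for g, o in zip(grid_point, offset))
        for offset in block_points(bounds)
    ]


if __name__ == "__main__":
    block = [(-1, 1), (0, 1)]  # the 3 x 2 block (-1:1, 0:1)
    print(block_points(block))
    # (5, 3) is an arbitrary point used for illustration only.
    print(center_block(block, (5, 3)))
```

Reversing the dimensions around `itertools.product` is only one way to obtain this ordering; the order in which the translated blocks of successive grid points are concatenated is left out of the sketch on purpose.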
