A Polyhedral Compilation Framework for Loops with Dynamic Data-Dependent Bounds

Jie Zhao, Michael Kruse, and Albert Cohen
INRIA & DI, École Normale Supérieure, Paris, France
[email protected]

Abstract

We study the parallelizing compilation and loop nest optimization of an important class of programs where counted loops have a dynamic data-dependent upper bound. Such loops are amenable to a wider set of transformations than general while loops with inductively defined termination conditions: for example, the substitution of closed forms for induction variables remains applicable, removing the loop-carried data dependences induced by termination conditions. We propose an automatic compilation approach to parallelize and optimize dynamic counted loops. Our approach relies on affine relations only, as implemented in state-of-the-art polyhedral libraries. Revisiting a state-of-the-art framework to parallelize arbitrary while loops, we introduce additional control dependences on data-dependent predicates. Our method goes beyond the state of the art in fully automating the process, specializing the code generation algorithm to the case of dynamic counted loops and avoiding the introduction of spurious loop-carried dependences. We conduct experiments on representative irregular computations, from dynamic programming, computer vision and finite element methods to sparse matrix linear algebra. We validate that the method is applicable to general affine transformations for locality optimization, vectorization and parallelization.

CCS Concepts • Software and its engineering → Compilers;

Keywords parallelizing compiler, loop nest optimization, polyhedral model, dynamic counted loop

ACM Reference Format: Jie Zhao, Michael Kruse, and Albert Cohen. 2018. A Polyhedral Compilation Framework for Loops with Dynamic Data-Dependent Bounds. In Proceedings of 27th International Conference on Compiler Construction (CC'18). ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3178372.3179509

1 Introduction

While a large number of computationally intensive applications spend most of their time in static control loop nests—with affine conditional expressions and array subscripts—several important algorithms do not meet such statically predictable requirements. We are interested in the class of computational kernels involving dynamic counted loops. These are regular counted loops with numerical constant strides, iterating until a dynamically computed, data-dependent upper bound. Such bounds are loop invariants, but often recomputed in the immediate vicinity of the loop they control; for example, their definition may take place in the immediately enclosing loop. Dynamic counted loops play an important role in numerical solvers, media processing applications, and data analytics, as we will see in the experimental evaluation. They can be seen as a special case of while loop that does not involve an arbitrary, inductively defined termination condition. The ability to substitute their counter with a closed form—an affine function of the enclosing loop counters—makes them amenable to a wider set of transformations than while loops. Dynamic counted loops are commonly found in sparse matrix computations, but are not restricted to this class of algorithms. They are also found together with statically unpredictable, non-affine array subscripts.

The polyhedral framework of compilation unifies a wide variety of loop and array transformations using affine (linear) transformations.
The availability of a general-purpose method to generate imperative code after the application of such affine transformations [3, 16, 20] brought polyhedral compilers to the front scene, in the well-behaved case of static control loops. While a significant amount of work targeted the affine transformation and parallelization of while loops [5, 8, 9, 12–15, 17], these techniques face a painful problem: the lack of a robust method to generate imperative code from the polyhedral representation. One representative approach to model while loops in a polyhedral framework, and in the code generator in particular, is the work of Benabderrahmane et al. [5].

This work uses over-approximations to translate a while loop into a static control loop iterating from 0 to infinity that can be represented and optimized in the polyhedral model. It introduces exit predicates and the associated data dependences to preserve the computation of the original termination condition, and to enforce the proper termination of the generated loops the first time this condition holds. These data dependences severely restrict the application of loop transformations involving a while loop, since the iterations of the latter may not be reordered, and fusion with other loops is also restricted. The framework was also not fully automated at the time of its publication, leaving much room for the interpretation of its applicable cases and the space of legal transformations it effectively models. Speculative approaches like the work of Jimborean et al. [17] also addressed the issue, but a general "while loop polyhedral framework" compatible with arbitrary affine transformations has yet to emerge. In this paper, we make a more pragmatic, short-term step: we focus on the special case of dynamic counted loops where the most difficult of these problems do not occur.

There has also been a significant body of research specializing in high-performance implementations of sparse matrix computations. Manually-tuned libraries [2, 4, 7, 18, 19, 27] are a commonly used approach, but it is tedious to implement and tune for each representation and target architecture. A polyhedral framework that can handle non-affine subscripts has a greater potential to achieve transformations and optimizations on sparse matrix computations, as illustrated by Venkat et al. [24].

In this paper, we propose an automatic polyhedral compilation approach to parallelize and optimize dynamic counted loops that can express arbitrary affine transformations and achieve performance portability. Our approach is based on systems of affine inequalities, as implemented in state-of-the-art polyhedral libraries [25]. Just like [22, 23], it does not resort to more expressive first-order logic with non-interpreted functions/predicates such as the advanced analyses and code generation techniques of Wonnacott et al. [28], and it avoids the complexity and overhead of speculative execution.

To extend the polyhedral framework to dynamic counted loops, our method relies on the computation of an affine upper bound for all dynamic trip counts that a given loop may reach, using a combination of additional static analysis and dynamic inspection. Revisiting the polyhedral compilation of while loops, we introduce additional control dependences on data-dependent predicates, and provide for the full automation of imperative code generation after the application of affine transformations, targeting both CPU and GPU architectures.

Our method goes beyond the state of the art [5, 17, 24] in fully automating the process, specializing the code generation algorithm to the case of dynamic counted loops, and avoiding the introduction of spurious loop-carried dependences or resorting to speculative execution. We conduct experiments on representative irregular computations, including dynamic programming, computer vision, finite element methods, and sparse matrix linear algebra. We validate that the method is applicable to general affine transformations for locality optimization, vectorization and parallelization.

The paper is organized as follows. We introduce technical background and further motivate our approach to parallelize dynamic counted loops in the next section. Section 3 discusses the conversion of control dependences into data-dependent predicates. Section 4 introduces the code generation algorithm. Experimental results are shown in Section 5, followed by a discussion of related work in Section 6 and concluding remarks.

2 Background and Motivation

The polyhedral compilation framework was traditionally limited to static control loop nests. It represents a program and its semantics using iteration domains, access relations, dependences and schedules. The statement instances are included in iteration domains. Access relations map statement instances to the array elements they access. Dependences capture the partial order on statement instances accessing the same array element (one of which being a write). The schedule implements a (partial or total) execution order on statement instances that is compatible with dependences.

Consider the running example in Figure 1. The upper bounds, m and n, of the j-loop and k-loop are computed in their common enclosing loop and updated dynamically as the i-loop iterates. As a result, it is not possible to classify the whole loop nest as a static control part (SCoP), and traditional polyhedral techniques do not directly apply. Tools aiming at a greater coverage of benchmarks—such as PPCG or LLVM/Polly—will abstract the offending inner loops into a black box, greatly limiting the potential for locality-enhancing and parallelizing optimizations.

    for (i = 0; i < 100; i++) {
    S0: m = f(i);
    S1: n = g(i);
        for (j = 0; j < m; j++)
          for (k = 0; k < n; k++)
    S2:     /* loop body */;
    }

Figure 1. Dynamic counted loops: the upper bounds m and n of the j-loop and k-loop are recomputed at each iteration of the enclosing i-loop (body of S2 elided)

An alternative is to treat the dynamic upper bounds as symbolic parameters, enabling polyhedral transformations without problems. This, however, either introduces more frequent synchronizations by exploiting fine-grained parallelism when targeting CPUs, or misses the data locality along the outermost loop dimension and the opportunity to exploit full-dimensional parallelism on GPU platforms.

Statement S2 does not have data dependences on other statements. However, there are output dependences among the definition statements of the dynamic parameters m and n. To faithfully capture the scheduling constraints, one should also model the control dependences of S2 on both headers of the enclosing dynamic counted loops. Such control dependences can be represented as data dependences between the definition statements of dynamic upper bounds and S2. To establish such a dependence relation, an exit predicate may be introduced before each statement of the loop body, like in the framework of Benabderrahmane et al. [5]. The resulting dependence graph is shown in Figure 2. The solid arrows represent the original (output) dependences between definition statements of dynamic parameters, and the dashed arrows represent the data dependences converted from the exit conditions' control dependences.

Figure 2. Dependence graph of the example: output dependences e_{S0→S0} and e_{S1→S1} on the definitions of m and n, and converted control dependences e_{S0→S2} and e_{S1→S2}

By capturing control dependences as affine relations from the definition of exit predicates to dominated statements in loop bodies, one may build a sound abstraction of the scheduling constraints for the loop nest. This technique is applicable to arbitrary while loops, in conjunction with a suitable code generation strategy to recover the exact control flow protected by the exit predicate, and by over-approximating the loop upper bound as +∞. This is the approach explored by Benabderrahmane et al., but the resulting polyhedral representation is plagued by additional spurious loop-carried dependences to update the exit predicate, removing many useful loop nest transformations from the affine scheduling space. In the more restricted context of dynamic counted loops, it is possible to eliminate those loop-carried dependences as the exit predicate only depends on loop-invariant data.

We base our formalism and experiments on the schedule tree representation [16]. Schedule trees can be flattened into a union of relations form, with each relation mapping the iteration domain of individual statements to a unified logical execution time space. A schedule tree typically comprises a domain node describing the overall extent of the statement instances, sequence/set nodes expressing ordered/unordered branches, filter nodes selecting a subset of the statement instances as the children of a sequence/set node, and band nodes defining a partial schedule as well as permutability and/or parallelism properties on a group of statements. Band nodes are derived from tilable bands in the Pluto framework [6]. A schedule tree has the same expressiveness as any affine schedule representation, but it facilitates local schedule manipulations and offers a systematic way to associate non-polyhedral semantical extensions. We will leverage this extensibility to represent non-affine loop bounds.

Since dynamic counted loops cannot be appropriately represented in the iteration domain, a state-of-the-art polyhedral compiler like PPCG may only model the outer loop, abstracting away the j-loop and k-loop, as in the schedule tree of Figure 3. Following Benabderrahmane's work [5], we can derive two static upper bounds, u1 and u2, that are greater than or equal to m and n. The domain and access relations of statement S2 can be over-approximated accordingly, and represented parametrically in u1 and u2. This representation can be used to compute a conservative approximation of the dependence relation for the whole schedule tree.

    domain:   S0(i) → (i); S1(i) → (i); S2(i) → (i)
    sequence: S0(i); S1(i); S2(i)

Figure 3. Original schedule tree of the example

Based on this dependence information, one may derive a correct schedule using the Pluto algorithm or one of its variants [6, 26], to optimize locality and extract parallelism. The resulting schedule tree may indeed be seen as a one-dimensional external domain and schedule enclosing a two-dimensional inner domain and schedule controlled by two additional parameters, u1 and u2, as will be seen in Figure 5.

The final step is to generate code from the schedule tree to a high level program. The generation of the abstract syntax tree (AST) follows the approach implemented in isl [16], traversing the schedule tree and specializing the code generation algorithm to integrate target-specific constraints, e.g., nested data parallelism and constant bounds. Before encountering a filter node associated with a dynamic counted loop, the exit predicate and its controlled loop body are seen as a single black-box statement by the AST generation algorithm. When passing the filter node constraining the dynamic upper bound, it is necessary to complement the standard code generation procedure with dedicated "dynamic counted loop control flow". This involves either (on GPU targets) the reconstruction of the exit predicate and the introduction of an early exit (goto) instruction guarded by the predicate, or (on CPU targets) the replacement of the over-approximated static upper bound with the dynamic condition and the removal of the introduced control flow. Our algorithm generates code in one single traversal of the schedule tree.¹

¹Another difference with [5], where multiple traversals were needed.
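To make the over-approximation concrete, the following sketch (our illustration, not a figure from the paper) shows the nest of Figure 1 with the dynamic bounds replaced by the static parameters u1 and u2, and the dynamic conditions re-introduced as exit predicates; on GPU targets the break statements would become goto-based early exits.

    for (i = 0; i < 100; i++) {
      m = f(i);
      n = g(i);
      for (j = 0; j < u1; j++) {     /* u1 >= m for all i */
        if (j >= m) break;           /* exit predicate of the j-loop */
        for (k = 0; k < u2; k++) {   /* u2 >= n for all i */
          if (k >= n) break;         /* exit predicate of the k-loop */
          /* body of S2 */
        }
      }
    }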


3 Program Analysis

Dynamic counted loops arise frequently in irregular applications, but they may not be written in a form that can be handled with our technique. We need a preprocessing step to make them amenable to our approach.

3.1 Preparation

A dynamic counted loop with a dynamic upper bound and a static lower bound is referred to as being in the normalized format of dynamic counted loops; the example of Figure 1 is in such a normalized format.

Sparse matrix computations represent an important class of dynamic counted loops. They are a class of computations using a compressed data layout that stores nonzero elements only. Loops iterating on the compressed layout may have dynamic lower and upper bounds. However, these loops can easily be normalized by subtracting the lower bound from the upper bound. This transformation may introduce non-affine array subscripts since the lower bound may not be affine; we assume the dependence analysis will conservatively handle such subscripts, leveraging Pencil annotations to refine its precision [1, 10]; we may also symbolically eliminate identical non-affine subexpressions.

3.2 Deriving the Static Upper Bound

The static upper bound u may be derived through additional static analysis or dynamic inspection; for example, in sparse matrix computations, u may be computed by inspecting the maximum number of non-zero entries in a row in Compressed Sparse Row (CSR) format. All in all, affine bounds on the u parameter can generally be derived automatically, at compilation or run time, and the tightness of the approximation does not have an immediate impact on performance.

3.3 Modeling Control Dependences

To model control dependences on dynamic conditions, we introduce additional data dependences associated with exit predicates and their definition statements.

An exit predicate definition and check is inserted at the beginning of each iteration of a dynamic counted loop. At code generation time, all statements in the body of the counted loop will have to be dominated by an early exit instruction conditioned by this predicate. This follows Benabderrahmane's method for while loops [5], but without the inductive computation and loop-carried dependence on the exit predicate. Of course, we delay the introduction of goto instructions (or the change back to the dynamic conditions) until code generation, to keep the control flow in a statically manageable form for a polyhedral compiler. For example, the code in Figure 4(a) is preprocessed into the version of Figure 4(b) before constructing the affine representation.

Figure 4. Preprocessing of a dynamic counted loop: (a) original code; (b) the same loop with its exit predicate made explicit
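A sketch of this preprocessing on a CSR row loop (our illustration, with assumed array names row_ptr, col and val): the dynamic lower bound is first subtracted (Section 3.1), then the dynamic condition becomes an explicit exit predicate under the static bound u (Section 3.3).

    /* Original: dynamic lower and upper bounds from the CSR row pointers. */
    for (t = row_ptr[i]; t < row_ptr[i+1]; t++)
      y[i] += val[t] * x[col[t]];

    /* After normalization and exit-predicate insertion: the loop iterates
       from 0 to the static bound u, and the non-affine term row_ptr[i]
       moves into the array subscripts. */
    for (t = 0; t < u; t++) {
      if (t >= row_ptr[i+1] - row_ptr[i]) break;  /* exit predicate */
      y[i] += val[row_ptr[i] + t] * x[col[row_ptr[i] + t]];
    }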


    domain:   S0(i) → (i); S1(i) → (i); S2(i, j, k) → (i)
    sequence: S0(i); S1(i); S2(i, j, k)
      band:   S2(i, j, k) → (j); S2(i, j, k) → (k)

Figure 5. New schedule tree of the example

4 Generation of Imperative Code

Once a new schedule is produced, additional transformations can be applied on band nodes, to implement loop tiling or additional permutations, strip-mining for vectorization, etc. Eventually, one needs to return to imperative code through a so-called code or AST generation algorithm. AST generation is a performance-critical step in any polyhedral framework. We extend the code generation scheme of Grosser et al. [16], itself derived from the algorithm by Quilleré et al. [20] and its CLooG enhancements and implementation [3].

When the Grosser et al. algorithm traverses the band nodes in a schedule, it projects out the local schedule constraints from the domain node. As the dynamic upper bounds are not modeled in the iteration domain (the domain node in the schedule tree and subsequent filter nodes), the generated loops will iterate from 0 to u. It is thus necessary to emit an early exit statement (for GPU architectures) or to change the over-approximated static upper bound back to the original dynamic condition (for CPU architectures). Besides, the introduced control flow can also be removed when generating code for CPU targets, reducing the control overhead.

4.1 Extending the Schedule Tree

The Grosser et al. algorithm is not able in its original form to generate semantically correct code for our extended schedule tree. However, it can easily be modified to handle the special case of exit predicates that are homogeneous over all statements in a sequence or set node of the schedule tree (e.g., all statements in a band of permutable loops). This is facilitated through the syntactic annotation of dynamic counted loops using so-called mark nodes in the schedule tree. A mark node may attach any kind of information to a subtree; we use it here to specify which band nodes and which dimensions in those bands involve dynamic counted loops. To account for affine transformations combining static and dynamic counted loops (e.g., loop skewing), mark nodes are inserted at every dimension.

One may insert an extension node in a schedule tree to extend its iteration domain, e.g., to insert a new statement with a specific iteration domain. In our case, we replace each mark node with an extension node, inserting a guard statement with the proper exit predicate. In a first pass, all exit predicates are attached to the band node; a follow-up traversal through the predicate list lets the AST generator detect whether a dimension of the band node is a dynamic counted loop, and position early exits at the right level.

4.2 Generating Early Exits

When scanning the schedule tree to generate early exits for GPU targets, the AST generator creates a goto AST node for each of the above-mentioned extension nodes. All kinds of information about the early exit statement can be attached to this goto AST node, including (1) the iterator of the loop where the goto AST node is retained, (2) the depth of this loop in the nest, (3) the associated predicate list, (4) whether the loop is a dynamic counted loop, and (5) a label counter. As the AST is generated in a top-down manner, it is possible to map each goto AST node to the loop it exits from. The predicate list is also attached to the node: one may determine whether a loop is dynamically counted by looking up its iterator in each predicate. Finally, the label counter is incremented each time a dynamic counted loop is encountered, enforcing uniqueness.

4.3 Changing Back to Dynamic Conditions

When targeting CPU architectures, it may not be allowed to jump in or out of a parallel region using an early exit statement like goto, but one may change the over-approximated static upper bound u back to the original dynamic condition. The information facilitating such a replacement can be attached to an AST annotation node; it is the same as that of the goto AST node in the GPU case, except for the label counter.

The upper bound of a loop can be replaced by a dynamic condition extracted from the predicate list once the loop is identified as being dynamically counted, followed by the removal of each occurrence of this dynamic condition, eliminating the introduced control overhead.

4.4 Code Generation for a Single Loop

The final step is converting the AST to a high level program. When a goto AST node of a dynamic counted loop is captured, a goto statement conditioned by its predicates is enforced after the loop body, as well as a label destination after the loop itself. The associated predicates are gathered in a conjunction and wrapped as one conditional, with loop iterators instantiated according to the loop level. A label is inserted after each dynamic loop as a target for a goto statement.

Changing back to the dynamic condition for a dynamic counted loop is straightforward, but special care has to be taken to handle cases with multiple associated predicates. One may construct a max operation comprising all the associated predicates as the upper bound of a dynamic counted loop, without removing the introduced control flow, since it has to remain to preserve the semantics of the code.

This schedule-tree-based code generation algorithm enables all kinds of loop transformations, the most challenging one being loop fusion. When fusing two dynamic counted loops, the two sets of predicates are considered, and the early exit statements (or the max-operation-based dynamic upper bounds) are guarded by (or composed of) their statementwise conjunction. A normal loop can be treated as a specific case of a dynamic counted loop by reasoning on its static upper bound.
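A minimal sketch of the fused CPU case (our illustration, with hypothetical statements S and T and dynamic bounds m1 and m2): the generated upper bound is the max over the associated predicates, and each statement keeps its own guard, which must not be removed.

    int ub = (m1 > m2) ? m1 : m2;   /* max over the associated predicates */
    for (j = 0; j < ub; j++) {
      if (j < m1) S(j);             /* per-statement guards preserve semantics */
      if (j < m2) T(j);
    }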

4.5 Code Generation for Nested Loops

As shown in Figure 5, the canonically constructed schedule tree isolates two nested band nodes to represent different levels of the loop nest. This works fine when the target architecture is a shared memory multiprocessor. As an illustrative example, Figure 6 shows the generated code for a shared memory multiprocessor after the application of loop tiling on the code in Figure 1, with the outermost i-loop being parallelized. However, when targeting GPU accelerators or producing fixed-length vector code, we usually expect to combine nested bands to express parallelism at multiple levels, and a constant iteration count may also be required for data-parallel dimensions. We therefore consider two cases, depending on the need to extract parallelism across more than one band.

    #pragma omp parallel for
    for (i = 0; i < 100; i++) {
      m = f(i);
      n = g(i);
      for (jj = 0; jj < m; jj += BB)
        ...   /* tiled j- and k-loops bounded by the dynamic conditions */
    }

Figure 6. Code generation with loop tiling for CPUs (reconstructed sketch beyond its first lines; the dynamic conditions m and n replace the static bounds, following Section 4.3)

Loop tiling is a special case that should be taken into account. Loop tiling involves the insertion of one or more additional schedule dimensions through strip-mining. When strip-mining a dynamic counted loop, there should be an exit statement at both levels. For the point loop—iterating within a tile—the common case above applies. For the tile loop—iterating among tiles—we align its bounds and strides with those of the over-approximated static loop, and exit as soon as a tile falls entirely beyond the dynamic bound, as in Figure 7.

    for (ii = 0; ii < 100/AA+1; ii++) {
      ...  /* point loop on i, defining m and n */
      for (jj = 0; jj < u1/BB+1; jj++) {
        for (kk = 0; kk < u2/CC+1; kk++) {
          for (j = jj*BB; j < (jj+1)*BB; j++) {
            if (j >= m) goto label1;
            for (k = kk*CC; k < (kk+1)*CC; k++) {
              if (k >= n) goto label0;
              ...  /* body of S2 */
            }
            label0:;
          }
          label1:;
          if (kk*CC >= n) goto label2;
        }
        label2:;
        if (jj*BB >= m) goto label3;
      }
      label3:;
    }

Figure 7. Code generation with loop tiling for GPUs (reconstructed sketch: point loops exit early through their predicates, and tile loops exit once a tile starts beyond the dynamic bound)

Note that our technique does not need speculation on the number of iterations, since we do not deal with general while loops; it always executes the same number of iterations as the original program.


When combining nested bands, the definition statements of the dynamic parameters are sunk inside the dynamic counted loops they control. The inward movement of these definition statements is safe with the introduction of the upper bound u-parameter. Yet as a side-effect of this movement, each definition will be redundantly evaluated as many times as the number of iterations of the dynamic counted loop itself. This is the price to pay for a fixed upper bound on the iterations. Once again, this overhead may be mitigated with additional strip-mining of the outer loops, to better control the value of u, effectively partitioning the loop nest into coarse-grain sub-computations amenable to execution on a heterogeneous target.

5 Experimental Evaluation

Our framework takes a C program as input, and resorts to Pencil [1] extensions only when dealing with indirect accesses (subscripts of subscripts). All arrays are declared through the C99 variable-length array syntax with the static const restrict qualifiers, allowing PPCG to derive the size of the arrays offloaded on the accelerator despite the presence of indirect accesses, and telling it that these arrays do not alias.

We use PPCG [26] to generate target code; it is a polyhedral compiler that performs loop nest transformations, parallelization and data locality optimization, and generates OpenCL or CUDA code. The version ppcg-0.05-197-ge774645-pencilcc is used in our work. In a follow-up auto-tuning step, we look for optimal parameter values for tile sizes, block sizes, grid sizes, etc. for a given application and target architecture.

The experiments are conducted on a 12-core, two-socket workstation with an NVIDIA Quadro K4000 GPU. Each CPU is a 6-core Intel Xeon E5-2630 (Ivy Bridge). Sequential and OpenMP code are compiled with the icc compiler from Intel Parallel Studio XE 2017, with the flags -Ofast -fstrict-aliasing (-qopenmp). CUDA code is compiled with the NVIDIA CUDA 7.5 toolkit with the -O3 optimization flag. We run each benchmark 9 times and retain the median value.

5.1 Dynamic Programming

Dynamic programming is an alternative to greedy algorithms that guarantees an optimal solution. In computer science, dynamic programming implies that the optimal solution of a given optimization problem can be obtained by combining optimal solutions of its sub-problems, solving the same sub-problems recursively rather than generating new ones. Dynamic counted loops are usually involved in these problems. We investigate two representative dynamic programming problems: change-making and bucket sort.

Typically, the change-making problem is used to find the minimum number of coins that add up to a certain amount W, but it has a much wider application than just currency. The algorithm is also used to count how often a certain denomination is used. Suppose N denominations are provided, each of which is d_i (0 ≤ i < N). As long as the given amount W > d_i, the frequency of the i-th denomination is incremented by 1. As a result, d_i appears as a bound of the inner dynamic counted loop, enclosed by an outer loop iterating over the total number of denominations. Our technique successfully parallelizes the inner dynamic counted loop and generates CUDA code in conjunction with a loop interchange optimization. We show the performance for different numbers of denominations N under different amount constraints W in Figure 8. It can be concluded from the figure that the performance improvement grows with the number of denominations.

Figure 8. Performance of change-making on GPU (speedup as a function of N, the number of denominations, from 100 to 1000, for W = 128, 256, 512 and 1024)
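A sketch of the kernel shape described above (our illustration, with a hypothetical frequency array freq): the inner loop is dynamically counted, with a trip count depending on the denomination d[i].

    for (i = 0; i < N; i++)          /* outer loop over denominations */
      for (w = d[i]; w <= W; w++) {  /* dynamic counted loop bounded by d[i] */
        /* update of the dynamic programming table over amounts;
           the use of the i-th denomination is counted here */
        freq[i]++;
      }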

Bucket sort is a generalization of counting sort, sorting by first scattering the N elements of a given array into a set of M buckets, sorting each bucket individually, and finally gathering the sorted elements of each bucket in order. Due to its comparison operations, a sorting algorithm is inherently not a candidate for parallelization. However, it is possible to parallelize and optimize the gathering step of bucket sort. We consider a uniform random distribution of elements of the input array. The algorithm has to gather size[i] elements in the i-th bucket, whose static upper bound can be set to N. The dynamic counted loop controlled by the bucket size is captured by our method and parallelized in the form of CUDA code on GPUs. The performance for different array sizes N and different bucket numbers M is shown in Figure 9, indicating that the speedup rises along with the number of buckets involved.

Figure 9. Performance of bucket sort on GPU (speedup as a function of N, from 2 to 1024, for M = 16K, 32K, 64K and 128K buckets)

5.2 HOG Benchmark

The HOG benchmark is extracted from the Pencil benchmark suite³, a collection of applications and kernels for evaluating Pencil compilers.
³ https://github.com/pencil-language/pencil-benchmark

When processing an image, the HOG descriptor divides it into small connected regions called cells. A histogram of gradient directions is then compiled for the pixels within each cell. The descriptor finally concatenates these histograms together. The descriptor also contrast-normalizes local histograms by calculating an intensity measure across a block, a larger region of the image, and then using this value to normalize all cells within the block to improve accuracy, resulting in better invariance to changes in illumination and shadowing.

The kernel of the HOG descriptor contains two nested, dynamic counted loops. The upper bounds of these inner loops are defined and vary as the outermost loop iterates. The dynamic parameter is an expression of max and min functions of the outer loop iterator and an array of constants. We derive the static upper bound parameter u from the BLOCK_SIZE constant, a global parameter of the program declaring the size of an image block.

Since we target a GPU architecture, we ought to extract large degrees of parallelism from multiple nested loops. As explained in subsection 4.5, we sink the definition statements of dynamic parameters within the inner dynamic counted loops and apply our AST generation scheme for a combined band on the GPU architecture. We may then generate the CUDA code with parameter values for tile sizes, block sizes, grid sizes, etc. We show performance results with and without host-device data transfer time in Figure 10, considering multiple block sizes. The detection accuracy improves with the increase of the block size. Our algorithm achieves a promising performance improvement for each block size: our technique obtains a speedup ranging from 4.4× to 23.3×, while the Pencil code suffers from a degradation of about 75%.

Figure 10. Performance of the HOG descriptor on GPU (speedup with and without data transfer, and of the Pencil code, for BLOCK_SIZE from 16 to 1024)

5.3 Finite Element Method

equake is one of the SPEC CPU2000 benchmarks. It follows a finite element method, operating on an unstructured mesh that locally resolves wavelengths. The kernel invokes a 3-dimensional sparse matrix computation, followed by a series of perfectly nested loops. We inline the follow-up perfectly nested loops into this sparse matrix computation kernel to expose opportunities for different combinations of loop transformations.

In the 3-dimensional sparse matrix computation, a reduction array is first defined in the outer i-loop, and every element is repeatedly written by a j-loop that is enclosed by a while loop iterating over the sparse matrix. Finally, these reduction variables are gathered to update the global mesh. The while loop can be converted to a dynamic counted loop via preprocessing.

One may distribute the three components of the sparse matrix computation kernel, generating a 2-dimensional permutable band on the dynamic counted loop in conjunction with unrolling the j-loop, and fusing the gathering component with its follow-up perfectly nested loops. This case is called "2D band" in Figure 11.

One may also interchange the dynamic counted loop with its inner j-loop. As a result, all three components of the sparse matrix computation are fused. The loop nest is separated into two band nodes: the outer is a 2-dimensional permutable band and the inner is the dynamic counted loop. This is called "(2+1)D band" in the figure.

Alternatively, the three components can be distributed instead of being fused. This makes a 3-dimensional permutable band involving the dynamic counted loop, and results in the fusion of the gathering component with the follow-up perfectly nested loops. This case is called "3D band" in the figure.

We generate CUDA code for these different combinations and show the results in Figure 11, considering different input sizes. The u parameter is set to the maximum number of non-zero entries in a row of the sparse matrix. The baseline parallelizes the outer i-loop only, which is what PPCG does on this loop nest; we reach a speedup of 2.7× above this baseline.

Figure 11. Performance of equake on GPU (speedup of the 2D band, (2+1)D band and 3D band versions over the baseline, for the test, train and ref problem sizes)

5.4 SpMV

Sparse matrix operations are an important class of algorithms occurring frequently in applications ranging from graph processing and physical simulation to data analytics. They have attracted a lot of parallelization and optimization efforts. Programmers may use different formats to store a sparse matrix, among which we consider four representations: CSR, Block CSR (BCSR), Diagonal (DIA) and ELLPACK (ELL) [27]. Our experiments in this subsection target the benchmarks used in [24], with our own modifications to suit the syntactic constraints of our framework.

We first consider the CSR representation. The other three representations can be modeled with a make-dense transformation, as proposed by [24], followed by a series of loop and data transformations. BCSR is the blocked version of CSR; its parallel version is the same as that of CSR after tiling with PPCG, so we will not show its performance.
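For CSR, the static bound u can be derived by a lightweight inspection, as mentioned in Section 3; a sketch (ours, assuming the usual row_ptr array):

    int u = 0;  /* maximum number of non-zero entries in a row */
    for (int i = 0; i < n_rows; i++) {
      int len = row_ptr[i+1] - row_ptr[i];
      if (len > u) u = len;
    }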


Note that Venkat et al. [24] assume block sizes evenly divide the loop trip counts, but our work has no such limitation. The inspector is used to analyze memory reference patterns and to generate communication schedules, so we mainly focus on comparing our technique to the executor. The executor of the DIA format is not a dynamic counted loop and will not be studied.

In its original form, the CSR format has loop bounds that do not match our canonical structure: we apply a non-affine shift by the dynamic lower bound, as discussed earlier. The maximum number of non-zero entries in a row is the static upper bound and may be set as the u parameter; it can be derived through an inspection. As a result, the references with indirect array subscripts can be sunk under the inner dynamic counted loop, exposing a combined band in the schedule tree.

Venkat et al. [24] optimize the data layout of the sparse matrix via a series of transformations including make-dense, compact and compact-and-pad, but can only parallelize the outer loop. Our technique can identify the inner dynamic counted loop and parallelize both loops, exposing a higher degree of parallelism. We show the performance in Figure 12, using matrices obtained from the University of Florida sparse matrix collection [11] as input. We also show the performance of a manually-tuned library, CUSP [4], in the figure. Our method beats the state-of-the-art automatic technique and the manually-tuned library in most cases.

Figure 12. Performance of the CSR SpMV on GPU (Gflops of Venkat et al., CUSP, our work, and our work applied to the executor, on cant, consph, cop20_A, mac_econ_fwd500, mc2depi, pdb1HYS, Press_Poisson, pwtk, rma10 and tomographic1)

In [24], the ELL format is derived from CSR by tiling the dynamic counted loop with the maximum number of non-zero entries in a row. Rows with fewer non-zeros are padded with zero values, implying there will be no early exit statements when parallelizing both loops. This makes their approach effective when most rows have a similar number of non-zeros. Our technique implements a similar idea without the data transformation, by extending the upper bound of the inner dynamic counted loop to the maximum number of non-zeros and automatically emitting early exit statements when there are fewer non-zeros in a row, minimizing the number of iterations of the dynamic counted loop. The performance is shown in Figure 13, together with that of the CUSP library. A format_conversion exception is captured when experimenting with the CUSP library on mac_econ_fwd500, mc2depi, pwtk and tomographic1, while our technique remains applicable to all formats. Although the manually-tuned library outperforms the proposed technique on three inputs, our method performs better in general. In addition, our technique provides comparable or higher performance than the inspector/executor scheme, without the associated overhead.

Figure 13. Performance of the ELL SpMV on GPU (Gflops of Venkat et al., CUSP, our work, and our work applied to the executor, on the same matrix set as Figure 12)

5.5 Inspector/Executor

The inspector/executor strategy used in [24] obtains performance gains by optimizing the data layout. Our technique can also be applied to the executor of this strategy as a complementary optimization, further improving its performance. The inspector/executor strategy, however, is not as effective as expected for CSR, since the CSR executor is roughly the same as the original code. As a result, the performance of our generated code when applying our technique to the CSR executor is also roughly the same as when applying it to the original code, as shown in Figure 12. As a complementary optimization, our technique can speed up the CSR executor by up to 4.2× (from 1.05 Gflops to 4.41 Gflops under the cant input).

The ELL executor uses a transposed matrix to achieve global memory coalescing, whose efficiency depends heavily on the number of rows that have a similar number of non-zero entries. To get rid of this limitation, our technique may be applied to eliminate the wasted iterations by emitting early exit statements. Experimental results for the ELL executor are shown in Figure 13, for which our technique improves the performance by up to 19.7% (from 2.11 Gflops to 2.53 Gflops under the cop20_A input).
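A sketch of this early-exit scheme for ELL (our illustration, with hypothetical arrays ell_val and ell_col stored transposed for coalescing, and nnz[i] the number of non-zeros of row i): rows shorter than the maximum width u exit early instead of iterating over padding.

    for (i = 0; i < n_rows; i++)
      for (t = 0; t < u; t++) {
        if (t >= nnz[i]) break;   /* early exit: skip the zero padding */
        y[i] += ell_val[t * n_rows + i] * x[ell_col[t * n_rows + i]];
      }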

5.6 Performance on CPU Architectures

We also evaluate our technique on CPU architectures. Unlike in CUDA code generation, the original dynamic condition can be restored when generating OpenMP code on CPU architectures, avoiding the combination of nested bands and the refactoring of the control flow.

The performance results are shown in Figures 14–17. We do not show the performance of the dynamic programming examples on CPU architectures, since our code generation scheme generates OpenMP code identical to the hand-written one. For the remaining benchmarks, our technique enables aggressive loop transformations including tiling, interchange, etc., leading to better performance when these optimizations are turned on. As the CUSP library is designed for GPU architectures, we only compare the performance of the SpMV code with Venkat et al.'s work [24].

Figure 14. Performance of the HOG descriptor on CPU (speedup of our work and of the Pencil code for BLOCK_SIZE from 16 to 1024)

Figure 15. Performance of equake on CPU (speedup of the 2D band, (2+1)D band and 3D band versions over the baseline, for the test, train and ref problem sizes)

Figure 16. Performance of the CSR SpMV on CPU (Gflops of Venkat et al., our work, and our work applied to the executor, on the matrix set of Figure 12)

Figure 17. Performance of the ELL SpMV on CPU (Gflops of Venkat et al., our work, and our work applied to the executor, on the matrix set of Figure 12)

6 Related Work

The polyhedral framework is a powerful compilation technique to parallelize and optimize loops. It has become one of the main approaches for the construction of modern parallelizing compilers. Its application domain used to be constrained to static control, regular loop nests, but the extension of the polyhedral framework to handle irregular applications is increasingly important given the growing adoption of the technique. The polyhedral community has invested significant efforts to make progress in this direction.

A representative application of irregular polyhedral techniques is the parallelization of while loops, where the polyhedral model is expected to handle loop structures with arbitrary bounds. Collard [8, 9] proposed a speculative approach based on the polyhedral model that extends the iteration domain of the original program and performs speculative execution on the new iteration domain. Parallelism is exposed at the expense of an invalid space-time mapping that needs to be corrected at run time. Beyond polyhedral techniques, Rauchwerger and Padua [21] proposed a speculative code transformation and hybrid static-dynamic parallelization method for while loops. An alternative, conservative technique consists in enumerating a super-set of the target execution space [12–15], and then eliminating invalid iterations by performing termination detection on the fly; the authors present solutions for both distributed and shared memory architectures. Benabderrahmane et al. [5] introduce a general framework to parallelize and optimize arbitrary while loops by modeling control-flow predicates; they transform a while loop into a for loop iterating from 0 to +∞. Compared to these approaches to parallelizing while loops in the polyhedral model, our technique relies on systems of affine inequalities only, as implemented in state-of-the-art polyhedral libraries. It does not need to resort to first-order logic such as non-interpreted functions/predicates, it does not involve speculative execution features, and it makes dynamic counted loops amenable to a wider set of transformations than general while loops.

A significant body of work addressed the transformation and optimization of sparse matrix computations. The implementation of manually tuned libraries [2, 4, 7, 18, 19, 27] is the common approach to achieving high performance, but it is difficult to port to each new representation and to different architectures. Sparse matrix compilers based on polyhedral techniques have been proposed [24], abstracting the indirect array subscripts and complex loop bounds in a domain-specific fashion, and leveraging conventional Pluto-based optimizers on an abstracted form of the sparse matrix computation kernel. We extend the applicability of polyhedral techniques one step further, considering general Pencil code as input, and leveraging the semantical annotations expressible in Pencil to improve the generated code's efficiency and to abstract non-affine expressions.

7 Conclusion

In this paper, we studied the parallelizing compilation and optimization of an important class of loop nests where counted loops have a dynamically computed, data-dependent upper bound. Such loops are amenable to a wider set of transformations than general while loops. To achieve this, we introduce a static upper bound and model control dependences on data-dependent predicates by revisiting a state-of-the-art framework to parallelize arbitrary while loops. We specialize this framework to facilitate its integration in schedule-tree-based affine scheduling and code generation algorithms, covering all scenarios from a single dynamic counted loop to nested parallelism across bands mapped to GPUs with fixed-size data-parallel grids.

Our method relies on systems of affine inequalities, as implemented in state-of-the-art polyhedral libraries. It takes a C program with Pencil functions as input, covering a wide range of non-static control applications encompassing the well-studied class of sparse matrix computations. The experimental evaluation using the PPCG source-to-source compiler on representative irregular computations, from dynamic programming, computer vision and finite element methods to sparse matrix linear algebra, validated the general applicability of the method and its benefits over black-box approximations of the control flow.

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 61702546, the European Commission and French Ministry of Industry through the ECSEL project COPCAMS id. 332913, and the French ANR through the European CHIST-ERA project DIVIDEND.

References

[1] Riyadh Baghdadi, Ulysse Beaugnon, Albert Cohen, Tobias Grosser, Michael Kruse, Chandan Reddy, Sven Verdoolaege, Adam Betts, Alastair F. Donaldson, Jeroen Ketema, et al. 2015. PENCIL: A platform-neutral compute intermediate language for accelerator programming. In Proceedings of the 2015 International Conference on Parallel Architecture and Compilation. IEEE Computer Society, 138–149.
[2] S. Balay, S. Abhyankar, M. Adams, J. Brown, P. Brune, K. Buschelman, V. Eijkhout, W. Gropp, D. Kaushik, M. Knepley, et al. 2014. PETSc users manual revision 3.5. Argonne National Laboratory (2014).
[3] Cedric Bastoul. 2004. Code generation in the polyhedral model is easier than you think. In Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques. IEEE Computer Society, 7–16.
[4] Nathan Bell and Michael Garland. 2009. Implementing sparse matrix-vector multiplication on throughput-oriented processors. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis. ACM, No. 18.
[5] Mohamed-Walid Benabderrahmane, Louis-Noël Pouchet, Albert Cohen, and Cédric Bastoul. 2010. The polyhedral model is more widely applicable than you think. In Proceedings of the 19th International Conference on Compiler Construction. Springer, 283–303.
[6] Uday Bondhugula, Albert Hartono, J. Ramanujam, and P. Sadayappan. 2008. A practical automatic polyhedral parallelizer and locality optimizer. In Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation. ACM, 101–113.
[7] Aydın Buluç and John R. Gilbert. 2011. The Combinatorial BLAS: Design, implementation, and applications. International Journal of High Performance Computing Applications (2011), 496–509.
[8] Jean-François Collard. 1994. Space-time transformation of while-loops using speculative execution. In Proceedings of the 1994 Scalable High-Performance Computing Conference. IEEE Computer Society, 429–436.
[9] Jean-François Collard. 1995. Automatic parallelization of while-loops using speculative execution. International Journal of Parallel Programming 23, 2 (1995), 191–219.
[10] J.-F. Collard, D. Barthou, and P. Feautrier. 1995. Fuzzy array dataflow analysis. In ACM Symposium on Principles and Practice of Parallel Programming. Santa Barbara, CA, 92–102.
[11] Timothy A. Davis and Yifan Hu. 2011. The University of Florida sparse matrix collection. ACM Trans. Math. Software 38, 1 (2011), 1:1–1:25.
[12] Max Geigl, Martin Griebl, and Christian Lengauer. 1998. A scheme for detecting the termination of a parallel loop nest. Proc. GI/ITG FG PARS 98 (1998).
[13] Max Geigl, Martin Griebl, and Christian Lengauer. 1999. Termination detection in parallel loop nests with while loops. Parallel Comput. 25, 12 (1999), 1489–1510.
[14] Martin Griebl and Jean-François Collard. 1995. Generation of synchronous code for automatic parallelization of while loops. In Proceedings of the 1st International Euro-Par Conference on Parallel Processing. Springer, 313–326.
[15] Martin Griebl and Christian Lengauer. 1994. On scanning space-time mapped while loops. In Proceedings of the 3rd Joint International Conference on Vector and Parallel Processing. Springer, 677–688.
[16] Tobias Grosser, Sven Verdoolaege, and Albert Cohen. 2015. Polyhedral AST generation is more than scanning polyhedra. ACM Transactions on Programming Languages and Systems 37, 4 (2015), 12:1–12:50.
[17] Alexandra Jimborean, Philippe Clauss, Jean-François Dollinger, Vincent Loechner, and Juan Manuel Martinez Caamaño. 2014. Dynamic and speculative polyhedral parallelization using compiler-generated skeletons. International Journal of Parallel Programming 42, 4 (2014), 529–545.
[18] Yucheng Low, Danny Bickson, Joseph Gonzalez, Carlos Guestrin, Aapo Kyrola, and Joseph M. Hellerstein. 2012. Distributed GraphLab: A framework for machine learning and data mining in the cloud. Proceedings of the VLDB Endowment 5, 8 (2012), 716–727.
[19] John Mellor-Crummey and John Garvin. 2004. Optimizing sparse matrix-vector product computations using unroll and jam. International Journal of High Performance Computing Applications 18, 2 (2004), 225–236.
[20] Fabien Quilleré, Sanjay Rajopadhye, and Doran Wilde. 2000. Generation of efficient nested loops from polyhedra. International Journal of Parallel Programming 28, 5 (2000), 469–498.
[21] L. Rauchwerger and D. Padua. 1995. Parallelizing while loops for multiprocessor systems. In Proceedings of the 9th International Parallel Processing Symposium. 347–356.
[22] Michelle Mills Strout, Larry Carter, and Jeanne Ferrante. 2003. Compile-time composition of run-time data and iteration reorderings. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation. ACM, 91–102.
[23] Michelle Mills Strout, Alan LaMielle, Larry Carter, Jeanne Ferrante, Barbara Kreaseck, and Catherine Olschanowsky. 2016. An approach for code generation in the sparse polyhedral framework. Parallel Comput. 53 (2016), 32–57.
[24] Anand Venkat, Mary Hall, and Michelle Strout. 2015. Loop and data transformations for sparse matrix code. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. 521–532.
[25] Sven Verdoolaege. 2010. isl: An integer set library for the polyhedral model. In Mathematical Software – ICMS 2010. Springer Berlin Heidelberg, 299–302.
[26] Sven Verdoolaege, Juan Carlos Juega, Albert Cohen, José Ignacio Gómez, Christian Tenllado, and Francky Catthoor. 2013. Polyhedral parallel code generation for CUDA. ACM Transactions on Architecture and Code Optimization 9, 4 (2013), 54:1–54:23.
[27] Richard Vuduc, James W. Demmel, and Katherine A. Yelick. 2005. OSKI: A library of automatically tuned sparse matrix kernels. Journal of Physics: Conference Series 16, 1 (2005), 521.
[28] David Wonnacott and William Pugh. 1995. Nonlinear array dependence analysis. In Proceedings of the Third Workshop on Languages, Compilers and Run-Time Systems for Scalable Computers. Troy, New York, USA.
