Loop Transformations Using Clang's Abstract Syntax Tree

Michael Kruse
Argonne National Laboratory
Lemont, Illinois, USA

ABSTRACT

OpenMP 5.1 introduced the first loop nest transformation directives, unroll and tile, and more are expected to be included in OpenMP 6.0. We discuss the two Abstract Syntax Tree (AST) representations used by Clang's implementation, which is currently under development. The first representation is designed for compatibility with the existing implementation and stores the transformed loop nest in a shadow AST next to the syntactical AST. The second representation introduces a new meta AST node, OMPCanonicalLoop, that guarantees that the semantic requirements of an OpenMP loop are met, and a CanonicalLoopInfo type that the OpenMPIRBuilder uses to represent literal and transformed loops. This second approach provides a better abstraction of loop semantics, removes the need for shadow AST nodes that are only relevant for code generation, and allows sharing the implementation with other front-ends such as flang, but it depends on the OpenMPIRBuilder, which is currently under development.

CCS CONCEPTS

• Software and its engineering → Compilers; Parsers; Parallel programming languages; Software performance.

KEYWORDS

OpenMP, Clang, abstract syntax tree, semantic analysis, code generation

ACM Reference Format:
Michael Kruse. 2021. Loop Transformations using Clang's Abstract Syntax Tree. In ICPP '21: 50th International Conference on Parallel Processing, August 09–12, 2021, Chicago, IL. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

arXiv:2107.08132v1 [cs.PL] 16 Jul 2021
© 2021 Association for Computing Machinery.

[Figure 1: Clang's internal component layers. Components shown, from input to output: source.c / source.h, FileManager, MemoryBuffer, SourceManager, Lexer, Preprocessor, Parser, TreeTransform, Sema, CodeGen, OpenMPIRBuilder, IRBuilder, source.ll. Data labels between layers: Characters, SourceLocation, Tokens, AST.]

1 INTRODUCTION

A compiler front-end is responsible for parsing source code, determining its meaning (semantics), and translating it into an intermediate representation (IR) that is designed to ease analysis and transformation and that is (mostly) unspecific with regard to the input programming language and the target instruction set architecture.

Within the LLVM compiler infrastructure project [15], the front-end for C, C++ and Objective-C is Clang [1]. Clang 3.8 also added an implementation of OpenMP [3] using an "early outlining" approach [4]. That is, all OpenMP semantics are lowered in the front-end, and the generated IR does not contain OpenMP-specific constructs, only calls to an OpenMP runtime.

1.1 OpenMP Loop Transformation Directives

OpenMP 5.1 [16] added loop nest transformations to the OpenMP language. Before this change, OpenMP directives could only apply to statements that a programmer had written explicitly in the source code. In the new OpenMP version, a loop transformation directive applied to a loop stands in for another loop as determined by the directive's definition.

In the example below, we first apply loop unrolling to the literal for-loop. This results in another, unrolled, loop to which another directive can be applied; for instance, a parallel for directive:

    #pragma omp parallel for
    #pragma omp unroll partial(2)
    for (int i = 0; i < N; i += 1)
      body(i);

The code above is semantically equivalent to the following version, where the loop is unrolled manually by the programmer.

    #pragma omp parallel for
    for (int i = 0; i < N; i += 2) {
      body(i);
      if (i+1 < N) body(i+1);
    }

As a result, transformations are applied in the reverse of the order in which they appear in the source code. This is consistent with any other pragma that appears before the item it applies to. With the addition of loop transformations, this item can be either a literal loop (by analogy with literal expression constants) that appears in the source code, or a loop that is the result of a transformation, which we refer to as a generated loop.

Such directives enable the separation of an algorithm's semantics from its performance optimization [14]. For one, this improves the maintainability of the code: the directive clearly conveys the programmer's intent, compared to code where the unrolling is intermingled with the algorithm itself. Using unrolling as an example, the body has to be duplicated multiple times. If the unroll factor is to be changed, multiple expressions have to stay consistent with each other, including the body copies themselves, and any accidental inconsistency potentially leads to wrong results. Hence, dedicated loop transformations make it easier to experiment with different optimizations to find the best-performing one on particular hardware. Moreover, different optimizations can be chosen for different hardware by using either the preprocessor or the OpenMP metadirective, while using the same source code for the algorithm itself.

The implementation challenge is that before OpenMP 5.1, no directive was freely composable with other directives in arbitrary order and multiplicity. There were only combined and composite directives, with all valid combinations enumerated explicitly in the specification. OpenMP 5.1 introduced two loop transformation directives: tile and unroll. Tiling applies to multiple loops nested inside each other and generates twice as many loops.

Unrolling has a full, a partial, and a heuristic mode. If a loop is fully unrolled, there is no generated loop that can be associated with another directive. Partial unrolling can be understood as first tiling the loop by an unroll factor, then fully unrolling the inner loop. In heuristic mode, the compiler decides what to do: full unroll, partial unroll with a chosen unroll factor, or no unrolling at all.

A typical implementation of unrolling avoids the conditional within the loop and instead peels the last iterations into a remainder loop, as shown in Figure 2. Implementations are allowed to apply this as an optimization as long as the code's semantics are preserved.

    int i = 0;
    for (; i+3 < N; i += 4) { // unrolled
      body(i);
      body(i+1);
      body(i+2);
      body(i+3);
    }
    for (; i < N; i += 1) // remainder
      body(i);

Figure 2: Partial unrolling with remainder loop

1.2 The Clang Abstract Syntax Tree

An Abstract Syntax Tree (AST) is the structural in-memory representation of a program's source code. Clang's AST mixes syntactic-only nodes (such as parentheses) and semantic-only nodes (such as implicit conversions) into the same tree structure. With a few exceptions it is immutable, meaning that a subtree cannot be modified after it has been created.

    #pragma omp parallel for schedule(static)
    for (int i = 7; i < 17; i += 3)
      body(i);

(a)

    OMPParallelForDirective
    |-OMPScheduleClause
    | `-[...]
    `-CapturedStmt
      `-CapturedDecl nothrow
        |-ForStmt
        | |-DeclStmt
        | | `-VarDecl 0x7fffc6750e68 used i 'int' cinit
        | |   `-IntegerLiteral 'int' 7
        | |-[...]
        | |-[... (Cond)]
        | |-[... (Incr)]
        | `-CallExpr 'void'
        |   `-[...]
        |-ImplicitParamDecl implicit .global_tid. 'const int *const __restrict'
        |-ImplicitParamDecl implicit .bound_tid. 'const int *const __restrict'
        |-ImplicitParamDecl implicit __context '(unnamed struct) *const __restrict'
        `-VarDecl 0x7fffc6750e68

(b)

Figure 3: An OpenMP loop-associated construct (a) and its AST (b) as printed by clang -Xclang -ast-dump; brackets indicate omissions from the raw output

Figure 3 shows an example of an AST for an OpenMP directive associated with a for-loop. The root of this subtree represents the parallel for pragma itself. The child nodes at the beginning are the directive's clauses and their arguments, if any.

The last child node is the code the directive is associated with. It is wrapped inside a CapturedStmt, which borrows from Clang's implementation of C++ lambdas and Objective-C blocks. The CapturedDecl node contains the 'lambda function' definition, CapturedStmt represents the statement that declares it, and the OMPParallelForDirective is responsible for calling it. Re-purposing the lambda/block implementation makes it easier to outline the directive's associated code into another function, which is necessary to call it from other threads. Clang also keeps track of which variables are used inside the CapturedStmt; these become parameters of the outlined function. In Figure 3 these are indicated by the ImplicitParamDecl nodes for passing the thread identifiers, a context structure wrapping the captured variables, and the loop iteration variable itself.

The loop itself is represented by the ForStmt, the same AST node as if the loop were not part of an OpenMP directive. Its children [...]

    Stmt
    |-Expr
    |-ForStmt
    |-CXXForRangeStmt
    |-OMPExecutableDirective
    | |-...
    | |-OMPParallelDirective
    | |-...
    | `-OMPLoopDirective
    |   |-OMPForDirective
    |   |-OMPParallelForDirective
    |   `-...
    `-CapturedStmt

Figure 4: Excerpt of the AST node class hierarchy

1.3 Clang Layer Architecture

Clang's internal organization is sketched in Figure 1. It follows a typical compiler structure consisting of a tokenizer (Lexer), Preprocessor, Parser, semantic analyzer (Sema), and IR code generation (CodeGen). General control flow is steered by the parser. That is, when the parser's ParseTopLevelDecl() is called, it pulls the tokens to be consumed from the previous layers. Once the parser has determined which syntactic element it is looking at, it pushes that element to Sema, which creates an AST node for it. Sema also performs the semantic analysis, including the creation of implicit AST nodes. The TreeTransform class creates copies of AST subtrees with some changes applied.
