Branching in Data-Parallel Languages Using Predication with LLVM

Marcello Maggioni, Codeplay Software Ltd.
EuroLLVM 2014

Outline

1 Introduction
    Data-Parallel Languages
    SIMD Architectures
    SIMD + Data-Parallel Approach
2 Implementation
    Predicating Instructions
    Determining an execution schedule
    Execution mask allocation
    Mask Handling Insertion
    CFG Linearization
    Predicate instructions
    Optimizations
    Advantages

Introduction

Data-Parallel Languages

- OpenCL, CUDA, Renderscript ...
- Heavily parallel
- Many threads running the same code/program on a varying dataset
- SIMD-architecture friendly

SIMD Architectures

- Heavily parallel
- Very efficient at running data-parallel workloads with uniform control flow
- Very common today: today's CPUs all have SIMD capabilities, and most GPUs are SIMD at heart
SIMD + Data-Parallel Approach

- Each SIMD processing unit (PU) runs a different data-parallel thread

[Figure: a SIMD processor whose PUs run Thread 0, Thread 1, Thread 2 and Thread 3]

SIMD + Data-Parallel: Challenges

- Divergent branching happens when different SIMD PUs want to follow different code paths
- It needs special handling on much SIMD hardware, because the individual units are not independent
- SIMD units share the same PC, so they all need to execute the same instructions

Branching on SIMD: Principles

- We want to auto-vectorize the entire instruction stream over all the PUs of the SIMD processor
- Linearize the entire CFG after register allocation
- After linearization, a basic block that shouldn't run on a certain SIMD PU should have the execution of its instructions disabled on that PU

[Figure, repeated over several slides with the SIMD processor marked: CFG for 'kern0' function, with blocks entry, if.then, if.else and if.end (BB#0: entry, BB#1: if.else, BB#2: if.then, BB#3: if.end)]
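The principle above can be illustrated with a small software model. This is only a sketch under stated assumptions: the lane count, the kernel body (`if (x < 0) y = -x; else y = x * 2;`) and the names `kLanes`, `Vec`, `Mask` and `linearizedKernel` are all hypothetical and do not come from the talk. Every lane walks through the same linear sequence of blocks, and a per-lane mask disables the block that the lane's thread would not have taken.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 4-lane SIMD model: one element per data-parallel thread.
constexpr std::size_t kLanes = 4;
using Vec = std::array<int, kLanes>;
using Mask = std::array<bool, kLanes>;

// Scalar kernel each thread would run (invented for illustration):
//   if (x < 0) y = -x; else y = x * 2;
// Linearized form: all lanes step through entry, if.else, if.then, if.end
// in that order; masks decide which lanes actually take effect in a block.
Vec linearizedKernel(const Vec &x) {
  Vec y{};
  Mask thenMask{};
  Mask elseMask{};

  // entry: evaluate the branch condition on every lane.
  for (std::size_t i = 0; i < kLanes; ++i) {
    thenMask[i] = x[i] < 0;
    elseMask[i] = !thenMask[i];
  }

  // if.else: executed by the whole processor, effective only where elseMask is set.
  for (std::size_t i = 0; i < kLanes; ++i)
    if (elseMask[i])
      y[i] = x[i] * 2;

  // if.then: same instruction stream, effective only where thenMask is set.
  for (std::size_t i = 0; i < kLanes; ++i)
    if (thenMask[i])
      y[i] = -x[i];

  // if.end: all lanes are active again.
  return y;
}
```

Calling `linearizedKernel({-3, 5, 0, -1})` makes lanes 0 and 3 effective only in if.then and lanes 1 and 2 only in if.else, even though all four lanes pass through both blocks.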
Approaches

IR approach
- Preferred if the architecture doesn't support full predication
- Needs special handling for instructions with side effects (trapping instructions, function calls ...)

Backend approaches
- Can make full use of the features of the hardware
- Hardware predication can be exploited

Predication

Predication is a hardware feature that conditionally disables the side effects of instructions.

    cmp m0, r0, r1
    setmask m0
    // Execute only if mask for unit is true
    addvp r0, r1, r2
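The slide does not spell out the semantics of these three instructions, so the following is a hedged emulation. It assumes that `cmp` performs a per-unit "less than" comparison into `m0`, that `setmask` makes `m0` the active execution mask, and that `addvp r0, r1, r2` writes `r1 + r2` into `r0` on active units only. The names `kUnits`, `Reg`, `MaskReg`, `cmpLess` and `addPredicated` are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical register model: one value per SIMD processing unit.
constexpr std::size_t kUnits = 4;
using Reg = std::array<int, kUnits>;
using MaskReg = std::array<bool, kUnits>;

// cmp m0, r0, r1 (assumed semantics): m0[i] = r0[i] < r1[i].
MaskReg cmpLess(const Reg &a, const Reg &b) {
  MaskReg m{};
  for (std::size_t i = 0; i < kUnits; ++i)
    m[i] = a[i] < b[i];
  return m;
}

// addvp dst, a, b under an execution mask (assumed semantics):
// units whose mask bit is false keep their old dst value, which is what
// "disabling the side effects" of the instruction amounts to here.
void addPredicated(Reg &dst, const Reg &a, const Reg &b, const MaskReg &exec) {
  for (std::size_t i = 0; i < kUnits; ++i)
    if (exec[i])
      dst[i] = a[i] + b[i];
}
```

With `r0 = {1, 9, 3, 9}`, `r1 = {5, 5, 5, 5}` and `r2 = {10, 20, 30, 40}`, the comparison activates units 0 and 2 only, so the predicated add leaves units 1 and 3 of `r0` untouched.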
Implementation

Predicating instructions

- Predicable instructions are defined in TableGen with an additional predicate operand in the backend
- The predicate operand has a default value, which typically equals the "always execute" predicate

Predicating instructions (2)

    def LDRT_POST
      : ARMAsmPseudo<"ldrt${q} $Rt, $addr", (ins addr_offset_none:$addr, pred:$q),
                     (outs GPR:$Rt)>;

Predicating instructions (3)

    // ARM Predicate operand. Default to 14 = always (AL). Second part is CC
    // register whose default is 0 (no register).
    def CondCodeOperand : AsmOperandClass { let Name = "CondCode"; }
    def pred : PredicateOperand<OtherVT, (ops i32imm, i32imm),
                                (ops (i32 14), (i32 zero_reg))> {
      let PrintMethod = "printPredicateOperand";
      let ParserMatchClass = CondCodeOperand;
      let DecoderMethod = "DecodePredicateOperand";
    }

Determining an execution schedule

- The execution schedule is the order of execution in the linearized CFG
- The schedule needs to be chosen such that every possible predecessor of a BB is executed before the BB itself
- The Reverse Post Order traversal of the CFG follows this rule (the Reverse Post Order Iterator from LLVM can be used)

[Figure: CFG for 'kern0' function, with blocks entry, if.then, if.else and if.end (BB#0: entry, BB#1: if.else, BB#2: if.then, BB#3: if.end)]

Determining an execution schedule (2)

- Structurizing the CFG in this phase naturally generates a valid execution schedule and simplifies later passes
- Can be done using the StructurizeCFG pass from LLVM

StructurizeCFG

    int main() {
      volatile int a = 5;
      volatile int b = 6;
      if (a == 5 || b < 2) {
        b = 6;
      } else {
        b = 7;
      }
      return 0;
    }

[Figure: two CFGs for the 'main' function; one of them contains blocks named Flow and Flow1]

StructurizeCFG (2)

    int main() {
      volatile int a = 5;
      volatile int b = 6;
      if (a < 5) {
        a = 5;
      } else {
        b = 10;
      }
      return 0;
    }

[Figure: two CFGs for the 'main' function; one of them contains a block named Flow]
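Returning to the scheduling rule, the reverse post-order idea can be sketched without LLVM. In LLVM itself, `ReversePostOrderTraversal` from `llvm/ADT/PostOrderIterator.h` provides this traversal; the standalone version below is only an illustration, and the names `CFG`, `postOrder` and `reversePostOrder` as well as the adjacency-list encoding are assumptions. It also assumes an acyclic CFG, since with back edges no order can place every predecessor first.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical CFG encoding: cfg[bb] lists the successors of basic block bb.
using CFG = std::vector<std::vector<std::size_t>>;

// Depth-first walk that records a block only after all of its successors.
void postOrder(const CFG &cfg, std::size_t bb, std::vector<bool> &visited,
               std::vector<std::size_t> &out) {
  visited[bb] = true;
  for (std::size_t succ : cfg[bb])
    if (!visited[succ])
      postOrder(cfg, succ, visited, out);
  out.push_back(bb);
}

// Reversing the post-order gives a schedule in which, for an acyclic CFG,
// every block appears after all of its predecessors.
std::vector<std::size_t> reversePostOrder(const CFG &cfg, std::size_t entry) {
  std::vector<bool> visited(cfg.size(), false);
  std::vector<std::size_t> order;
  postOrder(cfg, entry, visited, order);
  std::reverse(order.begin(), order.end());
  return order;
}
```

For a diamond encoded as `CFG{{1, 2}, {3}, {3}, {}}` with entry block 0, the result is `0, 2, 1, 3`: both arms come after the entry and before the join. Which arm comes first depends only on the order in which successors are listed.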
Execution Mask Allocation

To each basic-block
