Is dynamic compilation possible for embedded systems?
SCOPES 2015, St. Goar

Victor Lomüller, Henri-Pierre Charles

CEA DACLE / Grenoble June 2 2015 www.cea.fr

Introduction: Wake Up Questions Session

FAQ

- What do you mean by dynamic compilation? A compilation system where the binary code is generated at run-time.
- Like a Java JIT system? Yes, exactly! But not only for portability: also for performance and for data / architecture adaptation.
- But a Java JIT cannot be used on embedded systems! Yes! Because it needs a huge amount of memory, introduces lag, is slow in the interpreted part, takes time to generate binary code, ... because it’s Java ;-)

- What do you want? We want flexible and fast code generation at run-time, and we want to use it to specialize (simplify) binary code at run-time. There are many application domains!
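As a rough illustration of what "specialize (simplify)" means here, the sketch below contrasts a generic dot-product kernel with a version simplified for one particular coefficient set. It is plain C rather than generated binary code, and the names `dot_generic` and `dot_specialized_1_0_2` as well as the coefficient values {1, 0, 2} are assumptions made up for this example. A run-time code generator would aim to produce the equivalent of the second function once the coefficients are known.

```c
#include <assert.h>
#include <stddef.h>

/* Generic kernel: the coefficients are only known at run time,
   so every tap costs a load, a multiply and an add. */
int dot_generic(const int *coef, const int *x, size_t n)
{
    assert(coef != NULL && x != NULL);
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += coef[i] * x[i];
    return acc;
}

/* Hand-written stand-in for what a run-time specializer could emit
   after observing coef = {1, 0, 2} (hypothetical values):
   the loop is gone, the zero tap is dropped, x*1 becomes x,
   and x*2 becomes an addition. */
int dot_specialized_1_0_2(const int *x)
{
    assert(x != NULL);
    return x[0] + (x[2] + x[2]);
}
```

Both functions return the same value for any input vector of length 3 as long as the coefficients stay {1, 0, 2}; the specialized version simply does less work per call.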

Is dynamic compilation possible for embedded systems? | DACLE Division | June 2 2015 | © CEA. All rights reserved

Intro: Classical architecture

Classical architecture: GCC, LLVM, Java JIT
- Driven by performance only
- Not energy aware
- Not data dependent

“Compilation time” typology

Intro: Future Compilers Architecture

- Multi-objective (time, power, thermal constraints)
- Multi-target (heterogeneous multi-SoC)
- Data driven (dynamically)

Using “Compilation time”


Definitions

- Static compilation: “classical” binary code generation (gcc, icc, ...)
- Dynamic compilation: binary code generated at run-time (DBT)
- JIT: run-time dynamic compilation based on a complex intermediate representation (Java, LLVM)

Innovations

- Compilette: a small binary code generator embedded into the application, able to optimize code depending on data sets
- deGoal: a tool which helps to generate compilettes
- Kahuna: an LLVM transformation to implement simple compilettes

deGoal: Data Dependent Code Generation

Compilette
- Code generation at run-time, embedded into the application
- Data dependent
- Architecture independent (mostly)

Compilation chain

deGoal Features

deGoal features
- Portable “assembly language”
- Source to source compiler
- Portable registers
- Typed: int, float, complex, ...
- Vector support: dynamic size
- Mix runtime data & binary code
- Correct use of any multimedia instruction

Obtained results
- Auto-adaptive dynamic libraries
- Runtime optimization
- Multiple metrics: faster generated code, smaller generated code
- 3 orders of magnitude faster than JIT/LLVM
- 4 orders of magnitude smaller than JIT/LLVM

deGoal support

Table columns: architecture, port status, SIMD support, instruction bundling.

- ARM Thumb-2 (+NEON/VFP): OoO/InO
- STxP70 (STHORM / P2012): N/A
- K1 (Kalray MPPA)
- PTX (GPU NVIDIA): N/A
- ARM32: N/A
- MSP430: N/A, N/A
- MIPS: N/A
- ARM64: N/A

deGoal Example: Simple multiplication

Runtime “constant” multiplication code generator

    #include <stdio.h>  /* -*- c -*- */

    typedef int (*pifi)(int);

    /* Compilette which multiplies by a constant value */
    pifi multiplyCompile(int multiplyValue)
    {
      /* Buffer that will receive the generated instructions */
      cdgInsnT *code = CDGALLOC(1024);
      printf("Code generation for multiply value %d code at %p\n", multiplyValue, (void *)code);
    #[
      VectorType ScalarInt float 32 1
      RegAlloc ScalarInt in 1

      Begin code Prelude in0

      mul in0, in0, #(multiplyValue)
      rtn
      End
    ]#;
      /* The generated buffer is called like an ordinary function */
      return (pifi)code;
    }

Kahuna: High Level Idea

Kahuna general idea
- Identify “key” variables (which could be constant during a period of time)
- Generate a template & a specializer
- Specialize on the fly, only the needed instructions
- Based on LLVM

[Figure: High level idea, the Kahuna compile chain. Labels: C lang., front-end (F.E.), LLVM IR, static optim., exec. binary, annotated lang., annotated IR, specializer, kernel, data; steps 1 to 3.]

Kahuna: In-Place Code Generation

Reuse the same template every time
+ : don’t have to generate the whole kernel
+ : no compilette to write
- : need to specialize for every change
- : limited to instructions the compiler is aware of (“compiler aware”)

[Illustration: data, specializer, template]
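The in-place scheme can be sketched in portable C by modelling the kernel as a small array of toy instructions whose immediates are the "holes". This is only an analogy under stated assumptions: `ToyOp`, `ToyInsn`, `specialize_in_place` and `run_kernel` are hypothetical names, and a real specializer would patch machine instructions in executable memory rather than a C array read by an interpreter loop.

```c
#include <assert.h>
#include <stddef.h>

/* Toy "instruction": an opcode plus an immediate operand. */
typedef enum { OP_MUL_IMM, OP_ADD_IMM, OP_RET } ToyOp;
typedef struct { ToyOp op; int imm; } ToyInsn;

/* Template for f(x) = x * a + b.
   The immediates of the first two instructions are the holes. */
enum { HOLE_A = 0, HOLE_B = 1, TEMPLATE_LEN = 3 };

static ToyInsn kernel[TEMPLATE_LEN] = {
    { OP_MUL_IMM, 0 },  /* hole: a */
    { OP_ADD_IMM, 0 },  /* hole: b */
    { OP_RET,     0 },
};

/* Specializer: rewrites only the holes, directly inside the kernel. */
void specialize_in_place(int a, int b)
{
    kernel[HOLE_A].imm = a;
    kernel[HOLE_B].imm = b;
}

/* Stand-in for the processor executing the kernel. */
int run_kernel(const ToyInsn *code, size_t len, int x)
{
    assert(code != NULL);
    for (size_t pc = 0; pc < len; pc++) {
        switch (code[pc].op) {
        case OP_MUL_IMM: x *= code[pc].imm; break;
        case OP_ADD_IMM: x += code[pc].imm; break;
        case OP_RET:     return x;
        }
    }
    return x;
}
```

Calling `specialize_in_place(3, 1)` and then running the kernel on 5 evaluates 5 * 3 + 1. Because there is a single copy of the kernel, every change of `a` or `b` requires patching the holes again, which is the cost listed on the slide.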

Kahuna: Out-of-Place Code Generation

Reuse multiple template instances
- : need to generate the whole kernel (copy template + specialize)
+ : no compilette to write
+ : no need to specialize for every change (use cache)
- : limited to instructions the compiler is aware of (“compiler aware”)

[Illustration: data, specializer, template, one instance per data set]
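A minimal sketch of the out-of-place scheme, again as a portable C analogy rather than real binary code generation: the template stays read-only, each data value gets its own copy with the hole filled in, and a small cache avoids regenerating an instance that already exists. All names (`ToyInsn`, `get_instance`, `run_instance`, the 4-slot round-robin cache) are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { OP_MUL_IMM, OP_RET } ToyOp;
typedef struct { ToyOp op; int imm; } ToyInsn;

enum { TEMPLATE_LEN = 2, CACHE_SLOTS = 4 };

/* Read-only template for f(x) = x * a; the first immediate is the hole. */
static const ToyInsn template_code[TEMPLATE_LEN] = {
    { OP_MUL_IMM, 0 },
    { OP_RET,     0 },
};

/* One specialized instance per data value. */
typedef struct { int valid; int a; ToyInsn code[TEMPLATE_LEN]; } Instance;
static Instance cache[CACHE_SLOTS];
static size_t next_slot = 0;
static int generated = 0;   /* number of instances built so far */

/* Return the instance for `a`, generating it only on a cache miss. */
const ToyInsn *get_instance(int a)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].valid && cache[i].a == a)
            return cache[i].code;              /* hit: no code generation */

    Instance *slot = &cache[next_slot];        /* miss: round-robin eviction */
    next_slot = (next_slot + 1) % CACHE_SLOTS;
    memcpy(slot->code, template_code, sizeof template_code);  /* copy template */
    slot->code[0].imm = a;                                    /* fill the hole */
    slot->a = a;
    slot->valid = 1;
    generated++;
    return slot->code;
}

/* Stand-in for the processor executing one instance. */
int run_instance(const ToyInsn *code, int x)
{
    assert(code != NULL);
    for (size_t pc = 0; pc < TEMPLATE_LEN; pc++) {
        if (code[pc].op == OP_MUL_IMM)
            x *= code[pc].imm;
        else
            break;                             /* OP_RET */
    }
    return x;
}
```

Requesting the instance for 3, then 4, then 3 again builds only two instances: the third request is served from the cache. The price is the copy of the whole kernel on each miss, matching the trade-off listed on the slide.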

Idea: Kahuna Recipe

Algorithm
- Start with LLVM IR (language independent)
- Treat annotated variables as “constant”
- Add labels for instructions to post-modify
- Generate code for binary specialization

Implementation
- Based on LLVM 3.2
- Implement annotation handling
- Implement the Kahuna process
- Modified backend

[Figure: Kahuna code generation flow. Labels: annotated LLVM IR; binding-time analysis; dynamic instructions (templates, instructions with holes, fragments); constant instructions (compiled “as usual”, fragments); Kahuna code gen.; llc; binary code templates and insn info; specializer code (LLVA); llc; specializer (object).]

Results: Benchmark & Experiment

Code using “constants”:
- Pass band audio filter (extracted from SOX)
- Finite impulse response filter
- 2D convolution

Modified version of LLVM:
- Standard static LLVM compilation
- Kahuna
- LLVM + static specialization (inlining)
- deGoal

Architecture
- Speed & memory: 800 MHz Cortex-A8 ARM processor (Beagleboard-xM platform), in-order, dual-issue, FPU (theoretical peak at 80 MFlops)
- Energy: modified GEM5 and McPAT ARMv7-A simulation environment

Results: Speed

Speedup over the static version (%). Table columns: sox, FIR, Convolution.

- LLVM: 0, 0, 0
- SpeLLVM: 21
- deGoal: 27
- kahuna: 21, 10, 48

SpeLLVM = code with data specialization made “by hand”; deGoal = code generator made by hand. Convolution unrolled for kahuna.

First metric: code speed
- First step: does it work? Yes!
- Code speed as fast as a production-grade compiler

Results: Code Generation Speed

Code generation speed. Table columns: sox (cycles, cycles per insn.), FIR (cycles, cycles per insn.).

- LLVM: 126 M, 3 M; 223 M, 3 M
- SpeLLVM: 111 M, 3 M
- deGoal: 10 753, 233
- kahuna: 205, 20; 76, 8

(Code generation timing: cycles taken to generate the kernels, and cycles taken to generate one instruction.)
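Whether a given generation cost is worthwhile depends on how often the generated kernel runs. The sketch below computes the break-even point as the generation cost divided by the cycles saved per call, rounded up. The function name `break_even_calls` and the per-call kernel costs used in the note that follows are assumptions for illustration; the slides do not report per-call kernel cycle counts.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of kernel calls after which generating specialized
   code has paid for itself. All quantities are in cycles.
   Returns UINT64_MAX when the specialized kernel is not faster,
   i.e. when the generation cost can never be recovered. */
uint64_t break_even_calls(uint64_t gen_cycles,
                          uint64_t static_cycles_per_call,
                          uint64_t specialized_cycles_per_call)
{
    if (specialized_cycles_per_call >= static_cycles_per_call)
        return UINT64_MAX;

    uint64_t saving = static_cycles_per_call - specialized_cycles_per_call;
    assert(gen_cycles <= UINT64_MAX - saving);   /* guard the rounding below */
    return (gen_cycles + saving - 1) / saving;   /* ceiling division */
}
```

With a hypothetical kernel that drops from 1000 to 800 cycles per call, a generation cost of 205 cycles (the kahuna figure in the sox column) is recovered after 2 calls, while a cost of 126 M cycles (the LLVM figure in the same column) needs 630 000 calls.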

Results: Code Generation Speed

[Figure: code generation result]

Conclusion

General results
- Victor Lomüller PhD thesis and CC conference articles
- Tools for new metrics in code generation
- Open the door for new code specialization

References
- H.-P. Charles, D. Couroussé, V. Lomüller, F. A. Endo and R. Gauguey. deGoal a tool to embed dynamic code generators into applications. Jan 2014.
- V. Lomüller and H.-P. Charles. A LLVM Extension for the Generation of Low Overhead Runtime Program Specializer. In Proceedings of International Workshop on Adaptive Self-tuning Computing Systems - ADAPT’14, pages 14–16, Jan 2014.
- V. Lomüller. Générateur de code multi-temps et optimisation de code multi-objectifs. PhD thesis, Ecole Doctorale “Mathématiques, Sciences et Technologies de l’Information, Informatique”, Université de Grenoble, Nov. 2014.

Open PhD position: Java JIT compiler for embedded systems