Continuous Program Optimization Via Advanced Dynamic Compilation Techniques
Total Page:16
File Type:pdf, Size:1020Kb
Continuous Program Optimization via Advanced Dynamic Compilation Techniques Marco Festa, Nicole Gervasoni, Stefano Cherubin, Giovanni Agosta Politecnico di Milano, DEIB {marco2.festa,nicoleannamaria.gervasoni}@mail.polimi.it,[email protected],[email protected] ABSTRACT together with the growth of HPC infrastructures toward the Ex- In High Performance Computing, it is often useful to fine tune an ascale, will strain the ability of HPC centers to provide sufficient application code via recompilation of specific computational inten- personnel for these activities, due to the increased number of users. sive code fragments to leverage runtime knowledge. Traditional To ease this strain, practices such as autotuning and continuous compilers rarely provide such capabilities, but solutions such as optimisation [6–8] can be successfully applied to make the appli- libVC exist to allow C/C++ code to employ dynamic compilation. cation itself more aware of its performance and able to cope with We evaluate the impact of the introduction of Just-in-Time compila- platform heterogeneity and workload changes which are not pre- tion in a framework supporting partial dynamic (re-)compilation of dictable at compile-time, and for which traditional techniques such functions to provide continuous optimization in high performance as profile-guided optimisation may fail due to the difficulty offind- environments. We show that Just-In-Time solutions can have com- ing small profile data sets that are representative of the large ones parable performance in terms of code quality with respect to the actually used in the HPC runs. In these cases, which are becoming libVC alternatives, and it can provide smaller compilation over- more and more common [9], dynamic approaches can prove more head. We further demonstrate the strength of our approach against effective. another interpreter-based dynamic evaluation solution from the The operation of such a dynamic approach chiefly consists in state-of-the-art. generating more than one version of the code of compute-intensive kernel, and then selecting the best version at each invocation of KEYWORDS the kernel. The selection can be performed as part of an autotuning algorithm, which can be used both to tune software parameters Dynamic Compilation, JIT, Continuous Program Optimization and to search the space of compiler optimizations for optimal so- ACM Reference Format: lutions [10]. Autotuning frameworks can select one of a set of Marco Festa, Nicole Gervasoni, Stefano Cherubin, Giovanni Agosta. 2019. different versions of the same computational kernel to best fitthe Continuous Program Optimization via Advanced Dynamic Compilation HPC system runtime conditions, such as system resource partition- Techniques. In Proceedings of PARMA-DITAM Workshop (PARMA-DITAM ing, as long as such versions are generated at compile time. Some 2019). ACM, New York, NY, USA, 6 pages. https://doi.org/00.001/000_1 frameworks are actually able to perform continuous optimization, generally through specific versions of a dynamic compiler [11; 12], 1 INTRODUCTION or through cloud-based platforms [13], or by leveraging an external The trend towards the democratisation of access to High Perfor- compiler through a dedicated API [14]. mance Computing (HPC) infrastructures [1; 2] is leading a wider In this paper, we introduce an extension of a state-of-the-art spectrum of developers to work on applications that will run on dynamic compilation library to support the Just In Time (JIT) com- complex, potentially heterogeneous architectures. Yet, the design pilation paradigm. We discuss the implementation of easy-to-use and the implementation of HPC applications are difficult tasks for APIs to realize the continuous optimization approach. Finally, we which several tools and languages are used [3; 4]. Typically, each compare the dynamic compilation overhead due to this kind of HPC center has its own set of tools, which collect the expertise and compilation technique with other dynamic compilation techniques work of many experts across the years. Such tools can go from ex- in the state-of-the art. tensive frameworks, such as the OmpSs/Nanos++/Mercurium tool The rest of this paper is organized as follows. Section 2 described set from Barcelona Supercomputing Center1, to simpler collections the Just In Time paradigm, and the implementation of the related of scripts, which are nonetheless critical to achieve the expected APIs within a C++ library. Section 3 discusses the experimental performance [5]. Thus, specialised help is often needed in the form results. Section 4 provides an overview of the related works and a of HPC center staff, developers who are accustomed to the practices comparative analysis with them. Finally, we draw some conclusions of HPC systems and can provide expert knowledge to support the in section 5. application domain expert. Still, the same democratisation trend, 2 A JUST-IN-TIME SOLUTION 1https://pm.bsc.es Continuous optimization requires the program to re-shape portions Permission to make digital or hard copies of part or all of this work for personal or of its own executable code to adapt them to runtime conditions. classroom use is granted without fee provided that copies are not made or distributed This capability is very common among interpreted programming for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. languages, as the instructions are usually parsed and issued at run- For all other uses, contact the owner/author(s). time. In the context of compiled programming languages, instead, PARMA-DITAM 2019, January 2019, Valencia, ES it is more complex to achieve this behaviour, as it requires to defer © 2018 Copyright held by the owner/author(s). ACM ISBN 123-4567-24-567/08/06. the full compilation process at runtime. Although interpreted lan- https://doi.org/00.001/000_1 guages provide greater flexibility, they typically have throughput PARMA-DITAM 2019, January 2019, Valencia, ES Festa et al. .hpp running program which is much lower with respect to compiled ones. Thus, in HPC .cpp systems it is preferable to have a reconfiguration time span to opti- .cpp .cpp mize the software rather than having a constant overhead on each save / load instruction. 3 load function pointer libVC 2.1 Continuous Optimization via Dynamic clang 1 setup gcc Compilation -fopenmp-O3 compile JITCompiler compiler=gcc Dynamic compilation techniques allow to compile source code kernel.cpp -DvarPi=3.14 into executable code after the software itself has been deployed. source 2 file LLVM Whenever important runtime conditions or checkpoints are met, -Dpivot=5 a dynamic re-configuration of the software system may entail a optional dynamic (re-)compilation of its source code to apply a different set of optimization which better fit the incoming workload. Dynamic compilation can be performed via ad-hoc software hypervisors, via Figure 1: Continuous Program Optimization flow with dynamic generation and loading of software libraries, or via the libVC infrastructure, including the proposed extension integration of a compiler stack within the adaptive software system itself. features three different implementation of compiler APIs: System- The former approach suffers from the difficulty of maintaining Compiler, SystemCompilerOptimizer, and ClangLibCompiler. both the hypervisor and the code to use it. This approach requires The first two solutions require to be configured to properly inter- a deep knowledge of the hypervisor system, and an accurate con- act with external compilers already deployed on the host machine. figuration over the software system. The latter implements the Compiler as a library paradigm, and The dynamic generation and loading of software libraries is therefore needs llvm to be installed in the host machine. We extend a platform-dependent solution which requires fine tuning of the libVC with JITCompiler, an additional implementation for the compiler configuration at deploy time. Moreover, this approach Compiler interface, with the aim of providing true JIT compilation requires to access and to manage additional persistent memory capabilities. Figure 1 shows the infrastructure of libVC when it is space to handle the dynamically-generated shared objects. This used to perform continuous program optimization. We highlight problem has a non-trivial impact on HPC infrastructures, as such the new component with a red box under the other alternatives systems usually aims at minimizing the access to persistent memory provided by the framework. due to its intrinsic high-latency. Although recent proposals have To pursue continuity with the original implementation of libVC, been made to simplify the configuration of compiler settings [14], we base our implementation of JITCompiler on the llvm compiler the limitation given by the memory access still persists. The JIT framework. In particular, we support the generation of llvm-ir bit- paradigm removes the problems of the abovementioned approaches, code files, the optimization of such intermediate representation, and as it does not create any persistent object and it integrates the full the compilation of it into executable code. Whilst the bitcode gen- compiler functionalities within the adaptive application. eration and optimization are implemented similarly to the Clang as a library paradigm, the generation of executable code is extremely 2.2 Principles of Just-in-Time