Surgical Precision JIT Compilers

Total Page:16

File Type:pdf, Size:1020Kb

Surgical Precision JIT Compilers Surgical Precision JIT Compilers Tiark Rompf ‡∗ Arvind K. Sujeethy Kevin J. Browny HyoukJoong Leey Hassan Chafizy Kunle Olukotuny zOracle Labs: {first.last}@oracle.com ∗EPFL: {first.last}@epfl.ch yStanford University: {asujeeth, kjbrown, hyouklee, kunle}@stanford.edu Abstract class Record(fields: Array[String], schema: Array[String]) { def apply(key: String) = fields(schema indexOf key) Just-in-time (JIT) compilation of running programs provides more def foreach(f: (String,String) => Unit) = optimization opportunities than offline compilation. Modern JIT for ((k,i) <- schema.zipWithIndex) f(k,fields(i)) compilers, such as those in virtual machines like Oracle’s HotSpot } def processCSV(file: String)(yld: Record => Unit) = { for Java or Google’s V8 for JavaScript, rely on dynamic profiling val lines = FileReader(file) as their key mechanism to guide optimizations. While these JIT val schema = lines.next.split(",") compilers offer good average performance, their behavior is a black while (lines.hasNext) { box and the achieved performance is highly unpredictable. val fields = lines.next().split(",") In this paper, we propose to turn JIT compilation into a preci- val rec = new Record(fields,schema) sion tool by adding two essential and generic metaprogramming yld(rec) } facilities: First, allow programs to invoke JIT compilation explic- } itly. This enables controlled specialization of arbitrary code at run- time, in the style of partial evaluation. It also enables the JIT com- Figure 1. Example: Processing CSV files. piler to report warnings and errors to the program when it is un- 1. Introduction able to compile a code path in the demanded way. Second, allow the JIT compiler to call back into the program to perform compile- Just-in-time (JIT) compilation of running programs presents more time computation. This lets the program itself define the translation optimization opportunities than offline compilation. Modern JIT strategy for certain constructs on the fly and gives rise to a powerful compilers, such as those found in virtual machines like Oracle’s JIT macro facility that enables “smart” libraries to supply domain- HotSpot for Java[38] or Google’s V8 for JavaScript [19] base specific compiler optimizations or safety checks. their optimization decisions on dynamic properties of the program We present Lancet, a JIT compiler framework for Java bytecode which would not be available to a static compiler. To give one that enables such a tight, two-way integration with the running pro- key example, most modern VMs keep track of the call targets of gram. Lancet itself was derived from a high-level Java bytecode virtual method calls, which enables the JIT compiler to perform interpreter: staging the interpreter using LMS (Lightweight Modu- inlining across virtual calls if the call site is monomorphic, i.e. it lar Staging) produced a simple bytecode compiler. Adding abstract only resolves to a single target, or polymorphic for a small, fixed interpretation turned the simple compiler into an optimizing com- number. Such an inlining decision is speculative; at a later time in piler. This fact provides compelling evidence for the scalability of the program’s execution, the method call might actually resolve to the staged-interpreter approach to compiler construction. a different target, which was not encountered before. In this case, In the case of Lancet, JIT macros also provide a natural interface the caller needs to be deoptimized. The compiled code, which was to existing LMS-based toolchains such as the Delite parallelism and optimized using too optimistic assumptions, is thrown away and DSL framework, which can now serve as accelerator macros for recompiled to take the new information into account. These and arbitrary JVM bytecode. similar approaches go back to the seminal work of Urs Hölzle and others on the Self [10] and StrongTalk [7] VMs. Categories and Subject Descriptors D.3.4 [Programming Lan- guages]: Processors – Code generation, Optimization, Run-time A Motivating Example However, there is a lot more runtime environments; D.1.3 [Programming Techniques]: Concurrent Pro- information that could be exploited, but is not readily accessible gramming – Parallel programming via simple profiling alone. Let us consider an example (in Scala): we would like to implement a library function to read CSV files. General Terms Design, Languages, Performance A CSV file contains tabular data, where the first line defines the Keywords JIT Compilation, Staging, Program Generation schema, i.e. the names of the columns. We would like to iterate over all the rows in a file and access the data fields by name (sample data Permission to make digital or hard copies of all or part of this work for personal or on the right): classroom use is granted without fee provided that copies are not made or distributed processCSV("data.txt") { record => Name, Value, Flag for profit or commercial advantage and that copies bear this notice and the full citation if (record("Flag") == "yes") A, 7, no on the first page. Copyrights for components of this work owned by others than the println(record("Name")) B, 2, yes author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or } ... republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. Moreover, we would like to process files of arbitrary schema and PLDI ’14, June 9 – 11 2014, Edinburgh, United Kingdom. be able to iterate over the fields of each record as (key,value) pairs: Copyright is held by the owner/author(s). Publication rights licensed to ACM. processCSV("data.txt") { record => ACM 978-1-4503-2784-8/14/06. $15.00. for ((k,v) <- record) print(k + ": " + v); println http://dx.doi.org/10.1145/2594291.2594316 } A straightforward implementation of processCSV is shown in // JIT compile function f (explicit compilation) Figure 1. The runtime information we wish to exploit is the schema def compile[T,U](f: T => U): T => U // evaluate f at JIT-compile time (compile-time execution) array: we can easily see that schema is never modified after reading def freeze[T](f: => T): T the first line of the file. As we know both the shape of the data // unroll subsequent loops on xs that follows and the code that performs the processing, we could def unroll[T](xs: Iterable[T]): Iterable[T] specialize our processing logic accordingly. In our second snippet Figure 2. JIT API required for CSV example. above, we would not need to execute a nested loop over the record fields but could directly execute class Record(fields: Array[String], schema: Array[String]) { while (lines.hasNext) { def apply(key: String) = fields(freeze(schema indexOf key)) val fields = lines.next().split(",") def foreach(f: (String,String) => Unit) = print("Name: " + fields(0)) for ((k,i) <- unroll(freeze(schema.zipWithIndex))) f(k,fields(i)) print("Value: " + fields(1)) } print("Flag: " + fields(2)) def processCSV(file: String) = { println val lines = FileReader(file) } val schema = lines.next.split(",") and get away without the overhead of a name-to-column mapping compile { yld: (Record => Unit) => and a record abstraction altogether. while (lines.hasNext) { ... yld(rec) } Alas, no existing JIT compiler will perform this kind of spe- } cialization reliably. Why? There are just too many things it would } need to guess. First, it would need to guess the right chunk to com- Figure 3. CSV example with explicit JIT calls. pile. It must pick the loop after schema initialization, but not the full method. Then it must prove that schema is actually a constant, tial aspects with partial evaluators and staged meta-programming which is difficult because it is stored in Record objects that are ar- systems: A JIT compiler does not operate on closed programs but guments to a method call, yld(rec), which needs to be inlined. Fi- in the presence of static data, live objects on the heap, and static nally, multiple threads may read different files at the same time, code, pieces of the program which can be executed right away, ei- and we want to obtain a specialized compiled code path per invo- ther because they are already compiled or because the VM also in- cation. Specialization per program point – even temporal special- cludes an interpreter. What sets current JIT compilers apart is their ization with deoptimization and recompilation – would not work if fully automatic nature, unobservable to the running program, and multiple versions need to be active at the same time. their support for dynamic deoptimization and recompilation. Roadblocks for Deterministic Performance The word “reliably” We consider only the latter essential and argue that JIT compil- in the preceding paragraph is indeed our main concern. While JIT ers should also come with reflective high-level APIs and extensi- compilers provide good average performance for a large set of pro- bility hooks instead of being black-box systems. In particular, we grams, their behavior is in general a black box and in many cases, propose to augment JIT compilers with two intuitive and general the achieved performance is highly unpredictable. The programmer metaprogramming facilities, upon which further functionality can has no influence on when and how JIT compilation takes place. be built: Key optimization decisions are often based on magic threshold val- • Explicit compilation: if programs can invoke JIT compilation ues; the HotSpot Server compiler, for example, imposes a 35 byte explicitly, code can be specialized with respect to preexisting method size limit for early inlining. Thus, small changes in the pro- objects on the heap, in the style of partial evaluation [28]. It gram source can drastically affect its runtime behavior. Since many also allows the JIT compiler to report warnings and errors to optimization decisions rely on dynamic profile data, performance the program when it is unable to compile a code path in the can vary significantly even between runs of the same program [17].
Recommended publications
  • The LLVM Instruction Set and Compilation Strategy
    The LLVM Instruction Set and Compilation Strategy Chris Lattner Vikram Adve University of Illinois at Urbana-Champaign lattner,vadve ¡ @cs.uiuc.edu Abstract This document introduces the LLVM compiler infrastructure and instruction set, a simple approach that enables sophisticated code transformations at link time, runtime, and in the field. It is a pragmatic approach to compilation, interfering with programmers and tools as little as possible, while still retaining extensive high-level information from source-level compilers for later stages of an application’s lifetime. We describe the LLVM instruction set, the design of the LLVM system, and some of its key components. 1 Introduction Modern programming languages and software practices aim to support more reliable, flexible, and powerful software applications, increase programmer productivity, and provide higher level semantic information to the compiler. Un- fortunately, traditional approaches to compilation either fail to extract sufficient performance from the program (by not using interprocedural analysis or profile information) or interfere with the build process substantially (by requiring build scripts to be modified for either profiling or interprocedural optimization). Furthermore, they do not support optimization either at runtime or after an application has been installed at an end-user’s site, when the most relevant information about actual usage patterns would be available. The LLVM Compilation Strategy is designed to enable effective multi-stage optimization (at compile-time, link-time, runtime, and offline) and more effective profile-driven optimization, and to do so without changes to the traditional build process or programmer intervention. LLVM (Low Level Virtual Machine) is a compilation strategy that uses a low-level virtual instruction set with rich type information as a common code representation for all phases of compilation.
    [Show full text]
  • The Interplay of Compile-Time and Run-Time Options for Performance Prediction Luc Lesoil, Mathieu Acher, Xhevahire Tërnava, Arnaud Blouin, Jean-Marc Jézéquel
    The Interplay of Compile-time and Run-time Options for Performance Prediction Luc Lesoil, Mathieu Acher, Xhevahire Tërnava, Arnaud Blouin, Jean-Marc Jézéquel To cite this version: Luc Lesoil, Mathieu Acher, Xhevahire Tërnava, Arnaud Blouin, Jean-Marc Jézéquel. The Interplay of Compile-time and Run-time Options for Performance Prediction. SPLC 2021 - 25th ACM Inter- national Systems and Software Product Line Conference - Volume A, Sep 2021, Leicester, United Kingdom. pp.1-12, 10.1145/3461001.3471149. hal-03286127 HAL Id: hal-03286127 https://hal.archives-ouvertes.fr/hal-03286127 Submitted on 15 Jul 2021 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. The Interplay of Compile-time and Run-time Options for Performance Prediction Luc Lesoil, Mathieu Acher, Xhevahire Tërnava, Arnaud Blouin, Jean-Marc Jézéquel Univ Rennes, INSA Rennes, CNRS, Inria, IRISA Rennes, France [email protected] ABSTRACT Both compile-time and run-time options can be configured to reach Many software projects are configurable through compile-time op- specific functional and performance goals. tions (e.g., using ./configure) and also through run-time options (e.g., Existing studies consider either compile-time or run-time op- command-line parameters, fed to the software at execution time).
    [Show full text]
  • Design and Implementation of Generics for the .NET Common Language Runtime
    Design and Implementation of Generics for the .NET Common Language Runtime Andrew Kennedy Don Syme Microsoft Research, Cambridge, U.K. fakeÒÒ¸d×ÝÑeg@ÑicÖÓ×ÓfغcÓÑ Abstract cally through an interface definition language, or IDL) that is nec- essary for language interoperation. The Microsoft .NET Common Language Runtime provides a This paper describes the design and implementation of support shared type system, intermediate language and dynamic execution for parametric polymorphism in the CLR. In its initial release, the environment for the implementation and inter-operation of multiple CLR has no support for polymorphism, an omission shared by the source languages. In this paper we extend it with direct support for JVM. Of course, it is always possible to “compile away” polymor- parametric polymorphism (also known as generics), describing the phism by translation, as has been demonstrated in a number of ex- design through examples written in an extended version of the C# tensions to Java [14, 4, 6, 13, 2, 16] that require no change to the programming language, and explaining aspects of implementation JVM, and in compilers for polymorphic languages that target the by reference to a prototype extension to the runtime. JVM or CLR (MLj [3], Haskell, Eiffel, Mercury). However, such Our design is very expressive, supporting parameterized types, systems inevitably suffer drawbacks of some kind, whether through polymorphic static, instance and virtual methods, “F-bounded” source language restrictions (disallowing primitive type instanti- type parameters, instantiation at pointer and value types, polymor- ations to enable a simple erasure-based translation, as in GJ and phic recursion, and exact run-time types.
    [Show full text]
  • Adding Self-Healing Capabilities to the Common Language Runtime
    Adding Self-healing capabilities to the Common Language Runtime Rean Griffith Gail Kaiser Columbia University Columbia University [email protected] [email protected] Abstract systems can leverage to maintain high system availability is to perform repairs in a degraded mode of operation[23, 10]. Self-healing systems require that repair mechanisms are Conceptually, a self-managing system is composed of available to resolve problems that arise while the system ex- four (4) key capabilities [12]; Monitoring to collect data ecutes. Managed execution environments such as the Com- about its execution and operating environment, performing mon Language Runtime (CLR) and Java Virtual Machine Analysis over the data collected from monitoring, Planning (JVM) provide a number of application services (applica- an appropriate course of action and Executing the plan. tion isolation, security sandboxing, garbage collection and Each of the four functions participating in the Monitor- structured exception handling) which are geared primar- Analyze-Plan-Execute (MAPE) loop consumes and pro- ily at making managed applications more robust. How- duces knowledgewhich is integral to the correct functioning ever, none of these services directly enables applications of the system. Over its execution lifetime the system builds to perform repairs or consistency checks of their compo- and refines a knowledge-base of its behavior and environ- nents. From a design and implementation standpoint, the ment. Information in the knowledge-base could include preferred way to enable repair in a self-healing system is patterns of resource utilization and a “scorecard” tracking to use an externalized repair/adaptation architecture rather the success of applying specific repair actions to detected or than hardwiring adaptation logic inside the system where it predicted problems.
    [Show full text]
  • 18Mca42c .Net Programming (C#)
    18MCA42C .NET PROGRAMMING (C#) Introduction to .NET Framework .NET is a software framework which is designed and developed by Microsoft. The first version of .Net framework was 1.0 which came in the year 2002. It is a virtual machine for compiling and executing programs written in different languages like C#, VB.Net etc. It is used to develop Form-based applications, Web-based applications, and Web services. There is a variety of programming languages available on the .Net platform like VB.Net and C# etc.,. It is used to build applications for Windows, phone, web etc. It provides a lot of functionalities and also supports industry standards. .NET Framework supports more than 60 programming languages in which 11 programming languages are designed and developed by Microsoft. 11 Programming Languages which are designed and developed by Microsoft are: C#.NET VB.NET C++.NET J#.NET F#.NET JSCRIPT.NET WINDOWS POWERSHELL IRON RUBY IRON PYTHON C OMEGA ASML(Abstract State Machine Language) Main Components of .NET Framework 1.Common Language Runtime(CLR): CLR is the basic and Virtual Machine component of the .NET Framework. It is the run-time environment in the .NET Framework that runs the codes and helps in making the development process easier by providing the various services such as remoting, thread management, type-safety, memory management, robustness etc.. Basically, it is responsible for managing the execution of .NET programs regardless of any .NET programming language. It also helps in the management of code, as code that targets the runtime is known as the Managed Code and code doesn’t target to runtime is known as Unmanaged code.
    [Show full text]
  • Dmon: Efficient Detection and Correction of Data Locality
    DMon: Efficient Detection and Correction of Data Locality Problems Using Selective Profiling Tanvir Ahmed Khan and Ian Neal, University of Michigan; Gilles Pokam, Intel Corporation; Barzan Mozafari and Baris Kasikci, University of Michigan https://www.usenix.org/conference/osdi21/presentation/khan This paper is included in the Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation. July 14–16, 2021 978-1-939133-22-9 Open access to the Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation is sponsored by USENIX. DMon: Efficient Detection and Correction of Data Locality Problems Using Selective Profiling Tanvir Ahmed Khan Ian Neal Gilles Pokam Barzan Mozafari University of Michigan University of Michigan Intel Corporation University of Michigan Baris Kasikci University of Michigan Abstract cally at run time. In fact, as we (§6.2) and others [2,15,20,27] Poor data locality hurts an application’s performance. While demonstrate, compiler-based techniques can sometimes even compiler-based techniques have been proposed to improve hurt performance when the assumptions made by those heuris- data locality, they depend on heuristics, which can sometimes tics do not hold in practice. hurt performance. Therefore, developers typically find data To overcome the limitations of static optimizations, the locality issues via dynamic profiling and repair them manually. systems community has invested substantial effort in devel- Alas, existing profiling techniques incur high overhead when oping dynamic profiling tools [28,38, 57,97, 102]. Dynamic used to identify data locality problems and cannot be deployed profilers are capable of gathering detailed and more accurate in production, where programs may exhibit previously-unseen execution information, which a developer can use to identify performance problems.
    [Show full text]
  • Compile Time and Runtime
    COMPILE TIME AND RUNTIME Compile time and runtime are two distinctly different times during the active life of a computer program. Compile time is when the program is compiled; runtime is when it executes (on either a physical or virtual computer). Programmers use the term static to refer to anything that is created during compile time and stays fixed during the program run. They use the term dynamic to refer to things that are created and can change during execution. Example In application MyApp shown below, the data type of variable tax is statically set to double for the entire program run. Its value is dynamically set when the assignment statement (marked **) executes. public class MyApp { public static void main( String [] args ) { double tax; . ** tax = price * TAX_RATE; . } } Compiler Tasks The main tasks of the compiler are to: Check program statements for errors and report them. Generate the machine instructions that carry out the operations specified by the program. To do this, the compiler must gather as much information as possible about the items (e.g. variables, classes, objects, etc.) used in the program. Compile Time and Runtime Page 1 Example Consider this Java assignment statement: x = y2 + z; To determine if it is correct, the compiler needs to know if y2 is a declared variable (perhaps the programmer meant to type y*2). To generate correct bytecode, the compiler needs to know the data types of the variables (integer addition is a different machine instruction than floating-point addition). Declarations Program text that is primarily meant to provide information to the compiler is called a declaration, which the compiler processes by collecting the information given.
    [Show full text]
  • Csc 553 Principles of Compilation 2 : Interpreters Department Of
    Compiler tokens source Lexer Parser CSc 553 AST Principles of Compilation IR AST VM Code Interm. Semantic Gen Code Gen Analyser 2 : Interpreters VM Department of Computer Science Interpreter University of Arizona Get Next Execute Instruction Instruction [email protected] Copyright c 2011 Christian Collberg Kinds of Interpreters An interpreter is like a CPU, only in software. The compiler generates virtual machine (VM) code rather than native machine code. "APL/Prolog−style (load−and−go/interactive) interpreter" The interpreter executes VM instructions rather than native machine code. Read, lex, read Source parse, VM eval Code Code print Interpreters are semantics slow Often 10–100 times slower than executing machine Interpret code directly. portable The virtual machine code is not tied to any particular Read, lex, Source VM parse, CodeFile.vm architecture. Code Code semantics Interpreters work well with read Read very high-level, dynamic languages CodeFile.vm eval (APL,Prolog,ICON) where a lot is unknown at VM print compile-time (array bounds, etc). "Java−style interpreter" Interpret Kinds of Interpreters. Actions in an Interpreter Just−In−Time (JIT,Dynamic) Compiler Read, lex, Source VM parse, CodeFile.vm Internally, an interpreter consists of Code Code semantics 1 The interpreter engine, which executes the VM instructions. 2 Memory for storing user data. Often separated as a heap and a stack. Native 3 A stream of VM instructions. CodeFile.vm Compile Execute Code Actions in an Interpreter. Stack-Based Instruction Sets Instruct− ionstream add I := Get next instruction. Many virtual machine instruction sets (e.g. Java bytecode, store Forth) are stack based.
    [Show full text]
  • Translation and Evaluation
    Translation and evaluation A mental model for compile-time metaprogramming Document Number: P0992r0 Audience: SG7 Date: Apr 2, 2018 Reply-to: Andrew Sutton <[email protected]> The purpose of this note is to document my understanding of what it means to evaluate an expression at compile time. During my work on reflection and metaprogramming, I’ve often found myself getting lost in the various interactions between the compiler (or translator) and the evaluation subsystem. In particular, it is not always obvious what information is allowed to flow from the translation to evaluation and back. Value-based reflection and metaprogramming also add new wrinkles to the language, since they move both values and code in between translation and evaluation. Constructing a sound mental model early on is useful. This note should be helpful for anybody writing metaprogramming proposals. We begin by defining two terms: • Translation is a process concerned with lexical, syntactic, and semantic analysis of source code. The state of a translation maintains data related to the interpretation of text as programs, and includes tokens, ASTs, scopes, lookup tables, etc. • Evaluation (or execution) is a process concerned with computation. The state of an evaluation maintains data related to computation, and contains instruction streams, runtime stores, call stacks, etc. In the conventional use of C++, these two processes are physically separate: You compile a program, and then later you run the program. You cannot, during program execution, request the instantiation of a template. The translation process is no longer active at the time of execution. (This is not C#.) The constant expression facility allows a translation to request the evaluation of an expression at particular points in the program.
    [Show full text]
  • Compile-Time RTL Interpreters
    Compile-Time RTL Interpreters Sahand Kashani James R. Larus EPFL EPFL Switzerland Switzerland ABSTRACT encode the precise timing and physical structure of a circuit. Com- There is no doubt that high-level synthesis (HLS) tools have greatly pilers must preserve the semantics of their input programs, but eased accelerator development. However, many accelerators have pipelining changes the input-to-output latency of a circuit and thus already been written in a RTL language and cannot benefit from its original timing. the automated techniques commonly used by HLS tools. Pipelining Retiming is a transparent transformation that decreases a cir- is one such technique and must painstakingly be performed by a cuit’s clock period by moving existing registers around to balance human when a RTL accelerator needs to be migrated to a higher- delays. Current FPGA toolchains perform retiming under the hood, speed FPGA for example. This paper provides a first step to bringing but its applicability highly depends on the structure of the input HLS-like automated pipelining directly to accelerators written in a circuit. For example, Intel’s HyperFlex FPGA architecture has first- RTL language. class support for retiming by including bypassable registers on every routing segment of the device to help balance long routing 1 INTRODUCTION paths. However, these special registers do not support asynchro- Pipelining is the main mechanism for creating high-frequency ac- nous resets, clock enables, or initial values. Design registers which celerators. All designs—written directly in a RTL language or gen- use such features can therefore not be retimed to a routing segment erated through HLS—exhibit some form of pipelined structure to register to decrease the clock period.
    [Show full text]
  • (Just-In-Time) Compile-Time] MOP for Advanced Dispatching
    ALIA4J’s [(Just-In-Time) Compile-Time] MOP for Advanced Dispatching Christoph Bockisch Andreas Sewe Martin Zandberg Software Engineering group CASED Software Engineering group University of Twente Technische Universität Darmstadt, University of Twente Enschede, The Netherlands Gernamy Enschede, The Netherlands [email protected] [email protected] [email protected] Abstract In this paper, we propose such an approach for languages based The ALIA4J approach provides a framework for implementing ex- on dispatching, a language feature that has seen much experimen- ecution environments with support for advanced dispatching as tation by language designers in recent years, ranging from pred- found, e.g., in aspect-oriented or predicate dispatching languages. It icate dispatching [7] over aspect-oriented programming [12] and context-oriented programming [10] to various domain-specific lan- also defines an extensible meta-model acting as intermediate repre- 1 sentation for dispatching declarations, e.g., pointcut-advice or pred- guages. The ALIA4J project therefore provides an approach for icate methods. From the intermediate representation of all dispatch implementing such programming languages. As demonstrated in declarations in the program, the framework derives an execution our previous work [5], advanced dispatching is a mechanism that model for which ALIA4J specifies an generic execution strategy. subsumes the various styles of dispatching mentioned above. The meta-object protocol (MOP) formed by the meta-model and To make advanced dispatching implementable in a flexible fash- the framework are defined such that new programming language ion, we employ a meta-object protocol (MOP) together with a concepts can be implemented modularly.
    [Show full text]
  • Compile-Time Symbolic Differentiation Using C++ Expression Templates
    Abstract Template metaprogramming is a popular technique for implementing compile time mechanisms for numerical computing. We demonstrate how expression templates can be used for compile time symbolic differentiation of algebraic expressions in C++ computer programs. Given a positive integer N and an algebraic function of multiple variables, the compiler generates executable code for the Nth partial derivatives of the function. Compile-time simplification of the derivative expressions is achieved using recursive tem- plates. A detailed analysis indicates that current C++ compiler technology is already sufficient for practical use of our results, and highlights a number of issues where further improvements may be desirable. arXiv:1705.01729v1 [cs.SC] 4 May 2017 1 Compile-Time Symbolic Differentiation Using C++ Expression Templates Drosos Kourounis1, Leonidas Gergidis2, Michael Saunders3, Andrea Walther4, and Olaf Schenk1 1Università della Svizzera italiana 2University of Ioannina 3Stanford University 4University of Paderborn 1 Introduction Methods employed for the solution of scientific and engineering problems often require the evaluation of first or higher-order derivatives of algebraic functions. Gradient methods for optimization, Newton’s method for the solution of nonlinear systems, numerical solu- tion of stiff ordinary differential equations, stability analysis: these are examples of the major importance of derivative evaluation. Computing derivatives quickly and accurately improves both the efficiency and robustness of such numerical algorithms. Automatic differentiation tools are therefore increasingly available for the important programming languages; see www.autodiff.org. There are three well established ways to compute derivatives: Numerical derivatives. These use finite-difference approximations [22]. They avoid the difficulty of very long exact expressions, but introduce truncation errors and this usually affects the accuracy of further computations.
    [Show full text]