Guided Automatic Binary Parallelisation


Ruoyu (Kevin) Zhou, St John's College

This dissertation is submitted in July 2017 for the degree of Doctor of Philosophy in Computer Science.

Abstract

For decades, the software industry has amassed a vast repository of pre-compiled libraries and executables which are still valuable and actively in use. However, for a significant fraction of these binaries, most of the source code is absent or is written in old languages, making it practically impossible to recompile them for new generations of hardware. As the number of cores in chip multi-processors (CMPs) continues to scale, the performance of this legacy software becomes increasingly sub-optimal. Rewriting new optimised and parallel software would be a time-consuming and expensive task. Without source code, existing automatic performance-enhancing and parallelisation techniques are not applicable to legacy software or to parts of new applications linked with legacy libraries.

In this dissertation, three tools are presented to address the challenge of optimising legacy binaries. The first, GBR (Guided Binary Recompilation), is a tool that recompiles stripped application binaries without the need for source code or relocation information. GBR performs static binary analysis to determine how recompilation should be undertaken, and produces a domain-specific hint program. This hint program is loaded and interpreted by the GBR dynamic runtime, which is built on top of the open-source dynamic binary translator DynamoRIO. In this manner, complicated recompilation of the target binary is carried out to achieve optimised execution on a real system. The problem of limited dataflow and type information is addressed through cooperation between the hint program and JIT optimisation. The utility of GBR is demonstrated through software prefetch and vectorisation optimisations, which achieve performance improvements over the original native execution.

The second tool is called BEEP (Binary Emulator for Estimating Parallelism), an extension to GBR for binary instrumentation. BEEP is used to identify potential thread-level parallelism through static binary analysis and binary instrumentation. BEEP performs preliminary static analysis on binaries and encodes all statically-undecided questions into a hint program. The hint program is interpreted by GBR so that on-demand binary instrumentation code is inserted to answer the questions from runtime information. BEEP incorporates several parallel cost models to evaluate the identified parallelism under different parallelisation paradigms.

The third tool is named GABP (Guided Automatic Binary Parallelisation), an extension to GBR for parallelisation. GABP focuses on loops from sequential application binaries and automatically extracts thread-level parallelism from them on-the-fly, under the direction of the hint program, for efficient parallel execution. It employs a range of runtime schemes, such as thread-level speculation and synchronisation, to handle runtime data dependences. GABP achieves a geometric mean speedup of 1.91x over native sequential execution on binaries from SPEC CPU2006, running on a real x86-64 eight-core system. This performance is obtained for SPEC CPU2006 executables compiled from a variety of source languages and by different compilers.

Acknowledgements

First and foremost, I would like to thank my supervisor, Dr. Timothy Jones, who guided me throughout my years in Cambridge.
I appreciate the opportunities and challenges he offered me, and his valuable support that made this dissertation possible. I would never have got this far without his expertise and assistance. No matter how tough the technical challenges I encountered, he was always there for discussion, encouraging me to find solutions and carry on.

My sincere appreciation also goes out to:

• Edmund Grimley-Evans, my advisor at ARM, for his invaluable insight into many of the technical challenges I faced. During my placement at ARM, thanks to his enormous support, my understanding of dynamic binary translation strengthened significantly.
• Robert Mullins, for providing a constant source of encouragement and research ideas throughout my years in the Computer Lab, from MPhil to PhD, especially during the weekends.
• Niall Murphy, for his assistance in the implementation of parallel execution models.
• Tom Sun, for his assistance in finding parallelism in SPEC CPU2006 and providing many legacy binaries compiled by different compilers.
• Sam Ainsworth, for his assistance in providing the software prefetching optimisation algorithms and test suites.
• Dennis Zhang, for his assistance in resolving a few significant technical challenges.
• Negar Miralaei, who tragically died shortly before finishing her dissertation, for her kind support and encouragement during the hard times.
• Everyone in the Computer Architecture Group, for discussions and exchanges at scrum meetings every two days.
• St John's College, for funding my tuition fees during my study and research in Cambridge, and for providing the best scenery, environment, food and accommodation in Cambridge.
• ARM Ltd, for funding my tuition fees and maintenance during my study and research in Cambridge.
• My wife and parents, for always being so loving, caring and supportive during my research.

Contents

1 Introduction
1.1 Binary Recompilation
1.2 Automatic Parallelisation
1.3 Contributions
1.4 Structure of this Dissertation
2 Background
2.1 Dynamic Binary Translation
2.1.1 Translation Process
2.1.2 Just-In-Time Recompilation
2.1.3 Linking and Indirect Branch Handling
2.1.4 Trace Optimisation
2.1.5 Dynamic Binary Instrumentation
2.2 Static Binary Analysis
2.2.1 Binary Abstraction
2.2.2 Dependence Analysis
2.3 Automatic Parallelisation
2.3.1 Independent Multi-threading
2.3.2 Cyclic Multi-threading
2.3.3 Pipelined Multi-threading
2.3.4 Polyhedral Multi-threading
2.3.5 Speculative Multi-threading
2.3.5.1 Thread Level Speculation
2.3.5.2 Transactional Memory
2.3.6 Profile Guided Multi-threading
2.4 Summary
3 Recompiling Binaries As Instructed
3.1 System Overview
3.2 Design Choices
3.2.1 Dynamic Binary Translation
3.2.2 Static Binary Analysis
3.2.3 Hint Instruction Interface
3.3 Guided Binary Recompilation
3.3.1 Guided IR Modification
3.3.1.1 Case Study: JIT Prefetching Recompilation
3.3.2 Guided Binary Instrumentation
3.3.3 Partial Static Recompilation
3.3.3.1 Case Study: Binary Vectorisation
3.4 Related Work
3.5 Summary
4 Uncovering Parallelism In Binaries
4.1 Demand-driven Instrumentation
4.1.1 Binary Emulator For Estimating Parallelism
4.1.2 Benchmarks
4.1.3 Loop Coverage Profiling
4.1.4 Dynamic Data Dependence Profiling
4.2 Ideal Parallel Execution Models
4.2.1 DOACROSS Dataflow Model
4.2.2 Induction/Reduction Optimisation
4.2.3 Code Motion Model
4.3 Realistic Parallel Execution Model
4.3.1 Synchronisation Model
4.3.1.1 Ambiguous Static Binary Analysis
4.3.1.2 Thread Communication Latency
4.3.2 Thread-level Speculation Model
4.4 Related Work
4.5 Summary
5 Automatic Binary Parallelisation Framework
5.1 System Overview
5.2 Static Hint Generation
5.2.1 Loop Recognition
5.2.2 Dependence and Alias Analysis
5.2.3 Loop Characterisation and Selection
5.2.4 Hint Generation
5.3 Thread Management
5.3.1 Threading States
5.3.2 Thread Privatisation
5.3.2.1 Thread Local Storage
5.3.2.2 Register and Stack Privatisation
5.3.2.3 Heap Privatisation
5.4 DOALL Loop Parallelisation
5.4.1 Block Parallelisation
5.4.2 Cyclic Parallelisation
5.5 Resolving Runtime Data Dependencies
5.5.1 Just-In-Time Software Transactional Memory
5.5.1.1 Read and Write Buffer
5.5.1.2 Hint-Guided JIT Speculation
5.5.1.3 Speculative Signal Handlers
5.5.2 Speculative Value Prediction
5.5.2.1 Guided Speculative Value Prediction
5.5.2.2 Speculation or Synchronisation
5.5.2.3 Versioned Signal and Wait
5.6 Generic Loop Parallelisation
5.6.1 Hint Generation Strategy
5.6.2 Correctness and Verification
5.6.2.1 Static Consistency Verification
5.6.2.2 Dynamic Runtime Validation
5.7 Related Work
5.8 Summary
6 System Evaluation
6.1 Experimental Setup
6.2 Performance Evaluation
6.2.1 Overhead Analysis
6.2.2 Machine and System Variance
6.2.3 Cyclic vs Block Parallelisation
6.3 Irregular Loop Evaluation
6.4 Performance Comparison and Related Work
6.4.1 Comparison with Kotha on PolyBench
6.4.2 Comparison with gcc autopar
6.5 Summary
7 Conclusion and Future Work
7.1 Contribution
7.1.1 Guided Binary Recompilation
7.1.2 Binary Emulator for Estimating Parallelism
7.1.3 Guided Automatic Binary Parallelisation
7.2 Future Work
7.2.1 Standardisation ...
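The abstract above centres on hint programs: the static analyser encodes its findings and open questions as hints, and the GBR runtime interprets them while recompiling the binary (the hint instruction interface itself is covered in Section 3.2.3). Purely as an illustrative sketch, with invented names and a made-up hint format rather than the dissertation's actual interface, a guided recompiler could associate hints with code addresses and dispatch on them as each block is translated:

    #include <cstdint>
    #include <map>

    // Hypothetical hint kinds, for illustration only (not GBR's real interface).
    enum class HintKind { InsertPrefetch, Vectorise, ParalleliseLoop };

    struct Hint {
        HintKind kind;
        uint64_t target_addr;   // code address the hint applies to
        int64_t  operand;       // e.g. prefetch distance or loop identifier
    };

    // The static analyser would emit a table like this alongside the binary.
    using HintTable = std::multimap<uint64_t, Hint>;

    // Runtime side: when the JIT translates the block at address pc, it looks up
    // any hints attached to that address and rewrites the block accordingly.
    void apply_hints(const HintTable& hints, uint64_t pc /*, block IR would go here */) {
        auto range = hints.equal_range(pc);
        for (auto it = range.first; it != range.second; ++it) {
            switch (it->second.kind) {
                case HintKind::InsertPrefetch:  /* insert a prefetch "operand" iterations ahead */ break;
                case HintKind::Vectorise:       /* replace the scalar loop body with a SIMD form */ break;
                case HintKind::ParalleliseLoop: /* hand the loop to the parallel runtime */         break;
            }
        }
    }

The point of the split is that the expensive, whole-binary reasoning happens once, offline, while the runtime only performs cheap table lookups and local rewrites on the blocks it actually translates.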
Recommended publications
• Dynamic Recompiler (US 2002/0066086 A1, Randal N. Linden)
Dynamic Recompiler. United States Patent Application Publication, Pub. No. US 2002/0066086 A1, published May 30, 2002. Inventor: Randal N. Linden, Los Angeles, CA (US). Appl. No. 09/755,542, filed Jan. 5, 2001; non-provisional of provisional application No. 60/175,008, filed Jan. 7, 2000. Int. Cl. G06F 9/45; U.S. Cl. 717/145; 717/151.

Abstract: A method for dynamic recompilation of source software instructions for execution by a target processor, which considers not only the specific source instructions, but also the intent and purpose of the instructions, to translate and optimize a set of equivalent code for the target processor. The dynamic recompiler determines what the source operation code is trying to accomplish and the optimum way of doing it at the target processor, in an "interpolative" and context-sensitive fashion. The source instructions are processed in blocks of varying sizes by the dynamic recompiler, which considers the instructions that come before and after a current instruction to determine the most efficient approach out of several available approaches for encoding the operation code for the target processor to perform the equivalent tasks specified by the source instructions. The dynamic recompiler comprises a decoding stage, an optimization stage and an encoding stage.
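The abstract describes a recompiler built from decoding, optimisation and encoding stages that processes source instructions in blocks and looks at neighbouring instructions for context. A minimal toy sketch of that stage structure follows; the two-byte "source ISA", the redundancy rule and all names are invented for illustration and are not taken from the patent:

    #include <cstdint>
    #include <vector>

    // Toy IR for one decoded source instruction: {opcode, immediate}.
    struct IROp { uint8_t opcode; uint8_t imm; };

    // Decoding stage: split the raw source bytes into IR operations.
    static std::vector<IROp> decode_block(const std::vector<uint8_t>& src) {
        std::vector<IROp> ops;
        for (size_t i = 0; i + 1 < src.size(); i += 2)
            ops.push_back({src[i], src[i + 1]});
        return ops;
    }

    // Optimisation stage: context-sensitive, i.e. it may look at neighbouring
    // instructions. Here a toy rule drops an op repeated by the next one.
    static void optimise_block(std::vector<IROp>& ops) {
        std::vector<IROp> out;
        for (size_t i = 0; i < ops.size(); ++i) {
            bool redundant = (i + 1 < ops.size()) && ops[i].opcode == ops[i + 1].opcode;
            if (!redundant) out.push_back(ops[i]);
        }
        ops.swap(out);
    }

    // Encoding stage: emit bytes for the target processor (1:1 in this toy).
    static std::vector<uint8_t> encode_block(const std::vector<IROp>& ops) {
        std::vector<uint8_t> target;
        for (const IROp& op : ops) { target.push_back(op.opcode); target.push_back(op.imm); }
        return target;
    }

    std::vector<uint8_t> recompile_block(const std::vector<uint8_t>& src) {
        std::vector<IROp> ops = decode_block(src);
        optimise_block(ops);
        return encode_block(ops);
    }

A real dynamic recompiler would cache the encoded block and re-enter the cached copy on subsequent executions rather than re-translating it.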
• Dynamic Optimization of IA-32 Applications Under DynamoRIO, by Timothy Garnett
Dynamic Optimization of IA-32 Applications Under DynamoRIO, by Timothy Garnett. Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Computer Science.

Abstract: The ability of compilers to optimize programs statically is diminishing. The advent and increased use of shared libraries, dynamic class loading, and runtime binding means that the compiler has, even with profiling data that is difficult to obtain accurately, less and less knowledge of the runtime state of the program. Dynamic optimization systems attempt to fill this gap by providing the ability to optimize the application at runtime, when more of the system's state is known and very accurate profiling data can be obtained. This thesis presents two uses of the DynamoRIO runtime introspection and modification system to optimize applications at runtime. The first is a series of optimizations for interpreters running under DynamoRIO's logical program counter extension. The optimizations include extensions to DynamoRIO that allow the interpreter writer to add a few annotations to the source code of his interpreter that enable large amounts of optimization. For interpreters instrumented as such, improvements in performance of 20 to 60 percent were obtained. The second use is a proof of concept for accelerating the performance of IA-32 applications running on the Itanium processor by dynamically translating hot trace paths into Itanium IA-64 assembly through DynamoRIO, while allowing the less frequently executed portions of the application to run in Itanium's native IA-32 hardware emulation. This transformation yields speedups of almost 50 percent on small loop-intensive benchmarks.
• TACARM: A Dynamically Recompiling ARM Emulator
TACARM: A Dynamically Recompiling ARM Emulator. Third-year project report, 2000-2001, University of Warwick. Written by David Sharp, supervised by Graham Martin.

Abstract: Dynamically recompiling from one machine code to another can be used to emulate modern microprocessors at realistic speeds. This report is a discussion of the techniques used in implementing a dynamically recompiling emulator of an ARM processor for use in an emulation of a complete computer system.

Keywords: Emulation, Dynamic Recompilation, Binary Translation, Just-In-Time compilation, Virtual Machine, Microprocessor Simulation, Intermediate Code, ARM.

Contents: 1. Introduction (What is emulation?; Applications of emulation; Processor emulation techniques; This Project); 2. Analysis (Introduction to the ARM; Identifying the problem; Getting started; The System Design); 3. Disassembler (Purpose; ARM decoding; Design); 4. Interpreter (Purpose; The problem with JIT; The HotSpot alternative; Quantifying the JIT problem; Faster decoding; Interfaces; The emulation loop; Implementation; Debugging; Compatibility); 5. Recompilation (Overview; Methods of generating native code; The use of intermediate code); 6. Armlets, an Intermediate Code (Purpose; The 'explicit-implicit problem'; Options; ...)
  • A Brief History of Just-In-Time Compilation
A Brief History of Just-In-Time. John Aycock, University of Calgary.

Software systems have been using "just-in-time" (JIT) compilation techniques since the 1960s. Broadly, JIT compilation includes any translation performed dynamically, after a program has started execution. We examine the motivation behind JIT compilation and constraints imposed on JIT compilation systems, and present a classification scheme for such systems. This classification emerges as we survey forty years of JIT work, from 1960-2000.

Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors; K.2 [History of Computing]: Software. General Terms: Languages, Performance. Additional Key Words and Phrases: Just-in-time compilation, dynamic compilation.

1. INTRODUCTION

"Those who cannot remember the past are condemned to repeat it." (George Santayana, 1863-1952 [Bartlett 1992])

This oft-quoted line is all too applicable in computer science. Ideas are generated, explored, set aside, only to be reinvented years later. Such is the case with what is now called "just-in-time" (JIT) or dynamic compilation, which refers to translation that occurs after a program begins execution. Strictly speaking, JIT compilation systems ("JIT systems" for short) are completely unnecessary. [...] into a form that is executable on a target platform.

What is translated? The scope and nature of programming languages that require translation into executable form covers a wide spectrum. Traditional programming languages like Ada, C, and Java are included, as well as little languages [Bentley 1988] such as regular expressions. Traditionally, there are two approaches to translation: compilation and interpretation. Compilation translates one language into another (C to assembly language, for example) with the implication that the translated form will be more amenable ...
  • Program Dynamic Analysis Overview
Program Dynamic Analysis Overview. Topics: dynamic analysis; the JVM and Java bytecode [2]; a Java bytecode engineering library: ASM [1].

What is dynamic analysis? [3] The investigation of the properties of a running software system over one or more executions.

Has anyone done dynamic analysis? [3] Loggers, debuggers, profilers, ...

Why dynamic analysis? [3] There is a gap between the run-time structure and the code structure in OO programs: "Trying to understand one [structure] from the other is like trying to understand the dynamism of living ecosystems from the static taxonomy of plants and animals, and vice-versa." (Erich Gamma et al., Design Patterns)

Why dynamic analysis?
• Collect runtime execution information: resource usage, execution profiles.
• Program comprehension: find bugs in applications, identify hotspots.
• Program transformation: optimize or obfuscate programs; insert debugging or monitoring code; modify program behaviors on the fly.

How to do dynamic analysis? Instrumentation: modify the code or the runtime to monitor specific components in a system and collect data. Instrumentation approaches: source code modification, bytecode modification, VM modification. Then data analysis.

A running example, method call instrumentation: given a program's source code, how do you modify the code to record which method is called by main() and in what order?

    public class Test {
        public static void main(String[] args) {
            if (args.length == 0) return;
            if (args.length % 2 == 0) printEven();
            else printOdd();
        }
        // The slide snippet is cut off here; minimal stubs are added so the example compiles.
        static void printEven() { System.out.println("even"); }
        static void printOdd()  { System.out.println("odd"); }
    }
  • Dynamic Binary Translation & Instrumentation
Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. CK Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Kim Hazelwood (Intel), Vijay Janapa Reddi (University of Colorado). http://rogue.colorado.edu/Pin PLDI'05.

Instrumentation: insert extra code into programs to collect information about execution. Program analysis uses: code coverage, call-graph generation, memory-leak detection. Architectural study uses: processor simulation, fault injection. Existing binary-level instrumentation systems: static (ATOM, EEL, Etch, Morph) and dynamic (Dyninst, Vulcan, DTrace, Valgrind, Strata, DynamoRIO). Pin is a new dynamic binary instrumentation system.

A Pintool for tracing memory writes. The slide's callouts note that the analysis routine runs immediately before each write executes, that the instrumentation routine runs when an instruction is dynamically compiled, that the same source code works on the four architectures, and that Pin handles the different addressing modes and saves/restores application state automatically. The slide's code, de-garbled (the slide is cut off after the last call, so the closing lines are completed in the standard way):

    // Pintool: trace the address and size of every memory write.
    // Typical invocation: pin -t <pintool>.so -- <application>
    #include <iostream>
    #include "pin.H"

    FILE* trace;

    // Analysis routine: executed immediately before a write is executed.
    VOID RecordMemWrite(VOID* ip, VOID* addr, UINT32 size) {
        fprintf(trace, "%p: W %p %d\n", ip, addr, size);
    }

    // Instrumentation routine: executed when an instruction is dynamically compiled.
    VOID Instruction(INS ins, VOID* v) {
        if (INS_IsMemoryWrite(ins))
            INS_InsertCall(ins, IPOINT_BEFORE, AFUNPTR(RecordMemWrite),
                           IARG_INST_PTR, IARG_MEMORYWRITE_EA,
                           IARG_MEMORYWRITE_SIZE, IARG_END);
    }

    int main(int argc, char* argv[]) {
        PIN_Init(argc, argv);
        trace = fopen("atrace.out", "w");
        INS_AddInstrumentFunction(Instruction, 0);
        PIN_StartProgram();   // never returns
        return 0;
    }
  • Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation
Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, Kim Hazelwood. Intel Corporation and University of Colorado.

Abstract: Robust and powerful software instrumentation tools are essential for program analysis tasks such as profiling, performance evaluation, and bug detection. To meet this need, we have developed a new instrumentation system called Pin. Our goals are to provide easy-to-use, portable, transparent, and efficient instrumentation. Instrumentation tools (called Pintools) are written in C/C++ using Pin's rich API. Pin follows the model of ATOM, allowing the tool writer to analyze an application at the instruction level without the need for detailed knowledge of the underlying instruction set. The API is designed to be architecture independent whenever possible, making Pintools source compatible across different architectures.

1. Introduction: As software complexity increases, instrumentation, a technique for inserting extra code into an application to observe its behavior, is becoming more important. Instrumentation can be performed at various stages: in the source code, at compile time, post link time, or at run time. Pin is a software system that performs run-time binary instrumentation of Linux applications. The goal of Pin is to provide an instrumentation platform for building a wide variety of program analysis tools for multiple architectures. As a result, the design emphasizes ease-of-use, portability, transparency, efficiency, and robustness. This paper describes the design of Pin and shows how it provides these features.
• Transparent Dynamic Optimization: The Design and Implementation of Dynamo
Transparent Dynamic Optimization: The Design and Implementation of Dynamo. Vasanth Bala, Evelyn Duesterwald, Sanjeev Banerjia. HP Laboratories Cambridge, HPL-1999-78, June 1999.

Keywords: dynamic optimization, compiler, trace selection, binary translation.

Dynamic optimization refers to the runtime optimization of a native program binary. This report describes the design and implementation of Dynamo, a prototype dynamic optimizer that is capable of optimizing a native program binary at runtime. Dynamo is a realistic implementation, not a simulation, that is written entirely in user-level software, and runs on a PA-RISC machine under the HPUX operating system. Dynamo does not depend on any special programming language, compiler, operating system or hardware support. Contrary to intuition, we demonstrate that it is possible to use a piece of software to improve the performance of a native, statically optimized program binary, while it is executing. Dynamo not only speeds up real application programs, its performance improvement is often quite significant. For example, the performance of many +O2 optimized SPECint95 binaries running under Dynamo is comparable to the performance of their +O4 optimized versions running without Dynamo.

© Copyright Hewlett-Packard Company 1999. Contents: 1 Introduction; 2 Related Work; 3 Overview ...
• Infrastructures and Compilation Strategies for the Performance of Computing Systems, by Erven Rohou
Infrastructures and Compilation Strategies for the Performance of Computing Systems. Erven Rohou. Habilitation à diriger des recherches in computer science, Université de Rennes 1, 2015. HAL Id: tel-01237164, https://hal.inria.fr/tel-01237164, submitted on 2 Dec 2015. Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License.

Defended on 2 November 2015 before a jury composed of: Corinne Ancourt (rapporteur), Prof. Stefano Crespi Reghizzi (rapporteur), Prof. Frédéric Pétrot (rapporteur), Prof. Cédric Bastoul (examiner), Prof. Olivier Sentieys (president), André Seznec (examiner).

Contents: 1 Introduction; 1.1 Evolution of Hardware (Traditional Evolution; Multicore Era; Amdahl's Law; Foreseeable Future Architectures; Extremely Complex Micro-architectures); 1.2 Evolution of Software Ecosystem ...
  • Costing JIT Traces
Costing JIT Traces. John Magnus Morton, Patrick Maier, and Philip Trinder, University of Glasgow. [email protected], {Patrick.Maier, Phil.Trinder}@glasgow.ac.uk

Abstract. Tracing JIT compilation generates units of compilation that are easy to analyse and are known to execute frequently. The AJITPar project aims to investigate whether the information in JIT traces can be used to make better scheduling decisions or perform code transformations to adapt the code for a specific parallel architecture. To achieve this goal, a cost model must be developed to estimate the execution time of an individual trace. This paper presents the design and implementation of a system for extracting JIT trace information from the Pycket JIT compiler. We define three increasingly parametric cost models for Pycket traces. We perform a search of the cost model parameter space using genetic algorithms to identify the best weightings for those parameters. We test the accuracy of these cost models for predicting the cost of individual traces on a set of loop-based micro-benchmarks. We also compare the accuracy of the cost models for predicting whole program execution time over the Pycket benchmark suite. Our results show that the weighted cost model using the weightings found from the genetic algorithm search has the best accuracy.

1 Introduction. Modern hardware is increasingly multicore, and increasingly, software is required to exhibit decent parallel performance in order to match the hardware's potential. Writing performant parallel code is non-trivial for a fixed architecture, yet it is much harder if the target architecture is not known in advance, or if the code is meant to be portable across a range of architectures.
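The core idea is a cost model that maps the operations of a trace to an estimated execution time. A minimal sketch of a weighted cost model of that flavour is shown below; the categories, weights and function names are invented for illustration (the paper's actual models are fitted to Pycket traces, with weightings found by a genetic-algorithm search):

    #include <map>
    #include <string>
    #include <vector>

    // Illustrative weighted trace cost model (not the AJITPar implementation):
    // cost(trace) = sum over trace operations of a per-category weight.
    using Weights = std::map<std::string, double>;

    double trace_cost(const std::vector<std::string>& trace_ops, const Weights& w) {
        double cost = 0.0;
        for (const std::string& op : trace_ops) {
            auto it = w.find(op);
            cost += (it != w.end()) ? it->second : 1.0;  // default weight for unknown ops
        }
        return cost;
    }

    // Example: a guard-heavy loop trace costed with hypothetical weights.
    //   Weights w = {{"int_add", 1.0}, {"guard", 3.5}, {"getfield", 2.0}};
    //   double c = trace_cost({"getfield", "int_add", "guard"}, w);   // c == 6.5

A search procedure (genetic or otherwise) would then adjust the per-category weights until predicted costs best match measured trace execution times.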
  • An Infrastructure for Adaptive Dynamic Optimization
An Infrastructure for Adaptive Dynamic Optimization. Derek Bruening, Timothy Garnett, and Saman Amarasinghe. Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139.

Abstract: Dynamic optimization is emerging as a promising approach to overcome many of the obstacles of traditional static compilation. But while there are a number of compiler infrastructures for developing static optimizations, there are very few for developing dynamic optimizations. We present a framework for implementing dynamic analyses and optimizations. We provide an interface for building external modules, or clients, for the DynamoRIO dynamic code modification system. This interface abstracts away many low-level details of the DynamoRIO runtime system while exposing a simple and powerful, yet efficient and lightweight, API. This is achieved by restricting optimization units to linear streams of code and using adaptive levels of detail for representing instructions.

From the introduction: ... profile information improves the predictions but still falls short for programs whose behavior changes dynamically. Additionally, many software vendors are hesitant to ship binaries that are compiled with high levels of static optimization because they are hard to debug. Shifting optimizations to runtime solves these problems. Dynamic optimization allows the user to improve the performance of binaries without relying on how they were compiled. Furthermore, several types of optimizations are best suited to a dynamic optimization framework. These include adaptive, architecture-specific, and inter-module optimizations. Adaptive optimizations require instant responses to changes in program behavior. When performed statically, a single profiling run is taken to be representative of the program ...
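The client interface described here is the foundation that tools such as GBR in the dissertation above build on. Below is a minimal sketch of a DynamoRIO client, assuming DynamoRIO's standard C API (dr_client_main, dr_register_bb_event); it only counts memory-reading instructions in each translated block, but the basic-block callback is the point at which a hint-guided tool would rewrite the instruction list:

    /* Minimal DynamoRIO client sketch (not GBR itself): register a basic-block
       event and inspect each block as it is translated into the code cache. */
    #include "dr_api.h"

    static dr_emit_flags_t
    event_basic_block(void *drcontext, void *tag, instrlist_t *bb,
                      bool for_trace, bool translating)
    {
        int loads = 0;
        for (instr_t *ins = instrlist_first(bb); ins != NULL; ins = instr_get_next(ins)) {
            if (instr_reads_memory(ins))
                loads++;   /* a prefetching client would insert a prefetch here */
        }
        dr_fprintf(STDERR, "block %p: %d memory-reading instructions\n", tag, loads);
        return DR_EMIT_DEFAULT;   /* emit the (possibly modified) block as usual */
    }

    DR_EXPORT void
    dr_client_main(client_id_t id, int argc, const char *argv[])
    {
        dr_register_bb_event(event_basic_block);
    }

Such a client is typically loaded with DynamoRIO's drrun front end, e.g. drrun -c libmyclient.so -- ./app (the client and application names here are placeholders).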
• SoK: Using Dynamic Binary Instrumentation for Security (And How You May Get Caught Red Handed)
King's Research Portal. Document version: peer reviewed version. Citation for published version (APA): Cono D'Elia, D., Coppa, E., Palmaro, F., & Cavallaro, L. (Accepted/In press). SoK: Using Dynamic Binary Instrumentation for Security (And How You May Get Caught Red Handed). In ACM Asia Conference on Information, Computer and Communications Security (ASIACCS 2019).