SIMD Intrinsics on Managed Language Runtimes

Total Page:16

File Type:pdf, Size:1020Kb

SIMD Intrinsics on Managed Language Runtimes 1 56 2 SIMD Intrinsics on Managed Language Runtimes 57 3 58 4 Anonymous Author(s) 59 5 60 6 Abstract speedup, but neither languages like Java, JavaScript, Python, 61 7 Managed language runtimes such as the Java Virtual Ma- or Ruby, nor their managed runtimes, provide direct access 62 8 chine (JVM) provide adequate performance for a wide range to SIMD facilities. This means that SIMD optimizations, if 63 9 of applications, but at the same time, they lack much of the available at all, are left to the virtual machine (VM) and the 64 10 low-level control that performance-minded programmers built-in just-in-time (JIT) compiler to carry out automatically, 65 11 appreciate in languages like C/C++. One important example which often leads to suboptimal code. As a result, developers 66 12 of such a control is the intrinsics interface that expose in- may be pushed to use low-level languages such as C/C++ to 67 13 structions of SIMD (Single Instruction Multiple Data) vector gain access to the intrinsics API. But leaving the high-level 68 14 ISAs (Instruction Set Architectures). In this paper we present ecosystem of Java or other languages also means to abandon 69 15 an automatic approach for including native intrinsics in the many high-level abstractions that are key for the produc- 70 16 runtime of a managed language. Our implementation con- tive and efficient development of large-scale applications, 71 17 sists of two parts. First, for each vector ISA, we automatically including access to a large set of libraries. 72 18 generate the intrinsics API from the vendor-provided XML To reap the benefits of both high-level and low-level lan- 73 19 specification. Second, we employ a metaprogramming ap- guages, developers using managed languages may write low- 74 20 proach that enables programmers to generate and load native level native C/C++ functions, that are invoked by the man- 75 21 code at runtime. In this setting, programmers can use the aged runtime. In the case of Java, developers could use the 76 22 entire high-level language as a kind of macro system to de- Java Native Interface (JNI) to invoke C functions with spe- 77 23 fine new high-level vector APIs with zero overhead. Asan cific naming conventions. However, this process of dividing 78 24 example we show a variable precision API. We provide an the application logic between two languages creates a sig- 79 25 end-to-end implementation of our approach in the HotSpot nificant gap in the abstractions of the program, limits code 80 26 VM that supports all 5912 Intel SIMD intrinsics from MMX to reuse, and impedes clear separation of concerns. Further, the 81 27 AVX-512. Our benchmarks demonstrate that this combina- native code must be cross-complied ahead of time, which is 82 28 tion of SIMD and metaprogramming enables developers to an error-prone process that requires complicated pipelines 83 29 write high-performance vectorized code on an unmodified to support different operating systems and architectures, and 84 30 JVM that outperforms the auto-vectorizing HotSpot just-in- thus directly affects code maintenance and refactoring. 85 31 time (JIT) compiler and provides tight integration between To address these problems, we propose a systematic and 86 32 vectorized native code and the managed JVM ecosystem. automated approach that gives developers access to SIMD 87 33 instructions in the managed runtime, eliminating the need 88 CCS Concepts • Computer systems organization → 34 to write low-level C/C++ code. Our methodology supports 89 Single instruction, multiple data; • Software and its 35 the entire set of SIMD instructions in the form of embedded 90 engineering → Virtual machines; Translator writing 36 domain-specific languages (eDSLs) and consists of two parts. 91 systems and compiler generators; Source code genera- 37 First, for each architecture, we automatically generate ISA- 92 tion; Runtime environments; 38 specific eDSLs from the vendor’s XML specification ofthe 93 39 ACM Reference Format: SIMD intrinsics. Second, we provide the developer with the 94 40 Anonymous Author(s). 2017. SIMD Intrinsics on Managed Lan- means to use the SIMD eDSL to develop application logic, 95 guage Runtimes. In Proceedings of ACM SIGPLAN Conference on 41 which automatically generates native code inside the run- 96 Programming Languages, New York, NY, USA, January 01–03, 2017 42 time. Instead of executing each SIMD intrinsic immediately 97 (CGO’18), 11 pages. when invoked by the program, the eDSLs provide a staged or 43 https://doi.org/10.1145/nnnnnnn.nnnnnnn 98 44 deferred API, which accumulates intrinsic invocations along 99 45 with auxiliary scalar operations and control flow, batches 100 46 1 Introduction them together in a computation graph, and generates a na- 101 47 Managed high-level languages are designed to be general- tive kernel that executes them all at once, when requested 102 48 purpose and portable. For the programmer, the price is re- by the program. This makes it possible to interleave SIMD 103 49 duced access to detailed and low-level performance optimiza- intrinsics with the generic language constructs of the host 104 50 tions. In particular, SIMD vector instructions on modern ar- language without switching back and forth between native 105 51 chitectures offer significant parallelism, and thus potential and managed execution, enabling programmers to build both 106 52 high-level and low-level abstractions, while running SIMD 107 CGO’18, January 01–03, 2017, New York, NY, USA kernels at full speed. 53 2017. ACM ISBN 978-x-xxxx-xxxx-x/YY/MM...$15.00 108 54 https://doi.org/10.1145/nnnnnnn.nnnnnnn 109 55 1 110 CGO’18, January 01–03, 2017, New York, NY, USA Anon. 111 This paper makes the following contributions: • AVX / AVX2 - ISAs that expand the SSE operations to 166 112 1. We present the first systematic and automated ap- 256-bit wide registers and provide extra operations for 167 113 proach that supports the entire set of SIMD instruc- manipulating non-contiguous memory locations. 168 114 tions, automatically generated from the vendor spec- • FMA - an extension to SSE and AVX ISAs to provide 169 115 ification, in a managed high-level language. Theap- fused multiply add operations. 170 116 proach is applicable to other low-level instructions, • AVX-512 - extends AVX to operate on 512-bit registers 171 117 provided support for native code binding in the man- and consists of multiple parts called F / BW / CD / DQ / 172 118 aged high-level language. ER / IFMA52 / PF / VBMI / VL. 173 119 2. In doing so, we show how to use metaprogramming • KNC - the first production version of Intel’s Many In- 174 120 techniques and runtime code generation to give back tegrated Core (MIC) architecture that provides opera- 175 121 low-level control to developers in an environment that tions on 512-bit registers. 176 122 typically hides architecture-specific details. Additionally, we also include: 177 123 178 3. We provide an end-to-end implementation of our ap- • SVML - an intrinsics short vector math library, built on 124 proach within the HotSpot JVM, which provides access 179 125 top of the ISAs mentioned above. 180 to all Intel SIMD intrinsics from MMX to AVX-512. • ADX AES BMI1 126 Sets of smaller ISA extensions: / / / 181 4. We show how to use the SIMD eDSLs to build new BMI2 CLFLUSHOPT CLWB FP16C FSGSBASE FXSR 127 / / / / / / 182 abstractions using host language constructs. Program- INVPCID LZCNT MONITOR MPX PCLMULQDQ POPCNT 128 / / / / / 183 mers can use the entire managed language as a form PREFETCHWT1 RDPID RDRAND RDSEED RDTSCP 129 / / / / / / 184 of macro system to define new vectorized APIs with RTM / SHA / TSC / XSAVE / XSAVEC / XSAVEOPT / XSS 130 zero overhead. As an example, we present a “virtual 185 131 ISA” for variable precision arithmetic. These ISAs yield a large number of associated intrinsics: 186 132 5. We provide benchmarks that demonstrate significant arithmetics operations on both floating point and integer 187 133 performance gains of explicit SIMD code versus code numbers, intrinsics that operate with logical and bitwise 188 134 auto-vectorized by the HotSpot JIT compiler. operations, statistical and cryptographic operations, com- 189 135 parison, string operations and many more. Figure 1a gives a 190 Our work focuses on the JVM and Intel SIMD intrinsics 136 rough classification of the different classes of intrinsics. To 191 functions, but would equally apply to other platforms. For 137 ease the life of the developers, Intel provides an interactive 192 the implementation of computation graphs and runtime code 138 tool called Intel Intrinsics Guide [2] where each available 193 generation, we use the LMS (Lightweight Modular Staging) 139 intrinsics is listed, including a detailed description of the 194 compiler framework [16]. 140 underlying ISA instruction. Figure 1b shows the number of 195 141 available intrinsics: 5912 in total (of which 338 are shared 196 2 Background 142 between AVX-512 and KNC), classified into 24 groups. 197 143 We provide background on intrinsics functions, JVMs, and 198 144 the LMS metaprogramming framework that we use. 2.2 Java Virtual Machines 199 145 There are many active implementations of the JVM includ- 200 2.1 Intrinsics 146 ing the open source IBM V9 [5], Jikes RVM [3], Maxine [24], 201 147 Intrinsics are compiler-built-in functions that usually map JRockit [12], and the proprietary SAP JVM [19], CEE-J [21], 202 148 into a single or a small number of assembly instructions. Dur- JamaicaVM [20]. HotSpot remains the primary reference 203 149 ing compilation, they are inlined to remove calling overhead.
Recommended publications
  • An OSEK/VDX-Based Multi-JVM for Automotive Appliances
    AN OSEK/VDX-BASED MULTI-JVM FOR AUTOMOTIVE APPLIANCES Christian Wawersich, Michael Stilkerich, Wolfgang Schr¨oder-Preikschat University of Erlangen-Nuremberg Distributed Systems and Operating Systems Erlangen, Germany E-Mail: [email protected], [email protected], [email protected] Abstract: The automotive industry has recent ambitions to integrate multiple applications from different micro controllers on a single, more powerful micro controller. The outcome of this integration process is the loss of the physical isolation and a more complex monolithic software. Memory protection mechanisms need to be provided that allow for a safe co-existence of heterogeneous software from different vendors on the same hardware, in order to prevent the spreading of an error to the other applications on the controller and leaving an unclear responsi- bility situation. With our prototype system KESO, we present a Java-based solution for ro- bust and safe embedded real-time systems that does not require any hardware protection mechanisms. Based on an OSEK/VDX operating system, we offer a familiar system creation process to developers of embedded software and also provide the key benefits of Java to the embedded world. To the best of our knowledge, we present the first Multi-JVM for OSEK/VDX operating systems. We report on our experiences in integrating Java and an em- bedded operating system with focus on the footprint and the real-time capabili- ties of the system. Keywords: Java, Embedded Real-Time Systems, Automotive, Isolation, KESO 1. INTRODUCTION Modern cars contain a multitude of micro controllers for a wide area of tasks, ranging from convenience features such as the supervision of the car’s audio system to safety relevant functions such as assisting the braking system of the car.
    [Show full text]
  • Apache Harmony Project Tim Ellison Geir Magnusson Jr
    The Apache Harmony Project Tim Ellison Geir Magnusson Jr. Apache Harmony Project http://harmony.apache.org TS-7820 2007 JavaOneSM Conference | Session TS-7820 | Goal of This Talk In the next 45 minutes you will... Learn about the motivations, current status, and future plans of the Apache Harmony project 2007 JavaOneSM Conference | Session TS-7820 | 2 Agenda Project History Development Model Modularity VM Interface How Are We Doing? Relevance in the Age of OpenJDK Summary 2007 JavaOneSM Conference | Session TS-7820 | 3 Agenda Project History Development Model Modularity VM Interface How Are We Doing? Relevance in the Age of OpenJDK Summary 2007 JavaOneSM Conference | Session TS-7820 | 4 Apache Harmony In the Beginning May 2005—founded in the Apache Incubator Primary Goals 1. Compatible, independent implementation of Java™ Platform, Standard Edition (Java SE platform) under the Apache License 2. Community-developed, modular architecture allowing sharing and independent innovation 3. Protect IP rights of ecosystem 2007 JavaOneSM Conference | Session TS-7820 | 5 Apache Harmony Early history: 2005 Broad community discussion • Technical issues • Legal and IP issues • Project governance issues Goal: Consolidation and Consensus 2007 JavaOneSM Conference | Session TS-7820 | 6 Early History Early history: 2005/2006 Initial Code Contributions • Three Virtual machines ● JCHEVM, BootVM, DRLVM • Class Libraries ● Core classes, VM interface, test cases ● Security, beans, regex, Swing, AWT ● RMI and math 2007 JavaOneSM Conference | Session TS-7820 |
    [Show full text]
  • CS 110 Discussion 15 Programming with SIMD Intrinsics
    CS 110 Discussion 15 Programming with SIMD Intrinsics Yanjie Song School of Information Science and Technology May 7, 2020 Yanjie Song (S.I.S.T.) CS 110 Discussion 15 2020.05.07 1 / 21 Table of Contents 1 Introduction on Intrinsics 2 Compiler and SIMD Intrinsics 3 Intel(R) SDE 4 Application: Horizontal sum in vector Yanjie Song (S.I.S.T.) CS 110 Discussion 15 2020.05.07 2 / 21 Table of Contents 1 Introduction on Intrinsics 2 Compiler and SIMD Intrinsics 3 Intel(R) SDE 4 Application: Horizontal sum in vector Yanjie Song (S.I.S.T.) CS 110 Discussion 15 2020.05.07 3 / 21 Introduction on Intrinsics Definition In computer software, in compiler theory, an intrinsic function (or builtin function) is a function (subroutine) available for use in a given programming language whose implementation is handled specially by the compiler. Yanjie Song (S.I.S.T.) CS 110 Discussion 15 2020.05.07 4 / 21 Intrinsics in C/C++ Compilers for C and C++, of Microsoft, Intel, and the GNU Compiler Collection (GCC) implement intrinsics that map directly to the x86 single instruction, multiple data (SIMD) instructions (MMX, Streaming SIMD Extensions (SSE), SSE2, SSE3, SSSE3, SSE4). Yanjie Song (S.I.S.T.) CS 110 Discussion 15 2020.05.07 5 / 21 x86 SIMD instruction set extensions MMX (1996, 64 bits) 3DNow! (1998) Streaming SIMD Extensions (SSE, 1999, 128 bits) SSE2 (2001) SSE3 (2004) SSSE3 (2006) SSE4 (2006) Advanced Vector eXtensions (AVX, 2008, 256 bits) AVX2 (2013) F16C (2009) XOP (2009) FMA FMA4 (2011) FMA3 (2012) AVX-512 (2015, 512 bits) Yanjie Song (S.I.S.T.) CS 110 Discussion 15 2020.05.07 6 / 21 SIMD extensions in other ISAs There are SIMD instructions for other ISAs as well, e.g.
    [Show full text]
  • PGI Compilers
    USER'S GUIDE FOR X86-64 CPUS Version 2019 TABLE OF CONTENTS Preface............................................................................................................ xii Audience Description......................................................................................... xii Compatibility and Conformance to Standards............................................................xii Organization................................................................................................... xiii Hardware and Software Constraints.......................................................................xiv Conventions.................................................................................................... xiv Terms............................................................................................................ xv Related Publications.........................................................................................xvii Chapter 1. Getting Started.....................................................................................1 1.1. Overview................................................................................................... 1 1.2. Creating an Example..................................................................................... 2 1.3. Invoking the Command-level PGI Compilers......................................................... 2 1.3.1. Command-line Syntax...............................................................................2 1.3.2. Command-line Options............................................................................
    [Show full text]
  • Intel Hardware Intrinsics in .NET Core
    Han Lee, Intel Corporation [email protected] Notices and Disclaimers No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps. Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Learn more at intel.com, or from the OEM or retailer. The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request. No product or component can be absolutely secure. Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548- 4725 or by visiting www.intel.com/design/literature.htm. Intel, the Intel logo, and other Intel product and solution names in this presentation are trademarks of Intel *Other names and brands may be claimed as the property of others © Intel Corporation. 2 What Do These Have in Common? Domain Example Image processing Color extraction High performance computing (HPC) Matrix multiplication Data processing Hamming code Text processing UTF-8 conversion Data structures Bit array Machine learning Classification For performance sensitive code, consider using Intel® hardware intrinsics 3 Objectives .
    [Show full text]
  • Oracle Lifetime Support Policy for Oracle Fusion Middleware Guide
    ORACLE INFORMATION-DRIVEN SUPPORT Oracle Lifetime Support Policy Oracle Fusion Middleware Oracle Fusion Middleware 8 Oracle Fusion Middleware Releases 8 Application Development Tools 9 Oracle’s Application Development Tools 9 Oracle’s GraalVM Enterprise Releases 9 Oracle Cloud Application Foundation Releases 10 Oracle’s Sun and Glassfish Application Server Releases 12 Oracle’s Java Releases 13 Oracle’s Sun JDK Releases 13 Oracle’s Blockchain Platform Releases 14 Business Intelligence 14 Oracle Business Intelligence EE Releases 14 Oracle’s HyperRoll Releases 21 Oracle’s Siebel Technology Releases 22 Oracle’s Siebel Applications Releases 22 Oracle Big Data Discovery Releases 23 Oracle Endeca Information Discovery Releases 23 Oracle’s Endeca Releases 25 Master Data Management and Data Integrator 26 Oracle’s GoldenGate Releases 26 Oracle Data Integrator Releases 28 Oracle Data Integrator (Formerly Sunopsis) Releases 28 Oracle’s Sun Master Data Management and Data Integrator Releases 28 Oracle’s Silver Creek and EDQP Releases 29 Oracle's Datanomic and EDQ Releases 30 Oracle WebCenter Portal Releases 31 Oracle’s Sun Portal Releases 32 Oracle WebCenter Content Releases 32 Oracle’s Stellent Releases (Enterprise Content Management) 34 Oracle’s Captovation Releases (Enterprise Content Management) 35 Oracle WebCenter Sites Releases 36 Oracle FatWire Releases (WebCenter Sites) 36 Oracle Identity and Access Management Releases 37 Oracle’s Bharosa Releases 40 Oracle’s Passlogix Releases 41 Oracle’s Bridgestream Releases 41 Oracle’s Oblix Releases 42
    [Show full text]
  • Secure Control Applications in Smart Homes and Buildings
    Die approbierte Originalversion dieser Dissertation ist in der Hauptbibliothek der Technischen Universität Wien aufgestellt und zugänglich. http://www.ub.tuwien.ac.at The approved original version of this thesis is available at the main library of the Vienna University of Technology. http://www.ub.tuwien.ac.at/eng Secure Control Applications in Smart Homes and Buildings DISSERTATION submitted in partial fulfillment of the requirements for the degree of Doktor der Technischen Wissenschaften by Mag. Dipl.-Ing. Friedrich Praus Registration Number 0025854 to the Faculty of Informatics at the Vienna University of Technology Advisor: Ao.Univ.Prof. Dipl.-Ing. Dr.techn. Wolfgang Kastner The dissertation has been reviewed by: Ao.Univ.Prof. Dipl.-Ing. Prof. Dr. Peter Palensky Dr.techn. Wolfgang Kastner Vienna, 1st October, 2015 Mag. Dipl.-Ing. Friedrich Praus Technische Universität Wien A-1040 Wien Karlsplatz 13 Tel. +43-1-58801-0 www.tuwien.ac.at Erklärung zur Verfassung der Arbeit Mag. Dipl.-Ing. Friedrich Praus Hallergasse 11/29, A-1110 Wien Hiermit erkläre ich, dass ich diese Arbeit selbständig verfasst habe, dass ich die verwen- deten Quellen und Hilfsmittel vollständig angegeben habe und dass ich die Stellen der Arbeit – einschließlich Tabellen, Karten und Abbildungen –, die anderen Werken oder dem Internet im Wortlaut oder dem Sinn nach entnommen sind, auf jeden Fall unter Angabe der Quelle als Entlehnung kenntlich gemacht habe. Wien, 1. Oktober 2015 Friedrich Praus v Kurzfassung Die zunehmende Integration von heterogenen Gebäudeautomationssystemen ermöglicht gesteigerten Komfort, Energieeffizienz, verbessertes Gebäudemanagement, Nachhaltig- keit sowie erweiterte Anwendungsgebiete, wie beispielsweise “Active Assisted Living” Szenarien. Diese Smart Homes und Gebäude sind heutzutage als dezentrale Systeme rea- lisiert, in denen eingebettete Geräte Prozessdaten über ein Netzwerk austauschen.
    [Show full text]
  • Thread Scheduling in Multi-Core Operating Systems Redha Gouicem
    Thread Scheduling in Multi-core Operating Systems Redha Gouicem To cite this version: Redha Gouicem. Thread Scheduling in Multi-core Operating Systems. Computer Science [cs]. Sor- bonne Université, 2020. English. tel-02977242 HAL Id: tel-02977242 https://hal.archives-ouvertes.fr/tel-02977242 Submitted on 24 Oct 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Ph.D thesis in Computer Science Thread Scheduling in Multi-core Operating Systems How to Understand, Improve and Fix your Scheduler Redha GOUICEM Sorbonne Université Laboratoire d’Informatique de Paris 6 Inria Whisper Team PH.D.DEFENSE: 23 October 2020, Paris, France JURYMEMBERS: Mr. Pascal Felber, Full Professor, Université de Neuchâtel Reviewer Mr. Vivien Quéma, Full Professor, Grenoble INP (ENSIMAG) Reviewer Mr. Rachid Guerraoui, Full Professor, École Polytechnique Fédérale de Lausanne Examiner Ms. Karine Heydemann, Associate Professor, Sorbonne Université Examiner Mr. Etienne Rivière, Full Professor, University of Louvain Examiner Mr. Gilles Muller, Senior Research Scientist, Inria Advisor Mr. Julien Sopena, Associate Professor, Sorbonne Université Advisor ABSTRACT In this thesis, we address the problem of schedulers for multi-core architectures from several perspectives: design (simplicity and correct- ness), performance improvement and the development of application- specific schedulers.
    [Show full text]
  • Optimizing Subroutines in Assembly Language an Optimization Guide for X86 Platforms
    2. Optimizing subroutines in assembly language An optimization guide for x86 platforms By Agner Fog. Copenhagen University College of Engineering. Copyright © 1996 - 2012. Last updated 2012-02-29. Contents 1 Introduction ....................................................................................................................... 4 1.1 Reasons for using assembly code .............................................................................. 5 1.2 Reasons for not using assembly code ........................................................................ 5 1.3 Microprocessors covered by this manual .................................................................... 6 1.4 Operating systems covered by this manual................................................................. 7 2 Before you start................................................................................................................. 7 2.1 Things to decide before you start programming .......................................................... 7 2.2 Make a test strategy.................................................................................................... 9 2.3 Common coding pitfalls............................................................................................. 10 3 The basics of assembly coding........................................................................................ 12 3.1 Assemblers available ................................................................................................ 12 3.2 Register set
    [Show full text]
  • Automatic SIMD Vectorization of Fast Fourier Transforms for the Larrabee and AVX Instruction Sets
    Automatic SIMD Vectorization of Fast Fourier Transforms for the Larrabee and AVX Instruction Sets Daniel S. McFarlin Volodymyr Arbatov Franz Franchetti Department of Electrical and Department of Electrical and Department of Electrical and Computer Engineering Computer Engineering Computer Engineering Carnegie Mellon University Carnegie Mellon University Carnegie Mellon University Pittsburgh, PA USA 15213 Pittsburgh, PA USA 15213 Pittsburgh, PA USA 15213 [email protected] [email protected] [email protected] Markus Püschel Department of Computer Science ETH Zurich 8092 Zurich, Switzerland [email protected] ABSTRACT General Terms The well-known shift to parallelism in CPUs is often associated Performance with multicores. However another trend is equally salient: the increasing parallelism in per-core single-instruction multiple-date Keywords (SIMD) vector units. Intel’s SSE and IBM’s VMX (compatible to Autovectorization, super-optimization, SIMD, program generation, AltiVec) both offer 4-way (single precision) floating point, but the Fourier transform recent Intel instruction sets AVX and Larrabee (LRB) offer 8-way and 16-way, respectively. Compilation and optimization for vector extensions is hard, and often the achievable speed-up by using vec- 1. Introduction torizing compilers is small compared to hand-optimization using Power and area constraints are increasingly dictating microar- intrinsic function interfaces. Unfortunately, the complexity of these chitectural developments in the commodity and high-performance intrinsics interfaces increases considerably with the vector length, (HPC) CPU space. Consequently, the once dominant approach of making hand-optimization a nightmare. In this paper, we present a dynamically extracting instruction-level parallelism (ILP) through peephole-based vectorization system that takes as input the vector monolithic out-of-order microarchitectures is being supplanted by instruction semantics and outputs a library of basic data reorgani- designs with simpler, replicable architectural features.
    [Show full text]
  • Micro Focus Visual COBOL 6.0 for Visual Studio
    Micro Focus Visual COBOL 6.0 for Visual Studio Release Notes Micro Focus The Lawn 22-30 Old Bath Road Newbury, Berkshire RG14 1QN UK http://www.microfocus.com © Copyright 2020 Micro Focus or one of its affiliates. MICRO FOCUS, the Micro Focus logo and Visual COBOL are trademarks or registered trademarks of Micro Focus or one of its affiliates. All other marks are the property of their respective owners. 2020-06-16 ii Contents Micro Focus Visual COBOL 6.0 for Visual Studio Release Notes ..................5 What's New ......................................................................................................... 6 .NET Core ........................................................................................................................... 6 COBOL Application Console Size .......................................................................................6 COBOL Language Enhancements ......................................................................................6 Code Analysis ..................................................................................................................... 7 Code Analyzer Refactoring ................................................................................................. 7 Compiler Directives ............................................................................................................. 7 Containers ...........................................................................................................................8 Database Access - DB2 ECM ............................................................................................
    [Show full text]
  • Design and Analysis of a Scala Benchmark Suite for the Java Virtual Machine
    Design and Analysis of a Scala Benchmark Suite for the Java Virtual Machine Entwurf und Analyse einer Scala Benchmark Suite für die Java Virtual Machine Zur Erlangung des akademischen Grades Doktor-Ingenieur (Dr.-Ing.) genehmigte Dissertation von Diplom-Mathematiker Andreas Sewe aus Twistringen, Deutschland April 2013 — Darmstadt — D 17 Fachbereich Informatik Fachgebiet Softwaretechnik Design and Analysis of a Scala Benchmark Suite for the Java Virtual Machine Entwurf und Analyse einer Scala Benchmark Suite für die Java Virtual Machine Genehmigte Dissertation von Diplom-Mathematiker Andreas Sewe aus Twistrin- gen, Deutschland 1. Gutachten: Prof. Dr.-Ing. Ermira Mezini 2. Gutachten: Prof. Richard E. Jones Tag der Einreichung: 17. August 2012 Tag der Prüfung: 29. Oktober 2012 Darmstadt — D 17 For Bettina Academic Résumé November 2007 – October 2012 Doctoral studies at the chair of Prof. Dr.-Ing. Er- mira Mezini, Fachgebiet Softwaretechnik, Fachbereich Informatik, Techni- sche Universität Darmstadt October 2001 – October 2007 Studies in mathematics with a special focus on com- puter science (Mathematik mit Schwerpunkt Informatik) at Technische Uni- versität Darmstadt, finishing with a degree of Diplom-Mathematiker (Dipl.- Math.) iii Acknowledgements First and foremost, I would like to thank Mira Mezini, my thesis supervisor, for pro- viding me with the opportunity and freedom to pursue my research, as condensed into the thesis you now hold in your hands. Her experience and her insights did much to improve my research as did her invaluable ability to ask the right questions at the right time. I would also like to thank Richard Jones for taking the time to act as secondary reviewer of this thesis.
    [Show full text]