Effective Extensible Programming: Unleashing Julia on Gpus

Total Page:16

File Type:pdf, Size:1020Kb

Effective Extensible Programming: Unleashing Julia on Gpus IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 1 Effective Extensible Programming: Unleashing Julia on GPUs Tim Besard, Christophe Foket and Bjorn De Sutter, Member, IEEE Abstract—GPUs and other accelerators are popular devices for accelerating compute-intensive, parallelizable applications. However, programming these devices is a difficult task. Writing efficient device code is challenging, and is typically done in a low-level programming language. High-level languages are rarely supported, or do not integrate with the rest of the high-level language ecosystem. To overcome this, we propose compiler infrastructure to efficiently add support for new hardware or environments to an existing programming language. We evaluate our approach by adding support for NVIDIA GPUs to the Julia programming language. By integrating with the existing compiler, we significantly lower the cost to implement and maintain the new compiler, and facilitate reuse of existing application code. Moreover, use of the high-level Julia programming language enables new and dynamic approaches for GPU programming. This greatly improves programmer productivity, while maintaining application performance similar to that of the official NVIDIA CUDA toolkit. Index Terms—Graphics processors, very high-level languages, code generation F 1 INTRODUCTION the implementation but hinders long-term maintainability To satisfy ever higher computational demands, hardware when the host language gains features or changes semantics. vendors and software developers look at accelerators, spe- It also forces users to learn and deal with the inevitable cialized processors that are optimized for specific, typically divergence between individual language implementations. parallel workloads, and perform much better at them than This paper presents a vision in which the high-level general-purpose processors [1], [2], [3], [4], [5]. Multiple hard- language compiler exposes interfaces to alter the compilation ware vendors are working on such accelerators and release process (Section2). Implementations of the language for many new devices every year. These rapid developments other platforms can use these interfaces together with other make it difficult for developers to keep up and gain sufficient extensible programming patterns to ensure that source code experience programming the devices. This is exacerbated by is compiled to compatible and efficient accelerator machine the fact that many vendors only provide low-level toolchains, code. To demonstrate the power of this approach, we have such as CUDA or OpenCL, which offer full control to reach added such interfaces to the reference compiler of the Julia peak performance at the cost of developer productivity [6]. language (Section4), and used it to add support for NVIDIA To improve developer productivity, programmers com- GPUs (Section5). We show that the resulting toolchain makes monly use high-level programming languages. However, it possible to write generic and high-level GPU code, while these languages often rely on techniques and functionality performing similar to low-level CUDA C (Section6). All code that are hard to implement or even incompatible with implementing this framework is available as open-source execution on typical accelerators, such as interpretation, software on GitHub, and can be easily installed using the tracing Just-in-Time (JIT) compilation, or reliance on a Julia package manager. Our contributions are as follows: managed runtime library. To remedy this, implementations • We introduce interfaces for altering the compilation of high-level languages for accelerators generally target a arXiv:1712.03112v1 [cs.PL] 8 Dec 2017 process, and implement them in the Julia compiler. derived version of the language, such as a restricted subset • We present an implementation of the Julia language or an embedded Domain Specific Language (DSL), in which for NVIDIA GPUs, using the introduced interfaces. incompatible features have been redefined or adjusted. • We analyze the performance of CUDA benchmarks Modern extensible languages offer the means to realize from the Rodinia benchmark suite, ported to Julia. such programming language derivatives [7]. For example, We show that code generated by our GPU compiler Lisp-like languages feature powerful macros for processing performs similar to CUDA C code compiled with syntax, Python’s decorators make it possible to change the NVIDIA’s reference compiler. behavior of functions and methods, and the Julia program- • We demonstrate high-level programming with this ming language supports introspection of each of its Inter- toolchain, and show that Julia GPU code can be highly mediate Representations (IRs). However, these facilities do generic and flexible, without sacrificing performance. not encourage reuse of existing compiler functionality. Most derived languages use a custom compiler, which simplifies • T. Besard, C. Foket and B. De Sutter are with the Department of Electronics and Information Systems, Ghent University, Belgium. {tim.besard,christophe.foket}@ugent.be Corresponding author: [email protected] This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS 2 Host extensive Device Host minimal Device Source code code reuse Source code Python code code reuse nopython code Main compiler Device package CPython Numba Front-end Parser Interpreter Front-end extensions High-level High-level High-level AST Numba IR IR optims optims Middle-end Codegen Codegen Middle-end extensions High-level Low-level Low-level Low-level Bytecode LLVM IR optims optims IR optims CPU GPU Back-end Back-end interpreter codegen Figure 1. Abstract overview of the proposed toolchain. Figure 2. CPython and Numba compilation processes for host and device. 2 VISION Our proposed compiler interfaces make it possible to Our proposed solution to the difficulty in integrating high- share functionality between an existing language imple- level languages and accelerators is a set of interfaces to mentation and external derivatives, avoiding needless reim- the high-level language’s general purpose compiler, that plementation of compiler functionality by reconfiguring provide fine-grained access to the different IRs and to the the existing compiler to generate code that is compatible processes that generate and optimize those IRs. With these with the platform at hand. This not only facilitates external interfaces, developers can influence the existing language language implementations, but also improves compatibility implementation and, e.g., improve compatibility with new with existing code as it avoids the inevitable differences hardware or run-time environments without the need for a between individual compiler implementations. custom compiler or an embedded language subset. In some cases, even more reuse of existing infrastructure Figure1 shows an overview of the proposed toolchain. than suggested in Figure1 is possible. When the back-end An external device package uses the introduced interfaces compiler used in the general-purpose tool flow can also target to add support for new hardware, without modifying the accelerators, there is no need to reimplement a device back existing language implementation. For example, it could end in the device package. Instead, that existing back-end refuse to generate code for certain language features, such as compiler can then be used for host and device code. Even if exceptions or dynamic memory allocations, or replace their this is not the case, it might not be necessary to reimplement code with compatible or optimized alternatives. a full device back end in the device package: If third-party Such a setup has multiple advantages. For one, it keeps device code generators can be reused, the device back end the existing language implementation stable, while new only has to translate the low-level IR code to an IR accepted implementations can be developed independently as external by that third-party code generator. packages. This makes it easier to experiment, as these Conceptually, the compiler interfaces shown in Figure1 packages do not need to meet the support, quality, or are generally applicable. Their actual instantiation, however, licensing requirements of the existing implementation. It will be specific to the host language and accelerator at hand. also makes it easier to cope with the rapid development pace We expect further research into such interfaces to generalize of accelerator hardware, providing the means for vendors to the design and improve reusability across languages and contribute more effectively to the language ecosystem. accelerators. For now, we will design the interfaces around a Another important advantage is the ability to reuse single language and accelerator platform. the existing language implementation, whereas current For this work, we chose to target Graphics Processing implementations of high-level languages for accelerators Units (GPUs), massively parallel accelerators with a dis- often reimplement large parts of the compiler. For example, tinctive enough architecture to make optimization worth- Numba is a JIT compiler for Python, building on the while, yet broadly usable for many kinds of applications CPython reference language implementation. As Figure2 (as explained in Section 3.1). Specifically, we focus on shows, the Numba compiler takes Python bytecode and NVIDIA GPUs with the Compute
Recommended publications
  • CPAS04 Authenticator Options Version 1.0 11 November 2015
    GSM Association Non-confidential Official Document PDATA.03 - CPAS04 Authenticator Options CPAS04 Authenticator Options Version 1.0 11 November 2015 This is a Non-binding Permanent Reference Document of the GSMA Security Classification: Non-confidential Access to and distribution of this document is restricted to the persons permitted by the security classification. This document is confidential to the Association and is subject to copyright protection. This document is to be used only for the purposes for which it has been supplied and information contained in it must not be disclosed or in any other way made available, in whole or in part, to persons other than those permitted under the security classification without the prior written approval of the Association. Copyright Notice Copyright © 2016 GSM Association Disclaimer The GSM Association (“Association”) makes no representation, warranty or undertaking (express or implied) with respect to and does not accept any responsibility for, and hereby disclaims liability for the accuracy or completeness or timeliness of the information contained in this document. The information contained in this document may be subject to change without prior notice. Antitrust Notice The information contain herein is in full compliance with the GSM Association’s antitrust compliance policy. V1.0 Page 1 of 38 GSM Association Non-confidential Official Document PDATA.03 - CPAS04 Authenticator Options Table of Contents 1 Introduction 3 1.1 Overview 3 1.2 Scope 3 1.3 Abbreviations 3 1.4 References 5 2 Level
    [Show full text]
  • Advancing Automatic Programming: Regeneration, Meta-Circularity, and Two-Sided Interfaces
    SOFTNET 2019 Advancing Automatic Programming: Regeneration, Meta-circularity, and Two-Sided Interfaces HERWIGMANNAERT NOVEMBER 2 7 , 2 0 1 9 • Automatic Programming • Toward Automatic Regeneration • Creating Meta-Circular Regeneration • On Meta-Programming Interfaces ADVANCING AUTOMATIC PROGRAMMING • Concluding Facts and Thoughts Overview • Questions and Discussion • Automatic Programming • Concept and Trends • Remaining Challenges • Toward Automatic Regeneration ADVANCING AUTOMATIC PROGRAMMING • Creating Meta-Circular Regeneration • On Meta-Programming Interfaces Overview • Concluding Facts and Thoughts • Questions and Discussion Automatic Programming • Automatic programming: • The act of automatically generating source code from a model or template • Sometimes distinguished from code generation, as performed by a compiler • Has always been a euphemism for programming in a higher-level language than was then available to the programmer [David Parnas] • Also referred to a generative or meta-programming • To manufacture software components in an automated way • It is as old as programming itself: System.out.println(“Hello world.”); System.out.println(“System.out.println(\“Hello world.\”);”); The Need for Automatic Programming • Goal is and has always been to improve programmer productivity • In general, to manufacture software components in an automated way, in the same way as automation in the industrial revolution, would: • Increase programming productivity • Consolidate programming knowledge • Eliminate human programming errors • Such
    [Show full text]
  • Turing's Influence on Programming — Book Extract from “The Dawn of Software Engineering: from Turing to Dijkstra”
    Turing's Influence on Programming | Book extract from \The Dawn of Software Engineering: from Turing to Dijkstra" Edgar G. Daylight∗ Eindhoven University of Technology, The Netherlands [email protected] Abstract Turing's involvement with computer building was popularized in the 1970s and later. Most notable are the books by Brian Randell (1973), Andrew Hodges (1983), and Martin Davis (2000). A central question is whether John von Neumann was influenced by Turing's 1936 paper when he helped build the EDVAC machine, even though he never cited Turing's work. This question remains unsettled up till this day. As remarked by Charles Petzold, one standard history barely mentions Turing, while the other, written by a logician, makes Turing a key player. Contrast these observations then with the fact that Turing's 1936 paper was cited and heavily discussed in 1959 among computer programmers. In 1966, the first Turing award was given to a programmer, not a computer builder, as were several subsequent Turing awards. An historical investigation of Turing's influence on computing, presented here, shows that Turing's 1936 notion of universality became increasingly relevant among programmers during the 1950s. The central thesis of this paper states that Turing's in- fluence was felt more in programming after his death than in computer building during the 1940s. 1 Introduction Many people today are led to believe that Turing is the father of the computer, the father of our digital society, as also the following praise for Martin Davis's bestseller The Universal Computer: The Road from Leibniz to Turing1 suggests: At last, a book about the origin of the computer that goes to the heart of the story: the human struggle for logic and truth.
    [Show full text]
  • Modern Password Security for System Designers What to Consider When Building a Password-Based Authentication System
    Modern password security for system designers What to consider when building a password-based authentication system By Ian Maddox and Kyle Moschetto, Google Cloud Solutions Architects This whitepaper describes and models modern password guidance and recommendations for the designers and engineers who create secure online applications. A related whitepaper, Password security ​ for users, offers guidance for end users. This whitepaper covers the wide range of options to consider ​ when building a password-based authentication system. It also establishes a set of user-focused recommendations for password policies and storage, including the balance of password strength and usability. The technology world has been trying to improve on the password since the early days of computing. Shared-knowledge authentication is problematic because information can fall into the wrong hands or be forgotten. The problem is magnified by systems that don't support real-world secure use cases and by the frequent decision of users to take shortcuts. According to a 2019 Yubico/Ponemon study, 69 percent of respondents admit to sharing passwords with ​ ​ their colleagues to access accounts. More than half of respondents (51 percent) reuse an average of five passwords across their business and personal accounts. Furthermore, two-factor authentication is not widely used, even though it adds protection beyond a username and password. Of the respondents, 67 percent don’t use any form of two-factor authentication in their personal life, and 55 percent don’t use it at work. Password systems often allow, or even encourage, users to use insecure passwords. Systems that allow only single-factor credentials and that implement ineffective security policies add to the problem.
    [Show full text]
  • NVIDIA CUDA on IBM POWER8: Technical Overview, Software Installation, and Application Development
    Redpaper Dino Quintero Wei Li Wainer dos Santos Moschetta Mauricio Faria de Oliveira Alexander Pozdneev NVIDIA CUDA on IBM POWER8: Technical overview, software installation, and application development Overview The exploitation of general-purpose computing on graphics processing units (GPUs) and modern multi-core processors in a single heterogeneous parallel system has proven highly efficient for running several technical computing workloads. This applied to a wide range of areas such as chemistry, bioinformatics, molecular biology, engineering, and big data analytics. Recently launched, the IBM® Power System S824L comes into play to explore the use of the NVIDIA Tesla K40 GPU, combined with the latest IBM POWER8™ processor, providing a unique technology platform for high performance computing. This IBM Redpaper™ publication discusses the installation of the system, and the development of C/C++ and Java applications using the NVIDIA CUDA platform for IBM POWER8. Note: CUDA stands for Compute Unified Device Architecture. It is a parallel computing platform and programming model created by NVIDIA and implemented by the GPUs that they produce. The following topics are covered: Advantages of NVIDIA on POWER8 The IBM Power Systems S824L server Software stack System monitoring Application development Tuning and debugging Application examples © Copyright IBM Corp. 2015. All rights reserved. ibm.com/redbooks 1 Advantages of NVIDIA on POWER8 The IBM and NVIDIA partnership was announced in November 2013, for the purpose of integrating IBM POWER®-based systems with NVIDIA GPUs, and enablement of GPU-accelerated applications and workloads. The goal is to deliver higher performance and better energy efficiency to companies and data centers. This collaboration produced its initial results in 2014 with: The announcement of the first IBM POWER8 system featuring NVIDIA Tesla GPUs (IBM Power Systems™ S824L).
    [Show full text]
  • Programming Language
    Programming language A programming language is a formal language, which comprises a set of instructions that produce various kinds of output. Programming languages are used in computer programming to implement algorithms. Most programming languages consist of instructions for computers. There are programmable machines that use a set of specific instructions, rather than general programming languages. Early ones preceded the invention of the digital computer, the first probably being the automatic flute player described in the 9th century by the brothers Musa in Baghdad, during the Islamic Golden Age.[1] Since the early 1800s, programs have been used to direct the behavior of machines such as Jacquard looms, music boxes and player pianos.[2] The programs for these The source code for a simple computer program written in theC machines (such as a player piano's scrolls) did not programming language. When compiled and run, it will give the output "Hello, world!". produce different behavior in response to different inputs or conditions. Thousands of different programming languages have been created, and more are being created every year. Many programming languages are written in an imperative form (i.e., as a sequence of operations to perform) while other languages use the declarative form (i.e. the desired result is specified, not how to achieve it). The description of a programming language is usually split into the two components ofsyntax (form) and semantics (meaning). Some languages are defined by a specification document (for example, theC programming language is specified by an ISO Standard) while other languages (such as Perl) have a dominant implementation that is treated as a reference.
    [Show full text]
  • Model Based Software Development: Issues & Challenges
    Model Based Software Development: Issues & Challenges N Md Jubair Basha 1, Salman Abdul Moiz 2 & Mohammed Rizwanullah 3 1&3 IT Department, Muffakham Jah College of Engineering & Technology, Hyderabad, India 2IT Department, MVSR Engineering College, Hyderabad, India E-mail : [email protected] 1, [email protected] 2, [email protected] 3 Abstract - One of the goals of software design is to model a system in such a way that it is easily understandable . Nowadays the tendency for software development is changing from manual coding to automatic code generation; it is becoming model-based. This is a response to the software crisis, in which the cost of hardware has decreased and conversely the cost of software development has increased sharply. The methodologies that allowed this change are model-based, thus relieving the human from detailed coding. Still there is a long way to achieve this goal, but work is being done worldwide to achieve this objective. This paper presents the drastic changes related to modeling and important challenging issues and techniques that recur in MBSD. Keywords - model based software; domain engineering; domain specificity; transformations. I. INTRODUCTION domain engineering and application. The system described by a model may or may not exist at the time Model is an abstraction of some aspect of a system. the model is created. Models are created to serve Model-based software and system design is based on the particular purposes, for example, to present a human end-to-end use of formal, composable and manipulable understandable description of some aspect of a system models in the product life-cycle.
    [Show full text]
  • Safe, Fast and Easy: Towards Scalable Scripting Languages
    Safe, Fast and Easy: Towards Scalable Scripting Languages by Pottayil Harisanker Menon A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy. Baltimore, Maryland Feb, 2017 ⃝c Pottayil Harisanker Menon 2017 All rights reserved Abstract Scripting languages are immensely popular in many domains. They are char- acterized by a number of features that make it easy to develop small applications quickly - flexible data structures, simple syntax and intuitive semantics. However they are less attractive at scale: scripting languages are harder to debug, difficult to refactor and suffers performance penalties. Many research projects have tackled the issue of safety and performance for existing scripting languages with mixed results: the considerable flexibility offered by their semantics also makes them significantly harder to analyze and optimize. Previous research from our lab has led to the design of a typed scripting language built specifically to be flexible without losing static analyzability. Inthis dissertation, we present a framework to exploit this analyzability, with the aim of producing a more efficient implementation Our approach centers around the concept of adaptive tags: specialized tags attached to values that represent how it is used in the current program. Our frame- work abstractly tracks the flow of deep structural types in the program, and thuscan ii ABSTRACT efficiently tag them at runtime. Adaptive tags allow us to tackle key issuesatthe heart of performance problems of scripting languages: the framework is capable of performing efficient dispatch in the presence of flexible structures. iii Acknowledgments At the very outset, I would like to express my gratitude and appreciation to my advisor Prof.
    [Show full text]
  • Dynamic Extension of Typed Functional Languages
    Dynamic Extension of Typed Functional Languages Don Stewart PhD Dissertation School of Computer Science and Engineering University of New South Wales 2010 Supervisor: Assoc. Prof. Manuel M. T. Chakravarty Co-supervisor: Dr. Gabriele Keller Abstract We present a solution to the problem of dynamic extension in statically typed functional languages with type erasure. The presented solution re- tains the benefits of static checking, including type safety, aggressive op- timizations, and native code compilation of components, while allowing extensibility of programs at runtime. Our approach is based on a framework for dynamic extension in a stat- ically typed setting, combining dynamic linking, runtime type checking, first class modules and code hot swapping. We show that this framework is sufficient to allow a broad class of dynamic extension capabilities in any statically typed functional language with type erasure semantics. Uniquely, we employ the full compile-time type system to perform run- time type checking of dynamic components, and emphasize the use of na- tive code extension to ensure that the performance benefits of static typing are retained in a dynamic environment. We also develop the concept of fully dynamic software architectures, where the static core is minimal and all code is hot swappable. Benefits of the approach include hot swappable code and sophisticated application extension via embedded domain specific languages. We instantiate the concepts of the framework via a full implementation in the Haskell programming language: providing rich mechanisms for dy- namic linking, loading, hot swapping, and runtime type checking in Haskell for the first time. We demonstrate the feasibility of this architecture through a number of novel applications: an extensible text editor; a plugin-based network chat bot; a simulator for polymer chemistry; and xmonad, an ex- tensible window manager.
    [Show full text]
  • Trusted Execution Environments
    Trusted Execution Environments Assaf Rosenbaum Technion - Computer Science Department - M.Sc. Thesis MSC-2019-03 - 2019 Technion - Computer Science Department - M.Sc. Thesis MSC-2019-03 - 2019 Trusted Execution Environments Research Thesis Submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science Assaf Rosenbaum Submitted to the Senate of the Technion | Israel Institute of Technology Kislev 5779 Haifa November 2018 Technion - Computer Science Department - M.Sc. Thesis MSC-2019-03 - 2019 Technion - Computer Science Department - M.Sc. Thesis MSC-2019-03 - 2019 This research was carried out under the supervision of Prof. Eli Biham and Dr. Sara Bitan, in the Faculty of Computer Science. Acknowledgements First and foremost, I wish to thank my advisors, Prof. Eli Biham and Dr. Sara Bitan, for their incredible support during the work on thesis. Thank you for providing me the freedom to find my own path while constantly guiding and directing me. I learned a lot from both of you. I would also like to thank my parents, siblings and my parents-in-law for their help and support during this stressful time. Thank you for being there whenever I needed. A special thanks is due to my three lovely children, you sparkle my life with moment of pure joy. Most importantly, I would like to thank my beautiful wife, Lior. Thank you for your endless support, the long hours in which you worked hard to give me the time I needed to work and for always pushing me forward. You ar the light of my life and I love you very much.
    [Show full text]
  • E.W. Dijkstra Archive: on the Cruelty of Really Teaching Computing Science
    On the cruelty of really teaching computing science Edsger W. Dijkstra. (EWD1036) http://www.cs.utexas.edu/users/EWD/ewd10xx/EWD1036.PDF The second part of this talk pursues some of the scientific and educational consequences of the assumption that computers represent a radical novelty. In order to give this assumption clear contents, we have to be much more precise as to what we mean in this context by the adjective "radical". We shall do so in the first part of this talk, in which we shall furthermore supply evidence in support of our assumption. The usual way in which we plan today for tomorrow is in yesterday’s vocabulary. We do so, because we try to get away with the concepts we are familiar with and that have acquired their meanings in our past experience. Of course, the words and the concepts don’t quite fit because our future differs from our past, but then we stretch them a little bit. Linguists are quite familiar with the phenomenon that the meanings of words evolve over time, but also know that this is a slow and gradual process. It is the most common way of trying to cope with novelty: by means of metaphors and analogies we try to link the new to the old, the novel to the familiar. Under sufficiently slow and gradual change, it works reasonably well; in the case of a sharp discontinuity, however, the method breaks down: though we may glorify it with the name "common sense", our past experience is no longer relevant, the analogies become too shallow, and the metaphors become more misleading than illuminating.
    [Show full text]
  • The University of Chicago Reflective Techniques In
    THE UNIVERSITY OF CHICAGO REFLECTIVE TECHNIQUES IN EXTENSIBLE LANGUAGES A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY DEPARTMENT OF COMPUTER SCIENCE BY JONATHAN RIEHL CHICAGO, ILLINOIS AUGUST 2008 To Leon Schram ABSTRACT An extensible programming language allows programmers to use the language to modify one or more of the language’s syntactic, static-semantic, and/or dynamic- semantic properties. This dissertation presents Mython, a variant of the Python lan- guage, which affords extensibility of all three language properties. Mython achieves extensibility through a synthesis of reflection, staging, and compile-time evaluation. This synthesis allows language embedding, language evolution, domain-specific opti- mization, and tool development to be performed in the Mython language. This work argues that using language-development tools from inside an extensible language is preferable to using external tools. The included case study shows that users of an embedded differential equation language are able to work with both the embedded language and embedded programs in an interactive fashion — simplifying their work flow, and the task of specifying the embedded language. iii ACKNOWLEDGMENTS In keeping with my dedication, I’d like to begin by thanking Leon Schram, my A.P. computer science teacher. Mr. Schram took a boy who liked to play with computers, and made him a young man that could apply reason to the decomposition and solution of problems. In turn, I’d like to thank Charlie Fly and Robin Friedrich for helping extend that reason to encompass large scale systems and language systems.
    [Show full text]