Productive High Performance Parallel Programming with Auto-Tuned Domain-Specific Embedded Languages

By Shoaib Ashraf Kamil

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley

Committee in charge:
Professor Armando Fox, Co-Chair
Professor Katherine Yelick, Co-Chair
Professor James Demmel
Professor Berend Smit

Fall 2012

Copyright © 2012 Shoaib Kamil.

Abstract

As machines and architectures have grown more complex, performance tuning has become more challenging, leading general-purpose compilers to fail to generate the best possible optimized code. Expert performance programmers can often hand-write code that outperforms compiler-optimized low-level code by an order of magnitude. At the same time, programs themselves have grown more complex: modern programs are built on many abstraction layers to manage that complexity, yet these layers hinder optimization. In fact, it is common to lose one or two additional orders of magnitude in performance when moving from a low-level language such as Fortran or C to a high-level language like Python, Ruby, or Matlab. General-purpose compilers are limited both by the inability of program analysis to determine programmer intent and by the lack of performance models detailed enough to always identify the best executable code for a given computation and architecture.
The latter problem can be mitigated through auto-tuning, which generates many code variants for a particular problem and empirically determines which performs best on a given architecture. This thesis addresses the problem of how to write programs at a high level while obtaining the performance of code written by performance experts at the low level. To do so, we build domain-specific embedded languages (DSELs) that generate low-level parallel code from a high-level language, and then use auto-tuning to determine the best-performing low-level code. Such DSELs sidestep program analysis by restricting the domain while ensuring that programmers specify high-level intent, and they replace the modeling of machine parameters with empirical auto-tuning. As a result, programmers write in high-level languages, using DSELs for portions of their code, yet obtain performance equivalent to the best hand-optimized low-level code across many architectures.

We present a methodology for building such auto-tuned DSELs, as well as a software infrastructure and example DSELs built on that infrastructure, including a DSEL for structured grid computations and two DSELs for graph algorithms. The structured grid DSEL obtains over 80% of peak performance for a variety of benchmark kernels across different architectures, while the graph algorithm DSELs eliminate the performance loss that comes from using a high-level language. Overall, the methodology, infrastructure, and example DSELs point to a promising new direction for obtaining high performance while programming in a high-level language.

For all who made this possible.

Contents

List of Figures
List of Tables
List of Symbols
Acknowledgements
1 Introduction
  1.1 Thesis Contributions
  1.2 Thesis Outline
2 Motivation and Background
  2.1 Trends in Computing Hardware
  2.2 Trends in Software
  2.3 The Productivity-Performance Gap
  2.4 Auto-tuning and Auto-tuning Compilers
  2.5 Summary
3 Related Work
  3.1 Optimized Low-level Libraries and Auto-tuning
  3.2 Accelerating Python
  3.3 Domain-Specific Embedded Languages
  3.4 Just-in-Time Compilation & Specialization
  3.5 Accelerating Structured Grid Computations
  3.6 Accelerating Graph Algorithms
  3.7 Summary
4 SEJITS: A Methodology for High Performance Domain-Specific Embedded Languages
  4.1 Overview of SEJITS
  4.2 DSELs and APIs in Productivity Languages
  4.3 Code Generation
  4.4 Auto-tuning
  4.5 Best Practices for DSELs in SEJITS
  4.6 Language Requirements to Enable SEJITS
  4.7 Summary
5 Asp is SEJITS for Python
  5.1 Overview of Asp
  5.2 Walkthrough: Building a DSEL Compiler Using Asp
    5.2.1 Defining the Semantic Model
    5.2.2 Transforming Python to Semantic Model Instances
    5.2.3 Generating Backend Code
  5.3 Expressing Semantic Models
  5.4 Code Generation
    5.4.1 Dealing with Types
  5.5 Just-In-Time Compilation of Asp Modules
  5.6 Debugging Support
  5.7 Auto-tuning Support
  5.8 Summary
6 Experimental Setup
  6.1 Hardware Platforms
  6.2 Software Environment
    6.2.1 Compilers & Runtimes
    6.2.2 Parallel Programming Models
  6.3 Performance Measurement Methodology
    6.3.1 Timing Methodology
    6.3.2 Roofline Model
  6.4 Summary
7 Overview of Case Studies
8 Structured Grid Computations
  8.1 Characteristics of Structured Grid Computations
    8.1.1 Applications
    8.1.2 Dimensionality
    8.1.3 Connectivity
    8.1.4 Topology
  8.2 Computational Characteristics
    8.2.1 Data Structures
    8.2.2 Interior Computation & Boundary Conditions
    8.2.3 Memory Traffic
  8.3 Optimizations
    8.3.1 Algorithmic Optimizations
    8.3.2 Cache and TLB Blocking
    8.3.3 Vectorization
    8.3.4 Locality Across Grid Sweeps
    8.3.5 Communication Avoiding Algorithms
    8.3.6 Parallelization
    8.3.7 Summary of Optimizations
  8.4 Modeling Performance of Structured Grid Algorithms
    8.4.1 Serial Performance Models
    8.4.2 Roofline Model for Structured Grid
  8.5 Summary
9 An Auto-tuner for Parallel Multicore Structured Grid Computations
  9.1 Structured Grids Kernels & Architectures
    9.1.1 Benchmark Kernels
    9.1.2 Experimental Platforms
  9.2 Auto-tuning Framework
    9.2.1 Front-End Parsing
    9.2.2 Structured Grid Kernel Breadth
  9.3 Optimization & Code Generation
    9.3.1 Serial Optimizations
    9.3.2 Multicore-specific Optimizations and Code Generation
    9.3.3 CUDA-specific Optimizations and Code Generation
  9.4 Auto-Tuning Strategy Engine
  9.5 Performance Evaluation
    9.5.1 Auto-Parallelization Performance
    9.5.2 Performance Expectations
    9.5.3 Performance Portability
    9.5.4 Programmer Productivity Benefits
    9.5.5 Architectural Comparison
  9.6 Limitations
  9.7 Summary
10 Sepya: An Embedded Domain-Specific Auto-tuning Compiler for Structured Grids
  10.1 Analysis-Avoiding DSEL for Structured Grids
    10.1.1 Building Blocks of Structured Grid Calculations
    10.1.2 Language and Semantics
    10.1.3 Avoiding Analysis
    10.1.4 Language in Python Constructs
  10.2 Structure of the Sepya Compiler
  10.3 Implemented Code Generation Algorithms & Optimizations
    10.3.1 Auto-tuning
    10.3.2 Data Structure
  10.4 Evaluation
    10.4.1 Test kernels & Other DSL systems ...
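The abstract describes auto-tuning as generating many code variants for one problem and empirically keeping whichever runs fastest on the target machine. The sketch below illustrates that search loop in Python, the host language Asp targets. It is a minimal illustration under stated assumptions, not the dissertation's implementation: the 5-point stencil kernel, the cache-blocking parameters, and the names `make_blocked_stencil` and `autotune` are hypothetical, and a real SEJITS-style auto-tuner would generate and compile low-level C or CUDA variants rather than time pure-Python loops.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[float]]
Kernel = Callable[[Grid], Grid]


def make_grid(n: int) -> Grid:
    """Build an n x n grid with a simple deterministic initial condition."""
    return [[float((i * n + j) % 7) for j in range(n)] for i in range(n)]


def make_blocked_stencil(bi: int, bj: int) -> Kernel:
    """Hypothetical code variant: a 5-point stencil sweep tiled into bi x bj blocks.

    Every variant computes each interior point with the same expression;
    only the traversal order (the blocking) differs.
    """
    def sweep(src: Grid) -> Grid:
        n = len(src)
        dst = [row[:] for row in src]  # boundary values are copied unchanged
        for ii in range(1, n - 1, bi):
            for jj in range(1, n - 1, bj):
                for i in range(ii, min(ii + bi, n - 1)):
                    up, row, down, out = src[i - 1], src[i], src[i + 1], dst[i]
                    for j in range(jj, min(jj + bj, n - 1)):
                        out[j] = 0.25 * (up[j] + down[j] + row[j - 1] + row[j + 1])
        return dst
    return sweep


def time_once(kernel: Kernel, grid: Grid) -> float:
    """Wall-clock time of a single sweep, in seconds."""
    start = time.perf_counter()
    kernel(grid)
    return time.perf_counter() - start


def autotune(variants: Dict[Tuple[int, int], Kernel], grid: Grid,
             trials: int = 3) -> Tuple[Optional[Tuple[int, int]], float]:
    """Empirically time every variant; return (fastest params, best seconds)."""
    reference: Optional[Grid] = None
    best_params, best_time = None, float("inf")
    for params, kernel in variants.items():
        result = kernel(grid)  # warm-up run, also used as a correctness check
        if reference is None:
            reference = result
        elif result != reference:
            raise ValueError(f"variant {params} disagrees with the first variant")
        elapsed = min(time_once(kernel, grid) for _ in range(trials))
        if elapsed < best_time:
            best_params, best_time = params, elapsed
    return best_params, best_time


if __name__ == "__main__":
    grid = make_grid(64)
    # 62 spans the whole 62-point interior, i.e. the unblocked variant.
    block_sizes = [8, 16, 32, 62]
    variants = {(bi, bj): make_blocked_stencil(bi, bj)
                for bi in block_sizes for bj in block_sizes}
    best, seconds = autotune(variants, grid)
    print(f"fastest block size {best}: {seconds * 1e3:.2f} ms per sweep")
```

Because every variant evaluates each output point with the same floating-point expression, the results can be compared for exact equality before timing, so a faulty variant cannot win the search. In interpreted Python the timing differences mostly reflect loop overhead rather than cache behavior; the search loop itself is the part that carries over when the variants are generated, compiled low-level code.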