Lessons Learned: Contacts: [email protected], Patrick.Viry

Total Page:16

File Type:pdf, Size:1020Kb

Lessons Learned: Contacts: Mathias.Beck@Unige.Ch, Patrick.Viry Running Java on GPUs An ESA Innovation Triangle Initiative with: • ESA’s Gaia mission • Observatoire de Genève • Ateji All Gaia code is in Java Introducing OpenCL or CUDA in the development process would be a huge disruption Ateji PX embeds parallelism in Java Multi-core parallelism already available, GPU target developed during this project →Port compute-intensive period search algorithms to the GPU Deeming, Lomb-Scargle, String-Length How it works: • Source-to-source translation from fragments of Java to OpenCL • Computation locality and communications expressed with Ateji PX extensions 1. Express parallelism with || Benchmarks: for(Obs o : observations) ... Becomes || (Obs o : observations) ... ► Computation now uses all available cores. 2. Express distribution to GPU with # annotations || (#OpenCL(), Obs o : observs) // these branches run on GPU || (...) // these branches run on CPU ► Mix of multicore and GPU computation 3. Express communication with message passing Channel c = ...; [ Lessons learned: || c ! value; || c ? value; • Parallelism easy to introduce in the code ] • No disruption in the Java devt. process ► Overlap of computation and communication • Pretty good speed-up on multi-cores • Similar perf. figures for GPU and 16-core GPU 4. Map the GPU memory hierarchy to lexical scope • But depend heavily on the hardware • Still a lot of room for improvement • Code needs refactoring for performance on GPU, such as loop pipelining and workgroup sizing • Even though we code in Java, detailed knowledge of GPU architecture is required • Performance / price ratio on multi-core vs. CPU • Not decisive • But hardware is evolving rapidly Contacts: [email protected], [email protected] Downloads & Documentation: www.ateji.com/px .
Recommended publications
  • Multiprocessing Contents
    Multiprocessing Contents 1 Multiprocessing 1 1.1 Pre-history .............................................. 1 1.2 Key topics ............................................... 1 1.2.1 Processor symmetry ...................................... 1 1.2.2 Instruction and data streams ................................. 1 1.2.3 Processor coupling ...................................... 2 1.2.4 Multiprocessor Communication Architecture ......................... 2 1.3 Flynn’s taxonomy ........................................... 2 1.3.1 SISD multiprocessing ..................................... 2 1.3.2 SIMD multiprocessing .................................... 2 1.3.3 MISD multiprocessing .................................... 3 1.3.4 MIMD multiprocessing .................................... 3 1.4 See also ................................................ 3 1.5 References ............................................... 3 2 Computer multitasking 5 2.1 Multiprogramming .......................................... 5 2.2 Cooperative multitasking ....................................... 6 2.3 Preemptive multitasking ....................................... 6 2.4 Real time ............................................... 7 2.5 Multithreading ............................................ 7 2.6 Memory protection .......................................... 7 2.7 Memory swapping .......................................... 7 2.8 Programming ............................................. 7 2.9 See also ................................................ 8 2.10 References .............................................
    [Show full text]
  • Supporting Concurrency Abstractions in High-Level Language Virtual Machines
    Faculty of Science and Bio-Engineering Sciences Department of Computer Science Software Languages Lab Supporting Concurrency Abstractions in High-level Language Virtual Machines Dissertation Submitted for the Degree of Doctor of Philosophy in Sciences Stefan Marr Promotor: Prof. Dr. Theo D’Hondt Copromotor: Dr. Michael Haupt January 2013 Print: Silhouet, Maldegem © 2013 Stefan Marr 2013 Uitgeverij VUBPRESS Brussels University Press VUBPRESS is an imprint of ASP nv (Academic and Scientific Publishers nv) Ravensteingalerij 28 B-1000 Brussels Tel. +32 (0)2 289 26 50 Fax +32 (0)2 289 26 59 E-mail: [email protected] www.vubpress.be ISBN 978 90 5718 256 3 NUR 989 Legal deposit D/2013/11.161/010 All rights reserved. No parts of this book may be reproduced or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the author. DON’T PANIC ABSTRACT During the past decade, software developers widely adopted JVM and CLI as multi-language virtual machines (VMs). At the same time, the multicore revolution burdened developers with increasing complexity. Language im- plementers devised a wide range of concurrent and parallel programming concepts to address this complexity but struggle to build these concepts on top of common multi-language VMs. Missing support in these VMs leads to tradeoffs between implementation simplicity, correctly implemented lan- guage semantics, and performance guarantees. Departing from the traditional distinction between concurrency and paral- lelism, this dissertation finds that parallel programming concepts benefit from performance-related VM support, while concurrent programming concepts benefit from VM support that guarantees correct semantics in the presence of reflection, mutable state, and interaction with other languages and libraries.
    [Show full text]
  • Comparative Programming Languages CM20253
    We have briefly covered many aspects of language design And there are many more factors we could talk about in making choices of language The End There are many languages out there, both general purpose and specialist And there are many more factors we could talk about in making choices of language The End There are many languages out there, both general purpose and specialist We have briefly covered many aspects of language design The End There are many languages out there, both general purpose and specialist We have briefly covered many aspects of language design And there are many more factors we could talk about in making choices of language Often a single project can use several languages, each suited to its part of the project And then the interopability of languages becomes important For example, can you easily join together code written in Java and C? The End Or languages And then the interopability of languages becomes important For example, can you easily join together code written in Java and C? The End Or languages Often a single project can use several languages, each suited to its part of the project For example, can you easily join together code written in Java and C? The End Or languages Often a single project can use several languages, each suited to its part of the project And then the interopability of languages becomes important The End Or languages Often a single project can use several languages, each suited to its part of the project And then the interopability of languages becomes important For example, can you easily
    [Show full text]
  • Suporte À Computação Paralela E Distribuída Em Java: API E Comunicação Entre Nós Cooperantes
    Suporte à Computação Paralela e Distribuída em Java: API e Comunicação entre Nós Cooperantes Rui Miguel Brandão Rodrigues Dissertação para obtenção do Grau de Mestre em Engenharia Informática, Área de Especialização em Arquitetura, Sistemas e Redes Orientador: Prof. Doutor Luís Lino Ferreira Coorientador: Eng.º Cláudio Maia Júri: Presidente: Doutora Maria de Fátima Coutinho Rodrigues, ISEP Vogais: Doutor Jorge Manuel Neves Coelho, ISEP Doutor Luís Miguel Moreira Lino Ferreira, ISEP Eng.º Cláudio Roberto Ribeiro Maia, ISEP Porto, Outubro 2012 ii Aos meus pais... iii iv Resumo Este trabalho é uma parte do tema global “Suporte à Computação Paralela e Distribuída em Java”, também tema da tese de Daniel Barciela no mestrado de Engenharia Informática do Instituto Superior de Engenharia do Porto. O seu objetivo principal consiste na definição/criação da interface com o programador, assim como também abrange a forma como os nós comunicam e cooperam entre si para a execução de determinadas tarefas, de modo a atingirem um único objetivo global. No âmbito desta dissertação foi realizado um estudo prévio relativamente aos modelos teóricos referentes à computação paralela, assim como também foram analisadas linguagens e frameworks que fornecem suporte a este mesmo tipo de computação. Este estudo teve como principal objetivo a análise da forma como estes modelos e linguagens permitem ao programador expressar o processamento paralelo no desenvolvimento das aplicações. Como resultado desta dissertação surgiu a framework denominada Distributed Parallel Framework for Java (DPF4j), cujo objetivo principal é fornecer aos programadores o suporte para o desenvolvimento de aplicações paralelas e distribuídas. Esta framework foi desenvolvida na linguagem Java.
    [Show full text]
  • Group-Based Parallel Multi-Scheduling Methods for Grid Computing
    Group-Based Parallel Multi-scheduling Methods for Grid Computing Abraham, G. T. Submitted version deposited in Coventry University’s Institutional Repository Original citation: Abraham, G. T. (2016) Group-Based Parallel Multi-scheduling Methods for Grid Computing. Unpublished PhD Thesis. Coventry: Coventry University Copyright © and Moral Rights are retained by the author. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This item cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder(s). The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders. Some materials have been removed from this thesis due to third party. Pages where material has been removed are clearly marked in the electronic version. The unabridged version of the thesis can be viewed at the Lanchester Library, Coventry University. Group-Based Parallel Multi-scheduling Methods for Grid Computing Goodhead Tomvie Abraham June 2016 A thesis submitted in partial fulfilment of the University’s requirements for the degree of Doctor of Philosophy Faculty of Engineering and Computing Coventry University Group-Based Parallel Multi-scheduling Methods for Grid Computing Abstract With the advent in multicore computers, the scheduling of Grid jobs can be made more effective if scaled to fully utilize the underlying hardware and parallelized to benefit from the exploitation of multicores. The fact that sequential algorithms do not scale with multicore systems nor benefit from parallelism remains a major challenge to scheduling in the Grid. As multicore systems become ever more pervasive in our computing lives, over reliance on such systems for passive parallelism does not offer the best option in harnessing the benefits of their multiprocessors for Grid scheduling.
    [Show full text]
  • Parallel Version of Tree Based Association Rule Mining Algorithm
    International Journal of Scientific and Research Publications, Volume 3, Issue 12, December 2013 1 ISSN 2250-3153 Parallel version of tree based association rule Mining Algorithm *Jadhav Arvind Uttamrao Department of Computer Science, P.E.S.C.O.E. Aurangabad. MH-India Abstract- Tree Based Association Rule Mining (TBAR) TBAR stores the v>, which 'a' is an attribute, 'v ' is the value of algorithm decreases the number of data base scans during 'a'. If the value of 'a' in tuple t is 'v', then t contains pair <a: v >. frequent item set mining to improve association rule mining R. Agrawal [2] introduced three algorithms for parallelization of process. In this paper, we present parallel version of TBAR association rule mining algorithm. He divided the association algorithm. Parallel version of TBAR algorithm decreases the rule mining task in two sub parts for parallelization. In first part time required for association rule mining process by making he considered frequent itemset generation. For this part he gave candidate item set generation, frequent item set generation and three parallelization techniques. And second part association rule association rule generation process parallel. generation in this part he simply distributed the generated frequent itemsets among multiple processors. Keywords - Data mining, Association rules mining, Tree Data Structure, Parallel Programming, Scala, Ateji PX. The rest of the paper is organized as follows. In Section II, we will see the work in the field of association rule mining and the attempts made by researchers to improve association rule mining I. INTRODUCTION algorithm’s performance. In Section III, we will see the details of ssociation rule mining[1] is to find out association rules that original TBAR algorithm.
    [Show full text]
  • Facilitating High Performance Code Parallelization
    Syracuse University SURFACE Dissertations - ALL SURFACE 5-14-2017 Facilitating High Performance Code Parallelization Maria Abi Saad Syracuse University Follow this and additional works at: https://surface.syr.edu/etd Part of the Engineering Commons Recommended Citation Abi Saad, Maria, "Facilitating High Performance Code Parallelization" (2017). Dissertations - ALL. 712. https://surface.syr.edu/etd/712 This Dissertation is brought to you for free and open access by the SURFACE at SURFACE. It has been accepted for inclusion in Dissertations - ALL by an authorized administrator of SURFACE. For more information, please contact [email protected]. Abstract With the surge of social media on one hand and the ease of obtaining information due to cheap sensing devices and open source APIs on the other hand, the amount of data that can be processed is as well vastly increasing. In addition, the world of computing has recently been witnessing a growing shift towards massively parallel distributed systems due to the increasing importance of transforming data into knowledge in today’s data-driven world. At the core of data analysis for all sorts of applications lies pattern matching. Therefore, parallelizing pattern matching algorithms should be made efficient in order to cater to this ever-increasing abundance of data. We propose a method that automatically detects a user’s single threaded function call to search for a pattern using Java’s standard regular expression library, and replaces it with our own data parallel implementation using Java bytecode injection. Our approach facilitates parallel processing on different platforms consisting of shared memory systems (using multithreading and NVIDIA GPUs) and distributed systems (using MPI and Hadoop).
    [Show full text]
  • Grid Computing
    Grid computing This article needs additional citations for verification. Please help improve this article by adding reliable references. Unsourced material may be challenged and removed. (July 2010) This article needs attention from an expert on the subject. See the talk page for details. Consider associating this request with a WikiProject. (July 2010) Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common goal. The grid can be thought of as a distributed system with non-interactive workloads that involve a large number of files. What distinguishes grid computing from conventional high performance computing systems such as cluster computing is that grids tend to be more loosely coupled, heterogeneous, and geographically dispersed. Although a grid can be dedicated to a specialized application, it is more common that a single grid will be used for a variety of different purposes. Grids are often constructed with the aid of general-purpose grid software libraries known as middleware. Grid size can vary by a considerable amount. Grids are a form of distributed computing whereby a ―super virtual computer‖ is composed of many networked loosely coupled computers acting together to perform very large tasks. Furthermore, ―distributed‖ or ―grid‖ computing, in general, is a special type of parallel computing that relies on complete computers (with onboard CPUs, storage, power supplies, network interfaces, etc.) connected to a network (private, public or the Internet) by a conventional network interface, such as Ethernet. This is in contrast to the traditional notion of a supercomputer, which has many processors connected by a local high- speed computer bus.
    [Show full text]
  • Domain-Specific Modelling for Coordination Engineering
    Domain-Specific Modelling for Coordination Engineering Dipl.-Inform. Stefan Gudenkauf Dissertation zur Erlangung des akademischen Grades Doktor der Ingenieurwissenschaften (Dr.-Ing.) der Technischen Fakultät der Christian-Albrechts-Universität zu Kiel eingereicht im Jahr 2012 Kiel Computer Science Series (KCSS) 2012/2 v1.0 dated 2012-09-16 ISSN 2193-6781 (print version) ISSN 2194-6639 (electronic version) Electronic version, updates, errata available via https://www.informatik.uni-kiel.de/kcss Published by the Department of Computer Science, Christian-Albrechts-Universität zu Kiel Software Engineering Group Please cite as: Stefan Gudenkauf. Domain-Specific Modelling for Coordination Engineering Number 2012/2 in Kiel Computer Science Series. Department of Computer Science, 2012. Dissertation, Faculty of Engineering, Christian-Albrechts-Universität zu Kiel. @book{Gudenkauf2012, author = {Stefan Gudenkauf}, title = {Domain-Specific Modelling for Coordination Engineering}, publisher = {Department of Computer Science, CAU Kiel}, year = {2012}, number = {2012-2}, series = {Kiel Computer Science Series}, note = {Dissertation, Faculty of Engineering, Christian-Albrechts-Universit\"at zu Kiel} } © 2012 by Stefan Gudenkauf ii About this Series The Kiel Computer Science Series (KCSS) covers dissertations, habilita- tion theses, lecture notes, textbooks, surveys, collections, handbooks, etc. written at the Department of Computer Science at the Christian-Albrechts- Universität zu Kiel. It was initiated in 2011 to support authors in the dissemination of their work in electronic and printed form, without restrict- ing their rights to their work. The series provides a unified appearance and aims at high-quality typography. The KCSS is an open access series; all series titles are electronically available free of charge at the department’s website. In addition, authors are encouraged to make printed copies avail- able at a reasonable price, typically with a print-on-demand service.
    [Show full text]
  • Identifying a Unifying Mechanism for the Implementation of Concurrency Abstractions on Multi-Language Virtual Machines
    Identifying A Unifying Mechanism for the Implementation of Concurrency Abstractions on Multi-Language Virtual Machines Stefan Marr and Theo D'Hondt Software Languages Lab, Vrije Universiteit Brussel Pleinlaan 2, 1050 Elsene, Belgium fstefan.marr, [email protected] Abstract. Supporting all known abstractions for concurrent and paral- lel programming in a virtual machines (VM) is a futile undertaking, but it is required to give programmers appropriate tools and performance. In- stead of supporting all abstractions directly, VMs need a unifying mech- anism similar to INVOKEDYNAMIC for JVMs. Our survey of parallel and concurrent programming concepts identifies concurrency abstractions as the ones benefiting most from support in a VM. Currently, their semantics is often weakened, reducing their engi- neering benefits. They require a mechanism to define flexible language guarantees. Based on this survey, we define an ownership-based meta-object proto- col as candidate for VM support. We demonstrate its expressiveness by implementing actor semantics, software transactional memory, agents, CSP, and active objects. While the performance of our prototype con- firms the need for VM support, it also shows that the chosen mechanism is appropriate to express a wide range of concurrency abstractions in a unified way. Keywords: Virtual Machines, Language Support, Abstraction, Paral- lelism, Concurrency 1 The Right Tool for the Job Implementing parallel and concurrent systems has been argued to be a complex undertaking that requires the right tools for the job, perhaps more than other problems software engineering encountered so far. Instead of searching for a non-existing silver bullet approach, we argue that language designers need to be supported in building domain-specific concurrency abstractions.
    [Show full text]
  • Hipeac Info Is a Quarterly Newsletter Published by the Hipeac Network of Excellence, 24 Funded by the 7Th European Framework Programme (FP7) Under Contract No
    info24 Compilation appears quarterly | October 2010 Architecture and Embedded High Performance and Network of Excellence on 2 Message from the HiPEAC Coordinator 3 Message from the Project Officer 4 HiPEAC Activities: 4 - PUMPS Summer School 5 - HiPEAC Goes Across the Pond to DAC 2 HiPEAC News: 2 - HiPEAC Management Staff Change 3 - Mateo Valero Awarded Honorary Doctoral Degree by the Universidad Veracruzana, Mexico 5 - Tom Crick Takes Over HiPEAC Newsletter Proofreading Role 6 - Habilitation Thesis Defence of Sid Touati on Backend Code Optimization 7 - Collaboration on the Adaptation of Numerical Libraries to Graphics Processors for the Windows OS Welcome to 8 - Lieven Eeckhout Awarded an ERC Starting Independent Researcher Grant Computing 8 - ComputerSciencePodcast.com - A Computer Science Podcast 4 HiPEAC Announce: System Week 4 - Computer Architecture Performance Evaluation Methods 6 - Barcelona Multicore Workshop 2010 in Barcelona, 7 - Handbook of Signal Processing Systems 11 In the Spotlight: Spain, October 11 - FP7 EURETILE Project 12 - FP7 PEPPHER Project 19-22 13 - An Update on MILEPOST 16 - (Open-People) ANR Project 17 - 9 Million Euros for Invasive Computing 10 HiPEAC Start-ups 9 HiPEAC Students and Trip Reports 18 PhD News 20 Upcoming Events www.HiPEAC.net HiPEAC’11 Conference, 24-26 January 2011, Heraklion, Greece http://www.hipeac.net/hipeac2011 Intro Message from the HiPEAC Coordinator Koen De Bosschere Dear friends, I hope all of you have enjoyed relaxing summer holidays and that you are ready to tackle the challenges of the coming year. This summer, the international news was overshadowed by big items like the massive oil spill in the Mexican Gulf, and the huge flooding in Pakistan.
    [Show full text]
  • Sebuah Kajian Pustaka
    Institute of Advanced Engineering and Science International Journal of Information & Network Security (IJINS) Vol.3, No.3, June 2014, pp. 195~211 ISSN: 2089-3299 195 Arithmetic Deduction Model for High Performance Computing: A Comparative Exploration of Computational Models Paradigms Patrick Mukala Departement of Computer Science, University of Pisa Article Info ABSTRACT Article history: A myriad of applications ranging from engineering and scientific simulations, image and signal processing as well as high-sensitive data th Received Jun 12 , 201x retrieval demand high processing power reaching up to teraflops for their Revised Aug 20th, 201x efficient execution. While a standard serial computer would require clock- Accepted Aug 26th, 201x cycles of less than one per second in this instance, parallel computing is a viable alternative. In adopting parallelism, multiple architectural models such as the PRAM, BSP and DataFlow Models have been proposed and Keyword: implemented with some limitations due to a number of factors. Perhaps one of the predominant causes is the presence of sequential execution at some Arithmetic Deduction Model extent in these models. This status has trigged the need for improved Models of Computation alternatives. Hence, the Arithmetic Deduction Model has been introduced Parallel Processing and its peculiarity can be seen through its use of natural arithmetic concepts von Neumann Model to perform computation, and the remarkable substitution or elimination of Parallel Computing dependency on variables and states in distributed data processing. Although some initial results about its performance have been published, it is critical to contextualize its genesis. Hence, in this paper we explore the importance of high performance computing and conduct a comparative study of some models of computation in terms of their strenghs and limitations and accordingly highlight the need for a new model of computation.
    [Show full text]