A Survey of Parallel Programming Languages and Tools
Total Page:16
File Type:pdf, Size:1020Kb

Load more
Recommended publications
-
Evaluation of Architectural Support for Global Address-Based
Evaluation of Architectural Supp ort for Global AddressBased Communication in LargeScale Parallel Machines y y Arvind Krishnamurthy Klaus E Schauser Chris J Scheiman Randolph Y Wang David E Culler and Katherine Yelick the sp ecic target architecture Wehave develop ed multi Abstract ple highly optimized versions of this compiler employing a Largescale parallel machines are incorp orating increas range of co degeneration strategies for machines with dedi ingly sophisticated architectural supp ort for userlevel mes cated network pro cessors In this studywe use this sp ec saging and global memory access We provide a systematic trum of runtime techniques to evaluate the p erformance evaluation of a broad sp ectrum of current design alternatives tradeos in architectural supp ort for communication found based on our implementations of a global address language in several of the current largescale parallel machines on the Thinking Machines CM Intel Paragon Meiko CS We consider ve imp ortant largescale parallel platforms Cray TD and Berkeley NOW This evaluation includes that havevarying degrees of architectural supp ort for com a range of compilation strategies that makevarying use of munication the Thinking Machines CM Intel Paragon the network pro cessor each is optimized for the target ar Meiko CS Cray TD and Berkeley NOW The CM pro chitecture and the particular strategyWe analyze a family vides direct userlevel access to the network the Paragon of interacting issues that determine the p erformance trade provides a network pro cessor -
Technical Report Aaron Councilman
Extensible Parallel Programming in ableC Aaron Councilman Department of Computer Science and Engineering University of Minnesota, Twin Cities May 23, 2019 1 Introduction There are many different manners of parallelizing code, and many different languages that provide such features. Different types of computations are best suited by different types of parallelism. Simply whether a computation is compute bound or I/O bound determines whether the computation will benefit from being run with more threads than the machine has cores, and other properties of a computation will similarly affect how it performs when run in parallel. Thus, to provide parallel programmers the ability to deliver the best performance for their programs, the ability to choose the parallel programming abstractions they use is important. The ability to combine these abstractions however they need is also important, since different parts of a program will have different performance properties, and therefore may perform best using different abstractions. Unfortunately, parallel programming languages are often designed monolithically, built as an entire language with a specific set of features. Because of this, programmer's choice of parallel programming abstractions is generally limited to the choice of the language to use. Beyond limiting the available abstracts, this also means, that the choice of abstractions must be made ahead of time, since any attempt to change the parallel programming language at a later time is likely to be be prohibitive as it may require rewriting large portions of the codebase, if not the entire codebase. Extensible programming languages can offer a solution to these problems. With an extensible compiler, the programmer chooses a base programming language and can then select the set of \extensions" for that language that best fit their needs. -
Parallel Computer Systems
Parallel Computer Systems Randal E. Bryant CS 347 Lecture 27 April 29, 1997 Topics • Parallel Applications • Shared vs. Distributed Model • Concurrency Models • Single Bus Systems • Network-Based Systems • Lessons Learned Motivation Limits to Sequential Processing • Cannot push clock rates beyond technological limits • Instruction-level parallelism gets diminishing returns – 4-way superscalar machines only get average of 1.5 instructions / cycle – Branch prediction, speculative execution, etc. yield diminishing returns Applications have Insatiable Appetite for Computing • Modeling of physical systems • Virtual reality, real-time graphics, video • Database search, data mining Many Applications can Exploit Parallelism • Work on multiple parts of problem simultaneously • Synchronize to coordinate efforts • Communicate to share information – 2 – CS 347 S’97 Historical Perspective: The Graveyard • Lots of venture capital and DoD Resarch $$’s • Too big to enumerate, but some examples … ILLIAC IV • Early research machine with overambitious technology Thinking Machines • CM-2: 64K single-bit processors with single controller (SIMD) • CM-5: Tightly coupled network of SPARC processors Encore Computer • Shared memory machine using National microprocessors Kendall Square Research KSR-1 • Shared memory machine using proprietary processor NCUBE / Intel Hypercube / Intel Paragon • Connected network of small processors • Survive only in niche markets – 3 – CS 347 S’97 Historical Perspective: Successes Shared Memory Multiprocessors (SMP’s) • E.g., SGI -
Intel® Cilk™ Plus
Overview: Programming Environment for Intel® Xeon Phi™ Coprocessor One Source Base, Tuned to many Targets Source Compilers, Libraries, Parallel Models Multicore Many-core Cluster Multicore Multicore CPU CPU Intel® MIC Multicore Multicore and Architecture Cluster Many-core Cluster Copyright© 2014, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners. Intel® Parallel Studio XE 2013 and Intel® Cluster Studio XE 2013 Phase Product Feature Benefit Intel® Threading design assistant • Simplifies, demystifies, and speeds Advisor XE (Studio products only) parallel application design • C/C++ and Fortran compilers • Intel® Threading Building Blocks • Enabling solution to achieve the Intel® • Intel® Cilk™ Plus application performance and Composer XE • Intel® Integrated Performance scalability benefits of multicore and Build Primitives forward scale to many-core • Intel® Math Kernel Library • Enabling High Performance Scalability, Interconnect Intel® High Performance Message Independence, Runtime Fabric † MPI Library Passing (MPI) Library Selection, and Application Tuning Capability ® Intel Performance Profiler for • Remove guesswork, saves time, VTune™ optimizing application makes it easier to find performance Amplifier XE performance and scalability and scalability bottlenecks Memory & threading dynamic • Increased productivity, code quality, ® Intel analysis for code quality and lowers cost, finds memory, Verify Inspector XE threading , and security defects & Tune Static Analysis for code quality -
Edsger W. Dijkstra: a Commemoration
Edsger W. Dijkstra: a Commemoration Krzysztof R. Apt1 and Tony Hoare2 (editors) 1 CWI, Amsterdam, The Netherlands and MIMUW, University of Warsaw, Poland 2 Department of Computer Science and Technology, University of Cambridge and Microsoft Research Ltd, Cambridge, UK Abstract This article is a multiauthored portrait of Edsger Wybe Dijkstra that consists of testimo- nials written by several friends, colleagues, and students of his. It provides unique insights into his personality, working style and habits, and his influence on other computer scientists, as a researcher, teacher, and mentor. Contents Preface 3 Tony Hoare 4 Donald Knuth 9 Christian Lengauer 11 K. Mani Chandy 13 Eric C.R. Hehner 15 Mark Scheevel 17 Krzysztof R. Apt 18 arXiv:2104.03392v1 [cs.GL] 7 Apr 2021 Niklaus Wirth 20 Lex Bijlsma 23 Manfred Broy 24 David Gries 26 Ted Herman 28 Alain J. Martin 29 J Strother Moore 31 Vladimir Lifschitz 33 Wim H. Hesselink 34 1 Hamilton Richards 36 Ken Calvert 38 David Naumann 40 David Turner 42 J.R. Rao 44 Jayadev Misra 47 Rajeev Joshi 50 Maarten van Emden 52 Two Tuesday Afternoon Clubs 54 2 Preface Edsger Dijkstra was perhaps the best known, and certainly the most discussed, computer scientist of the seventies and eighties. We both knew Dijkstra |though each of us in different ways| and we both were aware that his influence on computer science was not limited to his pioneering software projects and research articles. He interacted with his colleagues by way of numerous discussions, extensive letter correspondence, and hundreds of so-called EWD reports that he used to send to a select group of researchers. -
Glossary of Technical and Scientific Terms (French / English
Glossary of Technical and Scientific Terms (French / English) Version 11 (Alphabetical sorting) Jean-Luc JOULIN October 12, 2015 Foreword Aim of this handbook This glossary is made to help for translation of technical words english-french. This document is not made to be exhaustive but aim to understand and remember the translation of some words used in various technical domains. Words are sorted alphabetically. This glossary is distributed under the licence Creative Common (BY NC ND) and can be freely redistributed. For further informations about the licence Creative Com- mon (BY NC ND), browse the site: http://creativecommons.org If you are interested by updates of this glossary, send me an email to be on my diffusion list : • [email protected] • [email protected] Warning Translations given in this glossary are for information only and their use is under the responsability of the reader. If you see a mistake in this document, thank in advance to report it to me so i can correct it in further versions of this document. Notes Sorting of words Words are sorted by alphabetical order and are associated to their main category. Difference of terms Some words are different between the "British" and the "American" english. This glossary show the most used terms and are generally coming from the "British" english Words coming from the "American" english are indicated with the suffix : US . Difference of spelling There are some difference of spelling in accordance to the country, companies, universities, ... Words finishing by -ise, -isation and -yse can also be written with -ize, Jean-Luc JOULIN1 Glossary of Technical and Scientific Terms -ization and -yze. -
Lecture 10: Fault Tolerance Fault Tolerant Concurrent Computing
02/12/2014 Lecture 10: Fault Tolerance Fault Tolerant Concurrent Computing • The main principles of fault tolerant programming are: – Fault Detection - Knowing that a fault exists – Fault Recovery - having atomic instructions that can be rolled back in the event of a failure being detected. • System’s viewpoint it is quite possible that the fault is in the program that is attempting the recovery. • Attempting to recover from a non-existent fault can be as disastrous as a fault occurring. CA463 Lecture Notes (Martin Crane 2014) 26 1 02/12/2014 Fault Tolerant Concurrent Computing (cont’d) • Have seen replication used for tasks to allow a program to recover from a fault causing a task to abruptly terminate. • The same principle is also used at the system level to build fault tolerant systems. • Critical systems are replicated, and system action is based on a majority vote of the replicated sub systems. • This redundancy allows the system to successfully continue operating when several sub systems develop faults. CA463 Lecture Notes (Martin Crane 2014) 27 Types of Failures in Concurrent Systems • Initially dead processes ( benign fault ) – A subset of the processes never even start • Crash model ( benign fault ) – Process executes as per its local algorithm until a certain point where it stops indefinitely – Never restarts • Byzantine behaviour (malign fault ) – Algorithm may execute any possible local algorithm – May arbitrarily send/receive messages CA463 Lecture Notes (Martin Crane 2014) 28 2 02/12/2014 A Hierarchy of Failure Types • Dead process – This is a special case of crashed process – Case when the crashed process crashes before it starts executing • Crashed process – This is a special case of Byzantine process – Case when the Byzantine process crashes, and then keeps staying in that state for all future transitions CA463 Lecture Notes (Martin Crane 2014) 29 Types of Fault Tolerance Algorithms • Robust algorithms – Correct processes should continue behaving thus, despite failures. -
Actor-Based Concurrency by Srinivas Panchapakesan
Concurrency in Java and Actor- based concurrency using Scala By, Srinivas Panchapakesan Concurrency Concurrent computing is a form of computing in which programs are designed as collections of interacting computational processes that may be executed in parallel. Concurrent programs can be executed sequentially on a single processor by interleaving the execution steps of each computational process, or executed in parallel by assigning each computational process to one of a set of processors that may be close or distributed across a network. The main challenges in designing concurrent programs are ensuring the correct sequencing of the interactions or communications between different computational processes, and coordinating access to resources that are shared among processes. Advantages of Concurrency Almost every computer nowadays has several CPU's or several cores within one CPU. The ability to leverage theses multi-cores can be the key for a successful high-volume application. Increased application throughput - parallel execution of a concurrent program allows the number of tasks completed in certain time period to increase. High responsiveness for input/output-intensive applications mostly wait for input or output operations to complete. Concurrent programming allows the time that would be spent waiting to be used for another task. More appropriate program structure - some problems and problem domains are well-suited to representation as concurrent tasks or processes. Process vs Threads Process: A process runs independently and isolated of other processes. It cannot directly access shared data in other processes. The resources of the process are allocated to it via the operating system, e.g. memory and CPU time. Threads: Threads are so called lightweight processes which have their own call stack but an access shared data. -
Part V Some Broad Topic
Part V Some Broad Topic Winter 2021 Parallel Processing, Some Broad Topics Slide 1 About This Presentation This presentation is intended to support the use of the textbook Introduction to Parallel Processing: Algorithms and Architectures (Plenum Press, 1999, ISBN 0-306-45970-1). It was prepared by the author in connection with teaching the graduate-level course ECE 254B: Advanced Computer Architecture: Parallel Processing, at the University of California, Santa Barbara. Instructors can use these slides in classroom teaching and for other educational purposes. Any other use is strictly prohibited. © Behrooz Parhami Edition Released Revised Revised Revised First Spring 2005 Spring 2006 Fall 2008 Fall 2010 Winter 2013 Winter 2014 Winter 2016 Winter 2019* Winter 2021* *Chapters 17-18 only Winter 2021 Parallel Processing, Some Broad Topics Slide 2 V Some Broad Topics Study topics that cut across all architectural classes: • Mapping computations onto processors (scheduling) • Ensuring that I/O can keep up with other subsystems • Storage, system, software, and reliability issues Topics in This Part Chapter 17 Emulation and Scheduling Chapter 18 Data Storage, Input, and Output Chapter 19 Reliable Parallel Processing Chapter 20 System and Software Issues Winter 2021 Parallel Processing, Some Broad Topics Slide 3 17 Emulation and Scheduling Mapping an architecture or task system onto an architecture • Learn how to achieve algorithm portability via emulation • Become familiar with task scheduling in parallel systems Topics in This Chapter 17.1 Emulations Among Architectures 17.2 Distributed Shared Memory 17.3 The Task Scheduling Problem 17.4 A Class of Scheduling Algorithms 17.5 Some Useful Bounds for Scheduling 17.6 Load Balancing and Dataflow Systems Winter 2021 Parallel Processing, Some Broad Topics Slide 4 17.1 Emulations Among Architectures Need for scheduling: a. -
Mastering Concurrent Computing Through Sequential Thinking
review articles DOI:10.1145/3363823 we do not have good tools to build ef- A 50-year history of concurrency. ficient, scalable, and reliable concur- rent systems. BY SERGIO RAJSBAUM AND MICHEL RAYNAL Concurrency was once a specialized discipline for experts, but today the chal- lenge is for the entire information tech- nology community because of two dis- ruptive phenomena: the development of Mastering networking communications, and the end of the ability to increase processors speed at an exponential rate. Increases in performance come through concur- Concurrent rency, as in multicore architectures. Concurrency is also critical to achieve fault-tolerant, distributed services, as in global databases, cloud computing, and Computing blockchain applications. Concurrent computing through sequen- tial thinking. Right from the start in the 1960s, the main way of dealing with con- through currency has been by reduction to se- quential reasoning. Transforming problems in the concurrent domain into simpler problems in the sequential Sequential domain, yields benefits for specifying, implementing, and verifying concur- rent programs. It is a two-sided strategy, together with a bridge connecting the Thinking two sides. First, a sequential specificationof an object (or service) that can be ac- key insights ˽ A main way of dealing with the enormous challenges of building concurrent systems is by reduction to sequential I must appeal to the patience of the wondering readers, thinking. Over more than 50 years, more sophisticated techniques have been suffering as I am from the sequential nature of human developed to build complex systems in communication. this way. 12 ˽ The strategy starts by designing —E.W. -
An Introduction to Parallel Programming
In Praise of An Introduction to Parallel Programming With the coming of multicore processors and the cloud, parallel computing is most cer- tainly not a niche area off in a corner of the computing world. Parallelism has become central to the efficient use of resources, and this new textbook by Peter Pacheco will go a long way toward introducing students early in their academic careers to both the art and practice of parallel computing. Duncan Buell Department of Computer Science and Engineering University of South Carolina An Introduction to Parallel Programming illustrates fundamental programming principles in the increasingly important area of shared-memory programming using Pthreads and OpenMP and distributed-memory programming using MPI. More important, it empha- sizes good programming practices by indicating potential performance pitfalls. These topics are presented in the context of a variety of disciplines, including computer science, physics, and mathematics. The chapters include numerous programming exercises that range from easy to very challenging. This is an ideal book for students or professionals looking to learn parallel programming skills or to refresh their knowledge. Leigh Little Department of Computational Science The College at Brockport, The State University of New York An Introduction to Parallel Programming is a well-written, comprehensive book on the field of parallel computing. Students and practitioners alike will appreciate the rele- vant, up-to-date information. Peter Pacheco’s very accessible writing style, combined with numerous interesting examples, keeps the reader’s attention. In a field that races forward at a dizzying pace, this book hangs on for the wild ride covering the ins and outs of parallel hardware and software. -
Concurrent Computing
Concurrent Computing CSCI 201 Principles of Software Development Jeffrey Miller, Ph.D. [email protected] Outline • Threads • Multi-Threaded Code • CPU Scheduling • Program USC CSCI 201L Thread Overview ▪ Looking at the Windows taskbar, you will notice more than one process running concurrently ▪ Looking at the Windows task manager, you’ll see a number of additional processes running ▪ But how many processes are actually running at any one time? • Multiple processors, multiple cores, and hyper-threading will determine the actual number of processes running • Threads USC CSCI 201L 3/22 Redundant Hardware Multiple Processors Single-Core CPU ALU Cache Registers Multiple Cores ALU 1 Registers 1 Cache Hyper-threading has firmware logic that appears to the OS ALU 2 to contain more than one core when in fact only one physical core exists Registers 2 4/22 Programs vs Processes vs Threads ▪ Programs › An executable file residing in memory, typically in secondary storage ▪ Processes › An executing instance of a program › Sometimes this is referred to as a Task ▪ Threads › Any section of code within a process that can execute at what appears to be simultaneously › Shares the same resources allotted to the process by the OS kernel USC CSCI 201L 5/22 Concurrent Programming ▪ Multi-Threaded Programming › Multiple sections of a process or multiple processes executing at what appears to be simultaneously in a single core › The purpose of multi-threaded programming is to add functionality to your program ▪ Parallel Programming › Multiple sections of