Irregular Computations in Fortran: Expression and Implementation Strategies
Total pages: 16
File type: PDF, size: 1020 KB
Recommended publications
The State of the Chapel Union
The State of the Chapel Union. Bradford L. Chamberlain, Sung-Eun Choi, Martha Dumler, Thomas Hildebrandt, David Iten, Vassily Litvinov, Greg Titus. Cray Inc., Seattle, WA 98164. chapel [email protected]

Abstract—Chapel is an emerging parallel programming language that originated under the DARPA High Productivity Computing Systems (HPCS) program. Although the HPCS program is now complete, the Chapel language and project remain very much alive and well. Under the HPCS program, Chapel generated sufficient interest among HPC user communities to warrant continuing its evolution and development over the next several years. In this paper, we reflect on the progress that was made with Chapel under the auspices of the HPCS program, noting key decisions made during the project's history. We also summarize the current state of Chapel for programmers who are interested in using it today. And finally, we describe current and ongoing work to evolve it from prototype to production-…

Cray's entry in the DARPA HPCS program concluded successfully in the fall of 2012, culminating with the launch of the Cray XC30™ system. Even though the HPCS program has completed, the Chapel project remains active and growing. This paper's goal is to present a snapshot of Chapel as it stands today at the juncture between HPCS and the next phase of Chapel's development: to review the Chapel project, describe its status, summarize its successes and lessons learned, and sketch out the next few years of the language's life cycle. Generally speaking, this paper reports on Chapel as of version 1.7.0, released on April 18th, 2013.
Parallel Languages & CUDA
CS 470, Spring 2019. Mike Lam, Professor. Parallel Languages & CUDA. Graphics and content taken from the following:
http://dl.acm.org/citation.cfm?id=2716320
http://chapel.cray.com/papers/BriefOverviewChapel.pdf
http://arxiv.org/pdf/1411.1607v4.pdf
https://en.wikipedia.org/wiki/Cilk

Parallel languages
● Writing efficient parallel code is hard
● We've covered two generic paradigms ...
– Shared-memory
– Distributed message-passing
● … and three specific technologies (but all in C!)
– Pthreads
– OpenMP
– MPI
(a minimal Pthreads sketch follows this excerpt)
● Can we make parallelism easier by changing our language?
– Similarly: Can we improve programmer productivity?

Productivity
● Productivity = Output / Input (economic definition)
● What does this mean for parallel programming?
– How do you measure input?
● Bad idea: size of programming team
● "The Mythical Man Month" by Frederick Brooks
– How do you measure output?
● Bad idea: lines of code

Productivity vs. Performance
● General idea: Produce better code faster
– Better can mean a variety of things: speed, robustness, etc.
– Faster generally means time/personnel investment
● Problem: productivity often trades off with performance
– E.g., Python vs. C or Matlab vs. Fortran
– E.g., garbage collection or thread management
● Why?

Complexity
● Core issue: handling complexity
● Tradeoff: developer effort vs. system effort
– Hiding complexity from the developer increases the complexity of the system
– Higher burden on compiler and runtime systems
– Implicit features cause unpredictable interactions
– More middleware increases chance of interference and …
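To make the slide's point about "three specific technologies (but all in C!)" concrete, here is a minimal shared-memory sketch using Pthreads: each thread sums one chunk of an array and the main thread combines the partial sums. The array size, thread count, and names are illustrative assumptions, not part of the course material.

```c
/* Minimal Pthreads sketch of the shared-memory style referred to above:
 * each thread sums part of an array, then the partial sums are combined. */
#include <pthread.h>
#include <stdio.h>

#define N 1000000      /* assumed array size */
#define NTHREADS 4     /* assumed thread count */

static double data[N];
static double partial[NTHREADS];

static void *sum_chunk(void *arg) {
    long id = (long)arg;
    long lo = id * (N / NTHREADS);
    long hi = (id == NTHREADS - 1) ? N : lo + N / NTHREADS;
    double s = 0.0;
    for (long i = lo; i < hi; i++)
        s += data[i];
    partial[id] = s;   /* each thread writes its own slot: no lock needed */
    return NULL;
}

int main(void) {
    pthread_t threads[NTHREADS];
    for (long i = 0; i < N; i++)
        data[i] = 1.0;
    for (long t = 0; t < NTHREADS; t++)
        pthread_create(&threads[t], NULL, sum_chunk, (void *)t);
    double total = 0.0;
    for (long t = 0; t < NTHREADS; t++) {
        pthread_join(threads[t], NULL);
        total += partial[t];
    }
    printf("sum = %f\n", total);   /* expect 1000000.0 */
    return 0;
}
```

Even this tiny example shows the boilerplate (thread creation, work splitting, joining) that the productivity discussion in the slides is getting at.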
Parallel Programming Models
Lecture 3: Parallel Programming Models. Dr. Wilson Rivera. ICOM 6025: High Performance Computing, Electrical and Computer Engineering Department, University of Puerto Rico.

Outline
• Goal: Understand parallel programming models
– Programming Models
– Application Models
– Parallel Computing Frameworks
• OpenMP
• MPI
• CUDA
• Map-Reduce
– Parallel Computing Libraries

Parallel Programming Models
• A parallel programming model represents an abstraction of the underlying hardware or computer system.
– Programming models are not specific to a particular type of memory architecture.
– In fact, any programming model can in theory be implemented on any underlying hardware.

Programming Language Models
• Threads Model
• Data Parallel Model
• Task Parallel Model
• Message Passing Model
• Partitioned Global Address Space Model

Threads Model
• A single process can have multiple, concurrent execution paths.
– The thread-based model requires the programmer to manually manage synchronization, load balancing, and locality:
• Deadlocks
• Race conditions
• Implementations
– POSIX Threads
• Specified by the IEEE POSIX 1003.1c standard (1995) and commonly referred to as Pthreads.
• Library-based explicit parallelism model.
– OpenMP
• Compiler directive-based model that supports parallel programming in C/C++ and Fortran (a minimal OpenMP sketch follows this excerpt).

Data Parallel Model
• Focus on data partition
• Each task performs the same operation …
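The excerpt contrasts the library-based Pthreads model with the directive-based OpenMP model. The following sketch (not from the lecture) shows what the directive-based style looks like in C: a single pragma parallelizes the loop and performs the reduction, so the programmer writes no explicit thread creation, work splitting, or locking.

```c
/* Sketch of the directive-based threads model described above: an array
 * sum expressed with one OpenMP pragma.  Thread management and the
 * reduction are handled by the compiler and runtime. */
#include <omp.h>
#include <stdio.h>

#define N 1000000   /* assumed array size */

int main(void) {
    static double data[N];
    for (long i = 0; i < N; i++)
        data[i] = 1.0;

    double total = 0.0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < N; i++)
        total += data[i];

    printf("sum = %f (using up to %d threads)\n", total, omp_get_max_threads());
    return 0;
}
```

With GCC or Clang, the pragma is enabled with a compiler flag such as -fopenmp; without it, the code still compiles and runs serially, which is part of OpenMP's incremental-parallelization appeal.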
High Performance Computing Using MPI and OpenMP on Multi-Core Parallel Systems
Parallel Computing 37 (2011) 562–575. Contents lists available at ScienceDirect. Parallel Computing journal homepage: www.elsevier.com/locate/parco

High performance computing using MPI and OpenMP on multi-core parallel systems
Haoqiang Jin a,⇑, Dennis Jespersen a, Piyush Mehrotra a, Rupak Biswas a, Lei Huang b, Barbara Chapman b
a NAS Division, NASA Ames Research Center, Moffett Field, CA 94035, United States
b Department of Computer Sciences, University of Houston, Houston, TX 77004, United States

Article history: Available online 2 March 2011.
Keywords: Hybrid MPI + OpenMP programming; Multi-core Systems; OpenMP Extensions; Data Locality.

Abstract: The rapidly increasing number of cores in modern microprocessors is pushing the current high performance computing (HPC) systems into the petascale and exascale era. The hybrid nature of these systems – distributed memory across nodes and shared memory with non-uniform memory access within each node – poses a challenge to application developers. In this paper, we study a hybrid approach to programming such systems – a combination of two traditional programming models, MPI and OpenMP. We present the performance of standard benchmarks from the multi-zone NAS Parallel Benchmarks and two full applications using this approach on several multi-core based systems including an SGI Altix 4700, an IBM p575+ and an SGI Altix ICE 8200EX. We also present new data locality extensions to OpenMP to better match the hierarchical memory structure of multi-core architectures. Published by Elsevier B.V.

1. Introduction. Multiple computing cores are becoming ubiquitous in commodity microprocessor chips, allowing the chip manufacturers to achieve high aggregate performance while avoiding the power demands of increased clock speeds.
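As a rough illustration of the hybrid model this paper studies (and not of its NAS Parallel Benchmark codes), the sketch below combines MPI across ranks with OpenMP within each rank: threads reduce the rank-local data in shared memory, and MPI_Reduce combines the per-rank results across the distributed memory. LOCAL_N and the initialization are placeholder assumptions.

```c
/* Hedged sketch of the hybrid MPI + OpenMP style: MPI ranks own
 * distributed pieces of the data, OpenMP threads share work within a rank. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define LOCAL_N 250000   /* elements owned by each MPI rank (assumed size) */

int main(int argc, char **argv) {
    int provided, rank;
    /* Ask for thread support so OpenMP threads may coexist with MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    static double local[LOCAL_N];
    for (long i = 0; i < LOCAL_N; i++)
        local[i] = 1.0;

    /* Shared memory within the node: OpenMP threads reduce the local data. */
    double local_sum = 0.0;
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < LOCAL_N; i++)
        local_sum += local[i];

    /* Distributed memory across nodes: MPI combines the per-rank results. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %f\n", global_sum);
    MPI_Finalize();
    return 0;
}
```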
Single-Sided PGAS Communications Libraries
SINGLE-SIDED PGAS COMMUNICATIONS LIBRARIES. Parallel Programming Languages and Approaches.

Contents
• A Little Bit of History
• Non-Parallel Programming Languages
• Early Parallel Languages
• Current Status of Parallel Programming
• Parallelisation Strategies
• Mainstream HPC
• Alternative Parallel Programming Languages
• Single-Sided Communication
• PGAS
• Final Remarks and Summary

Non-Parallel Programming Languages
• Serial languages important
• General scientific computing
• Basis for parallel languages
• PRACE Survey results:
[Bar chart: languages used by surveyed applications – Fortran, C, C++, Python, Perl, Other, Java, Chapel, Co-array Fortran – against response count, 0 to 250]
• PRACE Survey indicates that nearly all applications are written in:
• Fortran: well suited for scientific computing
• C/C++: allows good access to hardware
• Supplemented by
• Scripts using Python, PERL and BASH
• PGAS languages starting to be used

Data Parallel
• Processors perform similar operations across elements in an array
• Higher level programming paradigm, characterised by:
• single-threaded control
• global name space
• loosely synchronous processes
• parallelism implied by operations applied to data
• compiler directives
• Data parallel languages: generally serial language (e.g., F90) plus
• compiler directives (e.g., for data distribution)
• first class language constructs to support parallelism
• new intrinsics and library functions
• Paradigm well suited to a number of early (SIMD) parallel computers
• Connection …

(A one-sided communication sketch follows this excerpt.)
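The contents above mention single-sided communication and PGAS. As a hedged illustration of the single-sided idea, the sketch below uses MPI's one-sided RMA interface (not a PGAS language or a library such as SHMEM, which the excerpt's title refers to): rank 0 writes directly into memory exposed by rank 1, with no matching receive on the target side. Run with at least two MPI ranks; the variable names are illustrative.

```c
/* One-sided communication: rank 0 puts a value into rank 1's window
 * without rank 1 issuing a receive. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double buf = 0.0;                 /* memory exposed to remote access */
    MPI_Win win;
    MPI_Win_create(&buf, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);            /* open an access epoch */
    if (rank == 0) {
        double value = 3.14;
        /* Write into rank 1's buf; rank 1 makes no matching call. */
        MPI_Put(&value, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);            /* close the epoch: the put is now visible */

    if (rank == 1)
        printf("rank 1 sees %f\n", buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```

The decoupling shown here (the target does not participate in each transfer) is what distinguishes single-sided libraries and PGAS models from two-sided message passing.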
Towards High Performance Computing (HPC) Through Parallel Programming Paradigms and Their Principles
International Journal of Programming Languages and Applications (IJPLA), Vol. 4, No. 1, January 2014

TOWARDS HIGH PERFORMANCE COMPUTING (HPC) THROUGH PARALLEL PROGRAMMING PARADIGMS AND THEIR PRINCIPLES
Dr. Brijender Kahanwal, Department of Computer Science & Engineering, Galaxy Global Group of Institutions, Dinarpur, Ambala, Haryana, India

ABSTRACT
Nowadays we need to solve huge computing problems very rapidly. This brings in the idea of parallel computing, in which several machines or processors work cooperatively on computational tasks. Over the past decades there have been many shifts in how the importance of parallelism in computing machines is perceived, and parallel computing has proven a superior solution to many computing limitations, such as speed and density, non-recurring high cost, and power consumption and heat dissipation. Commercial multiprocessors have emerged at lower prices than mainframe machines and supercomputers. In this article, high performance computing (HPC) through parallel programming paradigms (PPPs) is discussed, together with their constructs and design approaches.

KEYWORDS
Parallel programming languages, parallel programming constructs, distributed computing, high performance computing

1. INTRODUCTION
The numerous computationally intensive tasks of computer science, such as weather forecasting, climate research, oil and gas exploration, molecular modelling, quantum mechanics, and physical simulations, are performed by supercomputers as well as mainframe computers. These days, however, advances in technology mean that multiprocessor and multi-core systems can perform the kinds of computations once reserved for supercomputing machines. Due to recent advances in hardware technologies, we are leaving the von Neumann computation model and adopting distributed computing models, which include peer-to-peer (P2P), cluster, cloud, grid, and jungle computing models [1].
Workshop on Programming Languages for High Performance Computing (HPCWPL) Final Report
SANDIA REPORT SAND2007-XXXX. Unlimited Release. Printed.

Workshop on Programming Languages for High Performance Computing (HPCWPL): Final Report

Prepared by Sandia National Laboratories, Albuquerque, New Mexico 87185 and Livermore, California 94550. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under Contract DE-AC04-94-AL85000. Approved for public release; further dissemination unlimited.

Issued by Sandia National Laboratories, operated for the United States Department of Energy by Sandia Corporation.

NOTICE: This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government, nor any agency thereof, nor any of their employees, nor any of their contractors, subcontractors, or their employees, make any warranty, express or implied, or assume any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represent that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government, any agency thereof, or any of their contractors or subcontractors. The views and opinions expressed herein do not necessarily state or reflect those of the United States Government, any agency thereof, or any of their contractors.

Printed in the United States of America. This report has been reproduced directly from the best available copy. Available to DOE and DOE contractors from U.S. …
A Brief Overview of Chapel
A BRIEF OVERVIEW OF CHAPEL¹ (PRE-PRINT OF AN UPCOMING BOOK CHAPTER)
Bradford L. Chamberlain, Cray Inc. January 2013, revision 1.0

¹ This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0001. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the Defense Advanced Research Projects Agency.

Chapter 9: Chapel. Bradford L. Chamberlain, Cray Inc.

Chapel is an emerging parallel language whose design and development has been led by Cray Inc. under the DARPA High Productivity Computing Systems program (HPCS) from 2003 to the present. Chapel supports a multithreaded execution model, permitting the expression of far more general and dynamic styles of computation than the typical single-threaded Single Program, Multiple Data (SPMD) programming models that became dominant during the 1990's. Chapel was designed such that higher-level abstractions, such as those supporting data parallelism, would be built in terms of lower-level concepts in the language, permitting the user to select amongst differing levels of abstraction and control as required by their algorithm or performance requirements. This chapter provides a brief introduction to Chapel, starting with a condensed history of the project (Section 9.1). It then describes Chapel's motivating themes (Section 9.2), followed by a survey of its main features (Section 9.3), and a summary of the project's status and future work (Section 9.4).

9.1 A Brief History of Chapel
9.1.1 Inception
DARPA's HPCS program was launched in 2002 with five teams, each led by a hardware vendor: Cray Inc., Hewlett-Packard, IBM, SGI, and Sun.
Array Optimizations for High Productivity Programming Languages
RICE UNIVERSITY
Array Optimizations for High Productivity Programming Languages
by Mackale Joyner
A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

Approved, Thesis Committee:
Vivek Sarkar, Co-Chair, E.D. Butcher Professor of Computer Science
Zoran Budimlić, Co-Chair, Research Scientist
Keith Cooper, L. John and Ann H. Doerr Professor of Computer Engineering
John Mellor-Crummey, Professor of Computer Science
Richard Tapia, University Professor, Maxfield-Oshman Professor in Engineering

Houston, Texas, September 2008

ABSTRACT
Array Optimizations for High Productivity Programming Languages, by Mackale Joyner

While the HPCS languages (Chapel, Fortress and X10) have introduced improvements in programmer productivity, several challenges still remain in delivering high performance. In the absence of optimization, the high-level language constructs that improve productivity can result in order-of-magnitude runtime performance degradations. This dissertation addresses the problem of efficient code generation for high-level array accesses in the X10 language. The X10 language supports rank-independent specification of loop and array computations using regions and points. Three aspects of high-level array accesses in X10 are important for productivity but also pose significant performance challenges: high-level accesses are performed through Point objects rather than integer indices, variables containing references to arrays are rank-independent, and array subscripts are verified as legal array indices during runtime program execution.

Our solution to the first challenge is to introduce new analyses and transformations that enable automatic inlining and scalar replacement of Point objects. Our solution to the second challenge is a hybrid approach. We use an interprocedural rank analysis algorithm to automatically infer ranks of arrays in X10.
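The abstract's first challenge, inlining and scalar replacement of Point objects, can be pictured with a small before/after sketch. The C code below is not X10 and not the dissertation's compiler output; it only mimics the shape of the transformation: a per-iteration point object used as an array subscript (before) versus plain integer indices once the object has been scalar-replaced (after). The struct name point2 and the array sizes are illustrative assumptions.

```c
/* Before/after sketch of scalar replacement of an index object. */
#include <stdio.h>

#define NX 4
#define NY 4

typedef struct { int coord[2]; } point2;   /* stand-in for a high-level Point */

/* "Before": every access goes through a point object. */
static double sum_via_points(double a[NX][NY]) {
    double s = 0.0;
    for (int i = 0; i < NX; i++) {
        for (int j = 0; j < NY; j++) {
            point2 p = { { i, j } };          /* object built on every iteration */
            s += a[p.coord[0]][p.coord[1]];
        }
    }
    return s;
}

/* "After": the point has been inlined and replaced with scalars. */
static double sum_scalarized(double a[NX][NY]) {
    double s = 0.0;
    for (int i = 0; i < NX; i++)
        for (int j = 0; j < NY; j++)
            s += a[i][j];                     /* direct integer subscripts */
    return s;
}

int main(void) {
    double a[NX][NY];
    for (int i = 0; i < NX; i++)
        for (int j = 0; j < NY; j++)
            a[i][j] = 1.0;
    printf("%f %f\n", sum_via_points(a), sum_scalarized(a));  /* both 16.0 */
    return 0;
}
```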