Selecting the Right Parallel Programming Model
Selecting the Right Parallel Programming Model: Intel's perspective and commitments
James Reinders, Intel Corp.
© Intel 2012, All Rights Reserved

TACC-Intel Highly Parallel Computing Symposium
April 10-11, 2012, Austin, TX
8:30-10:00am, Tuesday April 10, 2012
Invited Talks:
• "Selecting the right Parallel Programming Model: Intel's perspective and commitments," James Reinders
• "Many core processors at Intel: lessons learned and remaining challenges," Tim Mattson

James Reinders, Director, Software Evangelist, Intel Corporation
James Reinders is a senior engineer who joined Intel Corporation in 1989 and has contributed to projects including the systolic array systems WARP and iWarp, the world's first TeraFLOPS supercomputer (ASCI Red), and the world's first TeraFLOPS microprocessor (Knights Corner), as well as compiler and architecture work for multiple Intel processors and parallel systems. James has been a driver behind the development of Intel as a major provider of software development products, and serves as their chief software evangelist. James is the author of books on VTune (Intel Press) and Threading Building Blocks (O'Reilly Media), as well as a co-author of Structured Parallel Programming (Morgan Kaufmann), due out in June 2012. James is currently involved in multiple efforts at Intel to bring parallel programming models to the industry, including for the Intel MIC architecture.

No Widely Used Programming Language was designed as a Parallel Programming Language

Fortran, C++, C#, C, Java, Perl, COBOL, Python, JavaScript, HTML

FORTRAN (1957):

          WRITE (6,7)
        7 FORMAT(13H HELLO, WORLD)
          STOP
          END

Fortran
• Key adaptations that came later:
  – Subroutines and functions (Fortran II, 1958)
  – File I/O, characters, strings (Fortran 77, 1978)
  – Recursion (Fortran 90, 1991) [a common non-standard extension available in many Fortran 77 compilers]
  – Free-form source, not based on the 80-column punched card (Fortran 90, 1991)
  – Variable names up to 31 characters instead of 6 (Fortran 90, 1991)
  – Inline comments (Fortran 90, 1991)
  – Array notation (Fortran 90, 1991)
  – Operator overloading (Fortran 90, 1991)
  – Dynamic memory allocation (Fortran 90, 1991)
  – FORALL (Fortran 95, 1995)
  – OOP (Fortran 2003, 2003)
  – DO CONCURRENT (Fortran 2008, 2010)
  – Co-Array Fortran (Fortran 2008, 2010)

C
• Early key features: the "register" keyword is out of use; "volatile" is fading in usage
• Added: stronger typing (ANSI C, 1989); C11
• OpenMP* (1996)
• Cilk™ Plus (2010)
C++
• Object oriented
• Intel® Threading Building Blocks (2006)
• C++11
• Cilk™ Plus (2010)

C++11 (some applies to C11 also)
Core language runtime performance enhancements:
• Rvalue references and move constructors
• Generalized constant expressions
• Modification to the definition of plain old data
Core language build-time performance enhancements:
• Extern template
Core language usability enhancements:
• Initializer lists
• Uniform initialization
• Type inference
• Range-based for-loop
• Lambda functions and expressions (anonymous functions)
• Alternative function syntax
• Object construction improvement
• Explicit overrides and final
• Null pointer constant
• Strongly typed enumerations
• Right angle bracket
• Explicit conversion operators
• Alias templates
• Unrestricted unions
Core language functionality improvements:
• Variadic templates
• New string literals
• User-defined literals
• Multithreading memory model (defining visibility of stores)
• Thread-local storage
• Explicitly defaulted and deleted special member functions
• Type long long int
• Static assertions
• Allow sizeof to work on members of classes without an explicit object
• Control and query object alignment
• Allow garbage-collected implementations
C++ standard library changes:
• Upgrades to standard library components
• Threading facilities (futures & promises, async)
• Tuple types
• Hash tables
• Regular expressions
• General-purpose smart pointers
• Extensible random number facility
• Wrapper reference
• Polymorphic wrappers for function objects
• Type traits for metaprogramming
• Uniform method for computing the return type of function objects
Some material adapted from wikipedia.org

What about futures & promises?
Futures/promises were NOT added to be highly parallel constructs. Using them that way will do very POORLY.
future: the consumer end of a 1-element producer/consumer queue.
• The producer computes the value: it calls set_value() on the promise.
• The consumer needs the future value: it calls get() on the future.
Futures can also be obtained via std::async(). (This is where people get hopeful for parallel programming.)
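A minimal sketch of that producer/consumer pairing, assuming nothing beyond the C++11 standard library (the value 42.0 and the explicit producer thread are illustrative only):

    #include <future>
    #include <iostream>
    #include <thread>

    int main() {
        // The promise is the producer end, the future the consumer end,
        // of a 1-element producer/consumer queue.
        std::promise<double> p;
        std::future<double> f = p.get_future();

        // Producer computes the value and calls set_value() on the promise.
        std::thread producer([&p] { p.set_value(42.0); });

        // Consumer needs the value: get() blocks until set_value() has run.
        std::cout << f.get() << "\n";

        producer.join();
        return 0;
    }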
What about futures & promises?
Using futures as the basis for scalable parallelism was soundly refuted in the seminal 1993 paper "Space-efficient scheduling of multithreaded computations" by Blumofe and Leiserson. This strongly motivated Cilk. The linguistic problems are more subtle.

What about futures & promises?
The following two statements do roughly the same thing:

    C++11: std::future<double> result = std::async(foo, x);
    Cilk:  double result = cilk_spawn foo(x);

The first looks like a call to async(). The second looks like a call to foo().

What about futures & promises?
Semantically, consider the following:

    std::string s("hello");
    int bar(const std::string& s);
    std::future<int> result = std::async(bar, s + " world");

The above statement is intended to pass "hello world" to bar and run it asynchronously. The problem is that s + " world" is a temporary object that gets destroyed as soon as the statement completes.

What about futures & promises?
Boosters of std::async will counter that all you need is to add a lambda:

    std::future<int> result = std::async([&]{ return bar(s + " world"); });

Without the lambda, it is a race condition that should not exist in a linguistically sound parallel construct, but it is pretty much unavoidable in a library-only specification.
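A compilable sketch of that lambda workaround, assuming a stand-in body for bar (the slide only declares its signature) and an explicit launch policy for clarity:

    #include <future>
    #include <iostream>
    #include <string>

    // Hypothetical stand-in definition so the example runs;
    // the slide declares bar but never defines it.
    int bar(const std::string& s) { return static_cast<int>(s.size()); }

    int main() {
        std::string s("hello");

        // The lambda builds the temporary inside the asynchronous call
        // itself, so its lifetime matches the call to bar.
        std::future<int> result =
            std::async(std::launch::async, [&]{ return bar(s + " world"); });

        std::cout << result.get() << "\n";  // prints 11, length of "hello world"
        return 0;
    }

Note that the lambda captures s by reference, so s must outlive the asynchronous call.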
Needs for Parallel Programming
Please:
• Be composable (most important)
• Fit my program (Fortran, C, C++)
• Be portable (not platform specific)
The investment required to be "composable" for parallel programming is significant and has many faces.
This summarizes Intel's primary investments in helping with parallel programming: focus, without precluding other smaller investments.

Family of Parallel Programming Models Applicable to Multicore and Many-core Programming
• Intel® Cilk™ Plus: language extensions to simplify parallelism (open sourced; also an Intel product)
• Intel® Threading Building Blocks: widely used C++ template library for parallelism (open sourced; also an Intel product)
• Domain-Specific Libraries: Intel® Integrated Performance Primitives, Intel® Math Kernel Library
• Established Standards: Message Passing Interface (MPI), OpenMP*, Coarray Fortran, OpenCL*
• Research and Development: Intel® Concurrent Collections, Offload Extensions, River Trail (Parallel JavaScript), Intel® Array Building Blocks, Intel® SPMD Program Compiler
Building upon the C/C++/Fortran compilers, libraries, analysis tools, and standards you depend on… innovating on Parallel Programming.

Needs for Parallel Programming
Address these:
• Scaling
• Vectorization
• Specialization
Please:
• Be composable
• Fit my program
• Be portable

Moore's Law: More Transistors, More Parallelism, Many Ways

[Chart] Processor Clock Rate (rapid growth halted ~2005). Source: James Reinders, © 2011, parallelbook.com

[Chart] Transistors per Processor: continuing to grow (Moore's Law). Source: James Reinders, © 2011, parallelbook.com

Moore's Law: More Transistors ✔ More Parallelism ✔ Many Ways

More Parallelism: Many Ways
• More Cores

[Charts] Hardware Threads & Cores. Source: James Reinders, © 2011, parallelbook.com

Hardware Threads
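For the hardware threads and cores discussion above, C++11's threading facilities expose a portable (if approximate) count. A minimal sketch, assuming only the standard library; a result of 0 means the count is not computable on that implementation:

    #include <iostream>
    #include <thread>

    int main() {
        // Hardware threads = cores × hardware threads per core,
        // as reported by the implementation; 0 if unknown.
        unsigned n = std::thread::hardware_concurrency();
        std::cout << "Hardware threads available: " << n << "\n";
        return 0;
    }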