Programming Multicores: Do Applications Programmers Need to Write Explicitly Parallel Programs?

In this panel discussion from the 2009 Workshop on Computer Architecture Research Directions, David August and Keshav Pingali debate whether explicitly parallel programming is a necessary evil for applications programmers, assess the current state of parallel programming models, and discuss possible routes toward finding the programming model for the multicore era.

Arvind, Massachusetts Institute of Technology
David August, Princeton University
Keshav Pingali, University of Texas at Austin
Derek Chiou, University of Texas at Austin
Resit Sendag, University of Rhode Island
Joshua J. Yi, University of Texas School of Law

Computer Architecture Debates, May/June 2010. Published by the IEEE Computer Society. 0272-1732/10/$26.00 © 2010 IEEE.

Moderator's introduction: Arvind

Do applications programmers need to write explicitly parallel programs? Most people believe that the current method of parallel programming is impeding the exploitation of multicores. In other words, the number of cores in a microprocessor is likely to track Moore's law in the near future, but the programming of multicores might remain the biggest obstacle in the forward march of performance.

Let's assume that this premise is true. Now, the real question becomes: how should applications programmers exploit the potential of multicores? There have been two main ideas in exploiting parallelism: implicitly and explicitly.

Implicit parallelism

The concept of the implicit exploitation of the parallelism in a program has its roots in the 1970s and 1980s, when two main approaches were developed.

The first approach required that the compilers do all the work in finding the parallelism. This was often referred to as the "dusty decks" problem, that is, how to exploit parallelism in existing programs. This approach taught us a lot about compiling. But most importantly, it taught us how to write a program in the first place, so that the compiler had a chance of finding the parallelism.

The second approach, to which I also contributed, was to write programs in a manner such that the inherent (or obvious) parallelism in the algorithm is not obscured in the program. I explored declarative languages for this purpose. This line of research also taught us a lot. It showed us that we can express all kinds of parallelism in a program, but even after all the parallelism has been exposed, it is fairly difficult to efficiently map the exposed parallelism on a given hardware substrate. So, we are faced with two problems:

- How do we expose the parallelism in a program?
- How do we package the parallelism for a particular machine?

Explicit parallelism

The other approach is to program machines explicitly to exploit parallelism. This means that the programmer should be made aware of all the machine resources: the type of interconnection, the number and configuration of caches in the memory hierarchy, and so on. But, it is obvious that if too much information were disclosed about the machine, programming difficulty would increase rapidly. Perhaps a more systematic manner of exposing machine details could alleviate this problem.

Difficulty is always at the forefront of any discussion of explicit parallel programming. In the earliest days of the message passing interface (MPI), experts were concerned that people would not be able to write parallel programs at all because humans' sequential manner of thinking would make writing these programs difficult. In addition, the more resources a machine exposes, the less abstract the programming becomes. Hence, it is not surprising that such programs' portability becomes a difficult issue. How will the program be transferred to the next generation of a given machine from the same vendor or to an entirely different machine? Today, the issue of portability is so important that giving it up is not really an option. To do so would require that every program be rewritten for each machine configuration.

The last problem, which is equally important as the first two, is that of composability. After all, the main purpose of parallel programming is to enhance performance. Composing parallel programs in a manner that is informative about the performance of the composed program remains one of the most daunting challenges facing us.

Implicit versus explicit debate

This minipanel addresses the implicit versus explicit debate. How much about the machine should be exposed? Professors David August and Keshav Pingali will debate the following questions:

- How should applications programmers exploit the potential of multicores?
- Is explicitly parallel programming inherently more difficult than implicitly parallel programming?
- Can we design languages, compilers, and runtime systems so that applications programmers can get away with writing only implicit parallel programs, without sacrificing performance?
- Does anyone other than a few compiler writers and systems programmers need explicitly parallel programming?
- Do programmers need to be able to express nondeterminism (for example, selection from a set) to exploit parallelism in an application?
- Is a speculative execution model essential for exploiting parallelism?

The case for an implicitly parallel programming model and dynamic parallelization: David August

Let's address Arvind's questions directly, starting with the first.

How should applications programmers exploit the potential of multicores? We appear to have two options. The first is explicitly parallel programming (parallel programming for short); the second is parallelizing compilers. In one approach, humans perform all of the parallelism extraction. In the other, tools, such as compilers, do the extraction. Historically, both approaches have been dismal failures. We need a new approach, called the implicitly parallel programming model and dynamic parallelization. This is a hybrid approach that is fundamentally different from simply combining explicitly parallel programming with parallelizing compilers.

To understand this new approach, we must understand the difference between "explicit" and "implicit." Consider the __INLINE__ directive. You can, using some compilers, mark a function with this directive, and the compiler will automatically and reliably inline it for you. This directive is explicit. The tool makes inlining more convenient than inlining manually. However, it is not as convenient as simply not having to concern yourself with questions of inlining, because you are still responsible for making the decision and informing the inlining tool of your decision. Explicit inlining is now unnecessary because compilers are better at making inlining decisions than programmers. Instead, by using functions and methods, programmers provide compilers with enough information to decide on their own what and where to inline. Inlining has become implicit since each function and method by itself (without the __INLINE__ directive) is a suggestion to the compiler that it should make a decision about whether to inline.

Parallel programming is explicit in that the programmer concretely specifies how to partition the program. In parallel programming, we have many choices. Figure 1 shows a partial list of more than 150 parallel programming languages that programmers can choose from if they want to be explicit about how to parallelize a program. Somehow, despite all these options, the problem still isn't solved.

[Figure 1. A partial list of parallel programming languages.]

Programmers have a hard time making good decisions about how to parallelize codes, so forcing them to be explicit about it to address the multicore problem isn't a solution. As we'll see, it actually makes the problem worse in the long run. That's why I'm not a proponent of explicitly parallel programming.

Is explicitly parallel programming inherently more difficult than implicitly parallel programming? For this question, I'm not going to make a strong argument. Instead, my opponent (Keshav Pingali) will make a strong argument for me. In his PLDI (Programming Language Design and Implementation) 2007 paper,[1] he quotes Tim Sweeney, who "designed the first multithreaded Unreal 3 game engine." Sweeney estimates that "writing multithreaded code triple software code cost at Epic games." This means that explicitly parallel programming is going to hurt. And, it's going to hurt more as the number of cores increases. Through Sweeney, I think my opponent has typified the universal failure of explicitly parallel programming to solve the problem.
