JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015

Parallware Trainer: Interactive Tool for Experiential Learning of Parallel Programming using OpenMP and OpenACC

Manuel Arenaz, Sergio Ortega, Ernesto Guerrero, and Fernanda Foertter

Abstract—STEM education plays a key role in the sustained growth and stability of the US economy and worldwide. There is currently a shortage of skilled STEM workers, and that gap is expected to widen over the next decade. It is key to broaden the audience of STEM people trained in parallel programming targeting parallel architectures like Intel Xeon, IBM Power, NVIDIA GPU and Intel Xeon Phi. In this regard, the standards OpenMP 4.5 and OpenACC 2.5 offer pragma-based parallel programming paradigms that promise performance portability and higher productivity. This paper presents Parallware Trainer, a new interactive tool for high-productivity STEM education and training in parallel programming using OpenMP 4.5 and OpenACC 2.5. It enables experiential learning by providing an interactive, real-time GUI with editor capabilities to assist in the design, implementation and benchmarking of OpenMP/OpenACC-enabled parallel code. We envision Parallware Trainer as a key enabler for STEM education from PhD down to undergraduate level in computer science, maths, physics, and related disciplines. This paper also describes a success story resulting from a GPU Hackathon organized at the Supercomputing Center of Galicia (CESGA). We present the progress of a two-person team of the EDANYA group learning how to address the parallelization of a simulation code for the prediction of tsunamis used by the National Oceanic and Atmospheric Administration (NOAA).

Index Terms—STEM education, experiential learning, parallel programming, OpenMP 4.5, OpenACC 2.5, Parallware Trainer.

Fig. 1. The HPC education and training pyramid.

I. INTRODUCTION

HPC education and training is organized today mostly around courses, workshops and hackathons. As shown in the pyramid of Fig. 1, in courses the participants passively listen to a lecture or presentation, and apply parallel programming concepts to simple example codes in hands-on sessions. Workshops increase learning retention through interactive training activities between the participants. And hackathons are events where people come together to solve problems, participating in groups of about 2-5 individuals that take out their laptops and dive into their own problems. Hackathons allow experiential learning as participants work with a coach to learn through immediate practice in the optimization and parallelization of their own code. However, hackathons are expensive small events that train only a few people throughout the year.

The view of the landscape from Berkeley [5] addresses the so-called parallel challenge, which is described as "Writing programs that scale with increasing numbers of cores should be as easy as writing programs for sequential computers". Thus, a Parallel Bridge is used to illustrate that Software is the main problem in bridging the gap between user Applications and the parallel Hardware industry. Today, the HPC field still widely recognizes that software is pain #1. This paper addresses this by extending the parallel bridge as shown in Fig. 2. The new tower Code highlights that the features of the code implemented by the programmer directly impact the productivity of the parallelization process. Thus, best practices on parallel programming typically recommend, for example, using stride-1 memory accesses and preferring structures-of-arrays instead of arrays-of-structures.

The main contribution of this paper is Parallware Trainer [1], a new commercial software product that aims at bringing the benefits of workshops and hackathons to a broader audience of STEM people. It is a new interactive, real-time GUI for high-productivity HPC education and training that enables self-learning of best practices on parallel programming with OpenMP 4.5 and OpenACC 2.5.

M. Arenaz is with the Department of Computer Engineering at University of A Coruña, and with Appentra Solutions, e-mail: [email protected].
S. Ortega and E. Guerrero are with the EDANYA group, University of Malaga, Spain, e-mail: [email protected], [email protected].
F. Foertter is with ORNL, e-mail: [email protected].
Manuscript received September 8, 2017; revised September 8, 2017.
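The two best-practice recommendations above (stride-1 accesses and structures-of-arrays) can be made concrete with a short C sketch of our own (the type and function names are illustrative, not from the paper):

```c
#include <stddef.h>

/* Array-of-structures (AoS): the fields of one element are contiguous,
 * so a loop that reads a single field walks memory with a large stride. */
struct particle_aos { double x, y, z, mass; };

/* Structure-of-arrays (SoA): each field is its own contiguous array,
 * so the same loop performs stride-1 accesses that vectorize and
 * coalesce well on CPUs and GPUs. */
struct particles_soa { double *x, *y, *z, *mass; };

double total_mass_aos(const struct particle_aos *p, size_t n) {
    double m = 0.0;
    for (size_t i = 0; i < n; i++)
        m += p[i].mass;      /* stride = sizeof(struct particle_aos) */
    return m;
}

double total_mass_soa(const struct particles_soa *p, size_t n) {
    double m = 0.0;
    for (size_t i = 0; i < n; i++)
        m += p->mass[i];     /* stride-1: consecutive doubles */
    return m;
}
```

Both functions compute the same result; the difference is purely in the memory access pattern that the compiler and hardware see.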

Fig. 2. An extension of the parallel bridge of Berkeley's view of the parallel computing landscape. The new tower Code inserted between Applications and parallel Hardware highlights the impact of the program code on the productivity of the parallelization process.

Powered by the hierarchical classification engine of Parallware technology [4], Parallware Trainer discovers parallel patterns in sequential code, provides a ranking of parallelization strategies, and generates pragma-based parallel source code using the standards OpenMP 4.5 and OpenACC 2.5. The tool is also pre-loaded with sample codes that cover the most important parallel patterns used in the codes available in the CORAL Benchmarks [6] and the NAS Parallel Benchmarks [7]. As shown in Fig. 2, programmers interact with compilers through Parallware's parallel patterns, which hide the complexity of the dependence terminology that is difficult to understand for scientists and engineers. Overall, Parallware Trainer reduces the costs of HPC training and education, the main advantages being: (1) reduction of learning effort and increase of learning retention through learning by doing, (2) high availability 24x7, and (3) a broader audience of STEM people not located near HPC training sites.

The rest of the paper is organized as follows. Section II discusses related work. Section III presents the new tool Parallware Trainer, describing its GUI layout and its technical features. The current technological roadmap under development in order to find the product-market fit is also sketched. Section IV describes the experience of staff of the EDANYA group using Parallware Trainer to learn how to parallelize a simulation code that helps the NOAA predict tsunamis. Finally, Section V presents conclusions and future work.

II. RELATED WORK

There are not many tools dedicated to training in parallel programming with OpenMP and OpenACC. HPC centers organize courses and workshops [2] that typically teach the most relevant parallel programming concepts using step-by-step instructions and hands-on labs.

Production-level compilers (e.g., Intel ICC, GNU GCC, NVIDIA PGI) are not of practical use to help understand the technical reasons behind success and failure in the parallelization of a code. Compiler user messages are written using the notation and terminology of classical dependence analysis theory, a mathematical approach to discover parallelism in sequential codes. Thus, compiler user messages typically report failures in discovering parallelism by pointing to source code instructions that may introduce true/output/anti-dependences during parallel execution. In contrast, Parallware uses a new computational approach that consists of a hierarchical classification scheme for dependence analysis. Parallware Trainer reports algorithmic features found in the code in terms of parallel patterns [4], such as fully parallel loops, parallel scalar reductions and parallel sparse reductions. In addition, it provides a ranking of several parallelization strategies that are applicable to the code, and allows the user to generate, study and run implementations of those strategies. These features are not typically available in production-level compilers.

Web-based HPC training [3] typically includes webinar series, video tutorials and code samples. However, these HPC training environments do not enable experiential learning because the environment does not provide any feedback about the problems encountered when the concepts are applied to the code of the developer. Parallware Trainer is a step forward that could be integrated in third-party web-based training environments as well.

III. PARALLWARE TRAINER

Parallware Trainer [1] is a new interactive commercial tool for high-productivity HPC education and training using OpenMP 4.5 and OpenACC 2.5. It allows experiential learning by providing an interactive, real-time GUI with editor capabilities to assist in the design and implementation of parallel code. Powered by the hierarchical classification engine of Parallware technology, it discovers parallelism using parallel patterns, and implements those patterns using the standards OpenMP 4.5 and OpenACC 2.5 (see the video tutorials "How to use Parallware Trainer" available at www.parallware.com).
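Two of the pattern classes just mentioned can be illustrated with short OpenMP-annotated C loops (a sketch of ours; the function names are illustrative, not Parallware output):

```c
#include <stddef.h>

/* Fully parallel loop: no loop-carried dependences, so every
 * iteration may execute on a different thread. */
void scale(double *a, const double *b, double k, size_t n) {
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++)
        a[i] = k * b[i];
}

/* Parallel scalar reduction: every iteration updates the same scalar,
 * a dependence that the reduction clause resolves by giving each
 * thread a private partial sum combined at the end of the loop. */
double dot(const double *a, const double *b, size_t n) {
    double s = 0.0;
    #pragma omp parallel for reduction(+:s)
    for (long i = 0; i < (long)n; i++)
        s += a[i] * b[i];
    return s;
}
```

A classical dependence analyzer would report the update of `s` as a loop-carried dependence; the pattern view instead names it a scalar reduction and points directly at the `reduction(+:s)` fix.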

Fig. 3. Main screen of the GUI of Parallware Trainer. Layout composed of five panels: 1) Project manager, 2) Source code editor, 3) Parallel code editor, 4) Execution console, and 5) Parallware console. When clicking on a gutter attached to a scope (e.g., function, loop), the GUI shows a window that enables the user to provide additional hints to control the behaviour of the Parallware core technology.

A. Graphical User Interface (GUI)

The layout of the GUI of Parallware Trainer is shown in Fig. 3. It provides an environment for the editing, compilation and execution of sequential and OpenMP/OpenACC-enabled parallel code. Next, the panels of the main screen are described in detail.

1) Project Manager: Handles multiple Parallware Trainer projects, with drag-and-drop of the C/C++/Fortran source code files that compose your projects. A Parallware Trainer project consists of a directory with the tree structure shown in Fig. 4a. It separates the source code (src), binaries (obj), executables (bin), external libraries (lib), external header files (inc), and other resources such as data input files (res). The project directory also contains a Makefile generated automatically (see file ATMUX.make in Fig. 4a), which provides targets to clean, build and run the code of the project (see an excerpt of ATMUX.make in Fig. 4b). Note that the project directory is self-contained, and all the targets can also be executed from a terminal (outside of the Parallware Trainer GUI).

2) Source Code Editor: Edits multiple source code files of the active project. The GUI provides syntax highlighting for C/C++/Fortran. Each scope of the source code (e.g., functions, loops) has an attached gutter that enables the user to supply hints for controlling the analyses performed by Parallware core technology. As shown in Fig. 3, the user can select the parallel programming standard (OpenMP -or Default- and OpenACC), the device (CPU -or Default-, GPU and PHI), and the parallel programming paradigm (Loop -or Default-, Offload, Task and SIMD). The GUI also allows the user to select between three parallel implementations of parallel reductions widely used in the benchmarks CORAL and NASA NPB: Atomic access, Built-in reduction and Variable privatization (see details in [4]). Note that unsupported combinations of hints are conveniently reported to the user.

3) Parallel Code Editor: Handles multiple parallel versions of the same source code, the one corresponding to the active tab in the source code editor (see the version pw_atmux.c of the sequential code atmux.c in Fig. 3). By default, Parallware Trainer creates a parallel version automatically generated by Parallware core technology (see Fig. 3, version named pw_atmux.c). The user is allowed to create more parallel versions for the same sequential code (see Fig. 3, version named atmux_synchopt_manual.c). Such versions can be edited to fine-tune the OpenMP/OpenACC pragmas and clauses as needed to improve the performance of the parallel code. Finally, note that the GUI requests additional information from the user as needed (see the check list at the bottom of the editor of file pw_atmux.c).

4) Execution Console: The GUI provides two execution profiles (see Fig. 3): Sequential, to build and run the code using one thread; and Parallel, to enable OpenMP/OpenACC capabilities and run the code using multiple threads. The user is allowed to choose the preferred compiler suite (e.g., GCC, ICC, PGCC) and specify the flags needed to build and run the code.

5) Parallware Console: The user is allowed to browse the messages reported by Parallware core technology after the analysis of the source code, the one corresponding to the active tab in the source code editor. As shown in Fig. 3, Parallware reports the parallel patterns found in each loop, giving details about the source code instructions, variables and operators

involved in the pattern (see Fig. 3, loop at line 9, message "Parallel sparse reduction on variable 'y'"). It also reports a ranking of applicable parallelization strategies, the top of the ranking being the strategy automatically selected and implemented (see the sparse accumulation at line 16 protected with an OpenMP atomic pragma). The resulting OpenMP/OpenACC pragmas inserted by Parallware can be studied in the parallel code editor.

We close the description of the GUI of Parallware Trainer by mentioning that the tool is shipped with a tutorial and several self-learning examples of increasing technical complexity. The snapshots shown in Fig. 5 present four well-known sample codes with very different technical features (e.g., parallel pattern, unbalanced workload, arithmetic intensity). For the sake of clarity, a list of exercises proposed to the student for the sample code ATMUX is also presented.
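To illustrate the kind of code involved, here is a hedged sketch of ours of a CSR transposed matrix-vector product in the style of the ATMUX sample (the actual benchmark source may differ): the write through the index array makes the accumulation a sparse reduction, and the atomic pragma protects it.

```c
#include <stddef.h>

/* y = A^T * x for a matrix A in compressed sparse row (CSR) format.
 * The write y[col[j]] is indirect, so different rows i can update the
 * same entry of y; the atomic pragma makes each accumulation safe. */
void atmux(double *y, const double *x, const double *val,
           const int *col, const int *rowptr, int nrows) {
    #pragma omp parallel for
    for (int i = 0; i < nrows; i++) {
        for (int j = rowptr[i]; j < rowptr[i + 1]; j++) {
            #pragma omp atomic
            y[col[j]] += val[j] * x[i];
        }
    }
}
```

The alternative strategies in the ranking (built-in reduction over y, or privatizing a copy of y per thread) trade synchronization cost against memory footprint.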

B. Technological Roadmap for 2017-2018

Early prototypes of Parallware Trainer have been tested in PRACE PATC courses at the Barcelona Supercomputing Center (BSC) since 2015. Since February 2017 we have been conducting an early access program of Parallware Trainer to find the product-market fit. As of this writing we have 50 participants from 7 supercomputing centers (OLCF, NERSC, BSC, CESGA, LRZ, EPCC, TACC), 26 international universities and 7 companies or other organizations. In addition, key opinion leaders(1) are already supporting the rolling launch of Parallware Trainer at SC17. The feedback received from the HPC community reinforces our vision that Parallware Trainer may become a key enabler in the STEM field for self-learning of parallel programming in computer science, maths, physics, and related disciplines at PhD and undergraduate levels.

The technological roadmap of Parallware Trainer is guided by best practices on parallel programming with OpenMP and OpenACC. We have analyzed by hand the OpenMP/OpenACC implementations of well-known benchmark suites [4], more specifically, the CORAL Benchmarks [6], the NAS Parallel Benchmarks [7] and ORNL's XRayTrace miniapp [8]. As a result, Parallware Trainer currently supports:
• The most popular parallel patterns, namely, Parallel Forall, Parallel Scalar Reduction and Parallel Sparse Reduction (see details in [4]).
• The parallel programming paradigms Loop and Offload for modern CPU devices (e.g., Intel Xeon, IBM Power) and NVIDIA GPUs.
• OpenMP and OpenACC implementations of parallel scalar/sparse reductions using the approaches Atomic access, Built-in reduction and Variable privatization (see more details in [4]).

The HPC race to build (pre-)exascale supercomputers by 2024 is leading to the design of increasingly complex hardware and software stacks. From a hardware perspective, the technological roadmap of Parallware Trainer is aligned with the US Exascale Computing Project (ECP) [9] and will also support the Intel Xeon Phi accelerator. From a software perspective, we will add support for the parallel programming paradigms Task and SIMD, which are expected to have an increasingly important role in future (pre-)exascale supercomputers. Note that all of these options are shown in Fig. 3 in the snapshot of Parallware Trainer.

(1) (1) Fernanda Foertter (ORNL), recently elected as SIGHPC Education Vice Chair, co-designer of the technical features and the GUI of Parallware Trainer; (2) Xavier Martorell (BSC), promoter of early usage of Parallware Trainer in PRACE PATC courses at BSC; and (3) Dirk Pleiter (JSC), helping to increase awareness of Parallware Trainer at SC17.

Fig. 4. Internals of Parallware Trainer projects: (a) Tree view (project ATMUX); (b) Makefile (excerpt of ATMUX.make).
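The Task and SIMD paradigms named in the roadmap can be sketched with two small OpenMP examples of ours (illustrative only; the cutoff value and function names are assumptions, not tool output):

```c
#include <stddef.h>

/* SIMD paradigm: vectorize one loop across the lanes of a core. */
void axpy(double *y, const double *x, double a, size_t n) {
    #pragma omp simd
    for (long i = 0; i < (long)n; i++)
        y[i] += a * x[i];
}

/* Task paradigm: irregular work units scheduled dynamically; here
 * each half of a recursive sum becomes a task. Below the cutoff the
 * recursion falls back to a plain serial loop. */
double tree_sum(const double *x, size_t n) {
    if (n < 4) {                       /* serial cutoff (assumed) */
        double s = 0.0;
        for (size_t i = 0; i < n; i++) s += x[i];
        return s;
    }
    double lo = 0.0, hi = 0.0;
    #pragma omp task shared(lo)
    lo = tree_sum(x, n / 2);
    #pragma omp task shared(hi)
    hi = tree_sum(x + n / 2, n - n / 2);
    #pragma omp taskwait                /* join both halves */
    return lo + hi;
}
```

Both functions are also correct when compiled without OpenMP, in which case the pragmas are ignored and the code runs sequentially.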

IV. CASE STUDY: EDANYA GROUP

In the scope of the early access program of Parallware Trainer, we organized a GPU Hackathon at the Supercomputing Center of Galicia (CESGA) to collect direct feedback on the usage of the tool for self-learning of parallel programming using OpenMP and OpenACC (see http://www.appentra.com/cesgahack/). Hereafter, we present the self-learning experience of a two-person team coming from the EDANYA group of the University of Malaga (Spain).

The goal of the team was learning how to address the parallelization of a family of models for the simulation of geophysical flows. Among them, the code Tsunami-HySEA [10] has become very popular in the tsunami modeling community. Tsunami-HySEA is the numerical model of the HySEA family specifically designed for earthquake-generated tsunami simulations. It combines robustness, reliability and good accuracy in a GPU-based implementation that runs faster than real time. The Tsunami-HySEA model implements in the same code the three parts of an earthquake-generated tsunami: generation, propagation, and coastal inundation. In the generation stage, Okada's fault deformation model is used to predict the initial bottom deformation that is transmitted instantaneously to the sea surface, generating the tsunami wave. The propagation and coastal inundation are modeled by the well-known shallow-water PDE system, which is discretized by means of a second-order finite volume scheme. Tsunami-HySEA has been adopted as the official code for the Spanish and Italian tsunami early warning systems and, recently, has been successfully evaluated for the National Tsunami Hazard Mitigation Program (U.S.) and has also been tested at the National Center for Tsunami Research (NCTR) of the National Oceanic and Atmospheric Administration (NOAA).

A. Preparation of the GPU Hackathon

The GPU Hackathon at CESGA was oriented to R+D groups responsible for the development of simulation codes written in C/C++/Fortran. The EDANYA group presented Tsunami-HySEA, a C++ code that approximates the shallow-water PDE system using a second-order finite volume method. An interview with Prof. Manuel Castro, member of the EDANYA team and responsible for the tsunami modeling project, established the main goal of the team: develop an OpenACC version and compare its performance with the existing CUDA version.

The conversations also revealed that the Tsunami-HySEA code is too complex for a two-person team to address its parallelization in a three-day GPU Hackathon. As a result, the EDANYA team was requested to write a miniapp that contains a simple version of Tsunami-HySEA. As shown in Table I, the miniapp is a C code that focuses on the propagation of the tsunami in a 2D domain. It was designed to keep both the physics and the complexity of Tsunami-HySEA. This miniapp reduces the order of the problem by one, allowing more resolution in the vertical domain while keeping reasonably low running times. Both codes implement the same hydrostatic shallow-water model, which is run using different bathymetry inputs.

Fig. 5. Tutorial and sample examples shipped with Parallware Trainer: (a) Sample codes; (b) Exercises proposed for ATMUX.

TABLE I. Comparison of the miniapp used by the EDANYA team with respect to the real code Tsunami-HySEA.

                        Tsunami-HySEA               miniapp
  Model                 Hydrostatic shallow water   Hydrostatic shallow water
  Tsunami features      Generation, Propagation,    Propagation
                        Inundation
  Domain                3D                          2D
  No. layers            1                           2
  Numerical scheme      Finite volumes              Finite volumes
  No. volumes           10·10^6 – 60·10^6           10·10^3 – 60·10^4
  Bathymetry input      Real world                  Simple math function
  Programming language  C++                         C

Fig. 6. Profiling of the miniapp of the EDANYA team using gprof.

Listing 1. Pseudocode of the OpenMP-enabled EDANYA miniapp.

  void main() {
    #pragma omp parallel shared(var0, var1, varn, x, reconstructions, \
                                npar, ncapas, dx, dt, tiempo_actual, heps)
    {
      do {
        var1 = calcula_iteracion(var0, dt);
        dt = calcula_dt(var0);
        #pragma omp barrier
      } while (tiempo_actual ...

Listing 2. Pseudocode of the OpenACC-enabled EDANYA miniapp.

  void main() {
    #pragma acc data copyin(reconstructions, var1)
    {
      do {
        var1 = calcula_iteracion(var0, dt);
        dt = calcula_dt(var0);
      } while (tiempo_actual ...

  Problem size (No. volumes)    10000          20000          50000           100000
  OpenMP 3.1 running on multicore processor
  1 thread     65.28   −      260.03   −      1839.99   −      7276.63   −
  2 threads    42.07  1.55×   163.56  1.59×   1043.50  1.76×   4420.28  1.65×
  4 threads    19.22  3.40×    75.11  3.46×    486.22  3.78×   2093.91  3.48×
  6 threads    13.15  4.96×    51.50  5.05×    321.65  5.72×   1418.75  5.13×
  8 threads    10.03  6.51×    39.07  6.66×    243.21  7.57×   1072.03  6.79×
  OpenACC 2.5 running on accelerator processor
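Listing 1 is pseudocode with a truncated stopping condition; a self-contained runnable sketch of the same time-stepping pattern looks as follows. The physics kernels and the constant 0.25 time step are placeholders of ours, not the EDANYA code:

```c
#define N 1000

static double var0[N], var1[N];

/* Placeholder update kernel (NOT the EDANYA physics): one explicit
 * step of a toy advection-like scheme over the volume array. */
static void calcula_iteracion(double *out, const double *in, double dt) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        out[i] = in[i] + dt * (in[(i + 1) % N] - in[i]);
}

/* Placeholder CFL-like time-step computation: constant here. */
static double calcula_dt(const double *v) {
    (void)v;
    return 0.25;
}

/* Time-stepping loop of Listing 1: advance one step, recompute dt,
 * repeat until the simulated time reaches t_final. Returns the
 * number of steps taken. */
int run_simulation(double t_final) {
    double t = 0.0, dt = 0.25;
    int steps = 0;
    do {
        calcula_iteracion(var1, var0, dt);
        dt = calcula_dt(var1);
        for (int i = 0; i < N; i++)   /* new state becomes current */
            var0[i] = var1[i];
        t += dt;
        steps++;
    } while (t < t_final);
    return steps;
}
```

The structure matches the listings: the only loop-carried state is the solution array and the time step, which is why the OpenMP version needs a barrier between iterations while the OpenACC version keeps the data resident on the device across the whole do-while loop.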

V. CONCLUSIONS AND FUTURE WORK

This paper shows evidence that Parallware Trainer has the potential to become an effective tool to enable experiential learning of parallel programming using OpenMP 4.5 and OpenACC 2.5. Powered by Parallware core technology, the tool eases the discovery of the most popular parallel patterns used in OpenMP/OpenACC-enabled scientific applications, namely, Parallel Forall, Parallel Scalar Reduction and Parallel Sparse Reduction. The GUI supports implementing, building, running and benchmarking parallel patterns on multicore CPUs (e.g., Intel Xeon, IBM Power) and accelerator devices (e.g., NVIDIA GPU, Intel Xeon Phi).

The experience of the EDANYA team in the GPU Hackathon at CESGA showed that our methodology based on parallel patterns is: (1) simple, as STEM people without prior knowledge of parallel programming with OpenMP/OpenACC succeeded in discovering parallelism in sparse codes; (2) powerful, as the parallel patterns describe how to rewrite sequential code into efficient OpenMP/OpenACC-enabled parallel code. Starting from scratch and in only 3 days, the EDANYA team implemented a scalable OpenMP parallel version of their miniapp, and defined the roadmap for the development of an OpenACC version for NVIDIA GPUs.

As future work we plan to continue the validation of Parallware Trainer and its pattern-based methodology through more hackathons and control groups. We will improve the GUI for self-training, providing basic, intermediate and advanced-level courses as well as progress metrics to enable self-evaluation. In order to widen our target audience, we will add support for Fortran, for SIMD execution, and for tasking with asynchronous execution, which are expected to be key in the upcoming (pre-)exascale supercomputers. Finally, we also plan to conduct proof-of-concept experiments using a new SaaS product based on Parallware Trainer.

ACKNOWLEDGMENT

This research has been partially supported by the Spanish Government and FEDER through Research project MTM2015-70490-C2-1-R and Andalusian Government Research project P11-FQM-8179. Also thanks to the Supercomputing Centre of Galicia (CESGA) for supporting the organization of the GPU Hackathon and for providing access to the FinisTerrae supercomputer.

REFERENCES

[1] Parallware Trainer, http://www.parallware.com/, Sep. 2017.
[2] Swiss National Supercomputing Center (CSCS), Directive Based GPU Programming: OpenACC and OpenMP, http://www.cscs.ch/events/event_detail/index.html?tx_seminars_pi1%5BshowUid%5D=161, Sep. 2017.
[3] NVIDIA, Accelerate Applications on GPUs with OpenACC Directives, https://developer.nvidia.com/how-to-openacc, Sep. 2017.
[4] M. Arenaz, O. Hernandez, and D. Pleiter, The Technological Roadmap of Parallware and its Alignment with the OpenPOWER Ecosystem, International Workshop on OpenPOWER for HPC (IWOPH17), co-located with ISC17, 2017.
[5] K. Asanovic, R. Bodik, J. Demmel, T. Keaveny, K. Keutzer, J. Kubiatowicz, N. Morgan, D. Patterson, K. Sen, J. Wawrzynek, D. Wessel, and K. Yelick, A view of the parallel computing landscape, Commun. ACM 52(10):56-67, October 2009. DOI: https://doi.org/10.1145/1562764.1562783.
[6] Department of Energy (DoE), CORAL Benchmark Codes, https://asc.llnl.gov/CORAL-benchmarks/, 2014.
[7] D.H. Bailey, E. Barszcz, J.T. Barton, D.S. Browning, R.L. Carter, L. Dagum, R.A. Fatoohi, P.O. Frederickson, T.A. Lasinski, R.S. Schreiber, H.D. Simon, V. Venkatakrishnan, and S.K. Weeratunga, The NAS Parallel Benchmarks - Summary and Preliminary Results, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91), pp. 158-165, 1991. https://www.nas.nasa.gov/publications/npb.html, DOI: 10.1145/125826.125925.
[8] M. Berril, XRayTrace miniapp, https://code.ornl.gov/mbt/RayTrace-miniapp, 2017.
[9] Exascale Computing Project (ECP), Messina Update: The US Path to Exascale in 16 slides. See Programming Models, Development Environment and Tools (Intel Xeon, IBM Power, Intel Xeon Phi, NVIDIA GPU, OpenMP, OpenACC, task-based, LLVM, FLANG). https://www.hpcwire.com/2017/04/26/messina-update-u-s-path-exascale-15-slides/, last checked July 2017.
[10] M. de la Asunción, M.J. Castro, E.D. Fernández-Nieto, J.M. Mantas, S. Ortega, and J.M. González-Vida, Efficient GPU implementation of a two waves TVD-WAF method for the two-dimensional one layer shallow water system on structured meshes, Computers & Fluids 80:441-452, 2013.

Manuel Arenaz is CEO at Appentra Solutions and professor at the University of A Coruña (Spain). He holds a PhD in Computer Science from the University of A Coruña (2003) on advanced compiler techniques for automatic parallelization of scientific codes. Recently, he co-founded Appentra Solutions to commercialize products and services that take advantage of Parallware, a new technology for semantic analysis of scientific HPC codes.

Sergio Ortega obtained the Degree in Mathematics at the University of Malaga (2005), the Master in the department of Mathematical Analysis within the program of Physics and Mathematics (FISYMAT) (2007), and his PhD "High order finite volume schemes: GPU implementation and its application to the simulation of geophysical flows" (2016). Since 2007 he has worked in the EDANYA group of the University of Malaga.

Ernesto Guerrero-Fernández studied Industrial Engineering at the University of Malaga (2015), followed by a master in Industrial Mathematics at the University of (2017). He is currently pursuing his Ph.D. in applied mathematics with the EDANYA group at the University of Malaga under the supervision of Manuel J. Castro, where he is studying shallow water models and robust numerical algorithms for hyperbolic equations.

Fernanda Foertter is HPC User Support Specialist and Programmer at the US Department of Energy's (DOE's) Oak Ridge Leadership Computing Facility (OLCF). She is a computer geek interested in quantum chemistry, application development, parallelization and clusters. Recently elected to the position of SIGHPC Education Committee Vice Chair, where she will support the goals of SIGHPC Education to promote increased knowledge and greater interest in the educational and scientific aspects of HPC.