Evolution of Programming Approaches for High-Performance Heterogeneous Systems

Jacob Lambert, University of Oregon, [email protected]

Advisor: Allen D. Malony, University of Oregon, [email protected]

External Advisor: Seyong Lee, Oak Ridge National Lab, [email protected]

Area Exam Report

Committee Members: Allen Malony, Boyana Norris, Hank Childs

Computer Science, University of Oregon, United States

December 14, 2020

ABSTRACT

Nearly all contemporary high-performance systems rely on heterogeneous computation. As a result, scientific application developers are increasingly pushed to explore heterogeneous programming approaches. In this project, we discuss the long history of heterogeneous computing and analyze the evolution of heterogeneous programming approaches, from distributed systems to grid computing to accelerator-based computing.

1 INTRODUCTION

Heterogeneous computing is paramount to today's high-performance systems. The top and next generation of supercomputers all employ heterogeneity, and even desktop workstations can be configured to utilize heterogeneous execution. The explosion of activity and interest in heterogeneous computing, as well as the exploration and development of heterogeneous programming approaches, may seem like a recent trend. However, heterogeneous programming has been a topic of research and discussion for nearly four decades. Many of the issues faced by contemporary heterogeneous programming approach designers have long histories and many connections with now antiquated projects.

In this project, we explore the evolution and history of heterogeneous computing, with a focus on the development of heterogeneous programming approaches. In Section 2, we do a deep dive into the field of distributed heterogeneous programming, the first application of hardware heterogeneity in computing. In Section 3, we briefly explore the resolutions of distributed heterogeneous systems and approaches, and discuss the transitional period for the field of heterogeneous computing. In Section 4, we provide a broad exploration of contemporary accelerator-based heterogeneous computing, specifically analyzing the different programming approaches developed and employed across different accelerator architectures. Finally, in Section 5, we take a zoomed-out look at the development of heterogeneous programming approaches, introspect on some important takeaways and topics, and speculate about the future of next-generation heterogeneous systems.

2 DISTRIBUTED HETEROGENEOUS SYSTEMS 1980 - 1995

Even 40 years ago, computer scientists realized heterogeneity was needed due to diminishing returns in homogeneous systems. In the literature, the first references to the term "heterogeneous computing" revolved around the distinction between single instruction, multiple data (SIMD) and multiple instruction, multiple data (MIMD) machines in a single environment.

Several machines dating back to the 1980s were created and advertised as heterogeneous computers. Although these machines were conceptually different than today's heterogeneous machines, they still were created to address the same challenges: using optimized hardware to execute specific algorithmic patterns.

The Partitionable SIMD/MIMD System (PASM) [270], developed at Purdue University in 1981, was initially developed for image processing and pattern recognition applications. PASM was unique in that it could be dynamically reconfigured into either a SIMD or MIMD machine, or a combination thereof. The goal was to create a machine that could be optimized for different image processing and pattern recognition tasks, configuring either more SIMD or more MIMD capability depending on the requirements of the application.

However, like many early heterogeneous computing systems, programmability was not the primary concern. The programming environment for PASM required the design of a new procedure-based structured language similar to TRANQUIL [2], the development of a custom compiler, and even the development of further custom system software.

Another early heterogeneous system was TRAC, the Texas Reconfigurable Array Computer [264], built in 1980. Like PASM, TRAC could weave between SIMD and MIMD execution modes. But also like PASM, programmability was not a primary or common concern with the TRAC machine, as it relied on now-arcane Job Control Languages and APL source code [197].

The lack of focus on programming approaches for early heterogeneous systems is evident in the difficulty of finding information on how these machines were typically programmed. However, as the availability of heterogeneous computing environments increased throughout the 1990s, so did the research and development of programming environments. Throughout the 80s and early 90s, this environment expanded to include vector processors, scalar processors, graphics machines, and more. To this end, in this first major section we explore distributed heterogeneous computing.

Although the first heterogeneous machines consisted of mixed-mode machines like PASM and TRAC, mixed-machine heterogeneous systems became the more popular and accessible option throughout the 1990s. Instead of a single machine with the ability to switch between a synchronous SIMD mode and an asynchronous MIMD mode, mixed-machine systems contained a variety of different processing machines connected by a high-speed interconnect. Examples of machines used in mixed-machine systems include graphics and rendering-specific machines like the Pixel Planes 5 and Silicon Graphics 340 VGX, SIMD and vector machines like the MasPar MP-series and the CM 200/2000, and coarse-grained MIMD machines like the CM-5, Vista, and Sequent machines.

It was well understood that different classes of machines (SIMD, MIMD, vector, graphics, sequential) excelled at different tasks (parallel computation, statistical analysis, rendering, display), and that these machines could be networked together in a single system. However, coordinating these distributed systems to execute a single application presented significant challenges, which many of the projects in the next section began to address.

In this section, we explore different programming frameworks developed to utilize these distributed heterogeneous systems. In Section 2.1, we review several surveys to gain a contextualized insight into the research consensus during the time period. Then in Section 2.2, we review the most prominent and impactful programming systems introduced during this time. Finally, in Section 2.3, we discuss the evolution of distributed heterogeneous computing and how it relates to the subsequent sections.

2.1 Distributed Heterogeneous Architectures, Concepts, and Themes

For insight into high-level perspectives, opinions, and the general state of the area of early distributed heterogeneous computing, we include discussions from several survey works published during the targeted time period. We aim to extract general trends and overarching concepts that drove the development of early systems and early heterogeneous programming approaches.

The work by Ercegovac [106], Heterogeneity in Supercomputer Architectures, represents one of the first published works specifically surveying the state of high-performance heterogeneous computing. They define heterogeneity as the combination of different architectures and system design styles into one system or machine, and their motivation for heterogeneous systems is summed up well by the following direct quote:

    Heterogeneity in the design (of supercomputers) needs to be considered when a point of diminishing returns in a homogeneous architecture is reached.

As we see throughout this work, this drive for specialization to counter diminishing returns from existing hardware repeatedly resurfaces, and this motivation for heterogeneous systems is very much relevant today.

Ercegovac's work defines four distinct avenues for heterogeneity:

(1) System Level - The combination of a CPU and an I/O channel processor, or a host and special processor, or a master/slave multiprocessor system.
(2) Operating System Level - The operating system in a distributed architecture, and how it handles functionality and performance for a diverse set of nodes.
(3) Program Level - Within a program, tasks need to be defined as concurrent, either by a programmer or compiler, and then those tasks are allocated and executed on different processors.
(4) Instruction Level - Specialized units, like arithmetic vector pipelines, are used to provide optimal cost/performance ratios. These units execute specialized instructions to achieve higher performance than possible with a generalized unit, at an extra cost.

At the time of Ercegovac's work, there existed three primary homogeneous processing approaches in high-performance computing: (1) vector and array processors, (2) multiprocessors and multi-computers following the MIMD model, and (3) attached SIMD processors. These approaches were ubiquitous across all the early surveyed works related to distributed heterogeneous computing, and they heavily influenced the heterogeneous systems created and the heterogeneous software and programming approaches used. Ercegovac [106] lists how, at the time, the three different approaches were combined in different ways to form the following five heterogeneous approaches:

(1) Mainframes with integrated vector units, programmed using a single instruction set augmented by vector instructions.
(2) Vector processors having two distinct types of instructions and processors, scalar and vector. An early example includes the SAXPY system, which could be classified as a Matrix Processing Unit.
(3) Specialized processors attached to the host machine (AP). This approach closely resembles accelerator-based heterogeneous computing, the subject of Section 4. The ST-100 and ST-50 are early examples of this approach.
(4) Multiprocessor systems with vector processors as nodes, or scalar processors augmented with vector units as nodes. For example, in PASM, mentioned earlier in this section, the operating system supported multi-tasking at the FORTRAN level, and the programmer could use fork/join API calls to exploit MIMD-level parallelism. CEDAR [172] represented another example of a multiprocessor cluster with eight processors, each processor modified with an Alliant FX/8 mini-supercomputer. This allowed heterogeneity within clusters, among clusters, and at the level of instructions, supporting vector processing, multiprocessing, and parallel processing.
(5) Special-purpose architectures that could contain heterogeneity at both the implementation and function levels. The Navier-Stokes computer (NSC) [262] is an example; its nodes could be personalized via firmware to respond as interior or boundary nodes.

Five years later, another relevant survey, Heterogeneous Computing: Challenges and Opportunities, was published by Khokhar et al. [166]. Where the previous survey focused on heterogeneous computing as a means to improve performance over homogeneous systems, this work offers an additional motivation; instead of replacing existing costly multiprocessor systems, they propose to leverage heterogeneous computing to use existing systems in an integrated environment. Conceptually, this motivation aligns closely with the goals of grid and metacomputing, discussed in Section 3.

The authors present ten primary issues facing the developing heterogeneous computing systems, which also serve as a high-level road map of the required facilities of a mature heterogeneous programming environment:

(1) Algorithm Design - Should existing algorithms be manually refactored to exploit heterogeneity, or automatically profiled to determine types of heterogeneous parallelism?
(2) Code-type Profiling - The process of determining code properties (vectorizable, SIMD/MIMD parallel, scalar, special purpose).
(3) Analytical Benchmarking - A quantitative method for determining which code patterns and properties most appropriately map to which heterogeneous components in a heterogeneous system.

(4) Partitioning - The process of dividing up and assigning an application to a heterogeneous system, informed by the code-type profiling and analytical benchmarking steps.
(5) Machine Selection - Given an array of available heterogeneous machines, what is the process for selecting the most appropriate machine for a given application? Typically, the goal of machine selection methods and algorithms, for example the Heterogeneous Optimal Selection Theory (HOST) algorithm [65], was to select the least expensive machine while respecting a maximal execution time.
(6) Scheduling - A heterogeneous system-level scheduler needs to be aware of the different heterogeneous components and schedule accordingly.
(7) Synchronization - Communication between senders and receivers, shared data structures, and collective operations presented novel challenges in heterogeneous systems.
(8) Network - The interconnection network itself between heterogeneous machines presented challenges.
(9) Programming Environments - Unlike today, where programmability and productivity lie at the forefront of heterogeneous system discussions, in this work the discussion of programming environments almost seems like an afterthought. This is not unusual in works exploring early heterogeneous systems, however, as hardware system-level issues were typically the primary focus. They do mention that a programming language would need to be independent and portable, and include cross-parallel compilers and debuggers.
(10) Performance Evaluation - Finally, they discuss the need for the development of novel performance evaluation tools specifically designed for heterogeneous systems.

In summary, the authors call for better tools to identify parallelism, improved high-speed networking and communication protocols, standards for interfaces between machines, efficient partitioning and mapping strategies, and user-friendly interfaces and programming environments. Many of these issues are addressed by the programming approaches and implementations discussed throughout this work. However, as more heterogeneous and specialized processors emerge (Sections 4.8 and 4.9), many of these issues resurface and remain outstanding challenges for today's high-performance heterogeneous computing.

In the guest editors' introduction to the 1993 special issue of Computer on Heterogeneous Processing, Freund and Siegel offer a high-level perspective on the then-current state of high-performance heterogeneous computing [117].

They offer several motivations for heterogeneous processing. Different types of tasks inherently contain different computational characteristics requiring different types of processors, and forcing all problem sets to map to the same fixed processor is unnatural. They also consider the notion that the primary goal of heterogeneous computing should be to maximize usable performance as opposed to peak performance, by means of using all available hardware in a heterogeneous way instead of maximizing performance on a specific processor.
Freund and Siegel also offer two potential programming paradigms: (1) the adaptation of existing languages for heterogeneous environments and (2) explicitly designed languages with heterogeneity in mind. They discuss advantages and disadvantages of both paradigms. This discussion of the balance between specificity and generality in heterogeneous programming paradigms continues today, with contention between specific approaches like CUDA and general approaches like OneAPI. Additionally, the authors depart from the opinion that there would be one true compiler, architecture, operating system, and tool set to handle all heterogeneous tasks well, insisting that a variety of options will likely be beneficial depending on the application and context.

In the conclusion, the authors predict that heterogeneity will always be necessary for wide classes of HPC problems; computational demands will always exceed capacity and grow faster than hardware capabilities. This has certainly proven to be true, as heterogeneous computing is a staple in today's high-performance computing.

The 1994 work by Weems et al., Linguistic Support for Heterogeneous Parallel Processing: A Survey and an Approach [292], is particularly interesting in the context of this project. As previously mentioned, programming approaches and methodologies are typically a minor consideration in many early heterogeneous computing works. However, this work explored the existing options for heterogeneous programming and the challenges and requirements for heterogeneous languages.

The authors define three essential criteria for evaluating the suitability of languages for heterogeneous computing: (1) efficiency and performance, (2) ease of implementation, and (3) portability. They discuss how languages would need to support an orthogonal combination of different programming models, including sequential execution, control (task) parallelism, coarse- and fine-grained data parallelism, and shared and distributed memory. They stress that heterogeneous programming languages must be extendable to avoid limitations on their adaptability, and that abstractions over trivialities must be provided in order to not overwhelm programmers, while still providing access to details needed by system software. Furthermore, they discuss the need for an appropriate model of parallelism at different levels, i.e., control parallelism at a high level and data parallelism at a lower level. These kinds of considerations and concerns are still relevant today. For example, the ubiquitous MPI+X approach has long been the de facto solution for this kind of tiered parallelism, but it requires interfacing with two standards and two implementations.
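To make the tiered-parallelism point concrete, the following is a minimal, generic MPI+OpenMP sketch (not drawn from any of the surveyed works): MPI expresses the coarse-grained control parallelism across nodes, while an OpenMP directive expresses the fine-grained data parallelism within a node. The problem size and the sum-of-squares kernel are purely illustrative.

/* Illustrative MPI+OpenMP ("MPI+X") sketch: MPI handles coarse-grained
 * control parallelism across nodes, OpenMP handles fine-grained data
 * parallelism within each node. Compile with an MPI wrapper compiler
 * and OpenMP enabled (details vary by implementation). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank owns a slice of a hypothetical global problem. */
    const int N = 1000000;
    int chunk = N / size;
    double local_sum = 0.0;

    /* Fine-grained data parallelism inside the node. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < chunk; i++) {
        double x = (double)(rank * chunk + i);
        local_sum += x * x;
    }

    /* Coarse-grained coordination between nodes. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of squares = %f\n", global_sum);

    MPI_Finalize();
    return 0;
}

Even in this small example, two standards, two runtimes, and two mental models are in play, which is exactly the interfacing burden noted above.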

Weems et al. then survey the existing languages and discuss their limitations with respect to their vision of a truly heterogeneous language. They include Cluster-M [107], HPF [170], Delirium [203], Linda [62], PCN [115], PVM [278], p4 [60], PC++, ABCL/1 [302], Jade [254], Ada9x [276], Charm++ [161], and Mentat [126] in the discussion, some of which are explored in this project in Section 2.2. They further detail six features of an ideal heterogeneous programming language:

(1) supports any mode of parallelism
(2) supports any grain size
(3) supports both implicit and explicit communication
(4) users can define and abstract synchronizations
(5) users can define new first-class types
(6) users can specify patterns to aid data distribution

So far, a quarter-century later, no single unified heterogeneous computing solution has evolved to meet all of these requirements. Many of the developed solutions adopt a hybrid approach, employing multiple languages to deal with different modes of parallelism, different grain sizes, and communication layers. As discussed in Section 3.2, the popularity of these hybrid approaches can partially be attributed to the widespread adoption of and reliance on MPI. However, this does not nullify these properties of an ideal language, and many contemporary solutions are exploring more unified approaches, such as PGAS languages and OneAPI.

The authors then explain the three primary deficiencies of the then-current programming approaches: (1) dependence on external languages, (2) lack of support for extension to new models of parallelism, and (3) lack of support for transporting code between targets. Finally, the authors outline two important features that they feel novel heterogeneous programming approaches should include, coined (1) metamorphism and (2) pseudomorphism.

To produce high-performance device code from high-level abstracted source code, compilers need to extract a sufficient amount of semantic information from the original application. If this semantic information is obscured by programming abstractions, the compiler may fail to apply appropriate optimizations. The authors coin metamorphism to address this issue. Metamorphism enables a language to incorporate new first-class constructs, in which new modes of parallelism can be expressed with appropriate semantic information, allowing optimizing compilers to generate efficient code. Essentially, programmers define first-class types that the compiler can understand well, effectively changing the language, instead of user-defined types and operator overloads that may obscure semantics.

Different devices may prefer different variants of an algorithm for the most efficient execution. Pseudomorphism as a language feature, as defined by the authors, enables the management of alternate implementations of an operation for mapping algorithms to different target architectures.
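Weems et al. describe pseudomorphism as a language feature rather than a library convention, but the underlying idea can be loosely illustrated in plain C with a dispatch table that maps a target architecture to an alternate implementation of the same logical operation. Everything below (the target categories, function names, and empty variant bodies) is hypothetical and only meant to convey the concept.

/* Hypothetical sketch of the idea behind "pseudomorphism": one logical
 * operation, several target-specific variants, selected at mapping time.
 * All names are illustrative, not taken from Weems et al. */
#include <stddef.h>

typedef enum { TARGET_VECTOR, TARGET_SIMD, TARGET_MIMD } target_t;

typedef void (*matmul_fn)(const double *a, const double *b, double *c, size_t n);

static void matmul_vector(const double *a, const double *b, double *c, size_t n)
{ (void)a; (void)b; (void)c; (void)n; /* variant tuned for vector machines */ }
static void matmul_simd(const double *a, const double *b, double *c, size_t n)
{ (void)a; (void)b; (void)c; (void)n; /* variant tuned for SIMD arrays */ }
static void matmul_mimd(const double *a, const double *b, double *c, size_t n)
{ (void)a; (void)b; (void)c; (void)n; /* coarse-grained MIMD variant */ }

/* The "pseudomorphic" operation: one name, alternate implementations. */
static const matmul_fn matmul_variants[] = {
    [TARGET_VECTOR] = matmul_vector,
    [TARGET_SIMD]   = matmul_simd,
    [TARGET_MIMD]   = matmul_mimd,
};

void matmul(target_t target, const double *a, const double *b, double *c, size_t n) {
    /* Select the variant for the device this task was mapped to. */
    matmul_variants[target](a, b, c, n);
}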
We next explore the 1995 survey Goals of and Open Problems in High-Performance Heterogeneous Computing by Siegel et al. [268]. Siegel was very involved in the early development of distributed heterogeneous computing, and many of his publications are included in this project, including the outline of the PASM system in this section's introduction. The authors present the following goal of heterogeneous computing:

    To support computationally intensive applications with diverse computing requirements. Ideally presented to the user in an invisible way.

They continue by outlining the then-current hurdles to achieving this goal. With the existing heterogeneous solutions, the user would need to manually decompose applications and assign sub-tasks to machines. This significantly hindered the potential benefit of heterogeneous systems. However, they do also offer two examples of success stories using heterogeneous computing:

• Example 1: Turbulent Convection Simulation. Essentially, the simulation data was forwarded from device to device, with each device performing a task well suited to its architecture. First, a CM-5 was used for a conjugate gradient method. Then a Cray 2 was used to perform a vectorized Lagrangian approach. Next, a CM-200 was used to calculate particle distribution statistics. Last, a Silicon Graphics VGX was used to visualize the simulation using an interactive volume renderer.
• Example 2: Cancer Treatment Application. Like the simulation, this application pipelines the data through three different devices. First, a CRAY-MP is used for radiation dose calculations and model interpolations. Then, a Silicon Graphics 340 VGX was used directly by physicians to interact with the model. When the model was adjusted by the physician, the new viewpoint information was sent to a Pixel Planes 5 for rendering. If a treatment plan was modified, the changes were sent back to the CRAY-MP, and the pipeline restarted.

Figure 1: Heterogeneous Computing: Conceptual Model (Siegel et al. [268]).

Looking to the future, this survey introduced a conceptual model for an end-to-end heterogeneous programming and computing approach, shown in Figure 1. Many of the components in the model addressed the issues and challenges outlined by Khokhar et al. [166], including task profiling, analytical benchmarking, and partitioning and machine selection. Although the model is conceptual, as no complete implementation existed at the time, the model and derivations of it appear frequently in the subsequent heterogeneous computing literature. The stages are summarized as follows:

• Stage 1: Generate a set of parameters of interest for codes and devices. Essentially, this consists of general properties of machines and algorithms that might be used to determine the suitability of algorithms for machines.
• Stage 2: Independently analyze the application (task profiling) and the heterogeneous machine (analytical benchmarking). For the application, extract the types of tasks and operations performed. For the heterogeneous machine, perform benchmarking to see which types of tasks map well to each different device.
• Stage 3: Use the information from Stage 2 to match application tasks to devices, and schedule the application across the heterogeneous machine.
• Stage 4: Execute the application on the heterogeneous machine.

The survey concludes with several open questions with respect to heterogeneous computing. Many of these questions are still relevant today, and important when considering contemporary heterogeneous programming approaches.

(1) What information should the user provide, and what should be automatically determined by the system?
(2) How do you strike a balance between the amount of information provided by the user and the expected performance of the system?
(3) Can you develop a machine-independent language that allows users to augment code with compiler directives, facilitates compilation of the program into efficient device code for any device in the heterogeneous machine, decomposes the tasks into homogeneous sub-tasks, and uses machine-dependent subroutine libraries?
(4) How do you develop debugging, performance, and visualization tools for heterogeneous systems?
(5) How can we address policy issues requiring system support?

The first two questions are especially relevant in the context of this project, and are explored further in later sections.

The same authors, Siegel et al., published another survey, Software Support for Heterogeneous Computing [269], in 1996. This survey addressed many of the same points as their previous survey. They similarly conclude that, with the then-current heterogeneous systems, users were required to decompose applications into subtasks and manually map partitions to target machines. They stress that a long-term goal of heterogeneous computing should be to automate this decomposition, partitioning, and scheduling, but that research tools could be developed to aid programmers until full automation is realized. They conclude by stating that portable heterogeneous programming languages at each stage in the conceptual model need significantly more research until they can be implemented in a practical way.

2.2 Distributed Heterogeneous Programming Languages and Environments

We now explore the different languages and programming approaches developed for distributed heterogeneous computing during this time period, approximately 1985-1995.

2.2.1 PVM: Parallel Virtual Machine. While most parallel computing research at the time focused on computation models, algorithms, or machine architectures, the PVM project [278], started at Oak Ridge National Laboratory, was an early attempt to provide a unified programming model for both homogeneous and heterogeneous distributed environments. PVM primarily consisted of a set of user interface primitives that could supplement a language of choice, typically C or FORTRAN. The overarching goal of PVM was to allow a diverse and scalable set of heterogeneous computer systems to be programmed as a single virtual machine. Essentially, PVM was designed as a programming environment for interacting but independent components.

Prior to PVM, applications were typically written with machine-specific function calls for inter-process communication, shared memory operations, and process spawning. PVM also included primitives for generalized locking and barriers, although shared-memory loop-level locking still required internal locking constructs. Where most machines at the time required manual coordination between diverse computation models and architectures, PVM primitives provided a portable alternative to these machine-specific constructs.

A PVM application would consist of components, which are separate stages of the application, not just subroutines. Users can then specify which components should execute on which machines, and coordinate communication between the components via message passing. PVM also provided shared-memory emulation on distributed machines, although at the cost of significant performance degradation.

This application view allowed a PVM programmer to execute specific algorithms within an application on the most appropriate hardware available, one of the fundamental goals of heterogeneous computing even today. It also allowed programmers to utilize hardware resources that may have existed but would otherwise have been wasted.

The PVM project allowed for experimentation with the trade-offs between versatility and efficiency, specifically comparing the abstraction provided by a single virtual machine view with the performance degradation of emulation. PVM was created to meet the existing and expected need for cheap, flexible, and portable parallel programming environments.

Several application examples are presented in the initial PVM publication [278]. These examples, including a Global Environment Simulation (GES) and Block Matrix-Multiplication (BMM), highlight both the capabilities of PVM and the general nature of problems targeted by heterogeneous computing at the time. In the GES application, a vector processing unit is used to compute fluid flows, a distributed system is used to model contaminant transport, a high-speed scalar machine is used to simulate temperature effects, and a real-time graphics system is used to support user interaction. The separate components shared data using both shared memory and message passing. GES represents an application well suited to a heterogeneous environment due to the diverse requirements across the different components of the application.

BMM represents an alternate use case within the PVM environment. For BMM, each computing system performs the same task, but using different algorithms depending on the machine type: a square sub-block decomposition for hypercube architectures, a static pre-schedule approach on the Sequent machine, and sequential execution on the Sun workstation. This application demonstrates how PVM can be used to execute an inherently homogeneous application across a diverse computing system using a unified programming model.

Figure 2 demonstrates an example of a matrix multiplication decomposition performed using PVM. The decomposition assigns partitions to an array of different devices in a heterogeneous system, with each device receiving a block size proportionate to its computational abilities, at least for matrix multiplication.

Figure 2: Asymmetric decomposition for matrix multiplication [278].

PVM was also used as the parallelization strategy in many other works. A widely cited (over 19,000 citations) Crystallography and NMR system [55] relied on PVM to provide parallelism and portability across computing platforms. In fact, prior to the ubiquitous rise of MPI, PVM existed as the de facto standard for distributed computing [122]. PVM also inspired the HeNCE project, discussed in the next section [39].
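To give a flavor of the PVM primitives described above, the following is a minimal master-style sketch using the C interface of PVM 3 (pvm_spawn, pvm_send/pvm_recv, and the pack/unpack routines). The worker executable name, message tags, and the sum-of-partial-results structure are illustrative assumptions, not taken from the PVM publications.

/* Minimal master/worker sketch in the spirit of PVM's C interface.
 * The "worker" program and the work decomposition are illustrative. */
#include <stdio.h>
#include "pvm3.h"

#define NWORKERS   4
#define MSG_WORK   1
#define MSG_RESULT 2

int main(void) {
    int mytid = pvm_mytid();                 /* enroll in the virtual machine */
    int tids[NWORKERS];

    /* Spawn worker tasks anywhere in the (possibly heterogeneous) VM. */
    int started = pvm_spawn("worker", NULL, PvmTaskDefault, "", NWORKERS, tids);

    /* Hand each worker a chunk id; PVM handles data conversion between
     * machines with different native formats. */
    for (int i = 0; i < started; i++) {
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&i, 1, 1);
        pvm_send(tids[i], MSG_WORK);
    }

    /* Collect partial results from any worker, in any order. */
    double total = 0.0;
    for (int i = 0; i < started; i++) {
        double partial;
        pvm_recv(-1, MSG_RESULT);            /* -1: receive from any task */
        pvm_upkdouble(&partial, 1, 1);
        total += partial;
    }

    printf("master t%d: combined result %f from %d workers\n", mytid, total, started);
    pvm_exit();                              /* leave the virtual machine */
    return 0;
}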
2.2.2 HeNCE. A suggested future work in the PVM publication was a GUI for component ordering and interactions. The HeNCE project [39-41] tackled this objective. HeNCE, a Heterogeneous Network Computing Environment, implements a graphical interface on top of the PVM framework.

One of the goals of HeNCE was to provide supercomputer performance from inexpensive local-area-network systems. HeNCE provides programmers a way to easily map conceptual software-design graphs to a graphical interface, where nodes represent FORTRAN or C subroutines and arcs represent data and control flow.

In the HeNCE system, the programmer was responsible for specifying parallelism, but the environment included support for compilation, execution, debugging, and performance analysis. The application was represented as a control flow diagram within the GUI, and the user could even display an animated view during application execution.

Some works related to HeNCE include Paralex and Isis [33], Network Linda [62], Piranha [61] and Condor, and Pthread. The designers of HeNCE admitted that performance and efficiency were an area needing improvement and future work.

HeNCE represented an early attempt to create a very high-level interface for heterogeneous computing, as opposed to interfacing directly with low-level device-specific code. Although the user was still required to write code for each specific device once, HeNCE provided an automated way to switch between device codes.

2.2.3 Linda. The Linda programming model [62, 63], a product of Yale University, was frequently discussed in the literature as an attractive option for heterogeneous computing. Linda diverged from the mainstream models of distributed computing and coordination systems, such as message passing, concurrent logic languages, and parallel functional programming systems. Instead, Linda adopted a tuple-space model of parallel programming: instead of multiple processes exchanging messages, processes push data to and pull data from an uncoupled "tuple space". The tuple space exists separately from the black-box computation of programs.

Although Linda was not originally designed for heterogeneous programming systems, it became an attractive option for heterogeneous computing for several reasons. First, Linda implementations already existed for several machines. Also, there were no restrictions in the Linda programming model that prohibited heterogeneity. Finally, Linda tuples used in communication were uncoupled from Linda processes, and were language and machine independent, making them ideal for heterogeneous systems. For example, with the right implementation support, processes on different machines written in different languages can access the same tuple space without explicit programmer coordination between the different programming environments.

Although conceptually the Linda programming model was well suited for heterogeneous computing, the initial Linda implementations lacked support for distributed heterogeneous computing. Most extensions by the Linda authors, such as Piranha [123], targeted objectives other than heterogeneous processing. However, one project developed a backend to Linda based on the p4 system [59], which enabled users to write Linda applications for heterogeneous systems.
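The tuple-space operations are easiest to see in a small C-Linda-style sketch. The exact syntax varied across Linda implementations, so the fragment below is illustrative only: it assumes a C-Linda translator providing the out(), in(), rd(), and eval() operations, and the task/result tuple shapes are invented for the example.

/* C-Linda-style sketch of the tuple-space model: processes never address
 * each other directly; they deposit tuples with out()/eval() and withdraw
 * them with in(). Requires a Linda implementation, not a plain C compiler. */

int worker(int id) {
    int task;
    in("task", ? task);              /* withdraw a task tuple, blocking until one exists */
    int result = task * task;        /* stand-in for real work */
    out("result", id, result);       /* deposit the result into the tuple space */
    return 0;
}

int real_main(void) {
    const int nworkers = 4;

    for (int i = 0; i < nworkers; i++)
        eval("worker", worker(i));   /* live tuples: workers may run on any machine */

    for (int t = 0; t < nworkers; t++)
        out("task", t);              /* deposit work; any worker on any machine may claim it */

    for (int r = 0; r < nworkers; r++) {
        int id, value;
        in("result", ? id, ? value); /* gather results, language and machine independent */
    }
    return 0;
}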

2.2.4 p4. The p4 project [60], developed at Argonne National Laboratory, was a C and FORTRAN library originally stemming from a number of previous Argonne projects, including the Monomacs and Parmacs libraries. Like many of the other early projects, the main goals of p4 were portability, efficiency, and simplicity. The p4 project juggled these goals by providing multiple ways to do things, absorbing complexity into the p4 library, and making judgments to sacrifice one of the primary goals in certain situations.

The p4 library primarily consisted of subroutine calls, macros, and typedef'd data types. At the start of a p4 application, a user would call a process group routine, which would read a supplied process group file and then create and assign processes, either on a single machine or across several possibly heterogeneous machines. Each process would be assigned a process ID (analogous to an MPI rank).

The project supported shared-memory parallelism using a monitor paradigm (shared-memory MIMD). This included abstractions for vendor-specific locks, semaphores, and other shared-memory synchronization operations. The p4 project also supported distributed-memory parallelism (distributed-memory MIMD) via message passing, and provided abstractions such as send, receive, and collective operations (min, max, barrier, sum, product, etc.).

Several successful projects were built using p4, typically projects within the Argonne National Laboratory umbrella. Argonne's theorem prover [204], a maximum-likelihood method for phylogenetic trees,

6 several computational chemistry applications, a method for com- Although Mentat did provide source portability, the initial imple- puting resonances in piezoelectric crystals, and the SPLASH bench- mentation still assumed a homogeneous system at execution time. marks all used p4, either directly or internally, to support either However, as we see in Section 3, the Mentat development team shared- or distributed-memory parallelism. quickly retargeted Mentat to focus on distributed heterogeneous As previously mentioned, p4 was also used as a backend to enable systems, specifically "Metasystems". heterogeneous computing for other programming environments. The p4-Linda project [59], developed by the p4 project team, devel- 2.3 Distributed Heterogeneous Systems oped a p4 backed for the Linda programming language. Additionally, as part of the Chameleon project [143], p4 was Discussion indirectly used in the reference MPI implementation, which was Thirty years later, many of the visions of the developers of early co-developed with the MPI standard [129]. Interestingly, the p4 distributed heterogeneous systems are still just that—visions. As we authors had aspirations that vendor-specific MPI could provide see in the later sections of this project, most modern heterogeneous performance improvements to p4. Directly quoted, "We hope that programming approaches still require some manual management vendor implementations of MPI will replace p4’s portability layer of data transfers, communication, and synchronization, although with corresponding improvements in efficiency"60 [ ]. However, as typically with more user-friendly programming approaches than we discuss later, this never became the reality, as MPI became a those of early systems like Mentat. Much of the research and dis- standard for frontend programming and replaced many of the other cussion today in heterogeneous computing revolves around finding distributed computing frameworks. the appropriate abstraction level. The diversity of processors in these early heterogeneous dis- tributed systems seems small relative to today’s array of co-processors 2.2.5 Mentat. Mentat [126] and associated extensions [127], (GPUs, FPGAs, TPUs, etc.). These early processors would all typi- introduced and researched in the early 90s at the University of cally fall into the "traditional CPU" in today’s categorization. Virginia, represent another early attempt at providing a compre- However, the diversity in supporting hardware and software was hensive programming environment for HPC systems. Mentat ini- far greater in the early heterogeneous than today’s typical cluster tially started as a homogeneous parallel processing system, with and supercomputer environments. Because the sub-components three primary goals: to provide easy-to-use parallelism, to provide were typically completely separate machines, they experienced high performance via parallel execution, and to provide application heterogeneity in the network architecture, connection latencies, portability across different platforms. different communication bandwidths for different machines. Onthe The Mentat architects acknowledged that the then-current state software side, different machines had different operating systems, of parallel programming demanded too much from the programmer, different process support and inter-process communications, varied regarding manual management of communication, synchronization, compiler and language support, and multiple file systems. 
Unlike scheduling, etc., and often overwhelmed the programmer. Their today’s cluster and supercomputing environments with mostly proposed solution allowed a programmer to simply describe the homogeneous software environments, early distributed heteroge- parallelism within the program, and deferred management of paral- neous system approaches required masking these network and lelism to the compiler. This approach is still desirable today, albeit software diversities. at vastly different abstraction levels than Mentat. The Mentat Programming Language extended C++ with spe- cialized Mentat classes, or classes intended to be operated on and 3 METASYSTEMS, , AND executed in parallel. Because the underlying parallelism in Mentat CLUSTERS 1995 - 2005 was based on a macro dataflow model (MIMD) computation, Men- Around the turn of the century, the keywords and terminology tat classes typically included computationally expensive routines surrounding heterogeneous distributed systems research began to or member functions with high latency. These classes could then shift. The next realization of heterogeneous computing systems be launched or executed asynchronously, allowing for parallelism began to be referred to as Metasystems, or referenced in the context via overlapping computation. Classes with objects like shared data of Metacomputing, and Grid Computing. This shift in perspective structures or queues would also be implemented as Mentat classes. reflected a more universal or global outlook on heterogeneous com- From a programming perspective, Mentat provided several ad- puting. This section serves as a brief intermission between the vantages. There existed no shared or distributed memory model, start of heterogeneous computing with distributed heterogeneous, as only Mentat objects could communicate directly, and this com- and the current realization related to accelerators. We explore and munication was managed by the underlying compiler. Similarly, discuss the ways in which distributed heterogeneous computing, although Mentat objects were launched asynchronously, the Mentat coincident with the rapid and impressive growth of the internet and runtime would block if any function-call arguments were returned web-based computing, expanded into Metacomputing, Grid Com- by another Mentat objects, or even automatically forward data puting, and eventually cloud-based computing. We also explore the between Mentat objects. impact of MPI on the development of heterogeneous programming These abstractions allowed Mentat code to be completely source- approaches and systems. portable. The authors mention developing Mentat-based applica- tions on a Sun workstation, and then running the application on an Intel iPSC/2, needing only to modify the grain-size selection [126]. 3.1 Meta, Grid, and Cluster Architectures

7 3.1.1 MetaSystems and MetaComputing. Previously, distributed • Grid Fabric - the physical resources distributed across the heterogeneous systems consisted of a few different types of proces- globe sors, with a goal of partitioning a single application across a small • Grid Middleware - core services, remote process manage- number of diverse machine-types. As we see in this section, the ment, allocation of resources, security metasystems approach aimed to significantly extend the scope of • Grid Development Environments and Tools - high-level ser- distributed heterogeneous computing, creating massive virtualized vices that allow programmers to develop applications environments to support extremely distributed applications and • Grid Applications and Portals - applications developed using users. things like HPC++ and MPI. Portals allow users to submit In their 1998 survey-style publication, Grimshaw et. al discuss jobs via web-enabled applications services the high-level goals of metacomputing [125]. In essence, a metasys- Baker et al. continue by listing several different programming tem should "create the illusion of a giant desktop computational frameworks either directly or indirectly related to Grid Comput- environment" [125]. By connecting a huge amount of distributed ing, including Globus [114], Legion [128], WebFLow [47], Net- resources in a virtualized way, metasystems can present advantages Solve [21], NASA IPG [156], Condor [282], Harness [37], Hot- across several different areas: (1) they can allow more effective col- page [283], Ninf [259], Nimrod [3], DISCWorld [142], and MetaMPI [100]. laboration between remote developers, (2) the increase in overall We briefly explore some of these approaches in Section 3.2. resources can improve overall application performance, (3) they In their outro, Baker et al. explain the rising role of Java in grid can facilitate access to remote data and compute resources, and (4) and network computing, and they project that Grid Computing the simplified programming environment can improve productivity as a field would have a revolutionary effect, similar to that ofthe for the development of meta-applications. invention of the railroad. As we see later in this project, in some They argued that a metasystems approach solves a lot of the ways it has, and in other ways it has not. problems of traditional approaches like HPF, MPI, or PVM, where In his 2003 survey The Grid: Computing without Bounds[113], Ian separate components of an application are maintained by separate Foster, one of the developers of Globus, provides a high-level view groups, with potentially separate licenses and separate software of the role of Grid Computing. stacks [125]. They discussed several promising projects attempting to tackle We would not accept a situation in which every home metacomputing, including AppLes [44], Globus [114], Legion [128], and business had to operate its own power plant, print- and Network Weather Service [295], several of which are discussed ing press, and water reservoir. Why should we do so below in the context of Grid Computing. They projected that meta- for computers? systems would have a significant impact on scientific productivity. Foster, a major figure in the foundation of grid computing, viewed In a complementary 1998 survey [196], the same team expresses virtualized computation as a natural extension to the internet. The many of the same sentiments. 
They discuss the goal of a "trans- rapid increases in internet speeds infrastructure at the time led to a parent, distributed, shared, secure, fault-tolerant computational high level of optimism in the possibilities of remote computers. The environment." They discuss means to achieving this goal, and the goal was to integrate software into existing systems to enable shar- technical challenges of transparent remote execution and a shared ing information, data, and software to create a virtual organization persistent object file space across a distributed metasystem. Overall, structure, instead of replacing systems at participating sites. this survey primarily focuses on the hopes and necessities for future Foster describes several examples of existing Grid Computing systems. network systems, including: • European Data-Grid [263] 3.1.2 Grid and . Grid computing emerged as • US Grid Physics Network the next key term related to heterogeneous computing, following • Particle Physics Data Grid and closely related to metasystems. Many of the projects originally • Sloan Digital Sky Survey [118] coined as metasystems are also discussed in the context of grid • National Digital Mammography Survey computing, including Mentat and Legion. • US Biomedical Informatics Research Network [163] In a 2000 Survey, The Grid: A survey on Global Efforts in Grid • US Network for Earthquake Engineering Simulation [56] Computing [34], Baker et al. present three general principles for • 2002 Launch of Open Grid Services Architecture [112] Grid Computing. In a parting note, Foster notes that for broad adoption of Grid • Heterogeneity: resources are heterogeneous in nature across Computing, access to core technologies should be free and open numerous administrative domains and geographical distances source. • : latency tolerant applications, no degradation of In 2008, five years later, Foster et al. published another survey, performance as grid grows Cloud Computing and Grid Computing 360-Degree Compared [116]. • Dynamicity or Adaptability: resource failure is the rule, not In this work the authors compared the decade-strong grid comput- the exception. Must dynamically extract performance form ing concepts with the newly emerged cloud computing paradigm. available resources They posed the following question, Baker et al. also describe the components required to build and Is cloud computing a new name for Grid? coordinate a Grid Computer. and answered as follows:

8 • Yes: the vision is essentially the same, to reduce cost, increase high cost of changing the operating systems and other commodity reliability, and increase flexibility of distributed heteroge- software. neous computing. Cloud simply shifts from user operation A NOW system approach circumvents these issues by merging to third-party operation, and cloud computers can be con- local area networks and processors. A NOW sidered viable commercial grid computing providers. system contained an array of commodity workstation components, • No: The scale for cloud computing is much larger. Instead of Ethernet, a system like PVM to communicate between the compo- linking up commodity clusters, cloud systems contain over nents, and a sequential file system. Although the original conceptual 100 thousand computers and are owned by large corpora- presentation of NOW systems [19] referenced PVM for the software tions like Amazon, Google, and Microsoft. This introduces communication layer, the first actual implementation of the Berke- an entirely new set of problems. ley NOW architecture [80] in 1997 employed MPI, foreshadowing • But Yes: The technological challenges in cloud and grid the shift from PVM to MPI for cluster-based computing. computing mostly overlap, in that they both manage large While the shift to MPI for the Berkeley NOW architecture dis- facilities, and define methods for users to discover, request, qualified it for more heterogeneous computing, other NOW ar- and consume provided resources. And intrinsically, both chitectures were more heterogeneity-oriented. For example, the cloud and grid rely on highly parallel computations. heterogeneous NOW architecture developed in 1995 at the Univer- The authors describe how cloud computing is driven by economies sity of Texas at San Antonio [304] relied on the PVM programming of scale, and that their viability is driven by rapid decreases in hard- model for the communication layer. Their system contained SPARC ware costs, increases in storage capacities, exponential growth in 10-30 workstations, SPARC 5-70 workstations, and classic SPARC data demands, and the adoption of services computing and web 2.0. workstations, all connected by Ethernet. Cloud computing evolved from grid computing and relies on grid computing technologies as its backbone, but operates on a very 3.2 Meta, Grid, and Cluster Languages and different model. With grid computing, the model is based on large, Environments well-planned and well-funded projects with a significant scientific Although the various computing paradigms covered in this sec- impact. With cloud computing, thousands of processors can be tion all stemmed from distributed heterogeneous programming, accessed with only a credit card. and maintain heterogeneity as a core component, the different The authors further describe the different enabling technolo- paradigms employ vastly different programming environments and gies and software for grid and cloud computing. Grid applications languages. In this section, we briefly discuss a few of the most are usually HPC oriented, and rely on software like MPI [129], widely used programming approaches in each area. MapReduce [85], Hadoop [267], and coordination languages like Linda [62]. In contrast, Cloud applications can be large-scale, but 3.2.1 Meta and Grid Computing Languages. Many of the pro- Cloud infrastructures do not typically support fast low-latency net- gramming approaches for meta and grid computing developed as work interconnects needed in HPC. 
Cloud programming models extensions to their distributed heterogeneous computing counter- are typically a mashup of scripting, Javascript, PHP, Python, HTTP, parts. However, several approaches revolutionized the ideas from and web services. distributed computing, aiming to create more global solutions. 3.1.3 Clusters. While the metasystems approach aimed to con- Mentat Extended: In a work extending Mentat [127], discussed nect together existing machines using a software overlay, other in Section 2.2.5, Andrew Grimshaw and the team at the University alternatives to the classic supercomputer were being developed of Virginia discuss Mentat’s shortcomings when retargeted for a during this time period. The notion of clusters, or networks of heterogeneous environment, steps to be taken to overcome these workstations (NOWs), sought to blend the lines between personal deficiencies, and experiments with the application to metasystems. machines and supercomputers, delivering supercomputer-like per- The Metasystems approach for Mentat addressed three primary formance with personal workstation components. issues: hand-writing parallel applications is difficult, codes are not The Berkeley NOW architecture [19], first introduced in 1995, portable across MIMD architectures, and the wide variety of avail- aimed to utilize the strengths of both personal machines and su- able systems cannot be simultaneously utilized for a single appli- percomputers, while eliminating their weaknesses. They proposed cation. While the original Mentat implementation focused on the that NOW architectures would outlast and outperform both small first two issues, the extension primarily focused on the third issue. computers for personal interactive use and traditional large su- Their approach involved pulling from two separate but related percomputing machines for several reasons. At that time, the per- communities: the parallel processing community, and the hetero- formance improvements for personal workstation machines were geneous distributed computing community. However, the primary significantly outpacing the improvements for large supercomput- focus for the Metasystem approach was performance, as the authors ers. The larger volume of workstations being produced allowed argued that significant performance degradation would render the manufactures to amortize the development costs and push the implementation useless. The motivation for using a heterogeneous rate of technological innovation. More simply put, the smaller ma- system is lost if performance is lost. Therefore, they started with chines could be offered at a higher cost/performance ratio than a parallel performance system, Mentat, and extended by includ- supercomputers. Also, the development cycle for supercomputer ing heterogeneous distributed computing concepts and features as architectures lagged significantly behind the workstation counter- needed, as long as they were not accompanied by significant per- parts. Another weakness of traditional supercomputers was the formance sacrifices. This is a different approach than other systems

that were built to target heterogeneity, and tackled performance issues as they arose.

In order to retarget Mentat for metasystems, the authors needed to address two primary issues: data format and alignment issues between systems, and scheduling across diverse processors. The data format issues were relatively straightforward, and had previously been solved by related works, but the scheduling issues were still novel and largely unexplored at the time.

The main contributions of the Mentat extensions involved automatic methods for domain partitioning and decomposition. These methods relied on programmer call-backs and run-time heuristics. They took a two-faceted approach to scheduling, first focusing on partitioning (driven by granularity and load balancing), then on placement, which involved selecting the best available processor for a certain task. In order to implement an automated scheduler, the authors developed heuristics, a simplified machine model, and programmer-provided call-backs.

Globus: The Globus Project [114], pioneered by Ian Foster and Carl Kesselman, dominates the Meta and Grid computing literature, and Foster and Kesselman were heavily involved in both areas. Globus was funded as a multi-institutional DARPA project, and the initial version of the toolkit was first released in 1997.

The goal of Globus was to abstract a collection of distributed heterogeneous resources as a single virtual machine. Globus approaches this challenge using a layered architecture, presenting a Globus Metacomputing Toolkit containing a bag of high-level global services and low-level core services that developers can selectively apply depending on their needs. As we see in subsequent sections, these Globus services were also used as the backbone of other grid computing systems.

• GRAM - resource allocation and process management
• Nexus - unicast and multicast communication services
• GSI - authentication and related security services
• MDS - distributed access to structure and state information
• HBM - monitoring of health and status of system components
• GASS - remote access to data via sequential and parallel interfaces
• GEM - construction, caching, and location of executables
• GARA - advanced resource reservation and allocations

Webflow: Webflow [47] was a web browser interface built on top of several Globus tools, including MDS, GRAM, and GASS. However, the actual Webflow interface was conceptually separated from Globus, and thus could have been implemented on the backend using other systems like Legion or CORBA. Webflow users could build and compose applications using a visual web-based interface and visual module icons. In many ways, this conceptually maps well to the massively distributed goals of Grid Computing.

Legion: Grimshaw et al., the team responsible for Mentat and several of the surveys reviewed so far in this project, presented Legion [128] as their Grid Computing solution.

The Legion approach was based on and written in the Mentat Programming Language (MPL). In Legion, everything is an object, and the approach is organized by classes and metaclasses. Like Mentat, classes are defined as serial or parallelizable, instructing the underlying compiler on how to approach an application. Users can define new classes, and classes manage their own instances. However, many classes related to parallelism are already defined in the Legion framework: host objects, persistent storage objects, binding agents mapping object IDs to physical addresses, context objects mapping context names to Legion object IDs, etc.

Legion presented a very object-oriented approach, in contrast to the more tool- and layer-based approaches of Globus and Webflow. We do note that this realization of Legion is not directly related to the contemporary realization discussed in Section 4.4, and shares only a name and the conceptual idea of the definition of "legion".

3.2.2 PVM and MPI. For a period of time after its initial development and release, PVM [278], discussed in Section 2.2, was a popular option for both homogeneous and heterogeneous distributed computing. PVM was a crucial technology for advancing distributed computing as a whole, but its longevity was overshadowed by MPI.

The Message Passing Interface, MPI [129], had a very significant effect on the then-current distributed heterogeneous programming systems. MPI worked to standardize many of the behaviors and functionalities of distributed computing implementations like PVM, p4, and others, and was largely successful to that end. In fact, MPI was so successful that it quickly became the de facto standard for distributed computing.

p4 and PVM were actually used internally for reference implementations for MPI as it was developed, a promising sign for the future of distributed heterogeneous computing. However, the MPI standard itself detailed few facilities for machine interoperability, rendering a standard-adhering implementation unsuitable for heterogeneous computing. Vendor-supported and large open source implementations like MPICH [130] relied on a more homogeneous approach, targeted large homogeneous clusters, and required multiple installations for heterogeneous systems that may not be interoperable. So although MPI standardized and simplified distributed computing compared to previous approaches, the distributed heterogeneous computing facilities were inhibited.

Although the core MPI standard and implementations were not ideal for heterogeneous computing, several research projects extended MPI to enrich support for heterogeneous computing, though none of these implementations have experienced the longevity of the more popular MPI implementations, which are still developed and maintained today, nearly 20 years later.

• MPICH-G [162] - a grid-enabled MPI implementation
• PACX-MPI [119] - a metacomputing-oriented MPI implementation
• PVMPI [109] - an integration of the PVM and MPI systems

3.3 Metasystems, Grid Computing, and Clusters Discussion

In this section, we briefly explore the transition from distributed heterogeneous computing into metacomputing and grid computing, and how these set up the backbone for the monolith that is today's cloud computing.

The goals of meta and grid computing were to create infinitely scaling systems by harnessing the power of remotely connected heterogeneous systems. While some projects tackled this, these ideas

were ultimately re-purposed for commercial success under the umbrella of cloud computing. Additionally, with respect to scientific endeavors, the construction of large-scale homogeneous clusters and supercomputers beckoned a shift from distributed heterogeneous machines. At the same time, the growth of MPI, without a major focus on heterogeneous interoperability, overshadowed projects like PVM and p4 that targeted heterogeneous systems.

Finally, the very things that made early machines heterogeneous began to be integrated into single homogeneous processors. Unlike mixed-mode machines like PASM with distinct SIMD and MIMD processing, many new multi-core vectorizing processors seamlessly integrate both SIMD and MIMD capabilities, which forgoes the need for a heterogeneous programming environment. Similarly, as we previously discussed, early distributed heterogeneous systems contained separate processors for visualization, statistics, and data processing. However, with the expansion of x86 and the inclusion of specialized and vector instructions on general-purpose CPU processors, the problems these early heterogeneous systems tackled can again be solved by homogeneous systems.

The shift into cloud computing, the ubiquity of MPI, and the continuous consolidation into x86 CPUs in many ways signaled the end of heterogeneous computing as it was originally imagined. However, as we see in Section 4, the rebirth of heterogeneous computing, and the reinvention of many of the ideas previously mentioned, was sparked by the introduction of accelerator-based heterogeneous systems.

4 MULTICORE, MANYCORE, AND ACCELERATOR-BASED HETEROGENEOUS SYSTEMS 2010 - 2020

In this section, we follow the evolution of chips from a single core to multi-core and manycore chips, which eventually developed into hardware accelerators. These developments eventually revolutionized the architectures of nearly all high-performance machines, and effectively rebirthed the field of heterogeneous computing.

The construction of large homogeneous machines marked the end of the 2000s decade and the end of heterogeneous distributed systems like we saw in the 1980s and 1990s. Jaguar [48], built around 2009 at Oak Ridge National Laboratory, was a Cray XT5 system, consisting of 224,256 x86-based AMD CPU cores, and was listed as the world's fastest machine in 2009 and 2010. Kraken [79], another Cray XT5 system built in 2009, was listed as the world's fastest academic machine at the time. These homogeneous machines dominated the domain of HPC for several years. Likewise, HPC software support, programming approaches, and compiler infrastructure developed during this time were also largely homogeneous. However, at the same time, scientific programmers began experimenting with programming using Graphics Processing Units, or GPUs, a trend that would eventually revolutionize the HPC field.

In 2000, Toshiba, Sony, and IBM collaborated on the Cell Project [134]. This project culminated in the release of the Cell Processor in 2006. While not strictly a GPU, the Cell Processor was one of the first architectures to apply accelerator-based heterogeneity to multimedia and general purpose applications. The Cell Processor's first major commercial application was inside the Sony PlayStation 3 gaming console. In 2008, IBM and Los Alamos National Laboratory (LANL) released the Roadrunner supercomputer, which consisted of a hybrid design with 12,960 IBM PowerXCell and 6,480 AMD Opteron dual-core processors [35]. The IBM PowerXCell processors integrated the original Cell processor design.

While the Cell processor generated excitement and a new interest in a different type of heterogeneous computing, the Cell processor was only efficient for certain computations, and the overhead of manually transferring memory to and from the device was difficult due to the small memory size of the Cell architecture. Although GPUs and other heterogeneous accelerators suffer from these same issues, they evolved and developed to meet the demand of scientific computing.

The scientific community began evaluating GPUs for general purpose processing well before their use became mainstream. In 2001, researchers evaluated general purpose matrix multiplication, and in 2005 LU decomposition on a GPU was shown to outperform a CPU implementation [91]. Interest in utilizing GPUs in scientific computing continued to grow, but was inhibited by the complex programming approaches to GPUs, which typically required a low-level graphics interface and dealing with shaders and graphics-related data structures. However, a major development for GPU programming, and the whole field of scientific heterogeneous programming, came with the development and release of Nvidia's CUDA toolkit.

4.1 CUDA and GPGPUs

Nvidia was formed in 1993, but first gained major traction by winning the contract to develop the graphics hardware for the Microsoft Xbox gaming console in 2000. Nvidia continued to grow and increase its claim in the GPU market with the release of the GeForce line, in direct competition with AMD's Radeon line. However, these devices were still targeted toward graphics processing.

As the interest in scientific computing using GPUs continued to grow, Nvidia recognized the potential financial advantages of supporting this community. In 2007, Nvidia launched the Tesla GPU, aimed at supporting general purpose computing, and the CUDA (Compute Unified Device Architecture) API and programming platform [77].

The CUDA programming platform abstracted programming GPU hardware into an API that was more consumable by scientific programmers and other programmers without extensive graphics programming experience. The CUDA programming model essentially presents a hierarchical multi-threading layout, where threads are executed as a 32- or 64-thread warp, warps are mapped onto thread-blocks, and thread-blocks are mapped onto a grid and grid blocks (Figure 3). These abstractions fit quite naturally with the nested loop structure of most scientific software (a two-dimensional variant of this mapping is sketched after Listing 1). Listing 1 shows an example CUDA application, sourced from Nvidia's website (https://developer.nvidia.com/blog/easy-introduction-cuda-c-and-c/).

As the popularity of CUDA and GPGPU programming grew, several large supercomputers began including both host CPUs and GPU accelerators. In 2010, China's Tianhe-1A machine launched, containing 14,336 Xeon X5670 processors and 7,168 Nvidia Tesla M2050 general purpose GPUs [301]. This heterogeneous machine overtook the previously mentioned Jaguar machine from Oak Ridge National Laboratory (ORNL) as the "world's fastest supercomputer".

Figure 3: CUDA programming abstraction for GPGPUs ("CUDA Threading Model"; "Thread Hierarchy in CUDA Programming". Retrieved 2016-09-21)

Listing 1: Example CUDA C Application

#include <stdlib.h>

__global__
void saxpy(int n, float a, float *x, float *y)
{
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) y[i] = a*x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y, *d_x, *d_y;
  x = (float*)malloc(N*sizeof(float));
  y = (float*)malloc(N*sizeof(float));

  cudaMalloc(&d_x, N*sizeof(float));
  cudaMalloc(&d_y, N*sizeof(float));

  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  cudaMemcpy(d_x, x, N*sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_y, y, N*sizeof(float), cudaMemcpyHostToDevice);

  // Perform SAXPY on 1M elements
  saxpy<<<(N+255)/256, 256>>>(N, 2.0f, d_x, d_y);

  cudaMemcpy(y, d_y, N*sizeof(float), cudaMemcpyDeviceToHost);

  cudaFree(d_x);
  cudaFree(d_y);
  free(x);
  free(y);
}
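Listing 1 maps a one-dimensional loop onto the thread hierarchy. As a complement, the following is a small illustrative sketch (our own, not taken from the Nvidia materials) of how a two-dimensional loop nest is commonly expressed, assigning one thread per (i, j) iteration and covering the domain with a two-dimensional grid of thread-blocks.

__global__ void add2d(int nx, int ny, const float *a, const float *b, float *c)
{
  // Each thread computes one (i, j) element of the 2D iteration space.
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  int j = blockIdx.y * blockDim.y + threadIdx.y;
  if (i < nx && j < ny)
    c[j * nx + i] = a[j * nx + i] + b[j * nx + i];
}

// Host-side launch: a 16x16 thread-block, and enough blocks to cover the domain.
// dim3 block(16, 16);
// dim3 grid((nx + block.x - 1) / block.x, (ny + block.y - 1) / block.y);
// add2d<<<grid, block>>>(nx, ny, d_a, d_b, d_c);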

ORNL's Titan supercomputer, a successor to Jaguar, launched in 2013 and consisted of 18,688 AMD Opteron CPUs, each with an attached Nvidia Tesla (K20x) GPU [49]. This machine also secured the top spot as the world's fastest machine.

More recently, Nvidia GPUs and CUDA programming were employed in ORNL's Summit supercomputer [235], another machine that briefly held the title as the world's fastest. Summit was launched in 2018 and contains 4,608 nodes, each with 6 Nvidia Tesla V100 GPUs. Similarly, Lawrence Livermore National Laboratory (LLNL) launched the Sierra supercomputer [202] in 2018, containing 4,320 nodes, each with 4 Nvidia Tesla V100 GPUs.

Much of CUDA's success in scientific programming can be attributed to Nvidia's continued investment in and focus on CUDA training. Online and in-person training workshops, and a surplus of available training materials, made Nvidia and CUDA an attractive GPGPU option compared to other vendors. This focus on training and CUDA's success should provide a model for future heterogeneous programming approaches. Some newer approaches like OpenACC (also supported by Nvidia) have adopted this strategy, frequently hosting learning-focused hackathons and generating significant training materials [229].

4.2 GPGPUs Beyond CUDA

Although CUDA has survived uncontested as the most successful low-level GPGPU programming approach, several other low-level approaches have been developed and see a significant amount of use.

Microsoft's DirectCompute API, released in 2008 as part of DirectX 11, provided a GPGPU programming framework specifically focused on Windows platforms [218]. The abstraction level presented by DirectCompute was very similar to CUDA, with a parallel workload being broken down into groups of threads that were then dispatched as "Compute Shaders". However, DirectCompute was quickly overshadowed by CUDA and other approaches, possibly due to its limitation to Windows-based systems. Little information on the original DirectCompute implementation can be found today.

CUDA's dependence on Nvidia devices spawned efforts to create an open source alternative. OpenCL was developed as a result of these efforts [131]. As we see in the remainder of this work, OpenCL has become a staple of accelerator-based heterogeneous programming approaches, both as a stand-alone approach and as an intermediate representation or backend for higher-level approaches.

OpenCL (Open Computing Language) was originally developed by Apple as a GPGPU option under the OSX umbrella. In early 2008, Apple submitted a proposal to the Khronos Group for the creation and management of an OpenCL language [131]. On November 18, 2008, the OpenCL 1.0 technical specification was released. By the end of 2008, AMD, Nvidia, and IBM had all incorporated OpenCL support into their vendor tool chains.

Like CUDA, the OpenCL programming approach separates an application into host code and device code. The abstraction level for the OpenCL device code is very similar to CUDA, while the

host code abstractions are more verbose. Like CUDA, GPU cores are abstracted into a tiered parallelism. In OpenCL, work-items are executed as part of a work-group, and work-groups are organized inside an N-D range (Table 1). Listing 2 demonstrates an example vector addition application in OpenCL. From the line count alone, we can see that OpenCL requires a significant amount of low-level and boilerplate code, although this functionality is typically encapsulated in routines and libraries by frequent OpenCL programmers. However, each programmer creating a personalized set of routines to abstract OpenCL API calls creates issues with code portability and interpretability.

Table 1: Comparison of CUDA and OpenCL GPGPU abstractions

CUDA           OpenCL
Grid           NDRange
Thread Block   Work group
Thread         Work item
Thread ID      Global ID
Block index    Block ID
Thread index   Local ID

Listing 2: Example OpenCL C Application

#include <stdlib.h>
#include <CL/cl.h>

const char* programSource =
  "__kernel                                                        \n"
  "void vecadd(__global int *A, __global int *B, __global int *C)  \n"
  "{                                                               \n"
  "   int idx = get_global_id(0);                                  \n"
  "   C[idx] = A[idx] + B[idx];                                    \n"
  "}                                                               \n";

int main() {
  int *A = NULL; int *B = NULL; int *C = NULL;

  const int elements = 2048;
  size_t datasize = sizeof(int)*elements;
  A = (int*)malloc(datasize);
  B = (int*)malloc(datasize);
  C = (int*)malloc(datasize);
  for (int i = 0; i < elements; i++) {
    A[i] = i; B[i] = i;
  }

  cl_uint numPlatforms = 0;
  cl_int status = clGetPlatformIDs(0, NULL, &numPlatforms);
  cl_platform_id *platforms =
    (cl_platform_id*)malloc(numPlatforms*sizeof(cl_platform_id));
  status = clGetPlatformIDs(numPlatforms, platforms, NULL);

  cl_uint numDevices = 0;
  cl_device_id *devices = NULL;
  status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_ALL, 0, NULL, &numDevices);
  devices = (cl_device_id*)malloc(numDevices*sizeof(cl_device_id));
  status = clGetDeviceIDs(platforms[0], CL_DEVICE_TYPE_ALL, numDevices, devices, NULL);

  cl_context context = clCreateContext(NULL, numDevices, devices, NULL, NULL, &status);
  cl_command_queue cmdQueue = clCreateCommandQueue(context, devices[0], 0, &status);

  cl_mem bufferA = clCreateBuffer(context, CL_MEM_READ_ONLY, datasize, NULL, &status);
  cl_mem bufferB = clCreateBuffer(context, CL_MEM_READ_ONLY, datasize, NULL, &status);
  cl_mem bufferC = clCreateBuffer(context, CL_MEM_WRITE_ONLY, datasize, NULL, &status);
  status = clEnqueueWriteBuffer(cmdQueue, bufferA, CL_FALSE, 0, datasize, A, 0, NULL, NULL);
  status = clEnqueueWriteBuffer(cmdQueue, bufferB, CL_FALSE, 0, datasize, B, 0, NULL, NULL);

  cl_program program = clCreateProgramWithSource(context, 1, (const char**)&programSource, NULL, &status);
  status = clBuildProgram(program, numDevices, devices, NULL, NULL, NULL);
  cl_kernel kernel = clCreateKernel(program, "vecadd", &status); // create the kernel from the built program
  status  = clSetKernelArg(kernel, 0, sizeof(cl_mem), &bufferA);
  status |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &bufferB);
  status |= clSetKernelArg(kernel, 2, sizeof(cl_mem), &bufferC);

  size_t globalWorkSize[1];
  globalWorkSize[0] = elements;
  status = clEnqueueNDRangeKernel(cmdQueue, kernel, 1, NULL, globalWorkSize, NULL, 0, NULL, NULL);
  clEnqueueReadBuffer(cmdQueue, bufferC, CL_TRUE, 0, datasize, C, 0, NULL, NULL);

  clReleaseKernel(kernel);
  clReleaseProgram(program);
  clReleaseCommandQueue(cmdQueue);
  clReleaseMemObject(bufferA);
  clReleaseMemObject(bufferB);
  clReleaseMemObject(bufferC);
  clReleaseContext(context);

  free(A); free(B); free(C); free(platforms); free(devices);
}

Although OpenCL does provide an open-source alternative to CUDA that is supported across several different device vendors (Nvidia, Intel, IBM, AMD), it has not become the de facto standard for heterogeneous GPGPU computing. First, the widespread success of CUDA and Nvidia's dominance in the GPGPU market has allowed scientific programmers to more safely choose a non-portable option. Second, the abstraction level, especially the verbosity of the host code, has led many GPGPU developers to seek higher-level abstractions, as we see in the following section. However, as we discuss later, although OpenCL has not seen widespread adoption as a programming approach, many frameworks and compilers target OpenCL as a backend API (OpenARC [187], TVM [68], etc.).

Although OpenCL and CUDA have been the most popular low-level GPGPU programming approaches over the past decade, some application developers have explored other approaches. Vulkan [265], a successor to OpenGL intended primarily for graphics computing, can outperform both OpenCL and CUDA for certain scientific applications [211].

Analogous to Nvidia's CUDA, the GPU manufacturer AMD has also released a vendor-specific API for GPGPU programming. AMD's HIP API and ROCm platform [17] provide a similar abstraction level to CUDA (a minimal sketch mirroring Listing 1 appears at the end of this section). The platform also provides ways to translate CUDA into HIP, allowing developers to execute CUDA applications on AMD hardware. Although the current generation of top supercomputers like Sierra and Summit employ Nvidia GPUs, future systems like ORNL's Frontier, expected to launch in 2021, will employ AMD GPUs. This transition could herald a shift away from CUDA, and increase the use of ROCm and HIP across all of scientific computing.

As discussed, there exist several low-level alternatives to CUDA for GPGPU programming. However, most scientific application developers turn to a high-level programming approach.
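To make the similarity between HIP and CUDA concrete, the following is a minimal sketch (our own illustration against the publicly documented HIP runtime API, not code taken from the ROCm project) of Listing 1 expressed in HIP; the kernel body is unchanged, and the cuda* host calls are simply renamed to their hip* counterparts.

#include <stdlib.h>
#include <hip/hip_runtime.h>

// Same SAXPY kernel as Listing 1; __global__ is provided by the HIP headers.
__global__ void saxpy(int n, float a, float *x, float *y)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = a * x[i] + y[i];
}

int main(void)
{
  int N = 1 << 20;
  float *x = (float *)malloc(N * sizeof(float));
  float *y = (float *)malloc(N * sizeof(float));
  float *d_x, *d_y;

  // hipMalloc/hipMemcpy mirror cudaMalloc/cudaMemcpy one-for-one.
  hipMalloc((void **)&d_x, N * sizeof(float));
  hipMalloc((void **)&d_y, N * sizeof(float));

  for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }

  hipMemcpy(d_x, x, N * sizeof(float), hipMemcpyHostToDevice);
  hipMemcpy(d_y, y, N * sizeof(float), hipMemcpyHostToDevice);

  // Triple-chevron launches are accepted by the HIP compilers,
  // mirroring the CUDA launch in Listing 1.
  saxpy<<<(N + 255) / 256, 256>>>(N, 2.0f, d_x, d_y);

  hipMemcpy(y, d_y, N * sizeof(float), hipMemcpyDeviceToHost);

  hipFree(d_x);
  hipFree(d_y);
  free(x);
  free(y);
  return 0;
}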

4.3 GPGPU Research Project Languages

While CUDA has been a successful GPGPU programming approach, many developers and scientific programmers still find the abstraction level too low for daily use. Almost immediately after CUDA was launched, research projects began to emerge offering high-level abstractions. These projects are discussed in this section. As time passed, robust production-level, high-level alternatives were developed. These languages and frameworks are discussed in Section 4.4.

Figure 4: CUDA and hiCuda performance comparison [141]

Figure 5: Performance of OpenMPC [182, 186]

4.3.1 hiCuda. In 2009, two years after CUDA first launched, Han et al. from the University of Toronto developed the hiCuda API [140, 141]. As the name suggests, hiCuda was created to abstract some of the complexities of CUDA, specifically memory management and allocation. hiCuda replaces these operations with compiler directives. Behind the scenes, a source-to-source compiler generates CUDA code from hiCuda, which is then compiled with NVCC, the standard CUDA compiler. Source-to-source translation is a common tool used by academic compilers, as the burden of implementing a full-scale compiler is often beyond the scope of research projects [141, 187, 193, 252].

As an example, where a CUDA program would require the following:

size = 64 * 128 * sizeof(float);
cudaMalloc((void**)&d_A, size);
cudaMemcpy(d_A, A, size, cudaMemcpyHostToDevice);

hiCuda can accomplish the same with:

#pragma hicuda global alloc A[*][*]

4.3.2 CUDA-lite. A similar project, CUDA-lite [289], was developed in 2008 at the University of Illinois at Urbana-Champaign. Unlike hiCuda, which was intended to transition a sequential application into a GPU-accelerated application using abstract directives, CUDA-lite was meant to be applied to existing CUDA applications, and was especially focused on abstracting memory bandwidth and coalescing optimizations. CUDA-lite was built on top of the now-discontinued Phoenix source-to-source compiler from Microsoft.

Many high-performance scientific codes and applications at the time were programmed using OpenMP. Rewriting using CUDA would have been a major shift. Additionally, many programmers were already proficient in the use of OpenMP pragmas and library calls.

The goal of OpenMPC was to use existing OpenMP directives, along with minimal additional information in the form of OpenMPC pragmas, to generate CUDA GPU codes. Like the other research-project-based approaches in this section, OpenMP to GPGPU and OpenMPC were built using a source-to-source compiler, in this case Cetus [83].

OpenMPC was shown to perform well on many scientific benchmarks compared to hand-written CUDA codes (Figure 5). However, because of OpenMPC's abstraction of CUDA memory management, OpenMPC did not perform as well on programs with complex memory transfer patterns between the GPU and CPU. Additionally, OpenMPC faced performance issues due to incompatibilities between OpenMP semantics and GPU programming and memory models. These incompatible regions were then computed on the host CPU instead of the GPU device.

4.3.4 StarPU. The StarPU runtime system [32, 273, 274] was developed at the University of Bordeaux in France, and first published in 2009. StarPU is an original runtime system with the goal of utilizing heterogeneous hardware without causing significant changes in programmer habits. Like many of the frameworks covered in this section, StarPU aims to allow programmers to focus on high-level algorithm issues without being restrained by low-level scheduling issues. StarPU presents a unified programming approach for CPU and GPU computing, and consists of a data-management facility

4.3.3 OpenMPC.
Instead of adapting new abstractions over and task execution engine. In the StarPU system, a central process CUDA, some methods attempted to recycle older programming maintains a queue of tasks, and a scheduler assigns these tasks to abstractions. For example, the OpenMP to GPGPU [186] developed processing units, either a CPU or GPU. by Lee et. al at the University of Purdue in 2009, and its extension StarPU is designed to dynamically schedule applications across OpenMPC [182] developed by the same team, worked to adapt the a heterogeneous system, using one of several strategies: existing OpenMP programming approach for GPGPU applications. • greedy - each time a CPU or GPU devices becomes available, By 2009, OpenMP [81] had already become the de facto standard StarPU greedily selects the task with the highest programmer- for homogeneous CPU-based . As a result, many defined priority from the task queue, and begins execution, 14 • no-prio - similar to the greedy approach, but without the Listing 3: Example StarPU C Application programmer-defined priorities, 1#include • work stealing - if a processing unit has completed its task and 2#include no tasks are available in the queue, the processor attempts 3#include 4 to steal work from other processes, 5#define FPRINTF(ofile, fmt, ...) do{ if • random weighted - each processor is given a speed, or acceler- 6 (!getenv("STARPU_SSILENT")) {fprintf(ofile, fmt, ## __VA_ARGS__); }} while(0) ation factor. The acceleration can be set by the programmer 7 8 void callback_func(void ∗callback_arg) { or measured using benchmarks. Then, tasks are scheduled 9 FPRINTF(stdout, "Callback␣function␣got␣argument␣%p\n", callback_arg); according to these acceleration factors. For example, if a GPU 10 } 11 has an acceleration factor four times higher than a CPU, the 12 struct params { GPU should be scheduled approximately four times as much 13 int i; work. 14 float f; 15 }; • heterogeneous earliest finish time - performs performance or 16 cost modeling for each task across each available processing 17 void cpu_func(void ∗buffers[], void ∗cl_arg) { unit. Based on these performance models, the task can be 18 (void)buffers; 19 struct params ∗params = (struct params ∗) cl_arg; assigned to the most appropriate processing unit, either a 20 FPRINTF(stdout, "Hello␣world␣(params␣=␣{%i,␣%f}␣)\n", params−>i, params−>f); GPU or CPU. Ideally, tasks well-suited for CPUs will be 21 } 22 assigned to CPUs, likewise for GPU-suited tasks. 23 int main(void){ StarPU is still under active development, with a release as recent 24 struct starpu_codelet cl; 25 struct starpu_task ∗task; as 2020 and activity on the StarPU Gitlab page [274]. Listing 3 26 struct params params = {1, 2.0f}; shows a "Hello World" code example pulled from the StarPU Gitlab 27 int ret; site. 28 29 ret = starpu_init(NULL); 4.3.5 PGI Accelerator. In 2009 and 2010, another directive-based 30 if (ret == −ENODEV) 31 return 77; alternative to CUDA was developed the Portland Group, the PGI 32 STARPU_CHECK_RETURN_VALUE(ret, "starpu_init"); Accelerator [293, 294]. Although not strictly a research compiler, 33 34 task = starpu_task_create(); we include it in this section because of its scope and similarity to 35 related research projects. The PGI Accelerator programming model 36 starpu_codelet_init(&cl); aimed to target both C and FORTRAN codes, and generate exe- 37 cl.cpu_funcs[0] = cpu_func; 38 cl.cpu_funcs_name[0] = "cpu_func"; cutables to be run on Nvidia GPUs. 
The PGI Accelerator directives 39 cl.nbuffers = 0; annotated regions of codes that appropriate for GPU computation, 40 cl.name="hello"; and were divided into two types, data directives for data regions 41 42 task−>cl = &cl; and parallel management directives for GPU-compute regions. Us- 43 task−>cl_arg = ¶ms; ing data directives, users directly controlled data transfers between 44 task−>cl_arg_size = sizeof(params); 45 the host and accelerators, even across multiple parallel or GPU 46 task−>callback_func = callback_func; compute regions. In this way, parallel regions could make use of 47 task−>callback_arg = (void∗) (uintptr_t) 0x42; data already transferred to GPU devices without re-transfer. 48 task−>synchronous = 1; 49 The PGI Accelerator did not provide support for mapping to 50 ret = starpu_task_submit(task); architecture-specific features like CUDA shared, constant, and tex- 51 if (ret == −ENODEV) goto enodev; ture memories, or detect complex reduction patterns. Although the 52 STARPU_CHECK_RETURN_VALUE(ret, "starpu_task_submit"); 53 PGI Accelerator approach was short-lived, it did contribute signifi- 54 starpu_shutdown(); cantly to the development of the OpenACC GPGPU programming 55 56 return 0; approach, which is covered in depth in Section 4.4. PGI’s proprietary 57 compiler for the PGI Accelerator approach also formed the basis of 58 enodev: PGI’s successful OpenACC compiler, discussed in Section 4.6. 59 starpu_shutdown(); 60 return 77; Certainly other research-based high-level alternatives to CUDA 61 } emerged during the time between CUDA’s 2007 release, and the release of the first major production-level, high-level approaches, around 2012 (i.e., R-stream [192], HMPP [90]). Together, all of these programming approaches. By 2015, the research-style approaches research-based approaches laid the foundation of the more widely developed between 2006 and 2011 evolved into several production- used and production-style, high-level approaches mentioned later level approaches with large user bases and significant development in Section 4.4 and beyond. support. Some approaches exist as large, well-documented stan- dards supported by several device vendors (OpenMP [233], Ope- 4.4 GPGPU High-Level Programming nACC [230]). Others were major initiatives supported by major Over time, interest in GPU-based heterogeneous computation con- national laboratories in the US (Kokkos [99], Raja [146]). Finally, tinually increased in the scientific community, which in turn drove many other approaches were implemented as libraries or wrappers the demand for more standardized and long-term high-level GPUPU inside two major languages used in scientific computing, C++ and 15 Python. We explore several of these programming approaches in Listing 4: Example OpenACC C Application this section. 1 int main() { 2 4.4.1 OpenACC. OpenACC (originally short for Open Accelera- 3 int SIZE = 1024; tors) is one of this first high-level GPGPU programming approaches 4 5 float ∗a, ∗b; that still supports a significant user base today (as of 2020). Ope- 6 a = malloc(sizeof(float) ∗ SIZE); nACC was first released in 2012 as a collaboration between Cray, 7 b = malloc(sizeof(float) ∗ SIZE); 8 NVIDIA, and the Portland Group in order to support the users of 9 for(int i = 0; i = < SIZE; ++i) { ORNL’s Titan, one of the first large heterogeneous supercomputers. 10 a[i] = 0; As previously mentioned, Titan was a Cray machine with Nvidia 11 b[i] =// some initial value 12 } devices. 
The Portland Group was involved because, as mentioned 13 in the previous section, OpenACC was inspired by the high-level di- 14// Data Directives rective approach used in in the PGI-Accelerator model, and the first 15 #pragma acc data copyin(b[0:SIZE]) copyout(a[0:SIZE]) 16 OpenACC compiler provided by PGI was developed as an extension 17// Compute Directive to the PGI-Accelerator compiler. 18 #pragma acc parallel loop collapse(2) 19 for(int i = 1; i <= SIZE; i++) The dream of OpenACC was to create an open, directive-based 20 for(int j = 1; j <= SIZE; j++) standard for GPU-computing as an analog and counterpart to the 21 a[i][j] = (b[i − 1][j] + b[i + 1][j] + b[i][j − 1] + b[i][j + 1]) / 4.0f; then de facto standard for parallel processing on multi-core CPUs, 22 } OpenMP. In the same way that a small number of OpenMP prag- mas can be used to parallelize an existing application, OpenACC intended to provide a minimal set of directives that application de- the homogeneous multi-core computing. Although the previously- velopers could apply to accelerate an existing CPU-based scientific mentioned OpenACC was developed to address this demand, moti- application on a GPU. This contrasted with the existing lower-level vation for OpenMP prevailed for several reasons: programming approaches like CUDA and OpenCL, which required (1) OpenACC and OpenACC compilers have been too-tightly a significant amount of code restructuring and rewriting for GPU bundled to Nvidia devices, especially since PGI (the primary acceleration. OpenACC compiler) was acquired by Nvidia in 2013. The ideology of OpenACC is to allow users to expose and identify (2) Most high-performance-oriented scientific programmers were parallelism in an application using descriptive directives, and to already familiar with basic OpenMP directives and OpenMP leave the more complicated task of mapping parallelism to GPU programming styles. devices in the hands of the OpenACC compiler. This deviates from (3) Many scientific applications already employed OpenMP for the OpenMP model, which traditionally employed a very moderated homogeneous CPU-based computing, lightening the burden and prescriptive application of directives. of developing a new accelerator-based implementation. This high burden of effort tasked to OpenACC compilers in As a result, in 2013, a year after the launch of OpenACC, the some ways has prevented OpenACC from reaching the popularity OpenMP standards committee released OpenMP 4.0, which in- and monopoly status of its OpenMP analog. Although OpenACC cluded new directives for offloading to GPU accelerators. In 2018, is intended for general-purpose GPU computing across different the standards committee released OpenMP 5.0, which expanded sup- vendors, for most of its history, the PGI OpenACC compiler has port for accelerators and included additional directives for tasking been the only available production-level option, and was restricted and auto-parallelism. Even before the official inclusion of offload- to Nvidia devices. Now, nearly a decade later, other vendors have ing directives in OpenMP, several research-oriented compilers had more fully adopted the OpenACC standard and implemented more been prototyping support for GPU offloading for OpenMP, which functional support. We discuss these compilers in more detail in we discuss further in Section 4.6. Section 4.6. Initially in their development, OpenACC and OpenMP contrasted An OpenACC annotated application typically contains a combi- in the programming approach philosophy. 
As mentioned, OpenACC nation of data and compute directives centered around a computa- employed a more descriptive approach, where users expose paral- tionally intense region of code or loop nest. In Listing 4, we see a lelism and compilers map that parallelism to devices. In OpenMP, small C program annotated with two OpenACC directives, a data the directives supplied by users are taken more literally and pre- directive (line 16) and a compute directive (line 19). Replicating scriptively, in that the user directly controls how the parallelism is this high-level programming approach in a low-level approach like mapped to a device. However, the two standards have recently be- CUDA or OpenCL could require significantly more code, several come more aligned due to the loop directive introduced in OpenMP source files, and multiple compilations. 5.0, which mimics the behavior of the descriptive OpenACC direc- tives. The relationship between OpenMP and OpenACC has been 4.4.2 OpenMP. OpenMP reigned as the de facto standard for somewhat contentious at times. However, both standards are still directive-based homogeneous multi-core CPU computing through currently being maintained as a high-level programming approach the early 2000s, at least in the scientific computing domain. As the for heterogeneous computing. demand for high-level programming approaches for GPGPU com- Although OpenACC has been limited due to its ties to Nvidia puting increased in the early 2010s, there was a push for OpenMP to devices, the availability of the production-level PGI OpenACC com- support accelerator-based heterogeneous computing in addition to piler throughout its history has certainly been an advantage. In 16 Listing 5: Example OpenMPAMP Accelerated Massive Par- Listing 6: Example Kokkos C++ Application allelism 2012 in MicrosoftâĂŹs Visual Studio Not tied to 1 Nvidia GPUs C Application 2#include 3#include 1 4#include 2 int main() { 5 3 6 struct hello_world { 4 int SIZE = 1024; 7 KOKKOS_INLINE_FUNCTION 5 8 void operator()(const int i) const { printf("Hello␣from␣i␣=␣%i\n", i); } 6 float ∗a, ∗b; 9 }; 7 a = malloc(sizeof(float) ∗ SIZE); 10 8 b = malloc(sizeof(float) ∗ SIZE); 11 int main(int argc, char∗ argv[]) { 9 12 Kokkos::initialize(argc, argv); 10 for(int i = 0; i = < SIZE; ++i) { 13 11 a[i] = 0; 14 printf("Hello␣World␣on␣Kokkos␣execution␣space 12 b[i] =// some initial value 15 ␣␣%s\n", typeid(Kokkos::DefaultExecutionSpace).name()); 13 } 16 14 17 Kokkos::parallel_for("HelloWorld", 15, hello_world()); 15// Data Directives 18 16 #pragma omp target data map(to:b[0:SIZE], from:a[0:SIZE]) 19 Kokkos::finalize(); 17 20 } 18// Compute Directive 19 #pragma omp teams parallel for collapse(2) 20 for(int i = 1; i <= SIZE; i++) 21 for(int j = 1; j <= SIZE; j++) 22 a[i][j] = (b[i − 1][j] + b[i + 1][j] + b[i][j − 1] + b[i][j + 1]) / 4.0f; released in 2014, shortly after Kokkos, OpenACC, and OpenMP 4.0. 23 } Raja is essentially another collection of C++ abstractions intended to provide architecture portability for HPC systems, specifically those with GPGPU architectures. contrast, although OpenMP 4.0 originally was approved in 2013, A 2015 Supercomputing poster compared Raja and Kokkos using compilers fully supporting the standard have been slow in coming. the TeaLeaf application [209]. While Kokkos relied on the C++ Only very recently have mature compilers successfully supported template metaprogramming approach, Raja instead relies on the the entire standard, and many mainstream compilers are still un- C++11 lambda features. 
They also found that porting an application der development for the OpenMP 4.0 standard and especially the to Raja was relatively intuitive, on a similar level to an OpenMP port. OpenMP 5.0 updates. We discuss this further in Section 4.6. Conversely, porting the application to Kokkos required extensive In Listing 5, we show the same application as the previous list- architectural changes. Like Kokkos, Raja relies on OpenMP and ing, now annotated with OpenMP directives. Although this short CUDA internally to target CPUs and GPUs, respectively. example trivially highlights the use of OpenMP, this approach still Listing 7 highlights a vector addition example in Raja, pulled greatly simplifies heterogeneous computing compared to CUDA from the Raja github (https://github.com/LLNL/RAJA/). Although and OpenCL. Raja does require manual memory management with specialized allocators and deallocators, the actual loop parallelization requires 4.4.3 Kokkos. In 2012, around the same time as the release of very little code modification. OpenACC and OpenMP 4.0, H.C. Edwards and a team at Sandia National Laboratory developed the Kokkos portability layer [97– 4.4.5 Alpaka. A newer project, Alpaka [210, 296, 303], was first 99]. released in 2015 and first published in 2016. Like Kokkos, Alpaka Kokkos is implemented as a performance portability layer. Un- aims to provide a performance portability layer that allows pro- like OpenACC and OpenMP that rely on directives, Kokkos is im- grammers to write a single application for accelerators from several plemented as a C++ template library on top of OpenMP, CUDA, different vendors with minimal code changes. However, unlike HPX [160] (discussed below), or Pthreads [214]. Essentially, the goal Kokkos, Alpaka also abstracts the data structures used in kernel is to allow programmers to implement the Kokkos abstraction layer arguments, allowing for further optimizability. Internally, Alpaka re- once in their application, which can then be executed across a di- lies on several back-end languages and libraries, including OpenMP versity of hardware architectures. The C++ templating abstraction 4+, OpenACC (experimental), TBB, CUDA, and HIP. Alpaka also is an attractive model for heterogeneous programming, as it allows relies on gcc and clang for back-end compilation. Although Alpaka the same API calls to have multiple backend implementations. is a newer project with less adoption that Kokkos or Raja, Alpaka Kokkos has been a popular option within the scientific commu- is very well documented, both on GitHub [8] and their web refer- nity, and is supported by several national labs, including Sandia and ence [7], which are strong indicators of continued success, higher Argonne National Laboratories. Listing 6 demonstrates an exam- adoption, and project longevity. ple Kokkos application. The Kokkos abstractions do require more in-depth knowledge of C++ including concepts like templates and 4.4.6 C++ Libraries and Extensions. While Raja and Kokkos are functors, compared to the directive-based approaches. two of the most popular C++-based high-level GPGPU program- ming approaches, especially in scientific computing, several other 4.4.4 Raja. Like Kokkos, Raja is a C++-based GPGPU program- C++ libraries and extensions have been developed to support het- ming approached developed by a major US National Laboratory, erogeneous computation. We briefly discuss several of these in this Lawrence Livermore National Lab (LLNL) [38, 146]. Raja was first section.

17 Listing 7: Example Raja C++ Application in 2015 and 2016, there does not seem to be a significant 1#include amount of activity and development currently as of 2020. 2#include • Thrust In the same way that .compute could be consid- 3#include ered a C++ OpenCL wrapper, C++ Thrust was developed as 4 5#include "memoryManager.hpp" a C++ CUDA wrapper [42]. First published in 2012, Thrust is 6#include "RAJA/RAJA.hpp" now an important part of the CUDA C++ toolkit and actively 7 under development by Nvidia. 8 const int CUDA_BLOCK_SIZE = 256; 9 • Bolt C++ Bolt is yet another C++ template library with sup- 10 int main(int RAJA_UNUSED_ARG(argc), char ∗∗RAJA_UNUSED_ARG(argv[])) port for OpenCL [51]. Unlike the other libraries mentioned, 11 { 12 const int N = 1000000; Bolt specifically was developed to target AMD GPU devices. 13 Bolt was developed and released by AMD in 2014, shortly 14 int ∗a = memoryManager::allocate(N); after Nvidia’s Thrust and Microsoft’s AMP. Bolt optimizes 15 int ∗b = memoryManager::allocate(N); 16 int ∗c = memoryManager::allocate(N); code for AMD devices by generating AMD-specific API calls 17 at the OpenCL level. 18 for(int i = 0; i < N; ++i) { • VexCL While AMP, Boost.compute, Thrust, and Bolt attempt 19 a[i] = −i; 20 b[i] = i; to abstract a lower-level backend (OpenCL, CUDA, AMD 21 } OpenCL), VexCL is a higher-level vector expression template 22 23// OpenMPCPU target library, although these distinctions may be lost on most 24 RAJA::forall(RAJA::RangeSegment(0, N), [=] (int i) { scientific programmers86 [ ]. VexCL creates and compiles a 25 c[i] = a[i] + b[i]; distinct OpenCL application, each with a single execution 26 }); 27 memoryManager::deallocate(a); kernel, for each vector expression encountered in the original 28 memoryManager::deallocate(b); application. 29 memoryManager::deallocate(c); • C++ All of the other programming approaches in this section 30 31// CUDAGPU target refer to libraries and extensions not incorporated in the C++ 32 RAJA::forall>(RAJA::RangeSegment(0, N), standard. However, newer versions of C++ have begun to in- 33 [=] RAJA_DEVICE (int i) { 34 c[i] = a[i] + b[i]; corporate different types of CPU parallelism directly into the 35 }); standard. For example, C++17 has increased SIMD support 36 for parallel loops. Furthermore, there is a push with the C++ 37 memoryManager::deallocate_gpu(d_a); 38 memoryManager::deallocate_gpu(d_b); community to add support for heterogeneous computing in 39 memoryManager::deallocate_gpu(d_c); future releases. The major drawback is the slow timeline 40 for C++ releases and the significant burden of defending 41 return 0; 42 } inclusions into the already massive C++ standard. In addition to the libraries mentioned, several other C++ het- erogeneous programming libraries have been developed, although few have a significant user base today. Some examples include CUB • AMP C++ AMP (Accelerated Massive Parallelism) was orig- (CUDA Unbound) [212], SkelCL [275], and ArrayFire [208]. inally released in 2012 as part of Microsoft’s Visual Stu- dio 12 [124]. Interestingly, AMP was one of the few early 4.4.7 Distributed- and Accelerator-based Approaches. Legion GPGPU programming approaches not tied to Nvidia devices. The Legion Project [36, 189, 190] originates from Stanford Uni- Although C++ AMP is now considered obsolete and is no versity, and was first published in 2012. 
Legion, a portmanteau of longer supported by Microsoft, the ideas from the project logical regions, is unique from many of the other high-level ap- inspired other more modern approaches like AMD’s ROCm proaches in this section in that it aims to support both distributed suite. and accelerator-based heterogeneous computing. • Boost.Compute C++ Boost is a collection of C++ libraries Like many of the other frameworks, a main goal of Legion is with various functionalities, typically related to performance to abstract or decouple the algorithm design from the mapping or improvements [261]. Although Boost libraries are licensed, execution on heterogeneous architectures. For Legion, this concept maintained, and well documented, they aren’t part of the extends to distributed heterogeneous machines. Legion specifically official standard, though many Boost libraries are eventually focuses on data movement and management abstractions, primarily absorbed into the official C++ standard. The Boost.Compute by introducing the abstraction of logical regions. By partitioning library was accepted as a Boost library in January 2015 and data into logical regions and sub-regions, programmers can indi- was available in Boost v1.61 released April 2016 [279]. cate data locality and independence, which can be used by the Boost.compute is essentially a wrapper over OpenCL, with underlying framework components to facilitate communication a high-level API created to resemble typical C++ STL library and parallelism. calls, and a low-level API that more closely resembles sim- Legion remains relevant today, and regular software releases ple abstractions over OpenCL boiler-plate code, effectively address bugs, performance issues, features and extensions, and merging OpenCL into C++. Although the Boost.compute additional system support. Furthermore, the Legion project is sup- github, mailing list, and issue tracker saw a flurry of activity ported and funded by the DOE Exascale Computing project [188].
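Since the "Hello World" program in Listing 8 below does not exercise logical regions, the following fragment gives a rough feel for the abstraction. It is only a sketch based on the patterns shown in the Legion tutorials, not code from the project itself, and the field identifier FID_X is an arbitrary name chosen here for illustration.

// Inside a Legion task body, with Runtime *runtime and Context ctx available:
// describe an index space of num_elements points ...
IndexSpace is = runtime->create_index_space(ctx, Rect<1>(0, num_elements - 1));
// ... and a field space holding one double-precision field.
FieldSpace fs = runtime->create_field_space(ctx);
FieldAllocator allocator = runtime->create_field_allocator(ctx, fs);
allocator.allocate_field(sizeof(double), FID_X);
// The logical region is the cross product of the two; partitioning it into
// sub-regions is what exposes locality and independence to the runtime.
LogicalRegion lr = runtime->create_logical_region(ctx, is, fs);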

18 Listing 8: Example Legion C++ Application Listing 9: Example HPX C++ Application 1#include 1 hello_world_component.hpp 2#include "legion.h" 2#include 3 3#if !defined(HPX_COMPUTE_DEVICE_CODE) 4 using namespace Legion; 4#include "hello_world_component.hpp" 5 5#include 6 enum TaskID { 6 7 HELLO_WORLD_ID, 7#include 8 }; 8 9 9 namespace examples { namespace server { 10 void hello_world_task(const Task ∗task, 10 void hello_world::invoke() 11 const std::vector ®ions, 11 hpx::cout << "Hello␣HPX␣World!" << std::endl; 12 Context ctx, Runtime ∗runtime) { 12 }} 13 printf("Hello␣World!\n"); 13 HPX_REGISTER_COMPONENT_MODULE(); 14 } 14 typedef hpx::components::component< 15 15 examples::server::hello_world 16 int main(int argc, char ∗∗argv) 16 > hello_world_type; 17 { 17 HPX_REGISTER_COMPONENT(hello_world_type, hello_world); 18 Runtime::set_top_level_task_id(HELLO_WORLD_ID); 18 HPX_REGISTER_ACTION( 19 19 examples::server::hello_world::invoke_action, hello_world_invoke_action); 20 { 20#endif 21 TaskVariantRegistrar registrar(HELLO_WORLD_ID, "hello_world␣variant"); 21 22 registrar.add_constraint(ProcessorConstraint(Processor::LOC_PROC)); 22// hello_world_component.cpp 23 Runtime::preregister_task_variant(registrar, "hello_world␣task"); 23#pragma once 24 } 24#include 25 25#if !defined(HPX_COMPUTE_DEVICE_CODE) 26 return Runtime::start(argc, argv); 26#include 27 } 27#include 28#include 29#include 30#include In Listing 8 we see a small "Hello World" example programmed 31 32#include using Legion, sourced from the Legion github site [189]. Although 33 namespace examples { namespace server { this example hardly captures the essence of Legion, lacking even 34 struct HPX_COMPONENT_EXPORT hello_world invocations of logical regions, it does give a small insight into the 35 : hpx::components::component_base 36 { void invoke(); HPX_DEFINE_COMPONENT_ACTION(hello_world, invoke);}; Legion API. 37 }} HPX HPX, short for High Performance ParalleX, is another dis- 38 39 HPX_REGISTER_ACTION_DECLARATION( tributed computing focused framework, developed by Louisiana 40 examples::server::hello_world::invoke_action, hello_world_invoke_action); State University and first published in 2014 [144, 160, 237, 238]. 41 namespace examples { Like Legion, HPX aims to provide a unified programming approach, 42 struct hello_world 43 : hpx::components::client_base { allowing both single-node and distributed parallelism from a sin- 44 typedef hpx::components::client_base gle API. HPX is strongly connected to C++, and depends heavily 45 base_type; on the Boost C++ libraries. Although HPX has traditionally fo- 46 47 hello_world(hpx::future && f) cused on CPU-based distributed and single-node parallelization, 48 : base_type(std::move(f)) {} more recently, efforts have been made to support heterogeneous 49 hello_world(hpx::naming::id_type && f) 50 : base_type(std::move(f)) {} computation with HPX, either through integration with OpenCL 51 void invoke() (HPXCL [89]), development of a SYCL backend [76], or other ap- 52 { hpx::async(this−>get_id()).get(); } proaches. 53 } 54 } Listing 9 shows an example "Hello World" application with HPX. 55#endif 56 4.4.8 Python and Java. Traditionally C, C++, and FORTRAN 57// hello_world_client.cpp have dominated the high-performance computing field. 
However, 58#include 59#if !defined(HPX_COMPUTE_DEVICE_CODE) more recently newer languages have experienced a rise in popu- 60#include "hello_world_component.hpp" larity in the HPC field, especially those that are heavily usedin 61#include domain sciences (i.e., python and R). While most high-level hetero- 62 63 int main(int argc, char∗ argv[]) { geneous programming approaches, especially within the scientific 64// Createa single instance of the component on this locality. community, still target C and C++, there are some approaches that 65 examples::hello_world client = have been developed to target python and Java. 66 hpx::new_(hpx::find_here()); 67// Invoke the component 's action, which will print"Hello World!". PyCUDA [167, 169] was first published in 2009, several years 68 client.invoke(); earlier than many of the other high-level approaches discussed in 69 return 0; 70 } this section. However, PyCUDA is less of a high-level language, 71#endif and more of a simple set of CUDA-wrappers in python. For exam- ple, in Listing 10 sourced from the PyCUDA author’s web page (https://documen.tician.de/pycuda/), we see that the computation simplified python API. Judging from the documentation, version up- kernel still very much uses CUDA syntax, but the host code uses a dates, and paper citations, PyCUDA still has a significant user-base today. 19 Listing 10: Example PyCUDA Python Application qualify as a "high-level" approach in the same manner as the other 1 import pycuda.autoinit approaches in this section. 2 import pycuda.driver as drv The options for performing general purpose accelerated comput- 3 import numpy ing in languages like python are still technically limited. However, 4 5 from pycuda.compiler import SourceModule as we see in the following section, python has a rich environment 6 mod = SourceModule(""" and a multitude of programming approaches for heterogeneous 7 __global__␣void␣multiply_them(float␣*dest,␣float␣*a,␣float␣*b) domain specific computing that more appropriately map tothe 8 { 9 ␣␣const␣int␣i␣=␣threadIdx.x; abstraction level and programming style of python. 10 ␣␣dest[i]␣=␣a[i]␣*␣b[i]; 11 } 4.4.9 SYCL, DPC++, and OneAPI. The SYCL standard is yet 12 """) 13 another C++-based heterogeneous programming approach [164]. 14 multiply_them = mod.get_function("multiply_them") First released in 2014, SYCL originally aimed to be a programmer- 15 16 a = numpy.random.randn(400).astype(numpy.float32) productivity oriented abstraction layer on top of OpenCL. However, 17 b = numpy.random.randn(400).astype(numpy.float32) later implementations targeted other intermediate representations, 18 like AMD HIP and CUDA. We discuss this further in Section 4.6. 19 dest = numpy.zeros_like(a) 20 multiply_them( Although SYCL is several years old, it has seen limited uptake in 21 drv.Out(dest), drv.In(a), drv.In(b), the scientific community, until its recent involvement with DPC++ 22 block=(400,1,1), grid=(1,1)) and Intel’s OneAPI initiative. 23 24 print(dest−a∗b) DPC++ [29], launched in 2019, is a SYCL implementation devel- oped and managed by Intel, that integrates the SYCL and OpenCL standards with additional extensions. These extensions are often Listing 11: Example PyOpenCL Python Application championed for inclusion in the SYCL standard itself, analogous to 1 import numpy as np how several of the heterogeneous and parallelism features of SYCL 2 import pyopencl as cl are then pushed for inclusion into the C++ standard. 
Examples of 3 features in SYCL that originated in DPC++ include unified shared 4 a_np = np.random.rand(50000).astype(np.float32) 5 b_np = np.random.rand(50000).astype(np.float32) memory, group algorithms, and sub-groups. 6 Intel’s OneAPI Library [151] attempts to encapsulate several of 7 ctx = cl.create_some_context() 8 queue = cl.CommandQueue(ctx) the technologies and programming approaches discussed in the 9 section under a single umbrella. OneAPI consists of several APIs 10 mf = cl.mem_flags based on DPC++, SYCL, C++ Parallel STL, and Boost.Compute, 11 a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np) 12 b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np) including: 13 14 prg = cl.Program(ctx, """ • oneAPI DPC++ Library 15 __kernel␣void␣sum( • oneAPI Math Kernel Library 16 ␣␣␣␣__global␣const␣float␣*a_g,␣__global␣const␣float␣*b_g,␣__global␣float␣*res_g) 17 { • oneAPI Data Analytics Library 18 ␣␣int␣gid␣=␣get_global_id(0); • oneAPI Threading Building Blocks 19 ␣␣res_g[gid]␣=␣a_g[gid]␣+␣b_g[gid]; • oneAPI Video Processing Library 20 } 21 """).build() • Collective Communications Library 22 • oneAPI DNN Library 23 res_g = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes) • 24 prg.sum(queue, a_np.shape, None, a_g, b_g, res_g) Integrated Performance Primitives 25 26 res_np = np.empty_like(a_np) It remains to be seen if OneAPI can truly deliver a unified pro- 27 cl.enqueue_copy(queue, res_np, res_g) gramming approach for a diverse set of heterogeneous systems.

4.5 Domain Specific GPU Programming The OpenCL analog to PyCUDA, PyOpenCL, was developed by the same research team [168] in 2012, three years after the release Approaches of PyCUDA. In Listing 11, we see that PyOpenCL provides a simi- Both the high-level and low-level general-purpose GPU program- lar wrapper-style abstraction over OpenCL, although with slightly ming approaches allowed developers to create heterogeneous ap- more OpenCL-oriented python API calls. The Clyther project [255] plications for a huge diversity of application domains. However, was another interesting but short-lived OpenCL-based python in- many domain and computational scientists spend the entirety of terface for heterogeneous computing. their programming efforts within a very specific field or area. To Similar to how PyCUDA provided a way to develop GPU-accelerated combat the issues with the general purpose approaches, such as applications in python, JCUDA [300] provided CUDA wrappers for the complexity of the low-level approaches and inconsistency and Java. Also developed in 2009, JCUDA is still frequently updated and performance issues with the high-level approaches, a multitude of supports a significant user base. Because the abstraction level of domain-specific GPU programming approaches were developed. Java is more similar to C, JCUDA’s abstraction level is very similar More specifically, libraries or domain-specific languages (DSLs) to that of C-based CUDA. As a result, this approach does not strictly targeting a single application space or area were developed to meet 20 the very specific needs of a smaller user-base. We discuss several numerical computing tool for several frontends such as SciPy [26] of these approaches across a number of domains in this section. and Mathematica [27]. cuFFT Besides BLAS routines, another common but computa- tionally demanding operation in scientific routines is Fast Fourier 4.5.1 Linear Algebra and Math Libraries. Linear and matrix alge- Transforms (FFTs). In conjunction with the cuBLAS and cuSparse bra algorithms have consistently been some of the most important routines, Nvidia also released an FFT library, cuFFT [227]. For com- but also computationally demanding components of scientific com- pleteness, the other math libraries released by Nvidia include cu- puting. It is no surprise then, that several heterogeneous libraries Parse, cuRand, cuSolver, and cuTensor [220]. and frameworks have been developed specifically for this domain. Odient Differential equations occur frequently in algorithms cuBLAS: Possibly the most universal heterogeneous program- with a time dimension, for example development of ecological ming library, cuBLAS is a linear and matrix algebra library devel- systems and population modeling. Odient [4, 195] was developed oped and maintained by Nvidia for CUDA-enabled GPUs [222]. in 2011 as a C++-based GPU-enabled library for solving the initial Although cuBLAS greatly simplifies development of CUDA kernels value problems (IVPs) of ordinary differential equations (ODEs). containing BLAS-like algorithms, because of its closed-source and Interestingly, Odient was used as recently as 2020 to model the proprietary nature, the internal functionality can be difficult to spread of the Covid-19 virus [93, 138, 205, 258]. debug. Still, cuBLAS has hundreds of citations just in 2020, over a SPIRAL The SPIRAL project [250, 272] is a DSL and program- decade after its initial release. 
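As an illustration of the cuBLAS calling style described in this subsection (a minimal sketch of our own, not taken from the cuBLAS documentation), the SAXPY operation from Listing 1 reduces to a single library call once the data is resident on the device:

#include <cuda_runtime.h>
#include <cublas_v2.h>

// Assumes x and y are host arrays of length n that have already been filled.
void saxpy_cublas(int n, float alpha, const float *x, float *y)
{
  float *d_x, *d_y;
  cudaMalloc((void **)&d_x, n * sizeof(float));
  cudaMalloc((void **)&d_y, n * sizeof(float));
  cudaMemcpy(d_x, x, n * sizeof(float), cudaMemcpyHostToDevice);
  cudaMemcpy(d_y, y, n * sizeof(float), cudaMemcpyHostToDevice);

  cublasHandle_t handle;
  cublasCreate(&handle);
  // y = alpha * x + y, performed entirely on the GPU.
  cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);
  cublasDestroy(handle);

  cudaMemcpy(y, d_y, n * sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_x);
  cudaFree(d_y);
}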
An OpenCL alternative, clBLAS [16, ming framework focused on the development of digital signal pro- 231], was developed in 2013 and is maintained by AMD, although cessing (DSP) algorithms. SPIRAL aims to simplify the program- its adoption has been much less widespread than cuBLAS. ming approach for domain scientists, offering a syntax similar to cuSparse A separate library for sparse matrix computations, typical formula and mathematical expressions. SPIRAL then inter- cuSparse [216], was introduced around the same time as cuBLAS. nally applies performance expert knowledge and optimizations to Developed and maintained by Nvidia, cuSparse is specialized for generate high-performance code for specific architectures. sparse matrix computations and sparse matrix storage and memory formats on the GPU. Like cuBLAS, cuSparse is limited to CUDA 4.5.2 Image Processing and Graph Algorithms. The Halide pro- GPUs, but is still commonly used as of 2020. gramming language was developed in 2013 as a collaboration be- MAGMA A major open-source alternative to cuBLAS, MAGMA [285, tween MIT’s CSAIL laboratory and Adobe [136, 137, 253]. At its 286] (Matrix Algebra for GPU and Multicore Architectures), was core, Halide is a DSL targeted for image processing and graph developed at the University of Tennessee in 2009. MAGMA was algorithms. designed to be similar to LAPACK [18] (also developed at the Univer- Like many of the other programming approaches in this section, sity of Tennessee), and MAGMA integrated with LAPACK APIs and Halide is embedded in C++, with a dedicated Halide C++ API. More data structures. Although MAGMA has been less popular overall recently, Halide has also developed python bindings. Halide sup- than cuBLAS, even as of 2020 MAGMA has been shown to outper- ports a wide array of architectures, including x86, ARM, PowerPC, form cuBLAS on certain applications [120]. Also, current releases and other CPU architectures and CUDA, OpenCL, OpenGL, and Di- of MAGMA support both dense and sparse matrix operations. rectX enabled GPUs. Halide is used internally in Adobe Photoshop, ViennaCL Another open-source alternative, ViennaCL [256, and in projects related to Google’s Tensorflow. 257], was first released in 2010. ViennaCL primarily focused on 4.5.3 Machine Learning. The explosion of machine learning, common sparse and dense linear algebra operations, and initially undoubtedly the fastest-growing field in computer science, has only had support for an OpenCL backend, although OpenMP and led to the development of several heterogeneous programming ap- CUDA backends were later added. Like the most of the other frame- proaches targeted specifically toward the machine learning domain. works in this section, ViennaCL could be used to target CPUs, Again, in a trend we commonly see in accelerator-based hetero- GPUs, and MIC devices. Overall ViennaCL was not hugely popular geneous programming approaches, Nvidia has significantly con- compared to cuBLAS and MAGMA, but is still referenced in 2020. tributed to the machine learning domain with the development Eigen The Eigen C++ library [101] specialized in matrices, vec- of their cuDNN library [71], first released in 2014. Although the tors, numerical solvers, and related algorithms. 
Although Eigen cuDNN (CUDA Deep Neural Network) library can be programmed was originally designed for CPU-based linear algebra, GPU-support directly, similarly to cuBLAS, more typically cuDNN is used as a was later added, making it an attractive option for heterogeneous backend to one of the widely used deep learning front end frame- computing as well. works. More recently, these frameworks are the primary tool for ARPACK The ARPACK library [23, 191] is a Fortran77 based machine-learning science, as manually coding neural network algo- collection of subroutines, targeted toward large-scale eigenvalue rithms can be difficult. The specific frameworks that support cuDNN problems. The ARPACK developers also released C++ versions of include MxNet [67], Tensorflow1 [ ], Keras [135], Pytorch [239], the routines, ARPACK++ [24]. In the context of this project, some Chainer [284], and Caffe [155]. effort has been made to extend ARPACK to support22 GPUs[ ], Analogous to clBLAS, AMD has also developed an OpenCL-based although ARPACK is less oriented toward heterogeneity than other analog to cuDNN, named MIOpen [14, 165]. Recently released in works in this project, and primarily included for completeness. Re- 2019, MIOpen is provided as part of the ROCm suite, and based on a cently ARPACK development has shifted to an open-source project, software stack including both OpenCL and HIP. Although MIOpen ARPACK-NG [25]. Although not as popular as many of the other is currently not as popular as cuDNN, and lacks integration into math libraries, ARPACK is still used in 2020, and is an underling the major front-end frameworks and tools, it could become popular 21 in the near future with new AMD systems like ORNL’s Frontier effective implementation, typically in the form of a compiler. Much supercomputer [234], projected release in 2021 with four AMD of the success of different programming approaches hinges on the GPUs on each node. availability and usability of compilers for said approaches. In this TVM [68], a community driven project developed at the Univer- section, we discuss different tiers of compilers, from vendor sup- sity of Washington, offers an open source option for end-to-end ported production-level to academic, each of which plays a crucial deep neural network computation. Now managed by the Apache role in the life cycle of heterogeneous programming approaches. Software Foundation (ASF), TVM has several advantages over the other major machine-learning frameworks in that it is not tied or 4.6.1 Vendor Compilers. We first discuss vendor compilers. These restricted to any vendor or device family. TVM supports multiple typically refer to a language implementation, in the form of a com- front-end approaches, including Tensorflow, Keras, and MxNet, and piler, developed by a major accelerator manufacturer, such as Nvidia, is able to target a diverse set intermediate representations, including AMD, IBM, Intel, etc. We briefly highlight some major advantages CUDA, OpenCL, Rocm, and others. Additionally, TVM has limited and disadvantages of the vendor compiler model for heterogeneous support for FPGA devices. programming approaches, and then discuss several vendor compil- ers in detail. 4.5.4 Scientific Visualization. A very natural domain for het- Advantages: Compilers developed and maintained by hardware erogeneous programming is scientific visualization. 
Visualization accelerator vendors are typically very consistent and reliable for a applications typically already heavily rely on GPU architectures for small set of supported devices. The documentation and user guides image and video rendering and display, typically through low-level are often detailed, thorough, and updated. These companies are APIs like OpenGL or OpenCV. Development of domain-specific financially motivated for success with their devices, which results in heterogeneous programming approaches for scientific computing many of the advantages listed. These compilers also have somewhat is a natural extension. One approach involves in-situ visualiza- of a guarantee of longevity, at least compared to the independent tion, where the computation and visualization are tightly coupled, and open source projects. without requiring offloading to the host device. Disadvantages: Vendor compilers, for obvious reasons, are lim- VTK-m [213] is an example of a heterogeneous scientific visual- ited to only compile code for devices produced by the vendor. This ization approach. Like many other approaches, VTK-m relies on C++ leads to replication of efforts for each different manufacturer. Fur- template metaprogramming. The VTK-m programming abstraction thermore, vendor compilers often introduce extensions to otherwise is based on "data-parallel primitives", high-level algorithmic API portable programming approaches that optimize the performance calls that are then executed on the accelerator device. for their specific devices. These extensions break the original lan- Another example is the Alpine framework [177], which builds on guage intentions, and result in code that is no longer portable across the VTK-m framework and ideas. Alpine is focused on supporting an array of different accelerators. The vendor compilers also typi- modern supercomputing architectures, a flyweight infrastructure, cally have a slower release cycle, are slower to incorporate updates and interoperability with software like R and VTK-m. Alpine was to programming approaches, and are more conservative for the im- designed to accelerate scientific visualization codes using Nvidia plementation of new language feature release of updated language GPUs and Intel Xeon Phis. versions. The Ebb framework [45, 96] is a DSL designed specifically for NVCC Arguably the most popular, and dominant, vendor com- collision detection, a common algorithmic operation in scientific piler in all of heterogeneous computing is nvcc, Nvidia’s core CUDA visualization and . compiler [219]. Released in 2006 along with Nvidia’s CUDA toolkit, Finally, the Listz framework [88, 198], developed at Stanford nvcc is based on the LLVM compiler toolchain [179], which is dis- University, is a DSL created specifically to construct 3D mesh PDE cussed later in this section. The nvcc compiler is implemented as a solvers, critical for many visualization algorithms and graphics compiler driver; nvcc invokes the needed tools to perform a given applications. compilation. Typically in a C CUDA application, the host code is compiled with gcc, and the device code is compiled using cudacc. In 4.5.5 Climate and Weather. Due to the high number of compu- this case, nvcc would invoke gcc and cudacc, generating a C-code tational resources required to model climate and weather at scale, host binary and PTX device code respectively. 
PTX, or NVPTX, is a climate and weather simulations represent a large fraction of most low level instruction set architecture used by CUDA-enabled GPUs. HPC system workloads. Unsurprisingly, DSLs have also been cre- Although nvcc is strictly for CUDA applications and Nvidia de- ated to ease the creation of climate-based HPC applications. One vices, nvcc is used extensively as an internal mechanism in many example, the CLAW project [72, 243] developed in 2018 at ETH of the higher level source-to-source compilers, open source frame- Zurich, is a FOTRAN-based DSL that aims to provide performance works, and heterogeneous programming toolkits. portability for column- and point-wise weather and climate com- PGI The PGI OpenACC compilers, pgcc and pgft, have been putations. the de facto standard for OpenACC compilation since its incep- tion in 2012 [293, 294]. The PGI (Portland Group Inc.) company 4.6 Heterogeneous Compilers was founded in 1989, and originally developed parallel computing Development of new heterogeneous programming approaches, compilers for x86 architectures. PGI especially specialized in high- APIs, libraries, and frameworks is important for advancing the performance FORTRAN compilers. Because of this specialization, field of heterogeneous computing. However, even the world’s best- in 2009 PGI was contracted by Nvidia for the development of the designed programming approach is rendered useless without an first FORTRAN-based CUDA compiler.

22 PGI also worked with Nvidia to develop the PGI-Accelerator the "Cray Compiling Environment" [78]. For GPGPU offloading, cce programming model, which we discussed in Section 4.3. As men- supports both OpenACC and OpenMP 4.5 directives. tioned, the PGI-Accelerator compiler was eventually extended to IBM Like Cray, IBM does not directly develop heterogeneous develop the first OpenACC compiler. According to the definitions accelerators, although IBM does develop multicore chips, the Power and classifications used in this section, PGI’s role in heterogeneous processors (Power 1, ..., Power8, Power9, Power10, etc.) [148]. How- programming and computing up to that point would have placed it ever, like Cray, IBM has also built top heterogeneous supercom- exactly in the following "independent compilers" section. However, puters, including ORNL’s Summit and LLNL’s Sierra machines. In in 2013 PGI was acquired by Nvidia, redefining it as a "vendor com- turn, IBM’s xlc compiler toolchain has limited support for hetero- piler", at least for the purposes of this project. Interestingly, in 2013 geneous programming approaches [147]. Specifically, xlc supports PGI also developed an OpenCL compiler for ARM cores [133], but the OpenMP offloading directives in their C and C++ applications, this was removed after the Nvidia acquisition. though no support for OpenACC is available. Since then, PGI has continued to develop compilers for Nvidia 4.6.2 Independent Compilers. Several independent companies devices for OpenACC C and OpenACC FORTRAN, and has been develop and maintain proprietary compiler technologies, a few very involved in the promotion and development of OpenACC of those specifically targeting heterogeneous computing. Before itself. Although PGI compilers have existed independently from its acquisition by Nvidia, pgcc/pgc++ from The Portland Group the CUDA toolkit in the past, as of August 2020 pgcc and pgft have represented a prime example of an independent heterogeneous now been fully absorbed into Nvidia, and are now re-branded as compiler. part of the Nvidia HPC SDK [132, 224]. Another example is the ComputeCPP compiler from Codeplay AMD The other major GPU manufacturer after Nvidia, at least Software [74]. The ComputeCPP compiler is an implementation in the context of scientific computing, is AMD. Unlike Nvidia, AMD of the SYCL standard, discussed in Section 4.4, and currently rep- has not developed a proprietary programming approach and ven- resents one of the few production-level SYCL implementations, dor compiler for heterogeneous computing. AMD has developed although Intel is currently developing a SYCL implementation as a C/C++ optimizing vendor compiler, aocc, for its CPU Ryzen de- part of its OneAPI framework. Several other independent compiler vices [11], but for their Radeon GPU devices, AMD has opted for development companies exist, and this list is by no means exhaus- an open-source solution. tive. In order support its GPU architectures, AMD has developed the open-source ROCm (Radeon Open Compute) suite [12]. ROCm 4.6.3 Open Source Compilers. The main alternative to hetero- is a collection of APIs, drivers, and development tools that sup- geneous proprietary vendor compilers are production-level open port heterogeneous execution on both AMD GPUs, but also other source heterogeneous compilers. These compilers and compiler architectures like CUDA GPUs. 
ROCm supports the AMD HIP repre- toolchains are typically maintained by steering committees, which sentation, but can also process OpenMP and OpenCL applications. can consist of representatives from accelerator vendors, scientific in- The compilers, libraries, and debuggers for ROCm are available stitutions, and independent companies. We discuss the advantages, from the open source github [13]. disadvantages, and some examples of open source heterogeneous Intel Intel has long been at the frontier of high-performance compilers. compilers for their optimizing and parallelizing CPU compilers, Advantages Unlike the vendor compilers, open source compil- enabling SIMD and multi-threaded parallelism for their homoge- ers are often more community driven. That is, the direction and neous Intel Xeon CPU devices. Intel’s first foray into heterogeneous implementation of the compiler is not completely motivated and compilation came in 2010 with the introduction of the Intel Xeon driven by device manufactures, although device manufactures are Phi coprocessor chip [217]. These chips followed a similar offload often involved. Also, most of the open-source compiler frameworks model and architecture as the contemporary GPU models. support a variety of accelerators and architectures. More gener- Intel’s acquisition of the FPGA-manufacturer Altera has also ally, open source compilers benefit from all of the same advantages resulted in the release of a vendor-specific Intel-based OpenCL of open-source software as a whole, including transparency, flexi- compiler for FPGAs [150]. However, this compiler framework suf- bility, and independence. Specific to heterogeneous programming fers from many of the vendor-specific extensions and optimizations approaches, open source compilers can more quickly adapt new mentioned in the above "disadvantages" discussion, rendering the standards and features and the rapidly evolving array of architec- resulting OpenCL not portable to other devices. We discuss this tures. Also, because the same compiler can be used across several further in Section 4.8. architectures, the input programming approach used is inherently Finally, with the release of the X e GPGPU, Intel is also expected more portable. Most open source projects are managed through git to release an Intel-based GPU-specific vendor compiler [152]. Few or subversion, and hosted on a popular git repository hosting site details have been released on the heterogeneous programming like Github. approach for the X e accelerator, but it is expected to involve In- Disadvantages Open source compiler projects, especially the tel’s OneAPI and DPC++ frameworks, discussed in more detail in smaller ones, may not have the financial security of the vendor com- Section 4.4. pilers. They also may not have the secured longevity. For example, Cray Although Cray does not manufacture heterogeneous ac- if the main contributors to an open source compiler projects change celerators, Cray has been responsible for building several of the positions or careers, continued maintenance on the project may world’s fastest heterogeneous supercomputers, including ORNL’s terminate. Also, the open source compilers may not have access Jaguar, Titan, and Frontier. For these machines, Cray has developed to low level architecture details that the vendor compilers use to 23 get increased performance on their specific devices. However, the close second on longevity). 
It is no surprise then that GCC also large open source compiler projects, like LLVM and GCC, typically plays a role in heterogeneous compilation. have no issues with longevity and closely tail vendor compilers in GCC was first released in 1987 as the GNU C Compiler, buthas terms of performance. since expanded to incorporate other languages such as C++ and LLVM, Clang, and MLIR LLVM, originally an abbreviation for FORTRAN. More recently GCC has worked to develop support Low Level Virtual Machine, has become one of the most important for OpenACC [245] and OpenMP offloading models [246]. How- compiler toolchains, not just in heterogeneous compilation, but ever, GCC’s implementations are not as mature as PGI’s OpenACC in all of computing [178, 179, 248]. As previously mentioned, the implementation and LLVM’s OpenMP implementation. LLVM backend intermediate representation and compilation tools pocl The pocl project (Portable Compute Language) is an open form the backbone of many of the other compilers, including the source implementation of the OpenCL standard [154, 232], first vendor compilers like nvcc. released in 2015. The pocl project relies on Clang and LLVM in- First developed in 2000 by Chris Latner at the University of ternally, and is able to target most CPUs, Nvidia GPUs, and other Illinois at Urbana Champaign, LLVM has grow significantly from HSA-supported GPUs. its initial role as a virtual machine processor. Originally designed for TriSYCL The TriSYCL project is an open source implementation C/C++, LLVM now provides an internal representation and compile of the SYCL standard [288]. TriSYCL was originally managed by time, runtime, and idle time optimization for a multitude other AMD, but is now managed by Xilinx, who have implemented several languages. In 2005 Apple began to manage and maintain LLVM for SYCL extensions and additions in order to more efficiently support use in their internal projects, but LLVM was later re-licensed under SYCL on their FPGA architectures. Although this likens TriSYCL Apache. to the vendor compilers, TriSYCL has also been used to evaluate, LLVM exists as a main project, LLVM-core, and a number of verify, and provide feedback for the SYCL standard itself, which sub-projects, including three specifically relevant to heterogeneous more closely aligns with the goals of open source projects. programming approaches, Clang, OpenMP, and MLIR. First released in 2008, clang is LLVM’s own front end compiler for 4.6.4 Academic Compilers. The last category of heterogeneous C and C++ [178, 247]. The clang compiler processes C and C++ code compilers we cover are academic project compilers. These projects and generates LLVM IR, which is then optimized and processed by are typically source-to-source translation compilers, or pre-compilers, LLVM. LLVM’s OpenMP sub-project implements OpenMP func- that build on or extend existing production-level compiler projects. tionality into the LLVM clang compiler. Through the clang and However, they play a crucial role in the development cycle of het- OpenMP sub-projects, LLVM supports heterogeneous computing erogeneous programming approaches. We briefly discuss the ad- by compiling C and C++ applications with OpenMP offloading vantages and disadvantages of research-based compilers, and list a directives. few notable examples. 
Though not yet an official LLVM sub-project, OpenACC sup- Advantages Academic compilers are great for prototyping and port is also being developed for LLVM as part of the Clacc (Clang experimentation of new language features. A production level com- OpenACC) project [87]. Clacc builds on the LLVM OpenMP in- piler, either vendor or open source, may take months to push frastructure. Clacc accepts C-based OpenACC as input, internally through new features and require several stages of approval. Con- translates to OpenMP, and then generates LLVM intermediate rep- versely, an academic compiler is usually owned by a small group of resentation using the existing LLVM OpenMP infrastructure. researchers, and new features can be implemented and launched MLIR (multi-level intermediate representation) is another LLVM in a few days. Often, new language features are first evaluated in project with significant implications for heterogeneous program- academic compiler settings, and only later re-implemented, or trick- ming [180, 249]. The MLIR project adopts a layered compilation led down, into more production-setting compilers. Most academic and optimization model, with different MLIR layers, or dialects, that compilers also host open source code on major code repositories. have distinct abstraction levels and areas of focused. These layers Disadvantages Academic compilers often struggle with adop- can be combined and lowered, from higher abstraction dialects tion and longevity. Because the projects are owned by a small num- to lower abstraction dialects. Essentially, MLIR offers a reusable ber of people, small shifts in personnel can have disastrous effects abstraction toolbox. A main goal of MLIR is to prevent software on maintenance of a framework. Also, the compiler frameworks are fragmentation and improve support for heterogeneous hardware, typically funded by larger projects and grants, and therefore may be as the concept of dialects maps well to the ideas of different accel- dependent on renewal of funding. Finally, because these compilers erators. MLIR also aims to provide support for the development of may be targeting a specific problem area for the research group, domain-specific programming approaches, which has a straightfor- they often implement only a subset of the target programming ward mapping to MLIR dialects and the progressive conversion and language or approach. lowering structure of MLIR. The previously discussed Tensorflow ROSE The ROSE compiler framework is an open source, re- framework relies on MLIR, and has been a major motivation for the search based, source-to-source transformation compiler developed development of the project [281]. Additionally, the Flang project at LNLL [200, 252]. First published in 1999, ROSE has not suffered (a FORTRAN-based front-end for LLVM) and Flang’s OpenACC from longevity issues, and is still cited frequently in 2020. In 2013, support rely on MLIR [236]. ROSE was used in one of the first initial implementations and eval- GNU C/C++ The GNU Compiler Collection, commonly referred uations of the OpenMP offloading model, OpenMP specification to as just GCC, is undoubtedly the longest-living and most wide- 4.0 [194]. spread open source compiler framework [244] (although is a 24 OpenUH The OpenUH project was managed by the HPCTools OpenACC framework [174, 175]. The Iris runtime library (citation group at the University of Houston [30, 193]. 
OpenUH was based pending), also integrated into OpenARC, is a work in progress that on the Open64 compiler framework [228], and was originally devel- aims to allow multiple accelerators, even with different architec- oped as an OpenMP and FORTRAN Coarray compiler. OpenUH did tures, to collaborate together to execute a single application. begin support for OpenMP offloading directives for heterogeneous The OpenARC developers have also integrated support for FP- programming, and experimental support for OpenACC on Nvidia GAs [173, 176, 183]. This is discussed in more detail in Section 4.8. and AMD GPUs, but as of 2020 the compiler framework does not HipSYCL Like TriSYCL, the hipSYCL project is another open seem to be under active development. source implementation of the SYCL standard, although hipSYCL Omni The Omni compiler project is maintained and developed lacks the backing of a major vendor and is instead managed by by researches at the University of Tsukuba and the RIKEN Center Heidelberg University [9, 145]. HipSYCL implements a subset of for , both in Japan [75, 260]. First released the SYCL standard, and targets AMD GPUs, Nvidia GPUs, and in 1999, the Omni OpenMP compiler represented one of the first OpenMP-enabled CPUs. Copied from the HipSYCL repository, in research-oriented implementations of the OpenMP standard. Over Figure 6, we see a breakdown of the different SYCL implementations, time, Omni has shifted to focus on cluster-based OpenMP comput- several of which were mentioned in this and previous sections. ing. In 2010, an extension to the Omni project, XacalabeMP [181] HPVM HPVM (Heterogeneous Parallel Virtual Machine) [171, integrated a PGAS-model distributed memory approach to OpenMP 206, 207] is a research project first published in 2018 originating compilation. Also in 2010, the OMPCUDA project extended the from the University of Illinois at Urbana-Champaign. On the surface, Omni compiler to support compilation of OpenMP code for CUDA HPVM is an extension to LLVM with direct support for heteroge- GPUs. Later in 2013, initial OpenACC support was added, shortly af- neous computation, simplifying the intermediate representation ter the release of the OpenACC standard [280]. The next year, 2014, that many of the LLVM-dependent heterogeneous programming the XaclableMP and OpenACC extensions were combined to cre- approaches rely on. ate the XalableACC extension [215], a PGAS-based heterogeneous The HPVM project aims to develop a uniform representation distributed framework based on OpenACC. that can capture an array of different heterogeneous architectures, The Omni compiler and its extensions are still under active including GPUs, multi-core CPUs, FPGAs, and more. The main development. In 2019 and 2020, extensions were made to include components of HPVM include: (1) a dataflow graph-based parallel FPGA support [50, 291], although this support is still a work in program representation to capture task and data parallelism, (2) a progress. heterogeneous compiler intermediate representation that supports OmpSs The OmpSs project, first published in 2011, aimed to optimizations commonly employed on GPU devices, like tiling and support CUDA- and OpenCL-enabled GPUs with OpenMP input[94, loop fusion, and (3) a heterogeneous virtual ISA supporting GPUs, 242]. OmpSs is developed and maintained by the Barcelona Super- SIMD vectorization, and multicore CPUs. computing Center, BCS. 
HPVM is implemented on top of the LLVM project, and aims to Because OmpSs pre-dated the OpenMP offloading directives, the provide a valuable new asset, a heterogeneous-focused extension, developers created custom extensions to OpenMP for handling data, to the LLVM community. based on the StarSs framework [241]. OmpSs was evaluated and extended by a multitude of other works and projects [57, 58, 102], in- 4.7 Heterogeneous Benchmark Suites cluding one comparing OmpSs, OpenMPC, OpenACC, and OpenMP. OmpSs has also been explored for FPGA-based heterogeneous com- When evaluating heterogeneous programming approaches, typi- puting [52, 53, 240], which we discuss further in 4.8. cally performance is king. However, measurements of performance As is obvious from the numerous publications, OmpSs is still are relative, and difficult to compare across different projects, frame- undergoing active development and still being used as part of the works, or standards. The one control that makes performance com- toolchain for a number of other projects. parisons possible are standard benchmarks. In this section, we OpenARC The OpenARC compiler framework, first published review several different benchmark suites designed specifically for in 2014, is maintained and developed by Oak Ridge National Lab- heterogeneous programming approaches. oratory [187]. OpenARC is an extension of the OpenMPC frame- First released in 2009, the Rodinia benchmark suite [64] is the work [182], and like OpenMPC, is built on the Cetus compiler oldest among the benchmark sets discussed in this section. Rodinia toolchain [83]. OpenARC was originally designed to be the first first released with CUDA and OpenMP versions of computational open source option for OpenACC compilation, acting as an source- kernels from several different scientific domains. OpenCL kernels to-source translator that consumes OpenACC C input and generates were added next, and after the release of the OpenACC standard, C and CUDA output. More recently, OpenARC has evolved to ac- OpenACC versions of several of the kernels were included. The cept OpenMP offloading directives as additional inputs, andcan OpenMP kernels were updated to use some of the offloading direc- generate OpenCL and AMD HIP as output sources, in addition to tives, although they only annotated using directives specific to the CUDA. Intel Xeon Phi devices, not general GPUs. OpenARC also acts as the core framework for other heteroge- In 2010, ORNL released the SHOC (Salable Heterogeneous Com- neous programming projects. The Compass framework [185] relies puting) benchmark suite [82]. The SHOC benchmarks released with on OpenARC to generate ASPEN performance models [271] of het- both CUDA and OpenCL versions of several kernels. Unlike Ro- erogeneous applications driven by user annotations and directives. dinia, SHOC was designed to test applications at scale, not just on The CCAMP project aims to create an inter-operable OpenMP and a single node. 25 Overview of SYCL implementations

Figure 6: hipSYCL repository README [145]

The Parboil benchmarks [277] were developed by the University benchmarks themselves are infrequently updated and fail to capture of Illinois at Urbana-Champaign and released in 2012. Like the other many of the new language features. This requires each research benchmark suites, Pairboil contains both CUDA and OpenCL code project that requires a benchmark set to develop their own updates versions. One unique aspect with Parboil is that several different to the benchmarks. The other benchmark suites face a similar chal- versions of each application are provided with different levels of op- lenge. Several newer benchmark suites have been presented, but timizations. These versions can be used to measure the effectiveness all have faced issues with adoption. Moving forward, development, of an automated optimizing compiler. adoption, and maintenance of high-quality benchmark suites could Also released in 2012, the OpenCL 13 Dwarfs benchmark suite [111] significantly improve the productivity of heterogeneous program- is a realization of Berkely’s 13 computational dwarfs in OpenCL [28], ming approach developers. where a dwarf is essentially a core computational or communication method or action. In 2013, EPCC, the Edinburgh Parallel Computing Center, a su- 4.8 FPGA Accelerators percomputing center associated with the University of Edinburgh, Since the initial release of CUDA in 2006, GPGPUs have been the released a suite of OpenACC benchmarks [104, 105]. The suite dominant driving force for accelerator-based heterogeneous com- contains low-level operations intended to test and measure the per- puting. The concept of offloading computationally intense regions formance of hardware and compilers. The suite also contains a set of of code to a heterogeneous hardware accelerator has become com- software kernels intended to replicate operations most commonly monplace in scientific computing, and for the past decade, het- seen in scientific applications. Although the EPCC Benchmarks erogeneous computing has almost exclusively referred to GPGPU also contain OpenMP implementations, these versions are based offloading. However, FPGAs have recently emerged as a major com- on non-offloading OpenMP standards, 3.0 and earlier. petitor to GPU accelerators, both in terms of computing power and Finally, the SPEC Accel [158] benchmark suite was released in power efficiency. 2014. SPEC (Standard Performance Evaluation Corporation) is a Field Programmable Gate Arrays (FPGAs) have been produced non-profit specifically focused on developing and maintaining high- and developed for nearly 40 years. Altera, a major FPGA man- quality benchmarks. As a result, the SPEC Accel benchmarks are ufacturer, was founded in 1983, and released the first FPGA in very well organized and documented, and have a robust set of scripts 1984. Xilinx, the main competitor to Altera for several decades, was for executing and recording application information. However, the founded in 1984 and released their first FPGA in 1985. These devices SPEC benchmarks are not open source, and require either a paid have been pushed and promoted as potential architectures for high commercial license or a free academic licence. performance computing for decades, but until very recently, have Interestingly, the oldest benchmark suite, Rodinia, seems to be not seen much adoption. 
The real revolution for FPGAs, and their the most popular, with nearly an order of magnitude more citations adoption as a heterogeneous accelerator, has stemmed from the than any of the other benchmark suites. This could be just an arti- introduction of new FPGA programming approaches. fact of being released first, or from the Rodinia kernels more closely Traditionally, FPGAs were programmed using low-level Hard- resembling desired scientific applications. However, the Rodinia ware Definition Languages (HDLs). These programming approaches

26 required detailed knowledge of FPGA hardware, manual manage- systems. In this section, we discuss some of these more recent accel- ment of timing and placing of wires on the FPGA board, and tedious erators, and the heterogeneous programming approaches associated implementation of flip flops, arithmetic logic, and other similar with them. computer architecture-type features. Several high-level options, or High Level Synthesis (HLS) tools were developed, for example 4.9.1 Machine Learning Accelerators. Perhaps the most relevant Handle-C, which was released in 1996 and generated HDL from C and motivated new accelerator devices are machine learning accel- input [31]. However, many of these early HLS projects were more erators. The interest in these types of devices closely followed the research oriented, and never had a wide adoption in the scientific huge increase in activity in the area of machine learning. Due to community. the vast number of newly developed machine learning accelerators, The first release of a viable HLS option for scientific program- this discussion is far from comprehensive, but we do discuss some mers came with Altera’s release of their OpenCL SDK, in 2013, of the most promising and widely available chips. around four years after the release of OpenCL itself [10]. Due to One of the first widely discussed chips specialized for machine the widespread adoption of GPU offloading, by 2013 the idea of learning was the Google Tensor Processing Unit (TPU) [73, 157]. offloading to other accelerators such as FPGAs had become more Although Google began using the TPU devices internally in 2015, palatable to the scientific community. The Altera OpenCL SDK they were not publicly announced until 2017, and not publicly was well documented, backed by a major vendor, and contained a released until 2018. Google subsequently released the TPU1, which suite of compilation, debugging, and performance measurement specialized on deep neural network inference, and the TPU2 and tools. Sensing the opportunity, Intel acquired Altera in 2015, and re- TPU3, both updates to the original TPU chip. Google also released branded the SDK as the Intel FPGA SDK for OpenCL [150], which a low-power alternative, the TPUEdge chip [287]. Interestingly, the is still very active in 2020. Around the same time as Altera’s de- TPUEdge device is shipped as part of a USB accelerator, a device not velopments, Xilinx also developed OpenCL SDKs and HLS tools, much larger than a typical flash drive. Machine learning models for with their Vivado [299]and SDAccel [297]tools. These tools have the Google TPUs are built and programmed using Tensorflow, or been repackaged and recently released as the Vitis Software Plat- in the case of TPUEdge, Tensorflow Lite, and compiled by Google’s form [298]. XLA compiler [5]. However, other XLA frontends are also supported Several research projects have sought to make the HLS pro- on TPUs, including Julia [46], Jax, and Pytorch. gramming approaches even more accessible for scientific program- Nvidia has also invested heavily in machine learning-focused mers, typically by building software layers on top of the vendor- acceleration. The Nvidia Volta devices, the primary accelerator supported HLS backends. One example is the OpenARC 2016 ex- in ORNL’s Summit supercomputer, contain 640 "Tensor Cores", tension, the OpenACC to FPGA framework [173, 176, 183, 184]. specialized for machine learning workloads [226]. 
The Nvidia Am- The OpenACC to FPGA framework accepts OpenACC C programs pere devices, the successor to the Volta, significantly improves as input, and generates Intel-based OpenCL as output, creating on the Volta architecture and introduces double-precision tensor optimized OpenCL from standard OpenACC directives. In similar a cores[221]. Nvidia has also released a packaged full system con- 2017 project, the team at Tskuba University extended the OmpSs taining its core chips, the line of DGX workstations [223]. These framework to support FPGAs [52]. Like the OpenARC extension, workstations, including the DGX-1, DGX-2, DGX-V100, and DGX- the OmpSs extension relies on backend HLS tools, in this case the A100, provide out-of-the box solutions for high performance and Xilinx HLS tools. In a later extension, OmpSs was also extended to accelerated machine learning, for a significant cost. The initial price support OpenACC [291]. for the newest DGX A100 was $199,000. As a cheaper alternative, In a slightly different approach, the ETH Zurich DaCe (Data- Nvidia has released a low-power, low-cost device for small scale Centric) framework also supports FPGA deployment with a high- and edge machine learning, the Nvidia Jetson and its successor, the level programming approach [43]. DaCe is unique in that it employs Nvidia Xavier [225]. Not surprisingly, the Nvidia machine learning a control-flow graph and GUI based interface as a frontend pro- devices are still based on CUDA, and can be programmed with gramming model, which is different than the current programming heterogeneous approaches that support CUDA backends, including approach for nearly all scientific applications. However, this ap- Tensorflow and several of the other frameworks mentioned inthis proach may map to FPGA architectures more appropriately than project. OpenACC or OpenMP, as it is reminiscent of the GUI-based HDL Although Google and Nvidia are the most recognizable names in placing and routing programming style. Like OpenARC and OmpSs, machine learning accelerators, many other major companies have Dace internally relies on the Xilinx HLS tools, with some support also developed custom machine learning hardware. Amazon has for the Intel tools through the hlslib extension [84]. developed a custom inferencing chip for its Amazon Web Server (AWS) EC2 instances, the AWS Inferentia chip [266]. Inferentia is programmed using the AWS Neuro SDK, which contains a compiler, 4.9 Next Generation Accelerators a custom run-time, and profiling tools. The SDK supports many of the leading ML frameworks like Tensorflow, PyTorch, and MXNet. For upcoming and future systems, other types of accelerators be- AMD has released a Radeon Instinct MI8 and a double precision sides GPUs and FPGAs are being explored as hardware accelerators. update, the MI60 [15]. These devices are specifically built for deep More and more exotic, customized, and specialized hardware ac- learning and can be programmed using the AMD MIOpen, ROCm, celerators are being explored as viable options in heterogeneous and HIP toolchains (discussed in Section 4.4), which support many

27 of the popular ML frontends like Theano, Caffe, and Chainer. Inter- accelerator. Other novel acceleration ideas and chips are explored estingly, AMD has partnered with LLNL to deliver the El Capitan and published every year. supercomputer in 2023, which will be deployed with Radeon In- The next generation of accelerators will enable extreme hetero- stinct devices [201]. Finally, ARM also has a hat in the ring with geneity for high-performance computing, but it will also create the ARM Ethos line of machine learning processors [20]. extreme challenges, both with the programming approaches and Besides the machine-learning oriented accelerators developed software stacks. by major processor manufactures, there are also several chips devel- oped by smaller, more research-based entities. Like the distinction between research-oriented and vendor compilers, research-oriented 5 CONCLUSION machine learning accelerators have more freedom to experiment Since its first conceptualization with the PASM and TRAC machines and explore, both with the hardware and the programming ap- in the early 80s, heterogeneous computing and heterogeneous pro- proach. Some examples include the AI "IPU" (Inference Processing gramming approaches have shifted in and out of vogue. In Sections 2 Chips) from Graphcore, which is a smaller company backed by and 3 we explored how distributed heterogeneous computing rose larger entities like Dell, Samsung, and others [159]. The NeuFlow with the promise of robust diverse and distributed systems, sup- chip, developed as a collaboration between IBM, New York Univer- ported by heterogeneous programming approaches and software sity, and Yale University is an older AI-oriented chip, first published stacks. We also saw how these systems were eventually eclipsed in 2011 [110]. Stanford University has been responsible for devel- by homogeneous MPI-based supercomputers, homogeneous cloud oping two AI-oriented chips, the Energy Efficient Inference Engine servers, and CPU-chip advancements. In Section 4, we explored (EIE) [139] and the TETRIS accelerator [121], focused on 3D stack- the rebirth of heterogeneous computing through accelerator-based ing memory. Researchers at MIT CSAIL lab have developed the computing, and the explosion of GPU-based computing and inno- Eyeriss chip, an energy efficient inference processor70 [ ]. Finally, vation with other more novel accelerators. scientists at the Institute of Computing Technology of the Chinese However, the challenges faced by distributed heterogeneous Academy of Sciences have developed the DianNao dataflow re- systems and contemporary accelerator-based systems are not so search chip [66], and several extensions, including DaDianNao [69], different. Many of the challenges early heterogeneous system de- ShiDianNao [92], and PuDianNao [199]. These research processors velopers faced are being constantly revived and re-imagined, espe- all aim to allow developers to utilize the popular machine learn- cially in the face of the extreme heterogeneity of next generation ing frontends, and really demonstrate the success of well-designed systems. Many of the conceptual models, theoretical road maps, DSLs as a heterogeneous programming approach. programming approaches, technical requirements and restrictions, 4.9.2 Neuromorphic Accelerators. Neuromorphic accelerators and strategies for success from distributed heterogeneous research are being seriously explored by the scientific community. Starting apply directly to accelerator-based systems. 
Figure 1, first published in 2018, Oak Ridge National Lab has hosted an international confer- in 1995, would look right at home in a 2020 publication exploring ex- ence, ICONS, on neuromorphic computing [149]. An early exam- treme heterogeneity, albeit with improved graphics. Simply replace ples of prototype neuromorphic-style chips is the IBM TrueNorth SIMD, MIMD, and vector processors, with CPUs, GPUs, FPGAs, ML System, first released in 2016 [6, 108]. As of 2019, Intel is also exper- accelerators, and neuromorphic chips. imenting with neuromorphic chips, with its research chip "Loihi", One important balance in terms of heterogeneous programming and accompanying "Pohoiki" system [153]. Numerous other smaller approaches is the push and pull of generalized frameworks and companies are developing and researching neuromorphic chips, specialized DSL frameworks. Successful high-level DSLs like Ten- including BrainChip with the Akida processor [54] and General sorflow have had incredible success, and made heterogeneous pro- Vision with the NM500 chip [290]. gramming accessible to the large but specialized field of machine learning. Conversely, the generalized low-level CUDA and OpenCL 4.9.3 Quantum Accelerators. Quantum computing, and quan- APIs remain two of the most popular programming approaches for tum accelerators, have generated a significant amount of media general accelerator programming, even 14 years after their initial attention and focus due to the revolutionary potential of a fully release. This is especially impressive considering the huge number fledged quantum computer. Recently, quantum computing has been of research-based, open source, and even vendor-supported, high- added as a core initiative for the US Department of Energy, with level alternatives that have been developed over the past decade. the funding of new centers to support the national quantum ini- However, the development of these high-level frameworks need tiative. Although quantum computing as a viable accelerator in to be funded at a level to readily intercept the developments of heterogeneous systems has yet to be realized, several companies platforms and applications, otherwise implementations, and sub- are working to close this gap, including D-Wave [95] and IBM with sequently adoption, may lag significantly behind the development the Q system [251]. of standards. A prime example is OpenMP offloading, in which the standard was first released in 2013 but the functioning implemen- 4.9.4 Other Next Generation Accelerators. Finally, other even tations were very recently released. more novel accelerators are also being explored. The EMU Chick One immediate conclusion that could be drawn is that DSLs may system from Emu Technologies implements a thread-migration provide an optimal high-level approach, and that these DSLs can be based system with a novel architecture [103]. Xilinx and other built using generalized low level approaches. However, this leaves FPGA vendors are exploring coarse-grained reconfigurable archi- stranded the programmer looking for a high-level heterogeneous tectures (CGRAs), which can be thought of as a GPU-FPGA hybrid

28 programming approach for an application outside the popular DSL [7] Alpaka. 2020. Alpaka Online Portal. (2020). https://alpaka.readthedocs.io/en/ frameworks. latest/index.html [8] Alpaka. 2020. Alpaka Online Repostiory. (2020). https://github.com/alpaka- Another significant takeaway is the huge success experienced by group/alpaka Nvidia in high performance computing for the scientific community. [9] Aksel Alpay and Vincent Heuveline. 2020. SYCL beyond OpenCL: The archi- Developing a great framework is a challenge, but maintaining, tecture, current state and future direction of hipSYCL. In Proceedings of the International Workshop on OpenCL. 1–1. updating, documenting, and marketing a great framework is what [10] Altera. 2020. Altera OpenCL SDK. (2020). https://web.archive.org/web/ leads to widespread adoption and longevity. Also, the investments 20140109220211/http://newsroom.altera.com/press-releases/altera-opens- the-world-of-fpgas-to-software-programmers-with-broad-availability-of- in training, library support, integration with other tools, and, for sdk-and-off-the-shelf-boards-for-.htm better or worse, aggressive acquisition of competition has solidified [11] AMD. 2020. AMD ROCm AOCC Compiler. (2020). https://developer.amd.com/ Nvidia’s position in the heterogeneous computing landscape. amd-aocc [12] AMD. 2020. AMD ROCm Online Portal. (2020). https://www.amd.com/en/ However, capitalistic development is not the only path to suc- graphics/servers-solutions-rocm cess. LLVM represents an open source project that nearly parallels [13] AMD. 2020. AMD ROCm Online Repository. (2020). https://github.com/ Nvidia in terms of adoption and project integration. Part of LLVM’s RadeonOpenCompute/ROCm [14] AMD. 2020. MIOpen - AMDâĂŹs Machine Intelligence Library. (2020). https: success can be attributed to the integrity and ingenuity of its ini- //github.com/ROCmSoftwarePlatform/MIOpen tial design, and its early support and development from Apple. But [15] AMD. 2020. Radeon Instinct Servers. (2020). https://www.amd.com/en/ graphics/servers-radeon-instinct-mi LLVM has also been very successful at creating a community-driven [16] AMD. 2020. ROCm Blas Library. (2020). https://rocmdocs.amd.com/en/latest/ environment, with very experienced contributors and a rigorous ROCmTools/clBLA.html approval process for extensions and modifications. Both the suc- [17] AMD. 2020. ROCm Documentation Online. (2020). https://rocmdocs.amd.com/ en/latest/ cesses of LLVM and Nvidia can be inspirations for creation of [18] Edward Anderson, Zhaojun Bai, Christian Bischof, L Susan Blackford, James next-generation heterogeneous software stacks and programming Demmel, Jack Dongarra, Jeremy Du Croz, Anne Greenbaum, Sven Hammarling, approaches. Alan McKenney, et al. 1999. LAPACK Users’ guide. SIAM. [19] Thomas E Anderson, David E Culler, and David Patterson. 1995. A case for Finally, for all the computational heterogeneity discussed in this NOW (networks of workstations). IEEE micro 15, 1 (1995), 54–64. project, several other forms of heterogeneity are being explored [20] ARM. 2020. Arm Ethos-N series processors. (2020). https://developer.arm.com/ ip-products/processors/machine-learning/arm-ethos-n in contemporary systems. A major example is heterogeneity not [21] Dorian Arnold, Sudesh Agrawal, Susan Blackford, Jack Dongarra, Michelle in the processor, but in the memory, with developments like non- Miller, and Sathish Vadhiyar. 2001. Users’ guide to NetSolve V1. 4. (2001). volatile memory. 
These other types of heterogeneity, along with the [22] ARPACK. 2020. ARPACK GPU Wrappers Online Repository. (2020). https: //github.com/elezar/gpu-arpack constant development of novel accelerators, will continue to create [23] ARPACK. 2020. ARPACK Online Portal. (2020). https://www.caam.rice.edu/ challenges for the development of heterogeneous programming software/ARPACK/ approaches. [24] ARPACK. 2020. ARPACK++ Online Portal. (2020). https://www.caam.rice.edu/ software/ARPACK/arpack++.html Twenty-five years have passed since Siegel et al first laid their [25] ARPACK. 2020. ARPACK Online Repository. (2020). https://github.com/ goals and challenges for heterogeneous computing. Despite the opencollab/arpack-ng [26] ARPACK. 2020. ARPACK Scipy Web Reference. (2020). https://docs.scipy.org/ immense amount of progress in general and heterogeneous com- doc/scipy/reference/tutorial/arpack.html puting, most of these issues are still goals and challenges for today’s [27] ARPACK. 2020. ARPACK Wolfram Web Reference. accelerator-based systems. Twenty-five years from now, will we (2020). https://reference.wolfram.com/language/tutorial/ SomeNotesOnInternalImplementation.html have a new landscape for heterogeneous computing and program- [28] Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, ming with these problems behind us, or will we still strive toward Parry Husbands, Kurt Keutzer, David A Patterson, William Lester Plishker, John the same goals and face the same challenges? Shalf, Samuel Webb Williams, et al. 2006. The landscape of parallel computing research: A view from berkeley. (2006). [29] Ben Ashbaugh, Alexey Bader, James Brodman, Jeff Hammond, Michael Kinsner, John Pennycook, Roland Schulz, and Jason Sewall. 2020. Data Parallel C++ REFERENCES Enhancing SYCL Through Extensions for Productivity and Performance. In Proceedings of the International Workshop on OpenCL. 1–2. [1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey [30] HPCTools at UH. 2020. OpenUH Compiler Online Repository. (2020). https: Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard,. etal //github.com/uhhpctools/openuh 2016. Tensorflow: A system for large-scale machine learning. In 12th {USENIX} [31] Matthew Aubury, Ian Page, Geoff Randall, Jonathan Saul, and Robin Watts. 1996. symposium on operating systems design and implementation ({OSDI} 16). 265– Handel-C language reference guide. Computing Laboratory. Oxford University, 283. UK (1996). [2] Norma E Abel, Paul P Budnik, David J Kuck, Yoichi Muraoka, Robert S Northcote, [32] Cédric Augonnet, Samuel Thibault, Raymond Namyst, and Pierre-André Wacre- and Robert B Wilhelmson. 1969. TRANQUIL: a language for an array processing nier. 2011. StarPU: a unified platform for task scheduling on heterogeneous computer. In Proceedings of the May 14-16, 1969, spring joint computer conference. multicore architectures. Concurrency and Computation: Practice and Experience ACM, 57–73. 23, 2 (2011), 187–198. [3] David Abramson, Rok Sosic, Jonathan Giddy, and B Hall. 1995. Nimrod: a tool [33] Özalp Babaoğlu, Lorenzo Alvisi, Alessandro Amoroso, Renzo Davoli, and for performing parametrised simulations using distributed workstations. In Luigi Alberto Giachini. 1992. Paralex: An environment for parallel program- Proceedings of the Fourth IEEE International Symposium on High Performance ming in distributed systems. In Proceedings of the 6th international conference on Distributed Computing. IEEE, 112–121. Supercomputing. 178–187. 