SIMD Computing:
An Introduction

C. J. C. Schauble
September 12, 1995

High Performance Scientific Computing
University of Colorado at Boulder

Copyright © 1995 by the HPSC Group of the University of Colorado

The following are members of
the HPSC Group of the Department of Computer Science
at the University of Colorado at Boulder:

    Lloyd D. Fosdick
    Elizabeth R. Jessup
    Carolyn J. C. Schauble
    Gitta O. Domik
Contents

1 General architecture
  1.1 The Connection Machine CM-2
    1.1.1 Characteristics
    1.1.2 Performance
  1.2 The MasPar MP-2
    1.2.1 Characteristics
    1.2.2 Performance
2 Programming issues
  2.1 Architectural organization considerations
    2.1.1 Homes
  2.2 CM Fortran, MPF, and Fortran 90
    2.2.1 Arrays
    2.2.2 Array sections
    2.2.3 Alternate DO loops
    2.2.4 WHERE statements
    2.2.5 FORALL statements
  2.3 Built-in functions for CM Fortran and Fortran 90
    2.3.1 Intrinsic functions
    2.3.2 Masks
    2.3.3 Special functions
  2.4 Compiler directives
    2.4.1 CM Fortran LAYOUT
    2.4.2 MasPar MPF MAP
    2.4.3 CM Fortran ALIGN
    2.4.4 CM Fortran COMMON
    2.4.5 MasPar MPF ONDPU
    2.4.6 MasPar MPF ONFE
3 Acknowledgements
References
Trademark Notice

DECstation, ULTRIX, and VAX are trademarks of Digital Equipment Corporation.
Goodyear MPP is a trademark of Goodyear Rubber and Tire Company, Inc.
ICL DAP is a trademark of International Computers Limited.
MasPar Fortran, MasPar MP-1, MasPar MP-2, MasPar Programming Environment, MPF, MPL, MPPE, and X-net are trademarks of MasPar Computer Corporation.
X-Window System is a trademark of The Massachusetts Institute of Technology.
MATLAB is a trademark of The MathWorks, Inc.
IDL is a registered trademark of Research Systems, Inc.
Symbolics is a trademark of Symbolics, Inc.
C*, CM, CM-1, CM-2, CM-5, CM Fortran, Connection Machine, DataVault, *Lisp, Paris, and Slicewise are trademarks of Thinking Machines Corporation.
UNIX is a trademark of UNIX Systems Laboratories, Inc.
SIMD Computing:
An Introduction *†‡

C. J. C. Schauble
September 12, 1995
According to the Flynn computer classification system [Flynn 72], a SIMD
computer is a Single-Instruction, Multiple-Data machine. In other words, all
the processors of a SIMD multiprocessor execute the same instruction at the
same time, but each executes that instruction with different data.

The computers we discuss in this tutorial are SIMD machines with distributed
memories (DM-SIMD). They are sometimes referred to as processor
arrays or as massively-parallel computers.

This tutorial is divided into two main parts. In the first section, we
discuss the general architecture of SIMD multiprocessors. Then we consider
how these general features are embodied in two particular SIMD machines:
the Thinking Machines CM-2 and the MasPar MP-2.

In the second section, we look into programming issues for SIMD multiprocessors,
both architectural and language-oriented. In particular, we describe
useful features of Fortran 90 and CM Fortran.
* This work has been partially supported by the National Center for Atmospheric
Research (NCAR) and utilized the TMC CM-2 at NCAR in Boulder, CO. NCAR is
supported by the National Science Foundation.

† This work has been partially supported by the National Center for Supercomputing
Applications under the grants TRA930330N and TRA930331N, and utilized the Connection
Machine Model-2 (CM-2) at the National Center for Supercomputing Applications,
University of Illinois at Urbana-Champaign.

‡ This work has been supported by the National Science Foundation under an Educational
Infrastructure grant, CDA-9017953. It has been produced by the HPSC
Group, Department of Computer Science, University of Colorado, Boulder, CO 80309.
Please direct comments or queries to Elizabeth Jessup at this address or e-mail.

Copyright © 1995 by the HPSC Group of the University of Colorado
For detailed information on how to log in to and program specific SIMD
computers such as the CM-2 and the MasPar MP-1, refer to the documents in
the /pub/HPSC directory at the cs.colorado.edu anonymous ftp site.
1 General architecture
Each of the processors in a distributed-memory SIMD machine has its own
local memory to store the data it needs. Also, each processor is connected to
other processors in the computer and may send or receive data to or from
any of them. In many respects, these computers are similar to distributed-memory
MIMD (multiple instruction, multiple data) multiprocessors.

As stated above, the term SIMD implies that the same instruction is executed
on multiple data. Hence the distinguishing feature of a SIMD machine
is that all the processors act in concert. Each processor performs the same
instruction at the same time as all the other processors, but each processor
uses its own local data for this execution.

The array of processors is usually connected to the outside world by a
sequential computer or workstation. The user accesses the processor array
through this front end or host machine.
Using a SIMD computer for scientific computing means that many elements
of an array can be computed simultaneously. Unlike vector processors,[1]
the computation of these elements is not pipelined with different portions of
neighboring elements being worked on at the same time. Instead, large groups
of elements go through the same computation in parallel.

In the following, we discuss the architectural features of SIMD multiprocessors,
concentrating on two computers in this class: the Connection
Machine CM-2 by Thinking Machines Corporation and the MasPar MP-2
by MasPar Computer Corporation. Similar computers include the Digital
Equipment Corporation MPP series (technically the same as the MasPar
machines), the Goodyear MPP, and the ICL DAP.
[1] See the tutorial on vector computing [Schauble 95] for more information on vector processors.
1.1 The Connection Machine CM-2
The CM-2 Connection Machine is a SIMD supercomputer manufactured by
Thinking Machines Corporation (TMC). Data parallel programming is the
natural paradigm for this machine, allowing each processor to handle one data
element or set of data elements at a time.

The initial concept of the machine was set forth in a Ph.D. dissertation by
W. Daniel Hillis [Hillis 85]. The first commercial version of this computer was
called the CM-1 and was manufactured in 1986. It contained up to 65,536 (or
64K) processors capable of executing the same instruction concurrently. As
shown in figure 1, sixteen one-bit processors with 4K bits of memory apiece
are on one chip of the machine. These chips are arranged in a hypercube[2]
pattern. Thus the machine was available in units of 2^d processors, where
d = 12 through 16.

One of the original purposes of the computer was artificial intelligence;
the eventual goal was a thinking machine. Each processor is only a one-bit
processor. The idea was to provide one processor per pixel for image
processing, one processor per transistor for VLSI simulation, or one processor
per concept for semantic networks.
The first high-level language implemented for the machine was *Lisp, a
parallel extension of Lisp. The design of portions of the *Lisp language is
discussed in the Hillis dissertation.

As the first version of this supercomputer came onto the market, TMC
discovered that there was also significant interest and money for supercomputers
that could be used for numerical and scientific computing. Hence a
faster version of the machine was produced in 1987; named the CM-2, this
machine was the first of the CM-200 series of computers. It included floating-point
hardware, a faster clock, and increased the memory to 64K bits per
processor. These models emphasized the use of data-parallel programming.
Both C* and CM Fortran were available on this machine in addition to *Lisp.

Announced in November 1991, a more recent machine is the CM-5. This
is a MIMD machine that embodies many of the earlier Connection Machine
concepts with more powerful processors, routing techniques, and I/O.
The following subsections discuss the characteristics and the performance
of the CM-2. For further information, see the Connection Machine CM-200
Series Technical Summary [TMC 91d], Parallel Supercomputing in SIMD
Architectures [Hord 90], chapter 7, or Computer Architecture: Case Studies
[Baron & Higbie 92], chapter 18.

[2] See the tutorial on MIMD computing [Jessup 95] for more information on a hypercube.
Figure 1: A representative blowup of one of the 64^2 processor chips in a Thinking
Machines CM-1 or CM-2. Each CM-1/CM-2 chip contains 16 one-bit processors.
The router connects the processor chips with other processor chips. Memory chips
are associated with each CM-1/CM-2 chip.
Figure 2: The two main parts of a Thinking Machines Connection Machine
system: the parallel processing unit and the front end.
1.1.1 Characteristics
A Connection Machine (CM) may be considered as two main parts; these are
depicted in figure 2. The parallel processing unit (PPU) is the SIMD portion
of the machine and contains up to 64K single-bit processors; this part is the
CM itself. The front end (FE) of the machine acts as a controller or host for
the PPU and provides access to the rest of the world.

The FE is usually a small computer; for the CM, this may be a UNIX
or a Symbolics Lisp workstation or a VAX. A CM may have up to four
FEs; these need not all be the same type of machine. Programs are
compiled, stored, and executed serially on the FE; any parallel operations
in the program are recognized and pipelined to the PPU for execution there,
with the serial execution filling the pipeline and continuing until a response is
needed back from the PPU. Thus CM programs have the familiar sequential
control flow and do not require additional synchronization primitives as do
programs for other multiprocessors.

Depending on the configuration of the machine, each one-bit PPU processor
has 64K, 256K, or 1024K bits of memory, an arithmetic-logic unit
(ALU), four one-bit registers, and interfaces for two forms of communication
and I/O. These processors all work in parallel on simple instructions. In
fact, the PPU can be thought of as a large, synchronized drill team while the
FE is the sergeant who yells out the commands. For example, all the PPU
processors fetch something from their individual memories to their ALUs at
one command; they all add something to that at the next command; and
they all store their individual results into their own memories at the next
command.
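This lockstep execution maps naturally onto whole-array operations. As a
minimal illustration (our own Fortran 90 sketch, with invented array names),
each assignment below corresponds to one command that every processor
carries out on its own local elements:

    PROGRAM lockstep
      REAL, DIMENSION(1000) :: A, B, C
      A = 1.0          ! every processor sets its own elements of A
      B = 2.0          ! ... and of B
      C = A + B        ! fetch, add, and store proceed in lockstep,
                       ! each processor using only its local data
      PRINT *, C(1), C(1000)
    END PROGRAM lockstep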
The language Paris (PARallel Instruction Set) is used to express the parallel
operations that are to be run on the PPU. All *Lisp, CM Fortran, or
C* parallel commands are compiled into Paris instructions. Such operations
include parallel arithmetic operations (both floating-point and fixed), vector
summation and other reduction operations,[3] sorting, and matrix multiplication.

[3] See the tutorial on vector computing [Schauble 95] for more information on reduction operations.

An alternate run-time system exists for machines with 64-bit floating-point
accelerators. This is called the Slicewise model and provides a different
viewpoint of the CM than the Paris model. It can be used only with CM
Fortran programs but allows more efficient execution.

The PPU may be broken up into two or four sections as shown in figure 3.
Each section has its own sequencer and can be used as a sub-PPU by itself or
can be grouped with other sections. The nexus switch provides the pathway
between a given FE and its current sections of the PPU.
A sequencer receives Paris instructions from the FE and breaks them
down into a sequence of low-level instructions that can be handled by the
one-bit processors. When that is done, the sequencer broadcasts these instructions
to all the processors in its section. Each processor then executes
the instructions in parallel with the other processors. When the execution
of low-level instructions is completed, control is returned to the FE. Used
independently, each section sets up its own grid layout for computation and
communication for each array; if the sections are grouped together, one grid
per array is laid over all the processors. These grids may be altered dynamically
during the execution of the program.

Figure 3: Breakdown of a Thinking Machines CM PPU into two sections with
sequencers. Two front-end machines, the nexus switch, and the I/O system are
also shown.
Figure 4: A pair of Thinking Machines CM-2 processor chips communicate with a
single FPA.
Virtual processors are another feature of the CM. If the number of elements
for the data of a given program is larger than the number of processors,
the machine acts as if there were enough processors, providing virtual processors
by assigning the data elements across the PPU in an efficient manner.
In other words, each physical processor may act as one or more virtual processors.

Floating-point accelerators are optional and come in two types: 32-bit or
64-bit. A 32-bit FPA interprets the bits from two chips of sixteen one-bit processors
as a single floating-point number, allowing single precision arithmetic
computations. A 64-bit FPA also works with the contents of the processors
contained on two chips and provides double precision arithmetic. For this
reason, the processor chips are grouped in pairs with a floating-point and
memory interface shared by each pair, as shown in figure 4; it is common to
think of each pair of chips as being equivalent to a floating-point processor.
The addition of a floating-point unit to every 32 processors to form an FPA
speeds up the processing of floating-point computations on the CM by more
than a factor of twenty.

The Paris model of computation assigns full 32-bit or 64-bit words to the
memory of each processor. These are passed to the FPA bit by bit when
needed. The Slicewise model assigns different bits of a 32-bit or 64-bit word
to each of the 32 processors on the two chips connected to a single FPA.
When the FPA needs one of these words, each processor can send its bits
concurrently with the other processors. In other words, one 32-bit word
can be sent to the FPA in one load cycle, as one bit from each of the 32
processors is sent to the FPA simultaneously. A 64-bit word requires only
two load cycles since the data paths are 32 bits wide. Hence the amount of
time spent passing data to and from the FPA is much less for the Slicewise
model.
Data may be read and written serially through the FE to the PPU, but
this is not a productive method for large amounts of data because of the
cost of communication between the FE and PPU. An optional input/output
system allows peak rates up to 320 Mbytes per second for transfers of data
between the PPU processors and the I/O system buffers in parallel. One
solution for speeding up peripherals is to attach multiple devices (i.e., disks)
to the CM I/O system in parallel. Each device is connected to a separate
CMIO bus and may transfer data in parallel with the other devices. Thus
every 64 bits of data (the bandwidth of the bus) is transferred to a different
device; this is called file or disk striping. Alternatively, these I/O buffers may
attach to Thinking Machines Data Vaults; each Data Vault may contain 5
to 60 Gbytes of data. A maximum of eight Data Vaults can be on a given
system, each transferring data at a peak rate of 40 Mbytes per second or an
average rate of 20 Mbytes per second.

Another peripheral available through the I/O system is a graphical display
for output generated by the PPU; this uses the Thinking Machines
CM framebuffer, managing up to one Gbyte of data per second. The high-resolution
graphical display is a 19" color monitor. The framebuffer for the
display attaches directly to the CM backplane, like an I/O controller, and
holds raster image data. Alternatively, the image can be displayed in an
X-window on a remote workstation or terminal. The size of the window
determines the number of pixels; each PPU processor generates data for one
pixel. Either 8-bit mode or 24-bit mode can be used.
In addition to the serial communication available between the FE and
the PPU, the PPU processors may communicate with each other in different
ways. The general form of communication is via the router; as shown in
figure 1, a router is present on each chip of sixteen processors, permitting
each processor to communicate with any other processor on the PPU. The
router also contains the logic to handle virtual processors.

Communication between processors on the same chip is called local or
on-chip communication. Clearly this is faster than any method of off-chip
communication.
The interconnection of the processor chips across the PPU is based on
a hypercube network, where each node of the hypercube is a processor chip
containing sixteen processors. The operation of the router on a processor
chip is broken up into twelve subcycles, where twelve is the maximum dimension
of a CM-2 hypercube network. At subcycle k, the router is able
to communicate with other processor chips (or nodes) along the kth dimension
of the hypercube. Thus this interconnection pattern is sometimes called
a cube-connected cycle. Figure 5 shows a 4-dimensional hypercube where
the vertices of the network represent processor chips and the arcs represent
connections to neighboring processor chips in the network.

Figure 5: Part of a hypercube network.
When data can be treated as elements of a mesh or grid spread across
the PPU, a second and faster method of off-chip communication may be
used. This is called NEWS (North-East-West-South) communication and
allows the processors of each processor chip to communicate easily with the
processors on the nearest-neighbor chips on that mesh. However, this type of
communication is limited to nearest-neighbor messages and cannot be used
for broadcasting information to all the processors. An example of a two-dimensional
NEWS grid is shown in figure 6. The CM-1 only allowed a
fixed, two-dimensional grid, but the CM-2 permits NEWS grids with up to
31 dimensions. Each chip on the PPU has an interface to the NEWS grid;
processors with data elements required by or provided by their neighbors
store pointers to those neighbors. When two or more sections of the PPU
are being used independently, each section may have its own NEWS grid
setup. When the sections are all ganged together, there is a single NEWS
grid across the sections for each array or parallel structure.

Figure 6: A two-dimensional NEWS grid.
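In data-parallel Fortran, regular nearest-neighbor motion of the kind NEWS
supports is typically written with the standard Fortran 90 intrinsic CSHIFT
(circular shift), which a compiler can map onto grid communication. The
following is a minimal sketch of ours (array names invented) that averages
each grid point with its four NEWS neighbors on a periodic grid:

    PROGRAM news_average
      REAL, DIMENSION(64,64) :: T, TNEW
      T = 1.0
      ! Average each element with its north, south, west, and east
      ! neighbors; CSHIFT wraps around at the grid boundaries.
      TNEW = (CSHIFT(T,  1, DIM=1) + CSHIFT(T, -1, DIM=1) &
            + CSHIFT(T,  1, DIM=2) + CSHIFT(T, -1, DIM=2)) / 4.0
      PRINT *, TNEW(1,1)
    END PROGRAM news_average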
    Operation                 Single Precision (Mflops)
    (Paris Model)
    ------------------------  --------------------------
    Fl. Pt. Addition                   4,000
    Fl. Pt. Multiplication             4,000
    Fl. Pt. Division                   1,500
    Dot Product                       10,000
    4K x 4K Matrix Mult                3,500

Table 1: CM-2 performance of single precision floating-point operations on a full
64K model, taken from [TMC 88].
1.1.2 Performance
A full CM-2 with 64K processors and double precision FPAs can perform
arithmetic operations at the speeds shown in table 1 under the Paris model
of execution.[4]

[4] These figures were taken from the Connection Machine Model CM-2 Technical Summary [TMC 88], pp. 59-60.
The speed of memory reads and writes between each processor and its
memory is greater than or equal to 5 Mbits/second. Each processor has 64K
to 1024K bits (or 8K to 128K bytes) of memory, and a full machine has 64K
processors, providing a total memory of 512 Mbytes to 8 Gbytes. Hence, given
the minimum memory access speed of 5 Mbits/second for a single processor,
the CM-2 can perform memory reads and writes at about 300 Gbits/second
in aggregate (64K processors x 5 Mbits/second each is roughly 320 Gbits/second,
or about 40 Gbytes/second).
Other performance tests were done at the University of Colorado using
the CM-2 at the National Center for Atmospheric Research (NCAR). This
machine was just one-eighth the size of a full CM-2, with merely 8192 processors,
and contained only single precision FPAs; it also used the Paris model
of execution. Standard CM Fortran routines were used exclusively for these
tests. The results are summarized in table 2.

    Operation                 Single Precision (Mflops)
    (Paris Model)
    ------------------------  --------------------------
    Fl. Pt. Addition                     180
    Fl. Pt. Multiplication               200
    Fl. Pt. Division                      70
    Dot Product                           56
    Cosine                                65
    Exponential                           60
    Square Root                           83

Table 2: Performance of NCAR CM-2 single precision floating-point operations.
1.2 The MasPar MP-2
Manufactured by the MasPar Computer Corporation, the MasPar MP-1 and
MP-2 are two other examples of DM-SIMD multiprocessors. Introduced in
1990, the MasPar MP-1 is the original MasPar machine. The newer MasPar
MP-2 was brought out in 1992. While the MP-2 is larger and faster than the
MP-1, the architectures of the two machines are similar.
1.2.1 Characteristics
Like the Thinking Machines CM, the MasPar MP-2 is broken up into two
main parts: a front end or host machine, and a processor array containing
the SIMD processors.

The front end of this machine may be a VAX or a DECstation 5000.
Both types of host machines run ULTRIX. The host machine permits
communication of the SIMD processor array with the user. As with the
CM-2, scalar processing is done on the front end.

The Data Parallel Unit (DPU) performs all the parallel computation and
corresponds to the CM PPU; it is comprised of two parts. The first is the
array of Processor Elements (PEs), incorporating 2^10 (1024) to 2^14 (16384)
processors. Unlike the one-bit processors of the CM, a MasPar MP-2 processor
(or PE) is a CPU capable of handling 32-bit values. Each of these
processors contains 64 32-bit registers as well as an ALU and its own local
memory. Thirty-two of the processors fit on a single chip in the PE array. The
older MasPar MP-1 contains only 4-bit processors but also fits 32 processors
onto one chip.

The other part of the DPU is the Array Control Unit (ACU).
This is a processor in itself, with its own memory and instructions. The function of
the ACU is to manage the operations of the PEs; it plays a similar role to
that of the sequencers in the CM. An instruction is sent out by the ACU to
all the processors in the PE array at the same time, and all the processors
that are active simultaneously execute that instruction with their own data.
The ACU is also the contact with the front end host machine.

Figure 7: In a MasPar MP-2, each processing element (PE) is connected by the
X-net to neighboring PEs along the north-south, east-west, northwest-southeast,
and southwest-northeast axes.
Communication between the PEs is handled by the DPU in two ways.
Efficient nearest-neighbor communication is done by the X-net; this is similar
to the NEWS method of communication on the CM but goes in eight
directions, as shown in figure 7. X-net communication should only be used
for data communications in a regular pattern. The MasPar MP-2 processors
are also connected by the Global Router. Like the CM router communication,
communication via the Global Router is slower but permits message passing
between any two processors in the DPU.

As with the CM architecture, there exist methods that allow data on
disks to be read or written directly to or from the DPU. A framebuffer
connection to the DPU is also present and permits animation on graphical
displays generated directly from the DPU.

Two main programming languages are available for the MasPar MP-2:
MPF (MasPar Fortran) and MPL (MasPar Programming Language). The
    MasPar     Procs    R_max    N_max    N_1/2    R_peak
    MP-2216    16384     1.6     11264     1920     2.4
    MP-1216    16384      .473   11264     1280      .55
    MP-1       16384      .44     5504     1180      .58
    MP-2204     4096      .374    5632      896      .60
    MP-1204     4096      .116    5632      640      .138
    MP-2201     1024      .092    2816      448      .15
    MP-1201     1024      .029    2816      320      .034
    TMC CM-2    2048    10.4     33920    14000    28.

Table 3: Performance of the MasPar MP-1 and MP-2, from [Dongarra 94]. R_max
and R_peak are in Gflops. The MasPar figures are for 64-bit numbers, while the
Thinking Machines CM-2 values are for 32-bit numbers.
first of these, MPF, is Fortran 90 with some extensions for the MasPar architecture;
it includes special functions for passing data between the front
end (host) machine and the DPU. The second language, MPL, is a version
of C with parallel extensions; it is a low-level language and requires the programmer
to have a good understanding of the machine architecture. The
MasPar VAST-2 preprocessor allows the conversion of Fortran 77 programs
to MPF programs for this machine. In addition, a programming environment
called MPPE (MasPar Programming Environment) is installed on the
host machine, providing extra interactive support for the user to write, test,
debug, and monitor parallel programs. Software libraries for mathematical
computation, data display, and image processing are also available.

For more information on the MasPar SIMD computers, refer to [MasPar 92].
1.2.2 Performance
Table 3 compares the performance of various models of the
MasPar MP-1 and MasPar MP-2 with differing numbers of processors. These
figures are for double precision floating-point operations and are taken from
[Dongarra 94]. Also included for comparison are Thinking Machines CM-2
figures for single precision floating-point operations. In this table, R_max
represents the best performance achieved for a problem of size N_max;
N_1/2 provides the size of the problem that executes at half the R_max
performance; and R_peak is the theoretical peak performance of a machine
with the given number of processors (Procs).
2 Programming issues
Parallel programming is usually a greater challenge than programming for a
sequential machine. In this section, we consider some of the problems and
solutions associated with programming a SIMD multiprocessor. We also
discuss some of the Fortran 90 features that are applicable to this type of
architecture.

This discussion focuses on the CM-2 as a sample architecture, using CM
Fortran. Most of the concepts can easily be applied to the MasPar MP-2 and
other DM-SIMD architectures. In particular, CM Fortran was a forerunner of
Fortran 90 and contains most of the data-parallel constructs present in that
language. Some additional constructs exist as well; those in that category
are so indicated below. Some MPF constructs are also discussed. With a few
exceptions such as reduction operations, data-parallel operations behave in
an elementwise fashion: each SIMD processor acts only on the array elements
contained in its own memory.
2.1 Architectural organization considerations
When programming for the CM-2, the MP-2, or other SIMD architectures,
it is best to think of the machine in the two parts shown in figure 2:

1. the front end, to handle all the scalar operations; and

2. the array of processors, to execute all parallel operations.

For usual scientific computing applications, the front end (FE) is simply
a sequential UNIX computer or workstation.[5] This is the machine to which
the user logs in, the machine that compiles programs for the CM, and the
machine that controls the execution of programs on the CM. All program
variables used only in a sequential or scalar fashion are stored on the front
end. The parallel processing unit (PPU) with its SIMD architecture handles
all parallel or array operations, each processor taking care of one piece in
unison with the other processors.

[5] Some installations use Symbolics Lisp machines as the FE to the CM.

Since the editing and compiling of programs are done on the FE, programming
on the CM is much like programming on any UNIX machine.
In fact, if a program does not use the CM parallel constructs, it executes entirely
on the FE, completely ignoring the PPU.

On the MasPar MP-2, the FE usually starts the program and sets up
the initial data arrays; it also completes the program and collects the final
results. Since communication between the FE and the DPU is expensive, it
is best to confine most of the execution of a program to the DPU.
2.1.1 Homes
All program variables in CM Fortran programs are assigned a home. This is
simply where the variable is stored. Since scalar variables can only be on the
FE, that is their home. However, arrays may be stored on the FE or on the
PPU, depending on whether or not they are used in parallel operations. If
they are used in both serial and parallel operations, they are stored on the
PPU and copied to and from the FE for the serial operations. The exception
to the rules above is for arrays of type CHARACTER; these are always stored
on the FE.

To see where homes have been assigned for your variables, check the
last part of your program listing. The two sections VARIABLES and ARRAYS
provide the name, type, and size of the scalar variables and the arrays used in
your program. The ARRAYS section also lists the Home for each array. Under
this heading, the term CM refers to the PPU and FE to the FE. It is wise to
double-check this part of the program listing to make sure the arrays have
been assigned as expected. A portion of a sample listing showing the ARRAYS
section is given in figure 8.

    ARRAYS
    Offset    Size    Type      Block/Class    Home    Name
    0         2048    REAL*4    local          CM      A
    2048      2048    REAL*4    local          CM      B
    4096      2048    REAL*4    local          CM      C

Figure 8: ARRAYS section of a CM Fortran listing.
Homes for variables in every program unit are assigned individually. In
other words, the array Z in SUBROUTINE MYSUB may not be assigned the same
home as Z in FUNCTION MYFTN. Each program module is treated as a unit.
Homes of actual and dummy arguments must match. If you pass an array
to a function or subroutine, the dummy array argument must have the same
home as the incoming array parameter. Otherwise, unpredictable results are
possible. There are a number of ways to force arrays to be assigned homes
on the PPU, as illustrated in the sketch after this list:

- Declare the array in COMMON, as all COMMON arrays are placed on the
  PPU;

- Put parallel operations in every program module for the appropriate
  arrays, even if they do nothing useful;

- Use the LAYOUT compiler directive:

      CMF$ LAYOUT Z(:NEWS)

  With :NEWS as an argument, Z MUST be in the PPU. Other possible
  arguments are :SERIAL and :SEND. More on the LAYOUT compiler
  directive can be found in section 2.4.1.
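For example, the following minimal CM Fortran sketch (the routine and array
names are our own; see the CM Fortran manuals for the precise directive
rules) forces a dummy array argument onto the PPU so that the whole-array
assignment runs there in parallel:

          SUBROUTINE DOUBLE(Z, N)
          INTEGER N
          REAL Z(N)
    CMF$  LAYOUT Z(:NEWS)
    C     Z now has its home on the PPU; the whole-array
    C     assignment below therefore executes there in parallel.
          Z = 2.0 * Z
          RETURN
          END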
2.2 CM Fortran, MPF, and Fortran 90
The version of Fortran used on the CM is called CM Fortran. It is based on
Fortran 77 and extended by parallel constructs; most of these constructs are
contained in a subset of Fortran 90. On the MasPar MP-2, MPF (MasPar
Fortran) is a version of Fortran 90 with extensions; many of the constructs
are the same as those in CM Fortran.

On a CM, it is important to know that the control flow of a CM Fortran
program is handled by the FE, as are all scalar statements like those found
in Fortran 77. All data-parallel statements, including Fortran 90 statements,
are executed on the PPU.

The following subsections introduce a few elements of CM Fortran, MPF,
and Fortran 90 to get you started. For further reference, see the CM manuals
[TMC 91a], [TMC 91b], and [TMC 91c], the MasPar manuals [MasPar 93b]
and [MasPar 93a], and some Fortran 90 references such as [Brainerd et al 90]
and [Adams et al 92].
2.2.1 Arrays
An array on a SIMD multiprocessor may be considered a data-parallel object;
this is true for CM Fortran arrays as well. In fact, the only CM Fortran
variables stored on the PPU are those arrays used in parallel operations; all
scalars and all arrays not involved with parallel operations are stored on the
FE of the CM.

The properties of an array are rank and shape. The rank[6] of an array is
the number of its dimensions; e.g., the array declared as S(5,10) has rank
2. The shape of an array is its dimensions; so the shape of S(5,10) is 5 x 10.
Two arrays with the same shape are said to be conformable. Most parallel
operations require that the arrays involved be conformable.

[6] In this context, rank has a different meaning than its customary mathematical definition.

Once an array has been declared, the use of the name of the array by
itself (not subscripted) denotes the entire array with all its elements. Such
usage implies a parallel operation is to be applied to the array. For instance,
the statement

    S = 0.0

sets all the elements of S to zero in parallel on the PPU.
Subsections of an array can be specified by triples. The general form of
a triple is[7]

    firstvalue : lastvalue : increment

For example, if S is declared as above, then S(1:5:2,1:10) refers to the odd
rows of S. The triple 1:5:2 specifies that rows 1, 1 + 2 = 3, and 3 + 2 = 5 are
to be used; the triple 1:10 has an implied increment of 1 and so specifies all
ten columns. This second triple 1:10 could have been replaced by a single
colon, as in S(1:5:2,:), to imply that all the columns be used for the chosen
rows.

[7] This differs from the triple form used by MATLAB: firstvalue : increment : lastvalue.
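As a quick, self-contained illustration of triples (our own Fortran 90 example),
the following program zeroes exactly the odd rows discussed above:

    PROGRAM triples
      REAL :: S(5,10)
      S = 1.0              ! whole-array assignment: all 50 elements
      S(1:5:2,:) = 0.0     ! rows 1, 3, and 5 only, via the triple 1:5:2
      PRINT *, SUM(S)      ! prints 20.0: two untouched rows of ten ones
    END PROGRAM triples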
As in Fortran 77 and Fortran 90, CM Fortran arrays may be declared by
DIMENSION, COMMON, or type statements. In CM Fortran, they may also be
declared using array attribute statements. For example, assuming the array
S is a real array, it could have been defined by the following array attribute
statement:
    REAL, ARRAY(5,10) :: S, T

In Fortran 90, the following statement would have the same effect:

    REAL, DIMENSION(5,10) :: S, T

This simply says that both S and T are real arrays with 5 rows and 10
columns. Notice the comma after the type indicator REAL; this is how the
array attribute statement is recognized by the compiler. The double colon ::
is also a requirement; it must be placed between the array definition and the
array names. You should recall that blank spaces in Fortran are traditionally
ignored; hence any number of spaces can be added to this statement (even
between the colons) or deleted from the statement.
Array constructors can be used to initialize the elements of an array in
parallel. For instance, if Z has been declared in CM Fortran by the statement

    REAL, ARRAY(N) :: Z

then the statement

    Z = REAL([1:N])

assigns 1.0 to Z(1), 2.0 to Z(2), and REAL(N) to Z(N). The corresponding
Fortran 90 statements follow:

    REAL, DIMENSION(N) :: Z

and

    Z = (/ (REAL(I), I=1,N) /)

Array constructors can also be included in array attribute statements. In
CM Fortran, this is done by adding a DATA parameter to the statement; thus
the following CM Fortran statement has the effect of defining and initializing
Z at once:

    REAL, ARRAY(N), DATA :: Z = [1:N]

The same effect can be achieved in Fortran 90 by the following statement:

    REAL, DIMENSION(N) :: Z = (/ (REAL(I), I=1,N) /)

This is an efficient way of assigning initial values to an array, as it is done
at load time. A limitation on the array constructor in CM Fortran is that it
can only be used for one-dimensional arrays.
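A short, self-contained Fortran 90 example of ours combining the declaration
and constructor styles above (here N is fixed so the program compiles on its
own):

    PROGRAM construct
      INTEGER, PARAMETER :: N = 6
      INTEGER :: I
      REAL :: Z(N)
      Z = (/ (REAL(I), I = 1, N) /)   ! Z becomes 1.0, 2.0, ..., 6.0
      PRINT *, Z
    END PROGRAM construct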
2.2.2 Array sections
Most of the parallel array facilities of Fortran 90 are part of CM Fortran.
As mentioned in section 2.2.1, the ability to work with all the elements of
an array or subsections of an array in parallel is provided; however, in CM
Fortran, all of the arrays involved in this type of parallel expression must be
parallel arrays with homes on the PPU.

Using the name of an array implies using the entire array in parallel. For
instance, the statement

    Y = Z**2

causes Y(1) to be set to the value of Z(1)**2, Y(2) to Z(2)**2, etc. This
also works for constant assignments; the statement

    Y = -1.0

means that all the elements of Y are set to -1.0.

Subsections of arrays can be used in assignment statements as well. In
the following statement

    Y(1:10) = Z(11:20)
the first ten elements of Y are set to the second ten elements of Z. Such
statements execute in parallel.

An example of a subsection of a two-dimensional array is shown in figure 9.
Here the array M has been declared as

    REAL M(12,12)

and the 3 x 6 subsection is defined by

    M(4:6,5:10)

Figure 9: The subsection M(4:6,5:10) of array M(12,12).
2.2.3 Alternate DO loops
Additional control constructs exist in CM Fortran; these are similar to those
in Fortran 90. The first of these are alternate forms of DO loops, as demonstrated
below:

    N = 4096
    DO WHILE (N .GT. 0)
       Z(1:N) = ...
       N = N/2
    ENDDO

    KK = 1
    DO N TIMES
       KK = KK*K
    ENDDO

The first of these loops assigns values to the first N elements of the array Z
for N equal to decreasing powers of two. The second loop terminates when KK
is equal to K**N, where N is a non-negative integer. This form of the DO loop is
useful when the loop index is not needed within the body of the loop. Note
that the DO WHILE construct is a legal Fortran 90 construct; the second form
of the DO loop, DO N TIMES, is not.
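Since DO N TIMES is a CM Fortran extension, portable programs must spell
the repetition out. A standard Fortran 90 equivalent of the second loop (our
sketch; the index I is simply unused in the body):

    KK = 1
    DO I = 1, N
       KK = KK*K      ! after the loop, KK equals K**N
    ENDDO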
2.2.4 WHERE statements
The WHERE statements provide a means for working with a subset of a full
array, still as a parallel operation:

    WHERE (Z .GT. 0.0) Y = SQRT(Z)
Here all the CM processors actually compute Y = SQRT(Z). However, only
those processors with a value of Z greater than zero store the result. The
intrinsic function SQRT is used on the entire array. If we wished to set the
other (negative and zero) elements to zero at the same time, this operation
could be programmed as follows:

    WHERE (Z .GT. 0.0)
       Y = SQRT(Z)
    ELSEWHERE
       Y = 0.0
    ENDWHERE

In this set of statements, the elements of Y are set to zero when the corresponding
element of Z is not greater than zero. Note that this construct acts
in two steps. First, all the processors compute Y = SQRT(Z), but only the
values for elements of Y corresponding to positive elements of Z are stored.
Then all the processors compute Y = 0.0, but only the values corresponding to
zero or negative elements of Z are stored. In other words, the construct appears
similar to the IF..THEN..ELSE..ENDIF statement but behaves a little
differently.
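The construct is easy to try in standard Fortran 90, where WHERE has the
same form. A small self-contained example of ours, using the masked square
root above:

    PROGRAM wheredemo
      REAL, DIMENSION(6) :: Z = (/ -1.0, 4.0, 9.0, -16.0, 25.0, 0.0 /)
      REAL, DIMENSION(6) :: Y
      WHERE (Z .GT. 0.0)
         Y = SQRT(Z)       ! stored only where Z is positive
      ELSEWHERE
         Y = 0.0           ! stored where Z is zero or negative
      ENDWHERE
      PRINT *, Y           ! prints 0.0 2.0 3.0 0.0 5.0 0.0
    END PROGRAM wheredemo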
2.2.5 FORALL statements
The FORALL construct is not a part of Fortran 90; however, it is included in
both CM Fortran and MPF. Such statements are very convenient for SIMD
computers.

FORALL statements, as in

    FORALL (I=1:N) Y(I) = I

can only contain one assignment. This statement is equivalent to the following
DO loop:

    DO I = 1,N
       Y(I) = I
    ENDDO

Notice that there is a colon between the start and stop values of the FORALL
index; this is in a triple format like the subscripts discussed earlier. An
increment may be used as well after another colon, as the third element of
the triple.
More than one index can be used within the FORALL statement; for instance,
the following statement

    FORALL (I=1:N, J=1:N) S(I,J) = I

sets all the elements of the Ith row of S to I.

Often individual elements of an array need to be initialized to specific
values. If the home of the array is on the PPU, it is best to use a FORALL
statement for this purpose. For instance, the statement

    S(6,1) = S(6,2) + S(6,3)

is executed on the FE since it is essentially a scalar operation. However, if
we rewrite this statement as a FORALL statement,

    FORALL (I=6:6, J=1:1) S(I,J) = S(I,J+1) + S(I,J+2)

it is executed on the PPU. In effect, the sum of S(I,J+1) and S(I,J+2)
for each I and J is computed for all the elements in the array, but only the
processor containing the element in the first column of the sixth row of S
stores this value into its array element.

A FORALL statement with dependencies that cannot be resolved is executed
serially. Check for restrictions on the FORALL statement in the appropriate
manual.
2.3 Built-in functions for CM Fortran and Fortran 90
The CM Fortran intrinsic functions are for the most part the same as the
intrinsic functions described for Fortran 90. Additional functions for handling
data-parallel data types are also present in both CM Fortran and Fortran 90.
Some of these are described below.

To aid in the explanation of these built-in functions, assume the following
arrays have the values given below:

    A = ( 1  2 )
        ( 3  4 )
        ( 5  6 )

    B = ( 2  4  5 )
        ( 3  8  5 )

    C = ( 1, -2, 3, -4, 5, -6 )
2.3.1 Intrinsic functions
The usual Fortran intrinsic functions are available in CM Fortran and Fortran
90. Moreover, most of them can be used in a parallel fashion. For instance, if
A has been declared as above, then

    MOD(A,5)

returns a matrix of the same type and shape as A containing the values of
the elements of A mod 5:

    MOD(A,5) = ( 1  2 )
               ( 3  4 )
               ( 0  1 )

Similarly, the SQRT function can handle a whole array at once.

    SQRT(A) = ( 1.000  1.414 )
              ( 1.732  2.000 )
              ( 2.236  2.449 )
2.3.2 Masks
Masks are logical arrays created by performing a relational operation on all
the elements of a given array. Both the mask and the original array must
conform. Consider the following examples.

    A .GT. 0 = ( T  T )
               ( T  T )
               ( T  T )

    B .EQ. 5 = ( F  F  T )
               ( F  F  T )

    C .LT. 0 = ( F, T, F, T, F, T )
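Masks can be stored in LOGICAL arrays and reused, for example with the
WHERE construct or with the reduction functions of the next subsection. A
small Fortran 90 sketch of ours, using the array C above:

    PROGRAM maskdemo
      REAL, DIMENSION(6) :: C = (/ 1.0, -2.0, 3.0, -4.0, 5.0, -6.0 /)
      LOGICAL, DIMENSION(6) :: NEG
      NEG = C .LT. 0.0          ! the mask (F, T, F, T, F, T)
      PRINT *, COUNT(NEG)       ! 3 negative elements
      WHERE (NEG) C = 0.0       ! zero out the negative elements
      PRINT *, C                ! 1.0 0.0 3.0 0.0 5.0 0.0
    END PROGRAM maskdemo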
2.3.3 Special functions
In addition to the normal Fortran intrinsic functions, CM Fortran provides
several special functions to aid in the parallel operation of the machine. For
the examples of the functions described below, assume the following: ARRAY
is the name of any array of type real, integer, or logical; DIM is an integer
denoting which particular dimension of the array (if any) the function is to be
applied to; MASK is a logical array with the same shape as ARRAY telling which
particular elements the function is to use; V, V1, and V2 are one-dimensional
arrays (or vectors); M1 and M2 are two-dimensional arrays (or matrices); and
SHIFT is an integer or integer array describing the shift to be made. For some
of the following functions, DIM and MASK may be used as keyword parameters.

Reduction Operations: The following functions perform commonly used
reduction operations. Except where noted, they are common to both CM
Fortran and Fortran 90.
SUM(ARRAY [, DIM] [, MASK]): This function computes the sum of
all the elements of ARRAY, according to the values of DIM and MASK.
Note: in the last example, MASK is used as a keyword parameter, since
the second parameter DIM is missing.

    SUM(A) = 21
    SUM(B, 1) = (5, 12, 10)
    SUM(B, 2) = (11, 16)
    SUM(C, MASK=C.GT.0) = 9
PRODUCT(ARRAY [, DIM] [, MASK]): This function computes the
product of all the elements of ARRAY, according to the values of DIM
and MASK. Note: in the last example, MASK is used as a keyword parameter,
since the second parameter DIM is missing.

    PRODUCT(A) = 720
    PRODUCT(B, 1) = (6, 32, 25)
    PRODUCT(B, 2) = (40, 120)
    PRODUCT(C, MASK=C.GT.0) = 15
DOTPRODUCT(V1, V2): This function computes the dot product of the
two vectors (one-dimensional arrays) V1 and V2.

    DOTPRODUCT(A(1,:), B(:,2)) = 20
    DOTPRODUCT(A(:,1), B(2,:)) = 52
    DOTPRODUCT(C, C) = 91
MAXVAL(ARRAY [, DIM] [, MASK]): This function finds the maximum
value of all the elements of ARRAY, according to the values of
DIM and MASK.

    MAXVAL(A) = 6
    MAXVAL(B(:,1)) = 3
    MAXVAL(C) = 5
    MAXVAL(C, 1, C.LT.0) = -2
MINVAL(ARRAY [, DIM] [, MASK]): This function finds the minimum
value of all the elements of ARRAY, according to the values of
DIM and MASK.

    MINVAL(A) = 1
    MINVAL(B(:,1)) = 2
    MINVAL(C) = -6
    MINVAL(C, 1, C.GT.0) = 1
MAXLOC(ARRAY [, MASK]): This function returns an integer value or
integer array representing the subscripts of the maximum value of all
the elements of ARRAY, according to the values of MASK. If more than one
such location exists, which subscript is returned is non-deterministic.

    MAXLOC(A) = (3, 2)
    MAXLOC(B(:,3)) = (1)   (could also be 2)
    MAXLOC(C) = (5)
    MAXLOC(C, C.LT.0) = (2)
MINLOC(ARRAY [, MASK]): This function returns an integer value or
integer array representing the subscripts of the minimum value of all
the elements of ARRAY, according to the values of MASK. If more than one
such location exists, which subscript is returned is non-deterministic.

    MINLOC(A) = (1, 1)
    MINLOC(B(:,3)) = (1)   (could also be 2)
    MINLOC(C) = (6)
    MINLOC(C, C.GT.0) = (1)
COUNT(MASK [, DIM]): This function returns the number of elements
for which the MASK holds true.

    COUNT(A.GT.0) = 6
    COUNT(A.GT.0, 1) = (3, 3)
    COUNT(B.EQ.5) = 2
    COUNT(C.LE.0) = 3
ANY(MASK [, DIM]): This function returns True if the MASK holds true
for any of the elements.

    ANY(A.GT.0) = T
    ANY(A.GT.0, 1) = (T, T)
    ANY(B.EQ.5) = T
    ANY(C.LE.0) = T
ALL(MASK [, DIM]): This function returns True if the MASK holds true
for all of the elements.

    ALL(A.GT.0) = T
    ALL(A.GT.0, 1) = (T, T)
    ALL(B.EQ.5) = F
    ALL(C.LE.0) = F
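The reductions above that survive into standard Fortran 90 can be checked
directly. A short self-contained program of ours (note that standard Fortran
90 spells DOTPRODUCT as DOT_PRODUCT):

    PROGRAM reductions
      REAL, DIMENSION(3,2) :: A = RESHAPE((/ 1., 3., 5., 2., 4., 6. /), (/ 3, 2 /))
      REAL, DIMENSION(6)   :: C = (/ 1., -2., 3., -4., 5., -6. /)
      PRINT *, SUM(A)                   ! 21.0
      PRINT *, SUM(C, MASK=C.GT.0.)     ! 9.0
      PRINT *, MAXVAL(A), MINVAL(C)     ! 6.0  -6.0
      PRINT *, MAXLOC(C)                ! 5
      PRINT *, DOT_PRODUCT(C, C)        ! 91.0
    END PROGRAM reductions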
Functions for matrices: The following built-in functions in CM Fortran
and Fortran 90 are used for manipulating matrices:
TRANSPOSE(M1): This function returns the transpose of the matrix
(two-dimensional array) M1.

    TRANSPOSE(A) = ( 1  3  5 )
                   ( 2  4  6 )
MATMUL(M1, M2): This function returns the result of the matrix multiplication
of M1 by M2. Note that the expression M1*M2 does not perform
matrix multiplication. Instead, it produces element-by-element
multiplication; that is, for all i,j: (M1*M2)(i,j) = M1(i,j) * M2(i,j).
    MATMUL(A, B) = (  8  20  15 )
                   ( 18  44  35 )
                   ( 28  68  55 )

    MATMUL(B, A) = ( 39  50 )
                   ( 52  68 )
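The distinction between MATMUL and * is easy to demonstrate in standard
Fortran 90. A small example of ours (the element-by-element product requires
conformable shapes, so we square A elementwise):

    PROGRAM matdemo
      REAL, DIMENSION(3,2) :: A = RESHAPE((/ 1., 3., 5., 2., 4., 6. /), (/ 3, 2 /))
      REAL, DIMENSION(2,3) :: B = RESHAPE((/ 2., 3., 4., 8., 5., 5. /), (/ 2, 3 /))
      PRINT *, MATMUL(A, B)    ! true 3x3 matrix product
      PRINT *, A * A           ! elementwise: each A(i,j) squared
    END PROGRAM matdemo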
DIAGONAL(ARRAY [, FILL]): This function is only in CM Fortran.
It creates a diagonal matrix from the vector ARRAY. The elements of
the vector are placed on the diagonal, and the value of FILL (if any) is
placed in the other elements of the matrix. If there is no FILL value,
the value of 0 (or .FALSE., if logical) is used.

    DIAGONAL(C) = (  1   0   0   0   0   0 )
                  (  0  -2   0   0   0   0 )
                  (  0   0   3   0   0   0 )
                  (  0   0   0  -4   0   0 )
                  (  0   0   0   0   5   0 )
                  (  0   0   0   0   0  -6 )

    DIAGONAL(C, 99) = (  1  99  99  99  99  99 )
                      ( 99  -2  99  99  99  99 )
                      ( 99  99   3  99  99  99 )
                      ( 99  99  99  -4  99  99 )
                      ( 99  99  99  99   5  99 )
                      ( 99  99  99  99  99  -6 )
Other useful functions: In addition to the reduction operations listed
above, both CM Fortran and Fortran 90 contain other functions directed
toward handling data-parallel data objects. Some of these are given here:

RANK(ARRAY): This CM Fortran function returns the rank of the given
scalar or ARRAY. A similar Fortran 90 function is named SIZE.
    RANK(100) = 0
    RANK(A) = 2
    RANK(B) = 2
    RANK(C) = 1

DSHAPE(ARRAY): This CM Fortran function returns the shape of the
given scalar or ARRAY. In Fortran 90, this function is named SHAPE.

    DSHAPE(-1) = ( )
    DSHAPE(C) = (6)
    DSHAPE(A) = (3, 2)
    DSHAPE(B) = (2, 3)
REPLICATE(ARRAY, DIM, NCOPIES): This CM Fortran function adds
NCOPIES of the ARRAY along the given DIMension. The resultant array
has the same rank as the original ARRAY, but the shape is greater in
the given DIMension.

    REPLICATE(A, 1, 2) = ( 1  2 )
                         ( 3  4 )
                         ( 5  6 )
                         ( 1  2 )
                         ( 3  4 )
                         ( 5  6 )

    REPLICATE(A, 2, 3) = ( 1  2  1  2  1  2 )
                         ( 3  4  3  4  3  4 )
                         ( 5  6  5  6  5  6 )

    REPLICATE(A, 1, 0) = ( )

    REPLICATE(C, 1, 2) = ( 1, -2, 3, -4, 5, -6, 1, -2, 3, -4, 5, -6 )
SPREAD(ARRAY, DIM, NCOPIES): This function is in both CM Fortran
and Fortran 90. It produces NCOPIES of the ARRAY along DIM. The
resultant array has rank one greater than that of the original ARRAY.
This can also be used to make a vector from a scalar.

    SPREAD(-1, 1, 6) = ( -1, -1, -1, -1, -1, -1 )

    SPREAD(-1, 1, 0) = ( )

    SPREAD(A, 1, 2) is a 2 x 3 x 2 array holding two copies of A:

        ( 1  2 )        ( 1  2 )
        ( 3  4 )  and   ( 3  4 )
        ( 5  6 )        ( 5  6 )

    SPREAD(A, 2, 3) is a 3 x 3 x 2 array holding three copies of A:

        ( 1  2 )   ( 1  2 )   ( 1  2 )
        ( 3  4 )   ( 3  4 )   ( 3  4 )
        ( 5  6 )   ( 5  6 )   ( 5  6 )

    SPREAD(C, 1, 2) = ( 1  -2  3  -4  5  -6 )
                      ( 1  -2  3  -4  5  -6 )
    SPREAD(C, 2, 3) = (  1   1   1 )
                      ( -2  -2  -2 )
                      (  3   3   3 )
                      ( -4  -4  -4 )
                      (  5   5   5 )
                      ( -6  -6  -6 )