

Introduction to Split-C

Version

David E. Culler

Andrea Dusseau

Seth Copen Goldstein

Arvind Krishnamurthy

Steven Lumetta

Steve Luna

Thorsten von Eicken

Katherine Yelick

Computer Science Division, EECS

University of California, Berkeley

Berkeley, CA

splitc@boing.CS.Berkeley.EDU

April

Split-C is a parallel extension of the C programming language primarily intended for distributed-memory multiprocessors. It is designed around two objectives. The first is to capture certain useful elements of shared memory, message passing, and data parallel programming in a familiar context, while eliminating the primary deficiencies of each paradigm. The second is to provide efficient access to the underlying machine, with no surprises. This is similar to the original motivation for C: to provide a direct and obvious mapping from high-level programming constructs to low-level machine instructions. Split-C does not try to obscure the inherent performance characteristics of the machine through sophisticated transformations. This combination of generality and transparency of the language gives the algorithm or library designer a concrete optimization target.

This document describes the central concepts in Split-C and provides a general introduction to programming in the language. Both the language and the document are undergoing active development, so please view the document as working notes rather than the final language definition.

This work was supported in part by the National Science Foundation as a Presidential Faculty Fellowship (number CCR), a Research Initiation Award (number CCR) and an Infrastructure Grant (number CDA); by Lawrence Livermore National Laboratory; by the Advanced Research Projects Agency of the Department of Defense, monitored by the Office of Naval Research under contract DABT-C; and by the Semiconductor Research Consortium. The information presented here does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

Contents

Introduction

Split-C Primitives Overview

Control Paradigm

Global Pointers
    Declaring global pointers
    Constructing global pointers
    Destructuring a global pointer
    Using global pointers
    Arithmetic on global pointers
    Spread Pointers
    Using spread pointers

Spread Arrays
    Declaring spread arrays
    Dynamic allocation of spread objects
    Address arithmetic
    Configuration independent use of spread arrays
    Configuration dependent use of spread arrays

Bulk assignment

Split-phase Assignment
    Get and put
    Store
        Global data movement
        Data driven execution
        Message passing

Synchronization
    Executing Code Atomically

Optimizing Split-C Programs

Library extensions
    Special variables
    Barriers
    Global pointers
        Read
    Get/Put
    Store
    Storage management
    Global communication
    I/O
    Timing
        String copy
        String concatenation
        Miscellaneous
    Atomic operations
    Split-cc intrinsics

Appendix: Open Issues and Inadequacies
    Restrictions on global operations


Introduction

Split-C is a parallel extension to the C programming language designed for large distributed-memory multiprocessors. Following the C tradition, Split-C is a general-purpose language, but not a very high level language, nor a big one. It strives to provide the programmer enough machinery to construct powerful parallel data structures and operate on these in a machine independent fashion with reasonable clarity. At the same time, it does not attempt to hide the fundamental performance characteristics of the machine through elaborate language constructs or visionary compilation. Whereas C deals with the operations on objects that sequential computers support, the extensions in Split-C deal with the additional operations that most collections of computers support. In either case, we expect the compiler to be reasonably good at address calculations, instruction scheduling, and local storage management, with the usual optimizations that pertain to these issues.

Large-scale multiprocessors introduce two fundamental concerns: there is an active thread of control on each processor, and there is a new level of the storage hierarchy that involves access to remote memory modules via an interconnection network. The Split-C extensions address these two concerns under the assumption that the programmer must think about these issues in designing effective data structures and algorithms, and desires a reasonable means of expressing the results of the design effort. The presence of parallelism and remote access should not unduly obscure the resulting program. The underlying machine model is a collection of processors operating in a common global address space, which is expected to be implemented as a physically distributed collection of memories. The global address space is two dimensional from the viewpoint of address arithmetic on global data structures, and from a performance viewpoint in that each processor has efficient access to a portion of the address space. We may call this the local portion of the global space. Split-C provides access to global objects in a manner that reflects the access characteristics of the interprocessor level of the storage hierarchy.

Split-C attempts to combine the most valuable aspects of shared memory programming with the most valuable aspects of message passing and data parallel programming within a coherent framework. The ability to dereference global pointers provides access to data without prearranged coordination between the processors on which the data happens to reside. This allows sophisticated linked data structures to be constructed and used. Split-phase access (e.g., prefetch) allows global pointers to be dereferenced without causing the processor to stall during the access. The global address space and the syntactic support for distributed data structures provide a means of documenting the global data structures in the program. This global structure is usually lost with traditional message passing, because it is implicit in the communication patterns. Algorithms that are natural to state in terms of message passing are efficient within a global address framework with bulk transfer; they are as easy to express, and the fundamental storage requirements of the algorithm are made explicit. Traditional shared memory loses the inherent event associated with a transfer of information, so even simple global operations, such as summation, are hard to express efficiently. Split-C allows notification to be associated with access to the global addresses, using an approach similar to split-phase access. Data parallel programming involves phases of local computation and phases of global communication. The global communication phases are often very general, say scattering data from each processor to every other, so the global address space is very useful, but there is no need to maintain consistency on a per-operation basis. Split-C is built upon an active message substrate [AM], so the functionality of the language can easily be extended by libraries that use the lowest level communication primitives directly, while providing meaningful abstractions within a global address framework.

This paper is intended to introduce the pilot version of Split-C. The next section provides an overview of the basic concepts in the language. The sections that follow explain these concepts in more detail, describe the syntax, and provide simple examples. Later sections discuss optimization strategies and list the library functions available to the Split-C programmer, as well as the primitives used by the Split-C compiler.


Split-C Primitives Overview

The extensions introduced in Split-C attempt to expose the salient features of modern multiprocessor machines in a generic fashion. The most obvious facet is simply the presence of multiple processors, each following an independent thread of control. More interesting is the presence of a very large address space that is accessed by these threads. In all recent large-scale multiprocessors this is realized by storage resources that are local to the individual processors, and this trend is expected to continue. Split-C provides a range of access methods to the global address space, but encourages a mostly local programming style. It is anticipated that different architectures will provide varying degrees of support for direct access to remote memory. Finally, it is expected that global objects will often be shared, and this requires an added degree of control in how they are accessed.

Split-C provides the following extensions to C:

• Multiple persistent threads. A Split-C program is parallel ab initio: from program begin to program end there are PROCS threads of control within the same program image. (This is termed the split model in Brooks.) Each thread has a unique number, given by a special variable MYPROC, that ranges from 0 to PROCS-1. Generally, we will use the term processor to mean the thread of control, or process, on that processor. A variety of convenient parallel control structures can be built on this substrate, and several are provided as C preprocessor (cpp) macros, but the basic language definition does not prescribe dynamic thread manipulation or task scheduling. A small family of global synchronization operations is provided to coordinate the entire collection of threads, e.g., barrier(). No specific programming paradigm, such as data parallel, data driven, or message passing, is imposed by the language. However, these programming paradigms can be supported as a matter of convention.

• 2D Global Address Space. Any processor can access any object in a large global address space. However, the inherent two dimensional structure of the underlying machine is not lost. Each processor owns a specific region of the address space and is permitted to access that region via standard local pointers. Rather than introducing a complicated set of mapping functions, as in Fortran-D, or mysterious mappings in the run-time system, as in CM Fortran or C*, simple mapping rules are associated with multidimensional structures and global pointer types. Sophisticated mappings are supported by exploiting the relationship between arrays and pointers, as is common in C.

• Global pointers. A global pointer refers to an arbitrary object of the associated type anywhere in the system. We will use the term global object to mean an object referenced by a global pointer. A global object is owned entirely by a processor, which may have efficient access to the object through standard pointers. A new keyword, global, is introduced to qualify a pointer as meaningful to all processors. Global pointers can be dereferenced in the same manner as standard pointers, although the time to dereference a global pointer is considerably greater than that for a local pointer, perhaps up to ten times a local memory operation (i.e., a cache miss). The language provides support for allocating global objects, constructing global pointers from local counterparts, and destructuring global pointers. In general, global objects may contain local pointers, but such pointers must be interpreted relative to the processor owning the global object.

A pointer in C references a particular object, but also defines a sequence of objects that can be referenced by arithmetic operations on the pointer. In Split-C, the sequence of objects referenced by a standard pointer is entirely local to the processor. Address arithmetic on a global pointer has the same meaning as arithmetic on a standard pointer by the processor that owns the object. Hence, all the objects referenced relative to a global pointer are associated with one processor. (A short sketch of declaring and using global pointers appears after this list.)

• Spread pointers. A second form of global pointer is provided, which defines a sequence of objects that are distributed, or spread, across the processors. The keyword spread is used as the qualifier to declare this form of global pointer. Consecutive objects referenced by a spread pointer are wrapped in a helical fashion through the global address space, with the processor dimension varying fastest. Each object is entirely owned by a single processor, but the next consecutive element (i.e., the one obtained by incrementing the pointer) is on the next processor. (A sketch of spread pointers and spread arrays appears after this list.)

• Spread arrays. The duality in C between pointers and arrays is naturally extended to spread pointers and arrays that are spread across processors, called spread arrays. Spread arrays are declared by inserting a spreader, '::', which identifies the dimensions that are to be spread across processors. All dimensions to the left of the spreader are wrapped over the processors. Dimensions to the right of the spreader define the object that is allocated within a processor. The spreader position is part of the static type, so efficient code can be generated for multidimensional access. Indexing to the left of the spreader corresponds to arithmetic on spread pointers, while indexing to the right of the spreader corresponds to arithmetic on global pointers. The & operator applied to an array expression yields a pointer of the appropriate type. Generic routines that operate independent of the input layout utilize the duality between arrays and pointers to eliminate the higher dimensions.

• Split-phase assignment. A new assignment operator, :=, is introduced to split the initiation of a global access from the completion of the access. This allows the time of a global access to be masked by other useful work, and the communication resources of the system to be effectively utilized. In contrast, standard assignments stall the issuing processor until the assignment is complete, to guarantee that reads and writes occur in program order. However, there are restrictions on the use of split assignments. Whereas the standard assignment operator describes arbitrary reads and one write, the split assignment operator specifies either to get the contents of a global reference into a local one or to put the contents of a local reference into a global one. Thus, arbitrary expressions are not allowed on the right hand side of a split assignment. The := initiates the transfer, but does not wait for its completion. A sync() operation joins the preceding split assignments with the thread of control. A local variable assigned by a get (similarly, a global variable assigned by a put) is guaranteed to have its new value only after the following sync(); the value of the variable prior to the sync() is not defined. Variables appearing in split assignments should not be modified, either directly or through aliases, between the assignment and the following sync(), and variables on the left hand side should not be read during that time. The order in which puts take effect is constrained only by sync() boundaries; between those boundaries, the puts may be reordered. No limit is placed on the number of outstanding assignments. (A sketch of split-phase and signaling assignment appears after this list.)

• Signaling assignment. A weaker form of assignment, called store and denoted :-, is provided to allow efficient data driven execution and global operations. Store updates a global location, but does not provide any acknowledgement of its completion to the issuing processor. Completion of a collection of such stores is detected globally using all_store_sync, executed by all processors. For global data rearrangement, in which all processors are cooperating to move data, a set of stores by the processors is followed by an all_store_sync. In addition, the recipient of stores can determine whether a certain number of stores to it have completed using store_sync, which takes the expected number of stores and waits until they have completed. This is useful for data driven execution with predictable communication patterns.

• Bulk assignment. Transfers of complete objects are supported through the assignment operators and library routines. The library operations allow for bulk transfers, which reflect the view that, in managing a storage hierarchy, the unit of transfer should increase with the access time. Moreover, bulk transfers enhance the utility of split-phase operations. A single word get is essentially a binding prefetch. The ability to prefetch an entire object or block often allows the prefetch operation to be moved out of the inner loop, and increases the distance between the time where the get is issued and the time where the result is needed. The assignment and split-assignment operators transfer arbitrary data types, or structs, as with the standard C assignment. However, C does not provide operators for copying entire arrays, so bulk operations are provided to operate on arrays. (It is anticipated that Split-C will support range, or triplet, syntax a la Fortran to copy portions of arrays.)

• Synchronizing assignment. Concurrent access to shared objects, as occurs in manipulating linked data structures, requires that the accesses be protected under a meaningful locking strategy. Split-C libraries provide a variety of atomic access primitives, such as fetch-and-add, and a general facility for constructing locking versions of structs and manipulating them under mutual exclusion, single writer multiple reader, or other strategies.
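To make the pointer operations above concrete, here is a minimal sketch of declaring, constructing, destructuring, and dereferencing a global pointer. It assumes the header split-c/split-c.h and conversion functions named toglobal, tolocal, and toproc, matching the section headings in the contents; the exact names and signatures should be treated as assumptions rather than a definitive reference.

#include <split-c/split-c.h>

void global_pointer_sketch(void)
{
    double x = 3.14;             /* an ordinary local object                     */
    double *global gp;           /* a pointer meaningful on every processor      */
    double v, *lp;
    int owner;

    gp    = (double *global) toglobal(MYPROC, &x);  /* construct from (proc, addr) */
    v     = *gp;                 /* dereference: remote in general; local here,
                                    since this processor owns x                   */
    owner = toproc(gp);          /* destructure: the owning processor number      */
    lp    = tolocal(gp);         /* destructure: the local address on the owner   */
}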
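A similar sketch illustrates spread pointers and spread arrays, using the '::' spreader notation described above. The declarations and layout comments follow the wrapped (cyclic) interpretation given in this section and should be read as illustrative assumptions.

#include <split-c/split-c.h>

double A[100]::;        /* 100 doubles; element i is owned by processor i % PROCS */
double B[100]::[4];     /* 100 blocks of 4 doubles; whole blocks are wrapped      */

void spread_sketch(void)
{
    int i;
    double *spread sp;

    for (i = MYPROC; i < 100; i += PROCS)
        A[i] = (double) i;      /* these indices are owned by this processor      */

    sp = &A[0];                 /* & on the spread dimension yields a spread
                                   pointer (per the rule above)                   */
    sp = sp + 1;                /* the next element lives on the next processor   */

    B[3][2] = 1.0;              /* index left of '::' selects the block (and its
                                   owner); index right of '::' selects within it  */
}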
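For split-phase (:=) and signaling (:-) assignment, the sketch below exchanges values around a ring. It assumes sync() and all_store_sync() as named above, and declares a spread array with a PROCS-dependent dimension in the configuration independent style discussed later in the document; the communication pattern itself is purely illustrative.

#include <split-c/split-c.h>

double val[PROCS]::;                  /* one double per processor (assumed legal)  */

void assignment_sketch(void)
{
    int right = (MYPROC + 1) % PROCS;
    double v, w;

    val[MYPROC] = (double) MYPROC;    /* initialize the locally owned element      */
    barrier();                        /* ensure every element has been written     */

    v := val[right];                  /* get: initiate a read of the neighbor's
                                         element; the processor does not stall     */
    /* ... useful local work could overlap the communication here ...              */
    sync();                           /* v holds the fetched value only from here  */

    w = v + 1.0;
    val[right] :- w;                  /* store: update the neighbor with no
                                         acknowledgement to this processor         */
    all_store_sync();                 /* all processors wait until all stores
                                         have completed                            */
}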


Control Paradigm

The control paradigm for Split-C programs is a single thread of control on each of PROCS processors, from the beginning of splitc_main until its completion. The processors may each follow distinct flows of control, but they join together at rendezvous points, such as barrier(). It is a SPMD (single program, multiple data) model, in that every processor executes the same logical program image. Each processor has its own stack for automatic variables and its own static or external variables. Static spread arrays and heap objects referenced by global pointers provide the means for shared data. Processors are numbered from 0 to PROCS-1, with the pseudo-constant MYPROC referring to the number of the executing processor.
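As a minimal sketch of this SPMD model (not taken from the original example), each processor can announce itself and then rendezvous at a barrier; the include follows the convention described later in this section.

#include <stdio.h>
#include <split-c/split-c.h>

splitc_main(int argc, char **argv)
{
    printf("hello from processor %d of %d\n", MYPROC, PROCS);
    barrier();                    /* all PROCS threads meet here before continuing */
}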

Figure 1 shows a simple Split-C program to compute an approximation of π through a Monte Carlo integration technique. The idea is to throw darts into the unit square [0, 1] x [0, 1] and compute the fraction of darts that hit within the unit circle. This should approximate the ratio of the areas, which is π/4. Although the example is contrived, it illustrates several important aspects of the language.

All processors enter splitc_main together. They can each obtain the command line arguments in the usual fashion. In this case, the total number of trials is provided; this represents the work that is to be divided among the processors. Each processor computes the number of trials that it is to perform, initializes its random number generator with a seed based on the value of MYPROC, and conducts its trials. The processors join at the barrier, and then all cooperate to sum the hits into total_hits on processor 0. Finally, processor 0 prints the result.

In general, the code executed by different processors is varied using a standard library of control macros. These typically involve a test of MYPROC, as in the case of on_one, which tests for MYPROC == 0. More interesting macros, such as for_my_1D, will appear in later examples; this one is used for iterating over sets of indexes that correspond to locally owned data. The library contains a set of these control macros for hiding the index arithmetic in some common control patterns, and users can easily define their own.
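A rough sketch of this style follows. The on_one macro is the one used in Figure 1; the explicit loop merely suggests the kind of index arithmetic that a macro such as for_my_1D hides, since its exact expansion and argument conventions are not shown here.

#include <stdio.h>
#include <split-c/split-c.h>
#include <split-c/control.h>

void control_macro_sketch(int n)
{
    int i;

    on_one {                              /* executes only where MYPROC == 0       */
        printf("running on %d processors\n", PROCS);
    }

    /* A macro like for_my_1D hides index arithmetic of roughly this shape,
       visiting the indices of an n-element cyclic layout that are owned by
       the executing processor. */
    for (i = MYPROC; i < n; i += PROCS) {
        /* ... work on locally owned index i ... */
    }

    barrier();
}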

Split-C programs may mix this kind of control parallelism, in which different processors are executing different code, with data parallelism, in which global operations, such as scans or reductions, require the involvement of all processors. The global operations are provided by library routines, which by convention are named with the prefix all_. The assumption is that all processors execute these within a reasonably short time frame. If some processors are significantly behind the others, then performance will degrade, and if some processors fail to execute the operation at all, the program may hang. We will discuss these global operations further in a later section, since they are frequently used with spread arrays.

All Split-C files should include split-c/split-c.h, which defines language primitives, such as barrier, and pseudo-constants, such as MYPROC and PROCS. The current installation uses .sc as the file type for Split-C files, which may call normal C routines for local computations. Most Split-C files will also include the standard control macros in split-c/control.h. The integer reduction under addition is one of the standard global communication operations in split-c/com.h. We will look more closely at how this can be implemented as we introduce more of the language.

Implementation note: Example programs, with gmake files, for the current release can be found in /usr/cm/local/src/split-c/examples. The code in this tutorial is in the tutorial subdirectory there.

The example illustrates a bulk synchronous programming style that arises quite frequently in Split-C. In this style, programs are typically constructed as a sequence of parallel phases. Often the phases alternate between purely local computation and global communication, as in this example. Notice also that the program works on any number of processors, even though the number of processors is exposed. The results will not be quite identical, because of the initialization of the random number generator. By default, the compiler produces configuration independent code. By following a few simple conventions, it is possible to optimize for the machine size, yet run on any configuration.

Another common style is to allow the threads to cooperate in a structured fashion through operations on shared objects. Split-C supports both styles, and a variety of others. The following sections focus on how the various forms of interaction between processors are supported in Split-C.


#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <split-c/split-c.h>
#include <split-c/control.h>
#include <split-c/com.h>

int hit()
{
    int const rand_max = 0xFFFFFF;
    double x = ((double) (rand() & rand_max)) / rand_max;
    double y = ((double) (rand() & rand_max)) / rand_max;
    if ((x*x + y*y) <= 1.0) return(1);
    else return(0);
}

splitc_main(int argc, char **argv)
{
    int i, total_hits, hits = 0;
    double pi;
    int trials, my_trials;

    if (argc < 2)
        trials = 1000000;              /* default number of trials */
    else
        trials = atoi(argv[1]);
    my_trials = (trials + PROCS - 1 - MYPROC) / PROCS;
    srand(MYPROC);                     /* Different seed on each processor */
    for (i = 0; i < my_trials; i++) hits += hit();
    barrier();
    total_hits = all_reduce_to_one_add(hits);
    on_one {
        pi = 4.0 * total_hits / trials;
        printf("PI estimated at %f from %d trials on %d processors.\n",
               pi, trials, PROCS);
    }
}

Figure 1: Example Split-C program computing an approximation to π using a parallel Monte Carlo integration technique.