Syracuse University SURFACE

Electrical Engineering and Computer Science College of Engineering and Computer Science

1995

PASSION Runtime Library for the Intel Paragon

Alok Choudhary Syracuse University, Department of Electrical and Computer Engineering

Rajesh Bordawekar Syracuse University

Sachin More Syracuse University, Department of Electrical and Computer Engineering

K. Sivaram Syracuse University, Department of Electrical and Computer Engineering

Rajeev Thakur Argonne National Laboratory, Mathematics and Computer Science Division

Recommended Citation Choudhary, Alok; Bordawekar, Rajesh; More, Sachin; Sivaram, K.; and Thakur, Rajeev, "PASSION Runtime Library for the Intel Paragon" (1995). Electrical Engineering and Computer Science. 50. https://surface.syr.edu/eecs/50

PASSION Runtime Library for the Intel Paragon

Alok Choudhary, Rajesh Bordawekar, Sachin More, K. Sivaram
Dept. of Electrical and Computer Engineering
Syracuse University, Syracuse, NY
{choudhar, rajesh, ssmore, sivaram}@cat.syr.edu

Rajeev Thakur
Mathematics and Computer Science Division
Argonne National Laboratory
Argonne, IL
thakur@mcs.anl.gov

Abstract

We are developing a runtime library which provides a number of routines to perform the I/O required in parallel applications in an efficient and convenient manner. This is part of a project called PASSION, which aims to provide software support for high-performance parallel I/O at the compiler, runtime and file system levels. The PASSION Runtime Library uses a high-level interface which makes it easy for the user to specify the I/O required in the program. The user only needs to specify what portion of the data structure needs to be read from or written to the file, and the PASSION routines will perform all the necessary I/O efficiently. This paper gives an overview of the PASSION Runtime Library and describes in detail its high-level interface.

This work was supported in part by a grant from Intel SSD and NSF Young Investigator Award CCR. This work was performed in part using the Intel Paragon and Touchstone Delta Systems operated by Caltech on behalf of the Concurrent Supercomputing Consortium. Access to this facility was provided by CRPC.

† Dept. of Computer and Information Science, Syracuse University.

1 Introduction

Parallel computers are becoming increasingly powerful day by day. This has made possible the solution of many problems which were previously considered intractable. These include large scale applications in physics, chemistry, biology, engineering, medicine and other sciences, as well as in other areas such as information technology. Many of these applications deal with large data sets and hence have significant I/O requirements. Improvements in the I/O performance of parallel computers have not kept pace with improvements in their computation and communication capabilities. This results in the I/O system being the bottleneck in many cases.

There are a number of reasons why I/O may be needed in a parallel program. In many applications, all the data required by the program cannot fit in main memory and so has to be stored in files on disks. Such programs are called out-of-core programs. In out-of-core programs, I/O is needed to access the entire data set. I/O may also be required in in-core programs, where all the data can fit in main memory. For example, it may be necessary to read input data from files at the start of the computation and write results to files at the end of the computation. During the computation, it may be necessary to periodically write data to files to monitor the progress of the solution. In applications which run for a long time, it may be necessary to checkpoint (stop the computation at some point and restart it later). This requires saving the contents of the data structures in files. I/O may also be required for the purpose of debugging a parallel program.

We are working on a project called PASSION (Parallel and Scalable Software for Input-Output), which aims to provide software support for high-performance parallel I/O on distributed memory parallel computers. PASSION provides support at the compiler, runtime and file system levels. The PASSION Runtime Library provides a number of optimized routines to perform the I/O required in parallel applications in an efficient manner. It uses a high-level interface which makes it easy for the user to specify the I/O required in the program. The interface also enables the use of collective I/O, in which processors cooperate to
perform I/O efficiently. The user is freed from the burden of explicitly manipulating file pointers, calculating file offsets, managing buffers and other tedious tasks associated with using the low-level interface provided by parallel file systems. This paper gives an overview of the PASSION Runtime Library and describes in detail its high-level interface.

The rest of this paper is organized as follows. Section 2 gives a brief overview of the PASSION Runtime Library. The need for providing high-level interfaces for parallel I/O is explained in Section 3. Section 4 describes the various data structures used by the PASSION library. The interface used by several of the PASSION routines is described in Section 5, followed by Conclusions in Section 6.

2 Overview of the PASSION Runtime Library

The PASSION Runtime Library provides routines to efficiently perform the I/O required in parallel applications, both in-core as well as out-of-core. It supports a loosely synchronous Single Program Multiple Data (SPMD) programming model. The PASSION library uses a simple high-level interface, which is a level higher than any of the existing parallel file system interfaces, as shown in Figure 1. For example, the user only needs to specify what section of the array needs to be read in terms of its lower-bound, upper-bound and stride in each dimension, and the PASSION Runtime Library will fetch it in an efficient manner. PASSION thus provides a simple and portable level of abstraction above the native parallel file system provided on the machine. The PASSION library is designed to either be directly used by application programmers, or a compiler could translate out-of-core programs written in a high-level data-parallel language like High Performance Fortran (HPF) to node programs with calls to the library for I/O. A number of optimizations such as two-phase I/O, data sieving, data prefetching and data reuse have been incorporated in the library.

[Figure 1: Software Architecture. Box labels: HPF/HPC++ Interface; Node + MP Interface; PASSION Runtime System; Message Passing System; Parallel File System.]

Architectural Model

The architectural model assumed by PASSION is that of any general distributed memory computer in which the processors are connected together in some fashion. The system is assumed to be provided with a set of disks and I/O nodes. The I/O nodes can either be dedicated processors, or some of the compute nodes may also serve as I/O nodes. Each processor may either have its own local disk or all processors may share the set of disks. The I/O subsystem may have a separate interconnection network or it can share the same network which connects the processors together. Thus the architectural model of PASSION conforms to that of any of the commercially available parallel computers. The PASSION library was originally implemented on the Intel Paragon and Touchstone Delta systems. It is currently being ported to other machines.

Data Storage and Access Models

In out-of-core programs, all the data required by the program cannot fit in main memory and so has to be stored in files on disks in some fashion. PASSION supports two basic models for storing and accessing data, called the Local Placement Model (LPM) and the Global Placement Model (GPM).

Local Placement Model (LPM)

In this model, the global array is divided into local arrays belonging to each processor. Since the local arrays are out-of-core, they have to be stored in files on disks. The local array of each processor is stored in a separate file called the Local Array File (LAF) of that processor. The node program explicitly reads from and writes to the file when required. The simplest way to view this model is to think of each processor as having another level of memory which is much slower than main memory. If the I/O architecture of the system is such that each processor has its own disk, the LAF of each processor will be stored on the disk attached to that processor. If there is a common set of disks for all processors, the LAF will be distributed across one or more of these disks. In other words, we assume that each processor has its own logical disk with the LAF stored on that disk. The mapping of the logical disk to the physical disks depends on how much control the parallel file system provides the user. At any time, only a portion of the local array is fetched and stored in main memory. The size of this portion depends on the amount of memory available. The portion of the local array which is in main memory is called the In-Core Local Array (ICLA). All computations are performed on the data in the ICLA. Thus, during the course of the program, parts of the LAF are fetched into the ICLA, the new values are computed and the ICLA is stored back into appropriate locations in the LAF.

Global Placement Model (GPM)

In this model, the global array is stored in a single file called the Global Array File (GAF) and no local array files are created. The global array is only logically divided into local arrays in keeping with the SPMD programming model. But there is a single global array on disk. PASSION fetches the appropriate portion of each processor's local array from the global array file, as requested by the user. The advantage of the Global Placement Model is that it saves the initial local array file creation phase in the Local Placement Model. In addition, if the distribution of the array among processors needs to be
changed during program execution, an explicit redistribution of the out-of-core data is not required. The disadvantage is that each processor's data may not be stored contiguously in the GAF, resulting in multiple read requests and higher I/O latency time. However, this drawback can be overcome to a large extent by using the Two-Phase Method for I/O. Also, in the Global Placement Model, explicit synchronization is required when a processor needs to access data that may have been previously modified by another processor.

Optimizations

A number of optimizations have been incorporated in the PASSION Runtime Library. We briefly describe some of them below. Further details and performance results are given in the references.

Collective I/O Using a Two-Phase Method

In data parallel programs, all processors perform similar operations but on different data sets. Hence, if one processor needs to read data from disks, it is very likely that a group of processors, or maybe all processors, need to read data from disks at about the same time. This makes it possible for the requesting processors to cooperate in reading or writing data in an efficient manner, which is known as collective I/O. If processors perform I/O independently, it may result in a large number of low granularity requests which may arrive from different processors in any order. On the other hand, if processors use collective I/O, they can cooperate among themselves to perform I/O efficiently in large chunks and in the right order.

The PASSION library performs collective I/O using a Two-Phase Method. This can be used to read/write either entire arrays or sections of arrays, with/without strides in each dimension. In the Two-Phase Method, I/O is done in two phases. In the first phase, processors cooperate to read data in large contiguous chunks. A dynamic scheme is used to partition the I/O workload among processors, depending on the access requests. In the second phase, data is redistributed among processors using interprocessor communication so that each processor gets the data it requested. The main advantages of the Two-Phase Method are:

• It results in high granularity data transfer between processors and disks.
• It makes use of the higher bandwidth of the processor interconnection network.

Data Sieving

All PASSION routines for reading or writing data from/to disks support the reading/writing of regular sections of arrays with strides. For example, a processor may want to read a section of an out-of-core two-dimensional array given by its lower-bound, upper-bound and stride in each dimension, $(l_1 : u_1 : s_1,\ l_2 : u_2 : s_2)$. The interfaces provided by most of the parallel file systems at present do not support strided accesses. Hence, the only way of reading this array section using a direct method is to explicitly move the file pointer to each element and read it individually. This requires as many reads as the number of elements in the section. The major disadvantage of this method is the large number of I/O calls and low granularity of data transfer. Since I/O latency is very high, this method proves to be very expensive.
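
To make the cost of the direct method concrete, the following C sketch counts the read calls it needs for a strided section and computes where an element lives in the file. It is an illustration only and not part of PASSION: the function names, the 0-based inclusive bounds, and the column-major layout with an optional header are assumptions made for the example.

```c
#include <assert.h>

/* Number of indices in lo, lo+stride, ..., not exceeding hi
   (bounds assumed inclusive; 0 if the range is empty). */
long sketch_count(long lo, long hi, long stride)
{
    assert(stride > 0);
    return (hi < lo) ? 0 : (hi - lo) / stride + 1;
}

/* Direct method: one seek and one read per element of the section
   (l1:u1:s1, l2:u2:s2), so the call count equals the element count. */
long sketch_direct_reads(long l1, long u1, long s1,
                         long l2, long u2, long s2)
{
    return sketch_count(l1, u1, s1) * sketch_count(l2, u2, s2);
}

/* Byte offset of element (i, j) in a file holding a column-major array
   with `rows` rows, stored after a header of `header` bytes
   (hypothetical layout chosen for this sketch). */
long sketch_offset(long i, long j, long rows, long elem_size, long header)
{
    return header + (j * rows + i) * elem_size;
}
```

For a hypothetical 1000 x 1000 array of 8-byte elements, `sketch_direct_reads(0, 99, 2, 10, 19, 1)` gives 500 separate 8-byte reads, and `sketch_offset(0, 10, 1000, 8, 0)` and `sketch_offset(98, 19, 1000, 8, 0)` show that those reads are scattered between byte offsets 80000 and 152784.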

An optimization called data sieving is used in PASSION to read/write strided data efficiently. For reading a strided section, instead of reading only the requested elements, large contiguous chunks of data are read at a time into a temporary buffer in main memory. This includes unwanted data. The useful data is extracted from the buffer and passed on to the calling program. The amount of data read in each read operation depends on the amount of temporary space available. A similar method is used for writing regular sections, except that this requires an extra read before the write to avoid overwriting any data already present in the file. The advantage of data sieving is that it results in higher granularity data transfer, though extra data is also transferred in the process. We found that data sieving provides considerable performance improvement.

Data Prefetching

In both the Local and Global Placement Models, program execution proceeds by fetching data from a file, performing the computation on the data and writing the results back to a file. This is repeated on other data sets till the end of the program. Thus I/O and computation form distinct phases in the program. A processor has to wait while each data set is being read or written, as there is no overlap between computation and I/O. The time taken by the program can be reduced if it is possible to overlap computation with I/O in some fashion. A simple way of achieving this is to issue an asynchronous I/O read request for the next data set immediately after the current data set has been read. This is called data prefetching. Since the read request is asynchronous, the reading of the next data set can be overlapped with the computation being performed on the current data set. If the computation time is comparable to the I/O time, this can result in significant performance improvement.

Data Reuse

In many applications, a portion of the current data set fetched from the file is also needed for computation on the next data set. To reduce the amount of I/O, the data already fetched into main memory can be reused instead of reading it again from disk. The amount of data reuse depends on the intersection of the sets of data needed for computation on the portion of data currently fetched into memory and the portion that will be fetched next.

3 High-Level Interfaces

Most parallel file systems provide a one-dimensional view of data, i.e., the file is viewed as a linear sequence of records. The user needs to know how the data structure in the program is mapped to this one-dimensional sequence of records. For example, a two-dimensional array may be stored in the file in row-major or column-major order. To read/write a portion of the data, the user has to explicitly calculate where the data is located in the file, move the file pointer to that location and then read/write data. Also, the interface provided by most parallel file systems does not support strided accesses. If the required data lies strided in the file, the user has to explicitly seek to each contiguous portion and read/write that contiguous portion. We call such an interface a low-level interface.

[Figure 2: A processor needs to access section $(l_1 : u_1,\ l_2 : u_2)$ of the out-of-core array ABCD stored in a file in column-major order. The section has corners $(l_1, l_2)$ and $(u_1, u_2)$.]

For example, consider Figure 2. ABCD is a large out-of-core array stored in a file in column-major order. A processor needs to read a section of this array given by the indices $(l_1 : u_1,\ l_2 : u_2)$. This section does not lie contiguously in the file. Each column of the section is located contiguously, but the individual columns are separated by some other data. The only way to read this section using the traditional low-level interface provided by a parallel file system is to explicitly seek to the first element of each column, read all elements in the column, then seek to the first element of the next column, and so on. There are several drawbacks to directly using the low-level interface:

• Calculating offsets and manipulating file pointers is tedious to the user.
• Since the I/O latency is very high, the larger the number of requests required to access data, the lower the performance.
• The file system cannot perform optimizations based on the access requests of all processors, since in general there is no support for processors to make collective requests.

We believe that high-level interfaces that facilitate the use of semantic knowledge about the accesses from parallel application programs are necessary for simple, portable and efficient programming. For example, in the case of Figure 2, the user should be able to specify in a simple way and in a single call that the section $(l_1 : u_1,\ l_2 : u_2)$ of the array needs to be read. A library of optimized routines can be developed to read the necessary data using the low-level interface provided by the file system. PASSION provides such a high-level interface for the convenience of the user, and a library of routines which support this interface efficiently.

Recently, some file systems have been developed, such as the Vesta file system and the nCUBE file system, which provide some limited support for the user to specify a logical view of the data to be accessed. There have also been some proposals for file system interfaces which allow the user to specify strided requests in a single read/write call. Specialized interfaces are also provided by other runtime libraries. The PASSION Runtime Library provides a very general high-level interface. For example, the user can access arbitrary array sections with strides in each dimension. The array elements can be of any type, even user-defined records. The array can be stored in the file in any storage order, and the file can have a header containing some additional information. PASSION also supports a collective interface so that optimizations can be performed based on the knowledge of the access requests of all processors. Sections 4 and 5 describe the PASSION interface in detail.

4 PASSION Data Structures

The PASSION library provides support for reading/writing entire arrays as well as sections of arrays stored in files. It uses the following data structures for this purpose.

Out-of-Core Array Descriptor (OCAD)

Each out-of-core array has a descriptor associated with it, called the Out-of-Core Array Descriptor (OCAD). The OCAD contains the following information about the array:

• Number of dimensions
• Size of the global array
• Size of each element of the array in bytes. Each element of the array could potentially be a structure or record. This enables the PASSION library to support arrays of any data type.
• Number of processors in each dimension
• Distribution of the array in each dimension
• Size of the In-Core Local Array (ICLA)
• Size of the overlap area
• Size of the Out-of-Core Local Array (OCLA)

Parallel File Pointer (PFILE)

The parallel file pointer is the parallel equivalent of the file pointer associated with a sequential file. It is allocated by the PASSION_open routine. It needs to be passed as a parameter to all PASSION routines that access files. The parallel file pointer contains the following information about the parallel file:

• System file descriptor
• Header size

Prefetch Descriptor

The prefetch descriptor is used to store information about prefetch read operations in progress. It is allocated by the routine PASSION_read_prefetch. It is used by the PASSION_prefetch_wait routine, which waits for a previously initiated prefetch operation to complete.

Reuse Descriptor

This data structure is used to implement the data reuse operation. It is allocated by the PASSION_reuse_init routine, which initiates a reuse operation. It is updated on the subsequent calls to the PASSION_read_reuse routine, which actually does the reuse.

Access Descriptor

This data structure is used to specify which section of the array needs to be read or written. It is a two-dimensional array; row i specifies the lower bound, upper bound and stride in dimension i of the section to be accessed.

5 PASSION Interface

We describe the interface used by several of the PASSION routines. Further details can be found in the PASSION User's Guide.
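
Before turning to the individual routines, the descriptors listed under PASSION Data Structures can be pictured roughly as follows. This C sketch is not the library's actual definition: the struct name, field names, the fixed dimension bound, and the choice to keep one ICLA, overlap and OCLA size per dimension are all assumptions made for illustration. It also shows how an access descriptor (one row of lower bound, upper bound and stride per dimension, with bounds assumed inclusive) determines how many elements a section contains.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_DIMS 4   /* assumption: small fixed bound, for simplicity */

/* Hypothetical layout holding the OCAD fields listed in the text. */
struct sketch_ocad {
    int    ndims;                          /* number of dimensions           */
    int    global_size[SKETCH_MAX_DIMS];   /* size of the global array       */
    size_t elem_size;                      /* size of each element in bytes  */
    int    nprocs[SKETCH_MAX_DIMS];        /* processors in each dimension   */
    int    distribution[SKETCH_MAX_DIMS];  /* distribution in each dimension */
    int    icla_size[SKETCH_MAX_DIMS];     /* size of the ICLA               */
    int    overlap[SKETCH_MAX_DIMS];       /* size of the overlap area       */
    int    ocla_size[SKETCH_MAX_DIMS];     /* size of the OCLA               */
};

/* Column meaning within one row of an access descriptor. */
enum { SKETCH_LOWER = 0, SKETCH_UPPER = 1, SKETCH_STRIDE = 2 };

/* Number of elements selected by an access descriptor `acc`,
   where row i describes dimension i. */
size_t sketch_section_elems(const struct sketch_ocad *d, int acc[][3])
{
    size_t n = 1;
    assert(d->ndims > 0 && d->ndims <= SKETCH_MAX_DIMS);
    for (int i = 0; i < d->ndims; i++) {
        int lo = acc[i][SKETCH_LOWER];
        int hi = acc[i][SKETCH_UPPER];
        int st = acc[i][SKETCH_STRIDE];
        assert(st > 0);
        if (hi < lo)
            return 0;                      /* empty section */
        n *= (size_t)((hi - lo) / st + 1);
    }
    return n;
}

/* Bytes needed to hold the section packed contiguously in memory. */
size_t sketch_section_bytes(const struct sketch_ocad *d, int acc[][3])
{
    return sketch_section_elems(d, acc) * d->elem_size;
}
```

With `ndims = 2`, `elem_size = 8` and the access descriptor `{ {0, 99, 2}, {10, 19, 1} }`, the section holds 500 elements and a packed buffer for it needs 4000 bytes.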

Setting up the OCAD

All PASSION routines which access arrays require a pointer to the OCAD. The OCAD can be created and initialized as follows.

• The OCAD has to first be allocated using the routine PASSION_malloc_OCAD.

    OCAD *PASSION_malloc_OCAD(int dimensions)

  The parameter to this routine is the number of dimensions of the out-of-core array.

• After the OCAD has been allocated, it can be initialized using the routine PASSION_fill_OCAD.

    int PASSION_fill_OCAD(OCAD *OCADptr,
                          int size, int distribution,
                          int nprocs, int ocla_size,
                          int icla_size, int overlap,
                          int elemsize, int storage)

  The parameters to this routine are a pointer to the OCAD, size of the array, distribution of the array, number of processors, size of the OCLA, size of the ICLA, overlap information, size of each element of the array, and the storage order of the array in the file (ROW_MAJOR or COLUMN_MAJOR).

Once the OCAD is initialized, it can be used to access the out-of-core array. After all the accesses have been performed, the OCAD is no longer necessary and should be deallocated. This can be done using the routine PASSION_free_OCAD.

    void PASSION_free_OCAD(OCAD *OCADptr)

Opening and Closing Files

Files should only be opened and closed with the routines PASSION_open and PASSION_close.

    PFILE *PASSION_open(char *FileName,
                        unsigned int HeaderSize)
    int PASSION_close(PFILE *PFilePtr)

The parameters to PASSION_open are the name of the file and size of the header at the start of the file. It returns a parallel file pointer. Note that in the Local Placement Model, each processor opens its own separate local array file, whereas in the Global Placement Model, all processors open a common file.

Accessing the File Header

PASSION provides support for files containing some other information in addition to the array, in the form of a header at the start of the file. The header can be read using the routine PASSION_read_header.

    int PASSION_read_header(PFILE *PFilePtr,
                            char *HBuf)

The parameters are a parallel file pointer and a pointer to a buffer in memory to store the header. This routine can be called immediately after the file is opened, even before calling PASSION_fill_OCAD. This allows the application program to store information about the array in the file header and use that information to fill in the OCAD.

Information can be written to the file header using the routine PASSION_write_header.

    int PASSION_write_header(PFILE *PFilePtr,
                             char *HBuf)

Reading the Array

A number of routines are provided to read the array from the file. If each processor's local array can fit in its main memory, then the entire local array can be read using the routine PASSION_read.

    int PASSION_read(PFILE *PFilePtr,
                     OCAD *OCADptr, char *Array)

The parameters are a parallel file pointer, pointer to the OCAD and a pointer to a buffer in main memory to store the array. This routine is only for the Local Placement Model. In the Global Placement Model, even if the entire local array fits in memory, it has to be read by specifying its lower bound, upper bound and stride in the global array.

Reading Array Sections

If the array cannot fit in memory, sections of the array need to be read at a time. PASSION provides routines to read sections of the array with strides in each dimension. Separate routines are provided for reading array sections in the Local and Global Placement Models.

Local Placement Model. The routine PASSION_read_section is used to read array sections in the Local Placement Model.

    int PASSION_read_section(PFILE *PFilePtr,
                             OCAD *OCADptr, char *Array, int Index,
                             int AccessArray)

The parameters are a parallel file pointer, pointer to the OCAD, buffer in memory to store the section, coordinates of the location in the buffer from where the section is to be stored, and the section to be read, specified by an access descriptor
(see Section 4). Data sieving is used to read strided sections. This routine reads the array section from the local array file to the specified location in memory. The shape of the section is retained. To save memory, the section is stored without stride in memory even if there was a stride in the OCLA.

Global Placement Model. The routine PASSION_global_read can be used to read array sections in the Global Placement Model. Each processor can access any arbitrary section of the array. The sections requested by different processors could be distinct, overlapping or even identical.

    int PASSION_global_read(PFILE *PFilePtr,
                            OCAD *OCADptr, char *Array, int Index,
                            int AccessArray, int nprocs)

The parameters are the same as for PASSION_read_section, with the addition of the number of processors, since this is a collective read operation. This routine uses the Extended Two-Phase Method described in the report by Thakur and Choudhary listed in the References.

Data Prefetching

The PASSION library provides routines for prefetching data before it is needed. Prefetching is basically a non-blocking read operation. This can be used to overlap computation with I/O and thus reduce the time spent in waiting for I/O.

    PREFETCH *PASSION_read_prefetch(PFILE *PFilePtr,
                                    OCAD *OCADptr, char *Array, int Index,
                                    int AccessArray)

This routine is used to start a prefetch operation. The parameters are the same as for PASSION_read_section. It returns a pointer to a prefetch descriptor (see Section 4).

The routine PASSION_prefetch_wait can be used to wait for a previously initiated prefetch operation to complete.

    int PASSION_prefetch_wait(PREFETCH *PREFETCHptr)

Data Reuse

Data reuse can be performed using the routines PASSION_reuse_init and PASSION_read_reuse.

    REUSE *PASSION_reuse_init(PFILE *PFilePtr,
                              OCAD *OCADptr, int start)

PASSION_reuse_init initializes the reuse descriptor (see Section 4). The parameters are a parallel file pointer, pointer to the OCAD and the position in the OCLA from where the read operation is to start. It returns a pointer to the reuse descriptor.

PASSION_read_reuse is used to read data with reuse.

    int PASSION_read_reuse(REUSE *REUSEptr,
                           char *Array)

The parameters are a pointer to the reuse descriptor and a pointer to a buffer in memory to store data. The return value indicates when end of file is reached. Figure 3 illustrates how reuse works.

[Figure 3: Data Reuse. The figure shows the OCLA, with regions marked "Data Used" and "Data Read", after the call to PASSION_reuse_init and after the first through fourth calls to PASSION_read_reuse, each with a lower overlap and an upper overlap. The fifth call to PASSION_read_reuse returns -1.]

Writing the Array

A number of routines are provided to write arrays to files. If each processor's local array can fit in its main memory, then the entire local array can be written using the routine PASSION_write.

    int PASSION_write(PFILE *PFilePtr,
                      OCAD *OCADptr, char *Array)

The parameters are a parallel file pointer, pointer to the OCAD and a pointer to a buffer in main memory containing the array. This routine is only for the Local Placement Model. In the Global Placement Model, even if the entire local array fits in memory, it has to be written by specifying its lower bound, upper bound and stride in the global array.

Writing Array Sections

If the array cannot fit in memory, sections of the array need to be written at a time. PASSION provides routines to write sections of the array with strides in each dimension. Separate routines are provided for writing array sections in the Local and Global Placement Models.

Local Placement Model. The routine PASSION_write_section is used to write array sections in the Local Placement Model.

    int PASSION_write_section(PFILE *PFilePtr,
                              OCAD *OCADptr, char *Array, int Index,
                              int AccessArray)

The parameters are a parallel file pointer, pointer to the OCAD, buffer in memory containing the section, coordinates of the starting location of the section in the buffer, and the section to be written, specified by an access descriptor (see Section 4). Data sieving is used to write strided sections. This routine writes the array section from the specified location in the buffer
to the local array file. The shape of the section is retained. The section is assumed to be stored with unit stride in memory, but is written to the file with the specified stride.

Global Placement Model. The routine PASSION_global_write can be used to write array sections in the Global Placement Model. If the sections requested to be written by different processors have some elements in common, there is a potential data consistency problem. PASSION_global_write has been implemented such that if there are write requests from multiple processors to the same location, the data from the highest numbered processor is written to the file.

    int PASSION_global_write(PFILE *PFilePtr,
                             OCAD *OCADptr, char *Array, int Index,
                             int AccessArray, int nprocs)

The parameters are the same as for PASSION_write_section, with the addition of the number of processors, since this is a collective write operation. The Extended Two-Phase Method is used for writing sections.

6 Conclusions

Portable high-level interfaces such as the PASSION interface make it easier for the user to specify the I/O required in parallel applications. There is no standard high-level I/O interface at present, but we believe that the ideas used in PASSION and the experience gained in its development would help in the definition of such a standard.

The development of the PASSION library is an ongoing process. A version of the library has been available since February, and a further version will be released soon. We are also in the process of using the PASSION library for I/O in several real parallel applications and studying the performance benefits. Further information about PASSION, including the code, can be obtained from the URL http://www.cat.syr.edu/passion.html.

References

[1] A. Choudhary, R. Bordawekar, M. Harry, R. Krishnaiyer, R. Ponnusamy, T. Singh, and R. Thakur. PASSION: Parallel and Scalable Software for Input-Output. SCCS Technical Report, NPAC, Syracuse University, September. Also available as a CRPC Technical Report.

[2] A. Choudhary, R. Bordawekar, S. More, K. Sivaram, and R. Thakur. A User's Guide for the PASSION Runtime Library. SCCS Technical Report, NPAC, Syracuse University, February.

[3] P. Corbett, D. Feitelson, Y. Hsu, J. Prost, M. Snir, S. Fineberg, B. Nitzberg, B. Traversat, and P. Wong. MPI-IO: A Parallel I/O Interface for MPI. NAS Technical Report, NASA Ames Research Center, January.

[4] P. Corbett, D. Feitelson, J. Prost, and S. Baylor. Parallel Access to Files in the Vesta File System. In Proceedings of Supercomputing, November.

[5] E. DeBenedictis and J. del Rosario. nCUBE Parallel I/O Software. In Proceedings of the International Phoenix Conference on Computers and Communications, April.

[6] J. del Rosario, R. Bordawekar, and A. Choudhary. Improved Parallel I/O via a Two-Phase Runtime Access Strategy. In Proceedings of the Workshop on I/O in Parallel Computer Systems at IPPS, April.

[7] N. Galbreath, W. Gropp, and D. Levine. Applications-Driven Parallel I/O. In Proceedings of Supercomputing, November.

[8] N. Nieuwejaar and D. Kotz. Low-level Interfaces for High-level Parallel I/O. In Proceedings of the Third Annual Workshop on I/O in Parallel and Distributed Systems, April.

[9] K. Seamons and M. Winslett. An Efficient Abstract Interface for Multidimensional Array I/O. In Proceedings of Supercomputing, November.

[10] R. Thakur. Runtime Support for In-Core and Out-of-Core Data-Parallel Programs. PhD thesis, Dept. of Electrical and Computer Engineering, Syracuse University, May.

[11] R. Thakur, R. Bordawekar, A. Choudhary, R. Ponnusamy, and T. Singh. PASSION Runtime Library for Parallel I/O. In Proceedings of the Scalable Parallel Libraries Conference, October.

[12] R. Thakur and A. Choudhary. Collective I/O Using an Extended Two-Phase Method with Dynamic Partitioning. SCCS Technical Report, NPAC, Syracuse University, March.