
Syracuse University SURFACE

Northeast Parallel Architecture Center, College of Engineering and Computer Science

1994

A Study of Software Development for High Performance Computing

Manish Parashar Syracuse University

Salim Hariri Syracuse University

Tomasz Haupt Syracuse University

Geoffrey C. Fox Syracuse University

Follow this and additional works at: https://surface.syr.edu/npac

Part of the Commons

Recommended Citation Parashar, Manish; Hariri, Salim; Haupt, Tomasz; and Fox, Geoffrey C., "A Study of Software Development for High Performance Computing" (1994). Northeast Parallel Architecture Center. 71. https://surface.syr.edu/npac/71

This Article is brought to you for free and open access by the College of Engineering and Computer Science at SURFACE. It has been accepted for inclusion in Northeast Parallel Architecture Center by an authorized administrator of SURFACE. For more information, please contact [email protected].

A Study of Software Development for High Performance Computing

Manish Parashar, Salim Hariri, Tomasz Haupt, and Geoffrey Fox

Northeast Parallel Architectures Center

Syracuse University

Published in: Programming Environments for Massively Parallel Distributed Systems, Birkhäuser Verlag, Basel, Switzerland, August 1994.

Also presented at: IFIP WG Working Conference on Programming Environments for Massively Parallel Distributed Systems.

Abstract

Software development in a High Performance Computing (HPC) environment is non-trivial and requires a thorough understanding of the application and the architecture. The objective of this paper is to study the software development process in a high performance computing environment and to outline the stages typically encountered in this process. Support required at each stage is also highlighted. The modeling of stock option pricing is used as a running example in the study.

Introduction

Software development in any High Performance Parallel/Distributed Computing (HPC) environment is a non-trivial process and requires a thorough understanding of the application and the architecture. This is apparent from the fact that applications currently achieve only a fraction of peak available performance [Zor]. HPC software development requires the developer to resolve and tune a large number of available design options. For example, during the course of software development, the developer is required to select the optimal hardware configuration for a particular application, the best decomposition and mapping of the problem onto the selected hardware configuration, the best communication and synchronization strategy to be used, etc. Using conventional techniques, this would require extensive experimentation, data collection, and post-processing. The set of reasonable alternatives that have to be evaluated is very large, and selecting the best among these is a formidable task. As a result, the exploitation of the vast potential of HPC systems will largely be governed by the availability of suitable tools and application development environments to support application developers.

The objective of this paper is to study the software development process in a high performance computing environment and to outline the stages encountered. Further, the nature of supporting tools that can assist the developer at each stage is identified. Parallel modeling of stock option pricing is used as an illustrative example in the study. The rest of the document is organized as follows. The next section presents the study of the HPC software development process and outlines its stages in separate subsections. The final section presents some conclusions.

HPC Software Development

The HPC software development process is described as a set of stages which correspond to the phases typically encountered by a developer. At each stage, a set of support tools which can assist the developer is identified. The stages can be viewed as a set of filters in cascade (see Figure 1), forming a development pipeline. The input to this system of filters is the application description and specification, which is generated from the application itself (if it is a new problem) or from existing sequential code (porting of dusty decks). The final output of the pipeline is a running application. Feedback loops present at some stages signify stepwise refinement and tuning. Related discussions pertaining to development environments and spanning parts of the software development process can be found in [BM], [BBDK], and [RL]. A survey of existing tools and techniques corresponding to the development stages is presented in [PHHFa]. The stages in the HPC software development process are described in the following sections. Parallel modeling of stock option pricing [MCV+] is used as an illustrative running example in the discussion.

Parallel Modeling of Stock Option Pricing

Stock options are contracts that give the holder of the contract the right to buy or sell the underlying stock at some time in the future for an agreed upon striking (or exercise) price. Option contracts are traded just as stocks, and models that quickly and accurately predict their prices are valuable to traders. Stock option pricing models estimate the price for an option contract based on historical market trends and current market information. The model requires three classes of inputs: (1) Market Variables, which include the current stock price, call price, exercise price, and time to maturity. (2) Model Parameters, which include the volatility of the asset (variance of the asset price over time), variance of the volatility, and the correlation between asset price and volatility. These parameters cannot be directly observed and must be estimated from historical data. (3) User Inputs, which specify the nature of the required estimation (e.g. American/European call, constant/stochastic volatility), time of dividend payoff, and other constraints regarding acceptable accuracy and running times. A number of option pricing models have been developed using varied approaches (e.g. non-stochastic analytic models, Monte Carlo simulation models, binomial models, binomial models with forced recombination, etc.). Each of these models involves a set of tradeoffs in the nature and accuracy of the estimation and suits different user requirements. In addition, these models make varied demands in terms of programming models and computing resources.

[Figure 1: The HPC Software Development Process. A cascade of filters: dusty decks or a new application pass through an Application Specification Filter to yield the Application Specification; the Application Analysis Stage produces the Parallelization Specification; the Application Development Stage (Algorithm Development, System Level Mapping, Machine Level Mapping, Implementation/Coding, and Design Evaluator modules) produces the Parallelized Structure; the Compile-Time/Run-Time Stage, the Evaluation Stage (Evaluation Specification, Evaluation Recommendation), and the Maintenance/Evolution Stage follow.]

Inputs

The HPC software development process presented in this section addresses new application development as well as the porting of existing applications (dusty decks) to HPC environments. The input to the development pipeline is the application specification in the form of a functional flow description, which is a very high-level flow of the application outlining the sequence of functions to be performed. Each node (termed a functional module) in the functional flow diagram is a black box and contains information about (1) its inputs, (2) the function to be performed, (3) the desired outputs, and (4) the resource requirements at each node. The application specification can be thought of as corresponding to the user requirements document in traditional life-cycle models.

In the case of new applications, the inputs are generated from the textual description of the problem and its requirements. In the case of dusty decks (code porting), the developer is required to analyze the existing source code. In either case, expert system based tools and intelligent editors, both equipped with a knowledge base to assist in analyzing the application, are required. In Figure 1, these tools are included in the Application Specification Filter module.

The stock price modeling application comes under the first class of applications, i.e. new applications. The application specification, based on the textual description presented in the preceding section on stock option pricing, is shown in Figure 2. It consists of three functional modules: (1) The input module, which accepts user specifications, market information, and historical data and generates the three classes of inputs required by the model. (2) The estimation module, which consists of the actual model and generates the stock option pricing estimates. (3) The output module, which provides a graphical display of the estimation to the user. The feedback from the output module to the input module represents tuning of the user specification based on the output displayed.
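The functional flow description can be captured in a small data structure. The following Python sketch is illustrative only: the class names `FunctionalModule` and `ApplicationSpecification`, the field names, and the resource strings are assumptions introduced here and are not part of the original study.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FunctionalModule:
    """A black-box node of the functional flow description (hypothetical layout)."""
    name: str
    inputs: list[str]
    function: str
    outputs: list[str]
    resources: list[str] = field(default_factory=list)


@dataclass
class ApplicationSpecification:
    """Functional modules plus the directed flows between them."""
    modules: dict[str, FunctionalModule] = field(default_factory=dict)
    flows: list[tuple[str, str]] = field(default_factory=list)

    def add_module(self, module: FunctionalModule) -> None:
        self.modules[module.name] = module

    def add_flow(self, source: str, target: str) -> None:
        # Reject flows that mention a module that was never declared.
        for name in (source, target):
            if name not in self.modules:
                raise KeyError(f"unknown functional module: {name}")
        self.flows.append((source, target))

    def successors(self, name: str) -> list[str]:
        return [target for source, target in self.flows if source == name]


# The three functional modules of the stock option pricing example.
spec = ApplicationSpecification()
spec.add_module(FunctionalModule(
    "input",
    inputs=["user specification", "market information", "historical data"],
    function="generate the three classes of model inputs",
    outputs=["market variables", "model parameters", "user inputs"],
    resources=["terminal I/O", "disk I/O"],          # assumed
))
spec.add_module(FunctionalModule(
    "estimation",
    inputs=["market variables", "model parameters", "user inputs"],
    function="run the option pricing model",
    outputs=["price estimates"],
    resources=["compute"],                            # assumed
))
spec.add_module(FunctionalModule(
    "output",
    inputs=["price estimates"],
    function="display the estimation graphically",
    outputs=["display"],
    resources=["graphics"],                           # assumed
))
spec.add_flow("input", "estimation")
spec.add_flow("estimation", "output")
spec.add_flow("output", "input")  # feedback: tuning of the user specification
```

Keeping the feedback loop as an ordinary flow lets later stages decide whether to treat it as a data dependency or as a tuning loop.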

Application Analysis Stage

The first stage of the HPC software development pipeline is the application analysis stage. The input to this stage is the application specification as described in the preceding section. The function of this stage is to thoroughly analyze the application with the sole objective of achieving the most efficient implementation. The problems dealt with in this stage are (1) the module creation problem, i.e. identification of tasks which can be executed in parallel, (2) the module classification problem, i.e. identification of standard modules, and (3) the module synchronization problem, i.e. analysis of mutual interdependencies. The output of this stage is a detailed process flow graph called the Parallelization Specification, where the nodes represent functional components and the edges represent interdependencies. This stage corresponds to the design phase in standard software life-cycle models, and its output corresponds to the design document. Tools which can assist the user at this stage of software development are (1) smart editors which can interactively generate directed graph models from the application specifications, (2) intelligent tools with learning capabilities which can use the directed graphs to analyze dependencies, identify potentially parallelizable modules, and attempt to classify the functional modules into standard modules, and (3) problem specific tools equipped with a set of transformations and strategies applicable to the specific problem.

[Figure 2: Stock Option Pricing Model, Application Specifications]

[Figure 3: Stock Option Pricing Model, Parallelization Specifications]

The parallelization specification for the running example is shown in Figure 3. The Input functional module is subdivided into two functional components: (1) analyzing historical data and generating model parameters, and (2) accepting market information and user inputs to generate market variables and estimation specifications. The two components can be executed concurrently. The Estimation module is identified as a standard computational module and is retained as a single functional component. The Output functional module consists of two independent functional components: (1) rendering the estimated information onto a graphical display, and (2) writing it onto disk for subsequent analysis.
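As a rough illustration of the module synchronization problem, the parallelization specification of the running example can be written as a directed graph and grouped into levels whose components have no mutual dependencies. The sketch below uses a standard topological levelling (Kahn's algorithm); the component names are hypothetical labels chosen here, and the tuning feedback loop is deliberately left out so that the graph is acyclic.

```python
def concurrent_levels(nodes, edges):
    """Group nodes into levels; nodes within one level can run concurrently."""
    indegree = {node: 0 for node in nodes}
    successors = {node: [] for node in nodes}
    for source, target in edges:
        successors[source].append(target)
        indegree[target] += 1

    level = sorted(node for node in nodes if indegree[node] == 0)
    levels = []
    while level:
        levels.append(level)
        ready = []
        for node in level:
            for target in successors[node]:
                indegree[target] -= 1
                if indegree[target] == 0:
                    ready.append(target)
        level = sorted(ready)

    if sum(len(group) for group in levels) != len(nodes):
        raise ValueError("dependency cycle: no valid ordering exists")
    return levels


# Functional components of the stock option pricing example (names assumed).
components = [
    "input_history",   # historical data -> model parameters
    "input_market",    # market information and user inputs -> market variables
    "estimation",
    "output_display",  # rendering onto a graphical display
    "output_disk",     # writing to disk for subsequent analysis
]
dependencies = [
    ("input_history", "estimation"),
    ("input_market", "estimation"),
    ("estimation", "output_display"),
    ("estimation", "output_disk"),
]
levels = concurrent_levels(components, dependencies)
```

Here the two input components share the first level and the two output components share the last, which matches the concurrency described for the running example.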

Application Development Stage

The application development stage receives as its input the Parallelization Specification and produces the Parallelized Structure, which can then be compiled and executed. This stage is made up of five modules: (1) Algorithm Development Module, (2) System Level Mapping Module, (3) Machine Level Mapping Module, (4) Implementation/Coding Module, and (5) Design Evaluator Module. It should be noted, however, that these modules are not executed in any fixed sequence or a fixed number of times. There exists instead a feedback system from each module to the other modules through the design evaluator module. This allows the development as well as the tuning to proceed in an iterative manner using stepwise refinement. The modules are described below.

Algorithm Development Module

The function of the algorithm development module is to assist the developer in identifying functional components in the parallelization specification and selecting appropriate algorithmic implementations. The input information to this module includes (1) the classification and requirements of the components specified in the parallelization specification, (2) hardware configuration information, and (3) mapping information generated by the system level mapping module. It then uses this information to select the best algorithmic implementation and the corresponding implementation template from its database. The algorithm development module uses the services of the design evaluator module to select between possible algorithmic implementations. Tools needed during this phase include an intelligent algorithm development environment (ADE) equipped with a database of optimized templates for different algorithmic implementations, an evaluation of the requirements of these templates, and an estimation of their performance on different platforms.

The algorithm chosen to implement the Estimation Component of the stock option pricing model (shown in Figure 3) depends on the nature of the estimation to be performed (constant/stochastic volatility, American/European calls/puts, dividend payoff time, etc.) and the accuracy/time constraints. For example, models based on Monte Carlo simulation provide high accuracy. However, these models are computationally intensive and slow, and therefore cannot be used in real-time systems. Further, they are not suitable for American calls/puts when early dividend payoff is possible. Binomial models are less accurate than Monte Carlo models but are more tractable and can handle early exercise. Models using constant volatility (as opposed to treating volatility as a stochastic process) lack accuracy but are simple and easy to compute. The algorithmic implementations of the input and output functional components must be capable of handling terminal and disk I/O at rates specified by the time constraint parameters. Further, the output display must provide all information required by the user.
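To make the early exercise point concrete, the following is a minimal sketch of a binomial lattice for a put option. It assumes the textbook Cox-Ross-Rubinstein parameterization with constant volatility and no dividends, which is a simplification chosen here for brevity and is not necessarily the model implemented in [MCV+]; the function name and argument names are likewise hypothetical.

```python
import math


def binomial_put(spot, strike, rate, sigma, maturity, steps, american):
    """Price a put on a recombining binomial lattice (constant volatility)."""
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    growth = math.exp(rate * dt)
    p = (growth - down) / (up - down)   # risk-neutral probability of an up move
    discount = 1.0 / growth

    # Payoffs at maturity; index j counts the number of up moves.
    values = [
        max(strike - spot * up ** j * down ** (steps - j), 0.0)
        for j in range(steps + 1)
    ]

    # Backward induction: each node depends only on its two successors.
    for step in range(steps - 1, -1, -1):
        for j in range(step + 1):
            value = discount * (p * values[j + 1] + (1.0 - p) * values[j])
            if american:
                # Early exercise: the holder may take the payoff immediately.
                exercise = strike - spot * up ** j * down ** (step - j)
                value = max(value, exercise)
            values[j] = value
    return values[0]


european = binomial_put(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=False)
american = binomial_put(100.0, 100.0, 0.05, 0.2, 1.0, 200, american=True)
```

The only difference between the two calls is the comparison against the immediate exercise value at every node, which is why lattice models accommodate early exercise naturally.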

System Level Mapping Module

The function of the system level mapping module is to use the information provided by the algorithm development module to map the functional components of the application to the appropriate computing elements of a distributed (possibly heterogeneous) HPC environment. The objective is to map each functional component to the computing element that maximizes the performance of the application. Some data and load distribution issues may have to be resolved in this module. In addition, this module may also cluster functional component nodes specified in the parallelization specification to obtain a better mapping. The system level mapping module uses feedback from the design evaluator module to select between different mapping candidates. System level mapping can be accomplished in an interactive mapping environment equipped with intelligent tools for analyzing the requirements of the functional components and a knowledge base consisting of analytic benchmarks for the different computing elements and interconnection media in the HPC environment.

The models for stock option pricing have been efficiently implemented on architectures like the CM and the DECmpp [MCV+]. Thus, an appropriate mapping for the estimation functional component in the parallelization specification in Figure 3 is an SIMD architecture. The input and output interfaces (Input/Output Component A) require graphics capability with support for high speed rendering (output display) and must be mapped to appropriate graphics stations. Finally, Input/Output Component B requires high speed disk I/O and must be mapped to an I/O server with such capabilities.
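A simple way to picture system level mapping is as a constrained selection: each functional component is placed on the capable computing element with the lowest estimated execution time. The sketch below is an assumption-laden illustration; the element names, capability labels, and timing figures are invented placeholders standing in for the analytic benchmarks of a knowledge base, and components are mapped independently, which ignores clustering and communication costs.

```python
def map_components(requirements, elements, estimated_time):
    """Assign each component to the fastest element offering what it needs."""
    mapping = {}
    for component, needed in requirements.items():
        candidates = [
            element
            for element, capabilities in elements.items()
            if needed <= capabilities                  # subset test
            and (component, element) in estimated_time
        ]
        if not candidates:
            raise ValueError(f"no computing element can host {component}")
        mapping[component] = min(
            candidates, key=lambda element: estimated_time[(component, element)]
        )
    return mapping


# Hypothetical requirements of the running example's components.
requirements = {
    "estimation": {"data_parallel"},
    "display": {"graphics"},
    "archive": {"fast_disk"},
}
# Hypothetical computing elements and their capabilities.
elements = {
    "simd_machine": {"data_parallel"},
    "workstation_cluster": {"data_parallel", "graphics"},
    "graphics_station": {"graphics"},
    "io_server": {"fast_disk"},
}
# Placeholder benchmark estimates in seconds (not measured values).
estimated_time = {
    ("estimation", "simd_machine"): 1.0,
    ("estimation", "workstation_cluster"): 8.0,
    ("display", "workstation_cluster"): 3.0,
    ("display", "graphics_station"): 1.0,
    ("archive", "io_server"): 2.0,
}
mapping = map_components(requirements, elements, estimated_time)
```

With these placeholder figures the estimation component lands on the SIMD machine, the display on the graphics station, and the archive on the I/O server, mirroring the mapping described for the running example.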

Machine Level Mapping Module

The machine level mapping module performs the mapping of the functional components onto the processors of the computing elements. This stage resolves issues like data partitioning, load distribution, control distribution, etc., and makes transformations specific to that computing element. It uses the feedback from the design evaluator module to select between possible alternatives. Machine level mapping can be accomplished in an interactive mapping environment similar to that described for the system level mapping module, but equipped with information pertaining to the individual computing elements of a specific system.

The performance of the stock option pricing models is very sensitive to the layout of data onto the processing elements. The optimal layout is dictated by the input parameters (e.g. time of dividend payoff, terminal time, etc.) and by the specification of the architecture onto which the component is mapped. For example, in the binomial model, the continuous time processes for stock price and volatility are represented as discrete up/down movements forming a binary lattice. Such a lattice is generally implemented as asymmetric arrays which are distributed onto the processing elements. It has been found that the default mapping of these arrays (i.e. in two dimensions) on architectures like the DECmpp leads to poor load balancing and performance, especially for extreme values of the dividend payoff time. Further, the performance in case of such a mapping is very sensitive to this value, and the mapping has to be modified for each set of inputs. Hence, in this case, it is favorable to explicitly map them as one dimensional arrays. This is done by the machine level mapping module.
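Why the layout of an asymmetric (triangular) lattice matters can be seen with a small load calculation. The sketch below does not reproduce the two-dimensional versus one-dimensional mappings studied in [MCV+]; as a stated simplification, it compares two one-dimensional layouts (block and cyclic) of the lattice indices and measures the ratio of the busiest processor's load to the average load. All function names are hypothetical.

```python
def node_work(steps):
    """Updates per lattice index: index j is active at every time step t >= j."""
    return [steps - j + 1 for j in range(steps + 1)]


def imbalance(work, owner, processors):
    """Ratio of the maximum processor load to the mean load (1.0 is ideal)."""
    loads = [0] * processors
    for index, amount in enumerate(work):
        loads[owner(index)] += amount
    return max(loads) / (sum(loads) / processors)


def block_owner(count, processors):
    """Contiguous chunks of indices per processor."""
    size = -(-count // processors)      # ceiling division
    return lambda index: index // size


def cyclic_owner(processors):
    """Indices dealt out round-robin."""
    return lambda index: index % processors


work = node_work(63)                    # 64 lattice indices
block = imbalance(work, block_owner(len(work), 4), 4)
cyclic = imbalance(work, cyclic_owner(4), 4)
```

Because low lattice indices stay active for more time steps than high ones, a contiguous block layout overloads the first processor, while a round-robin layout spreads the triangular workload far more evenly. The same kind of calculation can be repeated for any candidate layout before committing to it.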

Implementation/Coding Module

The function of the implementation/coding module is to handle all code generation and perform the code filling of selected templates so as to produce parallel code which can then be compiled and executed on the target computer architecture. This module incorporates all machine specific transformations, optimized libraries, and codes, handles the introduction of calls to communication and synchronization routines, and takes care of the distribution of data among the processing elements. It also handles any input/output redirection that may be required.

With regard to the pricing model application, the implementation/coding module is responsible for introducing the machine specific communication routines. For example, the binary estimation model makes use of the end-off shift function for its nearest-neighbor communication. The corresponding function calls in C (CM) or MPL (DECmpp) are introduced by this module. A possible machine specific optimization that can be introduced by this module is to reduce communication by making use of in-processor arrays. This optimization can improve performance by about two orders of magnitude [MCV+].
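The role of the end-off shift in nearest-neighbor communication can be sketched in a few lines. The Python below emulates the shift on an ordinary list; on a data-parallel machine the same operation would move each value to a neighboring processing element. The function names and the use of a plain list are assumptions made for illustration, and the shift direction follows the convention of the Fortran 90 `EOSHIFT` intrinsic, where a positive shift brings in the higher-indexed neighbor.

```python
def eoshift(values, shift, boundary=0.0):
    """End-off shift: result[i] = values[i + shift]; vacated slots get boundary."""
    count = len(values)
    return [
        values[i + shift] if 0 <= i + shift < count else boundary
        for i in range(count)
    ]


def induction_step(values, p, discount):
    """One lattice step: combine each node with its upper neighbor."""
    upper = eoshift(values, 1)          # value of the (j + 1) neighbor
    return [
        discount * (p * up_value + (1.0 - p) * own_value)
        for up_value, own_value in zip(upper, values)
    ]


shifted_up = eoshift([1, 2, 3, 4], 1)
shifted_down = eoshift([1, 2, 3, 4], -1)
stepped = induction_step([0.0, 2.0, 4.0], 0.5, 1.0)
```

Values are shifted off the end rather than wrapped around, so the last entry of `stepped` combines a real value with the boundary fill; in a lattice computation that position lies outside the active range of the next time step and is simply ignored.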

Design Evaluator Module

The design evaluator module is a critical component of the application development stage. Its function is to assist the developer in evaluating different options available to each of the other modules and identifying the option that provides the best performance. It receives information about the hardware configuration, the application structure, the requirements of the selected algorithms, and the mappings. This input information is then used to estimate the performance of the application on the target configuration. Further, it provides insight into the computation and communication costs, the existing idle times, and the overheads. This information can be used by the other modules to identify regions where further refinement or tuning is required. The key features of this module are (1) the ability to provide evaluations with the desired accuracy, with minimum resource requirements, and within a reasonable amount of time, (2) the ability to automate the evaluation process, and (3) the ability to perform the evaluation within an integrated workstation environment without running the application on the target computers. Support applicable to this module consists primarily of performance prediction and estimation tools. Simulation approaches can also be used to achieve some of the required functionality. A novel approach which uses interpretive techniques to realize a performance prediction framework that can meet these requirements is presented in [PHHFb].
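The kind of comparison the design evaluator performs can be illustrated with a deliberately crude analytic cost model: computation time scales with the operation count divided by the number of processors, and communication time is a per-message latency plus a per-byte cost. This is not the interpretive framework of [PHHFb]; the model, the class names, and every number below are assumptions chosen only to show how design options might be ranked without running the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignOption:
    name: str
    operations: float          # total arithmetic operations
    processors: int
    messages: int              # messages sent per processor
    bytes_per_message: float


@dataclass(frozen=True)
class MachineModel:
    seconds_per_operation: float
    latency: float             # seconds per message
    seconds_per_byte: float


def estimate(option, machine):
    """Break the predicted run time into computation and communication."""
    computation = (
        option.operations * machine.seconds_per_operation / option.processors
    )
    communication = option.messages * (
        machine.latency + option.bytes_per_message * machine.seconds_per_byte
    )
    return {
        "computation": computation,
        "communication": communication,
        "total": computation + communication,
    }


def best_option(options, machine):
    return min(options, key=lambda option: estimate(option, machine)["total"])


# Placeholder machine and design options (illustrative numbers only).
machine = MachineModel(seconds_per_operation=1e-7, latency=1e-4, seconds_per_byte=1e-8)
options = [
    DesignOption("coarse", 1e9, 16, 1_000, 1024.0),
    DesignOption("fine", 1e9, 64, 20_000, 64.0),
    DesignOption("aggregated", 1e9, 64, 2_000, 640.0),
]
chosen = best_option(options, machine)
breakdown = estimate(chosen, machine)
```

Reporting the breakdown alongside the total is what lets the other modules see whether a design is limited by computation or by communication.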

Compile-Time/Run-Time Stage

The compile-time/run-time stage handles the task of executing the parallelized application generated by the development stage to produce the required output. The input to this stage is the parallelized source code (parallelized structure). The compile-time portion of this stage consists of a set of cross compilers for the computing elements and tools for scheduling and allocation. The run-time portion of this stage handles run-time functions like scheduling, dynamic load balancing, migration, irregular communications, etc. It also enables the user to non-intrusively instrument the code for profiling and debugging and allows checkpointing for fault tolerance. During the execution of the application, it accepts outputs from the different computing elements and directs them for proper handling. It intercepts error messages generated and provides proper interpretation.

Evaluation Stage

In the evaluation stage, the developer retrospectively evaluates the design choices made during the design process and looks for ways to improve the performance. The evaluation stage performs a thorough evaluation of the execution of the entire application, detailing communication and computation times, synchronization overheads, and existing idle times at every execution level (application level, node level, procedure level, etc.). It uses this evaluation to identify regions in the implementation where performance improvement is possible. Further, it allows a cost-effective evaluation (in terms of time and resources) of the application for a representative input set, as well as of the effect of various run-time parameters like system load and network contention on performance. The scalability of the application with machine and problem size is also evaluated. The key requirement of this stage is the ability to provide the desired accuracy and granularity of evaluation while maintaining tractability and non-intrusiveness. Support applicable to the evaluation stage includes different analytic tools, monitoring tools, simulation tools, and prediction/estimation tools.
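Scalability with machine size is commonly summarized by speedup and parallel efficiency computed from measured run times. The sketch below uses these standard definitions; the function name and the timing figures are placeholders introduced here, not measurements from the study.

```python
def scalability(times):
    """Map processor count to (speedup, efficiency) relative to one processor.

    `times` maps a processor count to a measured run time in seconds and
    must contain an entry for a single processor.
    """
    baseline = times[1]
    return {
        processors: (baseline / elapsed, baseline / (elapsed * processors))
        for processors, elapsed in sorted(times.items())
    }


# Placeholder measurements (seconds).
report = scalability({1: 100.0, 4: 30.0, 16: 12.5})
```

A falling efficiency at larger processor counts points to regions where communication, synchronization, or idle time dominates and where further tuning may pay off.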

Maintenance/Evolution Stage

In addition to the above described stages encountered during the development and execution of HPC applications, there is an additional stage in the life-cycle of this software which involves its maintenance and evolution. Maintenance includes monitoring the operation of the software and ensuring that it continues to meet its specifications. It involves detecting and correcting bugs as they surface. The maintenance stage also handles modifications needed to incorporate changes in the system configuration. Software evolution deals with improving the software, adding additional functionality, incorporating new optimizations, etc. Another aspect of evolution is the development of more efficient algorithms and corresponding algorithmic templates and the incorporation of new hardware architectures. To support such development, the maintenance/evolution stage provides tools for the rapid prototyping of hardware and software and for evaluating the new configuration and designs without having to implement them. Other support required during this stage includes tools for monitoring the performance and execution of the software, fault detection and recovery tools, and system configuration and configuration evaluation tools.

Conclusions

Software development in any Parallel/Distributed environment is a non-trivial process and requires a thorough understanding of the application and the architecture. This is apparent from the fact that applications are currently able to achieve only a fraction of peak available performance. This paper studies the software development process in a High Performance Computing environment. It describes the stages typically involved in this process and outlines the support required at each stage. The development of a parallel model for stock option pricing is used as a running example.

References

[BBDK] J. E. Boillat, H. Burkhart, K. M. Decker, and P. G. Kropf. Parallel Computing in the 1990's: Attacking the Software Problem. Physics Reports (Review Section of Physics Letters).

[BM] Victor R. Basili and John D. Musa. The Future Engineering of Software: A Management Perspective. IEEE Computer, September.

[MCV+] Kim Mills, Gang Cheng, Michael Vinson, Sanjay Ranka, and Geoffrey C. Fox. Software Issues and Performance of a Parallel Model for Stock Option Pricing. Proceedings of the Australian Supercomputing Conference, Melbourne, Australia, December.

[PHHFa] Manish Parashar, Salim Hariri, Tomasz Haupt, and Geoffrey C. Fox. An Integrated Software Development Model for Heterogeneous High Performance Computing. Technical Report SCCS, Northeast Parallel Architectures Center, Syracuse University, Syracuse, NY, April.

[PHHFb] Manish Parashar, Salim Hariri, Tomasz Haupt, and Geoffrey C. Fox. An Interpretive Framework for Application Performance Prediction. Proc. of the Intl. Conference on Parallel and Distributed Systems, December.

[RL] Lucian Russell and R. N. C. Lightfoot. Software Development Issues for Parallel Processing. Proceedings of the Annual International Computer Software and Applications Conference.

[Zor] Glenn Zorpette. Teraflops Galore. IEEE Spectrum, September.