University of Vienna

Institute Report

Institute for Software Science

University of Vienna

Vienna

Preface

This report describes the scientific activities of the Institute for Software Science (the former Institute for Software Technology and Parallel Systems) at the University of Vienna for the two-year period from January through December

The primary objectives of the Institute are

• to conduct research in programming languages, compilers, programming environments, and software tools that support the user in the process of solving problems on high performance computing systems,

• to actively contribute to a transfer of technology by participating in joint development projects with application developers and industry, and

• to disseminate knowledge in the fields of parallel computing and software technology.

Research at the Institute included the continuation of work related to High Performance Fortran (HPF), including the specification of the HPF+ language and the development of the Vienna Fortran Compiler (VFC), as well as related performance analysis and prediction tools such as SCALA and P3T. We also initiated new activities in a number of fields, such as profile-guided optimization, parallel data mining in large scientific databases, interoperability between Opus/HPF and Java, and programming models for massively parallel processor-in-memory arrays. An important new focus of work became the Special Research Program F AURORA of the Austrian Science Fund (FWF), which started in April and reached the end of its first phase in early. AURORA is a joint project of the Institute with application designers from various disciplines, including financial optimization, quantum mechanics, and semiconductor simulation.

The European Centre of Excellence for Parallel Computing (VCPC) is a department of the Institute that was founded in with the help of a European Union grant and additional support from the Austrian government and the Austrian Science Fund. Its main focus is a transfer of technology from academia to industry in the area of High Performance Computing. During the reporting period, the VCPC participated in a number of ESPRIT projects of the EU's Fourth Framework Programme and also contributed to AURORA.

Research and development at the Institute were performed in cooperation with many institutions across the world. Emphasis was placed on strengthening the collaboration with academic and industrial application partners, in particular in the context of AURORA, various ESPRIT projects, and direct cooperation projects with industry.

We would like to express our thanks to the Austrian Ministry of Education, Science and Culture, the Austrian Research Fund (FWF), the European Commission, and the University of Vienna for their generous support and invaluable advice and guidance. The cooperation with the Institute for Computer Applications in Science and Engineering (ICASE), NASA Langley Research Center, in Hampton, Virginia, has played an important role in our work. A new cooperation with the Center for Advanced Computing Research (CACR) at the California Institute of Technology in Pasadena, California, led to a new research direction targeting an emerging class of massively parallel architectures. Special thanks are due to our partners in the Industrial Affiliates Program, Fujitsu and NEC.

I am grateful to the Rector of the University of Vienna and the Dean of the School of Social Studies and Economics, as well as to the many colleagues at the University of Vienna who have supported us over the past years.

Last but not least, I would like to thank all members of the Institute for their hard work and continued enthusiasm.

Vienna, July

Hans P. Zima

Acknowledgement: I thank Peter Brezany and Ian Glendinning for their contributions to editing this Report.

Our Coordinates

Institute for Software Science, University of Vienna
Liechtensteinstrasse
A Vienna, Austria
Telephone
Fax
WWW Home Page: http://www.par.univie.ac.at
Anonymous FTP Server: ftp.par.univie.ac.at

European Centre of Excellence for Parallel Computing at Vienna (VCPC)
Liechtensteinstrasse
A Vienna, Austria
Telephone
Fax
WWW Home Page: http://www.vcpc.univie.ac.at
Anonymous FTP Server: ftp.vcpc.univie.ac.at

Contents

Major Research Projects
The AURORA Special Research Program
Overview
Scientific Concept and Goals
Results of Phase
The ESPRIT Project HPF+
APART: ESPRIT Working Group on Automatic Performance Analysis: Resources and Tools
The Industry Cooperation Project ADVANCE

Languages, Compilers, and Runtime Systems for Scientific Computing
The HPF+ Language
VFC
Distributed High Performance Computing with Opus and OpusJava
Profile Guided Optimizations
Advanced Communication Optimizations for Distributed and Parallel Systems
Parallel Access to Persistent Arrays from HPF Applications
Mining Large Spatial Databases With Parallel Processing
Macroservers: An Execution Model for Processor-in-Memory Arrays
Processor-in-Memory
The Macroserver Model

Tools
Introduction
P3T
SCALA
SPIDER

European Centre of Excellence for Parallel Computing at Vienna (VCPC)
Introduction
Computing Facilities
CS
PC cluster
The ESPRIT Project ATTN and Associated Projects
The ESPRIT Project FRAME
The ESPRIT Project FLOAT
The ESPRIT Project HPCNCAST
The ESPRIT Project APPEAL
The ESPRIT Project FITS
The ESPRIT Project VICAR
The FSP Project Mathematical Methods and Tools for Digital Image Processing
The ACTS Project AC DIANE

Teaching
University Courses
Other Courses
Diploma Theses

Publications
Chapters in Books
Refereed Publications
Technical Reports

Editorial Activities
Program Committee Memberships
PhD and Habilitation Theses
Exhibitions and Conferences

Lectures, Research Visits, and Visitors
Lectures and Research Visits
Visitors and Guest Lectures

Faculty and Staff

Bibliography

Chapter

Major Research Projects

The AURORA Special Research Program

Hans Zima

Overview

The Special Research Program AURORA (Advanced Models, Applications and Software Systems for High Performance Computing) is an academic interdisciplinary ten-year research program supported by the Austrian Science Fund (FWF). AURORA is coordinated by our Institute and led by Hans Zima (Speaker) and Karl-Heinz Schwarz (Deputy Speaker). Phase started in April and was concluded in early. Below we give the list of projects participating in AURORA, together with the project heads and their affiliations.

Coordination Project: Hans Zima, Institute for Software Science, University of Vienna

Languages and Compilers for Scientific Computation: Hans Zima

Tools: Thomas Fahringer, Institute for Software Science, University of Vienna

Numerical Algorithms and Software for High Performance Computers: Christoph Überhuber, Institute for Numerical Mathematics, Vienna University of Technology

Parallel Algorithms for Dynamic Stochastic Optimization in Financial Planning: Georg Pflug, Department of Statistics, Operations Research and Computer Methods, University of Vienna

Quantum Mechanical Calculations of Solids with WIEN: Karl-Heinz Schwarz, Theoretical Chemistry at the Institute of Technical Electrochemistry, Vienna University of Technology

Parallelization of Program Packages for the Simulation of Semiconductor Processes and Devices: Erasmus Langer, Institute for Microelectronics, Vienna University of Technology

Services and Systems: Barbara Chapman / Willy Weisz, VCPC, University of Vienna

Scientific Concept and Goals

Research in AURORA focuses on high-level software for High Performance Computing (HPC) systems. The major goals include

• pushing the state of the art in high-level programming paradigms, languages, and programming environments for HPC systems,

• studying and developing new models, applications, and algorithms for HPC,

• implementing advanced application benchmarks efficiently in parallel across a range of HPC systems, and

• contributing actively to international research and standardization efforts.

Results of Phase

At the heart of AURORA lies the cooperation between language and tool developers on the one hand and algorithm and application designers on the other. We give below a brief overview of some major results obtained during this phase. A more detailed technical discussion of language, compiler, and tool development can be found in the chapters on languages, compilers, and runtime systems and on tools; information about the other projects can be retrieved via the AURORA web page.

Language, Compiler, and Tool Development

Language, compiler, and tool development focused on HPF. This work contributed to the definition of the HPF-2 de facto standard and resulted in the development of the HPF+ language, which addresses important performance issues for advanced applications (see the section on the HPF+ language). The Vienna Fortran Compiler (VFC) is a newly developed source-to-source translator mapping HPF+ to Fortran plus message passing (MPI). VFC focuses on the generation of efficient code for irregular computations and provides a new, highly advanced runtime system. A support system for parallel I/O was designed and implemented, covering checkpointing and out-of-core computations. Additional work in the language area went beyond the data-parallel SPMD paradigm represented by HPF: the object-based Opus language supports the task-parallel processing of HPF programs (see the section on Opus and OpusJava). We also studied the integration of HPF with thread-based systems for shared address spaces, in particular the integration of HPF with OpenMP.

In the context of HPF and VFC, a variety of software tools were developed (see the chapter on tools). SCALA is a portable instrumentation, measurement, and post-execution performance analysis system for distributed and parallel programs. The performance prediction tool P3T estimates the performance of regular HPF programs and offers support for experimentation with data distribution strategies. Further developments include a symbolic high-level debugger for HPF and a graphical user interface (GUI).

Algorithms and Applications

In addition to a project dealing with the design of numerical algorithms, AURORA includes three application groups working in the areas of financial planning, quantum mechanics, and semiconductor simulation.

The work in numerical algorithms focused on numerical linear algebra and fast Fourier transforms (FFTs), ranging from the provision of theoretical results and experimental schemes for the assessment of high-performance algorithms to the development of application software. In the field of numerical linear algebra, new reduction-based methods for solving generalized symmetric eigenproblems were developed. A multi-elimination method for solving large linear systems allows for a recursive reduction of the size of the systems to be solved and is based on a specially designed new reordering strategy for sparse matrices. New FFT algorithms with optimized utilization of multiply-add instructions were designed, resulting in a significant reduction of complexity. Based on a two-level recursive Kronecker product factorization, an approach was introduced to overlap all communication-intensive parts of four-step FFT algorithms (initial data distribution, matrix transposition, and final data collection) with computation.

The Financial Planning project developed a new model for implementing parallel algorithms for solving dynamic stochastic financial planning problems. The model covers securities and related cash flows, a pricing kernel, and optimization. The pricing kernel uses parallelized simulation to price complex and structured products. Several optimization problems have been implemented that use data-parallel codes based on interior point methods for solving stochastic linear programming models. Moreover, first results have been obtained by applying decomposition methods that require task-parallel codes.

The Quantum Mechanics project focused on WIEN, a quantum mechanical package for calculating materials properties and searching for new, higher performance materials. WIEN is used by more than groups across the world. Within AURORA, a coarse-grain parallel version has been developed, which runs very efficiently due to new algorithms for solving the general eigenvalue problem, the main numerical task. In addition, numerical kernels were explored using HPF and BLAS routines; the importance of level BLAS routines for achieving large speedup and improved performance was demonstrated.

The Semiconductor Simulation project focused on the study and implementation of a parallel version of a Monte Carlo code for the simulation of ion implantation (MCIMPL), which places extremely high demands on computing resources. The parallel code developed in Phase provides nearly linear speedup on a heterogeneous cluster of workstations.

The ESPRIT Project HPF+

Siegfried Benkner

The ESPRIT IV Long Term Research Project HPF+ (Optimizing HPF for Advanced Applications) started in January and was successfully concluded in April. The aims of this project were to improve the HPF language and related compilation technology by extending the functionality of HPF and developing compilation strategies based on the requirements of a set of advanced applications. Its objectives included

• the development of a set of advanced project benchmarks from the areas of computational fluid dynamics, weather prediction, and crash simulation,

• the specification of an extended HPF language, HPF+, which addresses the requirements of the project benchmarks,

• a contribution to the standardization effort for HPF-2,

• the extension and implementation of optimizing compiler technology for HPF+,

• the integration of the Measurements Description Evaluation and Analysis Tool (MEDEA) of the University of Pavia for performance analysis of the project benchmarks, and

• the evaluation of the new language and compiler technology via a comparison with implementations based on HPF and explicit message passing.

The consortium was composed of application designers (ESI, ECMWF, AVL, and NEC) and both academic and commercial language, compiler, and tool developers (the Universities of Vienna and Pavia, and NA Software). Our Institute managed the project and was in charge of the development of the HPF+ language and its implementation within the VFC compiler.

The project succeeded in demonstrating that HPF, with a small set of language extensions and an appropriate compiler and tool infrastructure, has the potential to be efficient for advanced industrial applications, sometimes approaching the performance of manually written message-passing code.

APART: ESPRIT Working Group on Automatic Performance Analysis: Resources and Tools

Thomas Fahringer

The efficient usage of parallel computers requires a strong involvement of the programmer in tuning the application. Current tools supporting this process collect performance data at runtime and analyze it with the help of sophisticated visualization tools. Although these tools make the analysis process more user-friendly, the learning overhead, as well as the overhead in applying these interactive tools, is too high.

The goal of APART is to bring together tool experts, parallel computer vendors, and software companies to discuss all issues related to the automation of performance analysis for a broad range of programming environments.

APART includes three European companies (FECIT, NEC, PALLAS), six European universities (Technical University of Dresden, University of Malaga, Victoria University of Manchester, Technical University of Munich, University of Pavia, and University of Vienna), one European research center (Forschungszentrum Jülich, ZAM), and three American universities (Louisiana State University, University of Oregon, University of Wisconsin). It is organized in three workpackages: requirements for automatic performance analysis, identification and formalization of knowledge, and implementation issues. The expected results of APART include

• the definition of a common terminology,

• a list of requirements for automatic performance analysis support based on current and future machines,

• a summary of issues and techniques for implementing an automatic performance analyzer interfacing current tools,

• a catalog of standard performance bottlenecks together with rules for their automatic detection in parallel programs on shared and distributed memory architectures and on clusters of SMPs, and

• insights into the applicability of the developed techniques to a broader range of other programming environments, such as metacomputing, distributed object-oriented programming, and internet-based intelligent agents.

The Industry Cooperation Project ADVANCE

Siegfried Benkner

ADVANCE is a joint project of the Institute and NEC Europe Ltd. The main objectives of this research cooperation include an adaptation and extension of the HPF+ halo concept for the optimization of irregular applications, for example simulation codes based on finite element meshes (FEM), and the development of HPF extensions for the support of high-level parallel programming on clusters of SMPs, in particular on clusters of NEC SX and SX vector supercomputers. These extensions will be implemented and evaluated within the VFC compiler developed at Vienna. The ADVANCE project is funded by the C&C Research Laboratories of NEC Europe Ltd., St. Augustin, Germany.

Chapter

Languages, Compilers, and Runtime Systems for Scientific Computing

The HPF+ Language

Siegfried Benkner

High Performance Fortran (HPF) defines a set of language extensions to Fortran to facilitate efficient parallel programming on a wide range of parallel architectures. The initial version of HPF, HPF-1, provided reasonable support for regular codes, but advanced algorithms based on highly irregular data structures could not be handled efficiently. HPF-2, the current version of HPF, defines a set of approved extensions which address some requirements of irregular applications by providing features for the distribution of data in an irregular manner and explicit control of load balancing. However, the study of advanced industrial codes revealed that, even with the approved extensions, certain information that may decisively influence a program's performance cannot be expressed in HPF. These limitations were the main motivation for the development of HPF+, an improved version of HPF.

The HPF+ development was performed within the long-term research project HPF+, funded by the European Union's ESPRIT program. HPF+ contributed to the new de facto standard HPF-2, resulting in a convergence of some features. However, HPF+ went beyond HPF-2 by providing specialized control for an efficient handling of complex scientific applications which involve constructs such as unstructured adaptive meshes, sparse matrices, irregular data structures, and computational tasks with dynamically changing computation loads and varying data access patterns. In the following we give a brief overview of the main features of HPF+.

HPF+ essentially includes the HPF-2 Base language, except that templates are not supported and alignments, as well as mechanisms for passing distributed arrays to procedures, are simplified. Furthermore, HPF+ adds features for the explicit equivalencing of processor arrays.

The advanced features of HPF+ have been designed with the goal of supporting the efficient execution of highly complex applications. In particular, a range of methods for the sophisticated control of data layout is provided. Users can map pointers and components of derived types, and can map objects to subsets of processors directly. The GEN_BLOCK distribution generalizes the block distribution by allowing non-equal sized blocks, the INDIRECT distribution allows each element of an array to be mapped individually using a mapping array, and the MULTI_BLOCK distributions allow the mapping of multiple arbitrarily-sized blocks to processors.
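The difference between these distributions can be made concrete by asking which processor owns a given array element. The following is a minimal Python sketch, not HPF+ syntax and not taken from VFC; the function names, the zero-based indexing, and the example block sizes are assumptions made for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def block_owner(i, n, p):
    """Owner of element i when n elements are split into p equal-sized blocks."""
    block = -(-n // p)  # ceiling of n / p
    return i // block


def gen_block_owner(i, sizes):
    """Owner of element i when processor k holds sizes[k] consecutive elements."""
    bounds = list(accumulate(sizes))  # exclusive upper bound of each block
    if not 0 <= i < bounds[-1]:
        raise IndexError(i)
    return bisect_right(bounds, i)


def indirect_owner(i, mapping):
    """Owner of element i when a mapping array names the owner of every element."""
    return mapping[i]


# Ten elements on three processors with blocks of sizes 2, 5, and 3.
owners = [gen_block_owner(i, [2, 5, 3]) for i in range(10)]
```

A general block distribution keeps ownership lookup cheap (a search over the block bounds), whereas an indirect distribution needs one mapping entry per element but can follow an arbitrary partition of, say, a mesh.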

Another important feature is the support of dynamic remapping of data. If an object has been declared DYNAMIC, then it can be remapped at runtime using the REDISTRIBUTE directive.

The ON directive allows users to map computation onto processors. The RESIDENT directive allows the specification of information about accesses to data objects within the scope of an associated ON block.

The above features, except for the multi-block distribution, are also included in the HPF-2 Approved Extensions. HPF+ provides additional functionality, in particular for the explicit control of communication and locality.

The REUSE clause can be used to express the invariance of communication schedules associated with an independent loop. It asserts that the schedules for all arrays are invariant for all loop executions and thus have to be computed only once, upon first execution of the loop. This clause can be guarded by a condition, implying that schedules should only be reused if the condition yields true. Recently these features have been adopted by the Association of High Performance Fortran and included in the specification of HPF/JA.

Furthermore, the language provides schedule variables, which may be explicitly bound to communication schedules by the user, enabling the reuse of schedules beyond a single loop.
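The effect of schedule reuse can be illustrated with a small Python sketch. It is an analogy only: the class `ReusableSchedule`, the toy inspector `nonlocal_indices`, and the `reuse_if` parameter are hypothetical names, and the "schedule" here is merely the list of non-local indices a loop touches.

```python
class ReusableSchedule:
    """Caches an inspector result, in the spirit of a guarded REUSE clause."""

    def __init__(self, inspect):
        self._inspect = inspect
        self._schedule = None
        self.inspections = 0  # how often the inspector actually ran

    def get(self, index_array, reuse_if=True):
        # Recompute on first use, or whenever the guard condition is false.
        if self._schedule is None or not reuse_if:
            self._schedule = self._inspect(index_array)
            self.inspections += 1
        return self._schedule


def nonlocal_indices(index_array, lo=0, hi=4):
    """Toy inspector: global indices outside the locally owned range [lo, hi)."""
    return sorted({g for g in index_array if not lo <= g < hi})


cache = ReusableSchedule(nonlocal_indices)
idx = [0, 5, 3, 7]
for _ in range(3):  # the access pattern is invariant across these executions
    first = cache.get(idx)
runs_while_invariant = cache.inspections

idx = [1, 6]  # the pattern changed, so the guard is false for this execution
changed = cache.get(idx, reuse_if=False)
```

Holding the cache object in a variable that outlives one loop corresponds loosely to the schedule variables mentioned above.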

The HALO directive of HPF+ extends the HPF-2 SHADOW directive. It allows the explicit specification of non-local data access patterns. In contrast to shadows, halos may be specified for any distribution and may be defined and changed at runtime whenever the distribution of an array is changed. The information provided by halos is utilized by the compiler and/or runtime system to optimize the management of non-local data and the associated address translation mechanisms, and to reduce the preprocessing overheads of irregular computations. Moreover, halos and associated high-level communication primitives simplify the integration/reuse of optimized sequential code and the adoption of efficient local programming techniques based on extrinsic procedures.
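As an illustration of what a halo describes, the following Python sketch computes, for an irregularly distributed mesh, the non-local nodes that one processor reads. The function name `halo`, the edge-list mesh representation, and the example data are assumptions for illustration and do not reflect HPF+ syntax.

```python
def halo(edges, owner, me):
    """Nodes not owned by `me` that share an edge with a node owned by `me`."""
    ghost = set()
    for a, b in edges:
        if owner[a] == me and owner[b] != me:
            ghost.add(b)
        elif owner[b] == me and owner[a] != me:
            ghost.add(a)
    return sorted(ghost)


# Five mesh nodes; owner[i] is the processor holding node i (an indirect mapping).
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (1, 4)]
owner = [0, 0, 1, 1, 1]

halo_of_p0 = halo(edges, owner, 0)
halo_of_p1 = halo(edges, owner, 1)
```

If the programmer supplies this set explicitly, a runtime system no longer has to discover it by inspecting the loop, which is the preprocessing saving the text refers to.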

Finally, the PUREST directive can be used to characterize a pure procedure with the additional property that its invocations do not require communication.

Current work in AURORA is analyzing the use of HPF in connection with associated compilation and runtime technology for clusters of SMPs (see also Section

VFC

Siegfried Benkner

VFC is a source-to-source parallelization system that translates HPF+ programs into parallel Fortran/MPI message-passing programs. The message-passing program that is generated is compiled with the Fortran compiler of the target machine and linked with the VFC runtime system and the MPI library to yield a parallel executable program. VFC is available on a variety of parallel platforms, including the QSW CS, the NEC Cenju, the NEC SX, the SGI Origin, the IBM SP, PC clusters, and networks of workstations.

VFC implements the HPF+ features discussed in the preceding section, including general block and indirect distributions, dynamic data distribution, the reuse clause, and the halo directive. A major focus in the development of VFC has been the implementation of novel language features and parallelization techniques required for an efficient handling of dynamic/irregular applications, which are currently not adequately supported by commercial HPF compilers.

The parallelization strategy of VFC is based on the Single-Program-Multiple-Data (SPMD) programming model. VFC translates a source program into an SPMD message-passing target program, which is usually parameterized in such a way that it can be executed on an arbitrary number of processors. In contrast to most commercial compilers, VFC provides powerful runtime parallelization strategies for non-perfectly nested irregular independent loops that may contain conditional statements and procedure calls. In order to provide support for dynamic memory allocation, dynamic distribution/redistribution, parameterization of programs by the number of processors, communication schedule reuse, separate compilation, and other features, a general dynamic parallelization methodology and a powerful runtime system have been realized.

The VFC runtime system manages distributed objects at runtime, supports work distribution of data-parallel constructs, index translation, and communication generation for accesses to distributed arrays, and implements dynamic data redistribution. A major focus has been put on a flexible yet efficient handling of irregular communication schedules and the associated message-passing communication primitives in the context of a generalized inspector/executor parallelization strategy, including efficient support for communication schedule reuse and halos. The VFC runtime system provides generic interfaces to the MPI library, to an extended version of the PARTI/CHAOS runtime library, and to the ADLIB library.
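The inspector/executor idea can be sketched in a few lines of Python. This is a single-process simulation under stated assumptions, not the VFC runtime interface: `inspector`, `executor`, and the `fetch` callback (a stand-in for message passing) are hypothetical names, and the data is a toy block-distributed array.

```python
def inspector(indices, owner, me):
    """Build a gather schedule: per remote owner, the distinct global indices needed."""
    schedule = {}
    for g in indices:
        p = owner(g)
        if p != me:
            schedule.setdefault(p, set()).add(g)
    return {p: sorted(wanted) for p, wanted in schedule.items()}


def executor(indices, local, schedule, fetch):
    """Gather the remote values named by the schedule, then run the indirect loop."""
    buffer = dict(local)
    for p, wanted in schedule.items():
        buffer.update(zip(wanted, fetch(p, wanted)))  # one bulk transfer per owner
    return [buffer[g] for g in indices]


# Global array x, block-distributed over two processors; we simulate processor 0.
x = [10 * i for i in range(8)]
owner = lambda g: g // 4
local = {g: x[g] for g in range(4)}
fetch = lambda p, wanted: [x[g] for g in wanted]  # stand-in for a receive

indices = [0, 5, 3, 7, 5]  # indirection array of an irregular loop
schedule = inspector(indices, owner, me=0)
gathered = executor(indices, local, schedule, fetch)
```

The inspector pass is the preprocessing cost of an irregular loop; when the indirection array does not change, the same schedule can drive every later execution of the executor.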

VFC has been successfully used by application developers to parallelize representative kernels from diverse industrial and scientific codes, such as weather prediction applications, crash simulations, fluid dynamics codes, financial optimization, and others. For these codes, the performance obtained with HPF and VFC is close to the performance of manually written message-passing programs, while slowdowns of up to two orders of magnitude are introduced by commercial HPF compilers. With the development of VFC we have demonstrated that, with an appropriate compiler and runtime infrastructure, a high-level approach to parallel programming has the potential to be efficient even for advanced applications.

Although the VFC compiler mainly targets distributed memory machines, we have ported the system to diverse parallel architectures, including shared memory systems and vector-parallel supercomputers. Moreover, we have shown that some of the optimizations performed by VFC for irregular codes can be utilized for optimizing existing shared memory parallelization techniques, as for example employed in OpenMP.

The VFC compiler is subject to ongoing research within the special research program AURORA, where the system is being optimized for clusters of SMPs. Several tools have been developed in this context, including an automatic program instrumentation system, a performance analysis and prediction tool, a data visualization tool, and a debugger. Moreover, VFC serves as a testbed for work on parallel I/O and parallel database technology, and for research in the context of distributed high-performance computing, as discussed in the next section.

Distributed High Performance Computing with Opus and OpusJava

Erwin Laure

The past few years have dramatically changed the view of high performance applications and computing. While applications have traditionally been targeted towards dedicated parallel machines, we see the emerging trend of building meta-applications that are composed of several modules which exploit heterogeneous platforms and employ hybrid forms of parallelism. Representative examples of such applications are multidisciplinary optimizations, where a number of models, each of which represents a different discipline, are combined to model complex systems.

To provide support for such applications, the coordination language Opus was designed in a joint effort of ICASE, NASA Langley Research Center, and our Institute. The goal of Opus is to define language extensions to HPF supporting the coordination of data-parallel tasks. Opus introduces coarse-grained task parallelism on top of fine-grained data parallelism and supports the coupling of multiple programs into complex multidisciplinary codes. In addition, Opus provides a Java interface, called OpusJava, that allows a seamless integration of high performance HPF modules into multi-language distributed applications.

The Opus Language and its Implementation. At the heart of Opus is an abstract data type called ShareD Abstraction, or SDA, whose purpose is to provide a means for encapsulation of data and methods (procedures) which act on this data. SDAs may exploit data parallelism in that the internal data of SDAs, as well as the data of SDA methods, may be distributed. By creating an instance of an SDA type, an SDA object (or simply SDA) is generated. During its lifetime, an SDA executes asynchronously in a separate address space and can be accessed through calls to its methods.

An SDA method can be invoked synchronously, where the caller is blocked until control returns, or asynchronously, where the caller does not have to wait for the completion of the method. Explicit synchronization is possible via event variables that can be bound to asynchronous method invocations. Arguments are passed with copy-in/copy-out semantics. In general, each method has exclusive access to the data of the SDA, much like in monitors. The user may, however, assert pairwise interference freedom among methods by means of ParBlocks, thus allowing them to execute concurrently. A method may have an associated condition clause, specifying a logical expression which guards the method's activation.

At runtime, an SDA is conceptually represented by a data-parallel thread of control. This conceptual thread is realized by two independent lightweight threads that communicate via a shared memory area. In particular, the runtime components of an SDA are a shared memory area, which contains queues for temporarily storing Method Invocation requests (MIs); the Server Thread, which receives MIs, unmarshals input arguments, and places the request into the shared memory area in the form of execution records (the runtime representation of MIs); and the Execution Thread, which retrieves records from the shared memory area, evaluates the condition clauses, and executes the respective methods, which are compiled to internal subroutines of the execution thread.
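The interplay of queued execution records, condition clauses, and a single execution thread can be sketched in Python. This is a loose analogy under stated assumptions, not the Opus runtime: the class and method names are hypothetical, a `concurrent.futures.Future` stands in for an event variable, the server thread and argument marshaling are omitted, and the `when` parameter plays the role of a condition clause.

```python
import threading
from collections import deque
from concurrent.futures import Future


class SDA:
    """Monitor-like object whose methods are run by one execution thread."""

    def __init__(self):
        self._records = deque()  # pending execution records
        self._wakeup = threading.Condition()
        threading.Thread(target=self._execute, daemon=True).start()

    def spawn(self, method, *args, when=lambda: True):
        """Asynchronous invocation: queue a record and return its event."""
        event = Future()
        with self._wakeup:
            self._records.append((method, args, when, event))
            self._wakeup.notify()
        return event

    def call(self, method, *args, when=lambda: True):
        """Synchronous invocation: block until the method has completed."""
        return self.spawn(method, *args, when=when).result()

    def _execute(self):
        while True:
            with self._wakeup:
                while True:  # pick the first record whose condition clause holds
                    ready = next((r for r in self._records if r[2]()), None)
                    if ready is not None:
                        break
                    self._wakeup.wait()
                self._records.remove(ready)
            method, args, _, event = ready
            try:
                event.set_result(method(*args))  # one method at a time
            except Exception as exc:
                event.set_exception(exc)


class Buffer(SDA):
    def __init__(self):
        self.items = []
        super().__init__()

    def put(self, x):
        self.items.append(x)

    def get(self):
        return self.items.pop(0)


buf = Buffer()
pending = buf.spawn(buf.get, when=lambda: bool(buf.items))  # guarded, must wait
buf.call(buf.put, 42)
value = pending.result(timeout=5)
```

Because only the execution thread runs methods, each method has exclusive access to the object's data; the guarded `get` stays queued until `put` has made its condition true.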

A prototype compiler that translates an Opus program into an equivalent HPF program with calls to the Opus runtime system has been developed by extending the HPF compiler VFC. Prototypes of the Opus runtime system are currently available on clusters of Solaris workstations, the QSW CS, and the Cray TE.

OpusJava. While Opus provides efficient support for data-parallel components within homogeneous environments, it lacks support for language interoperability and heterogeneous platforms. To overcome these deficiencies, we are currently developing a Java interface to Opus called OpusJava.

OpusJava is a Java framework that allows a seamless integration of Opus/HPF modules into a distributed Java application. As a side effect, OpusJava may be used for coordinating pure Java modules at a high level as well. The OpusJava framework consists of two main components: a set of stubs and wrappers that facilitate interaction between Opus and Java, and the OpusJava class library.

The OpusJava interface is realized by an external Opus SDA stub, which is responsible for routing requests from and to Java applications, a set of C wrappers for data marshaling, and Java side stubs, which have the job of routing requests between other Java objects and the external Opus SDA stub. All these components are created by the Opus compiler.

At runtime, the external Opus SDA stub is created within the Opus program. For any Opus SDA object created, a corresponding Java side stub object is established as well. The process of routing requests from Java to Opus and vice versa is exemplified in the figure: a request from a Java SDA is marshaled by C wrappers and transferred to the external SDA stub, which routes the request to the appropriate HPF SDA, taking care of the required data distribution. In the other direction, an HPF SDA sends a request to the external stub, which collects all the required distributed data. The request is translated to Java by C wrappers and forwarded to the appropriate Java SDA.

[Figure: Components of the Opus/HPF-Java Interface. Labels: Opus/HPF Application, SDA, External SDA Stub, C Wrappers, OpusJava Application.]

The OpusJava class library, whose API is modeled on the syntax of Opus to provide a common look and feel, provides the basic mechanisms for accessing and coordinating Opus or pure Java objects from within Java applications.

OpusJava introduces three types of objects that provide the user with the standard Opus features of creating SDAs on user-specified resources, invoking methods within SDAs, and synchronizing with asynchronously spawned methods: the final class SDA, which is a stub for a remote SDA object; the final class Event, which is used to reference asynchronous method invocations; and the final class Resource, describing an SDA's resources. Specifically, an SDA object holds a remote reference to an Opus Java side stub or any other Java object. With the help of the methods call and spawn, synchronous and asynchronous requests can be posed to the referenced object. As explained above, such requests are routed through Java-side and Opus-side stubs and translated with the help of C wrappers.
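The difference between the synchronous call and the asynchronous spawn can be illustrated with a small model. The sketch below is written in Python for brevity and is not the OpusJava API: only the class names SDA, Event and Resource and the method names call and spawn come from the text, while the constructor arguments, the Event methods test and wait, the Resource fields, and the Accumulator class are assumptions made for illustration. A local thread pool stands in for the stubs and C wrappers.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Describes where an SDA should be placed (hypothetical fields)."""
    host: str
    processors: int = 1


class Event:
    """Handle that references one asynchronous method invocation."""

    def __init__(self, future: Future):
        self._future = future

    def test(self) -> bool:
        # Non-blocking check whether the spawned method has finished.
        return self._future.done()

    def wait(self):
        # Block until the spawned method has finished and return its result.
        return self._future.result()


class SDA:
    """Stub holding a reference to a target object (local in this model)."""

    _pool = ThreadPoolExecutor(max_workers=4)

    def __init__(self, target, resource: Resource):
        self._target = target
        self.resource = resource

    def call(self, method: str, *args):
        # Synchronous request: returns only after the method has completed.
        return getattr(self._target, method)(*args)

    def spawn(self, method: str, *args) -> Event:
        # Asynchronous request: returns immediately with an Event handle.
        return Event(SDA._pool.submit(getattr(self._target, method), *args))


class Accumulator:
    """Hypothetical target object standing in for an SDA's data and methods."""

    def __init__(self):
        self.total = 0

    def add(self, x):
        self.total += x
        return self.total


sda = SDA(Accumulator(), Resource(host="localhost"))
first = sda.call("add", 5)     # blocks until add has run
event = sda.spawn("add", 7)    # runs in the background
second = event.wait()          # synchronize with the spawned method
```

In this model the caller is free to do other work between spawn and wait, which is the synchronization pattern the Event class is described as supporting.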

A detailed description of the OpusJava framework, as well as of its Opus interface, can be found in

Profile Guided Optimizations

Eduard Mehofer

Traditionally, compiler optimization has been done statically for a given machine/system environment, independent of actual execution runs. However, runtime information about the program execution environment can be used to produce more efficient code. Different approaches have been proposed which take changing system or execution environments into account. This broad research area includes generic probabilistic dataflow systems, control-flow restructuring systems based on hot paths, dedicated optimizations using profile information, and specialized numerical systems like Atlas or FFTW. In the reporting period we mainly investigated profile guided optimizations based on generic probabilistic dataflow systems, which are presented in more detail in the following.

Classical dataflow analysis is done statically, without utilizing runtime information. All paths are weighted equally, irrespective of whether they are executed heavily, rarely, or never. In contrast, probabilistic dataflow systems take runtime information into account by using edge probabilities to distinguish between frequently and rarely executed branches. The solution of such a system specifies the probability of dataflow facts at each program point.
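A minimal sketch may help to make the notion of a probabilistic solution concrete. It is restricted to an acyclic control-flow graph and a single dataflow fact, and it assumes that each branch is taken independently of the execution history; it is a simplified illustration and not the framework discussed in this section. The function name, the node names, and the gen/kill encoding are assumptions.

```python
def fact_probabilities(nodes, edges, gen, kill):
    """Probability that one dataflow fact holds on entry to each node.

    nodes: node names in topological order, entry node first
    edges: {(src, dst): probability of taking this edge when leaving src}
    gen:   nodes that establish the fact
    kill:  nodes that invalidate the fact
    """
    freq = {n: 0.0 for n in nodes}    # expected number of executions
    holds = {n: 0.0 for n in nodes}   # expected executions with the fact true on entry
    freq[nodes[0]] = 1.0
    for n in nodes:
        if n in gen:
            out = freq[n]             # fact holds after every execution of n
        elif n in kill:
            out = 0.0                 # fact never holds after n
        else:
            out = holds[n]            # n passes the fact through unchanged
        for (src, dst), p in edges.items():
            if src == n:
                freq[dst] += p * freq[n]
                holds[dst] += p * out
    return {n: (holds[n] / freq[n] if freq[n] else 0.0) for n in nodes}


# Diamond-shaped graph: the frequently taken branch establishes the fact,
# the rarely taken branch invalidates it.
nodes = ["entry", "hot", "cold", "join"]
edges = {
    ("entry", "hot"): 0.9,
    ("entry", "cold"): 0.1,
    ("hot", "join"): 1.0,
    ("cold", "join"): 1.0,
}
result = fact_probabilities(nodes, edges, gen={"hot"}, kill={"cold"})
```

A classical all-paths analysis would report that the fact does not hold at join, because it fails on one incoming path. The probabilistic solution instead reports that it holds there with probability 0.9, which an optimizer can weigh against the cost of a speculative transformation.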

Figure depicts a probabilistic optimization framework for a compiler with a profiling feedback loop. The profiler is responsible for producing profile information based on the execution environment. This information is passed over to the probabilistic optimizer. Control-flow analysis constructs the control-flow graph annotated with edge probabilities. Based on the annotated control-flow graph and on dataflow equations, probabilistic dataflow analysis computes a probabilistic solution of the dataflow problem. Sophisticated transformations can utilize that solution for generating highly efficient code. For changing execution environments, the target code is adapted by the profiling feedback loop.

Figure: Probabilistic optimization framework (program, front end, probabilistic optimizer with control flow analysis, probabilistic DFA and transformations, code generator, code; input, profiler)

Ramalingam developed a dataflow framework which computes the probability of dataflow facts once every edge in the control-flow graph has been annotated with a probability. In order to get an idea of the accuracy of his results, we defined the best solution that can theoretically be obtained and showed that the differences can be considerable. There are two reasons for the deviations. On the one hand, a program path is reduced to edge probabilities; on the other hand, it is assumed that a particular branch is independent of the execution history, which is obviously not the case in reality. While the use of edge probabilities is indispensable to get an efficient handle on the problem, there is potential for improvement by taking execution history into account. We developed a novel approach which models execution history by altering the dataflow equation system, giving significantly better results. Currently we are working on further improvements of our probabilistic dataflow analysis framework and on advanced optimizations in various contexts based on it. This work is being done in cooperation with Bernhard Scholz from the Vienna University of Technology.

In our future research we will also investigate the effectiveness of approaches which utilize runtime information to obtain better results. This will include control-flow restructuring systems and dedicated optimizations that are based on profile information, as well as numerical systems which adapt themselves automatically to a given execution environment. A comparison of the different approaches is crucial to get a better understanding of their strengths and weaknesses. Moreover, the experience gained in numerical systems can be useful in building highly optimizing compilers. Compilation based on runtime information is especially applicable in high-performance systems, embedded systems, and digital signal processing. Finally, it is worth mentioning that the Intel IA-64 (Itanium) C/Fortran compilers also use profile information to guide procedure inlining, to reduce instruction cache and TLB misses, and to make the best use of machine instruction width and speculation features.

Advanced Communication Optimizations for Distributed and Parallel Systems

Thomas Fahringer and Eduard Mehofer

The overhead to access nonlocal data from remote processors on distributed memory architectures may be an order of magnitude higher than the cost of accessing local data. As a consequence, the effective use of distributed memory architectures requires optimization of communication, including message vectorization (hoisting communication outside of loops), message coalescing (removing redundant communication based on the same array), communication aggregation (combining messages related to different arrays), communication latency hiding, and pipelined communication. The effect of these optimizations is limited by the fact that most of the analysis in current parallelizing compilers is performed for a single loop nest at a time. Further, it has recently been observed that maximizing communication latency hiding can prevent valuable opportunities to reduce the number of messages and to eliminate partial redundancy. Most existing communication optimization approaches, however, are based on a fixed communication optimization strategy, ignoring tradeoffs between enhancing communication latency hiding and reducing the number and volume of messages.

We have developed a novel communication optimization framework that is based on global unidirectional bit-vector dataflow analysis problems and performance prediction, and that addresses the above issues. Firstly, we maximize latency hiding by placing SENDs as early and RECVs as late as possible, where SEND represents the sending component which initiates communication and RECV denotes the component which finalizes communication. Since hoisting communication to the earliest possible program points increases the lifetime of communication buffers, which can cause serious memory problems, buffer constraints have to be taken into account. In our framework, buffer safety is achieved by using P3T+, an effective cost estimator, which allows us to selectively block SENDs in order to satisfy buffer constraints. Secondly, message coalescing is applied to eliminate communication redundancy both in terms of number and volume of messages. Thirdly, based on a buffer-safe program, we systematically create and examine a reasonable set of communication placements for a given program, covering several possibly conflicting communication guiding profit motives, including promising combinations of communication latency hiding and reducing the number and volume of messages. P3T+ is used to determine the best choice of the communication placements based on cost functions that model work distribution, computation and communication times, and the degree of overlapping communication with computation.
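The selection step can be pictured with a toy cost model. The sketch below is purely illustrative and is not the cost function used by the estimator: the formula, the two candidate placements, and all numeric parameters (message counts, volumes, overlap fractions, latency, bandwidth) are assumptions chosen to show how the preferred placement can depend on the problem size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    name: str
    messages: int     # number of messages sent
    volume: int       # total bytes transferred
    overlap: float    # fraction of communication time hidden behind computation


def exposed_cost(p, latency, bandwidth):
    """Communication time (seconds) that is not hidden by computation."""
    return (1.0 - p.overlap) * (p.messages * latency + p.volume / bandwidth)


def best_placement(candidates, latency, bandwidth):
    # Pick the candidate with the smallest exposed communication time.
    return min(candidates, key=lambda p: exposed_cost(p, latency, bandwidth))


def candidates_for(n):
    """Two hypothetical placements for exchanging data of problem size n."""
    return [
        # SENDs placed early: two separate messages, partly overlapped.
        Placement("latency-hiding", messages=2, volume=16 * n, overlap=0.3),
        # Messages coalesced: one message, less data, no overlap.
        Placement("coalesced", messages=1, volume=12 * n, overlap=0.0),
    ]


LATENCY = 50e-6      # assumed per-message startup cost in seconds
BANDWIDTH = 100e6    # assumed transfer rate in bytes per second

small = best_placement(candidates_for(100), LATENCY, BANDWIDTH)
large = best_placement(candidates_for(1_000_000), LATENCY, BANDWIDTH)
```

Under these assumed parameters the coalesced placement is chosen for the small problem, where startup cost dominates, and the latency-hiding placement for the large one, where transfer time dominates. A fixed strategy would be suboptimal for one of the two sizes.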

Performance results demonstrate that the quality of a fixed communication optimization strategy may change for different problem and machine sizes. Our method shows significant reduction in communication costs and overall improvements in performance as compared to previous fixed communication strategies. Future research will focus on further improvements such as considering additional performance parameters (in particular memory and cache locality cost functions), applying symbolic cost models, and performing interprocedural analysis.

Parallel Access to Persistent Arrays from HPF Applications

Peter Brezany and Viera Sipkova

Multidimensional arrays are a fundamental data type in scientific computing. Often these arrays are persistent, that is, they outlive the invocation of the program that created them. Portability and performance with respect to I/O of arrays pose significant challenges to HPF applications that access large persistent arrays. This section presents our solution, developed in the context of the Vienna Fortran Compiler (VFC, see ).

We addressed research issues associated with four different scenarios in which the application may need to access secondary memory frequently and potentially become I/O bound:

- Compulsory I/O: Some I/O accesses used in applications are an inherent part of the application algorithm, for example, reading initialization files or generating application output.
- Timestep Operations: Applications solving time-dependent problems output data at selected intervals over time, to be postprocessed for later analysis by visualization tools.
- Checkpointing: For long-running production codes it is desirable to save the state of certain program objects periodically at given checkpoints in order to be able to resume (restart) from a previous state in case of an irregular termination of the execution.
- Out-of-Core Computations: An application may deal with large quantities of primary data which cannot be held in main memory. Such a code needs to access secondary memory very frequently.

In our approach, the overall parallel I/O software consists of two components: a set of modules included in VFC, and the parallel I/O runtime system. An HPF program can contain Fortran I/O operations as well as other language constructs which extend HPF I/O capabilities. New language constructs are discussed in . The HPF compiler processes the I/O specifications provided by the user and inserts appropriate runtime routine calls to perform I/O operations. In order to simplify processing of I/O operations by the compiler, a generic interface to the different components of the I/O runtime system has been implemented, called HPF PARIO. The parallel I/O runtime system, which we developed in cooperation with Erich Schikuta's group (Department for Data Engineering, University of Vienna), consists of two main components: ViMPI and ViPIOS.

ViMPI has been built on top of MPI-IO and provides support for parallel I/O operations, timestep operations, checkpointing, and out-of-core computations. Checkpointing is supported by procedures to create lists of checkpointed objects, to write these objects to disk (checkpoint operation), and to read the data for these objects from disk (restart operation). Similar support is provided for the implementation of timestep operations. For out-of-core computations the concept of array files has been developed, enabling the efficient transfer of sections of large arrays between secondary storage and main memory. Array files extend the conventional file organization by means of metadata describing attributes of arrays (e.g., rank, shape, and element type) in order to manipulate arbitrary sections of arrays. Array files are also targeted by the HPF debugger and the GDDT visualization tool.
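The role of the metadata can be illustrated with a small, self-contained sketch. The file layout below (a header holding rank, element type code and shape, followed by row-major data) and the function names are assumptions made for illustration; they do not describe the actual ViMPI array file format. The point is that the header is sufficient to locate an arbitrary rectangular section without reading the whole array.

```python
import io
import struct


def write_array_file(f, rows):
    """Write a 2-D list of floats preceded by rank, element type and shape."""
    nrows, ncols = len(rows), len(rows[0])
    f.write(struct.pack("<II", 2, ord("d")))     # rank, element type code
    f.write(struct.pack("<II", nrows, ncols))    # shape
    for row in rows:
        f.write(struct.pack(f"<{ncols}d", *row)) # row-major data


def read_section(f, row_range, col_range):
    """Read rows [r0, r1) and columns [c0, c1) using only the metadata."""
    f.seek(0)
    rank, type_code = struct.unpack("<II", f.read(8))
    if rank != 2 or chr(type_code) != "d":
        raise ValueError("this sketch handles 2-D float64 arrays only")
    shape = struct.unpack(f"<{rank}I", f.read(4 * rank))
    data_start = 8 + 4 * rank
    item = struct.calcsize("<d")
    (r0, r1), (c0, c1) = row_range, col_range
    width = c1 - c0
    section = []
    for r in range(r0, r1):
        # Seek directly to the first requested element of row r.
        f.seek(data_start + (r * shape[1] + c0) * item)
        section.append(list(struct.unpack(f"<{width}d", f.read(width * item))))
    return section


# An in-memory buffer stands in for a file on secondary storage.
buf = io.BytesIO()
write_array_file(buf, [[float(10 * r + c) for c in range(4)] for r in range(3)])
section = read_section(buf, (1, 3), (1, 3))
```

An out-of-core computation would use the same kind of access to bring one section at a time into main memory, operate on it, and write it back.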

The Vienna Parallel Input Output System (ViPIOS) is an advanced I/O runtime system which is partially based on parallel and distributed database technology. ViPIOS is designed as a standalone, independent runtime module to provide parallel applications with efficient, transparent access to persistent arrays. The system applies a novel blackboard method based on AI technology to the distribution of arrays on disks and to the optimization of disk accesses and the communication scheme. Further, it uses advanced buffering and prefetching algorithms to speed up the system performance. ViPIOS provides interfaces to HPF, MPI, and PVM.

In another research effort we established an interface between our HPF compiler and the Panda parallel I/O library developed at the University of Illinois (http://drl.cs.uiuc.edu/panda). Panda provides high-level support for the most common I/O access patterns which occur in a typical scientific parallel application. Arrays managed by Panda can be distributed both in memory and on disk. The results of this effort are described in

To the best of our knowledge, there is no system providing comparable parallel I/O support for HPF applications. Existing HPF compilers either rely on implementations in which one node (the so-called master node) reads and writes distributed arrays from/to sequential files, or they implement accesses to parallel files by means of the HPF extrinsic procedure mechanism.

Mining Large Spatial Databases With Parallel Processing

Peter Brezany

The computerization of many business and government transactions and the advances in scientific data collection tools provide us with a huge and continuously increasing amount of data. This explosive growth of datasets has far outpaced the human ability to interpret this data, creating an urgent need for new techniques and tools that support humans in transforming data into useful information and knowledge. Responding to this need, data mining, i.e., knowledge discovery in databases, has become an important area of research.

Although there have been many studies of knowledge discovery in relational and transaction databases, knowledge discovery is in great demand in other applicative databases, including spatial databases, temporal databases, object-oriented databases, multimedia databases, etc. Our research focus is on the methods of spatial data mining, i.e., discovery of interesting knowledge from spatial database systems.

In general, spatial database systems are systems that manage multidimensional and spatial objects. Whereas for many years geographic information systems dominated this application area, the requirement for spatial data management emerges in more and more domains, such as molecular biology, environmental protection, mechanical engineering, navigation, medical imaging, and data warehousing, among many others. In these applications the space of interest can be, for example, the two-dimensional abstraction of parts of the surface of the earth, the layout of a VLSI design, a model of the human brain, or a 3D space representing the arrangement of chains of protein molecules.

Each spatial database system is a full-fledged database system with additional support for spatial data types. Spatial data types, e.g., POINT, LINE, REGION, provide a fundamental abstraction for modeling the structure of geometric entities in space as well as their relationships (l intersects r), properties (area(r)), and operations (intersection(l, r)). Which types are used depends on the class of applications to be supported (e.g., rectangles in VLSI design, surfaces and volumes in 3D). A spatial database system must at least be able to retrieve, from a large collection of objects in some space, those lying within a particular area without scanning the whole set. Therefore, spatial indexing is mandatory. Each spatial database system should also provide efficient algorithms for the implementation of spatial queries.

Important query types include:

- Region queries: obtaining all objects intersecting a specified query region, e.g., find all rivers in Burgenland.
- Nearest neighbor (NN) queries: obtaining the objects closest to a specified query object, e.g., find the two closest hospitals to our house.
- Spatial joins: yielding all pairs of objects satisfying a special spatial predicate, e.g., find the pairs of elements that are closer than a given distance and thus create electromagnetic interference to each other.
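The following sketch shows how a simple spatial index answers a region query and a distance-based spatial join without scanning the whole data set. It uses a uniform grid over point objects, chosen here only for brevity; the class name, method names, and sample points are assumptions and do not refer to any system described in this report.

```python
from collections import defaultdict
from math import dist, floor


class GridIndex:
    """Uniform grid over 2-D points; each cell lists the points it contains."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def insert(self, name, x, y):
        self.cells[self._cell(x, y)].append((name, x, y))

    def region_query(self, xmin, ymin, xmax, ymax):
        """Names of all points inside the query rectangle (borders included)."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):          # visit overlapping cells only
            for cy in range(cy0, cy1 + 1):
                for name, x, y in self.cells.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(name)
        return sorted(hits)

    def pairs_within(self, eps):
        """Spatial join: all pairs of points closer than eps to each other."""
        if eps > self.cell_size:
            raise ValueError("eps must not exceed the cell size")
        pairs = set()
        for (cx, cy), members in self.cells.items():
            for dx in (-1, 0, 1):               # a partner can only be in a
                for dy in (-1, 0, 1):           # neighboring cell
                    for a in members:
                        for b in self.cells.get((cx + dx, cy + dy), []):
                            if a[0] < b[0] and dist(a[1:], b[1:]) < eps:
                                pairs.add((a[0], b[0]))
        return sorted(pairs)


index = GridIndex(cell_size=5.0)
for name, x, y in [("A", 1.0, 1.0), ("B", 1.5, 1.2), ("C", 9.0, 9.0),
                   ("D", 4.9, 1.0), ("E", 5.1, 1.0)]:
    index.insert(name, x, y)

in_region = index.region_query(0.0, 0.0, 5.0, 5.0)
close_pairs = index.pairs_within(1.0)
```

Because each grid cell can be processed independently of most others, an index of this kind also suggests a natural unit of work for a parallel implementation.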

Recent studies on data mining attempt to extend the scope of data mining from relational and transaction databases to spatial databases. Almost all of them focus on the development of sequential spatial data mining algorithms. A straightforward application of these algorithms to large spatial databases leads to excessively long processing times. Therefore, the aim of our research is to contribute to the development of a new spatial data mining technology based on parallel processing.

The goal of our work is twofold. First, we want to analyze and parallelize several existing algorithms for spatial data mining, especially clustering, classification, characterization, and trend detection algorithms. We will consider both in-core and out-of-core implementations. Our second goal is to propose an architecture for a parallel spatial database system that will provide support for storage, retrieval, and analysis of very large spatial data sets in a time-efficient and space-efficient manner on clusters of SMPs. We also plan to realize a proof-of-concept implementation of selected database modules. The implementation will be based on the Vienna Parallel Input Output System (ViPIOS), introduced earlier in this report, and will rely on MPI to ensure portability over a wide range of parallel platforms.

Macroservers: An Execution Model for Processor-in-Memory Arrays

Hans Zima and Thomas Sterling

Processor-in-Memory

Processor-in-Memory, or PIM, technology and architecture has emerged as one of the most important domains of parallel computer architecture research and development. It is being pursued as a means of accelerating conventional systems for array processing and for manipulating irregular data structures. It is being considered as a basis for scalable spaceborne computing, as smart memory to manage systems resources in a hybrid technology multithreaded architecture for ultra-scale computing, and most recently as the means for achieving Petaflops performance. PIM exploits recent advances in semiconductor fabrication processes that enable the integration of DRAM cell blocks and CMOS logic on the same chip. The benefit of PIM structures is that processing logic can have direct access to the memory block row buffers at an internal memory bandwidth on the order of Gbps, yielding the potential performance of Gips (bit operands) on a memory chip with a Mbyte capacity. Because of the efficiencies derived from staying on chip, power consumption can be an order of magnitude lower than for comparable performance with conventional microprocessor-based systems. But the dramatic advances in performance will be derived from arrays of tightly coupled PIM chips in the hundreds or thousands, either alone or in conjunction with external microprocessors. Such systems could deliver low Teraflops-scale peak performance within the next couple of years and possibly a Petaflops, at least for some applications, in five years.

The challenge to realizing the extraordinary potential of arrays of PIMs is not simply the interesting problem of the basic on-chip structure and processor architecture, but also the methodology for coordinating the synthesis of as many as a million PIM processors to engage in concert in the solution of a single parallel application. A large PIM array is not simply another MPP; it is a new balance of processing and memory in a new organization. Its local operation and global emergent behavior will be a direct reflection of a shared, highly parallel, system-wide model of computation that governs the execution and interactions of the PIM processors and chips. Such a computing paradigm must treat the semantic requirements of the whole system even as it derives its processing capabilities from the local mechanisms of the individual parts. A synergy of cooperating elements is to be accomplished through this shared execution model.

PIM differs significantly from more common MPP structures in several key ways. The ratio of computation performance to associated memory capacity is much lower. Access bandwidth to on-chip memory is a hundred times greater. Latency is lower by a factor of two to four, while logic clock speeds are approximately half that of the highest-speed microprocessors. PIM favors data-oriented computing, where operations are scheduled and performed at the site of the data, and tasks are often moved from one PIM to another depending on where the argument data is, rather than moving the data. PIM processor utilization is less important than memory bandwidth. A natural organization of computation on a PIM array is a binding of tasks and data segments to coincide with physical data allocation, while making remote service requests where data is nonlocal. This is very similar to evolving practices for accomplishing tasks on the Web, including the use of Java, and encourages an object-oriented approach to managing the logical tasks and physical resources of the PIM array.

The Macroserver Model

This work presents a strategy for relating the physical resources of next-generation PIM arrays to the logical requirements of user-defined applications. The strategy is embodied in an intermediate form of an execution model that provides the generalized abstractions of both local and global computation in a unified framework. The principal abstract entity of the proposed model is the macroserver, a distributed agent of state and action. It complements the concept of the microserver, a purely local agent. This early work explores one possible model that is object-based in a manner highly suitable to PIM structures, but of a sufficiently high level, with task virtualization, that aggregations of PIM nodes can be cooperatively applied to a segment of parallel computation without phase changes in representations, as would be found with OpenMP combined with MPI.

In our model, the execution in a PIM-based system can be described by the activity of a collection of cooperating macroservers. A macroserver is an object encapsulating a set of data and a set of routines (methods) that operate on that data. Furthermore, it contains an interface specification defining its relationship to other macroservers. These three elements (data, methods, and interface) establish a macroserver as the basis for organizing all computation on an array of PIMs.

The relationship between a macroserver and the underlying hardware is important to appreciate. A macroserver is a virtual named object, as are the data and methods of which it is made. In principle, a given macroserver can exist on any part of the underlying physical PIMs and, over time, move across this physical medium as the virtual pages holding the data migrate. Support services to manage the creation, execution, interaction, and migration of macroservers are provided by a set of microserver routines available within any PIM node. This interface is an important aspect of the macroserver implementation. Macroservers cooperate by calls to each other's methods. The underlying representation of the data is transparent, as it is accessed and manipulated through the methods, which therefore define the data semantics. A macroserver is in general a dynamic object: while an application program will have a main macroserver that represents its beginning and end, other macroservers may be created and destroyed at execution time. Macroservers can also provide system software services and may be ephemeral as well. They are named first-class objects that may be manipulated by other macroservers, making parallel system software daemons particularly easy to construct.

In the following we introduce a few concepts of our model and their mutual relationship in slightly more detail. A first specification of the model can be found in

A macroserver comes into existence by being created as an instantiation of a parameterized template, called a macroserver class, which contains declarations of variables and the methods defining its behavior. While the hardware architecture provides a shared address space, the discipline imposed by the object-based framework requires all accesses to external data to be performed via method calls, optionally controlled through a set of access privileges. At the time a macroserver is created, a region in the virtual PIM array memory is allocated to it. This allocation can be explicitly controlled in the model by either directly specifying the region or aligning the macroserver with an already existing one. A reference to the newly created macroserver can be assigned to a variable, which can act as a handle to the object. At any point in time, a macroserver is associated with a distributed state space in which a set of asynchronous threads may be operating, each being the result of spawning a method.

A data structure can be distributed across the memory region allocated to a macroserver by binding it to a distribution object. Distribution objects are first-class objects, supporting dynamic distribution and redistribution of data depending on dynamically arising conditions. Similarly, we introduce work distributions as first-class objects that can be used to specify a binding of threads to data, guided by the distribution of the data. As before, such bindings can be managed dynamically, resulting in a very flexible scheme for the allocation and migration of data and threads.

Threads are lightweight; they execute asynchronously as long as they are not subject to synchronization. Mutual exclusion can be controlled via atomic methods. A macroserver whose methods are all atomic is a monitor and can be used as a flexible instrument for scheduling access to resources. A small monitor can be associated with each element of a large data structure, such as a reservation system, co-allocating the set of variables required by the monitor with the associated element. This organization allows the system to perform the scheduling in a highly efficient way. State synchronization can be expressed using condition variables, which provide a low-level, efficiently implementable mechanism. Finally, future variables can be bound to threads and used for implicit or explicit synchronization based upon the thread status.
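The synchronization concepts named above can be sketched with conventional threads. In the illustration below, a Python lock stands in for atomic methods, a thread pool for spawned lightweight threads, and library futures for future variables; the class name SeatMonitor and the reservation example are assumptions, and nothing here reflects an actual macroserver implementation.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SeatMonitor:
    """Small monitor guarding one element (a seat) of a larger data structure."""

    def __init__(self):
        self._lock = threading.Lock()
        self._holder = None

    def reserve(self, customer):
        # The lock makes the whole method body atomic.
        with self._lock:
            if self._holder is None:
                self._holder = customer
                return True
            return False

    def holder(self):
        with self._lock:
            return self._holder


# One monitor per element, so that contention is confined to a single seat.
seats = [SeatMonitor() for _ in range(3)]

with ThreadPoolExecutor(max_workers=8) as pool:
    # Each submit spawns a method invocation and yields a future for it.
    futures = [pool.submit(seats[0].reserve, f"customer-{i}") for i in range(8)]
    # Reading a future's result synchronizes with the corresponding thread.
    outcomes = [f.result() for f in futures]
```

Exactly one of the competing requests succeeds, and requests for different seats never block each other because every seat has its own monitor.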

Chapter

Tools

Thomas Fahringer

Introduction

Although rapid advances in HPC systems are bringing teraflops performance within reach, the software infrastructure for massive parallelism has not kept pace. Substantial advances in the field of parallel languages enable the programmer to write parallel programs at a machine-independent level. However, as parallelization of programs is far from being automated, there is a clear need for useful, efficient, and accurate tools to support this process. Although there exist many tools for measuring and analyzing program performance, they have not been very effective in supporting performance tuning of parallel programs. Many performance tools suffer from severe restrictions imposed on modeling architectures. Often these tools are unable to determine useful parameters reflecting computational and communication overhead. Moreover, very few performance tools actually consider code transformations and optimizations applied by a compiler. Debugging of parallel programs is frequently done by inserting print statements or using message passing program debuggers.

The AURORA project "Tools" has been established to support performance-oriented program development and debugging at the application/algorithm level, as well as to enhance translation from sequential to parallel programs. The following tools have been extended or developed in this context and integrated with the VFC parallelization system (see Section ):

- A high-level debugger called SPiDER enables programmers to observe the behavior of their programs at the level at which the programs have been developed. SPiDER supports SPMD (single-program-multiple-data) programs only. We followed an approach referred to as sequential view of parallel execution: the real parallel code is executed, but a corresponding source code level interface is presented to the programmer.
- SCALA, a performance tool for the behavioral analysis of parallel programs, comprises performance prediction techniques and post-execution and scalability analysis to compute performance indices that reflect the behavior of parallel programs. Performance indices are provided to the user and/or to the compiler in order to guide the selection of compiler transformations and optimization strategies under VFC.
- P3T+ is a prototype performance estimator for parallel programs which has been implemented on the basis of VFC. P3T+ estimates the outcome of communication, computation, load balance, and cache behavior based on message-passing systems at compile time. In the context of this project, P3T+ has been substantially recoded for integration with VFC. P3T+ has been tuned for the Meiko CS parallel architecture and a network of workstations. A graphical user interface has been provided that enables performance data filtering at the application level.
- MIGRATOR, a reverse engineering tool that supports the translation of sequential to parallel programs, has been partially developed. It combines techniques for automatic detection of algorithmic patterns, data and work distributions.
- A graphical user interface (GUI) has been implemented, based on object-oriented user interface design principles, that provides a unique look and feel for the tools described above. Although the tools can be executed on different architectures connected via the Internet, the GUI integrates them into a seamless, consistent user interface environment. The system behavior of the various architecture platforms on which the parallel applications are being executed and the interplay among all connected tools are graphically displayed.

P3T+

P3T+ is a performance estimator that supports the development of parallel applications under VFC by evaluating program transformations as well as HPF directives at compile time. Figure shows P3T+ as part of a program development and optimization system. Input programs are parsed and analyzed by VFC, which generates syntax trees, call graphs, flow graphs, etc., and stores them in a program database. VFC applies various code transformations and optimizations to the program. The programmer can invoke a performance analysis system (SCALA) to instrument and compile a parallel program, which can then be executed on a target architecture. Based on the instrumented program execution, performance data is gathered and stored in the program database. Moreover, P3T+ can be employed to predict the performance behavior of the code transformations and optimizations applied by VFC. P3T+'s performance data is also stored in the program database. All three tools (VFC, SCALA, and P3T+) are coordinated and controlled through a coordination system that also includes a graphical user interface (GUI) for displaying source code and performance data and for enabling user interaction. Finally, as a result of performance-driven program development, an optimized parallel program is created by VFC.

P3T+ currently supports regular HPF programs, which restrict array subscript expressions and loop bounds to linear functions of loop variables.

A key issue for a useful performance estimator is to provide critical information to the programmer and compiler which allows steering of the performance tuning process. Most existing tools estimate only execution time. The problem with this parameter is that much important information is hidden in a single runtime figure. As a consequence, the cause of potential performance losses remains unknown. It is not clear whether a parallel program's performance is poor due to cache, load balance, communication, or computation behavior. Other performance parameters may also play an important role. Without making such information transparent, performance tuning is extremely difficult. P3T+ computes a set of performance parameters at compile time, each of which reflects a different performance aspect.

P3T is currently being extended by a performance estimator for high-level algorithms, which provides performance analysis at an early stage of application development. A graphical modeling approach is envisioned in order to describe both applications and architectures. The graphical models will be annotated with cost functions, based on which simulation will derive performance metrics. The novelty of this approach will be that no application code must be provided in order to apply performance prediction.

[Figure: Performance-driven development of parallel programs. The diagram shows VFC, the program database, the coordination system with GUI, the performance analysis system (SCALA), and the performance prediction system (P3T+), connected by data-flow and control-flow arrows.]

SCALA

SCALA is a post-execution performance system that instruments, measures, and analyzes the behavior of parallel programs. The architecture of SCALA is based on a portable instrumentation system, runtime libraries that collect performance data during program execution, and post-execution performance analysis that computes various performance metrics and relates them back to the input program. In addition, SCALA supports an interface to external visualization systems. Although SCALA has been integrated with an existing compiler, it can be easily ported to front ends and compilers for other programming languages and architectures by porting its instrumentation and runtime libraries.

Figure shows the architecture of SCALA as an integrated system of VFC (see Section ). The input programs of SCALA (Fortran, HPF, and explicit message-passing programs) are processed by VFC's front end, which generates an abstract syntax tree (AST). The SCALA Instrumentation System (SIS) enables the user to select code regions of interest by directives or command-line options. Based on the selected code regions, SIS automatically inserts monitoring code in the AST, which will collect all relevant performance information during execution of the program. SIS also generates a measurement description file that enables all the performance data gathered to be related back to the input program. This is a crucial aspect of SCALA, as instrumentation may be done at a different level (e.g. message passing) from that of the original input program (e.g. HPF). The compiler then generates an instrumented parallel program, which is executed on the target architecture. Note that the compiler can also parse message-passing programs for instrumentation and performance analysis. During execution, all relevant performance data is collected in a trace file. The trace file provides a generic input for post-mortem data management and measurement analysis to reduce, filter, summarize, and analyze performance information. A variety of performance metrics are computed, which include, among others, speedup, efficiency, efficacy, communication, and work distribution. An interface for a visualization system is employed to graphically display various performance statistics and profiles, which can be shown together with the original input program.
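Two of the metrics named above, speedup and efficiency, can be illustrated with a minimal sketch. It uses the textbook definitions (speedup as single-processor time divided by parallel time, efficiency as speedup divided by the number of processors); the report does not state which definitions SCALA uses, so the function names and the sample timings are assumptions for illustration only.

```python
def speedup(t_serial, t_parallel):
    """Textbook speedup: single-processor time divided by parallel time."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Textbook efficiency: speedup divided by the number of processors."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    return speedup(t_serial, t_parallel) / processors


def summarize(timings):
    """Map processor count -> (speedup, efficiency).

    `timings` maps a processor count to a measured run time and must
    contain an entry for one processor, which serves as the baseline.
    """
    t1 = timings[1]
    return {
        p: (speedup(t1, t), efficiency(t1, t, p))
        for p, t in sorted(timings.items())
    }


if __name__ == "__main__":
    # Hypothetical run times in seconds, not taken from the report.
    for p, (s, e) in summarize({1: 120.0, 2: 64.0, 4: 36.0, 8: 24.0}).items():
        print(f"p={p}: speedup={s:.2f}, efficiency={e:.2f}")
```

A post-execution tool would derive the timings from the trace file rather than from a hand-written table.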

[Figure: Execution-Driven Performance Analysis System. The diagram shows the input program passing through the VFC front end, the SIS code region selector and instrumentation, parallelization and compilation, execution with the SIS and VFC run-time systems, and the resulting trace file and measurement description feeding data management, measurement analysis, and visualization.]

The general structure of SCALA comprises several modules which, combined together, provide a robust environment for advanced performance analysis:

• SCALA Instrumentation System (SIS)
• Performance data correlation
• Data management and measurement analysis
• Performance visualization interface

SCALA is currently being extended for a larger class of input programs, including Opus and OpenMP programs. Moreover, SCALA as well as P3T will be enhanced by a data repository in order to store both performance data and input programs. A query system will be provided that enables more sophisticated and comprehensive performance analysis.

[Figure: Architecture of the HPF Debugging System. The diagram shows the HPF+ source program compiled by the Vienna Fortran Compiler (VFC) into an F90+MPI parallel program and then by an F90 compiler into an object program on parallel hardware, with symbol tables feeding the HPF Dependent Debugging System (University of Vienna) and the Base Debugging System (TU Munich & AGH Cracow), the OCM monitoring system, and the visualization system with GUI and GDDT (University of Vienna & University of Linz).]

SPIDER

SPiDER, an advanced symbolic debugging system for Fortran/HPF parallel programs, enables the control and monitoring of program processes at the source code level. Multiple process views of the program allow a programmer to examine a single process of a parallel program or to inspect the entire program from a global point of view. SPiDER allows the examination of distributed data structures as a single global entity, i.e. a programmer can inspect and modify a section or individual elements of distributed arrays without the need to specify on which processors the elements reside. Moreover, SPiDER provides support for regular and irregular applications, with several exceptional features for visualization and steering of data distributions. Data distributions can be dynamically changed after stopping program execution at a breakpoint. Sophisticated visualization capabilities provide graphical representation of array values and data distributions, with convenient navigation facilities for distributed data and logical processor arrays. Snapshots of a given array can also be stored and differences between them visualized. For applications in which the distribution of arrays changes during program execution, SPiDER provides an animated replay of the array redistribution sequence and allows the migration of arbitrary array elements to be observed in a stepwise or continuous mode. Finally, SPiDER provides a load diagram that visualizes how many array elements have been mapped to every processor. This feature enables a programmer to examine how evenly data has been distributed across all processors.
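The quantity a load diagram displays can be sketched for a one-dimensional array as follows. The ownership rules are the standard HPF BLOCK and CYCLIC distributions with 0-based indices; the function names are hypothetical and this is not SPiDER's implementation, only an illustration of the element counts per processor.

```python
from collections import Counter
from math import ceil


def block_owner(index, n, p):
    """Owner of a 0-based element under HPF BLOCK: block size is ceil(n / p)."""
    return index // ceil(n / p)


def cyclic_owner(index, p):
    """Owner of a 0-based element under HPF CYCLIC: round-robin assignment."""
    return index % p


def load_per_processor(n, p, owner):
    """Count how many of the n elements each of the p processors owns."""
    counts = Counter(owner(i) for i in range(n))
    return [counts.get(proc, 0) for proc in range(p)]


if __name__ == "__main__":
    n, p = 10, 4  # hypothetical array size and processor count
    print("BLOCK :", load_per_processor(n, p, lambda i: block_owner(i, n, p)))
    print("CYCLIC:", load_per_processor(n, p, lambda i: cyclic_owner(i, p)))
```

For ten elements on four processors, BLOCK leaves the last processor with a single element while CYCLIC spreads the remainder more evenly, which is the kind of imbalance a load diagram makes visible.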

Figure shows the architecture of SPiDER with an emphasis on the support provided by VFC and low- and high-level debugging technology. The input programs of SPiDER are compiled with VFC to Fortran message-passing programs. In order to generate an executable file, a vendor back-end Fortran compiler is used. The two-stage compilation process is reflected in the debugger architecture. The main parts of the system are the Base Debugging System (BDS) and the HPF Dependent Debugging System (HDDS). BDS operates as a low-level debugger closely related to the target machine on which it is running. It resolves all platform-specific issues and hides them from the HDDS level. It also constitutes a clear, simple, but unequivocal interface that provides functionality for inspecting the state of processes and values of data in the parallel program. BDS does not check for consistency of the running application with the HPF source code, but provides information to HDDS about every process of the program. The design of BDS partially relies on the DETOP parallel debugger and the OCM monitoring system developed at the Technical University Munich. HDDS works on top of BDS and provides higher-level functionality to allow viewing the associated HPF source code of the target parallel program and to interactively control and alter the application data. The interface of SPiDER to VFC is supported by a symbol table file, which includes mapping information about mutually corresponding lines and symbols in the HPF source program and the resulting Fortran message-passing program, and information about compiler transformations.

A programmer interacts with SPiDER by using the visualization system, which consists of a Graphical User Interface (GUI) and a Graphical Data Distribution Tool (GDDT) for visualization of HPF data structures.

Chapter

European Centre of Excellence for Parallel Computing at Vienna (VCPC)

Ian Glendinning and Willy Weisz

Introduction

VCPC has the primary objective of furthering the use of parallel processing in the marketplace. The Centre was established in when it received start-up funding from the European Union, provided within the ESPRIT research and development program in Information Technologies. Since that time, VCPC has pursued an agenda of research and development in languages, tools, and applications for high performance computing architectures. A major focus of the work is the transfer of maturing technologies in HPCN from research to industry. VCPC has been active at the European level within the framework of the EU's HPCN Technology Transfer Network, where it joined with local industrial organizations to form ATTN, the Austrian Technology Transfer Node. VCPC has participated actively in standardization activities and encouraged the take-up of new programming paradigms, in particular HPF. VCPC provides consulting services and education and training. Staff members are able to teach the use of MPI, HPF, Fortran, and other topics of parallel programming, as well as the use and administration of operating systems used for parallel architectures, e.g. Linux on Beowulf-class clusters.

The main computing facility at VCPC during was a Meiko CS, which was used by industry and academia for applied research, code development, benchmarking, and demonstration purposes. In the CS was withdrawn from service and a Beowulf-class cluster of PCs was installed. These computing facilities are described in more detail in the next section, and in the remainder of the chapter a brief overview is given of the major projects in which VCPC was involved during this reporting period.

Computing Facilities

CS

The CS was a modular general-purpose distributed-memory parallel computer, and the system installed at VCPC comprised a total of Superscalar SPARC processing nodes connected by a proprietary multistage network; of the nodes were diskless and had MByte of memory each, while server nodes had MByte of memory and a GByte disk each, making a total GByte of main memory. A further GByte of disk space was available for shared access by all nodes. The system offered a Solaris operating system adapted to the parallel computing environment by Meiko.

The development software installed on the system included single-node compilers (Fortran, C, and C++ from Apogee, PGI, and Sun), message-passing libraries (PARMACS, MPICH, PVM, MPSC), performance monitoring and debugging tools (TraceGen, ParaGraph, VAMPIR, upshot, TotalView), numerical libraries (BLAS, BLACS, PBLAS, ScaLAPACK, PETSc), HPF-level software (PGI HPF, NAS HPF, Mapper, Debugger), code migration and clean-up tools (Loft, FORESYS, FORGExplorer), and public domain and research tools (PARMACS to MPI converter, ADAPTOR, sHPF, IDA, VFCS).

Apart from some interruptions due to hardware and operating-system problems, the system provided the service necessary for the implementation, testing, and benchmarking of tools and applications in quite a stable manner. Partitions of up to processor nodes were offered to the users during regular operation, and up to nodes were available upon request for a limited time span.

PC cluster

When the CS was withdrawn from service in a small Beowulf-class cluster of PCs was installed as a temporary measure until a more substantial system could be procured. Since clusters of SMPs were envisaged as the target architecture for a planned procurement, the nodes of the PC cluster are twin-processor PCs, which allow testing of programs written for shared-memory parallel processing. The configuration of the cluster is:

A front-end PC with:

• Intel Pentium II MHz processors
• GB ECC RAM, MHz bus
• GB LVD hard disk for system, temporary, and swap space
• GB disk space, mainly for users (RAID array with five GB LVD disks and a hot spare disk for automatic reconstruction in case of a disk failure)

compute-node PCs, each with:

• Intel Pentium II MHz processors
• MB ECC RAM, MHz bus
• GB EIDE hard disk for system, temporary, and swap space

All nodes of the cluster are connected via a local switched Fast Ethernet, and only the front-end PC has a second network interface card that connects it to the outside world. The network addresses on the cluster LAN have been chosen in the range to so that the LAN constitutes a Private Internet whose local nodes cannot be addressed from the Internet. This setup was chosen so that users can only log into the front-end PC (as gescher.vcpc.univie.ac.at), and the other PCs act like the compute nodes of a parallel computer.
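Whether an address belongs to one of the private address blocks reserved for such internal networks can be checked with a short sketch. The three blocks listed are those reserved by RFC 1918; the address range actually used on the cluster LAN is not legible in the text, so the sample addresses below are hypothetical.

```python
import ipaddress

# Address blocks reserved for private internets by RFC 1918.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address):
    """Return True if the address lies in one of the RFC 1918 private blocks."""
    addr = ipaddress.ip_address(address)
    return any(addr in block for block in PRIVATE_BLOCKS)


if __name__ == "__main__":
    # Hypothetical addresses, chosen only to exercise the check.
    for candidate in ("192.168.1.10", "10.1.2.3", "172.32.0.1", "131.130.1.1"):
        print(candidate, is_rfc1918(candidate))
```

Addresses inside these blocks are not routed on the public Internet, which is why compute nodes numbered this way can only be reached through the front-end PC.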

The operating system installed is Linux kernel version. Due to previous rather problematic experiences with a cluster having shared system disks under AIX, it was decided that each node should carry its own version of the operating system and basic software components, so that the nodes could be booted independently of the availability of matching software components on the front end. The setup with local swap and temporary space was chosen for its obvious performance advantage.

C, C++, and Fortran compilers offered as Open Source are installed, as well as the compiler suite PGI HPF marketed by Portland Group Inc., whose Fortran compilers are known to generate efficient object code for the x platform.

The message-passing libraries MPICH version (an implementation of version of the MPI standard, which supports shared-memory communication on SMP nodes) and PVM version are installed together with their support programs. The mathematical and support libraries BLAS, BLACS, LAPACK, and ScaLAPACK are also available.

The system is set up so that users only work interactively on the front-end PC. Processes on the compute nodes can only be started via remote shell calls, a mechanism also used by the message-passing libraries and support programs. The mailing system, used by the hardware monitoring feature and also available to user processes, is set up such that only the front-end PC has a full mail system and the compute nodes use the front end as a mail relay.

Since the cluster must run unattended, hardware monitoring features are essential. Each motherboard has sensors for temperature, voltage, and fan speed, and the software package lm_sensors sends a warning email message to the system administrator whenever a value transmitted by a sensor is outside its acceptable range.
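The monitoring rule described above (warn when a reading leaves its acceptable range) can be sketched as follows. This is not the interface of the sensor package itself: the sensor names, the limits, the node name, and the mail addresses are all hypothetical, and actually sending the message through the front-end mail relay is left out.

```python
from email.message import EmailMessage

# Hypothetical acceptable ranges (low, high) per sensor.
LIMITS = {
    "temperature_c": (10.0, 60.0),
    "voltage_v": (4.75, 5.25),
    "fan_rpm": (2000.0, 8000.0),
}


def out_of_range(readings, limits=LIMITS):
    """Return one description per reading that lies outside its range."""
    problems = []
    for name, value in sorted(readings.items()):
        low, high = limits[name]
        if not (low <= value <= high):
            problems.append(f"{name}={value} outside [{low}, {high}]")
    return problems


def build_warning(node, problems, admin="admin@example.org"):
    """Compose a warning message, or return None if nothing is wrong."""
    if not problems:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"Hardware warning on {node}"
    msg["From"] = "monitor@example.org"
    msg["To"] = admin
    msg.set_content("\n".join(problems))
    return msg


if __name__ == "__main__":
    readings = {"temperature_c": 72.0, "voltage_v": 5.0, "fan_rpm": 3000.0}
    warning = build_warning("node07", out_of_range(readings))
    if warning is not None:
        print(warning)
```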

The ESPRIT Project ATTN and Associated Projects

The Austrian Technology Transfer Node (ATTN) was the Austrian node in the European HPCN Technology Transfer Nodes (TTN) Network, which consisted of Technology Transfer Nodes throughout Europe. The ATTN consortium consisted of VCPC and ACR, the umbrella organization of Austrian cooperative research organizations. The project started in July and was completed in March.

The goal of ATTN was to support the take-up of HPCN technologies in European industry in general, and in the Austrian marketplace in particular, by initiating and supporting activities which use HPCN technologies to solve existing business problems, and by disseminating their results. During the period covered by this report, ATTN administered and coordinated seven projects, six of which it initiated, in different industrial sectors:

FRAME: The digital film restoration system LIMELIGHT was parallelized, allowing fast and cost-efficient restoration of damaged cinema and video films.

HPCNWOOD: An HPCN-awareness campaign about the advantages of HPCN-supported automatic quality control, targeting the Austrian wood industry.

FLOAT: An application for water-level prediction was developed, based on the commercial genetic algorithm code EVIS. Its performance was evaluated using historical water-level data, and analysis showed that it could be efficiently parallelized.

HiPEC: A visualization system for bathroom design was enhanced with the help of HPCN, and it was proven that it is possible to produce photorealistic pictures of a fully equipped bathroom within the time a potential customer is willing to spend on the premises of a retailer.

ADOLA: An existing parallelized CFD code was adapted to simulate the heat-exchange system of a laundry dryer, to make the design process faster.

HPCNCAST: An HPCN-awareness campaign and study targeting the Austrian foundry industry.

APPEAL: The goal of the project was to assess the applicability of HPCN technology for simulation of ASIC design.

VCPC participated as a partner in four of these projects, which are described in more detail in the following subsections.

The Network of Technology Transfer Nodes clustered its activities around sector groups addressing different industrial sectors, in order to facilitate access by companies to results relevant to them, to direct information efficiently at appropriate companies, and to use market feedback to fit end-user requirements better. The ATTN partners disseminated the results of the Austrian activities to a wider audience by collaborating with other TTNs and by participating in the following sector groups:

Media, Video and Entertainment (based on the FRAME project): ATTN coordinated this sector group. It organized three sector group meetings, provided external expertise, participated in three events, supported the organization of a booth at IST in Vienna, made press releases, and produced dissemination material.

Forging, Casting, Moulding (based on the HPCNCAST project): This sector group was initially coordinated by ATTN, and although it had to hand over the activity for personnel reasons, it continued to work in the group, creating and maintaining its website, providing external expertise for the metal part of the group, and contributing printed material.

Quality Control and Inspection (based on the HPCNWOOD project).

Several presentations at workshops and fairs were given by the ATTN partners, e.g. at IST in Helsinki.

VCPC also initiated and co-organized an information workshop on the R&D Framework Programme in Information Technology of the European Commission. The event was organized in cooperation with the Hungarian National Committee for Technological Development (OMFB) and the Austrian Embassy in Budapest, and with the support of the Delegation of the European Commission. The Information Day was regarded by the Hungarian partners as a kick-off event for seeking projects and partners for IT-related proposals for the th Framework Programme (FP). Members of VCPC gave presentations on the HPCN activities in FP, the HPCN TTN Network, ATTN and its projects, and on the IST Programme in FP, with emphasis on its rules for preparing proposals, the decision phase, and the funding schemes.

At the official presentation of the FP IST Programme organized by the OMFB and the European Commission (Information Society Technology Information Day) a month later, Willy Weisz was invited to chair a session and to give a presentation on HPCN.

The ESPRIT Project FRAME

Within the FRAME project, VCPC parallelized the digital film restoration system LIMELIGHT developed by Joanneum Research. The project started in July and was completed in November. VCPC's partners in the project were Joanneum Research, Laboratoires Neyrac Films (LNF), and Quadrics Supercomputers World (QSW).

The system developed within the EUREKA project LIMELIGHT applies semi-automatic defect detection and removal methods to degraded digitized film material. Although the results achieved were promising, the computation time was much too high for industrial use. The results clearly indicated that parallelization could enable an acceptable level of performance to be obtained while retaining a high quality of restoration. By parallelizing the proven digital film restoration methods implemented in LIMELIGHT, FRAME enabled real industrial use of the software. The system created in the FRAME project was the first special-purpose system for automatic large-scale film restoration.

The initial sequential software consisted of a basic analysis module to prepare the film data for restoration and a number of modules for performing detection and removal of different types of film defects. Within FRAME, the performance of the existing restoration software was first analyzed, and the most computationally intensive modules were found to be the basic analysis module, the module for dust detection and removal, and the module for noise suppression. These three modules were therefore considered to be the main candidates for parallelization, and after making some general optimizations, which improved the performance significantly, the modules were parallelized for execution on large distributed or massively parallel computers, using the CS at VCPC as a hardware platform and the portable MPI message-passing interface as a software base.

Two main parallelization strategies were applied. In a scenario where the restoration parameters remain fixed for relatively long sequences of frames, frame sequence decomposition is the optimal strategy, in which the frame sequence is split into smaller sequences and each processor works on one of the smaller sequences. However, this is not appropriate when fine-tuning of the restoration parameters is performed by a human operator, who needs fast processing of each frame rather than just a high average throughput. In such a scenario a task splitting strategy is also needed, in which almost independent functional units, such as local motion warping, are identified in each module and executed in parallel. Both approaches used together guarantee good scaling for a large number of processors. For modules where a frame sequence decomposition was not possible, a frame decomposition was used instead, in which each frame was decomposed into blocks of almost equal size and an image processing operation was applied to each block. The scaling of this approach is not as good as that of frame sequence decomposition, due to the limited size of each frame, and it was therefore only used where necessary.
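The two data decompositions described above can be sketched in a few lines. The report does not give the partitioning rules used in FRAME, so this sketch assumes the simplest choice: contiguous ranges whose sizes differ by at most one. The function names and the frame dimensions in the usage example are hypothetical.

```python
def split_sequence(num_items, num_parts):
    """Split range(num_items) into contiguous half-open (start, stop) ranges.

    The parts differ in size by at most one item; the first
    `num_items % num_parts` parts receive the extra item.
    """
    if num_parts < 1:
        raise ValueError("num_parts must be at least 1")
    base, extra = divmod(num_items, num_parts)
    ranges = []
    start = 0
    for part in range(num_parts):
        size = base + (1 if part < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def split_frame(height, width, rows, cols):
    """Tile one frame into rows x cols blocks of almost equal size.

    Each block is returned as ((row_start, row_stop), (col_start, col_stop)).
    """
    row_ranges = split_sequence(height, rows)
    col_ranges = split_sequence(width, cols)
    return [(r, c) for r in row_ranges for c in col_ranges]


if __name__ == "__main__":
    # Frame sequence decomposition: 10 frames over 4 processors.
    print(split_sequence(10, 4))
    # Frame decomposition: one hypothetical 576 x 720 frame in 2 x 2 blocks.
    print(split_frame(576, 720, 2, 2))
```

In frame sequence decomposition each processor works through its own range of whole frames, whereas in frame decomposition the processors cooperate on the blocks of a single frame, which limits scaling because a frame offers only a fixed amount of work.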

Evaluation of the resulting code running on processors of the CS resulted in a processing time of seconds per frame in video resolution, which represented a satisfactory improvement in throughput compared with seconds per frame for a single processor. The resulting prototype system was further optimized, keeping it portable, and finally installed on an processor Quadrics CS system at LNF's site near Paris. There it was tested and evaluated under real industrial conditions, and a processing time of seconds per frame was measured in video resolution and seconds per frame in high resolution. While this was slightly higher than the project's original goal of seconds per frame, FRAME achieved its main objective of reducing restoration costs, and the software has since been licensed several times.

The ESPRIT Project FLOAT

The FLOAT project carried out an assessment to determine whether there was a need for HPCN techniques to create a computer model which could be used to predict water levels at critical sections of the Danube. The project started in October and finished in March. VCPC's partners in the project were OIR, the Austrian Institute for Development Planning, and EVIS Technologies.

The motivation for the project was that inland shipping companies do not fully load their barges, in order to increase their chances of navigating all sections of the waterway on their route without delay, and if water levels drop below expectations, barges must either waste time waiting to pass a shallow stretch of river or must lighten their load via additional transshipment, thus increasing transport costs. Given access to sufficiently accurate prediction of water levels, in particular on critical sections of waterways, carriers could adjust schedules and levels of cargo to the given water situation. This applies to many European waterways and their users, but is of extreme relevance for Danube navigation.

Within FLOAT, a water level prediction model was developed using EVIS™, a machine learning software product based on the application of genetic algorithms to symbolic computation. OIR provided data from for daily mean water levels of the Danube, the daily amount of precipitation, and daily mean temperatures, each at a number of measuring stations, and this data was used together with EVIS™ to evolve a water level prediction model for one day in advance. To evaluate the quality of the prediction model, its results were compared with the observed water level data of the next day for a one-year period (the th year). Although only a very small amount of input data was used for the prediction, a very accurate prognosis was obtained. The mean absolute deviation was cm between the predicted and the actual water level, and of all predicted water levels deviated no more than cm from the actual value. Shipping companies would be satisfied with an accuracy of cm or better.
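The two accuracy figures quoted above, the mean absolute deviation and the share of predictions within a given tolerance, can be computed as in the following minimal sketch. The function names are hypothetical and the water levels in the usage example are invented for illustration; they are not the project's data.

```python
def mean_absolute_deviation(predicted, observed):
    """Average of |predicted - observed| over all paired values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty series of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def fraction_within(predicted, observed, tolerance):
    """Share of predictions that deviate by at most `tolerance`."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty series of equal length")
    hits = sum(1 for p, o in zip(predicted, observed) if abs(p - o) <= tolerance)
    return hits / len(predicted)


if __name__ == "__main__":
    # Invented water levels in cm, for illustration only.
    predicted = [310, 305, 298, 320]
    observed = [300, 305, 300, 300]
    print(mean_absolute_deviation(predicted, observed))
    print(fraction_within(predicted, observed, tolerance=10))
```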

However, despite the small amount of input data, the EVIS™ software ran on a single workstation for approximately hours for the one-day prediction. Furthermore, shipping companies need a reliable prognosis at least days in advance, and to achieve the necessary accuracy additional variables will be needed, such as meteorological satellite data, weather forecast data, water levels from monitoring stations along the Danube throughout Europe, and data supplied from hydroelectric power stations. This augmented data set will increase the demand for computing resources.

This computing time could be reduced by running the EVIS™ software on several computers concurrently, and therefore an assessment of the potential for parallelization of the EVIS™ software was made. It was concluded that it is possible to parallelize it efficiently, and that the use of HPCN could enable accurate prediction of water levels to be achieved within a reasonable time.

The ESPRIT Project HPCNCAST

The HPCNCAST project was an awareness campaign to inform the foundry industry about the technical and economic advantages of using HPCN technology. The main goal of the project was to gain a better understanding of current business processes in foundries and to determine their potential benefit from HPCN technology and networking services. The project started in August and finished in March, and the work was carried out by the two partners of the ATTN consortium, VCPC and ACR, with support from the Association of the Austrian Foundry Industry (Fachverband der Gießereiindustrie der Wirtschaftskammer Österreich) and the Austrian Foundry Research Institute (ÖGI, Österreichisches Gießerei-Institut), a member of the ACR.

The foundry industry in Austria consists of about companies, most of them small and medium-sized. Along with the rest of the industrial sector, it is under intense economic pressure from the former eastern European countries, which are forcing Austrian businesses to achieve shorter re-engineering cycles to remain competitive. Simulation applications are being used more and more in this industry to save both time and money. The level of accuracy required necessitates sophisticated computing techniques and leads to high computing requirements, with typical calculations requiring from several hours up to several days.

The HPCNCAST project was intended to deliver an overview of the needs and expectations of the foundry industry regarding HPCN technologies within the following three years. The analysis of the computing needs was not restricted to the simulation phase in the casting process, but also included the preprocessing necessary to convert the data received from the industries. A further objective of the work was to determine to what extent the findings of a survey of Austrian foundry businesses are characteristic of the European foundry industry as a whole.

In order to achieve its goals, the project prepared promotional material targeted at foundries, held two awareness events in Austria, prepared and distributed a questionnaire to Austrian foundries and evaluated its results, and performed three in-depth assessments. The results were compared with the findings in other EU countries via the EARTO network, and a report on the results was produced and disseminated via the network of HPCN TTN nodes.

The project found that there is a steep increase in the use of compute-intensive tasks like rapid prototyping and simulation within the Austrian foundry industry, but that the majority of the companies give all work related to compute-intensive tasks to external consultants. In general, simulation was not considered to be time critical. A reduction of run time would be interesting only if it was by more than a factor of ten, because reasonable working during office hours would then be possible. It was concluded that for most of the small and medium-sized companies there is no real demand for HPCN technology within the next years; however, demand may exist for service providers for the Austrian foundry industry and for the few big Austrian companies. Finally, the findings in Austria were found to be very similar to observations made in Germany and Spain.

The ESPRIT Project APPEAL

The goal of the APPEAL project was to assess the potential of HPCN technology to improve the quality of simulation of analog and mixed-signal ASICs (Application-Specific Integrated Circuits) in an industrial context. The project started in March and finished in October. VCPC's partners were the French software company ANACAD, producer of the commercial software package ELDO, widely used in industry for analog circuit simulation, and the Austrian chip manufacturer AMS (Austria Mikro Systeme), a leading end user of ELDO.

In many sectors, including the communications and automotive industries in particular, mechanical functions are being replaced by electronic functions. This has led to a strong demand for integrated systems solutions, especially where the integration of analog and digital functions on one integrated circuit is required. For the suppliers, time to market is of critical importance when custom chips are produced to satisfy these demands.

The extent to which ASICs are simulated before they go into production is, as a result of the market constraints, strongly related to the computational requirements of the task. The number of devices integrated on a single chip, and hence the complexity of integrated circuits, is steadily increasing, and as a result the demands placed on the designers and their development tools are rising dramatically. Software products are currently in widespread use for the various phases of digital circuit design, but analog and mixed-signal integrated circuits exhibit a much higher complexity, and the computing constraints do not currently permit the same extensive simulations during the design phase. The industrial risk is therefore significantly higher.

The APPEAL project aimed to be a first step in providing improved computational support for the design of such circuits, by assessing the potential of HPCN technology to improve the quality of simulation of analog and mixed-signal ASICs in an industrial context. The intention was to evaluate ANACAD's software package ELDO in order to determine its suitability for parallel execution on a network of workstations. ELDO is derived from the public domain circuit simulator Spice, developed at the University of California, Berkeley in the early seventies, as are other commercial analog circuit simulators, and it was therefore planned to analyze the kernel of Spice in order to determine parallelization strategies and to assess the implementation effort needed and the computing resources required to achieve appropriate performance goals for industrial deployment of a parallel version of ELDO.

Test data was supplied by AMS, who developed three different test cases: two digital circuits and one analog and mixed-signal circuit. Unfortunately, none of the three test cases could be simulated correctly by Spice. The problem was tracked down to feedback loops, which resulted in severe convergence problems in Spice. Without feedback loops, no test case of reasonable size fulfilling the requirements of the project plan could be created, so VCPC suggested continuing the work with the ELDO application. However, the ELDO source code was not disclosed to the APPEAL project, so no further work on assessing its potential for parallelization could be done, and it was not possible to complete the assessment.

The ESPRIT Project FITS

The goal of the FITS project was to create an extensible integrated toolset for the creation, migration and performance tuning of Fortran applications for execution on a variety of HPC systems. The project started in June and finished in May, and the partners were SIMULOG, PALLAS, ZHR Technical University Dresden, VCPC, INRIA, Battelle and QSW.

The FITS toolset was based on two major European tools dedicated to the support of Fortran applications, the FORESYS source code restructurer from SIMULOG and the VAMPIR performance analysis tool from PALLAS, as well as the TSF transformation module from SIMULOG and ANALYST, a research prototype interactive Fortran program analyzer from VCPC. In addition, ideas were incorporated from another tool, IDA, from the University of Southampton.

FORESYS is itself an integrated collection of tools to check, analyze and restructure Fortran programs. It accepts Fortran with many extensions and can transform it into standard-conforming Fortran or Fortran. It pretty-prints the source code to make the code structure clearer, performs a number of interprocedural checks, and can display data dependence graphs. VAMPIR is a tool that supports program optimization by visualizing a program's execution behavior. It does so by displaying information obtained by tracing a program run; it offers many measurement options and graphical displays, and allows the user to zoom in to arbitrarily small time intervals of the trace. The TSF module is an extension to FORESYS that provides a set of source code transformations, such as loop blocking, unrolling and interchanging, which are applied under the user's control.
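The three transformations named above can be illustrated with a small sketch. TSF itself operates on Fortran source; the Python functions below are hypothetical stand-ins, written only to show what each rewrite does to a loop nest while leaving the result unchanged.

```python
def transpose_naive(a):
    """Reference loop nest: b[j][i] = a[i][j], with i as the outer loop."""
    n, m = len(a), len(a[0])
    b = [[0] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            b[j][i] = a[i][j]
    return b


def transpose_interchanged(a):
    """Loop interchange: the j loop becomes the outer loop."""
    n, m = len(a), len(a[0])
    b = [[0] * n for _ in range(m)]
    for j in range(m):
        for i in range(n):
            b[j][i] = a[i][j]
    return b


def transpose_blocked(a, block=2):
    """Loop blocking (tiling): visit the iteration space tile by tile."""
    n, m = len(a), len(a[0])
    b = [[0] * n for _ in range(m)]
    for ii in range(0, n, block):
        for jj in range(0, m, block):
            for i in range(ii, min(ii + block, n)):
                for j in range(jj, min(jj + block, m)):
                    b[j][i] = a[i][j]
    return b


def sum_unrolled(x):
    """Loop unrolling by a factor of 4, followed by a remainder loop."""
    total = 0
    n = len(x)
    limit = n - n % 4
    for i in range(0, limit, 4):
        total += x[i] + x[i + 1] + x[i + 2] + x[i + 3]
    for i in range(limit, n):
        total += x[i]
    return total
```

All three transposes return the same matrix; only the order in which memory is visited differs, which is what matters for cache behavior in a compiled language. Note that unrolling a floating-point reduction can change rounding, which is one reason such rewrites are best applied under the user's control.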

ANALYST is an interactive Fortran program analyzer that allows the user to browse through a code and obtain information about it in graphical and textual forms. The displays are interactive, so that, for example, clicking on an arc of a call graph gives details of the call it represents. ANALYST formed the basis for developing the FITS Graphical Interface Module (GIM). IDA is a command-line driven tool for interactive program analysis, which can provide a quick textual display of a program's call graph.

The combined FITS toolset extends the functionality of the individual tools in several ways. For example, FORESYS can automatically instrument a program so that it generates a trace file for display by VAMPIR. VAMPIR can then not only visualize the execution behavior, but can also invoke FORESYS to display the corresponding location in the source text, and can invoke the GIM to display the relevant nodes in the call graph. Conversely, the GIM can invoke FORESYS and VAMPIR displays.

The GIM, which was VCPC's contribution to the project, provides several graphical views of information about the code, such as a call graph. Clicking on a program unit node in the call graph can display its source code in the FORESYS text display, or display statistics from VAMPIR. The nodes of the graphs are movable, and the user can group several nodes together, hide them, or shrink them into one node and give it a label. Graphs can be displayed as graphs or as trees. The tree mode makes it easier to follow a path from a caller to a callee, but the graph mode is more compact when some subprograms are called from many different locations. Various other options help to make huge graphs more compact, so that users can display just the portions of the graph in which they are interested; one example is layer mode, in which nodes at the same depth from the root node are grouped together.
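As an illustration of layer mode, the sketch below groups the nodes of a call graph by their depth from the root. It assumes that "depth" means the shortest call distance from the root program unit, which the report does not define; the function name and the dictionary representation of the graph are likewise hypothetical and are not taken from the GIM.

```python
from collections import deque


def layers_from_root(call_graph, root):
    """Group call-graph nodes by breadth-first depth from the root.

    call_graph maps each caller to the list of its callees.
    Returns a list of layers; layer k holds the nodes at depth k.
    """
    depth = {root: 0}
    queue = deque([root])
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, []):
            if callee not in depth:  # first visit gives the shortest depth
                depth[callee] = depth[caller] + 1
                queue.append(callee)
    layers = {}
    for node, d in depth.items():
        layers.setdefault(d, []).append(node)
    return [sorted(layers[d]) for d in sorted(layers)]
```

A subprogram called from many places, such as a utility routine, then appears once in a single layer instead of being repeated under every caller as it would be in tree mode.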

The design of the toolset was influenced by the requirements of the end-user partners Battelle and QSW, who each migrated an industrial Fortran legacy code to Fortran and parallel systems, initially using the component tools, evaluating them, and providing feedback to the developers of the integrated toolset. Battelle restructured and parallelized their CFD code DIVA, and QSW did the same for the RIVIA code from their parent company Alenia Spazio, which computes the radiation patterns of complex antenna systems. Battelle and QSW also evaluated the final toolset and concluded that it fulfilled most of their requirements.

The ESPRIT Project VICAR

The goal of VICAR (Video Indexing, Classification, Annotation and Retrieval) was to create a system to support the cataloguing and retrieval of huge amounts of video material in television archives. The project started in August and finished in December, and VCPC's partners were the Dutch company Sentient Machine Research (SMR), the Austrian research center Joanneum Research (JRS), the Free University of Amsterdam (VUA), the Austrian national broadcaster ORF, the German television station SWR, the Swedish national television station SVT, the Dutch Audio Visual Archive (NAA) and the Dutch company Koot Management Consultancy (KMC).

The task of TV archives is the long-term preservation and high-quality documentation of all TV production, to enable reuse of material by program producers. This material has to be annotated with textual descriptions by archivists who view the material. VICAR aimed to help them by performing some classification and annotation of video material automatically, and to help program producers by providing them with query-by-image capabilities in addition to traditional textual queries.

The VICAR system was designed to process the video material in two basic phases: first indexing it to produce data structures representing video content, and then interpreting the index structures to extract explicit textual labels. Query by image can then be done by indexing the target image and comparing its index directly with the stored indexes, whereas textual queries match against the results of the interpretation. Six indexing modules were developed by SMR and JRS: a basic analysis module, which identified shots and keyframes, a motion analysis module, a similarity matching module, a setting classifier module, a car finder module and a face finder module. VUA worked on the representation of video content; ORF, SWR, SVT and NAA provided video material and were pilot users; and KMC was responsible for coordinating commercial exploitation of the system.
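The two-phase design and the two query paths can be sketched as follows. Everything specific here is an assumption made for illustration: the index is a hypothetical four-bin brightness histogram, the interpretation rules and label names are invented, and none of the names correspond to actual VICAR modules.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    shot_id: str
    index: tuple                              # phase 1 output
    labels: set = field(default_factory=set)  # phase 2 output


def index_frame(pixels):
    """Phase 1 (hypothetical): normalised 4-bin histogram of 0..255 values."""
    bins = [0, 0, 0, 0]
    for p in pixels:
        bins[min(p // 64, 3)] += 1
    return tuple(b / len(pixels) for b in bins)


def interpret(index):
    """Phase 2 (hypothetical rules): derive textual labels from an index."""
    labels = set()
    if index[0] > 0.5:
        labels.add("dark")
    if index[3] > 0.5:
        labels.add("bright")
    return labels


def make_shot(shot_id, pixels):
    idx = index_frame(pixels)
    return Shot(shot_id, idx, interpret(idx))


def query_by_image(archive, pixels):
    """Index the target image and compare it directly with stored indexes."""
    target = index_frame(pixels)

    def distance(shot):
        return sum(abs(a - b) for a, b in zip(shot.index, target))

    return min(archive, key=distance).shot_id


def query_by_text(archive, label):
    """Textual queries match only the results of the interpretation."""
    return [s.shot_id for s in archive if label in s.labels]
```

The point of the split is that query by image never touches the labels, so it still works for content the interpretation phase has no label for, while textual queries remain as cheap as a lookup.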

During the first year of the project, VCPC developed a prototype Web-based user interface, which allowed a textual annotation database to be queried, and movies corresponding to the results of the queries to be browsed by displaying sequences of so-called keyframes: representative frames which give a quick overview of a portion of the movie in a static form. The Web software was implemented using the CGI interface to server-side software running on the CS system at VCPC. The software consisted of scripts written in Perl, which queried an SQL database, executed a C program to extract individual frames from the movies (which were stored in MPEG format), and generated HTML for queries and results.

Further work on a user interface was carried out by JRS, who developed a Java-based user interface for the indexing modules called the Video Navigator (VIN). It offered a plug-in interface for developers, so new modules could easily be added, and allowed textual queries as well as query by image. Meanwhile, VCPC worked on the parallelization of indexing modules and set up a reference system for evaluation of the software. Due to a combination of factors, including the withdrawal of the CS from service, it was decided to target the parallel software for a Windows NT system rather than Unix as originally planned, and an NT cluster with six processors was procured. A master PC was installed with Windows NT Server Terminal Server Edition and MetaFrame, which allowed simultaneous remote access for up to five developers or users, and the WMPI implementation of the MPI message passing interface was installed on all nodes.

VCPC worked on parallelizing SMR's face finder and setting classifier modules, but most progress was made with the face finder, which had the task of identifying human faces. It worked by scanning each image horizontally and vertically at several different scales, cutting out a shape corresponding to an average face shape, and applying a neural network to calculate its similarity with a human face. Profiling of the code revealed that the loop performing the scanning consumed most of the execution time. Since the scanning of each scale was independent, the parallelization strategy adopted was to use a task farm to process the scales, and a parallel version of the code demonstrated a modest but promising speedup of about three running on all six processors. Towards the end of the project, SMR delivered another sequential version of the face finder with a significantly different internal structure and execution profile, which needed a different parallelization strategy to achieve a good speedup. A new strategy was identified, but its realization was not possible within the time frame of the project.
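The task-farm structure described above can be sketched in a few lines. This is not the project's code: the original was parallelized with MPI on the NT cluster, whereas the sketch uses a Python thread pool as a stand-in for the farm, and a mean-brightness test as a hypothetical stand-in for the neural network. All function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def score_window(image, top, left, size):
    """Hypothetical stand-in for the neural network: mean window brightness."""
    total = sum(image[r][c]
                for r in range(top, top + size)
                for c in range(left, left + size))
    return total / (size * size)


def scan_scale(image, size, threshold=0.5):
    """One farm task: scan the whole image at a single window size."""
    hits = []
    rows, cols = len(image), len(image[0])
    for top in range(rows - size + 1):
        for left in range(cols - size + 1):
            if score_window(image, top, left, size) > threshold:
                hits.append((size, top, left))
    return hits


def find_faces(image, sizes, workers=6):
    """Task farm: each scale is an independent task handed to a free worker."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_scale = list(pool.map(lambda s: scan_scale(image, s), sizes))
    # Results come back in the order of `sizes`, so the output is deterministic.
    return [hit for hits in per_scale for hit in hits]
```

Because the cost of a task depends on the scale (small windows mean many more positions to test), the tasks are unevenly sized, which is one plausible reason a farm over a handful of scales yields a speedup well below the processor count. In CPython the thread pool shows only the structure; a real speedup for this CPU-bound loop would need separate processes or message passing.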

VCPC also installed the VIN on the reference system, integrated the parallel version of the face finder module into it, and made performance measurements for the indexing and query-by-image modules. This system was used to support ORF in their evaluation of the software.

The FSP Project Mathematical Methods and Tools for Digital Image Processing

Within project S of the Austrian FSP research program, VCPC collaborated with the Institute for Computer Graphics (ICG) in Graz, Austria, and the Jet Propulsion Laboratory (JPL) in Pasadena, California, USA, to parallelize the Magellan SAR processor, a program which performs image analysis on the data collected by the Magellan spacecraft.

Magellan used a sophisticated imaging radar to pierce the cloud cover enshrouding the planet Venus and map its surface. The spacecraft was carried into Earth orbit in May by space shuttle Atlantis, and was then propelled by a booster engine toward Venus, where it arrived in August. During its day primary mission, referred to as Cycle, the spacecraft mapped well over percent of the planet with its high-resolution Synthetic Aperture Radar (SAR). The spacecraft returned more digital data in the first cycle than all previous US planetary missions combined. It completed its third day period mapping the planet in September.

The motivation for parallelizing the code was to facilitate reanalysis of the data, because when it was first analyzed there were ephemeris errors of the order of kilometer, as well as radiometric errors, which meant that the processed data was not good enough for stereoscopic analysis. However, the data was originally analyzed using a program running on special hardware optimized for the task, and even then it took around hours of processing, which is about six weeks running hours a day. A parallel version of the code that could reduce the analysis time would clearly make reanalysis of the data a more practical proposition.

Magellan orbited Venus approximately once every hours, passing over the poles, and mapped the surface of the planet in thin north-south strips about km wide and km long. These strips were nicknamed noodles, and some of them were mapped during the whole mission, per cycle. The radar operated in burst mode, sending out trains of pulses and listening for echoes between the pulses. Each noodle contained data from around bursts, each of which can be analyzed independently, except for the final stage, called look buildup, which merges results from overlapping observations in neighboring bursts and produces the final output from the program, a bitmap image of the planet's surface.

The parallelization strategy adopted was to recode the radar burst processing loop so that part of each iteration could be executed as a task on a separate processor. In order to do this it was necessary to understand the code's data dependences, and due to its complexity (approximately thousand lines of Fortran in around subroutines, plus about lines of C) and the limited amount of documentation available for it, program analysis tools were needed. The FORESYS, IDA and FORGExplorer tools were evaluated, and it was found that the most effective approach was to use FORGExplorer's variable trace and common block usage facilities, in conjunction with FORESYS as a preprocessor to clean up the code. The parallelization was performed in the message-passing style of programming using the portable MPI message passing interface, with the node QSW CS machine at VCPC as a development platform. Technical details of this work can be found in
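The decomposition of the burst loop into an independent part and a sequential merge can be sketched as below. This is only a structural illustration under stated assumptions: the real code is Fortran and C using MPI, whereas the sketch uses a Python thread pool; the per-burst computation (a simple scaling) and the averaging rule in the merge are hypothetical placeholders, as are all names.

```python
from concurrent.futures import ThreadPoolExecutor


def process_burst(burst):
    """Independent part of one loop iteration (hypothetical computation).

    A burst is (offset, echoes); the result is an image strip at that offset.
    """
    offset, echoes = burst
    return offset, [2 * e for e in echoes]


def look_buildup(strips, length):
    """Sequential merge: average the looks where neighboring bursts overlap."""
    sums = [0.0] * length
    counts = [0] * length
    for offset, strip in strips:
        for k, value in enumerate(strip):
            sums[offset + k] += value
            counts[offset + k] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def process_noodle(bursts, length, workers=4):
    """Farm the independent part out to workers, then merge in burst order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        strips = list(pool.map(process_burst, bursts))
    return look_buildup(strips, length)
```

Only the merge sees more than one burst at a time, so it is the one stage whose data dependences cross iterations; identifying exactly which variables carry such dependences is what the program analysis tools were needed for.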

The ACTS Project AC DIANE

DIANE (Design, Implementation and operation of a distributed Annotation Environment) was conceived as a service allowing users to create, exchange and consume multimedia data easily. The basic concept to be supported was that of a multimedia annotated document consisting of several media (recorded application output, text, audio, video, images, HTML pages, mouse movements, etc.). The project started in September and finished in April. It was coordinated by Kapsch AG in Vienna, and the other partners were IPVR of the University of Stuttgart, VCPC, Systemas y Tratamiento de Information SA (STI) in Spain, Hospital General de Manresa (HGM) in Spain, and Silogic SA in France.

Annotations are simply questions, remarks, suggestions or notes that a reader adds to an existing document. Because the reader may not be the owner of the document, these annotations should be distinguishable from the original document, and it should be clear which annotations belong to which user. Usenet newsgroups are an example of a text-based annotation system. The DIANE project designed a multimedia authoring system for distributed environments and developed a prototype implementation of it. The system allows users to create and combine multimedia data in documents, share these documents with other users through a common workspace, and add annotations to their own or other users' documents. A DIANE document is not just a container for different types of media objects, but is more like a movie with an extent in time: the various objects within it have not only a position within the document area, but also a starting point and a duration on the time line of the document.
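The document model described above, in which every object has a place in the document area as well as an interval on the time line, can be sketched as a small data structure. DIANE itself was written in Java; the Python classes, field names and the rule that an annotation is any object whose author is not the document owner are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field


@dataclass
class MediaObject:
    name: str
    kind: str        # e.g. "text", "audio", "video", "image"
    x: int           # position within the document area
    y: int
    start: float     # starting point on the document time line (seconds)
    duration: float
    author: str

    def is_active(self, t):
        """True while the object is being presented at time t."""
        return self.start <= t < self.start + self.duration


@dataclass
class Document:
    owner: str
    objects: list = field(default_factory=list)

    def extent(self):
        """Length of the document in time, like the length of a movie."""
        return max((o.start + o.duration for o in self.objects), default=0.0)

    def active_at(self, t):
        """Names of the objects visible or audible at time t."""
        return [o.name for o in self.objects if o.is_active(t)]

    def annotations(self):
        """Objects added by someone other than the document owner."""
        return [o.name for o in self.objects if o.author != self.owner]
```

Keeping the author on each object is what makes annotations distinguishable from the original content and attributable to a user, the two requirements stated at the start of the paragraph.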

The functionality of DIANE is realized through a client-server architecture, where the DIANE server manages the document database (shared workspace) on a remote machine, while the user delivers and retrieves multimedia data through the client on a local machine. The DIANE client not only exchanges data with the server, but is a full-featured multimedia authoring system which allows various multimedia objects and streams to be combined into documents. The system was implemented almost entirely in Java for maximum portability, and a number of standard frameworks and technologies were exploited. The Java Media Framework was used for the audio/video stream implementation, security over the Internet was ensured by using the Secure Sockets Layer (SSL), and the Java RMI (Remote Method Invocation) API was used to implement the client-server connection. CORBA was also evaluated for the latter task, and was considered to be a candidate for a next-generation DIANE architecture. Finally, a standard relational database management system, accessed via JDBC, was used to hold the annotation metadata. The complete system is available for Windows, Windows NT and Solaris.

VCPC contributed to two phases of the DIANE project: firstly to the definition of user requirements, by making a case study of specific usage scenarios of the system for training, and then to the evaluation of the prototype system, by carrying out field trials. The field trials at VCPC focused on the use of DIANE in educational environments such as universities or schools, specifically for teletraining and collaborative work. The main focus was put on the use of the software for learning, course development, student teaching and daily project work. The Hospital General in Manresa also evaluated the DIANE system for use in telemedicine, specifically in the area of telepathology. Pathologists at HGM and at the University of Vall d'Hebron in Barcelona, km away, assessed the use of DIANE in two tasks they normally perform, namely consulting each other for a second opinion or diagnosis with difficult cases, and preparing multimedia material for teaching.

It was concluded that DIANE lends itself much more to scenarios which require multimedia training and presentation, and that at present it would be appropriate for creating and publishing training material for use within one large organization, such as a university, or between organizations sharing dedicated high-bandwidth communication links. It was also concluded that the functionality of the prototype system needed to be enhanced in various ways, in particular to allow it to use a normal internet browser, which would dramatically increase the available market size for a product based on DIANE. At the end of the project, the partners had already begun (IPVR) or intended (Kapsch, STISA, Silogic) to develop such products.

Chapter

Teaching

University Courses

The Institute offers a wide variety of undergraduate and graduate courses covering languages, compilers, programming environments, architectures, parallel programming and software engineering. Moreover, a special track, Parallel Systems, covers high performance systems, parallel programming and parallelizing compilers.

Lectures

Theoretical Computer Science, by H. Zima

Software Engineering, by S. Benkner and H. Zima

Compiler Construction, by T. Fahringer

Program Analysis for Supercomputing, by H. Zima

Automatic Parallelization for Supercomputers, by H. Zima

Parallel Systems, by P. Brezany

Classroom Exercises

Theoretical Computer Science, by B. Wender

Software Engineering, by S. Benkner, P. Brezany, E. Mehofer and B. Wender

Compiler Construction, by T. Fahringer

Laboratories

Internet Programming

Parallel Software Development

High Performance Programming Environments

by S. Benkner, T. Fahringer, E. Laure, E. Mehofer and B. Wender

Seminars

High Performance Computing

Internet Technology

by T. Fahringer, E. Laure, E. Mehofer, B. Wender and H. Zima

Other Courses

Linux/Unix Basics, Installation and Administration, by W. Weisz

Diploma Theses

Ch. Neuhold. Parallelisierung von HPF Array Assignments (in German). Master's Thesis, University of Vienna

M. Egger. Programmtransformation für parallelisierende Compiler (in German). Master's Thesis, University of Vienna

B. Velkov. VFC: Design and Implementation of a High Performance Fortran Compilation Environment. Master's Thesis, Vienna University of Technology

Chapter

Publications

Chapters in Books

S. Benkner, P. Mehrotra, J. Van Rosendale, and H. P. Zima. Explicit Management of Communication Schedules in HPF. In Seventh ECMWF Workshop on the Use of Parallel Processors in Meteorology. World Scientific Publishing Co., ISBN

S. Benkner. High Performance Fortran for advanced applications. In Annual Review of Scalable Computing, pages, World Scientific Publishing Co., ISBN

Refereed Publications

S. Benkner. VFC: The Vienna Fortran Compiler. Journal of Scientific Programming, December

S. Benkner. HPF+: High Performance Fortran for Advanced Industrial Applications. In Proc. HPCN, Amsterdam, April

S. Benkner, K. Sanjari, V. Sipkova, and B. Velkov. Parallelizing Irregular Applications with Vienna HPF Compiler VFC. In Proc. HPCN, Amsterdam, Netherlands, April

S. Benkner, C. Neuhold, M. Egger, K. Sanjari, V. Sipkova, and B. Velkov. VFC: The Vienna HPF Compiler. In Proc. Int. Conf. on Compilers for Parallel Computers (CPC), Linköping, Sweden, July

S. Benkner, P. Mehrotra, J. Van Rosendale, and H. Zima. High-Level Management of Communication Schedules in HPF-like Languages. In Proc. Int. Conf. on Supercomputing (ICS), Melbourne, Australia, July

S. Benkner. Optimizing Irregular HPF Applications Using Halos. In Workshop Proceedings of the International Symposium on Parallel Processing, San Juan, Puerto Rico, April

S. Benkner, G. Lonsdale, and H. P. Zima. The HPF+ Project: Supporting HPF for Advanced Industrial Applications. Proc. Euro-Par, Parallel Processing, Toulouse, France, August/September. Lecture Notes in Computer Science (LNCS), Vol., Springer-Verlag

S. Benkner. HPF+: High Performance Fortran for Advanced Scientific and Engineering Applications. Journal of Future Generation Computer Systems

S. Benkner and H. P. Zima. Compiling High Performance Fortran for distributed-memory architectures. In D. Trystram (Ed.), Parallel Computing, Special Anniversary Issue, pp.

P. Brezany, M. Dang, and A. Choudhary. Language and Compiler Support for Out-of-Core Irregular Applications on Distributed-Memory Multiprocessors. In Proc. th International Workshop on Languages, Compilers and Run-Time Systems for Scalable Computers (LCR), Pittsburgh, USA, May. Springer-Verlag, LNCS

P. Brezany and M. Dang. Extending Input/Output Functionality of High Performance Fortran. In Proc. International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, Nevada, USA, July

P. Brezany. Parallel Input/Output Support for High Performance Fortran Programming Environments. In Proc. Workshop on Out-of-Core Computation (COCA), Cap Hornu, France, September

P. Brezany, S. Grabner, K. Sowa, and R. Wissmueller. DeHiFo: An Advanced HPF Debugging System. In th Euromicro Workshop on Parallel and Distributed Processing, Funchal, Portugal, February

P. Brezany, P. Czerwinski, R. Koppler, K. Sowa, and J. Volkert. Advanced Visualization and Data Distribution Steering in an HPF Parallelization Environment. In Proc. ParCo, Delft, The Netherlands, August

P. Brezany and M. Winslett. Advanced Data Repository Support for Java Scientific Programming. In Proc. Conference HPCN Europe, Springer-Verlag, LNCS, April

M. Calzarossa, L. Massari, A. Merlo, M. Pantano, and D. Tessera. Integration of a Compilation System and a Performance Tool: The HPF+ Approach. In Proc. HPCN, Amsterdam, The Netherlands, April

B. Chapman, H. P. Zima, M. Haines, P. Mehrotra, and J. Van Rosendale. OPUS: A Coordination Language for Multidisciplinary Applications. Scientific Programming

B. Chapman, P. Mehrotra, and H. P. Zima. Enhancing OpenMP with Features for Locality Control. In W. Zwieflhofer and N. Kreitz (Eds.), Proc. Eighth ECMWF Workshop on the Use of Parallel Processors in Meteorology: Towards Teracomputing, pp., Reading, England, November. World Scientific

B. Chapman, P. Mehrotra, and H. Zima. Enhancing OpenMP With Features for Locality Control. In Proc. ECMWF Workshop Towards Teracomputing: The Use of Parallel Processors in Meteorology, Reading, England, November

B. Di Martino. Algorithmic Concept Recognition Support for Automatic Parallelization: A Case Study for Loop Optimization and Parallelization. Journal of Information Science and Engineering, special issue on Compiler Techniques for High-Performance Computing

B. Di Martino and H. P. Zima. Support of Automatic Parallelization with Concept Comprehension. Journal of Systems Architecture (JSA), Vol., pp.

T. Fahringer. Efficient Symbolic Analysis for Parallelizing Compilers and Performance Estimators. Journal of Supercomputing, Kluwer Academic Publishers, May

T. Fahringer. Symbolic Analysis Techniques for Program Parallelization. Journal of Future Generation Computer Systems, March

T. Fahringer and E. Mehofer. Problem and Machine Sensitive Communication Optimization. In Proc. th ACM International Conference on Supercomputing, Melbourne, Australia, July. ACM Press

T. Fahringer, P. Brezany, B. Dimartino, M. Pantano, A. Pozgaj, K. Sowa, and B. Wender. On the Development of HPF Tools as Part of the Aurora Project. In nd Annual HPF User Group Meeting, held in conjunction with VECPAR, Porto, Portugal, June

T. Fahringer and E. Mehofer. Buffer-Safe and Cost-Driven Communication Optimization. Journal of Parallel and Distributed Computing, Academic Press

I. Glendinning. Parallelization of a Satellite Signal Processing Code: Strategies and Tools. In P. Zinterhof, M. Vajtersic, and A. Uhl, editors, Parallel Computation, Proc. th International ACPC Conference, volume of Lecture Notes in Computer Science, pages, Springer-Verlag, February

S. Kroner, M. Nolle, and G. Schreiber. Parallelization of Structured Invariant Neural Networks for Shift And Rotation Invariant Pattern Recognition. In Proc. International Symposium on Neural Computation, Vienna, Austria, September

E. Laure and B. Chapman. Interprocedural Array Alignment Analysis. In Proceedings HPCN Europe, Amsterdam, The Netherlands, April

E. Laure, P. Mehrotra, and H. Zima. Opus: Heterogeneous computing with data parallel tasks. In Proc. Workshop on Programming Environments, Clusters, and Computational Grids for Scientific Computing, Blackberry Farm, Tennessee, September

E. Laure. Distributed High Performance Computing with OpusJava. In Proc. ParCo, Delft, The Netherlands, August

E. Laure, M. Haines, P. Mehrotra, and H. Zima. Compiling Data Parallel Tasks for Coordinated Execution. In P. Amestoy, P. Berger, M. Dayde, I. Duff, V. Fraysse, L. Giraud, and D. Ruiz, editors, Euro-Par, Parallel Processing, Lecture Notes in Computer Science No., Springer-Verlag

E. Laure. ParBlocks: A New Methodology for Specifying Concurrent Method Executions in Opus. In P. Amestoy, P. Berger, M. Dayde, I. Duff, V. Fraysse, L. Giraud, and D. Ruiz, editors, Euro-Par, Parallel Processing, Lecture Notes in Computer Science No., Springer-Verlag

E. Laure, P. Mehrotra, and H. Zima. Opus: Heterogeneous Computing With Data Parallel Tasks. Parallel Processing Letters, June

P. Mehrotra, J. Van Rosendale, and H. P. Zima. High Performance Fortran: Status and Prospects. Proc. Fourth International Workshop on Applied Parallel Computing (PARA), Umeå, Sweden, June

P. Mehrotra, J. Van Rosendale, and H. P. Zima. High Performance Fortran: History, Status and Future. In E. Zapata and D. Padua (Eds.), Parallel Computing, Special Issue on Languages and Compilers for Parallel Computers, Vol., No., pp.

P. Mehrotra, J. Van Rosendale, and H. P. Zima. Language Support for Multidisciplinary Applications. IEEE Computational Science and Engineering, Vol., No., pp., April-June

J. H. Merlin, S. B. Baden, S. J. Fink, and B. M. Chapman. Multiple data parallelism with HPF and KeLP. Future Generation Computer Systems

J. H. Merlin, S. B. Baden, S. J. Fink, and B. M. Chapman. Multiple data parallelism with HPF and KeLP. In Peter Sloot, Marian Bubak, and Bob Hertzberger, editors, High Performance Computing and Networking, Proc. HPCN Europe, pages, Amsterdam, Netherlands, April. Springer-Verlag, Lecture Notes in Computer Science

M. Nolle, M. Pantano, and X. Sun. Communication Overhead: Prediction and its Influence on Scalability. In Proc. International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), Las Vegas, USA, July

K. Sowa, M. Bubak, W. Funika, and R. Wismueller. Symbol Table Management in an HPF Debugger. In Proc. Conference HPCN Europe, Springer-Verlag, LNCS, April

X. H. Sun, M. Pantano, and T. Fahringer. Performance Range Comparison for Restructuring Compilation. In International Conference on Parallel Processing, Minneapolis, Minnesota, August. IEEE Computer Society Press

Technical Reports

J. Knoop and E. Mehofer. Interprocedural Distribution Assignment Placement: More than just Enhancing Intraprocedural Placing Techniques. Technical Report TR, Institute for Software Technology and Parallel Systems, University of Vienna, February

P. Mehrotra, J. Van Rosendale, and H. Zima. Language Support for Multidisciplinary Applications. Technical Report TR, Institute for Software Technology and Parallel Systems, University of Vienna, February

S. Benkner, K. Sanjari, V. Sipkova, and B. Velkov. Parallelizing Irregular Applications with the Vienna HPF Compiler VFC. Technical Report TR, Institute for Software Technology and Parallel Systems, University of Vienna, May

S. Benkner. HPF+: High Performance Fortran for Advanced Industrial Applications. Technical Report TR, Institute for Software Technology and Parallel Systems, University of Vienna, May

S. Kroner, M. Nolle, and G. Schreiber. Parallelization of SINNs for Shift and Rotation Invariant Pattern Recognition. Technical Report TR, Institute for Software Technology and Parallel Systems, University of Vienna, October

M. Nolle, M. Pantano, and X. Sun. Communication Overhead: Prediction and its Influence on Scalability. Technical Report TR, Institute for Software Technology and Parallel Systems, University of Vienna, October

E. Mehofer. Optimization of Data Remapping in Data Parallel Language. Technical Report TR, Institute for Software Technology and Parallel Systems, University of Vienna, November

S. Benkner, E. Laure, and H. Zima. HPF+: An Extension of HPF for Advanced Industrial Applications. Technical Report TR, Institute for Software Technology and Parallel Systems, University of Vienna, January

B. Chapman, P. Mehrotra, and H. Zima. Enhancing OpenMP With Features for Locality Control. Technical Report TR, Institute for Software Technology and Parallel Systems, University of Vienna, February

E. Laure, M. Haines, P. Mehrotra, and H. Zima. On the Implementation of the Opus Coordination Language. Technical Report TR, Institute for Software Technology and Parallel Systems, University of Vienna, May

E. Laure, P. Mehrotra, and H. Zima. Opus: Heterogeneous Computing With Data Parallel Tasks. Technical Report TR, Institute for Software Technology and Parallel Systems, University of Vienna, May

E. Laure. ParBlocks: A new Methodology for Specifying Concurrent Method Executions in Opus. Technical Report TR, Institute for Software Technology and Parallel Systems, University of Vienna, June

E. Mehofer and B. Scholz. Probabilistic Data Flow System with Two-Edge Profiling. Technical Report TR, Institute for Software Technology and Parallel Systems, University of Vienna, December

H. J. Ehold, W. N. Gansterer, and C. W. Ueberhuber. HPF: State of the Art. Technical Report AuR, Institute for Software Technology and Parallel Systems, University of Vienna, February

H. J. Ehold, W. N. Gansterer, D. F. Kvasnicka, and C. W. Ueberhuber. Utilization of Ready-Made Software in HPF Programs. Technical Report AuR, Institute for Software Technology and Parallel Systems, University of Vienna, September

B. Chapman, P. Mehrotra, and H. Zima. Enhancing OpenMP With Features for Locality Control. Technical Report AuR, Institute for Software Technology and Parallel Systems, University of Vienna, February

E. Laure, M. Haines, P. Mehrotra, and H. Zima. On the Implementation of the Opus Coordination Language. Technical Report AuR, Institute for Software Technology and Parallel Systems, University of Vienna, May

E. Laure, P. Mehrotra, and H. Zima. Opus: Heterogeneous Computing With Data Parallel Tasks. Technical Report AuR, Institute for Software Technology and Parallel Systems, University of Vienna, May

H. J. Ehold, W. N. Gansterer, D. F. Kvasnicka, and C. W. Ueberhuber. Efficient HPF Programs. Technical Report AuR, Institute for Software Technology and Parallel Systems, University of Vienna, June

H. J. Ehold, W. N. Gansterer, H. Karner, D. F. Kvasnicka, and C. W. Ueberhuber. High Performance Cholesky and FFT Algorithms in HPF. Technical Report AuR, Institute for Software Technology and Parallel Systems, University of Vienna, July

T Fahringer and B Scholz A Unified Symbolic Evaluation Framework for Parallelizing Compilers Technical Report AuR Institute for Software Technology and Parallel Systems University of Vienna August

T Fahringer P Blaha A Hössinger J Luitz E Mehofer H Moritsch and B Scholz Development and Performance Analysis of Real-World Applications for Distributed and Parallel Architectures Technical Report AuR Institute for Software Technology and Parallel Systems University of Vienna August

T Fahringer and A Pozgaj PT A Performance Estimator for Distributed and Parallel Programs Technical Report AuR Institute for Software Technology and Parallel Systems University of Vienna September

T Fahringer A Pozgaj J Luitz and H Moritsch Evaluation of PT A Performance Estimator for Distributed and Parallel Applications Technical Report AuR Institute for Software Technology and Parallel Systems University of Vienna October

E Laure E Mehofer H Moritsch V Sipkova and A Swietanowski HPF in Financial Management Under Uncertainty Technical Report AuR Institute for Software Technology and Parallel Systems University of Vienna October

P Brezany K Sowa S Grabner and R Wismüller SPiDER An Advanced HPF Debugging System Technical Report AuR Institute for Software Technology and Parallel Systems University of Vienna November

M Bubak W Funika G Mlynarczyk K Sowa and R Wismüller Symbol Table Management in an HPF Debugger Technical Report AuR Institute for Software Technology and Parallel Systems University of Vienna November

P Brezany P Czerwinski K Sowa R Koppler and J Volkert Advanced Visualization and Data Distribution Steering in an HPF Parallelization Environment Technical Report AuR Institute for Software Technology and Parallel Systems University of Vienna November

E Mehofer and B Scholz Probabilistic Data Flow System with Two-Edge Profiling Technical Report AuR Institute for Software Technology and Parallel Systems University of Vienna December

Editorial Activities

Peter Brezany

Proceedings of the Workshop High Performance Computing on Very Large Data Sets Amsterdam April Included in Springer-Verlag LNCS

Hans P Zima

Editor Internationale Computer Bibliothek Addison-Wesley

Associate Editor Compiler Technology Scientific Programming John Wiley

Member Editorial Board Concurrency Practice and Experience Parallel Processing Letters The Journal for Universal Computer Science JUCS Parallel Computing

Program Committee Memberships

Peter Brezany

• International Workshop on High Performance Computing on Very Large Data Sets Amsterdam April

• st International Conference on Problem-Solving Environments Infrastructure and Prototypes San Feliu de Guixols Spain June

• th International Conference EuroPVMMPI Barcelona Spain September

• th International Conference HPCN Europe Amsterdam The Netherlands April

• th International Conference HPCN Europe Amsterdam The Netherlands May

• International Conference on High Performance Computing on Hewlett-Packard Systems HiPer Zurich Switzerland October

• International Conference on High Performance Computing on Hewlett-Packard Systems HiPer Bergen Norway September

• Third International Conference on Parallel Processing and Applied Mathematics PPAM Kazimierz Dolny Poland September

Thomas Fahringer

IEEE Euromicro Workshop on Parallel and Distributed Processing Madrid Spain January

th IEEE International Parallel Processing Symposium th Symposium on Parallel and Distributed Processing Orlando Florida April

th ACM Workshop on Languages Compilers and Runtime Systems for Scalable Computers Pittsburgh Pennsylvania May

th ACM International Conference on Supercomputing Melbourne Australia July

IEEE Workshop on Communication Architecture and Applications for Network-based Parallel Computing CANPC Orlando Florida January

John Merlin

Second HPF User Group Conference Porto Portugal June

Hans P Zima

th Workshop on Languages Compilers and Run-Time Systems for Scalable Computers LCR Carnegie Mellon University Pittsburgh Pennsylvania May

th International Conference on High Performance Computing Madras India December

th International Conference on Compiler Construction CC Amsterdam The Netherlands March

Fourth International Workshop on High-Level Parallel Programming Models and Supportive Environments HIPS Puerto Rico April

International Symposium on High Performance Computing ISHPC Kyoto Japan May

Parallel Computing ParCo Delft The Netherlands August

Third International Conference on Parallel Processing and Applied Mathematics PPAM Kazimierz Dolny Poland September

PhD and Habilitation Theses

E Mehofer Optimization of Data Remapping in Data-Parallel Languages PhD Thesis Vienna University of Technology April

T Fahringer Program Analysis and Optimization for Parallel Architectures Habilitation Thesis University of Vienna October

P Kacsuk LOGFLOW Data Driven Parallel Execution of Logic Programs Habilitation Thesis University of Vienna

G Schreiber Modulare Parallelisierung von Algorithmen der digitalen Bildverarbeitung PhD Thesis Technical University Hamburg-Harburg Germany

A Goller Parallel and Distributed Processing of Large Image Data Sets PhD Thesis Technical University Graz Austria

Exhibitions and Conferences

International Workshop on Source Level Debugging Systems for Parallel Programming Environment May

Vienna Austria

IST Exhibition December Vienna Austria HPF Research Exhibition

Workshop on High-Performance Computing on Very Large Data Sets April Amsterdam The Netherlands

Parallel Computing ParCo August Delft The Netherlands HPF Debugging System Research Exhibition

High Performance Computing and Networking SC November Portland Oregon USA HPF Debugging System Research Exhibition

Chapter

Lectures Research Visits and Visitors

Lectures and Research Visits

 Siegfried Benkner

High Performance Fortran for Advanced Industrial Applications International Workshop on HPF held in conjunction with HPCN Amsterdam The Netherlands April

VFC The Vienna HPF Compiler International Conference on Compilers for Parallel Computers Linköping Sweden June

VFC on NEC Parallel Computers Industrial Affiliates Meeting Vienna Austria October

HPF and the VFC Compiler CC Research Labs NEC Europe Ltd St Augustin Germany November

Efficient Parallelization of FEM-Applications with HPF CC Research Labs NEC Europe Ltd St Augustin Germany February

HPF and VFC on NEC Parallel Computers NEC Corporation Central Research Laboratories Tokyo Japan March

Optimizing Irregular Applications Using Halos Workshop on Solving Irregularly Structured Problems in Parallel IPPS San Juan Puerto Rico April

Experiments With Local HPF Programming Styles NEC Corporation Central Research Laboratories Tokyo Japan July

HPF Extensions for Irregular Applications rd International HPF Users Group Meeting Redondo Beach CA USA August

The HPF Project Supporting HPF for Advanced Industrial Applications Euro-Par Toulouse France August

Additional research visits

CC Research Laboratories NEC Europe Ltd St Augustin Germany Research Staff Member from August until September on leave from the University of Vienna

Engineering Systems International Paris France February

Institute for Algorithms and Scientific Computing SCAI German National Research Center for Information Technology GMD Schloss Birlinghoven St Augustin Germany December March June

 Peter Brezany

Language and Compiler Support for OOC Irregular Applications on Distributed-Memory Multiprocessors Workshop on Languages Compilers and Runtime Systems for Parallel Computers Pittsburgh PA USA May

Extending Input/Output Functionality of High Performance Fortran University of Illinois Champaign Urbana IL USA June

Fast Access to Persistent Multidimensional Arrays from HPF Applications Dartmouth College Hanover NH USA June

Parallel Input/Output Support for High Performance Fortran Programming Environments Invited Lecture Workshop on Out-of-Core Computation COCA Cap Hornu France September

Fast Access to Persistent Multidimensional Arrays in Scientific Programs Industrial Affiliates Meeting Vienna Austria October

Advanced Data Repository Support for Java Scientific Programming HPCN Europe Amsterdam The Netherlands April

Input/Output Intensive Massively Parallel Computing Workshop on Simulations and Data Analysis Seibersdorf Austria July

Parallel Scientific Data Repositories EUNSF Workshop on Large Scientific Databases Annapolis MD USA September

Additional research visit

Technical University Munich Germany January

 Thomas Fahringer

Distributed and Parallel Processing Instituto de Matematica e Estatistica Universidade de Sao Paulo Brazil January

 Ian Glendinning

The VICAR Project Video Indexing Classification Annotation and Retrieval Industrial Affiliates Meeting Vienna Austria October

Parallelization of a Satellite Signal Processing Code Strategies and Tools ACPC Conference Salzburg Austria February

 Peter Jungwirth

The Austrian Technology Transfer Node Information Workshop on the RD Framework Programme in IT Budapest Hungary March

 Erwin Laure

Interprocedural Array Alignment Analysis HPCN Europe Amsterdam The Netherlands April

OPUS A Coordination Language for Multidisciplinary Metacomputing Applications Eurotools Workshop On Tools for High Performance MetaComputing Europar Southampton United Kingdom September

Compiling Data Parallel Tasks for Coordinated Execution Industrial Affiliates Meeting Vienna Austria October

Distributed High Performance Computing with OpusJava Parallel Computing Delft The Netherlands August

Compiling Data Parallel Tasks for Coordinated Execution Euro-Par Toulouse France September

ParBlocks A New Methodology for Specifying Concurrent Method Executions in Opus Euro-Par Toulouse France September

Additional research visits

Institute for Computer Science Albert-Ludwigs-Universität Freiburg Germany May

Ernst Klett Verlag Stuttgart Germany May

 John Merlin

HPF tutorial HUG Conference Porto Portugal June

Using KeLP-HPF for Dynamic Block-Structured Applications HUG Conference Porto Portugal June

HPF tutorial Europar Conference Southampton United Kingdom September

The FITS project Industrial Affiliates Meeting Vienna Austria October

MPI vs HPF for Conjugate Gradient Iteration in Lattice QCD Industrial Affiliates Meeting Vienna Austria October

Additional research visit

Battelle GmbH Eschborn Germany January

 Krzysztof Sowa

DeHiFo An Advanced HPF Debugging System th Euromicro Workshop on Parallel and Distributed Processing Funchal Portugal February

Additional research visits

Technical University Munich Germany January

University of Linz Austria October

 Therese Stickler

ATTN Projects Information Workshop on the RD Framework Programme in IT Budapest Hungary March

 Willy Weisz

Presentation of the High Performance Computing and Networking Technology Transfer Nodes Information Workshop on the RD Framework Programme in IT Budapest Hungary March

The IST Programme in the th RTD Framework Programme Information Workshop on the RD Framework Programme in IT Budapest Hungary March

From Supercomputers to Clusters The Democratization of High Performance Computing IST Information Day Budapest Hungary April

Additional research visit

Universities in Beijing Xian and Shanghai China April

 Hans Zima

HPF Effiziente Parallelisierung komplexer Anwendungen in High Performance Fortran Kolloquium der Fakultät für Informatik Otto-von-Guericke-Universität Magdeburg Germany May

High Performance Fortran History Status and Future Fourth International Workshop on Applied Parallel Computing PARA Umeå University Sweden June

CEAEDFINRIA Summerschool on Computing Le Breau France June July

Lecture Introduction to Parallel Programming Models Languages and Tools

Lecture The Message Passing Approach

Lecture Shared-Address Space Approaches

Lecture High Performance Fortran

Lecture Formulating Advanced Applications With High Performance Fortran

Lecture Object-Oriented Approaches

Lecture Parallel Programming Environments

Lecture Future Developments in Parallel Programming Models Languages and Tools

Opus Invited Lecture Workshop on Programming Environments Clusters and Computational Grids For Scientific Computing Blackberry Farm TN USA September

High Performance Fortran Status and Future ICASE Colloquium ICASE NASA Langley Research Center Hampton VA USA September

The ESPRIT Project HPF Computer Science Seminar Northwestern University Evanston IL USA November

Integrating HPF and OpenMP ECMWF Workshop Use of Parallel Computers in Meteorology European Centre for Medium Range Weather Forecasts Reading England November

The ESPRIT Project HPF IRSIP Italian Research Center for Parallel Computing Naples March

High-Level Support for Parallel Scientific Computing Computer Science Seminar Seconda Universita degli Studi di Napoli Aversa Italy March

Das ESPRIT Projekt HPF Parallelisierung industrieller Anwendungen in High Performance Fortran Computer Science Colloquium Technical University Dresden Germany April

The ESPRIT Project HPF Invited Lecture International Symposium on High Performance Computing ISHPC Kyoto Japan May

Parallelisierung industrieller Anwendungen in High Performance Fortran Computer Science Colloquium University of Salzburg Austria June

Solving Irregular Problems with High Performance Fortran Computer Science Colloquium University of Delaware Newark DE USA September

Solving Irregular Problems with High Performance Fortran Computer Science Colloquium Notre Dame University South Bend IN USA September

Towards an Execution Model for the HTMT Architecture HTMT Workshop on Hybrid Technology Multithreaded Architecture for Petaflops Computing Half Moon Bay CA USA December

Additional research visits

ICASE NASA Langley Research Center Hampton VA USA February Septemb er February

Center for Advanced Computing Research CACR California Institute of Technology Pasadena CA USA January July December

Visitors and Guest Lectures

Visitors

• Reinhard v Hanxleden Daimler-Benz AG Berlin Germany January

• Hans Burkhardt University of Freiburg im Breisgau Germany March

• Marian Bubak University of Krakow Poland May

• Siegfried Grabner University of Linz Austria May

• Piyush Mehrotra ICASE NASA Langley Research Center Hampton VA USA May

• Roland Wismüller Technical University of Munich Germany May

• Lennart Johnsson University of Houston TX USA June

• G Cybenko Dartmouth College Hanover NH USA September

• Susan FlynnHummel IBM T J Watson Research Center Yorktown Heights NY USA August

• Jack Dongarra University of Tennessee and Oak Ridge National Laboratory Knoxville TN USA September

• Marianne Winslett University of Illinois Urbana-Champaign IL USA October

• Barton P Miller University of Wisconsin Madison WI USA November

• William Jalby University of Versailles France January

• Carl Kesselman University of Southern California Los Angeles CA USA April

• Thomas Sterling California Institute of Technology Pasadena CA USA April

• Friedel Hossfeld Forschungszentrum Jülich Germany April

• Renate Dohmen Computing Centre Garching of the Max-Planck-Gesellschaft Germany May

• Allen Malony University of Oregon Eugene OR USA April July

• Ralf Gruber SICEPFL Lausanne June

• Boleslaw Szymanski Rensselaer Polytechnic Institute Troy NY USA June

• Michael Gerndt Forschungszentrum Jülich Germany June

• Alok N Choudhary Northwestern University Evanston IL USA August

• Marco Gubitoso University of Sao Paulo Brazil September

• Matteo Frigo Massachusetts Institute of Technology Cambridge MA USA September

Institute Colloquia

Reinhard v Hanxleden Daimler-Benz AG Berlin Germany
Embedded Systems Design and Synthesis January

Hans Burkhardt University of Freiburg im Breisgau Germany
Algorithms and Structures for Parallel Image Processing March

Andreas Krall Vienna University of Technology Austria
Efficient Implementation of the JavaVM May

Roland Wismüller Technical University Munich Germany
Debugging Parallel Programs using DETOP and OMIS May

Lennart Johnsson University of Houston TX USA
Using HPF for Irregular Problems June

Susan FlynnHummel IBM T J Watson Research Center Yorktown Heights NY USA
Jalapeno August

George Cybenko Dartmouth College Hanover NH USA
Mobile Agents and Scientific Computing September

Jack Dongarra University of Tennessee and Oak Ridge National Laboratory Knoxville TN USA
NetSolves Network Enabled Server Examples and Applications September

Marianne Winslett University of Illinois Urbana-Champaign IL USA
The Panda Library for Parallel IO October

Georg Gottlob Vienna University of Technology Austria
The Complexity of Database and AI Problems Involving Acyclic Hypergraphs October

Michael Resch High Performance Computing Center Stuttgart Germany
Supercomputing Simulation for Research and Industry October

Barton P Miller University of Wisconsin Madison WI USA
The Paradyn Parallel Performance Tool Project November

Barton P Miller University of Wisconsin Madison WI USA
Adaptive Operating Systems An Architecture for Evolving Systems December

William Jalby University of Versailles France
A Perspective on Memory System Architecture and Its Impact on Compilers January

Friedel Hossfeld Research Center Jülich Germany
On Challenges of Beyond-Teraflops Computing April

Thomas Sterling Caltech and NASA Jet Propulsion Laboratory Pasadena CA USA
Future Directions in High End Computing April

Carl Kesselman University of Southern California Los Angeles CA USA
The Computational Grid The Future of High Performance Computing April

Renate Dohmen Computing Centre Garching of the Max-Planck-Gesellschaft Germany
Parallelization of the FPLAPW Code WIEN for Message-Passing Systems May

Allen D Malony University of Oregon Eugene OR USA
A Perspective on Parallel Performance Tools June

Boleslaw Szymanski Rensselaer Polytechnic Institute Troy NY USA
Performance Analysis Tools for Parallel Object-Oriented Scientific Computations June

Ralf Gruber SICEPFL Lausanne Switzerland
From Commodity to Supercomputers June

Alok N Choudhary Northwestern University Evanston IL USA
PARSIMONY Parallel and Scalable OLAP and Data Mining August

Marco Gubitoso University of Sao Paulo Brazil
Delay Behavior in Domain Decomposition Applications September

Chapter

Faculty and Staff

Martin Aichhorn

Research Activities parallel programming tools and compilers

Siegfried Benkner

Research Activities programming environments parallel programming languages

compilers and runtime systems for parallel and distributed computing

email sigiieeeorg

www httpwwwparunivieacatsigi

phone

Peter Brezany

Research Activities parallel IO knowledge discovery in large scientific datasets high-performance databases parallel and distributed systems

email brezanyparunivieacat

www httpwwwparunivieacatbrezany

phone

Barbara Chapman

Research Activities programming environments parallel programming languages

compilers and runtime systems for parallel and distributed computing

Maria Cherry

Secretary

email mariaparunivieacat

www httpwwwparunivieacatmaria

phone

Tony Curtis

Research Activities webbased computing

Przemek Czerwinski

Research Activities performance analysis for parallel and distributed systems

email przemekparunivieacat

www httpwwwparunivieacatprzemek

phone

Dinis de Brito e Cunha

Research Activities parallelization of image-processing applications

Minh Dang

Research Activities language compiler and runtime support for parallel IO

Beniamino Di Martino

Research Activities compilers and tools for parallel programming pattern recognition in program codes

Markus Egg

Research Activities analysis tools for parallelizing programs graphical user interfaces

Markus Egger

Research Activities program transformations for parallelizing compilers

Harald Ehold

Research Activities parallel and numerical algorithms

Thomas Fahringer

Research Activities performance analysis for parallel and distributed systems

email tfparunivieacat

www httpwwwparunivieacattf

phone

Josef Fromcke

Research Activities parallelization of image-processing applications

Nikolaj Georgi

Research Activities parallelization of genetic-algorithm-based and image-processing applications

Ian Glendinning

Research Activities tools for the development of explicitly message-passing parallel programs parallelization of image-processing applications

email ianvcp cunivieacat

www httpwwwvcp cunivieacatian

phone

Gerald Hampapa

Systems Administrator

Christoph Harms

Research Activities parallelization of applications

Gerhard Hejc

Research Activities parallelization of image-processing applications distributed multimedia systems

Peter Jungwirth

ATTN Project Coordinator

Michael Krausz

Systems Administrator

Erwin Laure

Research Activities parallel and distributed computing metacomputing hybrid parallel models

email erwinparunivieacat

www httpwwwparunivieacaterwin

phone

Eduard Mehofer

Research Activities parallelizing compilers communication optimizations feedback oriented compilation probabilistic dataflow systems parallel computation

email mehoferparunivieacat

www httpwwwparunivieacatmehofer

phone

John Merlin

Research Activities parallel programming languages compilation systems and analysis tools

Christian Neuhold

Research Activities parallelizing HPF array assignments

Michael Nolle

Research Activities hardware and software tools for parallel and distributed image processing

Elisabeth Obermaier

Secretary

email auroraparunivieacat

phone

Mario Pantano

Research Activities programming environments and tools for high-performance computing

Martin Paul

Systems Administrator

email martinparunivieacat

www httpwwwparunivieacatmartin

phone

Alex Pozgaj

Research Activities design and implementation of a performance estimator for parallel and distributed computing

Kamran Sanjari

Research Activities compilers and runtime systems for high-performance computing object-oriented technologies

Martin Scheibl

Research Activities coordination environments with graphical user interfaces for software tools

Bernhard Scholz

Research Activities symbolic analysis parallelizing compilers realtime systems probabilistic data flow systems

Viera Sipkova

Research Activities compilers for parallel and distributed systems parallel

computing irregular applications parallel IO

email sipkaparunivieacat

www httpwwwparunivieacatsipka

phone

Krzysztof Sowa

Research Activities debugging tools for parallel programming advanced visualizing tools object-oriented technologies

Therese Stickler

Technology-Transfer Project Manager

Ansbert Sturm

Systems Administrator

email sturmparunivieacat

www httpwwwparunivieacatsturm

phone

Borislav Velkov

Research Activities design and implementation of a High Performance Fortran

compilation environment

Willy Weisz

Research Activities parallel and distributed computing parallel computer architectures high-performance computing technology transfer

email weiszvcp cunivieacat

www httpwwwvcp cunivieacatweisz

phone

Bernd Wender

Research Activities programming environments parallel programming languages

compilers and runtime systems for parallel and distributed computing

Elisabeth Wurth

Secretary

Hans Zima

Research Activities parallel programming languages compilers for parallel and distributed computing programming models for processor-in-memory systems

email zimaparunivieacat

www httpwwwparunivieacatzima

phone

Bibliography

S P Amarasinghe and M S Lam Communication Optimization and Code Generation for Distributed Memory Machines In Proc ACM SIGPLAN Conf on Programming Language Design and Implementation Albuquerque NM June

G Ammons and J Larus Improving dataflow analysis with path profiles In Proc of the ACM SIGPLAN Conference on Programming Language Design and Implementation PLDI pages Montreal Canada June

APART Esprit Working Group on Automatic Performance Analysis Resources and Tools Forschungszentrum Jülich Zentralinstitut für Angewandte Mathematik ZMG httpwwwkfajuelichdeapart

AURORA Advanced Models and Software Systems for High Performance Computing Special Research Program of the Austrian Science Fund httpwwwvcpcunivieacataurora

AURORA Project Numerical Algorithms and Software for High Performance Computing httpwwwvcpcunivieacatauroragroupgrouphtml

AURORA Project Parallel Algorithms for Dynamic Stochastic Optimization in Financial Planning httpwwwunivieacatsoraurora

AURORA Project Quantum Mechanical Calculations of Solids With WIEN httpinfotuwienacattheochemparalwelcomehtml

P Banerjee J A Chandy M Gupta J G Holm A Lain D J Palermo S Ramaswamy and E Su The PARADIGM Compiler for Distributed-Memory Message Passing Multicomputers In Proceedings of the First International Workshop on Parallel Processing pages Bangalore India December

S Barros D Dent L Isaksen G Robinson G Mozdzynski and F Wollenweber The IFS model A parallel production weather code Parallel Computing

S Benkner Comparison of HPF Execution Models on the NEC SX Deliverable NECGMD ADVICE Project June

S Benkner High Performance Fortran for advanced applications In Annual Review of Scalable Computing volume pages World Scientific Publishing Co Singapore ISBN

S Benkner HPF High Performance Fortran for Advanced Scientific and Engineering Applications Journal of Future Generation Computer Systems

S Benkner VFC The Vienna Fortran Compiler Scientific Programming

S Benkner Optimizing Irregular HPF Applications Using Halos Concurrency Practice and Experience pages

S Benkner Specification of Extended Halo Concept for HPFSX and HPFJA Technical report Deliverable of the NECUniVienna ADVANCE Project NEC Corporation February

S Benkner and T Brandes Efficient Parallelization of Unstructured Reductions on Shared Memory Parallel Architectures In Workshop Proceedings of the IEEE International Parallel and Distributed Processing Symposium May

S Benkner E Laure and V Sipkova VFC Compilation System for HPF Installation Notes NEC SX Technical report Deliverable of the NECUniVienna ADVANCE Project NEC Corporation February

S Benkner E Laure and H Zima HPF An Extension of HPF for Advanced Industrial Applications Deliverable ESPRIT IV LTR Project HPF February

S Benkner G Lonsdale and H Zima The HPF Project Supporting HPF for Advanced Industrial Applications In Euro-Par Parallel Processing Toulouse France August Springer-Verlag

S Benkner P Mehrotra J Van Rosendale and H Zima Explicit Management of Communication Schedules in HPF In Making its Mark Seventh ECMWF Workshop on the Use of Parallel Processors in Meteorology World Scientific Publishing Co Singapore Editors G Hoffmann N Kreitz ISBN

S Benkner P Mehrotra J Van Rosendale and H Zima High-Level Management of Communication Schedules in HPF-like Languages In Proc Int Conf on Supercomputing ICS Melbourne Australia July

S Benkner and M Pantano HPF Optimizing HPF for Advanced Applications Supercomputer

S Benkner and H Zima Compiling High Performance Fortran for Distributed-Memory Architectures Parallel Computing

A Berson and S J Smith Data Warehousing Data Mining and OLAP McGraw-Hill

K S P Blaha and J Luitz WIEN A Full Potential Linearized Augmented Plane Wave Package for Calculating Crystal Properties ISBN

P Brezany M Bubak P Czerwinski R Koppler K Sowa J Volkert and R Wismüller Advanced symbolic debugging of HPF programs with SPiDER In Proc of SC ACM ISBN Portland Oregon USA November

P Brezany A Choudhary and M Dang Parallelization of irregular out-of-core applications for distributed-memory systems In Proc International Conference and Exhibition on High-Performance Computing and Networking HPCN pages Vienna Austria April Springer-Verlag LNCS

P Brezany P Czerwinski R Koppler K Sowa and J Volkert Advanced Visualization and Data Distribution Steering in an HPF Parallelization Environment In Proc ParCo Delft The Netherlands August

P Brezany P Czerwinski A Swietanowski and M Winslett Parallel Access to Persistent Multidimensional Arrays from HPF Applications Using Panda In High-Performance Computing and Networking Europe Springer-Verlag May

P Brezany and M Dang Extending Input/Output Functionality of High Performance Fortran In Proc International Conference on Parallel and Distributed Processing Techniques and Applications PDPTA Las Vegas Nevada USA July

P Brezany M Dang and A Choudhary Language and Compiler Support for Out-of-Core Irregular Applications on Distributed-Memory Multiprocessors In Proc th International Workshop on Languages Compilers and Run-Time Systems for Scalable Computers LCR Pittsburgh USA May Springer-Verlag LNCS

P Brezany T A Mueck and E Schikuta A Software Architecture for Massively Parallel Input/Output Technical Report TR Institute for Software Technology and Parallel Systems University of Vienna October

J B Brockman P M Kogge V W Freeh S K Kuntz and T L Sterling Microservers A New Memory Semantics for Massively Parallel Computing In Proceedings ACM International Conference on Supercomputing ICS

M Bubak W Funika G Mlynarczyk K Sowa and R Wismüller Symbol Table Management in an HPF Debugger In High-Performance Computing and Networking Europe Springer-Verlag April

M Calzarossa L Massari A Merlo M Pantano and D Tessera Integration of a Compilation System and a Performance Tool The HPF Approach In Proc HPCN Amsterdam The Netherlands April

B Carpenter Adlib A distributed array library to support HPF translation In Proc th Workshop on Compilers for Parallel Computers Malaga June

S Chakrabarti M Gupta and JD Choi Global Communication Analysis and Optimization In ACM SIGPLAN Conference on Programming Language Design and Implementation PLDI Philadelphia PA May

B Chapman M Haines P Mehrotra E Laure J Van Rosendale and H Zima Opus Reference Manual Technical Report TR Institute for Software Technology and Parallel Systems University of Vienna Austria October

B Chapman M Haines P Mehrotra J Van Rosendale and H Zima OPUS A Coordination Language for Multidisciplinary Applications Scientific Programming Winter

B Chapman P Mehrotra and H P Zima Enhancing OpenMP with Features for Locality Control In Zwieflhofer W and Kreitz N Eds Proc Eighth ECMWF Workshop on the Use of Parallel Processors in Meteorology Towards Teracomputing pp Reading England November World Scientific

M S Chen J Han and P S Yu Data Mining An Overview from a Database Perspective IEEE Transactions on Knowledge and Data Engineering December

J Clinckemaillie B Elsner G Lonsdale S Meliciani S Vlachoutsis F de Bruyne and M Holzner Performance Issues of the Parallel PAMCRASH Code Journal of Supercomputing Applications and High-Performance Computing

O Consortium OpenMP Fortran Application Program Interface version October

M Ester Knowledge Discovery in Spatial Databases Habilitationsschrift November

T Fahringer Automatic Performance Prediction of Parallel Programs Kluwer Academic Publishers Boston USA ISBN March

T Fahringer Automatic Estimation of Communication Costs for Data Parallel Programs Journal of Parallel and Distributed Computing Academic Press November

T Fahringer and E Mehofer Buffer-safe Communication Optimization Based on Data Flow Analysis and Performance Prediction In IEEE Proc International Conference on Parallel Architectures and Compilation Techniques PACT pages San Francisco CA November

T Fahringer P Brezany B Di Martino M Pantano A Pozgaj K Sowa and B Wender On the Development of HPF Tools as Part of the AURORA Project nd Annual HPF User Group Meeting held in conjunction with VECPAR Porto Portugal June

T Fahringer and E Mehofer Problem and Machine Sensitive Communication Optimization In Proc th ACM International Conference on Supercomputing Melbourne Australia July ACM Press

T Fahringer and E Mehofer Buffer-Safe and Cost-Driven Communication Optimization Journal of Parallel and Distributed Computing April

T Fahringer M Gerndt G Riley and J Träff Knowledge Specification for Automatic Performance

Analysis APART Technical Report Workpackage Identification and Formalization of Knowledge

Technical Report FZJ-ZAM-IB Research Centre Jülich Zentralinstitut für Angewandte Mathematik

ZMG Jülich Germany November

T Fahringer M Gerndt G Riley and J Träff Formalizing OpenMP Performance Properties with

ASL Proc of the International Workshop on OpenMP Experiences and Implementations Tokyo Japan

October

T Fahringer M Gerndt G Riley and J Träff On Performance Modeling for HPF Applications with

ASL Proc of the rd International Symposium on High Performance Computing ISHPCK Tokyo

Japan October

T Fahringer A Pozgaj J Luitz and H Moritsch Evaluation of pt A performance estimator

for distributed and parallel applications In IEEE Proc of the International Parallel and Distributed

Processing Symposium IEEE Computer Society Press May

U M J Fayyad G Piatetsky-Shapiro and P Smyth From Data Mining to Knowledge Discovery An

Overview In Advances in Knowledge Discovery and Data Mining pages AAAI Press

HPF Forum High Performance Fortran Language Specification Version November

HPF Forum High Performance Fortran Language Specification Version January

I Foster and C Kesselman editors The Grid Morgan Kaufmann

M Frigo A Fast Fourier Transform Compiler In Proc of the ACM SIGPLAN Conference on

Programming Language Design and Implementation PLDI Atlanta Georgia June

T Fuerle E Schikuta C Löffelhardt K Stockinger and H Wanek On the implementation of a

portable client-server based MPI-IO interface In EuroPVM/MPI Springer-Verlag September

D Gannon et al Developing Component Architectures for Distributed Scientific Problem Solving

IEEE Computational Science & Engineering April-June

I Glendinning Parallelisation of a Satellite Signal Processing Code Strategies and Tools In P Zinterhof

M Vajtersic and A Uhl editors Parallel Computation Proc th International ACPC Conference

volume of Lecture Notes in Computer Science pages Springer-Verlag February

J Gosling B Joy and G Steele The Java Language Specification Addison-Wesley

R H Güting An Introduction to Spatial Database Systems The VLDB Journal October

M Gupta S Midkiff E Schonberg V Seshadri K Wang D Shields W-M Ching and T Ngo An

HPF compiler for the IBM SP In Proc Supercomputing San Diego CA December

M Gupta E Schonberg and H Srinivasan A unified framework for optimizing communication in

data-parallel programs IEEE Transactions on Parallel and Distributed Systems pages

July

R Gupta D Berson and J Fang Path profile guided partial dead code elimination using predication

In International Conference on Parallel Architectures and Compilation Techniques PACT pages

San Francisco California November

R Gupta D Berson and J Fang Path profile guided partial redundancy elimination using speculation

In IEEE International Conference on Computer Languages pages Chicago Illinois May

M Hall J Koller P Diniz J Chame J Draper J LaCoss J Granacki J Brockman A Srivastava

W Athas V Freeh J Shin and J Park Mapping Irregular Applications to DIVA a PIM-Based Data

Intensive Architecture In Proceedings SC November

J R H Halstead Multilisp A Language for Concurrent Symbolic Computation ACM Transactions

on Programming Languages and Systems TOPLAS October

S Hiranandani K Kennedy and C Tseng Evaluating Compiler Optimizations for Fortran D Journal

of Parallel and Distributed Computing

C A R Hoare Monitors An Operating Systems Structuring Concept Communications of the ACM

A Hössinger M Radi B Scholz T Fahringer E Langer and S Selberherr Parallelization of a

Monte Carlo ion implantation simulator for three-dimensional crystalline structures In Simulation of

Semiconductor Processes and Devices September

Japan Association of High Performance Fortran wwwtokyoristorjpshunchanindexehtml

K Kennedy and A Sethi A Communication Placement Framework with Unified Dependence and

Dataflow Analysis In rd International Conference on High Performance Computing Trivandrum

India December

K Kennedy and A Sethi Resource-Based Communication Placement Analysis In Proc of the th

Workshop on Language and Compilers for Parallel Computing San Jose CA August

K Koperski J Adhikary and J Han Spatial Data Mining Progress and Challenges SIGMOD

Workshop on Research Issues on Data Mining and Knowledge Discovery DMKD Montreal Canada

R Koppler S Grabner and J Volkert Design and Visualization of Irregular Data Distributions

Technical Report Deliverable DV Institute for Computer Science Johannes Kepler University Linz

PACT Consortium CEI May

R Koppler S Grabner and J Volkert Visualization of Distributed Data Structures for HPF-like

Languages Scientific Programming spec issue High Performance Fortran Comes of Age

E Laure Distributed High Performance Computing with OpusJava In Proc ParCo Delft The

Netherlands August

E Laure ParBlocks A New Methodology for Specifying Concurrent Method Executions in Opus In

P Amestoy P Berger M Dayde I Duff V Fraysse L Giraud and D Ruiz editors Euro-Par

Parallel Processing Lecture Notes in Computer Science No Springer-Verlag

E Laure OpusJava A Java Framework for Distributed High Performance Computing Future Generation

Computer Systems in print

E Laure M Haines P Mehrotra and H Zima Compiling Data Parallel Tasks for Coordinated

Execution In P Amestoy P Berger M Dayde I Duff V Fraysse L Giraud and D Ruiz editors

Euro-Par Parallel Processing Lecture Notes in Computer Science No Springer-Verlag

E Laure M Haines P Mehrotra and H Zima On the Implementation of the Opus Coordination

Language Concurrency Practice and Experience April

E Laure E Mehofer H Moritsch V Sipkova and A Swietanowski HPF in Financial Management

Under Uncertainty Technical Report AuR Special Research Program AURORA Institute for

Software Science University of Vienna Vienna Austria

E Laure P Mehrotra and H Zima Opus Heterogeneous Computing With Data Parallel Tasks

Parallel Processing Letters June

J Li and M Chen Compiling global namespace parallel loops for distributed execution IEEE

Transactions on Parallel and Distributed Systems pages July

E Mehofer and B Scholz Probabilistic Data Flow System with Two-Edge Profiling In Accepted

for publication at ACM Sigplan Workshop on Dynamic and Adaptive Compilation and Optimization

Dynamo Boston MA January Also available as Technical Report TR

P Mehrotra J Van Rosendale and H Zima Language Support for Multidisciplinary Applications

IEEE Computational Science & Engineering

M Oberhuber and R Wismüller DETOP An Interactive Debugger for PowerPC Based Multicomputers

In P Fritzson and L Finmo editors Parallel Programming and Applications pages

IOS Press Amsterdam May

D Patterson et al A Case for Intelligent DRAM IRAM In IEEE Micro April

G Ramalingam Data flow frequency analysis In Proc of the ACM SIGPLAN Conference on Programming

Language Design and Implementation PLDI pages Philadelphia Pennsylvania

May

J Saltz R Das B Moon S Sharma Y-S Hwang R Ponnusamy and M Uysal A Manual for the

CHAOS Runtime Library Technical report University of Maryland College Park MD May

H Schiermueller and G Bachler Industrial CFD on Parallel Computer Systems Technical report

AVL LIST GmbH Graz Austria

L Smarr and C Catlett Metacomputing Communications of the ACM June

T Sterling and L Bergman A Design Analysis of a Hybrid Technology Multithreaded Architecture

for Petaflops Scale Computation In Proceedings ACM International Conference on Supercomputing

ICS June

T Sterling and P Kogge An Advanced PIM Architecture for Spaceborne Computing In Proc IEEE

Aerospace Conference March

K Stockinger E Schikuta T Fuerle and H Wanek Design and analysis of parallel disk accesses in

ViPIOS In Proceedings of the PCS Ensenada Mexico IEEE Computer Society Press August

R v Hanxleden and K Kennedy Give-N-Take a balanced code placement framework In ACM

SIGPLAN Conference on Program Language Design and Implementation Orlando FL June

R C Whaley and J J Dongarra Automatically tuned linear algebra software In Proc SC

Orlando Florida November

R Wismüller J Trinitis and T Ludwig OCM A Monitoring System for Interoperable Tools In

Proc nd SIGMETRICS Symposium on Parallel and Distributed Tools SPDT Welches OR USA

Aug

WWW documents httpwwwcordisluespritsrchtm or httpwwwprosomalu or

httpdbscordislu

WWW document httpwwwibmcomnewsphtml

H P Zima and T L Sterling Macroservers An Execution Model for DRAM Processor-In-Memory

Arrays Technical Report CACR Center for Advanced Computing Research California Institute of

Technology Pasadena CA Also Technical Report TR Institute for Software Science University

of Vienna February