
Syracuse University SURFACE

Electrical Engineering and Computer Science College of Engineering and Computer Science

1994

PASSION Runtime for Parallel I/O

Rajeev Thakur Syracuse University, Northeast Parallel Architectures Center, and Department of Electrical and Computer Engineering, [email protected]

Rajesh Bordawekar Syracuse University

Alok Choudhary Syracuse University, Northeast Parallel Architectures Center, and Department of Electrical and Computer Engineering

Ravi Ponnusamy Syracuse University, Northeast Parallel Architectures Center, and Department of Electrical and Computer Engineering, [email protected]


Recommended Citation: Thakur, Rajeev; Bordawekar, Rajesh; Choudhary, Alok; and Ponnusamy, Ravi, "PASSION Runtime Library for Parallel I/O" (1994). Electrical Engineering and Computer Science. 30. https://surface.syr.edu/eecs/30

This Working Paper is brought to you for free and open access by the College of Engineering and Computer Science at SURFACE. It has been accepted for inclusion in Electrical Engineering and Computer Science by an authorized administrator of SURFACE. For more information, please contact [email protected].

Scalable Parallel Libraries Conference, Oct. 1994

PASSION Runtime Library for Parallel I/O

Rajeev Thakur, Rajesh Bordawekar, Alok Choudhary, Ravi Ponnusamy, Tarvinder Singh

Dept. of Electrical and Computer Eng. and Northeast Parallel Architectures Center
Syracuse University, Syracuse, NY

{thakur, rajesh, choudhar, ravi, tpsingh}@npac.syr.edu

Abstract

We are developing a compiler and runtime support system called PASSION (Parallel And Scalable Software for Input-Output). PASSION provides software support for I/O intensive out-of-core loosely synchronous problems. This paper gives an overview of the PASSION Runtime Library and describes two of the optimizations incorporated in it, namely Data Prefetching and Data Sieving. Performance improvements provided by these optimizations on the Intel Touchstone Delta are discussed, together with an out-of-core Median Filtering application.

Introduction

There are a number of applications which deal with very large quantities of data. These applications exist in diverse areas such as large scale scientific computations, database applications, hypertext and multimedia systems, information retrieval and many other applications of the Information Age. The number of such applications and their data requirements keep increasing day by day. Consequently, it has become apparent that I/O performance, rather than CPU or communication performance, may be the limiting factor in future computing systems. Recent advances in high performance computing have resulted in computers which can provide many Gflops of computing power. However, the performance of the I/O systems of these machines has lagged far behind. It is still several orders of magnitude more expensive to read data from disk than to read it from local memory. Improvements are needed both in hardware as well as software to reduce the imbalance between CPU performance and I/O performance.

At Syracuse University, we consider the I/O problem from a language and runtime support point of view. We are developing a compiler and runtime support system called PASSION (Parallel And Scalable Software for Input-Output). PASSION provides support for compiling out-of-core data parallel programs, parallel input/output of data, communication of out-of-core data, redistribution of data stored on disks, many optimizations including data prefetching from disks, data sieving and data reuse, as well as support at the operating system level. We have also developed an initial framework for runtime support for out-of-core irregular problems. This paper gives an overview of PASSION and describes some of the main features of the PASSION Runtime Library. We explain the basic model of computation and I/O used by the runtime library. The runtime routines supported by PASSION are discussed. A number of optimizations have been incorporated in the runtime library to reduce the I/O cost. We describe in detail two of these optimizations, namely Data Prefetching and Data Sieving. Performance improvements provided by these optimizations on the Intel Touchstone Delta are discussed, together with an out-of-core Median Filtering application.

(This work was supported in part by an NSF Young Investigator Award (CCR), grants from Intel SSD and IBM Corp., and in part by a USRA CESDIS contract. This work was performed in part using the Intel Touchstone Delta System operated by Caltech on behalf of the Concurrent Supercomputing Consortium. Access to this facility was provided by CRPC.)

PASSION Overview

PASSION provides software support for I/O intensive loosely synchronous problems. It has a layered approach and provides support at the compiler, runtime and operating system levels, as shown in Figure 1. The PASSION compiler translates out-of-core HPF programs to message passing node programs with explicit parallel I/O. It extracts information from user directives about the data distribution, which is required by the PASSION runtime library. It restructures loops having out-of-core arrays and also decides the transformations on out-of-core data to map the distribution on disks with the usage in the loops. The PASSION compiler uses well known techniques such as loop stripmining and iteration blocking to generate

Figure 1: PASSION Rings (layers: I/O Intensive OOC Applications; Loosely Synchronous Computations; Compiler and Runtime Support; Compiler Support for HPF Directives; Support for prefetching etc.; Two-Phase Access Manager; Prefetch Manager; Cache and Buffer Manager; I/O Controller and Disk Subsystem)

efficient code for I/O intensive applications. It also embeds calls to appropriate PASSION runtime routines, which carry out I/O efficiently. The Compiler and Runtime Layers pass data distribution and access pattern information to the Two-Phase Access Manager and the Prefetch Manager. They optimize I/O using buffering, redistribution and prefetching strategies. At the operating system level, PASSION provides support to handle prefetching and buffering.

The PASSION runtime support system makes I/O optimizations transparent to users. The runtime procedures can either be used together with a compiler to translate out-of-core data parallel programs, or used directly by application programmers. The runtime library performs the following functions:

- hides disk data distribution from the user
- provides consistent I/O performance independent of data distribution
- reorders I/O requests to minimize seek time
- eliminates duplicate I/O requests to reduce I/O cost
- prefetches disk data to hide I/O latency

Writing message passing parallel programs with efficient parallel I/O is a tedious process. Instead, a program written in a high-level data parallel language like HPF can be translated into efficient code using the PASSION compiler and runtime system. A detailed description of all the features of PASSION is given in the PASSION technical report by Choudhary et al.

Model for Computation and I/O

In the SPMD (Single Program Multiple Data) programming model, each processor has a local array associated with it. In an in-core program, the local array resides in the local memory of the processor. For large data sets, however, local arrays cannot entirely fit in main memory. In such cases, parts of the local array have to be stored on disk. We refer to such a local array as an Out-of-core Local Array (OCLA). Parts of the OCLA need to be swapped between main memory and disk during the course of the computation.

The basic model for computation and I/O used by PASSION is shown in Figure 2. The simplest way to view this model is to think of each processor as having another level of memory which is much slower than main memory. Since the local arrays are out-of-core, they have to be stored in files on disk. The local array of each processor is stored in a separate file, called the Local Array File (LAF) of that processor. The node program explicitly reads from and writes into the file when required. If the I/O architecture of the system is such that each processor has its own disk,

Figure 2: Model for Computation and I/O. A global array is divided among processors P0 to P3; each processor holds an ICLA in main memory, and its portion of the array is stored in a Local Array File on the disks.

the LAF of each processor will be stored on the disk attached to that processor. If there is a common set of disks for all processors, the LAF will be distributed across one or more of these disks. In other words, we assume that each processor has its own logical disk, with the LAF stored on that disk. The mapping of the logical disk to the physical disks is system dependent. At any time, only a portion of the local array is fetched and stored in main memory. The size of this portion depends on the amount of memory available. The portion of the local array which is in main memory is called the In-Core Local Array (ICLA). All computations are performed on the data in the ICLA. Thus, during the course of the program, parts of the LAF are fetched into the ICLA, the new values are computed, and the ICLA is stored back into appropriate locations in the LAF.

Runtime Support in PASSION

During program execution, data needs to be moved back and forth between the LAF and the ICLA. Also, since the global array is distributed, a processor may need data from the local array of another processor. This requires data to be communicated between processors. Thus, runtime support is needed to perform I/O as well as communication. The PASSION Runtime Library consists of a set of high level specialized routines for parallel I/O and collective communication. These routines are built using the native communication and I/O primitives of the system and provide a high level abstraction which avoids the inconvenience of working directly with the lower layers. For example, the routines hide details such as buffering, mapping of files on disks, location of data in files, synchronization, optimum message size for communication, best communication algorithms, communication scheduling and I/O scheduling.

PASSION Runtime Library

The PASSION routines can be divided into four main categories based on their functionality: Array Management/Access Routines, Communication Routines, Mapping Routines and Generic Routines. Some of the basic routines and their functions are listed in Table 1.

Array Management/Access Routines

These routines handle the movement of data between the LAF and the ICLA. Any arbitrary regular section of the OCLA can be read, for an array stored in either row-major or column-major order. The information about the array, such as its shape, size, distribution and storage format, is passed to the routines using a

Array Management Routines
  PASSION_read_section      Read a regular section from LAF to ICLA
  PASSION_write_section     Write a regular section from ICLA to LAF
  PASSION_read_with_reuse   read_section with data reuse
  PASSION_prefetch_read     Asynchronous non-blocking read of a regular section
  PASSION_prefetch_wait     Wait for a prefetch to complete

Array Communication Routines
  PASSION_oc_shift          Shift type collective communication on out-of-core data
  PASSION_oc_multicast      Multicast communication on out-of-core data

Mapping Routines
  PASSION_oc_disk_map       Map disks to processors
  PASSION_oc_file_map       Generate local files from global files

Generic Routines
  PASSION_oc_transpose      Transpose an out-of-core array
  PASSION_oc_matmul         Perform out-of-core matrix multiplication

Table 1: Some of the PASSION Runtime Routines
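As an illustration of how an application drives routines of this kind, here is a minimal slab-at-a-time sketch. It is written in Python with hypothetical stand-ins for PASSION_read_section and PASSION_write_section (the real routines are library calls that take an array descriptor; the names, signatures and in-memory "file" below are invented for illustration only):

```python
# Hypothetical stand-ins for PASSION_read_section / PASSION_write_section:
# the LAF is modeled as an in-memory list of rows standing in for a file.
def read_section(laf, lo, hi):
    """Fetch rows lo..hi-1 of the local array file into an in-core slab."""
    return [row[:] for row in laf[lo:hi]]

def write_section(laf, lo, hi, icla):
    """Store the in-core slab back into rows lo..hi-1 of the file."""
    laf[lo:hi] = icla

def out_of_core_scale(laf, slab_rows, factor):
    """Process an out-of-core local array one ICLA-sized slab at a time."""
    for lo in range(0, len(laf), slab_rows):
        hi = min(lo + slab_rows, len(laf))
        icla = read_section(laf, lo, hi)                    # LAF -> ICLA
        icla = [[x * factor for x in row] for row in icla]  # compute in core
        write_section(laf, lo, hi, icla)                    # ICLA -> LAF
    return laf
```

For example, out_of_core_scale([[1, 2], [3, 4], [5, 6]], slab_rows=2, factor=2) keeps at most two rows of the "file" in memory at a time, which is exactly the OCLA/ICLA discipline described above.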

data structure called the Out-of-Core Array Descriptor (OCAD). The Data Sieving Method, described later in this paper, is used for improved performance.

Communication Routines

The Communication Routines perform collective communication of data in the OCLA. We use the Explicit Communication Method described in our earlier work. The communication is done for the entire OCLA, i.e. all the off-processor data needed by the OCLA is fetched during the communication. This requires interprocessor communication as well as disk accesses.

Mapping Routines

The Mapping Routines perform data and processor-disk mappings. Data mapping routines include routines to generate local array files from a global file. Disk mapping routines map physical disks onto logical disks.

Generic Routines

The Generic Routines perform computations on out-of-core arrays. Examples of these routines are out-of-core transpose and out-of-core matrix multiplication.

Two-Phase Approach

The performance of parallel file systems depends to a large extent on the way data is distributed on disks and processors. The performance is best when the data distribution on disks conforms to the data distribution on processors. Other distributions give much lower performance. To alleviate this problem, the Two-Phase Access Strategy has been proposed by del Rosario, Bordawekar and Choudhary. In the Two-Phase Approach, data is first read in a manner conforming to the distribution on disks and then redistributed among the processors. This is found to give consistently good performance for all distributions. The PASSION runtime library uses this Two-Phase Approach for parallel I/O. In the first phase, data is accessed using the data distribution, stripe size and set of reading nodes (possibly a subset of the computational array) which conforms with the distribution of data over the disks. In the second phase, the data is redistributed at runtime to match the application's desired data distribution.

The Two-Phase Approach provides the following advantages over the conventional Direct Access Method:

- The distribution of data on disks is effectively hidden from the user.
- It uses the higher bandwidth of the interconnection network.
- It uses collective communication and collective I/O operations.
- It provides software caching of the out-of-core data in main memory to exploit temporal and spatial locality.
- It aggregates I/O requests of compute nodes so that only one copy of each data item is transferred between disk and main memory.
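A minimal sketch of the two-phase idea, with an invented on-disk layout and without any of the real striping or scheduling machinery: the global array is assumed to be stored on disk in row blocks, while the application wants column blocks. Each reader first performs one large read that conforms to the disk layout, and the blocks are then redistributed in memory (standing in for the all-to-all exchange over the interconnection network):

```python
def two_phase_read(disk, nprocs):
    """disk: global array as a list of rows, stored in row-block order."""
    nrows, ncols = len(disk), len(disk[0])
    rb, cb = nrows // nprocs, ncols // nprocs
    # Phase 1: conforming reads. "Process" p reads rows p*rb..(p+1)*rb-1
    # as one large sequential request matching the disk layout.
    row_blocks = [disk[p * rb:(p + 1) * rb] for p in range(nprocs)]
    # Phase 2: redistribution. Process p collects columns p*cb..(p+1)*cb-1
    # from every row block, ending up with its desired column block.
    return [
        [row[p * cb:(p + 1) * cb] for blk in row_blocks for row in blk]
        for p in range(nprocs)
    ]
```

With a 4 x 4 global array and two processes, each process ends up with a 4 x 2 column block after the exchange, even though no reader ever issued a strided (non-conforming) disk request.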

Figure 3: Data Prefetching. (A) Without prefetch: the Read, Compute and Write phases for each slab are fully serialized. (B) With prefetch: the Read of the next slab overlaps the Compute and Write of the current slab.

Optimizations

A number of optimizations have been incorporated in the PASSION runtime library to reduce the I/O cost. One optimization, called Data Reuse, reduces the amount of I/O by reusing data already fetched into main memory instead of reading it again from disk. Two other optimizations, Data Prefetching and Data Sieving, are described in the following sections. In addition, some other optimizations, such as Software Caching to reduce the number of I/O requests and Access Reordering to reduce I/O latency time, have been incorporated.

Data Prefetching

In the model of computation and I/O described earlier, the OCLA is divided into a number of slabs, each of which can fit in the ICLA. Program execution proceeds as follows: a slab of data is fetched from the LAF to the ICLA, the computation is performed on this slab, and the slab is written back to the LAF. This is repeated on other slabs till the end of the program. Thus, I/O and computation form distinct phases in the program. A processor has to wait while each slab is being read or written, as there is no overlap between computation and I/O. This is illustrated in Figure 3(A), which shows the time taken for computation and I/O on three slabs. For simplicity, reading, writing and computation are shown to take the same amount of time, which may not be true in certain cases.

The time taken by the program can be reduced if it is possible to overlap computation with I/O in some fashion. A simple way of achieving this is to issue an asynchronous I/O read request for the next slab immediately after the current slab has been read. This is called Data Prefetching. Since the read request is asynchronous, the reading of the next slab can be overlapped with the computation being performed on the current slab. If the computation time is comparable to the I/O time, this can result in significant performance improvement. Figure 3(B) shows how prefetching can reduce the time taken for the example in Figure 3(A). Since the computation time is assumed to be the same as the read time, all reads other than the first one get overlapped with computation. The total reduction in program time is equal to the time for reading two slabs, as only two of the three reads can be overlapped in this example. Prefetching can be done using the routine PASSION_prefetch_read, and the routine PASSION_prefetch_wait can be used to wait for the prefetch to complete.

Performance

We use an out-of-core Median Filtering program to illustrate the performance of Data Prefetching. Median Filtering is frequently used in computer vision and image processing applications to smooth the input image. Each pixel is assigned the median of the values of its neighbors within a window of a particular size. We have implemented a parallel out-of-core Median Filtering program using PASSION runtime routines for I/O and communication. The image is distributed among processors in one dimension along columns and stored in local array files. Depending on the window size, each processor needs a few columns from its right and left neighbors. This requires a shift type communication, which is implemented using the routine PASSION_oc_shift.

Tables 2 and 3 show the performance of Median Filtering on the Intel Touchstone Delta for two different window sizes. The image is of size K x K pixels; we assume this to be out-of-core for the purpose of experimentation. The number of processors is varied from 4 to 64, and the size of the ICLA is varied in each case in such a way that the number of slabs also varies. Since the Touchstone Delta has enough disks, each processor's LAF can be stored on a separate disk.
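The prefetch pipeline described above can be sketched as a double-buffered loop. This is an illustrative Python sketch, not the PASSION code: a worker thread stands in for the asynchronous I/O request, submit plays the role of PASSION_prefetch_read, and result() plays the role of PASSION_prefetch_wait:

```python
from concurrent.futures import ThreadPoolExecutor

def process_with_prefetch(read_slab, compute, nslabs):
    """Read slab k+1 asynchronously while computing on slab k."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_slab, 0)      # only the first read stalls
        for k in range(nslabs):
            slab = pending.result()              # wait for the prefetched slab
            if k + 1 < nslabs:
                pending = pool.submit(read_slab, k + 1)  # issue the next read
            results.append(compute(slab))        # overlaps the pending read
    return results
```

As in the figure, only the read of the first slab is exposed; every later read proceeds in the background while the current slab is being processed.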

Table 2: Performance of Median Filtering, first window size (time in sec). Columns: number of processors; Prefetch and No Prefetch times for each of three slab counts.

Table 3: Performance of Median Filtering, second window size (time in sec). Columns: number of processors; Prefetch and No Prefetch times for each of three slab counts.

The following observations can be made from these tables:

- In all cases, prefetching improves performance considerably. Figures 4 and 5 show the relative performance with and without prefetching for a fixed number of slabs.
- Without prefetching, as the number of slabs is increased, the time taken increases. This is because more slabs mean a smaller slab size, which results in a larger number of I/O requests.
- With prefetching, as the number of slabs is increased, the time taken decreases in most cases. Since the first slab can never be prefetched, all processors have to wait for the first slab to be read. As the slab size is reduced, the wait time for the first slab is also reduced, and there is more overlap of computation and I/O. However, the number of I/O requests increases. When the slab size is large, a reduction in the slab size by half improves performance, because the saving in the wait time for the first slab is higher than the increase in time due to the larger number of I/O requests. But when the slab size is small, the higher number of I/O requests costs more than the decrease in wait time for the first slab, and performance actually degrades.

Figure 4: Median Filtering, first window size: time in seconds with and without prefetch, for 4 to 64 processors.

Figure 5: Median Filtering, second window size: time in seconds with and without prefetch, for 4 to 64 processors.

Data Sieving

All the PASSION runtime routines for reading or writing data from or to disks support the reading and writing of regular sections of arrays. We define a regular section of an array as any portion of an array which can be specified in terms of its lower bound, upper bound and stride in each dimension. The need for reading array sections from disks may arise for a number of reasons, for example FORALL or array assignment statements involving sections of out-of-core arrays.

Consider the 11 x 11 array shown in Figure 6, which is stored on disk. Suppose it is required to read a strided section of this array with lower bound (2,3) and upper bound (10,9). The elements to be read are circled in the figure.

Figure 6: Accessing out-of-core array sections. An 11 x 11 array, with corners (1,1), (1,11), (11,1) and (11,11), contains a strided section bounded by (2,3), (2,9), (10,3) and (10,9).

Since these elements are stored with a stride on disk, it is not possible to read them using one read call. A simple way of reading this array section is to explicitly move the file pointer to each element and read it individually, which requires as many reads as the number of elements. We call this the Direct Read Method. A major disadvantage of this method is the large number of I/O calls and the low granularity of data transfer. Since the I/O latency is very high, this method proves to be very expensive. For example, on the Intel Touchstone Delta, using one processor and one disk, reading a set of integers as one block takes far less time than reading the same integers individually.

Suppose it is required to read a section of a two-dimensional array, specified by (l1:u1:s1, l2:u2:s2). The number of array elements in this section is (⌊(u1 − l1)/s1⌋ + 1) × (⌊(u2 − l2)/s2⌋ + 1). Therefore, in the Direct Read Method:

No. of I/O requests = (⌊(u1 − l1)/s1⌋ + 1) × (⌊(u2 − l2)/s2⌋ + 1)
No. of array elements read per access = 1

Thus, in this method, the number of I/O requests is very high and the number of elements accessed per request is very low, which is undesirable.

We propose a much more efficient method, called Data Sieving, to read or write out-of-core array sections having strides in one or more dimensions. Data Sieving can be explained with the help of Figure 7. As explained earlier, each processor has an out-of-core local array (OCLA) associated with it. The OCLA is logically divided into slabs, each of which can fit in main memory (the ICLA). The OCLA shown in the figure has four slabs. Let us assume that it is necessary to read the array section shown in Figure 7, specified by (l1:u1:s1, l2:u2:s2), into the ICLA. Although this section spans three slabs of the OCLA, because of the stride all the data elements can fit in the ICLA.

In the Data Sieving Method, the entire block of data from column l2 to u2 (if the storage is column major), or the entire block from row l1 to u1 (if the storage is row major), is read into a temporary buffer in main memory using one read call. The required data is then extracted from this buffer and placed in the ICLA; hence the name Data Sieving. A major advantage of this method is that it requires only one I/O call, and the rest is data transfer within main memory. The main disadvantage is the high memory requirement. Another disadvantage is the extra amount of data that is read from disk. However, we have found that the saving in the number of I/O calls increases performance considerably. For this method, assuming column major storage (with nrows rows in the local array):

No. of I/O requests = 1
No. of array elements read per access = (u2 − l2 + 1) × nrows

Data Sieving is a way of combining multiple I/O requests into one request, so as to reduce the effect of the high I/O latency time. A similar method, called message coalescing, is used in interprocessor communication, where small messages are combined into a single large message in order to reduce the effect of communication latency. However, Data Sieving is different because, instead of coalescing the required data elements together, it actually reads even unwanted data elements, so that large contiguous blocks are read. The useful data is then filtered out by the runtime system in an intermediate step and passed on to the program. The unwanted data read into main memory is dynamically discarded.

Figure 7: Data Sieving. A contiguous block of the OCLA containing the section (l1,l2) to (u1,u2) is read into an in-core buffer with one request and then sieved into the ICLA.
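The sieving step itself can be sketched as follows. This is an illustrative Python sketch using 0-based indices and a flat list as the column-major "disk", not the PASSION implementation:

```python
def sieve_read_section(disk, nrows, l1, u1, s1, l2, u2, s2):
    """Read a strided 2-D section with one contiguous request, then sieve.

    disk -- column-major array as a flat list (element (i, j) at j*nrows + i)
    Returns (section, number_of_io_requests).
    """
    # One contiguous read covering all of columns l2..u2 (unwanted data included).
    buf = disk[l2 * nrows:(u2 + 1) * nrows]
    # Sieve in main memory: keep only the strided elements of the section.
    section = [[buf[(j - l2) * nrows + i] for j in range(l2, u2 + 1, s2)]
               for i in range(l1, u1 + 1, s1)]
    return section, 1

def direct_read_requests(l1, u1, s1, l2, u2, s2):
    """I/O requests needed by the Direct Read Method: one per element."""
    return ((u1 - l1) // s1 + 1) * ((u2 - l2) // s2 + 1)
```

For a 4 x 4 array, reading rows 0 and 2 of columns 1 and 3 costs four requests with Direct Read but a single request with sieving, at the price of reading three columns' worth of data into the buffer.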

Reducing the Memory Requirement

If the stride in the array section is large, the amount of memory required to read the entire block from column l2 to u2 will be quite large. There may not be enough main memory available to store this entire block. Since the amount of memory available to create a temporary buffer is not known, we make the assumption that there is always enough memory to create a buffer of size equal to that of the ICLA. The Data Sieving Method described above is modified as follows to take this fact into account. Instead of reading the entire block of data from column l2 to u2, we read only as many columns (or rows) at a time as can fit in a buffer of the same size as the ICLA. For each set of columns read, the data is sieved and passed on to the program. This reduces the memory requirements of the program considerably and increases the number of I/O requests only slightly. Let us assume that the array is stored in column major order on disk and that n columns of the OCLA can fit in the ICLA. Then, for this case:

No. of I/O requests = ⌈(u2 − l2 + 1)/n⌉
No. of array elements read per access = n × nrows

Writing Array Sections

Suppose it is required to write an array section (l1:u1:s1, l2:u2:s2) from the ICLA to the LAF. The issues involved here are similar to those described above for reading array sections. A Direct Write Method can be used to write each element individually, but it suffers from the same problems of a large number of I/O requests and low granularity of data transfer. In order to reduce the number of I/O requests, a method similar to the Data Sieving Method described above needs to be used. If we directly use Data Sieving in the reverse direction, i.e. elements from the ICLA are placed at appropriate locations in a temporary buffer with a stride and the buffer is written to disk, the data in the buffer between the strided elements will overwrite the corresponding data elements on disk. In order to maintain data consistency, it is necessary to first read the entire block from the LAF into the temporary buffer. Then data elements from the ICLA can be stored at appropriate locations in the buffer, and the entire buffer can be written back to disk.

This is similar to what happens in cache memories when there is a write miss. In that case, a whole line or block of data is fetched from main memory into the cache, and then the processor writes data into the cache. This is done in hardware in the case of caches; PASSION does this in software when writing array sections using Data Sieving. Thus, writing sections requires twice the amount of I/O compared to reading sections, because for each write to disk the corresponding block has to first be fetched into memory. Therefore, for writing array sections:

No. of I/O requests = 2 × ⌈(u2 − l2 + 1)/n⌉
No. of array elements transferred per access = n × nrows
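The read-modify-write cycle just described can be sketched as follows, again as an illustrative Python sketch with 0-based indices and a flat column-major "disk", not the PASSION code:

```python
def sieve_write_section(disk, nrows, l1, u1, s1, l2, u2, s2, icla):
    """Write a strided 2-D section via read-modify-write of a whole block."""
    start, stop = l2 * nrows, (u2 + 1) * nrows
    buf = disk[start:stop]            # transfer 1: fetch the block from "disk"
    for bi, i in enumerate(range(l1, u1 + 1, s1)):
        for bj, j in enumerate(range(l2, u2 + 1, s2)):
            buf[(j - l2) * nrows + i] = icla[bi][bj]  # scatter in memory
    disk[start:stop] = buf            # transfer 2: write the block back
    return 2                          # twice the I/O of a sieving read
```

Elements of the block that are not part of the section keep their on-disk values, which is exactly why the block must be read in before it is written back.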

Table 4: Performance of Direct Read/Write versus Data Sieving (time in sec), for a K x K global array distributed by columns. Columns: array section; Direct Read and Sieving times for PASSION_read_section; Direct Write and Sieving times for PASSION_write_section.

Table 5: I/O requirements of the Direct Read and Data Sieving Methods for the same array sections. Columns: array section; number of I/O requests (Direct Read, Sieving); number of array elements read (Direct Read, Sieving).

Performance

Table 4 gives the performance of Data Sieving versus the Direct Method for reading and writing array sections. An array of size K x K is distributed among processors in one dimension along columns. We measured the time taken by the PASSION_read_section and PASSION_write_section routines for reading and writing sections of the out-of-core local array on each processor. We observe that Data Sieving provides tremendous improvement over the Direct Method in all cases. The reason for this is the large number of I/O requests in the Direct Method, even though the total amount of data accessed is higher in Data Sieving. Table 5 gives the number of I/O requests and the total amount of data transferred for each of the array sections considered in Table 4. We observe that in the Data Sieving Method, the number of data elements transferred is more or less the same for all cases. This is because the total amount of data transferred depends only on the lower and upper bounds of the section and is independent of the stride. Hence, the time taken using Data Sieving does not vary much across the sections we have considered. However, there is a wide variation in time for the Direct Method, because only those elements belonging to the section are read. The time is lower for small sections and higher for large sections.

We observe that even for writing array sections, Data Sieving performs better than Direct Write, even though it requires reading the section before writing. As expected, PASSION_write_section takes about twice the time of PASSION_read_section when using Data Sieving. Comparing the Direct Write and Direct Read Methods, we find that writing takes slightly less time than reading data. This is due to the way I/O is done on the Intel Touchstone Delta: the cwrite call returns after data is written to the cache in the I/O node, without waiting for the data to be written to disk.

All PASSION routines involving array sections use Data Sieving for greater efficiency.

Related Work

There has been some related research in software support for high performance parallel I/O. The Two-Phase I/O read/write strategy was first proposed by Bordawekar et al. The effects of prefetching blocks of a file in a multiprocessor file system are studied by Kotz and Ellis. Prefetching for in-core problems is discussed by Callahan, Kennedy and Porterfield and by Mowry, Lam and Gupta. Vesta is a parallel file system designed and developed at the IBM T. J. Watson Research Center which supports logical partitioning of files. File declustering, where different blocks of a file are stored on distinct disks, is suggested by Livny, Khoshafian and Boral. This is used in the Bridge File System, in Intel's Concurrent File System (CFS), and in various RAID schemes. An overview of the various issues involved in high performance I/O is given by del Rosario and Choudhary.

Conclusions

The PASSION Runtime Library provides high-level runtime support for loosely synchronous out-of-core computations on distributed memory parallel computers. The routines perform efficient parallel I/O as well as interprocessor communication. The PASSION runtime procedures can either be used together with a compiler to translate out-of-core data parallel programs, or used directly by application programmers. A number of optimizations have been incorporated in the runtime library for greater efficiency. The two optimizations described in this paper, namely Data Prefetching and Data Sieving, provide considerable performance improvement. Data Prefetching overlaps computation with I/O, while Data Sieving improves the granularity of I/O accesses for reading or writing array sections.

The PASSION Runtime Library is currently available on the Intel Paragon, Touchstone Delta and iPSC using Intel's Concurrent File System. Efforts are underway to port it to the IBM SP-1 and SP-2 using the Vesta Parallel File System. Additional information about PASSION is available on the World Wide Web at http://www.cat.syr.edu/passion.html. PASSION related papers can also be obtained from the anonymous ftp site erc.cat.syr.edu.

Acknowledgments

We thank Geoffrey Fox, Ken Kennedy, Chuck Koelbel, Paul Messina and Joel Saltz for many fruitful discussions and helpful comments.

References

R. Bordawekar, A. Choudhary and R. Thakur. Data Access Reorganizations in Compiling Out-of-core Data Parallel Programs on Distributed Memory Machines. Technical Report, NPAC, Syracuse University.

R. Bordawekar, J. del Rosario and A. Choudhary. Design and Evaluation of Primitives for Parallel I/O. In Proceedings of Supercomputing.

D. Callahan, K. Kennedy and A. Porterfield. Software Prefetching. In Proceedings of ASPLOS.

A. Choudhary, R. Bordawekar, M. Harry, R. Krishnaiyer, R. Ponnusamy, T. Singh and R. Thakur. PASSION: Parallel and Scalable Software for Input-Output. Technical Report, NPAC, Syracuse University.

P. Corbett, S. Baylor and D. Feitelson. Overview of the Vesta Parallel File System. In Proceedings of the Workshop on I/O in Parallel Computer Systems at IPPS.

P. Corbett and D. Feitelson. Overview of the Vesta Parallel File System. In Proceedings of the Scalable High Performance Computing Conference.

P. Corbett, D. Feitelson, J. Prost and S. Baylor. Parallel Access to Files in the Vesta File System. In Proceedings of Supercomputing.

J. del Rosario, R. Bordawekar and A. Choudhary. A Two-Phase Strategy for Achieving High-Performance Parallel I/O. Technical Report, NPAC, Syracuse University.

J. del Rosario and A. Choudhary. High Performance I/O for Parallel Computers: Problems and Prospects. IEEE Computer.

P. Dibble, M. Scott and C. Ellis. Bridge: A High-Performance File System for Parallel Processors. In Proceedings of the International Conference on Distributed Computing Systems.

D. Kotz and C. Ellis. Prefetching in File Systems for MIMD Multiprocessors. IEEE Transactions on Parallel and Distributed Systems.

M. Livny, S. Khoshafian and H. Boral. Multi-Disk Management Algorithms. In Proceedings of the ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems.

T. Mowry, M. Lam and A. Gupta. Design and Evaluation of a Compiler Algorithm for Prefetching. In Proceedings of ASPLOS.

D. Patterson, G. Gibson and R. Katz. A Case for Redundant Arrays of Inexpensive Disks. In Proceedings of the ACM SIGMOD International Conference on Management of Data.

P. Pierce. A Concurrent File System for a Highly Parallel Mass Storage Subsystem. In Proceedings of the Conference on Hypercubes, Concurrent Computers and Applications.

R. Thakur, R. Bordawekar and A. Choudhary. Compiler and Runtime Support for Out-of-Core HPF Programs. In Proceedings of the ACM International Conference on Supercomputing.