WORKSHOP PROCEEDINGS First International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters
First International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters (COSET-1)
June 26th, 2004
Saint-Malo, France

Ville de Saint-Malo – Service Communication – Photos: Manuel CLAUZIER

Held in conjunction with the 2004 ACM International Conference on Supercomputing (ICS '04)

COSET-1

Clusters are not only the most widely used general high-performance computing platform for scientific computing; according to recent results on the top500.org site, they have also become the dominant platform for high-performance computing today. While the cluster architecture is attractive with respect to price/performance, there is still great potential for efficiency improvements at the software level. System software requires improvements to better exploit the cluster hardware resources. Programming environments need to be developed with both cluster and human programmer efficiency in mind. Administrative processes need refinement, for both efficiency and effectiveness, when dealing with numerous cluster nodes.

The goal of this one-day workshop is to bring together a diverse community of researchers and developers from industry and academia to facilitate the exchange of ideas and to discuss the difficulties and successes in this area. A further goal is to discuss recent innovative results in the development of cluster-based operating systems and programming environments, as well as management tools for the administration of high-performance computing clusters.

COSET-1 Workshop co-chairs

Stephen L. Scott
Oak Ridge National Laboratory
P. O. Box 2008, Bldg. 5600, MS-6016
Oak Ridge, TN 37831-6016
email: [email protected]

Christine A. Morin
IRISA/INRIA
Campus universitaire de Beaulieu
35042 Rennes cedex, France
email: [email protected]

Program Committee

Ramamurthy Badrinath, HP, India
Amnon Barak, Hebrew University, Israel
Jean-Yves Berthou, EDF R&D, France
Brett Bode, Ames Lab, USA
Ron Brightwell, SNL, USA
Emmanuel Cecchet, INRIA, France
Toni Cortès, UPC, Spain
Narayan Desai, ANL, USA
Christian Engelmann, ORNL, USA
Graham Fagg, University of Tennessee, USA
Paul Farrell, Kent State University, USA
Andrzej Goscinski, Deakin University, Australia
Liviu Iftode, Rutgers University, USA
Chokchai Leangsuksun, Louisiana Tech University, USA
Laurent Lefèvre, INRIA, France
John Mugler, ORNL, USA
Raymond Namyst, Université de Bordeaux 1, France
Thomas Naughton, ORNL, USA
Hong Ong, University of Portsmouth, UK
Rolf Riesen, SNL, USA
Michael Schoettner, University of Ulm, Germany
Assaf Schuster, Technion, Israel

COSET-1 Program

9:00-9:05 Opening

9:05-10:05 Session 1: Cluster Operating System Services
Session chair: Christine Morin, INRIA

Parallel File System for Networks of Windows Workstations
Jose Maria Perez, Jesus Carretero, Felix Garcia, Jose Daniel Garcia, Alejandro Calderon, Universidad Carlos III de Madrid, Spain

An application-oriented Communication System for Clusters of Workstations
Thiago Robert C. Santos and Antonio Augusto Frohlich, LISHA, Federal University of Santa Catarina (UFSC), Brazil

10:05-10:35 Session 2: Application Management
Session chair: Christine Morin, INRIA

A first step toward autonomous clustered J2EE applications management
Slim Ben Atallah, Daniel Hagimont, Sébastien Jean and Noël de Palma, INRIA Rhône-Alpes, France

10:35-11:00 Coffee break

11:00-12:30 Session 3: Highly Available Systems for Clusters
Session chair: Stephen Scott, ORNL

Highly Configurable Operating Systems for Ultrascale Systems
Arthur B. Maccabe and Patrick G.
Bridges, The University of New Mexico, USA
Ron Brightwell and Rolf Riesen, Sandia National Laboratories, USA
Trammell Hudson, Operating Systems Research, Inc., USA

Cluster Operating System Support for Parallel Autonomic Computing
A. Goscinski, J. Silcock, M. Hobbs, Deakin University, Australia

Type-Safe Object Exchange Between Applications and a DSM kernel
R. Goeckelmann, M. Schoettner, S. Frenz and P. Schulthess, University of Ulm, Germany

12:30-14:30 Lunch

14:30-16:00 Session 4: Cluster Single System Image Operating Systems
Session chair: Christine Morin, INRIA

SGI's Altix 3700, a 512p SSI system. Architecture and Software environment.
Jean-Pierre Panziera, SGI

OpenSSI
speaker TBA

SSI-OSCAR
Geoffroy Vallée, INRIA

16:00-16:20 Coffee break

16:20-17:20 Session 4 (continued): Cluster Single System Image Operating Systems
Session chair: Geoffroy Vallée, INRIA

Millipede Virtual Parallel Machine for NT/PC clusters
Assaf Schuster, Technion

Genesis cluster operating system
Andrzej Goscinski, Deakin University

17:20-18:00 Panel: SSI: Software versus Hardware Approaches
Moderator: Stephen Scott, ORNL

A Parallel File System for Networks of Windows Workstations

José María Pérez
Computer Science Department
Universidad Carlos III de Madrid
Av. de la Universidad, 30
Leganes 28911, Madrid, Spain
+34 91 624 91 04
[email protected]

Jesús Carretero
Computer Science Department
Universidad Carlos III de Madrid
Av. de la Universidad, 30
Leganes 28911, Madrid, Spain
+34 91 624 94 58
[email protected]

José Daniel García
Computer Science Department
Universidad Carlos III de Madrid
Av. de la Universidad Carlos III, 22
Colmenarejo 28270, Madrid, Spain
+34 91 856 13 16
[email protected]

ABSTRACT

The use of parallelism in file systems allows high-performance I/O to be achieved in clusters and networks of workstations. Traditionally, this kind of solution was available only for UNIX systems and required special servers and special APIs, which leads to the modification and/or recompilation of existing applications. This paper presents the first prototype of a parallel file system for the Windows platform, called WinPFS. It is implemented as a new Windows file system and is integrated with the Windows kernel components, so applications need no modification or recompilation to take advantage of parallel I/O. WinPFS uses shared folders (through the CIFS/SMB protocol) to access remote data in parallel. The proposed prototype was developed on the Windows XP platform and has been tested on a cluster of Windows XP nodes and a Windows 2003 Server node.

Categories and Subject Descriptors

D.4.3 [Operating Systems]: File Systems Management – distributed file systems. C.2.4 [Computer-Communications Networks]: Distributed Systems – network operating systems.

Keywords

Parallel I/O, Cluster, Windows.

1. INTRODUCTION

In recent years, the need for high-performance data storage has grown as disk capacity and application needs have grown [1][2][3]. One approach to overcoming the bottleneck that characterizes typical I/O systems is parallel I/O [1]. This technique allows large storage systems to be created by joining several storage resources, which increases the scalability and performance of the I/O system and provides load balancing.

The use of parallelism in file systems relies on the fact that a distributed and parallel system consists of several nodes with storage devices. Performance and bandwidth can be increased if data are accessed in parallel. Parallelism in file systems is obtained by using several independent server nodes, each one supporting one or more secondary storage devices. Data are striped across those nodes and devices to allow parallel access to different files and parallel access to the same file. This idea was first used in RAID [4] (Redundant Array of Inexpensive Disks). However, when a RAID is used in a traditional file server, the I/O bandwidth is limited by the server memory bandwidth. If several servers are used in parallel, performance can be increased in two ways:

1. Allowing parallel access to different files by using several disks and servers.
2. Striping data using distributed partitions, allowing parallel access to the data of the same file.

However, current parallel file systems and parallel I/O libraries lack generality and flexibility for general-purpose distributed environments. Furthermore, parallel file systems do not use standard servers, which makes it very difficult to integrate those systems into existing networks of workstations, because new servers must be installed that are not easy to use and are available only for specific platforms (usually some UNIX flavour). Moreover, those systems are implemented outside the operating system, so a new I/O API is needed to take advantage of parallel I/O, which requires modifying existing applications.

Most of the software related to high-performance I/O is available only for UNIX environments or has been created as UNIX middleware. The work presented in this paper tries to fill the lack of this kind of system in the Windows environment, presenting a

describes what she wants and the system tries to optimize the I/O requests by applying optimization techniques. This approach is used in ViPIOS [10].

The main problem with parallel I/O software architectures and parallel I/O techniques is that they often lack generality and flexibility, because they create only tailor-made software for specific problems. On the other hand, parallel file systems are specially conceived for multiprocessors and multicomputers, and do not integrate well into general-purpose distributed environments such as clusters of workstations.

In recent years, some file