The Integrative Role of COW's in Research and Education Activities

Don Morton, Ganesh Prabu, Daniel Sandholdt, Lee Slater
Department of Computer Science
The University of Montana
{morton | gprabu | marist92 | lslater}@cs.umt.edu

ABSTRACT: Experiences of researchers and students are presented in the porting of code between a cluster of workstations at The University of Montana and the Cray T3E at the Arctic Region Supercomputing Center. We test the thesis that low-cost workstation environments may be utilized for training and for developing and debugging parallel codes, which can then be moved to the Cray T3E with relative ease to realize high performance. We further present ideas on how the computing environments of supercomputers and COW's might benefit from more commonality.

1 Introduction

In the early 1990's, several pivotal events, revolving about the availability of low-cost commodity processors, occurred which forever changed the nature of scientific and high-performance computing. The improving line of Intel architectures and the increasing use of the growing Internet encouraged the development of the Linux operating system, ensuring that anybody could own a Unix workstation for home or business use. During this same time-frame, researchers at Oak Ridge National Laboratory (and U. Tennessee at Knoxville) initiated development of PVM, a highly-portable library and environment for the construction of message-passing programs on clusters of workstations (COW's). In addition to supporting well-known architectures, PVM supported Linux as early as 1993. Also during this time, Cray Research, Inc. introduced its MPP Cray T3D based on the low-cost DEC Alpha chip, running its own version of Unix. In addition to supporting its own CRAFT environment, the T3D supported PVM (though somewhat different from the "standard" PVM).

The convergence of these tracks marked the beginning of a healthy, complementary relationship between COW environments and state-of-the-art supercomputers such as the Cray MPP series. Though some Linux and Cray supporters have felt somewhat threatened by the presence of a "rival," the reality is that both environments hold important niches, and an integration of the environments may easily result in a win-win situation for everyone.

Our thesis is that the COW environment is well-suited for training users in concepts of parallel programming and in the development (which often means a lot of debugging) of parallel codes, all at relatively low equipment cost. Though some Beowulf clusters have achieved remarkable performance benchmarks, we believe the majority of parallel programmers have insufficient resources to construct such a machine. For this reason, powerful machines such as the Cray T3E will always be in high demand for large-scale production runs. Such environments often favor the batch user, and, in our experience, supercomputer centers encourage such long batch jobs in order to achieve high CPU utilization. Though these centers try to keep a few interactive PE's available, it is often difficult for several programmers to test their code (or debug) in an interactive manner. When it comes time to hold group training sessions in parallel computing, it becomes even more difficult. So, we maintain that COW environments should be used for this sort of interactive parallel computing, allowing trainees and developers the interactivity (and sometimes lack of network delays) they need, while freeing the MPP CPU's for the large-scale batch jobs that utilize these expensive resources best.

In this paper, we accept the above thesis and begin to explore the issues of integrating COW and MPP supercomputer resources so that users may migrate between the environments as effortlessly as possible. We begin with a case study – a recent graduate-level course in parallel computing in which students are initially trained on a Linux cluster and, towards the end of the semester, port their codes to the Cray T3E. Then, we discuss some of the research activities that have been taking place in the past four years using both Linux and Cray MPP systems. Finally, we present our own opinions on how COW's and supercomputers may be better integrated for the benefit of all.

2 Computing Environments

The author has been working with Linux since 1991 and with the Cray T3D/E series since 1993. With funding from the National Science Foundation, a poor man's supercomputer (Linux cluster) was constructed in 1995 to provide a local environment for education and continued research in parallel computing. This funding supported summer residencies at the Arctic Region Supercomputing Center. Therefore, there has been constant emphasis placed on developing codes that run on both the Linux cluster and the Cray T3D/E, and on creating similar programming environments. Sometimes this means that we don't use certain features unless they're available on both platforms. For example, until recently there was no Fortran 90 on the Linux cluster, so any Fortran code was written in Fortran 77 style for both platforms. Likewise, although shmem is a powerful programming tool on the Cray MPP series, we do not use it in "portable" codes (though sometimes we use conditional compilation so that if the program is compiled on the Cray, shmem is used, otherwise MPI or PVM).
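As an illustration of this approach, the sketch below shows one way such conditional compilation might look for a simple data transfer. It is a hypothetical fragment (the routine name, variable names, and the choice of MPI as the fallback are ours for illustration), not an excerpt from our codes: on the Cray, a one-sided shmem_double_put() moves the data, while the portable branch falls back on a matching send/receive pair.

#ifdef _CRAYMPP
#include <mpp/shmem.h>    // Cray SHMEM one-sided communication
#else
#include "mpi.h"          // portable fallback on the COW
#endif

// Hypothetical routine: copy 'n' doubles in 'work' from PE 'src' to PE 'dest'.
void transfer_block(double *work, int n, int src, int dest, int mype)
{
#ifdef _CRAYMPP
    // One-sided put: the source PE writes directly into the symmetric
    // array 'work' on the destination PE.  'work' must be symmetric
    // (e.g., static or allocated with shmalloc) for this to be legal.
    if (mype == src)
        shmem_double_put(work, work, n, dest);
    shmem_barrier_all();
#else
    // Two-sided message passing on the cluster.
    MPI_Status status;
    if (mype == src)
        MPI_Send(work, n, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    else if (mype == dest)
        MPI_Recv(work, n, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &status);
#endif
}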

The Linux cluster at The University of Montana (see Figure 1) consists of nine 100MHz Pentiums. One machine, possessing 128 Mbytes of memory, acts as an account and file server for the other eight machines, each possessing 64 Mbytes of memory. The machines are connected by a 100Mb Fast Ethernet. Although users may log into any machine (via NIS) and see their file systems (via NFS), the primary mode of use is to log in to the server. Since users have valid accounts on each machine through NIS, and their applications can be seen by each machine via NFS, there is no need to transfer an application to each machine for parallel computing. Thus, in many respects, users of the system can run programs transparently, much as they would on a T3E. The system supports PVM, MPI and, recently, Portland Group's HPF.

Figure 1: University of Montana Linux cluster (UM Scientific Computing Lab: nodes p1-p8, a 100BaseT hub, hosts LittleHouse.prairie.edu, frontend.scinet.prairie.edu, elk.prairie.edu, and scinet.prairie.edu, a 10BaseT hub, and the connection to the Internet).

The Cray T3E at the Arctic Region Supercomputing Center consists of 272 450-MHz processors. Until recently, the network connections to/from ARSC were very slow, making remote use of the system frustrating, which of course increased the need to work on code locally. Outside network connections have improved substantially since April 1999, but, due to very high CPU utilization, it still is often more convenient to perform training and development activities on the local Linux cluster.

3 Case Study – Parallel Programming Course

In the Spring 1999 semester, a graduate-level (primarily masters degree students) course in parallel processing was offered at The University of Montana with the intent of giving students an applied, hands-on introduction to parallel computing. One expected outcome of the course was to test the thesis stated in the Introduction. Students would be initially introduced to parallel programming on the existing cluster of Linux workstations, using PVM, MPI, and High Performance Fortran. Then, through the support of the Arctic Region Supercomputing Center, students would use accounts on the Cray T3E to execute and compare programs that were previously run on the Linux cluster.

One of the first parallel programming activities that students engaged in was the PVM implementation of the n-body problem. In its simplest form, we considered a set of masses in two dimensions and calculated the net gravitational force applied to each particle. A quick and dirty parallel algorithm was introduced so that students could gain experience in writing their first parallel program that did something "useful." For most students, this was a difficult task, and they spent a lot of time learning how separate executables could interact in a nondeterministic fashion, plus they began to familiarize themselves with the mechanics of launching parallel executables. Such difficulty would have occurred on either our Linux cluster or the T3E, so it was more appropriate to use local resources for this. Though PVM doesn't seem as popular as it did in the early 1990's (a time when it had little competition), it certainly appears in much legacy code, and it served to introduce students to something other than SPMD.
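As a rough illustration of the kind of "quick and dirty" decomposition involved, the fragment below computes the net two-dimensional gravitational force on a contiguous block of particles assigned to one task. It is a hypothetical sketch (the data layout, block distribution, and names are ours), not the actual course assignment.

#include <math.h>

#define G 6.67e-11   // gravitational constant (SI units)

// Hypothetical O(N^2) force computation for one task's block of particles.
// x, y, m hold positions and masses of all N particles (already received
// by this task); fx, fy receive forces for particles [lo, hi).
// Assumes all particle positions are distinct.
void compute_forces(int N, int lo, int hi,
                    const double *x, const double *y, const double *m,
                    double *fx, double *fy)
{
    int i, j;
    double dx, dy, r2, r, f;

    for (i = lo; i < hi; i++) {
        fx[i - lo] = 0.0;
        fy[i - lo] = 0.0;
        for (j = 0; j < N; j++) {
            if (j == i) continue;
            dx = x[j] - x[i];
            dy = y[j] - y[i];
            r2 = dx*dx + dy*dy;
            r  = sqrt(r2);
            f  = G * m[i] * m[j] / r2;   // force magnitude
            fx[i - lo] += f * dx / r;    // components toward particle j
            fy[i - lo] += f * dy / r;
        }
    }
}

Each task would receive its block bounds and the full particle arrays from the master (or derive them in SPMD fashion) and return only its slice of the forces.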

Through the semester, students were also introduced to MPI and HPF, and were required to implement the n-body problem in each of these paradigms. Since the students had already been exposed to a message-passing implementation with PVM, an MPI implementation was simpler for them, and they enjoyed the higher-level constructs provided by that library. Of course, those students who didn't constrain themselves to SPMD programming with PVM had to modify their codes extensively when using MPI. Finally, students had little difficulty modifying their programs for HPF, but got a little confused playing around with HPF compiler directives for optimizing their code.

At this point, students had written and executed programs in PVM, MPI, and HPF, all on the Linux cluster, and were becoming somewhat proficient in writing simple parallel programs. The next step was to introduce them to readily-available performance analysis tools that would run on the Linux cluster, and some that would also be available on the T3E. Students were first introduced to xpvm and xmpi, which provide users with a graphical interface for running programs and then viewing tracefiles. Both tools are freely available and support Linux environments. Although it is assumed that a user could generate a tracefile on the T3E with a PVM or MPI program, then use xpvm or xmpi on another machine to view it, this wasn't attempted, so it remains merely an assumption.

Students were also introduced to vampir and pgprof. Vampir, a performance analysis tool, and Vampirtrace, a library of functions for generating trace files, together form a commercial product for use with MPI programs. The developing company, Pallas, was kind enough to allow us to use the product for evaluation purposes. The vampir tools had the great advantage of being portable between the Linux cluster and the T3E. For example, one could generate a trace remotely on the T3E, then transfer it to a local machine for viewing.

Pgprof, Portland Group's profile analyzer for HPF programs, was distributed with the PGHPF package for the Linux cluster, and we found that it, too, would allow for trace generation on one platform with subsequent viewing on another platform.

As a midterm exam project, students were required to develop performance models for a parallel Jacobi code (which they were provided with) implemented with MPI, then test their "theoretical" performance against actual execution on our Linux cluster. This forced them to deal with complexity analysis of parallel algorithms while providing them with more experience in running codes for performance analysis.

In addition to the previous work, each student pursued an individual project of their own choosing to either develop and analyze parallel code for a problem of interest, or parallelize existing code and then analyze the new code's performance. Topics pursued by the students included dictionary algorithms, machine learning, image processing, molecular modeling, implementation of Java-based PVM, parallelization of an existing plasma code, and finite element modeling. Students developed and tested their code on the Linux cluster, but several moved their code to the T3E for further analysis.

The final six weeks of the course were devoted to providing students with experience in dealing with the issues of portability between the Linux cluster and the Cray T3E, in an effort to test our thesis and to provide material for this paper. Two lab sessions were developed – the first session was simply an introduction to the use of the T3E environment, in which students were required to compile previously-ported code, and then execute it, in both interactive and batch (NQS) modes. Most students found this fairly straightforward. A second lab session of six hours was held on a Saturday, in which the students were given favorable access to the PE's at ARSC. This lab session consisted of five primary tasks (described below), and it was believed that the majority of the students would be able to finish most of the work within the six hours. In fact, only one student came close to finishing in this time frame, while others devoted a considerable amount of extra time.

Three students, listed as co-authors of this paper, performed some additional tests as special projects described later.

3.1 Description of Linux/T3E Laboratory

3.1.1 Conversion of Linux PVM n-body Program for T3E Execution

Previous to this assignment, students were introduced to the differences between standard PVM (referred to as network PVM by Cray) and Cray MPP PVM. Since network PVM relies on the spawning of new tasks from an original task, while MPP PVM requires an SPMD mode of programming, does not support the pvm_spawn() operation, does not support the concept of PVM tasks having parent tasks, and so on, producing a portable code requires several tricks which are incorporated through conditional compilation (see Figure 2 for the relevant code excerpts). In general, this requires setting up a network PVM code that, once all processes have been spawned, assigns a logical task number to each process through group operations, and runs identical tasks in SPMD mode. On Cray MPP platforms, tasks are not spawned, and cannot get their logical PE number by joining a global group as in network PVM.

#ifdef _CRAYMPP
// In Cray MPP, the "global" group is
// indicated by a null pointer
#define GROUPNAME (char *) 0
#else
#define GROUPNAME "alltasks"
#endif

...

#ifdef _CRAYMPP
// Cray MPP does not support joining a
// "global" group, so we simply use the
// Cray-specific routine for getting
// the PE number
mype = pvm_get_PE(mytid);
#else
mype = pvm_joingroup(GROUPNAME);
#endif

...

// Determine total number of tasks
// In Cray MPP, we get this from the command line
// using pvm_gsize(); on other systems, we
// specify it, either in the program, on the
// command line, or in a file
#ifdef _CRAYMPP
ntasks = pvm_gsize(GROUPNAME);
#else
ntasks = 4;
#endif

...

#ifndef _CRAYMPP
// This is not executed for Cray MPP PVM -
// pvm_spawn() is not implemented - all tasks
// start up SPMD at the beginning
if(mype == 0)   // I'm the master, spawn the others
    info = pvm_spawn(argv[0], (char**) 0,
                     PvmTaskDefault, (char*) 0,
                     ntasks-1, &tid_list[1]);
#endif

...

// Wait for everyone to join the group before
// proceeding
info = pvm_barrier(GROUPNAME, ntasks);

...

// Get TID of everyone else in group
for(i=0; i<ntasks; i++)
    tid_list[i] = pvm_gettid(GROUPNAME, i);

Figure 2: Code excerpts showing the conditional compilation used to keep a single PVM source portable between network PVM and Cray MPP PVM.

The goal of this task was to have the students port their Linux PVM n-body codes for execution on the Cray T3E. Once instructed in how to modify a code for this type of portability, most students found it to be relatively painless. However, the differences between Cray MPP and network PVM have been an impediment to portability since the introduction of the Cray T3D, and users of PVM need to be aware of this.

Additionally, Cray MPP compilers seem to be less forgiving than others (e.g. Gnu compilers) of simple programmer omissions, such as failure to initialize a variable. Other problems included the difference in the default handling of incorrect arguments within the atan2() function – Gnu compilers would simply return a zero from this function, but Cray MPP would abort and dump core. Of course, these problems are inherent in any code, not just PVM, ported from a Linux cluster to the T3E, and one can argue that the Cray MPP environment simply forces programmers to adhere to good programming practices. Most of these problems are easily solved through the use of compiler flags. The point is that this is a problem area in the "coupling" of Linux and T3E systems, and needs to be addressed in such a scenario. The majority of the students encountered some problems in moving their code from the Linux system to the T3E. However, in almost all cases, a problem in porting was a direct result of a programmer error that the original system didn't catch. In general, experienced students found the transition from Linux to the T3E to be rather straightforward, whereas those students with less experience had a very difficult time.
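The atan2() issue is easy to guard against in a portable way. The fragment below is our own minimal illustration (not student code): it checks for the degenerate case before making the call, rather than relying on any particular runtime's default behavior.

#include <math.h>

// Return the angle of the vector (dx, dy), treating the degenerate
// zero-length vector explicitly instead of relying on what a given
// compiler/runtime does with atan2(0.0, 0.0).
double safe_angle(double dy, double dx)
{
    if (dx == 0.0 && dy == 0.0)
        return 0.0;             // define the result ourselves
    return atan2(dy, dx);
}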

Some representative timings on the Linux cluster and the Cray T3E are displayed below in Table 1. The wide variation in timings results from different implementations of a solution, and often, from varying loads on the Linux cluster. The timings are from the calculation of forces on N=1000 particles, with two and five processors. Students were also required to run larger problems on the Cray T3E so that they would appreciate the larger problem sizes facilitated by the T3E, and see the performance gains of coarse-grained problems.

Table 1: Selected Linux vs. T3E timings for PVM n-body problem.

3.1.2 Conversion of Linux MPI n-body Program for T3E Execution

In the next task, students were to port their Linux n-body MPI codes to the T3E and compare execution times. Their MPI programs used higher-level, presumably more efficient, constructs for scattering and gathering data, so any comparison of the performance with PVM is probably fruitless. Students found the port of an MPI program to be quite painless, other than the basic problems inherent in moving any code across platforms, as discussed above. Again, good programming practices can reduce such problems.
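To illustrate the kind of higher-level constructs involved, the hypothetical fragment below (names and the assumption that N divides evenly by the number of processes are ours, not the students' code) uses single MPI collective calls, a broadcast of the particle data and a gather of the computed forces, in place of hand-coded loops of sends and receives.

#include <stdlib.h>
#include "mpi.h"

// Hypothetical SPMD fragment for the n-body assignment: rank 0 holds the
// full position array x (size N); every rank needs all positions, so they
// are broadcast, each rank computes forces for its own block, and the
// blocks are gathered back on rank 0.  Assumes N % nprocs == 0.
void nbody_step(double *x, double *fx, int N)
{
    int rank, nprocs, nlocal, i;
    double *fxloc;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    nlocal = N / nprocs;
    fxloc  = (double *) malloc(nlocal * sizeof(double));

    // One collective call replaces a loop of point-to-point messages.
    MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    // Placeholder for the real force computation over particles
    // [rank*nlocal, (rank+1)*nlocal).
    for (i = 0; i < nlocal; i++)
        fxloc[i] = 0.0;

    MPI_Gather(fxloc, nlocal, MPI_DOUBLE, fx, nlocal, MPI_DOUBLE,
               0, MPI_COMM_WORLD);

    free(fxloc);
}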

Some representative timings on the Linux cluster and the Cray T3E are displayed below in Table 2. Again, the wide variation in timings results from different implementations of a solution, and often, from varying loads on the Linux cluster. The timings are from the calculation of forces on N=1000 particles, with two and five processors. As with the PVM component described above, students were also required to run larger problems on the Cray T3E so that they would appreciate the larger problem sizes facilitated by the T3E, and see the performance gains of coarse-grained problems.

Table 2: Selected Linux vs. T3E timings for MPI n-body problem.

PE's   Linux   Cray   Linux   Cray   Linux   Cray
2      13.8    1.7    5.4     0.72   5.4     0.60
5      12.0    1.4    3.9     0.91   1.6     0.33

3.1.3 Performance Analysis of n-body MPI Program Using Vampir (Linux and T3E)

In order to introduce students to performance analysis tools, and to demonstrate yet another commonality between COW's and supercomputers, Vampir was used to produce visual traces of their MPI n-body programs executed on both the Linux cluster and the Cray T3E. Vampir is actually made of two components – vampir, the GUI viewer, and Vampirtrace, a library of routines that are linked in to an MPI program to produce runtime traces. By linking in Vampirtrace libraries on either system, program execution will produce a trace file for viewing in the Vampir program. These traces are portable, which means that one generated on the T3E can be viewed with Vampir on another platform, and vice-versa. Students were excited about the ability to "see" their parallel program execute, and impressed with the idea that this was available on both platforms. One student remarked that the information provided was much more helpful than that from xmpi.

Figure 3: Vampir trace from Linux cluster.

3.1.4 Performance Modeling and Analysis of MPI Jacobi Program on T3E

As part of a prior assignment, students were provided with an MPI code to solve a system of equations via the Jacobi method. The students were to construct a performance model and compare theoretical and actual execution times for the Linux cluster. This lab required that they extend their work and create a performance model for predicting T3E performance, then compare with actual execution times. Since a portable code had already been provided to them, and they had begun to gain some experience in working on both platforms, this was fairly simple for them.
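A model of the kind the students constructed might, for example, take the following generic textbook-style form for a Jacobi iteration on an n-by-n grid distributed by rows over p processors; the symbols t_c (per-grid-point compute time), t_s (message startup time), and t_w (per-value transfer time) are our own notation, not the model actually required in the lab:

T(n,p) \approx \frac{n^{2}}{p}\, t_{c} \;+\; 2\,(t_{s} + n\, t_{w}) \quad \text{per iteration}, \qquad S(p) = \frac{T(n,1)}{T(n,p)}.

The first term models the local grid-point updates and the second the exchange of two boundary rows with neighboring processes; fitting t_c, t_s, and t_w from small runs on each platform gives the "theoretical" curves to compare against measured times on the cluster and the T3E.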

Figure 4: Vampir trace from Cray T3E.

3.1.5 Performance Analysis and Improvement of an MPI Code

To encourage students to think about optimization in a parallel program, they were given a fairly simple program which would read data from an input file and distribute the data equally among processors. Then, each processor would perform a series of operations on its data (statistics operations) and finally, return some results to a master process. The program was intentionally coded with inefficient communication mechanisms. For example, when the master process was to receive results from all of the other processes, it would wait for a result from PE1, then PE2, etc. Students were to produce a Vampir trace of this original program on both the Linux cluster and the T3E, as shown in Figures 3 and 4, then use this information to modify the program for less "waiting" time. Again, this program was given to them, already running on both systems, so the exercise simply allowed the students to witness the commonality of the systems.
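The kind of change the trace suggests can be sketched as follows; this is our own minimal illustration of the idea (fixed-order receives versus accepting results as they arrive), not the code that was handed out.

#include "mpi.h"

// Master-side collection of one double from each of the nprocs-1 workers.
void collect_results(double *results, int nprocs)
{
    int i;
    MPI_Status status;

    // Inefficient version: insist on receiving from PE 1, then PE 2, ...
    // The master idles if PE 1 happens to finish last.
    for (i = 1; i < nprocs; i++)
        MPI_Recv(&results[i], 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &status);
}

void collect_results_improved(double *results, int nprocs)
{
    int i;
    MPI_Status status;

    // Improved version: take results in whatever order they arrive and
    // use the status to see which PE sent each one.
    for (i = 1; i < nprocs; i++) {
        double value;
        MPI_Recv(&value, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                 MPI_COMM_WORLD, &status);
        results[status.MPI_SOURCE] = value;
    }
}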
3.1.6 Conversion of C++ MPI Jacobi Program to Fortran

As an extra project, one student was tasked with converting a C++ MPI Jacobi program, provided to the class earlier in the semester, to a Fortran version for testing and evaluation on both Linux and T3E systems. Unfortunately, this exercise revealed a serious flaw in the local setup of Fortran 90 vs. MPI on the Linux cluster. We found on the Linux cluster that we were unable to use the Portland Group's pgf90 compiler and successfully link in MPICH library calls. MPICH was rebuilt several times with different flags in an effort to generate function names that pgf90 would agree with, but this was unsuccessful. Although it would have been possible to use the Gnu g77 compiler for this, it didn't contain the dynamic memory allocation features desired. Therefore, this was a scenario in which a portable code was not achieved – it was simply developed on the T3E.

Fortran codes have generally been just a little more problematic than C/C++ codes on Linux clusters. Typically, in order to ensure portability, it has been necessary to use Fortran 77 as a lowest common denominator. Recently, however, Portland Group has supported Linux, providing the ability to write portable Fortran 90 and HPF codes on a wide range of systems. Though we experienced difficulties, we presume they can be remedied without too much effort. Again, this is simply a problem area that needs exploration if we are to achieve the goal of COW/supercomputer integration.

The C++ MPI code was converted to Fortran 90 on the T3E with minor porting problems; ultimately, the conversion was successful, and timings of the C++ vs. Fortran 90 MPI codes for the Jacobi algorithm are shown in Table 3, suggesting a substantial benefit of using C++ over Fortran 90 for this particular code on the T3E.

Table 3: C++ vs. Fortran 90 timings for the MPI Jacobi code on the T3E.

       N=8400        N=10080       N=12600
PE's   C++    F90    C++    F90    C++    F90
4      1.2    6.8    1.4    10.1   NA     NA
6      0.8    4.4    1.0    6.3    1.8    10.2
8      0.6    3.1    0.7    4.6    1.3    7.4

3.1.7 Conversion of C++ MPI Jacobi Program to PVM

Another student was tasked with taking the same C++ MPI Jacobi problem and converting it to PVM on both the Linux cluster and the T3E. With the experience gained from the previous assignments of the semester, this turned out to be fairly straightforward. In general, performance differences (Tables 4 and 5) between MPI and PVM on the Linux cluster were small, though there were isolated exceptions. On the T3E, at least for this small sample, it appears that PVM exhibited better performance.

Table 4: Linux timings for MPI/PVM comparison.

       N=512         N=1600        N=3200
PE's   MPI    PVM    MPI    PVM    MPI    PVM
2      0.2    0.2    1.5    0.8    6.1    3.0
4      0.2    0.2    0.8    1.0    3.2    3.4
8      0.1    0.2    0.5    0.9    2.0    2.5

Table 5: T3E timings for MPI/PVM comparison.

       N=8000        N=10000       N=16000
PE's   MPI    PVM    MPI    PVM    MPI    PVM
8      0.40   0.31   0.62   0.41   NA     NA
20     0.16   0.10   0.25   0.17   0.61   0.42
64     0.06   0.04   NA     NA     0.20   0.15
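For readers unfamiliar with the two libraries, the conversion described in 3.1.7 largely amounts to replacing each MPI point-to-point call with PVM's explicit pack/send/receive/unpack sequence. The hypothetical fragment below (the routine names, buffer name, and message tag are ours) shows one such correspondence for sending a row of the Jacobi grid to a neighbor.

#include "mpi.h"
#include "pvm3.h"

#define ROWTAG 10

// MPI version: one call sends n doubles to the neighboring process.
void send_row_mpi(double *row, int n, int neighbor_rank)
{
    MPI_Send(row, n, MPI_DOUBLE, neighbor_rank, ROWTAG, MPI_COMM_WORLD);
}

// PVM version: initialize a send buffer, pack the data, then send.
// 'neighbor_tid' is a PVM task id rather than an MPI rank.
void send_row_pvm(double *row, int n, int neighbor_tid)
{
    pvm_initsend(PvmDataDefault);
    pvm_pkdouble(row, n, 1);          // stride of 1
    pvm_send(neighbor_tid, ROWTAG);
}

// Receiving side in PVM: blocking receive followed by unpack.
void recv_row_pvm(double *row, int n, int neighbor_tid)
{
    pvm_recv(neighbor_tid, ROWTAG);
    pvm_upkdouble(row, n, 1);
}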

3.1.8 Porting of Linux C++ Parallel Finite Element Code to T3E

Finally, another student is pursuing an ambitious plan to write an object-oriented, parallel, adaptive finite element "toolkit." Again, the ideal development platform has been the local Linux cluster, though the intent is to ultimately implement this on the Cray T3E (the project is funded by CRI/ARSC under a University R&D grant) to encourage development of finite element codes.

To date, the student has developed for 1D finite element situations, and his task during the course was to prototype a parallel system. The code uses extensive object-oriented designs, depending on C++ features such as virtual functions, inheritance, and templates. The student performed all development on our Linux cluster using MPI, ultimately revealing the timings shown in Figure 5.

Figure 5: Linux cluster finite element timings ("Linear Diffusion - SCINET": time in seconds vs. number of processors P, for N = 120 to 600 elements).

The port to the T3E was, surprisingly (to the author), relatively painless. The T3E C++ compiler doesn't by default support the local declaration of index variables within for-loops (though a compiler option provides such support), so these loops were modified. As described above, the T3E tends to be less forgiving of programmer errors, so, by porting the code, the student was able to discover some errors in his code which didn't appear when running on the Linux cluster (though under different situations, they may have revealed themselves in the future). The Cray T3E timings are shown in Figure 6. It was interesting to see that the charts of Figures 5 and 6 were quite identical, with the time scale being the only significant difference. Experience has shown, however, that communication will become more significant when moving to two and three dimensions, and the T3E will show much greater scalability in these cases.

Figure 6: Cray T3E finite element timings ("Linear Diffusion - CrayT3E": time in seconds vs. number of processors P, for N = 120 to 600 elements).

4 Research and Development Activities

Though the course activities described above provide a recent example of how COW's can facilitate the training aspect of high-performance computing, the cluster has in fact been used extensively, along with the Cray T3D/E, in code development since 1995.

The initial application for our Linux cluster was an adaptive, parallel finite element code for fluid flow simulations, written in Fortran/PVM. It was developed on an RS/6000 cluster, then ported to ARSC's T3D in 1994. Back then, T3D time was easy to get, but network constraints made remote use very difficult. Once our Linux cluster was assembled, this code was ported to the system to provide a local development platform for prototyping, testing and debugging.

Other research activities have resulted from collaborations with University of Alaska's Water Research Institute to parallelize for the Cray MPP a hydrologic model for arctic ecosystems, then to couple this hydrologic model with a parallel thermal model for execution on the Cray MPP systems. These models were written in Fortran 77, and, though a CRAFT approach was considered (and even implemented initially), portability of the code was a large issue, so MPI was utilized. In these situations, typical model runs would read in hundreds of megabytes of data, and produced just as much, which was an impediment to implementation on our small Linux cluster. However, by using very small datasets just for prototyping and development purposes, it was possible to utilize the Linux cluster extensively in the initial phases of the project. In fact, extensive debugging and analysis was necessary at times, and it was much simpler to do this work on a local workstation cluster than it would have been over a slow network on a supercomputer where we might have to wait several hours for a few PE's.

5 General Summary of COW/Supercomputer Integration Issues

The goal of integrating COW's and supercomputers should focus primarily on the creation of similar programming environments. Ultimately, code written and executed on one system should be able to run on another system without great porting problems. Users should be able to move their T3E code to a local cluster for further development and debugging, and not be overwhelmed with differences in the programming environments. Likewise, students and researchers should be able to develop algorithms and codes for interesting problems on local clusters without having to deal with supercomputing centers initially, yet, when they're ready for the powerful machines, they should be able to move to such a machine with a minimal learning investment. It is our opinion that supercomputer manufacturers and supercomputing centers will benefit by the presence of numerous clusters whose sole purpose in life is to help and encourage more students and researchers to try high-performance computing, and create a larger base of experts in the field. Likewise, the supercomputing centers will benefit by not having to reserve large groups of PE's for interactive computing, and may be able to spend less time training and helping users.

The more popular and necessary tools for parallel computing – PVM, MPI, HPF – are available on a large number of platforms, so there is already a great deal of commonality between COW's and supercomputers. Though MPI and HPF tend to be portable, the PVM differences between COW's and the Cray T3E are substantial. If we are to assume that supercomputers such as the T3E will not fully support network PVM in the near future, then the obvious solution for portable code is to ensure we write it SPMD style from the beginning, and encapsulate the necessary differences within a startup routine.
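One way such a startup routine might look is sketched below: it simply wraps the conditional-compilation logic of Figure 2 behind a single call so that the rest of the application never sees the differences. This is our own illustrative packaging (the function name and the fixed default task count are hypothetical), not a routine from our codes.

#include "pvm3.h"

#ifdef _CRAYMPP
#define GROUPNAME ((char *) 0)     // Cray MPP: the global group is NULL
#else
#define GROUPNAME "alltasks"
#define DEFAULT_NTASKS 4           // hypothetical default for the cluster
#endif

// Hide the network-PVM vs. Cray MPP PVM startup differences behind one
// call; on return, *mype is the logical task number and *ntasks the
// total number of tasks, on either platform.
void pvm_startup(int argc, char **argv, int *mype, int *ntasks)
{
    int mytid = pvm_mytid();

#ifdef _CRAYMPP
    *mype   = pvm_get_PE(mytid);
    *ntasks = pvm_gsize(GROUPNAME);
#else
    int tids[DEFAULT_NTASKS];
    *mype   = pvm_joingroup(GROUPNAME);
    *ntasks = DEFAULT_NTASKS;
    if (*mype == 0)                // master spawns the SPMD workers
        pvm_spawn(argv[0], (char **) 0, PvmTaskDefault, (char *) 0,
                  *ntasks - 1, &tids[1]);
#endif
    pvm_barrier(GROUPNAME, *ntasks);
}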
Since shmem has proven to be so successful at reducing "communication" costs on the T3D/E, it, and other such tools, should have a means to simulate their behavior via underlying PVM or MPI, so that such codes can at least be debugged and tested on COW's. Alternatively, programmers would need to be instructed in the use of preprocessor directives so that their code uses a portable message passing system on COW's, and uses shmem on the T3E.

The Cray T3E tends to have a simple user environment for compiling and executing parallel codes. For example, compilation of PVM and MPI programs doesn't require long flags to specify the location of header files and libraries. Of course, much of this may be achieved through the creation of appropriate environment variables, and this should be implemented more on the COW environments in order to ease the task of migrating from a machine such as the T3E, where users don't have to worry about such issues.

Additionally, users on the T3E can use the mpprun utility to run all parallel programs, whether they be PVM, MPI, or HPF. This contrasts with the typical setup on a COW, where users often need to start local daemons on each machine and use different commands for executing different types of parallel programs. Many of these differences could be encapsulated on COW's behind scripts, and again, doing so would ease the task of migrating from the T3E.

Particularly with the presence of performance analysis tools like Vampir (for MPI) and pgprof (for Portland Group's HPF), there is already a great deal of commonality between COW and supercomputer systems. This continued dual development should be encouraged. Of course, both of the above tools are commercial products, and may not be affordable for use in some COW environments (where low cost is often a very important factor).

With Totalview, it is possible to use a common debugging tool for MPI, PVM and HPF programs on numerous systems. This tool has been available on the Cray MPP series since at least 1994, and has improved over time. It is apparently available for some COW environments, but not yet Linux. Debugging tools on Linux clusters tend to revolve about gdb (the Gnu debugger), and it is often very difficult, and sometimes impossible, to get such tools working for parallel programs. As a result, effective debugging on a Linux cluster tends to be based on printf statements. Availability of a common debugger (like Totalview) for Linux clusters would be highly desirable but, again, such a tool would likely be expensive.

6 Conclusions

We have provided examples and discussed ways in which COW and supercomputer (particularly Linux vs. Cray MPP) platforms can play complementary roles in high performance computing. The insight of software developers in the past decade has resulted in numerous opportunities for a large number of researchers to enter the world of parallel computing, and to do so in an integrated environment. Users of COW's and supercomputers can now talk in a common language (e.g. PVM, MPI, HPF, Vampir) and indeed use a wide variety of systems to suit their needs.

The numerous activities described in this paper provide clear evidence that COW environments can serve as training and development platforms in parallel computing. In most cases, development and initial testing and debugging of code could take place on local COW's, where interactivity is often better than that from a remote supercomputing center that emphasizes maximum utilization of expensive resources. We also suggest that even a major supercomputing center can benefit immensely by offloading this training and development work to the COW's, leaving the expensive supercomputer for the long, high-PE production runs.

Though recent reports have surfaced of remarkable performances by Beowulf systems, we maintain that these tend to be specialized systems, working on problems well-suited for them. In the general computing community, we suspect that a small Linux cluster such as ours is the norm for small institutions. Such systems are generally affordable and can be adequately maintained by someone knowledgeable in Unix system administration issues. Though the performance of these systems will never compare to that of the supercomputers, they are inherently valuable to the supercomputing world by providing a low-cost environment for training and development activities which would often decrease the CPU utilization on the supercomputers.

As an educational tool, COW's are the perfect platform for introducing students to parallel computing. This is the platform where they can work interactively, make many mistakes, and ultimately become experienced enough to start using a supercomputer efficiently. We further suggest that increased collaboration between COW institutions and supercomputing institutions will allow for more training of HPC users and increased demand for supercomputer resources.

7 Acknowledgements

The authors gratefully acknowledge SGI/CRI and the Arctic Region Supercomputing Center for their long-term support of the work described in this paper and, recently, for the provision of student training accounts on their T3E. We thank the National Science Foundation for provision of the "start-up" funding for the Linux cluster and logistical support of collaboration with ARSC. Finally, we thank Pallas for their generosity in allowing us to evaluate their Vampir performance analysis tools on our cluster over the Spring 1999 semester.

References

Foster, Ian. Designing and Building Parallel Programs. Addison-Wesley, 1995.

Morton, D.J., K. Wang, D.O. Ogbe. "Lessons Learned in Porting Fortran/PVM Code to the Cray T3D," IEEE Parallel & Distributed Technology, Vol. 3, No. 1, pp. 4-11, 1995.

Morton, D.J. “Development of Parallel Adaptive Finite Element Implementations for Modeling Two-Phase Oil/Water Flow,” in High Performance Computing Systems and Applications, Kluwer Academic Publishers, 1998.

Morton, D.J. “The Use of Linux and PVM to Introduce Parallel Computing in Small Colleges and Universities,” Journal of Computing in Small Colleges. Vol. 11, No. 7, 1996, pp. 13-19.

Morton, D.J., Z. Zhang, L.D. Hinzman, S. O’Connor. “The Parallelization of a Physically Based, Spatially Distributed Hydrologic Code for Arctic Regions,” in Proceedings of the 1998 ACM Symposium on Applied Computing, Atlanta, GA, 27 Feb – 1 March 1998.

Morton, D.J., L.D. Hinzman, E.K. Lilly, Z. Zhang, D. Goering. “Coupling of Thermal and Hydrologic Models for Arctic Regions on Parallel Processing Architectures,” in Proceedings of the 3rd International Conference on Geocomputation, Bristol, UK, 17-19 September 1998.
