A Comparison of the Capabilities of PVM, MPI and JAVA for Distributed Parallel Processing

Roger Eggen, Computer Science, University of North Florida, Jacksonville, Florida 32224
Maurice Eggen, Computer Science, Trinity University, San Antonio, Texas 78212

Abstract: Networked Unix workstations as well as workstations based on Windows 95 and Windows NT are fast becoming the standard for computing environments in many universities and research sites. In order to harness the tremendous potential for computing capability represented by these networks of workstations, many new (and not so new) tools are being developed. Parallel Virtual Machine (PVM) and Message Passing Interface (MPI) have existed on Unix workstations for some time, and are maturing in their capability for handling Distributed Parallel Processing (DPP). Recently, however, JAVA, with all of its followers, has begun to make an impact in the DPP arena as well. This paper will explore each of these three vehicles for DPP, considering capability, ease of use, and availability. We will explore the programmer interface of each, as well as their utility in solving real world parallel processing applications. We will show that each has its advantages. The bottom line is that programming a distributed cluster of workstations is challenging, worthwhile and fun!

Keywords: parallel and distributed processing, PVM, MPI, Java

1 Introduction

Networked Unix workstations as well as workstations based on Windows 95 and Windows NT are fast becoming the standard for computing environments in universities and research sites. Workstations are becoming more powerful, less expensive, and generally easier to use than ever before. Because of the attractiveness of these workstations, and because they represent tremendous unharnessed potential for computing, many people are beginning to harness clusters of workstations to effectively simulate parallel computers.

Since workstations are becoming increasingly more powerful, and since advances in network technology have improved the speed of the network itself, it becomes more and more desirable to attempt to harness the potential a networked laboratory of workstations presents to the researcher.

Several programming environments are beginning to emerge as standards in distributed parallel processing (DPP). For a heterogeneous cluster of Unix workstations, Parallel Virtual Machine (PVM) enjoys wide usage. Message Passing Interface (MPI) was developed as an attempt to standardize the message passing inherent in DPP. Both of these programming environments effectively harness the capability of a distributed network of Unix workstations as well as a heterogeneous network of Unix and Windows NT workstations.

Because it encompasses a complete programming environment, JAVA, with its racehorse development, is enjoying considerable usage. With its remote method invocation (RMI), JAVA has become a significant player in DPP.

How do these programming environments compare? How easy is it to program one versus the other? What kind of programmer interface exists? What kind of workstations are necessary? Which programming system appears to be the most capable? What are the advantages and disadvantages of each? In this paper, we shall discuss each of these programming systems in some detail. We shall then offer our opinions of the answers to the above questions. Of course, one person's gem is another's garbage. The reader must ultimately decide upon programming preferences based on experience alone.

2 Hardware

The hardware for these experiments is rich in its diversity. There should be something here for everyone.
A network of Unix workstations consisting of 15 HP 9000-712 60 MHz workstations running HP-UX 9.5 formed the bulk of the Unix environment. In addition, there were two HP 9000-712 80 MHz workstations. A 200 MHz SGI workstation as well as a 160 MHz SGI were also included. For diversity, several 133 and 166 MHz Pentium machines running RedHat Linux 4.2 were also included in the cluster. In all, one could configure a parallel machine consisting of three different architectures, with six different processor speeds, and with either a RISC architecture (HPPA or SGI) or a CISC architecture (Pentium).

Windows NT workstations were also involved in the experiment. A Windows NT server (200 MHz Pentium) provided the networking backbone. Additionally, five NT machines (133 and 166 MHz Pentium machines) were part of the network.

3 Software

The idealized model for a parallel machine is that of a multicomputer: several von Neumann machines connected via an interconnection network. Whether the von Neumann machines are tightly coupled or exist on some network, the model applies. The various nodes on the network communicate via a messaging scheme. The only difference between the network model and the tightly coupled model is the cost of sending a message between the nodes.

Since messaging is the means by which the multicomputer accesses (reads and writes) remote memory, various schemes for accomplishing messaging have been devised. Supporting evidence for messaging is included in the following quote from a young professional in the field:

"If you have time every now and then, let me know how the Parallel Processing research goes. The TWC work I've been involved with at SwRI has required some Parallel Processing. We're running 19 processes on a 4-processor Onyx. The load balancing and synchronization issues have been quite a task. We had a lot of problems with using shared memory, and the team decided to use a message passing scheme via sockets for inter-process communication. I wasn't actually involved in that decision, but the group at the time thought it was the way to go. The work has been interesting."[5]

Because this author believes that messaging is important to any discussion of parallel and distributed processing, we shall focus on the capabilities for messaging provided by some of the important software development tools available in the market today. Of particular importance is the fact that these software environments are available in the public domain and can be used on a wide variety of hardware platforms.

Messaging systems are subject to certain requirements in order to provide the capability to perform general purpose concurrent execution of processes:

• A process is itself inherently sequential code. Processes must have the ability to communicate with a larger environment; that is, to read and write remote memory.

• A process must be able to send messages and receive messages.

• A send operation should be asynchronous: processing should continue once a message is sent.

• A receive operation should have the ability to block: it should be able to suspend the execution of the process until a desired message is received.

• A process should have the ability to perform dynamic task creation and allocation. Tasks should be able to be created and destroyed dynamically.

We shall now discuss each of the chosen software tools and explain how each satisfies the requirements for messaging systems. We shall also discuss how to obtain the software, offer some general comments concerning its installation, and comment on each package's ease of use.
4 Parallel Virtual Machine: Availability

PVM is in the public domain. As a relatively mature software environment for parallel processing, PVM enjoys an extensive collection of ancillaries and libraries which may be obtained from several sites. The PVM home page at http://www.epm.ornl.gov provides links to most of the sites. Downloading the source is straightforward.

Installation on most Unix platforms is relatively easy (students have performed the installation at our institutions). PVM enjoys wide applicability as well. It has been successfully installed on many Unix platforms, including Silicon Graphics, Hewlett Packard, IBM, Linux, and many others, as well as Windows 95 and Windows NT. The installation on the Windows operating systems is not as straightforward, but can be accomplished by someone who knows the quirks of these operating systems. Thus, PVM provides the capability to perform experiments in parallel processing on virtually any heterogeneous cluster of workstations as well as on many shared memory parallel processing machines.[4]

5 Parallel Virtual Machine: Programming

Designed as a collection of C, C++ or Fortran functions, PVM is a comfortable environment for an experienced programmer. Moreover, after an introductory class in programming, students find the environment accessible. The messaging requirements are provided by specialized functions which accomplish the buffer creation, the data preparation, and the sending and receiving of the data. For example, the following code fragment illustrates a process packing an array of integers of size Size and sending that array to a receiving process (presented in C):

pvm_initsend(PvmDataDefault);
pvm_pkint(&Size,1,1);
pvm_pkint(array,Size,1);
pvm_send(tid,msgtag);

Here, the pvm_initsend() function prepares a send buffer, the pvm_pkint() calls pack the size and then the array into the buffer, and the pvm_send() function sends the buffer to the task with PVM task identifier tid. The tag msgtag is used to assist in identifying appropriate messages.

In the receiving process there must be a corresponding receive, which will wait until it hears from the sending process (here tid identifies the sender). The receiving code might appear as follows:

pvm_recv(tid,msgtag);
pvm_upkint(&Size,1,1);
pvm_upkint(array,Size,1);

The pvm_recv() is blocking, so it satisfies the requirement that a receive be able to suspend the process until a message arrives. Thus, PVM processes can accomplish the requirements set forth above: PVM can dynamically create processes with the function pvm_spawn() and can communicate with its environment with the send and receive commands.
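To show how these pieces fit together, the following is a minimal master-program sketch of our own (not taken from the PVM distribution); the worker executable name intWorker, the array size, and the tag value are hypothetical:

/* master.c -- a minimal PVM master sketch (illustrative only;
   "intWorker" and the tag value are our own assumptions). */
#include <stdio.h>
#include "pvm3.h"

#define SIZE 100
#define TAG  1

int main(void)
{
    int mytid = pvm_mytid();      /* enroll this process in PVM */
    int child;                    /* task id of the spawned worker */
    int array[SIZE];
    int i, Size = SIZE;

    for (i = 0; i < SIZE; i++)    /* fill the array with sample data */
        array[i] = i;

    /* dynamically create one worker task running "intWorker" */
    if (pvm_spawn("intWorker", NULL, PvmTaskDefault, "", 1, &child) != 1) {
        fprintf(stderr, "spawn failed\n");
        pvm_exit();
        return 1;
    }

    /* pack the size and the array, and send them to the worker */
    pvm_initsend(PvmDataDefault);
    pvm_pkint(&Size, 1, 1);
    pvm_pkint(array, Size, 1);
    pvm_send(child, TAG);

    pvm_exit();                   /* leave the virtual machine */
    return 0;
}

A corresponding intWorker program would enroll with pvm_mytid(), obtain the master's task identifier with pvm_parent(), and unpack the data using the pvm_recv()/pvm_upkint() sequence shown above.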
6 Message Passing Interface: Availability

A public domain version of MPI can be obtained from http://www.mcs.anl.gov/mpi. This version, called mpich, is easy to install on most Unix systems and requires only basic Unix knowledge. MPI is also a relatively mature software environment and, with recent releases of the software, enjoys relatively bug-free execution. [1]

7 Message Passing Interface: Programming

Also designed as a collection of C and Fortran libraries, MPI, like PVM, is relatively easy to program. MPI attempts to standardize the message passing environment. Like PVM, MPI provides functions MPI_Send() and MPI_Recv() to facilitate sending messages between processes. Additionally, MPI_Recv() is blocking, and MPI provides both blocking and non-blocking forms of communication. As an example of a send operation, consider

MPI_Send(&variable, 1, MPI_FLOAT, dest, tag, MPI_COMM_WORLD);

Of course, each of the argument positions means something to the send command. In this case, one floating point variable is to be sent to the process with rank dest. By comparison, a receive command may have the syntax

MPI_Recv(&variable, 1, MPI_FLOAT, source, tag, MPI_COMM_WORLD, &status);

To summarize the capabilities of PVM and MPI in a nutshell, here are the most fundamental function calls in both environments:

PVM:

Start: int pvm_mytid();
Send: pvm_initsend(int encoding);
      pvm_pk...(datatype *ptr, int size, int stride);
      pvm_send(int task_id, int msgtag);
Receive: pvm_recv(int task_id, int msgtag);
         pvm_upk...(datatype *ptr, int size, int stride);
Shut down: pvm_kill(int task_id);
           pvm_exit();

MPI:

Start: MPI_Init(int *argc, char ***argv);
Send: MPI_Send(void *message, int size, MPI_Datatype datatype, int destination, int tag, MPI_Comm communicator);
Receive: MPI_Recv(void *message, int size, MPI_Datatype datatype, int source, int tag, MPI_Comm communicator, MPI_Status *status);
Shut down: MPI_Finalize();

MPI provides a wealth of commands to facilitate message passing (hence its name). Programming PVM and MPI is similar: if one learns to program in one of these environments, it isn't hard to learn to program in the other. Problem solving methodologies are similar in both programming systems. Students have been successful in programming MPI, since many of our students are experienced in C and C++ programming and are comfortable with C style function and library calls.
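As a concrete illustration of these calls working together, here is a minimal sketch of a complete two-process MPI program of our own (not taken from the MPI distribution), in which rank 0 sends one float to rank 1:

/* mpi_send_recv.c -- a minimal two-process MPI sketch (our own
   illustrative example). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    float variable = 3.14f;
    MPI_Status status;

    MPI_Init(&argc, &argv);                   /* start up MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* which process am I? */

    if (rank == 0) {
        /* rank 0 sends one float to rank 1, with message tag 0 */
        MPI_Send(&variable, 1, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* rank 1 blocks until the matching message arrives */
        MPI_Recv(&variable, 1, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, &status);
        printf("received %f\n", variable);
    }

    MPI_Finalize();                           /* shut down MPI */
    return 0;
}

With mpich, such a program is typically compiled with mpicc and launched with mpirun -np 2.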
8 JAVA: Availability

Now it gets interesting. Java, with all of its devotees, has fast become the programming fad of the decade. Like any other hot new programming environment, Java has an ever-increasing following. A complete programming system, Java incorporates many tools that make it more than a general purpose programming system. Java tools may be obtained from many places on the World Wide Web and via anonymous ftp from several sites. The references of this paper contain several starting URLs for finding the desired programming tools. [7]

However, the interesting thing about Java is that it may be used for DPP, in a manner similar in some respects to the capabilities of PVM and MPI.

9 JAVA: Programming

Java is an object based programming system which was originally designed to facilitate programming for the World Wide Web. It has evolved into much more, and has found its way into general use, including, for example, teaching elementary programming concepts such as one might find in an introductory course at your favorite university. Anyone with experience in C++ will find that Java embodies a similar programming style. Thus, the C++ programmer, with just a bit of effort, will find Java a relatively nice programming environment. However, in this discussion, we are concerned mainly with DPP.

Java has many features that make programming easier; however, most are outside the scope of this document. The features that make Java viable as a parallel programming language are:

• Strong support for networking.

• The ability to perform remote procedure calls and remotely register objects via the Remote Method Invocation (RMI) libraries.

• Built-in support for multi-threading.

• Java's ability to invoke instances of the Java Virtual Machine, dynamically load and link classes, and invoke them when necessary.[2]

Our primary goal is allowing one host to run programs on another host. Using the built-in threading capabilities of the language coupled with RMI, threads can be started on a slave workstation from a separate master workstation. In simplest terms, this happens by using RMI to pass a runnable class as a parameter to a remote method, and then getting that runnable class to execute by invoking another remote method.

To understand how this works, we must discuss two of Java's technologies: RMI and threading. In Java, there are a number of ways to go about creating a thread, but only two ways to create a threadable object. Since Java is truly object oriented, everything in the language is an object (except for primitives, but they can also be treated as objects). To create an object that is threadable, you either create a class that extends the Thread class or create a class that implements the interface Runnable. The second option is more desirable: because Java doesn't support multiple inheritance, implementing Runnable leaves us the ability to extend some other class if necessary.

Once you are able to start a thread, you will be able to export the thread to another machine using RMI. RMI gives you the ability to create remote objects, which you can pass around and on which you can call methods just like any other Java object. The unique thing about a remote object is that the object isn't physically located on the host machine. You gain a reference to these remote objects by using the Java based RMI Registry. On the remote computer, the running code must bind an object to the local RMI Registry using the Naming.rebind() method. However, before an object can be bound, you must run the rmic compiler on the bytecode of the object. This compiler generates two additional postprocessed classes: the Stub class and the Skeleton class.

Once this binding is done, a remote computer can gain a reference to this object by calling the Naming.lookup() method. This lookup function contacts the remote registry at a specified IP address or URL, requests the stub of the class, and then loads that stub class into the local Virtual Machine. Then, once you have the reference to that class, you can make method calls on this object like any other object. The only difference is that when you make a method call on the remote object, all the parameters are marshaled (prepared for communication), sent to the remote registry, unmarshaled (unpacked), executed, marshaled again, sent back to the local machine, and unmarshaled yet again. As you see, this can be a very powerful component of your programming.

As you can imagine, these features present us with enormous opportunities for creating parallel or distributed applications. While the nitty-gritty programming of Java differs from programming PVM or MPI, the fundamental capabilities are similar.
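A minimal sketch of this master/slave arrangement follows. This is our own illustrative code, not from any distribution: the names Slave, SlaveImpl, Work and the host slavehost are hypothetical, it assumes an rmiregistry is running on the slave, and it assumes the task's class file is visible to the slave (for example, via a shared classpath or an RMI codebase).

import java.io.Serializable;
import java.rmi.Naming;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

// Remote interface: the slave accepts work in the form of a Runnable.
interface Slave extends Remote {
    void execute(Runnable task) throws RemoteException;
}

// Slave implementation; run "rmic SlaveImpl" to generate the Stub and
// Skeleton classes, then start this main() on the slave machine.
class SlaveImpl extends UnicastRemoteObject implements Slave {
    public SlaveImpl() throws RemoteException { super(); }

    // Start the received task in a new thread on this (slave) machine.
    public void execute(Runnable task) throws RemoteException {
        new Thread(task).start();
    }

    public static void main(String[] args) throws Exception {
        Naming.rebind("Slave", new SlaveImpl()); // register with local registry
        System.out.println("Slave ready.");
    }
}

// A task must also be Serializable so RMI can marshal it to the slave.
class Work implements Runnable, Serializable {
    public void run() {
        System.out.println("Working on " + System.getProperty("os.name"));
    }
}

// The master looks up the slave's remote object and ships it a task.
class Master {
    public static void main(String[] args) throws Exception {
        Slave s = (Slave) Naming.lookup("rmi://slavehost/Slave");
        s.execute(new Work()); // the Runnable executes on the slave
    }
}

Note the design choice described earlier: Work implements Runnable rather than extending Thread, so it remains free to extend another class, and the slave decides how to thread it.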
10 Summary

Programming a distributed environment using a message passing scheme is straightforward using PVM or MPI. Programming a distributed cluster of workstations using JAVA is not yet as mature as PVM or MPI, and is limited to the JAVA structure. Source code for many different PVM, MPI, or Java programming applications is available upon request.

11 Bibliography

[1] Peter S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann, 1997.

[2] Elliotte Rusty Harold, Java Network Programming, O'Reilly, 1997.

[3] Ian Foster, Designing and Building Parallel Programs, Addison Wesley, 1994.

[4] Al Geist et al., PVM: Parallel Virtual Machine, A Users' Guide and Tutorial for Networked Parallel Computing, The MIT Press, 1994.

[5] Bill Randal, SwRI, Personal Conversation, 1998.

[6] C. Tyson Weihs, Personal Conversation, 1998.

[7] http://www.sun.com/java

[8] http://www.epm.ornl.gov/pvm

[9] http://www.mcs.anl.gov/mpi