
ANALYSIS OF INTERTASK COMMUNICATION IN PVM COMPUTATIONS ON WORKSTATION CLUSTERS

by

Sarah E. Zabel

THESIS

Presented to the Faculty of The University of Texas at San Antonio
In Partial Fulfillment of The Requirements
for the Degree of

MASTER OF SCIENCE

April 15, 1996

Acknowledgements

I would like to thank Dr. Rajendra Boppana for the long hours and hard work he dedicated to this effort. I would also like to thank Dr. Bob Hiromoto and Dr. Samir Das for participating in my thesis committee, and for reviewing and providing comments on my thesis.

Abstract

PVM and other message-passing libraries facilitate parallel processing on workstation clusters. However, the intertask communications latency often limits the achievable speedup. This thesis studies the intertask communications latency in PVM computations. The PVM message library is instrumented to generate traces, which are in turn used to drive a simulator. This approach provides a flexible method to evaluate the performance improvement that newer high-speed networks could provide to a workstation cluster. This thesis presents the results of this analysis for the NAS parallel benchmarks on workstations interconnected by 10 Mbps and 100 Mbps Ethernets.
Contents

1 Introduction
2 The PVM Message Model
  2.1 Overview of PVM Communications
  2.2 Communications Between PVM Tasks
    2.2.1 Pvmd to Pvmd Communications
    2.2.2 Pvmd to Task Communications
    2.2.3 Task to Pvmd and Task to Task Communications
  2.3 PVM Group Server
  2.4 An Example of PVM Communications
  2.5 Data Sending and Encoding Options
  2.6 The PVM Trace Facility
3 Instrumentation of PVM Message Library
  3.1 Trace Generation
    3.1.1 Instrumentation of Send Function
    3.1.2 Instrumentation of Receive Function
    3.1.3 Instrumentation of the Multicast Operation
    3.1.4 Instrumentation of PVM Communications Overhead
  3.2 Trace Output
  3.3 Converting Traces to Simulation Input
4 Simulations of PVM Program Executions
  4.1 CSIM
  4.2 Ethernet, 10 Mbits/second
    4.2.1 Send a Message to Another Task
    4.2.2 Receive a Message From Another Task
    4.2.3 An Example of Simulation Processing
  4.3 Ethernet, 100 Mbits per Second
  4.4 Assumptions and Approximations
5 PVM Benchmark Programs
  5.1 Multigrid
  5.2 Conjugate Gradient
  5.3 Integer Sort
  5.4 Fast Fourier Transform
  5.5 Lower-Upper Diagonal
  5.6 Scalar Pentadiagonal
  5.7 Block Tridiagonal
  5.8 Test Network Configuration
6 Performance Analysis
  6.1 Test and Simulation Results
  6.2 Impact of Network Speed on Communications Time
    6.2.1 Multigrid (MG)
    6.2.2 Conjugate Gradient (CG)
    6.2.3 Integer Sort (IS)
    6.2.4 Fast Fourier Transform (FT)
    6.2.5 Lower-Upper Diagonal (LU)
    6.2.6 Scalar Pentadiagonal (SP)
    6.2.7 Block Tridiagonal (BT)
  6.3 Algorithmic Blocking Factors
  6.4 Summary
7 Discussion and Conclusions
A Simulation Results
  A.1 Benchmark: mg
  A.2 Benchmark: cg
  A.3 Benchmark: is
  A.4 Benchmark: ft
  A.5 Benchmark: lu
  A.6 Benchmark: sp
  A.7 Benchmark: bt
Bibliography
Vita

List of Figures

1.1 Network Simulation Analysis Process
2.1 PVM Communications and Communications Support Functions
2.2 Message Storage in Pvmd
2.3 Relationship of Pvmd Messaging Functions
2.4 Pvmd Messaging
2.5 Message Storage in libpvm
2.6 Direct and Default Message Routing
2.7 Libpvm Messaging Functions
2.8 Example PVM System
2.9 Pseudocode for Ring Application
2.10 Sample Application Trace Results
2.11 PVM Implementation of Communications Functions
2.12 Multi-Packet Message Receipt Scenarios
2.13 Pseudocode for mxfer and mxinput Functions
2.14 More Complicated Receive Operation
2.15 PVM Send and Receive Via TCP
3.1 Mxfer and mxinput, Instrumented
3.2 Trace Output Formats
3.3 pvm_send and pvm_psend Instrumentation
3.4 PVM Overhead Trace Files Format
3.5 PVM Communications Functions Instrumented for Overhead
3.6 Format of Simulation Input
4.1 Simulation Timeline
5.1 Aggregate Communications Profile for MG
5.2 Dynamic Communication Profile for MG
5.3 Aggregate Communications Profile for CG
5.4 Dynamic Communication Profile for CG
5.5 Aggregate Communications Profile for IS
5.6 Dynamic Communication Profile for IS
5.7 Aggregate Communications Profile for FT
5.8 Dynamic Communication Profile for FT
5.9 Aggregate Communications Profile for LU
5.10 Dynamic Communication Profile for LU
5.11 Aggregate Communications Profile for SP
5.12 Dynamic Communication Profile for SP
5.13 Aggregate Communications Profile for BT
5.14 Dynamic Communication Profile for BT
5.15 Network Testbed Computer Configuration
6.1 Benchmark Test and Simulation Results
6.2 Comparison of Multigrid Simulation Results
6.3 Comparison of Conjugate Gradient Simulation Results
6.4 Comparison of Integer Sort Simulation Results
6.5 Comparison of Fast Fourier Transform Simulation Results
6.6 Comparison of Lower-Upper Diagonal Simulation Results
6.7 Comparison of Scalar Pentadiagonal Simulation Results
6.8 Comparison of Block Tridiagonal Simulation Results
6.9 Components of Blocking Time
6.10 Benchmark Blocking Times for 10 Mb/sec Ethernet Simulation

Chapter 1

Introduction

Scientific and technical research demands a great deal of computer power. In many cases, a single workstation cannot provide enough system resources to conduct a realistic computation, but a researcher's access to supercomputers is limited and such resources are expensive. A much less expensive alternative is the use of networks of less powerful computers, operating in cooperation on a single task. Parallel Virtual Machine (PVM) provides such an environment [1].

PVM is a system that allows a programmer to treat a heterogeneous collection of computers as one "virtual" machine. PVM programs use the message-passing model to link resources together across a network so that components of a user's task can be processed in cooperation on several machines at once.
In this manner, a computation takes advantage of the processing power of several machines rather than one. The performance of this virtual parallel computer on the task depends on three factors: the processing power of the machines participating in the computation, the efficiency of the division of tasks among the processors, and the speed and efficiency of intertask communication. In this thesis, we address the less-widely