A PARALLEL COMMUNICATION ARCHITECTURE FOR THE LANGUAGE SEQUENCEL

by

JULIAN B. RUSSBACH, B.S.

A THESIS IN COMPUTER SCIENCE

Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE

Approved: Chairperson of the Committee
Accepted: Dean of the Graduate School

December, 2004

ACKNOWLEDGEMENTS

I would like to thank my committee chair Dr. Per Andersen for his wisdom, patience, and insight as a person and computer scientist; Dr. Nelson Rushton for his numerous ideas and contributions to SequenceL grammar and semantics, the inception of the token ring for dynamic load balancing of SequenceL execution, and his contribution to the cluster; and Dr. Daniel E. Cooke for his enthusiasm, keen eye, cluster provisions, and the opportunity to work on a great research team. Thanks also to Chris G. Fielder for help with cluster assembly and troubleshooting; Chris McClimmans for cluster maintenance suggestions; and Dr. Philip Smith for use of the Texas Tech HPCC computers and valuable MPI lessons. A special thanks goes to my girlfriend Radoslava for her tolerance through a year of work.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS ii
ABSTRACT v
LIST OF FIGURES vi
CHAPTER
I. INTRODUCTION 1
1.1 Document Overview 3
1.2 Introduction to the Language SequenceL 4
1.2.1 What is SequenceL? 4
1.2.2 Consume-Simplify-Produce 8
1.2.3 Normalize-Transpose-Distribute 10
1.3 Other Parallel Languages 12
1.4 SequenceL Implementations 14
II. LITERATURE REVIEW 19
2.1.1 Load Balancing 19
2.2 Token Rings 23
2.2.1 IBM's Token Ring 23
2.2.2 "Token rings" 26
III. METHODOLOGY 28
3.1 Inspiration 28
3.2 Cluster Implementation 31
3.3 Communication Model 33
3.3.1 Communication Model Problems 39
3.3.2 Communication Model Solutions and Changes 41
3.4 The SequenceL Interpreter 44
3.5 Proof of Concept: Interpreter and Architecture 48
IV. RESULTS 56
4.1 Interpreter Testing 56
4.2 Distributing Parallelisms 60
4.3 Performance Analysis 66
4.3.1 Time Metrics 66
4.3.2 Explanations 73
4.3.3 Other Considerations 77
4.4 Intermediate Code and Persistent Data 78
V. CONCLUSIONS 80
5.1 Suggestions and Improvements 80
5.2 Future Work 82
5.3 Closing Remarks 82
REFERENCES 83
A. GRAMMARS 86
B. CLUSTER SPECIFICATIONS 88

ABSTRACT

SequenceL is a language that discovers all parallelisms in a program from the nature of its execution cycle. This suggests the language is a good candidate for execution in a distributed memory, high performance computing environment. However, unrestricted execution of SequenceL parallelisms in distributed memory can lead to problems of granularity and load imbalance associated with the distribution of fully parallelized programs and data. This thesis is a proof of concept investigation into a token ring communication architecture to load balance SequenceL execution in a distributed memory environment. The thesis provides background on previous work on SequenceL in distributed memory, research into dynamic load balancing and token rings, and a methodology for the construction of the communication architecture. This thesis has achieved the following results:

• A non-recursive C SequenceL interpreter, written and tested on a set of SequenceL programs
• A working distributed memory communication architecture and parallelized SequenceL execution
• A distributed memory representation of SequenceL
• A method of enforcing persistent SequenceL data and programs
• Performance measurements of SequenceL programs executed on the communication architecture

LIST OF FIGURES

1.1 TSpace Communication 15
2.1 Bisection Tree 22
2.2 Token Ring token 24
2.3 Token Ring frame 24
3.1 Single node concurrency in token-based operations 37
3.2 Token ring 38
3.3 Communication table after tuple offload 54
4.1 Dynamics of parallelisms trace output 59
4.2 Communication trace output 61
4.3a Illustration of communications occurring in lines 1-6 62
4.3b Illustration of communications occurring in lines 7-25 63
4.4 Communication trace output 64
4.5 Communication tree representation 64
4.6 Communication hierarchy 65
4.7a Time proportion for 2 processors 67
4.7b Time proportion for 3 processors 68
4.7c Time proportion for 4 processors 68
4.7d Time proportion for 5 processors 69
4.8a Number of processors versus execution time, data size = 1000 70
4.8b Number of processors versus execution time, data size = 2000 71
4.8c Number of processors versus execution time, data size = 3000 71
4.9 Time results from variations in the upper bound profitability threshold 73
4.10 Time steps during distribution and aggregation of tuples 75

CHAPTER I
INTRODUCTION

SequenceL is a language that discovers all parallelisms in a program from the nature of its execution cycle. This suggests the language is a good candidate for execution in a distributed memory, high performance computing environment. However, unrestricted execution of SequenceL parallelisms in distributed memory can lead to problems of granularity and load imbalance associated with the distribution of fully parallelized programs and data. This thesis is a proof of concept investigation into a token ring communication architecture to load balance SequenceL execution in a distributed memory environment.
The thesis provides background on previous work on SequenceL in distributed memory, research into dynamic load balancing and token rings, and a methodology for the construction of the communication architecture. This thesis has achieved the following results:

• A non-recursive C SequenceL interpreter, written and tested on a set of SequenceL programs
• A working distributed memory communication architecture and parallelized SequenceL execution
• A distributed memory representation of SequenceL
• A method of enforcing persistent SequenceL data and programs
• Performance measurements of SequenceL programs executed on the communication architecture
• Insight into the dynamics of distributed SequenceL execution

SequenceL frees the programmer from the burden of finding or explicitly marking parallelisms in code: all parallelisms are generated for the programmer. This is potentially a revolutionary innovation in programming language theory, as we are currently unaware of any other language recognized to do so. This feature has implications for the scientific and high performance computing communities, whose researchers seek quicker solutions to mathematical problems through parallel execution. Removing this difficulty eases programming, saves programming time, and potentially reduces cost; one line of parallel code has been estimated to cost $800 on average [And]. However, relinquishing the burden also means relinquishing control of execution: if the programmer is not required to be aware of the parallelisms found, then the execution of SequenceL cannot require the programmer to state explicitly how those parallelisms are executed. On a single processor this is of little concern. By default, a time-shared threaded process could execute the code and emulate concurrent execution, or it could execute all of the code serially; in either scenario the code runs on a single processor with little overhead.
In a distributed memory environment, however, a different approach is required: an automatic solution is needed to balance the automatically generated parallelisms of arbitrary programs. This raises the questions of where, when, and how much parallel code should be distributed and executed. A synchronous token ring network is proposed as a solution. SequenceL parallel tasks are passed between, and executed on, the nodes of this network. Load information is stored in a token that by default circulates through the nodes. When a node receives the token, it updates the token with its current load estimate and, as in traditional token rings, is given the temporary exclusive right to communicate with other nodes before passing the token on. In this case, node-to-node communication means offloading, or sending, parallel tasks to another node when a load imbalance is detected. When a node is not engaged in sending load to another machine or in token-based operations, it computes parallel tasks. This allows serial execution of parallelisms by default and periodic dynamic load balancing on the network when needed. The nodes of the network are autonomous: decisions are made independently and all communication is peer-to-peer. There is no active arbitrator or server, so bottlenecking is not an issue, and the token passing provides synchronization between machines. The overall design aims to eliminate these common concerns and others found in previous SequenceL distributed research. Two Beowulf clusters, a SequenceL interpreter, and the underlying communication architecture were built from scratch to test specific problem sets in SequenceL. Results of this study may offer insight into future implementations, modifications and revisions, and other dynamics of distributed SequenceL execution.

1.1 Document Overview

Chapter I continues with a brief history, introduction, and description of SequenceL with examples.
The chapter also touches on implementations of other parallel languages and concludes with a detailed look at previous SequenceL development. Chapter II provides a literature review of both dynamic load balancing in distributed memory and token ring communication; this two-fold investigation provides the academic background for the methodology covered in Chapter III. Chapter IV presents the results of this study, and Chapter V states conclusions and suggestions for future work.

1.2 Introduction to the Language SequenceL

The SequenceL language was originally created in 1991 by Dr. Daniel E. Cooke and has since undergone a series of maturations continuing through 2004. In 1995 the language was proven Turing complete. In 1999 it was discovered that SequenceL execution lent itself to the natural unfolding of data and control parallelisms [CoAn]. Work continued on the language, and a SequenceL research team was organized. In 2002 a shared memory SequenceL compiler was completed, resulting in a series of semantic revelations that led to a drastic reduction in the grammar and a leaner, simplified execution cycle.
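As an aside, the token-based balancing scheme described in the introduction can be sketched as a toy single-process simulation. This is an illustrative assumption, not the thesis's MPI implementation: the names `Node`, `Token`, and `circulate`, and the imbalance threshold of two tasks, are all invented for the sketch.

```python
from dataclasses import dataclass, field

@dataclass
class Token:
    loads: dict  # node id -> last reported load estimate (task count)

@dataclass
class Node:
    nid: int
    tasks: list = field(default_factory=list)

    def hold_token(self, token, nodes, threshold=2):
        # On receiving the token, the node first records its current load.
        token.loads[self.nid] = len(self.tasks)
        # While holding the token it has the exclusive right to communicate:
        # offload tasks to the least-loaded peer while an imbalance persists.
        while True:
            peer_id = min(token.loads, key=token.loads.get)
            if peer_id == self.nid or len(self.tasks) - token.loads[peer_id] < threshold:
                break
            nodes[peer_id].tasks.append(self.tasks.pop())  # send one parallel task
            token.loads[self.nid] -= 1
            token.loads[peer_id] += 1

def circulate(nodes, rounds=3):
    """Pass the token circularly through the ring a fixed number of rounds."""
    token = Token(loads={n.nid: len(n.tasks) for n in nodes})
    for _ in range(rounds):
        for node in nodes:
            node.hold_token(token, nodes)
    return token

# One node starts with all eight tasks; circulating the token evens the load.
ring = [Node(0, tasks=list(range(8))), Node(1), Node(2), Node(3)]
circulate(ring)
print([len(n.tasks) for n in ring])  # → [2, 2, 2, 2]
```

In the architecture this thesis actually builds, each node is a separate cluster machine and the token and tasks travel as MPI messages between peers; here, shared Python lists stand in for that communication, and a node that holds no token would be computing its queued tasks in the meantime.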