THE UNIVERSITY OF NEWCASTLE UPON TYNE COMPUTING LABORATORY


The Exploitation of Parallelism on Multiprocessors

by Michael A. Stoker

PhD Thesis

September 1990

Abstract

With the arrival of many general purpose shared memory multiple processor (multiprocessor) computers into the commercial arena during the mid-1980's, a rift has opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to effectively deliver this power to potential users. This rift stems from the fact that, currently, no computational model with the capability to elegantly express parallel activity is mature enough to be universally accepted, and used as the basis for programming languages to exploit the parallelism that multiprocessors offer. To add to this, there is a lack of software tools to assist programmers in the processes of designing and debugging parallel programs.

Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes constructs, together with a programming methodology and environment, to close the ever widening hardware to software gap.

The novel programming constructs described in this thesis are intended for use in imperative languages even though they make use of the synchronisation inherent in the dataflow model by using the semantics of single assignment when operating on shared data, so giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared values are developed to permit the concise expression of these paradigms.

Acknowledgements

Although the research behind this thesis and its subsequent writing are attributed to one individual, I feel that it is fitting to mention the other people who helped make it possible. First and foremost, I would like to thank my supervisor, Professor Peter Lee, for his comments and guidance over the course of my research. Moreover, I am grateful for his time and encouragement given during the latter phases of writing this thesis.

Secondly, I would like to thank Dan McCue for his comments and suggestions regarding the implementation of my work. In addition, I would like to thank him for reading and commenting on early drafts of this thesis.

Finally, I acknowledge the financial support from the Science and Engineering Research Council during the production of this thesis.

Table of Contents

Abstract
Acknowledgements
Table of Contents
List of Figures

Chapter 1 Introduction ... 1
1.1 Applications and Technology ... 3
1.2 The Software Shortfall ... 6
1.2.1 The Roots of the Problems of Exploiting Parallelism ... 7
1.2.2 Bridging the Gap ... 8
1.3 Areas for Research ... 10
1.4 A Portent of the Chapters ... 13

Chapter 2 The Nature of Parallelism ... 15
2.1 Parallel Architectures ... 16
2.1.1 Flynn's Categorisation ... 16
2.1.2 SISD - Conventional Processors ... 17
2.1.3 SIMD ... 18
    Array Processors ... 18
    Vector and Pipelined Processors ... 19
    Associative Processors ... 20
    Analysis of the SIMD approach to Parallelism ... 20
2.1.4 MIMD - Conventional Multiprocessors ... 21
    Shared Memory Multiprocessors ... 23
    Distributed Memory Multiprocessors ... 24
    Hybrid Multiprocessors ... 25
    Analysis of the MIMD approach to Parallelism ... 26
2.1.5 Exotic Architectures ... 27
    VLIW Architectures ... 27
    Systolic Arrays ... 28
    Dataflow Architectures ... 29
    Graph Reduction Architectures ... 30
2.1.6 Architecture Summary ... 31
2.2 Operating System Issues and Tools ... 33
2.2.1 Resource Management ... 35
    Multiprocessor Operating Systems ... 36
    Parallelism Support ... 37
    Alternatives to Processes ... 39
2.2.2 Tool Support ... 40
    Auto-parallelising Compilers and Restructurers ... 41
    Vectorisation ... 43
    Auto-parallelisation and Restructuring ... 46
    Other Research ... 48
    Design Tools ... 50
    Monitoring and Debugging Tools ... 51
    Parallel Profilers and Visualisers ... 51
    Parallel Debuggers ... 55
    Summary of Monitoring Tools ... 59
2.3 Parallel Programming Mechanisms ... 60
2.3.1 Specifying Parallel Execution ... 60
    Coroutines ... 60
    Fork and Join ... 61
    Cobegin ... 61
    Doall ... 62
    Declarations ... 62
2.3.2 Communication and Synchronisation ... 63
    Hardware Synchronisation Primitives ... 65
    Busy-waiting ... 65
    Fetch-and-Add ... 66
    Denelcor HEP ... 66
    Software Synchronisation Primitives ... 67
    Blocking Mechanisms ... 67
    Buffering Mechanisms ... 68
    Non-deterministic Choice ... 68
2.3.3 Communication and Synchronisation via Shared Variables ... 69
    Semaphores ... 70
    Read and Write Locks ... 70
    Barriers and the Force ... 71
    Conditional Critical Regions ... 71
    Monitors ... 72
    Serializers ... 74
    Path Expressions ... 76
2.3.4 Communication and Synchronisation via Message Passing ... 77
    Specifying Channels of Communication ... 77
    Message Passing Abstractions ... 79
    Remote Procedure Call ... 79
    Atomic Actions ... 81
2.3.5 High Level Models of Parallelism ... 82
    Linda Primitives ... 82
    Vector Programming Languages ... 84
    Object-Oriented Languages ... 86
    Logic Languages ... 87
    Functional Languages ... 90
    Dataflow Languages ... 93

Chapter 3 Parallelism Issues ... 96
3.1 Parallelism Granularity ... 97
3.2 Programming Styles and Constructs ... 100
3.3 Models for Parallel Programming ... 104
3.4 Problems in the Parallel Execution of Programs ... 115
3.5 Requirements For Constructs ... 118
3.6 Methods of Exploiting Parallelism ... 121

Chapter 4 Shared Values ... 125
4.1 Philosophy of Shared Values ... 125
4.2 Tyger Parallel Programming Model ... 128
4.2.1 Programming Language Constructs ... 131
4.2.2 Static Shared Value Semantics ... 131
4.2.3 Partitioning Functions ... 145
4.2.4 Summary of the Tyger Model ... 156
4.2.5 Dynamic Shared Values ... 161
4.3 System Components ... 168
4.3.1 Ideal Thread Environment ... 169
4.3.2 Supporting Tools ... 171

Chapter 5 Implementation of Shared Values ... 173
5.1 Multimax Multiprocessor ... 173
5.1.1 Caching Strategies and Cache Coherency ... 174
5.1.2 Operating System ... 175
5.1.3 Lightweight User Level Threads Library Model ... 176
5.1.4 System Characteristics ... 177
5.2 Implementing Shared Values ... 179
5.2.1 Representation of Shared Values ... 179
5.2.2 Synchronisation for Shared Values ... 180
5.3 Compile Time Optimisations ... 182

Chapter 6 Performance Evaluation of Shared Values ... 184
6.1 Measuring Parallel Processing Performance ... 185
6.2 Primitive Shared Value Operations ... 187
6.3 Programming Scenarios ... 189
6.3.1 Array Assignment ... 190
6.3.2 Matrix Addition ... 192
6.3.3 Matrix Multiplication ... 197
6.3.4 Fractal Generation ... 204
6.4 Summary ... 208

Chapter 7 Conclusions ... 212
7.1 Future Work ... 217
7.2 Closing Summary ... 219

Bibliography
Appendix A Julia Fractals
Appendix B Test Program Listings
Appendix C Shared Value Classes

List of Figures

Figure 2.1 - von Neumann model ... 17
Figure 2.2 - Shared memory architectures ... 24
Figure 2.3 - Distributed memory architectures ... 25
Figure 2.4 - Switch-interconnected shared memory architecture ... 26
Figure 2.5 - VLIW architecture ... 28
Figure 2.6 - Dataflow architectures ... 30
Figure 2.7 - Graph reduction ... 31
Figure 2.8 - Process state diagram ... 38
Figure 2.9 - Visualisation of a parallel program ... 54
Figure 2.10 - Guarded commands ... 69
Figure 2.11 - (a) Barrier construct and (b) Force construct ... 72
Figure 2.12 - Monitor declaration ... 73
Figure 2.13 - Serializer for FCFS server ... 75
Figure 2.14 - Path expressions for a bounded buffer ... 77
Figure 2.15 - Ada resource controller task ... 80
Figure 2.16 - Ada Rendezvous ... 81
Figure 2.17 - Bounded buffer in Strand ... 95

Figure 3.1 - Grain size classification ... 98
Figure 3.2 - Parallelism/granularity map ... 99
Figure 3.3 - Breakdown of programming languages ... 101
Figure 3.4 - Treleaven's classification ... 103
Figure 3.5 - Transforming parallel algorithms ... 110
Figure 3.6 - Classification of parallel algorithms ... 113

Figure 4.1 - Tyger stripes ... 138
Figure 4.2 - A snapshot of a Tyger program's execution ... 141
Figure 4.3 - Sample Tyger program ... 142
Figure 4.4 - Tyger-C game of life ... 152
Figure 4.5 - Tyger-C++ program excerpt ... 157
Figure 4.6 - Nested control ... 160

Figure 5.1 - Hierarchy of parallelism ... 177
Figure 5.2 - Parallelism characteristics ... 178

Figure 6.1 - Matrix addition: 320 and 520 systems ... 195
Figure 6.2 - SV Matrix Addition ... 196
Figure 6.3 - Scaled relative times for matrix addition ... 197
Figure 6.4 - Matrix multiplication: 520 system ... 202
Figure 6.5 - SV Matrix Multiplication ... 203
Figure 6.6 - Scaled relative times for matrix multiplication (A) ... 204
Figure 6.7 - Julia sets (A and B images) ... 207
Figure 6.8 - SV (Block) Fractal A ... 209
Figure 6.9 - SV (Hunk) Fractal A ... 209
Figure 6.10 - SV (Block) Fractal B ... 210
Figure 6.11 - Scaled relative times for fractal (A) ... 211

Chapter 1 Introduction

Ever since the first general purpose digital computers were constructed around the early 1950's, the users of computers have sought greater processing power than that which was currently available. Despite the tremendous leaps made in processing speeds, due mainly to rapid advances in hardware technology, users' expectations for performance have always outstripped the supply. Early requirements for processing speeds in the range of thousands of operations per second have now been replaced by requirements for processing speeds approaching thousands of millions of operations per second. Experience has shown that no matter how powerful a computer may be, there will always be some large application, or for that matter, a combination of many small applications, that will either consume or even exhaust its entire processing potential.

Over the last forty years, hardware technology (involving processors, memory and peripherals) has undergone rapid transformations. Huge increases in performance have been obtained over the first generation of valve-based systems, with this increase not only helping to offset, but at the same time fuel, the demand for more performance. The advent of the transistor in the 1960's, and LSI (Large Scale Integration) in the 1970's (subsequently VLSI in the 1980's) have had a twofold effect on the computer generations they heralded. Each new generation of components was faster, cheaper, smaller and more reliable than its predecessors. Therefore, initially, the main attraction was an improved price/performance characteristic, but later this gave way to the idea of constructing new machines more powerful than ever before. This was in part made possible by the higher reliability of the new components as it was not unreasonable for some early computers to crash daily (if not more frequently) resulting in high operating costs and maintenance bills. But improvements in both hardware and software technologies have meant that failures now happen far less frequently. Undoubtedly the two seductive factors of good price/performance and high reliability have contributed to the wide scale acceptance of computers into society and thus have ultimately secured their future.

Computers have been applied to a diverse range of tasks and the number of applications in which computers are being used is rapidly increasing. As more powerful machines become available, new and more challenging software applications are being devised to take advantage of the high processing performances now on offer. Moreover, existing applications are being scaled up to match the new availability of processing power. For example, in weather forecasting more accurate predictions than ever before are being obtained. This is happening because the process of forecasting [Kaln79] is centred around solving partial differential equations represented on a three dimensional mesh. A reduction in the mesh size, while tremendously increasing the amount of processing required for a solution, also gives rise to more accurate results.

For a significant proportion of applications the real bottleneck in a computer system is the processor, as it actually does the work of executing programs. An obvious path to greater processing power is to simply employ a faster processor, but this is not quite so simple in practice. If a significantly faster processor is to be used, then the other components of the computer system (memory and disks) may also have to be upgraded. Adding a faster processor to a system can change its balance from being compute-bound to memory or input-output bound, thus reducing the overall benefit gained from the introduction of the new processor. Furthermore, faster processors based on the leading edge of technology are often very expensive and if further components have to be upgraded, then this merely compounds the cost. While the economic cost of performance is important there is another more important long term factor to be considered. The electronic-based technologies that produced the computer revolution are rapidly being extended to their physical limits and will be unable to remain a continuing source of performance increase. Hence, new sources of performance improvement must be found to meet ever increasing performance requirements.

Naturally one could turn to technologies other than electronics in the pursuit of faster machines, as in the case of optical computers. However, a large amount of research regarding the construction and manufacture of optical computers remains to be completed before such far-ranging steps can be taken. A more technologically conservative approach to the problem, however, is not to build a faster traditional computer from superior technology, but instead, to improve upon the original computer model by re-examining the fundamental way in which it works. Originally, computers were envisaged as functioning in a serial fashion executing a single operation at a time. As the study of computer systems developed, intra-processor parallelism (e.g. instruction pipelining) and system level parallelism (e.g. multiprogramming by overlapping input and output requests) were introduced. From the user programming point of view, these kinds of parallelism were buried in the hardware and operating systems so no changes in programming style were needed. Furthermore, in large machines special purpose processors were added to off-load input/output activity from the central processor (e.g. channels on the IBM 360 series). These uses of parallelism in single processor systems have been applied very successfully over the years and continue to be very important; even so, still greater performance is needed. Thus, the next logical step is to introduce multiple central processors into processing systems to form true multiprocessors. Although this is quite straightforward in theory, there are many engineering and other pragmatic problems that have to be solved to produce viable multiprocessors and therefore to put parallel processing into practice. Parallel processing can be described as the process of partitioning an application into a number of components, and distributing the components over a group of processors so that each component can execute concurrently, with a view to executing the application as efficiently as possible.

1.1 Applications and Technology

One of the most influential factors on the design of commercially viable computer systems has been the projects or application domains in which the resulting computers were intended to be used. Parallel processing has tended to be used in applications that deal with massive amounts of possibly distributed data, or in applications that have real-time constraints. Recent areas where parallelism has been exploited include: structural analysis, weather forecasting, petroleum exploration, fusion energy research, medical diagnosis, aerodynamics simulations, artificial intelligence, expert systems, industrial automation, remote sensing, military defence, genetic engineering and socioeconomics.

Projects can constrain the performance, physical size and most importantly the cost of prospective computer systems. For example, the flight control system of an aircraft must be able to operate in real time, be accommodated by the aircraft, and be paid for out of the total cost of the aircraft. The other major influence on computer design, mentioned earlier, is technology. The factors of technology and application domains have caused a broad spectrum of widely differing computers to be produced. Almost all commercially available machines are aimed at general purpose work and are based on the traditional computing model (i.e. the von Neumann model described in chapter two), with variations in price largely accruing from the sizes, speeds and complexities of central processors, main memories and peripherals.


There have been many attempts at producing parallel architectures in the past with varying degrees of success [Fern81, Stok81, Thor81]. But no parallel architecture has been generally accepted as being universally superior to the traditional von Neumann model. One reason for this lack of architectural change, which was especially true in the 1960's, was that the technology for the effective construction of parallel architectures was simply not available. The extra components necessary for a parallel machine were too expensive and with the extra components there were just too many failures for acceptable machine reliability. Later, in the 1970's when technology had advanced, reliability was not the main drawback, but instead, it was primarily the cost of manufacture that prevented the spread of parallel machines. Parallel processing relies on employing multiple processing units, but at that time processors were expensive and it was often the case that it was cheaper and more efficient to build a bigger, faster processor than to use multiple processors. Evidence to support this claim can be gained from Grosch's Law [Knig66]. Fortunately in the early 1980's with the advent and proliferation of cheap, reliable microprocessors the converse situation arose. Microprocessor (CMOS and NMOS) based multiprocessors have become more economic than fast uniprocessors due to the rapid surge in microprocessor technology [Dong85]. The rate at which this technology is advancing is significantly faster than that which is associated with traditional mainframes (bipolar technology), hence the future of microprocessors seems very bright indeed. Consequently, as long as microprocessors flourish, the multiprocessors that utilise them will at least in theory be able to offer aggregate performances superior to those of the fastest uniprocessors.

However, not all multiprocessors are simply based on standard microprocessors and at the moment a variety of parallel architectures exist, with no single parallel architecture really dominating. The main reason for the diversity found in parallel architectures is that the traditional von Neumann model has to be generalised or abandoned to give a model that incorporates parallelism from the outset. Furthermore, parallel processing is associated with a number of specialist markets, such as supercomputing and fault tolerance, which have to apply parallelism in a manner that most suits their application area's needs. (Further information regarding parallel architectures can be found in chapter two.) Parallel machines currently available include shared memory multiprocessors (e.g. Encore Multimax and Sequent Symmetry), distributed memory multiprocessors (e.g. Intel hypercube and NCube), mini-supercomputers (e.g. Stardent Titan and Alliant FX/80) and fully fledged supercomputers (e.g. Cray Y-MP and NEC 10). While these machines differ greatly in design and capability they have all evolved from the basic von Neumann model.

There have been many proposals put forward, and several machines built, along completely different lines as alternatives to the von Neumann model. These machines include array processors (e.g. DAP and Connection Machine) and other exotic machines such as the Manchester Dataflow Machine. Although these machines are described as being general purpose computers it is common to find them being used exclusively for special purposes such as image processing or research. For reasons that will be explored next, this thesis focuses on the evolutionary architecture of the shared memory multiprocessor, which extends the von Neumann model through the use of multiple processing units, while retaining the concept of a single global memory.

Although there is a wide variety of multiprocessors now becoming available, a large proportion of every-day computing is carried out by traditional uniprocessor machines. With the exception of the parallel supercomputers, symmetric shared memory multiprocessors (SMMs) have a number of advantages over these established uniprocessor systems which include:

• high performance,
• low cost,
• reconfigurability,
• availability.

The price versus performance measure for SMMs compares very favourably to mainframes. This stems from the fact that multiprocessors are generally constructed from off-the-shelf microprocessors, therefore, allowing them to incorporate the very latest microchips and to ride the technology curve as microprocessor speeds increase. Following on, it is quite common for the imminent need to upgrade a computer system to be recognised during its lifetime. This ideally means substituting an existing processor type for a faster one, given that the architecture can withstand the change. Most multiprocessor systems are designed so that they are constructed in a modular fashion, which enables processor and memory boards to be replaced with relative ease. In large uniprocessor systems, it is usually much harder to replace processing elements because they can be closely tied to other components of the system. Finally, many multiprocessor systems can operate with less than a full complement of processors, allowing processors to be bought only if they are required and in the case of processor failures, the multiprocessor remains functional, albeit in a degraded state. Thus, SMMs can be seen as strong contenders to be the successors to uniprocessors, at least in the traditional mainframe market. To bolster this claim further, SMMs in common with other multiprocessors based on standard microprocessors have an economic advantage over some of their rival parallel architectures. More specifically, standard processing components are cheap, reliable and widely available due to the proven technology of their construction, as opposed to some of the more exotic parallel machines that rely on customised VLSI chips. In addition, a shared memory multiprocessor can be viewed as a group of tightly coupled uniprocessors executing a variant of a common operating system like the Unix operating system. This simplifies the job of migrating existing software from uniprocessors to a shared memory multiprocessor, which is not always such a straightforward task in the case of other multiprocessor architectures such as private memory machines.

Even though parallel programming has been studied for many years it is still a rapidly expanding subject, made all the more so by the current proliferation of multiprocessors. Parallel programming is a very challenging area to research, with many important questions remaining unanswered, and ultimately it will have an effect on the shape of future computing systems. Further to this, it can be observed that multiprocessors are growing in importance and frequency of use. Good methods of programming and operating these machines have to be researched in order to satisfy the demand generated by the rapidly expanding community of users. At the current time, the growth of parallel programming is being stifled as it is often considered a taxing process with its application restricted to expert users. With the introduction of new programming languages and operating systems, the appeal of parallel programming will broaden and multiprocessors will be able to fulfil their role as high performance computing engines.

1.2 The Software Shortfall

At the current time it is quite possible to design and code programs that execute concurrently on multiprocessors. Notwithstanding this, the crux of the matter is that the techniques that are being used in the construction of parallel programs do not always make the best use of the parallelism inherent in an algorithm or the parallelism provided by a machine. What is more, some of the widely used techniques are at odds with some of the established cornerstones of good programming practice. Great advances have been made in areas of software engineering such as high level languages, the information structuring techniques found in object-oriented programming, and the elegance of declarative programming. Unfortunately, in many cases practical parallel programming has failed to keep pace with these innovations and remains more or less unchanged since its inception. This has happened partly because many of the applications to which parallelism has been applied are based on existing sequential codes almost exclusively written in Fortran. Since these codes are often very old and were written by non-computer specialists, it is generally impractical to rewrite them, and thus parallelisation code must be added in a way which changes the original Fortran source code as little as possible. This has had the repercussion that any poor structure or bad programming in the original program is retained in the parallel version, and the growth of new parallel programming languages is being stifled as often people see no necessity to change over to using them.

In contrast many interesting research concepts for parallel languages and constructs have been entertained and much work has been carried out discovering their strengths and weaknesses. Many of the problems that plagued early parallel programming, such as maintaining consistency when working with shared data, have been countered with mechanisms such as monitors or more radically by message passing techniques. However, parallel programming still seems to be as difficult as it ever was and for many practical programming jobs the older and more basic techniques are still used. For example, when dealing with shared data, variants of locking are the most commonly used synchronisation mechanisms because of their expressive power and simplicity. But by the same token, they are also one of the least structured synchronisation mechanisms and therefore the most open to abuse.
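
To make the danger concrete, the sketch below shows the kind of lock-based update of shared data being described. It is illustrative only: the POSIX-style mutex calls and the function and variable names are assumptions made for the example, not constructs taken from this thesis. The discipline of pairing every lock with an unlock on every path through the code is left entirely to the programmer, which is precisely where the lack of structure invites abuse.

    /* Illustrative sketch only: protecting a shared total with a mutual
       exclusion lock (POSIX-style interface assumed for the example). */
    #include <pthread.h>

    static long shared_total = 0;
    static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

    int add_sample(long sample)
    {
        pthread_mutex_lock(&total_lock);
        if (sample < 0) {
            /* Error path: forgetting this unlock before returning would
               leave the lock held forever; nothing enforces the pairing. */
            pthread_mutex_unlock(&total_lock);
            return -1;
        }
        shared_total += sample;
        pthread_mutex_unlock(&total_lock);
        return 0;
    }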

1.2.1 The Roots of the Problems of Exploiting Parallelism

There are some links between the largely separate evolutions of computer architectures and programming languages. This is not very surprising since each component needs the other in order for them both to offer an effective computing platform. There is little point, other than for research, in having a programming language that cannot be effectively executed, or a machine that cannot be effectively programmed. In general, when an important advance has been made in one field, this has led to a corresponding advance being made in the other. For example, the advent of vector processors led to the development of vectorising compilers for Fortran and the creation of vector programming languages such as Actus [Perr79], and the invention of LISP led to the construction of symbolic processors for its effective execution, albeit some years later.

The recent emergence of general purpose multiprocessors has been driven by the availability of the hardware necessary for their construction. As mentioned earlier, the VLSI revolution that yielded faster, smaller components spurred designers, and also assisted in conventional microprocessor design. Over the last five years, refinements in manufacturing techniques have led to order of magnitude advances being made in microchip performance and to large quantities of relatively cheap, high performance processor and memory chips becoming available. With such excellent building blocks, computer designers have once again turned to multiprocessor architectures to form the next generation of general purpose computers.

The concept of multiprocessing is a straightforward extension to the von Neumann model. Instead of a single processor executing the instructions held in a memory, multiple processors are employed executing multiple instructions, possibly held in different memories. Such a concept for greater system performance is appealing and intuitive but therein lies the problem. Although it is possible to visualise how such a processing systems works, it is relatively difficult to write programs to take full advantage of the parallelism offered by such systems. For instance, on a parallel machine with N processors each with a sequential performance of K MIPS (Millions of Instructions Per Second), the theoretical peak performance of a single job is NK MIPS. With a small number of processors (1 to 3) this peak can often be achieved but for larger numbers (10 to 20) it may take more intricate programming, and for massive parallelism (100 to 200+) only a few specialised algorithms can achieve peak performance. Thus, while it may be straightforward and economical to assemble vast collections of processors and memories, it is no easy task to take an arbitrary application and implement a parallel program which runs on such a multiprocessor, and achieves even a decent fraction of the machine's' theoretical peak performance. Broadly speaking the roots of the problems involved with parallel computation are connected with the distribution of a program and its data over a number of processors and coordination of the resulting separate threads ofcontrol.
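
To make the arithmetic concrete, a small worked example follows; the figures are invented purely for illustration and are not measurements reported in this thesis.

    \[ P_{\mathrm{peak}} = N \times K \ \mathrm{MIPS}, \qquad \text{e.g. } N = 10,\ K = 5 \ \Rightarrow\ P_{\mathrm{peak}} = 50\ \mathrm{MIPS} \]
    \[ \text{fraction achieved} = \frac{P_{\mathrm{delivered}}}{P_{\mathrm{peak}}}, \qquad \text{e.g. } \frac{30\ \mathrm{MIPS}}{50\ \mathrm{MIPS}} = 0.6 \]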

1.2.2 Bridging the Gap

The task of exploiting the parallelism offered by multiprocessors can be described by three major characteristics: (i) the distribution of a computation over a number of processors (and memories); (ii) the synchronisation of the communications between the separate parts of the computation; and (iii) the form that the communications should take. Thus, the way to exploit parallelism is to derive a series of programming abstractions that are elegant, efficient and easy to use, to govern the factors that underlie the three characteristics.

The main ideas behind parallel programming have been developed over thirty years from stimuli originating from both hardware and software sources. Many classic problems in concurrency, such as the dining philosophers problem [Holt83], originated from the solutions to real problems found in timesharing operating systems. A major factor in shaping the style of operation of a parallel programming language has been the hardware and software environment in which it was to be used. Essentially, in common with other pragmatic themes, the designers of parallel languages have had to make tradeoffs between sometimes conflicting concerns. For example, the most common tradeoff is between speed of operation and the degree of program abstraction. Simple but primitive mechanisms, such as semaphores, have the attribute of fast operation. However, because they have little inherent structure associated with them, their misuse can lead to unstructured and error-bound programs. Conversely, well formed abstraction mechanisms, such as monitors, can lead to clear and concise programs, though sometimes at an undesirable penalty in performance. Thus, the challenge set to parallel language designers is to devise programming mechanisms that are convenient for the programmer to work with and to some degree portable, but at the same time, are efficiently matched to their underlying target parallel architectures.

In a similar vein, in order to retain efficiency, programmers should employ algorithms that are well matched to their implementation language. If certain operations were considered to be efficient in a particular language and others less so, then it would seem sensible in the pursuit of high performance to maximise the use of the faster operations, even if changes had to be made to the original algorithm. Unfortunately, the real world is not always so accommodating and it may happen that no algorithmic changes can be made. Thus, this would tend to imply that in order to accommodate different emphases there must be a variety of different parallel programming constructs. Hence, what is required to redress the disparity in capability between the current parallel hardware and programming languages, is a range of well defined parallel programming abstractions that allow programmers to use parallelism constructs that have the appropriate structure for their applications.

1.3 Areas for Research

The hardware basis for this thesis is the shared memory multiprocessor and the problems addressed are primarily software concerns, yet having said this, there is still a wide choice of the particular area for research. There have been many attempts at producing parallelism-exploiting programming languages, but for several reasons (discussed fully in chapters two and three) no single language or family of languages has emerged into general use. One of the reasons for this is that the job of effectively exploiting parallelism cannot be addressed by a programming language alone, but instead, must be considered in terms of a complete programming environment. Given a bare multiprocessor, five basic areas must be considered: program design, programming languages, operating system software, program monitoring tools, and architectural extensions.

For this thesis the most important of the five areas is the parallel programming language. A programming language assumes the role of a boundary between users and the system; it defines the base abstractions out of which programs are constructed by guaranteeing the properties of its programming constructs, thereby setting requirements for the operating system and hardware to fulfil. For instance, if a portable parallel language allows the existence of shared global variables, each valid implementation of the language must preserve the shared variable semantics irrespective of whether physically shared memory is available on a particular machine. In the past, most parallelism constructs have been designed bottom-up by starting with features that the hardware provides directly and evolving constructs that are easily implementable into programming languages. This style of development has led to efficient but non-portable programs. Therefore, it is the responsibility of the language designer to provide constructs geared more towards the needs of the programmer than to the system, while at the same time not making unreasonable implementation requests of the hardware and operating system. So, only once a hardware platform and a style of programming language have been fixed is it feasible to design a run time system and set of tools to comfortably support parallel programming. Thus, while this thesis concentrates mainly on parallel programming language issues, a brief outline of the alternative research areas of the environment follows.


Traditionally, program design has been a paper and pencil exercise, but increasingly through the advent of software design tools programmers are turning towards more automated design methods. It is now possible to assemble elaborate parallel programs via the use of graphically based tools that permit the visualisation of the structure of an emerging parallel program [Dong86]. Another prominent issue in program design is that parallel programming is not simply founded on an expressive language, but in addition, programmers should design parallel programs to work to standard models or paradigms. Using a well understood model as the basis for a program's design often means that the debugging and optimisation of the program can be done very quickly as the programmer is already familiar with the kinds of problems that can arise. Furthermore, certain algorithm design styles can be evolved so that they implicitly take advantage of the underlying hardware. An example of this can be seen in the design of the LAPACK numerical library [Demm87]. This library provides parallelised linear algebra routines which have been purposely designed to work at the highest level of parallelism possible, so minimising the time overheads associated with moving massive amounts of data between processors and memories. Parallel program design can benefit enormously from tools and standard techniques because they help to manage the increased complexity that can result from parallel programs, but they are only an aid to programming and cannot solve some of the fundamental problems associated with parallel programming.

In a timesharing environment (e.g. the Unix operating system), the operating system must play the role of resource manager and share finite hardware resources amongst a number of users and provide programming interfaces (system calls) to enable programmers to access these resources. In a timesharing parallel programming environment the job of the resource manager is made harder as special care must be taken when scheduling groups of cooperating processes. This is because there can be hidden user-defined synchronisation dependencies between concurrently executing processes that could lead to poorer than expected performance from the processes if the operating system blindly rescheduled them. For instance, if a process that held a crucial lock was suspended by the operating system, all the processes that wanted to acquire the lock could waste their timeslices by waiting for the lock to be relinquished. Clearly, what is needed is an operating system which offers attractive (quick and efficient) parallel system calls to programmers, such as Mach [Acce86], allowing some central control to be applied to ensure that user-level and system-level synchronisation policies do not compete against one another. In addition, other facilities such as secure memory management, workload partitioning, error recovery, and remote distribution of processes could be provided to make the implementation of parallel languages easier.

Even though a program may have been designed to a standard pattern, coded in an elegant language and executed by an advanced run time system, programming errors can of course still occur. An error in a parallel program can either give rise to an incorrect result, or it can cause the program to execute more slowly than anticipated (a performance error). The primary tools of a programmer in tracing and fixing these errors have been debuggers and program profilers. If parallel languages and operating systems are developed, then programming tools must also be developed commensurably or the job of examining the behaviour of parallel programs will become intolerable. A programmer should be able to monitor a program in the abstract forms found in its design, taking into account the sharing and partitioning of data, as this kind of approach is the natural continuation of debugging techniques that have evolved from absolute to source level debugging.

Finally, during the course of the development of computer architectures, it has been noted that in some cases certain key operating system functions are best delegated to the hardware for efficiency reasons. This idea can be extended so that certain critical constructs of programming languages also have direct hardware support. For example, consider the atomic test-and-set operation, and the research currently being undertaken to produce VLSI chips to support the Linda parallel programming language [Ahuj86]. Naturally, due care must be taken to identify the exact functions to be locked into the hardware. Experience has shown that this approach can have its pitfalls, as in the case of the ill-fated Ada based Intel 432 [Orga82]. Further to this, the RISC school of microprocessor design aims towards producing processors that are operated by a minimal set of instructions by shifting as much complexity as possible into the compiler. Clearly, research into architectural extensions must be founded on a wealth of practical knowledge about the operational behaviour of parallel programs on shared memory multiprocessors which implies, in turn, that research of this kind is an ongoing and long term process. Thus, since the demands of parallel programming are not thought to be well enough understood at this time, hardware extensions are paid scant attention in the bulk of this thesis.
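
As an illustration of the kind of construct being referred to, the sketch below builds a simple spin lock from an atomic test-and-set operation. The __atomic_test_and_set and __atomic_clear builtins are a present-day compiler convenience assumed purely for the example; they stand in for the hardware instruction and are not facilities described in this thesis.

    /* Illustrative sketch: a spin lock built on an atomic test-and-set. */
    typedef volatile char spinlock_t;

    static void spin_lock(spinlock_t *lock)
    {
        /* Atomically set the flag; if it was already set, another processor
           holds the lock, so busy-wait and try again. */
        while (__atomic_test_and_set(lock, __ATOMIC_ACQUIRE))
            ;   /* busy-wait */
    }

    static void spin_unlock(spinlock_t *lock)
    {
        __atomic_clear(lock, __ATOMIC_RELEASE);
    }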


1.4 A Portent of the Chapters

This thesis is organised into seven chapters and explores the implementation and exploitation of parallelism. It focuses latterly on shared memory multiprocessors, but continues with the second chapter, which is a broad survey of parallelism issues intended to enlighten the reader regarding the diversity and depth of research into parallel processing. The chapter is divided into three sections dealing in turn with parallelism in computer architectures, computer operating systems and system software, and parallel programming languages. As a substantial amount of research has been undertaken to investigate parallel processing, it is unrealistic to compile a definitive survey of parallel processing here. Nevertheless, key issues can be identified and hence the material that is presented sets out the main hardware and software options faced by someone who wishes to exploit parallel processing. The third chapter puts the material presented in the survey into perspective by identifying the important common characteristics found in parallel activities. As parallel architectures and their indigenous styles of programming appear at first sight to differ widely, it is somewhat rewarding to find a large degree of commonality between the competing approaches. In addition, the methods of designing parallel algorithms are discussed with a view to showing how such parallel algorithms can best be supported by programming language constructs and parallel architectures. In the final part of the chapter, a basis (or framework) is developed that can be used in the design of new parallel programming concepts.

The fourth chapter is a presentation of new work relating to the exploitation of parallelism. A parallel programming model is developed that uses a form of single assignment variable, called a shared value, to specify and synchronise information exchange between parallel threads of control. The aim of designing the model is to abstract away the essence of parallel programming from a single language or architecture so that it can be applied across a range of languages. Drawing on previous work and paying attention to the framework from chapter three, shared values are discussed and their capabilities and limitations outlined. During this discussion, two similar parallel programming languages that employ shared values are used to illustrate short example programs. As this research is aimed at solving a real programming problem, chapter five is a short overview of an implementation of a programming language that supports the use of shared values. Information is presented on the process of compiling source programs and how shared values can be efficiently implemented on shared memory multiprocessors. The testing and evaluation of several parallel programs written using shared value constructs is discussed in chapter six. The evaluation is primarily achieved by comparing shared value-based programs with their counterparts written in current parallel programming languages. Some comments are made on the implementation of shared values and how these affect the results observed. Finally chapter seven concludes the thesis by summarising its goals and attainments. The latest work on shared values is presented, which reflects the current thinking and new directions that could be explored. In addition, some brief comments are made on the overall success of this work and how the shared value approach to parallelism compares to other work on parallelism that has emerged over the last four years.


Chapter 2 The Nature of Parallelism

The task of compiling a complete survey of parallelism research is only marginally less difficult than trying to net all the fish in the North Sea with a single cast. The notion of parallel activity is present throughout many aspects of computing science, ranging from software examples such as a parallelised quicksort algorithm, to hardware examples such as a central processor operating concurrently with its peripheral devices. Over the last five years parallel processing has grown into a mature research area encompassing a wide range of topics so this survey chapter has been divided into three sections:

• parallelism creation by hardware,
• parallelism management by operating systems and tools,
• parallelism exploitation by programming models and constructs.

The hardware section of this chapter describes some of the notable parallel architectures, yielding interesting material to allow the successes of the different approaches to be compared. Moreover, information is presented about the kinds of applications that classes of parallel architectures are aimed at supporting and the corresponding operational tradeoffs that are made. The section on operating systems describes the importance of the operating system to the exploitation of parallelism, resulting from its job as a process and resource manager, and the role of software tools in the development of programs. Software tools have also been included in this section because they can be viewed as part of the system software. One reason for this choice is that the architectural abstraction portrayed by an operating system should be fully integrated into the software tools it supports, so that the tools can relay pertinent system information to users in a convenient format. The section on parallel languages concludes this chapter by listing existing programming constructs that allow the exploitation of parallelism. This is the most important section of the chapter, but even so, only the main concepts of parallel programming are presented as there are too many parallel languages to discuss each one individually. Issues arising from these three sections are discussed further in the third chapter and are used to form the requirements for parallel processing mechanisms.


2.1 Parallel Architectures

The central hardware theme of this thesis is shared memory multiprocessors, but inspiration can be drawn from the examination of alternative multiprocessor architectures. There have been many proposals for the design of computers based on parallel architectures. Some architectures have included parallelism for reliability by replicating important components, such as is typically found in fault tolerant systems (e.g. Tandem NonStop [Katz78]). However, the primary reason for the use of parallelism in a computer design is to obtain an increased performance over a sequential design for a given hardware technology.

As the reliability and complexity of the constituent components of computers have increased and their size decreased, it is now possible to put complete functional units on a single VLSI chip. Hence, it is now possible to construct parallel architectures that contain hundreds or even thousands of processors. Parallel computer architectures based on arrays of dedicated processors are direct offshoots from VLSI technology and have limited general use [Mead80], but others such as the dataflow architectures highlight important concepts of general applicability. As computer architectures have developed, many classification systems have been devised to try and capture the most important characteristics of each machine [Feng72, Shor73, Hund77, Hock81, Skil88]. In this chapter only the general issues are presented in a simple categorisation to give a flavour of parallel architecture research, with there being no attempt at a formal analysis.

2.1.1 Flynn's Categorisation

The basic classification scheme introduced by Flynn for computer architectures [Flyn66] groups together architectures with similar modes of operation. The scheme does not go into specific details regarding the hardware breakdown of individual architectures, but nevertheless, it has proven useful as a method of identifying architectures with related operating characteristics. Other more complex categorisations and extensions to the basic scheme [Kuck78, Dunc90] have been proposed to capture more quantitative information, though these are not discussed here.

Flynn proposes that all computer architectures can be classified as one of four types according to the way in which their processing elements handle instructions and data.


• The single-instruction stream single-data stream (SISD) representing conventional scalar processing systems.
• The single-instruction stream multiple-data stream (SIMD) including most array processors such as Solomon and Illiac IV.
• Multiple-instruction stream single-data stream organisations (MISD).
• Multiple-instruction stream multiple-data stream (MIMD) including multiprocessor organisations.

The simplicity of the categorisation is its main advantage, but it is flawed in that no widely recognised architecture can be fitted into the MISD category.

2.1.2 SISD - Conventional Processors

One of the earliest views of a digital computer architecture came from von Neumann [vonN58]. He proposed an effective computing system model that is still in use today. His model consists of a store and a processing element that perpetually perform a fetch-execute cycle, see Figure 2.1. The processing element selects and fetches an instruction from a sequence of instructions held in the store. The element then decodes the instruction and finally executes it. With the store being used as an updatable memory to hold the instructions and the partial results of a computation, it can be said to hold the state of the computation. This follows from the side-effect nature of the model, which means that the action of an assignment has the side-effect of modifying the data object bound to a global variable. It is this notion of state and the ordering of events necessary to achieve a particular state that is the cause of many difficulties when considering parallel programming. For example, the idea of a global variable that can be modified by many parts of a program is an efficient mechanism for intra-program communication, but the unconstrained use of global variables can produce unstructured programs that are very hard to understand.

Figure 2.1 - von Neumann model: a Processing Element (PE) connected by a communications bus to a Program and Data Memory.
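
The fetch-execute cycle of Figure 2.1 can be sketched as a toy interpreter; the instruction set, encoding and program below are invented for the illustration and are not drawn from the thesis. Note how every STORE is a side-effect on the single global store, which is exactly the notion of state discussed above.

    /* Illustrative sketch of the von Neumann fetch-execute cycle. */
    #include <stdio.h>

    enum { LOAD, ADD, STORE, HALT };           /* invented opcodes            */
    struct instr { int op; int addr; };        /* an instruction in the store */

    int main(void)
    {
        int memory[8] = { 5, 7, 0 };           /* data held in the store      */
        struct instr program[] = {             /* program held in the store   */
            { LOAD, 0 }, { ADD, 1 }, { STORE, 2 }, { HALT, 0 }
        };
        int acc = 0, pc = 0, running = 1;

        while (running) {                      /* the perpetual cycle         */
            struct instr i = program[pc++];    /* fetch                       */
            switch (i.op) {                    /* decode and execute          */
            case LOAD:  acc = memory[i.addr];  break;
            case ADD:   acc += memory[i.addr]; break;
            case STORE: memory[i.addr] = acc;  break;  /* side-effect on state */
            case HALT:  running = 0;           break;
            }
        }
        printf("memory[2] = %d\n", memory[2]); /* prints 12                   */
        return 0;
    }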

As has been mentioned in the first chapter, parallelism has been introduced into von Neumann systems in many ways most of which are invisible to an applications programmer. Dedicated processors have been attached to the bus to handle the execution of instructions to interface to external media such as disks and networks, and to execute instructions involving floating point numbers. Memory interleaving allows several memory requests to be active simultaneously permitting the overlapping (pipelining) of several fetch-execute cycles. A pipelined cycle can consist of: instruction fetch, decode, operand fetch, arithmetic execution and result storage. Arithmetic and logical operations inside the central processor have also been parallelised by using techniques found in devices such as carry-lookahead and carry-save adders.

Nearly all scalar processors are descended from the von Neumann model and this hardware trend has to some extent been mirrored by traditional programming languages. A large proportion of programming languages are based on control flow models that utilise active agents to execute a stream of instructions which act upon passive memory locations. For multiprocessors the picture is the same; many multiprocessors extend the von Neumann model simply by adding extra processors and memories, and parallel programming languages likewise by supporting multiple threads of control with additional scoping rules for multiple memories. Thus, it can be seen that the von Neumann model has had a profound effect on both sequential and parallel computer systems. Even so, it is not the only model that can be used as the basis for parallel programming nor is it the best, as shall be seen later.

2.1.3 SIMD

There are three basic types of SIMD processor, that is, processing systems characterised by the synchronous application of a master instruction over a number of related operands. The different types are: array processors, vector and pipelined processors, and associative processors.

Array Processors

An array processor consists of one control unit and a number of directly connected processing elements. Each processing element is independent, having its own registers and storage, but only operates on command from the control unit. Three examples of actual array processors are Solomon [Greg63], Illiac IV [Barn68] and the DAP [Hock81].


Vector and Pipelined Processors

Although the two types of machine in this category are completely different it is very common to find machines that combine both of these parallel processing techniques for maximum efficiency.

By using vector processing a machine operation can take vectors as operands and can return a result that is also a vector, as opposed to scalar processing in which only scalar operands can be manipulated. Vector operations cut out the need for the separate operand fetches and instruction fetches/decodes that would be needed to operate on each element of a vector if similar operations were to be performed by a scalar processor. Vector processors can be grouped into two classes according to the way in which vectors can be referenced. In the first group, memory-to-memory, source, intermediate and result vectors can be directly retrieved from memory (e.g. TI-ASC). In the second group, register-to-register, operands have to be accessed indirectly through special vector registers (e.g. Cray-1, Fujitsu VP-200). As vector operations can take many machine cycles to set up and many more to execute, several vector operations can be in progress simultaneously by the use of pipelining.
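
As an illustration (a sketch with invented names rather than an example from the thesis), the scalar loop below is exactly the kind of operation that a vector processor can perform as a single vector instruction over all N elements, removing the per-element instruction fetches and decodes mentioned above.

    /* Illustrative sketch: an element-wise addition that a vectorising
       compiler or a vector instruction can perform as one operation. */
    #define N 1024

    void vector_add(const double a[N], const double b[N], double c[N])
    {
        for (int i = 0; i < N; i++)    /* one scalar add per iteration...        */
            c[i] = a[i] + b[i];        /* ...or a single vector add of length N  */
    }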

Although pipelining techniques do not fall under the strict definition of SIMD they are presented here because they are often matched with vector multiprocessors. Pipelining has been incorporated into three areas of computer operation and can be classified as one of the following: arithmetic, instruction or processor. Arithmetic pipelining is found in the computational units of machines such as the Star-100, Cray-1 and Cyber-205 and is concerned with speeding up the execution of arithmetic and logical operations. Instruction pipelining, as mentioned before, is very common and is found on most machines including uniprocessors. Processor pipelining, as its name suggests, operates at the much larger scale of inter-processor parallelism.

A pipelined arithmetic unit is a time-multiplexed version of the array processor with a number of functional units which can either be tailored to a particular operation (e.g. Cray-1) or can perform different operations, in some cases simultaneously (e.g. TI-ASC). These units are arranged in a production line fashion, staged to accept a pair of operands every Δt time units. The memory in a pipelined system is arranged to facilitate high-speed data transfer to produce the source operands which are entered a pair every Δt time units into the designated pipeline. The resulting stream of operands can then be returned to memory or fed into another pipeline, a procedure called chaining. If sufficiently large vectors are entered into a pipeline, the speed up over a scalar processor is approximately equal to the number of pipeline stages. The total execution time is longer for a pipelined processor than that for a completely parallel processor, but fewer hardware components are needed. However, as the length of a pipeline grows, its set up and flushing times increase proportionally, which means high overheads are incurred when operating with small vectors.
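
The claim about speed up can be made concrete with a standard back-of-the-envelope derivation, sketched here in the notation of the paragraph above rather than reproduced from the thesis. For a pipeline of s stages accepting a new operand pair every Δt, and assuming the equivalent scalar unit needs sΔt per result:

    \[ T_{\mathrm{scalar}} = n \, s \, \Delta t, \qquad T_{\mathrm{pipeline}} = (s + n - 1) \, \Delta t \]
    \[ S = \frac{T_{\mathrm{scalar}}}{T_{\mathrm{pipeline}}} = \frac{n \, s}{s + n - 1} \ \longrightarrow\ s \quad \text{for } n \gg s \]

so the asymptotic speed up is the number of stages, while for small n the (s - 1) fill and flush term dominates, which is the overhead on short vectors noted above.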

Vector pipelined machines have proved to be very effective at floating point computation, but only rarely do these machines achieve a high percentage of their theoretical peak performances. Two of the reasons for this are that often the quantities involved in real applications do not map well into the fixed vector sizes of a machine (e.g. a problem of size 65 is a bad match for a vector pipeline of size 64), and the task of moving data between registers, memories and secondary storage becomes so complex and congested that valuable processing time is wasted waiting for data to be shunted around.

Associative Processors

The characteristic of associative processors is that the processing elements are not directly addressed by location or sequence, but rather by content. Processing elements are activated when a generalised match relation is satisfied between an input register and the characteristic data contained in each of the processing elements. Only the elements that match perform the control unit's operation during which time the other elements remain idle. Associative processors have been constructed since the 1950's but technological limitations meant that early machines were not cost effective [Slad56, Meil81], though newer machines have been more successful (e.g. STARAN, MAP, and MPP [Batc80]).

Analysis of the SIMD approach to Parallelism

An obvious comment that can be made about SIMD machines is that they sacrifice some generality relative to other parallel processing models, and by implication some performance, when executing certain problems because of the master-slave configuration of the processing elements. Nevertheless, SIMD machines such as the Cray processors, GF-11 and Illiac IV have been used very successfully for intensive floating point applications and machines such as the Connection Machine [Hill85] likewise for symbolic applications. From the hardware perspective, SIMD machines only require a single complex (expensive) master processor, making do with simpler processors as the slaves, while MIMD machines generally replicate all processors evenly. Operationally, Beetem [Beet87] argues


that SIMD machines are more effective at multiprocessing than MIMD machines because of the synchrony of the processors, which obviates the need for explicit additional synchronisation. In a similar vein, machine and program designs are simpler as the rules for sharing data are very rigid: either all processors get the data or none of the processors get the data. These sharing rules are qualified though, as many SIMD machines have masking instructions to allow a processor to ignore data that is not intended for it. As a consequence of using regular sharing rules, however, resulting programs can be easily debugged because of the uniformity of processing found in their algorithms. In addition, each processor needs only a relatively small private memory to hold data as program instructions are held in a single program memory. This compares favourably against conventional distributed memory architectures, which must replicate program instructions for each processor, but is about the same for conventional shared memory machines, which can similarly share program code.

A negative point, however, is that the optimal allocation of work when mapping a large problem onto a fixed size array is known to be NP-complete [Nico88], so even potentially highly parallel algorithms may not perform well on SIMD processors. To summarise, SIMD machines offer a viable parallel processing architecture both economically and operationally, but the choice of a particular architecture must be made with respect to its intended role, as some applications will work very well while others will achieve little better than scalar performance.

2.1.4 MIMD - Conventional Multiprocessors

Two very different types of computer system fall under the MIMD classification. The first is a multicomputer system defined by Hwang and Briggs [Hwan85] as:

a multiple computer system consists of several autonomous computers which may or may not communicate with each other;

while the second type, a multiprocessor system, is defined as:

a multiprocessor system is controlled by one operating system, providing interaction between processors at the process, data set and data element levels.

Parallel processing can be attempted on multicomputer systems, but multiprocessors generally accumulate much smaller overheads when handling the parallelism, boasting superior software and run time support permitting the construction and execution of highly efficient parallel programs.

A MIMD multiprocessor architecture is composed of a group of processors, some memory and a connection layout to allow the processors and the memory to communicate. The processor configuration differs from the SIMD model in that all of the processors are nominally of equal status and capability, not master-slave or add-on processors, enabling each processor to operate asynchronously and independently. Furthermore, if a MIMD multiprocessor is composed of identical processors, then processing can be performed symmetrically, in that computations are not tied to an individual processor and can migrate between processors as desired. In contrast though, the types of processors in a MIMD multiprocessor can be varied, mixing general purpose processors with those that have specialised capabilities.

There are two schools of multiprocessor design: one view is that the processors share a single memory and use it for mutual communication, whereas the other view is that there need be no shared memory and communication takes place by means of message passing. Machines that use shared memory are often described as being tightly coupled with there being many interactions between its processors, while conversely, distributed memory machines are loosely coupled with there being fewer interactions because of the higher communication cost. In addition, Arvind and Ianucci [Arvi83] have pointed out that architectures that provide a shared memory abstraction must: (i) make provision to tolerate long latencies for memory requests; and (ii) achieve unconstrained, yet synchronised, access to shared data.

To address the former problem many solutions have been tried, including: arranging for small memory access times, using special task switching hardware to suspend tasks until their data arrives, or more fundamentally, using a data driven model (i.e. dataflow). To address the latter problem there have been several solutions including, once again, special hardware at both the instruction and memory location levels (e.g. in the HEP), and sharing and synchronisation techniques based on coherent caches. Thus, maintaining a shared memory abstraction is a severe test for an architecture, though, if shared memory is given up in favour of a distributed memory model, new problems arise at the programming level regarding the partitioning and transference of data between memories. As each approach to memory addressing has some merits, but also some drawbacks, some machines have been designed combining both shared and distributed memory.

Shared Memory Multiprocessors

A shared memory multiprocessor architecture is characterised by having a single, physical global memory which is directly accessible to all its processors, i.e. a single shared address space. In addition, the key idea behind true shared memory systems is that the access time to a piece of data is independent of the processor making the request, that is, there is a uniform access time for fetching and storing memory elements. With this being the case, the actual interconnection scheme between memory and processors can be completely hidden from application software. However, because of the complete sharing of all memory, these systems suffer most from effects such as memory bank and bus contention. Hence, the limiting factor on the maximum number of processors the system can usefully accommodate is the aggregate memory bandwidth. In order to minimise memory contention, processors often have large cache memories to reduce the number of memory requests. If suitable caching strategies are employed then the shared memory architecture works very well in practice allowing the current generation of multiprocessors to utilise up to thirty processors. Interestingly enough, however, the use of cache memories makes shared memory machines seem more like distributed memory machines, as cache accesses are much faster than memory references, but no explicit programming of the cache can be done. Further information on caching techniques is presented in chapter five.

As each processor must be able to reference any memory location only connection layouts that allow full interconnection of processors and memories can be used. Conceptually the simplest way of achieving a full system interconnection is to use a single shared bus. This has the advantages of being relatively cheap and easy to configure but set against these is the drawback that the bandwidth of the bus restricts the system to a small number of processors when compared to distributed memory machines; with the bus being a potential single point of failure. A modest extension to the single bus scheme is to employ two buses, one for reading the other for writing. This may seem attractive but in practice not much is gained as often memory requests need to use both buses, with multiport memory being required to support the multiple buses. Using multiport memory means that multiple control, switching and arbitration logic is needed for each memory unit to synchronise the accesses from separate buses. This makes


multiport memory a most expensive and limited commodity (due to physical space requirements) but it has been implemented on several older machines including the Univac 1100/90 and the IBM System 370/168. A more radical method of increasing the bandwidth, however, is to make a separate bus available from each processor to each memory unit, an arrangement called a crossbar as found in the Alliant FX/8. This method gives the best bandwidth and reliability but can still suffer from contention if multiple requests are received at a memory unit simultaneously. Other disadvantages with the crossbar switch are the high complexity of the interconnection components and the associated high cost. Sample interconnection schemes are shown in Figure 2.2. Shared memory has been used in the construction of many multiprocessors ranging from supercomputers such as the Cray X-MP [Lars84] and the IBM 3090 [IBM85] to more economical systems like the Encore Multimax [Enco87] and the Alliant FX/8 [Alli86, Perr86].

Figure 2.2 - Shared memory architectures (bus and crossbar interconnections).

Distributed Memory Multiprocessors

A distributed memory system is configured so that each processor has its own local memory, forming a node, with there being no globally accessible memory. Thus, the only way in which data can be shared between a group of nodes is to move it explicitly between them, a technique which is often called message passing. The time taken for the movement of data is dependent on the node connection layout, from which a measure of distance between nodes can be derived. As the connection layout may be visible to application programmers this distance factor is of importance in algorithm design when considering the usage of data relative to its position. A large number of connection schemes have been used in distributed memory multiprocessors [Feng81], ranging in complexity from a linear array with each node communicating with adjacent nodes, to a hypercube,


in which there are n connections to each of a node's n nearest neighbours [Ratt85, Seit85]; three connection schemes are shown in Figure 2.3.

Distributed memory machines have the advantages over shared memory machines that they are easier to build, requiring only simple point-to-point communication buses, and they are more extensible as there need be no complete interconnection of nodes and therefore no shared bottlenecks. However, with the restriction of private memories, programming can become more difficult as consideration must be given to the efficient distribution of program and data to minimise inter-node communication costs. Two well known message passing multiprocessors are the Intel iPSC [Gehr88] and the transputer-based Meiko Computing Surface.

Figure 2.3 - Distributed memory architectures (complete interconnection, hypercube and mesh).
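As an illustration of the explicit data movement just described, the following C sketch assumes a hypothetical pair of blocking primitives, node_send and node_recv; the actual calls provided by machines such as the iPSC differ in name and detail.

    #include <stddef.h>

    /* Hypothetical primitives: both block until the transfer completes. */
    extern void node_send(int dest, const void *buf, size_t len);
    extern void node_recv(int src, void *buf, size_t len);

    #define N 1024

    /* Node 0 owns the array and ships half of it to node 1 for processing. */
    void distribute(double a[N])
    {
        node_send(1, &a[N/2], (N/2) * sizeof(double));  /* explicit copy out   */
        /* ... node 0 works on a[0..N/2-1] while node 1 works on its half ... */
        node_recv(1, &a[N/2], (N/2) * sizeof(double));  /* results copied back */
    }

The point of the sketch is that the programmer, rather than the hardware, decides where each half of the array resides and when it moves, which is exactly the data distribution burden mentioned above.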

Hybrid Multiprocessors

Hybrid systems possess attributes from both shared memory and message passing systems. Such a system may be organised so that all the memory is divided into local sections for each processor, with the operating system or special routing hardware simulating a single shared global memory. Memories and processors are interconnected by a series of multistage switches that must allow, through some sequence of stages, every processor access to every memory. Switch structures need not be of regular design, see Figure 2.4, as long as a switch structure can claim complete processor to memory interconnection. From the programmer's viewpoint, programs for hybrid systems are coded like shared memory systems though the actual performance considerations resemble those of message passing systems (i.e. a shared memory abstraction with non-uniform access times); but in most cases the penalties for poor data distribution are considerably less. This deception is permissible for most applications; however, the programmer must be aware of the details of the hardware layout in order to produce efficient code. As the illusion of shared memory has to be maintained, the limiting factor on the number of processors in a system is the aggregate communication speed. This limitation, however, is not as extreme as with shared memory systems because, in the case of the BBN Butterfly, for instance, adding extra processors incorporates the addition of extra memory bandwidth [Crow85]. Other examples of machines of this type are the IBM RP3 [Pfis85], NYU Ultracomputer [Gott83], C.mmp [Wulf72], Cedar [Gajs84] and the Denelcor HEP [Smit81] whose special support for parallel programming is mentioned in the third section of this chapter.

Figure 2.4 - Switch-interconnected shared memory architecture.

Analysis of the MIMD approach to Parallelism

In general, it is easier to build yet harder to program private memory machines as opposed to shared memory machines of similar processing potential. This is because in shared memory machines, engineers do the hard work of designing and building memory arbitration and coherency protocols so that programmers can work with a single shared memory. In turn, this means that programmers do not have to be concerned with the precise location of programs and their data, in order to produce efficient or even working programs. Conversely, private memory machines can have greater total memory, CPU to memory bandwidth, and number of processors as they are freed from the task of maintaining a true shared memory abstraction. Similarly, private memory machines built on hierarchical structures such as trees are scalable so that arbitrarily large machines can be constructed. Hybrid multiprocessors offer a good compromise between the two extremes, but currently they have been developed largely as research machines, with only a few machines making inroads into the commercial marketplace.


In comparison to SIMD machines, MIMD multiprocessors are used for mainstream computing tasks as well as more specialised parallel applications. This results from the MIMD model being perceived as being a closer match to the SISD model than SIMD. Applications have been easily ported to MIMD machines from SISD giving MIMD machines a good general appeal as opposed to SIMD machines which are largely viewed as special purpose devices. To redress this balance somewhat, certain machines combine both SIMD and MIMD approaches (e.g. Cray Y-MP), retaining generality, and offering great computational power.

2.1.5 Exotic Architectures

While it may be possible to place every architecture into a category in Flynn's classification scheme it is more productive to split off a number of architectures and look at them separately. Although there are many architectures that could be mentioned at this point only four models are discussed here: VLIW, systolic arrays, the dataflow model, and graph reduction, all of which are radical departures from the von Neumann approach.

VLIW Architectures

Machines of this type are highly parallel architectures that offer an alternative to more conventional multi and vector processors. Whilst resembling ordinary multiprocessors, VLIWs (Very Long Instruction Word machines) utilise a tightly coupled, single flow of control mechanism that can be thought of as being similar to the operation of horizontal microcode. The pioneering VLIW architecture, the ELI-512, consisted of 16 32-bit RISC microprocessors with access to a combination of local and global memory. Programs specified very fine grained control over virtually every resource of the machine with each long word instruction containing operation fields to control all of the individual processors. Sequencing was performed by a single flow of control as found in SIMD architectures. However, each processor could perform a different operation from its neighbours, unlike vector instructions, with all processor communication being controlled by the instructions. As a practical point there was no central program store, rather each processor fetched the relevant portion of the instruction word from its own program store. Figure 2.5 shows the outline of a VLIW architecture.
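As a rough indication of the scale of these instructions (an inference from the figures quoted above rather than an exact machine description), if each of the 16 processors is controlled by a field of roughly the width of its 32-bit datapath then

    16 operation fields × 32 bits per field ≈ 512 bits per long instruction word

which is consistent with the machine's name.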

The intended method of programming a VLIW is by using a trace scheduling compiler. Trace scheduling is a process for predicting the control flow in programs so that very long instructions can be generated from sequential source code. If this process can be done successfully then effective VLIW machine code can be

constructed; if not (due to complex control flow in the code) then some potential parallelism may be lost. Only one company (Multiflow) has exploited VLIW hardware technology, but the novel approach of trace scheduling for instruction execution has had some influence on the design of optimising compilers.

Figure 2.5 - VLIW architecture.

Systolic Arrays

Due mainly to the advances made in VLSI technology it is now cost effective to build arrays of dedicated processors to perform specialist functions. One such type of processing array is the systolic array described by Kung and Leiserson [Kung78]. A systolic array is a regular set of interconnected cells each capable of performing some simple fixed operation upon data which flows through the array at regular beats. This kind of processing is similar to arithmetic pipelining but far more complex functions can be computed by systolic arrays than by simple arithmetic pipelines. Array configurations that have been suggested include the one dimensional linear array, the two dimensional square array, the triangular and hexagonal arrays, and binary trees. Applications for these arrays have included matrix addition, matrix multiplication, equation solving, and searching.

Other designs that have been proposed for processing arrays include some for reconfigurable arrays (e.g. CHiP [Snyd82]). Such an array consists of three components: processors, a switch lattice and an array controller. The ability to reconfigure an array greatly expands the range of functions it can compute. Arrays of processors can often compute functions in the minimum time-complexity (e.g. matrix multiplication in O(n^2) rather than O(n^3)) but in general they are perceived as being difficult to design as no widely recognised language is used to specify their layout. In addition, arrays of processors suffer from mapping difficulties similar to SIMD machines when mapping large problem sizes onto smaller fixed array sizes.

Dataflow Architectures

Dataflow was first proposed by Dennis [Denn79] with a view to providing a technology-independent computing model which could offer better programmability than the von Neumann model in many application areas. The principal idea behind dataflow systems is that only the availability of operands should influence the order in which program statements are executed. Conventional machines operate by taking instructions one at a time from a memory and executing them, a process known as control flow. With dataflow, instructions can be executed as soon as their operands become available. This is accomplished by compiling a dataflow program into a dataflow graph in which nodes represent instructions and arcs represent the program structure (data dependencies). During program execution data tokens flow along the arcs of the graph between nodes. Tokens are held at a node until all the tokens necessary for that statement to execute have arrived. When this happens the node is said to fire, consuming the first token from each of the input arcs and emitting data tokens along all of its output arcs. As there is no explicit ordering of the firing of statements, other than the data dependencies, statements can be executed as soon as their data tokens arrive, possibly in parallel.
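For instance, the statement a := (x + y) * (x - z) would compile into a small graph (an illustrative fragment rather than the output of any particular dataflow compiler):

    node 1:  +   consumes tokens carrying x and y;  result token sent to node 3
    node 2:  -   consumes tokens carrying x and z;  result token sent to node 3
    node 3:  *   consumes the tokens from nodes 1 and 2;  emits a token carrying a

When the tokens for x, y and z arrive, nodes 1 and 2 both have complete input sets and may fire in parallel; node 3 can fire only when both of its input tokens have arrived. No ordering other than these data dependencies is imposed.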

There are two working designs for dataflow architectures. The first is static dataflow, due to Dennis, where only one token can exist on an arc at a time, there being no method for distinguishing between tokens. Control tokens are used to send acknowledgements to guarantee proper timing when transferring data tokens from node to node. A drawback with static dataflow is that separate iterations of an iteration construct cannot be unfolded if the total number of iterations is not known at compile time, because the one-token-per-arc restriction means that only one iteration can be in progress at a time. In dynamic dataflow [Gurd85, Arvi80] data tokens are tagged, enabling multiple tokens to exist on an arc. This allows iterations to be unfolded, making this a maximally parallel model. Figure 2.6 shows architectures for both static and dynamic models in which data tokens circulate along the arrows.

Several dataflow machines have been built, including the Manchester Dataflow Machine [Gurd85], but they have run into interesting difficulties. As parallelism is exploited at the smallest level, that is individual instructions, it is possible for many computations to proceed in parallel given a suitably parallel

application. This can be bad if the machine's memory becomes filled with computations waiting for data tokens and no new tokens (to fire the waiting ones) can be accommodated. Hence, the problem is one of throttling the computation to control the parallelism, and this is now a major research area [Sarg87].

Figure 2.6 - Dataflow architectures (static and dynamic models).

Graph Reduction Architectures

Graph reduction [Wads71] can be used as an efficient method for implementing functional languages. Executing a program written in a functional language consists of evaluating a series of expressions that have a natural representation as a graph. Evaluation proceeds by means of a sequence of simple steps called reductions that are local transformations of the graph. Due to the non-interfering nature of the reductions, described in the Church-Rosser Theorem [Chur41], reductions may safely take place in a variety of orders. The evaluation terminates when there are no further reducible expressions. Figure 2.7 shows the evaluation of f 7 in the functional language Miranda, where f x = (x+2)*(x-5).
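Written out as a sequence of reduction steps, using the definition of f above:

    f 7
    => (7+2) * (7-5)      instantiating the body of f
    => 9 * (7-5)          the redexes (7+2) and (7-5) are independent and
    => 9 * 2              could be reduced in either order, or in parallel
    => 18

Each line is a local transformation of the expression graph, and the Church-Rosser property guarantees that the same result is reached whichever order is chosen.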

Graph reduction is an inherently parallel activity as at any moment a graph may contain a number of redexes (reducible expressions) and it is very natural to reduce them simultaneously. On a shared memory multiprocessor the process of graph reduction poses fewer synchronisation problems over a conventional approach. This follows from there being only one shared graph in the reduction model as opposed to shared program text and data structures in the conventional model. Graph reduction is also inherently distributed as a reduction is a local transformation of the graph and no shared bottlenecks, such as an environment,

need be consulted to perform a reduction. As a usable model, graph reduction introduces no new parallelism constructs and so may be easier to understand than the complex interdependencies found in current parallel programs. Moreover, at all stages of a reduction the computation is well defined, which cannot always be said of imperative style parallelism. (Additionally, the graph is in a form which is more suitable for the analysis of data dependencies than a conventional program.) However, graph reduction does have some drawbacks, similar to dataflow, in that parallelism is exploited at the level of single instructions, which can have the serious side-effect of high memory usage. Some graph reduction architectures are being built, such as the Rediflow machine [Kell84] and the Grip machine [Peyt90], though these are actually implemented on von Neumann machines and are not available commercially.

Figure 2.7 - Graph reduction of f 7.

2.1.6 Architecture Summary

Of the many types of parallel architecture the MIMD systems derived from the von Neumann model proliferate, except for the parasitic incursion of vector processors. Many SIMD processors have been constructed but almost by definition are regarded as specialised and deemed to be separate from mainstream computing. The argument of SIMD versus MIMD can be thought of as one which sets many small simple processors against fewer larger more complex processors. In the main, current programming languages have tended to favour the MIMD approach as it has been easier to divide a problem into tens of pieces that synchronise after many hundreds of instructions for MIMD machines rather than the hundreds or thousands of pieces necessary for SIMD machines that synchronise after every instruction.

Shared memory multiprocessors are vying with distributed memory machines to satisfy the demand for parallel processors and it seems that technology will probably have the final say over which will dominate. Plans have been made for

large shared memory machines (around 100 processors), and hybrid machines already exist that contain several hundred processors to compete with the large distributed memory machines. In contrast, advanced operating system techniques and increases in inter-node communication speeds are helping distributed memory machines become more like hybrid multiprocessors. This allows a hitherto unknown degree of portability between distributed memory machines as the interconnection layouts are submerged beneath the operating system. Moreover, with fast communication speeds future distributed memory machines will be able to offer very efficient support for shared address spaces.

Thus, the precise future of computer architectures is unclear but two points of note have emerged:

(i) Future high performance architectures will be multiprocessors so that they can take full advantage of parallel processing. Such machines may consist of a group of asymmetrical processors (perhaps to allow fast floating point or vector operations), or may in fact be symmetrical processors given that fast floating point microprocessors can now be manufactured (e.g. Intel i860).

(ii) Many future high performance architectures will support a shared address space either directly, or indirectly via the operating system or special routing hardware (i.e. non-uniform shared memory access time - NUMA).

Multiprocessors are a very attractive (cost effective) and natural method of obtaining high processing performance, and with peripheral benefits such as fault tolerance, seem too important to be ignored. In addition, physically shared memory can be used to integrate microprocessors and tightly coupled SIMD processors, such as vector processors, permitting a powerful combination of SIMD and MIMD parallel processing techniques. Furthermore, the shared memory abstraction is emerging as a more usable programming model than private memory because it avoids many of the problems in distributing applications over hundreds of memories, and some parallel programming languages require it (more information in chapters two and three). Of course, distributed memory machines could be used to simulate shared memory machines in order to support shared memory languages. However, this job is difficult to carry out efficiently for many real applications without some direct programmer intervention.


2.2 Operating System Issues and Tools

This section covers two areas of operating system influence on parallel programming: (i) resource management and parallel processing primitives, and (ii) tool support. The first area describes the role of the operating system as a resource manager in a parallel processing environment and outlines the facilities it should offer to application programmers and compiler writers. Software tools have been shown to be useful in the development of programs, all the more so when considering parallel software, and there is a wide variety of tools currently in use. Tools range from specification and design aids, to much needed monitoring and debugging facilities. Auto-parallelising compilers and program restructurers also fall into this category.

Before commencing this discussion on operating systems this is a reasonable juncture to introduce some terminology. Due to the wide range of research into parallelism many terms are used interchangeably and it can sometimes be confusing if terms are used loosely. The fundamental definition of interest here is that which is used to represent control flow.

Definition: A sequential program specifies the sequential execution of a list of statements, the execution of which gives rise to a thread of control that is sometimes called a process.

There are two ways to implement process concurrency: one uses only one processor, the other many.

Definition: In multiprogramming, many processes share the same processor and each receives a share of the time available [Dijk68] - this is termed pseudo concurrency.

Definition: In multiprocessing (multitasking), many processes are distributed over a number of processors with each process executing on its own processor [Jone80] - this is termed real concurrency.

Multiprogramming is used in uniprocessor systems to simulate concurrency by interleaving the execution of processes on a single processor. It is also used by some multiprocessor operating systems to perform load balancing when the number of processes able to be run exceeds the number of available processors, though this too is commonly called multiprocessing. From the definition

multiprocessing also takes place when the number of processes able to be run is less than or equal to the number of available processors.

When a parallel program is analysed in terms of its component processes no assumptions can be made about the rate at which each process executes, except that they notionally make progress toward completion. This is known as the finite progress assumption and, using this assumption, the correctness of a program is independent of the number of processors, but not the number of processes, that a program is executed on.

One further definition added at this point makes a distinction between parallelism in a single job and parallelism between a group of jobs. Multiprocessing would seem to cover both types of parallelism, but they are fundamentally different at the programming level and at the level of the operating system.

Definition: In parallel processing, the constituent processes of an application are distributed over a number of processors with each process actively executing on its own processor.

One way in which parallel processing differs from simple multiprocessing is that in parallel processing, one can talk about a linear speedup with respect to the number of processes being applied to a problem. Clearly this can only happen if all the processes of a job execute on their own processors, and do not have to share processors with other processes of the same or other jobs.

In general, a process is assumed to be an entity similar in functionality to a process found in operating systems that support multiprogramming, such as the Unix operating system. This assumption carries with it some implementation details about such processes which make a process something more than just a simple thread of control. A thread of control can be represented, minimally, by some code, a program counter, a stack and some register values, whereas the representation of a process is a superset of such a thread of control with additional operating system state such as page, process and file table entries. When a process context switch takes place, the data structures maintained by an operating system have to be updated, as well as the change over to the new thread of control. This has given rise to processes of this kind being called heavyweight, due to the relatively long time it takes to perform operations on them such as context switches, creation and deallocation.


2.2.1 Resource Management

Operating systems have played a dual role in the development of parallel processing. Historically, operating systems have provided the stimulus for parallel programming research, as often their job has been thought of in terms of a collection of concurrent activities. (Parallel languages designed for the construction of operating systems include: Concurrent Pascal [Brin75], Edison [Brin81], and Modula-2 [Wirt80].) More recently, however, operating systems are being written with application-level parallel processing in mind. Hence, system call facilities and tools are being provided to support efficient parallel processing and multiprocessing rather than the multiprogramming of old.

The range of jobs for which operating systems are expected to take responsibility has grown as computer architectures have developed. Responsibilities for virtual and real memory management, file structures, input/output peripherals, reconfiguration (in the case of partial failure or expansion), exception handling and not least process management have all fallen to the operating system. Two underlying principles, identified by Finkel [Fink86], have been followed during the evolution of operating systems.

The resource principle: an operating system is a set of algorithms that allocate resources to processes.

The beautification principle: an operating system is a set of algorithms that hide details of the hardware and provide a more abstract environment which is uniform across architectures and implementations.

Operating systems responsible for implementing parallel processing on multiprocessor machines have a more difficult job than those merely implementing multiprocessing. Conventional operating systems must follow the principles set down by Finkel making sure that dynamic problems such as deadlock and starvation do not arise (these terms are explained further in chapter three). Operating systems committed to parallel processing must also perform these duties, but at the same time try to ensure tangible benefits can be obtained from the use of intra-job parallelism. Clearly, conflicts may arise between the interests of a parallel job and those of the remainder of the system. However, there is no algorithmic set of rules to follow to obtain the optimum system throughput with optimum parallel program performance. Moreover, it seems unlikely any such rules will be formulated, except in special cases, due to the high complexity and the influence of unpredictable factors such as the execution times of programs.


Nevertheless, heuristic methods are used to resolve the situation by, in some cases, simply reducing the priority of parallel jobs when the system is busy (e.g. Encore Umax [Enco87]), or more forcefully by partitioning up the available resources so that parallel jobs execute in their own mini-environments (e.g. BBN Butterfly Chrysalis [Thom85]). Neither of these solutions seems entirely satisfactory as each one is biased too much in favour of one extreme. This may be precisely what is wanted when considering timesharing or real-time systems, but in general a less extreme solution would be better.

Multiprocessor Operating Systems

There are three classes of multiprocessor operating systems, with each having a completely different level of support for parallel processing: (i) separate supervisors, (ii) master-slave, (iii) symmetric.

Separate supervisors is the simplest scheme to implement as it is a natural outgrowth from sequential operating systems. Each processor in the system has its own private memory, so it runs its own copy of the operating system, maintains its own files and accesses its own input/output devices. In effect, this is a multicomputer organisation as the operating systems are loosely coupled with there being no direct support for parallel processing.

A master-slave operating arrangement is an attempt to off-load processing from a master processor to several subordinate processors. Operationally, the master processor assumes responsibility for running the operating system itself, performing tasks such as servicing and communicating with peripherals, while the slave processors run user programs. This scheme has the advantage that it supports some parallel programming, but it does have the limitation that all the operating system calls have to go to the master processor. This can have the effect of choking the parallelism as slave processors wait idly for the master processor to come round to servicing their requests. Investigations have been carried out to find the extent to which the master processor can bottleneck the system [Ensl77]. The conclusions were that the performance advantages gained from adding slave processors diminished very rapidly, so discrediting this method of multiprocessing.

In a symmetric operating system, similar to a symmetric processing system, each processor has equal status and can run either the operating system or user programs. There is only one shared copy of the operating system and similarly all peripheral resources are pooled allowing equal access. There are many variants of

symmetric operating systems, with the simplest being the floating master. In this subclass, only one process is allowed to enter the operating system kernel at a time, forcing all others to wait. This can be viewed as a master-slave arrangement with temporary mastery passing between processes. Other more complex variants allow concurrent accesses to operating system data structures, with special synchronisation code used to ensure consistency (e.g. Encore Umax and Sequent Dynix [Oste86]). These operating systems are run on tightly coupled shared memory multiprocessors.

In addition to the classes of operating system mentioned previously, distributed operating systems have also been used with multiprocessors. A distributed operating system is one that runs on many machines simultaneously but gives users the illusion of a single machine. Such systems are similar to pure multiprocessor operating systems but must use a form of message passing to exchange data between processes, as it is possible for a group of cooperating processes to be located on several heterogeneous and physically distributed machines. In the main, effective distributed processing is considered not to be communication intensive (i.e. it is loosely coupled), as opposed to parallel processing, which sometimes is. This stems from the underlying process models that are used to communicate between remote machines (e.g. client-server models). When the distribution of an application is necessitated because key resources are only available on selected machines, often, processes on a machine will have to wait while operations are performed remotely - meaning that there is no real parallelism as only one thread of control is active at a time. Furthermore, the distribution of applications can have significant communication and synchronisation overheads making it an unsuitable approach for many problems. Distributed operating systems have been an active research area in many universities yielding many rival systems including: Agora [Bisi88], Argus [Lisk82], Amoeba [Mull86], Eden [Alme85], Isis [Birm85], Mach [Acce86] and V [Cher85].

Parallelism Support

Multiprocessor operating systems support parallel programming by offering the hardware resources of a machine to programmers in a palatable form and by handling some of the more mundane parts of parallelism management. In a symmetric operating system running on a shared memory multiprocessor, one of the most useful abstractions to give to a programmer is of course shared memory. Thus, memory protection must be maintained between users but relaxed between processes cooperating on a shared job. Two examples of models of sharing are the


Umax share() call that allows a process to share memory with its children, and System V shared segments that allow a block of shared memory to be linked into the address space of an arbitrary process. Other interprocess communication mechanisms include sockets and pipes.

Fundamental process handling facilities are provided by operating systems for the creation and deletion of processes. In symmetric operating systems, facilities are also provided for multiprogramming and multiprocessing processes. These facilities are often optimised so that parallel programs can be load-balanced over the available processors. Figure 2.8 gives an indication of the kind of scheduling that is performed by an operating system by illustrating a typical process state diagram. The system calls that are necessary to support such an environment include: create, destroy, suspend, resume, change priority, block, wakeup, and dispatch.

Figure 2.8 - Process state diagram.
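As a concrete illustration of these facilities, the C sketch below uses the standard Unix fork and wait calls together with the System V shared segment calls mentioned above (error checking is omitted, and the precise calls available vary between the systems named in this section):

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdio.h>

    int main(void)
    {
        /* Create a shared segment and attach it to this address space. */
        int shmid = shmget(IPC_PRIVATE, sizeof(long), IPC_CREAT | 0600);
        long *counter = (long *) shmat(shmid, NULL, 0);
        *counter = 0;

        if (fork() == 0) {            /* child: inherits the attached segment */
            *counter = 42;            /* this write is visible to the parent  */
            _exit(0);
        }

        wait(NULL);                   /* parent: block until the child exits  */
        printf("child left %ld in shared memory\n", *counter);

        shmdt(counter);               /* detach and remove the segment        */
        shmctl(shmid, IPC_RMID, NULL);
        return 0;
    }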

To synchronise a group of cooperating processes synchronisation mechanisms such as System V semaphores have been developed. Additional mechanisms for the provision of exception handling are also common.
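A brief sketch of the System V semaphore interface just mentioned (again in C, with error handling omitted) shows how cooperating processes can protect a critical section:

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    /* Many systems require the caller to declare this union for semctl. */
    union semun { int val; struct semid_ds *buf; unsigned short *array; };

    int make_mutex(void)
    {
        int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
        union semun arg;
        arg.val = 1;
        semctl(semid, 0, SETVAL, arg);     /* initialise the semaphore to 1 */
        return semid;
    }

    void critical_example(int semid)
    {
        struct sembuf p = { 0, -1, 0 };    /* P (wait): decrement or block       */
        struct sembuf v = { 0, +1, 0 };    /* V (signal): increment, wake waiter */

        semop(semid, &p, 1);
        /* ... operate on data shared with the cooperating processes ... */
        semop(semid, &v, 1);
    }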

Another of the responsibilities of an operating system is its role in debugging and the gathering of statistics. Both of these activities can be simplified if an operating system provides interfaces whereby relevant system information can be easily obtained. If this is not the case, the onus of collecting information about a parallel program is left entirely on a user, which can lead to further complications in already complex programs. In addition, the ability to checkpoint a parallel program can give usability and fault tolerance benefits. This is because checkpointed programs can be restarted from the last checkpoint, obviating the need to reexecute any costly computations needed to reach that point. (Further information on debugging parallel programs can be found at the end of this section.)

Alternatives to Processes

Although processes have been used as the standard units of control flow they are unsuitable for many parallel programming applications. As far back as the early 1970's some operating systems contained lightweight threads of control, commonly called threads, to implement asynchronous input/output operations. A thread can be viewed as an activation of a procedure and differs from a process in two major ways: (i) many threads can execute in the same address space, and (ii) threads have much less associated state information. Moreover, lightweight threads are generally implemented on top of the operating system giving an additional level of software support to applications. (If this is the case then processes are still created and scheduled by the operating system with the lightweight threads executing in the address spaces of the processes.)

Using the facilities of a user-level threads package for parallel processing has some advantages over using similar facilities provided by an operating system, including:

• functionality,
• portability,
• efficiency,
• abstraction.

One of the most important reasons for using a user-level threads library is that different functionality can be provided by constructing superior interfaces to standard system calls. Furthermore, with parallelism expressed in higher level constructs a certain degree of freedom can be obtained from the underlying operating system, allowing portability across systems. As user-level constructs use only a minimum of system calls, and have less operating system state, programs are more efficient as less time is spent in the management of parallelism. For example, with the Brown Threads Package on an Encore Multimax typical times for user-level (thread) operations are often measured in microseconds, while the times for similar (process) operations performed via the operating system are measured in milliseconds. If all parallel programming were done at the kernel level, every system call would require a context switch to system mode as

well as the time for the operation itself. Thus, user-level threads provide a greater degree of abstraction than the base operating system allowing the construction of more understandable programs. However, this benefit is not without its drawbacks.

One of the foremost penalties of forsaking the operating system is the corresponding loss of security through memory protection. As threads execute in the same address space it is possible to get unintentional memory interference between them. This can take the form of threads accidentally corrupting user data structures or even other threads' data structures, such as their stacks. In addition, it is likely that no information about the internal structure of a threads program can be communicated to the operating system. Thus, a thread responsible for coordinating a group of worker threads would be suspended if the operating system suspended the process that was executing it - thereby forcing all the workers to wait until that process (and the thread) were restarted. Furthermore, if a thread running in a process together with several other threads page-faults, that process and all of the threads it is supporting will be suspended until the page is brought into memory. Solutions to some of these problems have been implemented in operating systems such as Umax, which allows gang scheduling so that either all the processes in a gang run or none of them do, and Mach, which allows user-level handling of page-faults. These are steps in the right direction but more comprehensive support for threads is needed.

There have been many instances of lightweight threads with properties accorded to them by their intended roles. Threads have been used inside operating system kernels to implement non-blocking operations, and at the user-level to provide alternatives to process-level concurrency. Threads execute in a shared address space, therefore requiring shared memory (generally physically shared), and seem a natural method for exploiting parallelism on shared memory multiprocessors. Operating systems that support lightweight threads include: Amoeba, Mach, Topaz [McJo87] and V. Portable lightweight threads libraries include: Brown Threads [Doep87], C Threads For Unix (CMU), ConcurrentC (Perdue) and Presto [Bers88].
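To make the comparison with processes concrete, the sketch below assumes a hypothetical user-level threads interface in C; thread_create and thread_join are invented names, and the packages listed above each offer something similar though their actual interfaces differ.

    /* Hypothetical user-level threads interface. */
    typedef struct thread *thread_t;
    extern thread_t thread_create(void (*fn)(void *), void *arg);  /* start a thread  */
    extern void     thread_join(thread_t t);                       /* wait for thread */

    #define NTHREADS 8

    static long partial[NTHREADS];      /* shared: one address space, no copying */

    static void worker(void *arg)
    {
        long id = (long) arg;
        partial[id] = id * id;          /* stand-in for a share of a larger job  */
    }

    void run_workers(void)
    {
        thread_t t[NTHREADS];
        long i;
        for (i = 0; i < NTHREADS; i++)
            t[i] = thread_create(worker, (void *) i);
        for (i = 0; i < NTHREADS; i++)
            thread_join(t[i]);          /* all partial results now available     */
    }

Because the threads run in one address space, partial[] is shared without any system calls, and creating or joining a thread is a library operation rather than a kernel one; this is the source of the microsecond versus millisecond costs quoted above.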

2.2.2 Tool Support

The idea of using tools to assist programmers in the construction of programs has been around for many years. Tools have been provided as aids to programmers to help them manage the complexity that is associated with developing correct and

efficient software. Tools come in many different forms, with each having a specific job - just like the tools in a conventional tool bag. Every tool has its part to play in contributing to the process of developing high quality software, hopefully, in the shortest possible time. Of the many different types of tools, ranging from editors to source code management systems, only a few are especially relevant to parallel programming, and these can be broadly classified as:

• auto-parallelising compilers and restructurers,
• design tools,
• monitoring and debugging tools.

Tools that allow programs to be distributed across several machines (e.g. Avalon [Detl88] and Arjuna [Shri89]) are not mentioned here as they have more to do with issues of physical distribution and reliability than parallel processing. Also, tools that facilitate parallel programming by providing programming constructs are not covered here, as they are mentioned in the first section on operating systems and are revisited later in the software section of this chapter.

Auto-parallelising Compilers and Restructurers

The aim of the tools discussed in this section is to produce code that runs efficiently in parallel at run time, starting from ordinary sequential code. Many different techniques fall under this heading, with vectorisation being the most well known. Almost all the work carried out on tools of this kind has been undertaken with conventional procedural languages such as C and Pascal, with by far the largest proportion on variants of the Fortran language. Fortran has been used for the majority of numerical software and therefore, almost by default, is currently the most frequently used language for parallel processing, resulting from the widespread use of auto-parallelising tools.

To initiate this discussion on compilation techniques a short overview of the process of compilation is presented from Almasi and Gottlieb [Alma89]. This overview is intended to explain some of the terminology and is referenced in chapter five in connection with compiler optimisations. After the overview several parallelisation methods are discussed starting with vectorisation.

Modern compilation can be envisaged as a two stage process, a front end that analyses programs and a back end that generates machine instructions so that

programs can be executed on a particular machine. More specifically, Aho and Ullman [Aho86] list the stages of compilation as:

1. Lexical analysis.
2. Syntax analysis.
3. Intermediate code generation.
4. Code optimisation.
5. Code generation.
6. Table management.
7. Error handling.

The first five stages proceed more or less in sequence, while the last two go on in parallel with them. The accepted border between the two parts of a compiler is the fourth stage, which can involve both parties.

In the first stage, a lexical analyser takes a program written in a high level language and breaks it down into tokens which represent atomic symbols such as constants, operators, separators and identifiers. These tokens are then passed to a syntax analyser which builds a parse tree to check the grammatical structure of the program, signalling errors if necessary. The parse tree consists of basic blocks, which are straight-line sections of code, with no jumps out except at the end and no jumps in except at the beginning. The flow of control among the basic blocks within a procedure is represented by a flow graph, with the calling relationships between procedures in a program shown via a call graph. In addition, special tables are constructed to hold information corresponding to the names used by the program, together with type and other related data. The process of generating all these tables and graphs is called semantic analysis, and when this is complete, intermediate code can be generated.

Intermediate code preserves the semantics of the (syntactically correct) original program but expresses them in a format that can be more easily analysed and translated into machine code for a variety of machines. Hence, it contains no references to hardware resources such as registers or specific memory locations.

Code optimisations are geared towards making programs run faster, or to use less memory, occasionally combining the two. Effective optimisation can be accomplished by using the knowledge acquired in the front end of the compiler to suggest possible valid code transformations, and the built-in knowledge of the back end of the compiler to assess which transformations are profitable. Knowledge from the front end comes in the form of an extensive set of

interprocedural (global) and intra-basic block dataflow and dependence analyses. Some of this analysis is straightforward and is described in the following steps (a small worked example is given after the list).

1) Divide the program into basic blocks.
2) Construct a control flow graph where nodes represent basic blocks and arcs represent possible paths of control.
3) Divide the control flow graph into structured subparts called intervals, made up of one or more basic blocks. Such an interval might correspond to a loop and expressing it in this form makes it more amenable to dependency analysis.
4) Within each basic block, categorise the occurrence of each variable as one of:
   (a) Variables whose use is confined to the basic block (e.g. temporary variables).
   (b) Those USED in the basic block but defined elsewhere.
   (c) Those DEFINED in the basic block and potentially used elsewhere.
5) For every USED variable in each block, find all possible definitions in other blocks. This is the process of finding the reach of each variable, sometimes called u-d (use-definition) chaining, and forms the core of global data dependency analysis.
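As a small worked illustration of steps 1) to 5), consider the following fragment (written in C purely for brevity):

    /* The array a[] and the bound n are defined by the caller (USED here). */
    long sum(long a[], long n)
    {
        long i, s, r;
        i = 0;                /* block B1: i DEFINED          */
        s = 0;                /*           s DEFINED          */
        while (i < n) {       /* block B2: i, n USED          */
            s = s + a[i];     /* block B3: s, a, i USED;      */
            i = i + 1;        /*           s, i DEFINED       */
        }
        r = s;                /* block B4: s USED; r DEFINED  */
        return r;
    }

The control flow graph has the arcs B1->B2, B2->B3, B3->B2 and B2->B4, and the interval {B2, B3} corresponds to the loop. u-d chaining would record, for example, that the use of s in B4 can be reached by the definitions of s in both B1 and B3.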

With this information the back end of the compiler can perform further analyses to implement specific parallelisations, the form of which is dependent on the target hardware. Other optimisations, common to sequential machines [Hwan85], are also performed such as: (i) common expression elimination, (ii) invariant expression movement, (iii) strength reduction, (iv) register optimisation, and (v) constant folding.

Finally, or perhaps during the optimisation stage, code is generated for the target architecture by the back end of the compiler. Code generation is a straightforward process given that all the hard work has been done in the semantic analysis and code optimisation stages.

Vectorisation

Vectorisation is a code optimisation process performed by a compiler to generate vector instructions for vector processors. The idea was introduced by Muraoka [Mura71] and was subsequently implemented by Kuck [Kuck72,Kuck74].


Essentially, the job of the front end of a vector compiler is to look for potential parallelism in a sequential program, while the back end must subsequently try to turn the promises of parallelism into reality by selecting and scheduling vector instructions. Effective vectorisation of complex programs can be a very difficult job, as not only must algorithmic transformations be applied to produce potentially vectorisable code, but also these transformations must produce code that precisely suits the target architecture, with respect to vector lengths and the timing of vector operations. Optimisations are usually performed in two phases [Hwan85].

Extended Optimisation
(i) Intrinsic function integration (e.g. SQRT, SINE).
(ii) Subprogram inlining.
(iii) Reductions of iteration numbers in nested DO loops.
(iv) Reordering of the execution sequence to reduce pipeline overhead.
(v) Temporal storage management.

Vector Extended Optimisation
(i) Full vectorisation.
(ii) Pipeline chaining.
(iii) Pipeline antichaining.
(iv) Vector register optimisation.
(v) Parallelisation (e.g. using multiple pipes simultaneously).

To vectorise a piece of code, the dependency graph generated by u-d chaining for that piece of code is first checked for cycles. In practice, u-d chaining can lead to conservative estimates of dependency which can prohibit potential parallelism. Other tests which are faster and more accurate have been described by Kuck [Kuck84]. Four types of dependency may exist between statements S_v and S_w (w > v), all of which are illustrated in the code fragment given after the list:

(a) Flow dependence: the value of a variable computed by Sv is read in Sw.
(b) Control dependence: Sv and Sw are in separate branches of a conditional Su, with u < v, w (e.g. if Su then Sv else Sw).
(c) Antidependence: a value of a variable read by Sv is recomputed in Sw.
(d) Output dependence: both Sv and Sw compute values for the same variable.
These four kinds of dependence are illustrated by the fragment below.
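The following fragment, written purely as an invented illustration, contains one pair of statements for each kind of dependence; the function and variable names are not drawn from the thesis.

    /* Each pair (Sv, Sw) below exhibits one of the four dependency types. */
    void dependence_kinds(int b, int c, int g, int j, int k)
    {
        int a, d, e, f = 1, h, l;

        a = b + c;            /* Sv                                               */
        d = a * 2;            /* Sw: flow dependence - reads the a computed by Sv */

        e = f + 1;            /* Sv reads f                                       */
        f = g * 2;            /* Sw: antidependence - recomputes the f read by Sv */

        h = d + 1;            /* Sv writes h                                      */
        h = j * 2;            /* Sw: output dependence - both statements write h  */

        if (k > 0)            /* Su                                               */
            l = e;            /* Sv: control dependent on the branch taken at Su  */
        else
            l = h;            /* Sw: likewise control dependent on Su             */

        (void)f; (void)l;     /* silence unused-value warnings                    */
    }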

Direction vectors can be computed, holding the sense of the dependency between statements, to be used for cycle checking. If no cycles are found, the statements of a vectorisable loop are then inspected to see if they can be mapped into vector instructions of the target architecture. If cycles exist in the code, however, further analysis can be done to see if they can be broken down, or classified as a known type of reduction or recurrence operation that can be executed with a vector instruction or replaced by a call. For example, dependencies of forms (c) and (d) can often be removed by recoding, as the fragment below shows.
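As a hedged illustration (the loop body and names are invented), renaming a reused temporary removes an antidependence and an output dependence of exactly this kind, leaving the two halves of the loop body free to be evaluated in parallel.

    /* Before: the second assignment to t overwrites a value that the
       preceding statement has just read (antidependence), and the two
       writes to t form an output dependence.                              */
    void before(const double a[], double b[], const double c[], double d[], int n)
    {
        double t;
        for (int i = 0; i < n; i++) {
            t    = a[i] + 1.0;     /* write t   */
            b[i] = t;              /* read t    */
            t    = c[i] * 2.0;     /* rewrite t */
            d[i] = t;
        }
    }

    /* After: giving each temporary its own name removes dependencies of
       forms (c) and (d); only true flow dependencies remain.              */
    void after(const double a[], double b[], const double c[], double d[], int n)
    {
        for (int i = 0; i < n; i++) {
            double t1 = a[i] + 1.0;
            double t2 = c[i] * 2.0;
            b[i] = t1;
            d[i] = t2;
        }
    }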

In order for a Fortran compiler to automatically vectorise an inner DO loop, the code contained by the loop must meet certain criteria [Myer86]:

• it must reference at least one array,
• it must use either arithmetic, relational, or logical statements,
• it must not contain any GO TO instructions, IF tests, subroutine calls, or input/output statements,
• it must not contain a multiply dimensioned array; only one index may vary at a time within the loop,
• it must not contain any irreducible data dependencies within the loop.

If the code meets all these conditions then it can be vectorised, but in many cases the code will not meet one or two criteria. This situation can be improved by giving the programmer some written guidelines on how to write code that avoids unnecessary dependencies and can be vectorised more efficiently. Further to this, almost all hardware vendors have their own parallelising compilers that accept directives from a programmer to indicate what should be vectorised. Some of the more advanced compilers, such as Alliant's [Alli86], have a feedback mechanism whereby a programmer is told where there are data or control dependencies prohibiting vectorisation - the idea being that these can be removed later by recoding. In addition, compilers often support an ad-hoc extended language with extra constructs for explicit parallel programming. This basically transfers all the responsibility for parallel programming to the user, making the compiler's job much simpler. In contrast, the most recent version of the Fortran language, the 8X standard [Pau182], contains extensions so that operations can be applied directly to arrays and vectors. This is a kind of halfway-house between auto-parallelisation and explicit parallel programming, making vectorisation more explicit and hopefully a more efficient process, but no full implementations of a Fortran 8X compiler have yet been released. (Other vector programming languages have been proposed and are discussed in the final section of this chapter.)

As an example of vectorisation only the first of the following two code fragments can be easily vectorised.


      DO 10 I = 1, N
10    A(I) = B(I) + C(I)

After vectorisation the loop above is A(1:N) = B(1:N) + C(1:N), while the next fragment has a flow dependence attached to the vector A,

      A(1) = 1
      DO 10 I = 2, N
10    A(I) = B(I) + A(I-1)

Auto-parallelisation and Restructuring

Much of the analysis that is performed by vectorising compilers can be applied to generate code for MIMD machines. Often compilers offer both vectorisation and concurrentisation facilities, as some machines such as the Cray Y-MP and the Alliant FX/80 combine both MIMD and vector processing capabilities. Similar restrictions to those described for vectorisation apply on the format of code for it to be concurrentisible. However, there is a major difference between the two methods of parallel execution, which is that vector operations function at instruction level parallelism, whereas MIMD machines must process much larger chunks of work in order to overcome the set up times for their threads of control. Once again, the major source of work in programs is loops, which have to be partitioned so that some loop iterations are executed by one processor, some more by the next, and so on. This process is called loop spreading and is returned to in the fourth chapter.
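The sketch below assumes POSIX threads purely for illustration (the interface postdates the systems discussed here) and shows the simplest block form of loop spreading: each thread executes one contiguous chunk of the iterations of a dependence-free loop.

    #include <pthread.h>
    #include <stdio.h>

    #define N        1000
    #define NTHREADS 4

    static double a[N], b[N], c[N];

    /* each worker executes one contiguous block of the loop iterations */
    static void *worker(void *arg)
    {
        long id    = (long)arg;
        long chunk = (N + NTHREADS - 1) / NTHREADS;
        long lo    = id * chunk;
        long hi    = (lo + chunk < N) ? lo + chunk : N;

        for (long i = lo; i < hi; i++)
            a[i] = b[i] + c[i];      /* no loop-carried dependence, so safe */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];

        for (long i = 0; i < N; i++) { b[i] = i; c[i] = 1.0; }

        for (long id = 0; id < NTHREADS; id++)
            pthread_create(&t[id], NULL, worker, (void *)id);
        for (long id = 0; id < NTHREADS; id++)
            pthread_join(t[id], NULL);

        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }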

Another major source of parallelism that can be exploited by MIMD machines is interprocedural parallelism. Not surprisingly, the task of dependency analysis is more complex than before as procedures can modify non-local data. Moreover, the parameter interface is not always well defined as difficulties arise when dealing with pointers. Nevertheless, some research has been done in this area and working tools such as the Parafrase-2 restructurer have been devised [Poly89].

Parafrase-2 is a source to source program restructurer that takes sequential programs written in high level languages (currently Fortran, C and Pascal) and transforms them, over a series of stages, into high level code which can be parallelised easily by an auto-parallelising compiler. The difference between this kind of approach and simple auto-parallelisation is that it is more portable, in the sense that it supports multiple languages and, theoretically, can be used in conjunction with almost any MIMD/SIMD machine's auto-parallelising compiler.


The restructuring works by first translating the input code into an intermediate representation, then optimising the code over a number of stages towards a given target architecture, and finally translating the massaged code back into the input language. Data dependency analysis is performed by variants of Banerjee's gcd and bounds tests [Bane881 with adaptations to handle symbolic terms. Another major analysis section of the tool is the Static Program Analyser (SPA). The role of the SPA is to provide estimates of a program's execution time that can be (i) communicated to the user; (ii) used to make a prediction of the maximum theoretical speedup; and most importantly (iii) used to assist in subsequent program transformations. Several methods are available for making timing estimates.

In the simplest model, it is assumed that there are unlimited processors available for use and there are no delays for synchronisation and memory accesses. This gives the theoretical shortest time for execution and can be used as the basis for comparison with other models. A more realistic model assumes that memory accesses are the overriding factor in the execution of a program. In this model, the types and the number of memory accesses are collated, leading to fair approximations of actual execution times [Gall881. Alternatively, another model assumes that all memory references take unit time and it is the operations that can take variable amounts of time. This approach has limited use, but if used in conjunction with actual timings, can give some indication of the effects of memory contention.

To provide actual estimates of a program's run time, the longest path (in the case of multiple paths) through a program is analysed; a variant on this method uses an average based on many different routes through the program. Timings can then be used together with the code paralleliser to check if a parallelisation leads to a quicker execution of the program, or if the overheads in the parallelisation outweigh the gains.

Interprocedural analysis is the other major area of dependency analysis in Parafrase-2. The aim here is to parallelise loops which contain procedure calls and would otherwise be passed over in parallelisation. Procedures, by definition, try to hide information and are a source of difficulties for dependency analysis. The three main objectives of this analysis are:

• Reference Information: how and when objects are referenced, including the specific part of the object for structured objects like arrays.


• Aliasing Information: when two or more apparently different references to objects actually refer to the same object. This can happen directly if two names refer to the same object or, more subtly, when two names refer to objects that overlap in storage. Aliasing in languages like C, which allow pointers to objects, is worse than in Fortran, though some work has been done in this area [Weih80].

• Constant Propagation: propagating constants across procedure boundaries to lead to more accurate dependency analysis within procedures.

There are two ways of performing the dependency analysis for procedures. The first is to insert an entire procedure at the point of call, expanding it so that it is completely inline. This gives very accurate dependency information but has several serious flaws. For instance, it is time and space inefficient (both potentially increasing exponentially [Sche77]) and it fails for recursive calls. Another, more effective, method is to analyse each point of call by trapping all of the information flow in and out of the procedure.

In summary, Parafrase-2 is a source restructurer that can automatically perform transformations enabling vectorisation and more general auto-parallelisation. Most research has been done with the Fortran language because of its dominant position as a numerical programming language and because its semantics do not contain troublesome entities such as pointers. Similar work is being carried out with languages like C, which makes considerable use of pointers, as many of the underlying analysis techniques employed are still applicable. Related research on program restructuring has been carried out by the PSC [Alle84], KAP [Davi86], VAST [Brod81] and PTRAN [Alle88] projects.

Other Research

In this section three other approaches, to add to the two more prominent techniques of automatic parallelisation, are briefly discussed. These are: (i) Tree Height Reduction, (ii) Trace Scheduling, (iii) APL-parallelism.

Tree height reduction works by making use of the associative, commutative and distributive properties of algebraic expressions. The process involves splitting up complex expressions into functionally independent parts for parallel evaluation by multiple processors. A comprehensive algorithm has been developed by Baer and Bovet [Baer68] which uses only associativity and commutativity, though tree height reduction is not a widely used technique due to the small amounts of parallelism it yields in comparison to other methods.
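A small invented example makes the idea concrete: re-associating a four-operand sum reduces the height of its expression tree from three to two, exposing two additions that may proceed in parallel.

    /* left: three dependent additions (tree height 3);
       right: t1 and t2 are independent and may be evaluated in parallel
       (tree height 2).                                                    */
    double sum_serial(double a, double b, double c, double d)
    {
        return ((a + b) + c) + d;
    }

    double sum_reassociated(double a, double b, double c, double d)
    {
        double t1 = a + b;      /* may be computed on one processor        */
        double t2 = c + d;      /* ... while this is computed on another   */
        return t1 + t2;
    }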

Ideally, it would be convenient if there was sufficient parallelism inside each basic block of a program to make it profitable to execute such instructions in parallel. However, block sizes tend to be small, implying that the speedups that can be obtained are also small, around three at maximum for Fortran programs. The Bulldog compiler [Elli85] attempts to increase this amount of parallelism by discerning the control paths through a program (traces) so that several basic blocks can be executed in parallel, a technique called trace scheduling. This has the advantage over vectorisation in that only standard inter-basic block dependencies have to be investigated, as opposed to complex loop dependencies. In general, the compiler can only guess with high probability which way a conditional branch will go, so it has to generate a trace for each eventuality. This can lead to many traces being generated for a group of basic blocks, but this turns out to be an acceptable method of trading memory for performance. Further parallelism is obtained by unrolling inner loop bodies to give a form of vector parallelism. Trace scheduling has been around for many years and is the intended method for compiling code for the VLIW machines mentioned earlier. However, it does not seem to have caught on and is not as popular as vectorisation or auto-parallelisation.

The high level language APL has also been the subject of a research effort into automatic parallelisation. One of its inherent advantages over other languages is its rich semantics which allow operations to be performed on entire vectors and arrays. Furthermore, it contains no loop constructs or conditional branches. This means that large blocks of parallelism are implied in APL programs and can easily be exploited to give parallel execution without intricate dependency analysis. Unfortunately, almost all APL systems are based on interpreters, as opposed to using compilers which are more efficient. The reason for this is APL's flexibility, which allows objects to change type and size during a program's execution, and APL also supports some functions which are very difficult to compile. Nevertheless, an experimental compiler called the E-compiler [Chin86] has been produced which generates vector instructions from APL source for the IBM 3090 VF. A recent study has been done to compare parallel execution times for vectorised Fortran against times for vectorised APL [Chin88]. The somewhat pleasing conclusion of the study was that APL derived vector code did execute as fast or faster than the Fortran derived vector code. APL is not a widely used language, nor is it ever likely to be, but evidence now points to the conclusion that

working with high-level operations and compound objects such as arrays provides a good starting point for the exploitation of parallelism.

Design Tools

To write a good parallel program one must start with a good design. If it has been decided that auto-parallelisation is not suitable for an application, possibly because no such tool is available for the implementation language, a parallel program must be hand crafted to do the job. This task can be made much easier by the provision of software tools to assist in the specification of the parallelism. Design tools give programmers the flexibility to come up with clear designs, by abstracting away some of the fine detail regarding the implementation of the parallelism in favour of more high level approaches. This is also true of newer high level parallel programming languages. However, using a tool allows a programmer to continue using current languages by retrofitting additional functionality. Design tools are also useful when coming to grips with programming multiprocessors that use a revolutionary programming model, such as in the case of the Poker Environment [Snyd84] for the CHiP systolic array.

One might argue that interacting with an auto-parallelising compiler is in some sense designing a parallel program - but this process is simple compliance to the compiler's model of parallelism, which in all probability is nothing more than vectorisation. For example, Parafrase-2 can display a dependency graph side by side with its corresponding piece of code. This facility allows a programmer to adjust the code until extraneous dependencies are removed from it, possibly enabling vectorisation. However, this interaction would not be necessary at all, if the compiler could perform the transformations on the code by itself.

To specify more elaborate parallel systems with arbitrary structure, more flexible methods of interaction are needed for programmers. An example of such a tool is the Schedule/Trace package [Dong86]. The package allows a programmer to describe a parallel program graphically and it will generate the control code necessary to implement the parallelism. Naturally, there are some restrictions on the format of the parallel programs as not all types of interprocess interactions are supported.

The other area of influence covered by design tools is program simulation. Parafrase-2 has a facility for predicting the time taken to execute a program, but this estimate can only be given after the code has been written - albeit that not all of the program has to be written, only the segment of interest. A more specialised tool for designing programs is Transim, which is used for specifying Occam programs for execution on transputers. This tool uses a fixed language and hardware but does accept abstract descriptions of programs and generates fairly accurate performance predictions. Probably the major drawback with this tool, though, is that users must supply appropriate data to govern some of the timings. Naturally, if these data values are not representative of actual data, nonsensical timing predictions will be returned.

Monitoring and Debugging Tools

The tools useful for monitoring programs can be classified into two groups: debuggers and profilers. Debugging is often described as the process of locating, analysing and correcting faults. More specifically, debuggers assist in the process of examining what a program does, making them helpful in locating algorithmic faults that cause programs to produce incorrect results. Alongside a debugger, the role of a profiler is to help understand how programs execute, so that their operational behaviour can be optimised to give maximal efficiency. In actuality, this is not a disjoint classification, as often algorithmic and operational information can be gained from the same tool.

Parallel Profilers and Visualisers

As one of the primary uses of multiprocessors is for the execution of large computationally intensive numerical applications, it seems essential to carry out performance tuning through the use of software tools. A large number of monitoring tools and techniques exist and a comprehensive survey has been compiled by McDowell & Helmbold [McDo89].

One of the oldest program monitoring tools is the profiler. Its function is to indicate to a programmer where a program has spent most of its execution time, so enabling subsequent optimisation. In the case of the Unix profilers prof and gprof, data is collected during a program's execution and is displayed in the form of a list of procedure names, together with the percentage and absolute execution times spent in each one. This information can be augmented by adding the procedure call graph and calling frequency. Procedure data is collected by procedure call counting, and timing information is collected by sampling the program counter periodically and applying some statistical properties of the program to produce figures for the entire run. There are a few problems with this kind of profiling, as sometimes, due to statistical errors and certain program behaviour such as recursion, the profiler will return wildly inaccurate times. Sequential profiling of this kind can be extended to parallel programs so that each process collects its own information.

However, in parallel programs, other factors that do not arise in sequential programs, such as interprocess communication, are of vital importance and have to be measured. Events such as interprocess communications are indicative of the data flow within a program, which is something that is not covered at all by conventional control flow profilers. Some of the most important information about a program that uses message passing is the data regarding the sending and receiving of messages, where such messages are used to control the execution of the program (the details of this are discussed in the last section of this chapter). As an additional general problem, profiling user level threads is difficult if profiling information is only collected at the process level.

Problems Encountered in Monitoring Parallel Programs

The most obvious difficulty in monitoring a parallel program is the potential for it to generate huge amounts of monitoring data. This can happen because parallel programs, by their nature, tend to have more complex data and control flow than sequential programs, meaning that more information has to be collected to capture the precise state of a computation. Moreover, the adage that correct results do not imply a correct program is certainly true, as the total number of program behaviours can grow exponentially with the number of processes a parallel program uses. In addition, there are several other problems more specific to parallel processing:

Non-determinism: this happens when the outcome of a particular operation cannot be predicted in advance, because it is dependent upon time factors prevalent at the instant of its execution. More often than not, faulty behaviour of this kind may be difficult to repeat or may in fact be non-repeatable.

Probe effect: exact information cannot be gathered about a program without disturbing its underlying computations. This usually comes to light in time critical code, which can have its behaviour altered due to the overhead of executing the monitoring code. For instance, tracing a parallel program can sometimes remove the occurrence of errors by forcing the serial execution of program code. Alternatively, to reduce this influence, an external process can be used to snoop on a program. This mechanism can work quite well on shared memory systems [Aral88] but is less useful with distributed memory systems. Time critical code can be found in real-time systems, or in more commonplace locations such as run time scheduling.

Clock synchronisation: in the absence of a single global clock, it is difficult to reason about the state of a program at any given instant. Spatially distributed events happen at their own pace, making it difficult to take an accurate snapshot of how a program is progressing. This means that it is not possible to single-step a program in the traditional sense so that one event occurs at a time.

Visualisation Tools

To combat the problems of trying to unravel parallel program profiling data, potentially approaching megabytes in size, graphical interfaces to profilers have been developed. These allow events in a program to be filtered and displayed in a palatable fashion so that crucial events can be identified. Visualisation can either be performed during the execution of a program or at some later date. Naturally, if real-time visualisation is used, some interference (probe effect) will perturb the working of a program. This is especially true of distributed memory systems, as profiling data has to be exchanged between nodes in competition with real data, before it becomes out of date. Of course, even postprocessing of the data has some impact on a monitored program's performance, as the data still has to be collected.

Bernstein et al describe two visualisation tools [Bern89] that have been designed to work with the Preface Fortran preprocessors [Bern88]. These tools are intended for use on shared memory multiprocessors and cover the three aspects of program visualisation.

(1) Collection of data at time of call, possibly by interacting with the operating system or a hardware monitor.
(2) Preprocessor support so that tracking routines can be inserted for every instance of a parallelism construct.
(3) Viewing of data, in real-time or after program execution has finished.

Timing data generated from a profiled run can be attributed to one of the following sources:

(1) Application work - user program, system time and idle time.
(2) Waiting (e.g. at barriers).
(3) Parallelisation/synchronisation routines.


To display this information succinctly, timings can be converted into a two dimensional bar for each process of a program, with each bar subdivided vertically into periods of parallel and serial activity. Bar sections are horizontally divided into time components for: (i) parallel activity, (ii) serial activity, (iii) parallelisation overhead, (iv) system time, (v) idle time. An example of the type of output generated by the Preface-2 display tool for a parallel program consisting of three processes is shown in Figure 2.9.

Figure 2.9 - Visualisation of a parallel program (one bar per process: MASTER, SLAVE 1 and SLAVE 2, each divided into PARALLEL, SERIAL, OVERHEAD, SYSTEM and IDLE components).

Other visualisation work has been carried out by Dongarra et al [Brew88] in the Memory Access Package (MAP). In this model, references to array elements are displayed instead of control flow information. For instance, every time an array element is read or written, a corresponding change is made to a screen display. This tool is meant to be used as a post processor, necessarily, because of its high cost of collecting the information. Even so, information from the tool can be used to check the memory accessing behaviour of an algorithm so that it can be compared to access patterns of rival algorithms.

Summary

Program monitoring tools come in many different forms, with one of the first [Russ69] being an interactive system which presented a graphical display of potential parallelism in Fortran programs together with any detected bottlenecks. The analysis of early Fortran programs led to the discovery that most programs spend most of their time in a few loops [Knut71], and work was also done by Kuck [Kuck72, Kuck74] which showed the speed up of programs and the utilisation of processors for a number of standard problems.

As the motivation behind parallel programming is greater program performance, the areas of performance evaluation and improvement are being integrated into debugging. A parallel program that does not run as quickly as expected can sometimes have a performance fault. The methods for examining the execution of a program, and rectifying the problem, are in essence the same as those used in tracking algorithmic errors. Information such as memory access patterns and thread waiting times are common to both disciplines, though performance monitoring also entails collecting some more specialised data such as: message passing statistics, message send/receive delay times, excessive synchronisation waiting, memory channel utilisation and system snapshots.

Parallel Debuggers

In recent years, many programmers have obtained practical parallel programming experience through the increasing availability of multiprocessors. Academic and commercial surveys have identified that a large proportion of the time spent in the development of software is spent during the testing and debugging phases, with most programmers agreeing that a good debugger is an indispensable tool when writing complex programs. Parallel programming has traditionally been thought of as being a more demanding discipline than that of sequential programming because of its very nature, i.e. many operations happening at the same time. However, the main source of errors in parallel programs comes not so much from the operation of the threads of control themselves, but from the communication framework between them.

Problems and Errors Found in Parallel Programming

For shared memory machines, the most common fault involves errors in accessing shared variables. Either a value is shared when it should be private, or private when it should be shared. This situation can arise through improper (unintentional) data sharing, either (i) temporally or (ii) locationally. In a temporal error, because of faulty synchronisation, data is shared incorrectly between threads of control when its access should be controlled. For example, if several threads that update a shared sum are permitted to access the sum at the same time, it is likely that the value of the sum will become inconsistent, with updates being lost. In a location error, data is placed in a region that is not suitable for its use, in which it is made unintentionally shared or private. This can result from an oversight of the compiler or from programming errors due to a lack of understanding of the shared memory mechanism. For example, if an operating system allowed memory to be shared at the page level, errors would occur if intended private variables were allocated on the same memory page as data that was shared, or if, subsequently, the page was shared.
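The temporal form of the error is easily reproduced. The sketch below, which assumes POSIX threads purely as an illustration, has a group of worker threads updating a shared sum and element counter; without the lock the two updates can interleave and be lost, which is exactly the inconsistency described above.

    #include <pthread.h>
    #include <stdio.h>

    #define N        100000
    #define NTHREADS 4

    static double vec[N];
    static double sum;
    static long   count;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        long   id = (long)arg, chunk = N / NTHREADS;
        double local = 0.0;

        for (long i = id * chunk; i < (id + 1) * chunk; i++)
            local += vec[i];

        /* the critical region: without the lock, concurrent updates of
           sum and count can interleave and updates can be lost           */
        pthread_mutex_lock(&lock);
        sum   += local;
        count += chunk;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];

        for (long i = 0; i < N; i++)
            vec[i] = 1.0;
        for (long id = 0; id < NTHREADS; id++)
            pthread_create(&t[id], NULL, worker, (void *)id);
        for (long id = 0; id < NTHREADS; id++)
            pthread_join(t[id], NULL);

        printf("count = %ld, sum = %f\n", count, sum);
        return 0;
    }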

Distributed memory machines do not have the same type of sharing faults found in shared memory machines, but instead errors can arise in the flow of messages between nodes. One of the most common of these flow-errors is mismatched messages, where a thread is expecting one type of message but receives another. This can cause the node to get out of step with its partners, leading to a partial or complete failure of the program.

Some parallel programs contain unwanted non-determinism in the form of race conditions. Such a condition can arise when the correctness of executing a statement is dependent upon the prevailing timing conditions. These situations, generally, occur in shared memory systems, because of their need to specify synchronisation to control access to shared data. Nevertheless, message passing systems can also contain race conditions resulting from poorly designed inter-node communication protocols.

Apart from the faults which parallel programming can introduce, there are also some difficulties in the mechanics of debugging. One such difficulty is that an error can occur in one part of a program and reveal its symptoms in a completely different section of the code. This problem can occur in serial programs, but in parallel programs it can be much worse, as different code sections can be executed by separate processors. The programmer is then faced with the problem of tracking backwards through the execution of a program, which may have behaviour that is hard to reproduce or is non-deterministic. Moreover, this cannot always be done, as the original error can be quickly obscured by the actions of other threads before it can be investigated by a debugger.

In problems that have faulty timing or synchronisation, the number of processors that a parallel program is executed on can be a factor in its correct functioning. Most parallel programs start life running sequentially or with limited parallelism. Once some confidence has been achieved, large amounts of parallelism are used, where possible, but this can lead to the uncovering of latent faults. This problem originates from not being able to thoroughly test all routes and combinations in a program and can come as a nasty surprise for a supposedly correct program.

As might have been gathered, not all of today's parallel software started life as parallel code, as many existing dusty deck serial programs have been parallelised.


Moreover, almost all the numerical algorithms that are used in parallel programs are simple modifications of existing (serial) algorithms and are not customised parallel algorithms as such. Therefore, although it may be quicker, it has been observed that it is generally harder to hand parallelise and debug parallel programs that originated from serial programs, than to write new parallel code from scratch. The leading reason for this is that serial programs were largely designed with no thought to parallelism, with the result that tangled control flow and the use of certain data structures often seem to conspire to force unnecessary serialisation. Furthermore, when serial code is parallelised, programmers often have to contend with bad programming practice, such as poor modularity of data, coupled with a lack of understanding about what the code actually does. Fortunately, these problems reflect the transition in writing software, from serial to parallel code, and may become less severe as newer code gradually replaces the old.

Methods of Debugging

There are a number of techniques that have been used for debugging parallel programs. These methods include:

Traditional debuggers: sometimes called breakpoint debuggers, these tools operate by permitting a user to stop a program during its execution, examine the state, then continue, or reexecute from the beginning in order to stop at an earlier point in the execution. This method of working is called cyclic debugging as it involves examining repeatable states and state transitions. Tools of this type have been extended to cover multiple processes simultaneously (e.g. cdb and pdbx) but fall foul of the fact that parallel programs often have non-repeatable behaviour.

Event-based debuggers: these tools view the execution of a program as a sequence of events and are generally concerned with analysing event histories (in a similar way to event-based monitoring). Deterministic replay of non-deterministic programs can be obtained if events can be recorded simultaneously. Event data can be used in several different ways: (i) browsing, (ii) replay, (iii) simulation. To browse the information, filters are needed to select events of note. Sometimes all of the events can be examined by producing what amounts to a movie of the execution of a program. When a program is replayed, events are used to control the reexecution of the program, allowing the use of traditional debuggers without changing the program's behaviour. Following on from this, events can be used to simulate the environment of a program, enabling further debugging to take place without reexecution of the program.

As an alternative to simply recording events, the occurrence of events can be used to control the execution of a program. This can be achieved by defining sequences of events with corresponding actions, in a similar style to path expressions [Bate88]. Thus, when an event occurs some action can be taken, such as signalling the event; or perhaps, if the event was signalled as a result of an oversight in its specification, the specification could be altered and the execution of the program continued.

Static Analysis: tools of this kind perform similar checks to those performed in the dependency analysis phases of auto-parallelising compilers. Checks are made for structural faults in programs, arising from synchronisation and data-usage errors. Examples of such errors are deadlock and the reading of uninitialised variables. Note that such analysis does not prove that a program works, as it does not examine a program's function. However, analysis of this kind can be useful in locating non-obviously faulty code, and in circumstances where other debuggers cannot be used because of the probe effect. Other research is being carried out into formal methods, whereby parallel programs are translated into a formal model of concurrency, such as Petri-nets, to permit more comprehensive program analysis to take place [Hall90].

Operating System Support for Parallel Debuggers

It may be all very well to reason about the functionality of a debugger, but many of its capabilities are derived from the facilities offered by its host operating system. A sequential debugger is expected to provide the following operations, either implemented in hardware or by the operating system [McDo89]:

• to read or write a register or memory location,
• to set and trap breakpoints,
• to trap program exceptions,
• to single step a program,
• to trap memory accesses.

A parallel debugger must supply all of these facilities for each process that it supports and furthermore provide:


• the ability to trap on any interprocess communication,
• the ability to modify/insert/delete such messages,
• the ability to control environment factors such as timers.

Hence, there are lessons to be learnt for operating system designers, to provide mechanisms to enable the collection and dissemination of both program and system wide statistics. Failure to do so creates an unnecessary barrier between the execution of a program and the ability of a programmer to monitor it. This may be deemed acceptable when working with high levels of program abstraction, but it is in effect a false reality, as no low level information is available to effectively support the monitoring and debugging of such abstractions.

Summary of Monitoring Tools

As tools become more elaborate and offer more functionality, there is a tendency to combine the role of several tools under the banner of a single notional tool. By starting at the design stage, tool designers can simplify their job if there is some common framework into which tools can be integrated to exchange information about the characteristics and implementation details of programs. Moreover, the jobs of monitoring and debugging a program are becoming intrinsically linked by sharing common data and techniques, with graphical front-ends permitting high levels of user interaction. Some tools allow programs to be monitored in terms of complex objects and operations [Shen90, LeBl90] rather than the low level machine approach common to old-style debugging. Admittedly, the performance tuning of a program can involve the examination of some rather esoteric data, but the ways in which this data can be collected and displayed is merely an outgrowth of more common monitoring techniques.

Probably the most useful attributes of a parallelism monitoring tool are a method to unobtrusively observe a parallel program, and a way to meaningfully sequence the events so observed. These attributes are especially important when considering large parallel programs. Currently, monitoring tools generally deal with only a few threads of control. In the machines of the future, methods will have to be developed so that hundreds or possibly thousands of threads of control can be represented if necessary, without an appreciable loss of usability. Thus, one of the capabilities of such a tool would be to allow the debugging of a single thread or a group of threads, while the other threads continue as normal. This may not be too difficult for some programs, but for others which have time-dependent constructs such as timeouts, things will not be quite so simple.


2.3 Parallel Programming Mechanisms

Many surveys, almost too numerous to mention, have been made of parallel programming constructs and techniques [Alma89, Andi77, Andr82b, Bal89, Bald87a, Bloo79, Karp87, Perr87]. But, given the rapid pace of research, no survey could ever claim to be complete, except perhaps at the time of its writing. With the evolution of the understanding of parallel systems, and software engineering practices such as data abstraction, parallel programming styles have changed so that these interests might be reconciled. However, even though the earliest constructs of parallel programming have become so well known that they have nearly passed into folklore, the fickle nature of parallel programming means that it is common to find them appearing thinly disguised even in the newest of languages. Furthermore, due simply to the huge number of parallel languages, there is not the time to make a complete survey of them here. Thus, this section concentrates on the central techniques of parallel programming while omitting some of their derivatives.

When designing a parallel programming notation, Ghezzi [Ghez85] has identified three main issues to be addressed. These can be expressed in a slightly modified form to give: (i) how threads of control are specified, (ii) how these threads communicate, and (iii) how they synchronise their operation. Over the next five sections it will become apparent that many different parallel programming techniques can be applied to solve the same problem. Of course, each method has its own merits and tradeoffs, but as such, there are no definitive guidelines to producing the most effective parallel software.

2.3.1 Specifying Parallel Execution

There have been many proposals for syntactic notations for specifying parallel execution in a program. In some of the early attempts, no differentiation was made between the definition of a thread of control and its synchronisation constraints. This led to a succession of mechanisms which imposed specific synchronisation policies that attempted to trade expressive power for ease of use.

Coroutines

Coroutines were devised as a means of obtaining concurrency in programs by exploiting the pseudo concurrency offered by a single processor. Coroutines allow the transfer of control among a collection of routines (procedures) that do not exhibit a hierarchical relationship with respect to the transfer of control [Conw63a].


The transfer of control between coroutines is achieved by the resume command whose syntax is "resume routine". The execution of this command transfers control to the named routine after saving enough state information so that control can return to the instruction following the resume. This enables control to be transferred back to the original routine by the named routine executing another resume. The coroutine model allows many threads of control to exist but allows only one thread to be active at a time. Coroutines have been implemented in a number of older languages including SIMULA [Nyga78], BLISS [Wulf71] and Modula-2 [Wirt80].
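As a minimal sketch of the resume mechanism, the fragment below emulates a pair of coroutines in C using the POSIX ucontext facilities; both the language and the interface are assumptions made purely for illustration, and each swapcontext call plays the role of a resume.

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, co_ctx;
    static char co_stack[16384];

    /* a coroutine that repeatedly resumes its caller */
    static void coroutine(void)
    {
        for (int i = 0; i < 3; i++) {
            printf("coroutine step %d\n", i);
            swapcontext(&co_ctx, &main_ctx);   /* "resume main"      */
        }
    }

    int main(void)
    {
        getcontext(&co_ctx);
        co_ctx.uc_stack.ss_sp   = co_stack;
        co_ctx.uc_stack.ss_size = sizeof co_stack;
        co_ctx.uc_link          = &main_ctx;   /* return point when the coroutine ends */
        makecontext(&co_ctx, coroutine, 0);

        for (int i = 0; i < 3; i++) {
            printf("main resumes coroutine\n");
            swapcontext(&main_ctx, &co_ctx);   /* "resume coroutine" */
        }
        return 0;
    }

Only one of the two routines is ever active at a time, which is precisely the pseudo concurrency described above.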

Fork and Join

The syntax of the fork statement [Denn66, Conw63b] is "fork routine", the action of which creates a new thread of control that runs concurrently with the forking thread. The join statement is a control flow synchronisation construct whose invocation, "join", causes the suspension of the calling thread until all of its children (forked threads) have terminated. There is a more primitive variant of the join construct, "join m, g", which has a more complex interpretation. The execution of this form of the command indivisibly subtracts one from m (normally a shared variable) and either goes to the statement labelled g if m is zero, or otherwise executes the statement that follows the join, usually a quit that terminates the program.

The fork and join constructs are the most powerful mechanisms for creating and terminating threads of control as there are no encumbering rules or syntax for their use. Consequently, programs that scatter such statements throughout their code, executing them conditionally and iteratively, become hard to understand because there are no inherent relationships between any pair of fork or join statements. Software that makes use of these constructs includes the Unix operating system [Ritc74], and the languages PL/I and Mesa [Mitc79].
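Since Unix is cited above, the following hedged sketch shows the process-level form of the constructs found there: fork() creates a new thread of control and wait() performs the join.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t child = fork();              /* "fork": create a new thread of control */

        if (child == 0) {
            printf("child working\n");     /* runs concurrently with the parent      */
            exit(0);
        }
        printf("parent working\n");
        wait(NULL);                        /* "join": suspend until the child ends   */
        return 0;
    }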

Cobegin

The cobegin statement is a parallelism construct similar in operation to the fork statement, allowing the denotation of a block of statements that are to be executed concurrently. Declarations take the form "cobegin S1 || S2 || ... || Sn coend", which means that each of the statements S1 ... Sn is to be executed in parallel. Each of the Si may be of any statement type, including another cobegin. The cobegin-coend constructs are more structured and less powerful than the fork-join combination, as each cobegin must be matched with a corresponding coend.


Variations of this statement, first introduced as parbegin by Dijkstra [Dijk65a], have been included in the languages Edison [Brin81] and Argus [Lisk82].

Doall

This construct can be thought of as a specialisation of the cobegin statement, in which statements marked for parallel execution must appear inside a loop construct that allows the programmer to use an index mechanism to identify eligible statements, rather than writing each one out explicitly. Often as a result of this, the number of parallel statements in a doall is generally determined at run time as it is expressed as a parameter, as opposed to cobegin whose number of statements is fixed at compile time.

There are many variations on the basic doall that can be used to specify a variety of loop behaviours [Wolf89]. For example, the sequential Fortran DO statement will execute loop (a) deterministically, while the results of loop (b) are non-deterministic as loop iterations may be executed in parallel in an unspecified order with no synchronisation enforced between them.

(a)  DO (I = 2, N-1)                      (b)  DOALL (I = 2, N-1)
         A(I) = A(I+1) + A(I-1)                    A(I) = A(I+1) + A(I-1)
     ENDDO                                     ENDDOALL

(c)  FORALL (I = 2, N-1)                  (d)  DOACROSS (I = 2, N-1)
     S1:  A(I) = B(I) + C(I)                   S1:  A(I) = B(I) + C(I)
     S2:  ...                                  S2:  ...
     ENDFORALL                                 ENDDOACROSS

In loop (c), all the values of B(I) and C(I) are fetched, all the additions are done, then all the values of A(I) are stored. Furthermore, the semantics of forall force S1 to be executed for all values of I before S2 is executed. In loop (d), iterations are executed concurrently with the provision that indexes are allocated to threads in strict ascending order.

Process declarations

In this scheme, some of the procedures that comprise a program have a special denotation marking them as procedure bodies for threads of control that will execute in parallel. Pratt [Prat84] gives a discussion of how progressive relaxation of the constraints on the sequential calling and execution of procedures eventually leads to parallel threads of control. Process declarations provide a more modular approach for declaring parallelism than, say, the cobegin statement because of their ability to support data abstraction, therefore aiding understandability and modifiability. Some examples of this kind of construct are to be found in the languages Concurrent Pascal [Brin75], Modula [Wirt77], CSP [Hoar78], PLITS [Feld79], Ada [DoD80], and SR (Synchronising Resources) [Andr82a].

2.3.2 Thread Communication and Synchronisation

There are essentially two ways in which threads of control can communicate, either they make use of shared memory and share variables directly, or shared memory is not used (it may not be available) and they communicate by making procedure calls to explicitly exchange messages. Threads of control may need to synchronise their operation for two reasons. Firstly, to moderate accesses to shared data, and secondly to moderate their control flow. Control synchronisation affects only the control flow of the participating threads, while data synchronisation represents a synchronisation point and an information exchange. As a general observation, on one hand programs that make use of shared memory have to be concerned about synchronising accesses to shared data by selectively delaying the execution of threads. On the other hand, those that make use of message passing need to be concerned with the distribution of data and the form of message exchange between threads of control, and not with synchronisation which happens automatically.

When shared variables are used for interthread communication, synchronisation must be enforced between threads to ensure that they access the shared data in a consistent manner. It is useful to identify two types of synchronisation: mutual exclusion and conditional synchronisation. Mutual exclusion ensures that the execution of a sequence of statements can be treated as a single indivisible operation when considering thread interleavings. Such a sequence of statements that must be executed indivisibly in order to preserve program consistency is called a critical region. Mutual exclusion is achieved by only letting one thread operate inside a critical region, with any others being blocked from entering the region until that thread has left. The term mutual exclusion, naturally enough, refers to the subsequent mutually exclusive executions of the critical region. For example, consider a group of threads that sum a vector. Each thread sums a portion of the vector, then updates a counter representing the number of elements summed and the sum itself. When a thread updates the counter and the sum, this should be treated as an indivisible (or atomic) action so that the relationship (the counter holds the number of elements summed and the sum holds the total) always holds true for other threads. Mutual exclusion is a method for making the effect of simultaneous accesses be the same as if they had been made in some unspecified but legitimate sequential order. This requirement is known as the serialisation principle.

Conditional synchronisation is used to delay a thread until some specified condition is met. This can be viewed as being a generalisation of mutual exclusion by using an arbitrary condition to block competing threads. A sample application that uses conditional synchronisation is the implementation of a shared buffer. The semantics of reading from the buffer are: if the buffer is not empty remove a value, or else wait until a value arrives. The semantics of writing are: if the buffer is not full then insert a value, or else wait until there is space in the buffer. To guarantee consistency when operating on the buffer, both reading and writing operations must of course be executed atomically.
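A module-style sketch of such a shared buffer is given below; it assumes POSIX mutexes and condition variables purely for illustration, with the mutex providing the mutual exclusion and the two condition variables the conditional synchronisation described above.

    #include <pthread.h>

    #define SIZE 8

    static int buf[SIZE];
    static int head, tail, count;
    static pthread_mutex_t m         = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

    /* writing: if the buffer is full, wait until there is space */
    void put(int v)
    {
        pthread_mutex_lock(&m);                  /* mutual exclusion            */
        while (count == SIZE)
            pthread_cond_wait(&not_full, &m);    /* conditional synchronisation */
        buf[tail] = v;
        tail = (tail + 1) % SIZE;
        count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&m);
    }

    /* reading: if the buffer is empty, wait until a value arrives */
    int get(void)
    {
        pthread_mutex_lock(&m);
        while (count == 0)
            pthread_cond_wait(&not_empty, &m);
        int v = buf[head];
        head = (head + 1) % SIZE;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&m);
        return v;
    }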

Four constraints are imposed on the operation of primitives and algorithms to solve critical region problems. Firstly, machine instructions are executed indivisibly and memory references are serialised to create predictably reliable machine operations. Secondly, the finite process assumption holds, so that a critical region is executed in a finite time. Thirdly, threads operating outside of their critical regions cannot prevent other threads from entering their own critical regions. Finally, threads must not be indefinitely postponed from entering their critical regions. A synchronisation mechanism must be able to demonstrate that it does not violate any of these constraints when attempting to synchronise a given problem.

Algorithmic solutions to critical region problems have abounded for many years but many have contained several minor flaws. Dekker's algorithm was one of the earliest to demonstrate a good (but not complete) solution [Rayn86], with improvements made subsequently by Dijkstra [Dijk65b] and later Lamport [Lamp74]. One of the difficulties that was encountered in writing good mutual exclusion algorithms was that solutions were written using only conventional sequential instructions, such as conditionals ('if tests) and assignments. As research into synchronisation progressed, special purpose synchronisation constructs were developed, sometimes implemented by hardware, to make synchronisation code much easier to write. Moreover, as the use of parallel programming became more widespread, new languages supporting ever more elaborate synchronisation constructs have become available.

The synchronisation constructs described in the following sections take an operational view of synchronisation behaviour. That is, the execution of a parallel program is viewed as a sequence of indivisible steps formed by merging the sequences generated by the program's threads of control. Not all interleavings of the steps will produce correct results, hence a synchronisation mechanism is needed to enforce constraints on the possible interleavings to ensure correctness. This technique of viewing synchronisation as such an ordering of events is useful when explaining how synchronisation mechanisms work. However, as the number of threads increases, the number of possible interleavings grows very rapidly making exhaustive analysis very difficult. Formal methods for describing synchronisation in terms of axioms and inference rules have been developed to counter this situation [Floy67, Hoar69], though these are not discussed here.

Hardware Synchronisation Primitives

To ease the task of writing purely software based synchronisation protocols, computer architectures have sometimes offered support in the form of special hardware instructions which are executed atomically, even though they may notionally consist of several individual machine instructions.

Busy-waiting

Busy-waiting is the most basic synchronisation mechanism of all. When a thread wishes to enter a critical region it makes a procedure call that operates on a variable known as a lock (or spin lock). Hardware support in the form of an atomic test-and-set instruction is provided to implement operations on locks as they must be performed atomically. If a lock variable is unset when the spinlock procedure is invoked, the calling thread sets it, and then returns to continue with its execution. If the lock is set when spinlock is invoked, however, the thread waits (spins) by repeatedly testing and trying to set the lock until it is successful. When the thread leaves a critical region it calls another procedure (spinunlock) to unset the lock.
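A minimal sketch of this pair of procedures is given below, with the C11 atomic_flag type standing in for a hardware test-and-set instruction; the choice of language and primitive is an assumption made only for illustration.

    #include <stdatomic.h>

    static atomic_flag lock = ATOMIC_FLAG_INIT;

    /* spin until the test-and-set succeeds, i.e. the flag was previously unset */
    void spinlock(void)
    {
        while (atomic_flag_test_and_set(&lock))
            ;                         /* busy-wait: keep testing and setting */
    }

    /* leave the critical region by clearing the lock */
    void spinunlock(void)
    {
        atomic_flag_clear(&lock);
    }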

Busy-waiting is best suited to applications where critical sections of code consist only of a few short instructions [Hoog83] or when alternative thread scheduling operations take a relatively long time. It is inefficient for threads to busy-wait for long periods of time, as they consume useful processor cycles that could otherwise be employed by other threads. For simple synchronisation protocols, busy-waiting is an acceptable synchronisation mechanism; however, for more complex problems, such protocols can be very difficult to design, understand and therefore prove correct. This follows from the basic nature of the mechanism, which places great responsibility on a programmer to decide what synchronisation is required and how to provide it. Moreover, programming mistakes can easily be made by placing calls to spinlock operations in the wrong position or by omitting them altogether. Machines that provide test-and-set operations include the IBM 370 series and the Encore Multimax. A related operation, compare-and-swap, can also be used to implement locks, and is found in IBM 370/168 machines.

Fetch-and-Add

If many threads of control compete for access to a shared variable, it is possible for these threads to find themselves being blocked for a large proportion of their execution time. A non-blocking primitive, fetch-and-add(S, P), seeks to reduce the time spent in synchronisation by allowing multiple threads to go ahead and execute instead of just one. This mechanism can be viewed as extending the test-and-set mechanism by returning an integer value rather than a boolean. This is done by adding an integer P to a shared variable S and returning the old value of S. These semantics are useful for implementing programs where threads need to simultaneously access separate parts of shared data structures such as queues. The fetch-and-add instruction is being implemented on the NYU Ultracomputer and the IBM RP3.
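The idiom is easy to see in a short sketch; here the C11 atomic_fetch_add operation stands in for the hardware primitive (again an assumption made purely for illustration), and each concurrent caller obtains a distinct slot of a shared queue without blocking.

    #include <stdatomic.h>

    #define QSIZE 1024

    static int         queue[QSIZE];
    static atomic_long tail;                  /* index of the next free slot */

    /* fetch-and-add(S, P): add P to S and return the old value of S.
       Every concurrent caller receives a different slot index.              */
    void enqueue(int value)
    {
        long slot = atomic_fetch_add(&tail, 1);
        queue[slot % QSIZE] = value;          /* threads fill separate slots */
    }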

Denelcor HEP

This computer architecture has a variety of special features and instructions to support efficient synchronisation. Every location in shared data memory and every general-purpose register has an extra bit used to indicate its access state, either full or empty. Furthermore, a load instruction can be made to wait until its source location is full and then indivisibly set the access state to empty. Similarly, a store instruction can be made to wait until its storage location is empty and then indivisibly set the access state to full. Instructions involving registers can be even more elaborate, as requirements for both the source to be full and the destination empty can be imposed. To implement the atomicity of these register operations an extra bit is used to indicate that the register is reserved and that its value is on the way.

These low-level synchronisation instructions are translated into higher-level operations by HEP Fortran. A variable can take one of three states: full, empty or reserved. For example, given Fortran variables A, B and C the following statements can be executed:

A = $C          wait for C to become full, assign its value to A, set C to empty,
$C = A          wait for C to become empty, assign the value of A to C, set C to full,
A = VALUE($C)   read the value of C regardless of access state,
A = SETE($C)    read the value of C regardless, set C to empty,
A = WAITF($C)   wait for C to become full, do not empty C,
B = FULL($C)    test for full, return the result,
B = EMPTY($C)   test for empty, return the result,
PURGE($C)       empty C.

Supplemental Note

Recent work carried out by Herlihy et al [Herl90] has produced a formal way of describing the properties of synchronisation mechanisms in terms of their linearizability. The use of this measure has allowed a hierarchy of the power of synchronisation mechanisms to be drawn up, and has led to the development of the idea of wait free synchronisation based on manipulating shared queue data structures.

Software Synchronisation Primitives

Even if there is no hardware support available, effective synchronisation primitives still can be written. However, the implementation of these primitives is not always straightforward as a number of factors such as blocking, buffering and determinism have to be taken into account.

Blocking Mechanisms

In busy-waiting, threads that are blocked consume resources while they are waiting and also appear to a thread manager as active threads of control. If a critical region consists of a large number of instructions, synchronisation mechanisms that suspend threads in a queue can be more efficient. In the case where there are more processors than threads of control, busy-waiting is acceptable. However, if the converse situation is true, suspension of threads is generally better. That is, threads that are blocked give up their share of processor time until they are unblocked and can run. This behaviour can lead to the avoidance of certain faults that arise when using busy-waiting. For example, consider a non-preemptive scheduler that has to schedule t threads on p processors (t > p). If p threads busy-wait on locks, there is no way that any other thread can execute to release them. Hence, almost all software-based synchronisation mechanisms are either non-blocking or instead suspend threads that are not ready to execute in some kind of a queue.


Buffering Mechanisms

In a message passing system, it is sometimes inconvenient for a thread to wait to communicate with a thread that is not ready. Hence, two styles of message passing synchronisation have been developed. In a traditional synchronous message passing system, when a thread wishes to send a message it is blocked (suspended) until after the receiving thread has executed its receive command. Moreover, when a thread wishes to receive a message, it is blocked until after the sending thread has executed its send command. In an asynchronous message passing system, threads that send messages do not wait until they are received before continuing their execution. This can happen because messages are implicitly stored in some order (buffered) by the message passing mechanism, allowing senders and receivers to execute at their own rates, provided enough storage is available for undelivered messages.

Asynchronous message passing systems often have rules governing the management of their thread message queues. Facilities that are generally provided are methods for a thread to selectively discard messages from its queue without having to read them, and the implementation of message priority schemes. Simple priority schemes may store the messages in a single ordered queue, with more complicated schemes maintaining multiple queues. Provision can also be made for messages that are sent to threads that do not exist. For example, they may be discarded, or perhaps queued until such time as a receiving thread is created, or may even cause a receiving thread to be created.

Non-deterministic Choice

In many parallel programs, a thread of control may have to perform any one of a number of actions depending upon the state of the computation. For example, consider a thread responsible for updating a screen display that receives messages from a group of worker threads. To implement the display thread, a construct is needed to continually check if any of the workers have produced a message without blocking indefinitely while checking on any one worker. A mechanism to do this was suggested by Dijkstra in the form of guarded commands [Dijk75]. There are two guarded command forms, a one-off conditional and an iterator. For the conditional, each of the guards (boolean tests) is evaluated, then one statement corresponding to one of the true guards is executed - the choice of statement being made non-deterministically. For the iterative construct, execution occurs while


any of the guards are true, non-deterministically choosing a statement to execute from among the true guards at each iteration.

if                                        e.g.  if x <= y -> m := x
    guard1 : statement1  ] execute one of       [] y <= x -> m := y
    guard2 : statement2  ] the true guards      fi
fi

do                                        e.g.  do q1 > q2 -> q1,q2 := q2,q1
    guard1 : statement1  ] execute while        [] q2 > q3 -> q2,q3 := q3,q2
    guard2 : statement2  ] any are true         [] q3 > q4 -> q3,q4 := q4,q3
od                                              od

Figure 2.10 - Guarded commands.

Figure 2.10 illustrates the templates of the two forms of guarded commands, with accompanying examples from the original paper. Parallel programming languages that exploit variants of guarded commands include Occam (ALT statement) and Ada (select statement).
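
A software approximation of this non-deterministic choice is sketched below in C: each alternative pairs a guard with a statement, and one of the open (true) alternatives is picked at random. The types and function names are invented for illustration, and no attempt is made to model the blocking behaviour of Occam's ALT or Ada's select.

    #include <stdlib.h>

    /* A guard is a boolean test paired with the statement to run if chosen. */
    typedef struct {
        int  (*guard)(void);
        void (*statement)(void);
    } alternative;

    /* One-off guarded conditional: evaluate every guard, then execute one of
       the true alternatives, chosen non-deterministically (here, at random).
       Returns 0 if no guard was true. */
    int guarded_if(alternative alts[], int n)
    {
        int open[16], count = 0;
        for (int i = 0; i < n && count < 16; i++)
            if (alts[i].guard())
                open[count++] = i;
        if (count == 0)
            return 0;
        alts[open[rand() % count]].statement();
        return 1;
    }

    /* The iterative form keeps choosing among the true guards until none hold. */
    void guarded_do(alternative alts[], int n)
    {
        while (guarded_if(alts, n))
            ;
    }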

2.3.3 Communication and Synchronisation via Shared Variables

Although the idea of communication between threads of control via shared variables is conceptually trivial, ensuring that meaningful and correct communication takes place is not. A hierarchy of software-based synchronisation mechanisms has evolved, progressively offering more abstraction and comfort to programmers. The cost of this process of abstraction, though, has been borne by the implementations of these mechanisms, often resulting in large operating overheads at run time. This tradeoff is quite acceptable when the intended use of these mechanisms is considered, which is the programming of resource allocation problems (e.g. dining philosophers). In such problems the emphasis is on elegantly and reliably expressing a correct solution, rather than a need for absolute efficiency. However, there are many other types of problems to which parallel programming can be applied that have quite different emphases in their requirements.


Semaphores

Semaphores provide a more abstract view of synchronisation than simple busy-waiting and do not need any special hardware instructions for their implementation [Dijk65a, Dijk68]. A semaphore is an initially non-negative, integer-valued variable on which two operations are defined: P (wait) and V (signal). Given a semaphore S, a thread calling P(S) is delayed until S>0 and then executes S:=S-1; the test and decrement being executed as an indivisible operation. A call to V(S) performs, as an indivisible operation, S:=S+1 and wakes up a thread suspended on S if there is one. Hence, when a thread issues a wait operation on a semaphore and encounters a zero value it will be suspended until another thread has signalled on that semaphore and given it a positive value.

There are many variations on the basic semaphore scheme, such as binary semaphores and counting semaphores, but all implementations of semaphores are assumed to exhibit fairness. This guarantees that no thread delayed while executing a P operation will be delayed forever if V operations are performed infinitely often. This fairness condition has to be imposed because many threads can wait on the value of a semaphore, and if events were left to chance, it would be possible for a thread never to be released if other threads were always allowed to take precedence. In practice, fairness is easy to obtain, as threads waiting on a semaphore can be stored in a first-in-first-out queue.

While semaphores can be used to express almost any kind of synchronisation behaviour, they remain unstructured primitives with a high potential for misuse. Similar to busy-waiting, the execution of each critical region must begin with a wait operation and end with a signal operation. If this is not done the mechanism breaks down, in all likelihood causing program failure. Furthermore, there are no inherent differences in the way in which mutual exclusion and conditional exclusion are expressed, making the interpretation of a piece of synchronisation code more difficult.
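
A general semaphore with these semantics can be built from a POSIX mutex and condition variable, as in the sketch below; note that pthread_cond_signal does not guarantee first-in-first-out release, so the fairness property discussed above would need an explicit queue of waiters. The type and the names sem_init_val, sem_P and sem_V are invented for illustration (they are not the POSIX sem_wait/sem_post interface).

    #include <pthread.h>

    typedef struct {
        int value;                        /* always >= 0 */
        pthread_mutex_t lock;
        pthread_cond_t  nonzero;
    } semaphore;

    void sem_init_val(semaphore *s, int v)
    {
        s->value = v;
        pthread_mutex_init(&s->lock, 0);
        pthread_cond_init(&s->nonzero, 0);
    }

    void sem_P(semaphore *s)              /* wait */
    {
        pthread_mutex_lock(&s->lock);
        while (s->value == 0)             /* delayed until S > 0 ...        */
            pthread_cond_wait(&s->nonzero, &s->lock);
        s->value = s->value - 1;          /* ... then S := S - 1            */
        pthread_mutex_unlock(&s->lock);
    }

    void sem_V(semaphore *s)              /* signal */
    {
        pthread_mutex_lock(&s->lock);
        s->value = s->value + 1;          /* S := S + 1                     */
        pthread_cond_signal(&s->nonzero); /* wake one suspended thread      */
        pthread_mutex_unlock(&s->lock);
    }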

Read and Write Locks

This mechanism is more flexible than busy-waiting, differing in its use of two distinct lock types. If a variable is not locked, a thread can apply either a read lock or a write lock. A thread wishing to read a variable may do so if it is not locked or if there is only a read lock. If there is a write lock, however, the thread must wait until the write lock is relinquished. In the case of a thread wishing to write, if


there is any kind of lock, the thread must wait until it is given up and then apply a write lock of its own.

This kind of locking can provide a more convenient method for programming some problems, such as readers-and-writers [Holt83, Deit84], than other less structured mechanisms like semaphores. Programs are more understandable because some idea of the intention of a thread towards shared data can be gleaned from the types of locks it employs. Locking of this kind has been greatly extended by introducing many new lock types and rules, examples of which can be readily found in database systems [Gray78].
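
POSIX threads provide this mechanism directly as pthread_rwlock_t; a minimal usage sketch follows, in which the shared array table and the reader and writer functions are illustrative.

    #include <pthread.h>

    pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
    int table[100];                           /* the shared data being protected */

    int reader(int i)
    {
        pthread_rwlock_rdlock(&table_lock);   /* many readers may hold this      */
        int v = table[i];
        pthread_rwlock_unlock(&table_lock);
        return v;
    }

    void writer(int i, int v)
    {
        pthread_rwlock_wrlock(&table_lock);   /* waits until no locks are held   */
        table[i] = v;
        pthread_rwlock_unlock(&table_lock);
    }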

Barriers and the Force

The barrier construct was developed to synchronise programs that operate by partitioning large data structures over a number of processors (sometimes called the single program multiple data, SPMD, paradigm). A barrier is used to synchronise threads of control by forcing them to wait at a common point in their shared code, the barrier, until a specified number of them are waiting. A barrier is initialised with an integer value that represents the number of threads that it will stop. When a thread arrives at a barrier, it checks the number of waiting threads. If this value is one less than the initial setting of the barrier, it continues execution after waking up all of the waiting threads, otherwise it suspends itself.

The force [Jord84, Jord85] is a generalisation of the barrier. Here, all threads wait at the barrier until the final one arrives. When this occurs, this thread executes the code between the barrier and its end barrier statement, while the others remain waiting. After the active thread has finished executing its block of statements, all threads continue executing at the statement following the end barrier statement. Notice that if there is no code between the barrier and its end barrier statement, the construct behaves like a normal barrier that traps all the threads. Figure 2.11 illustrates how calls to barrier routines, with a barrier X, are used in source program code.
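
The force can be approximated with two POSIX barrier waits, as in the sketch below: pthread_barrier_wait nominates one (unspecified) thread as the serial thread, which here plays the role of the final arrival and executes the single-threaded block before the second wait releases everyone. The function force_section is an invented name, and the barrier X is assumed to have been initialised elsewhere with the participating thread count.

    #include <pthread.h>

    pthread_barrier_t X;     /* initialised elsewhere, e.g.
                                pthread_barrier_init(&X, NULL, NTHREADS); */

    void force_section(void (*single_thread_code)(void))
    {
        /* All threads wait here; exactly one of them is told it is "serial". */
        int rc = pthread_barrier_wait(&X);

        if (rc == PTHREAD_BARRIER_SERIAL_THREAD)
            single_thread_code();    /* the block between barrier/end barrier */

        /* The others wait again, so the single-threaded block finishes before
           anyone continues; with an empty block this is just a plain barrier. */
        pthread_barrier_wait(&X);
    }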

Conditional Critical Regions

Conditional critical regions (CCRs) [Hoar72, Brin72, Brin73b] were designed to overcome some of the structuring problems associated with explicit locking-type synchronisation mechanisms. Shared variables are declared in groups called resources, to allow a compiler to check that shared variables are only referenced from CCRs associated with the corresponding resource. Code generated by the


(a)  statements                 (b)  statements

     barrier(X)                      barrier(X)
     statements                          statements executed by only one thread
                                     end barrier
                                     statements

Figure 2.11 - (a) Barrier construct and (b) Force construct.

compiler guarantees mutually exclusive execution of the CCRs associated with a given resource. A resource is declared as follows, "resource r: v1, v2, ..., vn", with variables v1, v2, ..., vn being accessed by means of CCR statements. These take the form of "region r when B do S", where B is a boolean expression that references only variables that are in r and local to its own thread of control, and S is a list of statements. Furthermore, a variable can be in at most one resource and variables in resource r can be accessed only in CCR statements that name r.
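
A compiler might translate a CCR into code along the lines of the following hedged sketch, shown here for a hypothetical resource r containing a single counter v1 and the statement "region r when v1 > 0 do v1 := v1 - 1"; POSIX threads are assumed and all names are illustrative.

    #include <pthread.h>

    /* The variables grouped in resource r, plus the synchronisation the
       compiler would generate to protect them. */
    struct resource_r {
        int v1;
        pthread_mutex_t lock;
        pthread_cond_t  changed;
    } r = { 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER };

    /* region r when v1 > 0 do v1 := v1 - 1 */
    void region_r_example(void)
    {
        pthread_mutex_lock(&r.lock);            /* mutual exclusion on r        */
        while (!(r.v1 > 0))                     /* re-test condition B on waking */
            pthread_cond_wait(&r.changed, &r.lock);
        r.v1 = r.v1 - 1;                        /* the statement list S          */
        pthread_cond_broadcast(&r.changed);     /* let other regions re-test B   */
        pthread_mutex_unlock(&r.lock);
    }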

Conditional critical regions are good at providing synchronisation but they have two operational drawbacks. Firstly, they have been shown to be costly to implement, because of the large amount of run time checking required to enforce the synchronisation. Secondly, program listings can be quite hard to follow, as statements performing operations on resource variables are dispersed throughout the code, meaning that the entire program has to be studied to discover how a resource is used.

Monitors

The monitor concept was developed independently by Hoare [Hoar74] and Brinch Hansen [Brin73a] as an extension of the secretary concept [Dijk65a]. A monitor is an example of a modular design, encapsulating the definition of a resource and the operations that manipulate it. This overcomes one of the drawbacks of conditional critical regions by decoupling the details of a monitor's implementation from the ways in which it can be used. Hence, monitors and their descendants feature in parallel programming languages that are used to program applications which contain numerous shared resources, such as operating systems.

A monitor consists of a collection of permanent variables that are used to store the state of the resource, and some procedures to implement operations on the resource. The values of the permanent variables are retained between activations


of monitor procedures and can only be accessed from inside the monitor. There is an initialisation section inside a monitor that must be invoked before any of the monitor procedures can be called. This section of code, for example, may be responsible for initialising internal synchronisation variables, or may in fact be empty. Monitor procedures are written as conventional procedures, possibly using parameters and local variables, but they have the additional property that execution of monitor procedures is guaranteed to be mutually exclusive. This property means that if multiple threads try to make calls to monitor procedures, only one will succeed, and the others will have to wait, making it impossible to access resource variables concurrently. The syntax of the declaration of an example monitor mname and procedures mp1, ..., mpN is presented in Figure 2.12.

mname : monitor
    var Declarations of permanent variables

    procedure mp1 (parameters)
        var Declarations of the variables local to mp1
    begin
        Code to implement mp1
    end

    ...

    procedure mpN (parameters)
        var Declarations of the variables local to mpN
    begin
        Code to implement mpN
    end

begin
    Code to initialise the permanent variables of mname
end

Figure 2.12 - Monitor declaration.

Calls to monitor procedures are made by prefixing the name of the monitor to the name of the procedure. For example, mname.mp(arguments) will activate procedure mp of monitor mname.

There have been a number of proposals for realising the internal conditional synchronisation required for monitors. The earliest proposal is based on condition variables on which two operations are defined, wait and signal [Hoar74]. The execution of condition.wait causes the invoking thread to be blocked and to relinquish its control of the monitor. The execution of condition.signal causes a blocked thread to wake up if there is one, with the invoker proceeding as normal. (This is similar to events that have been developed for use outside of the monitor


environment.) More control over thread scheduling can be obtained by using the conditional wait statement, which takes the form wait(B), where B is a boolean on which the invoking thread waits. Further extensions and refinements to these synchronisation operations can be found in Concurrent Euclid [Holt81, Holt83], Concurrent Pascal, Mesa [Lamp80], Modula, Modula-2 and Pascal-Plus [Wels79].
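
In a language without monitors, the same discipline can be imposed by hand. The sketch below, assuming POSIX threads, packages the permanent variables of a one-slot buffer with an entry mutex (for the mutual exclusion property) and two condition variables (playing the role of condition.wait/condition.signal); the type buffer_monitor and the procedures deposit and fetch are invented for illustration.

    #include <pthread.h>

    typedef struct {
        int slot, occupied;                      /* the permanent variables      */
        pthread_mutex_t entry;                   /* at most one thread inside    */
        pthread_cond_t  not_full, not_empty;     /* the condition variables      */
    } buffer_monitor;

    void deposit(buffer_monitor *m, int v)
    {
        pthread_mutex_lock(&m->entry);
        while (m->occupied)
            pthread_cond_wait(&m->not_full, &m->entry);   /* condition.wait      */
        m->slot = v;
        m->occupied = 1;
        pthread_cond_signal(&m->not_empty);               /* condition.signal    */
        pthread_mutex_unlock(&m->entry);
    }

    int fetch(buffer_monitor *m)
    {
        pthread_mutex_lock(&m->entry);
        while (!m->occupied)
            pthread_cond_wait(&m->not_empty, &m->entry);
        int v = m->slot;
        m->occupied = 0;
        pthread_cond_signal(&m->not_full);
        pthread_mutex_unlock(&m->entry);
        return v;
    }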

Unfortunately, the synchronisation properties of monitors are not as attractive as they might seem at first glance. Admittedly, monitors provide a tidy abstraction of a resource manager with a well defined interface - but therein lies the problem. Consider two threads that have to access two shared resources controlled by monitors. As a result of their programming, the first thread enters the first monitor and at the same time the second thread enters the second monitor. Now, assuming that both threads are successful in obtaining their respective monitor ownerships, then if they have to access the other's monitor before relinquishing their own, both threads will be suspended indefinitely as neither can give way. This deadlock scenario is not limited to pairs of threads and monitors, and can be generalised to many threads and many monitors. Hence, the implications of nested monitor calls have been investigated [List77, Hadd77] and it turns out that this kind of synchronisation fault can occur with almost any non-trivial synchronisation construct.

Serializers

A serializer [Atki77] completely encapsulates the resource it manages, to form a protected resource, but with the resource itself remaining a distinct object within the serializer. Serializers are descended from monitors in that they inherit many of the latter's attributes. Serializers consist of a set of descriptions of internal objects, with operations to manipulate the objects, and the guarantee that the execution of serializer operations is mutually exclusive. However, serializers also incorporate a mechanism for increasing concurrency when working with resource data by enabling the invocation of resource operations from outside the control of the synchronisation mechanism. This is achieved by enabling threads to temporarily leave the serializer and enter the resource to perform the operations.

Serializers have two native features, queues and crowds. The enqueue operation specifies the queue a thread should wait in and the corresponding condition to leave that queue. A serializer automatically restarts the thread at the head of a queue if its condition is met. Crowds are unordered collections of threads used to hold resource state information for synchronisation, i.e. they keep track of which threads are in the resource and what operations are currently being


executed. The normal sequence of events for a thread requesting access to a shared resource is:

enter        gain possession of the serializer,
enqueue      release possession of the serializer and join a queue,
dequeue      leave the queue and regain possession of the serializer,
join_crowd   release possession of the serializer and enter the resource,
leave_crowd  leave the resource and re-enter the serializer,
exit         release the serializer.

Figure 2.13 is an example of a serializer, written in CLU, to realise the readers and writers problem with a first-come-first-serve priority scheme [Bloo79].

first_come_first_serve = serializer is read, write, create;

    rep = record[ waiting_q     : queue,
                  readers_crowd : crowd,
                  writers_crowd : crowd,
                  db            : data_base ];

    create = Initialisation code for the serializer to create the wait queue
             and the read and write crowds.

    read = proc returns;
               queue$enqueue <s.waiting_q> until <crowd$empty <s.writers_crowd>>;
               d : data
               crowd$join <s.readers_crowd> then
                   d := data_base$read <s.db, k>;
               end;
               return
           end read;

    write = Code to handle write requests.

end first_come_first_serve;

Figure 2.13 - Serializer for FCFS server.

Serializers present a more structured approach to writing resource management software than monitors, by way of maintaining a more supportive client-server abstraction. Serializers have established methods for handling request type information (e.g. request times) by the use of queues, and use the crowd construct to handle resource management. These improvements lead to a mechanism that is easier to use and more reliable than the basic monitor mechanism [Bloo79]. Other monitor-like mechanisms (e.g. Pascal-Plus envelopes) can give similar improvements in usability; however, the overhead involved with the operation of these mechanisms can make them unsuitable when resource


operations consume only small amounts of time, i.e. less than the time spent in synchronisation. Hence, monitors and their derivatives can be thought of as being good for coding heavyweight resource management problems - but inappropriate for the exploitation of lightweight parallelism.

Path Expressions

Path expressions are a synchronisation mechanism that was first defined by Campbell and Habermann [Camp74], though many extensions and variations to the original concept have followed [Habe75, Laue75, Camp76, Flon76, Laue78, Andl79]. The particular variation discussed here is taken from the Path Pascal programming language [Camp79].

Path expressions are a centralised declaration of all the synchronisation constraints for a resource. These definitions are clearly divorced from their implementation and are enforced by code generated by a compiler. A path controller keeps track of the operations executed on each instance of a resource, and ensures that the operations executed on that resource conform to some legal ordering. When path expressions are used, a module that implements a resource has a structure like that of a monitor. The structure contains permanent variables which store the state of the resource, and procedures which realise the operations on the resource. One or more path expressions in the header of each resource define constraints on the order in which operations are executed, as no synchronisation is programmed in the procedures. The syntax of a path expression is "path path-list end", where a path-list is a mixture of operation names and path operators. Path operators include "," for concurrency, ";" for sequencing, "n:(path-list)" to specify up to n concurrent executions of path-list, and "[path-list]" to specify an unbounded number of concurrent executions of path-list. Figure 2.14 illustrates the use of path expressions to provide the synchronisation for a bounded producer-consumer buffer.

The path expression indicates that: (i) executions of put() are mutually exclusive; (ii) executions of get() are mutually exclusive; (iii) a get() operation is always preceded by a put() operation; and (iv) at most N put()'s have not been followed by a get() operation.
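
These four constraints can be realised without a path controller by a conventional semaphore scheme, as in the hedged sketch below using POSIX counting semaphores: put_mutex and get_mutex give constraints (i) and (ii), while the counting semaphores items and spaces give (iii) and (iv). All names are illustrative and buffer_init must be called before any thread uses the buffer.

    #include <semaphore.h>

    #define N 16
    static int buf[N];
    static int in_idx = 0, out_idx = 0;
    static sem_t spaces, items;           /* at most N puts not yet matched   */
    static sem_t put_mutex, get_mutex;    /* 1:(put) and 1:(get) exclusion    */

    void buffer_init(void)
    {
        sem_init(&spaces, 0, N);
        sem_init(&items, 0, 0);
        sem_init(&put_mutex, 0, 1);
        sem_init(&get_mutex, 0, 1);
    }

    void put(int v)
    {
        sem_wait(&spaces);                /* constraint (iv)                  */
        sem_wait(&put_mutex);             /* constraint (i)                   */
        buf[in_idx] = v;
        in_idx = (in_idx + 1) % N;
        sem_post(&put_mutex);
        sem_post(&items);                 /* a get may now go ahead           */
    }

    int get(void)
    {
        sem_wait(&items);                 /* constraint (iii)                 */
        sem_wait(&get_mutex);             /* constraint (ii)                  */
        int v = buf[out_idx];
        out_idx = (out_idx + 1) % N;
        sem_post(&get_mutex);
        sem_post(&spaces);
        return v;
    }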

Path expressions are an elegant notation for expressing synchronisation constraints operationally (i.e. in terms of procedure executions), but they are poorly suited to specifying conditional synchronisation. Many problems that require conditional synchronisation, such as scheduling problems, can only be coded with path expressions by the introduction of additional mechanisms.


module buffer
    path N:(1:(put); 1:(get)) end
    var Local state variables for buffer
    procedure get()
    procedure put()
begin
    Initialisation code for buffer
end

Figure 2.14 - Path expressions for a bounded buffer.

2.3.4 Communication and Synchronisation via Message Passing

A message passing system can be viewed as a derivative of the semaphore mechanism that has been extended so that when a message passing procedure is invoked, data is transferred as well as synchronisation provided. Typical forms for these calls are "send expression-list to destination-designator" and "receive variable-list from source-designator". In a message passing system, all communication and synchronisation can be performed by exchanging messages, so there is no need for any shared variables, making message passing systems ideal for use in distributed memory environments. The two main issues in the design of message passing systems are how the source and destination designators are specified, and how communication is synchronised.

Specifying Channels of Communication

When viewed together, the source and destination designators of a pair of send and receive commands form a communications channel. The simplest kind of channels are those that permit one-to-one communication between threads, sometimes called point-to-point communication. More advanced forms of communication include broadcasts, in which a thread sends a message to all threads (one-to-many), and multicasts, in which a thread sends a message to a group of threads. Various schemes have been proposed for naming channels, with the simplest method being direct naming. In this scheme, thread names are simply used as the source and destination designators. However, direct naming can be


inhibiting, as implementation information about the precise names and numbers of threads has to be hardwired into program code. For example, a resource server would not be expected to know in advance the identities of all its clients.

A more flexible approach to channel naming is mailboxes, where a list of global names exists. Each name can appear as a destination address in any send and as the source address in any receive command. This scheme allows some degree of anonymity between threads and easily permits many-to-many thread communication. Unfortunately, for distributed memory systems it has been reported that there is a high communication overhead associated with this mechanism [Gele82]. There is a special case of mailboxes, though, in which a name may appear as the source designator in receive commands of only one thread. Such mailboxes are known as ports [Balz71] and allow many-to-one communication and selective many-to-many communication if ports can be shared.

The source and destination designators can be fixed at compile time, which is static channel naming, or at run time, which is dynamic channel naming. Dynamic naming is more powerful and can lead to more elegant and flexible programs than static naming. Unfortunately, this power has the corresponding penalty of making some programs much harder to understand and to reason about. In addition, in common with dynamic behaviour of any sort, some run time overheads are incurred in the management of the dynamic naming.

Not all message passing systems operate by using pairs of explicit send and receive commands. In an implicit receipt system, when a message arrives for a thread, a new thread of control is created to automatically invoke the receive procedure. Such a mechanism is useful in the construction of server processes that contain multiple threads of control.

Many parallel programming languages have chosen message passing as their interthread communication mechanism. For example, one of the early notations, CSP [Hoar78], uses synchronous message passing via direct statically named channels. Thus, an output command takes the form of destination!expression, where destination is a thread name and expression is a simple or structured value. An input command takes the form of source?target, where source is a thread name and target is a simple or structured variable local to the thread containing the input command. Two threads communicate if they execute a matching pair of input/output statements (target and expression match if they have the same type).


The result of the communication is that the value of the expression is assigned to the target variable, with both threads then proceeding asynchronously.
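
On a shared memory machine, this synchronous exchange can be mimicked by a channel built from a POSIX mutex and condition variable, as sketched below; chan_send plays the part of destination!expression and chan_recv of source?target, with both parties blocking until the matching command is executed. The type channel and the function names are invented for illustration.

    #include <pthread.h>

    typedef struct {
        int value;
        int ready;                        /* a value is waiting to be taken  */
        int taken;                        /* the receiver has copied it      */
        pthread_mutex_t lock;
        pthread_cond_t  cond;
    } channel;

    void chan_send(channel *c, int v)     /* destination ! v                 */
    {
        pthread_mutex_lock(&c->lock);
        while (c->ready)                  /* one exchange at a time          */
            pthread_cond_wait(&c->cond, &c->lock);
        c->value = v;
        c->ready = 1;
        c->taken = 0;
        pthread_cond_broadcast(&c->cond);
        while (!c->taken)                 /* block until the receive happens */
            pthread_cond_wait(&c->cond, &c->lock);
        c->ready = 0;
        pthread_cond_broadcast(&c->cond);
        pthread_mutex_unlock(&c->lock);
    }

    int chan_recv(channel *c)             /* source ? target                 */
    {
        pthread_mutex_lock(&c->lock);
        while (!c->ready || c->taken)
            pthread_cond_wait(&c->cond, &c->lock);
        int v = c->value;
        c->taken = 1;                     /* release the blocked sender      */
        pthread_cond_broadcast(&c->cond);
        pthread_mutex_unlock(&c->lock);
        return v;
    }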

Due to the large volume of research into distributed systems, many distributed programming languages have been proposed that exploit message passing for interprocess communication. The idea here is to hide the distribution of threads by restricting their interactions to message-based communication. (Other research has investigated the provision of virtual shared memory across a number of machines (e.g. Agora), though difficulties can arise if programmers are unaware of the non-uniform access times that occur.) Message-based languages include: Argus [Lisk82], Distributed Processes (DP) [Brin78], Gypsy [Good79], PLITS [Feld79], StarMod [Cook80], and SR [Andr82a]. Further information on distributed programming languages can be found in a recent survey [Bal89].

Message Passing Abstractions

There are some higher level mechanisms that are commonly associated with message passing systems. These represent convenient abstractions that make certain distributed programming tasks easier for the programmer. All of the abstractions described can be implemented with variations of the basic send and receive primitives.

Remote Procedure Call

The remote procedure call is a convenient mechanism for the implementation of client-server systems. The idea behind the client-server model is that resources are completely encapsulated and administered by threads of control known as servers. Other threads that wish to access a resource, the clients, send messages to the server thread instructing it to perform the desired operations on the resource, possibly returning results. Remote procedure calls are often used in distributed systems to hide the fact that clients and servers can be located on separate machines.

Remote procedure calls can take the form of "call service(value-args, result-args)", where service is the name of a channel, value-args is the data sent to the appropriate server, with result-args being the values returned. On the other side of the service, there are two basic approaches to designing server interfaces. In one approach the remote procedure is declared like a normal procedure with annotations to say that it is a server, and to indicate which parameters are used to supply inputs and which should return results. At run time the remote procedure

is executed as a server thread which waits for the receipt of a message. A thread that wishes to invoke a remote procedure sends a message to the server and then blocks until the results of the remote procedure call are returned. When the server receives a message, it examines its input parameters, executes its body, and then returns a reply message via its result parameters.

A different approach for implementing servers is taken in the Ada programming language [DoD80]. Threads of control in Ada are called tasks, and remote procedures are called entries, which are ports into a server thread specified by means of an accept statement. Ada also offers another mechanism for interthread communication, which is that tasks declared in the same procedure can share variables. An example of an Ada task, named ResourceController, designed to control access to a given resource is presented in Figure 2.15.

task ResourceController is
    entry GetControl;
    entry RelinquishControl;
    entry GetStatus(Y : in out INTEGER);
end ResourceController;

task body ResourceController is
begin
    loop
        select
            when Condition1 =>
                accept GetControl;          -- entry GetControl's statements
        or
            when Condition2 =>
                accept RelinquishControl;   -- entry RelinquishControl's statements
        or
            when Condition3 =>
                accept GetStatus(Y : in out INTEGER);   -- entry GetStatus' statements
        else
            -- default statements
        end select;
    end loop;
end ResourceController;

ResourceController.GetControl;
-- use the resource
ResourceController.RelinquishControl;

Figure 2.15 - Ada resource controller task.

The resource controller loops indefinitely accepting calls to the entries GetControl, RelinquishControl and GetStatus. When the select statement is executed, each of the boolean guards is evaluated. If a guard is true, an accept sequence is said to be open. (If several sequences are open, Ada does not specify

which entry will be accepted.) Here, the guards are used to ensure that, once a call to GetControl has been accepted, only calls to RelinquishControl will be accepted - enforcing mutual exclusion on the resource and implying that GetStatus can only be called when no task is using the resource. If several tasks call GetControl simultaneously, only one call will be accepted and the other calls will wait in a first-come-first-serve queue. This remote procedure call mechanism is called a rendezvous and an example of a call to GetStatus is shown in Figure 2.16.

Figure 2.16 is a timing diagram of the rendezvous: the client task calls the GetStatus entry of the server, passing parameter X, and then waits; the server waits at its accept statement until a call to GetStatus arrives from some task; during the rendezvous the server executes the body of GetStatus and sends the value of Y back to the client; both tasks then resume.

Figure 2.16 - Ada Rendezvous.

In a rendezvous, if the first task does not wish to wait for the second task, it has the option of doing something else and trying to establish the rendezvous again later. This is achieved either by polling the other task or by using timeouts to check if the other task is ready to rendezvous.

Atomic Actions

Just as individual statements can be regarded as being executed atomically, arbitrarily large operations can be executed atomically if they are packaged in atomic actions. A major use of atomic actions is the implementation of transaction


processing on distributed systems, particularly in connection with fault-tolerance [Kohl81]. An atomic transaction [Reed79, Lamp81] is an all-or-nothing computation that either succeeds, installing a complete collection of changes to some variables, or aborts, installing no changes. (For purists, atomic actions also have a number of secondary requirements related to partial failures, though these are not discussed here.)

The motivation behind atomic transactions is that in a concurrent environment many independent objects may have to be updated at the same time so that the information they hold remains consistent. Atomic actions provide a method of wrapping up a group of operations so that they are seen to execute together as an indivisible operation. Implementations of atomic actions often use variants of the two-phase commit protocol to synchronise the changes to objects. Even so, atomic actions are not a foolproof synchronisation mechanism, as problems similar to those found in nested monitor calls can occur in nested atomic actions. Additional mechanisms can be employed, however, to circumvent these problems by defining the relationships between atomic actions [Shri90].

2.3.5 High Level Models of Parallelism

The parallelism mechanisms that have been discussed up until now have been viewed more or less as extensions to existing languages providing functionality for parallel programming. The models presented in this section, however, take more radical views of parallelism, sometimes using no explicit parallelism commands at all. The models described briefly here are: Linda, vector languages, object-oriented languages, functional languages, logic languages, and dataflow languages. These languages represent a large proportion of the higher level models used for parallel programming, but do not represent all of them, as special purpose mathematical languages such as DINO [Rosi89] and DEQSOL [Kono89] are not covered.

Linda Primitives

The Linda primitives [Gele82, Carr85] provide a method of communication between threads that is both flexible and portable. The primitives form a parallel processing language that can be inserted into almost any programming language, with implementations having been successfully performed on both shared memory and distributed memory multiprocessors. The model assumes the existence of a tuple space, which is, conceptually, a globally accessible associative memory. The


tuple space can be manipulated by the four primitives in(), out(), rd() and eval(), defined as follows:

in(..)    returns a matching tuple and withdraws it from the tuple space,
rd(..)    returns a matching tuple but leaves it in the tuple space,
out(..)   adds a tuple to the tuple space,
eval(..)  forks a new thread of control to add a tuple to the tuple space.

The first three primitives execute atomically so that race conditions do not occur when adding or removing tuples from the tuple space. The new thread that is created by eval() is not executed atomically, however, as it may perform many operations during the course of its execution, including any of the Linda primitives.

Tuples can be thought of as ordered collections of fields, where fields can be of any primitive type. For example,

5               a tuple of one integer field,
"hello"         a tuple of one field of characters, i.e. "hello",
"A", 1, 2, 3.1  a tuple of four fields of types character, integer, integer, real.

Tuples are entered into the tuple space by means of the appropriate Linda primitives and remain there until they are withdrawn or the program ends. It is possible to have multiple tuples with the same template and even with exactly the same values in their fields. When this happens, identical tuples are treated as being indistinguishable from each other. Tuples are withdrawn from the tuple space by matching them against a supplied template. For a match to occur all of the following conditions must hold:

• a tuple and a template must have the same number of fields,
• corresponding fields of the tuple and template must be type consonant,
• corresponding data items must be equal,
• there must be no corresponding formal fields, i.e. all fields have values.

If a rd() or an in() command is executed and there is no matching tuple in the tuple space, the calling thread is suspended. However, if a matching tuple is found, each field of the tuple is bound to its corresponding field of the template. In the case of there being many possible matches for a template, a tuple is selected at random. Non-blocking versions of rd() and in() are also supported which return the value one and a matching tuple, or the value zero otherwise.
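
To make the blocking behaviour concrete, the sketch below implements a toy tuple space in C with POSIX threads, restricted to tuples of the form ("tag", integer); ts_out, ts_in and ts_rd mirror out(), in() and rd(), with the latter two suspending until a matching tuple is deposited. Real Linda tuples are typed and variable-length and matching is far more general; all names here are invented for illustration.

    #include <pthread.h>
    #include <string.h>

    #define MAXT 256
    static struct { char tag[32]; int value; int used; } space[MAXT];
    static pthread_mutex_t ts_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  ts_changed = PTHREAD_COND_INITIALIZER;

    void ts_out(const char *tag, int value)        /* add a tuple             */
    {
        pthread_mutex_lock(&ts_lock);
        for (int i = 0; i < MAXT; i++)
            if (!space[i].used) {
                strncpy(space[i].tag, tag, 31);
                space[i].value = value;
                space[i].used = 1;
                break;
            }
        pthread_cond_broadcast(&ts_changed);       /* wake blocked in()/rd()  */
        pthread_mutex_unlock(&ts_lock);
    }

    static void ts_match(const char *tag, int *value, int withdraw)
    {
        pthread_mutex_lock(&ts_lock);
        for (;;) {
            for (int i = 0; i < MAXT; i++)
                if (space[i].used && strcmp(space[i].tag, tag) == 0) {
                    *value = space[i].value;
                    if (withdraw) space[i].used = 0;   /* in() removes it     */
                    pthread_mutex_unlock(&ts_lock);
                    return;
                }
            pthread_cond_wait(&ts_changed, &ts_lock);  /* suspend until out() */
        }
    }

    void ts_in(const char *tag, int *value) { ts_match(tag, value, 1); }
    void ts_rd(const char *tag, int *value) { ts_match(tag, value, 0); }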


Vector Programming Languages

As the dependency analysis involved in vectorisation can, in some circumstances, be very complicated, vector-level parallelism can be exploited explicitly by vector programming languages. Many such languages have been proposed (e.g. VECTRAN [Paul75]), though most are machine dependent (e.g. Illiac IV CFD Fortran), but Actus [Perr79] contains a suite of constructs that try to abstract away from the underlying hardware while still retaining efficiency for parallel execution. A brief description of Actus is presented which outlines some of its parallel processing features. This is followed by a short overview of Fortran 8X, which contains many Actus-like features for parallelism but extends them to multidimensional arrays.

Actus allows the declaration of vectors, arrays and related parallel constants such as sequences. For example,

scalar   array[1..m, 1..n] of integer;   { two dimensional integer array }
parallel array[1:m, 1..n] of integer;    { one dimensional array (1..n) }

Furthermore, as Actus is based on Pascal, vectors can be of compound types (i.e. records) and can be indexed by any ordinal type. Assignment and arithmetic operators in Actus are overloaded so that they can be applied to vector operands. Moreover, existing Pascal constructs, such as if, case, while and for, are overloaded to operate on either vectors or scalars. For example,

A[1:N] = B[1:N] + C[1:N];      { vector addition/assignment }
if A[1:N] > 0 then             { test each element of A[1:N] }
    A[#] = A[#] - 1            { for those elements > 0 }
else
    A[#] = A[#] + 1;           { for those elements <= 0 }

A sequence is a parallel constant, which can be used in vector assignment, with its template for declaration being "parconst identifier = start:[increment] finish;". For example,

parconst biglist = 1:100;        { values 1..100 }
parconst gaplist = 10:[10]100;   { values 10, 20, 30, .., 100 }

One of the most important features of the language is the index set, which is used to identify individual data elements from vectors. Index sets can be manipulated like conventional sets by arithmetic operators (e.g. +, -, *), and link neatly with the specialised Actus control constructs any, all and within. For example,


within 1:100 do                       { in the range of values 1..100 }
    if A[#] < 0 then A[#] = A[#] + 1;     { test each element, assign if true }

if any (B[1:N] > A[1:N]) then         { succeed if any are true }
    B[1:N] = 0.0;

if all (C[1:N] > A[1:N]) then         { succeed if all are true }
    C[1:N] = 0.0;

As mentioned earlier, the Fortran 8X language also contains many facilities for supporting parallel execution. The idea behind Fortran 8X is to make parallelism in programs easier to declare, detect and therefore exploit. It does this by allowing operations to be applied to entire arrays and by providing array manipulating functions (similar to APL) which can be internally parallelised.

For example, "REAL, ARRA Y( -10:5) :: VECTORl", declares an array called vectorl, size 16, with subscripts ranging from -10 to 5. The shape of an array is defined by its rank (number of dimensions) and its extents (sizes of those dimensions). Fortran 8X supports the reshaping of arrays and the specification of subarrays from larger ones. In addition, elements can be selectively masked.

Array constants are supported by a variety of notations; for example, [0,0,0,1,0,0,0,1] can be written as [2[3[0],1]] or [2[0,0,0,1]]. Furthermore, array ranges have shorthand notations, for example, [1,2,3,4,5] = [1:5] and [2,4,6,8] = [2:8:2], where the final digit is the step factor.

Assignment and arithmetic operators are overloaded to work with arrays, and many built-in array handling functions are provided. For example,

ALL      if all elements are .TRUE. the result is .TRUE.,
ANY      if any elements are .TRUE. the result is .TRUE.,
COUNT    number of .TRUE. elements,
MAXVAL   maximum value in an array,
MINVAL   minimum value in an array,
PRODUCT  multiplies all elements together,
SUM      sums up all elements.

Further operations are included to check on the sizes of the extents of arrays and to perform array storage management operations (e.g. PACK and UNPACK).

Other research is also looking at specifying parallelism through the use of array valued operations, culminating in a language called Booster [Paal89]. Booster is a sub-language centred around data structures and their methods of access, and has been designed to be used with existing imperative languages like C and Fortran. In Booster an array is represented by a shape and

can be accessed by means of several user-defined views. Further syntax is provided for accessing the individual elements of a shape. Thus, the intention is to provide a method for specifying the operations to be performed on a shape in an understandable yet abstract setting, so that an optimising compiler for an arbitrary parallel architecture can extract the maximum amount of parallelism from a program.

Finally, a generalisation of the idea of vector and matrix programming can be found in the data-parallel programming model [Breu88, Rose87]. Data-parallel computation is aimed at exploiting the parallelism offered by SIMD array processors, such as the Connection Machine, and therefore uses a notion of tightly coupled, fine grain computation. Basically, programs are written in an imperative language, such as C, which has been extended to allow the explicit declaration of parallel data objects and incorporates a suite of operators (some new, some overloaded) to manipulate such parallel objects. Parallel data objects can be structured types like vectors or matrices, or can be made up from groups of individual records. A program begins execution on the master processor of a SIMD machine operating sequentially, but any parallel objects used by the program are distributed over the SIMD processors, being held in local memories, so that operations involving the parallel objects can be executed concurrently. Thus, the data-parallel model operates at a higher level of abstraction than simple vector operations by taking advantage of the greater functionality and the local memory of each processing element.

Object-Oriented Languages

Object-oriented languages, such as Smalltalk, are traditional languages in which the set of variables defining a program's state is partitioned into small subsets called objects. An object-oriented programming language enables programmers to define new classes of object, where each object is an instance of one class. The internal state of an object is represented by a collection of instance variables as defined by the class. Each class defines a set of named operations that can be performed on instances of that class. Objects interact with each other by sending messages. A message causes one of the recipient object's procedures to be activated, possibly making it change state, and may result in further messages being sent to other objects.

The construction of software using object-oriented techniques is distinguished by its ability to reuse program code by: (i) supporting data abstraction (via encapsulation), (ii) supporting generic operations, and (iii) using inheritance to


derive new objects. For example, if a class C directly inherits from a class B, then class B is the parent and C is a child class. Object-oriented languages therefore possess a collection of powerful mechanisms for program construction that may well cause the von Neumann languages to be usurped as the standard programming model.

Object-oriented programming is one of the major research areas in parallel programming as it combines parallelism with leading edge programming language research. Many models of parallelism (e.g. Actors [Agha86]) are based around the idea of collections of objects that execute concurrently. Although it is not possible to cover all the models here, a short list of some of the major techniques is presented. To obtain parallelism in an object-oriented model one can: (i) allow objects to be active without receiving a message; (ii) allow a receiving object to continue after it produces its result; (iii) send messages to several objects at once; and (iv) allow the sender of a message to proceed in parallel with the receiver. (This happens until the sender actually uses any result from the receiver, at which time the sender blocks until the result is produced.) In addition, some models of parallelism allow an object to be multithreaded and so internally exploit parallelism.

However, although object-oriented languages are a good starting point for parallel programming, this pathway to the production of effective parallel programs is not always an easy road to follow. For instance, partitioning a program into concurrently executing objects can be a difficult task, suffering from the same problems encountered in the construction of message passing programs. Likewise, if an object utilises multiple threads of control, this can sometimes be undesirably exposed by derived objects, thus breaking the rules on object encapsulation [Snyd87]. This is especially true for parallel object-oriented languages that are mere extensions of existing sequential languages, though purpose-designed parallel languages do fare better. Object-oriented languages have often been seen as a good way of expressing distributed programming applications and this has given rise to the development of many such languages. Some examples of parallel object-oriented languages include: Beta [Kris87], ConcurrentSmalltalk [Yoko87], Emerald [Hutc87], Orca [Bal88], and POOL-T [Amer87].

Logic Languages

Logic programming languages are declarative languages in which programs are sets of relationships between terms, i.e. they are Horn clauses involving

objects such as constants, variables and structures. Constants are represented by strings of alphanumeric characters that start with lower case letters, while variables start with upper case letters. Relationships are either stated as facts (predicates) or rules. A fact states that a relationship holds between two terms

(e.g. father( , pascal).). A rule takes the form A ← B1, B2, ..., Bn, which means that A holds if B1 holds and B2 holds and so on (e.g. grandfather(X,Y) ← father(X,Z), father(Z,Y).). The head of a clause is the term which is implied to be true, i.e. in a rule it is the term to the left of the arrow and in a fact it is the term itself.

The execution of a logic program is a query to see if a specific relationship holds between certain terms, which is known as the goal. Clauses are activated by a process called unification [Nils80] which finds the most general substitutions of variables that make two expressions identical. A clause is called if its head matches the current goal. When this happens the predicates in its body are called sequentially. If they all succeed, the clause is exited as in a conventional procedure call, otherwise, if a predicate fails, the system backtracks to the most recent decision point and follows an alternative path from that point. If no paths lead to success then the clause has failed. The total path traced out by the system is called the execution graph.

The most well known logic programming language is Prolog [Ster86], but what is somewhat surprising is that there are three distinct generations of the language, Prolog I-III. In Prolog II, extensions are made to allow the expression of the idea of difference (as opposed to equality) and to allow the delayed execution of goals. In Prolog III, rules can be annotated with constraints to speed up their execution by rejecting incorrect solutions as soon as possible. The solution of such constraints can itself be used as the basis for a programming model and is considered to be a highly parallel activity. Some research has been carried out into the parallel execution of constraint languages [Bald87b] though this is not discussed further here.

At least three types of parallelism can be exploited implicitly by the parallel implementation of logic programming languages:

• and-parallelism,
• or-parallelism,
• unification parallelism.


And-parallelism exploits the fact that for a rule to be true, its subgoals must evaluate to true. From the previous example, "grandfather(X, Y)" is only true if "father(X,Z)" is true and "father(Z,Y)" is true. Hence, both queries can be explored in parallel, returning false if either one or both fail, or returning true if they both succeed. To implement and-parallelism the variables of a clause must be able to be typed as input or output variables, so that the information flow between threads of control can be determined. This allows the construction of thread pipelines of the form "produce( ... , X), consume(X, ... )".

Or-parallelism exploits the fact that for a rule to be true, at least one of its subgoals must evaluate to true. Hence, separate branches of a query can be explored in parallel, returning true if one or both succeed, or false otherwise. As all the alternatives for a goal are examined simultaneously, there is no need for backtracking to be used. However, the implementation of or-parallelism is quite tricky because separate branches of a query can make independent bindings to variables that are shared between the branches. Thus, some form of scoping must be enforced so that queries following separate branches do not interfere with one another. In addition, if all the separate branches of a program were to be explored in parallel, large demands for memory can be created (as each thread of control must maintain its own copies of shared data). Thus, as a solution, a fixed number of threads of control can be used, but a load balancing mechanism should exist to enable threads with lots of available work to release some of it to otherwise idle threads of control.

In unification parallelism, the mechanics of the pattern matching that underlies the execution of logic programs is parallelised. If a program consists of many clauses then it is natural to examine them in parallel to speed up the unification process. However, this is not always a straightforward task to perform, as in conventional Prolog the ordering of clauses is significant and there is a control flow command called the cut. Moreover, due to the optimised nature of the unification process very little parallelism can be effectively exploited, because of the small grain size of the work, and arguments have also been put forward in favour of sequential unification [Mitz86].

As noted above exploiting parallelism in logic languages is not an easy task, as many semantic and implementation difficulties can arise. Some research has focussed on automatically parallelising conventional Prolog, while other research has taken an easier option of defining a subset of Prolog and parallelising that (e.g. Datalog). By way of contrast many researchers have proposed extensions and

new logic languages to enable explicit parallel programming. For example, CS (Communicating Sequential) Prolog uses a notion of time and can suspend the execution of goals until certain conditions are met. Concurrent Prolog extends sequential Prolog by introducing read-only annotation of variables (shared logical variables) and a commit operator "|". Resulting programs are a finite set of guarded clauses which operate in a fashion that closely resembles guarded commands, with the resulting synchronisation mechanism being that threads are suspended on undetermined read-only variables. For example, if separate threads are used to evaluate each of the goals "goal1(X,Y), goal2(X,Z), goal3(X)", then if any thread binds X, its value is immediately made available to the other two threads. However, problems arise in the implementation of the guarded commands. The semantics of the guarded commands dictate that there should be no effects on a program's environment until after a guard has committed. But, in languages that allow guards to bind shared variables, conflicts can arise due to multiple threads trying to bind the same variable. This can be resolved by excluding variables from guards, as in Flat Concurrent Prolog; or by making temporary bindings and then making them permanent on commitment, as in Parlog; or, as in Concurrent Prolog, by making any changes visible outside the clause only if the guard commits.

Multiple messages can be passed between threads by using a form of shared logical variable known as a stream. A stream consists of two parts, a head, which is a message, and a tail, which is a stream. Hence, because of the recursive definition of streams they can be used to send an unlimited number of messages. Further information on the semantics and the development of concurrent logic programming languages can be found in a survey paper by Shapiro [Shap89].

Functional Languages

At present there are many avenues of research into parallel programming based on functional languages. Functional languages support a model of computation that is based upon pure mathematical functions. There is no notion of assignment, variables, or control flow mechanisms such as conditional or iterative constructs. Programs are entirely composed of expressions and functions which can be evaluated by a process called graph reduction (mentioned earlier). In graph reduction, no parallelism constructs are used to direct program execution, as parallelism comes naturally from the process of reducing a program graph. Hence, programmers need not be concerned with performing detailed program tuning, nor is there any need to originate new parallel programming languages. However,

an arbitrary functional program is not guaranteed to have many parallel reductions, even if the application being programmed could be phrased as a parallel problem.

Other work in the automatic parallel execution of functional programs has focussed on parallelising Lisp dialects such as Scheme. In the approach followed by ParaTran [Tink88] a preprocessor introduces parallelism into sequential code at points determined by static analysis. Normally, analysis of this kind is quite hard to carry out effectively in languages like Lisp due to their use of pointers. However, ParaTran optimistically schedules code for parallel execution without regard for possible side-effects resulting from accesses to shared data. At run time, though, the ParaTran system detects and corrects data dependency violations using an automatic history and rollback mechanism. Nevertheless, the parallelism offered by auto-parallelisation and graph reduction is only algorithmic parallelism (i.e. that which is inherent in the algorithm). Hence, other approaches to parallel functional programming follow more aggressive courses of action in the exploitation of parallelism, allowing the coding of explicitly parallel algorithms.

Parafunctional Languages

The next stage in complexity after implicit parallelism is an approach using what are termed parafunctional languages. These languages accept functional programs annotated with commands to control the location and sequencing of operations. In the ParAlfl language [Huda87], annotations take the form of mapped expressions and synchronising expressions. Mapped expressions are used for indicating the points in a program where parallelism can be exploited. For example, the sequential expression f(x) + g(y) can be written as the parallel form (f(x) on 0) + (g(y) on 1), where 0 and 1 are processor identifiers. This is a static declaration of parallelism and has the drawback that it binds the functions to specific processors, so reducing portability. A more flexible form of the expression can be written as (f(x) on left(self)) + (g(y) on right(self)), where self is a processor identifier and left and right are functions that return the identifier of the processor in the specified direction (assuming a mesh layout of processors).

ParAlfl uses lazy evaluation, in common with other functional languages, which means that expressions are not evaluated until they are needed. In order to increase efficiency, however, eager evaluation can be used to force evaluation of expressions before they are needed. Synchronising expressions provide a mechanism for imposing an ordering on a sequence of events. For example, the


expression "synch (ab) in a: f(x) + b : f(y)", indicates that the operation labelled a should proceed the operation labelled b. In the following example the synchronisation expression permits zero or more evaluations of the expression labelled a followed by the expression labelled b.

synch (ab)* in f(l) + g(l)
    where f(lst) = if null(lst) then nil else ... a : f(tail(lst)) ...
          g(lst) = if null(lst) then nil else ... b : g(tail(lst)) ...

The kind of lock-step behaviour generated by synch expressions is similar to that specified by path expressions and regular expressions.

Parafunctional languages separate the functional aspects of a program, that are concerned with the computation, from the operational aspects (i.e. parallelism). In common with implicit parallelism, parafunctional programs contain no side-effects, meaning that timing issues are not a problem. Furthermore, program annotations can be considered to be natural and concise, do not affect the correctness of a program, and are portable to a number of functional languages.

Extended Functional Models

Conversely, many functional programming languages have been modified, or extended, to include explicit parallel programming commands. Blaze [Meth85] is a Pascal-based language for parallel scientific programming that supports algorithmic parallelism as well as explicit parallelism through the use of a parallel loop construct. Modified languages include MultiLisp [Hals87] and Qlisp [Gold88], which contain several mechanisms for supporting parallel execution. For example, Qlisp provides the qlet and qlambda constructs to fork and join threads of control, with locks and events being used for additional synchronisation. In addition, there is the catch and throw control flow construct which is used to implement dynamic exits, and the futures control construct (first described in MultiLisp). Futures allow a thread of control to manipulate unevaluated objects as long as the objects are referenced indirectly (through pointers). A thread that directly references an unevaluated object will block until the object has been evaluated. Unfortunately, the use of these explicit parallel programming mechanisms can be just as problematic as similar mechanisms in imperative languages, resulting in common difficulties such as deadlock and non-serialisability.
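
As an illustration of the blocking behaviour of futures described above, the following sketch builds a future out of threads of control on a shared memory multiprocessor. C and the POSIX threads interface are used purely for exposition; the names (future_t, future_create, future_touch) and the squaring computation are invented, and this is not the MultiLisp or Qlisp implementation.

    /* A minimal sketch of a future: a placeholder whose value is computed by a
       separate thread of control; touching the value blocks until it is ready. */
    #include <pthread.h>
    #include <stdio.h>

    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  ready;
        int             resolved;   /* has the value been computed yet?              */
        long            value;      /* the value itself (an integer, for simplicity) */
        pthread_t       worker;     /* thread of control evaluating the future       */
        long            arg;
    } future_t;

    static void *evaluate(void *p)
    {
        future_t *f = (future_t *)p;
        long v = f->arg * f->arg;           /* stands in for an expensive expression */
        pthread_mutex_lock(&f->lock);
        f->value = v;
        f->resolved = 1;
        pthread_cond_broadcast(&f->ready);  /* wake any thread that touched the future */
        pthread_mutex_unlock(&f->lock);
        return NULL;
    }

    void future_create(future_t *f, long arg)
    {
        pthread_mutex_init(&f->lock, NULL);
        pthread_cond_init(&f->ready, NULL);
        f->resolved = 0;
        f->arg = arg;
        pthread_create(&f->worker, NULL, evaluate, f);  /* evaluation proceeds in parallel */
    }

    long future_touch(future_t *f)          /* direct reference: block until evaluated */
    {
        pthread_mutex_lock(&f->lock);
        while (!f->resolved)
            pthread_cond_wait(&f->ready, &f->lock);
        pthread_mutex_unlock(&f->lock);
        return f->value;
    }

    int main(void)
    {
        future_t f;
        future_create(&f, 12);              /* the future can be passed around freely */
        printf("%ld\n", future_touch(&f));  /* only a direct reference blocks         */
        pthread_join(f.worker, NULL);
        return 0;
    }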


Dataflow Languages

Dataflow languages are quite similar to functional languages, with the exception that only one assignment can be made to each value. Statements can be executed as soon as their operands become available, which means that once again algorithmic parallelism can be exploited. Some examples of early dataflow languages include LUCID [Wadg85] and VAL [McGr82].

The single assignment rule means that side-effects cannot cause problems, and in common with functional languages, locality of effect can be exploited when executing programs. That is, once a value is fixed it cannot be changed anywhere else in a program therefore limiting the interactions between program segments. Moreover, dataflow languages do not have the aliasing problems encountered with procedural languages. Unfortunately, early dataflow languages encountered problems in efficiently supporting large data structures such as arrays, because a completely new copy of such a data structure needed to be produced after every write operation on the data structure. In response to this problem newer dataflow languages such as ID [Arvi88] make use of data structures called I-structures, which allow an array to absorb one write per array element - thus improving storage management efficiency.

Other prominent work has been carried out in the parallelisation of SISAL programs [McGr83], taking the form of translating a SISAL program into an intermediate code for subsequent compilation and optimisation [Feo90]. Optimisations are geared towards efficient storage allocation, the elimination of extraneous copy instructions, and of course parallelisation. After optimisation, the intermediate code is translated into C code for final compilation into an executable program. Such executable programs have been demonstrated to have performance comparable to similar sequential Fortran, with promising results also having been obtained from parallel runs [Feo90]. One important feature of the SISAL compilation process is that with the output consisting of C code, SISAL programs can be effectively targeted to a wide range of parallel architectures and take advantage of the optimising compilers available for those machines. Other research into dataflow languages includes Strand, which uses a parallel model of statement execution.

Strand

Although the syntax of Strand [Fost90] makes it appear like a logic language, and indeed it can trace its origins back to PARLOG, it is in fact a functional

language. The computational model it uses dispenses with backtracking and states that all statements that are activated in a program execute concurrently. In practice, activated statements are suspended until their value operands become available - as with conventional dataflow. However, due to performance limitations in the Strand virtual machine, statements are only executed in parallel if they are annotated in a fashion similar to that employed in parafunctional languages. Hence, the onus of identifying useful parallelism is transferred to the programmer, with the system dumbly following those instructions.

There are three methods whereby parallelism can be exploited with Strand. Fine grain parallelism can be exploited by executing individual statements in parallel. Medium grain parallelism can be exploited by executing procedures in parallel that communicate via streams. Finally, coarse grain parallelism can be exploited by using the foreign language interface. This allows a Strand program to act as a parallel harness to call and schedule procedures written in other languages.

The notation used by Strand is similar to that used in Prolog, which is somewhat confusing to readers as program interpretation is quite different. Strings starting with upper case letters are variables and those starting in lower case are constants. Synchronisation in Strand is achieved by the use of single assignment variables, which suspend threads that read them until the values of the variables are determined. Figure 2.17 shows a full implementation of a bounded buffer written in Strand. Communication is realised between the producer and consumer by the use of a shared stream called Buffer. Initially, Buffer has three free slots which will accommodate three messages before the producer thread has to wait for the consumer thread to read those messages and extend the buffer.

The interpretation of the program is as follows. When the producer thread is first called, the parameters in its invocation are matched against the first definition of the producer body. As the parameters match those in the formal template, the guard test of the first definition of the producer body is evaluated. As this succeeds the first time round (N=100), the corresponding body is executed. This code generates a new message then calls itself to generate the remaining messages, remembering that in Strand recursion does not involve backtracking. Meanwhile, when the consumer thread is first called, its parameters are matched against those in the first definition of the consumer body.


    % invocation code
    Buffer := [M1, M2, M3 | Tail]        % create a buffer with 3 slots and a tail
    producer(100, Buffer)                % start producer generating 100 messages
    consumer(Buffer, Tail)               % start consumer

    % code for the producer
    producer(N, [Message | Tail]) :-     % wait for buffer space
        N > 0 |                          % check generation count
        NewN is N - 1,                   % decrement count
        Message := msg,                  % generate a message
        producer(NewN, Tail).            % recurse to generate remainder

    producer(0, [Message | Tail]) :-     % no more messages to send
        Message := done.                 % close stream

    % code for the consumer
    consumer([msg | Tail], NextSlot) :-  % wait for message
        NextSlot := [X | OtherSlots],    % extend buffer
        consumer(Tail, OtherSlots).      % consume rest of messages

    consumer([done | OtherSlots], _).    % terminate

Figure 2.17 - Bounded buffer in Strand.

If [msg | Tail] is found to be unbound when the match is made, the consumer thread waits until it receives a value (i.e. from the producer) before continuing. When the consumer thread executes its body, it turns out that, in this case, the message is ignored and only the buffer is extended.

Eventually, when the producer thread recurses and the guard test of the first definition fails (i.e. when N=0), the second definition of the producer body is matched and executed. This results in a special data value being sent to the consumer signalling the end of data. When the consumer thread comes to read this message, its secondary body definition is matched and executed (which in this case is empty).


Chapter 3 Parallelism Issues

The aim of this chapter is twofold: firstly, to explore what can be expected from the application of parallel processing to the execution of programs, and secondly, to discover how parallelism can best be exploited on shared memory multiprocessors. The construction of effective parallel programs relies on three things:

• a sufficient set of programming constructs must be provided to allow the coding of a desired algorithm,
• a collection of paradigms for parallel programming should exist, to enable the explicit design of parallel programs,
• there should be some guidelines about matching paradigms to constructs, with references to the efficiency of one paradigm over another for solving a given type of problem.

The first section of this chapter describes a measure for characterising parallel computations in terms of the ratio of useful work performed by a thread of control to its synchronisation overhead. This measure is very helpful when matching parallel algorithms to parallel programming language constructs and eventually to parallel architectures. Subsequent sections describe the different approaches to parallel programming and relate the types of parallelism that they exploit to the architectures to which those approaches are most suited. After this, the faults that can arise in parallel processing are discussed followed by a description of a set of requirements that the designers of parallel programming mechanisms may wish to follow. These requirements are intended as a set of guidelines rather than golden rules, because some of them are in some sense in conflict with each other, as in the case of efficiency versus safety. Thus, the final choice over which requirements to adhere to is partly specific to the problem being solved and also somewhat subjective. The final section in the chapter summarises the approaches that have been used to enable parallel execution, such as auto-parallelisation and explicit parallel languages, discussing what it means to adopt one approach in preference to another.


In a paper exploring the limits to the efficiencies of parallel computations Cvetanovic lists eight major factors that can influence the performance of a parallel program executing on a given multiprocessor architecture [Cvet87].

(1) The amount of parallelism inherent in the application.
(2) The method for decomposing a problem into smaller subproblems.
(3) The method applied to allocate these subproblems to processors.
(4) The grain size of a subproblem executed on each processor.
(5) The possibility of overlapping processing with communication.
(6) The data-access mode, where data items are accessed either directly from global memory, or first copied to local memories then accessed from there.
(7) The interconnection structure (hardware).
(8) The speed of processors, memories, and an interconnection network.

The first five factors in the list are concerned with the properties of the problem being solved and the way in which the problem is programmed. From the point of view of this thesis these are the most important factors, with the remaining three being fixed by the choice of multiprocessor architecture. Methods for decomposing problems into subproblems for subsequent execution in parallel and matters pertaining to how these subproblems are scheduled are discussed later in this chapter. However, besides the parallelism in the application itself, the grain size of the useful work that is performed in parallel can have the overriding say on the effectiveness of a parallel program. For in order to be efficient, a parallel programming mechanism must schedule the work that it defines as suitable for parallel execution into threads of control to be executed in parallel with as little overhead as possible.

3.1 Parallelism Granularity

One of the most frequently used measures applied to parallel computation is the grain size or granularity of the parallelism that is exploited. This is a measure of the number of instructions in a parallel computation between synchronisation points. Figure 3.1 is a modified table of grain sizes from an original table by Bell [Lee89].

For shared memory multiprocessors the lowest level of granularity that can be effectively exploited is medium grain, resulting from the not insignificant overhead costs of setting up and synchronising multiple threads of control. Of course, greater efficiency can be obtained by executing parallelism with larger grain sizes due to an increase in the ratio of useful work to parallelism overhead.


Grain Size     Synchronisation Interval (instructions)   Source of Parallelism                                          Typical Architecture

fine           < 10s                                     parallelism inherent in a single instruction or data stream    vector processors
medium         100s                                      parallel processing within a single application program        shared memory multiprocessors
coarse         1000s                                     multiprocessing of concurrent processes                        distributed memory multiprocessors
very coarse    100000s                                   distributed processing across local area networks              groups of workstations (multicomputers)
infinite       ∞                                         independent computations                                       all multiprocessors

Figure 3.1 - Grain size classification for parallel architectures.

Shared memory multiprocessors optimistically need O(100) instructions to set up a lightweight thread of control, with O(10) instructions to perform synchronisation. Parallel programming mechanisms that can spawn sufficient threads of control for the hardware to exploit the parallelism available in an application, while keeping the work given to each thread large enough to amortise these overheads, will be considered to be effective for shared memory multiprocessors. This idea is touched on again in the fourth and fifth chapters and is demonstrated experimentally in chapter six.

At the vector level, fine grained parallelism necessitates synchronisation after every vector operation, while for processes, synchronisation points in coarse grain parallelism do not occur for many thousands of operations. Figure 3.2 is a map that shows the relationship between the degree of parallelism available and the granularity at which it occurs [Alma89], in which shaded regions indicate hardware exploitation, with clear regions representing software parallelism. (The scale on the map is broadly representative of real figures but should not be taken literally in all cases.)

One of the main ideas contained within this map is that problems which are based on manipulating objects like arrays, traditionally operated on by iteration constructs, are good areas for looking to apply many processors in parallel. A natural way to achieve this is by spreading the object over the processors, though sometimes the amount of work done by a thread of control can be quite small. Alternatively, problems can be decomposed into functionally separate tasks; these, by way of contrast, are relatively few in number but individually tend to involve more instructions.


[Figure 3.2 plots the number of parallel streams (1 to 1K) against the granularity of parallelism in serial machine instructions (0.25 to 64K), locating loop iterations and program segments on the map together with the relative costs of synchronisation mechanisms such as busy-waiting barriers, Ada rendezvous, REP, task switching, static Doall and Fork/Join.]

Figure 3.2 - Parallelism/granularity map.

A variety of synchronisation mechanisms have been added to the map to indicate the relative cost of each mechanism. Those mechanisms that have hardware support take anything from one instruction to around twenty. Those that are more elaborate take tens to hundreds of instructions, while those that make operating system calls can take several thousand instructions.

If the grain size of the potentially parallel parts of a problem suggests that the problem is not suitable for parallel execution on a given architecture, operations can sometimes be blocked together to reduce overheads. For example, scheduling the individual parts of a vector instruction on separate processors of a shared memory multiprocessor does not sound very appealing, even if there are sufficient processors to enable a completely parallel (one-to-one) execution. However, blocking together many parts of the vector instruction so that each processor executes many such parts without stopping for synchronisation can lead to useful performance gains being obtained given sufficiently long vectors. This technique can be applied to many problems where the granularity of the parallelism is less than the optimum size for an architecture, if there are no inhibiting dependencies


between computations and there are sufficient operations to make blocking worthwhile. Several experiments with the grain size and blocking properties of sample problems are investigated in chapter six.
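
To make the blocking technique concrete, the sketch below divides a vector addition into one contiguous block per thread of control, so that each processor performs many element operations before the single synchronisation point at the end. C with POSIX threads is used purely for illustration; the vector operation, the thread count and all of the names are assumptions.

    /* Blocking a vector operation: each thread processes a contiguous chunk,
       synchronising only once when all chunks are complete. */
    #include <pthread.h>
    #include <stdio.h>

    #define N        1000000
    #define NTHREADS 4

    static double a[N], b[N], c[N];

    typedef struct { int lo, hi; } chunk_t;

    static void *add_chunk(void *p)
    {
        chunk_t *ch = (chunk_t *)p;
        for (int i = ch->lo; i < ch->hi; i++)   /* many operations, no synchronisation */
            c[i] = a[i] + b[i];
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        chunk_t   chunk[NTHREADS];
        int block = N / NTHREADS;

        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

        for (int t = 0; t < NTHREADS; t++) {
            chunk[t].lo = t * block;
            chunk[t].hi = (t == NTHREADS - 1) ? N : (t + 1) * block;
            pthread_create(&tid[t], NULL, add_chunk, &chunk[t]);
        }
        for (int t = 0; t < NTHREADS; t++)      /* the only synchronisation point */
            pthread_join(tid[t], NULL);

        printf("c[N-1] = %f\n", c[N - 1]);
        return 0;
    }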

3.2 Programming Styles and Constructs

Before discussing the methods by which parallel programs can be produced, a short overview of the styles of programming languages that can be used to code such programs is presented. Currently, now more than ever before, many models for programming are being investigated and used practically by researchers. For a given programming model, there are often several languages in use to cover its spectrum of thought, the exact number depending on that model's perceived importance. Parallelism is an issue in programming that cuts across established language boundaries, as it is deemed a feature of a programming language rather than an end in itself. Thus, as many different language styles purport to support parallel execution, the task of choosing the best one, or even a good one, is not easy.

In his book, Wadge in a slightly tongue-in-cheek way characterises five types of programmer by the languages that they use [Wadg85]:

• Cowboys who exploit the quick and dirty features of von Neumann languages and their underlying hardware.
• Wizards who use formal methods for program specification and design, together with logic and functionally based programming languages.
• Preachers who put their faith in adherence to structured programming to tame the excesses of von Neumann languages.
• Boffins who don't care about using a particular style of programming language but rely on software tools to produce correct programs.
• Mr. Fixit who believes the von Neumann languages are not fundamentally flawed and can be made more presentable by cleaning up some of their unpleasant side-effects.

These classes correspond quite closely to genuine approaches to programming philosophy, and illustrate the quite deep divisions that exist between the separate camps. Basically, there are three views on the shape of the near term future of programming. Firstly, there are the fundamentalists who believe in staying with traditional methods of programming because of the large investment that has been made in existing software and the training of personnel. Secondly, there are the evolutionaries who believe that in order to accommodate an ever expanding

and broadening range of demands, programming must change with the times. This does not entail the wholesale abandonment of old techniques, rather a change of emphasis and the inclusion of some new ideas. Thirdly, and finally, there are the revolutionaries who believe that traditional methods were a product of their time and the future for programming lies only in a complete break with the past. From one viewpoint this revolutionary message seems very attractive, because new models give a fresh start to include new ideas, and liberation from some of the problems that plague traditional programming. From another viewpoint, however, some of the promise held by these languages has not been realised because of difficulties in producing efficient software and hardware to execute these languages. Hence, there is often a high degree of inertia in switching to revolutionary models, as some people are loath to change from what they know (and think they understand) to more abstract programming models.

To reinforce the difference between evolutionary and revolutionary models, it is useful to classify programming languages as either of the imperative or declarative style. Broadly speaking, imperative languages represent the heart of traditional programming, with declarative languages representing the new blood (although the lambda calculus, which is the foundation for functional programming, has been around since the 1930's [Chur41]). Figure 3.3 shows the breakdown of programming languages as either imperative or declarative in nature [Bald87a].

Programming Languages
    Imperative Languages:  von Neumann languages, object-oriented languages
    Declarative Languages: functional languages, dataflow languages, logic languages

Figure 3.3 - Breakdown of programming languages.

Imperative (procedural) languages are those that are most widely used and understood. Languages such as Fortran, Algol and their descendants are imperative because their programs take the form of a series of steps (a solution path), which if executed correctly and in a valid ordering, will compute the solution to some problem. One criticism that has been levelled at imperative languages is that this need to specify an exact solution path by giving primitive

machine directives and having to consider the order of every program statement is an extra burden on programmers who could be more productively engaged in problem solving. Moreover, some solution paths are so complicated, and in some sense over specified and serialised, that potential parallelism in an algorithm is obscured and not exploitable by auto-parallelisation.

On a more positive side, imperative languages have attracted a reputation for being an effective programming model because they often allow programming down to the level of a bare machine in order to promote storage or execution efficiency. This can happen because imperative languages are based on direct abstractions from the von Neumann model and can let hardware features such as the existence of multiple processors and shared memory percolate up to the programming level. The control flow model of imperative languages is a flexible platform for exploiting parallelism, with a wide range of parallelism constructs having been developed (see chapter two). Moreover, often new control flow constructs can be added (with care) to imperative languages while maintaining the semantics of existing programming language constructs. Thus, in general, parallel algorithms can be expressed in whatever format thought most appropriate, enabling the grain size of an algorithm to be precisely and explicitly tuned to match that of the underlying hardware.

Declarative languages by way of contrast take a much higher level view of problem solving as they hide away the details of the underlying hardware. In such languages (e.g. Miranda and Prolog), only a goal and a sufficient set of relationships between program objects are required. These allow the abstract machines (interpreters) employed by declarative languages to generate solution paths by themselves. This bodes well for auto-parallelisation and for naturally parallel languages because there is not the over-specification found in imperative languages to restrict program analysis. However, the efficient realisation of parallelism can still be difficult because of problems in efficiently matching the potential parallelism in a program to its underlying hardware (in most cases von Neumann machines). Here, the abstract model used by a language may be a poor model for execution on a von Neumann machine, with the mismatch being exacerbated by the importance of matching the grain size as well. Furthermore, explicit parallelism constructs that can improve the matching between software and hardware generally have to follow a different approach to normal (e.g. parafunctional languages in chapter two), as declarative languages abandon the traditional notion of control flow.


If good (efficient) implementations of declarative languages are available, the sequential solution of problems that are basically procedural or algorithmic in nature can be executed as well as corresponding solutions written in sequential imperative languages. For example, searching and list manipulation problems are on the whole handled very well by declarative languages, with the advantage that the resulting programs are easier to read and write than their imperative counterparts. But, whether this is true for the implementation of parallel programs is an open question. A clearer point, though, is that for applications that interact with their environment (e.g. hardware devices) or applications that are controlled in real time, good solutions come from the flexibility of imperative languages because of their immediacy to their hardware environment. By the same token declarative languages often offer more convenient semantic notation to express complex behaviour to abstractly model the physical world, but nonetheless seem to lack the capability to link such a model to its environment without resorting to imperative style constructs. Thus, while declarative languages may offer the promise of easy parallelism, whether this is best exploited by shared memory multiprocessors or specialist hardware is not a cut and dried issue.

Merely saying that a programming language is imperative or declarative can imply some information about the nature of a language, but further insight can be gained from other classification schemes. Figure 3.4 shows Treleaven's classification, which groups programming languages according to their control mechanism (how instructions are sequenced) and data mechanism (how information is structured) [Trel82].

Control Mechanism    Data Mechanism: Shared Memory    Data Mechanism: Private Memory

control driven       von Neumann                      communicating processes
pattern driven       logic                            actors
demand driven        graph reduction                  string reduction
data driven          dataflow (I-structures)          dataflow (tokens)

Figure 3.4 - Treleaven's language classification.

Basically Figure 3.4 can be thought of as a high level summary of the types of programming language available to a parallel language designer. For control

driven languages, the main method of exploiting parallelism is by using explicit control constructs. Nevertheless, auto-parallelising compilers are rapidly becoming more widely available and will prolong the lifetime of existing languages by providing almost direct routes to portable parallel software. For the other types of control mechanism, parallelism has most often been exploited implicitly by an interpreter or run time system. This has happened because the model of evaluation, be it pattern, demand or data driven, only specifies the orderings of the events crucial for a program to work correctly, leaving the others to be determined by the run time system. Thus, these models at first seem to provide much better starting points for auto-parallelisation than control driven languages, however, sometimes the reality falls somewhat short of the dream as difficulties have been encountered in the implementation of such parallel run time systems (see chapter two). Hence, hybrid languages have been developed which combine multiple control mechanisms. Parallelism in these languages can be expressed explicitly by special control constructs such as in Qlisp, or implicitly such as in Strand which is pattern driven with data driven synchronisation.

Historically, the majority of work carried out in parallel programming has been focussed on control driven shared memory designs, although private memory designs now seem in vogue. Research into the other models of programming has been steady, though there is some reluctance to adopt one of these models when progress is still being made in improving the familiar control driven languages.

3.3 Algorithms and Models for Parallel Programming

When designing an effective parallel program it is productive to use a naturally parallel algorithm. This is true whether the parallelism is exploited directly by explicit parallelism constructs, or implicitly by a compiler or an interpreter. As luck would have it, the choice of the best algorithm is not always easy, as in some applications, such as those involving numerical analysis, a variety of similar algorithms may be available to do the job. When an algorithm is executed sequentially, it is generally straightforward to give bounds for its time and space complexities. For a parallel algorithm, however, it is sometimes hard to reason about its performance because such an algorithm may have a poor sequential performance (as compared to rival algorithms), but may outperform its rivals when executed in parallel [Wrig90]. Nevertheless, one of the most important lessons to be followed when writing a parallel program is to use an appropriate


algorithm and data structure [Bent86]. Doing so can greatly improve the quality and efficiency of a solution.

Parallel algorithms can be constructed by following one of the recognised methods, of which there are many classifications, though most are variations on a few basic themes. This state of affairs has arisen because almost all these classifications are based upon intuitive reasoning originating from practical experience, rather than some set of absolute rules rooted in formalism. Thus, classification schemes tend to reflect programming models loosely based on their authors' favourite parallel programming languages, as opposed to being true abstractions of parallel processing. Nevertheless, some formal work has been carried out into parallel algorithm design.

Analytical Models

An analytical model of a computer is one in which abstractions of its hardware capabilities such as the instruction set, number of processors, and physical layout of memory and processors are used for algorithm design. The most basic machine model is that of the Random Access Machine (RAM) [Aho74], which models a serial computer. This can be extended in a number of fashions to form a Parallel-RAM machine (PRAM), which is essentially a collection of RAMs all accessing the same memory [Boro82, Fort78, Snir82]. Basically, PRAM models differ according to the level of concurrency that is permitted when performing operations and accessing memory locations. Models such as Message Passing-RAM also exist, representing machines with private memories.

Analytical models are used to specify algorithms in terms of individual low level instructions. Thus, typical algorithms that can be specified in this way are those pertaining to numerical analysis, those for searching and sorting, and those used in special hardware devices such as systolic arrays. Hence detailed analysis of these algorithms at the level of instruction counting is possible, to determine their efficiencies and limitations. However, because of the low-level approach to algorithm design, analytical models are not suitable for expressing abstract problems where information is structured in terms of groups of complex objects. Moreover, the analytical models can also be very architecture dependent as they are constructed out of primitive machine operations. Thus, algorithm designers should have other methods to fall back on to represent problems that are too complicated to reduce to the basic machine level.


Models for Parallel Algorithms

The following classification scheme for parallel algorithms tries to draw out the strands of truth from several classifications by not specialising or focussing on a particular processing model. This it does reasonably successfully, although as a classification scheme it is neither rigorous nor complete.

Expressed most simply, parallel activity can be competitive, cooperative or independent. In competitive parallelism, threads of control executing different programs compete for time on shared hardware resources. Here, there is no cooperation or synchronisation between threads of control. In cooperative parallelism, separate programs or more commonly separate parts of the same program execute together, cooperating and synchronising in the joint execution of some shared job. In real life applications, however, it is possible for an algorithm to be both competitive and cooperative, through the use of more active threads of control than processors. Finally, threads of control executing by themselves on separate processors, sharing no resources, exhibit independent parallelism.

In most cases cooperative parallelism is the most tenable method for gaining significant performance benefits for an application, with there being two obvious strategies for exploiting this parallelism. One strategy is to design a parallel algorithm in terms of a network of concurrently executing, specialist threads of control. This approach is known as functional partitioning, as the overall function of an algorithm is partitioned into subfunctions for execution by separate threads of control. Here, it is normal for each thread to have a single role and for data to flow between threads as required. Models such as pipelining, and large-scale or macro dataflow are good examples of this paradigm.
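
By way of illustration, the following sketch codes a two stage functional partitioning in C with POSIX threads: a producer stage and a consumer stage, each a specialist thread, joined by a small bounded buffer through which the data flows (the same pattern as the Strand buffer of Figure 2.17). The stage computations, the buffer size and all of the names are invented for exposition.

    /* Functional partitioning: two specialist threads joined by a bounded buffer. */
    #include <pthread.h>
    #include <stdio.h>

    #define SLOTS 3
    #define ITEMS 100

    static int buf[SLOTS];
    static int count = 0, in = 0, out = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  not_full  = PTHREAD_COND_INITIALIZER;
    static pthread_cond_t  not_empty = PTHREAD_COND_INITIALIZER;

    static void *producer(void *arg)
    {
        for (int i = 0; i < ITEMS; i++) {
            int item = i * i;                       /* first stage of the computation */
            pthread_mutex_lock(&lock);
            while (count == SLOTS)                  /* wait for buffer space */
                pthread_cond_wait(&not_full, &lock);
            buf[in] = item;
            in = (in + 1) % SLOTS;
            count++;
            pthread_cond_signal(&not_empty);
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    static void *consumer(void *arg)
    {
        long sum = 0;
        for (int i = 0; i < ITEMS; i++) {
            pthread_mutex_lock(&lock);
            while (count == 0)                      /* wait for a message */
                pthread_cond_wait(&not_empty, &lock);
            int item = buf[out];
            out = (out + 1) % SLOTS;
            count--;
            pthread_cond_signal(&not_full);
            pthread_mutex_unlock(&lock);
            sum += item;                            /* second stage of the computation */
        }
        printf("sum = %ld\n", sum);
        return NULL;
    }

    int main(void)
    {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }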

The other strategy for implementing cooperative parallelism is to replicate program code by having it executed by multiple concurrent threads of control, and then to distribute the data across them. This is known as data partitioning, as program data structures or parts of a large data structure are partitioned into substructures for manipulation by replicated workers. In some sense, the data structures can be viewed as being static and control is passed over them by the worker threads, making this paradigm the dual of functional partitioning. Vector processing and its multiprocessor equivalent of loop spreading are the two most representative examples of this paradigm.

Of course, many real life parallel programs are a combination of both parallelism models. For example, consider a pipeline that consists of several


stages, with each stage being executed by a separate thread which transforms elements from an input stream to form an output stream. A program may have several such pipelines to efficiently compute a wide range of functions. If one such function was found to be required more frequently than the others, perhaps multiple copies of the appropriate pipeline could be employed to enable multiple data streams to be processed in parallel. As a dual example, consider a group of replicated threads that service infrequent but identical transactions. If greater throughput is required, increasing the number of replicated workers is not a solution, as it is the complexity and not the number of transactions that is the bottleneck. Hence, a solution may be found by pipelining the processing of a transaction so that the work previously done by a single thread is partitioned into several threads.

A more specialised, though more informative, scheme comes from Carriero and Gelernter [Carr89] which classifies parallel concepts as: (i) result parallelism, (ii) agenda parallelism, and (iii) structure parallelism. These roughly equate to data partitioning, a combination of data and functional partitioning, and functional partitioning in that order.

A program that uses result parallelism is, unsurprisingly, designed around the production of some final result, usually a complex or large data structure such as an array. Threads of control are created with the purpose of calculating individual data items that will be deposited in their respective positions in the final result data structure. Thus, a thread may perform many types of operation before producing its final result. In many cases one thread of control is created to produce a single element, therefore fixing the number of threads for a given problem size and making resulting programs of this type highly parallel for large problem sizes. For example, in ray tracing computer graphics the displayable image can be represented by a two dimensional array of pixels, which can be computed in parallel by assigning an individual thread of control to compute each pixel.
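
A minimal sketch of result parallelism follows, again using C with POSIX threads for illustration only: one thread of control is created per element of the result array and deposits its value directly in place. The image dimensions and the trivial per-pixel computation are invented stand-ins for a real ray tracer.

    /* Result parallelism: one thread of control per element of the final result. */
    #include <pthread.h>
    #include <stdio.h>

    #define WIDTH  8
    #define HEIGHT 8

    static double image[HEIGHT][WIDTH];

    typedef struct { int x, y; } pixel_t;

    static void *compute_pixel(void *p)
    {
        pixel_t *px = (pixel_t *)p;
        /* stands in for tracing a ray through pixel (x, y) */
        image[px->y][px->x] = px->x * 0.1 + px->y * 10.0;
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[HEIGHT][WIDTH];
        pixel_t   px[HEIGHT][WIDTH];

        for (int y = 0; y < HEIGHT; y++)
            for (int x = 0; x < WIDTH; x++) {
                px[y][x].x = x;
                px[y][x].y = y;
                pthread_create(&tid[y][x], NULL, compute_pixel, &px[y][x]);
            }

        for (int y = 0; y < HEIGHT; y++)        /* wait for the whole result */
            for (int x = 0; x < WIDTH; x++)
                pthread_join(tid[y][x], NULL);

        printf("image[7][7] = %f\n", image[7][7]);
        return 0;
    }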

Agenda parallelism involves a transformation or a series of transformations which are to be applied to all elements of some set in parallel. Algorithms of this type consist of a number of discrete stages. At each stage, all threads cooperate to complete the available work on a particular data structure before moving onto the next stage, and possibly another data structure. This means the number of threads that can be engaged to perform processing is flexible and the work can be partitioned at run time for good load balancing.


An algorithm that uses structure parallelism (also called specialist parallelism) is broken down into a logical network of cooperating threads with each thread having a single specialised role. Under the strictest interpretation, the group of threads employed by such an algorithm is fixed in both size and function, however, it seems reasonable to include algorithms that can be decomposed into specialist units, but can dynamically alter their processing configuration by creating and terminating threads of control on demand.

Speculative versus Conservative Parallelism

The models for parallelism just described cover how parallelism is exploited by an algorithm, however, one must also consider what parallelism is to be exploited. On one hand, the conservative view could be taken to only execute the work that is absolutely necessary to compute a program's results. That is, the work that lies on the solution path of the program. On the other hand, one could speculate about the form of a solution path and perform operations (in parallel) that may lie on the solution path. As an example of speculative parallelism consider a conditional statement that has two branches. Both branches of the conditional could be evaluated in parallel ahead of the conditional being evaluated, data dependencies permitting, even though only one branch would ever be needed. Highly significant speculative performance gains, relating to a combinatorial implosion, have been reported in artificial intelligence applications based on searching algorithms [Korn82]. Here, a choice of searching techniques exists (e.g. top-down, breadth-first) and it is not possible to tell in advance which algorithm will perform best. Hence, the best solution is to execute a combination of both search techniques taking the answers from whatever search completes first. In fact, this approach gives improved performance even if the searches are multiprogrammed, and when it is executed in parallel, gives rise to superlinear performance gains over both of the original purely sequential searches.
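
The conditional example can be sketched as follows, assuming C with POSIX threads; the predicate and the branch computations are invented, and for simplicity the unwanted result is merely discarded rather than its thread being cancelled.

    /* Speculative parallelism: both arms of a conditional are evaluated in
       parallel with the predicate, and the unneeded result is discarded. */
    #include <pthread.h>
    #include <stdio.h>

    static long then_result, else_result;

    static void *then_branch(void *arg) { then_result = 42;  /* speculative work */ return NULL; }
    static void *else_branch(void *arg) { else_result = -1;  /* speculative work */ return NULL; }

    static int predicate(long n) { return n % 2 == 0; }      /* evaluated concurrently */

    int main(void)
    {
        pthread_t t1, t2;
        long n = 1234;

        /* start both branches before the outcome of the predicate is known */
        pthread_create(&t1, NULL, then_branch, NULL);
        pthread_create(&t2, NULL, else_branch, NULL);

        int take_then = predicate(n);   /* meanwhile, decide which branch is needed */

        /* wait for both; only one result is used, the other is simply discarded.
           A more aggressive scheme would cancel the unwanted thread instead. */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        printf("%ld\n", take_then ? then_result : else_result);
        return 0;
    }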

Speculative parallelism is the basis of trace scheduling and OR-parallelism, and can be naturally exploited by other parallel computation techniques such as graph reduction [Peyt86]. However, there are some complications in the implementation of such systems. For instance, identifying suitable instructions for speculative execution can be quite difficult because the work must be of a suitable grain size. Moreover, even when speculative parallelism can be identified, controlling its evaluation can be quite involved. For instance, at any given time during the execution of a program there will be tasks that must be done and those that may be done. If sufficient resources are available, speculation can


occur, but this must happen in such a way that any changes to the state of a program made by a speculating thread are not observed by other threads until that work is actually needed. In addition, in cases when it has been determined that a branch of a computation is not needed, the thread of control following the branch may have to be terminated and its results discarded. If there is only one level of speculation this does not cause too many problems, but in the case of many levels of speculation (e.g. nested conditionals) several threads of control may have to be stopped which could involve a considerable amount of time overhead from housekeeping processing.

Techniques of Parallel Programming

Once a model for a parallel algorithm has been decided upon, it must be coded using an appropriate parallel programming technique. In many cases this technique will be a natural extension of the parallel model that has been used, but in other cases a variety of techniques will be applicable.

The most influential factor in the choice of parallel programming technique concerns whether the algorithm that is to be coded requires shared or distributed memory. Lauer and Needham [Laue79] argue that the expressive power of techniques based on shared variables is equivalent to that based on message passing. However, not as much can be said about the efficiency of choosing shared memory or message passing without direct recourse to the problem. At a higher level of abstraction, Carriero and Gelernter [Carr89] classify parallel programming techniques as: (i) message passing, (ii) live data structures, and (iii) distributed data structures. They make the point that these three techniques closely correspond to their classification of parallel concepts.

According to Carriero and Gelernter if message passing is used to structure a parallel program, all data values are encapsulated within separate threads of control that communicate only by exchanging messages. A thread may contain many data items and can only reference others by making explicit send and receive calls to their respective managing threads.

A live data structure can be viewed as a special case of message passing in which there is one thread of control per data value. Moreover, such threads only send one message, representing their final data value, upon termination. These messages are usually left in some global area so that they can be subsequently collected by some other thread of control.


Distributed data structures are manipulated by threads of control that communicate by referencing shared data objects rather than by coupled pairs of send and receive operations. Such a data structure is said to be distributed in the sense that many threads of control can operate on it simultaneously, with the data structure residing in shared memory. For example, the process bible in a time sharing operating system, and blackboard systems [Fost90] can be thought of as distributed data structures.

Carriero and Gelernter argue that the process of writing a parallel program starts with the choice of a natural parallel concept for its design. Once this has been decided, a parallel program can be coded and tested so that its performance can be assessed. Finally, if the program has proven ineffective, it may be transformed into an alternative parallel program that uses a more efficient but less obvious concept. Figure 3.5 illustrates the perceived relationships between the different methods of implementing parallel algorithms.

[Figure 3.5 is a diagram showing the transformations between programs built around captive data objects and those built around delocalised data objects, with the transitions between the different methods of implementing parallel algorithms annotated Explicit + Blocking and Implicit + Unblocking.]

Figure 3.5 - Transforming parallel algorithms.

Practically, the changes that are made to ineffective parallel programs relate to matching the grain size of the parallelism executed by threads of control more closely to that preferred by the target architecture. On one hand, in shared memory machines, distributed data structure programs that lack parallelism can be recoded as live data structure programs because of the fast inter-thread communication mechanism shared memory provides. On the other hand, if the same problem were to be recoded for a distributed memory machine, message

passing may provide a better model because expensive interthread communications can be managed more carefully.

Issues in Parallel Algorithms

Although classifying models for parallel algorithms can lead to clear ways of designing parallel programs, often important implementation details are glossed over. For example, whether a parallel algorithm is deterministic and if not, what this implies about the behaviour of the algorithm. However, several important characteristics of parallel algorithms can be identified and used to classify them. These properties are:

• the nature of the thread control structure (hierarchical or peer),
• the nature of the work allocation (static or dynamic),
• the nature of thread synchronisation (regular or irregular),
• the nature of the work (uniform or non-uniform),
• the extent of the parallelism available.

Note that the nature of the communication between threads (shared memory or message passing) is not present in the above list because it is an implementation technique covered in the previous section. In addition, there is no mention of processor utilisation in terms of useful work executed, because this is a system measurement and is not specific to an individual problem.

Thread Control Structure

There are two methods for structuring threads of control in a parallel program. Firstly, if the threads are arranged in a master-worker configuration, this is a hierarchical structure with the master being responsible for creating, terminating and possibly synchronising its workers. Hence, the master thread usually has a more complex implementation than its lightweight children. Secondly, if there is no hierarchical relationship between threads, then they are peer threads. In practice, an algorithm that uses this form of control distributes responsibility for controlling a program over its threads. Programs of this kind often have good load balancing and fault tolerance properties because important worker threads can be easily replicated.

Not surprisingly many real life applications are best solved by combinations of hierarchical and peer thread structures. Examples of combined control structures can be found in recursive thread structures that are produced by unwinding recursion, and replacing it by a combination of recursion and thread creation. For


example, quicksort [Hoar62] can be parallelised in this way. However, if recursion is unwound then it is possible to create too many threads unless some throttling is applied, as the amount of useful work done by a thread can be less than the overhead needed to set it running.
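
A sketch of this form of throttling is given below, assuming C with POSIX threads; the partition size threshold and the input data are invented. A new thread of control is created for one half of a partition only when that half is large enough to repay the cost of setting the thread running; smaller partitions fall back to ordinary recursion.

    /* Parallel quicksort: recursion is unwound into thread creation, but only
       while partitions are large enough to repay the cost of a new thread. */
    #include <pthread.h>
    #include <stdio.h>

    #define THRESHOLD 10000     /* below this size, sort sequentially (throttling) */

    typedef struct { int *a; int lo, hi; } span_t;

    static void swap(int *a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

    static int partition(int *a, int lo, int hi)
    {
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) swap(a, i++, j);
        swap(a, i, hi);
        return i;
    }

    static void *qsort_par(void *arg)
    {
        span_t *s = (span_t *)arg;
        if (s->lo >= s->hi) return NULL;

        int p = partition(s->a, s->lo, s->hi);
        span_t left  = { s->a, s->lo, p - 1 };
        span_t right = { s->a, p + 1, s->hi };

        if (right.hi - right.lo > THRESHOLD) {       /* worth a new thread of control */
            pthread_t child;
            pthread_create(&child, NULL, qsort_par, &right);
            qsort_par(&left);                        /* parent sorts the other half   */
            pthread_join(child, NULL);
        } else {                                     /* too small: plain recursion    */
            qsort_par(&left);
            qsort_par(&right);
        }
        return NULL;
    }

    int main(void)
    {
        enum { N = 100000 };
        static int a[N];
        for (int i = 0; i < N; i++) a[i] = (i * 7919) % N;   /* scrambled input */

        span_t whole = { a, 0, N - 1 };
        qsort_par(&whole);

        printf("a[0]=%d a[N-1]=%d\n", a[0], a[N - 1]);
        return 0;
    }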

Work Allocation

If a static work allocation algorithm is used as the basis for a parallel program, the number and function of its threads of control are fixed at compile time, resulting in a fixed thread graph. Programs of this form are easier to write and reason about than similar programs written using dynamic behaviour, as the interactions between threads can be predicted more easily. One implementation point is that, as almost all of the threads of control in a static model are created at the same juncture, lower overheads can be incurred than if they are created in a piecemeal fashion. Furthermore, if the work can be evenly distributed over the workers and there is no external load on the system, then static work allocation is more efficient than dynamic allocation as there are fewer run time overheads.

In a dynamic work allocation algorithm, the number and possibly the function of its threads are determined at run time or can change during the execution of a program, resulting in a mutable thread graph. Algorithms of this type are more flexible and adaptable than static algorithms, but pay the price in terms of greater program complexity and potentially higher overhead costs. However, in cases where the amount of work cannot be determined at compile time or otherwise cannot be equally distributed among the threads of control, it may be that the penalties of poor load balancing far outweigh the overheads of dynamic work allocation. This is also true for situations where a parallel program has to share resources with other programs. When this happens, static thread arrangements are limited by the speed of the slowest worker, while dynamic arrangements can compensate for external loading, provided they have a sufficiently elaborate work allocation mechanism. Examples where dynamic work allocation can be effectively employed are adaptive equation solving algorithms. These algorithms operate iteratively, homing in on a solution by focusing on an ever decreasing part of the original solution space. If only one division of the solution space were to be made then it is likely that towards the end of the algorithm only a few of the original threads of control would be actively employed. However, if the solution space is repartitioned at every iteration then the maximum number of threads can be employed for the entire duration of the algorithm.
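
A minimal sketch of dynamic work allocation, assuming C with POSIX threads, is given below: worker threads repeatedly claim the next unit of work from a shared counter, so that faster (or less heavily loaded) workers simply take more units. The unit of work and all of the names are invented.

    /* Dynamic work allocation: worker threads repeatedly claim the next unit of
       work from a shared counter, so faster workers simply take more units. */
    #include <pthread.h>
    #include <stdio.h>

    #define UNITS    1000
    #define NWORKERS 4

    static double result[UNITS];
    static int next_unit = 0;                      /* shared work counter */
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        for (;;) {
            pthread_mutex_lock(&lock);
            int i = next_unit++;                   /* claim one unit of work */
            pthread_mutex_unlock(&lock);
            if (i >= UNITS) break;                 /* no work left           */
            result[i] = i * 0.5;                   /* stand-in for variable-sized work */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NWORKERS];
        for (int t = 0; t < NWORKERS; t++)
            pthread_create(&tid[t], NULL, worker, NULL);
        for (int t = 0; t < NWORKERS; t++)
            pthread_join(tid[t], NULL);
        printf("result[999] = %f\n", result[999]);
        return 0;
    }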


Synchronisation and Work Distribution

Although these are separate issues it is constructive to combine them into a table and examine the types of algorithm model that are generated. Figure 3.6 is such a table and contains typical example algorithm models for each class. Basically non-uniform work distribution corresponds to functional partitioning (i.e. different threads doing different jobs) whereas uniform work distribution corresponds to data partitioning (i.e. different threads doing the same job).

Work Distribution    Regular Synchronisation             Irregular Synchronisation

Non-Uniform          pipelines & networks                logical networks & transactions
Uniform              static data partitioning & SIMD     dynamic data partitioning

Figure 3.6 - Classification of models for parallel algorithms.

In practice, the easiest algorithms to debug and code are those that operate synchronously with uniform work distribution, however, efficiently matching such algorithms to real applications can be quite difficult. On the other side of the coin, real applications tend to have natural analogues in asynchronous systems that combine threads of control with varied roles. In general, while it may be easy to design such processing groups, it is not easy to code and debug such systems because of the complexity which arises from executing multiple operations in parallel, and the resulting interactions between such operations.

Parallelism Extent

This is a dual measure of an algorithm combining its properties which reflect the number and complexity of the processors that could be usefully employed in its execution. One of the reasons why these factors have been joined is so that they mirror one aspect of the classification of multiprocessors as SIMD versus MIMD, i.e. very many simple processors versus fewer more complex processors. Likewise, even among MIMD machines similar comparisons can be made between microprocessor-based supercomputers and their rivals like the Cray Y-MP.

Parallelism extent is partitioned into five categories that are related to the computer architectures that are most appropriate for exploiting each extent of parallelism, without excessive multiprogramming. The categories are defined intuitively rather than by more common methods for measuring computational

power such as MIPS, MFLOPS and KLIPS. This is so because these measures are notoriously bad indicators of computer performance as there is no standard way for comparing the figures that are quoted. Moreover, figures can vary significantly between the normal rate (sustained) and that which only can be obtained under theoretical conditions (peak rate). Formal methods for quantifying the processing capacity of computing systems are emerging that permit architectural designs to be analysed for their operational characteristics [Lipo87]. However, such methods are not needed here because of the intuitive nature of the categories, but they may be of use if the classification is subsequently refined.

The main reason why different architectures appear in separate parallelism extent categories is that, in general, architectures cannot be effectively scaled up or down by arbitrary amounts. Hence, architectures only feature in the categories for which they were purposely designed. It is simply not possible to multiply up the number of processors and the amount of memory in an architecture to get a more powerful one, because physical limitations cannot be scaled so conveniently. For instance, bus-based shared memory machines are limited by the transmission rate of the bus (ultimately the speed of light), and distributed memory machines, such as hypercubes, are limited by the number of physical connections that can be made to a given node. Research is going on to build scalable multiprocessors out of replicated units with structures like those of trees to remedy these problems (e.g. NON-VON [Shaw84] and TRAC [Lipo87]), but as might be expected, writing parallel programs for these machines will not be easy.

In the limited parallelism category programs are mainly thought of as being sequential because they are based on sequential algorithms that provide little scope for explicit parallelisation. However, even in these programs some parallelism can be exploited even though a program is executed by a single processor. Uniprocessor architectures described in chapter two often include support for parallelism in the form of instruction pipelining and multiple functional units. For superscalar processors, optimising compilers can reorder instruction streams so that several arithmetic operations can be in progress simultaneously. This means that programs that were thought to be sequential or possess only a small number of independent instructions can be effectively executed by exploiting what limited parallelism is available.

The moderately parallel category covers the parallelism that can be exploited by up to say thirty non-trivial processors. Respectable performance gains over sequential programs can be obtained from parallel programming at this level and


many types of multiprocessor architecture are capable of providing these kinds of performance gains, ranging from groups of workstations to tightly coupled shared memory multiprocessors.

The parallelism in the highly parallel category is expressed in terms of tens of processors. Here, only hybrid and distributed memory machines are viable architectures, though there are plans for shared memory machines of this size. Specialist array and vector processors also operate at this level of parallelism, though at a much smaller level of granularity.

The very highly parallel category covers parallelism in the range of hundreds of processors. The machines that are capable of supporting this level of parallelism are distributed memory multiprocessors that offer supercomputer performance. Hence, these machines tend to be used for specialist numerical work or run more general work by being partitioned into smaller multiprocessors.

The machines that operate in the final category are generally radical departures from the von Neumann model, giving parallelism in terms of (tens of) thousands of processors. The processors found in such machines are very simple, with small instruction sets, and are capable of operating on very small pieces of data at a time (e.g. individual bits, giving very fine grained parallelism). One architecture of this type is the DAP mentioned in chapter two.

3.4 Problems in the Parallel Execution of Programs

When designing parallel programming constructs it is important to be aware of the kinds of parallel processing errors that can arise. Two types of fault are linked particularly with parallel processing, which can be termed performance faults and operational faults. A performance fault causes a parallel program to execute more slowly than its algorithmic implementation would suggest. Sometimes, in very extreme cases, performance faults can affect the correctness of a program by highlighting limitations in time dependent synchronisation. For example, consider a program that makes use of message passing implemented with timeouts. Certain runs of the program could fail if random delays caused timeout periods to be unexpectedly exceeded. More significantly though, an operational fault such as deadlock can cause the complete failure of a program by stopping some or all of its threads of control from proceeding with their execution. Operational faults are really a subset of algorithmic faults, which are common to


both sequential and parallel programs, and correspond to algorithmic flaws in a program that prevent it from producing a correct result.

A large proportion of the time spent developing complex parallel software can be spent correcting performance faults. When a parallel program is designed, implicit assumptions are made about the lengths of time sections of code will take to execute, so that program threads can be efficiently distributed over system processors. If this estimation is done badly, then a parallel program can execute slower in parallel than it does sequentially. Even if a good allocation of threads to processors can be made, for many very large problems the speedups obtained by parallel processing begin to decline when tens or hundreds of processors are used. Many possible reasons for this exist, some of which (such as contention) are architecture dependent, but others come from the serialisation and overhead attributable to the operating system. In moderately parallel applications it is possible that this effect is not noticeable, but for highly parallel systems serious problems can arise, effectively putting a premature upper bound on the parallelism that can be exploited. Other performance related issues include creating the optimum number of threads for a parallel program, and choosing the most appropriate form of synchronisation for coordinating these threads of control.

If threads of control have unsynchronised read and write access to a shared resource, then a race condition can occur between such threads of control. Such a condition corresponds to an error in a program's logic where synchronisation should have been used to enforce consistency. Race conditions can lead to operational faults, of which the most well known is deadlock. When this phenomenon occurs, the threads that are deadlocked cannot make any further progress towards completion without external intervention. Deadlock has been researched for many years and the conditions for it arising are well documented [Coff71]:

• mutual exclusion,
• non-preemption of threads in critical sections,
• waiting for other resources while in critical sections,
• circular waiting.
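As a concrete illustration of how these conditions combine, the following sketch shows two threads acquiring the same pair of locks in opposite orders. It is written in C with POSIX threads purely for convenience (a library that post-dates the thread packages surveyed in chapter two), and the names threadOne, threadTwo, lockA and lockB are inventions for this example.

    /* Hypothetical illustration: two threads acquiring locks in opposite
     * orders can satisfy all four deadlock conditions simultaneously.    */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t lockA = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lockB = PTHREAD_MUTEX_INITIALIZER;

    static void *threadOne(void *arg)
    {
        pthread_mutex_lock(&lockA);    /* holds A ...                          */
        pthread_mutex_lock(&lockB);    /* ... and waits for B (held by two)    */
        pthread_mutex_unlock(&lockB);
        pthread_mutex_unlock(&lockA);
        return arg;
    }

    static void *threadTwo(void *arg)
    {
        pthread_mutex_lock(&lockB);    /* holds B ...                          */
        pthread_mutex_lock(&lockA);    /* ... and waits for A (held by one)    */
        pthread_mutex_unlock(&lockA);
        pthread_mutex_unlock(&lockB);
        return arg;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, threadOne, NULL);
        pthread_create(&t2, NULL, threadTwo, NULL);
        pthread_join(t1, NULL);        /* may never return if the timing        */
        pthread_join(t2, NULL);        /* produces a circular wait              */
        printf("no deadlock on this run\n");
        return 0;
    }

Imposing a single global ordering on lock acquisition removes the circular wait condition, and with it the possibility of deadlock.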

The two basic approaches that have been developed to handle deadlock are the techniques of prevention and avoidance. In deadlock prevention, the prevailing operating conditions of a concurrent system are such that some or all of the necessary conditions for deadlock are inhibited. Hence, it is possible to prove that any program operating in such an environment can never deadlock. When it is undesirable or not possible to remove the necessary conditions for deadlock from a system, a strategy of avoidance can sometimes be used instead. In such a system, concurrent programs are executed under the aegis of some application specific rules. These rules ensure that a program can complete its job, but in so doing must avoid situations where deadlock could occur. For example, in problems of resource allocation (e.g. devices or memory) Dijkstra's Banker's Algorithm [Dijk65b] models the state of a resource by labelling it as either being in a safe or an unsafe state. If the resource is in a safe state, threads using it are guaranteed that they will not deadlock and will complete in a finite time (assuming that critical section rules apply). However, if the resource is in an unsafe state, there is a chance that threads using the resource will deadlock if events occur in a particular sequence. Therefore, the Banker's Algorithm only grants those requests that result in the resource remaining in a safe state. Unfortunately, the algorithm only works under a strict set of rules, some of which are unreasonable for many applications. For instance, the Banker's Algorithm is unsuitable for programs that cannot, in advance, accurately predict their resource requirements, or that must adhere to strict time deadlines. Thus, if for some reason neither of the two basic techniques of overcoming deadlock can be used, detection and recovery mechanisms also exist to locate and break deadlocks, hopefully returning at least some threads of control to operating states.
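The safety test at the heart of the Banker's Algorithm can be sketched as follows. This is a deliberately simplified, single-resource version written in C; the function name state_is_safe and the data layout are assumptions made for the illustration rather than Dijkstra's original formulation. A request is granted only if the state that would result from granting it still passes the test.

    /* Sketch of the Banker's Algorithm safety test for one resource class:
     * the state is safe if the threads can be ordered so that each one's
     * maximum remaining claim can be satisfied in turn.                     */
    #include <stdbool.h>
    #include <stdio.h>

    #define NTHREADS 4

    bool state_is_safe(int available,
                       const int allocated[NTHREADS],
                       const int maximum[NTHREADS])
    {
        bool finished[NTHREADS] = { false };
        int  free_units = available;

        for (int done = 0; done < NTHREADS; ) {
            bool progress = false;
            for (int i = 0; i < NTHREADS; i++) {
                /* Can thread i's worst-case remaining demand be met now? */
                if (!finished[i] && maximum[i] - allocated[i] <= free_units) {
                    free_units += allocated[i];  /* assume i runs to completion */
                    finished[i] = true;
                    progress = true;
                    done++;
                }
            }
            if (!progress)
                return false;                    /* some thread can never finish */
        }
        return true;
    }

    int main(void)
    {
        int allocated[NTHREADS] = { 1, 1, 2, 0 };
        int maximum[NTHREADS]   = { 4, 2, 5, 3 };
        printf("state is %s\n",
               state_is_safe(2, allocated, maximum) ? "safe" : "unsafe");
        return 0;
    }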

Although deadlock is fairly well understood, there is a similar problem called livelock, which is more insidious and occurs mainly in transaction systems. Consider a thread of control that is progressing toward its goal, but discovers that it cannot complete its task because another thread holds some vital resource. In an attempt to resolve this conflict, the first thread then tries to recover by recommencing its execution. In some situations this strategy will overcome the problem; however, if the second thread turns out to need a resource held by the first thread, a livelock can occur. Here both threads need a resource held by the other, and neither ever terminates as they cycle endlessly trying to avoid a deadlocked state. As the threads of control do not stop executing, livelock can be difficult to detect unless there is some mechanism for monitoring the progress of a program.
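The situation can be mimicked with polling locks, as in the hedged C sketch below (again using POSIX threads simply as a notation; the resource and function names are invented). Each thread gives up its own resource whenever it cannot obtain the other one and then retries, so with sufficiently symmetric timing both threads can loop indefinitely without ever being blocked.

    /* Illustrative livelock: both threads keep giving way to each other. */
    #include <pthread.h>
    #include <sched.h>

    static pthread_mutex_t res1 = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t res2 = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg)
    {
        pthread_mutex_t *mine   = (arg == NULL) ? &res1 : &res2;
        pthread_mutex_t *theirs = (arg == NULL) ? &res2 : &res1;

        for (;;) {
            pthread_mutex_lock(mine);
            if (pthread_mutex_trylock(theirs) == 0) {
                /* both resources held: the real work would happen here */
                pthread_mutex_unlock(theirs);
                pthread_mutex_unlock(mine);
                return NULL;
            }
            /* back off and retry: neither thread is ever blocked, yet with
             * symmetric timing neither ever acquires both resources        */
            pthread_mutex_unlock(mine);
            sched_yield();
        }
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, worker, NULL);
        pthread_create(&b, NULL, worker, (void *)1);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return 0;
    }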

Even though a parallel algorithm may withstand careful scrutiny it can still fail due to problems of thrashing. Once again, threads do not necessarily cease execution but become so bogged down executing synchronisation or communication code that they cannot make progress towards their goals. For example, consider a producer thread that sends a stream of messages to a consumer once the consumer signals that it is ready, and stops once the consumer signals that it is busy. If the producer sends messages too quickly, the consumer may not have time to signal to the producer to stop sending messages, as it is too busy fielding the incoming messages. Clearly, thrashing problems are influenced by the timing and storage properties of the hardware environment in which programs are executed, bringing an element of non-determinism into the execution of such programs. Furthermore, thrashing (and livelock) can occur through improper scaling of problems, in which programs may appear to work correctly by producing correct results in one configuration, but may fail in a larger or different configuration.

A more subtle operational fault is indefinite postponement. This relates to certain threads never getting an opportunity to execute due to some bias in a scheduling or work partitioning algorithm. Other terms that have been applied to this phenomenon are starvation and fairness. In most cases, this problem is confined to system software, as thread scheduling is generally handled automatically and is not the responsibility of a user. However, unfairness can occur in dynamic algorithms, such as work allocation strategies that allocate work on a first-come-first-served basis.

3.5 Requirements For Explicit Parallelism Constructs

Some years ago Andrews [Andr76] suggested that the following areas must be addressed when designing new parallel programming constructs:

• expressiveness,
• reliability,
• security,
• verifiability.

The expressiveness of a language must be sufficiently rich to enable a wide variety of operational policies to be naturally expressed. Thus, programmers should have the freedom to design programs using an appropriate paradigm, and not be constrained to using a potentially obtuse model for solving the problem in hand. In practice, this means being evenhanded in the provision of language constructs by not trying to build in features that preclude a more efficient implementation in alternative styles or omit them altogether. Furthermore, constructs should exhibit uniformity, whereby similar concepts have compatible representations and similar operations have the same general interpretation.


The reliability goal is directed towards the compile time detection of time dependent faults arising from parallel activity. This can be realised through the use of high level (structured) constructs which are suitably constrained in their capabilities, to exclude or minimise the effects of parallelism features such as dynamic process structures and non-determinism. The use of such features has to be restricted because static compile time checking is, on the whole, unable to accurately predict the errors that dynamic behaviour can lead to. Only the possibility of faulty behaviour can be detected, which if used as an indicator of faults would lead to the classification of many working programs as being incorrect. Moreover, in any case, no indication is given of the likelihood of an erroneous state being reached. In general, the laudable aim of compile time reliability is difficult to achieve for real-world programs. Even so, it can have an important structuring effect on programs, hopefully leading programmers to a better understanding of their code.

Early reliability checkers were used to detect multiple references to shared variables in the hope of highlighting faulty accesses. Two examples of compilers that enforce more elaborate reliability checks are Concurrent Pascal and Linda. In Concurrent Pascal, references to shared variables must only occur in special parts of a program (conditional critical regions), which enables the compiler to generate special code to enforce mutual exclusion. In the Linda compiler, simple matching is performed on the tuple communication primitives. This is to check whether tuples that are written by an output primitive can subsequently be read, and conversely, to see if tuples that are read, have at some stage been written. Tuple analysis of this kind is not rigorous, in so much that it simply looks for the existence of the primitives and does not examine actual program control flow to see if the primitives are executed in some correct ordering. However, quick checks of this kind are easy to implement and do pick up some of the more obvious programming errors.

The motivation behind the security (integrity) goal is to enforce access restrictions and to offer a guaranteed service. Access restrictions are enforced to offer a secure service which prohibits unintentional accesses to shared data and excludes external tampering. Consider a server in a concurrent system that handles a device or manages some data. This server must offer a clearly defined interface to its resource, with there being no other ways of obtaining access. The idea here is to stop malicious or naive users from corrupting the shared data or simply viewing it when it is in an inconsistent state. Often the complexity of a concurrent system is such that there is no global understanding of the interactions between its atomic parts. Thus, to cope with such systems, a guaranteed service requires that valid resource operations execute as specified, taking whatever actions are available to avoid errors and inconsistency resulting from possibly unforeseen circumstances.

Some of the checking that is needed to enforce access restrictions can be performed at compile time, for efficiency, and is related to the analysis employed to ensure reliability. However, other checking must be done at run time to handle dynamically occurring problems. Moreover, additional run time checking must be used to ensure a guaranteed service. This may take the form of arbitrating execution problems such as deadlock and livelock, handling exceptions, or enforcing policy decisions such as fairness on scheduling mechanisms.

The verification of a program is the process of proving its correctness by showing that it always meets its specification. As parallel programming becomes more commonplace, it is useful to be able to reason about the correctness of a program in a formal way. Certain language constructs make this task easier, while others make it more difficult. Constructs that encapsulate data, so reducing interactions and side-effects, are a boon to verifiability, while capabilities that allow the run time bindings of variables, such as passing procedures as parameters, are a hindrance. For example in the general case, programs written in CSP can be formally analysed, while those written in C cannot.

It is widely accepted that the ability to prove a program correct can be useful, giving a programmer confidence that a program will actually do what it is supposed to do. But, there is no general agreement about the relative importance of this program property with regard to other program properties such as efficiency and usability. When designing a parallel programming style or model it is advantageous not to preclude the ability to perform a formal analysis, but in all probability the force of the other considerations will win out.

While it would be nice to believe that it is possible to evenhandedly satisfy all of the requirements set out for parallel programming mechanisms, experience has shown that language designers must trade off one requirement against another. Thus, one conclusion that can be drawn is that parallel programming may be best served by a collection of complementary mechanisms, with each mechanism focussing on doing its job as well as possible. Following this policy, it should be possible to develop simple and intuitive programming abstractions that can be used for a large proportion of parallel programming (easing the current situation), but nevertheless are not all things to all men.

3.6 Methods of Exploiting Parallelism

From the material that has been presented in this and the previous chapter, it should be apparent that there are many ways in which parallelism can be exploited. Even when considering only one parallel architecture, i.e. shared memory multiprocessors, the choice is not made much simpler because many of the software methods for exploiting parallelism are independent from the underlying hardware. To summarise, to address the problem of exploiting parallelism described in this thesis one could consider the following:

Auto-parallelising Solutions

• writing tools to transform sequential programs into parallel programs,
• devising abstract machines (interpreters) to execute programs in parallel.

Programming Language Solutions

• writing new programming languages (explicitly parallel, implicitly parallel or sequential),
• devising specialist parallel programming languages for use with existing languages.

Hybrid Solutions

• using graphically based tools for program construction that automatically generate the parallelism code required to implement a given design.

Software Libraries

• the utilisation of prewritten parallel codes.

Auto-parallelising Solutions

The auto-parallelisation of imperative programs at first seems a painless way of exploiting parallelism in a convenient and portable manner. The fundamental technique of auto-parallelisation is vectorisation, whose success relies on three factors: (i) the code is vectorisable, (ii) the compiler can spot this, and (iii) the target hardware has appropriate instructions to exploit the parallelism. Thus, vectorisation (and its associated techniques) cannot guarantee effective parallel programs, because such techniques are essentially a series of stepwise transformations that produce concurrent programs by starting with sequential programs.
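As an illustration of factor (i), consider the two C loops below, which are invented for this discussion. The first has no loop-carried dependence and is a straightforward candidate for vectorisation; the second carries a dependence from one iteration to the next, so a compiler applying stepwise transformations of this kind will leave it sequential.

    /* Illustrative only: a vectorisable loop and a non-vectorisable one. */
    #include <stdio.h>

    #define N 1024

    void scale(double a[N], const double b[N], double k)
    {
        /* Independent iterations: a vectorising compiler can map this onto
         * vector instructions or distribute the iterations over processors. */
        for (int i = 0; i < N; i++)
            a[i] = k * b[i];
    }

    void prefix_sum(double a[N])
    {
        /* Loop-carried dependence: iteration i reads the result of i-1,
         * so a straightforward vectorisation of this form is not possible. */
        for (int i = 1; i < N; i++)
            a[i] = a[i] + a[i - 1];
    }

    int main(void)
    {
        double a[N], b[N];
        for (int i = 0; i < N; i++) { a[i] = 0.0; b[i] = (double)i; }
        scale(a, b, 2.0);
        prefix_sum(a);
        printf("a[N-1] = %f\n", a[N - 1]);
        return 0;
    }

A recurrence such as the second loop can in fact be computed in parallel, but only by recasting it as a different (scan-based) algorithm, which is precisely the kind of algorithmic change that auto-parallelisation cannot be relied upon to discover.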

Auto-parallelising interpreters (for declarative languages) similarly rely on three factors: (i) there is parallelism implicit in a program (algorithmic parallelism), (ii) the available parallelism can be exploited by the interpreter's model of parallelism, and (iii) the grain size of the parallelism exploited by the interpreter can be efficiently matched to its target architecture. Hence, the limitations of these techniques are similar to those encountered in vectorisation.

One of the characteristics of auto-parallelisation is that it hopefully leads to the exploitation of parallelism via an existing sequential programming model. On one hand this makes the job of the programmer easy and the job of the parallelising tool writer quite hard, as the parallelising tool has to do all of the work of detecting and exploiting the parallelism. On the other hand, however, a programmer may have chosen an algorithm that has very limited parallelism, giving an auto-parallelising tool no scope to work, where in fact a parallel algorithm could have been used instead. Thus, to promote effective parallel software, programmers have to be steered towards using parallel algorithms, and one of the most successful ways of doing this is by providing suitable programming constructs.

Programming Language Solutions

As the scope of programming language design encompasses far more than issues of parallelism, the task of defining the definitive parallel programming language cannot be undertaken in isolation from other programming language research. Over the last twenty years many programming languages have been proposed that either use parallelism as a feature, or were purposely designed to exploit parallelism. Unfortunately, as people have continued to use more traditional languages or moved to other languages (e.g. object oriented languages) much of the impact of these parallel languages has been lost, although some of their ideas for parallelism mechanisms have been perpetuated in later languages. Thus, for this thesis, while it would be desirable to propose a complete programming language, it is somewhat impractical due to the large volume of work that would be involved.

At the current time a large proportion of parallel programming is undertaken using sequential languages which include parallelism extensions in the form of library calls and compiler directives. Extensions have been made to both declarative and imperative languages. For example, in the case of Fortran, extensions have been made in a disorderly fashion which has led to many different but similar language dialects. While diversity of this kind encourages competitive research, in many situations effort is wasted in producing equivalent but incompatible constructs. A more productive approach to extending existing languages is to define a specialist parallel programming language (e.g. Linda), and then apply it consistently when it is used in conjunction with its host languages. Hence, this route to parallelism has the attribute of uniformity across a range of languages, and also means that programmers can continue to use familiar languages (which may be the most suitable for the job in hand). Thus, the parallel programming constructs that are developed in this thesis are intended to be viewed as a parallel programming sub-language that can be used with existing imperative languages. Nevertheless, at some later time it may be expedient to design a complete programming language, purpose built, to harness the new parallelism constructs.

Imperative languages are used as the host languages for this research because the work is centred around producing shared data structures that can be manipulated in parallel. This kind of shared memory programming model is the one which is most suitable for efficient execution on shared memory multiprocessors (and is also quite efficient for hybrid architectures). In addition, the work carried out into data structuring techniques (in object-oriented systems) seems to be an important programming area, and using imperative languages allows the freedom to explore various parallelism constructs. Having said this though, declarative languages and styles of programming offer high levels of abstraction that are liberated from the details of the underlying hardware. Moreover, declarative programming can naturally lead to programs that exhibit high cohesion and low coupling between procedures, which means that such programs can be executed very efficiently, taking maximum advantage of hardware facilities such as caching. Thus, while imperative languages are used as the staple programming languages for the research in this thesis, the influence of declarative languages on the form of the parallelism constructs is evident.

Hybrid Solutions

Software tools are an important aspect of a programming environment, but nonetheless at the current time programming languages seem to be a more important area for parallelism research. At some time in the future, programming metaphors (bases for programs) may be able to be expressed visually, without recourse to any code, and at that time parallel programming will be liberated from textual programming languages. However, in the short term programming language development seems to be the best way to evolve parallel programming styles for the future.

Software Libraries

One method of creating a parallel program is to make use of prewritten library routines, devised by an expert, that execute in parallel. A good example of an application of this technique is the Lapack numerical library [Demm87]. Many key numerical analysis functions are encoded within the library, and so for many scientific problems some parallelism will be automatically exploited when library functions are invoked. Using software libraries is a very attractive approach to parallelism because it offers a quick route to often very effective parallel programs. However, libraries are not without their faults as tradeoffs must be made in their implementations because of the general lack of portability between different types of parallel architecture. Moreover, the programming effort to construct a library may be considerable as libraries may have to be modified for each new machine that is supported (even if they have similar architectures). In addition, the use of parallelised library routines written in imperative languages can lead to nasty side-effects when the routines interact with the rest of a program. Errors such as these can be very unpredictable and difficult to pin down, especially if a programmer does not have a good working knowledge of how the parallelised library works.

At the current time, large standard software libraries are used for specialised computing concerned with numerical analysis or computer graphics. With advances in software engineering it may be possible in the future to maintain libraries of parallelism routines with general purpose capabilities to widen the accessibility of this route to parallelism. In addition, the notion of software reuse itself is one that some people believe is a ready answer to some of the problems in engineering large pieces of software - saving both time and money. Nevertheless, the problem remains that libraries must be written in some programming language, so it is the implementation language of a library that must overcome most of the problems associated with parallel processing.


Chapter 4 Shared Values

Over the course of the previous two chapters aspects of parallelism ranging from multiprocessor hardware to parallelism constructs and parallel algorithms have been discussed. Much of this information is generally applicable to the study of parallelism, though some of it is more specifically aimed toward specialised multiprocessor architectures and parallel language approaches. The focus of this chapter is to present original material on the concept of using shared values as a method of explicitly exploiting parallelism in programs. The ideas behind the semantics of shared values are drawn from a diverse range of sources and as a result, the use of shared values is thought to be widely applicable to parallel programming rather than being just an approach specialised for certain types of applications. Likewise, from a hardware perspective, although shared values are most suitable for use with shared memory multiprocessors, there are no fundamental reasons why they cannot be used successfully with other multiprocessor architectures.

Thus, the main body of this chapter is devoted to an explanation of the operation of shared values. This explanation is opened by a short overview of the ideas behind the use of shared values and considers the types of problem that shared values are designed to solve. Following on, a description of the semantics and syntax of shared values is presented, interleaved with examples explaining how shared values can be used for parallel programming. This description is the basis of the Tyger parallel programming model, which is the model of parallelism specifically developed to support shared values. Finally, the closing section of this chapter departs a little from the shared value theme. It tackles some of the problems in the production of effective parallel software, not addressed directly by shared values, taking the form of a general outline of the remainder of the system software envisaged for a complete shared value environment.

4.1 Philosophy of Shared Values

One way in which parallelism can be exploited by multiprocessors is through the expression of application programs in terms of groups of cooperating threads of control. As noted in earlier chapters, two strategies for creating parallel programs are possible. Firstly, explicit parallelism constructs can be written directly into programs to force parallel execution. Secondly, software tools can be used either to analyse a program and automatically generate the parallelism code (e.g. vectorisation), or be used to execute a program in a way that is naturally parallel, with only algorithmic dependencies controlling the exploitable parallelism (e.g. dataflow). Nevertheless, no matter which method is used to exploit parallelism, effective parallel programs cannot be produced by arbitrarily partitioning programs into multiple threads of control.

In point of fact, any program can theoretically be partitioned such that there is one thread of control per instruction. If such a program is executed correctly in parallel on a shared memory multiprocessor (as opposed to pseudo concurrent execution), in all but trivial cases coordination between threads is necessary to reflect the dependencies between statements, to ensure correct instruction sequencing, and to manage the execution of the threads themselves. When these overheads for communicating data and synchronising multiple threads of control are taken together with the overheads for thread creation and management, such overheads will more than outweigh the useful work performed by such a program. From the viewpoint of efficiency, this makes parallel execution of this kind very unattractive. A more likely scenario for parallel execution, therefore, is that the number of actual threads of control is bounded by the number of processors on a multiprocessor and not only by the problem size. Thus, although conceptually there may be many threads of control in a program, these will be partitioned for execution by a smaller number of actual threads of control. For example, this technique is essentially the one followed by loop spreading when simulating vector operations, and while it is somewhat better than the totally parallel approach it can still be inefficient if the work allocated to an actual thread of control takes less time than making the allocation itself. Thus, if a program is analysed, certain activities may be identified as being suitable for parallel execution according to some efficiency criteria, and it is only these activities that are partitioned and executed in parallel, the rest being executed sequentially.
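The loop spreading technique referred to above can be sketched as follows in C with POSIX threads (a modern notation used here only for illustration; the constants and function names are assumptions of the example). Conceptually there is one thread of control per element, but the iterations are actually partitioned among a small, fixed number of worker threads.

    /* Loop spreading: N conceptual threads of control (one per element)
     * are executed by NWORKERS actual threads, each taking a contiguous
     * slice of the iteration space.                                       */
    #include <pthread.h>
    #include <stdio.h>

    #define N        1000
    #define NWORKERS 4

    static double data[N];

    struct slice { int first, last; };

    static void *spread_worker(void *arg)
    {
        struct slice *s = arg;
        for (int i = s->first; i < s->last; i++)
            data[i] = data[i] * 2.0 + 1.0;        /* the per-element work */
        return NULL;
    }

    int main(void)
    {
        pthread_t    workers[NWORKERS];
        struct slice slices[NWORKERS];
        int chunk = (N + NWORKERS - 1) / NWORKERS;

        for (int w = 0; w < NWORKERS; w++) {
            slices[w].first = w * chunk;
            slices[w].last  = (w + 1) * chunk < N ? (w + 1) * chunk : N;
            pthread_create(&workers[w], NULL, spread_worker, &slices[w]);
        }
        for (int w = 0; w < NWORKERS; w++)
            pthread_join(workers[w], NULL);

        printf("data[0] = %f\n", data[0]);
        return 0;
    }

Whether the partitioning is worthwhile depends on the cost of creating and joining the workers relative to the work performed in each slice, which is exactly the efficiency criterion referred to above.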

If lightweight threads of control are employed to execute a parallel program, thread management overheads can be kept to a minimum. More importantly though, the amount of useful work computed by a thread should exceed the amount of work taken to create and manage that thread, so that some performance benefit will be obtained if sufficient numbers of threads can be utilised. Ideally, if there are no interactions between parallel threads of control the maximum efficiency benefit can be derived from their parallel execution. That is, no time is wasted on the overheads of communication and synchronisation, and as a result, the ratio of useful work performed by a program to its parallelism overheads is as high as possible. In many programs, however, assuming there are no thread interactions is impractical as few applications can be decomposed into completely independent parts. Many parallel programs are composed from threads that interact with each other by exchanging information and synchronising their execution, producing complex overall patterns of control and data flow. Interactions between threads of control can occur for many reasons, though these can be linked to the low level types of dependency between statements identified by Kuck (see chapter two). However, at a conceptual level, threads can interact by transferring program data, by propagating thread control information (synchronising), or by a combination of both (e.g. as in message passing). Thus, to produce effective parallel programs, the aim of a parallelisation strategy is to yield sufficient threads of control to allow a reasonable number of processors to be applied in the execution of a program, but with the proviso that the threads of control should interact as infrequently as possible. Moreover, as an aid to designing and debugging parallel programs it is advantageous if the patterns of communication and dependency between threads of control are regular and one directional.

To recap, parallelism can be obtained from a program by partitioning its data structures such that worker threads, perhaps independently, compute their own parts of some result. Techniques which have taken up this approach include SIMD processing (e.g. vectorisation) and, at a higher level of granularity, MIMD processing (e.g. loop spreading). Data parallelism can be viewed as a natural extension to von Neumann programming that redresses some of the unnecessary sequentiality that is propagated by conventional programming languages when operating on regular data structures. As an alternative paradigm, parallelism can also be obtained by distributing the function of a program across a group of threads, with each thread performing some unique transformation on its data. More often than not, however, threads have to communicate with each other in order to produce a result. A good example of functional parallelism exploited in this way is pipelining. No clear cut rules exist for clarifying which decomposition is preferable for an algorithm, as either decomposition can sometimes be used successfully, or occasionally they can be combined and used together. However, in some cases, the use of either approach can lead to parallel solutions that contain a large degree of parallelism overhead, in the form of communication and synchronisation, which may prove unacceptable for certain applications.


One factor that is important in the design of parallel algorithms is the regularity of the work performed by the threads. Analysis of a program can be made easier if its threads of control perform similar functions. Moreover, specifying that hundreds of threads perform one function is somewhat easier than expressing that hundreds of threads should perform different functions. In the same vein, parallel algorithms based on data partitioning have been shown to exhibit scalable amounts of parallelism. In chapter three, Figure 3.1 showed a map of granularity versus number of threads of control. One of the main points contained in this map is that, in general, data parallelism provides more scope for effective parallelism than functional parallelism. This stems from the observation that as problems are scaled in size, so increasing the size of their data structures, more data parallelism can be applied, data dependencies permitting. However, in the case of a program that exploits functional parallelism, increasing the size of a data structure does not necessarily increase the scope for parallelism, as it can only be increased if the program can be decomposed into smaller functional units. Irrespective of whether such a decomposition can be performed, undoubtedly the repeated functional decomposition of a program will lead to threads of control which have complex interdependencies and correspondingly complex interactions.

For a significant number of applications, exploiting parallelism in a program by data partitioning is a technique that can provide a sufficient number of realisable threads of control to enable worthwhile performance benefits to be obtained. Moreover, these threads of control may well have few interactions, leading to small parallelism overheads. Hence, the objective for shared values in the Tyger programming model is the provision of an elegant notation for expressing parallel activity, that pays particular attention to the efficient exploitation of data parallelism.

4.2 Tyger Parallel Programming Model

A Tyger program starts off as a single thread of control executing in its own address space called a shared value space (SVS). Over the course of its execution, a Tyger program may consist of many concurrently executing threads of control, spawned from the original thread, that can communicate with each other by reading and writing shared values in their single communal SVS. There are two fundamental types of shared values called static and dynamic shared values. Static shared values can be thought of, and declared in the same manner, as conventional variables, but with the proviso that they are automatically shared amongst a group of threads of control that share a SVS. Moreover, static shared values have single assignment semantics which allow them to act as an interthread communication and synchronisation mechanism. Conversely, dynamic shared values are not rooted and lexically scoped to any section of program code like conventional variables, but instead are passive free agents created at run time. After their creation, dynamic shared values reside in the SVS during the execution of their parent parallel program, and can be referenced associatively and manipulated by Tyger constructs in a manner similar to Linda tuples (see chapter two).

Shared values and their associated constructs form the programming part of the Tyger model, but the model extends to cover part of the thread implementation and operating system's view of a parallel program. Information on the structure and external appearance of a group of threads that forms a Tyger program is presented in the global environment section towards the end of this chapter.

One important axiom of the Tyger model is that the process of identifying and coding parallel execution should be an explicit process, deriving benefits from a programmer's experience and knowledge of his intentions regarding how a program should operate. One consequence of this axiom is that a programmer could have in mind the effect parallel execution will have on the performance of a program, to enable performance tuning of the program if desired. However, this explicit approach to parallel programming is qualified by the expectation that software tools, primarily compilers, will off-load many of the details regarding the implementation of parallelism, leaving programmers to concentrate on designing and expressing parallel algorithms. Of course, this is easier said than done, but as shared value constructs are geared towards exploiting the regularity offered by data parallelism, the implementation of parallel threads is made much easier than if shared values constructs claimed to efficiently support all types of parallel programming. Thus, the Tyger model can offer a high level view of parallelism to programmers by concentrating on supporting one important type of parallelism very well, and in so doing does not have to make unreasonable requirements for any underlying implementation.

The Tyger model is intended as a vehicle, or pool of ideas, for exploring ways in which parallelism can be exploited in imperative programs through the use of shared values. The model is evolutionary in nature, changing with the introduction of new ideas, rather than being a fixed concrete model. The decision to make the Tyger model mutable was taken to reflect the fact that the provision of programming constructs is always a compromise between design considerations, therefore making it difficult to obtain arguably the best suite of constructs at the first attempt. The Tyger programming language has been developed to test out, in practice, the idea of using shared values for parallelism put forward in the Tyger model.

The Tyger language is a compact parallel programming language that deals exclusively with parallel programming, and hence, has to be embedded in a host language for general use. The execution model followed by the Tyger language is based on procedural control flow, so suitable host languages are imperative languages such as Fortran, Algol and their derivatives. The objective of the Tyger language is to describe parallel control flow and information sharing relationships, and in consequence, the Tyger language cannot be packaged as a black box that can be retrofitted blindly to any imperative language. Tyger mechanisms have to be carefully integrated into a language, to ensure uniformity of operation across Tyger constructs and thereby ease of use, and cannot be thought of as merely an adjunct. In practice, this may mean having to compromise some of the Tyger constructs or restrict some of the host language features. As an example, consider the implementation of Tyger constructs in an object-oriented language such as C++. Here, one reason for using the Tyger constructs would be to allow objects implemented with shared values to be operated on in parallel. Object-oriented languages are based around the manipulation of objects created to a pattern defined by their class. C++ allows the declaration of static data which is shared between all objects of a class by maintaining only a single copy of this static data and making all objects of the class refer to it when necessary. If a C++ program that exploits this feature is executed concurrently, then problems can arise with sharing and synchronising accesses to this static data that do not arise in a sequential execution. This is because static data is implicitly shared and unless explicit steps are taken to share and synchronise accesses to it, programs that use static data cannot work correctly. However, due to the nature of information structuring and hiding used in object-oriented languages, explicit changes cannot always be made to references to static data as the implementation of a class is hidden from its users. Hence, the only safe solution is to disqualify the static data feature from the language. Thus, although it would be convenient to think of the Tyger language as being self contained, restrictions may have to be placed on the host language, and details of the Tyger language itself may vary, according to which language is acting as the host.

Two languages have been chosen as host languages for Tyger constructs to date, C and C++. These languages were chosen as examples of imperative languages that are in common use but have received little attention from the point of view of automatic parallelisation, therefore requiring any exploitation of parallelism to be done explicitly. This situation has arisen because programs written in C and C++ can suffer from the aliasing of variables and can make considerable use of pointers, which makes the dependency analysis for automatic parallelisation very difficult. This fact has been recognised by many other researchers who have produced explicit parallel programming tool kits in the form of thread libraries for these languages (see chapter two). This work has been useful for carrying out research into parallel programming, but in the main has been too low level when compared with the more integrated approaches followed by parallel object-oriented languages (e.g. Argus) and parallel declarative languages (e.g. Strand). Thus, Tyger constructs are a method for building on the earlier work done with thread libraries, in order to provide much higher levels of parallel programming abstraction for C and C++ that can rival other high level approaches to parallel programming.

4.2.1 Programming Language Constructs

When designing a parallel programming language, three types of programming mechanism have to be addressed: (i) how multiple threads of control are created; (ii) how they communicate; and (iii) how they are synchronised. The Tyger parallel programming language supports the creation of multiple threads of control via parallel iteration constructs to enable data partitioning, and by a general parallelism operator to support functional partitioning. Communication between multiple threads of control is realised by shared values, which are also the method of synchronisation. Shared values come in two varieties, static and dynamic, with static shared values being the most influential part of the Tyger model.

4.2.2 Static Shared Value Semantics

A single assignment variable has the property that it can only be bound to a value (assigned) once in its lifetime during the execution of a program. This property is exploited by dataflow systems as a method for controlling program execution, with operations being executed as soon as the values of their operands are made available. The idea of controlling program execution by data availability, however, is not confined to dataflow languages (e.g. it occurs in Strand) and also can be found in languages that make use of synchronous message passing such as CSP (see chapter two). In CSP, if a thread wishes to send a message to a companion, it must wait until the companion is ready to receive it. Similarly, if a thread wishes to receive a message it must wait until one has been produced. These synchronisation semantics for the flow of data are more specialised than those for one directional dataflow as not only must a consumer thread wait until its data is available, but also a producer thread must wait until its consumer is ready. Thus, dataflow semantics can be used to provide a dual mechanism serving for both interthread communication and synchronisation.

One of the goals of the Tyger model is to produce an interthread communication mechanism that is both elegant and efficient. Shared memory provides an excellent starting point for efficient communication. However, detailed synchronisation code is sometimes needed to ensure that operations occur in a correct sequence. Consider an elementary example of two threads of control using shared memory to communicate, operating under the classic writer to reader relationship. In Scenario 0 no explicit synchronisation construct is used to enforce correct program execution, though it is assumed that operations on shared memory locations are executed atomically. The shared data in this scenario are the shared variables X and XControl.

// Scenario 0
int X, XControl = 1;

Writer()
{
    X = 100;
    XControl = 0;
}

Reader()
{
    do {} while (XControl != 0);
    /* use the value of 'X' directly */
}

When the Writer() and Reader() procedures are executed by separate threads of control, the thread executing the Reader() procedure will be held in a loop until the value of XControl is changed by the writing thread. Further on in the execution of the reader the value of X will be used directly in some computation. One of the drawbacks in using the above synchronisation scheme is that two shared variables have to be created, where conceptually there should only be one. A more serious flaw, however, is that the semantic link between a synchronising variable and a data variable is one made entirely by the programmer, therefore opening an avenue for programming errors. More specifically, one of the common errors in using this form of synchronisation is that program statements can be ordered incorrectly, causing synchronisation errors, and possibly leading to program failure due to deadlock.


In Scenario 1, locking is used to control access to shared data in much the same way as before, though this time resulting in a simplified Reader() procedure. The shared data in this scenario consists of X and a lock XLock.

// Scenario 1
lock XLock = locked;
int X;

Writer()
{
    X = 100;
    unlock(XLock);
}

Reader()
{
    lock(XLock);
    /* use the value of 'X' directly */
}

The interpretation of the execution of the new program is exactly as before, with the benefit of using locking being that more of the sense of the communication between the threads is captured. However, locking is a very powerful, yet unstructured, synchronisation mechanism so it can be difficult to use in practice, and faces problems similar to those outlined in Scenario 0. An aesthetic improvement over Scenario 1 is to use an event based mechanism for synchronisation. The use of an event mechanism provides a more tailored expression of the synchronisation between the threads of control than does locking, in that the reader waits for the event that corresponds to the arrival of a value for X. The syntax for an implementation using an event is illustrated in Scenario 2.

// Scenario 2
event XSignal = cleared;
int X;

Writer()
{
    X = 100;
    signal(XSignal);
}

Reader()
{
    wait(XSignal);
    /* use the value of 'X' directly */
}

Once again the same interpretation of the program applies, though normally a thread is suspended and does not consume processor cycles while it waits on an event. This process of program refinement could be continued using mechanisms such as barriers and monitors; however, two fundamental problems will remain. While events (and their ilk) provide an intuitive method of synchronisation for this problem, it is still possible to make programming mistakes when ordering signal() and wait() operations. Moreover, each of the approaches described needs to have a synchronisation variable explicitly initialised to a correct state before execution of the procedures is commenced. If a message passing mechanism is used to communicate information between writer and reader, however, the former problem is alleviated as communication and synchronisation are enmeshed into a single pair of commands. Furthermore, no explicit initialisation may be needed, although a programmer must ensure that there are no messages already present on the communication channel. The syntax of such a message passing arrangement is illustrated in Scenario 3.

// Scenario 3
channel XChannel;

Writer()
{
    int X;
    send(XChannel, X = 100);
}

Reader()
{
    int Y;
    receive(XChannel, Y);
    /* use the value of 'X' indirectly by referencing Y */
}

When the Writer() procedure is executed by a thread of control, it puts a copy of the value of X on the shared communication channel (XChannel) and then continues execution (with the channel buffering the value in a FIFO queue). When the Reader() procedure is executed by a thread of control, that thread will be suspended until a value can be obtained from XChannel. Once a value has been made available it is removed from the channel and bound to the local variable Y, with the reader then continuing and using the value of Y in some subsequent computation.

Message passing neatly combines communication and synchronisation by replacing direct sharing semantics with copy semantics. However, although one source of programming errors is removed another source of errors is created, as there is the need for a programmer to deliberately partition the set of program variables between threads of control. Moreover, consideration must be given to how data is to be explicitly moved between threads of control when it is required. The tasks of partitioning and designing protocols for moving data around in message passing systems can be an integral part of problems that require a model of physical distribution (e.g. fault-tolerant program replication across multiple machines), but for problems that use a model of conceptually shared data, such tasks are just extra work for programmers.

One of the aims of using static shared values is to combine the immediacy of access found in shared memory systems with the compact synchronisation mechanism found in message passing systems. Hence, static shared values combine aspects of dataflow synchronisation behaviour with shared memory accessibility. In the single assignment model utilised by dataflow systems, only one assignment can logically be made to each variable. Further assignments to the same variable are treated as invalid and are disallowed; this rule is also followed for shared values. Thus, it may seem that static shared values are nothing more than single assignment variables, though while this is partly true, shared values exhibit some semantic differences that will be described in coming sections. Scenario 4 illustrates a static shared value solution to the writer to reader problem.

// Scenario 4
SV int X;

Writer()
{
    X = 100;
}

Reader()
{
    /* use the value of X directly */
}

A static shared value X is declared as shared to the Writer() and the Reader() procedures. When the Reader() procedure is executed by a thread of control, that thread will execute until it needs to use the value of X, at which time it will be suspended until the value of X is available. As soon as the value of X is determined, it will be used immediately by the reader in some calculation. When the Writer() procedure is executed by a thread of control, the thread assigns a value to X, thereby making this value available to any thread that needs it, and so releases any suspended threads. Thus, the use of a static shared value permits a once-and-for-all communication between threads of control, as the value held by a shared value cannot be changed once it has been set, though it can be read as often as required.
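One plausible realisation of these semantics, offered purely as an illustrative sketch and not as the Tyger implementation, is a write-once cell protected by a mutex and condition variable. The C code below uses POSIX threads, and the names SharedValue, sv_bind() and sv_read() are inventions of the sketch.

    /* Sketch of static shared value semantics: a read blocks until a value
     * is bound, and a second bind is an error (reported and aborted here). */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        pthread_mutex_t lock;
        pthread_cond_t  bound_cond;
        int             bound;      /* has a value been assigned yet? */
        int             value;
    } SharedValue;

    #define SHARED_VALUE_INIT \
        { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

    void sv_bind(SharedValue *sv, int v)
    {
        pthread_mutex_lock(&sv->lock);
        if (sv->bound) {                          /* single assignment violated */
            pthread_mutex_unlock(&sv->lock);
            fprintf(stderr, "exception: shared value already bound\n");
            exit(1);
        }
        sv->value = v;
        sv->bound = 1;
        pthread_cond_broadcast(&sv->bound_cond);  /* release waiting readers */
        pthread_mutex_unlock(&sv->lock);
    }

    int sv_read(SharedValue *sv)
    {
        pthread_mutex_lock(&sv->lock);
        while (!sv->bound)                        /* suspend until a value exists */
            pthread_cond_wait(&sv->bound_cond, &sv->lock);
        int v = sv->value;
        pthread_mutex_unlock(&sv->lock);
        return v;
    }

    /* Scenario 4 recast in terms of the sketch above. */
    static SharedValue X = SHARED_VALUE_INIT;

    static void *writer(void *arg) { sv_bind(&X, 100); return arg; }
    static void *reader(void *arg) { printf("%d\n", sv_read(&X)); return arg; }

    int main(void)
    {
        pthread_t w, r;
        pthread_create(&r, NULL, reader, NULL);
        pthread_create(&w, NULL, writer, NULL);
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        return 0;
    }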

Shared Value Description

A simplified view of the Tyger programming model permits the existence of two elementary types of variables. Firstly, there are those that are treated as conventional variables, being local to the thread that uses them and not shared between multiple threads of control (i.e. traditional variables). Therefore, there are no synchronisation constraints on operating with these variables, and no race conditions or thread interleavings to be considered. Thus, for efficiency reasons, it is safe to reassign these variables when necessary as this cannot affect other threads of control. Secondly, there are shared values, which are shared between threads of control, acting as an interthread communication and synchronisation mechanism, that can only be bound to a single value. An attempt to bind another value to such a variable will raise an exception, while references to an unbound shared value result in the suspension of the referencing threads until a value is made available (i.e. the shared value is bound to a value). Thus, the name "shared value" reflects the fact that only write-once values can be shared between threads, not rewriteable variables. One important point to note regarding the notion of sharing of data between threads of control is that it is selective and modular. That is, shared values do not have to be declared as global variables and can appear as local variables that a thread can share with its children, via a parameter passing mechanism.

Using single assignment variables as a communication and synchronisation mechanism seems an attractive way of extending imperative languages to encompass parallelism. However, there are two drawbacks in using the basic single assignment scheme. Firstly, there is the matter of storage management. As many potentially parallel programs are based on the manipulation of large data structures, making new versions of such data structures because of repeated alterations to certain parts of the structures could be prohibitive in terms of the memory required unless adequate techniques for the reclamation of unwanted storage are available (garbage collection). Secondly, as many imperative programs are based on manipulating data structures by using iterative rather than recursive methods, the naming of variables can be problematical in the case of multiple assignment to temporary or intermediate variables. In addition, the reuse of storage is one of the strong efficiency points of the von Neumann architecture, from which shared memory multiprocessors are derived, and should be given due consideration. Furthermore, from the point of view of time efficiency, maintaining multiple copies of data structures may lead to excessive copying overhead, which can have a disproportionately negative effect on parallel program performance if, as a result, the processor-to-memory communication links become saturated.

To ease the transition to the single assignment property of variables in imperative languages, the Tyger language offers a compromise between using single or multiple assignment values to support thread interactions, combatting the storage and naming problems mentioned before. The key observation which is exploited in the Tyger model is that much of the parallelism in a program can be exploited in a series of discrete bursts of parallel activity. For example, a program starts executing with a single thread of control, performs some work involving multiple threads of control, but returns to a single thread of control to perform some final work. This simple example can be generalised to encompass programs that contain many bursts of parallelism, returning to sequential activity after each one.


A Tyger program consists of a sequence of intervals, called stripes, that correspond to alternate blocks of program statements in which a program is either operating in a sequential or concurrent manner. Stripes are delimited by parallelism constructs which change the state of a program from being sequential to parallel and vice versa, though otherwise need no explicit denotation. The Tyger model defines that it is not possible to have consecutive sequential stripes, as these are coalesced into a single sequential stripe, nor is it possible to have consecutive parallelism stripes. The former observation is quite natural but the latter requires a little explanation. At the very start and very end of a parallelism stripe there is only a single thread of control, any others having terminated. Thus, the end of one parallelism stripe and the start of the next, delimits a sequential stripe that may, or may not, contain any user program instructions.

The reason why a Tyger program is cast into this seemingly strange mould of alternate sequential and parallel intervals is to allow the reuse of the communication and synchronisation mechanism provided by static shared values. During a sequential stripe, there is no need for the communication and synchronisation semantics of a shared value to be in effect, so it can behave precisely like a conventional variable, with the capability to be assigned multiple times. However, when a parallelism stripe commences, all shared values take on an uninitialised state so that their dataflow semantics can come into force, allowing them to act as an interthread communication mechanism. When a parallelism stripe terminates, that is all but one of the threads has terminated, a shared value reverts back to being a normal variable, ready for immediate reuse or for reuse in a subsequent parallelism stripe.

The stripe model of parallelism employed by Tyger programs envisages that conceptually there is one master thread of control that persists throughout the execution of the entire program. This thread of control represents the activation of a procedure which corresponds to the main body of the program, and is called the spine (similar to the term used in graph reduction). There are Tyger constructs that allow the activation of procedures as new threads of control, which execute in parallel with the parent main thread of control. However, the parent cannot terminate the execution of a thread creation construct until all of the newly created worker threads have themselves terminated. Figure 4.1 shows the relationship between a thread diagram on the far right, and its corresponding sequence of Tyger stripes, for a Tyger program consisting of five stripes.


Figure 4.1 - Tyger stripes. [Diagram: control flow along the spine over time, with sequential intervals alternating with parallelism stripes in which coworker threads execute.]

As an example main program, the following code fragment describes a Tyger program that consists of three stripes: an opening sequential stripe in which an initialisation routine (doMasterStartUp()) is executed, a parallelism stripe where doMasterWork() runs in parallel with doSlaveWork(), and a closing sequential stripe in which a concluding procedure (doMasterCloseDown()) is executed:

void main()
{
    doMasterStartUp();                  /* sequential stripe  */
    doMasterWork() |&| doSlaveWork();   /* parallelism stripe */
    doMasterCloseDown();                /* sequential stripe  */
}

The "|&|" symbol is called the parallel-and operator and is one of the Tyger thread creation operators. In the above program it will cause the creation of a new thread of control to execute the doSlaveWork() procedure in parallel with the doMasterWork() procedure, which is executed by the master thread of control. Furthermore, in common with all Tyger thread creation operators, the parallel-and operator requires the master thread of control to wait, if necessary, until the worker thread of control has terminated. Further information regarding this operator is presented later in this chapter.
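A compiler for the Tyger language might plausibly translate the parallelism stripe above into a simple fork and join of one worker thread. The C sketch below, using POSIX threads, is an assumption about one possible translation rather than a description of the actual implementation; the procedure bodies are reduced to placeholder print statements so that the fragment is self-contained.

    /* Possible translation of  doMasterWork() |&| doSlaveWork();  */
    #include <pthread.h>
    #include <stdio.h>

    static void doMasterStartUp(void)   { printf("start up\n"); }     /* placeholder */
    static void doMasterWork(void)      { printf("master work\n"); }  /* placeholder */
    static void doSlaveWork(void)       { printf("slave work\n"); }   /* placeholder */
    static void doMasterCloseDown(void) { printf("close down\n"); }   /* placeholder */

    static void *slave_wrapper(void *arg)
    {
        doSlaveWork();                   /* the worker's half of the stripe */
        return arg;
    }

    int main(void)
    {
        pthread_t worker;

        doMasterStartUp();                               /* sequential stripe  */

        pthread_create(&worker, NULL, slave_wrapper, NULL);
        doMasterWork();                                  /* parallelism stripe */
        pthread_join(worker, NULL);   /* |&| completes only when both halves do */

        doMasterCloseDown();                             /* sequential stripe  */
        return 0;
    }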


Parallelism stripes can be viewed as a series of fork and join statements in which all of the workers are forked at the beginning and joined at the end of each parallelism stripe. In the current Tyger model, a program can contain only one spine, which means that worker threads cannot themselves create other threads to form substripes. (The reasons for this restriction are presented later in this chapter.) The rules for the creation of threads of control imposed by the Tyger model constrain the way in which parallel programs can be structured, by making them express parallelism in a number of discrete parallelism stripes. Moreover, the threads of control in a parallelism stripe take the form of an asymmetric master-worker configuration. However, as subsequent example code fragments show, these restrictions do not unduly detract from the expressive power of the Tyger constructs, as many parallel applications can be expressed in a single parallelism stripe. Furthermore, the master thread can perform useful work during a parallelism stripe and does not have to remain idle waiting for its workers to finish.

Static Shared Value Histories

As the Tyger model of parallelism allows static shared values to be reused over the course of a number of stripes, it may be useful to examine a past value held by a shared value. Such contingencies arise in iterative numerical applications when the next state of a variable is computed as some function of its previous states. This need to access past state information has been recognised in other languages based on single assignment such as SISAL [McGr83, Feo90], where there is an operator to retrieve the previous state of an iterator variable. In the Tyger model, a past value taken by a shared value, sv, in a previous stripe can be accessed by encapsulating references to sv in the function old(). This function takes a shared value as a parameter and returns a shared value as a result. The interpretation of what the old() function returns is dependent on whether it is invoked from a sequential stripe or a parallelism stripe. Consider a thread T0 operating in a sequential stripe with a shared value sv which is acting as a normal variable. The following statements are always true:

T0 reads sv => T0 obtains the current value of sv.
T0 reads old(sv) => T0 obtains the previous value of sv if one exists, otherwise an exception is raised.

T0 binds sv to a value => sv is bound to a new current value.
T0 binds old(sv) to a value => an exception is raised.


As with normal variables, the current value of a static shared value can always be inspected, although it may be undefined if it has yet to be assigned. Thus, intuitively, the old() function operates by returning the previous value held by a shared value, or by raising an exception when this value does not exist or when an attempt is made to alter it.

In a parallelism stripe the old() function operates in the following manner. Consider a group of n threads T0 .. Tn executing in a parallelism stripe. The role of T0 is to produce a shared value, sv, that the Ti's (0 < i <= n) will use in their own computations, with the Ti's possibly binding other shared values as well. The following statements are always true:

T0 binds sv before Ti reads sv => Ti obtains the latest value of sv.
Ti reads sv before T0 binds sv => Ti blocks until sv is bound to a value.
T0 binds sv before Ti reads old(sv) => Ti obtains the previous value of sv.
Ti reads old(sv) before T0 binds sv => Ti obtains the current value of sv.
T0 binds sv after binding sv => an exception is raised.
T0 binds old(sv) => an exception is raised.

Whereas in a sequential stripe old(sv) always returns the previous value of sv, in a parallelism stripe it will either return the previous value of sv or the current value of sv, depending on whether sv has been bound to a value in that stripe. The way to look at this behaviour is to assume that sv gets a new current state immediately upon entry to the parallelism stripe. This state is undefined and cannot be examined until it has been bound to a value by a thread, but it means that old(sv) is always defined. Unfortunately, there is a minor shortcoming in the usage of the old() function which occurs if old(sv) is referenced during a parallelism stripe in which sv is not written. In this case, a call to old(sv) from the following sequential stripe will return the previous value of sv, and not the current value of sv that was returned by the call to old(sv) in the parallelism stripe.

The old() function can be generalised to "old(i, sv)", where i is an integer (i > 0), sv is a shared value, and i selects the value bound i assignments before the current value of sv. For example, assume sv has taken the values {2, 4, 6, 8}, ranging from the oldest on the left to the current value on the right. The value of sv is 8, the value of old(1,sv) is 6, the value of old(2,sv) is 4, and the value of old(3,sv) is 2. An equivalent but more unwieldy notation for multiple histories can be obtained by nesting calls to the old() function. For example, old(3, sv) is equivalent to old(old(old(sv))). As before, an attempt to read an old shared value that does not exist, or an attempt to write to an old shared value, will cause an exception to be raised. Figure 4.2 illustrates a snapshot of a Tyger program's execution in which it is executing in stripe i, reading and writing shared values, but can also read previously written shared values from earlier stripes.

Figure 4.2 - A snapshot of a Tyger program's execution. (The diagram shows stripes 1 to n, with the active stripe reading and writing the current shared values while old() reaches back to previous values written in earlier stripes.)
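The thesis does not prescribe how histories are stored, but the behaviour of old() can be pictured as a value cell that appends to a list of past values each time it is rebound. The fragment below is a plain C++ illustration only; the class HistoryCell and its member names are invented, and none of the stripe or synchronisation machinery is modelled.

#include <stdexcept>
#include <vector>
#include <cstdio>

// Invented illustration of a value with an inspectable history.
class HistoryCell {
    std::vector<int> values;                  // oldest value first, current value last
public:
    void bind(int v)     { values.push_back(v); }
    int  current() const { return values.at(values.size() - 1); }
    int  old(unsigned i = 1) const {          // old(i): the value bound i assignments ago
        if (i == 0 || i >= values.size())
            throw std::out_of_range("no such past value");
        return values[values.size() - 1 - i];
    }
};

int main()
{
    HistoryCell sv;
    for (int v : {2, 4, 6, 8}) sv.bind(v);    // the example history {2, 4, 6, 8}
    std::printf("%d %d %d %d\n",              // prints 8 6 4 2, as in the text
                sv.current(), sv.old(1), sv.old(2), sv.old(3));
    return 0;
}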

Using the history facility provided by the old() function means that static shared values can be initialised at compile time, as compile time can be regarded as part of the initial sequential stripe. However, if there is a special need to obtain the initial state of a shared value rather than some intermediate state, a call can be made to the old() function with a special read only variable. Calls take the form of "old(InitialValue, sv)". Thus, the InitialValue read only variable is not a constant, but instead is a count which represents the number of assignments made to a static shared value. This mechanism can be used to retrieve the initial state held by a static shared value irrespective of whether it was made at compile time or not.

Figure 4.3 illustrates a sample Tyger-C program with two parallelism stripes. The first two lines of the main procedure declare and initialise two shared values sumA and sumT. Next, three worker procedures doA(), doB() and doC() are listed. The worker procedures are denoted by the keyword "PAR", which means that they can be executed in parallel, but do not have to be. The main reason why the keyword is necessary is to take the place of any return type, as no provision is made in the current Tyger model for a thread to return a value. When the main procedure is executed it creates a worker thread of control and executes doA() and doB() in parallel. The thread executing doA() binds the shared value referenced by its parameter called outResult (i.e. sumA). The thread executing doB() reads the value of its parameter called in (also sumA) and uses this value in a part of its own computation. In the second parallelism stripe, a thread executing doA() is executed in parallel with a thread executing doC(). As before, the shared value sumA is used to communicate data between the threads, only this time the previous value of sumA is also needed.

void main()
{
    SV float sumA = 0.0;
    float sumT = 0.0;
    doA(&sumA, 1) |&| doB(sumA, &sumT);     /* parallelism stripe */
    printf("The first sum is %f\n", sumT);
    doA(&sumA, 2) |&| doC(sumA, &sumT);     /* parallelism stripe */
    printf("The total sum is %f\n", sumT);
}

PAR doA(outResult, data)
SV float *outResult;
int data;
{
    float LocalSum;
    // perform some local work with 'data', storing the temporary result in LocalSum
    *outResult = LocalSum;
}

PAR doB(in, outResult)
SV float in;
float *outResult;
{
    float LocalSum;
    // perform some local work referencing 'in', storing temporary results in LocalSum
    *outResult = LocalSum;
}

PAR doC(in, outResult)
SV float in;
float *outResult;
{
    float LocalSum;
    // perform some local work referencing 'in', storing temporary results in LocalSum
    *outResult = LocalSum + old(in);
}

Figure 4.3 - Sample Tyger-C program.

One important observation to make about the previous states held by a static shared value that are returned by the old() function is that they do not have a direct correspondence to the number of stripes in a program. That is, "old(i, sv)" returns the value bound i assignments before the current value of sv, if it exists, which may not have been written in the ith previous stripe. This is so because a shared value may take many values during a sequential stripe, or may not be referenced at all during a stripe. Hence, to retrieve a particular shared value history, a programmer should count the number of assignments made to the shared value, rather than the number of stripes in the program between the past value of interest and its current value.

The main reason why the retrieval of old shared values is linked to the actual number of distinct old values in a shared value history, and not to the number of stripes in a program, is to make program code more portable. For instance, the reference "old(sv)" assumes only that an assignment, executed in some past stripe, has made at least one update to sv. Under some alternative stripe semantics, using a fictitious operator called, say, "previous(sv)", it would be assumed that there has been at least one assignment to sv and that it was made in the last stripe. Of course, there may be counter examples where a programmer may wish to know the value taken by a shared value in some particular previous stripe. So, prudently, both history mechanisms can be provided for extra flexibility, as this can be done for very little implementation cost.

Programming Language Considerations

Up until this point the discussion of the Tyger model has been couched in quite abstract terms, shying away from the realities encountered in the implementation of Tyger languages. Unfortunately, adequate descriptions of the remaining Tyger programming constructs cannot be supplied without some recourse to specific implementation language details.

The first implementation of the Tyger model used the C language as a host and was called Tyger-C. As C is a fairly small language, not supporting a high degree of data abstraction, new syntax was developed for the Tyger constructs. To produce executable Tyger-C programs it was envisaged that source programs would be passed through a preprocessor that generated, as output, C programs containing explicit lightweight thread constructs to implement the parallelism. Some of the simple Tyger constructs were implemented in this way but, due to complexities in the Tyger model, it was discovered that a complete preprocessor would approach the complexity of an optimising compiler, so this route was abandoned.

The object-oriented language C++ was chosen as the next Tyger host language. The syntax and semantics of an object-oriented language are much richer than those of a simple imperative language, which means, in this case, that almost all of the Tyger model can be implemented on top of C++ without introducing any additional syntax. Thus, the use of an object-oriented model provides an excellent short term method for building Tyger constructs for evaluation, though a true Tyger compiler is needed to produce the most efficient code.

As a consequence of using C++ as a language to implement Tyger constructs, shared value objects are created by instancing the shared value class with a parameter denoting the number of shared values to be held inside the object (accessed as a linear array). In addition, all shared value operations are described as being member functions of a shared value object, and can be thought of as being tied to a shared value object and not free-floating as before.

Shared Value Notation

Individual static shared values can be used to express fine grained parallel activity, though in many cases to obtain real benefits from parallel execution on shared memory multiprocessors this parallel activity must be blocked into sizes yielding medium grain parallelism or better. One of the easiest ways to accomplish this is to parallelise algorithms that process regular data structures such as arrays. Hence, the array assumes the position of the primary data structure in the static shared value model, and forms the basis for the discussion in the remaining part of this chapter.

A notation to declare and manipulate arrays is not really part of the Tyger model, as it has more to do with the data structures of a Tyger host language. Nevertheless, it is included here as the parallel manipulation of arrays can be made much easier by the provision of suitable syntax. Two areas of array notation need to be addressed. Firstly, arrays of shared values must be declared; and secondly, sections of arrays of shared values need to be specified as parameters to Tyger constructs. In the latter case, a high level mechanism is needed to identify shared value array elements so that information can be exchanged between threads without actually having to examine the value of a shared value. (This idea is quite similar to that of Halstead's Futures in MultiLisp [Hals87], where references to variables can be manipulated on the promise that values for those variables will eventually become available.) For example, consider a master thread that allocates shared value array elements to worker threads. The master thread needs to be able to reference groups of shared values so that it can allocate them as input parameters to the worker threads that compute the actual values (this idea is returned to later in the description of the Tyger iterators).

The notation that has been chosen to manipulate arrays is, unremarkably, modelled on that found in vector and array processing languages. Array elements or sequences of array elements can be identified by using index sets. Such sets provide a tidy and compact method of element identification, taking the form of an integer triple lower-bound: upper-bound: step or simply a pair of bounds with a default step equal to one.


Tyger-C notation: here the SV qualifier in an array or variable declaration appropriately takes the place of the storage class of the variable. This not only makes parsing such declarations easier, but also means that the shared value qualifier can be easily thought of as a property of a variable, which it is, rather than a specific variable type. Sample declarations of shared values in Tyger-C are:

SV int signal                 /* integer shared value */
SV int work[1:100]            /* array 1-100 of integer shared values */
SV float work[1:20][2:10]     /* 2-d array of single precision real shared values, indexed 1-20 and 2-10 */

Tyger-C++ notation: here shared value objects are created by instancing the shared value class and supplying the number of shared values that the object contains. For example, an array of shared values work[0:n-1] can be created by invoking the shared value constructor via the statement "SV work(n)". Notice that the use of index sets in Tyger-C++ shared value declarations has degenerated to the point where only an upper bound need be specified, with the lower bound and step defaulting to 0 and 1 respectively. Naturally, complete index sets could be used in shared value declarations, though in many cases their presence would not add appreciably to the readability or functionality of a program.

One drawback of the shared value class is that it can only represent shared values of a single base type at a time (e.g. integer or float), unless elaborate and none too reliable steps are taken in its implementation. Alternatively, other similar shared value classes could be used to represent different base types, such as SVF vector(n) for floating point shared values. However, subsequent releases of C++ promise parameterisable (generic) types, in which "SV vector(float, n)" would create an array called vector of floating point shared values, and "SV list(int, n)" would create an array called list of integer shared values.
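Purely as an indication of what such a parameterised class might look like, the following sketch uses present-day C++ templates (a facility only anticipated at the time of writing) to declare a generic container of shared values. It is illustrative only: the class omits all of the single assignment and synchronisation machinery that a real shared value object must carry.

#include <vector>
#include <cstddef>

// Illustrative only: a generic container of n shared values of base type T.
template <typename T>
class SV {
    std::vector<T> elements;
public:
    explicit SV(std::size_t n) : elements(n) {}
    T&       operator[](std::size_t i)       { return elements[i]; }
    const T& operator[](std::size_t i) const { return elements[i]; }
    std::size_t extent() const               { return elements.size(); }
};

int main()
{
    SV<float> vec(100);     // analogous to "SV vector(float, n)" in the text
    SV<int>   list(50);     // analogous to "SV list(int, n)"
    vec[0]  = 1.0f;
    list[0] = 1;
    return 0;
}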

4.2.3 Control Flow Partitioning Functions

To express the data partitioning of an array of shared values, a supplementary mechanism has been provided to coordinate the creation and termination of multiple threads of control, because shared values by themselves are only a mechanism for expressing communication and synchronisation. Of course, one could question the need for an additional thread creation notation, as the task of adding thread creation code could be automated and performed by an auto-parallelising compiler using the techniques described in the software tools section of chapter two. A point in favour of this approach is that the dependency analysis of the auto-parallelisation should be simpler than that normally encountered in programs written using imperative languages, because of the restrictions that the single assignment property places on the use of shared variables. A negative point, however, is that the host languages in which Tyger constructs have been implemented make use of pointers for accessing structures and arrays, which severely complicates the analysis. In addition, the philosophy behind shared values is that an explicit approach to parallelism should be followed, one that encourages programmers to design programs with parallelism in mind; to this end, explicit programming language support should be provided to express that parallelism.

Essentially, two types of construct are needed for controlling the creation and termination of multiple threads of control. Firstly, and most importantly, a mechanism is needed to specify the points in a program where multiple threads of control are to be employed. Secondly, a mechanism is needed to limit the number of threads of control that will be used at those points. The specification of parallel activity is supported by the parallel-and operator (introduced earlier in the sample programs) and the Tyger iterators. The semantics of the parallel-and operator are quite simple so they are discussed first, even though the iterators are the primary thread creation mechanism in the Tyger model.

The thread creation mechanism provided by the parallel-and operator "|&|" allows named procedures to be executed in parallel. Basically, the operator takes two arguments and executes them in parallel as separate threads of control. When the execution of both arguments has terminated, the instruction following the invocation of the parallel-and operator is executed. The use of the parallel-and operator can also be cascaded, permitting many procedures to be executed in parallel. Thus, the syntax of its use is

FN_CALL |&| FN_CALL { |&| FN_CALL }

where the FN_CALL term can be a procedure or Tyger call. The main use of the parallel-and operator is to express functional partitioning, where several distinct procedures are executed in parallel.
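Outside Tyger, the effect of a cascaded parallel-and can be approximated with ordinary threads that are created together and joined before execution continues. The following standard C++ sketch, with invented procedure names, corresponds roughly to doA() |&| doB() |&| doC() in Tyger-C.

#include <thread>
#include <cstdio>

static void doA() { std::printf("A\n"); }    // invented worker procedures
static void doB() { std::printf("B\n"); }
static void doC() { std::printf("C\n"); }

int main()
{
    std::thread tb(doB), tc(doC);   // two worker threads, as a cascaded |&| would create
    doA();                          // the master thread does its own share of the work
    tb.join();                      // the construct completes only when all arms have finished
    tc.join();
    return 0;
}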

Conversely, the Tyger iterators are designed for use in data partitioning applications, in which replicated workers execute code that operates on parts of a shared value data structure. The original idea of an iterator comes from the programming language CLU [Lisk77]. In CLU, an iterator is a data abstraction mechanism that is used to select some part of a data structure. For example, a set can be represented by a list of values; the role of an iterator is to provide a mechanism for scanning such a list, without revealing to a user the details of the list's implementation. In the Tyger model, the role of an iterator is similarly to scan over a data structure, only this time procedures operating on the data structure are executed in parallel under the auspices of the iterator. However, a limitation of this model is that all the worker threads execute the same procedure body. Thus, to perform functional partitioning a programmer must write additional selection code in the thread bodies that determines a thread's function from an examination of its input parameters. This is somewhat irritating from an aesthetic point of view, as writing the extra selection code is a burden on programmers which may prove awkward in some circumstances. For instance, the parameter list of a thread must be the union of all initial thread parameter lists for that parallelism stripe. Hence, the parallel-and operator, described previously, was included in the Tyger model as a more general thread creation operator.

The CONTROL() procedure allows a programmer to explicitly set the number of concurrent threads of control that will be employed in a parallelism stripe created by an iterator. The syntax of an invocation of the CONTROL() procedure is "CONTROL(THREADS, TRACE)", where the THREADS parameter is a positive integer giving the number of worker threads to be used by an iterator, and the TRACE parameter is a boolean flag (ON/OFF) controlling the printing of debugging information regarding the allocations of work made to each worker thread by an iterator.

Tyger-C Shared Value Iterators

For the implementation of the Tyger iterators in Tyger-C some unusual syntax was devised to reinforce the bonding between arrays of shared values and the iterators that were used to operate upon them. Basically, the model of parallelism was one where a procedure that was to be replicated and executed in parallel took, upon its invocation, an initial parameter consisting of an iterator and an array of shared values. For example, consider using data partitioning to concurrently operate the procedure worker(), which uses the read only parameters contained in ParameterList, on an array of shared values X[1:n]. This can be brought about by a call to an iterator called DOALL in the form of

worker( < DOALL X[1:n] >, ParameterList)


When the procedure worker() is called, as above, the role of the iterator is to invoke multiple copies of the procedure worker(), with each copy being executed by a separate thread of control. In addition, each invocation of the procedure worker() receives a reference to the array of shared values (in this case X[1:n]), and a unique triple of lower bound, upper bound and step integer indexes to a section of this array; the intention being that a thread computes the values in the section of the array it is allocated. The corresponding declaration of the header of procedure worker() is below, with the parameters Y (an array of shared values), lower (integer), upper (integer) and step (integer) being filled in by the iterator

PAR worker(Y, lower, upper, step, ParameterList)

Two types of iterator have been provided in Tyger-C: a static iterator called DOALL, and a dynamic iterator called FOREACH. The DOALL iterator makes a single, approximately equal, allocation of work to each thread it creates. The FOREACH iterator allocates work one index at a time to each thread it creates, and then threads request new work from the iterator until the work to be done is exhausted. This iterative process increases the overhead of work allocation, but at the same time permits a degree of load balancing. A syntax diagram for these iterators follows.

ITERATOR_CALL : PID "(" "<" ITERATOR SV_ID DIRECTIVE ">" PARAM_LIST ")" ;
ITERATOR      : FOREACH | DOALL ;
PID           : // the identifier of any procedure declared as a PAR ;
SV_ID         : IDENTIFIER "[" INDEX_SET "]" ;
INDEX_SET     : INTEGER ":" INTEGER ;
PARAM_LIST    : // any other read only parameters to the function named PID ;
DIRECTIVE     : "?" | "!" ;

The DIRECTIVE term is used to select the method for determining the number of threads that an iterator will use in a parallelism stripe. The "?" character signifies that an iterator should use a heuristic to arrive at the number of workers, based on the array size supplied to the iterator and on the external system load (the number of available processors). Intuitively, these factors have some part to play in the choice of the optimum number of threads, but one factor that is missing is the complexity involved in computing each array element. If only a few operations are needed to compute an array element, it may be appropriate for a thread of control to compute many hundreds of elements. At the other extreme, if it takes many thousands of operations to compute an array element, perhaps one thread of control per element computation is more appropriate. Clearly, finding the optimum number of threads that balances thread overheads, the available algorithmic parallelism, and the available machine parallelism is very difficult. Therefore, it would be convenient for a programmer not to have to be concerned with the number of threads that a program uses, leaving this task to the compiler and iterator. To support this view, some research has been done in the area of complexity analysis (see Parafrase-2 in chapter two); but for many problems analysis of the kind performed is insufficient, and the only sure way to effectively exploit the parallelism of an application is for the programmer to intervene. This is precisely what the presence of the "!" character in an iterator call allows a programmer to do (this is the default). When this character is passed as a parameter to an iterator, the iterator uses the number of workers set by the CONTROL() procedure, defaulting to one thread per processor if no such setting can be found. This allows a programmer total control over the number of threads that are created, so that a parallel program can be executed repeatedly and the number of workers varied, perhaps to find the optimum number of workers empirically.

For a taste of iterator usage the following two examples are presented. Consider the execution of the procedure initialise(), which sets every element of the array of shared values Y[1:N][1:N] to initialvalue. This can be specified by the statement below, with the choice over the number of threads to be used being left to the iterator

initialise( < DOALL Y[1:N*N] '?' >, initialvalue);

Some time later, the execution of 5 threads of control that use the procedure worker() with the same array Y, partitioning Y at run time, can be specified by

CONTROL(5);
worker( < FOREACH Y[1:N*N] '!' > );

Notice that in the above examples, only one dimensional array partitioning is used to divide the array. The argument for this now follows. In the allocation of shared value references to threads, the references themselves are merely integer indexes and not actual pointer references to specific array elements. Moreover, the form of the correspondence between indexes and array elements is one made by a programmer and not by an iterator. For example, if a thread receives a range of indexes, say 1:5:1, it may use these directly, referencing the array elements subscripted from 1 to 5, or may transform them for use to 2:10:2, referencing the array elements subscripted as 2, 4, 6, 8, 10. One consequence of making the programmer devise the mapping between indexes and array elements is that no automatic error checking can be performed at the thread level to check that a thread assigns the shared values allocated to it. This is true irrespective of whether the checking is done by the compiler or at run time by an iterator, unless the mapping function between indexes and shared values is also known. At first sight this may seem a shortcoming of an iterator, in that the programmer is responsible for the job of mapping indexes to actual values. However, there are a number of reasons why the programmer is best suited to undertake this task.

First of all, using indexes means that iterator functions merely have to partition a set of integers corresponding to the array bounds supplied in the iterator call. This simplifies the implementation of an iterator, cutting run time overhead to a minimum. What is more, dealing with higher dimensional arrays is not an easy thing to visualise or even express in terms of array partitioning. However, flattening the subscripting of such an array into a single stream of integers is the lowest common denominator, and allows a programmer to compute whatever mapping is most appropriate for the application being solved. For example, consider partitioning the two dimensional array table[0:9][0:9] to a procedure called worker(). The following iterator invocation examples illustrate the form of the partitioning instances that are allowed under the existing Tyger-C partitioning semantics extended to two dimensions.

worker( < DOALL table[0:9][0:9] > )   /* partition every element */
worker( < DOALL table[i][0:9] > )     /* partition the ith row */
worker( < DOALL table[0:9][j] > )     /* partition the jth column */

While the forms of partitioning shown above may prove useful in some applications, it is unwise to make them the only method of partitioning. In particular, if a programmer insists that a thread should receive a higher dimensional array as a partition, or perhaps an irregularly shaped partition, then the existing syntax alone is not capable of fulfilling these requirements. Furthermore, the overhead in implementing two dimensional partitioning is higher than that associated with one dimensional partitioning, especially for dynamic partitioning. Thus, implementing the most general partitioning scheme means that every partitioning call pays the higher overhead price, even if such a call does not need the extra sophistication. In addition, there is a further argument in favour of the programmer defining a mapping between a thread's work partition and the actual work it does. Although this discussion has focussed on arrays as the data structures to be partitioned, it may be possible for a programmer to devise a mapping between integers and data elements so that an arbitrary data structure, such as a tree, can be partitioned. In the light of this, it seems reasonable to take the most flexible line and merely treat work partitioning as operating on a single stream of integers.
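To make the index mapping argument concrete, the following plain C++ sketch (with invented names, and no connection to the iterator implementation) shows a worker-style routine that transforms the flat triple it receives before use, in the spirit of the 1:5:1 to 2:10:2 example above, together with a routine that folds flat indexes onto the rows and columns of a conceptual two dimensional array.

#include <cstdio>

// Transform the received triple before use: index k becomes 2k, as in the
// 1:5:1 -> 2:10:2 example in the text.
static void doubledIndexes(int lower, int upper, int stride)
{
    for (int k = lower; k <= upper; k += stride)
        std::printf("element %d\n", 2 * k);
}

// Fold a flat index range onto (row, column) coordinates of an n-by-n array.
static void twoDimensional(int lower, int upper, int stride, int n)
{
    for (int k = lower; k <= upper; k += stride)
        std::printf("row %d, column %d\n", k / n, k % n);
}

int main()
{
    doubledIndexes(1, 5, 1);        // visits elements 2, 4, 6, 8, 10
    twoDimensional(0, 9, 1, 5);     // visits the first two rows of a 5x5 array
    return 0;
}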

Figure 4.4 shows a Tyger-C implementation of Conway's game of life [Gard70]. The program works by setting a two dimensional array of cells to some given initial state. Next, the state of the array of cells is evolved a fixed number of times by computing the next state of each cell as a function of its previous state, and the previous states of its immediate neighbours. After all the computations have been completed the final state is printed out.

The Tyger program illustrated uses a two dimensional array of shared value integers to simulate the array of cells. At the first invocation of the DOALL iterator, the makeworld() procedure is executed by two threads of control. Each thread assigns a number of complete rows of the array of shared values, the exact number depending upon the values of upperBound and lowerBound that are supplied by the iterator. In the second invocation of the DOALL iterator, five threads execute copies of the worker() procedure, once again assigning complete rows of shared values. The program as it stands does not make use of the synchronisation properties of shared values, but could do so if the computation and printing were to be overlapped by executing the printworld() procedure in parallel with the worker() procedure during the desired generations. However, some peripheral debugging benefit is gained by using shared values in that checks are made to the effect that no cell is ever assigned more than once during a generation.

Tyger-C++ Shared Value Iterators

After the initial research had been carried out with the iterators in Tyger-C it was decided that an enlarged set of iterators should be provided in Tyger-C++. Five iterator functions have been included in Tyger-C++ to facilitate the expression of a variety of work allocation policies. As before, iterators fall into two classes: either they make a single static allocation of work to each thread, or they make a series of dynamic allocations to form a pool of work for a group of threads. The first two iterators, called block and stride, allocate work to threads using deterministic methods which are explained later. These static iterators are invoked by calling the DoAllBlock() and DoAllStride() member functions of a shared value object. For example, given a shared value object X[0:N-1] and a thread body called worker(), a call to create multiple worker threads is "X.DoAllBlock(0, N-1, worker)".


/* A Tyger-C program to play the game of life in parallel */
#define MAXGENERATION 10
#define nprocessors 5
#define SIZE 100

/* each thread computes a number of rows */
PAR makeworld(myworld, upperBound, lowerBound)
SV int myworld[SIZE][];
int upperBound, lowerBound;
{
    int r, c;
    for (r = lowerBound; r <= upperBound; r++)
        for (c = 0; c < SIZE; c++)
            myworld[r][c] = SomeInitialValue;
}

/* each thread computes a number of rows */
PAR worker(myworld, upperBound, lowerBound)
SV int myworld[SIZE][];
int upperBound, lowerBound;
{
    int r, c;
    for (r = lowerBound; r <= upperBound; r++)
        for (c = 0; c < SIZE; c++)
            myworld[r][c] = SomeFunction(old(myworld[r][c]), r, c);
}

void printworld(myworld)
int myworld[SIZE][];    /* only used from a sequential stripe so no need for an SV */
{
    int r, c;
    for (r = 0; r < SIZE; r++) {
        for (c = 0; c < SIZE; c++)
            printf("%d ", myworld[r][c]);
        printf("\n");
    }
}

void main()
{
    SV int world[0:SIZE-1][0:SIZE-1];
    int generation;

    /* the initialisation time per element is small so while much can be done in parallel, */
    /* it is not efficient to do so, hence the small number of threads */
    CONTROL(2);
    makeworld( < DOALL world[0:SIZE-1] > );        /* parallelism stripe */

    /* more work is involved per element so increase the number of threads accordingly */
    /* to take full advantage of the parallelism */
    CONTROL(nprocessors);                          /* sequential stripe */
    for (generation = 0; generation < MAXGENERATION; generation++) {
        worker( < DOALL world[0:SIZE-1] > );       /* parallelism stripe inside a loop */
    }
    printworld(world);
}

Figure 4.4 - Tyger-C game of life.


As C++ is an object-oriented language it has much richer semantics and syntax than C, so the Tyger iterator syntax was modified to conform with existing C++ syntax. When the shared value class is instanced and a shared value object is created, a parameter must be supplied which represents the extent of the shared value. For example, a call to the constructor "SV list(100)" creates one hundred shared values indexed by array notation, i.e. list[0] to list[99]. The iterators are declared as member functions of the shared value class, and so are considered to be part of every instance of a shared value object. Thus, the Tyger-C "worker( < DOALL list[0:99] '?' > )" now becomes "list.DoAll(0, 99, worker, '?')" in Tyger-C++. What is more, the CONTROL() procedure is also bound to a shared value object, so the number of threads must be set for each instance. For example, the Tyger-C call "CONTROL(nprocs, OFF)" becomes "X.control(nprocs, OFF)".

The remaining three iterators in Tyger-C++, called hunk, point and stream, allocate work dynamically using variations of a non-deterministic first-come-first-served algorithm, also explained later. These dynamic iterators are invoked by calling the ForEachHunk(), ForEachPoint(), and ForEachStream() member functions of a shared value object. For example, given the shared value object X[0:N-1] and a thread body called worker(), a call to create multiple worker threads is "X.ForEachHunk(0, N-1, worker)".

With the provision of a wider range of iterators, a programmer can now more accurately match the run time behaviour of an iterator to that of the problem being solved. In chapter six, some comparative data is presented to contrast the effects on performance of parallel programs obtained by using different iterators on the same problem.

Static Allocation Iterators

The block iterator is the realisation of a simple work allocation policy under which the host array of shared values is partitioned into contiguous sections of approximately equal size, there being one section per worker thread. When a worker thread is started it is passed two integers, lower and upper, that represent the limits of its array section. An additional parameter, the stride, is also supplied for consistency with the other iterators and always takes the value one in this instance. For all the iterators the interpretation of these three input parameters is the same: processing should commence with the element indexed by the lower bound, and continue with the next element indexed by adding the stride, while this quantity is less than or equal to the upper bound. Thus, the important observation is that if a thread uses the three parameters correctly, all the iterators can be used interchangeably, provided there are no restrictions on the order in which the resulting shared values are produced.

If the extent of the shared value object does not divide exactly by the number of worker threads that will be used by the block iterator, the remaining work is allocated one element at a time, together with the original work, to threads upon their creation until the remainder is exhausted. The idea behind this strategy is that, hopefully, allocating the extra work to those threads that are started first will have a good load balancing effect, and will negate some of the overhead costs of starting up the worker threads. As an example of iterator usage, given the shared value object X[0:n-1], the operation "X.DoAllBlock(0, n-1, Operation)" invokes user code of the form shown below. A reference to the parent shared value object is passed as the first parameter to the worker procedure. The integers lowerBound, upperBound and stride are filled in by the iterator and are passed to the worker procedure in a manner that is implementation dependent. All that is important to note is that values for the specially named parameters are supplied by the iterator and can then be used by a thread of control as normal parameters.

// procedure Operation() is a worker thread body, not necessarily a member function of X[]
// the code computes the values of a part of an array of shared values called X[]
PAR Operation(SV X, int lowerBound, int upperBound, int stride, ...)
{
    for (int i = lowerBound; i <= upperBound; i += stride)
        X[i] = SomeFunction(i);
}

In the case of a shared value object where the CONTROL() procedure has set the number of threads equal to five and n = 62, five threads of control labelled 0 to 4 will be used as workers, with thread 0 initially running the iterator code. The number of array elements allocated to each thread is 13, 13, 12, 12 and 12, implemented as thread 1 receiving the triple (0:12:1), thread 2 (13:25:1), thread 3 (26:37:1), thread 4 (38:49:1), and thread 0 (50:61:1).
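The block allocation arithmetic can be reproduced with a few lines of ordinary code. The sketch below is illustrative only and is not the iterator implementation; the helper computeBlocks() is an invented name.

#include <cstdio>
#include <utility>
#include <vector>

// Illustrative helper: compute (lower, upper) index pairs for a block-style
// partition of the range 0..n-1 among t threads, giving the first n % t
// sections one extra element, as described in the text.
static std::vector<std::pair<int, int>> computeBlocks(int n, int t)
{
    std::vector<std::pair<int, int>> blocks;
    int base  = n / t;       // minimum elements per thread
    int extra = n % t;       // sections that receive one additional element
    int lower = 0;
    for (int k = 0; k < t; k++) {
        int size = base + (k < extra ? 1 : 0);
        blocks.push_back({lower, lower + size - 1});
        lower += size;
    }
    return blocks;
}

int main()
{
    // With n = 62 and t = 5 this prints the sections of size 13, 13, 12, 12, 12:
    // (0:12:1), (13:25:1), (26:37:1), (38:49:1), (50:61:1), matching the example above.
    for (auto b : computeBlocks(62, 5))
        std::printf("(%d:%d:1)\n", b.first, b.second);
    return 0;
}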

The stride iterator does not allocate contiguous blocks of the host array to a thread, but instead provides an algorithmic way of partitioning array elements. As before, a lower bound, upper bound and stride are supplied to each worker thread; however, this time the stride is equal to the number of worker threads. For example, given the shared value object X[0:n-1], the operation "X.DoAllStride(0, n-1, Operation)" invokes user code of precisely the same format as with the block operator. Once again using the scenario of 5 threads and n = 62, the number of array elements allocated to threads is again 13, 13, 12, 12 and 12. However, this time the allocation of indexes to thread 1 is the triple (0:61:5) giving the sequence (0, 5, 10, ..., 45, 50, 55, 60), thread 2 (1:61:5) giving (1, 6, 11, ..., 46, 51, 56, 61), thread 3 (2:61:5) giving (2, 7, 12, ..., 47, 52, 57), thread 4 (3:61:5) giving (3, 8, 13, ..., 48, 53, 58) and thread 0 (4:61:5) giving (4, 9, 14, ..., 49, 54, 59).
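The stride allocation can be pictured in the same way. The following illustrative fragment simply enumerates the triples (k : n-1 : t) and the indexes they generate for the n = 62, five-thread example; it makes no claim about which thread receives which triple.

#include <cstdio>

int main()
{
    const int n = 62, t = 5;
    // Each worker is handed a triple (k : n-1 : t); the elements it visits are
    // k, k+t, k+2t, ... while the index remains <= n-1.
    for (int k = 0; k < t; k++) {
        std::printf("triple (%d:%d:%d):", k, n - 1, t);
        for (int i = k; i <= n - 1; i += t)
            std::printf(" %d", i);
        std::printf("\n");
    }
    return 0;
}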

Dynamic Allocation Iterators

The hunk iterator continuously allocates hunks of the host array to worker threads during the execution of a program. The partitioning algorithm is based on the Guided Self-Scheduling work of Polychronopoulos and Kuck [Poly87], which is reported to have good load balancing properties. The main idea of the algorithm is to allocate hunks of work in decreasing size, to reduce the chance that threads will be idle for long periods after processing their allocations, waiting for the other threads to finish their own computations. The method that has been implemented operates as follows: given a remaining work load Ri (R1 = n) at time i, with t threads, allocate xi elements where

xi = ceiling(Ri / t),    Ri+1 = Ri - xi

Alternatively, xi can be computed by

xi = floor(Ri / t) if floor(Ri / t) > 0, otherwise 1,    with Ri+1 = Ri - xi

Once again the information supplied to a thread at run time, every time it requests work, takes the form of three parameters: lower bound, upper bound and stride (which for this iterator is always equal to one). Given a shared value object X[0:n-1] the invocation syntax is "X.ForEachHunk(0, n-1, Operation)", with the worker procedure written as illustrated before for the static iterators. Although it is not possible to predict which thread will obtain which work allocation, it is possible to say what the allocations will be. In the case of 5 threads and n = 62 the partitioning of indexes (using the alternative method for xi) is x1 (0..11), x2 (12..21), x3 (22..29), ..., with the final allocations being single elements such as (60..60) and (61..61).
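As an illustration of the floor-based allocation rule (and not of the iterator implementation itself), the fragment below prints the successive hunk sizes produced for a remaining workload of 62 elements shared among five threads.

#include <cstdio>

int main()
{
    int remaining = 62;                 // R1 = n
    const int t = 5;                    // number of worker threads
    while (remaining > 0) {
        int x = remaining / t;          // floor(Ri / t)
        if (x == 0)
            x = 1;                      // never allocate an empty hunk
        std::printf("hunk of %d elements\n", x);
        remaining -= x;                 // Ri+1 = Ri - xi
    }
    return 0;
}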

Like the hunk iterator, the stream iterator also allocates work continuously at run time, only in this case the block size is fixed at one. This method has potentially the best load balancing properties because of its fine granularity of work distribution, but this also means that it has the highest cost in synchronisation overhead. As an example of this iterator's usage, consider a shared value object X[0:n-1] with invocation syntax "X.ForEachStream(0, n-1, Operation)" and a worker thread as before. In the case of 5 threads and n = 62 the partitioning of indexes for threads is x1 (0..0), ..., xi (i-1..i-1), ..., x62 (61..61).

The point iterator is an optimisation of the stream iterator that uses work preallocation to reduce the run time synchronisation overhead. At run time a tentative allocation of work to threads is made by using the mechanism employed by the block iterator. Later on, however, when a thread completes its quota of work it can look for additional work by inspecting the quotas of other threads. As a consequence of this, the actual work assignment is handled in a manner similar to the stream iterator, but with less synchronisation contention, and as a result a faster, streamlined version of the work allocation code can be employed. Given a shared value object X[0:n-1] the invocation syntax is "X.ForEachPoint(0, n-1, Operation)", with the worker thread syntax as before. In the case of 5 threads and n = 62 the partitioning of indexes is similar to that for the stream iterator, except that each thread has a preallocation of indexes made by the block iterator.

4.2.4 Summary of the Tyger Model

This section briefly summarises the capabilities of Tyger-C++ by presenting a short programming example, and then goes on to reveal some of the advantages of combining data abstraction with parallel programming. However, a final example warns against the unconstrained introduction of parallelism into procedural languages.

Consider a Tyger-C++ program that performs a transformation on a two dimensional array X[0:n-1][0:n-1] (n > 0). The nature of the transformation is to average each array element with its eight neighbouring elements. A shared value object can be used to represent X, and the averaging operation can be applied in parallel to the object via a Tyger iterator. As the current iterators support only one dimensional partitioning, X is referenced as a one dimensional array, which makes the indexing code look a little untidy but otherwise does not affect the program. Four user defined ancillary functions (shiftUp(), shiftDown(), shiftLeft() and shiftRight()) are used to calculate the indexes of neighbouring elements, hiding the details of how the array wraps along its edges. The program fragment in Figure 4.5 illustrates how the transformation can be coded by using static shared values.

The program fragment consists of a main() procedure, a worker procedure average(), and a global variable n which is accessible from both procedures.


// declaration of the shared array size, assuming a square array
int n = 100;

// worker code to compute a block of rows of the shared array
PAR average(SV Y, int lowerBound, int upperBound, int stride)
{
    ArrayType tmp;
    for (int row = lowerBound; row <= upperBound; row += stride)
        for (int col = 0; col < n; col++) {
            tmp = (ArrayType) 0;
            for (int s = shiftDown(row); s <= shiftUp(row); s++)
                for (int t = shiftLeft(col); t <= shiftRight(col); t++)
                    tmp += old(Y[s*n + t]);
            Y[row*n + col] = tmp / 9;
        }
}

// code to declare the shared array and create worker threads
void main()
{
    SV X(n*n);                     // shared array of n*n elements
    readData(X, n);                // sequential stripe to input data
    X.DoAll(0, n-1, average);      // parallelism stripe to compute result
}

Figure 4.5 - Tyger-C++ program excerpt.

The main() procedure declares a shared value object X of size n*n which is used to represent the shared array of elements. Data is acquired for X by invoking the readData() procedure. The call to the DoAll iterator partitions X among a group of worker threads executing the average() procedure. Each worker thread is passed three values which the programmer has taken to represent the group of rows of the shared array that the thread has to compute. Therefore, for every element of each row in its range, a thread finds the average of the element and its neighbours by working with a local variable, and then makes a single update to the shared array X. This short example program could have been coded easily using conventional library calls; however, the resulting program would have been nearly twice as long. Moreover, such a program might have contained many implementation details (e.g. the size of shared memory regions), thereby losing portability and perhaps increasing its maintenance requirements as changes to the underlying operating system force changes to the program code.

Interthread Communication

A static shared value is an elegant method for expressing a single one way communication between a reader and a writer thread of control, because it retains its value throughout a parallelism stripe. If multiple communications are required between threads, however, one way in which this can be accomplished is by using an array of shared values. The use of an array, though, means that the maximum number of messages has to be known in advance (i.e. a fixed array size). In addition, using an array means both the reader and writer have to maintain counters to index the current communication, which can lead to some rather untidy programming, little better than communication via a shared buffer. Fortunately, by the use of appropriate data abstractions, shared values can be used to represent a shared communication stream, which is an efficient mechanism for interthread communication adopted by many declarative parallel programming languages (e.g. Strand).

Consider a class called SVStream, that a user can define, consisting of a pair of data items, namely a shared value and a reference to a shared value. Hence, a chain of SVStream elements can be constructed to represent a sequence of messages passing between threads of control. With the overloading properties of Tyger-C++, the syntax of SVStream can be arranged such that assignments can be made to an SVStream via the assignment "=" operator, and SVStream elements can be dereferenced as if they were instances of a primitive type. In the following example two threads communicate via a shared SVStream.

PAR producer(SVStream pipe)
{
    pipe = 1000;
    for (int i = 0; i < 1000; i++)
        pipe = i;
}

PAR consumer(SVStream pipe)
{
    int noMessages = pipe;
    for (int j = 0; j < noMessages; j++)
        cout << pipe << " ";
}

The basic operation of the threads is for the producer thread to send the number of messages, followed by the messages themselves, to a consumer that prints them out. Synchronisation is handled by the shared value portion of the SVStream class, which confers shared value semantics on SVStream elements. In this implementation of the SVStream, once a value has been read from a stream it is discarded. While this is an intuitive model, it deviates a little from the shared value theme of persistence, which assumes that a shared value is always available for use during a parallelism stripe once its value has been determined. An alternative implementation of SVStream (SVQueue), which retains the persistence of stream values, gives rise to the following consumer.

PAR consumer(SVQueue pipe)
{
    SVQueueHandle nextMessage = pipe->Handle();
    int noMessages = nextMessage;
    for (int j = 0; j < noMessages; j++)
        cout << nextMessage << " ";
}

In this implementation of a stream, pipe can be assigned as before, but when its value is examined only the first assignment will ever be seen. To obtain the values of subsequent assignments, the consumer maintains a local reference to the next message to read (nextMessage) and uses this to scan the stream. An initial call is made to the Handle() function of pipe to obtain a reference to the start of the stream. Subsequent dereferencing of the handle returns the data part of the current stream element and implicitly moves the reference to the next element in the stream. An application for such a stream with nondestructive reading is the implementation of a communication channel that is shared amongst multiple threads of control. A thread can select values from the stream at its own rate, confident in the knowledge that other threads cannot interfere by removing unread values from the stream. (As an optimisation it is possible to modify the SVQueue class, under certain circumstances, so that stream elements are removed after the last thread has read them, in order to reclaim the storage.)
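For readers more familiar with conventional primitives, the following standard C++ sketch shows the kind of blocking stream such classes provide: a writer appends values and a reader blocks until the next value is available. The class Stream and its members are invented for the illustration and make no attempt to reproduce the single assignment or persistence semantics of SVStream and SVQueue.

#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <cstdio>

// Illustrative blocking stream; the Tyger classes additionally carry
// shared value (single assignment) semantics on each element.
class Stream {
    std::queue<int> values;
    std::mutex m;
    std::condition_variable ready;
public:
    void put(int v) {
        { std::lock_guard<std::mutex> lock(m); values.push(v); }
        ready.notify_one();
    }
    int get() {
        std::unique_lock<std::mutex> lock(m);
        ready.wait(lock, [this] { return !values.empty(); });
        int v = values.front();
        values.pop();
        return v;
    }
};

int main()
{
    Stream pipe;
    std::thread producer([&pipe] {
        pipe.put(3);                              // number of messages, then the messages
        for (int i = 0; i < 3; i++) pipe.put(i);
    });
    int noMessages = pipe.get();
    for (int j = 0; j < noMessages; j++)
        std::printf("%d ", pipe.get());
    std::printf("\n");
    producer.join();
    return 0;
}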

Thus, by using the shared value class as a starting point it is possible to develop user defined data structures that not only have the desired operating characteristics (e.g. variable size, repeatable reading, and shared value synchronisation), but at the same time retain aspects of the familiar original shared value syntax. Hence programmers can continue to think about parallel programs in terms of shared values, and not be concerned about using new notation that may make programs harder to write down and understand.

Nested Stripes

The Tyger model so far described permits the construction of parallel programs that consist of a series of independent parallelism stripes in which there is a single level of thread creation. If more than one level of thread creation is allowed in a parallelism stripe, that is, if a worker thread can itself create new threads of control, several choices are possible for the exact interpretation and semantics of shared value operations within such stripes. For example, consider a shared value object X[0:n-1] (n > 2) and four thread declarations named A to D coded as follows:

PAR A(SV X, ...)
{
    X[0] = 1;
    C(X, ...) |&| D(X, ...);
}

PAR B(SV X, ...)
{
    X[1] = X[0];
}

PAR C(SV X, ...)
{
    X[0] = 2;
}

PAR D(SV X, ...)
{
    X[n-1] = 1;
}


Given that the threads A() and B() are created by some main procedure and execute in the same parallelism stripe, the action of thread A() will be to bind a value for X[0], create two new threads of control, C() and D(), and then wait for them to terminate. The action of thread B() will be to obtain a value for X[0] and bind it to X[1]. The action of thread C() will be to bind a value to X[0], while thread D() operates on some completely separate part of X. The flow of control for this arrangement of threads is shown in Figure 4.6.

Figure 4.6 - Nested control structure. (The sequence of Tyger stripes in the main program body is Ms, Mp, Ms, with the corresponding thread sets {M}, {A, B} and {C, D}.)

Assuming that nested stripes are not allowed, but that worker threads can create new threads of control that execute in the current parallelism stripe, leads to the following interpretation of the execution of the thread bodies A() to D(). After being created by the master thread, A() binds X[0] and then creates threads C() and D(). Meanwhile, after being created by the master thread, B() looks for a value for X[0] and will find one after A() has executed its first statement. When C() starts executing, in the same stripe as A(), it attempts to rebind X[0], causing an exception to be raised and program failure.

In the alternative scenario, with nested stripes, the execution of the thread bodies is as follows. The threads A() and B() begin execution as before. However, this time A() itself consists of three stripes: a sequential stripe where X[0] is bound to a value; a parallelism stripe where the threads C() and D() are started; and a closing sequential stripe containing no instructions. In addition, the binding made by thread C() to X[0] is a legal operation, since no other binding of X[0] is made in C()'s stripe, as A()'s binding to X[0] was made in its previous sequential stripe. Unfortunately, a question arises as to the value of X[0] obtained by B(). If B() executes its read operation on X[0] before C() binds X[0], B() sees the value set by A(), otherwise it sees the value set by C(). Clearly, this is a race condition and not something that should be permitted by a structured parallel programming mechanism. A possible solution to this problem, and other issues raised by the possibility of nested stripes, is discussed in the future work section of chapter seven.

4.2.5 Dynamic Shared Values

Originally, the Tyger model was envisaged as being solely based on static shared values. This route was followed primarily because the algorithms that were parallelised exploited strong data partitioning. In such parallel algorithms, there are no data dependencies between concurrently executing computations so it should be possible to obtain linear speedups proportional to the number of processors applied to a problem, given sufficiently large problems. Furthermore, static shared values also provide support for weak data partitioning, in which computations involve read or write dependencies, through their dataflow synchronisation mechanism. However, static shared values provide less support for parallel algorithms that operate on complex shared data structures of the kind classified as distributed data structures by Carriero (see chapter three). Examples of data structures that can be operated on in this way include shared queues, tables and other abstract data structures (e.g. trees).

Using static shared values to represent such structures can be problematic, because if multiple changes have to be made to a field of such a data structure during a parallelism stripe, an entirely new copy of the whole data structure must be produced each time due to the single assignment rule. (This problem is similarly encountered in dataflow languages.) Perhaps what is more of concern though, is that since a conceptually local change to a data structure forces the production of a completely new data structure, only one thread at a time can manipulate the shared data structure. That is, multiple threads cannot make changes in parallel that will result in the production of new data structures, because threads cannot view a consistent overall picture of the data structure. But, strict serialisation of the kind necessary to guarantee correctness when operating in parallel on distributed data structures is often inappropriate because it is often too conservative for many potentially parallel algorithms.

One solution to this problem is to encapsulate distributed data structures inside caretaker threads of control, and to operate threads as clients and servers. However, algorithms of this kind need a more flexible method of sharing information and synchronising communication than simple static shared values, in order to support efficient spontaneous data transfer. Therefore, in an attempt to follow the expressiveness guideline for parallelism constructs mentioned in chapter three, dynamic shared values were devised to enable the expression of a variety of parallel program behaviours without recourse to unnatural or untidy program code. In particular, dynamic shared values were developed to act as a flexible communication mechanism to give extra support for distributed data structures.

Programming Model

At the time that dynamic shared values were developed, static shared values had only been implemented via Tyger-C. With the advent of Tyger-C++, and its capability to develop new data types (e.g. streams), the need for dynamic shared values has been somewhat diminished. Hence, only a Tyger-C version of dynamic shared values is presented.

Many of the ideas in the dynamic shared value model came from the Linda programming model described in chapter two. The elements of the Linda model that were thought to be important were the ideas of using associative naming to perform pattern matching on data values, and the necessity of limiting the operations on shared data to a small set of named procedures. These two concepts are important because pattern matching can be a flexible way of referencing parts of shared abstract data structures, and using named operations to reference shared data allows communication and synchronisation to be combined to simplify the task of writing parallel program code. However, one of the aims in developing dynamic shared values was to take these key aspects from the Linda model and specialise them so that extra performance could be obtained through the use of physically shared memory.

The central idea of dynamic shared values, in common with Linda, is the notion of there being a single shared address space within a parallel program. This shared address space, in fact, can be regarded as a container object for shared values and is called a shared value space (SVS). A SVS possesses some intrinsic operations for data manipulation in a similar way that a conventional stack has the operations push() and pop(). A SVS operation returns a status code indicating its success or failure, and where possible, operations are executed in parallel.

A dynamic shared value is formally made up of two parts: an identifier part and an optional data part. Each part of a dynamic shared value consists of a number of typed fields, giving dynamic shared values a structure similar to conventional records and Linda tuples (see chapter two). Identifier fields are used to match dynamic shared values against templates (described later) and can be constituted from strings of characters and all primitive types (e.g. integer, character, float). For example, in Tyger-C an element from a linear array of integers can be represented by the triple Name, Index, Value. If the triple is represented by two identifier fields and a data field, the corresponding value signature is char*, int, (int), with the parentheses indicating that matching is not performed on the last field. One advantage of using a data field in this way is efficiency, in that signature matching (of the Linda kind) applies only to identifier fields, with no matching operations carried out for the data field. In addition, it also allows a certain amount of decoupling between threads, as a group of cooperating threads need only know the key identifier fields precisely, and not the exact format of any optional data.
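
As an illustration of this layout, the array element triple just described might be modelled by an ordinary record holding the identifier fields and the unmatched data field. The sketch below is written in plain C++ purely for concreteness; the field names are hypothetical and the Tyger-C signature notation is not reproduced.

    // Illustrative only: a plain C++ analogue of the (Name, Index, (Value)) triple.
    // The identifier fields take part in template matching; the data field does not.
    #include <string>

    struct ArrayElementValue {
        // identifier part (matched against templates)
        std::string name;    // e.g. "Row"
        int         index;   // e.g. 1
        // optional data part (bound to template variables but never matched)
        int         value;   // e.g. 100
    };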

Primitive Operations

There are four kinds of conceptual operation that one may wish to perform on elements of a data structure that is represented in terms of dynamic shared values. These conceptual operations are: single read, single write, shared read, and shared write. However, the shared write operation is really the compound operation of element replacement, which can be emulated by first locating, deleting, and then creating a new data structure element. The mechanics of the conceptual operations can be divided into three activities: locating a value; obtaining/giving data from/to a value (binding); and controlling the accessibility of a value. It is possible to derive separate functions to perform each of these activities and then to emulate the conceptual operations. For example, a shared read operation is a search followed by a binding. However, splitting the conceptual operations in this way can lead to problems with atomicity. For example, a shared value could be located by a thread, but this value could then be deleted by another thread before the first thread could make its binding. This by itself is not a problem but it makes using such programming mechanisms very difficult because each communication call must be checked to ensure that it is not in a race condition with other competing calls. Hence, the constructs that were developed are a compromise between the appeal of a functional specialisation and the requirements of atomicity.

Every Tyger-C program contains a globally accessible object called SVS which has three intrinsic operations defined for it to support dynamic shared values: find(), hold(), and release(). SVS operations are invoked by name and implicitly operate on a program's SVS object. Alternatively, the optional SVS identifier, LOCAL or GLOBAL, can be used to qualify each SVS operation invocation to select the SVS to which the operation is to apply. For example, "LOCAL:find(..)" looks for a shared value in the local SVS. The LOCAL SVS (the default) refers to the SVS of the parallel program being executed, while the GLOBAL SVS refers to the SVS of the operating system's environment. Thus, the GLOBAL context can be used to communicate between concurrently executing programs (further elaboration is presented later in the global environment section of this chapter).

In addition to the three intrinsic SVS operations there are two helper operations: sv() and expand(). The sv() operation constructs a dynamic shared value from a list of identifier and data fields, returning a dynamic shared value handle to the newly created value. For example, "sv("Day", ("Monday"))" creates a shared value with the identifier field "Day" and the data field "Monday". A dynamic shared value handle is a reference to a single dynamic shared value and cannot be changed once it has been set. The release() operation (single write) takes a dynamic shared value handle and inserts its associated shared value into the SVS. The operation will succeed if a duplicate shared value does not exist in the SVS, and it will fail returning an error code if one does. For example, "release(sv("Go"))" writes the dynamic shared value {"Go"} into the local SVS.

The find() operation (shared read) takes a shared value signature or template and blocks until it can find a matching value in the SVS. When one has been found, the operation atomically binds any unbound fields in the template to fields in the matching value. The find() operation does not return a handle because the shared value it would refer to could be deleted by another thread, therefore leaving an empty handle. For example, the operation "release(sv("bin", 1, 100))" creates a new dynamic shared value whose fields can be retrieved by "find("bin", ? &X, ? &Y)" where X and Y are integer variables. Note that the "&" character means "take the address of" and the "?" character is used to distinguish between value and variable fields.

The hold() operation (single read) searches for and binds a shared value in the same way as the find() operation, but it also returns a handle to the shared value that it matched in the SVS and makes this shared value otherwise inaccessible. One intended use of the hold() operation is to help implement the conceptual shared write operation. Consider a binary tree represented by dynamic shared values as nodes. Nodes are inserted into the tree by searching for their correct position, modifying the shared value that will be the parent of the new node, and then creating the new node itself. The point of insertion of the new node is found by holding a node (starting at the root) and testing whether the node to be inserted should be a left son or a right son. If a son already exists on the appropriate branch, the current node is released back to the SVS and the son located. When the appropriate position has been located, a new version of the father node is created from the existing node and it is released together with its son into the SVS.

One point to note is that during the search for the insertion position only the key identifier fields of the node have to be matched, not any of the data fields, thereby cutting down on the amount of communication overhead. When the insertion point has been found, the handle of the matched node and a full node template of empty fields can be passed to the expand() operation, which will bind the fields of the template to the values of the fields of the node referenced by the handle. This allows the full contents of the existing node to be retrieved and a new parent node to be subsequently constructed. For example, given a shared value handle Parent which references a shared value {"node", 6, 10, 20, (1, 5, 3)}, "expand(Parent, "node", nodeId, lson, rson, ? &d1, ? &d2, ? &d3)" binds the variables d1 to d3 to their corresponding fields. The expand() operation returns a status code, but still carries out the bindings (see matching rules), if there is some mismatch between the number of fields of the shared value supplied by a handle and the number of fields specified to retrieve those values.

While the three intrinsic SVS operations give the impression of a message passing environment, they actually operate in terms of shared memory. Associative naming is used as a flexible method for building and manipulating data structures, but it also raises the memory abstraction above that of simple shared memory. During the execution of a program, SVS operations construct dynamic shared values in shared memory and arbitrate accesses to them by setting flags controlling their visibility (or availability) in the shared associative address space. The best example of this is the hold() operation, which does not actually delete a shared value, but instead makes it unobtainable except via its handle. When a thread terminates and relinquishes any shared value handles it holds, the corresponding shared values are deleted as they can no longer be accessed. Thus, shared memory efficiency is exploited at the low level, while message passing semantics at the top level challenge programmers to devise models that overcome the partitioning difficulties encountered in dealing with complex concurrent applications.
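
The flag based arbitration just described can be approximated in present-day C++. The sketch below is not the Tyger run time system: it emulates release(), find() and hold() for one fixed value shape only, and uses a mutex and condition variable in place of the shared memory flags, simply to show how visibility rather than deletion can govern access to values.

    #include <condition_variable>
    #include <mutex>
    #include <string>
    #include <vector>

    // Illustrative emulation only; the names and the fixed (name, id, data) shape
    // are assumptions made for the example, not the Tyger-C interface.
    struct Value {
        std::string name;     // identifier field
        int         id;       // identifier field
        int         data;     // data field (not matched)
        bool        visible;  // cleared by hold(); values are never physically deleted here
    };

    class SVS {
        std::mutex              m;
        std::condition_variable cv;
        std::vector<Value>      values;

        Value* lookup(const std::string& name, int id) {
            for (Value& v : values)
                if (v.visible && v.name == name && v.id == id) return &v;
            return nullptr;
        }
    public:
        // release(): insert a value, failing if a visible duplicate already exists.
        bool release(const std::string& name, int id, int data) {
            std::lock_guard<std::mutex> lock(m);
            if (lookup(name, id)) return false;
            values.push_back({name, id, data, true});
            cv.notify_all();
            return true;
        }
        // find(): block until a visible match exists, then bind the data field.
        void find(const std::string& name, int id, int& out) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&]{ return lookup(name, id) != nullptr; });
            out = lookup(name, id)->data;
        }
        // hold(): as find(), but also hide the value so that only the returned
        // index (standing in for a handle) can reach it.
        std::size_t hold(const std::string& name, int id, int& out) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&]{ return lookup(name, id) != nullptr; });
            Value* v = lookup(name, id);
            out = v->data;
            v->visible = false;
            return static_cast<std::size_t>(v - values.data());
        }
    };

The property carried over from the description above is that hold() merely withdraws a value from the associative space; the storage itself remains shared and is reached directly through the handle.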


Matching Rules

When a SVS operation is invoked to manipulate dynamic shared values, a template must be supplied to identify those values of interest from others in the SVS. The same rules employed by the Linda model are used for matching identifier fields and are not discussed further here. For data fields, different matching rules apply. Here, fields either all act as receptors for data or all act as transmitters, depending upon the sense of the operation. For reading, fields are matched left to right, with values being bound to available fields using a shared value matched on the identifier fields. If there are insufficient data fields in the donor shared value, the operation still succeeds using the bindings it can make, but returns a status code indicating the situation. Similar action is taken if there are too many fields in the supplying value. A positive return value indicates the number of excess donor fields and a negative return value indicates the number of missing donor fields.

For example, the operation "find("Row", 1, (? &X))" with integer data variable X can match the value {"Row", 1, (100)}, binding X to 100, or the value {"Row", 1, (50, 200)}, binding X to 50 and returning an appropriate status code. In addition, the operation "find("Row", 1, (? &X, ? &Y))" with integer data variables X and Y can match {"Row", 1, (100)}, binding X to 100 but leaving Y alone and returning an appropriate status code.
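
The data field rule can therefore be summarised as: bind left to right, always succeed, and report any imbalance through the status code. The following fragment is a sketch of that bookkeeping only, written under the simplifying assumption that both the donor fields and the receptor variables are plain integers.

    #include <vector>

    // Sketch of the data field matching rule: donor fields are bound to receptor
    // fields left to right; the return value is positive for excess donor fields,
    // negative for missing donor fields and zero for an exact fit.
    int bind_data_fields(const std::vector<int>& donor, const std::vector<int*>& receptors) {
        std::size_t n = donor.size() < receptors.size() ? donor.size() : receptors.size();
        for (std::size_t i = 0; i < n; ++i)
            *receptors[i] = donor[i];            // receptors beyond n are left unbound
        return static_cast<int>(donor.size()) - static_cast<int>(receptors.size());
    }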

Programming Example

One of the quickest ways to sort a list of integers is to use a statistically based partition sort [Noga85]. Such a sorting technique can be made to execute efficiently in parallel by using a combination of static and dynamic shared values. To start with, the unsorted list is represented as an array of static shared values, on which some short statistical sampling is undertaken (possibly in parallel) to estimate the range of values in the list. Once this has been carried out, a range of sorting bins is created (represented by dynamic shared values), with each bin covering a unique range of values determined from the sampling so that every list element can be put into just one bin. Next, each element in the unsorted list is inserted, in order, into its bin. Finally, the sorted list, which can again be represented by an array of static shared values, is created from the ordered concatenation of every bin.

One way in which parallelism can be applied to this algorithm is to use a thread of control to manage each bin data structure, and for a separate group of threads to partition the unsorted list by creating dynamic shared values that will be read by the appropriate bin manager threads. One problem with this implementation is that many threads can be involved, leading to high parallelism overheads. Of course, it is possible to make threads responsible for fielding dynamic shared values for more than one bin but this has two disadvantages. Firstly, the code to do this can be a little untidy, and secondly poor load balancing may result if a few threads end up doing all of the work.

An alternative implementation strategy is once again to have a group of threads partition the unsorted list, but this time each thread inserts values into bins directly (i.e. a distributed data structure approach). Here, the most appropriate number of threads can be used, though poor load balancing can result if threads have to repeatedly compete with one another when inserting values into a bin. To reduce the time it takes to insert a value into a bin, bins can be stored as dynamic shared values representing binary trees. For example, a dynamic shared value to represent a bin can be made from the six fields "bin", BinId, NodeId, LeftSon, RightSon, Data, so that a bin B might be held as the nodes

{"bin", B, 1, -1, 2, 20}, {"bin", B, 2, 3, 4, 30}, {"bin", B, 3, -1, -1, 10}, {"bin", B, 4, 5, -1, 60}, {"bin", B, 5, -1, -1, 50}.

Of course, executing the operations to generate and retrieve dynamic shared values for a bin can be quite expensive, so a simpler data structure such as a linear linked list may be more efficient. However, this is difficult to assess in advance of actually executing a test program because timings are sensitive to factors such as the distribution of the numbers that are sorted and algorithmic parameters such as the number of bins. Nevertheless, using dynamic shared values is a tidy and flexible way to represent the shared bins that allows a programmer to alter the bin data structure without too much recoding.
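
Ignoring the shared value machinery and the threads, the overall shape of the bin sort can be stated in a few lines of ordinary C++. The sketch below is a sequential outline only: the statistical sampling step is reduced to taking the minimum and maximum of the list, and the bins are sorted after partitioning rather than being kept ordered on insertion, so it shows where the bins fit rather than how the threads cooperate.

    #include <algorithm>
    #include <vector>

    // Sequential sketch of the statistically based partition (bin) sort.
    std::vector<int> bin_sort(const std::vector<int>& input, std::size_t nbins) {
        if (input.empty() || nbins == 0) return {};

        // "Sampling" step, simplified here to the exact range of the data.
        auto [lo_it, hi_it] = std::minmax_element(input.begin(), input.end());
        long lo = *lo_it;
        long width = (*hi_it - lo) / static_cast<long>(nbins) + 1;  // each value maps to one bin

        // Partition step: in the parallel versions this is the work shared between threads.
        std::vector<std::vector<int>> bins(nbins);
        for (int x : input)
            bins[static_cast<std::size_t>((x - lo) / width)].push_back(x);

        // Order each bin and concatenate them to form the sorted list.
        std::vector<int> sorted;
        for (auto& bin : bins) {
            std::sort(bin.begin(), bin.end());
            sorted.insert(sorted.end(), bin.begin(), bin.end());
        }
        return sorted;
    }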

Epilogue

Dynamic shared values by themselves are quite a low level mechanism for parallel programming, in common with Linda tuples. They are good for specifying the nuts and bolts parts of operating on shared data, but some skill in using them is needed to code higher level operations such as insert-into-tree() because it is relatively easy to make mistakes in ordering the communication operations, or in realising potential parallelism. As such, it can be quite difficult to devise effective parallel programs using these mechanisms because often insufficient support is given to programmers to help them structure their programs. To address this problem, research has been carried out to devise paradigms and models for parallel programs that have natural implementations with tuples or dynamic shared values (see Carriero's classification of parallel concepts and techniques in chapter three). Nevertheless, this is only a partial solution and it is probable that more language support is in fact needed. Only a part of the research carried out on dynamic shared values has been presented here, with ideas on conditional value matching and special control shared values being omitted. While such ideas hold the promise of improving the utility of dynamic shared values, the emphasis of the Tyger model now rests firmly on static shared values as a better (more structured) route to effective parallel software.

4.3 System Components

One of the central ideas of this thesis is that effective parallel programming cannot be solely addressed by a programming language but instead must be considered in terms of some larger computing environment. For this thesis, given the shared memory multiprocessor architecture as a hardware platform, the next level up in the computing environment is the operating system. Its role is to make the hardware resources of a machine more accessible to users, by assuming responsibility for managing hardware resources and software resources such as files, and by undertaking some of the housekeeping functions needed to support multiple threads of control. Once the system environment has been established, a compiler or interpreter for a parallel programming language can be written, to marry the cosseted abstractions of the program designer to the hard realities of actual implementation.

As it turns out, the design and implementation of parallel programming abstractions in a language are the most important aspects of the environment in this thesis and so have been covered first, commensurately being paid the most attention. Nevertheless, when the form of the programming language and system abstractions have been determined, software tools can be constructed for program design and monitoring. Such tools must not only operate correctly and meaningfully, but must be capable of exchanging information and being integrated together. Thus, a uniform high level approach should be taken across a set of software tools, founded on collecting low level system information and other implementation details and translating them into the familiar terms found in high level programming abstractions. Hence, three software components, namely the operating and run time systems, programming language, and software tools, make up a complete computing environment.

4.3.1 Ideal Thread Environment

Only a brief outline of the idealised operating system and Tyger run time system, collectively called the global environment, is presented here as the main emphasis of this chapter is on shared values. The reason why this material is presented is to sketch the underlying models that are thought to be appropriate for supporting the Tyger model. The intended host architectures for the global environment are shared memory multiprocessors, but it should be possible to support the global environment on distributed memory multiprocessors or across networks of multiprocessors. The global environment is conceptualised as a symmetric multiprocessor operating system that executes jobs by using two tiers of addressing to support parallelism between jobs and parallelism inside jobs. In addition, as the distribution of the global environment is largely hidden, load balancing should be permitted to enable jobs to migrate between host machines. Contemporary operating systems such as Mach [Acce86] and CHORUS [Rozi88] already provide similar levels of functionality, so these seem good starting points for the global environment. Using the Mach terminology, a task corresponds to a shared address space where many lightweight threads may be executing, with the operating system itself containing many such tasks at any one time. Likewise, the global environment consists of many such tasks that correspond to the activations of programs, with an activation of a Tyger program being given the special name of an errand.

Internally, an errand consists of a notional master thread of control, some worker threads, and some helper threads. All of the threads of control have equal functionality as regards their capability to manipulate the operating system, though the three types of thread have different roles. All threads communicate and synchronise with each other via shared variables, as it is envisaged that threads of control inside an errand are tightly coupled, interacting frequently. Basically, the master thread of control corresponds to the activation of the main procedure of a Tyger program. Its role is to oversee the execution of the errand and to take action to terminate the errand in the case of an unrecoverable error such as a program deadlock. As it turns out, a master may not be a single thread of control as it may create helper threads, delegating roles such as deadlock detection and statistics collection to them. Worker threads correspond to the threads created in a Tyger program by the Tyger iterators or the parallel-and operator. These threads are used to execute Tyger procedures efficiently in parallel and largely leave administrative duties to the master. Helper threads are invisible to the program level as they are used to simplify the implementation of the run time system. Thus, an errand is an abstraction for a parallel program that hides the details of its implementation, in terms of its number of threads of control, from other programs, and acts as a common repository for operational information for those threads executing inside of it.

Although an errand is a vehicle for executing a Tyger program with internal parallelism, inter-program parallelism can be modelled by a group of concurrently executing errands. Two levels of inter-errand communication need to be addressed. At the bottom level, the operating system must have some fundamental inter-task communication mechanism. Rozier et al [Rozi88] argue that message passing is probably the best method for achieving this. Moreover, message passing seems to be a natural solution for overcoming the logical distribution of tasks into separate address spaces and the physical distribution of tasks across separate machines. At the top level, programmers need a communication mechanism that can be used to express complex concurrent behaviour in a familiar and convenient way. Dynamic shared values seem then to be a plausible candidate for this mechanism as they can model parallelism in a way that can be straightforwardly mapped on to a group of errands. At the programming level, it is envisaged that errands are loosely coupled, interacting less frequently and less predictably, so they do not need a direct sharing mechanism for extra efficiency. Instead dynamic shared values provide a conceptually shared associative name space between errands that can be mapped down into the operating system's message passing framework.

To create a dynamic shared value which is shared between several errands two steps are necessary. Firstly, the errands that are to cooperate need to be grouped together so that they share a common GLOBAL shared value space (SVS), which was earlier mentioned under dynamic shared values. Secondly, an appropriate command must be issued by a thread in an errand to create a shared value which should be directed to its GLOBAL SVS. The GLOBAL SVS is disjoint from the LOCAL SVSs of all the cooperating errands and is also disjoint from those belonging to any other groups. Thus, there are in fact three levels of addressing within the global environment, namely: intra-task, inter-task and intra-group.


4.3.2 Supporting Tools

Although the design of the Tyger model has grown out of the familiar control flow models found in conventional programming languages, transforming an algorithm into an effective Tyger parallel program is not always a straightforward undertaking. Fortunately, research with other parallel programming approaches suggests that software design tools can ameliorate this situation. For instance, it is quite common to find interactive software tools that can display information about the control flow and thread structure of a parallel program. For Tyger programs, such a tool would be very useful as it could be used to display the sequence of stripes in a parallel program (also substripes with an extended model). Moreover, if such a tool were to be coupled to an editor, code for a spine thread and its workers could be entered in a window driven fashion rather like the folding editor used with Occam systems. This has the twofold effect of only allowing the expression of valid stripe and thread structures, and allowing programmers to achieve a good overall understanding of the thread control relationships within a program. Another potential capability of such a design tool is the visualisation of the scope of shared information. This could be primarily used as a method of assisting programmers to check the scope of shared and local information to make the job of run time debugging easier.

One of the easiest software tools to implement to support the Tyger model is a run time monitoring tool. Here, low level information can be gathered about the access patterns of a group of shared values merely by examining their synchronisation variables. This means that no extra information gathering code has to be added to a parallel program to be monitored, minimising the perturbation effects on the run time behaviour of the program. In addition, the use of static shared values means that programs will not (intentionally) contain statements that overwrite shared data during a parallelism stripe, making it easier to trace the execution of such a program. However, an external viewing mechanism must be provided to log or display in real time the changes in the synchronisation variables, which may have some small effect on the performance of a test program (due to contention). In a similar vein, if the static shared value history mechanism is exploited by a parallel program, facilities for limited program playback can be easily constructed by providing a viewing mechanism for the shared value histories. Moreover, monitoring and debugging could be arranged conveniently so that a parallel program could be debugged by advancing it a stripe at a time.


Summary of the use of Shared Values

Shared values and their attendant control constructs are useful mechanisms for the construction of parallel programs. Static shared values provide the most support for the data partitioning programming paradigm, which is thought to be the most important method of exploiting significant amounts of parallelism. However, by allowing a programmer to define new classes of object with synchronisation properties based on shared value semantics, two major benefits are gained. Firstly, it is possible to effectively support parallel programs that follow functional decomposition. Secondly, a programmer can structure a parallel program in an appropriate way to carry out its application, rather than some other way to accommodate the basic shared value constructs. Dynamic shared values are also capable of effectively supporting parallel programs that follow data and functional decomposition. But such programs should be constructed by working to standard models for algorithms because dynamic shared values do not provide much help for parallel program structuring.

To enable the effective use of shared values an operating system must offer services similar to those provided by the global environment. Key services include the provision of shared memory and the management of multiple lightweight threads of control. Finally, software tools can be used in the design of programs that use shared values to exploit parallelism, and in turn, can be used to inspect the operation of such programs. The use of such tools can lead programmers to a better understanding of their programs which will hopefully dispel some of the mystique surrounding parallel processing.


Chapter 5

Implementation of Shared Values

After discussing the Tyger model in chapter four, the purpose of this chapter is to explain an actual implementation of some of the Tyger constructs. The chapter starts by giving a description of the shared memory multiprocessor architecture that is used to execute shared value based parallel programs, and continues with a description of how static shared values can be implemented, mentioning the tradeoffs that can be made. Only static shared values are covered here, as the implementation of dynamic shared values is similar to the Linda work carried out by Carriero [Carr87]. Finally, a short section examines the compile time optimisations to static shared values that are possible.

5.1 Multimax Multiprocessor

To set the context for the experiments in chapter six, the hardware platforms used to obtain the empirical performance figures are described. These systems were Encore Multimax multiprocessors running the Umax 4.2 operating system (a parallelised version of Unix 4.2). The central component of a Multimax multiprocessor is the system bus. It has a true data transfer rate of 100 Mbytes/s and can accommodate up to twenty function cards. A function card can be a dual processor card, a shared memory card (4 or 16 Mbytes), a mass storage/ethernet card, or the system control card. Various combinations of cards can be used to form a system with the provisos that one system control card is used with up to eight shared memory cards (maximum 128 Mbytes) and eleven processor/mass storage cards (maximum 20 processors). Each dual processor card houses some cache memory (local memory) with logic to ensure cache consistency throughout the entire processing system.

During the course of this research test programs were run on four Multimax configurations derived from two basic machines. Initially, eight National Semiconductor NS32032 processors were configured with 16 Mbytes of memory in one machine, subsequently expanded to eight NS32332 processors with 64 Mbytes of memory (320 system). A second Multimax was acquired with six NS32532 25 MHz processors and 48 Mbytes of memory, later increased to fourteen NS32532 30 MHz processors with 96 Mbytes of memory (520 system). (The problem studies focus only on the two final systems (320 and 520), although data from experiments carried out on the earlier systems is presented in some of the summary tables.)

The allocation of cache memory associated with each generation of processors was altered by the manufacturer to offset the increased frequency of bus traffic as processor speeds increased. For the 320 system, 64 kbytes of write through cache memory were provided per processor card, while in the 520 system this was updated to 256 kbytes per card using write deferred cache memory. The availability of cache memory is very influential on the performance of shared values as their implementation has been coded to take advantage of the quicker read and write times. Hence, caching is of paramount importance to successful multiprocessing on shared memory multiprocessors, and of commensurate importance to the success of shared values, so a brief summary of cache operation follows.

5.1.1 Caching Strategies and Cache Coherency

When constructing early multiprocessors, experimenters found that contention on a shared bus could limit the effectiveness of employing multiple processors, therefore nullifying the benefits gained through the use of parallelism. However, later research has shown that if multiprocessor systems are constructed with sufficiently large cache memories, the sequential performance can be improved because a processor can work out of a cache memory rather than main memory. Moreover, as a direct result of this, in most instances where multiple processors are active there are no serious bus contention problems, so enabling effective parallel processing to take place. Nevertheless, it is still possible for pathological behaviour to invalidate the advantages of caching and this phenomenon is explored in the next chapter.

Cache memories operate by providing fast access to frequently used data items, but in a shared memory multiprocessor system caches are used to allow processors to maintain their own private copies of memory locations, so obviating the need to request them from shared memory, with the potential to cause bus and memory bank contention. Problems can arise with multiple cache memories because a consistent system wide picture of the contents of each shared memory location must be maintained at all times. That is, if a processor changes the value of a memory location which is held in its cache and the value of the same location also happens to be residing in another cache, the other copy must be marked as invalid as it no longer holds the current state of the location. This problem of maintaining consistency between caches is known as cache coherency. A solution to this problem often adopted in shared bus multiprocessors is to empower the caches to monitor bus transfers and to act on the information accordingly. This kind of design gives rise to the term snoopy cache, and several authors have published details on how such devices actually work [Good83, Fiel84, Fran84, Papa84, Rudo84, Katz85, Arch86].

The two protocols used for the snoopy caches in the two Multimax systems are write-through caching and write-deferred caching. In both methods, when a processor requests the value of a location and the most up to date value is present in that processor's cache it is just read from the cache. On a cache miss, the value is read from the location in shared memory; but in the write-deferred protocol, a special cycle has to be executed in order to force the current state of the location back to shared memory. For cache writes the protocols also differ because of optimisations geared towards reducing the number of bus uses. In the simplest scheme, write-through, each write operation that updates a value in a cache memory forces an update to be made to the corresponding memory location in main memory. All other caches observe this update and if they are holding the old value then they mark it as invalid. If, subsequently, the value is needed and the value in the cache is invalid, a read operation has to be done to main memory. In an attempt to minimise the bus traffic resulting from write operations several schemes have been proposed [Fran84, Papa84, Rudo84, Katz85] including Goodman's write-once scheme [Good83]. This protocol dictates that if a processor writes to a value held in its cache then only the first write generates any bus traffic, with subsequent writes executing more quickly as they are deferred and only proceed as far as the cache. If a processor writes to a location owned by a remote cache, a special bus cycle must be executed in which the current value of the memory location is written back to main memory so that the requesting processor can write to it. In practice some 15-20% of memory requests are write operations, so the optimisations brought about by write-deferred caching reduce bus traffic by one sixth according to the manufacturer's estimates.

5.1.2 Operating System

The operating system run on the Multimax systems used in the experiments appears to a user as a conventional Unix 4.2 operating system. The overall structure is the same but much of the kernel was rewritten to allow concurrent accesses and several additional system calls were added to manage areas such as shared memory and System V shared segments. Unix processes are multiprogrammed according to the current load and the number of available processors, and housekeeping jobs, such as daemons, are executed to perform system administration activities.

When a program is executed its performance is subject to numerous external factors common to all timesharing systems, such as bus and memory contention, paging, and the anomalies of physical storage devices such as disk drives. The effects of these intrusions on the performance of test programs have been kept to a low level by running programs at quiet times to reduce interference from other activities. Due to the large amount of computing resources required for testing, however, some programs were run on busy systems with the proviso that the total number of active CPU-intensive processes did not exceed the number of processors. This yields acceptable performance results, though the results are coloured somewhat by the other activities present on the system. For example, programs using a large block of shared memory may suffer adversely due to poorer caching and paging behaviour under a heavier system load than under a lighter load. In the case where there are more CPU-intensive processes than processors the interpretation is more open because of the large variations in timings. Some research has been carried out to examine the underlying trends here [Bert89], but in this thesis results of this type were discarded as they were thought too difficult to reason about convincingly.

5.1.3 Lightweight User Level Threads Library Model

All of the parallel programs considered in the next chapter have been implemented twice via two user level threads libraries. The explicitly parallel test programs (gold parallel programs) use the libraries directly, while the shared value constructs use them implicitly, that is, explicitly within their implementation. The purpose of a threads library is to act, in effect, as a small operating system formed out of lightweight calls and components. Hence, a library may offer services such as thread creation, thread manipulation, synchronisation, and memory allocation. Figure 5.1 shows a mapping between lightweight tasks, Umax processes and processors. At the user level many tasks can be created by a programmer and these are scheduled for execution by Umax processes via the task management portion of the lightweight threads library. Similarly, at the system level, Umax processes are scheduled for execution on real processors by the operating system.

The major advantage gained from using a user level threads library is that a programmer can easily produce a desired concurrent system, which can execute very efficiently, and is fairly independent of the underlying operating system (and therefore, in principle, portable across machines with similar architectures). However, the big drawback is that there is no guaranteed way of enforcing protection between concurrently executing threads of control, so careful memory management is needed to stop threads from interfering with each other's data structures.

Figure 5.1 - Hierarchy of parallelism.

Two threads libraries were available for use, the Encore tasking library which uses threads of control called tasks, and the Encore Parallel Threads library (EPT, based on the Brown Threads package [Doep87]) which uses threads of control naturally called threads. Although shared value constructs have been implemented via both libraries, only the results for task based programs are presented. This is because the tasks library more closely resembles an ideal library capable of supporting shared values rather than the more heavyweight threads.

5.1.4 System Characteristics

When analysing the run time behaviour of programs it is useful to have order of magnitude estimates for the times taken to execute certain fundamental operations. Equipped with this knowledge it should be possible to predict the time overheads involved in, say, forking several processes at the start of a parallel computation.


System (Processors)    Process Creation    Thread Creation    User Level Synch.    OS-Level Synch.
NS32032                600/50 ms           1000/300 µs        70 µs                -
NS32332                250/100 ms          350/250 µs         13 µs                0.6 ms
NS32532                150/50 ms           200/100 µs         7 µs                 0.5 ms

Figure 5.2 - Parallelism characteristics.

The figures presented in Figure 5.2 are mean sequential times to perform the specified operations; none of the operations themselves were internally parallelised. (One entry is omitted as data was not collected when the system was available.) Two figures are presented for process and thread creation times. The first figure represents the time to create an initial thread of control, which incurs an extra overhead for factors such as storage allocation. The second figure relates to the times for creating subsequent threads of control that do not have so great an overhead.

Each timing includes a component for a procedure call overhead which is significant only in the case of user level synchronisation. For the user level synchronisation, the fastest method of synchronisation (spinlocking) was measured. When operating a spinlock only one or two memory references have to be made in order to implement a lock operation (if the lock is acquired). For more elaborate user level mechanisms such as barriers, events and semaphores, several spinlock operations may be needed thereby increasing the synchronisation times accordingly. For the operating system synchronisation, System V semaphores were used as no suitable alternatives were available. These are quite a sophisticated mechanism, though the times recorded were for the simplest operations, such as signalling a semaphore, and do not represent the very much larger times that can occur if several processes have to be manipulated by a single call. For example, several processes could be waiting (blocked) on the value of a semaphore, and if it was necessary to unblock all these processes the total execution time of a semaphore operation would be very much longer in comparison to those for other operations on the semaphore.
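
For reference, the kind of spinlock measured for the user level figures can be written in a handful of lines. The sketch below uses the standard C++ atomic facilities rather than the Multimax test-and-set instruction directly, but the structure is the same: an uncontended acquire costs only one or two memory references, and richer mechanisms such as barriers or events would be built from several such operations.

    #include <atomic>

    // Minimal spinlock sketch: acquisition is a single test-and-set when
    // uncontended, and release is a single store.
    class SpinLock {
        std::atomic_flag flag = ATOMIC_FLAG_INIT;
    public:
        void lock()   { while (flag.test_and_set(std::memory_order_acquire)) { /* busy wait */ } }
        void unlock() { flag.clear(std::memory_order_release); }
    };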

The important observation from Figure 5.2 is that user level operations are on average two to three orders of magnitude faster than operating system level calls, therefore permitting more effective parallelisations of small data sizes or small grain sizes. Nonetheless, the initial one time cost of creating processes to run the lightweight threads must always be taken into consideration. However, once the heavyweight processes have been created they can be reused throughout the duration of a program to execute lightweight threads at lightweight costs.

5.2 Implementing Shared Values

Most of the work in implementing the shared value constructs is straightforward, though sometimes tedious, but the process of designing and coding an efficient implementation for a Multimax multiprocessor is quite demanding. A shared value can be notionally represented as a (value, synchronisation-check) pair. The challenge is how to physically store and interconnect each shared value pair such that both the time taken to operate on the synchronisation check and the total storage requirement are minimised. The time constraint is quite natural in so much as the overheads of operating on shared values should be kept to a minimum, firstly to retain the ability to perform some fine grained work efficiently, and secondly to avoid adding massive overheads when dealing with large numbers of shared values. The storage constraint only really applies to large problems, but it is these very problems that are likely to yield the most rewarding gains through the use of parallelism, and they should not be precluded because a shared value based program exceeds the amount of addressable storage where a conventional storage model could be accommodated. For example, consider a dusty deck style program that takes one week to run and needs 50 Mbytes of storage out of a system pool of 100 Mbytes. If this program were parallelised using poorly represented shared value constructs, the program's memory requirement could potentially double or even triple, making it unexecutable without additional memory. Unfortunately, buying extra memory is not a good solution to this problem because real applications are often scaled up to take advantage of any increase in hardware resources.

5.2.1 Representation of Shared Values

One of the important considerations in writing efficient programs is the real memory requirement. If a program's shared memory requirement is very high, excess paging may occur at run time and any benefits gained through the use of parallelism may be lost. Thus, a compact, yet easy to access, representation for shared values must be chosen. Two representations come to mind:

(a) Bit representation. A shared value is stored as a loosely coupled pair consisting of a value and a remote tag. Each of the tags is represented by a single bit, with a single separate lock byte used for synchronisation. The advantage of this scheme is that with n shared values and wordlength w, only ⌈n/w⌉+1 extra bytes are needed. A modification to this scheme is to allocate more locks, up to one per value, to reduce contention on the lock.

(b) Byte representation. A shared value is stored as a tightly coupled pair consisting of a value and an adjacent tag, with each of the tags represented by a byte. This uses more storage than a bit based method, but consequently has a shorter operating time as the seek time for an individual tag is shorter. Again, separate storage is required for synchronisation variables.

Both methods were implemented on a trial basis and it was observed that the bit representation resulted in access times an order of magnitude slower than the byte representation. Hence, a byte representation was chosen, although the underlying representation should really be parameterisable depending on a user's desire for storage or speed.

A concern that has been overlooked up until now is the time taken to create a shared value object when its constructor is invoked. Normally, when shared memory is allocated only a single call has to be made which, albeit quite costly for large memory requests, is nevertheless sufficient to provide a pool of shared memory for all of a program's needs. If n calls have to be made to a shared value constructor, or n*n for an array, the overhead of the procedure calls alone can be prohibitive in cases of limited parallelism. Fortunately, all of this extra work can be neatly side-stepped by taking advantage of the properties of the memory allocation system call. It returns zero-filled pages which, as it happens, can be directly mapped onto initialised shared values without any additional processing.
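
Putting the last two points together, a byte-per-tag representation allocated from zero-filled storage might look as follows. The layout and the tag encoding are assumptions made for the illustration, and calloc stands in for the shared memory allocation call, since both hand back zero-filled storage.

    #include <cstdlib>

    // Illustrative byte representation: each shared value is a value with an
    // adjacent one-byte tag. Tag value 0 means "empty" (assumed encoding), so
    // zero-filled pages are already correctly initialised shared values and no
    // per-value constructor calls are needed.
    struct TaggedValue {
        double        value;  // the value portion
        unsigned char tag;    // 0 = empty, non-zero = full
    };

    TaggedValue* allocate_shared_values(std::size_t n) {
        return static_cast<TaggedValue*>(std::calloc(n, sizeof(TaggedValue)));
    }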

5.2.2 Synchronisation for Shared Values

If shared values are actually used to communicate between threads of control, rather than being used for holding a shared result, appropriate synchronisation code must be used when referencing them. This synchronisation cannot be optimised away by a compiler because it is needed to enforce the correctness of a program. Thus, efficient implementation of shared value synchronisation is required.


A shared value has two states, full and empty, implying that synchronisation must be provided so that the correct actions are taken when a shared value is read or written in a given state. The Encore Multimax provides hardware support in the form of a test-and-set operation which, it turns out, is sufficient to implement very efficient shared value synchronisation. The test-and-set operation works on variables that are regarded as locks, as they also have two states, locked and unlocked. Two implementation strategies come to mind for synchronisation based on locks, termed locked-first and unlocked-first synchronisation.

Locked-first Synchronisation

Upon allocation, shared value synchronisation locks are set to the locked state. Threads that wish to read a shared value continually test the state of its lock until the lock is found to be in the unlocked state. A thread that wishes to write to a shared value firstly tries to set the corresponding lock to the locked state. When this fails, as the lock is already locked, it sets the value portion of the shared value and changes the state of the lock to unlocked.

After a shared value has been written, its lock will be in the unlocked state. Threads wishing to read a shared value test its lock and, upon finding the lock unlocked, read the value and continue normally. Now, a thread that wishes to write, incorrectly, to a shared value firstly tries to set the corresponding lock to the locked state. When it succeeds, that is, the lock was unlocked, it raises an exception indicating multiple write operations to a shared value.

Although the locked-first algorithm appears to satisfy shared value semantics, there is in fact a fault whereby write operations to a shared value can be lost. Consider the case of multiple threads wishing to write to an empty shared value. One potential interleaving of events is that the writing threads test the lock and, finding it locked, all set the value portion of the shared value. Hence, updates to the shared value are lost and, what is worse, this is not detected by the final operation of unlocking the lock. The algorithm can be modified, however, by checking the state of the lock before terminating the write operation and raising an exception if the state of the lock is found to be unlocked. However, adding the extra check increases the time to operate on a shared value, and a small problem that still remains is that all but one of the values written in a multiple write are lost, meaning that potentially valuable debugging information can be destroyed.


Unlocked-first Synchronisation

Upon allocation, shared value synchronisation locks are set to the unlocked state. Threads that wish to read a shared value continually test the state of the lock until the lock is found to be in the locked state. A thread that wishes to write to a shared value firstly sets the value portion of the shared value and then tries to set the lock to the locked state. If it succeeds, that is, the lock was unlocked, it continues as normal. Otherwise, if it fails, that is, the lock was already locked, it raises an exception indicating multiple write operations to a shared value.

After a shared value has been written, its lock will be in the locked state. Threads wishing to read a shared value test its lock and upon finding the lock locked, read the value and continue normally. Threads that wish to write, now incorrectly, proceed as before.

Once again, the values written by multiple write operations will be lost but an exception will be correctly raised indicating the presence of a multiple write. Unlocked-first synchronisation was implemented for the C++ version of shared values because it is a simpler, and therefore a quicker, method of synchronisation than locked-first.
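
A direct transcription of unlocked-first synchronisation into present-day C++ is sketched below. An atomic exchange stands in for the Multimax test-and-set instruction and the exception is reduced to a boolean result; this is an illustration of the protocol, not the Tyger-C++ implementation, and its behaviour under erroneous multiple writes (the earlier value is lost, the failure is reported) mirrors the description above.

    #include <atomic>

    // Unlocked-first synchronisation sketch. The lock starts unlocked (false); a
    // successful write leaves it locked, which is also the signal that the value
    // portion may be read.
    template <typename T>
    class SharedValue {
        T                 value{};
        std::atomic<bool> lock{false};   // false = unlocked/empty, true = locked/full
    public:
        // Write: set the value portion, then try to take the lock. If the lock was
        // already taken this is a multiple write, which violates single assignment.
        bool write(const T& v) {
            value = v;
            return lock.exchange(true, std::memory_order_acq_rel) == false;
        }
        // Read: spin until the lock is found locked, i.e. the value has been written.
        T read() const {
            while (!lock.load(std::memory_order_acquire)) { /* busy wait */ }
            return value;
        }
    };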

5.3 Compile Time Optimisations

As a general comment on current microprocessor and compiler technology, it seems that some of the complexity that was previously supported in hardware is now being integrated into the optimising stages of modern compilers. This trend is typified by RISC microprocessors and their compilers. The kinds of optimisations that are performed by such compilers are on the whole language independent but the dependency information that they work with can be further exploited to enable tuning of specific language constructs.

One of the most well known compile time optimisations is the inlining of procedures. Although this technique has some drawbacks (mentioned in chapter two), very real performance benefits can be realised for shared values. Consider, for example, the calls needed to enforce shared value semantics on read and write operations to static shared values. Every reference to a shared value would involve a procedure call to check the synchronisation unless the synchronisation code could be inlined. As it turns out, the synchronisation code amounts to little more than a few instructions, which means that the procedure call alone amounts to fifty percent of the overall performance cost of implementing static shared values.

A more Tyger specific optimisation that can be accomplished, however, is the removal of unnecessary synchronisation. The most striking opportunity for this to occur is when threads read a group of shared values in an interval but do not write to them. Here, no synchronisation is necessary as there are no problems in maintaining consistency with parallel read operations. Other cases where synchronisation is not required can also be identified (e.g. a thread reading a value it has just written). Optimisations of this kind can be accomplished by performing dependency analysis (e.g. Kuck's dependency tests) of the form described in chapter two. However, problems can arise if multiple procedures have to be analysed. In the case of optimising read operations to shared values made from a procedure, all calling instances of that procedure must be checked before it can be determined that it is safe not to include synchronisation code. In general, program complexity makes it very difficult to do this effectively, implying that synchronisation code must always be generated. However, there are a number of compromise solutions. Firstly, code for two copies of a procedure could be generated, with one containing synchronisation code and the other not. When a call of the procedure had been determined to be safe the unsynchronised procedure would be called, otherwise the synchronised procedure would be used instead. As an alternative solution, in cases where no synchronisation is necessary the entire procedure (omitting synchronisation) could be inlined, though some analysis would have to be carried out to assess the impact this would have on the performance of a program. The performance costs of not removing unnecessary synchronisation are described for several test problems in chapter six.
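
The two compromises can be pictured together in a small sketch: the synchronisation check is short enough to be inlined, and the same read exists in a checked and an unchecked form, with the unchecked form used only where analysis has shown that a preceding stripe must already have written the value. The representation and the function names below are invented for the illustration.

    #include <atomic>

    // Illustrative "two copies" compromise: one synchronised and one
    // unsynchronised version of the same read, both small enough to inline.
    struct TaggedInt {
        int               value = 0;
        std::atomic<bool> full{false};
    };

    inline int read_checked(const TaggedInt& sv) {
        while (!sv.full.load(std::memory_order_acquire)) { /* wait for the writer */ }
        return sv.value;
    }

    inline int read_unchecked(const TaggedInt& sv) {
        return sv.value;   // safe only where dependency analysis has proved the value is already written
    }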

To reduce the time taken to locate and fix synchronisation faults in a program, some checking to detect multiple write operations to a shared value during a parallelism stripe can be carried out. This is merely a convenience for a programmer as this checking is performed at run time. In addition, other checking can be employed to detect the presence of indefinite postponement (e.g. a thread reads a value that is never written) and other related parallel programming faults such as deadlock, where threads each wait for the other to write a value.

Further compile time optimisations can be made to check the Tyger constructs that manipulate dynamic shared values. Work of this kind is similar to that performed for Linda tuples described by Carriero [Carr87] and is not discussed further here.


Chapter 6 Performance Evaluation of Shared Values

This chapter explores the pragmatics of using an implementation of some of the static shared value constructs as a parallel programming model. The aim of this chapter is to establish that it is possible to produce an implementation of static shared values that can be used as a serious replacement, as regards efficiency, for imperative languages that explicitly exploit parallelism through the use of lightweight threads libraries. This chapter is a description of a series of four simple programming scenarios in which programs written using shared value constructs from Tyger-C++ are contrasted against similar C++ programs written using parallelism facilities from a user level threads library (gold programs), and sequential C++ programs.

Other parallel programming languages, such as Linda and Strand88, were also considered for inclusion in the programming scenarios, but their inclusion was rejected for two reasons stemming from the nature of the scenarios. Essentially, in each scenario the efficiencies of several test programs are examined and the overheads that appear in the parallel programs are explained and countered where necessary. The explanation of program behaviours sometimes makes recourse to some low level implementation details, which in the case of the software mechanisms found in C++ and the threads libraries, can be done successfully on an intuitive level because of the low level of abstraction used in the implementation of these mechanisms. However, with programming models such as Linda and Strand88, the analysis can only be done at the program level because the details of implementation are hidden. Thus, it is somewhat unfair to compare a model with an undisclosed implementation and potentially higher overhead costs to finely tuned shared value constructs. The other reason why a performance comparison between multiple languages might be somewhat unfair stems from the facts that shared values are aimed at exploiting medium grained data parallelism and the problems that are examined are prime candidates for data parallel execution. Other parallel programming approaches may well have the expressive power to produce elegant and succinct solutions to data parallel problems, but it is uncertain if the implementation of these solutions can really deliver optimal parallel performance. Of course, one could question the wisdom of undertaking a performance based study rather than, for example, a usability based study. However, the usability of a programming mechanism is a somewhat subjective quality and is coloured to some degree by the level of experience and degree of proficiency attained by a programmer. Moreover, the choice of suitable test problems is again important as no practical parallel programming mechanism can claim perfect solutions for all problems. Similar statements can be made about the expressiveness of a programming mechanism, though more formal analysis can be applied. Thus, while it would have been possible to investigate practically the limits to the expressiveness of shared values by programming a series of well known test problems, the limitations to shared values that exist have been described in chapter four and so are not discussed further here.

Parallel processing experiments were carried out on two similar shared memory multiprocessor systems (Multimax 320 and 520 systems), enabling an investigation into how much the architectural characteristics of a particular machine could influence the results. Components such as processor speed, the number of processors and the amount of cache memory differed between the systems, as described in chapter five, making it possible to explore some of the time critical factors in the operation of shared values.

6.1 Measuring Parallel Processing Performance

There are several methods for measuring the performance of a parallel algorithm and its attendant parallelism constructs on a given parallel processor. This multiplicity implies that no single measure can capture enough information to adequately describe the true cost of a computation, which is not surprising as the cost of a computation can have different meanings depending on its context. For example, the cost of a computation could refer to the time taken to run a job, or the resource utilisation, or even the financial cost. The most commonly used measures of performance include: the elapsed wall-time, the price/performance, the speedup, and the efficiency.

The elapsed wall-time of a job is the time taken to execute the job as measured, say, by a clock on the wall. This measure can differ from the total elapsed time because the total time may be the sum of many partial times that originate from the concurrently executing subtasks of the job. From a system manager's (or computer scientist's) perspective it may be useful to know the total resource consumption of a job in terms of processor cycles, but in general the most useful measure of the effectiveness of a parallel algorithm on a given parallel processing system is simply the time it takes to run.


The price/performance characteristic of a program and system is the elapsed wall-time of a job divided by the cost of the system which ran the job. For instance, the price/performance of a top of the range supercomputer may be relatively poor despite the fact that the system may have a peak performance of several gigaflops. In contrast, the price/performance of a microprocessor based multiprocessor could be up to three orders of magnitude better. The crucial flaw in using this measure as an architectural guide arises if there is an upper bound on the actual elapsed run time of a job, as in weather forecasting, where currently the only tenable candidate systems are supercomputers.

One of the most frequently quoted measures of parallel program performance is the speedup, which is obtained by recording the elapsed run times of a program while varying its number of threads of control. The speedup s is calculated from the formula s = T1/Tp, where T1 is the time taken for a single thread run and Tp is the time taken for a p thread run. There is some debate about whether T1 should be the time for a sequential program or the time for a parallel program running with only one thread. The difference between these two times comes from the overhead inherent in the parallel program. This overhead can be significant, leading to a misleading impression of a program when moderately parallel runs are slower than the sequential time but still show a reasonable speedup with respect to parallel runs using only one or two threads. In this chapter, true sequential times are used for T1, giving a fairer cost of the parallelism, with the exception of the last scenario, which does not have a sequential version. Nevertheless, this is not quite the end of the story. The greatest point of contention in the interpretation of a program's speedup arises if T1 is taken to be the time for the best sequential program. This may mean that T2..Tp are generated by a parallel program based on a different algorithm to the sequential program, implying that the absolute time to process an application is being measured. Thus, as the goal of this chapter is to evaluate shared values, the sequential and parallel algorithms for each problem are the same, so that the effects of parallelising the sequential program can be examined rather than the properties of the problem.

A related measure to the speedup is the efficiency, calculated from the formula e = s/p, where s is the speedup and p is the number of processes. This metric gives a direct measure of how well the processing resources have been utilised, but again has fallibilities similar to those of speedup. More precisely, with low numbers of processes high efficiency is essential for good performance, but with very large numbers of processes a lesser efficiency may be acceptable.


Without doubt the most celebrated result in parallel program performance evaluation is Amdahl's Law [Amda67], which can be used to put an upper bound on the speedup a parallel program can hope to achieve. The law can be stated as

    s = (Ts + Tp) / (Ts + Tp/p)

where Ts is the sequential time component and Tp is the parallelisable part of a program. The exact interpretation of the law is sometimes seen as an open issue, because for some problems quantities such as Ts are not constant for a given p, and there is the assumption that Tp is divided equally amongst the p processes. These questions have given rise to a new metric called the serial fraction [Karp90], which examines the changes in Ts. The serial fraction can be computed from experimental figures using the formula

    f = (1/s - 1/p) / (1 - 1/p)

where s is the speedup for a given number of processes p. The intended use of this new metric is as a diagnostic tool in conjunction with, say, the speedup, to help attribute the cause of the lack of success of a particular parallel algorithm. For example, if the speedup of a given parallel program falls off while the serial fraction increases proportionally, one could argue that the extra work in managing the parallelism is responsible. Alternatively, if the speedup falls off and the serial fraction remains constant, the interpretation could be that the limit of parallelism in the program was being reached.
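As an illustration of how these metrics fit together, the following sketch computes the speedup, efficiency and serial fraction from a pair of measured run times. The timings in the example are invented for illustration only and are not taken from the experiments in this chapter.

    #include <cstdio>

    // Speedup relative to the single-stream time T1.
    double speedup(double t1, double tp)   { return t1 / tp; }

    // Efficiency for p processes.
    double efficiency(double s, int p)     { return s / p; }

    // Serial fraction for a measured speedup s on p processes.
    double serial_fraction(double s, int p)
    {
        return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p);
    }

    int main()
    {
        // Hypothetical timings: a 10 second sequential run and a
        // 3 second run on p = 4 processes.
        double t1 = 10.0, tp = 3.0;
        int    p  = 4;

        double s = speedup(t1, tp);
        std::printf("speedup = %.2f, efficiency = %.2f, serial fraction = %.2f\n",
                    s, efficiency(s, p), serial_fraction(s, p));
        return 0;
    }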

6.2 Primitive Shared Value Operations

The two intrinsic operations that can be performed on static shared values are the read operation and the write operation. The average times to perform these operations were measured together with the times taken to read and write conventional variables. The programs that were devised to measure these times were coded to allow for cached and non-cached effects. In addition, steps were taken to eliminate the effect of prefetching, which normally occurs on a Multimax system while reading consecutive locations from memory: in the event of a cache miss, the next eight bytes of memory are fetched from shared memory. If four byte integers are read then only half the expected memory fetches are required, with the other fetches coming from the fast cache memory, thus reducing the average fetch time. The results from programs written using static shared values were compared against those produced by similar programs written to benchmark conventional variables and are summarised below.

On the 320 system the mean time to read a four byte integer from shared memory was approximately 1.7 microseconds, while the time to read the same value from cache memory was 0.8 microseconds. Write operations took 0.4 and 0.4 microseconds respectively. On the 520 system the times for reading were 2.0 and 0.2 microseconds, and 1.7 and 0.1 microseconds for writing. These results were obtained by assigning to and reading from an array, and were corrected to allow for the overhead of the array scanning loop and to negate the influence of prefetching. Having said this, the quoted values have small inaccuracies due to bus pipelining, which allows several memory operations to proceed in a partially overlapped manner. This means that the time to perform several consecutive operations was less than if they were performed in an independent, piecemeal manner.
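The following is a minimal, portable sketch of the kind of correction described above; the thesis's own measurement programs were Multimax specific and are not reproduced here, so the constants and the use of clock() are assumptions made purely for illustration.

    #include <cstddef>
    #include <cstdio>
    #include <ctime>
    #include <vector>

    // The stride keeps consecutive stores in different prefetch blocks, and
    // the cost of an empty loop is subtracted to approximate the per-store
    // time.  Compile without optimisation, otherwise the empty loop may be
    // removed entirely.
    int main()
    {
        const int n = 1 << 19;          // number of stores to time
        const int stride = 16;          // 64 bytes apart for 4-byte ints
        std::vector<int> data(static_cast<std::size_t>(n) * stride);

        std::clock_t t0 = std::clock();
        for (int i = 0; i < n; i++) { }                     // loop overhead only
        std::clock_t t1 = std::clock();
        for (int i = 0; i < n; i++) {
            data[static_cast<std::size_t>(i) * stride] = i; // timed stores
        }
        std::clock_t t2 = std::clock();

        double corrected = double(t2 - t1) - double(t1 - t0);
        std::printf("approximate ticks per store: %g\n", corrected / n);
        return 0;
    }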

Difficulties were encountered in the measurement of the times for operations on static shared values because of the detailed interpretation that was needed to unravel the low level implementation used to support the operations. The time to write to a shared value on the 320 system was 11.3 microseconds, with the initial fetch of the shared value being made from shared memory. (If a shared value constructor had just performed operations on a shared value, then it is likely that the shared value would have been fetched from the cache.) The time to read the value of a shared value on the 320 system was 11.1 microseconds, which is faster than expected as it is likely that the value was cached, having previously been fetched and written. For the 520 system the times were 5.1 and 3.3 microseconds for writing and reading respectively. (The times recorded for reading a shared value include a time component for a procedure call, of around one microsecond, that was not inlined; in a more complete implementation this overhead would not be present.)

In a perfect world it would be convenient to believe that the times reported for these basic references to shared data were more or less constant but, unfortunately, a wide degree of variation in these times could be observed if the operations were invoked on data that was shared between several active processors. Some experiments were carried out to examine the worst case behaviour, in which multiple processes competed to read and write every element of a shared array [Lee90]. In the slower 320 system, eight competing processes did not significantly alter the times for reading or writing to cache or to shared memory, but for the faster 520 system the picture was very different. Here, processors were able to generate bus requests far more quickly, leading to bus (and memory) contention. This contention increased the operating times for fetch and store operations to shared memory linearly, resulting in a more than trebling of the basic times when large numbers of processors were active. Clearly, behaviour of this kind can make the interpretation of the efficiency of the execution of parallel programs very difficult, though the use of shared values is thought not to make the situation any worse.

6.3 Programming Scenarios

The effectiveness of a new parallel programming language can best be gauged from experience drawn from its extensive use in programming sample and real applications. Part of this assessment process is a subjective criticism of how easy a language is to use and how well it can be applied to the expression of programmers' designs. Unfortunately, any subjective comments regarding the usability of Tyger-C++ made by the author may be biased, so these will be limited and instead some figures are presented to make a case for concluding that the implementation of shared values can be achieved successfully.

Although the choice of content for the scenarios was very large and varied, four numerically oriented programming examples that were thought amenable to parallel execution were examined. These were chosen because they are representative of a large proportion of today's computing workload that is suitable for parallel processing, and because they can be compared readily in terms of speedup and efficiency. More specifically, each scenario examines the effectiveness of shared values at a different level of granularity of parallelism. Thus, the results provide insights into the best ways in which shared values can be used to deliver effective parallel processing performance.

The first scenario investigates the trivial array assignment operation. This represents O(n) processor operations with O(n) data elements. This kind of operation is easily parallelised by vector processors but is thought, in general, to be too fine grained for MIMD multiprocessors. The second scenario, matrix addition, is a little more appealing for parallelisation, with O(n^2) operations on O(n^2) data. The third scenario is matrix multiplication, which should provide the best improvements from the use of parallelism as its characteristics are O(n^3) operations on O(n^2) data. The fourth scenario, the generation of fractals, is somewhat different from the other scenarios in that the number of operations needed to compute a result cannot be accurately predicted, though an upper bound can be fixed. Difficulties arise with parallelising problems of this type because of the complexity of efficiently allocating work to threads.

All the programs in the scenarios were tested over the same range of data sizes to enable comparisons to be made between the different problems. (For the 320 system the data sizes were 50 to 500 in steps of 50, and for the 520 system the data sizes were 100 to 1000 in steps of 100.) Furthermore, only computation times were reported in each scenario, excluding time components for input/output operations. This means that total program run times were not considered; however, this was thought to be appropriate as it was the methods of parallelising the problems that were being measured and not the absolute properties of the problems themselves.

All the execution times presented for the test programs represent the observed minimum execution times in each case. Timings were selected from a series of trials and are therefore representative of the best performance figures that were obtained from the test programs. Mean values for timings are not presented because of the difficulty in calculating sensible values due to the variance in the observed timings. This variance arose because separate runs of the programs encountered different operating system states in which other programs were executing and taking up memory. Steps were taken to minimise the effects of other programs by executing the tests at quiet times, but due to the symmetric nature of the operating system it was not possible to exclude it altogether. As a consequence of being unable to calculate mean timings, no confidence intervals are presented to give an idea of the range of timings that were possible. However, it was observed that repeated minimum timing trials gave largely consistent answers, varying by less than one percent for midrange data values (e.g. about 250 for the 320 system and 500 for the 520 system), with slightly higher variations for very small data sets. Thus, as the degree of random variation in the minimum timings was very small, shared value and gold parallel programs that were within one percent were said to take the same time. Fortunately, where a noticeable variation in execution times did occur between the shared value and gold parallel programs, this variation was much bigger than one percent and so ultimately random variation did not have a significant impact upon the results.

6.3.1 Array Assignment

The explicit parallelisation of a collection of individual assignment statements is believed to be unsuitable as a parallel programming style for MIMD multiprocessors. Firstly, the overhead of creating even lightweight threads of control is much greater, typically by a factor of 70, than the time taken to execute an assignment. Furthermore, advances in microprocessor design and optimising compilers have meant that parallelism of this kind can be exploited by modern superscalar microprocessors through instruction pipelining and the utilisation of multiple functional units. Hence the next level of granularity that one may consider parallelising is the domain of array and structure valued operations.

Method

Given a source array A[0:n-1] assign the values 0:n-1 to the corresponding elements of the array A. The following C++ code fragment states precisely what should happen

    for (int value = 0; value < n; value++) {
        A[value] = value;
    }

The time taken for this loop accrues from executing the loop test and branch, plus the time taken to assign the values to the elements of A. (It is assumed that the loop counter and n are held in registers.) It is possible to try to factor out the loop overhead in this experiment, leaving only the time for the assignment, but this is hard to do accurately as the overhead is very small and can vary appreciably from run to run due to operating system effects.

A shared value based program which used the block iterator (described in chapter four), and a gold parallel program which used an equivalent work allocation strategy with shared variables, were written to generate execution times to compare against those from the original loop.
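By way of illustration, a sketch of the general shape of such a blocked parallel assignment is given below. It is not the thesis's code: std::thread stands in for the user level threads library used by the gold programs, and the block partitioning plays the role of the block iterator.

    #include <algorithm>
    #include <thread>
    #include <vector>

    // Each thread fills one contiguous slice of A.
    void assign_blocked(int* A, int n, int nthreads)
    {
        std::vector<std::thread> workers;
        int block = (n + nthreads - 1) / nthreads;      // ceiling division

        for (int t = 0; t < nthreads; t++) {
            int lo = t * block;
            int hi = std::min(n, lo + block);
            workers.emplace_back([A, lo, hi] {
                for (int value = lo; value < hi; value++)
                    A[value] = value;                   // the original assignment
            });
        }
        for (std::thread& w : workers)
            w.join();                                   // end of the parallel stripe
    }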

Observations and Conclusions

Given that the time to store an integer was of the order of a microsecond, the execution times of the sequential program were expected to range over hundreds of microseconds. From Figure 5.2 in chapter five it can be seen that the time to fork a process is O(100) milliseconds, which leads to the conclusion that process creation is the overwhelming factor in this problem. Surprisingly, the timings for the parallel programs were not as bad as expected: they peaked when three processes were created and did not increase linearly as one might have thought. The first reason behind this behaviour was that process forking could itself proceed in parallel, as a child process could carry on the activity of forking in parallel with its parent. The second reason was that one of the major time components in forking a process was the setup time for the shared memory region. This time could only be incurred once, which explains the constancy of the times for the runs with more than three processes. In addition, it also accounts for the superior times obtained from the shared value program over the gold program. This observation was quite unexpected, but resulted from a policy decision in the implementation of shared values which dictated that the minimum amount of shared memory should be used when creating shared values. In contrast, quite by chance, the gold program used generous estimates for the amounts of shared memory it required, which meant that it had a higher initial overhead cost. Now, in real terms these overhead costs are very small, O(100) milliseconds, but due to the very small amount of useful work carried out in the array assignments, this overhead component is very large in proportion to the operation times being measured. As a final experiment a modified version of the gold parallel program was tested, which confirmed this interpretation.

Some differences were noted in the relative performances of the 320 and the 520 systems. The original per system timing ratios of sequential : gold : shared value programs were 1 : 16 : 11 for the 520 system and 1 : 31 : 14 for the 320 system, indicating that the extra processing associated with the introduction of parallelism could be more easily absorbed by the faster processors.

If the setup times for the Umax processes are factored out of the timings then the only major overhead is the time to create the lightweight threads of control. This is a significantly smaller overhead to be absorbed by a parallel execution, but for the range of array lengths in this study no speedups could be obtained for either the gold or shared value programs. (As an aside, larger vector lengths were tried and speedups were obtained at around O(10000) elements.) Thus, as expected, no performance benefits could be derived from using heavyweight processes to exploit fine grained parallelism, and even lightweight threads of control had to be used with care.

6.3.2 Matrix Addition

The next level up after the linear array is the matrix which should provide more scope for effective parallelisation because n times more work is involved in simply operating on a matrix than on a linear array.


Method

Given two source matrices A[0:n-1][0:n-1], B[0:n-1][0:n-1] and a result matrix C[0:n-1][0:n-1], compute the sum of the two matrices as

    for (int row = 0; row < n; row++) {
        for (int column = 0; column < n; column++) {
            C[row][column] = A[row][column] + B[row][column];
        }
    }

As there are two loops which can be parallelised, two corresponding data parallelisation methods are clear, both of which can be blocked in order to tailor the number of threads precisely to the number of physical processors. Blocking obviates the need for multiprogramming: instead of creating one thread of control per unit of work, many units of work are blocked together so that there is commonly one thread of control per physical processor.

(a) N^2-fold parallelisation

For every ordered pair (i,j), where i,j are 0:n-1, create a thread of control to evaluate C[i][j]. This is, conceptually, the simplest method of computing the result matrix and leads to a highly parallel solution with n^2 threads being utilised. This method is the equivalent of replacing the two loops of the sequential version with a construct to create n^2 threads. The disadvantages of this approach are that the lightweight thread creation time is relatively large compared to the computation time for an element, and that it is wasteful to create so many threads of control, as it is unlikely that a multiprocessor will have sufficient processors to support this method without some degree of multiprogramming.

(b) N-fold parallelisation

In this method a thread of control is created to evaluate C[i], where i is 0:n-1 and C[i] represents either a row or a column of the result matrix. The advantage of this method over the previous one is that only n threads are created, leading to less creation and synchronisation overhead. In addition, it is more likely that a multiprocessor can allocate these threads to processors without having to multiprogram them, for appropriately sized matrices.
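As a rough illustration of strategy (b) in its blocked form (the form used for the measurements that follow), the sketch below creates one thread per band of rows. std::thread again stands in for the threads library used by the gold programs, and the row-major layout of the matrices in single arrays is an assumption made for brevity.

    #include <algorithm>
    #include <thread>
    #include <vector>

    // Each thread adds one contiguous band of rows, giving one thread per
    // physical processor rather than one per row.
    void add_blocked(const double* A, const double* B, double* C,
                     int n, int nthreads)
    {
        std::vector<std::thread> workers;
        int band = (n + nthreads - 1) / nthreads;

        for (int t = 0; t < nthreads; t++) {
            int lo = t * band;
            int hi = std::min(n, lo + band);
            workers.emplace_back([=] {
                for (int row = lo; row < hi; row++)
                    for (int column = 0; column < n; column++)
                        C[row * n + column] = A[row * n + column]
                                            + B[row * n + column];
            });
        }
        for (std::thread& w : workers)
            w.join();
    }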


To reduce the parallelism-associated overheads in this problem to a minimum, a blocked version of strategy (b) was employed for both the gold and shared value programs. In the shared value program only the result matrix had to be declared as a matrix of shared values, the other two being stored in shared memory and regarded as read only during the parallelism stripe. Hence, the overheads that were measured in the shared value program were attributed to the operation of the shared value run time system and the times for each of the write operations to the elements of the result matrix.

Observations and Conclusions

The characteristic times for matrix addition obtained from both systems are shown in Figure 6.1. In each of the system graphs the times for the sequential, gold parallel and shared value programs are illustrated. The data points for the parallel programs come from the maximally parallel runs for each system, which were not necessarily the fastest times observed. As it turned out the best times were observed when using around half the total number of available processors depending upon the problem size - as demonstrated later in Figure 6.2.

The interpretation of each of the system graphs is the same. The work for the sequential program increases in proportion to the square of the matrix size. For small matrix sizes the memory setup time component of process forking makes the shared value program more efficient than the gold parallel program (for the same reason as in the last scenario). But this situation changes as the matrix size increases and the overhead in the operation of the shared values increases proportionally.

Figure 6.2 shows a parallelism map obtained by varying both the problem size and the number of processors applied to computing a problem. The shape of the optimum parallelism map is a pyramid that steps up equally as processors are added, corresponding to a linear speedup. However, the map that was obtained illustrates that the overheads of thread creation eventually outweigh the gains made from the use of parallelism. The best results were obtained when around half the total number of processors were used, given sufficiently large matrices to overcome the initial overheads. Nevertheless, these results fall far short of linear speedup and highlight the unsuitability of parallelising this problem. Fortunately, though, the true picture is not quite so bad, as in a real application many such array operations may be required, therefore offsetting the initial


[Figure 6.1 - Matrix addition: 320 and 520 systems. Two graphs plot time in seconds against matrix size (50-500 for the 320 system, 100-1,000 for the 520 system) for the true sequential, gold parallel and shared value programs, using 8 processors on the 320 system and 14 on the 520 system.]


Figure 6.2 - SV Matrix Addition.

[Speedup key: above 2.3; 2.0-2.3; 1.8-2.0; 1.5-1.8; 1.3-1.5.]

overhead component resulting from the creation of Umax processes and leading to improved speedups overall.

To gain further insight into the operating characteristics of shared values the shared value program was modified so that all three of the matrices were declared as matrices of shared values. If only computation times are considered, as before, then three shared value operations would have to be performed per result element calculation (2 reads and 1 write).

A summary of the comparison between the gold program and the shared value programs is presented in Figure 6.3. Both Multimax systems are covered, with the entries for 1-parallel runs and N-parallel runs (N = number of processors in the system) being scaled relative to the respective gold parallel program entry in each case. The table entries reflect timings recorded after the programs had settled into more stable patterns, having overcome the effects of process creation. (The shared value* entries represent observations from the modified shared value program.)

    System   Gold Parallel            Shared Value             Shared Value*
             1-parallel  N-parallel   1-parallel  N-parallel   1-parallel  N-parallel
    320      100%        100%         200%        130%         350%        200%
    520      100%        100%         200%        130%         330%        150%

Figure 6.3 - Scaled relative times for matrix addition.

From Figure 6.3 it can be seen that using shared values adds significant overhead when operating serially, but in parallel, where the problem size permits, this overhead can be amortised into a more bearable quantity. This happens because the overhead itself can be treated as parallelisable work, and if sufficient processors are available it may be possible to reduce this overhead to the level of that associated with the gold parallel program.

6.3.3 Matrix Multiplication

One of the most fruitful areas of parallelism research is numerical analysis, and in particular matrix computation. In order to absorb the overheads incurred in setting up multiple threads of control it is important that the work specified by an algorithm can be partitioned into appropriately sized portions. The conventional method for multiplying two matrices A and B of rank n (n by n) is known to have a complexity of O(n^3) operations and, furthermore, there need be no read or write dependencies in calculating any of the elements of the result matrix, making it a good candidate for parallel execution.

Algorithm A

Given two source matrices A[0:n-1][0:n-1], B[0:n-1][0:n-1] and a result matrix C[0:n-1][0:n-1], C = AB can be calculated by the following algorithm

    for (int row = 0; row < n; row++) {
        for (int column = 0; column < n; column++) {
            ArrayType tmp = 0;
            for (int k = 0; k < n; k++) {
                tmp += A[row][k] * B[k][column];
            }
            C[row][column] = tmp;
        }
    }

Once again the occurrence of nested loops permits several parallelisations of Algorithm A, all of which can be blocked as desired. Each of the parallelisation strategies for matrix addition can be extended to matrix multiplication, though this time there is an extra parallelisation that can be performed which pertains to the inner loop.

(a) N^2-fold parallelisation

This is identical to the corresponding case for matrix addition with the clarification that the two outermost loops are the ones replaced by the parallelising construct. It is a more viable strategy than before, though, because of the greater amount of work involved in the computation of each element.

(b) N- fold parallelisation

This is similar to the corresponding case for matrix addition, except that once again more work is involved in computing each row or column.

(c) Scalar product parallelisation

In this strategy the inner loop is parallelised, making it a less attractive proposition than the other strategies as there must be interthread synchronisation. Let A[i] = a^T be a row from matrix A, B[j] = b be a column from matrix B, and C[i][j] = c be the corresponding element of C. The scalar product is calculated by c = a^T b, and can be decomposed such that c = a1*b1 + a2*b2 + ... + an*bn, which can be blocked to the required level of parallelism. Accesses to the shared sum c must be synchronised so that it remains consistent. In practice, this means using a mechanism such as locking in the gold program to enforce mutual exclusion, with an additional variable acting as a lock. In the shared value program, either a number of static shared values can be used to represent c or a single dynamic shared value. Despite the fact that significant work can be associated with each blocked computation, experiments indicated that it remains insufficient to enable economical parallelisation to take place for the vector lengths in the study.
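A sketch of how such a blocked, lock-based scalar product might look is given below. It illustrates the locking style described above rather than reproducing the thesis's code; std::thread and std::mutex stand in for the threads library and the gold program's lock variable, and the names are hypothetical.

    #include <algorithm>
    #include <mutex>
    #include <thread>
    #include <vector>

    // Each thread accumulates a private partial sum over its block of the
    // index range; the mutex serialises the updates to the shared total c.
    double scalar_product(const double* a, const double* b, int n, int nthreads)
    {
        double c = 0.0;
        std::mutex c_lock;
        std::vector<std::thread> workers;
        int block = (n + nthreads - 1) / nthreads;

        for (int t = 0; t < nthreads; t++) {
            int lo = t * block;
            int hi = std::min(n, lo + block);
            workers.emplace_back([&, lo, hi] {
                double partial = 0.0;
                for (int k = lo; k < hi; k++)
                    partial += a[k] * b[k];
                std::lock_guard<std::mutex> guard(c_lock);  // mutual exclusion on c
                c += partial;                               // one synchronised update
            });
        }
        for (std::thread& w : workers)
            w.join();
        return c;
    }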

Observations

To compute Algorithm A, a sequential program was written together with a gold parallel program that employed a blocked version of strategy (b). When the computation times for the gold parallel program runs were compared to those for the true sequential runs, near linear speedups were obtained in the computation of suitably sized matrices. The startup times to create multiple threads of control (Umax processes and user level tasks) were, in general, very much smaller than the times taken to multiply matrices and very much smaller than the naturally occurring random timing variations (e.g. the fluctuations in the times of reading and writing the data values). In terms of total program execution time, however, linear speedups were not possible because of the large proportion of a program's time spent in serial activity such as reading and writing the O(n^2) data values.

In a comparative program written using shared values, the two source matrices A and B were, as in the previous scenario, not declared as matrices of shared values because they were not changed during the parallelism stripe. Thus, the only overheads that the shared value program incurred over the gold program were those of the shared value run time system and the final (single) assignment of each element of the result matrix. In practice this meant that the shared value program ran on average 6% slower for 1-parallel runs and no slower for N-parallel runs (where N was 8 or 14).

Although the parallel execution of Algorithm A meant that large matrix products could be computed in much reduced times compared to the sequential times, the basic times for the algorithm were still quite high. So that these times might be reduced, some algorithmic tuning was performed to yield an algorithm that was more suitable for execution on the shared memory multiprocessors.

Algorithm B

Given two source matrices A[0:n-1][0:n-1], B[0:n-1][0:n-1] and a result matrix C[0:n-1][0:n-1], transpose B giving B^T, then calculate C = AB^T using the following algorithm

    for (int row = 0; row < n; row++) {
        for (int column = 0; column < n; column++) {
            ArrayType tmp = 0;
            for (int k = 0; k < n; k++) {
                tmp += A[row][k] * B[column][k];
            }
            C[row][column] = tmp;
        }
    }

Observations and Conclusions

As matrix B was stored in transposed format, a new method of indexing its elements was required in Algorithm B. The new method of indexing meant that elements from matrix B were fetched by row rather than by column, reaping the benefits of simplified address calculation, reduced paging, and maximised cache prefetching. The net effect of these performance advantages was that sequential execution times were on average 30% faster than before, even taking into account the time taken to first transpose matrix B. Two corresponding parallelised versions of Algorithm B were also coded and the experiments repeated. Similar, though not quite as impressive, gains were made by the shared value and gold parallel versions, with once again almost no difference in performance between the shared value and gold parallel programs.

Although the switch from Algorithm A to Algorithm B yielded a noteworthy increase in performance, further program tuning was possible because of the way in which C++ could be used to represent arrays. In the first two algorithms array elements are referenced by subscripts (e.g. C[i][j]), but alternatively, pointers can be used to reference array elements directly (e.g. *C). A sequential version of a program using this direct accessing method, based on Algorithm B, executed around 15% faster, with similar improvements observed for a gold parallel program. Unfortunately, the current implementation of shared values did not support such direct access to elements, so no comparative shared value program was written. (This limitation was a side-effect of the implementation and was not a fundamental limitation imposed by the shared value constructs.)
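As an illustration of the kind of pointer-based referencing involved, the following sketch rewrites the inner loop of Algorithm B using pointer arithmetic; it is not the code used in the experiments, and the parameter names are assumed for the example.

    // A_row points at the start of a row of A, Bt_row at the start of the
    // corresponding row of the transposed B.
    double inner_product_ptr(const double* A_row, const double* Bt_row, int n)
    {
        double tmp = 0.0;
        const double* a = A_row;
        const double* b = Bt_row;
        for (int k = 0; k < n; k++)
            tmp += *a++ * *b++;    // no subscript-to-address calculation per access
        return tmp;
    }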

To continue the search for a faster method of multiplying matrices, Strassen's method for fast matrix multiplication was investigated [High89]. The idea behind the algorithm is to decompose the source matrices into submatrices and use these submatrices in several equations to compute the submatrices of the product matrix. As it turned out, the sequential implementation of the algorithm proved to be bounded by the time taken for the matrix decomposition and the allocation of temporary matrices to hold intermediate results. Thus, the problem was more one of storage management than anything else, so no parallel versions were coded. In addition, the algorithm makes use of many submatrix addition, subtraction and multiplication operations which, if parallelised, would result in many (though quite small) overheads in setting up and synchronising all the necessary threads of control. Nevertheless, it seems likely that a parallel implementation of Strassen's method could be of benefit for very large matrices, though testing this hypothesis fell outside the scope of test data for this scenario.
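For reference, the standard formulation of Strassen's method (not reproduced in the thesis) partitions each matrix into four equally sized submatrices and forms the product from seven submatrix multiplications rather than eight:

    M1 = (A11 + A22)(B11 + B22)
    M2 = (A21 + A22) B11
    M3 = A11 (B12 - B22)
    M4 = A22 (B21 - B11)
    M5 = (A11 + A12) B22
    M6 = (A21 - A11)(B11 + B12)
    M7 = (A12 - A22)(B21 + B22)

    C11 = M1 + M4 - M5 + M7
    C12 = M3 + M5
    C21 = M2 + M4
    C22 = M1 - M2 + M3 + M6

The many submatrix sums and differences in these equations account for the temporary matrices whose allocation dominated the sequential implementation described above.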

The performance tuning of data referencing is endemic to high performance computing as it is often memory accessing which accounts for a large percentage of the total computation time. It is hoped that shared value based programs can be tuned in similar fashion to conventional programs to allow some continuity of a programmer's knowledge and experience. The results from this scenario seem to lend some justification to this claim.

Figure 6.4 shows the characteristic times on the 520 system for sequential programs using Algorithms A and B, together with shared value based versions of these two algorithms. The corresponding gold parallel program curves are not shown on the graph because they would have been coincident with the shared value curves. Similar results were obtained from the 320 system. The parallelism map of Figure 6.5 shows the almost linear speedup of the shared value program using Algorithm B. The data in the map was scaled to the best sequential version of Algorithm B, but if the speedups had been scaled relative to the results from a 1-parallel shared value run, even better speedups would have been obtained.

To conclude the experiments with matrix multiplication, the same technique of declaring all the matrices as matrices of shared values was employed again. Here, 2N read operations and one write operation were needed to produce every element


[Figure 6.4 - Matrix multiplication: 520 system. The graph plots time in seconds against matrix size (100-1,000) for the true sequential and shared value (14) versions of Algorithms A and B.]


Figure 6.5 - SV Matrix Multiplication.

[Speedup key: above 11.0; 9.0-11.0; 7.0-9.0; 5.0-7.0; 3.0-5.0; 1.0-3.0; below 1.0.]

of the result matrix. The comparison of the gold program and the two shared value programs for Algorithm A is shown in Figure 6.6. Once again the results were scaled relative to the gold results for both 1-parallel and N-parallel runs.

    System   Gold Parallel            Shared Value             Shared Value*
             1-parallel  N-parallel   1-parallel  N-parallel   1-parallel  N-parallel
    320      100%        100%         106%        101%         260%        280%
    520      100%        100%         105%        100%         340%        290%

Figure 6.6 - Scaled relative times for matrix multiplication (A).

As mentioned earlier, the basic shared value program performed excellently, incurring only very small overheads with respect to the gold parallel program. Unfortunately, as expected, the modified shared value program did rather poorly as a result of all the shared value read operations it had to perform.

6.3.4 Fractal Generation

Many interesting problems relating to real world situations, such as weather and economic modelling, can be represented by dynamic systems. Although programs based on dynamic systems have the potential to allow the exploitation of massive parallelism, implementation problems such as work partitioning and thread scheduling can become prohibitive. Some of the most well known work in dynamic systems has been done by Mandelbrot and it is the calculation of fractal images that provides the stimulus for the final suite of test programs. The fractal examples and methods examined here are taken from Mandelbrot's book "The Fractal Geometry of Nature" [Mand82]. Due to the scalable and unpredictable nature of fractal images, other researchers have also used the calculation of fractals for parallel processing research [Vaug89].

Algorithm

The essential process being followed in the computation of a fractal image is a feedback process in which the k-th point (x[k], y[k]) generates the (k+1)-st point (x[k+1], y[k+1]) by means of a given law:

    x[k+1] = f(x[k], y[k], p)
    y[k+1] = g(x[k], y[k], q)

where p and q are parameters held constant during each iteration.

The actual fractals produced in the experiments, Julia sets, were computed from the complex feedback process z -> z^2 + c. A decomposition of the complex numbers z and c into their real and imaginary parts gives z = x + iy, c = p + iq. Hence the process can be described by:

    x[k+1] = x[k]^2 - y[k]^2 + p
    y[k+1] = 2*x[k]*y[k] + q

A coloured image can be produced by plotting a pixel in a colour corresponding to the number of iterations it takes for that point (x,y) to escape towards infinity.
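A sketch of the escape-time calculation for a single pixel is given below; the escape test and the iteration ceiling are illustrative choices rather than the values used in the experiments.

    // Escape-time calculation for one pixel of the Julia set z -> z^2 + c.
    // (x, y) is the pixel's coordinate in the complex plane; p and q are the
    // real and imaginary parts of c.  MAX_ITER is the fixed upper bound on the
    // work per pixel mentioned earlier; |z| > 2 is a common escape test.
    const int MAX_ITER = 256;

    int escape_iterations(double x, double y, double p, double q)
    {
        for (int k = 0; k < MAX_ITER; k++) {
            if (x * x + y * y > 4.0)
                return k;                        // escaped: k selects the colour
            double xnext = x * x - y * y + p;    // x[k+1] = x[k]^2 - y[k]^2 + p
            double ynext = 2.0 * x * y + q;      // y[k+1] = 2 x[k] y[k] + q
            x = xnext;
            y = ynext;
        }
        return MAX_ITER;                         // never escaped within the bound
    }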

Method

As the calculation of each of the pixels can be done independently, several parallel execution strategies are possible. However, a condition that was imposed was that the resulting image had to be displayable as it was being calculated, to provide a form of program visualisation. That is to say, the progress of the program could be monitored by looking at the current state of completion of the displayed image. In practice this meant that the image had to be sent a row at a time to the display device, to reduce the amount of data being transferred between the host and the device and therefore the latency in the time taken to display the image. Since the amount of work required to compute a row could vary dramatically from row to row, and this work could be non-trivial, it was decided to partition each row equally amongst the threads, so that, ideally, the time to compute a row with t threads is Trow/t. The form of a worker thread was as follows

    for (int row = 0; row < N; row++) {
        for (int column = lowerBound; column < upperBound; column++) {
            // calculate value for pixel X[row][column]
        }
    }

A gold parallel program and a shared value based program were written following the above outline. No sequential program was written for this problem as the pixels were displayed upon their calculation by a separate display thread. (When the experimental timings were carried out the display thread performed no operations, so that the timings were not influenced by external factors.) The gold program needed some synchronisation code so that the display thread could tell when a row of pixels had been completed; no extra synchronisation code was needed for the shared value program.
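The kind of synchronisation the gold program required might look like the following sketch, in which worker threads count the pixels completed in each row and the display thread waits for a full row before sending it to the display device. The names and structure are assumptions, as the thesis does not give this code, and the shared value version needed nothing comparable.

    #include <condition_variable>
    #include <mutex>
    #include <vector>

    // Hypothetical row-completion bookkeeping for the gold program.  Workers
    // call pixel_finished() after computing each pixel; the display thread
    // calls wait_for_row() before sending a row to the display device.
    struct RowProgress {
        int rowWidth;
        std::vector<int> done;                 // pixels completed in each row
        std::mutex m;
        std::condition_variable rowComplete;

        RowProgress(int rows, int width) : rowWidth(width), done(rows, 0) {}

        void pixel_finished(int row)           // called by worker threads
        {
            std::lock_guard<std::mutex> guard(m);
            if (++done[row] == rowWidth)
                rowComplete.notify_all();
        }

        void wait_for_row(int row)             // called by the display thread
        {
            std::unique_lock<std::mutex> guard(m);
            rowComplete.wait(guard, [&] { return done[row] == rowWidth; });
        }
    };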

Observations and Conclusions

One of the features that differentiates this scenario from its predecessors in this chapter is that the number of calculations required to compute a result is not fixed for a given problem size. Alterations to the parameters of the problem can significantly alter the work involved in computing an image so to ensure a representative study, two distinctly different data sets were examined. The first image (A) appears as a classic Julia set image, but the second image (B), appears as a series of coloured bands (see Appendix A). The characteristic times to compute each Julia set (A and B) on the 520 system are shown in Figure 6.7. As before the times for the sequential, gold parallel, and shared value runs are illustrated using the maximally parallel cases for the latter two programs. (Note that the gold parallel and shared value curves are almost coincident.) Even though the two images are completely different, the characteristic work curves for each image have the same shape, but with smaller times for the second image. Similar graphs were obtained for the 320 system though these are not shown for brevity.

For Julia set A, the speedups obtained from parallel programs using the static block iterator for work partitioning, while useful, were rather disappointing given the highly parallel nature of the problem. This stemmed from an unequal distribution of work over the parallel threads, due to the inability of the allocation strategy to adapt to the different work concentrations in the fractal. In Figure 6.8 the speedup map for the shared value program is displayed. One point to note is that the speedup is actually slightly less with three processes than with two processes, and this poor performance is repeated for odd numbers of processes until the number of processes becomes large. The reason for this is that when the fractal is divided amongst the threads, if there is an odd number of threads, then the middle thread ends up executing the largest amount of work because of the form of the particular fractal image. When an even number of threads is used, the most computationally intensive part of the fractal is split over two threads, giving a better overall work distribution. Nevertheless, the speedups shown on the map are quite poor, being around half the maximum theoretical values. Other allocation strategies were tested and the one which came out best was the hunk iterator.


[Figure 6.7 - Julia sets (A and B images): 520 system. Two graphs plot time in seconds against image size (100-1,000) for the true sequential, gold parallel (14) and shared value (14) programs, for Julia set A and Julia set B respectively.]


Figure 6.9 shows the speedup map for a shared value program using this iterator; the speedups are very respectable across the board.

For Julia set B, the work distribution is quite regular, so it was possible to get very good speedups using only the block iterator. Figure 6.10 shows the parallelism map for a shared value program computing this fractal. The speedups indicated are much better than those for Julia set A using the same iterator.

Figure 6.11 compares the relative performances of the gold parallel program against a shared value program using two different iterators on Julia set A. In the case of the block iterator the shared value program performs quite well, only incurring small additional overheads. However, merely by using a different iterator (and no other code changes) runs of the program using the hunk iterator yielded much better parallel results.

An optimisation that could have been made in the computation of a Julia set was to take advantage of the symmetry of an image. This would have meant that only 50% of the original computation would have been necessary, with the remaining half of the image being produced by a rotation. As it happens, with the entire image being computed each time, a form of symmetric load balancing can sometimes take place. It occurs when a thread starts by computing a block of high (or low) time complexity, say B, while another thread at the opposite end of the image computes a block of correspondingly low (or high) time complexity, say B*. After completing its block the first thread then computes a block of complexity B*, and likewise the other thread computes a block of complexity B after completing its initial block. Hence, the total computation time for the first thread is B + B*, which equals the time for the other thread's computation, B* + B. This behaviour became clear in the results for the block iterator, as superior speedups were obtained for runs with even numbers of threads over those with odd numbers (as mentioned earlier).

6.4 Summary

As might have been expected from the discussion of grain size in chapter three, shared memory multiprocessors need quite a large grain size of parallel computation before appreciable gains can be realised from parallel processing. This is true irrespective of whether shared values are used as a parallel processing mechanism or whether hand coded parallelism constructs are used instead.

Figure 6.8 - SV (Block) Fractal A.


Figure 6.9 - SV (Hunk) Fractal A.



Figure 6.10 - SV (Block) Fractal B.



    System   Gold Parallel            Shared Value (Block)     Shared Value (Hunk)
             1-parallel  N-parallel   1-parallel  N-parallel   1-parallel  N-parallel
    320      100%        100%         102%        104%         104%        65%
    520      100%        100%         104%        104%         104%        58%

Figure 6.11 - Scaled relative times for fractal (A).

From the viewpoint of ease of use, shared values are a tidy method for coding the simple programming scenarios explored in this chapter. Their use is much preferable to using explicit constructs which require detailed specification and make resulting source programs less readable as the real code that solves the problem is obscured by the code used to drive the parallelism. In addition, the wide range of iterators provided for use with shared values allows a variety of work allocation policies to be evaluated by only trivially recoding a test program.

In many cases the extra overhead of using shared values is acceptable for problems that have sufficient work to enable efficient parallel execution on shared memory multiprocessors. From the supplementary tests carried out with matrix additions and multiplications, however, it was noted that unconstrained use of shared values could lead to parallel programs that had quite large overheads due to unnecessary synchronisation. Two solutions to this problem are possible. Firstly, programmers could be urged to be more careful in the use of shared values, so as not to introduce unnecessary synchronisation and serialisation. This line of argument is only a partial solution, however, as programmers cannot be expected to shoulder all the burden for parallel programming as this could mean programmers spending too much time arranging for efficient data sharing and not enough on the algorithm itself. Hence, the second method of reducing synchronisation overhead is for a compiler or preprocessor to simply eliminate shared value synchronisation where it is not needed.


Chapter 7 Conclusions

From the outset the aims of this thesis have been to examine why the parallelism that is offered by shared memory multiprocessors has not been effectively exploited by mainstream computing and to suggest ways of ameliorating this situation. In the first part of this thesis evidence was presented to argue that multiprocessors are an economically viable architecture. Moreover, it was suggested that mainstream computing should be interested in multiprocessors because they hold the promise of high performance and high reliability for relatively low cost.

The next part of the thesis went on to describe the types of multiprocessor architecture that have been constructed and their operating characteristics, as the SIMD approach to parallelism at first seems to be very different from the MIMD approach. Having defined the base parallel architectures, an extensive survey of the software used to program these systems was presented. This was divided into two sections covering both the system software which operates the hardware, and the user software which uses the hardware (via the system software) to execute applications. One important point that comes out of this survey is that good system software is a key component in effective multiprocessing. This is because good system software can simplify the job of implementing and executing user level parallel software by simulating an abstract multiprocessor that is easier to use than the underlying hardware. Other researchers have recognised this process of defining a hierarchy of virtual machines so this observation is not too surprising [Tane87]. The other important point that comes from the survey is that there are essentially three main approaches to exploiting parallelism:

• In auto-parallelism, software tools are used to generate parallel programs from imperative sequential programs (e.g. "dusty decks") either by using a parallelising compiler or through the use of some parallel program construction tools.

• In natural parallelism, an abstract (and possibly parallel) model of computation is used which is completely divorced from any underlying hardware details. Parallelism is extracted and implemented implicitly by an interpreter or compiler.


• In explicit parallelism, specially designed parallelism constructs specify precise details about parallel execution, possibly making use of specific hardware capabilities (e.g. test-and-set operations).

Thus, there is a wide range of approaches available for parallel programming, but the question that arises is which approach is the best and why? Clearly, there is no short answer to this question because various degrees of success have been achieved using each approach. Nevertheless, one can outline briefly the problems that each approach faces.

To produce an effective auto-parallelising compiler, one must restrict the semantic capabilities of source language constructs to minimise the effects of phenomena such as aliasing, which can ultimately prevent a compiler from constructing a model of the inter-statement dependencies within a program. In practice, source programs are written in dialects of primitive languages such as Fortran, splitting off this form of language research from more mainstream work (e.g. object-oriented techniques). While this may not be perceived as a serious disadvantage there is another problem. Basically, the effectiveness of the parallelisation techniques employed by compilers is dependent upon the amount of recoverable parallelism in a source program and how that parallelism can be mapped to the underlying hardware. In some cases these tasks are not easy to accomplish automatically, so some interaction with users is required. Interactions can sometimes be simple for a user, however, on other occasions detailed working knowledge of the operating system and the underlying hardware may be needed to solve a problem. Thus, auto-parallelisation is not a panacea for parallel programming because while it can lead to some effective portable parallel software, it can also have a stifling effect on the development of programming languages and can require expert programming which belies its boast of automatic parallel processing.

In natural parallelism, abstract models of computation are used for programming, which stimulates programming language development and redresses one of the problems of auto-parallelisation. However, the problem of mapping potential parallelism to underlying hardware still remains and even takes on a more disturbing form. As there is little linkage between source programs and their underlying parallel hardware, it can be difficult to control and monitor the parallel execution of a program without adding a few extra constructs or commands, or by providing some software tools. Even so, it may turn out that the effective execution of a particular programming model cannot be accomplished on a given architecture because of the top-down nature of this route to parallelism. Thus, the problem with this route to parallelism is that it is not easy to select the best model (because best cannot easily be defined) from the many abstract programming models, and then derive an implementation for this model that is very efficient for shared memory multiprocessors.

Thus the stage is set for the explicit route to parallelism. Here research commenced by developing constructs that operated at the basic hardware level (e.g. test-and-set) but rapidly started to evolve constructs that did not reference specific machine features and were instead more abstract and portable; though not with too great a loss in efficiency. Unfortunately, the case for adopting the explicit route to parallelism is not cut and dried because explicit parallel programming has, in some cases, deservedly attracted a reputation for being complicated, with programmers being forced to carry out lots of work managing details such as data access control and control flow. Moreover, controversy has raged about which of the information sharing paradigms of shared memory or message passing is the better. Fortunately, as advances have been made in both hardware and software technologies it is now possible to hide away many of the underlying hardware characteristics, and blur the distinction between physically and virtually shared memory. Hence, this bottom-up approach to parallelism was the one followed in this thesis because it allows for the evolution of parallel programming constructs towards the goals set by natural parallelism, but at the same time does not lose sight of what is practical on a given hardware base. Of course, there are other important factors in the choice of programming approach, such as ease of use, so these issues were tackled in the next part of the thesis in chapter three.

When designing a parallel program that is expected to execute efficiently in parallel on a multiprocessor, it seems reasonable to work with a parallel or potentially parallel algorithm. The way in which these algorithms are designed and expressed is influenced to some degree by the final type of programming model. In a naturally parallel model, one can design algorithms that can be executed in parallel, but it may be difficult to accomplish this without the aid of some programming paradigms. In an explicit programming model, help is available in the form of programming constructs as well as through the use of parallelism paradigms. Hence, by using an explicit approach to parallelism a programmer can purposefully direct his programming efforts towards producing effective parallel software.


To state clearly then, the problems addressed by this thesis are those that are responsible for the current lack of effective parallel software for shared memory multiprocessors. These problems are:

• programs have not been designed to follow suitable parallel algorithms,
• there is no clear choice for the best parallel language (or style of language) to implement such algorithms,
• it has been difficult to efficiently match algorithms and parallel programming techniques to the underlying parallel hardware.

It is easy to lay the blame for the lack of effective parallel software on these three factors, but to find a remedy one must examine these issues more carefully.

To produce a parallel program, one must arrive at an algorithm that can be partitioned such that the separate parts can be executed by separate processors. The partitioning does not necessarily have to be explicit, if the inclusion of parallelism constructs detracts from the clarity of the algorithm, but in some cases explicit partitioning is useful for modelling concurrent behaviour. To partition an algorithm two broadly based techniques have emerged (though some algorithms exploit both techniques simultaneously): (i) the functional aspects of an algorithm are partitioned, and (ii) the data values operated on by an algorithm are partitioned.

Once an algorithm has been partitioned it can be executed in parallel, but there is no guarantee that any performance benefit will be derived from this; other factors govern its efficient parallel execution. Experimental work in chapter six confirms the belief that the grain size of the parallelism exploited by a parallel program is a key factor. In addition, algorithmic factors such as intercomputation dependency, and hardware related factors such as patterns of memory usage, are also important. Hence, a parallel processing mechanism must support the expression of a parallel algorithm that is appropriate for the application being programmed and must also make provision for such an algorithm to be executed efficiently. Furthermore, when a program is partitioned and executed in parallel, it is desirable that many operations execute simultaneously so that a large proportion of the useful parallelism is utilised. However, with many operations executing in parallel the complexity of a program can grow dramatically, making it difficult to write, debug and monitor. Hence, a parallel processing mechanism must also address these factors of usability as well as issues of expressiveness.


To this end the Tyger parallel programming model, discussed in chapter four, was envisaged as being part of some larger software environment that consisted of an operating system, a compiler, and a set of tools for program design and monitoring. Moreover, the Tyger model and program design tools were aimed at promoting a particular style of parallel program design, with emphasis on regularity and modularity (e.g. that commonly found in data partitioning). This approach is somewhat similar to a large grain SIMD model of parallelism, though the intention was also to support MIMD computation in the form of functional partitioning. Undoubtedly, some applications may give rise to intricate patterns of interaction between concurrent threads of control, but in these cases the complexities and subsequent programming difficulties originate from the algorithms themselves. Therefore, the best that an explicit parallel programming mechanism can do is not to make a programmer's task any harder by making its facilities difficult to use.

The Tyger model presents a programmer with a simple, fixed view of parallel computation in which a program consists of a sequence of alternating intervals (stripes) of sequential or parallel activity. Inside a parallelism stripe, threads of control execute in a shared address space, communicating and synchronising by using shared values. Shared values themselves are a relatively simple mechanism, which encourages programmers to keep the interactions between threads of control simple and regular, and so helps to manage the complexity inherent in parallel activity. The Tyger model is reasonably easy to visualise but is also powerful enough to express arbitrarily complex forms of parallel activity. Hence it gives a programmer a starting point from which to design a parallel program. Either the program will execute largely in a single parallelism stripe where threads of control are spatially distributed, or it will consist of a series of disjoint parallelism stripes where threads of control are temporally distributed. Shared values permit the communication of data between threads of control in each case, firstly by exploiting single assignment semantics and secondly by the use of a history facility.
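To make the stripe structure concrete, the fragment below is a minimal sketch adapted from the Tyger-C++ listings in Appendix B. The names SV, SVTYPE, PAR, Clones and DoAll, and the list parameter block, are taken from those listings and are provided by the Tyger run time rather than defined here; the loop bound convention simply follows that of the appendix.

    #include "sv.h"                    // shared value classes from the Tyger-C++ run time (Appendix B)

    PAR fill(struct list* prms)        // work performed by each clone thread inside the parallelism stripe
    {
        int lower = prms->index;       // first element this clone is responsible for
        int upper = prms->ceiling;     // bound of this clone's section (convention as in Appendix B)
        SV& v = *(prms->host);         // the shared value vector being written

        for (int i = lower; i < upper; i++)
            v[i] = (SVTYPE) i;         // single assignment: each element is bound exactly once
    }

    int main()
    {
        const int n = 1024, procs = 4; // illustrative problem size and thread count

        SV v(n);                       // sequential stripe: create a shared value vector
        v.Clones(procs);               // request procs clone threads for the next stripe
        v.DoAll(0, n - 1, fill);       // parallelism stripe: clones fill disjoint sections of v

        // execution then returns to a sequential stripe; a read of v[i] would
        // block until that element had been bound inside the stripe above
        return 0;
    }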

Shared values are a double edged mechanism for parallel programming. On one hand, static shared values are a simple, neat and code-efficient way of expressing data parallel algorithms and simple functionally parallel algorithms. More complicated interactions between threads of control can be modelled by custom built data abstractions descended from static shared values, or by dynamic shared values. On the other hand, the implementation of shared values can take advantage of the underlying parallel hardware so that performance-efficient programs can be written. For instance, with the single assignment property maximal locality of reference can be exploited, as local copies of variables can be maintained in cache memories. Moreover, contention for shared memory can be minimised by working out of a cache memory, so obviating the need to always fetch the current value of a shared variable from shared memory. Further to this, shared values raise the level of memory abstraction in programs by decoupling references to variables from actual physically shared memory and replacing them with references to shared value objects such as static shared values, dynamic shared values, and higher level objects like streams. Hence, this raises the possibility that shared values could also be implemented efficiently on a distributed memory multiprocessor, with the cache memories mentioned before being replaced by local memories.

In chapter six, an implementation of static shared values was measured against lower level parallel programming facilities to gauge the efficiency of shared values. For certain types of problem, shared values and their iterators are a good mechanism for parallel programming, leading to clear programs that incur only slight overheads compared with more explicit approaches. It was also noted, though, that the use of shared values could prove costly, as in the case of matrix multiplication where the source matrices were only shared in a parallelism stripe and never written. Hopefully, difficulties such as these can be overcome by the use of optimisation techniques in source program analysis, or perhaps even by hardware solutions at the level of the cache or local memory.

7.1 Future Work

Although the Tyger model is capable of expressing many forms of interaction between parallel threads of control there remain several areas where additional research could be usefully undertaken. These areas include:

• new control constructs,
• an extended Tyger model,
• a Tyger-C++ compiler and accompanying software tools.

Shared value semantics dictate that when a thread tries to read a shared value it will be blocked until a value is made available. Using a non-blocking strategy has long been recognised as useful in message passing as, amongst other things, a way of increasing the concurrency in programs. More specifically, if a value is not immediately available for use by a thread, that thread may be able to look elsewhere for another value it can use, or it may perform some other useful work until the requested value is made available. Non-blocking behaviour could be incorporated into the Tyger model by the introduction of guarded commands, where the guard tests are non-blocking. Using the guarded command mechanism obviates the need for special non-blocking versions of the existing shared value constructs and side steps many of the problems of using non-blocking primitives (e.g. their ability to give rise to race conditions).
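As an illustration only, the fragment below sketches how such a guarded construct might look in Tyger-C++. The non-blocking test Ready() and the helper do_other_useful_work() are hypothetical and do not exist in the implementation described in this thesis; they simply show how a guard can be evaluated without committing the thread to a blocking read.

    #include "sv.h"                     // shared value classes, as in Appendix B

    void do_other_useful_work();        // hypothetical: any computation that does not need a or b

    // Hypothetical guarded read over two shared value vectors: the guards are
    // non-blocking, so the subsequent reads are guaranteed not to suspend the thread.
    SVTYPE consume(SV& a, SV& b, int i)
    {
        for (;;)
        {
            if (a.Ready(i))             // guard: true only if a[i] has already been bound
                return a[i];            // cannot block, since the guard succeeded
            if (b.Ready(i))             // alternative guard on a second shared value
                return b[i];
            do_other_useful_work();     // no guard open yet: overlap the wait with other work
        }
    }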

The capabilities for parallel programming provided by static and dynamic shared values allow the natural expression of many parallel algorithms. However, there is a class of methods that is not supported particularly well by either shared value mechanism. Such methods involve parallelising algorithms that perform in-place manipulation of data, in which a final result data structure is obtained by performing a series of in-place transformations on some or all of the original data structure. For example, in linear algebra an algorithm may conceptually partition a matrix into submatrices and operate separately on each submatrix, before using the entire matrix in a completely different way in a later process. The most awkward algorithms of this type to handle are those that are expressed recursively (e.g. Strassen's method for matrix multiplication extended to rectangular matrices [High89] and symbolic applications such as quicksort [Hoar62]). Here, only limited parallelism can be applied initially at the top level of the recursion, constrained by the number of top level recursive calls. As the recursion develops, however, there is scope for farming out recursive calls to separate threads. Nevertheless, using static or even dynamic shared values can lead to much copying of data when each new thread is started and when the results are collected. Not surprisingly, the program code to perform this data movement can look rather unwieldy as well. The key observation that eases the efficient implementation of in-place methods is that threads of control often need exclusive access to contiguous and logically separate parts of a data structure in order to perform their transformations. A brief outline of a mechanism that could be used to support in-place transformations now follows.

When a thread of control is started it is given ownership of the static shared values it ultimately wishes to write. During the course of its execution a thread may write many times to a shared value that it owns, in order to perform in-place transformations. When a thread terminates it relinquishes the ownership of its shared values, which then become unowned. If a thread wishes to read the value of a shared value, it is suspended until that shared value is no longer owned by another thread and has been bound to a value, i.e. until there is no possibility of a thread waiting to write to it. Hence, to an observing thread that reads a shared value (previously owned by another thread) there is only ever one assignment made to the shared value - its final value.

The shared value ownership scheme could also be used to extend the expressive power of the Tyger model by enabling child threads in a parallelism stripe to create their own threads of control. When a new thread of control is created during an existing parallelism stripe, it executes in the same stripe as its parent thread of control and its parent's peer threads. When a thread creates a new thread of control, it can pass on the ownership of the shared values that it owns to the child, or it can terminate and lose that ownership as before. Thus, while a thread has ownership of a shared value it can write to the shared value as desired. Alternatively, a thread can create a child thread and selectively transfer ownership of its shared values to the child, for the child to perform further write operations. The transfer of value ownership should be regarded as an indivisible operation, and furthermore, it should not be possible for multiple threads to own the same shared value simultaneously. The idea of shared value ownership is essentially an application of the atomic actions described in chapter two.
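A minimal sketch of what such an ownership interface might look like is given below. The operations Relinquish() and Transfer(), and the helper transform_row(), are hypothetical; they are not part of the Tyger-C++ implementation and merely make the scheme outlined above concrete for the matrix example mentioned earlier.

    #include "sv.h"                          // shared value classes, as in Appendix B

    void transform_row(SV& m, int r);        // hypothetical in-place transformation of row r

    PAR worker(struct list* prms)            // this thread is assumed to have been given ownership
    {                                        // of rows prms->index .. prms->ceiling of the matrix
        SV& m = *(prms->host);

        for (int r = prms->index; r < prms->ceiling; r++)
            transform_row(m, r);             // repeated writes are permitted while the rows are owned

        for (int r = prms->index; r < prms->ceiling; r++)
            m.Relinquish(r);                 // hypothetical: mark the rows unowned and bound,
                                             // releasing any threads suspended on a read
        // alternatively, a hypothetical m.Transfer(child, r) could hand a row
        // to a newly created child thread for further in-place writes
    }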

As was mentioned in chapter five, although a large proportion of the Tyger model can be implemented at run time, several performance benefits can be gained by carrying out as much work as possible at compile time. The future implementation of a Tyger compiler would be a boon to the production of high quality code, though it is dispensable when working in a research environment. In chapter four, a brief overview was presented of other software tools that would be beneficial to programmers working with the Tyger model. Future work could be devoted to implementing such tools, which would be most useful in the development of large pieces of parallel software.

7.2 Closing Summary

The original goal of this thesis was to discover ways in which shared memory multiprocessors could deliver the power of parallel processing to users. The conclusion of this work is that the effective exploitation of parallelism is dependent on several factors, the most important being a notation to express parallel activity. To this end a model of programming, the Tyger model, was developed to enable the specification of parallel activity in a precise yet uncluttered way. The Tyger model builds on the notion of there being some shared name space within a parallel program, but separates this idea from the underlying memory model.


Moreover, the Tyger model emphasises locality of effect, to reduce interthread dependencies and also to retain efficiency by operating out of fast memory when possible. This decision is fortunate in that emerging and future multiprocessors will probably be built along the lines of hybrid multiprocessors (distributed memory simulating shared memory), because of the rapid increase in the ratio of processor to communication speed.

Many of the ideas put forward in the Tyger model are in use in other research efforts into parallel programming. The idea of using single assignment variables for communication and synchronisation has been taken up by many languages in both the imperative and declarative fields. For example, PCN [Chan90] uses the single assignment rule when writing to shared data but allows multiple assignments to local data. In addition, tuple based communication and synchronisation has been around for a number of years (e.g. Linda), but research is still ongoing. Distributed data structures have also been recognised as an important parallelism metaphor, and a new language has been developed to support them [Zeni90]. Finally, at one extreme, one research language has three separate types of parallel programming mechanism to support different programming styles. These mechanisms are: (i) single assignment variables, (ii) tuple communication, and (iii) object based concurrency. Thus, it seems that no single definitive method of parallel programming has yet emerged.

The view of control flow during the parallel execution of a program developed under the Tyger model is similar to that in other parallel imperative languages. For instance, the idea of sticking to a fixed number of threads of control that are reused during the execution of a program is exploited in the Force [Jako90] parallel programming language and in the successor to Fortran 8X, Fortran 90 [Lytt90]. Moreover, Fortran 90 contains many synchronisation constructs, such as events, to enable the expression of crude dataflow synchronisation. The Tyger model approaches parallel programming in much the same way as these languages but does not suffer from their main disadvantage. More specifically, the Tyger model is not locked into a particular language (e.g. Fortran) and is therefore free to be implemented in much higher level languages to give superior levels of abstraction and ease of use. This leads to more readable, but not necessarily less efficient, source program code.

When research was directed away from using C as an implementation language for the shared value constructs towards C++, the advantages of using carefully chosen data abstractions became clear. The immediate advantage was that data abstraction could be used to hide implementation details and so help combat the complexity associated with parallel programming. Hence, a programmer could design an algorithm, code it using shared values, and not be too concerned about its efficiency compared with using more primitive parallel programming methods. This came about because the synchronisation built into shared values could be directly mapped onto efficient low level synchronisation constructs (for shared memory multiprocessors) by the Tyger run time system. However, perhaps the major advantage of using data abstraction was that it allowed the concept of a shared value to be applied as a property of a given data structure. This allowed the construction of data structures with type-specific synchronisation properties. For example, instead of waiting for every element in each row of a matrix, a thread can wait for an entire row of the matrix. This has the twofold effect of making such programs more understandable, and of making them more efficient, as synchronisation is applied at the row level (increasing the granularity) rather than at the element level. One point to note, however, is that a similar structured application of shared value synchronisation can also be achieved by the use of the shared value ownership scheme proposed in the previous section.
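The following declaration-level sketch suggests one way such a type-specific abstraction could be written. The class and its members are hypothetical and are not part of Tyger-C++; they simply make the row-granularity idea concrete.

    #include "sv.h"                          // SVTYPE as in the Appendix B listings

    // Hypothetical matrix abstraction in which shared value synchronisation is
    // attached to whole rows rather than to individual elements.
    class SharedRowMatrix
    {
    public:
        SharedRowMatrix(int rows, int cols);
        void          BindRow(int r, const SVTYPE* values);  // single assignment of an entire row
        const SVTYPE* Row(int r);                            // blocks until row r has been bound
    private:
        int rows_, cols_;
        // per-row shared values would live here in a real implementation
    };

    // A consumer thread then synchronises once per row:
    //     const SVTYPE* row = m.Row(i);   // one wait, at row granularity
    // rather than waiting on each of the cols elements of row i in turn.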

Thus, shared values seem to be a useful mechanism for parallel programming as they retain the notion, and the efficiency, of information sharing, while liberating programmers from an actual need to use physically shared memory. Shared value constructs such as iterators are a good way of coding parallelism in the form of data partitioning and simple functional partitioning. Experimentation with object-oriented techniques has given Tyger-C++ the capability to define new data objects that possess inherent shared value semantics, giving added flexibility to the shared value method of interthread communication and synchronisation. In addition, the Tyger model itself can be seen as a way of again extending the power of shared values by allowing them to be reused over the course of a parallel program, but the model also serves as a useful framework for parallel program design. So, while the last word on shared values may not have been said, the existing ideas have proved to be an excellent starting point for further parallel programming research.


Bibliography

[Acce86] Accetta, M., R. Baron, W. Bolosky, D. Golub, R. Rashid, A.Tevanian, and M. Young, "Mach: A New Kernel Foundation of UNIX Development," Summer Usenix Conference Proceedings, pp. 93-112,1986. [Agha86] Agha, G.A., ACTORS; A Model of Concurrent Computation in Distributed Systems, The MIT Press, Cambridge, Massachusetts, 1986. [Ah074] Aho, A.V., J.E. Hopcroft, and J.D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, 1974. [Ah086] Aho, A.V., R. Sethi, and J.D. Ullman, Compilers; Principles, Techniques and Tools, Addison-Wesley, 1986. [Ahuj86] Ahl\ia, S.R., N.J. Carriero, D. Gelernter, and V. Krishnaswamy, "Progress Towards a Linda Machine," Proceedings Conference on Computer Design, pp. 97-101, October 1986. [Alle88] Allen, F.E., "Compiling for Parallelism," Proceedings 1986 IBM Europe Institute course on Parallel Processing, North-Holland, 1988. [Alle84] Allen, J.R. and K. Kennedy, "PSC: A Program to Convert FORTRAN to Parallel Form," Supercomputers; Design and Application, IEEE Computer Science Press, 1984. [Alli86] FXISeries Product Summary, Alliant, October 1986. [Alma89] Almasi, G.S. and A. Gottlieb, Highly , Benjamin Cummings, 1989. [Alme85] Almes, G.T., A.P. Black, E.D. Lazowska, and J.D. Noe, "The Eden System: A Technical Review," IEEE Transactions on Software Engineering, pp. 43-59, January 1985. [Amda67] Amdahl, G.M., "Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities.," AFIPS Conference Proceedings, vol. 30, pp. 483-487,1967. [Amer87] America, P., "POOL-T: A parallel object-oriented language.," Object-Oriented Concurrent Programming, pp. 199-220, The MIT Press, 1987. [Andl77] Andler, S., Synchronization Primitives and the Verification of Concurrent Programs, Dept. of Computer Science, Carnegie-Mellon University, Pittsburgh, May 1977. [AndI79] Andler, S., "Predicate Path Expressions," Proc. 6th ACM Symposium on Principles of Programming Languages, pp. 226-236, San Antonio, January 1979. [Andr76] Andrews, G.R. and J.R. McGraw, "Language Features for Process Interactions," Report TR-76-290, Cornell University, Department of Computer Science, 1976. [Andr82al Andrews, G.R., "The distributed programming language SR - mechanisms, design and implementation," Software -- Practice and Experience, vol. 12, no. 8, pp. 719-754, August 1982. [Andr82bl Andrews, G.R. and Fred B. Schneider, "Concepts and Notations for Concurrent Programming," Tech. Rep. TR 82-150, Cornell University, September 1982. [AraI88] Aral, Z. and I. Gertner, "Non-Intrusive and Interactive Profiling in Parasight," Proceedings of ACMISIGPLAN PPEALS, pp. 21-30, New Haven, Connecticut, July 1988. [Arch86] Archibald, J. and J.L. Baer, " Protocols: Evaluation Using a Multiprocessor Simulation Model," ACM Transactions on Computer Systems, vol. 4, no. 4, pp. 273-298, November 1986. [Arvi80] Arvind, V. Kathail, and K. Pingali, "A Data Flow Architecture with Tagged Tokens," Technical Memo 174, Laboratory for Computer Science, MIT, September 1980.


[Arvi83] Arvind, and R.A. Iannucci, "A Critique of Multiprocessing von Neumann Style,"- Proceedings 10th International Symposium on Computer Architecture, pp. 426-436, Stockholm, 1983. [Arvi88] Arvind, and K. Ekanadham, "Future Scientific Programming on Parallel Machines," Journal ofParallel and , vol. 5, no. 5, pp. 460-493, October 88. [Atki77] Atkinson, R. and C. Hewitt, "Synchronization in Actor Systems," 4th SIGPLAN­ SIGACT Symposium on Principles of Programming Languages, pp. 267-280, January 1977. [Baer68] Baer, J.E. and D.P. Bovet, "Compilation of Arithmetic Expressions for Parallel Computations," Proceedings ofIFIP Congress, pp. 340-246, 1968. [Ba188] Bal, H.E. and A.S. Tanenbaum, "Distributed Programming with shared data," Proceedings of the IEEE CS International Conference on Computer Languages, pp. 82- 91,1988. [Ba189] Bal, H.E., J.G. Steiner, and A.S. Tanenbaum, "Programming Languages for Distributed Computing Systems," ACM Computing Surveys, vol. 21, no. 3, pp. 261- 322, September 1989. [Bald87a] Baldwin, D., "Why We Can't Program Multiprocessors the Way We're Trying to Do It Now," Technical Report 224, University of Rochester, August 1987. [Bald87b] Baldwin, D. and C. Quiroz, "Parallel Programming and the CONSUL Language," Proceedings of the 1987 International Conference on Parallel Processing, 1987. [Balz71] Balzer, R.M., "PORTS: a method for dynamic interprogram communication and job control," Proc. AFIPS SJCC Computer Conf" vol. 39, pp. 485-489, 1971. [Bane88] Banerjee, U., Dependence Analysis for Supercomputing, Kluwer Academic Publishers, 1988. [Barn68] Barnes, G.H. et. al., "The Illiac IV Computer," IEEE Trans. on Computers, vol. C-17, pp. 746-757, 1968. [Batc80] Batcher, K.E., "Design of a Massively Parallel Processor (MPP)," IEEE Transactions on Computers, vol. C-29, pp. 836-840,1980. [Bate88] Bates, P., "Debugging heterogeneous distributed systems using event-based models of behaviour," Proceedings of Workshop on Parallel and Distributed Debugging, pp. 11- 22, ACM, 1988. [Beet87] Beetem, J., M. Denneau, and D. Weingarten, "The GF11 Parallel Computer," Experimental Parallel Computing Architectures, North-Holland, Amsterdam, 1987. [Bent86] Bently, J., Programming Pearls, Addison-Wesley, 1986. [Bern88] Bernstein, D. and K. So, "PREFACE - A Fortran Preprocessor for Parallel Workstation Systems," IBM ResearchRC 136000, March 1988. [Bern89] Bernstein, D., A. Bolmarcich,' and K. So, "Performance Visualization of Parallel Programs on a Shared Memory Multiprocessor System," International Conference on Parallel Processing, vol. 2, pp. 1-10,1989. [Bers88] Bershad, B.N., E.D. Lazowska, and H.M. Levy, "PRESTO: A System for Object­ Oriented Parallel Programming," Technical Report .87-09-01, Department of Computer Science, University of Washington, January 1988. [Bert89] Berten, M.S. and H.F. Jordan, "Multiprogramming and the Performance of Parallel Programs," Parallel Processing For Scientific Computing, pp. 374-383, SIAM, 1989. [Birm85] Birman, K., "Replication and Fault Tolerance in the ISIS System," Proceedings of the Tenth ACM Symposium on Operating Systems Principles, pp. 79-86, December 1985.


[Bisi88] Bisiani, R and A. Forin, "Multilanguage Parallel Programming of Heterogeneous Machines," IEEE Transactions on Computers,vol. 37, no. 8, August 1988. [Blo079] Bloom, T., "Synchronization mechanisms for modular programming languages," MSc. dissertation, MITILCSITR-211, Laboratory for Computer Science, Massachusetts Institute of Technology, January 1979. [Bor082] Borodin, A. and J.E. Hopcroft, "Routing, Merging, and Sorting'on Parallel Models of Computation," Proceedings Fourteenth ACM Symposium on Theory of Computing, pp. 338-344, 1982. [Breu88] Breuel, T.M., "Data Level Parallel Programming in C++," Proceedings USENIX C++ Workshop, pp. 153-167,1988. [Brew88] Brewer, 0., J. Dongarra, and D. Sorensen, "Tools to Aid in the Analysis of Memory Access Patterns for Fortran Programs," Technical Memorandum, Mathematical and Computer Science Division, Argonne National Laboratory, June 1988. [Brin72] Brinch Hansen, P., "Structured Multiprogramming," Comm. ACM, vol. 15, no. 7, pp. 574-578, July 1972. [Brin73a] Brinch Hansen, P., Operating Systems Principles, Prentice-Hall, New Jersey, 1973. [Brin73b] Brinch Hansen, P., "Concurrent programming concepts," ACM Computing Surveys, vol. 5, no. 4, pp. 223-245, Dec. 1973. [Brin75] Brinch Hansen, P., "The programming language Concurrent Pascal," IEEE Trans. on Software Engineering, vol. SE-l, no. 2, pp. 199-206, June 1975. [Brin78] Brinch Hansen, P., "Distributed processes: A concurrent programming concept," Comm. ACM, vol. 21, no. 11, pp. 934-941, Nov. 1978. [Brin81] Brinch Hansen, P., "Edison: a multiprocessor language," Software -- Practice and Experience, vol. 11, no. 4, pp. 325-361, April 1981. [Brod81] Brode, B., "Precompilation of Fortran Programs to Facilitate Array Processing," Computer, vol. 14, pp. 46-51, September 1981. [Camp74] Campbell, RH. and A.N. Habermann, "The specification of process synchronization by path expressions," Lecture Notes in Computer Science, vol. 16, pp. 89-102, Springer­ Verlag, Heidelberg, 1974. [Camp76] Campbell, R.H., "Path Expressions: a technique for specifying process synchronization," PhD. Dissertation, Computing Laboratory, University of Newcastle upon Tyne, August 1976. [Camp79] Campbell, RH. and R.B. Kolstad, Path Expressions in Pascal, Dept. Computer Science University of Illinois at Urbana, Urbana, Illinois, January 1979. [Carr85] Carriero, N. and D. Gelernter, "The SlNet's Linda Kernel," 10th ACM Symposium on Operating System Principles, December 1985. [Carr87] N.J. Carriero, "Implementing Tuple Space Machines," Research Report 567, vol. C-36, no. 12, pp. 1425-1439, Yale University, Department of Computer Science, December 1987. [Carr89] Carriero, N. and D. Gelernter, "How to Write Parallel Program: A Guide to the Perplexed," ACM Computing Surveys, vol. 21, no. 3, pp. 323-358, September 1989. [Chan90] Chandy, K.M., S. Taylor, C. Kesselman, and I. Foster, The Program Composition Project, February 1990. [Cher85] Cheriton, D.R and W. Zwaenepoel, "Distributed Process Groups in the V Kernel," TOCS, vol. 3, no. 2, pp. 77-107, May 1985. [Chin86] Ching, W-M., "Program analysis and code generation in an APLl370 compiler," IBM Journal of Research & Development, vol. 30, no. 6, pp. 594-602, 1986.


[Chin88] Ching, W-M. and A. Xu, "A Vector Code Back End of the APL370 Compiler on IBM 3090 and some Performance Comparisons," Proceedings of the APL-88 Conference, pp. 69-76, Sydney, Australia, February 1988. [Chur41] Church, A., "The Calculi of Lambda-Conversion," Annals of Mathematics Studies, no. 6, Princeton University Press, 1941. [Coff71] Coffman, E.G., M. Elphick, and A. Shoshani, "System Deadlocks ," Computing Surveys, vol. 3, no. 2, pp. 67-78, June 71. [Conw63a] Conway, M.E., "A multiprocessor system design," AFIPS Conf. Proc., vol. 24, pp. 139- 146, Spartan Books, Baltimore, 1963. [Conw63b] Conway, M.E., "The Design of a separable transition-diagram compiler," Comm. ACM, vol. 6, no. 7, pp. 396-408, July 1963. [Cook80] Cook, R.P., "*MOD - A language for distributed programming," IEEE Transactions on Software Engineering, vol. SE-6, no. 6, pp. 563-571, Nov. 1980. [Crow85] Crowther, W., J. Goodhue, R. Gurwitz, R. Rettberg, and R. Thomas, The Butterfly Parallel Processor"', IEEE Computer Architecture Technical Committee Newsletter, pp. 18-45, September-December 1985. [Cvet87] Cvetanovic, Z., "The Effects of Problem Partitioning, Allocation, and Granularity on the Performance of Multiple-Processor Systems," IEEE Transactions on Computers, vol. C-36, no. 4,pp. 421-432, April 1987. [Davi86] Davies, J., C. Hudson, T. Macke, B. Leasure, and M. Wolfe, "The KAP/S-1: An Advanced Source-to-Source Vectoriser for the S-l Mark IIa Supercomputer," Proceedings 1986 International Conference on Parallel Processing, pp. 833-835, IEEE Compo Soc. Press, 1986. [Deit84] Deitel, H.M., An Introduction to Operating Systems, Addison-Wesley, 1984. [Demm87] Demmel, J.W., J.J. Dongara, J.J. Du Croz, A. Greenbaum, S.J. Hammarling, and D.C. Sorensen, "Prospectus for the development of a linear algebra library for high­ performance computers," Technical Memorandum No 97, Mathematics and Computer Science Division, Argonne National Laboratory, Illinois, 1987. [Denn66] Dennis, J.B. and E.C. Van Horn, "Programming semantics for multiprogrammed computations," Comm. ACM, vol. 9, no. 3, pp. 143-155, March 1966. [Denn79] Dennis, J.B., C.K. Leung, and D.P. Misunas, "A Highly Parallel Processor Using a Data Flow Machine Language," csa Memo 134-1, Laboratory for Computer Science, MIT, June 1979. [DetI88] Detlefs, D.L., M.P. Herlihy, and J.M. Wing, "Inheritance of synchronisation and recovery properties in Avalon/C++," Computer, vol. 21, no. 12, pp. 57-69, December 1988. [Dijk65a] Dijkstra, E.W., Cooperating Sequential Processes, Academic Press, 1968, Also in Programming Languages, Academic Press, pp. 43-112, 1968. [Dijk65b] Dijkstra, E.W., "Solution of a Problem in Concurrent Programming Control," CACM, vol. 18, no. 5, pp. 341-346, May 1965. [Dijk68] Dijkstra, E.W., "The Structure of "THE" multiJ>rogramming system," Communications ACM, vol. 11, no. 5, pp. 341-346, 1968. [Dijk75] Dijkstra, E.W., "Guarded Commands, Non-determinacy, and Formal Derivation of Programs," Communications ofACM, vol. 18, no. 8, pp. 453-457, August 1975. [DoD80] Reference Manual for the Ada Programming Language, U.S. Dept. of Defence, July 1980.


[Doep87] Doeppner, T.W., "Threads: A System for the Support of Concurrent Programming," Technical Report CS-87 -11, Department of Computer Science Brown University, June 1987. [Dong85] Dongarra, J.J. and LS. Duff, "Advanced Computer Architectures," Mathematics and Computer Science Div. Report TM-57, Argonne National Laboratory, 1985. [Dong86] Dongara, J.J. and D.C. Sorenson, "Schedule: Tools for Developing and Analyzing Parallel Fortran Programs," Technical Memorandum No 86, Mathematics and Computer Science Division, Argonne National Laboratory, Illinois, November 1986. [Dunc90] Duncan, R, "A Survey of Parallel Computer Architectures," Computer, vol. 23, no. 2, IEEE, February 1990. [Elli85] Ellis, J., "Bulldog: A Compiler for VLIW Architectures," Ph.D. Dissertation, Department of Computer Science, Yale University, February 1985. [Enco87] Multimax Technical Summary, Encore Computer Corporation, 1987. [Ensl77] Enslow, P.H., "Multiprocessor Organisations - A Survey," Computing Surveys, vol. 9, pp. 103-129, March 1977. [Feld79] Feldman, J.A., "High level programming for distributed computing," Comm. ACM, vol. 22, no. 6, pp. 353-368, June 1979. [Feng72] Feng, T.Y., "Some Characteristics of AssociativelParallel Processing," Proceedings 1972 Sagamore Computing Conference, pp. 5-16, Syracuse University. [Feng81] Feng, T., "A Survey of Interconnection Networks," Computer, pp. 12-27, IEEE, Dec. 1981. [Feo90] Feo, J.T., D.C. Cann, and RR Oldehoeft, A Report on the Sisal Language Project, Journal of Parallel and Distributed Computing, 10, pp. 349-366, December 1990. [Fern81] Fernbach, S., "Parallelism in Computing," Proc. Int. Conf. Parallel Processing, pp. 1-4, August 1981. [FieI84] Fielland, G. and D. Rogers, "32-bit Computer System Shares Load Equally Among up to 12 Processors," Electronic Design, pp. 153-168, January 1984. [Fink86] Finkel, RA., An Operating Systems Vade Mecum, Prentice-Hall, 1986. [Flon76] Flon, L. and A.N. Habermann, "Towards the construction of verifiable software systems," Proc. ACM Conference on Data, SIGPLAN Notices, vol. 8, no. 2, pp. 141-148, March 1976. [Floy67] Floyd, RW., "Assigning meanings to programs," Proc. Amer. Math Society Symp in Applied Mathematics, vol. 19, 1967. [Flyn66] Flynn, M.J., "Very High-Speed Computing Systems," Proceedings IEEE, vol. 54, pp. 1901-1909, December 1966. [Fort78] Fortune, S. and J. Wyllie, "Pa'rallelism in Random Access Machines," Proceedings Tenth ACM Symposium on Theory of Computing, pp. 114-118,1978. [Fost90] Foster, I. and S. Taylor, Strand New Concepts in Parallel Programming, Prentice Hall, 1990. [Fran84] Frank, S.J., "A Tightly Coupled Multiprocessor System Speeds Memory-Access Times," Electronics, vol. 57, no. 1, pp. 164-169, January 1984. [Gajs84] Gajski, D.D., D.H. Lawrie, D.J. Kuck, and A.H. Sameh, "CEDAR," Proceedings IEEE COMPCON'84, pp. 306-309, March 1984. [GaIl88] Gallivan, K., D. Gannon, W. Jalby, A. Malony, and H. Wijshoff, Behavioral characterization of multiprocessor memory systems: a case study, 1988. [Gard70] Gardner, M., "Articles on Life," Scientific American, no. 4, p.120, April 1970.


[Gehr88] Gehringer, Edward F., Janne Abullarade, and Michael H. Gulyn, "A Survey of Commercial Parallel Processors," Computer Architecture News, vol. 16, no. 4, pp. 75- 107, ACM, September 1988. [Gele82] Gelernter, D. and A.J. Bernstein, "Distributed communication via global buffer," Proc. Symposium on Principles of Distributed Computing, pp. 10-18, Ottawa, Canada, August 1982. [Ghez85] Ghezzi, C., "Concurrency in programming languages," Parallel Computing, pp. 229- 241, November 1985. [Gold88] Goldman, Rand RP. Gabriel, "Preliminary Results with the Initial Implementation of Qlisp," Proceedings of 1988 Conference on Lisp and Functional Programming, pp. 143-152, July 1988. [Good79] Good, OJ., RM. Cohen, and J. Keeton-Williams, "Principles of proving concurrent programs in Gypsy," Proc. Sixth ACM Symposium on Principles of Programming Languages, pp. 42-52, San Antonio, Texas, January 1979. [Good83] Goodman, J.R, "Using Cache Memory to Reduce Processor-Memory Traffic," Proceedings 10th Annual International Symposium on Computer Architecture, 1983. [Gott83] Gottleib, A, B.D. Lubachevshy, and L. Rudolph, "Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors," ACM Transactions on Programming Languages and Systems, vol. 5, no. 2, pp. 164-189, April 1983. [Gray78] Gray, J.N., "Notes on Data Base Operating Systems," Lecture Notes in Computer Science: Operating Systems and Advanced Course, vol. 60, pp. 393-481, Springer­ Verlag, 1978. [Greg63] Gregory, J. and R McReynolds, The Solomon Computer, EC-12, pp. 774-781, IEEE Transactions on Electronic Computers, December 1963. [Gurd85] Gurd, J.R, C.C. Kirkham, and 1. Watson, "The Manchester Prototype Dataflow Computer," Communications of the ACM, vol. 28, no. 1, pp. 34-52, January 1985. [Habe75] Habermann, AN., Path Expressions, Dept. of Computer Science Carnegie-Mellon University, Pittsburgh, Pennsylvannia, June 1975. [Hadd77] Haddon, B.K., "Nested monitor calls," Operating Systems Review, vol. 11, no. 4, pp. 18- 23, Oct. 1977. [Ha1l90] Hall, J. G., RP. Hopkins, O. Botti, and F. De Cindio, "A Petri Net Semantics of OCCAM 2.," Technical Report, Newcastle University Computing Laboratory, 1990. [Hals87] Halstead, Robert H. Jr., "Parallel Computing Using Multilisp," in Parallel Computation and Computers for Artificial Intelligence, ed. Janusz S. Kowalik, pp. 21- 50, Kluwer Academic Publishers, New York, 1987. [HerI90] Herlihy, M.P. and J.M. Wing, "Linearizability: A Correctness Condition for Concurrent Objects," ACM Transactions on Programming Languages and Systems, vol. 12, no. 4, pp. 463-492, July 1990. [High89] Higham, N.J., "Exploiting Fast Matrix Multiplication Within the Level 3 BLAS," ACM Transactions on Mathematical Software, September 1989. [Hill85] Hillis, W. Daniel, The Connection Machine, MIT Press, Cambridge, MA, 1985. [Hoar62] Hoare, C.A.R, "Quicksort," Computer Journal, vol. 5, pp. 10-15,1962. [Hoar69] Hoare, C.AR, "An axiomatic basis for ," Comm. ACM, vol. 12, no. 10, pp. 576-580,583, Oct. 1969. [Hoar72] Hoare, C.AR, "Towards a theory of parallel programming," Operating Systems Techniques, Academic Press, New York, 1972.


[Hoar74] Hoare, C.AR, "Monitors: an operating system structuring concept," Communications ACM, vol. 17, no. 10, pp. 549-557, Oct. 1974. [Hoar78] Hoare, C.AR, "Communicating sequential processes," Comm. ACM, vol. 21, no. 8, pp. 666-677, Aug. 1978. [Hock81] Hockney, RW. and C.R Jesshope, Parallel Computers, Adam Hilger Ltd., 1981. [Holt81] Holt, RC., D.B. Wortmann, J.R. Cordy, D.R Crowe, and I.H. Griggs, "Euclid: A language for producing quality software," Proceedings of National Computer Conference, Chicago, May 1981. [Holt83] Holt, R.C., Concurrent Euclid, The Unix system and Tunis, Addison-Wesley, 1983. [Hoog83] Hoogendoorn, C.H., "On the use of spin locks in multiprocessors," Parallel Computing83, pp. 413-417, North-Holland. [Huda89] Hudak, P., "Conception, Evolution and Application of Functional Programming Languages," ACM Computing Surveys, vol. 21, no. 3, pp. 359-411, September 1989. [Hund77] Hundler, W., "The Impact of Classification Schemes on Computer Architectures," Proceedings 1977 International Conference on Parallel Processing, pp. 7-15. [Hutc87] Hutchinson, N.C., "Emerald: A Language to Support Distributed Programming," Proceedings from the Second Workshop on Large-Grained Parallelism, pp. 45-47, Carnegie-Mellon University Software Engineering Institute, Pittsburgh, PA, November 1987. [Hwan85] Hwang, K. and F.A. Briggs, Computer Architecture and Parallel Processing, McGraw­ Hill,1985. [IBM85] IBM 3090 System Summary - Engineering/Scientific announcement letter, pp. 185-220, IBM Corporation, 1985. [Jak090] Jakob, R. and H.F. Jordan, "An MIMD Execution Environment with a Fixed Number of Processes," Proceedings VAPP/CONPAR, September 1990. [Jone80] Jones, AK. and P. Schwarz, "Experience using multiprocessor systems - a status report," ACM Computing Surveys, vol. 12, no. 2, pp. 121-165, June 1980. [Jord84] Jordan, H.F., "Structuring Parallel Algorithms in an MIMD, Shared Memory Environment," Report Number csna 84-2, Dept. of Electrical and Computer Engineering, University of Colorado, 1984. [Jord85] Jordan, H.F., "Parallel Computation with the Force," ICASE Report Number 85-45, NASA Langley, 1985. [Kaln79] Kalney-Rivas, E. and D. Hoitsma, "Documentation of the Fourth Order Band Model," NASA Technical Memorandum 80608, December 1979. [Karp87] Karp, Alan H., "Programming for Parallelism," Computer, vol. 20, no. 5, pp. 43-57, IEEE, May 1987. [Karp90] Karp, A.H. and H.P. Flatt, "Measuring Parallel Processor Performance," Communications of the ACM, vol. 33, no. 5, May 1990. [Katz78] Katzmann, J.A., "A Fault-Tolerant Computing System," Proceedings 11 th Hawaii International Conference on System Sciences, pp. 85-117, IEEE Computer Society, 1978. [Katz85] Katz, R et ai., "Implementing a Cache Consistency Protocol," Proceedings of 12th Annual International Symposium on Computer Architecture, pp. 276-283, New York, June 1985. [Ke1l84] Keller, RM., F.C.H. Lin, and J. Tanaka, "Rediflow Multiprocessing," Proceedings IEEE COMPCON'84, pp. 410-417, March 1984.


[Knig66] Knight, K.E., "Changes in Computer Performance," Datamation, vol. 12, pp. 40-54, September 1966. [Knut71] Knuth, D.E., "An Empirical Study of Fortran Programs," Software -- Practice and Experience, vol. 1, pp. 105-133,1971. [Kohl8l] Kohler, W.H., "A survey of techniques for synchronization and recovery in decentralized computer systems," ACM Computing Surveys, vol: 13, no. 2, pp. 149-183, June 1981. [Konn89] Konno, C., M. Yamabe, M. Saji, and Y. Umetari, "The BF Coordinate Transformation Technique DEQSOL," Parallel Processing For Scientific Computing, SIAM, 1989. [Korn82] Kornfeld, W.A., "Combinatorially Implosive Algorithms," Communications of the ACM, vol. 25, no. 10, October 1982. [Krist87] Kristensen, B.B., O.L. Madsen, B. Moller-Pedersen, and K. Nygaard, "The BETA Programming Language," Research Directions in Object-Oriented Programming, pp. 7-48, The MIT Press, 1987. [Kuck72] Kuck, D. etal., "On the Number of Operations Simultaneously Executable in Fortran­ Like Programs and Their Resulting Speedup," IEEE Transactions on Computers, vol. C-21, no. 12, pp. 1293-1310, December 1972. [Kuck74] Kuck, D. et al., "Measurements of Parallelism on Ordinary FORTRAN Programs," Computer, pp. 37-46, January 1974. [Kuck78] Kuck, D.J., The Structure ofComputers and Computations, J ohn-Wiley, 1978. [Kuck84] Kuck, D.J., R.H. Kuhn, B. Leasure, and M. Wolfe, "The Structure of an Advanced Retargetable Vectorizer," Tutorial on Supercomputers, pp. 163-178, IEEE Press, 1984. [Kung78] Kung, H.T. and C.E. Leiserson, "Systolic Arrays for VLSI," Proceedings of the Sparse Matrix Symposium (SIAM), 1978. [Lamp80] Lampson, B.W. and D.O. Redell, "Experience with processes and monitors in Mesa.," Comm. ACM, vol. 23, no. 2, pp. 105-117,Feb. 1980. [Lamp81] Lampson, B.W., "Atomic transactions," Distributed Systems - Architecture and Implementation, Lecture Notes in Computer Science, vol. 105, Springer-Verlag, New York,1981.

[Lamp74] Lamport, L., "A New Solution of Dijkstra's Concurrent Programming Problem," CACM, vol. 17, pp. 453-455, 1974. [Lars84] Larson, J.L., "An introduction to multitasking on the Cray X-MP-2 multiprocessor," Computer, pp. 62-69, July 1984. [Laue75] Lauer, P.E. and R.H. Campbell, "Formal Semantics of a class of high level primitives for coordinating concurrent processes," Acta Informatica, vol. 5, pp. 297-332, 1975.

[Laue78] Lauer, P.E. and M.W. Shields, "Abstract specification of resource accessing disciplines: adequacy, starvation, priority and interrupts," SIGPLAN Notices, vol. 13, no. 12, pp. 41-59, December 1978. [Laue79] Lauer, H.C. and R.M. Needham, "On the duality of operating system structures," Proc. Second Int. Symp. on Operating Systems, reprinted in Operating Systems Review, vol. 13, no. 2, pp. 3-19, April 1979. [LeBl90] LeBlanc, T.J., J.M. Mellor-Crummey, and R.J. Fowler, "Analyzing Parallel Program Executions Using Multiple Views," Journal of Parallel and Distributed Computing, vol. 9, no. 2, pp. 203-217, June 1990. [Lee89] Lee, P.A., "Shared Memory Multiprocessors: A Cost Effective Architecture For Parallel Processors," Software For Parallel Computers, UNICOM, June 1989.


[Lee90] Lee, P.A. and M.A. Stoker, "Mechanisms For Parallel Programming on Unix Multiprocessor Systems," Proceedings van de NLUUG Voorjaarsconferentie 1990, pp. 55-66. [Lipo87] Lipovski, G.J. and M. Malek, Parallel Computing: Theory and Comparisons, John Wiley & Sons, 1987. [Lisk77] Liskov, B., A. Snyder, R Atkinson, and C. Schaffert, "Abstraction Mechanisms in CLU," Communications of the ACM, vol. 20, no. 8, August 1977. [Lisk82] Liskov, B. and R. Scheifler, "Guardians and actions: linguistic support for robust, distributed programs," Proc. Ninth ACM Symp. on Principles of Programming Languages, pp. 7-19, Albuquerque, New Mexico, January 1982. [List77] Lister, A., "The problem of nested monitor calls," Operating Systems Review, vol. 11, no. 3, pp. 5-7, July 1977. [Lytt90] Lyttle, RW., "Fortran For Shared Memory Parallel Processors," Parallel Update, no. 13, pp. 36-46, BCS Parallel Processing Specialist Group, 1990 [Mand82] Mandelbrot, B., The fractal geometry of nature, W.H. Freeman, San Francisco, 1982. [McDo89] McDowell, C.E. and D.P. Helmbold, "Debugging Concurrent Programs," ACM Computing Surveys, pp. 593-622, December 1989. [McGr82] McGraw, J.R, "The VAL Language: Description and Analysis," ACM TOPLAS, vol. 4, no. 1, January 1982. [McGr83] McGraw, J.R. et al., "SISAL: Streams and Iteration in a Single-Assignment Language," Language Reference Manual Version 1.1, Tech Report M-146, 1983, Lawrence Livermore National Laboratory. [McJo87] McJones, P. and A. Hisgen, "The Topaz System: Distributed Multiprocessor Personal Computing," Proceedings: Workshop on Workstation Operating Systems, IEEE Computer Society Technical Committee on Operating Systems, Cambridge, MA, November 1987. [Mead80] Mead, C. and L. Conway, Introduction to VLSI Systems, Addison Wesley, 1980. [Meil81] Meilander, W.C., "History of Parallel Processing at Goodyear Aerospace," Proc. 1981 Int. Conf on Parallel Processing, IEEE Computer Society, August 1981. [Meth85] Methrotra, P. andJ. Van Ros.endale, "The BLAZE Language: A Parallel Language for Scientific Programming," ICASE Report Number 85-29, NASA Langley Research Center, 1985. [Mitc79] Mitchell, J.G., W. Maybury, and R Sweet, "Mesa language manual version 5.0," Xerox Palo Alto Research Center Report CSL-79-3, April 1979. [Mitz86] Mitzell, D., "PROLOG AND PARALLELISM: The Inherently Sequential Nature of Unification," Naval Research Reviews, vol. 1, pp. 22-24, 1986. [Mu1l86] Mullender, S.J., "The Design ~f a Capability-Based Distributed Operating System," The Computer Journal, vol. 29, no. 4, 1986, pp. 289-299, 1986. [Mura71] Muraoka, Y., "Parallelism Exposure and Exploitation in Programs," PhD. dissertation, Report 71-424, Dept. of Computer Science, University of Illinois, Urbana­ Champaign, February 1971. [Myer86] Myers, Ware, "Getting the Cycles out of a Supercomputer," Computer, vol. 19, no. 3, pp. 89-92, IEEE, March 1986. [Nico88] Nicol, D.M., "Parallel Algorithms for Mapping Pipe lined and Parallel Computations," ICASE Report No 88-2, April 1988. [Nils80] Nilsson, N.J., Principles ofArtificiallntelligence, 1980.


[Noga85] Noga, M.T. and D.C.S Allison, "Sorting in Linear Expected Time," BIT, vol. 25, pp. 451-465, 1985. [Nyga78] Nygarrd, K. and O.J. Dahl, "The development of the SIMULA languages," Preprints ACM SIGPLAN History of Programming Languages Conference SIGPLAN Notices, vol. 13, no. 8, pp. 245-272, Aug. 1978. [Orga82] Organick, E., A programmer's Guide to the Intel 432, McGraw-Hill, 1982. [Oste86] Osterhaug, A., Guide to Parallel Programming, Sequent Computer Systems Inc., 1986. [Paa189] Paalvast, E.M. and H.J. Sips, "A High-Level Language for the Description of Parallel Algorithms," Parallel Computing89, North Holland. [Papa84] Papamarcos, M.S. and J.H. Patel, "A Low Overhead Coherence Solution for Multiprocessors with Private Cache Memories," Proceedings 11 th Annual International Symposium on Computer Architecture, pp. 348-354, Ann Arbor MI, June 1984. [PauI75] Paul, G. and M.W. Wilson, "The Vectran Language: An Experimental Language for VectorlMatrix Array Processing," IBM Palo Alto Scientific Center Report 6320-3334, August 1975. [PauI82] Paul, G., "Vectran and the Proposed Vector/Array Extensions to ANSI Fortran for Scientific and Engineering Computation.," Proceedings IBM Conference on Parallel Computers, 1982. [Perr86] Perron, Rand C. Mundie, "The Architecture of the Alliant FXl8 Computer," Digest of papers Compcon86, IEEE Computer Society Press, Spring 1986. [Perr79] Perrott, RH., "A Language for Array and Vector Processors," ACM Transaction on Programming Languages and Systems, vol. 1, no. 2, pp. 177-195, October 1979. [Perr87] Perrott, RH., Parallel Programming, Addison-Wesley, 1987. [Peyt86] Peyton-Jones, S.L., The implementation of functional programming languages, Prentice-Hall,1986. [Peyt90] Peyton-Jones, S., "Parallel Programming@Glasgow," Parallelogram International, vol. 2, no. 27, pp. 16-22, June 1990. [Pfis85] Pfister, G.F., W.C. Brantely, D.A. George, S.L. Harvey, W.J. Kleinfelder, K.P. McAuliffe, E.A. Melton, V.A. Norton, and J. Weiss, "The IBM Research Parallel Processor Prototype (RP3)," Proceedings of the 1985 International Conference on Parallel Processing, August 1985. [Poly87] Polychronopoulos, C.D. and D.J. Kuck, "Guided Self-Scheduling: A Practical Sceduling Scheme for Parallel SuperComputers," IEEE Transactions on Computers, vol. C-36, no. 12, pp.1425-1439, December 1987. [Poly89] Polychronopoulos, C.D., M. Girkar, M.R Haghighat, C.L. Lee, B. Leung, and D. Schouten, "Parafrase-2: An' Environment for Parallelizing, Partitioning, Synchronizing, and Scheduling Programs on Multiprocessors," 1989 International Conference on Parallel Processing, vol. 2, pp. 39-48. [Prat84] Pratt, T.W, Programming Languages - Design and Implementation, Prentice-Hall, 1984. [Rayn86] Raynal, M., Algorithms For Mutual Exclusion, pp. 18-22, North Oxford Academic, 1986. [Ratt85] Rattner, J., "Concurrent Processing: A new direction in scientific computing," AFIPS Conference Proceedings 54, pp. 159-166, AFIPS Press, Reston Va., 1985. [Reed79] Reed, D.P, "Implementing atomic actions on decentralized data," Preprints for Seventh Symposium on Operating System Principles, pp. 66-74, Pacific Grove, California, December 1979.


[Ritc74] Ritchie, D.M. and K. Thompson, "The UNIX timesharing system," Comm. ACM, vol. 17, no. 7, pp. 365-375, July 1974. [Rose87] Rose, J.R. and G.L. Steele, "C*: An Extended C Language," Proceedings USENIX C++ VVorkshop,pp.361-398,1987. [Rosi89] Rosing, M. and R.B. Schnabel, "An Overview ofDino - A New Language for Numerical Computation on Distributed Memory Multiprocessors," Parallel Processing For Scientific Computing, pp. 312-316, SIAM, 1989. [Rozi88] Rozier, M., V. Abrossimov, F. Armand, I. Boule, M. Gien, M. Guillemont, F. Herrmann, C. Kaiser, S. Langlois, P. Leonard, and W. Neuhauser, "CHORUS Distributed Operating Systems," USENIX Computing Systems, vol. 1, no. 4, pp. 301- 370, University of Califorin a Press, 1988. [Rudo84] Rudolph, L. and Z. Segall, "Dynamic Decentralised Cache Schemes for MIMD Parallel Processors," Proceedings 11th Annual International Symposium on Computer Architecture, pp. 340- 347, Ann Arbor MI, June 1984. [Russ69] Russell, E.C., "Automatic Program Analysis," PhD. dissertation, Dep. Engineering, University of California, Los Angeles, 1969. [Sarg87] Sargeant, J., "Load Balancing, Locality and Parallelism Control in Fine-Grain Parallel Machines," Internal Report UMCS-86-11-5, Department of Computer Science, University of Manchester, January 1987. [Sche77] Scheifler, R.W., "An analysis of inline substitution for a structured programming language," Communications of the ACM, 1977. [Seit85] Seitz, C.L., "The Cosmic Cube," Comm. ACM, vol. 28, pp. 22-33, 1985. [Shap89] Shapiro, E., "The Family of Concurrent Logic Programming Languages," ACM Computing Surveys, vol. 21, no. 3, pp. 412- 510, September 1989. [Shaw84] Shaw, D.E., "SIMD and MSIMD Variants of the NON-VON Supercomputer," IEEE COMPCON'84 Proceedings, pp. 360-363, March 1984. [Shen90] Shen, V.Y., C. Richter, M.K. Graf, and J.A. Brumfield, "VERDI: A Visual Environment for Designing Distributed Systems," Journal of Parallel and Distributed Computing, vol. 9, no. 2, pp. 128-137, June 1990. [Shor73] Shore, J.E., "Second thoughts on parallel processing," Computer and Electrical Engineering, vol. 1, pp. 95-109, 1973. [Shri89] Shrivastava, S.K., G.N. Dixon, G.D. Parrington, F. Hedayati, S.M. Wheater, and M. Little, "The Design and Implementation of Arjuna," Technical Report, University of Newcastle upon Tyne, 1989. [Shri90] Shrivastava, S.K. and S.M. Wheater, "Implementing Fault-Tolerant Distributed Applications Using Objects and Multi-Coloured Actions," Proceedings of Tenth International Conference on Distributed Computing Systems, Paris, France, May 1990. [Skil88] Skillicorn, D.B., "A Taxonomy for Computer Architectures," Computer, vol. 21, no. 11, pp. 46-47, November 1988. [Slad56] Slade, A.E. and H.O. McMahon, "A Cryoton Catalog Memory System," Proc. EJCC, vol. 10, pp. 115-120, December 1956. [Smit81] Smith, B.J., "Architecture and applications of the HEP multiprocessor computer system," SPIE (Real-Time Signal Processing IV), vol. 298, pp. 241-248, Society of Photo-Optical Instrument Engineers, Bellingham, Washington, 1981. [Snir82] Snir, M., "On Parallel Search," Proceedings Principles of Distributed Computing Conference, pp. 242-253, August 1982. [Snyd82] Snyder, L., "Introduction to the Configurable, Highly Parallel Computer," Computer, vol. 15, pp. 47-56, January 1982.


[Snyd84] Snyder, L., "Parallel Programming and the Poker Programming Environment," Computer, vol. 17, no. 7, July 1984. [Snyd87] Snyder, A, "Inheritance and the Development of Encapsulated Software Systems," Research Directions in Object-Oriented Programming, pp. 165-188, The MIT Press, 1987. [Ster87] Sterling, L. and Ehud Shapiro, The Art of Prolog, The MIT Press',March 1987. [Stok81] Stokes, Rand R Cantarella, "The history of Parallel Processing at Burroughs," Proc. 1981 Int. Confon Parallel Processing, IEEE Computer Society, August 1981. [Tane87] Tanenbaum, A.S., Operating Systems - Design and Implementation, Prentice-Hall, 1987. [Thom88] Thomas, Robert H. and Will Crowther, "The Uniform System: An approach to runtime support for large scale shared memory parallel processors," Proceedings of the 1988 International Conference on Parallel Processing, vol. II, Software, pp.245-254, Penn State, University Park, Penn, August 1988. [Tink88] Tinker, P. and M. Katz, "Parallel Execution of Sequential Scheme with ParaTran," Proceedings of 1988 ACM Conference on LISP and Functional Programming, pp. 28- 39,1988. [Thor81] Thornton, J.E., "Control Data 6600 and STAR-100," Proc. 1981 Int. Conf on Parallel Processing, IEEE Computer Society, August 1981. [TreI82] Treleaven, P.C., D.R Brownbridge, and RP. Hopkins, "Data-Driven and Demand­ Driven Computer Architecture," ACM Computing Surveys, vol. 14, no. 1, pp. 95-143, March 1982. [Vaug89] Vaughan, J.R, "The Mandelbrot set as a parallel processing benchmark," University Computing, vol. ii, no. 4, pp. 193-7,December 1989. [vonN58] Neumann, J. von, The Computer and the Brain, Yale University Press, New Haven, 1958. [Wadg85] Wadge, W.W. and E.A Ashcroft, Lucid, The Language, Academic Press, 1985. [Wads71] Wadsworth, C.P., "Semantics and Pragmatics of the lambda calculus," PhD. Thesis, Oxford, 1971. [Weih80] Weihl, E.W., "Interprocedural data flow analysis in the presence of pointers, procedure variables and label variables," Seventh Annual ACM Symposium on Principles of Programming Languages, pp. 83-94, 1980. [Wels79] Welsch, J. and D.W. Bustard, "Pascal-Plus another language for modular multiprogramming," Software -- Practice and Experience, vol. 9, pp. 127-130, 1979. [Wirt77] Wirth, N., "Modula: a language for modular multiprogramming," Software -- Practice and Experience, vol. 7, pp. 3-35;1977. [Wirt80] Wirth, N., "Modula-2," Technical Report 36, Institut fuer Informatik E.T.H. Zurich, March 1980. [Wolf89] Wolfe, M., Optimizing Supercompilers for Supercomputers, Pitman, 1989. [Wrig90] Wright, K., "Parallel Algorithms for QR Decomposition on a Shared Memory Multiprocessor," Parallel Processing Memoranda, no. 7, Computing Labroratory, University of Newcastle upon Tyne, December 1990. [Wulf71] Wulf, W.A., D.B. Russell, and AN. Habermann, "BLISS: A language for systems programming," Comm. ACM, vol. 14, no. 12, pp. 780-790, December 1971. [Wulf72] Wulf, W.A and C.G. Bell, "C.mmp - A multi-mini-processor," Proceedings AFIPS 1972 Fall Joint Computer Conference, vol. 41, pp. 765-777,1972.


[Yoko86] Yokote, Y. and M. Tokoro, "The design and implementation of ConcurrentSmalltalk," Proceedings of Object-Oriented Programming Systems, Languages and Applications, 1986. [Zeni90] Zenith, S.E., "Programming with Ease: Semiotic definition of the language," Yale Technical Report TR-B09, Yale University, Department of Computer Science, New Haven, Connecticut, July 1990.


Appendix A Julia Set Fractals

To give some impression of the difficulties encountered in partitioning the work for the fractal parallel programs, both of the Julia sets used in chapter six are displayed on the next page. The upper image is the A image which, having a high concentration of work in its middle group of columns, helps to explain why statically allocating blocks of columns to workers leads to poor speedups. The lower image is the B image, which has the same amount of work per column and therefore gives good speedups when blocks of columns are computed in parallel. However, this is not necessarily true if rows are allocated to threads instead of columns, because some rows take significantly more iterations to complete than others, although this is not obvious from the diagram.

Sequential Julia Set Algorithm

This algorithm is taken from Mandelbrot's book 'The Fractal Geometry of Nature' [Mand82]. An image has a times b points and 0 to K possible colours. Given values for p and q where c = p + iq, x where x_min ≤ x ≤ x_max, y where y_min ≤ y ≤ y_max, and M is 100:

(0) Set Δx = (x_max - x_min)/(a - 1) and Δy = (y_max - y_min)/(b - 1); for all the pixels (n_x, n_y) with n_x = 0, ..., a-1 and n_y = 0, ..., b-1 do the following.

(1) Set x_0 = x_min + n_x*Δx, y_0 = y_min + n_y*Δy and k = 0.

(2) Calculate (x_{k+1}, y_{k+1}) from (x_k, y_k) using the law in chapter six, k += 1.

(3) Calculate r = x_k^2 + y_k^2; if (r ≤ M) and (k < K) go to step (2); otherwise, if (r > M) choose colour k, else (k = K) choose colour 0 (black).

(4) Assign colour k to the pixel (n_x, n_y) and compute the next pixel, starting from step (1).

A image: p = 0.4, q = 0.4i, x = -1.25 to 1.25, y = -1.25 to 1.25, M = 100, K = 100.
B image: p = -0.257, q = -0.89i, x = 0.0 to 0.0, y = -2.0 to 2.0, M = 100, K = 100.
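For illustration, a direct sequential transcription of steps (0) to (4) is sketched below. The function name juliaSequential, the flat image array and its indexing are illustrative only and do not correspond to the Tyger-C++ programs listed in Appendix B.

// Sequential Julia set sketch following steps (0) to (4) above.
// image is an a-by-b array of colour indices 0..K; c = p + iq.
void juliaSequential(int a, int b, int K, double M,
                     double p, double q,
                     double xmin, double xmax,
                     double ymin, double ymax, int *image)
{   double dx = (xmax - xmin) / (a - 1);            // step (0)
    double dy = (ymax - ymin) / (b - 1);
    for (int nx = 0; nx < a; nx++)
        for (int ny = 0; ny < b; ny++)
        {   double x = xmin + nx * dx;              // step (1)
            double y = ymin + ny * dy;
            int k = 0;
            double r;
            do                                      // steps (2) and (3)
            {   double x1 = x * x - y * y + p;
                double y1 = 2 * x * y + q;
                x = x1;
                y = y1;
                k++;
                r = x * x + y * y;
            } while (r <= M && k < K);
            image[ny * a + nx] = (r > M) ? k : 0;   // step (4)
        }
}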

Appendix B

Test Program Listings

The programs listed in this appendix are some of the Tyger-C++ programs that were used for the testing reported on in chapter six. Four programs are presented:

(i) the vector assignment program,
(ii) the matrix addition program,
(iii) the matrix multiplication program (with transpose),
(iv) the fractal program (in three parts).

Some of the programs reference program text in the form of header files via the "#include" preprocessor directive. Where appropriate these header files are shown either in this appendix or in the next; the exceptions are the system header files, the timing header file and the graphics header file, which are omitted for brevity.

// parallel vector assignment program
#include "sv.h"

extern "C" {
#include "timelib.h"
}

void printvector(SV& vector, int n)
{   cout << "\n";
    for (int i = 0; i < n; i++)
        cout << vector[i] << " ";
    cout << "\n";
}

PAR dostore(struct list* prms)
{   int lower = prms->index;
    int upper = prms->ceiling;
    SV& vector = *(prms->host);
    for (int i = lower; i < upper; i++)
        vector[i] = (SVTYPE) i;
}

int main(int argc, char* argv[])
{   int n;                    // size of vector
    int procs;                // number of processes
    rawTime start;            // first timing point
    rawTime secs;             // second timing point

    if (argc == 3)
    {   n = atoi(argv[1]);
        procs = atoi(argv[2]);
    }
    else
    {   cerr << "Usage: " << argv[0] << " size procs\n" << "\n";
        exit(1);
    }

    start = start_timer();
    SV vector(n);
    secs = raw_ticks(start);
    cerr << "Result after " << raw_micros(secs) << " microsecs\n";

    start = elapsed_ticks();
    vector.Clones(procs);
    vector.DoAll(0, n-1, dostore);
    secs = raw_ticks(start);
    cerr << "Ending time " << raw_micros(secs) << " microsecs\n";
    printvector(vector, n);
}

// parallel square matrix addition program
#include "sv.h"

int n;                        // the order of the matrices
SVTYPE *matrix1;              // pointer to 1st value of 1st matrix
SVTYPE *matrix2;              // pointer to 1st value of 2nd matrix

void getmatrix(SVTYPE* matrix, int n)
{   int row;
    for (int i = 0; i < n; i++)
    {   row = i*n;
        for (int j = 0; j < n; j++)
            cin >> matrix[row + j];
    }
}

void printmatrix(SV& matrix, int n)
{   int row;
    cout << "\n";
    for (int i = 0; i < n; i++)
    {   row = i*n;
        for (int j = 0; j < n; j++)
            cout << matrix[row+j] << " ";
        cout << "\n";
    }
    cout << "\n";
}

PAR dosum(struct list *prms)
{   int lower = prms->index;
    int upper = prms->ceiling;
    int row = lower*n;
    SV& matrix3 = *(prms->host);
    for (int i = lower; i < upper; i++)
    {   for (int j = 0; j < n; j++)
            matrix3[row+j] = matrix1[row+j] + matrix2[row+j];
        row += n;
    }
}

extern "C" {
#include "timelib.h"
}

int main(int argc, char *argv[])
{   int procs;
    timeUnits start, secs;

    if (argc == 3)
    {   n = atoi(argv[1]);
        procs = atoi(argv[2]);
    }
    else
    {   cerr << "Usage: " << argv[0] << " size procs\n" << "\n";
        exit(1);
    }
    matrix1 = new SVTYPE[n*n];
    matrix2 = new SVTYPE[n*n];

    start = start_timer();
    SV matrix3(n*n);
    secs = seconds(start);
    cerr << "Result matrix created in " << secs << " s\n";

    getmatrix(matrix1, n);
    getmatrix(matrix2, n);
    secs = seconds(start);
    cerr << "Data read in after " << secs << " s\n";

    start = elapsed_time();
    matrix3.Clones(procs);
    matrix3.DoAll(0, n-1, dosum);
    secs = seconds(start);
    cerr << "Ending time " << secs << " s\n";
    printmatrix(matrix3, n);
}

// parallel square matrix multiplication program
#include "sv.h"

extern "C" {
#include "timelib.h"
}

int n;                        // the order of the matrices
SVTYPE *matrix1;              // pointer to 1st value of 1st matrix
SVTYPE *matrix2;              // pointer to 1st value of 2nd matrix

void getmatrix(SVTYPE* matrix, int n)
{   int row;
    for (int i = 0; i < n; i++)
    {   row = i*n;
        for (int j = 0; j < n; j++)
            cin >> matrix[row + j];
    }
}

void printmatrix(SV& matrix, int n)
{   int row;
    cout << "\n";
    for (int i = 0; i < n; i++)
    {   row = i*n;
        for (int j = 0; j < n; j++)
            cout << matrix[row+j] << " ";
        cout << "\n";
    }
    cout << "\n";
}

void transpose(SVTYPE *matrix, int n)
{   SVTYPE *ptr1, *ptr2, *cPtr, *rPtr, *ePtr;
    SVTYPE swap;
    int offset = n + 1;

    ePtr = ptr2 = matrix + n;
    for (ptr1 = matrix + 1; ptr1 < matrix + n*n; ptr1 += offset)
    {   for (cPtr = ptr1, rPtr = ptr2; cPtr < ePtr; cPtr++, rPtr += n)
        {   swap = *cPtr;
            *cPtr = *rPtr;
            *rPtr = swap;
        }
        ptr2 += offset;
        ePtr += n;
    }
}

PAR doproduct(struct list *prms)
{   SVTYPE sum;               // working value of sum
    int row, col;             // offsets for array indexing
    int lower = prms->index;
    int upper = prms->ceiling;
    int step = prms->stride;
    SV& matrix3 = *(prms->host);
    for (int i = lower; i < upper; i += step)
    {   row = i*n;
        for (int j = 0; j < n; j++)
        {   col = j*n;
            sum = (SVTYPE) 0;
            for (int k = 0; k < n; k++)
                sum += matrix1[row+k] * matrix2[col+k];
            matrix3[row+j] = sum;
        }
    }
}

int main(int argc, char *argv[])
{   int procs;
    timeUnits start, secs;

    if (argc == 3)
    {   n = atoi(argv[1]);
        procs = atoi(argv[2]);
    }
    else
    {   cerr << "Usage: " << argv[0] << " size procs\n" << "\n";
        exit(1);
    }
    matrix1 = new SVTYPE[n*n];
    matrix2 = new SVTYPE[n*n];

    start = start_timer();
    SV matrix3(n*n);
    secs = seconds(start);
    cerr << "Result matrix created in " << secs << " s\n";

    getmatrix(matrix1, n);
    getmatrix(matrix2, n);
    secs = seconds(start);
    cerr << "Data read in after " << secs << " s\n";

    start = elapsed_time();
    transpose(matrix2, n);
    matrix3.Clones(procs);
    matrix3.ForEachPoint(0, n-1, doproduct);
    secs = seconds(start);
    cerr << "Ending time " << secs << " s\n";
    printmatrix(matrix3, n);
}

// Julia Set program: fractalMain.C, fractal.C, fractal.h.

// fractalMain.C
#include "fractal.h"

Julia *drawing;               // image to compute

int main(int argc, char* argv[])
{   extern PAR JuliaCompute(struct list *Prms);
    drawing = new Julia(argc, argv);
    drawing->Picture->DrawImage(JuliaCompute);
}

// fractal.C
#include "fractal.h"

extern "C" {
#include "timelib.h"
}

void JuliaCompute(struct list *Prms)
{   extern Julia *drawing;
    (void) drawing->Compute(Prms);
}

// Constructor for FractalImage and attendant variables
FractalImage::FractalImage(int argc, char* argv[], int workers,
                           int columns, int rows)
{   // create new drawing surface
    host = new WorkStation(argc, argv);

    // save away the dimensions of the image
    Rows = rows;
    Columns = columns;
    MaxColours = 15;

    // scale for pixels
    RowSize = MaxRows/Rows;
    ColSize = MaxCols/Columns;

    // create shared values to represent image
    Image = new SV(Rows*Columns);
    Image->Clones(workers);
}

PAR FractalImage::Display()
{   if (RowSize == 1)
        PixelPlot();
    else
        RectPlot();
}

void FractalImage::RectPlot()
{   int p;                    // current position in line
    int sp;                   // start of plateau
    int value;                // colour of plateau
    int row = 0;              // current row offset
    int y = 0;                // position of row on screen

    for (int rows = 0; rows < Rows; rows++)
    {   row += Columns;
        y += RowSize;
        p = 0;
        while (p < Columns)
        {   value = (*Image)[row+p];
            sp = p;
            while ((p < Columns) && ((*Image)[row+p] == value))
                ++p;
            host->Rectangle(sp*ColSize, y, p*ColSize-1,
                            y+RowSize-1, value);
        }
    }
    host->Refresh();
}

void FractalImage::PixelPlot()
{   int value;                // colour of plateau
    int sp;                   // start of plateau
    int p;                    // current position in line
    int row = 0;              // current row offset

    for (int rows = 0; rows < Rows; rows++)
    {   row += Columns;
        p = 0;
        while (p < Columns)
        {   value = (*Image)[row+p];
            sp = p;
            while ((p < Columns) && ((*Image)[row+p] == value))
                ++p;
            host->Line(sp, rows, p, rows, value);
        }
    }
    host->Refresh();
}

void FractalImage::DrawImage(void (*w)(struct list *Prms))
{   timeUnits start;          // first timing point
    timeUnits end;            // second timing point

    start = start_timer();
    Image->DoAllBlock(0, Columns-1, w);
    end = seconds(start);
    cout << "Computed Image in " << end << "\n" << flush;
    Display();
}

Julia::Julia(int argc, char **argv)
{   if (argc < 9)
    {   cerr << "Usage: " << argv[0] << " workers x y p q x y k\n";
        exit(1);
    }
    Picture = new FractalImage(argc, argv, atoi(argv[1]),
                               atoi(argv[2]), atoi(argv[3]));
    sscanf(argv[4], "%F", &P);
    sscanf(argv[5], "%F", &Q);
    sscanf(argv[6], "%F", &X_Min);
    sscanf(argv[7], "%F", &Y_Min);
    MaxK = atoi(argv[8]);
    X_Max = -X_Min;
    Y_Max = -Y_Min;
    Delta_X = (X_Max - X_Min)/(Picture->Columns - 1);
    Delta_Y = (Y_Max - Y_Min)/(Picture->Rows - 1);
    M = 100;
}

PAR Julia::Compute(struct list *Prms)
{   int i, j, k;
    double x, y, x1, y1, r, yvalue;
    int row;

    for (j = 0; j < Picture->Rows; j++)
    {   row = Picture->Columns*j;
        yvalue = Y_Min + j * Delta_Y;
        for (i = Prms->index; i < Prms->ceiling; i += Prms->stride)
        {   x = X_Min + i * Delta_X;
            y = yvalue;
            k = 0;
            do
            {   x1 = x * x - y * y + P;
                y1 = 2 * x * y + Q;
                x = x1;
                y = y1;
                k++;
                r = x*x + y*y;
            }
            while ((r <= M) && (k < MaxK));
            (*(Picture->Image))[row+i] = (r > M) ?
                ((k % (Picture->MaxColours - 1)) + 1) : (0);
        }
    }
}

// fractal.h
#ifndef _fractal_h_
#define _fractal_h_

#include "graphics.h"
#include "sv.h"

const int MaxRows = 480;      // on screen
const int MaxCols = 640;      // on screen

class FractalImage
{ public:
    // constructor for image
    FractalImage(int argc, char* argv[], int w, int c, int r);

    // procedure to drive computation
    void DrawImage(void (*workercode)(struct list *));

    SV *Image;                // final image
    int Rows;                 // no. of image rows
    int Columns;              // no. of image columns
    int MaxColours;           // no. of colours
    int RowSize;              // length of screen point (pixels)
    int ColSize;              // height of screen point (pixels)
  private:
    WorkStation *host;        // drawing surface
    void PixelPlot();         // drawing operation
    void RectPlot();          // drawing operation
    PAR Display();            // begin drawing
};

class Julia
{ public:
    // constructor for fractal
    Julia(int argc, char **argv);

    // routine to compute points of fractal
    PAR Compute(struct list * Prms);

    FractalImage *Picture;    // image data structure
  private:
    int M;                    // parameters of fractal
    double X_Min;
    double X_Max;
    double Y_Min;
    double Y_Max;
    double Delta_X;
    double Delta_Y;
    double P;
    double Q;
    int MaxK;
};

#endif // !_fractal_h_


Appendix C

Shared Value Classes

The listings presented in this section represent the program code used to implement static shared values in Tyger-C++. The files presented are:

(i) sv.h v(1) - header file for shared values used during the testing.
(ii) sv.C v(1) - implementation file for v(1) shared values.
(iii) sv.h v(2) - header file for more advanced shared values.
(iv) sv.C v(2) - implementation file for v(2) shared values.
(v) MultiTask.h - header file of the thread library used to support shared values.
(vi) MultiTask.C - implementation file for the thread library.
(vii) SVStream.h - an example stream class based on shared values.
(viii) tStream.C - an example program showing the use of SVStreams.

All of this code has been written in the form of straight C++ classes; however, subsequent implementations of Tyger-C++ may be a combination of classes and a source program preprocessor.
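As an indication of what the class-only approach looks like in use, the fragment below is a sketch written in the style of the vector assignment program of Appendix B; the worker name fill and the sizes used are illustrative rather than taken from the test programs.

#include "sv.h"                          // the shared value classes listed below

PAR fill(struct list* prms)              // body executed by each clone
{   SV& v = *(prms->host);
    for (int i = prms->index; i < prms->ceiling; i++)
        v[i] = (SVTYPE) i;               // single assignment to each element
}

int main()
{   SV v(100);                           // shared value vector of 100 elements
    v.Clones(4);                         // use four worker threads
    v.DoAll(0, 99, fill);                // static block partition over 0..99
}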

A combined approach of this kind was the one followed in the implementation of Tyger-C, based around the Unix compiler-writing utilities yacc and lex. Although the Tyger-C files are not included here, the freely available yacc and lex files for a C grammar (around 800 lines of C) formed the basis of a preprocessor that output C code containing Tyger library calls to implement the parallelism. The preprocessor itself was quite small, around 500 lines of C, with the Tyger library being somewhat larger at around 1300 lines of C.

// sv.h v(1), single assignment, no multiple stripes
#ifndef _sv_h_
#define _sv_h_

#define SVTYPE int
#include "MultiTask.h"

extern void spin_idle(LOCK *lock);

const int SVsvALLOC = 1;      // failed to get array of sv
const int svBAD = 999;        // multiple write to location

class sv
{   friend ostream& operator << (ostream& outS, sv& data);
    friend istream& operator >> (istream& inS, sv& data);
public:
    sv() { }                  // null body for constructor
    ~sv() { }                 // null body for destructor

    // overload assignment from a 'sv' to a SVTYPE
    SVTYPE& operator = (SVTYPE value)
    {   svData = value;
        if (spin_condlock(&svSync) != PAR_ACQUIRED)
            svError(svBAD);
        return svData;
    }

    operator SVTYPE()
    {   spin_idle(&svSync);
        return svData;
    }

    void* operator new (size_t size) { return (void *) NULL; }
    void* operator new (size_t size, sv* allocAddress);
    void operator delete (void *ptr, size_t size) { }
private:
    LOCK svSync;              // synchronisation for 'sv'
    SVTYPE svData;            // actual data value for 'sv'
    void svError(int error_code);
};

class SV
{   friend ostream& operator << (ostream& outS, SV& data);
    friend istream& operator >> (istream& inS, SV& data);
public:
    SV (int array_size = 1);

    sv& operator [] (int index)
    {   return SVData[index]; }

    SVTYPE& operator = (SVTYPE value)
    {   return SVData[0] = value; }

    operator SVTYPE()
    {   return SVData[0]; }

    int Clones() { return SVWorkers; }
    void Clones(int n) { SVWorkers = n; }

    void DoAll(int lb, int ub, void (*worker) (struct list *),
               doAllCodes patt = svBlock)
    {   SVWorld->mtDoAll(lb, ub, worker, this, patt); }

    void ForEach(int lb, int ub, void (*worker) (struct list *),
                 forEachCodes patt = svHunk)
    {   SVWorld->mtForEach(lb, ub, worker, this, patt); }
private:
    sv *SVData;               // array of 'sv'
    int SVDataSize;           // size of array
    int SVWorkers;            // number of workers for SV
    MultiTask* SVWorld;       // tasking environment
    void SVError(int error_type);
};

#endif // !_sv_h_

// sv.C v(1), single assignment, no multiple stripes
#include "sv.h"

// allow 'sv' to be printed out by using <<
ostream& operator << (ostream& outStream, sv& data)
{   spin_idle(&data.svSync);
    return outStream << data.svData;
}

// allow 'sv' to be read in by using >>
istream& operator >> (istream& inStream, sv& data)
{   if (spin_condlock(&(data.svSync)) != PAR_ACQUIRED)
        data.svError(svBAD);
    return inStream >> data.svData;
}

// overloaded new operator for efficient storage
void* sv::operator new (size_t size, sv* Address)
{   Address += 1;
    Address->svSync.field[0] = (unsigned char) PAR_UNLOCKED;
    return Address;
}

void sv::svError(int error_code)
{   switch (error_code)
    {   case svBAD : fprintf(stderr, "Multiple write to address %d\n", &svData);
            break;
        default : fprintf(stderr, "svError: caught internal error\n");
    }
    fflush(stderr);
    exit(error_code);
}

// constructor used by a user
SV::SV(int array_size)
{   sv *lastUsed;

    // attach tasking environment
    SVWorld = new MultiTask(array_size*sizeof(sv));

    // allocate storage for array of sv
    SVDataSize = array_size;
    if ((SVData = (sv *) malloc(sizeof(sv)*SVDataSize)) == NULL)
        SVError(SVsvALLOC);

    // initialise array of sv if necessary
    if (PAR_UNLOCKED != 0)
    {   lastUsed = SVData - 1;
        for (int i = 0; i < SVDataSize; i++)
            lastUsed = new (lastUsed) sv;
    }
}

void SV::SVError(int error_code)
{   switch (error_code)
    {   case SVsvALLOC : fprintf(stderr, "Failed to allocate SV\n");
            break;
        default : fprintf(stderr, "SVError: caught internal error\n");
    }
    fflush(stderr);
    exit(error_code);
}

[ 2] II sv.h v(2), multiple stripes, one level of history I I explicit type conversion from a 'sv' to a SVTYPE operator SVTYPE() 'ifndef sv h 'define - sv- h { while (svSync.field[O] < svThreshold); return svData[svCurrent]; 'define SVTYPE int } 'include "MultiTask.h" SVTYPE& oldO { return svData[svOld]; } I I level of history storage const int svHistory = 2; I! reset synchronisation tag for next stripe void clear(); I! error code for class SV, failed to get array of sv const int SVsvALLOC = 1; I! setup for parallelism stripe void parallelO I I error code for class sv, multiple write to location { svThreshold svParallel; const int svBAD 999; = = svOld = svCurrent; I I values for svThreshold svCurrent = (svCurrent + 1) 96 svHistory; const int svParallel =1; } const int sVSequential = 0; I! setup for sequential stripe I I internal shared value class never seen by a user void sequential() class sv { svThreshold = sVSequential; } { I I dec. of friend class ostream to overload stream output I! overload 'new' friend ostream& operator « (ostream& outStream, sv& data); void* operator new (size t size) { return (void *) NULL; } I I dec. of friend class istream to overload stream input void* operator new (size= t size, sv* allocAddress); friend istream& operator » (istream& inStream, sv& data); I I overload 'delete' operator with a null operation public: void operator delete (void *ptr, size_ t size) 0 sv() 0 I! null body for constructor private: -sv() Ol! null body for destructor LOCK svSync; II synchronisation for 'sv' I! overload assignment from a 'sv' to a SVTYPE SVTYPE svData[svHistory]; I! actual data values for 'sv' SVTYPE& operator = (SVTYPE value) static int svCurrent; I I index to current 'sv' { svData[svCurrent] = value; static int svOld; I I index to last 'sv' if (spin condlock(&svSync) < svThreshold) static int svThreshold; I I toggle single assignment svError(svBAD); void svError(int error_code); return svData[svCurrent]; }; }

[3 ] II shared value class seen and used by a user II iterators class SV void DoAll(int lb, int ub, void (*worker) (struct list * ), { II dec. of friend class ostream to overload stream output doAlICodes patt = svBlock) friend ostream&: operator « (ostream&: outS, SV&: data) { beginStripe(); { return outS « data.SVData[O]; } SVWorld->mtDoAll(lb, ub, worker, this, patt); endStripe(); II dec. of friend class istream to overload stream input } friend istream&: operator » (istream&: inS, SV &: data) { return inS» data.SVData[O]; } void ForEach(int lb, int ub, void (*worker) (struct list * ), forEachCodes patt =svHunk) public: { beginStripe(); II default parameter allows implicit size of 1 SVWorld->mtForEach(lb, ub, worker, this, patt); SV (int array size =1); endStripeC); } II overload [] operator sv&: operator [] (int index) II cheat routine to start stripe explicitly { return SVData[index]; } void Start(void* worker) { beginStripe(); II overload assignment operator SVWor ld->U nbound( worker, SVW orkers); SVTYPE&: operator =(SVTYPE value) } { return SVData[O] = value; } II cheat routine to end stripe explicitly overload reading by defining type conversion II void EndO operator SVTYPEO { SVWorld->BindO; { return SVData[O]; } endStripeO; II number of workers that an iterator will use } int Clones() private: { return SVWorkers; } sv *SVData; II array of 'sv' void Clones(int n) int SVDataSize; II size of array { SVW orkers ,= n; } int SVWorkers; II number of workers for SV II entry procedure to parallelism stripe (reenterant) MultiTask* SVWorld; II tasking environment void beginStripeO; void SVError(int error_type); II exit procedure for parallelism stripe (reenterant) }; void endStripeC); #endif !_sv_h_

[4 ] //sv.C II constructor used by a user SV::SV(int array_size) #include "sv.h" { sv *last Used; II static initialisations int sv::svCurrent = 0; SVWorid = new MultiTask(array_size*sizeof(sv»; SVDataSize = array_size; int sv::svOld = 0; int sv::svThreshold = sVSequential; if «SVData = (sv *) malloc(sizeof(sv)*SVDataSize» == NULL) SVError(SVsv ALLOC); ostream& operator « (ostream& outStream, sv& data) { while (data.svSync.field[O] < data.svThreshold); if (PAR UNLOCKED!= 0) return outStream « data.svData[data.svCurrent]; } { last Used = SVData - 1; for (int i = 0; i < SVDataSize; i++) istream& operator » (istream& inStream, sv& data) lastUsed = new (lastUsed) sv; { if (spin condlock(&(data.svSync» < data.svThreshold) } data.svError(svBAD); } return inStream » data.svData[data.svCurrent]; } void SV::beginStripe() void sv::clear() { if (SVWorld->Parallel == 0) { svSync.field[O] = PAR_UNLOCKED; } { for (int i = 0; i < SVDataSize; i++) / / overloaded new operator for ef.ficient storage SVData[i].clear(); void* sv::operator new (size_ t size, sv* allocAddress) SVData[O].parallel(); { allocAddress += 1; } allocAddress->clear(); } return allocAddress; } void SV::endStripe() void sv::svError(int error code) { if (SVWorld->Parallel == 0) { switch(error code) - SVData[O].sequential(); { case svBAD : fprintf(stderr, "Multiple write to address } 96d\n", &svData); break; void SV::SVError(int error code) default: fprintf(stderr, "svError: caught internal error\n"); } { switch (error code) - fflush(stderr); { case SVsvALLOC:fprintf(stderr, "Failed to allocate SV\n"); exit(error_code); break; . } default: fprintf(stderr, "SVError: caught internal error\n"); } fflush(stderr); exit(error code); }

[ 5 ] II MultiTask.h / / work allocation codes #ifndef MultiTask h enum forEachCodes { svHunk =0, svPoint = 1, svStream =3 }; enum doAllCodes { svBlock 4, svStride 5 }; #define -='MultiTask-='h-=' = = class SV; #define PAR void II forward reference / / Structure template used to pass parameters to a master thread #include struct masterStruct #include { int start / / lower bound # include int finish / / upper bound # include' int workers / / no of threads to use / / compatibility between threads libraries SV *host / / sv to operate on # ifdef TASKS void (*proc)(struct list *) / / worker body #define malloc share malloc union #endif {doAllCodes stag; forEachCodes dtag; / / hack for Encore C compiler to generate inline spinlocks }; extern "c" { }; # include "parallel.h int cspinlock(LOCK *lock); / / structure for passing parameters to worker thread void spinlock(LOCK *lock); struct list void spinunlock(LOCK *lock); { strtng_ndx index / / starting point (inclusive) } clsing_ndx ceiling II ending point (exclusive) strd ndx stride / / stride between values #define spin_condlock cspinlock SV *host II calling variable #define spin_lock spinlock void **args II additional arguments #define spin_unlock spinunlock }; / / front-ends to iterators class MultiTask #define DoAllBlock(a, b, c) DoAll(a, b, c, svBlock) { friend class SV; #define DoAllStride(a, b, c) DoAll(a, b, c, svStream) private: #define ForEachHunk(a, b, c) ForEach(a, b, c, svHunk) #define ForEachPoint(a, b, c) ForEach(a, b, c, svPoint) MultiTask(int valueSize); / / work distribution parameters void Task(void (*w), int w, struct masterStruct* prms); typedef int strtng ndx; void Unbound(void (*w), int nw); typedef int clsing - ndx; typedef int strd_ndx; void Bind();

[6] void mtForEach(int lb, int ub, void (*w)(struct list *), SV *host, forEachCodes patt = svHunk); void mtDoAll(int lb, int ub, void (*w)(struct list *), SV *host, doAllCodes patt = svBlock); private: static int DataSize ; /I total memory used static int Operational /I tasking status static int Parallel /I stripe status void CleanUp(char *message); }; / / amount of system and spare shared memory for tasking env. extern int UNIVERSE_SIZE; /I macro to allocate shared memory and create tasking env. #define SHARE(X) (void) new MultiTask(UNIVERSE_SIZE = X); #endif !_MultiTask_h_

[ 7 ] II MultiTask.C // staticMaster - Driver thread for the static work distribution. static void staticMaster(struct masterStruct *mstrPrms) #include "MultiTask.h" {int wrkrs = mstrPrms->workers; #include "sv.h" int top = mstrPrms->start; // static member initialisations int ld =(mstrPrms->finish - top + 1) / wrkrs; int UNIVERSE SIZE = 5*1024*1024; int xcss =(mstrPrms->finish - top + 1) % wrkrs; int MultiTask::Operational = 0; SV *hst = mstrPrms->host; int MultiTask::Parallel = 0; int nwthrds =wrkrs - 1; void (*proc)(struct list *) = mstrPrms->proc; // Implementation declarations const int mStack = 1024*20; / / create an array of parameter structures for children const int wStack = 1024*20; struct list *plist; #ifdef TASKS #ifdef TASKS // determines the size of an object in 4-byte words if «plist = (list *) malloc(sizeof(list)*wrkrs» == NULL) #define nw(x) (int) sizeof(x)/sizeof(int) + 1 tidyThreads("Failed to allocate static work list"); const int mtError = -1; #else const int goFailed =-1; plist = new struct list[wrkrs]; #else #endif extern "c" { / / initialise parameter array and fork children # include "thread.h" for (int tc = 0; tc < wrkrs; tc++) } { switch(mstrPrms->stag) const int tP = 2; { case svBlock : #endif { plist[tc].index = top; //******************************************************* plist[tc].ceiling = top += «xcss > tc) ? (ld + 1 ) : (ld»; / / Non member functions of class MultiTask used to implement plist[tc].stride = 1; / / the parallelism - no implicit class data parameter to pass plist[tc].host = hst; break; / / tidyThreads - non member function called to shut down threads } void tidyThreads(char *message) case svStride : { cout < < flush; { plist[tc].index = tc + top; cerr « "MultiTask: " « message « "\n" « flush; plist[tc].ceiling = mstrPrms->finish+ 1; . #ifdef TASKS plist[tc].stride = wrkrs; task_ stope); plist[tc].host = hst; #else break; THREADkil1(THREADcurrent); } #endif default: tidyThreads{"error in work specifier"); break; } };

[8] if (tc < nwthrds) I I structure used to pass information to point processing workers { II lock must be declared as the first field for rapid access #ifdef TASKS struct bBoard if «task_start(proc, wStack, 1, &plist[tc])) == NULL) { LOCK lock I I lock field for bBoard #else int head I I first item to process if (THREADcreate(proc, &plist[tc], 0, ATTACHED, int tail I I last item to process wStack, tP) == NULL) int top I I next item to process #endif SV *host I I host variable { cerr« "MultiTask: thread " « tc; int workers I I no of workers tidyThreads("failed to start thread"); int id II worker id } void (*proc)(struct list *) I I worker code } }; else /I master allocation PAR hunkDriver(struct wBoard *Brd) (*proc)(&plist[tc]); { struct list mylst; } mylst.stride = 1; /I wait until children have terminated mylst.host = Brd->host; #ifdef TASKS do task_join(); { spin lock«LOCK *) Brd); # else if (Brd->workload <= 0) THREAD handle; { spin_unlock«LOCK *) Brd); do break; { handle =THREADjoin(); } } while (handle != NULL && THREADfree(handle»; else #endif } { mylst.index =Brd->start; if «my1st. ceiling = Brd->workload/Brd->workers) == 0) I I structure used to pass information to hunk processing workers mylst.ceiling = 1; I I lock must be declared as the first field for rapid access Brd->workload -= mylst.ceiling; struct wBoard mylst.ceiling += mylst.index; { LOCK lock I I lock field for wBoard Brd->start = mylst.ceiling; int workload I I work remaining spin unlock«LOCK *) Brd); int start I I next item to process (*(Brd->proc»(&mylst); SV *host I I host variable } int workers I I no of workers } void (*proc)(struct list *) I I worker code while (1); }; }

[9] PAR streamDriver(struct wBoard *Brd) static PAR pointDriver{struct bBoard *Brd) { struct list mylst; {int nid = Brd->id; struct list mylst; mylst.stride = 1; mylst.host = Brd->host; mylst.stride ;: 1; do mylst.host = Brd->host; { spin lock«LOCK *) Brd); do if (Brd->start >= Brd->workload) { if (pointAllocate{Brd, &mylst) != 0 ) { spin_unlock«LOCK *) Brd); (*{Brd->proc»{&mylst); break; else } break; else } { mylst.index = Brd->start; while (l); Brd->start += 1; do spin unlock«LOCK *) Brd); { nid = (nid + 1) 96 (Brd->workers); mylSt.ceiling = mylst.index + 1; while (pointAllocate{Brd + (nid - Brd->id), &mylst) !;: 0) (*{Brd->proc»{&mylst); } (*(Brd->proc»{&(mylst»; } } while (nid !;: blckBrd->id); while (I); } } static PAR dynamicMaster(struct masterStruct *Prms) / / Function for allocation of points - it is inlined for speed { / / copy parameters out of masterStruct inline int pointAllocate{struct bBoard* Brd, struct list* prms) int wks = Prms->workers; { spin lock«LOCK *) Brd); int top ;: Prms->start; if (Brd->top < Brd->tail) int wrk = (Prms->finish - top + 1); { prms->index = Brd->top; int ld ;: wrk / wks; Brd->top += 1; int xcss = wrk 96 wks; spin unlock«LOCK *) Brd); int nwthrds;: wks - 1; prms->ceiling;: prms->index+ 1; SV *hst ;: Prms->host; return 1; } void (*proc)(struct wBoard *); else if (Prms->dtag =;: svPoint) { spin_unlock«LOCK *) Brd); { / / create an array of parameter structures for children return 0; struct bBoard *BBrd; } #ifdef TASKS } if «BBrd = (bBoard *) malloc(sizeof(bBoard)* wks» == NULL) tidyThreads("Failed to allocate bBoard");

[ 10] HeIse HeIse BBrd = new struct bBoard[wks]; while (THREADjoin(»; Hendif delete [wks] BBrd; / / initialise parameter array and fork children Hendif for (int tc = 0; tc < wks; tc++) } { else BBrd[tc].top = BBrd[tc].head = top; if (Prms->dtag == svHunk II Prms->dtag == svStream) BBrd[tc]. tail = top += «xcss > tc) ? ( Id + 1 ) : ( ld »; { Hifdef TASKS Hifdef TASKS struct wBoard *WBrd; if (spin init(&(BBrd[tc].lock),P AR UNLOCKED) == NULL) tidYThreads("failed to initialise lock"); if «WBrd = (wBoard *) malloc(sizeof(struct wBoard»)==NULL) Hendif tidyThreads("Failed to allocate wBoard"); BBrd[tc].proc = Prms->proc; if (spin init(&(WBrd-> lock), PAR UNLOCKED) == NULL) BBrd[tc].id = tc; tidYThreads("failed to create lock"); BBrd[tc]. workers = wks; HeIse BBrd[tc].host = hst; struct wBoard* WBrd = new wBoard; if (tc < nwthrds) Hendif { WBrd->proc = Prms->proc; Hifdef TASKS WBrd->start = top; if «task_start(pointDriver, wStack, 1,&BBrd[tc]» == NULL) WBrd->host = hst; HeIse WBrd->workers = wks; if (THREADcreate(pointDriver, &(BBrd[tc]),O, ATTACHED, wStack, tP) == NULL) if (Prms->dtag == svHunk) Hendif { WBrd-> workload = wrk; { cerr« "MultiTask: thread" « tc; proc = hunkDriver; tidyThreads("failed to start thread"); } } else / / Prms->dtag == svStream } { WBrd->workload = Prms->finish+1; else / / master allocation proc = stream Driver; pointDriver(&BBrd[tc]); } } for (int tc = 0; tc < nwthrds; tc++) / / wait until children have terminated and delete params Hifdef TASKS Hifdef TASKS if «task start(proc, wStack, 1, WBrd» == NULL) task.JoinO; HeIse share_ free(BBrd); if (THREADcreate(proc, WBrd, 0, ATTACHED, wStack, \

[ 11 ] tP) == NULL) # ifdef TASKS #endif 1/ Procedure Cleanup the forced exit function { cerr« "MultiTask: thread II « tc; void MultiTask::CleanUp(char *message) tidyThreads("failed to start thread"); { cout« flush; } cerr « "MultiTask: II « message « "\n" « flush; exit(mtError); II do own allocation of work } (*proc)(WBrd); #endif #ifdef TASKS void MultiTask::Task(void (*w), int p, struct masterStruct* ps) II wait until children have terminated { if (Parallel == 0) task~oin(); { Parallel = 1; share_ free(WBrd); #ifdef TASKS #else if «task init(DataSize, p, w, mStack, nw(ps), ps» while (THREADjoin()); == goFailed) delete WBrd; CleanUp("Failed to create dynamic master thread"); #endif } #else THREADgo(p, DataSize, w, ps, sizeof(ps), mStack, tP); else #endif tidyThreads("dynamicMaster invalid work specifier"); } Parallel = 0; } //******************************************************* else I I Member functions of class MultiTask { #ifdef TASKS Constructor for class MultiTask I I Parallel += 1; Mul t iTask:: Mul tiTask(int valueSize) if (ps == NULL) { if (!Operational) { { DataSize = UNIVERSE SIZE + valueSize; if (task start(w, wStack, 0) == NULL) #ifdef TASKS - tidyThreads("failed to start sole thread\n"); if (malloc init(DataSize) == -1) } CleanUp(IIFailed to allocate shared memoryll); else #endif if (task_start(w, wStack, nw(ps), ps) == NULL) Operational = 1; } #else if (THREADcreate(w, ps, 0, ATTACHED, wStack, tP) } == NULL) #endif tidyThreads("failed to start sole thread\n");

[ 12] } / / Procedure DoAll the static partitioning procedure } void MultiTask::mtDoAll(int Ib, int ub, void void MultiTask::Unbound(void (*workercode), int w) (*workercode)(struct list *), SV *host, doAllCodes patt) { Task(workercode, w+l, NULL); } { / / Allocated on stack but no problems should arise - not shared struct masterStruct mstrStrct; void MultiTask::BindO { mstrStrct.start = Ib; mstrStrct.finish ub; # ifdef TASKS = mstrStrct.workers host->Clones(); task join(); = mstrStrct.proc workercode; #else = mstrStrct.host host; while(THREAD---ioin(}); = mstrStrct.stag patt; #endif = } Task(staticMaster, host->ClonesO, &mstrStrct); } / / Procedure Foreach the dynamic paritioning procedure void MultiTask::mtForEach(int lb, int ub, void (*workercode)(struct list *), SV *host, forEachCodes patt) { / / Allocated on stack but no problems should arise - not shared struct masterStruct mstrStrct; mstrStrct.start = Ib; mstrStrct.finish = ub; mstrStrct. workers = host->ClonesO; mstrStrct.proc = workercode; mstrStrct.host =host; mstrStrct.dtag = patt; Task(dynamicMaster, host-> ClonesO, & mstrStrct); }

// SVStream.h, implements single writer multiple reader
#include "sv.h"

// actual data elements in stream
class SVStreamElement
{ public:
    SVStreamElement()
    {   next = (SVStreamElement *) NULL; }
    ~SVStreamElement() { }

    SVStreamElement* next;
    SV data;
};

// method by which a user can read stream elements
class SVSHandle
{   // overload << operator to allow 'cout' to work
    friend ostream& operator << (ostream& outS, SVSHandle& s)
    {   s.current = s.ptr;
        while (s.ptr == (SVStreamElement*) NULL);
        s.ptr = s.ptr->next;
        return outS << s.current->data;
    }
public:
    SVSHandle(SVStreamElement* head)
    {   ptr = head; }
    ~SVSHandle() { }

    // type conversion to SVTYPE (reading operation)
    operator SVTYPE()
    {   current = ptr;
        while (ptr == (SVStreamElement*) NULL);
        ptr = ptr->next;
        return current->data;
    }
private:
    SVStreamElement* ptr;
    SVStreamElement* current;
};

// stream class used by a user
class SVStream
{   // overload >> to allow 'cin' to work
    friend istream& operator >> (istream& inS, SVStream& s)
    {   s.tmp = s.tail;
        s.tail = s.tail->next;
        s.tail = new SVStreamElement();
        s.tmp->next = s.tail;
        return inS >> s.tmp->data;
    }
public:
    SVStream()
    {   head = new SVStreamElement();
        tail = head;
    }
    ~SVStream()
    {   while (head != (SVStreamElement *) NULL)
        {   tmp = head;
            head = head->next;
            delete tmp;
        }
    }

    // overload assignment operation
    SV& operator = (SVTYPE value)
    {   tmp = tail;
        tail = tail->next;
        tail = new SVStreamElement();
        tmp->next = tail;
        tmp->data = value;
        return tmp->data;
    }

    // type conversion to SVSHandle (reading operation)
    operator SVSHandle()
    {   return *(new SVSHandle(head)); }
private:
    SVStreamElement* head;
    SVStreamElement* tail;
    SVStreamElement* tmp;
};

// tStream.C - sample program using SVStream
#include "SVStream.h"

int main()
{   int n = 10;
    SVStream X;
    SVSHandle XValues = X;

    for (int i = 0; i < n; i++)
        cin >> X;
    for (int j = 0; j < n; j++)
        cout << XValues << " ";
    cout << "\n";
}
