
Technische Universität München
Institut für Informatik
Lehrstuhl für Rechnertechnik und Rechnerorganisation

BMDFM: A Hybrid Dataflow Runtime Parallelization Environment for Shared Memory Multiprocessors

Thesis in Computer Engineering

Oleksandr Pochayevets

Full reprint of the dissertation approved by the Fakultät für Informatik of the Technische Universität München for the award of the academic degree of

Doktor der Naturwissenschaften (Dr. rer. nat.).

Chairman:

Univ.-Prof. Dr. Ernst Mayr

Examiners of the dissertation:

1. Univ.-Prof. Dr. Arndt Bode

2. Univ.-Prof. Dr. Hans Michael Gerndt

The dissertation was submitted to the Technische Universität München on 21.07.2005 and accepted by the Fakultät für Informatik on 01.02.2006.


Abstract

Nowadays parallel shared memory symmetric multiprocessors (SMP) are complex machines, where a large number of architectural aspects have to be addressed simultaneously in order to achieve high performance. The quick evolution of parallel machines has been followed by the evolution of parallel execution environments. An effective parallel environment must be high-level enough to be easy for the programmer to use, yet map well to the underlying architecture for efficient execution.

A recent general methodology of sequential code parallelization for SMP relies on compile-time methods, but a compiler can apply them only when the dependencies are simple and clear. If dependencies are complex, compilers may not be able to suggest a different parallel execution order. Compile-time optimizations cannot be applied to situations where the time it takes to complete an operation varies at runtime, so the programmers have to synchronize the units of parallelism in their application programs explicitly. Compilers can also perform only limited inter-procedural and cross-conditional optimizations. Directive-based parallelism for SMP, supported by a fork-join runtime library, has disadvantages due to the necessity of expressing parallelism explicitly, insufficient portability, granularity that is too fine, and idle time waiting for the slowest slave threads. On the other hand, the dataflow computational model presumes implicit exploitation of parallelism but lacks a natural software development methodology and incurs dynamic scheduling overhead. These were the trends we were interested in when we started the development.

To complement existing compiler-optimization methods we propose a programming model and a runtime system called BMDFM (Binary Modular DataFlow Machine), a novel hybrid parallel environment for SMP that creates a data-dependence graph and exploits the parallelism of user application programs at run time. This thesis describes the design and provides a detailed analysis of BMDFM, which uses a dataflow runtime engine instead of a plain fork-join runtime library, thus providing transparent dataflow semantics at the top virtual machine level. Our hybrid approach eliminates the disadvantages of the compile-time methods, the directive-based paradigm and the dataflow computational model. It is portable and has already been implemented on a set of available shared memory symmetric multiprocessors. The transparent dataflow semantics paradigm does not require parallelization and synchronization directives; the BMDFM runtime system shields the end-users from these details. A tunable grain of parallelism provides efficient performance.

BMDFM has been ported to and evaluated on most available SMP platforms. The evaluation of BMDFM in this thesis was done on a POWER4 IBM p690 SMP machine. The evaluation consisted of several different types of experiments. We evaluated the overhead introduced by the execution environment and the performance obtained running both standard numerical applications and non-trivial applications based on adaptive algorithms.


Table of Contents

Abstract...... iii

Table of Contents ...... v

List of Figures...... ix

List of Tables ...... xi

Acknowledgements ...... xiii

Chapter 1 Introduction ...... 1
1.1 Motivation ...... 1
1.2 Contributions ...... 2
1.3 Thesis Structure ...... 3

Chapter 2 State of the Art ...... 5
2.1 Overview ...... 5
2.2 Target SMP Hardware ...... 5
2.3 Directive-Based Parallelization and Fork-Join Paradigm for SMP ...... 10
2.4 Dataflow Computation Model ...... 15
2.5 Thread-Level Speculations ...... 18
2.6 Software Dynamic Parallelization ...... 20
2.7 Summary ...... 21

Chapter 3 BMDFM Architectural Overview ...... 23
3.1 Overview ...... 23
3.2 Basic Concept ...... 23
3.3 Multithreaded Architecture ...... 24
3.4 Static Scheduler ...... 26
3.5 Dynamic Scheduler ...... 26
3.6 Programming Model ...... 30
3.7 Virtual Machine Language and C Interface ...... 31
3.8 Workflow for Applications ...... 40
3.9 Running Applications on a Single Threaded Engine ...... 41
3.10 Running Applications Multithreadedly ...... 42
3.11 Summary ...... 45

Chapter 4 Dynamic Scheduling Subsystem ...... 47


4.1 Overview ...... 47
4.2 Inter Process Synchronization ...... 47
4.3 Shared Memory Pool ...... 52
4.4 Non-Dead-Locking Policy ...... 55
4.5 Inter Process Communication ...... 57
4.6 Task Connection Zone ...... 61
4.7 I/O Ring Buffer Ports ...... 65
4.8 Data Buffer ...... 68
4.9 Operation Queue ...... 72
4.10 IORBP Scheduling Process ...... 74
4.11 OQ Scheduling Process ...... 78
4.12 CPU Executing/Scheduling Process ...... 81
4.13 Complexity of the Dynamic Scheduling Subsystem ...... 83
4.14 Summary ...... 85

Chapter 5 Static Scheduling Subsystem ...... 87
5.1 Overview ...... 87
5.2 Parallel Dataflow Code Style Restrictions ...... 88
5.3 Static and Dynamic Type Casting ...... 89
5.4 Code Reorganization ...... 93
5.5 Generation of Marshaled Clusters ...... 97
5.6 Uploading of Marshaled Clusters ...... 99
5.7 Summary ...... 108

Chapter 6 Transparent Dataflow Semantics ...... 111
6.1 Overview ...... 111
6.2 Conventional Programming Paradigm ...... 111
6.3 Synchronization of Asynchronous Coarse-Grain and Fine-Grain Functions ...... 113
6.4 Ordering Non-Standard Stream in Out-of-Order Processing ...... 116
6.5 Speculative Dataflow Processing ...... 117
6.6 Summary ...... 119

Chapter 7 Evaluation ...... 121
7.1 Overview ...... 121
7.2 Test Environment ...... 121
7.3 NAS Parallel Benchmarks ...... 124
7.4 Irregular Test ...... 129
7.5 Summary ...... 134

Chapter 8 Conclusion ...... 137
8.1 Overall Summary ...... 137


8.2 Future Directions...... 138

List of Defined Terms ...... 139

References...... 143



List of Figures

Figure 2-1. Compaq/DEC bus architecture ...... 6
Figure 2-2. SUN Enterprise 10000 architecture ...... 7
Figure 2-3. Origin2000 node and external connections ...... 8
Figure 2-4. IBM POWER4 Multi-Chip Module (MCM) ...... 9
Figure 2-5. General methodology of sequential code parallelization for SMP ...... 11
Figure 2-6. Speculation loop transformations ...... 19
Figure 2-7. Runtime dependence-driven execution ...... 20
Figure 3-1. Basic concept of BMDFM ...... 23
Figure 3-2. BMDFM architecture ...... 25
Figure 3-3. Static scheduler ...... 26
Figure 3-4. Dynamic scheduler ...... 27
Figure 3-5. Typical configuration for the 8-way SMP machine ...... 28
Figure 3-6. BMDFM user programming models ...... 30
Figure 3-7. Byte code generation ...... 36
Figure 3-8. A user defined C function ...... 39
Figure 3-9. Application life cycle ...... 40
Figure 3-10. Single threaded engine workflow ...... 41
Figure 3-11. BMDFM server unit workflow ...... 43
Figure 3-12. External task unit workflow ...... 44
Figure 3-13. Multithreaded engine workflow ...... 45
Figure 4-1. Basic synchronization paradigm ...... 48
Figure 4-2. Interleaved semaphore distribution in the shared memory pool ...... 49
Figure 4-3. Reduced number of blocking semaphores for the shared zones ...... 50
Figure 4-4. Architecture of the shared memory pool ...... 53
Figure 4-5. One-way object locking policy ...... 56
Figure 4-6. UWPR and PWPR channels ...... 58
Figure 4-7. Communication between the scheduling processes ...... 59
Figure 4-8. Task connection zone ...... 62
Figure 4-9. TCZ output queue ...... 64
Figure 4-10. IORBP cell ...... 65
Figure 4-11. Structure of marshaled cluster ...... 67
Figure 4-12. DB cell ...... 69
Figure 4-13. Multiple context data structuring ...... 71
Figure 4-14. OQ cell ...... 73
Figure 4-15. Speculative tagging of instructions ...... 79
Figure 5-1. Variable initialization within potentially unreachable code ...... 88
Figure 5-2. Static casting of data types ...... 90
Figure 5-3. Physical meaning of justification ...... 91
Figure 5-4. Physical meaning of casting ...... 93
Figure 5-5. Preprocessing of the UDF declaration and UDF invocation ...... 94
Figure 5-6. Preprocessing of the output functions ...... 94
Figure 5-7. Preprocessing of the input functions ...... 95


Figure 5-8. Preprocessing of conditionals in the global and local scopes ...... 95
Figure 5-9. For-loop preprocessing ...... 96
Figure 5-10. While-loop preprocessing ...... 96
Figure 5-11. Preprocessing of recursion ...... 97
Figure 5-12. Generation of marshaled cluster ...... 98
Figure 5-13. Dataflow engine controlled by the front-end VM ...... 100
Figure 5-14. Control sequence template of local UDF invocation ...... 103
Figure 5-15. Control sequence template of for-loop ...... 105
Figure 5-16. Control sequence template of while-loop ...... 108
Figure 7-1. Speedups for NAS PB 2.3 (classes A, C) on 8-way SMP ...... 127
Figure 7-2. Speedups for irregular test ...... 134


List of Tables

Table 3-1. BMDFM virtual machine language ...... 35
Table 4-1. Function interface of the shared memory pool ...... 54
Table 4-2. Interface to the multithreaded dataflow engine ...... 63
Table 4-3. Program complexity of the dynamic scheduler ...... 84
Table 5-1. Register architecture of the front-end control VM ...... 101
Table 5-2. Instruction set matrix of the front-end control VM ...... 102
Table 7-1. Execution times and speedups for NAS PB 2.3 (classes A, C) on 8-way SMP ...... 127
Table 7-2. Number of instructions and scheduling overhead for NAS PB 2.3 (classes A, C) on 8-way SMP ...... 128
Table 7-3. Execution times and speedups for irregular test ...... 133



Acknowledgements

Many people have made this work possible. I would like to express my deep gratitude to all of them for being there, working with me and helping me.

First of all, I would like to thank my doctoral advisers, Prof. Dr. Herbert Eichele (Georg-Simon-Ohm University of Applied Sciences Nuremberg), Prof. Dr. Arndt Bode (Technical University Munich), Prof. Dr. Michael Gerndt (Technical University Munich) and Prof. Dr. Fridolin Hofmann (Friedrich-Alexander-University Erlangen-Nuremberg) for their encouragement, patience, guidance and valuable support throughout the preparation of this thesis.

I wish to thank the staff of RZ TU Dresden and LRZ Munich for providing the facility and an excellent research environment to conduct my research.

Furthermore, I would like to thank my colleagues. They spent many hours of their spare time to read my thesis and provide numerous comments and ideas on how to improve this thesis.

Last but not least, I would like to thank my family and friends for their support throughout the entire process.



Chapter 1 Introduction

1.1 Motivation

A recent general methodology of sequential code parallelization for SMP relies on compile-time methods. However, a compiler can apply them only when the dependencies are simple and clear; if dependencies are complex, compilers may not be able to suggest a different parallel execution order.

Compilers cannot apply many interesting optimizations that depend on the knowledge of dynamic information. Compile-time optimizations cannot be applied to situations where the time it takes to complete an operation varies at runtime, which is a common case on cache-based and parallel computers. A user has to explicitly state the interaction and synchronization between the units of parallelism.

In general, the weaknesses with compile-time strategies can be described as follows:

• First, when running in parallel there are many operations that take a non-deterministic amount of time, making it difficult to know exactly when certain pieces of data will become available. In contrast, a runtime strategy allows a degree of adaptivity to tolerate not just large latencies, but varying or indeterminate latencies.

• Second, in a multi-user mode other people’s codes can use up resources or slow down a part of the computation in a way that the compiler cannot account for. For example, there can be a severe degradation of performance in multi-programmed environments due to barrier synchronization.

• Finally, compilers can perform only limited inter-procedural and cross-conditional optimizations because they often cannot determine which way a conditional will go or cannot optimize across a function call.

To complement existing compiler-optimization methods we propose a programming model and a runtime system called BMDFM (Binary Modular DataFlow Machine) that creates a data-dependence graph and exploits parallelism of a user application program at run time.

Another important type of overhead is the amount of time required for the programmers to parallelize and synchronize the units of parallelism in their application programs. The BMDFM runtime system shields the end-users from these details, allowing them to make more efficient use of their expertise.


1.2 Contributions

The main contribution of this thesis is the design and implementation of the BMDFM runtime system: a complete parallelization environment, resulting in an efficient product that is competitive with widely-used parallel execution environments. A careful combination of the SMP and dataflow paradigms forms the basis of the proposed architecture.

The highlights are summarized below:

• Definition and design of the BMDFM hybrid architecture that combines von Neumann and dataflow computational models running on top of commodity SMP systems. In the proposed architecture the dataflow paradigm is used for the MIMD runtime parallel engine, which is controlled by a virtual machine built in the von Neumann manner. Analysis of this design has shown that our approach is efficient and applicable both in the area of numerical high performance computations and for dynamic adaptive algorithms, hiding parallelization and synchronization details from the end-users.

• Definition and design of the context-dependent data structure in the shared memory pool. This allows dataflow processing of loop iterations in parallel, storing each iteration’s data dynamically under a unique context. The proposed data structure uses SVR4 IPC blocking semaphores distributed across the shared memory pool, which is efficient and portable across commodity SMPs.

• Definition and design of the speculative tagging dynamic scheduling algorithm that is used to tag ready instructions for execution in the runtime dataflow engine. This significantly reduces the dynamic scheduling overhead.

• Definition and design of the multithreaded marshaled clustering of the data loaded from the control virtual machine into the dataflow runtime engine. This approach avoids a bottleneck that arises when the parallel dataflow machine is fed dynamically from the single threaded control virtual machine. In the proposed scheme the marshaled clusters are prepared statically during the compilation stage, which does not cause additional runtime overhead for marshaling.

• Definition and design of transparent dataflow semantics at the top virtual machine level that shields the end-users from the parallelization and synchronization details. No special parallel directives are required. The transparent dataflow semantics level can be used both for conventional manual programming and as the target level for the code generators/translators.


1.3 Thesis Structure

This thesis is organized into 8 chapters. Chapter 2 analyzes existing SMP hardware and various parallelization approaches in the field of parallel SMP computing. We give an overview and comparison of directive-based parallelization methods and the fork-join paradigm. We also analyze related work on dataflow computation models and some alternative projects on runtime SMP parallelization, such as hardware Thread Level Speculation (TLS) and some software dynamic parallelization schemes.

Chapter 3 discusses the architecture of the BMDFM system, the functionality of its units, the programming model and its applicability.

Chapter 4 describes the dataflow runtime engine in detail. We analyze the internal data structures, how they are mapped into the shared memory pool, and the dynamic scheduling algorithms.

Chapter 5 presents the static part of the BMDFM system. Here, we present the algorithms running during the compilation phase and the architecture of the multithreaded marshaled clustering.

Chapter 6 gives an overview of the transparent dataflow semantics paradigm introduced in this thesis. We show that a conventional programming approach can be automatically mapped into the proposed dataflow hybrid architecture.

After discussing all the proposals of this thesis, Chapter 7 presents an evaluation of the complete BMDFM parallelization environment running on a POWER4 IBM p690 SMP machine. We evaluate the overhead introduced by the execution environment and the performance obtained running both standard numerical applications and non-trivial adaptive algorithm based applications.

Finally, Chapter 8 contains the overall conclusions of this thesis and work planned for the future.



Chapter 2 State of the Art

2.1 Overview

This chapter provides an analysis of recent parallelization technologies for SMP. The following important items are considered:

• We analyze the typical SMP architectures existing on the market, paying attention to CPU interconnections and memory latencies. We choose the architecture that is most appropriate for our tests.

• We provide a detailed description of the directive-based parallelization and fork-join paradigm for SMP as the main technology in this area. We discuss various approaches and highlight their disadvantages.

• We look at related work and alternative technologies for efficient parallelization. We consider the dataflow computation model, thread-level speculation and software dynamic parallelization to be closest to our work. We analyze all of them to avoid their weaknesses.

2.2 Target SMP Hardware

Parallel processing increases the computational power of computers ranging from low-end workstations and even personal computers to big mainframes containing a large number of processing elements. Small shared-memory multiprocessor (SMP) machines are available from a great number of computer makers: Silicon Graphics, SUN Microsystems, Compaq/DEC, Hewlett Packard, Intel, Sequent, Data General, etc. Some of them also build big mainframes.

The larger the machines, the more complex they are. Complexity comes partially from the fact that the path from the processors to the memory becomes a bottleneck when more than 10-12 processors are put together. Poor scalability due to the memory subsystem bottleneck causes higher memory access latencies and poor performance when the number of processors is increased. Several solutions have been adopted to solve the problem of scalability. They include an improved path from the processors to the memory through pipelined memory architectures in bus-based and NUMA computers. As a result, memory accesses have different latencies. The resulting architecture is more complex and difficult to manage to achieve high performance. Along with varying data access latencies, other elements that make parallel processing difficult in current hardware systems are the architecture of current super-scalar processors, improved synchronization mechanisms and support for relaxed memory models.

Examples of SMP nodes are the Compaq/DEC AlphaServer GS140 [50, 51, 52, 53, 54, 55, 56], the SUN Ultra Enterprise 10000 server (Starfire) [33, 138, 139, 140], the SGI Origin2000 ccNUMA server [38, 39, 99, 104] and the POWER4 IBM p690 SMP server [82, 105, 119]. These machines are described next.

Compaq/DEC AlphaServer GS140

The Compaq/DEC AlphaServer GS140 is based on the Alpha 21264 microprocessor. It supports up to 14 processors. Each Alpha processor has up to 4 MB of external third-level cache. Processors are connected through a system bus supporting a maximum of 28 GB of main memory.

Figure 2-1 shows the architecture of this system. The system bus supports the connection of a maximum of 9 system boards. This limitation suggests that the small bus size required for performance strongly constrains the design of the computer. There are three types of system boards: processor boards, memory boards and I/O boards. A processor board may contain up to two Alpha processors. A memory board can accommodate up to 4 GB of memory. This means that a 14-processor system is limited to 4 GB of memory, due to the maximum of 9 system boards connected to the system bus.

Figure 2-1. Compaq/DEC bus architecture

The SUN Ultra Enterprise 10000 (Starfire)

The SUN Ultra Enterprise 10000 server supports up to 64 processors and 64 GB of main memory. It is based on the Ultra SPARC processor and Gigaplane-XB interconnect technology. Each processor comes with 4 MB of external secondary cache.


The Enterprise 10000 server accommodates a maximum of 16 system boards. Each system board can be configured with up to 4 processors and 4 GB of memory. System boards are connected through the Gigaplane-XB interconnect, a crossbar designed specifically for this machine. It uses a packet-switched scheme with separate address and data paths. The reason is that data is usually communicated point-to-point, while addresses have to be distributed simultaneously throughout the system for the snooping protocol. The main structure of the system is shown in Figure 2-2. Data transfers are done through a 16x16 crossbar allowing communication between any two system boards at the same time. Contention arises when a system board is the origin or the destination of two or more data transfers in the same bus cycle. In this case, only one of the requests can be satisfied and the rest must wait for an available bus cycle. Addresses are communicated to all boards through four independent address busses. Each bus covers 1/4 of the total address space.

With these characteristics, the data crossbar has a latency of 468 ns. The latency of the SUN Ultra 6000, a smaller machine supporting 24 processors, is less than half of that (216 ns). The problem is that, in the Enterprise 10000, both local and remote memory references suffer the latency penalty.

Figure 2-2. SUN Enterprise 10000 architecture

SGI Origin2000 ccNUMA server

The SGI Origin2000 server supports up to 1024 processors and one TB of main memory. It is based on the MIPS R10000 processor (with 4 MB of external secondary cache) and a distributed shared memory (DSM) architecture. The DSM architecture implements directory-based cache coherence, removing the broadcast bottleneck that prevents scalability in snoopy bus-based SMP implementations.

BMDFM State of the Art 8 Target SMP Hardware

The basic Origin2000 node is shown in Figure 2-3. Each node contains two R10000 processors, with their respective secondary cache memories. The central element in each node is the HUB, which connects both processors to the memory, I/O and the interconnection network (interconnection fabric, in SGI terminology). Each node can accommodate up to 64 GB of main memory and its corresponding directory memory. The global shared address space is distributed among the nodes in slices. Node 0 contains addresses in the lower range from 0 to N-1, node 1 follows, containing addresses from N to 2*N-1, and so on.
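As an illustration, the address-to-node mapping of such a sliced global address space can be sketched in C as follows; the function name and the assumption of a uniform slice of n_bytes_per_node bytes per node are ours, not SGI's:

    /* Sketch of the sliced global address space: assuming a uniform
     * slice of n bytes per node, node k owns addresses [k*n, (k+1)*n).
     * Names and types are illustrative only. */
    unsigned long node_of(unsigned long addr, unsigned long n_bytes_per_node)
    {
        return addr / n_bytes_per_node;  /* node 0: [0, n), node 1: [n, 2n), ... */
    }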

Figure 2-3. Origin2000 node and external connections

The DSM architecture provides global addressability from any processor to all memory and I/O. The Origin2000 system uses a directory-based memory coherence protocol. Each cache line in memory has an associated directory entry. Directory memory is located near main memory in the same module. Each entry contains information about the associated cache line: its system-wide caching state and a bit-vector pointing to the caches that have copies of the cache line. Memory can determine which caches need to be involved in a given memory operation in order to maintain coherence.
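A directory entry of the kind described above might be sketched in C as follows; the field widths are hypothetical and do not reflect the actual Origin2000 layout:

    /* Illustrative directory entry: a system-wide caching state plus a
     * presence bit-vector with one bit per cache that may hold a copy. */
    typedef struct {
        unsigned char      state;    /* system-wide caching state of the line */
        unsigned long long sharers;  /* bit k set: cache k holds a copy */
    } dir_entry_t;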

Processor nodes are attached to the interconnection routers, which provide low latency communication. Routers link the HUB inside the basic nodes to the CrayLink Interconnect. Each router has six external full-duplex connections, which are managed internally by a full six-way non-blocking crossbar switch. Machines with 128 processors use meta-routers (routers connecting routers) to connect four 32-processor groups. Meta-routers are replaced by 5-D hypercubes to reach up to 1024 processors. It is remarkable that, in a 64-processor machine, the average latency of a remote memory access is only 2.7 times the latency of a local memory access.

POWER4 IBM p690 SMP server

The pSeries 690 system architecture is implemented through the Central Electronic Complex (CEC). Logically, the CEC consists of the processors, pervasive functions and the storage subsystem. Physically, the CEC consists of the POWER4 chip, the Level 3 (L3) cache chip and the memory controller chip, which controls main memory.

At the heart of the CEC is the POWER4 chip [49], which contains: either one or two microprocessors; the L2 cache running at the same speed as the microprocessors; the microprocessor interface unit, which is the interface for each microprocessor to the rest of the system; the directory and cache controller for the L3 cache; the fabric bus controller, which is at the heart of the system’s interconnection design; and a GX bus controller that enables I/O devices to connect to the CEC.

Figure 2-4. IBM POWER4 Multi-Chip Module (MCM)

The second component of the POWER4 CEC is the L3 cache, comprised of two 16MB eDRAM chips mounted on a separate module. Each POWER4 chip controls an L3 cache, connected between the POWER4 chip and the memory controller chip. The third component of the POWER4 CEC is the memory controller chip. It is connected to the L3 cache on one side and to synchronous memory interface (SMI) chips on the other to control main memory. Each memory controller chip can have one or two memory data ports and can support up to 16GB of memory. There is a separate memory controller for each POWER4 chip. Two memory controllers are packaged on each memory card, and a maximum of two memory cards can be attached to each MCM. In all system configurations, all memory and all I/O is transparently accessible to all processors. The basic building block for pSeries 690 systems, the MCM, is shown in Figure 2-4. Each MCM contains four interconnected POWER4 chips, each with its own off-chip L3 cache. The pSeries 690 can contain up to 4 MCMs. Each MCM comprises either a 4-way or an 8-way symmetric multiprocessor (SMP) unit, depending on whether one or two microprocessors are present on each of the POWER4 chips.

The POWER4 microprocessor was specially designed to support an SMP system. As a result, the IBM SMP architecture based on the POWER4 chip is the most tightly coupled SMP architecture nowadays.

At the beginning of the year 2003, the US Department of Energy (DOE) created a development program, "Creating Science-Driven Computer Architecture: A New Path to Scientific Leadership" [107], which aims to restore American leadership in scientific computing and relies on IBM's evolving POWER4, POWER5 and POWER6 line.

In this thesis, we used the 8-way POWER4 IBM p690 SMP server for the evaluation of our proposals.

2.3 Directive-Based Parallelization and Fork-Join Paradigm for SMP

A recent general methodology of sequential code parallelization for Shared Memory Symmetric Multiprocessors (SMP) is the directive-based parallelization paradigm [5, 17, 20, 126, 127, 177]. A sequential program is first analyzed to discover loops, which are the main source of parallelism, and any dependencies among different loop iterations, which prevent parallelization of those loops for the sake of correctness. Based on this analysis it may be possible to modify the code to remove dependencies. Parallelism is expressed simply by inserting an appropriate compiler directive before a loop. As Figure 2-5 indicates, this is essentially an iterative process of modifying the loop nesting in the sequential code until most of the computationally expensive loops are parallelized. Finally, the parallelized code (i.e. sequential code with compiler directives) is compiled and linked with appropriate runtime libraries to be executed on the target system.


Figure 2-5. General methodology of sequential code parallelization for SMP
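To make the methodology concrete, the following minimal C sketch shows how a single directive parallelizes a loop; OpenMP syntax is used as a representative directive dialect, and the function, array names and size are illustrative only:

    #include <omp.h>
    #define N 1000000

    void scale(double *a, const double *b, double factor)
    {
        int i;
        /* the directive is the only change to the sequential code: the
           runtime library forks slave threads and distributes iterations */
        #pragma omp parallel for
        for (i = 0; i < N; i++)
            a[i] = factor * b[i];
        /* implicit join: the master continues sequentially only after
           all slave threads have finished their iterations */
    }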

Directive-based parallelism is supported by a runtime library, which implements a fork-join paradigm of parallelism. A master thread initiates the program, creates multiple slave threads, schedules the iterations of parallelized loops on all threads including itself, waits for the completion of a parallel loop by all the slave threads and continues to execute the sequential parts of the program. Slave threads must wait for work (i.e. for parts of subsequent parallel loops) while the master thread is executing a sequential portion of the code.

Use of directive-based parallelism is limited due to portability issues. Almost every vendor of a shared memory computer system offers its own extension of the Fortran or C language via parallelization directives. Usually these directives are not portable from one shared memory computer system to another. The OpenMP Architecture Review Board spent a lot of effort to standardize a collection of compiler directives, library functions and environment variables that is known as the OpenMP Application Program Interface (API) [44, 117] and is used to specify shared memory parallelism in programs. Although the OpenMP model can be useful for solving a variety of problems, it is somewhat tailored for large array-based applications. In OpenMP any unsynchronized calls to output functions may result in output in which data written by different threads appears in non-deterministic order. Similarly, unsynchronized calls to input functions may read data in non-deterministic order. According to OpenMP the user explicitly specifies the actions to be taken by the compiler and runtime system in order to execute the program in parallel. OpenMP implementations do not check for dependencies, conflicts, deadlocks, race conditions or other problems which result in incorrect program execution. The user is responsible for ensuring that an application using the OpenMP API constructs executes correctly.
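The non-deterministic output ordering mentioned above is easy to demonstrate; in the following sketch the lines printed by different threads may interleave in a different order on every run:

    #include <omp.h>
    #include <stdio.h>

    int main(void)
    {
        #pragma omp parallel   /* unsynchronized output calls */
        printf("hello from thread %d\n", omp_get_thread_num());
        return 0;
    }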

A runtime library which implements a fork-join paradigm has the drawback of idle time, in which program execution has to wait for the completion of the slowest thread. Even if a master thread initializes an equal chunk of iterations for each slave thread, the execution times will differ due to the Non-Uniform Memory Access (NUMA) of modern SMP architectures. To avoid wasting idle time, a mechanism of guided scheduling of threads was introduced [130]. With guided scheduling the iterations are assigned to threads in chunks with approximately exponentially decreasing sizes. When a thread finishes its assigned chunk of iterations, it is dynamically assigned another chunk until no more remain. The drawback of this approach is the considerably bigger overhead spent on dynamic scheduling of the threads.
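In OpenMP terms, guided scheduling is requested with a schedule clause, as in the following sketch; the work() routine is hypothetical, and each idle thread is handed a chunk of roughly remaining_iterations/num_threads iterations, so chunk sizes decrease approximately exponentially:

    #include <omp.h>
    #define N 100000

    /* hypothetical iteration whose execution time varies with i */
    static void work(int i)
    {
        volatile double x = 0.0;
        int k;
        for (k = 0; k < i % 1000; k++)
            x += k * 0.5;
    }

    void irregular_loop(void)
    {
        int i;
        #pragma omp parallel for schedule(guided)
        for (i = 0; i < N; i++)
            work(i);
    }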

One more issue with directive-based parallelization is that multiprocessors present more difficult challenges to parallelizing compilers than the vector machines which were initially their targets. Effective use of a vector architecture involves parallelizing repeated arithmetic operations on large data streams (e.g. innermost loops in array-oriented programs). On a multiprocessor, however, parallelizing innermost loops typically does not provide sufficient granularity of parallelism: not enough computational work is performed in parallel to overcome the overhead of synchronization and communication between processors. To utilize a multiprocessor effectively the compiler must exploit coarse-grain parallelism, locating large computations which can execute independently in parallel [87]. Multiprocessor systems also have more complex memory hierarchies than typical vector machines, containing multiple levels of caches in addition to the shared memory. These additional challenges often prevented early parallelizing compilers from being effective for multiprocessors.

Commercial parallel execution environments provide different user-level thread libraries. There are two main types of such libraries. The standard libraries, such as Pthreads (POSIX threads) [83], are intended for parallel programmers to build parallel applications expressing the parallelism by hand. The programmer introduces explicit calls to the thread library to create, manage, terminate and synchronize the parallel application tasks. The Pthreads library is oriented to work with shared memory. There are also standard libraries oriented toward message passing, such as PVM [63] and MPI [108, 109]. There are also plenty of custom thread libraries (such as the SGI MP library). Custom thread libraries are highly tuned for execution on top of a particular parallel architecture. For instance, spawning parallelism in custom libraries is done from a master processor to all the slave processors at once, instead of supplying work on a one-to-one basis, which is the case in the Pthreads library. Such fine tuning encourages the use of very simple structures to support the parallelism. Simplicity leads to efficiency, but there is also a lack of functionality with respect to standard libraries. For instance, the SGI MP library forbids spawning parallelism inside a parallel region. That is allowed in the Pthreads library, which does not impose such a restriction, allowing any pthread to spawn a new pthread. In general, it is common that custom libraries do not support the exploitation of multiple levels of parallelism.
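For comparison, the following minimal Pthreads sketch shows the explicit fork-join management the programmer must write by hand; the worker function and thread count are hypothetical:

    #include <pthread.h>
    #include <stdio.h>
    #define NTHREADS 4

    static void *worker(void *arg)
    {
        long id = (long)arg;
        printf("thread %ld: processing its chunk\n", id);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        long i;
        for (i = 0; i < NTHREADS; i++)                       /* fork */
            pthread_create(&tid[i], NULL, worker, (void *)i);
        for (i = 0; i < NTHREADS; i++)                       /* join */
            pthread_join(tid[i], NULL);
        return 0;
    }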

A multiprocessor operating system assigns physical processors to application processes (or threads). Different kernel-level scheduling policies are usually provided by the operating system. For instance, the SGI IRIX operating system provides time-sharing and gang scheduling policies [156] to manage parallel applications. Time-sharing is a priority-based scheduling policy, where processes are scheduled independently of each other. In gang scheduling, instead, processes belonging to the same application are scheduled as a group. In the SGI MP execution environment, the time-sharing policy is used because it is more dynamic than gang scheduling and applications are able to adapt to the number of processors allocated. In fact, each application has a specific thread in charge of monitoring the load of the system and whether the application is taking advantage of the allocated processors. This thread serves two purposes. First, if that thread detects that the load is high and the application is not receiving enough resources, it stops some of the processes of the application to free some physical processors and reduce the system load. Second, when that thread detects that the load is low and the application can use more processors, it starts some of the processes of the application to take advantage of more physical processors. This is a specific feature of the SGI MP execution environment, not found in other environments.

Other operating systems, such as Digital UNIX, provide the FIFO and round-robin scheduling policies, in addition to time-sharing, to schedule processes/threads on top of processors. SunOS provided a gang scheduling class for lightweight processes (LWPs) [131]. Nevertheless, the parallel execution environments running on top of them are simpler than the SGI one, lacking the same kind of communication between the user and kernel levels.

Nowadays parallelization of sequential code for shared memory systems is an extensively researched area. A lot of research effort has focused on parallelizing sequential programs. Currently, many types of loops can be parallelized with various data dependency analysis techniques [20, 127], among them the GCD test, Banerjee's inexact and exact tests [17, 177], the OMEGA test [132], symbolic analysis [74], semantic analysis and dynamic dependence testing, and program restructuring techniques such as array privatization [170], loop distribution, loop fusion, strip mining and loop interchange [121, 178]. Due to the simplicity of shared memory multiprocessor programming, compiler developers have provided various facilities to allow the users to exploit parallelism. Native compilers support multiprocessing directives to allow users to exploit loop-level parallelism in their programs. Every commercial parallel execution environment provides a set of parallelizing compilers, usually supporting the C and Fortran languages. For instance, both the MIPS Pro C [150] and Fortran 77 [153] compilers provided by Silicon Graphics are able to automatically extract loop parallelism from sequential applications. This is done with the help of the PCA (Parallel C Analyzer) and PFA (Parallel Fortran Analyzer) tools [152], respectively. Both in automatic and annotated parallelization, the resulting parallel code calls the SGI MP library [151, 154, 155], the custom thread library provided by SGI to support parallel execution. KAP [96] and Polaris [24, 129] are other parallelizing compilers which are able to generate parallelized code for SMPs. The Polaris compiler exploits loop parallelism by using inline expansion of subroutines, symbolic propagation, array privatization [57, 170] and runtime data dependence analysis [133]. CAPTools [101] is a semi-automatic parallelization tool, which transforms a sequential program into a message passing program by user-directed distribution of arrays. The PROMIS compiler [27, 120] combines the Parafrase2 compiler [125], using HTG [65] and symbolic analysis techniques [74], with the EVE compiler for fine grain parallel processing. However, these compilers cannot parallelize loops that include complex loop-carried dependences and conditional branches to the outside of a loop.

There are several well-known non-commercial projects in which parallelizing compilers are being developed to generate code to run on top of custom thread libraries. For instance, the SUIF compiler [4, 75] takes sequential Fortran code and automatically generates parallel code to run on a custom library, with no communication with the operating system. The SUIF compiler system incorporates various modules, which can be used to analyze the sequential program, to parallelize loops, distribute program arrays and perform inter-procedural analysis [4, 75, 76], unimodular transformations and data locality optimization [9, 98]. Effective optimization of data locality is more and more important because of the increasing disparity between memory and processor speeds. (Currently, many researchers are trying to achieve high performance computing with single chip multiprocessor systems using program restructuring techniques for data locality such as blocking, tiling, padding and data localization [80, 136, 180].) The SUIF runtime system supports a single level of parallelism in the same way as the SGI MP library does. The NANOS compiler [13, 104], based on Parafrase2, has been trying to exploit multi-level parallelism, including coarse-grain parallelism, by using the extended OpenMP API.

In the field of runtime libraries, support for multiple levels of parallelism and fine granularity is a main goal. The Illinois-Intel Multithreading Library (IML [64]) is a user-level threads package supporting nested parallelism and code generation from the Intel Fortran compiler. It runs on PC-compatible Intel multiprocessor machines, on top of the Windows NT kernel. Communication with the operating system includes a specific interface to get the number of available processors and to stop and resume kernel threads.

COOL [31] is a runtime system supporting parallel object-oriented programs written in the COOL language [32]. It is intended for NUMA machines. The programmer is allowed to express data locality in three different ways: through object, task and processor affinity. Cilk, Filaments, Concert and Active Threads are thread libraries also providing support for compiler generated code. Communication with the operating system is not considered in any of these projects. They provide support for multiple levels of parallelism. Data locality is taken into account, providing tools in the library interface to map the application tasks onto specific processors. The Cilk language extends C with parallel constructs. The Cilk runtime system [25] is oriented to expressing parallelism in recursive programs. Filaments [58] can be used directly by the C programmer or from the Sisal functional language. It supports fine-grained iterative and fork-join threads. Active Threads [176] is a runtime system supporting code generation for the pSather compiler [111, 162]. Threads are grouped in thread bundles, sharing a common scheduler. Bundles facilitate data locality because threads accessing the same data can be assigned to the same physical processors through the bundle.

Projects considering the improvement of communication with the kernel include Process Control, Scheduler Activations, First-Class User-Level Threads and Execution Vehicles. Process Control [171, 172] introduces the concept of dynamic adaptation of the parallelism inside applications to the available resources, as indicated by the kernel. The application receives enough information to start and stop processes when needed to adapt to the allocated resources. Scheduler Activations [6] provides to the user level all scheduling events related to the application. The events are: receiving a new processor, a processor preemption, thread blocking and thread unblocking. All events are communicated through the upcall mechanism, which sometimes is too costly to allow an efficient communication path between user and kernel levels. First-Class User-Level Threads [103] merges the upcall mechanism with the shared memory. The most aggressive approach is taken in the recent implementation of Execution Vehicles in IRIX 6.5 [40]. In this approach, the full context of the kernel-level threads is made available to the user-level execution environment, in such a way that both the kernel and the user levels can resume a preempted thread.

There are other research projects providing thread libraries, such as Quartz [8], FastThreads [7], Presto [22] and SwitchStacks [35], which are interesting for various reasons and have also been studied during the development of this work.

2.4 Dataflow Computation Model

Dataflow expresses computations as operations, which may in principle be of any size. The execution of these operations depends solely on their data dependencies: an operation is computed after all of its inputs have been computed, but this moment is determined only at runtime. Operations that do not have a mutual data dependency may be computed concurrently.

The fundamental principles of dataflow were developed by Jack Dennis [45] in the early 1970s. The dataflow model [2, 47, 67, 145] avoids the two features of the von Neumann model, the program counter and the global updatable store, which become bottlenecks in exploiting parallelism [15]. Due to its elegance and simplicity, the pure dataflow model has been the subject of many research efforts. Since the early 1970s, a number of dataflow computer prototypes have been built and evaluated, and different designs and compiling techniques have been simulated [10, 46, 62, 100, 114, 144, 149, 157, 159, 165, 167]. Depending on the way of handling the data, several types of dataflow architectures emerged in the past: single-token-per-arc dataflow [48], tagged-token dataflow [12, 175] and explicit token store [122]. The major advantage of the single-token-per-arc dataflow model is its simplified mechanism for detecting enabled nodes. Unfortunately, this model of dataflow has a number of serious drawbacks. Since consecutive iterations of a loop can only partially overlap in time, only a pipelining effect can be achieved and thus a limited amount of parallelism can be exploited. Another undesirable effect is that token traffic is doubled. There is also a lack of support for programming constructs essential to any modern programming language.

The performance of a dataflow machine significantly increases when loop iterations and subprogram invocations can proceed in parallel. To achieve this, each iteration or subprogram invocation should be able to execute as a separate instance of a reentrant subgraph. However, this replication is only conceptual. In an implementation only one copy of any dataflow graph is actually kept in memory. Each token includes a tag, consisting of the address of the instruction for which the particular data value is destined and other information defining the computational context in which that value is used. A node is enabled as soon as tokens with identical tags are present at each of its input arcs. Dataflow architectures using this method are referred to as tagged-token (or dynamic) dataflow architectures. The most important tagged-token dataflow projects are the Manchester Dataflow Computer [18, 19, 26, 29, 30, 66] and the MIT Tagged-Token Dataflow Machine [11, 69, 70]. The major advantage of the tagged-token dataflow model is the higher performance it obtains by allowing multiple tokens on an arc. One of the main problems of the tagged-token dataflow model is the efficient implementation of the unit that collects tokens. For reasons of performance, an associative memory would be ideal. Unfortunately, it would not be cost-effective since the amount of memory needed to store tokens waiting for a match tends to be very large. Therefore, all existing machines use some form of hashing techniques, which typically are not fast compared with associative memory. To eliminate the need for associative memory searches, the concept of an explicit address token store has been proposed. The basic idea is to allocate a separate memory frame for every active loop iteration and subprogram invocation. Each frame slot is used to hold an operand for a particular activity. The explicit token address store principle was developed in the Monsoon project [42, 123, 146, 166] but is applied in most dataflow architectures developed more recently, e.g. as so-called direct matching in the EM-4 [23, 94, 141, 142, 143] and Epsilon-2 [71, 72].
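The tagged-token firing rule can be sketched in C as follows; the structures are purely conceptual illustrations of the matching step, not BMDFM or Monsoon internals:

    #include <stdbool.h>

    typedef struct {
        unsigned long tag;    /* destination instruction + iteration context */
        double        value;
    } token_t;

    typedef struct {          /* one waiting-matching slot per distinct tag */
        bool    left_present, right_present;
        token_t left, right;
    } match_slot_t;

    /* Deposit a token on one input arc of a two-input node; the node is
     * enabled (may fire) once tokens with identical tags are present on
     * both arcs. */
    static bool deposit(match_slot_t *slot, token_t t, bool is_left)
    {
        if (is_left) { slot->left  = t; slot->left_present  = true; }
        else         { slot->right = t; slot->right_present = true; }
        return slot->left_present && slot->right_present;
    }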

Pure dataflow computers usually perform quite poorly on sequential code. A further drawback is the overhead associated with token matching. One solution to these problems is combining the dataflow and control-flow mechanisms. The symbiosis between dataflow and von Neumann architectures is represented by a number of research projects developing von Neumann/dataflow hybrids [28, 84]. After early hybrid dataflow attempts, several techniques for combining control-flow and dataflow emerged [21]: threaded dataflow [124], large-grain dataflow [84], RISC dataflow [113] and dataflow with complex machine operations [59]. The threaded dataflow technique modifies the dataflow principle so that instructions of a sequential instruction stream can be processed in succeeding machine cycles. A thread of instructions is issued consecutively by the matching unit without matching further tokens, except for the first instruction of the thread.

The large-grain dataflow technique, also referred to as coarse-grain dataflow, advocates activating macro dataflow actors by the dataflow principle while executing the represented sequences of von Neumann instructions. Off-the-shelf microprocessors can be used for the execution stage. Most of the more recent dataflow architectures fall into this category and they are often called multithreaded machines: the Threaded Abstract Machine (TAM) [41, 43], the Associative Dataflow Architecture ADARC [163, 164, 181] and the Pebbles architecture [137].

Another stimulus for dataflow/von Neumann hybrids was the development of RISC dataflow architectures, notably P-RISC [113], which allow the execution of existing software written for conventional processors. Using such a machine as a bridge between existing systems and new dataflow machines made the transition from imperative von Neumann languages to dataflow languages easier for the programmer. Another technique to reduce the instruction-level synchronization overhead is the use of complex machine instructions, for instance vector instructions. These instructions can be implemented by pipeline techniques as in vector computers. Structured data is referenced in blocks rather than element-wise, and can be supplied in a burst mode.

Recently the dataflow principles have also been used on the micro level. The latest generation of super-scalar microprocessors, such as the Intel Pentium [37], MIPS [179] and PA-RISC [97], displays an out-of-order dynamic execution that is referred to as local dataflow or micro dataflow. In the first paper on the PentiumPro, the instruction pipeline is described as follows: "The flow of the Intel Architecture instructions is predicted and these instructions are decoded into micro-operations (micro-ops), or series of micro-ops, and these micro-ops are register-renamed, placed into an out-of-order speculative pool of pending operations, executed in dataflow order (when operands are ready), and retired to permanent machine state in source program order". But the micro dataflow of today's super-scalar processors also has some problems in finding enough fine-grain parallelism to fully exploit the processor. One solution is to enlarge the instruction window to several hundred instruction slots, with hopefully more simultaneously executable instructions present. There are two drawbacks to this approach. First, given the fact that all instructions stem from a single instruction stream and that on average every seventh instruction is a branch instruction, most of the instructions in the window will be speculatively assigned with a very deep speculation level. Thus most of the instruction execution will be speculative. The principal problem here arises from the single instruction stream that feeds the instruction window. Second, if the instruction window is enlarged, the updating of the instruction states in the slots and the matching of executable instructions lead to more complex hardware logic in the dispatch stage of the pipeline, thus limiting the cycle rate increase.

Ideally, a parallel programming environment should be able to exploit parallelism which is not expressed explicitly. From this point of view the dataflow computational model has been attractive because it allows the exploitation of parallelism at instruction, loop, procedure and task levels implicitly. Communication between the operations is not made explicit in dataflow programs; rather, the occurrence of a name as the result of an operation is associated by the compiler with all of those places where the name is the input of an operation. Because operations execute only when all of their inputs are present, communication is always implicitly synchronized. Having problems with single assignment and iterations, dataflow languages have taken different approaches to expressing repetitive operations. Languages such as Id [135] and Val [1], which has most recently evolved into Sisal [60], have syntactic structures looking like loops, which create a new context for each execution of the loop body. These languages seem like imperative languages except that each variable name may only be assigned once in each context. In Sisal parallelism is not explicit at the source level. However, the language runtime system may exploit parallelism: the loop bodies can be scheduled simultaneously and their results then collected.

2.5 Thread-Level Speculations

Techniques such as simultaneous multithreading [173] (e.g., the Alpha 21464) and single-chip multiprocessing [116] (e.g., the Sun MAJC [168] and the IBM POWER4 [85]) suggest that thread-level parallelism may become increasingly important even within a single chip. Despite the significant progress which has been made in automatically parallelizing regular numeric applications, compilers have had little or no success in automatically parallelizing highly irregular numeric or especially non-numeric applications, due to their complex control flow and memory access patterns. One promising technique here is Thread-Level Speculation (TLS), which enables the compiler to create parallel threads optimistically despite uncertainty as to whether those threads are actually independent.

Loop speculation may be used to split virtually any sequential program with loops of any kind into arbitrary threads that can be parallelized across several processors. However, before the system can successfully exploit loops it must know where they occur in the code. It is actually possible to find loops as existing code executes, simply by looking for backward branches. Wherever one is encountered, the hardware may assume that a loop has been found and begin forking off speculative threads with the program counter set to the target of the backward branch in all speculative threads. However, finding loops "on the fly" like this with legacy code requires the hardware to speculatively track references to registers as well as memory. Because this is too expensive, it was necessary to add a semi-manual compiler pass to isolate the loops. Figure 2-6 explains how a conventional loop has to be modified to run on speculative hardware. The loop itself is replaced with the loop starting template that invokes the loop encapsulation function on the main processor after signaling the other processors to start it as well. The loop body is encapsulated within the new "ThisLoop()" encapsulation function. In the process, the necessary "spec_start:" and "spec_end:" marking labels are added to provide the correct addresses to the "spec_begin()" API. The body clauses of the loop are distributed appropriately to use the speculation APIs. In order to ensure correct program execution, TLS hardware must track all inter-thread dependencies. When a true dependence violation is detected, the hardware must ensure that the "later" thread in the sequence executes with the proper data by dynamically discarding threads that have speculatively executed with the wrong data.


Original loop:

    int i;   // loop counter
    int x,y; // loop local variables

    for (i=0; i<50; i++) {
      ...    // loop body code
    }

Speculative version:

    int i;   // loop counter
    int x,y; // loop local variables

    i=0;
    ThisLoop(); // start loop on all CPUs

    void ThisLoop()
    {
      spec_begin(&spec_start, &spec_end);
    spec_start:
      if(!(i<50)) spec_terminate();
      ...      // loop body code
      i++;
    spec_end:
      return;
    }

Figure 2-6. Speculation loop transformations

Knight was the first to propose hardware support for a form of thread-level speculation [93]. His work was within the context of functional languages. The Multi-scalar architecture [61, 151] was the first complete design and evaluation of an architecture for TLS. There have since been many other proposals, which extend the basic idea of thread-level speculation [3, 36, 68, 73, 95, 102, 118, 160, 169]. In nearly all of these cases, the target architecture has been a very tightly coupled machine, where all of the threads are executed within the same chip. These proposals have often exploited this tight coupling to help them track and preserve dependencies between threads. For example, the Stanford Hydra architecture [78, 79], which consists of four MIPS cores, uses special write buffers to hold speculative modifications, combined with a write-through coherence scheme that involves snooping of these write buffers upon every store. Hydra simulator tests show an overall speedup of 2.5 on a variety of applications from different domains.

While the TLS approach may be perfectly reasonable within a single chip, it was not designed to scale to larger systems. However, there are some attempts to scale thread-level speculation to traditional SMP architectures using cache coherency techniques [161]. Experimental simulation results demonstrate that the TLS scheme offers absolute program speedups ranging from 8% to 46% on a four-processor system. There are also other attempts [182] at a form of TLS within large-scale NUMA multiprocessors.


2.6 Software Dynamic Parallelization

In general, while compile-time methods have the advantage that they incur little runtime overhead and that the automation frees the user from the details of locality and parallelism, they are limited in that compilers cannot apply many interesting optimizations that depend on knowledge of dynamic information. Compile-time optimizations cannot be applied to situations where the time it takes to complete an operation varies at runtime, a common case on cache-based and parallel computers.

Nowadays only a few software systems support dynamic parallelization (and those are actually dataflow simulators). The OSCAR compiler realizes multi-grain parallel processing [88] that effectively combines coarse-grain task parallel processing with loop parallelization and near fine-grain parallel processing [81, 90, 91, 92]. In the OSCAR compiler, a dynamic scheduling routine generated by the compiler is used to schedule coarse-grain tasks dynamically onto processors or processor clusters to cope with the runtime uncertainties caused by conditional branches. As the embedded dynamic task scheduler, the centralized dynamic scheduler [89, 115] in the OSCAR Fortran compiler and the distributed dynamic scheduler [110] have been proposed. Concert is a concurrent object-oriented language and runtime system [34] designed to support fine-grain applications whose behavior is unknown at compile time. SMARTS (Shared Memory Asynchronous RunTime System) [174], based on the POOMA (Parallel Object-Oriented Methods and Applications) framework [86, 134], is a C++ class library for high performance scientific computations.

[Diagram: a control thread hands iterations to the dependence graph manager/scheduler; ready iterations feed per-CPU work queues; completed iterations are reported back to resolve dependencies for further iterations.]

Figure 2-7. Runtime dependence-driven execution


Software dynamic parallelization borrows its ideas from macro-dataflow computation models and builds the graph of data dependencies dynamically. Figure 2-7 demonstrates the typical architecture of runtime dependence-driven execution. The control thread unit provides the iterations to the dependence graph manager/scheduler. Ready iterations feed the work queues and are processed by the CPU units. The completed iterations resolve dependencies for the next iterations to become ready. Evaluations done in [174] show that the total amount of time spent on system overhead was only 3.49% on numeric applications, indicating that runtime graph generation does not have significant overhead costs.
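To make this dispatch mechanism concrete, the following is a minimal sketch in C of dependence-driven readiness tracking; all names (iteration, enqueue_ready, complete_iteration) are ours for illustration and are not taken from any of the cited systems.

    #define MAX_SUCC 8

    typedef struct iteration{
      int unresolved;                   /* inputs not yet produced            */
      struct iteration *succ[MAX_SUCC]; /* iterations that depend on this one */
      int nsucc;
    } iteration;

    extern void enqueue_ready(iteration *it); /* feeds a CPU's work queue */

    /* Called when an iteration finishes: resolve the dependencies of its
       successors and enqueue every successor whose last input just arrived. */
    void complete_iteration(iteration *it){
      int i;
      for(i=0;i<it->nsucc;i++)
        if(--it->succ[i]->unresolved==0)
          enqueue_ready(it->succ[i]);
    }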

Two important drawbacks can be highlighted for the existing dynamic architectures. First, they are still oriented only toward numeric loop-based scientific applications and have no support for irregular adaptive algorithms. Second, the control thread and the dependence graph manager/scheduler are centralized in one thread, so they are a bottleneck for the system.

2.7 Summary

Having analyzed the recent parallelization technologies for SMP, we drew the following important conclusions:

• We have chosen the POWER4 IBM p690 SMP architecture for the evaluation of our proposals as the most tightly coupled SMP architecture available today. IBM’s evolving POWER4, POWER5 and POWER6 line is considered mainstream in the DOE development program for scientific computing.

• The directive-based approach has many weaknesses because of its static nature. Directive-based implementations do not check for dependencies, conflicts, deadlocks, race conditions or other problems that result in incorrect program execution. A runtime library implementing a fork-join paradigm has the drawback of idle time, in which program execution has to wait for the completion of the slowest thread. Compile-time methods are limited in that compilers cannot apply many interesting optimizations that depend on knowledge of dynamic information.

• The dataflow computation model relies on runtime parallelization, but "pure" dataflow machines have overhead because each dynamic instruction requires dynamic operand matching, and because extra instructions are needed to make copies of data when a particular value is required by several instructions. Dataflow languages are abstract and simple, but they lack a natural software development methodology and still contain an inconvenient single assignment paradigm.

• Thread-level speculation complements parallelization dynamically when code is unanalyzable at the compilation stage, but it requires special hardware support; this hardware, however, will be partly occupied with unnecessary speculative processing. The thread-level speculation approach works within a single chip and is not designed to scale to larger systems.

• Recent attempts at software dynamic parallelization, which are based on the dataflow computation model, are still oriented only toward numeric loop-based scientific applications and do not support irregular adaptive algorithms. The master control thread is centralized, so it is a bottleneck for the system.

We think that dataflow systems are probably worth another look at this time. The community has gone through a shared-distributed-shared memory cycle since the peak of dataflow activity, and applying some of what we have learned in that time to software-based systems seems appropriate.

We build our architecture as a hybrid SMP dataflow runtime engine that avoids the disadvantages of its predecessors. Our system combines both static and dynamic scheduling, does not require any special hardware, is portable and scalable on commodity SMPs, and is applicable to numeric as well as irregular adaptive algorithms.


Chapter 3 BMDFM Architectural Overview

3.1 Overview

This chapter introduces the BMDFM architecture. We discuss the following important basic issues:

• The idea of a hybrid architecture comprised of the dataflow and SMP paradigms.

• Architecture and configuration of a dataflow runtime engine.

• Programming model.

• Structure of byte code and the C interface.

• Running an application in single threaded and multithreaded modes.

3.2 Basic Concept

BMDFM (Binary Modular DataFlow Machine) [14, 128] is built as a hybrid of the SMP and dataflow paradigms. The basic concept is shown in Figure 3-1.

[Diagram: a layered stack. Translators from other high-level languages sit on top of the VM/C language interface to the multithreaded dataflow runtime engine (software emulation), which runs on the SMP OS interface over the SMP hardware.]

Figure 3-1. Basic concept of BMDFM


Our approach relies on underlying commodity SMP hardware, which is available on the market. Normally, SMP vendors provide their own SMP Operating System (OS) with an SVR4 UNIX interface on top (Linux, HP-UX, SunOS/Solaris, Tru64OSF1, IRIX, AIX, MacOS, etc.). To provide maximal portability we use only very basic UNIX functionality:

• standard I/O and terminal capabilities (termcap);

• parallelism of the processes, which are created once by the fork() system call during the startup of BMDFM;

• SVR4 IPC functionality: shmget()/shmctl() for the initial creation of the shared memory pool and semget()/semctl() for the purpose of runtime synchronization;

• ANSI C compiler to compile the complete system, which is written strictly in ANSI C.

Because it contains no conditional compilation directives, BMDFM compiles on most SMP machines with their native SMP OSs and is publicly available [14] for download.
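As an illustration of how little OS support this requires, here is a minimal sketch of the startup pattern implied by the list above (a toy example of ours, not the BMDFM source): create the shared memory pool and a semaphore set once, then fork() the worker processes.

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/shm.h>
    #include <sys/sem.h>

    int main(void){
      /* one shared memory pool and one semaphore set, created at startup */
      int shm_id=shmget(IPC_PRIVATE,1<<20,IPC_CREAT|0600); /* 1MB toy pool  */
      int sem_id=semget(IPC_PRIVATE,12,IPC_CREAT|0600);    /* 12 semaphores */
      void *pool;
      int i;
      if(shm_id==-1||sem_id==-1){ perror("ipc"); return 1; }
      pool=shmat(shm_id,NULL,0); /* children inherit this mapping via fork() */
      for(i=0;i<4;i++)           /* processes are created only once          */
        if(fork()==0){
          /* ... a CPU PROC / OQ PROC / IORBP PROC main loop over 'pool' ... */
          _exit(0);
        }
      /* ... console mode; on shutdown: shmctl()/semctl() with IPC_RMID ...  */
      (void)pool;
      return 0;
    }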

On top of an SMP OS we run our multithreaded dataflow runtime engine that performs a software emulation of the dataflow machine. Such a virtual machine has interfaces to the virtual machine language and to C. This interface provides the transparent dataflow semantics for conventional programming. Optionally, the code for the virtual machine can be generated/translated from other high-level languages by some external software tools.

The scope of this work is the BMDFM multithreaded dataflow runtime engine itself and accompanying static preprocessing/loading utilities. Details and extensions of translators from other high-level languages are left for further research.

3.3 Multithreaded Architecture

BMDFM uses both highly efficient dynamic and static scheduling, combining von Neumann, Shared Memory Symmetric Multi Processing (SMP), Multiple Instruction Multiple Data Stream (MIMD) and Data Flow Machine (DFM) paradigms. According to the dataflow classification, BMDFM is a hybrid of tagged-token dataflow, explicit token store, threaded dataflow and large-grain dataflow. The BMDFM architecture is shown in Figure 3-2.


[Diagram: the static scheduler part (the external task loader/listener pair, connected via a socket pair through the task connection zone) feeds the dynamic scheduler part, the multithreaded engine, in which work processes (each with local memory) and dynamic scheduling processes operate on the shared memory pool (instructions/data); external tracers attach via socket ports of the trace plugging area.]

Figure 3-2. BMDFM architecture

A pool of processes is divided into two subsets: work processes, which execute parallel instruction streams, and dynamic scheduling processes, which automatically convert sequential instruction streams into parallel ones.

Running under an SMP OS, the processes will occupy all available real machine processors.

All processes share the shared memory pool containing instructions and data. Each work process also has its own local memory, which may contain user subroutines to implement additional coarse-grain levels of parallelism. The external loader/listener pair performs preprocessing and static scheduling of the input program instructions and stores them clustered in the task connection zone. The listener is responsible for the ordered output after the out-of-order processing in the multithreaded engine. Clustered instructions and data are fetched by the dynamic scheduling processes into the shared memory pool. Additionally, dynamic scheduling processes release (garbage collect) resources after the data contexts and speculative branches are processed. Lastly, the external tracer assists in debugging the multithreaded out-of-order processing of the input program. The external tracers are connected via the ports of the trace plugging area. The tracer can operate in various modes of full/partial and master/slave debugging.


3.4 Static Scheduler

Figure 3-3 shows the static scheduling part of BMDFM. An application program (input sequential program) is processed in three stages: preliminary code reorganization (code reorganizer), static scheduling of the statements (static scheduler) and compiling/loading (compiler).

[Diagram: the input sequential program passes through the code reorganizer, the static scheduler and the compiler, producing a multiple clusters flow that feeds the multithreaded engine through interfaces.]

Figure 3-3. Static scheduler

The output after the static scheduling stages is a multiple clusters flow that feeds the multithreaded engine via the interface designed in a way to avoid bottlenecks. The multiple clusters flow can be thought of as a compiled input program split into marshaled clusters, in which all addresses are resolved and extended with context information. Splitting into marshaled clusters allows loading them multithreadedly. Context information lets iterations be processed in parallel.

3.5 Dynamic Scheduler

Figure 3-4 shows the part of BMDFM responsible for the dynamic scheduling in more detail. The BMDFM dynamic scheduling subsystem is an efficient SMP emulator of the tagged-token, explicit token store, threaded DFM.


[Diagram: inside the multithreaded engine, the external task loader/listener pair feeds instructions/data into the IORBPs; IORBP PROCs move them into the shared memory pool, which contains the DB (data) and the OQ (instructions); OQ PROCs and CPU PROCs operate on the OQ.]

Figure 3-4. Dynamic scheduler

The Shared Memory Pool is divided in three main parts: Input/Output Ring Buffer Port (IORBP), Data Buffer (DB) and Operation Queue (OQ).

The external static scheduler (External Task Loader/Listener Pair) puts clustered instructions and data of an input program into the IORBP. The ring buffer service processes (IORBP PROC) move data into the DB and instructions into the OQ. The operation queue service processes (OQ PROC) tag instructions as ready for execution once the required operands’ data is accessible. Execution processes (CPU PROC) execute instructions that are tagged as ready and output computed data into the DB or to the IORBP. Additionally, IORBP PROC and OQ PROC are responsible for freeing memory after contexts have been processed. The context is a special unique identifier representing a copy of data within different iteration bodies, according to the tagged-token dataflow architecture. This allows the dynamic scheduler to handle several iterations in parallel.
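As a rough sketch of the tagged-token principle just described (the names are illustrative, not actual BMDFM structures), an OQ entry could count arriving operands for its context and become ready when the last one arrives:

    typedef struct{
      unsigned long context;  /* unique identifier of the iteration copy (tag) */
      int operands_expected;  /* operands the instruction needs                */
      int operands_present;   /* operands already stored in the DB             */
      int ready;              /* set when a CPU PROC may execute the entry     */
    } oq_entry;

    /* Called by an OQ PROC when a DB cell carrying the given tag is filled. */
    void operand_arrived(oq_entry *e, unsigned long context){
      if(e->context==context && ++e->operands_present==e->operands_expected)
        e->ready=1;
    }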

To allow several processes to access the same data concurrently, the BMDFM dynamic scheduler locks objects in the shared memory pool via SVR4 semaphore operations. The locking policy provides multiple read-only access and exclusive access for modification.

The BMDFM Dynamic Scheduler is configured at startup using a configuration profile. Figure 3-5 illustrates a typical configuration for the 8-way SMP machine.


SHMEM_POOL_SIZE = 2147483647  # (2GB) Shared memory pool size [Bytes]
SHMEM_POOL_BANKS = 10         # Number of banks in pool
ARRAYBLOCK_SIZE = 20          # Array block size
Q_OQ = 3000                   # Operation Queue (OQ) size [Entities]
Q_DB = 1000                   # Data Buffer (DB) size [Entities]
Q_IORBP = 16                  # I/O Ring Buffer Port (IORBP) size [Entities]
N_IORBP = 4                   # Number of the IORBPs
N_CPUPROC = 8                 # Number of the CPU PROCs
N_OQPROC = 8                  # Number of the OQ PROCs
N_IORBPPROC = 8               # Number of the IORBP PROCs

Figure 3-5. Typical configuration for the 8-way SMP machine

In our experiments we used the Tuning and Analysis Utilities (TAU) [147, 148] to analyze the performance of the BMDFM scheduling routines and to define optimal configuration parameters. TAU uses timing instrumentation that is triggered at function entry and exit. The instrumentation is responsible for name registration, maintaining the function database, the call stack, and statistics. From the profile data collected, TAU’s profile analysis procedures can generate a wealth of performance information for the user. It can show the exclusive and inclusive time spent in each function with nanosecond resolution. Other data includes how many times each function was called, how many profiled functions were invoked by each function, and what the mean inclusive time per call was. On systems where available, TAU can also use hardware performance counters.

SHMEM_POOL_SIZE defines the maximal shared memory pool size. A bigger value ensures that BMDFM can operate on larger amounts of data. This value can be limited by the operating system. Normally, a 2GB value is the limit of shared addressable space for 32-bit mode applications. In our experiments we use 64-bit compilation and we set this value to 32GB (34359738368).

SHMEM_POOL_BANKS defines the number of banks in the shared memory pool. To enable access to the shared memory pool from many processes simultaneously, we developed a reentrant driver for shared memory allocation. The driver can run the allocation sequence in every bank independently, which effectively splits the shared memory pool into multiple memory banks. Several banks together work faster than one, but a memory bank restricts the maximal memory block size that can be allocated. CPU PROC processes are the main consumers of the shared memory reallocation routines; IORBP PROC and OQ PROC processes perform only non-intensive resource allocation/release. Therefore it is better to set this value a little bigger than the number of CPU PROCs. Experimentally, we have defined a calculation formula for the number of shared memory pool banks: SHMEM_POOL_BANKS = 1.25 * N_CPUPROC. For the 8-way configuration of Figure 3-5 this yields the 10 banks shown there.

ARRAYBLOCK_SIZE defines the policy of the memory allocation. All dynamic structures of the shared memory pool are allocated in chunks. Bigger values cause less intensive and faster memory allocation but at the same time cause inefficient memory usage.


Q_OQ defines the OQ size. Bigger values allow running tasks with more complex data dependencies, but a big OQ requires additional memory space and an additional number of semaphores, and can slow down associative searches in the dynamic scheduling subsystem. Experimentally, we determined the most appropriate value to be 3000.

Q_DB defines DB size. Bigger values allow running tasks with more variables but a big DB requires additional memory space and an additional number of semaphores. Experimentally, we determined the most appropriate value to be 1000.

Q_IORBP defines the IORBP size. This value can be understood as a maximal number of the buffered marshaled clusters, which are sent to the multithreaded engine from a single instance of the external task loader/listener pair. Bigger values allow more intensive loading of data via the task connection zone but a big IORBP requires additional memory space and an additional number of semaphores. Normally, the IORBP size has to be less than the OQ size to avoid a situation where the dataflow engine is saturated with the OQ full of instructions waiting for some unresolved dependencies from the IORBP. Experimentally, we determined the most appropriate value to be 16.

N_IORBP defines the number of IORBPs, thus the number of tasks (jobs), which can be processed in parallel. Processing several tasks (jobs) simultaneously uses system resources more efficiently. If it is really planned to run many applications on a single BMDFM instance, the DB and OQ sizes have to be increased accordingly.

N_CPUPROC, N_OQPROC and N_IORBPPROC define the number of CPU PROC, OQ PROC and IORBP PROC processes, respectively. Usually it makes sense to set these values equal to the real number of system processors. Recent hyper-threading technology, which allows running multiple threads on one CPU, changes this policy only a little. Currently, one CPU can run no more than two threads; that means, roughly, that in our configuration every CPU can multiplex between three instances of CPU PROC, OQ PROC and IORBP PROC. In the future, if one CPU core becomes able to run more than two parallel threads, our configuration recommendations could be reviewed. In any case, additional tuning of these parameters can be done after analyzing the stall warnings in the log files. In our experiments we very often set the N_OQPROC value to half the number of system processors.

Even after having determined the most appropriate configuration parameters, we decided not to hard code them but to leave them open in the configuration profile. In this case the end user will have freedom to change the configuration depending on his applications. It is quite possible that some SMP architectures may demonstrate better performance if configured with the number of virtual processes, which considerably exceeds the number of system processors.


3.6 Programming Model

BMDFM can be thought of as a virtual machine, which provides a conventional functional programming model and uses transparent dataflow semantics. No directives for parallel execution are required! From the user’s point of view BMDFM is a virtual machine, which runs every statement of an application program in parallel with all parallelization and synchronization mechanisms fully transparent. The statements of an application program are normal operators that any single threaded program might consist of: variable assignments, conditional executions, loops, function calls, etc. BMDFM has a rich set of standard operators/functions, which can be extended by user functions written in C/C++. A BMDFM user application can be built according to the three schemes shown in Figure 3-6.

[Diagram: three columns. Scheme A: the user application file contains udf0()...udfN() and main(), all in virtual machine code. Scheme B: udf0()...udfN() in virtual machine code reside in a library of functions behind the user application file, whose main() is in virtual machine code. Scheme C: udf0()...udfN() are written in C/C++ in a library behind the BMDFM C interface, while main() remains in virtual machine code.]

Figure 3-6. BMDFM user programming models

Scheme A. A complete application is written in pure virtual machine language. In this case BMDFM will exploit fine-grain parallelism, thus BMDFM will try to unroll the loops and to execute all statements in parallel. If it runs on a non-UMA (non-Uniform Memory Access) machine the dynamic scheduling can be expensive.

Scheme B. According to this scheme some UDFs (User Defined Functions) will be uploaded into the CPU PROCs’ local memory and their bodies will be prevented from scheduling for parallel processing (such a UDF is treated as one seamless statement). In this case less time is obviously spent on dynamic scheduling.


Scheme C. This scheme allows using the C code directly instead of the virtual machine code. Of course the C code compiled and optimized by a local C compiler is faster than virtual machine code.

3.7 Virtual Machine Language and C Interface

When we designed a language for BMDFM we chose a subset of LISP with an open C interface, taking into account the following ideas:

• LISP has the simplest syntax ever created in the programming language area, thus it has the simplest syntax checker, parser and byte code generator. That helped us very much in quick prototyping of the complete BMDFM architecture.

• LISP is abstract regarding the data types and convenient enough for manual programming. At the same time the LISP-like prefix format is the de facto standard for the intermediate program presentation after a preprocessing from other high-level programming languages. Similarly to the RTL level of the GCC compiler set, most compilers optimize code presented in a LISP-like prefix format. We used this advantage of LISP because BMDFM is intended for manual programming and for generated code as well.

• LISP efficiently combines features of both algorithmic and functional languages, which allows using LISP in a wide area of computations.

• The open C interface allows extending the base virtual machine with additional functionality required in an application area. Once a library of specialized functions is written and compiled, it can be used from the virtual machine via the C interface. It is a common practice of well-known virtual machines (TCL/TK, Perl, Java) to achieve sufficient performance this way. Statistically, user applications for such virtual machines spend 80% of their running time executing functions from the provided libraries, compiled for the target architecture.

Table 3-1 below briefly describes the BMDFM virtual machine language. We devote several pages to it because this information is necessary for understanding the examples given later in this thesis.


Syntax
  A function may appear as (<function> <arg_1> ... <arg_N>). As a result it returns its calculated <value>. It is possible to give a constant value, variable name, actual parameter/argument or other function in place of the function’s parameters, for instance: (* 2 (+ 2 2)) returns 8. A complex function is a (progn ...) encapsulation. As result it returns its last calculated value. A program is a function. Formal parameters/arguments appear like $1 ... $N in the program’s body. All actual parameter/argument types will automatically be converted to the required ones in all cases possible during the actual function calls. The ‘#’ char is assumed as a comment symbol; all the rest of the line after ‘#’ is ignored.

Variable assignments
  ([al]setq <var_name> <value>)        Sets value to a variable
  (arsetq <var_name> <index> <value>)  Sets value to an array member
  (index <var_name> <index>)           Array member
  (alindex <var_name>)                 Number of array members

User Defined Function (UDF)
  (defun <udf_name> <body>)

Conditional
  (if <condition> <then_clause> <else_clause>)

Loop
  (while <condition> <body>)
  (for …)
  (break)                              Cancels an iteration or a UDF execution of the current nested level
  (exit)                               Cancels a program execution

I/O
  (accept …)
  (scan_console …)
  (outf …)
  (file_create …)                      File descriptor or -1 if error
  (file_open …)                        File descriptor or -1 if error
  (file_write …)                       Number of bytes written or -1 if error
  (file_read …)                        String read or empty string if error
  (file_close …)                       Zero or -1 if error
  (file_remove …)                      Zero or -1 if error

Compare
  (== …) (!= …) (< …) (> …) (<= …) (>= …)   Names are in C notation

Boolean (short-circuit evaluation)
  (&& …) (|| …) (! …)                  Names are in C notation

Integer
  (ival <value>)                       Explicitly converts to integer
  (indices <var_name>)                 Number of array indices
  (irnd <value>)                       Random number within the range of 0 to <value>
  (+ …) (- …) (* …) (/ …) (% …) (++ …) (-- …)   Names are in C notation
  (*+ …)                               Multiplication and addition
  (0- …)                               Negation
  (iabs …)                             Absolute value
  (& …) (| …) (^ …) (~ …) (>> …) (<< …)   Names are in C notation

Float
  (fval <value>)                       Explicitly converts to float
  (frnd <value>)                       Random number within the range of 0 to <value>
  (+. …) (-. …) (*. …) (/. …)          Names are in C notation
  (*+. …)                              Multiplication and addition
  (fabs …)                             Absolute value
  (int …) (round …) (cos …) (sin …) (atn …)   Names are in C notation
  (cas …)                              Sine plus cosine
  (exp …) (pow …) (sqrt …)             Names are in C notation
  (ln …)                               Logarithm to base e

String
  (str <value>)                        Explicitly converts to string
  (chr <value>)                        String of one character
  (asc <string>)                       ASCII code of the first character
  (type <value>)                       Value type among I, F, S, Z (Z for nil value)
  (dump_i2s …)                         Memory dump of integer
  (dump_f2s …)                         Memory dump of float
  (dump_s2i …)                         Integer mapped from string
  (dump_s2f …)                         Float mapped from string
  (notempty …)                         True if string is not empty
  (len …)                              Length of string
  (at …)                               Occurrence position searching from left
  (rat …)                              Occurrence position searching from right
  (cat …)                              Concatenation
  (space …)                            String of spaces
  (replicate …)                        Replication
  (left …)                             Left part of string
  (leftr …)                            Left part of string
  (right …)                            Right part of string
  (rightl …)                           Right part of string
  (substr …)                           Substring
  (strtran …)                          Find and replace
  (str_raw …)                          Raw string
  (str_unraw …)                        Special characters are slashed
  (str_dump …)                         Semi-hexadecimal string dump
  (str_fmt …)                          Formatted string
  (ltrim …)                            String without leading blanks
  (rtrim …)                            String without ending blanks
  (alltrim …)                          String without leading and ending blanks
  (pack …)                             String without redundant blanks
  (head …)                             First token of string
  (tail …)                             Remaining tokens of string
  (upper …)                            Upper case string
  (lower …)                            Lower case string
  (rev …)                              Back ordered string
  (padl …)                             Left justified string
  (padr …)                             Right justified string
  (padc …)                             Centered string
  (time)                               Current time
  (getenv …)                           Environment value

Constants
  (ee)                                 e
  (gamma)                              Euler-Mascheroni constant
  (phi)                                Golden ratio
  (pi)                                 Pi
  (prn_integer_fmt)                    Default integer format “%ld”
  (prn_float_fmt)                      Default float format “%.16E”
  (prn_string_fmt)                     Default string format “%s”
  (reinit_terminal …)                  Terminal status
  (term_type)                          TERM environment
  (lines_term)                         termcap(li)
  (columns_term)                       termcap(co)
  (clrscr_term)                        termcap(cl)
  (reverse_term)                       termcap(mr)
  (blink_term)                         termcap(mb)
  (bold_term)                          termcap(md)
  (normal_term)                        termcap(me)
  (hidecursor_term)                    termcap(vi)
  (showcursor_term)                    termcap(ve)
  (gotocursor_term …)                  termcap(cm)
  (n_cpuproc)                          Number of CPU PROCs currently configured

Mapcar
  (mapcar …)

Asynchronous Heap
  (asyncheap_create …)                 Descriptor
  (asyncheap_getaddress …)             Physical address
  (asyncheap_putint …)                 1
  (asyncheap_getint …)                 Integer value
  (asyncheap_putfloat …)               1
  (asyncheap_getfloat …)               Float value
  (asyncheap_putstring …)              1
  (asyncheap_getstring …)              String value
  (asyncheap_reallocate …)             New descriptor
  (asyncheap_replicate …)              New descriptor
  (asyncheap_delete …)                 1

Table 3-1. BMDFM virtual machine language

The virtual machine language is first compiled into byte code and is then linked to resolve all external references. Figure 3-7 explains the structure of the byte code we designed for our BMDFM project. The structure is trivial but at the same time is flexible enough to extend the virtual machine with a C function library and to relocate the byte code harmlessly into the shared memory pool.


[Diagram: the VM language expression (setq a (* (+ b 2) a)) is compiled into byte code in which every function call (setq, *, +, get_var, get_const) carries a reference into the common function directory followed by local references to its arguments; variable references point into the common variable directory (a, b) and constants are stored inline; a subsequent linkage stage resolves the directory addresses into the machine presentation.]

Figure 3-7. Byte code generation


The byte code fragment shown reflects one to one the "(setq a (* (+ b 2) a))" expression of the virtual machine language. Every function from the byte code portion refers to the common function directory and is followed by a list of the arguments’ local references. This list is required to represent different offsets to the real locations of the arguments. Predefined constants are stored in the byte code directly but they are accessed through the "get_const" function call at runtime. Variable references point to the common variable directory.

The function directory contains all standard functions, virtual machine functions defined via "defun" and functions added through the C interface. Normally, the directory is fixed after a successful compilation/linkage stage. The exception case is the "mapcar" invocation, which can force dynamic expansion of the function directory. The variable directory is dynamically created in the stack every time the runtime engine enters a virtual machine UDF recursively.
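As a hypothetical illustration only (the actual BMDFM byte code layout is not reproduced here), an entry of the kind described above could be modeled as a function directory reference followed by the local references of its arguments:

    typedef struct{
      unsigned long func_ref;     /* index into the common function directory  */
      unsigned long n_args;       /* number of argument references that follow */
      unsigned long local_ref[8]; /* offsets to the real argument locations;   */
                                  /* variable arguments point into the common  */
                                  /* variable directory, constants are inline  */
    } bytecode_entry;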

In addition to the virtual machine language, BMDFM provides an open C interface, which is described below. To simplify declaration constructions we use the following abbreviations for the standard types.

#define UCH unsigned char
#define SCH signed char
#define USH unsigned short int
#define SSH signed short int
#define ULO unsigned long int
#define SLO signed long int
#define DFL double

The virtual machine stores each variable in a universal structure that enables it to change data types dynamically and to hold either a single value or an array with members of different types, thus supporting lists and trees. Declaring a variable of the universal structure type allocates the single value on the stack and the array in the heap, which is convenient because most variables store only single values. No memory overhead is needed for storing arrays whose members all have the same type.


struct fastlisp_data{
  UCH disable_ptr;  // 1 = stores value, 0 = ptr to a variable possible
  UCH single;       // 1 = single value, 0 = array
  UCH type;         // 0=undef, 'I'=int, 'F'=float, 'S'=string, 'Z'=nil
  UCH arraytype;    // 0=undef, 'I'=int, 'F'=float, 'S'=string, 'Z'=nil
  union{
    SLO ival;       // integer value
    DFL fval;       // float value
  } value;
  UCH *svalue;      // string value
  ULO indices_numb; // number of indices in the array
  UCH *aready_tags; // flags '0IFSZ' for arraytype!=0 for every member
  union{
    struct fastlisp_data *mix; // array members of mixed types
    SLO *ival;                 // array members of integer type
    DFL *fval;                 // array members of float type
    UCH **svalue;              // array members of string type
  } array;
};

A user C function can be defined through the following type declaration.

typedef void (*fcall)(ULO*, struct fastlisp_data*);

The first argument is a pointer to the function parameters/arguments and the second argument is a pointer to the result structure. Passed parameters/arguments can be obtained from inside the function via the following set.

void ret_ival(ULO *dat_ptr, SLO *targ);         // gets integer or pointer value
void ret_fval(ULO *dat_ptr, DFL *targ);         // gets float value
void ret_sval(ULO *dat_ptr, UCH **targ);        // gets string value
void (*fcall)(ULO*, struct fastlisp_data*);     // gets universal structured data

There are two additional functions for copying and deleting the universal data structure.

void copy_flp_data(struct fastlisp_data *dest, struct fastlisp_data *source, ULO indices_numb);
void free_flp_data(struct fastlisp_data *ret_dat);

The last step, which should be performed after a user C function is defined, is to fill the virtual machine instruction database according to the following structure.


typedef struct{
  UCH *fnc_name;   // function name
  SSH operands;    // number of arguments
  UCH ret_type;    // result type: 'I'=int, 'F'=float, 'S'=string, 'Z'=nil
  UCH *op_type;    // flags 'IFSZ' for every argument
  fcall func_ptr;  // pointer to the function
} INSTRUCTION_STRU;

Figure 3-8 gives an example of a user defined C function. According to the byte code structure the function’s actual parameters/arguments are obtained sequentially through the incremented "dat_ptr". Internal calls to "ret_ival", "ret_fval" and "ret_sval" provide dynamic type casting if required. Direct "fcall" function invocation omits the dynamic casting and returns a universal data structure.

void my_function(ULO *dat_ptr, struct fastlisp_data *ret_dat){
  ULO *tmp_ptr;
  SLO n,result=0;
  UCH *str=NULL;
  DFL *f_array,koef;
  struct fastlisp_data idat={0,1,0,0,{0},NULL,1,NULL,{NULL}};
  ret_ival(dat_ptr,&n);                // arg0: integer (n)
  ret_ival(dat_ptr+1,(SLO*)&f_array);  // arg1: ptr to floats (f_array)
  ret_fval(dat_ptr+2,&koef);           // arg2: float (koef)
  ret_sval(dat_ptr+3,&str);            // arg3: string (str)
  tmp_ptr=*((ULO**)(dat_ptr+4));       // arg4:
  (*(fcall)*tmp_ptr)(tmp_ptr+1,&idat); // integer as universal data (idat)
  if(noterror){
    // ...
    // data processing to compute 'result' or noterror=0 if error;
    // ...
    ret_dat->single=1;
    ret_dat->type='I';
    ret_dat->value.ival=result;
  }
  if(idat.disable_ptr) free_flp_data(&idat);
  free_string(&str);
  return;
}

INSTRUCTION_STRU INSTRUCTION_SET[]={
  {"MY_FUNCTION",5,'I',"IIFSI",&my_function}
};
const ULO INSTRUCTIONS=sizeof(INSTRUCTION_SET)/sizeof(INSTRUCTION_STRU);

Figure 3-8. A user defined C function

Finally, "my_function" is registered in the virtual machine instruction database. The corresponding record states that the function has five parameters/arguments and returns an integer value. Parameter/argument types are integer, integer, float, string and integer respectively. Specification of the parameter/argument types is not strictly necessary because the virtual machine casts the data types dynamically, but such a specification can help an internal optimizer to generate more efficient byte code.


3.8 Workflow for Applications

Normally, the life cycle of a BMDFM user application has two major steps as shown in Figure 3-9. First the application is developed and tested under the BMDFM single threaded engine, then if it works properly it can be moved without any modifications into the BMDFM multithreaded engine.

[Diagram: development cycle; the user application is tested on the single threaded engine with small-size test patterns and modified until it works properly; in the exploitation phase it then runs on the multithreaded engine with full-size real patterns.]

Figure 3-9. Application life cycle

The proposed BMDFM workflow for the applications fully follows conventional approaches. The application design and debug processes can be accomplished on an inexpensive PC for example. Later, after the application has reached a certain state of maturity, it is run on an SMP mainframe multithreadedly in a batch mode using full size input data patterns. Two important features have to be mentioned in the context of BMDFM:

• If the application uses specific functions defined in C they have to be linked with both single threaded and multithreaded BMDFM engines. The BMDFM byte code structure and the C interface are designed in such a way that no modifications in C code are required.

• The BMDFM single threaded engine warns about all variables that are used without prior initialization. Such an uninitialized variable endangers dataflow processing, bringing it to an endless idle state (this comes from the dataflow principle itself). The severity of "warning" is changed to "error" in the multithreaded dataflow engine to prevent potential blocking.

3.9 Running Applications on a Single Threaded Engine

The BMDFM single threaded engine compiles, links and runs a user application in standalone mode as shown in Figure 3-10. If the application uses specific functions defined in C they have to be linked with the runtime engine.

[Diagram: the user application, together with the configuration profile, passes through the compiler and linker if not yet compiled; the compiled code runs on the single threaded engine VM within the run-time environment, handling application input/output until the end.]

Figure 3-10. Single threaded engine workflow


This is a classic scheme used by other virtual machines (TCL/TK, Perl, Java). In our case we use this workflow only for preliminary preparations before an application is moved to the BMDFM multithreaded engine.

3.10 Running Applications Multithreadedly

Multithreaded workflow for the applications is implemented in a client/server manner. The BMDFM multithreaded runtime engine acts as a server accepting and processing the connection sessions. The external task loader/listener pair acts as a client virtual machine uploading the application to the server.

The BMDFM server unit reads the configuration profile, initializes the shared memory pool, starts multiple copies of the daemons (CPU PROCs, OQ PROCs, IORBP PROCs and an additional PROC stat daemon that collects statistic information) in the background and enters the console mode. The BMDFM server unit is also responsible for shutting down the whole multithreaded engine correctly. This procedure is illustrated in Figure 3-11.

The external task unit reorganizes the user code, performs static scheduling, compiles, connects to the multithreaded engine, links, starts the listener thread and uploads the user application into the multithreaded engine, as shown in Figure 3-12.


[Diagram: the server reads the configuration profile, initializes the shared memory pool, starts PROC stat and multiple copies of CPU PROC, OQ PROC and IORBP PROC, then enters console mode (writing a console log file) until the shutdown sequence ends the multithreaded engine.]

Figure 3-11. BMDFM server unit workflow


[Diagram: the external task unit (loader/listener pair) runs the user application through the code reorganizer, static scheduler and compiler if it is not yet compiled, connects to the multithreaded engine, links, starts the listener thread and uploads the code through the control VM, handling application input/output until the end.]

Figure 3-12. External task unit workflow


The front-end virtual machine (Figure 3-13) plays a different role than in the case of a single threaded workflow. It does not execute the byte code of an application but it uploads the code dynamically to the dataflow runtime engine.

[Diagram: the front-end virtual machine (user virtual machine code, dynamic loader thread, listener thread) exchanges marshaled clusters with the multithreaded dataflow runtime engine over the interface; inside the engine, the shared memory pool holds virtual machine instructions and shared data of the user application, processed by the dataflow schedulers, the working processes and the user coarse-grain functions.]

Figure 3-13. Multithreaded engine workflow

Depending on the programming model, a user application is scheduled differently. In "scheme A" all byte code resides in the front-end virtual machine. Each statement of the code acts as a standalone virtual machine instruction and is scheduled in the dataflow engine. In "scheme B" and "scheme C" the dataflow engine uses coarse-grain parallelism of the user functions. The latter two schemes are preferable because they reduce dynamic scheduling overhead.

3.11 Summary

We have designed the BMDFM architecture as a hybrid dataflow machine built on commodity SMP hardware. The BMDFM runtime engine is an efficient SMP emulator of the tagged-token, explicit token store, threaded dataflow machine.


The aim we pursued was to use the dataflow runtime engine in the role of a fork-join runtime library. This idea brings a solution to the following problems:

• Compile-time strategies are useless against parallel operations that take a non-deterministic amount of time, making it difficult to know exactly when certain pieces of data will become available. This issue is solved in the dataflow runtime engine in a natural way.

• The dataflow runtime engine can resolve complex dependencies, which cannot be detected during the compilation stage.

• The dataflow approach eliminates the fork-join drawback of the idle time, in which program execution has to wait for the completion of the slowest thread.

Our architectural approach has the following important features that are not present in known runtime parallelization projects:

• The dataflow runtime engine is not aggressively optimized for the applications in some specific areas such as numeric processing, for example. It can solve inter-procedural and cross-conditional dependencies as well.

• The dynamic scheduling subsystem is decentralized and is executed in parallel on the same multiprocessors that run the application itself. This approach eliminates a situation where the task scheduling becomes a bottleneck of the entire computing process.

• An application is comprised of the conventional virtual machine language and classic C. There is no special language to control dataflow. The application program itself controls dataflow fully automatically and transparently.

• From the programming point of view, our approach excludes the problem of the single assignment paradigm. We think that our way of dataflow programming with a conventional algorithmic language can close the known gap of a missing programming methodology for dataflow.


Chapter 4 Dynamic Scheduling Subsystem

4.1 Overview

In this chapter we discuss the BMDFM dynamic scheduling subsystem, which is an efficient emulator of the tagged-token dataflow built upon shared memory. We have carefully optimized all dynamic shared memory structures and scheduling algorithms to reach high performance. We share our experience on how to avoid bottlenecks and deadlock effects. The following ideas are described:

• The architecture of the shared memory pool split into multiple banks.

• Semaphore synchronization of the parallel process-distributed dynamic scheduler.

• The architecture of an abstract shared memory dataflow machine.

• Effective parallel scheduling algorithms.

• Ordering of a sequential output stream after out-of-order dataflow processing.

• Program complexity of the dynamic scheduling subsystem.

4.2 Inter Process Synchronization

To synchronize the multiple parallel processes of the BMDFM dynamic scheduling subsystem we use standard SVR4 IPC semaphores. When several processes access the same data concurrently in the shared memory pool, the semaphore locking policy provides multiple read-only access and exclusive access for modification, excluding possible WAR/RAW/WAW data hazards. This basic synchronization principle is shown in Figure 4-1. The synchronization semaphores themselves are also stored in the shared memory pool together with the shared objects they control.


[Diagram: parallel processes (CPU PROCs, OQ PROCs, IORBP PROCs) perform R/W access to a shared object in the shared memory pool through a synchronization semaphore.]

Semaphore init:  SEM_INIT(MAX_VAL);
Object read:     SEM_OP(-1);       RD; SEM_OP(+1);
Object write:    SEM_OP(-MAX_VAL); WR; SEM_OP(+MAX_VAL);

Figure 4-1. Basic synchronization paradigm

Although the standard SVR4 IPC semaphore provides a broad range of operations (such as "undo" for example), we use a subset consisting of only three main functions:

• Semaphore initialization initializes a semaphore with a maximal possible value. This operation is performed only once when the semaphore is created. In our implementation all semaphores are created in the shared memory pool during the startup sequence and remain there until the dataflow engine is closed.

• Object read operations decrement the semaphore value by one before reading and increment the value back after reading is finished. A decrement can suspend the process to an idle state if the semaphore value is not positive. Thus all parallel processes can read the object simultaneously.

• Object write operations decrement the semaphore by the maximal value before writing and restore the value after. The calling process will be suspended if at least one other process is already accessing the object. Thus only one process can modify the object.

Normally, semaphore synchronization is considered expensive, but we use it with portability issues in mind. The three mentioned operations are implemented as separate procedures in the source code and can be changed if the target SMP hardware provides some specific inexpensive type of synchronizer (special mutexes, barriers, etc.).
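A minimal sketch of the three operations using the standard SVR4 semop() call follows; MAX_VAL and the wrapper names are ours for illustration.

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    #define MAX_VAL 32767 /* maximal semaphore value */

    /* Apply one increment/decrement to semaphore sem_num of the set sem_id;
       a negative op suspends the caller until it can complete. */
    static void sem_op(int sem_id, unsigned short sem_num, short op){
      struct sembuf sb;
      sb.sem_num=sem_num;
      sb.sem_op=op;
      sb.sem_flg=0; /* no SEM_UNDO: only the basic subset is used */
      semop(sem_id,&sb,1);
    }

    /* Multiple readers: each takes one unit of the semaphore value. */
    void object_read_lock(int id, unsigned short s)  { sem_op(id,s,-1); }
    void object_read_unlock(int id, unsigned short s){ sem_op(id,s,+1); }

    /* Exclusive writer: takes all MAX_VAL units, so it waits until no
       reader or writer holds the object and blocks all newcomers. */
    void object_write_lock(int id, unsigned short s)  { sem_op(id,s,-MAX_VAL); }
    void object_write_unlock(int id, unsigned short s){ sem_op(id,s,+MAX_VAL); }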


One other important issue regarding the semaphores is that the operating system can allocate only a limited number of them. This number can sometimes be insufficient, especially when the dataflow engine is configured with a big data buffer (DB) and/or operation queue (OQ). Therefore we came up with the idea of an interleaved semaphore distribution within the shared memory pool, as illustrated in Figure 4-2.

[Diagram: all semaphores are split into OQ, DB and other groups; the OQ cells, the DB cells and the other shared memory structures are mapped onto the semaphores of their respective group in an interleaved fashion.]

Figure 4-2. Interleaved semaphore distribution in the shared memory pool

All semaphores are divided into three groups: OQ semaphores, DB semaphores and others. The ratio of semaphores in a given group to the total number of semaphores is equal to the ratio of resources in the corresponding resource group to the total number of resources. Thus, the number of semaphores in each group is calculated according to the proportional sizes of the OQ, the DB and the other shared memory structures. Then each cell number (index) is assigned a semaphore from the appropriate group, interleaved through the modulo operation (index%number_of_semaphores_in_group). Such a distribution has two important features:

• Neighbouring cells will not be assigned to the same semaphore, which increases the probability that independent access attempts will not block each other.

• Different shared memory pool resources will not intersect through the same semaphores, which prevents a deadlock situation where a parallel process tries to block one resource while keeping another resource blocked.

The same distribution policy is applied to all other shared memory zones (shared areas of the memory pool to be protected against concurrent access). Figure 4-3 shows the access mechanism for shared zones within the range of 0 to Number_of_zones. To save on the number of dedicated semaphores we use only N semaphores, supposing that N is less than Number_of_zones. Parallel processes (PROCs), which access the shared zones randomly, select the blocking semaphore i%N. Such simple modulo calculations take a negligible amount of time and can easily be performed at runtime.

[Diagram: a PROC entering shared zone i (i = 0 ... Number_of_zones-1) selects the restriction semaphore Sem=i%N and performs Sem.SEM_OP(-1); it performs Sem.SEM_OP(+1) on the same semaphore when leaving the zone.]

Figure 4-3. Reduced number of blocking semaphores for the shared zones
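Expressed in C on top of the sem_op() wrapper sketched earlier in this section, the zone access mechanism reduces to a modulo mapping (N and the function names are again illustrative):

    #define N 64 /* number of dedicated semaphores, assumed < Number_of_zones */

    void enter_zone(int sem_id, unsigned long i){
      sem_op(sem_id,(unsigned short)(i%N),-1); /* block on semaphore i%N */
    }

    void leave_zone(int sem_id, unsigned long i){
      sem_op(sem_id,(unsigned short)(i%N),+1); /* release semaphore i%N  */
    }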


The dataflow engine itself is also synchronized through the semaphores. For this purpose we have built a common semaphore array, which consists of only twelve semaphores enumerated and explained below:

[DFMSrv]: Common Semaphores Array:
[DFMSrv]: N# | Meaning                            | SemVal
[DFMSrv]: ---+------------------------------------+-------
[DFMSrv]:  0 | for the DFM child PROCs blocking   | 1
[DFMSrv]:  1 | for the PROCs msg pipe             | 1
[DFMSrv]:  2 | for the CPU logs                   | 1
[DFMSrv]:  3 | for the statistic                  | 1
[DFMSrv]:  4 | for the Task Connection Zone       | 32767
[DFMSrv]:  5 | for the Trace Plugging Area        | 32767
[DFMSrv]:  6 | number of used entities in IORBPs  | 0
[DFMSrv]:  7 | number of free entities in DB      | 500
[DFMSrv]:  8 | number of free entities in OQ      | 5000
[DFMSrv]:  9 | for the changes fixing             | 0
[DFMSrv]: 10 | CPU PROCs awaker                   | 0
[DFMSrv]: 11 | OQ PROCs awaker                    | 0

Semaphore 0 is used by the BMDFM external tracer. When the tracer disables the semaphore value it creates a frozen state of the entire dataflow engine, thus allowing step-by-step dataflow debugging.

Semaphore 1 blocks the output from the parallel dynamic scheduling processes to the server console. The output contains stall warnings and statistic information.

Semaphore 2 controls CPU and IORBP PROCs when they log their activities to the common log file. The log file is useful to analyze a dataflow graph of the running application.

Semaphore 3 enables/disables the PROCstat auxiliary process to collect a statistic from the dataflow kernel. Statistic information shows the peak and average use of dataflow resources.

Semaphore 4 is used by the external loader/listener pair when a user application connects or disconnects the dataflow engine.

Semaphore 5 is responsible for the registration of the external tracer in the trace plugging area.

Semaphore 6 stores the number of cells used in the I/O ring buffer ports. Storing the resource usage count directly in the semaphore considerably simplifies synchronization between the parallel processes that try to occupy the resources and those that use them. Thus, an external task puts marshaled clusters into the ring buffers, increasing the semaphore value, which signals all waiting IORBP PROCs to fetch the clusters into the dataflow engine.

Semaphore 7 stores the number of free cells in the data buffer. As soon as a DB cell is released, all waiting IORBP PROCs are allowed to occupy a DB cell, decrementing the semaphore value again.


Semaphore 8 stores the number of free cells in the operation queue. An IORBP PROC tries to put a new instruction into the OQ by performing the SEM_OP(-1) operation. CPU PROCs execute the instructions, releasing them from the queue. Releasing an OQ cell is done through the SEM_OP(+1) operation, which in turn allows new instructions to be put into the queue.
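Semaphores 6 through 8 thus act as counting semaphores over pool resources. A sketch of the OQ case, reusing the sem_op() wrapper from the earlier sketch (the constant and function names are ours):

    #define SEM_FREE_OQ 8 /* semaphore 8: number of free entities in the OQ */

    /* An IORBP PROC blocks here until a free OQ cell exists, then takes it. */
    void occupy_oq_cell(int sem_id){ sem_op(sem_id,SEM_FREE_OQ,-1); }

    /* A CPU PROC releases the cell after executing the instruction,
       allowing a new instruction to enter the queue. */
    void release_oq_cell(int sem_id){ sem_op(sem_id,SEM_FREE_OQ,+1); }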

Semaphores 9 through 11 are the event signaling semaphores for the external tracer. They are used only when at least one tracing engine is connected to the dataflow engine. Semaphore 9 signals the tracer about changes in the dataflow engine, then the tracer signals back through the awakers to continue until the next change occurs.

4.3 Shared Memory Pool

The shared memory pool plays a key role in the entire BMDFM architecture because all BMDFM parallel processes synchronize their activities through the shared memory. According to our approach, the shared memory pool stores the synchronization semaphores, the complete data structure of the dataflow engine and the data allocated by user applications. Both the dynamic scheduling subsystem itself and running user applications can allocate/free memory blocks dynamically; therefore we designed our shared memory pool in a way that avoids a memory allocation bottleneck. Internally, the shared memory pool is divided into memory banks that can be accessed simultaneously by each BMDFM parallel process running its own copy of the reentrant shared memory pool driver. The shared memory pool architecture is shown in Figure 4-4.


[Diagram: the shared memory pool (size SHMEM_POOL_SIZE) exports the interface malloc(), realloc(), multicast(), free(); a semaphore-guarded round-robin bank selector (i%SHMEM_POOL_BANKS) dispatches requests to the banks; inside each bank, allocated entities grow upward from lower addresses while a heap of entity descriptors (pointer, size, multicast reference counter) grows downward from higher addresses, with a first-hole pointer marking where the free-space search starts.]

Figure 4-4. Architecture of the shared memory pool

The reentrant memory driver provides a conventional memory allocation interface, explained in Table 4-1. The functions malloc(), realloc() and free() comprise a complete set to control the memory allocation process, but in our implementation we use one additional function, multicast(), to assign multiple references to an allocated memory block. The function free() applied to a multicast memory block decrements the reference counter; the actual memory free happens when the counter reaches zero.

Function call                  Description
ptr=malloc(bytes)              Allocates a block of the specified number of bytes.
new_ptr=realloc(ptr,bytes)     Reallocates the specified block according to the new size.
multicast(ptr)                 Adds a new reference to the block.
free(ptr)                      Decrements the reference counter and/or frees the memory block.

Table 4-1. Function interface of the shared memory pool
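To make the reference-counting semantics of multicast()/free() concrete, the following minimal C sketch illustrates the mechanism; the descriptor layout and the release_block() helper are assumptions for illustration, not the actual BMDFM pool driver code.

#include <stddef.h>

/* Hypothetical entity descriptor; the field names are illustrative,
   not the actual BMDFM shared memory pool definitions.             */
typedef struct {
    void  *ptr;        /* start of the allocated block */
    size_t size;       /* block size in bytes          */
    int    multicast;  /* number of extra references   */
} entity_descriptor;

/* Assumed helper that returns the block to its bank. */
void release_block(entity_descriptor *d);

void pool_multicast(entity_descriptor *d)
{
    d->multicast++;               /* register one more holder of the block */
}

void pool_free(entity_descriptor *d)
{
    if (d->multicast > 0)
        d->multicast--;           /* drop one reference, keep the block */
    else
        release_block(d);         /* last reference gone: reclaim memory */
}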

The top structure of the shared memory pool contains a synchronization semaphore used to increment a round-robin bank selector in exclusive mode. This is the only critical section of code that all processes have to go through; however, it contains nothing but an increment of the bank selector, so no serious bottleneck is present. Furthermore, each memory pool bank has its own synchronization semaphore, which ensures that only one process enters while the internal bank structure is being modified. The algorithm is illustrated in pseudo-code below.

Sync_semaphore.SEM_OP(-MAX_VAL);          // enter pool critical section
RoundRobin_bank_selector=(RoundRobin_bank_selector+1)%SHMEM_POOL_BANKS;
bank_number=RoundRobin_bank_selector;
Sync_semaphore.SEM_OP(MAX_VAL);           // leave pool critical section

bank_number.Sync_semaphore.SEM_OP(-MAX_VAL);  // enter bank critical section
// ...
// perform memory allocation and modify the bank structure
// ...
bank_number.Sync_semaphore.SEM_OP(MAX_VAL);   // leave bank critical section

Allocated blocks occupy memory space from lower addresses toward higher addresses. Each memory block has a corresponding entity descriptor containing a pointer to the allocated block, the size of the block and the multicast reference counter. The heap of entity descriptors grows from higher addresses to lower addresses until the memory space of the bank is exhausted. In our implementation we use a pointer to the first memory hole to speed up the allocation of a new memory block. This additional pointer forces the search subroutine to start searching for free space not from the beginning of the heap but from the first memory hole.
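The effect of the first-hole pointer can be sketched in C as follows; the hole-list layout and the carve_block() helper are assumed for illustration only.

#include <stddef.h>

/* Hypothetical descriptor of a free hole inside one memory bank. */
typedef struct hole {
    size_t size;          /* size of the free gap in bytes */
    struct hole *next;    /* next hole at a higher address */
} hole;

typedef struct {
    hole *first_hole;     /* the "first hole" accelerator pointer */
} memory_bank;

/* Assumed helper: splits the hole and returns the new block. */
void *carve_block(memory_bank *bank, hole *h, size_t bytes);

/* First-fit search that starts at the first known hole instead of
   scanning the whole heap of entity descriptors from the start.   */
void *bank_alloc(memory_bank *bank, size_t bytes)
{
    for (hole *h = bank->first_hole; h != NULL; h = h->next)
        if (h->size >= bytes)
            return carve_block(bank, h, bytes);  /* fits here */
    return NULL;  /* bank exhausted: caller tries the next bank */
}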

The synchronization semaphores are also operated in a read mode, SEM_OP(-1)/SEM_OP(1), when the system collects statistics. A typical statistics report is shown below, from which we can recognize that the round-robin policy divides the load equally among the shared memory pool banks.

[MemPool]: ** STATUS OF THE SHARED MEMORY DRIVEN BY REENTERABLE CODE **
[MemPool]: Shared memory segment ID=4087.
[MemPool]: SHMEM_POOL_SIZE: 2147483648Bytes (10 BANK(S) of 214748352 each).
[MemPool]: Shared memory segment has been attached at 0x4000068000.
[MemPool]: Shared memory segment permissions are 0x01B4.
[MemPool]:
[MemPool]: B#0: Ent=897, FA=798, Free=210477464B(210420120), Frag=0.03%.
[MemPool]: B#1: Ent=901, FA=797, Free=210358008B(210261568), Frag=0.05%.
[MemPool]: B#2: Ent=900, FA=805, Free=210385192B(210266224), Frag=0.06%.
[MemPool]: B#3: Ent=898, FA=792, Free=210148336B(210099184), Frag=0.02%.
[MemPool]: B#4: Ent=894, FA=790, Free=210321824B(210158824), Frag=0.08%.
[MemPool]: B#5: Ent=894, FA=811, Free=210782552B(210682200), Frag=0.05%.
[MemPool]: B#6: Ent=892, FA=794, Free=210493944B(210381120), Frag=0.05%.
[MemPool]: B#7: Ent=884, FA=794, Free=210630464B(210594624), Frag=0.02%.
[MemPool]: B#8: Ent=890, FA=797, Free=210582384B(210241864), Frag=0.16%.
[MemPool]: B#9: Ent=878, FA=794, Free=210404136B(210225776), Frag=0.08%.
[MemPool]: Memory Pool TOTAL:
[MemPool]: Number of entities allocated in the pool: 8928.
[MemPool]: Number of extra multicast references in the pool: 10.
[MemPool]: Free space in the pool: 2104584304Bytes.
[MemPool]: Largest free block in the pool: 210682200Bytes.
[MemPool]: Fragmentation of holes in the pool: 0.06%.

The proposed architecture of the shared memory pool is fully multithreaded and scalable for any number of running processes and shared memory pool banks. Generally, it performs well on all available SMP hardware we experimented with.

4.4 Non-Dead-Locking Policy

Dead-locking is an important issue for every parallel system that shares common resources. This section describes how we avoid the problem in our parallel architecture.

First, we establish some prerequisite conditions during the static scheduling stage:

• The static scheduler checks for all uninitialized variables in a user application that are potential hazards for the dataflow engine. The uploading process will not start until all uninitialized states are fixed.

• The uploading process feeds the marshaled clusters into the dataflow engine in the sequence they would be executed single-threadedly. That prevents a situation where the dataflow resources are saturated with unresolved data waiting for dependencies from outside.

Secondly, the dataflow engine relies on distributive semaphore allocation in the shared memory pool. Different shared resources thus do not use the same semaphores, which prevents a dead-lock situation where a parallel process tries to lock one resource while keeping another resource locked.

And finally, the dynamic scheduling subsystem uses the one-way object locking policy shown in Figure 4-5.


Figure 4-5. One-way object locking policy

There are four object locking algorithms in the dynamic scheduling subsystem. In each case a mutual blocking of resources is excluded:

1. The external task loader locks a cell in the I/O buffer ports for writing. If some stream data has to be allocated dynamically the shared memory pool bank is locked as well.

2. IORBP processes read the marshaled clusters from the I/O buffer cells, locking the cell for reading. The cluster’s data is targeted to the data buffer and the cluster’s instructions are directed to the operation queue; both types of cells are locked for writing. Additionally, the shared memory bank can be locked for streamed data allocation.

3. OQ processes tag ready instructions in the operation queue. To tag the instruction it has to be locked for writing and all dependent operands, which are being checked in the data buffer, are locked for reading.


4. CPU processes execute ready instructions and put the results back to the data buffer or to the output queue. The output queue is locked for writing only. The DB cell is locked for reading when the required operands are being read and is locked for writing when the result is flushed to the DB. The shared memory pool bank is locked for writing.

4.5 Inter Process Communication

Three types of dynamic scheduling processes (IORBP PROC, OQ PROC and CPU PROC) communicate with each other through communication channels. The dataflow engine uses our own implementation of the communication channels rather than the standard SVR4 IPC message queues, for the following reasons:

• The standard SVR4 IPC message queues are restricted in size, maximum number of messages and message structure. These limits come from operating system configuration parameters.

• The message queues provide more functionality than we need. For example, messages can be retrieved from the queue in random order by their reference numbers, which is not necessary for us.

• The extra functionality of the message queues causes an approximate slowdown factor of 10 in comparison with the fastest communication through shared memory, as our experiments with TAU [147, 148] have shown.

Our implementation of the communication channels is a classic ring buffer structure in the shared memory synchronized by semaphores. As shown in Figure 4-6, two types of communication channels are used: unprotected write protected read (UWPR) and protected write protected read (PWPR).


Figure 4-6. UWPR and PWPR channels

Both communication channel types use ring buffer pointers for read and write, and synchronization semaphores protect the internal data structures. Because all communication channels are used in a predefined way (Figure 4-7), we omit write semaphore locking in most cases, reducing the write synchronization effort nearly to zero.


Figure 4-7. Communication between the scheduling processes

The BMDFM dynamic scheduling subsystem uses four communication directions: OQ PROCs to CPU PROCs, IORBP PROCs to CPU PROCs, CPU PROCs to OQ PROCs and IORBP PROCs to OQ PROCs.

OQ PROCs to CPU PROCs communication. This group of channels delivers addresses of the ready instructions tagged by the OQ PROCs. The size of each channel is equal to the size of the operation queue, so no data overlap is possible. Therefore we use the UWPR type of channel, which does not require checking of the "Free space available" semaphore.


IORBP PROCs to CPU PROCs communication. As in the previous case the purpose of these channels is to deliver addresses of ready instructions with the difference that they are tagged by the IORBP PROCs. This could happen if the instructions have all operands ready at the moment of relocating to the operation queue. Here we also use UWPR channels as no write synchronization is needed. The CPU PROCs listen to both groups of channels retrieving the addresses of ready instructions tagged by OQ and IORBP PROCs. Unfortunately, there is no chance to reduce synchronization efforts for reading.

CPU PROCs to OQ PROCs communication. These channels transfer the addresses of ready operands after they have been processed by the CPU PROCs. To avoid write synchronization, the size of the channels is set to the size of the DB. The UWPR channel type is applicable here as well.

IORBP PROCs to OQ PROCs communication. This group of channels delivers the addresses of ready operands put into the I/O ring buffer from outside. In practice, an external task may provide ready data in multiple contexts, so the total amount of data can exceed the size of the DB. Therefore, the IORBP PROCs to OQ PROCs communication is the only place where PWPR channels are used. The OQ PROCs listen to both groups of channels, retrieving the addresses of ready data tagged by the CPU and IORBP PROCs.

The following pseudo-code demonstrates schematically how the channel write/read operations are implemented. Each process uses its own dedicated write channel that allows skipping the blocking lock while incrementing the ring buffer write pointer.

Initial values for the channel semaphores are set as follows:

• All blocking semaphores are set initially to one, permitting any incoming operations.

• "Data available" semaphores store the number of unread data inside the channel, initial zero values indicate that all channels are empty.

• "Free space available" semaphores initially are set to the size of the channel signaling write enable state.

put_channel(address){

  "Free_space_available".SEM_OP(-1);          // only for PWPR channels

  ring_buffer[write_pointer]=address;
  write_pointer=(write_pointer+1)%buffer_size;

  "Data_available".SEM_OP(1);
}


get_channel(){

  "Data_available".SEM_OP(-1);

  Blocking_semaphore.SEM_OP(-1);
  address=ring_buffer[read_pointer];
  read_pointer=(read_pointer+1)%buffer_size;
  Blocking_semaphore.SEM_OP(1);

  "Free_space_available".SEM_OP(1);           // only for PWPR channels

  return address;
}

As we can see, there is no synchronization effort when writing to a UWPR channel and minimal effort when writing to a PWPR channel. Since the UWPR channel type is used in most of the cases, writing to a channel is practically optimal. Unfortunately, two synchronization points remain when reading from a channel.

4.6 Task Connection Zone

The task connection zone (TCZ) provides an interface to the multithreaded dataflow engine (Figure 4-8). The TCZ can be configured for N_IORBP connection sockets, defining the number of simultaneously connected external task loader/listener pairs (user applications). A user application is uploaded into a dedicated area of the I/O ring buffer ports (IORBP) according to the round-robin policy. The IORBP PROC processes then fetch this information, multithreadedly scanning all IORBPs.


Figure 4-8. Task connection zone

Each connected user application (job) is associated with a session and communicates through a dedicated interface. Initially, the user application uploads all seamless byte code fragments into the TCZ function directory and all referenced variables to the data buffer (DB). Then, during runtime, the application uploads marshaled clusters, which contain data and instructions in the form of references to the DB and the TCZ function directory, respectively. The application can also ask for some data that influences the uploading sequence. All these calls are performed through the dedicated interface described in Table 4-2. Such an uploading scheme has the following advantages:

• The marshaled clusters and seamless byte code fragments are prepared during the preprocessing and compilation stages. Thus no dynamic marshaling overhead is required.


• The size of the marshaled clusters and uploading traffic are considerably reduced because instead of the byte code fragments only references to the TCZ function directory are transferred.

• Space reservations in the DB for all application variables are done only once when the external connection is initialized, completely excluding associative searches at runtime.

Function call                       Description
dfminit_upload(var_lst, fnc_lst)    Initializes a new session, allocates variables in the DB,
                                    uploads seamless byte code fragments into the TCZ
                                    function directory.
dfmend_session()                    Informs the listener that the last chunk of code is
                                    uploaded, so the listener can wait until all data are
                                    received from the dataflow engine and finish.
dfmclose_session()                  Waits for the listener. When the listener is done, purges
                                    the occupied TCZ socket and closes the session.
dfmput_marshaled_cluster(cluster)   Delivers a marshaled cluster.
dfmput_idata(ivar)                  Puts a variable with a new context, initialized with an
                                    integer value.
dfmput_fdata(fvar)                  Puts a variable with a new context, initialized with a
                                    float value.
dfmput_sdata(svar)                  Puts a variable with a new context, initialized with a
                                    string value.
dfmput_zdata(zvar)                  Puts a variable with the current context. The variable is
                                    not initialized (nil value), but in this way the variable
                                    (with the latest context in the DB) can receive destination
                                    attributes indicating that the value is requested by the
                                    loader or listener.
dfmput_crelease(contexts)           Informs the garbage collector about obsolete contexts,
                                    which are marked for deletion.
dfmget_idata(var)                   The loader requests an integer variable value and
                                    receives it when it is ready.
dfmget_sdata(var)                   The loader requests a string variable value and receives
                                    it when it is ready.

The loader can request only integer and string data, which is sufficient to control the uploading sequence.

Table 4-2. Interface to the multithreaded dataflow engine
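For illustration, a schematic loader session built from the Table 4-2 calls could proceed as follows; the control flow and names such as more_clusters_to_upload are hypothetical placeholders, not part of the interface:

dfminit_upload(var_lst, fnc_lst);       // open session: reserve DB cells,
                                        // upload byte code fragments
while(more_clusters_to_upload){
  dfmput_idata(ivar);                   // feed input data as needed
  dfmput_marshaled_cluster(cluster);    // deliver the next cluster
  if(upload_depends_on_result)
    n=dfmget_idata(var);                // steer the uploading sequence
  dfmput_crelease(obsolete_contexts);   // release obsolete contexts
}
dfmend_session();                       // last chunk is uploaded
dfmclose_session();                     // wait for listener, free the socket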

The last but also very important part of the task connection zone is the TCZ output queue, shown in more detail in Figure 4-9. The purpose of the TCZ output queue is to provide a buffer where the output data stream can be ordered after out-of-order dataflow processing.


Figure 4-9. TCZ output queue

Normally, the output stream is a result of processing done by the CPU PROCs. Because the data is processed on dataflow, the results can be computed in an unpredictable order. To order them in the output stream, all output data are accompanied by an output_order attribute. This attribute begins its life cycle in the marshaled cluster; then it is stored in the DB together with the variable value. As soon as the output value becomes ready, it is sent to the TCZ output queue.

The TCZ output queue is organized in rows. Each row has its own synchronization semaphore for the case of conflict when multiple parallel processes try to hold the row. A CPU PROC selects a row by performing a trivial modulo operation, output_order%N, where N is the number of rows. The computed result is put into a free cell of the row together with the output_order attribute, firing the stir-up semaphore. To prevent redundant copying of the resulting stream, the multicast() function of the shared memory pool is called. The external task listener waits until the stir-up semaphore is fired and fetches the data. Two fragments of code run by the CPU/IORBP PROC and the external listener, respectively, are shown below:

push_output_queue(DB_addr){              // CPU PROC or IORBP PROC

  row=DB[DB_addr].output_order%N;

  sem[row].SEM_OP(-MAX_VAL);
  put_row(row,multicast(DB[DB_addr].data));
  sem[row].SEM_OP(MAX_VAL);

  stir-up_semaphore.SEM_OP(1);
}


pull_output_queue(){                     // External listener

  stir-up_semaphore.SEM_OP(-1);

  read_next=true;
  while(read_next){

    row=output_order_counter%N;

    sem[row].SEM_OP(-MAX_VAL);

    if(data=found_in_row(row,output_order_counter)){
      stream.concatenate(data);
      output_order_counter++;
    }
    else
      read_next=false;

    sem[row].SEM_OP(MAX_VAL);
  }

  return stream;
}

4.7 I/O Ring Buffer Ports

I/O Ring Buffer Ports (IORBP) are an array of IORBP cells located in the task connection zone. The structure of the IORBP cell is shown in Figure 4-10. The IORBP cell contents are initialized by the external loader and later fetched by the IORBP PROCs.

Figure 4-10. IORBP cell


Naturally, the IORBP cell has a blocking semaphore to protect the internal structure when it is shared by the parallel processes. A busy flag indicates that the contents are loaded and valid; otherwise the cell is considered free. Depending on the information type, the cell can contain a marshaled cluster, a variable value arranged in the universal structure, a list of contexts to be released, or an indication of session begin/close (Init/Release) associated with the connected user job. The universal structure is accompanied by the following "destination attributes":

<dest_Loader/Listener> is a flag that can be set if the value is needed by the loader to control the uploading sequence or by the listener to be included into the output stream.

<output_order> is valid only if the <dest_Loader/Listener> flag is set for the listener. In this case the <output_order> defines the order in which the data has to appear in the output stream.

<prev_context_release> points to the previous context number, which will automatically be marked as obsolete for the garbage collector. If no active links to the obsolete context exist, the context is removed from the dataflow engine.

Figure 4-11 shows the structure of the marshaled cluster. The marshaled cluster contains a group of functions (local function directory) and referenced variables (local variable directory). Each function is a reference to the TCZ function directory; thus it is a number associated with a seamless byte code fragment. As in the case of the universal structure, each function has "destination attributes" attached. Context variables from the local directory are targeted to the DB, and the functions will be moved to the operation queue (OQ). Each function inherits the context from its destination variable; the other way around, each destination variable inherits the "destination attributes" from its function-producer. This makes the marshaled cluster structure maximally compact, as the clusters are transferred and decoded at runtime.


Figure 4-11. Structure of marshaled cluster
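Conceptually, the layout of a marshaled cluster can be pictured as the following C structure; the field names and the fixed directory sizes are assumptions made for this sketch, not the BMDFM definitions.

/* Illustrative shape of a marshaled cluster; names are assumed. */
#define MAX_VARS  64
#define MAX_FUNCS 64

struct marshaled_cluster {
    int n_vars;                 /* entries in the local variable directory */
    struct {
        int  db_ref;            /* reference to a DB cell                  */
        long context;           /* tagged-token context of the value       */
    } var_dir[MAX_VARS];

    int n_funcs;                /* entries in the local function directory */
    struct {
        int tcz_func_ref;       /* index into the TCZ function directory   */
        int dest_var;           /* destination variable: the function      */
                                /* inherits its context, the variable      */
                                /* inherits the destination attributes     */
    } func_dir[MAX_FUNCS];
};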

This section is an appropriate place to describe the allocation algorithm used by the external loader to occupy a free IORBP cell. Because the same strategy is applied when finding free DB and OQ cells, we describe the algorithm in a more general way, showing allocation of a shared cell in an abstract array:


ArrayFreeCells_semaphore.SEM_OP(-1);      // synchronization: a free cell exists,
while(true){                              // endless allocation loop is safe.

  array_ptr=(array_ptr+1)%array_size;     // Round-Robin iterate.

  if(!array[array_ptr].busy){             // first speculative check.

    array[array_ptr].SEM_OP(-MAX_VAL);    // lock the cell for modification.

    if(!array[array_ptr].busy){           // second expensive safe check.
      array[array_ptr].busy=true;         // mark as occupied.
      break;                              // leave the allocation loop.
    }

    array[array_ptr].SEM_OP(MAX_VAL);     // unlock the cell.
  }
}

fill_contents();                          // fill the data.

array[array_ptr].SEM_OP(MAX_VAL);         // unlock the cell.

An important feature of the proposed algorithm is the speculative check performed without a semaphore lock. Only if the first speculative check passes does the second, expensive safe check make the final decision. Taking into account that semaphore locking is on average 100 times slower than a simple conditional, the proposed algorithm performs much faster, as our experiments with TAU [147, 148] have shown.

4.8 Data Buffer

The Data Buffer (DB) is an array of DB cells located in the shared memory pool. The structure of the DB cell, which is intended to store multiple contexts of a single variable, is shown in Figure 4-12. The DB cell contents are initialized by the IORBP PROCs and subsequently used as the storage for all active contexts of a variable value.


Figure 4-12. DB cell

According to the tagged-token dataflow model, every new variable assignment creates a copy of the variable in a new unique context. Previous contexts live as long as they are referenced by unprocessed instructions; later a garbage collector removes them from memory. Because multiple contexts can be processed at the same time, the DB cell structure is organized as an array of rows containing the context data. Each row has a blocking semaphore to control data sharing. In addition to the main data storage, the DB cell assists in a speculative tagging mechanism. The speculative tag buffer is modified every time a new context is introduced into the DB cell. OQ zone pointers are tagged for those zones where context-dependent instructions are located. When a variable value is computed and turned into a ready state, the instructions from the tagged OQ zones are checked for readiness.

All other components of the DB cell are explained below:

<Busy_flag> tags whether the DB cell is free. Reservation of the cells for a user application is done once during the initialization phase dfminit_upload().

An array flag is set when the variable itself is an array. It defines the method by which the universal structure is checked for readiness. A problem can appear if a CPU PROC tries to execute an instruction containing indirect addressing: in this case the array index is computed at runtime and can reference an array member that is not ready.

<Socket#> is a task connection zone (TCZ) socket number that defines the user application this data belongs to. The socket number associates the input/output streams of the user application; it is also used as a filter to purge the socket of an interrupted application.

<Busy_flag> of a context row marks the context data as busy. These flags are checked when a new context is being allocated in the same row.

<Release_flag> means that the context data can be removed from memory as soon as it is not referenced by any instruction (<Number_of_waiters> is equal to zero).

<Context> is a unique number. This number is checked when a new context is being allocated in the same row.

<Number_of_waiters> is the number of context-dependent instructions. A non-zero value means that the context is still in use.

<dest_Loader/Listener> and <output_order> are the "destination attributes" defining the correct appearance of the results in the output stream.

When a new context is allocated and/or checked, the row and the locking semaphore are calculated by a simple modulo operation; an integer division defines the starting point for the search in the row (Figure 4-13). Such multiple context data structuring has two advantages:

• It allows access to neighbouring contexts independently without blocking.

• It has negligible dynamic scheduling overhead while allocating and searching for the context data.


Figure 4-13. Multiple context data structuring (row = Context % N; starting point for allocation = Context / RowLength)

The following pseudo-code fragment illustrates an implementation of shared access to the context data:

row=Context%N;                            // select the row.
row.SEM_OP(-MAX_VAL);                     // lock the row.

posInRow=Context/RowLength;               // starting point in the row.
found=false;

for(i=0;i<RowLength;i++){

  ContextData=row[posInRow];

  if(ContextData.Busy_flag && ContextData.Context==Context){
    found=true;
    break;
  }

  posInRow=(posInRow+1)%RowLength;        // Round-Robin
}

if(!found){

  if(NoFreeSpaceInRow)                    // expand in chunks.
    RowLength=expandRow(ARRAYBLOCK_SIZE); // RowLength+=ARRAYBLOCK_SIZE.

  posInRow=Context/RowLength;             // starting point in the row.

  while(1){                               // reservation for new context.

    ContextData=row[posInRow];

    if(!ContextData.Busy_flag)
      break;

    posInRow=(posInRow+1)%RowLength;      // Round-Robin
  }

  ContextData.Busy_flag=1;                // now new context
  ContextData.Context=Context;            // is placed to the row.
}

// ...
// modify the context data
// ...

row.SEM_OP(MAX_VAL);                      // unlock the row

First, we search whether the context is already in the row, assuming that its most probable position in the row is Context divided by RowLength. In case the context is not found, we reserve a new location for it, expanding the row if necessary. All shared memory dynamic structures expand in chunks, which makes shared memory reallocation less frequent. The starting point for the new context in the row is also computed as the context divided by the row length. In this way we increase the probability that our assumption about the starting position holds for all further searches. A sketch of the chunked expansion follows.
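This is a minimal sketch of such a chunked expansion, assuming a hypothetical expandRow() helper built on top of the realloc() interface from Table 4-1 and an assumed EntrySize constant:

expandRow(chunk){                           // grow the row by one chunk,
  NewLength=RowLength+chunk;                // e.g. chunk=ARRAYBLOCK_SIZE
  row=realloc(row,NewLength*EntrySize);     // pool reallocation happens rarely
  for(i=RowLength;i<NewLength;i++)
    row[i].Busy_flag=false;                 // new slots start out free
  return NewLength;
}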

4.9 Operation Queue

The Operation Queue (OQ) is an array of OQ cells located in the shared memory pool. The structure of the OQ cell is shown in Figure 4-14. The IORBP PROC daemons store the instructions in the OQ, then the OQ PROC daemons tag them as ready if all required operands are ready. Ready instructions are fetched and executed by the CPU PROCs. Physically, each instruction is a reference to one seamless byte code fragment from the TCZ function directory.


Figure 4-14. OQ cell

All other components of the OQ cell are explained below:

<Busy_flag> is set if the OQ cell currently contains an active instruction. This flag is dropped only when the instruction has been successfully executed.

<Socket#> defines to which user application the instruction belongs. This is important when the BMDFM system runs many applications on one dataflow engine.

An indirect-addressing flag is set when the instruction contains indirect addressing. In this case the array indices are computed at runtime to reference concrete array members.

<var_ref> references the instruction’s operands in the DB. The reference is already resolved and points directly to the context data in the DB cell.

<ready_flag> marks each operand as ready or not. The OQ PROC daemons check only the unready operands. The instruction is considered ready if all operands are ready.

<index> specifies the array index in addition to the <var_ref>. Initially, the index is set to zero for all array operands. In a first approach such an operand is considered ready if the zero member of the array is ready. The real array index is computed at runtime and can reference an array member that is not ready. In this case the first execution fails and the operand is reset to the unready state, this time with the real index. Such a multi-pass checking scheme is applied to all indirect-addressing instructions.


4.10 IORBP Scheduling Process

The IORBP PROC daemon runs an endless loop, in which all busy IORBP cells are analyzed for their types. Depending on the cell type, five different actions are taken: session initialization, session deactivation, data allocation, processing of a marshaled cluster, and context release. The loop is controlled by semaphore number 6, which stores the number of busy cells. After the cell is processed, it is relocked in writing mode to drop the <Busy_flag>. Note that relocking (SEM_OP(-MAX_VAL); SEM_OP(MAX_VAL-1); … SEM_OP(1-MAX_VAL);) is a dead-lock-safe form of the (SEM_OP(-1); … SEM_OP(1-MAX_VAL);) semaphore operation:

while(1){

  semaphore6.SEM_OP(-1);            // number of used entities in IORBPs
  iorbp=find_Busy();

  iorbp.sem.SEM_OP(-MAX_VAL);       // lock cell for writing
  iorbp.sem.SEM_OP(MAX_VAL-1);      // re-lock cell for reading

  switch(iorbp.Type){
    case Init:            Socket#=dfminit_upload(var_lst, fnc_lst); break;
    case Release:         dfmclose_session(Socket#); break;
    case Data:            dfmput_data(var); break;
    case Cluster:         dfmput_marshaled_cluster(cluster); break;
    case Context_release: dfmput_crelease(contexts);
  }

  iorbp.sem.SEM_OP(1-MAX_VAL);      // re-lock cell for writing
  Busy_flag=false;
  iorbp.sem.SEM_OP(MAX_VAL);        // unlock cell
}

Session initialization allocates application's variables in DB, uploads seamless byte code fragments into the TCZ function directory and registers user application in TCZ.


dfminit_upload(var_lst, fnc_lst){

  allocate_variables(var_lst);
  upload_TCZ_function_directory(fnc_lst);

  semaphore4.SEM_OP(-MAX_VAL);      // lock Task Connection Zone for writing
  Socket#=register_application();
  semaphore4.SEM_OP(MAX_VAL);       // unlock Task Connection Zone

  return Socket#;
}

Session deactivation, analogously to the session initialization sequence, purges the DB and the TCZ function directory and then releases the socket.

dfmclose_session(Socket#){

  purge_variables();
  purge_TCZ_function_directory();

  semaphore4.SEM_OP(-MAX_VAL);      // lock Task Connection Zone for writing
  release_socket(Socket#);
  semaphore4.SEM_OP(MAX_VAL);       // unlock Task Connection Zone
}

Data allocation delivers the application’s data to the DB and performs several auxiliary procedures. The variable var is taken from the IORBP cell and moved to the context data of the DB cell. The function call access_context_data() modifies an existing context or creates a new one from scratch.

If the context data is ready, the OQ PROC is informed through the communication channel to check the dependent instructions. To reduce the number of speculative checks we use a restriction semaphore structure. It is probable that a request to check the current DB speculative tag buffer was already sent to one of the channels; in this case the restriction semaphore number DB_addr%size_of_restriction_structure is locked. If some of the OQ PROCs are currently checking all tags from the DB speculative tag buffer, the restriction semaphore is unlocked again. Based on experiments with TAU [147, 148], we estimate a significant (10-fold) relief in the OQ PROCs' load when using such a protection mechanism.

Ready data could also be requested by the loader or listener. After that, the IORBP PROC performs garbage collection: if the current context data is marked for deletion (Release_flag is true) and there are no dependent instructions (Number_of_waiters is zero), then the context is removed from memory. The same garbage collection is applied to the previous context data if specified in prev_context_release.

dfmput_data(var){

  ContextData=access_context_data(var);          // context data of DB cell

  if(ContextData.Ready){

    if(!check_Restriction_Semaphore_Structure()) // instructions are being
                                                 // already checked.
      put_channel(ContextData);                  // IORBP to OQ communication

    switch(var.dest_Loader/Listener){
      case Loader:   loader.dfmget_data(ContextData);             // destination=loader
      case Listener: push_output_queue(ContextData,output_order); // listener output_queue
    }
  }

  if(ContextData.Release_flag && !ContextData.Number_of_waiters)
    ContextData.remove();                        // remove from memory

  if(var.prev_context_release){

    ContextData=access_context_data(prev);       // previous context data
    ContextData.Release_flag=true;

    if(!ContextData.Number_of_waiters)
      ContextData.remove();                      // remove from memory
  }
}

The size of the OQ restriction semaphore structure was experimentally defined as 2*sqrt(Q_DB) using TAU [147, 148]. This structure can be seen in a semaphore report log:

[DFMSrv]: OQ Seek Hashing Buffer Semaphores Array:
[DFMSrv]: (BlkSemsVals 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1)
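The check/reset pair referenced by the pseudo-code in this section can be sketched as follows. This is a minimal illustration of the intended mechanism, not the literal BMDFM code; in particular, SEM_OP_NOWAIT denotes an assumed non-blocking decrement (IPC_NOWAIT in SVR4 terms) and struct_size stands for the 2*sqrt(Q_DB) structure size.

check_Restriction_Semaphore_Structure(){      // called before put_channel()
  sem=restriction_sems[DB_addr%struct_size];
  if(sem.SEM_OP_NOWAIT(-1))                   // claimed the semaphore:
    return false;                             // no check pending, send message
  return true;                                // a check is already pending
}

reset_Restriction_Semaphore_Structure(){      // called by the OQ PROC
  sem=restriction_sems[DB_addr%struct_size];
  sem.SEM_OP(1);                              // re-arm: new messages allowed
}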

The processing of a marshaled cluster is done via two nested loops. The first loop iterates through all the cluster’s functions; the second loop iterates through all of a function’s variables. Each function is allocated in the OQ as an instruction. Variables are moved to the DB via the function call access_context_data(), which modifies an existing context or creates a new one. Then the cross-references are filled: the DB speculative tag buffer is tagged to the instruction and the instruction’s parameters are pointed to the context data. Additionally, the number of dependencies is incremented (waiters++).

The instruction is tagged as ready if all operands are ready. If the instruction is ready the CPU PROC is informed through the communication channel to execute the instruction. After that the IORBP PROC performs garbage collection as in the case of data allocation.

dfmput_marshaled_cluster(cluster){

  foreach func_ref in cluster{                  // iterate all functions.

    instruction=allocate_in_OQ(func_ref);       // allocate OQ instruction.
    instruction.Ready=true;                     // assume instruction is ready.

    foreach var_ref in func_ref{                // iterate all variables.

      ContextData=access_context_data(var_ref); // context data in DB cell.
      ContextData.Number_of_waiters++;          // number of dependencies

      // tag OQ_zone_ptr for OQ speculative search
      DB_cell.speculative_tag_buffer[instruction%buffer_size]=true;

      // initialize instruction's parameters
      instruction.parameter[var_ref].var_ref=ContextData;
      instruction.Ready&=       // instruction is ready if all operands are
                                // ready (non-short-circuit evaluation)
        (instruction.parameter[var_ref].ready_flag=ContextData.Ready);
      instruction.parameter[var_ref].index=0;
    }

    if(instruction.Ready)
      put_channel(instruction);                 // IORBP to CPU communication

    if(func_ref.prev_context_release){

      ContextData=access_context_data(prev);    // previous context data
      ContextData.Release_flag=true;

      if(!ContextData.Number_of_waiters)
        ContextData.remove();                   // remove from memory
    }
  }
}

Context release is a pure garbage collection procedure. It is called by the external loader after finishing a UDF uploading sequence. The contexts of all the UDF’s local variables are collected in a list to be removed from memory. If there are no dependent instructions, the contexts are removed.


dfmput_crelease(contexts){

  foreach context in contexts{                 // iterate all contexts.

    ContextData=access_context_data(context);  // context data in DB cell
    ContextData.Release_flag=true;

    if(!ContextData.Number_of_waiters)
      ContextData.remove();                    // remove from memory
  }
}

4.11 OQ Scheduling Process

One of the serious dataflow problems is the dynamic scheduling overhead caused by dynamic operand matching and tagging. BMDFM exploits a speculative tagging mechanism for the instructions, as shown in Figure 4-15. Each data buffer (DB) cell additionally stores a limited number of tags to the instructions in the operation queue (OQ). Each tag speculatively represents several OQ cells where dependent instructions might be stored. Through experiments with TAU [147, 148], we determined an optimal size for the DB cell speculative tag buffer, equal to 2*sqrt(Q_DB). The OQ PROC interleaved search loop is triggered when a variable's data becomes ready in the DB. The implemented mechanism provides the following benefits:

• The DB cell speculative tag buffer has a fixed size and will not be dynamically reallocated.

• The DB cell speculative tag buffer does not require any semaphore synchronization.

• The reduced size of the DB cell speculative tag buffer saves memory space and scanning time.

• Speculative checking loops do not block each other, detect required tagging with reasonable probability and keep tagging latency low.


Figure 4-15. Speculative tagging of instructions

Each of the OQ PROC daemons runs an endless loop, listening to the IORBP and CPU communication channels. As soon as the DB cell number with new ready data is received, the speculative tagging algorithm is triggered.

First, we reset the restriction semaphore structure to allow sending new messages to the communication channels. Then the DB cell speculative tag buffer is scanned for all fired tags. Each tag represents only a limited number of OQ cells to be checked, but they are interleaved along the OQ. The second, nested loop performs this interleaved selection.


while(1){

  DB_cell=get_channel();                   // listen to all IORBP and CPU channels.

  reset_Restriction_Semaphore_Structure(); // instructions are being checked

  // scan DB_cell speculative tag buffer
  for(i=0;i<buffer_size;i++){

    if(DB_cell.speculative_tag_buffer[i]){

      DB_cell.speculative_tag_buffer[i]=0; // instructions are being checked

      // check speculatively across the OQ
      for(j=i;j<Q_OQ;j+=buffer_size)
        check_instruction(OQ[j]);
    }
  }
}

The check_instruction() procedure itself is shown in the following code fragment. An instruction check is necessary only if the instruction is present in the OQ cell and is not ready yet. We check this twice: the first speculative check significantly reduces the number of semaphore operations. Initially, we assume the instruction is ready; then this assumption is verified for each unready parameter of the function. Finally, the instruction is tagged as ready if all operands are ready. If the check fails, the already ready operands will not be checked again next time. Ready instructions are sent to the CPU PROCs for execution through the communication channel.

check_instruction(instruction){

  if(instruction.Busy_flag && !instruction.Ready){   // first speculative check

    instruction.sem.SEM_OP(-MAX_VAL);                // lock OQ cell for writing.

    if(instruction.Busy_flag && !instruction.Ready){ // second safe check.

      instruction.Ready=true;                        // assume the instruction is ready

      foreach parameter in instruction.parameters
        if(!parameter.ready_flag){

          ContextData=access_context_data(parameter); // context data

          // instruction is ready if all operands are ready
          instruction.Ready&=        // (non-short-circuit evaluation)
            (parameter.ready_flag=ContextData.Ready);
        }
    }

    instruction.sem.SEM_OP(MAX_VAL);                 // unlock OQ cell

    if(instruction.Ready)
      put_channel(instruction);                      // OQ to CPU communication
  }
}

4.12 CPU Executing/Scheduling Process

The CPU PROC daemons execute the instructions tagged as ready, performing some scheduling tasks in addition. In the main loop an instruction is obtained from the communication channel and executed on a modified kernel of the virtual machine. In contrast to a standard kernel, the modified kernel communicates directly with the shared memory pool. Only three modifications are required for this:

• (get_var …) and ([al]index …) functions take variable values from the shared memory pool.

• ([al]setq …) and (arsetq …) assignment functions store variable values in the shared memory pool performing the additional scheduling.

• (asyncheap_ …) function group allocates memory in the shared memory pool.

while(1){

  instruction=get_channel();          // listen to all IORBP and OQ channels.

  modified_VM.execute(instruction);   // execute the instruction
}

(get_var …) and ([al]index …) functions use the instruction's parameter to refer directly to the context data in the DB cell. The context data is locked for reading while it is copied from the shared memory pool. In case an array is accessed with an unready member, the parameter is reset back to the unready state and the current instruction is cancelled. No rollback actions are needed in the shared memory pool because the instruction works with local copies of the operands.

get_var__index(parameter,index){

  operand=parameters[parameter].var_ref;
  ContextData=DB[operand];

  ContextData.sem.SEM_OP(-1);               // lock context data for reading

  if(ContextData.Ready){
    ContextData.sem.SEM_OP(1);              // unlock context data.
    return copy_flp_data();                 // return a copy of context data
  }

  ContextData.sem.SEM_OP(1);                // unlock context data.

  parameters[parameter].Ready_flag=false;   // reset ready parameter.
  parameters[parameter].index=index;        // set real index.

  throw exception(notReady);                // cancel instruction execution
}

([al]setq …) and (arsetq …) assignment functions also use the instruction's parameter to refer directly to the context data in the DB cell. The context data is locked for writing while it is modified in the shared memory pool. After the context data is copied, the following scheduling sequence is executed:

1. Data could be delivered to the loader or listener if requested.

2. The OQ PROC is informed through the communication channel to check the dependent instructions and tag them as ready. To reduce the number of speculative checks we use the restriction semaphore structure again.

3. For all parameters the number of dependencies is decremented by one (waiters--), immediately followed by an attempt to apply the garbage collection sequence to them.

4. Finally, the executed instruction is removed from the OQ, freeing space for other instructions.

setq_arsetq(parameter,index){

  operand=parameters[parameter].var_ref;
  ContextData=DB[operand];

  ContextData.sem.SEM_OP(-MAX_VAL);            // lock context data for writing.
  ContextData=copy_flp_data();                 // copy context data.
  ContextData.sem.SEM_OP(MAX_VAL);             // unlock context data

  switch(var.dest_Loader/Listener){
    case Loader:   loader.dfmget_data(ContextData);             // destination=loader
    case Listener: push_output_queue(ContextData,output_order); // listener output_queue
  }

  if(!check_Restriction_Semaphore_Structure()) // instructions are being
                                               // already checked.
    put_channel(ContextData);                  // CPU to OQ communication.

  foreach parameter in parameters{             // iterate all parameters.

    ContextData=access_context_data(parameter); // context data in DB cell.
    ContextData.Number_of_waiters--;            // number of dependencies

    if(ContextData.Release_flag && !ContextData.Number_of_waiters)
      ContextData.remove();                    // remove from memory
  }

  instruction.remove();                        // remove from memory
}

4.13 Complexity of the Dynamic Scheduling Subsystem

Many different methodologies exist for estimating program complexity [77, 106]. Some of them recommend counting code lines; others are based on counting conditional branches and function calls. In our approach, we consider synchronization points to be most important for the dynamic scheduling subsystem.

Thus we estimate the complexity of the dynamic scheduler by the number of semaphore decrement operations where a process can be suspended into an idle state. The BMDFM system can be compiled in a debug mode (_DEBUG_MODE_), in which every synchronization point assigns a state number before entering a critical code section.

...
#ifdef _DEBUG_MODE_
  *state_ptr=76;
#endif
SEM_CMD(dfmserver.semID_commonsem,4,-dfmserver.sem_maxval);
if(cz_task->socket_used){
#ifdef _DEBUG_MODE_
  *state_ptr=77;
#endif
  free_string(&cz_task->fastlisp_errmsg);
...


The above example code fragment defines two states. State number 76 is assigned when the system tries to enter the task connection zone (Semaphore 4). Synchronization state 77 appears while a memory block is being freed in the shared memory pool. All states together represent the state of the dataflow engine as shown below:

[Msg]: Current BM_DFM native processes states:
[Msg]:  N# | CPUPROCs | OQPROCs | IORBPPROCs | PROCstat
[Msg]: ----+----------+---------+------------+---------
[Msg]:   0 |        4 |      14 |          8 |        2
[Msg]:   1 |        4 |      14 |         17 |
[Msg]:   2 |        4 |      13 |          7 |
[Msg]:   3 |        4 |      14 |         17 |
[Msg]:   4 |      116 |      14 |         11 |
[Msg]:   5 |        4 |      13 |         17 |
[Msg]:   6 |        4 |      13 |         25 |
[Msg]:   7 |        4 |      14 |         11 |
[Msg]:   8 |      108 |      14 |         17 |
[Msg]: Current BM_DFM external processes states (continue):
[Msg]:  N# | ExtTaskLd {PCount} | ExtTaskLs | ExtTrace
[Msg]: ----+--------------------+-----------+---------
[Msg]:   0 | 35 {0000000025}    |         8 | _UNDEF_
[Msg]:   1 | _UNDEF_ {__UNDEF_} |  _UNDEF_  | _UNDEF_

For each process the number of synchronization states is listed in Table 4-3.

Process                   Number of states
IORBP PROC                              92
OQ PROC                                 16
CPU PROC                               159
PROC stat                                8
External task loader                    56
External task listener                   8
External tracer                          9
TOTAL                                  348

Table 4-3. Program complexity of the dynamic scheduler

The time complexity expressed in the number of synchronization states depends on how many copies of the scheduling processes run in parallel. Thus it can easily be calculated using the following formula: 92 * N_IORBPPROC + 16 * N_OQPROC + 159 * N_CPUPROC + (56+8) * Number_of_Loaders/Listeners. We assume that PROCstat and the external tracers are just auxiliary utilities, which can be run optionally.
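For example, a hypothetical configuration with 4 IORBP PROCs, 4 OQ PROCs, 8 CPU PROCs and one connected loader/listener pair would yield 92*4 + 16*4 + 159*8 + 64*1 = 368 + 64 + 1272 + 64 = 1768 synchronization states.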


4.14 Summary

We have designed an all-purpose SMP dataflow engine, which performs parallelization of sequential applications at runtime. Our architecture has the following features:

• Multiple context data structuring. This allows dataflow processing of the iterations in parallel, storing the iteration’s data dynamically under different contexts. It is also a key point in resolving inter-procedural and cross-conditional dependencies.

• Speculative tagging of instructions. This significantly reduces the dynamic scheduling overhead, the main problem of dataflow processing.

• Parallel load of clusters. This approach avoids a bottleneck when the parallel dataflow machine is fed dynamically from the single-threaded control virtual machine. In the proposed scheme the marshaled clusters are prepared statically at the compilation stage, which causes no additional runtime overhead for marshaling.

• Multithreading. The dataflow engine functionality is fully distributed among the parallel processes. We have carefully tracked down all possible narrow/critical places in the code. All scheduling algorithms are multithreaded and do not have bottlenecks.

• One-way locking policy. There are four object locking algorithms in the dynamic scheduling subsystem. Obviously in each case we exclude a mutual blocking of the resources. In combination with distributive semaphore allocation in the shared memory pool we prevent a situation where the dataflow resources might be saturated.

• Ordering results after out-of-order processing. Because the data is processed on dataflow the results could be computed in an unpredictable order. Our dataflow architecture orders the output stream automatically in the TCZ output queue. Thus the output stream will always appear naturally ordered.

• Multi-level granularity of parallelism. The BMDFM virtual machine can define seamless macro- instructions, the bodies of which are prevented from dynamic scheduling. Exploitation of coarse-grain parallelism in addition to the fine-grain parallelism is more efficient as naturally less time is spent on dynamic scheduling.

• Immediate garbage collector. A postponed release of shared resources is dangerous in the case of shared memory dataflow processing. Our parallel algorithms are programmed to release unused resources immediately, which also prevents a saturated dataflow.


Although the contribution of this thesis is the BMDFM dataflow architecture in its entirety, we would especially like to highlight the multiple context data structuring, speculative tagging of instructions and parallel load of clusters as our main contributions.

We have also estimated program complexity of the dynamic scheduling subsystem, which is expressed in 348 synchronization states.


Chapter 5 Static Scheduling Subsystem

5.1 Overview

In this chapter we describe all sequential stages necessary to transform and upload the conventional code into the dataflow engine. We call this flow a Static Scheduling of the code.

All transformations described in this chapter are performed fully automatically. Although the code examples are given in the virtual machine language notation, the same methodology can be applied to any other code semantics that formally consist of variable assignments, indirect addressing (indexed arrays), conditionals, loop processing and function declarations/invocations.

In the proposed static scheduling flow we have defined the following stages:

• Checking for the parallel dataflow code style restrictions. This formal checking procedure detects all code fragments that are dangerous for dataflow processing. In particular, it considers all kinds of uninitialized variables that would suspend the dataflow.

• Static and dynamic type casting. This stage is optional and does not influence the static scheduling flow itself. Our implementation of the virtual machine language does not require explicit type declarations in the application’s code. To reduce the overhead of runtime type casting, we analyze the code statically, detecting the data types where possible.

• Code reorganization. Application code is split automatically into fragments, which have only one destination for a computed result per fragment. Such a fragment is supposed to be a seamless instruction for the dataflow processing.

• Generation of marshaled clusters. At this stage a sequential user application is split into code chunks called marshaled clusters. This transformation aims to prepare the application for a multithreaded dynamic uploading into the dataflow engine.

• Uploading of marshaled clusters. Marshaled cluster uploading is the final stage of the static scheduling flow. It is executed on the virtual front-end machine that acts as a von Neumann control machine for the dataflow engine.


5.2 Parallel Dataflow Code Style Restrictions

The purpose of checking for the parallel dataflow code style restrictions is to track down all variables that are used without prior initialization. Such an uninitialized variable endangers dataflow processing, bringing it into an endless idle state (because dataflow processing is controlled by the firing of ready operands and instructions). To detect all uninitialized variables efficiently, the checking algorithm always assumes the worst case of initialization, as shown in Figure 5-1.

(progn
  (if (== a 0)
    (setq var 0)                # initialization within a conditional
    nil
  )
  (for i 1 1 n
    (setq var i)                # initialization within a loop
  )
  (setq b (& a (setq var 1)))   # initialization in the second argument
                                # of a boolean short-circuit operation.
  (setq c var)                  # var is not initialized
)

Figure 5-1. Variable initialization within potentially unreachable code

Thus we can summarize that a variable is (still) considered to be uninitialized in the following cases:

• It was not initialized at all.

• It was previously initialized in a conditional statement.

• It was previously initialized within a loop (the worst-case assumption is that a loop can iterate zero times).

• It was previously initialized in the second argument of a boolean and/or short-circuit operation that is not obligatorily evaluated.
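The following is a minimal C sketch of such a worst-case checking pass; the toy AST, node kinds and variable names here are our own illustration, not the BMDFM implementation:

#include <stdio.h>
#include <string.h>

enum kind { SEQ, ASSIGN, USE, COND };  /* progn, setq, variable read, and   */
                                       /* if/for/while/short-circuit bodies */
struct node {
  enum kind k;
  const char *var;                     /* variable name for ASSIGN/USE      */
  struct node *sub[4];                 /* children                          */
  int nsub;
};

static const char *inited[64];
static int ninited;

static int is_inited(const char *v){
  int i;
  for(i=0;i<ninited;i++)
    if(!strcmp(inited[i],v)) return 1;
  return 0;
}

/* Worst case: an assignment counts only on the unconditional path. */
static void check(struct node *n, int conditional){
  int i;
  switch(n->k){
  case ASSIGN:
    if(!conditional) inited[ninited++]=n->var;
    break;
  case USE:
    if(!is_inited(n->var))
      printf("restriction: '%s' may be used uninitialized\n",n->var);
    break;
  case COND:                           /* bodies may execute zero times     */
    for(i=0;i<n->nsub;i++) check(n->sub[i],1);
    break;
  case SEQ:
    for(i=0;i<n->nsub;i++) check(n->sub[i],conditional);
    break;
  }
}

int main(void){                        /* models: (if c (setq var 0) nil)   */
  struct node set={ASSIGN,"var",{0},0};/*         (setq c var)              */
  struct node iff={COND,0,{&set},1};
  struct node use={USE,"var",{0},0};
  struct node top={SEQ,0,{&iff,&use},2};
  check(&top,0);                       /* reports 'var'                     */
  return 0;
}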

The BMDFM restriction checking procedure additionally performs a few other checks that are implementation specific. A full report about the checked conditions can be seen in the logs:


*****************************************************************
* The BM_DFM CODE STYLE RESTRICTIONS Summary:                   *
* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                   *
* o Variable names within the inclusive range of                *
*   [TMP__000000000; TMP__999999999] are reserved.              *
* o 'SHADOW' is the reserved name for a UDF.                    *
* o Array names should differ from ordinary variable names.     *
* o Every variable should be initialized before it is used.     *
*   The following is an example of how to copy an array:        *
*     ...                                                       *
*     (arsetq a 0 1)                                            *
*     (arsetq a 1 5)                                            *
*     (alsetq b (alindex a 2))  # instead of '(setq b a)'       *
*     ...                                                       *
* o The <start> and <to> values of a loop should be             *
*   the integer numeric constants, function parameters or       *
*   initialized variables which are not changed inside this     *
*   loop.                                                       *
* o Second parameter of the booleans <&> and <|> should         *
*   not include any assignments, I/O, conditional/              *
*   iterational processing and UDF calls.                       *
* NOTE: All conventional programs can be converted by a         *
*       formal procedure to those that accept the above         *
*       mentioned code style restrictions.                      *
*****************************************************************

5.3 Static and Dynamic Type Casting

The BMDFM virtual machine is data type insensitive. The data type declarations can simply be omitted as shown below:

(progn
  (setq a "Dummy")    # string
  (setq a 1.2)        # float
  (setq a (+ a 2))    # casting to integer dynamically
)

To avoid the dynamic casting overhead the data types are analyzed at compile-time. After static type casting, the source code is modified slightly (Figure 5-2).


Source code
(progn
  (setq a (ival 1.2))
  (setq a (+ a 2))
)

Modified code
(PROGN
  (SETQ@I A@I (IVAL@F 1.2))
  (SETQ@I A@I (+@J A@I 2))
)

Figure 5-2. Static casting of data types

The casting procedure adds type suffixes to the functions and variables that specify their types. The suffixes are abbreviated from the standard type names:

• "I" stands for Integer.

• "F" stands for Float.

• "S" stands for String.

• "Z" for nil.

• "J" stands for Justified. This is a special type casting for functions that have more than one argument. In this case the function is justified if all argument types match or is not justified otherwise.

The suffixes provide additional information to the compiler and linker, enabling optimized code generation. Having experimented on TAU [147, 148], we estimate a speed gain of 4-5 times when running statically cast VM code compared to dynamically cast VM code. Figures 5-3 and 5-4 explain the difference in the casting mechanisms on a physical level.


#if defined(OP_CODE__IADD)
void func__iadd(ULO *dat_ptr, struct fastlisp_data *ret_dat){
  SLO op_b;
  ret_ival(dat_ptr,&ret_dat->value.ival);    // DYNAMIC CASTING for ARG 0.
  ret_ival(dat_ptr+1,&op_b);                 // DYNAMIC CASTING for ARG 1.
  if(noterror){
    ret_dat->single=1;
    ret_dat->type='I';
    ret_dat->value.ival+=op_b;               // ADDITION
  }
  return;
}

void func__iadd_j(ULO *dat_ptr, struct fastlisp_data *ret_dat){
  ULO *tmp_ptr;
  SLO op_a;
  ret_dat->disable_ptr=1;
  tmp_ptr=*((ULO**)dat_ptr);                 // ARG 0. NO RUNTIME CASTING NEEDED
  (*(fcall)*tmp_ptr)(tmp_ptr+1,ret_dat);
  op_a=ret_dat->value.ival;
  tmp_ptr=*((ULO**)(dat_ptr+1));             // ARG 1. NO RUNTIME CASTING NEEDED
  (*(fcall)*tmp_ptr)(tmp_ptr+1,ret_dat);
  if(noterror)
    ret_dat->value.ival+=op_a;               // ADDITION
  return;
}
#endif

OP_CODE_STRU OP_CODE[]={                     // Internal registry for instructions
  // ...
#ifdef OP_CODE__IADD
  ,{OP_NAME__IADD.name,2,'I',"II",{&func__iadd,&func__iadd_j,NULL,NULL,NULL}}
#endif
  // ...
};
const ULO OP_CODES=sizeof(OP_CODE)/sizeof(OP_CODE_STRU);

Figure 5-3. Physical meaning of justification

Native virtual machine functions use the same C interface, which is open for user defined functions. The (+ …) function has a dual implementation: depending on the static casting, the linker links the VM byte code with func__iadd(), which resolves types at runtime, or with func__iadd_j() otherwise.

Functions with only one argument have one generic implementation and four type-dependent implementations. When linkage is applied to our VM code example, the (IVAL@F …) function is linked with the func__ival_f() implementation.


#ifdef OP_CODE__IVAL
void func__ival(ULO *dat_ptr, struct fastlisp_data *ret_dat){
  ret_dat->single=1;
  ret_dat->type='I';
  ret_ival(dat_ptr,&ret_dat->value.ival);    // DYNAMIC CASTING for ARG
  return;
}

void func__ival_i(ULO *dat_ptr, struct fastlisp_data *ret_dat){
  dat_ptr=*((ULO**)dat_ptr);
  (*(fcall)*dat_ptr)(dat_ptr+1,ret_dat);     // NO RUNTIME CASTING NEEDED
  return;
}

void func__ival_f(ULO *dat_ptr, struct fastlisp_data *ret_dat){
  ret_dat->disable_ptr=1;
  dat_ptr=*((ULO**)dat_ptr);
  (*(fcall)*dat_ptr)(dat_ptr+1,ret_dat);     // NO RUNTIME CASTING NEEDED
  if(noterror){
    ret_dat->type='I';
    if(ret_dat->value.fval<0.0)
      ret_dat->value.ival=(SLO)ceil((double)ret_dat->value.fval);
    else
      ret_dat->value.ival=(SLO)floor((double)ret_dat->value.fval);
  }
  return;
}

void func__ival_s(ULO *dat_ptr, struct fastlisp_data *ret_dat){
  ret_dat->disable_ptr=1;
  dat_ptr=*((ULO**)dat_ptr);
  (*(fcall)*dat_ptr)(dat_ptr+1,ret_dat);     // NO RUNTIME CASTING NEEDED
  if(noterror){
    ret_dat->type='I';
    ret_dat->value.ival=atol(ret_dat->svalue);
  }
  return;
}

void func__ival_z(ULO *dat_ptr, struct fastlisp_data *ret_dat){
  ret_dat->disable_ptr=1;
  dat_ptr=*((ULO**)dat_ptr);
  (*(fcall)*dat_ptr)(dat_ptr+1,ret_dat);     // NO RUNTIME CASTING NEEDED
  if(noterror){
    ret_dat->type='I';
    ret_dat->value.ival=0;
  }
  return;
}
#endif


OP_CODE_STRU OP_CODE[]={                     // Internal registry for instructions
  // ...
#ifdef OP_CODE__IVAL
  ,{OP_NAME__IVAL.name,1,'I',"U",{&func__ival,&func__ival_i,&func__ival_f,
                                  &func__ival_s,&func__ival_z}}
#endif
  // ...
};
const ULO OP_CODES=sizeof(OP_CODE)/sizeof(OP_CODE_STRU);

Figure 5-4. Physical meaning of casting

5.4 Code Reorganization

The goal of code reorganization is to split an application code into fragments that have only one destination for a computed result per fragment. Such a fragment is supposed to be a seamless instruction for dataflow processing.

For a prefix notation, the code reorganization mainly means moving the nested constructions to the upper nesting levels. The following example code demonstrates this idea. Additionally, variable and UDF names are prefixed with the name of the parent function they are nested in. The most common parent function is the function "MAIN".

Initial code
(setq a (setq b (setq c (+ 1 2))))

Preprocessed code
(SETQ@I MAIN:C@I (+@J 1 2))
(SETQ@I MAIN:B@I MAIN:C@I)
(SETQ@I MAIN:A@I MAIN:C@I)

Further we describe how the methodology of code reorganization is applied to the formal language constructions: UDF, I/O, conditionals and loops.

User defined functions

• A UDF invocation has to have its own destination variable. If it does not, an artificial temporary variable TMP is automatically introduced (here and further on in the explanation text we use short names for the TMP variables, truncating superfluous zeros). A UDF is always considered to be a coarse-grain function, in contrast to all native VM functions, which are fine-grained. This very simple rule is at the same time very important, because it defines where a temporary TMP variable has to be introduced automatically. Suppose that in our example (Figure 5-5) the variables TMP1 and TMP2 were not introduced and the coarse-grain UDF "true" performs heavy-weight computations. Then the seamless expression (!= (true) (true)) will run on one CPU. Otherwise, due to the artificially introduced variables, the dataflow engine will schedule the two invocations of "true" onto multiple CPUs, so they will be executed in parallel.

• A UDF declared at the virtual machine level always assigns its return value to the variable TMP0. That instructs the dataflow engine to process returned values correctly. Note that this is valid only for programming model scheme A. A UDF declared on scheme B or scheme C (a UDF implemented in the C language) is not preprocessed at all.

Initial code
(defun true
  (progn 1)
)
(setq false (!= (true) (true)))

Preprocessed code
(DEFUN MAIN:TRUE
  (SETQ@I MAIN:TRUE:TMP__000000000@I 1)
)
(SETQ@I MAIN:TMP__000000001@I (MAIN:TRUE))
(SETQ@I MAIN:TMP__000000002@I (MAIN:TRUE))
(SETQ@I MAIN:FALSE@I (!=@I MAIN:TMP__000000001@I MAIN:TMP__000000002@I))

Figure 5-5. Preprocessing of the UDF declaration and UDF invocation

Output

• Functions which generate an output have to have their own destination variable. If they do not, an artificial temporary variable TMP is automatically introduced as shown in Figure 5-6.

Initial code (outf "One = %d\n" 1) (outf "Two = %d\n" (++ 1)) Preprocessed code (SETQ@S MAIN:TMP__000000001@S (OUTF "One = %d\n" 1)) (SETQ@S MAIN:TMP__000000002@S (OUTF "Two = %d\n" (++@J 1)))

Figure 5-6. Preprocessing of the output functions

Input

• In contrast to the output, which is processed in the dataflow engine, the input is organized in the control front-end VM. The control front-end VM communicates with the dataflow engine through the TCZ interface via dfmget_data(var)/dfmput_data(var)-like calls. Therefore input functions should have variables as arguments and destination. If they do not, the artificial temporary variables are automatically introduced (Figure 5-7). The same methodology is applied to file-based I/O.

Initial code
(setq number (++ (accept "Enter number: ")))

Preprocessed code
(SETQ@S MAIN:TMP__000000001@S "Enter number: ")
(SETQ@S MAIN:TMP__000000002@S (ACCEPT MAIN:TMP__000000001@S))
(SETQ@I MAIN:NUMBER@I (++ MAIN:TMP__000000002@S))

Figure 5-7. Preprocessing of the input functions

Conditionals

• Conditionals having no global constructions (setq, arsetq, for, while, break, exit) in their branches are not modified. Such a conditional is similar to the C "?:" operator. Otherwise the conditional becomes global and is moved to the upper nesting level. Figure 5-8 shows an example of two conditionals ("if a" is a global one and "if b" is in the local scope). In the preprocessed code the "if a" conditional is placed at the top level, keeping the assignment of the variable "number" appropriate.

Initial code
(setq number
  (if a
    (if b 1 a)
    (setq b 1)
  )
)

Preprocessed code
(IF@J MAIN:A@I
  (SETQ@I MAIN:NUMBER@I (IF@J MAIN:B@I 1 MAIN:A@I))
  (PROGN
    (SETQ@I MAIN:B@I 1)
    (SETQ@I MAIN:NUMBER@I 1)
  )
)

Figure 5-8. Preprocessing of conditionals in the global and local scopes

For- and while-loops

• To process a loop, the control front-end VM gets the control variables from the dataflow engine and then controls the iteration process. Therefore artificial temporary variables transform the loop so that all loop controls are constants or plain variables (Figures 5-9 and 5-10).


Initial code
(for i (- a b) 1 a
  (outf "%d\n" i)
)

Preprocessed code
(SETQ@I MAIN:TMP__000000001@I (-@J MAIN:A@I MAIN:B@I))
(FOR@J MAIN:I@I MAIN:TMP__000000001@I 1 MAIN:A@I
  (SETQ@S MAIN:TMP__000000002@S (OUTF "%d\n" MAIN:I@I))
)

Figure 5-9. For-loop preprocessing

Initial code
(while (< a b) (progn
  (outf "%d\n" a)
  (setq a (++ a))
))

Preprocessed code
(SETQ@I MAIN:TMP__000000001@I (<@I MAIN:A@I MAIN:B@I))
(WHILE@J MAIN:TMP__000000001@I (PROGN
  (SETQ@S MAIN:TMP__000000002@S (OUTF "%d\n" MAIN:A@I))
  (SETQ@I MAIN:A@I (++@J MAIN:A@I))
  (SETQ@I MAIN:TMP__000000001@I (<@I MAIN:A@I MAIN:B@I))
))

Figure 5-10. While-loop preprocessing

Recursive call

• In the case of recursion a UDF calls itself from inside its body. This violates the rule that a UDF always has to assign its return value to the variable TMP0. To solve the problem we use the following trick: a copy of the recursive UDF is declared as a function "shadow" inside the UDF. Then the shadow function calls its parent UDF, and the UDF itself calls its own shadow copy. Such an intermediate representation of recursion avoids clobbering the variable TMP0. Figure 5-11 shows the initial, intermediate and preprocessed code, respectively, for a classic recursive calculation of a factorial.


Initial code
(defun factorial_f
  (if (<= $1 1.)
    1.
    (*. $1 (factorial_f (-. $1 1.)))
  )
)

Intermediate code
(defun factorial_f (progn
  (defun shadow
    (if (<= $1 1.)
      1.
      (*. $1 (factorial_f (-. $1 1.)))
    )
  )
  (if (<= $1 1.)
    1.
    (*. $1 (shadow (-. $1 1.)))
  )
))

Preprocessed code
(DEFUN MAIN:FACTORIAL_F (PROGN
  (DEFUN MAIN:FACTORIAL_F:SHADOW (PROGN
    (SETQ@I MAIN:FACTORIAL_F:SHADOW:TMP__000000002@I
      (<= MAIN:FACTORIAL_F:SHADOW:$1 1.)
    )
    (IF@J MAIN:FACTORIAL_F:SHADOW:TMP__000000002@I
      (SETQ@F MAIN:FACTORIAL_F:SHADOW:TMP__000000000@F 1.)
      (PROGN
        (SETQ@F MAIN:FACTORIAL_F:SHADOW:TMP__000000001@F
          (MAIN:FACTORIAL_F (-. MAIN:FACTORIAL_F:SHADOW:$1 1.))
        )
        (SETQ@F MAIN:FACTORIAL_F:SHADOW:TMP__000000000@F
          (*. MAIN:FACTORIAL_F:SHADOW:$1
              MAIN:FACTORIAL_F:SHADOW:TMP__000000001@F
        ))
      )
    )
  ))
  (SETQ@I MAIN:FACTORIAL_F:TMP__000000002@I (<= MAIN:FACTORIAL_F:$1 1.))
  (IF@J MAIN:FACTORIAL_F:TMP__000000002@I
    (SETQ@F MAIN:FACTORIAL_F:TMP__000000000@F 1.)
    (PROGN
      (SETQ@F MAIN:FACTORIAL_F:TMP__000000001@F
        (MAIN:FACTORIAL_F:SHADOW (-. MAIN:FACTORIAL_F:$1 1.))
      )
      (SETQ@F MAIN:FACTORIAL_F:TMP__000000000@F
        (*. MAIN:FACTORIAL_F:$1 MAIN:FACTORIAL_F:TMP__000000001@F
      ))
    )
  )
))

Figure 5-11. Preprocessing of recursion

5.5 Generation of Marshaled Clusters

Marshaled clusters are generated from the preprocessed user application after the code reorganization stage according to the following rule: a sequence of assignment statements can be merged into one cluster if this sequence is not interrupted by any global control statement (if, for, while, defun, input and file I/O, break, exit). The enumerated global statements are processed on the front-end VM controlling the marshaled cluster uploading process.

Figure 5-12 shows a marshaled cluster containing three functions, including one (Fnc #2) that contributes to the output stream.

Initial code
(setq a 1)
(setq b 1)
(outf "Result = %d\n" (+ a b))

Preprocessed code
(SETQ@I MAIN:A@I 1)
(SETQ@I MAIN:B@I 1)
(SETQ@S MAIN:TMP__000000001@S (OUTF "Result = %d\n" (+@J MAIN:A@I MAIN:B@I)))

Marshaled cluster
(marshaled_cluster
  (Vars_N#_Ref_Name_[Array]
    (0 0 "MAIN:A@I")
    (1 1 "MAIN:B@I")
    (2 14 "MAIN:TMP__000000001@S")
  )
  (Fnc (N# 0)
    (FLP (SETQ@I MAIN:A@I 1))
    (FLP_COMPILED
      "D5 01 00 00" "01 00 00 00" "00 00 00 00" "D4 04 00 00"
      "00 00 00 00" "01 00 00 00" " I 00 00 00" "01 00 00 00"
    )
    (Var_Ptrs 0)
  )
  (Fnc (N# 1)
    (FLP (SETQ@I MAIN:B@I 1))
    (FLP_COMPILED
      "D5 01 00 00" "01 00 00 00" "00 00 00 00" "D4 04 00 00"
      "00 00 00 00" "01 00 00 00" " I 00 00 00" "01 00 00 00"
    )
    (Var_Ptrs 1)
  )
  (Fnc (N# 2)
    (FLP (SETQ@S MAIN:TMP__000000001@S
           (OUTF "Result = %d\n" (+@J MAIN:A@I MAIN:B@I))
    ))
    (FLP_COMPILED
      "D5 01 00 00" "03 00 00 00" "00 00 00 00" "D4 05 00 00"
      "00 00 00 00" "01 00 00 00" " T 8 00 00" "02 00 00 00"
      "07 00 00 00" " S 00 00 00" "0C 00 00 00" " R e s u"
      " l t __ =" "__ % d 0A" "00 00 00 00" "D4 AC 00 00"
      "02 00 00 00" "03 00 00 00" " i 00 00 00" "01 00 00 00"
      " i 00 00 00" "02 00 00 00"
    )
    (Inq_Dest Ls)
    (Var_Ptrs 2 0 1)
  )
)

Figure 5-12. Generation of marshaled cluster

A marshaled cluster consists of a local variable directory (variables A, B and TMP1 in our example) and a group of functions (enumerated from 0 to 2). Physically, the functions are located in the TCZ function directory; they are only referenced from the cluster. Each function has a list of references (one reference per parameter) to the local variable directory (Var_Ptrs). Optionally, each function can have destination attributes (Inq_Dest) if the result is required by the loader (Ld) or the listener (Ls).

The structure of the marshaled cluster has the following advantages:

• A cluster is fully relocatable as it contains only function references to the TCZ function directory and variable references from the local variable directory to the DB.

• A cluster is maximally compact, so it can be transferred and decoded quickly at runtime.

• Each function is associated with a VM byte code fragment that can comprise more than one VM function (Fnc #2 has the nested a+b). Furthermore, each cluster function becomes an atomic instruction in the dataflow engine.

5.6 Uploading of Marshaled Clusters

The control front-end virtual machine (the loader part of the external task listener/loader pair) uploads the marshaled clusters to the dataflow engine according to the generated control sequence. There is no special need to create a control sequence manually. After the preliminary preprocessing stages a user application is already split into marshaled clusters and control directives automatically. Figure 5-13 illustrates the uploading process. Global constructions (input, if/else, for/next) are processed on the control front-end VM in a classic von Neumann manner. The VM communicates with the dataflow engine through the bi-directional TCZ interface uploading the marshaled clusters and obtaining data necessary to control the uploading sequence.

The contexts of the marshaled clusters are modified dynamically when the clusters are delivered to the TCZ interface. The VM stores the current contexts of all application’s variables in a table. The clusters are loaded with the current contexts, which are taken from the VM context table. Then contexts are incremented to the next unique values for those destination variables which are reassigned in the clusters.
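A minimal C sketch of this context stamping (our own illustrative data structures, not the BMDFM source) could look as follows:

#include <stdio.h>

#define NVARS 2                              /* toy application: A and B    */

static unsigned long context_tab[NVARS];     /* current context per variable */
static unsigned long next_context=1;         /* next unique context value    */

struct cluster_var {
  int var_ref;                               /* index into the context table */
  int is_dest;                               /* reassigned in this cluster?  */
  unsigned long context;                     /* stamped at upload time       */
};

/* Load a cluster with the current contexts, then bump the contexts of
 * the destination variables to the next unique values.                 */
static void stamp_contexts(struct cluster_var *v, int n){
  int i;
  for(i=0;i<n;i++)
    v[i].context=context_tab[v[i].var_ref];
  for(i=0;i<n;i++)
    if(v[i].is_dest)
      context_tab[v[i].var_ref]=next_context++;
}

int main(void){                              /* cluster (SETQ A (+ A B))     */
  struct cluster_var vars[]={{0,1,0},{0,0,0},{1,0,0}};
  stamp_contexts(vars,3);
  printf("A: old context %lu, new context %lu\n",
         vars[0].context,context_tab[0]);
  return 0;
}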

The proposed scheme provides one serious advantage that considerably reduces the dynamic scheduling overhead in the dataflow engine. The dataflow runtime engine is fed with the clusters dynamically in the order of the application workflow, so the dataflow engine never touches clusters that are outside the current processing scope.


[Figure: the front-end virtual machine executes a von Neumann control sequence (N=console_input(); ... N=dfmget_data(); if N then for I=1 to N dfmput_marshaled_cluster(); next else dfmput_marshaled_cluster(); end if) and feeds marshaled clusters such as a=f0(); b=f1(); a=b+f2(); a=f3(); through the bi-directional TCZ interface into the multithreaded dataflow runtime engine.]

Figure 5-13. Dataflow engine controlled by the front-end VM

The front-end VM is a von Neumann control machine with the traditional program counter (PC) and stack pointer (SP). Table 5-1 summarizes the architecture of all VM registers. The register *contexts is associated with the context table of all application's variables. This table is initialized once at startup and exists during the whole application life cycle; therefore we do not save the *contexts register on the stack during push/pop. The VM accumulator can store both integer and string values (accum_slo and accum_uch, respectively). Additionally, three loop control integer registers are intended for organizing "for" iterations. The loop control, loop step and loop end limit are stored in the loop_slo, loopstep_slo and loopto_slo registers, correspondingly. Because an application can have multiple nested loops, we save these registers on the stack when entering a new nested loop.

The last register which has to be mentioned is *var_lst. It points to the list of UDF local variables. When the control sequence enters a new local UDF the variable list of a parent UDF is saved on the stack. Furthermore, when the control sequence leaves the UDF all contexts from the current *var_lst are released through the dfmput_crelease() call of the TCZ interface, and the previous variable list is restored from the stack to the *var_lst.
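Putting these two paragraphs together, the register file and its push/pop convention can be sketched in C as follows (all type and size choices here are assumptions for illustration, not the BMDFM source):

typedef long SLO;             /* signed integer cell (assumed type)           */

struct fe_vm_regs {
  unsigned long *contexts;    /* context table; never saved on the stack      */
  SLO  accum_slo;             /* accumulator for integer values               */
  char accum_uch[256];        /* accumulator for string values (size assumed) */
  SLO  loop_slo;              /* loop control register                        */
  SLO  loopstep_slo;          /* loop step register                           */
  SLO  loopto_slo;            /* loop end limit register                      */
  void *var_lst;              /* current UDF variable list; saved/restored    */
                              /* only by ENTERRECURSION/LEAVERECURSION        */
  unsigned long pc, sp;       /* program counter and stack pointer            */
};
/* PUSHA/POPA save and restore the accumulator and loop registers around a
 * nested loop; *contexts and *var_lst are deliberately excluded from them. */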


Register       Description
*contexts      Current contexts of all application variables
accum_slo      Accumulator for integer values
accum_uch      Accumulator for string values
loop_slo       Loop control register
loopstep_slo   Loop step register
loopto_slo     Loop end limit register
*var_lst       Current UDF variable list
PC             Program counter
SP             Stack pointer

Table 5-1. Register architecture of the front-end control VM

Four groups of instructions are defined for the front-end control VM (all are explained in Table 5-2):

• Group 1 includes all instructions to communicate with the dataflow engine through the TCZ interface; they essentially replicate the functionality of the TCZ interface.

• Group 2 is used to control the VM itself. These are classic jump and stack instructions.

• Group 3 is dedicated to the VM input/output.

• Group 4 organizes iteration control.


OpGroup        Mnemonic         COP   Description
1 (DFM)        PUTCLUSTER       50    Sends marshaled cluster to the dataflow engine
               PUTZDATA         70    Sends nil variable to the dataflow engine
               PUTIDATA         71    Sends integer variable to the dataflow engine
               PUTFDATA         72    Sends float variable to the dataflow engine
               PUTSDATA         73    Sends string variable to the dataflow engine
               GETIDATA         81    Gets integer data from the dataflow engine
               GETSDATA         83    Gets string data from the dataflow engine
2 (CONTROL0)   PUSHA            10    Pushes registers (except *contexts & *var_lst)
               POPA             11    Pops registers (except *contexts & *var_lst)
               ENTERRECURSION   12    Pushes *var_lst
               LEAVERECURSION   13    Pops *var_lst
               GOTO             14    Jumps unconditionally
               GOSUB            15    Calls subroutine
               RETURN           16    Returns from subroutine
               IFGOTO           17    Jumps conditionally
3 (IO)         ACCEPT           20    Inputs from stdin
               SCANCONSOLE      21    Inputs from console
               FILECREATE       22    Creates file
               FILEOPEN         23    Opens file
               FILEWRITE        24    Writes to file
               FILEREAD         25    Reads from file
               FILECLOSE        26    Closes file
               FILEREMOVE       27    Deletes file
4 (CONTROL1)   LOOP             90    Sets loop registers
               FOR              100   Organizes for loop
               NEXT             101   Iterates
               END              200   Ends control sequence

Table 5-2. Instruction set matrix of the front-end control VM

Further we explain the uploading of marshaled clusters in three use cases: local UDF invocation, for-loop and while-loop. In each case we analyze the generated control sequence.

Local UDF invocation

Figure 5-14 shows a fragment of code, which calls a locally defined function. The target control sequence keeps the modified UDF in a cluster (CTRL 2). The UDF is invoked in the GOSUB statement of CTRL 5, prior to which all UDF local variables are saved on the stack (ENTER_RECURSION CTRL 4). The function in CTRL 6 copies the UDF return value to the variable "a" of the function main and restores the local variables by popping them from the stack (LEAVE_RECURSION CTRL 7).


Initial code
(defun true
  (progn 1)
)
(setq a (true))

Preprocessed code
(DEFUN MAIN:TRUE (SETQ@I MAIN:TRUE:TMP__000000000@I 1))
(SETQ@I MAIN:A@I (MAIN:TRUE))

Control sequence for the front-end VM
(CTRL (N# 1) (OpGroup 2) (COP 14)
  (GOTO 4)
  (REM "Pass over UDF 'MAIN:TRUE' body")
)
(CTRL (N# 2) (OpGroup 1) (COP 50)
  (dfmput_marshaled_cluster
    (Vars_N#_Ref_Name_[Array] (0 14 "MAIN:TRUE:TMP__000000000@I"))
    (Fnc (N# 0)
      (SETQ@I MAIN:TRUE:TMP__000000000@I 1)
      (Var_Ptrs 0)
    )
  )
)
(CTRL (N# 3) (OpGroup 2) (COP 16)
  (RETURN)
  (REM "End of UDF 'MAIN:TRUE' body")
)
(CTRL (N# 4) (OpGroup 2) (COP 12)
  (ENTER_RECURSION)
  (Vars_N#_Ref_Name_[Array] (0 14 "MAIN:TRUE:TMP__000000000@I"))
)
(CTRL (N# 5) (OpGroup 2) (COP 15)
  (GOSUB 2)
  (REM "UDF 'MAIN:TRUE' call")
)
(CTRL (N# 6) (OpGroup 1) (COP 50)
  (dfmput_marshaled_cluster
    (Vars_N#_Ref_Name_[Array]
      (0 0 "MAIN:A@I")
      (1 14 "MAIN:TRUE:TMP__000000000@I")
    )
    (Fnc (N# 0)
      (ALSETQ MAIN:A@I MAIN:TRUE:TMP__000000000@I)
      (Var_Ptrs 0 1)
    )
  )
  (REM "UDF 'MAIN:TRUE' returned value")
)
(CTRL (N# 7) (OpGroup 2) (COP 13)
  (LEAVE_RECURSION)
)

Figure 5-14. Control sequence template of local UDF invocation


For-loop

A typical control sequence of a "for" loop is shown in Figure 5-15. In the preamble section the VM registers are saved on the stack (PUSHA, CTRL 4). The following code (CTRL 5 - 9) prepares the loop controls, retrieving the loop_slo and loopto_slo values from the dataflow engine. The loop itself is organized on CTRL 10 - 13. The value of the loop control variable is delivered to the dataflow engine via the dfmput_idata() call twice: inside each iteration (CTRL 11) and once after the loop has finished (CTRL 14). The epilogue section restores the VM registers from the stack (POPA, CTRL 15).


Initial code
(for i a 1 b
  (outf "%d\n" a)
)

Preprocessed code
(FOR@J MAIN:I@I MAIN:A@I 1 MAIN:B@I
  (SETQ@S MAIN:TMP__000000001@S (OUTF "%d\n" MAIN:A@I))
)

Control sequence for the front-end VM
(CTRL (N# 4) (OpGroup 2) (COP 10)
  (PUSHA)
)
(CTRL (N# 5) (OpGroup 1) (COP 70)
  (dfmput_zdata (VarRef 0) (VarName "MAIN:A@I") (Inq_Dest Ld))
  (REM "'MAIN:I@I' loop initialization begins here")
)
(CTRL (N# 6) (OpGroup 1) (COP 81) (SubCOP 1)
  (<loop_slo> (dfmget_idata))
)
(CTRL (N# 7) (OpGroup 4) (COP 90) (SubCOP 2)
  (<loopstep_slo> 1)
)
(CTRL (N# 8) (OpGroup 1) (COP 70)
  (dfmput_zdata (VarRef 1) (VarName "MAIN:B@I") (Inq_Dest Ld))
)
(CTRL (N# 9) (OpGroup 1) (COP 81) (SubCOP 3)
  (<loopto_slo> (dfmget_idata))
)
(CTRL (N# 10) (OpGroup 4) (COP 100)
  (FOR (STEP <loopstep_slo>) (TO <loopto_slo>) (BODY 14))
  (REM "Controlled by 'MAIN:I@I' variable")
)
(CTRL (N# 11) (OpGroup 1) (COP 71) (SubCOP 1)
  (dfmput_idata (VarRef 8) (VarName "MAIN:I@I"))
)
(CTRL (N# 12) (OpGroup 1) (COP 50)
  (dfmput_marshaled_cluster
    (Vars_N#_Ref_Name_[Array]
      (0 0 "MAIN:A@I")
      (1 15 "MAIN:TMP__000000001@S")
    )
    (Fnc (N# 0)
      (SETQ@S MAIN:TMP__000000001@S (OUTF "%d\n" MAIN:A@I))
      (Inq_Dest Ls)
      (Var_Ptrs 1 0)
    )
  )
)
(CTRL (N# 13) (OpGroup 4) (COP 101) (SubCOP 1)
  (NEXT (BODY 10))
  (REM "Controlled by 'MAIN:I@I' variable")
)
(CTRL (N# 14) (OpGroup 1) (COP 71) (SubCOP 1)
  (dfmput_idata (VarRef 8) (VarName "MAIN:I@I"))
  (REM "postloop 'MAIN:I@I' control variable value")
)
(CTRL (N# 15) (OpGroup 2) (COP 11)
  (POPA)
)

Figure 5-15. Control sequence template of for-loop


While-loop

The loop we consider is additionally complicated by a "break" termination (Figure 5-16). It is nearly impossible to process terminators (break, exit) on the dataflow when several iteration contexts are already wandering across the dataflow engine. Therefore the only possible place to handle them is the control sequence of the front-end control VM.

In the preamble section we compute the while-loop control condition (CTRL 4) and save the VM registers on the stack (PUSHA, CTRL 5). The loop body (CTRL 6 - 17) retrieves the control condition value from the dataflow engine in every iteration (CTRL 6 - 7) and exits the loop (IF_NOT GOTO, CTRL 8) when finished. The cluster in CTRL 9 performs essentially all loop computations. CTRL 10 - 15 are an implementation of the "break" termination. The next cluster, CTRL 16, recalculates a new value for the while-loop control condition. Then the unconditional jump redirects back to the beginning of the loop (GOTO, CTRL 17). Finally, the epilogue section restores the VM registers from the stack (POPA, CTRL 18).

Initial code
(while (< a b) (progn
  (outf "%d\n" a)
  (setq a (++ a))
  (if (> a 100)
    (break)
    nil
  )
))

Preprocessed code
(SETQ@I MAIN:TMP__000000001@I (<@I MAIN:A@I MAIN:B@I))
(WHILE@J MAIN:TMP__000000001@I (PROGN
  (SETQ@S MAIN:TMP__000000002@S (OUTF "%d\n" MAIN:A@I))
  (SETQ@I MAIN:A@I (++@J MAIN:A@I))
  (SETQ@I MAIN:TMP__000000005@I (>@I MAIN:A@I 100))
  (IF@J MAIN:TMP__000000005@I
    (BREAK)
    (SETQ@Z MAIN:TMP__000000004@Z NIL)
  )
  (SETQ@I MAIN:TMP__000000001@I (<@I MAIN:A@I MAIN:B@I))
))


Control sequence for the front-end VM
(CTRL (N# 4) (OpGroup 1) (COP 50)
  (dfmput_marshaled_cluster
    (Vars_N#_Ref_Name_[Array]
      (0 0 "MAIN:A@I")
      (1 1 "MAIN:B@I")
      (2 14 "MAIN:TMP__000000001@I")
    )
    (Fnc (N# 0)
      (SETQ@I MAIN:TMP__000000001@I (<@I MAIN:A@I MAIN:B@I))
      (Var_Ptrs 2 0 1)
    )
  )
)
(CTRL (N# 5) (OpGroup 2) (COP 10)
  (PUSHA)
)
(CTRL (N# 6) (OpGroup 1) (COP 70)
  (dfmput_zdata (VarRef 14) (VarName "MAIN:TMP__000000001@I") (Inq_Dest Ld))
  (REM "'MAIN:TMP__000000001@I' loop body begins here")
)
(CTRL (N# 7) (OpGroup 1) (COP 81) (SubCOP 1)
  (<accum_slo> (dfmget_idata))
)
(CTRL (N# 8) (OpGroup 2) (COP 17) (SubCOP 1)
  (IF_NOT (GOTO 18))
  (REM "Exit loop")
)
(CTRL (N# 9) (OpGroup 1) (COP 50)
  (dfmput_marshaled_cluster
    (Vars_N#_Ref_Name_[Array]
      (0 0 "MAIN:A@I")
      (1 15 "MAIN:TMP__000000002@S")
      (2 0 "MAIN:A@I")
      (3 17 "MAIN:TMP__000000005@I")
    )
    (Fnc (N# 0)
      (SETQ@S MAIN:TMP__000000002@S (OUTF "%d\n" MAIN:A@I))
      (Inq_Dest Ls)
      (Var_Ptrs 1 0)
    )
    (Fnc (N# 1)
      (SETQ@I MAIN:A@I (++@J MAIN:A@I))
      (Var_Ptrs 2 0)
    )
    (Fnc (N# 2)
      (SETQ@I MAIN:TMP__000000005@I (>@I MAIN:A@I 100))
      (Var_Ptrs 3 2)
    )
  )
)
(CTRL (N# 10) (OpGroup 1) (COP 70)
  (dfmput_zdata (VarRef 17) (VarName "MAIN:TMP__000000005@I") (Inq_Dest Ld))
)
(CTRL (N# 11) (OpGroup 1) (COP 81)
  (<accum_slo> (dfmget_idata))
)
(CTRL (N# 12) (OpGroup 2) (COP 17)
  (IF_NOT (GOTO 15))
  (REM "Pass over 'MAIN:TMP__000000005@I' conditional branch")
)
(CTRL (N# 13) (OpGroup 2) (COP 14)
  (GOTO 18)
  (REM "BREAK")
)
(CTRL (N# 14) (OpGroup 2) (COP 14)
  (GOTO 16)
  (REM "Pass over 'MAIN:TMP__000000005@I' conditional branch")
)
(CTRL (N# 15) (OpGroup 1) (COP 50)
  (dfmput_marshaled_cluster
    (Vars_N#_Ref_Name_[Array] (0 16 "MAIN:TMP__000000004@Z"))
    (Fnc (N# 0)
      (SETQ@Z MAIN:TMP__000000004@Z NIL)
      (Var_Ptrs 0)
    )
  )
)
(CTRL (N# 16) (OpGroup 1) (COP 50)
  (dfmput_marshaled_cluster
    (Vars_N#_Ref_Name_[Array]
      (0 0 "MAIN:A@I")
      (1 1 "MAIN:B@I")
      (2 14 "MAIN:TMP__000000001@I")
    )
    (Fnc (N# 0)
      (SETQ@I MAIN:TMP__000000001@I (<@I MAIN:A@I MAIN:B@I))
      (Var_Ptrs 2 0 1)
    )
  )
)
(CTRL (N# 17) (OpGroup 2) (COP 14) (SubCOP 1)
  (GOTO 6)
  (REM "Continue 'MAIN:TMP__000000001@I' loop,"
       " loop body ends here")
)
(CTRL (N# 18) (OpGroup 2) (COP 11)
  (POPA)
)

Figure 5-16. Control sequence template of while-loop

5.7 Summary

This chapter analyzes the static scheduling subsystem of BMDFM. The purpose of the static scheduling is to preprocess and reorganize conventional input code into a set of marshaled clusters and a control sequence. The static scheduler is designed as a hybrid architecture, in which the dataflow engine is controlled by a von Neumann front-end VM running the control sequence and uploading the generated marshaled clusters into the dataflow engine. Two topics are described in detail:

• First, we explain all code transformations that are necessary to generate the target marshaled clusters and the control sequence.

• Second, we discuss the architecture of the front-end VM, which has a minimal and sufficient set of facilities to control the dataflow engine.


The chapter contributes with the following ideas:

• The proposed methodology of the code preprocessing/reorganization is applicable to any conventional code semantics that formally consist of variable assignments, conditional processing, loop iterations and function declarations/invocations.

• There is no need for a special dataflow language. The marshaled clusters and control sequence are derived from a user application’s code automatically.

• To reduce the dynamic scheduling overhead, the runtime dataflow engine is fed with the clusters dynamically in the order of the application workflow, so it never touches clusters that are outside the current processing scope.



Chapter 6 Transparent Dataflow Semantics

6.1 Overview

In this chapter we would like to put emphasis on the main feature of the proposed architecture: providing a conventional programming paradigm at the top level. We call this ability of BMDFM transparent dataflow semantics, the convenience of which justifies all the effort we spent designing the complex architecture behind it. We think that, because the complexity is hidden, transparent dataflow semantics is a key point defining the applicability of the BMDFM system.

The mentioned transparency makes the discussion in this chapter relatively short.

6.2 Conventional Programming Paradigm

BMDFM provides a conventional programming environment with transparent dataflow semantics. No directives for parallel execution are required! A user understands BMDFM as a virtual machine, which runs every statement of an application program in parallel having all parallelization and synchronization mechanisms fully transparent. The statements of an application program are normal operators, which any single threaded program might consist of - they include variable assignments, conditional processing, loops, function calls, etc.

Suppose we have the code fragment shown below:

(setq a (udf1 x))      # a=udf1(x); udf stands for user defined
(setq b (udf2 x))      # b=udf2(x); function
(setq b (++ b))        # b++;
(outf "a = %d\n" a)    # printf("a = %d\n",a);
(outf "b = %d\n" b)    # printf("b = %d\n",b);

The two first statements are independent, so the dataflow engine can run them on different processors. The two last statements can also run in parallel but only after "a" and "b" are computed. The dataflow engine recognizes dependencies automatically because of its ability to build a dataflow graph dynamically at run time (in contrast to a static Petri Net). Additionally, the dataflow engine orders the output stream to output the results sequentially. Thus even after the out-of-order processing the results will appear in a natural way.

Suppose that the above code fragment is now nested in a loop:


(for i 1 1 N (progn    # for(i=1;i<=N;i++){
  (setq a (udf1 x))
  (setq b (udf2 x))
  (setq b (++ b))
  (outf "a = %d\n" a)
  (outf "b = %d\n" b)
))                     # }

The dataflow engine will keep the variables "a" and "b" under unique contexts for every iteration. Actually, these are different copies of the variables in the shared memory pool. A context variable exists as long as it is referenced by instruction consumers; non-referenced contexts are garbage collected at runtime. Therefore the dataflow engine can exploit both local parallelism within an iteration and global parallelism across iterations, running multiple iterations simultaneously. A minimal sketch of such reference-counted context copies follows.
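The sketch below illustrates the idea with assumed mechanics (our own illustration, not the BMDFM source): every iteration produces a fresh context copy of a variable, and the copy is freed when its last consumer has fired.

#include <stdio.h>
#include <stdlib.h>

struct ctx_var {
  unsigned long context;    /* unique context, e.g. the iteration number   */
  long value;               /* the copy of the variable in the memory pool */
  int refs;                 /* number of pending instruction consumers     */
};

static struct ctx_var *produce(unsigned long ctx, long val, int consumers){
  struct ctx_var *v=malloc(sizeof *v);
  v->context=ctx; v->value=val; v->refs=consumers;
  return v;
}

static long consume(struct ctx_var *v){   /* a consumer instruction fires  */
  long val=v->value;
  if(--v->refs==0)                        /* no consumers left:            */
    free(v);                              /* garbage collect this context  */
  return val;
}

int main(void){
  struct ctx_var *a=produce(1,42,2);      /* "a" of iteration 1, 2 readers */
  long x=consume(a);
  long y=consume(a);                      /* the second consume frees it   */
  printf("%ld %ld\n",x,y);
  return 0;
}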

In comparison with the traditional directive-based paradigm for SMP (OpenMP) our approach has the following advantages:

• According to OpenMP the user explicitly specifies the actions to be taken by the compiler and runtime system in order to execute the program in parallel. OpenMP implementations do not check for dependencies, conflicts, deadlocks, race conditions or other problems that result in incorrect program execution. The user is responsible for ensuring that the application using the OpenMP API constructs executes correctly. The BMDFM dataflow engine takes care of synchronization at runtime automatically. No locks, semaphores, mutexes, barriers or others are needed.

• Although the fork-join model can be useful for solving a variety of problems, it is somewhat tailored for large array-based applications. BMDFM, in contrast, is not specialized to any particular class of tasks.

• The OpenMP principle is based on barrier points, at which all team threads have to be synchronized (the "nowait" directive will not help very much). Thus the system can idle waiting for the slowest thread. The dataflow engine loads the available CPUs in a natural and balanced manner.

• In OpenMP any unsynchronized calls to output functions may result in output in which data written by different threads appears in non-deterministic order. Similarly, unsynchronized calls to input functions may read data in non-deterministic order. In BMDFM this problem is solved. If non-deterministic order of i/o is really necessary the user can write his own simple asynchronous i/o UDF.


6.3 Synchronization of Asynchronous Coarse-Grain and Fine-Grain Functions

In addition to the pure computations, an application very often accesses different peripheral devices, performs graphical output or reads/writes dynamically allocated memory. All these asynchronous resources have a similar access policy through interface functions: to obtain a descriptor first and then to refer to the resource using this descriptor. In the case of dynamic memory, a memory address is used. In this section we discuss how to synchronize the asynchronous interface functions on dataflow, having no special synchronization directives but using only the return values of the functions. We generalize this into a methodology for coarse-grain and fine-grain access types.

Suppose we have the source code fragment shown below that allocates three memory blocks dynamically (p, c0 and c1). Block "p" from the producer loop is used in two consumer loops. Formally, we can say that memory release calls are also a kind of consumer of the memory blocks.

p=malloc();                        // allocation
c0=malloc();
c1=malloc();

for(i=0;i<N;i++){ /* ... */ }      // producer loop: writes block "p"

for(i=0;i<N;i++){ /* ... */ }      // consumer loop 0: reads "p", writes "c0"

for(i=0;i<N;i++){ /* ... */ }      // consumer loop 1: reads "p", writes "c1"

free(p);                           // memory release
free(c0);
free(c1);

In the proposed programming environment the corresponding coarse-grain functions are generated automatically or created manually; they are regarded as seamless instructions of the virtual machine.

p_udf(*p){                         // function-producer
  for(i=0;i<N;i++){ /* ... */ }    // writes block "p"
  return p;
}


c0_udf(*c0, *p){                   // function-consumer 0
  for(i=0;i<N;i++){ /* ... */ }    // reads "p", writes "c0"
  return c0;
}

c1_udf(*c1, *p){                   // function-consumer 1
  for(i=0;i<N;i++){ /* ... */ }    // reads "p", writes "c1"
  return c1;
}

Then these coarse-grain seamless functions are called on the VM level and synchronized on the dataflow engine due to artificial dependencies created via simple boolean bitwise operations.

(setq p_sync (& 0
  (setq p (asyncheap_malloc <size>))    # allocation
))
(setq c0_sync (& 0
  (setq c0 (asyncheap_malloc <size>))
))
(setq c1_sync (& 0
  (setq c1 (asyncheap_malloc <size>))
))

(setq p_sync (& 0
  (setq p (p_udf (| p p_sync)))         # calls function-producer
))
(setq p_sync (setq c0_sync (& 0
  (setq c0 (c0_udf (| c0 c0_sync) p))   # calls function-consumer 0
)))
(setq p_sync (setq c1_sync (& 0
  (setq c1 (c1_udf (| c1 c1_sync) p))   # calls function-consumer 1
)))

(asyncheap_free (| p p_sync))           # memory release
(asyncheap_free (| c0 c0_sync))
(asyncheap_free (| c1 c1_sync))

The synchronization rules are trivial:

• Every pointer variable has a corresponding synchronization "_sync" variable.

• A pointer to a read-only area is passed as a direct argument. Thus parallel independent calls are possible.

• A pointer to a modified area is passed as a dependent pair of the pointer and "_sync" variables.


• After the call is finished, both the "_sync" variables (for all pointers involved in the call) and the pointer variables (only for pointers to the modified areas) are touched. Thus the asynchronous coarse-grain producer/consumer functions are synchronized, because BMDFM provides the transparent dataflow semantics.

To exploit the fine-grain parallelism of loops, the same synchronization rules are used. Below we show how this applies to the c0_udf consumer loop function. The function itself is modified slightly to make the loop interleaved via the "start" and "step" arguments.

c0_udf_(start, step, *c0, *p){           // modified function-consumer 0
  for(i=start;i<N;i+=step){ /* ... */ }  // interleaved: reads "p", writes "c0"
  return 0;                              // value only feeds "c0_sync"
}

On the VM level the function call is synchronized through the "c0_sync" and "p_sync" variables.

(setq nthreads (n_cpuprocs))    # number of configured CPU PROCs

(setq c0 (| c0 c0_sync))        # prologue
(for thread 1 1 nthreads
  (setq c0_sync (+ c0_sync
    (c0_udf_ (-- thread) nthreads c0 p)   # calls modified function-consumer 0
  ))
)
(setq p_sync c0_sync)           # epilogue
(setq c0 (| c0 c0_sync))

To remove the dependencies between the c0_udf_ function calls, the preprocessor reorganizes the code introducing a temporary variable TMP1. Now the time-consuming calls of the c0_udf_ function are context independent and can be executed in parallel at the dataflow level.

Initial code
(for thread 1 1 nthreads
  (setq c0_sync (+ c0_sync
    (c0_udf_ (-- thread) nthreads c0 p)
  ))
)

Preprocessed code
(FOR@J MAIN:THREAD@I 1 1 MAIN:NTHREADS@I (PROGN
  (SETQ@I MAIN:TMP__000000001@I
    (MAIN:C0_UDF_ (--@J MAIN:THREAD@I) MAIN:NTHREADS@I MAIN:C0@I MAIN:P@I)
  )
  (SETQ@I MAIN:C0_SYNC@I (+@J MAIN:C0_SYNC@I MAIN:TMP__000000001@I))
))


6.4 Ordering Non-Standard Stream in Out-of-Order Processing

The dataflow runtime engine processes data in an out-of-order fashion but the output results have to be ordered as if the application were processed on a single processor. The dataflow engine recognizes and orders only standard output streams automatically.

A non-standard case can be easily solved as well due to the transparent dataflow semantics. Suppose we have a function devicewrite() that writes data to a special device by chunks. In the function main an array of data chunks is processed first and then delivered to devicewrite().

devicewrite(*chunk){               // non-standard output function
  // ...                           // performs non-standard output
  return;
}

*process(*chunk){                  // chunk process function
  // ...                           // performs chunk processing
  return chunk;
}

**chunks[N];                       // array of data chunks

for(i=0;i<N;i++)
  devicewrite(process(chunks[i]));

A user has to create his own interface function with a dummy argument "enable", which repeats its value as the result of the interface function (very often in a real implementation such an argument has a meaning of the opened device descriptor).

devicewrite_udf(enable, *chunk){   // modified non-standard output function
  devicewrite(chunk);              // calls non-standard output function
  return enable;                   // repeater
}

At the virtual machine level the interface function is called as any other conventional function. An additional synchronization variable "wenable" is reassigned every time the interface function is called. Thus the problem is reduced to a common case of data dependencies.

Because the preprocessor uses a policy that all UDF invocations have to have their own destination variables, an artificial temporary variable TMP1 is automatically introduced. Now the dataflow engine calls process() functions in a parallel out-of-order manner but the non-standard output stream will be ordered due to the data dependencies.

The same technique can be applied to all cases where out-of-order processing has to be ordered.


Initial code
(setq wenable 1)                            # write enable
(for i 1 1 N
  (setq wenable (devicewrite_udf wenable    # repeats write enable
    (process (index chunks (-- i)))
  ))
)

Preprocessed code
(SETQ@I MAIN:WENABLE@I 1)
(FOR@J MAIN:I@I 1 1 MAIN:N@I (PROGN
  (SETQ@I MAIN:TMP__000000001@I             # Out-of-order processing
    (MAIN:PROCESS (INDEX@I MAIN:CHUNKS@I (--@J MAIN:I@I)))
  )
  (SETQ@I MAIN:WENABLE@I                    # Ordered on dataflow due to
    (MAIN:DEVICEWRITE_UDF MAIN:WENABLE@I    # the data dependencies
                          MAIN:TMP__000000001@I)
  )
))

6.5 Speculative Dataflow Processing

Speculative processing enables parallel computations in branches of code in advance, hoping that the results of the processed branches will be necessary. Speculative processing is useful because the computing resources can process a dependent code section instead of idling until the dependency is resolved. We propose a very simple and elegant technique of how to process speculatively on dataflow:

• A conditional variable, which creates the dependency, is allocated in the shared memory pool. A reference address is used instead of the variable, which makes the dataflow dependency disappear (the address itself is data that is always ready).

• A conditional that checks the conditional variable is moved into the seamless user defined functions. Such a conditional controls whether the UDF body will be processed.

• Finally, all UDF instructions are uploaded into the dataflow engine to be processed speculatively. Optionally, the control front-end VM can restrict the number of uploaded instructions to keep the dataflow resources from being overloaded.

The following example demonstrates a pseudo-code fragment with a loop-carried control dependency. Each next process() UDF cannot be uploaded into the dataflow engine until the status value yielded by the previous call of the same UDF is known.


Callee (a seamless UDF processed on the Multithreaded Dataflow Engine)
boolean process(*chunk){    // a UDF that processes a chunk
  status;                   // and returns 'true' or 'false'
  // processing...
  // ==> status=true_false;
  return status;
}

Caller (code processed by the Von-Neumann Control Front-end VM)
**chunks[N];                // array of data chunks
for(i=0;i<N;i++)            // loop-carried control dependency: the next
  if(!process(chunks[i]))   // call depends on the status yielded by the
    break;                  // previous one

The modified speculative version of the same example is shown below. Now the UDF body will be processed only if global_status is still true. Multiple UDF invocations can be uploaded into the dataflow engine as they no longer depend on each other. The resulting status is accumulated in the variable "status" through logical multiplication. A simple modulo operation (%1000) restricts the number of speculative instructions uploaded into the dataflow engine. After speculative processing of the previous 1000 instructions, the resulting status is checked to determine whether the next speculative group of instructions has to be uploaded.

Callee (a seamless UDF processed on the Multithreaded Dataflow Engine)
boolean process(*chunk, *global_status){
  if(global_status){        // global status is in shared memory
    // processing...        // actual processing might not happen
    // ==> global_status=true_false;
  }
  return global_status;
}

Caller (code processed by the Von-Neumann Control Front-end VM)
**chunks[N];                // array of data chunks
*global_status=allocate_in_shared_memory(true);
status=true;
for(i=0;i<N;i++){
  status&=process(chunks[i],global_status);  // speculative upload
  if(!((i+1)%1000) && !status)               // after each group of 1000
    break;                                   // instructions check the status
}


6.6 Summary

In this chapter we define the transparent dataflow semantics as a conventional programming paradigm on top of the dataflow. No synchronization and parallelization directives are needed.

The directive-based approach is intended for numeric loop computations, thus it is restricted to the numeric computation area. Our approach presumes that any application, presented in a conventional form, will be processed in parallel automatically. Such an application could come from the symbol processing area, adaptive algorithms or any other domain (including numeric computation as well).

Within the transparent dataflow semantics we explain the methodology of how to synchronize asynchronous coarse/fine-grain functions, how to order non-standard streams in out-of-order processing and how to process speculatively. This makes the proposed paradigm complete and applicable to general-purpose computing.

The transparent dataflow semantics paradigm is intended for SMP. Ideal SMP hardware would be UMA capable, but unfortunately the current technologies can ensure only NUMA functionality. Therefore we still rely on large-grain seamless VM instructions to achieve higher performance.



Chapter 7 Evaluation

7.1 Overview

In this chapter we evaluate the efficiency of BMDFM. We run NAS Parallel Benchmarks 2.3, which are widely recognized as a standard indicator of performance. We also evaluate the dynamic scheduling overhead introduced by the dataflow runtime engine.

The biggest advantage of the dataflow runtime engine, however, is its ability to parallelize dynamic adaptive (irregular) applications, which cannot be analyzed statically. We test this capability as well, running an IP core generator based on a VHDL preprocessor.

7.2 Test Environment

The BMDFM system is available for commodity SMP platforms. The BMDFM code is written in ANSI C, so every machine supporting ANSI C and the shmctl()/semctl() UNIX SVR4 IPC calls should be able to run BMDFM (a minimal sketch of these IPC calls follows the platform list below). The official BMDFM web site [14] provides already ported fully multithreaded versions for:

• Intel/Linux/32bit, Intel/FreeBSD/32bit,

• IA-64/Linux/64bit, AMDx86-64/Linux/64bit,

• Alpha/Tru64OSF1/64bit, Alpha/Linux/64bit, Alpha/FreeBSD/64bit,

• PA-RISC/HP-UX/32bit, PA-RISC/HP-UX/64bit,

• SPARC/SunOS/32bit, SPARC/SunOS/64bit,

• MIPS/IRIX/32bit, MIPS/IRIX/64bit,

• RS6000/AIX/32bit, RS6000/AIX/64bit,

• PowerPC/MacOS/32bit,

• Intel/Win32-SFU, Intel/Win32-UWIN

• and a limited single threaded version for Intel/Win32.
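The portability requirement amounts to the classic SVR4 IPC primitives. The following minimal, self-contained C sketch shows the calls involved; the key and size values are illustrative only and do not reflect BMDFM's actual configuration:

#include <stdio.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/sem.h>

int main(void){
  int shm_id=shmget(IPC_PRIVATE,1<<20,IPC_CREAT|0600); /* shared memory pool */
  void *pool=shmat(shm_id,NULL,0);                     /* map into process   */
  int sem_id=semget(IPC_PRIVATE,1,IPC_CREAT|0600);     /* one semaphore      */

  if(shm_id<0||pool==(void*)-1||sem_id<0){
    perror("SVR4 IPC");
    return 1;
  }
  /* ... fork the dataflow engine processes sharing 'pool' here ... */

  shmdt(pool);                                         /* cleanup            */
  shmctl(shm_id,IPC_RMID,NULL);
  semctl(sem_id,0,IPC_RMID);
  return 0;
}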


For testing purposes we chose an 8-way POWER4 IBM p690 SMP server running AIX 5.1. In all our tests we measure real (wall-clock) times to calculate speedups. Additionally, to measure the dynamic scheduling overhead and to count the number of processed instructions, we use the following BMDFM embedded facilities, respectively:

• Statistics collector.

• Process logging.

Statistics collector

Each of the dynamic scheduling processes has its own signal handler that collects CPU execution times via the standard UNIX call getrusage().

void resgettime_handl(){
  // ...
  getrusage(RUSAGE_SELF,&res_used);
  // ...
  // reports "[PROC#%ld]: USRs=%d, USRus=%d, SYSs=%d, SYSus=%d.\n"
  return;
}

signal(dfmsrv.resgettime_sig,(void(*)(int))&resgettime_handl);

The BMDFM server sends a signal to all dynamic scheduling processes and then summarizes the obtained information into a report. We use this report to measure the dynamic scheduling overhead.

[SmartHistory]: Assuming {2}: GET COUNT
[SysMsg]: Sending a SIG_GET_TIME to the CPU PROCs...
pipe[CPUPROC#0]: USRs=12, USRus=270000, SYSs=1, SYSus=510000.
pipe[CPUPROC#2]: USRs=13, USRus=270000, SYSs=0, SYSus=580000.
pipe[CPUPROC#4]: USRs=15, USRus=110000, SYSs=3, SYSus=600000.
pipe[CPUPROC#1]: USRs=12, USRus=360000, SYSs=1, SYSus=400000.
pipe[CPUPROC#3]: USRs=12, USRus=870000, SYSs=1, SYSus=500000.
pipe[CPUPROC#5]: USRs=12, USRus=590000, SYSs=0, SYSus=790000.
pipe[CPUPROC#6]: USRs=12, USRus=600000, SYSs=1, SYSus=400000.
pipe[CPUPROC#7]: USRs=12, USRus=150000, SYSs=1, SYSus=380000.
[SysMsg]: Sending a SIG_GET_TIME to the OQ PROCs...
pipe[OQPROC#0]: USRs=0, USRus=120000, SYSs=0, SYSus=410000.
pipe[OQPROC#1]: USRs=0, USRus=160000, SYSs=0, SYSus=460000.
pipe[OQPROC#2]: USRs=0, USRus=140000, SYSs=0, SYSus=600000.
pipe[OQPROC#3]: USRs=0, USRus=180000, SYSs=0, SYSus=520000.
[SysMsg]: Sending a SIG_GET_TIME to the IORBP PROCs...
pipe[IORBPPROC#0]: USRs=0, USRus=50000, SYSs=0, SYSus=110000.
pipe[IORBPPROC#3]: USRs=0, USRus=40000, SYSs=0, SYSus=130000.
pipe[IORBPPROC#6]: USRs=0, USRus=100000, SYSs=0, SYSus=140000.
pipe[IORBPPROC#7]: USRs=0, USRus=40000, SYSs=0, SYSus=90000.
pipe[IORBPPROC#2]: USRs=0, USRus=90000, SYSs=0, SYSus=130000.
pipe[IORBPPROC#1]: USRs=0, USRus=20000, SYSs=0, SYSus=160000.
pipe[IORBPPROC#4]: USRs=0, USRus=160000, SYSs=0, SYSus=180000.
pipe[IORBPPROC#5]: USRs=0, USRus=70000, SYSs=0, SYSus=270000.
[SysMsg]: !!!!! GENERAL BENCHMARKS OF THE PERFORMANCE !!!!!


[Msg]: All times are given in seconds below.
[DFMSrv]: CPU PROCs concurrency factor:
[DFMSrv]:   1.153800000000E+02/1.871000000000E+01=6.166755745591E+00.
[DFMSrv]: Parallelizing index:
[DFMSrv]:   4.370000000000E+00/7.400000000000E-01=5.905405405405E+00.
[DFMSrv]: BM_DFM Server total reached concurrency:
[DFMSrv]:   1.197500000000E+02/1.871000000000E+01=6.400320684126E+00.
[DFMSrv]: Estimation of the CPU PROCs run-time workload:
[DFMSrv]:   Abs. time range: min=1.338000000000E+01, max=1.871000000000E+01.
[DFMSrv]:   Square root of dispersion = 1.644184828418E+00.
[DFMSrv]:   Normalized standard deviation = 4.030557260178E-02 (4.03%).
[DFMSrv]: Estimation of the OQPROCs run-time workload:
[DFMSrv]:   Abs. time range: min=5.300000000000E-01, max=7.400000000000E-01.
[DFMSrv]:   Square root of dispersion = 8.042853971073E-02.
[DFMSrv]:   Normalized standard deviation = 6.210698047160E-02 (6.21%).
[DFMSrv]: Estimation of the IORBPPROCs run-time workload:
[DFMSrv]:   Abs. time range: min=1.300000000000E-01, max=3.400000000000E-01.
[DFMSrv]:   Square root of dispersion = 7.495832175282E-02.
[DFMSrv]:   Normalized standard deviation = 1.191090732984E-01 (11.91%).

The CPU time of a process is calculated according to expression (1), where USRS is the user time in seconds, USRUS is the additional user time in microseconds, SYSS is the system time in seconds and SYSUS is the additional system time in microseconds. The expressions below show the calculations for:

• CPU PROC concurrency factor (2).

• Parallelization index of the scheduling processes (3).

• Total reached concurrency (4).

• Square root of dispersion (5).

• Normalized standard deviation (6) as an integral characteristic of the load balance.

\[ T \;=\; USR_S + \frac{USR_{US}}{10^{6}} + SYS_S + \frac{SYS_{US}}{10^{6}} \tag{1} \]

\[ C_{CPU} \;=\; \frac{\sum_{i} T_i^{CPU}}{\max_{i} T_i^{CPU}} \tag{2} \]

\[ P_{SCHED} \;=\; \frac{\sum_{j} T_j^{OQ} + \sum_{k} T_k^{IORBP}}{\max_{j,k}\{T_j^{OQ},\,T_k^{IORBP}\}} \tag{3} \]

\[ C_{TOTAL} \;=\; \frac{\sum_{i} T_i^{CPU} + \sum_{j} T_j^{OQ} + \sum_{k} T_k^{IORBP}}{\max_{i} T_i^{CPU}} \tag{4} \]

\[ \sigma \;=\; \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(T_i-\bar{T}\bigr)^{2}} \tag{5} \]

\[ \delta \;=\; \frac{\sigma}{\sqrt{N}\,\bar{T}} \tag{6} \]
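As a cross-check against the report above, expression (1) applied to the CPUPROC#0 log line and expression (2) applied to all eight CPU PROC times reproduce the printed concurrency factor:

\[ T_{CPUPROC\#0} = 12 + \frac{270000}{10^{6}} + 1 + \frac{510000}{10^{6}} = 13.78\ \mathrm{s}, \qquad C_{CPU} = \frac{115.38}{18.71} \approx 6.17 \]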


Process logging

The dataflow engine is able to generate a computational dataflow graph that we use to count the number of processed instructions. All execution processes (CPU PROCs) and I/O ring buffer processes (IORBP PROCs) write their activities into a log file (example fragment is shown below). Every entry in the log file has common clock synchronization. The TaskPort_ID tag unifies the entries belonging to the same application. IORBP PROC entries show data written to the shared memory pool, CPU PROC entries show executed instructions and types of results. Operand references are written in the following format:

[Index](#Context){Type} /* Var_Names */

[CPUPROC#17]: BEGIN at (sec=984321321, usec=96606)
  TaskPort_ID = #2
  FstLispCode = '(SETQ@S MAIN:TMP__000000001@S (OUTF "*** t00:
                 'A*x^2+B*x+C=0' square equations calculation ***\n\n" 0))'
  AddressRefs = <18:0:0>[0](#0){UCH59} /* MAIN:TMP__000000001@S */
  OutputOrder = {0(Ls)}
END_OF_CPUPROC_ENTRY at (sec=984321321, usec=103984)
[CPUPROC#14]: BEGIN at (sec=984321321, usec=99479)
  TaskPort_ID = #2
  FstLispCode = '(SETQ@S MAIN:TMP__000000002@S
                 "How many equations do you want to solve: ")'
  AddressRefs = <19:0:0>[0](#0){UCH41} /* MAIN:TMP__000000002@S */
  OutputOrder = {(Ld), 1(Ls)}
END_OF_CPUPROC_ENTRY at (sec=984321321, usec=108828)
[IORBPPROC#18]: BEGIN at (sec=984321323, usec=98113)
  TaskPort_ID = #2
  SrcAddrRefs = <20:0:0>[0](#0) = "5" /* MAIN:TMP__000000003@S */
END_OF_IORBPPROC_ENTRY at (sec=984321323, usec=98797)
[CPUPROC#19]: BEGIN at (sec=984321323, usec=109027)
  TaskPort_ID = #2
  FstLispCode = '(SETQ@I MAIN:NUMB@I (IVAL@S MAIN:TMP__000000003@S))'
  AddressRefs = <14:0:0>[0](#0){SLO8}, <20:0:0>[0](#0){UCH1}
                /* MAIN:NUMB@I, MAIN:TMP__000000003@S */
  OutputOrder = {(Ld)}
END_OF_CPUPROC_ENTRY at (sec=984321323, usec=114636)
[IORBPPROC#21]: BEGIN at (sec=984321323, usec=124816)
  TaskPort_ID = #2
  SrcAddrRefs = <11:0:0>[0](#0) = 1 /* MAIN:I@I */
END_OF_IORBPPROC_ENTRY at (sec=984321323, usec=125041)

7.3 NAS Parallel Benchmarks

NAS Parallel Benchmarks [16, 112] were derived from Computational Fluid Dynamics (CFD) code. They were designed to compare the performance of parallel computers and are widely recognized as a standard indicator of computer performance. NPB consists of five kernels and three simulated CFD applications derived from important classes of aerophysics applications. These five kernels mimic

the computational core of five numerical methods used by CFD applications. The simulated CFD applications reproduce much of the data movement and computation found in full CFD code.

• BT is a simulated CFD application that uses an implicit algorithm to solve 3-dimensional (3-D) compressible Navier-Stokes equations. The finite differences solution to the problem is based on an Alternating Direction Implicit (ADI) approximate factorization that decouples the x, y and z dimensions. The resulting systems are a Block-Tridiagonal of 5x5 blocks and are solved sequentially along each dimension.

• SP is a simulated CFD application that has a similar structure to BT. The finite differences solution to the problem is based on a Beam-Warming approximate factorization that decouples the x, y and z dimensions. The resulting system has Scalar Pentadiagonal bands of linear equations that are solved sequentially along each dimension.

• LU is a simulated CFD application that uses a symmetric successive over-relaxation (SSOR) method to solve a seven-block-diagonal system resulting from finite-difference discretization of the Navier-Stokes equations in 3-D by splitting it into block Lower and Upper triangular systems.

• FT contains the computational kernel of a 3D fast Fourier Transform (FFT)-based spectral method. FT performs three one-dimensional (1-D) FFTs, one for each dimension.

• MG uses a V-cycle MultiGrid method to compute the solution of the 3-D scalar Poisson equation. The algorithm works continuously on a set of grids. It tests both short and long distance data movement.

• CG uses a Conjugate Gradient method to compute an approximation to the smallest eigenvalue of a large, sparse, unstructured matrix. This kernel tests unstructured grid computations and communications by using a matrix with randomly generated locations of entries.

• IS is an Integer Sort kernel.

• EP is an Embarrassingly Parallel benchmark. It generates pairs of Gaussian random deviates according to a specific scheme. The goal is to establish the reference point for the peak performance of a given platform.

Our starting point is the C implementation of NAS Parallel Benchmarks 2.3, automatically translated into the BMDFM programming model scheme C. The following code fragment demonstrates a typical transformation of the parallel loops for the SSOR initialization of the LU benchmark. Each parallel loop is partitioned in an interleaved fashion and placed into the _wrk() function. The corresponding _int() interface function is a bridge between the C implementation and the virtual machine. The C functions are then called and synchronized on the VM level.


void lu_ssor_init_wrk(SLO step, SLO interleave, SLO ISIZ1, SLO ISIZ2,
                      DFL ****a, DFL ****b, DFL ****c, DFL ****d){
  SLO i,j,k,m;
  for(i=step; i<ISIZ1; i+=interleave)
    for(j=0; j<ISIZ2; j++)
      for(k=0; k<5; k++)
        for(m=0; m<5; m++){
          /* interleaved zero-initialization of the SSOR work arrays */
          a[i][j][k][m]=0.0;
          b[i][j][k][m]=0.0;
          c[i][j][k][m]=0.0;
          d[i][j][k][m]=0.0;
        }
}

void lu_ssor_init_int( /* ... VM operand and result descriptors ... */ ){
  /* ... unpack the eight integer operands and call lu_ssor_init_wrk() ... */
  ret_dat->single=1;
  ret_dat->type='I';
  ret_dat->value.ival=0;
  return;
}

INSTRUCTION_STRU INSTRUCTION_SET[]={
  // ...
  {"LU_SSOR_INIT",8,'I',"IIIIIIII",&lu_ssor_init_int},
  // ...
};
const ULO INSTRUCTIONS=sizeof(INSTRUCTION_SET)/sizeof(INSTRUCTION_STRU);
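This registry entry is what makes the new coarse-grain instruction visible to the virtual machine: a name, the operand count, the result type, the operand type string and a pointer to the _int() bridge. As a hedged illustration only (the actual BMDFM dispatcher and the INSTRUCTION_STRU field names are not shown in this excerpt, so the field name below is an assumption), a by-name lookup over this table could look as follows:

#include <string.h>

/* Sketch only: the field name "name" is assumed; the real BMDFM
   lookup mechanism may differ from this linear scan. */
const INSTRUCTION_STRU *find_instruction(const char *name)
{
    ULO i;
    for (i = 0; i < INSTRUCTIONS; i++)
        if (strcmp(INSTRUCTION_SET[i].name, name) == 0)
            return &INSTRUCTION_SET[i];   /* e.g. "LU_SSOR_INIT" */
    return NULL;                          /* unknown instruction */
}

The VM-level code that invokes LU_SSOR_INIT is shown next.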

(setq a (| sync_a a))
(setq b (| sync_b b))
(setq c (| sync_c c))
(setq d (| sync_d d))
(for thread 1 1 threads
  (setq sync_a (| sync_a
    (lu_ssor_init (-- thread) threads ISIZ1 ISIZ2 a b c d)
  ))
)
(setq sync_b (setq sync_c (setq sync_d sync_a)))
(setq a (| sync_a a))
(setq b (| sync_b b))
(setq c (| sync_c c))
(setq d (| sync_d d))
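In this fragment each UDF invocation receives a zero-based starting offset ((-- thread)) and the thread count as its interleave stride, while the sync_a..sync_d variables, combined through (|), give the dataflow engine explicit dependencies that order the UDF calls against the surrounding uses of the arrays. The following stand-alone C snippet, with made-up sizes, merely illustrates which iterations each thread receives under such an interleaved (cyclic) split:

#include <stdio.h>

/* Illustration only: which iterations each of T threads receives under the
   interleaved split used by lu_ssor_init_wrk(step, interleave, ...),
   where step = thread-1 and interleave = threads. Sizes are made up. */
int main(void)
{
    const int threads = 4, ISIZ1 = 10;
    for (int thread = 1; thread <= threads; thread++) {
        printf("thread %d gets i =", thread);
        for (int i = thread - 1; i < ISIZ1; i += threads)
            printf(" %d", i);
        printf("\n");
    }
    return 0;
}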

Table 7-1 and Figure 7-1 show the execution times and speedups of a 64-bit BMDFM running the NAS Parallel Benchmarks (version 2.3, classes A and C) on an 8-way POWER4 IBM p690 SMP server. BMDFM is configured on scheme C for N_CPUPROC = 8, N_IORBPPROC = 8 and N_OQPROC = 4.

Benchmark  Class  Baseline time/s  BMDFM time/s  Speedup
EP         A           140.00          17.90       7.82
           C          2266.00         286.47       7.91
CG         A            10.36           2.57       4.03
           C          2556.00         339.58       7.53
IS         A             9.16           1.78       5.15
           C           361.42          58.90       6.14
MG         A            20.18           3.31       6.10
           C           516.12          70.14       7.36
FT         A            27.95           3.94       7.09
           C          1575.56         215.97       7.30
BT         A           860.84         159.54       5.40
           C         20764.48        3039.68       6.83
SP         A           410.32         115.50       3.55
           C          9888.60        1708.99       5.79
LU         A           490.38          91.07       5.38
           C         13400.40        1798.41       7.45

Table 7-1. Execution times and speedups for NAS PB 2.3 (classes A, C) on 8-way SMP


Figure 7-1. Speedups for NAS PB 2.3 (classes A, C) on 8-way SMP

These experimental results demonstrate good average speedup and prove the efficiency of the proposed SMP dataflow hybrid architecture. Performance tends to be better for the NAS PB class C, as this class operates on bigger arrays, which results in a larger granularity of the processed coarse-grain instructions.

We also estimated the dynamic scheduling overhead of the dataflow runtime engine by collecting the CPU times used by the dynamic scheduling processes. Table 7-2 summarizes the number of instructions processed in each test and gives the CPU times spent for their dynamic scheduling. From these we calculate the normalized scheduling overhead (SO) per instruction. This parameter differs between the tests, which can be explained by the different complexity of the algorithms and of the dependencies among the processed instructions. The average value of the normalized scheduling overhead is 27µs per virtual machine instruction, which is small compared to the computational weight of the generated instructions (5.3% of the baseline time for class A and 1.1% for class C; the weight of the instructions is bigger in class C). Therefore we conclude that the proposed SMP dataflow hybrid architecture has negligible dynamic scheduling overhead.

Benchmark  Class  Number of     Scheduling       SO per          SO/Baseline
                  Instructions  Overhead (SO)/s  Instruction/µs  Ratio/%
EP         A             312         0.007           22.436         0.00500
           C             312         0.005           16.026         0.00022
CG         A           58012         1.132           19.513        10.92664
           C          274372         6.265           22.834         0.24511
IS         A            1708         0.067           39.227         0.73144
           C            1708         0.054           31.616         0.01494
MG         A           21782         0.935           42.925         4.63330
           C          100049         4.234           42.319         0.82035
FT         A            2954         0.093           31.483         0.33274
           C            8994         0.234           26.017         0.01485
BT         A         2140724        52.675           24.606         6.11902
           C         5154915       202.854           39.352         0.97693
SP         A        11726139        45.031            3.840        10.97461
           C        27680328       530.763           19.175         5.36742
LU         A         2227548        42.739           19.187         8.71549
           C         5632798       170.720           30.308         1.27399
Average    A                                         25.402         5.30478
           C                                         28.456         1.08923
Average                                              26.929        (3.19700)

Table 7-2. Number of instructions and scheduling overhead for NAS PB 2.3 (classes A, C) on 8-way SMP
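The normalization behind Table 7-2 is a plain division of the measured scheduling CPU time by the number of processed instructions. As a worked example, the snippet below reproduces the BT class C row:

#include <stdio.h>

/* Worked example for Table 7-2 (BT, class C): normalized scheduling
   overhead per instruction = total scheduling CPU time / instructions. */
int main(void)
{
    const double so_seconds   = 202.854;    /* scheduling overhead, s  */
    const double instructions = 5154915.0;  /* processed instructions  */
    printf("SO per instruction = %.3f us\n",
           so_seconds / instructions * 1.0e6);  /* prints 39.352 */
    return 0;
}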


7.4 Irregular Test

The biggest advantage of the dataflow engine is its ability to parallelize dynamic adaptive (irregular) applications, which cannot be analyzed statically. We test this capability by running an IP core generator based on a VHDL preprocessor.

The idea of the IP core generator is to have a library of configurable VHDL files (.vhl files stand for VHDL + LISP). Each such file consists of native VHDL code mixed with 'macros' written in the VM language. The VHDL preprocessor searches for all macros, evaluates them and substitutes each macro with its calculated result; in other words, the VHDL preprocessor converts .vhl files from the source directory into .vhd files in a target directory. After a single file is processed, the VHDL preprocessor performs a second pass, searching for all components used in "COMPONENT" statements and descending recursively into the hierarchy of referenced components. When the whole hierarchy is processed, the target directory contains the pure VHDL code of the generated IP core.

Such a non-trivial test is very interesting from the parallelization point of view because the dataflow engine has to exploit inter-procedural and cross-conditional parallelism at runtime.

The following code fragments demonstrate configurable VHDL code before and after preprocessing (wordwidth = 32); they do not reference any hierarchy of components.

Initial configurable .vhl code:

LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.std_logic_signed.ALL;
USE IEEE.std_logic_arith.ALL;

ENTITY rom IS
  PORT(in_adr : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
       out_d  : OUT STD_LOGIC_VECTOR(`(-- wordwidth)` DOWNTO 0));
END rom;

ARCHITECTURE str OF rom IS
BEGIN
  run: PROCESS(in_adr)
  BEGIN
`(progn
   (defun coeff_gen (progn
     (setq wordwidth $1)
     (setq i $2)
     (setq n $3)
     (setq ret_line "\"0")
     (setq weight 0)
     (for j 2 1 wordwidth (progn
       (setq weight (<< weight 1))
       (setq weight (++ weight))
     ))
     (setq coeff (ival (*. weight (cos (/. (*. (pi) (++ (* 2 i))) (* 2 n))))))
     (setq j (<< 1 (- wordwidth 2)))
     (while (>= j 1) (progn
       (setq ret_line (cat ret_line (if (> (/ coeff j) 0) "1" "0")))
       (setq coeff (% coeff j))
       (setq j (>> j 1))
     ))
     (cat (cat ret_line "\"; -- R") (cat (str i) (cat "(" (cat (str n) ")"))))
   ))
   (outf "    IF in_adr = \"00\" THEN\n" 0)
   (outf "      out_d <= %s\n" (coeff_gen wordwidth 0 8))
   (outf "    ELSIF in_adr = \"01\" THEN\n" 0)
   (outf "      out_d <= %s\n" (coeff_gen wordwidth 1 8))
   (outf "    ELSIF in_adr = \"10\" THEN\n" 0)
   (outf "      out_d <= %s\n" (coeff_gen wordwidth 3 8))
   (outf "    ELSIF in_adr = \"11\" THEN\n" 0)
   (outf "      out_d <= %s\n" (coeff_gen wordwidth 2 8))
   "    END IF;\n"
)`
  END PROCESS run;
END str;

Preprocessed target pure .vhd code:

LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.std_logic_signed.ALL;
USE IEEE.std_logic_arith.ALL;

ENTITY rom IS
  PORT(in_adr : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
       out_d  : OUT STD_LOGIC_VECTOR(31 DOWNTO 0));
END rom;

ARCHITECTURE str OF rom IS
BEGIN
  run: PROCESS(in_adr)
  BEGIN
    IF in_adr = "00" THEN
      out_d <= "01111101100010100101111100111110"; -- R0(8)
    ELSIF in_adr = "01" THEN
      out_d <= "01101010011011011001100010100011"; -- R1(8)
    ELSIF in_adr = "10" THEN
      out_d <= "00011000111110001011100000111100"; -- R3(8)
    ELSIF in_adr = "11" THEN
      out_d <= "01000111000111001110110011100110"; -- R2(8)
    END IF;
  END PROCESS run;
END str;
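For readers more comfortable with C, the following stand-alone sketch re-expresses what the coeff_gen macro computes: a cosine coefficient scaled by 2^(wordwidth-1)-1 and emitted as a binary ROM literal. This is an illustration only, not BMDFM code, and its floating-point rounding may differ from the VM arithmetic in the last bit:

#include <stdio.h>
#include <math.h>

/* Illustrative C version of the coeff_gen macro: R_i(n) is the integer part
   of (2^(w-1)-1)*cos(pi*(2i+1)/(2n)), emitted as a w-bit binary literal
   with a leading 0 sign bit. Not BMDFM code; rounding may differ. */
static void coeff_gen(int wordwidth, int i, int n)
{
    const double pi = acos(-1.0);
    long long weight = (1LL << (wordwidth - 1)) - 1;
    long long coeff  = (long long)(weight * cos(pi * (2 * i + 1) / (2.0 * n)));

    printf("\"0");                                       /* sign bit */
    for (long long j = 1LL << (wordwidth - 2); j >= 1; j >>= 1) {
        putchar(coeff / j > 0 ? '1' : '0');              /* next bit */
        coeff %= j;
    }
    printf("\"; -- R%d(%d)\n", i, n);
}

int main(void)
{
    coeff_gen(32, 0, 8);   /* should reproduce the first ROM entry above */
    return 0;
}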

Now let us look at the next code fragments of the top-level VHDL model. After the preprocessing (wordwidth = 32 and use_rom = true), the target code contains COMPONENT rom, which recursively forces the preprocessing of the previously discussed "rom" component.


Initial configurable .vhl code:

LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.std_logic_signed.ALL;
USE IEEE.std_logic_arith.ALL;

ENTITY device IS
  PORT(a,b : IN  STD_LOGIC_VECTOR(`(if use_rom 1 (-- wordwidth))` DOWNTO 0);
       s   : OUT STD_LOGIC_VECTOR(`(-- wordwidth)` DOWNTO 0));
END device;

ARCHITECTURE str OF device IS
`(if use_rom
   (progn
     (outf "  COMPONENT rom\n" nil)
     (outf "    PORT(in_adr : IN STD_LOGIC_VECTOR(1 DOWNTO 0);\n" nil)
     (outf "         out_d : OUT STD_LOGIC_VECTOR(%ld DOWNTO 0));\n" (-- wordwidth))
     (outf "  END COMPONENT;\n" nil)
     (outf "  SIGNAL opa, opb : STD_LOGIC_VECTOR(%ld DOWNTO 0);\n" (-- wordwidth))
     ""
   )
   ""
)`BEGIN
`(if use_rom
   (progn
     (outf "  u0: rom\n" nil)
     (outf "    PORT MAP(in_adr => a, out_d => opa);\n" nil)
     (outf "  u1: rom\n" nil)
     "    PORT MAP(in_adr => b, out_d => opb);\n"
   )
   ""
)`
  s <= `(if use_rom "op" "")`a + `(if use_rom "op" "")`b;
END str;

Preprocessed target pure .vhd code:

LIBRARY IEEE;
USE IEEE.std_logic_1164.ALL;
USE IEEE.std_logic_signed.ALL;
USE IEEE.std_logic_arith.ALL;

ENTITY device IS
  PORT(a,b : IN  STD_LOGIC_VECTOR(1 DOWNTO 0);
       s   : OUT STD_LOGIC_VECTOR(31 DOWNTO 0));
END device;

ARCHITECTURE str OF device IS
  COMPONENT rom
    PORT(in_adr : IN STD_LOGIC_VECTOR(1 DOWNTO 0);
         out_d : OUT STD_LOGIC_VECTOR(31 DOWNTO 0));
  END COMPONENT;
  SIGNAL opa, opb : STD_LOGIC_VECTOR(31 DOWNTO 0);
BEGIN
  u0: rom
    PORT MAP(in_adr => a, out_d => opa);
  u1: rom
    PORT MAP(in_adr => b, out_d => opb);
  s <= opa + opb;
END str;

The IP core generator (VHDL preprocessor) itself is also implemented in pure VM language. It consists of five functions:

• (rd_file) reads a file. It takes a file name as an argument and returns the file contents.

• (wr_file) writes a file. It takes the file name and file contents as arguments.


• (fst_lsp) encapsulates the (mapcar) function. It takes an expression of the VM language, evaluates it through the (mapcar) and returns the result of the calculation.

• (process_file) takes a raw file and returns a preprocessed file where all macros are evaluated.

• (process_hierarchy) is a recursive function that calls (process_file) first and then invokes itself recursively for each COMPONENT statement found. The function main calls (process_hierarchy) for the top VHDL model.

The first three functions are trivial, so we explain only the (process_file) and (process_hierarchy) functions below:

(process_file) real code:

(defun process_file (progn
  (setq flp_statement $1)  # parameters like (setq wordwidth 32) (setq use_rom 1)
  (setq file_contents $2)  # raw file contents.
  (setq new_contents "")   # new contents to concatenate.
  (while (len file_contents)                 # iterate through the raw contents.
    (if (setq pos (at "`" file_contents))    # find a macro.
      (progn
        (setq new_contents (cat new_contents (left file_contents (idecr pos))))
        (setq file_contents (rightl file_contents pos))
        (setq pos (at "`" file_contents))
        (setq flp_statement1 (cat flp_statement (left file_contents (idecr pos))))
        (setq flp_statement1 (fst_lsp flp_statement1))  # EVALUATE FOUND MACRO
        (setq new_contents (cat new_contents flp_statement1))
        (setq file_contents (rightl file_contents pos))
      )
      (progn
        (setq new_contents (cat new_contents file_contents))
        (setq file_contents "")
      )
    )
  )
  new_contents  # return new preprocessed contents
))

(process_hierarchy) real code:

(defun process_hierarchy (progn
  (setq flp_statement $1)     # parameters like (setq wordwidth 32) (setq use_rom 1)
  (setq source_directory $2)  # source directory.
  (setq target_directory $3)  # target directory.
  (setq current_file $4)      # file name.
  (setq hierarchy_level $5)   # level of the nested hierarchy.
  (setq processed_files $6)   # list of already processed files.
  (outf "%s\n" (cat (cat (cat (cat (cat (space hierarchy_level) source_directory)
    (cat current_file ".vhl")) " ==> ") target_directory)
    (cat current_file ".vhd")))
  # READ AND PREPROCESS FILE
  (setq file_contents (process_file flp_statement
    (rd_file (cat (cat source_directory current_file) ".vhl"))))
  # WRITE PREPROCESSED FILE
  (wr_file (cat (cat target_directory current_file) ".vhd") file_contents)
  (while (len file_contents) (progn
    # search for COMPONENT statements
    (if (setq temp (at "COMPONENT" (upper file_contents)))
      (setq file_contents (rightl file_contents (idecr temp)))
      (setq file_contents "")
    )
    (setq temp (upper (head file_contents)))
    (setq file_contents (tail file_contents))
    (if (== temp "COMPONENT")
      (progn
        (setq temp (head file_contents))
        (setq file_contents (tail file_contents))
        (setq temp1 (cat (cat " " temp) " "))
        (if (at temp1 processed_files)
          (outf "%s\n" (cat (cat (cat (space (iadd hierarchy_level 2))
            source_directory) (cat temp ".vhl")) " skipped (already processed)"))
          # CALL ITSELF RECURSIVELY
          (setq processed_files (process_hierarchy flp_statement source_directory
            target_directory temp (iadd hierarchy_level 2)
            (cat processed_files (cat temp " "))))
        )
      )
      0
    )
  ))
  processed_files  # return list of already processed files, not to preprocess
))               # them again.

In our irregular test we run the VHDL preprocessor to configure an IP core of the "Fast Trigonometric Transform Processor", which consists of:

• 347 .vhl files.

• 2641 VM macros embedded into the VHDL code.

Table 7-3 and Figure 7-2 show the execution times and speedups of a 64-bit BMDFM running the irregular test on the 8-way POWER4 IBM p690 SMP server configured for:

• 2 processors: N_CPUPROC = 2, N_IORBPPROC = 2 and N_OQPROC = 2.

• 4 processors: N_CPUPROC = 4, N_IORBPPROC = 4 and N_OQPROC = 2.

• 8 processors: N_CPUPROC = 8, N_IORBPPROC = 8 and N_OQPROC = 4.

Number of Processors  Baseline time/s            BMDFM time/s  Speedup
1                     223 (single-threaded run)       -          1.00
2                          -                        131.18       1.67
4                          -                         69.69       3.20
8                          -                         34.31       6.50

Table 7-3. Execution times and speedups for irregular test
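One way to read Table 7-3 is through parallel efficiency, i.e. speedup divided by the number of processors. The small program below derives it from the measured values:

#include <stdio.h>

/* Parallel efficiency = speedup / processors, for the Table 7-3 points. */
int main(void)
{
    const int    procs[]   = {1, 2, 4, 8};
    const double speedup[] = {1.00, 1.67, 3.20, 6.50};
    for (int i = 0; i < 4; i++)
        printf("%d CPUs: efficiency %.0f%%\n",
               procs[i], 100.0 * speedup[i] / procs[i]);
    return 0;
}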



Figure 7-2. Speedups for irregular test

As we can see, the dataflow engine demonstrates good runtime parallelization capabilities, especially considering that the sequential application is written in a conventional programming style and does not require any parallelization/synchronization directives.

7.5 Summary

Having tested the performance of BMDFM on the 8-way POWER4 IBM p690 SMP server, we have concluded that in general it performs very well, demonstrating nearly linear scalability on both numeric processing and irregular applications.

We have also estimated the dynamic scheduling overhead, which is 27µs per virtual machine instruction. This value is small compared to the computational weight of the generated instructions (5.3% of the baseline time for class A and 1.1% for class C of NAS Parallel Benchmarks 2.3; the weight of the instructions is bigger in class C).


The dataflow engine also demonstrates good runtime parallelization capabilities when running dynamic adaptive (irregular) applications, which cannot be analyzed statically. We tested this capability by running an IP core generator based on a VHDL preprocessor.



Chapter 8 Conclusion

8.1 Overall Summary

Runtime parallelization definitely provides a wider range of possibilities compared to compile-time methods. Compilers cannot apply many interesting optimizations that depend on knowledge of dynamic information. Compile-time optimizations cannot be applied to situations where the time it takes to complete an operation varies at runtime. Additionally, compilers can perform only limited inter-procedural and cross-conditional optimizations because they often cannot determine which way a conditional will go or cannot optimize across a function call.

To complement existing compiler-optimization methods we have proposed a hybrid dataflow parallelization environment BMDFM that creates a data-dependence graph and exploits parallelism of a user application program at run time.

Our architectural approach has the following important features that are not present in known runtime parallelization projects:

• The dataflow runtime engine is not aggressively optimized for applications in one specific area, such as numeric processing. It can solve inter-procedural and cross-conditional dependencies as well.

• The dynamic scheduling subsystem is decentralized and is executed in parallel on the same multiprocessors that run the application itself. This approach eliminates the situation where the task scheduling becomes a bottleneck of the entire computing process.

• An application comprises the conventional virtual machine language and classical C. There is no special language to control dataflow. The application program controls dataflow fully automatically and transparently.

• From the point of view of dataflow programming, our approach avoids the single-assignment paradigm problem. We think that our way of dataflow programming with a conventional algorithmic language can close the known gap of a missing programming methodology for dataflow.

It is known that the main weakness of dataflow is a relatively high dynamic scheduling overhead, because each dynamic instruction requires dynamic operand matching. We spent a great deal of effort designing and optimizing our dataflow engine to reduce this overhead. We would especially like to highlight the multiple context data structuring, the speculative tagging of instructions and the parallel load of clusters as our main contributions in this field. Additionally, we have estimated the program and time complexities of the proposed dataflow dynamic scheduling subsystem.

Having tested the performance of BMDFM on the 8-way POWER4 IBM p690 SMP server, we have concluded that in general it performs very well, demonstrating nearly linear scalability on both numeric processing and irregular applications.

Indeed, the main advantage of the proposed architecture is that it provides a conventional programming paradigm at the top level. We call this ability of BMDFM transparent dataflow semantics; its convenience justifies all the effort we spent designing the complex architecture behind it. Because the complexity is hidden, we consider transparent dataflow semantics to be the key point defining the applicability of the BMDFM system.

* * *

BMDFM is publicly available for commodity SMP platforms. It can be understood as a virtual machine that provides a conventional functional programming model using transparent dataflow semantics with negligible dynamic scheduling overhead.

We believe that our approach is a big step toward better parallel programming/compiling technology. Thanks to the effective architectural combination of SMP and dataflow, we were able to offer programmers a much simpler parallelization technique.

8.2 Future Directions

Several directions for follow-up work were identified while this thesis was being written.

• Further improvements can be applied to some of the techniques that we have so far used only in a limited fashion. We would also consider aggressive optimization of the system for certain SMP architectures, such as the popular scalable ccNUMA.

• Another direction is to add a pure C or Fortran interface to the system. This work has already been started with the design of external translators from high-level languages into the VM/C language. Moreover, we have already used these translators for testing in this work, but we did not describe this functionality as it is outside the scope of the thesis.


List of Defined Terms

ARRAYBLOCK_SIZE – size of a minimal data chunk, in which data is allocated in the shared memory pool.

Artificial temporary variable – a variable introduced by the static scheduler to resolve data dependencies in the dataflow engine.

Binary Modular DataFlow Machine (BMDFM) – a hybrid dataflow runtime parallelization environment for shared memory multiprocessors.

BMDFM server – the main BMDFM module, which starts and shuts down the dataflow engine.

Common semaphores – an array of main semaphores that synchronizes the dataflow engine.

Context – a unique number that specifies a copy of data in the tagged-token dataflow engine.

Context data – a piece of data within the specified context.

Control sequence – code for the front-end virtual machine, automatically generated from the user application. The control sequence controls the uploading of the marshaled clusters into the dataflow engine.

CPU PROC – a BMDFM CPU executing/scheduling process that executes ready instructions and performs garbage collection in the shared memory pool.

Dataflow engine – a multithreaded dataflow engine that creates a data-dependence graph and exploits parallelism of user application programs at run time.

Distributed semaphores – semaphores that are interleaved along the shared memory pool to control multiple resources of the same type.

Dynamic scheduler – multiple processes performing dataflow emulation and scheduling the dataflow engine.

Front-end virtual machine – a von Neumann control machine for the dataflow engine. The control machine does not execute the byte code of an application but uploads the marshaled clusters dynamically to the dataflow engine.

Function – an atomic unit for the virtual machine.

Input/Output Ring Buffer Ports (IORBP) – a data structure in the shared memory pool into which the front-end virtual machine uploads the marshaled clusters and data. It consists of separate cells.


Instruction – a seamless atomic unit for the dataflow engine that is a BMDFM byte code fragment.

IORBP PROC – a BMDFM IORBP scheduling process that moves clustered data and instructions to the data buffer (DB) and the operation queue (OQ) respectively and performs garbage collection in the shared memory pool.

Loader/listener pair – an external process performing the static scheduling of the user application and then emulating the front-end virtual machine.

Local function directory – a directory for all functions of the marshaled cluster pointing to the TCZ function directory.

Local variable directory – a directory for all variables of the marshaled cluster pointing to the data buffer.

Machine instruction database – a registry for all available VM seamless functions.

Marshaled cluster – an atomic data chunk that is uploaded to the dataflow engine seamlessly. It consists of a local variable directory and a local function directory.

N_CPUPROC – number of BMDFM CPU executing/scheduling processes.

N_IORBP – number of input/output ring buffer ports.

N_IORBPPROC – number of BMDFM IORBP scheduling processes.

N_OQPROC – number of BMDFM OQ scheduling processes.

Operation Queue (OQ) – a data structure in the shared memory pool where all instructions are stored. It consists of separate cells.

OQ PROC – a BMDFM OQ scheduling process performing the speculative tagging and garbage collection in the shared memory pool.

Port – a single entity in the trace plugging area associated with one connected external tracer.

PROC – a BMDFM process.

PROC local memory – a local storage for the seamless coarse-grain user defined functions.

PROC stat – a BMDFM auxiliary process collecting statistics on the dataflow engine.

Q_DB – size of data buffer.

Q_IORBP – size of input/output ring buffer ports.


Q_OQ – size of operation queue.

Scheme A – a programming model with fine-granularity of parallelism.

Scheme B – a programming model with coarse-grain UDFs that are defined on VM level.

Scheme C – a programming model with coarse-grain UDFs that are defined in C language.

Shared memory pool – a common storage shared by all BMDFM processes.

SHMEM_POOL_BANKS – number of memory banks in the shared memory pool.

SHMEM_POOL_SIZE – shared memory pool size.

Socket – a single entity in the task connection zone associated with one connected external task loader/listener pair.

Speculative tagging – a dynamic scheduling algorithm that tags instructions to be ready for execution. Speculative mechanism reduces the dynamic scheduling overhead.

Static scheduler – a process that converts the user application into the control sequence and marshaled clusters.

Task Connection Zone (TCZ) – a structure in the shared memory pool where all connected external task loader/listener pairs are registered.

TCZ function directory – a shared memory pool storage for all instructions of the connected user application.

TCZ output queue – a shared memory pool structure where the output stream is ordered after out-of-order processing.

Trace Plugging Area (TPA) – a structure in the shared memory pool where all connected external tracers are registered.

Tracer – an external process for debugging the out-of-order execution in the dataflow engine.

Transparent dataflow semantics – a paradigm that provides a conventional programming style on top of a dataflow machine.

Universal structure – a predefined VM data structure for storing variables; it allows changing data types dynamically and holding either a single value or an array with different types of members, in order to support lists and trees.

User Defined Function (UDF) – a function defined on VM level or in C language.


Virtual Machine (VM) – a runtime kernel that runs BMDFM byte code. A single-threaded VM runs in a protected address space. The multithreaded VM is modified slightly to store all dynamic data in the shared memory pool.


References

[1] W.B.Ackerman, J.B.Dennis. VAL - A Value-Oriented Algorithmic Language. Technical Report 218, Laboratory for Computer Science, MIT, 1979.
[2] T.Agerwala, Arvind, Data flow systems, IEEE Computer 15 (Feb. 1982), pp. 10-13.
[3] H.Akkary, M.Driscoll. A Dynamic Multithreading Processor. In MICRO-31, December 1998.
[4] S.P.Amarasinghe, J.M.Anderson, M.S.Lam, C.W.Tseng, The SUIF Compiler for Scalable Parallel Machines, Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Processing, July 1995.
[5] J.A.M.Anderson, Automatic Computation and Data Decomposition for Multiprocessors, Technical Report CSL-TR-97-719, Computer Systems Laboratory, Dept. of Electrical Eng. and Computer Sc., Stanford University, 1997.
[6] T.Anderson, B.Bershad, E.Lazowska, H.Levy, Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism, Proceedings of the 13th ACM Symposium on Operating System Principles (SOSP), October 1991.
[7] T.E.Anderson, FastThreads User's Manual, Department of Computer Science and Engineering, University of Washington, January 1990.
[8] T.E.Anderson, E.D.Lazowska, Quartz: A Tool for Tuning Parallel Program Performance, Department of Computer Science and Engineering, University of Washington, September 1989.
[9] J.M.Anderson, S.P.Amarasinghe, M.S.Lam. Data and Computation Transformations for Multiprocessors. Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Processing, Jul. 1995.
[10] Arvind, D.E.Culler, Dataflow architectures, Ann. Review in Comput. Sci. 1 (1986), pp. 225-253.
[11] Arvind, V.Kathail, A multiple processor dataflow machine that supports generalized procedures, Proc. 8th ISCA, May 1981, pp. 291-302.
[12] Arvind, R.S.Nikhil, Executing a program on the MIT tagged-token dataflow architecture, Lect. Notes Comput. Sc. 259 (1987), pp. 1-29.
[13] E.Ayguade, X.Martorell, J.Labarta, M.Gonzalez, N.Navarro. Exploiting Multiple Levels of Parallelism in OpenMP: A Case Study. ICPP'99, Sep. 1999.
[14] BMDFM The Official Site. http://www.bmdfm.de
[15] J.Backus, Can programming be liberated from the von Neumann style? A functional style and its algebra of programs, Comm. ACM 21 (1978), pp. 613-641.
[16] D.Bailey, T.Harris, W.Saphir, R.Wijngaart, A.Woo, M.Yarrow, The NAS Parallel Benchmarks 2.0, Technical Report NAS-95-020, NASA, December 1995.
[17] U.Banerjee. Loop Parallelization. Kluwer Academic Pub., 1994.


[18] P.M.C.C.Barahona, J.R.Gurd, Simulated performance of the Manchester multi-ring dataflow machine, Proc. 2nd ICPC, Sep. 1985, pp. 419-424.
[19] P.M.C.C.Barahona, J.R.Gurd, Processor allocation in a multi-ring dataflow machine, J. Parall. Distr. Comput. 3 (1986), pp. 67-85.
[20] U.Banerjee. Dependence Analysis for Supercomputing. Kluwer Pub., 1989.
[21] M.Beck, T.Ungerer, E.Zehender, Classification and performance evaluation of hybrid dataflow techniques with respect to matrix multiplication, Proc. GI/ITG Workshop PARS'93, Apr. 1993, pp. 118-126.
[22] B.Bershad, E.Lazowska, H.Levy, Presto: A System for Object-oriented Parallel Programming, Software - Practice and Experience, vol. 18, no. 8, pp. 713-732, 1988.
[23] L.Bic, M.Al-Mouhamed, The EM-4 under implicit parallelism, Proc. 1993 Int'l Conf. Supercomputing, Jul. 1993, pp. 21-26.
[24] W.Blume, R.Eigenmann, K.Faigin, J.Grout, J.Hoeinger, D.Padua, P.Petersen, W.Pottenger, L.Rauchwerger, P.Tu, S.Weatherford. Effective automatic parallelization with Polaris. International Journal of Parallel Programming, May 1995.
[25] R.D.Blumofe, C.F.Joerg, B.C.Kuszmaul, C.E.Leiserson, K.H.Randall, Y.Zhou, Cilk: An Efficient Multithreaded Runtime System, Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '95), Santa Barbara, California, July 19-21, 1995.
[26] A.P.W.Boehm, Y.M.Teo, Resource management in a multi-ring dataflow machine, Proc. CONPAR88, Sep. 1988, pp. B 191-200.
[27] C.J.Brownhill, A.Nicolau, S.Novack, C.D.Polychronopoulos. Achieving Multi-level Parallelization. Proc. of ISHPC'97, Nov. 1997.
[28] R.Buehrer, K.Ekanadham, Incorporating data flow ideas into von Neumann processors for parallel processing, IEEE Trans. Computers C-36 (1987), pp. 1515-1522.
[29] A.J.Catto, J.R.Gurd, Resource management in dataflow, Proc. Conf. Functional Programming Languages and Comput. Arch., Oct. 1981, pp. 77-84.
[30] A.J.Catto, J.R.Gurd, C.C.Kirkham, Non-deterministic dataflow programming, Proc. 6th ACM European Conf. Comput. Arch., Apr. 1981, pp. 435-444.
[31] R.Chandra, A.Gupta, J.L.Hennessy, Data Locality and Load Balancing in COOL, Fourth ACM SIGPLAN Symposium on the Principles and Practice of Parallel Programming (PPoPP), pp. 249-259, May 1993.
[32] R.Chandra, A.Gupta, J.L.Hennessy, Integrating Concurrency and Data Abstraction in the COOL Parallel Programming Language, Technical Report CSL-TR-92-511, Computer Systems Laboratory, Stanford University, February 1992.
[33] A.Charlesworth, STARFIRE: Extending the SMP Envelope, IEEE Micro, Jan/Feb 1998.
[34] A.Chien, V.Karamcheti, J.Plevyak, The Concert System - Compiler and Runtime Support for Efficient Fine-grained Concurrent Object-oriented Programs, Department of Computer Science, University of Illinois, Urbana, IL, Tech. Rep. UIUCDCS-R-93-1815, 1993.


[35] J.Chow, W.L.Harrison III, Switch Stacks: A Scheme for Micro-tasking Nested Parallel Loops, Center for Supercomputing Research and Development (CSRD), University of Illinois at Urbana-Champaign, 1990.
[36] M.Cintra, J.F.Martinez, J.Torrellas, Architectural Support for Scalable Speculative Parallelization in Shared-Memory Multiprocessors, Proceedings of the 27th Annual International Symposium on Computer Architecture, pp. 13-24, Vancouver, BC, June 2000.
[37] R.P.Colwell, R.L.Steck, A 0.6um BiCMOS processor with dynamic execution, Proc. Intl. Solid State Circuits Conf., Feb. 1995.
[38] D.Cortessi, A.Evans, W.Ferguson, J.Hartman, Topics in IRIX Programming, Doc. num. 007-2478-004, Silicon Graphics, Inc., http://techpubs.sgi.com, 1996.
[39] D.Cortessi, A.Evans, W.Ferguson, J.Hartman, Topics in IRIX Programming, Doc. num. 007-2478-006, Silicon Graphics, Inc., http://techpubs.sgi.com, 1998.
[40] D.Craig, An Integrated Kernel- and User-Level Paradigm for Efficient Multiprogramming Support, M.S. thesis, Department of Computer Science, University of Illinois at Urbana-Champaign, 1999.
[41] D.E.Culler, S.Goldstein, K.E.Schauser, T.vonEicken, TAM - A compiler controlled threaded abstract machine, J. Parall. Distr. Comput. 18 (1993), pp. 347-370.
[42] D.E.Culler, G.M.Papadopoulos, The explicit token store, J. Parall. Distr. Comput. 10 (1990), pp. 289-308.
[43] D.E.Culler, A.Sah, K.E.Schauser, T.vonEicken, J.Wawrzynek, Fine-grain parallelism with minimal hardware support: A compiler-controlled Threaded Abstract Machine, Proc. 4th Int'l Conf. Arch. Support for Programming Languages and Operating Systems, April 1991, pp. 164-175.
[44] L.Dagum, R.Menon. OpenMP: An Industry Standard API for Shared Memory Programming. IEEE Computational Science & Engineering, 1998.
[45] J.B.Dennis, First version of a data-flow procedure language, Lect. Notes Comput. Sc. 19 (1974), pp. 362-376.
[46] J.B.Dennis, The varieties of data flow computers, Proc. 1st Int'l Conf. Distributed Comp. Sys., Oct. 1979, pp. 430-439.
[47] J.B.Dennis, Dataflow computation: A case study, Computer architecture - Concepts and systems (V.M.Milutinovic, ed.), North-Holland, 1988, pp. 354-404.
[48] J.B.Dennis, D.P.Misunas, A preliminary architecture for a basic data-flow processor, Proc. 2nd ISCA, Jan. 1975, pp. 126-132.
[49] K.Diefendorff, Power4 Focuses on Memory Bandwidth, Microprocessor Report, October 6, 1999, pp. 11-17.
[50] Digital Equipment Corporation / Compaq Computer Corporation, AlphaServer 8x00 Technical Summary, http://www.digital.com/alphaserver/alphasrv8400/8x00_summ.html, 1999.
[51] Digital Equipment Corporation / Compaq Computer Corporation, AlphaServer GS60/GS140 and 8200/8400 Systems, technical summary, http://www.digital.com/alphaserver/products.html, 1999.


[52] Digital Equipment Corporation / Compaq Computer Corporation, Digital UNIX: Assembly Language Programmer's Guide, Maynard, Massachusetts, March 1996.
[53] Digital Equipment Corporation / Compaq Computer Corporation, Digital UNIX: Calling Standard for Alpha Systems, Maynard, Massachusetts, March 1996.
[54] Digital Equipment Corporation / Compaq Computer Corporation, Digital UNIX: Guide to DECthreads, Maynard, Massachusetts, December 1997.
[55] Digital Equipment Corporation / Compaq Computer Corporation, Digital UNIX: Programmer's Guide, Maynard, Massachusetts, March 1996.
[56] J.H.Edmondson, P.Rubinfeld, R.Preston, V.Rajagopalan, Superscalar Instruction Execution in the 21164 Alpha Microprocessor, IEEE Micro, 15(2), pp. 33-43, April 1995.
[57] R.Eigenmann, J.Hoeflinger, D.Padua. On the Automatic Parallelization of the Perfect Benchmarks. IEEE Trans. on Parallel and Distributed Systems, 9(1), Jan. 1998.
[58] D.R.Engler, G.R.Andrews, D.K.Lowenthal, Filaments: Efficient Support for Fine-Grain Parallelism, Technical Report 93-13a, Department of Computer Science, University of Arizona, Tucson, 1993.
[59] P.Evripidou, J.L.Gaudiot, The USC decoupled multilevel data-flow execution model, Advanced Topics in Data-Flow Computing (J.L.Gaudiot, L.Bic, eds.), Prentice Hall, 1991, pp. 347-379.
[60] J.T.Feo, D.C.Cann, R.R.Oldehoeft. A report on the Sisal language project. J. Parallel Distrib. Comput., 10:349-366, 1990.
[61] M.Franklin, G.S.Sohi. ARB: A Hardware Mechanism for Dynamic Reordering of Memory References. IEEE Transactions on Computers, 45(5), May 1996.
[62] J.L.Gaudiot, L.Bic, Advanced Topics in Data-Flow Computing, Prentice Hall, 1991.
[63] A.Geist, A.Beguelin, J.Dongarra, W.Jiang, B.Manchek, V.Sunderam, PVM: Parallel Virtual Machine - A User's Guide and Tutorial for Networked Parallel Computing, MIT Press, 1994.
[64] M.Girkar, M.R.Haghighat, P.Grey, H.Saito, N.Stavrakos, C.D.Polychronopoulos, Illinois-Intel Multithreading Library: Multithreading Support for Intel Architecture Based Multiprocessor Systems, Intel Technology Journal, Q1 issue, February 1998.
[65] M.Girkar, C.Polychronopoulos. Optimization of Data/Control Conditions in Task Graphs. Proc. 4th Workshop on Languages and Compilers for Parallel Computing, Aug. 1991.
[66] J.R.W.Glauert, J.R.Gurd, C.C.Kirkham, Evolution of dataflow architecture, Proc. IFIP WG 10.3 Workshop on Hardware Supported Implementation on Concurrent Languages in Distributed Systems, March 1984, pp. 1-18.
[67] J.R.W.Glauert, J.R.Gurd, C.C.Kirkham, I.Watson, The dataflow approach, (F.B.Chambers, D.A.Duce, G.P.Jones, eds.), Academic Press, 1984, pp. 1-53.
[68] S.Gopal, T.Vijaykumar, J.Smith, G.Sohi. Speculative Versioning Cache. In Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, February 1998.
[69] K.P.Gostelow, R.E.Thomas, A view of dataflow, Proc. National Comp. Conf., Jun. 1979, pp. 629-636.


[70] K.P.Gostelow, R.E.Thomas, Performance of a simulated dataflow computer, IEEE Trans. Computers C-29 (1980), pp. 905-919.
[71] V.G.Grafe, J.E.Hoch, The Epsilon-2 multiprocessor system, J. Parall. Distr. Comput. 10 (1990), pp. 309-318.
[72] V.G.Grafe, J.E.Hoch, G.S.Davidson, V.P.Holmes, D.M.Davenport, K.M.Steele, The Epsilon project, Advanced Topics in Data-Flow Computing (J.L.Gaudiot, L.Bic, eds.), Prentice Hall, 1991, pp. 175-205.
[73] M.Gupta, R.Nim. Techniques for Speculative Run-Time Parallelization of Loops. In Supercomputing '98, November 1998.
[74] M.R.Haghighat, C.D.Polychronopoulos. Symbolic Analysis for Parallelizing Compilers. Kluwer Academic Publishers, 1995.
[75] M.W.Hall, J.M.Anderson, S.P.Amarasinghe, B.R.Murphy, S.W.Liao, E.Bugnion, M.S.Lam, Maximizing Multiprocessor Performance with the SUIF Compiler, IEEE Computer, December 1996.
[76] M.W.Hall, B.R.Murphy, S.P.Amarasinghe, S.Liao, M.S.Lam. Inter-procedural Parallelization Analysis: A Case Study. Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing (LCPC95), Aug. 1995.
[77] M.H.Halstead, Elements of Software Science, Operating and Programming Systems Series, Vol. 7, New York, NY: Elsevier, 1977.
[78] L.Hammond, M.Willey, K.Olukotun. Data Speculation Support for a Chip Multiprocessor. In Proceedings of ASPLOS-VIII, October 1998.
[79] L.S.Hammond, Hydra: A Chip Multiprocessor with Support for Speculative Thread-Level Parallelization. Dissertation, Department of Electrical Engineering, Stanford University, March 2002.
[80] H.Han, G.Rivera, C.W.Tseng. Software Support for Improving Locality in Scientific Codes. 8th Workshop on Compilers for Parallel Computers (CPC'2000), Jan. 2000.
[81] H.Honda, M.Iwata, H.Kasahara. Coarse Grain Parallelism Detection Scheme of Fortran programs. Trans. IEICE, J73-D-I(12), Dec. 1990.
[82] IBM RISC System/6000 Technology, IBM International Technical Support Organization SA23-2619.
[83] IEEE Computer Society, POSIX System Application Program Interface: Threads Extension [C Language], POSIX 1003.4. Available from the IEEE Standards Department.
[84] R.A.Iannucci, Toward a dataflow/von Neumann hybrid architecture, Proc. 15th ISCA, May 1988, pp. 131-140.
[85] J.Kahle. Power4: A Dual-CPU Processor Chip. Microprocessor Forum '99, October 1999.
[86] S.Karmesin, J.Crotinger, J.Cummings, S.Haney, W.Humphrey, J.Reynders, S.Smith, T.J.Williams. Array Design and Expression Evaluation in POOMA II. In D.Caromel, R.R.Oldehoeft, M.Tholburn, editors, Computing in Object-Oriented Parallel Environments, volume 1505 of Lecture Notes in Computer Science, pages 231-238. Springer-Verlag, 1998.
[87] H.Kasahara, M.Obata, K.Ishizaka. Automatic Coarse Grain Task Parallel Processing on SMP using OpenMP. Waseda University, LCPC 2000.


[88] H.Kasahara. A Multi-grain Parallelizing Compilation Scheme on OSCAR. Proc. 4th Workshop on Languages and Compilers for Parallel Computing, Aug. 1991.
[89] H.Kasahara, M.Okamoto, A.Yoshida, W.Ogata, K.Kimura, G.Matsui, H.Matsuzaki, H.Honda. OSCAR Multi-grain Architecture and Its Evaluation. Proc. International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems, Oct. 1997.
[90] H.Kasahara, H.Honda, M.Iwata, M.Hirota. A Macro-dataflow Compilation Scheme for Hierarchical Multiprocessor Systems. Proc. Int'l. Conf. on Parallel Processing, Aug. 1990.
[91] H.Kasahara. Parallel Processing Technology. Corona Publishing, Tokyo, Jun. 1991.
[92] H.Kasahara, H.Honda, S.Narita. Parallel Processing of Near Fine Grain Tasks Using Static Scheduling on OSCAR. Proc. IEEE ACM Supercomputing'90, Nov. 1990.
[93] T.Knight. An Architecture for Mostly Functional Languages. In Proceedings of the ACM Lisp and Functional Programming Conference, pages 500-519, August 1986.
[94] Y.Kodama, Y.Koumura, M.Sato, H.Sakane, S.Sakai, Y.Yamaguchi, EMC-Y: Parallel processing element optimizing communication and computation, Proc. 1993 Int'l Conf. Supercomputing, Jul. 1993, pp. 167-174.
[95] V.Krishnan, J.Torrellas. The Need for Fast Communication in Hardware-Based Speculative Chip Multiprocessors. In International Conference on Parallel Architectures and Compilation Techniques (PACT), October 1999.
[96] Kuck & Associates, Inc., Experiences With Visual KAP and KAP/Pro Toolset Under Windows NT, Technical Report, Nov. 1997.
[97] A.Kumar, The HP PA-8000 RISC CPU, IEEE Micro, 17 (Mar./Apr. 1997), pp. 27-32.
[98] M.S.Lam. Locality Optimizations for Parallel Machines. Third Joint International Conference on Vector and Parallel Processing, Nov. 1994.
[99] J.Laudon, D.Lenoski, The SGI Origin: A ccNUMA Highly Scalable Server, Proceedings of the 24th Annual International Symposium on Computer Architecture, pp. 241-251, Denver, Colorado, June 1997.
[100] B.Lee, A.R.Hurson, Dataflow architectures and multithreading, IEEE Computer 27 (Aug. 1994), pp. 27-39.
[101] P.F.Leggett, CAPTools Communication Library (CAPLib), Technical report, CMS Press, Paper No. 98/IM/37, 1998.
[102] P.Marcuello, A.Gonzalez. Clustered Speculative Multithreaded Processors. In Proc. of the ACM Int. Conf. on Supercomputing, June 1999.
[103] B.Marsh, M.Scott, T.LeBlanc, E.Markatos, First-Class User-Level Threads, Proceedings of the 13th ACM Symposium on Operating System Principles (SOSP), October 1991.
[104] X.Martorell, E.Ayguade, N.Navarro, J.Corbalan, M.Gonzalez, J.Labarta, Thread Fork/Join Techniques for Multi-level Parallelism Exploitation in NUMA Multiprocessors, Proceedings of the 13th ACM International Conference on Supercomputing (ICS'99), Rhodes, Greece, June 1999.
[105] H.M.Mathis, J.D.McCalpin, M.C.Chiang, F.P.O'Connell, P.Buckland, IBM eServer pSeries 690: Configuring for Performance, IBM Server Group, 2002.


[106] T.J.McCabe, A Complexity Measure, IEEE Trans. Soft. Eng., Vol. 2, No. 6, pp. 308-320, 1976.
[107] C.W.McCurdy, R.Stevens, H.Simon, W.Kramer, D.Bailey, W.Johnston, C.Catlett, R.Lusk, T.Morgan, J.Meza, M.Banda, J.Leighton, J.Hules, Creating Science-Driven Computer Architecture: A New Path to Scientific Leadership, Computing Sciences Directorate, Ernest Orlando Lawrence Berkeley National Laboratory, Argonne National Laboratory, US Department of Energy (DOE), 2003.
[108] Message Passing Interface Forum, MPI: A Message Passing Interface Standard, The International Journal of Supercomputer Applications and High Performance Computing 8, 1994.
[109] Message Passing Interface Forum, MPI-2: Extensions to the Message-Passing Interface, University of Tennessee, Knoxville, July 1997.
[110] J.E.Moreira, C.D.Polychronopoulos. Autoscheduling in a Shared Memory Multiprocessor. CSRD Report No. 1337, 1994.
[111] S.Murer, J.A.Feldman, C.C.Lim, M.M.Seidel, pSather: Layered Extensions to an Object-Oriented Language for Efficient Parallel Computation, International Computer Science Institute, University of California at Berkeley, Technical Report 93-028, December 1993.
[112] NAS Parallel Benchmarks 2.3. http://www.nas.nasa.gov/NAS/NPB/
[113] R.S.Nikhil, Arvind, Can dataflow subsume von Neumann computing?, Proc. 16th ISCA, May 1989, pp. 262-272.
[114] M.Ojstersek, V.Zumer, P.Kokol, Data flow computer models, Proc. CompEuro '87, May 1987, pp. 884-885.
[115] M.Okamoto, K.Aida, M.Miyazawa, H.Honda, H.Kasahara. A Hierarchical Macro-dataflow Computation Scheme of OSCAR Multi-grain Compiler. Trans. IPSJ, 35(4):513-521, Apr. 1994.
[116] K.Olukotun, B.A.Nayfeh, L.Hammond, K.Wilson, K.Chang. The Case for a Single-Chip Multiprocessor. In Proceedings of ASPLOS-VII, October 1996.
[117] OpenMP Fortran/C Application Program Interface. Version 2.0, March 2002. http://www.openmp.org
[118] J.Oplinger, D.Heine, M.S.Lam. In Search of Speculative Thread-Level Parallelism. In Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques (PACT'99), October 1999.
[119] The POWER4 Processor: Introduction and Tuning Guide, IBM Red Books SG24-7041-00.
[120] PROMIS. http://www.csrd.uiuc.edu/promis/
[121] D.Padua, M.Wolfe. Advanced Compiler Optimizations for Supercomputers. C.ACM, 29(12):1184-1201, Dec. 1986.
[122] G.M.Papadopoulos, Implementation of a general-purpose dataflow multiprocessor, Tech. Report TR-432, MIT Laboratory for Computer Science, Cambridge, Ma., August 1988.
[123] G.M.Papadopoulos, D.E.Culler, Monsoon: An explicit token-store architecture, Proc. 17th ISCA, Jun. 1990, pp. 82-91.


[124] G.M.Papadopoulos, K.R.Traub, Multithreading: A revisionist view of dataflow architectures, Proc. 18th ISCA, May 1991, pp. 342-351.
[125] Parafrase2. http://www.csrd.uiuc.edu/parafrase2/
[126] I.Park, M.J.Voss, R.Eigenmann, Compiling for the New Generation of High Performance SMPs, Technical Report, Nov. 1996.
[127] P.Petersen, D.Padua. Static and Dynamic Evaluation of Data Dependence Analysis. Proc. Int'l Conf. on Supercomputing, Jun. 1993.
[128] O.Pochayevets, A.Bode, H.Eichele, Efficient Implementation of a Hybrid Dataflow Machine on Shared Memory Symmetric Multiprocessors, Proceedings of the 1st International Conference ACSN'2003, pp. 86-90, Lviv, 2003.
[129] Polaris. http://polaris.cs.uiuc.edu/polaris/
[130] C.D.Polychronopoulos, D.J.Kuck, Guided Self-Scheduling: A Practical Scheduling Scheme for Parallel Supercomputers, IEEE Transactions on Computers, C-36(12), December 1987.
[131] M.L.Powell, S.R.Kleiman, S.Barton, D.Shah, D.Stein, M.Weeks, SunOS Multithread Architecture, Proceedings of the USENIX Winter '91 Conference, Dallas, Texas, 1991.
[132] W.Pugh. The OMEGA Test: A Fast and Practical Integer Programming Algorithm for Dependence Analysis. Proc. Supercomputing'91, 1991.
[133] L.Rauchwerger, N.M.Amato, D.A.Padua. Run-Time Methods for Parallelizing Partially Parallel Loops. Proceedings of the 9th ACM International Conference on Supercomputing, Barcelona, Spain, pages 137-146, Jul. 1995.
[134] J.V.W.Reynders, P.J.Hinker, J.C.Cummings, S.R.Atlas, S.Banerjee, W.F.Humphrey, S.R.Karmesin, K.Keahey, M.Srikant, M.Tholburn. Pooma. In G.V.Wilson, P.Lu, editors, Parallel Programming Using C++. MIT Press, 1996.
[135] R.S.Nikhil. Id Language Reference Manual Version 90.1. Technical Report CSG Memo 284-2, Laboratory for Computer Science, MIT, 1991.
[136] G.Rivera, C.W.Tseng. Locality Optimizations for Multi-Level Caches. Super Computing '99, Nov. 1999.
[137] L.Roh, W.A.Najjar, Design of a storage hierarchy in multithreaded architectures, Proc. MICRO-28, 1995, pp. 271-278.
[138] SUN Microsystems Inc., The Ultra Enterprise 10000 Server, Technical White Paper, 1997.
[139] SUN Microsystems Inc., Pthreads and Solaris Threads: A Comparison of two user level threads APIs, SunSoft, Revision A, May 1994.
[140] SUN Microsystems Inc., The SUN Enterprise Cluster Architecture, Technical White Paper, October 1997.
[141] S.Sakai, Synchronization and pipeline design for a multithreaded computer, Advanced Topics in Dataflow Computing and Multithreading (L.Bic, J.L.Gaudiot, G.R.Gao, eds.), IEEE Computer Society Press, 1995, pp. 55-74.
[142] S.Sakai, K.Okamoto, H.Matsuoka, H.Hirono, Y.Kodama, M.Sato, Super-threading: Architectural and software mechanisms for optimizing parallel computation, Proc. 1993 Int'l Conf. Supercomputing, Jul. 1993, pp. 251-260.
[143] S.Sakai, Y.Yamaguchi, K.Hiraki, Y.Kodama, T.Yuba, An architecture of a dataflow single chip processor, Proc. 16th ISCA, May 1989, pp. 46-53.


[143] S.Sakai, Y.Yamaguchi, K.Hiraki, Y.Kodama, T.Yuba, An architecture of a dataflow single chip processor, Proc. 16th ISCA, May 1989, pp. 46-53. [144] A.V.S.Sastry, L.M.Patnaik, J.Silc, Dataflow architectures for logic programming, Electrotechnical Review 55 (1988), pp. 9-19. [145] J.A.Sharp, Data flow computing, Ellis Horwood Ltd. Publishers, 1985. [146] A.Shaw, Arvind, R.P.Johnson, Performance tuning scientific codes for dataflow execution, Proc. PACT’96, Oct. 1996, pp. 198-207. [147] S.Shende, A.D.Malony, J.Cuny, K.Lindlan, P.Beckman, S.Karmesin. Portable Profiling and Tracing for Parallel Scientific Applications using C++. In Proceedings of the 2nd SIGMETRICS Symposium on Parallel and Distributed Tools, pages 134-145. ACM, 1998. [148] S.Shende, A.D.Malony, S.Hackstadt. Dynamic Performance Callstack Sampling: Merging TAU and DAQV. In B.Kaegstroem et al., editors, Applied Parallel Computing, PARA’98, Lecture Notes in Computer Science, No. 1541, pages 515-520. Springer-Verlag, 1998. [149] J.Silc, B.Robic, The review of some data flow computer architectures, Informatica 11 (1987), pp. 61-66. [150] Silicon Graphics Computer Systems (SGI), MIPSpro C and C++ Pragmas, Doc. num. 007- 3587-001, http://techpubs.sgi.com, 1998. [151] Silicon Graphics Computer Systems (SGI), IRIX 6.4/6.5 manual pages: mp(3F) & mp(3C), IRIX online manuals, also in http://techpubs.sgi.com, 1997-1999. [152] Silicon Graphics Computer Systems (SGI), MIPSpro Auto-Parallelizing Option Programmer’s Guide, Doc. num. 007-3572-002, http://techpubs.sgi.com, 1998. [153] Silicon Graphics Computer Systems (SGI), MIPSpro Fortran 77 Programmer’s Guide, Doc. num. 007-2361-006, http://techpubs.sgi.com, 1998. [154] Silicon Graphics Computer Systems (SGI), Origin and Onyx2 Theory of Operations Manual, Doc. num. 007-3439-002, http://techpubs.sgi.com, 1997. [155] Silicon Graphics Computer Systems (SGI), Origin2000 and Onyx2 Performance Tuning and Optimization Guide, Doc. num. 007-3430-002, http://techpubs.sgi.com, 1998. [156] Silicon Graphics Computer Systems (SGI), REACT Real-Time Programmer’s Guide, Doc. num. 007-2499-006, http://techpubs.sgi.com, 1998. [157] D.F.Snelling, G.K.Egan, A comparative study of data-flow architectures, Tech. Report UMCS-94-4-3, University of Manchester, Department of Computer Science, 1994. [158] G.S.Sohi, S.Breach, T.N.Vijaykumar. Multi-scalar Processors. In Proceedings of ISCA 22, pages 414-425, June 1995. [159] V.P.Srini, An architectural comparison of dataflow systems, IEEE Computer 19 (1986), pp. 68-88. [160] J.G.Steffan, T.C.Mowry. The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallellization. In Proceedings of the Fourth International Symposium on High-Performance Computer Architecture, February 1998. [161] J.G.Steffan, C.B.Colohan, A.Zhai, T.C.Mowry, A Scalable Approach to Thread-Level Speculation, Proceedings of the 27th Annual International Symposium of Computer Architecture, pp. 1-12, Vancouver, BC, June 2000.


[162] D.Stoutamire, pSather 1.0 Manual, International Computer Science Institute, University of California at Berkeley, Technical Report 95-058, October 1995.
[163] J.Strohschneider, B.Klauer, K.Waldschmidt, An associative communication network for fine and large grain dataflow, Proc. Euromicro Workshop on Parallel and Distr. Processing, 1995, pp. 324-331.
[164] J.Strohschneider, B.Klauer, S.Zickenheimer, K.Waldschmidt, Adarc: A fine grain dataflow architecture with associative communication network, Proc. 20th Euromicro Conf., Sep. 1994, pp. 445-450.
[165] S.A.Thoreson, A.N.Long, J.R.Kerns, Performance of three dataflow computers, Proc. 14th Ann. Comp. Sc. Conf., Feb. 1986, pp. 93-99.
[166] K.R.Traub, G.M.Papadopoulos, M.J.Beckerle, J.E.Hicks, J.Young, Overview of the Monsoon project, Proc. 1991 Int'l Conf. Comput. Design, 1991, pp. 150-155.
[167] P.C.Treleaven, D.R.Brownbridge, R.P.Hopkins, Data-driven and demand-driven computer architectures, Computing Surveys 14 (1982), pp. 93-143.
[168] M.Tremblay. MAJC: Microprocessor Architecture for Java Computing. HotChips '99, August 1999.
[169] J.Y.Tsai, J.Huang, C.Amlo, D.J.Lilja, P.C.Yew. The Super-threaded Processor Architecture. IEEE Transactions on Computers, Special Issue on Multithreaded Architectures, 48(9), September 1999.
[170] P.Tu, D.Padua. Automatic Array Privatization. Proc. 6th Annual Workshop on Languages and Compilers for Parallel Computing, 1993.
[171] A.Tucker, A.Gupta, Process Control and Scheduling Issues for Multi-programmed Shared-Memory Multiprocessors, Proceedings of the 12th ACM Symposium on Operating System Principles (SOSP), December 1989.
[172] A.Tucker, Efficient Scheduling on Multi-programmed Shared-Memory Multiprocessors, Ph.D. Thesis, Stanford University, December 1993.
[173] D.M.Tullsen, S.J.Eggers, H.M.Levy. Simultaneous Multithreading: Maximizing On-Chip Parallelism. In Proceedings of ISCA 22, pages 392-403, June 1995.
[174] S.Vajracharya, S.Karmesin, P.Beckman, J.Crotinger, A.Malony, S.Shende, R.Oldehoeft, S.Smith, SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution, Los Alamos National Laboratory, Los Alamos, NM, U.S.A.
[175] I.Watson, J.R.Gurd, A prototype data flow computer with token labeling, Proc. National Comp. Conf., Jun. 1979, pp. 623-628.
[176] B.Weissman, Active Threads: an Extensible and Portable Light-Weight Thread System, International Computer Science Institute, University of California at Berkeley, Technical Report 97-036, September 1997.
[177] M.Wolfe. High Performance Compilers for Parallel Computing. Addison-Wesley, 1996.
[178] M.Wolfe. Optimizing Supercompilers for Supercomputers. MIT Press, 1989.
[179] K.C.Yeager, The MIPS R10000 super-scalar microprocessor, IEEE Micro, 16 (Apr. 1996), pp. 28-40.


[180] A.Yoshida, K.Koshizuka, M.Okamoto, H.Kasahara. A Data-Localization Scheme among Loops for each Layer in Hierarchical Coarse Grain Parallel Processing. Trans. of IPSJ, 40(5), May 1999.
[181] A.C.Yuceturk, B.Klauer, S.Zickenheimer, R.Moore, K.Waldschmidt, Mapping of neural networks onto dataflow graphs, Proc. 22nd Euromicro Conf., Sep. 1996, pp. 51-57.
[182] Y.Zhang, L.Rauchwerger, J.Torrellas. Hardware for Speculative Parallelization of Partially-Parallel Loops in DSM Multiprocessors. In Fifth International Symposium on High-Performance Computer Architecture (HPCA), pages 135-141, January 1999.

