MT 2015 21011433 8070.Pdf

Deanship of Graduate Studies Al-Quds University Performance Evaluation of Message Passing vs. Multithreading Parallel Programming Paradigms on Multi-core Systems Hadi Mahmoud Khalilia M.Sc. Thesis Jerusalem – Palestine 1435/ 2014 Deanship of Graduate Studies Al-Quds University Performance Evaluation of Message Passing vs. Multithreading Parallel Programming Paradigms on Multi-core Systems Prepared By Hadi Mahmoud Khalilia Supervisor: Dr.Nidal Kafri Co-Supervisor: Dr.Rezek Mohammad Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Computer Science at Al-Quds University 1435/ 2014 Al-Quds University Deanship of Graduate Studies Master in Computer Science Computer Science & Information Technology Thesis Approval Performance Evaluation of Message Passing vs. Multithreading Parallel Programming Paradigms on Multi-core Systems Prepared By: Hadi Mahmoud Khalilia Registration No: 21011433 Supervisor: Dr.Nidal Kafri Master thesis submitted and accepted, Date: / / 2015 The names and signatures of the examining committee members are as follows: 1- Head of Committee: Dr.Jehad Najjar Signature …………... 2- Internal Examiner: Signature …………... 3- External Examiner: Signature …………... 4- Committee Member: Dr.Nidal Kafri Signature …………... 5- Co-Supervisor: Dr.Rezek Mohammad Signature …………... Jerusalem – Palestine 1435 / 2014 Dedication To my beloved Country Palestine To My Parents and family To My wife and sons (Zaid and Mahmoud) Declaration I certify that this thesis submitted for the degree of Master, is the result of my own research, except where otherwise acknowledged, and that this study (or any part of the same) has not been submitted for a higher degree to any other university or institution. Signed…………………………… Hadi Mahmoud Yousef Khalilia Date:…………………………….. i Acknowledgments For his support and guidance, I would like to thank my supervisors, Dr. Nidal Kafri and Dr.Rezek Mohammad. Without their meticulous effort and support, this thesis would not have been the same. I would also very much like to express my gratitude to all people in the Computer Science department for their endless help during study. In addition, I would like to dedicate special and great thanks to my parents, brothers, sisters, wife, lovely son Zaid and friends for their inspiration and support throughout my academic life. ii Abstract Present and future multi-core computational system architecture attracts researchers as an adequate and inexpensive solution to achieve high performance computation for many problems. The multi-core architecture enables implementation of shared memory and/or message passing parallel processing paradigms. Therefore, there is a great need for standard libraries in order to utilize the resources efficiently and effectively. In this work, we evaluate the performance of message passing using two versions of the well-known message-passing interface (MPI) library: MPICH1 vs. MPICH2. Furthermore, we compared the performance of shared memory using OpenMP that supports multithreading with MPI. The added features (total 9) impacted the MPICH2 results over MPICH1. On the other hand, the overheads of message passing and large data communication impact negatively on the performance of this paradigm against multithreading paradigm. iii دراﺳﺔ ﻣﻘﺎرﻧﺔ اﻷداء واﻟﻛﻔﺎءة ﺑﯾن ﺗﻘﻧﯾﺎت اﻟﺧوارزﻣﯾﺎت اﻟﻣﺗوازﯾﺔ: ﺗﻣرﯾر اﻟرﺳﺎﺋل وﻣﺗﻌدد اﻟﺧﯾوط ﻋﻠﻰ ﺟﻬﺎز ﺣﺎﺳوب ﻣﺗﻌدد اﻟﻣﻌﺎﻟﺟﺎت إﻋﺪاد : ھﺎدي ﻣﺤﻤﻮد ﯾﻮﺳﻒ ﺧﻠﯿﻠﯿﺔ إﺷﺮاف : د.ﻧﻀﺎل اﻟﻜﻔﺮي ﻣﻠﺧص: ان اﻟﺗطور ﻓﻲ ﻫﯾﻛﻠﯾﺔ اﻧظﻣﺔ اﻟﺣﺎﺳوب ﻣﺗﻌددة اﻟﻣﻌﺎﻟﺟﺎت اﻟﺣﺎﻟﯾﺔ واﻟﻣﺳﺗﻘﺑﻠﯾﺔ ﺗﺷﻛل ﻋﺎﻣل ﺟذب وﺗﺣول ﻧﺣو ﺗﻛﺛﯾف اﺳﺗﺧدام اﻟﻣﻌﺎﻟﺟﺔ اﻟﻣﺗوازﯾﺔ ﻓﻲ اﻟﺗطﺑﯾﻘﺎت ﻋﻠﻰ اﺧﺗﻼف اﻧواﻋﻬﺎ ﻟﻠووﺻول اﻟﻰ اداء اﻓﺿل ﺣﯾث ﯾﻣﻛن ﺗﻘﺳﯾم اﻟﻌﻣﻠﯾﺎت واﻟﺑﯾﺎﻧﺎت اﻟﻛﺑﯾرة اﻟﻰ ﻋﻣﻠﯾﺎت/وﻣﻌﺎﻟﺟﺎت اﺻﻐر وﺗﻘﺳم اﻟﺑﯾﺎﻧﺎت ﻋﻠﻰ ﻫذﻩ اﻟﻣﻌﺎﻟﺟﺎت/اﻟﻌﻣﻠﯾﺎت. ﻟذا ﯾﺷﻬد اﻟﺑﺣث اﻟﻌﻠﻣﻲ ﻧﺷﺎطﺎ ﻣﻛﺛﻔﺎ ﻓﻲ اﻟﺑﺣث ﻋن اﻓﺿل اﻟطرق ﻟزﯾﺎدة ﻓﻌﺎﻟﯾﺔ اﺳﺗﺧدام ﻫذﻩ اﻟﺑﻧﯾﺔ اﻟﺣﺎﺳوﺑﯾﺔ ﺑﺎﻓﺿل ﻣﺎ ﯾﻣﻛن. ﺣﯾث ﯾﻣﻛن اﺳﺗﺧدام اﻟﻧﻣﺎذج واﻟطرق اﻻﺳﺎﺳﯾﺔ اﻟﻣﺗﺑﻌﺔ ﻓﻲ اﻟﻣﻌﺎﻟﺟﺔ اﻟﻣﺗوازﯾﺔ واﻟﺗﻲ ﺗﺳﺎﻋد ﻓﻲ ﺗوﻓﯾر اﻟﺗواﺻل واﻟﺗﻔﺎﻋل ﺑﯾن اﻟﻌﻣﻠﯾﺎت اﻟﻣﺧﺗﻠﻔﺔ: اﻟﻣﻌﺎﻟﺟﺔ اﻟﻣﺗوازﯾﺔ ﺑوﺟود اﻟذاﻛرة اﻟﻣﺷﺗرﻛﺔ (Shared memory parallel processing paradigm) وﻛذﻟك ﺑﺎﺳﺗﺧدام ﺗﺑﺎدل اﻟرﺳﺎﺋل واﻟﺑﯾﺎﻧﺎت ﺑﯾن ﻫذﻩ اﻟﻌﻣﻠﯾﺎت (Message passing interface MPI) أو ﻛﻼﻫﻣﺎ ﻣﻌﺎ (Hybrid approach). ﻟذا ﺗوﻓر ﻣراﻛز اﻻﺑﺣﺎث واﻟﻣؤﺳﺳﺎت ﻣﻛﺗﺑﺎت ﺑرﻣﺟﯾﺔ ﻟﺗﺳﻬﯾل ﺗطوﯾر ﺑرﻣﺟﯾﺎت اﻟﻣﻌﺎﻟﺟﺔ اﻟﻣﺗوازﯾﺔ. وﻟﺗﻘﯾﯾم اﻟطرق اﻟﻣﻘﺗرﺣﺔ ﻓﻲ ﺗﺟﺎرب اﻻﺑﺣﺎث ﻻﺳﺗﺧدام ﻫذﻩ اﻟﺑﻧﯾﺔ وطﺑﯾﻌﺔ اﻟﺗطﺑﯾﻘﺎت اﻟﺑرﻣﺟﯾﺔ ﻻﺑد ﻣن اﺳﺗﺧدام ﺑرﻣﺟﯾﺎت وﺑﯾﺎﻧﺎت ﻗﯾﺎﺳﯾﺔ ﻣﻌرﻓﺔ ﻓﻲ ﻣﺟﺎل اﻟﺑﺣث اﻟﻌﻠﻣﻲ. ﻫﻧﺎك اﻟﻌدﯾد ﻣن اﻟﻣﺷﺎﻛل اﻟﯾوﻣﯾﺔ اﻟﺗﻲ ﺗم ﺣﻠﻬﺎ ﺑﺎﺳﺗﺧدام اﻟﻣﻌﺎﻟﺟﺔ اﻟﻣﺗوازﯾﺔ ﻣﻧﻬﺎ اﻟﺣزﻣﺔ اﻟﻔﯾزﯾﺎﺋﯾﺔ (WEIN2K)، ﻫذﻩ اﻟﺣزﻣﺔ ﺗﺣﺗوي ﻋﻠﻰ ﺑﻌض اﻟﻣﺷﺎﻛل ﻣﻧﻬﺎ اﺳﺗﻐراق اﻟوﻗت اﻟﻛﺑﯾر ﻓﻲ اﻟﺗﺷﻐﯾل، وﻫذا ﻧﺎﺗﺞ ﻣن أن اﻟﺣزﻣﺔ ﻻزاﻟت ﺗﺳﺗﺧدم اﻟﻘﻧﺎة اﻟﻘدﯾﻣﺔ (MPICH1) ﻓﻲ ﺗﺑﺎدل اﻟرﺳﺎﺋل ﻟﻠﻣﻌﺎﻟﺟﺔ اﻟﻣﺗوازﯾﺔ. ﻓﻲ ﻫذا اﻟﻌﻣل اﻟﺑﺣﺛﻲ ﻗﻣﻧﺎ ﺑﺗﻣﺛﯾل اﻟﺣزﻣﺔ اﻟﻔﯾزﯾﺎﺋﯾﺔ ﻋﻠﻰ اﻟﻘﻧﺎة اﻟﺟدﯾدة (MPICH2) ﻟﺗﺑﺎدل اﻟرﺳﺎﺋل، وﻣن ﺛم ﻗﻣﻧﺎ ﺑﺗﻘﯾﯾم أداء اﺻدارﯾن ﻣﺗﺗﺎﻟﯾﺎن ﻟﻠﺣزﻣﺔ اﻟﺑرﻣﺟﯾﺔ اﻟﺗﻲ ﺗدﻋم اﻟﻣﻌﺎﻟﺟﺔ اﻟﻣﺗوازﯾﺔ ﺑطرﯾﻘﺔ ﺗﺑﺎدل اﻟرﺳﺎﺋل (MPI) ﺑﯾن اﻟﻌﻣﻠﯾﺎت وﻫﻲ iv ﺑﺎﻟﺗﺣدﯾد (MPICH1) و (MPICH2). ﺣﯾث ﻗﻣﻧﺎ ﺑﺗﻧﻔﯾذ اﻟﺣزﻣﺔ (WIEN2K) اﻟﻣﻌروﻓﺔ ﻓﻲ أﺑﺣﺎث ﻋﻠم اﻟﻔﯾزﯾﺎء واﻟﻛﯾﻣﯾﺎء ﻛﺣﺎﻟﺔ دراﺳﯾﺔ. وﻫﻲ ﺣزﻣﺔ ﻣﺧﺻﺻﺔ ﻟدراﺳﺔ اﻟﺧﺻﺎﺋص اﻟﻔﯾزﯾﺎﺋﯾﺔ واﻟﻛﯾﻣﯾﺎﺋﯾﺔ ﻟﻠﻣواد ﺑﺎﻹﻋﺗﻣﺎد ﻋﻠﻰ ﻧظرﯾﺔ اﻟﻛﺛﺎﻓﺔ اﻟوظﯾﻔﯾﺔ اﻟﻔﯾزﯾﺎﺋﯾﺔ ﺑواﺳطﺔ اﻟﻣﺣﺎﻛﺎة ﻛﺗطﺑﯾق ﻗﯾﺎﺳﻲ ﻟﺗﺟﺎرﺑﻧﺎ ﺑﺎﺳﺗﺧدام (MPICH1) و (MPICH2). وﻓﻲ ﻫذا اﻟﺑﺣث ﺗم ﻛذﻟك ﺗﻘﯾﯾم أداء (Multithreading) اﻟﻣﺗوﻓرة ﻓﻲ ﺣزﻣﺔ (OpenMP) إﻋﺗﻣﺎدا ً ﻋﻠﻰ ﺧﺎﺻﯾﺔ اﻟذاﻛرة اﻟﻣﺷﺗرﻛﺔ ﻓﻲ ﻫذﻩ اﻟﻬﯾﻛﻠﯾﺔ و (Multiprocesses) ﺑﺎﺳﺗﺧدام ﺗﺑﺎدل اﻟرﺳﺎﺋل ﺑﯾن اﻟﻌﻣﻠﯾﺎت ﻓﻲ ﺗطﺑﯾﻘﯾن ﻣﺧﺗﻠﻔﯾن ﻣن ﺣﯾث ﺗﺑﺎدل اﻟﺑﯾﺎﻧﺎت وﺣﺟﻣﻬﺎ. ﻟذا ﻗﻣﻧﺎ ﺑﺗﻧﻔﯾذ ﺧوارزﻣﯾﺔ ﺿرب ﻣﺻﻔوﻓﺗﯾن ﻛﺗطﺑﯾق ﻓﯾﻪ ﺣﺟم ﺗﺑﺎدل اﻟﺑﯾﺎﻧﺎت ﻛﺑﯾر واﺧر وﻫو ﺣﺳﺎب اﻟﺛﺎﺑت اﻟﺣﺳﺎﺑﻲ (π) وﻫﻲ اﻟﻧﺳﺑﺔ اﻟﺗﻘرﯾﺑﯾﺔ ﻟﻣﺣﯾط اﻟداﺋرة إﻟﻰ ﻧﺻف ﻗطرﻫﺎ. ﺣﯾث إن ﺗﺑﺎدل اﻟﺑﯾﺎﻧﺎت ﯾﻛﺎد ﯾﻛون ﺻﻐﯾرا ً ﺟدا ً. اﻟﻧﺗﺎﺋﺞ أظﻬرت أن آداء اﻹﺻدار اﻟﺛﺎﻧﻲ ﻣن ﻣﻛﺗﺑﺔ واﺟﻬﺔ ﺗﻣرﯾر اﻟرﺳﺎﺋل (MPICH2) أﻓﺿل ﻣن آداء اﻹﺻدار اﻷول (MPICH1)، وﺗﻌﻠﯾل ذﻟك ﯾرﺟﻊ اﻟﻰ اﻟﺗﺣﺳﯾﻧﺎت/اﻟﻣﻣﯾزات اﻹﺿﺎﻓﯾﺔ اﻟﺗﺳﻌﺔ اﻟﺗﻲ ﺗم إﺿﺎﻓﺗﻬﺎ ﻋﻠﻰ (MPICH2). ﻛﻣﺎ أظﻬرت اﻟﻧﺗﺎﺋﺞ أن آداء (Multithreading) ﻓﻲ اﻟﺗطﺑﯾﻘﺎت اﻟﺗﻲ ﺗﺗﺑﺎدل ﻓﯾﻬﺎ اﻟﻌﻣﻠﯾﺎت ﺑﯾﺎﻧﺎت ﻛﺑﯾرة ﯾﻛون أﻓﺿل ﻣن (Message Passing) واﻟﻌﻛس ﺻﺣﯾﺢ، وﻫذا ﯾرﺟﻊ إﻟﻰ ﻋﻣﻠﯾﺎت ذات ﺣﻣوﻟﺔ ﻛﺑﯾرة وﺣﺟم اﻟرﺳﺎﺋل اﻟﻛﺑﯾر. v Table of Contents DECLARATION ............................................................................................. i ABSTRACT .................................................................................................... iii LIST OF TABLES ........................................................................................... viii LIST OF FIGURES ......................................................................................... ix LIST OF APPENDICES ……........................................................................... x LIST OF ABBREVIATIONS............................................................................ xi CHAPTER ONE .............................................................................................. 1 INTRODUCTION ............................................................................................................. 1 CHAPTER TWO ............................................................................................. 5 BACKGROUND .............................................................................................................. 5 A. PARALLEL APPROACHES......................................................................................... 5 A.1 PARALLEL HARDWARE AND SOFTWARE........................................................ 7 A.2 SHARED MEMORY AND DISTRIBUTED MEMORY PARADIGMS........................ 9 B. MESSAGE PASSING CHANNEL (MPICH) ............................................................... 12 B.1 DIFFERENCES BETWEEN TWO CHANNELS: MPICH1 Vs. MPICH2 ………….. 15 C. BENCHMARKS......................................................................................................... 20 C.1 DENSITY FUNCTIONAL THEORY (DFT) ........................................................... 20 C.2 MATRIX MULTIPLICATION.............................................................................. 24 C.3 APPROXIMATE VALUE -- PI (π)....................................................................... 26 CHAPTER THREE .......................................................................................... 29 LITERATURE REVIEW ................................................................................................... 29 CHAPTER FOUR ........................................................................................... 38 vi METHODOLOGY …........................................................................................................ 38 CHAPTER FIVE ............................................................................................. 45 EXPERIMENTS AND RESULTS ANALYSIS.................................................................... 45 1. PART 1.............................................................................................................. 48 2. PART 2.............................................................................................................. 53 A. PART 2: CASE 1........................................................................................ 53 B. PART 2: CASE 2........................................................................................ 59 CHAPTER SIX.............................................................................................. 62 CONCLUSION................................................................................................................ 62 REFERENCES................................................................................................. 64 APPENDICES.................................................................................................

MT 2015 21011433 8070.Pdf

Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++

Parallel Computing a Key to Performance

Control Replication: Compiling Implicit Parallelism to Efficient SPMD with Logical Regions

Regent: a High-Productivity Programming Language for Implicit Parallelism with Logical Regions

Implicit Parallelism, Mixed Approaches

Lawrence Berkeley National Laboratory Recent Work

14. Parallel Computing 14.1 Introduction 14.2 Independent

Implicitly-Multithreaded Processors

Parallel Programming – Multicore Systems

Global HPCC Benchmarks in Chapel: STREAM Triad, Random Access, and FFT (Revision 1.4 — June 2007 Release)

SCS5108 Parallel Systems Unit II Parallel Programming

ZPL - Parallel Programming Language