Design of a Parallel Multi-Threaded Programming Model for Multi-Core Processors
PhD Thesis submitted for the Degree of Doctor of Philosophy
By Muhammad Ali Ismail
Batch: 2008-2009
Project Advisor: Prof. Dr. Shahid Hafeez Mirza
Project Co-supervisor: Prof. Dr. Talat Altaf
Department of Computer and Information Systems Engineering
NED University of Engineering & Technology
University Road, Karachi - 75270, Pakistan
2011

Certificate

Certified that the thesis entitled "DEVELOPMENT OF A NEW PARALLEL MULTI-THREADED PROGRAMMING MODEL FOR MULTI-CORE PROCESSORS", which is being submitted by Mr. Muhammad Ali Ismail for the award of the degree of Doctor of Philosophy in the Computer & Information Systems Engineering Department of NED University of Engineering and Technology, is a record of the candidate's own original work carried out by him under our supervision and guidance. The work incorporated in this thesis has not been submitted elsewhere for the award of any other degree.

Prof. Dr. Talat Altaf, Dean (ECE), NEDUET (PhD Co-supervisor)
Prof. Dr. Shahid Hafeez Mirza, Professor, UIT (PhD Supervisor)

Acknowledgements

First and foremost, I would like to thank Almighty Allah for His countless blessings. Indeed, all praise and glory belong to Him, and none is worthy of worship but He. Next, I would like to acknowledge my home university, NED University of Engineering and Technology, for giving me the opportunity and the funding to conduct this PhD research. I would also like to express my gratitude to my mentor and supervisor, Prof. Dr. Shahid Hafeez Mirza, for his generous supervision.
His continuous support, encouragement, guidance, advice and comments helped me stay on the right course to complete this research. I am also very grateful to my co-supervisor, Prof. Dr. Talat Altaf, for his very kind advice, support and motivation throughout my PhD research. Many thanks to my department, Computer and Information Systems Engineering, including my colleagues and its administrative and technical staff, for providing me with such a supportive and productive work environment. Last but not least, special thanks to my family, particularly to my parents, for their endless prayers and support.

CONTENTS
Abstract
List of Publications
List of Figures
List of Tables
1. Introduction
  1.1. Contributions of the Dissertation
    1.1.1. Multi-level Cache System for Multi-core Processors ("LogN+1" and "LogN" Cache Models)
    1.1.2. Multi-level Cache Simulator for Multi-core Processors ("MCSMC")
    1.1.3. Multi-threaded Parallel Programming Model for Multi-core Processors ("SPC3 PM")
  1.2. The Thesis Organization
2. Motivation and Challenges with Multi-Core Processors
  2.1. Architectural Challenges
    2.1.1. Memory Hierarchy
      2.1.1.1. Cache Levels
      2.1.1.2. Synchronization
      2.1.1.3. False Sharing
      2.1.1.4. Spinning
      2.1.1.5. Communication Minimization
    2.1.2. Architectural Support for Compilers / Programming Models
  2.2. Software Challenges
    2.2.1. Parallel Programming Models
    2.2.2. Parallel Algorithm Models
      2.2.2.1. Data Parallel Models
      2.2.2.2. Task Graph Model
      2.2.2.3. Work Pool Model
      2.2.2.4. Master-Slave Model
      2.2.2.5. Pipeline or Producer-Consumer Model
    2.2.3. Decomposition Techniques
      2.2.3.1. Recursive Decomposition
      2.2.3.2. Data Decomposition
      2.2.3.3. Exploratory Decomposition
      2.2.3.4. Speculative Decomposition
    2.2.4. Levels of Parallelism
    2.2.5. Compiler Optimization
      2.2.5.1. Parallelism
      2.2.5.2. Removal of Data Dependencies
      2.2.5.3. Memory Space
    2.2.6. Related Tools for Performance and Parallel Debugging
    2.2.7. Regular and Irregular Problems
  2.3. Performance and Scalability Issues
  2.4. Summary
3. 'LogN+1' and 'LogN' Cache Model: A Binary Tree Based Cache System for Multi-Core Processors
  3.1. Present 3-level Cache System and Related Improvements for Multi-core Processors
  3.2. 'LogN+1' and 'LogN' Cache Model
    3.2.1. Design Concept
    3.2.2. Cache Hierarchy and Cache Size
    3.2.3. Cache Hierarchy and Cache Frequency (Cycle Time)
  3.3. Performance Evaluation
    3.3.1. Average Cache Access Time
    3.3.2. Probability of Cache Hits
    3.3.3. Result Analysis
  3.4. Summary
4. Queuing Modeling of 'LogN+1' and 'LogN' Cache Models
  4.1. Queuing Theory and Kendall's Notation
  4.2. M/D/C/K-FIFO Queuing Model for LogN+1 and LogN Cache Model
    4.2.1. Basic Model
    4.2.2. Performance Equations
      4.2.2.1. Average Data Request Rate
      4.2.2.2. Average Cache Utilization
      4.2.2.3. Average Individual Cache Access Time
      4.2.2.4. Average Request Queue Length
      4.2.2.5. Overall Average Cache System Access Time
  4.3. Queuing Model for 3-Level Cache System
  4.4. Performance Evaluation
    4.4.1. LogN+1 Model
    4.4.2. LogN Model
    4.4.3. Present 3-Level Cache System
    4.4.4. Result Analysis
  4.5. Summary
5. Simulation of 'LogN+1' and 'LogN' Cache Models Using 'MCSMC'
  5.1. Cache Simulation
  5.2. MCSMC (Multi-level Cache Simulator for Multi-Cores)
    5.2.1. Input Parameters Set
    5.2.2. Software Modules
      5.2.2.1. Cache Architecture Generator
      5.2.2.2. Program Scheduler
      5.2.2.3. Trace Generator
      5.2.2.4. Replacement Policy Module
      5.2.2.5. Results Generation
    5.2.3. Serial / Parallel Execution of MCSMC
    5.2.4. Comparison with CACTI Cache Simulator
  5.3. Performance Evaluation
    5.3.1. Simulation Environment
    5.3.2. Result Analysis
  5.4. Summary
6. SPC3 PM: A Multithreaded Parallel Software Development Environment for Multi-Core Processors
  6.1. Currently Available Parallel Programming Tools
    6.1.1. Commercially Available Multi-Core Application Development Aids
      6.1.1.1. Intel's Multi-Core Application Development Aids
      6.1.1.2. Microsoft's Multi-Core Application Development Aids
      6.1.1.3. Sun's Multi-Core Application Development Aids
      6.1.1.4. Other Commercial Multi-Core Application Development Aids
    6.1.2. Other Standard Shared Memory Programming Approaches Used for Multi-core Processors
      6.1.2.1. Erlang
      6.1.2.2. POSIX Threads (Pthreads)
      6.1.2.3. OpenMP
    6.1.3. Research Oriented Multi-Core Application Development Tools
    6.1.4. Current Multi-Core Research Groups
    6.1.5. Summary
  6.2. Key Features of SPC3 PM
  6.3. Design Concepts
    6.3.1. Design Issues with Multi-Core Programming
    6.3.2. Task Based Parallelism
    6.3.3. Thread Level Parallelism
    6.3.4. Decomposition Techniques
    6.3.5. Task Scheduling
    6.3.6. Execution Modes
    6.3.7. Types of Problem Supported
    6.3.8. Data Sharing
    6.3.9. Compilation
  6.4. Programming with SPC3 PM
    6.4.1. Rules for Task Decomposition
    6.4.2. Properties of a Task
    6.4.3. Program Structure
    6.4.4. SPC3 PM Library
      6.4.4.1. Serial Function
      6.4.4.2. Parallel Function
      6.4.4.3. Concurrent Function
  6.5. Performance Evaluation
    6.5.1. Matrix Multiplication Algorithm
    6.5.2. Serial Function
    6.5.3. Parallel Function
    6.5.4. Concurrent Function
  6.6. Summary
7. Solving Travelling Salesman Problem using SPC3 PM
  7.1. Travelling Salesman Problem (TSP)
    7.1.1. TSP Applications
    7.1.2. TSP Solutions
      7.1.2.1. Exact Algorithms
      7.1.2.2. TSP Heuristics
      7.1.2.3. Meta-Heuristics
      7.1.2.4. Hyper-Heuristics
  7.2. Lin-Kernighan Heuristic
    7.2.1. Basic Lin-Kernighan Heuristic Algorithm (LKH)
    7.2.2. Modified Lin-Kernighan Heuristic Algorithm (LKH-1)
    7.2.3. Lin-Kernighan Heuristic Algorithm with General k-opt Sub-move (LKH-2)
  7.3. LKH-2 Software
    7.3.1. Execution of LKH-2 Software
    7.3.2. Flow Chart for LKH-2 Software Processing
  7.4. Parallelization of LKH-2 Software using SPC3 PM
    7.4.1. Flow Chart for Parallel LKH-2 Software Processing Parallelized using SPC3 PM
  7.5. Performance Evaluation
    7.5.1. TSP Library (TSPLIB)
    7.5.2. Result Analysis
  7.6. Summary
8. Conclusions and Future Work
  8.1. Summary
  8.2. Future Work
Appendix A: List of TSP instances in TSPLIB
References

Abstract

With the arrival of Chip Multi-Processors (CMPs), every processor now has built-in parallel computational power, which can be fully utilized only if the program in execution is written to exploit it. Moreover, existing memory systems and parallel development tools do not provide adequate support for general-purpose multi-core programming and are unable to utilize all available cores efficiently.