Is Parallel Programming Hard, And, If So, What Can You Do About It?
Edited by:
Paul E. McKenney
Linux Technology Center
IBM Beaverton
[email protected]

January 2, 2017

Legal Statement

This work represents the views of the editor and the authors and does not necessarily represent the view of their respective employers.

Trademarks:

• IBM, zSeries, and PowerPC are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both.
• Linux is a registered trademark of Linus Torvalds.
• i386 is a trademark of Intel Corporation or its subsidiaries in the United States, other countries, or both.
• Other company, product, and service names may be trademarks or service marks of such companies.

The non-source-code text and images in this document are provided under the terms of the Creative Commons Attribution-Share Alike 3.0 United States license.[1] In brief, you may use the contents of this document for any purpose, personal, commercial, or otherwise, so long as attribution to the authors is maintained. Likewise, the document may be modified, and derivative works and translations made available, so long as such modifications and derivations are offered to the public on equal terms as the non-source-code text and images in the original document.

Source code is covered by various versions of the GPL.[2] Some of this code is GPLv2-only, as it derives from the Linux kernel, while other code is GPLv2-or-later. See the comment headers of the individual source files within the CodeSamples directory in the git archive[3] for the exact licenses. If you are unsure of the license for a given code fragment, you should assume GPLv2-only.

Combined work © 2005-2016 by Paul E. McKenney.

[1] http://creativecommons.org/licenses/by-sa/3.0/us/
[2] http://www.gnu.org/licenses/gpl-2.0.html
[3] git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git

Contents

1 How To Use This Book ... 1
  1.1 Roadmap ... 1
  1.2 Quick Quizzes ... 2
  1.3 Alternatives to This Book ... 3
  1.4 Sample Source Code ... 4
  1.5 Whose Book Is This? ... 4

2 Introduction ... 7
  2.1 Historic Parallel Programming Difficulties ... 7
  2.2 Parallel Programming Goals ... 9
    2.2.1 Performance ... 9
    2.2.2 Productivity ... 11
    2.2.3 Generality ... 12
  2.3 Alternatives to Parallel Programming ... 14
    2.3.1 Multiple Instances of a Sequential Application ... 15
    2.3.2 Use Existing Parallel Software ... 15
    2.3.3 Performance Optimization ... 15
  2.4 What Makes Parallel Programming Hard? ... 16
    2.4.1 Work Partitioning ... 17
    2.4.2 Parallel Access Control ... 18
    2.4.3 Resource Partitioning and Replication ... 18
    2.4.4 Interacting With Hardware ... 19
    2.4.5 Composite Capabilities ... 19
    2.4.6 How Do Languages and Environments Assist With These Tasks? ... 19
  2.5 Discussion ... 20

3 Hardware and its Habits ... 21
  3.1 Overview ... 21
    3.1.1 Pipelined CPUs ... 21
    3.1.2 Memory References ... 23
    3.1.3 Atomic Operations ... 24
    3.1.4 Memory Barriers ... 25
    3.1.5 Cache Misses ... 26
    3.1.6 I/O Operations ... 26
  3.2 Overheads ... 27
    3.2.1 Hardware System Architecture ... 27
    3.2.2 Costs of Operations ... 28
  3.3 Hardware Free Lunch? ... 30
    3.3.1 3D Integration ... 31
    3.3.2 Novel Materials and Processes ... 31
    3.3.3 Light, Not Electrons ... 32
    3.3.4 Special-Purpose Accelerators ... 32
    3.3.5 Existing Parallel Software ... 33
  3.4 Software Design Implications ... 33

4 Tools of the Trade ... 35
  4.1 Scripting Languages ... 35
  4.2 POSIX Multiprocessing ... 36
    4.2.1 POSIX Process Creation and Destruction ... 36
    4.2.2 POSIX Thread Creation and Destruction ... 38
    4.2.3 POSIX Locking ... 39
    4.2.4 POSIX Reader-Writer Locking ... 42
    4.2.5 Atomic Operations (gcc Classic) ... 45
    4.2.6 Atomic Operations (C11) ... 46
    4.2.7 Per-Thread Variables ... 47
  4.3 Alternatives to POSIX Operations ... 47
    4.3.1 Organization and Initialization ... 47
    4.3.2 Thread Creation, Destruction, and Control ... 47
    4.3.3 Locking ... 49
    4.3.4 Atomic Operations ... 51
    4.3.5 Per-CPU Variables ... 51
    4.3.6 Performance ... 53
  4.4 The Right Tool for the Job: How to Choose? ... 53

5 Counting ... 55
  5.1 Why Isn’t Concurrent Counting Trivial? ... 56
  5.2 Statistical Counters ... 58
    5.2.1 Design ... 58
    5.2.2 Array-Based Implementation ... 59
    5.2.3 Eventually Consistent Implementation ... 60
    5.2.4 Per-Thread-Variable-Based Implementation ... 63
    5.2.5 Discussion ... 64
  5.3 Approximate Limit Counters ... 64
    5.3.1 Design ... 64
    5.3.2 Simple Limit Counter Implementation ... 65
    5.3.3 Simple Limit Counter Discussion ... 71
    5.3.4 Approximate Limit Counter Implementation ... 72
    5.3.5 Approximate Limit Counter Discussion ... 72
  5.4 Exact Limit Counters ... 72
    5.4.1 Atomic Limit Counter Implementation ... 73
    5.4.2 Atomic Limit Counter Discussion ... 77
    5.4.3 Signal-Theft Limit Counter Design ... 77
    5.4.4 Signal-Theft Limit Counter Implementation ... 79
    5.4.5 Signal-Theft Limit Counter Discussion ... 85
  5.5 Applying Specialized Parallel Counters ... 85
  5.6 Parallel Counting Discussion ... 86
    5.6.1 Parallel Counting Performance ... 86
    5.6.2 Parallel Counting Specializations ... 87
    5.6.3 Parallel Counting Lessons ... 88

6 Partitioning and Synchronization Design ... 91
  6.1 Partitioning Exercises ... 91
    6.1.1 Dining Philosophers Problem ... 91
    6.1.2 Double-Ended Queue ... 95
    6.1.3 Partitioning Example Discussion ... 103
  6.2 Design Criteria ... 103
  6.3 Synchronization Granularity ... 106
    6.3.1 Sequential Program ... 106
    6.3.2 Code Locking ... 107
    6.3.3 Data Locking ... 109
    6.3.4 Data Ownership ... 111
    6.3.5 Locking Granularity and Performance ... 112
  6.4 Parallel Fastpath ... 115
    6.4.1 Reader/Writer Locking ... 116
    6.4.2 Hierarchical Locking ... 116
    6.4.3 Resource Allocator Caches ... 117
  6.5 Beyond Partitioning ... 123
    6.5.1 Work-Queue Parallel Maze Solver ... 123
    6.5.2 Alternative Parallel Maze Solver ... 126
    6.5.3 Performance Comparison I ... 127
    6.5.4 Alternative Sequential Maze Solver ... 130
    6.5.5 Performance Comparison II ... 131
    6.5.6 Future Directions and Conclusions ... 132
  6.6 Partitioning, Parallelism, and Optimization ... 133

7 Locking ... 135
  7.1 Staying Alive ... 136
    7.1.1 Deadlock ... 137
    7.1.2 Livelock and Starvation ... 145
    7.1.3 Unfairness ... 146
    7.1.4 Inefficiency ... 147
  7.2 Types of Locks ... 147
    7.2.1 Exclusive Locks ... 148
    7.2.2 Reader-Writer Locks ... 148
    7.2.3 Beyond Reader-Writer Locks ... 148
    7.2.4 Scoped Locking ... 150
  7.3 Locking Implementation Issues ... 152
    7.3.1 Sample Exclusive-Locking Implementation Based on Atomic Exchange ... 152
    7.3.2 Other Exclusive-Locking Implementations ... 153
  7.4 Lock-Based Existence Guarantees ... 155
  7.5 Locking: Hero or Villain? ... 157
    7.5.1 Locking For Applications: Hero! ... 157
    7.5.2 Locking For Parallel Libraries: Just Another Tool ... 157
    7.5.3 Locking For Parallelizing Sequential Libraries: Villain! ... 161
  7.6 Summary ... 163

8 Data Ownership ... 165
  8.1 Multiple Processes ... 165
  8.2 Partial Data Ownership and pthreads ... 166
  8.3 Function Shipping ... 166
  8.4 Designated Thread ... 167
  8.5 Privatization ...