Is Parallel Programming Hard, And, If So, What Can You Do About It?
Edited by: Paul E. McKenney
Linux Technology Center
IBM Beaverton
[email protected]

December 16, 2011

Legal Statement

This work represents the views of the authors and does not necessarily represent the view of their employers. IBM, zSeries, and PowerPC are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. Linux is a registered trademark of Linus Torvalds. i386 is a trademark of Intel Corporation or its subsidiaries in the United States, other countries, or both. Other company, product, and service names may be trademarks or service marks of such companies.

The non-source-code text and images in this document are provided under the terms of the Creative Commons Attribution-Share Alike 3.0 United States license (http://creativecommons.org/licenses/by-sa/3.0/us/). In brief, you may use the contents of this document for any purpose, personal, commercial, or otherwise, so long as attribution to the authors is maintained. Likewise, the document may be modified, and derivative works and translations made available, so long as such modifications and derivations are offered to the public on equal terms as the non-source-code text and images in the original document.

Source code is covered by various versions of the GPL (http://www.gnu.org/licenses/gpl-2.0.html). Some of this code is GPLv2-only, as it derives from the Linux kernel, while other code is GPLv2-or-later. See the CodeSamples directory in the git archive (git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/perfbook.git) for the exact licenses, which are included in comment headers in each file. If you are unsure of the license for a given code fragment, you should assume GPLv2-only.

Combined work © 2005-2011 by Paul E. McKenney.

Contents

1 Introduction
  1.1 Historic Parallel Programming Difficulties
  1.2 Parallel Programming Goals
    1.2.1 Performance
    1.2.2 Productivity
    1.2.3 Generality
  1.3 Alternatives to Parallel Programming
    1.3.1 Multiple Instances of a Sequential Application
    1.3.2 Make Use of Existing Parallel Software
    1.3.3 Performance Optimization
  1.4 What Makes Parallel Programming Hard?
    1.4.1 Work Partitioning
    1.4.2 Parallel Access Control
    1.4.3 Resource Partitioning and Replication
    1.4.4 Interacting With Hardware
    1.4.5 Composite Capabilities
    1.4.6 How Do Languages and Environments Assist With These Tasks?
  1.5 Guide to This Book
    1.5.1 Quick Quizzes
    1.5.2 Sample Source Code

2 Hardware and its Habits
  2.1 Overview
    2.1.1 Pipelined CPUs
    2.1.2 Memory References
    2.1.3 Atomic Operations
    2.1.4 Memory Barriers
    2.1.5 Cache Misses
    2.1.6 I/O Operations
  2.2 Overheads
    2.2.1 Hardware System Architecture
    2.2.2 Costs of Operations
  2.3 Hardware Free Lunch?
    2.3.1 3D Integration
    2.3.2 Novel Materials and Processes
    2.3.3 Special-Purpose Accelerators
    2.3.4 Existing Parallel Software
  2.4 Software Design Implications

3 Tools of the Trade
  3.1 Scripting Languages
  3.2 POSIX Multiprocessing
    3.2.1 POSIX Process Creation and Destruction
    3.2.2 POSIX Thread Creation and Destruction
    3.2.3 POSIX Locking
    3.2.4 POSIX Reader-Writer Locking
  3.3 Atomic Operations
  3.4 Linux-Kernel Equivalents to POSIX Operations
  3.5 The Right Tool for the Job: How to Choose?
4 Counting
  4.1 Why Isn't Concurrent Counting Trivial?
  4.2 Statistical Counters
    4.2.1 Design
    4.2.2 Array-Based Implementation
    4.2.3 Eventually Consistent Implementation
    4.2.4 Per-Thread-Variable-Based Implementation
    4.2.5 Discussion
  4.3 Approximate Limit Counters
    4.3.1 Design
    4.3.2 Simple Limit Counter Implementation
    4.3.3 Simple Limit Counter Discussion
    4.3.4 Approximate Limit Counter Implementation
    4.3.5 Approximate Limit Counter Discussion
  4.4 Exact Limit Counters
    4.4.1 Atomic Limit Counter Implementation
    4.4.2 Atomic Limit Counter Discussion
    4.4.3 Signal-Theft Limit Counter Design
    4.4.4 Signal-Theft Limit Counter Implementation
    4.4.5 Signal-Theft Limit Counter Discussion
  4.5 Applying Specialized Parallel Counters
  4.6 Parallel Counting Discussion

5 Partitioning and Synchronization Design
  5.1 Partitioning Exercises
    5.1.1 Dining Philosophers Problem
    5.1.2 Double-Ended Queue
    5.1.3 Partitioning Example Discussion
  5.2 Design Criteria
  5.3 Synchronization Granularity
    5.3.1 Sequential Program
    5.3.2 Code Locking
    5.3.3 Data Locking
    5.3.4 Data Ownership
    5.3.5 Locking Granularity and Performance
  5.4 Parallel Fastpath
    5.4.1 Reader/Writer Locking
    5.4.2 Hierarchical Locking
    5.4.3 Resource Allocator Caches
  5.5 Performance Summary

6 Locking
  6.1 Staying Alive
    6.1.1 Deadlock ...