C++ Concurrency in Action

Practical Multithreading
Anthony Williams
Manning
Shelter Island

For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 261
Shelter Island, NY 11964
Email: [email protected]

©2012 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Development editor: Cynthia Kane
Technical proofreader: Jonathan Wakely
Copyeditor: Linda Recktenwald
Proofreader: Katie Tennant
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

ISBN: 9781933988771
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 – MAL – 18 17 16 15 14 13 12

To Kim, Hugh, and Erin

brief contents

1 Hello, world of concurrency in C++!
2 Managing threads
3 Sharing data between threads
4 Synchronizing concurrent operations
5 The C++ memory model and operations on atomic types
6 Designing lock-based concurrent data structures
7 Designing lock-free concurrent data structures
8 Designing concurrent code
9 Advanced thread management
10 Testing and debugging multithreaded applications

contents

preface
acknowledgments
about this book
about the cover illustration

1 Hello, world of concurrency in C++!
  1.1 What is concurrency?
      Concurrency in computer systems
      Approaches to concurrency
  1.2 Why use concurrency?
      Using concurrency for separation of concerns
      Using concurrency for performance
      When not to use concurrency
  1.3 Concurrency and multithreading in C++
      History of multithreading in C++
      Concurrency support in the new standard
      Efficiency in the C++ Thread Library
      Platform-specific facilities
  1.4 Getting started
      Hello, Concurrent World
  1.5 Summary

2 Managing threads
  2.1 Basic thread management
      Launching a thread
      Waiting for a thread to complete
      Waiting in exceptional circumstances
      Running threads in the background
  2.2 Passing arguments to a thread function
  2.3 Transferring ownership of a thread
  2.4 Choosing the number of threads at runtime
  2.5 Identifying threads
  2.6 Summary

3 Sharing data between threads
  3.1 Problems with sharing data between threads
      Race conditions
      Avoiding problematic race conditions
  3.2 Protecting shared data with mutexes
      Using mutexes in C++
      Structuring code for protecting shared data
      Spotting race conditions inherent in interfaces
      Deadlock: the problem and a solution
      Further guidelines for avoiding deadlock
      Flexible locking with std::unique_lock
      Transferring mutex ownership between scopes
      Locking at an appropriate granularity
  3.3 Alternative facilities for protecting shared data
      Protecting shared data during initialization
      Protecting rarely updated data structures
      Recursive locking
  3.4 Summary

4 Synchronizing concurrent operations
  4.1 Waiting for an event or other condition
      Waiting for a condition with condition variables
      Building a thread-safe queue with condition variables
  4.2 Waiting for one-off events with futures
      Returning values from background tasks
      Associating a task with a future
      Making (std::)promises
      Saving an exception for the future
      Waiting from multiple threads
  4.3 Waiting with a time limit
      Clocks
      Durations
      Time points
      Functions that accept timeouts
  4.4 Using synchronization of operations to simplify code
      Functional programming with futures
      Synchronizing operations with message passing
  4.5 Summary

5 The C++ memory model and operations on atomic types
  5.1 Memory model basics
      Objects and memory locations
      Objects, memory locations, and concurrency
      Modification orders
  5.2 Atomic operations and types in C++
      The standard atomic types
      Operations on std::atomic_flag
      Operations on std::atomic<bool>
      Operations on std::atomic<T*>: pointer arithmetic
      Operations on standard atomic integral types
      The std::atomic<> primary class template
      Free functions for atomic operations
  5.3 Synchronizing operations and enforcing ordering
      The synchronizes-with relationship
      The happens-before relationship
      Memory ordering for atomic operations
      Release sequences and synchronizes-with
      Fences
      Ordering nonatomic operations with atomics
  5.4 Summary

6 Designing lock-based concurrent data structures
  6.1 What does it mean to design for concurrency?
      Guidelines for designing data structures for concurrency
  6.2 Lock-based concurrent data structures
      A thread-safe stack using locks
      A thread-safe queue using locks and condition variables
      A thread-safe queue using fine-grained locks and condition variables
  6.3 Designing more complex lock-based data structures
      Writing a thread-safe lookup table using locks
      Writing a thread-safe list using locks
  6.4 Summary

7 Designing lock-free concurrent data structures
  7.1 Definitions and consequences
      Types of nonblocking data structures
      Lock-free data structures
      Wait-free data structures
      The pros and cons of lock-free data structures
  7.2 Examples of lock-free data structures
      Writing a thread-safe stack without locks
      Stopping those pesky leaks: managing memory in lock-free data structures
      Detecting nodes that can’t be reclaimed using hazard pointers
      Detecting nodes in use with reference counting
      Applying the memory model to the lock-free stack
      Writing a thread-safe queue without locks
  7.3 Guidelines for writing lock-free data structures
      Guideline: use std::memory_order_seq_cst for prototyping
      Guideline: use a lock-free memory reclamation scheme
      Guideline: watch out for the ABA problem
      Guideline: identify busy-wait loops and help the other thread
  7.4 Summary

8 Designing concurrent code
  8.1 Techniques for dividing work between threads
      Dividing data between threads before processing begins
      Dividing data recursively
      Dividing work by task type
  8.2 Factors affecting the performance of concurrent code
      How many processors?
      Data contention and cache ping-pong
      False sharing
      How close is your data?
      Oversubscription and excessive task switching
  8.3 Designing data structures for multithreaded performance
      Dividing array elements for complex operations
      Data access patterns in other data structures
  8.4 Additional considerations when designing for concurrency
      Exception safety in parallel algorithms
      Scalability and Amdahl’s law
      Hiding latency with multiple threads
      Improving responsiveness with concurrency
  8.5 Designing concurrent code in practice
      A parallel implementation of std::for_each
      A parallel implementation of std::find
      A parallel implementation of std::partial_sum
  8.6 Summary

9 Advanced thread management
  9.1 Thread pools
      The simplest possible thread pool
      Waiting for tasks submitted to a thread pool
      Tasks that wait for other tasks
      Avoiding contention on the work queue
      Work stealing
  9.2 Interrupting threads
      Launching and interrupting another thread
      Detecting that a thread has been interrupted
      Interrupting a condition variable wait
      Interrupting a wait on std::condition_variable_any
      Interrupting other blocking calls
      Handling interruptions
      Interrupting background tasks on application exit
  9.3 Summary

10 Testing and debugging multithreaded applications
  10.1 Types of concurrency-related bugs
      Unwanted blocking
      Race conditions
  10.2 Techniques for locating concurrency-related bugs
      Reviewing code to locate potential bugs
      Locating concurrency-related bugs by testing
      Designing for testability
      Multithreaded testing techniques
      Structuring multithreaded test code
      Testing the performance of multithreaded code
  10.3 Summary

appendix A Brief reference for some C++11 language features
appendix B Brief comparison of concurrency libraries
appendix C A message-passing framework and complete ATM example
appendix D C++ Thread Library reference
resources
index

preface

I encountered the concept of multithreaded code while working at my first job after I left college. We were writing a data processing application that had to populate a database with incoming data records. There was a lot of data, but each record was independent and required a reasonable amount of processing before it could be inserted into the database. To take full advantage of the power of our 10-CPU UltraSPARC, we ran the code in multiple threads, each thread processing its own set of incoming records. We wrote the code in C++, using POSIX threads, and made a fair number of mistakes (multithreading was new to all of us), but we got there in the end.
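
The arrangement the preface describes, with independent records split so that each thread processes its own set, might look roughly like the following when written with the standard C++ thread library rather than the POSIX threads the author used. This is a minimal sketch, not the author's code: `process_record`, `process_all`, and the choice of `int` as the record type are hypothetical and exist only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-record processing step.
int process_record(int record) { return record * 2; }

// Split `records` into contiguous chunks and process each chunk on its
// own thread. Every thread writes only to its own slice of `results`,
// so no locking is needed.
std::vector<int> process_all(const std::vector<int>& records,
                             unsigned num_threads)
{
    assert(num_threads > 0);
    std::vector<int> results(records.size());
    if (records.empty()) return results;

    // Round up so that every record is covered by some chunk.
    const std::size_t chunk =
        (records.size() + num_threads - 1) / num_threads;

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(records.size(), t * chunk);
        const std::size_t end = std::min(records.size(), begin + chunk);
        if (begin == end) break;  // fewer records than threads
        workers.emplace_back([&records, &results, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                results[i] = process_record(records[i]);
        });
    }

    // Wait for every worker before returning the results.
    for (auto& w : workers) w.join();
    return results;
}
```

A caller could pass `std::thread::hardware_concurrency()` as `num_threads`, falling back to a small constant when it returns 0. The sketch works without synchronization only because the records are independent, which is the property the preface relies on.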