

Index

A
ADPLUS, 307–308
Aggregate method, 260
ASP.NET MVC
  Index View, 180
  in .NET 4.0
    APM, 183
    AsyncController, 181
    EAP, 181
    IndexAsync implementation, 182
    IndexCompleted method implementation, 182
    Multiple Async I/O Operations, 183
  in .NET 4.5, 184
  Synchronous Controller, 179
ASP.NET Web API
  in .NET 4.0, 186
  in .NET 4.5, 187
  Synchronous, 185
ASP.NET WebForms
  AuthorRepository implementation, 167
  GetAuthors method, 167
  in .NET 4.0
    Begin method implementation, 171
    Complex Async Page, 174
    End method implementation, 172
    error handling, 175
    IAsyncResult, 172
    lifecycle, 170
    marking, 170
    registering, 171
  in .NET 4.5
    async and await, 178
    RegisterAsyncTask method, 177
    Task.WhenAll, 178
    TPL-Friendly Version, 177
  page lifecycle, 166
  synchronous WebForms implementation, 167
Async and await keywords
  asynchronous web page word count, 137
  CalculateMeaningOfLifeAsync method, 135
  catch block, 136
  coercing result, 135
  keyword performance, 136
  mechanics
    awaiter implementation, 146
    GetAwaiter method, 146
    INotifyCompletion interface, 146
    TickTockAsync method, 145, 147
  reserved word language, 135
  return type method, 135
  SynchronizationContext, 137
  synchronous structure, 134
  synchronous web page word count, 136
  task delay
    synchronous back off and retry, 140
    thread efficient back off and retry, 140
  task effects, 137
  Task.WhenAll
    downloading documents one by one, 141
    download many documents asynchronous, 141
    error handling, 142
  Task.WhenAny
    task code, 143
    UI updation, 143
    winning task, 143
  UI event handler, 133
  UI responsive goal, 134
  UI thread
    API ways, 139
    asynchronously loading web page and removing adverts, 138
    ConfigureAwait method, 139
    LoadContent method, 139
    RemoveAdverts method, 139
    synchronously loading web page and removing adverts, 138
    task continuation method, 138
Asynchronous API
  accessing results, 15
  APM
    delegates, 20
    NET Framework, 18
  Begin method, 14
  Completion Notification, 18
  Errors, 15
  Housekeeping, 17
  IAsyncResult, 14
  NET 1.0
    abort method, 8
    background threads, 10
    Boolean flag, 9
    Coordinating Threads (Join), 10
    interrupt method, 9
    JIT compiler, 9
    memory model, 10
    Start Method, 8
    system.threading.thread, 7
    Thread class, 11
    thread’s interaction, COM, 11
  NET 1.1, 21
  NET 2.0
    closures, 23
    EAP pattern, 25
    logical and physical separation, 22
    ParameterizedThreadStart, 22
    SynchronizationContext, 23
  NET 3.5
    lambda expressions, 27
    thread pool heuristics, 28
  NET 4.0
    thread pool heuristics, 30
    thread pool queue, 28
    work-stealing queues, 30
  Polling for Completion, 15
  progress, 49
  system thread pool
    heuristics, 13
    use of, 12
    worker and I/O threads, 13
  thread pool
    APM, 14
    ThreadPool.QueueUserWorkItem, 13
    timers, 13
  Waiting for Completion, 16
Asynchronous Programming Model (APM), 14
Asynchronous UI
  BackgroundWorker, 118
  data binding, 120
  EAP, 117
  mechanics, 113
  Send and Post, 116
  SynchronizationContext, 115
  task continuations, 116
  threading model, 114
  timers
    Dispatcher Time, 128
    System.Windows, 128
    UI thread, 129
  Windows Forms, 121
  Windows Presentation Foundation
    data-binding layer, 123
    Dispatcher Time, 128
    Freezable Components, 129
    marshaling, 122
    observable collection, 122
    ReaderWriterLockSlim, 124
    synchronization context, 122
    user-defined context, 124
  WinRT, 125
  WinRT dispatcher
    CoreWindow.GetForCurrentThread().Dispatcher, 127
    Priorities, 127
    RunAsync method, 127
    UI Thread, 127
  WPF Dispatcher
    APM pattern (BeginInvoke), 125
    Application.Current.Dispatcher, 125
    Dispatcher Priorities, 126
    UI thread, 126
Asynchronous programming
  mechanisms
    multiple machines, 2
    multiple processes, 3
    multiple threads, 3
  multiple threads, 3
  thread scheduling, 3
  thread-specific resources
    register values, 4
    stack, 4
    thread local storage (TLS), 4
atomic state transition, 58

B
Base class library (BCL), 165
Blocking collection
  Add method, 108
  bounded collection, 109
  consuming enumerable, 111
  feature, 112
  graceful shutdown, 109
  producer/consumer, 106, 108
  Take method, 108
  TPL, 107
Building task-based combinator
  OrderByCompletion, 157
  SetException, 158
  WhenAllOrFail task, 159
  WhenAny Out of the Box task, 156
  WhenNext style method, 157

C
CalculateMeaningOfLifeAsync method, 134–135
Child task, 53
Community Technology Preview (CTP), 38
Component Object Model (COM), 11
Concurrent data structures
  API, 98
  blocking collection (see Blocking collection)
  ConcurrentBag<T>, 105
  concurrentdictionary<K,V> (see ConcurrentDictionary<K,V>)
  ConcurrentQueue<T> and ConcurrentStack<T>, 104
  CsvRepository class
    eager loading, 89
    finer-grain locking, 92
    lazy loading (see LazyLoadData method)
  Lazy<T>
    CsvRepository class, 98
    multiple creation, 96
    non-thread-safe creation, 97
    no thread safety, 96
    object creation function, 97
    use of, 95
  NET collection, 98
  nonblocking, 107
  Queue<T> class, 99
ConcurrentDictionary<K,V>
  Add method, 101
  AddOrGet method, 102
  AddOrUpdate method, 103
  GetOrAdd method with Lazy<T>, 102
  initial refactor, 101
  locking mechanics, 103
  Map method, 99
  non-thread-safe CsvRepository, 100
ConcurrentExclusiveSchedulerPair, 263
  concurrent scheduler, 264
  exclusive scheduler, 264
  nonthread-Safe, 263
  Reader/Writer Lock, 264
Custom scheduler
  unit testing
    adding members, 279
    advantages, 277
    GetScheduledTasks, 280
    issues, 276
    synchronization primitives, 277

D
DebugDiag
  adding rules, 301
    crash option, 302
    native memory and handle leak, 307
    performance, 304
  executing rules, 307
Debugging Async, 299
  memory dump
    bitness, 299
    DebugDiag (see DebugDiag)
    full dumps, 299
    mini dumps, 299
    Task Manager, 300
Debugging multithreaded applications, 283
  data corruption, 283
  deadlocks, 284
  interactive debugger, 285
  in production, 285
  race condition, 283
  runaway thread, 284
  Visual Studio, 284–285
    breakpoint and threads, 285
    Call Stack window, 286
    Concurrency Visualizer, 297
    Locals, Autos, and Watch Windows, 286
    Parallel Stacks Window, 293
    Parallel Tasks/Tasks Window, 290
    Threads window, 289
Dequeue method, 104
Directory Walker, 105
Double check locking, 94

E
Enqueue method, 105
Event-based asynchronous pattern (EAP)
  cancellation, 26
  error handling, 25
  multiple async requests, 26
  WebClient, 25

F, G
Foreground task, 150

H
HTTP pipeline, 166

I, J, K
Interlocked functionality, 61
Invoke method, 237–239
I/O Completion Ports (IOCP), 163

L
Large Object Heap (LOH), 76
LazyLoadData method
  contention possibility, 91
  double check locking, 94
  goal of, 90
  less synchronization, 93
  thread-safe lazy loading, 91
LoadContent method, 139

M
Map method, 89, 99
Memory dumps
  bitness, 299
  DebugDiag (see DebugDiag)
  full dumps, 299
  mini dumps, 299
  SOS, 310
    commands, 317
    deadlocks, 312
    examining threads, 311
    loading extensions, 310
    PSSCOR, 319
    runaway thread, 315
    SOSEX, 318
  Task Manager, 300
  ADPLUS. ADPLUS
  WinDbg, 309
Model, View, Controller (MVC), 180
Model-View-View Model (MVVM), 120
MoveNext method, 148
Multithreaded Apartment (MTA), 11, 268

N, O
Nested task, 53

P, Q
Parallel Framework Extensions (Pfx), 234
Parallel programming
  algorithms, 235–236
  CancellationToken, 237
  coarse-grained parallelism, 234
  fine-grained parallelism, 235
  ForEach Loop, 247
  For Loop
    associated tasks, 241
    cache memory, 244
    calculate pi, 243
    C# Loop, 240
    CPU utilization, 244
    implementation, 240
    Per-Task Local State, 245
    Per-Task Value, 245
    synchronous version, 247
    thread safety, 246
    Work per Iteration, 242
  goals, 234
  high-level abstraction, 237
  imagine scene, 236
  implementation, 236
  Invoke method, 237–239
  nested loops
    CalculatePi, 253
    default partitioner, 254–255
    delegate invocation, 253–254
    implementations, 255
    single and equivalent loop, 252
    single-threaded version, 252
  ParallelLoopState
    AggregateException, 251
    loop termination, 248
    ParallelLoopState.Break(), 248
    ParallelLoopState.Stop(), 249–250
    properties, 250
    stop request, 250–251
  PLINQ
    Aggregate method, 260
    AsUnordered() method, 258
    configuration, 259
    extension methods, 256
    ForAll method, 259
    IEnumerable<T>, 256
    input index, 258–259
    LINQ query, 255–256
    partitioning, 257
  Pre-TPL asynchronous, 233
  System.Threading.Tasks, 237
  task and data-based parallelism, 235
  task scheduler, 237
PSSCOR, 319

R
Random error notification mechanism, 44
Razor, 180
RemoveAdverts method, 139
REpresentational State Transfer (REST), 185

S
Server-side asynchronous
  ASP.NET MVC (see ASP.NET MVC)
  ASP.NET Web API
    in .NET 4.0, 186
    in .NET 4.5, 187
    Synchronous, 185
  ASP.NET WebForms (see ASP.NET WebForms)
  I/O
    IOCP, 164
    load test output, 164
    Overlapped, 163
    synchronous I/O, 162
  natural parallelism, 161
  WCF (see Windows Communication Foundation (WCF))
Single Threaded Apartments (STAs), 11, 268
SlowConsume method, 199
Son of Strike, 310
SOSEX, 318
SpinLock, 62

T
Task
  background threads, 150
  building task-based combinator (see Building task-based combinator)
  foreground task, 150
  TaskCompletionSource<T>, 149
  thread types, 150
  TPL, 150
  unit test method (see Unit test method)
Task delay
  synchronous back off and retry, 140
  thread efficient back off
[parent entry not captured in this excerpt]
  asynchronous Hello World, 31
    DataImporter class, 34
    factory-style approach, 32
    Import method, 33
    long-running task, 33
    task body, Parameter, 34
    thread pool thread, 32
  CTP, 38
  definition, 31
  designing task-based APIs, 45
  error handling
    AggregateException, 42
    exception handler, role of, 43
    Handle method, 43
    ignoring errors, 43
    inner exceptions, 42
    NET 1.1, 41
    parent/child relationship, 42
    try/catch, 41
    underlying exceptions, 42
    unhandled exception, 41
    XML-based errors, 43
  I/O-based tasks
    APM idiom, 40
    asynchronous operation, 40
    CPU, 38
    DownloadPageAsync method, 39
    DownloadWebPage, 39
    NET 4.5, 41
    thread pool, 40
  NET 4.0, 43
  NET 4.5, 44
  progress, 49
  relationships
    chaining tasks (continuations), 51