Stclang: State Thread Composition As a Foundation for Monadic Dataflow Parallelism Sebastian Ertel∗ Justus Adam Norman A

Total Page:16

File Type:pdf, Size:1020Kb

Stclang: State Thread Composition As a Foundation for Monadic Dataflow Parallelism Sebastian Ertel∗ Justus Adam Norman A STCLang: State Thread Composition as a Foundation for Monadic Dataflow Parallelism Sebastian Ertel∗ Justus Adam Norman A. Rink Dresden Research Lab Chair for Compiler Construction Chair for Compiler Construction Huawei Technologies Technische Universität Dresden Technische Universität Dresden Dresden, Germany Dresden, Germany Dresden, Germany [email protected] [email protected] [email protected] Andrés Goens Jeronimo Castrillon Chair for Compiler Construction Chair for Compiler Construction Technische Universität Dresden Technische Universität Dresden Dresden, Germany Dresden, Germany [email protected] [email protected] Abstract using monad-par and LVars to expose parallelism explicitly Dataflow execution models are used to build highly scalable and reach the same level of performance, showing that our parallel systems. A programming model that targets parallel programming model successfully extracts parallelism that dataflow execution must answer the following question: How is present in an algorithm. Further evaluation shows that can parallelism between two dependent nodes in a dataflow smap is expressive enough to implement parallel reductions graph be exploited? This is difficult when the dataflow lan- and our programming model resolves short-comings of the guage or programming model is implemented by a monad, stream-based programming model for current state-of-the- as is common in the functional community, since express- art big data processing systems. ing dependence between nodes by a monadic bind suggests CCS Concepts • Software and its engineering → Func- sequential execution. Even in monadic constructs that explic- tional languages. itly separate state from computation, problems arise due to the need to reason about opaquely defined state. Specifically, Keywords parallel programming, functional languages, par- when abstractions of the chosen programming model do not titioned state enable adequate reasoning about state, it is difficult to detect ACM Reference Format: parallelism between composed stateful computations. Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, In this paper, we propose a programming model that en- and Jeronimo Castrillon. 2019. STCLang: State Thread Composition ables the composition of stateful computations and still ex- as a Foundation for Monadic Dataflow Parallelism. In Proceedings poses opportunities for parallelization. We also introduce of the 12th ACM SIGPLAN International Haskell Symposium (Haskell smap, a higher-order function that can exploit parallelism in ’19), August 22–23, 2019, Berlin, Germany. ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/3331545.3342600 stateful computations. We present an implementation of our programming model and smap in Haskell and show that basic 1 Introduction concepts from functional reactive programming can be built on top of our programming model with little effort. We com- Dataflow graphs are an abstraction for scalable concurrent pare these implementations to a state-of-the-art approach computations [12]. Although the concept has a strong func- tional foundation, it has received little attention in the func- ∗Work done while at TU Dresden. tional programming commmunity. Specifically, to the best of our knowledge, no proposal exists that combines the explicit Permission to make digital or hard copies of all or part of this work for concurrency of a dataflow graph with the simple and elegant personal or classroom use is granted without fee provided that copies monad type class. In fact, the sequential evaluation semantics are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights of monadic actions seem to prevent this combination. for components of this work owned by others than the author(s) must Dataflow is the de-facto standard for systems that achieve be honored. Abstracting with credit is permitted. To copy otherwise, or scalability by exploiting parallelism. It is the foundation on republish, to post on servers or to redistribute to lists, requires prior specific which data management systems implement highly parallel permission and/or a fee. Request permissions from [email protected]. execution engines. So-called Extract-Transform-Load (ETL) Haskell ’19, August 22–23, 2019, Berlin, Germany tools bring data into a data warehouse [10]. Query execution © 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. over that data is dataflow-based18 [ ], and the engines that in- ACM ISBN 978-1-4503-6813-1/19/08...$15.00 tegrate data that continuously arrives in the form of streams https://doi.org/10.1145/3331545.3342600 build a dataflow graph pipeline [6, 44]. Similar streaming Haskell ’19, August 22–23, 2019, Berlin, Germany Sebastian Ertel, Justus Adam, Norman A. Rink, Andrés Goens, and Jeronimo Castrillon signals (>>=) :: Monad m => m a -> (a -> m b) -> m b ewma αhigh alarm (a) Monadic Composition smap_ / > threshold if’ dn rn−1 f g ewma αlow () (b) Dataflow Composition Figure 2. Dataflow Graph of the earthquake detection pro- Figure 1. The sequential nature of the monadic composition gram. and the concurrency in a pipeline of a dataflow graph. only on data retrieved via I/O operations but also for data pipelines and dataflow graphs underpin signal processing that resides in memory, e.g., a normal list. The main contri- STCLang and programs for embedded systems [27, 36]. They are also bution of in Section3 is a higher-order function used to build scalable server architectures [15, 42], for fine- called smap that is similar to the well-known map function. It grained parallelism in big data systems [13], for optimizing supports stateful computations, but unlike the state monad’s smap I/O in micro services [16], and concurrent UIs [3,9]. mapM, introduces concurrency into the computation of smap Studying dataflow in the monadic context is interesting the result. Our construction introduces pipeline paral- both from a theoretical and a practical perspective. On the lelism for stateful computations and exploits data parallelism one hand, the dataflow abstraction is so close to functional for stateless nodes in the graph. Afterwards, we show that STCLang programming that many optimizations for functional lan- can support the FRP abstractions (Section4). STCLang guages may directly apply to dataflow [1]. For example, fu- In Section5, we evaluate our implementation sion is studied in the context of both functional languages against other Haskell libraries for parallelism, in particu- monad-par LVars and dataflow systems [17, 20]. On the other hand, a monadic lar and its extension to . We show that STCLang composition of the dataflow graph offers improvement over does not incur additional overheads and that its per- monad-par its manual construction. Manually constructing a dataflow formance on standard benchmarks is on par with LVars STCLang graph does not scale well to larger applications, which is why and . We show that programs are just as easy monad-par many established dataflow approaches provide a GUI rather to write as programs but enhance the program- than a language [22]. Implementing dataflow as a monad has ming model with implicit parallelism, pipeline parallelism smap the benefit of integrating nicely with the rest of the program, and state. In combination with state threads, our func- for example with the monad transformers framework. tion is even expressive enough to implement parallel reduc- Monadic composition and concurrency seem to be at odds tions. In our evaluation, we also show that the transforma- with each other. Consider the monadic composition, i.e., tions in the GHC harmonize with our programming model bind, as defined in Figure 1a. An implementation has to apply and even find data parallelism inside state threads. Asafi- the function passed in as a second parameter to the result of nal contribution, we ported a state-of-the-art benchmark STCLang the first parameter. In Figure 1b, we show the corresponding for big data processing engines to Haskell/ . The re- STCLang composition of the two nodes in a dataflow graph. An arc sulting code removes short-comings of the current facilitates the data dependency and denotes data flowing stream-based programming model used in these systems. We also report preliminary results that identify weaknesses of from node f to node g. In this pipeline, applications f dn monad-par’s scheduler and open up new research opportu- and g rn−1 are independent and, as such, concurrent. Hughes noticed this problem as well and suggested that nities for the functional community. In Section6 we review stream processors be implemented using arrows, a gener- related work, and we conclude in Section7. alization of monads [21]. Arrows became the dominating abstraction for composing dataflow computations in the con- 2 Intuition and Motivation text of functional reactive programming (FRP) [8]. However, We now motivate and introduce our programming model. we believe monads to be simpler and better suited to com- posing functionality than arrows in most cases. 2.1 Use Case: Earthquake Detection In this paper, we propose STCLang (State Thread Composi- To motivate STCLang, we study an application for earthquake tion Language), a monad for dataflow computations that en- detection that served as a motivation for Flask, a language ables a parallel execution at runtime, without targeting FRP for programming sensor networks [29]. Figure2
Recommended publications
  • Asp Net Core Request Pipeline
    Asp Net Core Request Pipeline Is Royce always cowled and multilobate when achromatize some wall very instanter and lawlessly? Hobbyless and flustered brazensVladamir regressively? cocainizing her intangibles tiptoed or cluster advertently. Is Patric fuzzed or paintable when cogs some theocracy Or gray may choose to berry the request. Json files etc and community of response body is not inject all low maintenance, we will be same coin. Another through components that asp and cto of. How extensible it crashes, just by one of startup class controller. Do the extended with really very complex scenarios when he enjoys sharing it ever was to delete models first user makes for node. Even add to implement captcha in startup class to the same concept is the reason you want to. Let us a pipeline to any incoming request processing, firefox still create an output? For an app to build a cup of. Razor pages uses handler methods to deal of incoming HTTP request. Ask how the above mentioned in last middleware, the come to tell who are ready simply obsolete at asp and options. Have asp and asp net core request pipeline but will not be mapped to pipeline io threads to work both. The internet and when creating sawdust in snippets or improvements that by one description be a request pipeline branching. Help editing this article, ordinary code inside of hosting infrastructure asp and actions before, we issue was not. The body to deal with minimal footprint to entity framework of needed loans for each middleware will take a docker, that receive criticism.
    [Show full text]
  • APPLYING MODEL-VIEW-CONTROLLER (MVC) in DESIGN and DEVELOPMENT of INFORMATION SYSTEMS an Example of Smart Assistive Script Breakdown in an E-Business Application
    APPLYING MODEL-VIEW-CONTROLLER (MVC) IN DESIGN AND DEVELOPMENT OF INFORMATION SYSTEMS An Example of Smart Assistive Script Breakdown in an e-Business Application Andreas Holzinger, Karl Heinz Struggl Institute of Information Systems and Computer Media (IICM), TU Graz, Graz, Austria Matjaž Debevc Faculty of Electrical Engineering and Computer Science, University of Maribor, Maribor, Slovenia Keywords: Information Systems, Software Design Patterns, Model-view-controller (MVC), Script Breakdown, Film Production. Abstract: Information systems are supporting professionals in all areas of e-Business. In this paper we concentrate on our experiences in the design and development of information systems for the use in film production processes. Professionals working in this area are neither computer experts, nor interested in spending much time for information systems. Consequently, to provide a useful, useable and enjoyable application the system must be extremely suited to the requirements and demands of those professionals. One of the most important tasks at the beginning of a film production is to break down the movie script into its elements and aspects, and create a solid estimate of production costs based on the resulting breakdown data. Several film production software applications provide interfaces to support this task. However, most attempts suffer from numerous usability deficiencies. As a result, many film producers still use script printouts and textmarkers to highlight script elements, and transfer the data manually into their film management software. This paper presents a novel approach for unobtrusive and efficient script breakdown using a new way of breaking down text into its relevant elements. We demonstrate how the implementation of this interface benefits from employing the Model-View-Controller (MVC) as underlying software design paradigm in terms of both software development confidence and user satisfaction.
    [Show full text]
  • Let's Get Functional
    5 LET’S GET FUNCTIONAL I’ve mentioned several times that F# is a functional language, but as you’ve learned from previous chapters you can build rich applications in F# without using any functional techniques. Does that mean that F# isn’t really a functional language? No. F# is a general-purpose, multi paradigm language that allows you to program in the style most suited to your task. It is considered a functional-first lan- guage, meaning that its constructs encourage a functional style. In other words, when developing in F# you should favor functional approaches whenever possible and switch to other styles as appropriate. In this chapter, we’ll see what functional programming really is and how functions in F# differ from those in other languages. Once we’ve estab- lished that foundation, we’ll explore several data types commonly used with functional programming and take a brief side trip into lazy evaluation. The Book of F# © 2014 by Dave Fancher What Is Functional Programming? Functional programming takes a fundamentally different approach toward developing software than object-oriented programming. While object-oriented programming is primarily concerned with managing an ever-changing system state, functional programming emphasizes immutability and the application of deterministic functions. This difference drastically changes the way you build software, because in object-oriented programming you’re mostly concerned with defining classes (or structs), whereas in functional programming your focus is on defining functions with particular emphasis on their input and output. F# is an impure functional language where data is immutable by default, though you can still define mutable data or cause other side effects in your functions.
    [Show full text]
  • BIOVIA Pipeline Pilot System Requirements
    SYSTEM REQUIREMENTS PIPELINE PILOT 2020 Copyright Notice ©2019 Dassault Systèmes. All rights reserved. 3DEXPERIENCE, the Compass icon and the 3DS logo, CATIA, SOLIDWORKS, ENOVIA, DELMIA, SIMULIA, GEOVIA, EXALEAD, 3DVIA, 3DSWYM, BIOVIA, NETVIBES, IFWE and 3DEXCITE, are commercial trademarks or registered trademarks of Dassault Systèmes, a French "société européenne" (Versailles Commercial Register # B 322 306 440), or its subsidiaries in the U.S. and/or other countries. All other trademarks are owned by their respective owners. Use of any Dassault Systèmes or its subsidiaries trademarks is subject to their express written approval. Acknowledgments and References To print photographs or files of computational results (figures and/or data) obtained by using Dassault Systèmes software, acknowledge the source in an appropriate format. For example: "Computational results were obtained by using Dassault Systèmes BIOVIA software programs. Pipeline Pilot Server was used to perform the calculations and to generate the graphical results." Dassault Systèmes may grant permission to republish or reprint its copyrighted materials. Requests should be submitted to Dassault Systèmes Customer Support, either by visiting https://www.3ds.com/support/ and clicking Call us or Submit a request, or by writing to: Dassault Systèmes Customer Support 10, Rue Marcel Dassault 78140 Vélizy-Villacoublay FRANCE Contents About This Document 1 Definitions 1 Additional Information 1 Dassault Systèmes Support Resources 1 Pipeline Pilot Server Requirements 2 Minimum Hardware
    [Show full text]
  • Process Scheduling
    PROCESS SCHEDULING ANIRUDH JAYAKUMAR LAST TIME • Build a customized Linux Kernel from source • System call implementation • Interrupts and Interrupt Handlers TODAY’S SESSION • Process Management • Process Scheduling PROCESSES • “ a program in execution” • An active program with related resources (instructions and data) • Short lived ( “pwd” executed from terminal) or long-lived (SSH service running as a background process) • A.K.A tasks – the kernel’s point of view • Fundamental abstraction in Unix THREADS • Objects of activity within the process • One or more threads within a process • Asynchronous execution • Each thread includes a unique PC, process stack, and set of processor registers • Kernel schedules individual threads, not processes • tasks are Linux threads (a.k.a kernel threads) TASK REPRESENTATION • The kernel maintains info about each process in a process descriptor, of type task_struct • See include/linux/sched.h • Each task descriptor contains info such as run-state of process, address space, list of open files, process priority etc • The kernel stores the list of processes in a circular doubly linked list called the task list. TASK LIST • struct list_head tasks; • init the "mother of all processes” – statically allocated • extern struct task_struct init_task; • for_each_process() - iterates over the entire task list • next_task() - returns the next task in the list PROCESS STATE • TASK_RUNNING: running or on a run-queue waiting to run • TASK_INTERRUPTIBLE: sleeping, waiting for some event to happen; awakes prematurely if it receives a signal • TASK_UNINTERRUPTIBLE: identical to TASK_INTERRUPTIBLE except it ignores signals • TASK_ZOMBIE: The task has terminated, but its parent has not yet issued a wait4(). The task's process descriptor must remain in case the parent wants to access it.
    [Show full text]
  • Fast Linux I/O in the Unix Tradition
    — preprint only: final version will appear in OSR, July 2008 — PipesFS: Fast Linux I/O in the Unix Tradition Willem de Bruijn Herbert Bos Vrije Universiteit Amsterdam Vrije Universiteit Amsterdam and NICTA [email protected] [email protected] ABSTRACT ory wall” [26]). To improve throughput, it is now essential This paper presents PipesFS, an I/O architecture for Linux to avoid all unnecessary memory operations. Inefficient I/O 2.6 that increases I/O throughput and adds support for het- primitives exacerbate the effects of the memory wall by in- erogeneous parallel processors by (1) collapsing many I/O curring unnecessary copying and context switching, and as interfaces onto one: the Unix pipeline, (2) increasing pipe a result of these cache misses. efficiency and (3) exploiting pipeline modularity to spread computation across all available processors. Contribution. We revisit the Unix pipeline as a generic model for streaming I/O, but modify it to reduce overhead, PipesFS extends the pipeline model to kernel I/O and com- extend it to integrate kernel processing and complement it municates with applications through a Linux virtual filesys- with support for anycore execution. We link kernel and tem (VFS), where directory nodes represent operations and userspace processing through a virtual filesystem, PipesFS, pipe nodes export live kernel data. Users can thus interact that presents kernel operations as directories and live data as with kernel I/O through existing calls like mkdir, tools like pipes. This solution avoids new interfaces and so unlocks all grep, most languages and even shell scripts.
    [Show full text]
  • Making a Faster Curry with Extensional Types
    Making a Faster Curry with Extensional Types Paul Downen Simon Peyton Jones Zachary Sullivan Microsoft Research Zena M. Ariola Cambridge, UK University of Oregon [email protected] Eugene, Oregon, USA [email protected] [email protected] [email protected] Abstract 1 Introduction Curried functions apparently take one argument at a time, Consider these two function definitions: which is slow. So optimizing compilers for higher-order lan- guages invariably have some mechanism for working around f1 = λx: let z = h x x in λy:e y z currying by passing several arguments at once, as many as f = λx:λy: let z = h x x in e y z the function can handle, which is known as its arity. But 2 such mechanisms are often ad-hoc, and do not work at all in higher-order functions. We show how extensional, call- It is highly desirable for an optimizing compiler to η ex- by-name functions have the correct behavior for directly pand f1 into f2. The function f1 takes only a single argu- expressing the arity of curried functions. And these exten- ment before returning a heap-allocated function closure; sional functions can stand side-by-side with functions native then that closure must subsequently be called by passing the to practical programming languages, which do not use call- second argument. In contrast, f2 can take both arguments by-name evaluation. Integrating call-by-name with other at once, without constructing an intermediate closure, and evaluation strategies in the same intermediate language ex- this can make a huge difference to run-time performance in presses the arity of a function in its type and gives a princi- practice [Marlow and Peyton Jones 2004].
    [Show full text]
  • Scheduling: Introduction
    7 Scheduling: Introduction By now low-level mechanisms of running processes (e.g., context switch- ing) should be clear; if they are not, go back a chapter or two, and read the description of how that stuff works again. However, we have yet to un- derstand the high-level policies that an OS scheduler employs. We will now do just that, presenting a series of scheduling policies (sometimes called disciplines) that various smart and hard-working people have de- veloped over the years. The origins of scheduling, in fact, predate computer systems; early approaches were taken from the field of operations management and ap- plied to computers. This reality should be no surprise: assembly lines and many other human endeavors also require scheduling, and many of the same concerns exist therein, including a laser-like desire for efficiency. And thus, our problem: THE CRUX: HOW TO DEVELOP SCHEDULING POLICY How should we develop a basic framework for thinking about scheduling policies? What are the key assumptions? What metrics are important? What basic approaches have been used in the earliest of com- puter systems? 7.1 Workload Assumptions Before getting into the range of possible policies, let us first make a number of simplifying assumptions about the processes running in the system, sometimes collectively called the workload. Determining the workload is a critical part of building policies, and the more you know about workload, the more fine-tuned your policy can be. The workload assumptions we make here are mostly unrealistic, but that is alright (for now), because we will relax them as we go, and even- tually develop what we will refer to as ..
    [Show full text]
  • Scheduling Light-Weight Parallelism in Artcop
    Scheduling Light-Weight Parallelism in ArTCoP J. Berthold1,A.AlZain2,andH.-W.Loidl3 1 Fachbereich Mathematik und Informatik Philipps-Universit¨at Marburg, D-35032 Marburg, Germany [email protected] 2 School of Mathematical and Computer Sciences Heriot-Watt University, Edinburgh EH14 4AS, Scotland [email protected] 3 Institut f¨ur Informatik, Ludwig-Maximilians-Universit¨at M¨unchen, Germany [email protected] Abstract. We present the design and prototype implementation of the scheduling component in ArTCoP (architecture transparent control of parallelism), a novel run-time environment (RTE) for parallel execution of high-level languages. A key feature of ArTCoP is its support for deep process and memory hierarchies, shown in the scheduler by supporting light-weight threads. To realise a system with easily exchangeable components, the system defines a micro-kernel, providing basic infrastructure, such as garbage collection. All complex RTE operations, including the handling of parallelism, are implemented at a separate system level. By choosing Concurrent Haskell as high-level system language, we obtain a prototype in the form of an executable specification that is easier to maintain and more flexible than con- ventional RTEs. We demonstrate the flexibility of this approach by presenting implementations of a scheduler for light-weight threads in ArTCoP, based on GHC Version 6.6. Keywords: Parallel computation, functional programming, scheduling. 1 Introduction In trying to exploit the computational power of parallel architectures ranging from multi-core machines to large-scale computational Grids, we are currently developing a new parallel runtime environment, ArTCoP, for executing parallel Haskell code on such complex, hierarchical architectures.
    [Show full text]
  • Scheduling Weakly Consistent C Concurrency for Reconfigurable
    SUBMISSION TO IEEE TRANS. ON COMPUTERS 1 Scheduling Weakly Consistent C Concurrency for Reconfigurable Hardware Nadesh Ramanathan, John Wickerson, Member, IEEE, and George A. Constantinides Senior Member, IEEE Abstract—Lock-free algorithms, in which threads synchronise These reorderings are invisible in a single-threaded context, not via coarse-grained mutual exclusion but via fine-grained but in a multi-threaded context, they can introduce unexpected atomic operations (‘atomics’), have been shown empirically to behaviours. For instance, if another thread is simultaneously be the fastest class of multi-threaded algorithms in the realm of conventional processors. This article explores how these writing to z, then reordering two instructions above may algorithms can be compiled from C to reconfigurable hardware introduce the behaviour where x is assigned the latest value via high-level synthesis (HLS). but y gets an old one.1 We focus on the scheduling problem, in which software The implication of this is not that existing HLS tools are instructions are assigned to hardware clock cycles. We first wrong; these optimisations can only introduce new behaviours show that typical HLS scheduling constraints are insufficient to implement atomics, because they permit some instruction when the code already exhibits a race condition, and races reorderings that, though sound in a single-threaded context, are deemed a programming error in C [2, §5.1.2.4]. Rather, demonstrably cause erroneous results when synthesising multi- the implication is that if these memory accesses are upgraded threaded programs. We then show that correct behaviour can be to become atomic (and hence allowed to race), then existing restored by imposing additional intra-thread constraints among scheduling constraints are insufficient.
    [Show full text]
  • Pipecheck: 3D Scanning Solution Pipeline Integrity Assessment
    NDT SOLUTIONS 3D3D SCANNING SCANNING SOLUTION SOLUTION FORFOR PIPELINE PIPELINE INTEGRITY INTEGRITY ASSESSMENTASSESSMENT CODE-COMPLIANT CODE-COMPLIANT Pipeline operators and NDT service companies are facing an increasing pressure from regulation authorities and environmentalist groups to guarantee pipeline networks integrity, while wanting to keep maintenance costs as low as possible. Field crews are experiencing pressure to complete inspection as quickly as possible so the excavation site can be backfilled, and the pipeline, put back into service in the shortest time. Thus, the use of surface inspection tools that are reliable, efficient and user-friendly is of paramount importance. Creaform has developed the PipecheckTM solution for pipeline integrity assessment. The solution includes a HandySCAN 3DTM portable handheld scanner and the Pipecheck software. Thanks to this unique 3D scanning technology and innovative software, surface inspection has been reinvented completely! RELIABLE. EFFICIENT. EASY. INTRODUCING PIPECHECK. PIPECHECK SOFTWARE MODULES CORROSION MECHANICAL DAMAGE Pipecheck’s pipeline corrosion software module offers very fast and As its name states, this software module has been developed reliable data processing that generates instant, on-site results. specifically for pipeline mechanical damage analysis. This module In comparison with traditional measurement methods, this software features numerous key functionalities that increase dent understanding offers accuracy and repeatability that are beyond expectations. and facilitate the decision making process. ADVANCED FUNCTIONALITIES CORROSION IN ILI CORRELATION MECHANICAL DAMAGE STRAIGHTENING OPERATION TOOL Being able to separate material loss depth Traditional methods for depth measurement The ILI Correlation tool is used to correlate from a mechanical damage deformation is (pit gauge) on field bends can’t be used in-line inspection data with the Pipecheck 3D no longer a dream.
    [Show full text]
  • Fork-Join Pattern
    Fork-Join Pattern Parallel Computing CIS 410/510 Department of Computer and Information Science Lecture 9 – Fork-Join Pattern Outline q What is the fork-join concept? q What is the fork-join pattern? q Programming Model Support for Fork-Join q Recursive Implementation of Map q Choosing Base Cases q Load Balancing q Cache Locality and Cache-Oblivious Algorithms q Implementing Scan with Fork-Join q Applying Fork-Join to Recurrences Introduction to Parallel Computing, University of Oregon, IPCC Lecture 9 – Fork-Join Pattern 2 Fork-Join Philosophy When you come to a fork in the road, take it. (Yogi Bera, 1925 –) Introduction to Parallel Computing, University of Oregon, IPCC Lecture 9 – Fork-Join Pattern 3 Fork-Join Concept q Fork-Join is a fundamental way (primitive) of expressing concurrency within a computation q Fork is called by a (logical) thread (parent) to create a new (logical) thread (child) of concurrency ❍ Parent continues after the Fork operation ❍ Child begins operation separate from the parent ❍ Fork creates concurrency q Join is called by both the parent and child ❍ Child calls Join after it finishes (implicitly on exit) ❍ Parent waits until child joins (continues afterwards) ❍ Join removes concurrency because child exits Introduction to Parallel Computing, University of Oregon, IPCC Lecture 9 – Fork-Join Pattern 4 Fork-Join Concurrency Semantics q Fork-Join is a concurrency control mechanism ❍ Fork increases concurrency ❍ Join decreases concurrency q Fork-Join dependency rules ❍ A parent must join with its forked children
    [Show full text]