A Modern OpenMP Implementation Leveraging HPX, an Asynchronous Many-Task System

Total Pages: 16

File Type: PDF, Size: 1020 KB

An Introduction to hpxMP – A Modern OpenMP Implementation Leveraging HPX, An Asynchronous Many-Task System

Tianyi Zhang, Shahrzad Shirzad, Patrick Diehl, R. Tohid, Weile Wei, and Hartmut Kaiser, Center for Computation and Technology, LSU

ABSTRACT
Asynchronous Many-Task (AMT) runtime systems have gained increasing acceptance in the HPC community due to the performance improvements offered by fine-grained tasking runtime systems. At the same time, C++ standardization efforts are focused on creating higher-level interfaces able to replace OpenMP or OpenACC in modern C++ codes. These higher-level functions have been adopted in standards-conforming runtime systems such as HPX, giving users the ability to simply utilize fork-join parallelism in their own codes. Despite innovations in runtime systems and standardization efforts, users face enormous challenges porting legacy applications. Not only must users port their own codes, but often users rely on highly optimized libraries such as BLAS and LAPACK which use OpenMP for parallelization. Current efforts to create smooth migration paths have struggled with these challenges, especially as the threading systems of AMT libraries often compete with the threading system of OpenMP.

To overcome these issues, our team has developed hpxMP, an implementation of the OpenMP standard which utilizes the underlying AMT system to schedule and manage tasks. This approach leverages the C++ interfaces exposed by HPX and allows users to execute their applications on an AMT system without changing their code.

In this work, we compare hpxMP with Clang's OpenMP library on four linear algebra benchmarks of the Blaze C++ library. While hpxMP is often not able to reach the same performance, we demonstrate its viability for providing a smooth migration path for applications, which, however, has to be extended to benefit from a more general task-based programming model.

CCS CONCEPTS
• Computing methodologies → Parallel programming languages;

KEYWORDS
OpenMP, hpxMP, Asynchronous Many-Task Systems, C++, clang, gcc, HPX

ACM Reference Format:
Tianyi Zhang, Shahrzad Shirzad, Patrick Diehl, R. Tohid, Weile Wei, and Hartmut Kaiser. 2019. An Introduction to hpxMP – A Modern OpenMP Implementation Leveraging HPX, An Asynchronous Many-Task System. In International Workshop on OpenCL (IWOCL'19), May 13–15, 2019, Boston, MA, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3318170.3318191

1 INTRODUCTION
The Open Multi-Processing (OpenMP) [OpenMP Consortium 2018] standard is widely used for shared-memory multiprocessing and is often coupled with the Message Passing Interface (MPI) as MPI+X [Bader 2016] for distributed programming. Here, MPI is used for the inter-node communication and X, in this case OpenMP, for the intra-node parallelism. Nowadays, Asynchronous Many-Task (AMT) runtime systems are emerging as a new parallel programming paradigm. These systems are able to take advantage of fine-grained tasks to better distribute work across a machine. The C++ Standard Library for Concurrency and Parallelism (HPX) [Heller et al. 2017] is one example of an AMT runtime system. The HPX API conforms to the concurrency abstractions introduced by the C++11 standard [C++ Standards Committee 2011] and to the parallel algorithms introduced by the C++17 standard [C++ Standards Committee 2017]. These algorithms are similar to the concepts exposed by OpenMP, e.g. #pragma omp parallel for.
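The parallel-for correspondence just described can be sketched in a few lines. This is our own minimal illustration, not code from the paper: the function names are invented, the OpenMP pragma is simply ignored (and the loop runs serially) when compiled without OpenMP support, and the explicit std::execution::par policy is left in a comment so the sketch builds without a parallel-STL backend.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Fork-join loop in OpenMP style: iterations are independent, so the
// runtime may split them across threads (serial when OpenMP is disabled).
void scale_openmp(std::vector<double>& v, double a) {
#pragma omp parallel for
    for (std::size_t i = 0; i < v.size(); ++i)
        v[i] *= a;
}

// The same loop as a C++17-style standard algorithm. Parallelism becomes a
// one-argument change: pass std::execution::par (or, with HPX,
// hpx::execution::par) as an additional first argument to std::for_each.
void scale_algorithm(std::vector<double>& v, double a) {
    std::for_each(v.begin(), v.end(), [a](double& x) { x *= a; });
}
```

Expressing the loop as an algorithm rather than a pragma is what lets a standards-conforming runtime such as HPX schedule it on lightweight tasks instead of a dedicated OpenMP thread pool.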
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. IWOCL'19, May 13–15, 2019, Boston, MA, USA. © 2019 Copyright held by the owner/author(s). Publication rights licensed to the Association for Computing Machinery. ACM ISBN 978-1-4503-6230-6/19/05...$15.00. https://doi.org/10.1145/3318170.3318191

AMT runtime systems are becoming increasingly used for HPC applications as they have shown superior scalability and parallel efficiency for certain classes of applications (see [Heller et al. 2018]). At the same time, the C++ standardization efforts currently focus on creating higher-level interfaces usable to replace OpenMP (and other #pragma-based parallelization solutions like OpenACC) for modern C++ codes. This effort is driven by the lack of integration of #pragma-based solutions into the C++ language, especially the language's type system.

Both trends call for a migration path which will allow existing applications that directly or indirectly use OpenMP to port portions of the code to an AMT paradigm. This is especially critical for applications which use highly optimized OpenMP libraries, where it is not feasible to re-implement all the provided functionalities in a new paradigm. Examples of these libraries are linear algebra libraries [Anderson et al. 1999; Blackford et al. 2002; Galassi et al. 2002; Guennebaud et al. 2010; Iglberger et al. 2012; Rupp et al. 2016; Sanderson and Curtin 2016; Wang et al. 2013], such as the Intel Math Kernel Library or the Eigen library.

For these reasons, it is beneficial to combine both technologies, AMT+OpenMP, where the distributed communication is handled by the AMT runtime system and the intra-node parallelism is handled by OpenMP, or even to combine OpenMP and the parallel algorithms on a shared-memory system. Currently, these two scenarios are not possible, since the light-weight thread implementations usually present in AMTs interfere with the system threads utilized by the available OpenMP implementations.

To overcome this issue, this paper presents hpxMP, an implementation of the OpenMP standard [OpenMP Consortium 2018] that utilizes HPX's light-weight threads. The hpxMP library is compatible with the clang and gcc compilers and replaces their shared-library implementations of OpenMP. hpxMP implements all of the OpenMP runtime functionalities using HPX's light-weight threads instead of system threads.

Blaze, an open-source, high-performance C++ math library [Iglberger et al. 2012], is selected as an example library to validate our implementation. Blazemark, the benchmark suite available with Blaze, is used to run some common benchmarks. The measured results are compared against the same benchmarks run on top of the compiler-supplied OpenMP runtime. This paper focuses on the implementation details of hpxMP as a proof of concept implementing OpenMP with an AMT runtime system. We use HPX as an exemplary AMT system that already exposes all the required functionalities.

The paper is structured as follows: Section 2 emphasizes the related work. Section 3 provides a brief introduction to HPX's concepts and Section 4 a brief introduction to OpenMP's concepts utilized in the implementation in Section 5. The benchmarks comparing hpxMP with clang's OpenMP implementation are shown in Section 6. Finally, we draw our conclusions in Section 7.

2 RELATED WORK
Cilk-based languages [Leiserson 2009] are general-purpose programming languages which target multi-threaded parallelism by extending C/C++ with parallel loop constructs and a fork-join model. Kokkos [Edwards et al. 2014] is a package which exposes multiple parallel programming models, such as CUDA and pthreads, through a common C++ interface. Open Multi-Processing (OpenMP) [Dagum and Menon 1998] is a widely accepted standard used by application and library developers. OpenMP exposes a fork-join model through compiler directives and supports tasks.

The OpenMP 3.0 standard introduced the concept of task-based programming. The OpenMP 3.1 standard added task optimizations within the tasking model. The OpenMP 4.0 standard offers users a more graceful and efficient way to handle task synchronization by introducing task dependences and task groups. The OpenMP 4.5 standard was released with support for a new task-loop construct, which provides a way to separate loops into tasks. The most recent, the OpenMP 5.0 standard, supports detached tasks.

There have also been efforts to integrate multi-threaded parallelism with distributed programming models. Charm++ has integrated OpenMP into its programming model to improve load balancing [PPL 2011]. However, most of the research in this area has focused on the MPI+X [Bader 2016; Barrett et al. 2015] model.

3 C++ STANDARD LIBRARY FOR CONCURRENCY AND PARALLELISM (HPX)
This section briefly describes the features of the C++ Standard Library for Concurrency and Parallelism (HPX) [Heller et al. 2017] which are utilized in the implementation of hpxMP in Section 5. HPX facilitates distributed parallel applications of any scale and uses fine-grained multi-threading and asynchronous communication [Khatami et al. 2016]. HPX exposes an API that strictly adheres to the current ISO C++ standards [Heller et al. 2017]. This approach to standardization encourages programmers to write code that is highly portable across heterogeneous systems [Copik and Kaiser 2017].

HPX is highly interoperable in distributed parallel applications, such that it can be used for intra-node parallelization on a single machine as well as for inter-node communication across hundreds of thousands of nodes [Wagle et al. 2018]. The future functionality implemented in HPX permits threads to continually finish their computation without waiting for their previous steps to be completed, which can achieve the maximum possible level of parallelization in time and space [Khatami et al. 2016].
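Since the HPX API follows the ISO C++ concurrency abstractions, the future-based execution style described here can be sketched with the standard library alone: hpx::async and hpx::future are analogues of the std::async and std::future used below. This is our own simplified illustration, not the paper's code; HPX additionally offers non-blocking continuations (future::then, hpx::dataflow) so a consumer is scheduled the moment its inputs are ready rather than blocking in get().

```cpp
#include <cassert>
#include <future>

// Two independent stages run asynchronously; the final stage consumes both
// results. With HPX the same dataflow would run on lightweight user-level
// threads instead of OS threads.
int staged_sum(int a, int b) {
    std::future<int> fa = std::async(std::launch::async, [a] { return a * a; });
    std::future<int> fb = std::async(std::launch::async, [b] { return b * b; });
    return fa.get() + fb.get();  // join point: ready as soon as both finish
}
```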
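The OpenMP tasking milestones listed in the related work (tasks in 3.0, task dependences and task groups in 4.0, taskloop in 4.5, detached tasks in 5.0) can be made concrete with a small sketch of our own, not taken from the paper. The depend clauses order the two tasks like a dataflow graph; compiled without OpenMP the pragmas are ignored and the function runs serially with the same result.

```cpp
#include <cassert>

// A two-stage pipeline expressed with OpenMP 4.0 task dependences: the
// second task may only start once the first has produced x. An hpxMP-style
// runtime maps each task onto a lightweight HPX thread.
int pipeline() {
    int x = 0, y = 0;
#pragma omp parallel
#pragma omp single
    {
#pragma omp task depend(out : x)
        x = 21;                 // producer
#pragma omp task depend(in : x) depend(out : y)
        y = 2 * x;              // consumer: waits for x via the dependence
#pragma omp taskwait            // join before leaving the single region
    }
    return y;
}
```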
Recommended publications
  • Integration of CUDA Processing Within the C++ Library for Parallelism and Concurrency (HPX)
Integration of CUDA Processing within the C++ Library for Parallelism and Concurrency (HPX). Patrick Diehl, Madhavan Seshadri, Thomas Heller, Hartmut Kaiser. Abstract—Experience shows that on today's high performance systems the utilization of different acceleration cards in conjunction with a high utilization of all other parts of the system is difficult. Future architectures, like exascale clusters, are expected to aggravate this issue as the number of cores is expected to increase and memory hierarchies are expected to become deeper. One big aspect for distributed applications is to guarantee high utilization of all available resources, including local or remote acceleration cards on a cluster, while fully using all the available CPU resources and integrating the GPU work into the overall programming model. For the integration of CUDA code we extended HPX, a general purpose C++ runtime system for parallel and distributed applications of any scale, and enabled asynchronous data transfers from and to the GPU device and the asynchronous invocation of CUDA kernels on this data. Both operations are well integrated into the general programming model of HPX, which allows us to seamlessly overlap any GPU operation with work on the main cores. Any user-defined CUDA kernel can be launched on any (local or remote) GPU device available to the distributed application. We present asynchronous implementations of the data transfers and kernel launches for CUDA code as part of an HPX asynchronous execution graph. Using this approach we can combine all remotely and locally available acceleration cards on a cluster to utilize its full performance capabilities.
  • Analysis of GPU-Libraries for Rapid Prototyping Database Operations
Analysis of GPU-Libraries for Rapid Prototyping Database Operations: A look into library support for database operations. Harish Kumar Harihara Subramanian, Bala Gurumurthy, Gabriel Campero Durand, David Broneske, Gunter Saake. University of Magdeburg, Magdeburg, Germany. fi[email protected]. Abstract—Using GPUs for query processing is still ongoing research in the database community due to the increasing heterogeneity of GPUs and their capabilities (e.g., their newest selling point: tensor cores). Hence, many researchers develop optimal operator implementations for a specific device generation, involving tedious operator tuning by hand. On the other hand, there is a growing availability of GPU libraries that provide optimized operators for manifold applications. However, the question arises how mature these libraries are and whether they are fit to replace handwritten operator implementations, not only w.r.t. implementation effort and portability but also in terms of performance. In this paper, we investigate various general-purpose libraries that are both portable and easy to use for arbitrary GPUs in order to test their production readiness on the example of such systems. Usually, library operators for GPUs are either written by hardware experts [12] or are available out of the box from device vendors [13]. Overall, we found more than 40 libraries for GPUs, each packing a set of operators commonly used in one or more domains. The benefit of those libraries is that they are constantly updated and tested to support newer GPU versions, and their predefined interfaces offer high portability as well as faster development time compared to handwritten operators. This makes them a perfect match for many commercial database systems, which can rely on GPU libraries to implement well-performing database operators. Some examples of such systems are: SQreamDB using Thrust [14], BlazingDB using cuDF [15], Brytlyt using the Torch library [16].
  • Bench - Benchmarking the State-of-the-Art Task Execution Frameworks of Many-Task Computing
MATRIX: Bench - Benchmarking the State-of-the-Art Task Execution Frameworks of Many-Task Computing. Thomas Dubucq, Tony Forlini, Virgile Landeiro Dos Reis, and Isabelle Santos. Illinois Institute of Technology, Chicago, IL, USA. {tdubucq, tforlini, vlandeir, isantos1}@hawk.iit.edu. Abstract — Technology trends indicate that exascale systems will have billion-way parallelism, and each node will have about three orders of magnitude more intra-node parallelism than today's peta-scale systems. The majority of current runtime systems focus a great deal of effort on optimizing the inter-node parallelism by maximizing the bandwidth and minimizing the latency of the use of interconnection networks and storage, but suffer from the lack of scalable solutions to expose the intra-node parallelism. Many-task computing (MTC) is a distributed fine-grained paradigm that aims to address the challenges of managing parallelism and locality of exascale systems. MTC applications are typically structured as direct acyclic graphs of loosely coupled short tasks with explicit input/output data dependencies. … Stanford University. Finally, HPX is a general purpose C++ runtime system for parallel and distributed applications of any scale developed by Louisiana State University, and STAPL is a framework for developing parallel programs from Texas A&M. MATRIX is a many-task computing job scheduling system [3]. There are many resource managing systems aimed towards data-intensive applications. Furthermore, distributed task scheduling in many-task computing is a problem that has been considered by many research teams. In particular, Charm++ [4], Legion [5], Swift [6], [10], Spark [1][2], HPX [12], STAPL [13] and MATRIX [11] offer solutions to this problem and have
  • HPX – a Task Based Programming Model in a Global Address Space
HPX – A Task Based Programming Model in a Global Address Space. Hartmut Kaiser (1), Thomas Heller (2), Bryce Adelstein-Lelbach (1), Adrian Serio (1), Dietmar Fey (2). (1) Center for Computation and Technology, Louisiana State University, Louisiana, U.S.A.; (2) Computer Science 3, Computer Architectures, Friedrich-Alexander-University, Erlangen, Germany. [email protected], [email protected], [email protected], [email protected], [email protected]. ABSTRACT: The significant increase in complexity of Exascale platforms due to energy-constrained, billion-way parallelism, with major changes to processor and memory architecture, requires new energy-efficient and resilient programming techniques that are portable across multiple future generations of machines. We believe that guaranteeing adequate scalability, programmability, performance portability, resilience, and energy efficiency requires a fundamentally new approach, combined with a transition path for existing scientific applications, to fully explore the rewards of today's and tomorrow's systems. We present HPX – a parallel runtime system which extends the C++11/14 standard to facilitate distributed operations, enable fine-grained constraint based parallelism, and support run… 1. INTRODUCTION: Today's programming models, languages, and related technologies that have sustained High Performance Computing (HPC) application software development for the past decade are facing major problems when it comes to programmability and performance portability of future systems. The significant increase in complexity of new platforms due to energy constraints, increasing parallelism and major changes to processor and memory architecture, requires advanced programming techniques that are portable across multiple future generations of machines [1]. A fundamentally new approach is required to address these challenges.
  • Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++
Concurrent Cilk: Lazy Promotion from Tasks to Threads in C/C++. Christopher S. Zakian, Timothy A. K. Zakian, Abhishek Kulkarni, Buddhika Chamith, and Ryan R. Newton. Indiana University - Bloomington, fczakian, tzakian, adkulkar, budkahaw, [email protected]. Abstract. Library and language support for scheduling non-blocking tasks has greatly improved, as have lightweight (user) threading packages. However, there is a significant gap between the two developments. In previous work, and in today's software packages, lightweight thread creation incurs much larger overheads than tasking libraries, even on tasks that end up never blocking. This limitation can be removed. To that end, we describe an extension to the Intel Cilk Plus runtime system, Concurrent Cilk, where tasks are lazily promoted to threads. Concurrent Cilk removes the overhead of thread creation on threads which end up calling no blocking operations, and is the first system to do so for C/C++ with legacy support (standard calling conventions and stack representations). We demonstrate that Concurrent Cilk adds negligible overhead to existing Cilk programs, while its promoted threads remain more efficient than OS threads in terms of context-switch overhead and blocking communication. Further, it enables development of blocking data structures that create non-fork-join dependence graphs, which can expose more parallelism and better support data-driven computations waiting on results from remote devices. 1 Introduction. Both task-parallelism [1, 11, 13, 15] and lightweight threading [20] libraries have become popular for different kinds of applications. The key difference between a task and a thread is that threads may block, for example when performing IO, and then resume again.
  • Scalable and High Performance MPI Design for Very Large
SCALABLE AND HIGH-PERFORMANCE MPI DESIGN FOR VERY LARGE INFINIBAND CLUSTERS. DISSERTATION. Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University. By Sayantan Sur, B. Tech. The Ohio State University, 2007. Dissertation Committee: Prof. D. K. Panda (Adviser), Prof. P. Sadayappan, Prof. S. Parthasarathy. Graduate Program in Computer Science and Engineering. © Copyright by Sayantan Sur 2007. ABSTRACT. In the past decade, rapid advances have taken place in the field of computer and network design enabling us to connect thousands of computers together to form high-performance clusters. These clusters are used to solve computationally challenging scientific problems. The Message Passing Interface (MPI) is a popular model to write applications for these clusters. There is a vast array of scientific applications which use MPI on clusters. As the applications operate on larger and more complex data, the size of the compute clusters is scaling higher and higher. Thus, in order to enable the best performance for these scientific applications, it is very critical that the design of the MPI libraries be extremely scalable and high-performance. InfiniBand is a cluster interconnect which is based on open standards and gaining rapid acceptance. This dissertation presents novel designs based on the new features offered by InfiniBand, in order to design scalable and high-performance MPI libraries for large-scale clusters with tens of thousands of nodes. Methods developed in this dissertation have been applied towards reduction in overall resource consumption, increased overlap of computation and communication, improved performance of collective operations, and finally designing application-level benchmarks to make efficient use of modern networking technology.
  • DNA Scaffolds Enable Efficient and Tunable Functionalization of Biomaterials for Immune Cell Modulation
ARTICLES. https://doi.org/10.1038/s41565-020-00813-z. DNA scaffolds enable efficient and tunable functionalization of biomaterials for immune cell modulation. Xiao Huang, Jasper Z. Williams, Ryan Chang, Zhongbo Li, Cassandra E. Burnett, Rogelio Hernandez-Lopez, Initha Setiady, Eric Gai, David M. Patterson, Wei Yu, Kole T. Roybal, Wendell A. Lim and Tejal A. Desai. Biomaterials can improve the safety and presentation of therapeutic agents for effective immunotherapy, and a high level of control over surface functionalization is essential for immune cell modulation. Here, we developed biocompatible immune cell-engaging particles (ICEp) that use synthetic short DNA as scaffolds for efficient and tunable protein loading. To improve the safety of chimeric antigen receptor (CAR) T cell therapies, micrometre-sized ICEp were injected intratumorally to present a priming signal for systemically administered AND-gate CAR-T cells. Locally retained ICEp presenting a high density of priming antigens activated CAR T cells, driving local tumour clearance while sparing uninjected tumours in immunodeficient mice. The ratiometric control of costimulatory ligands (anti-CD3 and anti-CD28 antibodies) and the surface presentation of a cytokine (IL-2) on ICEp were shown to substantially impact human primary T cell activation phenotypes. This modular and versatile biomaterial functionalization platform can provide new opportunities for immunotherapies. Immune cell therapies have shown great potential for cancer treatment, but still face major challenges of efficacy and safety for therapeutic applications1–3, for example, chimeric antigen … the specific antibody27, and therefore a high and controllable density of surface biomolecules could allow for precise control of ligand binding and cell modulation.
  • A Cilk Implementation of LTE Base-Station Uplink on the TILEPro64 Processor
A Cilk Implementation of LTE Base-station uplink on the TILEPro64 Processor. Master of Science Thesis in the Programme of Integrated Electronic System Design. HAO LI. Chalmers University of Technology, Department of Computer Science and Engineering, Göteborg, Sweden, November 2011. The Author grants to Chalmers University of Technology and University of Gothenburg the non-exclusive right to publish the Work electronically and in a non-commercial purpose make it accessible on the Internet. The Author warrants that he/she is the author of the Work, and warrants that the Work does not contain text, pictures or other material that violates copyright law. The Author shall, when transferring the rights of the Work to a third party (for example a publisher or a company), acknowledge the third party about this agreement. If the Author has signed a copyright agreement with a third party regarding the Work, the Author warrants hereby that he/she has obtained any necessary permission from this third party to let Chalmers University of Technology and University of Gothenburg store the Work electronically and make it accessible on the Internet. A Cilk Implementation of LTE Base-station uplink on the TILEPro64 Processor. HAO LI. © HAO LI, November 2011. Examiner: Sally A. McKee. Chalmers University of Technology, Department of Computer Science and Engineering, SE-412 96 Göteborg, Sweden. Telephone +46 (0)31-255 1668. Supervisor: Magnus Själander. Chalmers University of Technology, Department of Computer Science and Engineering, SE-412 96 Göteborg, Sweden. Telephone +46 (0)31-772 1075. Cover: A Cilk Implementation of LTE Base-station uplink on the TILEPro64 Processor.
  • Exascale Computing Project -- Software
Exascale Computing Project -- Software. Paul Messina, ECP Director; Stephen Lee, ECP Deputy Director. ASCAC Meeting, Arlington, VA, Crystal City Marriott, April 19, 2017. www.ExascaleProject.org. ECP scope and goals: develop applications to tackle a broad spectrum of mission-critical problems of unprecedented complexity; partner with vendors to develop computer architectures that support exascale applications; support national security applications; develop a software stack that is both exascale-capable and usable on industrial- and academic-scale systems; train a next-generation workforce of computational scientists, engineers, and computer scientists, in collaboration with vendors; and contribute to the economic competitiveness of the nation. ECP has formulated a holistic approach that uses co-design and integration to achieve capable exascale across four areas: Application Development (science and mission applications), Software Technology (scalable and productive software stack), Hardware Technology (hardware technology elements), and Exascale Systems (integrated exascale supercomputers). [Diagram of the ECP software stack: applications; co-design; programming models, development environment, and tools; math libraries and frameworks; data analysis, visualization, and correctness; system software, resource management, threading, scheduling, monitoring, and control; data management, I/O, and file system; memory and burst buffer; node OS and runtimes; hardware interface.] ECP's work encompasses applications, system software, hardware technologies and architectures, and workforce development.
  • hpxMP, an OpenMP Runtime Implemented Using HPX
hpxMP, An Implementation of OpenMP Using HPX. By Tianyi Zhang. Project Report Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Science in the School of Engineering at Louisiana State University, 2019. Baton Rouge, LA, USA. Acknowledgments. I would like to express my appreciation to my advisor Dr. Hartmut Kaiser, who has been a patient, inspiring and knowledgeable mentor for me. I joined the STE||AR group 2 years ago, with limited knowledge of C++ and HPC, and I would like to thank Dr. Kaiser for bringing me into the world of high-performance computing. During the past 2 years, he helped me improve my skills in C++, encouraged my research project in hpxMP, and allowed me to explore the world. When I face challenges in my research, he always points to the right spot, and I can always learn a lot by solving those issues. His advice on my research and career is always valuable. I would also like to convey my appreciation to my committee members for serving on my committee. Thank you for making my defense an enjoyable moment, and I appreciate your comments and suggestions. I would also like to thank my colleagues, who are passionate about what they are doing and always willing to help other people. I can always be inspired by their intellectual conversations and get hands-on help face to face, which is especially important to my study. A special thanks to my family for their support. They always encourage me and stand at my side when I need to make choices or face difficulties.
  • Library for Handling Asynchronous Events in C++
Masaryk University, Faculty of Informatics. Library for Handling Asynchronous Events in C++. Bachelor's Thesis. Branislav Ševc. Brno, Spring 2019. Declaration: Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source. Branislav Ševc. Advisor: Mgr. Jan Mrázek. Acknowledgements: I would like to thank my supervisor, Jan Mrázek, for his valuable guidance and highly appreciated help with avoiding death traps during the design and implementation of this project. Abstract: The subject of this work is the design, implementation and documentation of a library for the C++ programming language, which aims to simplify asynchronous event-driven programming. The Reactive Blocks Library (RBL) allows users to define a program as a graph of interconnected function blocks which control message flow inside the graph. The benefits include decoupling of the program's logic and the method of execution, with emphasis on increased readability of the program's logical behavior through understanding of its reactive components. This thesis focuses on explaining the programming model, then proceeds to defend design choices and to outline the implementation layers. A brief analysis of current competing solutions and their comparison with RBL, along with overhead benchmarks of RBL's abstractions over a pure C++ approach, is included. Keywords: C++ Programming Language, Inversion
  • Intel Threading Building Blocks
Praise for Intel Threading Building Blocks. "The Age of Serial Computing is over. With the advent of multi-core processors, parallel-computing technology that was once relegated to universities and research labs is now emerging as mainstream. Intel Threading Building Blocks updates and greatly expands the 'work-stealing' technology pioneered by the MIT Cilk system of 15 years ago, providing a modern industrial-strength C++ library for concurrent programming. Not only does this book offer an excellent introduction to the library, it furnishes novices and experts alike with a clear and accessible discussion of the complexities of concurrency." — Charles E. Leiserson, MIT Computer Science and Artificial Intelligence Laboratory. "We used to say make it right, then make it fast. We can't do that anymore. TBB lets us design for correctness and speed up front for Maya. This book shows you how to extract the most benefit from using TBB in your code." — Martin Watt, Senior Software Engineer, Autodesk. "TBB promises to change how parallel programming is done in C++. This book will be extremely useful to any C++ programmer. With this book, James achieves two important goals: presents an excellent introduction to parallel programming, illustrating the most common parallel programming patterns and the forces governing their use; and documents the Threading Building Blocks C++ library, a library that provides generic algorithms for these patterns. TBB incorporates many of the best ideas that researchers in object-oriented parallel computing developed in the last two decades." — Marc Snir, Head of the Computer Science Department, University of Illinois at Urbana-Champaign. "This book was my first introduction to Intel Threading Building Blocks."