HPX – An open source C++ Standard Library for Parallelism and Concurrency
Extended Abstract

Thomas Heller, Friedrich-Alexander University Erlangen-Nuremberg, Erlangen, Bavaria, Germany
Patrick Diehl, Polytechnique Montreal, Montreal, Quebec, Canada
Zachary Byerly, Louisiana State University, Baton Rouge, Louisiana, USA
John Biddiscombe, Swiss Supercomputing Centre (CSCS), Lugano, Switzerland
Hartmut Kaiser, Louisiana State University, Baton Rouge, Louisiana, USA

ABSTRACT
To achieve scalability with today's heterogeneous HPC resources, we need a dramatic shift in our thinking; MPI+X is not enough. Asynchronous Many Task (AMT) runtime systems break down the global barriers imposed by the Bulk Synchronous Programming model. HPX is an open-source, C++ Standards compliant AMT runtime system that is developed by a diverse international community of collaborators called The STE||AR Group. HPX provides features which allow application developers to naturally use key design patterns, such as overlapping communication and computation, decentralizing control flow, oversubscribing execution resources, and sending work to data instead of data to work. The STE||AR Group is comprised of physicists, engineers, and computer scientists; men and women from many different institutions and affiliations, and over a dozen different countries. We are committed to advancing the development of scalable parallel applications by providing a platform for collaborating and exchanging ideas. In this paper, we give a detailed description of the features HPX provides and how they help achieve scalability and programmability, a list of applications of HPX including two large NSF-funded collaborations (STORM, for storm surge forecasting; and STAR (OctoTiger), an astrophysics project which runs at 96.8% parallel efficiency on 643,280 cores), and we end with a description of how HPX and the STE||AR Group fit into the open source community.

CCS CONCEPTS
• Computing methodologies → Parallel computing methodologies;

KEYWORDS
parallelism, concurrency, standard library, C++

ACM Reference Format:
Thomas Heller, Patrick Diehl, Zachary Byerly, John Biddiscombe, and Hartmut Kaiser. 2017. HPX – An open source C++ Standard Library for Parallelism and Concurrency. In Proceedings of OpenSuCo 2017, Denver, Colorado, USA, November 2017 (OpenSuCo'17), 5 pages. https://doi.org/10.1145/nnnnnnn.nnnnnnn

1 INTRODUCTION
The free lunch is over [19] - the end of Moore's Law means we have to use more cores instead of faster cores. The massive increase in on-node parallelism is also motivated by the need to keep power consumption in balance [18]. We have been using large numbers of cores in promising architectures for many years, like GPUs, FPGAs, and other many-core systems; now we have Intel's Knights Landing with up to 72 cores. Efficiently using that amount of parallelism, especially with heterogeneous resources, requires a paradigm shift; we must develop new, effective, and efficient parallel programming techniques to allow the usage of all parts of the system. All in all, it can be expected that concurrency in future systems will be increased by an order of magnitude.

HPX is an open-source, C++ Standards compliant, Asynchronous Many Task (AMT) runtime system. Because HPX is a truly collaborative, international project with many contributors, we could not say that it was developed at one (or two, or three...) universities. The STE||AR Group was created "to promote the development of scalable parallel applications by providing a community for ideas, a framework for collaboration, and a platform for communicating these concepts to the broader community." We congregate on our IRC channel (#ste||ar on freenode.net), have a website and blog (stellar-group.org), as well as many active projects in close relation to HPX. For the last 3 years we have been an organization in Google Summer of Code, mentoring 17 students. The STE||AR Group is diverse and inclusive; we have members from at least a dozen different countries. Trying to create a library this size that is truly open-source and C++ Standards compliant is not easy, but it is something that we are committed to. We also believe it is critical to have an open exchange of ideas and to reach out to domain scientists (physicists, engineers, biologists, etc.) who would benefit from using HPX. This is why we have invested so much time and energy into our two scientific computing projects: STORM, a collaborative effort which aims to improve storm surge forecasting; and STAR (OctoTiger), an astrophysics project which has generated exciting results running at 96.8% parallel efficiency on 643,280 cores. In addition, we are leading the runtime support effort in the European FETHPC project AllScale to research further possibilities to leverage large scale AMT systems.

In the current landscape of scientific computing, the conventional thinking is that the currently prevalent programming model of MPI+X is suitable enough to tackle those challenges with minor adaptations [1]. That is, MPI is used as the tool for inter-node communication, and X is a placeholder representing intra-node parallelization, such as OpenMP and/or CUDA. Other inter-node communication paradigms, such as the partitioned global address space (PGAS), emerged as well, which focus on one-sided messages together with explicit, mostly global synchronization. While the promise of keeping the current programming patterns and known tools sounds appealing, the disjoint approach results in the requirement to maintain various different interfaces, and the question of interoperability is, as of yet, unclear [8].

Moving from Bulk Synchronous Programming (BSP) towards fine-grained, constraint-based synchronization in order to limit the effects of global barriers can only be achieved by changing paradigms for parallel programming. This paradigm shift is enabled by the emerging Asynchronous Many Task (AMT) runtime systems, which carry properties that alleviate the aforementioned limitations. It is therefore not a coincidence that the C++ Programming Language, starting with the standard updated in 2011, gained support for concurrency by defining a memory model for a multi-threaded world as well as laying the foundation towards enabling task based parallelism by adapting the future [2, 9] concept. Later on, in 2017, based on those foundational layers, support for parallel algorithms was added, which coincidentally covers the need for data parallel algorithms.
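As a concrete illustration of the standard facilities referred to above, the following snippet (plain standard C++, not taken from the paper) contrasts a C++11 future obtained from std::async with a C++17 parallel algorithm driven by an execution policy:

// C++11 futures and C++17 parallel algorithms in plain standard C++.
#include <algorithm>
#include <execution>
#include <future>
#include <vector>

int main()
{
    // C++11: task based parallelism; std::async returns a future.
    std::future<int> f = std::async(std::launch::async, [] { return 6 * 7; });

    // C++17: data parallel algorithm selected via an execution policy.
    std::vector<double> v(1000000, 1.0);
    std::for_each(std::execution::par, v.begin(), v.end(),
                  [](double& x) { x *= 2.0; });

    return f.get() == 42 ? 0 : 1;
}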
The HPX parallel runtime system unifies an AMT tailored for HPC usage with strict adherence to the C++ standard. It therefore represents a combination of well-known ideas (such as data flow [6, 7], futures [2, 9], and Continuation Passing Style (CPS)) with new and unique overarching concepts. The combination of these ideas and their strict application form the underlying design principles that make this model unique. HPX provides:

• A C++ Standards-conforming API enabling wait-free asynchronous execution.
• futures, channels, and other asynchronous primitives.
• An Active Global Address Space (AGAS) that supports …

… complex data flow execution trees that generate millions of HPX tasks that by definition execute in the proper order.
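The data-flow style just described can be sketched with hpx::async and hpx::dataflow. The snippet below is illustrative only and not taken from the paper; hpx::async, hpx::future, and hpx::dataflow are HPX facilities, but the exact header layout is an assumption and varies between HPX releases:

// Illustrative sketch: a tiny data-flow execution tree built from HPX futures.
// Header paths are an assumption; they differ between HPX versions.
#include <hpx/hpx_main.hpp>      // runs main() as an HPX thread
#include <hpx/include/async.hpp>
#include <hpx/include/lcos.hpp>

#include <cstdio>
#include <utility>

int square(int x) { return x * x; }

int main()
{
    // Two independent tasks started asynchronously; each returns a future.
    hpx::future<int> a = hpx::async(square, 3);
    hpx::future<int> b = hpx::async(square, 4);

    // dataflow schedules the continuation only once both inputs are ready,
    // extending the execution tree instead of blocking the current thread.
    hpx::future<int> sum = hpx::dataflow(
        [](hpx::future<int> x, hpx::future<int> y) { return x.get() + y.get(); },
        std::move(a), std::move(b));

    std::printf("3^2 + 4^2 = %d\n", sum.get());
    return 0;
}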
This paper is structured as follows: In Section 2 we briefly summarize the important features of HPX for parallelism and concurrency. The application of HPX to different high performance computing related topics is presented in Section 3. In Section 4 we review the aspect of open source software and community building. Finally, we conclude with Section 5.

2 HPX RUNTIME COMPONENTS
In this section we briefly give an overview of the HPX runtime system components with a focus on parallelism and concurrency. Figure 1 shows the five runtime components, which are briefly reviewed in the following. The core element is the thread manager, which provides the local thread management, see Section 2.1. The Active Global Address Space (AGAS) provides an abstraction over globally addressable C++ objects which support RPC (Section 2.2). In Section 2.3 the parcel handler, which implements the one-sided active messaging layer, is reviewed. For runtime adaptivity HPX exposes several local and global performance counters, see Section 2.4. For more details we refer to [16].

Figure 1: Overview of the HPX Runtime System components: the thread manager, the performance counters, the AGAS service, and the parcel handler, which resolves remote actions.

2.1 Thread manager
Within HPX, a light-weight user-level threading mechanism is used to offer the necessary capabilities to implement all higher-level APIs. Predefined scheduling policies are provided to determine the optimal task scheduling for a given application and/or algorithm.
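The effect of light-weight threading can be sketched as follows (a hypothetical example, not from the paper): a large number of HPX tasks is spawned and the thread manager multiplexes them onto a small pool of worker OS threads; the scheduling policy is typically selected at startup, for example via HPX's --hpx:queuing command-line option, whose available values depend on the release.

// Illustrative sketch: many short user-level HPX tasks on a few worker threads.
// Header paths are an assumption; they differ between HPX versions.
#include <hpx/hpx_main.hpp>
#include <hpx/include/async.hpp>
#include <hpx/include/lcos.hpp>   // hpx::wait_all

#include <cstddef>
#include <vector>

int main()
{
    std::vector<hpx::future<void>> tasks;
    tasks.reserve(100000);

    // 100,000 OS threads would be prohibitive; as user-level HPX threads they
    // are cheap to create and are scheduled by the thread manager.
    for (std::size_t i = 0; i != 100000; ++i)
        tasks.push_back(hpx::async([] { /* a small unit of work */ }));

    hpx::wait_all(tasks);
    return 0;
}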