HPX Applications and Performance Adaptation


Alice Koniges (Berkeley Lab / NERSC), Jayashree Ajay Candadai (Indiana University), Hartmut Kaiser (Louisiana State University), Kevin Huck (University of Oregon), Jeremy Kemp (University of Houston), Thomas Heller (Friedrich-Alexander University), Matthew Anderson (Indiana University), Andrew Lumsdaine (Indiana University), Adrian Serio (Louisiana State University), Michael Wolf (Sandia National Laboratories), Bryce Lelbach (Berkeley Lab), Ron Brightwell (Sandia National Laboratories), Thomas Sterling (Indiana University)

Contact email: [email protected]

ABSTRACT

This poster focuses on application performance under HPX. Developed worldwide, HPX is emerging as a critical new programming model combined with a runtime system that uses an asynchronous style to replace the traditional static communicating sequential processes execution model, namely MPI, with a fully dynamic and adaptive model exploiting the capabilities of future-generation performance-oriented runtime systems and hardware architecture enhancements. We compare application performance on standard supercomputers such as a Cray XC30 to implementations on next-generation testbeds. We also describe performance adaptation techniques and legacy application migration schemes. We discuss which applications benefit substantially from the asynchronous formulations, and which, because of communication patterns, inherent synchronization points, and other details, will require rewriting for newer architectures. Most of our applications show improvement in their new implementations, and the improved performance on next-generation hardware is even more pronounced.

1. EXTENDED ABSTRACT (800 WORDS)

Exascale programming models and runtime systems are at a critical juncture in development, as standard methods based on message passing and primarily homogeneous parallel architectures become less efficient. Recent experience by a number of groups has shown that systems based on lightweight tasks and data dependence are a good method for extracting parallelism and achieving performance. HPX is emerging as an important new path towards achieving exascale. HPX is developed worldwide and has growing support from funding agencies, such as the XPRESS project through the United States Department of Energy, the National Science Foundation, the Bavarian Research Foundation, and the European Horizon 2020 Programme. In this poster we present application performance of HPX codes on very recent architectures, including the current and prototypical next-generation Cray-Intel machines, and compare these results to more standard MPI+OpenMP implementations.

We focus our results on two platforms: NERSC's Edison machine (Cray XC30) and the Babbage (Intel Knights Corner, or KNC) testbed. We choose these systems to show how results on Edison compare to the upcoming Intel Knights Landing (KNL), which will more closely resemble the KNC in terms of the number of shared-memory cores, yet will not be treated as an accelerator attached to a more standard processor. See, e.g., Figs. 1 and 2. The application suite is a combination of miniapps and benchmark codes from a variety of application areas. In particular, we are studying GTC, PICSAR, Mini-ghost, an N-Body code, Mini-Tri, CMA, LULESH, and various kernels such as matrix transpose and fast multipole algorithms. For example, the work with the LULESH proxy application illustrates a many-tasking implementation, including oversubscription, asynchrony management semantics, and active messages [4]. The HPX runtime system is a modular, feature-complete, and performance-oriented representation of the ParalleX execution model (see Figure 3), targeted to parallel computing architectures [2].
The ParalleX execution model itself is motivated by the need to address the sources of performance and scaling degradation highlighted by the SLOWER performance model [5]. As an alternative to MPI, HPX incorporates routines to manage lightweight user threads, in addition to providing an Active Global Address Space (AGAS). One example usage of AGAS is in collision detection [1]. Most of the applications already have an MPI+OpenMP implementation, allowing comparison between different programming models. We have been profiling the codes and understand what problems cause scalability limitations in the OpenMP versions. The HPX versions usually scale much better (up to a factor of 2 compared to OpenMP, and up to a factor of 1.3 compared to MPI/OpenMP), as HPX allows us to mitigate some of the shortcomings of OpenMP/MPI. We note that a runtime system such as HPX starts when the application starts and completes when the application finishes. It is distinctly different from an operating system, and it can fully support machine usage at large HPC centers, where a variety of different application codes and potentially different runtimes execute in a time-sharing environment.

There are two implementations of the ParalleX execution model. One HPX implementation, known as HPX-3, extends the C++11/14 standard to facilitate distributed operations, enable fine-grained constraint-based parallelism, and support runtime-adaptive resource management. It additionally integrates OpenCL (necessary for using GPUs) within its execution through HPXCL. A second HPX implementation, known as HPX-5, is C-language based and is tightly integrated with the fast-messaging library Photon, as opposed to using MPI as a network layer.

Exascale systems require a new approach to performance observation, analysis, and runtime decision making in order to optimize for performance and efficiency. The standard model of multiple operating system processes and threads observing themselves in a first-person manner, while writing out performance profiles or traces for offline analysis, does not adequately capture the highly concurrent and dynamic behavior of HPX, nor does it provide opportunities for runtime adaptation. Our approach, called APEX (Autonomic Performance Environment for eXascale) [3], includes methods for information sharing between the layers of the software stack, from the hardware through the operating and runtime systems, all the way to domain-specific or legacy applications. Critically, APEX provides a mechanism for runtime adaptation called the Policy Engine, whereby events or periodic checks can trigger algorithmic changes, resource reallocation, or scheduling changes when policy conditions are met (see Figure 4).

Finally, we discuss our efforts in running OpenMP on top of HPX, as shown in Figure 5. We discuss both of these in the context of our chosen application suite. In summary, we report application performance with HPX as a demonstration of a new programming-model approach to advance us towards exascale computing. We explore how replacing the static communicating sequential processes execution model, namely MPI, with a fully dynamic and adaptive model exploits the capabilities of future-generation performance-oriented runtime systems and hardware architecture enhancements.

Acknowledgements: Funding by the DOE Office of Science through grants DE-SC0008714, DE-SC0008809, DE-AC04-94AL85000, DE-SC0008596, DE-SC0008638, and DE-AC02-05CH11231; by the National Science Foundation through grants CNS-1117470, AST-1240655, CCF-1160602, IIS-1447831, and ACI-1339782; and by Friedrich-Alexander-University Erlangen-Nuremberg through grant H2020-EU.1.2.2. 671603.

2. REFERENCES

[1] M. Anderson, M. Brodowicz, L. Dalessandro, J. DeBuhr, and T. Sterling. A dynamic execution model applied to distributed collision detection. In 29th International Supercomputing Conference (ISC 2014), Leipzig, Germany, Jun 2014.

[2] C. Dekate, M. Anderson, M. Brodowicz, H. Kaiser, B. Adelstein-Lelbach, and T. Sterling. Improving the scalability of parallel n-body applications with an event-driven constraint-based execution model. International Journal of High Performance Computing Applications, 26:319-332, April 2012.

[3] K. Huck, S. Shende, A. Malony, H. Kaiser, A. Porterfield, R. Fowler, and R. Brightwell. An early prototype of an autonomic performance environment for exascale. In Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers (ROSS '13), pages 8:1-8:8, New York, NY, USA, 2013. ACM.

[4] T. Sterling, M. Anderson, P. K. Bohan, M. Brodowicz, A. Kulkarni, and B. Zhang. Towards exascale co-design in a runtime system. In Exascale Applications and Software Conference, Stockholm, Sweden, Apr 2014.

[5] T. Sterling, D. Kogler, M. Anderson, and M. Brodowicz. SLOWER: A performance model for Exascale computing. Supercomputing Frontiers and Innovations, 1:42-57, Sep 2014.

Figure 1: Comparison of various programming models on the NERSC Edison machine (left) and the Xeon Phi Babbage machine (right) for the matrix transpose example (MPI, HPX-PWC, HPX-ISIR; time per iteration vs. cores, higher is better). The improved performance on Babbage bodes well for exascale challenges.

[Figure: MiniGhost weak scaling (40 variables, 20 timesteps, 200x200x200 grid, 10% reduction): GFLOPS vs. number of localities (2 localities per node, 8 cores each) for HPX, MPI/OpenMP, and theoretical peak.]
