hpxMP, An Implementation of OpenMP Using HPX
Phylanx Year 3 Meeting
Tianyi Zhang
LSU, Baton Rouge, LA, Nov. 7th, 2019
Contributors
[Figure: GitHub contributor graph for STEllAR-GROUP/hpxMP, 2014 to 2019, captured 10/28/2019. Source: https://github.com/STEllAR-GROUP/hpxMP/graphs/contributors]
Outline
• An Overview of hpxMP
• Performance Comparison: Daxpy benchmark and Barcelona OpenMP Tasks Suite; llvm-OpenMP, GOMP, and hpxMP
• Progress
Why hpxMP Is Needed
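[Figure: Phylanx builds on HPX and on Blaze. Blaze parallelizes its loops with #pragma omp parallel for; hpxMP takes the place of the OpenMP runtime underneath, so that these loops run on HPX threads.]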
Layers of hpxMP and OpenMP Implementation
[Figure: layers of an OpenMP implementation running on hpxMP]
• User layer: end user and application
• OpenMP program layer: directives, compiler, OpenMP library, environment variables; the runtime library functions create HPX threads
• HPX: support for user-level threading; schedules the HPX threads
• System layer: OS/system support for shared memory and threading
• Machine: Processor1, Processor2, ..., ProcessorN
Source: https://www.openmp.org/wp-content/uploads/Intro_To_OpenMP_Mattson.pdf
Using hpxMP Underneath
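Build and run in two steps:
• Compile your program as an ordinary OpenMP program: clang++ -fopenmp MyCode.cpp -o MyEXE (or g++ -fopenmp MyCode.cpp -o MyEXE)
• Run it with hpxMP preloaded in place of the default OpenMP runtime: OMP_NUM_THREADS=N LD_PRELOAD=PATH…/libhpxmp.so ./MyEXE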
[Figure: the program starts on an OS thread; hpx::start launches the HPX runtime, after which the #pragma omp parallel region and each #pragma omp task run as HPX threads.]
Examples and Implementation
Examples
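#include <stdio.h>

int main() {
    #pragma omp parallel
    {
        #pragma omp single
        {
            printf("A ");
            #pragma omp task
            { printf("race "); }
            #pragma omp task
            { printf("car "); }
            #pragma omp taskwait
            printf("is fun to watch ");
        } // end omp single
    } // end omp parallel
    return 0;
}

Output: either "A race car is fun to watch" or "A car race is fun to watch". The two tasks may run in either order; #pragma omp taskwait guarantees that both have finished before "is fun to watch" is printed.
In hpxMP, each #pragma omp task is handed to the HPX scheduler through hpx::applier::register_thread_nullary.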
A Simple OpenMP Program
Note: x and sum were initialized before the for loop. Source: https://www.openmp.org/wp-content/uploads/Intro_To_OpenMP_Mattson.pdf

Performance?
hpxMP vs. llvm-OpenMP vs. GCC OpenMP
• v0.1.0, Nov. 2018
• v0.2.0, May 2019
• Current version, Nov. 2019
Performance Comparison
Hardware and Software Configuration
Test system: QB2 (details: http://www.hpc.lsu.edu/resources/hpc/system.php?system=QB2)
Performance Comparison: Daxpy Benchmark
Kernel: y = a * x + y on vectors, parallelized with #pragma omp parallel for
Vector size: 10^3 to 10^6
The current version employs a thread pool.
Performance Comparison: SORT Benchmark
Parallelized with #pragma omp task and #pragma omp taskwait
Operation: sort an array
Object: 10^7 32-bit numbers
Cutoff: 10 to 10^7. The cutoff determines when to switch to a serial quicksort instead of recursively dividing the array into 4 portions, with a task created for each portion.
Implementation
Thread and Task Synchronization
• The Latch class is a downward counter that can be used to synchronize threads.
• The value of the counter is set when the latch is created.
• Threads may either block on the latch until the counter reaches zero, or simply decrement the counter.
Implementation: #pragma omp task
OpenMP Tools Interface (OMPT)
OMPT is an application programming interface (API) for first-party performance tools. It makes it possible to build powerful tools that work with any OpenMP implementation supporting the interface.
Pragmas Implemented, Compared to OpenMP v3.0
OpenMP v3.0:
• #pragma omp parallel
• #pragma omp for
• #pragma omp sections
• #pragma omp single
• #pragma omp task
• #pragma omp master
• #pragma omp critical
• #pragma omp barrier
• #pragma omp taskwait
• #pragma omp atomic
• #pragma omp flush
• #pragma omp ordered
• #pragma omp threadprivate
OpenMP v4.0:
• #pragma omp task depend
OpenMP v5.0:
• #pragma omp task reduction
Runtime Library Routines Implemented, Compared to OpenMP v3.0
• Threads and teams: omp_set_num_threads, omp_get_num_threads, omp_get_max_threads, omp_get_thread_num, omp_get_num_procs, omp_in_parallel, omp_set_dynamic, omp_get_dynamic, omp_set_nested
• Scheduling, limits, and nesting levels: omp_set_schedule, omp_get_schedule, omp_get_thread_limit, omp_set_max_active_levels, omp_get_max_active_levels, omp_get_level, omp_get_ancestor_thread_num, omp_get_team_size, omp_get_active_level
• Locks: omp_init_lock, omp_init_nest_lock, omp_destroy_lock, omp_destroy_nest_lock, omp_set_lock, omp_set_nest_lock, omp_unset_lock, omp_unset_nest_lock, omp_test_nest_lock
• Timing: omp_get_wtime, omp_get_wtick
Progress
• Optimize performance by introducing hpx::latch and boost::intrusive_ptr
• Automate the collection and visualization of performance data
• Automate unit testing with CMake
• Enable GCC compiler support
• Implement the OpenMP Tools Interface (OMPT)
• Implement the most recent OpenMP 5.0 features
• Update documentation
• Discover and resolve bugs
Publications and Talks
• Zhang, Tianyi, Shahrzad Shirzad, Patrick Diehl, R. Tohid, Weile Wei, and Hartmut Kaiser. "An Introduction to hpxMP: A Modern OpenMP Implementation Leveraging HPX, An Asynchronous Many-Task System." In Proceedings of the International Workshop on OpenCL, p. 13. ACM, 2019.
• Oct. 2018, Seminar, Introduction to hpxMP, LSU, Baton Rouge, LA. https://www.youtube.com/watch?v=ajDGWPDrcxU&list=PL7vEgTL3FalbVFwzkXLHpBRKlcJNULW1g&index=12
• Feb. 2019, Presentation, Scala, New Orleans, LA.
• May 2019, Lightning Talk, CppNow19, Aspen, CO. https://www.youtube.com/watch?v=SI0eyXydL3M&t=79s
• May 2019, Paper Presentation, IWOCL, Boston, MA.
• Sept. 2019, Lightning Talk, CppCon19, Aurora, CO.
Questions?