hpxMP, An Implementation of OpenMP Using HPX

Phylanx Year 3 Meeting Tianyi Zhang

LSU, Baton Rouge, LA, Nov. 7th. 2019

Contributors

[Figure: GitHub contributors graph for STEllAR-GROUP/hpxMP, 2014 to 2019, captured 10/28/2019. Source: https://github.com/STEllAR-GROUP/hpxMP/graphs/contributors]

Outline

• An Overview of hpxMP
• Performance Comparison
  • Daxpy Benchmark, Barcelona OpenMP Task Suite
  • llvm-OpenMP, GOMP, hpxMP
• Progress

Why hpxMP is Needed

[Diagram: Phylanx sits on top of HPX and Blaze. Blaze parallelizes with # pragma omp parallel for, which can be served by hpxMP in place of OpenMP.]

Layers of hpxMP and OpenMP implementation

[Diagram: layers of an OpenMP program running on hpxMP, adapted from https://www.openmp.org/wp-content/uploads/Intro_To_OpenMP_Mattson.pdf]

• User Layer: End User and Application
• Program Layer: OpenMP Directives, Compiler, OpenMP Library, Environment Variable
• HPX: Support for user-level threading
• System Layer: OS/system support for threading
• Machine: Processor1, Processor2, ..., ProcessorN

Other labels: Function, Create HPX, Schedule HPX thread

Use hpxMP underneath

Two ways to build

• Compile your program as an OpenMP program:
  clang++/g++ -fopenmp MyCode.cpp -o MyEXE

• Run it with libhpxmp.so preloaded:
  OMP_NUM_THREADS=N LD_PRELOAD=PATH…/libhpxmp.so ./MyEXE

[Diagram: an OS thread calls hpx::start; #pragma omp parallel and #pragma omp task then run on HPX threads.]

Examples And Implementation

Examples

#pragma omp task

[Diagram: each #pragma omp task is passed to the HPX Scheduler through hpx::applier::register_thread_nullary.]

    #pragma omp parallel
    {
        #pragma omp single
        {
            printf("A ");
            #pragma omp task
            { printf("race "); }
            #pragma omp task
            { printf("car "); }
            #pragma omp taskwait
            printf("is fun to watch ");
        } // end omp single
    } // end omp parallel

Possible output:
A car race is fun to watch
OR
A race car is fun to watch

A Simple OpenMP Program
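
The code listing for this slide is not reproduced in the text. As a minimal sketch, assuming the slide follows the well-known numerical-integration example for pi with the variables x and sum (the function name estimate_pi and its parameter are illustrative, not taken from the slide), a simple OpenMP program could look like this:

```cpp
#include <cassert>

// Hypothetical helper: midpoint-rule estimate of pi, using the fact that
// the integral of 4 / (1 + x^2) over [0, 1] equals pi.
double estimate_pi(long num_steps) {
    assert(num_steps > 0);
    const double step = 1.0 / static_cast<double>(num_steps);
    double x = 0.0;    // initialized before the loop; private to each thread inside it
    double sum = 0.0;  // initialized before the loop; combined across threads by the reduction

    #pragma omp parallel for private(x) reduction(+ : sum)
    for (long i = 0; i < num_steps; ++i) {
        x = (static_cast<double>(i) + 0.5) * step;  // midpoint of the i-th interval
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

The function contains no runtime-specific calls, so the same source can be compiled with -fopenmp and run on whichever OpenMP runtime is loaded. Without -fopenmp the pragma is ignored and the loop runs serially.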

Note: x and sum were initialized before the for loop.
Source: https://www.openmp.org/wp-content/uploads/Intro_To_OpenMP_Mattson.pdf

Performance?

hpxMP vs llvm-OpenMP vs GCC OpenMP

• v0.1.0: Nov. 2018
• v0.2.0: May 2019
• Current Version: Nov. 2019

Performance Comparison

Hardware and Software Configuration

http://www.hpc.lsu.edu/resources/hpc/system.php?system=QB2

Performance Comparison Daxpy Benchmark

# pragma omp parallel for

vector b = vector a * float c + vector b

Vector Size: 10^3 to 10^6
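
The kernel described above can be sketched as follows. This is an illustrative version, not the benchmark's actual source: the function name daxpy and the use of std::vector<double> are assumptions (the slide writes "float c", while the name Daxpy conventionally implies double precision).

```cpp
#include <cassert>
#include <vector>

// Hypothetical kernel: b = a * c + b, element by element.
void daxpy(const std::vector<double>& a, double c, std::vector<double>& b) {
    assert(a.size() == b.size());
    const long n = static_cast<long>(a.size());

    // Iterations are independent, so they can be divided among the threads.
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        b[i] = a[i] * c + b[i];
    }
}
```

A benchmark driver would call daxpy repeatedly for each vector size in the tested range and time the calls.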

Thread Pool Employed in Current Version

Performance Comparison

SORT Benchmark

# pragma omp task
# pragma omp taskwait

OPERATION: SORT ARRAY
OBJECT: 10^7 32-bit numbers
CUT OFF: 10 to 10^7. (The cut-off value determines when to perform a serial quicksort instead of dividing the array into 4 portions recursively, where tasks are created.)
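
The role of the cut-off can be illustrated with a simplified sketch. This is not the Barcelona OpenMP Task Suite code: the names task_sort and parallel_sort are hypothetical, std::sort stands in for the serial quicksort, and std::inplace_merge stands in for the benchmark's merge step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical recursive sort: below the cut-off it sorts serially,
// otherwise it splits the range into 4 portions and creates one task per portion.
void task_sort(std::uint32_t* data, std::size_t n, std::size_t cutoff) {
    assert(data != nullptr || n == 0);
    if (n <= cutoff || n < 4) {
        std::sort(data, data + n);  // serial path, no tasks created
        return;
    }
    const std::size_t q = n / 4;
    std::uint32_t* p1 = data + q;
    std::uint32_t* p2 = data + 2 * q;
    std::uint32_t* p3 = data + 3 * q;

    #pragma omp task
    task_sort(data, q, cutoff);
    #pragma omp task
    task_sort(p1, q, cutoff);
    #pragma omp task
    task_sort(p2, q, cutoff);
    #pragma omp task
    task_sort(p3, n - 3 * q, cutoff);
    #pragma omp taskwait  // all 4 portions must be sorted before merging

    std::inplace_merge(data, p1, p2);        // merge portions 1 and 2
    std::inplace_merge(p2, p3, data + n);    // merge portions 3 and 4
    std::inplace_merge(data, p2, data + n);  // merge the two halves
}

// One thread starts the recursion; the whole team executes the tasks.
void parallel_sort(std::vector<std::uint32_t>& v, std::size_t cutoff) {
    #pragma omp parallel
    #pragma omp single
    task_sort(v.data(), v.size(), cutoff);
}
```

A small cut-off creates many fine-grained tasks and stresses the runtime's task creation and scheduling, while a large cut-off leaves most of the work in the serial path.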

Implementation

Thread and Task Synchronization

• The Latch class is a downward counter which can be used to synchronize threads.
• The value of the counter is initialized on creation.
• Threads may block on the latch until the counter is decremented to zero, or simply decrement the counter.
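
The behaviour described in these bullets can be sketched with standard C++ primitives. This is a hypothetical illustration, not the hpx::latch implementation: the class name simple_latch and its member names are assumptions, and it blocks OS threads through std::condition_variable, whereas hpxMP synchronizes HPX threads.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical minimal latch: a downward counter used for synchronization.
class simple_latch {
public:
    // The value of the counter is initialized on creation.
    explicit simple_latch(std::ptrdiff_t count) : count_(count) {
        assert(count >= 0);
    }

    // Decrement the counter without blocking; wake waiters when it reaches zero.
    void count_down() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(count_ > 0);
        if (--count_ == 0) {
            cv_.notify_all();
        }
    }

    // Block until the counter has been decremented to zero.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ == 0; });
    }

    // Decrement the counter, then block until it reaches zero.
    void count_down_and_wait() {
        count_down();
        wait();
    }

    // Non-blocking check of whether the counter has reached zero.
    bool is_ready() {
        std::lock_guard<std::mutex> lock(mutex_);
        return count_ == 0;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::ptrdiff_t count_;
};
```

In a task setting, a parent could create the latch with the number of child tasks, have each child call count_down() when it finishes, and call wait() where a taskwait is required.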

Implementation #pragma omp task

OpenMP Performance Toolkit (OMPT)

OMPT is an application programming interface (API) for first-party performance tools. It makes it possible to construct powerful tools that work with an OpenMP implementation.

Pragmas Implemented Compared to v3.0

OpenMP v3.0
• #pragma omp parallel
• #pragma omp for
• #pragma omp sections
• #pragma omp single
• #pragma omp task
• #pragma omp master
• #pragma omp critical
• #pragma omp barrier
• #pragma omp taskwait
• #pragma omp atomic
• #pragma omp flush
• #pragma omp ordered
• #pragma omp threadprivate

OpenMP v4.0
• #pragma omp task depend

OpenMP v5.0
• #pragma omp task reduction
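
As a brief illustration of the OpenMP v4.0 task depend support listed above, the following sketch chains two tasks through a data dependency. The function name chained_tasks and the values are illustrative only.

```cpp
#include <cassert>

// Hypothetical example: the second task reads x, so the depend clauses
// force it to run after the task that writes x.
int chained_tasks() {
    int x = 0;
    int y = 0;

    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out : x) shared(x)
        x = 21;

        #pragma omp task depend(in : x) depend(out : y) shared(x, y)
        y = 2 * x;

        #pragma omp taskwait  // wait for both tasks before leaving the single region
    }
    assert(x == 21);
    return y;
}
```

Without the depend clauses the two tasks could run in either order, and y could be computed from the initial value of x.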

Runtime Library Implemented Compared to v3.0

Execution environment
• omp_set_num_threads
• omp_get_num_threads
• omp_get_max_threads
• omp_get_thread_num
• omp_get_num_procs
• omp_in_parallel
• omp_set_dynamic
• omp_get_dynamic
• omp_set_nested
• omp_set_schedule
• omp_get_schedule
• omp_get_thread_limit
• omp_set_max_active_levels
• omp_get_max_active_levels
• omp_get_level
• omp_get_ancestor_thread_num
• omp_get_team_size
• omp_get_active_level

Locks
• omp_init_lock
• omp_init_nest_lock
• omp_destroy_lock
• omp_destroy_nest_lock
• omp_set_lock
• omp_set_nest_lock
• omp_unset_lock
• omp_unset_nest_lock
• omp_test_nest_lock

Timing
• omp_get_wtime
• omp_get_wtick
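
A short sketch of how a few of the listed lock routines fit together is given below. The function name locked_count is hypothetical, and the serial stand-ins in the #else branch are an assumption of this sketch so that it also builds without -fopenmp; they are not part of hpxMP.

```cpp
#include <cassert>

#ifdef _OPENMP
#include <omp.h>
#else
// Serial stand-ins (assumption of this sketch) for builds without OpenMP.
typedef int omp_lock_t;
inline void omp_init_lock(omp_lock_t*) {}
inline void omp_set_lock(omp_lock_t*) {}
inline void omp_unset_lock(omp_lock_t*) {}
inline void omp_destroy_lock(omp_lock_t*) {}
#endif

// Hypothetical example: every iteration increments a shared counter
// while holding an OpenMP lock, so no update is lost.
int locked_count(int iterations) {
    assert(iterations >= 0);
    int counter = 0;
    omp_lock_t lock;
    omp_init_lock(&lock);

    #pragma omp parallel for shared(counter, lock)
    for (int i = 0; i < iterations; ++i) {
        omp_set_lock(&lock);    // blocks until the lock is available
        ++counter;
        omp_unset_lock(&lock);
    }

    omp_destroy_lock(&lock);
    return counter;
}
```

A reduction or an atomic update would be the usual choice for a plain counter; the lock is used here only to exercise the routines.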

Progress

• Optimize performance by introducing hpx::latch, ::intrusive_ptr
• Automate the collection and visualization of performance data
• Automate unit testing with CMake
• Enable GCC compiler support
• Implement OpenMP Performance Toolkit (OMPT)
• Implement the most recent OpenMP 5.0 features
• Update documentation
• Discover and resolve bugs

Publications and Talks

• Zhang, Tianyi, Shahrzad Shirzad, Patrick Diehl, R. Tohid, Weile Wei, and Hartmut Kaiser. "An Introduction to hpxMP: A Modern OpenMP Implementation Leveraging HPX, An Asynchronous Many-Task System." In Proceedings of the International Workshop on OpenCL, p. 13. ACM, 2019.

• Oct./2018, Seminar, Introduction to hpxMP, LSU, Baton Rouge, LA. https://www.youtube.com/watch?v=ajDGWPDrcxU&list=PL7vEgTL3FalbVFwzkXLHpBRKlcJNULW1g&index=12
• Feb./2019, Presentation, Scala, New Orleans, LA.
• May/2019, Lightning Talk, CppNow19, Aspen, CO. https://www.youtube.com/watch?v=SI0eyXydL3M&t=79s
• May/2019, Paper Presentation, IWOCL, Boston, MA.
• Sept./2019, Lightning Talk, CppCon19, Aurora, CO.

Questions?
