2020 IEEE/ACM 6th Workshop on the LLVM Infrastructure in HPC (LLVM-HPC) and Workshop on Hierarchical Parallelism for Exascale Computing (HiPar)

Extending the LLVM/Clang Framework for OpenMP Metadirective Support

Alok Mishra (Stony Brook University, Stony Brook, NY, USA) [email protected]
Abid M. Malik (Brookhaven National Laboratory, Brookhaven, USA) [email protected]
Barbara Chapman (Stony Brook University, Stony Brook, NY, USA; Brookhaven National Laboratory, Brookhaven, USA) [email protected]

Abstract—OpenMP 5.0 introduces many new directives to meet the demand of emerging high performance computing systems. Among these new directives, the metadirective and declare variant directives are important to control the behavior of a given application by compile-time adaptation based on the OpenMP context. The metadirective directive allows the selection of OpenMP directives based on the enclosing OpenMP context as well as on user-defined conditions. The declare variant directive declares a specialized variant of a base function and specifies the context in which that specialized variant is used. Support for these directives is available in a few compilers, with some limitations.

Support for metadirective is not available in Clang. This paper presents our implementation of the metadirective directive in Clang. The implementation supports the OpenMP 5.0 metadirective specification and, in addition, a dynamic extension to the user-specified conditions. A dynamic evaluation of user-defined conditions gives programmers more freedom to express a range of adaptive algorithms that improve the overall performance of a given application. For example, a program can estimate the cost of execution, with respect to time taken or energy consumption, of a kernel based on some dynamic or static variables and decide whether or not to offload the kernel to a GPU using the metadirective. Since there is a significant lack of knowledge about the usage and performance analysis of metadirective, this work also studies its impact on application characteristics. To achieve this, we have modified several benchmark codes in the Rodinia benchmark suite. The Rodinia benchmark suite includes applications and kernels which target multi-core CPU and GPU platforms, which helps programmers study emerging computing platforms. Our modification to the Rodinia benchmarks enables the application developer to study the behavior of metadirective. Our analysis reveals that the main advantage of the dynamic implementation of metadirective is that it adds minimal to no overhead to the user application, in addition to allowing flexibility for programmers to introduce portability and adaptability into their code. Our modification of the Rodinia benchmark suite also provides several guidelines for programmers to achieve better performance with metadirective.

Index Terms—OpenMP 5.0, Metadirective, LLVM, Clang, Dynamic context.

I. INTRODUCTION

To meet the demand of productive parallel computing on existing and emerging high performance computing systems, the OpenMP standard has evolved significantly in recent years [1]. Since the creation of the standard in 1997, which specified a handful of directives, a substantial number of new constructs have been introduced and most existing APIs have been enhanced in each revision. The latest version of OpenMP, 5.0, released in 2018, has more than 60 directives [2]. Variant directives are one of the major features introduced in the OpenMP 5.0 specification to help programmers improve performance portability. These directives enable adaptation of OpenMP pragmas and user code at compile time and runtime. The OpenMP specification defines traits to describe active OpenMP constructs, execution devices, functionality provided by an implementation, and context selectors based on the traits and user-defined conditions. It also defines variant directives like metadirective and declare variant which use context selectors to choose among directives or functions.

• The metadirective directive is an executable directive that conditionally resolves to another directive at compile time by selecting from multiple directive variants based on traits that define an OpenMP condition or context.
• The declare variant directive has similar functionality as metadirective but selects a function variant at the call-site based on context or user-defined conditions.

The mechanism provided by these directives for selecting variants is more convenient to use than C/C++ pre-processing since it directly supports variant selection in OpenMP and allows an OpenMP compiler to analyze and determine the final directive from variants and context, as shown in Figure 1.

However, OpenMP 5.0 restricts context traits to be completely resolvable at compile time. This constrains the potential to optimize an OpenMP application based on runtime behavior or input data. An extension to allow runtime adaptation, based on properties like system architecture or input characteristics, may increase the efficiency of an application. Applications that would benefit from this feature include those that use traits based on problem size, loop count, and the number of threads. For example, most libraries parallelize and optimize matrix multiplications depending on the sizes of the input matrices. In this work we explore the possibility of extending the metadirective user condition to dynamically resolve the context selector based on runtime attributes.

The high performance computing community has been exploring novel platforms to push performance forward [3].

// code adaptation using preprocessing directives
int v1[N], v2[N], v3[N];
#if defined(nvptx)
  #pragma omp target teams distribute parallel loop map(to:v1,v2) map(from:v3)
  for (int i = 0; i < N; i++)
    v3[i] = v1[i] * v2[i];
#else
  #pragma omp target parallel loop map(to:v1,v2) map(from:v3)
  for (int i = 0; i < N; i++)
    v3[i] = v1[i] * v2[i];
#endif

// code adaptation using metadirective in OpenMP 5.0
int v1[N], v2[N], v3[N];
#pragma omp target data map(to:v1,v2) map(from:v3)
#pragma omp metadirective \
    when(device={arch("nvptx")}: target teams distribute parallel loop) \
    default(target parallel loop)
for (int i = 0; i < N; i++)
  v3[i] = v1[i] * v2[i];

Fig. 1: C Pre-processing vs OpenMP Metadirective usage comparison.
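For completeness, the companion declare variant mechanism mentioned above can be sketched in the same style. The listing below is our own illustration, not code from the paper; the function names, signatures, and loop bodies are assumptions. It declares a GPU-specialized variant of a base function that the compiler substitutes at call sites whose OpenMP context matches the selector.

// Variant tuned for NVIDIA devices (details of the tuning omitted in this sketch).
void vec_mult_gpu(int *v1, int *v2, int *v3, int n)
{
  for (int i = 0; i < n; i++)
    v3[i] = v1[i] * v2[i];
}

// Base function: calls to vec_mult() made in a context whose device
// architecture trait is "nvptx" are replaced by calls to vec_mult_gpu().
#pragma omp declare variant(vec_mult_gpu) match(device={arch("nvptx")})
void vec_mult(int *v1, int *v2, int *v3, int n)
{
  for (int i = 0; i < n; i++)
    v3[i] = v1[i] * v2[i];
}

When vec_mult is called from code compiled for an nvptx device (for example inside a target region), the variant is selected; elsewhere the base definition is used.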

Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) have been widely used as accelerators for computationally intensive applications. The heterogeneous cluster is one of the most promising platforms as it combines characteristics of multiple processing elements to meet the requirements of various applications. In this context, a self-adaptive framework for heterogeneous clusters would be an ideal solution [4]. OpenMP context-based directives like metadirective can provide an excellent foundation for building such frameworks.

Current compilers do provide implementations of OpenMP 5.0 with some limitations, but as far as we know only the ROSE [5] source-to-source compiler infrastructure has a partial implementation of metadirective [6]. Other compilers which claim active development of the OpenMP 5.0 specification are:

• Cray – The latest CCE 10.0 compiler has partial support for OpenMP 5.0 [7]. However, the list of enhancements claimed by the compiler does not include the metadirective.
• GNU – GCC 10 [8] adds a number of newly implemented OpenMP 5.0 features on top of the GCC 9 release, such as the conditional lastprivate clause, the scan and loop directives, the order(concurrent) and use_device_addr clauses, the if clause on the simd construct, and partial support for the declare variant directive, and it is getting closer to the complete OpenMP 5.0 standard. However, metadirective is not implemented yet.
• Intel – The Intel compiler provides most of the OpenMP version 5.0 Technical Report 4 features [9]. Their implementation of the OpenMP 5.0 specification is still incomplete and metadirective is not included yet.

The Clang project provides a language front-end and tooling infrastructure for languages in the C language family (C, C++, Objective C/C++, OpenCL, CUDA, OpenMP, and RenderScript) for the LLVM project [10]. Clang is evolving rapidly under the Exascale Computing Project's (ECP) [11] SOLLVE project [12] and now supports many OpenMP 5.0 features that are essential for ECP applications. These features do not yet include metadirective and, to the best of our knowledge, our work is the first that successfully adds the OpenMP 5.0 specified metadirective to the Clang framework. In addition to this, we also implement a dynamic extension to the user-specified conditions (detailed discussion in Section III-E). The main contributions of our work are:

• Enhancing Clang's capability by implementing OpenMP's metadirective directive in the LLVM framework.
• The implementation supports the compile-time selection of hardware support and compiler implementations for code portability across different architectures.
• The work also presents a novel implementation of the semantics for user-defined contexts to enable runtime directive selection using LLVM.
• We also modified the Rodinia benchmarks to study the impact of the directives. These benchmarks can provide a guideline to end users to help them apply these features in real applications.

The remainder of this paper is organized as follows. Section II presents the motivation for using metadirective and hence the need for it to be supported in compilers, especially in Clang. Section III gives details of the implementation in Clang. Section IV introduces our experimental setup and Section V provides, and analyzes, the results of our experiments. Section VI gives related work. Finally, Section VII presents our conclusion and future directions.

II. MOTIVATION

OpenMP is an integral part of many important HPC codes that rely on hybrid programming models (e.g. MPI + OpenMP). Therefore, tuning an OpenMP code to get better per-node performance for given resources is an important research problem.

In this section, we explain how the metadirective is relevant for tuning resource-constrained OpenMP applications. Whether the OpenMP metadirective is beneficial depends on the response to the following questions:

• Does the best configuration for a given OpenMP region remain the same across different resource levels and workloads?
• Does the performance gain due to the best configuration persist across all configurations?

To answer these questions, we did some experiments with the BT benchmark [13]. We took an OpenMP region from the BT benchmark application and ran it with different power levels, or power caps, using different numbers of threads, scheduling policies, and chunk sizes (150 different configurations). The region studied belongs to the x_solve function and has coarse grain parallelism, i.e., the outermost loop is parallelized. Figure 2 compares the execution time using the best identified configuration with the default configuration at different power levels. The default configuration uses the maximum number of available threads, static scheduling, and chunk sizes calculated dynamically by dividing the total number of loop iterations by the number of threads. The figure clearly shows that the best configuration is different from the default configuration at all power levels.

Fig. 2: Execution time comparison for the x_solve region of BT using different OpenMP runtime configurations at different power levels. Smaller value is better. The function was run on Intel Sandy Bridge [4].

It also shows that the best configuration improves the execution time of the parallel region by up to 20% compared to the default configuration at the same power level. Also, we can see that the best configuration at a lower power level gives better execution time performance than the default configuration at the maximum power level prescribed by the manufacturer, or Thermal Design Power (TDP). For example, the best configuration at a 70W power cap improves execution time by almost 9% in comparison to the default configuration at TDP (115W in our case).

We also experimented with OpenMP regions from other NAS Parallel Benchmark applications [13] using different runtime configurations. We observed that a significant number of the OpenMP regions showed similar behavior. We observed these OpenMP regions to have poor load balancing and cache behavior with the default configuration. We also saw that these poor behaviors persist across different power levels and workloads for these kernels with the default configuration. As a result, irrespective of power level or workload size, an optimal configuration always shows consistent performance improvement compared to the default configuration for these kernels. However, we observed that the optimal configurations for these kernels change across different power levels and workloads.

In a future HPC facility, the application workload may change dynamically. If the facility is operating under a resource constraint, the resource manager may add or remove nodes and adjust their power level/frequency dynamically. To get the best per-node performance for a given resource constraint, the runtime configurations need to be changed dynamically. Our proposed OpenMP metadirective extension of dynamically selecting the user condition gives the user the ability to achieve this efficiently with minimal code change.

The other strong motivation for metadirective is efficient portability across different hardware architectures. The key application kernels to be ported on a given device need to be coded specific to the device's architecture for the best performance gain. The metadirective gives the user the ability to write multiple kernels, each specific to a different device, and the compiler then selects the required kernel during the compilation phase based on the available back-end device. Figure 3 explains this more clearly. The implementation selector set is specified in the when clause to distinguish between AMD and NVIDIA platforms. Additionally, specific architectures are specified with the device selector set.

Fig. 3: Compile time selection of hardware support.
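The code listing for Figure 3 did not survive the text extraction; only its opening "for (idev=0; idev" fragment remained. Based on the description in the surrounding text (an implementation selector set distinguishing AMD and NVIDIA, a device selector set naming specific architectures, and num_teams/thread_limit clauses on the vendor-specific variants), a sketch of what the listing could look like is given below. The clause values, architecture strings, vendor identifiers, device loop bound, and loop body are our assumptions, not the paper's original code.

// Sketch: dispatch the kernel to each device, selecting vendor-specific
// launch parameters at compile time via metadirective.
for (int idev = 0; idev < ndev; idev++) {
  #pragma omp metadirective \
      when(implementation={vendor(nvidia)}, device={arch("nvptx64")}: \
           target teams distribute parallel for device(idev) \
               num_teams(512) thread_limit(32)) \
      when(implementation={vendor(amd)}, device={arch("amdgcn")}: \
           target teams distribute parallel for device(idev) \
               num_teams(512) thread_limit(64)) \
      default(target teams distribute parallel for device(idev))
  for (int i = 0; i < N; i++)
    v3[i] = v1[i] * v2[i];
}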

In the code, different teams constructs are employed as determined by the metadirective directive. The number of teams is restricted by a num_teams clause, and a limit is also set by a thread_limit clause, for the vendor AMD and NVIDIA platforms and specific architecture traits. Otherwise, just the teams construct is used without any clauses, as prescribed by the default clause.

III. METADIRECTIVE IMPLEMENTATION IN CLANG

A. Metadirective

The syntax of metadirective is shown in Figure 4, where clause is one of the when clause or the default clause as defined. In the when clause, context-selector-specification specifies a context selector. In the when and default clauses, directive-variant has the form shown in Figure 4. It specifies a directive variant that specifies an OpenMP directive with clauses that apply to it.

Syntax:
  #pragma omp metadirective [clause[[,] clause] ... ] new-line

Metadirective Clauses:
  when(context-selector-specification: [directive-variant])
  default(directive-variant)

Directive-variant:
  directive-name [clause[ [,] clause] ... ]

Fig. 4: Metadirective Syntax

A metadirective, by definition [2], is a directive that acts as though it is either ignored or substituted by the directive variant specified in one of the when or default clauses appearing on the metadirective. Whether a particular directive is selected or not depends upon the context selector as specified in the when clauses.

B. Context Selector

The syntax of a context selector specification is shown in Figure 5. A trait-selector can be one of the following:
1) construct
2) device
3) implementation
4) user

The construct selector set defines the construct traits that should be active in the OpenMP context. We can define the following selectors in the construct set: target, teams, parallel, for and simd. Each selector's properties are the same properties that are specified for the respective trait.

The device set includes traits that define the characteristics of the device being targeted by the compiler at that point in the program. A device set can have one of the following trait properties:
1) kind: This property specifies the general kind of the device - host, nohost, cpu, gpu, fpga.
2) isa: This property specifies the Instruction Set Architectures supported by the device, which is implementation defined.
3) arch: This property specifies the architectures supported by the device.

The implementation set includes traits that describe the functionality supported by the OpenMP implementation at that point in the program. An implementation set can have one of the following trait properties:
1) vendor: This property specifies the vendor identifiers of the implementation.
2) extension: This property specifies vendor-specific extensions to the OpenMP specification.
3) A trait with a name that is identical to the name of any clause that can be supplied to the requires directive.

context-selector-specification: trait-set-selector[,trait-set-selector[,...]]

trait-set-selector: trait-set-selector-name={trait-selector[, trait-selector[, ...]]}

trait-selector: trait-selector-name[([trait-score: ] trait-property[, trait-property[, ...]])]

trait-score: score(score-expression)

Fig. 5: Context Selector Syntax
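As a concrete illustration of this grammar (our own example, not one taken from the paper), the following when clause combines a device set, an implementation set with a score, and a user-defined condition:

when(device={kind(gpu), arch("nvptx64")}, \
     implementation={vendor(score(100): nvidia)}, \
     user={condition(N > 1000)}: \
     target teams distribute parallel for)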

The user selector set defines the condition selector that provides additional user-defined conditions.

For each when clause that appears in a metadirective, the specified directive variant, if present, is a candidate to replace the metadirective, if the corresponding context selector is compatible with the OpenMP context. A given context selector is compatible with a given OpenMP context if the following conditions are satisfied:
1) All selectors in the user set of the context selector are true. (See Section III-E for our extension to this rule.)
2) All selectors in the construct, device, and implementation sets of the context selector appear in the corresponding trait set of the OpenMP context.
3) For each selector in the context selector, its properties are a subset of the properties of the corresponding trait of the OpenMP context.
4) Selectors in the construct set of the context selector appear in the same relative order as their corresponding traits in the construct trait set of the OpenMP context.

C. Clang/LLVM

For our implementation¹, we used the current development version of the LLVM monorepo (version 12.0.0) from its git repository - https://github.com/llvm/llvm-project. LLVM follows the popular three-phase design (Figure 6) for a traditional static compiler, whose major components are the front end, the optimizer and the back end.

¹ Github link - https://github.com/almishra/metadirective.git. A patch has been submitted to the LLVM mono-repo and is currently under review.

[Figure 6 shows the pipeline: C/C++ source, Frontend, Optimizer, Backend.]

Fig. 6: Common 3-Phase Compiler Design

Clang is the frontend for C language family compilers for LLVM. Clang parses the source code, checks it for errors, and builds an Abstract Syntax Tree (AST) to represent the input code. This AST is then translated into an intermediate LLVM IR, upon which all LLVM optimizations are performed, and finally the backend translates this to generate the machine code.

A core design principle for Clang is the use of a library-based architecture, where different parts of the frontend are separated cleanly into several libraries, which can then be combined for different needs and usages. This approach encourages good interfaces and makes it easier for new developers to get involved. To implement a new directive, like metadirective, we need to modify several of these libraries. We primarily updated the following Clang libraries:
1) libast - This library provides classes to represent the AST nodes and various helpers for analyzing and manipulating the AST.
2) libparse - The parsing library invokes coarse-grained 'Actions' provided by libsema.
3) libsema - Semantic analysis provides a set of parser 'Actions' to build a standardized AST for programs.
4) libcodegen - This library lowers the AST to LLVM IR for optimization and code generation.

D. Design

The major design challenge lies in parsing and evaluating the context selector. We do this in libparse, where we parse the context selector and save all the information in a data structure (OMPTraitInfo) which represents all the traits. The outer level of this data structure is an ordered collection of selector sets, each with an associated kind and an ordered collection of selectors. A selector has a kind, an optional score/condition, and an ordered collection of properties. We save this information to be used by libsema to make the decision on which directive variant needs to be selected.

In libsema, we check the kind of the selector and its properties. If the selector is of the kind user_condition, we read its condition and try to resolve it into an integer constant expression. If the expression is true, the user condition passes successfully. Otherwise, it is not selected. For all other selector kinds, we compare the properties associated with the selector with the compiler parameters. For instance, if the context selector in a when clause is specified as device={arch("nvptx")}, then its selector kind will be device arch and its property will be "nvptx". In this case we will resolve the selector based upon whether the architecture of the device for which the code is being built is "nvptx" or not.

Based on the resolution of the context selector, we decide at compile time which directive variant needs to be selected. If only one compatible context selector specified by a when clause has the highest score and it specifies a directive variant, the directive variant will replace the metadirective. If more than one when clause specifies a compatible context selector that has the highest computed score and at least one specifies a directive variant, the first directive variant specified in the lexical order of those when clauses will replace the metadirective. If no context selector from any when clause is compatible with the OpenMP context and a default clause is present, the directive variant specified in the default clause will replace the metadirective. We parse the statement associated with metadirective in the context of the selected directive variant. If no directive variant is selected to replace a metadirective according to the above rules, the metadirective has no effect on the execution of the program.
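To make the bookkeeping described above concrete, the following C sketch mirrors the shape of the trait information that libparse records and libsema consults. It is a simplified conceptual model written for this discussion, not the actual OMPTraitInfo class in Clang, and all names in it are illustrative.

/* Conceptual model of the parsed context-selector information:
   an ordered collection of selector sets, each holding selectors
   that carry a kind, an optional score/condition, and properties. */
enum SetKind { SET_CONSTRUCT, SET_DEVICE, SET_IMPLEMENTATION, SET_USER };

struct Selector {
    const char  *kind;                /* e.g. "arch", "vendor", "condition" */
    const char  *score_or_condition;  /* optional score or user-condition text */
    const char **properties;          /* e.g. {"nvptx64"} */
    int          num_properties;
};

struct SelectorSet {
    enum SetKind     kind;
    struct Selector *selectors;       /* kept in source order */
    int              num_selectors;
};

struct TraitInfo {
    struct SelectorSet *sets;         /* ordered collection of selector sets */
    int                 num_sets;
};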

// Code written using metadirective in OpenMP 5.0
#pragma omp metadirective \
    when(user={condition(N>1000)}: target teams distribute parallel for) \
    when(user={condition(N>100 && N<=1000)}: parallel for) \
    default()
for (int i = 0; i < N; i++)
  { /* loop body */ }

/************************************************************/

// Code resolved as if-then-else statement
if (N > 1000) {
  #pragma omp target teams distribute parallel for
  for (int i = 0; i < N; i++)
    { /* loop body */ }
} else if (N > 100 && N <= 1000) {
  #pragma omp parallel for
  for (int i = 0; i < N; i++)
    { /* loop body */ }
} else {
  for (int i = 0; i < N; i++)
    { /* loop body */ }
}

Fig. 7: OpenMP Metadirective Dynamic User Selection.

E. Dynamic User Condition

Yan et al. explored a strategy for extending the metadirective [6] by relaxing its restriction to compile-time-only selection and allowing runtime evaluation of user-defined conditions. They implemented this in the ROSE compiler [5]. We have implemented the same feature in LLVM to establish that dynamic selection of user-defined conditions provides greater flexibility to users. By definition, all the context selectors in metadirective need to be resolved at compile time. Based upon the compatibility of the context selector, as defined above in Section III-B, the compiler will replace metadirective with the directive variant specified by the user. In this extension, we extend metadirective to resolve a user condition at runtime. So instead of substituting metadirective with the given directive variant, we create an if-then-else statement whose conditions will be evaluated and resolved during execution. For this, we parse all the when and default clauses in the metadirective and create statements for each of the directive variants specified in the clauses. An example of this resolution is shown in Figure 7.

The major design challenge lies in the implementation of the dynamic user selection code. In the static case, we only need to parse and evaluate the context selector. Based upon the outcome, we generate just one directive statement. In the dynamic case, we need to parse all the when and default clauses and generate a directive statement for each of the valid clauses, and build a corresponding if-then-else statement for it. After careful consideration of the above mentioned libraries, we designed our implementation as follows:
1) In libast, we define the classes corresponding to the AST nodes for the metadirective directive and the when & default clauses.
2) In libparse, we parse the AST node and collect information related to all the when and default clauses and the statement associated with metadirective. For each when clause we parse and store the context selector. For each when and default clause we parse the directive variant and create a statement corresponding to this directive variant and store it. If no directive variant is provided, we store the statement associated with metadirective as it is.
3) In libsema, we gather all the stored directive variants and create the if-then-else statement based upon the conditions parsed in the context selectors. We store this statement in the AST node of metadirective.
4) In libcodegen, we read the AST node for metadirective and generate LLVM IR code corresponding to the stored if-then-else statement.

F. Example of Dynamic Metadirective

Let us consider a well known scenario where a programmer is writing a library function to multiply two matrices, where the size of the matrices will only be known at runtime. If the size of the matrices is small (< 10), the programmer might want to run the multiplication in serial to avoid parallel overhead. If the size of the matrices is fairly large (>= 10 && < 100), the programmer might want to run the computation in parallel while collapsing the two loops on the CPU. But if the size of the matrices is larger (> 100), and the code can be built for offloading to an NVIDIA GPU, then the programmer might want to offload the computation onto the GPU, and otherwise continue with the parallel execution on the CPU. The resulting code for such a scenario will look like the code shown in Figure 8a.

[Figures 8a and 8b: matrix multiplication without metadirective and with static metadirective; only the opening "if (N < 10) { // Block 1 ..." of each listing survived extraction.]
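Based on the scenario described in Section III-F, a sketch of the version without metadirective (Figure 8a) could look like the following. The loop bodies, bounds, and data layout are our assumptions rather than the paper's original code; A, B and C are assumed to be N×N matrices stored as 1-D arrays.

// Block 1: small matrices, run serially.
if (N < 10) {
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        C[i*N+j] += A[i*N+k] * B[k*N+j];
// Block 2: medium matrices, parallel on the CPU with collapsed loops.
} else if (N >= 10 && N < 100) {
  #pragma omp parallel for collapse(2)
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        C[i*N+j] += A[i*N+k] * B[k*N+j];
// Block 3: large matrices, offload when built for nvptx, else parallel CPU.
} else {
#if defined(nvptx)
  #pragma omp target teams distribute parallel for collapse(2)
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        C[i*N+j] += A[i*N+k] * B[k*N+j];
#else
  #pragma omp parallel for collapse(2)
  for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
      for (int k = 0; k < N; k++)
        C[i*N+j] += A[i*N+k] * B[k*N+j];
#endif
}

Figure 8b, per the text, is identical except that Block 3 is expressed with a single static metadirective choosing between the offload and CPU-parallel variants.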

a: Without metadirective        b: With static metadirective

#pragma omp metadirective \
    when(device={arch("nvptx64")}, user={condition(N>=100)}: \
        target teams distribute parallel for) \
    when(user={condition(N>=10 && N<100)}: \
        parallel for collapse(2)) \
    default()
for (int i = 0; i < N; i++)
  for (int j = 0; j < N; j++)
    { /* multiply-accumulate over k */ }

c: With dynamic metadirective

Fig. 8: Different implementations of matrix multiplication

Since the size of the matrices can only be resolved at runtime, there is no way the programmer can use preprocessing directives alone to decide which block to compile. They need to write the same block of code multiple times, with different directives. Even with static metadirective code, unfortunately the programmer will only be able to use metadirective in the last else statement (Block 3 in Figure 8b). But with dynamic metadirective, programmers can write much cleaner code (Figure 8c) without replicating the blocks multiple times. The code in Figure 8c will be resolved and substituted by the compiler to act like the code in Figure 8a.

In our implementation of dynamic metadirective we have made sure that all properties of static metadirective, as mentioned in the OpenMP 5.0 specification, are maintained.

IV. EXPERIMENTATION

To test and evaluate our implementation, we used the Summit supercomputing cluster at the Oak Ridge Leadership Computing Facility [26] and the SeaWulf computational cluster at Stony Brook University [27]. Summit has a hybrid architecture, where each node contains 2 IBM POWER9 CPUs and 6 NVIDIA Volta (V100) GPUs, all connected together with NVIDIA's high-speed NVLink. We use GCC version 6.4.0 and CUDA version 10.1.105 to build our implementation of LLVM. A SeaWulf node, on the other hand, contains 2 Intel Xeon E5-2683v3 CPUs and 4 NVIDIA Tesla (K80) GPUs, all interconnected via a high-speed InfiniBand® network by Mellanox® Technologies. We use GCC version 6.5.0 and CUDA version 9.1.185 to build our implementation of LLVM.

We compiled the three codes shown in Figure 8 on Summit to be executed using 1 NVIDIA V100 GPU. The size of the executable without metadirective was 204448 kB, with static metadirective 204456 kB, and with dynamic metadirective 204712 kB. We can observe that the overhead on the size of the executable for dynamic metadirective is minimal. When we ran these executables for varying input sizes, we found that there was no overhead added due to the use of static or dynamic metadirective, as can be seen in Figure 9.

B+Tree (b+tree) [14] – Search domain. This is a data structure used as an index to facilitate fast access to the elements of a larger body of data.
Back Propagation (backprop) [15] – Pattern Recognition domain. Back propagation is a machine-learning algorithm that trains the weights of connecting nodes on a layered neural network.
Breadth First Search (bfs) [16] – Graph Algorithm domain. This benchmark provides the GPU implementation of the BFS algorithm, which traverses all the connected components in a graph.
Heart Wall (heartwall) [17] – Medical Imaging domain. This application tracks the movement of a mouse heart over a sequence of 104 609×590 ultrasound images to record response to the stimulus.
Hotspot (hotspot) [18] – Physics Simulation domain. We re-implemented the transient differential equation solver from HotSpot using target offloading directives for GPU.
K-means (kmeans) [19] – Data Mining domain. K-means is a clustering algorithm used extensively in data-mining and elsewhere, important primarily for its simplicity. Many data-mining algorithms show a high degree of data parallelism.
k-Nearest Neighbor (knn) [20] – Data Mining domain. In the implementation of this application, it finds the k-nearest neighbors from an unstructured data set.
Lava MD (lavamd) [21] – Molecular Dynamics domain. The code calculates particle potential and relocation due to mutual forces between particles within a large 3D space.
LU Decomposition (lud) – Linear Algebra domain. This benchmark is a good example where multiple kernels are called from within a loop and some data shared by these kernels are also used on the host.
Needleman Wunsch (nw) [22] – Bioinformatics domain. Needleman-Wunsch is a nonlinear global optimization method for DNA sequence alignments.
Particle Filter (p-filter) [23] – Medical Imaging domain. This particular implementation is optimized for tracking cells, particularly leukocytes and myocardial cells.
Path Finder (p-finder) – Grid Traversal domain. PathFinder uses dynamic programming to find a path on a 2-D grid from the bottom row to the top row with the smallest accumulated weights, where each step of the path moves straight ahead or diagonally ahead.
Speckle Reducing Anisotropic Diffusion (srad) [24] – Image Processing domain. This is a diffusion method for ultrasonic and radar imaging applications based on partial differential equations (PDEs).
Stream Cluster (stream) [25] – Data Mining domain. For a stream of input points, it finds a predetermined number of medians so that each point is assigned to its nearest center. The quality of the clustering is measured by the sum of squared distances (SSQ) metric.
Three Matrix Multiplication (3mm) – This is the most basic implementation of multiplying three large matrices. This is a benchmark where two kernels are reusing the same data. The experiment used matrices of size 5000 × 5000 each.
Gauss Seidel Method (gauss) – The Gauss-Seidel method for solving linear equations is an iterative method, in which the values for the given variables keep changing until a certain threshold of variance is reached. The experiment used a matrix of size 2^13 × 2^13.
Laplace Equation (laplace) – The Laplace equation in two dimensions with finite differences using Jacobi iteration. The experiment used a matrix of size 2000 × 2000.
Single-Precision A·X Plus Y (saxpy) – SAXPY is a function in the standard Basic Linear Algebra Subprograms (BLAS) library. In its simplest form this is a benchmark where two kernels are reusing the same data. The experiment used two vectors of size 2^27 each.

TABLE I: Updated benchmarks from the Rodinia Benchmark Suite

[Fig. 9 plot: execution time versus input size (5, 50, 100, 500, 1000) for the three implementations: without metadirective, static metadirective, dynamic metadirective.]

Fig. 9: Execution time (in sec) of matrix multiplication for 3 different implementations

For the remainder of the experiments we only used our dynamic implementation of metadirective.

We modified benchmark applications² from the Rodinia Benchmark Suite [28] to use our dynamic metadirective implementation. We also implemented four microbenchmarks to represent several common methods: Matrix Multiplication, Gauss Seidel Method, Laplace Equation and SAXPY. All the benchmarks used in our experiments are listed in Table I. The Rodinia Benchmark Suite was chosen primarily because of the diversity of the domains in which each of its applications falls. They are also intended to help the application developer learn how to use dynamic metadirective in real applications from different domains. We built these applications for compute capability 7.0 on Summit (for Volta V100) and 3.5 on SeaWulf (for Tesla K80). An example of one such implementation can be seen in Figure 8a and Figure 8c, which show a comparison of matrix multiplication implementations using preprocessing directives and the metadirective, respectively. All benchmarks were modified to replace preprocessing directives with dynamic metadirective.

² Github - https://github.com/almishra/rodinia_3.1_meta.git

V. RESULTS AND ANALYSIS

For each of the application benchmarks we have two codes – the original code with preprocessing directives and the modified code using dynamic metadirective. In the original code, the preprocessing directives are processed at compile time and, based upon the result, only one particular code version is selected and compiled. In our modified code with dynamic metadirective, based on the condition provided by the user, we generate multiple blocks of code for each if-then-else statement. This will obviously increase the size of our code. In order to assess the impact on code size, we built both versions of all the benchmark applications listed in Table I using Clang on both Summit and SeaWulf.

[Fig. 10a bar chart: percentage change in executable size for each benchmark listed in Table I; values range from 0% to 4.94%.]

a: Summit using cuda version 10.1.105 and compute capability 7.0

[Fig. 10b bar chart: percentage change in executable size for each benchmark listed in Table I; values range from 0% to 10.08%.]

b: Seawulf using cuda version 9.1.85 and compute capability 3.5

Fig. 10: Percentage increase in size of the executable built with our dynamic metadirective implementation in LLVM 12.0.0

Applications      Summit Original   Summit Metadirective   Seawulf Original   Seawulf Metadirective
B+tree            77440             77736                  50640              53944
Backprop          114576            115432                 63960              64200
BFS               735512            735544                 438696             438752
Heartwall         76536             76800                  70288              70472
Hotspot           270640            270648                 146040             146040
Kmeans            94784             99464                  43248              47608
LavaMD            72288             72552                  17632              17808
LUD               73976             74280                  23040              23040
Needleman         1003360           1003864                636856             637376
NN                72640             72640                  13160              13672
ParticleFilter    73544             73544                  31104              31104
PathFinder        71920             72184                  13344              13520
SRAD              72656             72920                  21920              22104
StreamCluster     77576             77832                  38384              38560
Gauss             260560            260560                 137680             137680
Jacobi            401808            401808                 207664             207672
MM                105024            105032                 63936              64072
SAXPY             204032            204264                 104624             106712

TABLE II: Size of executables on Summit and Seawulf (in bytes)

Summit has IBM Power9 CPUs and NVIDIA Volta V100 GPUs. For the GPU codes, LLVM used CUDA version 10.1.105 in the backend. Also, the applications are built for compute capability 7.0. Seawulf has Intel Xeon E5-2683v3 CPUs and NVIDIA Tesla K80 GPUs. For the GPU codes, LLVM used CUDA version 9.1.85 in the backend on this platform. Also, the applications are built for compute capability 3.5. Due to the difference in architecture and backend, the executables for the original benchmark code already vary in size, as can be seen in Table II.

As expected, there is some increase in the size of the executable for our dynamic metadirective implementation, depending upon the size of the kernel which is generated multiple times. But by how much? This question can be answered by looking at Figure 10, which shows the percentage increase in the size of the executable for the original and dynamic metadirective codes. The sizes of all the executables on Summit are in the range of 70 kB to 980 kB, while on Seawulf they are in the range of 12 kB to 622 kB. For some applications like bfs, hotspot, k-nn, gauss and jacobi this change is less than 0.01%, hence shown as 0% in the figure. On Summit the maximum difference is 4.94%, while on SeaWulf it is 10.08%. The maximum difference on both platforms is in the Kmeans application, which has fairly large kernels compared to the other benchmark applications analyzed. Overall, the change in the size of the executable will be directly related to the size of the kernels of the application and the number of when clauses used with a user-defined context selector.

But does using dynamic metadirective affect the actual runtime of the application? To answer this question we executed both the original code and our modified code 10 times and collected the average runtime for each application. The result can be seen in Figure 11.

41 1.19% 1 0.58% 0.53 0.44% 0.42% % 0.2 % 0.06 0.16% 0 % 0.01% −0.04% −0.18% −0.36% −0.32% −0.38% −0.69 −1 % −0.81% Change in

−1.63% −2 −1.76% Execution Time (%)

bfs lud k-nn srad gauss 3mm saxpy b+tree stream jacobi hotspot kmeans lava-md backprop heartwall needleman path-finder particle-filter

a: A Summit node with 2 IBM Power9 CPUs (128 threads) and NVIDIA Volta V100 GPU

[Fig. 11b bar chart: percentage change in execution time for each benchmark; values range from -2.07% to +1.33%.]

b: A Seawulf node with 2 Intel Xeon E5-2683v3 CPUs (28 threads) and NVIDIA Tesla K80 GPU

Fig. 11: Percentage change in running time of the executable built with our dynamic metadirective implementation

Figure 11 shows the percentage change in the runtime of the modified code, using dynamic metadirective, relative to the original code. The difference in runtime is almost negligible for all the applications on both platforms, and could easily be caused by hardware or environmental factors. On Summit the maximum difference in time is -1.76%, in the SAXPY application, whose runtime changed from 1.705s to 1.675s. On Seawulf, the maximum change is -2.07%, for the same application, where the runtime changed from 24.443s to 23.937s. Looking at these results we can conclude that there is very minimal to no runtime overhead added to the application due to dynamic metadirective.

VI. RELATED WORK

The OpenMP 4.5 version has been relatively well supported in the LLVM/Clang framework. OpenMP 5.0 has introduced many new features, especially related to accelerated devices, to take care of heterogeneity in future HPC machines. These new features are being actively added to LLVM/Clang under the ECP SOLLVE project [12]. The site https://clang.llvm.org/docs/OpenMPSupport.html gives up-to-date feature support in Clang. The SOLLVE team has also developed an OpenMP verification and validation suite to support and test the OpenMP 5.0 features [29]. The new OpenMP features are heavily oriented towards accelerated devices. Many new GPU features, such as the unified memory feature provided by Kepler and later GPUs, are supported in LLVM/Clang [30], [31]. However, these new features are not effectively used in the LLVM backend optimizations for OpenMP. It is important for the compiler to be aware of and to leverage these new features for better performance. Currently, OpenMP dialects using the MLIR framework [32] are trying to fill this performance gap.

The work in [6] uses the Smith-Waterman algorithm as an example to show the need for adaptive selection of parallelism and devices at runtime, and presents a prototype implemented in the ROSE compiler using OpenMP metadirective. In [33], the authors explored the benefits of using two OpenMP 5.0 features, including metadirective and declare variant, for the miniMD benchmark from the Mantevo suite. The authors concluded that these features enabled their code to be expressed in a more compact form while maintaining competitive performance portability across several architectures. However, their work only explored compile-time constant variables to express conditions. Many researchers have studied using GPUs to speed up the Smith-Waterman algorithm, beginning as far back as Liu et al. in 2006 [34]. Our implementation resembles some of these early attempts in terms of data motion and synchronization behavior, mainly as a simple case study. Later work uses a variety of techniques to reduce the data movement and memory requirement by doing backtracing on the GPU [35] and even exploring repeating work to accomplish the backtrace in linear space [36]. These techniques would likely make the inner-loop optimization we discussed more attractive by removing the high cost of moving the complete cost matrix to and from the device, and may be worth exploring in the future.

VII. CONCLUSION AND FUTURE WORK

OpenMP 5.0 introduces the metadirective directive, which conditionally resolves to another directive at compile time by selecting from multiple directive variants based on traits that define an OpenMP condition or context. We have implemented this feature in LLVM for general use.

OpenMP 5.0 restricts the selection to be determined at compile time, which requires that all traits must be compile-time constants. Our analysis of real applications indicates that this restriction limits the usefulness of metadirective, so we explored an extension of user-defined contexts to allow directive variant selection at runtime. We successfully modified the open source LLVM compiler to extend the user-defined condition of the context selector, as used in metadirective, to be resolved at runtime. A dynamic evaluation of user-defined conditions can provide more flexibility for programmers to express a variety of adaptive algorithms that boost overall performance.

To study its impact we have modified 13 benchmark applications in the Rodinia benchmark suite and 4 common micro benchmark applications. Rodinia includes applications and kernels which target multi-core CPU and GPU platforms across a variety of domains in computer science. Our modifications to the Rodinia benchmark suite enabled us to explore the impact of dynamic metadirective in an OpenMP context. They also provide a guideline to end users to help them apply these features in real applications. The results of our evaluation reveal that the dynamic user condition can easily be used by programmers to achieve better portability and adaptability in their code.

The main advantage of the dynamic metadirective is that it adds minimal or no overhead to the user application. In addition to allowing flexibility for programmers to introduce portability and adaptability into their code, it can enable research aimed at automatic code generation and computation on heterogeneous devices, such as Mishra et al.'s [37] work on automatic GPU-offloading, Mendonça et al.'s [38] work on automatic parallelization, or Poesia et al.'s [39] work on static placement of computation on heterogeneous devices. All these efforts take decisions based on compile-time analysis and can be significantly assisted by a dynamic metadirective, which would further reduce the barriers to the use of GPUs for scientific computing. Potential future extensions of this work include the following:

• Provide optimization of selected code and help better data handling between heterogeneous devices. This can be an extension to Mishra et al.'s [40] work on data reuse analysis.
• Explore more complex user-defined conditions, where a function can be called from a user condition.
• Adding template definitions to metadirective for type manipulation.

ACKNOWLEDGEMENT

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. The authors would like to thank Stony Brook Research Computing and Cyberinfrastructure, and the Institute for Advanced Computational Science at Stony Brook University for access to the SeaWulf computing system, which was made possible by a $1.4M National Science Foundation grant (#1531492).

REFERENCES

[1] B. R. de Supinski, T. R. W. Scogland, A. Duran, M. Klemm, S. M. Bellido, S. L. Olivier, C. Terboven, and T. G. Mattson, "The ongoing evolution of OpenMP," Proceedings of the IEEE, vol. 106, no. 11, pp. 2004–2019, 2018.
[2] OpenMP Consortium et al., "OpenMP specification version 5.0," 2018.
[3] E. Vitali, D. Gadioli, G. Palermo, A. Beccari, C. Cavazzoni, and C. Silvano, "Exploiting OpenMP and OpenACC to accelerate a geometric approach to molecular docking in heterogeneous HPC nodes," J. Supercomput., vol. 75, no. 7, pp. 3374–3396, 2019. [Online]. Available: https://doi.org/10.1007/s11227-019-02875-w
[4] M. A. S. Bari, N. Chaimov, A. M. Malik, K. A. Huck, B. Chapman, A. D. Malony, and O. Sarood, "ARCS: Adaptive runtime configuration selection for power-constrained OpenMP applications," in Cluster Computing (CLUSTER), 2016 IEEE International Conference on. IEEE, 2016, pp. 461–470.
[5] D. Quinlan and C. Liao, "The ROSE source-to-source compiler infrastructure," in Cetus Users and Compiler Infrastructure Workshop, in conjunction with PACT, vol. 2011. Citeseer, 2011, p. 1.
[6] Y. Yan, A. Wang, C. Liao, T. R. Scogland, and B. R. de Supinski, "Extending OpenMP metadirective semantics for runtime adaptation," in International Workshop on OpenMP. Springer, 2019, pp. 201–214.
[7] HPE, "Cray compiler suite." [Online]. Available: https://pubs.cray.com
[8] GNU, "GNU compiler suite." [Online]. Available: https://gcc.gnu.org/gcc-10/changes.html
[9] Intel, "Intel compiler suite." [Online]. Available: https://software.intel.com
[10] LLVM, "Clang 12 documentation." [Online]. Available: https://clang.llvm.org/docs/OpenMPSupport.html
[11] P. Messina, "The exascale computing project," Computing in Science & Engineering, vol. 19, no. 3, pp. 63–67, 2017.
[12] ECP, "SOLLVE: Scaling OpenMP with LLVM for exascale performance and portability." [Online]. Available: https://www.bnl.gov/compsci/projects/SOLLVE/publications.php
[13] NASA, "NAS parallel benchmarks." [Online]. Available: https://www.nas.nasa.gov/publications/npb.html
[14] F. Jordan, A. Wilkes, and K. Skadron, "Accelerating braided B+ tree searches on a GPU with CUDA," in 2nd Workshop on Applications for Multi and Many Core Processors: Analysis, Implementation, and Performance (A4MMC), in conjunction with ISCA, 2011.
[15] "Neural networks for face recognition." [Online]. Available: http://www.cs.cmu.edu/afs/cs.cmu.edu/user/mitchell/ftp/faces.html
[16] P. Harish and P. Narayanan, "Accelerating large graph algorithms on the GPU using CUDA," in International Conference on High-Performance Computing. Springer, 2007, pp. 197–208.
[17] L. G. Szafaryn, K. Skadron, and J. J. Saucerman, "Experiences accelerating MATLAB systems biology applications," in Proceedings of the Workshop on Biomedicine in Computing: Systems, Architectures, and Circuits, 2009, pp. 1–4.
[18] W. Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron, and M. R. Stan, "HotSpot: A compact thermal modeling methodology for early-stage VLSI design," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 5, pp. 501–513, 2006.

[19] J. Pisharath, Y. Liu, W.-k. Liao, A. Choudhary, G. Memik, and J. Parhi, "NU-MineBench 2.0," Technical report, Northwestern University, Tech. Rep., 2005.
[20] V. Garcia, E. Debreuve, and M. Barlaud, "Fast k nearest neighbor search using GPU," in 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. IEEE, 2008, pp. 1–6.
[21] L. G. Szafaryn, T. Gamblin, B. R. De Supinski, and K. Skadron, "Experiences with achieving portability across heterogeneous architectures," Proceedings of WOLFHPC, in conjunction with ICS, Tucson, 2011.
[22] R. Roessler, "A GPU implementation of Needleman-Wunsch, specifically for use in the program PyroNoise 2," Computer Science & Engineering, 2010.
[23] M. A. Goodrum, M. J. Trotter, A. Aksel, S. T. Acton, and K. Skadron, "Parallelization of particle filter algorithms," in International Symposium on Computer Architecture. Springer, 2010, pp. 139–149.
[24] Y. Yu and S. T. Acton, "Speckle reducing anisotropic diffusion," IEEE Transactions on Image Processing, vol. 11, no. 11, pp. 1260–1270, 2002.
[25] C. Bienia, S. Kumar, J. P. Singh, and K. Li, "The PARSEC benchmark suite: Characterization and architectural implications," in Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008, pp. 72–81.
[26] "Summit, supercomputing cluster at ORNL," 2019. [Online]. Available: http://www.olcf.ornl.gov/olcf-resources/compute-systems/summit
[27] "SeaWulf, computational cluster at Stony Brook University," 2019. [Online]. Available: https://it.stonybrook.edu/help/kb/understanding-seawulf
[28] S. Che, M. Boyer, J. Meng, D. Tarjan, J. W. Sheaffer, S.-H. Lee, and K. Skadron, "Rodinia: A benchmark suite for heterogeneous computing," in 2009 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 2009, pp. 44–54.
[29] J. M. Diaz, S. Pophale, O. Hernandez, D. E. Bernholdt, and S. Chandrasekaran, "OpenMP 4.5 validation and verification suite for device offload," in Evolving OpenMP for Evolving Architectures, B. R. de Supinski, P. Valero-Lara, X. Martorell, S. Mateo Bellido, and J. Labarta, Eds. Cham: Springer International Publishing, 2018, pp. 82–95.
[30] L. Li and B. M. Chapman, "Compiler assisted hybrid implicit and explicit GPU memory management under unified address space," in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2019, Denver, Colorado, USA, November 17-19, 2019, M. Taufer, P. Balaji, and A. J. Peña, Eds. ACM, 2019, pp. 51:1–51:16. [Online]. Available: https://doi.org/10.1145/3295500.3356141
[31] L. Li, H. Finkel, M. Kong, and B. M. Chapman, "Manage OpenMP GPU data environment under unified address space," in Evolving OpenMP for Evolving Architectures - 14th International Workshop on OpenMP, IWOMP 2018, Barcelona, Spain, September 26-28, 2018, Proceedings, ser. Lecture Notes in Computer Science, B. R. de Supinski, P. Valero-Lara, X. Martorell, S. M. Bellido, and J. Labarta, Eds., vol. 11128. Springer, 2018, pp. 69–81. [Online]. Available: https://doi.org/10.1007/978-3-319-98521-3_5
[32] C. Lattner and J. Pienaar, "MLIR primer: A compiler infrastructure for the end of Moore's law," 2019.
[33] J. Pennycook, J. Sewall, and J. Hammond, "Evaluating the impact of proposed OpenMP 5.0 features on performance, portability and productivity," Nov. 2018, pp. 37–46.
[34] Y. Liu, W. Huang, J. Johnson, and S. Vaidya, "GPU accelerated Smith-Waterman," vol. 3994, May 2006, pp. 188–195.
[35] S. Xiao, A. M. Aji, and W. Feng, "On the robust mapping of dynamic programming onto a graphics processing unit," in 2009 15th International Conference on Parallel and Distributed Systems, 2009, pp. 26–33.
[36] E. F. de O. Sandes and A. C. M. A. de Melo, "Smith-Waterman alignment of huge sequences with GPU in linear space," in 2011 IEEE International Parallel & Distributed Processing Symposium, 2011, pp. 1199–1211.
[37] A. Mishra, M. Kong, and B. Chapman, "Kernel fusion/decomposition for automatic GPU-offloading," in Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization. IEEE Press, 2019, pp. 283–284.
[38] G. Mendonça, B. Guimarães, P. Alves, M. Pereira, G. Araújo, and F. M. Q. Pereira, "DawnCC: Automatic annotation for data parallelism and offloading," ACM Transactions on Architecture and Code Optimization (TACO), vol. 14, no. 2, p. 13, 2017.
[39] G. Poesia, B. Guimarães, F. Ferracioli, and F. M. Q. Pereira, "Static placement of computation on heterogeneous devices," Proceedings of the ACM on Programming Languages, vol. 1, no. OOPSLA, p. 50, 2017.
[40] A. Mishra, C. Liao, and B. Chapman, "Data reuse analysis for GPU offloading using OpenMP," J. Supercomput., vol. 100, 2019.
