Clacc: Translating OpenACC to OpenMP in Clang

Joel E. Denny, Seyong Lee, Jeffrey S. Vetter
Oak Ridge National Laboratory
Email: {dennyje, lees2, vetter}@ornl.gov

Abstract—OpenACC was launched in 2010 as a portable programming model for heterogeneous accelerators. Although various implementations already exist, no extensible, open-source, production-quality compiler support is available to the community. This deficiency poses a serious risk for HPC application developers targeting GPUs and other accelerators, and it limits experimentation and progress for the OpenACC specification. To address this deficiency, Clacc is a recent effort funded by the US Exascale Computing Project to develop production OpenACC compiler support for Clang and LLVM. A key feature of the Clacc design is to translate OpenACC to OpenMP to build on Clang's existing OpenMP compiler and runtime support. In this paper, we describe the Clacc goals and design. We also describe the challenges that we have encountered so far in our prototyping efforts, and we present some early performance results.

Index Terms—OpenACC, OpenMP, LLVM, multicore, GPU, accelerators, source-to-source translation, compiler

I. INTRODUCTION

Heterogeneous and manycore processors (e.g., multicore CPUs, GPUs, FPGAs) are becoming de facto architectures for current HPC and future exascale platforms [1]–[3]. These architectures are drastically diverse in functionality, performance, and programmability, significantly increasing the complexity that HPC application developers face as they attempt to fully utilize available hardware. Moreover, users plan to run on many platforms during an application's lifetime, so this diversity creates an increasingly critical need for performance portability of applications and related software.

The Exascale Computing Project (ECP) and the broader community are actively exploring strategies to provide this portability and performance. These strategies include libraries, domain-specific languages, OpenCL, and directive-based compiler extensions like OpenMP and OpenACC. Over the last decade, most HPC users at the DOE have used either CUDA or OpenACC for accelerator programming.

CUDA has been very successful for NVIDIA GPUs, but it's proprietary and available on limited architectures. OpenMP has only recently provided limited support for heterogeneous systems as historically it has focused on shared-memory multi-core programming. OpenCL has varying levels of support across platforms and is often verbose, and thus it must be tuned for each platform.

OpenACC was launched in 2010 as a portable, directive-driven programming model for heterogeneous accelerators [4]. Championed by organizations like NVIDIA, PGI, and ORNL, OpenACC has evolved into one of the most widely used portable programming models for accelerators on HPC systems today. Because OpenACC is specified as a more descriptive language [5] while OpenMP is more prescriptive, the compiler has more freedom in how it implements OpenACC directives, placing more optimization power in the compiler's hands and depending less on the application developer for performance tuning. The current version of the OpenACC specification is 2.6, released in November 2017.

A variety of OpenACC implementations exist, from production proprietary to research compilers, including PGI [6], OpenARC [7]–[9], GCC [10], and Sunway OpenACC from Wuxi [11], [12], providing various support for offloading to GPU, AMD GCN, multicore CPU, and FPGA. However, in this time of extreme heterogeneity in HPC architectures, we believe it's critical to have an open-source OpenACC implementation to facilitate broad use, to enable researchers to experiment on a broad range of architectures, and to provide a smooth transition path from research implementations into production deployments.

Currently, the only production open-source OpenACC compiler cited by the OpenACC website is GCC [13]. GCC's support is relatively new and so lags behind commercial compilers, such as PGI, in providing production support for the latest OpenACC specs [14]. Also, GCC has a reputation for being challenging to extend and, especially within the DOE, is losing favor to Clang and LLVM for new compiler research and development efforts.

A. Clacc Objectives

Clacc is a recent effort to develop an open-source, production OpenACC compiler ecosystem that is easily extensible and utilizes the latest advances in compiler technology. Such an ecosystem is critical to successful acceleration of applications using modern HPC hardware. Clacc's objectives are:
1) Develop production, standard-conforming OpenACC compiler/runtime support by extending Clang/LLVM.
2) As part of the compiler design, leverage the Clang ecosystem to enable research and development of source-level OpenACC tools, such as pretty printers, analyzers, lint tools, and debugger and editor extensions.
3) As the work matures, contribute OpenACC support to upstream Clang and LLVM so that it can be used by the broader HPC and parallel programming communities.
4) Throughout development, actively contribute upstream all Clang/LLVM improvements, including OpenMP improvements, that are mutually beneficial to OpenACC support and to the broader Clang/LLVM ecosystem.
5) Throughout development, actively contribute improvements to the OpenACC specification.

A key feature of the Clacc design is to translate OpenACC to OpenMP to build on Clang's existing OpenMP compiler and runtime support. Because OpenACC is more descriptive while OpenMP is more prescriptive, this translation is effectively a lowering of the representation and thus fits the traditional ordering of compiler phases. We primarily intend this design as a pragmatic choice for implementing traditional compiler support for OpenACC.
Thus, like other intermediate representations, the generated OpenMP and related diagnostics are not normally exposed to the compiler user.

Even so, this design also creates the potential for several interesting non-traditional user-level features. For example, this design offers a path to reuse existing OpenMP debugging and development tools, such as ARCHER [15], for the sake of OpenACC. This design should give rise to semantics and compiler support for the combination of OpenACC and OpenMP in the same application source. As a lower representation, OpenMP could serve as a debugging and analysis aid as compiler developers and application authors investigate the optimization decisions made by the OpenACC compiler.

Perhaps the most obvious user-level feature that Clacc's design enables is source-to-source translation. This feature is especially important for the following reason. While we believe that OpenACC's current momentum as the go-to directive-based language for accelerators will continue into the foreseeable future, it has been our experience that some potential OpenACC adopters hesitate over concerns that OpenACC will soon be supplanted by OpenMP features. As a tool capable of automatically porting OpenACC applications to OpenMP, Clacc could neutralize such concerns, encourage adoption of OpenACC, and thus advance utilization of acceleration hardware in HPC applications.

B. Contributions

Clacc is a work in progress, and it remains in active development. We are currently prototyping OpenACC support and evolving our design as issues arise. The contributions of this paper are as follows:
1) We explain the rationale for Clacc's design as well as the current state of our prototype.
2) We describe our extension of Clang's TreeTransform facility for transforming OpenACC to OpenMP without breaking Clang's immutable AST design.
3) We describe a prescriptive mapping from OpenACC to OpenMP for directives and clauses Clacc so far supports.
4) We discuss how Clacc's design may evolve in the future to utilize LLVM IR-level analyses, potentially taking advantage of the LLVM IR extensions for parallel programming models that are currently being discussed in the community.
5) We present early performance results for our prototype.

II. METHODOLOGY

In this section, we present the current design of our Clacc prototype. As discussed in §I, extending Clang to translate OpenACC to OpenMP appears to be the most pragmatic approach for extending the LLVM ecosystem to support OpenACC in C and C++, and this approach also enables a number of helpful non-traditional user-level features. However, Clang was not originally designed to support AST transformations or source-to-source translation. Thus, to ensure the success of the Clacc project, it is important that we carefully weigh the potential ramifications of each major design decision.

A. High-Level Design

In this section, we evaluate three high-level design alternatives we have considered for the Clacc compiler. We have previously posted this portion of the design and discussed it on the Clang developer mailing list, but we feel that it is important to recapture it here to provide context for the rest of the paper.

[Fig. 1. Clacc Design Alternatives. Three pipelines from OpenACC source to executable: in design A, the parser translates OpenACC directly into an OpenMP AST; in design B, the parser builds an OpenACC AST that codegen lowers to LLVM IR; in design C, the parser builds an OpenACC AST that a new acc2omp phase lowers to an OpenMP AST before codegen. In every design, the executable links against an OpenACC runtime built on the OpenMP runtime.]

Fig. 1 depicts our three high-level design alternatives. In each design alternative, red indicates the compiler phase that effectively translates OpenACC to OpenMP. The components of this diagram are as follows:
• OpenACC source is C/C++ application source code containing OpenACC constructs.
• OpenACC AST is a Clang AST in which OpenACC constructs are represented by OpenACC node types, which don't already exist in Clang's upstream implementation.
• OpenMP AST is a Clang AST in which OpenACC constructs have been lowered to OpenMP constructs represented by OpenMP node types, which do already exist in Clang's upstream implementation.
• LLVM IR is the usual LLVM intermediate representation generated by Clang.
• Parser is the existing Clang parser and semantic analyzer extended for OpenACC. Under design A, this component parses OpenACC directly into an OpenMP AST.
• acc2omp is design C's new Clang component that transforms OpenACC to OpenMP entirely at the AST level.
• Codegen is the existing Clang backend that lowers a Clang AST to LLVM IR. Design B extends this component to lower OpenACC node types into LLVM IR with runtime calls.
• executable is the final application executable.
• OpenACC runtime is built on top of LLVM's existing OpenMP runtime with extensions for OpenACC's runtime environment variables, API, etc.

There are several design features to consider when choosing among these design alternatives:
1) OpenACC AST as an artifact: Because they create OpenACC AST nodes, designs B and C best facilitate the creation of additional OpenACC source-level tools, as described for Clacc objective 2 in §I.
Some of these tools, such as pretty printing, are available immediately or as minor extensions of tools that already exist in Clang's ecosystem.
2) OpenMP AST/source as an artifact: Because they create OpenMP AST nodes, designs A and C best facilitate the development of the various non-traditional user-level features discussed in §I, such as automated porting of OpenACC applications to OpenMP, and reuse of existing OpenMP tools. With design B instead, the only OpenMP representation is low-level LLVM IR plus runtime calls, from which it would be significantly more difficult or impossible to implement such features.
3) OpenMP AST for mapping implementation: Designs A and C also make it easier for the compiler developer to reason about and implement mappings from OpenACC to OpenMP. That is, because OpenACC and OpenMP syntax is so similar, implementing the translation at the level of a syntactic representation is easier than translating to LLVM IR.
4) OpenMP AST for codegen: Designs A and C simplify the compiler implementation by enabling reuse of Clang's existing OpenMP support for codegen. In contrast, design B requires either significant extensions to Clang codegen to support OpenACC nodes, or a significant refactoring of the OpenMP codegen to make it reusable for OpenACC.
5) Full OpenACC AST for mapping: Designs B and C potentially enable the compiler to analyze the entire source, as opposed to just the OpenACC construct currently being parsed, while choosing the mapping to OpenMP. It is not clear if this feature will prove useful, but it might enable more optimizations and compiler research opportunities.
6) No OpenACC node classes: Design A simplifies the implementation by eliminating the need to implement many OpenACC node classes. While we have so far found that implementing these classes is mostly mechanical, it does take a non-trivial amount of time as the code for an AST class is divided among many Clang facilities, such as the AST printer, dumper, reader, writer, visitor, etc.
7) No OpenMP mapping: Design B does not require OpenACC to be mapped to OpenMP. That is, it is conceivable that, for some OpenACC constructs, there will prove to be no OpenMP syntax to capture the semantics we wish to implement. It is also conceivable that we might one day want to represent some OpenACC constructs directly as extensions to LLVM IR, where some OpenACC analyses or optimizations might be more feasible to implement. This possibility dovetails with recent discussions in the LLVM community about developing LLVM IR extensions for various parallel programming models.

Due to features 4 and 6, design A is likely the fastest design to implement, at least at first while implementing simple OpenACC features and simple mappings to OpenMP. Even so, we have so far found no advantage that design A has but that design C doesn't have except feature 6, which we see as the least important of the above features in the long term.

The only advantage we have found that design B has but that design C does not have is feature 7. It should be possible to choose design C as the default but, for certain OpenACC constructs or scenarios where feature 7 proves important, if any, incorporate design B. In other words, if we decide not to map a particular OpenACC construct to any OpenMP construct, acc2omp would leave it alone, and we would extend codegen to handle it directly.

For the above reasons, and because design C offers the cleanest separation of concerns, we have chosen design C with the possibility of incorporating design B where it proves useful. So far in our prototype, we have not needed to incorporate design B.

B. acc2omp Background

As described in the previous section, our Clacc design introduces a new acc2omp component within Clang. A key issue in designing acc2omp is that Clang ASTs are designed to be immutable once constructed. This issue might at first seem to make acc2omp impossible to implement, but it does not. Here are a few AST transformation techniques that are already in use in upstream Clang:
1) Clang has a Rewrite facility for annotating an AST with textual modifications and printing the resulting source code, which can then be reparsed as new source code into a new AST.
2) The initial construction of a Clang AST involves attaching new nodes to existing nodes, so modifying an AST by extending it is permitted by design.
3) The TreeTransform facility in Clang is currently used to transform C++ templates for the sake of instantiating them. It is a special case of technique 2 as follows. When the parser reaches a template instantiation, TreeTransform builds a transformed copy of the AST subtree that represents the template, and it inserts that copy into the syntactic context of the template instantiation, thus extending the AST.

[Fig. 2. AST with Hidden OpenMP Parent. Fig. 3. AST with Hidden OpenMP Subtree. Both figures pair the source passage

    #pragma acc parallel
    #pragma acc loop gang
    // omp target teams
    // omp distribute
    for (int i=0; i<2; ++i)
      // loop body

with its Clang AST: a TranslationUnit containing a FunctionDecl and CompoundStmt, whose ACCParallelDirective holds a hidden OMPTargetTeamsDirective and whose ACCLoopDirective (with an ACCGangClause) holds a hidden OMPDistributeDirective over the ForStmt. In Fig. 2 the hidden OpenMP nodes share the ForStmt child; in Fig. 3 they root a separate subtree.]

One nice property of TreeTransform is that it is designed to be extensible: it is a class template employing the curiously recurring template pattern (CRTP) for static polymorphism. However, there are also a few caveats, which we now consider.

1) TreeTransform Caveat 1: Transitory semantic data: To build new nodes, TreeTransform runs many of the same semantic actions that the parser normally runs. Those semantic actions require the transitory semantic data that has been stored in Clang's Sema object by the time the parser reaches the syntactic context where new nodes are to be inserted, but the parser gradually discards some of that semantic metadata as it progresses to other syntactic contexts. Thus, TreeTransform cannot be run on arbitrary nodes in the AST at arbitrary times. For example, to run TreeTransform on arbitrary nodes in a translation unit after the parsing of that translation unit has completed, it might be necessary to transform the translation unit's entire AST in order to rebuild all of the necessary transitory semantic metadata.

2) TreeTransform Caveat 2: Permanent semantic data: Currently, parsing a C++ template permanently associates semantic data with that template's AST subtree in a way that's compatible with later runs of TreeTransform for instantiations of that template. However, there's no guarantee that semantic data that is reasonable for C++ template instantiation will be compatible with any arbitrary extension of TreeTransform. For example, we have noticed that, if we write a simple TreeTransform extension that merely duplicates an OpenMP region immediately after that region's node is constructed, the default TreeTransform implementation does not update the declaration contexts for variable declarations that are local to the duplicate region, so those duplicate variables appear to be declared in the original region, resulting in spurious compiler diagnostics.

3) TreeTransform Caveat 3: Unknown limitations: Because TreeTransform is designed primarily for C++ template instantiation, it is not clear at this point what other limitations an attempt to extend it for Clacc might expose.

C. acc2omp Design

We consider the following design alternatives for acc2omp:
1) Alternatives for how transformations are performed:
a) Annotate each OpenACC node with OpenMP as text using Clang's Rewrite facility, print the entire AST, and parse that to construct an OpenMP AST.
b) Replace each OpenACC node with an OpenMP node.
c) Add a second hidden parent, an OpenMP node, for the children of each OpenACC node. For example, Fig. 2 depicts how the OpenACC node for an acc loop directive remains in the main AST but holds a pointer to a hidden omp distribute directive node that shares a child. The OpenACC node delegates codegen to the OpenMP node as necessary, but the OpenMP node is skipped by any traversals related to analysis of the original source.
d) Add a hidden subtree, rooted at an OpenMP node, for each OpenACC node. This alternative is the same as the previous alternative except that the OpenACC node and OpenMP node do not share children. Instead, the entire subtree is transformed using TreeTransform. For example, Fig. 3 depicts how an acc loop directive node holds a pointer to the hidden subtree for its omp distribute directive.
e) Add a new translation unit with OpenMP nodes instead of OpenACC nodes. This approach could also use TreeTransform.
2) Alternatives for when transformations are performed:
a) Immediately after each OpenACC node is constructed. This approach is nonsensical for 1e above.
b) At the end of each translation unit or all translation units. For 1d above, this approach is likely infeasible due to the TreeTransform caveat mentioned in §II-B1. If using TreeTransform, 1e seems necessary to overcome that caveat.

We reject the first two alternatives under 1 for the following reasons. Alternative 1a is surely bad for compilation performance as it requires serializing and reparsing the source. While alternative 1b seems to directly contradict the Clang AST design, we have spoken to developers who have successfully employed this strategy in forks of older versions of Clang. Regardless, because it blatantly contradicts the Clang design, it might easily break as Clang evolves.

In our initial Clacc prototype, we attempted 1c plus 2a as that alternative seemed sufficient for simple mappings from OpenACC to OpenMP, it seemed to avoid the need for something more complex like TreeTransform, and it seemed easiest to extend to 2b in case we later determine that full translation-unit visibility is sometimes required for a transformation. However, we encountered several issues with this alternative. The most obvious problem is that, because of the shared children, it doesn't permit changes to an OpenACC directive's associated code block when translating to OpenMP, but we have found that such changes are necessary, as we describe in §II-G. Nested OpenACC directives pose a related problem: the outermost OpenACC node logically has different children than its hidden OpenMP node because the nested directive is OpenACC in the former case and OpenMP in the latter case. It's possible to maintain the same children and attach a hidden OpenMP node to each of the outer and inner OpenACC nodes, but then AST traversals for OpenMP, most notably OpenMP codegen, must be modified to handle OpenACC nodes within OpenMP subtrees. For example, in Fig. 2, the child of the omp target teams directive node is an acc loop directive node.

A more subtle issue we encountered with alternative 1c is that it means that an OpenACC node's children must be compatible with the OpenMP node type. For example, when trying to translate an acc parallel construct to an omp target teams construct using this alternative, Clang's OpenMP implementation required us to insert child nodes representing two captured regions, one for each of omp target and omp teams, as parents of the associated code block. Because Clang's AST is immutable, the subtree for the associated code block cannot be modified later, and thus the choice of how to map a particular OpenACC directive to OpenMP must be made immediately in order to construct the subtree correctly. The choice cannot be deferred for the sake of, for example, alternative 2b. This example also points out a more general problem with alternative 1c: understanding how to construct child nodes requires understanding and sometimes replicating portions of the OpenMP semantic analysis. In contrast, TreeTransform offers a nice encapsulation for reusing semantic analysis implementations during transformation.

Our current Clacc prototype employs alternative 1d plus 2a. This design seems:
• Most compatible: Unlike 1b, 1d doesn't attempt to violate Clang AST immutability.
• Most efficient: Unlike 1a or 1e, 1d doesn't require rebuilding the entire AST.
• Most flexible: 1d avoids the many issues cited above for 1c.
1d plus 2a is like 1e plus 2b but on a smaller scale, so the former ought to be extensible to the latter if we later determine that full translation-unit visibility is sometimes required for a transformation.

We avoid the TreeTransform caveat from §II-B1 by combining 1d only with 2a. We have overcome the TreeTransform caveat from §II-B2 by overriding specific TreeTransform functionality in order to transform semantic data that TreeTransform normally leaves unmodified. We anticipate that the extensible design of TreeTransform will enable us to overcome TreeTransform issues we encounter in the future, as discussed in §II-B3. For extensions that seem general to rewriting directives rather than specific to translating OpenACC to OpenMP, we are constructing a separate class that we hope will prove reusable in projects other than Clacc.

D. OpenMP Implementation Reuse

In our Clacc prototype, we have found that the design discussed so far maximizes Clacc's reuse of the existing OpenMP implementation in Clang and LLVM. First, as discussed in §II-A, full reuse of OpenMP codegen is enabled by constructing an OpenMP AST. Second, because of our choice of alternative 1d from §II-C for acc2omp, TreeTransform enables reuse of the OpenMP semantic analysis implementation without breaking encapsulation. Finally, as discussed in §II-A, our OpenACC runtime will be built on the OpenMP runtime plus extensions for OpenACC's run-time environment variables, library API, etc. However, our prototype currently uses the OpenMP runtime directly and doesn't yet provide support for runtime-related OpenACC features, so we don't discuss runtime design further in this paper.

E. Traversing OpenACC vs. OpenMP AST

As described in §II-C, the manner in which we use TreeTransform maintains both the OpenACC nodes and their corresponding OpenMP nodes in the AST at the same time. For this reason, Clang tools and built-in source-level functionality need some way to choose which set of nodes to examine. In this section, we describe our solutions for two existing Clang facilities, AST printing and dumping, and we consider possible mechanisms for other AST traversals.

1) Printing: -ast-print is an existing Clang command-line option for printing the original pre-processed source code from the Clang AST. While this feature was designed for debugging and not for faithful printing of the AST, we have successfully contributed upstream a number of fixes to improve the fidelity of its output. In all of our Clacc tests so far, we can successfully compile and run the printed source without behavioral changes.

As we discussed in §I, source-to-source translation is an important user-level feature that Clacc's design enables. As a first prototype of this feature, we are extending -ast-print. However, the output of -ast-print always corresponds to the original source and never a lowered version of it. Clacc maintains that behavior, so -ast-print prints the OpenACC source, not the OpenMP source. Thus, in Clacc, our extended -ast-print functionality is accessed via a new option, -fopenacc-print, which takes any of the following values:
1) acc: OpenACC subtrees are printed, and the OpenMP subtrees to which they were translated are ignored. In this case, the only difference from -ast-print is that -ast-print typically must be combined with several other command-line options to, for example, enable OpenACC compilation. That is, -fopenacc-print=acc is more convenient.
2) omp: OpenMP subtrees are printed, and the OpenACC subtrees from which they were translated are ignored.
3) acc-omp: OpenACC subtrees are printed, and the OpenMP subtrees to which they were translated are printed in neighboring comments.
4) omp-acc: OpenMP subtrees are printed, and the OpenACC subtrees from which they were translated are printed in neighboring comments.
In the last two cases, Clacc will avoid duplicating the code block associated with a directive if that code block prints identically in both the OpenACC and OpenMP versions. The output then looks similar to the code passage in Fig. 3.

In the future, we plan to investigate other alternatives on which to base -fopenacc-print, such as the clang-format tool, which offers more faithful printing, better formatting, and an unpreprocessed version of the source.
2) Dumping: -ast-dump is an existing Clang command-line option for printing a textual representation of the AST structure, including parent-child relationships, source location information, and computed types. This feature is clearly designed for debugging ASTs and is not for normal Clang users. Nevertheless, it is very useful for Clang developers and must be supported by Clacc. For each OpenACC AST node, we have extended this feature to always produce a full representation of that node's subtree including, as a specially marked child node, the OpenMP subtree to which it translates.

3) Other Traversals: AST traversals are typically based on Clang's RecursiveASTVisitor facility. Like most -ast-print users, most AST traversal developers and users likely expect traversals to visit an AST representing the original source code only. Because the OpenMP node to which an OpenACC node is translated is not recorded as a normal child of the OpenACC node, RecursiveASTVisitor visits the OpenACC node but skips its OpenMP node. However, while visiting an OpenACC node, a visitor can be written to call the node's getOMPNode member function to access the OpenMP node, possibly for a recursive visitation. For example, we use that member function for implementing -fopenacc-print and -ast-dump.

If a developer wishes to use an existing Clang-based tool to operate on only the OpenMP AST to which Clacc translates an OpenACC AST, but if he doesn't wish to rewrite the tool to call getOMPNode on all OpenACC nodes, he can use -fopenacc-print=omp and then run Clang again to parse the output. However, it might be more efficient and convenient to have a mechanism that adjusts all AST traversals to skip OpenACC nodes and visit only their OpenMP nodes instead. In that way, the AST would appear to a tool as if it were constructed from parsing the generated OpenMP source. This mechanism might be activated by a new command-line option, such as -fopenacc-ast=omp. Some AST traversals, most notably codegen, would ignore such a mechanism. That is, under our chosen design C from §II-A, the implementation of codegen for an OpenACC node would delegate to the corresponding OpenMP node regardless of this mechanism. If we find cases where we need to incorporate design B, codegen would always operate directly on the OpenACC node regardless of this mechanism. More investigation is necessary to determine if such a mechanism is worthwhile in practice.

F. Development Strategy

Like any software development project, Clacc cannot offer support for all desired functionality at its inception. Instead, we are carefully studying each OpenACC feature, determining a correct mapping to OpenMP, implementing it in Clang, extending Clang's test suite with thorough coverage, and contributing relevant improvements to the OpenACC specification as well as to the Clang and LLVM upstream infrastructure. In order to set realistic milestones, we have started with the most fundamental and commonly used features, as indicated by their prevalence in existing OpenACC applications and benchmarks, and we are working our way to more complex and more rarely used features. We now discuss a few high-level limitations that we have imposed initially to facilitate this strategy. We will of course eliminate those limitations as Clacc matures.

1) C vs. C++: Clacc currently supports OpenACC directives in C but not yet C++. C++ is a vastly more complex language than C, so initially limiting our work to C facilitates faster progress toward full OpenACC support. Nevertheless, full C++ support is an important objective for Clacc. Not only is C++ a critical base language supported by Clang, but C++ continues to see growing usage within HPC applications.

2) Shared-Memory Multicore vs. GPU: Clacc currently supports OpenACC offloading to shared-memory multicore CPUs but not yet GPUs. The primary reason is that our Clacc work has so far focused on compute constructs, loop constructs, and data-sharing features but does not yet support host-device data-transfer features. When the offloading target is shared-memory multicore CPUs, OpenACC compilers can mostly ignore the use of such data-transfer features [16]. Indeed, we take this approach for Clacc when compiling the benchmarks we evaluate in §III. Nevertheless, we expect to eliminate this limitation in the immediate future as we broaden the set of OpenACC features Clacc supports.

3) Prescriptive vs. Descriptive: A common and usually wise strategy when implementing support for an existing language feature set is to focus first on behavioral correctness and then on performance. We are following this strategy for Clacc. For this reason, Clacc currently offers a prescriptive interpretation of OpenACC. That is, Clacc currently avoids any complex compiler analyses and thus employs a mostly one-to-one mapping of OpenACC directives to OpenMP directives. We present some of those mappings in §II-G.

No production-quality language support can endure this strategy indefinitely, and that is particularly true in the case of OpenACC. As we discussed in §I, OpenACC is specified as a descriptive language. That is, for many features, such as OpenACC's kernels directive, a production-quality OpenACC compiler is expected to perform complex analyses in order to understand data flow, loop nests, and other control flow in the application source and to determine efficient strategies for scheduling work on the offloading device. Clacc will thus need to perform those analyses to lower OpenACC to the more prescriptive language of OpenMP. In §III, we investigate several OpenACC benchmarks as we begin to evaluate the performance gap between hardened commercial OpenACC compilers, in particular PGI, and Clacc's initially prescriptive OpenACC interpretation.
The latest OpenMP 5.0 draft, TR7, proposes a descriptive loop directive [17], which corresponds to OpenACC's loop directive with imp|exp independent. Once Clang supports TR7's loop, Clacc can utilize it to facilitate a descriptive interpretation of OpenACC. However, we are not aware of TR7 features to which the following OpenACC features can be directly mapped: (1) the loop directive's auto clause, which requires the compiler to determine whether loop iterations are data-independent, and (2) the kernels directive, which requires the compiler to automatically split a region into kernels. Thus, even with TR7's loop directive, a fully featured production-quality OpenACC implementation would still require the kinds of compiler analyses described above.

G. Mapping OpenACC to OpenMP

In this section, we describe a mapping from OpenACC directives and clauses to OpenMP directives and clauses. This mapping represents a conservative choice intended to always achieve correct OpenACC behavior. As Clacc evolves to support a descriptive interpretation of OpenACC and the requisite compiler analyses, this mapping will represent the base choice from which Clacc will look for deviations to improve performance of the application, and it will represent the fallback choice if Clacc fails to find better mappings. Under Clacc's current prescriptive interpretation of OpenACC, Clacc supports no such analyses and so effectively always falls back to this mapping.

1) Mapping Notation: For conciseness, we use the following notation when describing clauses and data attributes:
• pre labels a data attribute that is predetermined by the compiler (that is, cannot be overridden by an explicit clause) and is not specified by an explicit clause.
• imp labels a data attribute that is implicitly determined by the compiler (that is, can be overridden by an explicit clause) and is not specified by an explicit clause.
• exp labels a clause, possibly specifying a data attribute, that is explicitly specified in the source.
• not labels a clause that isn't explicitly specified.
• L C → L' C' specifies that clause or data attribute C under the condition identified by label L maps to clause or data attribute C' under the condition identified by label L', where a label is pre, imp, exp, or not.
• L|L' C → L'' C' specifies both of the following mappings:
  – L C → L'' C'
  – L' C → L'' C'
• Mappings for per-variable data attributes and clauses are per variable and per directive.
• Mappings for other clauses are per directive.
• Where arguments to clauses are not specified on either end of the mapping, the mapping maintains the arguments as they are even if the clause name or location changes.

One theme throughout Clacc's mapping is that Clacc does not rely on implicit or predetermined attributes of OpenMP except for cases where an explicit clause is not permitted. That is, Clacc tries to make the exact behavior it intends to produce as explicit as possible in the generated OpenMP for the sake of debugging. Thus, → exp appears frequently below. Any directive or clause not mentioned in this mapping is not yet supported by Clacc.

2) Semantic Clarifications: While developing this mapping, we found we had to make assumptions about a number of aspects of OpenACC semantics in C that are not clear in the OpenACC 2.6 specification. In many cases, it was the related behavior of the Clang OpenMP implementation that brought the need for those assumptions to our attention. We describe some of those assumptions here. We are in the process of communicating clarifications to the OpenACC technical committee, and we will evolve this mapping if necessary based on those discussions.
• It is an error if a variable has more than one of exp firstprivate, exp private, or exp reduction on an OpenACC directive. These have contradictory specifications for initialization of the local copy of the variable and for storing data back to the original variable.
• While OpenACC does not define a shared clause, Clacc assigns the OpenMP imp shared semantics to any variable that is referenced within an OpenACC construct and declared outside it and for which OpenACC semantics do not specify firstprivate, private, or reduction.
• exp firstprivate, exp private, or exp reduction for a variable of incomplete type is an error. A local copy must be allocated in each of these cases, but allocation is impossible for incomplete types.
• exp private or exp reduction for a const variable is an error. The local copy of a const private variable would remain uninitialized throughout its lifetime. A reduction assigns to both the original variable and a local copy after its initialization, but const prevents that.
• Given some variable v, rules to assign imp shared(v) or imp firstprivate(v) on an acc parallel directive are ignored if the following rule would then produce an imp reduction for v on that acc parallel.
• Given some variable v that is declared outside an acc parallel, if (1) neither exp firstprivate(v) nor exp private(v) appears on that acc parallel, and (2) exp reduction(o:v) appears on any contained gang-partitioned acc loop, then this acc parallel has imp reduction(o:v). If the first condition doesn't hold but the second does, then the reduction refers to the gang-private copy of v, so no gang reduction is implied.
• It is an error if, on a particular OpenACC directive, there exist multiple imp|exp reduction with different reduction operators for a single variable v.
• The arguments to num_gangs, num_workers, and vector_length must be positive integer expressions. The vector_length argument must also be a constant expression. These restrictions are inherited from the OpenMP clauses to which Clacc maps these clauses.
• For a sequential acc loop directive (see §II-G4), if the loop control variable is just assigned instead of declared in the init of the attached for loop, the loop control variable is imp shared instead of pre private. Otherwise, there's no way to tell an aggressive OpenACC compiler to leave such a loop as a normal sequential loop in C, where the variable would normally have shared semantics in that its final value is visible after the loop.
• For any acc loop directive, exp reduction is not permitted on a loop control variable regardless of its data sharing. However, if the loop control variable is declared instead of just assigned in the init of the attached for loop, any reference to the variable's name in the directive's clauses refers to a different variable, so this rule does not apply.

3) Parallel Directive: Clacc's current mapping of an acc parallel directive and its clauses to OpenMP is as follows:
• acc parallel → omp target teams
• imp shared → exp shared
• imp|exp firstprivate → exp firstprivate
• exp private → exp private
• imp|exp reduction → exp reduction
• exp num_gangs → exp num_teams
• If exp num_workers with a constant-expression argument, and if there is a contained worker-partitioned acc loop, then exp num_workers → wrap the acc parallel in a compound statement and declare a local const variable with the same type and value as the exp num_workers argument.
• Else, translation discards exp num_workers.
• Translation discards exp vector_length.

4) Loop Directive: Clacc does not yet support the acc kernels directive or an orphaned acc loop directive, so an acc loop must appear in an acc parallel directive.

Clacc treats an acc loop directive as sequential if either (1) exp seq, (2) exp auto, or (3) not gang, not worker, and not vector. The latter two cases depend on the OpenACC compiler to determine the best way to parallelize the loop. However, as described in §II-F3, Clacc does not yet support the necessary analyses and so depends on the application developer to prescribe the parallelization, so Clacc makes the conservative choice of a sequential loop instead. The third case would certainly be the more straightforward case to improve because OpenACC specifies that the loop iterations are then required to be data-independent. This is a case where a simple AST-level analysis could go a long way for existing OpenACC applications that expect a descriptive interpretation: Clacc could add whichever of gang, worker, or vector doesn't interfere with any explicit occurrences of these clauses on other enclosing or nested loops.

Clacc's current mapping of a sequential acc loop directive and its clauses to OpenMP is as follows:
• The acc loop directive and the following clauses or attributes are discarded during translation:
  – exp seq, exp independent, exp auto
  – exp gang, exp worker, exp vector
  – exp collapse
  – pre private, imp shared, exp reduction
• exp private → wrap the loop in a compound statement and declare an uninitialized local copy of the variable.

If Clacc does not treat an acc loop directive as sequential based on the above conditions, then it treats it as parallelized. In that case, Clacc's current mapping of the acc loop directive and its clauses to OpenMP is as follows:
• acc loop → omp
• exp gang → distribute
• exp worker → parallel for
• If neither this nor any ancestor acc loop is gang-partitioned or worker-partitioned, then → parallel for and → exp num_threads(1). We add parallel for for this case because OpenMP does not permit omp simd directly inside omp target teams.
• exp vector → simd
• The output distribute, parallel for, and simd OpenMP directive components are sorted in the above order before all clauses, including the above → exp num_threads(1), regardless of the input clause order.
• If exp worker, then exp num_workers from ancestor acc parallel → exp num_threads, where the argument is either (1) the original exp num_workers argument if it is a constant expression or (2) otherwise an expression containing only a reference to the local const variable generated for that exp num_workers. On the ancestor acc parallel and on all OpenACC directives nested between it and this acc loop, Clacc leaves the OpenMP data-sharing attribute for the local const variable for num_workers as implicit. Making explicit the data sharing for a generated const scalar doesn't seem worth the implementation effort.
• If exp vector, then exp vector_length from ancestor acc parallel → exp simdlen.
• collapse → collapse
• If exp worker or if this and every ancestor acc loop until the ancestor acc parallel is not gang-partitioned and not worker-partitioned, then imp shared → exp shared.
• Else, imp shared → imp shared. This case must be implicit because omp distribute or omp simd without parallel for (which we add for worker or to be able to use omp simd directly within omp target teams) does not support a shared clause, so we must rely on OpenMP implicit data-sharing rules then.
• pre private for a loop control variable that is declared in the init of the attached for loop → pre private. Mapping to exp private would be erroneous because it would refer to a variable from the enclosing scope.
• If exp vector and pre|exp private applies to a loop control variable that is just assigned instead of declared in the init of the attached for loop, then for that variable, pre|exp private → pre linear, as required by OpenMP for omp simd. Then, wrap the acc loop in a compound statement, and declare an uninitialized local copy of the loop control variable.
• In all other cases, pre|exp private → exp private.
• If exp worker or exp vector, then exp reduction → exp reduction.
• Else, exp reduction is discarded here during translation. A gang reduction for a gang-private variable is useless and so is discarded during translation. Gang reductions for other variables are discarded here but addressed by the data-sharing semantics we assumed for acc parallel in §II-G2.

H. Combined Directive

The only combined OpenACC directive Clacc supports so far is acc parallel loop, which is translated in two stages:
1) Translate from combined directive to effective separate directives. Clacc performs this stage during the parse while constructing the acc parallel loop directive's AST node. Clacc builds the effective AST subtree containing the acc parallel and acc loop directives, and then it records the AST subtree for the outermost of those directives, acc parallel, as a hidden subtree of the acc parallel loop node. The associated code block for the acc parallel loop node is recorded like a normal AST child for each of the acc parallel loop node and the acc loop node.
2) Translate from effective separate directives to OpenMP directives. This stage is performed using the TreeTransform facility just as it normally would be for the separate directives. That is, the acc parallel loop node delegates to the acc parallel node.

The relationship between the acc parallel loop node and the acc parallel node is similar to the relationship between any non-combined OpenACC directive's node and its OpenMP node. Moreover, it effectively replaces that relationship. That is, most AST traversals, including -ast-print, visit the acc parallel loop node and skip over the hidden subtree for its effective acc parallel directive because the acc parallel loop node without the acc parallel subtree represents the original source. The -ast-dump facility prints the acc parallel node as a specially marked child node, which prints its OpenMP node as a specially marked child node. For codegen to LLVM IR, the acc parallel loop node delegates to its effective acc parallel node, which delegates to its OpenMP node.

Because the second stage of translation above is delegated to the effective acc parallel directive, acc parallel loop does not require a mapping to OpenMP. However, it and its clauses do require a mapping to its effective directives for the sake of the first stage, as follows:
• acc parallel loop → acc parallel, whose associated code block is an acc loop, whose associated code block is the associated code block from the acc parallel loop.
• exp private → exp private on the effective acc loop.
• exp reduction → exp reduction on each of the effective acc parallel and acc loop.
• Each remaining explicit clause is permitted on only one of the separate OpenACC directives, and so it is mapped to that directive.
• Predetermined and implicit attributes do not require a mapping to the effective directives because there are none; semantic analysis computes them only on the effective directives.

The choice to map reduction to the effective acc parallel in addition to the effective acc loop doesn't follow the OpenACC 2.6 specification. However, strictly speaking, OpenACC 2.6 specifies reductions only for scalars, which are imp firstprivate, so by default the reduced value from an acc parallel loop is not visible after the acc parallel loop. The OpenACC technical committee is working to address this and other confusing points in the specification of reductions, and Clacc's current mapping represents our attempt to match the intended behavior.

I. LLVM IR Analysis

As we described in §II-C, Clacc currently implements translation from OpenACC to OpenMP within the Clang frontend in order to maximize reuse of the existing OpenMP implementation. As we described in §II-F3, as Clacc's OpenACC support evolves to production quality, it will require sophisticated compiler analyses to perform this translation. However, the LLVM ecosystem is designed so that such analyses are best performed at the level of LLVM IR. While many source-level analysis tools have been built at the Clang AST level, the LLVM IR infrastructure for compiler analyses is designed for reuse across source languages and across target architectures, thus invites more attention from the developer community, and has become far more robust. One of the major research goals as Clacc matures is to investigate what elements of the translation from OpenACC to OpenMP under a descriptive interpretation of OpenACC can be effectively performed at the Clang AST level and what elements are best implemented at the LLVM IR level. For the sake of these latter elements, we are following ongoing efforts within the LLVM community to extend LLVM IR with better support for parallelism, which might eventually be integrated into Clang's OpenMP support as well.

Moving OpenACC-to-OpenMP translation to the LLVM IR level creates a challenge for some of the user-level features, such as source-to-source translation, that we described in §I. That is, this move implies design B from §II-A, which does not provide feature 2, an OpenMP AST/source artifact. First, it is important to clarify that the primary objective of the Clacc project, as described in §I, is to provide production-quality OpenACC support for Clang and LLVM. All other features we have described are lesser objectives. Nevertheless, in case such features prove important to the community and our sponsors but conflict with a technical need to move translation to the LLVM IR level, we will investigate the following potential solution. We imagine extending Clang codegen to encode specific compiler analysis queries into the LLVM IR it generates. After LLVM passes complete, the Clang compiler driver could pass the results of those queries back to the Clang frontend for the sake of AST-level translation to OpenMP. We anticipate that such queries could be encoded as a special instance of a general LLVM IR-level representation for source-level directives, such as the representation being developed for LLVM IR parallelism efforts.

III. EVALUATION

We now evaluate our current Clacc prototype using several OpenACC benchmarks from the SPEC ACCEL 1.2 suite¹: 303.ostencil, 304.olbm, and 314.omriq, which are all written in C. Workloads we used in this evaluation are those included with the SPEC ACCEL distribution. Our results are not compliant SPEC results in part because, as we describe in this section, we modified the benchmark source code to experiment with various characteristics of the OpenACC implementations.

¹SPEC® and SPEC ACCEL® are registered trademarks of the Standard Performance Evaluation Corporation. For more information about SPEC ACCEL, see https://www.spec.org/accel/.

We compare our Clacc results for these benchmarks against results we obtained for the same benchmarks when compiling with the PGI Community Edition 18.4 [6], identified in the remainder of this section as pgcc. The stark contrast of Clacc's currently prescriptive interpretation of OpenACC against the powerful optimization capability of one of the oldest and most advanced OpenACC compilers commercially available should provide insight into the challenges that lie ahead in evolving Clacc into a production-quality OpenACC compiler.

Our test platform runs Ubuntu 18.04 with an Intel Core i7-7700HQ 2.80GHz CPU (8 threads), 32 GB DRAM, and an NVIDIA GeForce GTX 1050. As discussed in §II-F2, Clacc's offloading support is currently limited to multicore, so we were able to compile for the GPU only with pgcc.

A. Testing Methodology

We compiled and ran all our selected benchmarks under a series of test configurations, which vary in terms of compiler, compiler options, offloading device, and modifications to benchmark sources. We now describe those configurations.

host clacc: We compiled each benchmark using clacc -Ofast without -fopenacc, so OpenACC compilation was disabled. Thus, this configuration provides a baseline for detecting when Clacc's offloading support does not improve performance relative to sequential execution. When compiling benchmark 314.omriq with this configuration, we found we had to also specify the command-line option -DSPEC_NO_INLINE, or compilation failed with a linking error for an inlined function.

host pgcc: We compiled each benchmark using pgcc -fast -acc -ta=host. Thus, this configuration provides a baseline for detecting when pgcc's offloading support does not improve performance relative to sequential execution.

multicore pgcc: This configuration is for side-by-side comparison with Clacc's multicore offloading performance. We compiled each benchmark using pgcc -fast -acc -ta=multicore.

multicore clacc: This configuration is the focus of this evaluation. We modified all benchmarks to remove all occurrences of acc data, acc update, pcopyin, pcopyout, and pcopy, which Clacc does not yet support and which are not actually useful because the target is shared-memory multicore, as discussed in §II-F2. We compiled each benchmark using clacc -Ofast -fopenacc.

multicore clacc+pgcc: This configuration is the same as "multicore clacc" except that we generated OpenMP source from Clacc and compiled it using pgcc -fast -mp -ta=multicore instead of using Clang's OpenMP support. Thus, this configuration helps us to isolate the effects of Clacc's prescriptive interpretation of OpenACC from the effects of Clang's OpenMP support.

multicore clacc gwv: This configuration tries to manually overcome negative effects of Clacc's prescriptive interpretation of OpenACC. It's the same as "multicore clacc" except it further modifies the benchmark sources to (1) add worker to every acc loop specifying only gang, (2) add gang worker vector to every acc loop that doesn't specify any of these clauses, and (3) add private clauses to those that don't specify all their private variables correctly. Change 2 should have an especially significant impact because, as described in §II-G4, before this change, Clacc translates these loops as sequential loops. The reason for the last change is that, even though failures to specify variables as private don't cause misbehavior in the previous configurations, we have found that changing the loop parallelization can alter a compilation strategy that otherwise manages to mask such mistakes.

multicore clacc+pgcc gwv: This configuration enables us to see if "multicore clacc gwv" performs differently when the generated OpenMP is compiled with pgcc -fast -mp -ta=multicore instead of Clang's OpenMP support.

gpu pgcc: This configuration examines whether the GPU can outperform multicore for our benchmarks, and so it gives us evidence of how soon Clacc development should focus on GPU support. We compiled each benchmark using pgcc -fast -acc -ta=tesla:cc60.

Fig. 4. 303.ostencil    Fig. 6. 314.omriq

For all test configurations, we ran the benchmarks using SPEC's runspec script. In most cases, we accepted the default of three iterations per benchmark and ran the test, train, and ref workloads included with SPEC ACCEL. The only exceptions were for a few configurations for 314.omriq, where the benchmark ran for many hours with the ref workload, so we chose to run only one iteration with the ref workload. As is usual for SPEC benchmarks, only the durations for the ref
workloads are long enough to be interesting, and the test and train workloads are interesting for verifying correct behavior.

B. Results

Other than a few cases that failed compilation, discussed below, all benchmarks, all test configurations, all workloads, and all iterations passed verification of the benchmark output. Figs. 4, 5, and 6 present the duration of each iteration of each benchmark on the ref workload for each test configuration. As seen in these figures, the iterations within each group of three tend to have similar durations. However, in a few cases, we found that, if we ran the same experiment at another time, the result was sometimes another group of three durations that were similar to each other but noticeably different from the prior three. We expect the culprit is simply OS noise. Thus, for these results, we assume that small changes across test configurations are probably not meaningful.

303.ostencil is a thermodynamics benchmark solving partial differential equations (PDE) using a stencil code on a 3-D structured grid. pgcc offers a substantial performance improvement when moving offloading from host to multicore. Clacc offers a modest improvement for the same move, but it's small enough that it might be just OS noise. However, under the "multicore clacc gwv" configuration, Clacc is comparable with pgcc on multicore and improves significantly over Clacc's host performance. Interestingly, this configuration merely adds a single worker clause to a single gang loop that already has a nested vector loop. Another interesting point here is that attempting to compile Clacc's OpenMP output with pgcc gives significantly worse performance than either compiler does for host. On the other hand, pgcc offloading to GPU is far better than all other test configurations.

Fig. 5. 304.olbm

304.olbm is a computational fluid dynamics benchmark implementing the Lattice Boltzmann Method (LBM) and simulating incompressible fluids in 3D. pgcc offloading to GPU clearly outperforms the other test configurations for this benchmark. Otherwise, due to the aforementioned OS noise, we are not convinced the small differences in the various durations here are representative of important differences among the test configurations. It's no surprise that Clacc offloading to multicore fails to see a significant improvement over Clacc for host because 304.olbm specifies all acc loop directives without gang, worker, or vector clauses, so Clacc treats them as sequential loops. However, after we add those clauses, Clacc does not produce better performance, and pgcc, which applies the power of pgcc's descriptive interpretation of OpenACC, does no better either. We noticed that compilation for "multicore clacc gwv" warned of a failure to vectorize. We tried passing -Minfo to pgcc when offloading to multicore and found that then it too reported that, due to a data dependency, it was unable to vectorize each loop marked with acc loop in 304.olbm. However, when we offload to the GPU, pgcc reports successful vectorization.

314.omriq is a medical benchmark that performs MRI image reconstruction. This benchmark spends the vast majority of its time in a single loop that already has a gang outer loop and a vector inner loop. When compiling with Clacc and targeting host or multicore, the performance is poor enough that we didn't have time to run multiple iterations. Moreover, the performance is roughly the same, so the specified loop partitioning offers no measurable benefit over sequential execution. As for 303.ostencil, "multicore clacc gwv" adds a worker clause to the gang loop with a major positive impact on performance. However, performance is still worse than for pgcc compiling the benchmark sequentially. These results suggest that poor scalar optimizations could be to blame. As for the other benchmarks, moving to the GPU with pgcc offers dramatic performance improvements.
For the Clacc plus pgcc compilation configurations, pgcc reported internal compiler errors, which are hard to debug because pgcc is closed source.

IV. RELATED WORK

The home of the OpenACC specification and community is the OpenACC website [4]. Several textbooks on OpenACC exist to guide developers seeking to accelerate their applications [5], [16]. The OpenACC website lists a number of compilers and tools for OpenACC [13]. The only production open-source compiler cited is GCC [10], whose OpenACC support is primarily developed by Mentor Graphics. A recent blog from Mentor shows their attempts to improve GCC's OpenACC performance to match PGI's [18].

PGI's Michael Wolfe, the Technical Committee Chair for OpenACC, explains the significance and complexity of translating OpenACC to OpenMP [19]. Sultana et al. prototyped a translator from OpenACC to OpenMP as an extension of the Eclipse C/C++ Development Tools [20]. Their primary objective is a simple, predictable source-to-source translation intended to be paired with "manual restructuring and performance tuning." Clacc's primary objective is production-quality OpenACC compiler support. Another difference is that they map OpenACC's vector loops to omp parallel for, perhaps due to previously poor compiler support for omp simd.

V. CONCLUSIONS

We have presented Clacc, a recent project to develop production-quality OpenACC compiler support for Clang and LLVM. We have described Clacc's objectives, design, and the state of our prototype. A key design feature is to translate OpenACC to OpenMP to build on Clang's existing OpenMP compiler and runtime support. Clacc is in active development but currently only supports C, shared-memory multicore, and a prescriptive interpretation of OpenACC. We have presented an evaluation comparing the performance of several SPEC ACCEL benchmarks using our Clacc prototype vs. PGI. This evaluation confirms the need to extend Clacc with a more complex descriptive interpretation of OpenACC and with GPU support.

VI. ACKNOWLEDGMENTS

We would like to thank the Clang and LLVM community for all their time and valuable input during mailing list discussions and code reviews as we have upstreamed our improvements to the Clang and LLVM infrastructure from the Clacc project. We would also like to thank Hal Finkel of Argonne National Laboratory for his technical feedback on an early version of the Clacc design that we posted to the Clang developer list.

This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE), Office of Science. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

REFERENCES

[1] S. Mittal and J. S. Vetter, "A survey of CPU-GPU heterogeneous computing techniques," ACM Computing Surveys, vol. 47, no. 4, pp. 1–35, 2015.
[2] B. Dally, "GPU computing to exascale and beyond," 2010.
[3] P. Kogge, K. Bergman, S. Borkar, D. Campbell, W. Carlson, W. Dally, M. Denneau, P. Franzon, W. Harrod, K. Hill, J. Hiller, S. Karp, S. Keckler, D. Klein, R. Lucas, M. Richards, A. Scarpelli, S. Scott, A. Snavely, T. Sterling, R. S. Williams, and K. Yelick, "Exascale computing study: Technology challenges in achieving exascale systems," DARPA Information Processing Techniques Office, Tech. Rep., 2008.
[4] "OpenACC," [Online]. Available: https://www.openacc.org/.
[5] S. Chandrasekaran and G. Juckeland, Eds., OpenACC for Programmers: Concepts and Strategies. Addison-Wesley Professional, Sep 2017.
[6] "PGI," [Online]. Available: https://www.pgroup.com/.
[7] S. Lee and J. Vetter, "OpenARC: Open Accelerator Research Compiler for Directive-Based, Efficient Heterogeneous Computing," in HPDC '14: Proceedings of the ACM Symposium on High-Performance Parallel and Distributed Computing, Short Paper, June 2014.
[8] S. Lee, J. Kim, and J. S. Vetter, "OpenACC to FPGA: A Framework for Directive-Based High-Performance Reconfigurable Computing," in 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2016, pp. 544–554.
[9] A. Sabne, P. Sakdhnagool, S. Lee, and J. S. Vetter, "Understanding Portability of a High-Level Programming Model on Contemporary Heterogeneous Architectures," IEEE Micro, vol. 35, no. 4, pp. 48–58, July 2015.
[10] "GCC, the GNU Compiler Collection," [Online]. Available: https://gcc.gnu.org/.
[11] "Press Release: OpenACC Adoption Continues to Gain Momentum in 2016," [Online]. Available: https://www.openacc.org/news/press-release-openacc-adoption-continues-gain-momentum-2016, 2016.
[12] H. Fu, J. Liao, J. Yang, L. Wang, Z. Song, X. Huang, C. Yang, W. Xue, F. Liu, F. Qiao, and et al., "The Sunway TaihuLight supercomputer: system and applications," Science China Information Sciences, vol. 59, p. 072001, Jun 2016.
[13] "OpenACC: Downloads & Tools," [Online]. Available: http://openacc.org/tools.
[14] K. Friedline, S. Chandrasekaran, M. G. Lopez, and O. Hernandez, "OpenACC 2.5 Validation Testsuite Targeting Multiple Architectures," in High Performance Computing. Cham: Springer International Publishing, 2017, pp. 557–575.
[15] S. Atzeni, G. Gopalakrishnan, Z. Rakamaric, D. H. Ahn, I. Laguna, M. Schulz, G. L. Lee, J. Protze, and M. S. Müller, "ARCHER: Effectively Spotting Data Races in Large OpenMP Applications," in 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2016, pp. 53–62.
[16] R. Farber, Ed., Parallel Programming with OpenACC. Morgan Kaufmann, 2017.
[17] "OpenMP Technical Report 7: Version 5.0 Public Comment Draft," [Online]. Available: https://www.openmp.org/specifications/, July 2018.
[18] R. Allen, [Online]. Available: https://blogs.mentor.com/embedded/blog/2018/06/06/evaluating-the-performance-of-openacc-in-gcc/.
[19] M. Wolfe, "Compilers and More: OpenACC to OpenMP (and back again)," Jun 2016. [Online]. Available: https://www.hpcwire.com/2016/06/29/compilers-openacc--back/
[20] N. Sultana, A. Calvert, J. L. Overbey, and G. Arnold, "From OpenACC to OpenMP 4: Toward Automatic Translation," in Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale, ser. XSEDE16. New York, NY, USA: ACM, 2016, pp. 44:1–44:8.