OpenCoarrays: Open-source Transport Layers Supporting Coarray Fortran Compilers

Alessandro Fanfarillo (University of Rome Tor Vergata, Rome, Italy; corresponding author), Tobias Burnus (Munich, Germany), Valeria Cardellini (University of Rome Tor Vergata, Rome, Italy)

Salvatore Filippone (University of Rome Tor Vergata, Rome, Italy), Dan Nagle (National Center for Atmospheric Research, Boulder, Colorado), Damian Rouson (Sourcery Inc., Oakland, California)

ABSTRACT

Coarray Fortran is a set of features of the Fortran 2008 standard that make Fortran a PGAS parallel programming language. Two commercial compilers currently support coarrays: Cray and Intel. Here we present two coarray transport layers provided by the new OpenCoarrays project: one library based on MPI and the other on GASNet. We link the GNU Fortran (GFortran) compiler to either of the two OpenCoarrays implementations and present performance comparisons between executables produced by GFortran and the Cray and Intel compilers. The comparison includes synthetic benchmarks, application prototypes, and an application kernel. In our tests, Intel outperforms GFortran only on intra-node small transfers (in particular, scalars). GFortran outperforms Intel on intra-node array transfers and in all settings that require inter-node transfers. The Cray comparisons are mixed, with either GFortran or Cray being faster depending on the chosen hardware platform, network, and transport layer.

Keywords

Fortran, PGAS, Coarrays, GCC, HPC

1. INTRODUCTION

Coarray Fortran (also known as CAF) originated as a syntactic extension of Fortran 95 proposed in the early 1990s by Robert Numrich and John Reid [10] and eventually became part of the Fortran 2008 standard published in 2010 (ISO/IEC 1539-1:2010) [11]. The main goal of coarrays is to allow Fortran users to create parallel programs without the burden of explicitly invoking communication functions or directives such as those in the Message Passing Interface (MPI) [12] and OpenMP [2].

Coarrays are based on the Partitioned Global Address Space (PGAS) parallel programming model, which attempts to combine the SPMD approach used in distributed-memory systems with the semantics of shared-memory systems. In the PGAS model, every process has its own memory address space but can access a portion of the address space on other processes. The coarray syntax adds a few specific keywords and leaves the Fortran user free to use the regular array syntax within parallel programs.

Coarrays were first implemented in the Cray Fortran compiler, and the Cray implementation is considered the most mature, reliable, and high-performing. Since the inclusion of coarrays in the Fortran standard, the number of compilers implementing them has increased: the Intel and g95 [15] Fortran compilers as well as compiler projects at Rice University [6] and the University of Houston (OpenUH) [5] support coarrays.

The Cray compiler only runs on proprietary architectures. The Intel compiler is only available for Linux and Windows, and the standard Intel license only supports coarrays in shared memory; running in distributed memory requires the Intel Cluster Toolkit. These technical limitations, in conjunction with the cost of a commercial compiler, limit the widespread usage of coarrays.

The availability of coarray support in a free compiler could contribute to wider evaluation and adoption of the coarray parallel programming model. The free, released compilers with coarray support, however, do not support several other standard Fortran features, which limits their utility in modern Fortran projects. For example, neither the Rice compiler nor OpenUH nor g95 supports the object-oriented programming (OOP) features that entered the language in the Fortran 2003 standard. Furthermore, g95's coarray support is only free in its shared-memory configuration; configuring the compiler for multi-node coarray execution requires purchasing a commercial license.

Considering the compiler landscape, the advent of support for coarrays in the free, open-source, and widely used GNU Fortran compiler (GFortran) potentially represents a watershed moment. The most recent GFortran release (4.9.1) supports most features of the recent Fortran standard, including the aforementioned OOP features [4]. Furthermore, the pre-release development trunk of GFortran offers nearly complete support for the coarray features of Fortran 2008 plus several anticipated new coarray features expected to appear in the next Fortran standard. However, any programs that are not embarrassingly parallel require linking GFortran-compiled object files with a parallel communication library. In this paper, we present performance results for OpenCoarrays, the first collection of open-source transport layers that support coarray Fortran compilers by translating a compiler's communication and synchronization requests into calls to one of several communication libraries. We provide a link to the OpenCoarrays source code repository on the dedicated domain http://opencoarrays.org.

Section 2 introduces the coarray programming model. Section 3 introduces the OpenCoarrays transport layers: one based on MPI and one on the GASNet communication library for PGAS languages [1]. Section 4.1 presents the test suite that forms our basis for comparisons between the compiler/library combinations. Section 5 presents the bandwidth, latency, and execution times for various message sizes and problem sizes produced with each compiler/library combination. Section 6 summarizes our conclusions.

2. INTRODUCTION TO COARRAYS

In this section, we explain basic coarray concepts. A program that uses coarrays is treated as if it were replicated at the start of execution. Each replication is called an image. Each image executes asynchronously until the programmer explicitly synchronizes via one of several mechanisms. A typical synchronization statement is sync all, which functions as a barrier at which all images wait until every image reaches the barrier. A piece of code contained between synchronization points is called a segment, and a compiler is free to apply all its optimizations inside a segment.

An image has an image index, which is a number between one and the number of images (inclusive). In order to identify a specific image at runtime or the total number of images, Fortran provides the this_image() and num_images() functions. A coarray can be a scalar or an array, static or dynamic, and of intrinsic or derived type. A program accesses a coarray object on a remote image using square brackets [ ]; an object with no square brackets is considered local. A simple coarray Fortran program follows:

real, dimension(10), codimension[*] :: x, y
integer :: num_img, me

num_img = num_images()
me = this_image()
! Some code here
x(2) = x(3)[7]        ! get value from image 7
x(6)[4] = x(1)        ! put value on image 4
x(:)[2] = y(:)        ! put array on image 2
sync all

! Remote-to-remote array transfer
if (me == 1) then
   y(:)[num_img] = x(:)[4]
   sync images(num_img)
else if (me == num_img) then
   sync images([1])
end if

x(1:10:2) = y(1:10:2)[4]   ! strided get from image 4

In this example, x and y are coarrays, and every image can access these variables on every other image. All the usual Fortran array syntax rules are valid and applicable. Also, a coarray may be of user-defined derived type.

Fortran provides locks, critical sections, and atomic intrinsics for coarrays. Currently, the Cray compiler is the only released compiler supporting these features. Although not discussed in this paper, the OpenCoarrays and GNU Fortran trunks offer partial support for these features.
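
To make the synchronization features just mentioned concrete, the following minimal sketch exercises a critical construct and the Fortran 2008 atomic intrinsics. It is our own illustration, not code from the paper's test suite, and all names in it are hypothetical.

program caf_sync_features
  use iso_fortran_env, only: atomic_int_kind
  implicit none
  integer :: counter[*]
  integer(atomic_int_kind) :: flag[*], val

  counter = 0
  call atomic_define(flag, 0_atomic_int_kind)
  sync all

  critical                    ! only one image at a time updates the shared counter
     counter[1] = counter[1] + 1
  end critical
  sync all
  if (this_image() == 1) print *, 'counter =', counter   ! equals num_images()

  ! An atomic flag can order work between unordered segments without a barrier.
  if (this_image() == 1) call atomic_define(flag[1], 1_atomic_int_kind)
  do
     call atomic_ref(val, flag[1])   ! atomically read the flag on image 1
     if (val == 1) exit
  end do
end program caf_sync_features
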
3. GNU FORTRAN AND LIBCAF

GNU Fortran (GFortran) is a free, efficient, and widely used compiler. Starting in 2012, GFortran supported the coarray syntax for single-image execution, but it did not provide multi-image support. GFortran delegates communication and synchronization to an external library (LIBCAF) while the compiler remains agnostic with regard to the actual implementation of the library calls.

For single-image execution, GFortran calls stub implementations inside a LIBCAF_SINGLE library included with the compiler. For multi-image execution, an external library must be provided to handle the GFortran-generated procedure invocations. Having an external library allows for exchanging communication libraries without modifying the compiler code. It also facilitates integration of the library into compilers other than GFortran, so long as those compilers are capable of generating the appropriate library calls. OpenCoarrays uses a BSD-style, open-source license.

OpenCoarrays currently provides two alternative versions of the library: LIBCAF_MPI and LIBCAF_GASNet. The MPI version has the widest feature coverage; it is intended as the default because of its ease of use and installation. The GASNet version targets expert users; it provides better performance than the MPI version but requires more effort during installation, configuration, and usage.

3.1 LIBCAF_MPI

LIBCAF_MPI currently supports:

• Coarray scalar and array transfers, including efficient transfers of array sections with non-unit stride.

• Synchronization via sync all and sync images.

• Collective procedures (co_sum, co_max, co_min, etc.), except for character coarrays.

• Atomics.

The support for coarray transfers includes get/put and strided get/put for every data type, intrinsic or derived.

LIBCAF_MPI uses the one-sided communication functions provided by MPI-2 and MPI-3 with passive synchronization (using MPI_Win_lock/MPI_Win_unlock). This approach allows us to easily implement the basic coarray operations and to exploit the MPI RMA support when available.

3.1.1 Get, Put and Strided Transfers

A Get operation is a transfer from a remote image to the local image; in coarray syntax it is expressed with the local data on the left-hand side of the assignment and the remote data on the right-hand side. It maps directly onto the MPI_Get one-sided function introduced in MPI-2. A Put operation is the opposite of a Get operation: it transfers data from the local image to a remote image. It is expressed in coarray syntax with the local data on the right-hand side of the assignment and the remote data on the left-hand side. This operation maps onto the MPI_Put one-sided function introduced in MPI-2.

By strided transfer, we mean a non-contiguous array transfer. This pattern is quite common in scientific applications and usually involves arrays with several dimensions. LIBCAF_MPI provides two approaches to the strided-transfer implementation: sending element-by-element and using the derived data types feature provided by MPI. The first approach is the most general and the easiest to implement but is inefficient in terms of performance; it is provided as the default configuration because of its generality. The second approach provides higher performance but requires more memory in order to transfer the data.
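
As an illustration of the mapping just described, the sketch below performs a strided get with an MPI derived datatype and passive-target synchronization, the combination the text attributes to LIBCAF_MPI. It is a standalone example of ours, not code extracted from the library; the window setup, rank choice, and sizes are assumptions made for the example, and a real implementation would more likely allocate the window with MPI_Win_allocate.

program strided_get_sketch
  use mpi
  implicit none
  integer, parameter :: n = 1024, stride = 4, nblk = n/stride
  double precision   :: x(n), y(n)
  integer            :: win, vec_t, me, np, tgt, ierr
  integer(kind=MPI_ADDRESS_KIND) :: winsize, disp

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, me, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, np, ierr)
  y = dble(me)
  x = 0.d0

  ! Expose y through an RMA window (8-byte displacement unit for doubles).
  winsize = n * 8_MPI_ADDRESS_KIND
  call MPI_Win_create(y, winsize, 8, MPI_INFO_NULL, MPI_COMM_WORLD, win, ierr)

  ! Describe "every stride-th element" once instead of sending element by element.
  call MPI_Type_vector(nblk, 1, stride, MPI_DOUBLE_PRECISION, vec_t, ierr)
  call MPI_Type_commit(vec_t, ierr)

  tgt  = mod(me + 1, np)      ! fetch from the next rank, purely as an example
  disp = 0_MPI_ADDRESS_KIND
  call MPI_Win_lock(MPI_LOCK_SHARED, tgt, 0, win, ierr)   ! passive-target synchronization
  call MPI_Get(x, 1, vec_t, tgt, disp, 1, vec_t, win, ierr)
  call MPI_Win_unlock(tgt, win, ierr)

  call MPI_Type_free(vec_t, ierr)
  call MPI_Win_free(win, ierr)
  call MPI_Finalize(ierr)
end program strided_get_sketch

The single MPI_Get with a vector datatype corresponds to a coarray statement such as x(1:n:stride) = y(1:n:stride)[img]; disabling the datatype path would reduce the transfer to one operation per element, which is the slow default behavior described above.
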
3.2 LIBCAF_GASNet

GASNet stands for Global Address Space Networking and is provided by UC Berkeley [1]. LIBCAF_GASNet is an experimental version targeting expert users but providing higher performance than LIBCAF_MPI. GASNet provides efficient remote memory access operations, native network communication interfaces, and useful features like Active Messages and strided transfers (still under development). The major limitation of GASNet for a coarray implementation is the required explicit declaration of the total amount of remote memory used by the program. Thus, the user has to know, before launching the program, how much memory is required for coarrays. Since a coarray can be static or dynamic, a good estimate of this amount of memory may not be easy to make. An underestimate may generate sudden errors due to memory overflow, and an overestimate may require the usage of more compute nodes than needed.

Currently, LIBCAF_GASNet supports only coarray scalar and array transfers (including efficient strided transfers) and all the synchronization routines. This paper therefore provides only a partial analysis of this version. We plan to complete every test case in future work.

LIBCAF_GASNet maps the Get and Put operations directly onto the gasnet_put_bulk and gasnet_get_bulk functions. Like LIBCAF_MPI, LIBCAF_GASNet offers two alternatives for strided transfers: the element-by-element approach and the native support for strided transfers provided by GASNet. The second approach uses the gasnet_putv_bulk and gasnet_getv_bulk functions, which provide good performance even though they are not yet optimized.

4. COARRAY COMPARISON

This section presents a comparison between the GFortran/OpenCoarrays runtime performance and that of the Cray and Intel commercial compilers. We analyze LIBCAF_MPI more deeply but also provide some LIBCAF_GASNet results on Cray machines.

4.1 Test Suite

In order to compare LIBCAF with the other compilers, we ran several test cases:

• EPCC CAF Micro-benchmark suite.

• Burgers Solver.

• CAF Himeno.

• Distributed Transpose.

We describe each of these test codes in more detail next.

4.1.1 EPCC CAF Micro-benchmark Suite

David Henty of the University of Edinburgh wrote the EPCC CAF Micro-benchmark suite [8]. Its source code is freely available on the web. It measures the performance (latency and bandwidth) of the basic coarray operations (get, put, strided get, strided put, sync), a typical communication pattern (the halo exchange), and synchronization. Every basic operation is analyzed in two different scenarios: single point-to-point and multiple point-to-point. In the first case, image 1 interacts only with image n; every other image waits for the end of the test. In this scenario, no network contention occurs; it thus represents a best-case scenario.

During the multiple point-to-point tests, image i interacts only with image i + n/2; this test case models what actually happens in real parallel applications. Here we compare only these two scenarios. We do not report results on the performance of synchronization operations and halo exchanges: the synchronization time has a smaller impact on the overall performance than the transfer time, and the halo-exchange performance is considered in the real application tests like CAF Himeno.
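
To make the multiple point-to-point pattern concrete, here is a minimal coarray sketch of our own (not the EPCC source): assuming an even number of images, image i writes a block to image i + n/2 and times the transfer, while the upper half of the images only participates in the pairwise synchronization.

program multi_pt2pt_put
  implicit none
  integer, parameter :: blk = 65536          ! block size chosen for the example
  real(kind(1.d0))   :: buf(blk)[*], local(blk)
  integer :: me, n, partner, t0, t1, rate

  me = this_image()
  n  = num_images()                          ! assumed even in this sketch
  local = real(me, kind(1.d0))
  sync all

  if (me <= n/2) then
     partner = me + n/2                      ! lower half writes to upper half
     call system_clock(t0, rate)
     buf(:)[partner] = local                 ! one contiguous put of blk doubles
     sync images (partner)
     call system_clock(t1)
     print *, 'image', me, 'put time (s):', real(t1 - t0)/real(rate)
  else
     sync images (me - n/2)                  ! matching pairwise synchronization
  end if
end program multi_pt2pt_put
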
4.1.2 Burgers Solver

Rouson et al. [14] wrote the coarray Fortran Burgers Solver, and the source code is freely available online (see "Resources" at http://www.cambridge.org/Rouson). This application prototype solves a partial differential equation (PDE) that involves diffusion, nonlinear advection, and time dependence. It uses Runge-Kutta time advancement. It also uses halo point exchanges to support finite difference approximations. The spatial domain is one-dimensional, which keeps the code much simpler than most scientific PDE applications. Furthermore, in studies with the Cray compiler, the Burgers solver exhibits 87% parallel efficiency in weak scaling on 16,384 cores. The solver exhibits linear scaling (sometimes super-linear) [7], and a separate, upcoming paper provides comparisons to an MPI version of the solver. It uses coarrays mainly for scalar transfers. The destination images of transfers are neighboring images that are usually placed on the same node.

4.1.3 CAF Himeno

Ryutaro Himeno of RIKEN wrote the initial version of CAF Himeno as a parallel application using OpenMP and MPI to build a three-dimensional (3D) Poisson relaxation via the Jacobi method. William Long of Cray, Inc., developed the first coarray version of CAF Himeno using an early Cray coarray implementation. Dan Nagle of NCAR refactored the coarray CAF Himeno to conform to the coarray feature set in the Fortran 2008 standard. The resulting test expresses strided array transfers in the usual Fortran colon syntax.

4.1.4 3D Distributed Transpose

Robert Rogallo, formerly of NASA Ames Research Center, provided the Distributed Transpose test. This application kernel was extracted from codes for 3D Fourier-spectral, direct numerical simulations of turbulent flow [13]. The kernel is available in coarray and MPI versions, facilitating comparison between the two.

4.2 Hardware and Software

In order to compare the Cray and Intel coarray implementations with our new GFortran/OpenCoarrays implementation, we ran each test case on high-performance computing (HPC) clusters provided by several organizations. Our Intel CAF tests utilize the Intel Cluster Toolkit, as required for distributed-memory coarray execution. Our Cray compiler tests use proprietary Cray hardware, as required. Our GFortran/OpenCoarrays tests based on MPI (LIBCAF_MPI) can be executed on any machine able to compile GCC and any standard MPI implementation. The hardware available for our analysis is as follows:

• Eurora: Linux cluster, 16 cores per node, Infiniband QDR 4x QLogic (CINECA).

• PLX: IBM iDataPlex, 12 cores per node, Infiniband QDR 4x QLogic (CINECA).

• Yellowstone/Caldera: IBM iDataPlex, 16 cores per node (2 GB/core), FDR Infiniband Mellanox 13.6 GBps (NCAR). For tests requiring more than 2 GB/core we used Caldera, which has 4 GB/core and the same processors as Yellowstone.

• Janus: Dell, 12 cores per node, Infiniband Mellanox (CU-Boulder).

• Hopper: Cray XE6, 24 cores per node (1.3 GB/core), 3-D Torus Cray Gemini (NERSC).

• Edison: Cray XC30, 24 cores per node, Dragonfly Cray Aries (NERSC).

In this paper we present only the results collected from Yellowstone and Hopper/Edison. In particular, the comparison between GFortran and Intel has been run on Yellowstone and the comparison between GFortran and Cray on Hopper/Edison. In order to validate the results, we ran the tests on the remaining machines listed above.

4.2.1 Hopper and Edison

Hopper allows for running coarray programs only with Cray and GFortran. For the Cray compiler we used version 8.2.1 with the -O3 flag, loaded the craype-hugepages2M module, and set the XT_SYMMETRIC_HEAP_SIZE environment variable appropriately (when needed). For GFortran we used the GCC 5.0 development version (the GCC trunk) and MPICH 7.0.0 (installed on the machine) for LIBCAF_MPI. The only flag applied with GFortran was -Ofast (for optimization). The GCC 5.0 experimental version employed was almost the same version present, as of this writing, on the gcc trunk (no performance changes).

We used Edison only for the EPCC CAF micro-benchmark. On Edison, we used the Cray compiler version 8.3.0 with the same configuration used on Hopper.

4.2.2 Yellowstone

Yellowstone allows us to run only coarray programs compiled with Intel and GFortran. We used the Intel compiler 14.0.2, which employs IntelMPI 4.0.3 for the coarray support. We applied the following flags to the compilation: -Ofast -coarray -switch no_launch. For GFortran, we used the GCC 5.0 development version (the same used for Cray) and MPICH IBM (optimized for Mellanox IB) for LIBCAF_MPI. In this case too, the only flag applied with GFortran was -Ofast.

5. RESULTS

In this section we present an exhaustive set of tests performed on single and multiple nodes of Yellowstone/Caldera, Hopper, and Edison. "Single node" means that the network connecting the cluster nodes is not involved; such a configuration can be useful for understanding the performance of the communication libraries in a shared-memory environment.

5.1 EPCC CAF - GFortran vs. Intel

The EPCC CAF micro-benchmark suite provides latency and bandwidth for several block sizes during the basic CAF operations (get, put, strided get, strided put). EPCC tests two scenarios: single point-to-point, where there is no network contention involved, and multiple point-to-point, a more realistic test case. In this section, we present the results of GFortran vs. Intel on single and multiple nodes only for the put and strided put operations ("get" shows almost the same results). This particular comparison, GFortran vs. Intel, is feasible only on Yellowstone/Caldera.

5.1.1 Single pt2pt Put on a single node

This test employed 16 cores in one point-to-point put operation on one 16-core node on Yellowstone. Figures 1 and 2 show that, on a single node, Intel is better than GFortran for quantities less than or equal to 4 doubles (32 bytes total). After that point the latency assumes an exponential trend for Intel but stays constant for GFortran. The bandwidth, after 4 doubles, shows exactly the inverse behavior: exponential for GFortran and constant for Intel. In other words, for small transfers (specifically scalars) within the same node, without contention, Intel outperforms GFortran.

[Figure 1: Latency, Put small block size - Yellowstone 16 cores]

[Figure 2: Bandwidth, Put small block size - Yellowstone 16 cores]

[Figure 3: Latency, Put big block size - Yellowstone 16 cores]

[Figure 4: Bandwidth, Put big block size - Yellowstone 16 cores]

Figures 3 and 4 show that increasing the block size does not change the trends observed for small block sizes.

[Figure 5: Bandwidth, multi-pt2pt Put - Yellowstone 16 cores]

5.1.2 Multi-pt2pt Put on a single node

This test employed 16 cores in multiple point-to-point put operations on one 16-core node on Yellowstone. In this configuration, image i interacts only with image i + n/2 (where n is the total number of images). For this case, we show only the bandwidth, in Figure 5.

This test case shows that Intel is less affected by network contention (in this case the shared-memory "network") than GFortran. In this particular case, Intel has a higher bandwidth than GFortran for values less than or equal to 8 doubles (64 bytes).

A good way to see this phenomenon is to chart the bandwidth difference between single and multiple point-to-point. Figure 6 shows such a comparison, and we can see that Intel has a constant behavior: Intel is insensitive to network contention, which means the network is not the bottleneck for Intel. GFortran exhibits quite a different behavior, implying that GFortran is network-limited.

[Figure 6: Bandwidth difference between single and multi - Yellowstone 16 cores]

[Figure 7: Strided Put on single node - Yellowstone 16 cores]

[Figure 8: Latency on 2 compute nodes - Yellowstone 32 cores]

[Figure 9: Bandwidth, single Put on 2 compute nodes - Yellowstone 32 cores]

5.1.3 Single pt2pt Strided Put on a single node

By "strided transfer" we refer to a non-contiguous array transfer. This kind of transfer is common in several scientific applications, and its performance is therefore crucial for the performance of the entire application. This test is also very useful for understanding the behavior of Intel. LIBCAF_MPI supports efficient strided transfers; such support can be disabled by sending the array elements one at a time. Figure 7 shows the performance of LIBCAF_MPI and Intel during the strided transfer; GFor/MPI_NS represents the performance of LIBCAF_MPI without strided-transfer support (sending element-by-element). The most interesting fact is that, even with a contiguous array (stride = 1), Intel has the same performance as LIBCAF_MPI without the strided-transfer support, indicating that Intel sends arrays element-by-element even when they are contiguous.

LIBCAF_MPI uses classic MPI derived data types in order to implement an efficient strided transfer. This approach is very easy to implement, but it is not very efficient in terms of either memory or time.
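
To make the distinction concrete, the short sketch below (ours, not taken from the benchmark) contrasts a single strided put, which LIBCAF_MPI can pack into one derived-datatype transfer, with an explicit element-by-element loop that issues one remote operation per element (the behavior labelled GFor/MPI_NS above and, apparently, what the Intel runtime does even for contiguous data).

program strided_vs_elementwise
  implicit none
  real    :: x(1024)[*], y(1024)
  integer :: i

  y = real(this_image())
  sync all
  if (this_image() == 1 .and. num_images() >= 2) then
     x(1:1024:4)[2] = y(1:1024:4)     ! one strided put; the transport layer may pack it
     do i = 1, 1024, 4                ! element-wise fallback: one remote put per element
        x(i)[2] = y(i)
     end do
  end if
  sync all
end program strided_vs_elementwise
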

5.1.4 Single pt2pt Put on multiple nodes

In this configuration, we ran the benchmark on 32 cores, thus involving the real, inter-node network in the transfer. Something unexpected happens: in Figure 8, Intel shows about 428 seconds of latency for transferring 512 doubles (4 KBytes) through the network. This strange behavior relates to the element-wise transfer.

5.2 EPCC CAF – GFortran vs. Cray

In this section, we compare the results of GFortran and Cray. This particular comparison is feasible only on Hopper and Edison. For this case, we report only the bandwidth results, and we report the GASNet results for the EPCC and Distributed Transpose tests. Furthermore, we report only the results for the Get and Strided Get operations. (The performance of Get is similar to Put, but we do not show the Put results because of RMA problems when data packages exceed a protocol-type-switching threshold.)

5.2.1 Single pt2pt Get on a single node

This test involves one point-to-point Get on 24 cores. Because Hopper and Edison have 24 cores on each compute node, the test runs on one node. Figure 10 shows that, for small transfers, LIBCAF_GASNet outperforms Cray on Hopper. For big transfers, Figure 11 shows that Cray is usually (but not always) better than the two LIBCAF implementations. Figure 12 shows that, on Edison, LIBCAF_GASNet outperforms Cray for small transfers (as on Hopper) but with a bigger gap. Figure 13 shows that, for big sizes, LIBCAF_MPI outperforms Cray and LIBCAF_GASNet. On a single node on Edison, Cray is always outperformed by one of the LIBCAF implementations.

[Figure 10: Bandwidth for small block sizes - Hopper 24 cores]

[Figure 11: Bandwidth for big block sizes - Hopper 24 cores]

[Figure 12: Bandwidth for small block sizes - Edison 24 cores]

[Figure 13: Bandwidth for big block sizes - Edison 24 cores]

[Figure 14: Bandwidth for multi pt2pt Get - Hopper 24 cores]

[Figure 15: Bandwidth for multi pt2pt Get - Edison 24 cores]

5.2.2 Multi pt2pt Get on a single node

This test performs multiple point-to-point Gets on 24 cores. We analyze the behavior of LIBCAF_MPI, LIBCAF_GASNet, and Cray with contention on the underlying network layer. Figure 14 shows LIBCAF_GASNet outperforming Cray in almost every case for which there exists contention on Hopper's underlying network layer. Figure 15 shows LIBCAF_GASNet exhibiting better performance than Cray for small block sizes on Edison.

5.2.3 Single pt2pt Strided Get on a single node

Figure 16 shows the performance of one point-to-point strided Get on one Hopper compute node. In this case, Cray always has the best performance. GASNet provides experimental strided-transfer support that is not yet optimized. The single stride has dimension 32768 doubles. Figure 17 shows that LIBCAF_MPI is very effective on Edison; in fact, for several stride sizes, LIBCAF_MPI is better than Cray.

5.2.4 Single pt2pt Get on multiple nodes

Figures 18 and 19 show that on Hopper, on multiple nodes, Cray performs better than LIBCAF in almost every situation. GASNet shows good behavior for small block sizes. Figures 20 and 21 show that, on Edison, LIBCAF_GASNet almost always outperforms Cray.

[Figure 16: Strided Get on single node - Hopper 24 cores]

[Figure 17: Strided Get on single node - Edison 24 cores]

[Figure 18: Bandwidth for small block sizes - Hopper 48 cores]

[Figure 19: Bandwidth for big block sizes - Hopper 48 cores]

[Figure 20: Bandwidth for small block sizes - Edison 48 cores]

[Figure 21: Bandwidth for big block sizes - Edison 48 cores]

5.2.5 Multi pt2pt Get on multiple nodes

Figures 22 and 23 show the performance of Cray and LIBCAF when network contention is involved on multiple nodes. On Hopper, LIBCAF_GASNet has the best performance for small transfers (less than 512 doubles); on Edison, LIBCAF_GASNet shows good performance for transfers smaller than 16 doubles.

5.2.6 Single pt2pt Strided Get on multiple nodes

The strided transfers on multiple nodes, for both Hopper and Edison, show an unexpected trend. Figures 24 and 25 show that LIBCAF_MPI has better performance than Cray on both machines.

[Figure 22: Bandwidth for multi pt2pt Get - Hopper 48 cores]

[Figure 23: Bandwidth for multi pt2pt Get - Edison 48 cores]

[Figure 24: Strided Get on multiple nodes - Hopper 48 cores]

[Figure 25: Strided Get on multiple nodes - Edison 48 cores]

5.3 Burgers Solver – GFortran vs. Intel

The Burgers Solver brings us closest to a complete scientific application. It uses coarrays mainly for scalar transfers between neighboring images; in other words, the scalar transfers usually occur within the same node. Figure 26 shows that, on 16 cores (one compute node), Intel outperforms GFortran (as stated in Sections 5.1.1 and 5.1.2). On multiple compute nodes, GFortran slightly outperforms Intel. The small difference stems from the communication being between neighboring images that are usually on the same node.

[Figure 26: Burgers Solver - GFortran vs. Intel]

5.4 Burgers Solver – GFortran vs. Cray

Figure 27 shows Burgers Solver performance on Hopper. Cray outperforms GFortran, probably owing to leveraging Cray's proprietary communication library running on Cray's proprietary interconnect.

[Figure 27: Burgers Solver - GFortran vs. Cray]

5.5 CAF Himeno - GFortran vs. Intel

CAF Himeno uses the Jacobi method for a 3D Poisson relaxation. The 3D nature of the problem implies strided transfers among several images. Using 64 cores (4 Yellowstone nodes), Intel requires more than 30 minutes to complete; we therefore report in Figure 28 only the results for 16 and 32 cores. In this case, we report MFLOPS on the y axis, so higher means better.

[Figure 28: CAF Himeno - GFortran vs. Intel]

[Figure 29: CAF Himeno - GFortran vs. Cray]

[Figure 30: Distributed Transpose - GFortran vs. Intel]

[Figure 31: Distributed Transpose - Coarrays vs. MPI]

5.6 CAF Himeno: GFortran vs. Cray

The execution of CAF Himeno on Hopper required quite a bit of tuning. In order to run the 32-core test, we were forced to place 8 images on each node; because each node has 24 cores, we wasted 16 cores on each node for memory reasons. GFortran proved the easiest coarray implementation to use for CAF Himeno. Figure 29 shows the CAF Himeno results for Cray, which outperforms GFortran on single and multiple nodes.

5.7 Distributed Transpose: GFortran vs. Intel

This test case performs transpose operations that involve sending a contiguous, four-dimensional array. Figure 30 shows again that Intel outperforms GFortran on a single node.

On multiple nodes, however, the time required by Intel explodes because the communication is spread over several processes among the nodes (and not only between images on the same node). Inspection of the assembly language code generated by Intel shows element-by-element transfers with lock/unlock pairs bracketing each transfer.

5.8 Distributed Transpose – GFortran vs. Cray

For this case only, we provide a comparison between coarrays and MPI. We used only 16 of the 24 cores provided by Hopper because the matrix size must be a multiple of the number of processes involved; we fixed the size at 1024x1024x512 elements. Figure 31 shows that, within the same node (16 cores), GFortran with LIBCAF_GASNet has the best performance, even better than Cray. On multiple nodes, Cray shows the best performance. Notably, GFortran with LIBCAF_GASNet outperforms a pure MPI implementation in every configuration.

6. CONCLUSIONS AND FUTURE WORK

In this section we summarize the conclusions of our investigation and our plans for further development.

6.1 GFortran vs. Intel

Our tests showed Intel outperforming GFortran only during small transfers within the same node. The poor performance shown by Intel is mainly related to its element-by-element transfers, which occur even with contiguous memory segments (see Figure 7).

Since Intel sends one element at a time, its superior performance for scalar transfers most likely stems from the good performance provided by IntelMPI. On multiple nodes, the penalty of sending element-by-element becomes huge for Intel.

GFortran shows better performance than Intel on array transfers within the same node and in every configuration involving the network.

6.2 GFortran vs. Cray

On Hopper, GFortran outperformed Cray for small transfers on a single node, while Cray outperformed GFortran in most cases for big transfers. On Edison, GFortran outperforms Cray in both single- and multiple-node configurations. An unexpected result for us is the poor performance of Cray during strided transfers on multiple nodes. Also, although Cray provides a complete and efficient coarray implementation, it still requires some tuning in order to run the programs.

Since Cray does not provide the source code of its coarray implementation, we are not able to explain why its strided transfer between multiple nodes has very poor performance. LIBCAF_MPI uses MPI derived data types (MPI_Type_vector for 1-D arrays and MPI_Type_indexed for any other case) in order to implement its efficient strided-transfer support. This approach produces good performance but requires a lot of memory.

6.3 Final Considerations

Although LIBCAF_GASNet often performs better than LIBCAF_MPI, the latter provides easier-to-use support for coarrays. The biggest limitation of LIBCAF_GASNet is the need to declare the total amount of memory to use for coarrays. Sometimes this quantity cannot be known a priori, and several attempts are needed in order to find a good estimate.

Our intent in this work is to provide free, stable, easy-to-use, and efficient support for coarrays. LIBCAF_MPI enables GFortran to provide coarray support that works on any architecture able to compile GCC and MPI (a very common configuration). We expect GFortran/LIBCAF_MPI to work on Linux, OS X, and Windows without any limitations, at no cost, and with strong performance. LIBCAF_GASNet, even if slightly harder to use, provides great portability (some configurations require just a C89 compiler and MPI-1) and great performance.

GFortran's multi-image coarray support is already on the GCC 5.0 trunk, freely available for download and installation. The http://opencoarrays.org web site includes a link to a git repository that offers public, read-only access to the full OpenCoarrays library (which includes LIBCAF_MPI and LIBCAF_GASNet).

6.4 Future Work

We plan to improve our strided-transfer support and to cover all the features from the standard that are still missing. Because LIBCAF_GASNet has shown remarkable performance in every configuration tested, we also plan to improve that version in order to cover missing features and to improve its usage and installation process. Finally, we plan to make a complete performance comparison of MPI and GASNet with other communication libraries like ARMCI [9] and OpenSHMEM [3].

Acknowledgments

We gratefully acknowledge the support we received from the following institutions: the National Center for Atmospheric Research, for access to Yellowstone/Caldera and the logistic support provided during the development of this work; CINECA, for access to Eurora/PLX for the project HyPSBLAS under the ISCRA grant program for 2014; Google, because part of this work is a Google Summer of Code 2014 project; and the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, for access to Hopper/Edison under the grant OpenCoarrays.

7. REFERENCES

[1] D. Bonachea. GASNet Specification, v1.1. Technical Report UCB/CSD-02-1207, U.C. Berkeley, 2002.

[2] R. Chandra. Parallel Programming in OpenMP. Morgan Kaufmann, 2001.

[3] B. Chapman, T. Curtis, S. Pophale, S. Poole, J. Kuehn, C. Koelbel, and L. Smith. Introducing OpenSHMEM: SHMEM for the PGAS community. In Proc. of 4th Conf. on Partitioned Global Address Space Programming Model, PGAS '10. ACM, 2010.

[4] I. D. Chivers and J. Sleightholme. Compiler support for the Fortran 2003 and 2008 standards. ACM SIGPLAN Fortran Forum, 33(2), Aug. 2014.

[5] D. Eachempati, H. J. Jun, and B. Chapman. An open-source compiler and runtime implementation for Coarray Fortran. In Proc. of 4th Conf. on Partitioned Global Address Space Programming Model, PGAS '10. ACM, 2010.

[6] G. Jin, J. Mellor-Crummey, L. Adhianto, W. N. Scherer III, and C. Yang. Implementation and performance evaluation of the HPC Challenge benchmarks in Coarray Fortran 2.0. In Proc. of IEEE Int'l Parallel and Distributed Processing Symposium, IPDPS '11, pages 1089–1100, 2011.
[7] M. Haveraaen, K. Morris, D. W. I. Rouson, H. Radhakrishnan, and C. Carson. High-performance design patterns for modern Fortran. Scientific Programming, in press.

[8] D. Henty. A parallel benchmark suite for Fortran Coarrays. In Applications, Tools and Techniques on the Road to Exascale Computing, pages 281–288. IOS Press, 2012.

[9] J. Nieplocha and B. Carpenter. ARMCI: A portable remote memory copy library for distributed array libraries and compiler run-time systems. In Parallel and Distributed Processing, volume 1586 of LNCS, pages 533–546. Springer Berlin Heidelberg, 1999.

[10] R. W. Numrich and J. Reid. Co-array Fortran for parallel programming. SIGPLAN Fortran Forum, 17(2):1–31, Aug. 1998.

[11] R. W. Numrich and J. Reid. Co-arrays in the next Fortran standard. SIGPLAN Fortran Forum, 24(2):4–17, Aug. 2005.

[12] P. S. Pacheco. Parallel Programming with MPI. Morgan Kaufmann, 1997.

[13] R. S. Rogallo. Numerical experiments in homogeneous turbulence. Technical Report 81315, National Aeronautics and Space Administration, 1981.

[14] D. Rouson, J. Xia, and X. Xu. Scientific Software Design: The Object-Oriented Way. Cambridge University Press, New York, NY, 2011.

[15] A. Vaught. The G95 project. http://g95.org, Dec. 2008.