A New Vision for Coarray Fortran

John Mellor-Crummey, Laksono Adhianto, William N. Scherer III, Guohua Jin
Department of Computer Science, Rice University
{johnmc, laksono, scherer, jin}@rice.edu

ABSTRACT

In 1998, Numrich and Reid proposed Coarray Fortran as a simple set of extensions to Fortran 95 [8]. Their principal extension to Fortran was support for shared data known as coarrays. In 2005, the Fortran Standards Committee began exploring the addition of coarrays to Fortran 2008, which is now being finalized. Careful review of drafts of the emerging Fortran 2008 standard led us to identify several shortcomings with the proposed coarray extensions. In this paper, we briefly critique the coarray extensions proposed for Fortran 2008, outline a new vision for coarrays in the Fortran language that is far more expressive, and briefly describe our strategy for implementing the language extensions that we propose.

Categories and Subject Descriptors

D.3 [Programming Languages]: Language Constructs and Features; D.1.3 [Programming Techniques]: Concurrent Programming—parallel programming

Keywords

Coarray Fortran, Parallel programming

1. INTRODUCTION

In 1998, Numrich and Reid proposed a small set of extensions to Fortran 95 to support parallel programming that they dubbed Coarray Fortran (CAF) [8]. They envisioned CAF as a model for SPMD parallel programming based on a static collection of asynchronous process images (known as images for short) and a partitioned global address space. Their principal extension to Fortran was support for shared data in the form of coarrays. Coarrays extend Fortran's syntax for type declarations and variable references with a bracketed tuple that is used to declare shared data or access data associated with other images. For example, the declaration integer :: a(n,m)[*] declares a shared coarray with n × m integers local to each image. Dimensions in the bracketed tuple are called codimensions. Coarrays may be declared for primitive or user-defined types. The data for a coarray associated with an image may be a singleton instance of a type rather than an array of type instances. Instead of explicitly coding message exchanges to obtain data belonging to other images, a CAF program can directly access a coarray associated with another image by appending a bracketed tuple to a reference to a coarray variable. For instance, any image can read the first column of data in coarray a from image p by executing the right-hand side reference a(:,1)[p].

Numrich and Reid's design for CAF included several synchronization primitives. The most important of these are the synchronous barrier sync_all; sync_team, which is used for synchronization among dynamically-specified teams of two or more processes; and start_critical/end_critical, which control access to a global critical section.

In 2005, the Fortran Standards Committee began exploring the addition of coarray constructs to the emerging Fortran 2008 standard. Their design closely follows Numrich and Reid's original vision. Coarrays are shared data allocated collectively across all images. A coarray can have multiple codimensions, enabling one to conveniently index a coarray distributed over a grid of images that is logically multidimensional. Our earlier criticism that Numrich and Reid's teams in CAF support only all-pairs communication rather than efficient collective operations led the Fortran Standards Committee to consider support for prearranged image teams. Unfortunately, support for image teams has been tabled for Fortran 2008, although it may be considered for inclusion in the future. A previous detailed critique [6] of the coarray extensions proposed for Fortran 2008 and a recent review of the latest working draft for Fortran 2008 [4] revealed several shortcomings in the emerging coarray extensions that limit their expressiveness:

• There is no support for processor subsets; for instance, coarrays must be allocated over all images.

• Coarrays must be declared as global variables; one cannot dynamically allocate a coarray into a locally scoped variable.

• The coarray extensions lack any notion of global pointers, which are essential for creating and manipulating any kind of linked data structure.

• Reliance on named critical sections for mutual exclusion hinders scalable parallelism by associating mutual exclusion with code regions rather than data objects.

• Fortran 2008's sync images statement (a reworked version of Numrich and Reid's sync_team) enables one to synchronize directly with one or more images; however, this construct doesn't provide a safe synchronization space. As a result, synchronization operations in a user's code that are pending when a library call is made can interfere with synchronization in the library call.

• There are no mechanisms to avoid or tolerate latency when manipulating data on remote images.

• There is no support for collective communication.

• There is no support for hiding communication latency.

These shortcomings caused us to rethink the CAF model. Our interest is in developing an expressive set of parallel extensions for Fortran that map well onto parallel systems of all sizes, ranging from multicore nodes to petascale platforms. In this paper, we propose a new vision for coarray-based extensions to the Fortran language. Our design focuses on three core tenets: orthogonality, expressiveness, and simplicity. In a nutshell, it provides full support for processor subsets, logical topologies that are more expressive than multiple codimensions, dynamic allocation of coarrays, scalable mutual exclusion, safe synchronization spaces, latency hiding, collective communication, and a memory model that enables one to trade ease of use for performance.

In Section 2, we outline our new vision for CAF. In Section 3, we highlight the implementation of key features to show that this vision is practical. In Section 4, we describe ongoing work.

2. COARRAY FORTRAN 2.0 DESIGN

Here, we describe an expressive set of coarray-based extensions to Fortran that we believe provides a productive parallel programming model. Compared to the emerging Fortran 2008, our coarray-based language extensions include some additional features:

• process subsets (§2.1), which support coarrays, collectives, and relative indexing for pairwise operations

• topologies, which augment teams with a logical communication structure (§2.2)

• dynamic allocation/deallocation of coarrays and other shared data (§2.3)
  – local variables within subroutines: declaration and allocation of coarrays inside procedure scope is critical for library-based code
  – team-based coarray allocation and deallocation
  – global pointers in support of dynamic data structures (§2.4)

• enhanced support for synchronization (§2.5) for fine control over program execution
  – safe and scalable support for mutual exclusion (§2.5.1)
  – events, which provide a safe space for point-to-point synchronization (§2.5.2)
  – split-phase barriers (§2.5.3) for overlapping communication and computation

• collective communication (§2.6)

• asynchronous communication support (§2.7) for hiding communication latency

• a memory model (§2.8) that enables one to trade ease of use for performance

Most of these ideas are inspired by features in MPI [11] and Unified Parallel C [3]. Here, we describe their realization as a cohesive whole to support parallelism in Fortran.

2.1 Process subsets

Processor subsets are a useful abstraction for decomposing work in a parallel application. Processor subsets can be used in coupled applications (e.g., ocean and atmosphere subsets in a climate application) as well as within dense matrix numerical computations such as matrix decomposition and linear equation solvers (e.g., finding a pivot element and exchanging rows within column subsets, and broadcasting a factored panel across row subsets). Earlier drafts of Fortran 2008 included support for image teams; however, these teams were designed solely to support collective communication. Here we describe a broader vision for teams.

In Coarray Fortran 2.0, a team is a first-class entity that consists of an ordered sequence of process images. Teams need not be disjoint, and a process image may be a member of multiple teams. A team serves three purposes. First, it represents a set of process images. This set of images can serve as a domain onto which coarrays may be allocated. Second, it provides a namespace within which process images and coarray instances can be indexed by an image's rank r in a team t, where r ∈ {0..team_size(t) - 1}, rather than by an absolute image ID. As identified by Skjellum [10], relative indexing by rank is particularly useful for supporting the development of libraries, where code needs to be reusable across sets of processor images. Third, a team provides a domain for collective communication.

When a CAF program is launched, all process images are initially part of a pre-defined team known as team_world. New teams may be constructed from existing teams by using the collective team_split and team_merge operations shown in Figure 1. The split operation was inspired by the functionality of MPI's MPI_Comm_split [7]. As with MPI_Comm_split, each process image invoking team_split on an existing team provides a positive integer color (or color_undefined) and a key. Images that supply the same positive value for color will be assigned to the same new subteam. If an image provides the value color_undefined, it will not be assigned to a new subteam. Members of a resulting subteam are ordered by the supplied key; if two members of the existing team supply the same color and key, their rank in the new team will be ordered by their rank in existing_team. The merge operation, for its part, constructs a team that is the union of all supplied teams; it is thus the inverse of the split operation.

team_split(existing_team, color, key, new_team,
           other_colors = result_colors,
           other_teams  = result_teams,
           err_msg      = emsg_var)
  color and key are integers
  existing_team and new_team are team variables
  result_colors is a vector of integers
  result_teams is a vector of team variables
  emsg_var is a scalar character variable

team_merge(existing_teams, new_team)
  existing_teams is a vector of team variables
  new_team is a team variable

Figure 1: Forming new teams: team_split and team_merge.
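For example, using the interface in Figure 1, each image could divide team_world into two halves by rank parity. The fragment below is a hedged sketch; whether team_split is invoked as a call statement and accepts keyword arguments in exactly this form is an assumption based on Figures 1 and 2.

! Hedged sketch: split team_world into an "even ranks" team and an
! "odd ranks" team; images keep their relative order via the key.
team :: half
integer :: me
character(len=80) :: msg

me = team_rank(team_world)
call team_split(team_world, mod(me, 2), me, half, err_msg = msg)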

As is well understood, through judicious choice of color and key, one can use team_split to create a new team in which the participating process images are simply a permutation of the images in the existing team, or to create one or more subset teams. One might create a new team that is a permutation of an existing team to order process images within the new team so that adjacent images are closer in the physical topology of the target platform on which the program is executing.
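For instance, such a permutation could be requested by having every image supply the same color and a locality-derived key, as in the hedged sketch below; node_rank() is a hypothetical helper, and the positional call form follows the one used in Figure 2.

! Hedged sketch: permute team_world so that images on the same node
! end up adjacent in rank order. node_rank() is a hypothetical helper
! returning a key that reflects hardware locality.
team :: near_team
integer :: ierr

call team_split(team_world, 1, node_rank(), near_team, ierr)
! all images supplied color 1, so near_team is a permutation of team_world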

A new idea here is the ability to bind additional teams by providing an optional result_colors argument to team_split. In the simplest use of team_split, each process image receives the identity of the subteam to which it belongs as new_team, based on the color argument. However, one can receive the identities of additional teams, which function like MPI intercommunicators, by supplying the optional arguments result_colors and result_teams. Between splitting and merging, one can construct a team of any arbitrary subset of process images. For example, to construct both column teams and an "even columns" team in a two-dimensional mesh, one could split the mesh into two even/odd column teams and then split those into individual columns (see the sketch below). Alternatively, one could split the entire mesh into column teams and then reassemble the even columns into a single team. Programmer convenience would be the sole distinction between the two approaches.
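To make the first alternative concrete, the following is a hedged sketch; the variable names and the positional call form follow Figure 2, and everything else is illustrative rather than definitive.

! Hedged sketch: construct an "even columns" team and per-column teams
! using two phases of team_split. mycol, myrow, and me are computed as
! in Figure 2.
team    :: parity_team, colteam
integer :: ierr

! phase 1: one team holding the even columns, one holding the odd columns
call team_split(team_world, mod(mycol, 2), me, parity_team, ierr)
! phase 2: split each parity team into its individual columns
call team_split(parity_team, mycol, myrow, colteam, ierr)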

A team constructed with team_split can be dismissed with a call to team_free (similar to MPI's MPI_Comm_free) after it is no longer needed. When a team is freed, its allocated resources are released and the team no longer exists.

Data allocation. Both Numrich and Reid's original CAF and the Fortran 2008 working draft require that coarrays be allocated across all process images. For applications where processor subsets need to work independently, it is unreasonable to ask that all processors be involved if a subset needs to dynamically allocate some shared data. Second, if one writes a parallel library that might be used concurrently by different processor subsets, it is unreasonable to require that all shared data allocated by the library (a) be known to the library's callers or (b) be associated with global variables within the library package. These observations led to our design, which supports dynamic allocation of coarrays on processor subsets, and dynamic allocation of coarrays into local variables. Unlike prior proposals, we allow one to specify only a single codimension for a coarray in its declaration. Rather than supporting multidimensional coarrays, we support more general structured indexing of process images through topologies associated with teams, which we describe in the next section.

Although coarrays are associated with process images, each coarray allocation or indexing operation is explicitly or implicitly associated with a team. When one allocates or indexes a coarray, one may specify an explicit team. If no team is specified explicitly, the default team, known as team_default, is used. When a CAF program is launched, team_default is set to team_world. A with team statement (inspired by the with statement in PASCAL) is a block-structured construct for setting the default team, team_default, within its scope. Unlike PASCAL's with, CAF 2.0's with team has dynamic scope, meaning that its binding for the default team applies not only to the code lexically enclosed in the with team block, but also to any code called from within the block. We use LIFO semantics for dynamically nested with team statements. When one or more coarrays are allocated on images associated with a given team, a barrier synchronization is performed on the team to ensure that all coarrays have been allocated and are ready for use. Indexing with a codimension is done with a relative rank with respect to an explicit or default team. Figure 2 shows examples of allocating coarray variables across different teams specified both explicitly (using @) and implicitly (using a with team block), using the team_size and team_rank primitives to interrogate team characteristics, and indexing coarrays with respect to the column team cteam inside a with team block.

double precision, target, allocatable :: ab(:,:)[*]
double precision, allocatable :: w(:)[*]
double precision, pointer :: a(:,:),b(:)
team :: rteam,cteam
integer :: m,n,p,r,np,bufsize
integer :: me,myrow,mycol,nprow,npcol,lb,clb,blksize,ierr

np = team_size(team_world)
me = team_rank(team_world)
mycol = me / nprow
myrow = me - mycol * nprow

! each image allocates a coarray ab, assign pointers a,b
allocate(ab(m,n+1)[@team_world])
a => ab(1:m,1:n)
b => ab(1:m,n+1)

! create row and column teams as subteams of team_world
call team_split(team_world,myrow,mycol,rteam,ierr)
call team_split(team_world,mycol,myrow,cteam,ierr)
...

! allocate and access a coarray within a with team block
with team cteam
  allocate(w(1:bufsize)[])
  w(lb:lb+blksize-1)[p] = a(r,clb:clb+blksize-1)
end with team

! release team resources, deallocate coarray ab
call team_free(rteam,cteam)
deallocate(ab)

Figure 2: Allocating and accessing coarrays on processor subsets.

2.2 Topologies

Fortran 2008 and earlier flavors of CAF only provide multidimensional coarrays as a form of structured namespace for interprocessor communication. Any other structured organization for indexing process images must be implemented in user code using arithmetic on image IDs or using index arrays. In CAF 2.0, we associate a logical topology with a team to provide a structured namespace for intra-team communication that is relative to members' ranks in the team, not to their absolute image IDs. Like MPI, CAF 2.0 supports two types of topologies: Cartesian and graph.

For CAF 2.0 we have settled upon a one-to-one association between teams and topologies. Although it may seem desirable to change the topology for a team, we note that calling team_split with a constant for the color parameter allows one to create a clone of the team that has not yet been associated with any topology; a different topology can be used in conjunction with a team's clone.

2.2.1 Topology API

To associate a topology with a team, one invokes topology_bind, which has parameters for the team and the topology to be associated, and returns an error if a topology has already been associated with the team. The programmer may either specify an ordered set of processor images to map onto the topology, or use an overloaded version of the function that asks the CAF runtime to bind the topology with a good mapping to the underlying processor fabric. Finally, topology_get extracts the topology associated with a team, or returns an error if there is none.
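As a concrete illustration of this API, the hedged sketch below builds a periodic 2D Cartesian topology with topology_cartesian (described under "Cartesian topology" below) and attaches it to a team with topology_bind; the declared type name topology and the exact argument forms are assumptions, since the paper leaves them unspecified.

! Hedged sketch: associate a 4 x 8 Cartesian topology, periodic in its
! second dimension, with the team grid_team. topology_cartesian and
! topology_bind are the operations of Sections 2.2 and 2.2.1; the type
! name and argument order are assumptions.
team     :: grid_team
topology :: grid_topo

grid_topo = topology_cartesian(4, -8)      ! negative extent => circular dimension
call topology_bind(grid_team, grid_topo)   ! let the runtime choose the image mapping
! a later reference such as my_data(3)[(x+1, y+1)@grid_team] then indexes
! images by their Cartesian coordinates rather than by rank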
Graph topology. Any topology can be expressed as a graph G=(V,E). To create a graph topology in CAF 2.0, one simply calls topology_graph(n, c), where n is the number of nodes in the graph and c is the number of edge classes. For an undirected graph, one might use a single edge class: neighbors. For a directed graph, one could use two edge classes: successors and predecessors. Additional flavors of edge classes could be used to distinguish edges within or between processor nodes. Our general interface leaves it to the imagination of the user. To populate edge classes in graph g, one may call graph_neighbor_add(g, e, n, nv) to add one or more image neighbors (nv can be a scalar or a vector value) to edge class e for image n. The operation graph_neighbor_delete(g, e, n, nv) can be used in the course of updating g's edges. Note that not every image needs to specify every edge connection; it is sufficient to declare the edges attached to the image's node and have the CAF runtime construct the full topology from this distributed information.

To index a scalar coarray f using a graph topology g associated with a team t, one uses the syntax f[(e,i,k)@t]. The tuple (e, i, k) references the kth neighbor of image i in edge class e in the topology bound to t. If the team t is implicit (e.g., inherited from a with statement or team_world), the parentheses of the tuple may be omitted for convenience, simplifying the syntax to f[e,i,k]. One can use the intrinsic graph_neighbors(g, e, n) to determine the number of image n's neighbors in edge class e in graph g.

Cartesian topology. In a sense, Cartesian topologies are just a subset of general graph topologies; however, they are common enough to merit explicit treatment and custom support. To define a Cartesian topology, one calls topology_cartesian, which takes as parameters the extent of each dimension. As toroidal topologies are common for periodic boundary conditions, a negative extent for a dimension indicates that the topology of that dimension is circular.

Accessing a node in a Cartesian topology can be done by specifying a comma-separated tuple of indices (d1, d2, d3, ..., dn) where one would otherwise specify an image rank, e.g., my_data(3)[(x+1, y+1)@team_grid]. As with graph topologies, if the team is implicit, one may omit the tuple's parentheses; in this way, we support syntax as simple as multidimensional coarrays, although our indexing support is more general in that any dimension of the Cartesian topology may be circular for periodic boundary conditions.

It is also highly desirable to support relative indexing within a topology. We do this by placing the offsets in parentheses and prepending a + sign: foo[+(3,-4)] specifies an offset of (+3, -4) from the current image's position in a 2D Cartesian topology. Similarly, foo[+(-1),0] is the first column of the previous row in a 2D Cartesian topology; note that in this example, the first subscript is relative and the second is absolute.

2.3 Allocate statement

Executing an allocate statement associates storage with a pointer or an allocatable. Figure 3 shows the components of an allocate statement. For CAF 2.0 we introduce a new optional argument SHARED, which indicates whether the target object should be allocated in private memory of the image, or in shared memory co-located with the image. Data accessible from coarrays must be allocated in the shared segment. The shared specifier is necessary because when data is allocated for a linked shared data structure in a parallel program, one typically allocates and initializes an object before linking it into a shared structure. Only as an object is linked does it become clear that it should be shared.

allocate(type_spec :: allocation_list,
         stat = stat_var,
         err_msg = emsg_var,
         source = src_expr,
         shared = is_shared)
  type_spec is an intrinsic or derived type*
  stat_var is a scalar integer variable*
  emsg_var is a scalar character variable*
  src_expr is a scalar character variable*
  is_shared is a scalar logical variable†

  * Fortran 2003 feature.
  † Proposed Coarray Fortran extension.

Figure 3: The allocate statement.

2.4 Pointers

As Coarray Fortran was defined in 1998, pointer components were allowed within a coarray of a user-defined type. It was legal to remotely dereference a pointer component within a coarray. Given a user-defined type GRAPH with a pointer component edgelist(:) and a coarray g of type GRAPH, an image q could execute a remote access g[p]%edgelist(i), which would dereference the remote pointer edgelist on image p. The pointer component edgelist could only be associated with data on one's local image.
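The kind of declaration being described might look like the following hedged sketch; the component name edgelist comes from the text, while the surrounding type definition and sizes are illustrative.

! Hedged sketch of the 1998-style idiom described above: a coarray of a
! user-defined type with a pointer component that each image associates
! with local data of image-specific size.
type graph
  integer, pointer :: edgelist(:)
end type graph
type(graph) :: g[*]

allocate(g%edgelist(local_edge_count))   ! size may differ on each image
...
k = g[p]%edgelist(i)   ! legal: remotely dereference the pointer component on image p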
This style of pointer enables one to allocate and access shared data whose size differs among process images. However, this style of pointer is insufficient for remotely manipulating linked data structures.

Consider a distributed hash table implemented using bucket chains. One might want to count the entries in a remote bucket list by writing a loop like the following:

item = table[p]%head
count = 0
do
  if (.not. associated(item)) exit
  count = count + 1
  item => item%next
enddo

With the limited pointers originally proposed for CAF, it would not be possible to write such a loop because item would need to point to remote data.

To support construction and manipulation of linked distributed data structures, we propose the attribute copointer to declare a pointer that one can associate with shared data that may be remote. To ensure that accesses to remote data are textually identifiable, we propose that one add an empty bracket pair when dereferencing a remote copointer. We propose the intrinsic imageof(p) to determine the target image for a copointer. A typical use of imageof would be to determine whether a copointer is associated with data on the local image; if so, one can drop the empty bracket pair and access the pointer target locally more efficiently. Figure 4 shows examples of how one may associate, use, and inspect a copointer.

integer :: wrank, wsize, a(:,:)[*]
integer, copointer :: x(:,:)[*]

allocate(a(1:20, 1:30)[@team_world])
wrank = team_rank(team_world)
wsize = team_size(team_world)

! associate copointer x with a remote section of a coarray
x => a(4:20, 2:25)[mod(wrank + 1, wsize)]

! imageof intrinsic returns the target
! image for x as a rank in team_world
prank = imageof(x)

if (prank .eq. wrank) then
  ! update a location on the local image
  ! (unchecked) through the copointer x
  x(7,9) = 4
else
  ! update a location on a remote image
  ! through the copointer x
  x(7,9)[] = 4
endif

Figure 4: Using a copointer.

2.4.1 Copointers and parameter passing

When passing a coarray or coarray-based object to a function or procedure, one has two options. To pass it by value, one can employ the standard Fortran idiom of wrapping it in parentheses, thereby converting it to an expression and forcing an evaluation. Alternatively, one can pass it by reference by passing a copointer to the object. A copointer may be passed to a procedure by reference only if the corresponding dummy argument of the procedure is of a copointer type.

2.5 Synchronization

2.5.1 Mutual exclusion

Based on our feedback [6], locks were added to the most recent working draft of Fortran 2008 to support mutual exclusion. We further support deadlock-free multi-lock synchronization by allowing the programmer to transparently acquire a set of locks as a single logical operation.

CAF 2.0 provides three language constructs for mutual exclusion.

1. Lock. This is the standard mutual exclusion state variable; lock_acquire and lock_release statements acquire and release it, respectively.

2. Lockset. Locksets foster safety in multi-lock operation by performing acquires of component locks in a globally-defined canonical order.

3. Critical section. Critical sections in CAF 2.0 are simply a block-structured construct for acquiring and releasing a lock or lockset, either of which may be dynamically allocated.

Creating a lock or lockset is not a collective operation; neither is acquiring a lock, lockset, or critical section. To associate a lockset with specified locks, one uses the lockset_create statement as shown in Figure 5.

Figure 5 shows that lockset ls can be acquired by image 1 if and only if all of locks l1, l2, and l3 are released (or not yet acquired) by images 2, 3, and 4, respectively. This example shows the advantages of using locksets: convenience, efficiency, and deadlock-freedom.

! ... image 1
lock_acquire(ls)
...                  ! critical region
lock_release(ls)

! ... image 2
lock_acquire(l1)
lock_acquire(l2)
...
lock_release(l2)
lock_release(l1)

! ... image 3
lock_acquire(l2)
...
lock_release(l2)

! ... image 4
critical(l3)
...
end critical

Figure 5: Example of using lock, lockset, and critical. In this example, image 4’s use of the critical construct simplifies programming and ensures that l3 is unlocked.
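Because locks are first-class data rather than named code regions, they can be allocated alongside the data they protect. The fragment below is a hedged sketch: the bucket names echo the hash-table example of Section 2.4, hash() and owner() are hypothetical helpers, whether a lock may be declared directly as a coarray is an assumption, and remote acquisition of a lock relies on the support described in Section 3.4.

! Hedged sketch: one lock per hash bucket, co-allocated with the buckets,
! so mutual exclusion is associated with data objects rather than with a
! named code region. NBUCKETS, hash(), and owner() are assumed helpers.
lock :: bucket_lock(NBUCKETS)[*]

b = hash(key)
p = owner(key)                      ! image holding this bucket
lock_acquire(bucket_lock(b)[p])     ! spin-wait on the remote lock
! ... insert into or scan bucket b on image p ...
lock_release(bucket_lock(b)[p])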

2.5.2 Events

Because costly group communication is not always necessary to support the coordination needs of applications, we envision point-to-point synchronization via events. At the most basic level, an event is a shared counter object that supports two operations: an atomic increment (a notify operation), and spinning until an available increment occurs (a wait operation). Our event implementation is thus essentially a user-mode local-spin implementation of a counting semaphore. Images may allocate coarrays of events as their needs demand. Remote update via event_notify and local spin operations using event_wait are all that is needed to effect safe one-way synchronization between pairs of images. Unlike Fortran 2008's sync images, events offer a safe synchronization space: libraries can allocate their own events that are distinct from events used in a user's code.

2.5.3 Split-phase barriers

Split-phase barriers enable one to overlap communication with computation. One initiates a split-phase barrier with team_barrier_async to signal that an image has completed all work before the barrier. One awaits the arrival of all other images in a team at a team barrier using async_wait, a general query awaiting the completion of an asynchronous operation identified by a handle. When applied to a handle for an asynchronous barrier, an async_wait will block until all other images in the team have initiated the barrier with team_barrier_async, implicitly proclaiming that they have completed all work before the barrier. Split-phase barriers can operate on processor subsets. Thus, multiple split-phase barriers can be simultaneously active – even for the same image – without conflict [6].

The syntax of our split-phase barrier is as follows:

team_barrier_async(handle [, team])
async_wait(handle)

If no team is specified, the current default team is used (see the with team construct in §2.1). The handle parameter is an eight-byte integer.

2.6 Collective communication

Collective subroutines are not new in Coarray Fortran; they were part of the 2007 draft of Fortran 2008, which also includes collective team reduction and some pre-defined collective subroutines such as co_sum, co_max and co_product. However, the emerging Fortran 2008 standard does not include these features even though collective operations are widely used in parallel applications. It is widely known that built-in collective operations are likely to provide better performance and portability if they are implemented as part of a language runtime rather than having users roll their own.

We propose some collective statements for Coarray Fortran, inspired by MPI's collectives based on two-sided synchronous communication. In the next section, we discuss asynchronous versions of these collectives.

Most collective statements require a local data variable source (var_src), a target variable (var_dest), and optionally a team whose image members will all participate. If the team is not explicitly specified, then the current default team specified in an enclosing with statement, or team_default, will be used. Unlike the collective statements proposed by Reid and Numrich [9], the collectives in Coarray Fortran 2.0 do not require coarrays as their input/output data. As shown in Table 1, both the var_src and var_dest variables can be local variables. For team_broadcast, the first argument (var) is a local variable that acts as the source variable for the root image, and as the target variable for other images.

When designing Coarray Fortran 2.0, we recognized that there are two types of collective reduction operations: those that are replication oblivious, where a value can be processed more than once by a reduction without changing the result, and those that are not. Some examples of replication-oblivious operators include min, max, and, and or. We believe that it is worth identifying replication-oblivious operators because reductions using them can be implemented efficiently by using a special communication pattern. The final optional ro boolean argument to a reduction operation indicates whether the reduction operator is replication oblivious.

In addition to the predefined collective operators shown in Table 1, Coarray Fortran 2.0 also supports user-defined operators for reductions. Instead of supplying a predefined operator to a reduce, a user can specify a subroutine that takes three arguments: two read-only inputs and the output, as shown below:

! in1 and in2 are input arguments
! out is the output result
subroutine reduceop(in1, in2, out)

For Coarray Fortran 2.0, we introduce team_sort to sort arrays (whether they are coarrays or not) within a team. This operation requires a user-defined comparison function of the following form:

! in1 and in2 are input arguments
! returns: +1 if in1>in2, 0 if in1==in2,
! -1 if in1<in2

Table 1 argument conventions — for most statements:

typedef :: var_src        local source variable
typedef :: var_dest[*]    target Coarray Fortran variable
integer :: root_rank      the rank of the root image
team    :: team           process subset (default team if not specified)
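To make these interfaces concrete, here is a hedged sketch of a user-defined reduction operator and a sort comparator. The three-argument reduceop form follows the text; the argument types, the comparator name, and how the routines would be passed to the reduction and team_sort operations are assumptions.

! Hedged sketch of a user-defined reduction operator that combines two
! inputs into one output; this one happens to be replication oblivious.
subroutine maxabs(in1, in2, out)
  double precision, intent(in)  :: in1, in2
  double precision, intent(out) :: out
  out = max(abs(in1), abs(in2))
end subroutine maxabs

! Hedged sketch of a comparison function for team_sort, following the
! +1 / 0 / -1 return convention described below.
integer function by_value(in1, in2)
  double precision, intent(in) :: in1, in2
  if (in1 > in2) then
    by_value = 1
  else if (in1 < in2) then
    by_value = -1
  else
    by_value = 0
  end if
end function by_value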

The function returns -1 if the first input is less than the second one, 0 if they are equal, and +1 if the former is greater than the latter.

2.7 Asynchronous communication

To achieve scalable performance on petascale machines, overlapping communication with computation is essential. For example, the implementation of High Performance LINPACK (HPL) [1] uses a customized asynchronous ring-based broadcast implementation to overlap the latency of the communication and panel factorization with panel update operations.

In Coarray Fortran 2.0, we will provide asynchronous support for collective and one-sided communication with an asynchronous progress engine and split-phase implementations of communication primitives. Users make calls to team_comm_async to start an asynchronous communication. comm represents any communication with asynchronous support, such as broadcast or get and put for remote read and write of coarrays. Each team_comm_async subroutine will return a handle to its caller using an output argument. A handle from a team_comm_async may be used in a call to async_is_complete to query the status of the operation, or to async_wait, which will block until the operation completes.

We support a progress statement at the language level to give users more control over asynchronous communication. progress translates to an invocation of the CAF runtime's progress engine to advance pending asynchronous communications. We expect a CAF compiler to automatically insert calls to the progress engine at regular intervals in CAF code, similar to the way Java compilers insert garbage collection safepoints in generated bytecode. Users can selectively disable the progress engine around performance-critical program regions via a pair of directives:

!$caf progress(manual)
...
!$caf progress(auto)

2.8 Memory consistency models

Although a strict consistency memory model is helpful to ensure correctness of a program, it can preclude optimizations important for program performance. Programmers and compilers need the flexibility to effect relaxed instruction ordering for regions of code where overall system performance is the paramount concern. Our vision for Coarray Fortran supports three memory models that may be selected for program regions via compiler directives:

!$caf consistency(strict). The strict consistency model enforces sequential consistency for coarray communication. Coarray accesses may only execute after previous ones complete, and all updates become visible immediately. This is the default model.

!$caf consistency(relaxed). Under the relaxed consistency model, memory get and put operations can be reordered by the compiler and runtime system; however, processor consistency is guaranteed: absent some other intervening write, an image is guaranteed to read back the last value it has written.

!$caf consistency(none). Unlike the relaxed model, an image will not necessarily read back the latest value it has written to a remote node, even if there are no intervening writes by other images. This can occur, for example, in complex interconnection networks with dynamic routing if the read is routed so as to arrive at the destination image node ahead of the write. From an implementation perspective, this mode essentially disables all runtime control over ordering to maximize performance; programmers should only use it in cases where they are sure that it is safe (e.g., they don't read values they have just written) and high performance is critical.
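The intended usage is directive-delimited regions. The following hedged sketch assumes that a subsequent consistency directive ends a region and restores the previous model, and that the coarrays and team named here exist; the text does not spell out these details.

! Hedged sketch: relax ordering for a communication-heavy exchange, then
! return to the (default) strict model before the values are consumed.
! halo_left, halo_right, left_rank, right_rank, and ocean_team are assumed.
!$caf consistency(relaxed)
do step = 1, nsteps
  halo_left(:)[left_rank]   = u(:, 1)     ! puts in this region may be reordered
  halo_right(:)[right_rank] = u(:, nloc)
end do
!$caf consistency(strict)
! synchronize (e.g., with an assumed blocking team barrier) before reading
! the halos written by neighboring images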

2.9 Binding coarrays

The mechanism we propose for sharing coarrays between a pair of teams involving disjoint sets of processors is to have them bind a coarray using a publish/subscribe model. For instance, a team performing an ocean simulation might want to share a sea surface temperature coarray with an atmosphere team. The atmosphere and ocean teams can obtain information about one another by having a team_split operation (described in Section 2.1) return multiple result teams. To share a coarray between a pair of teams, each image in the pair of teams involved performs a coarray_bind as follows: coarray_bind(coarray, nonlocal_team). The images of the ocean team export the sea surface temperature coarray ocean_sst to the atmosphere team a_team. The atmosphere team indicates that it will reference the imported sea surface temperatures as a coarray atmos_sst and that it expects to receive information about how to access this data from the ocean team o_team. So, the corresponding operations are

ocean team:      coarray_bind(ocean_sst, a_team)
atmosphere team: coarray_bind(atmos_sst, o_team)

These matching calls make it possible for the atmosphere team to access the coarray data allocated by the ocean team by referencing atmos_sst.

3. IMPLEMENTATION

Our implementation of Coarray Fortran 2.0 is a work in progress. Here, we sketch our implementation strategy, which is based on the GASNet communication library [2]. We use GASNet's get and put operations to read and write remote coarray elements. We further use GASNet's active message support to invoke operations on remote nodes. This capability is used during team formation and to look up information about remote coarrays so that one can read and write them directly.

3.1 Team representation

We use a scalable representation of image teams that is based on the concept of pointer jumping (Figure 6). Each image in a team of size S has ⌈log S⌉ levels of pointers to a successor and a predecessor. For image i, pointers on level k link i to the representations of team members at ranks (i + S − 2^k) mod S and (i + 2^k) mod S.

Figure 6: Members of a team of size S are linked in ⌈log S⌉ doubly-linked circular lists. In list i, 0 ≤ i < ⌈log S⌉, a team member at rank j is linked to team members (j + S − 2^i) mod S and (j + 2^i) mod S, an organization inspired by pointer jumping.

With this representation, each image has enough information to locate an image at any rank. To reach rank j from rank i in a team of size S, one can obviously do this in at most log S steps by following a chain of pointer-jumping links at distances corresponding to the bits in i ⊕ j. Less obvious is that for rank i to locate j, one can often follow far fewer links than the number of one bits in i ⊕ j by exploiting the circularity of our doubly-linked-list-based representation and making use of both forward and backward links (e.g., instead of using three forward power-of-2 hops to accomplish a route of distance +7, one can use a forward route of distance 8 and a backward route of distance 1). For a team of size S, where S is not a power of two, one can also exploit the fact that (i − j) mod S ≠ (j − i) mod S. For performance, we have images cache information about how to communicate directly with a fixed, modest number of frequent communication partners within one's team.
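A hedged sketch of the link computation each image would perform for its own rank follows; the names are illustrative and only the arithmetic comes from the description above.

! Hedged sketch: compute the pointer-jumping successor/predecessor ranks
! kept by the image with rank me in a team of size S.
integer :: S, me, k, nlevels
integer, allocatable :: succ(:), pred(:)

nlevels = 0
do while (2**nlevels < S)        ! nlevels = ceiling(log2(S))
  nlevels = nlevels + 1
end do
allocate(succ(0:nlevels-1), pred(0:nlevels-1))
do k = 0, nlevels - 1
  succ(k) = mod(me + 2**k, S)      ! (i + 2^k) mod S
  pred(k) = mod(me + S - 2**k, S)  ! (i + S - 2^k) mod S
end do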
3.2 Team formation

Scalable distributed team formation via team_split is accomplished by sorting (color, key, rank) tuples using parallel bitonic sort, left and right shift operations to determine team boundaries, along with segmented scans to compute one's rank within a team and to disseminate the identity of the first and last members of the team and the team size. Subteams are assembled once each image knows its left and right neighbors at distance one in the circular order of its subteam, the size of the subteam, and its rank in the subteam. Our approach enables us to form a team without using more than O(log² P) space on any image; we use this much space as a scratch buffer for parallel bitonic sort.

3.3 Collective operations

Our pointer-jumping based representation for teams contains all of the direct connections necessary to support collective communication within a team.

Blocking barrier. The dissemination barrier algorithm [5] uses all of the direct connections in our pointer-jumping representation for teams. On a team of size S, it involves ⌈log S⌉ rounds of communication. In round i, 0 ≤ i < ⌈log S⌉, each image sets a flag on a successor at distance 2^i and spin-waits locally.

Replication-oblivious reductions. When a reduce or allreduce collective is given a replication-oblivious operator, i.e., one for which repeated incorporation of a datum would not change the result, we execute the reduction using a dissemination-based [5] communication pattern.

Broadcast and reductions. Tree-based broadcast and reduction operations naturally map onto direct connections in our pointer-jumping representation for teams. For both broadcast and reductions, we use a binomial-tree [12] based communication pattern. Since the linked lists in our pointer-jumping representation are circular, they naturally support broadcasts and reductions rooted at any node. Allreduce maps onto direct connections of the pointer-jumping representation equally well.

3.4 Locks and events

We support spin-waiting on remote locks and events. Spin-waiting on remote locks or events requires only a constant number of messages across a machine's interconnect. Waiting on a remote lock or event causes a record indicating the waiting image to be enqueued on the remote node. At the appropriate time, an active message will signal the waiting image.

3.5 Copointers

We represent a copointer as a tuple consisting of the target image ID and a Fortran 90 pointer. The target image ID is the image's rank in team_world. We initialize a copointer's Fortran 90 pointer for a section of a remote coarray by simply copying the dope vector for the remote coarray to the local image and locally computing the proper subsection. When accessing remote data through a copointer, the data in the copointer representation suffices to synthesize a get or put operation to access a virtual address in the target image.

4. SUMMARY AND FUTURE WORK

We have sketched a new vision of Coarray Fortran and are actively implementing the features described herein. CAF 2.0 is a work in progress; many details of syntax remain to be designed. Nevertheless, the path forward is clear and our new design is vastly more expressive than prior coarray extensions. CAF 2.0's support for teams consisting of process subsets, coarrays allocated on processor subsets, dynamic allocation of coarrays, copointers, collectives on process subsets, and events for safe pairwise synchronization represents substantially richer support for parallelism than the coarray extensions in Fortran 2008.

Once we have the core of CAF 2.0 complete, we plan to extend our design with support for remote invocation of special functions that we call cofunctions. The motivation for cofunctions is latency avoidance. However, adding cofunctions to CAF leads to multithreaded images, which increases programming complexity. If one can spawn a cofunction remotely, one should also admit spawning cofunctions locally as asynchronous activities. Once we add cofunctions to the language, we need a way to determine when cofunction invocations have quiesced. We plan to use a block-structured finish construct, as does IBM's X10 programming language. We are also looking at supporting Phasers, which extend the Clock construct from X10 by adding support for point-to-point synchronization to the already-present support for periodic barrier synchronization; however, we will need to devise an efficient distributed implementation. Existing Phaser implementations are not suitable for large distributed-memory systems. Once multiple threads are allowed in images, an interesting question that arises is how to prioritize execution between threads. It would seem that the language model should offer some way to control the priorities of activities. Exploring these issues is a topic of future work.
5. REFERENCES

[1] A. Petitet, R. C. Whaley, J. Dongarra, and A. Cleary. HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers, September 10, 2008. http://www.netlib.org/benchmark/hpl.

[2] D. Bonachea. GASNet Specification, v1.1. Technical Report UCB/CSD-02-1207, U.C. Berkeley, 2002.

[3] T. El-Ghazawi, W. Carlson, T. Sterling, and K. Yelick. UPC: Distributed Shared-Memory Programming. Wiley-Interscience, 2003.

[4] Fortran J3 Committee. Fortran 2008 Working Draft, J3/09-007r1, March 25, 2009. http://www.j3-fortran.org/doc/standing/links/007.pdf.

[5] D. Hensgen, R. Finkel, and U. Manber. Two algorithms for barrier synchronization. International Journal of Parallel Programming, 17(1):1–17, Feb. 1988.

[6] J. Mellor-Crummey, L. Adhianto, and W. N. Scherer III. A critique of co-array features in Fortran 2008. Fortran Standards Technical Committee Document J3/08-126, February 2008. http://www.j3-fortran.org/doc/meeting/183/08-126.pdf.

[7] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard, Version 1.1. http://www.mpi-forum.org/docs/mpi-11-html/mpi-report.html, June 1995.

[8] R. W. Numrich and J. Reid. Co-array Fortran for parallel programming. SIGPLAN Fortran Forum, 17(2):1–31, 1998.

[9] J. Reid and R. W. Numrich. Co-arrays in the next Fortran standard. Sci. Program., 15(1):9–26, 2007.

[10] A. Skjellum, N. E. Doss, and P. V. Bangalore. Writing libraries in MPI. In A. Skjellum and D. S. Reese, editors, Proceedings of the Scalable Parallel Libraries Conference, pages 166–173. IEEE Computer Society Press, October 1993.

[11] M. Snir, S. W. Otto, S. Huss-Lederman, D. W. Walker, and J. Dongarra. MPI: The Complete Reference. MIT Press, Cambridge, MA, 1996.

[12] J. Vuillemin. A data structure for manipulating priority queues. Commun. ACM, 21(4):309–315, 1978.