User-Defined Data Distributions in High-Level Programming Languages

Roxana E. Diaconescu, Hans P. Zima
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109
Center for Advanced Computing Research, California Institute of Technology, Pasadena, CA 91125

2nd IEEE International Conference on Space Mission Challenges for Information Technology (SMC-IT'06), 0-7695-2644-6/06 $20.00 © 2006

Abstract

One of the characteristic features of today's high performance computing systems is a physically distributed memory. Efficient management of locality is essential for meeting key performance requirements for these architectures. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages with explicit message-passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone assembly-style codes in which algorithms and communication are inextricably interwoven.

This paper presents a high-level approach to the design and implementation of data distributions. Our work is motivated by the need to improve the current parallel programming methodology by introducing a paradigm supporting the development of efficient and reusable parallel code. This approach is currently being implemented in the context of a new programming language called Chapel, which is designed in the HPCS project Cascade.

1 Introduction

A key feature of today's high performance computing systems is a physically distributed memory, which is common to all architectures on the Top500 list [16]. Efficient management of locality is essential for these machines. The standard technique for dealing with this issue has involved the extension of traditional sequential programming languages such as Fortran, C, and C++ with explicit message-passing, in the context of a processor-centric view of parallel computation. This has resulted in complex and error-prone programs in which algorithms and communication are inextricably interwoven.

The High Performance Fortran (HPF) family of languages proposed a higher-level programming paradigm based on an abstract specification of data distribution by the programmer, while delegating the generation of explicit message-passing code to the compiler/runtime system. This approach represented a major step towards more expressive parallel languages, but has not, for a variety of reasons, been broadly accepted by the user community. However, it has deeply influenced later research, including the work presented in this paper.

We describe the design of a powerful facility for defining new data distributions in the context of the Chapel programming language [3, 8]. User-defined distributions increase the power of the underlying language much as function definitions raise the operational level of a programming language: new data distributions can be generated as first-class objects in a language-provided framework, placed in a library, passed to functions, and reused in array declarations. In the simplest case, the specification of a new distribution can consist of just a few lines of code to define a mapping from global indices to memory; in contrast, a sophisticated user (or distribution writer) can control the internal representation and layout of data to an almost arbitrary degree, allowing even the expression of auxiliary structures typically used for distributed sparse matrix data.

Our goal is to provide maximum flexibility to the programmer when distributing data collections across locales (units of uniform memory access), while enabling compiler transformations and optimizations that deal with low-level details of distribution management, such as the explicit distinction between local and remote accesses and the generation of communication and synchronization. The programmer retains control of the higher-level aspects of data distribution: the specific challenge for the design of the relevant language features is to come up with a set of primitives and interfaces which establish a productive communication between programmer intentions and compiler transformations.

The rest of the paper is organized as follows. After summarizing related research efforts and their relationship to our work in Section 2, we establish the conceptual foundation for the discussion of distributions in Section 3. Section 4 shows how distributions can be defined and used in Chapel, illustrating our framework for specifying new distributions with a set of simple examples. Finally, Section 5 states the main conclusion of our paper and summarizes future directions of research.

2 Related Work

Our work builds on research performed by many groups over the past decades. IVTRAN [22], developed for the SIMD architecture ILLIAC IV, was an early language providing high-level control of data distributions. With the advance of distributed-memory systems in the 1980s, a new class of data-parallel languages was explored. In such languages the large data structures in an application are laid out across the memories of a distributed-memory parallel machine. The subcomponents of these distributed data structures can then be operated upon in parallel on all processors. Kali [21] was the first language to introduce distribution declarations in the context of distributed-memory architectures, together with a simple mechanism for user-defined distributions. Fortran D [14], Vienna Fortran [5], and Connection Machine Fortran [1], the major predecessors of the High Performance Fortran [15] effort, all offered facilities for combining multi-dimensional array declarations with the specification of a data distribution or alignment. Several other academic as well as commercial projects also contributed to the understanding necessary for the development of data-parallel languages and the required compilation technology.

These languages largely relied on a set of built-in mechanisms, such as regular block and block-cyclic distributions, as well as limited features for irregular distributions, such as general block and indirect. The Vienna Fortran language specification [18] introduced a capability for user-defined mappings from Fortran arrays to a set of abstract processors, and for user-defined alignments between arrays. Distribution classes more general than the standard distributions mentioned above include the Kelp library [13] and the generalized multipartitioning scheme implemented in Rice University's dHPF compiler [9]. More recently, High Performance Java [20] and a number of parallel Matlab versions [7] have extended their base languages with high-level distribution specifications.

Object-oriented language extensions and systems such as ICC++ [6] and pC++ [2] wrap distributions and data into classes and collections with overridable behavior to account for reuse and gain flexibility. This represents significant progress with respect to productivity, but no advance regarding the specification and efficient utilization of distributions. Distributions are either restricted to a set of built-in types, or are specified via restrictive mechanisms (e.g., in ICC++ a map file, which is a sequence of integer indices along with virtual processor numbers, needs to be supplied manually). Charm++ [19] lets the runtime system decide how to map objects to processors so as to ensure load balance.

Recent language developments include the emerging class of partitioned global address space (PGAS) languages, such as CoArray Fortran [10], Unified Parallel C [17], and Titanium [24]. These languages provide standard distributions but still require the user to explicitly control communication in the context of a processor-centric programming model. Thus they represent an advance with regard to the MPI-based programming paradigm, but do not target a broader productivity impact along the lines of the Chapel approach.

Closer to the goals represented by Chapel are two languages developed along with Chapel in DARPA's High Productivity Computing Systems (HPCS) program: X10, designed in the PERCS project led by IBM [12, 23], and Fortress [11], developed at Sun Microsystems. Both languages use the HPF-inspired approach of providing explicit data distribution declarations.

A key difference between the language work reported above and our research is that we do not study new partitioning strategies for inclusion into a set of distributions offered by the language. There is no concept of a built-in distribution in Chapel: we provide a general method for allowing novel distributions to be expressed without modifying the compiler. A predefined generic distribution type can be customized by the programmer to express specific needs of the application, algorithms, or data access patterns. This is supported by a standard interface that establishes the protocol of communication between user intentions and compiler transformations. Such flexibility increases the control a programmer has over efficient program execution while promoting software productivity via object-oriented reuse, type parameterization, and composi-
