ExaStencils – Advanced Multigrid Solver Generation

Christian Lengauer, Sven Apel, Matthias Bolten, Shigeru Chiba, Ulrich Rüde, Jürgen Teich, Armin Größlinger, Frank Hannig, Harald Köstler, Lisa Claus, Alexander Grebhahn, Stefan Groth, Stefan Kronawitter, Sebastian Kuckuk, Hannah Rittich, Christian Schmitt, and Jonas Schmitt

Christian Lengauer, University of Passau
Sven Apel, University of Passau and Saarland University
Matthias Bolten, University of Wuppertal
Shigeru Chiba, The University of Tokyo
Ulrich Rüde, Friedrich-Alexander University Erlangen-Nürnberg
Jürgen Teich, Friedrich-Alexander University Erlangen-Nürnberg
Armin Größlinger, University of Passau
Frank Hannig, Friedrich-Alexander University Erlangen-Nürnberg
Harald Köstler, Friedrich-Alexander University Erlangen-Nürnberg
Lisa Claus, University of Wuppertal
Alexander Grebhahn, University of Passau
Stefan Groth, Friedrich-Alexander University Erlangen-Nürnberg
Stefan Kronawitter, University of Passau
Sebastian Kuckuk, Friedrich-Alexander University Erlangen-Nürnberg
Hannah Rittich, Jülich Supercomputing Centre, Forschungszentrum Jülich
Christian Schmitt, Friedrich-Alexander University Erlangen-Nürnberg
Jonas Schmitt, Friedrich-Alexander University Erlangen-Nürnberg

Abstract

Present-day stencil codes are implemented in general-purpose programming languages, such as Fortran, C, Java, Python, or derivatives thereof, and harnesses for parallelism, such as OpenMP, OpenCL, or MPI.
Project ExaStencils pursued a domain-specific approach with a language, called ExaSlang, that is stratified into four layers of abstraction, the most abstract being the formulation in continuous mathematics and the most concrete a full, automatically generated implementation. At every layer, the corresponding language expresses not only computational directives but also domain knowledge of the problem and platform to be leveraged for optimization. We describe the approach, the software technology behind it, and several case studies that demonstrate its feasibility and versatility: high-performance stencil codes can be engineered, ported, and optimized more easily and effectively.

1 Overview of ExaStencils

1.1 Project Vision

ExaStencils¹ takes a revolutionary, rather than evolutionary, approach to software engineering for high performance. It seeks to provide a proof of concept that programming can be simplified considerably and that program optimization can be made much more effective by concentrating on a limited application domain. In the case of ExaStencils, it is a subdomain of geometric multigrid algorithms [85].

The idea is to write a program in a domain-specific programming language (DSL). In our case, it is an external DSL (i.e., a DSL built from scratch rather than embedded in an existing language), called ExaSlang [75]. ExaSlang is stratified into four layers of abstraction. Each layer provides a different view of the problem solution and can be enriched with information to allow for particular optimizations at that layer. ExaSlang's most abstract layer specifies the problem as a set of partial differential equations (PDEs) defined on a continuous domain. The most concrete layer allows the user to specify low-level details for an efficient implementation on the execution platform at hand. Ideally, the domain expert should only be dealing with the first layer (plus some menu-driven options).
The ExaStencils code generator should be able to generate all lower code layers while applying a set of optimizations autonomously before producing efficient target code.

¹ www.exastencils.org

1.2 Project Results

This subsection summarizes the challenges that drove the development of ExaStencils and how far we got in the period of SPPEXA funding.

We begin with the delineation of the domain and the mathematical challenges, as distinct from the computer science challenges, that ExaStencils addressed (see Section 2). We restricted our attention to the development of smoothers for geometric multigrid methods and the required analysis tools. In this domain, local Fourier analysis (LFA) is the method of choice to analyze the developed smoothers and the entire multigrid method. To make LFA useful for our purpose, we extended the method to periodic stencils, covering block smoothers and varying coefficients [11].

The first major computer science challenge was to cover, with one single source program, a wider range of multigrid solvers than is possible with contemporary implementations based on general-purpose host languages such as Fortran or C++ (see Section 3). To this end, we decided not to build ExaSlang on an existing, general-purpose host language but to make it an external DSL. We were able to demonstrate its flexibility early on in the project by providing a common ExaSlang program for Jacobi, Gauss-Seidel, and red-black solvers for finite differences and finite volumes with constant and linear interpolation and with restriction [53]. Later on, we introduced a way to specify data layout transformations simply by a linear expression, a feature that can aid the development process significantly [44] (see Subsections 3.1–3.2). A smaller task was to decide how to describe aspects of the execution platform in a platform-description language (TPDL) [73] (see Subsection 3.3).
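To make the role of LFA for the smoothers mentioned above more tangible, the following minimal sketch computes the classical LFA smoothing factor of weighted Jacobi for the one-dimensional Poisson stencil (1/h²)[-1, 2, -1]. It is a textbook illustration under stated assumptions, not the extended periodic-stencil LFA of the project: the function names `jacobi_symbol` and `smoothing_factor` are hypothetical, standard coarsening is assumed (high frequencies in [π/2, π]), and the maximum is approximated by sampling.

```python
import math


def jacobi_symbol(theta, omega):
    """Fourier symbol of weighted Jacobi for the 1D stencil (1/h^2) * [-1, 2, -1].

    The operator symbol is (2 - 2*cos(theta)) / h^2 and the diagonal is 2 / h^2,
    so the symbol of I - omega * D^{-1} * A is 1 - omega * (1 - cos(theta)).
    """
    return 1.0 - omega * (1.0 - math.cos(theta))


def smoothing_factor(omega, samples=1001):
    """Largest |symbol| over sampled high frequencies theta in [pi/2, pi].

    Assumes standard coarsening; the symbol is even in theta, so the
    positive half of the high-frequency range is sufficient.
    """
    thetas = (
        math.pi / 2 + (math.pi / 2) * k / (samples - 1) for k in range(samples)
    )
    return max(abs(jacobi_symbol(theta, omega)) for theta in thetas)


if __name__ == "__main__":
    # Compare a few damping parameters (illustrative choices).
    for omega in (0.5, 2.0 / 3.0, 1.0):
        mu = smoothing_factor(omega)
        print(f"omega = {omega:.3f}  smoothing factor = {mu:.3f}")
```

For the damping parameter 2/3 the sketch reproduces the well-known smoothing factor of 1/3, whereas undamped Jacobi (damping parameter 1) does not reduce the highest frequency at all.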

The second major computer science challenge was to reach high performance with our approach to code generation on a wide range of architectures (see Section 4). One important aspect here is what information to provide at which layer (see Subsection 4.1). While the syntax of ExaSlang is partly inspired by Scala, Matlab, and LaTeX, our target language is C++ with additional features that depend on the execution platform (see Subsection 4.2). We demonstrated that, with the help of our optimization techniques (see Subsections 4.3–4.5), weak scaling could be achieved on traditional cluster architectures such as the JUQUEEN supercomputer at the Jülich Supercomputing Centre (JSC) [75, 49]. We also achieved weak scaling on the GPU cluster Piz Daint at the Swiss National Supercomputing Centre (CSCS). Furthermore, we demonstrated the automatic generation of solvers that can be executed on emerging architectures such as ARM [51] and FPGA platforms [77, 78].

One new concept that ExaStencils introduced into high-performance computing is that of feature-based domain-specific optimization [5] (see Subsection 4.6). The central idea is to view a source code, such as a stencil implementation or an application, as a member of a program family or software product line rather than as an isolated individual, and to describe the source code by its commonalities and variabilities with respect to the other family members in terms of features. A feature represents a concept of the domain (e.g., a type of smoother or grid) that may be selected and combined with others on demand. With this approach, a large search space of configuration choices can be reviewed automatically at the level of domain concepts and the most performant choices for the application and execution platform at hand can be identified.
To this end, we devised a framework of sampling and machine learning approaches [36, 39, 81] that allow us to derive a performance model of a given code that is parameterized in terms of its features. This way, we can express performance behavior in terms of concepts of the domain and automatically determine optimal configurations that are tailored to the problem at hand, which we have demonstrated in the domain of stencil codes [30, 31, 32, 36, 39, 81, 59] and beyond (e.g., databases, video encoders, compilers, and compression tools). Our framework integrates well with the other parts of ExaStencils that use and gather domain and configuration knowledge in different phases.

Project ExaStencils came with several case studies whose breadth was meant to demonstrate the practicality and flexibility of the approach (see Section 5). The case studies are a central deliverable of ExaStencils. They include studies close to real-world problems: the simulation of non-Newtonian and non-isothermal fluids (see Subsection 5.3) and a molecular dynamics simulation (see Subsection 5.4).

We conducted two additional studies at the fringes of ExaStencils, exploring alternative approaches (see Section 6). In one, the option of an internal rather than external DSL was explored: ExaSlang 4 was embedded in the host languages Java and Ruby to study the trade-off between the effort of the language implementation and the performance gain of the target code [17]. The outcome was that an embedding is possible but, as expected, with a loss of performance (see Subsection 6.1). In the second study, we implemented a simple multigrid solver in SPIRAL [10] (see Subsection 6.2). The success of the SPIRAL project [24, 61] a decade ago was a strong motivator for project ExaStencils. SPIRAL can handle simple algebraic multigrid solvers but would have to be extended for more complex ones.
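
The feature-based performance modeling summarized earlier in this subsection can be illustrated with a deliberately simplified sketch. It is not the project's framework: the feature names, the synthetic `measure` function, and its runtimes are made up for illustration, and the machine learning step is replaced by plain one-feature-at-a-time sampling that ignores interactions between features.

```python
from itertools import product

# Hypothetical binary features of a generated solver variant (illustrative names).
FEATURES = ("red_black_smoother", "vectorize", "temporal_blocking")


def measure(config):
    """Stand-in for compiling and timing a variant; returns a made-up runtime."""
    runtime = 10.0
    if config["red_black_smoother"]:
        runtime -= 2.0
    if config["vectorize"]:
        runtime -= 3.0
    if config["temporal_blocking"]:
        runtime += 1.0
    return runtime


def learn_influences(features, measure_fn):
    """Sample the base variant, then each feature alone, and record the deltas."""
    base_config = {name: False for name in features}
    base = measure_fn(base_config)
    influences = {}
    for name in features:
        sample = dict(base_config, **{name: True})
        influences[name] = measure_fn(sample) - base
    return base, influences


def predict(config, base, influences):
    """Additive model: base runtime plus the influence of every selected feature."""
    return base + sum(influences[name] for name in config if config[name])


def best_configuration(features, base, influences):
    """Search all configurations with the model instead of measuring each one."""
    configs = (
        dict(zip(features, bits))
        for bits in product((False, True), repeat=len(features))
    )
    return min(configs, key=lambda config: predict(config, base, influences))


if __name__ == "__main__":
    base, influences = learn_influences(FEATURES, measure)
    best = best_configuration(FEATURES, base, influences)
    print("learned influences:", influences)
    print("predicted best configuration:", best)
```

The point of the design is that only four variants are measured here (the base variant plus one per feature), while the model is then used to rank all eight configurations. An additive model of this kind is only exact when features do not interact, which is why realistic approaches also have to sample and model feature interactions.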

1.3 Project Peripherals

Attempts at abstraction and automation in programming have received increased attention in the past two decades in the area of software engineering. High-performance computing has been comparatively conservative in going down this road. The reason is that the demands on performance are much higher than in general software engineering, and the architectures used to achieve it are more complex, notably with large numbers of loosely coupled processors. The potential of an effective automation grows as the application domain shrinks.
