Imperial College of Science, Technology and Medicine Department of Computing

An Active-Library Based Investigation into the Performance Optimisation of Linear Algebra and the Finite Element Method

Francis Prem Russell

Submitted in part fulfilment of the requirements for the degree of Doctor of Philosophy in Computing and the Diploma of Imperial College, London, July 2011

Abstract

In this thesis, I explore an approach called “active libraries”. These are libraries that take part in their own optimisation, enabling both high-performance code and the presentation of intuitive abstractions.

I investigate the use of active libraries in two domains. Firstly, dense and sparse linear algebra, in particular the solution of linear systems of equations. Secondly, the specification and solution of finite element problems.

Extending my earlier (MEng) thesis work, I describe the modifications to my linear algebra library “Desola” required to perform sparse-matrix code generation. I show that optimisations easily applied in the dense case using code-transformation must be applied at a higher level of abstraction in the sparse case. I present performance results for sparse linear system solvers generated using Desola and compare against an implementation using the Intel Math Kernel Library. I also present improved dense linear-algebra performance results.

Next, I explore the active-library approach by developing a finite element library that captures runtime representations of basis functions, variational forms and sequences of operations between discretised operators and fields. Using captured representations of variational forms and basis functions, I demonstrate optimisations to cell-local integral assembly that this approach enables, and compare against the state of the art.

As part of my work on optimising local assembly, I extend the work of Hosangadi et al. on common sub-expression elimination and factorisation of polynomials. I improve the weight function presented by Hosangadi et al., increasing the number of factorisations found. I present an implementation of an optimised branch-and-bound algorithm inspired by reformulating the original matrix-covering problem as a maximal graph biclique search problem. I evaluate the algorithm’s effectiveness on the expressions generated by our finite element solver.

Acknowledgements

I would like to express my greatest thanks and appreciation to:

• Paul Kelly, my supervisor, for his time, knowledge, patience, enthusiasm and endless discussions without which this work would never have been possible.

• Tony Field, my second supervisor, for the interesting conversations we’ve had.

• Graham Markall, for our many discussions about finite element assembly and other related issues.

• David Ham, for his help in understanding the numerical issues surrounding the finite element discretisation of the Navier-Stokes equations.

• William Knottenbelt and Peter Jimack, my examiners, for taking the time and effort to read my thesis and for their constructive comments and suggestions.

• Tristan Allwood, Marc Hull and Matthew Sackman for their support, with special thanks to Tristan and Marc for enduring countless conversations about the many research obstacles I encountered.

• My family, for always having faith in my abilities.

Contents

Abstract

Acknowledgements

1 Introduction
1.1 Thesis Statement
1.2 Motivation and Objectives
1.3 Contributions
1.4 Statement of Originality
1.5 Publications

2 Background
2.1 Optimising Domain-Specific Abstractions
2.1.1 Expression Templates and Template Metaprogramming
2.1.2 ROSE
2.1.3 Broadway Compiler
2.2 Programming for Adaptation
2.2.1 Sequoia
2.2.2 Themis
2.3 Library Adaptation through Empirical Techniques
2.3.1 X Language
2.3.2 ATLAS
2.3.3 SPIRAL
2.4 Code Optimisation Techniques and Models
2.4.1 Loop Fusion
2.4.2 Array Contraction
2.4.3 Polytope Model
2.4.4 Runtime Data and Computation Reordering
2.5 Code Generation and Optimisation Frameworks
2.5.1 LLVM
2.5.2 SUIF
2.5.3 TaskGraph
2.6 Tensors
2.7 Vector Calculus Notation
2.8 The Finite Element Method
2.8.1 Introduction
2.8.2 The Method of Weighted Residuals
2.8.3 Boundary Conditions
2.8.4 Problem Discretisation
2.8.5 Assembly and Solution
2.9 Finite Element Libraries
2.9.1 Analysa
2.9.2 GetDP
2.9.3 deal.II
2.9.4 FreeFEM++
2.9.5 Sundance
2.9.6 FEniCS
2.10 Conclusion

3 Code Generation for Iterative Solvers of Sparse Linear Systems
3.1 Introduction
3.2 Delaying Evaluation
3.3 Runtime Code Generation
3.4 Code Caching
3.5 Loop Fusion and Array Contraction
3.6 Liveness Analysis
3.7 Performance Evaluation for Dense Linear Algebra
3.8 Extension to Sparse Matrices
3.8.1 Sparse Matrix Storage
3.8.2 Code Generation
3.8.3 Matrices
3.8.4 Results
3.9 Conclusions and Further Work

4 Excafé: An Expression Capturing Finite Element Library
4.1 The Finite Element Method
4.2 Our Investigation
4.3 Data Flow in Excafé
4.4 Representation Capabilities of Excafé
4.5 Overview of the Excafé Chapters

5 Heat Solver Example in Excafé
5.1 A Simple Heat Conduction Problem
5.2 Discretisation of the Heat Equation
5.3 Specifying the Problem Context to Excafé
5.4 Specifying the Solution Steps to Excafé
5.5 Executing the solver
5.6 Generated Output
5.7 Conclusion

6 Expression Capture in Excafé
6.1 Overview
6.2 Problem Context Construction
6.2.1 Specifying Tensor Field Discretisation
6.2.2 Boundary Conditions
6.3 Discrete Expression Capture
6.3.1 Handle Types
6.3.2 Linear System Solution
6.3.3 Registering Solution Steps
6.4 Declarative Iteration Capture
6.4.1 Linearising the Convective Acceleration Term of Navier-Stokes
6.4.2 C++ Syntax
6.4.3 Comparison to Imperative Form
6.4.4 Future Development
6.5 Variational Form Capture
6.6 Basis Function Capture
6.7 Conclusion

7 Imperative Loop Inference
7.1 The Problem
7.2 Algorithm
7.2.1 Expression DAG Annotation
7.2.2 Determining Loop Nesting
7.2.3 Topological Sort
7.3 Declarative Specification Validation
7.3.1 Updating persistent values with incorrectly scoped expressions
7.3.2 Under-specification of loop nesting
7.3.3 Over-specification of loop nesting
7.3.4 Invalid dependencies
7.4 Conclusion

8 Incompressible Navier-Stokes Solver
8.1 The Test Problem
8.2 Our model
8.3 Variational Formulation
8.4 Outflow Boundary Condition
8.5 Temporal Discretisation
8.6 Linearising the Convective Acceleration Term
8.7 Implementation in Excafé
8.7.1 Mesh Construction
8.7.2 Discretisation
8.7.3 Boundary Conditions
8.7.4 Solver Specification
8.7.5 Solver Execution
8.8 Generated Output
8.9 Conclusion

9 Local Assembly Construction and Optimisation
9.1 Local Assembly Overview
9.2 Implementation Approaches to Local Assembly
9.2.1 Quadrature
9.2.2 Tensor Representation (FEniCS)
9.2.3 Excafé
9.3 Assembly in Excafé
9.3.1 Assembly matrix overview
9.3.2 Bilinear form expression construction
9.3.3 Integration
9.3.4 Referencing other fields and scalars
9.3.5 Summary
9.4 Optimisation
9.4.1 Expression Representation and Complexity
9.4.2 Polynomial Factorisation Effectiveness
9.5 Comparison with the FEniCS Form Compiler
9.6 Future Work
9.7 Conclusion

10 Polynomial Common Sub-expression Elimination and Factorisation
10.1 The algorithm
10.2 Optimised branch and bound algorithm
10.2.1 Definitions and Propositions
10.2.2 Problem Definition
10.2.3 Algorithm
10.3 Improved weighting function
10.3.1 The original weighting function
10.3.2 Improving the weighting function
10.4 Experimental Results
10.5 Conclusion
10.6 Future Work

11 Conclusion
11.1 Summary of Thesis Achievements
11.2 Discussion
11.3 Future Work

Bibliography

A Desola Dense Performance Results

B Desola Sparse Performance Results
B.1 BiConjugate Gradient Solver
B.2 BiConjugate Gradient Stabilised Solver
B.3 Conjugate Gradient Squared Solver
B.4 Quasi-Minimal Residual Solver
B.5 Transpose-Free Quasi-Minimal Residual Solver

C Excafé Library Design Notes
C.1 Introduction
C.2 Problem Context
C.3 Mesh Representation
C.4 Finite Element
C.5 Reference Cell
C.6 Local-to-global mapping
C.7 Discretised fields and operators
C.8 Discretised operation description and implicit-loop syntax
C.9 Form description
C.10 Expression representation

D Excafé Architecture
D.1 Geometry-related types
D.1.1 MeshGeometry class
D.1.2 MeshConnectivity class
D.1.3 MeshTopology class
D.1.4 MeshFunction class
D.1.5 MeshCell interface
D.1.6 GeneralCell interface
D.1.7 Mesh class
D.2 Discretisation-related types
D.2.1 FiniteElement interface
D.2.2 DofMap class
D.2.3 PETScVector and PETScMatrix classes
D.2.4 DiscreteField and DiscreteOperator classes
D.3 Discrete Expression Capture
D.3.1 Field, Operator and Scalar handle classes
D.3.2 IndexedField, IndexedOperator and IndexedScalar handle classes
D.3.3 Function space related classes
D.3.4 The Discrete Expression DAG Classes
D.4 Variational Form Capture
D.4.1 Handle Types
D.4.2 Form Expression DAG Types
D.5 Scalar Expression Representation

E Excafé Heat Solver Source

F Excafé Incompressible Navier-Stokes Solver Source

List of Tables

3.1 The number of compiler invocations in each iterative solver, the total compiler overhead in seconds and total execution time (including compilation) for 256 iterations of each solver with problem sizes of 500 and 5000 for architecture 1. Liveness analysis (Section 3.6) was disabled since it consistently lowered performance. This was due to the increased number of compiler invocations required.

3.2 The options supplied to the Intel C/C++ compilers and their meanings.

3.3 Number of array contractions occurring in each iterative solver. “Total array contractions” refers to the number of array contractions performed in code generated during the execution of the solver. “Array contractions in repeatedly executed code” refers to the number of array contractions that occurred in code executed by the solver each iteration. Liveness analysis (Section 3.6) was disabled since it consistently lowered performance. This was due to the increased number of compiler invocations required.

10.1 Time taken for our algorithm to find the maximum scoring covering for the first 15 iterations of our factoriser. We also present statistics on the dimensions and sparsity of the kernel cube matrix (KCM) and the reduction in floating point operations each iteration.

List of Figures

2.1 Implementations of vector addition using STL and FAST. The FAST implementation requires that the vector sizes are compile-time constants.
2.2 C code to perform a matrix-vector multiply, annotated with X pragmas. Code is shown before and after the application of the X directives.
2.3 The declaration of four variational forms in Analysa. The forms m and k correspond to the standard “mass” and “stiffness” forms respectively.
2.4 Declaration of a discrete function space and system of equations for solving the weak form of Poisson’s equation in GetDP’s problem definition language.
2.5 Local assembly of the Laplace operator in deal.II. Handling of iteration over the basis functions is performed explicitly by the library client.
2.6 Code for solving Poisson’s equation in FreeFEM++’s C++-like language.
2.7 Specification of a Poisson problem in Sundance. The code also illustrates how the List type can be used to construct the gradient differential operator.
2.8 Specification of the bilinear and linear forms for the Poisson problem in UFL.
3.1 An example DAG. The circular node denotes a handle held by the library client. The expression represents the matrix-vector multiply function from Level 2 BLAS, y = αAx + βy.
3.2 A C++ function for constructing a TaskGraph that performs a dense matrix multiplication. The matrix sizes are passed in through the array sz and incorporated as constants inside the TaskGraph.
3.3 Vector addition before and after loop fusion. The computed values assigned to a[i] are reused immediately, leading to improved temporal locality.
3.4 Vector addition before and after array contraction. The array a is replaced by a scalar value, reducing memory requirements.
3.5 A C++ function that computes and prints the scaled value of a cross product using Desola vector and scalar handles passed as parameters.
3.6 A DAG representing the computation built in Figure 3.5. The client holds handles to both the cross product and scaling operations (handles are denoted with elliptical nodes). When evaluation of the handle scaled is forced, the client still possesses the handle product, so the cross-product result cannot be allocated locally to the runtime-generated code.
3.7 Throughput of the BiConjugate Gradient (BiCG) and Quasi-Minimal Residual solvers running on architecture 2 with and without loop fusion.
3.8 Throughput of the BiConjugate Gradient (BiCG) solver using Desola, MTL, ATLAS and IMKL running on architecture 1. Estimated throughput for IMKL at matrix size 500 is 4504 MFLOPs. Throughput for Desola ignores the constant compilation overhead (see Table 3.1).
3.9 Throughput of the Quasi-Minimal Residual (QMR) solver using Desola, MTL, ATLAS and IMKL running on architecture 2. Throughput for Desola ignores the constant compilation overhead (see Table 3.1).
3.10 Throughput of the Transpose-Free Quasi-Minimal Residual (TFQMR) solver using Desola, MTL, ATLAS and IMKL running on architecture 1. Estimated throughput for IMKL at matrix size 500 is 4233 MFLOPs. Throughput for Desola ignores the constant compilation overhead (see Table 3.1).
3.11 Throughput of the Transpose-Free Quasi-Minimal Residual (TFQMR) solver using Desola, MTL, ATLAS and IMKL running on architecture 2. Throughput for Desola ignores the constant compilation overhead (see Table 3.1).
3.12 A Compressed Row Storage representation of the matrix in Equation 3.1 is held by the arrays val, col_ind and row_ptr. Array val contains the non-zero values in the matrix. Array col_ind holds the corresponding columns of the non-zero values. Array row_ptr holds the indices at which each row starts in the val and col_ind arrays. This example uses one-based indexing; however, zero-based indexing may be used as well.
3.13 Pseudo-code implementation of CRS sparse matrix-vector multiply using an inner while-loop. The outer for-loop iterates along all the non-zero values in the matrix whilst the inner while-loop is used to update the corresponding row.
3.14 Pseudo-code implementation of CRS sparse matrix-vector multiply using an inner for-loop. The outer for-loop iterates over the matrix rows whilst the inner for-loop is used to iterate over the non-zero values for each row.
3.15 Pseudo-code implementation of CRS sparse matrix-vector multiply and transpose matrix-vector multiply after application of the SUIF loop fusion pass. The pass is unable to fuse the inner loops and redundant index calculations are performed.
3.16 Pseudo-code implementation of a fused CRS sparse matrix-vector multiply and transpose matrix-vector multiply. The code is generated completely fused and there are no redundant calculations.
3.17 Specialised C generated by the Desola BiConjugate Gradient Stabilised solver for a matrix-vector multiply with the matrix rajat31. The u suffix on some integer values is C syntax for declaring an integer literal to be unsigned.
3.18 Time to execute 256 iterations of the Quasi-Minimal Residual solver with matrix torso1 on architecture 1.
4.1 Data flow in Excafé. Rectangular nodes represent data being manipulated. Elliptical nodes represent processes that operate on the data. The arcs show the relationships between the data and processes.
5.1 We specify a temperature of 0°C on the top, left and right boundaries and a temperature of 1°C on the bottom edge and the central diamond.
5.2 The construction of the discretised domain used for our heat solver problem in Excafé. The mesh builder is instructed to place a 4-sided polygon in the centre of the domain and label its edges with the value 5.
5.3 The construction of an Excafé Scenario object. The finite elements and function spaces used in the solver are registered with the Scenario to provide handles that will be used during expression capture. Fields that are persistent state of the problem (in this example, the temperature field) must also be specified.
5.4 The registration of boundary conditions with an Excafé Scenario object. Constant tensor values are associated with integer-valued labels used to identify facets on the boundary of the domain.
5.5 Construction of a SolveOperation object in Excafé. We explicitly construct the mass matrix and multiply it with the previous temperature field to obtain the RHS of our linear system. We do not explicitly construct the system matrix. This is handled by the method assembleGalerkinSystem, which takes the bilinear form used to assemble the system matrix, as well as the boundary conditions to apply to it and the RHS vector.
5.6 The loop used to drive our heat solver simulation. It repeatedly calls the SolveOperation object to advance the state of the discretised system then outputs the fields to a VTK file.
5.7 Our example heat conduction problem at different time-steps. Thermal diffusivity α = 10⁻⁴ m²s⁻¹ and time-step duration k = 10 s.
6.1 Declaration of the context for a 2D finite element problem.
6.2 Declaring a coupled velocity-pressure function space using a linear scalar pressure space and a quadratic vector velocity space. The field ranks are specified through the template parameters used to instantiate the basis function classes.
6.3 Declaring the desired fields to solve for to a Scenario. Each field is discretised using the supplied FunctionSpace and becomes persistent state of the Scenario.
6.4 Construction of Dirichlet boundary conditions attached to certain mesh facets for a scalar field in Excafé.
6.5 Valid operations on handles to scalar expressions.
6.6 Discretised fields can be added and subtracted if they are defined on the same function space. The project function can be used to perform sub-field extraction and create composite fields.
6.7 Construction of the mass matrix and application to a field in Excafé. In this example, we have constructed different function spaces to represent the trial and test spaces. However, it is possible to use the same space for both as we do not require that function spaces be designated as trial or test.
6.8 Solution for a linear operator with Dirichlet boundary conditions in Excafé. It is possible to retrieve the LHS operator and RHS field with boundary conditions applied, so the linear system’s residual can be determined.
6.9 Creation of a SolveOperation (with little practical application) that updates a field by multiplying it by the mass matrix.
6.10 Declarative construction of a loop to linearise the convective acceleration term of the Navier-Stokes equations using Picard iteration. This example exists to demonstrate the indexing syntax and does not solve a useful problem. See Appendix F for the code of an incompressible Navier-Stokes implementation using this linearisation.
6.11 Declaration of a TemporalIndex and indexed scalar, field and operator types. Each indexed type instance is associated with a particular TemporalIndex at construction.
6.12 Given C++ variables of the specified types, examples of construction of valid expressions using the IndexedField type. Identical syntax is valid for the IndexedScalar and IndexedOperator types.
6.13 Assignment of the expression a + b[i], which is only valid inside a loop indexed by i, to the non-indexed handle c. The expression held by handle c is only valid within loop i as well, but it is not possible to use indexing with the handle c.
6.14 Excafé syntax for describing an iterative construction of Fibonacci values (in floating point) and a corresponding C implementation. The C implementation requires more complex indexing to access f, which is used as a cyclic buffer.
6.15 A bilinear form for the LHS of the finite element discretisation of Poisson’s equation in UFL and Excafé. We use the same function space for the trial and test spaces in the Excafé example. The order of parameters to the B method implies whether u is being used in the context of a trial or test function.
6.16 The DAG we construct for the bilinear form ∫Ω ∇u · ∇u dx. It is used to assemble an operator required during the calculation of the RHS of our heat solver example. The assembled operator is applied to the temperature field from the previous time-step in order to obtain the RHS vector.
6.17 Linear Lagrange basis functions defined over the reference triangle.
6.18 Quadratic Lagrange basis functions defined over the reference triangle.
6.19 Excafé implementation of expression capture for the linear Lagrange basis functions over a triangular element. We exploit rotational symmetry and use a helper class that simplifies the generalisation of the scalar basis to arbitrary tensor-valued bases.
7.1 The data structure built during execution of the linearisation example from Figure 6.10. The elliptical nodes correspond to handles held by the library client. The trapezoid-shaped nodes correspond to Excafé internal classes that also need to maintain handles to parts of the discrete expression DAG. The rectangular nodes correspond to expressions that are either scalar, discrete field or discrete operator valued. These are the nodes that form our (discrete) expression DAG. The handle unext holds a reference to unknown[final-1], the expression that will become the next value of u. However, unknown[final-1] has no dependencies on other expressions. This is due to the fact that our expression DAG structure no longer fully reflects data dependencies when dealing with indexed values.
7.2 The example discrete expression DAG with arcs between nodes in the direction of index propagation. The dashed arcs denote index propagation from expressions assigned to unknown to all uses of unknown. We show the indices associated with each node at the start of the propagation phase. The index variable i is not allowed to propagate along the two arcs labelled ≠ i since the expression unknown[final-1] is only valid outside of loop i.
7.3 Our expression DAG after index propagation. Each node is annotated with the indices of the loops it is contained in. All expression DAG nodes inferred to be in loop i have been placed in the rectangular region.
7.4 The loop nesting construction at different iterations of our algorithm. At each step, nodes for which all but one of the indices belong to a known scope are used to construct new nested scopes.
7.5 A code fragment showing the construction of an invalid specification for updating the value of u. Since the expression f[i] is only valid within the context of a loop, it cannot be used to define the new value of u.
7.6 A declarative loop specification that cannot be satisfied. Both a and b are only valid inside the nested scopes of loops i and j. However, it is impossible to infer whether i ⊂ j or vice-versa. We note that the expression assigned to u has no indices associated with it, and is therefore scoped correctly according to our definition.
7.7 Construction of invalid values in Excafé. This will raise an exception.
8.1 The physical layout of our test problem. A cylinder of diameter 0.3 m is positioned at (0.5, 0.5) m. Fluid enters through the edge Γin and exits through Γout.
8.2 Construction of a 3x1 mesh in Excafé with a 16-sided regular polygon located at (0.5, 0.5).
8.3 The mesh we use for our incompressible Navier-Stokes test problem. Mesh size is 3x1 and maximum cell area is 900⁻¹.
8.4 A P2-P1 Taylor-Hood element is a two-dimensional triangular element with pressure nodes located at the cell vertices (rectangular nodes) and velocity nodes located at both the cell vertices and edge midpoints (circular nodes). Consequently, pressure is a linear field and velocity is a quadratic field.
8.5 Declaration of vector-valued quadratic and scalar-valued linear function spaces. We define the fields that will approximate velocity and pressure using the vector-valued and scalar-valued spaces respectively. We also define a function space that can be used to approximate a coupled velocity-pressure field.
8.6 Construction of rank-1 (vector) valued Dirichlet boundary conditions for the velocity field. Vector values are attached to mesh facets using labels provided by the mesh generator. We do not attach a Dirichlet boundary condition to label 2 since this is the outflow boundary.
8.7 The boundary facet labelling used for our test problem. Labels 1-4 are assigned automatically by the mesh generator and the cylinder was declared to have label 5.
8.8 Construction of the SolveOperation object that describes how to advance the simulation by a single time-step. The solution of velocity and pressure components is done simultaneously, producing a field from which the velocity and pressure components are then extracted.
8.9 The time-stepping loop of our Navier-Stokes solver.
8.10 The velocity vector fields produced by our incompressible Navier-Stokes solver for the time interval 0 → 0.27 seconds.
8.11 The velocity vector fields produced by our incompressible Navier-Stokes solver for the time interval 0.36 → 0.63 seconds.
10.1 The Kernel Co-Kernel Matrix (KCM) corresponding to the kernels in Equation List 10.2. Each one in the matrix corresponds to a particular way of evaluating the cube identified by the subscript in brackets.
10.2 A factorisation of our example polynomials from Equations 10.1 that avoids evaluating ab twice.
10.3 The KCM from Figure 10.1 expressed as a graph. Rectangular nodes denote row-vertices (kernels) and circular nodes denote columns (cubes). The edges are annotated with their corresponding term numbers.
10.4 An abstract Kernel Co-Kernel Matrix with a covering of R co-kernels (rows) and C cubes (columns). The computed expressions are shown inside the intersection.
10.5 The KCM for the expression a + b. The covering corresponds to the useless factorisation 1.0(a + b) as it is impossible to reduce the operation count of this expression.
10.6 The KCM for the expression a² + ab. The covering shown corresponds to the factorisation a(a + b).
10.7 Number of multiplies required to evaluate a term from its cube and co-kernel when they are either 1.0, any other constant value or a variable.
10.8 The number of floating point operations required to evaluate the local assembly matrix polynomials generated by the Excafé Navier-Stokes solver, for the first 500 iterations of our polynomial CSE pass. It does not include the division operations required to evaluate the rational functions from their constituent polynomials (225).
A.1 Throughput of different implementations of the BiConjugate Gradient solver on our test architectures.
A.2 Throughput of different implementations of the BiConjugate Gradient Stabilised solver on our test architectures.
A.3 Throughput of different implementations of the Conjugate Gradient Squared solver on our test architectures.
A.4 Throughput of different implementations of the Quasi-Minimal Residual solver on our test architectures.
A.5 Throughput of different implementations of the Transpose-Free Quasi-Minimal Residual solver on our test architectures.
B.1 Time to execute 256 iterations of the BiConjugate Gradient solver with matrix rajat26 on our test architectures.
B.2 Time to execute 256 iterations of the BiConjugate Gradient solver with matrix rajat31 on our test architectures.
B.3 Time to execute 256 iterations of the BiConjugate Gradient solver with matrix ex11 on our test architectures.
B.4 Time to execute 256 iterations of the BiConjugate Gradient solver with matrix torso1 on our test architectures.
B.5 Time to execute 256 iterations of the BiConjugate Gradient solver with matrix Chebyshev4 on our test architectures.
B.6 Time to execute 256 iterations of the BiConjugate Gradient solver with matrix ASIC_680k on our test architectures.
B.7 Time to execute 256 iterations of the BiConjugate Gradient Stabilised solver with matrix rajat26 on our test architectures.
B.8 Time to execute 256 iterations of the BiConjugate Gradient Stabilised solver with matrix rajat31 on our test architectures.
B.9 Time to execute 256 iterations of the BiConjugate Gradient Stabilised solver with matrix ex11 on our test architectures.
B.10 Time to execute 256 iterations of the BiConjugate Gradient Stabilised solver with matrix torso1 on our test architectures.
B.11 Time to execute 256 iterations of the BiConjugate Gradient Stabilised solver with matrix Chebyshev4 on our test architectures.
B.12 Time to execute 256 iterations of the BiConjugate Gradient Stabilised solver with matrix ASIC_680k on our test architectures.
B.13 Time to execute 256 iterations of the Conjugate Gradient Squared solver with matrix rajat26 on our test architectures.
B.14 Time to execute 256 iterations of the Conjugate Gradient Squared solver with matrix rajat31 on our test architectures.
B.15 Time to execute 256 iterations of the Conjugate Gradient Squared solver with matrix ex11 on our test architectures.
B.16 Time to execute 256 iterations of the Conjugate Gradient Squared solver with matrix torso1 on our test architectures.
B.17 Time to execute 256 iterations of the Conjugate Gradient Squared solver with matrix Chebyshev4 on our test architectures.
B.18 Time to execute 256 iterations of the Conjugate Gradient Squared solver with matrix ASIC_680k on our test architectures.
B.19 Time to execute 256 iterations of the Quasi-Minimal Residual solver with matrix rajat26 on our test architectures.
B.20 Time to execute 256 iterations of the Quasi-Minimal Residual solver with matrix rajat31 on our test architectures.
B.21 Time to execute 256 iterations of the Quasi-Minimal Residual solver with matrix ex11 on our test architectures.
B.22 Time to execute 256 iterations of the Quasi-Minimal Residual solver with matrix torso1 on our test architectures.
B.23 Time to execute 256 iterations of the Quasi-Minimal Residual solver with matrix Chebyshev4 on our test architectures.
B.24 Time to execute 256 iterations of the Quasi-Minimal Residual solver with matrix ASIC_680k on our test architectures.
B.25 Time to execute 256 iterations of the Transpose-Free Quasi-Minimal Residual solver with matrix rajat26 on our test architectures.
B.26 Time to execute 256 iterations of the Transpose-Free Quasi-Minimal Residual solver with matrix rajat31 on our test architectures.
B.27 Time to execute 256 iterations of the Transpose-Free Quasi-Minimal Residual solver with matrix ex11 on our test architectures.
B.28 Time to execute 256 iterations of the Transpose-Free Quasi-Minimal Residual solver with matrix torso1 on our test architectures.
B.29 Time to execute 256 iterations of the Transpose-Free Quasi-Minimal Residual solver with matrix Chebyshev4 on our test architectures.
B.30 Time to execute 256 iterations of the Transpose-Free Quasi-Minimal Residual solver with matrix ASIC_680k on our test architectures.
D.1 A UML class diagram of Excafé’s geometry-related classes.
D.2 A UML class diagram of Excafé’s classes used to store discretised fields, operators and associated information.
D.3 A UML class diagram of the Excafé types related to capturing a description of operations performed between discretised fields, discretised operators and scalars. The handle types held by the Excafé client have been shaded.
D.4 A UML class diagram of the Excafé types related to capturing a description of variational forms used to perform assembly. The handle types held by the Excafé client have been shaded.
D.5 A class diagram of Excafé’s different scalar expression representations.
D.6 A class diagram of the variable type Excafé uses to represent unknowns in its symbolic representation of a local assembly matrix.

Chapter 1

Introduction

1.1 Thesis Statement

This thesis argues that active libraries, libraries that self-optimise, are an essential requirement for obtaining both performance and abstraction in computational science applications.

1.2 Motivation and Objectives

Abstractions offer the ability to model a domain while hiding other concerns from the client using them. They permit more comprehensible code, and separation of domain concerns from other issues. However, this often comes at the cost of performance. In mathematical and computational science applications, speed is considered essential. Therefore, scientific code is often written in a performance-oriented manner, with clean abstractions a secondary concern.

Active libraries are a technique whereby a library can take responsibility for the optimisation of its own code. By shifting responsibility for performance to the library, they enable library clients to write code without having to concern themselves with performance. This enables the library writer to focus on how to make the abstractions fast.

Active libraries would appear to be the perfect match for computational science applications.


They enable both the domain expert and computer scientist to focus on the problems they know how to solve without having to become an expert in each other’s field.

This thesis is an investigation of active libraries in the context of computational science. First, we consider sparse linear algebra as an extension to previous work. Next, we examine active libraries as a tool to specify and optimise a solver for partial differential equations, using the finite element method.

Our objectives in this thesis are to:

• Extend previous work on dense linear algebra into the sparse domain and demonstrate how active libraries can apply optimisations to this domain that are beyond the capabilities of conventional compilers.

• Apply active libraries to a more complex domain, specifically that of the finite element method.

• Demonstrate how active libraries can be used to raise the level of abstraction in this domain, and develop our expression capture techniques beyond those that we applied to linear algebra.

• Develop optimisations for the finite element method based on domain-specific knowledge obtained through our expression capture techniques.

In aiming to achieve these, we hope to further work both on the design and implementation of active libraries and on optimisation techniques for the finite element method.

1.3 Contributions

We make the following contributions with this work:

• We present the extension of our delayed-evaluation runtime code-generation library “Desola” to sparse linear algebra.

• We show that some of the important optimisations applied to our dense linear iterative solvers are also applicable to our sparse ones. We also show that these optimisations are beyond the automatic optimisation capabilities of modern compilers and require a generative approach in order to be implemented effectively.

• We present new performance results comparing sparse linear iterative solvers implemented using the Intel Math Kernel Library against our extended Desola implementation.

• We present the design and implementation of a C++ finite element library that performs expression capture of multiple aspects of the finite element method.

• We show that expression capture of certain aspects of the finite element method enables analyses and optimisations not possible in other finite element implementations. We present an optimisation framework for the evaluation of local assembly matrices and compare against the state of the art.

• We extend the work of Hosangadi et al. [1] for common sub-expression elimination of polynomials. We present improvements to the original factorisation weighting function that further reduce the operation count of our factorised expressions. We reformulate the primary problem of finding a maximally weighted matrix covering as a graph biclique search problem, and present a branch-and-bound algorithm optimised for the presented weight function through insights into the graph structure.

1.4 Statement of Originality

I hereby declare that this document is the result of my own work, as is the work it presents, except where otherwise stated.

This thesis contains work on sparse extensions to Desola, an active linear algebra library that I developed during my MEng thesis. This work was first published as a workshop paper [2] and then revised for a special issue [3] of “Science of Computer Programming” published by Elsevier.

The sparse extensions detailed in this thesis (Section 3.8) are new work; however, content and background are incorporated from the journal paper and MEng thesis in order to provide sufficient context for both the sparse extensions and the relationship of the work to Excafé, which extends the ideas developed in Desola.

1.5 Publications

F. P. Russell, M. R. Mellor, P. H. J. Kelly, and O. Beckmann, “DESOLA: An active linear algebra library using delayed evaluation and runtime code generation,” Science of Computer Programming, vol. 76, no. 4, pp. 227–242, 2011. Special issue on library-centric software design (LCSD 2006).

This paper is the journal version of the original Desola workshop paper [2]. It describes work I developed during my MEng thesis, with new dense linear algebra results that are also presented in this thesis.

Chapter 2

Background

In this chapter, I survey literature relevant to this research. I cover work on optimising domain-specific abstractions, techniques for supporting adaptation in languages and libraries, and models and techniques for code optimisation and generation. I provide a brief mathematical description of the finite element method and review various domain-specific languages and libraries used for the specification and solution of finite element problems.

This thesis incorporates material on the active library Desola from my journal paper [3] and Masters thesis to provide context for the sparse extensions detailed in Section 3.8 of Chapter 3. Some of the code-generation and optimisation background originates from this work.

2.1 Optimising Domain-Specific Abstractions

Optimising domain-specific abstractions provides the key to achieving both high performance and comprehensible code. C++ provides mechanisms such as classes and operator overloading that make it relatively easy for developers to define their own abstractions. Significant research has been undertaken into techniques for optimising these abstractions, one of the most well-known being template metaprogramming.


2.1.1 Expression Templates and Template Metaprogramming

Expression Templates [4] are a C++ technique for passing expressions as function arguments. Conventional C-style code would involve passing the expression as a callback using function pointers. Expression templates make it possible for expressions to be passed as function arguments, and inlined into the function body. This results in less overhead, more convenient code and enables the developer to control the implementation of the code that evaluates the expression.

Expression Templates work by parsing expressions at compile time and storing the nested template arguments as an “expression type”. Thus, rather than the expression 1.0+x having the type float, it might have the type:

DExpr< DBinExprOp< DExpr<DExprLiteral>, DExpr<DExprIdentity>, DExprAdd > >

Expression templates give developers control over code generation for expressions. This is particularly useful for numerical applications, where the naïve implementation of vector and matrix arithmetic produces code that allocates and deallocates a number of temporary arrays and uses more loops than necessary.
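As a concrete illustration, the following minimal sketch (invented for this discussion, not code from any library surveyed here) shows how overloaded operators can build such an expression type for vector addition. Assignment walks the expression element-wise, so summing three vectors executes as a single loop with no temporary vectors:

#include <cstddef>
#include <iostream>
#include <vector>

// Base class identifying expression-tree nodes; Expr is the concrete node type.
template <typename Expr>
struct VecExpr {
  const Expr& self() const { return static_cast<const Expr&>(*this); }
  double operator[](std::size_t i) const { return self()[i]; }
  std::size_t size() const { return self().size(); }
};

// A concrete vector; assigning an expression to it evaluates in one fused loop.
struct Vec : VecExpr<Vec> {
  std::vector<double> data;
  explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
  double operator[](std::size_t i) const { return data[i]; }
  std::size_t size() const { return data.size(); }

  template <typename Expr>
  Vec& operator=(const VecExpr<Expr>& e) {
    for (std::size_t i = 0; i < size(); ++i)
      data[i] = e[i];                // no temporary vectors are created
    return *this;
  }
};

// Expression node representing the element-wise sum of two sub-expressions.
template <typename L, typename R>
struct VecAdd : VecExpr<VecAdd<L, R>> {
  const L& l; const R& r;
  VecAdd(const L& l, const R& r) : l(l), r(r) {}
  double operator[](std::size_t i) const { return l[i] + r[i]; }
  std::size_t size() const { return l.size(); }
};

// operator+ builds the expression type instead of computing a result vector.
template <typename L, typename R>
VecAdd<L, R> operator+(const VecExpr<L>& l, const VecExpr<R>& r) {
  return VecAdd<L, R>(l.self(), r.self());
}

int main() {
  Vec a(4, 1.0), b(4, 2.0), c(4, 3.0), d(4);
  d = a + b + c;   // type VecAdd<VecAdd<Vec, Vec>, Vec>; evaluated in one loop
  std::cout << d[0] << "\n";   // prints 6
}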

The Matrix Template Library [5] (MTL) is one C++ library that has successfully used expression templates and other template metaprogramming techniques [6] to produce efficient C++ code. MTL aims to provide the library user with appropriate C++ abstractions and high performance.

Algorithms in MTL are expressed independently of the data storage formats they may work on, using iterators to traverse the containers involved. MTL relies on the optimising abilities of the compiler to be able to remove these levels of abstraction.

MTL algorithms are built on top of a library called BLAIS [7] (Basic Linear Algebra Instruction Set) which is layered on top of FAST, the Fixed Algorithm Size Template Library. BLAIS provides similar functionality to levels 1, 2 & 3 of BLAS (Basic Linear Algebra Subprograms). FAST is effectively an implementation of the C++ Standard Template Library for computations whose size is known at compile time.

// STL
int len = 4;
int* x = new int[len];
int* y = new int[len];
fill(x, x + len, 1);
fill(y, y + len, 3);
std::transform(x, x + len, y, y, plus<int>());

(a) STL

// FAST
const int LEN = 4;
int* x = new int[LEN];
int* y = new int[LEN];
fill(x, x + LEN, 1);
fill(y, y + LEN, 3);
fast::transform(x, cnt<LEN>(), y, y, plus<int>());

(b) FAST

Figure 2.1: Implementations of vector addition using STL and FAST. The FAST implementation requires that the vector sizes are compile-time constants.

We compare STL and FAST implementations of a vector addition in Figure 2.1. Both transform instantiations iterate over the arrays x and y, pairwise summing them into array y. However, the STL instantiation of this code will involve a loop whereas the FAST implementation uses recursive templates to perform loop unrolling, resulting in inlined code.
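The recursive-template unrolling that FAST performs can be sketched as follows. This is an illustration of the technique rather than FAST's actual implementation, and the Transform class name is invented:

#include <functional>
#include <iostream>

// Recursive template: the loop over N elements becomes N inlined statements.
template <int N>
struct Transform {
  template <typename In, typename Out, typename Op>
  static void apply(In x, In y, Out out, Op op) {
    *out = op(*x, *y);
    Transform<N - 1>::apply(x + 1, y + 1, out + 1, op);
  }
};

// Base case terminates the compile-time recursion.
template <>
struct Transform<0> {
  template <typename In, typename Out, typename Op>
  static void apply(In, In, Out, Op) {}
};

int main() {
  const int LEN = 4;
  int x[LEN] = {1, 1, 1, 1};
  int y[LEN] = {3, 3, 3, 3};
  // Equivalent to std::transform(x, x + LEN, y, y, plus<int>()),
  // but fully unrolled: no loop counter or branch remains after inlining.
  Transform<LEN>::apply(x, y, y, std::plus<int>());
  std::cout << y[0] << "\n";   // prints 4
}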

Using template techniques, MTL is also able to provide other performance optimisations including:

• Static Polymorphism

• Lightweight Object Optimisation

• Automatic Unrolling

• Algorithmic Blocking

2.1.2 ROSE

ROSE [8] is a framework for building source-to-source translators for the high-level optimisation of scientific applications. ROSE provides the ability to recognise library-defined abstractions and leverage their semantics to provide compile-time optimisation.

User-defined abstractions may not be optimised by a conventional compiler because it does not perform sufficient global optimisation, or cannot infer the high-level semantics of user- defined abstractions in acceptable time bounds. ROSE attempts to solve this problem by providing additional semantic information through annotations. Using these annotations and transformations, ROSE can optimise arbitrary abstractions.

The ROSE front-end is responsible for parsing C++ programs. It accepts C++ source files and annotated library headers and produces an object-oriented abstract syntax tree (AST) which represents both sets of files. The AST is annotated with type information, and may additionally be annotated with information supplied by the programmer using pragmas, comments and separate annotation files. The mid-end is responsible for restructuring the AST and performing performance-improving program transformations. The mid-end provides three types of operations:

AST processing These operations provide mechanisms for traversal of the AST and enable attributes to be computed for and attached to each AST node. Attributes can then be used in subsequent optimisations. Attributes may also be inherited or synthesised, in which case they are passed down or up the AST respectively. For example, loop nesting is an inherited attribute. Annotations can be used by transformations to decide whether a particular restructuring operation can be applied safely.

Query operators These perform read-only operations on the AST and are built on top of the AST processing operators. ROSE provides a number of pre-defined queries. The query operators abstract away some of the details of the AST traversal, and can be composed to form more complex queries. 2.1. Optimising Domain-Specific Abstractions 9

Transformation operators Transformation operators consist of two parts. Firstly, a pre- condition based on the AST annotations which holds when the transformation can be applied safely. Secondly, a sequence of AST restructuring operations.

Lastly, the ROSE back-end is responsible for unparsing the AST and generating C++ source code.

ROSE’s annotation language makes it possible for the programmer to declare that certain abstractions satisfy the extended requirements for certain predefined compiler optimisations. As a result ROSE can apply a number of standard compiler transformations more effectively. For example, by annotating a user-defined array class with the declaration that it has Fortran array semantics, optimisations such as loop blocking, fusion, fission and interchange can be applied to loops with statements accessing the user-defined type.

2.1.3 Broadway Compiler

Guyer and Lin describe the Broadway [9] compiler, a source-to-source translator for C designed to support domain-specific optimisations. Domain-specific optimisation information is provided to the compiler through files written using a lightweight annotation language.

Broadway’s annotation language conveys four kinds of information to the compiler: dependence information about the library interfaces, domain-specific program analysis problems, transformations that apply domain-specific optimisations and compile-time messages which are emitted based on analysis results.

Broadway’s domain-specific analyses permit properties to be associated with program values. The annotation writer defines how each library routine affects these properties. Broadway uses data flow analysis to determine how properties are associated with values at different points in the program. In a linear algebra library, for example, one property of matrices could be the form they are in (e.g. triangular or tridiagonal).

The annotation language enables the specification of library behaviour in terms of the abstract properties. This permits special-purpose routines to be chosen when certain properties are present on objects. For example, in a linear algebra library this could be used to choose routines optimised for a particular matrix structure.

The overall Broadway optimisation model is as follows:

1. A library expert designs a set of domain-specific optimisations and encodes them using the annotation language. To do this they must identify the properties and property values relevant to the library, specify the behaviour of each routine in terms of these abstract properties and specify code transformations predicated on property values of the arguments.

2. The application programmer obtains the annotation file and passes it to Broadway.

3. During compilation, Broadway consults the annotations and determines how to manipulate library calls. To do this it solves the problem of determining what property values objects possess in the application and evaluates predicates at each function call site to determine which transformations to apply.

Guyer and Lin describe [9] how Broadway can be used to optimise a parallel linear algebra library at multiple application layers. At the highest layer, Broadway is used to perform algebraic simplification. At the middle layer, where data distribution has been made explicit, Broadway is used to eliminate empty views, unnecessary copies and make compile-time selection of specialised routines. At the lowest level, optimisations of the use of MPI have been identified, but not yet implemented.

2.2 Programming for Adaptation

In this section, I will look at programming models designed to support adaptation to different environments. Sequoia [10] uses task decomposition to facilitate mapping regular computations onto a memory hierarchy. The Themis [11] proposal suggests a run-time system which can adapt communication and scheduling in response to varying resources and component context.

2.2.1 Sequoia

Sequoia [10] is a programming language designed to facilitate the development of memory hierarchy aware parallel programs. Sequoia aims to enable such programs to remain portable across modern machines with different memory hierarchy configurations.

Sequoia introduces the notion of hierarchical memory directly into its programming model. Sequoia programs run on machines abstracted as trees of distinct memory modules. Sequoia defines tasks as abstractions to represent self-contained units of computation that include descriptions of communication and working sets. The programmer must define ways to decompose tasks so they can be mapped onto the memory hierarchy. Data movement between all levels of the hierarchy is explicit and computation is only allowed to occur at the leaf nodes of the hierarchy. Sequoia leaf tasks may be implemented directly in Sequoia, but may also wrap kernels written in traditional languages such as Fortran or C.

Sequoia borrows from the idea of space-limited procedures [12], proposed to encourage hierarchy-aware, parallel divide-and-conquer programs. Space-limited procedures require each function in a call chain to accept arguments occupying significantly less storage than the calling function. Sequoia tasks generalise and implement this concept to express communication and parallelism. Sequoia code does not make explicit references to machine hierarchy levels and remains oblivious to the mechanisms used to move data between memory modules.

Tasks in Sequoia allow the expression of:

Explicit Communication and Locality Communication of data is expressed by calling tasks, and is the only means of expressing data movement in Sequoia. Virtual levels can be used in the memory hierarchy which do not correspond to a physical machine memory. This enables modelling of communication mechanisms such as cluster interconnects.

Isolation and Parallelism Tasks operate entirely within their own private address space and have no way of communicating with other tasks other than by calling subtasks, and returning to their parent task.

Task Decomposition Sequoia provides array blocking and task mapping constructs to describe portable task decomposition.

Algorithmic Variants The programmer can specify multiple implementations of a task and specify which one to call based on context.

Parameterisation Tasks use parameterisation to preserve independence from machine constraints. Values are chosen to match the hierarchy level of the target machine.

To enable the mapping of the task hierarchy onto the memory hierarchy, Sequoia allows the programmer to provide a task mapping specification, created on a per-machine basis. The task mapping specification defines how to map a task hierarchy onto the memory hierarchy and also serves as the mechanism whereby the programmer can provide optimisation and tuning directives.
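Sequoia's own syntax is not reproduced here, but the shape of a task hierarchy can be sketched in plain C++ under stated assumptions: an inner task only partitions its working set into child tasks sized for the next memory level (block sizes that Sequoia would obtain from the task mapping specification), and arithmetic occurs only in the leaf task. All names below are invented for illustration.

#include <cstddef>
#include <vector>

// Illustrative block sizes for a two-level hierarchy (e.g. main memory -> cache).
// In Sequoia these would come from the per-machine task mapping specification.
const std::size_t kLevelBlock[] = {1024, 64};

// Leaf task: the only place computation occurs.
double dot_leaf(const double* a, const double* b, std::size_t n) {
  double sum = 0.0;
  for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
  return sum;
}

// Inner task: decomposes its working set into child tasks sized for the
// next level of the hierarchy; it performs no arithmetic itself.
double dot_inner(const double* a, const double* b, std::size_t n, int level) {
  if (level == 2) return dot_leaf(a, b, n);   // bottom of the hierarchy
  const std::size_t block = kLevelBlock[level];
  double sum = 0.0;
  for (std::size_t i = 0; i < n; i += block) {
    std::size_t len = (n - i < block) ? (n - i) : block;
    // In Sequoia, invoking the child task is what expresses the copy of
    // this block into the child's private address space.
    sum += dot_inner(a + i, b + i, len, level + 1);
  }
  return sum;
}

int main() {
  std::vector<double> a(5000, 1.0), b(5000, 2.0);
  double r = dot_inner(a.data(), b.data(), a.size(), 0);
  return r == 10000.0 ? 0 : 1;   // sanity check of the decomposed result
}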

2.2.2 Themis

Themis [11] is a proposed programming model and run-time library intended to support cross-component optimisation through the explicit manipulation of the computation's iteration space at runtime. Themis has not been implemented, although many of the ideas have been investigated in prototype form using tools such as TaskGraph [13].

The main idea behind Themis is to combine software components with metadata describing data placement and component dependence. This metadata should permit the adaptation of software to:

Heterogeneous and varying resources High-performance computing resources are expected to be heterogeneous collections of symmetric multiprocessing (SMP) clusters linked by fast heterogeneous networks. It is necessary for components to be able to adapt themselves to these configurations. Importantly, these may vary from run to run.

The context in which components are used The data placement of components may vary, as well as the schedule for the production and consumption of a component’s operands and results. Components may contend with each other for resources and may be capable of multiple levels of parallelism with different forms of communication.

The adaptive and irregular nature of problems In irregular and adaptive applications, communication and computation are focussed in specific regions of interest which may change with time.

Component metadata in Themis consists of two parts. Firstly, the Component Composition Graph data structure that represents the large-grain inter-component control graph. Secondly, the Component Dependence Summaries which describe each component's internal iteration space, and functions mapping points in the iteration space to the memory addresses the component may use and define. In Themis, each component has parameters and properties. Properties of some component, P, are:

IterationSpace The n-dimensional integer space in which iterations of P's execution are enumerated.

Uses For each of the operands of P, this defines a mapping from each point in the iteration space to the set of indices which might be read by that iteration. In simple cases, these will be affine functions.

Defines For each of the results of P, defines a mapping from the iteration space to any items that might be written.

Parameters of a component are:

IterationDomain Defines the iterations that should be executed. These are composed of a set of non-intersecting IterationRegions, each of which is a polytope contained in IterationSpace.

Operands A set of indexed data collections that a component may read from.

Results A set of indexed data collections that a component may write to.

The component metadata in Themis would allow it to solve various cross-component optimisation problems. For example, given a data distribution D which specifies the subsections of an array A accessed by component P, it is possible to calculate the required placement of P's other operands and/or results. If the Uses or Defines mappings referring to A are invertible, they can be used to find the relevant iterations that access the subsections in D. Using the Uses and Defines mappings forward, the other data accessed by P can be found.
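Since Themis was never implemented, no real API exists; the following C++ sketch only illustrates the inversion just described, using a one-dimensional affine Uses mapping. All types and names are hypothetical.

#include <iostream>

// An affine 1-D access mapping: iteration i touches element a*i + b
// (a Themis-style Uses/Defines summary, with a > 0 assumed throughout).
struct AffineMap {
  long a, b;
  long apply(long i) const { return a * i + b; }
};

// A contiguous region of iterations or of array indices (inclusive bounds).
struct Region { long lo, hi; };

// Forward direction: the array elements touched by an iteration region.
Region elements_touched(const AffineMap& m, Region iters) {
  return Region{m.apply(iters.lo), m.apply(iters.hi)};
}

// Inverse direction: the iterations that touch a given array subsection.
// This is the operation used to derive the placement of a component's
// other operands from a known data distribution.
Region iterations_touching(const AffineMap& m, Region elems) {
  // Round inward so that only iterations mapping inside the subsection count.
  long lo = (elems.lo - m.b + m.a - 1) / m.a;   // ceil((lo - b) / a)
  long hi = (elems.hi - m.b) / m.a;             // floor((hi - b) / a)
  return Region{lo, hi};
}

int main() {
  AffineMap uses{2, 1};                     // iteration i reads A[2i + 1]
  Region dist{11, 41};                      // subsection of A placed locally
  Region iters = iterations_touching(uses, dist);
  std::cout << iters.lo << ".." << iters.hi << "\n";   // prints 5..20
  // Applying the component's other Uses maps forward over 5..20 would now
  // yield the placement required for its remaining operands.
}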

Themis component metadata could also be used to derive communication plans, or to perform cross-component loop fusion. The runtime nature of the system would also allow it to make decisions about how to apply these optimisations more effectively. As Themis would maintain extra dependence information, it would enable the use of techniques useful for large-scale simulations, such as checkpointing combined with recompute-on-demand, or propagation of data dependencies backwards through a pipeline.

2.3 Library Adaptation through Empirical Techniques

In this section I will look at libraries that optimise their performance through empirical techniques. These are important examples of how to achieve performance competitive with hand-tuned libraries without having to possess expert knowledge of every architecture targeted. SPIRAL [14] is particularly interesting because of its ability to use domain-specific knowledge to perform a broad range of transformations and to target specific instruction sets. The X [15] language is not a programming model per se, but rather a way to represent multiple program versions to facilitate adaptation.

2.3.1 X Language

Donadio et al. [15] describe the X language. The X language represents parameterised program versions in a compact way using annotations which can be used directly by the programmer or by an empirical search tool.

Donadio et al. observe the increasing complexity of processors and architectures and the increasing difficulty of identifying optimal code sequences for them. They note that empirical search techniques can be an effective tool for searching the space of possible program versions. They describe the following features that they believe are necessary for compact representation of multiple code versions:

Elementary Transformations These are transformations that cannot be easily recast into simpler transformations. These usually target compound statements and manipulate the order of execution and control structure of the statements. Statement transformations include reordering, replication and deletion. Loop transformations include unrolling, interchange, strip-mining and fusion. Elementary transformations may require parameters.

Transformation Composition As the best version of a statement is usually the result of the application of several transformations, it should be possible to apply multiple transformations to a statement. Additionally, it should be possible to perform conditional composition, in which a condition is used to select whether a transformation should be applied; for example, unrolling an inner loop only when the size of a strip is less than a certain value.

Procedural Abstraction It should be possible to encapsulate new transformations in order to avoid duplication.

Definition mechanism for new transformations This mechanism should make it possible for the language user to define new transformations that cannot be expressed as a composition of elementary transformations. This allows the library user to define application-dependent transforms that use domain-specific knowledge of the computation semantics. Such a mechanism is most easily implemented using transformation rules, which consist of a code template before and after transformation. However, as this may not always be sufficient, it should also be possible to express transformations in a conventional programming language with a well-defined interface to the source language.

Statement naming mechanism Sequences of transformations often require the application of transformations to one of the components of the transformed code. It is necessary to be able to name components and subcomponents of statements to compose these transformations.

The X language uses C pragmas to name loops or portions of code and also to specify the transformations to apply. Syntax to name code is as follows:

#pragma xlang name { ... }

Syntax to specify transformations is as follows:

#pragma xlang transform keyword

Figure 2.2 shows the code for a matrix-vector multiply annotated with X directives, before and after transformation. The transformation strip-mines loop l1 with a tile size of 4, naming the new loop l3 and the remainder loop l1rem.

Donadio et al. present results showing that a matrix multiply written using X and tuned using an automatic mechanism can achieve performance superior to ATLAS routines on an Intel Itanium when using custom memory copy routines. However, both the ATLAS and X language code variants are still outperformed by the Intel Math Kernel Library.

#pragma xlang name l1
for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    y[i] += a[i][j] * x[j];

(a) Before transformation

#pragma xlang name l1
for (i = 0; i < (N/4)*4; i += 4) {
  #pragma xlang name l3
  for (ii = i; ii < i + 4; ii++)
    for (j = 0; j < N; j++)
      y[ii] += a[ii][j] * x[j];
}

#pragma xlang name l1rem
for (i = (N/4)*4; i < N; i++)
  for (j = 0; j < N; j++)
    y[i] += a[i][j] * x[j];

(b) After transformation

Figure 2.2: C code to perform a matrix-vector multiply, annotated with X pragmas. Code is shown before and after the application of the X directives.

2.3.2 ATLAS

The Automatically Tuned Linear Algebra Software library [16] (ATLAS) is part of a research effort focussing on applying empirical techniques to provide portable performance. ATLAS uses a paradigm the authors call “Automated Empirical Optimisation of Software” (AEOS). The fundamental idea behind AEOS is that a software package provides many ways of performing the same operation and can use empirical timings to choose the best method for a given architecture. This avoids the painstaking effort required to hand-optimise routines for a given architecture, which, given the pace of hardware evolution, may be untenable in the long run. The ATLAS library uses these techniques to provide portably efficient routines from BLAS (Basic Linear Algebra Subprograms) and LAPACK.

A library supporting AEOS methodologies must satisfy some basic requirements:

Isolation of performance-critical routines The performance-critical sections of code need to be identified and separated into subroutines.

A method of adapting to differing environments AEOS depends on iteratively trying a number of different implementations of performance-critical routines so they must provide a way of instantiating themselves with a wide range of optimisations. This might be done by having fixed code with varying parameters or with a highly parameterised code generator.

Robust, context-sensitive timers Accurate timings are needed if the best code is to be selected. Importantly, timings need to be robust if they are carried out on a heavily loaded machine, since users may not be able to guarantee they are the only user. Timings also need to reflect the context in which a routine will be called; for example, the data a routine needs may or may not already be present in the cache, which may require cache flushing or pre-loading to simulate.

Appropriate search heuristic If the search space for possible implementations is large, a heuristic is required that will prune the search tree as rapidly as possible.

ATLAS, using the AEOS methodology, makes a couple of assumptions in order to perform well:

Adequate ANSI C compiler ATLAS is written in ANSI C (with the exception of Fortran77 wrappers). ATLAS does not require an excellent compiler because it performs many of the optimisations usually performed by compilers. However, overly aggressive compilers may convert optimal code into suboptimal code. On the other hand, compilers that cannot effectively use the underlying instruction set architecture (ISA) may produce poor results as well.

Hierarchical memory ATLAS assumes a memory hierarchy. Best results are obtained when both registers and at least an L1 cache are present.

ATLAS demonstrates that empirical optimisation techniques can be used to generate code with significantly better performance than that obtained by compiler optimisation alone, and that auto-generated code can compete with hand-tuned implementations.
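To illustrate the flavour of the AEOS approach (a toy sketch of my own, not ATLAS code), the fragment below times two hand-written variants of the same kernel and empirically selects the faster one; ATLAS does the equivalent at install time over a far richer, generated search space:

#include <chrono>
#include <functional>
#include <iostream>
#include <utility>
#include <vector>

int main() {
  const int N = 1 << 20;  // N is divisible by 4, so the unrolled variant is exact
  std::vector<double> x(N, 1.0), y(N, 2.0);

  auto plain = [&] {
    double s = 0;
    for (int i = 0; i < N; ++i) s += x[i] * y[i];
    return s;
  };
  auto unroll4 = [&] {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (int i = 0; i < N; i += 4) {
      s0 += x[i] * y[i];     s1 += x[i+1] * y[i+1];
      s2 += x[i+2] * y[i+2]; s3 += x[i+3] * y[i+3];
    }
    return s0 + s1 + s2 + s3;
  };

  std::vector<std::pair<const char*, std::function<double()>>> candidates =
      {{"plain", plain}, {"unroll4", unroll4}};

  const char* best = nullptr;
  double bestTime = 1e30;
  for (auto& [name, f] : candidates) {
    auto t0 = std::chrono::steady_clock::now();
    volatile double r = f();  // volatile: stop the compiler discarding the work
    (void)r;
    double t = std::chrono::duration<double>(std::chrono::steady_clock::now() - t0).count();
    if (t < bestTime) { bestTime = t; best = name; }
  }
  std::cout << "selected kernel: " << best << "\n";  // the decision is empirical
}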

2.3.3 SPIRAL

SPIRAL [14] is a code generator for digital signal processing (DSP) algorithms. It uses domain-specific knowledge about the structure of signal transformation algorithms in order to implement feedback-based optimisation. SPIRAL can generate code for a domain encompassing a large number of mathematically complex algorithms. It encapsulates mathematical knowledge about the domain using a concise declarative representation suitable for computer exploration and optimisation. It also uses dynamic programming and machine learning techniques for its algorithm selection and optimisation.

The SPIRAL architecture consists of three levels:

Algorithm This level handles the breakdown and manipulation of the requested transform into a representation called SPL (Signal Processing Language).

Implementation This level takes the SPL formula, converts it into Fortran or C, and performs various standard compiler optimisations.

Evaluation This level is responsible for compiling code into executables and benchmarking their performance. The evaluation provides performance information to a search mechanism which controls actions taken at both the algorithm and implementation levels to try to find the best implementation.

SPIRAL represents DSP transformations using matrix-vector multiplies. If x and y are input and output vectors of signal samples of length n and M is an n × n transformation matrix, the transformation can be expressed as y = Mx. Transforms in SPIRAL are represented by parameterised classes of matrices. For example, most transforms exist for all input sizes and will take the size value n as a parameter. In SPIRAL, the discrete Fourier transform is represented as follows:

$$\mathrm{DFT}_n = \left[ \omega_n^{kl} \right]_{0 \le k, l < n}, \qquad \omega_n = e^{-2\pi i/n}, \quad i = \sqrt{-1}$$

As well as representing transforms as matrices, SPIRAL can also define transforms recursively or iteratively. At the time of writing of [14], SPIRAL contained 36 transforms, some of which are variants of each other.

SPIRAL uses SPL to generate and manipulate DSP algorithms. The motivation behind SPL is to identify the structure of matrix factors and make use of them. SPL is intended to allow the expression of the product of structured sparse matrices using a small set of constructs and symbols.

SPL contains a number of constructs:

Generic matrices SPL provides constructs to represent generic matrices such as diagonal or permutation matrices.

Symbols Frequently occurring classes of matrices are represented using parameterised sym- bols. For example, the n × n identity matrix, the zero matrix and the 2 × 2 rotation matrix are all represented by symbols.

Transforms These have already been described. The $\mathrm{DFT}_n$ transform is one example. Transforms differ from symbols in the important aspect that they cannot be translated directly into code.

Matrix constructs These are used to form structured matrices from SPL matrices. Examples include the matrix product and matrix sum. One of the most important constructs is the Kronecker product which has been recognised as important for describing and manipulating DFT algorithms.

Conversion to real data format Complex transforms are usually implemented with real numbers. Constructs exist to convert complex arithmetic into real arithmetic, the most popular being interleaved complex format which represents a vector of complex numbers with alternating real and imaginary parts.

SPIRAL uses two types of rules to manipulate SPL formulae, Breakdown and Manipulation. Breakdown rules are used to decompose a transform into a product of structured sparse matrices that often contain other smaller transforms. Terminal breakdown rules form the base case, usually decomposing transforms of size 2 into formulae that contain no transforms.
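For example, the classic Cooley-Tukey factorisation can be expressed as a breakdown rule of this kind (this statement of the rule is the standard one from the SPIRAL literature rather than a verbatim quotation of [14]):

$$\mathrm{DFT}_{nm} = (\mathrm{DFT}_n \otimes I_m)\, T^{nm}_m\, (I_n \otimes \mathrm{DFT}_m)\, L^{nm}_n$$

where $\otimes$ denotes the Kronecker product, $T^{nm}_m$ is a diagonal matrix of twiddle factors and $L^{nm}_n$ is a stride permutation; the rule rewrites one transform into a product of structured sparse matrices containing two smaller transforms.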

Manipulation rules are matrix expressions where both sides are SPL formulae which contain no transforms. These are used to manipulate SPL formulae that have already been completely expanded. For example, one manipulation rule handles the translation of the operator used to convert complex to real arithmetic.

The implementation level is responsible for taking the expanded SPL formulae and converting them into code. The SPL implementation allows tags to be associated with formulae that instruct the compiler to make specific code generation choices, such as performing loop unrolling. Templates are used to translate SPL code into C or Fortran code. Templates consist of a parameterised SPL formula construct, a set of preconditions on the formula's parameters and a C-like code fragment. Templates allow experimentation with the way formulae are mapped into code and enable the SPL compiler to generate special instruction types such as vector instructions.

The evaluation level handles code compilation and performance analysis. The optimisations performed at this stage are standard compiler optimisations such as constant propagation and array scalarisation. Benchmarking the result is done through the compilation and timing of the compiled code for a large number of iterations. This information provides feedback to a search/learning mechanism which can modify the previously generated formulae.

2.4 Code Optimisation Techniques and Models

Here I cover some of the frameworks and techniques used to optimise code. A large number of standard compiler transformations are described by Bacon et al. [17]. The Polytope Model [18] is particularly relevant to my work, as it provides useful mechanisms for reasoning about iteration spaces. I also cover techniques useful for runtime optimisation. The loop fusion and array contraction transformations are particularly important for optimising linear algebra.

2.4.1 Loop Fusion

Loop fusion [17] is a transformation that rewrites multiple loops as a single loop. It can improve performance by:

• Reducing loop overhead.

• Increasing instruction parallelism.

• Improving register, vector, data cache, translation lookaside buffer (TLB) or page locality, if both loops use the same data.

Loop fusion requires that the loops being fused have the same bounds. If they do not, they can sometimes be made to match by loop peeling or by introducing a conditional. Two loops with the same bounds may be fused so long as there exists no statement S1 in the first loop and S2 in the second loop such that S1 has a dependence on S2.

It is also possible for loop fusion to decrease performance. This can occur if the loop instructions can no longer fit into the instruction cache or register pressure increases to the extent that values must be “spilled” into main memory.
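As an illustrative sketch (the kernel and names are my own, not from [17]), the two loops below share bounds and have no dependence from a statement in the first loop on one in the second, so they may legally be fused:

// Before fusion: two traversals of a and b.
void unfused(const double* a, double* b, double* c, int n) {
  for (int i = 0; i < n; ++i) b[i] = a[i] * 2.0;
  for (int i = 0; i < n; ++i) c[i] = b[i] + a[i];
}

// After fusion: b[i] is consumed while still likely in cache or a register.
void fused(const double* a, double* b, double* c, int n) {
  for (int i = 0; i < n; ++i) {
    b[i] = a[i] * 2.0;
    c[i] = b[i] + a[i];
  }
}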

2.4.2 Array Contraction

Array contraction [17] is a transformation intended to optimise the memory accesses of a program. It allows the dimensionality of arrays to be reduced (e.g. by replacing an array with a single scalar-valued variable). It can decrease the memory taken up by compiler-generated temporaries and the number of cache lines referenced.

It is expected that this technique will be useful for removing temporary vectors and matrices created during code generation. For details of other memory access transformations, consult Bacon et al. [17].

If the iteration variable p of some loop within a loop nest is being used to index the kth dimension of an array x, then dimension k may be removed from x if:

• Loop p is not executed in parallel.

• All distance vectors involving x have a distance of 0 for the iteration variable p (i.e. values do not propagate between different iterations of p via x).

• Array x is not used subsequent to the loop.

Loop transformations such as fusion and interchange are often used to facilitate array contraction.
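Continuing the fusion sketch above (again my own illustration rather than an example from [17]): after fusion, if b is a temporary that is never read after the loop and carries no values between iterations, it meets the conditions above and can be contracted from an array to a scalar:

// b[] contracted to the scalar t: n doubles of temporary storage removed.
void contracted(const double* a, double* c, int n) {
  for (int i = 0; i < n; ++i) {
    double t = a[i] * 2.0;  // was b[i]
    c[i] = t + a[i];
  }
}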

2.4.3 Polytope Model

The Polytope model [18] is a mathematical framework for representing loop nests and loop nest transformations. The Polytope model defines geometric representations of iteration spaces, statement orderings and array accesses.

Transformations take place within Static Control Parts (SCoPs) which are maximal sets of consecutive statements containing no while loops. Within a SCoP, all loop bounds are affine and may only depend on symbolic constants and surrounding loop counters. These symbolic values are called global parameters of the SCoP.

The Polytope model identifies three components for representing loop nests:

Iteration Domains These are geometric representations of the bounds and strides of a loop. In the Polytope model, these are represented by convex polyhedra, which in turn can be represented by a matrix of inequalities (a worked example follows this list). Each statement in the SCoP has an associated iteration domain. By separating the iteration domain from the statement schedule, it is possible to isolate the effect of certain transformations. For example, strip mining and loop unrolling modify the iteration domain but do not affect the execution order of statements.

Affine Schedules These characterise the execution order of each statement instance. They map statement instances to multidimensional time-stamp vectors, which are ordered lexicographically. The affine schedule is stored as a matrix representing an affine function of loop index variables and, possibly, parameters, which generates the time-stamp vector.

The affine schedule can be decomposed into three sub-matrices. Firstly, a square iteration ordering matrix that orders the statements with respect to the surrounding loop counters. Secondly, a statement ordering vector that represents the ordering of all statements executed during the same iteration. Lastly, a parameterisation matrix that enables statements to be offset by constant or parametric amounts.

Array Access Functions These map points in the iteration domain to the array elements accessed at that point. Array access functions allow the isolation of the effect of data ac- cess transformations. For example, array privatisation does not affect either the iteration domain or statement schedule.

Array access functions are stored as matrices that represent an affine function of loop indices, local variables and global parameters. Each row of an access function matrix corresponds to an index used to access the associated array.

Each program statement is mapped to two sets which represent the accesses to array locations written and read by that statement. Each set contains tuples of the form (A, f) where f is the access function used to access an element of array A.
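As a small worked example (my own illustration, not drawn from [18]), consider a statement S executed for $0 \le i \le N-1$ and $0 \le j \le i$. Its iteration domain is the parametric polytope

$$D_S = \left\{ (i,j) \in \mathbb{Z}^2 \;:\; \begin{pmatrix} 1 & 0 & 0 & 0 \\ -1 & 0 & 1 & -1 \\ 0 & 1 & 0 & 0 \\ 1 & -1 & 0 & 0 \end{pmatrix} \begin{pmatrix} i \\ j \\ N \\ 1 \end{pmatrix} \ge 0 \right\}$$

whose rows encode $i \ge 0$, $i \le N-1$, $j \ge 0$ and $j \le i$ respectively. Transformations that manipulate loop bounds simply rewrite this matrix of inequalities.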

Transformations in the Polytope model correspond to recomputing the matrices of certain statements. This provides advantages over other approaches, where code complexity increases with each transformation. One exception is transformations that add new loops or local variables, which increase the size of some matrices; however, this does not make the representation less generic.

Cohen et al. [19] describe the advantages of the Polytope model for composing program transformations. The separation of the different representations allows data and control transformations to commute in general. Many transformations in the Polytope model only have the effect of modifying the matrix parameters themselves, even though they may correspond to a sequence of program transformations, making it simpler to search for transformations. It is possible to characterise the exact set of all schedule, domain and access matrices associated with legal transformation sequences, so invalid transformations can be filtered quickly.

2.4.4 Runtime Data and Computation Reordering

Making efficient use of the memory hierarchy is an essential component of performance optimisation. For an algorithm to achieve high performance, the processor should spend as little time as possible waiting on data from memory. Techniques such as loop blocking and prefetching can be used to try to place data as close to the processor in the memory hierarchy as possible, hiding the widening gap between CPU and memory speed. These techniques work well for regular applications, but less effectively for irregular ones. Irregular applications possess poor temporal and spatial locality because they do not access data in memory with small, constant strides.

Consecutive Packing

Consecutive packing [20], also known as first-touch data reordering, is a data layout transformation in which data items are moved into adjacent locations in the order in which they are first accessed. The result is a data layout with improved spatial locality. Consecutive packing traverses the access sequence only once, giving it minimal time overhead. If objects are accessed at most once, consecutive packing gives optimal cache line utilisation. With repeated accesses, the problem of finding an optimal layout becomes NP-complete, by reduction from the G-partition problem [21].
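A minimal sketch of consecutive packing (my own code, using a toy trace representation): walk the access trace once and hand each object the next free slot on its first touch:

#include <vector>

// Returns a permutation mapping old object indices to new, packed indices.
std::vector<int> firstTouchOrder(const std::vector<int>& trace, int numObjects) {
  std::vector<int> newIndex(numObjects, -1);
  int next = 0;
  for (int obj : trace)
    if (newIndex[obj] == -1)
      newIndex[obj] = next++;          // packed adjacently in first-use order
  for (int i = 0; i < numObjects; ++i)
    if (newIndex[i] == -1)
      newIndex[i] = next++;            // untouched objects go at the end
  return newIndex;
}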

Locality Grouping

Dynamic applications such as molecular dynamics possess locality derived from the physical model in which a particle only interacts with its neighbours. Given a set of objects and their interactions, locality grouping [20] clusters interactions involving each object in the list. Locality grouping can be done with minimal runtime overhead. One pass collects a histogram of the interactions, and another produces a sorted interaction list.

Bucket Tiling

Bucket tiling [22] is a technique to localise non-affine array references. In a bucket sort, values are first sorted into buckets, then sorted within each bucket. Bucket tiling makes use of the first step, called bucketisation. Bucket tiling cannot be carried out if a computation contains more than one distinct non-affine reference.

Bucket tiling consists of three tasks:

Permutation generation This task generates a new iteration order. This involves the bucketisation of a function f involved in indexing an array. The function f will be a function of one or more of the loop index variables, but need not be the array indexing expression; it might, for example, be the non-affine component of the array indexing expression. All loops whose induction variables are referenced by f will have their iterations mapped into a fixed number of buckets. The permutation applied to each affected loop i is stored in a permutation function σi. Given a bucket b and the number of times that bucket has been visited, k, σi(b, k) gives the value of the iteration variable of loop i.

Loop Regeneration Loop regeneration involves generating loops that iterate through the iterations assigned to each bucket. Using a mechanism similar to loop coalescing, it is possible to use two loops to iterate over the values in all buckets regardless of the number of loops in the original code. Using the σ functions computed in the previous step, it is possible to recover the original induction variables.

Data Remapping After loop regeneration, the locality of affine accesses to other arrays may have been reduced. Data remapping solves this problem for certain classes of array accesses. Every reference to some array a is replaced by an access to a new array a'[b, k], where b and k are the bucket index and the number of times the bucket has been visited, respectively.

If the loop nest consumes values from a, it is necessary to perform a gather operation to populate a' from a. Similarly, if values from a are used after the loop nest, it is necessary to perform a scatter operation to form a from a'. Additionally, if a' is larger than a, the scatter operation will also involve a reduction, as a' contains only partial computations. Clearly this also requires that the computation of a can be expressed as a reduction.

We consider the following simple pseudocode segment that involves a non-affine access to array A:

do i = 1 to N
  D[i] = C[i] + A[f(i)]

The loop regeneration step will produce the following code:

do b = 1 to numbuckets
  do k = 1 to bucketsize(b)
    i = σi(b, k)
    D[i] = C[i] + A[f(i)]

The locality of accesses to A has been improved, but the locality of accesses to C and D has been reduced, since values of i are no longer consecutive. Applying data remapping to array C, we would get:

do b = 1 to numbuckets
  do k = 1 to bucketsize(b)
    C'[b,k] = C[σi(b,k)]

do b = 1 to numbuckets
  do k = 1 to bucketsize(b)
    i = σi(b, k)
    D[i] = C'[b,k] + A[f(i)]

The accesses to C'[b,k] possess improved data locality compared to C[i], but the overhead of computing C' is incurred. The runtime overhead of both data remapping and the calculation of the σ functions must be considered when applying these transformations.
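The bucketisation step itself requires only a single pass; the following is a minimal sketch (my own illustration, not from [22]) of building the σ mapping, assuming every value f(i)/bucketWidth falls within numBuckets:

#include <functional>
#include <vector>

// sigma[b][k] yields the original iteration i, as used in the regenerated loops.
std::vector<std::vector<int>> bucketise(int N, int numBuckets, int bucketWidth,
                                        const std::function<int(int)>& f) {
  std::vector<std::vector<int>> sigma(numBuckets);
  for (int i = 0; i < N; ++i)
    sigma[f(i) / bucketWidth].push_back(i);  // place iteration i in its bucket
  return sigma;
}

Iterating b over buckets and k within sigma[b] then recovers i = sigma[b][k], exactly the role σi(b, k) plays in the regenerated loops above.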

Sparse Tiling

Sparse tiling [23] is an algorithm for tiling sparse matrix computations at runtime, improving data locality. Such computations cannot be tiled at compile-time due to the existence of non-affine loop bounds and indirect memory references.

Strout et al. consider the specific case of sparse tiling applied to a Gauss-Seidel computation where unknown values and sparse matrix entries are associated with locations on the mesh.

Such matrices occur through numerical solution techniques such as the Finite Element Method (Section 2.8).

The Gauss-Seidel method is an iterative technique for solving a system of linear equations. We consider a linear problem of the form Au = f, where A is a sparse matrix and the vectors f and u are known and unknown, respectively. We express the Gauss-Seidel algorithm in pseudo-code as follows:

for i = 1 to T
  for j = 1 to R
    $u_j^{(i)} = \frac{1}{A_{jj}} \left( f_j - \sum_{k=1}^{j-1} A_{jk} u_k^{(i)} - \sum_{k=j+1}^{n} A_{jk} u_k^{(i-1)} \right)$

Each iteration of the outer loop indexed by i produces a new approximation to the solution u. The inner loop indexed by j iterates over the R rows of the sparse matrix. The summations, which form the loop indexed by k, use values from the unknown vector from both the current and previous iteration of i. Each update to u can be performed in-place. The total number of iterations, T , needs to be a fixed value as the iteration space will be partitioned for a finite number of iterations.

The algorithm for sparse tiling Gauss-Seidel constructs a number of tiles, each a set of computations that can be executed atomically. In serial sparse tiling, the tiles must be executed in a particular order. In parallel sparse tiling, the tiles can be executed in parallel, but must be followed by a fill-in stage that executes computations not included in the tiles. Both strategies have the same runtime structure:

Partitioning The mesh over which the problem is being solved is partitioned. Graph partitioning is an NP-hard problem; however, there exist many fast heuristics for obtaining a reasonable partitioning.

Tiling The iteration space for Gauss-Seidel is split into a number of atomically executable tiles. Each tile corresponds to the computations necessary to evaluate the unknowns corresponding to a particular partition of the mesh. Each tile has layers, which correspond to the different values of i.

Tiles are constructed by partitioning the iteration space corresponding to the final itera- tion of loop i using the mesh partitioning as a basis. The tiles are then grown backwards to contain parts of iteration space corresponding to earlier values of i. Computations are added and removed from each tile as needed to avoid violating data dependencies.

Rescheduling Mesh nodes are renumbered so that they respect inter-tile dependencies. Within a tile, computations are scheduled by layer and within each layer.

Execution The computation is executed using the new schedule, which describes the layers within each tile and when they should be executed. The schedule may contain a tile execution ordering in the case of serial sparse tiling.

2.5 Code Generation and Optimisation Frameworks

Here I cover libraries and frameworks for multistage optimisation and code generation. LLVM maintains information throughout a program's lifetime to enable analyses and optimisation. Compared to active libraries, this approach is also “active”, but no further responsibility for performance is shifted to the library writer. Both SUIF and TaskGraph are used in our active linear algebra library Desola, described in Chapter 3.

2.5.1 LLVM

LLVM [24] (Low Level Virtual Machine) is a compiler framework designed to support transparent lifelong program analysis and transformation by providing high-level information to compiler transformations at compile-time, run-time, link-time and in the idle time between runs.

LLVM provides a number of capabilities that are useful for lifelong transformation and analysis of arbitrary programs:

Persistent program information The compilation model preserves the LLVM representation throughout the application lifetime, making it possible to perform complex optimisations at all stages.

Offline code generation It is possible to compile programs into native machine code offline, which is useful when expensive code generation techniques need to be employed.

User-based profiling and optimisation LLVM allows profiling information to be gathered at runtime so that it can be used by profile-guided transformations at run-time and idle-time.

Transparent runtime model LLVM does not enforce any particular object model, exception handling semantics or runtime environment, making it possible to compile any language or combination of languages with it.

Uniform, whole program compilation LLVM’s language independence makes it possible to compile and optimise the complete code of an application in a uniform manner including language-specific runtime libraries and system libraries.

Program representation

LLVM represents code using an abstract RISC-like instruction set augmented with higher-level information for effective analysis. The LLVM instruction set consists of 31 opcodes, most of which are overloaded. LLVM provides an infinite set of typed virtual registers in Static Single Assignment (SSA) form, which can hold values of primitive types. Thus, each register is written by an instruction only once. The SSA representation provides a compact use-def graph that simplifies many data-flow optimisations. Non-loop transformations are simplified because they do not encounter anti- or output dependencies. LLVM also makes the control flow graph of every function explicit in the representation.

Features of the LLVM program representation include:

Language independent type system Every SSA register and explicit memory object has an associated type and obeys explicit type rules. Type information combined with the instruction opcode is used to determine exact instruction semantics, enabling a broad class of high-level transformations on low-level code. Primitive types in the LLVM system consist of integer and floating-point types of different sizes, together with bool and void. LLVM also defines four derived types sufficient to implement most high-level languages: pointers, arrays, structures and functions.

The derived types capture enough information to support sophisticated language-independent analyses and transformations such as call-graph construction, structure field reordering and array dependence analysis. LLVM also supports weakly typed languages. This means the declared type information in a program may not be reliable. Pointer analysis algorithms are used to determine whether the types of pointer targets are reliably known.

Explicit memory allocation LLVM instructions are provided for performing typed memory allocation. These include malloc and free, which allocate and free memory from the heap, respectively. An alloca instruction is provided to allocate memory on the stack; this memory is automatically deallocated on return from a function. All l-values (addressable objects) in LLVM are explicitly allocated, giving a model in which there are no implicit memory accesses, simplifying memory address analysis.

Function call and exception handling support LLVM provides a call instruction that abstracts away the calling convention of the target machine, simplifying program analysis. Unusually, it provides an explicit, low-level, machine-independent mechanism for implementing exception handling. An invoke instruction is provided to transfer control to a function that might throw an exception and to specify a basic block to transfer control to, should an exception be thrown. The unwind instruction is used to throw the exception and will transfer control to the basic block specified by invoke after removing the activation record it created. This system has allowed both C's setjmp/longjmp calls and the C++ exception model to be implemented cleanly.

Plain-text, binary and in-memory representations The LLVM representation is a first-class language that defines equivalent textual, binary and in-memory representations. The instruction set is designed to serve as both an offline code representation and a compiler internal representation without semantic conversion. This makes it much simpler to write debugging transformations and to understand the in-memory representation.

Compiler Architecture

The LLVM compiler architecture is designed to enable lifelong transformation and analysis of arbitrary code. The compiler architecture has the following components:

External front-end The front-end is responsible for converting source-language programs into the LLVM instruction set. It translates source into LLVM code whilst trying to synthesise as much useful LLVM type information as possible. It may also perform language-specific optimisations or invoke LLVM inter-procedural optimisations at the module level.

The front-end need not perform SSA construction. Stack allocated variables will be converted into SSA registers if their address does not escape the enclosing function.

Linker and Inter-procedural optimiser Link-time is the first phase where most of the program is available for analysis and transformation. LLVM takes this opportunity to perform aggressive inter-procedural optimisations. LLVM contains a number of inter-procedural analyses, including call graph construction and mod/ref analysis. It can perform transformations like inlining, dead global and argument elimination, constant propagation, array bounds check elimination and structure field reordering.

At compile time, inter-procedural summaries can be computed for each program and attached to the bytecode. The link-time optimiser can use these to avoid recalculating this information from scratch, dramatically speeding up the incremental compilation of a large number of small translation units.

Native code generator Before execution, a code generator is used to translate from LLVM to native code for the target platform. This may be done at link-time or install-time and may use expensive code generation techniques. If a user decides to use the post-link optimisers, a copy of the LLVM bytecode is included with the executable. The code generator also inserts lightweight instrumentation code into the program to identify frequently executed regions of code. It is also possible to use a JIT code generator which translates one function at a time during execution and is capable of adding the same instrumentation as the offline code generator.

Runtime optimiser During program execution, the most frequently executed paths are identified through a combination of online and offline instrumentation. When a hot loop region is detected at runtime, a runtime instrumentation library instruments the executing native code to identify frequently taken paths. When a hot path is identified, the original LLVM code is duplicated into a trace, optimised, regenerated into native code and placed into a trace cache. Branch instructions are inserted into the original code to transfer control to the new native code.

This strategy enables LLVM to perform efficient native code generation ahead of time, enabling the runtime optimiser to receive support from the native code generator (e.g. for instrumentation) and allowing the runtime optimiser to use high-level information to perform sophisticated optimisations.

Offline optimiser The offline optimiser enables optimisation during idle-time on the end user's system. It is similar to the link-time inter-procedural optimiser, but with greater emphasis on profile-driven and target-specific optimisation. Benefits include the ability to use profile information collected from runs of the system, the ability to tailor code to the architecture of a single machine and the ability to apply more aggressive optimisations due to the relaxed time constraints when operating offline.

2.5.2 SUIF

SUIF [25] (Stanford University Intermediate Format) is a C++ infrastructure for research on parallelising and optimising compilers. It has been used to perform research on topics including scalar optimisations, array data dependence analysis, loop transformations for locality and parallelism, software prefetching and instruction scheduling.

SUIF has been structured as a kernel plus a small toolkit. The kernel defines the intermediate representation between passes of the compiler while the toolkit contains compilation analyses and passes built using the kernel.

The SUIF intermediate format is a mixed-level representation supporting both high-level restructuring transformations and low-level analyses and optimisations. The SUIF intermediate representation (IR) has a hierarchical structure and maintains high-level constructs such as loops, conditionals and array accesses, which provide enough information to allow parallelising passes to work effectively. It also provides RISC-like operations which are stored as lists.

The SUIF IR also supports annotations, which allow communication between compiler passes. For example, data dependence information could be passed from a high-level analysis by annotating load and store instructions.

SUIF has a C front-end which allows parsing of C into SUIF; combined with a FORTRAN-to-C converter, it can also parse FORTRAN into SUIF, with additional information stored as annotations. The SUIF IR retains enough information to allow it to be converted back to legal high-level C, which makes it useful for source-to-source translation, C code generation or object code generation.

SUIF provides libraries for carrying out dependence analysis and loop transformations. SUIF's dependence analysis library provides tools for determining the dependence between array accesses inside loops. It consists of a sequence of fast exact tests, aimed towards generating exact answers for array accesses that are affine functions of the enclosing loop bounds. SUIF's loop transformation library provides tools for performing tiling as well as unimodular transformations such as loop interchange, reversal and skewing. Using SUIF's dependence analysis and loop transformation tools, the authors have implemented a loop-level paralleliser and locality optimiser.

SUIF also defines a file format for the intermediate representation. This allows the different compilation passes to be implemented as separate programs. Although inefficient, this allows flexibility with respect to adding new passes or reordering existing ones.

2.5.3 TaskGraph

TaskGraph [13] is a SUIF-based C++ library for dynamic code generation. TaskGraph is used extensively in our linear algebra active library, Desola, and is covered in Section 3.3.

2.6 Tensors

We make extensive use of tensor fields and notation in later chapters. We provide a brief description of tensors and the common operations defined between them.

Tensors can be viewed as geometric objects that are a generalisation of scalars and vectors. Tensors have an associated:

Rank The rank of a tensor is the number of indices required to uniquely identify a value within the tensor. For example, rank-0 tensors represent scalar values and rank-1 tensors represent vectors.

Dimension The dimension of a tensor refers to the dimensionality of the physical space in which the tensor represents a value.

The indices of a tensor can be classified as contravariant (upper) indices and covariant (lower) indices, depending on how the index transforms with a change of basis. However, in a Cartesian co-ordinate system the two types of index are equivalent.

Notation for referencing tensor elements is similar to that for matrices. The expression $a^i_{\ jk}$ references the element in tensor a with contravariant index i and covariant indices j and k.

It is common to write tensor operations using the Einstein summation convention, which represents a tensor product with summation over repeated indices. For example, the inner product between two n-dimensional vectors, u · v, would be written as $u_i v_i$ where:

$$u_i v_i = \sum_{i=1}^{n} u_i v_i = u_1 v_1 + \dots + u_n v_n$$

Using this notation, we can define the following operations.

The inner product, a · b, of two tensors a and b with ranks $r_a$ and $r_b$ produces a tensor of rank $r_a + r_b - 2$. Summation is performed over the innermost index of both tensors. For example, the inner product where a and b are two rank-2 tensors can be defined as:

$$a \cdot b = T \quad \text{where} \quad T_{ik} = a_{ij} b_{jk}$$

The double dot product, a : b, between two tensors with ranks $r_a$ and $r_b$ produces a tensor with rank $r_a + r_b - 4$. Summation is performed over the two innermost indices. For example, the double dot product where a and b are rank-3 tensors can be defined as:

$$a : b = T \quad \text{where} \quad T_{il} = a_{ijk} b_{jkl}$$

The outer product, ab, of two tensors a and b with ranks $r_a$ and $r_b$ produces a tensor of rank $r_a + r_b$. It performs no summation and therefore has no repeated indices. For example, the outer product of two rank-2 tensors, a and b, can be defined as:

$$ab = T \quad \text{where} \quad T_{ijkl} = a_{ij} b_{kl}$$

The cross-product, a × b, of two tensors a and b is only defined when a and b are both rank-1 tensors. The result is also a rank-1 tensor. It can be defined as:

$$a \times b = T \quad \text{where} \quad T_i = \epsilon_{ijk} a_j b_k$$

where $\epsilon$ is the Levi-Civita symbol, defined for dimension n as

$$\epsilon_{ijk} = \begin{cases} 0 & \text{when any two indices are equal} \\ +1 & \text{when } i, j, k \text{ is an even permutation of } 1, 2, \dots, n \\ -1 & \text{when } i, j, k \text{ is an odd permutation of } 1, 2, \dots, n \end{cases}$$

2.7 Vector Calculus Notation

In this section, I cover notation for the application of the gradient, divergence and curl vector differential operators, all denoted using the symbol ∇. The following definitions assume a Cartesian co-ordinate system.

The gradient operator applied to a tensor field f is written as ∇f. It increases the rank of the tensor field by 1. If f is a scalar field of dimension n, the result is defined as:

$$\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n} \right)$$

The divergence operator applied to a tensor field f is written as ∇ · f. It decreases the rank of the tensor field by 1 and therefore can only be applied to rank-1 or higher tensor fields. Applied to a vector field of dimension n, the result is defined as:

$$\nabla \cdot f = \frac{\partial f_1}{\partial x_1} + \frac{\partial f_2}{\partial x_2} + \dots + \frac{\partial f_n}{\partial x_n}$$

The curl operator applied to a tensor field f is written as ∇ × f. Applied to a 3-dimensional vector field, the result is defined as:

$$\nabla \times f = \left( \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3}, \; \frac{\partial f_1}{\partial x_3} - \frac{\partial f_3}{\partial x_1}, \; \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} \right)$$

Finally, we define the Laplacian operator $\nabla^2$, also written as $\Delta$. It can be written using the divergence and gradient operators as $\nabla \cdot \nabla$. It transforms a tensor field to one of the same rank. Applied to a scalar field f of dimension n, the result is defined as:

$$\nabla^2 f = \Delta f = \frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} + \dots + \frac{\partial^2 f}{\partial x_n^2}$$

2.8 The Finite Element Method

This section provides a brief overview of the finite element method. Readers unfamiliar with the finite element method are advised to consult a text such as Sherwin and Karniadakis [26] or lecture notes such as Rolf Rannacher’s [27].

2.8.1 Introduction

The finite element method is a technique used to find the approximate solution to some general linear problem:

$$L(u) = 0 \qquad (2.1)$$

where L is some linear operator, usually containing partial derivatives. We introduce the approximate solution $u^\delta$ and the residual R:

$$L(u^\delta) = R(u^\delta) \qquad (2.2)$$

The approximate solution $u^\delta$ is represented as follows:

$$u^\delta(x) = \sum_{i=0}^{N_{dof}} \hat{u}_i \Phi_i(x) \qquad (2.3)$$

$u^\delta$ is represented by a linear combination of a finite number of basis functions. $\hat{u}_i$ represents the coefficient for basis function i. The functions $\Phi_i(x)$ are known as trial functions.

2.8.2 The Method of Weighted Residuals

As the name suggests, we multiply the residual by a weight function and then integrate over the domain. Choosing g as our weight function gives us:

$$\int_\Omega R(u^\delta(x))\, g(x)\, dx = 0 \qquad (2.4)$$

This is known as the weak formulation of the problem, as the approximate solution $u^\delta$ is not required to satisfy $R(u^\delta) = 0$ over the whole domain, but rather that the weighted integral of R over the domain is equal to zero.

We choose a set of functions that will act as our weighting functions. These are commonly called test functions. Assuming $N_{dof}$ test functions, we have a minimisation problem of the form:

$$\int_\Omega R(u^\delta(x))\, w_j(x)\, dx, \qquad j = 1, \dots, N_{dof} \qquad (2.5)$$

Different choices of $w_j$ correspond to different projection methods. The most common is the Galerkin method, where $w_j = \Phi_j$. If $w_j = \Upsilon_j$ where $\Upsilon_j \neq \Phi_j$, this is known as a Petrov-Galerkin method.

Typically, the LHS integral is a bilinear form (linear in the trial and test functions) and the RHS integral is a linear form (linear in the test function). Such systems can be assembled into a linear system of equations and solved using standard techniques. If the LHS integral is non-linear in the trial function, it may be necessary to employ a linearisation technique or use a non-linear solver.

2.8.3 Boundary Conditions

In order for a problem to be well-posed, it may be necessary to specify boundary conditions (e.g. constraints on the value of the variable being solved for, on the edge of the domain). These may be classified as follows:

Dirichlet These specify the value of the problem variable on the edge of the domain. Sherwin et al. [26] describe an elegant mathematical formulation where the unknown solution $u^\delta$ is decomposed into a known function $u^D$ that satisfies the boundary conditions and a homogeneous function $u^H$ which is unknown and zero on the Dirichlet boundaries. Solution then only takes place for $u^H$, as $u^D$ is known.

Often, imposition of Dirichlet boundary conditions is done in a less elegant manner, by altering the constructed linear system of equations. One method is the Payne-Irons (or “big spring”) method, which multiplies a diagonal entry $A_{ii}$ by a large value α and sets the corresponding RHS entry to $q \alpha A_{ii}$, so that the corresponding unknown is strongly drawn towards the value q.

Neumann These specify restrictions on the derivative of the solution on the edge of the domain. These boundary conditions are dealt with implicitly as part of the formulation. Integration by parts typically results in integrals over the edge of the domain involving derivatives of the variable being solved for. By substituting known values of these derivatives into that term, Neumann boundary conditions are applied.

For example, consider the following problem in a one-dimensional domain:

$$\int_0^1 v\, L(u)\, dx = \int_0^1 v \left( \frac{\partial^2 u}{\partial x^2} + f \right) dx = 0 \qquad (2.6)$$

Integrating by parts, we obtain:

$$\int_0^1 \frac{\partial v}{\partial x} \frac{\partial u}{\partial x}\, dx = \int_0^1 v f\, dx + \left[ v \frac{\partial u}{\partial x} \right]_0^1 \qquad (2.7)$$

Substituting in values of $\frac{\partial u}{\partial x}$ at x = 0 and x = 1 imposes Neumann boundary conditions at those edges.

Robin Also known as mixed conditions, these refer to a combination of Dirichlet and Neumann boundary conditions.

2.8.4 Problem Discretisation

Our solution domain is partitioned into non-overlapping elements. Our basis functions are defined so that they are only non-zero over a small number of cells in physical proximity.

Typically, we define our basis functions over a reference cell and transform them to each global cell. We define a mapping function $\iota^k(i)$ which, given the local numbering of a basis function coefficient i on some cell k, returns the global numbering. We also define $\chi^k(\xi)$, which transforms a co-ordinate on some cell k specified in local co-ordinates ξ to global co-ordinates. Hence we have:

$$\Phi_{\iota^k(i)}(\chi^k(\xi)) = \phi_i(\xi) \qquad (2.8)$$

where $\phi_i(\xi)$ represents basis function i defined on the reference cell, evaluated at local co-ordinate ξ.
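For example, for linear (P1) basis functions on the reference triangle with vertices (0,0), (1,0) and (0,1) (a standard choice, used here purely for illustration), the reference basis functions and co-ordinate mapping are:

$$\phi_1(\xi) = 1 - \xi_1 - \xi_2, \qquad \phi_2(\xi) = \xi_1, \qquad \phi_3(\xi) = \xi_2$$

$$\chi^k(\xi) = x^k_1\, \phi_1(\xi) + x^k_2\, \phi_2(\xi) + x^k_3\, \phi_3(\xi)$$

where $x^k_1$, $x^k_2$ and $x^k_3$ are the global co-ordinates of the vertices of cell k.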

Finite element discretisations may be either continuous or discontinuous. A continuous discretisation will have field basis function coefficients shared between neighbouring cells. A discontinuous discretisation results in fields that may take multiple values at the boundaries between cells.

Even with a continuous discretisation, our discrete function space can usually only represent $C^0$ continuous fields (i.e. fields with a discontinuous derivative). As a consequence, integration by parts must often be applied to the weak form if it requires a second derivative or higher.

2.8.5 Assembly and Solution

Assembly is the process of constructing the linear system:

$$Au = b \qquad (2.9)$$

A represents the discretised version of our linear operator, b represents our discretised RHS and u is the vector of unknown trial function coefficients. The numbers of rows and columns of A are equal to the numbers of test and trial functions, respectively. Each element of A is equal to the value of the integral of the corresponding variational form over the domain, so that:

$$A_{ij} = a(\Phi_j, \Psi_i)$$

where a represents our bilinear form. Since our basis functions are only non-zero over neighbouring cells, A is a sparse matrix and can be constructed from a set of smaller matrices corresponding to integrals over individual cells. This process is called local assembly.

For each cell k we construct the local assembly matrix:

$$M^k_{pq} = a(\Phi_{\iota^k(q)}, \Psi_{\iota^k(p)}) \qquad (2.10)$$

where p and q are the numberings of the test and trial functions local to each cell. Φ and Ψ are approximated by their local equivalents when evaluating a. This requires performing co-ordinate transforms between local and global co-ordinates and rewriting of the integral, which we do not describe here. Each contribution $M^k$ is summed into the global system matrix A.
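The summation of local contributions into the global matrix (global assembly) can be sketched as follows; this is a schematic illustration with a toy sparse-matrix type, not code from any particular library:

#include <cstddef>
#include <map>
#include <utility>
#include <vector>

using SparseMatrix = std::map<std::pair<int, int>, double>;  // toy sparse storage

// iota[k][p] maps local basis function p on cell k to its global number;
// M[k][p][q] holds the local assembly matrix for cell k.
void assemble(SparseMatrix& A,
              const std::vector<std::vector<int>>& iota,
              const std::vector<std::vector<std::vector<double>>>& M) {
  for (std::size_t k = 0; k < M.size(); ++k)
    for (std::size_t p = 0; p < M[k].size(); ++p)
      for (std::size_t q = 0; q < M[k][p].size(); ++q)
        A[{iota[k][p], iota[k][q]}] += M[k][p][q];  // scatter-add of M^k into A
}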


Once assembled, the sparse linear system can be solved by a number of methods (e.g. Krylov subspace methods [28]) in order to obtain u, the vector of trial function coefficients used to represent $u^\delta$.

2.9 Finite Element Libraries

In this section we describe a number of finite element implementations and domain-specific languages. In particular, we are interested in what information these systems consider necessary to specify a finite element problem, the abstractions they define and how they can use domain-specific knowledge to optimise their implementations.

2.9.1 Analysa

Analysa [29] is a problem solving environment for partial differential equations. It is written in the functional language Scheme and incorporates an embedded domain-specific language for specifying variational forms. We provide an example of the declaration of four variational forms in Analysa in Figure 2.3.

(integral-forms
  ((m u v) (* u v))
  ((k u v) (dot (gradient u) (gradient v)))
  ((d u v) (* (divergence u) (divergence v)))
  ((p u v) (+ (* alpha (m u v))
              (* nu (k u v))
              (* rho (d u v)))))

Figure 2.3: The declaration of four variational forms in Analysa. The forms m and k correspond to the standard “mass” and “stiffness” forms respectively.

We note that Analysa provides a projection operator that operates on finite element function spaces. Given a discretised field (an interpolant, in Analysa terminology) and a function space, the projection operator will transform the field to that space. Depending on use, this function can act as both an extension and a restriction operator on discrete fields. In particular, this appears to support the elegant implementation of Dirichlet boundary conditions by separating the Dirichlet and homogeneous components, as described by Sherwin et al. [26].

FunctionSpace {
  { Name H1; Type Form0;
    BasisFunction {
      { Name sn; NameOfCoef vn; Function BF_Node;
        Support D; Entity NodesOf[All]; }
    }
  }
}
Formulation {
  { Name Poisson; Type FemEquation;
    Quantity {
      { Name v; Type Local; NameOfSpace H1; }
    }
    Equation {
      Galerkin { [ a[] * Dof{Grad v}, {Grad v} ]; In D; Jacobian V; Integration I; }
      Galerkin { [ f[], {v} ]; In D; Jacobian V; Integration I; }
    }
  }
}

Figure 2.4: Declaration of a discrete function space and system of equations for solving the weak form of Poisson's equation in GetDP's problem definition language.

2.9.2 GetDP

GetDP [30] is a software environment for the numerical solution of differential equations, supporting multiple solution techniques including finite element methods.

Most notably, GetDP uses a problem definition structure that provides a formal mathematical definition of the problem in the form of a text file. We present an example of a GetDP specification for solving Poisson's equation in Figure 2.4.

The generality of GetDP is of particular interest. GetDP is not embedded in some other language and provides no means of specifying the problem and solution method other than the problem definition file. Hence, the GetDP problem definition must truly capture every aspect of the problem being solved. In contrast, language-embedded finite element DSLs most often have a very specific scope (commonly, just the specification of variational forms) and anything beyond that scope is handled directly in the host language.

GetDP's problem definition file is a complete problem specification, including mesh topology, user-defined functions, function space definitions, integration methods, linear system generation and solution, and post-processing. As a consequence, GetDP's problem definition file provides a useful basis for determining what information a complete specification of a general finite element solver must capture.

2.9.3 deal.II

deal.II [31] is a general purpose finite element library written in C++. It provides C++ classes for meshes, quadrature, linear systems and other finite element concepts. However, deal.II does not attempt to abstract finite element concepts such as variational forms or system matrix assembly. We show a code example for performing local assembly of the Laplace operator in Figure 2.5.

for (unsigned int i = 0; i < dofs_per_cell; ++i)
  for (unsigned int j = 0; j < dofs_per_cell; ++j)
    for (unsigned int q = 0; q < n_q_points; ++q)
      cell_matrix(i, j) += fe_values.shape_grad(i, q) *
                           fe_values.shape_grad(j, q) *
                           fe_values.JxW(q);

Figure 2.5: Local assembly of the Laplace operator in deal.II. Handling of iteration over the basis functions is performed explicitly by the library client.

deal.II's interface is representative of finite element software that does not attempt to abstract away certain implementation aspects of a finite element solver. Although deal.II is of little interest to us abstraction-wise, it does provide insight into the building blocks of an efficient finite element solver implementation, which may resemble what we want our abstract problem descriptions to map down to.

2.9.4 FreeFEM++

FreeFEM++ [32] is an interactive graphical environment for solving partial differential equations, written in C++. It is the successor of a family of other applications: FreeFEM, FreeFEM+ and FreeFEM3D. FreeFEM++ implements an interpreter for a language that strongly resembles C++, augmented with additional finite element related types. We show an example definition of a Poisson solver in FreeFEM++ in Figure 2.6.

border C(t=0, 2*pi) { x = cos(t); y = sin(t); }
mesh Th = buildmesh(C(50));
fespace Vh(Th, P1);
Vh u, v;
func f = x*y;
real cpu = clock();
solve Poisson(u, v, solver=LU) =
    int2d(Th)(dx(u)*dx(v) + dy(u)*dy(v))
  - int2d(Th)(f*v)
  + on(C, u=0);
plot(u);
cout << " CPU time = " << clock() - cpu << endl;

Figure 2.6: Code for solving Poisson's equation in FreeFEM++'s C++-like language.

Although FreeFEM++ appears to use expression capture in its language, in choosing to incorporate C++ it does not attempt to raise the level of abstraction as strongly as other languages for specifying finite element problems.

2.9.5 Sundance

Sundance [33] is a C++ library for solving partial differential equations using the finite element method. Sundance uses symbolic expressions and expression capture to specify problem aspects such as variational forms and boundary conditions. Sundance also supports MPI, enabling distributed-memory parallelisation of data structures and parallel solves.

Sundance makes extensive use of expression capture, to the extent that it significantly raises the level of abstraction when specifying a finite element solver. We distinguish our work from Sundance by noting that one aim of our research is to capture a complete specification of a finite element solver, with the aim of being able to generate code from it. In contrast, Sundance's control and data flow is specified in standard C++ and therefore exists outside the scope of Sundance's expression capture.

Expr phiHat= new TestFunction(new Langrange(1)); Expr phi= new UnknownFunction(new Langrange(1)); Expr v= 10.0;

Expr dx= new Derivative(0); Expr dy= new Derivative(1); Expr dz= new Derivative(2); Expr grad= List(dx, dy, dz);

QuadratureFamily quad2= new GuassianQuadrature(2); Expr eqn= Integral(interior, (grad*phiHat)*(grad*phi)-v*phiHat, quad2);

Figure 2.7: Specification of a Poisson problem in Sundance. The code also illustrates how the List type can be used to construct the gradient differential operator. by noting that one aim of our research is to capture a complete specification of a finite element solver with the aim of being to generate code from it. In contrast, Sundance’s control and data flow is specified in standard C++ and therefore exists outside the scope of Sundance’s expression capture.

2.9.6 FEniCS

FEniCS [34, 35] is a collection of projects and components, a number of which are oriented towards the solution of partial differential equations in C, C++ and Python. Of particular interest are the FEniCS Form Compiler [36] (FFC) and the Unified Form Language [37] (UFL).

The Unified Form Language specification defines a domain-specific language for declaring discrete variational forms. This enables researchers to explore different choices of form and discretisation in a notation similar to the mathematical description, without interacting with other concerns. We present an example of the UFL specification of a bilinear form for a Poisson problem in Figure 2.8.

The FEniCS Form Compiler accepts input in UFL and produces code to evaluate variational forms that can be used from C, C++ or Python. FFC development also reflects ongoing research into optimising the evaluation of variational forms. FFC can generate code that performs local assembly using quadrature, but can also generate code using a tensor representation that separates geometry-dependent and geometry-independent factors into different tensors, which are combined using an inner product to form the final local assembly matrix. For variational forms with high-order basis functions, this can reduce the operation count considerably.

element = FiniteElement("Lagrange", triangle, 1)
v = TestFunction(element)
u = TrialFunction(element)
a = dot(grad(v), grad(u))*dx
L = v*f*dx

Figure 2.8: Specification of the bilinear and linear forms for the Poisson problem in UFL.

Work on optimising evaluation using the tensor representation of local assembly has been undertaken by Kirby et al. [38], who reduce the operation count needed to take the inner product between the geometry-dependent and geometry-independent tensors by exploiting redundancies in the latter.

The FEniCS optimisations demonstrate how domain-specific knowledge can be used to optimise performance. We also intend to search for such domain-specific optimisation opportunities.

2.10 Conclusion

In this chapter I covered background related to domain-specific abstractions, self-optimising libraries, code generation and optimisation, all of which are useful in the construction of active libraries. I also covered the finite element method, domain-specific languages and libraries for specifying and solving finite element problems, which will assist in understanding the design of our finite element library Excafé, detailed in later chapters.

In the next chapter, I present the sparse code-generation extension to my linear algebra active library Desola, in context with material from earlier work.

Chapter 3

Code Generation for Iterative Solvers of Sparse Linear Systems

3.1 Introduction

In my MEng thesis work, I explored the issues involved in performing runtime code generation for iterative solvers on dense linear systems of equations. I developed a prototype implementation of an "active" dense linear algebra library, Desola (Delayed Evaluation Self Optimising Linear Algebra), by delaying evaluation of expressions built using library calls, then generating code at runtime for the compositions that occurred.

This work was presented at the Library-Centric Software Design Workshop '06 [2]. During my PhD, I was invited to revise the paper, with extended performance results and an extensive background section, for a special issue of "Science of Computer Programming" published by Elsevier [3]. This chapter is primarily intended to detail the work to extend this library to sparse linear algebra. Sections 3.1, 3.2, 3.3, 3.4, 3.5, 3.6 and 3.9 are mainly excerpted from the journal paper and provide necessary background. Section 3.7 presents the new results collected for the journal paper, and Section 3.8 describes work to extend Desola to sparse linear algebra.


“Active libraries” can be defined as libraries which play an active part in the compilation, in particular, the optimisation of their client code. The term was coined by Czarnecki et al. [39], who observed that active libraries break the abstractions common in conventional compilers.

Active libraries are described in detail by Veldhuizen and Gannon [40]. In Desola, the key optimisations are loop fusion and array contraction. The ideas behind Desola are to:

Delay library call execution Calls made to the library are used to build a “recipe” for the delayed computation. When execution is finally forced by the need for a result, the recipe will often represent a complex composition of primitive calls.

Generate optimised code at runtime Code is generated at runtime to perform the opera- tions present in the delayed recipe. In order to improve performance over a conventional library, it is important that the generated code should execute faster than a statically generated counterpart in a conventional library. To achieve this, we apply optimisations that exploit the structure, semantics and context of each library call. Compiled recipes are cached to limit overheads, but need to be executed enough times to offset the cost of the initial compilation.

This approach has a number of advantages. It does not need to analyse the client source code, but is still able to optimise across statement and procedural boundaries. Desola is implemented in standard C++ and uses a standard C compiler for runtime code compilation, so the library user is not tied to a specific compiler.

One aspect of this approach is that the library interface remains isolated from the concerns of achieving high performance. The evolution of high performance numerical libraries such as BLAS [41] has been accompanied by a corresponding increase in the complexity of their interfaces. By allowing the library implementation to take more responsibility for optimisation, we aim to provide a more appropriate interface to the user, a similar goal to the Matrix Template Library [5].

Another aspect of this approach is that the code generated for a recipe is isolated from client-side code: it is not interwoven with non-library code. This is particularly important because the structure of the code for a recipe is restricted in form, enabling us to introduce compilation passes specially targeted to achieve particular effects.

The disadvantage of this approach is the overhead of runtime compilation and the infrastructure to delay evaluation. In order to minimise the first factor, we maintain a cache of previously generated code along with the recipe used to generate it. This enables us to reuse previously optimised and compiled code when the same recipe is encountered again.

There are also more subtle disadvantages. In contrast to a compile-time solution, we are forced to make online decisions about what to evaluate, and when. Living without static analysis of the client code means we do not know, for example, which variables involved in a recipe are actually live when the recipe is forced.

3.2 Delaying Evaluation

Delayed evaluation provides the mechanism whereby we collect the sequences of operations we wish to optimise. We call the runtime information we obtain about these operations runtime context information.

This information may consist of values such as matrix or vector sizes, or the various relationships between successive library calls. Knowledge of dynamic values such as matrix and vector sizes allows us to improve the performance of the implementation of operations using these objects. For example, the runtime code generation system (see Section 3.3) can use this information to specialise the generated code. One specialisation we perform is with loop bounds. We incorporate dynamically known sizes of vectors and matrices as constants in the runtime-generated code.

Desola clients pass around handles to delayed expressions. The client can use these handles to build and assign expressions, and determine their values when needed. The library client need not be aware that delayed evaluation is occurring. Building and assigning numerical expressions with the handles causes them to be delayed by the library. Only when the client performs an operation that requires knowledge of the value of an expression is it calculated.

Figure 3.1: An example DAG. The circular node denotes a handle held by the library client. The expression represents the matrix-vector multiply function from Level 2 BLAS, y = αAx + βy.

The delayed expressions are represented as Directed Acyclic Graphs (DAGs). Arcs in the DAG are directed in the direction of data dependence. Leaves in the DAG are data values (literals) and the other nodes represent delayed operations involving them. When the client forces evaluation of an expression node referenced by a handle, the node is replaced by a literal value. A literal value has no dependencies so previously referenced sections of the DAG may be orphaned. A simple reference counting scheme is used to determine when expression DAG nodes are no longer referenced by either other nodes or client handles, and reclaim them automatically.

An example DAG is illustrated in Figure 3.1. The circular node represents a handle held by the library client, and the other nodes represent delayed expressions. The three multiplication nodes do not have a handle referencing them, which makes them, in effect, unnamed. When the expression DAG is evaluated, it is possible to optimise away the storage for these values entirely (their values are not required outside the runtime generated code). For expression DAGs involving matrix and vector operations, this enables us to reduce memory usage and improve cache utilisation.
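The following self-contained C++ sketch illustrates the handle-and-DAG mechanism described above. It is not Desola's actual interface, and it uses scalar operations where the library delays matrix and vector operations. Handles are shared pointers to nodes, so reference-counting reclamation of orphaned sub-DAGs falls out of std::shared_ptr; forcing a handle evaluates the sub-DAG it depends on, which is the point at which Desola would generate, compile and cache code rather than interpret the recipe.

#include <memory>
#include <vector>

struct Node {
  enum Kind { Literal, Add, Mul } kind;
  double value = 0.0;                       // valid once evaluated
  bool evaluated = false;
  std::vector<std::shared_ptr<Node>> deps;  // data dependencies

};

using Handle = std::shared_ptr<Node>;

Handle literal(double v) {
  auto n = std::make_shared<Node>();
  n->kind = Node::Literal; n->value = v; n->evaluated = true;
  return n;
}

Handle apply(Node::Kind k, Handle a, Handle b) {
  auto n = std::make_shared<Node>();  // building a recipe does no work
  n->kind = k;
  n->deps = {std::move(a), std::move(b)};
  return n;
}

double force(const Handle& n) {
  if (!n->evaluated) {
    const double x = force(n->deps[0]);
    const double y = force(n->deps[1]);
    n->value = (n->kind == Node::Add) ? x + y : x * y;
    n->evaluated = true;  // the node now behaves like a literal
  }
  return n->value;
}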

Currently, we do not perform any form of common subexpression elimination on our expression DAGs, although it would be possible to do so. We have not observed any of our benchmark examples computing redundant expressions. Explicit expression reuse, which does occur, is expressed directly in the structure of the constructed expression DAG.

Delayed evaluation allows Desola to perform cross component optimisation at runtime, and also allows us to equip it with a simple interface, such as the one required by the Iterative Template Library (ITL) set of iterative solvers.

3.3 Runtime Code Generation

Runtime code generation is performed using TaskGraph [13], a C++ library for dynamic code generation. A TaskGraph represents a fragment of code which can be constructed and manipulated at runtime, compiled, dynamically linked back into the host application and executed. TaskGraph enables optimisation with respect to:

Runtime Parameters Code can be specialised to its parameters and other runtime contextual information.

Platform SUIF-1, the Stanford University Intermediate Format, is used as an internal representation in TaskGraph, making a large set of dependence analysis and restructuring passes available for code optimisation.

Characteristics of the TaskGraph approach include:

Simple Language Design TaskGraph is implemented in C++ enabling it to be compiled with a number of widely available compilers.

Explicit Specification of Dynamic Code TaskGraph requires the application programmer to construct the code explicitly as a data structure, as opposed to annotation of code or automated analysis.

void TG_mm_ijk(unsigned int sz[2], TaskGraph& t) {
  taskgraph(t) {
    tParameter(tArrayFromList(float, A, 2, sz));
    tParameter(tArrayFromList(float, B, 2, sz));
    tParameter(tArrayFromList(float, C, 2, sz));
    tVar(int, i);
    tVar(int, j);
    tVar(int, k);

    tFor(i, 0, sz[0]-1)
      tFor(j, 0, sz[1]-1)
        tFor(k, 0, sz[0]-1)
          C[i][j] += A[i][k] * B[k][j];
  }
}

Figure 3.2: A C++ function for constructing a TaskGraph that performs a dense matrix mul- tiplication. The matrix sizes are passed in through the array sz and incorporated as constants inside the TaskGraph.

Simplified C-like Sub-language Dynamic code is specified with the TaskGraph library via an embedded sub-language similar to C. This language is implemented through extensive use of macros and C++ operator overloading. The language has first-class arrays, which facilitates dependence analysis.

We show a C++ function that constructs a matrix multiplication in the TaskGraph sub-language in Figure 3.2. The library is designed so that TaskGraph construction resembles C code. The generated code is specialised to the matrix dimensions stored in the array sz. The matrix parameters A, B, and C are supplied when the code is executed.

A code generation visitor visits nodes from the delayed expression DAG in reverse topological order to generate TaskGraph code. In order to calculate the value of the node that has been forced, the only nodes we need to evaluate are those that the forced node transitively depends on. However, as we wish to maximise the opportunities for optimisation, we evaluate all nodes transitively connected to the one being evaluated (i.e. we traverse the dependence edges in both directions). This heuristic is intended to maximise optimisation opportunities by evaluating all expressions that use the same data at the same time, possibly allowing loop fusion and array contraction to occur between loops using the same data.
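A C++ sketch of this traversal, under the simplifying assumption that the DAG is stored as integer adjacency lists (deps, and its reverse, users); it collects the weakly connected component containing the forced node:

#include <stack>
#include <vector>

std::vector<int> connectedComponent(
    const std::vector<std::vector<int>>& deps,   // node -> dependencies
    const std::vector<std::vector<int>>& users,  // node -> dependents
    int forced) {
  std::vector<bool> seen(deps.size(), false);
  std::vector<int> component;
  std::stack<int> work;
  work.push(forced);
  seen[forced] = true;
  while (!work.empty()) {
    const int n = work.top(); work.pop();
    component.push_back(n);
    for (int d : deps[n])   // follow dependence edges downwards
      if (!seen[d]) { seen[d] = true; work.push(d); }
    for (int u : users[n])  // and upwards, to maximise fusion scope
      if (!seen[u]) { seen[u] = true; work.push(u); }
  }
  return component;
}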

Code generated by Desola is specialised to matrix and vector sizes as demonstrated in Fig- ure 3.2. The constant loop bounds and array sizes make it much simpler to apply our loop fusion and array contraction optimisations. These are described in Section 3.5.

3.4 Code Caching

As the cost of compiling the runtime generated code is extremely high (compiler execution time is on the order of tenths of a second), it is important that this overhead be minimised.

Related work by Beckmann [42], on the efficient placement of data in a parallel linear algebra library, cached execution plans in order to improve performance. We adopt a similar strategy in order to reuse previously compiled code. We maintain a cache of previously encountered recipes along with the compiled code required to execute them. As any caching system will be invoked at every force point within a program using the library, it is essential that checking for cache hits is as computationally inexpensive as possible.

As previously described, delayed recipes are represented in the form of directed acyclic graphs. In order to allow the fast resolution of possible cache hits, all previously cached recipes are associated with a hash value. If recipes already exist in the cache with the same hash value, a full check is then performed to see if the recipes match.

Time and space constraints were of paramount importance in the development of the caching strategy and certain concessions were made in order that it could be performed quickly. The primary concession was that both hash calculation and isomorphism checking occur on flattened forms of the delayed expression DAG, ordered using a topological sort.

This causes two limitations:

• We do not detect the situation where the presence of commutative operations allows two differently structured delayed expression DAGs to be used in place of each other.

• As there can be more than one valid topological sort of a DAG, it is possible for multiple identically structured expression DAGs to exist in the code cache.

As we will see later, neither of these limitations significantly affects the usefulness of the cache, but first we will briefly describe the hashing and isomorphism algorithms.

Hashing occurs as follows:

• Each DAG node in the sorted list is assigned a value corresponding to its position in the list.

• A hash value is calculated for each node with references to other nodes encoded using the values assigned to them in the previous step.

• The hash values of all the nodes in the list are combined together in list order using a non-commutative function.
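The following C++ sketch (illustrative; the names and constants are our own) shows this scheme on a flattened DAG: node references are encoded as list positions, and the per-node hashes are folded together in list order with a non-commutative combining function.

#include <cstdint>
#include <vector>

struct FlatNode {
  int opcode;             // operation type: literal, add, multiply, ...
  std::vector<int> deps;  // positions of dependencies in the sorted list
};

std::uint64_t hashDag(const std::vector<FlatNode>& nodes) {
  std::uint64_t h = 0;
  for (const FlatNode& n : nodes) {
    std::uint64_t nodeHash = static_cast<std::uint64_t>(n.opcode);
    for (int dep : n.deps)  // references encoded via list positions
      nodeHash = nodeHash * 31u + static_cast<std::uint64_t>(dep);
    // fold in list order with a non-commutative combine, so reordering
    // the flattened nodes produces a different hash
    h = h * 1000003u + nodeHash;
  }
  return h;
}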

Isomorphism checking works similarly:

• Nodes in the sorted lists for each graph are assigned a value corresponding to their location in their list.

• Both lists are checked to be the same size.

• The corresponding nodes from both lists are checked to be the same type, and any nodes they reference are checked to see if they have been assigned the same numerical value.

Isomorphism checking in this manner does not require that a mapping be found between nodes in the two DAGs involved (it is implied by each node’s location in the sorted list for each graph). It only requires determining whether the mapping is valid.
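Reusing the FlatNode representation from the hashing sketch above, the check reduces to a single linear pass:

bool isomorphic(const std::vector<FlatNode>& a,
                const std::vector<FlatNode>& b) {
  if (a.size() != b.size()) return false;         // same number of nodes
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i].opcode != b[i].opcode) return false; // same node type
    if (a[i].deps != b[i].deps) return false;     // same positions referenced
  }
  return true;
}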

As the number of nodes a node can depend on is bounded (at most two for a library with only unary and binary operators), both hashing and isomorphism checking between delayed expression DAGs can be performed in linear time with respect to the number of nodes in the DAG.

We previously stated that the limitations imposed by using a flattened representation of an expression DAG do not significantly affect the usefulness of the code cache. We expect the code cache to be at its most useful when the same sequence of library calls is repeatedly encountered (as in a loop). In this case, the generated DAGs will have identical structures, and the ability to detect non-identical DAGs that compute the same operation provides no benefit.

The second limitation, the need for identical DAGs matched by the caching mechanism to also have the same topological sort, is more important. To ensure this, we store the dependency information held at each DAG node using lists rather than sets. By using lists, we can guarantee that when two DAGs are constructed in the same order they will also be traversed in the same order. Thus, when we come to perform our topological sort, the nodes from both DAGs will be sorted identically.

The code caching mechanism discussed, while it cannot recognise all opportunities for reuse, is well suited to detecting repeatedly generated recipes from client code. For the benchmark set of iterative solvers, compilation time becomes a constant overhead, regardless of the number of iterations executed.
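Putting the pieces together, a cache lookup might look like the following sketch (again illustrative, reusing hashDag and isomorphic from above; CompiledCode stands in for a handle to a compiled fragment). The hash narrows the candidates and the isomorphism check confirms a hit:

#include <unordered_map>
#include <utility>

struct CompiledCode;  // opaque handle to a compiled, linked code fragment

struct CodeCache {
  // hash -> (flattened recipe, compiled code); a multimap tolerates
  // hash collisions between genuinely different recipes
  std::unordered_multimap<std::uint64_t,
      std::pair<std::vector<FlatNode>, CompiledCode*>> entries;

  CompiledCode* lookup(const std::vector<FlatNode>& recipe) const {
    auto range = entries.equal_range(hashDag(recipe));
    for (auto it = range.first; it != range.second; ++it)
      if (isomorphic(it->second.first, recipe))
        return it->second.second;  // hit: reuse previously compiled code
    return nullptr;                // miss: compile the recipe, then insert
  }
};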

3.5 Loop Fusion and Array Contraction

We implemented two optimisations using the TaskGraph back-end, SUIF [25]. Both loop fusion and array contraction are applied to the runtime generated code. As loop fusion often facilitates array contraction, the loop fusion pass precedes the array contraction pass.

Loop fusion [17] can lead to an improvement in performance when the fused loops use the same data. As the data is only loaded into the cache once, the fused loops take less time to execute than the sequential loops. Alternatively, if the fused loops use different data, fusion can lead to poorer performance, as the data used by the fused loops displace each other in the cache. Loop fusion is described in more detail in Section 2.4.1.

(a) Before loop fusion:

for (int i = 0; i < 100; ++i)
  a[i] = b[i] + c[i];
for (int i = 0; i < 100; ++i)
  e[i] = a[i] + d[i];

(b) After loop fusion:

for (int i = 0; i < 100; ++i) {
  a[i] = b[i] + c[i];
  e[i] = a[i] + d[i];
}

Figure 3.3: Vector addition before and after loop fusion. The computed values assigned to a[i] are reused immediately, leading to improved temporal locality.

We provide an example of loop fusion in Figure 3.3. After fusion, the value stored in a[i] is reused immediately for the calculation of e[i].

In Desola, we use a rather simple loop fusion algorithm which does not take cache locality into account and could be improved (although the fusions it performs are always correct). We require that the loop bounds of the loops to be fused are constant, but this does not limit us since our runtime generated code has already been specialised with loop bound information.

As discussed in Section 3.3, we employ a heuristic to generate code where loops are likely to reference the same data. Visual inspection of the code generated during execution of the iterative solvers indicates that the fused loops commonly use the same data. We believe this is likely due to the structure of the dependencies involved in the operations required for the iterative solvers.

We follow the loop fusion transformation with array contraction [17]. Array contraction allows the size of arrays to be reduced, possibly to a scalar value, when dependencies permit. This results in a reduction in memory usage, cache lines referenced, and so on. Array contraction is described in more detail in Section 2.4.2. We also provide results on the number of array contractions we perform on our benchmarks in Section 3.7.

We provide an example of array contraction in Figure 3.4. The array a can be reduced to a scalar value so long as it is not required by any code following the two fused loops.

We use this technique to optimise away temporary matrices or vectors in the runtime generated code. This is important because the DAG representation of the delayed operations does not hold information on what memory can be reused. However, we can determine whether or not each node in the DAG is referenced by the client code, and if it is not, it can be allocated locally to the runtime generated code and possibly optimised away. For details of other memory access transformations, consult Bacon et al. [17].

(a) Before array contraction:

double a[100], b[100], c[100], d[100];

for (int i = 0; i < 100; ++i) {
  a[i] = b[i] + c[i];
  e[i] = a[i] + d[i];
}

(b) After array contraction:

double a, b[100], c[100], d[100];

for (int i = 0; i < 100; ++i) {
  a = b[i] + c[i];
  e[i] = a + d[i];
}

Figure 3.4: Vector addition before and after array contraction. The array a is replaced by a scalar value, reducing memory requirements.

3.6 Liveness Analysis

Upon visual inspection of the runtime generated code produced by our benchmark suite of iterative solvers, it became apparent that a large number of vectors were being passed in as parameters. Their initial values were not used by the runtime generated code; instead, they were being assigned inside the runtime generated code so that their values could be propagated out.

Further investigation showed that the values of these vectors were not used outside the runtime generated code either. In other words, these vectors could be made completely local to the runtime generated code. Unfortunately, our library could not determine this because handles to the vectors were still held by the iterative solver. As a consequence, these vectors were not susceptible to our array contraction pass. We realised that by designing a system to recover runtime information, we had lost the ability to use static information, in particular, the liveness properties of variables.

void printScaledCrossProduct(const Vector& a, const Vector& b,
                             const Scalar& alpha)
{
  const Vector product = cross(a, b);
  const Vector scaled = mul(product, alpha);
  print(scaled);
}

Figure 3.5: A C++ function that computes and prints the scaled value of a cross product using Desola vector and scalar handles passed as parameters.

Figure 3.6: A DAG representing the computation built in Figure 3.5. The client holds handles to both the cross product and scaling operations (handles are denoted with elliptical nodes). When evaluation of the handle scaled is forced, the client still possesses the handle product, so the cross-product result cannot be allocated locally to the runtime generated code.

Consider the code in Figure 3.5 that takes the cross product of two vectors, scales the result and prints it. This operation can be represented with the DAG in Figure 3.6. The value pointed to by the handle product is never required by the client code. From the client's perspective the value is dead, but the library must assume that any value which has a handle may be required later on. Values required by the library client cannot be allocated locally to the runtime generated code, and therefore cannot be optimised away through techniques such as array contraction. Runtime liveness analysis permits the library to make informed guesses about the liveness of nodes in repeatedly executed DAGs, and to allocate them locally to runtime generated code if they are suspected dead, regardless of whether they have a handle.

Having already developed a system for recognising repeatedly executed delayed expression DAGs, we developed a similar mechanism for associating collected liveness information with expression DAGs.

Nodes in each generated expression DAG are instrumented, and information is collected on whether their values are live or dead. The next time the same DAG is encountered, the previously collected information is used to annotate each node in the DAG with an expectation with regard to whether it is live or dead. As the same DAG is repeatedly encountered, statistical information about the liveness of each node is built up.
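A C++ sketch of the statistical bookkeeping this implies; the structure and the decision threshold are our own illustration, not Desola's actual policy.

struct LivenessStats {
  unsigned observedLive = 0;
  unsigned observedDead = 0;

  // Called each time the DAG is executed and the node's value is
  // observed to be used (live) or unused (dead) afterwards.
  void record(bool wasLive) { (wasLive ? observedLive : observedDead)++; }

  // Predict "dead" only when past executions mostly agree; a wrong
  // guess forces recomputation, so the threshold is biased towards
  // predicting "live".
  bool predictDead() const {
    return observedDead >= 4 && observedDead > 3 * observedLive;
  }
};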

If an expression DAG node is believed to be dead, then it can be allocated locally to the runtime generated code and possibly optimised away, which can lead to a performance improvement. Alternatively, it is possible that the expression DAG node is not dead, and its value is required by the library client at a later time. As the value was not saved the first time it was computed, it must be computed again. This can result in a performance decrease for the client application if such a situation occurs repeatedly.

3.7 Performance Evaluation for Dense Linear Algebra

We evaluated the performance of Desola using the Iterative Template Library (ITL) set of templated iterative solvers, running on dense asymmetric matrices of different sizes. The ITL provides templated classes and methods for the iterative solution of linear systems, but not an implementation of the linear algebra operations themselves. ITL is capable of utilising a number of numerical libraries, requiring only the use of an appropriate header file to map the templated types and methods ITL uses to those specific to a particular library. ITL was modified to use Desola through the addition of a header file and other minor modifications.

We compare the performance of our library against the Matrix Template Library [5] (MTL), Intel's Math Kernel Library (IMKL) and ATLAS [16]. We compare against MTL because it has the similar goal of providing high-performance code in C++ behind an elegant interface. Comparisons against ATLAS and Intel's MKL are provided as a performance baseline.

ITL already provides support for using MTL as its numerical library. We adapted ITL's existing interface for Sparse BLAS to dense BLAS, allowing the ITL solvers to work with ATLAS and Intel's Math Kernel Library. We analysed the performance of five iterative solvers suitable for asymmetric matrices, namely Conjugate Gradient Squared, BiConjugate Gradient, BiConjugate Gradient Stabilised, Quasi-Minimal Residual and Transpose Free Quasi-Minimal Residual. The Chebyshev Iteration and Preconditioned Richardson solvers were not benchmarked due to the need to generate coefficient matrices with appropriate spectral properties.

We did not benchmark the Generalised Conjugate Residual (GCR) and Generalised Minimum Residual (GMRES) solvers because they required additional functionality from the matrix abstraction that we did not have time to implement (e.g. access to matrix columns as vectors). Additionally, all benchmarks used the identity preconditioner, since all other ITL preconditioners were coded specifically to MTL. This is because the ITL-specified interface provides insufficient functionality to implement them directly (e.g. decomposition into diagonal, upper and strictly lower parts for the Symmetric Successive Overrelaxation preconditioner).

The version of ITL used was 4.0.0-1. The version of MTL used was 2.1.2-23 of MTL2. Version 10.1 of the Intel C and C++ Compilers was used both for the runtime compiled code generated by our library and for the compilation of the MTL benchmarks. The options passed to the Intel C and C++ compilers are described in Table 3.2. The version of Intel's MKL library was 10.0.2.018. Version 3.8.1 of ATLAS was used on architecture 1 (Xeon, see below). On architecture 2 (Pentium IV, see below) we took the slightly unusual step of comparing against ATLAS 3.6.0 with pre-collected tuning results from the Ubuntu distribution for the Pentium IV with SSE2 support. We did this because we found that it outperformed any locally tuned ATLAS we could build. Both ATLAS builds used GCC 4.2.1 for kernel code compilation. We note that, at the time of writing, the online ATLAS installation guide advocates the use of GCC 4.2 over previous series 4 and 3 versions and also advises against the use of Intel's C Compiler for compiling ATLAS kernel code.

In order to show trends more clearly, we show throughput in terms of floating point operations per second. The number of floating point operations required has been estimated from the ITL implementation of the iterative solvers. It is important to note that our library requires that we invoke a compiler at runtime and hence incur a compilation overhead. We omit this from the throughput graphs, as the relative effect of this overhead is dependent on numerous factors, including the size of the matrices involved and the number of iterations the solvers are run for. An indication of the compilation overhead for one of our architectures is given in Table 3.1.

The number of compiler invocations in our benchmarks is independent of the number of solver iterations (when ≥ 3), as compiled code fragments are reused each iteration. The number of invocations corresponds to the number of unique expression DAGs generated. Two interrelated factors affect this value:

Force Points Any point in the execution of a solver where the value of an expression is required forces evaluation of its associated expression DAG. Larger numbers of force points tend to result in more unique expression DAGs and therefore more compiler invocations. Within the main loop of many of the iterative solvers, force points exist both for checking the convergence condition and for checking various conditions to detect solver breakdowns.

Control Flow Changes in a program’s control flow may cause it to construct novel expression DAGs. In our solvers, the expression DAGs constructed on the first and second iterations of the main loop differ due to the code that preceded each iteration. It is only on the third iteration of the loop that all generated expression DAGs can be evaluated using previously compiled code fragments.

We will discuss the observed effects of the different optimisation methods we implemented, and we conclude with a comparison against the same benchmarks using MTL, ATLAS and Intel’s MKL. We evaluated the performance of the solvers on two architectures. All the solvers are single threaded and use double precision.

1. Intel Xeon “Clovertown” processor running at 2.66GHz, 4096 KB L2 cache with 4 GB RAM running 64-bit Ubuntu 7.04.

Solver     Compiler       Total Compilation   Total Execution    Total Execution
           Invocations    Time                Time (size 500)    Time (size 5000)
bicg             9         0.929               0.999              20.340
bicgstab        10         0.942               1.033              36.170
cgs              9         0.930               1.025              35.977
qmr             12         1.120               1.237              20.659
tfqmr            9         0.887               1.056              36.292

Table 3.1: The number of compiler invocations in each iterative solver, the total compiler overhead in seconds and total execution time (including compilation) for 256 iterations of each solver with a problem size of 500 and 5000 for architecture 1. Liveness analysis (Section 3.6) was disabled since it consistently lowered performance. This was due to the increased number of compiler invocations required.

Option       Description
-O3          Enables the most aggressive level of optimisation, including loop and
             memory access transformations, and prefetching.
-restrict    Enables the use of the restrict keyword for disambiguating pointers.
             The restrict keyword is not used in MTL but is used in our runtime
             generated code.
-ansi-alias  Allows icc to perform more aggressive optimisations if the program
             adheres to the ISO C aliasing rules.
-xT          Generate code specialised for Intel Core2 Duo (for architecture 1).
-xP          Generate code specialised for Intel Pentium 4 with SSE3 (for
             architecture 2).

Table 3.2: The options supplied to Intel C/C++ compilers and their meanings.

2. Pentium IV “Prescott” processor running at 3.2GHz with Hyper-threading disabled, 2048 KB L2 cache with 2 GB RAM running 32-bit Ubuntu 7.04.

The first optimisation implemented was loop fusion. Three of the five benchmarks did not show any noticeable improvement with this optimisation. Visual inspection of the runtime generated code showed multiple loop fusions had occurred between vector-vector operations but not between matrix-vector operations. As we were working with dense matrices, these fusions were unable to contribute any significant performance effects, given that vector-vector operations are O(n) and the matrix-vector multiplies present in each solver are O(n^2).

We obtained significant speedups from loop fusion between matrix-vector multiply operations on two benchmarks, the BiConjugate Gradient solver and the Quasi-Minimal Residual solver. The first required no modification to ITL, but the latter required minor changes to ITL (the QMR matrix-vector multiply fusion result had not yet been achieved in the LCSD'06 paper [2]). We observed that the checks for QMR breakdown forced the evaluation of a matrix-vector multiply before a second matrix-vector multiply, making fusion impossible. Data dependencies permitted moving the second (transpose) matrix-vector multiply to be adjacent to the first, enabling fusion. We note that, as this move disregarded the control dependence between the transpose matrix-vector multiply and the QMR breakdown check, our changes also had the minor effect that the multiply is executed unnecessarily in the event of solver breakdown.

Figure 3.7: Throughput (MFLOPs) of the BiConjugate Gradient (BiCG) and Quasi-Minimal Residual (QMR) solvers running on architecture 2, with and without loop fusion, for matrix sizes up to 5000.

In both these cases the loop fuser was able to fuse a matrix-vector multiply and a transpose matrix-vector multiply with the result that the matrix involved was only iterated over once for both operations. A graph of the speedup obtained across matrix sizes is shown in Figure 3.7.

The second optimisation implemented was array contraction. We only evaluated this in the presence of loop fusion, as the former is facilitated by the latter. The array contraction pass did not show any noticeable improvement on any of the benchmark applications. On visual inspection of the runtime generated code, we found that array contractions had occurred in all the iterative solvers. The number of array contractions for each iterative solver is listed in Table 3.3. However, these only occurred on vectors, and affected only vector-vector operations. This is not surprising, since only one matrix is used during the execution of the linear solvers and, as it was required for all iterations, it could not be optimised away. Section 3.8 describes the extension of our library to sparse matrices, which makes the effect of array contraction more apparent since the cost of the matrix-vector multiply operations becomes O(n) instead of O(n^2).

Solver     Total array contractions   Array contractions in repeatedly executed code
bicg                13                                    5
bicgstab            12                                    7
cgs                 17                                    7
qmr                 10                                    9
tfqmr               13                                   10

Table 3.3: Number of array contractions occurring in each iterative solver. Total array contractions refers to the number of array contractions performed in code generated during the execution of the solver. Array contractions in repeatedly executed code refers to the number of array contractions that occurred in code executed by the solver each iteration. Liveness analysis (Section 3.6) was disabled since it consistently lowered performance, due to the increased number of compiler invocations required.

The last technique we implemented was runtime liveness analysis. This was used to try to recognise which expression DAG nodes were dead to allow them to be allocated locally to runtime generated code.

The runtime liveness analysis mechanism was able to find vectors in three of the five iterative solvers that could be allocated locally to the runtime generated code. These three solvers had an average of two vectors each, located in repeatedly executed code, that could be optimised away. Unfortunately, use of the liveness analysis mechanism resulted in an overall decrease in performance. We discovered this to be because the liveness mechanism incurred extra constant overhead due to more compiler invocations at the start of the iterative solver. This was due to the statistical nature of the liveness prediction: as it changed its beliefs with regard to whether a value was live or dead, a greater number of runtime generated code fragments had to be compiled.

We also compared Desola against the Matrix Template Library (MTL), Intel's Math Kernel Library (IMKL) and ATLAS, running the same benchmarks. We enabled the loop fusion and array contraction optimisations, but did not enable the runtime liveness analysis mechanism because of the overhead previously discussed.

We observe that on the BiCG and QMR benchmarks, on which we were able to perform matrix-vector loop fusion, we outperform the other implementations on both architectures at matrix sizes above 1500. We show the performance of the different implementations with the BiCG benchmark running on architecture 1 in Figure 3.8 and QMR on architecture 2 in Figure 3.9. Performance results for the QMR and BiCG benchmarks are similar on both architectures.

On architecture 1, we note that IMKL and ATLAS appear to perform particularly well with matrix sizes smaller than 1000. On the BiCGSTAB, TFQMR and CGS benchmarks, we can outperform MTL and ATLAS for matrix sizes above 2000, but do not achieve the performance of IMKL. Performance comparisons for the TFQMR benchmark are shown in Figure 3.10. Results for the BiCGSTAB and CGS benchmarks are similar.

On architecture 2, on the BiCGSTAB, TFQMR and CGS benchmarks, we outperform IMKL and MTL at matrix sizes above 1500. ATLAS consistently outperforms all other implementations at all matrix sizes. Performance comparisons for the TFQMR benchmark are shown in Figure 3.11. Results for the BiCGSTAB and CGS benchmarks are similar.

Once again, we stress that these graphs ignore the compilation overhead, which is a constant in the case of the iterative solvers (see Table 3.1). Therefore, the relative effect of this overhead on performance will vary depending on the size of the problem and the number of iterations required to meet convergence. We also note that mechanisms such as a persistent code cache could allow the compilation overheads to be significantly reduced. These overheads are discussed in Section 3.9.


Figure 3.8: Throughput of the BiConjugate Gradient (BiCG) solver using Desola, MTL, ATLAS and IMKL running on architecture 1. Estimated throughput for IMKL at matrix size 500 is 4504 MFLOPs. Throughput for Desola ignores the constant compilation overhead (see Table 3.1).


Figure 3.9: Throughput of the Quasi-Minimal Residual (QMR) solver using Desola, MTL, ATLAS and IMKL running on architecture 2. Throughput for Desola ignores the constant compilation overhead (see Table 3.1).


Figure 3.10: Throughput of the Transpose Free Quasi-Minimal Residual (TFQMR) solver using Desola, MTL, ATLAS and IMKL running on architecture 1. Estimated throughput for IMKL at matrix size 500 is 4233 MFLOPs. Throughput for Desola ignores the constant compilation overhead (see Table 3.1).


Figure 3.11: Throughput of the Transpose Free Quasi-Minimal Residual (TFQMR) solver using Desola, MTL, ATLAS and IMKL running on architecture 2. Throughput for Desola ignores the constant compilation overhead (see Table 3.1).

3.8 Extension to Sparse Matrices

Desola was originally restricted to dense linear algebra, as this provided a simple domain in which to investigate and understand the various aspects of delayed evaluation and runtime code generation. However, our benchmark suite, a set of iterative solvers for linear systems of equations, is of only limited usefulness when employed with dense linear algebra. This is because direct techniques, such as Gaussian elimination, can be used to solve these systems in O(n^3) operations.

Direct techniques are less applicable to sparse systems because they typically cause large increases in the number of non-zeros in the matrix (fill-in), causing it to become dense and making it extremely memory-inefficient and impractical to store. The primary benefit of iterative methods is that they do not have this issue and thus can be used to solve sparse linear systems. We note that implementing direct sparse solvers that limit fill-in remains an active research topic, with scalable parallelisation being an important issue [43, 44].

3.8.1 Sparse Matrix Storage

Compressed Row Storage [28] and Compressed Column Storage are two of the most common data structures for representing sparse matrices. I added support for CRS storage and two different implementations of sparse matrix-vector multiply to Desola. An example of a small sparse matrix in CRS representation is given in Figure 3.12.

3.8.2 Code Generation

We expect that the optimisations we apply to the runtime generated code may have different behaviours on the sparse solvers compared to the dense case. In particular:

Performance is matrix-dependent. In a matrix-vector multiply, the sparsity pattern of the matrix determines the access pattern to the vector. As the cache access pattern for the vector is matrix-dependent, a matrix-vector multiply that performs well on one sparse matrix may perform poorly on another.

    [ 1 0 0 7 0 0 ]
    [ 0 2 0 0 0 8 ]
    [ 0 0 0 0 9 0 ]     (3.1)
    [ 0 0 0 6 0 0 ]
    [ 5 8 0 0 4 0 ]
    [ 0 0 0 7 0 2 ]

    val      1  7  2  8  9  6  5  8  4  7  2
    col_ind  1  4  2  6  5  4  1  2  5  4  6
    row_ptr  1  3  5  6  7 10 12

Figure 3.12: A Compressed Row Storage representation of the matrix in Equation 3.1 is held by the arrays val, col_ind and row_ptr. Array val contains the non-zero values in the matrix. Array col_ind holds the corresponding columns of the non-zero values. Array row_ptr holds the indices at which each row starts in the val and col_ind arrays. This example uses one-based indexing; zero-based indexing may be used as well.

Sparse matrix-vector multiply is O(n). We apply loop fusion and array contraction optimisations to the runtime generated code. This has the effect of reducing the memory usage and improving locality for a number of O(n) vector-vector operations. However, dense matrix-vector multiply is an O(n^2) operation, so these optimisations have little observable effect apart from the loop fusions between dense matrix-vector multiplies. Many sparse matrices from scientific applications have an upper bound on the number of non-zeros per row that is independent of system size, making the sparse matrix-vector multiply an O(n) operation. As a consequence, we expect the benefits of these optimisations to be more apparent.

Loop fusion is more difficult to achieve. In order to perform loop fusion between compatible loops, our optimisation pass only required that the loop bounds were identical and constant. This worked for dense matrix-vector multiply, but is harder in the sparse case as loop bounds are dependent on the matrix structure.

Vectorisation is difficult to achieve. As accesses into the left-hand side vector are no longer contiguous, it is difficult to make effective use of vector instructions.

currentRow = 0
for valIndex = 0 to length(val) {
  while row_ptr[currentRow+1] == valIndex
    currentRow += 1

  b[currentRow] += val[valIndex] * x[col_ind[valIndex]]
}

Figure 3.13: Pseudo-code implementation of CRS sparse matrix-vector multiply using an inner while-loop. The outer for-loop iterates along all the non-zero values in the matrix whilst the inner while-loop is used to update the corresponding row.

for row = 0 to numRows {
  valStartIndex = row_ptr[row]
  valEndIndex = row_ptr[row+1]

  for valIndex = valStartIndex to valEndIndex
    b[row] += val[valIndex] * x[col_ind[valIndex]]
}

Figure 3.14: Pseudo-code implementation of CRS sparse matrix-vector multiply using an inner for-loop. The outer for-loop iterates over the matrix rows whilst the inner for-loop is used to iterate over the non-zero values for each row.

Sparse Matrix Iteration

We developed two different implementations of sparse matrix-vector multiply. The version in Figure 3.13 uses a for-loop to iterate over all the non-zero values of the matrix. An inner while-loop is used to increment a variable storing the current row, by detecting when the offset into the val array is equal to the index at which the next row starts. A while-loop rather than an if-statement is required to handle rows with no non-zero values.

The implementation in Figure 3.14 uses two nested for-loops. The outer loop iterates over each row of the sparse matrix and the inner loop iterates over the range of the val array corresponding to the non-zero values in that row.
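For reference, the following is a self-contained, zero-based C++ rendering of the double for-loop variant. It is illustrative only: Desola generates specialised code at runtime rather than calling a fixed routine like this, and the type and function names are our own.

#include <cstddef>
#include <vector>

struct CrsMatrix {
  std::size_t numRows;
  std::vector<double> val;          // non-zero values
  std::vector<std::size_t> colInd;  // column of each non-zero
  std::vector<std::size_t> rowPtr;  // numRows+1 entries; row i occupies
                                    // positions [rowPtr[i], rowPtr[i+1])
};

// b += A * x
void spmv(const CrsMatrix& A, const std::vector<double>& x,
          std::vector<double>& b) {
  for (std::size_t row = 0; row < A.numRows; ++row)
    for (std::size_t k = A.rowPtr[row]; k < A.rowPtr[row + 1]; ++k)
      b[row] += A.val[k] * x[A.colInd[k]];
}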

High-Level Fusion

Our double for-loop implementation of sparse matrix iteration cannot be fully fused by our SUIF loop fusion pass, because the inner loop has non-constant bounds that our pass cannot handle. Our single for-loop implementation also cannot be fused, because our fuser cannot reason about the dependence implications of the inner while-loop.

In order to overcome these problems, we wrote a pass specifically for performing sparse matrix-vector multiply fusion. Rather than attempting to perform this optimisation on the AST of the generated code, we do it at the level of the TaskGraph DAG. The TaskGraph DAG still resembles the high-level expression tree, but contains information specific to code generation, such as the names of variables to be bound and the data storage formats of expressions represented in the DAG.

We add a new node type to the TaskGraph DAG which represents a number of matrix-vector and transpose matrix-vector multiplies involving the same matrix but different vectors. We also add a pass that takes individual (possibly transpose) matrix-vector multiplies and combines them if they operate on the same matrix and there are no dependencies between them.

This has the benefit that we are always able to generate fused code for matrix-vector multiplies, regardless of the complexity of the code for iterating over the matrix. We also avoid performing any redundant index computations that might have been done had we simply performed loop fusion. The differences between the code generated after fusion as performed by SUIF and by the high-level fusion pass can be seen by comparing Figures 3.15 and 3.16, respectively.

Row-Length Specialisation

We observe that for the majority of sparse matrices representing linear systems, the number of non-zeros in each row tends to fall into a small set of values. We use this property to specialise the code for iterating over a sparse matrix to the most frequent row lengths.


for row = 0 to numRows {
  valStartIndex_1 = row_ptr[row]
  valEndIndex_1 = row_ptr[row+1]

  for valIndex_1 = valStartIndex_1 to valEndIndex_1
    b_1[row] += val[valIndex_1] * x_1[col_ind[valIndex_1]]

  valStartIndex_2 = row_ptr[row]
  valEndIndex_2 = row_ptr[row+1]

  for valIndex_2 = valStartIndex_2 to valEndIndex_2
    b_2[col_ind[valIndex_2]] += val[valIndex_2] * x_2[row]
}

Figure 3.15: Pseudo-code implementation of CRS sparse matrix-vector multiply and transpose matrix-vector multiply after application of the SUIF loop fusion pass. The pass is unable to fuse the inner loops and redundant index calculations are performed.

for row = 0 to numRows {
  valStartIndex = row_ptr[row]
  valEndIndex = row_ptr[row+1]

  for valIndex = valStartIndex to valEndIndex {
    b_1[row] += val[valIndex] * x_1[col_ind[valIndex]]
    b_2[col_ind[valIndex]] += val[valIndex] * x_2[row]
  }
}

Figure 3.16: Pseudo-code implementation of a fused CRS sparse matrix-vector multiply and transpose matrix-vector multiply. The code is generated completely fused and there are no redundant calculations.

for (row_0 = 0u; row_0 <= 4690001u; row_0++) {
  valPtrStart_0 = rowPtr_0[row_0];
  valPtrEnd_0 = rowPtr_0[row_0+1];
  rowLength_0 = valPtrEnd_0 - valPtrStart_0;

  if (rowLength_0 == 5u) {
    for (valOffset_0 = 0; valOffset_0 <= 4u; valOffset_0++) {
      (*convVector_2)[row_0] = (*convVector_2)[row_0] +
        val_0[valPtrStart_0 + valOffset_0] *
        (*convVector_0)[colInd_0[valPtrStart_0 + valOffset_0]];
    }
    continue;
  }

  if (rowLength_0 == 3u) {
    for (valOffset_0 = 0; valOffset_0 <= 2u; valOffset_0++) {
      (*convVector_2)[row_0] = (*convVector_2)[row_0] +
        val_0[valPtrStart_0 + valOffset_0] *
        (*convVector_0)[colInd_0[valPtrStart_0 + valOffset_0]];
    }
    continue;
  }

  for (valPtr_0 = valPtrStart_0; valPtr_0 <= valPtrEnd_0-1; valPtr_0++) {
    (*convVector_2)[row_0] = (*convVector_2)[row_0] +
      val_0[valPtr_0] * (*convVector_0)[colInd_0[valPtr_0]];
  }
}

Figure 3.17: Specialised C generated by the Desola BiConjugate Gradient Stabilised solver for a matrix-vector multiply with the matrix rajat31. The u suffix on some integer values is C syntax for declaring an integer literal to be unsigned.

For the sparse matrix iteration implementation using two nested for-loops, we generate a number of variants of the inner loop specialised to the number of non-zeros in that row. These are ordered by frequency so that the most commonly executed loops require fewer conditionals to be checked. To avoid generating code for rows with infrequent numbers of non-zeros, we specialise to 95% of the matrix rows. The remaining rows are iterated over using the unspecialised version of the inner loop. Through this specialisation, we hope that the compiler can use techniques such as loop unrolling to improve the performance of the code.

An example of the specialised code is given in Figure 3.17.

3.8.3 Matrices

For our dense benchmark suite, we generated matrices of various sizes with appropriate numerical properties. For our sparse benchmark suite, this is no longer appropriate, as performance is matrix-dependent.

We chose the following matrices from the University of Florida Sparse Matrix Collection:

Name          Rows       Columns    Non-zeros    Non-zeros   Structure    Problem kind
                                                 per row
ASIC 680k      682,862    682,862    2,638,997     3.86      asymmetric   circuit simulation
Chebyshev4      68,121     68,121    5,377,761    78.94      asymmetric   structural
ex11            16,614     16,614    1,096,948    66.03      asymmetric   computational fluid dynamics
rajat26         51,032     51,032      247,528     4.85      asymmetric   circuit simulation
rajat31      4,690,002  4,690,002   20,316,253     4.33      asymmetric   circuit simulation
torso1         116,158    116,158    8,516,500    73.32      asymmetric   2D/3D problem

3.8.4 Results

We benchmarked Desola with different optimisations and code-generation options enabled against MTL and Intel's MKL. All benchmarks were single threaded and used double precision. The sparse matrices were represented using the Compressed Sparse Row storage format in the Desola and IMKL benchmarks, and MTL's own internal format in the MTL benchmarks, as an MTL CSR implementation was not available.

The version of ITL used was 4.0.0-1. The version of MTL used was 2.1.2-23 of MTL2. Version 11.0 of the Intel C and C++ Compilers was used both for the runtime compiled code generated by our library and for the compilation of the MTL benchmarks. The options passed to the Intel C and C++ compilers are described in Table 3.2. The version of Intel's MKL library was 10.1.1.019.

The two architectures used for benchmarking were:

1. Intel Xeon “Clovertown” processor running at 2.66GHz, 4096 KB L2 cache with 4 GB RAM running 64-bit Ubuntu 8.10.

2. Pentium IV “Prescott” processor running at 3.2GHz with Hyper-threading disabled, 2048 KB L2 cache with 2 GB RAM running 32-bit Ubuntu 8.10.

Our complete benchmark results are presented in Appendix B. We summarise them here for clarity.

We observe that our double for-loop implementation of sparse matrix-vector multiply consistently performs better than our other implementation, which uses an inner while-loop. For this reason we do not show the for-while based implementation in conjunction with the other optimisations.

The high-level fusion optimisation has a significant performance impact on the BiConjugate Gradient and Quasi-Minimal Residual solvers. These are the solvers on which matrix-vector multiply fusion provided significant performance improvements in the dense case. We note that this performance improvement is most significant on sparse matrices with high numbers of non-zeros per row. An example of this is illustrated in Figure 3.18.

We have not analysed why the for-for implementation outperforms the for-while one. We note that the variable storing the current matrix row is more susceptible to induction analysis in the for-for implementation. We also note that it is possible the compiler has performed loop unrolling based on the dynamically determined row lengths in the for-for implementation. This is consistent with the observation that our sparse matrix-vector multiply seems to perform best on matrices with high numbers of non-zeros per row.

Our array contraction optimisation also improves performance in the majority of benchmarks and never decreases it. In our dense benchmarks, we theorised that array contraction had no observable effect because the execution time was dominated by the O(n^2) dense matrix-vector multiply, whereas the array contractions affected O(n) vector-vector operations. In our sparse benchmarks, the matrix-vector multiply operations are also O(n), allowing the contraction optimisations to produce observable effects.

Figure 3.18: Time in seconds to execute 256 iterations of the Quasi-Minimal Residual solver with matrix torso1 (nnz = 8,516,500, rows = 116,158, nnz/row = 73.3) on architecture 1. The plot compares Desola variants (inner-while; inner-for; and inner-for with combinations of fusion, high-level fusion, contraction, specialisation and liveness analysis, with compilation overhead shown separately) against MTL and Intel MKL.

Unfortunately, our row-length specialisation optimisation does not produce a consistent performance impact across our benchmarks, and in a number of cases significantly increases compilation overhead. For this reason, we do not present performance results for this transformation.

Desola's performance relative to our Intel MKL implementation varies with both the solver and the matrix used. For example, we consistently outperform the IMKL implementation for all benchmarks on matrices ASIC_680k and rajat31. On other matrices, the performance of our solvers is less consistent.

The factors which affect the performance of Desola against the IMKL implementation are unclear. Both matrices ASIC_680k and rajat31 have low (< 5) numbers of non-zeros per row. However, matrix rajat26 also has a low number of non-zeros per row, but has performance similar to that of the IMKL implementation.

3.9 Conclusions and Further Work

We have presented Desola, a prototype library that allows the composition and optimisation of arbitrary sequences of kernels. The experimental results from Section 3.7 show that for larger sizes of dense matrices we match or exceed the performance of MTL, and when fusion of matrix operations occurs, we exceed the performance of ATLAS and IMKL. For sparse matrices, our experimental results from Section 3.8.4 show that in some cases we can significantly improve upon the performance of IMKL.

Furthermore, we do this while providing a relatively simple library interface, because by handling kernel composition at runtime, the library user is not required to assist the library by mapping their application onto a specific set of kernels.

Numerical libraries such as BLAS have had to adopt a complex interface to obtain the performance they provide. Libraries such as MTL use unconventional techniques to work around the limitations of conventional libraries to provide both simplicity and performance. The library we developed also uses unconventional techniques, namely delayed evaluation and runtime code generation, to work around these limitations. The effectiveness of this approach provides more compelling evidence towards the benefits of Active Libraries [39].

We have shown how a framework based on delayed evaluation and runtime code generation can achieve high performance on certain sets of applications. We have also shown that this framework permits optimisations such as loop fusion and array contraction to be performed on numerical code where it would not be possible otherwise, due to either compiler limitations or the difficulty of performing these optimisations across procedural boundaries.

We have shown that loop fusion can be an effective technique for optimising both dense and sparse linear algebra. However, the additional improvements provided by our high-level fusion pass suggest that optimising the AST of the generated code alone is insufficient for high-performance code generation. The AST-level loop fusion pass is limited to fusing loops it can prove equivalent to the original code, whereas by recognising kernels that use the same pattern of iteration, we can generate fused kernels regardless of their complexity.

We have also shown the performance improvements array contraction can provide for sparse linear algebra, by removing temporary vectors that would have been difficult or impossible to contract in a BLAS based implementation.

While we have concentrated on the benefits such a framework can provide, we have paid less attention to the situations in which it can perform poorly. The overheads of the delayed evaluation framework, expression DAG caching and matching, and runtime compiler invocation will be particularly significant for programs which have a large number of force points and/or use small matrices and vectors. Some of these overheads can be reduced. Methods include:

Persistent compiled code caching This would allow cached compiled code fragments to persist across multiple executions of the same program, avoiding compilation overheads on future runs. The extent to which this would benefit performance would depend on how likely subsequent program executions were to generate the same expression DAGs and how specialised the generated code fragments were to values such as vector sizes and sparsity patterns. The more specialised a code fragment is to problem-specific values, the less likely it is to be reusable. One possible realisation is sketched after this list.

Evaluation using BLAS or static code Evaluation of the delayed expression DAG using BLAS or a statically compiled code variant would allow the overhead of runtime code generation to be avoided when it is believed that runtime code generation would provide no benefit.
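As a sketch of the first of these methods, the following hypothetical cache keys previously compiled shared objects by a hash of the expression DAG (and the values it was specialised against). The hash computation and compileToSharedObject are assumed helpers, not existing Desola interfaces:

#include <cstdint>
#include <map>
#include <string>
#include <dlfcn.h>

typedef void (*KernelFunction)(double** parameters);

// Assumed helper: compiles the DAG with the given hash to a shared object
// at 'path' and returns a dlopen-style handle to it.
void* compileToSharedObject(std::uint64_t dagHash, const std::string& path);

class PersistentCodeCache
{
  std::map<std::uint64_t, KernelFunction> loaded;
  std::string cacheDir;

public:
  explicit PersistentCodeCache(const std::string& dir) : cacheDir(dir) {}

  KernelFunction lookup(const std::uint64_t dagHash)
  {
    const std::map<std::uint64_t, KernelFunction>::const_iterator it =
      loaded.find(dagHash);
    if (it != loaded.end())
      return it->second;

    // Try a shared object written by a previous execution of the program;
    // invoke the compiler only on a complete miss.
    const std::string path = cacheDir + "/" + std::to_string(dagHash) + ".so";
    void* handle = dlopen(path.c_str(), RTLD_NOW);
    if (handle == NULL)
      handle = compileToSharedObject(dagHash, path);

    const KernelFunction kernel =
      reinterpret_cast<KernelFunction>(dlsym(handle, "kernel"));
    loaded[dagHash] = kernel;
    return kernel;
  }
};

As noted above, the viability of such a cache hinges on how specialised the generated fragments are: the more problem-specific values are baked in, the lower the expected hit rate.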

Investigation of other applications using numerical linear algebra would be required before the effectiveness of these techniques can be evaluated.

We also note that while we aim for our system to be transparent to the library user, the placement of force points can have a significant effect on performance. As we observed in the Quasi-Minimal Residual solver, force points place a barrier between the operations that may be optimised together, possibly resulting in lost optimisation opportunities. One method to combat this would be to perform speculative evaluation based on detecting repeatedly evaluated sequences of expressions.

Other plans for future work include:

Client-Level Algorithms Currently, all delayed operations correspond to nodes of specific types in the delayed expression DAG. Any library client needing to perform an operation not present in the library would either need to extend it (difficult), or implement it using element-level access to the matrices or vectors involved (poor performance). The ability of the client to specify algorithms to be delayed would significantly improve the usefulness of this approach.

Improved Optimisations We implemented restricted forms of loop fusion and array contraction. Improvements to these optimisations, or applying others such as loop skewing or tiling, could improve the compiled code's performance further. It could also reduce dependence on the quality of the runtime-generated code and the effectiveness of the vendor compiler used.

Parallelisation This provides a number of interesting research topics. Loop fusion can inhibit parallelisation when sequential and parallel loops are fused. We need to be able to choose when to fuse and when to parallelise, taking these interactions into account. In addition, there is the issue of data alignment when parallelisation is considered in a distributed-memory setting. Work by Beckmann and Kelly [42] has already investigated these issues in the context of delayed evaluation.

In the following chapters, we continue our exploration of active libraries. We have chosen the domain of finite element solvers for partial differential equations. This domain is a superset of our work on sparse linear algebra. As well as including the solution of sparse linear systems, it requires capture of other domain-specific expressions. We also attempt to extend our abstractions further, incorporating knowledge of iteration into our captured expressions.

Chapter 4

Excafé: An Expression Capturing Finite Element Library

To continue our exploration of active libraries, we needed a domain which would benefit from the abstraction and optimisation abilities that active libraries offer and would also enable us to push our investigation further by requiring new, more complex abstractions and facilitating interesting domain-specific optimisations.

4.1 The Finite Element Method

The domain we chose was the solution of partial differential equations using the finite element method. As a tool for performing this investigation, we have developed an active finite element library in C++ that we call Excafé.

In contrast to Desola, which was specifically designed to perform expression capture transparently to the client, Excafé does this explicitly, requiring the client to build an object that represents all the information required to solve a given finite element problem.

We assume familiarity with the finite element method and its implementation. Otherwise, readers are encouraged to consult Sherwin et al. [26] or Logg's paper on automating the finite element method [34].

Finite element solvers typically solve partial differential equations over some physical domain to find the value of unknown scalar and/or vector fields. Any solution to the equations is approximate and represented by a linear combination of so-called basis functions. Any approximated field can be represented by a list of coefficients which are used to weight the basis functions before summation. These lists of coefficients are usually stored in vectors and determined as the solution of a sparse linear system of equations.
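Concretely, denoting the basis functions by $\phi_i$, an approximated scalar field takes the form

\[
u(x) \approx u_h(x) = \sum_{i=1}^{N} u_i \phi_i(x)
\]

so the coefficient vector $(u_1, \ldots, u_N)$ is the discrete representation of the field determined by the linear solve.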

The system of equations is constructed using the partial differential equations governing the system after being re-written into an appropriate form (including linearisation of non-linear terms if necessary). System matrices and vectors are constructed from bilinear and linear forms respectively. Both correspond to descriptions of integrals over the mesh, defined using linear differential operators applied to basis functions.

4.2 Our Investigation

Excafé performs expression capture at the level of basis functions, bilinear forms, and matrices and vectors corresponding to discretised operators and tensor fields. Our investigation focusses on two main areas:

Abstractions and Expression Capture

Desola performed expression capture and optimisation transparently to the library client. The technique of building DAGs to represent delayed computations, then optimising and executing them on demand, worked well, but also demonstrated two important limitations:

1. The on-demand evaluation of expressions that was needed to perform optimisation transparently also meant that Desola never saw a complete representation of the problem being solved. Conditionals on delayed expressions caused the delayed computations to be evaluated, since capture of conditionals could not be implemented transparently. This resulted in research into the development of heuristics to estimate which expression DAG nodes contained live or dead values (Section 3.6).

2. The lack of ability to capture (or represent) loops in Desola required the development of low-cost expression DAG comparisons that could be used to detect repeatedly constructed expression DAGs and reuse previously optimised code.

In Excafé, we have attempted to take expression capture to a higher level. Firstly, we have made expression capture explicit, as only by doing this can we capture a complete view of the finite element problem to be solved. Secondly, we have developed a mechanism for specifying iteration within our captured expressions.

Most importantly, our iteration capture does not work imperatively as in TaskGraph [13], but uses a declarative syntax inspired by mathematical subscript notation. Excafé infers the loop structure itself and the programmer is freed from manually managing constructs such as cyclic buffers.

Local Assembly Optimisation Using expressions captured from our basis functions and bilinear forms, we search for domain-specific optimisations that reduce the operation count required to perform local assembly: the process of constructing the small dense matrices that are summed into the large sparse system matrix (a schematic of this process follows below).

In doing so we build upon existing work on polynomial factorisation [1] by improving the scoring function for factorisations and developing an algorithm to find maximum scoring factorisations whilst also scaling to the large problem sizes we deal with.

Work on local assembly optimisation is still ongoing and has not yet reached the stage at which the effectiveness of the approach can be fully evaluated.
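To fix terminology, local assembly and the subsequent scatter into the global system matrix can be sketched as follows. The data structures and the computeLocalMatrix helper are illustrative assumptions, not Excafé interfaces:

#include <cstddef>
#include <map>
#include <vector>

// A deliberately simple sparse matrix: row -> (column -> value).
typedef std::vector<std::map<int, double> > SparseMatrix;

// Assumed helper: evaluates the cell-local integrals of the bilinear form,
// filling the small dense matrix 'local'. This is the step whose operation
// count our optimisations target.
void computeLocalMatrix(std::vector<std::vector<double> >& local,
                        std::size_t cell);

void assembleSystem(SparseMatrix& A,
                    const std::vector<std::vector<int> >& cellDofs)
{
  for (std::size_t cell = 0; cell < cellDofs.size(); ++cell)
  {
    // Global degree-of-freedom numbers of this cell's basis functions.
    const std::vector<int>& dofs = cellDofs[cell];
    const std::size_t n = dofs.size();

    std::vector<std::vector<double> > local(n, std::vector<double>(n, 0.0));
    computeLocalMatrix(local, cell);

    // Sum the small dense local matrix into the large sparse system matrix.
    for (std::size_t i = 0; i < n; ++i)
      for (std::size_t j = 0; j < n; ++j)
        A[dofs[i]][dofs[j]] += local[i][j];
  }
}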

We emphasise that despite being designed to capture enough information to perform code generation for the majority of a finite element solver, Excafé does not currently perform any code generation. Our work has focussed on the development of Excafé's expression capture capabilities and our local assembly optimisations, the operation count of which we can evaluate without code generation.


Figure 4.1: Data flow in Excafé. Rectangular nodes represent data being manipulated. Elliptical nodes represent processes that operate on the data. The arcs show the relationships between the data and processes.

In the next section, we provide a high-level overview of how the various expressions captured by Excafé are analysed and manipulated to produce an optimised execution plan.

4.3 Data Flow in Excafé

We present a diagram of high-level data flow in Excafé in Figure 4.1.

The user initially provides a specification of operations between discretised fields, discretised operators and scalars which are used to produce a new state of the discretised system. Details of how the client specifies this are provided in Section 6.3 and details of the data structures involved in Section D.3.

The directed acyclic graph (DAG) of discrete expressions incorporates references to indices used to declare iteration. The DAG is processed by a loop inference pass that constructs an imperative execution schedule. Details of how the client specifies iteration are provided in Section 6.4. Details of the loop inference analysis are provided in Chapter 7.

Within the discrete expression DAG, it is likely that there will exist operators assembled from the specification of a bilinear form. The mechanism of bilinear form specification is described in Section 6.5 with details of the data structures involved in Section D.4. The bilinear form can reference both fields and scalar values from the discrete expression DAG.

Combining the specification of the bilinear form with the symbolic representation of basis functions allows Excafé to construct a symbolic representation of the scalar expressions contained within the local assembly matrix. How this is performed is described in Chapter 9. Details of the scalar expression representation are provided in Section D.5.

The symbolic representation of the local assembly matrix is subjected to an optimisation pass designed to exploit common subexpressions and factorisation to reduce the cost of performing assembly for each cell. These optimisations are discussed in Chapters 9 and 10.

The imperative execution schedule and the symbolic assembly matrix representation (although not yet the optimised one) are used to evaluate the discrete expression DAG and compute the new state of the discrete system. The data structures used to actually store the values computed by Excafé are described in Section D.2.

In the next section, we describe the current state of Excafé in representing certain aspects and variations of the finite element method.

4.4 Representation Capabilities of Excafé

In this section, I discuss the extent of Excafé's support for representing certain aspects of the finite element method. I also describe what would be required to extend Excafé to support the aspects it currently cannot handle.

Boundary Conditions Dirichlet boundary conditions are currently specified by attaching constant tensor values to mesh facets. This could easily be extended to non-constant values by attaching position-dependent functions.

Currently, boundary conditions must be represented by a discretised field, i.e. they must be discretised using a finite element approximation. Although we have not implemented the specification of boundary conditions using data from a file, it would be possible to construct a discretised field in this manner.

We have not yet considered the case of when boundary conditions are specified using time-dependent functions or time-dependent data from a file. Both these cases typically require that quadrature be used for efficiency (as discretising the function each time-step would be too expensive). This would require an extension of our symbolic representation to represent call-backs to the source of the boundary condition data.

Neumann boundary conditions are always incorporated as boundary integrals, which are described next.

Boundary Integrals We permit specification of boundary integrals, although support for them is incomplete in Excafé. This was primarily due to time constraints, since handling edge-normal and Jacobian calculations in a symbolic manner is more complex in the case of facet integrals.

Discontinuous Elements Excafé does not currently support discontinuous elements. In order to do so, we would need to extend our symbolic representation of assembly to handle referencing degrees of freedom associated with neighbouring elements.

True Tensor-Valued Elements Although all tensor-valued elements currently implemented in Excafé are built by repeating scalar basis functions, Excafé also supports true tensor-valued elements.

Constructing tensor-valued basis functions by repeating scalar-valued ones is sometimes hard-coded into finite element libraries in order to exploit sparsity properties. Excafé's symbolic representation naturally specialises to these cases, so there is no need for this restriction.

Higher-Degree Basis Functions Excafé currently implements linear and quadratic polynomial basis functions. However, Excafé can represent any basis function that can be expressed symbolically in its expression capture framework. Experimentation has been performed with different symbolic representation back-ends, including a GiNaC [45] based one and an alternative internal representation that is restricted to rational functions. Since Excafé needs to be able to perform symbolic integration of these functions, arbitrary-order polynomials are the best supported (and most basis functions can be approximated by them), as the worked example below illustrates. If using the GiNaC-based representation, it would also be possible to directly represent trigonometric basis functions (e.g. sin, cos).
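As a worked example of the symbolic integration involved (a standard computation, not taken from Excafé's implementation), the monomial $xy$ integrates exactly over the reference triangle $T$ with vertices (0,0), (1,0) and (0,1):

\[
\int_T xy \, dx \, dy = \int_0^1 \int_0^{1-x} xy \, dy \, dx
  = \int_0^1 \frac{x(1-x)^2}{2} \, dx = \frac{1}{24}
\]

Local assembly integrands built from polynomial basis functions and their derivatives reduce to sums of such monomial integrals, which is why arbitrary-order polynomials are well supported.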

Curved Elements Excafé currently uses a custom data structure to hold mesh geometry information and uses a linear basis to interpolate global co-ordinates. However, Excafé's representation would permit an arbitrary vector field to be used to describe the mesh geometry, facilitating curved elements.

However, we do note that curved elements result in the need to perform integrals of expressions that are no longer polynomial in the local co-ordinate system. This may be non-trivial to do symbolically, meaning that it may be necessary to fall back to quadrature-based approaches. This in turn could hinder our ability to optimise expressions.

We also note that it is often beneficial to only use curved elements on mesh boundaries, and use linear elements elsewhere. We have not yet considered how to handle this in Excafé.

4.5 Overview of the Excafé Chapters

The chapters on Excafé are organised as follows:

In Chapter 5 we present a heat solver implemented using Excafé. This serves as an introduction to the library and its client-visible expression capture mechanisms.

In Chapter 6 we provide a description of Excafé's expression capture facilities and discuss what information this provides to the library. In particular, we describe Excafé's declarative loop syntax that we use to perform iteration within a solver step.

In Chapter 7 we describe how we may infer an imperative loop structure and nesting from a specification captured as a DAG.

In Chapter 8 we describe the implementation of an incompressible Navier-Stokes solver, demonstrating Excafé solving a non-trivial problem. It incorporates many different aspects, including:

1. Linearisation of a term using declarative loop syntax to perform Picard iteration within each time-step.

2. Global assembly matrices constructed from multiple bilinear forms.

3. Use of composite function spaces and fields and the projection operator to extend and restrict those fields.

4. Multiple assemblies within a time-step. Assembly is performed to construct both the LHS of a linear system, and an operator used to transform a field to produce the RHS.

The local assembly matrices produced by our incompressible Navier-Stokes solver also serve as the basis on which we are evaluating our assembly optimisations.

In Chapter 9 we describe how Excafé constructs its representation of a local assembly matrix from the information previously captured. We discuss our approach to optimising evaluation and compare it against the optimisations developed in the FEniCS Form Compiler [38].

In Chapter 10 we discuss our improvements to the factorisation algorithm presented by Hosangadi et al. [1] and the construction of an algorithm to search for optimal matrix coverings utilising properties of bipartite graphs.

Chapter 5

Heat Solver Example in Excafé

In order to explore the optimisation opportunities present in finite element method implementations, I constructed an active finite element library in C++ named Excafé. In this chapter, I describe how a finite element problem can be specified and solved using Excafé.

Many of the design decisions in Excafé are similar to those in other finite element libraries. For this reason, they are detailed in Appendix C. This allows us to concentrate on the more important aspects of the library in this chapter.

5.1 A Simple Heat Conduction Problem

To illustrate the library, we show how to specify and solve a simple test problem. We choose a time-dependent heat conduction problem, the layout of which is shown in Figure 5.1.

Excafé includes an interface to the Triangle library [46], which can be used to construct simple two-dimensional meshes. We construct the mesh and central object with the code in Figure 5.2. The central diamond is associated with the label 5. This label will be used to identify its edges when applying Dirichlet boundary conditions.


[Figure: a 1m × 1m square domain with a 0.1m diamond at its centre; boundary temperatures as described in the caption.]

Figure 5.1: We specify a temperature of 0°C on the top, left and right boundaries and a temperature of 1°C on the bottom edge and the central diamond.

static const std::size_t dimension = TriangularMeshBuilder::cell_dimension;
static const double maxCellArea = 1.0/5000;
static const double polySize = 0.1;
static const std::size_t polyEdges = 4;
static const double polyLabel = 5;
static const double polyRotation = 0.0;

TriangularMeshBuilder meshBuilder(1.0, 1.0, maxCellArea);
const Polygon poly(vertex<2>(0.5, 0.5), polyEdges, polySize, polyRotation);
meshBuilder.addPolygon(poly, polyLabel);
Mesh mesh(meshBuilder.buildMesh());

Figure 5.2: The construction of the discretised domain used for our heat solver problem in Excafé. The mesh builder is instructed to place a 4-sided polygon in the centre of the domain and label its edges with the value 5.

5.2 Discretisation of the Heat Equation

In order to solve our problem using the finite element method, we must take the partial differential equation describing heat conduction and re-write it in a form suitable for the finite element method. We start with the heat equation with no external source:

\[
\frac{\partial u}{\partial t} - \alpha \nabla^2 u = 0 \tag{5.1}
\]

where:

$u$ is the temperature at some point.

$\partial u / \partial t$ represents the rate of change of temperature at some point with respect to time.

$\alpha$ is a scalar representing thermal diffusivity.

$\nabla^2$ is the Laplacian operator and represents the divergence of the gradient, $\nabla \cdot \nabla$.

We assume that we are solving this problem in a bounded domain Ω. To apply the finite element method, we need to convert the equation to its "weak form": we multiply by a function φ (the test function) and integrate over the domain Ω. We assume φ is zero on the boundaries on which we apply Dirichlet conditions.

\[
\int_\Omega \left( \frac{\partial u}{\partial t} - \alpha \nabla^2 u \right) \phi \, dx = 0 \tag{5.2}
\]

We discretise in time using the backward Euler method and introduce $u^n$ and $u^{n-1}$ to refer to the heat field at the current and previous time-steps, respectively:

\[
\frac{1}{k} \int_\Omega (u^n - u^{n-1}) \phi \, dx - \int_\Omega \alpha \nabla^2 u^n \phi \, dx = 0 \tag{5.3}
\]

We apply integration by parts to remove the Laplacian. This is required because our approximation of $u$ is of differentiability class $C^0$; that is, the derivative may be discontinuous, so we cannot take the second derivative.

\[
\frac{1}{k} \int_\Omega (u^n - u^{n-1}) \phi \, dx + \int_\Omega \alpha \nabla u^n \cdot \nabla \phi \, dx - \int_{\partial\Omega} \alpha (\nabla u^n \cdot n) \phi \, dS = 0 \tag{5.4}
\]

This introduces a boundary integral. Here, ∂Ω is the boundary of our domain, over which we integrate this term, and n is the outward-pointing surface normal. To solve these equations, we need to re-arrange them into a form where the left-hand side is a bilinear form in u and φ, and the right-hand side is a linear form in φ.

\[
\int_\Omega u^n \phi \, dx + k \int_\Omega \alpha \nabla u^n \cdot \nabla \phi \, dx - k \int_{\partial\Omega} \alpha (\nabla u^n \cdot n) \phi \, dS = \int_\Omega u^{n-1} \phi \, dx \tag{5.5}
\]

Since we have no non-Dirichlet boundary conditions and we assume φ is zero on all Dirichlet boundaries, our edge integral vanishes.

\[
\int_\Omega u^n \phi \, dx + k \int_\Omega \alpha \nabla u^n \cdot \nabla \phi \, dx = \int_\Omega u^{n-1} \phi \, dx \tag{5.6}
\]

We use the finite element discretisation to form a linear system of equations. The linear form on the RHS becomes a vector, and the bilinear form on the LHS becomes the application of an operator to our unknown $u^n$. This operator is represented by a sparse matrix. In discretised form, we can write our problem as:

\[
M u^n + k \alpha A u^n = M u^{n-1} \tag{5.7}
\]

Here $M$ is the mass matrix (assembled from $\int_\Omega u\phi \, dx$) and $A$ is the matrix assembled from $\int_\Omega \nabla u \cdot \nabla\phi \, dx$, so each time-step requires the solution of the sparse linear system $(M + k\alpha A)u^n = M u^{n-1}$. We solve this system of equations (with appropriate boundary conditions) at each time-step of our simulation. Next we describe how we specify this problem to Excafé.

// Define scenario over the given mesh
Scenario scenario(mesh);

// Define the basis we will use for our temperature field
Element temperature = scenario.addElement(new LagrangeTriangleLinear<0>());

// Define the discretisation of our temperature field
FunctionSpace temperatureSpace = scenario.defineFunctionSpace(temperature, mesh);

// Add the temperature field
temperatureField = scenario.defineNamedField("temperature", temperatureSpace);

Figure 5.3: The construction of an Excafé Scenario object. The finite elements and function spaces used in the solver are registered with the Scenario to provide handles that will be used during expression capture. Fields that are persistent state of the problem (in this example, the temperature field) must also be specified.

5.3 Specifying the Problem Context to Excafé

In order to solve any finite element problem in Excafé, we need to construct a Scenario object. The Scenario object contains references to the mesh, function spaces and finite elements which describe how the problem has been discretised. Any Dirichlet boundary conditions to be used must also be added to the Scenario object. Additionally, it contains discretised tensor fields that constitute the state of the simulation (persistent fields) and expressions to calculate the next state from the existing values.

In Figure 5.3 we show the code to construct the Scenario object for a given mesh. After construction, the finite elements, function spaces and persistent fields of the problem are added to the scenario.

We also need to register the Dirichlet boundary conditions with the Scenario object. We show the code to do this in Figure 5.4. The BoundaryConditionList class allows us to specify a prioritised list of boundary conditions. The BoundaryConditionTrivial class allows us to specify a constant tensor value that will be applied to all facets of our mesh with a specified label. Here, the edges of our domain have been labelled 1 to 4, and polyLabel is the label we gave to the object at the centre of our mesh. 100 Chapter 5. Heat Solver Example in Excafe´

Tensor cold(0);
cold = 0.0;

Tensor hot(0);
hot = 1.0;

BoundaryConditionList boundaryConditionList(0);
boundaryConditionList.add(BoundaryConditionTrivial(1, hot));
boundaryConditionList.add(BoundaryConditionTrivial(2, cold));
boundaryConditionList.add(BoundaryConditionTrivial(3, cold));
boundaryConditionList.add(BoundaryConditionTrivial(4, cold));
boundaryConditionList.add(BoundaryConditionTrivial(polyLabel, hot));

BoundaryCondition boundaryCondition =
  scenario.addBoundaryCondition(temperatureSpace, boundaryConditionList);

Figure 5.4: The registration of boundary conditions with an Excafé Scenario object. Constant tensor values are associated with integer-valued labels used to identify facets on the boundary of the domain.

5.4 Specifying the Solution Steps to Excafé

Now that we have specified the problem context to Excafé, we need to tell it how to advance the system by a time-step. As Excafé is an expression capture library, the code written by the user is only executed once, when building the description. Excafé is then free to implement the solution steps in whatever way it sees fit.

Specification of the evolution of a Scenario is done through the construction of a SolveOperation object. We show the code to do this in Figure 5.5. The SolveOperation object associates persistent fields with the discrete expression DAGs used to compute their values in the next step of the simulation.

Our syntax for specifying bilinear forms is similar to that of the Unified Form Language [37].

However, we do not explicitly denote basis functions as trial or test. Instead, we use the B() method to construct the inner product¹ between the trial and test parts of the form. The basis functions used to form the first and second parameters to B() become the trial and test spaces, respectively, for that bilinear form.

¹This may be a scalar multiplication, inner product or double inner product depending on the rank of the operands. The result is always a scalar.

SolveOperation solve = scenario.newSolveOperation();

// Define alpha and k.
Scalar alpha = 1e-4;
Scalar k = 10.0;

// Construct the mass matrix.
Operator massMatrix(temperatureSpace, temperatureSpace);
massMatrix = B(temperature, temperature)*dx;

// Apply mass matrix to temperature field from previous time-step to
// transform to test-space.
Field rhs = massMatrix * temperatureField;

// Define the LHS bilinear form.
const BilinearFormIntegralSum lhsForm =
  B(temperature, temperature)*dx +
  B(alpha*k*grad(temperature), grad(temperature))*dx;

// Construct a linear system.
LinearSystem system = assembleGalerkinSystem(temperatureSpace,  // Function space
                                             lhsForm,           // Linear operator
                                             rhs,               // Field in test-space
                                             boundaryCondition, // Boundary conditions
                                             temperatureField); // Initial guess

// Define new value of temperature field.
solve.setNewValue(temperatureField, system.getSolution());

// Signal that construction of the SolveOperation object is complete and
// analyses can be applied.
solve.finish();

Figure 5.5: Construction of a SolveOperation object in Excafé. We explicitly construct the mass matrix and multiply it with the previous temperature field to obtain the RHS of our linear system. We do not explicitly construct the system matrix. This is handled by the method assembleGalerkinSystem, which takes the bilinear form used to assemble the system matrix, as well as the boundary conditions to apply to it and the RHS vector.

for (int i = 0; i < 200; ++i)
{
  std::cout << "Starting timestep " << i << std::endl;
  // The rest of the loop body was lost in extraction. As described in the
  // caption below, it executes the SolveOperation to advance the discretised
  // system and writes the fields to a VTK file; the complete source is given
  // in Appendix E.
}

Figure 5.6: The loop used to drive our heat solver simulation. It repeatedly calls the SolveOperation object to advance the state of the discretised system, then outputs the fields to a VTK file.

We also use UFL syntax for specifying the region over which a bilinear form should be integrated.

An integral over the domain interior is denoted by a multiplication by dx. Exterior and interior facet integrals are denoted by multiplications by ds and dS, respectively. Currently, only cell integrals are fully implemented in Excafé.

5.5 Executing the Solver

Excafé does not yet implement capture of the time-stepping loop, so we use a standard for loop to drive time-stepping. We show the code to do this in Figure 5.6.

Excafé provides a syntax that permits nested iteration to be specified within a given time-step using a declarative notation. This is described in Section 6.4. The time-stepping loop could be described with our declarative iteration syntax; however, we have not yet considered issues such as how to specify when fields should be written to a file, or how to generate useful diagnostic output from within a declaratively specified loop.

5.6 Generated Output

We show the scalar temperature fields generated by solving our problem in Figure 5.7. The complete source of the heat solver example is provided in Appendix E.

(a) Time-step 0 (0 seconds) (b) Time-step 5 (50 seconds)

(c) Time-step 10 (100 seconds) (d) Time-step 20 (200 seconds)

(e) Time-step 30 (300 seconds) (f) Time-step 50 (500 seconds)

Figure 5.7: Our example heat conduction problem at different time-steps. Thermal diffusivity α = 10⁻⁴ m²s⁻¹ and time-step duration k = 10s.

5.7 Conclusion

In this chapter, we have shown how to specify and solve problems using Excafé. In doing so, we have demonstrated Excafé's expression capture capabilities applied to variational forms and operations between discretised fields and operators. In the next chapter, we cover Excafé's expression capture in more detail, looking at the types of optimisations that we may be able to perform with the captured representations.

Chapter 6

Expression Capture in Excafé

In this chapter, we describe Excafé's expression capture in more detail. In particular, we describe how capture is performed, the types of optimisations we hope to apply, and the advantages and limitations of our approach. We also compare our approach with that taken in the FEniCS Form Compiler [36], a code generator for variational forms that incorporates research into the optimisation of local matrix assembly. Further information on the data structures used for our expression capture can be found in Appendix D.

6.1 Overview

The finite element method requires that we can rewrite our problem into the following form:

\[
a(u, v) = L(v) \tag{6.1}
\]

where $a$ is linear in $u$ and $v$, and $L$ is linear in $v$. $a$ and $L$ are called bilinear and linear forms, respectively. They are both variational forms. In the finite element method, the LHS and RHS are integrals over our mesh. For example, for our heat conduction problem from Chapter 5, we let:


\[
\begin{aligned}
a(u, \phi) &= \int_\Omega u \phi \, dx + k \int_\Omega \alpha \nabla u \cdot \nabla \phi \, dx \\
L(\phi) &= \int_\Omega u^{n-1} \phi \, dx
\end{aligned}
\]

This description, combined with boundary conditions and discretisation choices for $u$ and $\phi$, is sufficient information to construct the discretised system:

Ax = b

Matrix $A$ is constructed from the bilinear form and, because the basis functions are cell-local, is sparse. Vector $b$ can be assembled from the linear form. $x$ (the discretised value of $u$) can be found using a linear system solver. For small systems a direct solver can be applied, but for larger ones Krylov subspace methods [28] are typically used, as these avoid matrix fill-in.

Even though sparse linear algebra is invariably required during the solution of a finite element problem, there is no need for the solution specification to explicitly refer to the discrete representations of the fields and operators, i.e. the vectors and sparse matrices involved. Despite this, Excafé provides distinct levels of expression capture for:

Variational forms We provide a syntax for expressing variational forms in a manner similar to the Unified Form Language [37]. Variational forms describe integrals over cells used to construct our discretised fields and operators.

Discrete Fields and Operators We allow specification of operations between discrete fields and operators. Naturally, the syntax is similar to that of a linear algebra library.

We provide these distinct levels of expression capture for two reasons:

1. It may still be necessary for the library user to specify parameters that directly affect the discretised operations. These include the tolerances for the linear solvers, preconditioning methods and other aspects that require direct knowledge of the linear system.

2. The majority of finite element libraries treat the specification of linear and bilinear forms as a means of performing assembly and not as a subset of a mathematical expression for the entire problem. As Excafé developed from a more conventional (non-expression capture) implementation, it also inherited this distinction.

Hence, although we make a distinction between expression capture of variational forms and discrete operations, we expect that these two syntaxes can be unified, with the discrete level less apparent to the library client.

6.2 Problem Context Construction

In order to solve a problem in Excafé (as demonstrated in Chapter 5), we first need to specify the physical domain, the fields we wish to solve for and our choice of discretisation for those fields.

The Scenario class stores all state related to a particular finite element problem as well as descriptions of the steps required to solve or advance the problem. Scenario instances are templated by the physical dimension of the problem, and are constructed using a supplied Mesh instance that will be used for spatial discretisation. An example of a declaration of a Scenario over a mesh is given in Figure 6.1.

6.2.1 Specifying Tensor Field Discretisation

Next we need to define the fields that we wish to solve for. However, before we can do this, we must choose the basis functions used to approximate the fields, and the sub-domain of the mesh that the fields' function spaces will cover.

// A 2D problem
static const std::size_t dimension = 2;

// Define a mesh over the unit square with a maximum cell area of 0.01
TriangularMeshBuilder meshBuilder(1.0, 1.0, 0.01);
Mesh mesh(meshBuilder.buildMesh());

// Declare finite element problem over the mesh
Scenario scenario(mesh);

Figure 6.1: Declaration of the context for a 2D finite element problem.

Element velocity = scenario.addElement(new LagrangeTriangleQuadratic<1>());
Element pressure = scenario.addElement(new LagrangeTriangleLinear<0>());

FunctionSpace velocitySpace = scenario.defineFunctionSpace(velocity, mesh);
FunctionSpace pressureSpace = scenario.defineFunctionSpace(pressure, mesh);
FunctionSpace coupledSpace = velocitySpace + pressureSpace;

Figure 6.2: Declaring a coupled velocity-pressure function space using a linear scalar pressure space and quadratic vector velocity space. The field ranks are specified through the template parameters used to instantiate the basis function classes.

Excafé currently implements linear and quadratic basis functions over 2D triangular cells with the LagrangeTriangleLinear and LagrangeTriangleQuadratic types, both of which implement the FiniteElement interface. Both types are templated by the rank of the tensor field they are approximating. Adding user-defined basis functions can be achieved by providing other implementations of the FiniteElement interface.

To avoid mixing implementation with expression capture types, classes implementing the FiniteElement interface must be instantiated then added to the Scenario using the addElement method. The Scenario returns an Element object that can then be used during expression capture. Registration of basis functions with a Scenario is shown in Figure 6.2.

The FunctionSpace type is used to reference discrete finite element function spaces. They are constructed from a choice of finite element basis and a subset of the physical domain. Hence, they correspond to the collection of functions that can be represented as a linear combination of the basis functions located in a sub-domain of our mesh.

We permit addition and subtraction of FunctionSpace instances (the latter only works because our function spaces are discretised).

NamedField velocityField = scenario.defineNamedField("velocity", velocitySpace);
NamedField pressureField = scenario.defineNamedField("pressure", pressureSpace);

Figure 6.3: Declaring the desired fields to solve for to a Scenario. Each field is discretised using the supplied FunctionSpace and becomes persistent state of the Scenario.

Thus, we can construct composite and disjoint function spaces. These can be used for:

Coupled pressure-velocity solvers We can construct a function space that represents the various possible joint pressure-velocity fields.

Boundary condition application When applying boundary conditions, we can form disjoint function spaces corresponding to the fixed (Dirichlet) and non-fixed (homogeneous) degrees of freedom. This enables boundary conditions to be applied in a mathematically elegant form. However, we do not currently do this in Excafé, since applying boundary conditions directly to the system matrix is more convenient from an implementation perspective.

FunctionSpace construction and addition are shown in Figure 6.2.

As a FunctionSpace completely describes the discretisation of a field, it is used in the definition of the fields we wish to solve for (persistent fields). Persistent fields are referenced using the NamedField type, as shown in Figure 6.3. Each NamedField becomes part of the state of the Scenario (as opposed to temporary fields calculated during a solve). Persistent fields are named so they can be identified when written to an output file.

6.2.2 Boundary Conditions

In order for our problems to be well-posed, we need to be able to specify boundary conditions.

Neumann boundary conditions (restrictions on the normal component of the gradient of a field) can be incorporated in the finite element formulation as additional integrals over the boundaries of the physical domain and do not require additional library support.

// Define a couple of rank-0 (scalar-valued) tensors
Tensor one(0), zero(0);
one = 1.0;
zero = 0.0;

// Set a scalar field to 0.0 on mesh facets labelled 1 and to 1.0
// on mesh facets labelled 2.
BoundaryConditionList boundaryConditionList(0);
boundaryConditionList.add(BoundaryConditionTrivial(1, zero));
boundaryConditionList.add(BoundaryConditionTrivial(2, one));

// Construct a handle to be used during expression capture
BoundaryCondition bc =
  scenario.addBoundaryCondition(temperatureSpace, boundaryConditionList);

Figure 6.4: Construction of Dirichlet boundary conditions attached to certain mesh facets for a scalar field in Excafé.

Dirichlet boundary conditions (specification of the value of a field) constrain specific degrees of freedom and require library support to define and apply. Our current implementation stores Dirichlet boundary conditions as a field defined only on edges where boundary conditions have been applied. During mesh construction, the generator attaches integer-valued labels to the different facets in our mesh. We provide a simple means to associate constant tensor values with facets that have a particular label.

The library client constructs a priority-ordered list of tensor field values and the labels of the facets they should be attached to. When registered with the Scenario object, all boundary values are evaluated and a BoundaryCondition object is returned that can be used to reference them in Excafé captured expressions. We show the construction of Dirichlet boundary conditions and the returned handle to them in Figure 6.4.

We describe how the BoundaryCondition handle can be used to apply Dirichlet boundary conditions when solving for a field in the next section.

6.3 Discrete Expression Capture

The previous section described how to define a discretised physical domain and the fields to be represented over that domain. In this section, I describe how the steps required to solve or advance the problem state are provided to Excafé. The majority of Excafé's expression capture occurs during this phase. Additional information on the data structures built can be found in Section D.3.

As mentioned in Section 6.1, Excafé provides separate levels of expression capture for discrete operations and for variational forms. We describe discrete expression capture first.

6.3.1 Handle Types

Discretised tensor fields and operators can be represented by vectors and sparse matrices, respectively. Our strategy for expression capture is similar to that used in Desola for linear-algebra expression capture (i.e. we build a DAG representing the computation to be executed by providing handle types to the library client to manipulate). We define three main handle types.

Scalar

The Scalar type represents references to scalar values that we compute during the execution of a solve. They are distinct scalar values such as parameters of our PDEs or the residual of a linear solve. They are unrelated to scalar fields, which are scalar values defined over our physical domain.

Using C++ operator overloading, we provide addition, subtraction, division and multiplication as well as comparison operators. We show valid syntax in Figure 6.5. We have not defined a boolean type in our discrete expression capture, so scalar comparisons return a zero or non-zero scalar to represent false or true, respectively.

Scalar a = 5.0;
Scalar b = 2.0;
a = b;

// Arithmetic operations
a += b; a -= b; a /= b; a *= b;
a + b; a - b; a / b; a * b;

// Comparison operations
a < b; a > b;

Figure 6.5: Valid operations on handles to scalar expressions.

// Declaring zero valued fields
Field pressureA(pressureSpace), pressureB(pressureSpace);

// Valid expressions
pressureA + pressureB; pressureA += pressureB;
pressureA - pressureB; pressureA -= pressureB;

// Combining a pressure and velocity field
Field composite = project(velocityField, coupledSpace) +
                  project(pressureField, coupledSpace);

// Extracting a velocity field
Field extracted = project(composite, velocitySpace);

Figure 6.6: Discretised fields can be added and subtracted if they are defined on the same function space. The project function can be used to perform sub-field extraction and create composite fields.

Field

The Field type provides a handle to tensor fields defined over the mesh. All Fields possess a reference to a FunctionSpace which determines how they are discretised. Fields can be added and subtracted provided that they have the same function space. We also provide a project function (quite possibly a misnomer) that, given a new function space, will copy the subset of the field that shares the same basis and spatial discretisation into a field with that function space. In a coupled Navier-Stokes solver, for example, we can use this to extract the velocity and pressure fields from a coupled solution.

Figure 6.6 shows valid field expressions, the construction of a composite field and extraction of a sub-field.

// Define trial and test elements and function spaces
Element trial = scenario.addElement(new LagrangeTriangleLinear<0>());
Element test = scenario.addElement(new LagrangeTriangleLinear<0>());

FunctionSpace trialSpace = scenario.defineFunctionSpace(trial, mesh);
FunctionSpace testSpace = scenario.defineFunctionSpace(test, mesh);

// Construction of a mass matrix
Operator massMatrix(trialSpace, testSpace);
massMatrix = B(trial, test)*dx;

// Operator application
Field trialField(trialSpace);
Field testField = massMatrix * trialField;

Figure 6.7: Construction of the mass matrix and application to a field in Excafé. In this example, we have constructed different function spaces to represent the trial and test spaces. However, it is possible to use the same space for both, as we do not require that function spaces be designated as trial or test.

Operator

The Operator type is a handle to a discretised linear operator. Typically, a discretised operator is represented as a sparse matrix. Each Operator possesses references to two FunctionSpaces. The first is the FunctionSpace of the Field the Operator can be used to transform. The second is the FunctionSpace of the result Field. In the finite element method, these are typically the trial and test spaces. By associating this information with the matrix, we can verify the operator isn't used in invalid ways during expression capture.

Most importantly, Operator handles can have bilinear forms assigned to them, which causes the operator to be populated with the result of the assembly. We show the syntax for assembly and application of an Operator in Figure 6.7. As mentioned in Section 5.3, we borrow UFL's syntax for specifying the mesh region to integrate over and provide the B() method to declare a bilinear form, where the first and second parameters are functions in the trial and test spaces respectively.

In addition to the Scalar, Field and Operator expression handle types, we provide a helper class to hide some of the details of solving our discretised system.

6.3.2 Linear System Solution

We provide a LinearSystem type that represents a linear system solver. Currently, construction is only possible through a call to the assembleGalerkinSystem function. The constructed LinearSystem transparently generates expression DAG nodes that represent the following:

1. A system matrix assembled from a bilinear form.

2. The system matrix with boundary conditions applied.

3. The system RHS with boundary conditions applied.

4. The result field (currently calculated using a PETSc linear system solver).

We do this partly for simplicity and also because we do not wish to directly expose the operators we use for applying boundary conditions to a linear system.

The operators we generate do not correspond to standard algebraic operations and may change in future. Currently, we apply boundary conditions in the following manner (a sketch follows the list below):

1. Zero rows in the system matrix corresponding to constrained degrees of freedom, except for the diagonals which are set to one.

2. Update values in the RHS vector to the boundary condition values.
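A minimal sketch of these two steps, assuming a plain CSR matrix layout for illustration (this is not Excafé's implementation, and it assumes each constrained row's diagonal entry is present in the sparsity pattern):

#include <cstddef>
#include <vector>

void applyDirichlet(const std::vector<int>& rowPtr,
                    const std::vector<int>& colInd,
                    std::vector<double>& val,
                    std::vector<double>& rhs,
                    const std::vector<int>& fixedDofs,
                    const std::vector<double>& fixedValues)
{
  for (std::size_t k = 0; k < fixedDofs.size(); ++k)
  {
    const int row = fixedDofs[k];

    // Step 1: zero the row, setting the diagonal entry to one.
    for (int i = rowPtr[row]; i < rowPtr[row + 1]; ++i)
      val[i] = (colInd[i] == row) ? 1.0 : 0.0;

    // Step 2: set the corresponding RHS entry to the boundary value.
    rhs[row] = fixedValues[k];
  }
}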

We do, however, expose mechanisms for accessing the modified system matrix and RHS for the purposes of calculating the residual. We show the construction and use of a LinearSystem instance in Figure 6.8.

6.3.3 Registering Solution Steps

The discrete expression DAGs we construct must be registered with our Scenario object. We do this through the construction of a SolveOperation object, which is a handle to a sequence of operations that solve (or advance by a time-step) the finite element problem. We support multiple SolveOperation instances to account for the possibility of needing to advance the simulation in different ways (e.g. progressing the system towards an acceptable initial state before performing solution steps).

LinearSystem system = assembleGalerkinSystem(
  functionSpace,      // The function space of the field being transformed and
                      // the result. We assume that they are both the same in
                      // the Galerkin case.
  lhs,                // The bilinear form to be assembled.
  load,               // The RHS of the system (the desired field after
                      // application of the operator).
  boundaryConditions, // Dirichlet boundary conditions to apply to the system.
  initialGuess);      // A field representing an initial guess at the solution
                      // of the system.

// Retrieving the solution to the system
Field solution = system.getSolution();

// Retrieving the operator with boundary conditions applied.
Operator modifiedSystem = system.getConstrainedSystem();

// Retrieving the RHS with boundary conditions applied.
Field modifiedRHS = system.getConstrainedLoad();

// Calculating the residual
Scalar residual = (modifiedSystem*solution - modifiedRHS).two_norm();

Figure 6.8: Solution of a linear system with Dirichlet boundary conditions in Excafé. It is possible to retrieve the LHS operator and RHS field with boundary conditions applied, so the linear system's residual can be determined.

FunctionSpace functionSpace = scenario.defineFunctionSpace(element, mesh);
NamedField field = scenario.defineNamedField("field", functionSpace);
SolveOperation solve = scenario.newSolveOperation();

Operator op(functionSpace, functionSpace);
op += B(element, element)*dx;

Field newField = op * field;

// Define the new value of the field.
solve.setNewValue(field, newField);

// Complete construction of SolveOperation.
solve.finish();

Figure 6.9: Creation of a SolveOperation (with little practical application) that updates a field by multiplying it by the mass matrix.

After constructing a SolveOperation object using a given Scenario, we associate each NamedField with an expression DAG that describes how to calculate its new value, using the setNewValue method. Finally, the finish method must be called on the SolveOperation in order to initiate analysis and optimisation. A trivial example demonstrating the syntax is shown in Figure 6.9.

6.4 Declarative Iteration Capture

In the last section, we described how a DAG was built in order to represent the steps required to perform a solution step for a given finite element problem. This was sufficient for our heat equation. However, for more complex problems, we may need to be able to perform iteration within a time-step.

Excafé provides a mechanism to perform expression capture of iteration using a declarative syntax. Before describing this, we present a motivating example: solving the non-linear convective acceleration term of the Navier-Stokes equations.

6.4.1 Linearising the Convective Acceleration Term of Navier-Stokes

We consider the convective acceleration term of the Navier-Stokes equations:

u · ∇u

The finite element method requires that the discretised operator is linear in the trial function if the system of equations is to be solved using a linear solver. Unfortunately, the convective acceleration term is not linear in u. We linearise this term using the Picard iteration by approximating it as follows:

\[
u_i \cdot \nabla u_i \approx u_{i-1} \cdot \nabla u_i
\]

That is, we replace one instance of $u$ with $u_{i-1}$ so that our term is now linear in $u$. At each iteration $i$, we substitute the value of $u$ from the previous iteration as $u_{i-1}$ and construct a new approximate solution for $u$. In our example, we will solve repeatedly until we find a value of $u_i$ such that letting $u \cdot \nabla u = u_i \cdot \nabla u_i$ in our original equation satisfies the convergence requirement. For our initial guess of $u_{i-1}$ we use $u$ from the previous time-step.

Figure 6.10 shows an Excafé implementation of the solution of the Navier-Stokes convective acceleration term using Picard iteration. We declare the TemporalIndex variable i, which will be the index variable of our Picard iteration loop. We use this to define an IndexedField named unknown, which has a distinct value for each value of i.

Expressions that access unknown using i as part of the index expression implicitly become part of the loop. The loop will iterate until the termination condition attached to i becomes true.

The final values that unknown took are accessed using index expressions that are offsets against the special variable final.

Finally, we note that solving this term in isolation from other terms with a zero RHS field is not a useful exercise and we use it merely to demonstrate Excafé's indexing syntax. An implementation of an incompressible Navier-Stokes solver including the Picard iteration linearisation of the convective acceleration term is discussed in Chapter 8, and the corresponding code is presented in Appendix F.

6.4.2 C++ Syntax

Our syntax for the declarative specification of loops is inspired by the mathematical use of subscript (or sometimes superscript) notation to denote different values of the same symbolic variable.

Excafé provides the following handle types to represent this notation:

TemporalIndex: represents a mathematical subscript or superscript used to refer to different values of a variable calculated during the repeated application of a formula.

IndexedScalar, IndexedField & IndexedOperator: indexed versions of the Scalar, Field and Operator handles. They are used to assign values of the variable at the current iteration and access variables from previous iterations.

Declaration of a TemporalIndex and each indexed type is shown in Figure 6.11. Each indexed type instance is associated with a particular TemporalIndex at construction. For example, indexed_field as declared in Figure 6.11 will take a unique value for each value of i.

At this point, we stress that although the library client associates each instance of an indexed type with only one TemporalIndex, this does not prevent that variable taking unique values with respect to multiple index variables.

// Declare a temporal index i. Expressions referencing i will implicitly be
// placed in a loop with i as the index variable.
TemporalIndex i;

// Declare a vector field for the unknown indexable by i.
IndexedField unknown(i);

// Declare the load vector (zero in this example).
Field load(velocitySpace);

// Set the initial value of the unknown guess to velocity field from the
// previous time-step.
unknown[-1] = velocityField;

// Declare a bilinear form dependent on the value of unknown at the
// previous iteration.
const forms::BilinearFormIntegralSum lhsForm =
  B(inner(unknown[i-1], grad(velocity)), velocity)*dx;

// Declare a linear system that assembles a matrix using the declared bilinear
// form. As both the form and guess at the solution involve an expression
// dependent on i, the linear system is placed in a loop.
LinearSystem system = assembleGalerkinSystem(velocitySpace, lhsForm, load,
  velocityConditions, unknown[i-1]);

// Get the discretised operator with boundary conditions applied.
Operator linearisedSystem = system.getConstrainedSystem();

// Set the next value of unknown to be the solution from this iteration.
unknown[i] = system.getSolution();

// Calculate the residual using the solution from the *previous* iteration. As
// our operator was assembled using unknown[i-1], we must compute the
// residual as the norm of the operator applied to unknown[i-1].
Scalar residual =
  (linearisedSystem*unknown[i-1] - system.getConstrainedLoad()).two_norm();

// Describe how to terminate the loop with index i.
i.setTermination(residual < 1e-3);

// Set the value of the velocity field for this time-step.
s.setNewValue(velocityField, unknown[final-1]);

Figure 6.10: Declarative construction of a loop to linearise the convective acceleration term of the Navier-Stokes equations using Picard iteration. This example exists to demonstrate the indexing syntax and does not solve a useful problem. See Appendix F for the code of an incompressible Navier-Stokes implementation using this linearisation.

TemporalIndex i;

IndexedScalar indexed_scalar(i);
IndexedField indexed_field(i);
IndexedOperator indexed_operator(i);

Figure 6.11: Declaration of a TemporalIndex and indexed scalar, field and operator types. Each indexed type instance is associated with a particular TemporalIndex at construction.

// Given C++ variables of the following types:
TemporalIndex i;
Field f;
IndexedField u;

// Indexed value initialisation
u[-2] = f;
u[-1] = f;

// Setting a value of u within an iteration
u[i] = f;

// Retrieving a previous value of u within an iteration
f = u[i-1];

// Accessing values of u relative to the final value
f = u[final];
f = u[final-1];

Figure 6.12: Given C++ variables of the specified types, examples of construction of valid expressions using the IndexedField type. Identical syntax is valid for the IndexedScalar and IndexedOperator types.

In other words, our declarative loop declaration syntax is composable; otherwise, we would not be able to support nested iteration. Before we describe how composability is achieved, we first consider declarative loop construction involving a single TemporalIndex.

We give an example of the use of the IndexedField type in Figure 6.12. For our indexed types, we have overloaded C++’s array indexing operator in order to represent our subscript notation.

For some indexed type associated with a TemporalIndex i we can assign values at negative constant indices, which correspond to initial values required before an iteration. We can also assign at index i, which describes the value assigned to the variable at each iteration.
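To illustrate the kind of machinery this overloading requires, the following is a minimal, hypothetical sketch of how subscript expressions such as u[i-1] and u[final-1] might be captured at run time; all names here are illustrative and are not Excafé's actual internal classes.

// Hypothetical sketch of capturing subscripts such as u[i], u[i-1], u[final-1].
struct TemporalIndex {};
struct FinalTag {} final_tag;          // stands in for the special variable 'final'

// The result of i-n or final-n: which base the offset is relative to.
struct IndexOffset
{
  enum class Base { iteration, final_value } base;
  const TemporalIndex* index;          // set only for iteration-relative offsets
  int offset;                          // e.g. -1 in u[i-1]
};

// i - n and final - n build offset descriptions rather than numbers.
inline IndexOffset operator-(const TemporalIndex& i, const int n)
{
  return IndexOffset{IndexOffset::Base::iteration, &i, -n};
}

inline IndexOffset operator-(const FinalTag&, const int n)
{
  return IndexOffset{IndexOffset::Base::final_value, nullptr, -n};
}

struct FieldExpressionHandle {};       // placeholder for a DAG-node handle

class IndexedField
{
public:
  explicit IndexedField(const TemporalIndex& i) : index(&i) {}

  // u[i-1], u[final-1]: record a relative access during expression capture.
  FieldExpressionHandle operator[](const IndexOffset&) const { return {}; }

  // u[i]: access at the current iteration.
  FieldExpressionHandle operator[](const TemporalIndex&) const { return {}; }

  // u[-1], u[-2]: initial-value slots assigned before the loop runs.
  FieldExpressionHandle operator[](const int) const { return {}; }

private:
  const TemporalIndex* index;          // the loop this handle belongs to
};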

TemporalIndex i;
Field a = ...;
IndexedField b(i);

Field c = a + b[i];

Figure 6.13: Assignment of the expression a + b[i], which is only valid inside a loop indexed by i, to the non-indexed handle c. The expression held by handle c is only valid within loop i as well, but it is not possible to use indexing with the handle c.

At any given iteration, we can retrieve values of the variable at index i-n, for any natural number n. Once an iteration is complete, we can also use indices of the form final-n, where final is a special value that indicates the final value of an index.

What happens when we assign a value that is only defined within an iteration to a non-indexed type as in Figure 6.13?

The expression a + b[i] is only valid within the scope of the iteration corresponding to TemporalIndex i. Hence, c is only valid within the same scope. In order for assignments like this to be valid, we require that expressions inherit indices associated with their operands. Thus, the set of indices associated with an expression describes the scope in which that expression is valid.

The propagation of TemporalIndex instances through expressions also provides the mechanism whereby we achieve nested iteration. An expression of the form a[i] + b[j] is only valid if one of loops i and j is nested within the other. However, it does not tell us whether loop j is nested in loop i or vice-versa. We describe how this is determined in the next chapter.

In Chapter 7, I describe how TemporalIndex instances are associated with each node in the expression DAG and how this is used to infer a valid loop nesting structure required to evaluate expressions involving declarative iteration.

(a) Excafé:

TemporalIndex i;
IndexedScalar f(i);
f[-2] = 0.0;
f[-1] = 1.0;
f[i] = f[i-1] + f[i-2];
i.setTermination(f[i] > 100.0);

(b) C:

double f[3];
f[0] = 0.0;
f[1] = 1.0;
int i = 1;
do
{
  ++i;
  f[i%3] = f[(i+2)%3] + f[(i+1)%3];
}
while (f[i%3] <= 100.0);

Figure 6.14: Excafé syntax for describing an iterative construction of Fibonacci values (in floating point) and a corresponding C implementation. The C implementation requires more complex indexing to access f, which is used as a cyclic buffer.

6.4.3 Comparison to Imperative Form

Consider the declaratively declared loop in Figure 6.14 that iterates until f reaches the first Fibonacci number greater than 100¹.

The advantages of the declarative syntax are apparent. How to store f is not a consideration in the Excafé implementation, whilst in the C implementation we must explicitly manage f as a cyclic buffer of the previously computed values. In addition, we do not need to specify how large the buffer to store f should be. The size of the buffer can be automatically determined from the sizes of the offsets used to access f.
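As an illustration of how such a size might be derived (a minimal sketch, assuming that all accesses use constant offsets, which is all the syntax permits):

#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch: derive the cyclic-buffer size for an indexed value from the
// constant offsets at which it is read. Reads at i-1 and i-2 plus the
// write at i give max(|-1|, |-2|) + 1 = 3 slots, as in Figure 6.14.
std::size_t requiredSlots(const std::vector<int>& readOffsets)
{
  int oldest = 0;
  for (const int offset : readOffsets)
    oldest = std::min(oldest, offset);
  return static_cast<std::size_t>(-oldest) + 1;
}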

6.4.4 Future Development

We previously showed how the indexing notation and discrete expression syntax could be used to linearise the convective acceleration term of the Navier-Stokes equations.

We note that this syntax is ideally suited to the description of iterative linear system solvers. Currently, we call PETSc [47] to perform our linear system solve, invoked by a linear-system solver node in our discrete expression DAG. However, we can see no reason why these solvers could not be represented directly in our syntax.

¹This is a contrived example with no purpose other than to demonstrate the benefits of Excafé's indexing syntax.

6.5 Variational Form Capture

Similarly to other finite element libraries, we provide a mechanism for expressing variational forms. Our syntax is deliberately similar to that of the variational form notation of the Unified Form Language [37]. However, we only implement the most common vector calculus operators required for the forms we investigate. We have not implemented the more general indexed representation that UFL uses to represent its tensor operations. Additional information on the data structures used during variational form capture can be found in Section D.4.

Figure 6.15 shows the declaration of a bilinear form for Poisson's equation in UFL and Excafé. Unlike UFL, we do not specifically denote certain function spaces as trial and test. In our example, we have used the same function space for both the trial and test spaces (as is always the case with Galerkin methods). However, it is possible to declare the same function space again, making it possible to distinguish between the two. This allows us to raise an error if mathematical expressions are declared involving incompatible function spaces.

The Excafé expression capture representation has the expected DAG structure as presented in Figure 6.16. We do not perform any optimisation on the DAG representation; instead, it is visited during the construction of the local assembly matrix (as detailed in Chapter 7) and optimisations are performed on the resulting expressions.

We draw attention to the fact that our captured bilinear forms can reference any discretised fields defined in our discrete expression DAG (as is needed in time-dependent problems). As a consequence, it is possible for an optimiser to see the structure of the discretised fields (i.e. a summation of basis functions multiplied by coefficients). There is the possibility that this may lead to redundancy optimisations if the basis functions used to represent our discretised field are already being used for the trial and test functions, as is likely.

element = FiniteElement("Lagrange", triangle, 1)

v = TestFunction(element)
u = TrialFunction(element)

a = dot(grad(v), grad(u))*dx
L = v*f*dx

(a) UFL

u = scenario.addElement(new LagrangeTriangleQuadratic<1>());

const BilinearFormIntegralSum form = B(grad(u), grad(u))*dx;

(b) Excafé

Figure 6.15: A bilinear form for the LHS of the finite element discretisation of Poisson's equation in UFL and Excafé. We use the same function space for the trial and test spaces in the Excafé example. The order of parameters to the B method implies whether u is being used in the context of a trial or test function.

[Figure 6.16 diagram: an expression DAG with nodes form, BilinearFormIntegralSum, dx, BilinearForm, two FieldGradient nodes, u and LagrangeTriangleQuadratic.]

Figure 6.16: The DAG we construct for the bilinear form ∫_Ω ∇u · ∇u dx. It is used to assemble an operator required during the calculation of the RHS of our heat solver example. The assembled operator is applied to the temperature field from the previous time-step in order to obtain the RHS vector.

[Figure: surface plots over the reference triangle: (a) 1 − x − y, (b) x, (c) y.]

Figure 6.17: Linear Lagrange basis functions defined over the reference triangle.

For example, in our Navier-Stokes problem (described in Chapter 8), all velocity fields are discretised with the same basis functions. We wish to explore the possibility that when assembling bilinear forms involving previously calculated fields, knowledge of the field discretisation can assist in optimising local assembly.

We contrast with the FEniCS Form Compiler [36] which also has knowledge of the discretisa- tions of the referenced fields, but performs tensor contraction optimisations using a numeric representation of the local assembly matrix, where redundancy relationships due to basis func- tion reuse may be significantly more difficult or impossible to infer.

6.6 Basis Function Capture

Excafé implements expression capture of basis functions. Currently, Excafé has implementations of linear and quadratic Lagrangian basis functions over triangular cells. We show plots of the linear and quadratic basis functions in Figures 6.17 and 6.18 respectively. Additional information on the data structures used to represent scalar basis function expressions in Excafé can be found in Section D.5.

We show a code excerpt from Excafé that performs expression capture for the Lagrange linear basis functions in Figure 6.19. The code does not directly resemble the formulae because we exploit the rotational symmetry of the basis functions so that we only define one formula. In addition, we make use of a helper function in order to generalise our scalar basis to a set of arbitrary tensor-valued basis functions.

[Figure: surface plots over the reference triangle: (a) (1 − y − x)(1 − 2y − 2x), (b) x(−1 + 2x), (c) y(−1 + 2y), (d) 2x(2 − 2y − 2x), (e) 4xy, (f) 2y(2 − 2y − 2x).]

Figure 6.18: Quadratic Lagrange basis functions defined over the reference triangle.

Unlike the rest of our expression capture facilities, the interface for basis function definition has not yet been optimised for easy use by library clients, though it is possible for clients to define other basis functions. The primary purpose of our basis function capture is to give our library representations that it can analyse and optimise, rather than to expose an interface to the library client.

The most common way to construct vector or arbitrary tensor-valued basis functions is to repeat the same set of scalar basis functions for each element of the tensor. This permits efficient implementations since they can be hard-coded to the sparsity of the basis functions. However, these optimisations can prohibit the use of truly vector-valued basis functions such as the Raviart-Thomas [48] element.
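To make this construction concrete, the standard pattern (not specific to Excafé) is as follows: given scalar basis functions φ_1, …, φ_n over a cell in d dimensions, the vector-valued basis repeats each φ_q in each component direction:

\Phi_{(q,c)}(x) = \phi_q(x)\, e_c, \qquad 1 \le q \le n, \quad 1 \le c \le d

where e_c is the c-th unit vector. Each Φ_{(q,c)} is non-zero in exactly one component, which is the sparsity that hard-coded implementations exploit; a genuinely vector-valued basis such as Raviart-Thomas has no such structure.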

Excafé supports arbitrary tensor-valued basis functions, although all tensor-valued bases currently repeat scalar-valued ones. As Excafé has access to run-time representations of basis functions, this permits analysis to determine when sparsity optimisations are appropriate. We discuss how the captured basis function representations are optimised in Chapter 9.

As is common in finite-element implementations [31, 34], our basis functions are defined over a reference cell. However, we require the ability to evaluate integrals of bilinear forms involving our basis functions and their derivatives over arbitrary cells. We can do this provided we have:

tensor_expr_t getBasis(const std::size_t i, const detail::PositionPlaceholder& v) const
{
  using namespace detail;

  if (i >= spaceDimension())
    CFD_EXCEPTION("Requested invalid basis function.");

  const TensorSize tensorSize(rank, dimension);
  tensor_expr_t bases(tensorSize);

  const DofAssociation dofAssociation = dofNumbering.getLocalAssociation(i);

  assert(dofAssociation.getEntityDimension() == 0);
  const unsigned node_on_cell = dofAssociation.getEntityIndex();
  const unsigned index_into_tensor = dofNumbering.getTensorIndex(i);

  const int ip1 = (node_on_cell+1)%3;
  const int ip2 = (node_on_cell+2)%3;

  const TensorIndex tensorIndex =
    TensorIndex::unflatten(tensorSize, index_into_tensor, row_major_tag());

  bases[tensorIndex] =
    (referenceCell->getLocalVertex(ip2)[0] - referenceCell->getLocalVertex(ip1)[0])*
    (v[1] - referenceCell->getLocalVertex(ip1)[1]) -
    (referenceCell->getLocalVertex(ip2)[1] - referenceCell->getLocalVertex(ip1)[1])*
    (v[0] - referenceCell->getLocalVertex(ip1)[0]);

  return bases;
}

Figure 6.19: Excafé implementation of expression capture for the linear Lagrange basis functions over a triangular element. We exploit rotational symmetry and use a helper class that simplifies the generalisation of the scalar basis to arbitrary tensor-valued bases.

1. The ability to transform a gradient of a function on our reference cell to that of the function on an arbitrary cell.

2. The ability to transform an integral of a function over our reference cell to that of the integral of the function over an arbitrary cell.

The gradient of a function u in the elemental region x ∈ Ω_e can be calculated from the gradient in a standard region ξ ∈ Ω_st using the chain rule. For example, in 2-D space:

\frac{\partial u}{\partial x_1} = \frac{\partial u}{\partial \xi_1}\frac{\partial \xi_1}{\partial x_1} + \frac{\partial u}{\partial \xi_2}\frac{\partial \xi_2}{\partial x_1}   (6.2)

\frac{\partial u}{\partial x_2} = \frac{\partial u}{\partial \xi_1}\frac{\partial \xi_1}{\partial x_2} + \frac{\partial u}{\partial \xi_2}\frac{\partial \xi_2}{\partial x_2}   (6.3)
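Equations 6.2 and 6.3 can be restated compactly (a standard identity, included here for reference) in terms of the Jacobian J of the mapping from ξ to x:

\nabla_x u = J^{-T}\,\nabla_\xi u,
\qquad
J = \begin{pmatrix}
\partial x_1/\partial \xi_1 & \partial x_1/\partial \xi_2 \\
\partial x_2/\partial \xi_1 & \partial x_2/\partial \xi_2
\end{pmatrix}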

Integration within the general element can be expressed in terms of an integral over the standard element via a change of co-ordinates:

\int_{\Omega_e} u^e(x_1, x_2) \, dx_1 \, dx_2 = \int_{\Omega_{st}} u(\xi_1, \xi_2)\,|J_{2D}| \, d\xi_1 \, d\xi_2   (6.4)

where u^e is the function u defined in terms of global co-ordinates, and |J_{2D}| is the determinant of the Jacobian of the local-to-global co-ordinate transformation:

|J_{2D}| = \begin{vmatrix}
\frac{\partial x_1}{\partial \xi_1} & \frac{\partial x_1}{\partial \xi_2} \\
\frac{\partial x_2}{\partial \xi_1} & \frac{\partial x_2}{\partial \xi_2}
\end{vmatrix}
= \frac{\partial x_1}{\partial \xi_1}\frac{\partial x_2}{\partial \xi_2} - \frac{\partial x_1}{\partial \xi_2}\frac{\partial x_2}{\partial \xi_1}   (6.5)

In order to evaluate these, we need to define a mapping between the reference space ξ and the elemental space x. We define the mapping χ so that:

x_1 = \chi_1(\xi_1, \xi_2)   (6.6)

x_2 = \chi_2(\xi_1, \xi_2)   (6.7)

For our triangular cell, we define χ using the linear basis φ already defined.

\chi_i(\xi_1, \xi_2) = \sum_{q=0}^{n} \hat{x}^i_q \, \phi_q(\xi_1, \xi_2)   (6.8)

where \hat{x}^i_q is component i of vertex q of the cell. In other words, we have represented the co-ordinate transformation from the reference cell to an arbitrary cell as an interpolation between cell vertices using our set of linear basis functions. Our framework's symbolic nature permits the possibility of using an arbitrary vector field to specify mesh geometry, enabling curved elements. However, we have not yet extended Excafé to do this.
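As a worked instance (assuming, as in Figure 6.17, the reference triangle with vertices (0, 0), (1, 0) and (0, 1)), the mapping and its derivatives reduce to:

\chi_i(\xi_1, \xi_2) = \hat{x}^i_0\,(1 - \xi_1 - \xi_2) + \hat{x}^i_1\,\xi_1 + \hat{x}^i_2\,\xi_2

\frac{\partial \chi_i}{\partial \xi_1} = \hat{x}^i_1 - \hat{x}^i_0,
\qquad
\frac{\partial \chi_i}{\partial \xi_2} = \hat{x}^i_2 - \hat{x}^i_0

Hence, for straight-edged triangles the mapping is affine and |J_{2D}| is constant over each cell.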

Hence, in capturing our basis functions we not only have representations of our basis functions over our reference cells, but also a representation of the geometric transformation required to represent the integrands of integrals over an arbitrary cell analytically.

We note that having an expression capture representation of our basis functions may not appear useful at first glance. As the basis functions are defined over the reference cell, finite element implementations typically evaluate the basis functions and their gradients at quadrature points defined over the reference cell. These values are then stored in tables and the gradients transformed as appropriate when calculating integrals over the cells in the mesh.

Fringe benefits of capturing our basis functions include:

1. Automatic calculation of derivatives.

2. Automatic determination of degree of quadrature for integrals.

However, our primary hope is to expose patterns between elements of our assembly matrices that can only be determined using a representation of our basis functions. We believe such optimisations should be a superset of those possible with the topological optimisations developed by the FEniCS project [38], which operate on local assembly matrices (on the reference cell) where all elements have already been evaluated.

6.7 Conclusion

In this chapter, we presented Excafé's expression capture features. Specifically, Excafé captures run-time representations of basis functions, variational forms and a declarative representation of operations between scalars, discretised fields and operators. We conclude with an overview of the most important aspects.

We contrast with code generators such as FFC that optimise assembly without an expression representation of basis functions. We hope to use our basis function representations to explore optimisations that arise from repeated use of, and redundancies between, basis functions.

In capturing basis functions over the reference cell, we also have the ability to construct expressions representing the geometric transformation between the reference cell and an arbitrary cell. Typically this is done using the linear basis functions over that cell, but future work could include other basis functions or discretised vector fields used to represent curved cells.

Above the level of basis functions, we capture representations of bilinear forms. These are functionals that are linear in the functions used for the trial and test spaces, built from vector calculus operators. We observe that in many finite element problems, our bilinear forms often make reference to fields defined earlier during the solver execution.

By capturing the symbolic representations of the trial and test functions, along with the basis functions of any other discretised fields, we have the opportunity to detect optimisations that may arise from commonalities between those discretisations. FFC's local assembly optimisations operate on a numeric representation of the local assembly matrix, where these relationships are no longer apparent. Hence, we hope this will expose optimisation opportunities not visible to FFC.

Our highest level of expression capture captures operations between scalars, discretised fields and operators. For the most part, these resemble traditional linear algebra, but with some additional operators and additional constraints on valid operations dictated by choices about discretisation.

Rather than choosing to capture these operations in an imperative form (e.g. in the manner used by TaskGraph [13]), we have chosen to explore a declarative mechanism. Most notably, this mechanism includes support for the specification of iteration, needed within some more complex problems. The resulting syntax is similar to mathematical subscript notation and avoids the need to specify mathematically unrelated concerns such as cyclic buffer handling. We provide a description of how we convert this representation to an imperative form in Chapter 7.

In the next chapter we present our implementation of an incompressible Navier-Stokes solver in Excafé. We use this as an example problem to motivate the optimisations we wish to explore.

Chapter 7

Imperative Loop Inference

We described a syntax that permits declarative specification of iteration in Section 6.4. This syntax enables Excafé to build a representation of the computation to be performed in the form of a directed acyclic graph (DAG). In Desola, which could not capture iteration, there was a direct correspondence between nodes in the expression DAG and values in the computation. In Excafé, the relationship is less apparent, since nodes in our expression DAG may correspond to expressions that are evaluated within a loop nest, and therefore take multiple values.

In this chapter, we describe how we take an expression DAG with references to indexed values and infer an imperative structure that can be used to execute the computation. We then look at how it is possible for a library client to construct an invalid specification and how this may be detected.

7.1 The Problem

To consider the issues raised when trying to evaluate an expression DAG involving indexed expressions, we will consider our linearisation example from Section 6.4.1. Code for this example was presented in Figure 6.10.


The example shows the solution of a non-linear system by repeatedly solving a linear approximation, constructed using the approximate solution for the system computed during the previous iteration. The example makes use of one indexed field, called unknown. We show the discrete expression DAG that is constructed in Figure 7.1.

The DAG describes a computation that repeatedly computes values for the field unknown, which represents the solution to a non-linear system. The approximate solution from the previous iteration, unknown[i-1] is used to assemble a linear operator so that it can be used to approximate the non-linear one. The previous value of unknown is also used as the initial guess for the next approximate solution. The nodes OperatorApplyBC and DiscreteFieldApplyBC are responsible for applying boundary conditions to the LHS and RHS, respectively, of our linear system.

To determine when to terminate, the true residual of our non-linear system (as opposed to the approximate one) is compared against a tolerance value. We multiply the operator by the value of unknown from the previous iteration, subtract the RHS, take the l2-norm and compare to our tolerance value. As the previous value of unknown was also used to assemble the operator, this gives us the true residual. This is why the final solution is taken as unknown[final-1] and not unknown[final].

A handle labelled unext is held by Excafé to the result of the computation. This represents the new value of the field u. Nodes representing accesses to the indexed field unknown do not maintain any references. It is the responsibility of the IndexableValue internal handle class corresponding to unknown to keep references to the values assigned to it. Lastly, each TemporalIndexValue instance, which represents an index variable, must also maintain a reference to its termination condition.

7.2 Algorithm

Our algorithm to infer an imperative execution strategy computes a set of possibly-nested loop scopes and an execution order. It consists of three steps:

[Figure 7.1 diagram: the data structure built for the linearisation example; see caption below.]

Figure 7.1: The data structure built during execution of the linearisation example from Figure 6.10. The elliptical nodes correspond to handles held by the library client. The trapezoid shaped nodes correspond to Excafé internal classes that also need to maintain handles to parts of the discrete expression DAG. The rectangular nodes correspond to expressions that are either scalar, discrete field or discrete operator valued. These are the nodes that form our (discrete) expression DAG. The handle unext holds a reference to unknown[final-1], the expression that will become the next value of u. However, unknown[final-1] has no dependencies on other expressions. This is due to the fact that our expression DAG structure no longer fully reflects data dependencies when dealing with indexed values.

Expression DAG Annotation: We associate sets of index variables with each node in the expression DAG. The expression corresponding to the node is only well-defined within all loops referenced by the index set, implying that they are nested. The order of nesting is not yet known.

Nesting Order Inference: Using the annotations, we infer the loop nesting structure of the imperative code.

Topological Sorting: The nodes and loops within every scope must be sorted with respect to dependencies in order to construct a valid execution order.

We describe these steps in detail in the next sub-sections.

7.2.1 Expression DAG Annotation

In a conventional expression DAG, each node corresponds to a single expression. In our expression DAGs, it is possible for a node to correspond to a syntactic expression that takes multiple values, if it is computed iteratively.

In our example, the expression unknown[i-1] is clearly only valid within the scope of loop i since it has a dependency on the value of i. Similarly, any expressions that use unknown[i-1] directly or indirectly are only well-defined within loop i.

In order to determine which expression DAG nodes are associated with which indices, we construct a new graph in which edges represent the inheritance of indices (TemporalIndexValue instances) from a node's operands. We show the graph corresponding to our linearisation example in Figure 7.2.

In Figure 7.2, the arcs show the propagation of indices from expression DAG nodes to their operands. Indices must also propagate from values assigned to unknown to all uses of unknown. These rules are not reflected in the structure of Figure 7.1 since the values being used are computed in previous iterations of i. We show these propagation rules using dashed arcs since they are, in a sense, implicit.

[Figure 7.2 diagram: the discrete expression DAG with arcs showing the direction of index propagation; see caption below.]

Figure 7.2: The example discrete expression DAG with arcs between nodes in the direction of index propagation. The dashed arcs denote index propagation from expressions assigned to unknown to all uses of unknown. We show the indices associated with each node at the start of the propagation phase. The index variable i is not allowed to propagate along the two arcs labelled ≠ i since the expression unknown[final-1] is only valid outside of loop i.

[Figure 7.3 diagram: the expression DAG after index propagation; see caption below.]

Figure 7.3: Our expression DAG after index propagation. Each node is annotated with the indices of the loops it is contained in. All expression DAG nodes inferred to be in loop i have been placed in the rectangular region.

Two of our implicitly defined arcs are labelled ≠ i. These arcs are forbidden from propagating the index variable i. Since the expression unknown[final-1] refers to the penultimate value assigned to unknown by loop i, it must be in the enclosing scope of loop i. Therefore, it inherits all indices of the values assigned to unknown, except i.

At the start of our index propagation, only nodes that explicitly make use of an index are associated with it. In Figure 7.2, the only node that refers to i is unknown[i-1]. We show the result of the index propagation pass for our linearisation example in Figure 7.3.

The implicitly defined edges used for our index propagation result in a problem defined over a graph that may no longer be acyclic. As a consequence, index propagation is an iterative process. However, it is guaranteed to terminate since our graph is finite and each propagation step either makes progress or marks the end of the algorithm.
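A minimal sketch of this fixed-point computation (the graph representation is illustrative, not Excafé's internal structure):

#include <set>
#include <vector>

// An arc along which indices propagate; 'excluded' holds the one index
// forbidden from propagating (the arcs labelled "!= i"), or -1 if none.
struct Arc
{
  std::size_t from, to;
  int excluded;
};

// indices[n] initially holds only the indices node n uses explicitly;
// we iterate until no set grows, i.e. until a fixed point is reached.
void propagateIndices(std::vector<std::set<int>>& indices,
                      const std::vector<Arc>& arcs)
{
  bool changed = true;
  while (changed)
  {
    changed = false;
    for (const Arc& arc : arcs)
      for (const int index : indices[arc.from])
        if (index != arc.excluded && indices[arc.to].insert(index).second)
          changed = true;
  }
}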

We recall that the unknown[final-1] expression DAG node is the value that will become the next value of the persistent field u. Therefore, it cannot be associated with any indices, otherwise its value would not be well defined.

7.2.2 Determining Loop Nesting

Associating each node in our discrete expression DAG with a set of TemporalIndexValue instances tells us the loops in which that node must be contained. However, it does not tell us the loop nesting structure.

For example, if node n is associated with TemporalIndexValue instances i, j, we know that n is an expression that is calculated within loops i and j but we do not know if loop i contains loop j or vice-versa.

Loop nesting is determined as follows:

1. Assign all expression DAG nodes associated with no TemporalIndexValue instances to the global scope.

2. For each expression DAG node associated with a TemporalIndexValue set {i} of size 1, create a scope for the new loop i and place the node inside that scope.

3. Continue the process for expression DAG nodes associated with TemporalIndexValue sets of increasing size. Scopes for at least all but one of the TemporalIndexValue instances in the set should already have been created, so it is possible to unambiguously identify the scope in which any new loop scope should be created.

For an example, assume we are given nodes A, B, C, D and E associated with sets of indices as follows:

A → {i, k, l}
B → {i, j}
C → {i, k}
D → {i}
E → ∅

[Figure 7.4 diagram: the scope structure at each of the four iterations of the algorithm; see caption below.]

Figure 7.4: The loop nesting construction at different iterations of our algorithm. At each step, nodes for which all but one of the indices belong to a known scope are used to construct new nested scopes.

We show the inference of the corresponding loop nesting in Figure 7.4. The complete scope of a node can be determined once all but one of the scopes of its indices are known.

Our algorithm uses the property that for any node n associated with a TemporalIndexValue set S of size |S| > 0, there exists another node p associated with a TemporalIndexValue set T of size |T| = |S| − 1 such that T ⊂ S.

The single TemporalIndexValue in the set S − T corresponds to the loop scope that n is in, but p is not. If no node p exists, the loop nesting is under-specified and ambiguous. If there is more than one candidate for p, S − T should be identical for all of them; otherwise, the nesting has been over-specified and cannot be satisfied.
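A sketch of this search (types illustrative; scopes are keyed by their complete index set, and callers must separately verify that all candidates agree, as described above):

#include <map>
#include <optional>
#include <set>

// Given the index set S of a node and the scopes already created (keyed
// by their full index set), find the index in S - T that opens the new
// innermost loop. nullopt signals an under-specified nesting (Section 7.3).
std::optional<int> newInnermostIndex(
    const std::set<int>& S,
    const std::map<std::set<int>, int>& knownScopes)
{
  for (const int candidate : S)
  {
    std::set<int> T(S);
    T.erase(candidate);
    if (knownScopes.count(T) > 0)
      return candidate;  // the loop for 'candidate' is nested inside scope T
  }
  return std::nullopt;
}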

We discuss whether it is possible for a library client to under- or over-specify loop nesting and how these situations can be detected in Section 7.3.

7.2.3 Topological Sort

Once each expression DAG node has been assigned to a loop, the remaining step is to determine an execution order. Instead of applying the topological sort to the whole expression DAG, as would be the case with an ordinary DAG, the sort is applied to each loop scope independently.

Within each loop scope, expression DAG nodes and sub-loops are sorted together. Any loop scope S will have the following dependencies:

• The initialisation values of any IndexedScalar, IndexedField or IndexedOperator associated with the TemporalIndex of the loop.

• Dependencies of all expression DAG nodes in S that are not found in S.

• Dependencies of all sub-scopes of S that are not found in S.

Within a scope, the root elements used to perform a topological sort are:

• The expression DAG node representing the loop termination condition.

• All values assigned to indexable scalars, fields or operators during the current iteration. Only indexed types can be used to retrieve a value from a loop or have their values used by the next iteration.

Once the topological sort is complete, we have a complete loop nesting structure and an execution order for the global scope and all loop scopes.
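The per-scope sort itself is a standard topological sort; a sketch using Kahn's algorithm (items stand for both expression DAG nodes and sub-loops; the representation is illustrative):

#include <deque>
#include <stdexcept>
#include <vector>

// deps[v] lists the items within the same scope that item v depends on.
// Returns a valid execution order, or throws if the scope contains a cycle.
std::vector<std::size_t> sortScope(const std::vector<std::vector<std::size_t>>& deps)
{
  const std::size_t n = deps.size();
  std::vector<std::size_t> remaining(n), order;
  std::vector<std::vector<std::size_t>> dependants(n);
  std::deque<std::size_t> ready;

  for (std::size_t v = 0; v < n; ++v)
  {
    remaining[v] = deps[v].size();
    for (const std::size_t u : deps[v])
      dependants[u].push_back(v);
    if (remaining[v] == 0)
      ready.push_back(v);
  }

  while (!ready.empty())
  {
    const std::size_t u = ready.front();
    ready.pop_front();
    order.push_back(u);
    for (const std::size_t v : dependants[u])
      if (--remaining[v] == 0)
        ready.push_back(v);
  }

  if (order.size() != n)
    throw std::runtime_error("Cyclic dependency within scope.");
  return order;
}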

TemporalIndex i;
IndexedField f(i);
i.setTermination(f[i].two_norm() < 1e-4);
solve.setNewValue(u, f[i]);

Figure 7.5: A code fragment showing the construction of an invalid specification for updating the value of u. Since the expression f[i] is only valid within the context of a loop, it cannot be used to define the new value of u.

7.3 Declarative Specification Validation

Our algorithm assumes that we have a valid, unambiguous declarative loop specification. However, there are multiple ways to construct invalid loop specifications. We consider invalid specifications and how construction is either prevented or detected.

7.3.1 Updating persistent values with incorrectly scoped expressions

Our expression DAG is used to define new values for persistent scalars, discretised fields or discretised operators. The nodes that represent the new values must exist at the global scope (i.e. they cannot exist within a loop).

Figure 7.5 shows a code fragment that tries to use an incorrectly scoped value to update the value of the persistent field u. This incorrect usage can easily be detected. After application of our index propagation algorithm, we verify that the expression DAG node used to define the new value for a persistent variable is not associated with any index variables.

7.3.2 Under-specification of loop nesting

An under-specified loop nesting means that we do not have sufficient ordering relations between index variables to construct an unambiguous loop nesting. Provided that the expressions used to update our persistent variables are correctly scoped, it is impossible to construct an under-specified loop nesting.

We prove this by considering the index propagation graph of any valid discrete expression DAG (e.g. Figure 7.3).

1. We select an arbitrary node n from our discrete expression DAG. We will call the set of indices associated with a node the index set. We denote the index set of n as θ(n). We will show that for any n, it is possible to determine the scoping of the loops whose induction variables are contained in θ(n).

2. Since n is required to evaluate the final result (or results) of the expression DAG, there exists at least one path to some result node r from n.

3. As we require that our result nodes are correctly scoped, the index set of any r is the empty set so that θ(r) = ∅.

4. If θ(n) = ∅, we already know that n is in the global scope. If θ(n) ≠ ∅, then we know that n must be inside at least one loop.

5. As loop indices propagate along the edges of our graph, in order for θ(r) to be empty, the path from n to r must contain edges such that each loop index in θ(n) cannot propagate along at least one of them.

6. By definition, our edges can either propagate all indices, or propagate all except one index.

7. By only restricting one index variable at a time, the path from n to r specifies a total nesting relationship for all loops referenced by θ(n).

Hence, we can always find at least one containment ordering between a set of nested loops. However, different paths between n and r may specify different orderings. This is an over-specification of loop nesting and we discuss how we detect it in the next sub-section.

TemporalIndex i, j;

IndexedScalar a(i), b(j);
IndexedScalar c(j), d(i);
a[-1] = 1;
b[-1] = 2;

// Implies that loop i is contained in loop j
a[i] = a[i-1] + b[j-1];
c[j] = a[final];

// Implies that loop j is contained in loop i
b[j] = a[i-1] + b[j-1];
d[i] = b[final];

solve.setNewValue(u, c[final] + d[final]);

Figure 7.6: A declarative loop specification that cannot be satisfied. Both a and b are only valid inside the nested scopes of loops i and j. However, it is impossible to infer whether i ⊂ j or vice-versa. We note that the expression assigned to u has no indices associated with it, and is therefore scoped correctly according to our definition.

7.3.3 Over-specification of loop nesting

It is possible to over-specify loop nesting, i.e. to declare conflicting constraints on how loops should be nested. We show an example of this in Figure 7.6.

As both a and b have expressions assigned to them using both indices i and j, loops i and j must be nested. However, the first use of final implies that i is the innermost loop, and the second use implies that j is.

Our algorithm for determining loop nestings will handle this case by constructing multiple scopes for identical loop variables, corresponding to the different conflicting loop nestings. We can detect this case at run-time and raise an error to inform the library client.

7.3.4 Invalid dependencies

We consider the attempt to construct a value that has a cyclic dependency in Figure 7.7.

Unlike our Scalar, Field and Operator handles, which can be assigned expressions involving 7.4. Conclusion 145

TemporalIndex i; IndexedScalar s[i]; IndexedScalar t[i]; s[i]= t[i]+1; t[i]= s[i]+1;

Figure 7.7: Construction of invalid values in Excafe´. This will raise an exception. themselves to progressively build up an expression, indexed types can only be assigned once during an iteration. We prevent invalid constructions like the one in Figure 7.7 by requiring that all indexed values used on the RHS of an assignment must come from a previous iteration.
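For illustration, the fragment of Figure 7.7 becomes valid once every indexed value read on the RHS comes from a previous iteration (our own example, following the conventions of Figure 6.12; the termination condition is omitted):

TemporalIndex i;
IndexedScalar s(i);
IndexedScalar t(i);

// Initial values are needed because the RHS now reads iteration i-1.
s[-1] = 0;
t[-1] = 0;

// Valid: each variable is assigned once, and all RHS reads are at i-1.
s[i] = t[i-1] + 1;
t[i] = s[i-1] + 1;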

7.4 Conclusion

In this chapter, we have described how we infer an imperative loop structure from an iteration specification represented in the form of a DAG. This enables recipes for iteratively constructed values to be specified to Excafé in a form far more similar to mathematical notation than using imperative constructs such as for or while loops. In addition to an algorithm for inferring an imperative representation, we have also described how it is possible to construct an invalid declarative specification and how to detect such situations.

Chapter 8

Incompressible Navier-Stokes Solver

In this chapter, we describe the implementation of an incompressible Navier-Stokes solver in Excafé. We chose this problem in order to explore optimisations for a non-trivial (i.e. time-dependent, multiple-term, non-linear and using multiple discretisations) test case. Before describing our implementation, we present our derivation of the finite element discretisation of the incompressible Navier-Stokes equations. The complete source of our incompressible Navier-Stokes solver is available in Appendix F.

8.1 The Test Problem

Our chosen problem is the time-dependent, two-dimensional fluid flow around a cylinder in a rectangular channel. This test case is commonly used in the literature [49]. We present the physical layout of our test problem in Figure 8.1. Fluid enters through the left edge Γin, flows around the cylinder and exits through Γout. Fluid velocity is 0 at the edges Γedge and on the surface of the cylinder.

More formally, we apply the following Dirichlet boundary conditions:

At Γedge and Γcylinder, the fluid velocity is (0, 0) ms⁻¹.

At Γin, the fluid velocity is (5, 0) ms⁻¹.


[Figure: a 3 m × 1 m channel with the inflow Γin on the left, the outflow Γout on the right and walls Γedge, with the 0.3 m diameter cylinder placed 0.35 m from the inflow, top and bottom boundaries.]

Figure 8.1: The physical layout of our test problem. A cylinder of diameter 0.3m is positioned at (0.5,0.5)m. Fluid enters through the edge Γin and exits through Γout.

On the outflow edge Γout, we will apply the “natural” outflow boundary condition. We describe this in Section 8.4.

Under certain conditions, vortices will be periodically shed from either side of the cylinder. This is known as a Kármán vortex street. In order to observe this effect, we require the Reynolds number, which describes the ratio of inertial to viscous forces in the fluid, to be within a certain range. We note that the Reynolds number, which primarily characterises the type of flow, is a dimensionless value. We have only made use of units when describing our test case to ease comprehension.

The Reynolds number Re may be defined as [49]:

Re = \frac{\bar{U} d}{\nu}   (8.1)

where \bar{U} is the average fluid velocity upstream of the cylinder, d is the diameter of the cylinder and ν is the kinematic viscosity of the fluid.

We let ν = 1/250 = 0.004 in our test problem. Substituting into Equation 8.1 we get:

Re = \frac{5 \times 0.3}{0.004} = 375   (8.2)

Our system should exhibit vortex shedding behaviour at this Reynolds number.

8.2 Our model

Our derivation is based on that described in Ralf Rannacher's lecture notes [27]. Equations 8.3a, 8.3b and 8.3c describe conservation of momentum, mass and energy, respectively.

\partial_t(\rho v) + \rho v \cdot \nabla v - \nabla \cdot \left(\mu \nabla v + \tfrac{1}{3}\mu (\nabla \cdot v) I\right) + \nabla p = \rho f   (8.3a)

\partial_t \rho + \nabla \cdot (\rho v) = 0   (8.3b)

\partial_t(c_p \rho T) + c_p \rho v \cdot \nabla T - \nabla \cdot (\lambda \nabla T) = h   (8.3c)

Our unknowns are the fluid velocity v, fluid pressure p, fluid density ρ and fluid temperature T. Parameters describing our fluid are the heat capacity c_p, heat conductivity λ and dynamic viscosity μ. f and h are our volume forcing and heat source terms, respectively, and are given.

Firstly, we assume that our flow is isothermal. This causes Equation 8.3c to decouple from our system and we do not consider it further. Secondly, we assume that the density of our fluid ρ = ρ₀, a constant. This enables us to rewrite Equation 8.3b as follows:

∇ · v = 0 (8.4)

Equation 8.4 states that our fluid is incompressible and is called the incompressibility constraint.

Substituting Equation 8.4 back into Equation 8.3a, letting ρ₀ = 1 and ν = μ/ρ₀, we get the form of the incompressible Navier-Stokes equations that we will discretise with the finite element method.

\underbrace{\partial_t v}_{\substack{\text{unsteady}\\\text{acceleration}}}
+ \underbrace{v \cdot \nabla v}_{\substack{\text{convective}\\\text{acceleration}}}
- \underbrace{\nu \nabla^2 v}_{\text{viscosity}}
+ \underbrace{\nabla p}_{\substack{\text{pressure}\\\text{gradient}}}
= \underbrace{f}_{\substack{\text{volume}\\\text{force}}}   (8.5a)

\underbrace{\nabla \cdot v = 0}_{\text{incompressibility constraint}}   (8.5b)

8.3 Variational Formulation

We let v and p be our velocity and pressure trial functions, respectively. We introduce ϕ and χ as our test functions in the same function spaces as v and p, respectively.

We also introduce the following notation, where Equations 8.6, 8.7, 8.8 show the shorthand for integrals over the domain involving scalar multiplication, vector inner products and double inner products, respectively:

(u, v) = \int_\Omega u\,v \, dx   (8.6)

\langle u, v \rangle = \int_\Omega u \cdot v \, dx   (8.7)

[u, v] = \int_\Omega u : v \, dx   (8.8)

Multiplying by our test functions and integrating over our domain Ω:

\langle \partial_t v, \varphi \rangle + \langle v \cdot \nabla v, \varphi \rangle - \nu \langle \nabla^2 v, \varphi \rangle + \langle \nabla p, \varphi \rangle = \langle f, \varphi \rangle   (8.9)

(∇ · v, χ) = 0 (8.10)

As our finite element approximation of v is only of differentiability class C⁰ (i.e. we can only take the first derivative), we apply Green's first identity to the viscosity term, reducing the order of the derivative:

\nu \langle \nabla^2 v, \varphi \rangle = -\nu [\nabla v, \nabla \varphi] + \nu \int_{\partial\Omega} \varphi \cdot (\nabla v \cdot n) \, dS   (8.11)

We also apply the Divergence theorem to the pressure gradient term:

\langle \nabla p, \varphi \rangle = -(p, \nabla \cdot \varphi) + \int_{\partial\Omega} p\,(\varphi \cdot n) \, dS   (8.12)

We rewrite our system as follows, excluding the edge integrals.

\langle \partial_t v, \varphi \rangle + \langle v \cdot \nabla v, \varphi \rangle + \nu [\nabla v, \nabla \varphi] - (p, \nabla \cdot \varphi) = \langle f, \varphi \rangle   (8.13)

(∇ · v, χ) = 0 (8.14)

Excluding the edge integrals implicitly gives us boundary conditions on the outflow, which we describe in the next section.

8.4 Outflow Boundary Condition

From Equations 8.11 and 8.12, it follows that:

\nu [\nabla v, \nabla \varphi] - (p, \nabla \cdot \varphi)
= \langle \nabla p, \varphi \rangle - \nu \langle \nabla^2 v, \varphi \rangle
- \int_{\partial\Omega} p\,(\varphi \cdot n) \, dS
+ \nu \int_{\partial\Omega} \varphi \cdot (\nabla v \cdot n) \, dS   (8.15)

Integration by parts of the pressure gradient and viscosity terms gives us boundary integrals that we use to impose Neumann boundary conditions. We impose the "do nothing" boundary condition by setting these terms to zero on the outflow Γout. Hence, our outflow boundary condition is:

\int_{\Gamma_{out}} \big( \nu\,(\varphi \cdot (\nabla v \cdot n)) - p\,(\varphi \cdot n) \big) \, dS = 0   (8.16)

These have been observed to produce satisfactory results for essentially parallel flow [49].

8.5 Temporal Discretisation

We define the following matrices that we will use to define our linear system:

M_{i,j} = \left( \varphi^\delta_j, \varphi^\delta_i \right)_{i,j=1}^{N_v}

A_{i,j} = \nu \left( \nabla \varphi^\delta_j, \nabla \varphi^\delta_i \right)_{i,j=1}^{N_v}

N(x)_{i,j} = \left( \Big( \sum_{k=0}^{N_v} x_k \varphi^\delta_k \Big) \cdot \nabla \varphi^\delta_j, \; \varphi^\delta_i \right)_{i,j=1}^{N_v}

B_{i,j} = \left( \chi^\delta_j, \nabla \cdot \varphi^\delta_i \right)_{i,j=1}^{N_v, N_p}

where:

φ^δ_i, 1 ≤ i ≤ N_v is the set of discrete basis functions used to approximate v and ϕ.

χ^δ_i, 1 ≤ i ≤ N_p is the set of discrete basis functions used to approximate p and χ.

We ignore our forcing term f, which for our example problem will be 0. We write our system of ordinary differential equations as follows:

M\dot{x}^n + Ax^n + N(x^n)x^n - By^n = 0   (8.17)

B^T x^n = 0   (8.18)

where x^n and y^n represent vectors of discrete basis function coefficients used to approximate v and p respectively at time-step n.

We apply a one-step-θ time-stepping scheme:

\left[M + \theta k\,(A + N(x^n))\right]x^n - By^n = \left[M - (1-\theta)k\,(A + N(x^{n-1}))\right]x^{n-1}   (8.19)

B^T x^n = 0   (8.20)

where k is the time-step length from t_{n-1} → t_n. Special cases of this scheme include the forward Euler scheme (θ = 0, first-order explicit), the backward Euler scheme (θ = 1, first-order implicit, strongly A-stable) and the Crank-Nicolson scheme (θ = 1/2, second-order implicit, A-stable). We adopt the Crank-Nicolson scheme for our solver.
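Substituting θ = 1/2 into Equation 8.19 gives the Crank-Nicolson update solved at each time-step:

\left[M + \tfrac{k}{2}\,(A + N(x^n))\right]x^n - By^n = \left[M - \tfrac{k}{2}\,(A + N(x^{n-1}))\right]x^{n-1}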

8.6 Linearising the Convective Acceleration Term

To be able to use a linear system solver, all of our operators must be linear (i.e. they must correspond to a bilinear form). However, we have the term N(x^n)x^n, which corresponds to the non-linear operator (v · ∇v).

We linearise our system using the Picard iteration. We approximate the non-linear term by assuming that:

v^i \cdot \nabla v^i \approx v^{i-1} \cdot \nabla v^i

and repeatedly solve for v^i until our residual becomes acceptable. Rewriting our discretised term, we approximate our discrete non-linear operator as:

N(x^n)x^n \approx N(x^{n,i-1})x^{n,i}

static const std::size_t dimension = 2;

TriangularMeshBuilder meshBuilder(3.0, 1.0, 1.0/900.0);
meshBuilder.addPolygon(Polygon(vertex<2>(0.5, 0.5), 16, 0.15, 0), 5);
Mesh mesh(meshBuilder.buildMesh());

Figure 8.2: Construction of a 3×1 mesh in Excafé with a 16-sided regular polygon located at (0.5, 0.5).

Hence, we solve a linear system assembled using successive approximations to the non-linear term, multiple times within each time-step.

8.7 Implementation in Excafé

We briefly describe the implementation of our solver.

8.7.1 Mesh Construction

First, we construct our rectangular domain and add our cylindrical barrier. Since we can only model shapes with straight edges, we approximate the cylinder using a 16-sided regular polygon.

We show the code to construct the mesh in Excafé in Figure 8.2.

For mesh generation we use the triangle library [46]. The mesh we generate is shown in Figure 8.3. In our implementation, we have restricted the maximum cell area to 900⁻¹ m².

8.7.2 Discretisation

For our solver to be numerically stable, we require that our choice of element satisfies the Babuška-Brezzi (BB) condition [49], also known as the inf-sup condition. The BB condition prescribes restrictions on the choice of the velocity and pressure discrete function spaces in order for the solution to be stable and unique.

Figure 8.3: The mesh we use for our incompressible Navier-Stokes test problem. Mesh size is 3×1 and maximum cell area is 900⁻¹ m².

[Figure: triangular element diagram; circular nodes mark velocity, rectangular nodes mark velocity & pressure.]

Figure 8.4: A P2-P1 Taylor-Hood element is a two-dimensional triangular element with pressure nodes located at the cell vertices (rectangular nodes) and velocity nodes located at both the cell vertices and edge midpoints (circular nodes). Consequently, pressure is a linear field and velocity is a quadratic field.

We satisfy the BB condition by using the Taylor-Hood P2-P1 element (shown in Figure 8.4) which is known to satisfy these requirements [49]. We construct our element by using a linear basis for pressure and a quadratic basis for velocity. We show the code we use to declare our elements, function spaces and fields in Figure 8.5. We declare a composite function space that can be used to construct a field that contains both velocity and pressure components.

8.7.3 Boundary Conditions

We recall the Dirichlet boundary conditions we specified in Section 8.1. Boundary conditions in Excafé are currently built by associating mesh facets with constant tensor values. We show the code to do this in Excafé in Figure 8.6 and the resulting labelling in Figure 8.7. Labels 1 to 4 are automatically assigned to the edges of our mesh. Label 5 was specified to the mesh generator for the edges constructed when adding our cylinder approximation to the mesh.

Scenario scenario(mesh);

Element velocity = scenario.addElement(new LagrangeTriangleQuadratic<1>());
Element pressure = scenario.addElement(new LagrangeTriangleLinear<0>());

FunctionSpace velocitySpace = scenario.defineFunctionSpace(velocity, mesh);
FunctionSpace pressureSpace = scenario.defineFunctionSpace(pressure, mesh);
FunctionSpace coupledSpace = velocitySpace + pressureSpace;

NamedField velocityField = scenario.defineNamedField("velocity", velocitySpace);
NamedField pressureField = scenario.defineNamedField("pressure", pressureSpace);

Figure 8.5: Declaration of vector-valued quadratic and scalar-valued linear function spaces. We define the fields that will approximate velocity and pressure using the vector-valued and scalar-valued spaces respectively. We also define a function space that can be used to approximate a coupled velocity-pressure field.

const Tensor zero(1);

Tensor inflow(1);
inflow(0) = 5.0;

BoundaryConditionList velocityConditionList(1);
velocityConditionList.add(BoundaryConditionTrivial(1, zero));
velocityConditionList.add(BoundaryConditionTrivial(3, zero));
velocityConditionList.add(BoundaryConditionTrivial(4, inflow));
velocityConditionList.add(BoundaryConditionTrivial(5, zero));

BoundaryCondition velocityConditions =
  scenario.addBoundaryCondition(velocitySpace, velocityConditionList);

Figure 8.6: Construction of rank-1 (vector) valued Dirichlet boundary conditions for the velocity field. Vector-valued values are attached to mesh facets using labels provided by the mesh generator. We do not attach a Dirichlet boundary condition to label 2 since this is the outflow boundary.

[Figure: the mesh boundary with facet labels 1-5; see caption below.]

Figure 8.7: The boundary facet labelling used for our test problem. Labels 1-4 are assigned automatically by the mesh generator and the cylinder was declared to have label 5.

8.7.4 Solver Specification

Specification of a sequence of operations required to solve, or advance, a finite element problem is done through the construction of a SolveOperation instance. For our example, we use a coupled solver that solves for velocity and pressure simultaneously.

Our bilinear form lhsForm is used to construct the system matrix in the following problem:

\begin{pmatrix}
M + \theta k\,(A + N(x^{n,i-1})) & -B \\
B^T & 0
\end{pmatrix}
\begin{pmatrix} x^n \\ y^n \end{pmatrix}
=
\begin{pmatrix} D u^{n-1} \\ 0 \end{pmatrix}   (8.21)

where D represents the other operator we construct, nonLinearRhs.

D = M - (1 - \theta)k\,(A + N(x^{n-1}))   (8.22)

The system in Equation 8.21 is repeatedly solved until the residual of the system using the actual non-linear term N(x^{n,i}) instead of our approximation N(x^{n,i-1}) is less than an acceptable value.

8.7.5 Solver Execution

To execute our solver for multiple time-steps, we execute the SolveOperation repeatedly and dump the scenario's tensor fields to a VTK file at each time-step. This is shown in Figure 8.9. In future, we hope to also capture the time-stepping loop using our declarative indexing syntax.

8.8 Generated Output

We present the vector fields generated by Excafé on our example incompressible Navier-Stokes problem in Figures 8.10 and 8.11.

SolveOperation step = scenario.newSolveOperation();

Scalar theta = 0.5;
Scalar k = 0.01;
Scalar kinematic_viscosity = 1.0/250;

Operator nonLinearRhs(velocitySpace, velocitySpace);
nonLinearRhs =
  B(velocity, velocity)*dx +
  B(-(1.0-theta)*k*kinematic_viscosity*grad(velocity), grad(velocity))*dx +
  B(-(1.0-theta)*k*inner(velocityField, grad(velocity)), velocity)*dx;

Field velocityRhs = nonLinearRhs * velocityField;
Field load(project(velocityRhs, coupledSpace));

TemporalIndex i;
IndexedField unknownGuess(i);
unknownGuess[-1] =
  project(velocityField, coupledSpace) + project(pressureField, coupledSpace);

const forms::BilinearFormIntegralSum lhsForm =
  B(velocity, velocity)*dx +
  B(theta*k*kinematic_viscosity*grad(velocity), grad(velocity))*dx +
  B(-1.0*k*pressure, div(velocity))*dx +
  B(div(velocity), pressure)*dx +
  B(theta*k*inner(project(unknownGuess[i-1], velocitySpace), grad(velocity)),
    velocity)*dx;

LinearSystem system = assembleGalerkinSystem(coupledSpace, lhsForm, load,
  velocityConditions, unknownGuess[i-1]);
Operator linearisedSystem = system.getConstrainedSystem();
unknownGuess[i] = system.getSolution();

Scalar residual =
  ((linearisedSystem * unknownGuess[i-1]) - system.getConstrainedLoad()).two_norm();
i.setTermination(residual < 1e-3);

step.setNewValue(velocityField, project(unknownGuess[final-1], velocitySpace));
step.setNewValue(pressureField, project(unknownGuess[final-1], pressureSpace));
step.finish();

Figure 8.8: Construction of a SolveOperation object that describes how to advance the simulation by a single time-step. The solution of velocity and pressure components is done simultaneously, producing a field from which the velocity and pressure components are then extracted.

for (int i = 0; i < 6000; ++i)
{
  std::cout << "Starting timestep" << ...
  ...
}

Figure 8.9: The time-stepping loop of our Navier-Stokes solver.

8.9 Conclusion

In this chapter we presented the implementation of an incompressible Navier-Stokes solver using Excafé. The solver also demonstrated an application of the declarative loop syntax, which we used to implement Picard iteration for linearising the convective acceleration term.

The problem presents many interesting aspects that we wish to explore with our local assembly optimisations. These include the use of previously computed fields during assembly, assembly using multiple bilinear forms and reuse of basis functions and their gradients across bilinear forms.

In the next chapter we discuss how Excafé constructs a representation of the local assembly matrix amenable to optimisation and how an optimiser might be able to take advantage of these properties.

(a) Time-step 0 (0 seconds)

(b) Time-step 9 (0.09 seconds)

(c) Time-step 18 (0.18 seconds)

(d) Time-step 27 (0.27 seconds)

Figure 8.10: The velocity vector fields produced by our incompressible Navier-Stokes solver for the time interval 0 → 0.27 seconds.

(a) Time-step 36 (0.36 seconds)

(b) Time-step 45 (0.45 seconds)

(c) Time-step 54 (0.54 seconds)

(d) Time-step 63 (0.63 seconds)

Figure 8.11: The velocity vector fields produced by our incompressible Navier-Stokes solver for the time interval 0.36 → 0.63 seconds.

Chapter 9

Local Assembly Construction and Optimisation

In this chapter I describe how Excafé performs local assembly: the construction of the small dense matrices from cell-local information that are used to form the full sparse linear system of equations. I also discuss the space of optimisations that Excafé's expression capture enables us to explore, and compare them to those performed by the FEniCS Form Compiler [36] (FFC), a code generator for variational forms.

In Section 9.1 we provide an overview of the process of local assembly and introduce the notation we use. In Section 9.2 we compare the approaches taken by other finite element libraries to implementing and optimising local assembly to our own. In Sections 9.3 and 9.4 we describe how we construct and optimise a representation of the local assembly matrix in Excafé. In Section 9.5 we compare our approach to the assembly optimisations developed in FFC [38], which are the subject of ongoing research. In Sections 9.6 and 9.7, we present future work and then conclude.


9.1 Local Assembly Overview

The finite element method involves rewriting our problem into the weak formulation. This can be described as follows, where both u and v are functions of the global co-ordinate x:

find $u \in X$ such that $a(u, v) = L(v) \quad \forall v \in V$ (9.1)

where a and L are bilinear and linear forms respectively. The function spaces X and V are called the trial and test spaces, respectively. Both X and V contain an infinite set of functions. We approximate them with the discrete function spaces $X^{\delta}$ and $V^{\delta}$ using the basis sets Φ and Ψ, respectively. Our problem now reads:

find $u^{\delta} \in X^{\delta}$ such that $a(u^{\delta}, v^{\delta}) = L(v^{\delta}) \quad \forall v^{\delta} \in V^{\delta}$ (9.2)

Our solution $u^{\delta}$ is expressed in terms of a linear combination of basis functions as follows:

$$u^{\delta}(x) = \sum_{j=0}^{|\Phi|} \hat{u}_j \Phi_j(x) \quad (9.3)$$

We define our discrete linear system as follows:

Ax = b (9.4)

A and b are the discretised versions of a and L respectively, defined as a matrix and vector as follows:

$A_{ij} = a(\Phi_j(x), \Psi_i(x))$ (9.5)

$b_i = L(\Psi_i(x))$ (9.6)

and the unknown x is the vector of basis function coefficients, so that:

$x_j = \hat{u}_j$ (9.7)

Fortunately, in order to compute A we do not need to evaluate each integral over the entire domain. Our basis functions are defined so that they are only non-zero over neighbouring cells. Hence, it is possible to compute A from a set of cell-local contributions. Calculating the contribution of a single cell is called local assembly.
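To make this concrete, the following sketch shows how cell-local contributions might be scattered into a global sparse matrix using the local-to-global numbering; the container types and the assembleLocal and iota parameters are illustrative stand-ins, not Excafé's actual interfaces:

#include <cstddef>
#include <functional>
#include <map>
#include <utility>
#include <vector>

using LocalMatrix = std::vector<std::vector<double>>;        // |psi| x |phi| entries of M^k
using GlobalMatrix = std::map<std::pair<int, int>, double>;  // sparse global matrix A

// Scatter each cell's local assembly matrix M^k into the global matrix A,
// where iota(k, i) gives the global number of local basis function i on cell k.
void assembleGlobal(const std::vector<int>& cells,
                    const std::function<LocalMatrix(int)>& assembleLocal,
                    const std::function<int(int, int)>& iota,
                    GlobalMatrix& A)
{
  for (const int k : cells)
  {
    const LocalMatrix M = assembleLocal(k);
    for (std::size_t q = 0; q < M.size(); ++q)
      for (std::size_t p = 0; p < M[q].size(); ++p)
        A[{iota(k, static_cast<int>(q)), iota(k, static_cast<int>(p))}] += M[q][p];
  }
}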

Before we can define the local contribution, called the local assembly matrix, we must first define the following:

K, the set of cells that our domain Ω is partitioned into.

$\chi^k(\xi)$, a function that transforms a local co-ordinate ξ on the reference cell to the global co-ordinate x on cell k.

$\iota^k(i)$, a function that returns the global numbering of the local basis function i defined on cell k.

φ and ψ, the local versions of the basis function sets Φ and Ψ respectively. For all local basis functions $1 \le p \le |\phi|$, $1 \le q \le |\psi|$ on any cell $k \in K$:

$\Phi_{\iota^k(p)}(\chi^k(\xi)) = \phi_p(\xi)$ (9.8)

$\Psi_{\iota^k(q)}(\chi^k(\xi)) = \psi_q(\xi)$ (9.9)

We define our local assembly matrix for cell k as follows:

$$M^k_{qp} = a(\Phi_{\iota^k(p)}, \Psi_{\iota^k(q)}) \quad (9.10)$$

The functional a corresponds to an integral over cell k. However, via a change of variables, it is possible to rewrite a so that the integration occurs over a reference cell, $\Omega_{st}$. The integration is now performed w.r.t. local co-ordinates and references the local basis sets φ and ψ instead of the global ones Φ and Ψ.

The change of variables requires that the integrand be multiplied by the term $|J(\chi^k(\xi))|$. This is the determinant of the Jacobian of the co-ordinate transformation from the reference cell to cell k, which acts as a scaling factor.

The functionals a and L may also contain derivatives of the basis functions with respect to global co-ordinates. Handling these requires use of the chain rule:

$$\frac{d}{dx} = \frac{d}{d\xi} \frac{d\xi}{dx} \quad (9.11)$$

This allows us to compute derivatives of our global basis functions with respect to global co-ordinates in terms of local ones:

$\nabla\Phi_{\iota^k(p)}(\chi^k(\xi)) = (\nabla\chi^k(\xi))^{-1} \cdot \nabla\phi_p(\xi)$ (9.12)

$\nabla\Psi_{\iota^k(p)}(\chi^k(\xi)) = (\nabla\chi^k(\xi))^{-1} \cdot \nabla\psi_p(\xi)$ (9.13)

Rewriting the integral so that it is performed over the reference cell means that evaluating it only requires knowledge of how to compute values and derivatives of functions contained in the local basis sets rather than the global ones.

Lastly, we need a method to evaluate our integrals. Typically, this is done numerically rather than analytically, through the use of quadrature. A quadrature rule defines an approximation of an integral as a finite summation:

$$\int_{\Omega_{st}} u(\xi)\, d\xi \approx \sum_{i=0}^{Q-1} w_i u(\xi_i) \quad (9.14)$$

The function is evaluated at Q points over the integration region and a weighted sum is constructed. In practice, Gaussian quadrature is almost always used since it permits construction of rules that will exactly integrate a polynomial of a given order.
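As an illustration, the weighted summation of Equation 9.14 translates directly into code; the names here are hypothetical rather than taken from any particular library:

#include <cstddef>
#include <functional>
#include <vector>

struct QuadraturePoint { double xi1, xi2; };

// Approximate the integral of u over the reference cell as a weighted sum of
// point evaluations (Equation 9.14).
double integrate(const std::vector<QuadraturePoint>& points,
                 const std::vector<double>& weights,
                 const std::function<double(double, double)>& u)
{
  double result = 0.0;
  for (std::size_t i = 0; i < points.size(); ++i)
    result += weights[i] * u(points[i].xi1, points[i].xi2);
  return result;
}

For example, the one-point rule with point (1/3, 1/3) and weight 1/2 integrates any affine function exactly over the reference triangle.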

9.2 Implementation Approaches to Local Assembly

We briefly describe the different approaches taken to performing local assembly, including those taken by the FEniCS Form Compiler and Excafé. We consider how to evaluate the local assembly matrix for the Laplacian operator:

$$a(u, v) = \int_{\Omega} \nabla u(x) \cdot \nabla v(x)\, dx \quad (9.15)$$

9.2.1 Quadrature

The standard approach to local assembly is through the use of quadrature [26]. We compute our local assembly matrix for some cell k as follows:

$$M^k_{qp} = \sum_{i=0}^{Q-1} w_i \left( (\nabla\chi^k)^{-1} \cdot \nabla\phi_p \right) \cdot \left( (\nabla\chi^k)^{-1} \cdot \nabla\psi_q \right) |J(\chi^k)| \quad (9.16)$$

The values of $\nabla\phi_p(\xi)$ and $\nabla\psi_p(\xi)$ are only dependent on the local co-ordinate ξ and so can be tabulated at the quadrature points and reused for all cell integrals. The gradient transform $(\nabla\chi^k)^{-1}$ and the scaling factor $|J(\chi^k)|$ are cell-dependent and therefore need to be re-computed for each cell being integrated over. However, if the local-to-global co-ordinate transform is affine, these values do not vary across the element and can be computed only once for each cell, instead of at every quadrature point.
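A sketch of this scheme for the Laplacian on an affine 2D cell follows, assuming tabulated reference gradients and a per-cell gradient transform and Jacobian determinant; it illustrates Equation 9.16 rather than any specific library's implementation:

#include <array>
#include <cstddef>

// Quadrature-based local assembly for scalar basis functions on an affine
// 2D cell. gradPhi[i][p] holds the tabulated reference-cell gradient of
// basis function p at quadrature point i; G is the cell's gradient transform
// (grad chi)^-1 and detJ its Jacobian determinant, both constant per cell.
template <std::size_t Q, std::size_t N>
std::array<std::array<double, N>, N>
assembleLaplacian(const std::array<double, Q>& weights,
                  const std::array<std::array<std::array<double, 2>, N>, Q>& gradPhi,
                  const std::array<std::array<double, 2>, 2>& G,
                  double detJ)
{
  std::array<std::array<double, N>, N> M{};
  for (std::size_t i = 0; i < Q; ++i)
    for (std::size_t q = 0; q < N; ++q)
      for (std::size_t p = 0; p < N; ++p)
      {
        double gp[2], gq[2];  // gradients transformed to global co-ordinates
        for (int d = 0; d < 2; ++d)
        {
          gp[d] = G[d][0]*gradPhi[i][p][0] + G[d][1]*gradPhi[i][p][1];
          gq[d] = G[d][0]*gradPhi[i][q][0] + G[d][1]*gradPhi[i][q][1];
        }
        M[q][p] += weights[i] * (gp[0]*gq[0] + gp[1]*gq[1]) * detJ;
      }
  return M;
}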

9.2.2 Tensor Representation (FEniCS)

The FEniCS project has investigated the representation of local assembly as a tensor contraction [36] between a geometry-independent reference tensor $A^0$ and a geometry tensor $G_k$ which is computed for each cell k. Assembly (for affine elements) is performed using a double inner product between these two tensors:

$$M^k = A^0 : G_k \quad (9.17)$$

In the case of our Laplacian operator, the tensors can be defined as follows:

$$A^0_{qp\alpha\beta} = \int_{\Omega_{st}} \frac{\partial\phi_p}{\partial\xi_\alpha} \frac{\partial\psi_q}{\partial\xi_\beta}\, d\xi \quad (9.18)$$

$$G^{\alpha\beta}_k = \sum_{\gamma=0}^{d} \frac{\partial\xi_\alpha}{\partial x_\gamma} \frac{\partial\xi_\beta}{\partial x_\gamma}\, |J(\chi^k)| \quad (9.19)$$

Here, α, β and γ are co-ordinate directions that we take partial derivatives with respect to. Our trial and test basis function numberings are p and q respectively. It is possible to extend this representation to non-affine mappings if the ranks of the reference and geometry tensors are increased.

The cost of assembly is the number of operations required to evaluate the geometry tensor and the contraction between the reference and geometry tensors. In some cases, this is a significant improvement over quadrature-based assembly.
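The contraction itself is a small dense kernel. The following sketch illustrates Equation 9.17 for a rank-2 geometry tensor (the affine Laplacian case); it is a schematic of the approach, not FFC's generated code:

#include <array>
#include <cstddef>

// M^k_{qp} = sum_{alpha,beta} A0[q][p][alpha][beta] * Gk[alpha][beta].
// A0 is the geometry-independent reference tensor (Equation 9.18), computed
// once per form; Gk is the geometry tensor (Equation 9.19), computed per cell.
template <std::size_t N>
std::array<std::array<double, N>, N>
contract(const std::array<std::array<std::array<std::array<double, 2>, 2>, N>, N>& A0,
         const std::array<std::array<double, 2>, 2>& Gk)
{
  std::array<std::array<double, N>, N> M{};
  for (std::size_t q = 0; q < N; ++q)
    for (std::size_t p = 0; p < N; ++p)
      for (int a = 0; a < 2; ++a)
        for (int b = 0; b < 2; ++b)
          M[q][p] += A0[q][p][a][b] * Gk[a][b];
  return M;
}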

Further work on performing assembly using the tensor representation has looked at reducing the operation count required to perform the contraction between the geometry and reference tensors [38]. The optimisations involve forming a complexity-reducing relation based on Hamming distance and co-linearity between vectors in the reference tensor.

Both tensor- and quadrature-based assembly have been implemented in FFC, the FEniCS Form Compiler [36]. FFC is the result of ongoing development and research and provides a basis for comparison with our work.

9.2.3 Excafé

The approach to optimising assembly in Excafé is inspired by the numerous patterns and redundancies observed in the assembly of forms in different finite element implementations. These include:

1. Tensor-valued basis functions constructed from repeating scalar-valued basis functions leading to sparse tensors.

2. Basis functions defined over a one-dimensional region multiplied together to form basis functions for higher dimensions.

3. Quadrature over a one-dimensional region multiplied together to form quadrature for higher dimensions.

4. Variations in when different terms may be partially evaluated based on whether the element is affine.

5. The use of the same basis functions for trial and test functions (always the case in Galerkin methods).

6. The use of the same basis functions to discretise fields as used for the trial and test functions.

Exploiting these patterns (where they are exploitable) requires a representation in which they can be exposed. These representations are often specific to the particular optimisation being performed, so it is unclear what a general representation for exploring these optimisations should look like.

In Excafé, we use expression capture to construct a representation of assembly that can be analysed symbolically (as opposed to numerically). We observe that there are opportunities for optimisations at the level of bilinear forms, and within and between basis functions at the assembly level.

Our optimisation framework is based around representing each element of the local assembly matrix as a scalar-valued function. Therefore, we lose the loop structure that characterises many operations in local assembly. However, we have a representation that should allow any redundancies and form-specific optimisations to be detected.

We expect that such a representation should enable detection of form-specific optimisations that would typically have needed hand coding. Such optimisations would usually have come at the cost of the structure of the written code, as this would also be highly form-specific.

9.3 Assembly in Excafé

An assembly matrix in Excafé is represented by a matrix of scalar-valued expressions. Each expression may contain symbolic values whose actual values are unknown because they are either not yet calculated (e.g. scalar expressions from the discrete expression DAG) or are cell-dependent (e.g. vertex co-ordinates and basis function coefficients). Further information on Excafé's scalar expression representation can be found in Section D.5.

We describe how Excafé constructs this set of expressions from a bilinear form and basis function representations. Again, we consider the Laplacian operator:

$$a(u, v) = \int_{\Omega} \nabla u \cdot \nabla v\, dx \quad (9.20)$$

where u is the trial function, v is the test function and both are scalar-valued. We assume we wish to form the assembly matrix for a two-dimensional triangular cell.

9.3.1 Assembly matrix overview

We choose to represent our set of scalar trial and test functions over the reference cell with the symbols φ and ψ respectively. We use the linear Lagrange basis functions over the reference triangle, of which there are three. Therefore our assembly matrix is 3×3, where each element corresponds to an integral over the cell using different basis functions. Using Φ and Ψ to represent the global versions of our trial and test functions, we wish to compute the following matrix of expressions:

$$M = \begin{pmatrix} \int_k \nabla\Phi_{\iota^k(0)} \cdot \nabla\Psi_{\iota^k(0)}\, dx & \int_k \nabla\Phi_{\iota^k(1)} \cdot \nabla\Psi_{\iota^k(0)}\, dx & \int_k \nabla\Phi_{\iota^k(2)} \cdot \nabla\Psi_{\iota^k(0)}\, dx \\ \int_k \nabla\Phi_{\iota^k(0)} \cdot \nabla\Psi_{\iota^k(1)}\, dx & \int_k \nabla\Phi_{\iota^k(1)} \cdot \nabla\Psi_{\iota^k(1)}\, dx & \int_k \nabla\Phi_{\iota^k(2)} \cdot \nabla\Psi_{\iota^k(1)}\, dx \\ \int_k \nabla\Phi_{\iota^k(0)} \cdot \nabla\Psi_{\iota^k(2)}\, dx & \int_k \nabla\Phi_{\iota^k(1)} \cdot \nabla\Psi_{\iota^k(2)}\, dx & \int_k \nabla\Phi_{\iota^k(2)} \cdot \nabla\Psi_{\iota^k(2)}\, dx \end{pmatrix} \quad (9.21)$$

Unlike $M^k$, the local assembly matrix for a given cell k, M is a matrix of expressions that can be used to compute $M^k$ for any cell k. Each row corresponds to a different test function and each column to a different trial function. To evaluate $M^k$ from M for some cell k, the actual values of all symbolic values in M must be known. Before we consider integration, we first form a matrix of expressions that represent the value of the bilinear form at some arbitrary point on the cell k in terms of the local co-ordinate system ξ. We name this matrix of expressions E, so that:

$$M_{qp} = \int_k E_{qp}(\xi)\, |J(\chi^k)|\, d\xi \quad (9.22)$$

where $|J(\chi^k)|$ is the scaling factor from our co-ordinate transformation. We can decompose E into the trial and test contributions as follows:

$$E_{qp} = A_p \cdot B_q \quad (9.23)$$

where A and B are vectors of expressions defined as:

$$A = \begin{pmatrix} \nabla\Phi_{\iota^k(0)} \\ \nabla\Phi_{\iota^k(1)} \\ \nabla\Phi_{\iota^k(2)} \end{pmatrix} \qquad B = \begin{pmatrix} \nabla\Psi_{\iota^k(0)} \\ \nabla\Psi_{\iota^k(1)} \\ \nabla\Psi_{\iota^k(2)} \end{pmatrix} \quad (9.24)$$

and the elements of both vectors are tensor-valued expressions for the trial and test portions of the bilinear form to be integrated.

9.3.2 Bilinear form expression construction

We recall the basis function definitions from Section 6.6 which we use to define φ and ψ, the basis functions defined over our reference cell:

φ0(ξ1, ξ2) = ψ0(ξ1, ξ2) = 1 − ξ1 − ξ2 (9.25)

φ1(ξ1, ξ2) = ψ1(ξ1, ξ2) = ξ1 (9.26)

φ2(ξ1, ξ2) = ψ2(ξ1, ξ2) = ξ2 (9.27)

We now show how we construct the vector A of expressions corresponding to ∇Φ. We start with the vector of expressions representing φ:

$$\begin{pmatrix} \phi_0 \\ \phi_1 \\ \phi_2 \end{pmatrix} = \begin{pmatrix} 1 - \xi_1 - \xi_2 \\ \xi_1 \\ \xi_2 \end{pmatrix} \quad (9.28)$$

To compute ∇φ, we analytically differentiate each entry. We write the result as a vector of rank-1 tensors:

$$\begin{pmatrix} \nabla\phi_0 \\ \nabla\phi_1 \\ \nabla\phi_2 \end{pmatrix} = \begin{pmatrix} (-1, -1) \\ (1, 0) \\ (0, 1) \end{pmatrix} \quad (9.29)$$

We note that all references to the local co-ordinates $\xi_1$ and $\xi_2$ have vanished. This is often not the case if we do not take derivatives (e.g. for a mass matrix) or if we use higher-order basis functions (e.g. quadratic Lagrange).

Before we can transform our gradients to the global cell, we first need to define our local to global co-ordinate transformation χ. χ is defined using the already captured linear Lagrange basis functions:

$$\chi(\xi_1, \xi_2) = \begin{pmatrix} (1 - \xi_1 - \xi_2) v^0_x + \xi_1 v^1_x + \xi_2 v^2_x \\ (1 - \xi_1 - \xi_2) v^0_y + \xi_1 v^1_y + \xi_2 v^2_y \end{pmatrix} \quad (9.30)$$

where $v^\alpha_x$ and $v^\alpha_y$ represent the x and y components, respectively, of vertex α of some arbitrary cell. As they are cell-dependent, they are also manipulated symbolically.

Applying the gradient operator to find ∇χ:

$$\nabla\chi(\xi_1, \xi_2) = \begin{pmatrix} v^1_x - v^0_x & v^1_y - v^0_y \\ v^2_x - v^0_x & v^2_y - v^0_y \end{pmatrix} \quad (9.31)$$

Our local co-ordinates have vanished because our transformation is affine, and therefore constant across the entire cell. We invert to find $(\nabla\chi)^{-1}$:

$$(\nabla\chi(\xi_1, \xi_2))^{-1} = \frac{1}{(v^1_x - v^0_x)(v^2_y - v^0_y) - (v^1_y - v^0_y)(v^2_x - v^0_x)} \begin{pmatrix} v^2_y - v^0_y & v^0_y - v^1_y \\ v^0_x - v^2_x & v^1_x - v^0_x \end{pmatrix} \quad (9.32)$$

Writing $\Delta = (v^1_x - v^0_x)(v^2_y - v^0_y) - (v^1_y - v^0_y)(v^2_x - v^0_x)$ for the determinant and distributing it into each entry, this is:

$$(\nabla\chi(\xi_1, \xi_2))^{-1} = \begin{pmatrix} \frac{v^2_y - v^0_y}{\Delta} & \frac{v^0_y - v^1_y}{\Delta} \\ \frac{v^0_x - v^2_x}{\Delta} & \frac{v^1_x - v^0_x}{\Delta} \end{pmatrix} \quad (9.33)$$

We can now form ∇Φ by transforming ∇φ, since:

$$\nabla\Phi = (\nabla\chi)^{-1} \cdot \nabla\phi \quad (9.34)$$

We take the inner product between each element $\nabla\phi_p$ of our vector of gradients and the inverse of the gradient of the co-ordinate transformation, $(\nabla\chi)^{-1}$:

 

 0 2 1 0  (−1, −1) vy−vy vy−vy 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0   (vx−vx)(vy−vy)−(vy−vy)(vx−vx) (vx−vx)(vy−vy)−(vy−vy)(vx−vx)   Ap =   ·  (1, 0)  (9.35)  2 0 0 1  vx−vx vx−vx   (v1−v0)(v2−v0)−(v1−v0)(v2−v0) (v1−v0)(v2−v0)−(v1−v0)(v2−v0)   x x y y y y x x x x y y y y x x (0, 1) p

 2 1 2 1  vy−vy −vx+vx ( 1 0 2 0 1 0 2 0 , 1 0 2 0 1 0 2 0 )  (vx−vx)(vy−vy)−(vy−vy)(vx−vx) (vx−vx)(vy−vy)−(vy−vy)(vx−vx)  0 2 2 0  vy−vy vx−vx  A = ( 1 0 2 0 1 0 2 0 , 1 0 2 0 1 0 2 0 ) (9.36)  (vx−vx)(vy−vy)−(vy−vy)(vx−vx) (vx−vx)(vy−vy)−(vy−vy)(vx−vx)   1 0 0 1  vy−vy vx−vx ( 1 0 2 0 1 0 2 0 , 1 0 2 0 1 0 2 0 ) (vx−vx)(vy−vy)−(vy−vy)(vx−vx) (vx−vx)(vy−vy)−(vy−vy)(vx−vx)

Each element $A_p$ of A is the expression for the rank-1 tensor representing the gradient of $\phi_p$ w.r.t. global co-ordinates, $\nabla\Phi_p$. Since the Laplacian operator is symmetric, and we use the same set of basis functions for our trial and test spaces, we know that A = B. For this reason we do not show the derivation of B. We can form the matrix of scalar expressions $E_{qp} = A_p \cdot B_q = A_p \cdot A_q$:

 2 1 2 2 1 2 2 1 0 2 2 1 2 0 2 1 1 0 2 1 0 1  (vy−vy) +(−vx+vx) (vy−vy)(vy−vy)+(−vx+vx)(vx−vx) (vy−vy)(vy−vy)+(−vx+vx)(vx−vx) 2 2 2 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 ((vx−vx)(vy−vy)−(vy−vy)(vx−vx)) ((vx−vx)(vy−vy)−(vy−vy)(vx−vx)) ((vx−vx)(vy−vy)−(vy−vy)(vx−vx))  2 1 0 2 2 1 2 0 0 2 2 2 0 2 0 2 1 0 2 0 0 1  (vy−vy)(vy−vy)+(−vx+vx)(vx−vx) (vy−vy) +(vx−vx) (vy−vy)(vy−vy)+(vx−vx)(vx−vx)  E =  2 2 2   1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0  ((vx−vx)(vy−vy)−(vy−vy)(vx−vx)) ((vx−vx)(vy−vy)−(vy−vy)(vx−vx)) ((vx−vx)(vy−vy)−(vy−vy)(vx−vx))  2 1 1 0 2 1 0 1 0 2 1 0 2 0 0 1 1 0 2 0 1 2  (vy−vy)(vy−vy)+(−vx+vx)(vx−vx) (vy−vy)(vy−vy)+(vx−vx)(vx−vx) (vy−vy) +(vx−vx)  2 2 2 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 ((vx−vx)(vy−vy)−(vy−vy)(vx−vx)) ((vx−vx)(vy−vy)−(vy−vy)(vx−vx)) ((vx−vx)(vy−vy)−(vy−vy)(vx−vx)) (9.37)

The matrix of expressions E represents the evaluation of the Laplacian at an arbitrary location on the reference cell. It needs to be integrated in order to form the set of expressions that constitute our representation of the local assembly matrix.

9.3.3 Integration

Integration involves two steps. First, we need to multiply by the determinant of the Jacobian of our co-ordinate transform |J(χ)|. Then we need to integrate our expressions over the reference element. Since we have no references to the local co-ordinates ξ1 and ξ2, this is simply a multiplication by the area of the reference triangle, 0.5.

 2 1 2 2 1 2 1 0 2 2 1 2 0 2 1 1 0 2 1 0 1  (vy−vy) +(−vx+vx) (vy−vy)(vy−vy)+(−vx+vx)(vx−vx) (vy−vy)(vy−vy)+(−vx+vx)(vx−vx) 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0  2(vx−vx)(vy−vy)−2(vy−vy)(vx−vx) 2(vx−vx)(vy−vy)−2(vy−vy)(vx−vx) 2(vx−vx)(vy−vy)−2(vy−vy)(vx−vx)  2 1 0 2 2 1 2 0 0 2 2 2 0 0 2 1 0 2 0 0 1  (vy−vy)(vy−vy)+(−vx+vx)(vx−vx) (vy−vy) +(vx−vx) (vy−vy)(vy−vy)+(vx−vx)(vx−vx)  M =  1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0   2(vx−vx)(vy−vy)−2(vy−vy)(vx−vx) 2(vx−vx)(vy−vy)−2(vy−vy)(vx−vx) 2(vx−vx)(vy−vy)−2(vy−vy)(vx−vx)   2 1 1 0 2 1 0 1 0 2 1 0 2 0 0 1 1 0 2 0 1  (vy−vy)(vy−vy)+(−vx+vx)(vx−vx) (vy−vy)(vy−vy)+(vx−vx)(vx−vx) (vy−vy) +(vx−vx) 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 1 0 2 0 2(vx−vx)(vy−vy)−2(vy−vy)(vx−vx) 2(vx−vx)(vy−vy)−2(vy−vy)(vx−vx) 2(vx−vx)(vy−vy)−2(vy−vy)(vx−vx) (9.38)

Multiplication by the determinant of the Jacobian actually reduces the order of the quotient in our example. This is due to the fact that the transformation of the basis function gradients involved a multiplication by the inverse of the determinant of the Jacobian. Our resulting matrix of expressions M is a representation of the local assembly matrix for the Laplacian operator for an arbitrary element. In order to evaluate it, we simply need to substitute cell co-ordinate values.
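The derivation above can be reproduced with a general-purpose symbolic engine. The following minimal sketch uses GiNaC (which we already use for rational function normalisation) to build the symbolic local assembly matrix for the linear Lagrange Laplacian; it is an illustration of the construction, not Excafé's actual implementation:

#include <ginac/ginac.h>
#include <iostream>

int main()
{
  using namespace GiNaC;

  const symbol xi1("xi1"), xi2("xi2");
  const symbol v0x("v0x"), v0y("v0y"), v1x("v1x"),
               v1y("v1y"), v2x("v2x"), v2y("v2y");

  // Linear Lagrange basis over the reference triangle (Equations 9.25-9.27).
  const ex phi[3] = { 1 - xi1 - xi2, xi1, xi2 };

  // Local-to-global map chi (Equation 9.30).
  const ex chiX = phi[0]*v0x + phi[1]*v1x + phi[2]*v2x;
  const ex chiY = phi[0]*v0y + phi[1]*v1y + phi[2]*v2y;

  // grad(chi) (Equation 9.31), its determinant and inverse (Equation 9.32).
  matrix gradChi(2, 2);
  gradChi(0, 0) = chiX.diff(xi1); gradChi(0, 1) = chiY.diff(xi1);
  gradChi(1, 0) = chiX.diff(xi2); gradChi(1, 1) = chiY.diff(xi2);
  const ex detJ = gradChi.determinant();
  const matrix gradChiInv = gradChi.inverse();

  // Transformed basis function gradients (Equation 9.34).
  matrix A(3, 2);
  for (unsigned p = 0; p < 3; ++p)
    for (unsigned d = 0; d < 2; ++d)
      A(p, d) = gradChiInv(d, 0)*phi[p].diff(xi1) +
                gradChiInv(d, 1)*phi[p].diff(xi2);

  // E_qp = A_p . A_q; the integrand is constant, so integration over the
  // reference triangle multiplies by its area (1/2) and by |J|.
  for (unsigned q = 0; q < 3; ++q)
    for (unsigned p = 0; p < 3; ++p)
    {
      const ex Mqp = ((A(p, 0)*A(q, 0) + A(p, 1)*A(q, 1)) * detJ / 2).normal();
      std::cout << "M(" << q << "," << p << ") = " << Mqp << std::endl;
    }
  return 0;
}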

In our example, the local co-ordinates $\xi_1$ and $\xi_2$ vanished due to taking the gradient of the basis functions. In other forms this is not the case, so we need a way to integrate our matrix of expressions with respect to the local co-ordinate system ξ.

Integration over the reference cell can either be done entirely symbolically, or via the substitution of quadrature points and weights. Symbolic integration is preferred as our local assembly optimiser is also entirely symbolic. Quadrature-based integration is less likely to work well with our symbolic framework, since the partial evaluation of expressions after quadrature substitution may result in different floating-point coefficients (due to floating-point inaccuracies) for terms that were symbolically identical.

We note that both integration techniques, symbolic integration and partially evaluating expressions after quadrature substitution, cause the cost of evaluating the assembly matrix for an arbitrary linear cell to become constant. As with FEniCS's tensor representation, the cost of computing an integral over a linear cell becomes independent of the polynomial degree of the basis functions used (but not the number of basis functions). This is a consequence of the Jacobian of the local-to-global co-ordinate transformation being constant for linear cells.

9.3.4 Referencing other fields and scalars

In many finite element problems, it is necessary to be able to reference other fields. For example, in the case of our incompressible Navier-Stokes problem from Chapter 8, we construct bilinear forms that include the velocity field from the previous time-step, as well as multiple approximations to the velocity at the current time-step due to linearisation of the convective acceleration term using the Picard iteration.

When we reference other tensor fields from our bilinear forms, we also possess complete knowledge about how they have been discretised. Therefore, we can also incorporate the basis functions used to approximate these fields, and their coefficients, directly into our assembly matrix expressions.

For example, say we have some vector field f represented using linear Lagrange basis functions, designated φ, on a 2D triangular element. The three linear basis functions are repeated for each element of f, giving 6 basis functions. The expression for f is derived as follows:

$$f(\xi_1, \xi_2) = \sum_{p=0}^{5} \hat{f}_p \phi_p(\xi_1, \xi_2) \quad (9.39)$$

$$= \begin{pmatrix} (1 - \xi_1 - \xi_2)\hat{f}_0 + \xi_1\hat{f}_1 + \xi_2\hat{f}_2 \\ (1 - \xi_1 - \xi_2)\hat{f}_3 + \xi_1\hat{f}_4 + \xi_2\hat{f}_5 \end{pmatrix} \quad (9.40)$$

where $\hat{f}_\alpha$ represents basis function coefficient α for field f. Note that we automatically exploit the sparsity of our basis functions, as only three of the six basis functions are non-zero for each vector component.

Lastly, in addition to basis function coefficients, we can also reference any scalar computed inside our discrete expression DAG. This may be useful if one wants to dynamically adjust time-step duration, for example. Like basis function coefficients, references to scalar values are incorporated as symbolic values in our expressions.

9.3.5 Summary

In this section we described the construction of a local assembly matrix in which each element is represented by a scalar-valued rational expression. We construct our expressions using captured basis function representations and apply gradient transformations using co-ordinate transformations constructed from linear basis functions.

We form a set of expressions that have variables corresponding to the local co-ordinates, cell vertex positions and possibly discrete field basis function coefficients and scalar expressions referenced from the discrete expression DAG.

We perform integration on this set of rational expressions over the reference cell. This eliminates all references to our local co-ordinates and gives a recipe for the local assembly matrix over an arbitrary cell in terms of cell vertex positions and any referenced discrete field coefficients or scalar values from our discrete expression DAG.

In the next section, I explain our approach to optimising this representation.

9.4 Optimisation

After constructing our representation of a local assembly matrix for an arbitrary cell, we need to find an optimised strategy for evaluating it. Each element of our local assembly matrix is represented by a rational function (i.e. the quotient of two multivariate polynomials).

We were unable to find any relevant work on the efficient evaluation of multivariate rational functions. In contrast, there has been much work on the efficient evaluation of multivariate polynomials [50, 51, 52]. Therefore we choose to optimise the polynomial expressions we use to form our rational functions. Unfortunately, there has been significantly less work on how to optimise sets of independent polynomial expressions, which is our goal here. Our optimiser is based on the work of Hosangadi et al. [1]. We present improvements to the Hosangadi et al. common sub-expression elimination and factorisation scheme in Chapter 10.

It is important to note that our work on optimising assembly matrix expressions is currently a work in progress and we do not yet have an effective optimisation strategy. We discuss the current state of this work and observations about what needs to be done to produce an efficient evaluation strategy.

9.4.1 Expression Representation and Complexity

We currently represent rational expressions as a quotient between two expanded polynomials. For our Navier-Stokes assembly matrices, this leads to large expansions in the size of the expressions for the numerator and denominator. In order to counter this, we use the rational function normalisation routines in GiNaC [45] to simplify the local assembly matrix expressions. Unfortunately, computing multivariate greatest common divisors is particularly computationally expensive for large expressions and must often be performed heuristically [53].

Our local assembly matrix evaluator currently makes use of the non-factorised rational expressions for each matrix entry when evaluating. We observed significant numerical differences between the computed values of local assembly matrices depending on whether we performed rational function simplification before evaluation (using 64-bit double precision). In particular, our linear solver did not converge on the resulting global system of equations when constructed from local assembly matrices evaluated using non-simplified expressions. We confirmed these issues were entirely numerical by evaluating the same expressions using an arbitrary-precision floating point library.

Normalising the local assembly expressions appears to be a requirement for numerical accuracy, but is so computationally expensive we wish to avoid it. In addition, rational functions cannot be simplified by our factorisation pass as it only works on the component polynomials. As a consequence, we intend to modify our polynomial representation so that we can choose when to expand the numerator and denominator of our captured fractions into a sum-of-monomials representation. By doing this, it should be trivial to spot common multipliers on the top and bottom of the fraction before expansion.

Despite fraction simplification, the size of our expressions has led to a need for algorithms that can scale to large sizes and still perform effectively. In particular, in Chapter 10 we present improvements to part of the Hosangadi et al. algorithm for finding an efficient evaluation strategy for a set of polynomials, that enables scaling to larger sets of more complex expressions.

9.4.2 Polynomial Factorisation Effectiveness

Our optimisation of polynomial evaluations is based on the work of Hosangadi et al. [1]. We take the weighting function for factorisations given in [1] and improve its behaviour. In addition, we present a branch-and-bound algorithm for the matrix-covering problem presented by Hosangadi et al., in the form of an algorithm for solving a variant of the maximal biclique problem.

Our algorithm successfully scales to the problem sizes we encounter in our Navier-Stokes solver whilst still finding the optimal matrix covering. In contrast, Hosangadi et al. do not present an algorithm for finding the best matrix covering, simply stating that they choose the "best prime rectangle". We describe our weighting function and search algorithm in detail in Chapter 10.

Although the matrix coverings we find are optimal with respect to the factorisation weighting function, the algorithm is still greedy: it chooses the highest-scoring matrix covering at each iteration, which is only a locally optimal choice.

The factorisations produced by our optimisation algorithm are not yet competitive with FFC with respect to floating point operation count. Since the optimisation algorithm is greedy, it is unlikely to be able to find a particularly efficient evaluation strategy when provided with such a large search space.

Our strategy to handle this has not yet been implemented and is part of future work. Our aim is to avoid fully expanding our polynomials, instead explicitly factorising out our geometry related terms before performing common sub-expression elimination. This is not a dissimilar strategy from FEniCS which applies optimisations to the structure of the reference tensor independent from the geometry tensor, which does not possess enough complexity to be a candidate for optimisation. Hence our common sub-expression elimination pass will work mostly on the basis function terms, without the additional complexity of the expanded geometry terms. This search space should be significantly smaller, enabling the factorisation algorithm to perform more effectively.

9.5 Comparison with the FEniCS Form Compiler

Given that the development of our optimisations is still in progress, we cannot yet make an effectiveness-based comparison with FFC. However, since the optimisations available in FFC, such as tensor-contraction-based assembly [36] and topological optimisation of assembly matrices [38], are at the leading edge of finite element research, we consider it important to relate the similarities and differences between our approach and the FEniCS approach to assembly optimisation.

We consider the tensor formulation of local matrix assembly to be a particular choice of factorisation influenced by the structure of the computation. We can represent (and intend to implement) a similar factorisation that enables us to analyse the structure of the assembly matrix whilst mainly ignoring the structure of the geometry transformation.

In a similar manner, we consider the transformation to remove quadrature an instance of partial evaluation of the assembly matrix. We effectively eliminate quadrature in Excafe´ when we integrate our matrix of bilinear expressions.

In contrast, Excafé loses the structure of assembly when it flattens its expressions. As a consequence, the code we generate for assembly will have a rather arbitrary structure, even if it shares the same floating point operation count as an FFC-generated assembly. As FFC particularly targets this optimisation and its structure, the code to evaluate an assembly matrix using FEniCS's tensor contraction representation will be extremely succinct.

Our search for assembly optimisations in Excafe´ is done without prior knowledge of a specific structure to exploit. Hence, we consider FEniCS’s topological optimisations of the reference tensor to have more in common with our approach than the representation of assembly as a tensor inner product, as it is dependent on the bilinear form being assembled for.

The FEniCS topological optimisations make use of the structure of the tensor contraction between the reference and geometry tensor. It can be considered as a set of Euclidean inner products between a single vector formed from the geometry tensor and a number of vectors formed from the reference tensor.

If two vectors from the reference tensor, v and w, are scaled versions of each other such that v = αw, the inner product of one with the geometry vector g can be derived from the inner product of the other simply by multiplying the result by a constant value, e.g. v · g = α(w · g). This is called the co-linearity optimisation.

If two vectors v and w are identical except for ρ entries, the value of the inner product of one with the geometry vector can be computed from the other by subtracting a partial inner product using only ρ entries. This is called the Hamming distance optimisation.

Both the co-linearity and Hamming distance optimisations can be considered special cases of factorisation on scalar expressions representing an inner product. Hence, we have the ability to represent each in our framework. We do not need to restrict ourselves to these optimisations and can exploit factorisations that might correspond to a co-linearity optimisation with a partial inner product, for example. However, our detection of factorisations is purely symbolic.

We believe we can represent a superset of the FEniCS topological optimisations. However, Kirby et al. state that their use of a minimal spanning tree results in an optimal computation [38] with respect to the optimisations they can represent. This may be problematic to improve upon since we currently use a greedy algorithm for our search.

Work by Ølgaard and Wells [54] shows examples where the operation count to evaluate a variational form using a tensor representation can be hundreds of times higher than with a quadrature representation, as well as vice-versa. This suggests that there is still significant scope for improvement of local assembly strategies.

Both quadrature and tensor based evaluation of local assembly matrices enforce a certain struc- ture on evaluation strategy. We theorise that this structure comes at the cost of certain lower bounds on operation count that make evaluating certain classes of variational forms inefficient. By searching for an evaluation strategy without enforcing any other structure, we hope to be able to find evaluation strategies that are not representable with other approaches.

9.6 Future Work

We have an optimiser for a local assembly expression matrix that optimises evaluation of the elements by performing CSE and factorisation on the numerator and denominators of all constituent rational expressions simultaneously.

Currently, both the numerator and the denominator of these expressions are represented as expanded polynomials. We intend to modify our expression representation to perform selective expansion so that we can avoid expanding out geometry-related terms and can move them into separate variables.

Once this is complete, we expect to be able to collect results comparing the number of floating point operations required to evaluate common forms using the quadrature, tensor and Excafé approaches to assembly. We may also investigate the behaviour of our factorisations with respect to the problem cases described by Ølgaard et al. [54].

9.7 Conclusion

In this chapter, we have described how we can construct a representation of an assembly matrix as a set of rational scalar functions of vertex co-ordinates, discrete field coefficients and referenced scalar values.

Our research into optimising this representation is still being developed. We have a framework that performs common sub-expression elimination and factorisation on the polynomials that make up these expressions, described in further detail in Chapter 10.

Currently, our results are not competitive with code generated by the FEniCS Form Compiler. We have identified why we believe this is the case and have a strategy for reducing the complexity of the expressions we pass to our optimiser, which we hope will produce improved results on the smaller search space.

We have compared our optimisation approach to that taken by the FEniCS project for optimising tensor-based evaluation. We have chosen an approach that enforces less structure on the representation being optimised, which should enable us to find optimisations not representable in other frameworks. However, it also requires that we implement algorithms capable of handling the size of the search space we create, and generate a result of acceptable quality. Therefore, our future research may also involve improving the algorithms described in Chapter 10.

Chapter 10

Polynomial Common Sub-expression Elimination and Factorisation

In Chapter 9, we mentioned that we search for an optimised evaluation strategy for the set of rational expressions representing local matrix assembly. To do this, we construct an optimised evaluation strategy for the set of polynomial expressions used within those rational expressions.

To optimise evaluation of our set of polynomial functions, we extend the work of Hosangadi et al. [1]. The work by Hosangadi et al. was chosen as a basis for our optimisations as it was particularly tailored to polynomial expressions and could handle multiple polynomials simultaneously. As this work is applicable outside of the finite element method, we have chosen to cover it in a dedicated chapter.

10.1 The algorithm

We start by giving an overview of the factorisation algorithm presented by Hosangadi et al. [1] for optimising the evaluation of sets of polynomial expressions.

We require the following definitions:


Literal A constant value (e.g. 2.6) or variable (e.g. x).

Cube A product of variables raised to non-negative integer powers (e.g. $x^2y$). This is the terminology used by Hosangadi et al., and we continue to use it here despite the fact that a cube appears to correspond to the conventional definition of a monomial.

SOP An expression represented as a sum of cubes (e.g. $3a^2b + 2ac^2$).

Cube-Free A SOP is cube-free if the only cube that can divide every cube in the sum is the cube "1.0" (e.g. $a^2b + cd$ is cube-free; however, $a^2b + ac$ is not, as it is divisible by the cube a).

Kernel For some polynomial P and cube c, the expression P/c is a kernel if it is cube-free and it has at least two terms (e.g. letting $P = a^2bc + ac$ makes $P/(ac) = ab + 1$ a kernel).

Co-Kernel The cube dividing the polynomial in a kernel expression. In the above example, this is ac.

We present the algorithm using a worked example. Suppose we wish to optimise simultaneous evaluation of the following two polynomials (the subscripts denote term numberings):

$ab_{(1)} + ac_{(2)}$ (10.1a)

$a^2_{(3)} + ab_{(4)} + d_{(5)}$ (10.1b)

The first algorithm step computes all kernels of each polynomial:

$(ab + ac)/1 = ab + ac$ (10.2a)

$(ab + ac)/a = b + c$ (10.2b)

$(a^2 + ab + d)/1 = a^2 + ab + d$ (10.2c)

$(a^2 + ab + d)/a = a + b$ (10.2d)

       a      a^2    ab     ac     b      c      d
1      0      0      1(1)   1(2)   0      0      0
a      0      0      0      0      1(1)   1(2)   0
1      0      1(3)   1(4)   0      0      0      1(5)
a      1(3)   0      0      0      1(4)   0      0

Figure 10.1: The Kernel Co-Kernel Matrix (KCM) corresponding to the kernels in Equation List 10.2. Each one in the matrix corresponds to a particular way of evaluating the cube identified by the subscript in brackets.

Each kernel represents a factorisation of a polynomial P into the form $C \cdot F_1 + F_2$, where C is a cube and $F_1$ and $F_2$ are sums of cubes ($F_1$ must be non-zero). Equation 10.2d, for example, corresponds to the factorisation $a^2 + ab + d = a(a + b) + d$.

We do not describe the algorithm for kernel and co-kernel extraction here, instead we refer the reader to [1] for full details. Hosangadi et al. show that all minimal factorisations of a polynomial expression can be obtained from the kernels and co-kernels of the expressions.

Hosangadi et al. define a factorisation as minimal if F1 is cube-free.

The next algorithm step reformulates the list of kernels and co-kernels in matrix form. This is called the Kernel-Cube matrix or KCM. Each row corresponds to a co-kernel and each column to a unique cube. The same co-kernel may appear multiple times, if present in kernels from different polynomials. A matrix entry is equal to one if multiplying the entry’s associated cube and co-kernel results in a term from the original set of expressions. All other entries are zero.

We show the KCM for our example polynomials in Figure 10.1. Each one in the matrix corresponds to the term in the original set of polynomials denoted by the subscript in brackets. The term can be calculated by multiplying the cubes in the row and column labels. Hence, the KCM matrix describes the different ways in which each term may be evaluated.

A valid factorisation is any sub-matrix formed from a subset of rows and columns of the KCM matrix, where every entry in the sub-matrix is equal to one. Alternatively, we can imagine a factorisation as any block of ones in the KCM matrix that can be formed through an arbitrary permutation of its rows and columns.
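In code, such a covering is straightforward to check. The sketch below represents the KCM as a boolean matrix and validates a candidate covering; the representation is illustrative, not our implementation:

#include <cstddef>
#include <vector>

// A KCM as a dense boolean matrix: rows are co-kernels, columns are cubes.
using KCM = std::vector<std::vector<bool>>;

// A subset of rows and columns forms a valid factorisation only if every
// entry of the induced sub-matrix is a one.
bool isValidCovering(const KCM& kcm,
                     const std::vector<std::size_t>& rows,
                     const std::vector<std::size_t>& cols)
{
  for (const std::size_t r : rows)
    for (const std::size_t c : cols)
      if (!kcm[r][c])
        return false;
  return true;
}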

(a) A KCM covering: the two rows with co-kernel 1 (one from each polynomial) intersected with the ab column of the KCM in Figure 10.1.

(b) The corresponding factorisation. The rewritten polynomials are:

e + ac
$a^2$ + e + d

where e = ab.

Figure 10.2: A factorisation of our example polynomials from Equations 10.1 that avoids evaluating ab twice.

Having chosen a block of ones, a new variable is defined as the sum of all the cubes (columns) the block covers. This new variable multiplied by the appropriate co-kernel is substituted into every polynomial associated with each kernel (row) the block covers.

We show the sub-matrix in Figure 10.2 that corresponds to the factorisation of our example polynomials. Terms 1 and 4 are replaced by the common expression e.

In order to select the best matrix covering, Hosangadi et al. provide a scoring function for intersections that attempts to describe the number of multiplications and additions saved.

$$\mathrm{score} = m \left( (C - 1) \left( R + \sum_{i=0}^{R} M(R_i) \right) + (R - 1) \sum_{j=0}^{C} M(C_j) \right) + (R - 1)(C - 1) \quad (10.4)$$

where:

m is a weighting factor for the number of multiplies,
R is the number of rows (kernels),
C is the number of columns (cubes),
$M(R_i)$ is the number of multiplications in co-kernel i,
$M(C_j)$ is the number of multiplications in cube j.

We describe our improvements to this scoring function in Section 10.3.

In order to find the best factorisation, a search must be performed to find the covering that scores the highest with respect to the presented scoring function. Hosangadi et al. do not describe an algorithm for achieving this, only stating that “we pick the best prime rectangle in each iteration”. This is then used in an algorithm which greedily extracts intersections from the KCM and removes the factorised terms, only rebuilding the KCM when no further intersections can be found from the current one. We present a branch and bound algorithm for finding intersections that appears to scale effectively to large numbers of cubes and kernels in Section 10.2.
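The overall greedy structure can be sketched as follows, with the covering search and term rewriting abstracted behind hypothetical callbacks:

#include <functional>
#include <optional>
#include <vector>

// Illustrative stand-ins for the entities manipulated by the greedy loop.
struct Biclique
{
  std::vector<int> kernelRows;   // selected co-kernels (rows)
  std::vector<int> cubeColumns;  // selected cubes (columns)
};

struct Polynomials {};  // the set of expressions being optimised

// Repeatedly find the highest-scoring covering of the current KCM and
// rewrite the polynomials (introducing a new variable and removing the
// factorised terms), stopping when no covering saves any operations.
void greedyFactorise(Polynomials& exprs,
    const std::function<std::optional<Biclique>(const Polynomials&)>& bestCovering,
    const std::function<void(Polynomials&, const Biclique&)>& extract)
{
  while (const std::optional<Biclique> best = bestCovering(exprs))
    extract(exprs, *best);
}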

The final step of the algorithm performs single-term common sub-expression elimination (i.e. CSE between individual cubes). This is required because the KCM-based optimisation can only find multiple-term common sub-expressions. This optimisation problem is extremely similar to that faced by conventional compiler CSE frameworks and does not have the same scaling issues as multiple-term CSE. Therefore we have not expended effort on attempting to optimise it.

10.2 Optimised branch and bound algorithm

We present the branch and bound algorithm that we use to find the maximum scoring covering of the KCM matrix.

Firstly, we observe that it is possible to represent the KCM in undirected graph form, treating the KCM as if it were an adjacency matrix. The rows (kernels with co-kernels) and columns (cubes) become vertices of the graph and the ones in the matrix become edges. We show the graph representation of the KCM from Figure 10.1 in Figure 10.3.

As the graph is derived from an adjacency matrix, it is bipartite. Our problem of finding the maximum scoring covering translates to one of finding the maximum scoring complete bipartite graph, or biclique. This type of problem is known as a maximum biclique problem.
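Constructing this bipartite view from the KCM is a direct transcription of the adjacency matrix; the sketch below uses illustrative types:

#include <cstddef>
#include <vector>

// Row vertices are kernels (with co-kernels), column vertices are cubes;
// each one in the KCM becomes an edge between a row and a column vertex.
struct BipartiteGraph
{
  std::vector<std::vector<std::size_t>> rowAdjacency;  // cubes adjacent to each kernel
  std::vector<std::vector<std::size_t>> colAdjacency;  // kernels adjacent to each cube
};

BipartiteGraph buildGraph(const std::vector<std::vector<bool>>& kcm)
{
  BipartiteGraph g;
  g.rowAdjacency.resize(kcm.size());
  g.colAdjacency.resize(kcm.empty() ? 0 : kcm.front().size());
  for (std::size_t r = 0; r < kcm.size(); ++r)
    for (std::size_t c = 0; c < kcm[r].size(); ++c)
      if (kcm[r][c])
      {
        g.rowAdjacency[r].push_back(c);
        g.colAdjacency[c].push_back(r);
      }
  return g;
}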

The complexity of finding a maximal biclique is directly dependent on the function used to score the biclique. For example, for a bipartite graph, if the aim is simply to maximise the number of vertices in the biclique, the algorithm is polynomial time [55]. Finding the biclique with the maximum number of edges in a bipartite graph is NP-complete [56].

[Graph: (ab + ac)/1 — ab(1), ac(2); (ab + ac)/a — b(1), c(2); (a² + ab + d)/1 — a²(3), ab(4), d(5); (a² + ab + d)/a — a(3), b(4)]

Figure 10.3: The KCM from Figure 10.1 expressed as a graph. Rectangular nodes denote row-vertices (kernels) and circular nodes denote column-vertices (cubes). Each edge is annotated with its corresponding term number.

10.2.1 Definitions and Propositions

Before we describe our search algorithm, we require a number of propositions and definitions. We use those provided by Liu et al. [57] in their paper on mining large maximal vertex bicliques.

We use the following definitions:

Undirected graph We define an undirected graph G as a pair (V,E) where V is a set of vertices and E is a set of edges between the vertices.

Adjacent Two vertices are adjacent if there exists an edge between them.

Adjacency List The adjacency list of a vertex v in G = (V,E) is designated Γ(v, G). It is defined as the set of vertices adjacent to v: $\{u \in V : (u, v) \in E\}$. We also define the adjacency list of a set of vertices X in G = (V,E) such that $\Gamma(X, G) = \{u \in V : \forall v \in X,\, (u, v) \in E\}$. This requires that each vertex u in Γ(X,G) is adjacent to every vertex in X. As a consequence, the adjacency list has an anti-monotone property.

Bipartite Graph A graph G = (V,E) is bipartite if its vertex set V can be partitioned into two disjoint non-empty sets $V_1$ and $V_2$ such that no edge in E connects two vertices in $V_1$ or two vertices in $V_2$. We denote this $G = (V_1, V_2, E)$.

Biclique A bipartite graph $G = (V_1, V_2, E)$ is a biclique if for every $v_1 \in V_1$ and $v_2 \in V_2$ there exists an edge between them. As all edges can be determined from the graph vertices, we also denote a biclique G as $G = (V_1, V_2)$.

We state the following propositions:

Proposition 1. If $V_1$ and $V_2$ are both sets of vertices in G = (V,E) and $V_1 \subseteq V_2$ then $\Gamma(V_2) \subseteq \Gamma(V_1)$. This is a consequence of the anti-monotone property of the adjacency list operator.

Proposition 2. If $G' = (V_1, V_2, E')$ is a biclique subgraph of G = (V,E) then $V_1 \subseteq \Gamma(V_2, G)$ and $V_2 \subseteq \Gamma(V_1, G)$.

10.2.2 Problem Definition

We first quantify our search problem as follows:

• We have a bipartite graph G = (K,C,T ) which is the adjacency list representation of our KCM. K represents all kernel vertices, C represents all cube vertices, and T represents the edges corresponding to terms.

• We have a weight function W which, given a biclique $(V_1, V_2)$, computes a score as a function of the vertex set sizes $|V_1|$, $|V_2|$ and the multiplication costs M(v) of the vertices $v \in V_1 \cup V_2$. Our weight function corresponds to the number of numerical operations saved, so we wish to maximise it. The function W is monotonically increasing, so that given two bicliques $B_1 = (V_1, V_2)$ and $B_2 = (V_3, V_4)$ where $B_1 \subseteq B_2$, we have $W(B_1) \le W(B_2)$.

• We wish to find a biclique subgraph $B = (V_1, V_2)$ of G such that W(B) is greater than or equal to that of any other biclique subgraph of G.

We observe that:

1. Given a set of vertices $V \subseteq V_1$ or $V \subseteq V_2$ of the bipartite graph $G = (V_1, V_2)$, the pair $(V, \Gamma(V))$ is a biclique subgraph by construction.

2. The biclique subgraph $(V_1, \Gamma(V_1))$ is a superset of all other biclique subgraphs $(V_1, V_2)$ for any valid $V_2$.

3. Due to the monotonicity property of W, $W((V_1, \Gamma(V_1)))$ will be greater than or equal to $W((V_1, V_2))$ for any valid $V_2$.

As a consequence, if $(V_1, V_2)$ is a maximally scoring biclique then $(V_1, \Gamma(V_1))$ will be a biclique of the same score. Hence, we can specify our search problem as the search for the vertex set $V_1$, where the value of $V_2$ is implied.

The set of vertices we need to find could be either kernel or cube vertices of our bipartite graph. We have chosen to search for the set of cube vertices that imply our biclique, as we find it more intuitive, although there is no reason why we could not search for a set of kernels instead.

Given our KCM in adjacency list form G = (K,C,T), our search space is all sets of vertices $C' \subseteq C$. Hence, we have an exponentially-sized search space of $2^{|C|}$ possible bicliques.
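Since K = Γ(C) for the biclique implied by a cube-vertex set C, the kernel-vertex set can be computed as an intersection of adjacency lists. A sketch of this operation, assuming sorted adjacency lists and illustrative types:

#include <algorithm>
#include <cstddef>
#include <iterator>
#include <vector>

// Gamma(X, G) for a set of cube-vertices X: the kernels adjacent to every
// member of X, computed as the intersection of their adjacency lists.
std::vector<std::size_t>
adjacency(const std::vector<std::vector<std::size_t>>& cubeAdjacency,
          const std::vector<std::size_t>& cubes)
{
  if (cubes.empty())
    return {};
  std::vector<std::size_t> result = cubeAdjacency[cubes.front()];
  for (std::size_t i = 1; i < cubes.size(); ++i)
  {
    std::vector<std::size_t> next;
    const std::vector<std::size_t>& adj = cubeAdjacency[cubes[i]];
    std::set_intersection(result.begin(), result.end(),
                          adj.begin(), adj.end(), std::back_inserter(next));
    result = std::move(next);
  }
  return result;
}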

10.2.3 Algorithm

Our search algorithm is a best-first search similar to A* [58] with an admissible heuristic. We maintain a priority queue of search spaces being explored.

We require that a total ordering be defined on the cube vertices, so that $v_1 < v_2$ or $v_2 < v_1$ for any two non-identical vertices $v_1$, $v_2$.

We define a search space as a triple:

⟨C, K, S⟩

where:

C is a set of cube vertices.

K is a set of kernel vertices, and (C,K) is always the maximal biclique derivable from C. As we stated earlier, K can always be derived as the value of Γ(C); however, we maintain K explicitly for efficiency reasons.

S is a cube vertex that we refer to as a split-point. We call it such because this vertex describes how the triple will be split into two new search spaces when expanded.

Our biclique search algorithm is presented in Algorithm 10.1. For each cube c in our KCM graph, we construct an initial priority queue of biclique search spaces using c and Γ(c). Our search algorithm functions by adding more cube-vertices to each maximal biclique candidate, which in turn reduces the number of kernel-vertices in the biclique. To keep track of our progress exploring the search space, we also need a way to represent which cube-vertices we have considered adding to the biclique and which ones are still candidates.

The KCMs produced by our finite-element local assembly optimiser can contain in excess of tens of thousands of cubes. As our priority list of search spaces may also grow quite large, we cannot afford to explicitly store the set of cubes that may be added to our biclique and those we have discounted. Instead, we handle this implicitly:

1. Cube-vertices are considered for addition to the biclique in the order specified by the total ordering defined earlier on cube-vertices. 194 Chapter 10. Polynomial Common Sub-expression Elimination and Factorisation

Algorithm 10.1 find maximal biclique(kcm graph)
  (K, C, T) ← kcm graph
  search spaces ← empty priority queue()
  for all c ∈ C do
    search space ← (c, Γ(c), find split point(Γ(c), c))
    maximum score ← find maximum score(search space)
    insert into priority queue(search spaces, search space, maximum score)
  end for
  best biclique ← (∅, ∅)
  while search spaces is not empty do
    search space ← pop(search spaces)
    if find maximum score(search space) > score(best biclique) then
      if splittable(search space) then
        (C, K, S) ← search space
        C′ ← C ∪ {S}
        K′ ← K ∩ Γ(S)
        new space1 ← (C, K, find split point(K, S))
        new space2 ← (C′, K′, find split point(K′, S))
        new max score1 ← find maximum score(new space1)
        new max score2 ← find maximum score(new space2)
        if K′ ≠ K then
          insert into priority queue(search spaces, new space1, new max score1)
        end if
        insert into priority queue(search spaces, new space2, new max score2)
      end if
      if score(best biclique) < score((C, K)) then
        best biclique ← (C, K)
      end if
    end if
  end while
  return best biclique

2. Cube-vertices less than the split-point vertex are assumed to have already been considered for addition to the biclique. Cube-vertices greater than or equal to the split point are candidates for addition to the biclique.

Hence, we only need to store a single cube-vertex, the split-point, in order to know which cube-vertices are candidates for addition to a given biclique. Each time we examine a biclique we form two new bicliques, one where the split-point has been added to the biclique, and one where it isn’t. Each time we add a new cube-vertex c to our biclique’s cube-vertex set C, we do not need to recalculate Γ(C) to find our set of kernel-vertices K. Instead K can be updated incrementally, by taking the intersection with Γ(c).

Due to the anti-monotone property of Γ, each cube-vertex added to a biclique causes the set of kernel-vertices either to remain the same size or to grow smaller. If adding a cube-vertex c to a biclique B does not cause the set of kernel-vertices to become smaller, we do not need to consider the search space for bicliques derived from B where we choose not to add c. This is because adding c to the biclique does not impose any additional restrictions on our search space, and we always prefer to form the larger biclique. In matrix-covering terms, if adding a column to an existing covering doesn't reduce the number of rows, there is no need to consider excluding that column. Algorithm 10.1 shows how the choice of adding a cube-vertex to a biclique may result in either one or two new search spaces.

The KCM graphs we generate in some of our finite-element problems have tens or hundreds of thousands of kernel and cube vertices but sparse connectivity. Hence, even though there may be thousands of candidate cube-vertices for addition to a given biclique, only a small proportion of them will avoid causing the biclique's set of kernel-vertices to collapse. Considering the cube-vertices that do cause the biclique to collapse is computationally inefficient.

The only cube-vertices that can be added to a biclique without causing it to collapse are cube-vertices already adjacent to at least one vertex in the biclique's kernel-vertex set. Hence, the only cube vertices worth considering are those that are a member of Γ(k) for some k ∈ K, where K is the biclique's set of kernel-vertices. In Algorithm 10.2 we show how we traverse each cube-vertex greater than the split-point and adjacent to at least one kernel-vertex when finding the next split-point. In this way, we trim our biclique search spaces of cubes that would cause them to collapse.

Algorithm 10.2 find split point(kernels, old split point)
  split point ← null vertex
  for all kernel ∈ kernels do
    for all cube ∈ Γ(kernel) do
      if split point = null vertex or old split point = null vertex or (cube < split point and cube > old split point) then
        split point ← cube
      end if
    end for
  end for
  if split point ≠ null vertex then
    return split point
  else
    error "search space exhausted"
  end if

Our find maximum score function is the equivalent of an admissible heuristic in an A* search. We must always exactly estimate or over-estimate the score of the biclique that may be derived from a biclique being grown in order for our algorithm to find the maximal biclique. We also want this estimate to be as close to the true value as possible, so we can prune search spaces if the maximum possible score we can obtain is smaller than that of a biclique we've already found.

In order to estimate the maximum score of the biclique that can be derived from an existing biclique, we require the sizes of the kernel-vertex and cube-vertex sets, and the multiplication costs of those vertices. By definition, our kernel-vertex sets either remain the same size or grow smaller as the biclique is grown, so they provide a sufficient upper bound for those values. As for the cube-vertex set, any cube-vertex adjacent to one of the vertices in the biclique’s kernel-vertex set could be present in the final biclique.

We already observed that the only cube-vertices worth adding to a biclique are those adjacent to at least one of the biclique's kernel-vertices. In addition, we observe that for some biclique $(C, K)$:

$$C = \{\, c \mid \forall k \in K,\ c \in \Gamma(k) \,\}$$

i.e. the cube-vertex set is equal to the intersection of the adjacency sets of each kernel-vertex (the converse, with the roles of $C$ and $K$ exchanged, also holds). This implies that for any biclique $(C', K')$ derived from $(C, K)$, there is at least one $k \in K$ such that $C' \subseteq \Gamma(k)$, since $K' \subseteq K$. As we do not know which members of $K$ will be in the final biclique, we estimate $C'$ as:

$$C \cup \{\, c \mid c \in \Gamma(k),\ c \geq S \,\}$$

where S is the split-point of the biclique being grown. We do this for each k ∈ K and take the maximum value. We show this in Algorithm 10.3.

Algorithm 10.3 find maximum score(search space)
  maximum score ← 0
  (C, K, S) ← search space
  for all kernel ∈ K do
    candidate cubes ← {c | c ∈ Γ(kernel), c ≥ S}
    local score ← score(C ∪ candidate cubes, K)
    if local score > maximum score then
      maximum score ← local score
    end if
  end for
  return maximum score
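Continuing the same hypothetical C++ sketch, the bound of Algorithm 10.3 could be computed as below, with score assumed to implement the weight function of Section 10.3:

// score(C, K): the covering score of Section 10.3, assumed available.
int score(const VertexSet& cubes, const VertexSet& kernels);

// Admissible bound (Algorithm 10.3): never under-estimates the score of
// any biclique derivable from this search space.
int findMaximumScore(const SearchSpace& s) {
  int maximumScore = 0;
  for (VertexSet::const_iterator k = s.kernels.begin();
       k != s.kernels.end(); ++k) {
    // Candidate cubes for this kernel: the cubes already chosen, plus
    // those adjacent to it lying at or beyond the split-point.
    VertexSet candidates(s.cubes);
    const VertexSet adjacent = gamma(*k);
    for (VertexSet::const_iterator c = adjacent.begin();
         c != adjacent.end(); ++c)
      if (*c >= s.splitPoint)
        candidates.insert(*c);
    maximumScore = std::max(maximumScore, score(candidates, s.kernels));
  }
  return maximumScore;
}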

10.3 Improved weighting function

In this section we describe how the weight function presented by Hosangadi et al. [1] can be improved further. We start by describing the construction of the original weight function from Equation 10.4.

$$\begin{array}{c|cccc}
 & c_1 & c_2 & \cdots & c_C \\
\hline
k_1 & c_1 k_1 & c_2 k_1 & \cdots & c_C k_1 \\
k_2 & c_1 k_2 & c_2 k_2 & \cdots & c_C k_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
k_R & c_1 k_R & c_2 k_R & \cdots & c_C k_R
\end{array}$$

Figure 10.4: An abstract Kernel Co-Kernel matrix with a covering of R co-kernels (rows) and C cubes (columns). The expression computed by each entry is shown at the corresponding row-column intersection.

10.3.1 The original weighting function

We consider the KCM in Figure 10.4. The biclique represents a number of subexpressions of an expanded polynomial:

$$c_1 k_1 + c_2 k_1 + \cdots + c_{C-1} k_1 + c_C k_1 \tag{10.5a}$$

$$c_1 k_2 + c_2 k_2 + \cdots + c_{C-1} k_2 + c_C k_2 \tag{10.5b}$$

$$\vdots \tag{10.5c}$$

$$c_1 k_R + c_2 k_R + \cdots + c_{C-1} k_R + c_C k_R \tag{10.5d}$$

extracting the common factor as $f$ gives:

$$k_1 f \tag{10.6a}$$

$$k_2 f \tag{10.6b}$$

$$\vdots \tag{10.6c}$$

$$k_R f \tag{10.6d}$$

$$f = c_1 + c_2 + \cdots + c_{C-1} + c_C \tag{10.6e}$$

We can see that the evaluation of the non-factorised version requires:

• $\sum_{i=1}^{R} M(k_i)$ multiplies to evaluate all co-kernels, repeated for every column, giving $C \sum_{i=1}^{R} M(k_i)$ multiplies.

• $\sum_{j=1}^{C} M(c_j)$ multiplies to evaluate all cubes, repeated for each row, giving $R \sum_{j=1}^{C} M(c_j)$ multiplies.

• $RC$ multiplies to multiply each cube by each co-kernel.

• $C - 1$ additions for each row, giving $R(C - 1)$ additions in total.

giving:

$$\mathrm{flops}_{\mathrm{unfactorised}} = C\left(R + \sum_{i=1}^{R} M(k_i)\right) + R \sum_{j=1}^{C} M(c_j) + R(C - 1)$$

floating point operations to evaluate the non-factorised intersection. After factorisation, we require:

• $\sum_{i=1}^{R} M(k_i)$ multiplies to evaluate all co-kernels, performed only once.

• $\sum_{j=1}^{C} M(c_j)$ multiplies to evaluate all cubes, performed only once.

• $R$ multiplies to multiply the value of $f$ by each co-kernel.

• $C - 1$ additions to evaluate the new term $f$.

giving:

$$\mathrm{flops}_{\mathrm{factorised}} = R + \sum_{i=1}^{R} M(k_i) + \sum_{j=1}^{C} M(c_j) + (C - 1)$$

The score function represents the number of floating point operations saved:

$$\mathrm{score} = \mathrm{flops}_{\mathrm{unfactorised}} - \mathrm{flops}_{\mathrm{factorised}}$$
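As a small worked check (our example, not one from Hosangadi et al.): take $R = 2$ co-kernels $x$ and $y$ and $C = 2$ cubes $a$ and $b$, all single variables, so every $M(\cdot) = 0$. Then

$$\mathrm{flops}_{\mathrm{unfactorised}} = 2(2 + 0) + 0 + 2(2 - 1) = 6, \qquad \mathrm{flops}_{\mathrm{factorised}} = 2 + 0 + 0 + (2 - 1) = 3,$$

so the score is $3$: rewriting $xa + xb + ya + yb$ (four multiplies and two additions) as $f = a + b$; $xf$; $yf$ (two multiplies and one addition) saves three operations.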

10.3.2 Improving the weighting function

In this section, we show the problems with the weighting function presented by Hosangadi et al. and how it may be improved.

$$\begin{array}{c|cc}
 & a & b \\
\hline
1.0 & 1 & 1
\end{array}$$

Figure 10.5: The KCM for the expression a + b. The covering corresponds to the useless factorisation 1.0(a + b) as it is impossible to reduce the operation count of this expression.

$$\begin{array}{c|cccc}
 & a^2 & ab & a & b \\
\hline
1.0 & 1 & 1 & 0 & 0 \\
a & 0 & 0 & 1 & 1
\end{array}$$

Figure 10.6: The KCM for the expression a2 + ab. The covering shown corresponds to the factorisation a(a + b).

We consider the non-factorisable expression a + b. We show a covering corresponding to its factorisation in Figure 10.5. Clearly, this factorisation will not reduce the operation count. However, according to the weight function presented by Hosangadi et al., it reduces the operation count by one.

The problem is that, according to the weight function, the factorisation reduces the operation count by avoiding one multiplication with the co-kernel 1.0. In reality, the factorisation simply creates a new variable equal to the original expression. Furthermore, the weight function will claim that the operation count of this new expression can be reduced in the same way, so factorisation will continue forever.

There are two obvious ways to solve this problem:

1. Avoid considering co-kernels whose value is 1.0.

2. Adjust our weighting function for this case so that it reflects the true number of operations saved, 0.

The first solution is problematic since it prevents factorisations involving any kernels whose co-kernel is one, resulting in missed factorisations. We will show that the second solution is also problematic, since fixing the scoring function for this case will also result in missed factorisations.

$$\begin{array}{l|ccc}
\text{Co-kernel} \backslash \text{Cube} & \text{one} & \text{constant} & \text{variable} \\
\hline
\text{one} & 0 & 0 & 0 \\
\text{constant} & 0 & 0 & 1 \\
\text{variable} & 0 & 1 & 1
\end{array}$$

Figure 10.7: Number of multiplies required to evaluate a term from its cube and co-kernel when they are either 1.0, any other constant value or a variable.

We show the KCM for the expression $a^2 + ab$ in Figure 10.6. This time, we have a desirable factorisation. However, the covering has a score identical to that from Figure 10.5. Any modification to the scoring function that prevents infinite factorisation of the expression $a + b$ will therefore miss the factorisation of $a^2 + ab$.

It is impossible to construct a scoring function that meets both criteria unless the terms 1.0 and a are treated differently. The Hosangadi et al. algorithm treats them identically (i.e. as literals that require no multiplies to evaluate). By distinguishing the 1.0 literal from other cases, we can construct a weighting function that more accurately reflects the number of multiplies saved. We can take this even further by identifying cases where the cube and co-kernel are both constant values; most compilers will fold such a product at compile time (or the expression can be rewritten), so that no multiply operation is performed at all.

In order to fix the weighting function, we can no longer assume that evaluating a term from its cube and co-kernel always requires one multiply. Figure 10.7 shows the number of multiplies actually required. By comparison, the Hosangadi et al. weighting function is equivalent to all entries in the table being equal to 1.

In unfactorised form, we assumed that the number of multiplies required to evaluate our terms from their cubes and co-kernels was $C \cdot R$ (the number of entries in the covering). Instead, writing $C_{\mathrm{variable}}$ and $C_{\mathrm{constant}}$ for the numbers of cubes that are variables and non-unit constants respectively (and similarly $R_{\mathrm{variable}}$ and $R_{\mathrm{constant}}$ for co-kernels), we approximate this as:

$$C_{\mathrm{variable}} R_{\mathrm{variable}} + C_{\mathrm{constant}} R_{\mathrm{variable}} + C_{\mathrm{variable}} R_{\mathrm{constant}}$$

In factorised form, we assumed that $R$ multiplies were required to multiply the factorised term $f$ by each co-kernel. Instead, this becomes:

$$R_{\mathrm{constant}} + R_{\mathrm{variable}}$$

Our improved weight function is:

$$\mathrm{score}_{\mathrm{improved}} = (C - 1)\sum_{i=1}^{R} M(k_i) + (R - 1)\sum_{j=1}^{C} M(c_j) + (R - 1)(C - 1) + R_{\mathrm{variable}}(C_{\mathrm{variable}} + C_{\mathrm{constant}} - 1) + R_{\mathrm{constant}}(C_{\mathrm{variable}} - 1) \tag{10.7}$$
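As a sketch of how Equation 10.7 might be computed (the structure and names are ours, not Excafé's actual implementation):

// Numbers of non-unit literals among a covering's rows (co-kernels) and
// columns (cubes); entries equal to 1.0 are excluded from both counts.
struct LiteralCounts {
  int variable;
  int constant;
};

// Floating point operations saved by a factorisation, after Equation 10.7.
// sumMk = sum of M(k_i) over co-kernels, sumMc = sum of M(c_j) over cubes.
int improvedScore(int R, int C, int sumMk, int sumMc,
                  LiteralCounts rows, LiteralCounts cols) {
  return (C - 1) * sumMk                      // co-kernel evaluations saved
       + (R - 1) * sumMc                      // cube evaluations saved
       + (R - 1) * (C - 1)                    // additions saved
       + rows.variable * (cols.variable + cols.constant - 1)
       + rows.constant * (cols.variable - 1); // term multiplies saved
}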

10.4 Experimental Results

As the performance of our factorisation pass is highly dependent on the structure of the expressions operated on, we do not currently have a way of objectively quantifying its performance. Instead, we give a rough idea of its behaviour using the local assembly matrices generated by our Navier-Stokes solver.

Our Navier-Stokes assembly matrix is for a coupled system using a quadratic velocity field (6 nodes × 2 dimensions = 12 DOFs) and a linear pressure field (3 DOFs), giving an assembly matrix of 15 × 15 = 225 elements. Each element is a rational function represented using two polynomials, giving 450 polynomials to factorise initially. We note that many of the denominators will be identical.
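In numbers:

$$(6 \times 2 + 3)^2 = 15^2 = 225 \ \text{matrix entries}, \qquad 225 \times 2 = 450 \ \text{polynomials}.$$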

Our initial set of expressions has 1611 literals, 21 of which are variables, the rest being constants. Of the 21 variables, 6 are vertex positions, 3 are scalars and 12 are basis function coefficients from the previous velocity field.

We present statistics for our factoriser in Table 10.1. We show the number of kernels, cubes and non-zero values (terms) in the KCM at each iteration. We also show the number of floating point operations required to evaluate our polynomials before factorisation, the number of floating point operations saved by our choice of factorisation, and the time taken to find it. Each factorisation results in a new term being added to the set of polynomials, causing the number of polynomials to increase each iteration.

Iter.  Polys  Kernels  Cubes   Terms    Total FLOPs  FLOPs saved  Time (s)
  1     450    50787   87466   333241      80834        8484      1.02835
  2     451    41137   90504   271729      72350        8104      0.91792
  3     452    33439   93545   217639      64246        3776      0.754398
  4     453    30392   95091   195233      60470        3754      0.693411
  5     454    27173   96690   173100      56716        3698      0.634616
  6     455    22687   98202   149460      53018        3698      0.504207
  7     456    19680   99718   130144      49320        3434      0.46729
  8     457    17088  101414   113189      45886        3154      0.421542
  9     458    15109  103034   100422      42732        1321      0.389409
 10     459    14413  103056    97187      41411         755      0.41561
 11     460    13682  103078    91054      40656         657      0.416083
 12     461    13527  103080    89166      39999         657      0.411966
 13     462    13372  103082    87287      39342         643      0.410865
 14     463    13218  103084    85640      38699         636      0.405262
 15     464    13060  103086    84026      38063         629      0.397605

Table 10.1: Time taken for our algorithm to find the maximum scoring covering for the first 15 iterations of our factoriser. We also present statistics on the dimensions and sparsity of the kernel cube matrix (KCM) and the reduction in floating point operations each iteration.

We present a graph showing the decrease in FLOP count for the first 500 iterations of the factoriser in Figure 10.8.

For our Navier-Stokes assembly matrix, the total time to execute the factoriser, including overheads such as rebuilding the KCM each iteration, was 225 seconds. Factorisation finishes on iteration 1287, giving 1736 polynomials for this instance and a final FLOP count of 8147. This reduction of almost 10× over the initial FLOP count of 80834 is less surprising when we note that expanded polynomial form is an extremely inefficient way to evaluate expressions. We also note that since our vertex numbering is non-deterministic, different factorisations may occur on different runs when two bicliques have the same score.

Our test system had a 3.6 GHz Pentium 4 processor with 2 MB L2 cache and 2 GB RAM, running 32-bit Ubuntu 10.04 (Lucid Lynx). Our factoriser (as well as the rest of Excafé) was compiled with g++ 4.4.3 at the ‘-O2’ optimisation level.


Figure 10.8: The number of floating point operations required to evaluate the local assembly matrix polynomials generated by the Excafé Navier-Stokes solver, for the first 500 iterations of our polynomial CSE pass. It does not include the 225 division operations required to evaluate the rational functions from their constituent polynomials.

10.5 Conclusion

We have presented an extension to the work of Hosangadi et al. for performing common sub-expression elimination and factorisation on sets of polynomial expressions. We show how the covering scoring function presented by Hosangadi et al. may lead to infinite recursion due to misestimation of expression operation counts, and that a truly representative function requires the ability to distinguish between different types of literal.

We also describe an algorithm for finding a maximum scoring covering of the kernel cube matrix. It reformulates the covering problem by representing the KCM as a bipartite graph and searching for the maximal scoring biclique. Properties of bipartite graphs are used to bound the maximum score to which an existing biclique may be grown and the candidate vertices to be added to it, allowing the search space to be pruned.

10.6 Future Work

Despite our improvements to the covering scoring function and biclique search algorithm, the Hosangadi et al. algorithm is still greedy. Further work is required to see whether it is possible to produce an algorithm with better heuristics or guarantees on the quality of factorisation. Such an algorithm might be derived from the Hosangadi et al. work, or from a generalisation of other work on developing efficient evaluation strategies for individual polynomials.

Chapter 11

Conclusion

We conclude in this chapter by examining how this thesis supports the contributions it claims, followed by a discussion of what we consider important and novel about our investigation and the insights that can be drawn. We close with a discussion of future work.

11.1 Summary of Thesis Achievements

We review our contributions from Section 1.3.

• We present the extension of our delayed-evaluation runtime code-generation library “Desola” to sparse linear algebra.

We present our extension of Desola to sparse linear algebra in Section 3.8. In Section 3.8.2 we present two variants of code generation for performing iteration over the elements of a sparse matrix stored in CSR format. Desola was modified to generate both these variants. We also explore specialisation of our generated code to the most frequent matrix row lengths.

• We show that some of the important optimisations applied to our dense linear iterative solvers are also applicable to our sparse ones. We also show that these optimisations are beyond the automatic optimisation capabilities of modern compilers and require a generative approach in order to be implemented effectively.

In Section 3.8.2 we show a code fragment to illustrate the effect of our SUIF-level loop fusion pass on two sparse matrix-vector multiplies. The pass successfully fuses the outer loop, but is unable to handle the inner loop as it has non-constant bounds.

We discuss the implementation of an optimisation pass specifically designed to fuse matrix-vector multiplies together and a new node type in the Desola expression DAG designed to represent them. After applying this pass, we can generate code for the fused sparse matrix-vector multiplies directly, showing how we have used knowledge of sparse matrix iteration to generate code containing optimisations beyond those of a conventional compiler.

• We present new performance results comparing sparse linear iterative solvers implemented using the Intel Math Kernel Library against our extended Desola implementation.

We present results from our set of sparse linear iterative solvers running on matrices from the University of Florida sparse matrix collection in Section 3.8.4. We collect results for multiple variants of code generation, as well as for the same Iterative Template Library solvers using the Intel Math Kernel Library for all computational kernels. As solver performance is directly dependent on sparse matrix structure, we require different graphs for each solver-matrix combination. We provide full results in Appendix B.

• We present the design and implementation of a C++ finite element library that performs expression capture of multiple aspects of the finite element method.

We provide an overview of Excafé in Chapter 4 and describe in detail the more novel aspects of Excafé’s expression capture in Chapter 6. We discuss Excafé’s handling of local assembly in detail in Chapter 9. We also discuss some of Excafé’s design decisions in Appendix C and document some of the data structures it uses in Appendix D.

• We show that expression capture of certain aspects of the finite element method enables analyses and optimisations not possible in other finite element implementations. We present an optimisation framework for the evaluation of local assembly matrices and compare against the state of the art.

We describe Excafé’s optimisation strategy and implementation for local assembly matrices in Chapter 9. Our expression capture has allowed us to analyse local assembly matrices in terms of rational scalar functions of cell vertices and other independent variables, as well as observing the effect our basis functions have on this representation. We also compare the optimisations we expect to be possible within this framework to those explored by the FEniCS project [38]. We have not yet identified any optimisations unique to this framework; however, research is ongoing.

• We extend the work of Hosangadi et al. [1] for common sub-expression elimination of polynomials. We present improvements to the original factorisation weighting function that further reduce the operation count of our factorised expressions. We reformulate the primary problem of finding a maximally weighted matrix covering as a graph biclique search problem and present a branch-and-bound algorithm optimised for the presented weight function through insights into the graph structure.

We present our work on extending polynomial factorisation and common sub-expression elimination of polynomials as the mostly self-contained Chapter 10. Hosangadi et al. did not present an algorithm for finding the maximal KCM covering, so we constructed one that remains optimal but also scales to the problem sizes we wish to handle. We give some idea of those sizes and our algorithm’s performance in Section 10.4.

We also show that the covering scoring function presented by Hosangadi et al. can lead to infinite recursion. We present a new scoring function that more accurately reflects the number of floating point operations saved by a factorisation, and show that defining such a function is only possible when we know whether each expression literal is a variable or a constant value.

11.2 Discussion

One insight from extending Desola to support sparse code generation was the importance of the representation at which optimisations are performed. From a more abstract perspective, it is obvious that two or more matrix-vector multiplies using the same matrix can be carried out simultaneously, reusing matrix elements, regardless of the storage format.

However, when performing these optimisations at a lower level, that of the code generator, finding them becomes harder. The dense matrix-vector multiply is trivial for our SUIF pass to optimise, but it was unable to completely fuse a pair of sparse matrix-vector multiplies. Given some other matrix representation, it might have been impossible. This provides evidence that a generative approach is required to achieve these optimisations effectively using active libraries.

A major distinction between the designs of Desola and Excafé is that Excafé makes it explicit that it is performing expression capture. In trying to remain transparent, Desola was forced to work with incomplete information about the problem being solved, and could only use low-overhead analyses. By explicitly capturing all the information needed to specify a finite element problem, Excafé inherits none of these restrictions.

Another notable distinction is Excafé’s declarative loop syntax. Many C++ embedded expression capture schemes tend to resemble code that could have been written directly in C++ to do the same thing, albeit less efficiently. In contrast, Excafé’s declarative loop syntax provides an abstraction that would otherwise exist completely outside the scope of a C++ program. Hence, Excafé has used expression capture not just to obtain a description of a computation, but as a tool to provide a level of abstraction that would otherwise be impossible.

It is debatable whether we have chosen an appropriate level of representation for our investigation into the optimisation of local assembly matrix computation. After all, by representing each element as an independent scalar rational function, we have lost all loop structure relating to the various tensor products used during the computation of the bilinear forms. However, work by Ølgaard and Wells [54] suggests that the structures of tensor-based and quadrature-based computations inhibit certain optimisations in some instances.

Kirby et al. [38] demonstrate an algorithm that is optimal, in some sense, with respect to the optimised execution strategy it can produce for evaluating a local assembly matrix. Yet, work by Ølgaard and Wells shows that tensor-based evaluation may require hundreds of times more operations than quadrature-based assembly in some cases.

In choosing such a flexible representation, we have set ourselves the opposite problem. We should be able to represent a superset of other optimisations in our representation, but as it is so unstructured, the challenge is the construction of an algorithm that can effectively restructure the representation to an efficient evaluation plan.

We consider how this investigation should progress in the next section.

11.3 Future Work

The declarative loop syntax provided by Excafé is sufficient for our Navier-Stokes problem, but should be tested on other examples to gauge its applicability. We also intend to extend Excafé so that the declarative iteration syntax can be used for time-stepping. Additionally, we see no reason why the same syntax could not be used for the specification of iterative linear solvers.

Although we have shown algorithms for inferring loop nesting, and detecting invalid iteration specifications, we do not have a formal specification of how the syntax should behave nor proofs of correctness. These would be of great benefit towards providing confidence that the inferred loop structure is actually the one desired.

Our other future work revolves around optimisation of the evaluation of the local assembly matrix. We have theorised that by reducing the size of our local assembly expressions, through avoiding the full expansion of cell-geometry related terms, we can reduce our optimisation search space. If our factorisation algorithms perform effectively on this representation, we have a basis with which to investigate further optimisations.

Should our factorisation algorithms not perform effectively, it is possible that further improvements can be made to increase the quality of the result, or that existing optimisation algorithms for evaluating single multivariate polynomials can be extended.

Lastly, we would like to consider a more structured approach to optimisation that still provides more flexibility than basing evaluation around a particular rearrangement of variational forms. It may be possible to apply techniques and models from the Tensor Contraction Engine [59], which can synthesise high-performance code for evaluating tensor contractions, to the tensor operations used to evaluate local assembly terms.

Bibliography

[1] A. Hosangadi, F. Fallah, and R. Kastner, “Optimizing polynomial expressions by algebraic factorization and common subexpression elimination,” IEEE Trans. on CAD of Integrated Circuits and Systems, vol. 25, no. 10, pp. 2012–2022, 2006.

[2] F. P. Russell, M. R. Mellor, P. H. J. Kelly, and O. Beckmann, “An active linear algebra library using delayed evaluation and runtime code generation,” in Proceedings of the Second International Workshop on Library-Centric Software Design (LCSD’06), pp. 5–13, Oct. 2006.

[3] F. P. Russell, M. R. Mellor, P. H. J. Kelly, and O. Beckmann, “DESOLA: An active linear algebra library using delayed evaluation and runtime code generation,” Science of Computer Programming, vol. 76, no. 4, pp. 227–242, 2011. Special issue on library-centric software design (LCSD 2006).

[4] T. L. Veldhuizen, “Expression templates,” C++ Report, vol. 7, pp. 26–31, June 1995. Reprinted in C++ Gems, ed. Stanley Lippman.

[5] J. G. Siek and A. Lumsdaine, “The matrix template library: A generic programming approach to high performance numerical linear algebra,” in ISCOPE (D. Caromel, R. R. Oldehoeft, and M. Tholburn, eds.), vol. 1505 of Lecture Notes in Computer Science, pp. 59–70, Springer, 1998.

[6] T. Veldhuizen, “Using C++ template metaprograms,” C++ Report, vol. 7, pp. 36–43, May 1995. Reprinted in C++ Gems, ed. Stanley Lippman.


[7] J. G. Siek and A. Lumsdaine, “A rational approach to portable high performance: The basic linear algebra instruction set (BLAIS) and the fixed algorithm size template (FAST) library,” in ECOOP Workshops (S. Demeyer and J. Bosch, eds.), vol. 1543 of Lecture Notes in Computer Science, pp. 468–469, Springer, 1998.

[8] D. J. Quinlan, M. Schordan, Q. Yi, and A. Saebjornsen, “Classification and utilization of abstractions for optimization,” in ISoLA (T. Margaria and B. Steffen, eds.), vol. 4313 of Lecture Notes in Computer Science, pp. 57–73, Springer, 2004.

[9] S. Z. Guyer and C. Lin, “Broadway: A compiler for exploiting the domain-specific seman- tics of software libraries,” Proceedings of the IEEE, vol. 93, no. 2, 2005. special issue on ”Program Generation, Optimization, and Adaptation”.

[10] K. Fatahalian, T. J. Knight, M. Houston, M. Erez, D. R. Horn, L. Leem, J. Y. Park, M. Ren, A. Aiken, W. J. Dally, and P. Hanrahan, “Sequoia: Programming the memory hierarchy,” in Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, 2006.

[11] P. H. J. Kelly, O. Beckmann, T. Field, and S. B. Baden, “THEMIS: Component dependence metadata in adaptive parallel applications,” Parallel Processing Letters, vol. 11, no. 4, pp. 455–470, 2001.

[12] B. Alpern, L. Carter, and J. Ferrante, “Space-limited procedures: A methodology for portable high-performance,” in Working Conference on Massively Parallel Programming Models, MPPM-95, Suitability, Realization, Performance, (Berlin), 1995.

[13] O. Beckmann, A. Houghton, M. Mellor, and P. H. J. Kelly, “Runtime code generation in C++ as a foundation for domain-specific optimisation,” in Domain-Specific Program Generation (C. Lengauer, D. S. Batory, C. Consel, and M. Odersky, eds.), vol. 3016 of Lecture Notes in Computer Science, pp. 291–306, Springer, 2003.

[14] M. Püschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. W. Singer, J. Xiong, F. Franchetti, A. Gačić, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, “SPIRAL: Code generation for DSP transforms,” Proceedings of the IEEE, special issue on “Program Generation, Optimization, and Adaptation”, vol. 93, no. 2, pp. 232–275, 2005.

[15] S. Donadio, J. Brodman, T. Roeder, K. Yotov, D. Barthou, A. Cohen, M. J. Garzaran, D. Padua, and K. Pingali, “A language for the compact representation of multiple program versions.”

[16] R. C. Whaley, A. Petitet, and J. J. Dongarra, “Automated empirical optimization of software and the ATLAS project,” Parallel Comput., vol. 27, no. 1–2, pp. 3–25, 2001.

[17] D. F. Bacon, S. L. Graham, and O. J. Sharp, “Compiler transformations for high-performance computing,” ACM Computing Surveys, vol. 26, no. 4, pp. 345–420, 1994.

[18] A. Cohen, S. Girbal, and O. Temam, “A polyhedral approach to ease the composition of program transformations,” in EuroPar’04, Pisa, Italy, Aug. 2004, LNCS, Springer-Verlag, 2004.

[19] A. Cohen, M. Sigler, S. Girbal, O. Temam, D. Parello, and N. Vasilache, “Facilitating the search for compositions of program transformations,” in ICS ’05: Proceedings of the 19th annual international conference on Supercomputing, (New York, NY, USA), pp. 151–160, ACM Press, 2005.

[20] C. Ding and K. Kennedy, “Improving cache performance in dynamic applications through data and computation reorganization at run time,” in PLDI ’99: Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, (New York, NY, USA), pp. 229–241, ACM Press, 1999.

[21] D. G. Kirkpatrick and P. Hell, “On the completeness of a generalized matching problem,” in STOC ’78: Proceedings of the tenth annual ACM symposium on Theory of computing, (New York, NY, USA), pp. 240–245, ACM Press, 1978.

[22] N. Mitchell, L. Carter, and J. Ferrante, “Localizing non-affine array references,” in PACT ’99: Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques, (Washington, DC, USA), p. 192, IEEE Computer Society, 1999.

[23] M. M. Strout, L. Carter, and J. Ferrante, “Rescheduling for locality in sparse matrix computations,” in ICCS ’01: Proceedings of the International Conference on Computational Sciences-Part I, (London, UK), pp. 137–148, Springer-Verlag, 2001.

[24] C. Lattner and V. Adve, “LLVM: A compilation framework for lifelong program analysis & transformation,” in CGO ’04: Proceedings of the international symposium on Code generation and optimization, (Washington, DC, USA), p. 75, IEEE Computer Society, 2004.

[25] R. P. Wilson, R. S. French, C. S. Wilson, S. P. Amarasinghe, J.-A. M. Anderson, S. W. K. Tjiang, S.-W. Liao, C.-W. Tseng, M. W. Hall, M. S. Lam, and J. L. Hennessy, “SUIF: An infrastructure for research on parallelizing and optimizing compilers,” SIGPLAN Notices, vol. 29, no. 12, pp. 31–37, 1994.

[26] S. J. Sherwin and G. E. Karniadakis, Spectral/hp element methods for computational fluid dynamics. Oxford University Press, 2005.

[27] R. Rannacher, “Finite element methods for the incompressible Navier-Stokes equations,” Institut für Angewandte Mathematik, Universität Heidelberg, 1999.

[28] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. M. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Philadelphia: Society for Industrial and Applied Mathematics, 1994. Also available at http://www.netlib.org/templates/Templates.html.

[29] B. Bagheri and R. Scott, “About analysa,” Tech. Rep. TR-2004-09, The University of Chicago Department of Computer Science, 2004.

[30] P. Dular, C. Geuzaine, F. Henrotte, and W. Legros, “A general environment for the treatment of discrete problems and its application to the finite element method,” IEEE Transactions on Magnetics, vol. 34, pp. 3395–3398, Sept. 1998.

[31] W. Bangerth, R. Hartmann, and G. Kanschat, “deal.II – a general purpose object oriented finite element library,” ACM Trans. Math. Softw., vol. 33, no. 4, p. 24, 2007.

[32] O. Pironneau, F. Hecht, A. L. Hyaric, and J. Morice, “FreeFEM++,” 2010. URL: http://www.freefem.org/ff++/.

[33] K. R. Long, R. C. Kirby, and B. van Bloemen Waanders, “Unified embedded parallel finite element computations via software-based Fréchet differentiation,” SIAM J. Scientific Computing (to appear).

[34] A. Logg, “Automating the finite element method,” Arch. Comput. Methods Eng., vol. 14, no. 2, pp. 93–138, 2007.

[35] FEniCS, “FEniCS project.” URL: http://www.fenics.org/, 2009.

[36] R. C. Kirby and A. Logg, “A compiler for variational forms,” ACM Trans. Math. Softw., vol. 32, no. 3, pp. 417–444, 2006.

[37] M. S. Alnæs and A. Logg, “UFL,” 2009. URL: http://www.fenics.org/ufl/.

[38] R. C. Kirby, A. Logg, L. R. Scott, and A. R. Terrel, “Topological optimization of the evaluation of finite element matrices,” SIAM J. Sci. Comput., vol. 28, no. 1, pp. 224–240, 2006.

[39] K. Czarnecki, U. W. Eisenecker, R. Glück, D. Vandevoorde, and T. L. Veldhuizen, “Generative programming and active libraries,” in Selected Papers from the International Seminar on Generic Programming, no. 1766 in LNCS, pp. 25–39, Springer-Verlag, 2000.

[40] T. L. Veldhuizen and D. Gannon, “Active libraries: Rethinking the roles of compilers and libraries,” in Proceedings of the SIAM Workshop on Object Oriented Methods for Interoperable Scientific and Engineering Computing (OO’98), SIAM Press, 1998.

[41] C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh, “Basic Linear Algebra Subprograms for Fortran usage,” ACM Trans. Math. Softw., vol. 5, no. 3, pp. 308–323, 1979.

[42] O. Beckmann and P. H. J. Kelly, “Efficient interprocedural data placement optimisation in a parallel library,” in LCR98: Languages, Compilers and Run-time Systems for Scalable Computers, no. 1511 in LNCS, pp. 123–138, Springer-Verlag, May 1998.

[43] X. S. Li and J. W. Demmel, “SuperLU DIST: A scalable distributed-memory sparse direct solver for unsymmetric linear systems,” ACM Trans. Mathematical Software, vol. 29, pp. 110–140, June 2003.

[44] L. Grigori, J. W. Demmel, and X. S. Li, “Parallel symbolic factorization for sparse LU with static pivoting,” SIAM J. Scientific Computing, vol. 29, no. 3, pp. 1289–1314, 2007.

[45] C. Bauer, A. Frink, and R. Kreckel, “Introduction to the GiNaC framework for symbolic computation within the C++ programming language,” Journal of Symbolic Computation, vol. 33, no. 1, pp. 1–12, 2002.

[46] J. R. Shewchuk, “Triangle: Engineering a 2D Quality Mesh Generator and Delaunay Triangulator,” in Applied Computational Geometry: Towards Geometric Engineering (M. C. Lin and D. Manocha, eds.), vol. 1148 of Lecture Notes in Computer Science, pp. 203–222, Springer-Verlag, May 1996. From the First ACM Workshop on Applied Computational Geometry.

[47] S. Balay, K. Buschelman, V. Eijkhout, W. D. Gropp, D. Kaushik, M. G. Knepley, L. C. McInnes, B. F. Smith, and H. Zhang, “PETSc users manual,” Tech. Rep. ANL-95/11 - Revision 3.0.0, Argonne National Laboratory, 2009.

[48] P. A. Raviart and J. M. Thomas, “A mixed finite element method for 2-nd order elliptic problems,” in Mathematical Aspects of Finite Element Methods (A. Dold and B. Eckmann, eds.), Springer, 1975. Lecture Notes of Mathematics, Volume 606.

[49] S. Turek, Efficient Solvers for Incompressible Flow Problems: An Algorithmic and Com- putational Approach. Springer, Berlin, 1999.

[50] W. C. Rheinboldt, C. K. Mesztenyi, and J. M. Fitzgerald, “On the evaluation of multivariate polynomials and their derivatives,” BIT Numerical Mathematics, vol. 17, pp. 437–450, 1977. doi:10.1007/BF01933453.

[51] S. K. Lodha and R. Goldman, “A unified approach to evaluation algorithms for multivariate polynomials,” Math. Comput., vol. 66, no. 220, pp. 1521–1553, 1997.

[52] M. Kojima, “Efficient evaluation of polynomials and their partial derivatives in homotopy continuation methods,” Journal of the Operations Research Society of Japan, vol. 51, no. 1, pp. 29–54, 2008.

[53] H.-C. P. Liao and R. J. Fateman, “Evaluation of the heuristic polynomial GCD,” in Proceedings of the International Symposium on Symbolic and Algebraic Computation (ISSAC ’95), Montreal, Canada, pp. 240–247, ACM Press, 1995.

[54] K. B. Ølgaard and G. N. Wells, “Optimisations for quadrature representations of finite element tensors through automated code generation,” ACM Transactions on Mathematical Software, vol. 37, no. 1, 2010.

[55] M. Yannakakis, “Node-deletion problems on bipartite graphs,” SIAM J. Comput., vol. 10, no. 2, pp. 310–327, 1981.

[56] R. Peeters, “The maximum edge biclique problem is NP-complete,” Discrete Applied Math- ematics, vol. 131, no. 3, pp. 651–654, 2003.

[57] G. Liu, K. Sim, and J. Li, “Efficient mining of large maximal bicliques,” in Data Warehousing and Knowledge Discovery (A. Tjoa and J. Trujillo, eds.), vol. 4081 of Lecture Notes in Computer Science, pp. 437–448, Springer Berlin / Heidelberg, 2006.

[58] P. E. Hart, N. J. Nilsson, and B. Raphael, “A formal basis for the heuristic determination of minimum cost paths,” IEEE Transactions on Systems Science and Cybernetics, vol. 4, no. 2, pp. 100–107, 1968.

[59] G. Baumgartner, A. Auer, D. E. Bernholdt, A. Bibireata, V. Choppella, D. Cociorva, X. Gao, R. J. Harrison, S. Hirata, S. Krishnamoorthy, S. Krishnan, C.-C. Lam, Q. Lu, M. Nooijen, R. M. Pitzer, J. Ramanujam, P. Sadayappan, and A. Sibiryakov, “Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models,” Proceedings of the IEEE, vol. 93, no. 2, 2005.

[60] W. Bangerth and O. Kayser-Herold, “Data structures and requirements for hp finite element software,” ACM Trans. Math. Softw., vol. 36, no. 1, pp. 1–31, 2009.

[61] A. Logg, “Efficient representation of computational meshes,” Int. J. Comput. Sci. Eng., vol. 4, no. 4, pp. 283–295, 2009.

Appendix A

Desola Dense Performance Results

The benchmark architectures were:

1. Intel Xeon “Clovertown” processor running at 2.66GHz, 4096 KB L2 cache with 4 GB RAM running 64-bit Ubuntu 8.10.

2. Pentium IV “Prescott” processor running at 3.2GHz with Hyper-threading disabled, 2048 KB L2 cache with 2 GB RAM running 32-bit Ubuntu 8.10.


[Plots: throughput in MFLOPs against matrix size for DESOLA with fusion and contraction, MTL, ATLAS and IMKL; (a) architecture 1, (b) architecture 2.]

Figure A.1: Throughput of different implementations of the BiConjugate Gradient solver on our test architectures.


Figure A.2: Throughput of different implementations of the BiConjugate Gradient Stabilised solver on our test architectures.


Figure A.3: Throughput of different implementations of the Conjugate Gradient Squared solver on our test architectures.


Figure A.4: Throughput of different implementations of the Quasi-Minimal Residual solver on our test architectures.


Figure A.5: Throughput of different implementations of the Transpose-Free Quasi-Minimal Residual solver on our test architectures.

Appendix B

Desola Sparse Performance Results

The benchmark architectures were:

1. Intel Xeon “Clovertown” processor running at 2.66GHz, 4096 KB L2 cache with 4 GB RAM running 64-bit Ubuntu 8.10.

2. Pentium IV “Prescott” processor running at 3.2GHz with Hyper-threading disabled, 2048 KB L2 cache with 2 GB RAM running 32-bit Ubuntu 8.10.

B.1 BiConjugate Gradient Solver


[Plots: identity bicg on matrix rajat26 (nnz=249302, rows=51032, nnz/row=4.8852); execution time in seconds, including compilation overhead, for the Desola code-generation variants (inner-while and inner-for, with fusion, high-level fusion, contraction, specialisation and liveness), MTL and Intel MKL; (a) architecture 1, (b) architecture 2.]

Figure B.1: Time to execute 256 iterations of the BiConjugate Gradient solver with matrix rajat26 on our test architectures.

[Plots: identity bicg on matrix rajat31 (nnz=20316253, rows=4690002, nnz/row=4.3318); (a) architecture 1, (b) architecture 2.]

Figure B.2: Time to execute 256 iterations of the BiConjugate Gradient solver with matrix rajat31 on our test architectures.

[Plots: identity bicg on matrix ex11 (nnz=1096948, rows=16614, nnz/row=66.0255); (a) architecture 1, (b) architecture 2.]

Figure B.3: Time to execute 256 iterations of the BiConjugate Gradient solver with matrix ex11 on our test architectures.

[Plots: identity bicg on matrix torso1 (nnz=8516500, rows=116158, nnz/row=73.3182); (a) architecture 1, (b) architecture 2.]

Figure B.4: Time to execute 256 iterations of the BiConjugate Gradient solver with matrix torso1 on our test architectures.

[Plots: identity bicg on matrix Chebyshev4 (nnz=5377761, rows=68121, nnz/row=78.9442); (a) architecture 1, (b) architecture 2.]

Figure B.5: Time to execute 256 iterations of the BiConjugate Gradient solver with matrix Chebyshev4 on our test architectures.

[Plots: identity bicg on matrix ASIC 680k (nnz=3871773, rows=682862, nnz/row=5.6699); (a) architecture 1, (b) architecture 2.]

Figure B.6: Time to execute 256 iterations of the BiConjugate Gradient solver with matrix ASIC 680k on our test architectures.

B.2 BiConjugate Gradient Stabilised Solver

[Plots: identity bicgstab on matrix rajat26 (nnz=249302, rows=51032, nnz/row=4.8852); (a) architecture 1, (b) architecture 2.]

Figure B.7: Time to execute 256 iterations of the BiConjugate Gradient Stabilised solver with matrix rajat26 on our test architectures.

[Plots: identity bicgstab on matrix rajat31 (nnz=20316253, rows=4690002, nnz/row=4.3318); (a) architecture 1, (b) architecture 2.]

Figure B.8: Time to execute 256 iterations of the BiConjugate Gradient Stabilised solver with matrix rajat31 on our test architectures.

[Plots: identity bicgstab on matrix ex11 (nnz=1096948, rows=16614, nnz/row=66.0255); (a) architecture 1, (b) architecture 2.]

Figure B.9: Time to execute 256 iterations of the BiConjugate Gradient Stabilised solver with matrix ex11 on our test architectures.

[Plots: identity bicgstab on matrix torso1 (nnz=8516500, rows=116158, nnz/row=73.3182); (a) architecture 1, (b) architecture 2.]

Figure B.10: Time to execute 256 iterations of the BiConjugate Gradient Stabilised solver with matrix torso1 on our test architectures.

[Plots: identity bicgstab on matrix Chebyshev4 (nnz=5377761, rows=68121, nnz/row=78.9442); (a) architecture 1, (b) architecture 2.]

Figure B.11: Time to execute 256 iterations of the BiConjugate Gradient Stabilised solver with matrix Chebyshev4 on our test architectures.

[Plots: identity bicgstab on matrix ASIC 680k (nnz=3871773, rows=682862, nnz/row=5.6699); (a) architecture 1, (b) architecture 2.]

Figure B.12: Time to execute 256 iterations of the BiConjugate Gradient Stabilised solver with matrix ASIC 680k on our test architectures.

B.3 Conjugate Gradient Squared Solver

[Plots: identity cgs on matrix rajat26 (nnz=249302, rows=51032, nnz/row=4.8852); (a) architecture 1, (b) architecture 2.]

Figure B.13: Time to execute 256 iterations of the Conjugate Gradient Squared solver with matrix rajat26 on our test architectures.

[Plots: identity cgs on matrix rajat31 (nnz=20316253, rows=4690002, nnz/row=4.3318); (a) architecture 1, (b) architecture 2.]

Figure B.14: Time to execute 256 iterations of the Conjugate Gradient Squared solver with matrix rajat31 on our test architectures.

[Plots: identity cgs on matrix ex11 (nnz=1096948, rows=16614, nnz/row=66.0255); (a) architecture 1, (b) architecture 2.]

Figure B.15: Time to execute 256 iterations of the Conjugate Gradient Squared solver with matrix ex11 on our test architectures.

[Plots: identity cgs on matrix torso1 (nnz=8516500, rows=116158, nnz/row=73.3182); (a) architecture 1, (b) architecture 2.]

Figure B.16: Time to execute 256 iterations of the Conjugate Gradient Squared solver with matrix torso1 on our test architectures.

[Plots: identity cgs on matrix Chebyshev4 (nnz=5377761, rows=68121, nnz/row=78.9442); (a) architecture 1, (b) architecture 2.]

Figure B.17: Time to execute 256 iterations of the Conjugate Gradient Squared solver with matrix Chebyshev4 on our test architectures. B.3. Conjugate Gradient Squared Solver 247

[Bar charts omitted: "identity cgs on matrix ASIC 680k (nnz=3871773, rows=682862, nnz/row=5.6699)"; panels (a) architecture 1 and (b) architecture 2; series as in Figure B.14.]

Figure B.18: Time to execute 256 iterations of the Conjugate Gradient Squared solver with matrix ASIC 680k on our test architectures.

B.4 Quasi-Minimal Residual Solver

[Bar charts omitted: "identity qmr on matrix rajat26 (nnz=249302, rows=51032, nnz/row=4.8852)"; panels (a) architecture 1 and (b) architecture 2; series as in Figure B.14.]

Figure B.19: Time to execute 256 iterations of the Quasi-Minimal Residual solver with matrix rajat26 on our test architectures.

[Bar charts omitted: "identity qmr on matrix rajat31 (nnz=20316253, rows=4690002, nnz/row=4.3318)"; panels (a) architecture 1 and (b) architecture 2; series as in Figure B.14.]

Figure B.20: Time to execute 256 iterations of the Quasi-Minimal Residual solver with matrix rajat31 on our test architectures.

[Bar charts omitted: "identity qmr on matrix ex11 (nnz=1096948, rows=16614, nnz/row=66.0255)"; panels (a) architecture 1 and (b) architecture 2; series as in Figure B.14.]

Figure B.21: Time to execute 256 iterations of the Quasi-Minimal Residual solver with matrix ex11 on our test architectures.

[Bar charts omitted: "identity qmr on matrix torso1 (nnz=8516500, rows=116158, nnz/row=73.3182)"; panels (a) architecture 1 and (b) architecture 2; series as in Figure B.14.]

Figure B.22: Time to execute 256 iterations of the Quasi-Minimal Residual solver with matrix torso1 on our test architectures.

[Bar charts omitted: "identity qmr on matrix Chebyshev4 (nnz=5377761, rows=68121, nnz/row=78.9442)"; panels (a) architecture 1 and (b) architecture 2; series as in Figure B.14.]

Figure B.23: Time to execute 256 iterations of the Quasi-Minimal Residual solver with matrix Chebyshev4 on our test architectures.

[Bar charts omitted: "identity qmr on matrix ASIC 680k (nnz=3871773, rows=682862, nnz/row=5.6699)"; panels (a) architecture 1 and (b) architecture 2; series as in Figure B.14.]

Figure B.24: Time to execute 256 iterations of the Quasi-Minimal Residual solver with matrix ASIC 680k on our test architectures.

B.5 Transpose-Free Quasi-Minimal Residual Solver

[Bar charts omitted: "identity tfqmr on matrix rajat26 (nnz=249302, rows=51032, nnz/row=4.8852)"; panels (a) architecture 1 and (b) architecture 2; series as in Figure B.14.]

Figure B.25: Time to execute 256 iterations of the Transpose-Free Quasi-Minimal Residual solver with matrix rajat26 on our test architectures.

[Bar charts omitted: "identity tfqmr on matrix rajat31 (nnz=20316253, rows=4690002, nnz/row=4.3318)"; panels (a) architecture 1 and (b) architecture 2; series as in Figure B.14.]

Figure B.26: Time to execute 256 iterations of the Transpose-Free Quasi-Minimal Residual solver with matrix rajat31 on our test architectures.

[Bar charts omitted: "identity tfqmr on matrix ex11 (nnz=1096948, rows=16614, nnz/row=66.0255)"; panels (a) architecture 1 and (b) architecture 2; series as in Figure B.14.]

Figure B.27: Time to execute 256 iterations of the Transpose-Free Quasi-Minimal Residual solver with matrix ex11 on our test architectures.

[Bar charts omitted: "identity tfqmr on matrix torso1 (nnz=8516500, rows=116158, nnz/row=73.3182)"; panels (a) architecture 1 and (b) architecture 2; series as in Figure B.14.]

Figure B.28: Time to execute 256 iterations of the Transpose-Free Quasi-Minimal Residual solver with matrix torso1 on our test architectures.

[Bar charts omitted: "identity tfqmr on matrix Chebyshev4 (nnz=5377761, rows=68121, nnz/row=78.9442)"; panels (a) architecture 1 and (b) architecture 2; series as in Figure B.14.]

Figure B.29: Time to execute 256 iterations of the Transpose-Free Quasi-Minimal Residual solver with matrix Chebyshev4 on our test architectures.

[Bar charts omitted: "identity tfqmr on matrix ASIC 680k (nnz=3871773, rows=682862, nnz/row=5.6699)"; panels (a) architecture 1 and (b) architecture 2; series as in Figure B.14.]

Figure B.30: Time to execute 256 iterations of the Transpose-Free Quasi-Minimal Residual solver with matrix ASIC 680k on our test architectures.

Appendix C

Excafé Library Design Notes

C.1 Introduction

One aim of my thesis is to explore the optimisation space present in the implementation of finite element solvers. In order to do this, I decided to construct a library for expressing and solving partial differential equations using the finite element method.

My work with Desola demonstrated the importance of being able to optimise components within a given context. This is especially important when providing usable abstractions to the user as these abstractions tend to expose elemental operations which cannot be optimised effectively in isolation.

My library has the following design goals:

• Capture as much information as possible about the steps involved in specifying and solving a finite element problem.

• Provide abstractions to the library user that correspond naturally to the abstractions used in specifying and solving finite-element problems in the mathematical domain.

In this context, capture has an extremely specific meaning. If a library provides facilities to express a computation using domain-specific abstractions, one could consider the essence of

that computation to have been captured in that code. However, the computation itself may still be opaque to that library. When we capture an expression, we mean that the computation has been exposed to the library in such a way that the library has the ability to analyse and optimise that expression.
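To make this distinction concrete, the following sketch contrasts the two situations; the types Vector, Expr, Leaf and Add are hypothetical illustrations, not Excafé classes:

#include <cstddef>
#include <vector>

typedef std::vector<double> Vector;

// An "opaque" library executes the addition immediately; the overall
// computation is never visible to the library.
Vector addOpaque(const Vector& a, const Vector& b)
{
  Vector result(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    result[i] = a[i] + b[i];
  return result;
}

// A capturing library instead records the operation as a node in an
// expression DAG, deferring evaluation until after analysis and optimisation.
struct Expr { virtual ~Expr() {} };
struct Leaf : Expr { const Vector* data; explicit Leaf(const Vector* v) : data(v) {} };
struct Add : Expr
{
  const Expr* left;
  const Expr* right;
  Add(const Expr* l, const Expr* r) : left(l), right(r) {}
};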

I will describe the various considerations weighed during the design of the library. Unsurprisingly, there is considerable overlap between these considerations and those involved in the development of other finite element libraries. However, the fact that we want to capture certain operations rather than simply execute them leads to differing requirements. For a discussion of the requirements for data structures for hp-adaptive finite element methods, we refer the reader to Bangerth et al. [60]. For my investigation, I do not consider adaptivity, but other aspects of the analysis are relevant.

C.2 Problem Context

For our captured representation, it is important that we have a data structure that stores, or contains a reference to, all information about the finite element problem being solved. This structure should provide mechanisms to access all aspects of the finite element problem being modelled, and an appropriate place to store the results of analyses and optimisations.

Our problem context should contain:

A mesh This is the mesh on which all our partial differential equations will be spatially discretised.

Finite element function spaces Our discretised finite element function spaces consist of a bounded subset of our mesh and a choice of basis functions.

Discretised tensor fields Discretised representations of tensor fields we use (e.g. a scalar pressure field). We also wish to be able to represent composite fields (i.e. fields composed from sub-fields). These are useful, for example, when solving for pressure and velocity simultaneously in the coupled Navier-Stokes equations.

Position field The position field defines a mapping from a co-ordinate specified in terms of a local position on a cell to a global co-ordinate. This transform can itself be described as a vector field defined over our mesh, and we can use our finite element basis functions to define it.

Boundary conditions For our problem to be well-posed, it is often necessary to provide boundary conditions. These are typically described as a subset of the boundary of our domain and a function specifying the value a field should take on the boundary.

Field update operations These refer to any operations that update the value of our discretised fields. In a time-dependent simulation, these would typically be the operations required to calculate the value of our fields at a new time-step. However, we may also wish to capture the operations required for setting up the initial fields, or multiple techniques for advancing our system.

Capturing field update operations will involve the capture of:

1. Linear and bilinear forms used to assemble discrete fields and operators.

2. Linear algebra between discretised fields and operators (represented as vectors and matrices).

3. Iteration required by the solution step. For example, in the Navier-Stokes equations, Picard iteration can be used to linearise the convective acceleration term.

C.3 Mesh Representation

We identify the following requirements for our mesh representation. None of these are particular to our expression capture. In this context, the term mesh entity refers to elements of our mesh such as cells, facets, vertices or edges.

Global mesh entity iteration We frequently need to iterate over all entities of a particular type in our mesh. For example, we iterate over all cells or facets in a mesh to evaluate cell

and facet integrals for global assembly. When we render a mesh, we require the ability to iterate over every vertex in our mesh.

Local mesh entity iteration We also require the ability to iterate over all mesh entities adjacent to a given mesh entity. For example, when we define a continuous field, we need to determine which cells share degrees-of-freedom. If we associate degrees-of-freedom with particular mesh entities, we can work out which cells share them by iterating over all cells adjacent to that entity.

Logg describes a representation for meshes that supports efficient implementation of these abstractions [61].
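As a sketch of the kind of interface these two requirements suggest (the names here are hypothetical, not Excafé's actual mesh API):

#include <cstddef>
#include <vector>

// Sketch of the two iteration patterns: entities are identified by their
// topological dimension (0 = vertices, ..., d = cells) and an index.
struct MeshEntity
{
  std::size_t dimension;
  std::size_t index;
};

struct MeshIterationInterface
{
  // Global iteration: all entities of a given dimension in the mesh.
  virtual std::vector<MeshEntity> entities(std::size_t dimension) const = 0;

  // Local iteration: all entities of dimension targetDimension adjacent
  // to a given entity (e.g. all cells sharing a vertex).
  virtual std::vector<MeshEntity> adjacentEntities(const MeshEntity& entity,
                                                   std::size_t targetDimension) const = 0;

  virtual ~MeshIterationInterface() {}
};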

C.4 Finite Element

The finite element representation is the most important part of our design. All fields used in our solver will be discretised as coefficients of basis functions defined by the finite element.

We identify the following requirements for our finite element class.

True tensor-valued elements We observe that in many instances, vector and tensor-valued basis functions have been built by duplicating scalar basis functions for each tensor element. Restricting basis functions to this form enables certain sparsity-related optimisations when evaluating local assembly matrices.

However, we wish to be able to support arbitrary vector and tensor-valued elements such as the Raviart-Thomas [48] element.

Expression-capture of basis functions We wish to be able to perform expression capture of basis functions. In a conventional finite element library, this is impossible: typically, the library will only support evaluating the basis functions and their derivatives at specified points.

As we wish to optimise local assembly and evaluation of these functions, it is important that we can analyse the underlying formulae used to compute the basis function values.

Shared degree-of-freedom resolution If our basis functions are used to define a continuous field, then the degrees-of-freedom that are coefficients for our basis functions will be shared between cells. Given local degrees-of-freedom specified on adjacent cells, our finite element class must be able to determine which degrees-of-freedom are shared.

Degree-of-freedom location querying Boundary conditions constrain the degrees-of-freedom of a field on our mesh. As we only wish to constrain values at the edge of our domain, we require a mechanism to determine which degrees-of-freedom on a cell affect values of the field at that edge, so that we do not constrain all degrees-of-freedom on that cell.

C.5 Reference Cell

Many aspects of the finite element can be defined in terms of operations over a reference cell, and then generalised to the global case. For example, basis functions are typically defined in terms of co-ordinates on the reference cell.

Our reference cell consists of:

1. A list of vertices, with co-ordinate values usually defined in the range 0 to 1 or -1 to 1, so that defining basis functions over the reference cell is as simple as possible.

2. A local numbering scheme for all mesh entities in the cell.

We identify the following requirements for our reference cell.

Global-to-local entity lookup We require a mechanism to map between global mesh entity numberings and our local one. For example, to determine the degrees-of-freedom on a particular facet, we need to:

1. Determine the cell(s) that contain the particular facet.

2. Use the reference cell to resolve the global facet identifier to a local facet identifier.

3. Query the finite element for the local numbering of the degrees-of-freedom located on that facet.

Geometry queries We need to be able to query the reference cell about the co-ordinates of its vertices. For example, to evaluate a field at a given mesh vertex, we need to:

1. Determine a cell that contains that vertex.

2. Determine the local identifier i of the vertex on that cell.

3. Query the reference cell for the local co-ordinate of vertex i.

4. Evaluate the field using the containing cell, and the local co-ordinate on that cell.

Topology queries We need a mechanism to query the reference cell about the number of mesh entities it has (e.g. number of vertices, edges) and the vertices contained in these entities. Using this information, it is possible for the mesh class to identify all mesh entities on a cell in a general manner. Hence, to add a cell to a mesh we need only supply the cell vertices and the mesh class will use our reference cell abstraction to identify the edges, facets and so on.

Co-ordinate field As mentioned before, the global co-ordinates of a position in our mesh can themselves be defined as a vector field over our mesh. Unless we are using curvilinear cells, this field can be represented as a linear interpolation between the cell vertices. Hence, our reference cell can provide a set of linear scalar basis functions that are multiplied by the cell vertices to form a position vector field over the element.

Quadrature rules We need to be able to evaluate integrals over our reference element. Integrals over our reference element can then be transformed to integrals over an arbitrary element using the Jacobian of the position field. Given the polynomial order of the function we wish to integrate, our reference cell must be able to provide a quadrature rule to evaluate that integral. A quadrature rule consists of a number of points within, or on the boundary of, the reference element at which to evaluate the function being integrated, together with scaling factors used to compute a weighted sum; the standard form of such a rule is given after this list.

Reference region transform Instead of using quadrature rules specific to a particular cell, it is possible to define quadrature only over the reference line, square or cube. In order to integrate over cells with different topologies (e.g. triangles), our reference cell can provide a transform from the reference region to local cell co-ordinates. Hence, to evaluate our cell integrals we must handle two co-ordinate transformations: one from the reference region to our reference cell, and one from our reference cell to the arbitrary cell on our mesh.

Facet descriptions When handling boundary conditions, we need to evaluate integrals defined over the facets of our cells. The description of a cell facet is represented as a variable substitution.

For example, on a triangular mesh, our position field defines a function to calculate the global position (gx, gy) from a local co-ordinate (x, y). Given a local edge identifier and a new variable r, we can define a variable substitution such that both x and y are defined in terms of r. The value of r describes a position along that edge.
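Two of the requirements above admit compact illustrations. A quadrature rule with points \xi_q and weights w_q on the reference cell \hat{K} computes

\int_{\hat{K}} f(\xi) \, \mathrm{d}\xi \approx \sum_{q} w_q \, f(\xi_q),

and is chosen so that the approximation is exact whenever f is a polynomial of at most the requested order. For the facet description example, taking the standard reference triangle with vertices (0, 0), (1, 0) and (0, 1), the edge joining (1, 0) to (0, 1) corresponds to the substitution

x = 1 - r, \qquad y = r, \qquad r \in [0, 1],

after which any integrand in x and y becomes a function of r alone, ready for integration along that edge.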

C.6 Local-to-global mapping

The local-to-global mapping is an important part of our discretisation. Each cell of our mesh has degrees-of-freedom associated with it. When modelling a continuous field, cells will share certain degrees-of-freedom. The local-to-global map provides a mapping from the degrees-of-freedom as defined locally on a cell to a global degree-of-freedom identifier.
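Conceptually, the map is a function from (cell, local index) pairs to global identifiers, with shared degrees-of-freedom on neighbouring cells mapping to the same identifier. A minimal sketch using hypothetical names (the DofMap class that realises this in Excafé is described in Appendix D):

#include <cstddef>
#include <map>
#include <utility>

// Shared degrees-of-freedom are expressed by mapping (cell, local index)
// pairs on neighbouring cells to the same global identifier.
class LocalToGlobalMap
{
  std::map<std::pair<std::size_t, std::size_t>, std::size_t> mapping;

public:
  void set(std::size_t cell, std::size_t local, std::size_t globalIndex)
  {
    mapping[std::make_pair(cell, local)] = globalIndex;
  }

  // Assumes the (cell, local) pair has already been assigned a number.
  std::size_t lookup(std::size_t cell, std::size_t local) const
  {
    return mapping.find(std::make_pair(cell, local))->second;
  }
};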

C.7 Discretised fields and operators

Our solution steps involve performing operations between discretised fields and operators. Discretised fields are typically represented by vectors, and discretised operators by sparse matrices. In general, most of the operations we want to perform can be represented by linear algebra operations; however, we want to impose additional requirements so that those operations are valid. We do this by associating our matrices and vectors with the local-to-global mappings used for their column and row entries.

We require the following operations involving fields and operators:

Operator assembly The assembly of the discrete operator using a global matrix approach requires that integrals are evaluated over every cell in the domain and added to our matrix. This corresponds to a standard sub-matrix addition, but we require our local-to-global mapping to determine which rows and columns each cell contribution needs to be added to.

Field addition Field addition is equivalent to a vector addition provided that the fields have the same local-to-global mapping.

Operator application Operators transform a field defined in one function space to a field in another space. In the finite element method, this is usually a transformation from the trial space to the test space. The implementation is usually equivalent to a matrix-vector multiply. It is valid so long as the function space of the field being transformed is the function space that the operator transforms from.

Operator solve Given an operator and a field defined in the function space the operator transforms to, we wish to find the source field. The implementation usually corresponds to the solution of a sparse linear system. However, we also require a way to apply Dirichlet boundary conditions during the solution process.

Sub-field extraction and assignment We require the ability to extract and assign sub-fields. For example, after solving the coupled Navier-Stokes equations, we often want

to extract the velocity and pressure components of the resulting composite field. Doing so requires use of the local-to-global mapping to determine which entries to extract and how to arrange them.

C.8 Discretised operation description and implicit-loop syntax

As operations between discretised fields and operators resemble linear algebra, it makes sense for the library client to describe them in a similar fashion. However, one major barrier to this is the requirement for iteration.

We take the example of the convective acceleration term in the Navier-Stokes equations. This term is non-linear, and must be linearised if we are to use a linear solver to solve the resulting system. Picard iteration is one such technique for doing so.

Picard iteration requires that we iterate until our linearised term meets some convergence criterion. Rather than have the library client explicitly declare a loop, it is preferable that the client can specify this iteration in a manner similar to subscript notation, as the fragment below illustrates.
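The incompressible Navier-Stokes solver in Appendix F illustrates the syntax this motivates: the Picard iteration is written with an index variable rather than an explicit loop. In outline (a fragment condensed from Appendix F, with elided expressions marked by comments):

TemporalIndex i;
IndexedField unknownGuess(i);

// Iterates are written with subscripts rather than inside a declared loop.
unknownGuess[-1] = /* initial guess projected into the coupled space */;
unknownGuess[i] = /* solution of the system linearised about unknownGuess[i-1] */;

i.setTermination(residual < 1e-3);  // iterate until the residual converges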

C.9 Form description

The assembly of discrete operators and fields requires that they be expressed using bilinear and linear forms, respectively. These are integrals over our cells, expressed in terms of our basis functions.

These forms typically have an extremely succinct mathematical expression, and we wish to diverge from it as little as possible. The Unified Form Language (UFL) describes a syntax for expressing variational forms that can be embedded in Python. We base our form description on UFL.
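For example, the heat solver in Appendix E builds its left-hand-side operator from the bilinear form a(u, v) = \int_\Omega u\,v + k\,c\,\nabla u \cdot \nabla v \, \mathrm{d}x, and the code stays close to this statement:

const forms::BilinearFormIntegralSum lhsForm =
  B(temperature, temperature)*dx +
  B(k*c*grad(temperature), grad(temperature))*dx;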

C.10 Expression representation

We wish to analyse and optimise the representation of local matrix (and vector) assembly. A complete description of the local assembly matrix combines a large number of expressions and values.

• Descriptions of the basis functions and their derivatives used in the bilinear form, the position field, and any discretised fields referenced from the bilinear form.

• The inverse of the Jacobian of the position field, used to transform gradients from the reference cell to a general cell.

• The determinant of the Jacobian of the position field, used in the calculation of the inverse Jacobian and as a scaling factor in the cell integral.

• Coefficients for the basis functions of any referenced discretised fields.

• Quadrature points, at which the basis functions are evaluated. These may be unnecessary if the local assembly matrix can be integrated symbolically.

• Tensor products, which are used to combine almost all operands.

For our investigation, we assume that all basis functions can be represented by polynomials. Our captured expressions will involve unknown cell vertices and discretised field coefficients, so our expressions will be multivariate. Also, due to the presence of the inverse Jacobian in the local assembly matrix description, our expression representation will need to be able to handle rational functions rather than plain polynomials.
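The rational structure follows from the standard identity

J^{-1} = \frac{1}{\det J} \operatorname{adj}(J);

each entry of J is polynomial in the unknown cell vertex co-ordinates, so the entries of J^{-1}, and hence of transformed gradients \nabla_x \phi = J^{-\mathsf{T}} \nabla_\xi \phi, are quotients of polynomials.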

Our representation should also preserve the tensor product structure of the original computation if possible.

Appendix D

Excafé Architecture

In this appendix, we provide additional information about Excafé's data structures. We cover the mesh-related types in Section D.1, followed by the types Excafé uses to represent discretised values in Section D.2. We then describe the types related to Excafé's expression capture of discretised expressions, variational forms and scalar expressions in Sections D.3, D.4 and D.5, respectively.

D.1 Geometry-related types

We present a UML diagram of Excafé's mesh-related classes in Figure D.1. Our mesh representation is based on the data structures described by Logg [61].

D.1.1 MeshGeometry class

The MeshGeometry class is responsible for storing the physical locations of vertices in the mesh. As our vertices are contiguously numbered from 0, the mapping from vertex IDs to vertices can be implemented using a single array of vertices. The MeshGeometry class is templated by the dimension of the mesh.


[UML class diagram omitted.]

Figure D.1: A UML class diagram of Excafé's geometry-related classes.

D.1.2 MeshConnectivity class

The MeshConnectivity class stores the adjacency relationship between all topological entities of the mesh of two specified dimensions (e.g. edges and cells). Its implementation closely resembles a compressed sparse row (CSR) matrix representation.
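A minimal sketch of such CSR-style storage (illustrative only; the actual MeshConnectivity interface may differ):

#include <cstddef>
#include <vector>

// offsets[i]..offsets[i+1] delimit the entries of 'indices' adjacent to
// entity i, just as row pointers delimit column indices in a CSR matrix.
class ConnectivitySketch
{
  std::vector<std::size_t> offsets;  // size = number of entities + 1
  std::vector<std::size_t> indices;  // concatenated adjacency lists

public:
  std::size_t numAdjacent(std::size_t entity) const
  {
    return offsets[entity + 1] - offsets[entity];
  }

  std::size_t adjacent(std::size_t entity, std::size_t i) const
  {
    return indices[offsets[entity] + i];
  }
};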

D.1.3 MeshTopology class

The MeshTopology class permits retrieval of any topological adjacency relationship of the mesh. For a mesh of dimension n, there are (n + 1)^2 possible adjacency relationships. When an adjacency relationship is requested, the class may either return a reference to a precomputed MeshConnectivity object or build it on demand. In order to compute an arbitrary adjacency relationship, the MeshTopology class must be supplied with an initial cell-to-vertex adjacency relationship, and a MeshCell instance describing the local topology of the chosen cell type.

D.1.4 MeshFunction class

The MeshFunction class associates values with entities of the mesh of a given dimension. It is templated by the type of the value it associates. One MeshFunction is used to store whether or not a facet is on a boundary; another is used to store facet labellings for applying boundary conditions.

D.1.5 MeshCell interface

The MeshCell interface provides a mechanism used by the MeshTopology class to query a topological description of the reference cell. This information enables the MeshTopology class to compute topological adjacency relationships without being hard-coded to a specific cell type (e.g. rectangular, tetrahedral). For further details, consult Logg [61].

D.1.6 GeneralCell interface

This is the complete interface for all types that represent a reference cell. It is templated by the dimension of the cell. It provides methods for accessing local cell vertex co-ordinates, generating quadrature rules and retrieving a symbolic description of the local-to-global cell co-ordinate transformation.

D.1.7 Mesh class

The Mesh class contains both geometric and topological information, held by instances of the MeshGeometry and MeshTopology classes, respectively. The mesh also contains a MeshConnectivity instance which describes which vertices are located within each cell. This relationship, combined with the information provided through the MeshCell interface, provides sufficient information to derive all other topological adjacency relationships.

[UML class diagram omitted.]

Figure D.2: A UML class diagram of Excafé's classes used to store discretised fields, operators and associated information.

The Mesh class also contains two MeshFunction instances that associate values with the facets of the mesh. One is a boolean function that is true only for facets located on the mesh boundary. The other is an integer-valued function that returns the labels assigned to the mesh facets by the mesh generator.

D.2 Discretisation-related types

We present a UML diagram of the classes used by Excafé to handle finite element discretisation in Figure D.2.

D.2.1 FiniteElement interface

The FiniteElement interface is implemented by classes that represent a particular finite element discretisation. It provides methods to retrieve symbolic representations of the basis functions, obtain information about the degrees of freedom located on the reference cell, and determine which degrees of freedom are shared between neighbouring cells. Two implementations of this interface exist in Excafé: LagrangeTriangleLinear and LagrangeTriangleQuadratic, which provide continuous linear and quadratic bases, respectively, over the reference triangle.

D.2.2 DofMap class

The DofMap class represents the global numbering of a discretised field's degrees of freedom. A degree of freedom local to a cell is defined as a tuple of the cell ID, the FiniteElement and a local index. This tuple is mapped to a single number which is used to index the rows and columns of discretised operators and fields. This mapping also expresses which degrees of freedom are shared between neighbouring cells. The DofMap can reference multiple FiniteElement instances, permitting it to represent composite fields (e.g. a combined velocity and pressure field).
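A sketch of the tuple-to-number mapping just described (illustrative names only; the real DofMap also supports the composite fields mentioned above):

#include <cstddef>
#include <map>

class FiniteElement;  // see Section D.2.1

// A local degree of freedom is the tuple (cell ID, element, local index);
// the map assigns each tuple the single global number used for indexing.
struct LocalDof
{
  std::size_t cell;
  const FiniteElement* element;
  std::size_t index;

  bool operator<(const LocalDof& other) const
  {
    if (cell != other.cell) return cell < other.cell;
    if (element != other.element) return element < other.element;
    return index < other.index;
  }
};

typedef std::map<LocalDof, std::size_t> DofNumberingSketch;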

D.2.3 PETScVector and PETScMatrix classes

These classes wrap the C PETSc [47] vector and matrix application programming interface (API) and provide a more object-oriented interface.

D.2.4 DiscreteField and DiscreteOperator classes

The DiscreteField and DiscreteOperator classes are used to hold discretised representations of fields and operators, respectively. The DiscreteField class holds a vector containing the discrete field coefficients and a DofMap which describes the discretisation. Similarly, the DiscreteOperator class holds a sparse matrix and references to two DofMap instances, which describe the discretisation of the operand and result of the operator.

Since the DiscreteField and DiscreteOperator classes hold information about their discretisations, they can detect when they are used inappropriately. The DiscreteOperator class also contains functionality that enables it to perform local and global assembly when provided with a bilinear form and a mesh.

D.3 Discrete Expression Capture

We show a UML class diagram of the types used to perform expression capture of discretised expressions in Figure D.3.

D.3.1 Field, Operator and Scalar handle classes

The Field, Operator and Scalar classes provide handles into the DAG built by the Excafé client, representing the result of a sequence of operations involving discretised fields, discretised operators and scalars.

D.3.2 IndexedField, IndexedOperator and IndexedScalar handle classes

The IndexedField, IndexedOperator and IndexedScalar classes provide handles to indexed fields, operators and scalars, respectively, used by the Excafé client. All three handle classes are implemented by the IndexedHolder templated class, instantiated with different template parameters.

Each handle holds a reference to an IndexableValue instance, again instantiated with template parameters that depend on the type of the value. The IndexableValue is responsible for maintaining references to all expressions assigned to the indexed value, and also for maintaining a reference to the TemporalIndexValue corresponding to the variable it is indexed by.

Within the discrete expression DAG, references to an indexed field, operator or scalar are represented by the DiscreteIndexedField, DiscreteIndexedOperator and DiscreteIndexedScalar classes, respectively.

D.3.3 Function space related classes

Excafé has a very basic syntax for the manipulation of function spaces.

[UML class diagram omitted.]

Figure D.3: A UML class diagram of the Excafé types related to capturing a description of operations performed between discretised fields, discretised operators and scalars. The handle types held by the Excafé client have been shaded.

The handle type FunctionSpace is exposed to the Excafé client. A function space DAG can currently represent the construction of a function space for a single FiniteElement over the entire mesh, and the addition of these spaces. This is sufficient for constructing a function space to represent a composite field.

The FunctionSpaceUndefined type is used to detect incorrect use of the handle. Further development of this syntax will allow subtraction and construction over mesh subsets (such as the boundaries) to provide better support for boundary conditions.

D.3.4 The Discrete Expression DAG Classes

All classes that represent nodes in the discrete expression DAG implement the DiscreteExpr interface. The extended interfaces DiscreteFieldExpr, DiscreteOperatorExpr and ScalarExpr are implemented by nodes representing fields, operators and scalars, respectively.

The DiscreteExpr interface contains methods to enable visitor-based traversal and determination of which index variables are associated with a node. The extended interfaces allow determination of properties such as which function space a node's value is in.

We briefly describe the purpose of each node type. The ScalarUndefined, DiscreteFieldUndefined and DiscreteOperatorUndefined nodes are used to detect incorrect handle usage. The other discrete-field-valued nodes have the following purposes:

DiscreteFieldApplyBC Represents the application of Dirichlet boundary conditions to a field.

DiscreteFieldElementWise Represents addition and subtraction of identically discretised fields.

DiscreteFieldPersistent Represents a reference to a discretised field that persists across different solution steps of the problem being solved (e.g. the velocity in a fluid simulation).

DiscreteFieldProjection Used to project fields onto a superset or subset function space.

DiscreteFieldZero Used to construct a zero-valued field in a specific function space.

OperatorApplication Represents the result of the application of a discrete operator to some field.

LinearSolve Represents the solution field from a linear solve.

DiscreteIndexedField Represents a reference to an indexed field.

The operator-valued nodes have the following purposes:

OperatorAssembly Represents the construction of an operator from a set of integrals of bilinear forms.

OperatorApplyBC Represents the application of Dirichlet boundary conditions to an operator.

OperatorAddition Represents the addition of two identically discretised operators. This is not currently used in our implementation and was provided to enable experimentation with reusing previously assembled terms.

DiscreteIndexedOperator Represents a reference to an indexed operator.

The scalar-valued nodes have the following purposes:

ScalarLiteral A constant scalar value.

DiscreteFieldTwoNorm Represents the l2 norm of a discretised field.

ScalarBinaryOperator Represents a binary operator between two scalars such as addition or comparison.

DiscreteIndexedScalar Represents a reference to an indexed scalar.

[UML class diagram omitted.]

Figure D.4: A UML class diagram of the Excafé types related to capturing a description of variational forms used to perform assembly. The handle types held by the Excafé client have been shaded.

D.4 Variational Form Capture

We show a UML class diagram of the types used to perform expression capture of variational forms in Figure D.4.

D.4.1 Handle Types

We provide the handle types LinearForm and BilinearForm to represent expressions that compute scalar-valued expressions (given a particular trial and/or test function). The BilinearFormIntegral type represents a bilinear form integrated over a specific mesh region. The BilinearFormIntegralSum is used to represent a sum of bilinear forms, which may be integrated over the entire mesh, internal mesh facets or boundary facets.

D.4.2 Form Expression DAG Types

All node types that form the variational form expression DAG represent tensor-valued expressions and implement the FieldExpr interface. We briefly describe the purpose of each node type:

FieldDiscreteReference Holds a reference to a field defined somewhere in the discrete expression DAG.

FieldScalar Holds a reference to a scalar value defined somewhere in the discrete expression DAG. Can be considered a scalar field that takes a constant value over the entire mesh.

FacetNormal Represents the outward-facing normal of a mesh facet. This node can only be used in forms integrated over interior or exterior mesh facets.

FieldOuterProduct The outer product of two tensor fields.

FieldInnerProduct The inner product of two tensor fields.

FieldColonProduct The colon product of two tensor fields.

FieldAddition The addition of two identical-rank tensor fields.

FieldGradient The gradient of a tensor field.

FieldDivergence The divergence of a tensor field.

FieldBasis A place-holder that refers to a basis function defined over the local cell. The same form will be evaluated with different choices of basis function during local assembly.

D.5 Scalar Expression Representation

Excafé currently has three mechanisms for representing scalar expressions. Each expression class is templated by the type of unknown that will be used in the expressions. Their relationships are shown in Figure D.5.

[Class diagram omitted.]

Figure D.5: A class diagram of Excafé's different scalar expression representations.

We briefly describe each one:

Polynomial This type represents a polynomial expression stored in expanded polynomial form (i.e. a sum of monomials); a sketch of one way to store this form follows this list. This format is helpful for representing expressions before performing our common subexpression elimination pass.

PolynomialFraction This type represents a rational function as a quotient between two Polynomial instances. It is currently not ideal for our purposes since addition and multiplication between expressions in this form may cause them to grow in complexity and obscure simplifications.

GinacExpression This type is a wrapper for expressions handled using the GiNaC [45] symbolic manipulation library. GiNaC maintains a tree of expressions that can be evaluated (or partially evaluated) on demand, as well as providing algorithms such as greatest common divisor (GCD) that can be used to simplify rational expressions.
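As promised above, one common way to store a polynomial in expanded form, given purely as an illustration rather than as Excafé's implementation, is a map from monomials (variable-to-exponent maps) to coefficients:

#include <map>
#include <string>

// A monomial maps each variable name to its exponent; a polynomial in
// expanded form maps each monomial to its scalar coefficient.
typedef std::map<std::string, unsigned> MonomialSketch;
typedef std::map<MonomialSketch, double> ExpandedPolynomialSketch;

// Example: 3*x^2*y + 5 is stored as
//   { { {"x",2}, {"y",1} } -> 3.0,  { } -> 5.0 }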

When provided with a mapping of unknowns to scalar values, all expression classes can evaluate their values. All expression classes also implement the NumericExpression interface. This interface provides a representation-independent visitation mechanism that enables conversion between expression types.

We also use our expression classes to represent the elements of our local assembly matrix during construction. To do this, we instantiate our chosen expression class with a variable type, ScalarPlaceholder, that represents the possible unknowns in a local assembly matrix. We show the class diagram of this type in Figure D.6.

[Class diagram omitted.]

Figure D.6: A class diagram of the variable type Excafé uses to represent unknowns in its symbolic representation of a local assembly matrix.

The value describing the unknown is stored inside the ScalarPlaceholder class so that ScalarPlaceholder can be treated as a standard value type. We describe each of the types that can be held inside a ScalarPlaceholder:

PositionComponent Represents the components of the position variable on the local cell. It occurs during construction of our bilinear forms, but is removed when we symbolically integrate (as these are the variables we integrate over).

CellVertexComponent This represents a component of the global position of one of the vertices of the local cell. It may be removed from Excafé when the vertex locations are instead represented by a vector field.

ScalarAccess This represents a scalar value referenced from the discrete expression DAG.

BasisCoefficient This represents a basis function coefficient that is defined on the local cell. It is this type that allows us to incorporate fields from previous time-steps by referencing fields from the discrete expression DAG.

ScalarConstant A scalar constant. Variables of this type are opaque to the expression classes and therefore do not get multiplied out. This may be useful in order to maintain more information about the structure of expressions.

Appendix E

Excafé Heat Solver Source

// The #include directives for this listing were lost in extraction;
// the header names are unrecoverable.

using namespace cfd;


template<std::size_t D>  // template parameter list lost in extraction; reconstructed from the use of D below
class HeatSolver
{
private:
  // Some template argument lists in this listing were lost in extraction;
  // only those that could be inferred (e.g. <dimension>) have been restored.
  static const std::size_t dimension = D;
  Mesh<dimension> mesh;
  Scenario<dimension> scenario;

  Element temperature;
  FunctionSpace temperatureSpace;
  NamedField temperatureField;

  BoundaryCondition boundaryConditions;

  SolveOperation solve;

  BoundaryCondition buildBoundaryConditions()
  {
    Tensor zero(0);
    zero = 0;

    Tensor source(0);
    source = 1.0;

    BoundaryConditionList boundaryConditionList(0);
    boundaryConditionList.add(BoundaryConditionTrivial(1, source));
    boundaryConditionList.add(BoundaryConditionTrivial(2, zero));
    boundaryConditionList.add(BoundaryConditionTrivial(3, zero));
    boundaryConditionList.add(BoundaryConditionTrivial(4, zero));
    boundaryConditionList.add(BoundaryConditionTrivial(5, source));

    return scenario.addBoundaryCondition(temperatureSpace, boundaryConditionList);
  }

public:
  HeatSolver(Mesh<dimension>& _mesh) : mesh(_mesh), scenario(mesh)
  {
    temperature = scenario.addElement(new LagrangeTriangleLinear<0>());
    temperatureSpace = scenario.defineFunctionSpace(temperature, mesh);
    temperatureField = scenario.defineNamedField("temperature", temperatureSpace);

    BoundaryConditionList boundaryConditionList(0);  // unused in the original listing
    boundaryConditions = buildBoundaryConditions();

    solve = constructSolve();
  }

  SolveOperation constructSolve()
  {
    using namespace forms;

    SolveOperation s = scenario.newSolveOperation();

    Scalar c = 1e-4;
    Scalar k = 10.0;

    Operator massMatrix(temperatureSpace, temperatureSpace);
    massMatrix = B(temperature, temperature)*dx;

    const forms::BilinearFormIntegralSum lhsForm =
      B(temperature, temperature)*dx +
      B(k*c*grad(temperature), grad(temperature))*dx;

    Field rhs = massMatrix * temperatureField;

    LinearSystem system = assembleGalerkinSystem(temperatureSpace, lhsForm,
      rhs, boundaryConditions, temperatureField);

    s.setNewValue(temperatureField, system.getSolution());

    s.finish();
    return s;
  }

  void step()
  {
    solve.execute();
  }

  void outputFieldsToFile(const std::string& filename)
  {
    scenario.outputFieldsToFile(filename);
  }
};

int main(int argc, char** argv)
{
  try
  {
    PETScManager::instance().init(argc, argv);

    static const std::size_t dimension = TriangularMeshBuilder::cell_dimension;
    static const double maxCellArea = 1.0/5000;
    static const double polySize = 0.1;
    static const std::size_t polyEdges = 4;
    static const double polyLabel = 5;
    static const double polyRotation = 0.0;

    TriangularMeshBuilder meshBuilder(1.0, 1.0, maxCellArea);
    const Polygon poly(vertex<2>(0.5, 0.5), polyEdges, polySize, polyRotation);
    meshBuilder.addPolygon(poly, polyLabel);
    Mesh<dimension> mesh(meshBuilder.buildMesh());

    HeatSolver<dimension> solver(mesh);

    for (int i = 0; i < 200; ++i)
    {
      std::cout << "Starting timestep " << i << std::endl;
      // The remainder of the loop body was lost in extraction;
      // it presumably calls solver.step().
      solver.step();
    }
  }
  catch (const CFDException& e)
  {
    std::cerr << "A simple_cfd specific exception was generated:" << std::endl;
    std::cerr << e.what() << std::endl;
  }
}

Appendix F

Excafé Incompressible Navier-Stokes Solver Source

// The #include directives for this listing were lost in extraction;
// the header names are unrecoverable.


using namespace cfd;

template<std::size_t D>  // template parameter list lost in extraction; reconstructed from the use of D below
class NavierStokesSolver
{
private:
  // Some template argument lists in this listing were lost in extraction;
  // only those that could be inferred (e.g. <dimension>) have been restored.
  static const std::size_t dimension = D;
  Mesh<dimension> mesh;
  Scenario<dimension> scenario;

  Element velocity;
  Element pressure;

  FunctionSpace velocitySpace;
  FunctionSpace pressureSpace;
  FunctionSpace coupledSpace;

  NamedField velocityField;
  NamedField pressureField;

  BoundaryCondition velocityConditions;

  SolveOperation coupledSolve;

  BoundaryCondition buildBoundaryConditions()
  {
    const Tensor zero(1);

    Tensor inflow(1);
    inflow(0) = 5.0;

    BoundaryConditionList velocityConditionList(1);
    velocityConditionList.add(BoundaryConditionTrivial(1, zero));
    velocityConditionList.add(BoundaryConditionTrivial(3, zero));
    velocityConditionList.add(BoundaryConditionTrivial(4, inflow));
    velocityConditionList.add(BoundaryConditionTrivial(5, zero));

    return scenario.addBoundaryCondition(velocitySpace, velocityConditionList);
  }

public:
  NavierStokesSolver(Mesh<dimension>& _mesh) : mesh(_mesh), scenario(mesh)
  {
    velocity = scenario.addElement(new LagrangeTriangleQuadratic<1>());
    pressure = scenario.addElement(new LagrangeTriangleLinear<0>());

    velocitySpace = scenario.defineFunctionSpace(velocity, mesh);
    pressureSpace = scenario.defineFunctionSpace(pressure, mesh);
    coupledSpace = velocitySpace + pressureSpace;

    velocityField = scenario.defineNamedField("velocity", velocitySpace);
    pressureField = scenario.defineNamedField("pressure", pressureSpace);

    BoundaryConditionList velocityConditionList(1);  // unused in the original listing
    velocityConditions = buildBoundaryConditions();

    coupledSolve = constructCoupledSolver();
  }

  SolveOperation constructCoupledSolver()
  {
    using namespace forms;

    SolveOperation s = scenario.newSolveOperation();

    Scalar theta = 0.5;
    Scalar k = 0.01;
    Scalar kinematic_viscosity = 1.0/250;

    Operator nonLinearRhs(velocitySpace, velocitySpace);
    nonLinearRhs =
      B(velocity, velocity)*dx +
      B(-(1.0-theta)*k*kinematic_viscosity*grad(velocity), grad(velocity))*dx +
      B(-(1.0-theta)*k*inner(velocityField, grad(velocity)), velocity)*dx;

    Field velocityRhs = nonLinearRhs * velocityField;
    Field load(project(velocityRhs, coupledSpace));

    TemporalIndex i;
    IndexedField unknownGuess(i);

    unknownGuess[-1] = project(velocityField, coupledSpace) +
                       project(pressureField, coupledSpace);

    const forms::BilinearFormIntegralSum lhsForm =
      B(velocity, velocity)*dx +
      B(theta*k*kinematic_viscosity*grad(velocity), grad(velocity))*dx +
      B(-1.0*k*pressure, div(velocity))*dx +
      B(div(velocity), pressure)*dx +
      B(theta*k*inner(project(unknownGuess[i-1], velocitySpace), grad(velocity)), velocity)*dx;

    LinearSystem system = assembleGalerkinSystem(coupledSpace, lhsForm, load,
      velocityConditions, unknownGuess[i-1]);
    Operator linearisedSystem = system.getConstrainedSystem();
    unknownGuess[i] = system.getSolution();

    Scalar residual =
      ((linearisedSystem * unknownGuess[i-1]) - system.getConstrainedLoad()).two_norm();

    i.setTermination(residual < 1e-3);

    s.setNewValue(velocityField, project(unknownGuess[final-1], velocitySpace));
    s.setNewValue(pressureField, project(unknownGuess[final-1], pressureSpace));

    s.finish();
    return s;
  }

  void step()
  {
    coupledSolve.execute();
  }

  void outputFieldsToFile(const std::string& filename)
  {
    scenario.outputFieldsToFile(filename);
  }
};

int main(int argc, char** argv)
{
  try
  {
    PETScManager::instance().init(argc, argv);

    static const std::size_t dimension = TriangularMeshBuilder::cell_dimension;

    TriangularMeshBuilder meshBuilder(3.0, 1.0, 1.0/900.0);
    meshBuilder.addPolygon(Polygon(vertex<2>(0.5, 0.5), 16, 0.15, 0), 5);
    Mesh<dimension> mesh(meshBuilder.buildMesh());

    NavierStokesSolver<dimension> solver(mesh);

    for (int i = 0; i < 6000; ++i)
    {
      std::cout << "Starting timestep " << i << std::endl;
      // The remainder of the loop body was lost in extraction;
      // it presumably calls solver.step().
      solver.step();
    }
  }
  catch (const CFDException& e)
  {
    std::cerr << "A simple_cfd specific exception was generated:" << std::endl;
    std::cerr << e.what() << std::endl;
  }
}