The Pennsylvania State University The Graduate School
THE AUXILIARY SPACE SOLVERS AND THEIR APPLICATIONS
A Dissertation in Mathematics by Lu Wang
© 2014 Lu Wang
Submitted in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
December 2014

The dissertation of Lu Wang was reviewed and approved∗ by the following:
Jinchao Xu, Professor, Department of Mathematics, Dissertation Advisor, Chair of Committee
James Brannick, Associate Professor, Department of Mathematics
Ludmil Zikatanov, Professor, Department of Mathematics
Chao-Yang Wang, Professor of Materials Science and Engineering
Yuxi Zheng, Professor, Department of Mathematics, Department Head
∗Signatures are on file in the Graduate School.

Abstract
Developing efficient iterative methods and parallel algorithms for solving the sparse linear systems discretized from partial differential equations (PDEs) is still a challenging task in scientific computing and practical applications. Although many mathematically optimal solvers, such as multigrid methods, have been analyzed and developed, the unfortunate reality is that these solvers have not been used much in practical applications. In order to narrow the gap between theory and practice, we develop, formulate, and analyze mathematically optimal solvers that are robust and easy to use in practice, based on the methodology of Fast Auxiliary Space Preconditioning (FASP).

We develop a multigrid method on unstructured shape-regular grids by constructing an auxiliary coarse grid hierarchy on which the multigrid method can be applied using the FASP technique. Such a construction is realized by a cluster tree, which can be obtained in O(N log N) operations for a grid of N nodes. This tree structure is used to define the grid hierarchy from coarse to fine. For the constructed grid hierarchy, we prove that the condition number of the preconditioned system for an elliptic PDE is O(log N). Then, we present a new colored block Gauss-Seidel method for general unstructured grids. By constructing the auxiliary grid, we can aggregate the degrees of freedom lying in the same cells of the auxiliary grid into one block. By developing a parallel coloring algorithm for the tree structure, a colored block Gauss-Seidel method can be applied with the aggregates serving as non-overlapping blocks. On the other hand, we also develop a new parallel unsmoothed aggregation algebraic multigrid method for PDEs defined on an unstructured mesh, based on the auxiliary grid. It provides (nearly) optimal load balance and predictable communication patterns, factors that make our new algorithm suitable for parallel computing. Furthermore, we extend the FASP techniques to saddle point and indefinite problems. Two auxiliary space preconditioners are presented. An abstract framework of the symmetric positive definite auxiliary preconditioner is presented so that an optimal multigrid method can be applied to the indefinite problem on the unstructured grid. We also numerically verify the optimality of the two preconditioners for the Stokes equations.
Table of Contents
List of Figures vii
List of Tables ix
Acknowledgments x
Chapter 1 Introduction 1
Chapter 2 Iterative Method 8 2.1 Stationary Iterative Methods ...... 10 2.1.1 Jacobi Method ...... 10 2.1.2 Gauss-Seidel Method ...... 12 2.1.3 Successive Over-Relaxation Method ...... 14 2.1.4 Block Iterative Method ...... 16 2.2 Krylov Space Method and Preconditioners ...... 19 2.2.1 Conjugate Gradient Method ...... 20 2.3 Preconditioned Iterations ...... 24 2.3.1 Preconditioned Conjugate Gradient Method ...... 25 2.3.2 Preconditioning Techniques ...... 27 2.4 Numerical Example ...... 28 2.4.1 Comparison of the Iterative Method ...... 29 2.4.2 Comparison of the Preconditioners ...... 31
Chapter 3 Multigrid Method and Fast Auxiliary Space Preconditioner 33 3.1 Method of Subspace Correction ...... 33
3.1.1 Parallel Subspace Correction and Successive Subspace Correction . . 35 3.1.2 Multigrid viewed as Multilevel Subspace Corrections ...... 40 3.1.3 Convergence Analysis ...... 45 3.2 The Auxiliary Space Method ...... 55 3.3 Algebraic Multigrid Method ...... 59 3.3.1 Classical AMG ...... 60 3.3.2 UA-AMG ...... 63
Chapter 4 FASP for Poisson-like Problem on unstructured grid 66 4.1 Preliminaries and Assumptions ...... 67 4.2 Construction of the Auxiliary Grid-hierarchy ...... 68 4.2.1 Clustering and Auxiliary Box-trees ...... 68 4.2.2 Closure of the Auxiliary Box-tree ...... 71 4.2.3 Construction of a Conforming Auxiliary Grid Hierarchy ...... 74 4.2.4 Adaptation of the Auxiliary Grids to the Boundary ...... 76 4.2.5 Near Boundary Correction ...... 79 4.3 Estimate of the Condition Number ...... 80 4.3.1 Convergence of the MG on the Auxiliary Grids ...... 82 4.3.1.1 Stable decomposition: Proof of (A1) ...... 83 4.3.1.2 Strengthened Cauchy-Schwarz inequality: Proof of (A2) . . 84 4.3.1.3 Condition number estimation ...... 86
Chapter 5 Colored Gauss-Seidel Method by auxiliary grid 91 5.1 Graph Coloring ...... 92 5.2 Quadtree Coloring ...... 93 5.3 Tree Representations ...... 97 5.4 Parallel Implementation of the Coloring Algorithm ...... 99 5.5 Block Colored Gauss-Seidel Methods ...... 102
Chapter 6 Parallel FASP-AMG Solvers 103 6.1 Parallel Auxiliary Grid Aggregation ...... 106 6.2 Parallel Prolongation and Restriction and Coarse-level Matrices ...... 108 6.3 Parallel Smoothers Based on the Auxiliary Grid ...... 111 6.4 GPU Implementation ...... 112 6.4.1 Sparse Matrix-Vector Multiplication on GPUs ...... 113 6.4.2 Parallel Auxiliary Grid Aggregation ...... 114
Chapter 7 Numerical Applications for Poisson-like Problem on Unstructured Grid 117 7.1 Auxiliary Space Multigrid Method ...... 117 7.1.1 Geometric Multigrid ...... 117 7.1.2 ASMG for the Dirichlet problem ...... 118 7.1.3 ASMG for the Neumann problem ...... 119 7.2 FASP-AMG ...... 121 7.2.1 Test Platform ...... 121 7.2.2 Performance ...... 122
Chapter 8 FASP for Indefinite Problem 128 8.1 Krylov Space Method for Indefinite Problems ...... 129 8.1.1 The Minimal Residual Method ...... 129 8.1.2 Generalized Minimal Residual Method ...... 134 8.2 Preconditioners for Indefinite Problems ...... 142 8.3 FASP Preconditioner ...... 144
Chapter 9 Fast Preconditioners for Stokes Equation on Unstructured Grid 147 9.1 Block Preconditioners ...... 148 9.2 Analysis of the FASP SPD Preconditioner ...... 154 9.3 Some Examples ...... 157 9.3.1 Use a Lower Order Velocity Space Pair as an Auxiliary Space . . . . 158 9.3.2 Use a Lower Order Pressure Space as an Auxiliary Space ...... 159
Chapter 10 Conclusions 162 10.1 Conclusions ...... 162 10.2 Future works ...... 163
Bibliography 164
List of Figures
2.1 Matrix splitting of A ...... 11 2.2 Comparison of the number of Iterations ...... 30 2.3 Comparison of the CPU time ...... 30 2.4 Comparison of the number of iterations for preconditioners ...... 31
4.1 Left: The 2D triangulation T of Ω with elements τi. Right: The barycenters ξi (dots) and the minimal distance h between barycenters...... 68 4.2 Examples of the region quadtree on different domains...... 70 4.3 Tree of regular boxes with root B1 in 2D. The black dots mark the corre- sponding barycenters ξi of the triangles τi. Boxes with less than three points ξi are leaves...... 70 4.4 The subdivision of the marked (red) box on level ` would create two boxes (blue) with more than one hanging node at one edge...... 72 4.5 The subdivision of the red box makes it necessary to subdivide nodes on all levels...... 73 4.6 Hanging nodes can be treated by a local subdivision within the box Bν. The top row shows a box with 1, 2, 2, 3, 4 hanging nodes, respectively, and the bottom row shows the corresponding triangulation of the box...... 74 4.7 The final hierarchy of nested grids. Red edges were introduced in the last (local) closure step...... 75 4.8 Case 1: σi is subdivided in the fine level ...... 75 4.9 Case 2: σi is not subdivided in the fine level ...... 75 4.10 A triangulation of the Baltic sea with local refinement and small inclusions. . 76 4.11 Hanging nodes can be treated by a local subdivision within the cube Bν. Firstly erasing the hanging nodes on the face and then connecting the center of the cube...... 77 4.12 The boundary Γ of Ω is drawn as a red line, boxes non-intersecting Ω are light green, boxes intersecting Γ are dark green, and all other boxes (inside of Ω) are blue...... 77
4.13 The boundary Γ of Ω is drawn as a red line, boxes non-intersecting Ω are light green, and all other boxes (intersecting Ω) are blue...... 78 4.14 The finest auxiliary grid σ(10) contains elements of different size. Left: Dirichlet b.c. (852 degrees of freedom), right: Neumann b.c. (2100 degrees of freedom) 79
5.1 A balanced quadtree requires at least five colors ...... 94 5.2 Forced coloring rectangles ...... 95 5.3 Adaptive quadtree and its binary graph ...... 96 5.4 Six-Coloring for adaptive quadtree ...... 98 5.5 The Morton code of an adaptive quadtree ...... 99 5.6 Adaptive quadtree and its binary graph ...... 100 5.7 Coloring of 3D adaptive octree ...... 101
6.1 Aggregation on level L...... 107 6.2 Aggregation on the coarse levels...... 108 6.3 Coloring on the finest level L ...... 112 6.4 Sparse matrix representation using the ELL format and the memory access pattern of SpMv...... 114
7.1 Convergence rates for Auxiliary Space MultiGrid with n4 = 737, 933, n5 = 2, 970, 149, n6 = 11, 917, 397, and n7 = 47, 743, 157 degrees of freedom. . . . . 119 7.2 Convergence rates for Auxiliary Space MultiGrid with n4 = 756, 317, n5 = 3, 006, 917, n6 = 11, 990, 933, and n7 = 47, 890, 229 degrees of freedom. . . . . 120 7.3 Quasi-uniform grid for a 2D unit square ...... 123 7.4 Shape-regular grid for a 2D unit square ...... 124 7.5 Shape-regular grid for a 2D circle domain ...... 125 7.6 Shape-regular grid for the 3D heat transfer problem on a cubic domain (left) and a cavity domain (right) ...... 127
9.1 $P_2 - P_0$ elements and $P_1^+ - P_0^{-1}$ elements ...... 158
9.2 $P_2^+ - P_1^{-1}$ elements and $P_2^0 - P_0^{-1}$ elements ...... 160
9.3 $P_2^+ - P_1^{-1}$ elements and $P_2^0 - P_0^{-1}$ elements ...... 160
List of Tables
2.1 Comparison of the different iterative method for Poisson equation ...... 30 2.2 Comparison of the different preconditioners for Poisson equation ...... 31
7.1 The time in seconds for the setup of the matrices and for ten steps of V-cycle (geometric) multigrid, Algorithm 16...... 118 7.2 The storage complexity in bytes per degree of freedom (auxiliary grids, aux- iliary matrices and H-solvers) and the solve time in seconds for an ASMG preconditioned cg-iteration...... 119 7.3 The storage complexity in bytes per degree of freedom (auxiliary grids, aux- iliary matrices and H-solvers) and the solve time in seconds for an ASMG preconditioned cg-iteration...... 120 7.4 Test Platform ...... 121 7.5 Wall time and number of iterations for the Poisson problem on a 2D uniform grid ...... 122 7.6 Wall time and number of iterations for the Poisson problem on a 2D quasi- uniform grid ...... 123 7.7 Wall time and number of iterations for the Poisson problem on a 2D shape- regular grid ...... 124 7.8 Wall time and number of iterations for the Poisson problem on a disk domain 125 7.9 Time/Number of iterations for the heat-transfer problem on a 3D unit cube . 127 7.10 Wall time and the number of iterations for the heat-transfer problem on a cavity domain ...... 127
9.1 Number of iterations for using $P_1^+ - P_0^{-1}$ elements as a preconditioner for the Stokes equation with $P_2^0 - P_0^{-1}$ elements ...... 159
9.2 Number of iterations for using $P_1^+ - P_0^{-1}$ elements as a preconditioner for the Stokes equation with $P_2 - P_0$ elements ...... 160
9.3 Number of iterations for using $P_2^0 - P_0^{-1}$ elements as a preconditioner for the Stokes equation with $P_2^+ - P_1^{-1}$ elements ...... 161
Acknowledgments
First and foremost, I want to express my sincere gratitude to my advisor, Prof. Jinchao Xu, for his enthusiasm, patience, encouragement, and inspiration. I have learned a great deal from him, academically and beyond. His insightful advice, patient guidance, constant support, and encouragement have been essential to the completion of my education. His fine intuition and deep understanding of numerical analysis, the finite element method, and multigrid methods have been very important to my Ph.D. studies and research. His support for my work, life, and career development is invaluable, and my words simply cannot express my gratitude strongly enough. His work ethic has been an enormous influence on me. For me, he has redefined the word "advisor". It has been my extreme privilege to know him and work with him.

I also want to thank Prof. James Brannick and Prof. Ludmil Zikatanov for serving as my committee members and for providing a mathematical perspective on my research and thesis. Their comments and advice have been most essential for me to understand the new methods described in this work. I would like to thank my committee member, Prof. Chao-Yang Wang, who valued my research and generously took the time to evaluate my thesis.

I want to thank all the excellent post-docs and colleagues in our group: Dr. Xiaozhe Hu, Dr. Maximilian Metti, Dr. Fei Wang, Kai Yang, Changhe Qiao, and Yicong Ma. Without their advice and helpful discussions, it would have been impossible for me to accomplish my Ph.D. thesis.

Last but not least, I want to thank my family, especially my wife Ying Chen, for her unconditional trust, constant encouragement, and sweet love over the years.
Dedication
TO GOD BE THE GLORY.
Chapter 1
Introduction
Numerical simulation plays an important role in scientific research and engineering design, since experimental investigation is both expensive and time consuming. Numerical simulation helps to understand important features and to reduce development time. Progress in computer science and engineering helps to meet this need for computational power. The rapid development of the computer industry provides more and more powerful computing capability for numerical simulation, which also makes numerical simulation applicable to wider fields and more complex physical phenomena. As the complexity and difficulty of the numerical simulations increase, the linear solvers become the most stringent bottleneck as measured by the proportion of execution time. The need for fast and stable linear solvers, especially on massively parallel computers, is becoming increasingly urgent.

Assume V is a Hilbert space and V∗ is the dual space of V. Consider the following linear system
$$Au = f, \qquad (1.1)$$
where A : V → V∗ is a nonsingular linear operator and f ∈ V∗ is a given element of the dual space V∗. Since we consider V to be a finite dimensional space, we take V∗ = V.

There are two different ways to solve the system (1.1): a direct solver or an iterative solver. Direct solvers theoretically give the exact solution in finitely many steps, for example Gaussian elimination [1], multifrontal solvers [2], and the like. The review papers [3, 4, 5] serve as excellent references for various direct solvers. These methods would give the precise solution if they were performed in infinite precision arithmetic. However, in practice this is rarely true because of rounding errors: the error made in one step propagates further into all following steps. This makes it difficult to solve the equations by direct solvers for complex problems in applications. On the other hand, as scientific computing develops, many of the problems are extremely large and complex. For example, "Grand Challenge" problems require PetaFLOPs and PetaBytes of computing resources. The large computational complexity of direct methods makes them infeasible for these problems, even with the best available computing power.

Consider the Gaussian elimination method as an example, since it is still the most commonly used method in practice. Gaussian elimination is a row reduction algorithm for solving linear equations. To perform row reduction on a matrix, one uses a sequence of elementary row operations to modify the matrix until the lower left-hand corner of the matrix is filled with as many zeros as possible. There are three types of elementary row operations: 1) swapping two rows, 2) multiplying a row by a non-zero number, 3) adding a multiple of one row to another row. By using these operations, a matrix can always be transformed into an upper triangular matrix. Once all of the leading coefficients (the left-most non-zero entry in each row) are 1, and every column containing a leading coefficient has zeros elsewhere, the matrix is said to be in reduced row echelon form. This final form is unique; in other words, it is independent of the sequence of row operations used. The advantage of Gaussian elimination is that it is the most user-friendly solver.
For any matrix and right-hand side, Gaussian elimination is guaranteed to solve the equations. However, its computational efficiency is very low. The number of arithmetic operations is one way of measuring an algorithm's computational efficiency. Gaussian elimination requires $N(N-1)/2$ divisions, $(2N^3 + 3N^2 - 5N)/6$ multiplications, and $(2N^3 + 3N^2 - 5N)/6$ subtractions, for a total of approximately $2N^3/3$ operations. So the arithmetic complexity is $O(N^3)$. When the problem scale is large, it costs a great deal of time and memory to solve the problem; sometimes it is even impossible. For example, for an input size of $N = 10^9$, even with the top supercomputer Tianhe-2, which ranked as the world's fastest with a record of 33.86 petaflops in 2013, it would take about 560 years to solve one problem with an $O(N^3)$ algorithm.

In contrast to direct methods, iterative methods are not expected to terminate in a finite number of steps. Starting from an initial guess, iterative methods form successive approximations that converge to the exact solution only in the limit. These methods are relatively easy to implement and use less memory. Therefore, iterative methods are generally needed for large scale problems. Two main classes of iterative methods are the stationary iterative methods and the more general Krylov subspace methods.

Stationary iterative methods solve a linear system with an operator approximating the original one. Examples of stationary iterative methods are the Jacobi method, the Gauss-Seidel method, and the successive over-relaxation (SOR) method [1]. While these methods are simple to derive, implement, and analyze, their convergence is only guaranteed for a limited class of matrices. On the other hand, Krylov subspace methods work by forming a basis of the sequence of successive matrix powers times the initial residual (the Krylov sequence). The approximations to the solution are then formed by minimizing the residual over the subspace formed. The prototypical methods in this class are the conjugate gradient method (CG) [6], the minimal residual method (MINRES) [7], and the generalized minimal residual method (GMRES) [8]. Since these methods form a basis, it is evident that they converge in N iterations, where N is the system size. However, in the presence of rounding errors this statement does not hold. Moreover, in practice N can be very large, and the iterative process often reaches sufficient accuracy far earlier. Krylov space methods can also be accelerated by preconditioners. For example, if $N = 10^9$, the CG method with a multigrid preconditioner can solve the problem in about 500 seconds.

It has been observed that classical iterative methods reduce the high-frequency components of the error rapidly but can hardly reduce the low-frequency components [9, 1, 10]. The multigrid principle was motivated by this observation. Another crucial observation is that the low-frequency errors on a fine mesh become high-frequency errors on a coarser mesh. For the coarse grid problem, we can apply the smoothing and the separation of scales again. Recursive application of smoothing on each level results in the classical formulation of multigrid. A natural application of this idea is the geometric multigrid (GMG) method [11, 9].
GMG provides substantial acceleration when compared to basic iterative solvers like Jacobi or Gauss-Seidel, and even better performance has been observed when these methods are used as preconditioners for Krylov methods. With the GMG method, a Poisson equation can be solved in O(N) operations. Roughly speaking, there are two different types of theories that have been developed for the convergence of GMG. For the first kind of theory, which makes critical use of elliptic regularity of the underlying partial differential equations as well as approximation and inverse properties of the discrete hierarchy of grids, we refer to Bank and Dupont [12], Braess and Hackbusch [13], Hackbusch [11], and Bramble and Pasciak [14]. The second kind of theory makes minor or no elliptic regularity assumptions; we refer to Yserentant [15], Bramble, Pasciak and Xu [16], Bramble, Pasciak, Wang and Xu [17], Xu [18, 19] and Yserentant [20], and Chen, Nochetto and Xu [21, 22].

The GMG method, however, relies on a given hierarchy of geometric grids. Such a hierarchy of grids is sometimes naturally available, for example due to adaptive grid refinement, or can be obtained in some special cases by a coarsening algorithm [23]. But in most cases in practice, only a single (fine) unstructured grid is given. This makes it difficult to generate a sequence of nested meshes. To circumvent this difficulty, two different methods have been developed: algebraic multigrid (AMG) methods and non-nested geometric multigrid.

One practical way to generate the grid hierarchy for general unstructured grids is algebraic multigrid (AMG) methods. Most AMG methods, although their derivations are purely algebraic in nature, can be interpreted as nested MG when they are applied to finite element systems based on a geometric grid. AMG methods are usually very robust and converge quickly for Poisson-like problems [24, 25]. There are many different types of AMG methods: the classical AMG [26, 27, 28], smoothed aggregation AMG [29, 30, 31], AMGe [32, 33], unsmoothed aggregation AMG [34, 35], and many others. Highly efficient sequential and parallel implementations are also available for both CPU and GPU systems [36, 37, 38]. AMG methods have been demonstrated to be among the most efficient solvers for many practical problems [39]. Despite the great success in practical applications, AMG still lacks solid theoretical justification, except for two-level theories [40, 30, 41, 42, 43, 44, 45]. For a truly multilevel theory, using the theoretical framework developed in [17, 19], Vaněk, Mandel, and Brezina [46] provide a theoretical bound for the smoothed aggregation AMG under some assumptions about the aggregations. Such an assumption has been recently investigated in [45] for aggregations that are controlled by auxiliary grids similar to those used in [47].

Another way to generate a multigrid method for unstructured grids is non-nested geometric multigrid. One example of such a theory is by Bramble, Pasciak and Xu [48]. In this work, optimal convergence theories are established under the assumption that a non-nested sequence of quasi-uniform meshes can be obtained. Another example is the work by Bank and Xu [49], which gives a nearly optimal convergence estimate for a hierarchical basis type method for general shape-regular grids in two dimensions. This theory is based on non-nested geometric grids that have nested sets of nodal points from different levels.
One feature of the aforementioned MG algorithms and their theories is that the underlying multilevel finite element subspaces are not nested, which is not always desirable from both theoretical and practical points of view. To avoid the non-nestedness, many different MG techniques and theories have been explored in the literature. One such theory was developed by Xu [47] for a semi-nested MG method with an unstructured but quasi-uniform grid based on an auxiliary grid approach. Instead of generating a sequence of non-nested grids from the initial grid, this method is based on a single auxiliary structured grid whose size is comparable to the original quasi-uniform grid. While the auxiliary grid is not nested with the original grid, it contains a natural nested hierarchy of coarse grids. Under the assumption that the original grid is quasi-uniform, an optimal convergence theory was developed in [47] for second order elliptic boundary value problems with Dirichlet boundary conditions.

The first goal of my thesis is to extend the algorithm and theory in Xu [47] to shape-regular grids that are not necessarily quasi-uniform. The lack of quasi-uniformity of the original grid makes the extension nontrivial for both the algorithm and the theory. First, it is difficult to construct auxiliary hierarchical grids without increasing the grid complexity, especially for grids on complicated domains. The way we construct the hierarchical structure is to generate a cluster tree based on the geometric information of the original grid [50, 51, 52, 53]. Secondly, it is also not straightforward to establish optimal convergence for the geometric multigrid applied to a hierarchy of auxiliary grids that can be highly locally refined.

As the need to solve extremely large systems grows more urgent, researchers not only study MG methods and theories but also consider their parallelization. Parallel multigrid approaches have been implemented in various frameworks. For example, waLBerla [54] targets finite difference discretizations on fully structured grids, BoomerAMG (included in the Hypre package [55]) is the parallelization of the classical AMG methods and their variants for unstructured grids, ML [56] focuses on parallel versions of smoothed aggregation AMG methods, Peano [57] is based on space-filling curves, and the Distributed and Unified Numerics Environment (DUNE) is a general software framework for solving PDEs. Not only are researchers rapidly developing algorithms, they are doing the same with the hardware. GPUs based on the single instruction multiple thread (SIMT) hardware architecture have provided an efficient platform for large-scale scientific computing since November 2006, when NVIDIA released the Compute Unified Device Architecture (CUDA) toolkit. The CUDA toolkit made programming on GPUs considerably easier than it had been previously. MG methods have also been parallelized and implemented on GPUs in a number of studies. GMG methods, as the typical cases of MG, were implemented on GPUs first [58, 59, 60, 61, 62, 63]. These studies demonstrate that the speedup afforded by using GPUs can result in GMG methods achieving a high level of performance on CUDA-enabled GPUs. However, to the best of our knowledge, parallelizing an AMG method on GPUs or CPUs remains very challenging, mainly due to the sequential nature of the coarsening processes (setup phase) used in AMG methods.
In most AMG algorithms, coarse-grid points are selected sequentially using graph theoretical tools (such as maximal independent sets and graph partitioning algorithms), and the coarse-grid matrices are constructed by a triple-matrix multiplication. Although extensive research has been devoted to improving the performance of parallel coarsening algorithms, leading to marked improvements on CPUs [64, 65, 66, 36, 67, 68, 69] and on GPUs [65, 70, 71] over time, the setup phase is still considered to be the major bottleneck in parallel AMG methods. On the other hand, the task of designing an efficient and robust parallel smoother in the solver phase is no less challenging. However, most of these difficulties can be resolved by using auxiliary spaces. The special structure of the auxiliary grid makes the parallelization more efficient. Therefore, the second goal of this thesis is to design new parallel smoothers and an MG method based on auxiliary space preconditioning techniques.

Many problems in physics and engineering, like the Navier-Stokes equations in fluid dynamics, Helmholtz equations, or multi-physics problems, lead to indefinite problems. Other problems, like the biharmonic plate problem, may be formulated as coupled problems in different variables, which leads to a saddle point problem. It is important to find efficient preconditioners for these indefinite problems.

It is noted that preconditioner design and analysis for saddle-point problems and indefinite problems have been the subject of active research in a variety of areas of applied mathematics, for example groundwater flow, Stokes and Navier-Stokes flow [72, 73, 74, 75], elasticity, magnetostatics, etc. While most results address the symmetric case, non-symmetric preconditioning has also been analyzed for some practical problems. Various iterative methods and preconditioning methods for solving saddle-point type problems have been the subject of research. Some methods focus on clever renumbering schemes in combination with a classical iterative approach, like the SILU scheme and ILU schemes proposed by Wille et al. Other methods are based on a splitting of the saddle point operator.
A number of block preconditioners have been devised, for example block diagonal preconditioners [74], block triangular preconditioners, the Pressure Convection-Diffusion commutator (PCD) of Kay, Loghin and Wathen [76, 77], the Least Squares Commutator (LSC) of Elman, Howle, Shadid, Shuttleworth and Tuminaro [75], the Augmented Lagrangian approach (AL) of Benzi and Olshanskii [78], the Artificial Compressibility (AC) preconditioner [79], and the Grad-Div (GD) preconditioner [79]. For an overview of block preconditioners, we refer to [80, 81, 82]. Although we can apply MG to solve the sub-blocks, the development of MG for indefinite problems on general unstructured grids has rarely been analyzed. So, the last goal of this thesis is to carry out a general analysis of FASP preconditioners for finite element discretizations and to describe sufficient conditions for optimal preconditioning.

The rest of the thesis is organized as follows. In Chapter 2, we review iterative methods and preconditioning techniques. Then in Chapter 3, we introduce the basic concepts and theories of MG and auxiliary space preconditioning. Next, we introduce the algorithm and theory of our new auxiliary space MG on shape-regular grids in Chapter 4. In Chapter 5, we present a parallel colored Gauss-Seidel method. In Chapter 6, we discuss the parallelization of the UA-AMG algorithm based on auxiliary grids. After that, we give some numerical examples in Chapter 7. In Chapter 8, we review iterative methods for indefinite problems and introduce the new FASP theory for indefinite problems. After that, we discuss the method and apply the FASP theories to the Stokes equations in Chapter 9. Finally, we draw some conclusions and describe future work in Chapter 10.

Chapter 2
Iterative Method
A single step linear iterative method uses an old approximation, $u^{old}$, of the solution $u^*$ of (1.1) to produce a new approximation, $u^{new}$, and usually consists of three steps:
1. Form $r^{old} = f - Au^{old}$;
2. Solve $Ae = r^{old}$ approximately: $\hat e = Br^{old}$;
3. Update $u^{new} = u^{old} + \hat e$,
where B is a linear operator on V that can be thought of as an approximate inverse of A. As a result, we have the following algorithm.
Algorithm 1 Iterative Method
Given $u^0 \in V$;
for m = 0, 1, \cdots, until convergence do
    $u^{m+1} = u^m + B(f - Au^m)$
We say that an iterative scheme like Algorithm 1 converges if $\lim_{m\to\infty} u^m = u^*$ for any $u^0 \in V$. The core element of the above iterative scheme is the operator B. Notice that if $B = A^{-1}$, then after one iteration $u^1$ is the exact solution. In general, B may be regarded as an approximate inverse of A. For general iterative methods, we have the following simple convergence result.
Lemma 2.0.1. The iterative Algorithm 1 converges if and only if
ρ(I − BA) < 1
where ρ(A) is the spectral radius of A.
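To make the roles of A and B concrete, the following is a minimal Python/NumPy sketch of Algorithm 1 together with the convergence check of Lemma 2.0.1; the function names, the dense-matrix setting, and the stopping rule are illustrative assumptions rather than part of the thesis.

```python
import numpy as np

def stationary_iteration(A, B, f, u0, tol=1e-8, max_iter=1000):
    """Basic iteration u^{m+1} = u^m + B (f - A u^m) from Algorithm 1."""
    u = u0.copy()
    for m in range(max_iter):
        r = f - A @ u                      # current residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(f):
            break
        u = u + B @ r                      # correction by the approximate inverse B
    return u

def converges(A, B):
    """Check the condition of Lemma 2.0.1: rho(I - BA) < 1."""
    I = np.eye(A.shape[0])
    return np.max(np.abs(np.linalg.eigvals(I - B @ A))) < 1
```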
If A is symmetric positive definite (SPD), we can define a new inner product: $(u, v)_A = (Au, v)$. Sometimes it is more desirable that the operator B is symmetric. If B is not symmetric, there is a natural way to symmetrize it. Consider the following Algorithm 2.
Algorithm 2 Symmetrized Iterative Method
Given $u^0 \in V$;
for m = 0, 1, \cdots, until convergence do
    $u^{m+1/2} = u^m + B(f - Au^m)$
    $u^{m+1} = u^{m+1/2} + B^T(f - Au^{m+1/2})$
The symmetrized iterator will be denoted by $\bar B$. Since
$$I - \bar B A = (I - B^T A)(I - BA) = I - B^T A - BA + B^T A B A,$$
we have $\bar B = B^T + B - B^T A B$.
The convergence theory is as follows.
Theorem 2.0.2. A sufficient condition for the convergence of Algorithm 2 is that
$B^{-T} + B^{-1} - A$ is SPD.
Proof. If $B^{-T} + B^{-1} - A$ is SPD, then $\bar B = B^T(B^{-T} + B^{-1} - A)B$ is SPD. Since $\bar B$ is symmetric,
$$(\bar BAu, v)_A = (Au, \bar BAv) = (u, \bar BAv)_A,$$
so $\bar BA$ is SPD with respect to $(\cdot, \cdot)_A$. Because $I - \bar BA = (I - B^TA)(I - BA)$ and $I - B^TA$ is the A-adjoint of $I - BA$,
$$((I - \bar BA)u, u)_A = ((I - BA)u, (I - BA)u)_A \ge 0.$$
Assume $\lambda \in \sigma(I - \bar BA) \subset [0, \infty)$ and $\mu \in \sigma(\bar BA)$; then $\mu > 0$, so
$$0 \le \lambda = 1 - \mu < 1,$$
which leads to the conclusion.
Theorem 2.0.3. If A is SPD, a sufficient condition for the convergence of Algorithm 1 is that $B^{-T} + B^{-1} - A$ is SPD.
Proof. By Theorem 2.0.2,
$$\rho(I - BA) \le \|I - BA\|_A = \|(I - B^TA)(I - BA)\|_A^{1/2} = \|I - \bar BA\|_A^{1/2} < 1.$$
2.1 Stationary Iterative Methods
Most of the stationary iterative methods pass from one iterate to the next by modifying one or a few components of an approximate vector solution at a time. This is natural since it is simple to modify a component. The convergence of these methods is rarely guaranteed for all matrices, but a large body of theory exists. Assume $V = \mathbb{R}^N$ and $A = (a_{ij}) \in \mathbb{R}^{N\times N}$ is symmetric positive definite (SPD). We begin with the matrix splitting
$$A = D + L + U, \qquad (2.1)$$
where D is the diagonal of A, L is the strict lower part, and U is the strict upper part (as shown in Figure 2.1).
2.1.1 Jacobi Method
The Jacobi method determines the i-th component by eliminating the i-th component of the residual vector of the previous solution. Let $u_i^m$ denote the i-th component of the m-th iteration $u^m$.

Figure 2.1. Matrix splitting of A

Writing the residual form as
$$(f - Au^*)_i = f_i - \sum_{j=1,\,j\ne i}^{N} a_{ij}u_j^* - a_{ii}u_i^* = 0,$$
where $u^*$ is the exact solution of the linear system, and mimicking this relation with the off-diagonal components frozen at the previous iterate $u^m$, the new iterate $u^{m+1}$ can be determined by
$$a_{ii}u_i^{m+1} = f_i - \sum_{j=1,\,j\ne i}^{N} a_{ij}u_j^m \qquad (2.2)$$
or
$$u_i^{m+1} = \frac{1}{a_{ii}}\Big(f_i - \sum_{j=1,\,j\ne i}^{N} a_{ij}u_j^m\Big), \quad i = 1, 2, \cdots, N. \qquad (2.3)$$
This is the component-wise form of the Jacobi iteration. It can be written in vector form as
$$u^{m+1} = D^{-1}(f - (L + U)u^m) \qquad (2.4)$$
or
$$u^{m+1} = u^m + D^{-1}(f - Au^m). \qquad (2.5)$$
This leads to the following algorithm:
Algorithm 3 Jacobi Method
for k ← 1 until convergence do
    for i ← 1 to N do
        σ ← 0
        for j ← 1 to N do
            if j ≠ i then
                σ ← σ + $a_{ij}u_j^m$
        $u_i^{m+1} \leftarrow (f_i - \sigma)/a_{ii}$
    check if convergence is reached
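As a complement to the componentwise pseudocode above, here is a minimal NumPy sketch of the Jacobi iteration in its vector form (2.5); the function name and stopping rule are illustrative assumptions.

```python
import numpy as np

def jacobi(A, f, u0, tol=1e-8, max_iter=10000):
    """Jacobi iteration u^{m+1} = u^m + D^{-1}(f - A u^m), cf. (2.5)."""
    D = np.diag(A)                  # diagonal entries a_ii (assumed nonzero)
    u = u0.copy()
    for m in range(max_iter):
        r = f - A @ u
        if np.linalg.norm(r) <= tol * np.linalg.norm(f):
            break
        u = u + r / D               # componentwise division by a_ii
    return u
```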
The convergence theory of the Jacobi method is well established.

Theorem 2.1.1 (Jacobi Method). Assume A is symmetric positive definite (SPD). Then the Jacobi method converges if and only if 2D − A is SPD.

Proof. Since $B^{-T} + B^{-1} - A = D + D - A = 2D - A$, the desired result follows from Theorem 2.0.3.
It is worth mentioning that the Jacobi method sometimes converges even if these conditions are not satisfied.
2.1.2 Gauss-Seidel Method
Similar to the Jacobi method, the Gauss-Seidel method determines the i-th component by eliminating the i-th component of the residual vector, but now of the current solution, in the order i = 1, 2, \cdots, N. This time the approximate solution is updated immediately after each new component is determined. The i-th component of the residual is
$$f_i - \sum_{j=1}^{i-1} a_{ij}u_j^{m+1} - a_{ii}u_i^{m+1} - \sum_{j=i+1}^{N} a_{ij}u_j^m = 0, \qquad (2.6)$$
which leads to the iteration
$$u_i^{m+1} = \frac{1}{a_{ii}}\Big(f_i - \sum_{j=1}^{i-1} a_{ij}u_j^{m+1} - \sum_{j=i+1}^{N} a_{ij}u_j^m\Big), \quad i = 1, 2, \cdots, N. \qquad (2.7)$$
The vector form of the equation (2.6) can be written as
$$f - Lu^{m+1} - Du^{m+1} - Uu^m = 0.$$
Therefore, the vector form of the Gauss-Seidel method is
$$u^{m+1} = (D + L)^{-1}(f - Uu^m), \qquad (2.8)$$
or
$$u^{m+1} = u^m + (D + L)^{-1}(f - Au^m). \qquad (2.9)$$
This leads to the following algorithm:
Algorithm 4 Gauss-Seidel Method
$u = u^0$
while convergence is not reached do
    for i ← 1 to N do
        σ ← 0
        for j ← 1 to N do
            if j ≠ i then
                σ ← σ + $a_{ij}u_j$
        $u_i \leftarrow (f_i - \sigma)/a_{ii}$
    check if convergence is reached
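For comparison with the Jacobi sketch, the following is a minimal Python sketch of one forward Gauss-Seidel sweep in the spirit of Algorithm 4; components are overwritten in place as soon as they are updated, and the helper name is an illustrative assumption.

```python
import numpy as np

def gauss_seidel_sweep(A, f, u):
    """One forward sweep: u_i <- (f_i - sum_{j != i} a_ij u_j) / a_ii, cf. (2.7)."""
    N = len(f)
    for i in range(N):
        sigma = A[i, :] @ u - A[i, i] * u[i]   # sum over j != i using the latest values
        u[i] = (f[i] - sigma) / A[i, i]
    return u
```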
The convergence properties of the Gauss-Seidel method depend on the matrix A. Namely, the procedure is known to converge if either A is symmetric positive-definite or A is strictly or irreducibly diagonally dominant.
Theorem 2.1.2 (Gauss-Seidel Method). Assume A is SPD. Then the Gauss-Seidel method always converges.
Proof. Since $B^{-T} + B^{-1} - A = (D + L)^T + D + L - A = D$ is SPD, the Gauss-Seidel method converges by Theorem 2.0.3.
The Gauss-Seidel method has several variants. The backward Gauss-Seidel method can be defined as
$$u^{m+1} = u^m + (D + U)^{-1}(f - Au^m), \qquad (2.10)$$
which is equivalent to making the corrections in the order $N, N-1, \cdots, 1$. The symmetric Gauss-Seidel method consists of a forward sweep followed by a backward sweep, which can be written as
$$u^{m+\frac12} = u^m + (D + L)^{-1}(f - Au^m), \qquad (2.11)$$
$$u^{m+1} = u^{m+\frac12} + (D + U)^{-1}(f - Au^{m+\frac12}).$$
We can rewrite this as one equation:
$$u^{m+1} = u^m + \big[(D + U)^{-1} + (D + L)^{-1} - (D + U)^{-1}A(D + L)^{-1}\big](f - Au^m). \qquad (2.12)$$
It is simple to prove that when A is SPD, the backward Gauss-Seidel method and the symmetric Gauss-Seidel method converge.
2.1.3 Successive Over-Relaxation Method
The successive over-relaxation method (SOR) is derived by extrapolating the Gauss-Seidel method. This extrapolation takes the form of a weighted average between the previous iterate and the computed Gauss-Seidel iterate successively for each component,
$$u_i^{m+1} = \omega\bar u_i^{m+1} + (1 - \omega)u_i^m, \quad i = 1, 2, \cdots, N,$$
where $\bar u_i^{m+1}$ denotes the Gauss-Seidel iterate (2.7) and ω is the extrapolation factor. The idea is to choose a value for ω that will accelerate the rate of convergence of the iterates to the solution. The vector form of the SOR method can be written as
$$u^{m+1} = (D + \omega L)^{-1}[-\omega U + (1 - \omega)D]u^m + \omega(D + \omega L)^{-1}f. \qquad (2.13)$$
This leads to the following algorithm:
Algorithm 5 SOR Method
$u = u^0$
while convergence is not reached do
    for i ← 1 to N do
        σ ← 0
        for j ← 1 to N do
            if j ≠ i then
                σ ← σ + $a_{ij}u_j$
        $u_i \leftarrow (1 - \omega)u_i + \dfrac{\omega}{a_{ii}}(f_i - \sigma)$
    check if convergence is reached
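A minimal Python sketch of one SOR sweep following Algorithm 5 is given below; setting omega = 1 recovers the Gauss-Seidel sweep above, and the helper name and default parameter are illustrative assumptions.

```python
import numpy as np

def sor_sweep(A, f, u, omega=1.5):
    """One SOR sweep: relax the Gauss-Seidel update by the factor omega, cf. (2.13)."""
    N = len(f)
    for i in range(N):
        sigma = A[i, :] @ u - A[i, i] * u[i]
        u[i] = (1.0 - omega) * u[i] + omega * (f[i] - sigma) / A[i, i]
    return u
```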
If ω = 1, the SOR method simplifies to the Gauss-Seidel method; choices ω < 1 (under-relaxation) and ω > 1 (over-relaxation) damp or extrapolate the Gauss-Seidel correction. A theorem due to Kahan [83] shows that SOR fails to converge if ω is outside the interval (0, 2).

Theorem 2.1.3. Assume A is SPD. Then the SOR method converges if 0 < ω < 2.
Proof. Since
$$B^{-T} + B^{-1} - A = \omega^{-1}(D + \omega L)^T + \omega^{-1}(D + \omega L) - A = \frac{2 - \omega}{\omega}D$$
is SPD. The result follows from Theorem 2.0.3.
In general, it is not possible to compute in advance the value of ω that will maximize the rate of convergence of SOR. If the coefficient matrix A is symmetric and positive definite, the SOR iteration is guaranteed to converge for any value of ω between 0 and 2, though the choice of ω can significantly affect the rate at which the SOR iteration converges. Frequently, some heuristic estimate is used, such as ω = 2 − O(h), where h is the mesh spacing of the discretization of the underlying physical domain. A backward SOR sweep can be defined analogously to the backward Gauss-Seidel sweep (2.10). A Symmetric SOR (SSOR) step consists of the SOR step (2.13) followed by a backward SOR step,
$$u^{m+\frac12} = (D + \omega L)^{-1}[-\omega U + (1 - \omega)D]u^m + \omega(D + \omega L)^{-1}f, \qquad (2.14)$$
$$u^{m+1} = (D + \omega U)^{-1}[-\omega L + (1 - \omega)D]u^{m+\frac12} + \omega(D + \omega U)^{-1}f.$$
2.1.4 Block Iterative Method
We now discuss the extension of the stationary iterative methods to block schemes. The block iterative methods are generalizations of the "pointwise" iterative methods described above. They update a whole set of components at a time, typically a subvector of the solution vector, instead of only one component. First of all, we assume that $\tilde A$ is a block matrix
$$\tilde A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1J}\\ A_{21} & A_{22} & \cdots & A_{2J}\\ \vdots & \vdots & \ddots & \vdots\\ A_{J1} & A_{J2} & \cdots & A_{JJ} \end{pmatrix}, \qquad \tilde u = \begin{pmatrix} \xi_1\\ \xi_2\\ \vdots\\ \xi_J \end{pmatrix}, \qquad \tilde f = \begin{pmatrix} \beta_1\\ \beta_2\\ \vdots\\ \beta_J \end{pmatrix},$$
with entries being subblocks:
$$A_{ij} \in \mathbb{R}^{N_i\times N_j}, \quad \xi_j \in \mathbb{R}^{N_j}, \quad \beta_i \in \mathbb{R}^{N_i}, \quad 1 \le i, j \le J, \quad \text{and} \quad N = \sum_i N_i.$$
Similarly, we can define the following block matrix splitting:
$$\tilde A = \tilde D + \tilde L + \tilde U,$$
where $\tilde D$ is the block diagonal of $\tilde A$ and $\tilde L$ and $\tilde U$ are the strictly block lower and upper triangular parts of $\tilde A$, respectively. Namely,
$$\tilde D = \begin{pmatrix} A_{11} & & & \\ & A_{22} & & \\ & & \ddots & \\ & & & A_{JJ} \end{pmatrix}, \quad \tilde L = \begin{pmatrix} 0 & & & \\ A_{21} & 0 & & \\ \vdots & \ddots & \ddots & \\ A_{J1} & \cdots & A_{J,J-1} & 0 \end{pmatrix}, \quad \tilde U = \begin{pmatrix} 0 & A_{12} & \cdots & A_{1J}\\ & 0 & \ddots & \vdots\\ & & \ddots & A_{J-1,J}\\ & & & 0 \end{pmatrix}.$$
With these definitions, it is easy to generalize the three iterative procedures defined earlier, namely Jacobi, Gauss-Seidel, and SOR: we can simply take
$$\tilde B = \begin{cases} \tilde D^{-1} & \text{block Jacobi};\\ (\tilde D + \tilde L)^{-1} & \text{block Gauss-Seidel};\\ \omega(\tilde D + \omega\tilde L)^{-1} & \text{block SOR}. \end{cases} \qquad (2.15)$$
With any of these choices,
the iterative method can be written as
$$\tilde u^{m+1} = \tilde u^m + \tilde B(\tilde f - \tilde A\tilde u^m). \qquad (2.16)$$
In addition, a block can also correspond to the unknowns associated with a few consecutive lines in the plane. Unlike the "pointwise" iterative methods, it is not easy to apply $\tilde B$ exactly because solving with the diagonal blocks $A_{ii}$ may be very expensive. In order to make the block iterative methods more practical, the modified block iterative methods can be defined by replacing the block diagonal inverse $\tilde D^{-1}$ by a modified block diagonal inverse, denoted by $\tilde R$. Namely, we can choose
$$\tilde B = \begin{cases} \tilde R & \text{modified block Jacobi};\\ (\tilde R^{-1} + \tilde L)^{-1} & \text{modified block Gauss-Seidel};\\ \omega(\tilde R^{-1} + \omega\tilde L)^{-1} & \text{modified block SOR}, \end{cases} \qquad (2.17)$$
where $\tilde R$ is a modification or an approximation of $\tilde D^{-1}$. This means that in each step we do not have to solve the sub-blocks exactly, which is more practical to implement. Mathematically, it can be proved that under the same conditions as the "pointwise" iterative methods, the block iterative methods and the modified block iterative methods are also convergent. In the following, we give a convergence analysis of the modified block Gauss-Seidel method:
$$\tilde u^{m+1} = \tilde u^m + (\tilde R^{-1} + \tilde L)^{-1}(\tilde f - \tilde A\tilde u^m), \quad m = 1, 2, \ldots \qquad (2.18)$$
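To illustrate the block splitting in a concrete form, here is a minimal Python sketch of one block Jacobi step; the list of index sets "blocks" is an illustrative assumption (later chapters obtain such blocks from the auxiliary grid), and replacing the exact block solve by an approximation corresponds to the modified block methods (2.17).

```python
import numpy as np

def block_jacobi_step(A, f, u, blocks):
    """One block Jacobi step for a partition of the unknowns into index blocks."""
    r = f - A @ u
    u_new = u.copy()
    for idx in blocks:
        Aii = A[np.ix_(idx, idx)]                    # diagonal block A_ii
        u_new[idx] += np.linalg.solve(Aii, r[idx])   # exact block solve
    return u_new
```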
Lemma 2.1.4. Assume that $\tilde A$ is semi-SPD. Assume that
$$\bar{\tilde D} = \tilde R^{-T} + \tilde R^{-1} - \tilde D$$
is nonsingular, then
$$\bar{\tilde B}^{-1} = (\tilde R^{-T} + \tilde U)^T\,\bar{\tilde D}^{-1}(\tilde R^{-T} + \tilde U) = \tilde A + (\tilde D + \tilde U - \tilde R^{-1})^T\,\bar{\tilde D}^{-1}(\tilde D + \tilde U - \tilde R^{-1}), \qquad (2.19)$$
and for any $\tilde v$,
$$\tilde v^T\bar{\tilde B}^{-1}\tilde v = \big((\tilde R^{-T} + \tilde U)\tilde v\big)^T\bar{\tilde D}^{-1}(\tilde R^{-T} + \tilde U)\tilde v, \qquad (2.20)$$
$$\tilde v^T\bar{\tilde B}^{-1}\tilde v = \tilde v^T\tilde A\tilde v + \big((\tilde D + \tilde U - \tilde R^{-1})\tilde v\big)^T\bar{\tilde D}^{-1}(\tilde D + \tilde U - \tilde R^{-1})\tilde v. \qquad (2.21)$$
Proof. It follows that
$$\bar{\tilde B} = \tilde B + \tilde B^t - \tilde B^t\tilde A\tilde B = \tilde B^t(\tilde B^{-1} + \tilde B^{-t} - \tilde A)\tilde B = \tilde B^t\bar{\tilde D}\tilde B, \quad \text{where} \quad \bar{\tilde D} = \tilde B^{-1} + \tilde B^{-t} - \tilde A = \tilde R^{-t} + \tilde R^{-1} - \tilde D.$$
Hence
$$\bar{\tilde B}^{-1} = \tilde B^{-1}\bar{\tilde D}^{-1}\tilde B^{-t} = (\bar{\tilde D} + \tilde A - \tilde B^{-T})\bar{\tilde D}^{-1}(\bar{\tilde D} + \tilde A - \tilde B^{-1}) = \tilde A + (\tilde A - \tilde B^{-T})\bar{\tilde D}^{-1}(\tilde A - \tilde B^{-1}).$$
The desired results then follow.
Theorem 2.1.5. Assume A is SPD. The modified block Gauss-Seidel method converges if
$$\bar{\tilde D} \equiv \tilde R^{-t} + \tilde R^{-1} - \tilde D > 0. \qquad (2.22)$$
And the following convergence estimate holds:
$$\|I - \tilde B\tilde A\|_{\tilde A}^2 = 1 - \frac{1}{c_1} = 1 - \frac{1}{1 + c_0}, \qquad (2.23)$$
where
$$c_1 = \sup_{\|v\|_{\tilde A}=1} \big((\tilde R^{-t} + \tilde U)v\big)^T\bar{\tilde D}^{-1}(\tilde R^{-t} + \tilde U)v \qquad (2.24)$$
and
$$c_0 = \sup_{\|v\|_{\tilde A}=1} \big((\tilde D + \tilde U - \tilde R^{-1})v\big)^T\bar{\tilde D}^{-1}(\tilde D + \tilde U - \tilde R^{-1})v. \qquad (2.25)$$
In particular, if $\tilde R = \tilde D^{-1}$, then $\bar{\tilde D} = \tilde D$ and
$$c_1 = \sup_{\|v\|_{\tilde A}=1} \big((\tilde D + \tilde U)v\big)^T\tilde D^{-1}(\tilde D + \tilde U)v \qquad (2.26)$$
and
$$c_0 = \sup_{\|v\|_{\tilde A}=1} (\tilde Uv)^T\tilde D^{-1}\tilde Uv. \qquad (2.27)$$
2.2 Krylov Space Method and Preconditioners
With respect to their "influence on the development and practice of science and engineering in the 20th century", Krylov space methods are considered one of the ten most important classes of numerical methods. Unlike the stationary iterative methods, Krylov methods do not have an iteration matrix. A Krylov subspace method extracts an approximate solution $u^m$ from an affine subspace $u^0 + \mathcal{K}_m$, where $u^0$ is an arbitrary initial guess for the solution and $\mathcal{K}_m$ is the Krylov subspace
$$\mathcal{K}_m(A, r^0) = \mathrm{span}\{r^0, Ar^0, A^2r^0, \cdots, A^{m-1}r^0\}. \qquad (2.28)$$
The residual is $r = f - Au$, and $\{r^m\}_{m\ge 0}$ denotes the sequence of residuals
$$r^m = f - Au^m.$$
When there is no ambiguity, $\mathcal{K}_m(A, r^0)$ will be denoted by $\mathcal{K}_m$. From approximation theory, it is clear that the approximations obtained from a Krylov subspace method are of the form
$$A^{-1}f \approx u^m = u^0 + q_{m-1}(A)r^0, \qquad (2.29)$$
in which $q_{m-1}$ is a certain polynomial of degree m − 1. In the simplest case, let $u^0 = 0$; then $A^{-1}f$ is approximated by $q_{m-1}(A)f$. In other words, the polynomial $q_{m-1}(\lambda)$ is an approximation of $1/\lambda$. The relative residual norm of the Krylov subspace method can be bounded as
$$\frac{\|r^m\|}{\|r^0\|} \le \min_{q\in\mathcal{P}_{m-1}} \|q(A)\|, \qquad (2.30)$$
where $\mathcal{P}_{m-1}$ is the set of polynomials of degree at most $m - 1$.
Although all of these techniques provide the same type of polynomial approximation, different choices give rise to different versions of Krylov subspace methods. In this section, we introduce the conjugate gradient method (CG); the minimal residual method (MINRES) and the generalized minimal residual method (GMRES) are introduced in Chapter 8.
2.2.1 Conjugate Gradient Method
The conjugate gradient (CG) method is due to Hestenes and Stiefel [6]. It is one of the best known iterative techniques for solving sparse symmetric positive definite (SPD) linear systems. The conjugate gradient method was invented in the 1950s as a direct method. It has come into wide use over the last decades as an iterative method and has generally superseded the stationary iterative methods. The CG method is the prototype of the Krylov subspace methods. It is an orthogonal projection method and satisfies a minimality condition: the error is minimal in the energy norm or A-norm, defined by
$$\|u\|_A = (u^TAu)^{1/2}.$$
Consider the quadratic function
$$\phi(u) = \frac{1}{2}u^TAu - f^Tu + \gamma. \qquad (2.31)$$
Since φ(u) is convex, it has a unique minimizer; assume $\phi(u^*)$ is the minimal value. The minimizer satisfies
$$\nabla\phi(u^*) = Au^* - f = 0,$$
so $u^*$ is the solution of (1.1). If we choose $\gamma = \frac{1}{2}f^TA^{-1}f$, it is easy to see that
$$\|f - Au\|_{A^{-1}}^2 = \|u^* - u\|_A^2 = (u^*)^TAu^* - 2(u^*)^TAu + u^TAu = u^TAu - 2f^Tu + f^TA^{-1}f = 2\phi(u).$$
So solving the equation Au = f is equivalent to minimizing the quadratic function φ(u), and also equivalent to minimizing the energy norm of the error vector $u^* - u$. In the m-th step, we want to find $u^m$ such that
$$\phi(u^m) = \min_{u\in u^0 + \mathcal{K}_m} \phi(u). \qquad (2.32)$$
By doing this iteratively, we can find the solution within N steps. Choose search directions $\{v^m\}$ which are conjugate (A-orthogonal) to each other, i.e. $(v^m)^TAv^j = 0$, $j = 0, 1, \cdots, m-1$, and define
$$u^m = u^{m-1} + \omega^{m-1}v^{m-1}, \qquad r^m = r^{m-1} - \omega^{m-1}Av^{m-1},$$
where $\omega^{m-1}$ is chosen to minimize the A-norm of the error on the line $u^{m-1} + \omega v^{m-1}$. This means
$$\omega^{m-1} = \frac{(r^{m-1})^Tv^{m-1}}{(v^{m-1})^TAv^{m-1}}. \qquad (2.33)$$
Associated with the minimality is the Galerkin condition
$$\mathcal{K}_{m-1} \perp r^{m-1} \in \mathcal{K}_m,$$
which implies that $\{r^m\}_{m=0}^{N}$ form an orthogonal basis of $\mathcal{K}_N$. If we choose the conjugate search directions based on the residuals $\{r^m\}_{m=0}^{N}$, we obtain the standard CG method. The detailed algorithm is as follows:
Algorithm 6 CG Method
$r^0 = f - Au^0$; $p^0 = r^0$;
for m = 0, 1, \cdots, until convergence do
    $\alpha_m \leftarrow \dfrac{(r^m)^Tr^m}{(p^m)^TAp^m}$
    $u^{m+1} \leftarrow u^m + \alpha_m p^m$
    $r^{m+1} \leftarrow r^m - \alpha_m Ap^m$
    if $r^{m+1}$ is sufficiently small then exit loop
    $\beta_m \leftarrow \dfrac{(r^{m+1})^Tr^{m+1}}{(r^m)^Tr^m}$
    $p^{m+1} \leftarrow r^{m+1} + \beta_m p^m$
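The following is a minimal NumPy transcription of Algorithm 6; dense matrix-vector products are used only for readability, and a sparse A would work the same way.

```python
import numpy as np

def cg(A, f, u0, tol=1e-8, max_iter=None):
    """Standard conjugate gradient method for SPD A (Algorithm 6)."""
    u = u0.copy()
    r = f - A @ u
    p = r.copy()
    rr = r @ r
    n_iter = max_iter if max_iter is not None else len(f)
    for m in range(n_iter):
        Ap = A @ p
        alpha = rr / (p @ Ap)
        u = u + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol * np.linalg.norm(f):
            break
        beta = rr_new / rr
        p = r + beta * p
        rr = rr_new
    return u
```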
The conjugate gradient method can theoretically be viewed as a direct method, as it produces the exact solution after a finite number of iterations, not larger than the size of the matrix, in the absence of round-off error. However, the conjugate gradient method is unstable with respect to even small perturbations; most directions are not in practice conjugate, and the exact solution is never obtained. Fortunately, the conjugate gradient method can be used as an iterative method, as it provides monotonically improving approximations $u^m$ to the exact solution which may reach the required tolerance after a relatively small (compared to the problem size) number of iterations. The improvement is typically linear, and its speed is determined by the condition number κ(A) of the system matrix A:
$$\kappa(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)},$$
where $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ are the largest and smallest eigenvalues of A. The convergence analysis of CG from [84] is as follows.
Theorem 2.2.1 (Convergence of CG). Assume that $u^m$ is the m-th iterate of the CG method and $u^*$ is the exact solution. Then we have the following estimate:
$$\|u^* - u^m\|_A \le 2\left(\frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1}\right)^m\|u^* - u^0\|_A. \qquad (2.34)$$
Proof. For an arbitrary polynomial $q_{m-1}$ of degree m − 1, denote
$$\tilde u^m = q_{m-1}(A)f = q_{m-1}(A)Au^* = Aq_{m-1}(A)u^*;$$
by (2.32) we have
$$\begin{aligned}
(u^* - u^m)^TA(u^* - u^m) &\le \min_{q_{m-1}} (u^* - \tilde u^m)^TA(u^* - \tilde u^m)\\
&= \min_{q_{m-1}} \big((I - Aq_{m-1}(A))u^*\big)^TA(I - Aq_{m-1}(A))u^*\\
&= \min_{q_m(0)=1} \big(q_m(A)u^*\big)^TAq_m(A)u^*\\
&\le \min_{q_m(0)=1}\max_{\lambda\in\sigma(A)} |q_m(\lambda)|^2\,(u^*)^TAu^*\\
&\le \min_{q_m(0)=1}\max_{\lambda\in[a,b]} |q_m(\lambda)|^2\,(u^*)^TAu^*.
\end{aligned}$$
Here $a = \lambda_{\min}(A)$ and $b = \lambda_{\max}(A)$. We choose
$$\tilde q_m(\lambda) = \frac{T_m\!\left(\frac{b + a - 2\lambda}{b - a}\right)}{T_m\!\left(\frac{b + a}{b - a}\right)}.$$
Here Tm(t) is the Chebyshev polynomial of degree m given by
$$T_m(t) = \begin{cases} \cos(m\cos^{-1}t) & \text{if } |t| \le 1;\\ (\mathrm{sign}(t))^m\cosh(m\cosh^{-1}|t|) & \text{if } |t| \ge 1. \end{cases} \qquad (2.35)$$
Notice that $\big|T_m\!\big(\frac{b + a - 2\lambda}{b - a}\big)\big| \le 1$ for $\lambda \in [a, b]$. Thus
$$\max_{\lambda\in[a,b]} |\tilde q_m(\lambda)| \le T_m\!\left(\frac{b + a}{b - a}\right)^{-1}.$$
We set
$$\frac{b + a}{b - a} = \cosh\sigma = \frac{e^\sigma + e^{-\sigma}}{2}.$$
Solving this equation for $e^\sigma$, we have
$$e^\sigma = \frac{\sqrt{\kappa(A)} + 1}{\sqrt{\kappa(A)} - 1},$$
since $\kappa(A) = b/a$. We then obtain
$$\cosh m\sigma = \frac{e^{m\sigma} + e^{-m\sigma}}{2} \ge \frac{1}{2}e^{m\sigma} = \frac{1}{2}\left(\frac{\sqrt{\kappa(A)} + 1}{\sqrt{\kappa(A)} - 1}\right)^m.$$
Consequently,
$$\min_{q_m(0)=1}\max_{\lambda\in[a,b]} |q_m(\lambda)| \le 2\left(\frac{\sqrt{\kappa(A)} - 1}{\sqrt{\kappa(A)} + 1}\right)^m.$$
The desired result then follows.
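As a small numeric illustration of the bound (2.34) (not part of the thesis), the following snippet estimates how many CG iterations the bound guarantees for a given condition number and tolerance; it simply solves $2\big((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)\big)^m \le \mathrm{tol}$ for m.

```python
import math

def cg_iterations_bound(kappa, tol=1e-6):
    """Smallest m with 2*((sqrt(kappa)-1)/(sqrt(kappa)+1))**m <= tol."""
    rho = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    return math.ceil(math.log(tol / 2.0) / math.log(rho))

# For kappa = 1e4 and tol = 1e-6 the bound gives about 726 iterations,
# consistent with the O(sqrt(kappa)) growth predicted by Theorem 2.2.1.
```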
The estimate given above is sufficient for many applications, but in general it is not sharp. There are many ways to sharpen the estimate. For example, the following improved estimate shows that the convergence of the CG method depends on the distribution of the spectrum of A. It is possible that the CG method converges fast even when the condition number of A is large.
Theorem 2.2.2 ([85]). Assume that $\sigma(A) = \sigma_0(A)\cup\sigma_1(A)$ and $l$ is the number of elements in $\sigma_0(A)$. Then
$$\|u^* - u^m\|_A \le 2M\left(\frac{\sqrt{b/a} - 1}{\sqrt{b/a} + 1}\right)^{m-l}\|u^* - u^0\|_A,$$
where $a = \min_{\lambda\in\sigma_1(A)}\lambda$, $b = \max_{\lambda\in\sigma_1(A)}\lambda$, and
$$M = \max_{\lambda\in\sigma_1(A)}\prod_{\mu\in\sigma_0(A)}\left|1 - \frac{\lambda}{\mu}\right|.$$
In practice, $\kappa(A)\gg 1$ if A is ill-conditioned. In this case, a fast convergence rate cannot be guaranteed.
2.3 Preconditioned Iterations
Although all of the iterative methods are well founded theoretically, they are all likely to suffer from slow convergence for problems which arise from typical applications. Preconditioning is a key ingredient for the success of Krylov subspace methods in applications. Both the efficiency and robustness of iterative techniques can be improved by using preconditioners. Preconditioning simply transforms the original linear system into one which has the same solution but is easier to solve with an iterative solver. In general, the reliability of iterative techniques, when dealing with various applications, depends much more on the quality of the preconditioner than on the particular Krylov subspace accelerator used. In this section, we first discuss the preconditioned versions of the Krylov subspace algorithms using a generic preconditioner. Then, we cover some commonly used preconditioners.

To begin with, it is worthwhile to consider the options available for preconditioning a system. The first step in preconditioning is to find a preconditioning matrix B. The matrix B can be defined in many different ways, but it must satisfy a few minimal requirements. B should be an approximation of $A^{-1}$ in some sense. From a practical point of view, the most important requirement for B is that it is inexpensive to construct and apply, because the preconditioned algorithms all require multiplication by B at each step. Once a preconditioning matrix B is defined, there are three known ways of applying the preconditioner. The preconditioner can be applied from the left, leading to the preconditioned system
$$BAu = Bf. \qquad (2.36)$$
Alternatively, it can also be applied to the right:
ABv = f, u = Bv. (2.37) 25
Note that the above formulation amounts to making the change of variables $v = B^{-1}u$ and solving the resulting system with respect to the unknown v. The third common option is to apply the preconditioner on both sides, which is called a split preconditioner:
B1AB2v = B1f, u = B2v, B = B1B2. (2.38)
2.3.1 Preconditioned Conjugate Gradient Method
Recall that CG is designed to solve symmetric positive definite systems, so it is imperative to preserve symmetry. In general, the left- and right-preconditioned operators are no longer symmetric. In order to design the preconditioned conjugate gradient method (PCG), we need to consider strategies for preserving symmetry. When B is available in the form of an incomplete Cholesky factorization, i.e. $B = LL^T$, it is simple to use the split preconditioner as in (2.38), which leads to
$$LAL^Tv = Lf, \qquad u = L^Tv. \qquad (2.39)$$
Applying CG to this system, we obtain the corresponding PCG method.
Algorithm 7 PCG Method for split preconditioner
$r^0 = f - Au^0$; $\hat r^0 = Lr^0$; $p^0 = L^T\hat r^0$
for m = 0, 1, \cdots, until convergence do
    $\alpha_m \leftarrow \dfrac{(\hat r^m)^T\hat r^m}{(p^m)^TAp^m}$
    $u^{m+1} \leftarrow u^m + \alpha_m p^m$
    $\hat r^{m+1} \leftarrow \hat r^m - \alpha_m LAp^m$
    $\beta_m \leftarrow \dfrac{(\hat r^{m+1})^T\hat r^{m+1}}{(\hat r^m)^T\hat r^m}$
    $p^{m+1} \leftarrow L^T\hat r^{m+1} + \beta_m p^m$
However, it is not necessary to split the preconditioner to preserve symmetry. Since BA is self-adjoint for the B−1 inner product,
(BAu, v)B−1 = (Au, v) = (u, BAv)B−1 .
Therefore, an alternative is to replace the usual Euclidean inner product in the Conjugate 26
Gradient algorithm by the $B^{-1}$ inner product. Note that the $B^{-1}$ inner product does not have to be computed explicitly. With this observation, the following algorithm is obtained.
Algorithm 8 PCG Method for left preconditioner
$r^0 = f - Au^0$; $z^0 = Br^0$; $p^0 = z^0$
for m = 0, 1, \cdots, until convergence do
    $\alpha_m \leftarrow \dfrac{(r^m)^Tz^m}{(p^m)^TAp^m}$
    $u^{m+1} \leftarrow u^m + \alpha_m p^m$
    $r^{m+1} \leftarrow r^m - \alpha_m Ap^m$
    $z^{m+1} \leftarrow Br^{m+1}$
    $\beta_m \leftarrow \dfrac{(r^{m+1})^Tz^{m+1}}{(r^m)^Tz^m}$
    $p^{m+1} \leftarrow z^{m+1} + \beta_m p^m$
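A minimal NumPy transcription of Algorithm 8 is given below; B is applied here as an explicit matrix purely for readability, whereas in practice the preconditioner is applied as an operator (for instance one multigrid V-cycle) without ever being formed.

```python
import numpy as np

def pcg(A, B, f, u0, tol=1e-8, max_iter=None):
    """Preconditioned CG with a left preconditioner B (Algorithm 8)."""
    u = u0.copy()
    r = f - A @ u
    z = B @ r
    p = z.copy()
    rz = r @ z
    n_iter = max_iter if max_iter is not None else len(f)
    for m in range(n_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        u = u + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(f):
            break
        z = B @ r
        rz_new = r @ z
        beta = rz_new / rz
        p = z + beta * p
        rz = rz_new
    return u
```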
By observing that BA is also self-adjoint with respect to the A inner product, i.e.
(BAu, v)A = (BAu, Av) = (u, BAv)A.
A similar PCG with a left preconditioner can be written under this inner product. Consider now the right preconditioned system (2.37). AB is self-adjoint with respect to the B inner product, so the PCG algorithm can be written with respect to the variable u under the new inner product; it turns out that the same sequence of computations is obtained as with Algorithm 8. The implication is that the left preconditioned CG algorithm with the $B^{-1}$ inner product is mathematically equivalent to the right preconditioned CG algorithm with the B inner product. The following convergence estimate holds for PCG:
$$\|u^* - u^m\|_A \le 2\left(\frac{\sqrt{\kappa(BA)} - 1}{\sqrt{\kappa(BA)} + 1}\right)^m\|u^* - u^0\|_A. \qquad (2.40)$$
So a good preconditioner for PCG should make BA better conditioned, i.e. $\kappa(BA) < \kappa(A)$. Knowing that any iterative method can be seen as an operator B which approximates $A^{-1}$, we now prove that any convergent iterative method with symmetric B can accelerate CG.

Theorem 2.3.1. Assume that B is symmetric with respect to the inner product $(\cdot, \cdot)$. If the iterative scheme $u^{m+1} = u^m + B(f - Au^m)$ is convergent, then the B from the scheme can be used as a preconditioner for A and the PCG method converges at a faster rate.
Proof. If the iterative scheme is convergent, then $\rho = \rho(I - BA) < 1$. By definition, we have
$$\|I - BA\|_A = \sup_{v\ne 0}\frac{((I - BA)v, v)_A}{\|v\|_A^2} = 1 - \inf_{v\ne 0}\frac{(BAv, v)_A}{\|v\|_A^2} < 1.$$
Hence $\inf_{v\ne 0}(BAv, Av) > 0$, which means B is symmetric positive definite, so we can use B as a preconditioner for PCG. Then, since $1 - \|I - BA\|_A \le \|BA\|_A \le 1 + \|I - BA\|_A$, we have
$$\kappa(BA) \le \frac{1 + \rho}{1 - \rho}.$$
So
$$\frac{\sqrt{\kappa(BA)} - 1}{\sqrt{\kappa(BA)} + 1} \le \frac{\sqrt{\frac{1+\rho}{1-\rho}} - 1}{\sqrt{\frac{1+\rho}{1-\rho}} + 1} = \frac{1 - \sqrt{1 - \rho^2}}{\rho} < \rho.$$
The desired result then follows.
We conclude that for any linear iterative scheme for which B is symmetric, a preconditioner for A can be obtained and the convergence rate can be accelerated by using the PCG method. For example, a preconditioner can be obtained from the symmetric SOR method as
$$B = SS^t, \qquad S = (D + \omega U)^{-1}D^{1/2}.$$
A more interesting case is that the scheme may not be convergent at all, whereas B can still serve as a preconditioner. For example, the Jacobi method is not convergent for all SPD systems, but $B = D^{-1}$ can always be used as a preconditioner. This preconditioner is often known as the diagonal (Jacobi) preconditioner.
2.3.2 Preconditioning Techniques
Finding a good preconditioner to solve a sparse linear system is often viewed as a combination of art and science. A preconditioner can be defined as a solver which is combined with an 28
outer iteration, like the Krylov subspace iterations. Roughly speaking, a preconditioner is any form of implicit or explicit modification of the linear system. In general, a good preconditioner should be cheap to construct and apply, and the preconditioned system should be easy to solve.

Generally speaking, there are two approaches to constructing preconditioners. One popular approach is purely algebraic methods that only use the information contained in the matrix A. The algebraic methods are often easy to develop and to use. For example, all of the stationary iterative methods can be applied as preconditioners: in general, for a matrix splitting A = M − N, any $B = M^{-1}$ can serve as a preconditioner. Ideally, M should be close to A in some sense. The other common algebraic preconditioner is defined by an incomplete factorization of A. Assume we have a decomposition of the form A = LU − R, where L and U have the same nonzero structure as the lower and upper parts of A, respectively, and R is the residual or error of the factorization. This factorization, known as ILU(0), is rather easy and inexpensive to compute. On the other hand, it often leads to a crude approximation, which may result in the Krylov subspace accelerator requiring many iterations to converge. To remedy this, several alternative incomplete factorizations have been developed by allowing more fill-in in L and U. In general, the more accurate ILU factorizations require fewer iterations to converge, but the preprocessing cost to compute the factors is higher.

The algebraic methods may not always be efficient. The other approach is to design specialized algorithms that use more information about a narrow class of problems. By using knowledge of the problem, such as the continuous equations, the problem domain, and details of the discretization, very stable and efficient preconditioners can be developed. One typical example is the multigrid preconditioner.
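As one concrete illustration of the algebraic approach (an assumption for illustration, not code from the thesis), the matrix-splitting preconditioner corresponding to the symmetric Gauss-Seidel method of (2.12) can be applied to a residual without ever forming B explicitly:

```python
import numpy as np

def symmetric_gs_apply(A, r):
    """Apply B = (D+U)^{-1} + (D+L)^{-1} - (D+U)^{-1} A (D+L)^{-1} to r, cf. (2.12)."""
    DL = np.tril(A)                              # D + L
    DU = np.triu(A)                              # D + U
    y = np.linalg.solve(DL, r)                   # forward sweep
    return y + np.linalg.solve(DU, r - A @ y)    # backward sweep
```

Such an operator can be passed to a PCG implementation in place of an explicit matrix B, which is the way smoother-based preconditioners are typically used.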
2.4 Numerical Example
Finally, we use one example to illustrate the differences among the iterative methods introduced above and to make their advantages and drawbacks clearer. All of the methods are tested in MATLAB on a 2.3 GHz processor.
2.4.1 Comparison of the Iterative Method
The first test problem we considered is the Laplacian on the unit square with homogeneous Dirichlet boundary conditions:
$$-\Delta u = f \ \text{ in } \Omega := [0, 1]^2, \qquad u = 0 \ \text{ on } \Gamma := \partial\Omega. \qquad (2.41)$$
Set f = 2x(1 − x) + 2y(1 − y), so the exact solution is u* = x(1 − x)y(1 − y). Define the P1 finite element space V on a (2^ℓ − 1) × (2^ℓ − 1) structured mesh with N := (2^ℓ − 1)^2 vertices. Let (ϕ_i)_{i=1}^N denote the Lagrange basis of V. The discrete stiffness matrix A is defined by
$$A_{i,j} := \int_\Omega \langle \nabla\varphi_j(x), \nabla\varphi_i(x)\rangle \, dx.$$
The right-hand side vector is defined by
$$f_i = \int_\Omega f\,\varphi_i(x)\, dx.$$
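For reference, the sketch below (our own illustration, not the MATLAB code used for the tables) assembles an equivalent discrete system: on a uniform right-triangle mesh of the unit square, the P1 stiffness matrix coincides with the familiar 5-point stencil, and the load vector is approximated here by mass lumping, f_i ≈ h^2 f(x_i).

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def poisson_p1(level):
    """Stiffness matrix and (lumped) load vector for -Laplace(u) = f on the unit
    square with m = 2**level - 1 interior nodes per direction; the P1 stiffness
    matrix on a uniform right-triangle mesh equals the 5-point stencil matrix."""
    m = 2**level - 1
    h = 1.0 / (m + 1)
    T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(m, m))
    A = sp.kron(sp.identity(m), T) + sp.kron(T, sp.identity(m))
    x = np.arange(1, m + 1) * h
    X, Y = np.meshgrid(x, x, indexing="ij")
    f = 2 * X * (1 - X) + 2 * Y * (1 - Y)
    return A.tocsr(), (h**2) * f.ravel(), X, Y

A, b, X, Y = poisson_p1(5)
u = spla.spsolve(A, b)
u_exact = (X * (1 - X) * Y * (1 - Y)).ravel()
print("max nodal error on the 31 x 31 interior grid:", np.max(np.abs(u - u_exact)))
```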
The linear system generated from this equation is symmetric positive definite. We compare the different stationary iterative methods, using CG as a baseline (Table 2.1). As shown in Figure 2.2 and Figure 2.3, the Jacobi method converges the slowest in terms of both the number of iterations and the CPU time, and is therefore considered inefficient with respect to run time. However, the primary advantage of the Jacobi method is its parallelism: the low cost of each iteration makes it attractive for parallel computing. The Gauss-Seidel method is faster because it uses the most recently updated values to construct better corrections than the Jacobi method. The SOR method with an optimal parameter converges fastest among the stationary methods, but it is difficult to find the best parameter in practice. On the other hand, the numerical results show that all of the stationary iterative methods are much slower than the CG method, and their convergence rates depend strongly on the condition number of the matrix A; thousands of iterations may be needed to obtain an accurate solution. Because of these disadvantages, stationary iterative methods are rarely used as standalone solvers for linear systems.
                  5 × 5        9 × 9        17 × 17        33 × 33
Jacobi          129 (2.7)    368 (6.45)   1204 (18.4)    4309 (60.2)
Gauss-Seidel     65 (1.4)    185 (3.25)    603 (9.25)    2156 (30.15)
Symmetric GS     38 (0.9)     98 (1.9)     308 (4.9)     1084 (15.4)
SOR              22 (0.7)     37 (1.1)      67 (1.85)     127 (3.35)
CG                4 (0.1)     10 (0.2)      24 (0.5)       46 (1.2)

Table 2.1. Comparison of the different iterative methods for the Poisson equation: number of iterations, with CPU time in parentheses.
Figure 2.2. Comparison of the number of Iterations
Figure 2.3. Comparison of the CPU time
2.4.2 Comparison of the Preconditioners
In this subsection, we compare the effect of different preconditioners, independently of the choice of the preconditioned iterative method. Table 2.2 shows the number of iterations for PCG with the different preconditioners. Clearly, the preconditioning helps significantly
                                      17 × 17   33 × 33   65 × 65   129 × 129   257 × 257
No preconditioner                        21        38        70        128         230
Jacobi                                   21        38        70        128         230
Symmetric GS                             15        25        44         72         122
Symmetric SOR                            14        19        27         39          57
Incomplete Cholesky factorization        13        21        36         61         103

Table 2.2. Comparison of the different preconditioners for the Poisson equation: number of PCG iterations.
(shown in Figure 2.4). For the 257 × 257 problem, PCG without a preconditioner (that is, plain CG) took 230 iterations. With the symmetric Gauss-Seidel preconditioner the count drops to 122, a moderate improvement. With the incomplete Cholesky IC(0) preconditioner the count is 103, and with the symmetric SOR preconditioner using the optimal parameter only 57 iterations are needed. Therefore, applying an appropriate preconditioner improves the convergence rate significantly.
Figure 2.4. Comparison of the number of iterations for preconditioners
The running time is another issue in choosing a good preconditioner. Normally, the computation time is proportional to the number of iterations: the fewer iterations, the less computation time is required. However, the computational cost per iteration differs between preconditioners. For example, PCG with the Jacobi preconditioner may converge faster in wall-clock time than with the symmetric SOR preconditioner on high-performance parallel computing resources. For different situations and matrix properties, choosing a good preconditioner is very important: it can significantly reduce the computation time and allow fast convergence.

Chapter 3
Multigrid Method and Fast Auxiliary Space Preconditioner
In recent decades, multigrid (MG) methods have been well established as among the most efficient iterative solvers and preconditioners for the linear system (1.1). Moreover, intensive research has been devoted to analyzing the convergence of MG. In particular, it can be proven that the geometric multigrid (GMG) method has linear complexity O(N), in terms of both computational work and memory, for a large class of elliptic boundary value problems, where N is the number of degrees of freedom. In this chapter, we introduce some basic ideas of the multigrid method from the viewpoint of subspace corrections.
3.1 Method of Subspace Correction
In the spirit of divide and conquer, we decompose the space V into a sum of subspaces and correspondingly decompose problem (1.1) into sub-problems that are smaller and relatively easy to solve. This method was developed by Xu [19]. Let V_i ⊂ V (for 0 ≤ i ≤ L) be subspaces of V that constitute a decomposition of V, i.e.
$$V = \sum_{i=0}^{L} V_i. \qquad (3.1)$$
This means that, for each v ∈ V, there exist v_i ∈ V_i (0 ≤ i ≤ L) such that $v = \sum_{i=0}^{L} v_i$. This representation of v is not unique in general, namely (3.1) is not necessarily a direct sum. We define the following operators, for i = 0, 1, ..., L:
• Q_i : V → V_i, the projection in the L^2 inner product (·, ·);
• I_i : V_i → V, the natural inclusion into V;
• P_i : V → V_i, the projection in the inner product (·, ·)_A;
• A_i : V_i → V_i, the restriction of A to the subspace V_i;
• R_i : V_i → V_i, an approximation of A_i^{-1}, which plays the role of the smoother;
• T_i : V → V_i, defined by T_i = R_iQ_iA = R_iA_iP_i.
For any u ∈ V and ui, vi ∈ Vi, these operators fulfil the trivial equalities
$$(Q_iu, v_i) = (u, I_iv_i) = (I_i^t u, v_i),$$
$$(A_iP_iu, v_i) = a(u, v_i) = (Q_iAu, v_i),$$
$$(A_iu_i, v_i) = a(u_i, v_i) = (Au_i, v_i) = (Q_iAI_iu_i, v_i).$$
Q_i and P_i are both orthogonal projections, and A_i, the restriction of A to V_i, is SPD. Q_i coincides with the transpose I_i^T of the natural inclusion and is thus sometimes omitted. The matrix or operator A can be understood as a bilinear form on V × V; its restriction to a subspace is then A_i = I_i^T A I_i. It follows from the definitions that
AiPi = QiA. (3.2)
This identity is of fundamental importance and will be used frequently in this chapter. A consequence of it is that, if u is the solution of (1.1), then
Aiui = fi (3.3)
with u_i = P_iu and f_i = Q_if. This equation may be regarded as the restriction of (1.1) to V_i.
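In purely algebraic terms (a small sketch of our own: the subspace is an arbitrary block of unknowns and A is a random SPD matrix), the natural inclusion is a selection matrix, Q_i is its transpose, and identity (3.2) can be checked directly:

```python
import numpy as np

n, idx = 10, np.array([2, 3, 4, 5])          # V_i spanned by unknowns 2..5 (example choice)
rng = np.random.default_rng(1)
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                  # an SPD matrix standing in for A

I_i = np.eye(n)[:, idx]                      # natural inclusion as a selection matrix
Q_i = I_i.T                                  # Q_i = I_i^t
A_i = I_i.T @ A @ I_i                        # restriction A_i = I_i^t A I_i
P_i = np.linalg.solve(A_i, I_i.T @ A)        # A-orthogonal projection P_i = A_i^{-1} I_i^t A

print("||A_i P_i - Q_i A|| =", np.linalg.norm(A_i @ P_i - Q_i @ A))   # identity (3.2)
```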
Since Vi ⊂ V, we may consider the natural inclusion Ii : Vi 7→ V defined by
Iivi = vi, ∀vi ∈ Vi.
We notice that Q_i = I_i^T because
$$(Q_iu, v_i) = (u, v_i) = (u, I_iv_i) = (I_i^T u, v_i).$$
Similarly, we have P_i = I_i^T, where now the transpose of I_i is taken with respect to (·, ·)_A.
We note that the solution ui of (3.3) is the best approximation of the solution u of (1.1) in the subspace Vi in the sense that
$$J(u_i) = \min_{v \in V_i} J(v), \quad\text{with}\quad J(v) = \frac{1}{2}(Av, v) - (f, v)$$
and
$$\|u - u_i\|_A = \min_{v \in V_i} \|u - v\|_A.$$
The subspace equation (3.3) will in general be solved only approximately. To describe this, we introduce, for each i, another non-singular operator R_i : V_i → V_i that represents an approximate inverse of A_i in a certain sense. Thus an approximate solution of (3.3) may be given by û_i = R_if_i. A consistent notation for the smoother R_i would be B_i, the iterator for each local problem, but we reserve the notation B for the iterator of the original problem.

Last, let us look at T_i = R_iQ_iA = R_iA_iP_i. When R_i = A_i^{-1}, it follows from the definition that T_i = P_i = A_i^{-1}Q_iA. Restricted to V_i, i.e. T_i|_{V_i} : V_i → V_i, the projection P_i is the identity and thus T_i|_{V_i} = R_iA_i. With a slight abuse of notation, we write T_i^{-1} = (T_i|_{V_i})^{-1}. The action of T_i and T_i^{-1} is
$$(T_iu_i, u_i)_A = (R_iA_iu_i, A_iu_i), \qquad (T_i^{-1}u, u)_A = (R_i^{-1}u, u).$$
3.1.1 Parallel Subspace Correction and Successive Subspace Correction
From the viewpoint of subspace correction, most linear iterative methods can be classified into two major algorithms, namely the parallel subspace correction (PSC) method and the successive subspace correction (SSC) method.
PSC: Parallel subspace correction. This type of algorithm is similar to the Jacobi method. The idea is to correct the residual equation on each subspace in parallel. Let u^{old} be a given approximation of the solution u of (1.1). The accuracy of this approximation can be measured by the residual r^{old} = f − Au^{old}. If r^{old} = 0 or is very small, we are done. Otherwise, we consider the residual equation
$$Ae = r^{old}.$$
Obviously u = u^{old} + e is the solution of (1.1). Instead, we solve the equation restricted to each subspace V_i:
$$A_ie_i = Q_ir^{old}.$$
Since Q_iAe = Q_ir^{old} and Q_iA = A_iP_i, it is easy to see that
$$A_iP_ie = Q_ir^{old}.$$
Then it is clear that e_i = P_ie. It is helpful to notice that the solution e_i gives the best possible correction of u^{old} in the subspace V_i in the sense that
$$J(u^{old} + e_i) = \min_{e \in V_i} J(u^{old} + e), \quad\text{with}\quad J(v) = \frac{1}{2}(Av, v) - (f, v)$$
and
$$\|u - (u^{old} + e_i)\|_A = \min_{e \in V_i} \|u - (u^{old} + e)\|_A.$$
As we are only seeking a correction, we need only solve this equation approximately, using the subspace solver R_i described earlier:
$$\hat{e}_i = R_iQ_ir^{old}.$$
An update of the approximation of u is obtained by
$$u^{new} = u^{old} + \sum_{i=1}^{J} I_i\hat{e}_i,$$
which can be written as
$$u^{new} = u^{old} + B(f - Au^{old}),$$
where
$$B = \sum_{i=1}^{J} I_iR_iQ_i = \sum_{i=1}^{J} I_iR_iI_i^t. \qquad (3.4)$$
We therefore have
Algorithm 9 Parallel subspace correction
Given u^0 ∈ V; apply Algorithm 1 with B given in (3.4).
It is well known that the Jacobi method is not convergent for all SPD problems, hence Algorithm 9 is not always convergent. However, the preconditioner obtained from this algorithm performs very well.
Lemma 3.1.1. The operator B given by (3.4) is SPD if each Ri : Vi → Vi is SPD.
Proof. The symmetry of B follows from the symmetry of the R_i. Now, for any v ∈ V, we have $(Bv, v) = \sum_{i=1}^{J}(R_iQ_iv, Q_iv) \ge 0$. If (Bv, v) = 0, then Q_iv = 0 for all i. Let v_i ∈ V_i be such that $v = \sum_i v_i$; then $(v, v) = \sum_i (v, v_i) = \sum_i (Q_iv, v_i) = 0$. Therefore v = 0, and hence B is positive definite.
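A minimal sketch of the additive operator (3.4), assuming the subspaces are non-overlapping blocks of unknowns and the subspace solvers are the exact inverses R_i = A_i^{-1} (so that B is the familiar block Jacobi preconditioner), might look as follows; the corrections inside the loop are independent and could be computed in parallel.

```python
import numpy as np

def psc_preconditioner(A, blocks):
    """Return the action of B = sum_i I_i R_i I_i^t with R_i = A_i^{-1} (block Jacobi)."""
    factors = [(idx, np.linalg.inv(A[np.ix_(idx, idx)])) for idx in blocks]
    def apply_B(r):
        v = np.zeros_like(r)
        for idx, Ai_inv in factors:          # independent subspace corrections
            v[idx] += Ai_inv @ r[idx]
        return v
    return apply_B

# Example: 1D Laplacian with non-overlapping blocks of 8 unknowns.
n = 128
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
B = psc_preconditioner(A, [np.arange(i, i + 8) for i in range(0, n, 8)])
r = np.ones(n)
print("first entries of B r:", B(r)[:4])
```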
SSC: Successive subspace correction. This type of algorithm is similar to the Gauss-Seidel method. To improve on the PSC method, which makes all corrections simultaneously, we here make the correction in one subspace at a time, using the most updated approximation of u. More precisely, starting from v^0 = u^{old} and correcting its residual in V_1 gives
$$v^1 = v^0 + I_1R_1I_1^t(f - Av^0).$$
By correcting the new approximation v^1 in the next space V_2, we get
$$v^2 = v^1 + I_2R_2I_2^t(f - Av^1).$$
Proceeding this way successively for all Vi leads to
Let T_i = R_iQ_iA. By (3.2), T_i = R_iA_iP_i. Note that T_i : V → V_i is symmetric with respect to A(·, ·) and nonnegative, and that T_i = P_i if R_i = A_i^{-1}.
Algorithm 10 Successive subspace correction
Given u^0 ∈ V;
for k = 0, 1, ..., until convergence do
    v ← u^k;
    for i = 1 : L do
        v ← v + R_iQ_i(f − Av);
    u^{k+1} ← v.
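A sketch of one sweep of Algorithm 10, under the same assumptions as before (non-overlapping index blocks and exact subspace solvers, which is of course not the only possible choice), is given below; with blocks of size one it reduces to the Gauss-Seidel method.

```python
import numpy as np

def ssc_sweep(A, f, v, blocks):
    """One successive subspace correction sweep: v <- v + I_i A_i^{-1} I_i^t (f - A v)."""
    for idx in blocks:
        r_i = f[idx] - A[idx, :] @ v                     # local residual on the block
        v[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r_i)
    return v

# Example: blocks of size one give the pointwise Gauss-Seidel method.
n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
f = np.ones(n)
v = np.zeros(n)
blocks = [np.array([i]) for i in range(n)]
for k in range(200):
    v = ssc_sweep(A, f, v, blocks)
print("residual norm after 200 sweeps:", np.linalg.norm(f - A @ v))
```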
If u is the exact solution of (1.1), then f = Au. Let v^i be the i-th iterate (with v^0 = u^k) of Algorithm 10; then by definition
$$u - v^{i} = (I - T_i)(u - v^{i-1}), \qquad i = 1, \cdots, L.$$
A successive application of this identity yields
$$u - u^{k+1} = E_L(u - u^k), \qquad (3.5)$$
where
$$E_L = (I - T_L)(I - T_{L-1}) \cdots (I - T_1). \qquad (3.6)$$
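To make (3.5)–(3.6) concrete, the following sketch (an illustration of our own, with exact solvers on index blocks so that T_i = P_i) assembles E_L explicitly for a small model problem and checks that its A-norm is below one:

```python
import numpy as np

n = 32
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
blocks = [np.arange(i, i + 4) for i in range(0, n, 4)]

E = np.eye(n)
for idx in blocks:
    T = np.zeros((n, n))
    T[idx, :] = np.linalg.solve(A[np.ix_(idx, idx)], A[idx, :])   # T_i = I_i A_i^{-1} I_i^t A
    E = (np.eye(n) - T) @ E                                       # E_L = (I - T_L) ... (I - T_1)

# A-norm of the error propagation operator: ||E||_A = ||A^{1/2} E A^{-1/2}||_2
w, V = np.linalg.eigh(A)
A_half = V @ np.diag(np.sqrt(w)) @ V.T
print("||E_L||_A =", np.linalg.norm(A_half @ E @ np.linalg.inv(A_half), 2))
```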
Algorithm 10 can also be symmetrized.
Algorithm 11 Symmetric successive subspace correction
Given u^0 ∈ V; v ← u^0;
for k = 0, 1, ... until convergence do
    for i = 1 : J and then i = J : −1 : 1 do
        v ← v + R_iQ_i(f − Av)
The advantage of the symmetrized algorithm is that it can be used as a preconditioner. In fact, Algorithm 11 can be formulated as Algorithm 1 with the operator B defined as follows: for f ∈ V, let Bf = u^1, where u^1 is obtained by applying Algorithm 11 to (1.1) with u^0 = 0. Similar to Young's SOR method, let us introduce a relaxed version.
Algorithm 12 SOR successive subspace correction
Input: u^0 ∈ V
Output: v ∈ V
v ← u^0;
for k = 0, 1, ... until convergence do
    for i = 1 : J do
        v ← v + ωR_iQ_i(f − Av)
As with the SOR method, a proper choice of ω can improve the convergence rate, but it is not easy to find an optimal ω in general. The above algorithm is essentially the same as Algorithm 10, since we can absorb the relaxation parameter ω into the definition of R_i.

Like the colored Gauss-Seidel method, the SSC iteration can also be colored and parallelized. Associated with a given decomposition (3.1), a coloring of the set J = {0, 1, 2, ..., L} is a disjoint decomposition
$$J = \bigcup_{t=1}^{L_c} J(t)$$
such that
$$P_iP_j = 0 \quad \text{for any } i, j \in J(t),\ i \neq j \ (1 \le t \le L_c).$$
We say that i, j have the same color if they both belong to the same J(t). The important property of the coloring is that the SSC iteration can be carried out in parallel within each color.
Algorithm 13 Colored SSC
Input: u^0 ∈ V
Output: v ∈ V
v ← u^0;
for k = 0, 1, ... until convergence do
    for t = 1 : L_c do
        v ← v + \sum_{i \in J(t)} I_iR_iI_i^t(f − Av)
We note that the terms under the sum in the above algorithm can be evaluated in parallel (for each t, namely within the same color).
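The sketch below (our own algebraic stand-in: non-overlapping index blocks, exact block solvers, and a greedy coloring in which two blocks receive different colors whenever A couples them, which for such blocks amounts to the condition P_iP_j = 0) illustrates how the colored sweep is organized; within a color the corrections are independent and could run in parallel.

```python
import numpy as np

def greedy_coloring(A, blocks):
    """Greedily color the blocks so that coupled blocks never share a color."""
    colors = []
    for i, bi in enumerate(blocks):
        used = {colors[j] for j, bj in enumerate(blocks[:i])
                if np.any(A[np.ix_(bi, bj)] != 0)}
        colors.append(next(c for c in range(len(blocks)) if c not in used))
    return colors

def colored_ssc_sweep(A, f, v, blocks, colors):
    """One colored SSC sweep; blocks of the same color are mutually uncoupled,
    so their corrections use the same residual and could be applied in parallel."""
    for t in range(max(colors) + 1):
        r = f - A @ v
        for idx, c in zip(blocks, colors):
            if c == t:
                v[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r[idx])
    return v

n = 64
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
blocks = [np.arange(i, i + 4) for i in range(0, n, 4)]
colors = greedy_coloring(A, blocks)
v = np.zeros(n)
for k in range(100):
    v = colored_ssc_sweep(A, np.ones(n), v, blocks, colors)
print("colors used:", max(colors) + 1, " residual:", np.linalg.norm(np.ones(n) - A @ v))
```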
3.1.2 Multigrid viewed as Multilevel Subspace Corrections
From the space decomposition point of view, a multigrid algorithm can be viewed as a subspace correction method based on subspaces defined on a nested sequence of triangulations. In this section, we rederive a class of multigrid methods by a successive application of the overlapping domain decomposition method. We will also provide a complete convergence analysis using the Xu-Zikatanov identity and the results in Section 3.1.3. For simplicity, we illustrate the technique by considering the linear finite element method for the Poisson equation
$$-\Delta u = f \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega, \qquad (3.7)$$
where Ω ⊂ R^d is a polyhedral domain. The weak formulation of (3.7) is as follows: given f ∈ H^{-1}(Ω), find u ∈ H_0^1(Ω) such that
$$a(u, v) = \langle f, v\rangle, \quad \forall v \in H_0^1(\Omega), \qquad (3.8)$$
where
$$a(u, v) = (\nabla u, \nabla v) = \int_\Omega \nabla u \cdot \nabla v \, dx,$$
and ⟨·, ·⟩ is the duality pairing between H^{-1}(Ω) and H_0^1(Ω). By the Poincaré inequality, a(·, ·) is an inner product on H_0^1(Ω). Thus, by the Riesz representation theorem, for any f ∈ H^{-1}(Ω) there exists a unique u ∈ H_0^1(Ω) such that (3.8) holds. Furthermore, we have the following regularity result: there exists α ∈ (0, 1], depending on the smoothness of ∂Ω, such that
$$\|u\|_{1+\alpha} \lesssim \|f\|_{\alpha-1}. \qquad (3.9)$$
This inequality is valid if Ω is convex or ∂Ω is C^{1,1}.
We assume that the triangulations {T_k}, k = 1, ···, J, are constructed by a successive refinement process. More precisely, T_h = T_J for some J > 1, and the T_k for k ≤ J form a nested sequence of quasi-uniform triangulations; i.e., T_k consists of simplexes T_k = {τ_k^i} of size h_k such that Ω = ∪_i τ_k^i, the quasi-uniformity constants are independent of k (cf. [86]), and each τ_{k-1}^l is a union of simplexes of {τ_k^i}. We further assume that there is a constant γ < 1, independent of k, such that h_k is proportional to γ^{2k}. We then have a nested sequence of quasi-uniform triangulations
$$T_0 \le T_1 \le \cdots \le T_J = T_h.$$
As an example, in the two-dimensional case a finer grid is obtained by connecting the midpoints of the edges of the triangles of the coarser grid, with T_1 being the given coarsest initial triangulation, which is quasi-uniform. In this example, γ = 1/√2.
Corresponding to each triangulation Tk, a finite element space Vk can be defined by
$$V_k = \{v \in H_0^1(\Omega) : v|_\tau \in P_1(\tau), \ \forall \tau \in T_k\}.$$
Obviously, the following inclusion relation holds:
V0 ⊂ V1 ⊂ ... ⊂ VJ = V.
We assume that h = h_J is sufficiently small and h_1 is of unit size. Note that J = O(|log h|). Naturally, we have a macro space decomposition
$$V = \sum_{k=0}^{J} V_k.$$
Let N_k be the dimension of V_k, i.e., the number of interior vertices of T_k. The standard nodal basis in V_k will be denoted by ϕ_{k,i}, i = 1, ..., N_k. The micro decomposition is $V_k = \sum_{i=1}^{N_k} V_{k,i}$ with V_{k,i} = span{ϕ_{k,i}}. By choosing the right R_{k,i}, the PSC method on V_k is equivalent to the Richardson method or the Jacobi method. In summary, we choose the space decomposition
$$V = \sum_{k=0}^{J} V_k = \sum_{k=0}^{J}\sum_{i=1}^{N_k} V_{k,i}. \qquad (3.10)$$
If we apply PSC to the decomposition (3.10) with $R_{k,i} = h_k^2 I_{k,i}$, we obtain $I_{k,i}R_{k,i}I_{k,i}^T v = h_k^{2-d}(v, \varphi_{k,i})\varphi_{k,i}$. The resulting operator B, according to (3.4), is the so-called BPX preconditioner [16]:
$$Bv = \sum_{k=0}^{J}\sum_{i=1}^{N_k} h_k^{2-d}(v, \varphi_{k,i})\varphi_{k,i}. \qquad (3.11)$$
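A matrix-level sketch of such an additive multilevel preconditioner is given below for a one-dimensional model hierarchy (our own illustration, not the construction used later in the thesis); the scaling $h_k^{2-d}$ of (3.11) is replaced here by the inverse diagonal of the Galerkin coarse matrices, a common algebraic variant.

```python
import numpy as np

def prolongation_1d(mc):
    """Linear interpolation from mc interior coarse nodes to 2*mc + 1 interior fine nodes."""
    P = np.zeros((2 * mc + 1, mc))
    for j in range(mc):
        P[2 * j, j] += 0.5
        P[2 * j + 1, j] = 1.0
        P[2 * j + 2, j] += 0.5
    return P

def bpx_like_preconditioner(A):
    """B = sum_k P_k diag(A_k)^{-1} P_k^T, with P_k the composed prolongation from
    level k to the finest level and A_k = P_k^T A P_k the Galerkin coarse matrix."""
    n = A.shape[0]
    P_to_fine, m = [np.eye(n)], n
    while m > 1:
        mc = (m - 1) // 2
        P_to_fine.append(P_to_fine[-1] @ prolongation_1d(mc))
        m = mc
    terms = [(P, 1.0 / np.diag(P.T @ A @ P)) for P in P_to_fine]
    return lambda v: sum(P @ (d * (P.T @ v)) for P, d in terms)

# Example: 1D Laplacian with 2**6 - 1 = 63 interior nodes; the spectrum of BA stays modest.
n = 63
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
B = bpx_like_preconditioner(A)
BA = np.column_stack([B(e) for e in np.eye(n)]) @ A
eigs = np.sort(np.linalg.eigvals(BA).real)
print("eigenvalues of BA lie in [%.3f, %.3f]" % (eigs[0], eigs[-1]))
```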
Let T_J be the finest triangulation in the multilevel structure described earlier, with nodes $\{x_i\}_{i=1}^{N_J}$. With such a triangulation, a natural domain decomposition is
$$\bar{\Omega} = \bar{\Omega}_0^h \cup \left(\bigcup_{i=1}^{N_J} \operatorname{supp}\, \phi_i\right),$$
where φ_i is the nodal basis function in V_J associated with the node x_i and Ω_0^h, which may be empty, is the region where all functions in V_J vanish. It is easy to see that the corresponding decomposition method without a coarse space is exactly the Gauss-Seidel method, which is known to be inefficient (its convergence rate is known to be 1 − O(h_J^2)). The more interesting case is when a coarse space is introduced. The choice of such a coarse space is clear here, namely V_{J−1}. There remains to choose a solver for
VJ−1. To do this, we may repeat the above process by using the space VJ−2 as a “coarser” space with the supports of the nodal basis function in VJ−1 as a domain decomposition. We
continue in this way until we reach a coarse space V0 where a direct solver can be used. As a result, a multilevel algorithm based on domain decomposition is obtained. This procedure can be illustrated by the following diagram:
V_J     ⇒ (GS)_J
   ↓           ↘
V_{J−1} ⇒ (GS)_{J−1}
   ↓           ↘
V_{J−2} ⇒ (GS)_{J−2}
   ↓
V_{J−3} ⇒ · · ·

The resulting algorithm is a very basic multigrid cycle, which may be called the backslash (\) cycle.

Interpretation of the multigrid method as a special Gauss-Seidel method: a careful inspection of the multigrid algorithm derived above shows clearly that this algorithm is nothing but the SSC method applied to the decomposition (3.10) with exact subspace solvers R_{k,i} = A_{k,i}^{-1}. Apparently, the SSC method for this decomposition is nothing but the simple Gauss-Seidel iteration for the following matrix