Krylov Subspace Iteration
This survey article reviews the history and current importance of Krylov subspace iteration algorithms.

HENK A. VAN DER VORST
Utrecht University

Since the early 1800s, researchers have considered iteration methods an attractive means for approximating the solutions of large linear systems. They make these solutions possible now that we can do realistic computer simulations. The classical iteration methods typically converge very slowly (and often not at all). Around 1950, researchers realized that these methods lead to solution sequences that span a subspace—the Krylov subspace. It was then evident how to identify much better approximate solutions, without much additional computational effort.

When simulating a continuous event, such as the flow of a fluid through a pipe or of air around an aircraft, researchers usually impose a grid over the area of interest and restrict the simulation to the computation of relevant parameters at the gridpoints, such as the pressure, velocity, or temperature of the flow. Physical laws lead to approximate relationships between these parameters in neighboring gridpoints. Together with the prescribed behavior at the boundary gridpoints and with given sources, this eventually leads to very large linear systems of equations, Ax = b. The vector x contains the unknown parameter values in the gridpoints, b is the given input, and the matrix A describes the relationships between parameters in the gridpoints. Because these relationships are often restricted to nearby gridpoints, most matrix elements are zero.

The model becomes more accurate when we refine the grid—that is, when the distance between gridpoints decreases. In a 3D simulation, this easily leads to very large systems of equations: even a few hundred gridpoints in each coordinate direction leads to a system with millions of unknowns. Many other problems also lead to large systems: electric-circuit simulation, magnetic-field computation, weather prediction, chemical processes, semiconductor-device simulation, nuclear-reactor safety problems, mechanical-structure stress, and so on.
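To get a feel for the size and sparsity of such systems before turning to solution methods, here is a small Python sketch (mine, not the article's; the helper name poisson2d is illustrative only). It assembles the standard five-point finite-difference Laplacian, a typical discretized Poisson operator, and shows how quickly the system grows under grid refinement while each row keeps at most five nonzeros.

    # Sketch (not from the article): assemble the five-point Laplacian on an
    # n-by-n grid to show how grid refinement yields large, very sparse systems.
    import scipy.sparse as sp

    def poisson2d(n):
        # 1D second-difference matrix on n gridpoints
        T = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
        I = sp.identity(n, format="csr")
        # 2D operator as a Kronecker sum: couples only neighboring gridpoints
        return sp.kron(I, T) + sp.kron(T, I)

    A = poisson2d(300)      # 300 gridpoints per direction: 90,000 unknowns in 2D
    print(A.shape, A.nnz)   # at most 5 nonzeros per row; the 3D analogue with
                            # 300 points per direction has 27 million unknowns

Direct elimination on such a matrix fills in zeros between the outermost diagonals, which is one reason the standard methods discussed next become so expensive on fine grids.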
The standard numerical-solution methods for these linear systems are based on clever implementations of Gaussian elimination. These methods exploit the sparse structure of the system as much as possible, to avoid both computations with zero elements and the storage of zero elements. But for large systems these methods are often too expensive, even on today's fastest supercomputers, except where A has a special structure. For many of the problems previously listed, we can mathematically show that the standard solution methods will not lead to solutions in any reasonable amount of time. So, researchers have long tried to iteratively approximate the solution x.

We start with a good guess for the solution—for instance, by solving a much easier nearby (idealized) problem. We then attempt to improve this guess by reducing the error with a convenient, cheap approximation for A—an iterative solution method.

Unfortunately, defining suitable nearby linear systems is difficult, in the sense that each step in the iterative process must be cheap and, most important, that the iteration must converge sufficiently fast. Suppose that we approximate the n × n matrix A of the linear system Ax = b by a simpler matrix K. Then we can formulate the iteration process sketched above as follows: in step i + 1, solve the new approximation xi+1 to the solution x of Ax = b from

    Kxi+1 = Kxi + b − Axi.

For an arbitrary initial guess x0, the requirement for convergence of this process is that the largest eigenvalue, in modulus, of the matrix I − K−1A is less than 1. The smaller this eigenvalue is, the faster the convergence will be (if K = A, we have convergence in one step). For most matrices, however, finding a K that is both cheap to solve with and makes this eigenvalue small is practically impossible. For instance, for the discretized Poisson equation, the choice K = diag(A) leads to a convergence rate 1 − O(h2), where h is the distance between gridpoints. Even for the more modern incomplete LU decompositions, this convergence rate is the same, which predicts a very marginal improvement per iteration step. We get reasonably fast convergence only for strongly diagonally dominant matrices. In the mid 1950s, this led to the observation in Ewald Bodewig's textbook that iteration methods were not useful, except when A approaches a diagonal matrix.1
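To make the scheme and its convergence condition concrete, here is a minimal Python sketch (mine, not the article's) of the basic iteration with the Jacobi choice K = diag(A); the final two lines compute the quantity that governs convergence, the largest eigenvalue in modulus of I − K−1A.

    # Sketch (not from the article): the basic iteration
    #     K x_{i+1} = K x_i + b - A x_i
    # with the Jacobi choice K = diag(A), so each step costs one
    # matrix-vector product plus a diagonal solve.
    import numpy as np

    def basic_iteration(A, b, x0, tol=1e-8, maxit=10_000):
        d = np.diag(A)                    # K = diag(A)
        x = x0.copy()
        for i in range(maxit):
            r = b - A @ x                 # current residual
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            x = x + r / d                 # x_{i+1} = x_i + K^{-1} r_i
        return x, i

    # Strongly diagonally dominant test matrix: Jacobi converges quickly here
    rng = np.random.default_rng(0)
    n = 100
    A = rng.standard_normal((n, n)) + n * np.eye(n)
    b = rng.standard_normal(n)
    x, iters = basic_iteration(A, b, np.zeros(n))
    print(iters, np.linalg.norm(b - A @ x))

    # Convergence requires the largest eigenvalue (in modulus) of
    # I - K^{-1} A to be below 1; the smaller it is, the faster the iteration
    M = np.eye(n) - A / np.diag(A)[:, None]
    print(np.max(np.abs(np.linalg.eigvals(M))))

On this diagonally dominant test matrix the iteration converges in a few dozen steps; for a discretized Poisson equation the same code would crawl, with a convergence rate of 1 − O(h2), exactly the behavior behind Bodewig's pessimism.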
Faster iterative solvers

Despite the negative feelings about iterative solvers, researchers continued to design faster iterative methods. Interestingly, the development of the modern and more successful method classes started at about the same time, in a way not appreciated at the time. The first and truly iterative approach tried to identify a trend in the successive approximants and to extrapolate from the most recent iteration results. This led to the successive overrelaxation (SOR) methods, in which an overrelaxation (or extrapolation) parameter steered the iteration process. For interesting classes of problems, such as convection-diffusion problems and the neutron-diffusion equation, this led to attractive computational methods that could compete with direct methods (maybe not so much in computing time, but certainly because of the minimal computer memory requirements). David Young2,3 and Richard Varga4 were important researchers who helped make these methods attractive. The SOR methods were intensively used by engineers until more successful methods gradually replaced them.

The early computers had relatively small memories that made iterative methods attractive, because you had to store only the nonzero matrix elements. Also, iterative solution, although slow, was the only way out for many PDE-related linear systems. Including iteration parameters to kill dominant factors in the iteration errors—as in SOR—made the solution of large systems possible.

Varga reports that by 1960, Laplacian-type systems of order 20,000 could be solved as a daily routine on a Philco-2000 computer with 32,000 words of core storage.4 This would have been impossible with a direct method on a similar computer. However, the iterative methods of that time required careful tuning. For example, for the Chebyshev accelerated iteration methods, you needed accurate guesses for the matrix's extremal eigenvalues, and for the overrelaxation methods, you needed an overrelaxation parameter estimated from the largest eigenvalue of some related iteration matrix.

Another iterative-method class that became popular in the mid 1950s was the Alternating Direction method, which attempted to solve discretized PDEs over grids in more dimensions by successively solving 1D problems in each coordinate direction. Iteration parameters steered this process.

Varga's book, Matrix Iterative Analysis, gives a good overview of the state of the art in 1960.4 It even mentions a system with 108,000 degrees of freedom. Many other problems with a variation in matrix coefficients, such as electronics applications, could not be solved at that time.

Because of the nonrobustness of the early iterative solvers, research focused on more efficient direct solvers. Especially for software used by nonnumerical experts, the direct methods have the advantage of avoiding convergence problems or difficult decisions on iteration parameters. The main problem, however, is that for general PDE-related problems discretized over grids in 3D domains, optimal direct techniques scale as O(n2.3) in floating-point operations, so they are of limited use for the larger, realistic 3D problems. The work per iteration of an iterative method is proportional to n, which shows that if you succeed in finding an iterative technique that converges in considerably fewer than n iterations, that technique is more efficient than a direct solver.

    Compute r0 = b − Ax0 for some initial guess x0
    for i = 1, 2, …
        solve zi−1 from Kzi−1 = ri−1
        ρi−1 = r*i−1 zi−1
        if i = 1
            p1 = z0
        else
            βi−1 = ρi−1/ρi−2
            pi = zi−1 + βi−1 pi−1
        endif
        qi = Api
        αi = ρi−1 / p*i qi
        xi = xi−1 + αi pi
        ri = ri−1 − αi qi
        check convergence; continue if necessary
    end

Figure 1. The conjugate gradient algorithm.

For many practical problems, researchers have achieved this goal, but through clever combinations of modern iteration methods with (incomplete) direct techniques: the ILU-preconditioned Krylov subspace solvers. With proper ordering techniques and appropriate levels of incompleteness, researchers have realized iteration counts for convection-diffusion problems that are practically independent of the gridsize. This implies that for such problems the required number of flops is proportional to n (admittedly with a fairly large proportionality constant). The other advantage of iterative methods is that they need modest amounts of computer storage. For many problems, modern direct methods can also be very modest in this respect, but that depends on the structure of the system's matrix.

    r = b − Ax0, for a given initial guess x0
    for j = 1, 2, …
        β = ||r||2; v1 = r/β; b̃ = βe1
        for i = 1, 2, …, m
            w = Avi
            for k = 1, …, i
                hk,i = v*k w; w = w − hk,i vk
            hi+1,i = ||w||2; vi+1 = w/hi+1,i
            for k = 2, …, i
                μ = hk−1,i
                hk−1,i = ck−1 μ + sk−1 hk,i
                hk,i = −sk−1 μ + ck−1 hk,i
            δ = sqrt(hi,i2 + hi+1,i2); ci = hi,i/δ; si = hi+1,i/δ
            ri,i = ci hi,i + si hi+1,i
            b̃i+1 = −si b̃i; b̃i = ci b̃i
            ρ = |b̃i+1| (= ||b − Ax((j−1)m+i)||2)

Figure 2. The GMRES algorithm with restart length m.

The Krylov subspace solvers

Cornelius Lanczos5 and Walter Arnoldi6 also established the basis for very successful methods in the early 1950s.
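Figure 1 maps almost line for line onto code. The following Python sketch (mine, not the article's) implements the preconditioned conjugate gradient algorithm for a symmetric positive definite A; the preconditioner K is supplied as a routine solve_K that returns z with Kz = r, and all names are illustrative.

    # Sketch (not from the article): the preconditioned conjugate gradient
    # algorithm of Figure 1. A must be symmetric positive definite; solve_K(r)
    # returns z with K z = r (use solve_K = lambda r: r for no preconditioning).
    import numpy as np

    def pcg(A, b, solve_K, x0, tol=1e-8, maxit=1000):
        x = x0.copy()
        r = b - A @ x                      # r0 = b - A x0
        rho_old = 1.0
        p = np.zeros_like(b)
        for i in range(1, maxit + 1):
            z = solve_K(r)                 # solve z_{i-1} from K z_{i-1} = r_{i-1}
            rho = r @ z                    # rho_{i-1} = r*_{i-1} z_{i-1}
            if i == 1:
                p = z.copy()               # p_1 = z_0
            else:
                beta = rho / rho_old       # beta_{i-1} = rho_{i-1} / rho_{i-2}
                p = z + beta * p
            q = A @ p                      # q_i = A p_i
            alpha = rho / (p @ q)          # alpha_i = rho_{i-1} / p*_i q_i
            x = x + alpha * p
            r = r - alpha * q
            rho_old = rho
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):   # check convergence
                break
        return x, i

    # Example: Jacobi preconditioner K = diag(A) on a random SPD test matrix
    rng = np.random.default_rng(1)
    n = 200
    G = rng.standard_normal((n, n))
    A = G @ G.T + n * np.eye(n)            # symmetric positive definite
    b = rng.standard_normal(n)
    x, iters = pcg(A, b, lambda r: r / np.diag(A), np.zeros(n))
    print(iters, np.linalg.norm(b - A @ x))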
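Figure 2 can be sketched the same way. The version below (again mine, and simplified) performs the Arnoldi loop exactly as in the figure but, instead of updating the QR factorization of the Hessenberg matrix H with the Givens rotations ci, si as the figure does, it solves the small least-squares problem min ||βe1 − Hy||2 directly at the end of each restart cycle; in exact arithmetic the two are equivalent.

    # Sketch (not from the article): restarted GMRES(m). The inner loop is the
    # Arnoldi process of Figure 2; for brevity the small least-squares problem
    #     min || beta e1 - H y ||_2
    # is solved directly instead of via the figure's Givens-rotation updates.
    import numpy as np

    def gmres(A, b, x0, m=30, tol=1e-8, max_restarts=50):
        x = x0.copy()
        n = b.shape[0]
        for j in range(max_restarts):
            r = b - A @ x
            beta = np.linalg.norm(r)
            if beta <= tol * np.linalg.norm(b):
                break
            V = np.zeros((n, m + 1))           # orthonormal Krylov basis
            H = np.zeros((m + 1, m))           # upper Hessenberg matrix
            V[:, 0] = r / beta                 # v1 = r / ||r||2
            i_end = m
            for i in range(m):                 # Arnoldi, modified Gram-Schmidt
                w = A @ V[:, i]
                for k in range(i + 1):
                    H[k, i] = V[:, k] @ w      # hk,i = v*k w
                    w = w - H[k, i] * V[:, k]
                H[i + 1, i] = np.linalg.norm(w)
                if H[i + 1, i] < 1e-14:        # lucky breakdown: exact solution
                    i_end = i + 1
                    break
                V[:, i + 1] = w / H[i + 1, i]
            e1 = np.zeros(i_end + 1)
            e1[0] = beta
            y, *_ = np.linalg.lstsq(H[:i_end + 1, :i_end], e1, rcond=None)
            x = x + V[:, :i_end] @ y           # minimizer over the Krylov subspace
        return x

    # Example on a well-conditioned nonsymmetric test matrix
    rng = np.random.default_rng(2)
    n = 200
    A = np.eye(n) + rng.standard_normal((n, n)) / (2 * np.sqrt(n))
    b = rng.standard_normal(n)
    x = gmres(A, b, np.zeros(n))
    print(np.linalg.norm(b - A @ x))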