
Structure in loss of orthogonality✩

Xiao-Wen Chang^a, Christopher C. Paige^{a,∗}, David Titley-Peloquin^b

a School of Computer Science, McGill University, Montréal, Québec, Canada
b Department of Bioresource Engineering, McGill University, Ste-Anne-de-Bellevue, Québec, Canada

✩ With best wishes to Paul Van Dooren, one of the brightest and most likeable of people.
∗ Corresponding author
Email addresses: [email protected] (Xiao-Wen Chang), [email protected] (Christopher C. Paige), [email protected] (David Titley-Peloquin)

Preprint submitted to Linear Algebra and Its Applications, July 7, 2020

Abstract

In [SIAM J. Matrix Anal. Appl., 31 (2009), pp. 565–583] it was shown that for any sequence of k unit 2-norm n-vectors, the columns of Vk, there is a special (n+k)-square unitary matrix Q^{(k)} that can be used in the analysis of numerical algorithms based on orthogonality. A k × k submatrix Sk of Q^{(k)} provides valuable theoretical information on the loss of orthogonality among the columns of Vk. Here it is shown that the singular value decomposition (SVD) and Jordan canonical form (JCF) of Sk both reveal the null space of Vk as well as orthonormal vectors available from a right-side orthogonal transformation of Vk. The JCF of Sk is shown to reveal more than its SVD does. The Lanczos orthogonal tridiagonalization process for a Hermitian matrix is then used to indicate the occurrence of some of these properties in practical computations.

Keywords: Loss of orthogonality, singular value decomposition, Jordan canonical form, rounding error analysis, Lanczos process, eigenproblem.

2000 MSC: 65F15, 65F25, 65G50, 15A18

1. Introduction

If $V_k \in \mathbb{C}^{n\times k}$ has unit 2-norm columns, one can define the strictly upper triangular matrix $S_k \triangleq (I + U_k)^{-1}U_k$, where $U_k$ is the strictly upper triangular part of $V_k^H V_k$, as well as the unitary matrix

$$Q^{(k)} \triangleq \begin{bmatrix} S_k & (I_k - S_k)V_k^H \\ V_k(I_k - S_k) & I_n - V_k(I_k - S_k)V_k^H \end{bmatrix}, \qquad (1)$$

see Theorem 1 below. This Q^{(k)} was described in [13] and can be the basis of the rounding error analysis of several numerical algorithms based on orthogonality, see, e.g., [14, 15]. But more generally the matrix Sk provides valuable theoretical information on the loss of orthogonality among the columns of any such Vk. Here properties of Sk are developed in general, and used to show the various properties of Vk that can occur. In particular, we show that the Jordan canonical form (JCF) of Sk can reveal important properties that are not available from the singular value decomposition (SVD) of Sk.

The paper is organized as follows. In the next two sections we give a very brief history followed by the notation used here. Section 4 summarizes the theorem on unitary Q^{(k)} in (1), while section 5 derives some properties of Q^{(k)} that we need. Section 6 deals with the SVD of Sk, and shows how it defines important subspaces related to Vk. Section 7 introduces the JCF of Sk, then section 8 shows how this reveals more properties of Vk. These are new results for general Vk with unit length columns, so proofs are given in these sections 7 & 8. Section 9 summarizes the Lanczos process and the result of its rounding error analysis in [14], which shows that the finite precision Lanczos process behaves as a higher dimensional exact Lanczos process for a slightly perturbed (k+n) × (k+n) matrix Ak. Section 10 states a theorem on how the Lanczos process converges, and then uses the JCF of Sk to reveal some surprising numerical behaviors of the Lanczos process and therefore of some other numerical iterative algorithms.

2. A very brief history of the Lanczos process and orthogonalization

Although the orthogonal tridiagonalization of a Hermitian matrix A devised by Cornelius Lanczos [8] is simple and elegant mathematically, its numerical behavior has fascinated many for 70 years. The Lanczos process was originally discarded because of its loss of orthogonality, then brought back in importance and very gradually understood. There have been many useful works on this resuscitation, such as [11, 12, 20, 9, 10, 14, 15]. The ideas behind the Lanczos process led to other valuable algorithms such as in [4, 16, 2], and there has also been work on the sensitivity of the tridiagonal matrix and vectors resulting from the Lanczos process to perturbations in A, see for example [18]. But an understanding of the loss of orthogonality of the Lanczos process turned out to be crucial. A breakthrough in our understanding of loss of orthogonality in general was initiated by a comment by Charles Sheffield [21] to Gene Golub, which Gene related to Åke Björck and Chris Paige around 1990, see [1]. This concerned the loss of orthogonality in modified Gram-Schmidt (MGS), but it was shown in [13] that it could be extended to apply to any sequence of unit-length vectors vj. A more complete background of this is given in [13, Section 2.2]. This approach was applied in [14] to give an augmented backward stability result for the Hermitian matrix Lanczos process [8], and this was used in [15] to prove the iterative convergence of the Lanczos process for the eigenproblem and solution of equations, along with more history in [15, Section 2]. Here we look more deeply into the properties of Sk in (1) and what it tells us about loss of orthogonality in general.

3. Notation

We use “≜” for “is defined to be”, and “≡” for “is equivalent to”. Let In denote the n × n unit matrix, with j-th column ej. We say $Q_1 \in \mathbb{C}^{n\times k}$ has orthonormal columns if $Q_1^H Q_1 = I_k$ and write $Q_1 \in \mathcal{U}^{n\times k}$. For a vector v, we denote its Euclidean norm by $\|v\|_2 \triangleq \sqrt{v^Hv}$. For a matrix $B = [b_1, b_2, \ldots, b_m] \in \mathbb{C}^{n\times m}$ we denote its Frobenius norm by $\|B\|_F$, its spectral norm by $\|B\|_2 \triangleq \sigma_{\max}(B)$, the maximum singular value of B, and its range by Range(B). For indices, i:j means i, i+1, ..., j, while $B_{i:j} \equiv [b_i, b_{i+1}, \ldots, b_j]$.

We will be dealing with sequences of matrices of increasing dimensions, and will use the index k to denote the k-th matrix in a sequence, usually as a superscript, as for example Q^{(k)}, in which case subscripts denote partitioning, as in $Q^{(k)} \equiv [Q_1^{(k)}\,|\,Q_2^{(k)}]$. We often omit the particular superscript $\cdot^{(k)}$ when the meaning is clear. However there are five special matrices where we denote the k-th matrix by a subscript: Vk, Uk, Sk, Tk, and Ak. For these the (k+1)-st matrix can be obtained from the k-th by adding a column, e.g., $V_{k+1} = [V_k, v_{k+1}]$, or a column and a row, and there is no need for further subscripts. This makes their presentation and manipulation easier to understand in formulae.

4. Obtaining a unitary matrix from unit-length n-vectors

The next theorem was given in full with proofs in [13]. It allows us to develop a (k+n) × (k+n) unitary matrix Q^{(k)} from any n × k matrix Vk with unit-length columns.

Theorem 1 ([13, Theorem 2.1]). For integers n ≥ 1 and k ≥ 1 suppose that $v_j \in \mathbb{C}^n$ satisfies $\|v_j\|_2 = 1$, j = 1:k+1, and $V_k = [v_1, \ldots, v_k]$. If Uk is the strictly upper triangular matrix satisfying $V_k^H V_k = I + U_k + U_k^H$, define the strictly upper triangular matrix Sk via
$$S_k \triangleq (I_k + U_k)^{-1}U_k = U_k(I_k + U_k)^{-1} \in \mathbb{C}^{k\times k}. \qquad (2)$$
Then

$$\|S_k\|_2 \le 1; \qquad V_k^H V_k = I \Leftrightarrow \|S_k\|_2 = 0; \qquad V_k^H V_k \text{ singular} \Leftrightarrow \|S_k\|_2 = 1. \qquad (3)$$

Here Sk is the unique strictly upper triangular k × k matrix such that
$$Q^{(k)} \equiv \begin{bmatrix} Q_1^{(k)} & Q_2^{(k)} \end{bmatrix} \equiv \begin{bmatrix} Q_{11}^{(k)} & Q_{12}^{(k)} \\ Q_{21}^{(k)} & Q_{22}^{(k)} \end{bmatrix} \triangleq \begin{bmatrix} S_k & (I_k - S_k)V_k^H \\ V_k(I_k - S_k) & I_n - V_k(I_k - S_k)V_k^H \end{bmatrix} \in \mathcal{U}^{(k+n)\times(k+n)}. \qquad (4)$$
Finally Sk and Sk+1 have the following relations
$$S_{k+1} \equiv \begin{bmatrix} S_k & s_{k+1} \\ 0 & 0 \end{bmatrix} \in \mathbb{C}^{(k+1)\times(k+1)}, \qquad s_{k+1} = (I_k - S_k)V_k^H v_{k+1}. \qquad (5)$$
Here is an indication of a proof. From (2) it can be shown that

$$U_kS_k = S_kU_k, \qquad U_k = (I_k - S_k)^{-1}S_k \equiv S_k(I_k - S_k)^{-1}, \qquad (I_k - S_k)^{-1} = I_k + U_k. \qquad (6)$$

To prove $Q_1^{(k)H}Q_1^{(k)} = I_k$ in (4), use (2) and (6) to give (dropping $\cdot^{(k)}$ and $\cdot_k$):
$$Q_1 \equiv \begin{bmatrix} S \\ V(I - S) \end{bmatrix} = \begin{bmatrix} U(I + U)^{-1} \\ V(I + U)^{-1} \end{bmatrix} = \begin{bmatrix} U \\ V \end{bmatrix}(I + U)^{-1},$$
$$Q_1^HQ_1 = (I + U)^{-H}[V^HV + U^HU](I + U)^{-1} = (I + U)^{-H}[I + U + U^H + U^HU](I + U)^{-1} = (I + U)^{-H}[(I + U)^H(I + U)](I + U)^{-1} = I.$$

This was given in [15, §4]. Next, for example, $\|S_k\|_2 \le 1$ in (3) follows immediately. Finally, the first equation in (5) follows from the definition of Sk, and to prove the second equation in (5) we see from (6) that $S_{k+1} = (I_{k+1} - S_{k+1})U_{k+1}$, so that
$$\begin{bmatrix} s_{k+1} \\ 0 \end{bmatrix} = S_{k+1}e_{k+1} = (I_{k+1} - S_{k+1})U_{k+1}e_{k+1} = (I_{k+1} - S_{k+1})\begin{bmatrix} V_k^Hv_{k+1} \\ 0 \end{bmatrix} = \begin{bmatrix} (I_k - S_k)V_k^Hv_{k+1} \\ 0 \end{bmatrix}.$$

5. Some properties of Q^{(k)} in Equation (4)

Our analysis uses properties of the sub-blocks of Q^{(k)} in (4), see [15, §6]. From (5) $s_{k+1} = (I_k - S_k)V_k^Hv_{k+1} = Q_{12}^{(k)}v_{k+1}$, so together with (4)

$$Q_{22}^{(k)}v_{k+1} = [I_n - V_k(I_k - S_k)V_k^H]v_{k+1} = v_{k+1} - V_ks_{k+1} = Q_{21}^{(k+1)}e_{k+1}, \qquad (7)$$
$$q^{(k+1)} \triangleq \begin{bmatrix} s_{k+1} \\ v_{k+1} - V_ks_{k+1} \end{bmatrix} = \begin{bmatrix} Q_{12}^{(k)} \\ Q_{22}^{(k)} \end{bmatrix}v_{k+1} = Q_2^{(k)}v_{k+1}. \qquad (8)$$

For j = 1:k+1 define the orthogonal projectors $P_j \triangleq I_n - v_jv_j^H$. Because Sk is strictly upper triangular we see $S_1 = 0$, so from (4) we have $Q_{22}^{(1)} = P_1$, and we use

$$Q_{21}^{(k)} = V_k(I_k - S_k), \qquad Q_{12}^{(k)} = (I_k - S_k)V_k^H, \qquad Q_{22}^{(k)} = I_n - Q_{21}^{(k)}V_k^H,$$
$$V_{k+1} = [V_k, v_{k+1}], \qquad I_{k+1} - S_{k+1} = \begin{bmatrix} I_k - S_k & -s_{k+1} \\ 0 & 1 \end{bmatrix},$$

to prove several things with (8); in particular we now prove that $Q_{22}^{(k)} = P_1\cdots P_k$.

$$Q_{21}^{(k+1)} = V_{k+1}(I_{k+1} - S_{k+1}) = [V_k(I_k - S_k),\; v_{k+1} - V_ks_{k+1}] = [Q_{21}^{(k)},\; Q_{22}^{(k)}v_{k+1}], \qquad (9)$$
$$Q_{22}^{(k+1)} = I_n - Q_{21}^{(k+1)}V_{k+1}^H = I_n - Q_{21}^{(k)}V_k^H - Q_{22}^{(k)}v_{k+1}v_{k+1}^H = Q_{22}^{(k)}(I_n - v_{k+1}v_{k+1}^H), \qquad (10)$$

from which $Q_{22}^{(k+1)} = P_1\cdots P_kP_{k+1}$ follows by induction. The decrease in $\|Q_{22}^{(k)}\|_F$ is crucial for proving convergence and accuracy of the finite precision Lanczos process, so we discuss it here. First $\|Q_{22}^{(k+1)}\|_2 \le \|Q_{22}^{(k)}\|_2$ because

$$Q_{22}^{(k+1)}Q_{22}^{(k+1)H} = Q_{22}^{(k)}(I_n - v_{k+1}v_{k+1}^H)Q_{22}^{(k)H} = Q_{22}^{(k)}Q_{22}^{(k)H} - Q_{22}^{(k)}v_{k+1}v_{k+1}^HQ_{22}^{(k)H}.$$

This and (9) with $Q_{22}^{(0)} \triangleq I_n$ show how $\|Q_{22}^{(k)}\|_F$ decreases:

$$\|Q_{22}^{(k+1)}\|_F^2 = \mathrm{trace}[Q_{22}^{(k+1)}Q_{22}^{(k+1)H}] = \|Q_{22}^{(k)}\|_F^2 - \|Q_{22}^{(k)}v_{k+1}\|_2^2. \qquad (11)$$

Ideally $S_{k+1} = 0$, so in (7) $Q_{22}^{(k)}v_{k+1} = v_{k+1}$, and $\|Q_{22}^{(k)}\|_F^2 = n - k$ decreases by 1 each step until $Q_{22}^{(n)} = 0$ and $V_n \in \mathcal{U}^{n\times n}$, see (4). But with loss of orthogonality $\|Q_{22}^{(k)}\|_F^2$ can decrease far more slowly. This can lead to dramatic slowdown of the Lanczos process.
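
As a small numerical illustration (a sketch only: the random Vk, the real arithmetic, and the sizes n = 6, k = 4 are assumptions, not an example from the paper), the following Python/NumPy code builds Sk from (2), forms Q^{(k)} as in (4), and checks both that Q^{(k)} is orthogonal (the real analogue of unitary) and that $Q_{22}^{(k)} = P_1\cdots P_k$ as derived above:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 6, 4                                  # illustrative sizes
    Vk = rng.standard_normal((n, k))
    Vk /= np.linalg.norm(Vk, axis=0)             # unit 2-norm columns

    Uk = np.triu(Vk.T @ Vk, 1)                   # strictly upper part of Vk^H Vk
    Sk = np.linalg.solve(np.eye(k) + Uk, Uk)     # Sk = (I + Uk)^{-1} Uk, eq. (2)

    I_k, I_n = np.eye(k), np.eye(n)
    Q = np.block([[Sk,               (I_k - Sk) @ Vk.T],
                  [Vk @ (I_k - Sk),  I_n - Vk @ (I_k - Sk) @ Vk.T]])
    print(np.allclose(Q.T @ Q, np.eye(n + k)))   # Q^{(k)} is orthogonal, eq. (4)

    P = I_n.copy()
    for j in range(k):                           # accumulate P_1 P_2 ... P_k
        P = P @ (I_n - np.outer(Vk[:, j], Vk[:, j]))
    print(np.allclose(Q[k:, k:], P))             # Q_22^{(k)} = P_1 ... P_k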

6. The singular value decomposition (SVD) of Sk

We now develop the theoretical SVD $S_k = W^{(k)}\Sigma^{(k)}P^{(k)H}$ when Sk in (2) arises from any matrix Vk with unit-length columns. We remind the reader that we often omit the superscript $\cdot^{(k)}$ for readability, and write, e.g., $S_k = W\Sigma P^H$. From (3) $\sigma_{\max}(S_k) \le 1$, and any unit singular value of Sk will be important in this analysis. Also if $V_k^HV_k = I$ then $S_k = 0$ in (2), and it will help to label each singular vector of Sk according to its zero, unit, or in between singular values. Briefly, zero singular values correspond to no loss of orthogonality, unit singular values to loss of linear independence, and intermediate singular values to loss of orthogonality but not loss of linear independence. The rest of this section comes from [19].

Definition 1 (Partitioned SVD of Sk, [19, §4]). Let the k × k matrix Sk in Theorem 1 have mk unit and nk zero singular values with SVD

$$S_k = W\Sigma P^H = W_1P_1^H + W_2\Sigma_2P_2^H, \qquad I - S_kS_k^H = W\Gamma^2W^H = W_2\Gamma_2^2W_2^H + W_3W_3^H, \qquad (12)$$
$$W \equiv W^{(k)} \equiv [w_1^{(k)}, \ldots, w_k^{(k)}] \equiv [\underset{m_k}{W_1},\ \underset{\ell_k}{W_2},\ \underset{n_k}{W_3}] \in \mathcal{U}^{k\times k}, \qquad k = \ell_k + m_k + n_k,$$
$$P \equiv P^{(k)} \equiv [p_1^{(k)}, \ldots, p_k^{(k)}] \equiv [\underset{m_k}{P_1},\ \underset{\ell_k}{P_2},\ \underset{n_k}{P_3}] \in \mathcal{U}^{k\times k},$$

$$\Sigma \equiv \Sigma^{(k)} \equiv \mathrm{diag}(\sigma_1, \ldots, \sigma_k) \equiv \mathrm{diag}(I_{m_k}, \Sigma_2, O_{n_k}), \qquad \Sigma_2 \in \mathbb{R}^{\ell_k\times\ell_k},$$
$$\Gamma^2 \triangleq I_k - \Sigma^2, \qquad \Gamma \equiv \Gamma^{(k)} \equiv \mathrm{diag}(\gamma_1, \ldots, \gamma_k) \equiv \mathrm{diag}(O_{m_k}, \Gamma_2, I_{n_k}), \qquad \Gamma_2 \text{ positive definite},$$

where the singular values $\sigma_j$, 1 ≤ j ≤ k, of Sk in $\Sigma \equiv \Sigma^{(k)}$ are arranged as follows,

$$1 = \sigma_1 = \cdots = \sigma_{m_k} > \sigma_{m_k+1} \ge \cdots \ge \sigma_{m_k+\ell_k} > \sigma_{m_k+\ell_k+1} = \cdots = \sigma_k = 0. \qquad (13)$$

These singular vectors of Sk combine with (4) to reveal key properties of Vk:

  " # " # (k) SkP W1 W2Σ2 0 W1 W2Σ2 0 Q1 P = = ≡ , (14) Vk(Ik −Sk)P Vk(P1 −W1) Vk(P2 −W2Σ2) VkP3 0 Ve2Γ2 Ve3 where Ve2 and Ve3 are formally defined in the following theorem and it is easy to verify that [Ve2, Ve3] has orthonormal columns. The first equality in (14) follows from the structure of (k) (k) Q , and the second by applying (12). But the columns of Q1 P are orthonormal, giving the structure in the fourth expression. The fourth expression reveals the null space of Vk and indicates the columns of [Ve2, Ve3] span Range(Vk) due to the fact that Γ2 > 0 and (I − Sk)P in the second expression is nonsingular. This structure was used in proving the following theorem.

Theorem 2 (Range & null space of Vk,[19, Theorem 4.2]). With the notation in Theorem 1 and Definition 1, define

$$\widetilde V_2 \triangleq V_k(P_2 - W_2\Sigma_2)\Gamma_2^{-1}, \qquad \widetilde V_3 \triangleq V_kP_3, \qquad \widehat V_2 \triangleq V_k(W_2 - P_2\Sigma_2)\Gamma_2^{-1}, \qquad \widehat V_3 \triangleq V_kW_3.$$

Let the columns of $\widehat V_0$ comprise an orthonormal basis of $\mathrm{Range}(V_k)^\perp$. Then defining $\widetilde V^{(k)} \triangleq [\widehat V_0, \widetilde V_2, \widetilde V_3]$ and $\widehat V^{(k)} \triangleq [\widehat V_0, \widehat V_2, \widehat V_3]$,

$$\mathrm{Range}(V_k) = \mathrm{Range}([\widetilde V_2, \widetilde V_3]) = \mathrm{Range}([\widehat V_2, \widehat V_3]) \perp \mathrm{Range}(\widehat V_0), \qquad \mathrm{rank}(V_k) = k - m_k, \qquad (15)$$

$$\mathcal{N}(V_k) = \mathrm{Range}(P_1 - W_1), \qquad P_1 - W_1 \in \mathbb{C}^{k\times m_k}, \qquad \mathrm{rank}(P_1 - W_1) = m_k, \qquad (16)$$
$$\widetilde V \equiv \widetilde V^{(k)} \equiv [\widehat V_0, \widetilde V_2, \widetilde V_3] \in \mathcal{U}^{n\times n}, \qquad \widehat V \equiv \widehat V^{(k)} \equiv [\widehat V_0, \widehat V_2, \widehat V_3] \in \mathcal{U}^{n\times n}, \qquad (17)$$
$$Q_{22}^{(k)} = \begin{bmatrix} \widehat V_0 & \widetilde V_2 \end{bmatrix}\begin{bmatrix} I_{n-(k-m_k)} & 0 \\ 0 & -\Sigma_2 \end{bmatrix}\begin{bmatrix} \widehat V_0^H \\ \widehat V_2^H \end{bmatrix} = \widehat V_0\widehat V_0^H - \widetilde V_2\Sigma_2\widehat V_2^H, \qquad (18)$$
where $\mathrm{rank}(P_1 - W_1) = m_k$ follows since $(I - S_k)P_1 = P_1 - W_1$ and $I - S_k$ is nonsingular.

Some singular values of Sk have a useful persistency with increasing k.

Remark 1. It was shown in [19, §5] that if in Definition 1 we call $\{1, w_j^{(k)}, p_j^{(k)}\}$ for $j = 1:m_k$ the unit singular triplets (or unit triplets) of Sk, and $\{0, w_j^{(k)}, p_j^{(k)}\}$ for $j = m_k + \ell_k + 1 : k$ (see (13)) zero singular triplets (or zero triplets) of Sk, then for any unit (or zero) triplet of Sk there is a related unit (or zero) triplet for any $S_\ell$ having Sk as leading principal submatrix. For example if $S_kp = 0$ then $S_\ell\begin{bmatrix} p \\ 0 \end{bmatrix} = 0$, so $\begin{bmatrix} p \\ 0 \end{bmatrix}$ is always a singular vector. See [19, Remark 5.1] for more details.

Remark 2. In Definition 1 and Remark 1 it can be seen that W1 and P1 are arbitrary up to multiplication on the right by the same orthogonal transformation $Z \in \mathcal{U}^{m_k\times m_k}$, since $W_1P_1^H = (W_1Z)(P_1Z)^H$, while P3 and W3 are each arbitrary up to individual right orthogonal transformations. It follows that $\widetilde V_3 = V_kP_3$ in (14) is arbitrary up to a right orthogonal transformation. Then (14) shows that exactly nk orthonormal vectors $V_kP_3$ can be obtained via a right orthogonal transformation of Vk. We also see from (16) that each unit triplet of Sk corresponds to a unit loss of rank of Vk.
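
The structure in (14) and (16) is easy to observe numerically. The following toy example (a constructed assumption, not taken from the paper) uses a Vk with one exactly repeated column, so that $V_k^HV_k$ is singular; it computes the SVD of Sk and checks that the columns of $P_1 - W_1$ lie in the null space of Vk and that $V_kP_3$ has orthonormal columns:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 6
    v1 = rng.standard_normal(n); v1 /= np.linalg.norm(v1)
    v2 = rng.standard_normal(n); v2 /= np.linalg.norm(v2)
    Vk = np.column_stack([v1, v2, v1])          # v3 = v1, so rank(Vk) = 2 and m_k = 1
    k = Vk.shape[1]

    Uk = np.triu(Vk.T @ Vk, 1)
    Sk = np.linalg.solve(np.eye(k) + Uk, Uk)

    W, sig, PH = np.linalg.svd(Sk)              # Sk = W diag(sig) P^H, eq. (12)
    P = PH.T
    print(np.round(sig, 12))                    # generically: one 1, one in (0,1), one 0

    tol = 1e-12
    P1, W1 = P[:, sig > 1 - tol], W[:, sig > 1 - tol]   # unit singular vectors
    P3 = P[:, sig < tol]                                # zero singular vectors
    print(np.linalg.norm(Vk @ (P1 - W1)))       # ~0: N(Vk) = Range(P1 - W1), eq. (16)
    V3 = Vk @ P3                                # Vtilde_3 = Vk P3 is orthonormal
    print(np.allclose(V3.T @ V3, np.eye(V3.shape[1])))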

7. The Jordan canonical form (JCF) of Sk

Since Sk is strictly upper triangular its JCF is special, having all eigenvalues zero. If Vk in Theorem 1 had k random unit length n-vectors then we would expect rank(Vk) = k while k ≤ n, and to have $m_k = k - n$ unit singular values of Sk for k ≥ n, see (15). For k ≤ n we would also expect Sk to have one zero singular value with the other k − 1 being distributed in (0, 1), and the Jordan canonical form would probably be just one big Jordan block with no other interesting structure. Then $\|S_k\|_2$ would give a measure of the loss of orthogonality, but Sk would not give much else.

However the theory here is for understanding the structure in the loss of orthogonality when for example Vk comes from computations intended to produce $V_k \in \mathcal{U}^{n\times k}$, and then far more interesting properties can arise—there can be a lot of structure. Many properties we discuss arose with the computational Lanczos process analyzed in [15], and will presumably arise in some other large dimensional orthogonalization processes. We will need the Jordan canonical (or normal) form of Sk to fully describe some of these properties. For possible future use we give more properties of this JCF than needed here. Here we will state some standard results. Since Sk is nilpotent we will just state the theory for this case. The eigen-subspace for the zero eigenvalue is $\mathcal{N}(S_k) = \mathrm{Range}(P_3)$ in Definition 1, so the geometric multiplicity of this zero eigenvalue is nk.

Definition 2 (Jordan canonical form (JCF) of Sk. See, e.g., [7, §3.1.11]). For Sk in Definition 1 there exist $J \in \mathbb{R}^{k\times k}$ and nonsingular $Y \in \mathbb{C}^{k\times k}$ such that

$$S_kY = YJ, \qquad Y \triangleq [Y_1, \ldots, Y_{n_k}], \qquad J \triangleq \mathrm{diag}(J_1, \ldots, J_{n_k}), \qquad J_i \triangleq \begin{bmatrix} 0 & I_{\ell_i-1} \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{\ell_i\times\ell_i}. \qquad (19)$$

Writing $Y_i \triangleq [y_1^{(i)}, \ldots, y_{\ell_i}^{(i)}]$ gives $S_kY_i = Y_iJ_i$ and the complete chain

$$S_ky_1^{(i)} = 0, \qquad S_ky_j^{(i)} = y_{j-1}^{(i)}, \quad j = 2, 3, \ldots, \ell_i, \qquad (20)$$

where $\ell_i$ is the height or length of the chain, j is the grade of the principal vector $y_j^{(i)}$, and $y_1^{(i)}$ is the eigenvector. There are nk complete chains, $i = 1, 2, \ldots, n_k$ in (20). Partitioning $X \equiv [X_1, \ldots, X_{n_k}] \triangleq Y^{-H}$ identically to Y gives $X^HS_k = JX^H$ and with $X_i \triangleq [x_1^{(i)}, \ldots, x_{\ell_i}^{(i)}]$

$$S_k^HX_i = X_iJ_i^T; \qquad S_k^Hx_j^{(i)} = x_{j+1}^{(i)}, \quad j = 1, 2, \ldots, \ell_i - 1; \qquad S_k^Hx_{\ell_i}^{(i)} = 0, \qquad X_i^HY_i = I. \qquad (21)$$

A complete chain $Y_i = [y_1^{(i)}, \ldots, y_{\ell_i}^{(i)}]$ corresponds to the Jordan block $J_i$, and then $S_k^Hx_{\ell_i}^{(i)} = 0$. We say that $[y_1^{(i)}, y_2^{(i)}, \ldots, y_j^{(i)}]$, $1 \le j \le \ell_i$, is a chain, and it becomes a complete chain when $j = \ell_i$.
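
A tiny sketch (Python/NumPy; the 3 × 3 nilpotent matrix below is a generic illustration and is not an Sk obtained from some Vk) of a complete Jordan chain (20): pick a principal vector of grade 3 and generate the chain by $y_{j-1} = S y_j$:

    import numpy as np

    S = np.array([[0., 1., 2.],
                  [0., 0., 3.],
                  [0., 0., 0.]])                 # nilpotent: one 3x3 Jordan block

    y3 = np.array([0., 0., 1.])                  # principal vector of grade 3
    y2 = S @ y3                                  # grade 2:  S y3 = y2
    y1 = S @ y2                                  # grade 1 (eigenvector):  S y1 = 0
    Y1 = np.column_stack([y1, y2, y3])
    J1 = np.diag([1., 1.], k=1)                  # the single 3x3 Jordan block
    print(np.allclose(S @ Y1, Y1 @ J1))          # S Y1 = Y1 J1, eqs. (19)-(20)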

The principal vectors in a JCF can be far from unique, and so if Y in (19) does not already have desirable properties, we can alter it somewhat. We only need the theoretical JCF. We know a Y exists, but we need not compute Y or any of its transformations.

Remark 3. Applying standard nomenclature to our Sk (see for example Wilkinson [22, pp. 42–43]), any vector y which satisfies $S_k^jy = 0$, $S_k^{j-1}y \ne 0$, for integer j > 0 is called a (right side) principal vector of grade j of Sk. But the principal vectors are not unique, since if y is a principal vector of grade j then the same is true of any vector obtained by adding multiples of any vectors of grades not greater than j. If such changes produce $\widehat Y$ from Y in (19), this does not say that every chain in (20) will be preserved. We next show that every chain in (20) will be preserved if and only if $S_k\widehat Y = \widehat YJ$ where $\widehat Y = YG$ for some nonsingular G with certain properties.

Definition 3. A matrix is upper Toeplitz if it is zero except for an as large as possible upper triangular and Toeplitz matrix in the top right corner, for example

α β α β δ 0 α β 0 α , , 0 α β . (22)   0 0 α   0 0 0 0 α

Lemma 1 ([3, p. 159]). For Y and J in (19), $S_k\widehat Y = \widehat YJ$ for $\widehat Y = YG$ with nonsingular G if and only if JG = GJ. If we partition G conformably with J in (19), G will have $n_k^2$ blocks $G_{i,j} \in \mathbb{C}^{\ell_i\times\ell_j}$, and then, see Definition 3,

JG = GJ ⇔ JiGi,j = Gi,jJj, i = 1 : nk, j = 1 : nk, (23)

⇔ Gi,j is upper Toeplitz, i = 1 : nk, j = 1 : nk. (24)

Next with the same partitioning as G, $G^{-1}$ has all its sub-blocks upper Toeplitz. Then $\widehat X \triangleq \widehat Y^{-H}$ satisfies $\widehat X^HS_k = J\widehat X^H$, where $\widehat X^H = G^{-1}X^H$ with $X \triangleq Y^{-H}$. Finally if $JG_i = G_iJ$ for nonsingular $G_i$, i = 1, 2, then $JG_1G_2 = G_1G_2J$.

Proof. The first sentence holds since SkY = YJ and G is nonsingular, giving

$$S_k\widehat Y = \widehat YJ \ \Leftrightarrow\ S_k\widehat Y = S_kYG = YJG = \widehat YJ = YGJ \ \Leftrightarrow\ JG = GJ. \qquad (25)$$

Now $J_iG_{i,j} = G_{i,j}J_j$ in (23) if and only if $G_{i,j}$ is upper Toeplitz, see, for example, Gantmacher [3, p. 159], or just equate the elements on each side of $J_iG_{i,j} = G_{i,j}J_j$. Next for nonsingular G, $JG = GJ \Leftrightarrow G^{-1}J = JG^{-1}$ and therefore $G^{-1}$ has all its sub-blocks upper Toeplitz from (23). Then from $S_k\widehat Y = \widehat YJ$ we see that $\widehat X \triangleq \widehat Y^{-H}$ satisfies $\widehat X^HS_k = J\widehat X^H$, where $\widehat X^H = (YG)^{-1} = G^{-1}Y^{-1} = G^{-1}X^H$ with $X^H = Y^{-1}$ in $X^HS_k = JX^H$. Finally

JG1 = G1J & JG2 = G2J ⇒ JG1G2 = G1JG2 = G1G2J. (26)

We see from (26) that we can transform Y in $S_kY = YJ$ to $\widehat Y \triangleq YG_1G_2\cdots G_m$ having various different forms using a sequence of nonsingular matrices $G_i$ that commute with J. First we show that we can always make each non-eigenvector in a chain orthogonal to the eigenvector of that chain. This is true for the JCF of any matrix.

Lemma 2. For $S_kY_i = Y_iJ_i$, $i = 1:n_k$ in (19), we can choose nonsingular upper Toeplitz $G_{i,i} \in \mathbb{C}^{\ell_i\times\ell_i}$ so that with $\widehat Y_i \triangleq Y_iG_{i,i}$ we have $S_k\widehat Y_i = \widehat Y_iJ_i$ where $\widehat Y_i^H\widehat Y_ie_1 = e_1$. If $\|Y_ie_1\|_2 = 1$ then $\widehat Y_ie_1 = Y_ie_1$. Then for J in (19) $S_k\widehat Y = \widehat YJ$ where $\widehat Y \triangleq [Y_1G_{1,1}, \ldots, Y_{n_k}G_{n_k,n_k}]$.

Proof. We will prove the result for $Y_1$. The proofs for $Y_i$, $i = 2:n_k$ follow analogously. From Lemma 1, upper Toeplitz $G_{1,1}$ ensures that $J_1G_{1,1} = G_{1,1}J_1$, so with $\widehat Y_1 = Y_1G_{1,1}$

$$S_k\widehat Y_1 = S_kY_1G_{1,1} = Y_1J_1G_{1,1} = Y_1G_{1,1}J_1 = \widehat Y_1J_1, \qquad G_{1,1} \triangleq \begin{bmatrix} \gamma_1 & \gamma_2 & \cdots & \gamma_{\ell_1} \\ & \gamma_1 & \ddots & \vdots \\ & & \ddots & \gamma_2 \\ & & & \gamma_1 \end{bmatrix}.$$

Write $Y_1 \equiv [y_1, \ldots, y_{\ell_1}]$, $\widehat Y_1 \equiv [\hat y_1, \ldots, \hat y_{\ell_1}]$, and take $\gamma_1 = 1/\|y_1\|_2$ so that $\|\hat y_1\|_2 = 1$. Then we can make $\hat y_2, \hat y_3, \hat y_4, \ldots, \hat y_{\ell_1}$ orthogonal to $\hat y_1 = y_1\gamma_1$ via

$$\widehat Y_1 = Y_1G_{1,1} = [y_1\gamma_1,\ y_2\gamma_1 + y_1\gamma_2,\ y_3\gamma_1 + y_2\gamma_2 + y_1\gamma_3,\ y_4\gamma_1 + y_3\gamma_2 + y_2\gamma_3 + y_1\gamma_4,\ \ldots],$$
$$\gamma_2 = -y_1^Hy_2\gamma_1^3, \qquad \gamma_3 = -y_1^H(y_3\gamma_1 + y_2\gamma_2)\gamma_1^2, \qquad \gamma_4 = -y_1^H(y_4\gamma_1 + y_3\gamma_2 + y_2\gamma_3)\gamma_1^2, \qquad \ldots,$$
$$\gamma_{\ell_1} = -y_1^H(y_{\ell_1}\gamma_1 + y_{\ell_1-1}\gamma_2 + \cdots + y_2\gamma_{\ell_1-1})\gamma_1^2.$$

Finally if we also have $\|y_1\|_2 = 1$ then $\gamma_1 = 1$ and $\hat y_1 = y_1\gamma_1 = y_1$.

The following lemma shows that if we reorder the Jordan blocks, we can easily obtain the reordered JCF by reordering the complete chains in the same way.

Lemma 3. In Definition 2 we have $S_kY = YJ$. If we reorder the Jordan blocks by a permutation matrix $\Pi$ to give $\widehat J \triangleq \Pi^TJ\Pi$, then

$$S_k\widehat Y = \widehat Y\widehat J, \qquad \widehat Y \triangleq Y\Pi.$$

Proof. Since $S_kY = YJ$, $S_kY\Pi = Y\Pi\Pi^TJ\Pi$, leading to $S_k\widehat Y = \widehat Y\widehat J$.

For Sk in Theorem 1 we can obtain a lot of orthogonality in the principal vectors.

Theorem 3. Suppose that the JCF (19) of Sk in Theorem 1 has been permuted so that $\ell_1 \ge \ell_2 \ge \cdots \ge \ell_{n_k}$ in $S_kY = YJ$. We can always choose $G \in \mathbb{C}^{k\times k}$ to give the JCF:

$$S_k\widehat Y = \widehat YJ, \qquad \widehat Y \triangleq YG \equiv [\widehat Y_1, \widehat Y_2, \ldots, \widehat Y_{n_k}]; \qquad \widehat Y_j \in \mathbb{C}^{k\times\ell_j}, \quad j = 1:n_k, \qquad (27)$$
where every vector in $\widehat Y_j$ is orthogonal to the eigenvector $\widehat Y_ie_1$ of each previous block, $1 \le i < j$, every non-eigenvector in each block is orthogonal to the eigenvector of that block, and every eigenvector $\widehat Y_je_1$ has unit 2-norm, i.e.,

$$\widehat Y_j^H\widehat Y_ie_1 = 0, \quad i = 1:j-1, \ j = 2:n_k; \qquad \widehat Y_j^H\widehat Y_je_1 = e_1, \quad j = 1:n_k. \qquad (28)$$
Proof. The matrix $\widehat Y = YG$ in (27) is developed via a sequence of multiplications:

$$G \triangleq G_1G^{(2)}G_2G^{(3)}G_3\cdots G^{(n_k)}G_{n_k}; \qquad (29)$$
$$G_j \triangleq \mathrm{diag}(I_{\ell_1}, \ldots, I_{\ell_{j-1}}, G_{j,j}, I_{\ell_{j+1}}, \ldots, I_{\ell_{n_k}}), \qquad G_{j,j} \in \mathbb{C}^{\ell_j\times\ell_j} \text{ upper Toeplitz}, \qquad (30)$$
for $j = 1:n_k$, while

$$G^{(j)} \triangleq \begin{bmatrix} I & & & G_{1,j} & & \\ & \ddots & & \vdots & & \\ & & I & G_{j-1,j} & & \\ & & & I & & \\ & & & & \ddots & \\ & & & & & I \end{bmatrix}, \qquad G_{i,j} \triangleq \begin{bmatrix} \gamma_{ij}^{(1)} & \gamma_{ij}^{(2)} & \cdots & \gamma_{ij}^{(\ell_j-1)} & \gamma_{ij}^{(\ell_j)} \\ & \gamma_{ij}^{(1)} & \gamma_{ij}^{(2)} & \cdots & \gamma_{ij}^{(\ell_j-1)} \\ & & \ddots & & \vdots \\ & & & & \gamma_{ij}^{(2)} \\ & & & & \gamma_{ij}^{(1)} \\ & & & & \end{bmatrix} \in \mathbb{C}^{\ell_i\times\ell_j}, \qquad (31)$$

for $i = 1:j-1$ and $j = 2:n_k$. The matrices $G_j$ and $G^{(j)}$ are nonsingular and commute with J in (19). Lemma 1 with (29) then shows that $\widehat Y = YG$ satisfies (27). Note that the first multiplication $YG_1$ in (29) only alters $Y_1$ to become $\widehat Y_1$, which then remains unchanged in later steps. Then for j > 1, $Y_j$ is unchanged until step j,

$$\bigl(YG_1G^{(2)}G_2\cdots G^{(j-1)}G_{j-1}\bigr)G^{(j)}G_j,$$

where since $\widehat Y_1, \widehat Y_2, \ldots, \widehat Y_{j-1}$ will have been obtained, multiplication by $G^{(j)}$ in (31) gives

$$\widetilde Y_j \triangleq \widehat Y_1G_{1,j} + \widehat Y_2G_{2,j} + \cdots + \widehat Y_{j-1}G_{j-1,j} + Y_j \qquad (32)$$
and multiplication by $G_j$ in (30) creates

$$\widehat Y_j \triangleq \widetilde Y_jG_{j,j}, \qquad (33)$$
which then remains unchanged in later steps. In (32), $G_{1,j}, \ldots, G_{j-1,j}$ are designed to give $(\widehat Y_ie_1)^H\widetilde Y_j = 0$, $i = 1:j-1$, so that $(\widehat Y_ie_1)^H\widehat Y_j = (\widehat Y_ie_1)^H\widetilde Y_jG_{j,j} = 0$, see (28). Then in (33), $G_{j,j}$ gives $(\widehat Y_je_1)^H\widehat Y_j = e_1^T$, see (28). When $\widetilde Y_j$ is available, $G_{j,j}$ can be derived via Lemma 2 applied to $\widetilde Y_j$. Note that once $G_{1,1}$ is found, (28) holds for j = 1, since the first expression in (28) is non-existent when j = 1. Suppose (28) holds up to step j − 1. Then since $(\widehat Y_ie_1)^H\widehat Y_m = 0$ for $m = i+1:j-1$, and $(\widehat Y_ie_1)^H\widehat Y_i = e_1^T$ for i < j, applying $(\widehat Y_ie_1)^H$ to $\widetilde Y_j$ in (32) gives for i = 1:

$$(\widehat Y_1e_1)^H\widetilde Y_j = (\widehat Y_1e_1)^H(\widehat Y_1G_{1,j} + \widehat Y_2G_{2,j} + \cdots + \widehat Y_{j-1}G_{j-1,j} + Y_j) = e_1^TG_{1,j} + (\widehat Y_1e_1)^HY_j,$$
and setting this to zero gives, see (31),

$$[\gamma_{1,j}^{(1)}, \gamma_{1,j}^{(2)}, \cdots, \gamma_{1,j}^{(\ell_j-1)}, \gamma_{1,j}^{(\ell_j)}] = e_1^TG_{1,j} = -(\widehat Y_1e_1)^HY_j,$$
which fully defines $G_{1,j}$ since it is upper Toeplitz. We then find $G_{i,j}$ for $i = 2:j-1$ via

$$(\widehat Y_ie_1)^H\widetilde Y_j = (\widehat Y_ie_1)^H(\widehat Y_1G_{1,j} + \widehat Y_2G_{2,j} + \cdots + \widehat Y_{j-1}G_{j-1,j} + Y_j)$$
$$= (\widehat Y_ie_1)^H(\widehat Y_1G_{1,j} + \widehat Y_2G_{2,j} + \cdots + \widehat Y_{i-1}G_{i-1,j}) + e_1^TG_{i,j} + (\widehat Y_ie_1)^HY_j,$$
where $G_{1,j}, G_{2,j}, \ldots, G_{i-1,j}$ will be known at this point, so setting $(\widehat Y_ie_1)^H\widetilde Y_j = 0$ fully defines upper Toeplitz $G_{i,j}$, giving

$$e_1^TG_{i,j} = -(\widehat Y_ie_1)^H(Y_j + \widehat Y_1G_{1,j} + \widehat Y_2G_{2,j} + \cdots + \widehat Y_{i-1}G_{i-1,j}).$$

Thus having found $G_{1,j}, G_{2,j}, \ldots, G_{j-1,j}$ we can form $\widetilde Y_j$ in (32), and then $\widehat Y_j = \widetilde Y_jG_{j,j}$ via Lemma 2, and so on for $j = 2:n_k$, which completes the proof.

A simpler result that holds for any ordering of Jordan blocks is covered by Theorem 3, and is summarized here for convenience.

Corollary 1. For a theoretical JCF SkY = YJ of Sk in Theorem 1 we can assume that each complete chain in Y has every non-eigenvector in that chain orthogonal to the eigenvector of that chain, and that all right eigenvectors of Sk are orthonormal.
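
The recursion in the proof of Lemma 2 is easy to implement. The sketch below (illustrative Python/NumPy; the function name and the toy 3 × 3 chain are assumptions, not the paper's algorithm) post-multiplies a chain by a nonsingular upper Toeplitz matrix so that the chain relation is preserved while every non-eigenvector becomes orthogonal to the unit-norm eigenvector:

    import numpy as np

    def orthogonalize_chain(Y):
        # Y = [y1,...,yl] is a Jordan chain: S @ Y = Y @ J (one block).  Returns
        # Yhat = Y @ G with G nonsingular upper Toeplitz, so S @ Yhat = Yhat @ J
        # still holds, yhat_1 = y1/||y1||_2, and yhat_j _|_ yhat_1 for j > 1.
        l = Y.shape[1]
        g = np.zeros(l, dtype=Y.dtype)
        g[0] = 1.0 / np.linalg.norm(Y[:, 0])            # gamma_1
        for m in range(1, l):                           # gamma_{m+1}, as in Lemma 2
            s = sum(Y[:, m - i] * g[i] for i in range(m))
            g[m] = -g[0] ** 2 * np.vdot(Y[:, 0], s)
        G = sum(np.diag(np.full(l - m, g[m]), k=m) for m in range(l))
        return Y @ G, G

    # tiny test: a single chain of a 3x3 nilpotent matrix
    S = np.array([[0., 1., 2.], [0., 0., 3.], [0., 0., 0.]])
    y3 = np.array([0., 0., 1.]); y2 = S @ y3; y1 = S @ y2
    Y, J = np.column_stack([y1, y2, y3]), np.diag([1., 1.], k=1)
    Yhat, G = orthogonalize_chain(Y)
    print(np.allclose(S @ Yhat, Yhat @ J))              # chain preserved
    print(np.abs(Yhat[:, 1:].T @ Yhat[:, 0]).max())     # ~0: orthogonality achieved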

8. Connections between the SVD and JCF principal vectors of Sk

We describe relationships among the SVD singular vectors of Sk, the JCF principal vectors of Sk, and Range(Vk). We use the abbreviations: “zero singular vectors” meaning singular vectors corresponding to zero singular values and “unit singular vectors” meaning singular vectors corresponding to unit singular values.

Remark 4. It is important to remember that here the vectors $v_j$ are theoretical in that they satisfy $\|v_j\|_2 = 1$ exactly, so that everything derived from them: Sk, its SVD and JCF, are exact theoretical objects, and the results in this section hold exactly. So when we discuss zero and unit singular values of Sk we mean these exactly. But the computational cases seem to mimic these properties quite closely. The computed Lanczos vectors $v_j$ in sections 9 & 10 do not satisfy $\|v_j\|_2 = 1$ exactly, even though they are computationally normalized. Nevertheless in some examples using MATLAB™ with IEEE double precision floating-point arithmetic (unit roundoff $u = 2^{-53} \approx 10^{-16}$), the computed SVD of the computed Sk from the computed Lanczos vectors gave singular values which were 0.000000000000000 and 1.000000000000000 to the limit of the printed output, i.e., they were accurate to within $10^{-15}$, see for example [15, Example 6.1].

First consider zero singular vectors.

Lemma 4. The JCF principal vectors and SVD singular vectors of Sk in Theorem 1 can be chosen to give a one-to-one correspondence between the eigenvectors and zero singular vectors of Sk. That is, for P3, W3 in Definition 1, with all the right eigenvectors of Sk in $[\widehat Y_1e_1, \ldots, \widehat Y_{n_k}e_1]$ and all the left eigenvectors in $[\widetilde X_1e_{\ell_1}, \ldots, \widetilde X_{n_k}e_{\ell_{n_k}}]^H$, these can be chosen so that

$$[\widehat Y_1e_1, \ldots, \widehat Y_{n_k}e_1] = P_3 \in \mathcal{U}^{k\times n_k}, \qquad [\widetilde X_1e_{\ell_1}, \ldots, \widetilde X_{n_k}e_{\ell_{n_k}}] = W_3 \in \mathcal{U}^{k\times n_k}. \qquad (34)$$

Proof. Since $S_ky = 0$ for $0 \ne y \in \mathbb{C}^k$ if and only if $y \in \mathrm{Range}([\widehat Y_1e_1, \ldots, \widehat Y_{n_k}e_1])$, and $S_k[\widehat Y_1e_1, \ldots, \widehat Y_{n_k}e_1] = 0$, we have $\mathrm{Range}([\widehat Y_1e_1, \ldots, \widehat Y_{n_k}e_1]) = \mathrm{Range}(P_3)$. Also from Corollary 1 $[\widehat Y_1e_1, \ldots, \widehat Y_{n_k}e_1]$ can be taken to be orthonormal. But P3 is arbitrary up to a right orthogonal transformation, see Remark 2, so we can choose $P_3 = [\widehat Y_1e_1, \ldots, \widehat Y_{n_k}e_1]$. A similar argument proves the rest of (34).

The left and right eigenvectors of Sk form pairs, one pair for each Jordan block, e.g., $y_1^{(i)}$ in (20) and $x_{\ell_i}^{(i)}$ in (21), whereas from Remark 2 both the left and right zero singular vectors are somewhat arbitrary, and it is not clear which two go together. Therefore the zero singular triplets in Remark 1 could have been more carefully defined in terms of eigenvectors of Sk. Here the JCF of Sk has revealed more structure than the SVD.

Theorem 4. For Sk and Vk in Theorem 1, if $\check Y, \check X \in \mathbb{C}^{k\times t}$, then
$$S_k\check Y = 0 \ \&\ \check Y^H\check Y = I_t \ \Rightarrow\ \check Y^HV_k^HV_k\check Y = I_t, \qquad (35)$$
$$S_k^H\check X = 0 \ \&\ \check X^H\check X = I_t \ \Rightarrow\ \check X^HV_k^HV_k\check X = I_t.$$
Proof. These follow by forming $Q_1^{(k)}\check Y$ and $[\check X^H, 0]Q^{(k)}$ in (4). If $\check Y^H\check Y = \check X^H\check X = I_t$,
$$S_k\check Y = 0 \ \Rightarrow\ Q_1^{(k)}\check Y = \begin{bmatrix} S_k \\ V_k(I_k - S_k) \end{bmatrix}\check Y = \begin{bmatrix} 0 \\ V_k\check Y \end{bmatrix} \in \mathcal{U}^{(k+n)\times t}, \qquad (36)$$
$$S_k^H\check X = 0 \ \Rightarrow\ [\check X^H, 0]Q^{(k)} = [\check X^HS_k,\ \check X^H(I_k - S_k)V_k^H] = [0,\ \check X^HV_k^H], \qquad V_k\check X \in \mathcal{U}^{n\times t}.$$

From the above theorem it follows that orthonormal eigenvectors $\widehat Y_1e_1, \ldots, \widehat Y_{n_k}e_1$ of Sk lead to the orthonormal matrix $V_k[\widehat Y_1e_1, \ldots, \widehat Y_{n_k}e_1]$. This occurs in the finite precision Lanczos process, see Theorem 8, and is a nice theoretical way of showing what it maintains. The next theorem involves unit singular triplets of Sk.

Theorem 5. Let Vk and Sk be as in Theorem 1 where Sk has the JCF in Definition 2. The results here will hold for any complete block of right principal vectors $Y_i$ in (19), but for simplicity we will just derive the results for $Y_1 \equiv Y_{1:\ell_1} = [y_1, \ldots, y_{\ell_1}]$ such that $S_kY_1 = Y_1J_1$. Suppose there exists an integer t, $2 \le t \le \ell_1$, such that $\|y_1\|_2 = \|y_t\|_2$, where we have scaled so that $\|y_1\|_2 = 1$, then

$$\|V_ky_1\|_2 = 1, \quad \|S_k\|_2 = 1; \qquad \|y_j\|_2 = 1, \quad j = 1:t; \qquad (37)$$
$$V_ky_j = V_ky_{j-1}, \quad \|V_ky_j\|_2 = 1, \quad y_{j-1} = S_ky_j, \quad y_j = S_k^Hy_{j-1}, \quad j = 2:t. \qquad (38)$$
For future reference we state the last three expressions in (38) in two compact forms:

$$S_kY_{2:t} = Y_{1:t-1}, \quad S_k^HY_{1:t-1} = Y_{2:t}; \qquad Y_{2:t}^HY_{2:t} = Y_{1:t-1}^HY_{1:t-1}, \quad Y_{1:t}^HY_{1:t} = I_t. \qquad (39)$$

Finally the P3, W1, and P1 in Definition 1 can be chosen such that

P3 = [y1,...],W1 = [y1, y2, . . . , yt−1,...],P1 = [y2, y3, . . . , yt,...]. (40)

These principal vectors of Sk give unit singular triplets {1, yj, yj+1} of Sk, j = 1 : t−1, see (38). Note in (40) that the columns of Y2:t−1 are columns of both W1 and P1.

Proof. Since $S_ky_1 = 0$ and $\|y_1\|_2 = 1$, (4) gives $1 = \|Q_1^{(k)}y_1\|_2 = \|V_ky_1\|_2$. Next for $j = 2:t$, $y_{j-1} = S_ky_j$ so $\|y_{j-1}\|_2 = \|S_ky_j\|_2 \le \|y_j\|_2$, since from (3) $\|S_k\|_2 \le 1$. Thus $\|y_1\|_2 = \|y_t\|_2 = 1$ implies $\|y_j\|_2 = 1$, $j = 1:t$, and $\|S_k\|_2 = 1$, completing (37). Next from $\|y_j\|_2 = 1$ with (4), $\|Q_1^{(k)}y_j\|_2 = 1$, $j = 1:t$, so that for $j = 2:t$,
$$Q_1^{(k)}y_j = \begin{bmatrix} S_ky_j \\ V_k(I - S_k)y_j \end{bmatrix} = \begin{bmatrix} y_{j-1} \\ V_k(y_j - y_{j-1}) \end{bmatrix} \ \&\ \|y_{j-1}\|_2 = 1 \ \Rightarrow\ V_k(y_j - y_{j-1}) = 0, \qquad (41)$$
which with the first equality in (37) proves the first two equalities in (38). The third equality in (38) is just part of $S_kY_1 = Y_1J_1$. We then have

$$1 = y_{j-1}^Hy_{j-1} = y_{j-1}^HS_ky_j = y_j^H(S_k^Hy_{j-1}) \le \|y_j\|_2\|S_k^Hy_{j-1}\|_2 \le 1, \qquad j = 2:t, \qquad (42)$$

proving that $y_j = S_k^Hy_{j-1}$, the fourth equality in (38). Next (38) leads directly to the first two equalities in (39), and these two lead to the third equality in (39) since

$$Y_{2:t}^HY_{2:t} = Y_{2:t}^HS_k^HY_{1:t-1} = Y_{1:t-1}^HY_{1:t-1},$$

showing that $Y_{1:t}^HY_{1:t}$ is Toeplitz in (39). It is nonsingular since $y_1, \ldots, y_t$ are principal vectors of Sk. Now since $y_1^HS_k = 0$, it follows from the last equality in (38) that $y_1^Hy_j = y_1^HS_k^Hy_{j-1} = 0$ for $j = 2:t$. This gives $e_1^TY_{1:t}^HY_{1:t} = e_1^T$, and since $Y_{1:t}^HY_{1:t}$ is Toeplitz and Hermitian, it is the unit matrix, completing the proof of (39). Lemma 4 shows we can choose P3 to give $P_3 = [y_1, \ldots]$ in (40). From Definition 1, Range(P1) is the subspace of all eigenvectors of $S_k^HS_k$ with unit eigenvalues. But from (39) $S_k^HS_kY_{2:t} = Y_{2:t}$, so $Y_{2:t} \in \mathrm{Range}(P_1)$. Now $Y_{1:t}^HY_{1:t} = I$ in (39), so $Y_{2:t} = P_1Z_1$ for some $Z_1 \in \mathcal{U}^{m_k\times(t-1)}$. Then from (39) $Y_{1:t-1} = S_kY_{2:t} = W_1Z_1$ for the same $Z_1$, as required in Remark 2, completing the proof of (40).

The relationships $V_ky_1 = \cdots = V_ky_t$ in (38) with $Y_{1:t}^HY_{1:t} = I_t$ in (39) reveal how these principal vectors of Sk correspond to rank deficiency in Vk, more precisely than the unit singular vectors of Sk do in (12) and (16), since those do not reveal that $V_ky_1 = \cdots = V_ky_t$. The JCF here also shows how the zero singular vector (eigenvector) $y_1$ and the unit singular vectors (principal vectors) $y_2, \ldots, y_t$ are part of, and are ordered by, the same Jordan chain. The SVD reveals nothing of these relationships. These comments apply to all blocks $Y_i$ in (19). This essentially describes what happens with the Lanczos process when it produces repeated approximations to any single eigenvalue of A, see Remark 5 and Theorem 9. Each repeated approximation corresponds to a new unit singular vector (principal vector) in that Jordan block in the JCF of Sk corresponding to that eigenvalue.
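
Theorem 5 is easy to see in a toy example (a constructed assumption, not from the paper): take $v_3$ to be an exact copy of $v_1$, with $v_2$ orthogonal to $v_1$. Then Sk has the 2-chain $y_1 = e_1$, $y_2 = e_3$ with $\|y_1\|_2 = \|y_2\|_2 = 1$, and (37)–(38) can be checked directly:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 5
    v1 = rng.standard_normal(n); v1 /= np.linalg.norm(v1)
    w = rng.standard_normal(n); w -= v1 * (v1 @ w)       # make v2 orthogonal to v1
    v2 = w / np.linalg.norm(w)
    V = np.column_stack([v1, v2, v1])                    # v3 repeats v1

    U = np.triu(V.T @ V, 1)
    S = np.linalg.solve(np.eye(3) + U, U)                # here S = e1 e3^T
    y1, y2 = np.eye(3)[:, 0], np.eye(3)[:, 2]            # Jordan chain: S y2 = y1
    print(np.allclose(S @ y2, y1), np.allclose(S @ y1, 0))
    print(np.allclose(S.T @ y1, y2))                     # y2 = S^H y1, eq. (38)
    print(np.allclose(V @ y1, V @ y2),                   # V y1 = V y2, ||V y1||_2 = 1
          np.isclose(np.linalg.norm(V @ y1), 1.0))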

9. The Lanczos process

Given $A = A^H \in \mathbb{C}^{n\times n}$ and a vector $v_1 \in \mathbb{C}^n$ of unit-length, i.e., $v_1^Hv_1 = 1$, one good implementation of the Hermitian matrix tridiagonalization process of Cornelius Lanczos (see [8], [12, (2.1)–(2.8)], and, e.g., [5, §§10.1–10.3] and [6]) uses the following two 2-term recurrences. Compute $u_1 := Av_1$, then for $k = 1, 2, \ldots$

$$\left.\begin{aligned} &\alpha_k := v_k^Hu_k, \quad w_k := u_k - v_k\alpha_k, \quad \beta_{k+1} := +(w_k^Hw_k)^{1/2},\\ &\text{stop if } \beta_{k+1} \text{ is small enough, else}\\ &v_{k+1} := w_k/\beta_{k+1}, \quad u_{k+1} := Av_{k+1} - v_k\beta_{k+1}. \end{aligned}\right\} \qquad (43)$$
With $V_k = [v_1, \ldots, v_k] \in \mathbb{C}^{n\times k}$ in theory this gives after k steps

$$AV_k = V_kT_k + v_{k+1}\beta_{k+1}e_k^T = V_{k+1}T_{k+1,k}, \qquad V_k^HV_k = I_k, \qquad (44)$$

where $e_k^T = (0, \ldots, 0, 1)$, and we have real, symmetric, and tridiagonal
$$T_k = \begin{bmatrix} \alpha_1 & \beta_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_k \\ & & \beta_k & \alpha_k \end{bmatrix}.$$

In theory the process stops in at most n steps with βk+1 = 0.
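
A straightforward implementation of the two-term recurrence (43) is sketched below (Python/NumPy; the function name `lanczos` and the random test matrix are illustrative assumptions). In exact arithmetic it produces (44); in floating point the columns of Vk have unit norm to roundoff but may lose orthogonality, which is the subject of the rest of the paper.

    import numpy as np

    def lanczos(A, v1, k):
        # k steps of the Hermitian Lanczos recurrence (43); returns V_{k+1}, alphas, betas
        n = A.shape[0]
        V = np.zeros((n, k + 1)); alpha = np.zeros(k); beta = np.zeros(k + 1)
        V[:, 0] = v1 / np.linalg.norm(v1)
        u = A @ V[:, 0]
        for j in range(k):
            alpha[j] = V[:, j] @ u
            w = u - V[:, j] * alpha[j]
            beta[j + 1] = np.linalg.norm(w)
            if beta[j + 1] < 1e-14:               # "stop if beta_{k+1} is small enough"
                return V[:, :j + 1], alpha[:j + 1], beta[:j + 2]
            V[:, j + 1] = w / beta[j + 1]
            u = A @ V[:, j + 1] - V[:, j] * beta[j + 1]
        return V, alpha, beta

    # tiny check of (44):  A Vk = Vk Tk + v_{k+1} beta_{k+1} e_k^T
    rng = np.random.default_rng(3)
    B = rng.standard_normal((8, 8)); A = (B + B.T) / 2
    k = 5
    V, alpha, beta = lanczos(A, rng.standard_normal(8), k)
    Tk = np.diag(alpha) + np.diag(beta[1:k], 1) + np.diag(beta[1:k], -1)
    resid = A @ V[:, :k] - V[:, :k] @ Tk - np.outer(V[:, k] * beta[k], np.eye(k)[k - 1])
    print(np.linalg.norm(resid))                  # ~ machine precision times ||A||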

9.1. Finite precision effects

With finite precision computation, the columns of Vk each have a Euclidean norm that is 1 to almost machine precision, but with a possible severe loss of orthogonality. Because Vk can become very rank deficient, the process can continue indefinitely with $\beta_{k+1}$ never negligible, so that the resulting algorithms for finding eigenvalues or solving equations implemented in finite precision behave differently from the exact cases. To simplify this discussion we use the word "essentially" (without quotes) in the sense illustrated by: "essentially equal to" (also "≈") meaning "equal to within $O(\epsilon)\|A\|_2$", and "$\overset{\in}{\sim}$" similarly, where if $\|y\|_2 = 1$, then "$y \overset{\in}{\sim} \mathrm{Range}(P_3)$" means "$y + O(\epsilon)\|A\|_2 \in \mathrm{Range}(P_3)$". Here, together with the computer floating-point precision $\epsilon$, $O(\epsilon)$ may be polynomially dependent on the number of steps k, the dimension n of A, and the maximum number of nonzeros in a row of A, see [14, §3.2].

Definition 4. A possible solution to a given problem involving A or Tk is “backward stable” if it is the exact solution to that problem with a perturbed matrix A + δA or Tk + E where δA ≈ 0 or E ≈ 0 in the above sense.

A rounding error analysis of the Lanczos process led to the following result.

Theorem 6 ([14, Corollary 3.2]). After k finite precision steps of a good implementation (such as in (43)) of the Lanczos algorithm with $A = A^H$ and $v_1$ leading to the computed tridiagonal matrix Tk and $\beta_{k+1}$, let $V_{k+1} = [v_1, v_2, \ldots, v_{k+1}]$ be the matrix of computed Lanczos vectors normalized to have unit length. Then with Q^{(k)} in (4) we have an exact Lanczos process for the Hermitian matrix Ak in
$$A_k \triangleq \begin{bmatrix} T_k & 0 \\ 0 & A \end{bmatrix} + H^{(k)}, \qquad H^{(k)} = H^{(k)H} \equiv \begin{bmatrix} H_{11}^{(k)} & H_{12}^{(k)} \\ H_{21}^{(k)} & H_{22}^{(k)} \end{bmatrix}, \qquad \|H^{(k)}\|_2 \le O(\epsilon)\|A\|_2, \qquad (45)$$
$$A_k\begin{bmatrix} S_k \\ V_k(I - S_k) \end{bmatrix} = \begin{bmatrix} S_k \\ V_k(I - S_k) \end{bmatrix}T_k + \begin{bmatrix} s_{k+1} \\ v_{k+1} - V_ks_{k+1} \end{bmatrix}\beta_{k+1}e_k^T, \qquad (46)$$
$$\bigl[Q_1^{(k)}\ q^{(k+1)}\bigr] \triangleq \begin{bmatrix} S_k & s_{k+1} \\ V_k(I - S_k) & v_{k+1} - V_ks_{k+1} \end{bmatrix} \in \mathcal{U}^{(k+n)\times(k+1)}, \quad \text{see (8)}. \qquad (47)$$
Thus the computed $T_{k+1,k}$ is seen to be the exact result of k steps of an exact Lanczos process with exact orthogonality arising from the augmented Hermitian matrix Ak with its $O(\epsilon)\|A\|_2$ Hermitian backward error $H^{(k)}$, the only rounding error component. To help understanding, if $V_k^HV_k = I$ then Sk and $s_{k+1}$ will be zero, the top block-row of (46) will be zero, while the bottom block-row will correspond to the ideal Lanczos process. Theorem 6 shows what is happening in the finite precision Lanczos process. Let

$$AX = X\Lambda, \qquad X^HX = I_n, \qquad X \equiv [x_1, \ldots, x_n], \qquad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) \qquad (48)$$
denote the eigensystem of A, and (with $Y \equiv Y^{(k)}$ and $M \equiv M^{(k)}$)

$$T_kY = YM, \qquad Y^TY = I_k, \qquad M \triangleq \mathrm{diag}(\mu_1^{(k)}, \ldots, \mu_k^{(k)}), \qquad Y \triangleq [y_1^{(k)}, y_2^{(k)}, \ldots, y_k^{(k)}], \qquad (49)$$
the eigensystem of the computed Tk. There is a clash of notation here, in that X and Y were also used for the JCF of Sk, see for example Definition 2. This clash should cause no confusion if it is kept in mind. Both uses are somewhat standard, and the present notation is worth keeping for consistency with [14, 15]. If an eigenpair $\{\mu_j^{(k)}, y_j^{(k)}\}$ of Tk has converged (i.e., if $\beta_{k+1}|e_k^Ty_j^{(k)}| \approx 0$) then in the exact case of (44) $AV_ky_j^{(k)} \approx V_ky_j^{(k)}\mu_j^{(k)}$, and $\{\mu_j^{(k)}, V_ky_j^{(k)}\}$ is a backward stable eigenpair for A. As discussed in the following section, the computational Lanczos process modelled by (45)–(47) parallels this very nicely.
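
A small experiment (illustrative Python/NumPy; the spectrum, sizes, and thresholds are assumptions, not data from the paper) shows the behavior modelled above: after more steps than the dimension of A the computed Vk has lost orthogonality, the computed Sk acquires singular values extremely close to 1 (cf. Remark 4), and a converged Ritz pair $\{\mu_j^{(k)}, V_ky_j^{(k)}\}$ nevertheless has a small eigenpair residual.

    import numpy as np

    rng = np.random.default_rng(4)
    lam = np.concatenate([np.arange(1.0, 20.0), [100.0]])   # one isolated eigenvalue
    Qx, _ = np.linalg.qr(rng.standard_normal((20, 20)))
    A = Qx @ np.diag(lam) @ Qx.T
    n, k = 20, 40

    V = np.zeros((n, k + 1)); alpha = np.zeros(k); beta = np.zeros(k + 1)
    V[:, 0] = rng.standard_normal(n); V[:, 0] /= np.linalg.norm(V[:, 0])
    u = A @ V[:, 0]
    for j in range(k):                                      # recurrence (43)
        alpha[j] = V[:, j] @ u
        w = u - V[:, j] * alpha[j]
        beta[j + 1] = np.linalg.norm(w)
        V[:, j + 1] = w / beta[j + 1]
        u = A @ V[:, j + 1] - V[:, j] * beta[j + 1]

    Vk = V[:, :k]
    print(np.linalg.norm(Vk.T @ Vk - np.eye(k)))            # severe loss of orthogonality

    Uk = np.triu(Vk.T @ Vk, 1)                              # computed Sk, as in (2)
    Sk = np.linalg.solve(np.eye(k) + Uk, Uk)
    print(np.linalg.svd(Sk, compute_uv=False)[:3])          # typically ~1 to many digits

    Tk = np.diag(alpha) + np.diag(beta[1:k], 1) + np.diag(beta[1:k], -1)
    mu, Y = np.linalg.eigh(Tk)                              # eigensystem (49) of Tk
    j = np.argmax(mu)                                       # Ritz pair for lambda = 100
    print(beta[k] * abs(Y[k - 1, j]))                       # convergence test, ~0
    x = Vk @ Y[:, j]
    print(np.linalg.norm(A @ x - mu[j] * x) / np.linalg.norm(x))  # small residual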

10. Numerical behavior of the finite precision Lanczos process

First some background on the convergence of the process, see [15, Remarks 3.2, 3.6].

Remark 5. If an eigenpair $\{\mu_j^{(k)}, y_j^{(k)}\}$ of Tk has converged then $\mu_j^{(k)}$ has essentially converged to an eigenvalue $\lambda_i$ of A. Orthogonality of $v_{k+1}$ can only be lost in the direction of those $V_ky_j^{(k)}$ for which $\mu_j^{(k)}$ has converged, see [12, (3.18)]. But such loss of orthogonality allows the same eigenvalue of A to be approximated again later. Therefore the first time any eigenvalue $\mu_j^{(k)}$ of Tk converges to an eigenvalue of A we call $\{\mu_j^{(k)}, y_j^{(k)}\}$ a "first converged" eigenpair of Tk (with respect to A), and $\{\mu_j^{(k)}, V_ky_j^{(k)}\}$ a "first converged" eigenpair of A. If as k increases another eigenvalue converges to $\lambda_i$ we call this a repeat. Once this has converged another repeat can converge, and so on.

The proof of convergence of the numerical Lanczos process in [15, §12] is long and difficult, and definitely needs improving. Theorem 7 here summarizes some of the main results. Briefly, the Lanczos process eventually converges in a backward stable way to each eigenvalue $\lambda_i$ of A with $x_i$ represented in $v_1$.

Theorem 7 ([15, Theorem 12.3]). For the Lanczos process (43) applied to $A = A^H$ with initial unit-length vector $v_1$ resulting in (46) with (4), consider the eigensystem (48). With a special choice of one eigenvector $x_i$ for each essentially multiple eigenvalue $\lambda_i$ of A, the computational Lanczos process modelled in (45)–(47) eventually makes available backward stable approximations to every such eigenpair $\{\lambda_i, x_i\}$ of A for which $|x_i^Hv_1| > 0$. If A has distinct eigenvalues and $|x_i^Hv_1| > 0$, $i = 1:n$, then $\|Q_{22}^{(k)}\|_F^2$ decreases monotonically to zero, when all the eigenvalues of A will have been satisfactorily approximated by n eigenvalues of Tk, and with the notation in Definition 1 and Theorem 2, $\widehat V_0$, $\widetilde V_2$, $\widehat V_2$, P2, and W2 are nonexistent, while

$$\widetilde V_3 = V_kP_3 \in \mathcal{U}^{n\times n}, \qquad Q_{22}^{(k)} = 0, \qquad P^{(k)} = [P_1, P_3], \qquad Q_{21}^{(k)} = \widetilde V_3P_3^H. \qquad (50)$$

Now we consider groups of essentially equal eigenvalues of Tk from the Lanczos process to illustrate practical occurrences of Theorems 4 and 5.

Remark 6. In the next two theorems let $T_k\widetilde Y_{1:t} = \widetilde Y_{1:t}\mathrm{diag}(\mu_1^{(k)}, \ldots, \mu_t^{(k)})$, $\widetilde Y_{1:t} \in \mathcal{U}^{k\times t}$, where these are t essentially equal converged eigenvalues $\mu_1^{(k)} \approx \cdots \approx \mu_t^{(k)}$ separated from the rest. Then in each of the next two theorems $Y_{1:t} = \widetilde Y_{1:t}Z$ for some $Z \in \mathcal{U}^{t\times t}$, so $T_kY_{1:t} \approx Y_{1:t}\mu_1^{(k)}$, $Y_{1:t} \in \mathcal{U}^{k\times t}$.

Theorem 8 (Unrelated eigenvalues [15, Corollary 11.6]). If $\mu_1^{(k)} \approx \cdots \approx \mu_t^{(k)}$ in $T_kY_{1:t} \approx Y_{1:t}\mathrm{diag}(\mu_1^{(k)}, \ldots, \mu_t^{(k)})$, $Y_{1:t} \in \mathcal{U}^{k\times t}$, are all first converged approximations to t eigenvalues of A for Tk from the Lanczos process, then for some Z in Remark 6,

$$Y_{1:t}^HY_{1:t} = I_t, \qquad Y_{1:t} \overset{\in}{\sim} \mathrm{Range}(P_3), \qquad V_kY_{1:t} \overset{\in}{\sim} \mathrm{Range}(\widetilde V_3), \qquad V_kY_{1:t} \overset{\in}{\sim} \mathcal{U}^{n\times t}, \qquad (51)$$
$$S_kY_{1:t} \approx 0, \qquad Q_1^{(k)}Y_{1:t} = \begin{bmatrix} S_k \\ V_k(I - S_k) \end{bmatrix}Y_{1:t} \approx \begin{bmatrix} 0 & \ldots & 0 \\ V_ky_1 & \ldots & V_ky_t \end{bmatrix}, \qquad (52)$$

where each $\{\mu_j^{(k)}, V_ky_j^{(k)}\}$ is a backward stable eigenpair of A.

Here the columns of Y1:t are essentially eigenvectors of Sk, so SkY1:t ≈ 0 leading to (52). This is a practical occurrence of the theory in Theorem 4, see how (52) parallels (36). Next there is a fascinating result for any group of essentially equal eigenvalues of Tk which are all repeats of a single eigenvalue of A. Each such group corresponds to a distinct Jordan block of Sk. This is a practical occurrence of the results in Theorem 5.

Theorem 9 (Repeated eigenvalues [15, Corollary 11.4]). If $\mu_2^{(k)} \approx \cdots \approx \mu_t^{(k)}$ are essentially repeats of the first converged $\mu_1^{(k)}$, where $T_kY_{1:t} \approx Y_{1:t}\mathrm{diag}(\mu_1^{(k)}, \ldots, \mu_t^{(k)})$, $Y_{1:t} \in \mathcal{U}^{k\times t}$, so there is only one eigenvector of A corresponding to these t converged eigenvalues of Tk, then for some Z in Remark 6,
$$Q_1^{(k)}Y_{1:t} \approx \begin{bmatrix} 0 & Y_{1:t-1} \\ V_ky_1 & 0 \end{bmatrix}, \qquad \|V_ky_1\|_2 \approx 1, \qquad S_kY_{1:t} \approx Y_{1:t}J_t, \qquad J_t = \begin{bmatrix} 0 & I_{t-1} \\ 0 & 0 \end{bmatrix}, \qquad (53)$$
$$S_ky_1 \approx 0; \qquad S_ky_j \approx y_{j-1}, \quad V_ky_j \approx V_ky_{j-1}, \quad \|V_ky_j\|_2 \approx 1, \quad j = 2:t; \qquad (54)$$
$$y_1 \overset{\in}{\sim} \mathrm{Range}(P_3), \qquad V_ky_1 \overset{\in}{\sim} \mathrm{Range}(\widetilde V_3), \qquad Y_{2:t} = [y_2, \ldots, y_t] \overset{\in}{\sim} \mathrm{Range}(P_1), \qquad (55)$$

and the $\{\mu_j^{(k)}, V_ky_j^{(k)}\}$, $j = 1:t$, are all essentially identical backward stable eigenpairs for the one eigenpair of A.

The results (53)–(55) can be seen to fit very nicely with those in (37)–(40).

Remark 7. [A variant of [15, Remark 11.1]] When there are repeats, (54) shows that $y_1, \ldots, y_t$ essentially form the start of a Jordan chain of principal vectors of Sk. Therefore if there is a mix of different repeats and non-repeats in a converged group of close eigenvalues $\mu_1^{(k)} \approx \cdots \approx \mu_t^{(k)}$ of Tk, in theory there is a right side unitary transformation of Y in (49) that will group each chain in its correct order, leading to Jordan blocks of the form shown in (53), so that we do not require any eigenvalues to be separated from the rest to split all the converged eigenvectors into their respective blocks. Each block starts with a $y_j \overset{\in}{\sim} \mathrm{Range}(P_3)$ followed by its repeats, if any, in Range(P1), see (55).

Theorem 7 showed that under easy conditions $Q_{22}^{(k)} \searrow 0$. When $Q_{22}^{(k)} = 0$, [15, §13 & 14] showed that the computational Lanczos process will eventually make available a complete backward stable eigensystem of $A = A^H$ or a backward stable solution to $Ax = b$, and this is as good an accuracy as can be expected with finite precision.

11. Summary

Understanding the loss of orthogonality was the key to proving the accuracy of the Lanczos process, and this will probably be important for other iterative numerical orthogonalization algorithms. Sections 6–8 showed that the matrix Sk in Theorem 1 used in the analysis of the process reveals the structure in loss of orthogonality for any set of n-vectors, and showed that the JCF of Sk gives even more information than its SVD.

The finite precision Lanczos process applied to $A = A^H$ with initial vector $v_1$ eventually makes available a backward stable eigenpair of A for every eigenvalue $\lambda_i$ of A with $|x_i^Hv_1| > 0$, see Theorem 7. Theorem 8 showed that in the converged approximations $\{\mu_j, V_ky_j\}$ to eigenpairs of A, each $y_j$ was essentially an eigenvector of Sk, and that these produced an essentially orthonormal set of $V_ky_j$, see (52). Paralleling this, Corollary 1 showed that for any set of unit 2-norm n-vectors in Vk, the eigenvectors $\widehat Y_1e_1, \ldots, \widehat Y_{n_k}e_1$ of Sk could be chosen to be an orthonormal set, and that these then led to the orthonormal matrix $V_k[\widehat Y_1e_1, \ldots, \widehat Y_{n_k}e_1]$.

However because of the possibility of deriving many repeats of eigenvalues of A the finite precision Lanczos process can take many more than the ideal number of steps. It was shown in Theorem 9 that for a first converged approximation $\{\mu_1, V_ky_1\}$ to any eigenpair of A, with converged repeats $\{\mu_j, V_ky_j\}$, $j = 2:t$, $\mu_1 \approx \cdots \approx \mu_t$, $Y_{1:t}^HY_{1:t} = I$, the $Y_{1:t}$ could be orthogonally transformed to essentially give the start of an orthonormal Jordan chain of Sk, with $V_ky_1 \approx \cdots \approx V_ky_t$ and $\|V_ky_1\|_2 \approx 1$, see (53)–(54). This is supported by the general result in Theorem 5 that if any such Sk in Theorem 1 has a Jordan chain $Y_{1:t}$ starting with $\|y_1\|_2 = \cdots = \|y_t\|_2$, then $Y_{1:t}$ can be scaled to be an orthonormal Jordan chain, i.e., satisfying $Y_{1:t}^HY_{1:t} = I_t$, and then $V_ky_1 = \cdots = V_ky_t$, $\|V_ky_1\|_2 = 1$, see (38). Also $Y_{1:t-1}$ is an orthonormal left Jordan chain of Sk, see (39).

Acknowledgements

The authors are very thankful for the extremely thorough and excellent suggestions made by the referees, which improved this paper greatly. This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Grants RGPIN-2017-05138 and OGP0009236.

References

[1] Å. Björck and C. C. Paige, Loss and recapture of orthogonality in the modified Gram–Schmidt algorithm, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 176–190, https://doi.org/10.1137/0613015.
[2] D. Fong and M. A. Saunders, LSMR: An iterative algorithm for sparse least-squares problems, SIAM J. Sci. Comput., 33:5 (2011), pp. 2950–2971, https://doi.org/10.1137/10079687X.
[3] F. R. Gantmacher, The Theory of Matrices, Vol. 1, Chelsea Publishing Co., New York, 1959.
[4] G. H. Golub and W. Kahan, Calculating the singular values and pseudo-inverse of a matrix, SIAM J. Numer. Anal., 2 (1965), pp. 205–224, https://doi.org/10.1137/0702016.
[5] G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., The Johns Hopkins University Press, Baltimore, 2013.
[6] M. H. Gutknecht and Z. Strakoš, Accuracy of two three-term and three two-term recurrences for Krylov space solvers, SIAM J. Matrix Anal. Appl., 22 (2001), pp. 213–229, https://doi.org/10.1137/S0895479897331862.
[7] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, 1985.
[8] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Research Nat. Bur. Standards, 45 (1950), pp. 255–282, https://doi.org/10.6028/jres.045.026.
[9] G. Meurant, The Lanczos and Conjugate Gradient Algorithms, SIAM, Philadelphia, 2006, https://doi.org/10.1137/1.9780898718140.
[10] G. Meurant and Z. Strakoš, The Lanczos and conjugate gradient algorithms in finite precision arithmetic, Acta Numerica, Cambridge University Press, 15 (2006), pp. 471–542, https://doi.org/10.1017/S096249290626001X.
[11] C. C. Paige, The Computation of Eigenvalues and Eigenvectors of Very Large Sparse Matrices, PhD thesis, London University, London, England, 1971.
[12] C. C. Paige, Accuracy and effectiveness of the Lanczos algorithm for the symmetric eigenproblem, Linear Algebra Appl., 34 (1980), pp. 235–258, https://doi.org/10.1016/0024-3795(80)90167-6.
[13] C. C. Paige, A useful form of unitary matrix obtained from any sequence of unit 2-norm n-vectors, SIAM J. Matrix Anal. Appl., 31 (2009), pp. 565–583, https://doi.org/10.1137/080725167.
[14] C. C. Paige, An augmented stability result for the Lanczos Hermitian matrix tridiagonalization process, SIAM J. Matrix Anal. Appl., 31 (2010), pp. 2347–2359, https://doi.org/10.1137/090761343.
[15] C. C. Paige, Accuracy of the Lanczos process for the eigenproblem and solution of equations, SIAM J. Matrix Anal. Appl., 40:4 (2019), pp. 1371–1398, https://doi.org/10.1137/17M1133725.
[16] C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Trans. Math. Software, 8 (1982), pp. 43–71, https://doi.org/10.1145/355984.355989.
[17] C. C. Paige and P. Van Dooren, On the quadratic convergence of Kogbetliantz's algorithm for computing the singular value decomposition, Linear Algebra Appl., 77 (1986), pp. 301–313, https://doi.org/10.1016/0024-3795(86)90173-4.
[18] C. C. Paige and P. Van Dooren, Sensitivity analysis of the Lanczos reduction, Numer. Linear Algebra Appl., 6 (1999), pp. 29–50, https://doi.org/10.1002/(SICI)1099-1506(199901/02)6:1<29::AID-NLA144>3.0.CO;2-I.
[19] C. C. Paige and W. Wülling, Properties of a unitary matrix obtained from a sequence of normalized vectors, SIAM J. Matrix Anal. Appl., 35 (2014), pp. 526–545, https://doi.org/10.1137/120897687.
[20] B. N. Parlett, The Symmetric Eigenvalue Problem, Classics in Appl. Math. 20, SIAM, Philadelphia, 1998, https://doi.org/10.1137/1.9781611971163.
[21] C. Sheffield, comment to Gene Golub, (Date unknown).
[22] J. H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, UK, 1965.
