Structure in loss of orthogonality
Xiao-Wen Chang$^a$, Christopher C. Paige$^{a,*}$, David Titley-Peloquin$^b$

$^a$School of Computer Science, McGill University, Montréal, Québec, Canada
$^b$Department of Bioresource Engineering, McGill University, Ste-Anne-de-Bellevue, Québec, Canada

With best wishes to Paul Van Dooren, one of the brightest and most likeable of people.

$^*$Corresponding author. Email addresses: [email protected] (Xiao-Wen Chang), [email protected] (Christopher C. Paige), [email protected] (David Titley-Peloquin)
Abstract

In [SIAM J. Matrix Anal. Appl., 31 (2009), pp. 565–583] it was shown that for any sequence of $k$ unit 2-norm $n$-vectors, the columns of $V_k$, there is a special $(n+k)$-square unitary matrix $Q^{(k)}$ that can be used in the analysis of numerical algorithms based on orthogonality. A $k \times k$ submatrix $S_k$ of $Q^{(k)}$ provides valuable theoretical information on the loss of orthogonality among the columns of $V_k$. Here it is shown that the singular value decomposition (SVD) and Jordan canonical form (JCF) of $S_k$ both reveal the null space of $V_k$, as well as orthonormal vectors available from a right-side orthogonal transformation of $V_k$. The JCF of $S_k$ is shown to reveal more than its SVD does. The Lanczos orthogonal tridiagonalization process for a Hermitian matrix is then used to indicate the occurrence of some of these properties in practical computations.

Keywords: Loss of orthogonality, singular value decomposition, Jordan canonical form, rounding error analysis, Lanczos process, eigenproblem.

2000 MSC: 65F15, 65F25, 65G50, 15A18
1. Introduction
If $V_k \in \mathbb{C}^{n \times k}$ has unit 2-norm columns, one can define the strictly upper triangular matrix $S_k \triangleq (I + U_k)^{-1} U_k$, where $U_k$ is the strictly upper triangular part of $V_k^H V_k$, as well as the unitary matrix
\[
Q^{(k)} \triangleq \begin{bmatrix} S_k & (I_k - S_k)V_k^H \\ V_k(I_k - S_k) & I_n - V_k(I_k - S_k)V_k^H \end{bmatrix}, \tag{1}
\]
see Theorem 1 below. This $Q^{(k)}$ was described in [13] and can be the basis of the rounding error analysis of several numerical algorithms based on orthogonality, see, e.g., [14, 15]. But more generally the matrix $S_k$ provides valuable theoretical information on the loss of orthogonality among the columns of any such $V_k$. Here properties of $S_k$ are developed in general, and used to show the various properties of $V_k$ that can occur. In particular,
we show that the Jordan canonical form (JCF) of $S_k$ can reveal important properties that are not available from the singular value decomposition (SVD) of $S_k$.

The paper is organized as follows. In the next two sections we give a very brief history followed by the notation used here. Section 4 summarizes the theorem on the unitary $Q^{(k)}$ in (1), while Section 5 derives some properties of $Q^{(k)}$ that we need. Section 6 deals with the SVD of $S_k$, and shows how it defines important subspaces related to $V_k$. Section 7 introduces the JCF of $S_k$, then Section 8 shows how this reveals more properties of $V_k$. These are new results for general $V_k$ with unit-length columns, so proofs are given in Sections 7 and 8. Section 9 summarizes the Lanczos process and the result of its rounding error analysis in [14], which shows that the finite precision Lanczos process behaves as a higher dimensional exact Lanczos process for a slightly perturbed $(k+n) \times (k+n)$ matrix $A_k$. Section 10 states a theorem on how the Lanczos process converges, and then uses the JCF of $S_k$ to reveal some surprising numerical behaviors of the Lanczos process, and therefore of some other numerical iterative orthogonalization algorithms.
2. A very brief history of the Lanczos process and orthogonalization
Although the orthogonal tridiagonalization of a Hermitian matrix $A$ devised by Cornelius Lanczos [8] is simple and elegant mathematically, its numerical behavior has fascinated many for 70 years. The Lanczos process was originally discarded because of its loss of orthogonality, then brought back in importance and very gradually understood. There have been many useful works on this resuscitation, such as [11, 12, 20, 9, 10, 14, 15]. The ideas behind the Lanczos process led to other valuable algorithms such as in [4, 16, 2], and there has also been work on the sensitivity of the tridiagonal matrix and vectors resulting from the Lanczos process to perturbations in $A$, see for example [18]. But an understanding of the loss of orthogonality of the Lanczos process turned out to be crucial. A breakthrough in our understanding of loss of orthogonality in general was initiated by a comment by Charles Sheffield [21] to Gene Golub, which Gene related to Åke Björck and Chris Paige around 1990, see [1]. This concerned the loss of orthogonality in modified Gram-Schmidt (MGS), but it was shown in [13] that it could be extended to apply to any sequence of unit-length vectors $v_j$. A more complete background of this is given in [13, Section 2.2]. This approach was applied in [14] to give an augmented backward stability result for the Hermitian matrix Lanczos process [8], and this was used in [15] to prove the iterative convergence of the Lanczos process for the eigenproblem and solution of equations, along with more history in [15, Section 2]. Here we look more deeply into the properties of $S_k$ in (1) and what it tells us about loss of orthogonality in general.
3. Notation
We use "$\triangleq$" for "is defined to be", and "$\equiv$" for "is equivalent to". Let $I_n$ denote the $n \times n$ unit matrix, with $j$-th column $e_j$. We say $Q_1 \in \mathbb{C}^{n \times k}$ has orthonormal columns if $Q_1^H Q_1 = I_k$, and write $Q_1 \in \mathcal{U}^{n \times k}$. For a vector $v$, we denote its Euclidean norm by $\|v\|_2 \triangleq \sqrt{v^H v}$. For a matrix $B = [b_1, b_2, \ldots, b_m] \in \mathbb{C}^{n \times m}$ we denote its Frobenius norm by $\|B\|_F$, its spectral norm by $\|B\|_2 \triangleq \sigma_{\max}(B)$, the maximum singular value of $B$, and its range by $\mathrm{Range}(B)$. For indices, $i\!:\!j$ means $i, i+1, \ldots, j$, while $B_{i:j} \equiv [b_i, b_{i+1}, \ldots, b_j]$.

We will be dealing with sequences of matrices of increasing dimensions, and will use the index $k$ to denote the $k$-th matrix in a sequence, usually as for example $Q^{(k)}$, in which case subscripts denote partitioning, as in $Q^{(k)} \equiv [Q_1^{(k)} \,|\, Q_2^{(k)}]$. We often omit the particular superscript $\cdot^{(k)}$ when the meaning is clear. However there are five special matrices where we denote the $k$-th matrix by a subscript: $V_k$, $U_k$, $S_k$, $T_k$, and $A_k$. For these the $(k+1)$-st matrix can be obtained from the $k$-th by adding a column, e.g., $V_{k+1} = [V_k, v_{k+1}]$, or a column and a row, and there is no need for further subscripts. This makes their presentation and manipulation easier to understand in formulae.
4. Obtaining a unitary matrix from unit-length $n$-vectors

The next theorem was given in full with proofs in [13]. It allows us to develop a $(k+n) \times (k+n)$ unitary matrix $Q^{(k)}$ from any $n \times k$ matrix $V_k$ with unit-length columns.
Theorem 1 ([13, Theorem 2.1]). For integers $n \ge 1$ and $k \ge 1$ suppose that $v_j \in \mathbb{C}^n$ satisfies $\|v_j\|_2 = 1$, $j = 1\!:\!k+1$, and $V_k = [v_1, \ldots, v_k]$. If $U_k$ is the strictly upper triangular matrix satisfying $V_k^H V_k = I + U_k + U_k^H$, define the strictly upper triangular matrix $S_k$ via
\[
S_k \triangleq (I_k + U_k)^{-1} U_k = U_k (I_k + U_k)^{-1} \in \mathbb{C}^{k \times k}. \tag{2}
\]
Then
\[
\|S_k\|_2 \le 1; \qquad V_k^H V_k = I \Leftrightarrow \|S_k\|_2 = 0; \qquad V_k^H V_k \text{ singular} \Leftrightarrow \|S_k\|_2 = 1. \tag{3}
\]
Here $S_k$ is the unique strictly upper triangular $k \times k$ matrix such that
\[
Q^{(k)} \equiv \big[ Q_1^{(k)} \; Q_2^{(k)} \big] \equiv \begin{bmatrix} Q_{11}^{(k)} & Q_{12}^{(k)} \\ Q_{21}^{(k)} & Q_{22}^{(k)} \end{bmatrix} \triangleq \begin{bmatrix} S_k & (I_k - S_k)V_k^H \\ V_k(I_k - S_k) & I_n - V_k(I_k - S_k)V_k^H \end{bmatrix} \in \mathcal{U}^{(k+n) \times (k+n)}. \tag{4}
\]
Finally $S_k$ and $S_{k+1}$ have the following relations
\[
S_{k+1} \equiv \begin{bmatrix} S_k & s_{k+1} \\ 0 & 0 \end{bmatrix} \in \mathbb{C}^{(k+1) \times (k+1)}, \qquad s_{k+1} = (I_k - S_k) V_k^H v_{k+1}. \tag{5}
\]

Here is an indication of a proof. From (2) it can be shown that
\[
U_k S_k = S_k U_k, \qquad U_k = (I_k - S_k)^{-1} S_k \equiv S_k (I_k - S_k)^{-1}, \qquad (I_k - S_k)^{-1} = I_k + U_k. \tag{6}
\]
To prove $Q_1^{(k)H} Q_1^{(k)} = I_k$ in (4), use (2) and (6) to give (dropping $\cdot^{(k)}$ and $\cdot_k$):
\[
Q_1 \equiv \begin{bmatrix} S \\ V(I - S) \end{bmatrix} = \begin{bmatrix} U(I + U)^{-1} \\ V(I + U)^{-1} \end{bmatrix} = \begin{bmatrix} U \\ V \end{bmatrix} (I + U)^{-1},
\]
\[
Q_1^H Q_1 = (I + U)^{-H} [V^H V + U^H U] (I + U)^{-1} = (I + U)^{-H} [I + U + U^H + U^H U] (I + U)^{-1} = (I + U)^{-H} [(I + U)^H (I + U)] (I + U)^{-1} = I.
\]
This was given in [15, §4]. Next, for example, $\|S_k\|_2 \le 1$ in (3) follows immediately. Finally, the first equation in (5) follows from the definition of $S_k$, and to prove the second equation in (5) we see from (6) that $S_{k+1} = (I_{k+1} - S_{k+1}) U_{k+1}$, so that
\[
\begin{bmatrix} s_{k+1} \\ 0 \end{bmatrix} = S_{k+1} e_{k+1} = (I_{k+1} - S_{k+1}) U_{k+1} e_{k+1} = (I_{k+1} - S_{k+1}) \begin{bmatrix} V_k^H v_{k+1} \\ 0 \end{bmatrix} = \begin{bmatrix} (I_k - S_k) V_k^H v_{k+1} \\ 0 \end{bmatrix}.
\]
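To make the construction concrete, the following minimal NumPy sketch (ours, not from [13]; the dimensions, random seed, and variable names are merely illustrative) forms $S_k$ via (2) and $Q^{(k)}$ via (4) for a random $V_k$, and checks Theorem 1 numerically.

```python
# Illustrative sketch (not from the paper): build S_k via (2), Q^(k) via (4),
# and check unitarity and the bound in (3) numerically.
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 5

# Random complex V_k with unit 2-norm columns.
V = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
V /= np.linalg.norm(V, axis=0)

# U_k: strictly upper triangular part of V^H V;  S_k = (I_k + U_k)^{-1} U_k.
U = np.triu(V.conj().T @ V, 1)
S = np.linalg.solve(np.eye(k) + U, U)

# Q^(k) as in (4).
ImS = np.eye(k) - S
Q = np.block([[S,       ImS @ V.conj().T],
              [V @ ImS, np.eye(n) - V @ ImS @ V.conj().T]])

print(np.linalg.norm(Q.conj().T @ Q - np.eye(n + k)))  # tiny: Q is unitary
print(np.linalg.norm(S, 2))                            # <= 1, as in (3)
```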
5. Some properties of $Q^{(k)}$ in Equation (4)

Our analysis uses properties of the sub-blocks of $Q^{(k)}$ in (4), see [15, §6]. From (5), $s_{k+1} = (I_k - S_k) V_k^H v_{k+1} = Q_{12}^{(k)} v_{k+1}$, so together with (4)
\[
Q_{22}^{(k)} v_{k+1} = [I_n - V_k(I_k - S_k)V_k^H] v_{k+1} = v_{k+1} - V_k s_{k+1} = Q_{21}^{(k+1)} e_{k+1}, \tag{7}
\]
\[
q^{(k+1)} \triangleq \begin{bmatrix} s_{k+1} \\ v_{k+1} - V_k s_{k+1} \end{bmatrix} = \begin{bmatrix} Q_{12}^{(k)} \\ Q_{22}^{(k)} \end{bmatrix} v_{k+1} = Q_2^{(k)} v_{k+1}. \tag{8}
\]
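Continuing the illustrative sketch above (same assumed variables), (7) and (8) can be checked directly with one further unit 2-norm vector:

```python
# Continuing the sketch after Theorem 1: a numerical check of (7)-(8)
# with one extra unit 2-norm vector v_{k+1}.
v = rng.standard_normal((n, 1)) + 1j * rng.standard_normal((n, 1))
v /= np.linalg.norm(v)

s = ImS @ V.conj().T @ v          # s_{k+1} = (I_k - S_k) V_k^H v_{k+1}, see (5)
Q2 = Q[:, k:]                     # second block column Q_2^(k) of (4)
q = Q2 @ v                        # q^(k+1) = Q_2^(k) v_{k+1} in (8)
print(np.linalg.norm(q - np.vstack([s, v - V @ s])))   # tiny, confirming (7)-(8)
```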
For $j = 1\!:\!k+1$ define the orthogonal projectors $P_j \triangleq I_n - v_j v_j^H$. Because $S_k$ is strictly upper triangular we see $S_1 = 0$, so from (4) we have $Q_{22}^{(1)} = P_1$, and we use
\[
Q_{21}^{(k)} = V_k(I_k - S_k), \qquad Q_{12}^{(k)} = (I_k - S_k)V_k^H, \qquad Q_{22}^{(k)} = I_n - Q_{21}^{(k)} V_k^H,
\]
\[
V_{k+1} = [V_k, v_{k+1}], \qquad I_{k+1} - S_{k+1} = \begin{bmatrix} I_k - S_k & -s_{k+1} \\ 0 & 1 \end{bmatrix},
\]
to prove several things with (8); in particular we now prove that $Q_{22}^{(k)} = P_1 \cdots P_k$.
\[
Q_{21}^{(k+1)} = V_{k+1}(I_{k+1} - S_{k+1}) = [V_k(I_k - S_k), \; v_{k+1} - V_k s_{k+1}] = [Q_{21}^{(k)}, \; Q_{22}^{(k)} v_{k+1}], \tag{9}
\]
\[
Q_{22}^{(k+1)} = I_n - Q_{21}^{(k+1)} V_{k+1}^H = I_n - Q_{21}^{(k)} V_k^H - Q_{22}^{(k)} v_{k+1} v_{k+1}^H = Q_{22}^{(k)} (I_n - v_{k+1} v_{k+1}^H), \tag{10}
\]
from which $Q_{22}^{(k+1)} = P_1 \cdots P_k P_{k+1}$ follows by induction.

The decrease in $\|Q_{22}^{(k)}\|_F$ is crucial for proving convergence and accuracy of the finite precision Lanczos process, so we discuss it here. First $\|Q_{22}^{(k+1)}\|_2 \le \|Q_{22}^{(k)}\|_2$ because
\[
Q_{22}^{(k+1)} Q_{22}^{(k+1)H} = Q_{22}^{(k)} (I_n - v_{k+1} v_{k+1}^H) Q_{22}^{(k)H} = Q_{22}^{(k)} Q_{22}^{(k)H} - Q_{22}^{(k)} v_{k+1} v_{k+1}^H Q_{22}^{(k)H}.
\]
This and (9) with $Q_{22}^{(0)} \triangleq I_n$ show how $\|Q_{22}^{(k)}\|_F$ decreases:
\[
\|Q_{22}^{(k+1)}\|_F^2 = \mathrm{trace}[Q_{22}^{(k+1)} Q_{22}^{(k+1)H}] = \|Q_{22}^{(k)}\|_F^2 - \|Q_{22}^{(k)} v_{k+1}\|_2^2. \tag{11}
\]
Ideally $S_{k+1} = 0$, so in (7) $Q_{22}^{(k)} v_{k+1} = v_{k+1}$, and $\|Q_{22}^{(k)}\|_F^2 = n - k$ decreases by 1 each step until $Q_{22}^{(n)} = 0$ and $V_n \in \mathcal{U}^{n \times n}$, see (4). But with loss of orthogonality $\|Q_{22}^{(k)}\|_F^2$ can decrease far more slowly. This can lead to dramatic slowdown of the Lanczos process.
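Both the product form $Q_{22}^{(k)} = P_1 \cdots P_k$ and the decrease (11) are easy to confirm numerically; the following is a hedged continuation of the same illustrative sketch used above.

```python
# Continuing the same illustrative sketch: Q_22^(k) = P_1 ... P_k, and the
# Frobenius norm decrease (11) via the update (10).
P_prod = np.eye(n)
for j in range(k):
    vj = V[:, j:j+1]
    P_prod = P_prod @ (np.eye(n) - vj @ vj.conj().T)   # P_1 P_2 ... P_k

Q22 = Q[k:, k:]
print(np.linalg.norm(Q22 - P_prod))                    # tiny: Q_22 = P_1...P_k

Q22_next = Q22 @ (np.eye(n) - v @ v.conj().T)          # update (10)
lhs = np.linalg.norm(Q22_next, 'fro')**2
rhs = np.linalg.norm(Q22, 'fro')**2 - np.linalg.norm(Q22 @ v)**2
print(abs(lhs - rhs))                                  # tiny, confirming (11)
```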
6. The singular value decomposition (SVD) of $S_k$
We now develop the theoretical SVD $S_k = W^{(k)} \Sigma^{(k)} P^{(k)H}$ when $S_k$ in (2) arises from any matrix $V_k$ with unit-length columns. We remind the reader that we often omit the superscript $\cdot^{(k)}$ for readability, and write, e.g., $S_k = W \Sigma P^H$. From (3), $\sigma_{\max}(S_k) \le 1$, and any unit singular value of $S_k$ will be important in this analysis. Also if $V_k^H V_k = I$ then $S_k = 0$ in (2), and it will help to label each singular vector of $S_k$ according to its zero, unit, or in-between singular values. Briefly, zero singular values correspond to no loss of orthogonality, unit singular values to loss of linear independence, and intermediate singular values to loss of orthogonality but not loss of linear independence. The rest of this section comes from [19].

Definition 1 (Partitioned SVD of $S_k$, [19, §4]). Let the $k \times k$ matrix $S_k$ in Theorem 1 have $m_k$ unit and $n_k$ zero singular values with SVD
\[
S_k = W \Sigma P^H = W_1 P_1^H + W_2 \Sigma_2 P_2^H, \qquad I - S_k S_k^H = W \Gamma^2 W^H = W_2 \Gamma_2^2 W_2^H + W_3 W_3^H, \tag{12}
\]
\[
W \equiv W^{(k)} \equiv [w_1^{(k)}, \ldots, w_k^{(k)}] \equiv [W_1, W_2, W_3] \in \mathcal{U}^{k \times k},
\]
\[
P \equiv P^{(k)} \equiv [p_1^{(k)}, \ldots, p_k^{(k)}] \equiv [P_1, P_2, P_3] \in \mathcal{U}^{k \times k},
\]
where the block columns $W_1, W_2, W_3$ and $P_1, P_2, P_3$ have $m_k$, $\ell_k$, $n_k$ columns respectively, so $k = m_k + \ell_k + n_k$, and
\[
\Sigma \equiv \Sigma^{(k)} \equiv \mathrm{diag}(\sigma_1, \ldots, \sigma_k) \equiv \mathrm{diag}(I_{m_k}, \Sigma_2, O_{n_k}), \qquad \Sigma_2 \in \mathbb{R}^{\ell_k \times \ell_k},
\]
\[
\Gamma^2 \triangleq I_k - \Sigma^2, \qquad \Gamma \equiv \Gamma^{(k)} \equiv \mathrm{diag}(\gamma_1, \ldots, \gamma_k) \equiv \mathrm{diag}(O_{m_k}, \Gamma_2, I_{n_k}), \qquad \Gamma_2 \text{ positive definite},
\]
where the singular values $\sigma_j$, $1 \le j \le k$, of $S_k$ in $\Sigma \equiv \Sigma^{(k)}$ are arranged as follows,
\[
1 = \sigma_1 = \cdots = \sigma_{m_k} > \sigma_{m_k+1} \ge \cdots \ge \sigma_{m_k+\ell_k} > \sigma_{m_k+\ell_k+1} = \cdots = \sigma_k = 0. \tag{13}
\]
These singular vectors of $S_k$ combine with (4) to reveal key properties of $V_k$:
\[
Q_1^{(k)} P = \begin{bmatrix} S_k P \\ V_k(I_k - S_k)P \end{bmatrix} = \begin{bmatrix} W_1 & W_2 \Sigma_2 & 0 \\ V_k(P_1 - W_1) & V_k(P_2 - W_2 \Sigma_2) & V_k P_3 \end{bmatrix} \equiv \begin{bmatrix} W_1 & W_2 \Sigma_2 & 0 \\ 0 & \widetilde{V}_2 \Gamma_2 & \widetilde{V}_3 \end{bmatrix}, \tag{14}
\]
where $\widetilde{V}_2$ and $\widetilde{V}_3$ are formally defined in the following theorem, and it is easy to verify that $[\widetilde{V}_2, \widetilde{V}_3]$ has orthonormal columns. The first equality in (14) follows from the structure of $Q^{(k)}$, and the second by applying (12). But the columns of $Q_1^{(k)} P$ are orthonormal, giving the structure in the fourth expression. The fourth expression reveals the null space of $V_k$ and indicates that the columns of $[\widetilde{V}_2, \widetilde{V}_3]$ span $\mathrm{Range}(V_k)$, due to the fact that $\Gamma_2 > 0$ and $(I - S_k)P$ in the second expression is nonsingular. This structure was used in proving the following theorem.
Theorem 2 (Range and null space of $V_k$, [19, Theorem 4.2]). With the notation in Theorem 1 and Definition 1, define
\[
\widetilde{V}_2 \triangleq V_k(P_2 - W_2 \Sigma_2)\Gamma_2^{-1}, \qquad \widetilde{V}_3 \triangleq V_k P_3, \qquad \widehat{V}_2 \triangleq V_k(W_2 - P_2 \Sigma_2)\Gamma_2^{-1}, \qquad \widehat{V}_3 \triangleq V_k W_3.
\]
Let the columns of $\widehat{V}_0$ comprise an orthonormal basis of $\mathrm{Range}(V_k)^{\perp}$. Then defining $\widetilde{V}^{(k)} \triangleq [\widehat{V}_0, \widetilde{V}_2, \widetilde{V}_3]$ and $\widehat{V}^{(k)} \triangleq [\widehat{V}_0, \widehat{V}_2, \widehat{V}_3]$,
\[
\mathrm{Range}(V_k) = \mathrm{Range}([\widetilde{V}_2, \widetilde{V}_3]) = \mathrm{Range}([\widehat{V}_2, \widehat{V}_3]) \perp \mathrm{Range}(\widehat{V}_0), \qquad \mathrm{rank}(V_k) = k - m_k, \tag{15}
\]
\[
\mathcal{N}(V_k) = \mathrm{Range}(P_1 - W_1), \qquad P_1 - W_1 \in \mathbb{C}^{k \times m_k}, \qquad \mathrm{rank}(P_1 - W_1) = m_k, \tag{16}
\]
\[
\widetilde{V} \equiv \widetilde{V}^{(k)} \equiv [\widehat{V}_0, \widetilde{V}_2, \widetilde{V}_3] \in \mathcal{U}^{n \times n}, \qquad \widehat{V} \equiv \widehat{V}^{(k)} \equiv [\widehat{V}_0, \widehat{V}_2, \widehat{V}_3] \in \mathcal{U}^{n \times n}, \tag{17}
\]
\[
Q_{22}^{(k)} = \begin{bmatrix} \widehat{V}_0 & \widetilde{V}_2 \end{bmatrix} \begin{bmatrix} I_{n-(k-m_k)} & 0 \\ 0 & -\Sigma_2 \end{bmatrix} \begin{bmatrix} \widehat{V}_0^H \\ \widehat{V}_2^H \end{bmatrix} = \widehat{V}_0 \widehat{V}_0^H - \widetilde{V}_2 \Sigma_2 \widehat{V}_2^H, \tag{18}
\]
where $\mathrm{rank}(P_1 - W_1) = m_k$ follows since $(I - S_k)P_1 = P_1 - W_1$ and $I - S_k$ is nonsingular.
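The quantities in Theorem 2 can be observed in a small computation. The sketch below (ours, not from [19]; real data and ad hoc tolerances for simplicity) repeats a column of $V_k$ so that exactly one unit singular value appears, $m_k = 1$:

```python
# Illustrative sketch: SVD of S_k for a rank-deficient V_k, checking
# rank(V_k) = k - m_k in (15), N(V_k) = Range(P_1 - W_1) in (16), and the
# orthonormality of V_k P_3 from (14).
import numpy as np

rng = np.random.default_rng(1)
n, k = 6, 5
V = rng.standard_normal((n, k))
V[:, 3] = V[:, 0]                     # repeat a direction: rank(V_k) = k - 1
V /= np.linalg.norm(V, axis=0)

U = np.triu(V.T @ V, 1)
S = np.linalg.solve(np.eye(k) + U, U)

W, sig, Pt = np.linalg.svd(S)         # S_k = W diag(sig) P^T, see (12)
P = Pt.T
mk = int(np.sum(sig > 1 - 1e-10))     # number of unit singular values
nk = int(np.sum(sig < 1e-10))         # number of zero singular values
print(mk, np.linalg.matrix_rank(V))   # rank(V_k) = k - m_k, see (15)

P3 = P[:, k - nk:]                    # right singular vectors for sigma = 0
VP3 = V @ P3
print(np.linalg.norm(VP3.T @ VP3 - np.eye(nk)))   # tiny: V_k P_3 orthonormal

P1, W1 = P[:, :mk], W[:, :mk]
print(np.linalg.norm(V @ (P1 - W1)))  # small: N(V_k) = Range(P_1 - W_1)
```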
Some singular values of $S_k$ have a useful persistency with increasing $k$.

Remark 1. It was shown in [19, §5] that if in Definition 1 we call $\{1, w_j^{(k)}, p_j^{(k)}\}$ for $j = 1\!:\!m_k$ the unit singular triplets (or unit triplets) of $S_k$, and $\{0, w_j^{(k)}, p_j^{(k)}\}$ for $j = m_k + \ell_k + 1\!:\!k$ (see (13)) the zero singular triplets (or zero triplets) of $S_k$, then for any unit (or zero) triplet of $S_k$ there is a related unit (or zero) triplet for any $S_\ell$ having $S_k$ as leading principal submatrix. For example, if $S_k p = 0$ then $S_\ell \left[\begin{smallmatrix} p \\ 0 \end{smallmatrix}\right] = 0$, so $\left[\begin{smallmatrix} p \\ 0 \end{smallmatrix}\right]$ is always a singular vector. See [19, Remark 5.1] for more details.
Remark 2. In Definition 1 and Remark 1 it can be seen that $W_1$ and $P_1$ are arbitrary up to multiplication on the right by the same orthogonal transformation $Z \in \mathcal{U}^{m_k \times m_k}$, since $W_1 P_1^H = (W_1 Z)(P_1 Z)^H$, while $P_3$ and $W_3$ are each arbitrary up to individual right orthogonal transformations. It follows that $\widetilde{V}_3 = V_k P_3$ in (14) is arbitrary up to a right orthogonal transformation. Then (14) shows that exactly $n_k$ orthonormal vectors $V_k P_3$ can be obtained via a right orthogonal transformation of $V_k$. We also see from (16) that each unit triplet of $S_k$ corresponds to a unit loss of rank of $V_k$.
7. The Jordan canonical form (JCF) of $S_k$
Since $S_k$ is strictly upper triangular, its JCF is special, having all eigenvalues zero. If $V_k$ in Theorem 1 had $k$ random unit-length $n$-vectors then we would expect $\mathrm{rank}(V_k) = k$ while $k \le n$, and to have $m_k = k - n$ unit singular values of $S_k$ for $k \ge n$, see (15). For $k \le n$ we would also expect $S_k$ to have one zero singular value with the other $k - 1$ being distributed in $(0, 1)$, and the Jordan canonical form would probably be just one big Jordan block with no other interesting structure. Then $\|S_k\|_2$ would give a measure of the loss of orthogonality, but $S_k$ would not give much else.

However the theory here is for understanding the structure in the loss of orthogonality when, for example, $V_k$ comes from computations intended to produce $V_k \in \mathcal{U}^{n \times k}$, and then far more interesting properties can arise: there can be a lot of structure. Many properties we discuss arose with the computational Lanczos process analyzed in [15], and will presumably arise in some other large dimensional orthogonalization processes. We will need the Jordan canonical (or normal) form of $S_k$ to fully describe some of these properties. For possible future use we give more properties of this JCF than are needed here.

Here we will state some standard results. Since $S_k$ is nilpotent we will just state the theory for this case. The eigen-subspace for the zero eigenvalue is $\mathcal{N}(S_k) = \mathrm{Range}(P_3)$ in Definition 1, so the geometric multiplicity of this zero eigenvalue is $n_k$.
Definition 2 (Jordan canonical form (JCF) of $S_k$; see, e.g., [7, §3.1.11]). For $S_k$ in Definition 1 there exist $J \in \mathbb{R}^{k \times k}$ and nonsingular $Y \in \mathbb{C}^{k \times k}$ such that
\[
S_k Y = Y J, \qquad Y \triangleq [Y_1, \ldots, Y_{n_k}], \qquad J \triangleq \mathrm{diag}(J_1, \ldots, J_{n_k}), \qquad J_i \triangleq \begin{bmatrix} 0 & I_{\ell_i - 1} \\ 0 & 0 \end{bmatrix} \in \mathbb{R}^{\ell_i \times \ell_i}. \tag{19}
\]
Writing $Y_i \triangleq [y_1^{(i)}, \ldots, y_{\ell_i}^{(i)}]$ gives $S_k Y_i = Y_i J_i$ and the complete chain
\[
S_k y_1^{(i)} = 0, \qquad S_k y_j^{(i)} = y_{j-1}^{(i)}, \quad j = 2, 3, \ldots, \ell_i, \tag{20}
\]
where $\ell_i$ is the height or length of the chain, $j$ is the grade of the principal vector $y_j^{(i)}$, and $y_1^{(i)}$ is the eigenvector. There are $n_k$ complete chains, $i = 1, 2, \ldots, n_k$, in (20).

Partitioning $X \equiv [X_1, \ldots, X_{n_k}] \triangleq Y^{-H}$ identically to $Y$ gives $X^H S_k = J X^H$ and, with $X_i \triangleq [x_1^{(i)}, \ldots, x_{\ell_i}^{(i)}]$,
\[
S_k^H X_i = X_i J_i^T; \qquad S_k^H x_j^{(i)} = x_{j+1}^{(i)}, \quad j = 1, 2, \ldots, \ell_i - 1; \qquad S_k^H x_{\ell_i}^{(i)} = 0, \qquad X_i^H Y_i = I. \tag{21}
\]
A complete chain $Y_i = [y_1^{(i)}, \ldots, y_{\ell_i}^{(i)}]$ corresponds to the Jordan block $J_i$, and then $S_k^H x_{\ell_i}^{(i)} = 0$. We say that $[y_1^{(i)}, y_2^{(i)}, \ldots, y_j^{(i)}]$, $1 \le j \le \ell_i$, is a chain, and it becomes a complete chain when $j = \ell_i$.
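For a concrete exact illustration of (19)-(20), SymPy's `jordan_form` can be applied to a small strictly upper triangular matrix; the matrix below is our own toy example, not one from the paper.

```python
# A small exact illustration (our toy example): Jordan chains (20) for a
# 4x4 strictly upper triangular, hence nilpotent, matrix.
import sympy as sp

S = sp.Matrix([[0, 1, 0, 2],
               [0, 0, 0, 1],
               [0, 0, 0, 3],
               [0, 0, 0, 0]])
Y, J = S.jordan_form()        # S = Y J Y^{-1}, i.e. S Y = Y J as in (19)
print(J)                      # all eigenvalues 0; the block sizes are the
                              # chain lengths l_i (here one block of size 3
                              # and one of size 1; SymPy's order may vary)
assert S * Y == Y * J         # within each block: S y_1 = 0 and
                              # S y_j = y_{j-1}, exactly the chains in (20)
```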
The principal vectors in a JCF can be far from unique, and so if $Y$ in (19) does not already have desirable properties, we can alter it somewhat. We only need the theoretical JCF: we know a $Y$ exists, but we need not compute $Y$ or any of its transformations.
Remark 3. Applying standard nomenclature to our $S_k$ (see for example Wilkinson [22, pp. 42–43]), any vector $y$ which satisfies $S_k^j y = 0$, $S_k^{j-1} y \ne 0$, for integer $j > 0$, is called a (right side) principal vector of grade $j$ of $S_k$. But the principal vectors are not unique, since if $y$ is a principal vector of grade $j$ then the same is true of any vector obtained by adding multiples of any vectors of grades not greater than $j$. If such changes produce $\widehat{Y}$ from $Y$ in (19), this does not say that every chain in (20) will be preserved. We next show that every chain in (20) will be preserved if and only if $S_k \widehat{Y} = \widehat{Y} J$ where $\widehat{Y} = Y G$ for some nonsingular $G$ with certain properties.
Definition 3. A matrix is upper Toeplitz if it is zero except for an upper triangular Toeplitz matrix, of as large a dimension as possible, in the top right corner, for example
\[
\begin{bmatrix} \alpha & \beta \\ 0 & \alpha \\ 0 & 0 \end{bmatrix}, \qquad \begin{bmatrix} \alpha & \beta & \delta \\ 0 & \alpha & \beta \\ 0 & 0 & \alpha \end{bmatrix}, \qquad \begin{bmatrix} 0 & \alpha & \beta \\ 0 & 0 & \alpha \end{bmatrix}. \tag{22}
\]
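As a quick aid to Definition 3, here is an illustrative predicate (our own naming, with an ad hoc tolerance) that accepts exactly this pattern, including the rectangular cases in (22):

```python
# Illustrative check of Definition 3: zero except for an upper triangular
# Toeplitz block, as large as possible, in the top right corner.
import numpy as np

def is_upper_toeplitz(G, tol=1e-14):
    m, n = G.shape
    shift = n - min(m, n)            # first allowed diagonal is j - i = shift
    for i in range(m):
        for j in range(n):
            if j - i < shift and abs(G[i, j]) > tol:
                return False         # nonzero below the allowed triangle
            if j - i >= shift and i + 1 < m and j + 1 < n \
                    and abs(G[i, j] - G[i + 1, j + 1]) > tol:
                return False         # not constant along a diagonal
    return True

a, b, d = 2.0, 3.0, 5.0              # the three examples in (22)
A = np.array([[a, b], [0, a], [0, 0]])
B = np.array([[a, b, d], [0, a, b], [0, 0, a]])
C = np.array([[0.0, a, b], [0, 0, a]])
print(all(is_upper_toeplitz(M) for M in (A, B, C)))    # True
```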
Lemma 1 ([3, p. 159]). For $Y$ and $J$ in (19), $S_k \widehat{Y} = \widehat{Y} J$ for $\widehat{Y} = Y G$ with nonsingular $G$ if and only if $JG = GJ$. If we partition $G$ conformably with $J$ in (19), $G$ will have $n_k^2$ blocks $G_{i,j} \in \mathbb{C}^{\ell_i \times \ell_j}$, and then, see Definition 3,
\[
JG = GJ \;\Leftrightarrow\; J_i G_{i,j} = G_{i,j} J_j, \quad i = 1\!:\!n_k, \; j = 1\!:\!n_k, \tag{23}
\]
\[
\phantom{JG = GJ} \;\Leftrightarrow\; G_{i,j} \text{ is upper Toeplitz}, \quad i = 1\!:\!n_k, \; j = 1\!:\!n_k. \tag{24}
\]
Next, with the same partitioning as $G$, $G^{-1}$ has all its sub-blocks upper Toeplitz. Then $\widehat{X} \triangleq \widehat{Y}^{-H}$ satisfies $\widehat{X}^H S_k = J \widehat{X}^H$, where $\widehat{X}^H = G^{-1} X^H$ with $X^H \triangleq Y^{-1}$. Finally, if $J G_i = G_i J$ for nonsingular $G_i$, $i = 1, 2$, then $J G_1 G_2 = G_1 G_2 J$.
Proof. The first sentence holds since $S_k Y = Y J$ and $G$ is nonsingular, giving
\[
S_k \widehat{Y} = \widehat{Y} J \;\Leftrightarrow\; S_k \widehat{Y} = S_k Y G = Y J G = \widehat{Y} J = Y G J \;\Leftrightarrow\; JG = GJ. \tag{25}
\]
Now $J_i G_{i,j} = G_{i,j} J_j$ in (23) if and only if $G_{i,j}$ is upper Toeplitz; see, for example, Gantmacher [3, p. 159], or just equate the elements on each side of $J_i G_{i,j} = G_{i,j} J_j$. Next, for nonsingular $G$, $JG = GJ \Leftrightarrow G^{-1} J = J G^{-1}$, and therefore $G^{-1}$ has all its sub-blocks upper Toeplitz from (23). Then from $S_k \widehat{Y} = \widehat{Y} J$ we see that $\widehat{X} = \widehat{Y}^{-H}$ satisfies $\widehat{X}^H S_k = J \widehat{X}^H$, where $\widehat{X}^H = (Y G)^{-1} = G^{-1} Y^{-1} = G^{-1} X^H$ with $X^H = Y^{-1}$ in $X^H S_k = J X^H$. Finally
\[
J G_1 = G_1 J \;\;\&\;\; J G_2 = G_2 J \;\Rightarrow\; J G_1 G_2 = G_1 J G_2 = G_1 G_2 J. \tag{26}
\]
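A small numerical check of (23)-(24) (our example, with nilpotent Jordan blocks of sizes 3 and 2):

```python
# Illustrative check of (23)-(24): an upper Toeplitz G_{i,j} satisfies
# J_i G_{i,j} = G_{i,j} J_j; a generic block does not.
import numpy as np

J3, J2 = np.eye(3, k=1), np.eye(2, k=1)   # nilpotent Jordan blocks, see (19)
G = np.array([[1.0, 2.0],                 # 3x2 upper Toeplitz, cf. (22)
              [0.0, 1.0],
              [0.0, 0.0]])
print(np.allclose(J3 @ G, G @ J2))        # True
G[2, 0] = 1.0                             # break the upper Toeplitz structure
print(np.allclose(J3 @ G, G @ J2))        # False
```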
We see from (26) that we can transform $Y$ in $S_k Y = Y J$ to $\widehat{Y} = Y G_1 G_2 \cdots G_m$ having various different forms, using a sequence of nonsingular matrices $G_i$ that commute with $J$. First we show that we can always make each non-eigenvector in a chain orthogonal to the eigenvector of that chain. This is true for the JCF of any square matrix.
Lemma 2. For $S_k Y_i = Y_i J_i$, $i = 1\!:\!n_k$ in (19), we can choose nonsingular upper Toeplitz $G_{i,i} \in \mathbb{C}^{\ell_i \times \ell_i}$ so that with $\widehat{Y}_i \triangleq Y_i G_{i,i}$ we have $S_k \widehat{Y}_i = \widehat{Y}_i J_i$ where $\widehat{Y}_i^H \widehat{Y}_i e_1 = e_1$. If $\|Y_i e_1\|_2 = 1$ then $\widehat{Y}_i e_1 = Y_i e_1$. Then for $J$ in (19), $S_k \widehat{Y} = \widehat{Y} J$ where $\widehat{Y} \triangleq [Y_1 G_{1,1}, \ldots, Y_{n_k} G_{n_k,n_k}]$.
Proof. We will prove the result for $Y_1$; the proofs for $Y_i$, $i = 2\!:\!n_k$, follow analogously. From Lemma 1, upper Toeplitz $G_{1,1}$ ensures that $J_1 G_{1,1} = G_{1,1} J_1$, so with $\widehat{Y}_1 = Y_1 G_{1,1}$
\[
S_k \widehat{Y}_1 = S_k Y_1 G_{1,1} = Y_1 J_1 G_{1,1} = Y_1 G_{1,1} J_1 = \widehat{Y}_1 J_1, \qquad G_{1,1} \triangleq \begin{bmatrix} \gamma_1 & \gamma_2 & \cdots & \gamma_{\ell_1} \\ & \gamma_1 & \ddots & \vdots \\ & & \ddots & \gamma_2 \\ & & & \gamma_1 \end{bmatrix}.
\]
Write $Y_1 \equiv [y_1, \ldots, y_{\ell_1}]$, $\widehat{Y}_1 \equiv [\hat{y}_1, \ldots, \hat{y}_{\ell_1}]$, and take $\gamma_1 = 1/\|y_1\|_2$ so that $\|\hat{y}_1\|_2 = 1$. Then we can make $\hat{y}_2, \hat{y}_3, \hat{y}_4, \ldots, \hat{y}_{\ell_1}$ orthogonal to $\hat{y}_1 = y_1 \gamma_1$ via
\[
\widehat{Y}_1 = Y_1 G_{1,1} = [y_1 \gamma_1, \; y_2 \gamma_1 + y_1 \gamma_2, \; y_3 \gamma_1 + y_2 \gamma_2 + y_1 \gamma_3, \; y_4 \gamma_1 + y_3 \gamma_2 + y_2 \gamma_3 + y_1 \gamma_4, \; \ldots],
\]
\[
\gamma_2 = -y_1^H y_2 \gamma_1^3, \qquad \gamma_3 = -y_1^H (y_3 \gamma_1 + y_2 \gamma_2) \gamma_1^2, \qquad \gamma_4 = -y_1^H (y_4 \gamma_1 + y_3 \gamma_2 + y_2 \gamma_3) \gamma_1^2, \qquad \ldots,
\]
\[
\gamma_{\ell_1} = -y_1^H (y_{\ell_1} \gamma_1 + y_{\ell_1 - 1} \gamma_2 + \cdots + y_2 \gamma_{\ell_1 - 1}) \gamma_1^2.
\]
Finally, if we also have $\|y_1\|_2 = 1$ then $\gamma_1 = 1$ and $\hat{y}_1 = y_1 \gamma_1 = y_1$. □
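The $\gamma_j$ formulas in this proof are easy to test. The sketch below (our construction; real data, so $y_1^H$ is a plain transpose) synthesizes one complete chain and verifies the claimed orthogonality:

```python
# Illustrative check of Lemma 2: build one complete chain S y_1 = 0,
# S y_j = y_{j-1}, then choose the gamma's so that y_hat_2, ..., y_hat_l
# become orthogonal to y_hat_1, while the chain structure is preserved.
import numpy as np

rng = np.random.default_rng(2)
l = 4
Y1 = rng.standard_normal((l, l))          # a random complete chain (columns)
J1 = np.eye(l, k=1)                       # one nilpotent Jordan block
S = Y1 @ J1 @ np.linalg.inv(Y1)           # so that S Y1 = Y1 J1

g = np.zeros(l)                           # g[j-1] holds gamma_j
y1 = Y1[:, 0]
g[0] = 1.0 / np.linalg.norm(y1)           # gamma_1 = 1/||y_1||_2
for j in range(1, l):                     # gamma_{j+1} as in Lemma 2
    g[j] = -g[0]**2 * (y1 @ (Y1[:, 1:j+1] @ g[j-1::-1]))

G11 = np.zeros((l, l))                    # upper triangular Toeplitz G_{1,1}
for i in range(l):
    G11[i, i:] = g[:l - i]

Yhat = Y1 @ G11
print(np.abs(Yhat[:, 0] @ Yhat[:, 1:]).max())   # tiny: y_hat_1 orthogonal to
                                                # the later chain vectors
print(np.allclose(S @ Yhat, Yhat @ J1))         # chain structure preserved
```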
The following lemma shows that if we reorder the Jordan blocks, we can easily obtain the reordered JCF by reordering the complete chains in the same way.

Lemma 3. In Definition 2 we have $S_k Y = Y J$. If we reorder the Jordan blocks by a permutation matrix $\Pi$ to give $\widehat{J} \triangleq \Pi^T J \Pi$, then
\[
S_k \widehat{Y} = \widehat{Y} \widehat{J}, \qquad \widehat{Y} \triangleq Y \Pi.
\]
Proof. Since $S_k Y = Y J$, $S_k Y \Pi = Y \Pi \Pi^T J \Pi$, leading to $S_k \widehat{Y} = \widehat{Y} \widehat{J}$. □
For $S_k$ in Theorem 1 we can obtain a lot of orthogonality in the principal vectors.
Theorem 3. Suppose that the JCF (19) of $S_k$ in Theorem 1 has been permuted so that $\ell_1 \ge \ell_2 \ge \cdots \ge \ell_{n_k}$ in $S_k Y = Y J$. We can always choose nonsingular $G \in \mathbb{C}^{k \times k}$ to give the JCF:
\[
S_k \widehat{Y} = \widehat{Y} J, \qquad \widehat{Y} \triangleq Y G \equiv [\widehat{Y}_1, \widehat{Y}_2, \ldots, \widehat{Y}_{n_k}]; \qquad \widehat{Y}_j \in \mathbb{C}^{k \times \ell_j}, \quad j = 1\!:\!n_k, \tag{27}
\]
where every vector in $\widehat{Y}_j$ is orthogonal to the eigenvector $\widehat{Y}_i e_1$ of each previous block, $1 \le i < j$, every non-eigenvector in each block is orthogonal to the eigenvector of that block, and every eigenvector $\widehat{Y}_j e_1$ has unit 2-norm, i.e.,
\[
\widehat{Y}_j^H \widehat{Y}_i e_1 = 0, \quad i = 1\!:\!j-1, \; j = 2\!:\!n_k; \qquad \widehat{Y}_j^H \widehat{Y}_j e_1 = e_1, \quad j = 1\!:\!n_k. \tag{28}
\]

Proof. The matrix $\widehat{Y} = Y G$ in (27) is developed via a sequence of multiplications:
\[
G \triangleq G_1 G^{(2)} G_2 G^{(3)} G_3 \cdots G^{(n_k)} G_{n_k}; \tag{29}
\]
\[
G_j \triangleq \mathrm{diag}(I_{\ell_1}, \ldots, I_{\ell_{j-1}}, G_{j,j}, I_{\ell_{j+1}}, \ldots, I_{\ell_{n_k}}), \qquad G_{j,j} \in \mathbb{C}^{\ell_j \times \ell_j} \text{ upper Toeplitz}, \tag{30}
\]
for $j = 1\!:\!n_k$, while
\[
G^{(j)} \triangleq \begin{bmatrix} I & & G_{1,j} & & \\ & \ddots & \vdots & & \\ & & G_{j-1,j} & & \\ & & I & & \\ & & & \ddots & \\ & & & & I \end{bmatrix}, \qquad G_{i,j} \triangleq \begin{bmatrix} \gamma_{ij}^{(1)} & \gamma_{ij}^{(2)} & \cdots & \gamma_{ij}^{(\ell_j - 1)} & \gamma_{ij}^{(\ell_j)} \\ & \gamma_{ij}^{(1)} & \gamma_{ij}^{(2)} & \cdots & \gamma_{ij}^{(\ell_j - 1)} \\ & & \ddots & \ddots & \vdots \\ & & & & \gamma_{ij}^{(2)} \\ & & & & \gamma_{ij}^{(1)} \\ & & & & \end{bmatrix} \in \mathbb{C}^{\ell_i \times \ell_j}, \tag{31}
\]
where $G^{(j)}$ is the identity except that its block column $j$ contains the upper Toeplitz blocks $G_{1,j}, \ldots, G_{j-1,j}$ above the diagonal block $I_{\ell_j}$,
for $i = 1\!:\!j-1$ and $j = 2\!:\!n_k$. The matrices $G_j$ and $G^{(j)}$ are nonsingular and commute with $J$ in (19). Lemma 1 with (29) then shows that $\widehat{Y} = Y G$ satisfies (27). Note that the first multiplication $Y G_1$ in (29) only alters $Y_1$ to become $\widehat{Y}_1$, which then remains unchanged in later steps. Then for $j > 1$, $Y_j$ is unchanged until step $j$,