A compact Arnoldi algorithm for polynomial eigenvalue problems
Yangfeng Su, Junyi Zhang, Zhaojun Bai
School of Mathematical Sciences Fudan University
January, 2008, Taiwan RANMEP2008 Goals
Polynomial Eigenvalue Problems (PEPs):
d A0 + λA1 + ... + λ Ad x = 0
(λ, x) is called an eigenpair. d=1, SEP,GEP d=2, QEP ( Quadratic Eigenvalue Problem) Goals Solving large scale polynomial eigenvalue problems with Implicitly Restarted Arnoldi (IRA) algorithm with less memory. Goals
Polynomial Eigenvalue Problems (PEPs):
d A0 + λA1 + ... + λ Ad x = 0
(λ, x) is called an eigenpair. d=1, SEP,GEP d=2, QEP ( Quadratic Eigenvalue Problem) Goals Solving large scale polynomial eigenvalue problems with Implicitly Restarted Arnoldi (IRA) algorithm with less memory. Outline
1 Background
2 Algorithm
3 Numerical Comparison Outline
1 Background
2 Algorithm
3 Numerical Comparison Linearization
Linearization ⇒ A larger GEP: A0 −A1 −A2 ... −Ad x I I λx − λ = 0 .. .. . . . . I I 0 λd−1x
Sleijpen, van der Vorst, and van Gijzen. Quadratic eigenproblems are no problem, SIAM News, 1996 Can be solved with ARPACK Not true: structure is neither used nor preserved. size and therefore memory are d times! This work: explore the structure to save memory in ARPACK. It is a byproduct of SOAR [Bai & S. SIMAX 2005]. Linearization
Linearization ⇒ A larger GEP: A0 −A1 −A2 ... −Ad x I I λx − λ = 0 .. .. . . . . I I 0 λd−1x
Sleijpen, van der Vorst, and van Gijzen. Quadratic eigenproblems are no problem, SIAM News, 1996 Can be solved with ARPACK Not true: structure is neither used nor preserved. size and therefore memory are d times! This work: explore the structure to save memory in ARPACK. It is a byproduct of SOAR [Bai & S. SIMAX 2005]. Linearization
Linearization ⇒ A larger GEP: A0 −A1 −A2 ... −Ad x I I λx − λ = 0 .. .. . . . . I I 0 λd−1x
Sleijpen, van der Vorst, and van Gijzen. Quadratic eigenproblems are no problem, SIAM News, 1996 Can be solved with ARPACK Not true: structure is neither used nor preserved. size and therefore memory are d times! This work: explore the structure to save memory in ARPACK. It is a byproduct of SOAR [Bai & S. SIMAX 2005]. Linearization
Linearization ⇒ A larger GEP: A0 −A1 −A2 ... −Ad x I I λx − λ = 0 .. .. . . . . I I 0 λd−1x
Sleijpen, van der Vorst, and van Gijzen. Quadratic eigenproblems are no problem, SIAM News, 1996 Can be solved with ARPACK Not true: structure is neither used nor preserved. size and therefore memory are d times! This work: explore the structure to save memory in ARPACK. It is a byproduct of SOAR [Bai & S. SIMAX 2005]. Arnoldi decomposition
An Arnoldi decomposition of order-j:
T OPQj = Qj Hj + hj+1,j qj+1ej
where OP: a matrix
Qj = [q1, q2,..., qj ]: orthonormal
Hj : upper Hessenberg
Arnoldi process is used to compute the Arnoldi decomposition with numerical stability Implicitly Restarted Arnoldi (IRA) for SEP
Sorensen [SIMAX92] Given an Arnoldi decomposition of order-p,
T OPQp = QpHp + βpqp+1ep .
1 Extend Arnoldi decomposition from order p to order k:
T AQk = Qk Hk + βk qk+1ek .
2 Divide the eigenvalues of Hk as “good” ones µ1, . . . , µp and “bad” ones µp+1, . . . , µk .
3 For Hk do implicitly QR steps with shifts µk+1, . . . , µk , get H Hk = UH˜k U 4 Take first p columns of
˜ T AQk U = Qk UHk + βk qk+1ek U
as a restarted Arnoldi decomposition of order p. ARPACK
ARPACK is an implementation of IRA algorithm a well-coded, well-documented package produced by Lehoucq, Sorensen and Yang during 1992-1997 used in MATLAB as eigs and arpackc IRA for QEP
For simplicity, we only discuss QEPs. For QEP:
(λ2M + λD + K)x = 0
1 shift-and-invert: for shift σ, let λ = σ + 1/µ
(µ2I − µA − B)x = 0
where
A = −(σ2M + σD + K)−1(2σM + D) B = −(σ2M + σD + K)−1M
2 linearize AB µx µx = µ I 0 x x
3 apply IRA IRA for QEP
Easy use of ARPACK
How to utilize the Frobenius structure to save memory? Outline
1 Background
2 Algorithm
3 Numerical Comparison Arnoldi decomposition for QEP
An Arnoldi decomposition with order-j
OPQj = Qj+1Hbj
For QEP: AB Qj,1 Qj+1,1 = Hbj I 0 Qj,2 Qj+1,2 Since Qj,1 = Qj+1,2Hbj we have Theorem
rank ([Qj,1, Qj,2]) ≡ rj ≤ j + 1
Observed by many people, e.g. Meerbergen [SIMAX06], Bai & S. [SIMAX05]. The key is how to use it with numerical stability. Arnoldi Decomposition for QEP
Theorem
rank ([Qj,1, Qj,2]) ≡ rj ≤ j + 1
Let n×rj Vj ∈ C = orth[Qj,1, Qj,2] then Qj,1 Vj Rj,1 Vj Rj,1 Qj = = = Qj,2 Vj Rj,2 Vj Rj,2
Two levels of orthonormality:
Vj is orthonormal R j,1 is orthonormal Rj,2 Compact ARnoldi Decomposition (CARD)
Compact ARnoldi Decomposition (CARD) Vj Rj,1 Vj+1 Rj+1,1 OP = Hbj Vj Rj,2 Vj+1 Rj+1,2
n×r Vj ∈ C j r ×j Rj ∈ C j
j ≤ rj+1 ≤ j + 1 Memory cost: Arnoldi: 2n(j + 1) ( for PEPs: dn(j + 1))
CARD: nrj+1 ≤ n(j + 2) (for PEPs: ≤ n(j + d + 1)) CARD process
CARD process is to compute the CARD with numerical stability!
CARD of order j: VR1 VR1 Vˆ Rˆ1 OP = H + βqe1 = Hˆ VR2 VR2 Vˆ Rˆ2
Expand it to a CARD of order j + 1 ⇒ next two pages Expand CARD process
1 compute q1 = Aq1 + Bq2;
2 decompose q1 = Vˆ x + vα with MGS
T x = Vˆ q1,
v = q1 − Vˆ x, α = kvk, v = v/α
3 update h i Vˆ = Vˆ , v , rj+1 = rj + 1,
Rˆ x Rˆ Rˆ (:, j + 1) Rˆ = 1 , Rˆ = 2 2 1 0 α 2 0 0 Expand CARD process
Rˆ1 x ˆ R1 0 α := Rˆ2 Rˆ2 Rˆ1 (:, j + 1) 0 0
4 decompose with MGS:
Rˆ (:, j + 2) Rˆ (:, 1 : j + 1) 1 = 1 H Rˆ (:, j + 2) Rˆ (:, 1 : j + 1) 1:j+1,j+1 2 old 2 Rˆ (:, j + 2) + 1 H Rˆ (:, j + 2) j+2,j+1 2 new 5 update the current Arnoldi vector q: q1 q = q2
q1 = Vˆ Rˆj+1,1 [:, j + 2]
q2 = Vˆ Rˆj+1,2 [:, j + 2]
Only GMS (with re-orthogonalization), no inversion CARD is numerically stable! IRA with CARD
Given a CARD of k-order: VR1 Vˆ Rˆ1 H OP = ˆ ˆ T VR2 V R2 βek
IRA does (m − p) QR steps on H with shifts µp+1, ..., µm, i.e.
H = UHUe H
Then VR1 Vˆ Rˆ1 UH˜ OP U = ˆ ˆ T VR2 V R2 βek U Denote U Uˆ = 1 IRA with CARD
Then VR1U Vˆ Rˆ1Uˆ H˜ OP = ˆ ˆ ˆ T VR2U V R2U βek U Its first p columns , denoted by Vk R1,p Vk+1R1,p+1 Hp OP = ˜ T Vk R2,p Vk+1R2,p+1 βep still form an Arnoldi decomposition of order p
However, the Vk has rk (instead of rp) columns, it is not a CARD! IRA with CARD
Since Vk+1R1,p+1 Vk+1R2,p+1 is the orthonormal factor of an Arnoldi decomposition, from previous theorem,
rank[Vk+1R1,p+1, Vk+1R2,p+1]
= rank[R1,p+1, R2,p+1]
= rp+1 ≤ p + 2,
we have a compact SVD:
rk+1×rp+1 rp+1×rp+1 rp+1×(p+1) rp+1×(p+1) [R1,p+1, R2,p+1] = P Σ [G1 , G2 ] ≡ P[R1, R2] IRA with CARD
Therefore,
n×r V k+1 R (V P)n×rp+1 R k+1 1,p+1 = k+1 1 Vk+1R2,p+1 (Vk+1P)R2
The Arnoldi decomposition is expressed in CARD again!
This process can also be implemented by a compact “QR” decomposition, which is similar with the compact Arnoldi decomposition (CARD). Details omitted. POLYAR
POLYAR: modified ARPARK for polynomial eigenvalue problems (not only QEPs) 1 znaitr p: compute CARD 2 znapps p: IRA with CARD; use LAPACK routine zgesdd to compute SVD decomposition 3 znaupd p, znaupd2 p, zgetv0 p: slightly revised (arguments, storage) 4 zgemip(added): compute inner product in compact form Outline
1 Background
2 Algorithm
3 Numerical Comparison Example 1: A random QEP
Problem: QEP (pdeg=2) Size: n=500 Environment: PC (EMS memory: 512M) Randomized Matrix M,D,K (each matrix have about 24,000 non-zero elements.) We choose shift σ = 1 and use shift-invert mode to compute eigenvalues close to 1 LU factorization of Mσ2 + Dσ + K, L and U contain about 120,000 non-zero elements. To compute 8 eigenvalues close to 1; Use 30 Arnoldi base vectors, says, 31 CARD base vectors. A random QEP
Computed eigenvalues:
ARPACK POLYAR Real Imag Real Imag 1 1.02817D+00 1.38768D−01 1.02817D+00 1.38768D-01 2 1.11582D+00 3.61818D-02 1.11582D+00 3.61818D-02 3 1.11582D+00 -3.61818D-02 1.11582D+00 -3.61818D-02 4 1.05613D+00 -1.05380D-01 1.05613D+00 -1.05380D-01 5 1.05613D+00 1.05380D-01 1.05613D+00 1.05380D-01 6 9.34692D-01 -2.73028D-15 9.34692D-01 4.39496D-15 7 1.00023D+00 5.87804D-02 1.00023D+00 5.87804D-02 8 1.00023D+00 -5.87804D-02 1.00023D+00 -5.87804D-02 A random QEP
Storage comparison:
ARPACK POLYAR V 500 × 2 × 30 500 × 31 workd 3 × 2 × 500 (2 + 2) × 500 Resid 500 × 2 500 × 2 A random QEP
Iteration and time comparison:
ARPACK POLYAR update iteration 4 4 OP × x operations 86 86 reorthogonalization of V 84 84 reorthogonalization of R 0 84 user’s OP × x operations 1.375000 1.250000 naupd2 1.609375 1.515625 basic Arnoldi iteration loop 1.531250 1.437500 reorthogonalization phrase 0.093750 0.046875 Hessenberg eig subproblem 0.031250 0.031250 applying the shifts 0.046875 0.046875 calling gesdd 0 0.000000 Example 2: A QEP from SLAC
Problem: QEP (pdeg=2) Size: n=5384 Environment: PC (EMS memory: 512M) Genuine Matrix M,D,K (M,D,K have 61425,1183,61425 non-zero elements respectively.) We choose shift σ = −10i and use shift-invert mode to compute eigenvalues close to −10i LU factorization of Mσ2 + Dσ + K, L and U have 749610 and 780229 non-zero elements. To compute 8 eigenvalues close to −10i; Use 30 Arnoldi base vectors, says, 31 CARD base vectors. A QEP from SLAC
ARPACK POLYAR Real Imag Real Imag 1 -8.17408D-13 -1.11248D+01 1.78859D-13 -1.11248D+01 2 -1.80351D-13 -1.10381D+01 1.48935D-13 -1.10381D+01 3 3.43204D-12 -8.96999D+00 1.85159D-12 -8.96999D+00 4 8.92409D-12 -1.09810D+01 2.12829D-12 -1.09810D+01 5 1.19634D-12 -1.07600D+01 2.63738D-13 -1.07600D+01 6 -1.30082D-13 -9.73674D+00 -4.44236D-14 -9.73674D+00 7 3.83685D-15 -9.79517D+00 1.48931D-15 -9.79517D+00 8 2.27178D-15 -1.00606D+01 -3.76020D-15 -1.00606D+01 A QEP from SLAC
Storage comparison:
ARPACK POLYAR V 5000 × 2 × 30 5000 × 31 workd 3 × 2 × 5000 (2 + 2) × 5000 Resid 5000 × 2 5000 × 2 A QEP from SLAC
Time comparison:
ARPACK POLYAR update iteration 2 2 OP × x operations 48 48 reorthogonalization of V 44 44 reorthogonalization of R 0 47 user’s OP × x operations 4.078125 4.093750 naupd2 5.343750 5.171875 basic Arnoldi iteration loop 5.203125 5.062500 reorthogonalization phrase 0.593750 0.250000 Hessenberg eig subproblem 0.015625 0.015625 applying the shifts 0.109375 0.078125 calling gesdd 0 0.000000 Thank you for your attention!