A compact Arnoldi algorithm for polynomial eigenvalue problems

Yangfeng Su, Junyi Zhang, Zhaojun Bai

School of Mathematical Sciences, Fudan University

RANMEP2008, Taiwan, January 2008

Goals

Polynomial Eigenvalue Problems (PEPs):

$$\left(A_0 + \lambda A_1 + \cdots + \lambda^d A_d\right)x = 0$$

$(\lambda, x)$ is called an eigenpair. For $d = 1$ this is a standard or generalized eigenvalue problem (SEP, GEP); for $d = 2$ it is a quadratic eigenvalue problem (QEP).

Goal: solve large-scale polynomial eigenvalue problems with the Implicitly Restarted Arnoldi (IRA) algorithm using less memory.

Outline

1 Background

2 Algorithm

3 Numerical Comparison

Linearization

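Linearization ⇒ a larger GEP:

$$\left(\begin{bmatrix} A_0 & & & \\ & I & & \\ & & \ddots & \\ & & & I \end{bmatrix} - \lambda\begin{bmatrix} -A_1 & -A_2 & \cdots & -A_d \\ I & & & 0 \\ & \ddots & & \vdots \\ & & I & 0 \end{bmatrix}\right)\begin{bmatrix} x \\ \lambda x \\ \vdots \\ \lambda^{d-1} x \end{bmatrix} = 0$$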

Sleijpen, van der Vorst, and van Gijzen, "Quadratic eigenproblems are no problem", SIAM News, 1996: the linearized problem can be solved with ARPACK.

Not true: the structure is neither exploited nor preserved, and the problem size, and therefore the memory, grows by a factor of d.

This work: exploit the structure to save memory in ARPACK. It is a byproduct of SOAR [Bai & Su, SIMAX 2005].
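A minimal NumPy sketch of the linearization above, offered as an illustration rather than the authors' code: it assembles the pencil $L_0 - \lambda L_1$ for a small random cubic PEP, recovers all $dn$ eigenvalues, and checks each one against the original polynomial through a normwise backward error. The helper names `companion_pencil` and `backward_error`, the random test data, and the use of dense `numpy.linalg.eigvals` on $L_1^{-1}L_0$ (valid only when $A_d$ is nonsingular) are assumptions of this sketch.

```python
import numpy as np


def companion_pencil(As):
    """Build (L0, L1) so that (L0 - lam * L1) z = 0 with z = [x; lam x; ...; lam^(d-1) x]
    linearizes (A0 + lam A1 + ... + lam^d Ad) x = 0, following the block layout above."""
    d, n = len(As) - 1, As[0].shape[0]
    L0 = np.eye(d * n, dtype=complex)
    L0[:n, :n] = As[0]                                    # diag(A0, I, ..., I)
    L1 = np.zeros((d * n, d * n), dtype=complex)
    for i in range(1, d + 1):
        L1[:n, (i - 1) * n:i * n] = -As[i]                # first block row: -A1 ... -Ad
    for i in range(1, d):
        L1[i * n:(i + 1) * n, (i - 1) * n:i * n] = np.eye(n)  # identity blocks below
    return L0, L1


def backward_error(l, As):
    """sigma_min(P(l)) / sum_i |l|^i ||A_i||: small means l is an accurate eigenvalue."""
    P = sum(l**i * Ai for i, Ai in enumerate(As))
    smin = np.linalg.svd(P, compute_uv=False)[-1]
    return smin / sum(abs(l)**i * np.linalg.norm(Ai, 2) for i, Ai in enumerate(As))


rng = np.random.default_rng(0)
n, d = 4, 3
As = [rng.standard_normal((n, n)) for _ in range(d + 1)]
L0, L1 = companion_pencil(As)
# L1 is invertible when Ad is, so the pencil's eigenvalues are those of L1^{-1} L0.
lam = np.linalg.eigvals(np.linalg.solve(L1, L0))
print(L0.shape, len(lam))                                 # (12, 12) 12
print(max(backward_error(l, As) for l in lam))
```

The pencil is $dn \times dn$, which is exactly the d-fold growth in size and memory that the slide objects to.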

Arnoldi decomposition

An Arnoldi decomposition of order $j$:

$$\mathrm{OP}\,Q_j = Q_j H_j + h_{j+1,j}\, q_{j+1} e_j^T$$

where

OP: a matrix (the operator),

$Q_j = [q_1, q_2, \ldots, q_j]$: orthonormal columns,

$H_j$: upper Hessenberg.

The Arnoldi process is used to compute the Arnoldi decomposition.

Implicitly Restarted Arnoldi (IRA) for SEP

Sorensen [SIMAX92]: given an Arnoldi decomposition of order $p$,

$$\mathrm{OP}\,Q_p = Q_p H_p + \beta_p q_{p+1} e_p^T,$$

1 Extend the Arnoldi decomposition from order $p$ to order $k$:

$$\mathrm{OP}\,Q_k = Q_k H_k + \beta_k q_{k+1} e_k^T.$$

2 Split the eigenvalues of $H_k$ into “good” ones $\mu_1, \ldots, \mu_p$ and “bad” ones $\mu_{p+1}, \ldots, \mu_k$.

3 Apply implicit QR steps to $H_k$ with the shifts $\mu_{p+1}, \ldots, \mu_k$, obtaining $H_k = U \tilde H_k U^H$.

4 Take the first $p$ columns of

$$\mathrm{OP}\,(Q_k U) = (Q_k U)\tilde H_k + \beta_k q_{k+1} e_k^T U$$

as a restarted Arnoldi decomposition of order $p$.
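The four restart steps can be sketched in NumPy as follows. This is an illustration under stated assumptions, not ARPACK's implementation: it applies the $k - p$ shifts as explicit shifted QR steps (equivalent in exact arithmetic to the implicit steps ARPACK uses), treats the $p$ Ritz values of largest modulus as the "good" ones, and the names `arnoldi` and `implicit_restart` are hypothetical. The final check verifies that the first $p$ columns again satisfy an order-$p$ relation $\mathrm{OP}\,V_p = V_p \tilde H_p + f e_p^T$.

```python
import numpy as np


def arnoldi(OP, v0, k):
    """k Arnoldi steps: OP @ Q[:, :k] == Q @ H with Q (n x (k+1)) orthonormal."""
    Q = np.zeros((OP.shape[0], k + 1), dtype=complex)
    H = np.zeros((k + 1, k), dtype=complex)
    Q[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(k):
        w = OP @ Q[:, j]
        for _ in range(2):                      # classical Gram-Schmidt, applied twice
            h = Q[:, :j + 1].conj().T @ w
            H[:j + 1, j] += h
            w = w - Q[:, :j + 1] @ h
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H


def implicit_restart(Q, H, p):
    """Shrink an order-k Arnoldi decomposition to order p using k - p shifted QR steps."""
    k = H.shape[1]
    Hk = H[:k, :k].copy()
    mu = np.linalg.eigvals(Hk)
    # Assumption: the p Ritz values of largest modulus are "good"; the rest are shifts.
    shifts = mu[np.argsort(-np.abs(mu))[p:]]
    U = np.eye(k, dtype=complex)
    for s in shifts:
        Qs, _ = np.linalg.qr(Hk - s * np.eye(k))  # explicit shifted QR step
        Hk = Qs.conj().T @ Hk @ Qs                 # remains Hessenberg up to rounding
        U = U @ Qs
    QU = Q[:, :k] @ U
    # New residual (1-based): H~(p+1, p) * (Q_k U)(:, p+1) + beta * U(k, p) * q_{k+1}
    f = Hk[p, p - 1] * QU[:, p] + H[k, k - 1] * U[k - 1, p - 1] * Q[:, k]
    return QU[:, :p], Hk[:p, :p], f


rng = np.random.default_rng(1)
n, k, p = 80, 20, 6
OP = rng.standard_normal((n, n))
Q, H = arnoldi(OP, rng.standard_normal(n), k)
Vp, Hp, f = implicit_restart(Q, H, p)
ep = np.zeros(p)
ep[-1] = 1.0
print(np.linalg.norm(OP @ Vp - Vp @ Hp - np.outer(f, ep)))  # small: order-p relation holds
```

The residual collapses into a single column because $U$ is a product of $k - p$ Hessenberg factors, so the first $p - 1$ entries of $e_k^T U$ vanish.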

ARPACK

ARPACK is an implementation of the IRA algorithm: a well-coded, well-documented package developed by Lehoucq, Sorensen and Yang during 1992-1997, and used in MATLAB as eigs and arpackc.

IRA for QEP

For simplicity, we discuss only QEPs:

$$(\lambda^2 M + \lambda D + K)x = 0$$

1 Shift-and-invert: for a shift $\sigma$, let $\lambda = \sigma + 1/\mu$. Then

$$(\mu^2 I - \mu A - B)x = 0,$$

where

$$A = -(\sigma^2 M + \sigma D + K)^{-1}(2\sigma M + D), \qquad B = -(\sigma^2 M + \sigma D + K)^{-1} M.$$

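2 Linearize:

$$\begin{bmatrix} A & B \\ I & 0 \end{bmatrix}\begin{bmatrix} \mu x \\ x \end{bmatrix} = \mu\begin{bmatrix} \mu x \\ x \end{bmatrix}$$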

3 Apply IRA.
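A small dense NumPy sketch of steps 1 and 2, assuming random $M, D, K$ and an arbitrary shift $\sigma$ (both hypothetical test data). Substituting $\lambda = \sigma + 1/\mu$ into $\lambda^2 M + \lambda D + K$ and multiplying by $\mu^2(\sigma^2 M + \sigma D + K)^{-1}$ gives exactly $\mu^2 I - \mu A - B$; the code checks this by mapping every eigenvalue $\mu$ of the linearized operator back to $\lambda$ and measuring the QEP backward error. Dense `eigvals` stands in for step 3 (IRA) here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
M, D, K = (rng.standard_normal((n, n)) for _ in range(3))
sigma = 0.3 + 0.5j                                    # hypothetical shift

# Step 1: shift-and-invert with lambda = sigma + 1/mu.
K_sigma = sigma**2 * M + sigma * D + K
A = -np.linalg.solve(K_sigma, 2 * sigma * M + D)
B = -np.linalg.solve(K_sigma, M)

# Step 2: linearize (mu^2 I - mu A - B) x = 0 as OP [mu x; x] = mu [mu x; x].
OP = np.block([[A, B], [np.eye(n), np.zeros((n, n))]])

# Step 3 would hand OP to IRA; this sketch computes all eigenvalues densely instead.
mu = np.linalg.eigvals(OP)
lam = sigma + 1.0 / mu


def backward_error(l):
    """sigma_min(P(l)) / (|l|^2 ||M|| + |l| ||D|| + ||K||) for P(l) = l^2 M + l D + K."""
    smin = np.linalg.svd(l**2 * M + l * D + K, compute_uv=False)[-1]
    scale = abs(l)**2 * np.linalg.norm(M, 2) + abs(l) * np.linalg.norm(D, 2) + np.linalg.norm(K, 2)
    return smin / scale


errs = [backward_error(l) for l in lam]
print(max(errs))
# |lambda - sigma| = 1/|mu|: the largest |mu| give the lambda closest to sigma.
print(lam[np.argsort(-np.abs(mu))[:3]])
```

Because $|\lambda - \sigma| = 1/|\mu|$, asking IRA for the eigenvalues $\mu$ of largest modulus returns the eigenvalues $\lambda$ nearest the shift.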

This allows easy use of ARPACK.

But how can the Frobenius (companion) structure be exploited to save memory?

2 Algorithm

Arnoldi decomposition for QEP

An Arnoldi decomposition of order $j$:

$$\mathrm{OP}\,Q_j = Q_{j+1}\hat H_j,$$

where $\hat H_j$ is $(j+1)\times j$ upper Hessenberg.

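For the QEP, $\mathrm{OP} = \begin{bmatrix} A & B \\ I & 0 \end{bmatrix}$, and partitioning $Q_j$ conformally gives

$$\begin{bmatrix} A & B \\ I & 0 \end{bmatrix}\begin{bmatrix} Q_{j,1} \\ Q_{j,2} \end{bmatrix} = \begin{bmatrix} Q_{j+1,1} \\ Q_{j+1,2} \end{bmatrix}\hat H_j.$$

The second block row gives $Q_{j,1} = Q_{j+1,2}\hat H_j$, so every column of $Q_{j,1}$ lies in $\operatorname{span}(Q_{j+1,2}) = \operatorname{span}([Q_{j,2},\ q_{j+1,2}])$, where $q_{j+1,2}$ is the second block of $q_{j+1}$. Hence:

Theorem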

$$\operatorname{rank}([Q_{j,1},\ Q_{j,2}]) \equiv r_j \le j + 1$$

This has been observed by many people, e.g., Meerbergen [SIMAX06] and Bai & Su [SIMAX05]. The key question is how to use it with numerical stability.


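Let $V_j \in \mathbb{C}^{n\times r_j}$ be an orthonormal basis of the range of $[Q_{j,1},\ Q_{j,2}]$, i.e., $V_j = \operatorname{orth}[Q_{j,1},\ Q_{j,2}]$. Then

$$Q_j = \begin{bmatrix} Q_{j,1} \\ Q_{j,2} \end{bmatrix} = \begin{bmatrix} V_j R_{j,1} \\ V_j R_{j,2} \end{bmatrix} = \begin{bmatrix} V_j & \\ & V_j \end{bmatrix}\begin{bmatrix} R_{j,1} \\ R_{j,2} \end{bmatrix}$$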

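Two levels of orthonormality:

$V_j$ has orthonormal columns.

$\begin{bmatrix} R_{j,1} \\ R_{j,2} \end{bmatrix}$ has orthonormal columns, since $V_j^H V_j = I$ gives $I = Q_j^H Q_j = R_{j,1}^H R_{j,1} + R_{j,2}^H R_{j,2}$.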
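The theorem and the two levels of orthonormality can be checked numerically. The sketch below (hypothetical names; random $A$ and $B$ stand in for the shift-inverted coefficients) runs plain Arnoldi on $\mathrm{OP} = \begin{bmatrix} A & B \\ I & 0 \end{bmatrix}$, takes $V_j$ from an SVD of $[Q_{j,1}, Q_{j,2}]$ with an assumed rank tolerance, and confirms $Q_j = \operatorname{diag}(V_j, V_j)\begin{bmatrix} R_{j,1} \\ R_{j,2} \end{bmatrix}$ with orthonormal $\begin{bmatrix} R_{j,1} \\ R_{j,2} \end{bmatrix}$. It still forms the full $Q_j$, so it illustrates the structure rather than saving memory.

```python
import numpy as np


def arnoldi(OP, v0, k):
    """k Arnoldi steps: OP @ Q[:, :k] == Q @ H with Q (N x (k+1)) orthonormal."""
    Q = np.zeros((OP.shape[0], k + 1), dtype=complex)
    H = np.zeros((k + 1, k), dtype=complex)
    Q[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(k):
        w = OP @ Q[:, j]
        for _ in range(2):                      # classical Gram-Schmidt, applied twice
            h = Q[:, :j + 1].conj().T @ w
            H[:j + 1, j] += h
            w = w - Q[:, :j + 1] @ h
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H


rng = np.random.default_rng(2)
n, j = 100, 12
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
OP = np.block([[A, B], [np.eye(n), np.zeros((n, n))]])
Q, _ = arnoldi(OP, rng.standard_normal(2 * n), j)
Q1, Q2 = Q[:n, :j], Q[n:, :j]                   # the two blocks of Q_j

U, s, _ = np.linalg.svd(np.hstack([Q1, Q2]), full_matrices=False)
r = int(np.sum(s > 1e-10 * s[0]))               # numerical rank r_j (tolerance assumed)
V = U[:, :r]                                    # V_j = orth[Q_{j,1}, Q_{j,2}]
R1, R2 = V.conj().T @ Q1, V.conj().T @ Q2       # r_j x j coefficient blocks
R = np.vstack([R1, R2])
print(r, "<=", j + 1)
print(np.linalg.norm(np.vstack([V @ R1, V @ R2]) - np.vstack([Q1, Q2])))  # small
print(np.linalg.norm(R.conj().T @ R - np.eye(j)))                         # small
```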

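Compact ARnoldi Decomposition (CARD)

$$\mathrm{OP}\begin{bmatrix} V_j R_{j,1} \\ V_j R_{j,2} \end{bmatrix} = \begin{bmatrix} V_{j+1} R_{j+1,1} \\ V_{j+1} R_{j+1,2} \end{bmatrix}\hat H_j$$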

$V_j \in \mathbb{C}^{n\times r_j}$, $R_{j,1}, R_{j,2} \in \mathbb{C}^{r_j\times j}$

$j \le r_j \le j + 1$

Memory cost:

Arnoldi: $2n(j+1)$ (for PEPs: $dn(j+1)$)

CARD: $n r_{j+1} \le n(j+2)$ (for PEPs: $\le n(j+d)$)

CARD process

The CARD process computes the CARD in a numerically stable way.

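CARD of order $j$:

$$\mathrm{OP}\begin{bmatrix} V R_1 \\ V R_2 \end{bmatrix} = \begin{bmatrix} V R_1 \\ V R_2 \end{bmatrix} H + \beta q e_j^T = \begin{bmatrix} \hat V \hat R_1 \\ \hat V \hat R_2 \end{bmatrix}\hat H,$$

where $q = \begin{bmatrix} q_1 \\ q_2 \end{bmatrix} = \begin{bmatrix} \hat V \hat R_1(:, j+1) \\ \hat V \hat R_2(:, j+1) \end{bmatrix}$ is the current Arnoldi vector.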

Expand it to a CARD of order $j+1$ as follows.

Expand CARD process

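1 Compute $\tilde q_1 = A q_1 + B q_2$, the first block of $\mathrm{OP}\,q$. The second block of $\mathrm{OP}\,q$ is $q_1 = \hat V \hat R_1(:, j+1)$, which already lies in the range of $\hat V$.

2 Decompose $\tilde q_1 = \hat V x + v\alpha$ with MGS:

$x = \hat V^H \tilde q_1$, $v = \tilde q_1 - \hat V x$, $\alpha = \|v\|$, $v = v/\alpha$.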

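3 Update $\hat V = [\hat V,\ v]$, so $\hat V$ gains one column, and

$$\begin{bmatrix} \hat R_1 \\ \hat R_2 \end{bmatrix} := \begin{bmatrix} \hat R_1 & x \\ 0 & \alpha \\ \hat R_2 & \hat R_1(:, j+1) \\ 0 & 0 \end{bmatrix}$$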


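4 Orthogonalize the new last column against the previous ones with MGS (this works entirely in the small $R$-space, because $\hat V$ has orthonormal columns):

$$\begin{bmatrix} \hat R_1(:, j+2) \\ \hat R_2(:, j+2) \end{bmatrix}_{\text{old}} = \begin{bmatrix} \hat R_1(:, 1{:}j+1) \\ \hat R_2(:, 1{:}j+1) \end{bmatrix}\hat H_{1:j+1,\,j+1} + \begin{bmatrix} \hat R_1(:, j+2) \\ \hat R_2(:, j+2) \end{bmatrix}_{\text{new}}\hat H_{j+2,\,j+1}$$

5 Update the current Arnoldi vector $q = \begin{bmatrix} q_1 \\ q_2 \end{bmatrix}$:

$q_1 = \hat V \hat R_1(:, j+2)$

$q_2 = \hat V \hat R_2(:, j+2)$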
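The five steps can be sketched in NumPy as follows. This is an illustration under assumptions, not POLYAR's Fortran code: the function name `card_arnoldi` is hypothetical, the starting vector is given as two blocks $q_1, q_2$, MGS is applied twice (one reorthogonalization pass) at both levels, and breakdown or deflation ($\alpha = 0$) is not handled. Only $V$ ($n \times r$) and the small $R_1, R_2$ are kept; the full basis is rebuilt at the end purely to check the Arnoldi relation.

```python
import numpy as np


def card_arnoldi(A, B, q1, q2, m):
    """m steps of a CARD process for OP = [[A, B], [I, 0]] (illustrative sketch).
    Assumes no breakdown or deflation (alpha > 0 and H[k+1, k] > 0)."""
    scale = np.sqrt(np.linalg.norm(q1) ** 2 + np.linalg.norm(q2) ** 2)
    q1, q2 = q1 / scale, q2 / scale
    V, _ = np.linalg.qr(np.column_stack([q1, q2]))      # orthonormal basis of span{q1, q2}
    R1, R2 = (V.conj().T @ q1)[:, None], (V.conj().T @ q2)[:, None]
    H = np.zeros((m + 1, m), dtype=complex)
    for k in range(m):
        # Current Arnoldi vector q = [V R1[:, k]; V R2[:, k]] (step 5 of the previous pass).
        c1, c2 = V @ R1[:, k], V @ R2[:, k]
        # Step 1: only the first block of OP q is new; its second block is c1.
        w = A @ c1 + B @ c2
        # Step 2: w = V x + alpha v by MGS against V, with one reorthogonalization pass.
        x = np.zeros(V.shape[1], dtype=complex)
        for _ in range(2):
            for i in range(V.shape[1]):
                t = np.vdot(V[:, i], w)
                x[i] += t
                w = w - t * V[:, i]
        alpha = np.linalg.norm(w)
        # Step 3: append v = w / alpha to V; pad R1, R2 with a zero row.
        V = np.column_stack([V, w / alpha])
        R1 = np.vstack([R1, np.zeros((1, k + 1))])
        R2 = np.vstack([R2, np.zeros((1, k + 1))])
        s = np.concatenate([np.append(x, alpha), R1[:, k]])  # OP q in compact coordinates
        # Step 4: MGS of s against the columns of [R1; R2] (vectors of length 2r only).
        Rc = np.vstack([R1, R2])
        for _ in range(2):
            for i in range(k + 1):
                t = np.vdot(Rc[:, i], s)
                H[i, k] += t
                s = s - t * Rc[:, i]
        H[k + 1, k] = np.linalg.norm(s)
        s = s / H[k + 1, k]
        r = V.shape[1]
        R1, R2 = np.column_stack([R1, s[:r]]), np.column_stack([R2, s[r:]])
    return V, R1, R2, H


rng = np.random.default_rng(0)
n, m = 60, 10
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
V, R1, R2, H = card_arnoldi(A, B, rng.standard_normal(n), rng.standard_normal(n), m)
Q = np.vstack([V @ R1, V @ R2])                         # full basis, rebuilt for checking only
OP = np.block([[A, B], [np.eye(n), np.zeros((n, n))]])
print(V.shape, Q.shape)                                 # (60, 12) (120, 11)
print(np.linalg.norm(OP @ Q[:, :m] - Q @ H))            # small: Arnoldi relation holds
```

With two independent starting blocks each step adds exactly one column to $V$, so after $m$ steps $V$ has $m + 2$ columns for $m + 1$ Arnoldi vectors, matching the bound $r_{j+1} \le j + 2$.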

Only MGS (with reorthogonalization) is used and no matrix inversion is needed, so the CARD process is numerically stable.

IRA with CARD

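Given a CARD of order $k$:

$$\mathrm{OP}\begin{bmatrix} V R_1 \\ V R_2 \end{bmatrix} = \begin{bmatrix} \hat V \hat R_1 \\ \hat V \hat R_2 \end{bmatrix}\begin{bmatrix} H \\ \beta e_k^T \end{bmatrix}$$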

IRA performs $k - p$ QR steps on $H$ with shifts $\mu_{p+1}, \ldots, \mu_k$, i.e.,

$$H = U \tilde H U^H$$

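Then

$$\mathrm{OP}\begin{bmatrix} V R_1 \\ V R_2 \end{bmatrix} U = \begin{bmatrix} \hat V \hat R_1 \\ \hat V \hat R_2 \end{bmatrix}\begin{bmatrix} U\tilde H \\ \beta e_k^T U \end{bmatrix}.$$

Denote $\hat U = \begin{bmatrix} U & 0 \\ 0 & 1 \end{bmatrix}$.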

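Then

$$\mathrm{OP}\begin{bmatrix} V R_1 U \\ V R_2 U \end{bmatrix} = \begin{bmatrix} \hat V \hat R_1 \hat U \\ \hat V \hat R_2 \hat U \end{bmatrix}\begin{bmatrix} \tilde H \\ \beta e_k^T U \end{bmatrix}.$$

Its first $p$ columns, denoted by

$$\mathrm{OP}\begin{bmatrix} V_k R_{1,p} \\ V_k R_{2,p} \end{bmatrix} = \begin{bmatrix} V_{k+1} R_{1,p+1} \\ V_{k+1} R_{2,p+1} \end{bmatrix}\begin{bmatrix} \tilde H_p \\ \tilde\beta\, e_p^T \end{bmatrix}$$

(with $V_k = V$ and $V_{k+1} = \hat V$), still form an Arnoldi decomposition of order $p$.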

However, $V_k$ still has $r_k$ columns (instead of $r_p$), so this is not a CARD.

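Since $\begin{bmatrix} V_{k+1} R_{1,p+1} \\ V_{k+1} R_{2,p+1} \end{bmatrix}$ is the orthonormal factor of an Arnoldi decomposition of order $p$, the previous theorem gives (the first equality holds because $V_{k+1}$ has orthonormal columns)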

$$\operatorname{rank}[V_{k+1} R_{1,p+1},\ V_{k+1} R_{2,p+1}] = \operatorname{rank}[R_{1,p+1},\ R_{2,p+1}] = r_{p+1} \le p + 2,$$

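so we have a compact SVD:

$$[R_{1,p+1},\ R_{2,p+1}] = P\,\Sigma\,[G_1,\ G_2] \equiv P\,[R_1,\ R_2],$$

where $P \in \mathbb{C}^{r_{k+1}\times r_{p+1}}$, $\Sigma \in \mathbb{R}^{r_{p+1}\times r_{p+1}}$, $G_1, G_2 \in \mathbb{C}^{r_{p+1}\times(p+1)}$, and $R_i = \Sigma G_i$.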

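Therefore,

$$\begin{bmatrix} V_{k+1} R_{1,p+1} \\ V_{k+1} R_{2,p+1} \end{bmatrix} = \begin{bmatrix} (V_{k+1}P) R_1 \\ (V_{k+1}P) R_2 \end{bmatrix}, \qquad V_{k+1} \in \mathbb{C}^{n\times r_{k+1}},\quad V_{k+1}P \in \mathbb{C}^{n\times r_{p+1}}.$$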

The Arnoldi decomposition is again expressed as a CARD!
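The recompression step can be sketched as follows, assuming `numpy.linalg.svd` in place of the LAPACK SVD routine used by POLYAR and a relative tolerance (also an assumption) for the numerical rank $r_{p+1}$. The name `compress_card` is hypothetical, and the test data are synthetic rank-deficient factors rather than the output of an actual restart.

```python
import numpy as np


def compress_card(V, R1, R2, tol=1e-12):
    """Given V (n x r, orthonormal) and R1, R2 (r x (p+1)) with rank([R1, R2]) < r,
    return W = V P (n x r') and new R1, R2 (r' x (p+1)) with V @ Ri == W @ Ri_new."""
    m = R1.shape[1]
    P, s, Gh = np.linalg.svd(np.hstack([R1, R2]), full_matrices=False)
    rank = int(np.sum(s > tol * s[0]))          # numerical rank r_{p+1}
    P, s, Gh = P[:, :rank], s[:rank], Gh[:rank, :]
    SG = s[:, None] * Gh                        # Sigma [G1, G2]
    return V @ P, SG[:, :m], SG[:, m:]


rng = np.random.default_rng(4)
n, r, p = 200, 12, 5
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
W0 = rng.standard_normal((r, p + 2))            # forces rank([R1, R2]) = p + 2
R1 = W0 @ rng.standard_normal((p + 2, p + 1))
R2 = W0 @ rng.standard_normal((p + 2, p + 1))
W, R1n, R2n = compress_card(V, R1, R2)
print(V.shape[1], "->", W.shape[1])             # 12 -> 7
```

Only the small $r_{k+1} \times 2(p+1)$ matrix is factorized; the long basis is touched once, in the product $V P$.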

This process can also be implemented with a compact “QR” decomposition, similar to the compact Arnoldi decomposition (CARD). Details are omitted.

POLYAR

POLYAR: a modified ARPACK for polynomial eigenvalue problems (not only QEPs).

1 znaitr_p: computes the CARD.

2 znapps_p: IRA with CARD; uses the LAPACK routine zgesdd to compute the SVD.

3 znaupd_p, znaupd2_p, zgetv0_p: slightly revised (arguments, storage).

4 zgemip (added): computes inner products in compact form.
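What an inner product "in compact form" can look like is illustrated below; this is an assumption about the idea, not the interface of zgemip, and all names are hypothetical. If $x = \begin{bmatrix} V a_1 \\ V a_2 \end{bmatrix}$ and $y = \begin{bmatrix} V b_1 \\ V b_2 \end{bmatrix}$ with $V^H V = I$, then $x^H y = a_1^H b_1 + a_2^H b_2$, so only the short coefficient vectors are needed.

```python
import numpy as np


def compact_inner(a1, a2, b1, b2):
    """x^H y for x = [V a1; V a2], y = [V b1; V b2] when V has orthonormal columns."""
    return np.vdot(a1, b1) + np.vdot(a2, b2)


rng = np.random.default_rng(3)
n, r = 1000, 8
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
a1, a2, b1, b2 = (rng.standard_normal(r) + 1j * rng.standard_normal(r) for _ in range(4))
x = np.concatenate([V @ a1, V @ a2])
y = np.concatenate([V @ b1, V @ b2])
full = np.vdot(x, y)                            # works on vectors of length 2n
compact = compact_inner(a1, a2, b1, b2)         # works on vectors of length r only
print(abs(full - compact))                      # rounding-level difference
```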

3 Numerical Comparison

Example 1: A random QEP

Problem: QEP (pdeg = 2), size n = 500

Environment: PC (EMS memory: 512M)

Matrices: random M, D, K, each with about 24,000 nonzero elements

Shift: σ = 1; shift-and-invert mode is used to compute eigenvalues close to 1

LU factorization of $\sigma^2 M + \sigma D + K$: L and U contain about 120,000 nonzero elements

Task: compute 8 eigenvalues close to 1 using 30 Arnoldi basis vectors, i.e., 31 CARD basis vectors

Computed eigenvalues:

#   ARPACK Real    ARPACK Imag    POLYAR Real    POLYAR Imag
1   1.02817D+00    1.38768D-01    1.02817D+00    1.38768D-01
2   1.11582D+00    3.61818D-02    1.11582D+00    3.61818D-02
3   1.11582D+00   -3.61818D-02    1.11582D+00   -3.61818D-02
4   1.05613D+00   -1.05380D-01    1.05613D+00   -1.05380D-01
5   1.05613D+00    1.05380D-01    1.05613D+00    1.05380D-01
6   9.34692D-01   -2.73028D-15    9.34692D-01    4.39496D-15
7   1.00023D+00    5.87804D-02    1.00023D+00    5.87804D-02
8   1.00023D+00   -5.87804D-02    1.00023D+00   -5.87804D-02

Storage comparison:

         ARPACK            POLYAR
V        500 × 2 × 30      500 × 31
workd    3 × 2 × 500       (2 + 2) × 500
Resid    500 × 2           500 × 2

Iteration and time comparison:

Counts:                           ARPACK      POLYAR
update iterations                 4           4
OP × x operations                 86          86
reorthogonalization of V          84          84
reorthogonalization of R          0           84

Times:                            ARPACK      POLYAR
user's OP × x operations          1.375000    1.250000
naupd2                            1.609375    1.515625
basic Arnoldi iteration loop      1.531250    1.437500
reorthogonalization phase         0.093750    0.046875
Hessenberg eig subproblem         0.031250    0.031250
applying the shifts               0.046875    0.046875
calling gesdd                     0           0.000000

Example 2: A QEP from SLAC

Problem: QEP (pdeg = 2), size n = 5384

Environment: PC (EMS memory: 512M)

Matrices: genuine application matrices M, D, K with 61425, 1183 and 61425 nonzero elements, respectively

Shift: σ = −10i; shift-and-invert mode is used to compute eigenvalues close to −10i

LU factorization of $\sigma^2 M + \sigma D + K$: L and U have 749610 and 780229 nonzero elements

Task: compute 8 eigenvalues close to −10i using 30 Arnoldi basis vectors, i.e., 31 CARD basis vectors

Computed eigenvalues:

#   ARPACK Real     ARPACK Imag    POLYAR Real     POLYAR Imag
1   -8.17408D-13    -1.11248D+01    1.78859D-13    -1.11248D+01
2   -1.80351D-13    -1.10381D+01    1.48935D-13    -1.10381D+01
3    3.43204D-12    -8.96999D+00    1.85159D-12    -8.96999D+00
4    8.92409D-12    -1.09810D+01    2.12829D-12    -1.09810D+01
5    1.19634D-12    -1.07600D+01    2.63738D-13    -1.07600D+01
6   -1.30082D-13    -9.73674D+00   -4.44236D-14    -9.73674D+00
7    3.83685D-15    -9.79517D+00    1.48931D-15    -9.79517D+00
8    2.27178D-15    -1.00606D+01   -3.76020D-15    -1.00606D+01

Storage comparison:

         ARPACK             POLYAR
V        5000 × 2 × 30      5000 × 31
workd    3 × 2 × 5000       (2 + 2) × 5000
Resid    5000 × 2           5000 × 2

Time comparison:

Counts:                           ARPACK      POLYAR
update iterations                 2           2
OP × x operations                 48          48
reorthogonalization of V          44          44
reorthogonalization of R          0           47

Times:                            ARPACK      POLYAR
user's OP × x operations          4.078125    4.093750
naupd2                            5.343750    5.171875
basic Arnoldi iteration loop      5.203125    5.062500
reorthogonalization phase         0.593750    0.250000
Hessenberg eig subproblem         0.015625    0.015625
applying the shifts               0.109375    0.078125
calling gesdd                     0           0.000000

Thank you for your attention!