
Pseudospectra and the Behavior of Dynamical Systems

Mark Embree
Computational and Applied Mathematics
Rice University, Houston, Texas

Trogir, Croatia, October 2011

Overview of Lectures

These lectures address modern tools for the spectral analysis of dynamical systems. We shall cover a mix of theory, computation, and applications. Goal: by the end, you should be equipped to understand phenomena that many people find quite mysterious when encountered in the wild.

Lecture 1:

▶ Normality and Nonnormality

▶ Pseudospectra

▶ Bounding Functions of Matrices

Lecture 2:

▶ Balanced Truncation and Lyapunov Equations

▶ Moment Matching Model Reduction

▶ Differential Algebraic Equations

Quiz

The plots below show the solutions of two dynamical systems, ẋ(t) = Ax(t). Which system is stable? That is, for which system does x(t) → 0 as t → ∞?

(a) neither system is stable
(b) only the blue system is stable
(c) only the red system is stable
(d) both systems are stable

[Plots: eigenvalues of the original unstable system and of the stabilized system.]

Eigenvalues of 55 × 55 Boeing 767 flutter models [Burke, Lewis, Overton]. These eigenvalues do not reveal the exotic transient behavior.

Why is Transient Growth a Big Deal?

Many linear systems arise from the linearization of nonlinear equations, e.g., Navier–Stokes. We compute eigenvalues as part of linear stability analysis.

Transient growth in a stable linearized system has implications for the behavior of the associated nonlinear system.

Trefethen, Trefethen, Reddy, Driscoll, “Hydrodynamic stability without eigenvalues,” Science, 1993.

Why is Transient Growth a Big Deal?

Consider the autonomous nonlinear system u′(t) = f(u).

▶ Find a steady state u∗, i.e., f(u∗) = 0.

▶ Linearize f about this steady state and analyze small perturbations u = u∗ + v:

  v′(t) = u′(t) = f(u∗ + v) = f(u∗) + Av + O(‖v‖²) = Av + O(‖v‖²),

  where A is the Jacobian of f at u∗.

▶ Ignore higher-order effects, and analyze the linear system v′(t) = Av(t). The steady state u∗ is stable provided A is stable.

But what if the small perturbation v(t) grows by orders of magnitude before eventually decaying?

Spectra and Pseudospectra

Source for much of the content of these lectures:

Trefethen & Embree, Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators, Princeton University Press, 2005.

1. Normality and Nonnormality

Motivating Applications

Many applications lead to models involving linear, non-self-adjoint operators.

▶ convective fluid flows
▶ damped mechanical systems
▶ atmospheric science
▶ magnetohydrodynamics
▶ neutron transport
▶ population dynamics
▶ food webs
▶ directed social networks
▶ Markov chains
▶ lasers

u_t(x, t) = ν∆u(x, t) − (a · ∇)u(x, t)

Mx′′(t) = −Kx(t) − Dx′(t)

Normality and Nonnormality

Unless otherwise noted, all matrices are of size n × n, with complex entries. The adjoint (conjugate transpose) is denoted by A∗ = Āᵀ.

Definition (normal)

The matrix A is normal if it commutes with its adjoint: A∗A = AA∗.

A = [ 1 1; −1 1 ]:  A∗A = AA∗ = [ 2 0; 0 2 ]  ⟹ normal
A = [ −1 1; 0 1 ]:  A∗A = [ 1 −1; −1 2 ] ≠ [ 2 1; 1 1 ] = AA∗  ⟹ nonnormal

Important note: The adjoint is defined via the inner product ⟨Ax, y⟩ = ⟨x, A∗y⟩; hence the definition of normality depends on the inner product. Here we always use the standard Euclidean inner product, unless noted. In applications, one must use the physically relevant inner product.

Conditions for Normality

Many (∼100) equivalent definitions of normality are known; see [Grone et al. 1987]. By far the most important of these concerns the eigenvectors of A.

Theorem

The matrix A is normal if and only if it is unitarily diagonalizable,

A = UΛU∗,

for U unitary (U∗U = I) and Λ diagonal. Equivalently, A is normal if and only if it possesses an orthonormal basis of eigenvectors (the columns of U).

Hence, any nondiagonalizable (defective) matrix is nonnormal. But there are many interesting diagonalizable nonnormal matrices. Our fixation with diagonalizability has caused us to overlook these matrices.

Orthogonality of Eigenvectors


An orthogonal basis of eigenvectors gives a perfect coordinate system for studying dynamical systems:

x′(t) = Ax(t)  ⟹  U∗x′(t) = U∗AUU∗x(t)  ⟹  z′(t) = Λz(t)  ⟹  z′j(t) = λj zj(t),

with ‖x(t)‖ = ‖z(t)‖ for all t, where z(t) := U∗x(t).

The Perils of Oblique Eigenvectors

Now suppose we only have a diagonalization, A = VΛV⁻¹:

x′(t) = Ax(t)  ⟹  V⁻¹x′(t) = V⁻¹AVV⁻¹x(t)  ⟹  z′(t) = Λz(t)  ⟹  z′j(t) = λj zj(t),

with ‖x(t)‖ ≠ ‖z(t)‖ in general.

The exact solution is easy:

x(t) = Vz(t) = Σ_{k=1}^{n} e^{tλk} zk(0) vk.

Suppose ‖x(0)‖ = 1. The coefficients zk(0) might still be quite large:

x(0) = Vz(0) = Σ_{k=1}^{n} zk(0) vk.

The “cancellation” that gives x(0) is washed out for t > 0 by the e^{tλk} terms.

Oblique Eigenvectors: Example

Example

x′(t) = [ −1/2  500; 0  −5 ] x(t),    x(0) = [ 1; 1 ].

Eigenvalues and eigenvectors:

λ₁ = −1/2,  v₁ = [ 1; 0 ];    λ₂ = −5,  v₂ = [ 1; −0.009 ].

Initial condition:

x(0) = [ 1; 1 ] = (1009/9) v₁ − (1000/9) v₂.

Exact solution:

x(t) = (1009/9) e^{λ₁t} v₁ − (1000/9) e^{λ₂t} v₂.

Oblique Eigenvectors: Example

[Plot: the solution norm grows from ‖x(0)‖ to ‖x(0.4)‖ = 76.75 before decaying; note the different scales of the horizontal and vertical axes.]

Oblique Eigenvectors Can Lead to Transient Growth

Transient growth in the solution: a consequence of nonnormality.
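This growth is easy to reproduce; a minimal sketch (the time grid is an arbitrary choice):

    % Transient growth for the stable 2 x 2 example above.
    A = [-1/2 500; 0 -5];                          % eigenvalues -1/2 and -5
    x0 = [1; 1];
    t = linspace(0, 10, 500);
    nrm = arrayfun(@(s) norm(expm(s*A)*x0), t);
    plot(t, nrm), xlabel('t'), ylabel('||x(t)||')  % peak ~76.75 near t = 0.4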

See Figure 1 in “Nineteen Dubious Ways to Compute the Exponential of a Matrix” by Moler and Van Loan, SIAM Review, 1978.

Transient Behavior of the Power Method for Computing Eigenvalues

Large coefficients in the expansion of x₀ in the eigenvector basis can lead to cancellation effects in x_k = A^k x₀. Example: here different choices of α and β affect eigenvalue conditioning,

A = [ 1  α  0; 0  3/4  β; 0  0  −3/4 ],   v₁ = [ 1; 0; 0 ],   v₂ = [ −4α; 1; 0 ],   v₃ = [ 8αβ/21; −2β/3; 1 ].

[Three plots: power method iterates x₀, ..., x₈ in the (v₁, v₂, v₃) coordinates, for α = β = 0 (normal: v₁ ⊥ v₂ ⊥ v₃), α = 10, β = 0 (nonnormal: v₁ not ⊥ v₂, v₂ ⊥ v₃), and α = 0, β = 10 (nonnormal: v₁ ⊥ v₂, v₂ not ⊥ v₃).]

Cf. [Beattie, E., Rossi 2004; Beattie, E., Sorensen 2005]

Tools for Measuring Nonnormality

Given a matrix, we would like some effective way to measure whether we should be concerned about the effects of nonnormality. First, we might seek a scalar measure of normality; any definition of normality leads to one such gauge.

▶ ‖A∗A − AA∗‖
▶ min_{Z normal} ‖A − Z‖ (see work on computing the nearest normal matrix by Gabriel and Ruhe, 1987)
▶ Henrici's departure from normality:

  dep₂(A) = min ‖N‖ over all Schur factorizations A = U(D + N)U∗.

  No minimization is needed in the Frobenius norm:

  dep_F(A) = min ‖N‖_F = √( ‖A‖²_F − Σ_{j=1}^{n} |λj|² ).

These are related by equivalence constants [Elsner, Paardekooper, 1987]. None of these measures is of much use in practice.
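Of these, dep_F is the only one that is trivial to evaluate; a quick MATLAB check on the nonnormal example above:

    % Henrici's departure from normality, Frobenius-norm version.
    A = [-1 1; 0 1];
    depF = sqrt(norm(A, 'fro')^2 - sum(abs(eig(A)).^2))   % returns 1
    % For a normal matrix this quantity is 0 (up to rounding).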

Tools for Measuring Nonnormality

If A is diagonalizable, A = VΛV⁻¹, it is often helpful to characterize nonnormality by

κ(V) := ‖V‖ ‖V⁻¹‖ ≥ 1.

▶ This quantity depends on the choice of eigenvectors; scaling each column of V to be a unit vector gets within √n of the optimal value, if the eigenvalues are distinct [van der Sluis 1969].
▶ For normal matrices, one can take V unitary, so κ(V) = 1.
▶ If κ(V) is not much more than 1 for some diagonalization, then the effects of nonnormality will be minimal.

Example (Bound for Continuous Systems)

‖x(t)‖ = ‖e^{tA}x(0)‖ ≤ ‖e^{tA}‖ ‖x(0)‖ ≤ ‖V e^{tΛ} V⁻¹‖ ‖x(0)‖ ≤ κ(V) max_{λ∈σ(A)} |e^{tλ}| ‖x(0)‖.
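For the 2 × 2 example above, κ(V) is immediate in MATLAB (eig normalizes the columns of V to unit length):

    % kappa(V) = ||V|| ||V^{-1}|| = cond(V) for the eigenvector matrix.
    A = [-1/2 500; 0 -5];
    [V, D] = eig(A);
    kappaV = cond(V)   % large (roughly 2e2), so nonnormality matters here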

I σ(A) ⊂ W (A) [Proof: take x to be an eigenvector.]

I W (A) is a closed, convex subset of C.

I If A is normal, then W (A) is the convex hull of σ(A).

I Unlike σ(A), the is robust to perturbations:

W (A + E) ⊆ W (A) + {z ∈ C : |z| ≤ kEk}.

Numerical Range (Field of Values)

Another approach: identify a set in the complex plane to replace the spectrum. This dates to the early 20th century literature in , e.g., the numerical range, Von Neumann’s spectral sets, and sectorial operators.

Definition (Numerical Range, a.k.a. Field of Values)

The numerical range of a matrix is the set

W (A) = {x∗Ax : kxk = 1} . Numerical Range (Field of Values)

Another approach: identify a set in the complex plane to replace the spectrum. This dates to the early 20th century literature in functional analysis, e.g., the numerical range, Von Neumann’s spectral sets, and sectorial operators.

Definition (Numerical Range, a.k.a. Field of Values)

The numerical range of a matrix is the set

W (A) = {x∗Ax : kxk = 1} .

I W (A) is the set of all Rayleigh quotient eigenvalue estimates.

I σ(A) ⊂ W (A) [Proof: take x to be an eigenvector.]

I W (A) is a closed, convex subset of C.

I If A is normal, then W (A) is the convex hull of σ(A).

I Unlike σ(A), the numerical range is robust to perturbations:

W (A + E) ⊆ W (A) + {z ∈ C : |z| ≤ kEk}. If z ∈ W (A), then z + z 1 “ A + A∗ ” Re z = = (x∗Ax + x∗A∗x) = x∗ x. 2 2 2 Using properties of Hermitian matrices, we conclude that h “ A + A∗ ” “ A + A∗ ”i Re(W (A)) = λ , λ . min 2 max 2 Similarly, one can determine the intersection of W (A) with any line in C.

A Gallery of Numerical Ranges

normal random Grcar Jordan A Gallery of Numerical Ranges

normal random Grcar Jordan

If z ∈ W (A), then z + z 1 “ A + A∗ ” Re z = = (x∗Ax + x∗A∗x) = x∗ x. 2 2 2 Using properties of Hermitian matrices, we conclude that h “ A + A∗ ” “ A + A∗ ”i Re(W (A)) = λ , λ . min 2 max 2 Similarly, one can determine the intersection of W (A) with any line in C. Neat problem: Given z ∈ W (A), find unit vector x such that z = x∗Ax [Uhlig 2008; Carden 2009].

Computation of the Numerical Range

This calculation yields points on the boundary of the numerical range. Use convexity to obtain polygonal outer and inner approximations [Johnson 1980]; Higham’s fv.m. Computation of the Numerical Range

This calculation yields points on the boundary of the numerical range. Use convexity to obtain polygonal outer and inner approximations [Johnson 1980]; Higham’s fv.m.

Neat problem: Given z ∈ W (A), find unit vector x such that z = x∗Ax [Uhlig 2008; Carden 2009]. Definition (numerical abscissa)
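A compact sketch of the rotation idea behind Johnson's algorithm (the test matrix and the number of angles are arbitrary choices; fv.m is more careful):

    % Boundary of W(A): for each angle, the extreme eigenvector of the
    % Hermitian part of e^{i*theta}*A gives a boundary point x'*A*x.
    A = randn(10) + 1i*randn(10);          % any test matrix
    th = linspace(0, 2*pi, 100);
    w = zeros(size(th));
    for j = 1:numel(th)
        H = (exp(1i*th(j))*A + (exp(1i*th(j))*A)')/2;
        [X, D] = eig(H);
        [~, k] = max(diag(real(D)));       % rightmost point of W(e^{i th}A)
        w(j) = X(:,k)'*A*X(:,k);           % eig returns unit eigenvectors
    end
    plot(real(w), imag(w), '-', real(eig(A)), imag(eig(A)), 'x')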

Numerical Range and the Matrix Exponential

What does the numerical range reveal about matrix behavior? Differentiate at t = 0:

(d/dt) ‖e^{tA}x₀‖ |_{t=0} = (d/dt) ( x₀∗ e^{tA∗} e^{tA} x₀ )^{1/2} |_{t=0}
                          = (d/dt) ( x₀∗ (I + tA∗)(I + tA) x₀ )^{1/2} |_{t=0}
                          = (1/‖x₀‖) x₀∗ ( (A + A∗)/2 ) x₀.

So, the rightmost point in W(A) reveals the maximal slope of ‖e^{tA}‖ at t = 0.

Definition (numerical abscissa)

The numerical abscissa is the rightmost point of W(A):

ω(A) := max_{z∈W(A)} Re z.

Initial Transient Growth via Numerical Abscissa

A = [ −1.1  10; 0  −1 ]

[Plot: ‖e^{tA}‖ with the envelope κ(V)e⁻ᵗ and the initial-slope curve e^{ω(A)t}.]

Crouzeix's Conjecture

The numerical range can be used to help gauge the size of matrix functions.

Theorem (Crouzeix, 2007)

Let f be a function analytic on W(A). Then

‖f(A)‖ ≤ 11.08 max_{z∈W(A)} |f(z)|.

Crouzeix's Conjecture: ‖f(A)‖ ≤ 2 max_{z∈W(A)} |f(z)|.

Obstacle: in applications, W(A) is often too large, e.g., it contains the origin, or points in the right-half plane, or points of magnitude larger than one.

Boeing 767 Example, Revisited

Numerical range of 55 × 55 stable matrix.

[Plots: σ(A) and W(A).]

2. Pseudospectra

Pseudospectra

The numerical range and the eigenvalues both have limitations. For nonnormal matrices, W(A) can be “too big,” while σ(A) is “too small.” Here we shall explore another option that, loosely speaking, interpolates between σ(A) and W(A).

Example

Compute the eigenvalues of three similar 100 × 100 matrices using MATLAB's eig: the tridiagonal Toeplitz matrices tridiag(1, 0, 1), tridiag(2, 0, 1/2), and tridiag(3, 0, 1/3) (constant subdiagonal, zero main diagonal, constant superdiagonal). These matrices are similar via a diagonal scaling, so they have identical spectra, yet the computed eigenvalues differ dramatically.

Definition (ε-pseudospectrum)

For any ε > 0, the ε-pseudospectrum of A, denoted σε(A), is the set

σε(A) = { z ∈ C : z ∈ σ(A + E) for some E ∈ C^{n×n} with ‖E‖ < ε }.

We can estimate σε(A) by conducting experiments with random perturbations. For the 20 × 20 version of A = tridiag(2, 0, 1/2), 50 trials each:

[Plots: eigenvalues of the 50 randomly perturbed matrices, for ε = 10⁻², 10⁻⁴, 10⁻⁶.]
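The random-perturbation experiment takes only a few lines; a rough sketch (50 trials, as above):

    % Eigenvalues of 50 random perturbations of norm ep = 1e-4.
    n = 20; ep = 1e-4;
    A = diag(2*ones(n-1,1), -1) + diag(0.5*ones(n-1,1), 1);  % tridiag(2,0,1/2)
    hold on
    for trial = 1:50
        E = randn(n) + 1i*randn(n);
        ew = eig(A + ep*E/norm(E));        % perturbation of norm exactly ep
        plot(real(ew), imag(ew), 'k.')
    end
    plot(real(eig(A)), imag(eig(A)), 'ro'), hold off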

Equivalent Definitions of the Pseudospectrum

Theorem

The following three definitions of the ε-pseudospectrum are equivalent:

1. σε(A) = { z ∈ C : z ∈ σ(A + E) for some E ∈ C^{n×n} with ‖E‖ < ε };
2. σε(A) = { z ∈ C : ‖(z − A)⁻¹‖ > 1/ε };
3. σε(A) = { z ∈ C : ‖Av − zv‖ < ε for some unit vector v ∈ C^n }.

Proof.

(1) ⟹ (2): If z ∈ σ(A + E) for some E with ‖E‖ < ε, there exists a unit vector v such that (A + E)v = zv. Rearrange to obtain

v = (z − A)⁻¹Ev.

Take norms:

‖v‖ = ‖(z − A)⁻¹Ev‖ ≤ ‖(z − A)⁻¹‖ ‖E‖ ‖v‖ < ε ‖(z − A)⁻¹‖ ‖v‖.

Hence ‖(z − A)⁻¹‖ > 1/ε.

(2) ⟹ (3): If ‖(z − A)⁻¹‖ > 1/ε, there exists a unit vector w with ‖(z − A)⁻¹w‖ > 1/ε. Define v̂ := (z − A)⁻¹w, so that 1/‖v̂‖ < ε, and

‖(z − A)v̂‖ / ‖v̂‖ = ‖w‖ / ‖v̂‖ < ε.

Hence we have found a unit vector v := v̂/‖v̂‖ for which ‖Av − zv‖ < ε.

(3) ⟹ (1): Given a unit vector v such that ‖Av − zv‖ < ε, define r := Av − zv. Now set E := −rv∗, so that

(A + E)v = (A − rv∗)v = Av − r = zv,

with ‖E‖ = ‖r‖ < ε. Hence z ∈ σ(A + E). ∎

These different definitions are useful in different contexts:

1. interpreting numerically computed eigenvalues;
2. analyzing matrix behavior/functions of matrices; computing pseudospectra on a grid in C;
3. proving bounds on a particular σε(A).

History of Pseudospectra

Invented at least four times, independently:

▶ Jim Varah in his 1967 Stanford PhD thesis.
▶ Henry Landau in a 1975 paper, motivated by laser theory.
▶ S. K. Godunov and colleagues in Novosibirsk in the 1980s.
▶ L. N. Trefethen, motivated by stability of spectral methods (1990).

Early adopters include Wilkinson (1986), Demmel (1987), Chatelin (1990s).

Properties of Pseudospectra

A is normal ⟺ σε(A) is the union of open ε-balls about the eigenvalues:

A normal ⟹ σε(A) = ∪_j ( λj + ∆ε )
A nonnormal ⟹ σε(A) ⊃ ∪_j ( λj + ∆ε )

A circulant (hence normal) matrix:

A = [ 0 1 0 0 0
      0 0 1 0 0
      0 0 0 1 0
      0 0 0 0 1
      1 0 0 0 0 ]

[Surface plot: ‖(z − A)⁻¹‖ over the complex plane (axes Re z, Im z).]

Normal Matrices: matching distance

An easy misinterpretation: “A size-ε perturbation to a normal matrix can move the eigenvalues by no more than ε.” This holds for Hermitian matrices, but not for all normal matrices. An example constructed by Gerd Krause (see Bhatia, Matrix Analysis, 1997):

A = diag( 1, (4 + 5√−3)/13, (−1 + 2√−3)/13 ).

Now construct the (unitary) Householder reflector

U = I − 2vv∗

for v = [ √(5/8), 1/2, √(1/8) ]∗, and define E via

A + E = −U∗AU.

By construction A + E is normal and σ(A + E) = −σ(A).

Normal Matrices: matching distance

A = diag( 1, (4 + 5√−3)/13, (−1 + 2√−3)/13 ),   A + E = −U∗AU.

[Plots: σ(A) and σ(A + E) = −σ(A) in the complex plane, with a zoomed view; the eigenvalues move farther than ‖E‖.]

Properties of Pseudospectra

Theorem (Facts about Pseudospectra)

For all ε > 0,

▶ σε(A) is an open, bounded set that contains the spectrum.
▶ σε(A) is stable to perturbations: σε(A + E) ⊆ σ_{ε+‖E‖}(A).
▶ If U is unitary, σε(UAU∗) = σε(A).
▶ For V invertible, σ_{ε/κ(V)}(VAV⁻¹) ⊆ σε(A) ⊆ σ_{εκ(V)}(VAV⁻¹).
▶ σε(A + α) = α + σε(A).
▶ σ_{|γ|ε}(γA) = γ σε(A).
▶ σε( [ A 0; 0 B ] ) = σε(A) ∪ σε(B).

Relationship to κ(V) and W(A)

Let ∆r := {z ∈ C : |z| < r} denote the open disk of radius r > 0.

Theorem (Bauer–Fike, 1963)

Let A be diagonalizable, A = VΛV⁻¹. Then for all ε > 0,

σε(A) ⊆ σ(A) + ∆_{εκ(V)}.

If κ(V) is small, then σε(A) cannot contain points far from σ(A).

Theorem (Stone, 1932)

For any A, σε(A) ⊆ W(A) + ∆ε.

The pseudospectrum σε(A) cannot be bigger than W(A) in an interesting way.

Toeplitz Matrices

Pseudospectra of the 100 × 100 matrix A = tridiag(2, 0, 1/2) that began our investigation of eigenvalue perturbations.

A is diagonalizable (it has distinct eigenvalues), but Bauer–Fike is useless here: κ(V) = 2⁹⁹ ≈ 6 × 10²⁹.

Jordan Blocks

S_n = [ 0  1
           0  ⋱
              ⋱  1
                 0 ]

[Plots: σε(S_n) for several dimensions n.]

Near the eigenvalue, the resolvent norm grows with the dimension n; outside the unit disk, the resolvent norm does not seem to get big. We would like to prove this.

Jordan Blocks / Shift Operator

Jordan Blocks/Shift

Consider the generalization of the Jordan block to the domain

∞ 2 X 2 ` (N) = {(x1, x2,...): |xj | < ∞}. j=1

The shift operator S on `2(N) is defined as

S(x1, x2,...) = (x2, x3,...). If |z| < 1, then ∞ X j−1 2 1 |z | = < ∞. 1 − |z|2 j=1 So, {z ∈ C : |z| < 1} ⊆ σ(S).

Jordan Blocks/Shift Operator

Consider the generalization of the Jordan block to the domain

∞ 2 X 2 ` (N) = {(x1, x2,...): |xj | < ∞}. j=1

The shift operator S on `2(N) is defined as

S(x1, x2,...) = (x2, x3,...).

In particular, S(1, z, z 2,...) = (z, z 2, z 3,...) = z(1, z, z 2,...). So if (1, z, z 2,...) ∈ `2(N), then z ∈ σ(S). Jordan Blocks/Shift Operator

Consider the generalization of the Jordan block to the domain

∞ 2 X 2 ` (N) = {(x1, x2,...): |xj | < ∞}. j=1

The shift operator S on `2(N) is defined as

S(x1, x2,...) = (x2, x3,...).

In particular, S(1, z, z 2,...) = (z, z 2, z 3,...) = z(1, z, z 2,...). So if (1, z, z 2,...) ∈ `2(N), then z ∈ σ(S). If |z| < 1, then ∞ X j−1 2 1 |z | = < ∞. 1 − |z|2 j=1 So, {z ∈ C : |z| < 1} ⊆ σ(S). Jordan Blocks/Shift Operator

S(x1, x2,...) = (x2, x3,...).

We have seen that {z ∈ C : |z| < 1} ⊆ σ(S).

Observe that kSk = sup kSxk = 1, kxk=1 and so σ(S) ⊆ {z ∈ C : |z| ≤ 1}. The spectrum is closed, so σ(A) = {z ∈ C : |z| ≤ 1}.

For any finite dimensional n × n Jordan block Sn,

σ(Sn) = {0}.

So the Sn → S strongly, but there is a discontinuity in the spectrum:

σ(Sn) 6→ σ(S). 2 1 3 2 0 3 6 z 7 6 0 7 6 7 6 7 6 . 7 6 . 7 = z 6 . 7 − 6 . 7 . 6 7 6 7 4 z n−2 5 4 0 5 z n−1 z n

Pseudospectra of Jordan Blocks

Pseudospectra resolve this unpleasant discontinuity. Recall the eigenvectors (1, z, z², ...) of S. Truncate this vector to length n, setting x := (1, z, ..., z^{n−1})ᵀ, and apply S_n:

S_n x = (z, z², ..., z^{n−1}, 0)ᵀ = z x − (0, ..., 0, z^n)ᵀ.

Hence ‖S_n x − zx‖ = |z|^n, so

‖S_n x − zx‖ / ‖x‖ = |z|^n √(1 − |z|²) / √(1 − |z|^{2n}),

and z ∈ σε(S_n) for all ε > |z|^n √(1 − |z|²)/√(1 − |z|^{2n}).

We conclude that for fixed |z| < 1, the resolvent norm ‖(z − S_n)⁻¹‖ grows exponentially with n.

Pseudospectra of Toeplitz Matrices

Toeplitz matrices are described by their symbol a, with Laurent expansion

a(z) = Σ_{k=−∞}^{∞} a_k z^k,

giving the matrix with constant diagonals containing the Laurent coefficients:

A_n = [ a₀   a₋₁  a₋₂  ⋯
        a₁   a₀   a₋₁  ⋱
        a₂   a₁   a₀   ⋱   a₋₂
        ⋮    ⋱    ⋱    ⋱   a₋₁
        ⋯    a₂   a₁   a₀      ]  ∈ C^{n×n}.

Call the image of the unit circle T under a the symbol curve, a(T).

Pseudospectra of Toeplitz Matrices

Call the image of the unit circle T under a the symbol curve, a(T). Pseudospectra of Toeplitz Matrices

Theorem (Landau; Reichel and Trefethen; Böttcher)

Let a(z) = Σ_{k=−M}^{M} a_k z^k be the symbol of a banded Toeplitz operator.

▶ The pseudospectra of A_n converge to the pseudospectra of the Toeplitz operator on ℓ²(N) as n → ∞.

Let z ∈ C have nonzero winding number with respect to the symbol curve a(T).

▶ ‖(z − A_n)⁻¹‖ grows exponentially in n.
▶ For all ε > 0, z ∈ σε(A_n) for all n sufficiently large.

[Plots: σε(A_n) for n = 16, 32, 64, 128; the symbol curve with σε(A₁₀₀); the symbol curve with σε(A₅₀₀).]

Computation of Pseudospectra

Naive algorithm: O(n³) per grid point.
▶ Compute ‖(z − A)⁻¹‖ using the SVD on a grid of points in C.
▶ Send data to a contour plotting routine.

Modern algorithm: O(n³) + O(n²) per grid point [Lui 1997; Trefethen 1999].
▶ Compute a Schur factorization A = UTU∗ and compute σε(T).
▶ Compute ‖(z − T)⁻¹‖ on the grid via inverse Lanczos on (z − T)⁻∗(z − T)⁻¹.

Large-scale problems [Toh and Trefethen 1996; Wright and Trefethen 2001]:
▶ Pick an invariant subspace Ran(V), with V∗V = I, corresponding to eigenvalues of physical interest (e.g., using ARPACK).
▶ Compute σε(V∗AV) ⊆ σε(A).

Alternative [Brühl 1996; Bekas and Gallopoulos, ...]:
▶ Curve tracing: follow level sets of ‖(z − A)⁻¹‖.
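The naive algorithm fits in a dozen lines; a minimal sketch (the grid and contour levels are arbitrary choices, and EigTool does all of this far more efficiently):

    % sigma_min(zI - A) on a grid; contours of log10 give pseudospectra.
    n = 100;
    A = diag(2*ones(n-1,1), -1) + diag(0.5*ones(n-1,1), 1);  % tridiag(2,0,1/2)
    x = linspace(-2.5, 2.5, 120); y = linspace(-2, 2, 100);
    smin = zeros(numel(y), numel(x));
    for j = 1:numel(x)
        for k = 1:numel(y)
            smin(k,j) = min(svd((x(j) + 1i*y(k))*eye(n) - A));
        end
    end
    contour(x, y, log10(smin), -8:-1), axis equal   % eps = 1e-8, ..., 1e-1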

EigTool

EigTool: Thomas Wright, 2002
http://www.cs.ox.ac.uk/pseudospectra/eigtool

Pole Placement Example

Problem (Pole Placement Example of Mehrmann and Xu, 1996)

Given A = diag(1, 2, ..., N) and b = [1, 1, ..., 1]ᵀ, find f ∈ C^N such that

σ(A − bf∗) = {−1, −2, ..., −N}.

One can show that f = G⁻∗e, where e = [1, 1, ..., 1]ᵀ and

G_{:,k} = (A − λk)⁻¹b = (A + k)⁻¹b.

This gives

A − bf∗ = G diag(−1, ..., −N) G⁻¹,   with   G_{j,k} = 1/(j + k).
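The construction is easy to try; a sketch in floating point (N = 11, as on the later slides; the slides compute f symbolically so that A − bf∗ has exactly integer entries):

    % Pole placement example of Mehrmann and Xu.
    N = 11;
    A = diag(1:N); b = ones(N,1); e = ones(N,1);
    [J, K] = meshgrid(1:N);
    G = 1./(J + K);              % G(j,k) = 1/(j+k), a Hilbert-like matrix
    f = G'\e;                    % f = G^{-*} e
    sort(eig(A - b*f'))          % intended: -11, ..., -1; compare the output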

Pole Placement Example: Numerics

In exact arithmetic, σ(A − bf∗) = {−1, −2, ..., −N}. All entries in A − bf∗ are integers. (To ensure this, we compute f symbolically.)

For example, when N = 8,

A − bf∗ =
[ 73  −2520  27720  −138600  360360  −504504  360360  −102960 ]
[ 72  −2518  27720  −138600  360360  −504504  360360  −102960 ]
[ 72  −2520  27723  −138600  360360  −504504  360360  −102960 ]
[ 72  −2520  27720  −138596  360360  −504504  360360  −102960 ]
[ 72  −2520  27720  −138600  360365  −504504  360360  −102960 ]
[ 72  −2520  27720  −138600  360360  −504498  360360  −102960 ]
[ 72  −2520  27720  −138600  360360  −504504  360367  −102960 ]
[ 72  −2520  27720  −138600  360360  −504504  360360  −102952 ]

Pole Placement Example: Numerics

In exact arithmetic, σ(A − bf∗) = {−1, −2, ..., −N}. Computed eigenvalues of A − bf∗ (the matrix is exact: all integer entries).

[Sequence of plots for increasing N: ◦ exact eigenvalues, • computed eigenvalues.]

Pole Placement Example: Pseudospectra

Computed pseudospectra of A − bf∗.

[Sequence of plots for increasing N.]

Computed eigenvalues should be accurate to roughly ε_mach ‖A − bf∗‖. For example, when N = 11, ‖A − bf∗‖ = 5.26 × 10⁸.

∗ Computed eigenvalues should be accurate to roughly εmachkA − bf k. For example, when N = 11, kA − bf∗k = 5.26 × 108. Problem (Wilkinson’s “Perfidious Polynomial”)

Find the zeros of the polynomial

p(z) = (z − 1)(z − 2) ··· (z − N)

from coefficients in the monomial basis.

MATLAB gives this as an example: roots(poly(1:20)).

Polynomial Zeros and Companion Matrices

MATLAB’s roots command computes polynomial zeros by computing the eigenvalues of a companion matrix. 2 3 4 For example, given p(z) = c0 + c1z + c2z + c3z + c4z , MATLAB builds 2 3 −c4/c0 −c3/c0 −c2/c0 −c1/c0 6 1 0 0 0 7 A = 6 7 . 4 0 1 0 0 5 0 0 1 0

whose characteristic polynomial is p. Polynomial Zeros and Companion Matrices

MATLAB’s roots command computes polynomial zeros by computing the eigenvalues of a companion matrix. 2 3 4 For example, given p(z) = c0 + c1z + c2z + c3z + c4z , MATLAB builds 2 3 −c4/c0 −c3/c0 −c2/c0 −c1/c0 6 1 0 0 0 7 A = 6 7 . 4 0 1 0 0 5 0 0 1 0

whose characteristic polynomial is p.

Problem (Wilkinson’s “Perfidious Polynomial”)

Find the zeros of the polynomial

p(z) = (z − 1)(z − 2) ··· (z − N)

from coefficients in the monomial basis.

MATLAB gives this as an example: roots(poly(1:20)). Roots of Wilkinson’s Perfidious Polynomial

Roots of Wilkinson's Perfidious Polynomial

Computed eigenvalues of the companion matrix.

[Sequence of plots for increasing N: ◦ exact eigenvalues, • computed eigenvalues.]

Pseudospectra for N = 25.

[Plot: pseudospectra of the companion matrix.]

A 3d plot of the resolvent norm ‖(z − A)⁻¹‖ over the complex plane reveals a local minimum.

3. Bounding Norms of Functions of Matrices

Behavior of Matrices

We are primarily interested in using pseudospectra (and other tools) to study the behavior of a nonnormal matrix. In particular, we seek to quantify (or bound) how nonnormality affects the value of functions of matrices.

Much research has been devoted to functions of matrices over the past decade; see the book by Nick Higham [Hig08]. We focus on functions that are analytic on the spectrum of A.

Theorem (Spectral Mapping Theorem)

Suppose f is analytic on σ(A). Then

σ(f(A)) = f(σ(A)).

Thus

‖f(A)‖ ≥ max_{µ∈σ(f(A))} |µ| = max_{λ∈σ(A)} |f(λ)|.

See [Huhtanen, 1999] for work on lower bounds.

Spectral Representation of a Matrix

Thus kf (A)k ≥ max |µ| = max |f (λ)|. µ∈σ(f (A)) λ∈σ(A) See [Huhtanen, 1999] for work on lower bounds. Spectral Representation of a Matrix

Theorem (See Kato, 1980)

Any matrix with m distinct eigenvalues can be written in the form

A = Σ_{j=1}^{m} ( λj Pj + Dj ),

where, for Jordan curves Γj surrounding λj and no other eigenvalues,

▶ Pj = (1/2πi) ∫_{Γj} (z − A)⁻¹ dz is a spectral projector;
▶ Dj = (1/2πi) ∫_{Γj} (z − λj)(z − A)⁻¹ dz is nilpotent;
▶ Pj A = A Pj = λj Pj + Dj;
▶ Pj Pk = 0 if j ≠ k;
▶ Dj = 0 if λj is not defective.

The resolvent plays a fundamental role in the structure of the matrix A.

Functions of a Matrix

If A is diagonalizable (i.e., it has no defective eigenvalues), then

A = Σ_{j=1}^{m} λj Pj.

If f is any function that is analytic on σ(A) and on/inside all contours Γj, then

f(A) = Σ_{j=1}^{m} f(λj) Pj.

For all matrices, we have a more general formula.

Theorem (Cauchy Integral Formula for Matrices)

Let Γ be a finite union of Jordan curves containing σ(A) in its interior, and suppose f is a function analytic on Γ and its interior. Then

f(A) = (1/2πi) ∫_Γ f(z)(z − A)⁻¹ dz.

Resolvent Bounds

Suppose the eigenvalues of A ∈ C^{n×n} are distinct. Apply the previous formula to f(ζ) = (z − ζ)⁻¹:

f(A) = (z − A)⁻¹ = Σ_{j=1}^{n} Pj / (z − λj).

If λj has right eigenvector vj and left eigenvector uj, then

Pj = vj uj∗ / (uj∗ vj),

and the norm of Pj is

κ(λj) := ‖Pj‖ = ‖vj‖ ‖uj‖ / |uj∗ vj|,

the condition number of the eigenvalue λj. Hence for z near λj,

‖(z − A)⁻¹‖ ≈ κ(λj) / |z − λj|.
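Eigenvalue condition numbers are easy to compute from right and left eigenvectors; a sketch (the three-output form of eig assumes a reasonably recent MATLAB):

    % kappa(lambda_j) = ||v_j|| ||u_j|| / |u_j' v_j|.
    A = [-1/2 500; 0 -5];
    [V, D, W] = eig(A);           % columns of W are left eigenvectors
    for j = 1:length(A)
        kap = norm(V(:,j))*norm(W(:,j))/abs(W(:,j)'*V(:,j));
        fprintf('lambda = %6.2f: kappa = %.1f\n', D(j,j), kap);
    end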

Containment Regions for Pseudospectra

Hence for small ε > 0 and diagonalizable matrices, we can approximate

σε(A) ≈ ∪_{j=1}^{n} ( λj + ∆_{εκ(λj)} ),

where ∆r := {z ∈ C : |z| < r}.

Theorem (Bauer–Fike, 1963)

If A ∈ C^{n×n} is diagonalizable, then for all ε > 0,

σε(A) ⊆ ∪_{j=1}^{n} ( λj + ∆_{nεκ(λj)} ).

Unlike the earlier version of Bauer–Fike, the radii of the disks vary with j. Near a defective eigenvalue of index d, Rellich's perturbation theory requires that the pseudospectrum behave like a disk whose radius scales like ε^{1/d} as ε → 0.

Bounds on ‖f(A)‖

Suppose A is diagonalizable, A = VΛV⁻¹. Then

f(A) = Σ_{j=1}^{n} f(λj) Pj = V f(Λ) V⁻¹.

This immediately suggests several upper bounds on ‖f(A)‖:

‖f(A)‖ = ‖V f(Λ) V⁻¹‖ ≤ ‖V‖ ‖V⁻¹‖ max_{λ∈σ(A)} |f(λ)| = κ(V) max_{λ∈σ(A)} |f(λ)|;

‖f(A)‖ ≤ Σ_{j=1}^{n} |f(λj)| ‖Pj‖ = Σ_{j=1}^{n} κ(λj) |f(λj)|.

We seek bounds that provide a more flexible way of handling nonnormality.

Pseudospectral Bounds on ‖f(A)‖

Start from the Cauchy integral formula for matrices,

f(A) = (1/2πi) ∫_Γ f(z)(z − A)⁻¹ dz,

and take norms:

‖f(A)‖ = ‖ (1/2πi) ∫_Γ f(z)(z − A)⁻¹ dz ‖ ≤ (1/2π) ∫_Γ |f(z)| ‖(z − A)⁻¹‖ |dz|.

Now pick Γ to be the boundary of σε(A), on which ‖(z − A)⁻¹‖ = 1/ε:

‖f(A)‖ ≤ (1/2π) ∫_{∂σε(A)} |f(z)| ‖(z − A)⁻¹‖ |dz|
        = (1/2πε) ∫_{∂σε(A)} |f(z)| |dz|
        ≤ (1/2πε) sup_{z∈σε(A)} |f(z)| ∫_{∂σε(A)} |dz|
        = (Lε/2πε) sup_{z∈σε(A)} |f(z)|,

where Lε denotes the arc length of ∂σε(A) [Trefethen 1990].

Pseudospectral Bounds on ‖f(A)‖

Theorem

Let f be analytic on σε(A) for some ε > 0. Then

‖f(A)‖ ≤ (Lε/2πε) sup_{z∈σε(A)} |f(z)|,

where Lε denotes the contour length of the boundary of σε(A).

Some key observations:

▶ This should be regarded as a family of bounds that vary with ε;
▶ the best choice of ε will depend on the problem;
▶ sometimes the bound is excellent; usually it is decent; on occasion it is poor;
▶ the choice of ε has nothing to do with rounding errors; do not expect the bound to be most descriptive when ε = ε_mach or ε = ‖A‖ε_mach.

GMRES Convergence

Example

We wish to use the pseudospectral analysis just presented to bound the norm of the residual in the GMRES algorithm for iteratively solving Ax = b.

GMRES produces optimal iterates x_k whose residuals r_k = b − Ax_k satisfy

‖r_k‖ ≤ min_{p∈P_k, p(0)=1} ‖p(A)b‖,

where P_k denotes the set of polynomials of degree k or less. Applying the previous analysis yields

‖r_k‖/‖b‖ ≤ (Lε/2πε) min_{p∈P_k, p(0)=1} sup_{z∈σε(A)} |p(z)|.

Different ε values give the best bounds at different stages of convergence. Illustrations and applications: [Trefethen 1990; E. 2000; Sifuentes, E., Morgan 2011].

GMRES Convergence: Example

[Plot: GMRES convergence and pseudospectral bounds for a convection–diffusion problem, n = 2304, with ε ranging from 10⁻³·⁵ down to 10⁻¹³.]

To evaluate the bound, enclose σε(A) in a circle centered at c and use p_k(z) = (1 − z/c)^k.

Bounds on the Matrix Exponential

To understand the behavior of x′(t) = Ax(t) [and LTI control systems], we wish to use pseudospectra to bound ‖e^{tA}‖.

Definition

The spectral abscissa is the rightmost point of the spectrum:

α(A) := max_{λ∈σ(A)} Re λ.

The ε-pseudospectral abscissa is the supremum of the real parts of z ∈ σε(A):

αε(A) := sup_{z∈σε(A)} Re z.

Applying the Cauchy integral bound to f(z) = e^{tz} gives an upper bound.

Theorem (Upper Bound)

For any A ∈ C^{n×n} and ε > 0,

‖e^{tA}‖ ≤ (Lε/2πε) e^{tαε(A)},

where Lε denotes the contour length of the boundary of σε(A).

Upper Bound on the Matrix Exponential

A = [ −1  2
          −1  ⋱
              ⋱  2
                 −1 ]  ∈ C^{20×20}.

Question: can you estimate σε(A) for this matrix?

[Plots: ‖e^{tA}‖ and the upper bounds (Lε/2πε)e^{tαε(A)} for ε from 10⁻²⁰ to 10⁻¹.]

Lower Bounds on the Matrix Exponential

We would like to guarantee the potential for transient growth.

Theorem (Lower bound on e^{tA})

Suppose α(A) < 0. Then for all ε > 0,

sup_{t≥0} ‖e^{tA}‖ ≥ αε(A)/ε.

Proof. Recall the formula for the Laplace transform of e^{tα}: for s > α,

∫₀^∞ e^{tα} e^{−st} dt = 1/(s − α).

This formula generalizes to matrices, e.g., when α(A) < 0 and Re z > 0:

∫₀^∞ e^{tA} e^{−zt} dt = (z − A)⁻¹.

Suppose ‖e^{tA}‖ ≤ M for all t ≥ 0. Then for any z ∈ σε(A) with Re z > 0,

1/ε < ‖(z − A)⁻¹‖ = ‖ ∫₀^∞ e^{tA} e^{−zt} dt ‖ ≤ ∫₀^∞ ‖e^{tA}‖ |e^{−zt}| dt ≤ M ∫₀^∞ e^{−(Re z)t} dt = M/Re z.

Hence M ≥ (Re z)/ε. Take the sup over all z ∈ σε(A) to get the bound. ∎

Lower Bound on the Matrix Exponential

A = the same 20 × 20 bidiagonal matrix: −1 on the diagonal, 2 on the superdiagonal.

[Plots: ‖e^{tA}‖ and the lower bounds αε(A)/ε for ε = 10⁻⁵ and ε = 10⁻¹.]
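The true curve is cheap to compute for comparison; a small experiment (the time grid is an arbitrary choice):

    % ||e^{tA}|| for the 20 x 20 example: alpha(A) = -1, yet the norm
    % grows by orders of magnitude before the eventual e^{-t} decay.
    n = 20;
    A = -eye(n) + diag(2*ones(n-1,1), 1);
    t = linspace(0, 50, 300);
    nrm = arrayfun(@(s) norm(expm(s*A)), t);
    semilogy(t, nrm), xlabel('t'), ylabel('||e^{tA}||')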

Matrix Powers

Definition

The spectral radius is the largest magnitude of a point of the spectrum:

ρ(A) := max_{λ∈σ(A)} |λ|.

The ε-pseudospectral radius is the supremum of the magnitudes of points in σε(A):

ρε(A) := sup_{z∈σε(A)} |z|.

Theorem (Upper Bound)

For any A ∈ C^{n×n} and ε > 0,

‖A^k‖ ≤ ρε(A)^{k+1} / ε.

Proof: Apply the Cauchy integral bound, taking Γ to be the circle of radius ρε(A) centered at the origin.

Upper Bound on Matrix Powers

A = [ 0  2
         0  ⋱
            ⋱  2
               0 ]  ∈ C^{20×20}.

A²⁰ = 0, but lower powers show transient growth.

[Plots: ‖A^k‖ and the upper bounds ρε(A)^{k+1}/ε for ε from 10⁰ down to 10⁻²⁰.]

Lower Bounds on Matrix Powers

Theorem (Lower bound on power growth)

Suppose ρ(A) < 1. Then for all ε > 0,

sup_{k≥0} ‖A^k‖ ≥ (ρε(A) − 1) / ε.

Proof. Since ρ(A) < 1, A^k → 0 as k → ∞. Let M denote the maximum value of ‖A^k‖ over k ≥ 0, and suppose z ∈ σε(A) with |z| > 1. Then

1/ε < ‖(z − A)⁻¹‖ = ‖ (1/z)( I + (1/z)A + (1/z²)A² + ⋯ ) ‖
     ≤ (1/|z|)( M + M/|z| + M/|z|² + ⋯ )
     = (M/|z|) · 1/(1 − 1/|z|) = M/(|z| − 1).

Rearrange to obtain M ≥ (|z| − 1)/ε for all z ∈ σε(A) with |z| > 1. Take the supremum of |z| over all z ∈ σε(A) to get the result. ∎

Lower Bound on Matrix Powers

A = the same 20 × 20 nilpotent matrix: 0 on the diagonal, 2 on the superdiagonal.

[Plots: ‖A^k‖ and the lower bounds (ρε(A) − 1)/ε for ε = 10⁻⁵·⁵, 10⁻⁵·⁷⁵, 10⁻⁴, 10⁻², 10⁰.]
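The exact power norms take two lines to compute:

    % ||A^k|| for the nilpotent example: growth like 2^k until A^20 = 0.
    A = diag(2*ones(19,1), 1);
    nrm = arrayfun(@(k) norm(A^k), 0:20);
    semilogy(0:20, nrm, 'o-')    % the k = 20 value is exactly zero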

Pseudospectra are not a Panacea (in the 2-norm)

Key question: “Do pseudospectra determine the behavior of a matrix?” [Greenbaum & Trefethen, 1993]. Greenbaum and Trefethen define “behavior” to mean “norms of polynomials.” They prove that pseudospectra do not determine behavior:

A₁ = [ 0 1
         0 1
           0
             0 0
               0 ] ,     A₂ = [ 0 1
                                  0 1
                                    0
                                      0 α
                                        0 ] .

If α ∈ (1, √2], then σε(A₁) = σε(A₂) for all ε > 0, yet

1 = ‖A₁‖ ≠ ‖A₂‖ = α.

Ransford et al. [2007, 2009] have gone on to prove a host of similarly pessimistic results along these lines, e.g., relating ‖A₁^k‖ to ‖A₂^k‖, even for matrices with superidentical pseudospectra.

However: Greenbaum and Trefethen [1993] do show that pseudospectra determine behavior in the Frobenius norm (with σε(A) defined via Frobenius resolvent norms).

Complex versus Real Perturbations

In many applications A contains only real entries; uncertainties in physical parameters will only induce real perturbations. Perhaps instead of the usual definition

σε(A) = { z ∈ C : z ∈ σ(A + E) for some E ∈ C^{n×n} with ‖E‖ < ε }

we should use the real structured version

σεᴿ(A) = { z ∈ C : z ∈ σ(A + E) for some E ∈ R^{n×n} with ‖E‖ < ε };

cf. [Hinrichsen & Pritchard, 1990, 1992, 2005].

Similarly, one can study structured perturbations that preserve other properties: stochasticity, nonnegativity, symplecticity, Toeplitz or companion structure, .... For analysis involving more sophisticated structured perturbations, see the work of Michael Karow et al. [2003–2011, ...].

Complex versus Real Perturbations

The size of a complex perturbation required to make z an eigenvalue of A is:

−1 “ ` −1´” dC(A, z) = σ1 (z − A) .

The size of a real perturbation required to make z an eigenvalue of A is:

" Re(z − A)−1 −γIm(z − A)−1 #!!−1 dR(A, z) = inf σ2 γ∈(0,1] γ−1Im(z − A)−1 Re(z − A)−1

[Qiu, Berhardsson, Rantzer, Davison, Young, Doyle, 1995].

The real structured pseudospectrum can be computed via the definition

R σε (A) = {z ∈ C : dR(A, z) < ε}

Complex versus Real Perturbations: Example 1

Consider the following Toeplitz matrix studied by Demmel [1987]:

$$A = \begin{bmatrix} -1 & -M & -M^2 & -M^3 & -M^4 \\ 0 & -1 & -M & -M^2 & -M^3 \\ 0 & 0 & -1 & -M & -M^2 \\ 0 & 0 & 0 & -1 & -M \\ 0 & 0 & 0 & 0 & -1 \end{bmatrix}$$

with M = 10.

The matrix is stable but small perturbations can move eigenvalues significantly.

How do the real structured pseudospectra compare to the (unstructured) pseudospectra? Complex versus Real Perturbations: Example 1

Complex and real pseudospectra for the 5 × 5 matrix, $\varepsilon = 10^{-2}, 10^{-3}, \ldots, 10^{-8}$.

[Figure: complex pseudospectra (left) and real structured pseudospectra (right).]

Complex versus Real Perturbations: Example 1

Complex and real pseudospectra for the 5 × 5 matrix, $\varepsilon = 10^{-4}$.

[Figure: complex pseudospectra (left) and real structured pseudospectra (right) for $\varepsilon = 10^{-4}$.]

Eigenvalues of 1000 random perturbations of size $10^{-4}$ are overlaid on the figure; a sketch of this experiment follows.
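A hedged MATLAB sketch of the experiment (perturbations normalized to norm exactly $\varepsilon$, which suffices for illustration):

```matlab
% Eigenvalues of 1000 random complex and real perturbations of the
% Demmel matrix, each of norm eps = 1e-4.
M = 10;  n = 5;  ep = 1e-4;
A = toeplitz([-1; zeros(n-1,1)], -(M.^(0:n-1)));   % upper triangular Toeplitz
figure, hold on
for trial = 1:1000
    Ec = randn(n) + 1i*randn(n);  Ec = ep*Ec/norm(Ec);   % complex perturbation
    Er = randn(n);                Er = ep*Er/norm(Er);   % real perturbation
    lc = eig(A + Ec);  plot(real(lc), imag(lc), 'b.')
    lr = eig(A + Er);  plot(real(lr), imag(lr), 'r.')
end
xlabel('Re z'), ylabel('Im z')
```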

Complex versus Real Perturbations: Example 2

Real perturbations need not describe transient behavior of dynamical systems.

Consider the matrix
$$A = \begin{bmatrix} -1 & M^2 \\ -1 & -1 \end{bmatrix}, \qquad M \in \mathbb{R},$$
with spectrum $\sigma(A) = \{-1 \pm iM\}$. For $M = 100$:

[Figure: complex (left) and real (right) pseudospectra ($\varepsilon = 10^{-1.699}$, $10^{-1.398}$), with the eigenvalues of 1000 random perturbations of size $400/10001$ overlaid.]

Complex versus Real Perturbations

Consider the dynamical system

$$x'(t) = \begin{bmatrix} -1 & M^2 \\ -1 & -1 \end{bmatrix} x(t)$$

with M = 100.
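A minimal MATLAB sketch reproducing the transient curve shown below:

```matlab
% Transient growth of ||exp(tA)|| for the 2x2 example above.
M = 100;
A = [-1, M^2; -1, -1];
t = linspace(0, 1, 400);
nrm = arrayfun(@(s) norm(expm(s*A)), t);
plot(t, nrm), xlabel('t'), ylabel('||e^{tA}||')
fprintf('peak transient growth: %.1f\n', max(nrm));   % on the order of M
```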

[Figure: $\|e^{tA}\|$ versus $t \in [0, 1]$, with transient growth on the order of $M = 100$.]

Real perturbations suggest this system is far from unstable, yet transient growth on the order of $M$ is observed.

4. Balanced Truncation, Lyapunov Equations

Balanced Truncation and Lyapunov Equations

Consider the SISO linear dynamical system:

$$x'(t) = Ax(t) + b\,u(t), \qquad y(t) = c\,x(t) + d\,u(t),$$

with $A \in \mathbb{C}^{n\times n}$, $b, c^T \in \mathbb{C}^n$, $d \in \mathbb{C}$. We assume that $A$ is stable: $\alpha(A) < 0$.

We wish to reduce the dimension of the dynamical system by projecting onto well-chosen subspaces.

Balanced truncation: Change basis to match states that are easy to reach and easy to observe, then project onto that prominent subspace.

Characterize how difficult it is to reach or observe a state via the infinite controllability and observability gramians P and Q:

$$P := \int_0^\infty e^{tA}bb^*e^{tA^*}\,dt, \qquad Q := \int_0^\infty e^{tA^*}c^*c\,e^{tA}\,dt.$$
See, e.g., [Antoulas, 2005], Tuesday's lectures. . . .

Balanced Truncation and Lyapunov Equations

The gramians

$$P := \int_0^\infty e^{tA}bb^*e^{tA^*}\,dt, \qquad Q := \int_0^\infty e^{tA^*}c^*c\,e^{tA}\,dt$$
(Hermitian positive definite, for a controllable and observable stable system) can be determined by solving the Lyapunov equations

$$AP + PA^* = -bb^*, \qquad A^*Q + QA = -c^*c.$$

If $x_0 = 0$, the minimum energy of $u$ required to drive $x$ to the state $\widehat{x}$ is $\widehat{x}^*P^{-1}\widehat{x}$.

Starting from $x_0 = \widehat{x}$ with $u(t) \equiv 0$, the energy of the output $y$ is $\widehat{x}^*Q\widehat{x}$.

I $\widehat{x}^*P^{-1}\widehat{x}$: $\widehat{x}$ is hard to reach if it is rich in the lowest modes of $P$.

I $\widehat{x}^*Q\widehat{x}$: $\widehat{x}$ is hard to observe if it is rich in the lowest modes of $Q$.
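A hedged MATLAB sketch of the Lyapunov solves above, using base MATLAB's sylvester (R2014a+), which solves $AX + XB = C$; the test system is an assumption, invented for illustration.

```matlab
% Compute the gramians by solving the Lyapunov equations.
n = 6;
A = -eye(n) + 0.5*diag(ones(n-1,1), 1);  % assumed stable test matrix
b = ones(n,1);  c = (1:n)/n;
P = sylvester(A, A', -b*b');             % A P + P A' = -b b'
Q = sylvester(A', A, -c'*c);             % A' Q + Q A = -c' c
% crude quadrature check against the integral definition of P
Pq = zeros(n);  dt = 0.01;
for t = 0:dt:40
    v = expm(t*A)*b;  Pq = Pq + dt*(v*v');
end
fprintf('||P - quadrature|| = %.2e\n', norm(P - Pq));
```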

Balanced truncation transforms the state-space coordinate system to make these two gramians the same, then it truncates the lowest modes.

Balanced Truncation and Lyapunov Equations

Consider a generic coordinate transformation, for $S$ invertible:
$$(Sx)'(t) = (SAS^{-1})(Sx(t)) + (Sb)u(t), \qquad y(t) = (cS^{-1})(Sx(t)) + du(t), \qquad (Sx)(0) = Sx_0.$$

With this transformation, the controllability and observability gramians are
$$\widehat{P} = SPS^*, \qquad \widehat{Q} = S^{-*}QS^{-1}.$$

For balancing, we seek $S$ so that $\widehat{P} = \widehat{Q}$ are diagonal.

Observation (How does nonnormality affect balancing?)

I $\sigma_{\varepsilon/\kappa(S)}(SAS^{-1}) \subseteq \sigma_\varepsilon(A) \subseteq \sigma_{\varepsilon\kappa(S)}(SAS^{-1})$.

I The choice of internal coordinates will affect $P$, $Q$, . . .

I but not the Hankel singular values: $\widehat{P}\widehat{Q} = S(PQ)S^{-1}$,

I and not the transfer function: $d + (cS^{-1})(z - SAS^{-1})^{-1}(Sb) = d + c(z - A)^{-1}b$,

I and not the system moments: $d = d$, $(cS^{-1})(Sb) = cb$, $(cS^{-1})(SAS^{-1})(Sb) = cAb$, . . . .
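One standard way to construct such an $S$ is the square-root method; here is a hedged sketch, assuming the gramians P and Q from the previous sketch (both Hermitian positive definite).

```matlab
% Square-root balancing: S P S' = Si'*Q*Si = diag of Hankel singular values.
Rp = chol(P);  Rq = chol(Q);              % P = Rp'*Rp, Q = Rq'*Rq
[Z, Sig, Y] = svd(Rq*Rp');                % Rq*Rp' = Z*Sig*Y'
S  = diag(1./sqrt(diag(Sig)))*Z'*Rq;      % balancing transformation
Si = Rp'*Y*diag(1./sqrt(diag(Sig)));      % its inverse
norm(S*P*S' - Sig)                        % ~ 0: transformed P is diagonal
norm(Si'*Q*Si - Sig)                      % ~ 0: transformed Q is the same
sqrt(eig(P*Q))                            % Hankel singular values, unchanged
```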

Decay of Singular Values of Lyapunov Solutions

Consider the Lyapunov equation
$$AX + XA^* = -bb^*$$
for stable $A$ and controllable $(A, b)$. In many cases (since the right-hand side $bb^*$ is low-rank), the singular values of $X$ decay rapidly [Gudmundsson and Laub 1994; Penzl 2000].

[Figure: the $k$th singular value of $X$ versus $k$ for two values of $\alpha$, with the eigenvalues of $A$ inset. Normal example: $A = \operatorname{tridiag}(-1, \alpha, 1)$ with spectrum $\sigma(A) \subseteq \{\alpha + iy : y \in [-2, 2]\}$.]

Convergence slows as $\alpha \uparrow 0$, i.e., as the eigenvalues move toward the imaginary axis.
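A hedged MATLAB sketch of this experiment; the dimension and the particular $\alpha$ values are assumptions chosen to mimic the plot.

```matlab
% Singular value decay of the Lyapunov solution for the normal example.
n = 64;
for alpha = [-1/2, -5/2]                  % assumed alpha values (alpha < 0)
    A = alpha*eye(n) + diag(ones(n-1,1),1) - diag(ones(n-1,1),-1);
    b = randn(n,1);
    X = sylvester(A, A', -b*b');
    semilogy(svd(X), '.-'), hold on       % decay slows as alpha approaches 0
end
xlabel('k'), ylabel('s_k'), legend('\alpha = -1/2', '\alpha = -5/2')
```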

Decay of Singular Values of Lyapunov Solutions

Let sk denote the kth singular value of X, sk ≥ sk+1.

I For Hermitian $A$, bounds on $s_k/s_1$ have been derived by Penzl [2000], Sabino [2006] (see Ellner & Wachspress [1991]).

I For diagonalizable $A = V\Lambda V^{-1}$, Antoulas, Sorensen, and Zhou [2002] prove

$$s_{k+1} \;\le\; \kappa(V)^2\,\delta_{k+1}\,\|b\|^2(N - k)^2, \qquad \delta_k = \frac{1}{2|\operatorname{Re}\lambda_k|}\prod_{j=1}^{k-1}\frac{|\lambda_k - \lambda_j|^2}{|\lambda_k + \lambda_j|^2}.$$

The $\kappa(V)^2$ term imposes a significant penalty for nonnormality; cf. [Truhar, Tomljanović, Li 2009].

We seek a bound that gives a more flexible approach, by enlarging the set over which we study rational functions like $\delta_k$.

Decay of Singular Values of Lyapunov Solutions

I $s_{k+1} \le \kappa(V)^2\,\delta_{k+1}\,\|b\|^2(N - k)^2$, with $\delta_{k+1}$ as on the previous slide.

I The ADI iteration [Wachspress 1988; Penzl 2000] constructs a rank-$k$ approximation $X_k$ to $X$ that satisfies

$$X - X_k = \phi(A)\,X\,\phi(A)^*,$$

with $X_0 = 0$ and
$$\phi(z) = \prod_{j=1}^{k}\frac{\mu_j - z}{\mu_j + z}.$$

Using the fact that $\operatorname{rank}(X_k) \le k$, we obtain the bound

$$\frac{s_{k+1}}{s_1} \;\le\; \min_{\mu_1,\ldots,\mu_k}\|\phi(A)\|^2 \;\le\; \min_{\mu_1,\ldots,\mu_k}\left(\frac{L_\varepsilon}{2\pi\varepsilon}\,\sup_{z\in\sigma_\varepsilon(A)}\prod_{j=1}^{k}\frac{|\mu_j - z|}{|\mu_j + z|}\right)^{2},$$

where $L_\varepsilon$ is the boundary length of a contour enclosing $\sigma_\varepsilon(A)$ [Beattie, E., Sabino; cf. Beckermann (2004)].
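Since any choice of shifts gives a valid bound, the first inequality is easy to check numerically; a hedged MATLAB sketch on the 32 × 32 example that follows (the shifts below are a crude assumption, not optimized):

```matlab
% For any shifts mu_j, rank(X_k) <= k gives s_{k+1}/s_1 <= ||phi(A)||^2.
n = 32;
A = -eye(n) + 2*diag(ones(n-1,1), 1);    % the bidiagonal example below
b = ones(n,1);
X = sylvester(A, A', -b*b');
s = svd(X);
k = 4;
mu = -logspace(-0.5, 0.5, k);            % assumed negative real shifts
phiA = eye(n);
for j = 1:k
    phiA = phiA*((mu(j)*eye(n) - A)/(mu(j)*eye(n) + A));
end
fprintf('s_{k+1}/s_1 = %.2e <= ||phi(A)||^2 = %.2e\n', ...
        s(k+1)/s(1), norm(phiA)^2);
```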

Examples of the Decay Bound Based on Pseudospectra

$$A = \begin{bmatrix} -1 & 1/2 & & \\ & -1 & \ddots & \\ & & \ddots & 1/2 \\ & & & -1 \end{bmatrix} \in \mathbb{C}^{16\times 16}$$

[Figure: singular values $s_k$ versus $k$, with the pseudospectral decay bound for $\varepsilon$ ranging from $10^{-5}$ to $10^{-0.5}$.]

Examples of the Decay Bound Based on Pseudospectra

$$A = \begin{bmatrix} -1 & 2 & & \\ & -1 & \ddots & \\ & & \ddots & 2 \\ & & & -1 \end{bmatrix} \in \mathbb{C}^{32\times 32}$$

[Figure: spectrum, pseudospectra ($\varepsilon = 10^{-8}, \ldots, 10^{-28}$), and numerical range (boundary = dashed line).]

Examples of the Decay Bound Based on Pseudospectra

$$A = \begin{bmatrix} -1 & 2 & & \\ & -1 & \ddots & \\ & & \ddots & 2 \\ & & & -1 \end{bmatrix} \in \mathbb{C}^{32\times 32}$$

[Figure: singular values $s_k$ versus $k$, with the pseudospectral decay bound for $\varepsilon$ ranging from $10^{-28}$ to $10^{-10}$.]

Any Decay Possible for Any Spectrum

Theorem (Penzl, 2000)

Let A be stable and b some vector such that (A, b) is controllable. Given any Hermitian positive definite X, there exists some invertible matrix S such that

$$(SAS^{-1})X + X(SAS^{-1})^* = -(Sb)(Sb)^*.$$

Any prescribed singular value decay is possible for a matrix with any eigenvalues.

Proof. The proof is a construction.

I Solve $AY + YA^* = -bb^*$ for $Y$. ($Y$ is Hermitian positive definite, since $(A, b)$ is controllable.)

I Set $S := X^{1/2}Y^{-1/2}$.

I Notice that $SYS^* = X^{1/2}Y^{-1/2}\,Y\,Y^{-1/2}X^{1/2} = X$.

I Define $\widehat{A} := SAS^{-1}$, $\widehat{b} := Sb$. Now it is easy to verify that $X$ solves the desired Lyapunov equation:
$$\widehat{A}X + X\widehat{A}^* = (SAS^{-1})(SYS^*) + (SYS^*)(S^{-*}A^*S^*) = S(AY + YA^*)S^* = -(Sb)(Sb)^* = -\widehat{b}\widehat{b}^*.$$

A Nonnormal Anomaly

The pseudospectral bound and the bound of Antoulas, Sorensen, and Zhou both predict that the decay rate slows as nonnormality increases. However, for the solutions to Lyapunov equations this intuition can be wrong [Sabino 2006]. Consider
$$A = \begin{bmatrix} -1 & \alpha \\ 0 & -1 \end{bmatrix}, \qquad b = \begin{bmatrix} t \\ 1 \end{bmatrix}, \qquad t \in \mathbb{R}.$$
As $\alpha$ grows, $A$'s departure from normality grows, and all of the bounds suggest that the 'decay' rate should worsen as $\alpha$ increases. The Lyapunov equation $AX + XA^* = -bb^*$ has the solution
$$X = \frac{1}{4}\begin{bmatrix} 2t^2 + 2\alpha t + \alpha^2 & \alpha + 2t \\ \alpha + 2t & 2 \end{bmatrix}.$$
For each $\alpha$, we wish to pick $t$ to maximize the ratio $s_2/s_1$, i.e., the worst case for the 'decay'. This is accomplished for $t = -\alpha/2$, giving
$$\frac{s_2}{s_1} = \begin{cases} \alpha^2/4, & 0 < \alpha \le 2, \\ 4/\alpha^2, & 2 \le \alpha. \end{cases}$$
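A minimal MATLAB sketch reproducing the worst-case curve in the numerical illustration below:

```matlab
% Worst-case ratio s2/s1 for the 2x2 example, with t = -alpha/2.
alphas = linspace(0.1, 5, 60);
ratio = zeros(size(alphas));
for i = 1:numel(alphas)
    a = alphas(i);
    A = [-1 a; 0 -1];  b = [-a/2; 1];
    X = sylvester(A, A', -b*b');
    s = svd(X);
    ratio(i) = s(2)/s(1);
end
plot(alphas, ratio, 'o', alphas, min(alphas.^2/4, 4./alphas.^2), '-')
xlabel('\alpha'), ylabel('s_2/s_1')      % computed ratios match the formula
```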

Jordan block: 2 × 2 case, numerical illustration

[Figure: worst-case ratio versus $\alpha$. Red line: $s_2/s_1 = \alpha^2/4$ for $0 < \alpha \le 2$ and $4/\alpha^2$ for $2 \le \alpha$. Blue dots: $s_2/s_1$ for random $b$ and 2000 $\alpha$ values.]

The effect of nonnormality on Lyapunov solutions remains only partially understood.

5. Moment Matching Model Reduction

Krylov methods for moment matching model reduction

We now turn to a model reduction approach where nonnormality plays a crucial role. Once again, begin with the SISO system

$$x'(t) = Ax(t) + b\,u(t), \qquad y(t) = c\,x(t) + d\,u(t),$$

with $A \in \mathbb{C}^{n\times n}$, $b, c^T \in \mathbb{C}^n$, and initial condition $x(0) = x_0$.

Compress the state space to dimension $k \ll n$ via a projection method:

$$\widehat{x}'(t) = \widehat{A}\widehat{x}(t) + \widehat{b}u(t), \qquad \widehat{y}(t) = \widehat{c}\widehat{x}(t) + du(t),$$
where

$$\widehat{A} = W^*AV \in \mathbb{C}^{k\times k}, \qquad \widehat{b} = W^*b \in \mathbb{C}^{k\times 1}, \qquad \widehat{c} = cV \in \mathbb{C}^{1\times k}$$
for some $V, W \in \mathbb{C}^{n\times k}$ with $W^*V = I$. The matrices $V$ and $W$ are constructed by a Krylov subspace method.

Krylov methods for moment matching model reduction

$$\widehat{A} = W^*AV \in \mathbb{C}^{k\times k}, \qquad \widehat{b} = W^*b \in \mathbb{C}^{k\times 1}, \qquad \widehat{c} = cV \in \mathbb{C}^{1\times k}$$

Arnoldi Reduction: If $W = V$ and the columns of $V$ span the $k$th Krylov subspace,

$$\operatorname{Ran}(V) = \mathcal{K}_k(A, b) = \operatorname{span}\{b, Ab, \ldots, A^{k-1}b\},$$

then the reduced model matches $k$ moments of the system:

$$\widehat{c}\widehat{A}^j\widehat{b} = cA^jb, \qquad j = 0, \ldots, k - 1.$$

Bi-Lanczos Reduction: If the columns of $V$ and $W$ span the Krylov subspaces
$$\operatorname{Ran}(V) = \mathcal{K}_k(A, b) = \operatorname{span}\{b, Ab, \ldots, A^{k-1}b\},$$
$$\operatorname{Ran}(W) = \mathcal{K}_k(A^*, c^*) = \operatorname{span}\{c^*, A^*c^*, \ldots, (A^*)^{k-1}c^*\},$$
then the reduced model matches $2k$ moments of the system:
$$\widehat{c}\widehat{A}^j\widehat{b} = cA^jb, \qquad j = 0, \ldots, 2k - 1.$$
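A hedged MATLAB sketch of one-sided (Arnoldi-type) moment matching; the stable test system is an assumption, and the explicit Krylov basis is used only for illustration (real codes orthogonalize as they go).

```matlab
% One-sided projection matches the first k moments.
n = 100;  k = 5;
A = -2*eye(n) + diag(ones(n-1,1),1) + 0.3*diag(ones(n-1,1),-1);
b = randn(n,1);  c = randn(1,n);
K = zeros(n,k);  K(:,1) = b;
for j = 2:k, K(:,j) = A*K(:,j-1); end    % b, Ab, ..., A^{k-1}b
[V, ~] = qr(K, 0);                       % orthonormal basis, Ran(V) = K_k(A,b)
Ah = V'*A*V;  bh = V'*b;  ch = c*V;      % reduced model (W = V)
for j = 0:k-1
    fprintf('j = %d:  %.6e  vs  %.6e\n', j, ch*Ah^j*bh, c*A^j*b);
end
```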

Stability of Reduced Order Models

Question: Does the reduced model inherit properties of the original system?

Properties include stability, passivity, second-order structure, etc. In this lecture we are concerned with stability – and, more generally, with the behavior of the eigenvalues of the reduced matrix $\widehat{A}$.

Observation: For Arnoldi reduction, the eigenvalues of $\widehat{A}$ are contained in the numerical range
$$W(A) = \{x^*Ax : \|x\| = 1\}.$$

Proof: If $\widehat{A}z = \theta z$ with $\|z\| = 1$, then $\theta = z^*(V^*AV)z = (Vz)^*A(Vz)$, where $\|Vz\| = \|z\| = 1$.

No such bound holds for Bi-Lanczos: the decomposition may not even exist. For the famous 'CD Player' model, $cb = 0$: the method breaks down at the first step.

Stability and Bi-Lanczos

For Bi-Lanczos, the stability question is rather more subtle.

For example, if $b$ and $c$ are nearly orthogonal, the (1,1) entry of $\widehat{A} = W^*AV$ will generally be very large.

Given a fixed $b$, one has much freedom to rig poor results via $c$:

"[A]ny three-term recurrence (run for no more than (n+2)/2 steps, where n is the size of the matrix) is the two-sided Lanczos algorithm for some left starting vector." [Greenbaum, 1998]

Example: Arnoldi Reduction for Normal versus Nonnormal

The eigenvalues of $\widehat{A}$ are known as Ritz values.

[Figure: eigenvalues (•), Ritz values (◦), and the numerical range for a normal matrix (left) and a nonnormal matrix (right).]

A Remedy for Unstable Arnoldi Models?

One can counteract instability by restarting the Arnoldi algorithm to shift out unstable eigenvalues [Grimme, Sorensen, Van Dooren, 1994]; cf. [Jaimoukha and Kasenally, 1997].

I $\widehat{A} := V^*AV$ has eigenvalues $\theta_1, \ldots, \theta_k$ (Ritz values for $A$).

I Suppose $\theta_1, \ldots, \theta_p$ are in the right half-plane.

I Replace the starting vector $b$ by the filtered vector
$$b_+ = \psi(A)b, \qquad \psi(z) = \prod_{j=1}^{p}(z - \theta_j),$$
where the filter polynomial $\psi$ "discourages" future Ritz values near the shifts $\theta_1, \ldots, \theta_p$.

I Build new matrices $V$, $\widehat{A}$ with starting vector $b_+$ (implicit restart).

I Now modified moments, $b^*\psi(A)^*A^j\psi(A)b$, will be matched.

Repeat this process until $\widehat{A}$ has no unstable modes.
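A hedged sketch of one explicit restart, continuing the MATLAB sketch above (real implementations restart implicitly; this explicit version only shows the idea):

```matlab
% Filter right-half-plane Ritz values out of the starting vector.
theta = eig(Ah);                         % Ritz values from V'*A*V
bad = theta(real(theta) > 0);            % unstable Ritz values, if any
bplus = b;
for j = 1:numel(bad)
    bplus = A*bplus - bad(j)*bplus;      % apply the factor (A - theta_j I)
end                                      % (conjugate pairs keep bplus real)
% now rebuild V, Ah, bh, ch from the Krylov subspace K_k(A, bplus)
```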

Matching the Moments of a Nonnormal Matrix

Model of flutter in a Boeing 767 from SLICOT (n = 55), stabilized by Burke, Lewis, Overton [2003].

[Figure: spectrum $\sigma(A)$ (left) and numerical range $W(A)$ (right) of the flutter model.]

Matching the Moments of a Nonnormal Matrix

Model of flutter in a Boeing 767 from SLICOT (n = 55), stabilized by Burke, Lewis, Overton [2003].

[Figure: pseudospectra of the flutter model,
$$\sigma_\varepsilon(A) = \{z \in \mathbb{C} : \|(z - A)^{-1}\| > 1/\varepsilon\},$$
for $\varepsilon = 10^{-2}, \ldots, 10^{-6.5}$; see [Trefethen & E. 2005].]

Reduction via Restarted Arnoldi

[Figures: transient behavior $\|x(t)\|$ and spectrum of the original system; of the reduced system, m = 40; and of the reduced system after one, two, three, and four implicit restarts.]

The reduced system is stable, but underestimates the transient growth by several orders of magnitude.

Aggregate Filter Polynomials

[Figures: aggregate filter polynomials after the first, second, third, and fourth restarts; color indicates the relative size of $\log_{10}|\psi(z)|$.]

Another Example: a Nonlinear Heat Equation

Linear models often arise as linearizations of nonlinear equations. Consider the nonlinear heat equation on $x \in [-1, 1]$ with $u(-1, t) = u(1, t) = 0$,

$$u_t(x, t) = \nu u_{xx}(x, t) + \sqrt{\nu}\,u_x(x, t) + \tfrac{1}{8}u(x, t) + u(x, t)^p,$$

with $\nu > 0$ and $p > 1$ [Demanet, Holmer, Zworski]. The linearization $L$, an advection–diffusion operator,

$$Lu = \nu u_{xx} + \sqrt{\nu}\,u_x + \tfrac{1}{8}u,$$

has eigenvalues and eigenfunctions

$$\lambda_n = -\frac{1}{8} - \frac{n^2\pi^2\nu}{4}, \qquad u_n(x) = e^{-x/(2\sqrt{\nu})}\sin(n\pi x/2);$$

see, e.g., [Reddy & Trefethen 1994]. The linearized operator is stable for all $\nu > 0$, but has interesting transients. . . .

Nonnormality in the Linearization

[Figure: spectrum, pseudospectra, and numerical range of the linearization ($L^2$ norm, $\nu = 0.02$).]

Transient growth can feed the nonlinearity; cf. [Trefethen, Trefethen, Reddy, Driscoll 1993], . . . .

Evolution of a Small Initial Condition

[Figure: evolution of $u(x, t)$ from $t = 0.0$; nonlinear model (blue) and linearization (black).]

Transient Behavior

[Figure: $\|u(t)\|_{L^2}$ versus $t$; linearized system (black) and nonlinear system (dashed blue).]

Nonnormal growth feeds the nonlinear instability.

Transient Behavior: Reduction of the Linearized Model

[Figure: entire spectrum (left) and zoom (right); spectral discretization, n = 128 (black), and Arnoldi reduction, m = 10 (red). Many Ritz values capture spurious eigenvalues of the discretization on the left.]

Transient Behavior: Reduction of the Linearized Model

[Figure: $\|u(t)\|_{L^2}$ versus $t$; spectral discretization, n = 128 (black), and Arnoldi reduction, m = 10 (red).]

Transient Behavior: Reduction of the Linearized Model

[Figure: entire spectrum (left) and zoom (right); spectral discretization, n = 128 (black), and Arnoldi reduction, m = 10 (red), after a restart to remove the spurious eigenvalue. This effectively pushes the Ritz values to the left.]

Transient Behavior: Reduction of the Linearized Model

[Figure: $\|u(t)\|_{L^2}$ versus $t$; spectral discretization, n = 128 (black), and Arnoldi reduction, m = 10 (red), after one restart to remove the spurious eigenvalue.]

Transient Behavior: Reduction of the Linearized Model

[Figure: the same comparison on a logarithmic scale; spectral discretization, n = 128 (black), and Arnoldi reduction, m = 10 (red), after one restart to remove the spurious eigenvalue.]

6. GEPs and DAEs

Generalized Eigenvalue Problems

Problem: How should one adapt the definition of the $\varepsilon$-pseudospectrum to the generalized eigenvalue problem
$$Ax = \lambda Bx\,?$$

Equivalent definitions of $\sigma_\varepsilon(A)$ lead to different meanings for $\sigma_\varepsilon(A, B)$.

I Approach 1: eigenvalues of perturbations:
$$\sigma_\varepsilon(A) = \{z \in \mathbb{C} : z \in \sigma(A + E) \text{ for some } E \in \mathbb{C}^{n\times n} \text{ with } \|E\| < \varepsilon\}$$

I Approach 2: matrix behavior:
$$\sigma_\varepsilon(A) = \{z \in \mathbb{C} : \|(z - A)^{-1}\| > 1/\varepsilon\}$$

GEPs: Eigenvalue Perturbation Approach

Approach 1: eigenvalues of perturbations. Frayssé, Gueury, Nicoud, Toumazou [1996] proposed:
$$\sigma_\varepsilon(A, B) = \{z \in \mathbb{C} : (A + E_0)x = z(B + E_1)x \text{ for some } x \ne 0 \text{ and } E_0, E_1 \text{ with } \|E_0\| < \varepsilon\alpha_0,\ \|E_1\| < \varepsilon\alpha_1\},$$
where, e.g., either $\alpha_0 = \alpha_1 = 1$, or $\alpha_0 = \|A\|$ and $\alpha_1 = \|B\|$.

I This has an equivalent resolvent-like formulation:
$$\sigma_\varepsilon(A, B) = \{z \in \mathbb{C} : \|(zB - A)^{-1}\|\,(\alpha_0 + \alpha_1|z|) > 1/\varepsilon\}.$$

I Generalized to matrix polynomials by Tisseur & N. Higham [2001, 2002]; see also Lancaster & Psarrakos [2005].

I Cf. [Boutry, Elad, Golub, Milanfar, 2005] for rectangular pencils.
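A hedged MATLAB sketch of the resolvent-like formulation for the 2 × 2 pencil used in the examples that follow; the grid extent and contour levels are assumptions chosen to mimic the plots.

```matlab
% FGNT pseudospectra of the pencil (A,B) with alpha0 = ||A||, alpha1 = ||B||.
M = 1;  A = [-1 -5*M; 0 -5];  B = [1 M; 0 1];
a0 = norm(A);  a1 = norm(B);
[X, Y] = meshgrid(linspace(-15, 2, 300), linspace(-8, 8, 300));
lev = zeros(size(X));
for j = 1:numel(X)
    z = X(j) + 1i*Y(j);
    % z is in sigma_eps(A,B) iff lev(j) < log10(eps)
    lev(j) = -log10((a0 + a1*abs(z))/min(svd(z*B - A)));
end
contour(X, Y, lev, -2:0.25:-0.25)        % boundaries for eps = 1e-2 ... 1e-0.25
axis equal, xlabel('Re z'), ylabel('Im z')
```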

GEPs: Eigenvalue Perturbation Approach

$$A = \begin{bmatrix} -1 & -5M \\ 0 & -5 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & M \\ 0 & 1 \end{bmatrix}$$

[Figures: FGNT pseudospectra of the pencil $(A, B)$ for $M = 0, 0.25, 0.50, \ldots, 2.00$, with $\varepsilon = 10^{-2}, \ldots, 10^{-0.25}$; in every case $\sigma(A, B) = \{-1, -5\}$.]

GEPs: Eigenvalue Perturbation Approach

Consider solutions to $Bx'(t) = Ax(t)$ for the previous example:
$$A = \begin{bmatrix} -1 & -5M \\ 0 & -5 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & M \\ 0 & 1 \end{bmatrix}.$$
Note that
$$B^{-1}A = \begin{bmatrix} -1 & 0 \\ 0 & -5 \end{bmatrix}.$$

I Since $B^{-1}A$ is normal and stable, solutions to $Bx'(t) = Ax(t)$ cannot exhibit growth.

I The parameter $M$ affects the stability of eigenvalues of the pencil, but has no influence on the solution of $Bx'(t) = Ax(t)$.

I [Ruhe, 1995], [Riedel, 1994] proposed: For A ∈ Cn×n and invertible B ∈ Cn×n,

−1 σε(A, B) = σε(B A).

GEPs: Matrix Behavior Approach

I More generally, premultiplying

Bx0(t) = Ax(t)

by some invertible matrix S

SBx0(t) = SAx(t)

affects the perturbation theory of the pencil (SA, SB), but not the system driven by (SB)−1(SA) = B−1A.

This fact suggests an alternative definition. GEPs: Matrix Behavior Approach

I More generally, premultiplying

Bx0(t) = Ax(t)

by some invertible matrix S

SBx0(t) = SAx(t)

affects the perturbation theory of the pencil (SA, SB), but not the system driven by (SB)−1(SA) = B−1A.

This fact suggests an alternative definition. Approach 2: matrix behavior

I [Ruhe, 1995], [Riedel, 1994] proposed: For A ∈ Cn×n and invertible B ∈ Cn×n,

−1 σε(A, B) = σε(B A). Comparison of GEP Pseudospectra

$$A = \begin{bmatrix} -1 & -5M \\ 0 & -5 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & M \\ 0 & 1 \end{bmatrix}, \qquad B^{-1}A = \begin{bmatrix} -1 & 0 \\ 0 & -5 \end{bmatrix}.$$

[Figure: pseudospectra under the FGNT definition with $M = 1$ (left) and the Ruhe definition (right).]

GEP: Behavior in Different Norms

Ruhe's definition is closely related to Riedel's [1994]:

I If $B$ is Hermitian positive definite with Cholesky factorization $B = LL^*$, then the $\varepsilon$-pseudospectrum of the matrix pencil $(A, B)$ is the set
$$\sigma_\varepsilon(A, B) = \sigma_\varepsilon(L^{-1}AL^{-*}).$$

This definition is the same as Ruhe's definition, but in a different norm. Let
$$\langle x, y\rangle_B = y^*Bx, \qquad \|x\|_B^2 = x^*Bx.$$
Then
$$\|(z - L^{-1}AL^{-*})^{-1}\|_2 = \|(z - B^{-1}A)^{-1}\|_B.$$

Pseudospectra for DAEs

Suppose $B$ is singular, but $(A, B)$ is regular ($\det(zB - A) \ne 0$ for some $z \in \mathbb{C}$). Then $Bx'(t) = Ax(t)$ is a differential-algebraic equation (DAE).

Simple example:
$$x_1'(t) = -x_1(t), \qquad x_1(t) = x_2(t)$$

$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x_1'(t) \\ x_2'(t) \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix}$$

I Campbell & Meyer [1979], Campbell [1980]

I Kunkel & Mehrmann [2006]

I Descriptor systems: Benner, Byers, Mehrmann, Stykel, . . .

DAEs, Simplest Case: A Invertible

Suppose that $A$ is invertible, so that we can write $Bx'(t) = Ax(t)$ in the form
$$A^{-1}Bx'(t) = x(t).$$

First take a (generalized) Schur decomposition,
$$A^{-1}B = QTQ^* = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}\begin{bmatrix} G & D \\ 0 & N \end{bmatrix}\begin{bmatrix} Q_1^* \\ Q_2^* \end{bmatrix},$$
where $Q$ is unitary, $G$ is invertible, and $N$ is nilpotent. (The degree of nilpotency corresponds to the index of the DAE.) This decomposition reveals the algebraic structure of the problem:
$$Bx'(t) = Ax(t), \qquad x(0) = x_0$$
has a solution if and only if $x_0 \in \operatorname{Range} Q_1$.

DAEs, Simplest Case: A Invertible

$$A^{-1}B = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}\begin{bmatrix} G & D \\ 0 & N \end{bmatrix}\begin{bmatrix} Q_1^* \\ Q_2^* \end{bmatrix}$$

We wish to write the solution to the DAE as
$$x(t) = Q_1 y(t) + Q_2 z(t).$$
One can show that $z(t) = 0$ for all $t$, so we seek:
$$Gy'(t) = y(t), \qquad z(t) = 0.$$
Hence write $x(t) = Q_1 e^{tG^{-1}} y(0)$, i.e.,
$$x(t) = Q_1 e^{tG^{-1}} Q_1^* x_0.$$
Special case: if $B$ is invertible, then $Q_1 = I$ and $G = A^{-1}B$, so
$$x(t) = e^{tG^{-1}}x_0 = e^{tB^{-1}A}x_0.$$
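A hedged MATLAB sketch of this solution formula on the simple DAE above, using an ordered Schur form of $A^{-1}B$ (the tolerance for "nonzero eigenvalue" is an assumption):

```matlab
% Solve B x'(t) = A x(t) with singular B via the Schur form of A^{-1}B.
B = [1 0; 0 0];  A = [-1 0; -1 1];
[Q, T] = schur(A\B, 'complex');
sel = abs(diag(T)) > 1e-10;              % nonzero eigenvalues of A^{-1}B ...
[Q, T] = ordschur(Q, T, sel);            % ... moved to the leading block G
m  = nnz(sel);
Q1 = Q(:, 1:m);  G = T(1:m, 1:m);
x0 = Q1*ones(m,1);                       % a consistent initial condition
t  = 0.7;
x  = Q1*expm(t/G)*(Q1'*x0)               % x(t) = Q1 e^{tG^{-1}} Q1^* x0
% here x equals exp(-t)*x0, since x1' = -x1 and x2 = x1
```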

Definition (Pseudospectra of a Regular Pencil, A invertible)

Consider the matrix pencil $A - \lambda B$ with $A$ invertible, and Schur factorization
$$A^{-1}B = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}\begin{bmatrix} G & D \\ 0 & N \end{bmatrix}\begin{bmatrix} Q_1^* \\ Q_2^* \end{bmatrix}$$
for $N$ nilpotent and $0 \notin \sigma(G)$. The $\varepsilon$-pseudospectrum of the matrix pencil $(A, B)$ is
$$\sigma_\varepsilon(A, B) := \sigma_\varepsilon\big((Q_1^*A^{-1}BQ_1)^{-1}\big).$$

Pseudospectra of (A, B) for Transient Analysis of DAEs

Suppose $x_0 \in \operatorname{Range} Q_1$, with the columns of $Q_1$ forming an orthonormal basis for the invariant subspace of the pencil associated with the finite eigenvalues. Then
$$x(t) = Q_1 e^{tG^{-1}} Q_1^* x_0,$$
where $G = Q_1^*A^{-1}BQ_1 \in \mathbb{C}^{m\times m}$ ($m$ = number of finite eigenvalues). We can bound the norm of the solution by
$$\|x(t)\| \le \|e^{tG^{-1}}\|\,\|x_0\|.$$

Examples: Stability Analysis for Incompressible Flow

Pseudospectra for a matrix pencil derived from incompressible flow over a backward-facing step, discretized via IFISS (dimension 667).

[Figures: pseudospectra of the pencil for $1/\nu = 50$, $100$, and $200$.]

Summary

Overview of Lectures

These lectures address modern tools for the spectral analysis of dynamical systems. We shall cover a mix of theory, computation, and applications. Goal: By the end, you should be equipped to understand phenomena that many people find quite mysterious when encountered in the wild.

Lecture 1:

I Normality and Nonnormality

I Pseudospectra

I Bounding Functions of Matrices

Lecture 2:

I Balanced Truncation and Lyapunov Equations

I Moment Matching Model Reduction

I Differential Algebraic Equations

Exercises

Exercises for the Afternoon

I Download EigTool (http://www.cs.ox.ac.uk/pseudospectra/eigtool) and run eigtool from a MATLAB prompt.
  Experiment with menu option: Demos/Dense/...
  For dense matrices, experiment with menu option: Transients
  For dense matrices, experiment with menu option: Numbers
  For dense matrices, experiment with the Pmode+epsilon button
  Experiment with menu option: Demos/Sparse/...

I Computation of pseudospectra of a vibrating string in energy norm.

I Population dynamics: design a transiently growing population.

I Analysis of a nonlinear dynamical system.

I Estimate pseudospectral bounds on $\|A^k\|$ and $\|e^{tA}\|$.