
Chapter 6

Linear operators and adjoints

Contents

Introduction ...... 6.1
Fundamentals ...... 6.2
Spaces of bounded linear operators ...... 6.3
Inverse operators ...... 6.5
Linearity of inverses ...... 6.5
Banach inverse theorem ...... 6.5
Equivalence of spaces ...... 6.5
Isomorphic spaces ...... 6.6
Isometric spaces ...... 6.6
Unitary equivalence ...... 6.7
Adjoints in Hilbert spaces ...... 6.11
Unitary operators ...... 6.13
Relations between the four spaces ...... 6.14
Duality relations for convex cones ...... 6.15
Geometric interpretation of adjoints ...... 6.15
Optimization in Hilbert spaces ...... 6.16
The normal equations ...... 6.16
The dual problem ...... 6.17
Pseudo-inverse operators ...... 6.18
Analysis of the DTFT ...... 6.21

6.1 Introduction
The field of optimization uses linear operators and their adjoints extensively. Examples include differentiation, convolution, and the Radon transform, among others.

Example. If A is an m × n matrix (an example of a linear operator), then we know that ‖y − Ax‖₂ is minimized when x = [A′A]⁻¹A′y. We want to solve such problems for linear operators between more general spaces. To do so, we need to generalize “transpose” and “inverse.”
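A minimal numerical sketch of this matrix fact (not from the notes): for a small tall matrix A, the least-squares solution of y ≈ Ax satisfies the normal equations (A′A)x = A′y, and the residual y − Ax is orthogonal to the columns of A. The 3 × 2 example below is arbitrary; Cramer's rule solves the 2 × 2 system.

```python
# 3x2 "tall" matrix: y = Ax generally has no exact solution.
A = [[1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]
y = [1.0, 2.0, 2.0]

def mat_t_vec(M, v):  # computes M'v
    return [sum(M[i][j] * v[i] for i in range(len(M))) for j in range(len(M[0]))]

def gram(M):          # computes M'M (2x2 here)
    n = len(M[0])
    return [[sum(M[i][r] * M[i][c] for i in range(len(M))) for c in range(n)]
            for r in range(n)]

G = gram(A)           # A'A
b = mat_t_vec(A, y)   # A'y
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
x = [(b[0] * G[1][1] - b[1] * G[0][1]) / det,
     (G[0][0] * b[1] - G[1][0] * b[0]) / det]

# The residual y - Ax must satisfy A'(y - Ax) = 0.
r = [y[i] - sum(A[i][j] * x[j] for j in range(2)) for i in range(3)]
orth = mat_t_vec(A, r)
print(x, orth)
```

The orthogonality of the residual to R(A) is exactly the geometric picture that the projection theorem generalizes later in this chapter.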

6.1 6.2 c J. Fessler, December 21, 2004, 13:3 (student version)

6.2 Fundamentals
We write T : D → Y when T is a transformation from a set D in a vector space X to a vector space Y.
The set D is called the domain of T. The range of T is denoted

R(T) = {y ∈ Y : y = T(x) for some x ∈ D}.

If S ⊆ D, then the image of S is given by
T(S) = {y ∈ Y : y = T(s) for some s ∈ S}.
If P ⊆ Y, then the inverse image of P is given by
T⁻¹(P) = {x ∈ D : T(x) ∈ P}.

Notation: for a linear operator A, we often write Ax instead of A(x). For linear operators, we can always just use D = X, so we largely ignore D hereafter.
Definition. The nullspace of a linear operator A is N(A) = {x ∈ X : Ax = 0}. It is also called the kernel of A, and denoted ker(A).
Exercise. For a linear operator A, the nullspace N(A) is a subspace of X. Furthermore, if A is continuous (in a normed space X), then N(A) is closed [3, p. 241].
Exercise. The range of a linear operator is a subspace of Y.

Proposition. A linear operator on a normed space X (to a normed space Y) is continuous at every point in X iff it is continuous at a single point in X.
Proof. Exercise. [3, p. 240]. Luenberger does not mention that Y needs to be a normed space too.
Definition. A transformation T from a normed space X to a normed space Y is called bounded iff there is a constant M such that ‖T(x)‖ ≤ M ‖x‖, ∀x ∈ X.
Definition. The smallest such M is called the operator norm of T and is denoted |||T|||. Formally:
|||T||| ≜ inf {M ∈ R : ‖T(x)‖ ≤ M ‖x‖, ∀x ∈ X}.
Consequently: ‖T(x)‖_Y ≤ |||T||| ‖x‖_X, ∀x ∈ X.
Fact. For a linear operator A, an equivalent expression (used widely!) for the operator norm is

|||A||| = sup_{‖x‖ ≤ 1} ‖Ax‖.

Fact. If X is the trivial vector space consisting only of the vector 0, then |||A||| = 0 for any linear operator A.
Fact. If X is a nontrivial vector space, then for a linear operator A we have the following equivalent expressions:
|||A||| = sup_{x ≠ 0} ‖Ax‖ / ‖x‖ = sup_{‖x‖ = 1} ‖Ax‖.

Example. Consider X = (Rⁿ, ‖·‖∞) and Y = R, with Ax = a₁x₁ + ··· + aₙxₙ.
Clearly ‖Ax‖_Y = |Ax| = |a₁x₁ + ··· + aₙxₙ| ≤ |a₁||x₁| + ··· + |aₙ||xₙ| ≤ (|a₁| + ··· + |aₙ|) ‖x‖∞.
In fact, if we choose x such that xᵢ = sgn(aᵢ), then ‖x‖∞ = 1 and we get equality above. So we conclude |||A||| = |a₁| + ··· + |aₙ|.
Example. What if X = (Rⁿ, ‖·‖_p)? ??
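A quick numerical sketch of this example (the vector a below is arbitrary): the claimed norm ∑|aᵢ| is attained at xᵢ = sgn(aᵢ), and no random sample from the unit ball of (Rⁿ, ‖·‖∞) exceeds it.

```python
import random

# Operator A x = a1 x1 + ... + an xn on (R^n, ||.||_inf) -> R.
a = [2.0, -3.0, 0.5]
bound = sum(abs(ai) for ai in a)                 # claimed |||A||| = sum |a_i|
x_star = [1.0 if ai >= 0 else -1.0 for ai in a]  # maximizer: x_i = sgn(a_i), ||x||_inf = 1
attained = abs(sum(ai * xi for ai, xi in zip(a, x_star)))

random.seed(0)
samples = []
for _ in range(1000):
    x = [random.uniform(-1.0, 1.0) for _ in a]   # random point with ||x||_inf <= 1
    samples.append(abs(sum(ai * xi for ai, xi in zip(a, x))))
print(attained, bound, max(samples))
```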

Proposition. A linear operator is bounded iff it is continuous.

Proof. Exercise. [3, p. 240].
More facts related to linear operators:
• If A : X → Y is linear and X is a finite-dimensional normed space, then A is continuous [3, p. 268].
• If A : X → Y is a transformation where X and Y are normed spaces, then A is linear and continuous iff A(∑_{i=1}^∞ αᵢxᵢ) = ∑_{i=1}^∞ αᵢA(xᵢ) for all convergent series ∑_{i=1}^∞ αᵢxᵢ [3, p. 237]. This is the superposition principle as described in introductory signals and systems courses.

Spaces of bounded linear operators

Definition. If T₁ and T₂ are both transformations with a common domain X and a common range Y, over a common scalar field, then we define natural addition and scalar multiplication operations as follows:

(T1 + T2)(x) = T1(x) + T2(x)

(αT1)(x) = α(T1(x)).

Lemma. With the preceding definitions, when X and Y are normed spaces the following space of operators (!) is a vector space:
B(X, Y) = {bounded linear transformations from X to Y}.
(The proof that this is a vector space is within the next proposition.) This space is analogous to certain types of dual spaces (see Ch. 5). Not only is B(X, Y) a vector space, it is a normed space when one uses the operator norm |||A||| defined above.

Proposition. (B(X, Y), |||·|||) is a normed space when X and Y are normed spaces.
Proof. (sketch)
Claim 1. B(X, Y) is a vector space. Suppose T₁, T₂ ∈ B(X, Y) and α ∈ F.
‖(αT₁ + T₂)(x)‖_Y = ‖αT₁(x) + T₂(x)‖_Y ≤ |α| ‖T₁(x)‖_Y + ‖T₂(x)‖_Y ≤ |α| |||T₁||| ‖x‖_X + |||T₂||| ‖x‖_X = K ‖x‖_X, where K ≜ |α| |||T₁||| + |||T₂|||. So αT₁ + T₂ is a bounded operator. Clearly αT₁ + T₂ is a linear operator.
Claim 2. |||·||| is a norm on B. The “hardest” part is verifying the triangle inequality:

|||T₁ + T₂||| = sup_{‖x‖ = 1} ‖(T₁ + T₂)x‖_Y ≤ sup_{‖x‖ = 1} ‖T₁x‖_Y + sup_{‖x‖ = 1} ‖T₂x‖_Y = |||T₁||| + |||T₂|||. □

Are there other valid norms for B(X, Y)? ??
Remark. We did not really “need” linearity in this proposition. We could have shown that the space of bounded transformations from X to Y with |||·||| is a normed space.

Not only is B(X, Y) a normed space, but it is even complete if Y is.

Theorem. If X and Y are normed spaces with Y complete, then (B(X, Y), |||·|||) is complete.
Proof. Suppose {Tₙ} is a Cauchy sequence (in B(X, Y)) of bounded linear operators, i.e., |||Tₙ − Tₘ||| → 0 as n, m → ∞.
Claim 0. ∀x ∈ X, the sequence {Tₙ(x)} is Cauchy in Y.

‖Tₙ(x) − Tₘ(x)‖_Y = ‖(Tₙ − Tₘ)(x)‖_Y ≤ |||Tₙ − Tₘ||| ‖x‖_X → 0 as n, m → ∞.

Since Y is complete, for any x ∈ X the sequence {Tₙ(x)} converges to some point in Y. (This is called pointwise convergence.)
So we can define an operator T : X → Y by T(x) ≜ lim_{n→∞} Tₙ(x).
To show B is complete, we must first show T ∈ B, i.e., 1) T is linear, 2) T is bounded. Then we show 3) Tₙ → T (convergence w.r.t. the norm |||·|||).
Claim 1. T is linear.

T(αx + z) = lim_{n→∞} Tₙ(αx + z) = lim_{n→∞} [αTₙ(x) + Tₙ(z)] = αT(x) + T(z).
(Recall that in a normed space, if uₙ → u and vₙ → v, then αuₙ + vₙ → αu + v.)
Claim 2: T is bounded.
Since {Tₙ} is Cauchy, it is bounded, so ∃K < ∞ s.t. |||Tₙ||| ≤ K, ∀n ∈ N. Thus, by the continuity of norms, for any x ∈ X:

‖T(x)‖_Y = lim_{n→∞} ‖Tₙ(x)‖_Y ≤ lim_{n→∞} |||Tₙ||| ‖x‖_X ≤ K ‖x‖_X.

Claim 3: Tₙ → T.
Since {Tₙ} is Cauchy,
∀ε > 0, ∃N ∈ N s.t. n, m > N
⇒ |||Tₙ − Tₘ||| ≤ ε
⇒ ‖Tₙ(x) − Tₘ(x)‖_Y ≤ ε ‖x‖_X, ∀x ∈ X
⇒ lim_{m→∞} ‖Tₙ(x) − Tₘ(x)‖_Y ≤ ε ‖x‖_X, ∀x ∈ X
⇒ ‖Tₙ(x) − T(x)‖_Y ≤ ε ‖x‖_X, ∀x ∈ X, by continuity of the norm
⇒ |||Tₙ − T||| ≤ ε.
We have shown that every Cauchy sequence in B(X, Y) converges to some limit in B(X, Y), so B(X, Y) is complete. □
Corollary. (B(X, R), |||·|||) is a Banach space for any normed space X. Why? ??
We write A ∈ B(X, Y) as shorthand for “A is a bounded linear operator from normed space X to normed space Y.”
Definition. In general, if S : X → Y and T : Y → Z, then we can define the product operator or composition T S : X → Z by (TS)(x) = T(S(x)).

Proposition. If S ∈ B(X, Y) and T ∈ B(Y, Z), then TS ∈ B(X, Z).
Proof. Linearity of the composition of linear operators is trivial to show. To show that the composition is bounded:
‖TSx‖_Z ≤ |||T||| ‖Sx‖_Y ≤ |||T||| |||S||| ‖x‖_X. □
Does it follow that |||TS||| = |||T||| |||S|||? ??
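A sketch hinting at the answer to that question (the matrices are arbitrary choices): with the operator norm induced by ‖·‖∞ on R² (the max absolute row sum), the bound |||TS||| ≤ |||T||| |||S||| holds but can be strict, e.g., for two projections onto complementary coordinates.

```python
def op_norm_inf(M):
    """Operator norm induced by the inf-norm: max absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)

def matmul(T, S):
    return [[sum(T[i][k] * S[k][j] for k in range(len(S))) for j in range(len(S[0]))]
            for i in range(len(T))]

T = [[1.0, 0.0], [0.0, 0.0]]   # projects onto the first coordinate
S = [[0.0, 0.0], [0.0, 1.0]]   # projects onto the second coordinate
TS = matmul(T, S)              # the zero operator: |||TS||| = 0 < 1 = |||T||| |||S|||
print(op_norm_inf(TS), op_norm_inf(T) * op_norm_inf(S))
```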

6.3 Inverse operators
Definition. T : X → Y is called a one-to-one mapping of X into Y iff x₁, x₂ ∈ X and x₁ ≠ x₂ ⇒ T(x₁) ≠ T(x₂).
Equivalently, T is one-to-one iff the inverse image of any point y ∈ Y is at most a single point in X, i.e., |T⁻¹({y})| ≤ 1, ∀y ∈ Y.
Definition. T : X → Y is called onto iff T(X) = Y. This is a stronger condition than into.
Fact. If T : X → Y is one-to-one and onto Y, then T has an inverse denoted T⁻¹ such that T(x) = y iff T⁻¹(y) = x.
Many optimization methods, e.g., Newton’s method, require inversion of the Hessian matrix (or operator) corresponding to a cost function.
Lemma. [3, p. 171] If A : X → Y is a linear operator between two vector spaces X and Y, then A is one-to-one iff N(A) = {0}.

Linearity of inverses
We first look at the algebraic aspects of inverse operators in vector spaces.

Proposition. If a linear operator A : X → Y (for vector spaces X and Y) has an inverse, then that inverse A⁻¹ is also linear.
Proof. Suppose A⁻¹(y₁) = x₁, A⁻¹(y₂) = x₂, A(x₁) = y₁, and A(x₂) = y₂. Then by the linearity of A we have A(αx₁ + x₂) = αAx₁ + Ax₂ = αy₁ + y₂, so A⁻¹(αy₁ + y₂) = αx₁ + x₂ = αA⁻¹(y₁) + A⁻¹(y₂). □

6.4 Banach inverse theorem
Now we turn to the topological aspects, in normed spaces.
Lemma. (Baire) A Banach space X is not the union of countably many nowhere dense sets in X.
Proof. See text.

Theorem. (Banach inverse theorem)
If A is a continuous linear operator from a Banach space X onto a Banach space Y for which the inverse operator A⁻¹ exists, then A⁻¹ is continuous.
Proof. See text.
Combining with the earlier proposition that linear operators are bounded iff they are continuous yields the following.
Corollary. X, Y Banach and A ∈ B(X, Y) and A invertible ⇒ A⁻¹ ∈ B(Y, X).

Equivalence of spaces (one way to use operators)
• in vector spaces
• in normed spaces
• in inner product spaces

spaces         relation                   operator                requirements
vector         isomorphic                 isomorphism             linear, onto, 1-1
normed         topologically isomorphic   topological isomorphism linear, onto, invertible, continuous
normed         isometrically isomorphic   isometric isomorphism   linear, onto, norm preserving (⇒ 1-1, continuous)
inner product  unitarily equivalent       unitary                 isometric isomorphism that preserves inner products

Isomorphic spaces
Definition. Vector spaces X and Y are called isomorphic (think: “same structure”) iff there exists a one-to-one, linear mapping T of X onto Y. In such cases the mapping T is called an isomorphism of X onto Y.
Since an isomorphism T is one-to-one and onto, T is invertible, and by the “linearity of inverses” proposition in 6.3, T⁻¹ is linear.
Example. Consider X = R² and Y = {f(t) = a + bt on [0, 1] : a, b ∈ R}.
An isomorphism is f = T(x) ⟺ f(t) = a + bt where x = (a, b), with inverse T⁻¹(f) = (f(0), f(1) − f(0)).
Exercise. Any real n-dimensional vector space is isomorphic to Rⁿ (problem 2.14 p. 44) [3, p. 268]. However, they need not be topologically isomorphic [3, p. 270].
So far we have said nothing about norms. In normed spaces we can have a topological relationship too.
Definition. Normed spaces X and Y are called topologically isomorphic iff there exists a continuous linear transformation T of X onto Y having a continuous inverse T⁻¹. The mapping T is called a topological isomorphism.

Theorem. [3, p. 258] Two normed spaces X and Y are topologically isomorphic iff there exists a linear transformation T with domain X and range Y and positive constants m and M s.t. m ‖x‖_X ≤ ‖Tx‖_Y ≤ M ‖x‖_X, ∀x ∈ X.

Example. In the previous example, consider (X, ‖·‖∞) and (Y, ‖·‖₂). Then for the same T described in that example:
x = (a, b) ⇒ ‖T(x)‖²_Y = ∫₀¹ (a + bt)² dt = a² + ab + b²/3 ≤ |a|² + |a||b| + |b|²/3 ≤ (1 + 1 + 1/3) ‖x‖²∞ = (7/3) ‖x‖²∞.
Also ‖T(x)‖²_Y = a² + ab + b²/3 = (a + b/2)² + b²/12 = (a√3/2 + b/√3)² + a²/4 ≥ ‖x‖²∞ / 12.
So X and Y are topologically isomorphic for the given norms.

Exercise. (C[0, 1], ‖·‖∞) and (C[0, 1], ‖·‖₁) are not topologically isomorphic. Why? ??

Isometric spaces
Definition. Let X and Y be normed spaces. A mapping T : X → Y is called norm preserving iff ‖T(x)‖_Y = ‖x‖_X, ∀x ∈ X.
In particular, if T is norm preserving, then |||T||| = 1. What about the converse? ??

Proposition. If T is linear and norm preserving, then T is one-to-one, i.e., T(x) = T(z) ⇒ x = z.
Proof. If T(x) = y and T(z) = y, then by linearity T(x − z) = 0. So since T is norm preserving, ‖x − z‖ = ‖0‖ = 0, so x = z. □
Definition. If T : X → Y is both linear and norm preserving, then T is called a linear isometry.
If, in addition, T is onto Y, then X and Y are called isometrically isomorphic, and T is called an isometric isomorphism.
Remark. To illustrate why we require onto here, consider T : Eⁿ → ℓ₂ defined by T(x) = (x₁, ..., xₙ, 0, 0, ...). This T is linear, one-to-one, and norm preserving, but not onto.

Exercise. Every normed space X is isometrically isomorphic to a dense subset of a Banach space X̂. (problem 2.15 on p. 44)
Normed spaces that are isometrically isomorphic can, in some sense, be treated as being identical, i.e., they have identical properties. However, the specific isomorphism can be important sometimes.
Example. Consider Y = ℓ_p(N) = {(a₁, a₂, ...) : ∑_{i=1}^∞ |aᵢ|^p < ∞}
and X = ℓ_p(Z) = {(..., a₋₂, a₋₁, a₀, a₁, a₂, ...) : ∑_{i=−∞}^∞ |aᵢ|^p < ∞}. Define the mapping T : X → Y by y = (b₁, b₂, ...) = T(x) if bᵢ = a_{z(i)} where z(i) = (−1)^i ⌊i/2⌋. Note that z : {1, 2, 3, ...} → {0, 1, −1, 2, −2, ...}.
This mapping T is clearly an isometric isomorphism, so “ℓ_p(Z)” and “ℓ_p(N)” are isometrically isomorphic. Hence we only bother to work with ℓ_p = ℓ_p(N) since we know all algebraic and topological results will carry over to double-sided sequences.
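The reindexing map z is simple enough to check directly; this sketch enumerates its first values and confirms it hits distinct integers (i.e., it is injective on an initial segment):

```python
def z(i):
    """Reindexing map z(i) = (-1)^i * floor(i/2): enumerates Z as 0, 1, -1, 2, -2, ..."""
    return (-1) ** i * (i // 2)

first = [z(i) for i in range(1, 8)]
print(first)   # -> [0, 1, -1, 2, -2, 3, -3]
```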

Unitary equivalence in inner product spaces
Definition. Two inner product spaces X and Y are unitarily equivalent iff there is an isomorphism U : X → Y of X onto Y that preserves inner products: ⟨Ux₁, Ux₂⟩_Y = ⟨x₁, x₂⟩_X, ∀x₁, x₂ ∈ X. The mapping U is called a unitary operator.
Fact. If U is unitary, then U is norm preserving since ‖Ux‖ = √⟨Ux, Ux⟩ = √⟨x, x⟩ = ‖x‖. Clearly |||U||| = 1. Furthermore, since U is onto, U is an isometric isomorphism. Interestingly, the converse is also true [3, p. 332].

Theorem. A mapping U of X onto Y, where X and Y are inner product spaces, is an isometric isomorphism iff U is a unitary operator.

Proof. (⇐) was just argued above.
(⇒) Suppose U is an isometric isomorphism. Using the parallelogram law, the linearity of U, and the fact that ‖Ux‖ = ‖x‖, we have:
4⟨Ux, Uy⟩ = ‖Ux + Uy‖² − ‖Ux − Uy‖² + i ‖Ux + iUy‖² − i ‖Ux − iUy‖²
= ‖U(x + y)‖² − ‖U(x − y)‖² + i ‖U(x + iy)‖² − i ‖U(x − iy)‖²
= ‖x + y‖² − ‖x − y‖² + i ‖x + iy‖² − i ‖x − iy‖² = 4⟨x, y⟩.
Thus U an isometric isomorphism ⇒ U unitary. □
Remark. After defining adjoints we will show that U⁻¹ = U* in Hilbert spaces.
Exercise. Any complex n-dimensional inner product space is unitarily equivalent to Cⁿ [3, p. 332].
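The polarization identity used in this proof can be checked numerically; this sketch verifies 4⟨x, y⟩ = ‖x+y‖² − ‖x−y‖² + i‖x+iy‖² − i‖x−iy‖² for random complex vectors, with the convention ⟨x, y⟩ = ∑ xₖ ȳₖ (linear in the first argument):

```python
import random

random.seed(1)

def inner(x, y):
    """<x, y> = sum x_k conj(y_k); linear in the first argument."""
    return sum(xk * yk.conjugate() for xk, yk in zip(x, y))

def nrm2(x):
    """Squared norm ||x||^2."""
    return sum(abs(xk) ** 2 for xk in x)

x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]
y = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]

lhs = 4 * inner(x, y)
comb = lambda u, v, c: [uk + c * vk for uk, vk in zip(u, v)]
rhs = (nrm2(comb(x, y, 1)) - nrm2(comb(x, y, -1))
       + 1j * nrm2(comb(x, y, 1j)) - 1j * nrm2(comb(x, y, -1j)))
err = abs(lhs - rhs)
print(err)
```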

Every separable Hilbert space is unitarily equivalent with ℓ₂ or some Cⁿ [3, p. 339].
Example. Continue the previous f(t) = a + bt example, but now use Eⁿ and (Y, ⟨·, ·⟩₂). If g(t) = c + dt then
⟨Tx₁, Tx₂⟩ = ⟨f, g⟩ = ∫₀¹ f(t)g(t) dt = ∫₀¹ (a + bt)(c + dt) dt = ac + ad/2 + bc/2 + bd/3 = [a b] [1, 1/2; 1/2, 1/3] [c; d] = x₁′Gx₂,
so if we define U = T G^{−1/2}, then

⟨Ux₁, Ux₂⟩ = ⟨T G^{−1/2} x₁, T G^{−1/2} x₂⟩ = (G^{−1/2} x₁)′ G (G^{−1/2} x₂) = x₁′x₂ = ⟨x₁, x₂⟩.

Example. The Fourier transform F(ν) = ∫_{−∞}^∞ f(t) e^{−ı2πνt} dt is a unitary mapping of L₂[R] onto itself.
Example. Soon we will analyze the discrete-time Fourier transform (DTFT) operator, defined by

G = Fg ⟺ G(ω) = ∑_{n=−∞}^∞ gₙ e^{ıωn}.
We will show that F ∈ B(ℓ₂, L₂[−π, π]) and F is invertible. And Parseval’s relation from Fourier analysis is that
⟨(1/√2π) Fg, (1/√2π) Fh⟩ = (1/2π) ∫_{−π}^π G(ω) H*(ω) dω = ∑_{n=−∞}^∞ gₙ hₙ* = ⟨g, h⟩.
So ℓ₂ and L₂[−π, π] are unitarily equivalent, and the unitary operator needed is simply U = (1/√2π) F.
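Parseval's relation can be checked numerically for a finitely supported real sequence (the sequence g below is arbitrary); the sketch approximates (1/2π) ∫_{−π}^{π} |G(ω)|² dω by a Riemann sum, which for a trigonometric polynomial like this G is accurate to roundoff:

```python
import cmath
import math

g = [1.0, -2.0, 0.5, 3.0]    # g_0..g_3, zero elsewhere

def G(w):
    """DTFT G(w) = sum_n g_n exp(i w n)."""
    return sum(gn * cmath.exp(1j * w * n) for n, gn in enumerate(g))

K = 4000                     # midpoint Riemann sum over [-pi, pi]
ws = [-math.pi + 2 * math.pi * (k + 0.5) / K for k in range(K)]
lhs = sum(abs(G(w)) ** 2 for w in ws) / K        # = (1/2pi) * integral |G|^2 dw
rhs = sum(gn ** 2 for gn in g)                   # = sum |g_n|^2
print(lhs, rhs)
```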

The following “extension theorem” is useful in proving that every separable Hilbert space is isometrically isomorphic to `2.

Theorem. Let X be a normed space, Y a Banach space, and M ⊂ X and N ⊂ Y be subspaces. Suppose that
• cl(M) = X and cl(N) = Y,
• T̃ : M → N is a bounded linear operator.
Then there exists a unique bounded linear operator T : X → Y such that T|M = T̃. Moreover, |||T||| = |||T̃|||. If, in addition,
• X is a Banach space,
• T̃⁻¹ : N → M exists and is bounded,
then T is also onto Y.
Proof. Note: T|M reads “T restricted to M.” T|M = T̃ means T(x) = T̃(x) for all x ∈ M.
Claim 1. If {xₙ} ∈ M is Cauchy in M, then {T̃(xₙ)} is Cauchy in N.
‖T̃(xₙ) − T̃(xₘ)‖_Y = ‖T̃(xₙ − xₘ)‖_Y ≤ |||T̃||| ‖xₙ − xₘ‖_X → 0 as n, m → ∞.
Claim 2. For {xₙ}, {x̃ₙ} ∈ M, suppose xₙ → x ∈ X and x̃ₙ → x. Then {T̃(xₙ)} and {T̃(x̃ₙ)} both converge, and to the same limit.
{T̃(xₙ)} and {T̃(x̃ₙ)} are both Cauchy in N by Claim 1 since {xₙ} and {x̃ₙ} both converge. Since Y is complete, ∃y, ỹ ∈ Y s.t. T̃(xₙ) → y and T̃(x̃ₙ) → ỹ.
But ‖y − ỹ‖_Y = lim_{n→∞} ‖T̃(xₙ) − T̃(x̃ₙ)‖_Y = lim_{n→∞} ‖T̃(xₙ − x̃ₙ)‖_Y ≤ lim_{n→∞} |||T̃||| ‖xₙ − x̃ₙ‖_X = |||T̃||| ‖x − x‖_X = 0, using norm continuity, linearity of T̃, and boundedness of T̃.
Now we define T : X → Y as follows.
For x ∈ X = cl(M), ∃{xₙ} ∈ M such that xₙ → x, so we define T(x) ≜ lim_{n→∞} T̃(xₙ). By Claim 2, T is well-defined, and moreover, if x ∈ M, then T(x) = T̃(x), i.e., T|M = T̃.
Claim 3. |||T||| = |||T̃|||.
∀x ∈ X, T(x) = lim_{n→∞} T̃(xₙ) where xₙ ∈ M and xₙ → x.
Thus ‖T(x)‖_Y = lim_{n→∞} ‖T̃(xₙ)‖_Y ≤ lim_{n→∞} |||T̃||| ‖xₙ‖_X = |||T̃||| ‖x‖_X. Thus |||T||| ≤ |||T̃|||.
However, ∀x ∈ M, T(x) = T̃(x) ⇒ |||T||| ≥ |||T̃|||, since T is defined on X ⊇ M. Thus |||T||| = |||T̃|||.
Claim 4. T is linear, which is trivial to prove.
Claim 5. (uniqueness of T) If L₁, L₂ ∈ B(X, Y), then L₁|M = L₂|M ⇒ L₁ = L₂.
For x ∈ X = cl(M), ∃{xₙ} ∈ M s.t. xₙ → x. Thus, by the continuity of L₁ and L₂,
L₁(x) = lim_{n→∞} L₁(xₙ) = lim_{n→∞} L₂(xₙ) = L₂(x), since L₁(xₙ) = L₂(xₙ) by L₁|M = L₂|M.
Claim 6. If T̃⁻¹ : N → M exists and is bounded, and if X is a Banach space, then T is onto Y.
Let y ∈ Y. Since cl(N) = Y, ∃{yₙ} ∈ N s.t. yₙ → y.
By the same reasoning as in Claim 1, xₙ = T̃⁻¹(yₙ) is Cauchy in X.
Since X is a Banach space, ∃x ∈ X s.t. xₙ → x.
Thus T(x) = lim_{n→∞} T̃(xₙ) = lim_{n→∞} T̃(T̃⁻¹(yₙ)) = lim_{n→∞} yₙ = y.
Thus, ∀y ∈ Y, ∃x ∈ X s.t. T(x) = y. □

The following theorem is an important application of the previous theorem.

Theorem. Every separable Hilbert space H is isometrically isomorphic to ℓ₂.
Proof. H separable ⇒ ∃ a countable orthonormal basis {eᵢ}_{i=1}^∞. (Homework.) Let M = [{eᵢ}].
Of course ℓ₂ has a countable orthonormal basis {ẽᵢ}_{i=1}^∞, where ẽᵢⱼ = δ_{i−j} (Kronecker). Let N = [{ẽᵢ}].
Define T̃ : M → N to be the linear operator for which T̃(eᵢ) = ẽᵢ. (Exercise. Think about why “the” here—consider M.)
Then since any x ∈ M has form x = ∑_{i=1}^n cᵢeᵢ for some n ∈ N:
‖T̃(x)‖₂ = ‖T̃(∑_{i=1}^n cᵢeᵢ)‖₂ = ‖∑_{i=1}^n cᵢ T̃(eᵢ)‖₂ = ‖∑_{i=1}^n cᵢẽᵢ‖₂ = √(∑_{i=1}^n |cᵢ|²) = ‖∑_{i=1}^n cᵢeᵢ‖_H = ‖x‖, (6-1)
so |||T̃||| = 1 on M.
Clearly T̃⁻¹ exists on N and is given by T̃⁻¹(ẽᵢ) = eᵢ, so |||T̃⁻¹||| = 1 on N by a similar argument.
Since ℓ₂ is a Banach space, we have established all the conditions of the preceding theorem, so there exists a unique linear operator T from H to ℓ₂ that is onto ℓ₂ with |||T||| = |||T̃||| = 1.
Since T is bounded, it is continuous, so one can take limits as n → ∞ in (6-1) to show that ‖Tx‖ = ‖x‖, so T is norm preserving.

Thus T is an isometric isomorphism for H and ℓ₂, so H and ℓ₂ are isometrically isomorphic. □
Corollary. L₂ is isometrically isomorphic to ℓ₂.
A remarkable result. (Perhaps most of what is remarkable here is that L₂ is separable.)
A practical consequence: usually one can search for counterexamples in ℓ₂ (and its relatives) rather than L₂.
Example. The (different) spaces of odd and even functions in L₂[−π, π] are isometrically isomorphic.
Example. Elaborating. Let H = L₂[−π, π] and define
e_{2k} = cos(kt)/c_{2k},  e_{2k+1} = sin(kt)/c_{2k+1},  k = 0, 1, 2, ...
c_{2k} = (∫_{−π}^π cos²(kt) dt)^{1/2},  c_{2k+1} = (∫_{−π}^π sin²(kt) dt)^{1/2}
X = [{e_{2k}}] = {f ∈ L₂ : f even},  Y = [{e_{2k+1}}] = {f ∈ L₂ : f odd}.
Then H, X, and Y are each Banach spaces that are isometrically isomorphic to each other!
And each is isometrically isomorphic to ℓ₂.

Example. ℓ₂ is isometrically isomorphic to L₂[−π, π]. (Just use the DTFT.)
But an even stronger result holds than the above...

Every separable Hilbert space is unitarily equivalent with ℓ₂ or some Cⁿ [3, p. 339].

Summary
Insert 5.1–5.3 here!

6.5 Adjoints in Hilbert spaces
When performing optimization in inner product spaces, often we need the “transpose” of a particular linear operator. The term transpose only applies to matrices. The more general concept for linear operators is called the adjoint.
Luenberger presents adjoints in terms of general normed spaces. In my experience, adjoints most frequently arise in inner product spaces, so in the interest of time and simplicity these notes focus on that case. The treatment here generalizes somewhat the treatment in Naylor [3, p. 352].
Recall the following fact from linear algebra. If A : Cⁿ → Cᵐ is an m × n matrix, then

⟨Ax, y⟩_{Cᵐ} = y′Ax = (A′y)′x = ⟨x, A′y⟩_{Cⁿ}.
This section generalizes the above relationship to general Hilbert spaces.
Let A ∈ B(X, Y) where X and Y are Hilbert spaces. Let y ∈ Y be a fixed vector, and consider the following functional:
g_y : X → C, where g_y(x) ≜ ⟨Ax, y⟩_Y.
• g_y is clearly linear, since A is linear and ⟨·, y⟩_Y is linear.
• |g_y(x)| ≤ ‖Ax‖_Y ‖y‖_Y ≤ |||A||| ‖x‖_X ‖y‖_Y by Cauchy-Schwarz and since A is bounded.
Thus g_y is bounded, with |||g_y||| ≤ |||A||| ‖y‖_Y.
In other words, g_y ∈ X*.
Definition. By the Riesz representation theorem (here is where we use completeness), for each such y ∈ Y there exists a unique z = z_y ∈ X such that
g_y(x) = ⟨x, z_y⟩_X, ∀x ∈ X.
So we can define legitimately a mapping A* : Y → X, called the adjoint of A, by the relationship z_y = A*(y). The defining property of A* is then:
⟨Ax, y⟩_Y = ⟨x, A*(y)⟩_X, ∀x ∈ X, y ∈ Y.
(At this point we should write A*(y) rather than A*y since we have not yet shown A* is linear, though we will soon.)
Lemma. A* is the only mapping of Y to X that satisfies the preceding equality.
Proof. For any y ∈ Y, suppose ∀x ∈ X we have ⟨Ax, y⟩_Y = ⟨x, T₁(y)⟩_X = ⟨x, T₂(y)⟩_X.
Then 0 = ⟨x, T₁(y) − T₂(y)⟩_X so T₁(y) = T₂(y), ∀y ∈ Y. □
Exercise. Here are some simple facts about adjoints, all of which concur with those of Hermitian transpose in Euclidean space.
• ⟨αx, y⟩ = ⟨x, α*y⟩, so for A : X → X defined by Ax = αx we have A*y = α*y. So “reuse” of the asterisk is acceptable.
• I* = I, 0* = 0
• A** = A (see Thm below)
• (ST)* = T*S*
• (S + T)* = S* + T*
• (αA)* = α*A*
Note: these last two properties are unrelated to the question of whether A* is a linear operator! ??
Example. Consider X = L₂[0, 1], Y = C², and A : X → Y defined by
y = Ax ⟺ y₁ = [Ax]₁ = ∫₀¹ t x(t) dt,  y₂ = [Ax]₂ = ∫₀¹ t² x(t) dt.
We can guess that the adjoint of A is defined by x = A*y ⟺ x(t) = (A*y)(t) = y₁t + y₂t². This is verified easily:

⟨Ax, y⟩_Y = y₁*[Ax]₁ + y₂*[Ax]₂ = y₁* ∫₀¹ t x(t) dt + y₂* ∫₀¹ t² x(t) dt = ∫₀¹ x(t) (y₁t + y₂t²)* dt = ⟨x, A*y⟩_X.
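A discretized sketch of this verification (the test signal x and vector y are arbitrary, real-valued choices): both sides of ⟨Ax, y⟩ = ⟨x, A*y⟩ are approximated by midpoint Riemann sums on [0, 1], and they agree exactly because the double sums rearrange.

```python
N = 20000
ts = [(i + 0.5) / N for i in range(N)]   # midpoint grid on [0, 1]
dt = 1.0 / N
x = lambda t: 1.0 + t - t ** 3           # arbitrary test signal in L2[0,1]
y = (0.7, -1.3)                          # arbitrary vector in R^2

# [Ax]_1 = int t x(t) dt, [Ax]_2 = int t^2 x(t) dt
Ax = (sum(t * x(t) for t in ts) * dt,
      sum(t * t * x(t) for t in ts) * dt)
lhs = y[0] * Ax[0] + y[1] * Ax[1]                               # <Ax, y> in R^2
rhs = sum(x(t) * (y[0] * t + y[1] * t * t) for t in ts) * dt    # <x, A*y> in L2
print(lhs, rhs)
```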

Theorem. Suppose A ∈ B(X, Y) where X and Y are Hilbert spaces.
• The adjoint operator A* is linear and bounded, i.e., A* ∈ B(Y, X).
• |||A*||| = |||A|||
• (A*)* = A
Proof.
Claim 1. A* is linear.
⟨x, A*(αy + z)⟩_X = ⟨Ax, αy + z⟩_Y = α*⟨Ax, y⟩_Y + ⟨Ax, z⟩_Y = α*⟨x, A*(y)⟩_X + ⟨x, A*(z)⟩_X = ⟨x, αA*(y) + A*(z)⟩_X, which holds for all x ∈ X and y ∈ Y, so A*(αy + z) = αA*(y) + A*(z) by the usual Lemma once again.
Claim 2. A* is bounded and |||A*||| ≤ |||A|||.
‖A*y‖²_X = ⟨A*y, A*y⟩_X = ⟨AA*y, y⟩_Y ≤ ‖AA*y‖_Y ‖y‖_Y ≤ |||A||| ‖A*y‖_X ‖y‖_Y
so ‖A*y‖_X ≤ |||A||| ‖y‖_Y and thus |||A*||| ≤ |||A||| and hence A* ∈ B(Y, X).
Claim 3. A** = A.
Since we have shown A* ∈ B(Y, X), we can legitimately define the adjoint of A*, denoted A**, as the (bounded linear) operator that satisfies ⟨A*y, x⟩_X = ⟨y, A**x⟩_Y, ∀x ∈ X, y ∈ Y.
Since ⟨y, Ax⟩_Y = ⟨Ax, y⟩*_Y = ⟨x, A*y⟩*_X = ⟨A*y, x⟩_X = ⟨y, A**x⟩_Y, by the previous uniqueness arguments we see A** = A.
Claim 4. |||A*||| = |||A|||.
From Claim 2 with A*: |||A**||| ≤ |||A*|||, or equivalently: |||A||| ≤ |||A*|||. □
Corollary. Under the same conditions as above: |||A*A||| = |||AA*||| = |||A|||² = |||A*|||².
Proof. Recalling that |||ST||| ≤ |||S||| |||T||| we have by the preceding theorem:
|||A*A||| ≤ |||A*||| |||A||| = |||A|||² = |||A*|||²,
and ‖Ax‖²_Y = ⟨Ax, Ax⟩_Y = ⟨Ax, (A*)*x⟩_Y = ⟨A*Ax, x⟩_X ≤ ‖A*Ax‖ ‖x‖ ≤ |||A*A||| ‖x‖²
so |||A|||² ≤ |||A*A|||. Combining yields the equality |||A|||² = |||A*A|||. The rest is obvious using A** = A. □

Proposition. If A ∈ B(X, Y) is invertible, where X, Y are Hilbert spaces, then A* has an inverse and (A*)⁻¹ = (A⁻¹)*.
Proof.
Claim 1. A* : Y → X is one-to-one into X.
Consider y₁, y₂ ∈ Y with y₁ ≠ y₂, but suppose A*y₁ = A*y₂, so A*d = 0 where d = y₂ − y₁ ≠ 0.
Thus ⟨x, A*d⟩_X = 0, ∀x ∈ X and hence ⟨Ax, d⟩_Y = 0, ∀x ∈ X.
Since A is invertible, we can make the “change of variables” z = Ax and hence ⟨z, d⟩_Y = 0, ∀z ∈ Y.
But this implies d = 0, contradicting the supposition that d ≠ 0. So A* is one-to-one.
Claim 2. A* : Y → X is onto X.
By the Banach inverse theorem, since A ∈ B(X, Y) and A is invertible, A⁻¹ ∈ B(Y, X), so A⁻¹ has its own adjoint, (A⁻¹)*.
Pick any z ∈ X. Then for any x ∈ X,
⟨x, z⟩ = ⟨A⁻¹Ax, z⟩ = ⟨Ax, (A⁻¹)*z⟩ = ⟨x, A*(A⁻¹)*z⟩.
Thus z = A*[(A⁻¹)*z], showing that z is in the range of A*. Since z ∈ X was arbitrary, A* is onto X.
Claim 3. (A*)⁻¹ = (A⁻¹)*.
Since A* is one-to-one and onto X, A* is invertible.
Furthermore, z = A*[(A⁻¹)*z] ⇒ (A*)⁻¹z = (A⁻¹)*z, ∀z ∈ X. Thus (A*)⁻¹ = (A⁻¹)*. □
Remark. AB = I and BA = I ⇒ A, B invertible and A⁻¹ = B.
Remark. AB = I and A invertible ⇒ A⁻¹ = B.

Definition. A ∈ B(H, H), where H is a real Hilbert space, is called self-adjoint if A* = A.
Exercise. An orthogonal projection P_M : H → H, where M is a closed subspace in a Hilbert space H, is self-adjoint. ??
Exercise. Conversely, if P ∈ B(H, H) and P² = P and P* = P, then P is an orthogonal projection operator. (L6.16)
Definition. A self-adjoint bounded linear operator A on a Hilbert space H is positive semidefinite iff ⟨x, Ax⟩ ≥ 0, ∀x ∈ H.
Remark. It is easily shown that ⟨x, Ax⟩ is real when A is self-adjoint.

Example. When M is a Chebyshev subspace in an inner product space, is PM a self-adjoint operator? ??

Unitary operators
(Caution: the proof on [3, p. 358] is incomplete w.r.t. the “onto” aspects.)
We previously defined unitary operators; we now examine the adjoints of these.

Theorem. Suppose U ∈ B(X, Y) where X, Y are Hilbert spaces. Then the following are equivalent:
1. U is unitary, i.e., U is an isomorphism (linear, onto, and one-to-one) and ⟨Ux, Uz⟩ = ⟨x, z⟩, ∀x, z ∈ X,
2. U*U = I_X and UU* = I_Y,
3. U is invertible with U⁻¹ = U*.
Proof. (2 ⇒ 3) and (3 ⇒ 1) are obvious.
(1 ⇒ 2) If U is unitary then for all x, z ∈ X: ⟨x, z⟩ = ⟨Ux, Uz⟩ = ⟨U*Ux, z⟩, so U*Ux = x, ∀x ∈ X, so U*U = I_X.
For any y ∈ Y, since U is onto there exists an x ∈ X s.t. Ux = y. Thus UU*y = UU*Ux = UI_X x = Ux = y. Since y ∈ Y was arbitrary, UU* = I_Y. □
Remark. A corollary is that if U is unitary, then so is U*.
Remark. To see why we need both U*U = I and UU* = I above, consider U = [1; 0], for which U*U = I but U is not onto.
Example. Consider X = Y = ℓ₂ and the (linear) discrete-time convolution operator A : ℓ₂ → ℓ₂ defined by

z = Ax ⟺ zₙ = ∑_{k=−∞}^∞ h_{n−k} x_k, n ∈ Z,
where we assume that h ∈ ℓ₁, which is equivalent to BIBO stability. We showed previously that ‖Ax‖₂ ≤ ‖h‖₁ ‖x‖₂, so A is bounded with |||A||| ≤ ‖h‖₁, so A has an adjoint. (Later we will show |||A||| = ‖H‖∞ where H is the frequency response of h.)
Since A is bounded, it is legitimate to search for its adjoint:

⟨Ax, y⟩ = ∑_{n=−∞}^∞ yₙ* [∑_{k=−∞}^∞ x_k h_{n−k}] = ∑_{k=−∞}^∞ x_k [∑_{n=−∞}^∞ yₙ h*_{n−k}]* = ∑_{k=−∞}^∞ x_k [A*y]ₖ* = ⟨x, A*y⟩,
where the adjoint is

[A*y]ₖ = ∑_{n=−∞}^∞ h*_{n−k} yₙ, i.e., [A*y]ₙ = ∑_{k=−∞}^∞ h*_{k−n} y_k,
which is convolution with h*₋ₙ.
When is A self-adjoint? When h_l = h*₋_l, i.e., h is Hermitian symmetric.
When is A unitary? When hₙ ∗ h*₋ₙ = δ[n], i.e., when |H(ω)|² = 1.
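A finite-support sketch of this adjoint identity (the real sequences h, x, y below are arbitrary, zero outside the indices shown): it checks ⟨Ax, y⟩ = ⟨x, A*y⟩ for convolution zₙ = ∑ₖ h_{n−k} xₖ and the correlation-type adjoint [A*y]ₖ = ∑ₙ h_{n−k} yₙ (no conjugates needed for real h).

```python
N = 8
h = [0.5, -1.0, 2.0] + [0.0] * (N - 3)
x = [1.0, 2.0, 0.0, -1.0] + [0.0] * (N - 4)
y = [3.0, -0.5, 1.0, 0.0, 2.0] + [0.0] * (N - 5)

def at(v, i):
    """v_i with zero-padding outside 0..N-1."""
    return v[i] if 0 <= i < N else 0.0

# Ax: convolution, supported on 0..2N-2
Ax = [sum(at(h, n - k) * at(x, k) for k in range(N)) for n in range(2 * N)]
# A*y: [A*y]_k = sum_n h_{n-k} y_n  (real h, so no conjugate)
Asy = [sum(at(h, n - k) * at(y, n) for n in range(N)) for k in range(N)]

lhs = sum(Ax[n] * at(y, n) for n in range(2 * N))   # <Ax, y>
rhs = sum(x[k] * Asy[k] for k in range(N))          # <x, A*y>
print(lhs, rhs)
```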

6.6 Relations between the four spaces The following theorem relates the null spaces and range spaces of a linear operator and its adjoint. Remark. Luenberger uses the notation [R(A)] but this seems unnecessary since R(A) is a subspace.

Theorem. If A ∈ B(X, Y) where X, Y are Hilbert spaces, then
1. {R(A)}⊥ = N(A*),  2. cl(R(A)) = {N(A*)}⊥,
3. {R(A*)}⊥ = N(A),  4. cl(R(A*)) = {N(A)}⊥.
Proof.
Claim 1. {R(A)}⊥ = N(A*).
Pick z ∈ N(A*) and any y ∈ R(A), so y = Ax for some x ∈ X.
Now ⟨y, z⟩_Y = ⟨Ax, z⟩_Y = ⟨x, A*z⟩_X = ⟨x, 0⟩_X = 0.
Thus z ∈ N(A*) ⇒ z ∈ {R(A)}⊥ since y ∈ R(A) was arbitrary. So N(A*) ⊂ {R(A)}⊥.
Now pick y ∈ {R(A)}⊥. Then for all x ∈ X, 0 = ⟨Ax, y⟩_Y = ⟨x, A*y⟩_X. So A*y = 0, i.e., y ∈ N(A*).
Since y ∈ {R(A)}⊥ was arbitrary, {R(A)}⊥ ⊂ N(A*). Combining: {R(A)}⊥ = N(A*).
Claim 2. cl(R(A)) = {N(A*)}⊥.
Taking the orthogonal complement of part 1 yields {R(A)}⊥⊥ = {N(A*)}⊥.
Recall from proposition 3.4-1 that S⊥⊥ = cl([S]) when S is a subset of a Hilbert space. Since R(A) is a subspace, cl([R(A)]) = cl(R(A)).
Parts 3 and 4 follow by applying 1 and 2 to A* and using the fact that A** = A. □

Example. (To illustrate why we need closure in R(A) above. Modified from [4, p. 156].)

Consider the linear operator A : ℓ₂ → ℓ₂ defined by Ax = (x₁, x₂/√2, x₃/√3, ...). Clearly x ∈ ℓ₂ ⇒ Ax ∈ ℓ₂ and ‖Ax‖ ≤ ‖x‖, so A is bounded (and hence continuous). In fact |||A||| = 1 (consider x = e₁).
Clearly R(A) includes all finitely nonzero sequences, so cl(R(A)) = ℓ₂. However, y = (1, 1/2, 1/3, ...) ∉ R(A) (Why not?) yet y ∈ ℓ₂, so R(A) is not closed.
This problem never arises in finite-dimensional spaces!

Example. Consider A : ℓ₂ → ℓ₂ defined by the upsampling operator: Ax = (x₁, 0, x₂, 0, x₃, 0, ...).
It is easily shown that A is bounded and |||A||| = 1.
One can verify that the adjoint operation is downsampling: A*y = (y₁, y₃, y₅, ...).
Clearly R(A*) = ℓ₂ and N(A) = {0} = [R(A*)]⊥.
Furthermore, R(A) = [{e_{2i+1}}_{i=0}^∞] and N(A*) = [{e_{2i}}_{i=1}^∞], and these two spaces are orthogonal complements of one another.
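The upsampling/downsampling adjoint pair is easy to verify on finite sequences (the vectors below are arbitrary): ⟨Ax, y⟩ = ⟨x, A*y⟩, since the zeros inserted by A kill exactly the entries of y that A* discards.

```python
x = [1.0, -2.0, 3.0, 0.5]
y = [0.3, 1.0, -1.0, 2.0, 0.7, -0.2, 4.0, 1.5]

# A x = (x1, 0, x2, 0, ...): interleave zeros
Ax = []
for xi in x:
    Ax.extend([xi, 0.0])

# A* y = (y1, y3, y5, ...): keep every other entry (0-based indices 0, 2, 4, ...)
Asy = y[0::2]

lhs = sum(a * b for a, b in zip(Ax, y))    # <Ax, y>
rhs = sum(a * b for a, b in zip(x, Asy))   # <x, A*y>
print(lhs, rhs)
```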

Example. Consider A : L₂[R] → L₂[R] (which is complete [3, p. 589]) defined by the shifted filtering operator:

y = Ax ⟺ y(t) = (Ax)(t) = ∫_{−∞}^∞ sinc²(t − 3 − τ) x(τ) dτ,
where sinc(t) = sin(πt)/(πt), or unity if t = 0. The adjoint is

(A*y)(t) = ∫_{−∞}^∞ sinc²(t + 3 − τ) y(τ) dτ.
The nullspace of A consists of signals in L₂ whose spectrum is zero over the frequencies (−1/2, 1/2). The range of A is all signals in L₂ that are band-limited to that same range, so the orthogonal complement is the same as N(A*). (Picture)
Exercise. Why did I use sinc²(·) rather than sinc(·)? ??

6.7 Duality relations for convex cones (skip)

6.8 Geometric interpretation of adjoints (skip)
(Presented in terms of general adjoints.)
When A ∈ B(X, Y), with X and Y Hilbert spaces, consider the following hyperplanes:

V1 = {x ∈ X : ⟨x, A*y⟩_X = 1} for some y ∈ Y,  V2 = {y ∈ Y : ⟨Ax0, y⟩_Y = 1} for some x0 ∈ X.

Optimization in Hilbert spaces
Consider the problem of "solving" y = Ax, where y ∈ Y is given, x ∈ X is unknown, A ∈ B(X, Y) is given, and X and Y are Hilbert spaces. For any such y, there are three possibilities for x:
• a unique solution,
• no solution,
• multiple solutions.
If A is invertible, then there is a unique solution x = A^{−1}y, which is the least interesting case.
6.9 The normal equations
(No exact solutions, so we seek a minimum-norm, unconstrained approximation.)
We previously explored the normal equations in a setting where R(A) was finite-dimensional. Now we have the tools to generalize. The following theorem illustrates the fundamental role of adjoints in optimization.

Theorem. Let A ∈ B(X, Y), where X and Y are Hilbert spaces. For a fixed y ∈ Y, a vector x ∈ X minimizes ‖y − Ax‖_Y iff A*Ax = A*y.

Proof. Consider the subspace M = cl R(A). Then the minimization problem is equivalent to inf_{m∈M} ‖y − m‖.
By the pre-projection theorem, m⋆ ∈ M achieves the infimum iff y − m⋆ ⊥ M, i.e., y − m⋆ ∈ M^⊥ = [R(A)]^⊥ = N(A*), by a previous theorem. Thus 0 = A*(y − m⋆) = A*y − A*Ax⋆ for some x⋆ ∈ X. □
• There is no claim of existence here, since R(A) might not be closed.
• There is no claim of uniqueness of x⋆ here: although m⋆ will be unique, there may be multiple solutions to m⋆ = Ax.
• If a minimum-distance solution x⋆ exists and A*A is invertible, then the solution is unique and has the familiar form:
x⋆ = (A*A)^{−1} A*y.
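In finite dimensions the theorem reduces to the familiar matrix normal equations; a quick NumPy check (the random A and y are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Finite-dimensional check of the theorem: x minimizes ||y - Ax||
# iff A'Ax = A'y (A' is the matrix adjoint, i.e., transpose here).
A = rng.standard_normal((6, 3))
y = rng.standard_normal(6)

x_ne = np.linalg.solve(A.T @ A, A.T @ y)        # normal equations
x_ls, *_ = np.linalg.lstsq(A, y, rcond=None)    # direct least squares
assert np.allclose(x_ne, x_ls)

# The residual y - Ax* is orthogonal to R(A), i.e., A*(y - Ax*) = 0.
assert np.allclose(A.T @ (y - A @ x_ne), 0)
```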

Example. Find the minimum-norm approximation to y ∈ H of the form ŷ = Σ_{i=1}^n α_i x_i, where the x_i's are linearly independent vectors in H.
We know how to solve this from Ch. 3, but the operator notation provides a concise expression. Define the operator A ∈ B(ℂⁿ, H) by

Aα ≜ Σ_{i=1}^n α_i x_i, where α = (α1, ..., αn).

Note that ‖Aα‖ = √(α′Gα) ≤ √(λ_max(G)) ‖α‖, where G = A*A is the Gram matrix. Since the x_i's are linearly independent, G is Hermitian positive definite, so its eigenvalues are real and positive. So A is bounded and |||A||| = √(λ_max(G)).
Our goal is to minimize ‖y − Aα‖ over α ∈ ℂⁿ. By the preceding theorem, the optimal solution must satisfy A*Aα = A*y.
What is A* : H → ℂⁿ here? Recall we need ⟨Aα, y⟩_H = ⟨α, A*y⟩_{ℂⁿ} for all α ∈ ℂⁿ, so

⟨Aα, y⟩_H = ⟨Σ_{i=1}^n α_i x_i, y⟩_H = Σ_{i=1}^n α_i ⟨x_i, y⟩_H = Σ_{i=1}^n α_i ⟨y, x_i⟩*_H = Σ_{i=1}^n α_i [A*y]_i* = ⟨α, A*y⟩_{ℂⁿ},

where we see that [A*y]_i = ⟨y, x_i⟩_H and hence

A*y = (⟨y, x1⟩_H, ..., ⟨y, xn⟩_H).

Thus one can easily show that A*Aα = A*y is equivalent to the usual normal equations.
So no computational effort has been saved, but the notation is more concise. Furthermore, the notation (A*A)^{−1}A*y is comfortingly similar to the notation (A′A)^{−1}A′y that we use for the least-squares solution of linear systems of equations in Euclidean space. So, with everything defined appropriately, the generalization to arbitrary Hilbert spaces is very natural.
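A sketch of this Gram-matrix formulation with H = L2[0, 1], where the inner products are approximated by a fine Riemann sum (the basis t^i, the target e^t, and the grid are illustrative assumptions, not from the notes):

```python
import numpy as np

# Gram-matrix normal equations G alpha = A*y in (a discretization of)
# H = L2[0,1], with basis x_i(t) = t^i and target y(t) = exp(t).
t = np.linspace(0.0, 1.0, 20001)
dt = t[1] - t[0]
X = np.stack([t ** i for i in range(3)])        # basis functions 1, t, t^2
y = np.exp(t)

G = (X * dt) @ X.T                              # G_ij ~ <x_j, x_i> (Gram matrix)
b = (X * dt) @ y                                # [A*y]_i ~ <y, x_i>
alpha = np.linalg.solve(G, b)                   # normal equations
yhat = alpha @ X                                # best approximation in the span

# The residual is (numerically) orthogonal to each basis function.
assert np.allclose((X * dt) @ (y - yhat), 0, atol=1e-6)
```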

6.10 The dual problem (for minimum-norm solutions)
If y = Ax has multiple solutions, then in some contexts it is reasonable to choose the solution that minimizes some type of norm. However, the appropriate norm is not necessarily the norm induced by the inner product.
Exercise. Generalize Luenberger's treatment to weighted norms.

Theorem. Let X and Y be Hilbert spaces and A ∈ B(X, Y). Suppose y ∈ R(A) is given, i.e., y = Ax0 for some x0 ∈ X. Assume that R(A*) is closed in X. (This hypothesis fixes an ERROR in Luenberger p. 161.)
The unique vector x⋆ ∈ X having minimum norm and satisfying Ax = y is characterized by:

x⋆ ∈ {A*z : AA*z = y, z ∈ Y}.

Proof. Since x0 is one solution to Ax = y, the general solution has the form x = x0 − m, where m ∈ M ≜ N(A). In other words, we seek the minimum-norm vector in the linear variety

V = {x ∈ X : Ax = y} = {x0 − m : m ∈ M}.

Since A is continuous, N(A) is a closed subspace (homework). Thus V is a closed linear variety in a Hilbert space, and as such has a unique element x⋆ of minimum norm by the (generalized) projection theorem; that element is characterized by the two conditions x⋆ = x0 − m⋆ ⊥ M and x⋆ ∈ V.
Since R(A*) was assumed closed, by the previous "4-space" theorem we have M^⊥ = [N(A)]^⊥ = cl R(A*) = R(A*).
Thus x⋆ ⊥ M ⟹ x⋆ ∈ R(A*) ⟹ x⋆ = A*z for some z ∈ Y. (There may be more than one such z.)
Furthermore, x⋆ ∈ V ⟹ Ax⋆ = y ⟹ AA*z = y. (x⋆ is unique even if there are many such z values!) □

If AA* is invertible, then the minimum-norm solution has the form:

x⋆ = A*(AA*)^{−1} y.
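A quick numerical check of this formula for an underdetermined full-row-rank matrix (the random A and x0 are illustrative assumptions): x⋆ solves Ax = y, matches the Moore-Penrose solution, and has norm no larger than any other solution.

```python
import numpy as np

rng = np.random.default_rng(2)

# x* = A*(AA*)^{-1} y for a "fat" full-row-rank A, so y = A x0 is in R(A).
A = rng.standard_normal((3, 7))
x0 = rng.standard_normal(7)                    # some particular solution
y = A @ x0

x_star = A.T @ np.linalg.solve(A @ A.T, y)
assert np.allclose(A @ x_star, y)              # it is a solution
assert np.allclose(x_star, np.linalg.pinv(A) @ y)
# The minimum-norm property: no other solution is shorter.
assert np.linalg.norm(x_star) <= np.linalg.norm(x0) + 1e-12
```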

Example. X = ℝ³, Y = ℝ², A = [1 0 1; 0 1 1], y = (3, 3).
There are multiple solutions, including x1 = (0, 0, 3) and x2 = (3, 3, 0) and convex combinations thereof.
Here A* = A′, so AA* = [2 1; 1 2], [AA*]^{−1} = (1/3)[2 −1; −1 2], z = [AA*]^{−1}y = (1, 1), and x⋆ = A*z = (1, 1, 2).
Of course x⋆ has a smaller 2-norm than the other solutions above. However, x1 is more "sparse," which can be important in some applications.

Example. X = ℝ¹, Y = ℝ², A = [1; 1], y = (2, 2). Then, using MATLAB's pinv function, x̂ = 2.
In this case, AA* = [1 1; 1 1], so there are multiple solutions to AA*z = y, each of which leads to the same x̂ = A*z however!

Example. The "downsampling by averaging" operator A : ℓ2 → ℓ2 is defined by Ax = (x1/2 + x2/2, x3/2 + x4/2, ...).
One can show that this is bounded with |||A||| = 1/√2, since |||[1/2 1/2]||| = 1/√2.
The adjoint is A*y = (y1/2, y1/2, y2/2, y2/2, ...), so AA*z = (z1/2, z2/2, ...) = (1/2)z ⟹ AA* = (1/2)I. So z = 2y.
Thus x⋆ = A*z = 2A*y = (y1, y1, y2, y2, ...), which is a sensible solution in this application.
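The first and third examples above can be checked numerically (the averaging operator is truncated to length-8 signals, and the test signal yd is an arbitrary illustration):

```python
import numpy as np

# First example: A = [1 0 1; 0 1 1], y = (3, 3) gives x* = (1, 1, 2).
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
y = np.array([3.0, 3.0])
z = np.linalg.solve(A @ A.T, y)
x_star = A.T @ z
assert np.allclose(x_star, [1.0, 1.0, 2.0])

# "Downsampling by averaging" on length-8 signals: AA' = I/2, and
# x* = 2 A'y repeats each sample of y twice.
n = 8
Aavg = np.zeros((n // 2, n))
for i in range(n // 2):
    Aavg[i, 2 * i] = Aavg[i, 2 * i + 1] = 0.5
assert np.allclose(Aavg @ Aavg.T, 0.5 * np.eye(n // 2))
yd = np.arange(1.0, n // 2 + 1)                # y = (1, 2, 3, 4)
x_avg = 2.0 * Aavg.T @ yd
assert np.allclose(x_avg, np.repeat(yd, 2))    # (1,1,2,2,3,3,4,4)
assert np.allclose(Aavg @ x_avg, yd)           # it solves Ax = y
```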

6.11 Pseudo-inverse operators
This concept allows a more general treatment of finding "solutions" to Ax = y, regardless of how many such solutions exist.
Definition. Let X and Y be Hilbert spaces with A ∈ B(X, Y) and R(A) closed in Y. For any y ∈ Y, define the following linear variety:

V_y = {x1 ∈ X : ‖Ax1 − y‖_Y = min_{x∈X} ‖Ax − y‖_Y}.

Among all vectors x1 ∈ V_y, let x0 be the unique vector of minimum norm ‖·‖_X. The pseudo-inverse A+ of A is the operator mapping each y in Y into its corresponding x0. So A+ : Y → X.
Note: closure of R(A) usually arises from one of X or Y being finite-dimensional.
This definition is legitimate since min_{x∈X} ‖y − Ax‖ = min_{m∈M=R(A)} ‖y − m‖, where R(A) is assumed closed. By the projection theorem there is a unique ŷ ∈ M = R(A) of minimum distance to y. However, the linear variety V = {x ∈ X : ŷ = Ax} may nevertheless contain multiple points.
What about uniqueness of the vector having minimum norm?

The set {x1 ∈ X : Ax1 = ŷ} is a linear variety, a translate of N(A), which is closed. Why? ??
So by the Ch. 3 theorem on minimizing norms within a linear variety, x0 is unique. Thus A+ is well defined.
If A is invertible, then x0 = A^{−1}y will of course be the minimizer, in which case we have A+ = A^{−1}.

(Figure: y ∈ Y projects onto ŷ ∈ R(A); A+ maps y to x0 ∈ [N(A)]^⊥, the minimum-norm element of V = {x ∈ X : Ax = ŷ}, which is a translate of N(A).)
Often A+ is many-to-one since many y vectors will map to the same x0.
Geometric interpretation
(The above definition is algebraic.) Since N(A) is a closed subspace in the Hilbert space X, by the theorem on orthogonal complements we have

X = N(A) ⊕ [N(A)]^⊥.

Similarly, since we have assumed that R(A) is closed (and it is a subspace):

Y = R(A) ⊕ [R(A)]^⊥.

When restricted to the subspace [N(A)]^⊥, the operator A is a mapping from [N(A)]^⊥ to R(A) (of course). Between these spaces, A is one-to-one, due to the following lemma.
Thus A has a linear inverse on R(A) that maps each point in R(A) back into a point in [N(A)]^⊥. This inverse defines A+ on R(A). To define A+ on all of Y, define A+y = 0 for y ∈ [R(A)]^⊥.
One way to write this is: A+ = [A|_{[N(A)]^⊥}]^{−1} P_{R(A)}.
Lemma. Let A : X → Y be a linear operator on a Hilbert space X. If S ⊆ [N(A)]^⊥, then A is one-to-one on S.
Proof. Suppose As1 = As2, where s1, s2 ∈ S ⊆ [N(A)]^⊥. Since [N(A)]^⊥ is a subspace, s = s1 − s2 ∈ [N(A)]^⊥. By the linearity of A, we have As = 0, so s ∈ N(A). Thus s = 0 and hence s1 = s2. So A is one-to-one on S. □

(Figure: the four fundamental subspaces. A maps [N(A)]^⊥ one-to-one onto R(A), and A+ inverts it there; A maps N(A) to 0, and A+ maps [R(A)]^⊥ to 0.)

Example. Consider A = [0 1; 0 1]. Then

N(A) = {(a, 0) : a ∈ ℝ},  [N(A)]^⊥ = {(0, b) : b ∈ ℝ},
R(A) = {(b, b) : b ∈ ℝ},  [R(A)]^⊥ = {(a, −a) : a ∈ ℝ}.

Recall that
y = P_{[v]}x ⟺ y = ⟨x, v/‖v‖⟩ v/‖v‖,
so P_{R(A)} = (1/2)[1; 1][1 1] = (1/2)[1 1; 1 1].
And clearly [A|_{[N(A)]^⊥}]^{−1} = [0 0; 0 1] on R(A), so
A+ = [A|_{[N(A)]^⊥}]^{−1} P_{R(A)} = [0 0; 0 1] (1/2)[1 1; 1 1] = [0 0; 1/2 1/2].

Example.
(aI)+ = { (1/a)I, a ≠ 0; 0I, a = 0 } = a+I.
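The composition A+ = [A|_{[N(A)]^⊥}]^{−1} P_{R(A)} in this example can be checked against the numerical pseudo-inverse:

```python
import numpy as np

# A+ as (inverse of A restricted to N(A)^perp) composed with the
# orthogonal projector onto R(A), for A = [0 1; 0 1].
A = np.array([[0.0, 1.0],
              [0.0, 1.0]])
P_range = 0.5 * np.array([[1.0, 1.0],
                          [1.0, 1.0]])          # projector onto span{(1,1)}
A_restricted_inv = np.array([[0.0, 0.0],
                             [0.0, 1.0]])       # maps (b,b) back to (0,b)
A_plus = A_restricted_inv @ P_range

assert np.allclose(A_plus, [[0.0, 0.0], [0.5, 0.5]])
assert np.allclose(A_plus, np.linalg.pinv(A))   # matches Moore-Penrose
```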

Here are some algebraic properties of the pseudo-inverse.

Proposition. Let A ∈ B(X, Y) have closed range, with pseudo-inverse A+ : Y → X. Then
• A+ ∈ B(Y, X)
• (A+)+ = A
• A+AA+ = A+
• AA+A = A
• (A*)+ = (A+)*
• (A+A)* = A+A = A*(A+)*
• A+ = (A*A)+A* = A*(AA*)+
• A+ = (A*A)^{−1}A* if A*A is invertible
• A+ = A*(AA*)^{−1} if AA* is invertible
• In finite-dimensional spaces, a "simple" formula is given in terms of the SVD of A.
Proof. (Exercise)
L6.19:
A+ = lim_{ε→0+} [A*A + εI]^{−1}A* = lim_{ε→0+} A*[AA* + εI]^{−1},
where the limits represent convergence with respect to what norm? ??
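Several of these properties, including the regularized limit in L6.19, can be spot-checked numerically; the matrix below is built with known singular values (2, 1, 0.5, 0) so it is genuinely rank-deficient (an illustrative construction, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(3)

# Rank-deficient 5x4 matrix with singular values 2, 1, 0.5, 0.
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = U[:, :4] @ np.diag([2.0, 1.0, 0.5, 0.0]) @ V.T
Ap = np.linalg.pinv(A)

assert np.allclose(A @ Ap @ A, A)               # AA+A = A
assert np.allclose(Ap @ A @ Ap, Ap)             # A+AA+ = A+
assert np.allclose((Ap @ A).T, Ap @ A)          # (A+A)* = A+A
assert np.allclose(np.linalg.pinv(Ap), A)       # (A+)+ = A
assert np.allclose(Ap, np.linalg.pinv(A.T @ A) @ A.T)

# Regularized limit: A+ = lim_{eps->0+} (A'A + eps I)^{-1} A'.
eps = 1e-8
A_reg = np.linalg.solve(A.T @ A + eps * np.eye(4), A.T)
assert np.allclose(A_reg, Ap, atol=1e-5)
```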

One property that is missing is (CB)+ = B+C+: it does not hold in general, unlike with adjoints and inverses.
However, L 6.21 claims that if B is onto and C is one-to-one, then (CB)+ = B+C+.
Example.

C = [1 1], C* = [1; 1], C+ = C*(CC*)^{−1} = [1/2; 1/2],  B = [1; 0], B* = [1 0], B+ = (B*B)^{−1}B* = [1 0].

In this case, CB = 1, but B+C+ = 1/2.
Example. Something involving DTFT/filtering following DTFT analysis.
Example. Downsampling by averaging. (Handwritten notes.)
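The counterexample above, checked numerically:

```python
import numpy as np

# (CB)+ != B+C+ in general: here (CB)+ = 1 while B+C+ = 1/2.
C = np.array([[1.0, 1.0]])        # 1x2, not one-to-one
B = np.array([[1.0], [0.0]])      # 2x1, not onto
CB = C @ B                        # = [[1.0]]

assert np.allclose(np.linalg.pinv(CB), [[1.0]])
assert np.allclose(np.linalg.pinv(B) @ np.linalg.pinv(C), [[0.5]])
```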

Example. (p.165)

From regularization design in tomography, we form a LS approximation of the form:
f(φ) ≈ Σ_{k=0}^{3} α_k cos²(φ − πk/4).
But those cos² terms are linearly dependent!

More generally, if the x_i's may be linearly dependent, how do we work with

min_{α∈ℂⁿ} ‖y − Σ_{i=1}^n α_i x_i‖?

Define A : ℂⁿ → H by Aα = Σ_{i=1}^n α_i x_i, where y, x_i ∈ H. If the minimizing α is not unique, we could choose the one of minimum norm. Here R(A) is closed since it is a finite-dimensional subspace of H. So A+ exists and A+ = (A*A)+A*, where G = A*A is simply the n × n Gram matrix. So the minimum-norm LS solution is

α̂ = A+y = (A*A)+A*y.

Since G is Hermitian nonnegative definite, it has an orthogonal eigenvector decomposition G = QDQ′, and one can show that G+ = QD+Q′.
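A sketch of this recipe applied to the cos² example above, on a sample grid (the grid and the target f are illustrative assumptions): the Gram matrix has rank 3, and G+ = QD+Q′ yields the minimum-norm coefficients.

```python
import numpy as np

# Minimum-norm LS fit with the linearly dependent functions
# cos^2(phi - pi*k/4), k = 0..3, using G+ = Q D+ Q'.
phi = np.linspace(0.0, np.pi, 401)
X = np.stack([np.cos(phi - np.pi * k / 4) ** 2 for k in range(4)])
f = 1.0 + 0.5 * np.cos(2 * phi)                 # lies in span{1, cos2phi, sin2phi}

G = X @ X.T                                     # 4x4 Gram matrix
assert np.linalg.matrix_rank(G, tol=1e-6) == 3  # linearly dependent!

d, Q = np.linalg.eigh(G)                        # G = Q diag(d) Q'
d_plus = np.where(d > 1e-8 * d.max(), 1.0 / np.where(d > 0, d, 1.0), 0.0)
G_plus = Q @ np.diag(d_plus) @ Q.T
alpha = G_plus @ (X @ f)                        # alpha = G+ A* f

assert np.allclose(alpha, np.linalg.pinv(X.T) @ f, atol=1e-6)
assert np.allclose(alpha @ X, f, atol=1e-8)     # exact fit: f is in the span
```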

Analysis of the DTFT
Given a discrete-time signal g(n), i.e., g : ℤ → ℂ, the discrete-time Fourier transform or DTFT is "defined" in introductory signal processing books as follows:

G = F g ⟺ G(ω) = Σ_{n=−∞}^{∞} g(n) e^{−ıωn}.  (6-2)

This is an infinite series, so for a rigorous treatment we must find suitable normed spaces in which we can establish convergence. The natural family of norms is ℓ_p, for some 1 ≤ p ≤ ∞. Why? What about the doubly infinite sum? ??
The logical meaning of the above definition is really

G = F g ⟺ G = lim_{N→∞} F_N g, where G_N = F_N g ⟺ G_N(ω) ≜ Σ_{n=−N}^{N} g(n) e^{−ıωn}, N ∈ ℕ.  (6-3)

Alternatively, one might also try to show that F = lim_{N→∞} F_N, where the limit is with respect to the operator norm in B(ℓ_p, L_r[−π, π]) for some p and r, but this is in fact false! (See below.)
• Since G_N(ω) is only a finite sum, clearly it is always well defined.
• Furthermore, being a finite sum of complex exponentials, G_N(ω) is continuous in ω, and hence Lebesgue integrable on [−π, π].
So we could write F_N : ℝ^{2N+1} → L_1[−π, π], or perhaps more usefully F_N : ℓ_p → L_r[−π, π] for any 1 ≤ p, r ≤ ∞. To elaborate, note that by Hölder's inequality:

|G_N(ω)| ≤ Σ_{n=−N}^{N} |g(n)| = Σ_{n=−∞}^{∞} |g(n)| 1_{|n|≤N} ≤ ‖g‖_p ‖1_{|n|≤N}‖_q = ‖g‖_p (2N+1)^{1−1/p}.

Thus

‖F_N g‖_r = ‖G_N‖_r = (∫_{−π}^{π} |G_N(ω)|^r dω)^{1/r} ≤ (2π)^{1/r} ‖g‖_p (2N+1)^{1−1/p}.  (6-4)

Furthermore, for p = 1 the upper bound is achieved when g(n) = δ[n], so |||F_N|||_{1→r} = (2π)^{1/r}.
• Thus F_N ∈ B(ℓ_p, L_r[−π, π]) for any 1 ≤ p, r ≤ ∞, and |||F_N|||_{p→r} ≤ (2π)^{1/r} (2N+1)^{1−1/p}.
• Remark. f ∈ L_∞[a, b] ⟹ f ∈ L_r[a, b] if −∞ < a < b < ∞ and r ≥ 1.
But to make (6-2) rigorous we must have normed spaces in which the limit in (6-3) exists.

Non-convergence of the operators F_N
Note: treating F_N : ℓ_p → L_r[−π, π] and considering g0(n) = δ[n − (M+1)], we have for N > M:

|||F_N − F_M|||_{p→r} = sup_{g : ‖g‖_p ≤ 1} ‖(F_N − F_M)g‖_r ≥ ‖(F_N − F_M)g0‖_r = ‖e^{−ıω(M+1)} − 0‖_r = (2π)^{1/r}.

So {F_N} is not Cauchy (and hence not convergent) in B(ℓ_p, L_r[−π, π]), no matter what p or r values one chooses.
So we must analyze convergence of the spectra G_N = F_N g, rather than convergence of the operators F_N themselves.

ℓ1 analysis

Proposition. If g ∈ ℓ_1, then {F_N g} is Cauchy in L_r[−π, π] for any 1 ≤ r ≤ ∞.
Proof.
If g ∈ ℓ_1, then defining

I(N, M) ≜ {n ∈ ℤ : min(N, M) < |n| ≤ max(N, M)}  (6-5)

and G_N = F_N g, we have

|G_N(ω) − G_M(ω)| = |Σ_{n=−N}^{N} g(n)e^{−ıωn} − Σ_{n=−M}^{M} g(n)e^{−ıωn}| = |Σ_{n∈I(N,M)} g(n)e^{−ıωn}|
≤ Σ_{n∈I(N,M)} |g(n)| ≤ Σ_{|n|>min(N,M)} |g(n)| → 0 as N, M → ∞.  (6-6)

So for each ω ∈ ℝ, the sequence {G_N(ω)}_{N=1}^{∞} is Cauchy in ℂ, and hence convergent by the completeness of ℂ, provided g ∈ ℓ_1. Thus for each ω, {G_N(ω)} converges pointwise to some limit, call it G(ω), where (6-2) is shorthand for that limit.
Furthermore, when g ∈ ℓ_1:

‖G_N − G_M‖_∞ = sup_{|ω|≤π} |G_N(ω) − G_M(ω)| → 0 as N, M → ∞.

So the sequence of functions {G_N} is Cauchy in L_∞[−π, π], which is complete, so {G_N} converges to a limit G ∈ L_∞[−π, π].
More generally, using (6-6):

‖G_N − G_M‖_r^r = ∫_{−π}^{π} |G_N(ω) − G_M(ω)|^r dω ≤ 2π (Σ_{|n|>min(N,M)} |g(n)|)^r → 0 as N, M → ∞,

so {G_N} is Cauchy in L_r[−π, π] for any 1 ≤ r ≤ ∞. □

Thus, due to completeness, {G_N} converges to a limit G ∈ L_r[−π, π].
So we can define the DTFT operator F : ℓ_1 → L_r[−π, π] by

F g ≜ lim_{N→∞} F_N g.  (6-7)

Proposition. F ∈ B(ℓ_1, L_r[−π, π]) for any 1 ≤ r ≤ ∞, with |||F|||_{1→r} = (2π)^{1/r}.
Proof. Linearity of F follows from linearity of F_N. For g ∈ ℓ_1:

‖F g‖_r = ‖lim_{N→∞} F_N g‖_r = lim_{N→∞} ‖F_N g‖_r ≤ lim_{N→∞} |||F_N|||_{1→r} ‖g‖_1 = (2π)^{1/r} ‖g‖_1,

using (6-4). Equality is achieved when g(n) = δ[n]. □

ℓ2 analysis

Unfortunately, `1 analysis is a bit restrictive; the class of signals is not as broad as we might like, and for least-squares problems we would rather work in `2. This will allow us to apply Hilbert space methods.

However, if a signal is in `2, it is not necessarily in `1, so the above `1 analysis does not apply to many signals in `2. So we need a different approach.

Proposition. If g ∈ ℓ_2, then {F_N g} is Cauchy in L_2[−π, π].
Proof. If g ∈ ℓ_2, then using G_N = F_N g:

‖G_N − G_M‖_2^2 = ∫_{−π}^{π} |Σ_{n∈I(N,M)} g(n)e^{−ıωn}|² dω
= Σ_{n∈I(N,M)} Σ_{m∈I(N,M)} g(n)g*(m) ∫_{−π}^{π} e^{−ıω(n−m)} dω = Σ_{n∈I(N,M)} Σ_{m∈I(N,M)} g(n)g*(m) 2π 1_{n=m}
= 2π Σ_{n∈I(N,M)} |g(n)|² ≤ 2π Σ_{|n|>min(N,M)} |g(n)|² → 0 as N, M → ∞,

since g ∈ ℓ_2. Thus {F_N g} is Cauchy in L_2[−π, π]. □

Since L_2[−π, π] is complete, {G_N} is convergent (in the L_2 sense!) to some limit G ∈ L_2[−π, π], and the expression in (6-2) is again a reasonable "shorthand" for that limit, whatever it may be; now we can define F : ℓ_2 → L_2[−π, π] via (6-7). In other words,

‖G_N − G‖_2^2 = ∫_{−π}^{π} |G_N(ω) − G(ω)|² dω → 0.

This is often called mean square convergence.

Proposition. F ∈ B(ℓ_2, L_2[−π, π]) with |||F|||_{2→2} = √(2π).
Proof. Linearity of F is easily shown.
Since {G_N} is convergent it is bounded. In fact

‖G_N‖_2^2 = ∫_{−π}^{π} |Σ_{n=−N}^{N} g(n)e^{−ıωn}|² dω = Σ_{n=−N}^{N} Σ_{m=−N}^{N} g(n)g*(m) ∫_{−π}^{π} e^{−ıω(n−m)} dω = 2π Σ_{n=−N}^{N} |g(n)|² ≤ 2π ‖g‖_2^2,

so ‖F g‖_2 ≤ √(2π) ‖g‖_2 and hence |||F||| ≤ √(2π).
Furthermore, if we consider g(n) = δ[n], then G(ω) = 1. Thus ‖G‖_2^2 = ∫_{−π}^{π} 1 dω = 2π, which achieves the upper bound above. Hence |||F|||_{2→2} = √(2π). □
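The Parseval identity ‖F_N g‖₂² = 2π Σ|g(n)|² behind this proof can be checked numerically for a finitely supported g; the uniform frequency grid (on which the periodic rectangle rule is exact for trigonometric polynomials) is an implementation choice, not part of the notes.

```python
import numpy as np

# Parseval check: ||Fg||^2 over [-pi, pi) equals 2*pi*||g||^2 for a
# finitely supported g, using a uniform frequency grid.
rng = np.random.default_rng(4)
n = np.arange(-8, 9)
g = rng.standard_normal(n.size)

N = 4096
omega = -np.pi + 2.0 * np.pi * np.arange(N) / N
G = np.exp(-1j * np.outer(omega, n)) @ g        # G(w) = sum_n g(n) e^{-iwn}

lhs = np.sum(np.abs(G) ** 2) * (2.0 * np.pi / N)
rhs = 2.0 * np.pi * np.sum(g ** 2)
assert abs(lhs - rhs) < 1e-6 * rhs
```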

Adjoint
Since F ∈ B(ℓ_2, L_2[−π, π]) and both ℓ_2 and L_2[−π, π] are Hilbert spaces, F has an adjoint:

⟨F g, S⟩_{L_2[−π,π]} = ∫_{−π}^{π} (Σ_n g(n) e^{−ıωn}) S*(ω) dω = Σ_n g(n) (∫_{−π}^{π} S(ω) e^{ıωn} dω)* = ⟨g, F* S⟩_{ℓ_2},

where x = F* S ⟺ x(n) = ∫_{−π}^{π} S(ω) e^{ıωn} dω, ∀n ∈ ℤ.
So the adjoint is almost the same as the inverse DTFT (defined below).
And of course we know that F* ∈ B(L_2[−π, π], ℓ_2).
Range

To apply the Hilbert space methods, we would like R(F) to be closed in L_2[−π, π]. It suffices to show that F is onto L_2[−π, π], since of course L_2[−π, π] itself is closed.

Proposition. The DTFT F : ℓ_2 → L_2[−π, π] defined by (6-2) is onto L_2[−π, π].
Proof. Let {e_k, k ∈ ℤ} denote the family of functions e_k(ω) = (1/√(2π)) e^{−ıωk}.
Recall that {e_k} is a complete orthonormal basis for L_2[−π, π] [4, p. 62].
Thus, by Parseval's relation, we have for any G ∈ L_2[−π, π]:

G = Σ_k ⟨G, e_k⟩ e_k, i.e., G(ω) = Σ_k ⟨G, e_k⟩ (1/√(2π)) e^{−ıωk},

where Σ_k |⟨G, e_k⟩|² = ‖G‖_2^2.
Thus if we define g(k) = ⟨G, e_k⟩/√(2π), then g ∈ ℓ_2 and G = F g, so G ∈ R(F).
Since G ∈ L_2[−π, π] was arbitrary, we conclude R(F) = L_2[−π, π]. □

DTFT is (almost) unitary
One can show easily that

⟨F u, F v⟩_{L_2[−π,π]} = 2π ⟨u, v⟩_{ℓ_2}.

Thus (since F is linear and invertible and hence an isomorphism), the normalized DTFT U = F/√(2π) is unitary.
So L_2[−π, π] and ℓ_2 are unitarily equivalent.

Inverse DTFT
Define a "partial inverse DTFT" operator R_N : L_2[−π, π] → ℓ_2 by

g_N = R_N G ⟺ g_N(n) = (1/(2π)) ∫_{−π}^{π} G(ω) e^{ıωn} dω · 1_{|n|≤N} = (1/√(2π)) ⟨G, e_n⟩ 1_{|n|≤N}.

Proposition. If G ∈ L_2[−π, π], then {R_N G} is Cauchy in ℓ_2.
Proof.

‖R_N G − R_M G‖² = (1/(2π)) Σ_{k∈I(N,M)} |⟨G, e_k⟩|² ≤ (1/(2π)) Σ_{|k|>min(N,M)} |⟨G, e_k⟩|² → 0 as N, M → ∞. □

Since ℓ_2 is complete, {R_N G} converges to some limit g ∈ ℓ_2, and we define R G to be that limit: R G ≜ lim_{N→∞} R_N G.

Proposition. R ∈ B(L_2[−π, π], ℓ_2) and |||R||| = 1/√(2π).
Proof.

‖R_N G‖_2^2 = Σ_{|n|≤N} |(1/(2π)) ∫_{−π}^{π} G(ω) e^{ıωn} dω|² = (1/(2π)) Σ_{|k|≤N} |⟨G, e_k⟩|² ≤ (1/(2π)) Σ_k |⟨G, e_k⟩|² = (1/(2π)) ‖G‖_2^2.

So |||R_N||| ≤ 1/√(2π) and R_N ∈ B(L_2[−π, π], ℓ_2).
When G(ω) = 1 we have ‖R_N G‖_2 = ‖δ[n]‖_2 = 1 and ‖G‖_2^2 = 2π, so |||R_N||| = 1/√(2π).
Proof 2. ‖R G‖_2 = lim_{N→∞} ‖R_N G‖_2 ≤ (1/√(2π)) ‖G‖_2. Consider G = 1 to show equality. □

Proposition. R F = I_{ℓ_2} and F R = I_{L_2[−π,π]}, so F^{−1} = R, where I_H denotes the identity operator for Hilbert space H.
Proof. Exercise. □
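The identity R F = I can be verified numerically for a finitely supported g; the uniform frequency grid makes the discretized inversion integral exact here because the support of g is far smaller than the grid size (an implementation choice, not part of the notes).

```python
import numpy as np

# Check that (RF)g = g: synthesize G = Fg on a uniform grid, then
# recover g(n) = (1/2pi) * integral of G(w) e^{iwn} dw.
rng = np.random.default_rng(5)
n = np.arange(-6, 7)
g = rng.standard_normal(n.size)

N = 1024
omega = -np.pi + 2.0 * np.pi * np.arange(N) / N
G = np.exp(-1j * np.outer(omega, n)) @ g        # forward DTFT samples

# Periodic rectangle rule for (1/2pi) * integral G(w) e^{iwn} dw.
g_rec = (np.exp(1j * np.outer(n, omega)) @ G).real / N
assert np.allclose(g_rec, g, atol=1e-10)
```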

Convolution revisited
Using time-domain analysis, we showed previously that if h ∈ ℓ_1 and Ax = h ∗ x, then A ∈ B(ℓ_p, ℓ_p).
We have shown F ∈ B(ℓ_2, L_2[−π, π]) and F^{−1} ∈ B(L_2[−π, π], ℓ_2).
Consider the "band-limiting" linear operator D : L_2[−π, π] → L_2[−π, π] defined by

y = Dx ⟺ y(ω) = { x(ω), |ω| ≤ π/2; 0, otherwise }.

Clearly D ∈ B(L_2[−π, π], L_2[−π, π]) and in fact |||D||| = 1.
Now consider A ≜ F^{−1} D F. We previously showed in the analysis of the composition of operators that |||ST||| ≤ |||S||| |||T|||.
So A ∈ B(ℓ_2, ℓ_2) with |||A||| ≤ |||F^{−1}||| |||D||| |||F||| ≤ (1/√(2π)) · 1 · √(2π) = 1.
But this A represents an ideal lowpass filter, i.e., convolution with h(n) = (1/2) sinc(n/2). But this h ∉ ℓ_1.
Evidently the convolution operator, at least in ℓ_2, has a looser requirement than h ∈ ℓ_1.
In contrast, in ℓ_∞, h ∈ ℓ_1 is both necessary and sufficient for A_h ∈ B(ℓ_∞, ℓ_∞).
In ℓ_2, a necessary and sufficient condition is that the frequency response be bounded.

Proposition. A_h ∈ B(ℓ_2, ℓ_2) (with |||A_h||| = ‖H‖_∞) ⟺ ‖F h‖_∞ < ∞, i.e., F h ∈ L_∞[−π, π].
Proof. (⟸) Suppose ‖F h‖_∞ is finite. Then using the convolution property of the DTFT and Parseval:

‖h ∗ x‖_2^2 = (1/(2π)) ‖HX‖_2^2 = (1/(2π)) ∫_{−π}^{π} |H(ω)X(ω)|² dω ≤ ‖H‖_∞^2 (1/(2π)) ‖X‖_2^2 = ‖H‖_∞^2 ‖x‖_2^2,

so |||A_h||| ≤ ‖F h‖_∞ = ‖H‖_∞. The upper bound is achieved when h(n) = δ[n].
(⟹) If H is unbounded, then for all T there exists an interval over which |H| ≥ T. Choose x to be a signal whose spectrum is an indicator function on that interval; then ‖x ∗ h‖_2 ≥ T ‖x‖_2, so A_h would be unbounded. Take the contrapositive. □

Continuous-time case: Fourier transform, convolution, Young's inequality.

1. P. Enflo. A counterexample to the approximation problem in Banach spaces. Acta Math., 130:309–17, 1973.
2. I. J. Maddox. Elements of functional analysis. Cambridge, 2nd edition, 1988.
3. A. W. Naylor and G. R. Sell. Linear operator theory in engineering and science. Springer-Verlag, New York, 2nd edition, 1982.
4. D. G. Luenberger. Optimization by vector space methods. Wiley, New York, 1969.
5. J. Schauder. Zur Theorie stetiger Abbildungen in Funktionenräumen. Math. Zeitschr., 26:47–65, 1927.
6. L. Grafakos. Classical and modern Fourier analysis. Pearson, NJ, 2004.
7. P. P. Vaidyanathan. Generalizations of the sampling theorem: Seven decades after Nyquist. IEEE Tr. Circ. Sys. I, Fundamental theory and applications, 48(9):1094–109, September 2001.
8. A. M. Ostrowski. Solution of equations in Euclidean and Banach spaces. Academic, 3rd edition, 1973.
9. R. R. Meyer. Sufficient conditions for the convergence of monotonic mathematical programming algorithms. J. Comput. System. Sci., 12(1):108–21, 1976.
10. M. Rosenlicht. Introduction to analysis. Dover, New York, 1985.
11. A. R. De Pierro. On the relation between the ISRA and the EM algorithm for positron emission tomography. IEEE Tr. Med. Imag., 12(2):328–33, June 1993.
12. A. R. De Pierro. On the convergence of the iterative image space reconstruction algorithm for volume ECT. IEEE Tr. Med. Imag., 6(2):174–175, June 1987.
13. A. R. De Pierro. Unified approach to regularized maximum likelihood estimation in computed tomography. In Proc. SPIE 3171, Comp. Exper. and Num. Meth. for Solving Ill-Posed Inv. Imaging Problems: Med. and Nonmed. Appl., pages 218–23, 1997.
14. J. A. Fessler. Grouped coordinate descent algorithms for robust edge-preserving image restoration. In Proc. SPIE 3170, Im. Recon. and Restor. II, pages 184–94, 1997.
15. A. R. De Pierro. A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography. IEEE Tr. Med. Imag., 14(1):132–137, March 1995.
16. J. A. Fessler and A. O. Hero. Penalized maximum-likelihood image reconstruction using space-alternating generalized EM algorithms. IEEE Tr. Im. Proc., 4(10):1417–29, October 1995.
17. M. W. Jacobson and J. A. Fessler. Properties of MM algorithms on convex feasible sets. SIAM J. Optim., 2003. Submitted. # 061996.
18. P. L. Combettes and H. J. Trussell. Method of successive projections for finding a common point of sets in metric spaces. J. Optim. Theory Appl., 67(3):487–507, December 1990.
19. F. Deutsch. The convexity of Chebyshev sets in Hilbert space. In Th. M. Rassias, H. M. Srivastava, and A. Yanushauskas, editors, Topics in polynomials of one and several variables and their applications, pages 143–50. World Sci. Publishing, River Edge, NJ, 1993.
20. M. Jiang. On Johnson's example of a nonconvex Chebyshev set. J. Approx. Theory, 74(2):152–8, August 1993.
21. V. S. Balaganskii and L. P. Vlasov. The problem of convexity of Chebyshev sets. Russian Mathematical Surveys, 51(6):1127–90, November 1996.
22. V. Kanellopoulos. On the convexity of the weakly compact Chebyshev sets in Banach spaces. Israel Journal of Mathematics, 117:61–9, 2000.
23. A. R. Alimov. On the structure of the complements of Chebyshev sets. Functional Analysis and Its Applications, 35(3):176–82, July 2001.
24. Y. Bresler, S. Basu, and C. Couvreur. Hilbert spaces and least squares methods for signal processing, 2000. Draft.
25. M. Vetterli and J. Kovacevic. Wavelets and subband coding. Prentice-Hall, New York, 1995.
26. D. C. Youla and H. Webb. Image restoration by the method of convex projections: Part I—Theory. IEEE Tr. Med. Imag., 1(2):81–94, October 1982.
27. M. Unser and T. Blu. Generalized smoothing splines and the optimal discretization of the Wiener filter. IEEE Tr. Sig. Proc., 2004. In press.