
Chapter 3

Hilbert spaces

Contents

Introduction
Inner products
Cauchy-Schwarz inequality
Induced norm
Hilbert space
Minimum norm optimization problems
Chebyshev sets
Projectors
Orthogonality
Orthogonal complements
Orthogonal projection
Direct sum
Orthogonal sets
Gram-Schmidt procedure
Approximation
Normal equations
Gram matrices
Orthogonal bases
Approximation and Fourier series
Linear varieties
Dual approximation problem
Applications
Fourier series
Complete orthonormal sequences / countable orthonormal bases
Bases
Vector spaces
Normed spaces
Inner product spaces
Summary
Minimum distance to a convex set
Projection onto convex sets
Summary


Key missing geometrical concepts: angle and orthogonality ("right angles").

3.1 Introduction

We now turn to the subset of normed spaces called Hilbert spaces, which must have an inner product. These are particularly useful spaces in applications/analysis. Why not introduce Hilbert spaces first then? For generality: it is helpful to see which properties are general to vector spaces, or to normed spaces, vs which require additional assumptions like an inner product.

Overview
• inner product
• orthogonality
• orthogonal projections
• applications
• least-squares minimization
• orthonormalization of a basis
• General forms of things you have seen before: Cauchy-Schwarz, Gram-Schmidt, Parseval's theorem

3.2 Inner products

Definition. A pre-Hilbert space, aka an inner product space, is a vector space X defined on the field F = ℝ or F = ℂ, along with an inner product ⟨·,·⟩ : X × X → F, which must satisfy the following axioms ∀x, y ∈ X, α ∈ F.
1. ⟨x, y⟩ = ⟨y, x⟩* (Hermitian symmetry), where * denotes complex conjugate.
2. ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩ (additivity)
3. ⟨αx, y⟩ = α ⟨x, y⟩ (scaling)
4. ⟨x, x⟩ ≥ 0 and ⟨x, x⟩ = 0 iff x = 0. (positive definite)

Properties of inner products

Bilinearity property:

⟨ Σᵢ αᵢ xᵢ , Σⱼ βⱼ yⱼ ⟩ = Σᵢ Σⱼ αᵢ βⱼ* ⟨xᵢ, yⱼ⟩.

Lemma. In an inner product space, if ⟨x, y⟩ = 0 for all y, then x = 0.
Proof. Let y = x. □

Cauchy-Schwarz inequality

Lemma. For all x, y in an inner product space,
|⟨x, y⟩| ≤ √⟨x, x⟩ √⟨y, y⟩ = ‖x‖ ‖y‖ (see induced norm below),
with equality iff x and y are linearly dependent.
Proof. For any λ ∈ F, the positive definiteness of ⟨·,·⟩ ensures that
0 ≤ ⟨x − λy, x − λy⟩ = ⟨x, x⟩ − λ ⟨y, x⟩ − λ* ⟨x, y⟩ + |λ|² ⟨y, y⟩.
If y = 0, the inequality holds trivially. Otherwise, consider λ = ⟨x, y⟩ / ⟨y, y⟩ and we have
0 ≤ ⟨x, x⟩ − |⟨y, x⟩|² / ⟨y, y⟩.
Rearranging yields |⟨y, x⟩| ≤ √(⟨x, x⟩ ⟨y, y⟩) = ‖x‖ ‖y‖. The proof about the equality conditions is Problem 3.1. □
This result generalizes all the "Cauchy-Schwarz inequalities" you have seen in previous classes, e.g., vectors in ℝⁿ, random variables, discrete- and continuous-time signals, each of which corresponds to a particular inner product space.

Angle

Thanks to this inequality, we can generalize the notion of the angle between vectors to any general inner product space as follows:
θ = cos⁻¹( |⟨x, y⟩| / (‖x‖ ‖y‖) ), ∀x, y ≠ 0.
This definition is legitimate since the argument of cos⁻¹ will always be between 0 and 1 due to the Cauchy-Schwarz inequality.
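As a quick illustration (not from the notes), the following Python sketch checks the Cauchy-Schwarz inequality and computes this angle for the Euclidean inner product on ℝⁿ; the variable names are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)

inner = np.dot(x, y)                                  # <x, y>
norm_x, norm_y = np.linalg.norm(x), np.linalg.norm(y)

assert abs(inner) <= norm_x * norm_y + 1e-12          # Cauchy-Schwarz
theta = np.arccos(abs(inner) / (norm_x * norm_y))     # generalized angle, in [0, pi/2]
print(theta)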

Induced norm

Proposition. In an inner product space (X, ⟨·,·⟩), the induced norm ‖x‖ = √⟨x, x⟩ is indeed a norm.
Proof. What must we show?
• The first axiom ensures that ⟨x, x⟩ is real.
• ‖x‖ ≥ 0 with equality iff x = 0 follows from Axiom 4.
• ‖αx‖ = √⟨αx, αx⟩ = √(α ⟨x, αx⟩) = √(α ⟨αx, x⟩*) = √(α α* ⟨x, x⟩*) = |α| √⟨x, x⟩ = |α| ‖x‖, using Axioms 1 and 3.
• The only condition remaining to be verified is the triangle inequality:
‖x + y‖² = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩ = ‖x‖² + 2 real(⟨x, y⟩) + ‖y‖² ≤ ‖x‖² + 2 |⟨x, y⟩| + ‖y‖² ≤ ‖x‖² + 2 ‖x‖ ‖y‖ + ‖y‖² = (‖x‖ + ‖y‖)². □
(Recall if z = a + ıb, then a = real(z) ≤ √(a² + b²) = |z|.)
Any inner product space is necessarily a normed space. Is the reverse true? Not in general. The following property distinguishes inner product spaces from mere normed spaces.
Lemma. (The parallelogram law.) In an inner product space:

‖x + y‖² + ‖x − y‖² = 2 ‖x‖² + 2 ‖y‖², ∀x, y ∈ X. (3-1)
Proof. Expand the norms into inner products and simplify. □

(Figure: a parallelogram with sides x and y and diagonals x + y and x − y.)

Remarkably, the converse of this Lemma also holds (see, e.g., problem [2, p. 175]).

Proposition. If (X, ‖·‖) is a normed space over ℂ or ℝ, and its norm satisfies the parallelogram law (3-1), then X is also an inner product space, with inner product:
⟨x, y⟩ = ¼ ( ‖x + y‖² − ‖x − y‖² + i ‖x + iy‖² − i ‖x − iy‖² ).
Proof. homework challenge problem.

Continuity of inner products
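The next snippet is a small numerical sanity check (mine, not the notes'): it verifies the parallelogram law and recovers the inner product ⟨x, y⟩ = Σᵢ xᵢ yᵢ* on ℂⁿ from the norm via the polarization identity above.

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(4) + 1j * rng.standard_normal(4)

def norm(v):
    return np.sqrt(np.vdot(v, v).real)   # induced norm

# parallelogram law (3-1)
assert np.isclose(norm(x + y)**2 + norm(x - y)**2, 2 * norm(x)**2 + 2 * norm(y)**2)

# polarization identity, with the conjugate on the second argument (per Axiom 3)
inner = 0.25 * (norm(x + y)**2 - norm(x - y)**2
                + 1j * norm(x + 1j * y)**2 - 1j * norm(x - 1j * y)**2)
assert np.isclose(inner, np.sum(x * np.conj(y)))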

Lemma. In an inner product space (X, ⟨·,·⟩), if xₙ → x and yₙ → y, then ⟨xₙ, yₙ⟩ → ⟨x, y⟩.
Proof.
|⟨xₙ, yₙ⟩ − ⟨x, y⟩| = |⟨xₙ, yₙ⟩ − ⟨x, yₙ⟩ + ⟨x, yₙ⟩ − ⟨x, y⟩| ≤ |⟨xₙ, yₙ⟩ − ⟨x, yₙ⟩| + |⟨x, yₙ⟩ − ⟨x, y⟩|
= |⟨xₙ − x, yₙ⟩| + |⟨x, yₙ − y⟩|
≤ ‖xₙ − x‖ ‖yₙ‖ + ‖x‖ ‖yₙ − y‖ by Cauchy-Schwarz
≤ ‖xₙ − x‖ M + ‖x‖ ‖yₙ − y‖ since {yₙ} is convergent and hence bounded
→ 0 as n → ∞.
Thus ⟨xₙ, yₙ⟩ → ⟨x, y⟩. □

Examples

Many of the normed spaces we considered previously are actually induced by suitable inner products.
Example. In Euclidean n-space, the usual inner product (aka "dot product") is
⟨x, y⟩ = Σ_{i=1}^{n} aᵢ bᵢ, where x = (a₁, ..., aₙ) and y = (b₁, ..., bₙ).
Verifying the axioms is trivial. The induced norm is the usual ℓ₂ norm.
Example. For the space ℓ₂ over the complex field, the usual inner product is¹ ⟨x, y⟩ = Σᵢ aᵢ bᵢ*.
The Hölder inequality, which is equivalent to the Cauchy-Schwarz inequality for this space, ensures that |⟨x, y⟩| ≤ ‖x‖₂ ‖y‖₂. So the inner product is indeed finite for x, y ∈ ℓ₂. Thus ℓ₂ is not only a Banach space, it is also an inner product space.
Example. What about ℓ_p for p ≠ 2? Do suitable inner products exist?
Consider X = ℝ² with ‖·‖_p, and take x = (1, 0) and y = (0, 1). The parallelogram law holds (for this x and y) iff 2 (1 + 1)^{2/p} = 2·1² + 2·1², i.e., iff 2^{2/p} = 2.
Thus ℓ₂ is the only inner product space in the ℓ_p family of normed spaces.
Example. The space of measurable functions on [a, b] with inner product

⟨f, g⟩ = ∫_a^b w(t) f(t) g*(t) dt,
where w(t) > 0, ∀t, is some (real) weighting function. Choosing w = 1 yields L₂[a, b].

Hilbert space

Definition. A complete inner product space is called a Hilbert space.
In other words, a Hilbert space is a Banach space along with an inner product that induces its norm. The addition of the inner product opens many analytical doors, as we shall see. The concept "complete" is appropriate here since any inner product space is a normed space.
All of the preceding examples of inner product spaces were complete vector spaces (under the induced norm).
Example. The following is an inner product space, but not a Hilbert space, since it is incomplete:

R₂[a, b] = { f : [a, b] → ℝ : the Riemann integral ∫_a^b f²(t) dt < ∞ },
with inner product (easily verified to satisfy the axioms): ⟨f, g⟩ = ∫_a^b f(t) g(t) dt.

¹Note that the conjugate goes with the second argument because of Axiom 3. I have heard that some treatments scale the second argument in Axiom 3, which affects where the conjugates go in the inner products.

Minimum norm optimization problems

Section 3.3 is called "the projection theorem" and it is about a certain type of minimum norm problem. Before focusing on that specific minimum norm problem, we consider the broad family of such problems.
Consider the following problem, which arises in many applications such as approximation problems:
Given x in a normed space (X, ‖·‖), and a subset S in X, find "the" vector s ∈ S that minimizes ‖x − s‖.
Example. Control subject to constraint. See Section 3.11.
Example. Least-squares estimation: min_θ ‖y − Σ_{i=1}^{n} θᵢ xᵢ‖ is equivalent to min_{m ∈ [{x₁,...,xₙ}]} ‖y − m‖.
What questions should we ask about such problems?
• Is there any best s? I.e., does there exist s⋆ ∈ S s.t. ‖x − s⋆‖ = d(x, S)? What answers do we have so far for this question? ??
• If so, is s⋆ unique? Answers thus far? ??
• How is s⋆ characterized? (Better yet would be an explicit formula for s⋆.)

Chebyshev sets

One way to address such questions is "answer by definition."
Definition. A set S in a normed space (X, ‖·‖) is called proximinal [16] iff ∀x ∈ X, there exists at least one point s ∈ S s.t. ‖x − s‖ = d(x, S).
Definition. In a normed space, a set S is called a Chebyshev set iff ∀x ∈ X, there exists a unique s ∈ S s.t. ‖x − s‖ = d(x, S).
Fact. Any proximinal set is closed. (The points in S̄ − S do not have a closest point in S.)
Fact. Any Chebyshev set is a proximinal set.
Fact. Any compact set is a proximinal set (due to the Weierstrass theorem).
Note that we have not said anything about inner products here, so why not study minimum norm problems in detail in Banach spaces? The answer is due to one of the most famous unsolved problems in analysis: the characterization of Chebyshev sets in general Banach spaces and in infinite-dimensional Hilbert spaces. What is known includes the following. (See the Deutsch paper [17], a scanned version of which is available on the course web page.)
• Inner product space: In finite-dimensional Hilbert spaces, any Chebyshev set is closed, convex, and nonempty. "Conversely," in any inner product space, any complete and convex set is Chebyshev. (We will prove this later in 3.12.)
• Normed space: Are all Chebyshev sets convex? In general: no. A nonconvex Chebyshev set (in an incomplete infinite-dimensional normed space within the space of finite-length sequences) is given in [18]. In [3, p. 285], an example is given in a Banach space of a closed (and thus complete) subspace (hence convex) that is not a Chebyshev set.
There is continued effort to characterize Chebyshev sets, e.g., [19–21]. Since the characterization of Chebyshev sets is unsolved in normed spaces, we focus primarily on closed convex sets in inner product spaces hereafter.
However, this fact is encouraging: if S is a nonempty closed subset of ℝⁿ, then { x ∈ ℝⁿ : arg min_{y∈S} ‖x − y‖ is nonunique } has Lebesgue measure zero [16, p. 493].
Are all complete and bounded sets also proximinal? The answer is yes in finite-dimensional normed spaces, since there all closed and bounded sets are compact. But the answer is "not necessarily" in infinite-dimensional normed spaces, even in a Hilbert space in fact.

Example. Let S = { (1 + 1/n) eₙ : n ∈ ℕ } ⊂ ℓ_p. Then S is bounded, and is complete since there are no "non-trivial" Cauchy sequences in S. Since d(0, (1 + 1/n) eₙ) = 1 + 1/n, we have d(0, S) = 1, yet there is no s ∈ S for which ‖s − 0‖ = 1.

(Venn diagram: among sets in a normed space, compact ⊆ proximinal ⊆ closed; complete sets are also shown inside the closed sets.)

Projectors

If S is a Chebyshev set in a normed space (X, ‖·‖), then we can define a projector P : X → S that, for each point x ∈ X, gives the closest point P(x) ∈ S. In other words, for a Chebyshev set S, we can legitimately define
P(x) = arg min_{s∈S} ‖x − s‖,
and "arg min" is well defined since there exists a unique minimizer when S is Chebyshev.

When needed for clarity, we will write P_S rather than just P. Such a projector satisfies the following properties.
• P(x) ∈ S, ∀x ∈ X
• ‖x − P(x)‖ ≤ ‖x − y‖, ∀y ∈ S, i.e., ‖x − P(x)‖ = d(x, S)
• P(P(x)) = P(x), or more concisely: P² = P.
As noted above, closedness of S is a necessary condition for existence of a projector defined on all of X.
Example. Consider X = ℝ and S = [0, ∞). This is a Chebyshev set with P(x) = max{x, 0}.
Example. Consider X = ℝ² and the (compact) set K = { y ∈ ℝ² : ‖y‖ = 1 } (the unit circle). There is not a unique minimizer of the distance to x = 0, the center of the unit circle.
This is why there is not a plethora of papers on "projections onto compact sets" (POKS?) methods.

3.3 Orthogonality

Definition. In an inner product space, two vectors x, y are called orthogonal iff ⟨x, y⟩ = 0, which is denoted x ⊥ y. (This is consistent with the earlier cos⁻¹ definition of angle.)
Definition. A vector x is called orthogonal to a set S iff ∀s ∈ S, x ⊥ s, in which case we write x ⊥ S.
Definition. Two sets S and T in an inner product space (X, ⟨·,·⟩) are called orthogonal, written S ⊥ T, iff
⟨s, t⟩ = 0, ∀s ∈ S, ∀t ∈ T.

Exercise. Show x ⊥ S = [{y₁, ..., yₙ}] iff x ⊥ yᵢ, i = 1, ..., n.
Lemma. (Pythagorean theorem) x ⊥ y ⟹ ‖x + y‖² = ‖x‖² + ‖y‖².
Proof. ‖x + y‖² = ‖x‖² + ‖y‖² + 2 real(⟨x, y⟩). □
The converse does not hold. Consider ℂ and the vectors x = 1 and y = ı. Then ‖x + y‖² = ‖x‖² + ‖y‖² = 2, but x and y are not orthogonal since here ⟨x, y⟩ = x y* = −ı ≠ 0.

We first consider the easiest version of the general minimum norm problem, where the set of interest is actually a subspace of X.

Theorem. (Pre-projection theorem) Let X be an inner product space, M a subspace of X, and x a vector in X.
• If there exists a vector m₀ ∈ M such that ‖x − m₀‖ = d(x, M), then that m₀ is unique.
• A necessary and sufficient condition that m₀ ∈ M be the unique minimizing vector in M is that the error vector x − m₀ be orthogonal to M.

Proof. Claim 1. If ∃ m₀ ∈ M (not necessarily unique) s.t. ‖x − m₀‖ = d(x, M), then x − m₀ ⊥ M.
Pf. We show by contradiction. Suppose m₀ ∈ M is a minimizer, but x − m₀ is not orthogonal to M.
Then ∃ m ∈ M s.t. ⟨x − m₀, m⟩ = δ ≠ 0, for some δ, where w.l.o.g. we assume ‖m‖ = 1. Consider m₁ ≜ m₀ + δm ∈ M.
‖x − m₁‖² = ‖x − m₀ − δm‖² = ‖x − m₀‖² − δ* ⟨x − m₀, m⟩ − δ ⟨m, x − m₀⟩ + |δ|² = ‖x − m₀‖² − |δ|² < ‖x − m₀‖²,
contradicting the assumption that m₀ is a minimizer.
Claim 2. If ∃ m₀ ∈ M s.t. x − m₀ ⊥ M, then m₀ is a unique minimizer.
Pf. For any m ∈ M, since m₀ − m ∈ M, by the Pythagorean theorem:
‖x − m‖² = ‖(x − m₀) + (m₀ − m)‖² = ‖x − m₀‖² + ‖m₀ − m‖² > ‖x − m₀‖² if m ≠ m₀.
So m₀, and only m₀, is the minimizer. □

(Figure: x above the plane M containing 0 and m₀; the error x − m₀ is perpendicular to M.)

In words: in an inner product space, if a subspace is proximinal, then it is Chebyshev.
The "good thing" about this theorem is that it does not require completeness of X, only the presence of an inner product. However, it does not prove the existence of a minimizer!

Is this lack of an existence proof simply because “we” have not been clever enough to find it? Or, are there (incomplete) inner product spaces in which no such minimizer exists? (We cannot find an example drawing 2d or 3d pictures, since En is complete!)

Example. Consider the Hilbert space ℓ₂ with the incomplete (and hence non-closed) subspace M that consists of sequences with only finitely many nonzero terms, and consider x = (1, 1/2, 1/3, 1/4, ...).
For n ∈ ℕ, let mₙ be identical to x for the first n terms, and then zero thereafter. Then ‖x − mₙ‖₂² = Σ_{k=n+1}^{∞} (1/k)², which approaches 0 as n → ∞, but {mₙ} converges to x ∉ M. So d(x, M) = inf_{m∈M} ‖x − m‖ = 0.
But clearly no m ∈ M achieves this minimum since x has an infinite number of nonzero terms.

But how about an example where X is incomplete and M is a closed subspace?
Example. (See [3, p. 289].)
Consider the (incomplete) inner product space X = ("finite-length" sequences, ‖·‖₂). Let u ∈ ℓ₂ denote the sequence of reals uᵢ = 1/2ⁱ, and define the following (uncountably infinite) subspace: M = { x ∈ X : Σ_{i=1}^{∞} xᵢ uᵢ = 0 }.
Claim 1. M is closed.
Suppose {yₙ} ∈ M and yₙ → y ∈ X. Then (borrowing the ℓ₂ inner product): |⟨y, u⟩| = |⟨y − yₙ, u⟩| ≤ ‖y − yₙ‖₂ ‖u‖₂ → 0 as n → ∞ (using the Cauchy-Schwarz inequality), so ⟨y, u⟩ = 0 and hence y ∈ M.
Claim 2. inf_{m∈M} ‖x − m‖ is not achieved, where x = (1, 0, 0, ...) ∉ M but x ∈ X.
Suppose m ∈ M achieves that infimum. Then by the pre-projection theorem, z ≜ x − m ⊥ M.
Since z ∈ X, z = (z₁, ..., z_N, 0, 0, ...) for some N ∈ ℕ.
Define wᵢ = eᵢ/uᵢ − e_{N+1}/u_{N+1} for i = 1, ..., N, where {eᵢ} denotes the standard basis vectors for ℓ₂. Since wᵢ ∈ M, ⟨z, wᵢ⟩ = 0. But therefore zᵢ = 0, i = 1, ..., N, so z = 0, i.e., x = m ∈ M. This contradicts the fact x ∉ M.

To establish existence of a minimizer, we make a stronger assumption: completeness of the subspace. Or, more frequently, we assume the inner product space is complete, so that all closed subspaces within it are complete. Why? ??

Theorem. (The classical projection theorem) Let M be a complete subspace of an inner product space H (e.g., M may be a closed subspace of a Hilbert space).
• For any x ∈ H, there exists a unique m₀ ∈ M such that ‖x − m₀‖ = d(x, M), i.e., M is Chebyshev.
• Furthermore, a necessary and sufficient condition that m₀ ∈ M be the unique minimizer is that x − m₀ ⊥ M.

Proof. Uniqueness and the characterization of m₀ in terms of orthogonality were established in the pre-projection theorem, so we focus on existence.
Clearly, if x ∈ M, then m₀ = x and we are done. For x ∉ M, δ ≜ d(x, M) > 0. Why? If d(x, M) were zero, then we could find a sequence mᵢ ∈ M such that d(x, mᵢ) → 0, meaning mᵢ → x, but that would imply x ∈ M because M is closed, contradicting x ∉ M. So δ > 0.
Let {mᵢ} denote a sequence of vectors in M such that ‖x − mᵢ‖ < δ + 1/i, which is possible by definition of d(x, M).
Claim 1. {mᵢ} is Cauchy.
By the parallelogram law:
‖(mⱼ − x) + (x − mᵢ)‖² + ‖(mⱼ − x) − (x − mᵢ)‖² = 2 ‖mⱼ − x‖² + 2 ‖x − mᵢ‖²,
so
‖mᵢ − mⱼ‖² = 2 ‖mᵢ − x‖² + 2 ‖x − mⱼ‖² − 4 ‖x − ½(mⱼ + mᵢ)‖².
However, ½(mⱼ + mᵢ) ∈ M, so ‖x − ½(mⱼ + mᵢ)‖ ≥ δ. Thus
‖mᵢ − mⱼ‖² ≤ 2 ‖mᵢ − x‖² + 2 ‖x − mⱼ‖² − 4δ² → 2δ² + 2δ² − 4δ² = 0 as i, j → ∞.
Since {mᵢ} is Cauchy, and M is complete, ∃ m₀ ∈ M s.t. mᵢ → m₀.
Since norms are continuous,
‖x − m₀‖ = ‖x − lim_{i→∞} mᵢ‖ = lim_{i→∞} ‖x − mᵢ‖ ≤ lim_{i→∞} (δ + 1/i) = δ = d(x, M). □

Remark. The key step in the proof was (the clever use of) the parallelogram law, a defining property of inner product spaces.
Remark. The proof uses only completeness of M, not of H. We will use this generality in a subsequent example.

Polynomial approximation example

Consider the problem of approximating the function x(t) = sin⁻¹ t over the interval [−1, 1] by a third-order polynomial. If we want the approximation to fit better at the ends than in the middle, then the following inner product space is reasonable:

X = { f : [−1, 1] → ℝ : f is continuous }, ⟨f, g⟩ = ∫_{−1}^{1} w(t) f(t) g(t) dt, where w(t) = 1 + t². (Picture)
Since x(t) is an odd function, the following subspace of X suffices:
M = { a t + b t³ : a, b ∈ ℝ }.
Is X complete? ?? Is M? ??
To find the best 3rd-order polynomial approximation, i.e., m⋆ = arg min_{m∈M} ‖x − m‖, we apply the projection theorem, which characterizes the minimizer through x − m⋆ ⊥ M. Denoting m⋆(t) = c t + d t³, then
∫_{−1}^{1} w(t) (sin⁻¹ t − c t − d t³)(a t + b t³) dt = 0, ∀a, b ∈ ℝ.
Since aq + br = 0, ∀a, b ∈ ℝ ⟺ q = r = 0, we can reduce the problem to the following finite-dimensional system of equations:
[ ∫_{−1}^{1} (1 + t²) t² dt   ∫_{−1}^{1} (1 + t²) t⁴ dt ] [ c ]   [ ∫_{−1}^{1} (1 + t²) t sin⁻¹ t dt  ]
[ ∫_{−1}^{1} (1 + t²) t⁴ dt   ∫_{−1}^{1} (1 + t²) t⁶ dt ] [ d ] = [ ∫_{−1}^{1} (1 + t²) t³ sin⁻¹ t dt ].
Using MATLAB's symbolic toolbox for the integration yields:
[ c ]   [ 16/15  24/35 ]⁻¹ [ 13/32 ]       [ 296/1027 ]
[ d ] = [ 24/35  32/63 ]   [ 13/48 ] π  =  [ 148/1027 ] π.
Thus m⋆(t) = (296π/1027) t + (148π/1027) t³, so ṁ⋆(0) = 296π/1027 ≈ 0.91.
The following figure shows x(t), the Taylor approximation t + t³/3!, and the minimum norm approximation m⋆(t). Although the Taylor approximation fits best near t = 0, the minimum norm approximation has a better overall fit.

(Figure: f(t) = asin(t), its Taylor approximation, and the minimum norm approximation on [−1, 1].)

It would be fair to argue that we did not really need the general version of the projection theorem for this problem. We could have solved min_{a,b∈ℝ} ∫_{−1}^{1} w(t) (x(t) − a t − b t³)² dt by conventional methods. The forthcoming Fourier series examples, where M is infinite dimensional, are (perhaps?) more compelling.
What about ‖·‖₁? ??
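For readers who want to reproduce the numbers above, here is a purely numerical sketch in Python (the notes used MATLAB's symbolic toolbox; this version uses quadrature, and the helper names are mine).

import numpy as np
from scipy.integrate import quad

w = lambda t: 1 + t**2                    # weighting function
basis = [lambda t: t, lambda t: t**3]     # spanning set for M

def inner(f, g):  # <f, g> = integral_{-1}^{1} w(t) f(t) g(t) dt
    return quad(lambda t: w(t) * f(t) * g(t), -1, 1)[0]

G = np.array([[inner(bi, bj) for bj in basis] for bi in basis])   # Gram matrix
rhs = np.array([inner(np.arcsin, b) for b in basis])
c, d = np.linalg.solve(G, rhs)
print(c, d)    # c ~ 296*pi/1027 ~ 0.905 and d ~ c/2, matching the result above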

Complete subspaces versus Chebyshev subspaces

The preceding theorems and examples might lead one to conjecture that completeness of a subspace M is a necessary condition for M to be Chebyshev. The next example shows otherwise.
Example. Consider the (incomplete) inner product space X consisting of sequences with finitely many nonzero terms, with the usual ℓ₂ inner product, and define the subspace M = { (a₁, a₂, ...) ∈ X : a₁ = 0 }. This subspace is closed, but is not complete, for the same reasons that X is incomplete. Nevertheless, M is Chebyshev, and x = (a₁, a₂, a₃, ...) ⟹ P_M(x) = (0, a₂, a₃, ...).
These complications are eliminated if we focus on Hilbert spaces. In a Hilbert space, completeness of a subspace becomes a necessary condition for the subspace to be Chebyshev.

Theorem. [22, p. 192] In a Hilbert space H, a subspace M is Chebyshev if and only if M is complete.

Proof. M complete ⟹ M Chebyshev follows from the projection theorem.
M Chebyshev ⟹ M closed ⟹ M is complete (since H is also a complete normed space). □

Summary

The following Venn diagram summarizes the situation with subspaces in any inner product space.
(Venn diagram: among subspaces, complete ⊆ Chebyshev ⊆ closed.)

In Hilbert spaces (including all finite-dimensional inner product spaces), the three ellipses coincide!

Example. In signal processing problems, the subspace of band-limited (continuous-time) signals with a certain band-width is important. Is this subspace complete?
Consider X = L₂[ℝ] and define M ⊂ X to be the subspace of all square-integrable functions having spectrum supported on the frequency interval [−ν_max, ν_max]. Since X is a Hilbert space, the preceding theorem says that the question of whether M is complete is equivalent to determining whether M is Chebyshev. In this case, a simple way to answer that is to construct a projector for M. Given x ∈ X, can we find a (unique, in the L₂ sense) band-limited function y⋆ such that
‖y⋆ − x‖ ≤ ‖y − x‖, ∀y ∈ M?
Applying Parseval's theorem:

y ∈ M ⟹ ‖y − x‖² = ∫_{−∞}^{∞} |y(t) − x(t)|² dt = ∫_{−∞}^{∞} |Y(ν) − X(ν)|² dν
= ∫_{−ν_max}^{ν_max} |Y(ν) − X(ν)|² dν + ∫_{|ν|>ν_max} |X(ν)|² dν.
Clearly this is minimized by taking y⋆ to be the (unique in the L₂ sense) signal having Fourier transform
Y⋆(ν) = X(ν) for |ν| ≤ ν_max, and Y⋆(ν) = 0 otherwise.
Since a projector exists, M is Chebyshev, and hence M is complete. Of course, in this case M is also convex.
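A discrete-time analogue of this projector is easy to sketch (this is illustrative only, not part of the notes): zeroing DFT coefficients above a cutoff gives, by Parseval, the closest "band-limited" vector in the ℓ₂ sense. The function and parameter names below are mine.

import numpy as np

def project_bandlimited(x, num_keep):
    """Zero all DFT bins whose |frequency| exceeds num_keep cycles per record."""
    X = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x))            # cycles per sample
    X[np.abs(freqs) > num_keep / len(x)] = 0.0
    return np.fft.ifft(X).real

t = np.linspace(0, 1, 256, endpoint=False)
x = np.sign(np.sin(2 * np.pi * 3 * t))        # a square wave
y = project_bandlimited(x, num_keep=10)       # closest band-limited signal
# any other signal with the same band limit (e.g., a scaled copy) is farther from x
assert np.linalg.norm(x - y) <= np.linalg.norm(x - 0.9 * y)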

3.4 Orthogonal complements

We saw in the projection theorem that an orthogonality condition characterizes the closest point in a complete subspace of an inner product space. We now examine orthogonality in more detail.
In ℝ², we decompose any vector into a sum of an "x-component" vector and a "y-component" vector, the two of which are orthogonal. We can extend this concept considerably in general Hilbert spaces.
Definition. If S is a subset of an inner product space (X, ⟨·,·⟩), then the orthogonal complement of S is the subspace:
S^⊥ ≜ { x ∈ X : x ⊥ S }.

Clearly
• x ∈ S^⊥ ⟺ x ⊥ S
• y ∈ S ⟹ y ⊥ S^⊥.
Example. What is {0}^⊥? ??
Example. In 3-space, what is the orthogonal complement of the "cone" { αx : α ∈ [0, ∞) } for some x ≠ 0? ??

Proposition. If S and T are subsets of an inner product space (X, ⟨·,·⟩), then the following hold.
(a) S^⊥ is a closed subspace of X.
(b) S ⊆ S^⊥⊥. When equal? See next theorem!
(c) If S ⊆ T, then T^⊥ ⊆ S^⊥.
(d) S^⊥⊥⊥ = S^⊥
(f) (S̄)^⊥ = S^⊥ = closure(S^⊥), where S̄ denotes the closure of S.

Proof.
(a) S^⊥ is a subspace since linear combinations of vectors orthogonal to S are also orthogonal to S.
Suppose {xₙ} ∈ S^⊥ with xₙ → x ∈ X. Since ⟨xₙ, s⟩ = 0, ∀s ∈ S, by continuity of the inner product we have: ⟨x, s⟩ = ⟨lim_{n→∞} xₙ, s⟩ = lim_{n→∞} ⟨xₙ, s⟩ = 0, so x ∈ S^⊥.
(b) y ∈ S ⟹ y ⊥ S^⊥ ⟹ y ∈ S^⊥⊥, so S ⊆ S^⊥⊥.
(c) If S ⊆ T, then y ∈ T^⊥ ⟹ y ⊥ x, ∀x ∈ T ⟹ y ⊥ x, ∀x ∈ S ⟹ y ∈ S^⊥. So T^⊥ ⊆ S^⊥.
(d) From above, S^⊥ ⊆ S^⊥⊥⊥. Also S ⊆ S^⊥⊥, so by the 3rd property: S^⊥⊥⊥ ⊆ S^⊥. Thus S^⊥⊥⊥ = S^⊥.
(f) From (a), S^⊥ = closure(S^⊥), which is the second equality. Now pick any x ∈ S^⊥. If y ∈ S̄, then ∃ sₙ ∈ S s.t. sₙ → y. Since x ∈ S^⊥, we have x ⊥ sₙ, ∀n, so ⟨x, y⟩ = ⟨x, lim_{n→∞} sₙ⟩ = lim_{n→∞} ⟨x, sₙ⟩ = 0. Thus x ∈ S^⊥ ⟹ x ⊥ S̄. Since x was arbitrary, S^⊥ ⊆ (S̄)^⊥. Furthermore, S ⊆ S̄ ⟹ (S̄)^⊥ ⊆ S^⊥. Thus S^⊥ = (S̄)^⊥. □

Proposition. If S is a subset of a Hilbert space, then the following hold.
(e) S^⊥⊥ = closure([S]), i.e., S^⊥⊥ is the smallest closed subspace containing S.
(g) S^⊥ is complete.

Proof. (e) See problem 3.9. ?? (g) follows since S^⊥ is closed and a Hilbert space is complete. □
Example. E² with S = {(1, 0)}. Then S^⊥ = [{(0, 1)}], S^⊥⊥ = [{(1, 0)}] ≠ S, S^⊥⊥⊥ = [{(0, 1)}] = S^⊥.
Example. E² with S = {(1, 0), (2, 0)} and T = {(1, 0)}. Then S^⊥ = [{(0, 1)}] = T^⊥, yet it is not the case that S ⊂ T. So the converse of (c) above fails.

Orthogonal projection

The projection theorem allows us to extend the geometric projection property of Euclidean space to general inner product spaces.
Definition. Let M be a Chebyshev subspace of an inner product space X. For each x ∈ X, let P(x) be the unique point in M closest to x:
P(x) = arg min_{m∈M} ‖x − m‖.
Then P : X → M is called the orthogonal projection of X onto M. When needed for clarity, we write P_M rather than just P.
Lemma. If M is a Chebyshev subspace of an inner product space, then
• M^⊥ is a Chebyshev set,
• P_{M^⊥}(x) = P_M^⊥(x) ≜ x − P_M(x), which is called the projection onto the orthogonal complement,²
• x = P_M(x) + P_M^⊥(x), where P_M(x) ∈ M and P_M^⊥(x) ∈ M^⊥.
Proof. (Exercise)
Recall the following properties of the projector P onto a Chebyshev subset S.
• P(x) ∈ S
• ‖x − P(x)‖ = d(x, S)
• P(P(x)) = P(x) (idempotent)
Here are some trivial properties of orthogonal projections (onto subspaces) that follow directly from the projection theorem.
• x − P(x) ⊥ M
• P^⊥(x) ∈ M^⊥
• P(x) ⊥ P^⊥(x)
• x = P(x) + P^⊥(x)
Here are some less trivial properties.

Proposition. If M is a Chebyshev subspace of an inner product space X, then the orthogonal projector P = P_M has the following properties.
(a) P(x) = 0 iff x ⊥ M (cf. earlier figure)
(b) P : X → M is a linear operator (cf. earlier figure)
(c) |||P||| ≜ sup_{x≠0} ‖P(x)‖ / ‖x‖ = 1 (see p. 105) (provided M is at least 1-dimensional, i.e., not simply {0})
(d) P is continuous, i.e., xₙ → x ⟹ P(xₙ) → P(x).
Proof.
(a) P(x) = 0 ⟺ x ⊥ M follows directly from the "characterization" part of the pre-projection theorem.
(b) P(x₁) = m₁ and P(x₂) = m₂ ⟹ x₁ − m₁ ⊥ M and x₂ − m₂ ⊥ M. Thus α(x₁ − m₁) + β(x₂ − m₂) ⊥ M, so (αx₁ + βx₂) − (αm₁ + βm₂) ⊥ M. Thus P(αx₁ + βx₂) = αm₁ + βm₂ = α P(x₁) + β P(x₂) by the pre-projection theorem.
(c) x ∈ M ⟹ P(x) = x, so |||P||| ≥ 1. x ∈ H ⟹ x − P(x) ⊥ M ⟹ x − P(x) ⊥ P(x) since P(x) ∈ M.
So by the Pythagorean theorem: ‖x‖² = ‖x − P(x) + P(x)‖² = ‖x − P(x)‖² + ‖P(x)‖² ≥ ‖P(x)‖², so ‖P(x)‖ / ‖x‖ ≤ 1.
(d) Exercise. ?? □

(Figure: xₙ → x and their projections P(xₙ) → P(x) onto M, with 0 ∈ M.)

²Notice the reuse of the ⊥ symbol here. This is reasonable since P_{M^⊥} = P_M^⊥ when M is a Chebyshev subspace in an inner product space.

Direct sum

Recall that if S, T are subsets of a common vector space X, then S + T = { s + t : s ∈ S, t ∈ T }. However, in general, if x ∈ S + T, decompositions of the form x = s + t need not be unique.
Definition. A vector space X is called the direct sum of subspaces M and N iff each x ∈ X has a unique representation as x = m + n where m ∈ M and n ∈ N. We notate this situation as follows:
X = M ⊕ N.

Fact. If {u₁, ..., uₙ} is a linearly independent set of vectors, then
[{u₁, ..., uₙ}] = [{u₁}] ⊕ ··· ⊕ [{uₙ}].

This is an algebraic concept, so we could have introduced it much earlier. But its main use is in the context of Hilbert spaces.

Theorem. If M is a Chebyshev subspace of an inner product space X, then
X = M ⊕ M^⊥, and M^⊥⊥ = M.

Proof. As shown previously, x = m⋆ + n⋆ where m⋆ = P(x) ∈ M and n⋆ = P^⊥(x) ∈ M^⊥.
(However, uniqueness of m⋆ as the minimizer in M of ‖x − m‖ does not alone ensure uniqueness of the decomposition m⋆ + n⋆, so we must prove uniqueness next.)
Suppose x = m + n with m ∈ M and n ∈ M^⊥. Then 0 = x − x = (m⋆ + n⋆) − (m + n) = (m⋆ − m) + (n⋆ − n), but m⋆ − m ⊥ n⋆ − n, so by the Pythagorean theorem: 0 = ‖0‖² = ‖m⋆ − m‖² + ‖n⋆ − n‖². Thus m = m⋆ and n = n⋆.
Since M ⊆ M^⊥⊥ was shown previously, we need to show M^⊥⊥ ⊆ M.
Suppose x ∈ M^⊥⊥. By the first part of this theorem, x = m + n where m ∈ M ⊆ M^⊥⊥ and n ∈ M^⊥. Since both x and m are in M^⊥⊥, n = x − m ∈ M^⊥⊥. But also n ∈ M^⊥, so n ⊥ n ⟹ n = 0. Thus x = m ∈ M and hence M^⊥⊥ ⊆ M since x was arbitrary. □
Corollary. For any subset S of a Hilbert space X: X = closure([S]) ⊕ closure([S])^⊥.
Summarizing previous results:
• In any inner product space, for any subset S we have S ⊆ S^⊥⊥.
• In any inner product space, for any Chebyshev subspace M we have M = M^⊥⊥.
Example. A subspace M in a Hilbert space where M ≠ M^⊥⊥:
Take X = ℓ₂ and M = { sequences with finitely many nonzero terms } (not closed). Then M^⊥ = {0} and M^⊥⊥ = ℓ₂ = X ≠ M.
Example of a closed subspace in an inner product space where M ≠ M^⊥⊥? (Exercise). ??

Having established the fundamental theory of inner product spaces, we now move towards "applications:" Fourier series and other minimum norm problems like approximation.

3.5 Orthogonal sets

Orthogonal sets of vectors in a Hilbert space (such as complex exponentials for ordinary Fourier series) are very useful in applications such as signal analysis.
Definition. A set S of vectors in an inner product space is called an orthogonal set iff
u, v ∈ S, u ≠ v ⟹ u ⊥ v.
If in addition each vector in S has unity norm, then S is called an orthonormal set.
Remark. An orthogonal set can include the zero vector. An orthonormal set cannot.
Example. L₂[0, 1] with S = { cos(2πkt) : k ∈ ℕ } (countable).
Do uncountable orthogonal sets exist? The example S = { 1_{t=a} : a ∈ (0, 1) } fails since 1_{t=a} ≡ 0 in L₂.
Proposition. In any inner product space, an orthogonal set of nonzero vectors is a linearly independent set.

Proof. Suppose {u₁, ..., uₙ} ⊂ S and Σ_{i=1}^{n} αᵢ uᵢ = 0, yet uᵢ ≠ 0, ∀i. Then
0 = ⟨0, u_k⟩ = ⟨ Σ_{i=1}^{n} αᵢ uᵢ , u_k ⟩ = α_k ‖u_k‖²,
so α_k = 0, k = 1, ..., n. Thus the vectors are linearly independent. Since n and the αᵢ's and uᵢ's were arbitrary, the set is linearly independent. □
Fact. (Projection onto the span of a single vector.) Using a convenient shorthand:

P_u(x) ≜ P_{[{u}]}(x) = ( ⟨x, u⟩ / ‖u‖² ) u if u ≠ 0, and P_u(x) = 0 if u = 0.

Proof. ⟨αu, x − P_u(x)⟩ = α ⟨ u, x − (⟨x, u⟩/‖u‖²) u ⟩ = α ( ⟨u, x⟩ − ⟨x, u⟩* ) = 0, so x − P_u(x) ⊥ [{u}].
Fact. If {u₁, ..., uₙ} form an orthogonal set, then
P_{[{u₁,...,uₙ}]}(x) = P_{u₁}(x) + ··· + P_{uₙ}(x).
Proof. Exercise.
Fact. If M and N are orthogonal Chebyshev subspaces of an inner product space, then (Prob. 3.7)
• M + N = M ⊕ N,
• M ⊕ N is a Chebyshev subspace, and
• P_{M⊕N}(x) = P_M(x) + P_N(x).

Gram-Schmidt procedure

Since orthonormal sets are so convenient, it is fortunate that we can always create such sets by using the Gram-Schmidt orthogonalization procedure described in the proof of the following theorem. This is another generalization of a familiar method, from finite-dimensional spaces to general inner product spaces.

Theorem. (Gram-Schmidt) If {xᵢ} is a finite or countable sequence of linearly independent vectors in an inner product space (X, ⟨·,·⟩), then there exists an orthonormal sequence {eᵢ} such that
[{e₁, ..., eₙ}] = [{x₁, ..., xₙ}], ∀n ∈ ℕ.
Proof. Linearly independent vectors are necessarily nonzero, so ‖xᵢ‖ ≠ 0.
Take e₁ = x₁ / ‖x₁‖, which clearly has unity norm and spans the same space as x₁.
Form the remaining vectors recursively:
zₙ = xₙ − Σ_{i=1}^{n−1} ⟨xₙ, eᵢ⟩ eᵢ,  eₙ = zₙ / ‖zₙ‖,  n = 2, 3, ....
Being a linear combination of linearly independent vectors, zₙ is nonzero. And zₙ ⊥ eᵢ for i = 1, ..., n−1 is easily verified.
Since we can write xₙ as a linear combination of the eᵢ vectors, by an induction argument the span of {e₁, ..., eₙ} equals the span of {x₁, ..., xₙ}. □
Note:
zₙ = xₙ − Σ_{i=1}^{n−1} ⟨xₙ, eᵢ⟩ eᵢ = xₙ − Σ_{i=1}^{n−1} P_{eᵢ}(xₙ) = xₙ − P_{[{e₁,...,e_{n−1}}]}(xₙ) = P^⊥_{[{e₁,...,e_{n−1}}]}(xₙ) = P^⊥_{[{x₁,...,x_{n−1}}]}(xₙ).
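A minimal Gram-Schmidt sketch in Python (mine, not from the notes) for vectors in ℝⁿ with the standard inner product; the inner argument could be swapped for any other valid inner product.

import numpy as np

def gram_schmidt(xs, inner=np.dot):
    es = []
    for x in xs:
        z = x - sum(inner(x, e) * e for e in es)   # z_n = x_n - sum_i <x_n, e_i> e_i
        norm_z = np.sqrt(inner(z, z))
        if norm_z > 1e-12:                         # skip numerically dependent vectors
            es.append(z / norm_z)
    return es

xs = [np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0]), np.array([0.0, 1.0, 1.0])]
es = gram_schmidt(xs)
print(np.round([[np.dot(ei, ej) for ej in es] for ei in es], 6))   # ~ identity matrix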

Corollary. Any finite-dimensional inner product space has an orthonormal basis.

Example. The polynomials xᵢ(t) = tⁱ, i = 0, 1, ..., are linearly independent in L₂[−1, 1].
Why? ??
Applying Gram-Schmidt yields the normalized Legendre polynomials:

e₀(t) = x₀(t) / ‖x₀‖ = 1/√2,
z₁(t) = x₁(t) − ⟨x₁, e₀⟩ e₀(t) = t − ( ∫_{−1}^{1} t · (1/√2) dt ) · (1/√2) = t,  ‖z₁‖² = ∫_{−1}^{1} t² dt = 2/3,
e₁(t) = z₁(t) / ‖z₁‖ = √(3/2) t,
z₂(t) = x₂(t) − ⟨x₂, e₀⟩ e₀(t) − ⟨x₂, e₁⟩ e₁(t) = t² − 1/3, ....

One can show by induction that
eₙ(t) = √((2n + 1)/2) Pₙ(t), n = 0, 1, 2, ...,
where the Pₙ(t) are the Legendre polynomials
Pₙ(t) = (1 / (2ⁿ n!)) dⁿ/dtⁿ (t² − 1)ⁿ.

Clearly some subset of these ei(t)’s will be an orthonormal basis for any finite-dimensional space of polynomials.

Is the entire collection some type of "basis" for L₂[−1, 1]? (It is not a Hamel basis for L₂.) We will return to this question soon!

(Figure: the normalized Legendre polynomials eₙ(t) on [−1, 1] for n = 0, ..., 5.)

Approximation

A previous example considered the problem of approximating arcsin(t) by a 3rd-order polynomial, which reduced to a 2 by 2 system of equations since only 2 of the 4 coefficients were relevant. We now consider such approximation problems more generally, and see that such reduction to a finite system of equations is the general behavior when the subspace M is finite dimensional.
Suppose M is a finite-dimensional subspace of an inner product space (X, ⟨·,·⟩). Then by definition, M = [{y₁, ..., yₙ}] for some vectors yᵢ ∈ X. Furthermore, being finite dimensional, M is complete, so by the projection theorem, M is a Chebyshev set. Thus, given an arbitrary vector x ∈ X, there exists a unique approximation x̂ ∈ M that is closest to x, as measured, of course, by the norm induced by the inner product:
‖x − x̂‖ = d(x, M) = inf_{m∈M} ‖x − m‖.
Now we would like to find an explicit formula for x̂, since "existence and uniqueness" alone is inadequate for most practical applications.

3.6 Normal equations

Since x̂ ∈ M ⟹ x̂ = Σ_{i=1}^{n} αᵢ yᵢ, we must find the n scalars {αᵢ} that minimize ‖x − Σ_{i=1}^{n} αᵢ yᵢ‖.
The projection theorem ensures that x̂ exists, and is characterized by x − x̂ ⊥ M, or equivalently x − x̂ ⊥ yⱼ for j = 1, ..., n. Thus
0 = ⟨x − x̂, yⱼ⟩ = ⟨ x − Σ_{i=1}^{n} αᵢ yᵢ , yⱼ ⟩ = ⟨x, yⱼ⟩ − Σ_{i=1}^{n} αᵢ ⟨yᵢ, yⱼ⟩, j = 1, ..., n.
Rearranging yields the n × n system of linear equations for the coefficients, called the normal equations:
[ ⟨y₁, y₁⟩ ··· ⟨yₙ, y₁⟩ ] [ α₁ ]   [ ⟨x, y₁⟩ ]
[    ⋮            ⋮     ] [  ⋮  ] = [    ⋮    ]
[ ⟨y₁, yₙ⟩ ··· ⟨yₙ, yₙ⟩ ] [ αₙ ]   [ ⟨x, yₙ⟩ ]
If the yᵢ's are vectors in ℂᵐ, with the usual inner product ⟨x, y⟩ = y′x, then defining the m × n matrix Y = [y₁ ... yₙ] we have
α = (Y′Y)⁻¹ Y′x.

In particular, if n = 1, then x̂ = ⟨x, y/‖y‖⟩ y/‖y‖ (cf. the picture we draw).
Example. See the previous polynomial approximation to arcsin(t).
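Here is a hedged numerical sketch (not from the notes) of the normal equations for M = span{y₁, ..., yₙ} ⊂ ℂᵐ with the usual inner product ⟨u, v⟩ = v′u; it also checks the orthogonality characterization x − x̂ ⊥ yⱼ.

import numpy as np

rng = np.random.default_rng(2)
Y = rng.standard_normal((6, 3))       # columns are y_1, y_2, y_3
x = rng.standard_normal(6)

G = Y.conj().T @ Y                    # Gram matrix, G[j, i] = <y_i, y_j>
b = Y.conj().T @ x                    # b[j] = <x, y_j>
alpha = np.linalg.solve(G, b)         # normal equations
xhat = Y @ alpha                      # best approximation of x in M

assert np.allclose(Y.conj().T @ (x - xhat), 0)   # error is orthogonal to M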

Gram matrices

The n × n matrix above is called the Gram matrix of {y₁, ..., yₙ}.
Its determinant g(y₁, ..., yₙ) is called the Gram determinant.

To find the best αi’s, we must solve the above system of equations, which has a unique solution iff the Gram determinant is nonzero.

Proposition. The Gram determinant is nonzero iff the vectors {y₁, ..., yₙ} are linearly independent.
Proof. If the yᵢ's are linearly dependent, then ∃ αᵢ's not all zero such that Σᵢ αᵢ yᵢ = 0. Thus the rows of the Gram matrix are also linearly dependent, so the determinant is zero.
Conversely, if the determinant is zero, then the rows of the Gram matrix are linearly dependent, so ∃ αᵢ's not all zero such that Σᵢ αᵢ ⟨yᵢ, yⱼ⟩ = 0 for all j. Thus ⟨ Σᵢ αᵢ yᵢ , yⱼ ⟩ = 0 for all j, so Σⱼ αⱼ* ⟨ Σᵢ αᵢ yᵢ , yⱼ ⟩ = 0. Thus ‖ Σᵢ αᵢ yᵢ ‖² = 0, so Σᵢ αᵢ yᵢ = 0, so the yᵢ's are linearly dependent. □
Remark. If the yᵢ's are linearly dependent, then there are multiple solutions to the normal equations, all of which are equally good approximations. Often, at least in signal processing, of these many solutions we prefer the one that minimizes the Euclidean norm of (α₁, ..., αₙ).
However, no matter which solution for α we choose, when added up via x̂ = Σ_{i=1}^{n} αᵢ yᵢ we will get the same x̂, since x̂ is unique by the projection theorem! Uniqueness of x̂ is different than uniqueness of α.
The text also describes the explicit error norm formula (which does not seem particularly useful):
‖x − x̂‖² = g(y₁, ..., yₙ, x) / g(y₁, ..., yₙ).

We see now that the approximation problem has an easily computed solution when M is a finite dimensional subspace. We will see later in 3.10 that the same is true if M ⊥ is finite dimensional!

Orthogonal bases

What happens if the yi’s are orthonormal? Then the Gram matrix is just the identity matrix, and we can immediately write down the optimal approximation:

x̂ = Σ_{i=1}^{n} αᵢ yᵢ,  αᵢ = ⟨x, yᵢ⟩,  or equivalently  x̂ = Σ_{i=1}^{n} P_{yᵢ}(x).
To generalize this result, we want to consider infinite dimensional approximating subspaces, since that is often of most interest in practice (e.g., ordinary Fourier series).

3.9 Approximation and Fourier series

Returning to the case of a finite-dimensional subspace M, we can find the minimum norm solution by first applying Gram-Schmidt to orthonormalize a (linearly independent) basis for M, and then using the Fourier series:

x̂ = Σ_{i=1}^{n} ⟨x, eᵢ⟩ eᵢ.

Thus applying Gram-Schmidt is equivalent to inverting the Gram matrix. Pick your poison...
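The two routes agree, as the following sketch (mine) shows for column vectors: solving the normal equations, and summing Fourier coefficients over an orthonormalized basis (QR factorization performs the Gram-Schmidt step), give the same x̂.

import numpy as np

rng = np.random.default_rng(3)
Y = rng.standard_normal((6, 3))
x = rng.standard_normal(6)

# route 1: normal equations
xhat1 = Y @ np.linalg.solve(Y.T @ Y, Y.T @ x)

# route 2: orthonormalize the columns, then sum the Fourier coefficients
Q, _ = np.linalg.qr(Y)                       # columns of Q: orthonormal basis e_i
xhat2 = sum(np.dot(x, e) * e for e in Q.T)

assert np.allclose(xhat1, xhat2)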

Weighted least-squares FIR filter design

Example. Suppose we have a given desired frequency response D(ω) that we would like to approximate by an FIR filter with impulse response
h[n] = Σ_{k=0}^{M} h[k] δ[n − k]
and corresponding frequency response

H(ω) = Σ_{k=0}^{M} h[k] e^{−ıωk} ∈ [{1, e^{−ıω}, e^{−ı2ω}, ..., e^{−ıMω}}].
The natural inner product space here is L₂[−π, π], but with a weighted inner product
⟨H₁, H₂⟩ = ∫_{−π}^{π} H₁(ω) H₂*(ω) W(ω) dω,
where the positive weighting function W(ω) can influence which frequency bands require the closest match between D(ω) and H(ω), since the induced norm is
‖D − H‖² = ∫_{−π}^{π} |D(ω) − H(ω)|² W(ω) dω.
The Gram matrix G has elements
G_{kl} = ⟨e^{−ıωk}, e^{−ıωl}⟩ = ∫_{−π}^{π} e^{−ıω(k−l)} W(ω) dω, k, l = 0, ..., M,
so the WLS optimal filter design has coefficients h = G⁻¹ d, where d = (d₀, ..., d_M) with
d_k = ⟨D, e^{−ıωk}⟩ = ∫_{−π}^{π} D(ω) e^{ıωk} W(ω) dω, k = 0, ..., M.

Because the complex exponentials {e^{−ıωk}} are linearly independent, the Gram matrix is invertible.
What if we want to minimize ‖D − H‖_∞ instead? Use the Remez algorithm...
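Below is an illustrative numerical WLS design sketch (the choices of D, W, and the helper cquad are mine, not the notes'), forming G and d by quadrature and solving h = G⁻¹d.

import numpy as np
from scipy.integrate import quad

M = 8                                            # filter order (taps h[0..M])
D = lambda w: 1.0 * (np.abs(w) < np.pi / 4)      # ideal low-pass desired response
W = lambda w: 1.0                                # uniform weighting

def cquad(f):  # integral over [-pi, pi] of a complex-valued integrand
    re = quad(lambda w: f(w).real, -np.pi, np.pi, limit=200)[0]
    im = quad(lambda w: f(w).imag, -np.pi, np.pi, limit=200)[0]
    return re + 1j * im

G = np.array([[cquad(lambda w, k=k, l=l: np.exp(-1j * w * (k - l)) * W(w))
               for l in range(M + 1)] for k in range(M + 1)])
d = np.array([cquad(lambda w, k=k: D(w) * np.exp(1j * w * k) * W(w))
              for k in range(M + 1)])
h = np.linalg.solve(G, d).real                   # WLS-optimal coefficients
print(np.round(h, 4))                            # ~ samples of a sinc for this D, W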

We now explore other minimum norm problems, in particular infinite dimensional ones where neither the normal equations nor the Fourier series solutions are applicable directly. In particular, we consider the broad family of problems involving linear varieties.

Linear varieties (from 2.3 and 3.10)

Definition. A subset V of a vector space X is called a linear variety if V = x₀ + M for some x₀ ∈ X and some subspace M of X. Another term used is affine subspace.
Exercise. If (X, F) is a vector space and V ⊂ X, then the following are equivalent.
• V is a linear variety.
• For any x⋆ ∈ V, the set M = { x − x⋆ : x ∈ V } is a subspace.
• ∀x, y, z ∈ V, α ∈ F, α(x − y) + z ∈ V.
Example. In ℝ², consider V = { (a, b) : a + b = 1, a, b ∈ ℝ }. (A line not through the origin.)
Exercise. If V = x₀ + M, then V is complete / Chebyshev / closed iff M is complete / Chebyshev / closed.

• Any subspace is a linear variety (take x₀ = 0), so linear varieties are a small generalization of subspaces.
• A single point x₀ is a linear variety (take M = {0}).
Fact. The point x₀ need not be unique. If V = x₀ + M is a linear variety, then for any v₀ ∈ V we can write V = v₀ + M, because v₀ = x₀ + m₀ for some m₀ ∈ M, so if v ∈ V then v = x₀ + m = v₀ − m₀ + m = v₀ + (m − m₀) where m − m₀ ∈ M.

In a "variety" of problems one would like to find the x ∈ V having minimum norm (e.g., minimum energy) subject to certain constraints.
Example. Continuing the previous example, the following figure shows the v⋆ ∈ V with minimum ‖·‖₂ norm.

(Figure: the line V and its minimum-norm point v⋆.)

3.10 Dual approximation problem The following theorem shows that such problems always have a unique solution and characterizes that solution.

Theorem. (Modified projection theorem.) Let M be a Chebyshev subspace of an inner product space X. If V = x₀ + M is a linear variety where x₀ ∈ X, then there exists a unique v⋆ ∈ V having minimum norm, and v⋆ is characterized completely by the two conditions v⋆ ∈ V and v⋆ ⊥ M.

Proof. Simply translate V by −x₀ and apply the projection theorem:

inf_{v∈V} ‖v‖ = inf_{m∈M} ‖x₀ + m‖ = inf_{m∈M} ‖x₀ − m‖ = ‖x₀ − x⋆‖, where x⋆ = P_M(x₀) and x₀ − x⋆ ⊥ M.

So use v⋆ ≜ x₀ − x⋆ = P_M^⊥(x₀) ∈ V; then v⋆ ⊥ M and v⋆ has minimum norm in V. □

Remark. Note that v⋆ ⊥ M, not v⋆ ⊥ V, cf. the preceding figure.

(Figures: left, the variety V = x₀ + M with its minimum-norm point v⋆; right, the projection x⋆ of x₀ onto M.)

Why called dual? Perhaps:
x⋆ = arg min_{m∈M} ‖x₀ − m‖ = x₀ − v⋆, where v⋆ = arg min_{v∈V=x₀+M} ‖v‖.

Exercise. Generalize to the problem arg min_{v∈V} ‖x − v‖ if possible. ??

One important application of linear varieties is in the study of problems with constraints. In particular, the projection theorem led to the very convenient normal equations for the case of finite-dimensional subspaces. But there are also problems where something akin to the normal equations still applies even though the problem appears to be infinite dimensional.
Example. The set V in the previous example could be written V = { x ∈ ℝ² : ⟨x, y⟩ = 1 }, where y = (1, 1).
The following proposition shows that such sets are always linear varieties.

Proposition. Let {y₁, ..., yₙ} be a finite set of linearly independent vectors in an inner product space X. Then for given scalars c₁, ..., cₙ, the following set is a closed linear variety:

V = { x ∈ X : ⟨x, yᵢ⟩ = cᵢ, i = 1, ..., n }.

Moreover, there exists a unique x₀ ∈ [y₁, ..., yₙ] such that V = x₀ + [y₁, ..., yₙ]^⊥.
Proof. The yᵢ's are linearly independent, so the Gram matrix is nonsingular, and there is a unique x₀ ∈ [y₁, ..., yₙ] such that ⟨x₀, yᵢ⟩ = cᵢ for i = 1, ..., n. Thus V is nonempty and consists at least of this single point x₀.
Claim. V = x₀ + [y₁, ..., yₙ]^⊥.
x ∈ V ⟺ ⟨x, yᵢ⟩ = cᵢ, i = 1, ..., n
⟺ ⟨x − x₀, yᵢ⟩ = ⟨x, yᵢ⟩ − ⟨x₀, yᵢ⟩ = cᵢ − cᵢ = 0, i = 1, ..., n
⟺ x − x₀ ⊥ [y₁, ..., yₙ] ⟺ x − x₀ ∈ [y₁, ..., yₙ]^⊥.
Thus V = x₀ + [y₁, ..., yₙ]^⊥. Recall that orthogonal complements are closed, so [y₁, ..., yₙ]^⊥ is closed. (We did not use completeness to show this, rather just the continuity of the inner product!) So V is closed by a preceding Exercise.
Now suppose V = x₁ + [y₁, ..., yₙ]^⊥ with x₁ ∈ [y₁, ..., yₙ]. Then x₀ ∈ V ⟹ x₀ = x₁ + n where n ∈ [y₁, ..., yₙ]^⊥. Thus ⟨x₀, yᵢ⟩ = ⟨x₁ + n, yᵢ⟩ = ⟨x₁, yᵢ⟩ = cᵢ for i = 1, ..., n, so x₁ = x₀ by the uniqueness discussed above. □
Remark. In general a linear variety V can be infinite dimensional. However, for the specific type of V in the above proposition, we say V has codimension n since the orthogonal complement of the underlying subspace has dimension n.

Applications

Two types of linear varieties are of particular interest in optimization problems.
• V = { x₀ + Σ_{i=1}^{n} cᵢ xᵢ : cᵢ ∈ ℝ }, where the xᵢ's are linearly independent
• V = { x ∈ X : ⟨x, yᵢ⟩ = cᵢ, i = 1, ..., n }
Both reduce to finite dimensional problems thanks to the projection theorem.

Theorem. Let {y₁, ..., yₙ} be a set of linearly independent vectors in an inner product space X. Let
V = { x ∈ X : ⟨x, yᵢ⟩ = cᵢ, i = 1, ..., n }.
Then there exists a unique v⋆ ∈ V with minimum norm. Moreover, v⋆ = Σ_{i=1}^{n} βᵢ yᵢ, where the βᵢ's satisfy the normal equations
[ ⟨y₁, y₁⟩ ··· ⟨yₙ, y₁⟩ ] [ β₁ ]   [ c₁ ]
[    ⋮            ⋮     ] [  ⋮ ] = [  ⋮ ]
[ ⟨y₁, yₙ⟩ ··· ⟨yₙ, yₙ⟩ ] [ βₙ ]   [ cₙ ]

Remark. Note that V is necessarily nonempty by the previous Proposition, due to the linear independence assumption.
Proof. From the previous proposition, V = x₀ + M^⊥, where M = [y₁, ..., yₙ] and x₀ ∈ M. Being finite dimensional, M is Chebyshev, so M^⊥ is also Chebyshev.
So existence of a unique minimizing v⋆ follows from the modified projection theorem.
Likewise, v⋆ ⊥ M^⊥ follows from that theorem. Thus v⋆ ∈ M^⊥⊥ = M since M is Chebyshev. Since v⋆ ∈ M, we have v⋆ = Σ_{j=1}^{n} βⱼ yⱼ for some βⱼ's.
We also need v⋆ ∈ V, i.e., v⋆ must satisfy the constraints ⟨v⋆, yᵢ⟩ = cᵢ, or equivalently
⟨ Σ_{j=1}^{n} βⱼ yⱼ , yᵢ ⟩ = Σ_{j=1}^{n} βⱼ ⟨yⱼ, yᵢ⟩ = cᵢ, i = 1, ..., n,
leading to the normal equations. □
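A small numerical sketch of this theorem (mine, not from the notes), for X = ℝᵐ: the minimum-norm point of the variety lies in span{yᵢ}, and any other feasible point is at least as long.

import numpy as np

rng = np.random.default_rng(4)
Y = rng.standard_normal((10, 3))          # columns y_1, y_2, y_3 (independent w.p. 1)
c = np.array([1.0, -2.0, 0.5])

G = Y.T @ Y                               # Gram matrix
beta = np.linalg.solve(G, c)              # normal equations for beta
v_star = Y @ beta                         # minimum-norm point of V = {x : Y.T x = c}

# any other feasible point differs from v_star by something orthogonal to the y_i
r = rng.standard_normal(10)
r_perp = r - Y @ np.linalg.solve(G, Y.T @ r)
x_other = v_star + r_perp
assert np.allclose(Y.T @ x_other, c)
assert np.linalg.norm(v_star) <= np.linalg.norm(x_other) + 1e-12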

Remark. Combining the projection theorem and the derivation of the normal equations yields the following theorem, which should be contrasted with the previous theorem.

Theorem. ((3.10-2); really just a corollary to the projection theorem.) If M = [y₁, ..., yₙ] is a finite-dimensional subspace of an inner product space X, then given x ∈ X, there exists a unique x⋆ ∈ M s.t. ‖x − x⋆‖ = inf_{m∈M} ‖x − m‖. Furthermore, x − x⋆ ⊥ M, and x⋆ = Σ_{i=1}^{n} αᵢ yᵢ where
[ ⟨y₁, y₁⟩ ··· ⟨yₙ, y₁⟩ ] [ α₁ ]   [ ⟨x, y₁⟩ ]
[    ⋮            ⋮     ] [  ⋮ ] = [    ⋮    ]
[ ⟨y₁, yₙ⟩ ··· ⟨yₙ, yₙ⟩ ] [ αₙ ]   [ ⟨x, yₙ⟩ ]

As the nice figure at the bottom of p. 67 shows, if either M or M ⊥ is finite dimensional, then minimum norm problems reduce to a finite set of linear equations.

Skim 3.11: control problem example

Example. (See [4, p. 66].) Consider the linear system
y(t) = y(0) + ∫₀ᵗ h(t, τ) u(τ) dτ
where h ∈ C([0, T] × [0, T]), i.e., h(t, τ) is a real-valued function that is continuous on [0, T] × [0, T].
Problem: find u(·) that minimizes ∫₀ᵀ |u(t)|² dt subject to y(T) = y_f.
This is a minimum energy control problem.
Solution. Let H = L₂[0, T] with ⟨x, y⟩ = ∫₀ᵀ x(t) y(t) dt.
Now h(T, ·) ∈ L₂[0, T] since φ(t) ≜ h(T, t) is a continuous function on the compact set [0, T]. Thus
y_f = y(0) + ∫₀ᵀ h(T, τ) u(τ) dτ = y(0) + ⟨h(T, ·), u(·)⟩,
so we have the following constraint set (with codimension = 1):
V = { u ∈ L₂[0, T] : y(T) = y_f } = { u ∈ L₂[0, T] : y_f − y(0) = ⟨h(T, ·), u(·)⟩ },
where y_f − y(0) plays the role of "c₁" and h(T, ·) plays the role of "y₁" above. So in that notation our problem is
min_{u∈V} ‖u‖.
By a previous theorem, there is a unique solution u⋆(t) that satisfies u⋆(t) = β h(T, t), where
β ⟨h(T, ·), h(T, ·)⟩ = y_f − y(0) (normal equation),
so the general solution is
u⋆(t) = ( (y_f − y(0)) / ∫₀ᵀ h²(T, τ) dτ ) h(T, t).
Interestingly, the system itself, through h(T, t), determines the form of the solution; this particular constraint affects only a scale factor.

3.7 Fourier series

Recall that an infinite series of the form Σ_{i=1}^{∞} xᵢ is said to converge to x in a normed space iff the sequence of partial sums sₙ = Σ_{i=1}^{n} xᵢ converges to x, in which case we write x = Σ_{i=1}^{∞} xᵢ as shorthand for x = lim_{n→∞} Σ_{i=1}^{n} xᵢ.
Lemma. If {eᵢ} is an orthonormal sequence in an inner product space, then
‖ Σ_{i=1}^{n} cᵢ eᵢ ‖² = Σ_{i=1}^{n} |cᵢ|², ∀cᵢ ∈ F, ∀n ∈ ℕ.

This lemma is a form of Parseval's relationship. The following theorem gives necessary and sufficient conditions for convergence of an infinite series of orthogonal vectors.

Theorem. If {eᵢ} is an orthonormal sequence in a Hilbert space H, then a series of the form Σ_{i=1}^{∞} cᵢ eᵢ converges to some x ∈ H iff Σ_{i=1}^{∞} |cᵢ|² < ∞. In that case, we have cᵢ = ⟨x, eᵢ⟩.

Proof. Let sₙ = Σ_{i=1}^{n} cᵢ eᵢ and βₙ = Σ_{i=1}^{n} |cᵢ|². Note {βₙ} is an increasing sequence.
Claim. {sₙ} is Cauchy iff Σ_{i=1}^{∞} |cᵢ|² < ∞. Indeed, for m > n, ‖sₘ − sₙ‖² = Σ_{i=n+1}^{m} |cᵢ|² = βₘ − βₙ, so {sₙ} is Cauchy iff {βₙ} is Cauchy, i.e., iff the increasing sequence {βₙ} converges. Since H is complete, the series converges iff {sₙ} is Cauchy.
Finally, by continuity of the inner product,
⟨x, eᵢ⟩ = ⟨lim_{n→∞} sₙ, eᵢ⟩ = lim_{n→∞} ⟨sₙ, eᵢ⟩ = lim_{n→∞} cᵢ = cᵢ. □
This "continuity of the inner product" technique is very useful in such proofs.
The cᵢ = ⟨x, eᵢ⟩ values are called the Fourier coefficients of x w.r.t. {eᵢ}.
Remark. The above theorem does not yet ensure that any x in H can be written as a Fourier series. That will come soon though.

Lemma. (Bessel's inequality) If x is an element of an inner product space and {eᵢ} is an orthonormal sequence in that space, then
Σ_{i=1}^{∞} |⟨x, eᵢ⟩|² ≤ ‖x‖².
Proof. Let cᵢ = ⟨x, eᵢ⟩.
0 ≤ ‖ x − Σ_{i=1}^{n} cᵢ eᵢ ‖² = ⟨ x − Σ_{i=1}^{n} cᵢ eᵢ , x − Σ_{i=1}^{n} cᵢ eᵢ ⟩ = ‖x‖² − Σ_{i=1}^{n} |cᵢ|²,
so ∀n ∈ ℕ, Σ_{i=1}^{n} |cᵢ|² ≤ ‖x‖², so Σ_{i=1}^{∞} |cᵢ|² ≤ ‖x‖². □
Remark. Bessel's inequality guarantees that in a Hilbert space H, there exists x̂ ∈ H such that x̂ = Σ_{i=1}^{∞} ⟨x, eᵢ⟩ eᵢ, thanks to the preceding theorem.
Now we need to characterize x̂.

Theorem. If x is an element of a Hilbert space H, and {eᵢ} is an orthonormal sequence in H, then
x̂ ≜ Σ_{i=1}^{∞} ⟨x, eᵢ⟩ eᵢ ∈ M ≜ closure([{eᵢ}_{i=1}^{∞}]),
which is called the closed subspace generated by the eᵢ's. Furthermore, x − x̂ ⊥ M.

Why do we need the closure above? ??
Proof. Convergence of the series follows from Bessel's inequality and the preceding theorem.
Clearly x̂ ∈ M since x̂ is the limit of partial sums sₙ = Σ_{i=1}^{n} cᵢ eᵢ ∈ [{eᵢ}_{i=1}^{∞}], where cᵢ = ⟨x, eᵢ⟩.
By continuity of the inner product:
⟨x − x̂, eᵢ⟩ = ⟨x − lim_{n→∞} sₙ, eᵢ⟩ = lim_{n→∞} ⟨x − sₙ, eᵢ⟩ = lim_{n→∞} (cᵢ − cᵢ) = 0,
so x − x̂ ⊥ [{eᵢ}_{i=1}^{∞}]. Using (f) of the proposition on orthogonal complements, we conclude x − x̂ ⊥ closure([{eᵢ}_{i=1}^{∞}]) = M. □
Corollary. If M is a closed subspace of a Hilbert space H and {eᵢ} is an orthonormal sequence such that M = closure([{eᵢ}_{i=1}^{∞}]), then P : H → M, the orthogonal projection, is given by
P(x) = Σ_{i=1}^{∞} ⟨x, eᵢ⟩ eᵢ.
Now the key question is: when is closure([{eᵢ}_{i=1}^{∞}]) = H? When the closed subspace generated by the eᵢ's is all of H, then we can expand any vector in H as a series of the eᵢ's with the Fourier coefficients.

3.8 Complete orthonormal sequences / countable orthonormal bases

See "review of bases" below.
Definition. An orthonormal sequence {eᵢ} in a Hilbert space H is called complete (Luenberger) or a countable orthonormal basis (Naylor and others) iff the closed subspace generated by the eᵢ's is H, i.e., iff H = closure([{eᵢ}_{i=1}^{∞}]).

Lemma. If {eᵢ} is an orthonormal sequence in a Hilbert space H, then the following are equivalent.
• {eᵢ} is complete.
• The only vector that is orthogonal to all of the eᵢ's is the zero vector, i.e., [∪_{i=1}^{∞} {eᵢ}]^⊥ = {0}.
• Every vector in H has a Fourier series representation, i.e., ∀x ∈ H, x = Σ_{i=1}^{∞} ⟨x, eᵢ⟩ eᵢ.
• ∀x ∈ H, ‖x‖² = Σ_{i=1}^{∞} |⟨x, eᵢ⟩|², which is called Parseval's equality [23, p. 24].
Proof. (Left to reader)
When does a Hilbert space H have a countable orthonormal basis? When (and only when) H is separable [23, p. 21]. For a (complicated) example of a nonseparable Hilbert space, see [3, p. 230].
Example. Completeness of the orthogonal polynomials in L₂[−1, 1].
Sketch of proof. (see text) It suffices to show that the only function that is orthogonal to all the polynomials is the zero function.
Suppose f ⊥ tⁿ for all n = 0, 1, ... for some f ∈ L₂[−1, 1]. Then its integral F (which is continuous) is also orthogonal to the polynomials, by integration by parts.
Use Weierstrass and Cauchy-Schwarz to show that ‖F‖ can be made arbitrarily small by choosing a sufficiently accurate polynomial approximation to F. So F must be zero, and hence f must be zero a.e.
Example. Completeness of the complex exponentials e_k(t) = e^{ıkt}/√(2π) for k = 0, ±1, ±2, ... in L₂[0, 2π]. (see text)
Caution. Equality in L₂ is meant in the L₂ sense, i.e., x = y means x(t) = y(t) a.e.
Remark. Completeness of the complex exponentials does not contradict Gibbs phenomena for truncated Fourier series. Completeness in L₂ implies that ‖x − sₙ‖₂ → 0, but it can still be the case that ‖x − sₙ‖_∞ does not go to zero, and indeed it does not for functions with discontinuities. There is a big difference between convergence in integrated squared error and pointwise convergence.
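The Gibbs remark is easy to see numerically. The following sketch (mine, not from the notes) forms partial Fourier sums of a square wave: the L₂ error shrinks toward zero while the sup-norm error does not.

import numpy as np

t = np.linspace(0, 2 * np.pi, 4096, endpoint=False)
x = np.sign(np.sin(t))                          # discontinuous target signal

def partial_sum(n):
    s = np.zeros_like(t)
    for k in range(1, n + 1, 2):                # odd harmonics of the square wave
        s += (4 / (np.pi * k)) * np.sin(k * t)
    return s

for n in (5, 25, 125):
    err = x - partial_sum(n)
    l2 = np.sqrt(np.mean(err**2) * 2 * np.pi)   # approximate L2[0, 2pi] error norm
    print(n, l2, np.max(np.abs(err)))           # L2 error decreases; sup error does not go to 0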

Remark. The Fourier series coefficients of x(t) = 1{t∈Q} are all zero.

Wavelets

What major post-1969 topic is missing here? Wavelets, of course. There are both orthonormal wavelets and non-orthogonal wavelets.
A set {yₙ} ∈ H (a Hilbert space) is called a frame iff ∃ A > 0, B < ∞ such that
A ‖x‖² ≤ Σₙ |⟨x, yₙ⟩|² ≤ B ‖x‖², ∀x ∈ H.
If the frame bounds A and B are equal, then we call the frame tight. In a tight frame [23, p. 27],
‖x‖² = (1/A) Σₙ |⟨x, yₙ⟩|²
x = (1/A) Σₙ ⟨x, yₙ⟩ yₙ.
Despite how similar this last expression looks to the Fourier series representation for an orthonormal basis, the yₙ's here need not be orthogonal, and in fact may be linearly dependent [23, p. 27, 320], in which case we call it an overcomplete expansion.

Bases

Vector spaces

Definition. A finite sum Σ_{i=1}^{n} αᵢ xᵢ for xᵢ ∈ X and αᵢ ∈ F is called a linear combination.
Definition. If S is a subset of a vector space X, then the span of S is the subspace of linear combinations drawn from S:
[S] = { x ∈ X : x = Σ_{i=1}^{n} αᵢ xᵢ, for xᵢ ∈ S, αᵢ ∈ F, and n ∈ ℕ }.
Definition. A set S of vectors is called linearly independent iff each vector in the set is linearly independent of the others, i.e., ∀x ∈ S, x ∉ [S − {x}]. S can even be uncountable.
Definition. A set S is called a Hamel basis [3, p. 183] for X iff S is linearly independent and [S] = X. Luenberger says "finite set" but Naylor [3, p. 183] and Maddox [2, p. 74] do not. Let us agree to use the above definition, rather than Luenberger's.
If S is a linearly independent set in a vector space X, then there exists a basis B for X such that S ⊆ B [3, p. 184].
Thus, every vector space has a Hamel basis (since the empty set is linearly independent). However, "Hamel basis is not the only concept of basis that arises in analysis. There are concepts of basis that involve topological as well as linear structure. ... In applications involving infinite-dimensional spaces, a useful basis, if one even exists, is usually something other than a Hamel basis. For example, a complete orthonormal set is far more useful in an infinite-dimensional Hilbert space than a Hamel basis" [3, p. 183].

Normed spaces

A (Hamel) basis for a Banach space is either finite or uncountably infinite [3, p. 218].
What is the closure of a span?
• If S is finite, then [S] is finite-dimensional, so closure([S]) = [S].
• If S is countably infinite, i.e., S = ∪_{i=1}^{∞} {xᵢ}, then closure([S]) contains (at least) all the convergent series formed from S:
closure([S]) ⊃ { Σ_{i=1}^{∞} αᵢ xᵢ : αᵢ ∈ F, Σ_{i=1}^{∞} αᵢ xᵢ is convergent }.
An example where closure([S]) contains more than its convergent series is given in Naylor [3, p. 316], where S is a countable collection of linearly independent vectors!

Definition. In a normed space, {xₙ} ∈ X is a Schauder basis for X iff for each x ∈ X, there exists a unique sequence {λₙ} such that x = Σ_{n=1}^{∞} λₙ xₙ [2, p. 98].
The famous Banach conjecture that every separable Banach space has a Schauder basis was shown to be incorrect by Enflo in 1973 [2, p. 100]. So for more satisfactory answers we need to turn to Hilbert spaces, which have better structure.

Inner product spaces Any finite-dimensional inner product space has an orthonormal basis (via Gram-Schmidt).

Definition. An orthonormal set S = {x_α} is maximal iff there is no x₀ ∈ X such that S ∪ {x₀} is an orthonormal set.
Definition. A maximal orthonormal set B in a Hilbert space H is called an orthonormal basis.

Theorem. If {xₙ} is an orthonormal set in a Hilbert space, then [3, p. 307]:
{xₙ} is an orthonormal basis ⟺ x = Σₙ ⟨x, xₙ⟩ xₙ, ∀x ∈ X.

When S = {xₙ} is an orthonormal basis in a Hilbert space H, we have
H = closure([S]) = { Σ_{i=1}^{∞} αᵢ xᵢ : Σ_{i=1}^{∞} |αᵢ|² < ∞ }.
Orthonormal sets can be countable or uncountable, and the previous theorem applies to both cases [3, p. 314]. In engineering we usually work in separable Hilbert spaces (notably ℓ₂ and L₂).

Theorem. A Hilbert space has a countable orthonormal basis iff it is separable [3, p. 314].

In other words: Luenberger's "complete orthonormal sequence" ≡ Naylor's "countable orthonormal basis",
and Naylor's terminology is more common in the signal processing literature, e.g., Vetterli's book. Naylor notes that he deliberately avoids the term "complete" to describe certain orthonormal sets.

Summary

In Hilbert spaces, we have two kinds of bases:
• Hamel bases, which are not very useful (uncountable for infinite-dimensional spaces);
• orthonormal bases, which are extremely useful.

3.12 Minimum distance to a convex set

Thus far we have considered only minimum distances to subspaces and linear varieties. Many applications require minimizing the distance to more general sets, particularly convex sets.

Theorem.
• Let $K$ be a nonempty complete convex subset of an inner product space $\mathcal{X}$ (e.g., $K$ may be a nonempty closed convex subset of a Hilbert space). Then $K$ is Chebyshev: for any $x \in \mathcal{X}$, there exists a unique vector $k_\star \in K$ such that
$$\|x - k_\star\| = d(x, K) = \inf_{k \in K} \|x - k\|.$$
• Let $K$ be a convex Chebyshev subset of an inner product space $\mathcal{X}$. Then the projector $P_K$ for $K$ is characterized in the following necessary and sufficient sense: $k_\star = P_K(x) \iff \operatorname{real}(\langle x - k_\star,\, k - k_\star \rangle) \le 0$ for all $k \in K$.
• Let $K$ be a subset of an inner product space $\mathcal{X}$ with the property that for every $x \in \mathcal{X}$, there exists a unique point $k_\star \in K$ such that $\operatorname{real}(\langle x - k_\star,\, k - k_\star \rangle) \le 0$ for all $k \in K$. Then $K$ is Chebyshev and $k_\star = P_K(x)$.

Proof.

First we prove existence. Let $\{k_i\}$ be a sequence in $K$ such that $\|x - k_i\| \to \delta = d(x, K)$.

Claim 1. $\{k_i\}$ is Cauchy.
By the parallelogram law,
$$\|k_i - k_j\|^2 = \|(k_i - x) - (k_j - x)\|^2 = 2\|k_i - x\|^2 + 2\|k_j - x\|^2 - 4\left\|x - \tfrac{k_i + k_j}{2}\right\|^2.$$
Since $K$ is convex, $\tfrac{k_i + k_j}{2} \in K$, so $\left\|x - \tfrac{k_i + k_j}{2}\right\|^2 \ge \delta^2$.
Thus $\|k_i - k_j\|^2 \le 2\|k_i - x\|^2 + 2\|k_j - x\|^2 - 4\delta^2 \to 0$ as $i, j \to \infty$.
Since $\{k_i\}$ is Cauchy and $K$ is complete, $\{k_i\}$ converges to some $k_\star \in K$.
By continuity of the norm, $\|x - k_\star\| = \lim_{i \to \infty} \|x - k_i\| = \delta$.

Claim 2. $k_\star$ is unique. (Proof by contradiction.)
Suppose $k_1 \in K$ also satisfies $\|x - k_1\| = \delta$. Then the sequence
$$k_n = \begin{cases} k_\star, & n \text{ even} \\ k_1, & n \text{ odd} \end{cases}$$
is Cauchy by the same argument used for Claim 1, so $\{k_n\}$ is convergent, which can only happen if $k_\star = k_1$.

Claim 3. $\operatorname{real}(\langle x - k_\star,\, k - k_\star \rangle) \le 0$ for all $k \in K$. (Proof by contradiction.)
Suppose $\exists\, k \in K$ s.t. $\operatorname{real}(\langle x - k_\star,\, k - k_\star \rangle) = \varepsilon > 0$. Define $k_\alpha = \alpha k + (1 - \alpha) k_\star$ for $\alpha \in [0, 1]$, which lies in $K$ since $K$ is convex.
Define $f(\alpha) = \|x - k_\alpha\|^2$, so $f(0) = \delta^2$. Now
$$f(\alpha) = \|x - \alpha k - (1 - \alpha) k_\star\|^2 = \|(x - k_\star) - \alpha(k - k_\star)\|^2 = \delta^2 + \alpha^2 \|k - k_\star\|^2 - 2\alpha \operatorname{real}(\langle x - k_\star,\, k - k_\star \rangle)$$
and
$$\left.\frac{d}{d\alpha} f(\alpha)\right|_{\alpha=0} = \left.\Bigl[ 2\alpha \|k - k_\star\|^2 - 2 \operatorname{real}(\langle x - k_\star,\, k - k_\star \rangle) \Bigr]\right|_{\alpha=0} = -2\varepsilon < 0.$$
Thus $\exists\, \alpha > 0$ s.t. $f(\alpha) < f(0) = \delta^2$, contradicting the minimizing norm property of $k_\star$.

Claim 4. If $\operatorname{real}(\langle x - k_\star,\, k - k_\star \rangle) \le 0$ for some $k \in K$, then $\|x - k_\star\| \le \|x - k\|$. (So "is characterized by" means "iff.")
$$\|x - k\|^2 = \|x - k_\star - (k - k_\star)\|^2 = \|x - k_\star\|^2 + \|k - k_\star\|^2 - 2\operatorname{real}(\langle x - k_\star,\, k - k_\star \rangle) \ge \|x - k_\star\|^2. \qquad \square$$

Remark.
• The convexity of $K$ was used for both the existence and characterization parts.
• However, Claim 4 did not use convexity, which explains the last item in the Theorem.
• In this case the characterization is an inequality, which is usually harder to work with.
• Although $P_K$ exists since $K$ is Chebyshev, we do not have a general formula for it other than the "characterization" inequality.
• Can we generalize further? No. In any finite-dimensional inner product space, $K$ Chebyshev $\implies$ $K$ closed, convex, and nonempty.

Revisiting subspaces

What happens when $K$ is in fact a subspace? Then for any $k_0 \in K$ we can pick $k = k_\star - k_0 \in K$ and $k = k_\star + k_0 \in K$ to show that $\operatorname{real}(\langle x - k_\star,\, k_0 \rangle) = 0$. For complex subspaces we can also pick $k = k_\star - \imath k_0 \in K$ to show that the imaginary part is zero, to conclude that $x - k_\star \perp K$. Conversely, if $k_\star \in K$ and $x - k_\star \perp K$, then $\langle x - k_\star,\, k - k_\star \rangle = 0$ for all $k \in K$. So we could have presented convex sets first, and then specialized to subspaces.

Example. See text p. 71 for a somewhat unsatisfying example involving nonnegativity constraints.

Example. In $\mathbb{R}^2$, consider the half-plane $K = \{(a, b) \in \mathbb{R}^2 : a + b \le 0\}$, which is a closed convex set. By sketching this set and using geometric intuition, one can conjecture that the projector is given by
$$P_K((a, b)) = (a - p,\, b - p), \quad\text{where } p = \left[\frac{a + b}{2}\right]_+ \quad\text{and}\quad [x]_+ = \begin{cases} x, & x > 0 \\ 0, & \text{otherwise.} \end{cases}$$

Note that $[x]_+ \,(x - [x]_+) = 0$.
To verify that the above projector is correct, we can check it against the characterization condition in the preceding Theorem.

If $x = (a, b)$ and $k_\star = P_K(x)$, then
$$\begin{aligned}
\langle x - k_\star,\, k - k_\star \rangle
&= \langle (a, b) - (a - p,\, b - p),\ (k_1, k_2) - (a - p,\, b - p) \rangle
 = \langle (p, p),\ (k_1 - a + p,\, k_2 - b + p) \rangle \\
&= 2p^2 - p(a + b) + p(k_1 + k_2) \\
&\le 2p^2 - p(a + b) \quad\text{since } k \in K \implies k_1 + k_2 \le 0 \\
&= 2p\left(p - \frac{a + b}{2}\right) = 0.
\end{aligned}$$
Thus the above projector is correct.
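One can also check this projector numerically. The sketch below (my own, using random test points) confirms both the minimum-distance property and the characterization inequality for the half-plane example.

    import numpy as np

    def proj_halfplane(x):
        # P_K for K = {(a, b) : a + b <= 0}, using the formula conjectured above
        p = max((x[0] + x[1]) / 2.0, 0.0)
        return np.array([x[0] - p, x[1] - p])

    rng = np.random.default_rng(2)
    x = rng.standard_normal(2)
    k_star = proj_halfplane(x)
    samples = rng.standard_normal((1000, 2)) * 3.0
    K_pts = samples[samples.sum(axis=1) <= 0]                       # random points of K
    dists = np.linalg.norm(x - K_pts, axis=1)
    print(np.all(dists >= np.linalg.norm(x - k_star) - 1e-12))      # k_star is the closest point -> True
    print(np.all((K_pts - k_star) @ (x - k_star) <= 1e-12))         # <x - k_star, k - k_star> <= 0 -> True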

Projection onto convex sets (POCS)

In light of the previous theorem, if $K$ is a nonempty closed convex subset of a Hilbert space $\mathcal{H}$, then $K$ is Chebyshev and we can legitimately define a projection operator $P_K : \mathcal{H} \to K$ by
$$P_K(x) = \arg\min_{k \in K} \|x - k\|.$$

It is not an orthogonal projection in general, i.e., in general $x - P_K(x) \not\perp K$ (unless of course $K$ happens to be a subspace). Indeed, in general this projection inherits only the trivial properties of projectors given previously:
• $P_K(x) \in K$
• $P_K(P_K(x)) = P_K(x)$
• $\|x - P_K(x)\| = d(x, K)$
In addition, $P_K(\cdot)$ is continuous [24] (homework problem).
There are a variety of convex sets of interest in signal and image processing problems, such as:
• The subspace of band-limited signals with a given band-limit.
• The set of nonnegative signals.
• The subspace of signals with a given time or spatial support.
Because of such examples, POCS methods are (somewhat) useful in signal processing. A typical problem would be "find the signal with a given spatial support whose spectrum is given over certain frequency ranges only."

Example. In $\mathbb{R}^n$ consider $K = \{x : x_j \ge 0,\ j = 1, \ldots, n\}$. Then if $\hat{k} = P_K(x)$ we have
$$\hat{k}_j = \begin{cases} x_j, & x_j \ge 0 \\ 0, & \text{otherwise.} \end{cases}$$
If $k \in K$, then $\langle x - \hat{k},\, k - \hat{k} \rangle = \sum_{j=1}^{n} (x_j - \hat{k}_j)(k_j - \hat{k}_j) = \sum_{j : x_j < 0} (x_j - 0)(k_j - 0) \le 0$, as required, since $k_j \ge 0$.
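To illustrate the POCS idea mentioned above, here is a minimal alternating-projection sketch (my own choice of constraint sets: the nonnegativity cone from the preceding example and the subspace of signals band-limited to a few low DFT frequencies). When the intersection of the two sets is nonempty, the iterates approach a signal satisfying both constraints.

    import numpy as np

    def proj_nonneg(x):
        # projection onto the closed convex cone {x : x_j >= 0}, as in the example above
        return np.maximum(x, 0.0)

    def proj_bandlimited(x, keep):
        # orthogonal projection onto the subspace of signals whose DFT is supported on `keep`
        X = np.fft.fft(x)
        Y = np.zeros_like(X)
        Y[keep] = X[keep]
        return np.real(np.fft.ifft(Y))

    n = 64
    keep = np.r_[0:4, n - 3:n]                   # conjugate-symmetric low-pass band, so the projection stays real
    x = np.random.default_rng(3).standard_normal(n)
    for _ in range(500):                          # POCS: alternate the two projections
        x = proj_nonneg(proj_bandlimited(x, keep))
    print(np.min(x) >= 0)                                         # True: x is in the nonnegativity set
    print(np.linalg.norm(x - proj_bandlimited(x, keep)))          # small, and shrinking with more iterations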

Summary

Inner products and properties
• Cauchy-Schwarz
• induced norm
• parallelogram law
• continuity

Orthogonality
• Pythagorean theorem
• orthogonal complements
• direct sum
• orthogonal sets
• Gram-Schmidt procedure
• orthonormal bases

Minimum norm problems
• (orthogonal) projections onto subspaces
• normal equations
• Fourier series
• complete orthonormal sequences (countable orthonormal bases)
• minimum norm within linear variety (constraints)
• minimum distance to convex sets

1. P. Enflo. A counterexample to the approximation problem in Banach spaces. Acta Math., 130:309–17, 1973.
2. I. J. Maddox. Elements of functional analysis. Cambridge, 2nd edition, 1988.
3. A. W. Naylor and G. R. Sell. Linear operator theory in engineering and science. Springer-Verlag, New York, 2nd edition, 1982.
4. D. G. Luenberger. Optimization by vector space methods. Wiley, New York, 1969.
5. J. Schauder. Zur Theorie stetiger Abbildungen in Funktionenräumen. Math. Zeitsch., 26:47–65, 1927.
6. A. M. Ostrowski. Solution of equations in Euclidean and Banach spaces. Academic, 3rd edition, 1973.
7. R. R. Meyer. Sufficient conditions for the convergence of monotonic mathematical programming algorithms. J. Comput. System. Sci., 12(1):108–21, 1976.
8. M. Rosenlicht. Introduction to analysis. Dover, New York, 1985.
9. A. R. De Pierro. On the relation between the ISRA and the EM algorithm for positron emission tomography. IEEE Tr. Med. Im., 12(2):328–33, June 1993.
10. A. R. De Pierro. On the convergence of the iterative image space reconstruction algorithm for volume ECT. IEEE Tr. Med. Im., 6(2):174–175, June 1987.
11. A. R. De Pierro. Unified approach to regularized maximum likelihood estimation in computed tomography. In Proc. SPIE 3171, Comp. Exper. and Num. Meth. for Solving Ill-Posed Inv. Imaging Problems: Med. and Nonmed. Appl., pages 218–23, 1997.
12. J. A. Fessler. Grouped coordinate descent algorithms for robust edge-preserving image restoration. In Proc. SPIE 3170, Im. Recon. and Restor. II, pages 184–94, 1997.
13. A. R. De Pierro. A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography. IEEE Tr. Med. Im., 14(1):132–137, March 1995.
14. J. A. Fessler and A. O. Hero. Penalized maximum-likelihood image reconstruction using space-alternating generalized EM algorithms. IEEE Tr. Im. Proc., 4(10):1417–29, October 1995.
15. M. W. Jacobson and J. A. Fessler. Properties of optimization transfer algorithms on convex feasible sets. SIAM J. Optim., 2003. Submitted.
16. P. L. Combettes and H. J. Trussell. Method of successive projections for finding a common point of sets in metric spaces. J. Optim. Theory Appl., 67(3):487–507, December 1990.
17. F. Deutsch. The convexity of Chebyshev sets in Hilbert space. In Th. M. Rassias, H. M. Srivastava, and A. Yanushauskas, editors, Topics in polynomials of one and several variables and their applications, pages 143–50. World Sci. Publishing, River Edge, NJ, 1993.
18. M. Jiang. On Johnson's example of a nonconvex Chebyshev set. J. Approx. Theory, 74(2):152–8, August 1993.
19. V. S. Balaganskii and L. P. Vlasov. The problem of convexity of Chebyshev sets. Russian Mathematical Surveys, 51(6):1127–90, November 1996.
20. V. Kanellopoulos. On the convexity of the weakly compact Chebyshev sets in Banach spaces. Israel Journal of Mathematics, 117:61–9, 2000.
21. A. R. Alimov. On the structure of the complements of Chebyshev sets. Functional Analysis and Its Applications, 35(3):176–82, July 2001.
22. Y. Bresler, S. Basu, and C. Couvreur. Hilbert spaces and least squares methods for signal processing, 2000. Draft.
23. M. Vetterli and J. Kovacevic. Wavelets and subband coding. Prentice-Hall, New York, 1995.
24. D. C. Youla and H. Webb. Image restoration by the method of convex projections: Part I—Theory. IEEE Tr. Med. Im., 1(2):81–94, October 1982.