Mollifiers. From R. Showalter’s book on PDEs.

(i) Mollifiers are $C_0^\infty$ approximations of the delta-function.

(ii) Mollification is a convolution operator with a mollifier:
\[ f_\varepsilon(x) = \int_{\mathbb{R}^n} f(x-y)\,\varphi_\varepsilon(y)\,dy, \quad x \in \mathbb{R}^n. \]

(iii) Mollification is a smoothing operator: $f_\varepsilon \in C^\infty(\mathbb{R}^n)$.

(iv) Mollification is a bounded linear operator with norm $\le 1$.

(v) Any function in $L^p(G)$, $p \ne \infty$, or in $C_0(G)$ can be approximated by its mollifications. Hence $C_0^\infty(G)$ is dense in $L^p(G)$ and in $C_0(G)$.

We shall begin with some elementary results concerning the approximation of functions by very smooth functions. The "strong inclusion" $K \subset\subset G$ between subsets of Euclidean space $\mathbb{R}^n$ means $K$ is compact, $G$ is open, and $K \subset G$. If $A$ and $B$ are sets, their Cartesian product is given by $A \times B = \{(a,b) : a \in A,\ b \in B\}$. If $A$ and $B$ are subsets of $\mathbb{K}^n$ (or any other vector space), their set sum is $A + B = \{a + b : a \in A,\ b \in B\}$. If $G$ is not open, this definition can be extended: if there exists an open set $O$ such that $\bar O = \bar G$, we say $K \subset\subset G$ if $K \subset\subset O$. For each $\varepsilon > 0$, let $\varphi_\varepsilon \in C_0^\infty(\mathbb{R}^n)$ be given with the properties
\[ \varphi_\varepsilon \ge 0, \qquad \operatorname{supp}(\varphi_\varepsilon) \subset \{x \in \mathbb{R}^n : |x| \le \varepsilon\}, \qquad \int \varphi_\varepsilon = 1. \]
Such functions are called mollifiers and can be constructed, for example, by taking an appropriate multiple of

\[ \psi_\varepsilon(x) = \begin{cases} \exp\bigl((|x|^2 - \varepsilon^2)^{-1}\bigr), & |x| < \varepsilon, \\ 0, & |x| \ge \varepsilon. \end{cases} \]
Let $f \in L^1(G)$, where $G$ is open in $\mathbb{R}^n$, and suppose that the support of $f$ satisfies $\operatorname{supp}(f) \subset\subset G$. Then the distance from $\operatorname{supp}(f)$ to $\partial G$ is a positive number $\delta$. We extend $f$ as zero on the complement of $G$ and denote the extension in $L^1(\mathbb{R}^n)$ also by $f$. Define for each $\varepsilon > 0$ the mollified function

\[ f_\varepsilon(x) = \int_{\mathbb{R}^n} f(x-y)\,\varphi_\varepsilon(y)\,dy, \quad x \in \mathbb{R}^n. \tag{0.1} \]
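As a quick numerical sketch (not from the book), the bump $\psi_\varepsilon$ above can be normalized into a mollifier $\varphi_\varepsilon$ by one-dimensional quadrature; the grid size and midpoint rule here are arbitrary choices:

```python
import math

def psi(x, eps):
    """Unnormalized bump: exp((|x|^2 - eps^2)^(-1)) for |x| < eps, else 0 (1-D)."""
    return math.exp(1.0 / (x * x - eps * eps)) if abs(x) < eps else 0.0

def mollifier(eps, n=4000):
    """Return phi_eps = c * psi_eps with c chosen so that the integral is 1."""
    h = 2 * eps / n
    xs = [-eps + (i + 0.5) * h for i in range(n)]  # midpoint rule on [-eps, eps]
    c = 1.0 / (sum(psi(x, eps) for x in xs) * h)
    return lambda x: c * psi(x, eps)

eps = 0.5
phi = mollifier(eps)
h = 2 * eps / 4000
total = sum(phi(-eps + (i + 0.5) * h) for i in range(4000)) * h

assert abs(total - 1.0) < 1e-9              # integral of phi_eps is 1
assert phi(eps) == 0.0 and phi(0.6) == 0.0  # supp(phi_eps) inside {|x| <= eps}
assert phi(0.0) > 0.0
```

The three defining properties ($\varphi_\varepsilon \ge 0$, support in the $\varepsilon$-ball, unit integral) are exactly what the assertions check.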

Lemma 0.1. For each $\varepsilon > 0$, $\operatorname{supp}(f_\varepsilon) \subset \operatorname{supp}(f) + \{y : |y| \le \varepsilon\}$ and $f_\varepsilon \in C^\infty(\mathbb{R}^n)$.

Proof: The second result follows from Leibniz' rule and the representation

\[ f_\varepsilon(x) = \int f(s)\,\varphi_\varepsilon(x-s)\,ds. \]

The first follows from the observation that $f_\varepsilon(x) \ne 0$ only if $x \in \operatorname{supp}(f) + \{y : |y| \le \varepsilon\}$. Since $\operatorname{supp}(f)$ is closed and $\{y : |y| \le \varepsilon\}$ is compact, it follows that the indicated set sum is closed and, hence, contains $\operatorname{supp}(f_\varepsilon)$.

Lemma 0.2. If $f \in C_0(G)$, then $f_\varepsilon \to f$ uniformly on $G$. If $f \in L^p(G)$, $1 \le p < \infty$, then $\|f_\varepsilon\|_{L^p(G)} \le \|f\|_{L^p(G)}$ and $f_\varepsilon \to f$ in $L^p(G)$.

Proof: The first result follows from the estimate

\[ |f_\varepsilon(x) - f(x)| \le \int |f(x-y) - f(x)|\,\varphi_\varepsilon(y)\,dy \le \sup\{|f(x-y) - f(x)| : x \in \operatorname{supp}(f),\ |y| \le \varepsilon\} \]
and the uniform continuity of $f$ on its support. For the case $p = 1$ we obtain

\[ \|f_\varepsilon\|_{L^1(G)} \le \iint |f(x-y)|\,\varphi_\varepsilon(y)\,dy\,dx = \int \varphi_\varepsilon \cdot \int |f| \]
by Fubini's theorem, since $\int |f(x-y)|\,dx = \int |f|$ for each $y \in \mathbb{R}^n$, and this gives the desired estimate. If $p = 2$ we have for each $\psi \in C_0(G)$

\[ \Bigl|\int f_\varepsilon(x)\psi(x)\,dx\Bigr| \le \iint |f(x-y)\psi(x)|\,dx\,\varphi_\varepsilon(y)\,dy \]

\[ \le \int \|f\|_{L^2(G)}\|\psi\|_{L^2(G)}\,\varphi_\varepsilon(y)\,dy = \|f\|_{L^2(G)}\|\psi\|_{L^2(G)} \]
by computations similar to the above, and the result follows since $C_0(G)$ is dense in $L^2(G)$. (The result for $p \ne 1, 2$ is proved as above, using the Hölder inequality in place of Cauchy–Schwarz.) Finally we verify the claim of convergence in $L^p(G)$. If $\eta > 0$ we have a $g \in C_0(G)$ with $\|f - g\|_{L^p} \le \eta/3$. The above shows $\|f_\varepsilon - g_\varepsilon\|_{L^p} \le \eta/3$ and we obtain

\[ \|f_\varepsilon - f\|_{L^p} \le \|f_\varepsilon - g_\varepsilon\|_{L^p} + \|g_\varepsilon - g\|_{L^p} + \|g - f\|_{L^p} \le 2\eta/3 + \|g_\varepsilon - g\|_{L^p}. \]
For $\varepsilon$ sufficiently small, the support of $g_\varepsilon - g$ is bounded (uniformly) and $g_\varepsilon \to g$ uniformly, so the last term converges to zero as $\varepsilon \to 0$.
The preceding results imply the following.

Theorem 0.3. $C_0^\infty(G)$ is dense in $L^p(G)$.

Theorem 0.4. For every $K \subset\subset G$ there is a $\varphi \in C_0^\infty(G)$ such that $0 \le \varphi(x) \le 1$, $x \in G$, and $\varphi(x) = 1$ for all $x$ in some neighborhood of $K$.

Proof: Let $\delta$ be the distance from $K$ to $\partial G$ and $0 < \varepsilon < \varepsilon + \varepsilon_0 < \delta$. Let $f(x) = 1$ if $\operatorname{dist}(x, K) \le \varepsilon_0$ and $f(x) = 0$ otherwise. Then $f_\varepsilon$ has its support within $\{x : \operatorname{dist}(x, K) \le \varepsilon + \varepsilon_0\}$ and it equals $1$ on $\{x : \operatorname{dist}(x, K) \le \varepsilon_0 - \varepsilon\}$, so the result follows if $\varepsilon < \varepsilon_0$.
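The proof of Theorem 0.4 is constructive, and a one-dimensional numerical sketch (with the assumed illustrative choices $K = [-1,1]$, $\varepsilon_0 = 0.25$, $\varepsilon = 0.1$, not from the book) shows the mollified indicator behaving as claimed:

```python
import math

def make_mollifier(eps, n=2000):
    """Normalized 1-D mollifier phi_eps on [-eps, eps] (midpoint-rule constant)."""
    def psi(t):
        return math.exp(1.0 / (t * t - eps * eps)) if abs(t) < eps else 0.0
    h = 2 * eps / n
    ys = [-eps + (i + 0.5) * h for i in range(n)]
    c = 1.0 / (sum(psi(y) for y in ys) * h)
    return ys, h, lambda t: c * psi(t)

def cutoff(x, eps=0.1, eps0=0.25):
    """(f * phi_eps)(x) with f the indicator of {dist(., K) <= eps0}, K = [-1, 1]."""
    ys, h, phi = make_mollifier(eps)
    def f(t):
        return 1.0 if max(abs(t) - 1.0, 0.0) <= eps0 else 0.0
    return sum(f(x - y) * phi(y) for y in ys) * h

assert abs(cutoff(0.0) - 1.0) < 1e-9     # equals 1 on a neighborhood of K
assert cutoff(2.0) == 0.0                # vanishes outside the eps0 + eps collar
assert 0.0 <= cutoff(1.3) <= 1.0 + 1e-9  # values stay within [0, 1]
```

The transition region $\{x : \varepsilon_0 - \varepsilon < \operatorname{dist}(x,K) < \varepsilon_0 + \varepsilon\}$ is exactly where the cutoff interpolates smoothly between $1$ and $0$.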

Distributions. From R. Showalter’s book on PDEs.

(i) Distributions are linear functionals on $C_0^\infty(G)$.

(ii) For a ‘nice’ function $f$, the associated distribution is defined by the integral $T_f(\varphi) = \int_G f\bar\varphi$.

(iii) A fundamental example of a distribution is the delta-function.

(iv) Distributions are not bounded functionals, but they are closed functionals.

(v) A derivative of a distribution is a distribution, well-defined via integration by parts.

(vi) A derivative of a distribution differs, in general, from a classical (pointwise) derivative.

Recall

\[ C^m(G) = \{f \in C(G) : D^\alpha f \in C(G) \text{ for all } |\alpha| \le m\}, \]
\[ C^\infty(G) = \{f \in C(G) : D^\alpha f \in C(G) \text{ for all } \alpha\}, \]
\[ C_0^\infty(G) = \{f \in C^\infty(G) : \operatorname{supp}(f) \text{ is compact}\}. \]
All of them are linear spaces. $C^m(G)$ is also a Banach space (check) with the norm

\[ \|f\|_{C^m(G)} = \max_{|\alpha| \le m} \|D^\alpha f\|_{C(G)}, \qquad \|f\|_{C(G)} = \sup_{x \in G} |f(x)|. \]

$C^\infty(G)$ is not a Banach space, but it is a complete metric space with the metric
\[ \rho(f,g) = \sum_{n=1}^{\infty} 2^{-n}\,\frac{\|f-g\|_{C^n(G)}}{1 + \|f-g\|_{C^n(G)}}. \]
$C_0^\infty(G)$ is not even metrizable, but there is a (nonconstructive) locally convex topology, built from spaces like $C^\infty(G)$, that makes it complete.

0.1

A distribution on $G$ is defined to be a conjugate-linear functional on $C_0^\infty(G)$. That is, $C_0^\infty(G)^*$ is the linear space of distributions on $G$, and we also denote it by $\mathcal{D}^*(G)$.

Example. The space $L^1_{loc}(G) = \cap\{L^1(K) : K \subset\subset G\}$ of locally integrable functions on $G$ can be identified with a subspace of the distributions on $G$ as in the Example of I.1.5. That is, $f \in L^1_{loc}(G)$ is assigned the distribution $T_f \in C_0^\infty(G)^*$ defined by

\[ T_f(\varphi) = \int_G f\bar\varphi, \quad \varphi \in C_0^\infty(G), \tag{0.2} \]
where the Lebesgue integral (over the support of $\varphi$) is used. Theorem 0.3 shows that $T : L^1_{loc}(G) \to C_0^\infty(G)^*$ is an injection. In particular, the (equivalence classes of) functions in either of $L^1(G)$ or $L^2(G)$ will be identified with a subspace of $\mathcal{D}^*(G)$.

0.2
We shall define the derivative of a distribution in such a way that it agrees with the usual notion of derivative on those distributions which arise from continuously differentiable functions. That is, we want to define $\partial^\alpha : \mathcal{D}^*(G) \to \mathcal{D}^*(G)$ so that
\[ \partial^\alpha(T_f) = T_{D^\alpha f}, \quad |\alpha| \le m,\ f \in C^m(G). \]
But a computation with integration-by-parts gives

\[ T_{D^\alpha f}(\varphi) = (-1)^{|\alpha|}\,T_f(D^\alpha\varphi), \quad \varphi \in C_0^\infty(G), \]
and this identity suggests the following.

Definition. The $\alpha$th partial derivative of the distribution $T$ is the distribution $\partial^\alpha T$ defined by

\[ \partial^\alpha T(\varphi) = (-1)^{|\alpha|}\,T(D^\alpha\varphi), \quad \varphi \in C_0^\infty(G). \tag{0.3} \]
Since $D^\alpha \in \mathcal{L}(C_0^\infty(G), C_0^\infty(G))$, it follows that $\partial^\alpha T$ is linear. Every distribution has derivatives of all orders, and so also then does every function, e.g., in $L^1_{loc}(G)$, when it is identified as a distribution. Furthermore, by the very definition of the derivative $\partial^\alpha$ it is clear that $\partial^\alpha$ and $D^\alpha$ are compatible with the identification of $C^\infty(G)$ in $\mathcal{D}^*(G)$.

0.3
We give some examples of distributions on $\mathbb{R}$. Since we do not distinguish the function $f \in L^1_{loc}(\mathbb{R})$ from the functional $T_f$, we have the identity
\[ f(\varphi) = \int_{-\infty}^{\infty} f(x)\varphi(x)\,dx, \quad \varphi \in C_0^\infty(\mathbb{R}). \]
(a) If $f \in C^1(\mathbb{R})$, then

\[ \partial f(\varphi) = -f(D\varphi) = -\int f\,(\overline{D\varphi}) = \int (Df)\,\bar\varphi = Df(\varphi), \tag{0.4} \]
where the third equality follows by an integration-by-parts and all others are definitions. Thus, $\partial f = Df$, which is no surprise since the definition of the derivative of distributions was rigged to make this so.
(b) Let the ramp and Heaviside functions be given respectively by

\[ r(x) = \begin{cases} x, & x > 0, \\ 0, & x \le 0, \end{cases} \qquad H(x) = \begin{cases} 1, & x > 0, \\ 0, & x < 0. \end{cases} \]
Then we have

\[ \partial r(\varphi) = -\int_0^\infty x\,D\bar\varphi(x)\,dx = \int_{-\infty}^{\infty} H(x)\,\bar\varphi(x)\,dx = H(\varphi), \quad \varphi \in C_0^\infty(\mathbb{R}), \]
so we have $\partial r = H$, although $Dr(0)$ does not exist.
(c) The derivative of the non-continuous $H$ is given by

\[ \partial H(\varphi) = -\int_0^\infty D\bar\varphi = \bar\varphi(0) = \delta(\varphi), \quad \varphi \in C_0^\infty(\mathbb{R}); \]
that is, $\partial H = \delta$, the Dirac functional. Also, it follows directly from the definition of derivative that

\[ \partial^m\delta(\varphi) = (-1)^m (D^m\varphi)(0), \quad m \ge 1. \]
(d) Letting $A(x) = |x|$ and $I(x) = x$, $x \in \mathbb{R}$, we observe that $A = 2r - I$ and then from the above obtain by linearity

\[ \partial A = 2H - 1, \qquad \partial^2 A = 2\delta. \tag{0.5} \]
Of course, these could be computed directly from definitions.
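Both identities $\partial H = \delta$ and $\partial^2 A = 2\delta$ can be checked numerically against a test function. The sketch below (not from the book) uses $\varphi(x) = e^{-x^2}$, a rapidly decaying stand-in for a compactly supported test function, and a midpoint-rule quadrature:

```python
import math

phi   = lambda x: math.exp(-x * x)                     # test function
dphi  = lambda x: -2 * x * math.exp(-x * x)            # phi'
d2phi = lambda x: (4 * x * x - 2) * math.exp(-x * x)   # phi''

def integrate(g, a=-8.0, b=8.0, n=50_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

H = lambda x: 1.0 if x > 0 else 0.0
A = abs

dH_phi  = -integrate(lambda x: H(x) * dphi(x))   # dH(phi)  = -int H phi'
d2A_phi =  integrate(lambda x: A(x) * d2phi(x))  # d2A(phi) = +int |x| phi''

assert abs(dH_phi - phi(0)) < 1e-6        # dH = delta:    value phi(0)
assert abs(d2A_phi - 2 * phi(0)) < 1e-6   # d2A = 2 delta: value 2 phi(0)
```

The signs follow definition (0.3): one derivative contributes one factor $(-1)$, two derivatives contribute $(-1)^2$.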

(e) For our final example, let $f : \mathbb{R} \to \mathbb{K}$ satisfy $f|_{\mathbb{R}^-} \in C^\infty(-\infty, 0]$, $f|_{\mathbb{R}^+} \in C^\infty[0, \infty)$, and denote the jump in the various derivatives at $0$ by
\[ \sigma_m(f) = D^m f(0^+) - D^m f(0^-), \quad m \ge 0. \]
Then we obtain

\[ \partial f(\varphi) = -\int_{-\infty}^0 f\,(\overline{D\varphi}) - \int_0^\infty f\,(\overline{D\varphi}) \tag{0.6} \]
\[ = \int_{-\infty}^0 (Df)\,\bar\varphi + \int_0^\infty (Df)\,\bar\varphi + f(0^+)\,\bar\varphi(0) - f(0^-)\,\bar\varphi(0) = Df(\varphi) + \sigma_0(f)\,\delta(\varphi), \quad \varphi \in C_0^\infty(\mathbb{R}). \]

That is, ∂f = Df + σ0(f)δ, and the derivatives of higher order can be computed from this formula, e.g.,

\[ \partial^2 f = D^2 f + \sigma_1(f)\,\delta + \sigma_0(f)\,\partial\delta, \qquad \partial^3 f = D^3 f + \sigma_2(f)\,\delta + \sigma_1(f)\,\partial\delta + \sigma_0(f)\,\partial^2\delta. \]

For example, we have

\[ \partial(H\cdot\sin) = H\cdot\cos, \qquad \partial(H\cdot\cos) = -H\cdot\sin + \delta, \]
so $H\cdot\sin$ is a (generalized) solution of the ordinary differential equation
\[ (\partial^2 + 1)y = \delta. \]
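That $H\cdot\sin$ solves $(\partial^2 + 1)y = \delta$ can be tested numerically: by the definition of the distributional derivative, $(\partial^2 + 1)y(\varphi) = \int_0^\infty \sin x\,(\varphi'' + \varphi)\,dx$ should equal $\varphi(0)$ for any test function. A sketch (not from the book) with $\varphi(x) = e^{-x^2}$:

```python
import math

phi   = lambda x: math.exp(-x * x)                     # rapidly decaying test function
d2phi = lambda x: (4 * x * x - 2) * math.exp(-x * x)   # phi''

def integrate(g, a, b, n=50_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# (d^2 + 1)(H sin) applied to phi: the integral runs over supp(H sin) = (0, oo);
# the tail beyond x = 10 is negligible for this phi.
val = integrate(lambda x: math.sin(x) * (d2phi(x) + phi(x)), 0.0, 10.0)

assert abs(val - phi(0)) < 1e-6   # equals phi(0): (d^2 + 1)(H sin) = delta
```

Two integrations by parts on $\int_0^\infty \sin x\,\varphi''$ reproduce exactly the boundary term $\varphi(0)$ that the quadrature detects.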

0.4
Before discussing further the interplay between $\partial$ and $D$ we remark that to claim a distribution $T$ is "constant" on $\mathbb{R}$ means that there is a number $c \in \mathbb{K}$ such that $T = T_c$, i.e., $T$ arises from the locally integrable function whose value everywhere is $c$:

\[ T(\varphi) = c\int_{\mathbb{R}} \bar\varphi, \quad \varphi \in C_0^\infty(\mathbb{R}). \]
Hence a distribution is constant if and only if it depends only on the mean value of each $\varphi$. This observation is the key to the proof of our next result.

Theorem 0.5. (a) If $S$ is a distribution on $\mathbb{R}$, then there exists another distribution $T$ such that $\partial T = S$. (b) If $T_1$ and $T_2$ are distributions on $\mathbb{R}$ with $\partial T_1 = \partial T_2$, then $T_1 - T_2$ is constant.

Proof : First note that ∂T = S if and only if

\[ T(\psi') = -S(\psi), \quad \psi \in C_0^\infty(\mathbb{R}). \]
This suggests we consider $H = \{\psi' : \psi \in C_0^\infty(\mathbb{R})\}$. $H$ is a subspace of $C_0^\infty(\mathbb{R})$. Furthermore, if $\zeta \in C_0^\infty(\mathbb{R})$, it follows that $\zeta \in H$ if and only if $\int \zeta = 0$. In that case we have $\zeta = \psi'$, where
\[ \psi(x) = \int_{-\infty}^x \zeta, \quad x \in \mathbb{R}. \]
Thus $H = \{\zeta \in C_0^\infty(\mathbb{R}) : \int \zeta = 0\}$ and this equality shows $H$ is the kernel of the functional $\varphi \mapsto \int \varphi$ on $C_0^\infty(\mathbb{R})$. (This implies $H$ is a hyperplane, but we shall prove this directly.) Choose $\varphi_0 \in C_0^\infty(\mathbb{R})$ with mean value unity:

\[ \int_{\mathbb{R}} \varphi_0 = 1. \]

We shall show $C_0^\infty(\mathbb{R}) = H \oplus \mathbb{K}\cdot\varphi_0$, that is, each $\varphi$ can be written in exactly one way as the sum of a $\zeta \in H$ and a constant multiple of $\varphi_0$. To check the uniqueness of such a sum, let $\zeta_1 + c_1\varphi_0 = \zeta_2 + c_2\varphi_0$ with $\zeta_1, \zeta_2 \in H$. Integrating both sides gives $c_1 = c_2$ and, hence, $\zeta_1 = \zeta_2$. To verify the existence of such a representation, for each $\varphi \in C_0^\infty(\mathbb{R})$ choose $c = \int \varphi$ and define $\zeta = \varphi - c\varphi_0$. Then $\zeta \in H$ follows easily and we are done.
To finish the proof of (a), it suffices by our remark above to define $T$ on $H$, for then we can extend it to all of $C_0^\infty(\mathbb{R})$ by linearity after choosing, e.g., $T\varphi_0 = 0$. But for $\zeta \in H$ we can define
\[ T(\zeta) = -S(\psi), \qquad \psi(x) = \int_{-\infty}^x \zeta, \]
since $\psi \in C_0^\infty(\mathbb{R})$ when $\zeta \in H$.
Finally, (b) follows by linearity and the observation that $\partial T = 0$ if and only if $T$ vanishes on $H$. But then we have

\[ T(\varphi) = T(c\varphi_0 + \zeta) = T(\varphi_0)\,\bar c = T(\varphi_0)\int\bar\varphi \]
and this says $T$ is the constant $T(\varphi_0) \in \mathbb{K}$.

Theorem 0.6. If $f : \mathbb{R} \to \mathbb{R}$ is absolutely continuous, then $g = Df$ defines $g(x)$ for almost every $x \in \mathbb{R}$, $g \in L^1_{loc}(\mathbb{R})$, and $\partial f = g$ in $\mathcal{D}^*(\mathbb{R})$. Conversely, if $T$ is a distribution on $\mathbb{R}$ with $\partial T \in L^1_{loc}(\mathbb{R})$, then $T (= T_f) = f$ for some absolutely continuous $f$, and $\partial T = Df$.

Proof: With $f$ and $g$ as indicated, we have $f(x) = \int_0^x g + f(0)$. Then an integration by parts shows that

\[ \int f\,(D\bar\varphi) = -\int g\,\bar\varphi, \quad \varphi \in C_0^\infty(\mathbb{R}), \]
so $\partial f = g$. (This is a trivial extension of (0.4).) Conversely, let $g = \partial T \in L^1_{loc}(\mathbb{R})$ and define $h(x) = \int_0^x g$, $x \in \mathbb{R}$. Then $h$ is absolutely continuous and from the above we have $\partial h = g$. But $\partial(T - h) = 0$, so Theorem 0.5 implies that $T = h + c$ for some constant $c \in \mathbb{K}$, and we have the desired result with $f(x) = h(x) + c$, $x \in \mathbb{R}$.
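Theorem 0.6 can be illustrated numerically (a sketch, not from the book) with the absolutely continuous $f(x) = |x|$, whose a.e. derivative is $g = \operatorname{sgn}$; the defining identity $-\int f\varphi' = \int g\varphi$ is checked against an off-center test function so both sides are nonzero:

```python
import math

phi  = lambda x: math.exp(-(x - 1.0) ** 2)              # off-center test function
dphi = lambda x: -2 * (x - 1.0) * math.exp(-(x - 1.0) ** 2)

def integrate(g, a=-9.0, b=11.0, n=50_000):
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = abs                                  # absolutely continuous
g = lambda x: 1.0 if x > 0 else -1.0     # g = Df almost everywhere

lhs = -integrate(lambda x: f(x) * dphi(x))  # df(phi) = -int f phi'
rhs = integrate(lambda x: g(x) * phi(x))    # int g phi

assert abs(lhs - rhs) < 1e-6
```

The point of the theorem is that the distributional derivative of an absolutely continuous function sees no extra singular term, unlike the jump formula (0.6).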

0.5
Finally, we give some examples of distributions on $\mathbb{R}^n$ and their derivatives.
(a) If $f \in C^m(\mathbb{R}^n)$ and $|\alpha| \le m$, we have
\[ \partial^\alpha f(\varphi) = (-1)^{|\alpha|}\int_{\mathbb{R}^n} f\,D^\alpha\bar\varphi = \int_{\mathbb{R}^n} D^\alpha f\cdot\bar\varphi = (D^\alpha f)(\varphi), \quad \varphi \in C_0^\infty(\mathbb{R}^n). \]
(The first and last equalities follow from definitions, and the middle one is a computation.) Thus $\partial^\alpha f = D^\alpha f$ essentially because of our definition of $\partial^\alpha$.
(b) Let
\[ r(x) = \begin{cases} x_1 x_2 \cdots x_n, & \text{if all } x_j \ge 0, \\ 0, & \text{otherwise.} \end{cases} \]
Then

\[ \partial_1 r(\varphi) = -r(D_1\varphi) = -\int_0^\infty\!\!\cdots\!\int_0^\infty (x_1\cdots x_n)\,D_1\varphi\,dx_1\cdots dx_n = \int_0^\infty\!\!\cdots\!\int_0^\infty x_2\cdots x_n\,\varphi(x)\,dx_1\cdots dx_n. \]
Similarly,
\[ \partial_2\partial_1 r(\varphi) = \int_0^\infty\!\!\cdots\!\int_0^\infty x_3\cdots x_n\,\varphi(x)\,dx, \]
and
\[ \partial^{(1,1,\dots,1)} r(\varphi) = \int_{\mathbb{R}^n} H(x)\varphi(x)\,dx = H(\varphi), \]

where $H$ is the Heaviside function (= functional)
\[ H(x) = \begin{cases} 1, & \text{if all } x_j \ge 0, \\ 0, & \text{otherwise.} \end{cases} \]
(c) The derivatives of the Heaviside functional will appear as distributions given by integrals over subspaces of $\mathbb{R}^n$. In particular, we have

\[ \partial_1 H(\varphi) = -\int_0^\infty\!\!\cdots\!\int_0^\infty D_1\varphi(x)\,dx = \int_0^\infty\!\!\cdots\!\int_0^\infty \bar\varphi(0, x_2, \dots, x_n)\,dx_2\cdots dx_n, \]
a distribution whose value is determined by the restriction of $\varphi$ to $\{0\}\times\mathbb{R}^{n-1}$,
\[ \partial_2\partial_1 H(\varphi) = \int_0^\infty\!\!\cdots\!\int_0^\infty \bar\varphi(0, 0, x_3, \dots, x_n)\,dx_3\cdots dx_n, \]
a distribution whose value is determined by the restriction of $\varphi$ to $\{0\}\times\{0\}\times\mathbb{R}^{n-2}$, and, similarly,
\[ \partial^{(1,1,\dots,1)} H(\varphi) = \bar\varphi(0) = \delta(\varphi), \]
where $\delta$ is the Dirac functional which evaluates at the origin.
(d) Let $S$ be an $(n-1)$-dimensional $C^1$ manifold in $\mathbb{R}^n$ and suppose $f \in C^\infty(\mathbb{R}^n \sim S)$ with $f$ having at each point of $S$ a limit from each side of $S$. For each $j$, $1 \le j \le n$, we denote by $\sigma_j(f)$ the jump in $f$ at the surface $S$ in the direction of increasing $x_j$. (Note that $\sigma_j(f)$ is then a function on $S$.) Then we have

\[ \partial_1 f(\varphi) = -f(D_1\varphi) = -\int_{\mathbb{R}^n} f(x)\,D_1\bar\varphi(x)\,dx \]

\[ = \int_{\mathbb{R}^n} (D_1 f)(x)\,\bar\varphi(x)\,dx + \int\!\!\cdots\!\int \sigma_1(f)\,\bar\varphi(s)\,dx_2\cdots dx_n, \]
where $s = s(x_2, \dots, x_n)$ is the point on $S$ which (locally) projects onto $(0, x_2, \dots, x_n)$. Recall that a surface integral over $S$ is given by

\[ \int_S F\,dS = \int_A F\cdot\sec(\theta_1)\,dA \]
when $S$ projects (injectively) onto a region $A$ in $\{0\}\times\mathbb{R}^{n-1}$ and $\theta_1$ is the angle between the $x_1$-axis and the unit normal $\nu$ to $S$. Thus we can write the above as

\[ \partial_1 f(\varphi) = D_1 f(\varphi) + \int_S \sigma_1(f)\cos(\theta_1)\,\bar\varphi\,dS. \]

However, in this representation it is clear that the integral is independent of the direction in which $S$ is crossed, since both $\sigma_1(f)$ and $\cos(\theta_1)$ change sign when the direction is reversed. We need only to check that $\sigma_1(f)$ is evaluated in the same direction as the normal $\nu = (\cos(\theta_1), \cos(\theta_2), \dots, \cos(\theta_n))$. Finally, our assumption on $f$ shows that $\sigma_1(f) = \sigma_2(f) = \cdots = \sigma_n(f)$, and we denote this common value by $\sigma(f)$ in the formulas

\[ \partial_j f(\varphi) = (D_j f)(\varphi) + \int_S \sigma(f)\cos(\theta_j)\,\bar\varphi\,dS. \]
These generalize the formula (0.6).
(e) Suppose $G$ is an open, bounded and connected set in $\mathbb{R}^n$ whose boundary $\partial G$ is a $C^1$ manifold of dimension $n-1$. At each point $s \in \partial G$ there is a unit normal vector $\nu = (\nu_1, \nu_2, \dots, \nu_n)$ whose components are direction cosines, i.e., $\nu_j = \cos(\theta_j)$, where $\theta_j$ is the angle between $\nu$ and the $x_j$ axis. Suppose $f \in C^\infty(\bar G)$ is given. Extend $f$ to $\mathbb{R}^n$ by setting $f(x) = 0$ for $x \notin \bar G$. In $C_0^\infty(\mathbb{R}^n)^*$ we have by Green's second identity

\[ \Bigl(\sum_{j=1}^n \partial_j^2 f\Bigr)(\varphi) = \int_G f\Bigl(\sum_{j=1}^n D_j^2\bar\varphi\Bigr) = \int_G \Bigl(\sum_{j=1}^n D_j^2 f\Bigr)\bar\varphi \]

\[ + \int_{\partial G}\Bigl(f\,\frac{\partial\bar\varphi}{\partial\nu} - \frac{\partial f}{\partial\nu}\,\bar\varphi\Bigr)\,dS, \quad \varphi \in C_0^\infty(\mathbb{R}^n), \]
so the indicated distribution differs from the pointwise derivative by the functional
\[ \varphi \mapsto \int_{\partial G}\Bigl(f\,\frac{\partial\bar\varphi}{\partial\nu} - \frac{\partial f}{\partial\nu}\,\bar\varphi\Bigr)\,dS, \]
where $\frac{\partial f}{\partial\nu} = \nabla f\cdot\nu$ is the indicated (directional) normal derivative and $\nabla f = (\partial_1 f, \partial_2 f, \dots, \partial_n f)$ denotes the gradient of $f$. Hereafter we shall also let
\[ \Delta = \sum_{j=1}^n \partial_j^2 \]
denote the Laplace differential operator in $\mathcal{D}^*(\mathbb{R}^n)$.
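The identity $\partial^{(1,1)}H = \delta$ from part (c) can be checked numerically in $\mathbb{R}^2$: $\int_0^\infty\!\int_0^\infty D_1D_2\varphi\,dx\,dy$ should equal $\varphi(0,0)$. A sketch (not from the book) with the non-compactly-supported but rapidly decaying $\varphi(x,y) = e^{-x^2-y^2}$:

```python
import math

# phi(x, y) = exp(-x^2 - y^2);  D1 D2 phi = 4 x y exp(-x^2 - y^2)
d1d2phi = lambda x, y: 4 * x * y * math.exp(-x * x - y * y)

def quadrant_integral(g, b=6.0, n=800):
    """Midpoint rule over [0, b] x [0, b]; the tail beyond b is negligible here."""
    h = b / n
    pts = [(i + 0.5) * h for i in range(n)]
    return sum(g(x, y) for x in pts for y in pts) * h * h

val = quadrant_integral(d1d2phi)   # = d2 d1 H (phi), by the formula in (c)
assert abs(val - 1.0) < 1e-4       # phi(0, 0) = 1
```

Each integration in $x_1$ and $x_2$ picks up one boundary evaluation at $0$, which is exactly how the iterated formula in (c) collapses to the Dirac functional.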

Sobolev spaces. From R. Showalter’s book on PDEs.

(i) Sobolev spaces $W^{m,p}(G)$ are completions of $C^m(\bar G)$ with respect to $L^p$-norms.

(ii) Sobolev spaces $W_0^{m,p}(G)$ are completions of $C_0^m(G)$ with respect to $L^p$-norms.

(iii) Negative Sobolev spaces $W^{-m,q}(G)$ are bounded distributions on $W_0^{m,p}(G)$, $p \ne \infty$.

0.6
Hilbert space case: $p = 2$. Let $G$ be an open set in $\mathbb{R}^n$ and $m \ge 0$ an integer. Recall that $C^m(\bar G)$ is the linear space of restrictions to $\bar G$ of functions in $C_0^m(\mathbb{R}^n)$. On $C^m(\bar G)$ we define a scalar product by

\[ (f, g)_{H^m(G)} = \sum\Bigl\{\int_G D^\alpha f\cdot\overline{D^\alpha g} : |\alpha| \le m\Bigr\} \]
and denote the corresponding norm by $\|f\|_{H^m(G)}$.
Define $H^m(G)$ to be the completion of the linear space $C^m(\bar G)$ with the norm $\|\cdot\|_{H^m(G)}$. $H^m(G)$ is a Hilbert space which is important for much of our following work on boundary value problems. We note that the $H^0(G)$ norm and $L^2(G)$ norm coincide on $C(\bar G)$, and that we have the inclusions

\[ C_0(G) \subset C(\bar G) \subset L^2(G). \]
Since we have identified $L^2(G)$ as the completion of $C_0(G)$, it follows that we must likewise identify $H^0(G)$ with $L^2(G)$. Thus $f \in H^0(G)$ if and only if there is a sequence $\{f_n\}$ in $C(\bar G)$ (or $C_0(G)$) which is Cauchy in the $L^2(G)$ norm and $f_n \to f$ in that norm.
Let $m \ge 1$ and $f \in H^m(G)$. Then there is a sequence $\{f_n\}$ in $C^m(\bar G)$ such that $f_n \to f$ in $H^m(G)$; hence $\{D^\alpha f_n\}$ is Cauchy in $L^2(G)$ for each multi-index $\alpha$ of order $\le m$. For each such $\alpha$, there is a unique $g_\alpha \in L^2(G)$ such that $D^\alpha f_n \to g_\alpha$ in $L^2(G)$. As indicated above, $f$ is the limit of $f_n$, so $f = g_\theta$, $\theta = (0, 0, \dots, 0) \in \mathbb{R}^n$. Furthermore, if $|\alpha| \le m$ we have from an integration-by-parts

\[ (D^\alpha f_n, \varphi)_{L^2(G)} = (-1)^{|\alpha|}(f_n, D^\alpha\varphi)_{L^2(G)}, \quad \varphi \in C_0^\infty(G). \]

Taking the limit as $n \to \infty$, we obtain
\[ (g_\alpha, \varphi)_{L^2(G)} = (-1)^{|\alpha|}(f, D^\alpha\varphi)_{L^2(G)}, \quad \varphi \in C_0^\infty(G), \]
so $g_\alpha = \partial^\alpha f$. That is, each $g_\alpha \in L^2(G)$ is uniquely determined as the $\alpha$th partial derivative of $f$ in the sense of distributions on $G$. These remarks prove the following characterization.

Theorem 0.7. Let $G$ be open in $\mathbb{R}^n$ and $m \ge 0$. Then $f \in H^m(G)$ if and only if there is a sequence $\{f_n\}$ in $C^m(\bar G)$ such that, for each $\alpha$ with $|\alpha| \le m$, the sequence $\{D^\alpha f_n\}$ is $L^2(G)$-Cauchy and $f_n \to f$ in $L^2(G)$. In that case we have $D^\alpha f_n \to \partial^\alpha f$ in $L^2(G)$.

Corollary. $H^m(G) \subset H^k(G) \subset L^2(G)$ when $m \ge k \ge 0$, and if $f \in H^m(G)$ then $\partial^\alpha f \in L^2(G)$ for all $\alpha$ with $|\alpha| \le m$.

We shall later find that $f \in H^m(G)$ if $\partial^\alpha f \in L^2(G)$ for all $\alpha$ with $|\alpha| \le m$.

Case $1 \le p \le \infty$. When $p \ne 2$ the norm of the Sobolev space is not induced by a scalar product; the construction is based on completion of $C^m(G)$ functions and their derivatives in the $L^p(G)$-norm. If we use

\[ (f, \varphi)_{L^2(G)} = T_f(\varphi) = \int_G f\bar\varphi\,dx \]
when $\varphi \in C_0^\infty(G)$, then all the computations are literally the same as in the case of $H^m(G)$. As in the $H^m(G)$ case, $f \in W^{m,p}(G)$ if and only if there is a sequence $\{f_n\}$ in $C^m(\bar G)$ such that, for each $\alpha$ with $|\alpha| \le m$, the sequence $\{D^\alpha f_n\}$ is $L^p(G)$-Cauchy and $f_n \to f$ in $L^p(G)$. The $W^{m,p}$-norm is defined by
\[ \|f\|_{W^{m,p}(G)}^p = \sum_{|\alpha| \le m}\int_G |D^\alpha f|^p. \]

0.7
Case $p = 2$. (The case $p \ne 2$ is identical unless mentioned explicitly.) We define $H_0^m(G)$ to be the closure in $H^m(G)$ of $C_0^\infty(G)$. Generally, $H_0^m(G)$ is a proper subspace of $H^m(G)$. Note that for any $f \in H^m(G)$ we have
\[ (\partial^\alpha f, \varphi)_{L^2(G)} = (-1)^{|\alpha|}(f, D^\alpha\varphi)_{L^2(G)}, \quad |\alpha| \le m,\ \varphi \in C_0^\infty(G). \]
We can extend this result by continuity to obtain the generalized integration-by-parts formula
\[ (\partial^\alpha f, g)_{L^2(G)} = (-1)^{|\alpha|}(f, \partial^\alpha g)_{L^2(G)}, \quad f \in H^m(G),\ g \in H_0^m(G),\ |\alpha| \le m. \]

This formula suggests that $H_0^m(G)$ consists of functions in $H^m(G)$ which vanish on $\partial G$ together with their derivatives through order $m-1$. We shall make this precise in the following.
Since $C_0^\infty(G)$ is dense in $H_0^m(G)$, each element of $H_0^m(G)'$ determines (by restriction to $C_0^\infty(G)$) a distribution on $G$, and this correspondence is an injection. Thus we can identify $H_0^m(G)'$ with a space of distributions on $G$, and those distributions are characterized as follows.

Theorem 0.8. $H_0^m(G)'$ is (identified with) the space of distributions on $G$ which are the linear span of the set

\[ \{\partial^\alpha f : |\alpha| \le m,\ f \in L^2(G)\}. \]

Proof: If $f \in L^2(G)$ and $|\alpha| \le m$, then
\[ |\partial^\alpha f(\varphi)| \le \|f\|_{L^2(G)}\,\|\varphi\|_{H^m(G)}, \quad \varphi \in C_0^\infty(G), \]
so $\partial^\alpha f$ has a (unique) continuous extension to $H_0^m(G)$. Conversely, if $T \in H_0^m(G)'$, there is an $h \in H_0^m(G)$ such that

\[ T(\varphi) = (h, \varphi)_{H^m(G)}, \quad \varphi \in C_0^\infty(G). \]
But this implies $T = \sum_{|\alpha| \le m}(-1)^{|\alpha|}\partial^\alpha(\partial^\alpha h)$ and, hence, the desired result, since each $\partial^\alpha h \in L^2(G)$.

Remark. The space $H^{-m}(G)$ is by definition identified with $H_0^m(G)'$.

What happens when $p \ne 2$? One in general has to be very careful with the $L^\infty$–$L^1$ duality, because

(i) The dual of $L^1(G)$ is $L^\infty(G)$, but the dual of $L^\infty(G)$ is not $L^1(G)$.

(ii) $C_0^\infty$ is not dense in $L^\infty(G)$; moreover, $L^\infty(G)$ is not a separable space.

Hence, we define the $W^{-m,q}(G)$ spaces as duals of $W_0^{m,p}(G)$ only when $p \ne \infty$. Another question is why we don't define $W^{-m,q}(G)$ as a dual of $W^{m,p}(G)$. The reason is that $C_0^\infty$ is not dense in $W^{m,p}(G)$; therefore $W^{m,p}(G)'$ may contain objects which are not distributions!

Examples. a. The operator $\operatorname{div} p$, for $p \in L^2(\Omega)$, as a bounded linear functional in $H^{-1}(\Omega)$:

Since $C_0^\infty(\Omega)$ is dense in $H_0^1(\Omega)$,

\[ \operatorname{div} p\,(\phi) = -\int_\Omega p\cdot\nabla\phi\,dx, \quad \phi \in C_0^\infty(\Omega). \]
By Cauchy–Schwarz this functional is dominated by the sublinear functional

\[ \|p\|_{L^2}\,\|\phi\|_{H^1}. \]
Hence we can uniquely extend it to the whole of $H_0^1(\Omega)$ by the Hahn–Banach theorem, with

\[ \|\operatorname{div} p\|_{H^{-1}} = \sup_{\|\phi\|_{H^1}=1}\Bigl|\int_\Omega p\cdot\nabla\phi\,dx\Bigr| \le \|p\|_{L^2}. \]
b. Similarly, the operator $\operatorname{curl} v$, for $v \in L^2(\Omega)$, as a bounded linear functional in $H^{-1}(\Omega)$:
\[ (\operatorname{curl} v(\phi))_{ij} = -\int_\Omega\Bigl(v_i\,\frac{\partial\phi}{\partial x_j} - v_j\,\frac{\partial\phi}{\partial x_i}\Bigr)dx. \]
We shall have occasion to use the two following results, each of which suggests further that $H_0^m(G)$ is distinguished from $H^m(G)$ by boundary values.

Theorem 0.9. $H_0^m(\mathbb{R}^n) = H^m(\mathbb{R}^n)$. (Note that the boundary of $\mathbb{R}^n$ is empty.)

Proof: Let $\tau \in C_0^\infty(\mathbb{R}^n)$ with $\tau(x) = 1$ when $|x| \le 1$, $\tau(x) = 0$ when $|x| \ge 2$, and $0 \le \tau(x) \le 1$ for all $x \in \mathbb{R}^n$. For each integer $k \ge 1$, define $\tau_k(x) = \tau(x/k)$, $x \in \mathbb{R}^n$. Then for any $u \in H^m(\mathbb{R}^n)$ we have $\tau_k\cdot u \in H^m(\mathbb{R}^n)$ and (exercise) $\tau_k\cdot u \to u$ in $H^m(\mathbb{R}^n)$ as $k \to \infty$. Thus we may assume $u$ has compact support. Letting $G$ denote a sphere in $\mathbb{R}^n$ which contains the support of $u$, we have from Lemma 0.2 of Section 1.1 that the mollified functions satisfy $u_\varepsilon \to u$ in $L^2(G)$ and $(D^\alpha u)_\varepsilon = D^\alpha(u_\varepsilon) \to \partial^\alpha u$ in $L^2(G)$ for each $\alpha$ with $|\alpha| \le m$. That is, $u_\varepsilon \in C_0^\infty(\mathbb{R}^n)$ and $u_\varepsilon \to u$ in $H^m(\mathbb{R}^n)$.

Theorem 0.10. Suppose $G$ is an open set in $\mathbb{R}^n$ with $\sup\{|x_1| : (x_1, x_2, \dots, x_n) \in G\} = K < \infty$. Then
\[ \|\varphi\|_{L^2(G)} \le 2K\,\|\partial_1\varphi\|_{L^2(G)}, \quad \varphi \in H_0^1(G). \]

Proof: We may assume $\varphi \in C_0^\infty(G)$, since this set is dense in $H_0^1(G)$. Then integrating the identity

\[ D_1(x_1\cdot|\varphi(x)|^2) = |\varphi(x)|^2 + x_1\cdot D_1(|\varphi(x)|^2) \]
over $G$ by the divergence theorem gives

\[ \int_G |\varphi(x)|^2 = -\int_G x_1\bigl(D_1\varphi(x)\cdot\bar\varphi(x) + \varphi(x)\cdot D_1\bar\varphi(x)\bigr)\,dx. \]

The right side is bounded by $2K\,\|D_1\varphi\|_{L^2(G)}\|\varphi\|_{L^2(G)}$, and this gives the result. In the case $p \ne 2$ use

\[ D_1(x_1\cdot|\varphi(x)|^p) = |\varphi(x)|^p + x_1\cdot D_1(|\varphi(x)|^p). \]
Since
\[ |\varphi(x)|^p = \varphi(x)\,\bar\varphi(x)^{\,p-1} = \bar\varphi(x)\,\varphi(x)^{\,p-1}, \]
in the divergence theorem we'll use

\[ D_1(|\varphi(x)|^p) = p\,\bar\varphi(x)^{\,p-1}\cdot D_1\varphi(x), \]
and by Hölder with $q = p/(p-1)$,
\[ \int_G |\varphi(x)|^{p-1}\,|D_1\varphi(x)| \le \Bigl(\int_G |\varphi(x)|^p\Bigr)^{(p-1)/p}\Bigl(\int_G |D_1\varphi(x)|^p\Bigr)^{1/p} \]

\[ = \|\varphi\|_{L^p}^{\,p-1}\,\|D_1\varphi\|_{L^p}. \]
We get
\[ \|\varphi\|_{L^p}^{\,p} \le 2K\,\|\varphi\|_{L^p}^{\,p-1}\,\|D_1\varphi\|_{L^p}, \]
or
\[ \|\varphi\|_{L^p} \le 2K\,\|D_1\varphi\|_{L^p}. \]
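The estimate of Theorem 0.10 and its $L^p$ analogue can be sanity-checked in one dimension (a sketch, not from the book) on $G = (-1, 1)$, so $K = 1$, with $\varphi(x) = \cos(\pi x/2)$, which vanishes on $\partial G$:

```python
import math

def lp_norm(g, p, a=-1.0, b=1.0, n=50_000):
    """Midpoint-rule L^p norm of g on (a, b)."""
    h = (b - a) / n
    return (sum(abs(g(a + (i + 0.5) * h)) ** p for i in range(n)) * h) ** (1.0 / p)

K = 1.0                                     # sup |x_1| over G = (-1, 1)
phi  = lambda x: math.cos(math.pi * x / 2)  # vanishes at x = -1 and x = 1
dphi = lambda x: -math.pi / 2 * math.sin(math.pi * x / 2)

for p in (1, 2, 4):
    assert lp_norm(phi, p) <= 2 * K * lp_norm(dphi, p)
```

For $p = 2$ this is $1 \le \pi$: the inequality is far from sharp for this $\varphi$, as expected of a Poincaré-type bound that only uses the width of $G$.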

Sobolev embedding theorem.

(i) A compact operator maps weakly convergent sequences to strongly convergent sequences.

(ii) If $f_n$ converges weakly in $W^{m+1,p}(G)$, then it converges strongly in $W^{m,p}(G)$.

Definition: Let $V$ and $W$ be two Banach spaces with $V \subset W$. We say the space $V$ is continuously embedded in $W$ and write

\[ V \hookrightarrow W \]
if $\|f\|_W \le C\|f\|_V$ for any $f \in V$. We say the space $V$ is compactly embedded in $W$ and write

\[ V \hookrightarrow\hookrightarrow W \]
if it is continuously embedded in $W$ and each bounded sequence in $V$ has a convergent subsequence in $W$. Equivalently, in reflexive spaces: each weakly convergent sequence in $V$ is (strongly) convergent in $W$.
Remark: Sometimes it is useful to think about embeddings as linear operators $A : V \to W$.
Definition: Let $V$ and $W$ be two Banach spaces. Then the operator $A : V \to W$ is continuous if

\[ \|Af\|_W \le C\|f\|_V \quad \text{for any } f \in V. \]

The operator $A$ is compact if, for every bounded sequence $\{f_n\}$, the sequence $\{Af_n\}$ has a convergent subsequence.
Example of compact operators: Integral operators

\[ K(f) = \int_G k(x,y)\,f(y)\,dy, \quad x \in G, \]
are compact under certain smoothness conditions on the kernel $k(x,y)$. For example, if $|k(x,y) - k(\tilde x,\tilde y)| \le |x - \tilde x| + |y - \tilde y|$, then they are compact on $L^p$, $1 \le p \le \infty$ (check!).

Sobolev embedding and the boundary; Lipschitz domains: On Lipschitz domains the Sobolev embedding estimates hold. Basically all domains you encounter are Lipschitz.
Definition: A function $f$ is said to be $(m,\beta)$-Hölder continuous, $f \in C^{m,\beta}(\bar G)$, $\bar G$ closed, $0 < \beta \le 1$, if $f \in C^m(\bar G)$ and there exists a constant $C$ such that
\[ \sup_{x,\tilde x\in G}|D^\alpha f(x) - D^\alpha f(\tilde x)| \le C|x - \tilde x|^{\beta}, \quad |\alpha| = m. \]
The Hölder norm is defined by
\[ \|f\|_{C^{m,\beta}(\bar G)} = \|f\|_{C^m(\bar G)} + \sum_{|\alpha|=m}\sup_{x,\tilde x\in G}\frac{|D^\alpha f(x) - D^\alpha f(\tilde x)|}{|x - \tilde x|^{\beta}}. \]
If $m = 0$ the function is said to be $\beta$-Hölder continuous. If $\beta = 1$, the function is said to be $m$-Lipschitz continuous. If $m = 0$ and $\beta = 1$, the function is said to be Lipschitz continuous.
Definition: Let $G$ be open and bounded in $\mathbb{R}^n$, and let $V$ denote a function space on $\mathbb{R}^{n-1}$. We say $\partial G$ is of class $V$ if for each point $\tilde x \in \partial G$ there exist an $r > 0$ and a function $g \in V$ such that, upon a transformation of the coordinate system if necessary, we have

\[ G \cap B(\tilde x, r) = \{x \in B(\tilde x, r) \mid x_n > g(x_1, \dots, x_{n-1})\}. \]
Here, $B(\tilde x, r)$ denotes the $n$-dimensional ball centered at $\tilde x$ with radius $r$. In particular, when $V$ consists of Lipschitz continuous functions, we say $G$ is a Lipschitz domain.
Examples: All (curvilinear) polygons are Lipschitz domains; however, a disk with a crack,
\[ B(0,1) - [0,1), \]
is not a Lipschitz domain.
Sobolev embedding theorem. Let $G$ be an open bounded Lipschitz domain in $\mathbb{R}^n$ and $u \in W^{j+m,p}(G)$.
(i) $mp < n$. Find $q$ such that $1/q = 1/p - m/n$; then
\[ W^{j+m,p}(G) \hookrightarrow W^{j,q}(G), \]
and for any $1 \le q^* < q$
\[ W^{j+m,p}(G) \hookrightarrow\hookrightarrow W^{j,q^*}(G), \]

(ii) $mp = n$. Then for any $1 \le q^* < \infty$
\[ W^{j+m,p}(G) \hookrightarrow\hookrightarrow W^{j,q^*}(G). \]
(iii) $mp > n$. Then
\[ W^{j+m,p}(G) \hookrightarrow\hookrightarrow C^j(G). \]
Remark (Hölder space embedding): In part (iii) one can show that if $k = m - [n/p] - 1$ and $\beta = [n/p] + 1 - n/p$, then
\[ W^{j+m,p}(G) \hookrightarrow C^{j+k,\beta}(G) \quad \text{if } n/p \ne \text{integer}, \]
and for any $0 \le \beta^* < \beta$
\[ W^{j+m,p}(G) \hookrightarrow\hookrightarrow C^{j+k,\beta^*}(G). \]
Remark: In part (ii), by analogy, $q^* = \infty$ should give a continuous embedding; however it does not, for the standard reason: $C_0^\infty$ is not dense in $L^\infty$.
We give here a proof of part (i) under some additional assumptions to simplify our task:
(i) We assume that $j = 0$; the result extends to the case $j > 0$ if we work with all $\nabla^\alpha u$, $|\alpha| = j$, instead of $u$.
(ii) We assume that our functions have compact support in $G$, that is, $u \in W_0^{m,p}(G)$. This assumption can easily be removed by the extension operator construction: if the boundary is Lipschitz, then any function $u \in W^{m,p}(G)$ can be extended to a function (with slightly larger norm) with compact support in a slightly larger domain $\Omega$, $G \subset \Omega$.
(iii) We assume that $G$ is a rectangular domain, say the cube $[0,1]\times[0,1]\times\dots\times[0,1]$. This simplification saves us some estimates at the boundary.

(iv) We assume that $k = 1$; the result for $k > 1$ follows from the result for $k = 1$, the case of $W_0^{1,p}(G)$, by iteration:

\[ W^{2,q_2} \hookrightarrow W^{1,q_1} \hookrightarrow L^q, \]
where
\[ \frac1q = \frac1{q_1} - \frac1n, \qquad \frac1{q_1} = \frac1{q_2} - \frac1n, \]

and hence
\[ \frac1q = \frac1{q_2} - \frac2n. \]
The proof relies on four main observations:
(i) Sobolev inequality: If $1 \le p < n$ and $q = \frac{np}{n-p}$, then there is a constant $C = C(n,p)$ such that

\[ \|u\|_{L^q} \le C\,\|\nabla u\|_{L^p}, \quad u \in C_0^1(\mathbb{R}^n), \tag{0.7} \]
and since $C_0^1(G)$ is dense in $W_0^{1,p}(G)$, inequality (0.7) holds for any $u \in W_0^{1,p}(G)$; the continuous embedding follows immediately.
(ii) Interpolation inequality: for any $1 \le q^* < q$, there exists $0 < \lambda \le 1$ (note: $\lambda$ is strictly positive) such that

\[ \|u\|_{L^{q^*}} \le \|u\|_{L^1}^{\lambda}\,\|u\|_{L^q}^{1-\lambda}. \tag{0.8} \]
The interpolation inequality implies that it is sufficient to show strong convergence in $L^1$ and use the continuous embedding result (0.7).

(iii) If the domain $G$ is subdivided into a finite number of smaller domains $Q_\delta$, then weak convergence of $u_n$ (that is, $(f, u_n - u) \to 0$) implies strong convergence of the averages:
\[ |\langle u_n\rangle_{Q_\delta} - \langle u\rangle_{Q_\delta}| \to 0, \tag{0.9} \]
where we denote by $\langle u\rangle_{Q_\delta}$ the average of $u$ over a domain $Q_\delta$:
\[ \langle u\rangle_{Q_\delta} = \frac{1}{\operatorname{Vol}(Q_\delta)}\int_{Q_\delta} u\,dx. \]

Let $\langle u\rangle(x) = \langle u\rangle_{Q_\delta}$ if $x \in Q_\delta$. Note that $\langle u\rangle(x)$ takes only finitely many values, one for each $Q_\delta$; therefore

\[ \|\langle u_n\rangle - \langle u\rangle\|_{L^1(G)} \le \operatorname{Vol}(G)\max_{x\in G}|\langle u_n\rangle(x) - \langle u\rangle(x)| \to 0. \]
(iv) Poincaré-type inequality (Rellich–Kondrachov): Assume that $Q_\delta \subset \mathbb{R}^n$ is a cube $[0,\delta]\times[0,\delta]\times\dots\times[0,\delta]$. If $u \in C^1(Q_\delta)$ then for any $p \ge 1$
\[ \|u - \langle u\rangle_{Q_\delta}\|_{L^1(Q_\delta)} \le \delta\,\|\nabla u\|_{L^1(Q_\delta)}. \tag{0.10} \]
Hence
\[ \|u - \langle u\rangle\|_{L^1(Q)} \le \delta\,\|\nabla u\|_{L^1(Q)} \le \delta C\,\|\nabla u\|_{L^p(Q)}. \]
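The interpolation inequality (0.8) of observation (ii) can be checked numerically (a sketch, not from the book); here $u(x) = x$ on $(0,1)$, $q = 4$, $q^* = 2$, and $\lambda$ is computed from $1/q^* = \lambda + (1-\lambda)/q$:

```python
import math

def lp_norm(g, p, a=0.0, b=1.0, n=50_000):
    """Midpoint-rule L^p norm of g on (a, b)."""
    h = (b - a) / n
    return (sum(abs(g(a + (i + 0.5) * h)) ** p for i in range(n)) * h) ** (1.0 / p)

u = lambda x: x                        # sample u on G = (0, 1)
q, qs = 4.0, 2.0                       # q* = 2 < q = 4
lam = (1 / qs - 1 / q) / (1 - 1 / q)   # from 1/q* = lam + (1 - lam)/q
assert 0 < lam <= 1                    # here lam = 1/3

lhs = lp_norm(u, qs)
rhs = lp_norm(u, 1) ** lam * lp_norm(u, q) ** (1 - lam)
assert lhs <= rhs + 1e-9               # interpolation inequality (0.8)
```

With these values $\lambda = 1/3$, and the two sides come out to roughly $0.577 \le 0.607$, illustrating that $L^{q^*}$ is controlled by the endpoint norms $L^1$ and $L^q$.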

Combining (iii) and (iv) we have

\[ \|u - u_n\|_{L^1(Q)} \le \operatorname{Vol}(G)\max_{x\in G}|\langle u_n\rangle(x) - \langle u\rangle(x)| + \delta C\,\|\nabla u\|_{L^p} + \delta C\,\|\nabla u_n\|_{L^p} \to 0. \]

Proof of Sobolev inequality (0.7): Generalized Hölder inequality (proved by induction): if
\[ \sum_{i=1}^m \frac{1}{p_i} = 1, \qquad f_i \in L^{p_i}(G), \]
then

\[ \int_G f_1 f_2\cdots f_m\,dx \le \|f_1\|_{L^{p_1}}\,\|f_2\|_{L^{p_2}}\cdots\|f_m\|_{L^{p_m}}. \]
Case $p = 1$:

\[ |u(x)| \le \Bigl|\int_{-\infty}^{x_i} D_i u\,dx_i\Bigr| \le \int_{\mathbb{R}} |D_i u(x)|\,dx_i. \]
Hence
\[ |u(x)|^{\frac{n}{n-1}} \le \prod_{i=1}^n \Bigl(\int_{\mathbb{R}} |D_i u(x)|\,dx_i\Bigr)^{\frac{1}{n-1}}. \]
Integrate over $\mathbb{R}^n$ and use the generalized Hölder inequality:

\[ \int_{\mathbb{R}^n} |u(x)|^{\frac{n}{n-1}}\,dx \le \int_{\mathbb{R}^n} \prod_{i=1}^n \Bigl(\int_{\mathbb{R}} |D_i u(x)|\,dx_i\Bigr)^{\frac{1}{n-1}} dx \]

\[ = \int_{\mathbb{R}^{n-1}} \Bigl(\int_{\mathbb{R}} |D_1 u(x)|\,dx_1\Bigr)^{\frac{1}{n-1}} \Bigl[\int_{\mathbb{R}} \prod_{i=2}^n \Bigl(\int_{\mathbb{R}} |D_i u(x)|\,dx_i\Bigr)^{\frac{1}{n-1}}\,dx_1\Bigr]\,dx_2\cdots dx_n \]
(by the generalized Hölder inequality with $p_j = n-1$)
\[ \le \int_{\mathbb{R}^{n-1}} \Bigl(\int_{\mathbb{R}} |D_1 u(x)|\,dx_1\Bigr)^{\frac{1}{n-1}} \prod_{i=2}^n \Bigl(\int_{\mathbb{R}}\int_{\mathbb{R}} |D_i u(x)|\,dx_i\,dx_1\Bigr)^{\frac{1}{n-1}}\,dx_2\cdots dx_n \]

(similarly for other variables x2, . . . , xn)

\[ \le \cdots \le \prod_{i=1}^n \Bigl(\int_{\mathbb{R}^n} |D_i u(x)|\,dx\Bigr)^{\frac{1}{n-1}}. \]

Since for nonnegative $a_i$

\[ \prod_{i=1}^n a_i \le \Bigl(\frac1n\sum_{i=1}^n a_i\Bigr)^{n}, \]
we have
\[ \int_{\mathbb{R}^n} |u(x)|^{\frac{n}{n-1}}\,dx \le \Bigl[\frac1n\sum_{i=1}^n \int_{\mathbb{R}^n} |D_i u(x)|\,dx\Bigr]^{\frac{n}{n-1}} \]

\[ = n^{-\frac{n}{n-1}}\Bigl[\sum_{i=1}^n \int_{\mathbb{R}^n} |D_i u(x)|\,dx\Bigr]^{\frac{n}{n-1}}, \]
hence
\[ \|u\|_{L^{n/(n-1)}} \le C\,\|\nabla u\|_{L^1}. \]
Case $p > 1$: In general, apply a scaling argument: take the power $\alpha$ 'properly' so that the degree of scaling of $u^\alpha$ and $\nabla(u^\alpha)$ is the same in the corresponding norms:
\[ \|u^\alpha\|_{L^{n/(n-1)}} \le C\,\|u^{\alpha-1}\nabla u\|_{L^1} \le C\,\|u^{\alpha-1}\|_{L^s}\,\|\nabla u\|_{L^p}, \quad 1/s + 1/p = 1. \]
Equating the norms of $u$ we choose $\alpha$:

\[ \frac{\alpha n}{n-1} = (\alpha - 1)s, \]
and with $q = \alpha n/(n-1)$ we obtain
\[ \|u\|_{L^q}^{\alpha} \le C\,\|u\|_{L^q}^{\alpha-1}\,\|\nabla u\|_{L^p}. \]
Proof of interpolation inequality (0.8): Use the Hölder inequality:

\[ \|u\|_{L^{q^*}} = \Bigl[\,\||u|^{q^*}\|_{L^1}\Bigr]^{1/q^*} = \Bigl[\,\||u|^{\lambda q^*}|u|^{(1-\lambda)q^*}\|_{L^1}\Bigr]^{1/q^*} \le \Bigl[\,\||u|^{\lambda q^*}\|_{L^s}\,\||u|^{(1-\lambda)q^*}\|_{L^t}\Bigr]^{1/q^*}, \quad \frac1s + \frac1t = 1. \]
We want
\[ (\lambda q^*)s = 1, \qquad ((1-\lambda)q^*)t = q, \qquad \frac1s + \frac1t = 1, \tag{0.11} \]

because then $\||u|^{\lambda q^*}\|_{L^s} = \|u\|_{L^1}^{1/s}$ and $\||u|^{(1-\lambda)q^*}\|_{L^t} = \|u\|_{L^q}^{q/t}$. Such $\lambda$, $t$ and $s$ can be found from (0.11), and this gives
\[ \frac{1}{q^*} = (1-\lambda)\frac{1}{q} + \lambda, \]
which implies $0 < \lambda \le 1$. Combining all the terms,
\[ \|u\|_{L^{q^*}} \le \Bigl[\|u\|_{L^1}^{1/s}\,\|u\|_{L^q}^{q/t}\Bigr]^{1/q^*} = \|u\|_{L^1}^{\lambda}\,\|u\|_{L^q}^{1-\lambda}. \]
Proof of strong convergence of averages (0.9): Subdivide $G$ into cubes $Q_\delta$, where each

\[ Q_\delta = Q_\delta(m_1, m_2, \dots, m_n) = [\delta m_1, \delta(m_1+1)]\times\dots\times[\delta m_n, \delta(m_n+1)]. \]

Then if $\|u_k\|_{W^{1,p}} \le C$, by (0.7) $\|u_k\|_{L^1} \le C$; then by the Alaoglu theorem, there is a subsequence (denoted hereafter by $\{u_k\}$) which is weakly convergent in $L^1$. This means that if we take the trial linear functional to be integration over each $Q_\delta$, we have
\[ |\langle u\rangle_{Q_\delta} - \langle u_n\rangle_{Q_\delta}| \to 0. \]
Proof of Poincaré-type inequality (0.10): Since $u \in C^1(Q_\delta)$ there is a point $\tilde x = (\tilde x_1, \dots, \tilde x_n)$ such that $\langle u\rangle_{Q_\delta} = u(\tilde x)$; then

\[ u(x) - u(\tilde x) = \sum_{j=1}^n \int_{\tilde x_j}^{x_j} D_j u(x_1, x_2, \dots, x_{j-1}, s, \tilde x_{j+1}, \dots, \tilde x_n)\,ds. \]

Therefore for any $x \in Q_\delta$
\[ |u(x) - \langle u\rangle_{Q_\delta}| \le \sum_{j=1}^n \int_0^\delta |D_j u|\,dx_j. \]

Note that each term on the right-hand side depends on only $n-1$ variables; therefore, integrating with respect to all variables,

\[ \|u - \langle u\rangle_{Q_\delta}\|_{L^1(Q_\delta)} \le \delta\sum_{j=1}^n \int_{Q_\delta} |D_j u|\,dx \le \delta\,\|\nabla u\|_{L^1(Q_\delta)}. \]
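A one-dimensional sanity check of the Poincaré-type inequality (0.10) on $Q_\delta = [0, \delta]$ (a sketch, not from the book; the choices $\delta = 0.7$ and $u(x) = \sin 3x$ are arbitrary):

```python
import math

delta = 0.7
n = 50_000
h = delta / n
xs = [(i + 0.5) * h for i in range(n)]   # midpoint grid on [0, delta]

u  = lambda x: math.sin(3 * x)
du = lambda x: 3 * math.cos(3 * x)

mean = sum(u(x) for x in xs) * h / delta       # <u> over Q = [0, delta]
lhs = sum(abs(u(x) - mean) for x in xs) * h    # || u - <u> ||_L1(Q)
rhs = delta * sum(abs(du(x)) for x in xs) * h  # delta * || u' ||_L1(Q)

assert lhs <= rhs
```

The slack between the two sides reflects the crude triangle-inequality step in the proof; the bound tightens as $\delta \to 0$, which is exactly what the subdivision argument exploits.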

Trace. From R. Showalter’s book on PDEs.

(i) The trace $\gamma_0$ is a bounded linear operator $\gamma_0 : H^1(G) \to L^2(\partial G)$.

(ii) $u \in H_0^1(G)$ iff $\gamma_0(u) = 0$.

(iii) Higher order traces are normal derivatives: "$\gamma_j = \frac{\partial^j}{\partial\nu^j}\big|_{\partial G}$".

We shall describe the sense in which functions in $H^m(G)$ have "boundary values" on $\partial G$ when $m \ge 1$. Note that this is impossible in $L^2(G)$ since $\partial G$ is a set of measure zero in $\mathbb{R}^n$. First, we consider the situation where $G$ is the half-space $\mathbb{R}^n_+ = \{(x_1, x_2, \dots, x_n) : x_n > 0\}$, for then $\partial G = \{(x', 0) : x' \in \mathbb{R}^{n-1}\}$ is the simplest possible (without being trivial). Also, the general case can be localized as in Section 2.3 to this case, and we shall use this in our final discussion of this section.

0.8
We shall define the (first) trace map $\gamma_0$ when $G = \mathbb{R}^n_+ = \{x = (x', x_n) : x' \in \mathbb{R}^{n-1},\ x_n > 0\}$, where we let $x'$ denote the $(n-1)$-tuple $(x_1, x_2, \dots, x_{n-1})$. For any $\varphi \in C^1(\bar G)$ and $x' \in \mathbb{R}^{n-1}$ we have
\[ |\varphi(x', 0)|^2 = -\int_0^\infty D_n(|\varphi(x', x_n)|^2)\,dx_n. \]
Integrating this identity over $\mathbb{R}^{n-1}$ gives

$$\|\varphi(\cdot, 0)\|^2_{L^2(\mathbb{R}^{n-1})} \le \int_{\mathbb{R}^n_+} \big|\, D_n\varphi \cdot \bar\varphi + \varphi \cdot D_n\bar\varphi \,\big|\, dx$$

$$\le 2\, \|D_n\varphi\|_{L^2(\mathbb{R}^n_+)}\, \|\varphi\|_{L^2(\mathbb{R}^n_+)}.$$
The inequality $2ab \le a^2 + b^2$ then gives us the estimate
$$\|\varphi(\cdot, 0)\|^2_{L^2(\mathbb{R}^{n-1})} \le \|\varphi\|^2_{L^2(\mathbb{R}^n_+)} + \|D_n\varphi\|^2_{L^2(\mathbb{R}^n_+)}.$$
Since $C^1(\bar{\mathbb{R}}^n_+)$ is dense in $H^1(\mathbb{R}^n_+)$, we have proved the essential part of the following result.

Theorem 0.11 The trace function $\gamma_0 : C^1(\bar G) \to C^0(\partial G)$ defined by
$$\gamma_0(\varphi)(x') = \varphi(x', 0), \qquad \varphi \in C^1(\bar G),\ x' \in \partial G$$
(where $G = \mathbb{R}^n_+$) has a unique extension to a continuous linear operator $\gamma_0 \in \mathcal{L}(H^1(G), L^2(\partial G))$ whose range is dense in $L^2(\partial G)$, and it satisfies
$$\gamma_0(\beta \cdot u) = \gamma_0(\beta) \cdot \gamma_0(u), \qquad \beta \in C^1(\bar G),\ u \in H^1(G).$$

Proof: The first part follows from the preceding inequality and the Riesz representation theorem for Hilbert spaces. If $\psi \in C_0^\infty(\mathbb{R}^{n-1})$ and $\tau$ is the truncation function defined in the proof of Theorem 0.9, then
$$\varphi(x) = \psi(x')\,\tau(x_n), \qquad x = (x', x_n) \in \mathbb{R}^n_+$$
defines $\varphi \in C^1(\bar G)$ with $\gamma_0(\varphi) = \psi$. Thus the range of $\gamma_0$ contains $C_0^\infty(\mathbb{R}^{n-1})$. The last identity follows from the continuity of $\gamma_0$ and the observation that it holds for $u \in C^1(\bar G)$.

Theorem 0.12 Let $u \in H^1(\mathbb{R}^n_+)$. Then $u \in H_0^1(\mathbb{R}^n_+)$ if and only if $\gamma_0(u) = 0$.

Proof: If $\{u_n\}$ is a sequence in $C_0^\infty(\mathbb{R}^n_+)$ converging to $u$ in $H^1(\mathbb{R}^n_+)$, then $\gamma_0(u) = \lim \gamma_0(u_n) = 0$ by Theorem 0.11.

Let $u \in H^1(\mathbb{R}^n_+)$ with $\gamma_0 u = 0$. If $\{\tau_j : j \ge 1\}$ denotes the sequence of truncating functions defined in the proof of Theorem 0.9, then $\tau_j u \to u$ in $H^1(\mathbb{R}^n_+)$ and we have $\gamma_0(\tau_j u) = \gamma_0(\tau_j)\gamma_0(u) = 0$, so we may assume that $u$ has compact support.

Let $\theta_j \in C^1(\mathbb{R}_+)$ be chosen such that $\theta_j(s) = 0$ if $0 < s \le 1/j$, $\theta_j(s) = 1$ if $s \ge 2/j$, and $0 \le \theta_j'(s) \le 2j$ if $1/j \le s \le 2/j$. Then the extension of $x \mapsto \theta_j(x_n) u(x', x_n)$ to all of $\mathbb{R}^n$ as $0$ on $\mathbb{R}^n_-$ is a function in $H^1(\mathbb{R}^n)$ with support in $\{x : x_n \ge 1/j\}$, and (the proof of) Theorem 0.9 shows we may approximate such a function from $C_0^\infty(\mathbb{R}^n_+)$. Hence, we need only to show that $\theta_j u \to u$ in $H^1(\mathbb{R}^n_+)$.

It is an easy consequence of the Lebesgue dominated convergence theorem that $\theta_j u \to u$ in $L^2(\mathbb{R}^n_+)$ and, for each $k$, $1 \le k \le n-1$, that $\partial_k(\theta_j u) = \theta_j(\partial_k u) \to \partial_k u$ in $L^2(\mathbb{R}^n_+)$ as $j \to \infty$. Similarly, $\theta_j(\partial_n u) \to \partial_n u$, and we have $\partial_n(\theta_j u) = \theta_j(\partial_n u) + \theta_j' u$, so we need only to show that $\theta_j' u \to 0$ in $L^2(\mathbb{R}^n_+)$ as $j \to \infty$.

Since $\gamma_0(u) = 0$ we have $u(x', s) = \int_0^s \partial_n u(x', t)\, dt$ for $x' \in \mathbb{R}^{n-1}$ and $s \ge 0$. From this follows the estimate
$$|u(x', s)|^2 \le s \int_0^s |\partial_n u(x', t)|^2\, dt.$$
Thus, we obtain for each $x' \in \mathbb{R}^{n-1}$
$$\int_0^\infty |\theta_j'(s)\, u(x', s)|^2\, ds \le \int_0^{2/j} (2j)^2\, s \int_0^s |\partial_n u(x', t)|^2\, dt\, ds \le 8j \int_0^{2/j} \int_0^s |\partial_n u(x', t)|^2\, dt\, ds.$$

Interchanging the order of integration gives

$$\int_0^\infty |\theta_j'(s)\, u(x', s)|^2\, ds \le 8j \int_0^{2/j} \int_t^{2/j} |\partial_n u(x', t)|^2\, ds\, dt \le 16 \int_0^{2/j} |\partial_n u(x', t)|^2\, dt.$$

Integration of this inequality over $\mathbb{R}^{n-1}$ gives us

$$\|\theta_j' u\|^2_{L^2(\mathbb{R}^n_+)} \le 16 \int_{\mathbb{R}^{n-1} \times [0, 2/j]} |\partial_n u|^2\, dx,$$
and this last term converges to zero as $j \to \infty$, since $\partial_n u$ is square-summable.
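The final step of this proof, that $\|\theta_j' u\|_{L^2} \to 0$ when the trace vanishes, can be illustrated numerically in one variable; the profile $u(s) = s\,e^{-s}$ (so $u(0) = 0$) is an arbitrary choice, and $\theta_j$ is taken piecewise linear:

```python
import numpy as np

# theta_j is 0 on [0, 1/j], 1 on [2/j, inf), linear in between,
# so theta_j' = j on [1/j, 2/j] and 0 elsewhere.  For u with
# u(0) = 0, the error ||theta_j' u||_{L2}^2 should tend to 0.
def cutoff_term(j, N=200_000):
    s = np.linspace(1.0 / j, 2.0 / j, N)
    ds = s[1] - s[0]
    u = s * np.exp(-s)                  # u(0) = 0: zero trace
    return np.sum((j * u) ** 2) * ds    # ||theta_j' u||_{L2}^2

vals = [cutoff_term(j) for j in (10, 100, 1000, 10000)]
print(vals)   # decreasing, roughly like 1/j
```

This matches the bound above: the right-hand side $16\int_0^{2/j}|\partial_n u|^2$ shrinks with the support of $\theta_j'$.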

0.9

We can extend the preceding results to the case where $G$ is a sufficiently smooth region in $\mathbb{R}^n$. Suppose $G$ is given as in Section 2.3 and denote by $\{G_j : 0 \le j \le N\}$, $\{\varphi_j : 1 \le j \le N\}$, and $\{\beta_j : 0 \le j \le N\}$ the open cover, the corresponding local maps, and the partition of unity, respectively. Recalling the linear injections $\Lambda$ and $\lambda$ constructed in Section 2.3, we are led to consider the function $\gamma_0 : H^1(G) \to L^2(\partial G)$ defined by
$$\gamma_0(u) = \sum_{j=1}^N \gamma_0\big((\beta_j u)\circ\varphi_j\big)\circ\varphi_j^{-1} = \sum_{j=1}^N \gamma_0(\beta_j)\cdot\big(\gamma_0(u\circ\varphi_j)\circ\varphi_j^{-1}\big),$$
where the equality follows from Theorem 0.11. This formula is precisely what is necessary in order that the following diagram commutes.

$$\begin{array}{ccc}
H^1(G) & \xrightarrow{\ \Lambda\ } & H_0^1(G) \times H^1(Q_+) \times \cdots \times H^1(Q_+) \\
\downarrow{\scriptstyle\gamma_0} & & \downarrow{\scriptstyle\gamma_0} \\
L^2(\partial G) & \xrightarrow{\ \lambda\ } & L^2(Q_0) \times \cdots \times L^2(Q_0)
\end{array}$$
Also, if $u \in C^1(\bar G)$, we see that $\gamma_0(u)$ is the restriction of $u$ to $\partial G$. These remarks and Theorems 0.11 and 0.12 provide a proof of the following result.

Theorem 0.13 Let $G$ be a bounded open set in $\mathbb{R}^n$ which lies on one side of its boundary, $\partial G$, which we assume is a $C^1$-manifold. Then there exists a unique continuous linear function $\gamma_0 : H^1(G) \to L^2(\partial G)$ such that, for each $u \in C^1(\bar G)$, $\gamma_0(u)$ is the restriction of $u$ to $\partial G$. The kernel of $\gamma_0$ is $H_0^1(G)$ and its range is dense in $L^2(\partial G)$.

This result is a special case of the trace theorem, which we briefly discuss. For a function $u \in C^m(\bar G)$ we define the traces of normal derivatives by

$$\gamma_j(u) = \frac{\partial^j u}{\partial \nu^j}\bigg|_{\partial G}, \qquad 0 \le j \le m-1.$$

Here $\nu$ denotes the unit outward normal on the boundary of $G$. When $G = \mathbb{R}^n_+$ (or $G$ is localized as above), these are given by $\partial u/\partial\nu = -\partial_n u|_{x_n=0}$. Each $\gamma_j$ can be extended by continuity to all of $H^m(G)$, and we obtain the following.

Theorem 0.14 Let $G$ be an open bounded set in $\mathbb{R}^n$ which lies on one side of its boundary, $\partial G$, which we assume is a $C^m$-manifold. Then there is a unique continuous linear function $\gamma$ from $H^m(G)$ into $\prod_{j=0}^{m-1} H^{m-1-j}(\partial G)$ such that

$$\gamma(u) = (\gamma_0 u, \gamma_1 u, \dots, \gamma_{m-1} u), \qquad u \in C^m(\bar G).$$
The kernel of $\gamma$ is $H_0^m(G)$ and its range is dense in the indicated product.

The Sobolev spaces over $\partial G$ which appear in Theorem 0.14 can be defined locally. The range of the trace operator can be characterized by Sobolev spaces of fractional order, and one then obtains a space of boundary values which is isomorphic to the quotient space $H^m(G)/H_0^m(G)$. Such characterizations are particularly useful when considering non-homogeneous boundary value problems and certain applications, but the preceding results will be sufficient for our needs.
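The half-space trace estimate used throughout this section, $\|\varphi(\cdot,0)\|^2 \le \|\varphi\|^2 + \|D_n\varphi\|^2$, is easy to probe numerically in one space variable, where the boundary is the single point $x_n = 0$; the test function below is an arbitrary choice:

```python
import numpy as np

# Check |phi(0)|^2 <= ||phi||^2_{L2(0,inf)} + ||phi'||^2_{L2(0,inf)}
# for one decaying test function (a sketch, not a proof).
x = np.linspace(0.0, 30.0, 400_000)          # [0,30] stands in for [0,inf)
dx = x[1] - x[0]
phi = (1.0 + np.cos(2.0 * x)) * np.exp(-x)   # phi(0) = 2, rapid decay
dphi = np.gradient(phi, dx)

lhs = phi[0] ** 2
rhs = np.sum(phi ** 2) * dx + np.sum(dphi ** 2) * dx
print(lhs, rhs)
```

For this profile the two sides are of comparable size, so the estimate is seen to be reasonably tight.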

Compact operators in Banach spaces.

(i) The spectrum is a generalization of the set of eigenvalues.
(ii) The spectrum is compact; the resolvent set is open.
(iii) For a compact operator and any $\epsilon > 0$, there are only finitely many eigenvalues with $|\lambda| \ge \epsilon$, and their eigenspaces are finite-dimensional.

When $X$ is a finite-dimensional linear space and

$$A : X \to X$$
is linear, then the equation $Af = g$ has a well-developed solvability theory in terms of eigenvalues. We wish to extend these results to compact operators. We study here equations of the second kind:

$$\lambda f + Af = g.$$
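In finite dimensions, the equation of the second kind is just a linear system; the following sketch (with hypothetical random data) shows it is uniquely solvable whenever $-\lambda$ is not an eigenvalue of $A$, which holds in particular when $|\lambda| > \|A\|$:

```python
import numpy as np

# Equation of the second kind, lambda*f + A f = g, as a linear system.
# If |lambda| > ||A||, then lambda*I + A is invertible.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
g = rng.standard_normal(5)
lam = 10.0 * np.linalg.norm(A, 2)   # safely larger than the spectral radius

f = np.linalg.solve(lam * np.eye(5) + A, g)
residual = np.linalg.norm(lam * f + A @ f - g)
print(residual)   # tiny: f solves the equation
```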

Let us generalize the concept of an eigenvalue to a bounded (not necessarily compact) linear operator $A$ on a Banach space $X$. The range (or image) is

$$R(A) = \{y \in X : \exists x \in X,\ y = Ax\} \subset X.$$
The nullspace (or kernel) is

$$N(A) = \{x \in X : Ax = 0\} \subset X.$$
If $A$ is bounded then $N(A)$ is closed (check!); however, $R(A)$ may not be closed. Example. Suppose

$$X = \{f \in C[0,1] : f(0) = 0\}, \qquad A : f(x) \mapsto \int_0^x f(t)\, dt.$$
Then $A$ is one-to-one and $R(A)$ is dense in $X$, but $R(A)$ is not closed:

$$R(A) = \{f \in C^1[0,1] : f(0) = 0,\ f'(0) = 0\}.$$
If $A_\lambda = A - \lambda I$, $\lambda \in \mathbb{C}$, is invertible, then such $\lambda$ is said to be in the resolvent set $\rho(A) \subset \mathbb{C}$ of $A$, and the inverse $A_\lambda^{-1}$ is called the resolvent operator. The spectrum $\sigma(A)$ of $A$ is the complement of the resolvent set: $\sigma(A) = \mathbb{C} - \rho(A)$. By the open mapping theorem, $\lambda \in \sigma(A)$ iff $A_\lambda$ is not both one-to-one and onto. The spectrum is subdivided into three parts:

(i) point spectrum

$$\sigma_p(A) = \{\lambda \in \mathbb{C} : A_\lambda \text{ is not one-to-one}\};$$

if $\lambda \in \sigma_p(A)$, then $\lambda$ is called an eigenvalue of $A$;
(ii) continuous spectrum

$$\sigma_c(A) = \{\lambda \in \mathbb{C} : A_\lambda \text{ is one-to-one and } R(A_\lambda) \text{ is dense in } X, \text{ but } A_\lambda^{-1} \text{ is not bounded}\};$$
(iii) residual spectrum

$$\sigma_r(A) = \{\lambda \in \mathbb{C} : A_\lambda \text{ is one-to-one and } R(A_\lambda) \text{ is not dense in } X\}.$$
Proposition. The point, continuous, and residual spectra are disjoint and their union is $\sigma(A)$.
Proof: Suppose $\lambda \notin \sigma_p(A) \cup \sigma_c(A) \cup \sigma_r(A)$. Then $A_\lambda$ is one-to-one, has dense range, and $A_\lambda^{-1}$ is bounded. We need to show that $A_\lambda$ is onto; then $\lambda \in \rho(A)$. By the extension theorem ($A_\lambda^{-1}$ is bounded and $R(A_\lambda)$ is dense in $X$),
$$A_\lambda^{-1} : R(A_\lambda) \to X$$
can be extended to
$$\tilde A_\lambda^{-1} : X \to X.$$
Now, for any $y$, take a sequence $\{y_n\}$, $y_n \in R(A_\lambda)$, converging to $y$. Since $\tilde A_\lambda^{-1}$ is bounded, the sequence $\{x_n\}$, $x_n = A_\lambda^{-1} y_n$, converges to some $x$, and since $A_\lambda$ is continuous,
$$y_n = A_\lambda x_n \to A_\lambda x.$$
Thus $A_\lambda x = y$, and $A_\lambda$ is onto.
Example: Consider $A : X \to X$, where $X = L^2([0,1])$ or $X = C([0,1])$. Define $A(f(t)) = t f(t)$. We have $A_\lambda(f(t)) = (t - \lambda) f(t)$, whence $A_\lambda^{-1}(f(t)) = \frac{1}{t-\lambda} f(t)$. The spectrum of $A$ consists of all $\lambda$ for which $t - \lambda$ vanishes somewhere on $[0,1]$; hence $\sigma(A) = [0,1]$. For every $\lambda$, $A_\lambda$ is one-to-one, therefore $\sigma_p(A) = \emptyset$.
Suppose $X = L^2([0,1])$. Then $R(A_\lambda)$ is dense for all $\lambda \in [0,1]$. Hence $\sigma(A) = \sigma_c(A) = [0,1]$.
Suppose $X = C([0,1])$. Then $R(A_\lambda)$ is not dense for any $\lambda \in [0,1]$. Hence $\sigma(A) = \sigma_r(A) = [0,1]$.
Note: If $\lambda \in \sigma_c(A)$, this means that $(A-\lambda)f = g$ is ill-posed. If $\lambda \in \sigma_r(A)$, this often means that $(A-\lambda)f = g$ is ill-posed; however, if $R(A_\lambda)$ is closed in $X$, then the problem $(A-\lambda)f = g$ is well-posed, but it requires a 'solvability condition on the right-hand side': $g \in R(A_\lambda)$.
Example: Let $X = l^2$ and define the averaging operator

$$A : (x_1, x_2, x_3, \dots) \mapsto (x_1, (x_1+x_2)/2, (x_2+x_3)/2, \dots).$$
Let $\lambda = 1$:

$$A_\lambda : (x_1, x_2, x_3, \dots) \mapsto (0, (x_1 - x_2)/2, (x_2 - x_3)/2, \dots),$$
and $1 \notin \sigma_p(A)$, because $A_\lambda x = 0$ forces $x_1 = x_2 = x_3 = \dots$, but $(1, 1, 1, \dots) \notin l^2$.
Proposition. For a bounded linear operator $A$, $\rho(A) \subset \mathbb{C}$ is open and $\sigma(A) \subset \mathbb{C}$ is compact. Moreover $\sigma(A) \subset B_R$, where $R = \|A\|$.
Proof: Let $R = \|A\|$. For any $\lambda$ with $|\lambda| > R$, by the geometric series theorem

$$A_\lambda^{-1} = (A - \lambda I)^{-1} = -\frac{1}{\lambda}\Big(I - \frac{A}{\lambda}\Big)^{-1} = -\frac{1}{\lambda} \sum_{n=0}^\infty \Big(\frac{A}{\lambda}\Big)^n$$

is a well-defined, bounded operator. Hence $\sigma(A) \subset B_R$. If $A_\lambda^{-1}$ exists, let $0 < \epsilon \le 1/\|A_\lambda^{-1}\|$; then (again by the geometric series theorem) for any $\mu$ with $|\mu| < \epsilon$,

$$A_{\lambda+\mu}^{-1} = (A_\lambda - \mu I)^{-1} = A_\lambda^{-1}\big(I - \mu A_\lambda^{-1}\big)^{-1} = A_\lambda^{-1} \sum_{n=0}^\infty \big(\mu A_\lambda^{-1}\big)^n$$
is a well-defined, bounded operator. Hence $\rho(A)$ is open. Therefore $\sigma(A)$ is closed and bounded, and therefore compact.
Definition. An operator $A : X \to Y$ is compact if every bounded sequence $\{x_n\}$ is mapped to a sequence $\{y_n = A x_n\}$ that has a convergent subsequence.
Proposition. If $A$ is compact and $x_n$ converges weakly to $x$, then $y_n = A x_n$ converges strongly to $y = Ax$.
Proof: Let us first show that $y_n$ converges weakly to $y$. Let $g \in Y^*$ and define $f(z) = g(Az)$. Then $f \in X^*$, therefore $f(x_n) \to f(x)$, and hence $g(y_n) \to g(y)$. Suppose $y_n$ does not converge strongly to $y$. Then $\{y_n\}$ has a subsequence $\{y_{n_k}\}$ such that $\|y - y_{n_k}\| \ge \epsilon$ for all $k$. But $\|x_{n_k}\| \le C$, so by compactness there is a further subsequence $\{y_{n_{k_l}}\}$ converging strongly; its limit must be the weak limit $y$, so $\|y - y_{n_{k_l}}\| \to 0$, a contradiction.
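The geometric-series formula for the resolvent can be checked on a matrix: for $|\lambda| > \|A\|$, the partial sums of $-\frac{1}{\lambda}\sum_n (A/\lambda)^n$ converge to $(A - \lambda I)^{-1}$. A numerical sketch with hypothetical data:

```python
import numpy as np

# Neumann series for the resolvent: for |lambda| > ||A||,
#   (A - lambda I)^{-1} = -(1/lambda) * sum_{n>=0} (A/lambda)^n.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
lam = 2.0 * np.linalg.norm(A, 2)    # outside the ball B_R, R = ||A||

S = np.zeros_like(A)
term = np.eye(6)
for _ in range(200):                # partial sum of the geometric series
    S += term
    term = term @ (A / lam)
series_inv = -S / lam

exact = np.linalg.inv(A - lam * np.eye(6))
err = np.max(np.abs(series_inv - exact))
print(err)   # the ratio ||A||/lam = 1/2, so the error is tiny
```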

An extensive class of compact operators $g = Af$, where $f$ is a function defined on the interval $[0,1]$, can be written in the form

$$g(t) = \int_0^1 k(t,s)\, f(s)\, ds. \qquad (0.12)$$
Proposition. Formula (0.12) defines a compact operator on $C([0,1])$ if $k(t,s)$ is bounded and all points of discontinuity of $k(t,s)$ lie on a finite number of curves $t = \phi_k(s)$.
Proposition. If $A$ is compact and $B$ is bounded, then $AB$ and $BA$ are compact.
Proof: If $M$ is a bounded set, then $BM$ is also bounded; therefore $AB$ is compact by definition. Similarly, if $y_{n_k} = A x_{n_k} \to y$, then $B y_{n_k} \to By$ since $B$ is continuous; that proves the second part of the statement.
Corollary. A compact operator on an infinite-dimensional space cannot have a bounded inverse.
Proof: If there is an inverse $A^{-1}$ which is bounded, then the identity operator $I = A A^{-1}$ is compact, which is a contradiction.
Proposition. If $A$ is a compact operator on a Banach space $X$, then $\sigma_p(A)$ is countable and its only possible accumulation point is $0$. Moreover, if $\lambda \ne 0$, then $N(A_\lambda)$ is finite-dimensional. Note that here $A : X \to X$, otherwise the operator $A_\lambda$ is not defined.
Proof: The proof is by contradiction, i.e. assume that there exist infinitely many independent eigenvectors with

$$A x_n = \lambda_n x_n, \qquad |\lambda_n| \ge \epsilon > 0, \qquad n = 1, 2, \dots$$

Consider $E$, the subspace of $X$ spanned by the $x_n$; recall that this means we take finite linear combinations of the $x_n$. By construction, $E$ is invariant under the map $A$, i.e. $A : E \to E$. On this space the operator $A$ has a bounded inverse such that

$$\|A^{-1} x\| \le \frac{1}{\epsilon}\, \|x\|.$$
Therefore $A^{-1}$ can be extended to a bounded linear operator on the closure $\bar E$, so we can view $A$ as a bounded linear operator on the Banach space $\bar E$. Since we have shown that it is invertible there, with a bounded inverse, $A$ cannot be compact.
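This proposition can be seen numerically for an integral operator of the form (0.12). Discretizing the kernel $k(t,s) = \min(t,s)$ (an arbitrary choice; for this kernel the exact operator eigenvalues are $((k+\tfrac12)\pi)^{-2}$, $k = 0, 1, \dots$) gives a matrix whose eigenvalues decay to $0$:

```python
import numpy as np

# Eigenvalues of a compact operator accumulate only at 0.
# Midpoint (Nystrom) discretization of A f(t) = int_0^1 min(t,s) f(s) ds.
N = 800
t = (np.arange(N) + 0.5) / N
K = np.minimum.outer(t, t) / N      # kernel values times quadrature weight

eig = np.sort(np.linalg.eigvalsh(K))[::-1]   # descending eigenvalues
print(eig[:4])
# Largest is close to 4/pi^2; the rest decay like 1/k^2 toward 0.
```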

Fredholm alternative.

The spectrum of a compact operator is point spectrum, except possibly $0$. The goal is to show that if $\lambda \ne 0$ and $\lambda \in \sigma(A)$, then $\lambda$ is an eigenvalue, that is, $\lambda \in \sigma_p(A)$.
Lemma. If $\lambda \ne 0$, then $R(A_\lambda)$ is closed.
Proof: By contradiction. Suppose $\{x_n\} \subset X$ is such that $A_\lambda x_n \to y$ but $y \notin R(A_\lambda)$. Then $y \ne 0$, because $0 \in R(A_\lambda)$, so we can assume that $x_n \notin N(A_\lambda)$. Let $\rho_n = \rho(x_n, N(A_\lambda))$, and let $z_n \in N(A_\lambda)$ be such that $\|x_n - z_n\| \le 2\rho_n$. For the sequence $a_n = \|x_n - z_n\|$ there are only two options: either it has a bounded subsequence, or $a_n \to \infty$.
Consider the first option: $\{x_n - z_n\}$ then contains a bounded subsequence such that $A(x_n - z_n)$ is convergent. But
$$x_n - z_n = \frac{1}{\lambda}\big(A(x_n - z_n) - A_\lambda(x_n - z_n)\big).$$
Since both terms on the right-hand side are convergent (the latter converges to $y$, because $A_\lambda z_n = 0$), it follows that $x_n - z_n$ itself has a convergent subsequence, say $x_n - z_n \to x$. Then

$$A_\lambda(x_n - z_n) = A_\lambda(x_n) \to A_\lambda x, \qquad A_\lambda(x_n - z_n) = A_\lambda(x_n) \to y,$$
and hence $y = A_\lambda x$, contradicting $y \notin R(A_\lambda)$.
Suppose now $a_n \to \infty$. Normalize:
$$w_n = \frac{x_n - z_n}{a_n};$$
then $\|w_n\| = 1$ is bounded and
$$A_\lambda w_n = \frac{1}{a_n} A_\lambda x_n \to 0,$$
since $A_\lambda x_n$ converges to $y$. But again
$$w_n = \frac{1}{\lambda}\big(A w_n - A_\lambda w_n\big).$$
Hence there is a convergent subsequence

$$w_n \to w.$$
We must have $w \in N(A_\lambda)$, since $A_\lambda w_n \to 0$. Finally, let
$$y_n = z_n + a_n w \in N(A_\lambda);$$
then $\rho_n \le \|x_n - y_n\|$, but $x_n - y_n = x_n - z_n - a_n w = a_n(w_n - w)$, and since $a_n \le 2\rho_n$ we have
$$\rho_n \le \|x_n - y_n\| \le 2\rho_n\, \|w_n - w\|,$$
therefore
$$\|w_n - w\| \ge \frac{1}{2},$$
which contradicts $w_n \to w$.
For simplicity of notation, assume that $\lambda = 1$, so that we study the operator $A_1 = A - I$. The general case reduces to this one by rescaling.
Proposition. Let $A$ be a compact operator,

$$A : X \to X,$$
where $X$ is a Banach space. If the equation

$$Ax - x = y \qquad (0.13)$$
is solvable for all $y$, then the equation

$$Ax - x = 0$$
has no solutions distinct from the zero solution.
Proof: By contradiction, assume there is $x_1 \ne 0$ such that $A x_1 - x_1 \equiv A_1 x_1 = 0$. Let
$$E_n = \{x \in X : A_1^n x = 0\}.$$
We have
$$E_1 \subset E_2 \subset E_3 \subset \dots \qquad (0.14)$$
Let us show that the sequence $E_n$ is strictly increasing. Since (0.13) has a solution for any $y$, there exist $x_2, x_3, \dots$ with

$$A_1 x_2 = x_1, \qquad A_1 x_3 = x_2, \qquad \dots$$

Then $x_n \in E_n$ but $x_n \notin E_{n-1}$. Therefore $E_n$ is strictly increasing. All the subspaces $E_n$ are linear and closed (they are the (finite-dimensional) nullspaces of corresponding bounded operators), therefore there is an element $y_{n+1} \in E_{n+1}$ such that
$$\|y_{n+1}\| = 1 \qquad\text{and}\qquad \rho(y_{n+1}, E_n) \ge 1/2,$$
where $\rho(y_{n+1}, E_n) = \inf\{\|y_{n+1} - x\| : x \in E_n\}$. Indeed, suppose $\rho(x_{n+1}, E_n) = \alpha$, where $x_{n+1}$ is constructed above. Then there exists $\tilde x \in E_n$ such that $\|x_{n+1} - \tilde x\| < 2\alpha$; but
$$\rho(x_{n+1} - \tilde x, E_n) = \rho(x_{n+1}, E_n) = \alpha.$$
Let
$$y_{n+1} = \frac{x_{n+1} - \tilde x}{\|x_{n+1} - \tilde x\|},$$
so that $\rho(y_{n+1}, E_n) = \alpha / \|x_{n+1} - \tilde x\| > 1/2$.
Consider the sequence $\{A y_k\}$. If $m > n$,

$$\|A y_m - A y_n\| = \|y_m - (y_n - A_1 y_m + A_1 y_n)\| \ge 1/2,$$
because $y_n - A_1 y_m + A_1 y_n \in E_{m-1}$. Therefore $\{A y_k\}$ does not have a convergent subsequence, but it is bounded. Hence we have a contradiction to compactness.
Let us discuss the adjoint operator of $A_1$.
Definition. Suppose $A : X \to Y$ is a bounded linear operator, and $X$ and $Y$ are Banach spaces. Then the adjoint operator is defined as

$$A^* : Y^* \to X^*$$
by the formula $[A^*(f)](x) = f[A(x)]$.

So $A^*$ is defined on the space of bounded linear functionals and acts "in the other direction".
Proposition. If $A : X \to X$ is compact, then $A^*$ is compact.
Proof: Suppose $\|g_n\| \le C$. Since, by the Alaoglu theorem, a ball in $X^*$ is weak-* compact, there is a subsequence of $\{g_n\}$, denoted hereafter as $\{g_n\}$, that converges weak-* to some $g \in X^*$; that is, for each fixed $x \in X$,
$$(g_n - g)(x) \to 0.$$
We want to show that if

$$f_n(x) = g_n(A(x)), \qquad (0.15)$$

then $\|f_n - f\|_{X^*} \to 0$, where $f(x) = g(A(x))$. By (0.15), with $y = Ax$,

$$(f_n - f)(x) \to 0.$$

Suppose $\|f_n - f\|_{X^*} \not\to 0$. Then there is a subsequence of $\{f_n\}$, again denoted $\{f_n\}$, and a sequence $\{x_n\} \subset X$ such that

$$|(f_n - f)(x_n)| \ge \delta\, \|x_n\| \qquad \forall n.$$

By rescaling, choose $\|x_n\| = 1$. But $f_n(x_n) = g_n(A(x_n))$. Since $A$ is compact and $\|x_n\| = 1$, there is a subsequence of $\{x_n\}$, again denoted $\{x_n\}$, such that $\|A(x_n - x_m)\| \to 0$ as $n, m \to \infty$. For any $\epsilon$ with $\delta/2 > \epsilon > 0$, choose $m = m(\epsilon)$ such that
$$2C\, \|A(x_n - x_m)\| \le \frac{\epsilon}{2} \qquad \forall n \ge m,$$
where $\max \|g_n\| \le C$. Choose $n_0 = n_0(m)$ such that
$$|(f - f_n)(x_m)| \le \frac{\epsilon}{2} \qquad \forall n \ge n_0.$$

Then, since $\|g_n\| \le C$, for all $n \ge \max\{n_0, m\}$ we have

$$|(f_n - f)(x_n)| \le |(f_n - f)(x_m)| + |(f_n - f)(x_n - x_m)| = |(f_n - f)(x_m)|$$

$$+\; |(g_n - g)(A(x_n - x_m))| \le |(f_n - f)(x_m)| + 2C\,\|A(x_n - x_m)\| \le \epsilon.$$

This contradicts $|(f_n - f)(x_n)| \ge \delta > \epsilon$. Therefore $\|f_n - f\|_{X^*} \to 0$.
Combining the two previous propositions, we have:
Proposition. If the equation

$$h = A^* f - f$$
is solvable for all $h$, then the equation $0 = A^* f - f$ has only the zero solution.
Lemma. For a compact operator $A$ and any $x \in X$, define the distance

$$\rho(x) = \rho(x, N(A_1)) = \inf_{y \in N(A_1)} \|x - y\|.$$
Then there is an $M > 0$ such that

$$\rho(x) \le M\, \|A_1 x\|.$$

Proof: Suppose not, i.e. there is a sequence $\{x_n\} \subset X$ with $x_n \notin N(A_1)$ such that
$$\frac{\rho(x_n)}{\|A_1 x_n\|} \to \infty.$$
Since $N(A_1)$ is finite-dimensional, there is $w_n \in N(A_1)$ such that

$$\|x_n - w_n\| = \rho(x_n).$$
Rescale:
$$v_n = \frac{x_n - w_n}{\rho(x_n)};$$
then $\|v_n\| = 1$ and
$$A_1 v_n = \frac{A_1 x_n}{\rho(x_n)} \to 0.$$
We can write
$$v_n = A v_n - A v_n + v_n = A v_n - A_1 v_n.$$
Since $A_1 v_n \to 0$ and $A$ is compact, there is a subsequence, which we again denote $v_n$, such that $v_n \to v$. Then $v \in N(A_1)$, because $A_1 v_n \to 0$; but
$$\|v_n - v\| = \Big\|\frac{x_n - w_n - \rho(x_n) v}{\rho(x_n)}\Big\| \ge 1,$$
because $w_n + \rho(x_n) v \in N(A_1)$ and $\rho(x_n) \le \|x_n - y\|$ for any $y \in N(A_1)$. Hence $v_n$ does not converge to $v$. Contradiction.
Lemma. If $y \in R(A_1)$, there is $x \in X$ such that $y = A_1 x$ and $\|x\| \le M \|y\|$.
Proof: For any $y \in R(A_1)$ there is some $x_0$ such that $y = A_1 x_0$. By the previous lemma there is $w \in N(A_1)$ such that $\rho(x_0) = \rho(x_0, N(A_1)) = \|x_0 - w\|$. Let $x = x_0 - w$; then

$$\|x\| = \rho(x_0) \le M\,\|A_1 x_0\| = M\,\|A_1 x\| = M\,\|y\|.$$

Proposition. For a fixed $g$, the equation $A^* f - f = g$ is solvable if and only if $g(x) = 0$ for all $x \in X$ such that $Ax - x = 0$.
Proof: If $A^* f - f = g$ is solvable, then

$$g(x) = A^* f(x) - f(x) = f(Ax - x) = 0$$

if $Ax - x = 0$.
In the other direction, assume $g(x) = 0$ for all $x$ that satisfy $Ax - x = 0$. We are going to show that $A^* f - f = g$ is solvable by constructing a functional $f$ that solves it. For any $y = Ax - x$, let $f(y) = g(x)$. The values of this functional are defined uniquely, because if $Ax_1 - x_1 = Ax_2 - x_2$, then $x_1 - x_2 \in N(A_1)$, and therefore $g(x_1) = g(x_2)$. So far we have defined $f(y)$ for $y \in R(A_1)$. This is a bounded linear functional, because
$$|f(y)| = |g(x)| \le \|g\|\, \|x\| \le M\, \|g\|\, \|y\|,$$
where $x$ is chosen by the previous lemma. Hence, by the Hahn--Banach theorem, it can be extended to the whole space, and therefore we are done.
Corollary. If $Ax - x = 0$ has no nonzero solutions, then $A^* f - f = g$ is solvable for all $g$.
Proposition. For a fixed $y$, the equation $Ax - x = y$ is solvable if and only if $f(y) = 0$ for all $f \in X^*$ such that $A^* f - f = 0$.
Proof: Suppose $Ax - x = y$ is solvable. Then for any $f$ such that $A^* f - f = 0$ we have

$$f(y) = f(A(x)) - f(x) = A^* f(x) - f(x) = [A^* f - f](x) = 0.$$

In the other direction, suppose $y \notin R(A_1)$, i.e. it cannot be represented in the form $y = Ax - x$. For each $f$ such that $A^* f - f = 0$, consider the nullspace $N(f)$. These are closed subsets of $X$; hence $\bigcap_f N(f)$ is also closed. We need to prove that
$$y \notin \bigcap_f N(f).$$
It is sufficient to construct a bounded linear functional $f$ such that
$$f(y) \ne 0, \qquad A^* f - f = 0.$$
Consider the linear subspace of $X$

$$E = \{\mu y + w : \mu \in \mathbb{C},\ w \in R(A_1)\},$$
and for all $x \in E$ define

$$f(\mu y + w) = \mu, \qquad w \in R(A_1).$$

Then $f$ is a bounded linear functional, because $R(A_1)$ is closed (check in more detail why $f$ is bounded). Moreover, $f$ vanishes on $R(A_1)$, so $(A^* f - f)(x) = f(Ax - x) = 0$ for all $x$, i.e. $A^* f - f = 0$, while $f(y) = 1 \ne 0$. Hence, by the Hahn--Banach theorem, there is a bounded extension of this functional to $X$, and we are done.
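The solvability condition just established can be seen concretely in finite dimensions, where the adjoint is the transpose: $(A - I)x = y$ is solvable iff $f(y) = 0$ for every $f$ spanning $N(A^T - I)$. A small sketch with a hypothetical $2 \times 2$ matrix:

```python
import numpy as np

# A has eigenvalue 1, so A - I is singular; solvability of (A - I)x = y
# is governed by f(y) = 0 for all f with A^T f = f.
A = np.array([[1.0, 1.0],
              [0.0, 2.0]])
A1 = A - np.eye(2)

f = np.array([1.0, -1.0])           # spans N(A^T - I)
assert np.allclose((A.T - np.eye(2)) @ f, 0.0)

y_good = np.array([3.0, 3.0])       # f(y_good) = 0  -> solvable
y_bad = np.array([1.0, 0.0])        # f(y_bad) != 0  -> not solvable

x_good = np.linalg.lstsq(A1, y_good, rcond=None)[0]
x_bad = np.linalg.lstsq(A1, y_bad, rcond=None)[0]
print(np.allclose(A1 @ x_good, y_good))   # True: an exact solution exists
print(np.allclose(A1 @ x_bad, y_bad))     # False: only a least-squares fit
```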

Combining the propositions above, we have:
Proposition. If $Ax - x = 0$ has no nonzero solutions, then $Ax - x = y$ is solvable for all $y$.
Proof: If $Ax - x = 0$ has no nonzero solutions, then $A^* f - f = g$ is solvable for all $g$; then $A^* f - f = 0$ has no nonzero solutions; then $Ax - x = y$ is solvable for all $y$.
The last proposition finally allows us to formulate the Fredholm alternative. For the equation $Ax - \lambda x = y$, $\lambda \ne 0$, only two cases are possible:
1) It is solvable for all $y$, i.e. the operator $A - \lambda I$ has a bounded inverse.
2) It has a nonzero solution for $y = 0$, i.e. $\lambda \in \sigma_p(A)$, i.e. $\lambda$ is an eigenvalue.
Proof: The fact that $A - \lambda I$ has a bounded inverse follows from the open mapping theorem: by the preceding propositions, if $A - \lambda I$ is one-to-one, then it is also onto, and therefore it has a bounded inverse. Part 2) is immediate.
One can actually say more about the solvability of the problem

$$Ax - \lambda x = f$$
if $\lambda$ is an eigenvalue.
Proposition (without proof). The dimension of $N(A_\lambda)$ equals the dimension of $N(A^*_\lambda)$.
The space of compact operators is closed under the operator norm:
Proposition. If $\{A_n\}$ is a sequence of compact operators from a Banach space to a Banach space that converges in operator norm to an operator $A$, then $A$ is a compact operator.
Proof: Use the diagonal argument: for any bounded sequence $\{x_n\}$ there is a subsequence on which $A_1$ converges; the latter has a further subsequence on which $A_2$ converges, and so on. Form the diagonal sequence and show that $A$ maps it to a convergent sequence.
Definition. A bounded operator

$$A_n : X \to Y,$$
where $X$ and $Y$ are Banach spaces, is said to be finite-dimensional (or finite-rank) of dimension $n$ if $R(A_n)$ has dimension $n$.
Examples. Any functional is a finite-dimensional operator of dimension 1. On $\mathbb{R}^\infty$, the projection operator $P_n$ onto the first $n$ coordinates is a finite-dimensional operator of dimension $n$.
A very rich class of compact operators comes from the following idea: every

finite-dimensional operator is compact. Let us take the closure of the set of finite-dimensional operators with respect to the operator norm; by the previous proposition, all elements of this closure are compact.
Question. Do we get all compact operators this way, i.e. can any compact operator be approximated arbitrarily well by finite-dimensional operators? The answer is 'yes' in a Hilbert space, but 'no' in general (look for the work of P. Enflo if you are interested).
Since $L^2$ is a Hilbert space, the following operator $A$ is compact (check!):

$$A : L^2[0,1] \to L^2[0,1], \qquad A(f)(x) = \int_0^1 k(x,y)\, f(y)\, dy,$$
with $k \in L^2([0,1] \times [0,1])$, since it can be approximated by integral operators with a degenerate kernel

$$k_n(x,y) = \sum_{m=1}^n \alpha_m(x)\, \beta_m(y),$$
obtained, for example, by taking finite trigonometric polynomials.
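As a concrete (hypothetical) instance of the degenerate-kernel idea, take $k(x,y) = e^{xy}$: truncating its power series gives separable kernels $k_n(x,y) = \sum_{m<n} x^m y^m / m!$, and the Hilbert--Schmidt norm $\|k - k_n\|_{L^2([0,1]^2)}$, which dominates the operator norm, tends to zero:

```python
import numpy as np
from math import factorial

# Degenerate (separable) kernel approximation of k(x, y) = exp(x*y)
# on a midpoint grid of [0,1]^2.
N = 400
x = (np.arange(N) + 0.5) / N
X, Y = np.meshgrid(x, x)
k = np.exp(X * Y)

def hs_error(n):
    """Discrete L2([0,1]^2) norm of k - k_n for the n-term truncation."""
    kn = sum((X * Y) ** m / factorial(m) for m in range(n))
    return float(np.sqrt(np.mean((k - kn) ** 2)))

errs = [hs_error(n) for n in (1, 2, 4, 8)]
print(errs)   # monotonically decreasing toward 0
```

The rapid decay of the factorials makes a handful of separable terms an excellent finite-rank approximation of this particular operator.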
