
An Overview of the Theory of Distributions

Matt Guthrie

Adapted from Halperin[1]

Chapter 1

Introduction

In the period between the introduction of the operational calculus at the end of the 19th century and the mid 20th century, many formulas were in use which had not been adequately clarified from the mathematical point of view. For instance, consider the Heaviside function

H(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 & \text{for } x ≥ 0. \end{cases}  (1.1)

It is said that the derivative of this function is the Dirac delta function δ(x), which has the mathematically impossible properties that it vanishes everywhere except at the origin, where its value is so large that

\int_{−∞}^{∞} δ(x)\,dx = 1.  (1.2)

This “function” and its successive “derivatives” have been used with considerable success, especially in electromagnetics and gravitation. Dirac himself suggested that the delta function could be avoided by using instead a limiting procedure involving ordinary, mathematically possible, functions. However, the delta function can be kept and made rigorous by defining it as a measure, that is, as a set function in place of an ordinary point function. This suggests that the notion of a point function be enlarged to include new entities^1 and that the notion of the derivative be correspondingly generalized, so that within the new system of entities every point function has a rigorously defined derivative. This is done in the theory of distributions. The new system of entities, called distributions, includes all continuous functions, all Lebesgue locally summable functions, and new objects of which a simple example is the Dirac delta function mentioned above. The more general but rigorous process of differentiation assigns to every distribution a derivative which is again a distribution, so every distribution, including every locally summable point function, has derivatives of all orders. The derivative of a locally summable point function is always a distribution, although not, in general, a point function; however, it coincides with the classical derivative when the latter exists and is locally summable.
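The limiting procedure Dirac suggested is easy to exhibit numerically. The sketch below (plain Python; the Gaussian approximants and the midpoint quadrature are illustrative choices, not from the text) integrates g_ε(x) = exp(−(x/ε)²)/(ε√π) against a smooth test function; as ε shrinks, the result approaches the value of the test function at the origin, which is exactly how δ is used.

```python
import math

def riemann(f, a, b, n=20000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def nascent_delta(eps):
    """A Gaussian of width eps: it integrates to 1 and concentrates at 0 as eps -> 0."""
    return lambda x: math.exp(-(x / eps) ** 2) / (eps * math.sqrt(math.pi))

phi = math.cos          # a smooth test function with phi(0) = 1

for eps in (1.0, 0.1, 0.01):
    g = nascent_delta(eps)
    print(eps, riemann(lambda x: g(x) * phi(x), -5, 5))
```

Each approximant is an ordinary, mathematically possible function; only the limit behaves like the "impossible" δ.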

^1 Just as the notion of a rational number was enlarged by Dedekind to include all real numbers.

This theory of distributions gives rigorous content and validity to the formulas of operational calculus mentioned above. It can be developed not only for functions of one variable but also for functions of several variables, and it provides a simple but more complete theory of such topics as Fourier series and integrals, convolutions, and partial differential equations. A systematic exposition of the theory of distributions is given in Grubb’s recent Distributions and Operators[2]. There is also the recommended reference work by Strichartz, A Guide to Distribution Theory and Fourier Transforms[3]. The comprehensive treatise on the subject, although quite old now, is Gel’fand and Shilov’s Generalized Functions[4]. A very good (though fairly advanced) source that is now available from Dover is Topological Vector Spaces, Distributions and Kernels by Trèves[5]. That book is one of the classic texts on functional analysis, and if you are an analyst or aspire to be one, there is no reason not to own it. Of course, if you read French, you really should go back and read Schwartz’s original treatise[6]. However, the following pages may serve as a useful introduction to some of the basic ideas.

Chapter 2

Point Functions as Functionals

The following discussion will lead to the precise definition of distributions given later in this chapter. Let (a, b) be a finite closed interval. A continuous function f(x) can be considered a point function,^1 but it can also be considered in another way:^2 it defines the functional F(φ) as

F(φ) = \int_a^b f(x)φ(x)\,dx,  (2.1)

where F(φ) is a number^3 defined for every continuous φ(x). This functional is linear, that is,

F(c1φ1 + c2φ2) = c1F(φ1) + c2F(φ2).  (2.2)

Since there are linear functionals which cannot be expressed in this way in terms of any continuous (or even Lebesgue summable[7]) f(x), our first suggestion is that the distributions be defined as the arbitrary linear functionals F(φ) over some suitable set S of continuous φ(x). The φ(x) chosen will be called testing functions. If f(x) is absolutely continuous and has a derivative f′(x), the derivative will also define a linear functional, namely

\int_a^b f′(x)φ(x)\,dx.

This new functional can be expressed in terms of the original F(φ), for suitable φ, by use of integration by parts:

\int_a^b f′(x)φ(x)\,dx = −\int_a^b f(x)φ′(x)\,dx = −F(φ′),  (2.3)

^1 Point functions need only be defined almost everywhere, and two functions are identified if they are equal almost everywhere. Thus, we say that f(x) is identically zero if it has the value zero almost everywhere, and that f(x) is continuous if it can be identified with a continuous function; the upper bound of f(x) will mean the essential upper bound, the derivative of f(x) will mean the derivative of that function which can be identified with f(x) and which has a derivative, if such a function exists, and so on.
^2 Just as every rational number was considered by Dedekind in a new way: as defining a cut in the set of all rationals.
^3 Numbers may be taken to mean real numbers or complex numbers.

which is valid for all φ(x) that vanish at a and b and have a continuous first derivative. This suggests that we allow S to include only such φ(x), and not all continuous φ(x). Equation (2.3) also suggests that for every linear F(φ), whether or not it comes from an f(x) possessing a derivative f′(x), a derivative F′(φ) of F, itself a linear functional on S, be defined by

F′(φ) = −F(φ′).  (2.4)

If F′(φ) is to be defined for all φ ∈ S, then S must be restricted to include only those φ(x) which have φ′(x) also in S. This leads to the condition that φ(x) must have derivatives of all orders, and that they (as well as φ(x) itself) vanish at a and b. Important examples of such φ(x) are the functions

φn(x) = (φc,d(x))^{1/n},

obtained by choosing n ∈ ℕ and a ≤ c < d ≤ b and defining

φc,d(x) = \begin{cases} \exp\left( −\left( \frac{1}{x−c} + \frac{1}{d−x} \right) \right) & \text{for } c < x < d \\ 0 & \text{otherwise.} \end{cases}  (2.5)

It is not necessary to restrict S any further. However, in restricting S we must be careful to preserve the identification between the point function f(x) and the functional F(φ) which it defines. Thus, we want S to include sufficiently many testing functions that functions f(x) which differ on (a, b) are identified with different F(φ). We do not want

\int_a^b f(x)φ(x)\,dx

to vanish for all φ ∈ S except when f(x) is identically zero on (a, b). This requirement is satisfied if S includes all

(φc,d(x))^{1/n}, for

\lim_{n→∞} \int_a^b f(x)(φc,d(x))^{1/n}\,dx = \int_c^d f(x)\,dx,  (2.6)

and if

\int_c^d f(x)\,dx = 0  (2.7)

for all a ≤ c < d ≤ b, then f(x) is identically zero on (a, b). Another point is this: the F(φ) which are defined by point functions, and all derivatives of such (the F^{(n)}(φ)), have a certain continuity property, and it is desirable to require this continuity property in defining distributions.^4

^4 As will be shown in Chapter 5, this gives the smallest enlargement, or completion, of the system of locally summable point functions which permits unrestricted differentiation.
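The testing functions φc,d of eq. (2.5) are easy to compute with directly. Below is a small numerical sketch (plain Python; the sample f and the quadrature routine are illustrative choices): it checks that φc,d vanishes outside (c, d), and that, as in eq. (2.6), the integral of f·(φc,d)^{1/n} approaches the integral of f over (c, d) as n grows.

```python
import math

def riemann(f, a, b, n=20000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def phi_cd(c, d):
    """The testing function of eq. (2.5): positive on (c, d), zero elsewhere."""
    def phi(x):
        if not (c < x < d):
            return 0.0
        return math.exp(-(1.0 / (x - c) + 1.0 / (d - x)))
    return phi

phi = phi_cd(0.0, 1.0)
f = lambda x: x * x                    # any locally summable point function

# As n grows, phi_cd^(1/n) rises toward the indicator function of (c, d),
# so the integral in eq. (2.6) approaches the integral of f over (c, d) = 1/3.
print(riemann(f, 0.0, 1.0))
for n in (1, 10, 1000):
    print(n, riemann(lambda x: f(x) * phi(x) ** (1.0 / n), -1.0, 2.0))
```

The convergence is slow near the endpoints, where the exponential factor is small, but the limit is visibly the integral over (c, d).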

Finally, with a view to later applications, we will define distributions on open intervals. The closed interval gives a mathematically simpler situation, and indeed the open interval will be discussed in terms of its closed subintervals. However, the word “distribution” will be reserved for the open interval, and the terminology “continuous linear functional” will be used for the closed interval. The precise definitions are:

• Definition of a distribution

For any closed finite interval (a, b), let S(a,b) consist of all continuous φ(x) possessing derivatives φ^{(n)}(x) of all orders which, along with φ(x) itself, vanish at a and b and for x outside (a, b). A linear functional F(φ), defined for all φ ∈ S(a,b), is called a continuous linear functional on (a, b) if it has the property that whenever all φ, φm ∈ S(a,b), the φm(x) converge uniformly to φ(x), and for each n the derivatives φm^{(n)}(x) converge uniformly to φ^{(n)}(x), then F(φm) will converge to F(φ).

For an arbitrary open interval I, finite or infinite, a distribution on I is a linear functional F(φ) such that, for every closed finite interval (a, b) contained in I, F(φ) is defined for φ ∈ S(a,b) and, when restricted to these φ, defines a continuous linear functional on (a, b).

• Identification of the distribution and point function

A distribution F(φ) on an interval I is to be identified with a point function f(x) if, for every closed finite interval (a, b) contained in I, f(x) is summable on (a, b) and

F(φ) = \int_a^b f(x)φ(x)\,dx  (2.8)

for all φ ∈ S(a,b). Sometimes the notation f(φ) is used to denote the distribution identified with the point function f(x).

• Identification of the distribution and measure function

A distribution F(φ) on an interval I is to be identified with the Stieltjes measure^5 dψ(x) if, for every closed finite interval (a, b) contained in I, ψ(x) is of bounded variation on (a, b) and

F(φ) = \int_a^b φ(x)\,dψ(x)  (2.9)

for all φ ∈ S(a,b).

• Definition of the derivative of a distribution

For any distribution F(φ) on I, the derivative distribution F′(φ) is defined by

F′(φ) = −F(φ′).  (2.10)

^5 For more information on this, see Lebesgue-Stieltjes Measures on the Real Line and Distribution Functions[8].

It is easily verified that this F′ satisfies the conditions for a distribution on I (the same formula defines the derivative of a continuous linear functional).
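The defining formula F′(φ) = −F(φ′) can be checked numerically for a concrete point function. In the sketch below (plain Python; the choice f(x) = |x| and the bump-shaped testing function are illustrative), the distributional derivative of |x|, computed as −F(φ′), agrees with the point function sign(x) applied to the same testing function:

```python
import math

def riemann(f, a, b, n=40000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def bump(x):
    """A testing function vanishing, with all derivatives, outside (-1, 1)."""
    return math.exp(-(1 / (x + 1) + 1 / (1 - x))) if -1 < x < 1 else 0.0

phi = lambda x: bump(x - 0.3)              # testing function supported in (-0.7, 1.3)
dphi = lambda x, h=1e-5: (phi(x + h) - phi(x - h)) / (2 * h)

f = abs                                    # |x|: continuous, no derivative at 0
g = lambda x: math.copysign(1.0, x)        # sign(x), its derivative as a distribution

lhs = -riemann(lambda x: f(x) * dphi(x), -2, 2)   # F'(phi) = -F(phi'), eq. (2.10)
rhs = riemann(lambda x: g(x) * phi(x), -2, 2)     # the point function sign applied to phi
print(lhs, rhs)                            # agree up to quadrature error
```

The shifted bump keeps the test asymmetric about the kink of |x|, so the agreement is not a trivial consequence of symmetry.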

The distributions defined above include all continuous and even all Lebesgue (locally) summable point functions, all Stieltjes measures, and, as we will see, a variety of new mathematical entities. Within the system of distributions each distribution has a derivative and, consequently, derivatives of all orders:

F^{(n)}(φ) = (−1)^n F(φ^{(n)}).  (2.11)

The derivative of a point function f(x) may be a point function, a Stieltjes measure, or a more general distribution. It will be a point function g(x) if and only if f(x) is absolutely continuous on every finite closed (a, b) contained in I; then g(x) is the ordinary point derivative f′(x), which must then be defined (almost everywhere). The derivative of f(x) is a Stieltjes measure dψ(x) if and only if f(x) is of bounded variation on every finite closed (a, b) contained in I, and then ψ(x) differs from f(x) by an arbitrary additive constant. In particular, the derivative δ of the Heaviside function H(x) is now rigorously defined (I being any open interval which contains the origin) and is the measure function dH(x); this is the proper mathematical description of the Dirac delta function, which has no meaning as a point function. We can now use the notation δ = H′. In applications to certain physical problems, a locally summable f(x) may be thought of as representing a distribution of mass or electric charge on the x axis; as such,

\int_c^d f(x)\,dx

is the total (algebraic) charge on (c, d) and f(x) is the charge density at a particular x. From this point of view, the Dirac δ function represents the concentration of unit charge at a single point, the origin; δ′ represents a dipole, and higher derivatives of δ represent more complicated multipole terms. An interesting distribution, quite different from the Dirac δ and its derivatives, can be investigated with

f(x) = \begin{cases} x^{−1/2} & \text{for } x > 0 \\ 0 & \text{for } x ≤ 0. \end{cases}  (2.12)

The derivative of f(x) exists (as a distribution), although it is not a Stieltjes measure. Roughly speaking, f′ corresponds to negative mass continuously distributed on the positive x axis with an infinite quantity in every neighborhood of the origin, together with an infinite positive mass at the origin, in such a way that there is finite total algebraic mass on every finite closed interval. This is shown by the following formula, for φ ∈ S(a,b) with a < 0 < b:

f′(φ) = −\int_a^b f(x)φ′(x)\,dx = −\int_0^b f(x)φ′(x)\,dx
      = −\lim_{ε→0} \int_ε^b x^{−1/2}φ′(x)\,dx  (2.13)
      = −\lim_{ε→0} \left[ x^{−1/2}φ(x) \Big|_ε^b + \frac{1}{2}\int_ε^b x^{−3/2}φ(x)\,dx \right].

Because φ vanishes at b,

f′(φ) = \lim_{ε→0} \left[ \frac{φ(ε)}{\sqrt{ε}} + \int_ε^b \left( −\frac{1}{2} x^{−3/2} \right) φ(x)\,dx \right]
      = \lim_{ε→0} \left[ \frac{φ(0)}{\sqrt{ε}} + \int_ε^b \left( −\frac{1}{2} x^{−3/2} \right) φ(x)\,dx \right],  (2.14)

and since

\frac{φ(ε)}{\sqrt{ε}} − \frac{φ(0)}{\sqrt{ε}} = \frac{φ(ε) − φ(0)}{ε}\,\sqrt{ε} → φ′(0) \cdot 0 = 0 \text{ as } ε → 0.  (2.15)

Although −\frac{1}{2}x^{−3/2}φ(x) is not summable on (0, b) for arbitrary φ ∈ S(a,b), the bracket as a whole always has a finite limit, which has been called the “finite part” of the divergent integral by Hadamard[9]. Hadamard uses the notation

f′(φ) = \mathrm{F.p.} \int_0^b f′(x)φ(x)\,dx  (2.16)

to indicate that f′(x) is the non-summable point derivative of f(x) for x > 0. Finite parts of divergent integrals have been studied in great detail by Hadamard. Similarly, the linear functional

F(φ) = \lim_{ε→0} \left( \int_a^{−ε} + \int_ε^b \right) \frac{1}{x}\,φ(x)\,dx  (2.17)

corresponds to a continuously distributed mass with infinite positive mass along the positive x axis together with infinite negative mass along the negative x axis, in such a way that in every neighborhood of the origin there is finite total algebraic mass. The limit of the bracket is another case of a Hadamard finite part of a divergent integral, in this case coinciding with the Cauchy principal value. The distribution itself is the derivative of the point function log|x|.
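That last remark invites a numerical check: the derivative of log|x| in the distribution sense, −∫ log|x| φ′(x) dx, should agree with the symmetric-cutoff integral of eq. (2.17). A sketch (plain Python; the shifted bump testing function and the tolerances are illustrative choices):

```python
import math

def riemann(f, a, b, n=40000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def bump(x):
    """A testing function vanishing, with all derivatives, outside (-1, 1)."""
    return math.exp(-(1 / (x + 1) + 1 / (1 - x))) if -1 < x < 1 else 0.0

phi = lambda x: bump(x - 0.2)          # a testing function, not symmetric about 0
dphi = lambda x, h=1e-5: (phi(x + h) - phi(x - h)) / (2 * h)

# Derivative of log|x| as a distribution: F(phi) = -integral of log|x| * phi'(x).
lhs = -riemann(lambda x: 0.0 if x == 0 else math.log(abs(x)) * dphi(x), -2, 2)

# Eq. (2.17): cut out (-eps, eps) and integrate phi(x)/x over what remains.
def pv(eps):
    return (riemann(lambda x: phi(x) / x, -2.0, -eps) +
            riemann(lambda x: phi(x) / x, eps, 2.0))

print(lhs, pv(1e-3))                   # the cutoff integral approaches F(phi)
```

The two huge one-sided contributions nearly cancel, leaving the finite Cauchy principal value, just as the text describes.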

Chapter 3

The Calculus of Distributions

Multiplication of a distribution by a constant and addition of two distributions are defined by the formulas

(cF)(φ) = cF(φ),  (3.1)

and

(F1 + F2)(φ) = F1(φ) + F2(φ).  (3.2)

It is easily verified that the usual rules of addition and subtraction hold, even when differentiation is involved. Thus,

(c1F1 + c2F2)′ = c1F1′ + c2F2′.  (3.3)

The constant point function f(x) = k is identified with the constant distribution k(φ), which has the property

k(φ) = \int_a^b kφ(t)\,dt = k \int_a^b φ(t)\,dt  (3.4)

for every φ ∈ S(a,b). In particular, when k = 0 the corresponding distribution is called the zero distribution. Again, it is easily verified that if F is a constant distribution, then F′ is the zero distribution. To prove the converse, it is useful to establish the following expansion lemma.

Lemma. Let θ(x) be any function in S(c,d) for which

\int_c^d θ(x)\,dx = 1.  (3.5)

For instance, θ(x) could be

\left( \int_c^d φc,d(t)\,dt \right)^{−1} φc,d(x),

where φc,d(x) is the function defined in (2.5). Let n ∈ ℕ. Then any φ(x) in any S(a,b) with a ≤ c < d ≤ b can be expressed in the form

φ(x) = a0θ(x) + a1θ′(x) + ... + anθ^{(n)}(x) + ρn^{(n+1)}(x),  (3.6)

where a0, a1, . . . , an are uniquely determined constants and ρn(x) ∈ S(a,b).

This lemma can be proved by induction on n if we observe that a particular φ(x) ∈ S(a,b) can be expressed in the form ρ′(x) for some ρ(x) ∈ S(a,b) if and only if

\int_a^b φ(t)\,dt = 0.  (3.7)

In the expansion for a general φ(x) ∈ S(a,b),

a0 = \int_a^b φ(t)\,dt.  (3.8)

Now, using this representation with n = 1, we argue that if F′ = 0 then F(ρ1′) = −F′(ρ1) = 0 for all ρ1, and therefore

F(φ) = F(a0θ + ρ1′) = a0F(θ) + F(ρ1′) = a0k,  (3.9)

where k is a constant, the value of F(θ). Therefore,

F(φ) = k \int_a^b φ(t)\,dt,  (3.10)

proving that F is indeed a constant distribution. We say that the distribution G is a primitive of F if G′ = F. From the preceding paragraph, it follows that two primitives of the same F differ by a constant distribution. As is well known, in the case of point functions there exists an infinity of different (point function) primitives, and in order to specify one of them it is sufficient to give its value at a particular point. In the case of an arbitrary distribution F there exists again an infinity of primitives (distributions!), and a particular one can be specified by giving its value for any particular testing function θ(x) as described above. This follows immediately from the relationship

G(φ) = G(a0θ + ρ1′) = a0G(θ) + G(ρ1′) = a0G(θ) − F(ρ1),  (3.11)

which can be used to define G(φ) for every φ in terms of an arbitrary (but fixed) G(θ) and a given F. The fact that this G is a distribution can be deduced from the relation

ρ1(x) = \int_a^x φ(t)\,dt − \left( \int_a^b φ(t)\,dt \right) \int_a^x θ(t)\,dt.  (3.12)
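The construction behind (3.11) and (3.12) can be carried out numerically: normalize a bump to obtain θ with integral 1, compute a0 = ∫φ, and form ρ1 as in (3.12); ρ1 then vanishes at both endpoints, as membership in S(a,b) requires. A sketch in plain Python (the particular bumps and quadrature are illustrative choices):

```python
import math

def riemann(f, a, b, n=20000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def bump(c, d):
    """A testing function of the form (2.5), supported in (c, d)."""
    def f(x):
        return math.exp(-(1 / (x - c) + 1 / (d - x))) if c < x < d else 0.0
    return f

raw = bump(-0.5, 0.5)
mass = riemann(raw, -0.5, 0.5)
theta = lambda x: raw(x) / mass        # normalized so its integral is 1, as in (3.5)

phi = bump(-1.0, 1.0)                  # an arbitrary testing function in S(-1,1)
a0 = riemann(phi, -1.0, 1.0)           # the coefficient of eq. (3.8)

# rho1(x) = integral from a to x of (phi - a0*theta); its derivative is
# phi - a0*theta, so phi = a0*theta + rho1'.
def rho1(x):
    return riemann(lambda t: phi(t) - a0 * theta(t), -1.0, x, 2000)

print(a0, rho1(-1.0), rho1(1.0))       # rho1 is 0 at both ends, up to quadrature error
```

The vanishing of ρ1 at the right endpoint is exactly the statement that ∫(φ − a0θ) = 0, the condition (3.7) for being a derivative of a testing function.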

Chapter 4

Multiplication of Distributions

The product F1F2 is not defined for arbitrary distributions. This reflects the fact that the product f1(x)f2(x) of two locally summable point functions may not be locally summable. However, F1F2 is defined in certain cases. For example, if F1,F2 can be identified with f1(x), f2(x) respectively and the product f1(x)f2(x) is locally summable, then F1F2 is defined to be the distribution which is identified with f1(x)f2(x). This is a special case of the following general definition.

• Definition of a product of two continuous linear functionals on a finite closed interval

Suppose for some n that F1 is a point function f1(x) and that F2 is the nth derivative of a point function f2(x):

F1 = f1,  F2 = f2^{(n)}.  (4.1)

Suppose, too, that the product f1(x)f2(x) is summable. Then we define F1F2 by the formula

F1F2 = F1f2^{(n)}
     = (F1f2)^{(n)} − \binom{n}{1}(F1′f2)^{(n−1)} + \binom{n}{2}(F1^{(2)}f2)^{(n−2)} + \cdots
       + (−1)^r \binom{n}{r}(F1^{(r)}f2)^{(n−r)} + \cdots + (−1)^n F1^{(n)}f2.  (4.2)

Each term on the right is a continuous linear functional, since F1f2 is f1(x)f2(x), which was assumed summable, and for r < n, F1^{(r)}f2 is the product of an absolutely continuous (so it is bounded) point function and f2.

• Definition of the product of distributions on an open interval I

Let F(a,b) denote the distribution F restricted to the testing functions in S(a,b). If F1,F2 are distributions on I such that F1(a,b)F2(a,b) is defined on every (a, b) ∈ I, then F1F2 is defined to be the distribution on I which, when restricted to any S(a,b), coincides with F1(a,b)F2(a,b).

It is not difficult to verify that these definitions give a unique F1F2 whenever they define F1F2 at all and that our formula above for the product F1F2 is equivalent to

(F1F2)(φ) = (−1)^n \int_a^b f2(x)\,(F1(x)φ(x))^{(n)}\,dx.  (4.3)

It can also be verified that the product law (F1F2)′ = F1′F2 + F1F2′ is valid, and that the three products which occur in this statement are necessarily defined whenever one of the products on the right is defined. There is a special but important case in which this product rule can be greatly simplified. Suppose that F1 is a continuous point function a(x) possessing point function derivatives of all orders, so that F1^{(n)} is a continuous point function for each n. Then the product rule defines F1F2 for every F2 which, when restricted to an S(a,b), can be put in the form f2^{(n)} for some f2 and n which might depend on a and b. The rule simplifies in this case to

(F1F2)(φ) = (aF2)(φ) = F2(aφ)  (4.4)

for every testing function φ.^1 The preceding paragraph suggests that we could use the relation

(aF2)(φ) = F2(aφ)  (4.5)

to define aF2 whenever a(x) has (ordinary) derivatives of all orders but F2 is an arbitrary distribution. It is noteworthy that such a definition would not give anything new, since every distribution F2, when restricted to an S(a,b), can be put in the form f2^{(n)} for some point function f2(x) and some n, which may depend on a and b. This theorem will be proved in the next chapter; it is a consequence of the continuity condition in the definition of a distribution. It is noted that the original definition of cF, where c is a constant, is included in the general definition of product of distributions if c is considered as a constant distribution. Also note that when the Dirac δ and its derivatives δ^{(n)} are multiplied by an a(x) with derivatives a^{(n)}(x) of all orders, we obtain

a(x)δ = a(0)δ,
a(x)δ′ = (aδ)′ − a′δ = a(0)δ′ − a′(0)δ,  (4.6)

and in general,

a(x)δ^{(n)} = a(0)δ^{(n)} − \binom{n}{1} a′(0)δ^{(n−1)} + \binom{n}{2} a′′(0)δ^{(n−2)} + \cdots + (−1)^n a^{(n)}(0)δ.  (4.7)

^1 Observe that aφ is a testing function along with φ.
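The formulas in (4.6) can be verified directly, since δ and δ′ act on a smooth test function by evaluating it, and minus its derivative, at the origin. A numerical sketch (plain Python; a(x) = eˣ and the particular test function are illustrative choices, not from the text):

```python
import math

# delta and delta' act on smooth test functions by evaluation at the origin;
# phi'(0) is approximated by a central difference.
def delta(phi):
    return phi(0.0)

def ddelta(phi, h=1e-6):
    return -(phi(h) - phi(-h)) / (2 * h)    # delta'(phi) = -phi'(0)

a = math.exp                                # a(0) = 1 and a'(0) = 1
phi = lambda x: math.sin(x) + 2.0           # phi(0) = 2, phi'(0) = 1

# Left side: (a * delta')(phi) means delta' applied to the product a*phi.
lhs = ddelta(lambda x: a(x) * phi(x))

# Right side of eq. (4.6): a(0) delta'(phi) - a'(0) delta(phi).
rhs = a(0.0) * ddelta(phi) - 1.0 * delta(phi)

print(lhs, rhs)                             # both equal -(a*phi)'(0) = -3
```

Multiplying δ′ by a(x) thus mixes in a δ term weighted by a′(0), exactly as the product rule predicts.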

Chapter 5

The Order Classification of Distributions

The system of continuous linear functionals on a closed, finite interval includes every summable f(x) and its derivatives. This chapter will show that there are no other continuous linear functionals; that is, every continuous linear functional is either f(φ) or f^{(n)}(φ) for some suitable summable f(x) and some finite n.^1 Let F be a continuous linear functional on (a, b). F will be said to have finite order on (a, b) either if it can be identified with a summable point function f(x) or if, for some finite r, F is the rth order derivative of some such f(x): F = f^{(r)}. The smallest possible r will be called the order of F.^2 Clearly, if F can be identified with a summable f(x), then it has order 0 and its derivative F′ has order either 0 or 1. If F has order r > 0, its derivative has order r + 1. If F has order r and s ≥ r, then for some f which depends on F and s,

F = f^{(s)}.  (5.1)

The value of s determines the following properties of f:

• if s > r, f is absolutely continuous.

• if s = 0, f is uniquely determined to within Lebesgue equivalence.

• if s > 0, f is determined only to within an arbitrary additive polynomial in x of degree s − 1.

It must therefore be demonstrated that F has finite order. The definition of a continuous linear functional requires that F(φm) converge to F(φ) whenever all φ, φm ∈ S(a,b) and, for each n ≥ 0, the φm^{(n)}(x) converge uniformly to φ^{(n)}(x). This implies the stronger condition (Cr), for some finite r which depends on F:

^1 By using a higher n we can prove this with f(x) continuous or even absolutely continuous.
^2 In The Theory of Distributions[6], a slightly different definition is used. The order of F is defined there to be the lowest r for which F = F0^{(r)}, where F0 is a Stieltjes measure.

F(φm) converges to F(φ) whenever all φ, φm ∈ S(a,b) and, for all n with 0 ≤ n ≤ r, the φm^{(n)}(x) converge uniformly to φ^{(n)}(x).

Suppose that Cr is false for every r. A sequence of testing functions φm can then be defined such that for each m,

a. |φm^{(n)}| < 2^{−m} ∀ n ≤ m, and

b. F (φm) > 1.

Here a modified absolute value notation is being used: |φ| = max[|φ(x)|; a ≤ x ≤ b]. Then for every n, the φm^{(n)}(x) converge uniformly to the zero function, and since F is a continuous linear functional, this should imply that F(φm) → 0. This contradicts b. above, and so Cr cannot be false for every r; that is, Cr holds for some finite r. But, for n < r,

φ^{(n)}(x) = \int_a^x \frac{(x − t)^{r−n−1}}{(r − n − 1)!}\,φ^{(r)}(t)\,dt,  (5.2)

so that |φ^{(n)}| ≤ K|φ^{(r)}| ∀ n < r, for some finite K which depends only on r, a, and b. It follows that Cr is equivalent to the following condition Br:

F(φm) converges to F(φ) whenever all φ, φm ∈ S(a,b) and the φm^{(r)}(x) converge uniformly to φ^{(r)}(x).

The condition Br, in turn, implies the following condition Ar,

|F(φ)| ≤ |F|_r\,|φ^{(r)}| ∀ φ ∈ S(a,b),  (5.3)

with |F|_r a finite constant.^3

Indeed, if Ar were false, we could define a sequence φm with F(φm) > m|φm^{(r)}|. Then the functions μm(x) = |φm^{(r)}|^{−1} m^{−1} φm(x) would be in S(a,b), and the μm^{(r)}(x) would converge uniformly to zero since |μm^{(r)}| = m^{−1}. But F(μm) > 1, contradicting Br. Thus, Ar cannot be false. To summarize these results: if F is a continuous linear functional on (a, b), there is a finite r for which |F|_r < ∞, and

|F(φ)| ≤ |F|_r\,|φ^{(r)}| ∀ φ ∈ S(a,b).  (5.4)

Now we construct a new functional L, defined for functions which can be put in the form φ^{(r)}, by the formula L(φ^{(r)}) = F(φ). The functional L is, by the preceding paragraph, a bounded linear functional on the linear space of the functions of the form φ^{(r)} with norm |φ^{(r)}|. The Hahn-Banach procedure[10] can be used to extend this functional L to all continuous functions on (a, b)

^3 Let |F|_r denote the smallest possible such constant.

without increasing the bound of L. Then the representation theorem of F. Riesz[10] applies and shows that

L(φ^{(r)}) = \int_a^b φ^{(r)}(x)\,dψ(x)  (5.5)

for some ψ(x) of total variation equal to |F|_r, and ψ(x) can be assumed to satisfy |ψ(x)| ≤ |F|_r. Thus,

F(φ) = L(φ^{(r)}) = \int_a^b φ^{(r)}(x)\,dψ(x) = −\int_a^b ψ(x)φ^{(r+1)}(x)\,dx,  (5.6)

so that F = f^{(r+1)} where f = (−1)^r ψ, showing that F is indeed of finite order. If f is replaced by one of its indefinite integrals, we can write F = f^{(r+2)} with f(x) absolutely continuous.

For use in the next chapter, we emphasize that if |F|_r is finite then F = f^{(r+1)} with f(x) bounded, in fact |f| ≤ |F|_r. The converse is false; however, |F|_r is certainly finite if F is of order r, since then

|F(φ)| = \left| (−1)^r \int_a^b f(x)φ^{(r)}(x)\,dx \right| ≤ \left( \int_a^b |f(x)|\,dx \right) |φ^{(r)}|.  (5.7)

Thus, every continuous linear functional F on a finite closed interval can be expressed as a suitable derivative of a summable point function, and our system of continuous linear functionals is extensive enough to permit indefinite differentiation of the summable point functions. This suggests an abstract but equivalent formulation of the system of continuous linear functionals as the system of all symbols f^{(n)}, using all summable f(x) and all nonnegative n, and imposing obvious rules of addition, identification, and so on. If a distribution F on an open interval I is considered, then the above considerations apply to every F(a,b). We write F(a,b) = f^{(r)} on (a, b) and speak of F as having order r on (a, b). However, the order of F on (a, b) may be unbounded as a and b vary. A simple example of this is the distribution

F(φ) = \sum_{m=1}^{∞} φ^{(m)}(m)  (5.8)

on the interval −∞ < x < ∞. The sum on the right has only a finite number of nonzero addends for every testing function, and the order of this F on an interval (a, b) is the largest n for which a < n < b.
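Reading (5.8) as F(φ) = Σ φ^{(m)}(m), consistent with the remark about its order on (a, b), the finiteness of the sum is easy to see concretely. A sketch (plain Python; the bump supported in (0.5, 2.5) and the finite-difference derivatives are illustrative choices):

```python
import math

def bump(c, d):
    """A testing function of the form (2.5), supported in (c, d)."""
    def f(x):
        return math.exp(-(1 / (x - c) + 1 / (d - x))) if c < x < d else 0.0
    return f

def deriv(f, x, n, h=1e-3):
    """n-th derivative by repeated central differences (adequate for small n)."""
    if n == 0:
        return f(x)
    return (deriv(f, x + h, n - 1, h) - deriv(f, x - h, n - 1, h)) / (2 * h)

phi = bump(0.5, 2.5)                   # supported in (0.5, 2.5)

# F(phi) = sum over m >= 1 of phi^(m)(m).  Only m = 1 and m = 2 contribute,
# since phi vanishes identically in a neighborhood of every other integer.
F = sum(deriv(phi, m, m) for m in range(1, 6))
print(F, deriv(phi, 1, 1) + deriv(phi, 2, 2))
```

On (0.5, 2.5) this F behaves like a second-order functional, but on a wider interval containing larger integers its order grows without bound, as the text states.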

Chapter 6

Continuity and Convergence Properties of Distributions

Consider continuous linear functionals on the interval (a, b).

A set of Fa will be called bounded if, for each φ ∈ S(a,b), the numbers Fa(φ) are bounded, that is

|Fa(φ)| ≤ M, (6.1) where M is a finite constant which depends on φ, M = M(φ).

We say that Fm form a convergent sequence if, for each φ, the numbers Fm(φ) form a convergent sequence, and that the Fm converge to F as limit if, again for each φ, the Fm(φ) converge to F (φ) as limit. Obviously, if the Fm converge to a limit, then the Fm form a convergent sequence. Conversely, if the Fm do form a convergent sequence, we define F by

F(φ) = \lim_{m→∞} Fm(φ).  (6.2)

As we will show later in this chapter, F is a continuous linear functional and the Fm converge to F as limit.

Finally, we say that the series \sum_{m=1}^{∞} Fm is convergent and has sum F if \sum_{m=1}^{N} Fm converges to F as N becomes infinite.

We now show that if the Fa are bounded there must be a finite r such that

|Fa(φ)| ≤ K|φ^{(r)}|  (6.3)

for all a and φ, for some finite constant K; that is, the |Fa|_r are bounded. Indeed, if this were false for every r, we could, by induction on p, select a sequence of φp and Fap such that, for each p,

a. |φp^{(n)}| < 2^{−p} ∀ n ≤ p,

b. Fap(φp) > p + 1 + \sum_{m=1}^{p−1} M(φm),

c. Fap(φm) < 2^{−m} ∀ m > p.

Let i = 0, 1, ..., p − 1 and suppose the φi, Fai have been selected to satisfy a., b., c., insofar as they are involved in these conditions. Since each Fai has |Fai|_{ri} finite for some ri, we will have |Fai(φp)| < 2^{−p} ∀ i if φp satisfies the p conditions

|φp^{(ri)}| < 2^{−p} |Fai|_{ri}^{−1}, with i = 0, 1, ..., p − 1.  (6.4)

These p conditions, as well as a., can be included in a single condition

|φp^{(r)}| < K1,  (6.5)

with a suitable finite K1 and r = max(r0, ..., r_{p−1}, p). Since we are assuming that the |Fa|_r are unbounded for every r, there must be an Fap and a φp such that φp satisfies this condition and

Fap(φp) satisfies b. Then the φi, Fai with i = 0, 1, ..., p will satisfy a., b., c., insofar as they are involved in these conditions. Then \sum_{m=1}^{∞} φm(x) would be a testing function φ(x) for which

Fap(φ) = Fap\left( \sum_{m=1}^{∞} φm \right) = \lim_{N→∞} Fap\left( \sum_{m=1}^{N} φm \right) > p,  (6.6)

and therefore, for this φ, the Fa(φ) would not be bounded. This contradiction shows that the Fa are bounded if and only if |Fa(φ)| ≤ K|φ^{(r)}| for some fixed K and r, for all a and all testing functions. Using the result emphasized in the preceding chapter, it can be concluded that continuous linear functionals Fa are bounded if and only if they can be expressed as Fa = fa^{(r)} for some common r, with |fa| bounded.

As a corollary, we deduce that if the Fm form a convergent sequence, then F(φ) = \lim_{m→∞} Fm(φ) is a continuous linear functional and the Fm converge to F; for the convergence of the Fm implies that they are bounded, therefore |Fm|_r ≤ K for some r and fixed finite K, and therefore

|F(φ)| = \lim_{m→∞} |Fm(φ)| ≤ K|φ^{(r)}|,  (6.7)

which implies that F is a continuous linear functional. If the Fm can be expressed as Fm = fm^{(r)} with the fm(x) converging uniformly, then

Fm(φ) = (−1)^r \int_a^b fm(x)φ^{(r)}(x)\,dx → (−1)^r \int_a^b \left( \lim_{m→∞} fm(x) \right) φ^{(r)}(x)\,dx,  (6.8)

so that the Fm do converge and \lim_{m→∞} Fm = \{\lim_{m→∞} fm(x)\}^{(r)}. The converse also holds. That is, if the Fm are convergent, then for some r, Fm = fm^{(r)} with fm(x) converging uniformly, and \lim_{m→∞} Fm = \{\lim_{m→∞} fm(x)\}^{(r)}.

Suppose therefore that the Fm are a convergent sequence so that the Fm(φ) converge for every φ to a limit F (φ). Then the F,Fm are bounded and therefore

Fm = fm^{(r)},  F = f^{(r)}

for some suitable f(x), fm(x) with |f|, |fm| bounded. By replacing each of f(x), fm(x) by its integral from a to x, and using r in place of the former r + 1, it can be supposed that F = f^{(r)}, Fm = fm^{(r)} with the f(x), fm(x) equicontinuous on (a, b). We can also write Fm = (fm + Pm)^{(r)}, where the fm(x) are as before and the Pm(x) are arbitrary polynomials of degree less than r. It will be shown in a separate lemma that these polynomials can be chosen so that fm(x) + Pm(x) converges uniformly to f(x). This will prove the theorem: Fm converges if and only if Fm = fm^{(r)} for some r and uniformly convergent continuous fm(x), and then F = \lim_{m→∞} Fm = (\lim_{m→∞} fm(x))^{(r)}.

It remains therefore to prove the following lemma, with gm(x) in place of our former fm(x) − f(x).

Lemma. If the gm(x) are equicontinuous in x on (a, b) and if, for some fixed r and every testing function φ,

\int_a^b gm(x)φ^{(r)}(x)\,dx

converges to zero as m → ∞, then there are polynomials Pm(x) of degree less than r such that gm(x) + Pm(x) → 0 uniformly as m → ∞.

We may suppose that the gm(x) are real valued, by considering real and imaginary parts separately if they are complex valued. We first prove the lemma for the case r = 0. Let ε be an arbitrary positive number and δ(ε) be such that, for all m, |gm(x) − gm(y)| < ε whenever |x − y| < δ(ε). Let (a, b) be covered by a finite number of intervals I1, ..., It, each of length less than δ(ε). Suppose that for some q and some Ip = (c, d), say, gq(x) ≥ ε ∀ x ∈ Ip. Then

\int_a^b gq(x)φc,d(x)\,dx ≥ ε \int_c^d φc,d(x)\,dx,  (6.9)

where φc,d is defined in Chapter 2. Since \int_a^b gm(x)φc,d(x)\,dx → 0 as m → ∞, it is impossible for this to be true for an infinite number of q for the same Ip. Since there are only a finite number of such Ip, there is an m0 such that, for m ≥ m0, gm(xp) < ε for at least one xp in each Ip. But every x ∈ (a, b) lies in some Ip, so

gm(x) = (gm(x) − gm(xp)) + gm(xp),  (6.10)

i.e., ∀ x ∈ (a, b), ∀ m ≥ m0, gm(x) < 2ε. Similarly, we can obtain −gm(x) < 2ε, and therefore |gm(x)| < 2ε ∀ x, for all but a finite number of m. This means that the gm(x) converge uniformly to zero, proving the lemma for the case r = 0.

Now we prove the lemma for all r by induction. Suppose that θ(x) is a fixed function in S(a,b) with \int_a^b θ(x)\,dx = 1, and use the expansion lemma of Chapter 3 to write φ(x) = a0θ(x) + ρ1′(x), where a0 = \int_a^b φ(x)\,dx. Then

\int_a^b gm(x)φ^{(p)}(x)\,dx = \int_a^b gm(x)\left( a0θ^{(p)}(x) + ρ1^{(p+1)}(x) \right) dx = cm \int_a^b φ(x)\,dx + \int_a^b gm(x)ρ1^{(p+1)}(x)\,dx,  (6.11)

where

cm = \int_a^b gm(x)θ^{(p)}(x)\,dx,  (6.12)

which depends on gm but not on φ, and is bounded for all m. Repeated use of integration by parts shows that

cm \int_a^b φ(x)\,dx = cm \frac{(−1)^p}{p!} \int_a^b x^p φ^{(p)}(x)\,dx = −bm \int_a^b x^p φ^{(p)}(x)\,dx,  (6.13)

where bm is a constant bounded in m, depending on gm but not on φ. Thus, we have the identity

\int_a^b \left( gm(x) + bm x^p \right) φ^{(p)}(x)\,dx = \int_a^b gm(x)ρ1^{(p+1)}(x)\,dx,  (6.14)

and it is clear that if the lemma holds for r = p, it also holds for r = p + 1. Therefore, it holds for all r.

We may now consider distributions on an open interval $I$. We say that the $F_a$ are bounded if, for each $\phi$, the $F_a(\phi)$ are bounded, and we say $F_m$ converges to $F$ if, for each $\phi$, $F_m(\phi)$ converges to $F(\phi)$. Now our previous work enables us to draw relevant conclusions about the restricted functionals $F_{a(a,b)}$, $F_{(a,b)}$ for each $(a, b) \in I$.

An important convergence property of distributions is that if $F_m$ converges to $F$, then $F_m'$ converges to $F'$. This follows immediately from the fact that

\[
F'(\phi) - F_m'(\phi) = -\left(F(\phi') - F_m(\phi')\right). \tag{6.15}
\]
Thus, if $F = \sum_{m=1}^\infty F_m$, then $F' = \sum_{m=1}^\infty F_m'$, so that term by term differentiation is valid without restriction for distributions. In particular, if $f(x)$ is a sum of uniformly convergent functions $f_m(x)$,

\[
f(x) = \sum_{m=1}^\infty f_m(x), \tag{6.16}
\]
each of which is absolutely continuous, then $f(x)$ may not be absolutely continuous, but it will be summable and therefore have a distribution derivative $f'$ which will be the limit of $\sum_{m=1}^N f_m'(x)$ as $N \to \infty$. If $f(x)$ happens to be absolutely continuous, then $\sum_{m=1}^N f_m'(x)$ will converge to $f'(x)$.

Let us say that a distribution $F_t$, depending on parameter $t$, converges to $F$ as $t \to t_0$ if, for each $\phi$, $F_t(\phi) \to F(\phi)$ as $t \to t_0$. Then convergence of $F_t$ to $F$ implies that of $F_t'$ to $F'$. Thus, starting from the function

\[
f(x) = \lim_{t\to\infty} -\int_1^t \frac{\cos wx}{w^2}\,dw, \tag{6.17}
\]
we can differentiate twice, in the sense of distributions, to obtain

\[
f'' = \lim_{t\to\infty} \int_1^t \cos wx\,dw, \tag{6.18}
\]
so that we can give a meaning to $\int_1^t \cos wx\,dw$ as a distribution although the integral is not convergent in the usual sense. Let us calculate the value of $2\int_0^\infty \cos 2\pi wx\,dw$, which, according to the preceding paragraph, will have meaning as a distribution. We set

\[
g_t(x) = 2\int_0^t \cos 2\pi wx\,dw = \frac{\sin 2\pi tx}{\pi x} \tag{6.19}
\]
and observe that

\[
g_t(\phi) = \int_{-\infty}^\infty \frac{\sin 2\pi tx}{\pi x}\,\phi(x)\,dx \tag{6.20}
\]
converges to $\phi(0)$ as $t$ becomes infinite (the well known Dirichlet integral of Fourier series), so that $g_t$ as a distribution converges to the Dirac $\delta$. This gives the result
\[
2\int_0^\infty \cos 2\pi wx\,dw = \delta. \tag{6.21}
\]
Such formulas have been used classically in electromagnetic theory and in symbolic calculations in wave mechanics, but often without adequate explanation. As distribution formulas, they are unambiguous and have rigorous validity.
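The convergence of the pairing (6.20) to $\phi(0)$ can be seen numerically. In this sketch (our own; the helpers `trap` and `g_t`, the Gaussian test function, and the truncation of the integral to $|x| \le 60$ are assumptions of the illustration), the value of $g_t(\phi)$ approaches $\phi(0) = 1$ as $t$ grows:

```python
import numpy as np

def trap(y, x):
    # plain trapezoidal rule on a grid
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

def g_t(phi, t, lim=60.0, n=2_000_001):
    # the pairing (6.20): ∫ sin(2π t x)/(π x) · φ(x) dx, truncated to |x| <= lim
    x = np.linspace(-lim, lim, n)
    with np.errstate(divide="ignore", invalid="ignore"):
        # the kernel has the finite limit 2t at x = 0
        kernel = np.where(x == 0.0, 2.0 * t, np.sin(2 * np.pi * t * x) / (np.pi * x))
    return trap(kernel * phi(x), x)

phi = lambda u: np.exp(-u**2)      # smooth, rapidly decaying; phi(0) = 1
print(g_t(phi, 1.0))               # ≈ 0.99999 (this pairing equals erf(πt))
print(g_t(phi, 5.0))               # ≈ 1.0 = phi(0)
```

For a Gaussian test function the pairing can even be evaluated in closed form as $\mathrm{erf}(\pi t)$, which confirms the limit.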

If $F_t$ is a distribution depending on parameter $t$, we can define $\frac{dF}{dt}$ (not to be confused with $F'$) as the distribution $F$ for which $F(\phi) = \frac{d}{dt}F_t(\phi)$ $\forall \phi$, if such an $F$ exists. Similarly, we can define $\int_{t_1}^{t_2} F_t\,dt$ to be $F$ if $\int_{t_1}^{t_2} F_t(\phi)\,dt$ exists and equals $F(\phi)$ $\forall \phi$.

Chapter 7

Nullity Sets and Supporting Sets

A distribution $F$, even though it is more general than a point function, can be said to have “local” properties, that is, properties which concern a particular, but arbitrary, $x$ and neighborhoods of that $x$. We have already seen that for each $x$ there is an interval $(c, d)$ with $c < x < d$ such that $F$ has finite order on $(c, d)$.

We say that a particular $x_0 \in I$ is in the nullity set of a distribution $F$ on $I$ if $F_{(c,d)} = 0$ for some $c < x_0 < d$. The complement with respect to $I$ of the nullity set of $F$ will be called the supporting set of $F$. The nullity set is open and the supporting set is closed, relative to the containing interval $I$. If $F$ happens to be identifiable with a continuous point function $f(x)$, then the supporting set of $F$ will be the closure of the set of $x$ for which $f(x) \ne 0$, the closure of an open set. The nullity set will be the interior of the set of $x$ for which $f(x) = 0$, the interior of a closed set. But these statements are not valid for arbitrary locally summable, not necessarily continuous, $f(x)$. The zero distribution has a nullity set which includes all $x$, so that its supporting set is the empty set. Conversely, the zero distribution is the only distribution with these properties. On any $(a, b) \in I$, $F = f^{(r)}$ with $f(x)$ continuous on $(a, b)$, so that

\[
F(\phi) = (-1)^r \int_a^b f(x)\phi^{(r)}(x)\,dx \quad \forall \phi \in S_{(a,b)}, \tag{7.1}
\]
and therefore for all $\phi \in S_{(c,d)}$ with $a \le c < d \le b$. If some particular $F_{(c,d)} = 0$, the Lemma of Chapter 6 (with all $g_m$ of the Lemma equal to $f$) implies that $f(x)$ is a polynomial of degree less than $r$ on $(c, d)$. By the Heine-Borel Theorem, $(a, b)$ can be covered by a finite number of such $(c, d)$ on each of which $f(x)$ is a polynomial of degree less than $r$. Thus, $f(x)$ is a polynomial of degree less than $r$ on $(a, b)$ and $F(\phi) = 0$ $\forall \phi \in S_{(a,b)}$. Therefore $F = 0$, as required. This type of reasoning also shows that, for a particular $\phi$, $F(\phi) = 0$ if the closure of the set of $x$ for which $\phi(x) \ne 0$ is contained in the nullity set of $F$. It follows that the value of $F(\phi)$ depends only on the values of $\phi(x)$ in the neighborhood of the supporting set of $F$, that is, on the values of $\phi(x)$ in any open set containing the supporting set. The preceding statement cannot be sharpened to imply that $F(\phi)$ depends only on the values of $\phi(x)$ on the supporting set of $F$. For example, the derivative of the Dirac $\delta$, namely $\delta'$, has the origin as the only point in its supporting set, yet $\delta'(\phi) = -\phi'(0)$ is not determined by the value

of $\phi(0)$. Since on every $(a, b)$, each $F$ has the form $f^{(r)}$, we can show that for each $(a, b)$ and each

$F$ there is a fixed $r$ such that the value of $F(\phi)$, for $\phi \in S_{(a,b)}$, depends only on the values of $\phi(x), \phi'(x), \ldots, \phi^{(r)}(x)$ on the supporting set of $F$. Indeed, $N$, the nullity set of $F$, is an open set and can be expressed as the set union of a finite or countable number of disjoint open intervals $(c_n, d_n)$. Then

\[
F(\phi) = (-1)^r \int_a^b f(x)\phi^{(r)}(x)\,dx
= (-1)^r \sum_n \int_{c_n}^{d_n} f(x)\phi^{(r)}(x)\,dx + (-1)^r \int_{I-N} f(x)\phi^{(r)}(x)\,dx. \tag{7.2}
\]

On each $(c_n, d_n)$, $f(x)$ is a polynomial $P_n(x)$ of degree $r - 1$ at most, so that, when integrated, $\int_{c_n}^{d_n} f(x)\phi^{(r)}(x)\,dx$ depends only on the values of $\phi(x), \phi'(x), \ldots, \phi^{(r)}(x)$ at $c_n$ and $d_n$. Since all $c_n, d_n$ are in the supporting set of $F$, and since the value of the integral of $f(x)\phi^{(r)}(x)$ over $I - N$, the supporting set of $F$, depends only on the values of $\phi^{(r)}$ on the supporting set of $F$, the desired conclusion can be drawn. A corollary of the previous paragraph is that a distribution has a supporting set consisting of one point only, the origin, if and only if the distribution is a finite linear combination of the Dirac $\delta$ and its derivatives:

\[
F = \sum_{p=0}^N c_p \delta^{(p)}. \tag{7.3}
\]

For any fixed x0, let δx0 denote the distribution

\[
\delta_{x_0}(\phi) = \phi(x_0), \tag{7.4}
\]
so that the $\delta$ of our previous notation coincides with $\delta_0$. From now on, we call any such distribution

$\delta_{x_0}$ a Dirac delta. Then it follows, as above, that a distribution on $I$ has a supporting set consisting of isolated points if and only if the distribution is a finite or infinite linear combination of Dirac deltas and their derivatives such that for each $(a, b) \in I$, only a finite number of these deltas and their derivatives are taken at points in $(a, b)$. It is easily seen that the supporting set of $F$ contains the supporting set of $F'$, and an analysis of the non-increasing family of supporting sets of $F^{(n)}$ gives some information about the local structure of $F$. For use later in this chapter, we require the following observations: If $a < c < d < b$, there is a continuous function $a(x)$ with derivatives $a^{(n)}(x)$ of all orders such that $a(x) = 0$ for $x \le c$ and $a(x) = 1$ for $x \ge d$. Such a function can be obtained by defining $a(x)$ for $c < x < d$ as

\[
a(x) = \left(\int_c^d \phi_{c,d}(t)\,dt\right)^{-1} \int_c^x \phi_{c,d}(t)\,dt, \tag{7.5}
\]

22 where φc,d was defined in (2.5). Therefore, every φ ∈ S(a,b) can be put in the form φ1 + φ2, with φ1 ∈ S(a,d) and φ2 ∈ S(c,b); set φ1 = (1 − a)φ and φ2 = aφ. Repeated applications of this observation show that if (a, b) can be covered by a finite number of open intervals Ip = (cp, dp), with p = 1,...,N, then every φ ∈ S(a,b) can be expressed as a sum

\[
\sum_{p=1}^N \phi_p
\]
with $\phi_p \in S_{(c_p,d_p)}$. Finally, if each $x_0 \in (a, b)$ is covered by at least one of a family of open intervals $I_a$, then $(a, b)$ is covered by a finite number of such $I_a$ by the Heine-Borel Theorem. This leads to the partition theorem: a finite number of $(c, d)$ can be selected, each an $I_a$, so that every $\phi$ can be expressed as a sum of $\phi_p$, with each $\phi_p$ in one of the $S_{(c,d)}$. Moreover, this can be done in such a way that the condition on a sequence of $\phi$ that they and all their $n$th derivatives converge uniformly to zero is equivalent to the same condition for their $\phi_p$ $\forall p$. Now suppose that $T(\phi)$, not assumed to be a distribution, is defined for certain $\phi$ as follows: for a family of $(c, d) \in I$, $T(\phi)$ is defined for all $\phi \in S_{(c,d)}$ and, when restricted to $S_{(c,d)}$, is a continuous linear functional $T_{(c,d)}$ on $(c, d)$; suppose too that each $x_0 \in I$ is covered by the interior of at least one $(c, d)$. In other words, suppose that $T(\phi)$ defines a distribution locally. Then there is one and only one distribution $F$ on $I$ which extends $T$, that is, for which $F_{(c,d)} = T_{(c,d)}$ for all such $(c, d)$. For an arbitrary $\phi$ can be expressed as the sum of a finite number of $\phi_p$, with $\phi_p$ in some $S_{(c,d)}$.¹ Then $T(\phi_p)$ is defined for each $p$, and $F$, if it exists, must satisfy $F(\phi) = \sum_p T(\phi_p)$. On the other hand, this actually defines $F(\phi)$ uniquely and this $F$ is a distribution with the properties stated above, as follows from the preceding paragraph. The preceding observations enable us to generalize our definition of the product of two distributions.
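The smooth transition function $a(x)$ of (7.5), and the splitting $\phi = (1-a)\phi + a\phi$ it supports, can be sketched numerically. Here we assume the bump of (2.5), not reproduced in this excerpt, has the standard form $\phi_{c,d}(t) = e^{-1/((t-c)(d-t))}$ on $(c, d)$ and $0$ elsewhere; the helper names are our own:

```python
import numpy as np

def phi_cd(t, c, d):
    # assumed form of the bump of (2.5): positive on (c, d), zero elsewhere,
    # with derivatives of all orders at c and d
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    inside = (t > c) & (t < d)
    out[inside] = np.exp(-1.0 / ((t[inside] - c) * (d - t[inside])))
    return out

def smooth_step(x, c, d, n=40_001):
    # a(x) of (7.5): 0 for x <= c, 1 for x >= d, infinitely smooth in between
    t = np.linspace(c, d, n)
    y = phi_cd(t, c, d)
    cum = np.concatenate(([0.0], np.cumsum((y[1:] + y[:-1]) * np.diff(t) / 2)))
    return float(np.interp(x, t, cum / cum[-1]))

print(smooth_step(-1.0, 0.0, 1.0),   # 0 to the left of (c, d)
      smooth_step(0.5, 0.0, 1.0),    # 1/2 at the midpoint, by symmetry
      smooth_step(2.0, 0.0, 1.0))    # 1 to the right of (c, d)
```

With $a$ so constructed, $\phi_1 = (1-a)\phi$ vanishes for $x \ge d$ and $\phi_2 = a\phi$ vanishes for $x \le c$, giving the decomposition used in the partition argument.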

If $F_1, F_2$ are two distributions on $I$, we say that their product is defined locally at $x_0$ if $F_{1(c,d)}F_{2(c,d)}$ is defined, in accordance with our previous definition, for some $(c, d)$ with interior covering $x_0$. If the product $F_1F_2$ is defined locally at every $x$, there is one and only one distribution $F$ on $I$ such that $F_{(c,d)} = F_{1(c,d)}F_{2(c,d)}$ $\forall (c, d)$. We define this $F$ to be the product $F_1F_2$. In order that the product $F_1F_2$ be defined locally at $x_0$, it is easily seen that a necessary (but not sufficient) condition is that $x_0$ must not be an isolated point in the supporting sets of $F_1$ and $F_2$. Thus, $F_1F_2$ is not defined if there are any common isolated points in the supporting sets. For example, the product of $\delta_0$ by itself is not defined. On the other hand, $F_1F_2$ will certainly be defined and will be the zero distribution if the supporting sets have no points in common.²

¹This gives a simple proof that if $F$ is a distribution with nullity set that includes all $x$, then $F$ must be the zero distribution.
²That this sufficient condition is not necessary is shown by the fact that $x\delta_0$ is the zero distribution.

Chapter 8

Differential Equations Involving Distributions

Consider the differential equation

\[
G^{(m)} + A_{m-1}G^{(m-1)} + \ldots + A_0G = F \tag{8.1}
\]
under the assumptions that each $A_{m-s}$ is a bounded function of $x$ on a closed finite interval $(a, b)$ and that $F$ is a function of $x$ summable on $(a, b)$. A point function $G(x)$ is a solution of the differential equation if $G(x), \ldots, G^{(m-1)}(x)$ are absolutely continuous and the $G(x), \ldots, G^{(m)}(x)$ satisfy the given equation for almost all $x$. All such solutions can be found by the classical method of successive substitutions, as follows:

Let $H^{[-1]}(x)$ denote $\int_a^x H(t)\,dt$ for any summable $H(x)$, so that
\[
H^{[-s]}(x) = \int_a^x \frac{(x-t)^{s-1}}{(s-1)!}\,H(t)\,dt. \tag{8.2}
\]
Let $\Delta H$ denote

\[
\Delta H = -\left(A_{m-1}(x)H^{(m-1)}(x) + \ldots + A_0(x)H(x)\right)
\]
for any $H(x)$ such that $H(x), \ldots, H^{(m-1)}(x)$ exist and are absolutely continuous. Then set

\[
G_0(x) = F^{[-m]}(x) \tag{8.3}
\]
and

\[
G_{k+1}(x) = F^{[-m]}(x) + (\Delta G_k)^{[-m]}(x) \quad \text{for } k \ge 0. \tag{8.4}
\]
The series

\[
G_0(x) + \sum_{k=0}^\infty \left(G_{k+1}(x) - G_k(x)\right)
\]
will converge uniformly to a sum $G(x)$, and the first, second, \ldots, $m$th differentiated series will converge uniformly to $G'(x), \ldots, G^{(m)}(x)$ respectively. From this, it can be shown that $G(x)$ is a

solution of the given differential equation, to be called a particular integral. From the construction it also follows that if, for any $p$, all coefficients $A_{m-s}(x)$ have $A_{m-s}(x), \ldots, A_{m-s}^{(p)}(x)$ absolutely continuous and $A_{m-s}^{(p+1)}(x)$ bounded, and $F(x)$ has $F(x), F'(x), \ldots, F^{(p)}(x)$ absolutely continuous, then $G(x)$ has $G(x), G'(x), \ldots, G^{(m+p)}(x)$ all absolutely continuous. In particular, if the $A_{m-s}(x)$ and $F(x)$ are indefinitely differentiable, then $G(x)$ is indefinitely differentiable. Now $m$ linearly independent solutions of the homogeneous equation (setting $F = 0$) can be obtained by choosing $r$ to be in turn $0, 1, \ldots, m-1$ and setting

\[
G_0(x) = \frac{x^r}{r!}, \tag{8.5}
\]

\[
G_{k+1}(x) = G_0(x) + (\Delta G_k)^{[-m]}(x), \tag{8.6}
\]
and

\[
G(x) = G_0(x) + \sum_{k=0}^\infty \left(G_{k+1}(x) - G_k(x)\right). \tag{8.7}
\]
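The successive-substitution scheme can be sketched numerically for a concrete case (our own illustration; the equation $G'' + G = 0$, the interval, and the helper `cumtrap` are choices of the sketch). With $m = 2$, $A_1 = 0$, $A_0 = 1$ and $r = 0$, the iterates, written in the fixed-point form $G_{k+1} = G_0 + (\Delta G_k)^{[-2]}$ equivalent to summing the series, should converge uniformly to $\cos x$:

```python
import numpy as np

def cumtrap(y, x):
    # cumulative trapezoidal integral: H^[-1](x) = ∫_a^x H(t) dt
    return np.concatenate(([0.0], np.cumsum((y[1:] + y[:-1]) * np.diff(x) / 2)))

# G'' + G = 0 on [0, 1.5]: here ΔH = -(A_0 H) with A_0 = 1, and the r = 0
# homogeneous iteration starts from G_0 = x^0/0! = 1.
x = np.linspace(0.0, 1.5, 4001)
G0 = np.ones_like(x)
G = G0.copy()
for _ in range(25):
    # repeated integration (ΔG_k)^[-2] done as two cumulative passes
    G = G0 + cumtrap(cumtrap(-G, x), x)

print(np.max(np.abs(G - np.cos(x))))   # uniformly small on the interval
```

The first few iterates are the partial sums $1$, $1 - x^2/2$, $1 - x^2/2 + x^4/24, \ldots$ of the cosine series, which is exactly the uniform convergence the text asserts.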

Without danger of confusion, we now adopt the notation Gp(x) for the particular integral and G0(x),...,Gm−1(x) for the solutions of the homogeneous equation above. Then for any constants c0, . . . , cm−1, the function

\[
G(x) = G_p(x) + c_0G_0(x) + \ldots + c_{m-1}G_{m-1}(x) \tag{8.8}
\]
will be a solution of the given differential equation, and $G_p$ will have differentiability properties, as explained above, depending on those of $F$ and the $A_{m-s}$, while $G_0, \ldots, G_{m-1}$ will have differentiability properties depending on those of the $A_{m-s}$ only. As can be shown, there are no other point function solutions of the given differential equation. We now regard the given differential equation as an equation in an unknown continuous linear functional $G$. The point function solutions previously found are also continuous linear functionals. We will show that there are no other continuous linear functional solutions. Indeed, if the order of $G^{(m)}$ were greater than zero, it would be greater than the orders of each of $G^{(m-1)}, \ldots, G, F$, and therefore greater than the order of

\[
F - \left(A_{m-1}G^{(m-1)} + \ldots + A_0G\right),
\]
which is supposed to be equal to $G^{(m)}$. This contradiction shows that the order of $G^{(m)}$ must be zero, so that $G^{(m)}$ is a summable point function, and therefore $G$ is a point function solution as defined above. The above technique of solving the differential equation can be applied if $F$ is an arbitrarily assigned continuous linear functional, provided that the $A_{m-s}$ are restricted to be bounded point functions satisfying another condition. Let $F^{-1}$ denote any primitive of $F$, and, by induction, $F^{-s}$ any primitive of $F^{-(s-1)}$; of course the $F^{-s}$ are defined only to within an arbitrary additive polynomial of degree $s - 1$. Now we require that $A_{m-s}F^{-s}$ be defined as a product of two continuous linear

functionals for $s = 1, \ldots, m$. This condition is certainly satisfied if the $A_{m-s}(x)$ happen to be all indefinitely differentiable. Under the above assumptions, if the order of $F$ is greater than zero, we rewrite the differential equation as

\[
(G - F^{-m})^{(m)} + A_{m-1}(G - F^{-m})^{(m-1)} + \ldots + A_0(G - F^{-m}) = F_1, \tag{8.9}
\]
where

\[
F_1 = -\left(A_{m-1}F^{-1} + A_{m-2}F^{-2} + \ldots + A_0F^{-m}\right). \tag{8.10}
\]
Now we have an equation in $G - F^{-m}$ in which the right side $F_1$ is of order less than that of $F$. By repeated reductions of the order of the right side we finally obtain a differential equation in which the right side is a summable point function. The use of ordinary point function integrals can then be combined with the method of successive substitutions to yield the theorem: there is a continuous linear functional $G_p$ and $m$ linearly independent point functions $G_0, \ldots, G_{m-1}$ such that

\[
G = G_p + c_0G_0 + \ldots + c_{m-1}G_{m-1} \tag{8.11}
\]
is a continuous linear functional solution of the given differential equation for arbitrary $c_0, \ldots, c_{m-1}$, and there are no other continuous linear functional solutions. The order of $G_p^{(m)}$ is precisely the order of $F$, and the $G_0(x), \ldots, G_{m-1}(x)$, as point function solutions of the homogeneous equation, have differentiability properties depending on those of the $A_{m-s}(x)$. This discussion of continuous linear functional solutions on each $(a, b)$ leads without difficulty to the corresponding distribution solutions of the differential equation on an open interval $I$. The more general equation

\[
A_m(x)G^{(m)} + A_{m-1}(x)G^{(m-1)} + \ldots + A_0(x)G = F \tag{8.12}
\]
can be reduced to the case $A_m = 1$, provided that the coefficient $A_m(x)$ can be factored out. In this case, the system of solutions is as described above. However, in the singular case, where $A_m(x)$ vanishes at certain points, quite different results may be obtained. Under some circumstances, the homogeneous equation may have no solutions other than the trivial one, the zero distribution. Under other circumstances, the solutions may depend on a number of arbitrary parameters greater than the order of the equation. For example, consider the equation

\[
xG' + G = 0. \tag{8.13}
\]
This can be written as $(xG)' = 0$ and the solutions must satisfy $xG = k$ for some constant $k$. This seems to give a one-parameter family of solutions $G = k/x$. Although these are solutions in a certain sense, they are not absolutely continuous over any interval including the origin; they are not locally summable functions and hence cannot be identified with distributions. From the point of view of distributions, the equation $xG = k$ does have solutions, for instance

\[
G(\phi) = \mathrm{p.v.} \int_a^b \frac{k\phi(x)}{x}\,dx, \tag{8.14}
\]

where p.v. indicates the Cauchy principal value. If we denote this solution by $k\,\mathrm{p.v.}\!\left(\frac{1}{x}\right)$, then all other solutions differ from it by a solution of the equation $xG_1 = 0$. Such a $G_1$ must have supporting set consisting of one point, the origin, and therefore has the form

\[
G_1 = \sum_{r=0}^N c_r \delta_0^{(r)}. \tag{8.15}
\]
Then

\[
xG_1 = \sum_{r=0}^N c_r\left(x\delta_0^{(r)}\right) = \sum_{r=0}^N c_r(-r)\delta_0^{(r-1)}, \tag{8.16}
\]
and this vanishes if and only if $c_r = 0$ for $r > 0$. Thus, $G_1 = c_0\delta_0$ is the set of all such solutions $G_1$. Therefore, the solutions of equation (8.13) consist precisely of all distributions

\[
G = k_1\,\mathrm{p.v.}\!\left(\frac{1}{x}\right) + k_2\delta_0 \tag{8.17}
\]
and depends on two independent parameters.
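The two facts behind this family, that $G = \mathrm{p.v.}(1/x)$ satisfies $xG = 1$ while $x\delta_0 = 0$, can be tested numerically. In this sketch (our own; the symmetric folding of the principal value and the Gaussian test function are choices of the illustration), $G(x\phi)$ reproduces $\int \phi\,dx$, while $\delta_0(x\phi) = 0 \cdot \phi(0) = 0$ contributes nothing:

```python
import numpy as np

def trap(y, x):
    # plain trapezoidal rule on a grid
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

def pv_recip(phi, lim=40.0, n=800_001):
    # p.v. ∫ φ(x)/x dx, folded to ∫_0^∞ (φ(x) - φ(-x))/x dx so that the
    # singularity at the origin cancels before integrating
    x = np.linspace(1e-8, lim, n)
    return trap((phi(x) - phi(-x)) / x, x)

phi = lambda u: np.exp(-u**2)

print(pv_recip(phi))                    # 0 for an even test function
print(pv_recip(lambda u: u * phi(u)))   # G(xφ) = ∫ φ dx = √π ≈ 1.7725
```

Adding $k_2\delta_0$ changes $G(\phi)$ by $k_2\phi(0)$ but leaves $G(x\phi)$ unchanged, which is exactly why the solution set is two-dimensional.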

Chapter 9

Distributions in Several Variables - Examples

The preceding theory can be extended to distributions in n variables where n ∈ N. In this chapter, we give definitions and some examples. The next chapter will sketch some of the theory.

Let $x$ now denote an $n$-tuple of real numbers: $x = (x_i) = (x_1, \ldots, x_n)$, which may be thought of as a point in $n$-dimensional space. $x \le y$ will mean $x_i \le y_i$ $\forall i$. If $a = (a_i)$, $b = (b_i)$ are arbitrary but fixed, with $a_i < b_i$ $\forall i$, we let $R$ denote the closed rectangle $a \le x \le b$. $S_R$ will denote the set of all continuous functions $\phi(x) = \phi(x_1, \ldots, x_n)$ which possess continuous partial derivatives of all orders:

\[
\phi^{(p)}(x) = \phi^{(p_1,\ldots,p_n)}(x_1, \ldots, x_n) = \frac{\partial^{p_1+\ldots+p_n}}{(\partial x_1)^{p_1} \cdots (\partial x_n)^{p_n}}\,\phi(x_1, \ldots, x_n) \tag{9.1}
\]
and which, together with all $\phi^{(p)}(x)$, vanish on $\partial R$ and outside $R$. A linear functional $F(\phi)$ defined

$\forall \phi \in S_R$ will be called a continuous linear functional on $R$ if: whenever $\phi, \phi_m$ are in $S_R$ and, for each $p$, $\phi_m^{(p)}(x)$ converges uniformly to $\phi^{(p)}(x)$, then $F(\phi_m)$ converges to $F(\phi)$. If $\Omega$ is any bounded or unbounded open rectangle in $n$-dimensional space (for example, all $n$-dimensional space), we define $F$ to be a distribution on $\Omega$ if, for each $R \in \Omega$, $F(\phi)$ is defined for

φ ∈ SR and, when restricted to such φ is a continuous linear functional to be denoted by FR. For any distribution F on Ω and any p, we define a partial distribution derivative on Ω by the formula

\[
F^{(p)}(\phi) = (-1)^{p_1+\ldots+p_n} F(\phi^{(p)}). \tag{9.2}
\]
Then every distribution has partial derivatives of all orders and the result of successive differentiation is always independent of the order of differentiation. A distribution $F$ on $\Omega$ will be identified with a point function $f(x)$ if, for each $R \in \Omega$, $f(x)$ is summable on $R$, and
\[
F(\phi) = \int f(x)\phi(x)\,dx = \int \cdots \int f(x_1, \ldots, x_n)\phi(x_1, \ldots, x_n)\,dx_1 \cdots dx_n \tag{9.3}
\]
for each $\phi \in S_R$, the integration being taken over $R$ or, equivalently, over all $n$-dimensional space.

Identification of certain distributions with Stieltjes measures in one or more of the $x_i$ can also be made. To take one example, suppose $\psi(x_1, x_2, x_3, \ldots, x_n)$ is a real valued function and suppose that on each $R \in \Omega$ the following holds: for almost all fixed $(x_3, \ldots, x_n)$, the function $\psi$, considered as a function of $x_1, x_2$, has finite variations $v^+(x_3, \ldots, x_n)$, $v^-(x_3, \ldots, x_n)$ which are summable over $R$. Then $\psi$ determines a distribution $F$ on $\Omega$ by the formula
\[
F(\phi) = \int \cdots \int dx_3 \cdots dx_n \left(\iint \phi(x_1, \ldots, x_n)\,d_{x_1x_2}\psi(x_1, x_2, \ldots, x_n)\right). \tag{9.4}
\]

A set of distributions $F_a$ will be said to be bounded if, for each fixed $\phi$, the $F_a(\phi)$ are bounded. $F_m$ will be said to converge as $m \to \infty$ if $F_m(\phi)$ converges as $m \to \infty$ for each fixed $\phi$; in which case, as will be shown in the next chapter, the limiting values of the $F_m(\phi)$ will determine a limit distribution $F$. Similarly, $F_t$ will be said to converge to $F$ as $t \to t_0$ if $F_t(\phi) \to F(\phi)$ for each fixed $\phi$. A series $\sum_{m=1}^\infty F_m$ will be said to have sum $F$ if $\sum_{m=1}^h F_m$ converges to $F$ as $h \to \infty$. Boundedness, respectively convergence, of distributions implies that of their distribution derivatives of all orders, so that term by term differentiation is always valid.

A point x0 will be said to be in the nullity set of F if for some R which covers x0, FR is the identically zero continuous linear functional. The complement of the nullity set of F will be called the supporting set of F . The sum of two distributions and a constant times a distribution are defined in the obvious way. In the next chapter we will develop some of the theory of distributions in n variables. We conclude this section with examples.

For fixed $x_0$, let $F$ be the distribution on all $n$-dimensional space: $F(\phi) = \phi(x_0)$. This is a generalization of the Dirac $\delta$ function. It can be identified with the $n$-dimensional Stieltjes measure $d_{x_1\ldots x_n}H(x_1, \ldots, x_n)$, where $H(x)$ is the Heaviside function:
\[
H(x) = \begin{cases} 0 & \text{unless } x \ge x_0, \\ 1 & \text{for } x \ge x_0. \end{cases} \tag{9.5}
\]

In our terminology, this Dirac δ function, δx0 , is the pth derivative of H(x), where p = (1, 1,..., 1). In terms of physical concepts such as mass and charge distribution, this δ distribution corresponds to a unit of mass or charge concentrated at one point, x0.

If we use the same $H(x)$ but take the $p$th derivative for any $p = (p_i)$ with $p_i = 0$ or $1$ for each $i$, we obtain a variety of mixed $\delta$ distributions: thus, $\frac{\partial H}{\partial x_1}$ is the distribution
\[
F(\phi) = \int \cdots \int \phi(x_{01}, x_2, \ldots, x_n)\,dx_2 \cdots dx_n, \tag{9.6}
\]
the integration being taken over the $n-1$ dimensional quadrant $x_1 = x_{01}$, $x_i \ge x_{0i}$ for $i > 1$. In physical terms, this corresponds to an $n-1$ dimensional surface distribution with unit surface density on the quadrant. The Heaviside function defined above has the value 1 on an $n$-dimensional quadrant and vanishes outside the quadrant. We now generalize by replacing the quadrant by a rectangle or sphere or any other

$n$-dimensional set $J$ which has a hypersurface $H$ sufficiently smooth to permit calculations using integration by parts. Thus, we now set $H(x) = 1$ for $x \in J$ or on the hypersurface $H(J)$, and we set $H(x) = 0$ for all other $x$. Then $\frac{\partial H}{\partial x_1}$, as a distribution, corresponds to a surface distribution of mass or charge on $H$ with the surface density at a point of $H$ equal to $\cos\theta_1$, where $\theta_i$ indicates the angle between the inner normal to $H$ and the positive $x_i$ axis. More generally, let $f(x)$ be a point function which is continuous and has continuous partial derivatives when considered only within a closed set $J$ (that is, including its hypersurface $H$). Suppose, however, that $f(x)$ vanishes outside $J$. Then $f$ determines a distribution $F$. Calculation shows that the derivative distribution $\frac{\partial F}{\partial x_1}$ is made up of two terms. The first term is a distribution which can be identified with the “ordinary” derivative $\frac{\partial f}{\partial x_1}$, which is supposed to exist at all $x$ with possible exceptions for $x$ on $H$. The second term is a distribution corresponding to a surface distribution on $H$ whose density at a point of $H$ is the value of $f$ at that point multiplied by $\cos\theta_1$. The distribution $\frac{\partial^2 F}{(\partial x_1)^2}$ consists of three terms. The first term corresponds to an $n$-dimensional volume distribution over $J$ with density $\frac{\partial^2 f}{(\partial x_1)^2}$. The second term corresponds to a hypersurface distribution on $H$ with density $\frac{\partial f}{\partial x_1}\cos\theta_1$. The third term corresponds to a layer of doublets or dipoles, all parallel to the $x_1$ axis and carried by $H$, with surface density $-f(x)\cos\theta_1$. The Laplacian distribution of $F$, namely

\[
\frac{\partial^2 F}{(\partial x_1)^2} + \frac{\partial^2 F}{(\partial x_2)^2} + \ldots + \frac{\partial^2 F}{(\partial x_n)^2},
\]
written $\Delta F$, also consists of three terms. The first term corresponds to a volume mass distribution over $J$, identifiable with the ordinary Laplacian of $f$. The second term corresponds to a surface distribution on $H$ with density $\frac{df}{dn}$ ($\frac{d}{dn}$ indicating the derivative along the inner normal). The third term corresponds to a double layer (as in potential theory), that is, a layer of doublets carried by $H$, all normal to $H$, with surface density $-f(x)$. Symbolically,

\[
(\Delta F)(\phi) = F(\Delta\phi) = \int_J f\,\Delta\phi\,dx
= \int_J (\Delta f)\,\phi\,dx + \int_H \frac{df}{dn}\,\phi\,dS - \int_H f\,\frac{d\phi}{dn}\,dS. \tag{9.7}
\]
This formula is well known in potential theory as Green’s formula, given here with an interpretation from distribution theory. In the same way the Gauss, Riemann, Green, and Stokes equations can be given interpretations in terms of distribution derivatives of discontinuous point functions. Another example, of great importance in the theory of harmonic functions, is the fundamental solution of the Laplace equation: $f(x) = \frac{1}{r^{n-2}}$ when $n \ge 3$ and $f(x) = \log\frac{1}{r}$ when $n = 2$. Here, $r$ denotes distance from the origin to the point $x$:

\[
r = \left(\sum_{i=1}^n (x_i)^2\right)^{1/2}. \tag{9.8}
\]
The function $f(x)$ is harmonic, that is, its ordinary Laplacian is zero, at each point $x$ except at the origin, where the “ordinary” Laplacian does not exist as a point function. However, this $f(x)$ can

be considered as a distribution and its Laplacian is easily calculated. We find that this Laplacian distribution is not the identically zero distribution. It turns out to be $-N\delta_0$, where $\delta_0$ is the $n$-dimensional Dirac $\delta$ distribution corresponding to unit mass concentrated at the origin, and $N$ is a constant which depends on the dimension $n$:

\[
N = \frac{(n-2)\,2\pi^{n/2}}{\Gamma\!\left(\frac{n}{2}\right)} \tag{9.9}
\]
if $n \ge 3$. In particular, $N = 4\pi$ if $n = 3$, and $N = 2\pi$ if $n = 2$. It is highly significant that this fundamental or elementary solution is really a solution of the nonhomogeneous Laplace equation with second member a $\delta$ distribution. We choose our final example from the theory of holomorphic functions. We consider the case $n = 2$, $f = f(x, y)$ a complex function, and set

\[
x = \frac{1}{2}(z + \bar{z}) \tag{9.10}
\]
and

\[
y = \frac{1}{2i}(z - \bar{z}). \tag{9.11}
\]
Here $z, \bar{z}$ cannot be considered as independent variables, but we define $\frac{\partial}{\partial z}$, $\frac{\partial}{\partial \bar{z}}$ as
\[
\frac{\partial}{\partial z} = \frac{1}{2}\left(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\right) \tag{9.12}
\]
and

\[
\frac{\partial}{\partial \bar{z}} = \frac{1}{2}\left(\frac{\partial}{\partial x} + i\frac{\partial}{\partial y}\right). \tag{9.13}
\]
The necessary and sufficient conditions that $f(x, y)$ be holomorphic in the neighborhood of a point $(x, y)$ are that $f$ be continuously differentiable and satisfy the Cauchy-Riemann equation $\frac{\partial f}{\partial \bar{z}} = 0$ in this neighborhood. At a single point, these conditions are meaningless. However, if $f$ determines a distribution $F$, then the distribution $\frac{\partial F}{\partial \bar{z}}$ can be formed and it will not be the identically zero distribution. This $\frac{\partial F}{\partial \bar{z}}$ can be a useful tool in studying the singularity. Thus, if $f = 1/z$, it turns out that $\frac{\partial F}{\partial \bar{z}} = \pi\delta_0$. This distribution theorem could be used in the development of the theory of holomorphic functions, Cauchy integrals, and so on. It could also be useful in the more difficult study of functions of several complex variables. The function $f = 1/z^m$ is not summable for $m > 1$, but $f$ determines a distribution through the use of the Hadamard finite part, which coincides in this case with the Cauchy principal value.

\[
F = \mathrm{F.p.}\!\left(\frac{1}{z^m}\right) = \mathrm{p.v.}\!\left(\frac{1}{z^m}\right) \tag{9.14}
\]
and

\[
\frac{\partial F}{\partial \bar{z}} = \frac{\partial}{\partial \bar{z}}\left(\frac{(-1)^{m-1}}{(m-1)!}\left(\frac{\partial}{\partial z}\right)^{m-1}\frac{1}{z}\right) = \frac{(-1)^{m-1}\pi}{(m-1)!}\left(\frac{\partial}{\partial z}\right)^{m-1}\delta_0. \tag{9.15}
\]
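The basic case $\frac{\partial F}{\partial\bar z} = \pi\delta_0$ for $f = 1/z$ can be checked numerically through the pairing $(\partial F/\partial\bar z)(\phi) = -F(\partial\phi/\partial\bar z)$, which should give $\pi\phi(0)$. In this sketch (our own; the Gaussian test function, whose $\bar z$-derivative is $-z\,e^{-|z|^2}$, and the grid that avoids $z = 0$ are choices of the illustration):

```python
import numpy as np

# Pair ∂F/∂z̄ with φ via (∂F/∂z̄)(φ) = -F(∂φ/∂z̄) for f = 1/z.
# Test function φ = exp(-(x² + y²)), so φ(0) = 1 and ∂φ/∂z̄ = -z exp(-|z|²).
n, L = 1200, 6.0
u = np.linspace(-L, L, n)            # even n: the grid never hits z = 0
dx = u[1] - u[0]
X, Y = np.meshgrid(u, u)
z = X + 1j * Y
dphi_dzbar = -z * np.exp(-(X**2 + Y**2))
value = -np.sum(dphi_dzbar / z) * dx * dx   # -∫∫ (1/z) ∂φ/∂z̄ dx dy

print(value.real, np.pi)             # both ≈ 3.14159 = π φ(0)
```

The $1/z$ singularity cancels against the factor $z$ in $\partial\phi/\partial\bar z$, so the sum is well behaved even though $f$ itself blows up at the origin.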

A general meromorphic function $f(z)$ determines a distribution $F$ if the Cauchy principal value is used at poles of order greater than one. Then $\frac{\partial F}{\partial \bar{z}}$ is a sum of distributions, each of which has a supporting set consisting of a single point. These single points are the poles of the meromorphic function, and the distribution $\frac{\partial F}{\partial \bar{z}}$ expresses the ordinary residue formula in the theory of meromorphic functions.
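The constant $N$ of (9.9) in the earlier fundamental-solution example can likewise be verified numerically for $n = 3$, where $N = 4\pi$. This sketch (our own; the radial test function $\phi = e^{-r^2}$, whose Laplacian is $(4r^2 - 6)e^{-r^2}$, is an assumption of the illustration) pairs $1/r$ with $\Delta\phi$ in spherical shells:

```python
import numpy as np

def trap(y, x):
    # plain trapezoidal rule on a grid
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2)

# (ΔF)(φ) = F(Δφ) = ∫ (1/r) Δφ dV for n = 3 and φ = exp(-r²).
# In shells dV = 4π r² dr, so (1/r) dV = 4π r dr and the integrand
# is perfectly regular at the origin.
r = np.linspace(0.0, 12.0, 400_001)
value = trap(4 * np.pi * r * (4 * r**2 - 6) * np.exp(-r**2), r)

print(value, -4 * np.pi)   # both ≈ -12.566, i.e. -4π φ(0)
```

The result $-4\pi\phi(0)$ matches $\Delta(1/r) = -N\delta_0$ with $N = 4\pi$, even though the pointwise Laplacian of $1/r$ vanishes everywhere away from the origin.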

Chapter 10

Distributions in Several Variables - Theory

First, the expansion lemma of Chapter 3 has extensions to n variables, of which the simplest is as follows:

Lemma. Let $R$ be a closed $n$-dimensional rectangle $a_i \le x_i \le b_i$, where $i = 1, \ldots, n$, and let $R_0$ be the $n-1$ dimensional rectangle $a_i \le x_i \le b_i$, with $i = 2, \ldots, n$. Let $\theta(x_1)$

be in S(a1,b1) with

\[
\int_{a_1}^{b_1} \theta(x_1)\,dx_1 = 1 \tag{10.1}
\]

and for each φ ∈ SR, let

\[
\phi_0(x_2, \ldots, x_n) = \int_{a_1}^{b_1} \phi(x_1, \ldots, x_n)\,dx_1. \tag{10.2}
\]

Then φ0 is in SR0 and

\[
\phi = \phi_0\theta + \frac{\partial \rho}{\partial x_1} \tag{10.3}
\]

with a uniquely defined ρ ∈ SR.

Therefore, for every distribution $F$ on $R$, there are $x_1$ primitives $G$, that is, distributions with $\frac{\partial G}{\partial x_1} = F$, and they can be expressed by the formula

\[
G(\phi) = G_0(\phi_0) - F(\rho), \tag{10.4}
\]
where $G_0$ is an arbitrary, but fixed, distribution on $R_0$; thus $G_0$ is a distribution in the variables $x_2, \ldots, x_n$.

If $\Omega$ is an open rectangle $\alpha_i < x_i < \beta_i$, $i = 1, \ldots, n$, and $\Omega_0$ is the open rectangle $\alpha_i < x_i < \beta_i$, $i = 2, \ldots, n$, put

\[
\phi_0 = \int_{\alpha_1}^{\beta_1} \phi(x_1, \ldots, x_n)\,dx_1 \tag{10.5}
\]
and choose $\theta(x_1)$ in some $S_{(a_1,b_1)}$ with $\alpha_1 < a_1 < b_1 < \beta_1$ and

\[
\int_{a_1}^{b_1} \theta(x_1)\,dx_1 = 1. \tag{10.6}
\]
Then we may, again, write

\[
\phi = \phi_0\theta + \frac{\partial \rho}{\partial x_1}, \tag{10.7}
\]
and the $x_1$ primitives of $F$ on $\Omega$ can be expressed by

\[
G(\phi) = G_0(\phi_0) - F(\rho), \tag{10.8}
\]
where $G_0$ is an arbitrary distribution on $\Omega_0$. Next, with the methods of Chapter 5, we show that if $F$ is a continuous linear functional on a finite closed rectangle $R$, then $F = f^{(r)}$ for some suitable summable $f(x)$ and some $r = (r_1, \ldots, r_n)$. Exactly as in Chapter 5, we begin by proving that there must be a $p = (p_1, \ldots, p_n)$, which depends on $F$, such that

$(C_p)$ — $F(\phi_m)$ converges to $F(\phi)$ whenever $\phi_m^{(q)}$ converges uniformly to $\phi^{(q)}$ for each $q \le p$.

Then, since for q ≤ p,

\[
\phi^{(q)}(x) = \int_{a_1}^{x_1} \cdots \int_{a_n}^{x_n} \frac{(x_1-t_1)^{p_1-q_1-1}}{(p_1-q_1-1)!} \cdots \frac{(x_n-t_n)^{p_n-q_n-1}}{(p_n-q_n-1)!}\,\phi^{(p)}(t)\,dt_1 \cdots dt_n, \tag{10.9}
\]
it follows that $|\phi^{(q)}| \le K\,|\phi^{(p)}|$ $\forall q \le p$ for some finite $K$ which depends on $p$ and $R$ but not on $\phi$. Therefore, for this $p$,

$(B_p)$ — $F(\phi_m)$ converges to $F(\phi)$ whenever $\phi_m^{(p)}$ converges uniformly to $\phi^{(p)}$.

This, in turn, implies (as in Chapter 5),

$(A_p)$ — $|F(\phi)| \le |F|_p\,|\phi^{(p)}|$

for all $\phi$, for some finite constant $|F|_p$; let $|F|_p$ denote the least possible such constant. Finally, as in Chapter 5, we form the linear functional $L$,

\[
L(\phi^{(p)}) = F(\phi), \tag{10.10}
\]

defined for functions of the form $\phi^{(p)}$ with norm $|\phi^{(p)}|$, and extend this functional to all continuous functions by the Hahn-Banach procedure. The Riesz representation theorem then applies: there is a function $\psi(x)$ of total ($n$-dimensional) variation equal to $|F|_p$ such that
\[
F(\phi) = L(\phi^{(p)}) = \int_R \phi^{(p)}(x)\,d\psi(x), \tag{10.11}
\]
and we may also suppose that $|\psi(x)| \le |F|_p$ $\forall x$. This implies, by an $n$-dimensional integration by parts,
\[
F(\phi) = (-1)^n \int_R \psi(x)\phi^{(p_1+1,\ldots,p_n+1)}(x)\,dx, \tag{10.12}
\]

so that $F = f^{(r)}$ with $r = (p_1+1, \ldots, p_n+1)$ and $f(x) = (-1)^{p_1+\ldots+p_n}\psi(x)$ (note that $|f| \le |F|_p$). If $f(x)$ is replaced by its integral,

\[
\int_{a_1}^{x_1} \cdots \int_{a_n}^{x_n} f(t_1, \ldots, t_n)\,dt_1 \cdots dt_n, \tag{10.13}
\]
and $r$ is taken to be $(p_1+2, \ldots, p_n+2)$, we obtain $F = f^{(r)}$ with $f$ continuous. Next, we generalize the results of Chapter 6. The method used in Chapter 6 serves to prove that if the $F_a$ are bounded on $R$, then for all $a$ and $\phi$,

\[
|F_a(\phi)| \le K\,|\phi^{(p)}| \tag{10.14}
\]
for some $p$ and some finite $K$ independent of $a, \phi$; that is, the $|F_a|_p$ are bounded. As in Chapter 6, it follows that the $F_a$ on $R$ are bounded if and only if they can be expressed in the form

\[
F_a = (f_a)^{(r)} \tag{10.15}
\]
with $|f_a|$ bounded, or equivalently (with different $f_a$ and $r$), with $f_a(x)$ a set of equicontinuous point functions. From this we conclude that if $F_m$ is convergent, its limit

\[
F(\phi) = \lim_{m\to\infty} F_m(\phi) \tag{10.16}
\]
must be a continuous linear functional. We can conclude also that there must be a representation $F_m = f_m^{(r)}$ with $f_m(x)$ a uniformly convergent sequence of continuous functions, with the help of the following lemma.

Lemma 2. If gm(x) are equicontinuous and for some fixed r and every φ,

\[
\int_R g_m(x)\phi^{(r)}(x)\,dx \tag{10.17}
\]

converges to zero as m → ∞, then there are equicontinuous functions Pm(x) of the form

\[
P_m(x) = \sum_{i=1}^n \sum_{j=0}^{r_i-1} h_{ij}(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)\,x_i^j \tag{10.18}
\]

such that the $g_m(x) + P_m(x)$ converge uniformly to zero.

To prove this lemma, consider the case r = (0,..., 0) exactly as in Chapter 6, but with intervals replaced by n-dimensional rectangles and then complete the proof by use of induction on the indices, one at a time. As for the partition theorem of Chapter 7, it can be proved for finite closed n-dimensional rectangles by using the products of functions a(xi), i = 1, . . . , n, with each a of the type described in Chapter 7. It then follows that the local character determines the distribution uniquely and, in particular, the zero distribution is the only one with nullity set which includes all x.

Chapter 11

The Convolution Product

We now define two bilinear products. First, the direct product of two distributions: for locally summable point functions f(x), g(y), the direct product f(x) × g(y) should reduce to the usual product f(x)g(y), which is a locally summable function of two variables. This suggests that for distributions $S_x$, $T_y$ over the spaces of x, y respectively, $S_x \times T_y$ should be defined so that $S_x \times T_y \cdot \varphi(x, y) = (S_x \cdot u(x))(T_y \cdot v(y))$ whenever φ(x, y) is the product of testing functions u, v in x, y respectively. It can be shown that this requirement leads to a unique $S_x \times T_y$. Moreover, for a general testing function φ(x, y) it can be shown that $T_y \cdot \varphi(x, y)$ is a testing function in x and that

$$S_x \times T_y \cdot \varphi(x, y) = S_x \cdot \bigl(T_y \cdot \varphi(x, y)\bigr) = T_y \cdot \bigl(S_x \cdot \varphi(x, y)\bigr). \tag{11.1}$$

For example, if $\delta_x$, $\delta_y$ are Dirac δs in x space and y space respectively, then $\delta_x \times \delta_y$ is a δ in (x, y) space.

We use the direct product to define the more important convolution product. For point functions the convolution product h = f ∗ g is well known:
$$h(x) = \int f(x - t)\,g(t)\, dt = \int f(t)\,g(x - t)\, dt. \tag{11.2}$$

However, local summability of f and g is not sufficient to ensure that h exists and is locally summable. This difficulty disappears if at least one of f, g has a bounded supporting set. For example, if B(x) = 1 for |x| < 1 and 0 elsewhere, then at each x the value of f ∗ B is the integral of f over the interval |x − t| < 1 (twice its average there). To extend the definition of the convolution product to distributions, we first write

$$h(\varphi) = f * g \cdot \varphi = \iint f(x - t)\,g(t)\,\varphi(x)\, dx\, dt = \iint f(\xi)\,g(\eta)\,\varphi(\xi + \eta)\, d\xi\, d\eta = f_\xi \times g_\eta \cdot \varphi(\xi + \eta). \tag{11.3}$$

This suggests for distributions

$$(S * T)_x \cdot \varphi(x) = S_\xi \times T_\eta \cdot \varphi(\xi + \eta). \tag{11.4}$$

If at least one of S, T has a bounded supporting set, it can be shown that this formula leads to a unique S ∗ T even though φ(ξ + η) does not vanish outside a finite rectangle in (ξ, η) space (that is, φ(ξ + η) is not a testing function). This definition leads to the identities:

• δ ∗ T = T,

• $\delta' * T = T'$,

• $D^p T = D^p * T$.
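The second of these identities can be verified directly from (11.4) together with (11.1); a short check in the notation above (our computation, using $T' \cdot \varphi = -T \cdot \varphi'$):

```latex
(\delta' * T) \cdot \varphi
  = \delta'_\xi \times T_\eta \cdot \varphi(\xi + \eta)
  = T_\eta \cdot \bigl(\delta'_\xi \cdot \varphi(\xi + \eta)\bigr)
  = T_\eta \cdot \bigl(-\varphi'(\eta)\bigr)
  = -T \cdot \varphi'
  = T' \cdot \varphi .
```

The first identity follows in the same way, with $\delta_\xi \cdot \varphi(\xi + \eta) = \varphi(\eta)$.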

We use $D^p$ to represent both the differentiation operator and the distribution $D^p \delta$; we use the Laplacian ∆ in the same way. In general, (S ∗ T)′ = S′ ∗ T = S ∗ T′. It follows that if a is a testing function then T ∗ a (the regularization of T by a) can be formed, since a may be considered as a distribution with bounded supporting set, and this T ∗ a is always a point function with point function derivatives of all orders. This leads to the theorem that every distribution is a limit of testing functions (considered as distributions).

In a space of n > 2 dimensions, it is customary to define the potential $U^f$ associated with the mass density f(x) as follows:

$$U^f = \int \frac{f(t)}{|x - t|^{n-2}}\, dt, \tag{11.5}$$

where |x − t| is the Euclidean distance $r = \sqrt{\sum_i (x_i - t_i)^2}$. This extends to distributions in the form

$$U^T = T * \frac{1}{r^{n-2}}, \tag{11.6}$$
and gives the Poisson equation

$$\Delta U^T = U^{\Delta T} = \Delta * \frac{1}{r^{n-2}} * T = -N\delta * T = -NT, \tag{11.7}$$
where N has the same value as in Chapter 9. The use of convolutions with distributions has important applications to the Cauchy problem in partial differential equations.
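The regularization T ∗ a described above can be watched numerically: convolving the Heaviside function H with a normalized bump a supported on |t| < 1 gives (H ∗ a)(x) = ∫_{t ≤ x} a(t) dt, a smooth function rising from 0 to 1 in place of the jump. A sketch in which the bump, grid, and sample points are our illustrative choices:

```python
import numpy as np

def bump(t):
    # C-infinity testing function supported on |t| < 1
    out = np.zeros_like(t)
    m = np.abs(t) < 1
    out[m] = np.exp(-1.0 / (1.0 - t[m] ** 2))
    return out

t = np.linspace(-1.0, 1.0, 100001)
h = t[1] - t[0]
a = bump(t)
a /= a.sum() * h                 # normalize so the bump integrates to 1

def regularized_heaviside(x):
    # (H * a)(x) = integral of a(t) over t <= x: a smooth ramp replacing the jump
    return a[t <= x].sum() * h

print(regularized_heaviside(-2.0),
      regularized_heaviside(0.0),
      regularized_heaviside(2.0))
```

The result is 0 well to the left of the jump, 1 well to the right, and smooth in between, illustrating why T ∗ a has point function derivatives of all orders.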

Chapter 12

Fourier Series for Distributions

For simplicity, we take the period to be 1 instead of 2π and let x correspond to the points of a circle on which the points x = 0 and x = 1 are identified. A testing function is now a continuous function φ(x) with continuous derivatives of all orders such that φ and its derivatives take the same values at x = 0 as at x = 1. We say that $\varphi_m \to \varphi$ if for each p, $\varphi_m^{(p)}(x)$ converges uniformly to $\varphi^{(p)}(x)$. A distribution T is now a linear functional defined for all such φ and such that

$T \cdot \varphi_m \to T \cdot \varphi$ whenever $\varphi_m \to \varphi$ in this sense. In the present situation, every distribution T is identifiable with a finite sum of continuous point functions and derivatives of such continuous point functions. The usual calculus of distributions applies, but now the convolution S ∗ T is defined for all S, T. First, Fourier coefficients are defined as follows. For a function f(x) we have

$$a_k(f) = \int_0^1 f(x)\, e^{-2i\pi kx}\, dx. \tag{12.1}$$
Generalizing this formula to distributions, we define

$$a_k(T) = T \cdot e^{-2i\pi kx}. \tag{12.2}$$
Then every distribution T has a sequence of Fourier coefficients.

Secondly, for each distribution there is a polynomial in k, P(k), such that $|a_k(T)| \le P(k)$. Indeed, if T is a continuous function f, its Fourier coefficients are bounded, and if $T = f^{(p)}$, then

$$a_k(T) = (2i\pi k)^p\, a_k(f). \tag{12.3}$$
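Formula (12.3) with p = 1 can be checked numerically for a concrete f; the choice f(x) = sin(2πx) and the grid below are ours. Then f′ = 2π cos(2πx), and both sides of the identity at k = 1 should equal π:

```python
import numpy as np

def fourier_coeff(values, x, k):
    # a_k(f) = integral over one period of f(x) exp(-2 i pi k x) dx,
    # computed by the periodic trapezoid rule (drop the repeated endpoint)
    h = x[1] - x[0]
    integrand = values * np.exp(-2j * np.pi * k * x)
    return integrand[:-1].sum() * h

x = np.linspace(0.0, 1.0, 20001)
f = np.sin(2 * np.pi * x)
fp = 2 * np.pi * np.cos(2 * np.pi * x)   # f'

k = 1
lhs = fourier_coeff(fp, x, k)            # a_k(f')
rhs = (2j * np.pi * k) * fourier_coeff(f, x, k)
print(lhs, rhs)                          # both approximately pi
```

Because the integrands are trigonometric polynomials, the periodic trapezoid rule here is exact up to rounding, so the two sides agree to machine precision.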

Thirdly, the Fourier series

$$\sum_k a_k(T)\, e^{2i\pi kx} \tag{12.4}$$
is always convergent and converges to the distribution T. Let us prove this first for the special case of the δ distribution. Here the coefficients are $a_k(\delta) = \delta \cdot e^{-2i\pi kx} = e^0 = 1$, and we need to show that $\sum_k e^{2i\pi kx} = \delta$. But this is so, since for every testing function φ,

$$\sum_k e^{2i\pi kx} \cdot \varphi = \sum_k a_{-k}(\varphi) = \varphi(0) = \delta(\varphi). \tag{12.5}$$
Now for a general T we have

$$\sum_k a_k(T)\, e^{2i\pi kx} = \sum_k \bigl(T_\xi \cdot e^{-2i\pi k\xi}\bigr)\, e^{2i\pi kx} = \sum_k T_\xi \cdot e^{2i\pi k(x - \xi)} = \sum_k T * e^{2i\pi kx} = T * \sum_k e^{2i\pi kx} = T * \delta = T. \tag{12.6}$$

It can be shown that every trigonometric series with coefficients bounded by a fixed polynomial converges to some distribution with the given coefficients as its Fourier coefficients.
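As an illustration (our example, not from the text), the series $\sum_k k\, e^{2i\pi kx}$ has coefficients bounded by the polynomial |k|; tested against a smooth φ it gives $\sum_k k\, a_{-k}(\varphi) = -\varphi'(0)/(2i\pi)$, so the series converges to the distribution $\delta'/(2i\pi)$. Numerically, with φ(x) = exp(sin 2πx), so that φ′(0) = 2π and the limit is i:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 4097)[:-1]   # uniform grid on the circle [0, 1)
h = 1.0 / len(x)
phi = np.exp(np.sin(2 * np.pi * x))    # smooth periodic testing function, phi'(0) = 2*pi

def a(k):
    # a_k(phi) by the periodic trapezoid rule (spectrally accurate here)
    return (phi * np.exp(-2j * np.pi * k * x)).sum() * h

# partial sums of the series tested against phi: sum over |k| <= K of k * a_{-k}(phi)
for K in (2, 8, 16):
    partial = sum(k * a(-k) for k in range(-K, K + 1))
    print(K, partial)

# expected limit: -phi'(0) / (2 i pi) = 1j
```

Even though the coefficients k grow without bound, the rapid decay of $a_{-k}(\varphi)$ makes the tested series converge, exactly as the polynomial-growth criterion predicts.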

Bibliography

[1] I. Halperin and L. Schwartz. Introduction to the Theory of Distributions. Canadian Mathematical Congress: Lecture Series. University of Toronto Press, 1952.

[2] G. Grubb. Distributions and Operators. Graduate Texts in Mathematics. Springer, 2009.

[3] R.S. Strichartz. A Guide to Distribution Theory and Fourier Transforms. Studies in Advanced Mathematics. World Scientific, 2003.

[4] I.M. Gelfand and G.E. Shilov. Generalized Functions, Vols. 1–5. New York, 1968.

[5] F. Treves. Topological Vector Spaces, Distributions and Kernels. Dover Books on Mathematics. Dover Publications, 2013.

[6] L. Schwartz. Théorie des distributions. Publications de l'Institut de mathématique de l'Université de Strasbourg. Hermann, 1966.

[7] John Renze. Lebesgue integrable. http://mathworld.wolfram.com/LebesgueIntegrable.html.

[8] Vigirdas Mackevicius. Lebesgue–Stieltjes Measures on the Real Line and Distribution Functions, pages 223–231. John Wiley & Sons, Inc., 2014.

[9] J. Hadamard. Lectures on Cauchy’s Problem in Linear Partial Differential Equations. Dover phoenix editions. Dover Publications, 2003.

[10] S. Banach. Theory of Linear Operations. North-Holland Mathematical Library. Elsevier Science, 1987.
