
Physics 525 (Methods of Theoretical Physics) Fall 2014

Theory of Distributions (v. 2)

In this document I will summarize the definitions and statements of the simplest theorems of the theory of distributions. This gives a consistent framework for using things like the Dirac delta function. The concepts provide a very useful framework for many calculations in quantum mechanics (which was why Dirac invented the delta function). This document only has the mathematical definitions and the theorems, but leaves out the motivations etc., which I treated in class.

A useful book on the subject is: Ian Richards and Heekyung Youn, “Theory of Distributions: A Nontechnical Introduction” (Cambridge University Press, 1990), call #: QA273.6.R53 1990 (in Physical Sciences Library). Another good treatment is in Sec. 4.1 of J.P. Keener, “Principles of Applied Mathematics” (Westview, 2000) ISBN 0–7382–0129–4, call #: QA601.K4 1999. Yet another is found in Ch. 2 of M. Stone and P. Goldbart, “Mathematics for Physics” (Cambridge University Press, 2009) ISBN 978–0–521–85403–0, call #: QC20.S76 2009.

One annoyance is that there are a lot of notations that are not, strictly speaking, correct, but constitute an “abuse of notation”. These notations are very useful for guiding a calculation. I like the term suggested by a student in one of my earlier courses: “bent notation”.

A lot of this document consists of definitions; there are not so many theorems. Think of this as being like a computer program with many subroutines; to use the subroutines, they have to be defined somewhere. Similarly, to do mathematics with new objects, one has to define all the operations that one does with the new objects.

1 Distributions

1.1 Definition Definition: A distribution T is a continuous linear mapping from a space of test functions to the (real or complex) numbers. We can write the value of T on a test function φ(x) as T[φ]. The term “generalized function” is a synonym for “distribution”.

The square-bracket notation emphasizes the aspect that the argument of a distribution is a whole function, not the value of a function at one point. A distribution is an example of what is called a functional. It is sometimes said that a functional is a “function of a function”, as opposed to being a function of a real variable. In this document, the test functions will be functions of one real variable, and consequently we will talk about distributions defined on the real axis. Roughly speaking, what we mean by a test function is that

A test function is any sufficiently differentiable function which vanishes sufficiently rapidly at infinity.

However, there are certain complications concerning the nature of the test functions:

• There are alternatives in making precise the space of test functions, which cause some complications that we will return to later.

• As we will see with examples in Sec. 2, once a distribution is defined on a certain space of test functions, it can typically be naturally extended to a wider class of functions.

While these details matter, it is also important that we obtain a physical intuition. A distribution is meant as a mathematical construct that corresponds, for example, to a distribution of electric charge in space, no matter whether we have a continuous distribution, defined by a continuous charge density, or something else, like a point charge or a point dipole. The act of obtaining the value T[φ] corresponds to an (idealized) experiment that measures the charge weighted by the test function.

1.2 Spaces of test functions There are two standard spaces of test functions:

Definition: A test function of compact support is a function from the real line to the complex numbers that is infinitely differentiable and that is zero outside a bounded interval. The space of compact-support test functions is called D.

Definition: An open support test function φ(x) is a function from the real line to the complex numbers that is infinitely differentiable and that is of “rapid decrease”, i.e., for every N, the product x^N φ(x) remains bounded as x → ±∞. The space of open support test functions is called S.

Both of these spaces are vector spaces. The space S is sometimes called the Schwartz space, and its elements called Schwartz functions, after Laurent Schwartz, who was the leading pioneer of the mathematical theory of distributions. Note that the space D of compact-support test functions is a subspace of the Schwartz space S. Following Richards & Youn, I will define a general distribution as one on the space of compact-support test functions. A tempered distribution is one that is defined on Schwartz space. If T is a tempered distribution, it is also a general distribution: since T[φ] is defined for all φ ∈ S, it is in particular defined for all φ in the smaller space D of compact-support test functions. Initially all our work will be with general distributions, and we can assume our test functions to have compact support. But when we treat Fourier transforms, it will be better to restrict to tempered distributions.
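As a concrete illustration of the two spaces, the following numerical sketch (using numpy; the specific functions are my own illustrative choices, not from the notes) evaluates the standard bump function exp(−1/(1−x²)), which lies in D, and a Gaussian, which lies in S but not in D:

```python
import numpy as np

def bump(x):
    """A standard compact-support test function: exp(-1/(1-x^2)) on (-1,1), zero outside."""
    out = np.zeros_like(x, dtype=float)
    inside = np.abs(x) < 1
    out[inside] = np.exp(-1.0 / (1.0 - x[inside] ** 2))
    return out

def gaussian(x):
    """A Schwartz function without compact support."""
    return np.exp(-x ** 2)

x = np.linspace(-5, 5, 2001)

# Compact support: bump vanishes identically outside [-1, 1].
support_check = np.all(bump(x[np.abs(x) > 1]) == 0)

# Rapid decrease: x^N * phi(x) stays bounded for every N (spot-check N = 10).
max_weighted = (np.abs(x) ** 10 * gaussian(x)).max()

print(support_check, max_weighted)
```

The bump function is also infinitely differentiable at x = ±1, which is what makes it a valid element of D despite being identically zero outside a bounded interval.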

1.3 Extension of distributions beyond test functions Once we have defined a distribution on a space of test functions, its definition can typically be extended to a broader class of functions, as we will see.

1.4 Continuity of distributions It is required that a distribution is not just a linear functional of test functions but is a continuous linear functional. This will be true for all the distributions we normally encounter in physics. What this means is that if a sequence of test functions φn converges to a test function φ, then the values of a distribution on the φn must converge to its value on φ, i.e., if φn → φ then T[φn] → T[φ]. But it is important that there is a very strong definition of the meaning of φn → φ. (As usual, n runs from 1 to ∞.) In the following, for a function φ(x), its kth derivative is notated by φ^(k)(x).

Definition: For compact support test functions, by φn → φ is meant that

• For each k, φn^(k)(x) approaches φ^(k)(x) uniformly in x. I.e., for each k and ε > 0, there is an n0(k, ε) such that |φn^(k)(x) − φ^(k)(x)| < ε for n > n0(k, ε). Uniformity of the convergence means that the same n0 works for all x.

• And there is a uniformly bounded support for all of φn, i.e., there is a number a such that φn(x) = 0 for all |x| > a. Uniformity here means that the same a works for all n.

These restrictions ensure that the approach of φn to φ is not spoiled, for example, by oscillations of ever smaller amplitude but of rapidly increasing frequency; this would affect manipulations involving derivatives of the functions, such as we will encounter.

Definition: For Schwartz functions, by φn → φ is meant that

• For each pair of non-negative integers k and N, |x|^N φn^(k)(x) approaches |x|^N φ^(k)(x) uniformly in x.

1.5 Generalization to higher dimensions The above definitions, together with the ones we will encounter later, apply when the underlying functions are of one real variable. But they generalize trivially to functions of more than one real variable. See Sec. 4 for one example.

2 Basic distributions

There is a standard correspondence between a distribution f¯ and an ordinary function f. It is defined by:

Definition: f¯[φ] = ∫ f(x) φ(x) dx.

Comment: Note that a continuous function f(x) can be reconstructed if one knows f¯[φ] for all test functions. So one normally identifies the function and the corresponding distribution, and therefore drops the overbar on f¯. Also, the notation f¯, with the overbar, is my own notation and isn’t standard.

But there are also distributions that do not correspond to functions, for example the well-known delta function:

Definition: The delta function at a point a is defined by δa[φ] = φ(a). (Strictly speaking, the use of the word “function” here is a misnomer, of course.)
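Although δa is not an ordinary function, it can be approached by ordinary functions. A minimal numerical sketch (assuming numpy is available; the test function here is an arbitrary illustrative choice, not from the notes) shows ∫ g_ε(x − a) φ(x) dx approaching φ(a) as a narrow normalized Gaussian g_ε shrinks:

```python
import numpy as np

def delta_eps(x, a, eps):
    """A narrow normalized Gaussian, one common smooth stand-in for delta(x - a)."""
    return np.exp(-(x - a) ** 2 / (2 * eps ** 2)) / (eps * np.sqrt(2 * np.pi))

def phi(x):
    """An illustrative test function."""
    return np.exp(-x ** 2)

a = 0.7
x = np.linspace(-10, 10, 400001)
dx = x[1] - x[0]

# The integrals approach phi(a) = exp(-0.49) as eps -> 0.
vals = [np.sum(delta_eps(x, a, eps) * phi(x)) * dx for eps in (0.5, 0.1, 0.01)]
print(vals, "->", phi(a))
```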

Notation: A notation that is often seen in the literature is that for any distribution T, ⟨T, φ⟩ is defined to mean T[φ]. This is meant to be reminiscent of Dirac notation, but is not identical, because of the lack of a complex conjugate. I will not normally use this notation in this course, precisely because of the danger of confusion with Dirac bra and ket notation.

Notation: We also use the notation ∫ T(x) φ(x) dx to mean T[φ], even when T is not obtained from a function. So we write, for example, δa[φ] = ∫ δ(x − a) φ(x) dx, even though δ(x − a) does not exist as a function. This bent notation is like the one for a derivative:

df/dx = lim_{δx→0} δf/δx,   (1)

where δf = f(x + δx) − f(x). It is useful to think of df/dx as df divided by dx even though neither df nor dx exists. It is useful when changing variables, for example: df(x(y))/dy = (df/dx)(dx/dy).

Once we have defined a distribution with respect to a certain space of test functions, which for the moment we will take to be compact-support test functions, we can typically extend the definition in a natural way to more general functions. For example, the definition of f¯[φ] can be extended to any function φ(x) for which the defining integral converges. Similarly

δa[φ] can be extended to any function φ(x) that is continuous at x = a. In physics applications, we will find it useful to use such extensions from time to time. However, if we want to apply, for example, the definition of the derivative of a distribution, which we will do shortly, only a more restricted extension will be appropriate.

The following remarks are found on p. 11 of the Richards-Youn book. They seem appropriate to me, and their full implications will appear after you get further acquainted with the subject.

[This] comment represents an opinion of the authors, which the reader is free to disregard. Students sometimes get tied up in the logical technicalities of this theory and lose the forest for the trees. Logically, a distribution T is a mapping from test functions φ to numbers ⟨T, φ⟩. However, that is not the right way to think about them. They should be viewed (imprecisely, but suggestively) as ‘generalized functions’ of x, and written T(x). The test functions φ give a logical underpinning to the theory — but, whenever possible, they should be kept in the background. For example, the (incorrect) picture of the delta function in Figure [1] gives a much better impression of what this construct means than the mathematically rigorous definition ⟨δ, φ⟩ = φ(0). (A corresponding anomaly between theory and intuitive perception appears in the definition of real numbers. According to one standard definition, the number π is an equivalence class of Cauchy sequences of rationals. No one thinks of it that way. We think of π as simply ‘a number’. Similarly we should think of the delta function δ(x) as a function of x, even though, strictly speaking, it isn’t.)

Figure 1: A schematic picture of the delta function δ(x). The pulse is to be viewed as ‘very thin and very high’, with its total area equal to 1. It is centered above the point x = 0.

3 Operations on distributions

First observe that the distributions corresponding to functions obey some elementary theorems:

(f + g)¯[φ] = f¯[φ] + g¯[φ],   (2)
(λf)¯[φ] = λ f¯[φ],   (3)
(f′)¯[φ] = −f¯[φ′],   (4)
(fg)¯[φ] = f¯[gφ] = g¯[fφ].   (5)

Here f(x) and g(x) are ordinary functions, and λ is a complex number. In this context, the product of two functions is (fg)(x) = f(x)g(x). As usual f′(x) is the derivative of f. Notice that the last definition only applies for sufficiently well-behaved functions; it definitely applies if f and g are test functions, but otherwise it needs the extension of a distribution to function arguments that are more general than test functions.

For distributions that do not correspond to functions, we use these same equations as definitions, as follows. Let T and S be distributions and let f be a function. Then we define

(T + S)[φ] = T[φ] + S[φ],   (6)
(λT)[φ] = λ(T[φ]),   (7)
(T′)[φ] = −T[φ′],   (8)
(Tf)[φ] = (fT)[φ] = T[fφ].   (9)

If T and S are replaced by the distributions corresponding to functions, the definitions agree with the elementary theorems above. So we have, for example, (f′)¯ = (f¯)′. This allows us to drop the overbar on f¯ without making algebraic errors in calculations. Again, the definition of Tf initially applies if f is a test function. But it can be applied with more general functions for f, provided that the extension of T[fφ] applies.

To understand why these definitions are needed, examine the second of these, for example. On the left-hand side, we have the product of an ordinary number λ and a distribution T, and this product is not an object we have dealt with earlier. It is therefore in need of

definition. In contrast, on the right-hand side, we have the product of two ordinary numbers λ and T[φ], which is already defined.

It is not possible to make a general definition of the product of two distributions that gives sensible results in all cases. For example δaδa has to be regarded as infinite. However, useful definitions can be given in some cases: e.g., a distribution times an ordinary function, as we have already seen. I will treat another case in Sec. 4. From the definitions it follows that the usual product rule holds for a product of a function and a distribution:

(Tf)′ = T′f + Tf′,   (10)

as I ask you to prove in one of the exercises. The definitions also allow one to give sensible results for the derivatives of non-differentiable functions. For example, the theta function is defined by

θ(x) = { 0 if x < 0,
         1 if x > 0.   (11)

Then it follows from the definitions that θ¯′ = δ0. If we drop the overbar, and use the standard physicists’ notation, this is exactly

dθ(x)/dx = δ(x).   (12)

We can also obtain the definition of the derivative of the delta function:

∫ δ′(x − a) φ(x) dx = −φ′(a).   (13)
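Both results can be checked numerically by replacing the delta function with a narrow Gaussian. The following sketch (numpy assumed; the test function is my own illustrative choice) verifies θ′[φ] = −θ[φ′] = φ(0) and ∫ δ′(x − a) φ(x) dx ≈ −φ′(a):

```python
import numpy as np

x = np.linspace(-10, 10, 400001)
dx = x[1] - x[0]

def phi(x):
    """Illustrative test function, with its derivative written out analytically."""
    return np.exp(-(x - 0.3) ** 2)

def phi_prime(x):
    return -2 * (x - 0.3) * np.exp(-(x - 0.3) ** 2)

# theta' = delta: by definition theta'[phi] = -theta[phi'] = -int_0^inf phi'(x) dx,
# which should equal phi(0).
theta_prime_phi = -np.sum(phi_prime(x)[x > 0]) * dx

# delta'(x - a): approximate delta by a narrow Gaussian and differentiate it;
# the integral against phi should tend to -phi'(a).
a, eps = 0.5, 0.01
g = np.exp(-(x - a) ** 2 / (2 * eps ** 2)) / (eps * np.sqrt(2 * np.pi))
g_prime = -(x - a) / eps ** 2 * g
delta_prime_phi = np.sum(g_prime * phi(x)) * dx

print(theta_prime_phi, phi(0.0))
print(delta_prime_phi, -phi_prime(0.5))
```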

4 Products of distributions

In general the product of two distributions is not defined. For example [δ(x − a)]² does not exist. But if we have a two-dimensional space, then, for example, we have

∫ δ(x − a) δ(y − b) φ(r) d²r = φ(a, b).   (14)

Here I use the notation where the vector r is also written as (x, y) in terms of its coordinates with respect to some chosen coordinate system. It is clearly sensible to define a two-dimensional delta function by

∫ δ^(2)(r − (a, b)) φ(r) d²r = φ(a, b),   (15)

and to write δ^(2)(r − (a, b)) = δ(x − a) δ(y − b). From this we can see that the idea of a distribution or generalized function generalizes easily and simply from the case that the underlying functions are of one real variable (as in φ(x)) to the case that they are functions of more variables (or of vectors).

5 Fourier transform of a distribution

I define the Fourier transform f̃ of a suitably well-behaved ordinary function f by

f̃(k) = ∫ f(x) e^{ikx} dx,   (16)

for which the inverse transformation is known to be

f(x) = (1/2π) ∫ f̃(k) e^{−ikx} dk.   (17)
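As a sanity check on the sign and normalization conventions in (16) and (17), one can transform a Gaussian numerically; with this convention, the transform of e^{−x²/2} is √(2π) e^{−k²/2}. (A numpy sketch; the grid parameters are arbitrary choices.)

```python
import numpy as np

x = np.linspace(-20, 20, 40001)
dx = x[1] - x[0]
f = np.exp(-x ** 2 / 2)   # a Gaussian, whose transform is known in closed form

def ft(k):
    """Fourier transform in this document's convention: f~(k) = int f(x) e^{ikx} dx."""
    return (f * np.exp(1j * k * x)).sum() * dx

# For f(x) = exp(-x^2/2) the transform is sqrt(2*pi) * exp(-k^2/2).
results = [(k, ft(k).real, np.sqrt(2 * np.pi) * np.exp(-k ** 2 / 2)) for k in (0.0, 1.0, 2.0)]
for row in results:
    print(row)
```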

To construct a definition of the Fourier transform of a general distribution (as well as of less well-behaved ordinary functions), let us follow the same strategy as in Sec. 3, i.e.,

1. Express the distributional version of f˜ in terms of the distributional version of f, for the case of a function whose ordinary Fourier transform exists and is well-behaved.

2. Use the result as a definition for a general distribution.

The first step is therefore

f̃[φ] = ∫ f̃(k) φ(k) dk
     = ∫∫ f(x) e^{ikx} φ(k) dx dk
     = ∫ f(x) [ ∫ e^{ikx} φ(k) dk ] dx
     = ∫ f(x) φ̃(x) dx
     = f[φ̃].   (18)

The test functions are required to be well-enough behaved that their Fourier transforms exist. The exchange of the order of integration on the third line is also allowed in this case. We then use

T̃[φ] = T[φ̃]   (19)

as the definition of the Fourier transform of a distribution, and hence of a function whose ordinary Fourier transform does not exist. For example, let g(x) = e^{−ilx} for some constant number l. Then

g̃[φ] = g[φ̃]
     = ∫ e^{−ilx} φ̃(x) dx
     = 2π φ(l)
     = ∫ 2π δ(k − l) φ(k) dk.   (20)
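The chain of equalities in (20) can be checked numerically for a particular Schwartz function (here a Gaussian, an illustrative choice; numpy assumed): compute φ̃ by direct summation, then the integral ∫ e^{−ilx} φ̃(x) dx, and compare with 2π φ(l):

```python
import numpy as np

k = np.linspace(-15, 15, 1501); dk = k[1] - k[0]
x = np.linspace(-15, 15, 1501); dx = x[1] - x[0]

phi = np.exp(-k ** 2)   # illustrative Schwartz test function phi(k)

# phi~(x) = int phi(k) e^{ikx} dk, in the convention of Eq. (16), by direct summation
phi_tilde = (np.exp(1j * np.outer(x, k)) * phi).sum(axis=1) * dk

l = 2.0
# g~[phi] = g[phi~] = int e^{-ilx} phi~(x) dx; the claim is that this equals 2*pi*phi(l)
g_tilde_phi = ((np.exp(-1j * l * x) * phi_tilde).sum() * dx).real
print(g_tilde_phi, 2 * np.pi * np.exp(-l ** 2))
```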

This is the often-quoted result that the Fourier transform of the function g(x) = e^{−ilx} is g̃(k) = 2π δ(k − l).

To make the definition in (19) actually work correctly, we restrict the distributions to tempered distributions, that is, distributions defined on the space S of Schwartz functions, as defined in Sec. 1.2. The reason is that the Fourier transform of a Schwartz function is itself a Schwartz function, i.e., from φ ∈ S it follows that φ̃ ∈ S, and hence that T[φ̃] is defined, provided that T is a tempered distribution. To see that the Fourier transform of a Schwartz function is a Schwartz function, observe that

• If a function φ(x) is infinitely differentiable, then its Fourier transform φ̃(k) falls faster than any power of k as k → ±∞.

• The pth derivative φ̃^(p)(k) of a Fourier transform is the Fourier transform of (ix)^p φ(x). This has an extra factor of x^p and hence worse convergence.

• If the function φ(x) decreases faster than any power of x, then the integrals for all the derivatives of φ̃ exist, so that φ̃ is infinitely differentiable.

In contrast, the Fourier transform of a compact-support test function normally does not have compact support but is merely a Schwartz function.

6 Distributions that are not tempered distributions

An ordinary delta distribution is not only a general distribution but also a tempered distribution, since its definition δa[φ] = φ(a) can be applied equally well to Schwartz functions and to compact-support test functions. That is, the definition is not affected by the behavior of the test functions φ(x) as x → ±∞. Consider instead a distribution defined essentially as an infinite sum over delta functions with exponentially rising coefficients:

T[φ] := Σ_{n=1}^{+∞} φ(n) e^n.   (21)

This is defined for all compact-support test functions, since then only a finite number of terms in the series are non-zero. But it is not defined for all Schwartz test functions, since they are guaranteed only to decrease faster than any power. For example, T[ψ] diverges when

ψ(x) = e^{−(1/2)√(x²+1)}.   (22)

This function is infinitely differentiable, and it and all its derivatives decrease more rapidly than any power of x at large x. But the decrease of ψ(x) is overwhelmed by the e^n factor when (21) is applied. Of course, there are other Schwartz functions for which T[φ] does exist, but they form only a subspace of S. In contrast, if the e^n factor in (21) were replaced by one with only power-law growth, e.g., n³, we would have convergence whenever φ(x) is a Schwartz function, since φ(n) would decrease faster than the n³ factor increases.
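A quick numerical look at the terms of the series (standard-library Python only; the cutoff n = 30 is an arbitrary choice) shows why (21) diverges on ψ while the n³-weighted version converges:

```python
import math

def psi(x):
    """The counterexample (22): smooth, decays faster than any power of x,
    but only like exp(-|x|/2), so the e^n weights win."""
    return math.exp(-0.5 * math.sqrt(x * x + 1))

terms_exp = [math.exp(n) * psi(n) for n in range(1, 31)]   # terms of (21): grow
terms_cube = [n ** 3 * psi(n) for n in range(1, 31)]       # power-law weights: shrink

print(terms_exp[0], terms_exp[-1])
print(terms_cube[0], terms_cube[-1])
```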

Roughly, we can characterize tempered distributions as being like functions T(x) that increase at most like a power of x when x goes to infinity. This covers all typical applications in physics where a Fourier transform appears.

7 Limits

It is often useful to find a sequence of ordinary functions fn that converges (in some suitable sense) to a non-trivial distribution, lim_{n→∞} fn = f, even when the limit is not an ordinary function. I’ll not discuss that in these notes. But we’ll see how to apply it to Fourier transforms.
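For a preview of the idea, here is a sketch (numpy assumed; the particular functions are illustrative choices) in which normalized Gaussians fn of shrinking width converge to the delta function in the distributional sense, i.e., fn[φ] → φ(0):

```python
import numpy as np

x = np.linspace(-5, 5, 200001)
dx = x[1] - x[0]

def f_n(n):
    """Normalized Gaussians of shrinking width; each f_n is an ordinary function."""
    return n * np.exp(-np.pi * n ** 2 * x ** 2)

def phi(x):
    """Illustrative test function with phi(0) = 1."""
    return np.cos(x) * np.exp(-x ** 2)

# f_n[phi] approaches phi(0) = 1, which is the sense in which f_n -> delta.
vals = [np.sum(f_n(n) * phi(x)) * dx for n in (1, 4, 16, 64)]
print(vals)
```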

8 Exercises

1. Exercises on using distributions:

(a) Compute ∫_{−∞}^{+∞} δ′(x − 2) e^{−3x²} dx.

(b) What meaning is to be given to ∫ δ″(x − a) φ(x) dx?

(c) Compute ∫ e^{−|r|} ∇²δ^(3)(r − a) d³r, where a is a fixed non-zero vector.

(d) In one dimension, a function f(x) is defined by

f(x) = { e^{2x} if x < 0,
         1 if 0 < x < 1,
         e^{−(x−1)} if 1 < x.   (23)

What are f 0(x) and f 00(x) in the sense of distributions?

2. Let T(x) be a distribution and let f(x) be a sufficiently well-behaved function. Starting from the definitions of (Tf)′ and of T′, etc., show that

(Tf)′ = T′f + Tf′.   (24)

Note: This looks like a theorem from your calculus textbook, but it isn’t, since T is not necessarily an ordinary function. You will need to use the definitions of the product of a distribution and a function and of the derivative of a distribution.

3. In one dimension, a function f(x) is defined by

f(x) = { 3x if x < 0,
         1 if 0 < x < 1,
         3x² if 1 < x.   (25)

What is f 0(x)? What is f 00(x)?
