Quick viewing(Text Mode)

1 Introduction 2 Integral Transforms

1 Introduction 2 Integral Transforms

Physics 129b 051012 F. Porter Revision 150928 F. Porter

1 Introduction

The integral problem is to find the solution to:

Z b h(x)f(x) = g(x) + λ k(x, y)f(y)dy. (1) a We are given functions h(x), g(x), k(x, y), and wish to determine f(x). The quantity λ is a parameter, which may be complex in general. The bivariate k(x, y) is called the kernel of the . We shall assume that h(x) and g(x) are defined and continuous on the interval a ≤ x ≤ b, and that the kernel is defined and continuous on a ≤ x ≤ b and a ≤ y ≤ b. Here we will concentrate on the problem for real variables x and y. The functions may be complex-valued, although we will sometimes simplify the discussion by considering real functions. However, many of the results can be generalized in fairly obvious ways, such as relaxation to piece- wise continuous functions, and generalization to multiple dimensions. There are many resources for further reading on this subject. Some of the popular ones among physicists include the “classic” texts by Mathews and Walker, Courant and Hilbert, Whittaker and Watson, and Margenau and Murphy, as well as the newer texts by Arfken, and Riley, Hobson, and Bence.

2 Integral Transforms

If h(x) = 0, we can take λ = −1 without loss of generality and obtain the integral equation: Z b g(x) = k(x, y)f(y)dy. (2) a This is called a Fredholm equation of the first kind or an . Particularly important examples of integral transforms include the and the , which we now discuss.

1 2.1 Fourier Transforms A special case of a Fredholm equation of the first kind is

a = −∞ (3) b = +∞ (4) 1 k(x, y) = √ e−ixy. (5) 2π This is known as the Fourier transform: 1 Z ∞ g(x) = √ e−ixyf(y)dy (6) 2π −∞ Note that the kernel is complex in this case. The solution to this equation is given by: 1 Z ∞ f(y) = √ eixyg(x)dx. (7) 2π −∞ We’ll forego rigor here and give the “physicist’s” demonstration of this:

∞ ∞ 1 Z Z 0 g(x) = e−ixydy eix yg(x0)dx0 (8) 2π −∞ −∞ ∞ ∞ 1 Z Z 0 = g(x0)dx0 ei(x −x)ydy (9) 2π −∞ −∞ Z ∞ = g(x0)δ(x − x0)dx0 (10) −∞ = g(x). (11)

Here, we have used the fact that the Dirac “delta-function” may be written 1 Z ∞ δ(x) = eixydy. (12) 2π −∞ The reader is encouraged to demonstrate this, if s/he has not done so before. It is instructive to notice that the Fourier transform may be regarded as a of the Fourier . Let f(x) be expanded in a in a box of size [−L/2, L/2]:

∞ X 2πinx/L f(x) = ane . (13) n=−∞ We have chosen periodic boundary conditions here: f(L/2) = f(−L/2). The an expansion coefficients may be determined for any given f(x) using the orthogonality relations:

Z L/2 1 2πinx/L −2πimx/L e e dx = δmn. (14) L −L/2

2 Hence, Z L/2 1 −2πinx/L an = f(x)e dx. (15) L −L/2 Now consider taking the limit as L → ∞. In this limit, the summation√ goes over to a continuous integral. Let y = 2πn/L and g(y) = Lan/ 2π. Then, using dn = (L/2π)dy,

∞ X 2πinx/L f(x) = lim ane (16) L→∞ n=−∞ √ ∞ 2π = lim X g(y)eixy (17) L→∞ n=−∞ L 1 Z ∞ = √ eixyg(y)dy. (18) 2π −∞ Furthermore: La 1 Z ∞ g(y) = √ n = √ f(x)e−ixydx. (19) 2π 2π −∞ We thus verify our earlier statements, including the δ-function equivalence, assuming our limit procedure is acceptable. Suppose now that f(y) is an even function, f(−y) = f(y). Then, 1 Z 0 Z ∞  g(x) = √ e−ixyf(y)dy + e−ixyf(y)dy (20) 2π −∞ 0 1 Z ∞ h i = √ eixy + e−ixy f(y) dy (21) 2π 0 s 2 Z ∞ = f(y) cos xy dy. (22) π 0 This is known as the Fourier cosine transform. It may be observed that the transform g(x) will also be an even function, and the solution for f(y) is: s 2 Z ∞ f(y) = g(x) cos xy dx. (23) π 0 Similarly, if f(y) is an odd function, we have the Fourier trans- form: s 2 Z ∞ g(x) = f(y) sin xy dy, (24) π 0 where a factor of −i has been absorbed. The solution for f(y) is s 2 Z ∞ f(y) = g(x) sin xy dx. (25) π 0 Let us briefly make some observations concerning an approach to a more rigorous discussion. Later we shall see that if the kernel k(x, y) satisfies

3 conditions such as square-integrability on [a, b] then convenient behavior is achieved for the solutions of the integral equation. However, in the present case, we not only have |a|, |b| → ∞, but the kernel eixy nowhere approaches zero. Thus, great care is required to ensure valid results. We may deal with this difficult situation by starting with a set of functions which are themselves sufficiently well-behaved (e.g., approach zero rapidly as |x| → ∞) that the behavior of the kernel is mitigated. For example, in quantum mechanics we may construct our of acceptable wave functions on R3 by starting with a set S of functions f(x) where:

1. f(x) ∈ C∞, that is f(x) is an infinitely differentiable complex-valued function on R3.

n 2. lim|x|→∞ |x| d(x) = 0, ∀n, where d(x) is any partial of f. That is, f and its fall off faster than any power of |x|.

We could approach the proof of the Fourier inverse theorem with more rigor than our limit of a series as follows: First, consider that subset of S consisting of Gaussian functions. Argue that any function in S may be ap- proximated aribtrarily closely by a series of Gaussians. Then note that the S functions form a pre-Hilbert space (also known as an Euclidean space). Add the completion to get a Hilbert space, and show that the theorem remains valid. The Fourier transform appears in many physical situations via its con- nection with waves, for example:

In electronics we use the Fourier transform to translate “time domain” prob- lems in terms of “” problems, with xy → ωt. An LCR circuit is just a complex impedance for a given frequency, hence the integral- differential time-domain problem is translated into an algebraic problem in the frequency domain. In quantum mechanics the position-space wave func- tions are related to momenutm-space wave functions via the Fourier trans- form.

2.1.1 Example: RC circuit

Suppose we wish to determine the “output” voltage Vo(t) in the simple circuit of Fig. 1. The time domain problem requires solving the equation:

Z t   Z t 1 0 0 1 1 1 0 0 Vo(t) = Vi(t ) dt − + Vo(t ) dt . (27) R1C −∞ C R1 R2 −∞ This is an integral equation, which we will encounter in Section 5.2 as a “Volterra’s equation of the second kind”.

4 R1

V (t) V (t) i C R2 o

Figure 1: A simple RC circuit problem.

If Vi(t) is a sinusoid waveform of a fixed frequency (ω), the circuit elements may be replaced by complex impedances:

R1 → Z1 = R1 (28)

R2 → Z2 = R2 (29) 1 C → Z = . (30) C iωC

Then it is a simple matter to solve for Vo(t): 1 V (t) = V (t) , (31) o i R1 1 + (1 + iωR2C) R2 if Vi(t) = sin(ωt + φ), and where it is understood that the real part is to be taken. Students usually learn how to obtain the result in Eqn. 31 long before they know about the Fourier transform. However, it is really the result in the frequency domain according to the Fourier transform. That is:

Z ∞ 1 −iωt Vbo(ω) = √ Vo(t)e dt (32) 2π −∞ 1 = Vb (ω) . (33) i R1 1 + (1 + iωR2C) R2

We are here using the “hat” ( b ) notation to indicate the integral transform of the unhatted function. The answer to the problem for general (not necessarily sinusoidal) input Vi(t) is then:

Z ∞ 1 iωt Vo(t) = √ Vbo(ω)e dω (34) 2π −∞ 1 Z ∞ eiωt = √ Vb (ω) dω. (35) i R1 2π −∞ 1 + (1 + iωR2C) R2

5 2.2 Laplace Transforms The Laplace transform is an integral transform of the form: Z ∞ F (s) = f(x)e−sxdx. (36) 0 The “solution” for f(x) is: 1 Z c+i∞ f(x) = F (s)esxds, (37) 2πi c−i∞ where x > 0. This transform can be useful for some functions where the Fourier trans- form does not exist. Problems at x → +∞ are removed by multiplying by e−cx, where c is a positive real number. Then the problem at −∞ is repaired by multiplying by the unit step function θ(x):  1 if x > 0,  θ(x) ≡ 1/2 if x = 0, and (38)  0 if x < 0. Thus, we have Z ∞ g(y) = f(x)θ(x)e−cxe−ixydx (39) −∞ Z ∞ = f(x)e−cxe−ixydx, (40) 0 √ where we have by convention also absorbed the 1/ 2π. The inverse Fourier transform is just: 1 Z ∞ e−cxθ(x)f(x) = g(y)eixydy. (41) 2π −∞ If we let s = c + iy and define F (s) ≡ g(y) at s = c + iy, then Z ∞ F (s) = f(x)e−sxdx, (42) 0 and 1 Z ∞ f(x)θ(x) = F (s)ex(c+iy)dy (43) 2π −∞ 1 Z c+i∞ = F (s)exsds, (44) 2πi c−i∞ which is the above-asserted result. We group together here some useful theorems for Fourier and Laplace transforms: First define some notation. Let 1 Z ∞ (Ff)(y) = g(y) = √ f(x)e−ixydx (45) 2π −∞

6 be the Fourier transform of f, and Z ∞ (Lf)(s) = F (s) = f(x)e−sxdx (46) 0 be the Laplace transform of f. Finally, let T (“transform”) stand for either F or L. The reader should immediately verify the following properties: 1. Linearity: For functions f and g and complex numbers α, β,

T (αf + βg) = α(T f) + β(T g). (47)

2. Transform of derivatives: Integrate by parts to show that

(Ff 0)(y) = iy(Ff)(y), (48)

assuming f(x) → 0 as x → ±∞, and

(Lf 0)(y) = s(Lf)(s) − f(0), (49)

assuming f(x) → 0 as x → ∞ and defining f(0) ≡ limx→0+ f(x). The procedure here may be iterated to obtain expressions for higher derivatives. 3. Transform of :  Z  1 F f(x)dx (y) = (Ff)(y) + Cδ(y), (50) iy where C is an arbitrary constant arising from the arbitrary of an indefinite integral;  Z x  Z ∞ Z x L f(t)dt (s) = dxe−sx f(t)dt 0 0 0 Z ∞ Z ∞ = dtf(t) dxe−sx 0 t 1 = (Lf)(s). (51) s 4. Translation:

[Ff(x + a)] (y) = eiay [Ff](y), (52) Z a [Lf(x + a)] (s) = eas [Lf](s) − θ(a) f(x)e−sxdx. (53) 0

5. Multiplication by an exponential:

{F [eaxf(x)]} (y) = (Ff)(y + ia), (54) {L [eaxf(x)]} (s) = (Lf)(s − a). (55)

7 6. Multiplication by x: d {F [xf(x)]} (y) = i (Ff)(y), (56) dy d {L [xf(x)]} (s) = − (Lf)(s). (57) ds

An important notion that arises in applications of integral transforms is that of a “”: Definition (Convolution):Given two functions f1(x) and f2(x), and con- stants a, b, the convolution of f1 and f2 is:

Z b g(x) = f1(y)f2(x − y)dy. (58) a In the case of Fourier transforms, we are interested in −a = b = ∞. For Laplace transforms, a = 0, b = ∞. We then have the celebrated convolution theorem:

Theorem: √ (Fg)(y) = 2π (Ff )(y)(Ff )(y), (59) √ 1 2 (Lg)(y) = 2π (Lf1)(y)(Lf2)(y). (60)

The proof is left as an exercise.

2.2.1 Laplace transform example: RC circuit

Let us return to the problem of determining the “output” voltage Vo(t) in the simple circuit of Fig. 1. But now, suppose that we know that Vi(t) = 0 for t < 0. In this case, the Laplace transform is an appropriate method to try. There are, of course, many equivalent solution paths; let us think in terms of the currents: i(t) is the total current (through R1), iC (t) is the current through the capacitor, and iR(t) is the current through R2. We know that i(t) = iC (t) + iR(t), and

Vo(t) = Vi(t) − i(t)R1 (61)

= [i(t) − iC (t)] R2 (62) Z t Q 1 0 0 = = iC (t ) dt . (63) C C 0

This gives us three equations relating the unknowns Vo(t), i(t), and iC (t), which we could try to solve to obtain Vo(t). However, the integral in the last equation complicates the solution. This is where the Laplace transform will help us.

8 Corresponding to the first of the three equations we obtain (where the hat now indicates the Laplace transform):

Z ∞ −st Vbo(s) = Vo(t)e dt = Vbi(s) − bi(s)R1. (64) 0 Corresponding to the second, we have: h i Vbo(s) = bi(s) − biC (s) R2. (65)

For the third, we have:

Z ∞ Z t 1 −st 0 0 Vbo(s) = e iC (t ) dt dt C 0 0 Z ∞ Z ∞ 1 0 0 −st = iC (t ) dt e dt C 0 t0 1 = bi (s). (66) sC C Now we have three simultaneous algebraic equations, which may be readily solved for Vbo(s): 1 Vb (s) = Vb (s) . (67) o i R1 1 + (1 + sR2C) R2 We note the similarity with Eqn. 33. Going back to the time domain, we find: 1 Z a+i∞ 1 V (t) = Vb (s)est ds. (68) o R1 i 2πi a−i∞ 1 + (1 + sR2C) R2

For example, let’s suppose that Vi(t) is a brief pulse, V ∆t ∼ A, at t = 0. Let’s model this as: V (t) = Aδ(t − ), (69) where  is a small positive number, inserted to make sure we don’t get into trouble with the t = 0 boundary in the Laplace transform. Then:

Z ∞ ˆ −st −s Vi(s) = Aδ(t − )e dt = Ae . (70) 0 Inserting into Eqn. 68, we have

a+i∞ s(t−) 1 R2 Z e Vo(t) = A ds, (71) 2πi R1 + R2 a−i∞ 1 + τs where R R τ ≡ 1 2 C. (72) R1 + R2 The integrand has a pole at s = −1/τ. We thus choose the contour of integration as in Fig. 2. A contour of this form is known as a “Bromwich

9 Im(s)

x a -1/ τ Re(s)

Figure 2: The Bromwich contour, for the RC circuit problem. contour”. In the limit R → ∞ the integral around the semicircle is zero. We thus have:

1 R2  1  Vo(t) = A2πi Residue at − 2πi R1 + R2 τ R  1  es(t−) = A 2 lim s + R1 + R2 s→−1/τ τ 1 + τs 1 −t = A e τ . (73) R1C In this simple problem, we could have guessed this result: At t = 0, we instantaneously put a voltage A/R1C on the capacitor. The time τ is simply the time constant for the capacitor to discharge through the parallel (R1,R2) combination. However, we may also treat more difficult problems with this technique. The integral-differential equation in the t-domain be- comes a problem of finding the zeros of a polynomial in the s-domain, at which the residues are evaluated. This translated problem lends itself to numerical solution.

3 Laplace’s Method for Ordinary Differential Equations

There are many other integral transforms that we could investigate, but the Fourier and Laplace transforms are the most ubiquitous in physics applica- tions. Rather than pursue other transforms, we’ll look at another example

10 that suggests the breadth of application of these ideas. This is the “Laplace’s Method” for the solution of ordinary differential equations. This method rep- resents a sort of generalization of the Laplace transform, using the feature of turning derivatives into powers. Suppose we wish to solve the differential equation: n X (k) (ak + bkx)f (x) = 0, (74) k=0 where we use the notation dkf f (k)(x) ≡ (x). (75) dxk In the spirit of the Laplace transform, assume a solution in the integral form Z f(x) = F (s)esx ds, (76) C where the contour will be chosen to suit the problem (not necessarily the contour of the Laplace transform). We’ll insert this proposed solution into the differential equation. Notice that Z f 0(x) = F (s)sesx ds. (77) C Thus, Z 0 = [U(s) + xV (s)] F (s)esx ds, (78) C where n X k U(s) = aks , (79) k=0 n X k V (s) = bks . (80) k=0 We may eliminate the x in the xV term of Eqn. 78 with an :

c2 Z Z d sx sx sx V (s)F (s)xe ds = [V (s)F (s)e ] − [V (s)F (s)] e ds, (81) C C ds c1 where c1 and c2 are the endpoints of C. Hence

( ) c2 Z d sx sx 0 = U(s)F (s) − [V (s)F (s)] e ds + [V (s)F (s)e ] . (82) C ds c1 We assume that we can choose C such that the integrated part vanishes. Then we will have a solution to the differential equation if d U(s)F (s) − [V (s)F (s)] = 0. (83) ds

11 Note that we have transformed a problem with high-order derivatives (but only first order polynomial coefficients) to a problem with first-order deriva- tives only, but with high-order polynomial coefficients. Formally, we find a solution as: d [V (s)F (s)] = U(s)F (s) (84) ds dF (s) dV (s) V (s) = U(s)F (s) − F (s) (85) ds ds d ln F U d ln V = − (86) ds V ds Z U d ln V ! ln F = − ds (87) V ds Z U = ds − ln V + ln A, (88) V where A is an arbitrary constant. Thus, the soluton for F (s) is;

A "Z s U(s0) # F (s) = exp ds0 . (89) V (s) V (s0)

3.1 Example of Laplace’s method: Hermite Equation The simple harmonic oscillator potential in the Schr¨odingerequation leads to the Hermite differential equation:

f 00(x) − 2xf 0(x) + 2νf(x) = 0, (90) where ν is a constant. This is an equation that may lend itself to treatment with Laplace’s method; let’s try it. First, we determine U(s) and V (s):

2 2 U(s) = a0 + a2s = 2ν + s (91)

V (s) = b1s = −2s. (92)

Substituting these into the general formula Eqn. 89, we have

1 Z s s2 + 2ν ! F (s) ∝ − exp − ds (93) 2s 2s 1 s2 ! ∝ − exp − − ν ln s (94) 2s 4 e−s2/4 ∝ − . (95) 2sν+1

12 Im(s)

C

Re(s)

Figure 3: A possible contour for the Hermite equation with non-integral constant ν. The branch cut is along the negative real axis.

To find f(x) we substitute this result into Eqn. 76: Z f(x) = F (s)esx ds (96) C Z e−s2+2sx = A ds, (97) C sν+1 where A is an arbitrary constant, and where we have let s → 2s according to convention for this problem. Now we are faced with the question of the choice of contour C. We at least must require that the integrated part in Eqn. 82 vanish:

2 2 2 e−s +2sx V (s)F (s)esx ∝ = 0. (98) 1 ν s 1 We’ll need to avoid s = 0 on our contour. If ν = n is a non-negative integer, we can take a circle around the origin, since the integrand is then analytic and single-valued everywhere except at s = 0. If ν 6= n, then s = 0 is a branch point, and we cannot choose C to circle the origin. We could in this case take C to be the contour of Fig. 3. Let’s consider further the case with ν = n = 0, 1, 2,... Take C to be a circle around the origin. Pick by convention A = n!/2πi, and define: n! Z e−s2+2sx Hn(x) ≡ ds. (99) 2πi C sn+1 This is a powerful integral form for the “Hermite polynomials” (or, Hermite functions in general with the branch cut contour of Fig. 3). For example, n! e−s2+2sx ! H (x) = 2πi × residue at s = 0 of . (100) n 2πi sn+1

13 Recall that the residue is the coefficient of the 1/s term in the Laurent series expansion. Hence,

 n −s2+2sx Hn(x) = n! × coefficient of s in e . (101)

That is, ∞ 2 H (x) e−s +2sx = X n sn. (102) n=0 n! This is the “generating function” for the Hermite polynomials. The term “generating function” is appropriate, since we have:

n d −s2+2sx Hn(x) = lim e (103) s→0 dsn −s2+2sx H0(x) = lim e = 1 (104) s→0 −s2+2sx H1(x) = lim(−2s + 2x)e = 2x, (105) s→0 and so forth.

4 Integral Equations of the Second Kind

Referring back to Eq. 1, if h(x) 6= 0 for a ≤ x ≤ b we may rewrite the problem in a form with h(x) = 1:

Z b g(x) = f(x) − λ k(x, y)f(y)dy. (106) a This is referred to as a linear integral equation of the second kind or as a Fredholm equation of the second kind. It defines a linear transformation from function f to function g. To see this, let us denote this transformation by the letter L:

Z b g(x) = (Lf)(x) = f(x) − λ k(x, y)f(y)dy. (107) a If

Lf1 = g1 and (108)

Lf2 = g2, (109) then for aribtrary complex constants c1 and c2:

L(c1f1 + c2f2) = c1g1 + c2g2. (110)

Notice that we may sometimes find it convenient to use the notation:

|gi = L|fi = |fi − λK|fi, (111)

14 R b where K|fi indicates here the integral a k(x, y)f(y)dy. Our linear is then written: L = I − λK, (112) where I is the identity operator. We are interested in the problem of inverting this linear transformation – given g, what is f? As it is a linear transformation, it should not be surprising that the techniques are analogous with those familiar in matrix equations. The difference is that we are now dealing with vector spaces that are infinite-dimensional function spaces.

4.1 Homogeneous Equation, Eigenfunctions It is especially useful in approaching this problem to first consider the special case g(x) = 0: Z b f(x) = λ k(x, y)f(y)dy. (113) a This is called the homogeneous integral equation. It has a trivial solution f(x) = 0 for a ≤ x ≤ b. If there exists a non-trivial solution f(x) to the homogeneous equation, then cf(x) is also a solution, and we may assume that our solution is “nor- malized” (at least up to some here-neglected questions of rigor related to the specification of our ):

Z b |f(x)|2dx = 1. (114) a

If there are several solutions, f1, f2, f3, . . . , fn, then any linear combina- tion of these is also a solution. Hence, if we have several linearly independent solutions, we can assume that they are orthogonal and normalized. If they are not, we may use the Gram-Schmidt process to obtain such a set of or- thonormal solutions. We therefore assume, without loss of generality, that:

Z b ∗ fi (x)fj(x)dx = δij. (115) a Alternatively, we may use the familiar shorthand:

hfi|fji = δij, (116) or even |fihf| = If , (117) where If is the identity matrix in the subspace spanned by {f}. A value of λ for which the homogeneous equation has non-trivial solutions is called an eigenvalue of the equation (or, of the kernel). Note that the use of the term eigenvalue here is analogous with, but different in detail from the

15 usage in matrices – our present eigenvalue is more similar with the inverse of a matrix eigenvalue. The corresponding solutions are called eigenfunctions of the kernel for eigenvalue λ. We have the following:

Theorem: There are a finite number of eigenfunctions fi corresponding to a given eigenvalue λ. Proof: We’ll prove this for real functions, leaving the complex case as an exercise. Given an eigenfunction fj corresponding to eigenvalue λ, let: Z b 1 pj(x) ≡ k(x, y)fj(y)dy = fj(x). (118) a λ Now consider, for some set of n eigenfunctions corresponding to eigenvalue λ:  2 Z b n 2 X D(x) ≡ λ k(x, y) − pj(x)fj(y) dy. (119) a j=1 It must be that D(x) ≥ 0 because the integrand is nowhere negative for any x. Note that the sum term may be regarded as an approximation to the kernel, hence D(x) is a measure of the closeness of the approximation. With some manipulation:

Z b Z b n 2 2 2 X D(x) = λ [k(x, y)] dy − 2λ k(x, y)pj(x)fj(y)dy a a j=1  2 Z b n 2 X +λ  pj(x)fj(y) dy a j=1 Z b n 2 2 2 X 2 = λ [k(x, y)] dy − 2λ [pj(x)] a j=1 n n Z b 2 X X +λ pj(x) pk(x) fj(y)fk(y)dy j=1 k=1 a Z b n 2 2 2 X 2 = λ [k(x, y)] dy − λ [pj(x)] . (120) a j=1 With D(x) ≥ 0, we have thus proved a form of Bessel’s inequality. We may rewrite the inequality as:

Z b n 2 2 X 2 λ [k(x, y)] dy ≥ [fj(x)] . (121) a j=1 If we integrate both sides over x, we obtain:

Z b Z b n Z b 2 2 X 2 λ [k(x, y)] dydx ≥ [fj(x)] dx a a j=1 a ≥ n, (122)

16 RR 2 using the normalization of the fj. As long as k dxdy is bounded, we see that n must be finite. For finite a and b, this is certainly satisfied, by our continuity assumption for k. Otherwise, we may impose this as a requirement on the kernel. More generally, we regard “nice” kernels as those for which

Z b Z b [k(x, y)]2 dydx < ∞, (123) a a b Z 2 [k(x, y)] dx < U1, ∀y ∈ [a, b], (124) a b Z 2 [k(x, y)] dy < U2, ∀x ∈ [a, b], (125) a where U1 and U2 are some fixed upper bounds. We will assume that these conditions are satisfied in our following discussion. Note that the kernel may actually be discontinuous and even become infinite in [a, b], as long as these conditions are satisfied.

4.2 Degenerate Kernels Definition (Degenerate Kernel):If we can write the kernel in the form:

n X ∗ k(x, y) = φi(x)ψi (y) (126) i=1 Pn (or K = i=1 |φiihψi|), then the kernel is called degenerate. We may assume that the φi(x) are linearly independent. Otherwise we could reduce the number of terms in the sum to use only independent functions. Likewise we may assume that the ψi(x) are linearly independent. The notion of a degenerate kernel is important due to two facts: 1. Any k(x, y) can be uniformly approximated by polynomials in a closed interval. That is, the polynomials are “com- plete” on a closed bounded interval. 2. The solution of the integral equation for degenerate kernels is easy (at least formally). The first fact is known under the label Weierstrass Approximation Theorem. A proof by construction may be found in Courant and Hilbert. We remind the reader of the notion of uniform convergence in the sense used here: P∞ PN Definition (Uniform Convergence):If S(z) = n=0 un(z) and SN = n=0 un(z), then S(z) is said to be uniformly convergent over the set of points A = {z|z ∈ A} if, given any  > 0, there exists an integer N such that

|S(z) − SN+k(z)| < , ∀k = 0, 1, 2,... and ∀z ∈ A. (127)

17 Note that this is a rather strong form of convergence – a series may converge for all z ∈ A, but may not be uniformly convergent. Let us now pursue the second fact asserted above. We wish to solve for f: Z b g(x) = f(x) − λ k(x, y)f(y)dy. (128) a If the kernel is degenerate, we have:

n Z b X ∗ g(x) = f(x) − λ φi(x) ψi (y)f(y)dy. (129) i=1 a We define the numbers: Z b ∗ gi ≡ ψi (x)g(x)dx (130) a Z b ∗ fi ≡ ψi (x)f(x)dx (131) a Z b ∗ cij ≡ ψi (x)φj(x)dx. (132) a ∗ Multiply Eq. 128 through by ψj (x) and integrate over x to obtain: n X gj = fj − λ cjifi. (133) i=1

This is a system of n linear equations in the n unknowns fi. Suppose that there is a unique solution f1, f2, . . . , fn to this system. It is readily verified that a solution to the integral equation is: n X f(x) = g(x) + λ fiφi(x). (134) i=1 Substituting in:   n Z b n X X ∗ X g(x) = g(x) + λ fiφi(x) − λ φi(x) ψi (y) g(y) + λ fjφj(y) dy i=1 i=1n a j=1     n Z b n X  ∗ X  = g(x) + λ φi(x) fi − ψi (y) g(y) + λ fjφj(y) dy i=1  a j=1  n   n  X  X  = g(x) + λ φi(x) fi − gi + λ cijfj i=1  j=1  = g(x). (135) Let us try an explicit example to illustrate how things work. We wish to solve the equation: Z 1 x2 = f(x) − λ x(1 + y)f(y)dy (136) 0

18 In this case, n = 1, and it is clear that the solution is simply a quadratic polynomial which can be determined directly. However, let us apply our new method instead. We have g(x) = x2 and k(x, y) = x(1 + y). The kernel is degenerate, with φ1(x) = x and ψ1(y) = 1 + y. Our constants evaluate to:

Z 1 2 7 g1 = (1 + x)x dx = (137) 0 12 Z 1 5 c11 = x(1 + x)dx = . (138) 0 6 The linear equation we need to solve is then: 7 5 = f − λ f , (139) 12 1 6 1 giving 7 1 f = , (140) 1 2 6 − 5λ and 7 λ f(x) = x2 + x. (141) 2 6 − 5λ The reader is encouraged to check that this is a solution to the original equation, and that no solution exists if λ = 6/5. To investigate this special value λ = 6/5, consider the homogeneous equa- tion: Z 1 f(x) = λ x(1 + y)f(y)dy. (142) 0

We may use the same procedure in this case, except now g1 = 0 and we find that  5 f 1 − λ = 0. (143) 1 6

Either f1 = 0 or λ = 6/5. If f1 = 0, then f(x) = g(x) + λf1φ1(x) = 0. If λ 6= 6/5 the only solution to the homogeneous equation is the trivial one. But if λ = 6/5 the solution to the homogeneous equation is f(x) = ax, where a is arbitrary. The value λ = 6/5 is an (in this case the only) eigenvalue of√ the integral equation, with corresponding normalized eigenfunction f(x) = 3x. This example suggests the plausibility of the important theorem in the next section.

4.3 Theorem Theorem: Either the integral equation

Z b f(x) = g(x) + λ k(x, y)f(y)dy, (144) a

19 with given λ, possesses a unique continuous solution f(x) for each con- tinuous function g(x) (and in particular f(x) = 0 if g(x) = 0), or the associated homogeneous equation Z b f(x) = λ k(x, y)f(y)dy (145) a possesses a finite number of linearly independent solutions. We’ll give an abbreviated proof of this theorem to establish the ideas; the reader may wish to fill in the rigorous details. We have already demonstrated that there exists at most a finite number of linearly independent solutions to the homogeneous equation. A good ap- proach to proving the remainder of the theorem is to first prove it for the case of degenerate kernels. We’ll use the Dirac notation for this, suggesting the applicability for linear operators in general. Thus, let n X K = |φiihψi| (146) i=1 n X |fi = |gi + λ |φiihψi|fi, (147) i=1 and let

gi ≡ hψi|gi (148)

fi ≡ hψi|fi (149)

cij ≡ hψj|φii. (150) Then, n X fj = gj + λ cjifi, (151) i=1 or   g1    g2  g =  .  = (I − λC) f, (152)  .   .  gn where C is the matrix formed of the cij constants. Thus, we have a system of n linear equations for the n unknowns {fi}. Either the matrix I − λC is non-singular, in which case a unique solution f exists for any given g (in particular f = 0 if g = 0), or I − λC is singular, in which case the homogeneous equation f = λCf possesses a finite number of linearly independent solutions. Up to some further considerations concerning continuity, this proves the theorem for the case of a degenerate kernel. We may extend the proof to arbitrary kernels by appealing to the fact that any continuous funciton k(x, y) may be uniformly approximated by de- generate kernels in a closed interval (for example, see Courant and Hilbert). There is an additional useful theorem under Fredholm’s name:

20 Theorem: If the integral equation:

Z b f(x) = g(x) + λ k(x, y)f(y)dy (153) a for given λ possesses a unique continuous solution for each continuous g(x), then the transposed equation:

Z b t(x) = g(x) + λ k(y, x)t(y)dy (154) a also possesses a unique solution for each g. In the other case, if the homogeneous equation possesses n linearly independent solutions {f1, f2, . . . , fn}, then the transposed homogeneous equation

Z b t(x) = λ k(y, x)t(y)dy (155) a

also has n linearly independent solutions {t1, t2, . . . , tn}. In this case, the original inhomogeneous equation 153 has a solution if and only if g(x) satisfies conditions:

Z b ∗ hg|tii = g (x)ti(x)dx = 0, i = 1, 2, . . . , n. (156) a That is, g must be orthogonal to all of the eigenvectors of the transposed homogeneous equation. Furthermore, in this case, the solution is only determined up to addition of an arbitrary linear combination of the form: c1f1 + c2f2 + ... + cnfn. (157)

Again, a promising approach to proving this is to first consider the case of degenerate kernels, and then generalize to arbitrary kernels.

5 Practical Approaches

We turn now to a discussion of some practical “tools of the trade” for solving integral equations.

5.1 Degenerate Kernels If the kernel is degenerate, we have shown that the solution may be obtained by transforming the problem to that of solving a system of linear equations.

21 5.2 Volterra’s Equations Integral equations of the form: Z x g(x) = λ k(x, y)f(y)dy (158) a Z x f(x) = g(x) + λ k(x, y)f(y)dy (159) a are called Volterra’s equations of the first and second kind, respectively. One situation where such equations arise is when k(x, y) = 0 for y > x: k(x, y) = θ(x − y)`(x, y). Thus,

Z b Z x k(x, y)f(y)dy = `(x, y)f(y)dy. (160) a a Consider Volterra’s equation of the first kind. Recall the fundamental theorem: d Z b(x) db da Z b ∂f f(y, x)dy = f(b, x) − f(a, x) + (y, x)dy. (161) dx a(x) dx dx a ∂x We may use this to transform the equation of the first kind to: dg Z x ∂k (x) = λk(x, x)f(x) + λ (x, y)f(y)dy. (162) dx a ∂x This is now a Volterra’s equation of the second kind, and the approach to solution may thus be similar. Notice that if the kernel is independent of x, k(x, y) = k(y), then the solution to the equation of the first kind is simply: 1 dg f(x) = (x). (163) λk(x) dx Let us try a simple example. Suppose we wish to find f(x) in: Z x x2 = 1 + λ xyf(y)dy. (164) 1 R x This may be solved with various approaches. Let φ(x) ≡ 1 yf(y)dy. Then x2 − 1 φ(x) = . (165) λx Now take the derivative of both sides of the original equation: Z x 2x = λx2f(x) + λ yf(y)dy = λx2f(x) + λφ(x). (166) 1 A bit of further algebra yields the answer: 1  1  f(x) = x + . (167) λx2 x

22 As always, especially when you have taken derivatives, it should be checked that the result actually solves the original equation! This was pretty easy, but it is even easier if we notice that this problem is actually equivalent to one with an x-independent kernel. That is, we may rewrite the equation as:

x2 − 1 Z x = λ yf(y)dy. (168) x 1 Then we may use Eq. 163 to obtain the solution.

5.2.1 Numerical Solution of Volterra’s equation The Volterra’s equation readily lends itself to a numerical approach to solu- tion on a grid (or “mesh” or “lattice”). We note first that (absorbing the factor λ into the definition of k for convenience):

Z x=a f(a) = g(a) + k(a, y)f(y)dy a = g(a). (169)

This suggests building up a solution at arbitrary x by stepping along a grid starting at x = a. To carry out this program, we start by dividing the interval (a, x) into N steps, and define: x − a x = a + n∆, n = 0, 1,...,N, ∆ ≡ . (170) n N We have here defined a uniform grid, but that is not a requirement. Now let

gn = g(xn) (171)

fn = f(xn) (172)

knm = k(xn, xm). (173)

Note that f0 = g0. We may pick various approaches to the , for exam- ple, the gives:

n−1 ! Z xn 1 X 1 k(xn, y)f(y) dy ≈ ∆ kn0f0 + knmfm + knnfn . (174) a 2 m=1 2

Substituting this into the Volterra equation yields, at x = xn:

n−1 ! 1 X 1 fn = gn + ∆ kn0f0 + knmfm + knnfn , n = 1, 2,...N. (175) 2 m=1 2

23 Solving for fn then gives:

 1 Pn−1  gn + ∆ 2 kn0f0 + m=1 knmfm fn = ∆ , n = 1, 2,...N. (176) 1 − 2 knn For example,

f0 = g0, (177) ∆ g1 + 2 k10f0 f1 = ∆ , (178) 1 − 2 k11 ∆ g2 + 2 k20f0 + ∆k21f1 f2 = ∆ , (179) 1 − 2 k22 and so forth. We note that we don’t even have to explicitly solve a system of linear equations, as we did for Fredholm’s equation with a degenerate kernel. There are of order ! N   O X n = O N 2 (180) n=1 operations in this algorithm. The accuracy may be estimated by looking at the change as additional grid points are added.

5.3 Neumann Series Solution Often an exact closed solution is elusive, and we resort to approximate meth- ods. For example, one common approach is the iterative solution. We start with the integral equation:

Z b f(x) = g(x) + λ k(x, y)f(y)dy. (181) a The iterative approach begins with setting

f1(x) = g(x). (182)

Substituting this into the integrand in the original equation gives:

Z b f2(x) = g(x) + λ k(x, y)g(y)dy. (183) a Substituting this yields:

Z b " Z b # 0 0 0 f3(x) = g(x) + λ k(x, y) g(y) + λ k(y, y )g(y )dy dy. (184) a a

24 This may be continuted indefinitely, with the nth iterative solution given in terms of the (n − 1)th:

Z b fn(x) = g(x) + λ k(x, y)fn−1(y)dy (185) a Z b = g(x) + λ k(x, y)g(y)dy (186) a Z b Z b +λ2 k(x, y)k(y, y0)g(y0)dydy0 a a + ... Z b Z b +λn−1 ··· k(x, y) ··· k(y(n−2)0, y(n−1)0)g(y(n−1)0)dy . . . dy(n−1)0. a a If the method converges, then

f(x) = lim fn(x). (187) n→∞ This method is only useful if the series converges, and the faster the bet- ter. It will converge if the kernel is bounded and lambda is “small enough”. We won’t pursue this further here, except to note what happens if

Z b k(x, y)g(y)dy = 0. (188) a In this case, the series clearly converges, onto solution f(x) = g(x). However, this solution is not necessarily unique, as we may add any linear combination of solutions to the homogeneous equation.

5.4 Fredholm Series Better convergence properties are obtained with the Fredholm series. As before, we wish to solve

Z b f(x) = g(x) + λ k(x, y)f(y)dy. (189) a Let Z b 2 ZZ 0 λ k(x, x) k(x, x ) 0 D(λ) = 1 − λ k(x, x)dx + 0 0 0 dxdx a 2! k(x , x) k(x , x )

k(x, x) k(x, x0) k(x, x00) λ3 ZZZ 0 0 0 0 00 0 00 − k(x , x) k(x , x ) k(x , x ) dxdx dx 3! 00 00 0 00 00 k(x , x) k(x , x ) k(x , x ) + ..., (190) and let Z 2 k(x, y) k(x, z) D(x, y; λ) = λk(x, y) − λ dz k(z, y) k(z, z)

25

k(x, y) k(x, z) k(x, z0) λ3 ZZ 0 0 + k(z, y) k(z, z) k(z, z ) dzdz 2! 0 0 0 0 k(z , y) k(z , z) k(z , z ) + ..., (191)

Note that not everyone uses the same convention for this notation. For example, Mathews and Walker defines D(x, y; λ) to be 1/λ times the quantity defined here. We have the following: Theorem: If D(λ) 6= 0 and if the Fredholm’s equation has a solution, then the solution is, uniquely: Z b D(x, y; λ) f(x) = g(x) + g(y)dy. (192) a D(λ)

R b The homogeneous equation f(x) = λ a k(x, y)f(y)dy has no continu- ous non-trivial solutions unless D(λ) = 0. A proof of this theorem may be found in Whittaker and Watson. The proof may be approached as follows: Divide the range a < x < b into equal intervals and replace the original integral by a sum:

n X f(x) = g(x) + λ k(x, xi)f(xi)δ, (193) i=1 where δ is the width of an interval, and xi is a value of x within interval i. This provides a system of linear equations for f(xi), which we may solve and take the limit as n → ∞, δ → 0. In this limit, D(λ) is the limit of the determinant matrix of coefficients expanded in powers of λ. While the Fredholm series is cumbersome, it has the advantage over the Neumann series that the series D(λ) and D(x, y; λ) are guaranteed to con- verge. There is a nice graphical representation of the Fredholm series; I’ll de- scribe a variant here. We let a line segment or smooth arc represent the kernel k(x, y), one end of the segment corresponds to x and the other end to variable y. If the segment closes on itself smoothly (e.g., we have a circle), then the variables at the two “ends” are the same – we have k(x, x). The product of two kernels is represented by making two segments meet at a point. The meeting ends correspond to the same variable, the first variable in one kernel and the second variable in the other kernel. One may think of the lines as directed, such that the second variable, say, is at the “starting” end, and the first variable is at the “finishing” end. When two seg- ments meet, it is always that the “finish” of one is connected to the “start” of the other. We could draw arrows to keep track of this, but it actually isn’t needed in this application, since in practice we’ll always integrate over

26 the repeated variables. A heavy dot on a segment breaks the segment into two meeting segments, according to the above rule, and furthermore means integration over the repeated variable with a factor of λ. For illustration of these rules:

k(x, y)

k(x, x)

k(x, y)k(y, z) • λ R k(x, y)k(y, z)dy • λ R k(x, x)dx

Thus,

D(x, y; λ)   = − − λ 1   + − + • − + − 2! ••• • • − ..., (194) and

1   D(λ) = 1 − + − 2! 1   − − + ••• • − + − 3! • + ... (195)

Let us try a very simple example to see how things work. Suppose we wish to solve: Z 1 f(x) = x + λ xyf(y)dy. (196) 0 Of course, this may be readily solved by elementary means, but let us apply our new techniques. We have:

= k(x, y) = xy (197)

Z 1 λ = λ k(x, x)dx = (198) 0 3

27 Z 1 λ = λ k(x, y)k(y, z)dy = xz = • (199) • 0 3 !2 Z 1 Z 1 λ  2 = λ2 k(x, y)k(y, x)dxdy = = . (200) ••• • 0 0 3 We thus notice that

Z 1 Z 1 2 n n (n) 2 h (n)i dots • = λ ··· dx . . . dx x ... x •• 0 0  n = . (201)

We may likewise show that

n  n dots• = . (202) • We find from this that all determinants of dimension ≥ 2 vanish. We have

λ D(λ) = 1 − = 1 − (203) 3 1 D(x, y; λ) = = xy. (204) λ The solution in terms of the Fredholm series is then: Z 1 D(x, y; λ) f(x) = g(x) + g(y)dy 0 D(λ) 3λ Z 1 = x + xy2dy 3 − λ 0 3 = x. (205) 3 − λ Generalizing from this example, we remark that if the kernel is degenerate,

n X k(x, y) = φi(x)ψi(y), (206) i=1

28 then D(λ) and D(x, y; λ) are polynomials of degree n in λ. The reader is invited to attempt a graphical “proof” of this. This provides another algorithm for solving the degenerate kernel problem. Now suppose that we attempt to solve our example with a Neumann series. We have Z ZZ f(x) = g(x) + λ k(x, y)g(y)dy + λ2 k(x, y)k(y, y0)g(y0)dydy0 + ... Z 1 Z 1 Z 1 = x + λx y2dy + λ2x y2(y0)2dydy0 + ... 0 0 0 ∞ λ!n = x X . (207) n=0 3 This series converges for |λ| < 3 to 3 f(x) = x. (208) 3 − λ This is the same result as the Fredholm solution above. However, the Neu- mann solution is only valid for |λ| < 3, while the Fredholm solution is valid for all λ 6= 3. At eigenvalue λ = 3, D(λ = 3) = 0. At λ = 3, we expect a non-trivial solution to the homogeneous equation Z 1 f(x) = 3 xyf(y)dy. (209) 0 Indeed, f(x) = Ax solves this equation. The roots of D(λ) are the eigenvalues of the kernel. If the kernel is degenerate we only have a finite number of eigenvalues.

6 Symmetric Kernels

Definition: If k(x, y) = k(y, x) then the kernel is called symmetric. If k(x, y) = k∗(y, x) then the kernel is called Hermitian. Note that a real, Hermitian kernel is symmetric. For simplicity, we’ll restrict ourselves to real symmetric kernels here,1 but the generalization to Hermitian kernels is readily accomplished (indeed is already done when we use Dirac’s notation). The study of such kernels via eigenfunctions is referred to as “Schmidt-Hilbert theory”. We will assume that our kernels are bounded in the sense: Z b [k(x, y)]2dy ≤ M, (210) a Z b "∂k #2 (x, y) dy ≤ M 0, (211) a ∂x

1Note that, since we are assuming real functions in this section, we do not put a complex conjugate in our scalar products. But don’t forget to put in the complex conjugate if you have a problem with complex functions!

29 where M and M 0 are finite. Our approach to studying the symmetric kernel problem will be to analyze it in terms of the solutions to the homogeneous equation. We have the following: Theorem: Every continuous symmetric kernel (not identically zero) pos- sesses eigenvalues. Their number is countably infinite if and only if the kernel is not degenerate. All eigenvalues of a real symmetric kernel are real. Proof: First, recall the Schwarz inequality, in Dirac notation:

|hf|gi|2 ≤ hf|fihg|gi. (212)

Consider the “quadratic integral form”:

Z b Z b J(φ, φ) = hφ|K|φi ≡ k(x, y)φ(x)φ(y)dxdy, (213) a a where φ is any (piecewise) continuous function in [a, b]. We’ll assume |a|, |b| < ∞ for simplicity here; the reader may consider what additional criteria must be satisifed if the interval is infinite. Our quadratic integral form is analogous with the quadratic form for systems of linear equations:

n ! X  A(x, x) = aijxixj = ( x ) A x , (214) i,j=1 and this analogy persists in much of the discussion, lending an intuitive perspective. Notice that if we write: ZZ J(φ, φ) = hu|vi = u(x, y)v(x, y)dxdy, (215) where

u(x, y) ≡ k(x, y) (216) v(x, y) ≡ φ(x)φ(y), (217) we have defined a scalar product between the vectors u and v. We are thus led to consider its square, Z Z Z Z [J(φ, φ)]2 = dx dy dx0 dy0k(x, y)φ(x)φ(y)k(x0, y0)φ(x0)φ(y0), (218) to which we apply the Schwarz inequality:

[J(φ, φ)]2 = |hu|vi|2 ≤ hu|uihv|vi ZZ ZZ ≤ [φ(x)φ(y)]2 dxdy [k(x, y)]2 dxdy. (219)

30 Thus, if we require φ to be a normalized function, Z [φ(x)]2 dx = 1, (220) we see that |J(φ, φ)| is bounded, since the integral of the squared kernel is bounded. Furthermore, we can have J(φ, φ) = 0 for all φ if and only if k(x, y) = 0. The “if” part is obviously true; let us deal with the “only if” part. This statement depends on the symmetry of the kernel. Consider the “bilinear integral form”: ZZ J(φ, ψ) = J(ψ, φ) ≡ k(x, y)φ(x)ψ(y)dxdy. (221)

We have J(φ + ψ, φ + ψ) = J(φ, φ) + J(ψ, ψ) + 2J(φ, ψ), (222) for all φ, ψ piecewise continuous on [a, b]. We see that J(φ, φ) = 0 for all φ only if it is also true that J(φ, ψ) = 0, ∀φ, ψ. In particular, let us take Z ψ(y) = k(x, y)φ(x)dx. (223)

Then Z Z Z 0 = J(φ, ψ) = dx dyk(x, y)φ(x) dx0k(x0, y)φ(x0) Z Z 2 = k(x, y)φ(x)dx dy. (224)

Thus, R k(x, y)φ(x)dx = 0 ∀φ. In particular, take for any given value of y, φ(x) = k(x, y). Then Z [k(x, y)]2 dx = 0, (225) and we find k(x, y) = 0. We now assume that J(φ, φ) 6= 0. Let us assume for convenience that J(φ, φ) can take on positive values. If not, we could repeat the following arguments for the case J(φ, φ) ≤ 0 ∀φ. We are interested in finding the normalized φ for which J(φ, φ) attains its greatest possible value. Since J(φ, φ) is bounded, there exists a least upper bound:

J(φ, φ) ≤ Λ1 = 1/λ1, ∀φ such that hφ|φi = 1. (226) We wish to show that this bound is actually achieved for a suitable φ(x). Let us suppose that the kernel is uniformly approximated by a series of degenerate symmetric kernels:

an X (n) An(x, y) = cij ωi(x)ωj(y), (227) i,j=1

31 (n) (n) where cij = cji and hωi|ωji = δij, and such that the approximating kernels are uniformly bounded in the senses:

Z b 2 [An(x, y)] dy ≤ MA, (228) a Z b " #2 ∂An 0 (x, y) dy ≤ MA, (229) a ∂x

0 where MA and MA are finite and independent of n. We consider the quadratic integral form for the approximating kernels: ZZ Jn(φ, φ) ≡ Anφ(x)φ(y)dxdy

an X (n) Z Z = cij ωi(x)φ(x)dx ωj(y)φ(y)dy i,j=1 an X (n) = cij uiuj, (230) i,j=1 R where ui ≡ ωi(x)φ(x)dx. This is a quadratic form in the numbers u1, u2, . . . , uan . Now, " a #2 Z Xn φ(x) − uiωi(x) dx ≥ 0, (231) i=1 implies that (Bessel inequality):

an X 2 hφ|φi = 1 ≥ ui . (232) i=1 The maximum of J(φ, φ) is attained when

an X 2 ui = 1. (233) i=1 More intuitively, note that

a Xn φ(x) = uiωi(x), (234) i=1 unless there is a component of φ orthogonal to all of the ωi. By removing that component we can make Jn(φ, φ) larger. We wish to find a function φn(x) such that the maximum is attained. We know that it must be of the form φn(x) = u1ω1(x) + u2ω2(x) + ··· uan ωan (x), P 2 where ui = 1, since then hφn|φni = 1. The problem of finding max [Jn(φ, φ)] is thus one of finding the maximum of the quadratic form subject to the P 2 constraint ui = 1. We know that such a maximum exists, because a con- tinuous function of several variables, restricted to a finite domain, assumes a

32 maximum value in the domain. Suppose that {u} is the appropriate vector. Then an X (n) cij uiuj = Λ1n (235) i,j=1 is the maximum value that Jn(φ, φ) attains. But the problem of finding the maximum of the quadratic form is just the problem of finding its maximum eigenvalue and corresponding eigenvector. That is, an X (n) cij uj = Λ1nui, i = 1, 2, . . . , an. (236) j=1 This is also called the “principal axis problem”. Take

φn(x) = u1ω1(x) + u2ω2(x) + ··· + uan ωan (x), (237) where {u} is now our (normalized) vector for which the quadratic form is maximal. The normalization hφn|φni still holds. Apply the approximate kernel operator to this function:

an Z X (n) Z An(x, y)φn(y)dy = cij ωi(x) ωj(y)φn(y)dy i,j=1 an an X X (n) = ωi(x) cij uj i=1 j=1 a Xn = Λ1n uiωi(x) i=1 = Λ1nφn(x). (238)

Therefore φn(x) is an eigenfunction of An(x, y) belonging to eigenvalue λ1n = 1/Λ1n. Finally, it is left to argue that, as we let An converge on k, φn(x) converges on eigenfunction φ(x), with eigenvalue λ1. We’ll let n → ∞. Since An(x, y) is uniformly convergent on k(x, y), we have that, given any  > 0, there exists an N such that whenever n ≥ N:

|k(x, y) − An(x, y)| < , ∀x, y ∈ [a, b]. (239) Thus,

 2 2 ZZ [J(φ, φ) − Jn(φ, φ)] = [k(x, y) − An(x, y)] φ(x)φ(y)dxdy ZZ 2 2 ≤ |hφ|φi| [k(x, y) − An(x, y)] dxdy (Schwarz), Z b Z b ≤ 2 dxdy a a ≤ 2(b − a)2. (240)

33 Thus, the range of Jn may be made arbitrarily close to the range of J by tak- ing n large enough, and hence, the maximum of Jn may be made arbitrarily close to that of J: lim Λ1n = λ1. (241) n→∞

Now, by the Schwarz inequality, the functions φn(x) are uniformly bounded for all n:  2 2 Z [φn(x)] = λ1n An(x, y)φn(y)dy Z 2 2 ≤ λ1nhφn|φni [An(x, y)] dy. (242)

As n → ∞, λ1n → λ1 and An(x, y) → k(x, y). Also, since An(x, y) is piecewise continuous, φn(x) is continuous, since it is an integral function. The φn(x) form what is known as an “equicontinuous set”: For every  > 0, there exists δ() > 0, independent of n, such that

|φn(x + η) − φn(x)| < , (243)

0 whenever |η| < δ. This may be seen as follows: First, we show that φn(x) is uniformly bounded:

" Z ∂A #2 [φ0 (x)]2 = λ n (x, y)φ (y)dy n 1n ∂x n Z "∂A #2 ≤ λ2 n (x, y) dy (Schwarz) 1n ∂x 2 0 ≤ λ1nMA. (244)

0 2 00 00 0 2 Or, [φn(x)] ≤ MA, where MA = MA max λ1n. With this, we find:

Z x+η 2 2 0 |φn(x + η) − φn(x)| = φn(y)dy x 2 Z b 0 = [θ(y − x) − θ(y − x − η)] φn(y)dy a Z b Z b 2 0 2 ≤ [θ(y − x) − θ(y − x − η)] dy [φn(y)] dy a a 00 ≤ |η|(b − a)MA < , (245)

00 for δ ≤ /(b − a)MA. For such sets of functions there is a theorem analogous to the Bolzano- Weierstrass theorem on the existence of a limit point for a bounded infinite sequence of numbers:

34 Theorem: (Arzela) If f1(x), f2(x),... is a uniformly bounded equicontin- uous set of functions on a domain D, then it is possible to select a subsequence that converges uniformly to a continuous limit function in the domain D. The proof of this is similar to the proof of the Bolzano-Weierstrass theorem, which it relies on. We start by selecting a set of points x1, x2,... that is everywhere dense in [a, b]. For example, we could pick successive midpoints of intervals. By the Bolzano-Weierstrass theorem, this sequence of num- bers contains a convergent subsequence. Now select an infinite sequence of functions (out of {f}) a1(x), a2(x),... whose values at x1 form a convergent sequence, which we may also accomplish by the same reasoning. Similarly, select a convergent sequence of functions (out of {a}) b1(x), b2(x),... whose values at x2 form a convergent sequence, and so on. Now consider the “diagonal sequence”:

q1(x) = a1(x)

q2(x) = b2(x)

q3(x) = c3(x) ... (246) We wish to show that the sequence {q} converges on the entire interval [a, b]. Given  > 0, take M large enough so that there exist values xk with k ≤ M such that |x − xk| ≤ δ() for every point x of the interval, where δ() is the δ in our definition of equicontinuity. Now choose N = N() so that for m, n > N |qm(xk) − qn(xk)| < , k = 1, 2,...,M. (247) By equicontinuity, we have, for some k ≤ M:

|qm(x) − qm(xk)| < , (248)

|qn(x) − qn(xk)| < . (249) Thus, for m, n > N:

|qm(x) − qn(x)| = |qm(x) − qm(xk) + qm(xk) − qn(xk) + qn(xk) − qn(x)| < 3. (250) Thus, {q} is uniformly convergent for all x ∈ [a, b].

With this theorem, we can find a subsequence φn1 , φn2 ,... that converges uniformly to a continuous limit function ψ1(x) for a ≤ x ≤ b. There may be more than one limit function, but there cannot be an infinite number, as we know that the number of eigenfunctions for given λ is finite. Passing to the limit,

hφn|φni = 1 → hψ1|ψ1i = 1 (251)

Jn(φn, φn) = Λ1n → J(ψ1, ψ1) = Λ1 (252) Z Z φn(x) = λ1n An(x, y)φn(y)dy → ψ1(x) = λ1 k(x, y)ψ1(y)dy.(253)

35 Thus we have proven the existence of an eigenvalue (λ1). Note that λ1 6= ∞ since we assumed that J(φ, φ) could be positive: 1 max [J(φ, φ)] = Λ1 = > 0. (254) λ1 Note also that, just as in the principal axis problem, additional eigenvalues (if any exist) can be found by repeating the procedure, restricting to functions orthogonal to the first one. If k(x, y) is degenerate, there can only be a finite number of them, as the reader may demonstrate. This completes the proof of the theorem stated at the beginning of the section. We’ll conclude this section with some further properties of symmetric ker- nels. Suppose that we have found all of the positive and negative eigenvalues and ordered them by absolute value:

|λ1| ≤ |λ2| ≤ ... (255) Denote the corresponding eigenfunctions by

β1, β2,... (256) They form an orthonormal set (e.g., if two independent eigenfunctions cor- responding to the same eigenvalue are not orthogonal, we use the Gram- Schmidt procedure to obtain orthogonal functions). We now note that if there are only a finite number of eigenvalues, then the kernel k(x, y) must be degenerate: n β (x)β (y) k(x, y) = X i i . (257) i=1 λi We may demonstrate this as follows: Consider the kernel n β (x)β (y) k0(x, y) = k(x, y) − X i i , (258) i=1 λi and its integral form ZZ J 0(ψ, ψ) = k0(x, y)ψ(x)ψ(y)dxdy. (259)

The maximum (and minimum) of this form is zero, since the eigenvalues of k(x, y) equal eigenvalues of Pn βi(x)βi(y) . Hence k0(x, y) = 0. i=1 λi We also have the following “expansion theorem” for integral transforms with a symmetric kernel. Theorem: Every continuous function g(x) that is an integral transform with symmetric kernel k(x, y) of a piecewise continuous function f(y), Z g(x) = k(x, y)f(y)dy, (260)

36 where k(y, x) = k(x, y), can be expanded in a uniformly and absolutely convergent series in the eigenfunctions of k(x, y):

∞ X g(x) = giβi(x), (261) i=1

where gi = hβi|gi. We notice that for series of the form: ∞ β (x)β (y) k(x, y) = X i i (262) i=1 λi the theorem is plausible, since

∞ X βi(x) Z g(x) = βi(y)f(y)dy i=1 λi ∞ X = giβi(x), (263) i=1 where gi = hβi|fi/λi = hβi|gi, and we should properly justify the interchange of the summation and the integral. We’ll forego a proper proof of the theorem and consider its application. We wish to solve the inhomogeneous integral equation:

Z b g(x) = f(x) − λ k(x, y)f(y)dy. (264) a

Suppose that λ is not an eigenvalue, λ 6= λi, i = 1, 2,.... Write

Z b f(x) − g(x) = λ k(x, y)f(y)dy. (265) a Assuming f(y) is at least piecewise continuous (hence, f − g must be con- tinuous), the expansion theorem tells us that f(x) − g(x) may be expanded in the absolutely convergent series:

∞ X f(x) − g(x) = aiβi(x), (266) i=1 where

ai = hβi|f − gi ZZ = λ k(x, y)f(y)βi(x)dydx Z Z = λ f(y)dy k(y, x)βi(x)dx λ = hβi|fi. (267) λi

37 Using the first and final lines, we may eliminate hβi|fi:

λi hβi|fi = hβi|gi, (268) λi − λ and arrive at the result for the expansion coefficients: λ ai = hβi|gi. (269) λi − λ Thus, we have the solution to the integral equation:

∞ X hβi|gi f(x) = g(x) + λ βi(x) . (270) i=1 λi − λ

This solution fails only if λ = λi is an eigenvalue, except that it remains valid even in this case if g(x) is orthogonal to all eigenfunctions corresponding to λi, in which case any linear combination of such eigenfunctions may be added to solution f.

6.1 Resolvent Kernels and Formal Solutions In the context of the preceding discussion, we may define a “resolvent kernel” R(x, y; λ) by: Z b f(x) = g(x) + λ R(x, y; λ)g(y)dy. (271) a Then ∞ β (y)β (x) R(x, y; λ) = X i i . (272) i=1 λi − λ The Fredholm series: 1 D(x, y; λ) (273) λ D(λ) is an example of such a resolvent kernel. Now look at the problem formally: We wish to solve the (operator) equa- tion f = g + λKf. The solution in terms of the resolvent is

f = g + λRg = (1 + λR)g. (274)

But we could have also obtained the “formal” solution: 1 f = g. (275) 1 − λK If “|λK|”< 1 then we have the series solution:

f = g + λKg + λ2K2g + ..., (276)

38 which is just the Neumann series. What do these formal operator equations mean? Well, they only have meaning in the context of operating on the appropriate operands. For exam- ple, consider the meaning of |λK| < 1. This might mean that for all possible normalized functions φ we must have that kλKk < 1, where the k indicates an “operator norm”, given by:  ZZ 2 kλKk ≡ max λ k(x, y)φ(x)φ(y)dxdy < 1. (277) φ By the Schwarz inequality, we have that ZZ kλKk ≡ λ2 [k(x, y)]2 dxdy. (278)

The reader is invited to compare this notion with the condition for con- vergence of the Neumann series in Whittaker and Watson: |λ(b − a)| max |k(x, y)| < 1. (279) x,y

6.2 Example Consider the problem: Z 2π f(x) = sin2 x + λ k(x, y)f(y)dy, (280) 0 with symmetric kernel 1 1 − α2 k(x, y) = , |α| < 1. (281) 2π 1 − 2α cos(x − y) + α2 We look for a solution of the form ∞ X hβi|gi f(x) = g(x) + λ βi(x), (282) i=1 λi − λ where g(x) = sin2 x. In order to accomplish this, we need to determine the eigenfunctions of the kernel. √ With some inspection, we realize that the constant 1/ 2π is an (normal- ized) eigenfunction. This is because the integral: Z 2π dx I0 ≡ (283) 0 1 − 2α cos(x − y) + α2 is simply a constant, with no dependence on y. In order to find the cor- responding eigenvalue, we must evaluate I0. Since I0 is independent of y, evaluate it at y = 0: Z 2π dx I0 = . (284) 0 1 − 2α cos x + α2

39 We turn this into a contour integral on the unit circle, letting z = eix. 1 Then dx = dz/iz and 2 cos x = z + z . This leads to: I dz I = i . (285) 0 αz2 − (1 + α2)z + α

The roots of the quadratic in the denominator are at z = {α, 1/α}. Thus,

i I dz I = . (286) 0 α (z − α)(z − 1/α)

Only the root at α is inside the contour; we evaluate the residue at this pole, and hence determine that 2π I = . (287) 0 1 − α2 √ We conclude that eigenfunction 1/ 2π corresponds to eigenvalue 1. We wish to find the rest of the eigenfunctions. Note that if we had not taken y = 0 in evaluating I0, we would have written: ieiy I dz I = , (288) 0 α (z − eiyα)(z − eiy/α) and the relevant pole is at eiyα. We thence notice that we know a whole class of integrals:

1 − α2 ieiy I zndz = αneiny, n ≥ 0. (289) 2π α (z − eiyα)(z − eiy/α)

Since zn = einx, we have found an infinite set of eigenfunctions, and their egienvalues. But we should investigate the negative powers as well – we didn’t include them here so far because they yield an additional pole, at z = 0. We wish to evaluate: ieiy I dz I ≡ , n ≥ 0. (290) −n α zn(z − eiyα)(z − eiy/α)

iy 1 1 The residue at pole z = αe is 1−α2 αneiny . We need also the residue at z = 0. It is coefficient A−1 in the expansion:

iy ∞ e X j n iy iy = Ajz . (291) z (z − e α)(z − e /α) j=−∞

After some algebra, we find that

iy ∞   e α X j−n −i(j+1)y 1 j+1 n iy iy = 2 z e j+1 − α . (292) z (z − e α)(z − e /α) 1 − α j=0 α

40 The j = n − 1 term will give us the residue at z = 0: α   A = e−iny α−n − αn . (293) −1 1 − α2 Thus, 2π I = αne−iny. (294) −n 1 − α2

We summarize the result: The normalized eigenfunctions are βn(x) = inx √e , with eigenvalues λ = α−|n|, for n = 0, ±1, ±2,.... 2π n Finally, it remains to calculate:

∞ 2 2 X hβn| sin xi f(x) = sin x + λ βn(x) λn − λ √n=1 2π "2β (x) β (x) + β (x)# = sin2 x + λ 0 − −2 2 4 1 − λ α−2 − λ λ  1 1  = sin2 x + − cos 2x . (295) 2 1 − λ α−2 − λ Note that if λ = 1 or λ = α−2 then there is no solution. On the other hand, if λ = α−|n| is one of the other eigenvalues (n 6= 0, ±2), then the above is still a solution, but it is not unique, since we can add any linear combination of βn(x) and β−n(x) and still have a solution.

7 Exercises

1. Given an abstract complex vector space (linear space), upon which we have defined a scalar product (inner product):

ha|bi (296)

between any two vectors a and b, prove the Schwarz inequality:

|ha|bi|2 ≤ ha|aihb|bi. (297)

Give the condition for equality to hold. One way to approach the proof is to consider the fact that the projection of a onto the subspace which is orthogonal to b cannot have a negative length, where we define the length (norm) of a vector according to: q kck ≡ hc|ci. (298)

Further, prove the triangle inequality:

    \|a + b\| \le \|a\| + \|b\|.    (299)

2. Considering our RC circuit example, derive the results in Eqn. 31 through Eqn. 35 using the Fourier transform.

3. Prove the convolution theorem.

4. We showed that the Fourier transform of a Gaussian was also a Gaussian shape. That is, let us denote a Gaussian of mean μ and standard deviation σ by:

    N(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right].    (300)

(a) In class we found (in an equivalent form) that the Fourier transform of a Gaussian of mean zero was:

    \hat{N}(y; 0, \sigma) = \frac{1}{\sqrt{2\pi}} \exp\left[ -\frac{y^2 \sigma^2}{2} \right].    (301)

Generalize this result to find the Fourier transform of N(x; μ, σ).

(b) The experimental resolution function of many measurements is approximately Gaussian in shape (in probability & statistics we'll prove the "Central Limit Theorem"). Often, there is more than one source of uncertainty contributing to the final result. For example, we might measure a distance in two independent pieces, with means μ_1, μ_2 and standard deviations σ_1, σ_2. The resolution function (sampling distribution) of the final result is then the convolution of the two pieces:

    P(x; \mu_1, \sigma_1, \mu_2, \sigma_2) = \int_{-\infty}^{\infty} N(y; \mu_1, \sigma_1)\, N(x - y; \mu_2, \sigma_2)\, dy.    (302)

Do this integral to find P(x; μ_1, σ_1, μ_2, σ_2). Note that it is possible to do so by straightforward means, though it is a bit tedious. You are asked here to instead use Fourier transforms to (I hope!) obtain the result much more easily.

5. The "Gaussian integral" is:

    \frac{1}{\sqrt{2\pi}\sigma} \int_{-\infty}^{\infty} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] dx = 1.    (303)

Typically, the constants μ and σ² are real. However, we have encountered situations where they are complex. Determine the domain in (μ, σ²) for which this integral is valid. Try to do a careful and convincing demonstration of your answer.

6. In class we considered the three-dimensional Fourier transform of e^{-\mu r}/r, where r = \sqrt{x^2 + y^2 + z^2}. What would the Fourier transform be in two dimensions (i.e., in a two-dimensional space with r = \sqrt{x^2 + y^2})?

7. The lowest P-wave hydrogen wave function in position space may be written:

    \psi(\mathbf{x}) = \frac{1}{\sqrt{32\pi a_0^5}}\, r \cos\theta\, \exp\left( -\frac{r}{2a_0} \right),    (304)

where r = \sqrt{x^2 + y^2 + z^2}, θ is the polar angle with respect to the z axis, and a_0 is a constant. Find the momentum-space wave function for this state (i.e., find the Fourier transform of this function). In this and all problems in this course, I urge you to avoid look-up tables (e.g., of integrals). If you do feel the need to resort to tables, however, be sure to state your source.

8. In section 2.2.1, we applied the Laplace transform method to determine the response of the RC circuit:

[Figure: the RC circuit of section 2.2.1, with input V(t), resistors R_1 and R_2, capacitor C, and output V_C(t).]

to an input voltage V(t) which was a delta function. Now determine V_C(t) for a pulse input. Model the pulse as the difference between two exponentials:

    V(t) = A \left( e^{-t/\tau_1} - e^{-t/\tau_2} \right).    (305)

9. In considering the homogeneous integral equation, we stated the theorem that there are a finite number of eigenfunctions for any given eigenvalue. We proved this for real functions; now generalize the proof to complex functions.

10. Give a graphical proof that the series D(λ) and D(x, y; λ) in the Fredholm solution are polynomials of degree n if the kernel is of the degenerate form:

    k(x, y) = \sum_{i=1}^{n} \phi_i(x) \psi_i(y).    (306)

11. Solve the following equation for u(t):

    \frac{d^2 u}{dt^2}(t) + \int_0^1 \sin[k(s - t)]\, u(s)\, ds = a(t),    (307)

with boundary conditions u(0) = u'(0) = 0, where a(t) is a given function.

12. Prove that an n-term degenerate kernel possesses at most n distinct eigenvalues.

13. Solve the integral equation:

    f(x) = e^x + \int_1^x \frac{1 + y}{x}\, f(y)\, dy.    (308)

Hint: If you need help solving a differential equation, have a look at Mathews and Walker chapter 1.

14. In section 5.2.1 we developed an algorithm for the numerical solution of Volterra's equation. Apply this method to the equation:

    f(x) = x + \int_0^x e^{-xy} f(y)\, dy.    (309)

In particular, estimate f(1), using one, two, and three intervals (i.e., N = 1, N = 2, and N = 3). [We're only doing some low values so you don't have to develop a lot of technology to do the computation, but going to high enough N to get a glimpse at the convergence.] (A sketch of this general type of scheme, applied to a different equation, appears after this problem list.)

15. Another method we discussed in section 3 is an extension of the Laplace transform: Laplace's method for solving differential equations. I'll summarize here: We are given a differential equation of the form:

    \sum_{k=0}^{n} (a_k + b_k x) f^{(k)}(x) = 0.    (310)

We assume a solution of the form:

    f(x) = \int_C F(s) e^{sx}\, ds,    (311)

where C is chosen depending on the problem. Letting

    U(s) = \sum_{k=0}^{n} a_k s^k,    (312)

    V(s) = \sum_{k=0}^{n} b_k s^k,    (313)

the formal solution for F(s) is:

    F(s) = \frac{A}{V(s)} \exp \int^{s} \frac{U(s')}{V(s')}\, ds',    (314)

where A is an arbitrary constant. A differential equation that arises in the study of the hydrogen atom is the Laguerre equation:

    x f''(x) + (1 - x) f'(x) + \lambda f(x) = 0.    (315)

Let us attack the solution to this equation using Laplace’s method.

(a) Find F(s) for this differential equation.

(b) Suppose that λ = n = 0, 1, 2, .... Pick an appropriate contour, and determine f_n(x).

16. Write the diagram, with coefficients, for the fifth-order numerator and denominator of the Fredholm expansion.

17. Solve the equation:

    f(x) = \sin x + \lambda \int_0^{\pi} \cos x \sin y\, f(y)\, dy    (316)

for f(x). Find any eigenvalues and the corresponding eigenfunctions. Hint: This problem is trivial!

18. Find the eigenvalues and eigenfunctions of the kernel:

    k(x, y) = \frac{1}{2} \log\left( \sin\frac{x + y}{2} \Big/ \sin\frac{x - y}{2} \right) = \sum_{n=1}^{\infty} \frac{\sin nx \sin ny}{n},   0 \le x, y \le \pi.    (317)

19. In the notes we considered the kernel:

    k(x, y) = \frac{1}{2\pi} \frac{1 - \alpha^2}{1 - 2\alpha \cos(x - y) + \alpha^2},    (318)

where |α| < 1 and 0 ≤ x, y ≤ 2π. Solve the integral equation

    f(x) = e^x + \lambda \int_0^{2\pi} k(x, y) f(y)\, dy    (319)

with this kernel. What happens if λ is an eigenvalue? If your solution is in the form of a series, does it converge?

20. Solve for f(x):

    f(x) = x + \int_0^x (y - x) f(y)\, dy.    (320)

This problem can be done in various ways. If you happen to obtain a series solution, be sure to sum the series.

21. We wish to solve the following integral equation for f(x):

    f(x) = g(x) - \lambda \int_0^x f(y)\, dy,    (321)

where g(x) is a known, real continuous function with continuous first derivative, and satisfies g(0) = 0.

(a) Show that this problem may be re-expressed as a differential equation with suitable boundary condition, which may be written in operator form as Lf = g'. Give L explicitly and show that it is a linear operator.

(b) Suppose that G(x, y) is the solution of LG = δ(x − y), where δ(x) is the Dirac δ function. Express the solution to the original problem in the form of an integral transform involving G and g'.

(c) Find G(x, y) and write down the solution for f(x).

22. Some more Volterra's equations: Solve for f(x) in the following two cases –

(a) f(x) = \sin x + \cos x + \int_0^x \sin(x - y) f(y)\, dy,

(b) f(x) = e^{-x} + 2x + \int_0^x e^{y - x} f(y)\, dy.

23. Consider the LCR circuit in Fig. 4:

[Figure 4: An LCR circuit, with input V(t), elements L, R, and C, and output V_0(t).]

Use the Laplace transform to determine V_0(t) given

    V(t) = \begin{cases} V, & 0 < t < T \\ 0, & \text{otherwise.} \end{cases}    (322)

Make a sketch of V_0(t) for (a) 2RC > \sqrt{LC}; (b) 2RC < \sqrt{LC}; (c) 2RC = \sqrt{LC}.

24. The radioactive decay of a nucleus is a random process in which the probability of a decay in time interval (t, t + dt) is independent of t, if the decay has not already occurred. This leads to the familiar exponential decay law (as you may wish to convince yourself): If at time t there are N(t) nuclei, then the rate of decays is proportional to N(t):

    \frac{dN}{dt}(t) = -\lambda N(t).

Integrating, we find N(t):

    N(t) = N(0) e^{-\lambda t}.

In practice, radioactive decays often occur in long chains. For (a simplified) example, ^{238}U decays via α-emission to ^{234}Th with a half-life of 4.5 × 10^9 y; ^{234}Th decays in a subchain with two β emissions to ^{234}U with a half-life of 24 d; ^{234}U decays via α-emission to ^{230}Th with a 2.4 × 10^5 y half-life; etc. We may use the method of Laplace transforms to determine how the abundance of any species of nucleus in such a chain evolves with time. Thus, suppose that we have a decay chain A → B → C → D, where D is stable, and the decay rates for A, B, and C are λ_A, λ_B, and λ_C, respectively. Suppose N_B(0) = 0. Determine N_C(t) as a function of the rates and the initial abundances N_A(0) and N_C(0). You are supposed to approach this problem by setting up a system of differential equations for the abundances, and then using Laplace transforms to solve the equations. Note, in setting up your differential equations, that the rate of change in abundance for an intermediate nucleus in the chain gets a contribution from the nucleus decaying into it as well as from its own decay rate. (A numerical cross-check sketch appears after this problem list.)

25. Solve the following integral equations for f(x):

(a)

    f(x) = e^x + \lambda \int_0^2 x y\, f(y)\, dy.    (323)

(b)

    f(x) = \lambda \int_0^{\pi} \sin(x - y) f(y)\, dy.    (324)

For both parts, what happens for different values of λ?

26. Consider the following simple integral equation:

    f(x) = x^2 + \lambda \int_0^1 x y\, f(y)\, dy.    (325)

(a) Find the Neumann series solution, to order λ². For what values of λ do you expect the Neumann series to be convergent? If you aren't sure from what you have done so far, try doing the rest of the problem and come back to this.

(b) Find the Fredholm series solution, to order λ².

(c) This is a degenerate kernel, so find the solution according to our method for degenerate kernels.
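Aside, referenced in problem 14 (and not part of the problem set): the following Python sketch shows the general kind of stepping scheme involved, though it is not necessarily identical to the algorithm of section 5.2.1. To avoid giving away the exercise, it is applied to a different equation, f(x) = 1 + ∫_0^x f(y) dy, whose exact solution f(x) = e^x makes the convergence visible; the trapezoid discretization is an assumed choice for illustration.

    import numpy as np

    def volterra2(g, k, x_max, N):
        # Solve f(x) = g(x) + integral_0^x k(x, y) f(y) dy on [0, x_max]
        # by stepping in x with the trapezoid rule on N intervals.
        h = x_max / N
        x = np.linspace(0.0, x_max, N + 1)
        f = np.empty(N + 1)
        f[0] = g(x[0])                       # at x = 0 the integral vanishes
        for i in range(1, N + 1):
            # trapezoid weights for nodes 0..i are h/2, h, ..., h, h/2
            s = 0.5 * k(x[i], x[0]) * f[0]
            s += sum(k(x[i], x[j]) * f[j] for j in range(1, i))
            # the unknown f[i] enters with weight h/2; solve the linear relation
            f[i] = (g(x[i]) + h * s) / (1.0 - 0.5 * h * k(x[i], x[i]))
        return x, f

    # check on f(x) = 1 + integral_0^x f(y) dy, whose solution is exp(x)
    for N in (4, 8, 16):
        x, f = volterra2(lambda t: 1.0, lambda s, t: 1.0, 1.0, N)
        print(N, f[-1], np.exp(1.0))         # f(1) approaches e as N grows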

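A second aside, referenced in problem 24: a direct numerical integration of the rate equations gives a useful cross-check on whatever closed form for N_C(t) you derive with Laplace transforms. The rates and initial abundances below are arbitrary assumptions, and scipy is used only as a checker:

    import numpy as np
    from scipy.integrate import solve_ivp

    lamA, lamB, lamC = 2.0, 1.0, 0.5         # arbitrary decay rates for A, B, C
    N0 = [100.0, 0.0, 10.0, 0.0]             # N_A(0); N_B(0) = 0 as stated; N_C(0); N_D(0)

    def rates(t, N):
        NA, NB, NC, ND = N
        return [-lamA * NA,                  # A only decays
                lamA * NA - lamB * NB,       # B is fed by A, depleted by its own decay
                lamB * NB - lamC * NC,       # C is fed by B, depleted by its own decay
                lamC * NC]                   # D is stable

    sol = solve_ivp(rates, (0.0, 10.0), N0, rtol=1e-8)
    print("N_C(10) =", sol.y[2, -1])         # compare with your analytic N_C(t)
    print("total   =", sol.y[:, -1].sum())   # conserved: should remain 110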