Sturm–Liouville Theory
2 Sturm–Liouville Theory

So far, we've examined the Fourier decomposition of functions defined on some interval (often scaled to be from $-\pi$ to $\pi$). We viewed this expansion as an infinite dimensional analogue of expanding a finite dimensional vector into its components in an orthonormal basis. But this is just the tip of the iceberg. Recalling other games we play in linear algebra, you might well be wondering whether we couldn't have found some other basis in which to expand our functions. You might also wonder whether there shouldn't be some role for matrices in this story. If so, read on!

2.1 Self-adjoint matrices

We'll begin by reviewing some facts about matrices. Let $V$ and $W$ be finite dimensional vector spaces (defined, say, over the complex numbers) with $\dim V = n$ and $\dim W = m$. Suppose we have a linear map $M : V \to W$. By linearity, we know what $M$ does to any vector $v \in V$ if we know what it does to a complete set $\{v_1, v_2, \dots, v_n\}$ of basis vectors in $V$. Furthermore, given a basis $\{w_1, w_2, \dots, w_m\}$ of $W$, we can represent the map $M$ in terms of an $m \times n$ matrix $M$ whose components are
\[
M_{ai} = (w_a, M v_i) \qquad \text{for } a = 1, \dots, m \text{ and } i = 1, \dots, n, \tag{2.1}
\]
where $(\,,\,)$ is the inner product in $W$.

We'll be particularly interested in the case $m = n$, when the matrix $M$ is square and the map $M : V \to W \cong V$ is an isomorphism of vector spaces. For any $n \times n$ matrix $M$ we define its eigenvalues $\{\lambda_1, \lambda_2, \dots, \lambda_n\}$ to be the roots of the characteristic equation $P(\lambda) \equiv \det(M - \lambda I) = 0$, where $I$ is the identity matrix. This characteristic equation has degree $n$, and the fundamental theorem of algebra assures us that we'll always be able to find $n$ roots (generically complex, and not necessarily distinct). The eigenvector $v_i$ of $M$ that corresponds to the eigenvalue $\lambda_i$ is then defined by $M v_i = \lambda_i v_i$ (at least for non-degenerate eigenvalues).

Given a complex $n \times n$ matrix $M$, its Hermitian conjugate $M^\dagger$ is defined to be the complex conjugate of the transpose matrix, $M^\dagger \equiv (M^T)^*$, where the complex conjugation acts on each entry of $M^T$. A matrix is said to be Hermitian or self-adjoint if $M^\dagger = M$. There's a neater way to define this: since for two vectors we have $(u, v) = u^\dagger \cdot v$, we see that a matrix $B$ is the adjoint of a matrix $A$ iff
\[
(Bu, v) = (u, Av) \tag{2.2}
\]
because the vector $(Bu)^\dagger = u^\dagger B^\dagger$. The advantages of this definition are that i) it does not require that we pick any particular components in which to write the matrix and ii) it applies whenever we have a definition of an inner product $(\,,\,)$.

Self-adjoint matrices have a number of very important properties. Firstly, since
\[
\lambda_i (v_i, v_i) = (v_i, M v_i) = (M v_i, v_i) = \lambda_i^* (v_i, v_i), \tag{2.3}
\]
the eigenvalues of a self-adjoint matrix are always real. Secondly, we have
\[
\lambda_i (v_j, v_i) = (v_j, M v_i) = (M v_j, v_i) = \lambda_j (v_j, v_i), \tag{2.4}
\]
or in other words
\[
(\lambda_i - \lambda_j)(v_j, v_i) = 0, \tag{2.5}
\]
so that eigenvectors corresponding to distinct eigenvalues are orthogonal with respect to the inner product $(\,,\,)$. After rescaling the eigenvectors to have unit norm, we can express any $v \in V$ as a linear combination of the orthonormal set $\{v_1, v_2, \dots, v_n\}$ of eigenvectors of some self-adjoint $M$. If $M$ has degenerate eigenvalues (i.e. two or more distinct vectors have the same eigenvalue), then the set of vectors sharing an eigenvalue forms a vector subspace of $V$, and we simply choose an orthonormal basis for each of these subspaces. In any case, the important point here is that self-adjoint matrices provide a natural way to pick a basis on our vector space.
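These two properties are easy to check numerically. Here is a minimal NumPy sketch (an illustration, not part of the original notes): np.linalg.eigh diagonalizes a Hermitian matrix, and we verify that its eigenvalues are real and its eigenvectors orthonormal, as in (2.3)–(2.5).

```python
import numpy as np

rng = np.random.default_rng(0)

# Any matrix of the form A + A^dagger is self-adjoint by construction.
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
M = A + A.conj().T

# np.linalg.eigh is the eigensolver for Hermitian matrices: it returns
# real eigenvalues and an orthonormal set of eigenvectors (columns of V).
lam, V = np.linalg.eigh(M)

assert lam.dtype.kind == "f"                   # eigenvalues are real, cf. (2.3)
assert np.allclose(V.conj().T @ V, np.eye(4))  # orthonormal basis, cf. (2.5)
assert np.allclose(M @ V, V * lam)             # M v_i = lambda_i v_i
```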
A self-adjoint matrix $M$ is non-singular ($\det M \neq 0$, so that $M^{-1}$ exists) if and only if all its eigenvalues are non-zero. In this case, we can solve the linear equation $Mu = f$ for a fixed vector $f$ and unknown $u$. Formally, the solution is $u = M^{-1} f$, but a practical way to determine $u$ proceeds as follows. Suppose $\{v_1, v_2, \dots, v_n\}$ is an orthonormal basis of eigenvectors of $M$. Then we can write
\[
f = \sum_{i=1}^n f_i v_i \qquad \text{and} \qquad u = \sum_{i=1}^n u_i v_i,
\]
where $f_i = (v_i, f)$ etc. as before. We will know the vector $u$ if we can find all its coefficients $u_i$ in the $\{v_i\}$ basis. But by linearity
\[
Mu = \sum_{i=1}^n u_i\, M v_i = \sum_{i=1}^n u_i \lambda_i v_i = f = \sum_{i=1}^n f_i v_i, \tag{2.6}
\]
and taking the inner product of this equation with $v_j$ gives
\[
\sum_{i=1}^n u_i \lambda_i (v_j, v_i) = u_j \lambda_j = \sum_{i=1}^n f_i (v_j, v_i) = f_j \tag{2.7}
\]
using the orthonormality of the basis. Provided $\lambda_j \neq 0$, we deduce $u_j = f_j / \lambda_j$, so that
\[
u = \sum_{i=1}^n \frac{f_i}{\lambda_i}\, v_i. \tag{2.8}
\]
If $M$ is singular, then $Mu = f$ either has no solution or has a non-unique solution (which of these occurs depends on the choice of $f$).

2.2 Differential operators

In the previous chapter we learned to think of functions as infinite dimensional vectors. We'd now like to think of the analogue of matrices. Sturm and Liouville realised that these could be thought of as linear differential operators $\mathcal{L}$. This is just a linear combination of derivatives with coefficients that can also be functions of $x$; i.e. $\mathcal{L}$ is a linear differential operator of order $p$ if
\[
\mathcal{L} = A_p(x) \frac{d^p}{dx^p} + A_{p-1}(x) \frac{d^{p-1}}{dx^{p-1}} + \cdots + A_1(x) \frac{d}{dx} + A_0(x).
\]
When it acts on a (sufficiently smooth) function $y(x)$, it gives us back some other function $\mathcal{L}y(x)$ obtained by differentiating $y(x)$ in the obvious way. This is a linear map between spaces of functions because for two ($p$-times differentiable) functions $y_{1,2}(x)$ and constants $c_{1,2}$ we have $\mathcal{L}(c_1 y_1 + c_2 y_2) = c_1 \mathcal{L} y_1 + c_2 \mathcal{L} y_2$. The analogue of the matrix equation $Mu = f$ is then the differential equation $\mathcal{L}y(x) = f(x)$, where we assume that both the coefficient functions $A_p(x), \dots, A_0(x)$ in $\mathcal{L}$ and the function $f(x)$ are known, and that we wish to find the unknown function $y(x)$.

For most of our applications in mathematical physics, we'll be interested in second order⁹ linear differential operators¹⁰
\[
\mathcal{L} = P(x) \frac{d^2}{dx^2} + R(x) \frac{d}{dx} - Q(x). \tag{2.9}
\]
Recall that for any such operator, the homogeneous equation $\mathcal{L}y(x) = 0$ has precisely two non-trivial linearly independent solutions, say $y = y_1(x)$ and $y = y_2(x)$, and the general solution $y(x) = c_1 y_1(x) + c_2 y_2(x)$ with $c_i \in \mathbb{C}$ is known as the complementary function. When dealing with the inhomogeneous equation $\mathcal{L}y = f$, we seek any single solution $y(x) = y_p(x)$, and the general solution is then the sum
\[
y(x) = y_p(x) + c_1 y_1(x) + c_2 y_2(x)
\]
of the particular and complementary solutions. In many physical applications, the function $f$ represents a driving force for a system that obeys $\mathcal{L}y(x) = 0$ if left undisturbed.

In the cases (I assume) you've seen up to now, actually finding the particular solution required a good deal of either luck or inspired guesswork – you 'noticed' that if you differentiated such-and-such a function you'd get something that looked pretty close to the solution you're after, and perhaps you could then refine this guess to find an exact solution. Sturm–Liouville theory provides a more systematic approach, analogous to solving the matrix equation $Mu = f$ above.

⁹It's a beautiful question to ask 'why only second order?', particularly in quantum theory.
¹⁰The sign in front of $Q(x)$ is just a convention.
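As a concrete illustration of (2.6)–(2.8) — the construction the rest of this chapter generalizes to differential operators — here is a short NumPy sketch (not part of the original notes) that solves $Mu = f$ entirely in the eigenbasis of a self-adjoint $M$:

```python
import numpy as np

rng = np.random.default_rng(1)

# A self-adjoint matrix (generically non-singular) and a fixed vector f.
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
M = A + A.conj().T
f = rng.standard_normal(4) + 1j * rng.standard_normal(4)

lam, V = np.linalg.eigh(M)    # columns of V are the orthonormal v_i

coeffs = V.conj().T @ f       # f_i = (v_i, f), as in (2.7)
u = V @ (coeffs / lam)        # u = sum_i (f_i / lambda_i) v_i, eq. (2.8)

assert np.allclose(M @ u, f)  # agrees with solving Mu = f directly
```

For matrices this is no improvement on np.linalg.solve, of course; the point is that exactly this eigenfunction expansion survives the passage to infinite dimensions.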
2.3 Self-adjoint differential operators

The 2nd-order differential operators considered by Sturm & Liouville take the form
\[
\mathcal{L}y \equiv \frac{d}{dx}\!\left( p(x) \frac{dy}{dx} \right) - q(x)\, y, \tag{2.10}
\]
where $p(x)$ is real (and once differentiable) and $q(x)$ is real and continuous. This may look to be a tremendous specialization of the general form (2.9), with $R(x)$ restricted to be $P'(x)$, but actually this isn't the case. Provided $P(x) \neq 0$, starting from (2.9) we divide through by $P(x)$ to obtain
\[
\frac{d^2}{dx^2} + \frac{R(x)}{P(x)} \frac{d}{dx} - \frac{Q(x)}{P(x)} = e^{-\int_0^x R(t)/P(t)\, dt}\, \frac{d}{dx}\!\left( e^{\int_0^x R(t)/P(t)\, dt}\, \frac{d}{dx} \right) - \frac{Q(x)}{P(x)}. \tag{2.11}
\]
Thus, setting $p(x)$ to be the integrating factor $p(x) = \exp\left( \int_0^x R(t)/P(t)\, dt \right)$ and likewise setting $q(x) = Q(x)\, p(x) / P(x)$, we see that the forms (2.10) and (2.9) are equivalent. However, for most purposes (2.10) will be more convenient.

The beautiful feature of these Sturm–Liouville operators is that they are self-adjoint with respect to the inner product
\[
(f, g) = \int_a^b f(x)^* g(x)\, dx, \tag{2.12}
\]
provided the functions on which they act obey appropriate boundary conditions. To see this, we simply integrate by parts twice:
\[
\begin{aligned}
(\mathcal{L}f, g) &= \int_a^b \left[ \frac{d}{dx}\!\left( p(x) \frac{df^*}{dx} \right) - q(x) f^*(x) \right] g(x)\, dx \\
&= \left[ p\, \frac{df^*}{dx}\, g \right]_a^b - \int_a^b \left( p(x) \frac{df^*}{dx} \frac{dg}{dx} + q(x) f(x)^* g(x) \right) dx \\
&= \left[ p\, \frac{df^*}{dx}\, g - p f^* \frac{dg}{dx} \right]_a^b + \int_a^b f(x)^* \left[ \frac{d}{dx}\!\left( p(x) \frac{dg}{dx} \right) - q(x)\, g(x) \right] dx \\
&= \left[ p(x) \left( \frac{df^*}{dx}\, g - f^* \frac{dg}{dx} \right) \right]_a^b + (f, \mathcal{L}g), 
\end{aligned} \tag{2.13}
\]
where in the first line we have used the fact that $p$ and $q$ are real for a Sturm–Liouville operator.
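The boundary term in (2.13) is exactly what the 'appropriate boundary conditions' must kill. One way to see the self-adjointness concretely is to discretize (2.10): with Dirichlet conditions $y(a) = y(b) = 0$, the standard finite-difference approximation of $\mathcal{L}$ is a symmetric matrix. The sketch below is not from the notes, and the particular $p$ and $q$ are arbitrary choices for illustration.

```python
import numpy as np

# Discretize L y = (p y')' - q y on [a, b] with Dirichlet conditions
# y(a) = y(b) = 0, so the boundary term in (2.13) vanishes and the
# resulting matrix should be symmetric, i.e. self-adjoint.
n, a, b = 200, 0.0, 1.0
x, h = np.linspace(a, b, n + 2, retstep=True)
xi = x[1:-1]                      # interior grid points carry the unknowns

p = lambda t: 1.0 + t**2          # arbitrary real p > 0 (illustration only)
q = lambda t: t                   # arbitrary real q (illustration only)

p_half = p(x[:-1] + h / 2)        # p evaluated at the midpoints x_{j+1/2}

# (p y')'(x_j) ~ [ p_{j+1/2}(y_{j+1} - y_j) - p_{j-1/2}(y_j - y_{j-1}) ] / h^2
L = np.zeros((n, n))
for j in range(n):
    L[j, j] = -(p_half[j] + p_half[j + 1]) / h**2 - q(xi[j])
    if j > 0:
        L[j, j - 1] = p_half[j] / h**2
    if j < n - 1:
        L[j, j + 1] = p_half[j + 1] / h**2

assert np.allclose(L, L.T)        # discrete analogue of (Lf, g) = (f, Lg)
```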