AND LINEAR OPERATORS IN ENGINEERING with Applications in Mathematica This is Volume 3 of PROCESS SYSTEMS ENGINEERING A Series edited by George Stephanopoulos and John Perkins LINEAR ALGEBRA AND LINEAR OPERATORS IN ENGINEERING with Applications in Mathematica

H. Ted Davis Department of Chemical Engineering and Materials Science University of Minnesota Minneapolis, Minnesota Kendall T. Thomson School of Chemical Engineering Purdue University West Lafayette, Indiana

ACADEMIC PRESS

An Imprint ofEbevier

San Diego San Francisco New York Boston London Sydney Tokyo This book is printed on acid-free paper, fe/

Copyright © 2000 by ACADEMIC PRESS

All Rights Reserved, No part of this publication may be reproduced or transmitted in any fonn or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without pennission in writing from the publisher.

Permissions may be sought directly frotn Elsevier's Science and Technology Rights Department in Oxtbrd, UK. Phone: (44) 1865 843830, Fax: (44) 1865 853333, e-mail: [email protected]. You may also complete your request on-line via the Elsevier homepage: hllp://www.elsevier.com by selecting "Customer Support" and then "Obtaining Permissions".

Academic Press An Imprint of Elsevier 525 B Street, Suite 1900, San Diego, CaUfomia 92101-4495, USA http://www.academicpres$.com

Academic Press Harcouit Place, 32 Jamestown Road, London NWl 7BY, UK http;//www.hbuk.co.uk/ap/

Library of Congress Catalog Card Number: 00-100019 ISBN-13:978-0-12-206349-7 ISBN-10:0-12-206349-X PRINTED IN THE UNITED STATES OF AMERICA 05 QW 98765432 CONTEKTS

PREFACE xi

I Determinants

1.1. Synopsis 1 1.2. Matrices 2 1.3. Definition of a Determinant 3 1.4. Elementary Properties of Determinants 6 1.5. Cofactor Expansions 9 1.6. Cramer's Rule for Linear Equations 14 1.7. Minors and Rank of Matrices 16 Problems 18 Further Reading 22

2 Vectors and Matrices

2.1. Synopsis 25 2.2. Addition and Multiplication 26 2.3. The Inverse 28 2.4. Transpose and Adjoint 33 VI CONTENTS

2.5. Partitioning Matrices 35 2.6. Linear Vector Spaces 38 Problems 43 Further Reading 46

3 Solution of Linear and Nonlinear Systems

3.1. Synopsis 47 3.2. Simple Gauss Elimination 48 3.3. Gauss Elimination with Pivoting 55 3.4. Computing the Inverse of a Matrix 58 3.5. LU-Decomposition 61 3.6. Band Matrices 66 3.7. Iterative Methods for Solving Ax = b 78 3.8. Nonhnear Equations 85 Problems 108 Further Reading 121

4 General Theory of Solvability of Linear Algebraic Equations

4.1. Synopsis 123 4.2. Sylvester's Theorem and the Determinants of Matrix Products 124 4.3. Gauss-Jordan Transformation of a Matrix 129 4.4. General Solvability Theorem for Ax = b 133 4.5! Linear Dependence of a Vector Set and the Rank of Its Matrix 150 4.6. The Fredholm Alternative Theorem 155 Problems 159 Further Reading 161

5 The Eigenproblem

5.1. Synopsis 163 5.2. Linear Operators in a Normed Linear Vector Space 165 5.3. Basis Sets in a Normed Linear Vector Space 170 5.4. Eigenvalue Analysis 179 5.5. Some Special Properties of Eigenvalues 184 5.6. Calculation of Eigenvalues 189 Problems 196 Further Reading 203 CONTENTS VII

6 Perfect Matrices

6.1. Synopsis 205 6.2. Implications of the Spectral Resolution Theorem 206 6.3. Diagonalization by a Similarity Transformation 213 6.4. Matrices with Distinct Eigenvalues 219 6.5. Unitary and Orthogonal Matrices 220 6.6. Semidiagonalization Theorem 225 6.7. Self-Adjoint Matrices 227 6.8. Normal Matrices 245 6.9. Miscellanea 249 6.10. The Initial Value Problem 254 6.11. Perturbation Theory 259 Problems 261 Further Reading 278

7 Imperfect or Defective Matrices

7.1. Synopsis 279 7.2. Rank of the Characteristic Matrix 280 7.3. Jordan Block Diagonal Matrices 282 7.4. The Jordan Canonical Form 288 7.5. Determination of Generalized Eigenvectors 294 7.6. Dyadic Form of an Imperfect Matrix 303 7.7. Schmidt's Normal Form of an Arbitrary Square Matrix 304 7.8. The Initial Value Problem 308 Problems 310 Further Reading 314

8 Infinite-Dimensional Linear Vector Spaces

8.1. Synopsis 315 8.2. Infinite-Dimensional Spaces 316 8.3. Riemann and Lebesgue Integration 319 8.4. Inner Product Spaces 322 8.5. Hilbert Spaces 324 8.6. Basis Vectors 326 8.7. Linear Operators 330 8.8. Solutions to Problems Involving fe-term Dyadics 336 8.9. Perfect Operators 343 Problems 351 Further Reading 353 VIII CONTENTS

9 Linear Integral Operators in a Hilbert Space

9.1. Synopsis 355 9.2. Solvability Theorems 356 9.3. Completely Continuous and Hilbert-Schmidt Operators 366 9.4. Volterra Equations 375 9.5. Spectral Theory of Integral Operators 387 Problems 406 Further Reading 411

10 Linear Differential Operators in a Hilbert Space

10.1. Synopsis 413 10.2. The Differential Operator 416 10.3. The Adjoint of a Differential Operator 420 10.4. Solution to the General Inhomogeneous Problem 426 10.5. Green's Function: Inverse of a Differential Operator 439 10.6. Spectral Theory of Differential Operators 452 10.7. Spectral Theory of Regular Sturm-Liouville Operators 459 10.8. Spectral Theory of Singular Sturm-Liouville Operators 477 10.9. Partial Differential Equations 493 Problems 502 Further Reading 509

APPENDIX A.l. Section 3.2: Gauss Elimination and the Solution to the Linear System Ax = b 511 A,2. Example 3.6.1: Mass Separation with a Staged Absorber 514 A.3. Section 3.7: Iterative Methods for Solving the Linear System Ax = b 515 A.4. Exercise 3.7.2: Iterative Solution to Ax = b—Conjugate Gradient Method 518 A.5. Example 3.8.1: Convergence of the Picard and Newton-Raphson Methods 519 A.6. Example 3.8.2: Steady-State Solutions for a Continuously Stirred Tank Reactor 521 A.7. Example 3.8.3: The Density Profile in a Liquid-Vapor Interface (Iterative Solution of an Integral Equation) 523 A.8. Example 3.8.4: Phase Diagram of a Polymer Solution 526 A.9. Section 4.3: Gauss-Jordan Elimination and the Solution to the Linear System Ax = b 529 A. 10. Section 5.4: Characteristic Polynomials and the Traces of a Square Matrix 531 A. 11. Section 5.6: Iterative Method for Calculating the Eigenvalues of Tridiagonal Matrices 533 A. 12. Example 5.6.1: Power Method for Iterative Calculation of Eigenvalues 534 CONTENTS IX

A. 13. Example 6.2.1: Implementation of the Spectral Resolution Theorem—Matrix Functions 535 A.M. Example 9.4.2: Numerical Solution of a Volterra Equation (Saturation in Porous Media) 537 A.15. Example 10.5.3: Numerical Green's Function Solution to a Second-Order Inhomogeneous Equation 540 A.16. Example 10.8.2: Series Solution to the Spherical Diffusion Equation (Carbon in a Cannonball) 542

INDEX 543 This Page Intentionally Left Blank PREFACE

This textbook is aimed at first-year graduate students in engineering or the physical sciences. It is based on a course that one of us (H.T.D.) has given over the past several years to chemical engineering and materials science students. The emphasis of the text is on the use of algebraic and operator techniques to solve engineering and scientific problems. Where the proof of a theorem can be given without too much tedious detail, it is included. Otherwise, the theorem is quoted along with an indication of a source for the proof. Numerical techniques for solving both nonlinear and linear systems of equations are emphasized. Eigenvector and eigenvalue theory, that is, the eigenproblem and its relationship to the operator theory of matrices, is developed in considerable detail. Homework problems, drawn from chemical, mechanical, and electrical engi- neering as well as from physics and chemistry, are collected at the end of each chapter—the book contains over 250 homework problems. Exercises are sprinkled throughout the text. Some 15 examples are solved using Mathematica, with the Mathematica codes presented in an appendix. Partially solved examples are given in the text as illustrations to be completed by the student. The book is largely self-contained. The first two chapters cover elementary principles. Chapter 3 is devoted to techniques for solving linear and nonlinear algebraic systems of equations. The theory of the solvability of linear systems is presented in Chapter 4. Matrices as linear operators in linear vector spaces are studied in Chapters 5 through 7. The last three chapters of the text use analogies between finite and infinite dimensional vector spaces to introduce the functional theory of linear differential and integral equations. These three chapters could serve as an introduction to a more advanced course on functional analysis.

H. Ted Davis Kendall T. Thomson

Xi This Page Intentionally Left Blank I DETERMINANTS

I.I. SYNOPSIS

For any square array of numbers, i.e., a square matrix, we can define a determinant—a scalar number, real or complex. In this chapter we will give the fundamental definition of a determinant and use it to prove several elementary properties. These properties include: determinant addition, scalar multiplication, row and column addition or subtraction, and row and column interchange. As we will see, the elementary properties often enable easy evaluation of a determinant, which otherwise could require an exceedingly large number of multiplication and addition operations. Every determinant has cofactors, which are also determinants but of lower order (if the determinant corresponds to an n x n array, its cofactors correspond to (n — 1) X (n — 1) arrays). We will show how determinants can be evaluated as linear expansions of cofactors. We will then use these cofactor expansions to prove that a system of linear equations has a unique solution if the determinant of the coefficients in the linear equations is not 0. This result is known as Cramer's rule, which gives the analytic solution to the linear equations in terms of ratios of determinants. The properties of determinants established in this chapter will play (in the chapters to follow) a big role in the theory of linear and nonlinear systems and in the theory of matrices as linear operators in vector spaces. CHAPTER I DETERMINANTS

1.2. MATRICES

A matrix A is an array of numbers, complex or real. We say A is an m x n- dimensional matrix if it has m rows and n columns, i.e..

-'22 ^23

-*32 '*33 ^3n (1.2.1)

^m2 •*m3

The numbers a,y (/ = 1,..., m, 7 = I,..., w) are called the elements of A with the element a^j belonging to the ith row and 7th column of A. An abbreviated notation for A is

Ki- ll.2.2)

By interchanging the rows and columns of A, the transpose matrix A^ is generated. Namely,

A^ = (1.2.3)

The rows of A^ are the columns of A and the iji\\ element of A^ is a^,, i.e., {^)ij = ciji' If A is an m X n matrix, then A^ is an n x m matrix. When m = n, we say A is a square matrix. Square matrices figure importantly in applications of linear algebra, but non-square matrices are also encountered in common physical problems, e.g., in least squares data analysis. The m x 1 matrix

•^2 X = (1.2.4)

and the 1 x m matrix

y'^ = [ji,-..>Jn] (1.2.5)

are also important cases. They are called vectors. We say that x is an m-dimensional column vector containing m elements, and y^ is an n-dimensional row vector con- taining n elements. Note that y^ is the transpose of the n x 1 matrix y—the n- dimensional column vector y. DEFINITION OF A DETERMINANT

If A and B have the same dimensions, then they can be added. The rule of matrix addition is that corresponding elements are added, i.e.,

A + B = [a^j] + [bij] = [a^j -f b^j]

^21 + ^21 ^22 + ^22 «2« + K (1.2.6)

, ^ml +^ml ^ml '^ ^ml Consistent with the definition of m x « matrix addition, the multiplication of the matrix A by a complex number a (scalar multiplication) is defined by

a A = [aa,.^.]; (1.2.7)

i.e., a A is formed by replacing every element a^j of A by aa,y.

1.3. DEFINITION OF A DETERMINANT

A determinant is defined specifically for a square matrix. The various notations for the determinant of A are

D = D^=DetA = |A|=:K.y| (1.3.1)

We define the determinant of A as follows: D^ Yl (-l)''«./,«2,, (1.3.2)

where the summation is taken over all possible products of a^^ in which each product contains n elements and one and only one element from each row and each column. The indices /,,...,/„ are permutations of the integers 1,..., n. We will use the symbol Y! to denote summation over all permutations. For a given set {/i,..., /„}, the quantity P denotes the number of transpositions required to transform the sequence /j, /2,..., /„ into the ordered sequence 1, 2,..., w. A transposition is defined as an interchange of two numbers /, and Ij. Note that there are n\ terms in the sum defining D since there are exactly n\ ways to reorder the set of numbers {1,2,..., n) into distinct sets {IxJi^ • • •»h)- As an example of a determinant, consider

aji ^12 ci\ ^21 ^22 ^23 — ^n«22%3 ~ ^12^21^33 ~ ^11^23^32 (1.3.3) ^31 ^32 ^33

«13«22«31 + «I3«2I<'32 + «I2«23«31 • CHAPTER I DETERMINANTS

The sign of the second term is negative because the indices {2, 1,3} are transposed to {1, 2, 3} with the one transposition

2, l,3-> 1,2,3,

and so P = 1 and (—1)^ = —1. However, the transposition also could have been accomplished with the three transpositions

2,1,3-> 2,3,1 -> 1,3,2-^ 1,2,3,

in which case P = 3 and (—1)^ = — 1. We see that the number of transpositions P needed to reorder a given sequence /j,..., /„ is not unique. However, the evenness or oddness of P is unique and thus (—1)^ is unique for a given sequence. ••• EXERCISE 1.3.1. Verify the signs in Eq. (1.3.3). Also, verify that the number I I • of transpositions required for <^ii«25^33'^42^54 is even. A definition equivalent to that in Eq. (1.3.2) is

(1.3.4)

If the product ai^iai^2'' '%n *s reordered so that the first indices of the «/, are ordered in the sequence 1,...,«, the second indices will be in a sequence requir- ing P transpositions to reorder as 1,..., M. Thus, the n! n-tuples in Eqs. (1.3.2) and (1.3.4) are the same and have the same signs. The determinant in Eq. (1.3.3) can be expanded according to the defining equation (1.3.4) as

— ^11^22^32 ~ ^21^12^33 ~ ^11^32^23 '*22 ^23 (1.3.5) '*32

«3,a22^j3 4- «2l«32«l3 + «3l<^12«23- It is obvious by inspection that the right-hand sides of Eqs. (1.3.3) and (1.3.5) are identical since the various terms differ only by the order in which the multiplication of each 3-tuple is carried out. In the case of second- and third-order determinants, there is an easy way to generate the distinct n-tuples. For the second-order case,

21 "22

the product of the main diagonal, an^22» is one of the 2-tuples and the product of the reverse main diagonal, ai2«2i» is the other. The sign of «i2<^2i is negative since {2,1) requires one transposition to reorder to {1, 2}. Thus,

— diiCl'j')Ml "22 di'yCl'12"2 1 (1.3.6) ^21 ^22 DEFINITION OF A DETERMINANT

since there are no other 2-tuples containing exactly one element from each row and column. In the case of the third-order determinant, the six 3-tuples can be generated by multiplying the elements shown below by solid and dashed curves

/ /Tl /«f l /C^Al / 013 ' / / / qi\ a^2 «^3

[ OKi^ ail ai^ ^

The products associated with solid curves require an even number of transposi- tions P and those associated with the dashed curves require an odd P. Thus, the determinant is given by

a,3

^22 «23 = «11^22^33 + «12^23«31 + ^13<^2I«32 (1.3.7) «33

«I3«22^31 - «ll«23«32 " ^12^21^33^ in agreement with Eq. (1.3.3), the defining expression. For example, the following determinant is 0: 2 5 = 1(5)1 + 2(6)3 + 3(2)4 - 3(5)3 - 2(6)1 - 1(2)4 (1.3.8) 2

= 0. The evaluation of a determinant by calculation of the n\ n-tuples requires {n — l)(n!) multiplications. For a fourth-order determinant, this requires 72 multi- plications, not many in the age of computers. However, if n = 100, the number of required multiplications would be

<"-""-<"-'>(7)"=K^)' (1.3.9) ~ 3.7 X 10*^^ where Stirling's approximation, n\ ^ {n/eY, has been used. If the time for one multiplication is 10~^ sec, then the required time to do the multiplications would be 3.7 X lO^'*^ sec, or 1.2 x 10^^^ years! (1.3.10) Obviously, large determinants cannot be evaluated by direct calculation of the defining n-tuples. Fortunately, the method of Gauss elimination, which we will describe in Chapter 3, reduces the number of multiplications to n^. For n = 100, this is 10*^ multiplications, as compared to 3.7 x 10^^^ by direct n-tuple evaluation. The Gauss elimination method depends on the application of some of the elemen- tary properties of determinants given in the next section. CHAPTER i DETERMINANTS

1.4. ELEMENTARY PROPERTIES OF DETERMINANTS

If the determinant of A is given by Eq. (1.3.2), then—because the elements of the transpose A^ are Uj^—it follows that

(1.4.1)

However, according to Eq. (1.3.4), the right-hand side of Eq. (1.4.1) is also equal to the determinant D^ of A. This establishes the property that 1. A determinant is invariant to the interchange of rows and columns; i.e., the determinant of A is equal to the determinant of A^. For example,

1 8 = 7-24 = -17 3 7

1 3 = 7-24 = -17. 8 7

Another elementary property of a determinant is that 2. If two rows (columns) of a determinant are interchanged, then the determinant changes sign. For example.

D = = flii^^?? — floiflf21"12 » m

^11 D' = a^xQi\^\i ^w^n = -'^•

From the definition of D in Eq. (1.3.2),

^=EVi)%«2/, hir' ^nL^ it follows that the determinant D' formed by the interchange of rows i and j in D is

^' = E (~l)^ «l/,«2/2 • • • ^Jh • ' ' ^ilj ' ' ' ^nl„ ' (1.4.2) Each term in D' corresponds to one in D if one transposition is carried out. Thus, P and P' differ by 1, and so (-1)^' = (-1)^^^ = -(-1)^. From this it follows that D' = — D. A similar proof that the interchange of two columns changes the sign of the determinant can be given using the definition of D in Eq. (1.3.4). Alternatively, from the fact that D^ = D^T, it follows that if the interchange of two rows changes the sign of the determinant, then the interchange of two columns does the same thing because the columns of A^ are the rows of A. ELEMENTARY PROPERTIES OF DETERMINANTS /

The preceding property implies:

3. If any two rows (columns) of a matrix are the same, its determinant is 0.

If two rows (columns) are interchanged, D ~ —D'. However, if the rows (columns) interchanged are identical, then D = D\ The two equalities, D = —D^ and D = D\ are possible only if D = D' = 0. Next, we note that

4. Multiplication of the determinant D by a constant k is the same as multiplying any row (column) by k.

This property follows from the commutative law of scalar multiplication, i.e., kab — {ka)b = a(kb), or

kD= E'(-l)^^i/,«2/2 • • -H/, • "(ini„

«2l "2«

kai„ (1.4.3) ci„„

a,. «12 ka^

«22 kG'j *ln

^nl ka nj

Multiplication of the determinant

1 2 3 D 2 4 6 1 2 8

by \ gives

1 1 3 2^ = 2 2 6 1 1 8

from which we can conclude that D/2 = 0 and D = 0, since D/2 has two identical columns. Stated differently, the multiplication rule says that if a row (column) of D has a common factor k, then D = kD\ where D' is formed from D by replacing the row (column) with the common factor by th6 row (column) divided by the CHAPTER I DETERMINANTS common factor. Thus, in the previous example,

1 1 3 D = 4 I 1 3 1 1 8

The fact that a determinant is 0 if two rows (columns) are the same yields the property:

5. The addition of a row (column) multiplied by a constant to any other row (column) does not change the value of D.

To prove this, note that

(1.4.4)

The second determinant on the right-hand-side of Eq. (1.4.4) is 0 since the elements of the iih and jth rows are the same. Thus, D' = D. The equality D^ = D^T establishes the property for column addition. As an example,

1 2 H-2^ 2 =3-2=1= = 3 + 6A:-2-6A:=l. 1 3 1+3A: 3

Elementary properties can be used to simplify a determinant. For example.

1 2 3 1[ 2 3- 2 1 2 1 2 4 6 = 2 4 6-4 = 2 4 2 1 2 8 1 2 8-2 1 1 2 6

1 1 1 1 1-1 1 -1 -2(2) 1 1 1 = 4 1 1-1 1-1 (1.4.5) 1 1 6 1 1-1 6-1

1 0 0 1 0 Ol = 4 1 0 0 = 41 0 0 o| 1 0 5 0 0 5|

The sequence of application of the elementary properties in Eq. (1.4.5) is, of course, not unique. Another useful property of determinants is:

6. If two determinants differ only by one row (column), their sum differs only in that the differing rows (colunms) are summed. COFACTOR EXPANSIONS

That is.

«ir-- «ii «„.•• ay-\-by- a,„ + ^m ^nn «nl • • • «m + Ki • • • «n (1.4.6) This property follows from the definition of determinants and the distributive law (ca + cb = c(a -f b)) of scalar multiplication. As the last elementary property of determinants to be given in this section, consider differentiation of D by the variable /: dD day, ,,^ -an, • • -Ifa,, 'dt = E'(- '^nl„ + E'(- dt da„i^ + ••• !)''«„ (IA7) + E'(- .^2/2 dt

or dD dt = E'(- dt -%2 ' • • «/„« + E'(--1)%. %2- dt (1.4.8) =j:^i'

The determinant D'- is evaluated by replacing in D the elements of the ith row by the derivatives of the elements of the ith row. Similarly, for Dl\ replace in D the elements of the ith column by the derivatives of the elements of the ith column. For example.

dOii dci^2 d dt It dt + da2\ da22 «22 IT dt da II dai2 (1.4.9) dt dt -f dan d^22 dt ^21 IT

.5. COFACTOR EXPANSIONS

We define the cofactor A,^ as the quantity (—1)'^^ multiplied by the determinant of the matrix generated when the ith row and jth column of A are removed. For example, some of the cofactors of the matrix

A = ^21 ^22 ^23 (1.5.1)

•^32 ^33 10 CHAPTER I DETERMINANTS

include

^22 ^23 «12 «13 a,2 fl,3 Au = , A21 — — ' ^31 — (1.5.2) ^32 ^33 «32 «33 ^22 ^23

In general,

*i,;-i M.7+1

A,^^{-\r^ H-iA ^i-l,n (1.5.3) ^f + l,n

^nl '*«.7-l Note that an /i x n matrix has n^ cofactors. Cofactors are important because they enable us to evaluate an nth-order deter- minant as a linear combination of n {n — l)th-order determinants. The evaluation makes use of the following theorem:

CoFACTOR EXPANSION THEOREM. The determinant D of K can be computed from

D — XI ^u ^'7' where i is an arbitrary row, (1.5.4) 7=1 or

D = ^ciij^ijy where j is an arbitrary column. (1.5.5) 1=1 Equation (1.5.4) is called a cofactor expansion by the fth row and Eq. (1.5.5) is called a cofactor expansion by the 7th column. Before presenting the proof of the cofactor expansion, we will give an example. Let 4 -1 0 A = -1 4 -1 (1.5.6) 0 -1 4

By the expression given in Eq. (1.3.7), it follows that

D=.64 + 0 + 0 + (-0) - 4 - 4 = 56.

The cofactor expansion by row 1 yields

4 -1 1 -1 -1 4 D = 4 (-1) + 0 -1 4 0 4 0 -1 = 60 - 4 + 0 = 56, COFACTOR EXPANSIONS II

and the cofactor expansion by column 2 yields

-1 4 0 D^^-H^l) + 4 1(-1) = 56. 0 0 4 To prove the cofactor expansion theorem, we start with the definition of the determinant given in Eq. (1.3.4). Choosing an arbitrary column j, we can rewrite this equation as

(1.5.7) ' = 1 /l.....'• In where the primed sum now refers to the sum over all permutations in which Ij — i. For a given value of / in the first sum, we would like now to isolate the ijth cofactor of A. To accomplish this, we must examine the factor (—1)^ closely. First, we note that the permutations defined by P can be redefined in terms of permutations in which all elements except element / are in proper order plus the permutations required to put / in its place in the sequence 1, 2,..., w. For this new definition, the proper sequence, in general, would be

1,2,3, , / - 1, / + 1, / + 2,..., ; - 1, 7 + 1, ; + 2,..., n - 1, n. (1.5.8)

We now define P/^ as the number of permutations required to bring a sequence back to the proper sequence defined in Eq. (1.5.8). We now note that \j — i\ permutations are required to transform this new proper sequence back to the original proper sequence 1, 2,..., n. Thus, we can write (-1)'' = (-1)^^(-1)'+^ and Eq. (1.5.7) becomes

D = Ei-lY^^l E' (-l)''^«/,.«(,2 • • •a,._,;-,%,;+, • • •«/„„)«o, (1-5-9) i=l \/,,....i,...,/„ / which we recognize using the definition of a cofactor as

1 = 1 A similar proof exists for Eq. (1.5.4). With the aid of the cofactor expansion theorem, we see that the determinant of an upper , i.e.,

0 u 22 *23 *2/i

U = 0 II 33 *3n (1.5.11)

0 0 0 where u^j = 0, when / > j, is the product of the main diagonal elements of U, i.e.,

\V\ = f]u,. (1.5.12) 1=1 12 CHAPTER I DETERMINANTS

To derive Eq. (1.5.12), we use the cofactor expansion theorem with the first column of U to obtain

^22 ^^23 *2n

0 u 33 *3n |U| = M„ 1=2 0 0

"22 "23 *2n

0 U'^^ *3/i

0 0 where Un is the /I cofactor of U. Repeat the process on the (n — l)th-order upper triangular determinant, then the (n — 2)th one, etc., until Eq. (1.5.13) results. Simi- larly, the row cofactor expansion theorem can be used to prove that the determinant of the lower triangular matrix.

\lu 0 • • 0

In '22 • 0 L = (1.5.13)

IL l„2 •• hin

IS \u = Uhi'^ (1.5.14)

i.e., it is again the product of the main diagonal elements. In L, l^j ~ 0 when j > i. The property of the row cofactor expansion is that the sum n

replaces the ith row of D^ with the elements a^j of the ith row; i.e., the sum puts in the ith row of D^ the elements a^i, a,2' • • •»^/n- Thus, the quantity

n

puts the elements aj, fl2» • •»^n in the /th row of D^, i.e.,

"2n

^i-1,1 (1,5.15)

^i+i, 1 a COFACTOR EXPANSIONS 13

Similarly, for the column expansion.

^l,j-\ Of, a 1../+1 (1.5.16) i=\ ^nj-l ^n ^«,y+l

EXAMPLE 1.5.1.

1 2 A = 3 2 1 1

2 An = A21 — *31 1

£)^ = 1 X All + 3 X A21 -f 1 X ^31 = l(3)^3(-l) + l(-4) = -4

2 3 A'- 2 1

D^, = «! X A,i + a2 X A21 + 0^3 X A3J

= 30?! +0'2 — 4Qf3.

The cofactor expansion of D^ by the first-column cofactors involves the same cofactors, Ay, A21, and A31, as the cofactor expansion of D^' by the first column. The difference between the two expansions is simply that multipliers of A^, A21, and A31 differ since the elements of the first column differ. Consider next the expansions

n ky^j, (1.5.17)

and

E%A-r k^i. (1.5.18)

The determinant represented by Eq. (1.5.17) is the same as D^, except that the 7th column is replaced by the elements of the /cth column of A, i.e.,

column j column k

T.^ik^ij = (1.5.19)

^rti ^nk 14 CHAPTER I DETERMINANTS

The determinant in Eq. (1.5.19) is 0 because columns j and k are identical. Sim- ilarly, the determinant represented by Eq. (1.5.18) is the same as D^, except that the ith row is replaced by the elements of the A;th row of A, i.e.,

^kn row I (1.5.20) row k

^ni

The determinant in Eq. (1.5.20) is 0 because rows / and k are identical. Equations (1.5.19) and (1.5.20) embody the alien cofactor expansion theorem:

ALIEN COFACTOR EXPANSION THEOREM. The alien cofactor expansions are 0, Le.,

Yl^ik^ij=^^ ^i'h i^\ (1.5.21) E«ityA7=0, k + i. 7=1 The cofactor expansion theorem and the alien cofactor expansion theorem can be summarized as

(1.5.22)

where ^^^ is the Kronecker delta function with the property

hi = 1» ^ = h (1.5.23) = 0, k + j.

1.6. CRAMER'S RULE FOR LINEAR EQUATIONS

Frequently, in a practical situation, one wishes to know what values of the variables jCi, ^2,..., ^„ satisfy the n linear equations

«21^1 + «22-^2 + • • • + ^2n^n = ^2 (1.6.1)

^nl^l+««2^2 + "- +««n-^n=^. CRAMER'S RULE FOR LINEAR EQUATIONS 15

These equations can be summarized as

Y^aijXj =bi, / = 1,2,... ,w. (1.6.2)

7=1 Equation (1.6.2) is suggestive of a solution to the set of equations. Let us multiply Eq. (1.6.2) by the cofactor A,^ and sum over /. By interchanging the order of summation over / and j on the left-hand side of the resulting equation, we obtain

(1.6.3)

j f i By the alien cofactor expansion, it follows that

E«oAit=0 (1.6.4)

unless j = k, whereas, when j = k, the cofactor expansion yields

^11 «12

^21 ^22 ^2fi J2^ik^ik = ^ = (1.6.5)

«nl ^«2 Also, it follows from Eq. (1.5.19) that

Ml "12 '^l.Jt-l ^1 ^l.it+1 D,^j:b,A,,= (1.6.6)

««1 ««2 ^n,it-l ^/i ^n,k+l

where D^ is the same as the determinant D except that the /:th column of D has been replaced by the elements ^j, ^2' • • • ^ ^w According to Eqs. (1.6.4)-( 1.6.6), Eq. (1.6.3) becomes

Dx. = D.. (1.6.7)

Cramer's rule follows from the preceding result:

CRAMER'S RULE. If the determinant D is not 0, then the sohition to the linear system, Eq. (L6J), is

£1 k = I,.., ,n. (1.6.8) D and the solution is unique. To prove uniqueness, suppose x^ and y^ for / = 1,..., w are two solutions to Eq. (1.6.2). Then the difference between J2j a^jXj = b^ and J^j a^jyj = bf yields

E«i7^ -3^;) = ^' ' = ^ ^' (1.6.9) 16 CHAPTER I DETERMINANTS

Multiplication of Eq. (1.6.9) by A^^ and summation over / yields

D(x, - y,) = 0, (1.6.10)

or jc^ = j^, k = 1,..., n, since D 7^ 0. Incidentally, even if D = 0, the linear equations sometimes have a solution, but not a unique one. The full theory of the solution of linear systems will be presented in Chapter 4. EXAMPLE 1.6.1. Use Cramer's rule to solve

2Xi -\-X2=l

Solution. 2 i D = =3 1 2

7 1 = 11 3 2

2 7 = -1 1 3 11 _D^_ 1 D y Even if the determinant found no other role, its utility in mathematics is assured by Cramer's rule. When D / 0, a unique solution exists for the linear equations in Eq. (1.6.1). We shall see later that, in the theory and applications of linear algebra, the determinant is important in a myriad of circumstances.

.7. MINORS AND RANK OF MATRICES

Consider the m x n matrix

^21 ^22 ^2n A = (1.7.1)

^w2 If m — r rows and n — r columns are struck from A, the remaining elements form an r X r matrix whose determinant M^ is said to be an rth-order of A. For example, striking the third row and the second and fourth columns of

A = -^22 ^23 •^24 '*25 (1.7.2) ^31 ^32 ^33 '*34 ^35 ^43 ^44 ^45 J L"4l MINORS AND RANK OF MATRICES 17

generates the minor

«ii «J3 «15

M\ = «2l «23 «25 (1.7.3)

*43 ^45

We can now make the following important definition:

DEFINITION. The rank r {or r^) of a matrix A is the order of the largest nonzero minor of A.

For example, for

A = |A| = 3 (1.7.4)

and so r. = 2. On the other hand, all of the minors of

1111 A = 1111 (1.7.5) 1111

except M\, are 0. Thus, r^ = 1 for this 3x4 matrix. For an m x n matrix A, it follows from the definition of rank that r^ < min(m, n). Let us end this chapter by mentioning the principal minors and traces of a matrix. They are important in the analysis of the time dependence of systems of equations. The jth-order trace of a matrix A, tr^ A, is defined as the sum of the yth-order minors generated by striking n — j rows and columns intersecting on the main diagonal of A. These minors are called the principal minors of A. Thus, for a 3 X 3 matrix,

«ii ^12 ^13

tr3A = «21 ^22 ^23 = DetA, (1.7.6)

«31 ^32 ^33

tr2A = «n «12 + «11 ^13 + 1'*2 2 -'23 (1.7.7) «21 ^22 a 31 ^33 ^32 *33 trt A = ail + ari + a-^-i. (1.7.8)

For an n X n matrix A, the nth-order trace is just the determinant of A and tr, A is the sum of the diagonal elements of A. These are the most common traces encountered in practical situations. However, all the traces figure importantly in the theory of eigenvalues of A. In some texts, the term trace of A is reserved for trj A = Yl]=\ <^jj* ^^^ the objects tr^ A are called the invariants of A. 18 CHAPTER I DETERMINANTS

EXERCISE 1.7.1. Show that all of the traces of the matrix 4 1 1 0 1 -4 1 1 A = (1.7.9) 1 1 -4 1 0 1 1 -4 are positive. We will show in Chapter 6 that this implies that A has only negative eigenvalues. It also implies that, independently of initial conditions, the solution x to the equation dx = Ax (1.7.10) dt III always vanishes with increasing time t.

PROBLEMS

1. Evaluate the determinant of the matrix given by Eq. (1.7.9) using the formulas

^ = E (-i)^^i/,%/2• • '^^ = E^o^ h /« '=1 2. Solve the following determinants by elementary operations

6 3 2 9| 5 3 2 9 (a) 4 4 8 9 14 3 2 9

a-{-b c c (b) a b -\- c a = ^Aabc. b b c -{- a (c) Solve the following set of equations:

JX^ ~r X2 + 2JC3 = 1

—JCi 4- 4JC2 + 5JC3 = 1

— iXl + 2Xr^,+JC 3 = -l.

. Evaluate the determinants

0 0 0 ^1

0 0 ^1 «2 0 Ci h «3

di C2 &3 a 4 PROBLEMS 19

and

h ^3 h ^5 0 Cj 0 0

0 ^3 d. ds 0 e^ 0 e^

4. Using Cramer's rule, find jc, y, and z for the following system of equations:

3x~4y + 2z=l 2x-^3y-3z = -l 5x -5y-\-4z = 7.

5. Using Cramer's rule, find jc, y, and z for the following system of equations:

6/x-2/y-^\/z=4 2/x + 5/y-2/z = 3/4 5/x-l/y + 3/z = 63/4.

6. Show that

0 a b c 0 1 1 I a 0 c b 1 0 c^ b^ b c 0 a 1 c^ 0 a^ c b a 0 1 b^ a^ 0

7. Use the determinant properties to evaluate

.2 a"

^2

8. Using Cramer's rule, find x, y, and z, where

3JC + 43; - 2z = 3 2;c 4- 2}^ - 3z = 1 -jc + >' - 2z = -2.

9. Using Cramer's rule, find x, y, and z for the system of equations

4JC + 7y - z = 7

3JC + 2y 4- 2z = 9 jc +5j — 3z = 3. 20 CHAPTER I DETERMINANTS

10. Using Cramer's rule, find x, y, and z for the system of equations

x + 3y = 0

2JC 4- 6y + 4z = 0 -x + 2z = 0,

11. Using Cramer's rule, find x, y, and z for the system of equations

X + y -\-z = 1 x + l.000ly-{-2z = 2 x + 2y-{-2z = 1.

12. Let

D,= ^11 ^n and D2 = ^1 ^21 ^22 ^21 hi

Show that

«ll «12 0 0

^21 ^22 ^ 0 D = D,D2 = -1 0 fc„ ^12 0 -1 bn. ^22

13. What is the rank of

2 1 3 4 1 1 -2 -1 0 3 -1 2

14. Evaluate the determinant

3 5 2 4 1 1 -1 6 D 2 3 5 1 2 1 4 8

by first generating zero entries where you can and then using a cofactor expansion. 15. Show that

1+c, 1 1 1 1 I+C2 1 1 A 1 1 1 1\ = c,c-,c 1 1 I+C3 1 3C41+ - + - + - + -. V Ci C2 C3 C4/ 1 1 1 I+C4 PROBLEMS 21

16. What is the rank of

12 3 4 A = 14 6 8 5 7 9 1

17. Give all of the minors of

A=: ^21 ^22 ^23

18. Give all of the traces of the matrix whose determinant is shown in Problem 15. 19. Solve the equation

l-x 7 -h 2JC 0 -h 3A: 4 + 2x 10-4JC 6-6A: 2 4 5

20. Without expanding the determinant, show that

x^ X y2 y ix ~y)(y~z)ix-z).

21. Consider the set of matrices

a b A = b

b

(a) Defining the determinants D„ as

D„ = |A„|

find a recursion relation for D„ (i.e., D„ = f(D„_i, D„-2,...; a, /?)). (b) Letting a = 0.5 and b = I, write a computer program to evaluate Dgg. 22 CHAPTER I DETERMINANTS

22. Consider the n-dimensional matrix

jc a a X a a X a a X a 0

For the case where a = I and jc = 2cos^, prove that the determinant is given by D = sin{n 4-1)0/ sin 6 as long as 9 is restricted to 0 < ^ < n. 23. Find the determinant of the n x n matrix whose diagonal elements are 0 and whose off-diagonal elements are a, i.e.,

0 a a a a a 0 a a a a a 0 •• a a

a a a . 0 a a a a a 0

24. Find the following determinant:

l+«, 1+fl, a a. 1+a,

^1 ^2 «3 •*•!+««

25. Prove the following relation for the Vandermonde determinant'.

1 1 1 X2 ^2 = n fi ^-^/)- (=1 7=« + l

^n-l

FURTHER READING

Aitken, A. C. (1948). "Determinants and Matrices." Oliver and Boyd, Edinburgh. Aitken, A. C. (1964). "Determinants and Matrices." Interscience, New York. Amundson, A. R. (1964). "Mathematical Methods in Chemical Engineering." Prentice-Hall, New Jersey. Bronson, R. (1995). "Linear Algebra: an Introduction." Academic Press, San Diego, FURTHER READING 23

Muir, T. (1960). "A Treatise on the Theory of Determinants." Dover, New York. Muir, T. (1930). "Contributions to the History of Determinants, 1900-1920." Blackie & Son, London/ Glasgow. Nomizu, K. (1966). "Fundamentals of Linear Algebra." McGraw-Hill, New York. Stigant, S. A. (1959). "The Elements of Determinants, Matrices and Tensors for Engineers." Macdonald, London. Tumbull, H. W. (1928). "The Theory of Determinants, Matrices and Invariants." Blackie, Lon- don/Glasgow. Vein, R. "Determinants and Their Applications in Mathematical Physics," Springer, New York. This Page Intentionally Left Blank VECTORS AND MATRICES

2.1. SYNOPSIS

In this chapter we will define the properties of matrix addition and multiplication for the general mxn matrix containing m rows and n columns. We will show that a vector is simply a special class of matrices: a column vector is an m x I matrix and a row vector is a 1 x n matrix. Thus, vector addition, scalar or inner products, and vector dyadics are defined by matrix addition and multiplication. The inverse A~^ of the square matrix A is the matrix such that AA~^ = A~'A = I, where I is the unit matrix. We will show that when the inverse exists it can be evaluated in terms of the cofactors of A through the

^n\

^12 ^22 Hi adjA =

A^n ^In

Specifically, by using the cofactor expansion theorems of Chapter 1, we will prove that the inverse can be evaluated as

A-^ = -adjA. D ^

25 26 CHAPTER 2 VECTORS AND MATRICES

We will also derive relations for evaluating the inverse, transpose, and adjoint of the product of matrices. The inverse of a product of matrices AB can be com- puted from the product of the inverses B~^ and A~^ Similar expressions hold for the transpose and adjoint of a product. The concept of matrix partitioning and its utility in computing the inverse of a matrix will be discussed. Finally, we will introduce linear vector spaces and the important concept of of vectors sets. We will also expand upon the concept of vector norms, which are required in defining normed linear vector spaces. Matrix norms based on the length or norm of a vector are then defined and several very general properties of norms are derived. The utility of matrix norms will be demonstrated in analyzing the solvability of linear equations.

2.2. ADDITION AND MULTIPLICATION

The rules of matrix addition were given in Eq. (1.2.6). To be conformable for addition (i.e., for addition to be defined), the matrices A and B must be of the same dimension m x n. The elements of A + B are then a^ + b^j; i.e., corresponding elements are added to make the matrix sum. Using this rule for addition, the product of a matrix A with a scalar (complex or real number) a was defined as

aai aa, Of A (2.2.1)

aa„ aa„ Using the properties of addition and scalar multiplication, and the definition of the derivative of A, dk ,. A(r + AO-A(0 -— = hm , (2.2.2) dt A/^o Ar we find that a,^{t ^ ^t) ~ a^^(t) a,„(f + AO -- «i«(0 At ^A At -— = lim dt A/^o a^ijt-\- At) ~ a^^jt) a,„„{t + AO -a^nit)- At At (2.2.3) da^ da. dt dt

da„ . dt dt We can therefore conclude that the derivative of a matrix dkjdt is a matrix whose elements are the derivatives of the elements of A, i.e., dA __ [^^,7] (2.2.4) dt " \_ dt J' ADDITION AND MULTIPLICATION 27

Note that \dA/dt\ ^ d\A\/dt, The determinant of d\/dt is a nonlinear function of the derivatives of a,y, whereas the derivative of the determinant |A| is linear. If A and B are conformable for , i.e., if A is an m x n matrix and B is an n x /? matrix, then the product

C = AB (2.2.5)

is defined. C is then anm x p matrix with elements

^0- (2.2.6) k=\ Thus, the ijWx element of C is the product of the /th row of A and the yth column of B, and so A and B are conformable for the product AB if the number of columns of A equals the number of rows of B. For example, if [l l1 A== 2 1 and B = [3 ij then

AB = (2.2.7)

whereas BA is not defined. EXERCISE 2.2.1. Solve the linear system of equations dX BA, (2.2.8) It where

Mt = 0) = (2.2.9)

and

-2 1 B = (2.2.10) 1 -2

If X and y are «-dimensional vectors, then the product x^y is defined since the transpose x^ of x is a 1 x n matrix and y is an n x 1 matrix. The product is a 1 x 1 matrix (a scalar) given by

(2.2.11) 1=1 x^y is sometimes called the scalar or inner product of x and y. The scalar product is only defined if x and y have the same dimension. If the vector x is real, then 28 CHAPTER! VECTORS AND MATRICES

the scalar product x^x = X!"=i ^f ^^ positive as long as x is not 0. We define the "length" ||x|| of x as

||x|| = \/x^ (2.2.12) for real vectors. For a complex vector, J2i ^f is not necessarily positive—or even real. In this case, we define the inner product by

xV = E<>'i (2-2.13)

and the length of x by

ilx.. = yxtx = Jf:Kl', (2.2.14) =1 where the quantity |jc,p (= x*x^) is the square of the modulus of x^. Here, again, x^ denotes the adjoint of x, namely, the complex conjugate of the transpose of x, x^ = [xj*,x*,,..,jc:], (2.2.15) where x* is the complex conjugate of x^. The length ||x|| has the desired property that it is 0 if and only if every component of x is 0 (i.e., if x = 0). By definition, the matrix product AB exists only if A is an m x p matrix and B is a /? X n matrix. The product AB is then an m x n matrix. Thus, if x is an m- dimensional vector (m x 1 matrix) and y is an n-dimensional vector (n x 1 matrix), the vector y^ is a 1 x n matrix. The product xy^ therefore exists as an m x « matrix given by

•^1^1 '"^lyi ••• -^iJn T (2.2.16) xy' ^my\ ^myi ••' ^myn In the language of vectors and tensors, in Euclidean vector space the product xy^ is known as a dyadic. We will use this term in later chapters.

2.3. THE INVERSE MATRIX

We note that the product Ax is defined for an m x n matrix A and an «-dimensional vector X (i.e., an n x 1 matrix). With the matrix product so defined, the set of linear equations

^11-^1+ «12-^2+ • • • + ^Xn^n = ^1 : : : : (2.3.1)

can be summarized by the single equation

Ax = b. (2.3.2)

We can see that the multiplication of A by x transforms the n-dimensional vector X into the m-dimensional vector b. THE INVERSE MATRIX 29

When n = m in Eq. (2.3.1) and D^ ^ 0, we know from Cramer's rule that Eq. (2.3.1) can be solved for x uniquely for any b. In this case, it seems natural to hunt for an n x n matrix A~^ that is the inverse of the n x n matrix A, i.e., a matrix such that

A-^A = AA-' =1, (2.3.3)

where I is the . The identity matrix is defined by the property that

ly = y; (2.3.4)

i.e., the product of the n x n identity matrix I and an n-dimensional vector y produces the vector y again. With this property, Eq. (2.3.2) can be solved by multiplying it by A~^ to obtain

A ^Ax = Ix = x = A ^b. (2.3.5)

If A~^ can be found, then the solution x to Ax = b is simply A~^b. The identity matrix I has the simple form

1 0 0 1 [5,7]' (2.3.6)

0 0

i.e., I is unity on the main diagonal and is 0 elsewhere. We can use the cofactor expansion theorems proved in Chapter 1 to construct the inverse of a square, nonsingular (i.e., |A| / 0) matrix A. We define the adjugate of a matrix A by

AnX

^22 adjA = (2.3.7)

A^n where A,y is the cofactor of a^j. Note that the ijih element of adj A is the cofactor of the 7/th element of A. The elements of the matrix product

A adj A (2.3.8)

are then

Jl^ikA jk- (2.3.9) jt=i But, according to the cofactor theorems (Eq. (1.5.18)),

(2.3.10) k=l 30 CHAPTER 2 VECTORS AND MATRICES

where D is the determinant of A. This means that A adj A is a having the determinant D on the main diagonal (and, of course, O's elsewhere), i.e.,

D 0 0 0' 0 D 0 0 A adj A = 0 0 D 0 = Dl. (2.3.11)

_0 0 0 D

Likewise, the ijih element of the product

(adj A) A (2.3.12)

is

(2.3.13) k=l which again, by the cofactor expansion theorems, obeys

n (2.3.14) k=l

and implies (2.3.15) (adj A) A = DI.

In summary, the properties of the adj A are that (2.3.16) AadjA = (adjA)A = DI.

If D 7^ 0, then Eqs. (2.3.11) and (2.3.15) can be rearranged to give A(^-iadjA^ = (|-iadjA^A = I, (2.3.17)

which, when compared with Eq. (2.3.3), shows that the inverse of A can be com- puted as

A-' = -adjA. (2.3.18) D Since the elements of (adj A) b are

J^Ajibj = Df, r = l,...,n, (2.3.19)

where D, is defined in Eq. (1.6.6), it follows that the elements of x = A 'b are

Xj = —'-, i = I,... ,n. (2.3.20) D THE INVERSE MATRIX 31

Not surprisingly, the solution x = A"'b to Ax = b is exactly the one given by Cramer's rule. What is important, however, is that Eq. (2.3.8) gives the inverse of A once and for all. We can therefore generate a solution for a particular b by taking the matrix product A~^b. It should be noted that true inverses are only defined for square matrices. Otherwise, A~'A and AA"' cannot be equal to the same square matrix. We can, however, define left and right inverses or pseudo-inverses. If there exists a matrix G such that

GA = I, (2.3.21)

then we say G is a left inverse of A. Accordingly, if there exists a matrix H such that

AH = I, (2.3.22)

then we say H is a right inverse of A. Note that left and right inverses need not be unique. EXAMPLE 2.3.1. Find the inverse of

2 1 A = (2.3.23) 2 2

Since |A| =2, the adjugate formula gives

A-^ = (2.3.24)

such that

1 '2 0" "1 0] ~'A = — = I = AA 2 0 2 0 • • • ij EXAMPLE 2.3.2. Find the left or right inverse of

A = (2.3.25)

Since |A| = 0, the adjugate formula will not give an inverse. To find a left inverse, we must find

G = ^11 ^12 (2.3.26) ^21 ^22

such that

1 0 GA = ^11-^12 -^11+^12 (2.3.27) ^21 - ^22 -^21 + ^22 0 1 32 CHAPTER 2 VECTORS AND MATRICES

or

^11 -^12 = 1

-^11+^12=0 (2.3.28) ^21 - ^22 = 0

~82\ +^22= 1- These equations imply that gn — 812^ Six — 822^ ^^^ 0=1, which is impossible. Thus, G does not exist. Similarly, we can show that H does not exist. Examples 2.3.1 and 2.3.2 illustrate two general results for a square matrix. Namely, if a square matrix is nonsingular (|A| / 0), its inverse always exists. We have shown this by the above adjugate constructions. Also, if a square matrix is singular (|A| = 0), it has neither a right nor a left inverse. The proof of this property requires a bit more than we have learned so far. We need to know that the determinant of a product of square matrices obeys the formula

|AB| = |A||B|. (2.3,29)

From this property, which is proved in Chapter 4, and the property |I| = 1, it follows that |GA| = 0 is always true if |A| = 0. Therefore, the equations GA = I and AH = I, upon taking the determinants, imply that 0=1, which is impossible. EXAMPLE 2.3.3. Find the right inverse of

A = (2.3.30)

The pertinent equation is AH = I,

_ hu hx2 _ _. 1 1 11 1 0 AH ^21 ^22 = (2.3.31) 1 2 0 ij Jhi ^32_ ^ The corresponding set of equations is

^11 - ^21 + ^31 = 1' ^12 - ^22 + ^*332 0 (2.3.32) ^U + ^21 + 2^31 = ^' ^^12 + ^22 + 2/I32 = 1. For arbitrary /131 and /132, these equations yield

1 - 3/131 l-3/i 1 32 H = -I-/131 \-h 32 (2.3.33)

2/131 2h 32 • • I Thus, the right inverse of A exists, but is not unique. Wmm EXERCISE 2.3.1. Show that the matrix defined by Eq. (2.3.30) does not have • I • a left inverse. TRANSPOSE AND ADJOINT 33

wmam EXERCISE 2.3.2. show that the matrix

1 -1 1 (2.3.34) 1 -1 1

has neither a left nor a right inverse. An important property of the inverses of square matrices is that

(AB)-^ = B-^A-\ (ABC)-^ = C-^B-^A-\ etc.; (2.3.35)

that is, the inverse of a string of products of matrices is the product of the inverse of the matrices written in the opposite sequence. To prove Eq. (2.3.35), let D denote the inverse of AB, i.e., D = (AB)-^ Then

DAB = ABD = I. (2.3.36)

However, B 'A" (ABD) = B HA^A)BD = B ^IBD = (B^B)D = ID = D. Since ABD = I and B-^A"^! = B-^A-^ it follows that D = B-^A-^ or

(AB)-^ =B-^A-^ (2.3.37)

To prove the property for longer strings, set

B' = BC. (2.3.38)

Then

(ABC)-* = (AB')-* = (B')-*A-* = C-'B-*A-^ (2.3.39)

where the result in Eq. (2.3.37) is used.

2.4. TRANSPOSE AND ADJOINT

We saw in Chapter 1 that the transpose A^ of the m x n matrix A is the n x m matrix formed by interchanging the rows and columns of A, i.e..

«11 «21 '^m\

^12 ^22 ^ml A^ = (2.4.1)

.^In ^2n

Accordingly, the adjoint A^ of A is the complex conjugate of the transpose of A, i.e..

Lt _ (2.4.2) 34 CHAPTER! VECTORS AND MATRICES

where a*j is the complex conjugate of the element a^j. In the shorter notation,

A = [a^jl A" = [aj,l A^ = [a*,]. (2.4.3)

Since a scalar a is a 1 x 1 matrix, the adjoint of a scalar is just its complex conjugate, i.e., a^ = a*. Of course, if A is real the adjoint and the transpose are the same since (A^)* = A^. In many cases, one needs the transpose or adjoint of matrix products such as AB, ABC, etc., for which the following property is useful:

(AB)'^ = B^A'^, (ABC)'^ = C^B'^A'^, etc., (2.4.4a) (AB)^ = B^A^ (ABC)^ = C^B^A^ etc.; (2.4.4b)

that is, the transpose (adjoint) of a product string of matrices is the product of the transposes (adjoints) of the matrices taken in reverse order. The proof of Eq. (2.4.4a) is straightforward. The ijih element of B'^A'^ is [B^A'^],^ = Y,^ bjj^ajj = Y.k ^jkK^ proving that (AB)'^ = B'^A'^. If we define D = BC, then we have just proved that (AD)'^ = D^^A^ and D'^ = C^B, so that (ABC)'^ = C'^B'^A'^. A similar proof holds for a product stream of any length. Moreover, since the adjoint is simply the complex conjugate of the transpose, Eq. (2.4.4b) follows immediately from Eq. (2.4.4a). The matrices in Eq. (2.4.4) must be conformable to multiplication but, in general, do not have to be square. ••• EXAMPLE 2.4.1. If A is an m x n matrix, x an m-dimensional vector (m x 1 matrix), and y an n-dimensional vector (n x 1 matrix), then the product x^Ay is a 1x1 matrix or a scalar. The adjoint of x^Ay, namely, (x^Ay)^ = y^A^x, is simply • • • its complex conjugate, (x^Ay)*. In many practical situations (such as reactor stability analysis or process control), one would like to know whether the equation

— = -Ax (2.4.5) dt predicts that an arbitrary perturbation x will decay to 0 or diverge in time. In such a problem, the matrix A is square, say nxn. The answer to the question is: if the scalar

z^(A + A'^)z

is positive for every nonzero n-dimensional vector z, then x -> 0 as r = oo for any initial condition. To prove this, note that

d\\xf d ^X (dx^\/^x^ \ +dx.dx dt

If X obeys Eq. (2.4.5), then

^ = -xt(A + AV (2.4.7) dt PARTITIONING MATRICES 35

If x^(A 4- A^)x is positive for every vector x, then Eq. (2.4.7) implies that

d\\x\\' < 0 for all time, (2.4.8) dt or the magnitude of all the components of x must approach 0. If, for any x, the product x^(A -f- A^)x is negative, then there will be some initial conditions for which the length of x, and hence the magnitude of its components, grow in time. In such a case, the reactor or control system would be unstable. Let us close this section by noting that the inverse of a transpose (adjoint) is the transpose (adjoint) of the inverse, i.e.,

iAT=(A-') KT and (Atr' = (A-')t. (2.4.9)

The properties T = I and I = (AA-')^ = (A-')TAT show that (A"')'^ = (A'^)-'. Similarly, it follows from 1+ == I that (A'^V = (A^)-'.

2.5. PARTITIONING MATRICES

Any matrix can be partitioned into an array of matrices and/or vectors. As long as the matrices A and B are conformable for addition or multiplication, they can be partitioned into conformable matrices of matrices. Consider, for example, the partition of the 3 x 6 matrix A denoted by the lines:

«11 a^2 a,3 «14 ^15 «16

A = «21 «23 «24 «25 «26 (2.5.1) ^21 *^23 .^31 «32 ^33 «34 «35 «36 where

1^11 ^12 ^13 «14 An = A19 — A,3 = [_'^21 ^22 ^^23 L«24 ^25 "26 (2.5.2)

A21 = [«31»«32i<^33]» ^22 = ^34' A23 = [«35'«36]- When A and B are conformable for addition, and are conformably partitioned, then their sum can be given by

M? Bu B, B, A + B = + B p«j >i B p2 B p«j (2.5.3) A„+B„ A„ + B„

A.,+B,,. By "conformably partitioned" we mean the elements A,^ and B^j have to have the same dimension for addition. 36 CHAPTER 2 VECTORS AND MATRICES

When A and B are conformable for multiplication (i.e., the number of columns of A equals the number of rows of B), the product of the partitioned matrices A and B is

M/; B, Bi. AB

LKi ^ ••• A,QPJ ^pl B^r. (2.5.4)

ZJ ^qk"k\ ' ' ' J2 ^qk^kr -k=] k=\ Note that the product A^j^B^^^y has to be conformable for each k and for all ij pairs in question. A partitioning that will often be useful in later chapters is the partitioning of a matrix x into its column vectors, i.e., — —1 Xii •^12 •^In x = [Xj, X2, . . . , X„J, (2.5.5) _^ml ^m2 X

where x^ is an m-dimensional vector with components x^j. To see how this is useful, consider the equation

AX = I (2.5.6)

for the n X n square matrices A, X, and I. The unit matrix I can be partitioned as

I = [ei,e2,...,eJ, (2.5.7)

where e, is a vector with 1 in the ith row and O's elsewhere, i.e.,

'0^

e. = (2.5.8)

The vectors e, are orthonormal, which means

e,^. = 5,,. (2.5.9) PARTITIONING MATRICES 37

If in Eq. (2.5.6) X is partitioned as shown in Eq. (2.5.5) and I as in Eq. (2.5.7), then multiplication of A and X using the rule shown in Eq. (2.5.4) yields

[Axi, Ax2,..., AxJ = [e,, 62,..., e„, ], (2.5.10)

or, equating matrix components.

Ax, = e,, / = 1,..., n. (2.5.11)

When A is not singular (|A| 7^ 0), Cramer's rule assures the solution to the equa- tions of (2.5.11). In this case, the matrix X is the inverse A~^ of A. Thus, by using matrix partitioning, we find yet another way to obtain the inverse of A, namely, to solve the linear equations Ax, = e, for each unit vector e,. The efficient solution of such equations is the topic of the next chapter. EXAMPLE 2.5.1. Find the inverse of

2 3 A = (2.5.12) 8 7

Axj =01 yields the equations

1 (2.5.13) Sxii +7JC21 = ^

or

10 X, = 4 (2.5.14) 5 Ax2 = 62 yields

"^12 ' •^22 — (2.5.15) •^12 •" ^"^22 ~~ or 3_ 10 X. = (2.5.16) _1 5 and so 2_ A-^=[Xi,X2] 10 10 4 (2.5.17) 5 _i 5 • • • It is easy to verify that AA = I. IHIH EXERCISE 2.5.1. (a) Give the solution to

Ax = b (2.5.18)

for arbitrary b for the matrix A in the preceding example. • • • (b) Calculate the transpose of the inverse of A. 38 CHAPTER 2 VECTORS AND MATRICES

2.6. LINEAR VECTOR SPACES

In the analysis of linear systems, it is frequently useful to define a vector space S and linear operators in that space. Generalization of the concept of vector length to the concept of operator norm can lead to some very strong statements about the existence of solutions and the convergence of some solution techniques. We say 5 is a linear vector space if it is a collection of elements x, y,... for which addition is defined and which obey the following properties: 1. If X and y belong to S, then x -f y belongs to S; i.e., if x, y e S, then X + y G 5'. 2. x + y = y-f X. 3. There exists the zero element 0 such that

x + 0 X. 4. For every element x, there exists an element —x such that

X -f (-X) = 0. 5. If Of and p are complex numbers, then a{p\) = {aP)x (a -f p)x = ax + jSx a(x + y) =ax + ay. The collection £„ of all n-dimensional vectors obeying the previously given rules of addition and multiplication by a scalar obviously has the properties 1-5. Thus, E,^ is a linear vector space. In much of this book, E^ will be the linear vector space of interest. However, the properties of a linear vector space admit much more general objects than n-dimensional vectors. A pertinent example is a collection of all functions sharing common properties, such as continuity, differentiability, integrability, and the like. Another example is the collection of all n x m matrices. An important vector concept is linear independence. If Xj and X2 are linearly dependent, then there exists a complex (or real) number c such that x, + CX2 = 0. For example, if

X, = and (2.6.1)

then Xi = 2X2 or c = —2. If x^ and X2 are linearly independent, then there is no number c such that x^ + CX2 = 0. For example, the vectors

and X. = (2.6.2)

are linearly independent because there is no multiplier of X2 that will place 1 in the second component of X2. The general definition of linear independence is that the set of p n-dimensional vectors {x,, X2,..., x^} are linearly independent if there exists no nonzero set of numbers {c^, C2,..., c^} (not all of which are 0) such that

Ec,x,=0. (2.6.3) LINEAR VECTOR SPACES 39

EXAMPLE 2.6.1. Prove that the vectors 3 1 X, = 2 ' \2 = 2 ' and X, = (2.6.4) 1 _3_

are linearly dependent. We seek a solution to

(2.6.5) /=1 or 3ci 4- ^2 + 5c3 = 0 2c, + 2c2 + 6C3 = 0 (2.6.6) c, + 3c2 + Vc^ 0. Assume c^ ^ 0 and solve the first two equations for c, and C2. Muhiply the first equation by 2 and subtract the second equation

2(3ci + C2 = -5C3) 2c, -f 2c2 = —6C3 (2.6.7) 4c, = -4c3 or c, = —C3 and C2 = —2C3. The values c, = —C3 and C2 = —2C3 satisfy the third equation in Eq. (2.6.6). Therefore, C3 = 1, c, = — 1, and C2 = —2 are a solution to Eq. (2.6.5), and so the • • • vectors x,, X2, X3 are linearly dependent. We create what is known as a normed linear vector space by assigning to every vector x in 5 the norm (or length) ||x||. By definition, the norm is required to satisfy the properties: 1. ||x|| > 0 and ||x|| = 0 if and only if x = 0. 2. ||Qfx|| = \a\ ||x|| for any complex number a and any x. (2.6.8) 3. I|x + y||<||x|| + ||y||. Here, \a\ denotes the magnitude of the complex number a. Property 3 is known as the "triangle inequality" after the well-known property of Euclidean vectors as illustrated in Figure 2.6.1. In ordinary Euclidean space.

FIGURE 2.6.1 40 CHAPTER 2 VECTORS AND MATRICES

this inequality represents the property that the shortest distance between points a and ft is a straight line (which lies along x 4- y and is of length ||x + y|| = [Y^i=\{^i + yd^]^^^^ where x, and y^ denote the Cartesian coordinates of x and y). Clearly, if the norm ||x|| is defined as the square root of the inner product x^x, then properties 1-3 follow. However, this definition of the nomi is only one among many, and it is not always the most convenient one. For example, if x € £„, the quantity

l|x|l (2.6.9)

for any positive real number p is also an acceptablacceptat) e norm. The special choice of p = 2 corresponds to the familiar length \/?vVxx introducei d earlier. For the case p = oo, it follows that

\lp ltx||^= l\m\\xj''f:{\x,\/\xjy

= \xj\im{v)'/'' = \xjv° (2.6.10) p-^oo = \x„ where |x^| is the magnitude of jc, of maximum value, i.e.,

\xj =max|xj, !>/>«, (2.6.11)

and V is the number of components jc, having the same maximum magnitude. For example, if

X = (2.6.12)

where 0 is real and i = y—T, then

|jCi| = l/2, |JC2| = 1, and kal = 1.

and so |jc^| = 1 and v = 2. There are still other norms that we can define. For example, consider a self- adjoint matrix A (i.e., A^ = A) with the additional property that

x^Ax > 0 for all X 7^ 0 in E^.

Such a matrix is called positive definite. The quantity x^Ax is real since (x^Ax)* = x^A^x = x^Ax, and thus the quantity

||x|| = Vx^Ax (2.6.13) LINEAR VECTOR SPACES 41

obeys the requisite properties of a norm. Similarly, if A is positive definite, the quantity

M„A = {E[(E4^;)(E«-7^.)]'"Y'' (2-6.14)

obeys the conditions for a norm for any positive real number p. Next, we want to define linear operators in linear vector spaces. An operator A is a linear operator in S if it obeys the properties:

1. If X € 5, then Ax € 5. (2.6.15) 2. If X and y € 5, then A (ax + ^y) = a Ax -f ^Ay e S, (2.6.16)

where a and fi are complex numbers. If 5 = £„, an n-dimensional vector space, then all n x n matrices are linear operators in £"„ when the product Ax obeys the rules of matrix muhiplication. Square matrices as operators in vector spaces will preoccupy us in much of this book. However, if 5 is a function space, then the most common linear operators are differential or integral operators (more on this in Chapters 8-10). We define the norm of a linear operator as

||A||=max||Ax||/||x||. (2.6.17)

Thus, IIAII is the largest value that the ratio on the right-hand side of Eq. (2.6.17) achieves for any nonzero vector in S. From properties 1~3 of the norm of a vector, the following relations hold for the norm of a linear operator:

1. ||A|| > 0 and ||A|| = 0 if and only if A = 0. (2.6.18) 2. ||aA|| = |a| ||A|| for any complex number a. (2.6.19) 3. ||A + B||<||A|( + ||B||. (2.6.20) 4. IIABII < IIAII ||B||. (2.6.21)

Conditions 1 and 2 are easy to establish from the properties of ||x||. To prove condition 3, note that

II(A + B)x|| = IIAx + Bxll < IIAxil + ||Bx||, (2.6.22)

and so n.A_LitMi ll(A + B)x|| (A + B) = max -— x#o ||x|| IIAxll IIBxIl < max -—— + max ——- x^O Ijxll x^O ||X|| < IIAII + IIBJI. (2.6.23)

For condition 4, note that

IIAyll < IIAII llyll (2.6.24) 42 CHAPTER! VECTORS AND MATRICES

follows from the definition of ||A||. But if we set y = Bx and use again the property

||y|| = ||Bx||<||B||||x||, (2.6.25)

we find

IIABxIl < IIAII ||B|| llxll (2.6.26)

or IIABxIl IIABII = max \-^ < ||A|| ||B||, (2.6.27) x#o ||x|| which proves condition 4. •^•1 EXERCISE 2.6.1. Show that, for the space E^ and the norm ||x||^,

||A|U = maxX:K7l (2.6.28) \

IIAII, = max X:KI' (2.6.29)

i.e., that the /? = cx) norm of the matrix A is the maximum of the sum of the magnitudes of the row elements and the /? = 1 norm is the maximum of the sum of the magnitudes of the column elements. Let us close this chapter with an example of how operator norms can be useful. Suppose we want to solve the linear equation

X - Ax = b. (2.6.30)

The above equation can be rearranged to

x = Ax + b. (2.6.31)

Letting x^*^^ denote an initial guess of the solution, the next estimate to the solution can be obtained from x(i) ^ Ax^^^ + b. (2.6.32)

Continuing this process, known as Picard iteration, gives for the A:th estimate to the solution

x(^) = Ax^*-^^ -f- b. (2.6.33)

The question is: will this solution converge? Suppose x is the true solution to Eq. (2.6.30). Then subtraction of Eq. (2.6.33) from the successive iteration equa- tions yields

X - x^^^ z= A(x - x^^^) (2.6.34) X - x^*^ = A(x - x^^-^^) = A*(x - x^^^). (2.6.35) PROBLEMS 43

Taking the norm of Eq. (2.6.35) and using the properties of the norm, we obtain

||x - x^^^ll < ||A|h|x - x^^^ll. (2.6.36)

Thus, if IIAll < 1, Eq. (2.6.36) guarantees that ||x —x^'^^H -> 0 as A; -> oo and so in Picard iteration x^*^ converges to the solution of Eq. (2.6.30). If ||A|| > 1, Picard iteration may or may not converge to a solution. In the next chapter, we will introduce iterative solution schemes whose con- vergence can be analyzed in terms of vector and matrix norms.

PROBLEMS

1. Show that the unit vectors e, in an n-dimensional space are similar to Cartesian unit vectors in three-dimensional Euclidean space. 2. Find the inverse of

-1 1 0 A = 1 -2 1 0 1 -2

Calculate the determinant of the inverse of A. 3. Find the maximum value of the function

/ = x^Ax - 2b^x,

where A is the matrix defined in Problem 2 and

b =

Recall that 9//3JC, = 0, / = 1, 2, and 3, at the maximum. 4. Calculate the scalar or inner product y^x, the lengths ||x|| and ||y||, and the dyadic xy^ where

X = and y =

and / = v^^. 5. Compute the adjugate and inverse of the matrix defined in Problem 2. 6. Find the right inverse of

1 2 2 A = 3 1 1 44 CHAPTER 2 VECTORS AND MATRICES

7. Prove that a two-dimensional space in which the quantity ||x|| is defined by

||x|| = x^Ax

is a normed linear vector space if

2 -1

Hint: Show that the properties in Eq. (2.6.8) are obeyed. 8. Prove that the vectors 1 5 X, = 2 X2 = 1 and X. = 3 2 are linearly independent. 9. Consider the equation X = Ax + b, (1) where

1 1 1 2 4 5 1 1 6 3 7 and b =

5 4 4 Prove that Eq. (1) can be solved by Picard iteration, i.e., if

x<^+^)=Ax^'^+b, it = 1,2,...,

then x^^^ -^ X, the solution of Eq. (1), as A: -> oo. 10. Calculate the adjugate and inverse of the matrix

1 5 3 A = 2 1 7 3 2 1 11. Show that if cos^ — sin^ A(^) = sin^ — cosO

then

(1) Consider the two-dimensional Euclidean vector represented as

X =

Examine A(6) \ and give a geometric interpretation of the result in Eq. (1). PROBLEMS 45

12. Suppose that A = [a,y], where

^ (n-J -/ + 1)!0 - 1)!

Compute A^. 13. Find all 2 x 2 solutions of the equation

A" = A,

where n is a positive integer. Consider first the case n = 2 and then solve for the general case. 14. Prove that, in general.

Under what conditions would e^^+®^^ = e^U^^ be true? 15. Suppose

— = A(Ox, x{t = 0) = Xo, (1)

at

where the elements a^j of A depend on t,

(a) Use the result in Problem 14 to prove that = expl / A(T)dT IXQ

is not a solution to Eq. (1). (b) If A is independent of time, prove that

X = ^^'Xo

is the solution to Eq. (1).

16. Prove that the Frobenius norm,

1/2 IIAII

obeys the requisite conditions of a norm, namely, Eqs. (2.6.18)~(2.6.21). 17. (a) Define a normed linear vector space for the linear vector space which consists of all the « x m matrices, (b) What are the linear operators in the space? 46 CHAPTER 2 VECTORS AND MATRICES

FURTHER READING

Bellman, R. (1970). "Introduction to Matrix Analysis." McGraw-Hill, New York. Bronson, R. (1995). "Linear Algebra: an Introduction." Academic Press, San Diego. Hoffman, K. and Kunze, R. (1971). "Linear Algebra." 2nd Ed., Prentice Hall International, Englewood Cliffs, NJ. Householder, A. S. (1965). "The Theory of Matrices in Numerical Analysis." Blaisdell, New York. Nomizu, K. (1966). "Fundamentals of Linear Algebra." McGraw-Hill, New York. Noble B, and Daniel, J. W. (1977). "Applied Linear Algebra." Prentice Hall International, Englewood Cliffs, NJ. Smiley, M. E (1951). "Algebra of Matrices." Allyn & Bacon, Needham Heights, MA. Wade, T. L. (1951). "The Algebra of Vectors and Matrices." Addison-Wesley, Reading, MA. SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

3J. SYNOPSIS

In this chapter we will deal exclusively with problems involving square matrices. We know from Cramer's rule that if |A| ^^ the linear system Ax = b has a unique solution. However, we learned in Chapter 1 that a straightforward application of Cramer's rule for a large system can be prohibitively expensive. In the next section we shall discover that a method due to Gauss (Gauss elimination) greatly reduces the cost of solving linear equations. We will explain how Gauss elimination can be used to find the inverse of matrices and the LU-decomposition when it exists. The LU-decomposition of a matrix can lead to reduced costs of solving sequential linear equations involving the same matrix. It also can be the most economical way to store the matrix A for eventual computer solutions to linear equations. In each row of a , there are nonzero elements only in p columns to the left and q columns to the right of the main diagonal element. This structure leads to very efficient LU-decomposition ii p -{-q is small compared to the dimension of the vector space. Despite the power of Gauss elimination, it is often more efficient to use iterative methods for solving linear equations. We will consider three such methods—the Jacobi, the Gauss-Seidel, and the successive overrelaxation (SOR) methods—and explore the conditions for the convergence of each. We will close the chapter by introducing the Picard and Newton-Raphson methods for solving nonlinear algebraic systems. The Newton-Raphson method is the more powerful, albeit more complicated, of the two methods. Getting a good first guess is essential

47 48 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

to the success of both methods and thus continuation methods are discussed. We will see that, when the Newton-Raphson method converges, it does so quadrati- cally with the number of iterations. This quadratic convergence is not only very efficient, but it also provides a fingerprint whereby programming errors can be detected.

3.2. SIMPLE GAUSS ELIMINATION

Consider the equation Ux = b, where U is an upper triangular matrix; i.e., U has nonzero elements only on and above the main diagonal. The equations correspond- ing to Ux = b are

W 22-^2 + " = b. (3.2.1)

Symbolically, these equations can be represented in Fig. 3.2.1, where the triangle represents the matrix U and the vertical lines represent the vectors x and b. Expansion by cofactors shows that the determinant of U is given by |U| = H-Li ^^r Thus, if none of the diagonal elements of U is 0, then |U| ^ 0 and Eq. (3.2.1) has a unique solution. The solution can be determined by back substitution, namely.

x„ =

n-I ^n-\,n-^n ^n-l *«-!.«-

FIGURE 3.2.1 SIMPLE GAUSS ELIMINATION 49

or

" u (3.2.2) ^^_^-I..=.>i^.-.^.^ /=«-l,n-2,...,l.

From x„ we compute jc„_i, from which we compute x„,2^ etc. until {jf„,..., jCj} have been determined. To estimate the time it takes to evaluate the x^, we ignore the faster addition and subtraction operations and count only the slower multiplication and division operations. We note that at the iih step there are {n — /) multiplications and one division. Thus, the total number of multiplications and divisions can be determined from the series 1 n J ^(^ -/ + !) = Y^(n -/ + !) = -n{n + 1). (3.2.3)

The way to evaluate a sum such as that in Eq. (3.2.3) is to note that in analogy with the integral

^(n — / + 1) Ai <-> / (n - / -f 1) di '=* ^ (3.2.4) = n(n-l)-^(n^-l)+/T-l,

we expect a sum of first-order terms in n and summed to n terms to be a quadratic function of n. Thus, we try

n Y^(n -/ + !)= an^ + fo/z + c. (3.2.5) i=\ By taking the three special cases, « = 1, 2, and 3, we generate three equations for a, b, and c. The solution to these equations is a = b = ^ and c = 0, thus yielding Eq. (3.2.3). By blind computation with Cramer's rule, we saw in Eq. (1.3.9) that the num- ber of multiplications and divisions of an nth-order system is (n — \)(n/e)" or 3.7 X 10^^^ if w = 100. By back substitution, Eq. (3.2.1) requires 50(101) = 5.05 X 10^ such operations if n = 100. Quite a savings! Similarly, if L is a lower triangular matrix, the linear system Lx = b is

^21-^1 "^ '22-^2 ~ ^2 (3.2.6)

represented symbolically in Fig. 3.2.2. The determinant of L is |L| = Vl"=\hi- Thus, if no diagonal elements are 0, |L| ^ 0 and Eq. (3.2.6) has a unique solution, 50 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

FIGURE 3.2.2

which can be computed by forward substitution, namely, b,

Ml bj l2[Xi

I2 2

or

^1 Xi = (3.2.1)

X: = / = 2,..., n. I. We note that at the ith step there are / — 1 multiplications and one division. Thus, solution of Eq. (3.2.5) by forward substitution requires 1 J2i = •;^n(n-\- 1)

multiplications and divisions, which is the same as solution of Eq. (3.2.1) by back substitution. Clearly, if the matrix A is an upper or lower triangular matrix, the cost of solving Ax = b is much smaller than what is expected by Cramer's rule. The method of Gauss takes advantage of such savings by transforming the problem to an upper triangular problem and solving it by back substitution. One could, equally easily, transform the problem to a lower triangular matrix and solve it by forward substitution. Consider the set of equations

2JCI + JC2 + 3^:3 = 1

3:^1 + 2:^2 + ^3 = 2 (3.2.8)

4xi + ^2 4- 2JC3 = 3. SIMPLE GAUSS ELIMINATION 51

We can multiply the first equation by — | and add it to the second equation and then multiply the first equation by — | and add it to the third equation. The result is

ij,, -lx, = l- (3.2.9) 2 2 "^ 2

—X2 — 4JC3 = 1.

Now we multiply the second equation by 2 and add it to the third one to obtain 2Xy -\- X2 + 3JC3 = 1 1 7 1 2^2-2^3 = 2 ^^-^-^^^ -11JC3=:2.

Note that the equations have been rearranged into upper triangular form. By back substitution, 2 Xj —"I T 3 X2 = (3.2.11) ) " 0 + ^ + 10 = sl = JC, 2 IT The preceding steps constitute an example of solution of an algebraic system by Gauss elimination. In the general case, one wants to solve the system Ax = b, or

a^Xi + ^12^:2 -f- ai3^3 + • • • + ^In^n = ^1

«21-^1 + «22-^2 + «23-^3 + ' ' ' + «2n-^« = ^2 (3.2.12)

As the first step in the elimination process, we multiply the first equation by —ail/an and add it to the /th equation for / = 2,..., n. The result is

a\\^x, + al^'^2 + «l3 ^3 + • • • + <^n = ^1

«22^^2+«23^^3 + ---+«2«^^«=^2 (3.2.13)

^:^2+^:^3+---+«;i^n=e

The coefficients in the first equation are unchanged and are denoted as a^^j^ to distinguish them from the coefficients in the equations / = 2,...,«, which have 5 2 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

changed. In fact,

(2) _{\) "/I ^(1) % """'7 (1) 1; ' I, J —L, , . . ,n. a II (3.2.14) bf>=Z,.">-V/'- '=2,...,n. Ml

In the next step, the second equation is multiplied by —a^a l^ri ^^^ ^^^ result is added to the /th equation for / = 3,..., n. This eliminates the terms in X2 in equations / = 3,..., n and generates the set of equations

^22 ^2 "^ ^23 ^^3 + ' ' ' + ^2n ^n ^ ^2 af^x, + --- + 4„^x„=bf^ (3.2.15)

Continuation of this process eventually transforms Eq. (3.2.12) into the upper tri- angular problem

a\\^x, + a|^^X2 + a[\\, + • • • + a['^x„ = b'^^

«2?^2 + «2?^3 + • • • + «2n^^'n = ^f -,(3)^ , I ^(3)^ _ 1,(3) <3% + "-+<^«=^3 (3.2.16)

^r, which can be solved by back substitution. This procedure is Gauss elimination in its simplest form. Since the operations used in transforming Eq. (3.2.12) to Eq. (3.2.16) involve only the additions to or subtractions from various equations by multiples of other equations, the determinant of the matrix in Eq. (3.2.16) is the same as the deter- minant of A, i.e.,

|A| = n4f. (3.2.17) k=l

Thus, a by-product of Gauss elimination is the evaluation of the determinant of A. Let us now calculate the number of multiplications and divisions needed for Gauss elimination. At the A;th elimination step, there are / = fc + 1,..., n ratios af^/af^ to compute and 7 = /: -f- 1,..., n + 1 products {af^^ la^kk)^tj to compute for each /. For the purpose of counting, we call bf^ the {n + l)th element. Thus, SIMPLE GAUSS ELIMINATION 53

it takes

n-l E(^ - ^) + E(« - ^)('^ + l-k) = ln(n - 1) -f \n(n^ - 1) (3.2.18) k=\ k=\ ^ ^

multiplications and divisions to carry out the upper triangularization of the system of equations. To this we must add ^n{n + 1) multiplications and divisions to carry out the back substitution to get a final solution. Thus, the total number of operations for Gauss elimination is ^n{n^i.,/^^2 -f_L- 3Qn^ _ j^ With A7 = 100 and a computer taking 10"^ sec for an arithmetic operation, a Gauss elimination solution costs about |(100)^ X 10"^ sec = 3.3x lO"'* sec compared to 3.7x 10^"^^ years for Cramer's rule. In carrying out the steps of Gauss elimination, the variables jr^, JC2,... play only a passive role. What is happening is that the

augA = [A,b] (3.2.19)

is being rearranged by addition and subtraction of multiples of rows. In the case of the problem in Eq. (3.2.8), the augmented matrix is

2 13 1 augA 3 2 12 (3.2.20) 4 12 3

The first step in Gauss elimination is the subtraction of | times the first row from the second row and twice the first row from the third row to obtain

2 1 3 1 1 7 1 0 (3.2.21) 2 ~2 2 0 -1 -4 1

In the next step, twice the second row is added to the third row to obtain

2 1 3 1 _7 1 « \ = [A„,bJ. (3.2.22) ~2 2 0 0 -11 2

Once Gauss elimination has transformed the augmented matrix to the form [Atf, b,J, where A^. is an upper triangular matrix, the solution to Ax = b can be computed from

A„x = b^ (3.2.23)

by backward substitution. 5 4 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

Gauss elimination as just carried out above was based on the assumption that the coefficients afj^ were always nonzero. This is not always the case, even when |A| ^ 0 and a solution is assured. Consider the equations

^1 ~\~ ^X'2 + 3^3 = 1

2JCI -h Ax2 -f 2JC3 == 1 (3.2.24)

X^ 1 -^2 + A:3 = 3.

the first step in Gauss elimination,, the set becomes

vVj -|- 2wV2 "f 3JC3 = 1

-4JC3 = -1 (3.2.25)

~X2- - 2;c3 = 2.

In this case, af^ = 0 and so the next step would fail since ^23/^22' = 00. On the other hand, if the second and third equations are interchanged, we obtain

^j + 2JC2 -f 3JC3 = 1

-JC2 - 2x3 = 2 (3.2.26)

-4^3 = -1, which can be solved by backward substitution. As long as |A| ^ 0, at any step k in Gauss elimination there will exist at least one equation / > k such that the coefficient a^f ^ 0. Otherwise, the determinant of A would be 0, in contradiction to the nonsingular case we are considering in this chapter. A strategy of implementation of Gauss elimination that lends itself to an easy computer program can now be described. We want to solve the nxn problem Ax = b. First, we define the components of bi as h^ = a, „_,_i and the equations as rows /?,, i.e.,

il- ^11-^1 + ^12^2 + • —h«i„A:„ — «i „+i

ii- ^21-^1 + «22-^2 + • • • -^ ^2n^n — %,n+l (3.2.27)

n • ^n\^\ + a^2^2 + ''• * ' ^nn^n ^ ^n,n-fl

The elements of the augmented matrix, aug A, are fl,y, / = 1,..., n, 7 = 1,..., n + I. In a computer code to execute Gauss elimination, we set the dimension of the augmented matrix to be /i x (n + 1), input the elements ay, and set DET = I. Then we carry out the following sequence of operations:

Elimination Step 1. For / = 1,..., n — 1, do steps 2-5. Step 2. Find the first-row p, i < p < n, with nonzero entry in the iih column (a^^i / 0). If none is found, then NO SOLUTION, |A| = 0, and STOP. Otherwise, GAUSS ELIMINATION WITH PIVOTING 5 5

Step 3. If p ^ i, then perform a row swap

Rp ^ Rf (3.2.28)

and set

DET = {-\)DET. (3.2.29)

Step 4. For 7 =/ + 1,...,«, do step 5. Step 5. Set

/?,-> R:-^R:, (3.2.30)

Step 6. If a„„ = 0, then NO SOLUTION, |A| = 0, and STOP.

Backward Substitution Step 1. Set

x^ = ^S!l±l., (3.2.31)

5^^p 8. For / = n — 1,..., 1, set

^i = («i,n+i - E ^ij^j)/^ii' (3.2.32) j=i+i

5rep 9.

\A\ = DETl\a,,. (3.2.33)

Mathematica routines are provided in the Appendix for the Gauss elimination algo- rithm described above and in Section 3.3. In the computer program just outlined, we assume that the elements of the augmented matrix are written over at every elimination step. Thus, at the end of the elimination process, the augmented matrix is composed of the elements Oy\ i = 1,..., w, j = /,..., n 4- 1. Note that an interchange of rows changes the sign of the determinant, and so the role of Eq. (3.2.29) is to keep track of the net sign change. Subtracting a multiple of a row from other rows does not change the sign of a determinant, and so there is no effect on the determinant in Eq. (3.2.30). The determinant of A is then the product Hi ^ti tidies the net sign change from row swaps; hence Eq. (3.2.33) follows.

3.3. GAUSS ELIMINATION WITH PIVOTING

If all our arithmetic is done with infinite , Gauss elimination with row swaps as described in the preceding section will always yield the correct solution 5 6 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

to Ax = b whenever |A| ^0. However, with finite-figure floating-point arithmetic, round-off errors can cause problems. For example, consider the equations

jc, + I.OOOOIJC2 + 2JC3 = 2 (3.3.1)

^1 -1-2X2 +2-^3 = 1- Gauss elimination reduces these equations to

X^+X2+X2 = 1 0.00001jC2-fJC3 = 1 (3.3.2)

99,999JC3 = 100,000.

Let us now perform back substitution with a computer (admittedly a little fellow, but it makes the point) that does four-figure floating-point arithmetic. We find

JC3 = 1.0000, X2 = 0, jci = 0. (3.3.3)

The actual solution is, of course,

^3 = -^^^ = -^2, ^1 = 1. (3.3.4)

One way to try to avoid round-off errors is to introduce so-called "pivoting" strategies in executing Gauss elimination. Consider the general elimination problem at the /cth step. The equations are of the form

<^n -^1 ~^ ^12-^2 + 4. ^(2) _ , (2) ^22 ^2 "^

%-] + (3.3.5)

+ •< ^ ^kn ^n — ^k

These equations can be represented schematically as in Fig. 3.3.1. When the row swap as described in the previous section is not numerically stable, partial pivoting is frequently used to increase stability. In this pivoting strategy, we determine the largest magnitude of the elements aj^, j = k, k -h \, ..., n. Suppose this is the element a^^^, i.e.,

l4',^| = max|af;|, (3.3.6)

i.e., a^pl is the largest magnitude element of the fcth column lying below the (fc—l)th row. In the partial pivoting scheme, rows k and p are swapped and the next Gauss elimination step is carried out. The interchange process of partial pivoting is shown in Fig. 3.3.2. GAUSS ELIMINATION WITH PIVOTING 57

FIGURE 3.3.1

EXERCISE 3.3.1. Show that partial pivoting eliminates the round-off problem encountered in solving Eq. (3.3.1). Note that the only difference between partial pivoting and the Gauss elimina- tion process outlined in Eqs. (3.2.27)-(3.2.33) is that in the former one hunts the first nonzero ajl\ i > k, whereas in the latter one seeks the largest of |a,-f |, / > k. In either case, once the row of the object element has been determined, say row p, one interchanges rows k and p and continues the Gauss ehmination process. Thus, in partial pivoting the only change in the process given by Eqs. (3.2.27)-(3.2.33) is to replace the search for the first-row p, k < p < n, with nonzero a^f^ by the search for the row p with the maximum of |fl,-f |, / > k. Although not as often used, complete pivoting gives even further numerical stability. In this scheme, one hunts the row p and column q containing the largest value of \ay^\i = k,... ,n, j = k,... ,n, i.e.,

7^*^ I = max (fe)| (3.3.7) ^^ k

The rows k and p and columns k and q are then interchanged so that element a^^^^ becomes the pivot element for the next step of Gauss elimination. Schematically, the interchange is shown in Fig. 3.3.3. Complete pivoting costs substantially more to compute and is more complicated to program, and so it is not usually employed in canned computer programs. The column interchange amounts to renaming the elements of the vector x, and so this feature needs to be added to a computer program executing complete pivoting.

row k

row p

FIGURE 3.3.2 58 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

column k column q

row k

row p

FIGURE 3.3.3

In closing, we note that the simplest Gauss elimination procedure, without row interchange or pivoting, frequently works and is guaranteed to be successful in two cases: (1) when A is strictly diagonally dominant, i.e..

(3.3.8) 7 = 1

and (2) when A is positive definite, i.e., x^Ax > 0 for all x 7^ 0.

3.4. COMPUTING THE INVERSE OF A MATRIX

In Chapter 1 we saw that the inverse of a nonsingular matrix (|A| 7*^ 0) is given by

A"* = TTT^^JA (3.4.1)

where the ij element of adj A is the ji cofactor of A. In retrospect, this would be an expensive way to compute A~^ A cheaper method can be devised using Gauss elimination. Consider the p problems

Ax, =b,, / = l,...,p. (3.4.2)

These can be summarized in partitioned form as

[Axi, Ax2,..., AXp] = [bj, b2,..., b^]

or

A[Xi,...,Xp] = [bi,...,bp]. (3.4.3) COMPUTING THE INVERSE OF A MATRIX 59

The augmented matrix corresponding to Eq. (3.4.3) is

[A,bi,b2,. ,bp]. (3.4.4)

If Gauss elimination is carried out on this augmented matrix, the number of division and multiplication operations is

n—\ n—l t 1 J2(n-k)-^Y.(n-k)(n + p-k)=^-n(n^-l)-\--pn{n-l). (3.4.5)

Programming Gauss elimination for Eq. (3.4.4) is the same as that which was oudined in Eqs. (3.2.26)-(3.2.32), or its variation with pivoting, except that the augmented matrix is of dimension n x (n + p) and it has elements a,j, /, ] — !,...,«, and a, „^^ = 6,^, / = 1,..., n, ; = 1,..., /?. Gauss elimination transforms Eq. (3.4.4) into

[Ajy, bj tr,..., b^ jj, (3.4.6)

where A^^ is an upper triangular matrix. The solutions to the equations in Eq. (3.4.2) can now be computed by backward substitution from

A X — b / = 1,. . . ,p. (3.4.7)

The number of division and multiplication operations required to solve these equa- tions is

-«(n - 1), (3.4.8)

and so the total number of operations needed to solve Eq. (3.4.2) for x, is the sum of Eqs. (3.4.5) and (3.4.8), i.e.. 1 -n(n - \)^-pn(n- 1). (3.4.9)

Let us now consider the problem of finding the inverse of A. Define x, to be the solution of

Ax, = e,, / = 1,... ,n, (3.4.10)

where e, is the unit vector defined in Chapter 2:

o'

(3.4.11) 60 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

where only the /th element of e, is nonzero. Defining the matrix

X = [Xi,...,x,], (3.4.12)

we see that

AX = [Ax,,..., AxJ = [e„ ... ,ej = I, (3.4.13)

where I = [5,^] is the unit matrix. We now assume that A is nonsingular; i.e., A~^ exists. Multiplying Eq. (3.4.13) by A"^ we find

X = [xi,...,xj = A ^; (3,4.14)

i.e., the inverse of A is a matrix whose column vectors are the solution of the set of equations in Eq. (3.4.10). We have just shown that the number of divisions and multiplications for solving n such equations is

nin^~ l)+n^(n-l). (3.4.15)

Thus, for large n, the inverse can be computed with |n^ operations. EXAMPLE 3.4.1. Find the inverse of

2 1 0 A = 1 -2 1 (3.4.16) 0 1 -2

The appropriate augmented matrix is

2 1 0 1 0 0 1 -2 1 0 1 0 (3.4.17) 0 1 -2 0 0 1

After the first elimination step, the matrix is

2 1 0 10 0 3 0 1 1 1 0 ~2 0 1 -2001

After the second step,

-] 2 1 0 1 0 0 3 1 l^tr' ^1, tr' ^2, tr» ^3, trl — 0 1 1 0 (3.4.18) ~2 2 4 1 2 0 0 1 ~3 3 3 LU-DECOMPOSITION 61

Solving AtrXj = e, ^r by backward substitution, we obtain

3 1 1 4 2 4 i-i 1 1 [X|, X2, X3] — -1 (3.4.19) ~2 ~2 1 1 3 - 4 2 4 J

3.5. LU-DECONPOSmON

Under certain conditions, a matrix A can be factored into a product LU in which L is a lower triangular matrix and U is an upper triangular matrix. The following theorem identifies a class of matrices that can be decomposed into a product LU. LU-DECOMPOSITION THEOREM. IfAisannxn matrix such that

lAJ^O, /» = 1,... ,n — 1, (3.5.1)

where A is the matrix formed by the elements at the intersections of the first p rows and cohimns of A, then there exists a unique lower triangular matrix L = [/,y], /„ = 1, and a unique upper triangular matrix U = [u^j] such that

A = LU. (3.5.2)

The determinants |A^|, /? = 1,..., n, are known as principal minors of A. Note that since |A| = |A„| = 0 is allowed by the theorem, an LU-decomposition can exist for a singular matrix (i.e., for a matrix for which A"^ does not exist), but not too singular since |A„_,| ^ 0 implies the rank of A is greater than or equal io n — \. An example of a matrix obeying the hypothesis of the theorem is

-2 1 0 A = 1 -2 1 (3.5.3) 0 0 0 In this case,

1 0 0

L = -2 ' (3.5.4) 0 0

and

1 0 U 0 -5 1 (3.5.5) 0 1 62 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

The reader can readily verify that LU = A and that |A| =0. Of course, A need not be singular, as the example

-3 1 1 A = 1 -3 1 (3.5.6) 1 1 -3

illustrates. In this case.

0 0

L = 1 0 (3.5.7) 3 1 L ~3 -2 ' and

-3 1 1 8 4 U = 0 (3.5.8) ~3 3 0 0 -2

It is worthwhile to note that the hypothesis of the theorem only constitutes a sufficient condition for LU-decomposition. For example, if

0 1 A = (3.5.9) 0 1

then |A[| = 0 and IA2I = |A| = 0. Nevertheless, there exist L and U such that

LU = A. (3.5.10)

In particular, if

1 0 0 1 and U = (3.5.11) 1 1 0 0

then

0 1 LU = A. (3.5.12) 0 1

If A = LU, it follows from the rules of matrix multiplication that

min(/, j) (3.5.13) k=l LU-DECOMPOSITION 63

The upper limit on the summation over k comes from the fact that L has only zero elements above the main diagonal and U has only zero elements below the main diagonal. When a matrix can be upper triangularized by Gauss elimination without interchange of rows (which is true, for example, for positive-definite matrices), LU- decomposition is generated as a by-product. Recall that at the ^th step of Gauss elimination we eliminate the coefficients aj^K i > k, by replacing the elements a^f, /, 7 =/: + l,...,n, by

a^+^> = al^' - m|f 4\ / = fc + 1,..., /I, 7 = ^,..., n, (3.5.14)

where

^^-i)^ /=^,...,«. (3.5.15)

Let us define a lower triangular matrix L = [/,y] by

/o = {5' '•-'' <3.5.16) [0, I < J. RecaUing that ajj^ = a,^, the original elements of A, and expressing Eq. (3.4.14) for the sequence of k's ranging from 1 to r, we obtain

-/n4\ / =2,... ,/i, 7 = 1,. .. ,/i,

ij ij

"ij "iJ h3^3j (3.5.17)

or

«.7=«.v +EUy- (3.5.18) k=\

Gauss elimination generates an upper triangular matrix with elements ajjJi) on and above the main diagonal and 0 below the main diagonal. With r = / — 1, Eq. (3.5.18) becomes

I (k) aij=a^ + j:hk4 *"' (3.5.19) Jk)

k=i 64 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

since /,, = 1. Since af-' = 0 if A: > 7, Eq. (3.5.19) becomes

min(/,y) (3.5.20)

Comparison of Eqs. (3.5.13) and (3.5.20) yields the conclusion that the LU- decomposition of A is

1 0 0 0

m-y 0 0 L = (3.5.21)

L ^ni ^ni m„ and

,(0 "11 "12

(2) 0 a 22 U = (3.5.22)

0 0 ^{n)

where m,-, = a-j/ajj and the elements afj of U are the elements resulting from the upper triangularization of A by Gauss elimination without row interchange. As an example of LU-decomposition by Gauss elimination, consider

-2 1 0 A = 1 -2 1 (3.5.23) 0 1 -2

Multiplying the first row by mji = —1 | and subtracting from the second row, and multiplying the first row by 1^3, = 0 and subtracting from the third row, we obtain

-2 0 I • (3.5.24) 0 1 -2

Multiplying the second row by m32 -1 and subtracting from the third row then yields

-2 1 0 3 U = 0 1 (3.5.25) "2 4 0 0 — '3 LU-DECOMPOSmON 65

With nijj = l,L = [mjj] becomes

1 0 0 1 L = 1 0 (3.5.26) 2 2 0 1

The reader can verify that A = LU. This example serves to illustrate how the LU- decomposition of A can be obtained by simply storing the multipliers m,y when the simplest Gauss elimination process (no row interchanges) works. As an example of how LU-decomposition might be useful, consider the equations

Ax, = b, (3.5.27)

Ax2 = b2(Xi) (3.5.28)

in which b2 depends on the solution of Eq. (3.5.27). Performing Gauss elimination twice to solve these equations requires twice n(n^ — l)/3 + n(n -f- l)/2 divisions and multiplications, whereas carrying out Gauss elimination to solve Eq. (3.5.27) costs n(n^ — \)/3-\-n(n + l)/2 and yields L and U. Equation (3.5.28) can then be written as

LUX2 = b2, (3.5.29)

and setting y = Ux2, the equation

Ly = b2

can be solved by forward substitution for y. Subsequently, the equation

Ux2 = y (3.5.30)

can be solved by backward substitution for X2 at a cost of «(n + 1). Thus, without using LU-decomposition, the solution for x, and X2 costs

}n(n^ - l)-\-n{n + 1), (3.5.31)

whereas by using LU-decomposition it only costs

-n(n^ - 1) + -n{n -f 1), (3.5.32)

a savings on the order of n^/3 for large n. As pointed out at the end of Section 3.3, the simplest Gauss elimination process is guaranteed to work for stricdy diagonally dominant matrices and for positive- definite matrices. Thus, LU-decomposition is assured for these two classes. 66 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

3.6. BAND MATRICES

If the matrix A is such that a^j = 0 for j > i -{- p and / > 7 + ^, we say A is a band matrix of bandwidth w = p -\-q + 1. Examples are

10 0 0 0 10 0 «,7 =0, 7 > I 4- 1, / > 7 + 1, w; = 1, (3.6.1) 0 0 10 0 0 0 1

and

2 2 0 0 5 2 4 0 au =0, 7 > / + 2, / > 7 + 2, w = 3. (3.6.2) 0 7 2 9 0 0 6 2

Symbolically, we represent band matrices as in Fig. 3.6.1. In the part of the matrix labeled A, there are nonzero elements. In the parts labeled 0, all elements are 0. When A is banded, n large, and p and q small, the number of operations in the Gauss elimination solution to Ax = b is on the order of npq, which is much smaller than the n^ required for a full matrix. One of the most important properties of a band matrix is that its LU- decomposition, when it exists, decomposes A into a product of more narrowly banded matrices. In particular, if the LU-decomposition can be achieved with Gauss elimination (without row interchange), it follows that the structure of L and U is as shown in Eq. (3.6.3) (see Fig. 3.6.2). Only in the parts of L and U labeled IJ and U' are there nonzero elements. As an example of the simplification resulting

FIGURE 3.6.1 BAND MATRICES 67

(3.6.3)

FIGURE 3.6.2 Equation (3.6.3).

from Eq. (3.6.3), consider the and its LU-decomposition:

a, q h «2 ^2 b, a-.

K-i a,n-\ ^n-\ (3.6.4)

1 «! 5i

Pi 1 0^2 ^2 ft 1

0 Of. ft 1 Carrying out the matrix multiplications on the right-hand side of Eq. (3.6.4) and comparing with the corresponding elements in A, we find

5; = C, (3.6.5)

and

aj = flj, p^ = , a^ = ajt — Pk^k-i^ fe = 2,..., n. (3.6.6)

The number of operations needed to find the coefficients a^, )S,, and 5, are only 3(n — I) additions, multiplications, and divisions. As a special case of a tridiagonal matrix, consider

1-10 0 -12-10 (3.6.7) 0-1 2-1 0 0-12 68 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

With the rules given in Eq. (3.6.6), we find

1 0 0 0 -1 1 0 0 L = (3.6.8) 0 -1 1 0 0 0 -1 1

and

1 -1 0 0 0 1 -1 0 U (3.6.9) 0 0 1 -1 0 0 0 1

Note that U = L^ for this case. The inverse of L is easy to find. Consider the augmented matrix, [L, I], cor- responding to Lx, = e,, / = 1,.,., 4,

1 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 (3.6.10) 0 -1 1 0 0 0 1 0 0 0 -1 1 0 0 0 1

In this case, Gauss elimination gives

1 0 0 0 1 0 0 0 0 1 0 0 1 1 0 0 (3.6.11) 0 0 1 0 1 1 1 0 0 0 0 1 1 1 1 1

The solutions to Lx^ = e,, / = 1,..., 4, are rr "0" "0" "01 1 1 0 0 x, = , X2 = , X3 = , X4 — (3.6.12) 1 1 1 0 Li_ _1_ _1_ __1 J

and so the inverse of L, which is just the matrix [x,, X2, X3, X4], is the lower trian- gular matrix

10 0 0 110 0 (3.6.13) 1110 1111 BAND MATRICES 69

Since U = L^, it follows that U"' = (L"')^, or

1111 0 111 U-' (3.6.14) 0 0 11 0 0 0 1 By the same procedure, it is easy to show that the LU-decomposition of the n X n tridiagonal matrix

1 -1 0 -1 2 -1 (3.6.15) -1 2 -1 0 -1 2

IS

U = L^ (3.6.16)

1 1 and

1 1 L' = U-' = (L-'K)T (3.6.17)

1 1 1 Since the inverse of A is U ^ L ', we find n n — I n — 2 n — 3 •• 2 1] n — \ n — \ n — 2 n — 3 •• 2 1 n — 2 n — 2 n — 2 n — 3 •• 2 1 A-i =

2 2 2 2 •• 2 1 1 1 1 1 1 1 becomes

'5 4 3 2 1 4 4 3 2 1 A-' = 3 3 3 2 1 2 2 2 2 1 1 1 1 1 1 70 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

An important point for band matrices can be made by comparing A~^ to L and U. A~* is a full matrix, requiring storage of n^ elements. L and U require storage of only 4n — 2 elements. To solve LUx = b, we could solve Ly = b and Ux = y in n(n — 1) multiplications and divisions, and so evaluation of A~^b requires only n^ divisions and multiplications. Thus, when storage is a problem, it is better to find the LU-decomposition of a banded matrix, store L and U, and solve LUx = b as needed. As another example, consider the n x n tridiagonal matrix

2 -1 0 1 2 -1 A = (3.6.18) -1 2 -1 0 -1 2

The main diagonal elements are 2, whereas the elements on the diagonals just above and just below the main diagonal are —1. For this case, we find

1

L = (3.6.19)

j-i 1 «-l

and -1 0 3 -1 2 U = (3.6.20) 0 1+1 -1 J n + 1

The inverses of L and U are not as easy to compute as in the previous exam- ple. However, the solution of LUx = b only requires n(n — 1) divisions and multiplications. Matrix theory plays an important and often critical role in the analysis of differential equations. Although we will explore the general theory and formal solu- tions to specific differential equations in Chapter 10, we are nonetheless confronted BAND MATRICES 71

with the cold hard fact that most practical applications in engineering and applied science offer no analytical solutions. We must, therefore, rely on numerical pro- cedures in solving such problems. These procedures, as with the finite-difference method, usually involve discretizing coordinates into some finite grid in which dif- ferential quantities can be approximated. As a physical example, consider the change in concentration c of a single- component solution under diffusion. The equation obeyed by c in time t and direction y is

dc d^c - = D-,. (3A2„

where D is the diffusion coefficient. At r = 0, we are given the initial condition

c(yj) = fiy), (3.6.22)

and at the boundaries of the system, y = 0 and y = L, the concentration is fixed at the values

c(y = 0, 0 = a (3.6.23) c(y^Lj)^p,

Using the finite-difference approximation, we estimate c only at a set of nodal points y, = / Ay, / = 1,.. ., /t, where (n -[-1) Ay = L. In other words, we divide the interval (0, L) into n -f- 1 identical intervals as shown in Fig. 3.6.3. If we let Ci denote c(}?,, t) (the value of c at j,), then the finite-difference approximation of d^c/dy^ at y, is given by

^^c{yi,t) q_i - Icj -h c^^x .^ . ^.. —E"l— = 7T~u • (3.6.24)

With this approximation, Eq. (3.6.21), at the nodal points, reduces to the set of equations

^^\ •—- = a - 2ci -h ^2 dx ^ = c,_, - 2c, -^ q^„ / = 2,.. ., n - 1, (3.6.25) ax

where x = t Ay^/D.

yo yi y2 yn-i yn Vn+i I I I I I j I \ j 0 I Ay I FIGURE 3.6.3 72 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

With the definitions

^1 a 0 and b = (3.6.26) 0

the equations in Eq. (3.6.25) can be summarized as

dc = -Ac + b, (3.6.27) dx where A is the tridiagonal matrix defined by Eq. (3.6.18). The initial condition corresponding to Eq. (3.6.22) is then given by

fiyx) fiyi) c(r = 0) = (3.6.28) L/wJ If we replace the boundary condition c(y = O^t) = a by the boundary con- dition dc/dy = 0 at J = 0 (barrier to diffusion), it gives for the finite-element approximation

Ci = c{y = 0, t) (3.6.29)

and changes the first equation in Eq. (3.6.25) to dc. = -Ci + C2 (3.6.30) dx Once again, we obtain Eq. (3.6.27), but the matrix A in this case is the one given by Eq. (3.6.15). We note that, at steady state, dc/dt = 0, and the concentration at the nodal points is given by Ac = b. The formal solution to Eq. (3.6.27) can, therefore, be written as

c(r) = c(0)exp(-rA) -h f exp(-(r - ^)A) Jqb. (3.6.31)

To compute c(r) from this formula, we need to know how to calculate the expo- nential of a matrix. Fortunately, the eigenvalue analysis discussed in Chapters 5 and 6 will make this easy. As a further example, consider a one-dimensional problem in which the con- centration c{x, t) of some species is fixed at jc = 0 and x = a, i.e.,

c(0, t) = a, c(a, t) = p. (3.6.32) BAND MATRICES 73

Further, the concentration initially has the distribution

c(jc, r = 0) = /r(jc), (3.6.33)

and in the interval 0 < JC < a is described by the equation

J^T-i^m. (3.6.34)

where D is the diffusion coefficient and /(c) represents the production or con- sumption rate of the species by chemical reaction. In general, /(c) will not be linear so that analytical solutions to Eq. (3.6.34) are not available. So, again, we solve this problem by introducing the finite- difference approximation. Again, the first and second spatial derivatives of c at the nodal points x^ are approximated by the difference formulas:

dc{xj, t) ^ c(xj^^J) - c{xj_^,t) (3.6.35) dx 2AJC and d^c(xj,t) _ c(xj^i,t)~-2c(xj,t) + c{Xj_^J) (3.6.36) dx^ i^xf These are the so-called centered difference formulas, which approximate dc(Xj)/dx and d^cixj)/dx^ to order (Axf. With the notation

Cj =c(Xj,t), (3.6.37)

the finite-difference approximation to Eq. (3.6.34) at the nodal points 7 = 1,..., n is given by - 2c^ + c^-_i + /(C;). (3.6,38) dt (AJC)2

With the boundary conditions CQ = a and c„_^i = ^, Eq. (3.6.38) can be expressed as the matrix equation

— = Ac 4- ——-f (c) -h b, (3.6.39) dx D where x ^ tD/{Axf

fci' a "1 fie,) 0 c = b = f = (3.6.40) 0 Lc„. L^J I f(c„) J 74 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

and

2 -1 0 0 1 2 -1 0 0 -1 2 0 A = (3.6.41)

0 2 -1 0 0 -1 2

With the initial condition

h(xO c(0) = (3.6.42)

h(Xn)

Eq. (3.6.39) can be solved to determine the time evolution of the concentration at the nodal points. If the nodal points are sufficiently close together, c{t) provides a good approximation to c(x, t). For a two-dimensional problem with square boundaries, the above problem becomes

dc d^c' (3.6.43) dx 2+^)dy^ + fie).

The initial condition is given by

c(x,y,t =0) =h(x,y), (3.6.44)

and the boundary conditions are indicated in Fig. 3.6.4. The finite-difference approximation for the square is generated by dividing the square into (n +1)^ iden- tical cells of area (A.jc)^. Numbering the nodal points, indicated by solid circles in the figure, from 1 to n^ as indicated above, we can express the finite-difference approximations of the second derivatives at the ith node as

d% - 2c, + c,._, '•.+1 (3.6.45) (Ax)' - 2c, + c;_„ ^i+n (3.6.46) df {Axf With the boundary conditions given above, the matrix form of the finite-difference approximation to Eq. (3.6.43) is

— = -Ac + ——-f (c) H- b. (3.6.47) dr D c and f (c) are as defined in Eq. (3.6.40), but A in this case has five diagonals with nonzero elements and b is more complicated than before. For the special BAND MATRICES 75

c{x,a,t) =j{x)

—-T ^—-v.v-^—-

i-l --•: ? + l c{0,y,t) = l3{y) c(a,y,t) = %) 1—--t--—tr--n

i___i ___l____4

y A c{x,0,t) = a{x) -^x FIGURE 3.6.4

case n = 3, the formulas for A and b are

4 -1 0 - 1 0 0 0 0 0 1 4 -1 0-1 0 0 0 0 0 -1 4 () 0 -1 0 0 0 1 0 0 1^-1 0-1 0 0 0 -1 0 - 14-10 -1 0 (3.6.48) 0 0 -1 () -1 4 0 0 -1 0 0 0 - 10 0 4 -1 0 0 0 0 (3 -1 0 -1 4 -1 0 0 0 (3 0-10 -1 4

and

" a(x,) + )8(y,) ' a(x2) a{Xj) + S{y^) Piyd b = 0 (3.6.49)

yixg) 76 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

Although the two-dimensional problem is more complicated to reduce to matrix form, the resulting matrix equation is about as simple as that of the one-dimensional problem. EXAMPLE 3.6.1 (Mass Separation with a Staged Absorber). It is common in engineering practice to strip some solute from a fluid by contacting it through countercurrent flow with another fluid that absorbs the solute. Schematically, the device for this can be represented as a sequence of stages, at each stage of which the solute reaches equilibrium between the fluid streams. This equilibrium is fre- quently characterized by a constant K. Thus, at stage /, the concentration x^ in one fluid is related to the concentration >?, in the other fluid by the equation

yi = KXi. (3.6.50)

A schematic diagram of the absorber is shown in Fig. 3.6.5, L is the flow rate of inert carrier fluid in stream 1 and G is the corresponding quantity in stream 2. Lx^ is the rate at which solute leaves stage j with stream 1 and Gjj is the rate at which it leaves stage j with stream 2. Thus, if hx^ and gy^ represent the amount of solute in stage j in streams 1 and 2, h and g being the "hold-up" capacities of each stream, then a mass balance on stage j yields the equation of change

dXj --Lxj_i-hGyj^^-Lxj-Gyj, 7 = 1,...,n. (3.6.51) h^^s''^dt dt

Input of Stream 1 Output of Stream 2

Stage 1

i I L, xi G, J/2 Stage 2

k L, X2 ' G, yz Stage 3 L, X3 I J G, J/n-l Stage n — 1

L, Xn^l G, Vn Stage n

J, Xn I Output of Stream 1 Input of Stream 2 FIGURE 3.6.5 BAND MATRICES 77

Combined with Eq. (3.6.50), this expression becomes

dx —^ = Lx:^ - (L + KG)Xj + KGx:^^, (3.6.52) ax where x ^t/{h-\-gK).\n matrix form, this set of equations can be summarized as

dx = Ax + b, (3.6.53) dx where

LXr,

X = b = (3.6.54) 0

and

-{L + GK) GK 0 0 L -iL + GK) GK 0 0 L -(L + GK) GK

(L-^GK) (3.6.55)

A is referred to as a tridiagonal matrix because it has nonzero elements only along the main diagonal and along two other diagonals parallel to the main diagonal. Such matrix problems are especially easy to handle. In a typical problem L, G, K, XQ, and y„^i are the givens. Questions asked might be: (1) How many stages n are needed for a given separation jc„? (2) What are the steady-state values of jc, and 3^,? (3) Is the steady state stable? (4) What is the start-up behavior of the absorber? (5) Can cyclic behavior occur? The countercurrent separation process described above leads to an equation like Eq. (3.6.27) with a tridiagonal matrix A in which a^ — —(L + GK), b^ = L, and Ci — GK. At steady state, Ax = —b, where

LXQ 0 b = 0 Gyn+\

Consider a process where a liquid stream of 35001b^/hr water containing 1% nicotine is contacted with pure kerosene (40001b^/hr) in a countercurrent 78 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

•H TABLE 3.6.1 Existing Nicotine Concentration versus Number of Theoretical Trays

7 0.00125 8 0.00111 9 0.00099 10 0.00090 11 0.00083

liquid-liquid extractor at room temperature. The dilute partition coefficient for nicotine is estimated as AT = 0.876 (weight fraction nicotine in kerosene)/(weight fraction nicotine in H2O). If the target nicotine concentration in water is jc„ < 0.001, how many theoretical trays are required? To solve this problem, the matrix equation Ax = —b must be solved at dif- ferent values of n (the number of stages). The matrix A and the vectors x and b are defined in Eqs. (3.6.54) and (3.6.55). Given L = 3500, G = 4000, K = 0.876, XQ=:0.01, and y,^_^_^ =0, the Gauss elimination procedure can be applied to find x(/i) as a function of n. The correct number of theoretical stages is then the minimum value of n such that x„ < 0.001. Table 3.6.1 shows the values of x„ for different values of n. For this system, the required number of theoretical trays is 9. A Mathematica program to solve this problem is given in the Appendix.

3.7. ITERATIVE METHODS FOR SOLVING Ax = b

Cramer's rule and Gauss elimination are direct methods for solving linear equa- tions. Gauss elimination is the preferred method for large problems when sufficient computer memory is available for matrix storage. However, iterative methods are sometimes used because they are easier to program or require less computer mem- ory for storage. Jacobi's method is perhaps the simplest iterative method. Consider the equations

J2^ij^j = ^/' / = 1,..., /I, (3.7.1)

for the case a,, / 0, / = 1,..., n. Note that Eq. (3.7.1) can be rewritten as

^i = (-i^u^j + ^,j/%^ / = 1,...,n. (3.7.2) ITERATIVE METHODS FOR SOLVING Ax = b 79

If we guess the solution {x^^\ ... ,xl^^} and insert it on the right-hand side of Eq. (3.7.2), we obtain the new estimate [x{^\ ..., jc^^^} of the solution from

^r^ = (- £""y^T + ^.)/«n- r = 1,..., n. (3.7.3)

Continuing the process, we find that at the (^-|-l)th iteration the estimate is given by

^(M) = (^- X:a,jxf^ + b^/a,,, / = 1,..., n. (3.7.4)

J¥^i

If x^*^ converges to a limit x as ^ -> oo, then x is the solution to the linear equation set in Eq. (3.7.4); i.e., x is the solution to Ax = b. This is called the Jacobi iteration method. If |A| = 0, the method, of course, fails. However, even if |A| =^ 0, the method does not necessarily converge to a solution. We will discuss the convergence problem further, later in this section. In computing x\^^^\ ^2*^^\ • • • from Eq. (3.7.4), we note that when calculating ^(A^+i) fj>Q^ jj^g /ji^ equation the estimates xf^^\ ..., Jc/i|^^ are already known. Thus, we could hope to accelerate convergence by updating the quantities already known on the right-hand side of Eq. (3.7.4). In other words, we can estimate jc/*^^^ from

i-i "~ '^ ' ' / = l,...,n. (3.7.5)

This iteration scheme is called the Gauss-Seidel method and it usually converges faster than the Jacobi method. Still another method, which is a variation of the Gauss-Seidel method, is the successive overrelaxation (SOR) method. If we define the "residual" r/^^ as

^ik) ^ (k+l) _ (k)

(3.7.6)

^ 7 = 1 7=/ ^

where xf^^^ and xf^ are Gauss-Seidel estimates, then rf^ is a measure of how fast the iterations are converging. For example, if successive estimates oscillate a lot, then rf^ will be rather large. The SOR method "relaxes" the {k + l)th oscillation somewhat by averaging the estimate xf^ and the residual. Namely, the {k -f l)th SOR estimate is defined by

xf^''=xf'+corf' 0.1.1) 80 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

or

Jk+l)

(3.7.8)

Thus, the SOR estimate of jc,.(^+1- ) i; s a weighted average of the A;th estimate x;ik) (weighted by 1 - CD) and the estimate of Jcf "^^^ computed by the Gauss-Seidel method (the coefficient of co in Eq. (3.7.8)). Note that co is an arbitrary weighting parameter—although, as we shall claim shortly, it cannot lie outside the range 0 < ft)< 2 if the method is to converge. For every matrix A, there is a "best value" CO of the SOR parameter for which the method converges the most rapidly. With 0) = 1, the SOR method is the same as the Gauss-Seidel method. The SOR method will converge faster than the Gauss-Seidel method if the optimal o) is used. To illustrate the three iterative methods just described, consider the problem Ax = b, where

4 -1 -1 0 0 1 4 0 -1 0 A = 1 0 4 -1 -1 (3.7.9a) 0 -1 -1 4 0 0 0 -1 0 4

and

b = (3.7.9b)

Table 3.7.1 shows jcf^ versus iteration k for the three methods using as the zeroth-order estimate x^^^ = 0. The solution found by Gauss elimination is given at the bottom of the table and we will use this as the correct solution x and define the error of the iterative result as

error = ||x'(k) (3.7.10)

We see that, in order to find a solution with error less than 10"'*, it takes 15 iterations for the Jacobi method, 8 for the Gauss-Seidel method, and 5 for the SOR method. The value of co (1.1035) used is optimal for the matrix A. For this matrix, SOR is the best method, Gauss-Seidel the next best, and Jacobi the poorest. Mathematica programs have been included in the Appendix for all three of these methods. ITERATIVE METHODS FOR SOLVING Ax = b 81

TABLE 3.7.1 Comparison of Rates of Convergence of Three Iterative Solutions of Ax = b for A and b Given in Eqs. (3.7.9a) and (3.7.9b)

k x^ ^2 ^3 x ^5

Jacob! method

1 0.25 0.5 0 0.25 0.5 2 0.375 0.625 0.25 0.375 0.5 3 0.46875 0.6875 0.3125 0.46875? 0.5625 4 0.5 0.734375 0.375 0.5 0.578125 5 0.527344 0.75 0.394531 0.527394 0.59375 6 0.536133 0.763672 0.412109 0.536133 0.598633 7 0.543945 0.768066 0.417725 0.543945 0.603027 8 0.546448 0.771973 0.422729 0.546448 0.604431 9 0.548676 0.773224 0.424332 0.548676 0.605682 10 0.549389 0.774338 0.425758 0.549389 0.606083 11 0.550024 0.774694 0.426215 0.550024 0.60644 12 0.550227 0.775012 0.426622 0.550227 0.606554

Gauss-Seidel method

1 0.25 0.5625 0.0625 0.40625 0.515625 2 0.40625 0.703125 0.332031 0.508789 0.583008 3 0.508789 0.754395 0.400146 0.538635 0.600037 4 0.538635 0.769318 0.419327 0.547161 0.604832 5 0.547161 0.773581 0.424788 0.549592 0.606197 6 0.549592 0.774796 0.426345 0.550285 0.606586 7 0.550285 0.775143 0.426789 0.550483 0.606697

SOR method

1 0.275875 0.627857 0.076107 0.470081 0.572746 2 0.441528 0.738257 0.401619 0.541685 0.603268 3 0.54464? 0.77503? 0.424549 0.550745 0.606434 4 0.550439 0.775323 0.427148 0.550605 0.606824 5 0.550636 0.775309 0.427003 0.550575 0.606743

Gauss elimination method

0.550562 0.775281 0.426966 0.550562 0.606742

The defining equations for Jacobi iteration, Eq. (3.7.2), can be written in vector form as

X = BjX-fCj, (3.7.11)

where

Bj = -D HL + U) (3.7.12)

and

c, = D 'b. (3.7.13) 82 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

The quantities D, L, and U are given by

au11 0 D = (3.7.14) 0

L = «31 «32 (3.7.15)

L^nl ^n2 -*n,n—1

and

fo «12 «13 0 «23

U = 0 *3« (3.7.16) 0 L 0

D is a diagonal matrix with the diagonal elements of A along the main diagonal. L is a lower diagonal matrix having O's on the main diagonal and the elements of A below the main diagonal. U is an upper diagonal matrix having O's on the main diagonal and the elements of A above the main diagonal. Thus, it follows that

A = D + L + U. (3.7.17)

Clearly, Eq. (3.7.11) is a rearrangement of Ax = b, which can be accomplished only if none of the diagonal elements is 0. Note that L and U defined here have nothing to do with the L and U in our discussion of LU-decomposition. The iterative solution of Eq. (3.7.11) is given by

x^^+^> = BjX^^^ + Cj, it = 0, 1 (3.7.18)

The vector form of the defining equations of the Gauss-Seidel iteration is

(I + D L)x = -D^Ux + D^b

or

^ "~ "GS^ "^ ^GS» (3.7.19)

where

BGS = -(D + L)-^U (3.7.20) ITERATIVE METHODS FOR SOLVING Ax = b , 83

and

Cos^CD-fLr^b. (3.7.21)

Equation (3.7.19) is again just a formal rearrangement of Ax = b, which is allowed if A has no zero elements on the main diagonal. D + L is a lower diagonal matrix, which is nonsingular if |D| i^ 0. The corresponding iterative solution of Eq. (3.7.19) becomes

x^*+i> = Bosx^*^ + Cos, ^ = 0, 1,.... (3.7.22)

Finally, the SOR iteration begins with the vector equation

X = BSORX + CSOR, (3.7.23)

where

BsoR = (D + oA.)~\(\ - ^)D - oy\i\ (3.7.24)

and

CsoR = (I> + ^L)''b. (3.7.25)

Equation (3.7.23) is again a formal rearrangement of Ax = b, which is allowed if |D| 7^ 0 or a^^ 7«^ 0, / = 1,..., /2, and the iterative solution to Eq. (3.7.23) is

x^*-''^ = BSORX^'> + CsoR, ^ = 0, 1,.... (3.7.26)

The components of Eqs. (3.7.18), (3.7.22), and (3.7.26) can be shown to correspond to Eqs. (3.7.4), (3.7.5), and (3.7.7), which are the Jacobi, Gauss-Seidel, and SOR equations, respectively. We now need some mechanism to predict the convergence properties of itera- tive methods. It turns out that a sufficient condition for convergence of the iteration process to the solution x of Ax = b is provided by the following theorem:

THEOREM. //

IIBII < 1,

then the process

^(k+i) ^ g^(^) _^^^ it = 0, 1,..., (3.7.27)

generates the solution x of

X = Bx + c. (3.7.28) 84 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

To prove the theorem, suppose x is the solution of Eq. (3.7.28). Subtracting Eq. (3.7.28) from Eq. (3.7.27), we obtain

x(fc+i) _ X = B(x^*^ - x). (3.7.29)

Note that x^*^ - X = B(x^^^ - X)

x^'> - X = B(x<'> - X) = B2(X^^^ - X) (3.7.30)

x^^^-x = B*(x^^^-x). From the properties of matrix norms, it follows that

l|x^^^ - x|| < IIB^II ||x^^^ - x|| < ||Bf ||x^^^ - x||. (3.7.31)

Thus, if the norm ||B|| of B is less than 1, Eq. (3.7.31) implies that

lim ||x^^^-x|| :=0; (3.7.32)

i.e., the sequence x^^^ generated by Eq. (3.7.27) converges to the solution of Eq. (3,7.28). We know that if B = Bj, B^g, or BSQR, Eq. (3.7.27) is equivalent to Ax = b, and so the Jacobi, Gauss-Seidel, and SOR processes will generate the solution to Ax = b if the norm of the corresponding matrix B is less than 1. Interestingly, the value of c, and therefore of b, does not affect convergence of the iteration process. • EXERCISE 3.7.1. Use the /? = oo norm to prove that Jacobi's method con- • verges when A is a diagonally dominant matrix. Using the eigenvalue techniques that will be developed in Chapters 5 and 6, we can prove that the SOR method converges if and only ifB^Q^ exists (|D| / 0) and the parameter co lies in the range

0 < a>< 2. (3.7.33)

• EXERCISE 3.7.2. If iA| ^ 0 and A is a real n x n matrix, then the problem Ax = b can be solved by the conjugate gradient method. According to this method, we start with the initial guess XQ, compute the vector FQ = b—AXQ, and set Po = FQ. We then iterate _ llr,-Illr l|P2 (3.7.34a) <*1 p/^Ap,

"i+l = x,+«,P,- (3.7.34b)

r.+i = r, - a/Ap, (3.7.34c) l|r,+ilP A (3.7.34d) llr, IP p,+i = r,-+i + AP,- (3.7.34e) NONLINEAR EQUATIONS 85

for / = 1, 2,...,« — 1. A Mathematica program implementing the above conjugate gradient method is included in the Appendix. Show that the vector x„ converges to • • • the solution to Ax = b for a variety of matrices A.

3.8. NONLINEAR EQUATIONS

Most "real-world" problems are nonlinear. Given this fact, it might appear a bit surprising that linear equations receive so much attention in numerical analysis and applied mathematics. The reason is simple: the most common method for solving nonlinear systems is to do so iteratively through linearized equations. The Newton- Raphson method is frequently the method of choice for such iterative solutions. Another iterative approach is the Picard method, which is easier than, but not as powerful as, the Newton-Raphson method. Both methods require an initial guess of the solution and, as we will see, their success depends on the "goodness" of the initial guess. Although linear equations occasionally admit multiple solutions, they very often have unique solutions. Nonlinear equations, on the other hand, normally admit multiple solutions. For example, the equation

jc^-5x2+ 4 = 0 (3gj^

has four solutions, namely, jc = 1, — 1, 2, and —2. Similarly, the equation

cos^x-l^O (3.8.2)

has an infinite number of solutions, namely, x = nn, n = 0,±\, ±2,.... There are, of course, nonlinear equations that have unique solutions. For example,

e~^ - X = 0 (3.8.3)

has the unique solution JC = 0.567.. .. Furthermore, the equation e'""^ —x=0 has a unique solution for any positive real value of a and no solution at all for any negative real value of a. When there are multiple solutions, the particular solution to which an iterative scheme converges will depend on the initial guess. In this section we will concern ourselves with a system of n equations and n unknowns, i.e., /i(xi,...,xj = 0

: (3.8.4)

/„(xi,...,xj=0,

which we can express in vector form as

f (X) = 0, (3.8.5)

where f is a column vector whose components /) are nonlinear functions of the components x^ of the column vector x. 86 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

For Picard iteration, we rewrite f as f(x) = x — g(x), and Eq. (3.8.5) is rearranged to get

x = g(x). (3.8.6)

If we choose the initial estimate of the solution to be x^^\ subsequent estimates are then computed from

x(^4-i) ^ g(x(^))^ j^ = l,2,.... (3.8.7)

If this iteration converges, i.e., if

lim ||x^^+^^-x^*>||=0, (3.8.8)

then lim^_^^ x^^^ = x, where x is a solution of Eq. (3.8.5). In practical applications, the criterion for convergence is usually that one accepts x^^"^^^ as a solution when ||x(^+i) _ x^^^ll < €, where the tolerance 6 is a small number chosen by physical considerations. The Newton-Raphson method also begins with an initial guess x^^K However, this method is derived by expanding the equations in Eq. (3.8.4) in a multivariable Taylor series about x^^\ obtaining

4-0(x-x^^^)^ (3,8.9)

or, in vector notation,

f (X) = f (x^^^) + J(x^^^)(x - x^^^) -f 0(x - x^^^)\ (3.8.10)

The matrix J(x) is called the Jacobian matrix (or just the Jacobian) and has the elements

Ju = ^(x,..^.,x,). (3.8.11)

In the Newton-Raphson method, we assume the higher order terms in the Taylor expansion are small and we obtain the next guess x^^^ for the solution to f (x) = 0 from the expression

0 = f (x^^^) + J(x^2^ - x^^>), (3.8.12)

which can be solved using Gauss elimination or any of the iterative methods dis- cussed in the previous section. Iteration of this process yields

0 = f (x^^^) + J(x^^^)(x<^+^^ - x^^^), (3.8.13)

where k = 1,2,.... When the process converges, the result x = linif^^^ x^*^ is a solution to f (x) = 0. NONLINEAR EQUATIONS 87

In practice, the quantity x^*+^^ is accepted as a solution when

||x(^+i) _ x^ii < ^^ and ||f(x^^-^^^)|| < €f, (3.8.14)

where the tolerances 6^ and €f are small quantities chosen by physical consid- erations. For example, if ||x^^"*^^^|| = 10^, then €^ = 1 could be sufficiently small for practical purposes. On the other hand, if Hx^^'+^^H = 10"^, then e^ should be considerably smaller than 10"^. One way to choose 6^ would be to require €, < 10-^11x^^^11, and likewise €f < 10-^||f(x^'))||. Of course, 10"^ could be replaced by a smaller or larger number, depending on the tolerance one is willing to accept. The Newton-Raphson method will always converge to a solution when the guess x^^^ is sufficiently close to a root of the equation f (x) = 0. This is not so for the Picard method. For instance, for the equation

X = -V3JC -e' + 2, (3.8.15a)

there is no guess for which the Picard method will converge to the solution, which is jc = —0.390272. Interestingly, if Eq. (3.8.15a) is squared and rearranged to get

x^ -\re' -2 X = , (3.8.15b)

then the guess x^^^ = 1 will lead to the convergence of a Picard iteration to the root X = —0.390272. In general, for x = g{x), when \dg/dx\ < 1 for x^^^ in the vicinity of the root x, the Picard method will converge. Otherwise, it is likely to fail. EXERCISE 3.8.1. Show that the value of dg/dx for x near -0.390272 guar- antees convergence of the Picard method for Eq. (3.8.15b) but not for Eq. (3.8.15a). Explore how robust the Picard method is for Eq. (3.8.15b) by trying different ini- tial guesses. An important feature of the Newton-Raphson method is that, when it starts to converge, the convergence is quadratic. In other words, if convergence begins and

||x(^+i)_x(*)||^10-^C, (3.8.16)

where C is some scale factor, then after v more iterations

Thus, at three iterations beyond the ^th iteration, the difference ||x^*+^+''^ -x^^+^'^H is a factor of 10"^ smaller than at the A:th iteration. This remarkable result actually can be used as a test of the correctness of the evaluation of the Jacobian in a problem. Any error in J will reduce the rate of convergence below the theoretical quadratic rate. Picard iteration usually converges at a much slower rate (usually arithmetically instead of quadratically) than the Newton-Raphson method, as illustrated in the following example. 88 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

••• EXAMPLE 3.8.1 (Illustration of Quadratic Convergence of the Newton- Raphson Method). Consider the system of equations

0 = Xj tan(xj) — Jx\ — x\ / : n (3.8.18)

^- tan(x,+7r/4) V' 1^4;*

Starting with the Picard method, the residual expressions in Eq. (3.8,18) are rewrit- ten in the proper form as

Xx = 8i{xy,X2) =Xx-Xx tan(xi) + y x| - xj (3.8.19) «,=«,(„.,,,=,,+,

fx(xy,X2) =Xx tan(xi) - y x| - xj (3.8.20)

The Jacobian elements are then

+ Xi sec (jCi) + tan(A:i) 9-^1 y/xl - X Vl X2

Vz _ _{x,+iT/4) -— = , - coti X, H— I + (Jc, H— I IX, H— ) dx, ^jcl-ix,-^n/4f V' 4/ V* 4J\' 4/ a/2 a^2 y/xl-(xx+n/4f

With a tolerance of IQ-^^ (i.e., ||x^^+^> - x^^^H < 10"^'^ and ||f || < IQ-^^), the Newton-Raphson method yields the solutions x^ = 0.99279 and X2 = 1.81713. Table 3.8.1 illustrates the convergence properties of the Newton-Raphson proce- dure for the final five iterations. Ill A Mathematica program for solving this problem is given in the Appendix. NONLINEAR EQUATIONS 89

TABLE 3.8.1 Residuals and J;acobi am Elements

Residuals Jacobian

||x(fc*')_x(fc)|| ||f|| ill /l2 /2I hi

0.6185 X 10 ' 0.2130 X 10« 6.38067 -1.20243 6.64350 -4.55258 0.6540 X 10-2 0.1696 X 10' 5.58271 -1.19462 6.77445 -4.80056 0.5087 X 10~^ 0.1518 X 10^ 5.51184 -1.19396 6.81964 -4.85622 0.3951 X 10-» 0.1013 X 10-^ 5.51125 -1.19395 6.81972 -4.85644 0.1513 X 10-'5 0.1391 X 10-^'' 5.51125 -1.19395 6.81972 -4.85644

EXAMPLE 3.8.2. A common model of a chemical reactor is the continuously stirred tank reactor (CSTR). A feed stream with volumetric flow rate q pours into a well-stirred tank holding a liquid volume V. The concentration of the reactant (denoted here as A) and the temperature in the feed stream are c^^ (moles/volume) and TQ, respectively. We define the concentration and temperature in the tank as Cj^ and r, respectively, and note that the product stream will—in the ideal limit considered here—^have the same values. The exiting volumetric flow rate is q. The situation is illustrated by the diagram in Fig. 3.8.1. We assume that the reaction of A to its products is first order and that the rate of consumption of A in the tank is

-^0 exp(^) CAV. (3.8.21)

where R is the gas constant (Avogadro's number times k^) and k^ exp(—E/RT) is the Arrhenius formula for the rate constant—kQ and E being constants charac- teristic of the chemical species. Mass and energy balances on the contents of the tank yield, under appropriate conditions, the rates of change of c^ and T with time t:

= qi^A, - (^A) - ^oexpf — jc^ V = /i(r, c^) (3.8.22) dt and -E ypCp^ = qpC,{T^ -T)- A:oexp( c^V{AH)-AUiT-T,) RT) (3.8.23) The symbol p represents the density and Cp the specific heat per unit mass of the contents of the tank. AH denotes the molar heat of reaction and the term —AU(T ~ T^) accounts for heat lost from the tank to its surroundings (which is at temperature T^), U is the overall heat transfer coefficient and A is the area through which heat is lost. The first question to ask regarding the stirred tank reactor is: What are the steady states of the reactor under conditions of constant q, c^^, TQ, and T^l The answer is the roots of the equations o=/,(r,c^) (3.8.24) 90 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

Feed Stream

y.cA^T Q^CA^T

FIGURE 3.8.1

The Newton-Raphson technique reduces finding these roots to a matrix problem. The next question is whether a given steady state (r% c\) is stable to small fluctuations. For small deviations of T and c^ from T^ and c^,/i(r, c^) and /2(r, c^) can be expanded in a Taylor series and truncated after linear terms. For such a situation, Eqs. (3.8.22) and (3.8.23) become dxi dt O-i^Xi -f- flj2-^2

— a2iXi + ^22-^2 dt or dx (3.8.25) ~dt = j(ci,nx, where

"-A ^A X = (3.8.26) 'T rrs

and -94 9/, 9c^ dT Js (3.8.27) 9/2 L dc^ 97 The steady state is stable if, for an arbitrary value of x at f = 0, Eq. (3.8.24) predicts that x -> 0 as r -> 00. As we shall learn later, the asymptotic behavior of X can be deduced from the properties of the matrix J. Indeed, if all the eigenvalues of J have negative real parts, then x will go to 0 as f goes to 00. Otherwise, x will not approach 0. It is worth noting that the matrix generated in solving Eq. (3.8.27) by the Newton-Raphson technique is the matrix A needed for stability analysis. EXAMPLE 3.8.3 (The Density Profile in a Liquid-Vapor Interface). According to an approximate molecular theory of a planar interface, the density n{x) between the liquid and vapor phase obeys the nonlinear integral equation

CO K(\x' -x\)[nix') - n{x)]dx' = fio{nix)) - n, (3.8.28) / -OO NONLINEAR EQUATIONS 9 I

where A'CIA:!) is a function of order unity for x on the order of a molecular inter- action distance a and which is vanishingly small when \x\ ^ a. fi is the chemical potential of the system and /XQC'^) is the chemical potential of a homogeneous fluid at density n. For the van der Waals model, often used in qualitatively modeling fluid behavior,

fi^in) = IXHT) -k^T\nl — -l] + ^—^ - 2na, (3.8.29) \nb ) \ —no where a and h (h ^ a^) are parameters of the model. The van der Waals equation of state for the pressure of a homogeneous fluid is

p.{n) = —?— - n^a, (3.8.30) 1 — nb and the value of the chemical potential at the liquid and vapor coexistence densities, «, and Wg, are determined by the thermodynamic equilibrium conditions

Po(«,)=/'o('g (3.8.31)

Once /x is fixed, the density versus positions x in a liquid-vapor interface can be determined by solving Eq. (3.8.28). Of course, K must be specified for actual computations. The formula

^:(l^l) = ^exp(^) (3.8.32)

yields qualitatively correct results for a planar interface. To solve Eq. (3.8.28), we assume that, for x > L, n(x) = n^ and, for x < —L, «(jc) = Wj so that the equation becomes

f K(\x' -x\)n(x')dx' -\r K(\x'\)dx'\n(x) - fJio{n(x)) (3.8.33) = -fj,-n^ K{\x'-x\)dx'-nJ K(\x' - x\)dx'= g(x).

Dividing the interval (—L, L) into A^ intervals of width Ax and introducing the approximation (the trapezoidal rule)

j K{W-x,\Mx')dx' = Y^ AxKi\Xj-x,\)n(Xj)

-h -A^[^(|^o - XilMxo) + K{\x^ - Xi\)n(Xf^)], (3.8.34) we obtain for Eq. (3.8.33)

Y, KijHj - arii + Pi - ii^ini) = g,-, (3.8.35) 92 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

Where K,j ^ Ax K(\xj - x^, a ^ /!^ KiW\) dx\ A ^ '^{K^^n, + Kj,,n^), n, = n{Xi), and g^ = g(Xi). The matrix formula for Eq. (3.8.35) is

Kn - an + jS - iiQ{n) = g. (3.8.36)

Since fio(n) is a nonhnear function of n, an iterative method must be used. The III Appendix has a Mathematica program to solve Eq. (3.8.36). Frequently, one is interested in the solution x to the problem

f (a, x) = 0 (3.8.37)

for various values of a parameter a. Suppose one has found the solution x(a) for the parameter value a. If or' is another value that is not far removed from a, we can take Xj = x{a) as the first guess in a Newton-Raphson scheme and obtain the solution x(aO by iteration of the equation

f (a^ x^*>) + ]{a\ x^'^){x^'^'^ - x^'^) = 0. (3.8.38)

This is called zeroth-order continuation of the Newton-Raphson technique. First-order continuation is an improvement over zeroth-order continuation. In this case, suppose we have found the solution x{a) from the Newton-Raphson method. Then consider a new value a' = a-f-5a, where 8a is a small displacement from a. We then estimate an initial value of x^^^ of x{a') from the equation

f(a, X) + J(a, x{a)){x^'\a') - x(a)) af . (3.8.39) Hot'-a) — {a\x) = 0. da a'=a, x=x(a) Since x(a) is a solution for the parameter value a, f (a, x{a)) = 0, and since J(a, x(a)) has been evaluated already in finding x(a), the only work to do in Eq. (3.8.39) to obtain x^^^ is to take the derivatives 3f/3Qr and solve the result- ing linear equation for x^^\a'). From here the solution x(a') is obtained by the Newton-Raphson procedure in Eq. (3.8.38). If, instead of a single parameter a, there is a collection of parameters denoted by a, Eq. (3.8.37) is replaced by

J(a, x(a)){x^'\cc') - x(a)) + X^K' - «/)^ = 0, (3.8.40) a'=a, x=x(a)

where the components of a^ are of, + 5a,. Again the solution x(a') is found by iteration of Eq. (3.8.38). Continuation strategy is often used when there is no good first guess x^^^ of the solution at the parameter value(s) a of interest. If, however, at a, a good guess exists, then the solution x(a) is found and zeroth- or first-order continuation is used repeatedly by advancing the parameter(s) by some small value 8ot until the desired value a is reached. The following example illustrates the value of using continuation with the Newton-Raphson technique. NONLINEAR EQUATIONS 9 3

••• EXAMPLE 3.8.4 (Phase Diagram of a Polymer Solution). In the Flory-Huggins theory of polymeric solutions, the chemical potentials of solvent molecules and polymer molecules obey the equations

/x, = kTInx^ -\-kT(\ - -)X2 + 6JC^ (3.8.41)

fji2'=kT\nx2 + kT(l -v)x^ + v€JCp (3.8.42)

where x^ is the volume fraction of solvent, jCj is the volume fraction of polymer, and V is the molecular weight of the polymer measured in units of the solvent molecular weight. 6 is a polymer-solvent interaction parameter, k is Boltzmann's constant, and T is the absolute temperature. Note that the volume fractions sum to unity, i.e.,

jc,-fjc2 = l. (3.8.43)

Solution properties are determined by the values of the dimensionless parameters V and X = kT/€. At certain overall concentrations and temperatures, the solution splits into two coexisting phases at different concentrations. From thermodynamics, the conditions of phase coexistence are that

l^2\-^\ ' -^2 ^ — M2V^1 ' ^2 )'

where x" and jcf are the volume fractions of component / in phases a and p. As the parameter x increases (temperature increases), the compositions of the phases become more and more similar until, at the critical point, 1 1 1/ 1 \2 ^2c = 7—7= and - =. (1 + ) , (3.8.45)

and the phases are indistinguishable. Above jCjc, there are no coexisting phases. For the polymer scientist, the problem is to solve Eq. (3.8.44) to determine the two-phase coexistence curve as a function of x- If the variables x = x^, y = xf, and

fi = ^h(^' 1 - ^) - i^i(y^ 1 - >')] (^•8-46)

are defined, the problem becomes

f = 0, (3.8.47)

where the components of f are /i and /2. The range of values of molecular weights V of interest to the polymer scientist are typically v > 100. Solving Eq. (3.8.44) requires a good first guess, which is difficult to obtain for y > 100. On the other hand, for v = 1, Eq. (3.8.44) admits the analytical solution 1-2JC ^ " ln[(l-jc)/jc] (3.8.48) y — l—x. 94 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

This suggests using the Newton-Raphson technique with continuation from v = 1 to find the coexistence curve for values of v of interest to the polymer scientist. The following routine solves for the coexistence curve at some given v (in this example, the goal will be to find the phase diagram for v = 500). The routine can be broken down into two parts, each of which uses first-order continuation. The first part uses the solution at y = 1 (at a particular choice of y) to determine the first set of points (x,y,x) foi* the coexistence curve at v = 500. This is accomplished by a succession of first-order continuation steps, followed by the Newton-Raphson technique at incrementing values of v. The incrementation of v must be chosen small enough to assure convergence of the Newton-Raphson steps. Once a solution is found at v = 500, the second part of the program uses the Newton-Raphson technique to solve (3.8.44) at several values of y, again using first-order continuation to construct the initial guesses x and x for ^^^h value of y. Alternatively, we could have chosen values of x and then computed values of x and y. However, since the y values sometimes approach unity, where the Jacobian is singular, the problem is better behaved when y is fixed instead. The derivatives needed for the calculation are

Vi 1 /. 1\ 2(1-X) dx X \ v/ X ^ = _- + (l-v) + dx I — X X 9/l ^ (1-^)2 _(l_y)2 9X X' 9/2 v(x^ - y') 2 '' ' (3.8.49)

dy y \ vJ X fi = J_-(l-v)-^ dy l-y X

dv v^

-—= -(x-J)-f- . dv X The solution is shown in Fig. 3.8.2, where we present a plot of x versus solvent volume fraction in the coexisting phases for y = 1, 50, and 500. At a given value of X below the critical point, there are two values of jcj, corresponding to the left and right branches of the curve in Fig. 3.8.2. These values of JCJ are the volume fractions of solvent in the coexisting phases a and p. It is interesting to note that for V = 500 not far below the critical point are two distinct coexisting phases having very little polymer. A Mathematica program to solve this problem is given in the Appendix. Here we outline the program logic in case the reader wishes to program the problem in some other language. NONLINEAR EQUATIONS 95

0.6

0.2 0.4 0.6 0.8 1.0 Solvent concentration (.r,i/) T"

X 1.4

0.6 0.7 0.8 0.9 1.0 Solvent concentration {x^y) 1.850 i/ = 500 critical point, 1.825

1.800

1.775

1.750

1.725 0,85 0.90 0.95 1.00 Solvent concentration (x,y) FIGURE 3.8.2 Two-phase coexistence curves for polymer solutions. We begin the program by defining the user input parameters. These are the variables of the program that both define the problem and establish the accuracy desired. The variable nucalc is the target value of v used for the calculation of the coexistence curve and deltanu is the step size used in the first continuation sequence from v = 1 to v = nucalc. The value of ymax defines the maximum desired value of y on the coexistence curve and ydelta is the step size used for the incrementation of y (in the second continuation sequence): nucalc — 500 deltanu — 0.005 yvnax = 0.999 2/de/^a = 0.001. 96 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

Next, the parameters specific to the Newton-Raphson routine are defined. TOLERANCE is used in checking for the convergence of the solution and MAXITER sets the maximum number of iterations allowed in the calculation:

CONVERGE = 10-^ MAXITER = 100.

The solution arrays are then initialized to their values for v = 1. Here we are defining y{\] as ymax and when the coexistence curve is eventually calculated, y{i^ will take on values of y as it is deincremented by ydelta. The values of x[i] and XU] are the corresponding solution values for y[i]. Initially, the array need only be defined for / = 1:

yU] = ymax x[l]=l-y[l] X[l] = (1 - 2^[l])/ln[(l - x[l])A[l]].

Before beginning the main program body, the initial value of v must be set to 1 and the number of v steps (nustep) must be calculated for control of the loop:

v = 1 nustep = (nucalc — 1.0)/deltanu.

The Jacobian elements for the first continuation step must also be calculated (at V = 1):

jacobian[l, 1] = —- dx jacobian[\, 2] = —-

jacobian[2, 1] = —- dx a/2 jacobian[2, 2] = —.

The loop for the first continuation sequence begins. Here we let k range from 1 to nustep:

For k = I, nustep.

We now compute the residual vector for the next continuation step:

residual[\] = -deltanu dv a/2 residual[2] = deltanu, dv NONLINEAR EQUATIONS 97

and then compute the solution to the linear equations in Eq. (3.8.40) using a Gaussian elimination routine, xdel is the solution array containing (x^*"^^^ — X^*^) and (jc^''"^*^ — jc^*^) as elements:

xdel = LinearSolveijacobian, residual).

The values of x[l] and jc[l] are then updated and the value of v is incremented in preparation for the Newton-Raphson loop:

xW = xm + xdel[l] x[l] = x[l] + xdel[2].

Before beginning the Newton-Raphson loop, some internal variables need to be initialized. The variable normdel will be used to store the value of the modulus of xdel and normres will be the modulus of the residual vector residual. Both will be compared to TOLERANCE in order to determine if convergence has been obtained.

normdel = 0 normres — 0.

Also, the value of v must be updated:

V = V -f deltanu.

The Newton-Raphson routine first computes the updated components of the Jacobian matrix and the residual vector. Then it solves for the vector xdel using the Gaussian elimination subroutine. The respective moduli normdel and normres are then computed and the solutions x[l] and x{\] are updated. The if-statement checks for convergence using the criterion that both normdel and normres must be less than TOLERANCE. The loop is repeated until convergence is obtained or the maximum number of iterations is reached:

For i = 2, MAXITER

jacobian[l, 1] = dx

jacobian[l, 2] = dx jacobian[2, I] = 9A dx

jacobian[2, 2] = 9X residual[l] = -Mx[l],y[llxUlv) residual[2] = -/,(;c[l], ^[1], x[l], v) xdel = LinearSolveiJacobian, residual)

normdel = y/ixdel[\]'^ + xdel[2f) 9 8 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

normdel = y/{residual[l]^ + residualllf-) xm = xU] + xdel[l] x[l]=x[l]-hxdel[2] If (normdel < TOLERANCE) and (normres < TOLERANCE), then {exit loop) end loop

End loop over k:

end loop

At this point, a solution has been found at the target value of v = nucalc. The second continuation sequence now begins for the calculation of the coex- istence curve. The first step is to find the minimum value of y for our dein- crementation {ymin). This value corresponds to the critical point values given in Eq. (3.8.45). ymin = 1 — 1/(1 4- -/v). Now we need to find the number of deincrementation steps in y to take (nmax). The function Floor is used to find the maximum integer less than its argument and subtracting the small number 10~^^ is required to ensure that the critical point is not included in our count:

nmax = F\oor((ymax ~ ymin)/ydelta — 10"^^) + 1.

Next, the calculation of the coexistence curve proceeds by utilizing successive first- order continuation steps with respect to y. This is done by looping over the nmax values of y[i]. For / = 1, nmax.

Calculate the current value of y:

y[i] = ymax — (i — \)y delta.

Conduct first-order continuation and update of x and x *

residual[l] = —y delta

9/2 residual[2] = y delta ^yi-i xdel = LinearSolveiJacobian, residual) X[i]^Xli-l]-\-xdel[\] x[i]=^x[i - I]-h xdel[2]. NONLINEAR EQUATIONS 99

Begin the Newton-Raphson loop:

normdel = 0 normres = 0 For i=j,MAXITER dU jacobian[l, 1] = -^— dx a/i jacobian[l, 2] = — a/2 jacobian[2,1] = — dx jacobian[2, 2] = —

residual[\] = -/i(x[/], y[/], xLH, v) residual[2] = —/i(jc[/], j[/], /[/], v) xdel = LinearSolveijacobian, residual)

normdel = y (a:de/[l]^ + irrfe/[2]^)

normdel — yf {residual[\Y + residwtt/[2]^) X\i] = X{i] + xdel[\] x[i] = A^[/] + xdel[2] If (normdel < TOLERANCE) and (normres < TOLERANCE), then(exit loop) end loop end loop

Finally, calculate the values of jc, y, and x at the critical point:

x[nm,ax + 1] = ymin y[nmax + 1] = ymin • • • xlnmax + 1] = (l + l/\/(v + deltanu)f/2.

••• ILLUSTRATION 3.8.1 (Liquid-Vapor Phase Diagram). In dimensionless units, the van der Waals equation of state is

8j 3 (3.8.50)

where P T V 100 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

P, r, and V are pressure, temperature, and molar volume, respectively; and P^, T^, and Vc are the respective values of these quantities at the liquid-vapor critical point (the point at which liquid and vapor become indistinguishable). The conditions for the critical point are

= 0. (3.8.51) dx dx^ (i) Verify that these equations imply that jc = j = z = 1 at the critical point. The dimensional van der Waals equation is given by

P = JL.^^ (3.8.52) V-b V^ where R is the universal gas constant and b and a are molecular constants charac- teristic of a given fluid. (ii) Find P^, T^, and V^ in terms of R, b, and a. The chemical potential /JL of a van der Waals fluid is given by

(3.8.53) ix = V~b \b ) V At liquid-vapor equilibrium, the pressure and chemical potential of each phase are equal, i.e.. p(y„r)-P(K,r) = o (3.8.54) M(v„r)-M(K,r) = o, where V^ and V^ are the molar volumes of each phase. (iii) From these equilibrium conditions, find Vj and V^ as a function of temper- ature for temperatures less than T^. The problem is easier to solve in dimensionless variables. First, define w = lil\x^ and

•^1 X = (3.8.55)

where x, = Vi/V^ and x^ = Vy/K' ^^^

/i == ^(^1, y) - zix^. y) (3.8.56) /2 = u;(^i, y) - w{x^, y). Now let

f = /i (3.8.57) h and thus the equiUbrium equation to solve is

f (X) = 0. (3.8.58) NONLINEAR EQUATIONS 101

PIPc

FIGURE 3.8.3 Two-phase coexistence curves for van der Waals fluids.

Use the Newton-Raphson method to solve this equation for values of 0.50

(iv) Outside the liquid-vapor coexistence region, only one phase is stable. Plot P versus V for various temperatures. These curves are called pressure isotherms. Three such isotherms are illustrated in Fig. 3.8.4. Give a quantitative plot of P/P^ versus V/ V^ for several isotherms.

PIPc T

V/Vc FIGURE 3.8.4 P-V isotherms for van der Waals fluids. 102 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

hot ambient air water furnace at To cooler

graphite rod d ^""""•^ cladding

i T > • JL ' FIGURE 3.8.5

(v) In the van der Waals model, h denotes molecular volume and a/b denotes the strength of pair intermolecular attraction. Explain qualitatively, from your dimensionless phase diagram, how an increase or decrease in molecular size or intermolecular interaction affects the phase diagram of a van der Waals fluid. (vi) The fact that all phase diagrams of all fluids obeying the van der Waals equation of state reduce to the same dimensionless phase diagram is known as obeying the law of corresponding states. State this law physically and explain how • I one would use it. ••I ILLUSTRATION 3.8.2 (Cooling a Graphite Electrode). Consider the graphite electrode illustrated in Fig. 3.8.5. The electrode, of diameter d, passes from a hot furnace into a water cooler to help control the electrode temperature. The tempera- ture t where the electrode leaves the furnace (at position jc = 0) is r(0) = 1600''C. The rod resides for a distance L in ambient air at temperature TQ = 20°C. In this space, the electrode is covered with an insulating cladding, which gives an over- all effective heat transfer coefficient of U (between the air and the electrode). It is desired that the water cooler keep the temperature of the rod at 160°C at the point where it enters the cooler (at x = L). What is the rate that heat must be absorbed by the water cooler to hold T{L) = 160°C. Assuming that the temperature of the electrode varies only along its axis, it can be shown (see p. 41 of V. G. Jenson and G. V. Jeffreys, Mathematical Methods in Chemical Engineering, Academic Press, 1977) that it obeys the heat equation

d^T {dT\ ih-otT) PiT-T,)=0, (3.8.59) dx )

where IcQ — aT is the temperature-dependent thermal conductivity of graphite, and P = 4U/d. The parameters of the problem are

f/=:1.7WrCm^ ito = 152.6 W/°C m^ a = 0.056 W/(X)^ m J = 10 cm L = 25 cm 20°C NONLINEAR EQUATIONS 103

7(0) = 1600°C T(L) = 160"C.

The rate of heat transfer to the water cooler is /jtdVdT e = -(A:o-ar)f — j — atJc = L. (3.8.60)

The heat equation can be solved by two different methods. The first is accomplished by transforming the differential equation to an integral equation. Define u by

dx so that, by using the relation d{dT/dx) = {du/dT)dT, we obtain d^T du dx^ dx Then the heat equation becomes 1 diii^) ^ 6 —2a (/C^ o" - ccT)-^' dT -u^~ ^{Ta - To) = 0, (3.8.61) which can be rearranged first to d{ii^) la 2 ofl '^"^0 u — (3.8.62) dT KQ — ^^ kQ — aT aT and then to

1 d T --7-0 :((*o -aTfu (3.8.63) iK IT ') A-o •aT This expression integrates to

(*o - ctTfu^ = P{T - T^fiko - ocTo) - ^{T - T„)' + C, (3.8.64)

where C is a constant of integration. Recall that u = dT /dx. With this substitution, the above expression rearranges to dx = f(x)dT, which can be integrated to obtain

jc = / 7-pj dT = 0. (3.8.65a) •^no) (c + p(k^-aTy-laP{T-T^)y^ The constant of integration, C, is determined from /(c, = t - r fc^^:^ 5, .r = 0. •^no) i^c + Piko-aTf-lap{T-T^)Y (3.8.65b) (i) Calculate C by first plotting /(C) versus C and finding the interval of C in which /(c) switches from negative to positive. Then use the Picard method or the Newton-Raphson method to precisely determine C such that /(C) = 0. In evaluating /(C), use the trapezoidal rule for computing an integral. I 04 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

(ii) Once C is determined, calculate and plot x versus T, again using the trape- zoidal rule. Determine the required cooling rate for the water cooler. Another way to solve the heat equation is to discretize it using the finite- difference approximation. Let JC,=/AJC, where Ax = L/{N + 1) and / = 0, 1,..., A^ -M. Next, let 7; = r(x,), / = 0,..., A^ + 1. The midpoint approximations to the first and second derivatives of 7, evaluated at jc,, are

dT (3.8.66) dx 2^x

and

d^T Ti+y - 27;. + 7;._, (3.8.67) dx^ i^xf

With these approximations, the heat equation becomes

<*. - "T.) ('"'(ly^"') - ^C-,.. - T,.,f - m - r.) = 0 (3.8.68) for i = 1,..., A^. The boundary points are given by

(3.8.69)

and ,^ - »^.)(""-'"(iy''-) ^ ^(n« - r,.,,' - «r. - r.) = o. (3.8.70)

(iii) Solve this nonlinear algebraic system for 7] using the Newton-Raphson • • • method and the above parameters. Compare the computation time for each method. ••• ILLUSTRATION 3.8.3 (Multireaction CSTR at Constant Temperature). Con- sider the CSTR analyzed in Example 3.8.2. We saw how the steady-state concentra- tion could be determined by using the Newton-Raphson scheme. In the following problem we will solve for the steady-state output concentrations of a multicom- ponent mixture of reactants and products in a constant-temperature CSTR. The chemical species (A, B, C, and D) are assumed to be in dilute aqueous solution with input concentrations c^^, c^^, CQ, and Cjy^ (mol/L). The total volumetric flow rate is Q (L/s) and the reactor volume is V^ (L). The following reaction set applies:

A + B-^C (1) B-\-C -^ D (2) D -^ B -h C, (3) NONLINEAR EQUATIONS 105

with corresponding rate equations

'*2 = VB^C (3.8.71)

where r, is the rate in moles/second per volume of reaction /. For an ideal CSTR, the output concentrations are assumed to be equal to the mixture concentrations within the reactor (perfect mixing). Mole balances for each of the species can then be approximated as Component A Component B Component C 0 = Q(Cc^ - Cc) + (k^c^Cji - k^c^Cc + ^3^0)K Component D 0 = Q{Co^ - Co) + {k^c^Cc - k^Co)Vr' (3.8.72) The problem can be recast into dimensionless form by introducing the following quantities: Q Q* ^ TTf— (3.8.73)

k; ^ ^ (3.8.74)

c>-^. (3.8.75)

Only A and B are supplied to the reactor. Thus, the mole balance equations can be rewritten in dimensionless form as /, =0=e*(l-c*)-f(-c*c*) /2 = 0 = Q*{cl^ - c*) + i-c\cl - ^*4c* + k^cl) (3.8.76) /3 = 0 = Q*(-4) + (c^cl - klclcl + k^cl) /4 = 0 = (3*(-4) + {klc^cl - klcl). The next step is to define the problem in vector notation. By relabeling the dimen- sionless reactor concentrations as {A:,, ^2, X3, x^], i.e.,

X = (3.8.77)

D J we can recast the problem as f (X) = 0, (3.8.78) where f is defined in Eq. (3.8.76). 106 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

Feed Stream V? ^i^o :n / ^" / Recycle Stream Vr,T U ~ - Qout') ^U 1

(l-a)Q»«< Precipitator C

FIGURE 3.8.6

(i) Use the Newton-Raphson method to solve for the output concentrations of the reactor when Q* = l,^ = 5,k^ = 10, and c^ = 1. (ii) The desired product is C, which is removed through precipitation from the reactor output stream. Note that D may be recycled to the reactor, possibly increasing the yield of C. Figure 3.8.6 illustrates the flow diagram. The precipitator is assumed to remove component C from the output stream without changing the total volumetric flow rate. We define the dimensionless output flow rate by Ql^^ = Gout/(K^i<^/io) ^"^ ^^^ recycle fraction a as the fraction of the reactor output flow rate that is recycled. Reformulate the problem in dimensionless parameters and assume the same input values of 2* and Cg . Solve the new system of equations for various values of 0 < a < 1 using zeroth- or first-order continuation. Plot the dimensionless production rate of C (GoutCj) as a function of a.

ILLUSTRATION 3.8.4 (Temperature Profile in a Plug Flow Reactor). It is de- sired to calculate the adiabatic temperature profile of a plug flow reactor for the gaseous phase reaction

A + B (3.8,79) The reactor, illustrated in Fig. 3.8.7, is loaded with catalyst beads and has an effective cross-sectional area of S^. We define the total molar flow rate through the reactor as A^ and the individual component flow rates as A^, such that

The initial flow rates are designated as {N^ , Ng , N(^ =0}. We define the steady- state conversion rate, which is a function of position along the catalyst bed, as $(jc). Mole balances on each species yield

(3.8.80) Ncix)=^ix) and (3.8.81)

where the total initial molar flow rate is NQ = Nj^^ + Ng^. NONLINEAR EQUATIONS 107

input stream output stream catalyst bed Se

h 0 x = L FIGURE 3.8.7

The overall rate of reaction at a point x in the catalyst bed is

,(;,) = 1^= ^_NA(X)__NBM_ (3.8.82) S^dx Vix)N(x)V(x)N{xy

where V{x) is the molar volume of the gas mixture given by

(3.8.83)

P is the pressure (assumed constant), R is the universal gas constant, and T(x) is the reactor temperature. The differential equation describing § is, therefore.

(3.8.84) dx ^^^^{RT(X))

with the initial condition that ^(0) = 0. If the reactor were operating isothermally, we would simply integrate Eq. (3.8.84) to generate an expression for ^(x). However, the temperature dependence must be determined through an enthalpy balance. Defining the molar heat capacity of component / as C,, the initial mixture temperature as 7], and the molar heat of reaction as H^ determined from the reference temperature T^, the enthalpy balance can be expressed as

0 = {T,-T,){C^N^^-\-C,N,^) (3.8.85)

We can recast these equations in dimensionless form by introducing

(3.8.86)

r = (3.8.87)

a = (3.8.88) N A(i kS^L [ P V (3.8.89) 108 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS ^4 (3.8.90) c r* — —H (3.8.91)

H. H* = (3.8.92)

where L is the total length of the catalyst bed. Equation (3.8.84) is then

(i-r)(«-n (3.8.93) (l+a-n*\ 2 (i) For the isothermal case, ^ is a constant and equal to 1. Equation (3.8.93) can be integrated directly to yield the following implicit expression for ^*:

(v2 / ^3 \ c* = r + -; ln|l - n - -; + (I + a)' + a In a-r 1 —a \l —« / a (3.8.94) where a ^ I. Plot ^* versus KX* on the interval 0 < JC* < 1 for several values of a. (ii) For the nonisothermal case, Eq. (3.8.85) can be written in dimensionless form as

0=l+aC;-^-^(l-r + C;(a-n) i-o. (3.8.95) 9-0, C*r - H*H\

where 0, = TJT^i^\, Solve the coupled equations (3.8.93) and (3.8.95) by either the trapezoidal rule or the finite-difference approximation for the dimensionless parameters AC = O.l, a = 0.5, C% — 1.2, CJ = 0.85, 0, = 0.25, and H* = 2.8. Plot the dimensionless conversion ^* and reactor temperature profile ^ for 0 < JC* < 1. Compare the output conversion to the isothermal case, (iii) Derive a solution for Eq. (3.8.93) for the isothermal case when or = 1. • • Plot §* versus KX* for this case.

PROBLEMS

1. Use Gauss elimination to solve

Ax = b,

where 1 2 3 -4 A = 2 3 4 and -5 3 1 2 0 PROBLEMS 109

2. Find the LU-decomposition of the matrix A in Problem 1. 3. Consider the matrix

3 -2 0 0 0 -1 3 -2 0 0 A = 0 -1 3 -2 0 0 0 -1 3 -2 0 0 0 -1 3

Without the aid of a computer: (a) Find the inverse A"^ of A. (b) Evaluate the determinants of A and A~^ (c) Solve Ax = b, where ^, = 1, / = 1,..., 5. 4. Write a computer program to carry out Gauss elimination with partial pivoting. 5. Consider heat transfer in a square slab of length 10 in. on a side. At steady state, the temperature obeys the equation

0.

Assume that the temperatures of the boundaries of the plates are controlled so that

r(jc,0) = 370K(jc/10in.) T{0,y) = lOOKCv/lOin.) r(jc, 10 in.) = 100 K(l - jc/10 in.)^ r(10 in., y) = 370 K(l - y/10 in.)^

(a) Verify the finite-difference forms given for A and b in Eqs. (3.6.40) and (3.6.41). Generalize the results to an arbitrary subdivision of the square into n cells. (b) Using your Gauss elimination program, compute the temperature distribution by the method of finite differences. Estimate the discretization n for which the computed temperature obeys the accuracy criterion

-10 AO AO -10 / / T^"\x, y)dxdy- / r^"-^^(jc, >^) dx dy JQ JQ JQ JQ AO -10 < 10"^ / / T^"-^\x,y)dxdy,

where T^"^ and T^" ^^ are temperatures computed from successive discretization. I I 0 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

6. Solve the problem Ax = b, where A is a 100 x 100 tridiagonal matrix with -|-5 on the main diagonal, —3 on the diagonal just above the main diagonal, and —2 on the diagonal just below the main diagonal, and b is a vector with all elements equal to 1. Calculate x using the following methods: (a) Simple Gauss elimination. (b) The Jacobi method. (c) The Gauss-Seidel method. (d) The SOR method. Compare convergence properties and total evaluation times for each method. 7. Consider the system Ax = b, where

10"^jci + 10~^jC2 + JC3 = 2

lO'^jCi - 10-% 4- X3 = -2 X 10-^

jc, + JC2 + 2:^3 = 1.

(a) Does the system have a solution? Why? Using three-digit floating-point arithmetic, solve the system by: (b) Simple Gauss elimination without pivoting. (c) Gauss elimination with complete pivoting. (d) What is the exact solution to six-digit floating-point arithmetic? 8. (Least Squares Method). It is frequently known that some quantity p depends linearly on a set of variables, say 5^,..., s„, i.e.,

n P = Y,XjSj, (1)

where the x^ are constant coefficients. In the language of data fitting (statistics), Eq. (1) is referred to as a linear regression formula. Suppose that in an experiment run m times (where m does not necessarily equal n), the values of /?, ^i,..., 5„ are measured. We would like to use these data to estimate the values of the coefficients [xi,... .x^}. For this, we define

m / n \ 2 (2) i = \ \ ; = 1 / If the measurements could be done with infinite accuracy and if the regression formula (1) were indeed obeyed by p and the s^, then for each run / the equation

n PROBLEMS II

would hold and the quantity L would be identically 0. Any error in measurement would give rise to a violation of Eq. (3) and yield a positive value of L. L provides a collective measure of the error or scatter of data from all the runs. To determine the "best" set {.t,,..., x„} of coefficients from m experimental runs, we choose the coefficients so that L is a minimum. This procedure is known as the method of least squares for fitting a linear regression formula. (a) To minimize L, the coefficients must be chosen such that dL 0, k = \,. ,«, (4)

By defining the m x n matrix C such that c^j is the value of s^ measured in the ith experiment, prove that the condition in Eq. (4) leads to the matrix expression

where P\

(b) Use your Gauss elimination program to solve the equation when n = 5, m = 7, and the experimental data are as listed in Table 3.8. Indicate the degree of scattering of the data. 9. Consider the following three equations of state for single-component fluids: Redlich-Kwong: RT P = y-b s/TV{V+b) Soave: RT P = V-h V(V-f^) Peng-Robinson: RT aa V -b V^ + 2Vb-b^' Using the least squares method and the Newton-Raphson method, generate the "best fit" parameters for methane for each of the above equations of state using the experimental data given in Table 3.R9. Use your calculated parameters to plot P versus V for methane at T = 350 K for each equation of state. Include the above experimental values in your plot. 10. Consider the structural framework shown in Fig. 3.R10, consisting of n rods attached at a common pin joint at A. The rods are free swiveling at their attachment points and at the pin joint. We assume an external force f is applied at the pin joint, and denote {t[ ,tj as the tension of rods 1,..., n and {^j,..., ^„} as their orientations with respect to the x axis. 12 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

TABLE 3.R9

Data T(K) P (bar) V (Ug mol)

1 200 1.0 16.52 2 200 10.0 1.553 3 200 20.0 0.7149 4 300 1.0 24.90 5 300 10.0 2.452 6 300 20.0 1.206 7 400 1.0 33.24 8 400 10.0 3.310 9 400 20.0 1.648

Letting /j and /2 denote the x and y components of f, a force balance at A yields

St = f, (1)

where S is the 2 x n structure matrix

cos 01 COS O2 COS ^3 cosO„ S = (2) sin ^1 sin O2 sin 0^ sin^„

and t is an n-dimensional vector with components ?,,...,?„. With no applied force (f = 0), when none of the rods is under any tension, each of the rods is of some equilibrium length {/p /2, • • •, In) ^nd equilibrium orientation [0^, 0^,..., 0^}, (a) Defining the two-dimensional displacement vector d as the x- and 3;-coordinate values of the displacement of the pin joint (A) from its equilibrium position, and the ^-dimensional strain vector e as the

FIGURE 3.RI0 PROBLEMS I I 3

elongation of the rods under tension (e^ = /, — /?), prove that

e = S^d

in the "small strain" approximation (i.e., 0^ ^ 09). (b) If the rods are assumed to be Hookean substances, then the strain 6, = /, — /? obeys the linear law e, = A;,r,, where k^ is the flexibility constant (k^^ is the elastic modulus) of the /th rod. In matrix notation, e = Kt, where K is the matrix of materials constants

ki 0 0 •• • ^ 1 0 k2 0 • • ^ K = 0 0 ^3 0

0 0 0 • • KJ

Show that the strain d can be related to the external force f by Ad = f, where A = SR-^S'^. (c) Consider the case of five rods all made of steel of elastic modulus 207 X 10^ N/m^ (note the units). The above elastic modulus is defined by the relation cr = Ee, where a has units of stress (N/m^), e is the strain (€, = (/, — l^)/l^) in dimensionless form, and E is the elastic modulus. Assume that all of the rods have 0.1 cm^ cross-sectional area, and the attachment pins are separated by 2 cm (i.e., 2 cm apart) along the wall. When / = 0, the point A is at (8 cm, 7 cm) where the origin is located on the wall at the point of attachment of the lowest rod and the y axis is along the wall. If each of the components of the displacement vector d undergoes a 1% increase, find the force required and the tensions in each of the rods. (d) If the force applied at point A is given, set up the algorithm to find the displacement vector. Hint: Note that SK~^S^ is set up in terms of the ^,'s, which are unknown quantities. (e) Assume that a force of 50 kN is applied at A at 45° to the horizontal in the downward direction. Use the algorithm set up in part (d) to obtain the displacement vector d. Also find the new lengths of the rods and the tensions in each of them. Use the Newton-Raphson method to solve the system.

11. Solve the following equation:

X = I -{- exp(—flx/jc)

by

(a) Picard iteration (error criterion: (|A:^*+" - jc^*'| < 10"')). (b) The Newton-Raphson method (error criteria: |x**+" — JC***| < 10"'* and |/(;c<*'|< 10-*). (Let fl = i ) 14 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

12. Use the Newton-Raphson method with first-order continuation to find x as a function of a from the equation

X = I -{• exp(—aVx).

13. Consider the system of equations

0 = -jc + h^{exp(yy/x) + 3x^) 0 = ->' -h 0.5 + h^ tan(exp(jc) + J^).

Use the Newton-Raphson method to solve these equations for /i = 0.1 and 0.2. 14. Consider the concentric cylinders shown in Fig. 3,R14. Two fluids, labeled A and B, are separated by a meniscus, which is pinned on the tops of the inner (z = a,r = R^) and outer (z = b,r = R2) cylinders. At equilibrium, the meniscus position z = z{r) is determined by the Young-Laplace equation

-AP-\-Apgz = 2Hy, (1) where y is the surface tension, AP is the capillary pressure (pressure of phase A at z = 0 minus pressure that would exist at z = 0 if phase A were replaced by phase B), Ap is the difference between the density of fluids A and B, g is the acceleration of gravity, and H is the mean curvature, 1 dh 1 dz 2// = (2) [1 + (dz/dr^f^ dr^ + r[\ + {dz/drff^ dr '

One often wants to find the shape of the meniscus, i.e., the function z(r), by solving Eq. (1) subject to the boundary conditions

zir = RO and z(r = R2) = b.

z = b

2=0

FIGURE3.P.I4 PROBLEMS 115

(a) By introducing the finite-difference approximation, show that Eq. (1) takes the form

+ -^[(Ar)2 + (Zj^, - z,_i)V4]'^'[AP - Apgzj] = 0 (3) for 7 = 1,... ,/i. (b) Use the Newton-Raphson method to solve for the meniscus shape when fluid A is liquid water (at 293 K), fluid B is air at atmospheric pressure, R^ = \ cm, /?2 = 15 cm, a = I cm, and b = 1.2 cm. At 293 K, p^^ter = 0.998 g/cm\ p^^ = 0.0012 g/cm^ and y = 72.75 dyn/cm. Plot the resulting curve z(r). 15. In dimensionless units, the continuous stirred tank reactor equations can be expressed as

a expi I.* dt

—~ = 1 — ^2 + K exp( — — JJK*! — hX2.

Assume that or = 100, p = 5,y = 149.39, and 8 = 0.553. (a) Find the steady-state solutions. (b) Which steady-state solutions are stable to small perturbations? Why? (c) With the initial value XQ = (1 ±0.1)x% compute and plot x(t) versus time for each of the steady states x\ Use the Runge-Kutta method to solve the transient problem. The Runge-Kutta method to use for solving dx — =f(x), Xo = x(r=0), at is as follows:

X„+, = X„ + -[k, + 2k2 + 2k3 + k4],

with

k, = hUx„)

k, = ht(x„ + Ik,)

k3 = M(X„ + ik,)

k4 = hf(x„ + kj),

where x„ = x(/ == nh). I I 6 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

16. Consider the set of equations

/j = 1 + h^{Gxp(y^/x) + 3JC^) -kz-x=0 /2 = 0.5 + h^tan(exp(jc) + y^)-}-ksmz-y = 0 /3 = 1 4- h^xyz + kz^ -\-k^co^y-z = 0.

When h = k = 0, the solution to these equations is x = I, y = 0.5, and z = I. Use the Newton-Raphson method and continuation to explore solutions to the set as a function of the parameters h and k. In particular, find the solutions as a function of h when k = 0 and of k when h =0. Then look along the lines of constant values of h -\-k. 17. (a) Use Newton-Raphson method to determine the nonzero root of X = I — exp(—Qf^) to four correct decimal places (i.e., |6„_^il = \x,^_^_l — xj < 10~^) for a = 2. Choose J^Q = ^ to show that the solution converges quadratically. (b) Given a new a, set up the first-order continuation scheme for the above problem. Give a stepwise complete algorithm to obtain the new solution. Give all necessary equations. (c) Starting with a = 2 and the solution obtained from part (a), perform the first-order continuation and one Newton-Raphson iteration for a = 3.0. 18. The time-dependent reaction-diffusion equation in Eq. (3.6.34) can be solved iteratively by noting that the time derivative in the finite-difference approximation can be written as

dc c{x,t + At) — c(x,t) Jt ~ Xt '

By defining cf ^ as the value of c at the rth nodal point and the kth time step (i.e., ccj^^f ^ == cc{i( AJC, k At)), the matrix form of the time-dependent problem becomes

c'*+» - c<*> = ^[AC« + /(c<*>)] + A th, (1)

where A is defined in Eq. (3.6.41) (this is often referred to as the Euler method). Consider the reaction-diffusion problem for the second-order reaction f(c) = —kc^ over the spatial interval (0 < x < 1). Using the dimensionless values D = 0.001 and k = 0.25, plot the dimensionless concentration profile at r = 10 for the following initial and boundary conditions:

(a)

c(0, 0 = 0 c(l,r) = l c(x,0)=x^. PROBLEMS 117

(b)

c(0, 0 = 0 a —c(i,o = o dx c(jc,0)=JC^

(c) An important improvement to the Euler method is to replace the right-hand side of Eq. (1) with its future (A: + 1) values, giving

^iM) _ ^(k) ^ ^^\xc^M) + /(c<^+0)l + ^^ b (2)

(referred to as the implicit or backward Euler method). Write a program to solve Eq. (2) and repeat parts (a) and (b) above using the implicit Euler method. 19. Find the inverse of the n x n A whose components are 1

for the case n = 3 and 10. 20. For n = 10, solve by Gauss-Seidel iteration the equation Ax = b for ^j = 1, ^^ = 3, and h^ = 0, i 7^ 1 or 6, where A is the Hilbert matrix defined in Problem 19. 21. Find the LU-decomposition of the 10 x 10 matrix whose elements are

«n = 2/, «,,, + ! = -/, «/./_! = -/

and a^j — 0 otherwise. 22. The pressure and chemical potential of a van der Waals fluid are given by nkT -, P = - --n^a I — no (I -\ \1 nl:ihkT fi = -kT ln( — - 1 ) + -; 7 - 2na,

where n and T are the density and the temperature, respectively, k is Boltzmann's constant, and a and b are parameters characteristic of the fluid. A vapor of density n^ is in equilibrium with a liquid at density n, if

P{n,) = P{n,) (1) /x(n^) = /x(n,).

Find the densities n^ and n^ of coexisting vapor and liquid densities as a function of temperature T. The problem should be solved in terms of the dimensionless variables ^* ^Tb n*=:nb and T* = . a I 8 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

In terms of these dimensionless variables, the critical point is n* = | and T* = Yj' Above T*, liquid and vapor do not coexist (i.e., AT* = n* is the only solution to Eq. (1)), but for T* below T^ the equations in Eq. (1) admit the solutions ^?* / n* appropriate for coexistence. In terms of the dimensionless variables, the equations in Eq. (1) become

Use the Newton-Raphson method to solve these equations for n* and n\ as a function of temperature. A good first guess is needed. 23. Consider the van der Waals equation of state in Example 3.8.3:

nkT r, l-nb - n a

fji = -kT Inl — - 1 I + 2na, '(;s-'' ) I —nb

(a) Using the Newton-Raphson method, calculate the boiling-point temperature (T) and the equilibrium liquid and vapor densities (^i and «g) for ethanol (a = 12.016 atm-L^g-moF, b = 0.08405 L/g-mol) at atmospheric pressure (P = I atm). (b) Calculate and plot the planar interfacial density for ethanol using values of L/^ =1,5, and 10. 24. (Reversible Anionic Polymerization). A reaction scheme sometimes used to model polymerization is

ki I + M -> Py initiation step Pj + M^P2 polymerization steps

where / denotes the concentration of the initiator, M the monomer, and P„ the polymer containing n monomers. The rate equations corresponding to this scheme are

% = -kJM (1)

dM °° °° -7-=-klIM-k^Y,Pn+k2EPn (2) "' /!=1 n=2 dP, —^ = kJM - k^MPy 4- i^2^2 (3) at PROBLEMS 119

= A:,MP„_j - {kyM + k2)P„ + hPn^^. n>2. (4) dt

In matrix form, the rate equations for P|, P2' • • become

dV = AP + b, (5) dt

where

-kJM b = 0 (6)

and A is the tridiagonal matrix

-k^M k2 0 0 . 0 k^M -(itiM + itj) k2 0 • 0 A = 0 k^M -(k^M + k2) k2 0

(7)

This problem is special in that the vectors and matrices have an infinite number of entries and that A and b depend on time (through / and M, which must satisfy Eqs. (1) and (2)).

(a) Solve for the steady-state solution P* in terms of k^, ^2* ^O' ^^^ ?*' where MQ is the initial monomer concentration and f * is the steady-state conversion defined by ^* = (MQ — M*)/MQ (M* is the steady-state monomer concentration). (b) Describe how you would go about solving for the steady-state conversion §* for a given set of initial monomer and initiator concentrations. (c) In terms of the rate constants, what is the maximum possible value for the conversion §* for a given initial monomer concentration MQ? What initial conditions would be needed to achieve this conversion?

25. (Heat Exchanger Performance). As a design engineer, you have been asked to evaluate the performance of a heat exchanger and offer your advice. Two liquid streams (labeled "cold" and "hot") are contacted in a single-pass heat exchanger as shown in Fig. 3.P.25. Currently, the cold stream enters the exchanger at 95°F and exits at 110°F, while the hot stream enters at 150°F and exits at 115T. However, recent changes in the reactor temperature specification requires the cold stream to exit at 120°F (10° hotter than the current specs). Furthermore, the flow rates of both streams {Q^ and Qh) ^^^ fi^^d by other process constraints and cannot be changed. 120 CHAPTER 3 SOLUTION OF LINEAR AND NONLINEAR SYSTEMS

Qh Th,o

Qc "cold" ^\ >c

n t- Qh ThA "hot" FIGURE 3.P.25

Rather than design and install a new heat exchanger (a costly and time-consuming job), you observe that the two streams in the current configuration are contacted "co-currently." Thus, you suspect that simply swapping the input and output points of one of the streams (i.e., changing to a "countercurrent" configuration) may be all that is needed to meet the new specifications. Your goal is to estimate what the output temperatures of both streams would be (t^^ and r^.o) ^^ y^^ }^^^ rearranged the exchanger to be "countercurrent." (Assume that the heat transfer coefficient is unaffected.) (a) Use the following design equations to derive two independent equations for the two unknowns {t^^ and t^ J. Hint: Define new variables a^ = Q^C^^^/iUA) and a^ = GhQ,h/(^^) ^^^ ehminate them from the system of equations. "Cocurrent" configuration:

[(i'h,i-7;,i)-(7;,o-7;,o)] GcC„,c(7;,„-r,,i)-t/A in[(rh,i-r,,i)/(7;. Tc,o)y "Countercurrent" configuration:

c,o)] HiKo-T,,i)/(T^,i-tc,o)Y Variables:

Cc = flow rate of "cold" stream Qh = flow rate of "hot" stream Cy c = heat capacity of "cold" stream Q h = heat capacity of "hot" stream U = overall heat transfer coefficient A = effective contact area of exchanger r^ i = inlet temperature of "cold" stream FURTHER READING 121

T^ i = inlet temperature of "hot" stream T^^ = outlet temperature of "cold" stream (original configuration) 7]^^ z= outlet temperature of "hot" stream (original configuration) t^ ^ = outlet temperature of "cold" stream (new configuration) ^h, o = outlet temperature of "hot" stream (new configuration) (b) Calculate the Jacobian matrix elements for the Newton-Raphson procedure in terms of the current guesses (f^ ^ and ^^,0)- (c) Solve this problem using the Newton-Raphson method. (d) The process engineers inform you that the "cold" stream inlet temperature occasionally drops from 95 to 85°F. Using first-order continuation, calculate new initial guesses for a Newton-Raphson calculation at this new inlet temperature (T^ j = 85°F) based on the converged solutions (t*^ and t^ J from the old inlet temperature (r^ i = 95''F). Use the Newton-Raphson method to solve for the new outlet temperature.

FURTHER READING

Adby, P. R. (1980). "Applied Circuit Theory: Matrix and Computer Methods/' Halsted, New York. Chapra, S. C, and Canale, R. R (1988). "Numerical Methods for Engineers." McGraw-Hill, New York. Dahlquist, G., and Bjorck, A. (1974). "Numerical Methods." Prentice Hall International, Englewood Cliffs, NJ. Faddeeva, V. N. (1959). "Computational Methods of Linear Algebra." Dover, New York. Gere, J. M., and Weaver, W. (1965). "Analysis of Framed Structures." Van Nostrand, Princeton, NJ. Golub, G. H., and Van Loan, C. F. (1989). "Matrix Computations," 2nd Ed. Johns Hopkins Press, Baltimore. Hackbush, W. (1994). "Iterative Solution of Large Sparse Systems of Equations." Springer-Verlag, New York. Hoffman, J. (1992). "Numerical Methods for Engineers and Scientists." McGraw-Hill, New York. Householder, A. S. (1965). "The Theory of Matrices in Numerical Analysis." Blaisdell, Boston. Lewis, W. E., and Pryce, D. G. (1966). "Application of Matrix Theory to Electrical Engineering." Spon, London. Magid, A. R. (1985). "Applied Matrix Models: a Second Course in Linear Algebra with Computer Applications," Wiley, New York. Press, W. H., Teukolsky, S. A., Vetterling. W. T, and Flannery, B. P (1988). "Numerical Recipes in Fortran," 2nd Ed. Cambridge University Press, Cambridge, UK. Saad, Y (1996). "Iterative Methods for Sparse Linear Systems." PWS, Boston. Strang, G. (1988). "Linear Algebra and Its Applications." Academic Press, New York. Varga, R. S. (1962). "Matrix Iterative Analysis." Prentice Hall International, Englewood Cliffs, NJ. Watkins, S. W. (1991). "Fundamentals of Matrix Computations," Wiley, New York. Wolfram, S. (1996). "The Mathematica Book." 3rd Ed. Cambridge University Press, Cambridge, UK. This Page Intentionally Left Blank GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

4.1. SYNOPSIS

If in the linear system Ax = b the matrix A is square and nonsingular, then Cramer's rule establishes the existence of a unique solution for any vector b. The only problem remaining in this case is to find the most efficient way to solve for X. This was the topic of the previous chapter. On the other hand, if A is singular or non-square, the linear system may or may not have a solution. The goals of this chapter are to find the conditions under which the linear system has a solution and to determine what kind of solution (or solutions) there is. In pursuit of these goals, we will first prove Sylvester's theorem, namely, that the rank r^ of C = AB obeys the inequality r^ < min(r^, r^), where r^ and r^ are the ranks of A and B, respectively. This theorem can be used to establish that r^ = rj^{oT rg) if A (or B) is a square nonsingular matrix. A by-product of our proof of Sylvester's theorem is that |C| = |A| |B| if A and B are square matrices of the same dimension. Thus, the determinant of the product of square matrices equals the product of their determinants. We will then prove the solvability theorem, which states that Ax = h if and only if the rank of the augmented matrix [A, b] equals the rank of A. When a solution does exist for an m x « matrix, the general solution will be of the form X = Xp -I- Yl'i=[ ^i^h^^ where Xp is a particular solution, i.e.. Ax = b, and the vectors x^^\ ..., x^""'^^ are «—r linearly independent solutions of the homogeneous equation Ax^ = 0 (the c, are arbitrary numbers).

123 124 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

As a by-product of the solvability theorem, we will also prove that if A is an m xn matrix, of rank r^, then A contains r linearly independent row vectors. The remaining n —r column vectors and m — r row vectors are linear combinations of the r linearly independent ones. It follows from these properties that the number of linearly independent column vectors in the set {a,,..., a,J is equal to the rank of the matrix A = [a^,..., a„]. Gauss or Gauss-Jordan transformation of A provides an efficient way to determine the rank of A or the number of linearly independent vectors in the set {aj,..., a„}. In the last section, we will restate the solvability theorem as the Fredholm alternative theorem. That is, Ax = b if and only if b is orthogonal to z (i.e., b^z = 0) for any solution of the homogeneous equation A^z = 0, where A^ is the adjoint of A. The advantage of the Fredholm alternative form of the solvability theorem is that it is generalizable to abstract linear vector spaces (e.g., function spaces in which integral or differential operations replace matrices) where the concept of the rank of a matrix does not arise.

4.2. SYLVESTER'S THEOREM AND THE DETERMINANTS OF MATRIX PRODUCTS

Recall from Chapter 1 that an rth-order determinant M\fir) of the matrix formed by striking m—r rows and n — r columns of an m x n matrix A is called an rth-order minor of A. For example, consider the matrix

A = ^22 ^23 -*24 (4.2.1)

^32 ''33 ^34 If we strike the second row and the second and third columns

«11 «14 (4.2.2) L «31 «34 J we obtain the second-order minor

«11 ^14 Mf = (4.2.3) ^31 ^34

n, we obtain the third-c

^12 «13 r(3) _ «21 ^22 ^23 (4.2.4)

«3 1 «32 « 33 We previously defined the rank r^ of a matrix A as the highest order minor of A that has a nonzero determinant. For instance, the rank of

1 2 1 (4.2.5) 1 1 SYLVESTER'S THEOREM AND THE DETERMINANTS OF MATRIX PRODUCTS 125

is 2 since striking the third column gives the minor

1 2 M<^> = -1. (4.2.6) 1 1

The rank of

1 1 1 (4.2.7) 1 1 1

on the other hand, is 1 since striking one column gives

1 1 M^^> = 0, (4.2.8) 1 1

and striking two columns and one row gives

M^^^ = |l| = (4.2.9)

One of the objectives of this section will be to relate the rank of the product of two matrices to the ranks of the matrices; i.e., if C = AB, what is the relationship among r^ and r^ and r^? If A is a /? x n matrix with elements a^j and B is an n X q matrix with elements b^j, then C is a /? x ^ matrix with elements

^ij = H^ikhj^ / = 1,..., /7, y = 1,. .,, ^. (4.2.10) k=i Since only a square matrix has a determinant, we know from the outset that r^ < min(/7, n), TQ < min(n, q), and r^ < min(/?, q). There are, of course, cases where

2 0 1 0 2 0 and B then C = (4.2.11) 0 1 0 2 0 2

and so r^ = r^ = r^ = 2. However, there are also cases where r^^ < r^ and r^; e.g., if

1 1 1 0 0 0 A = and B = , then C = (4.2.12) 0 0 -1 0 0 0

and so r^ = 0, r^ = rj5 = 1. Similarly, if

1 1 1 1 1 010 A = and B = then C = 00 -1 0-1 000 (4.2.13)

and so r^ = 1, r^ = 1, and r^ = 2. The question is whether there are other possibilities. The answer is provided by 126 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

SYLVESTER'S THEOREM. If the rank of A is r^ and the rank o/B is r^, then the rank r^^ ofC = AB obeys the inequality

re

The proof of the theorem is straightforward though somewhat tedious. If A and B are /? X n and n x q matrices, then

Cii Ci C = (4.2.15)

Si ^2 PQ J

where the elements c^j are given by Eq. (4.2.10). Let the indices h^,... ,h^ and A:,,..., /:,. denote the rows and columns of M^'\ an rth-order minor of C given as

^h,k, (^h,k2 ^h^k. M^^' = (4.2.16)

^h,ki ^h.kj ' • • ^Kk,

Since C;,.^, = ^",=1 ^j.bj^kr ^c can be rewritten as

2Z%h^hk, ^h,k2 ^h^K

M,(r)

Jl^Kh^hk, %k2 ^Kk,

^hJx^j.k, %k2 ^h.kr (4.2.17) E J] ^hX

^h,K

h ^Kh ^Kh ^hrK In accomplishing the succession of equations in Eq. (4.2.17), we have utilized the elementary properties 6 and 4 of determinants discussed in Chapter 1. We can continue the process by setting c^^i^^ - I]" =1 ^hj2^hh ^^^ carrying out the same elementary operations to obtain

%h %J2 ^h^k. ^h,kr (4.2.18)

%h %J2 ^Kh ^h,k. SYLVESTER'S THEOREM AND THE DETERMINANTS OF MATRIX PRODUCTS 127

Continuation of the process finally yields

%h %J2 •"hxjr (4.2.19)

^Kh ^Kh ' ' • ^Kjr =E--- E b,,^...b,,Mr,

where we have identified that the remaining determinant is an rth-order minor M (r) of A. Note that if r > r^, then the minor Mf(r)^ is 0 by the definition of r^. Thus, Eq. (4.2.19) proves that

rc<^^. (4.2.20)

Beginning again with Eq. (4.2.16), we can express the elements of the first row of M^c^ as

(4.2.21)

71 = 1

and then use elementary properties to obtain

(42,22) M[^^ = E^hiJi-

^h,kx ^hJi ^h,kr

Continuation of this process with the other rows eventually yields

<-E--- E «*,;.•--"^.y, J\ h¥J\-Jr- (4.2.23) ^hh ''' ^J,K

Again, the rth-order minor Mg of B will be 0 if r > r^, and so we conclude

rr < rn. (4.2.24)

The combination of Eqs. (4.2,20) and (4.2.24) implies Sylvester's theorem, Eq. (4.2.14). 128 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

In the special case that A and B are square n x n matrices

c = (4.2.25) E^«;i^-,l E%2^22

and elementary operations similar to those yielding Eq. (4.2.23) give

b b; hr ;i2 (4.2.26) ici=i:--- E 'l7l Jr1^h-Jr-I ^Jnl ^;«2 Jnn The determinant above involving the b^j elements will be nonzero only if the integers Jx, ji^ • - - ^ Jn ^^e a permutation of 1, 2,..., n. Otherwise, two rows of the determinant would be the same. Moreover, the determinant in Eq. (4.2.26) is equal to (—1)''|B|, where P is the number of transpositions necessary to reorder the integers 7i, 72' • •» Jn to the sequence 1, 2,..., n (thereby rearranging the columns to the proper order to get |B|). Thus, Eq. (4.2.26) can be rewritten as

ici = iBix:...i:(-i)%. ...fl,njn' (4.2.27)

Noting that the factor to the right of |B| is, by definition, the determinant |A|, we have proved the following:

THEOREM. If A and B are square matrices, the determinant of their product C is the product of their determinants, i.e.,

|C| = |A||B|. (4.2.28)

This result is not only useful in evaluating products of determinants, but it is also frequently employed in proving theorems concerning matrices and linear equations. For example, if A is nonsingular, i.e., |A| ^ 0 and A~^ exists, then AA-* = I and Eq. (4.2.28) implies lAA'^l = |A| |A-^| = |I| = 1, or |A-*| = 1/1 A|. This greatly simplifies finding the determinant of the inverse of A. Before leaving this section, let us establish a corollary to Sylvester's theorem that is very useful in the theory of solvability of Ax = b.

COROLLARY. The multiplication of a matrix B by a square nonsingular matrix A does not change the rank ofB; i.e., ifC = AB, and if A is square and nonsin- gular, then

rc=^r,. (4.2.29)

The proof is simple. If A is a nonsingular n x n matrix, then its rank r^ is n since |A| :J^ 0, and so is the rank of A"^ since |A~^| ^ 0. According to Sylvester's theorem,

re < min(r^, r^) = min(n, r^). (4.2.30) GAUSS-JORDAN TRANSFORMATION OF A MATRIX 129

But

B=:A 'C (4.2.31)

for which Sylvester's theorem requires

Tg < min(r^-i, r^^) = min(n, r^). (4.2.32)

Equations (4.2.30) and (4.2.32) combine to imply that r^^ < Vj^ and r^ < r^^, which can only be true if r^ —r^, thus proving the corollary. A similar proof establishes that the rank of BA is the same as that of B if A is a square nonsingular matrix.

EXAMPLE 4.2.1. Consider the matrices

-2 1 -2 A = B (4.2.33) 1 -1 2

and

AB (4.2.34)

We note that since |A| 3, r^ =2, and since the order of the only nonzero minor of B is 1, r^ = 1. The same is true of AB and so r^^ 1. This is admittedly a rather trivial example, but in the next sections we will flex the true muscle of the corollary.

4.3. GAUSS-JORDAN TRANSFORMATION OF A MATRIX

Consider the nonsingular matrix

^11 a,2

«2l «22 (4.3.1)

^31 «32

Assuming a^^ 7^ 0, the first step in Gauss elimination yields

^12

0 ^22 ^23 (4.3.2)

0 hi ^33

where fc,y = a^^ (^n/^ii)^U» ' = 2» ^' J = 1' 2' ^- If ^22 = ^» ^^ interchange rows 2 and 3 (one of ^22 ^^^ ^32 ^^^^ to be nonzero since r^ = 3) to obtain

b'^r. b^ 0 ^22 23 (4.3.3)

0 b'..^32 b'3 3 I 3 0 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

Next, we multiply row 2 by 012/^22 ^^^ subtract it from the first row to obtain

«ii 0 c,3 0 b',^ b',, (4.3.4) 0 b',2 ^33 We then cany out Gauss elimination on the third row to get

0 b'22 ^23 (4.3.5)

0 0 C33

Multiplying the third row by C13/C33 and subtracting it from the first row, and multiplying the third row by ^23/^33 ^^^ subtracting it from the second row, we finally arrive at

«u 0 0

0 ^22 0 (4.3.6) 0 0 Cr

The process we just described leading to the result in Eq. (4.3.6) is known as Gauss-Jordan elimination. If we wish to solve the system of equations

^21^1 + <^22-^2 + ^23-^3 ~ ^2 (4.3.7)

a^iXi + ^732^2 + ^33X3 = ^3

represented by

Ax = b, (4.3.8)

where r^ = 3, then the Gauss-Jordan transformation of the augmented matrix [A, b] yields

«n 0 0 «l1 0 ^22 0 «2 (4.3.9)

0 0 •^33 «3 J where a^ are the values of b^ obtained through the transformation process. The linear equations corresponding to Eq. (4.3.9) are then

a,ix, = a,

^22^2 (4.3.10)

^33-^3 — ^3'

which leads to the simple solution x^ = cci/a^^, X2 = ^2/^22' ^^^ -^3 = ^3/<^33- GAUSS-JORDAN TRANSFORMATION OF A MATRIX 131

For the general nonsingular problem Ax = b, the Gauss-Jordan transformation converts the equations

anXi+ai2X2-}-'"+a^„x^ = b^

(4.3.11)

(^n\Xi + a„2^2 + • • • + a„,x„ = b.

into the system

«ii^i = ai Cl'yyjC'y (4.3.12)

for which the solution is simply jc, = oii/a-f. In Eq. (4.3.12) the quantities a'.- and a, are values of the original aij and b^ obtained through the transformation process. We see that for nonsingular problems the advantage of Gauss-Jordan elimination over Gauss elimination is that the backward substitution step is eliminated (or rendered trivial since jc, = «//«-,). A Mathematica routine is provided in the Appendix for the Gauss-Jordan algorithm. If we use a complete pivoting strategy for numerical stability, then column interchange, as well as row interchange, could occur. Suppose a column exchange occurs at the second step. Then the matrix in Eq. (4.3.2) would be

0 fo^2 fei2 3 (4.3.13) 0 bL b'..

with a[2 = <3i3, ajj = a^2^ b'22 = ^23* ^^c- The corresponding system of equations can be expressed as

«llJl+«l23'2+^i3>'3 =«1 ^223^2 + ^23)^3 = ^2 (4.3.14) '^32>'2 + fc33}^3=«3'

where y^ = jc,, }^2 = •^3» ^^d y'^ = X2. Thus, Gauss-Jordan elimination with row and column interchange transforms Eq. (4.3.7) into the system

Atry = a (4.3.15) 132 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

or

^uy\ ^iiyi (4.3.16)

^333^3 = «3

The components }^i, ^2' ^3 of y are simply a reordering of the components x^,X2,x^ of X. The conespondence between the components of x and y is easily determined by keeping track of the column interchanges. To do this symbolically, let us define a matrix 1,^, which, when multiplied by x, will interchange components i and j. For a four-dimensional problem, an example is

10 0 0 0 0 10 l23 = (4.3.17) 0 10 0 0 0 0 1

from which it is easy to show that

L,x (4.3.18)

L-^4 J

For this four-dimensional case, it is easy to see that the matrix I^y, which inter- changes components jc- and Xj through the product I,^x, is obtained by interchang- ing columns / and j of the unit matrix I. This turns out to be generally true. That is, for an «-dimensional case, the matrix 1,^, obtained by interchanging columns / and j of the n X n unit matrix I, interchanges rows / and j of x through the multiplication I^^x. To see how the 1,^ might be useful, imagine a Gauss-Jordan elimination pro- cess in which columns 1 and 3 are interchanged followed by columns 2 and 5 and then columns 3 and 6. In the first interchange x is transformed into I13X, in the sec- ond into I25I13X, and in the third into I36I25I13X, so that, in the equation Aj^y = a,

y = Qx. (4.3.19)

We can see that the determinant of I,^ is equal to —1, since |Ij^| differs from |I| only by the interchange of two columns. This implies that the matrix Q is also nonsingular. For the Q defined in Eq. (4.3.19), |Q| = II36III25III13I = — 1, and since Q is nonsingular, the solution x to Ax = b is uniquely determined as X = Q~^y from the solution to A^j = a. GENERAL SOLVABILITY THEOREM FOR Ax = b 133

4.4. GENERAL SOLVABILITY THEOREM FOR Ax = b

Although up to this point we have concentrated mostly on solving Ax = b for nonsingular matrices A, the more general problem is of substantial practical and theoretical interest. Consider the system of equations

^21^1 + %2-^2 + '"+ C^ln^n = ^2 (4.4.1)

or

Ax = b, (4.4.2)

where A is an m x n matrix with elements a,^, x is an n-dimensional vector, and b is an m-dimensional vector. We will consider the general situation in which m and n need not be the same and the rank r of A will be less than or equal to min(m, n). Square nonsingular and singular matrices constitute special cases of the general theory. A non-square problem may or may not have a solution and when it does have a solution it is may not be unique. For instance, the system

(4.4.3) 2xi H- A:2 + 2x3 = 2

has the solution

jCi = -l-X3, JC2=0 (4.4.4)

for arbitrary JC3, and thus a solution exists but is not unique. On the other hand, the system

JC, -f- 2X2 -f- JC3 = 1 (4.4.5) 2JCI + 4JC2 + 2JC3 = 1

has no solution since multiplication of the first equation by 2 and subtraction from the second equation yields the contradiction

0=-l. (4.4.6)

Thus, the equations in Eq. (4.4.3) are compatible, whereas those in Eq. (4.4.5) are not. The theory of solvability of Ax = b must address this compatability issue for the general case. 134 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

As an aid in examining the general case, we will first discuss Gauss-Jordan elimination for the 3 x 4 matrix

A = ^21 -^22 ^23 ^24 (4.4.7)

'*32 ^33 ''34

At this point, we know nothing about A except the values of a^j. For the elimination process, we note that the matrix I^y defined in the previous section has the useful properties that pre-multiplication of A by I,y (i.e., 1,^ A) interchanges rows / and j and post-multiplication of A by I^^ (i.e., AI,y) interchanges columns / and j. For example, if

0 1 0 1 0 0 (4.4.8) 0 0 1

then it is easy to see that

0 1 0

I,2A = 1 0 0 '*22 '*23 ^24

0 0 1 ^32 ^33 '*34 (4.4.9) a 21 a 22 '*23 «24 ^14

*32 ^33 «34

And if

0 1 0 0 1 0 0 0 Ii. = (4.4.10) 0 0 1 0 0 0 0 1

then

0 1 0 0 1 0 0 0 AI,2 = ^22 ^23 ^24 0 0 1 0 ^31 ^32 ^33 ^34 0 0 0 1 (4.4.11)

^22 ^21 ^23 ^24

^32 ^33 -*34

Note that, for an m x « matrix A, the matrix I,y has to be an m x m matrix for pre-multiplication and an n x n matrix for post-multiplication. GENERAL SOLVABILITY THEOREM FOR Ax = 6 135

For Gauss-Jordan elimination, one other matrix is needed. This is the square matrix J^jik), i ^ j, which has elements 7*;, = 1 for / = 1,..., n, j^j = k, and ji^ = 0 if/ 7«^ m and Im ^ ij. To generate Jij(k), we can start with the unit matrix I and add k as the [ij ]ih element. For example, the 3 x 3 version of J23(fc) is

1 0 0 J23W = 0 1 k (4.4.12) 0 0 1

Notice that the product of J23(^) with the matrix defined by Eq. (4.4.7) is

^14

J23(^)A = «21 H- ^«31 «22 + ^«32 ^23 + ^^33 ^24 -hka 34 (4.4.13) «3l ^32 ^33 a3 4 Thus, pre-multiplying A by Jij(k) produces a matrix in which the product of k and the elements of the jth row are added to the elements of the /th row. According to the elementary properties of determinants, the determinants of A and }fj{k)A are the same. Since Jfjik) can be generated by multiplying the /th column of I by ^ and adding this to the yth column, it follows that the determinant of Jij(k) is 1. We also note that the inverse of J,y(A:) is J,^(—A;) because

lj(-k)lj(k) = l, (4.4.14)

by inspection. Since I,yl,y = I, the matrix I,y is its own inverse. The matrices I,^ and 3jj{k) are the tools we need to establish the solvability of any linear system represented by Ax = b. In Gauss-Jordan elimination, the matrix A is transformed into A^^ by multipli- cation of the square nonsingular matrices 1,^ and J,^. It is important to remember that this will not change the rank of A. Consider the matrix defined by Eq. (4.4.7). If the rank r of A is not 0, a nonzero element can be put at the {11} position by the interchange of rows and/or columns. As was shown above, this can be accom- plished by pre- and/or post-multiplying A by the appropriate 1,^ matrices. For sim- plicity of discussion, suppose that r > 0 (otherwise, A = [0]) and that fl|, ^ 0. Then the first step of Gauss elimination is accomplished by the matrix operation

«n a,2 a,3 «14 0 ^22 ^23 &24 (4.4.15) '<-'&- «31 «32 «33 «34

where

0 0 1 0 (4.4.16) 0 1 136 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

and b22 = <^22 ~ ^12^21/^11' ^23 — ^23 ~ ^13^21/^ii» ^^^ ^24 ~ ^24 ~ ^u^ii/^ii' ^^ the next step, Eq. (4.4.15) is multiplied by J3i(—%i/an) to obtain

«11 «12 ^13 «14 1 (4.4.17) ^"{-"tM-'&-0 ^22 ^23 ^24 0 ^32 ^33 ^34 J If the rank r^ = 1, then all the elements b^j in Eq. (4.4.17) are 0 and the trans- formed matrix is simply

«11 «12 «13 «14

Ir — 0 0 0 0 (4.4.18) 0 0 0 0

If r^ > 1, then at least one of the elements b^j has to be nonzero. By pre- and/or post-multiplication of Eq. (4.4.17) by the appropriate 1,^ matrices, row and/or col- umn interchanges can be carried out to place a nonzero element at the {22} posi- tion. For illustration, assume that ^22 ^^ ^32 ^^ ^23 ^^ ^ ^nd Z733 ^ 0. Then

^11 ^13 «12 «14 (4.4.19) IM23J3J 1 {-'tM-'t>"- 0 ^33 0 ^34 0 0 0 boA

Pre-multiplication of the above matrix by Ji2(—^13/^33) yields

0

0 ^^33 0 ^34 (4.4.20)

0 0 0 ^24 If the rank r^ of A is 2, then ^24 = ^^ i^., the transformed matrix is

0

Atr = 0 ^33 0 ^34 (4.4.21) 0 0 0 0

If, however, r^ 3, then ^^24 7^ 0, and post-multiplication of Eq. (4.4.20) by I34 yields

0

0 ^33 ^34 0 (4.4.22)

0 0 -^24 0

Pre-multiphcation by Ji3(—^14/^24) followed by J23(—^34/^24) gives the trans- formed matrix

0 0

0 -^33 0 0 (4.4.23)

0 ^34 0 GENERAL SOLVABILITY THEOREM FOR Ax = b 137

Equations (4.4.18), (4.4.21), and (4.4.23) are the Gauss-Jordan transformations of A for the cases r^ = 1, 2, and 3, respectively (with the special situation conditions ^722 = ^32 = ^23 = ^ introduccd for illustration purposes). The relationship between Ajr, say in the case of Eq. (4.4.23), is

PAQ, (4.4.24)

where i-m-^M-^M-'tM-'t) Gi^a 1 + 13"31 a^xc21*-1 4 «]3 -1 «ii^:3 3 + aub11*^3 4 ^34 ^33 (4.4.25) ^21^24 ^24 1 ^11^34 ^34

0

and 1 0 0 0 0 0 0 1 (4.4.26) 23*34 — 0 1 0 0 0 0 1 0 P is a product of nonsingular 3x3 matrices, and so it is a nonsingular 3x3 matrix itself. Likewise, Q is a nonsingular 4x4 matrix. For the particular cases resulting in Eqs. (4.4.25) and (4.4.26), we find |P| = -1, |Q| = -<1. From this 3x4 example, the matrix transformation in the Gauss-Jordan elim- ination method becomes clear for the general matrix

^21 ^22 '*2n A = (4.4.27)

If the rank of A is r, then, through a sequence of pre- and post-multiplications by the I,y and J^y matrices, the matrix can be transformed into

Yu 0 • 0 Ki, r+1 •• Yxn

0 K22 • • 0 ¥2. /-fl " Yin

A„ = PAQ = 0 0 • • Yrr Yr.r+ 1 Yrn (4.4.28) 0 0 0 0 0

0 0 I 3 8 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

in which P is a product of the I,y and J,y matrices and Q is a product of the I,^ matrices, and where y^i = 0 for / 7^ j and /, 7 = 1,,.., r and y^j = 0 for / > r. The values of y^j fori < r, j > r may or may not be 0. Thus, the matrix A^^ can be partitioned as

Aj A2 A, (4.4.29) O3 O4

where A| is an r x r matrix having nonzero elements y,, only on the main diagonal; A2 is an r X (n—r) matrix with the elements y,^, / = 1,..., r and 7 = r + 1,..., /i; O3 is an (m —r)xr matrix with only zero elements; and O4 is an im — r)x(n — r) matrix with only zero elements. Since the determinants |P| and |Q| must be equal to ±1, the rank of A^^ is the same as the rank of A. Thus, the Gauss-Jordan elimination must lead to exactly r nonzero diagonal elements y^,. If there were fewer than r, then the rank of A would be greater than r. Either case would contradict the hypothesis that the rank of A is r. The objective of this section is to determine the solvability of the system of equations

aji^l +'3j2^2 + ---+^ln-^;i =^1

(4.4.30)

represented by

Ax = b. (4.4.31)

If P and Q are the matrices in Eq. (4.4.28), it follows that Eq. (4.4.31) can be transformed into

PAQQ X = Pb (4.4.32)

or

At,y = a, (4.4.33)

where

y = Q-*x and a = Pb. (4.4.34) GENERAL SOLVABILITY THEOREM FOR Ax = b 139

In tableau form Eq. (4.4.33) reads

(4.4.35)

0 =«,+ ,

0 =a^.

If the vector b is such that not all or,, where i > r, are 0, then the equations in Eq. (4.4.35) are inconsistent and so no solution exists. If a, = 0 for all / > r, then the solution to Eq. (4.4.35) is

3^1 = -Pl,r+iyr+l Puyn + (^i/Yn

yi = -ft.r-hlJr+l ^yn + «2/K22 (4.4.36)

yr = -A-,r+l3^r+l Prnyn + «r/Kr, for arbitrary values of y^_^i,..., y„ and where

a _ ^U (4.4.37)

Of course, if Eq. (4.4.35) or Eq. (4.4.33) has a solution y, then the vector x = Qy is a solution to Ax = b (i.e., to Eq. (4.4.31)). Let us explore further the conditions of solvability of Eq. (4.4.33). The aug- mented matrix corresponding to this equation is

Yn 0 0 •• • 0 n, r+l •• Yin «! 0 K22 0 •• • 0 Yi r+1 " Yin «2

[At„a] = 0 Yrr Yr,r+ l Yrn a. (4.4.38) 0 0 0 0 a,.

0 0 0

As indicated above, Eq. (4.4.33) has a solution if and only if or, = 0, i > r. This is equivalent to the condition that the rank of the augmented matrix [Aj^, a] is 140 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

the same as the rank of the matrix A^^—which is the same as the rank r of A. If Af^y = a has a solution, so does Ax = b. When the rank of [Ajj., a] is r, what is the rank of the augmented matrix [A, b]? From the corollary of the previous section, it follows that the rank of C, where

C = P[A, b] = [PA, Pb] = [PA, a], (4.4.39)

is the same as the rank of [A, b] since P is a square nonsingular matrix. Note also that the augmented matrix

[A,„a] = [PAQ,a] (4.4.40)

has the same rank as C since PAQ and PA only differ in the interchange of columns—an elementary operation that does not change the rank of a matrix. Thus, the rank of [A, b] is identical to the rank of [Atj., a]. Collectively, what we have shown above implies the solvability theorem:

THEOREM. The equation Ax = b has a solution if and only if the rank of the augmented matrix [A, b] is the same as the rank of A.. Since the solution is not unique when A is a singular square matrix or a non- square matrix, we need to explore further the nature of the solutions in these cases. First, we assume that the solvability condition is obeyed (a, = 0, / > r) and consider again Eq. (4.4.36). A particular solution to this set of equations is

Ji = —, / = 1,. .. ,r, Yii (4.4.41) }\=0, r=r + l,... , n,

or

yii

Yrr (4.4.42)

0

0 However, if yp is a solution to Eq. (4.4.36), then so is yp + y^, where y^^ is any solution to the homogeneous equations

y\ = -^l,r+iyr+l AnJ/t (4.4.43)

yr = -Pr,r+\yr^\ Pmyn' GENERAL SOLVABILITY THEOREM FOR Ax = b 141

One simple solution to this set of equations can be obtained by letting y^_^^ = \ and y, = 0, / > r + 1. The solution is

~A,/-fl yi" 1 (4.4.44) 0

Similarly, choosing y^^^ = 0, y^^2 = 1' ^"^ yi = 0, / > r + 2, gives the solution

~Pr,r+2 0 yf (4.4.45) 1 0

Using this method, the set of solutions yj(1,) , •.., y},^An-r)" can be generated in which

-A, r+j

~Pr,r+j 0 yH' (4.4.46)

(r -f 7)th row

0

for 7 = 1,..., n — r. There are two important aspects of these homogeneous solutions. First, they are linearly independent since the (r -f 7)th element of y^^^ is 1, whereas the (r + 7)th element of all other y^^\ k j^ 7, is 0, and so any linear combination of y^^\ k ^ 7, will still have a zero (r + j)th element. The second aspect is that any solution to Eq. (4.4.43) can be expressed as a linear combination of the vectors yl^\ To see this, suppose that y^^,,..., j„ are given arbitrary values I 42 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

in Eq. (4.4.43). The solution can be written as

- Prnyn yh + 0 + 0

0 + 0 + yn (4.4.47)

which, in turn, can be expressed as

(4.4.48) 7 = 1

The essence of this result is that the equation A^^y = ct has exactly n — r linearly independent homogeneous solutions. Any other homogeneous solution will be a linear combination of these linearly independent solutions. In general, the solution to A^^y = ot can be expressed as

y = yp + E^iyh0 ) (4.4.49) ;=i

where yp is a particular solution obeying Aj^yp = a, yl;0) are the linearly indepen- dent solutions obeying At^y^^^ = 0, and Cj are arbitrary complex numbers. Because y^^^ is a solution to the homogeneous equation Aj^yh = 0, the vectors xj,'^ = Qy^^^ are solutions to the homogeneous equation Ax^^^ = 0. The set {x^^^}, 7 == 1,. .., « — r, is also linearly independent. To prove this, assume that the set is linearly dependent. Then there exist numbers aj,.. ., a„_^, not all of which are 0, such that

0. (4.4.50)

But multiplying Eq. (4.4.50) by Q"^ and recalling that yj,^^ = Q~^x;^^ yields

.0') E^^yh 0. (4.4.51) 7 = 1

Since the vectors y\^^JD are linearly independent, the only set [aj] obeying (4.4.51) is a^ ==...= a,^_^ = 0, which is a contradiction to the hypothesis that the x^^^'s are linearly dependent. Thus, the vectors xj^^ 7 = 1,..,, n - r, must be linearly independent. GENERAL SOLVABILITY THEOREM FOR Ax = b 143

We summarize the findings of this section with the complete form of the solvability theorem:

SOLVABILITY THEOREM. The equation

Ax = b (4.4.52)

has a solution if and only if the rank of the augmented matrix [A, b] is equal to the rank r of the matrix A. The general solution has the form

x = Xp + Xlc^-.0) (4.4.53)

where Xp is a particular solution satisfying the inhomogeneous equation AXp = b, the set x^^ consists ofn — r linearly independent vectors satisfying the homogen- eous equation Axj^ = 0, and the coefficients Cj are arbitrary complex numbers. For those who find proofs tedious, this theorem is the "take-home lesson" of this section. Its beauty is its completeness and generality. A is an m x n matrix, m need not equal n, whose rank r can be equal to or less than the smallest of the number of rows m or the number of columns n. If the rank of [A, b] equals that of A, we know exactly how many solutions to look for; if not, we know not to look for any; or if not, we know to find where the problem lies if we thought we had posed a solvable physical problem. EXAMPLE 4.4.1. Consider the electric circuit shown in Fig. 4.4.1. The V's denote voltages at the conductor junctions indicated by solid circles and the con- ductance c of each of the conductors (straight lines) is given. The current / across a conductor is given by Ohm's law, / = cAV, where AV is the voltage drop between conductor junctions. A current i enters the circuit where the voltage is V, and leaves where it is VQ. The values of Vj and VQ are set by external conditions.

V/ (1) Ki Ye (2) Vo

FIGURE 4.4.1 144 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

From the conservation of current at each junction and from the values of the con- ductances, we can determine the voltages V,,..., Vg.

The conservation conditions at each junction are

2(V2 -VO + {V,- Vi) + 5(V3 - y,) + 4{V, - V,) + (V, - V,) = 0 2(V, - V2) + (n - V2) = 0 3(^6 - V3) + 5(V, -V,)=0 4(Vi - V4) + 2(Vs - F4) = 0 (4.4.54) 2(V4 -V,) = 0 (V, - V,) + (V2 - Ve) + 3(^3 - V,) + 2(Vo - V,) = 0 3(^8 -Vj)=0

or

-13^1+2^2 + 5^3+4^4 +V6 •Vi 2V,-3V2 +V, = 0 5V, -8V3 +3V^ = 0 4y, - 6y4 + 2y5 = 0 (4.4.55) 2y4 - 2y5 = 0 y, +y,+ 3y3 -7K -2yn 3y7 + 3yg = 0,

which, in matrix form, is written as

AV = b, (4.4.56)

where

13 2 5 4 0 1 0 0 2 -3 0 0 0 1 0 0 5 0 -8 0 0 3 0 0 4 0 0 -6 2 0 0 0 (4.4.57) 0 0 0 2 -2 0 0 0 1 1 3 0 0 -7 0 0 0 0 0 0 0 0 -3 3 GENERAL SOLVABILITY THEOREM FOR Ax = b 145

and

0 0 0 b = (4.4.58) 0 0 -2Vn

We will assume that Vj = 1 and VQ = 0 and use Gauss-Jordan elimination to find the voltages Vj,..., Vg. Note that since A is a 7 x 8 matrix, its rank is less than or equal to 7, and so if the system has a solution it is not unique. The reason is that the conductor between junctions 7 and 8 is "floating." Our analysis will tell us what we know physically, namely, that voltages Vj and Vg are equal to each other but are otherwise unknown from the context of the problem.

Solution. We perform Gauss-Jordan elimination on A, transforming the prob- lem into the form of Eq. (4.4.32). The resulting transformation matrices P and Q are

1729 507 1495 1729 1729 1105 303 101 303 303 303 303 0

105 2450 1260 105 105 1085 101 1313 1313 101 101 1313 0

4715 1476 5945 4715 4715 3854 2121 707 2121 2121 2121 2121 0

P = 37506 32994 6486 23547 23547 4794 (4.4.59) 20705 20705 4141 8282 8282 4141 0

57988 5668 50140 91015 157069 37060 128169 14241 128169 128169 128169 128169 0

85 93 94 85 85 109 109 109 109 109 1 0

1

and

Q = l8. 146 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

the n = 8 identity matrix. The matrix A then becomes

3 0 0 0 0 0 0 0

35 0 13 0 0 0 0 0 0 41 0 0 7 0 0 0 0 0 PAQ 846 0 0 0 205 0 0 0 0 436 0 0 0 0 423 0 0 0 303 0 0 0 0 0 109 0 0 0 0 0 0 0 0 -3 3 (4.4.60) and the vector b is transformed into

./22iOT/ 1229 T/\ V 303 *^0 + 303 1/

V1313 *^C 4- 101 ^i)

1715 17 \ V2121 *^0+ 2121 '^1/

/ 9588 1/ 37506 y \ a = Pb V4141 ^O + 20705 ^i) (4.4.61)

74120 y 57988 y\ -( 128169 ^O + 128169 ^l)

-(2Vo -h 109 *^I/

0 Since for this example y = x, the voltages can be obtained by inspection. Substituting Vj = 1 and VQ = 0, we find Vi = 0.439 V2 = 0.386 V^ = 0.380 (4.4.62) V4 = 0.439 V5 = 0.439 V6 = 0.281. The last equation in A^^y = ct reduces to V7 = Vg. Thus, the values of Vj and Vg cannot be determined uniquely from the equation system. By Gauss-Jordan elimination, we found that the rank of A is equal to the number of rows (7) and by augmenting A with b the rank does not change. Therefore, although a solution does exist, it is not a unique one since A is a non-square matrix (and m < n). EXAMPLE 4.4.2 (Chemical Reaction Equilibria). We, of course, know that molecules are made up of atoms. For example, water is composed of two hydro- gen atoms, H, and one oxygen atom, O. We say that the chemical formula for GENERAL SOLVABILITY THEOREM FOR Ax = b 147

water is H2O. Likewise, the formula for methane is CH4, indicating that methane is composed of one carbon atom, C, and four hydrogen atoms. We also know that chemical reactions can interconvert molecules; e.g., in the reaction

2Ho + O, = 2H,0, (4.4.63)

two hydrogen molecules combine with one molecule of oxygen to form two molecules of water. And in the reaction

2CH4 + O2 = 2CH3OH, (4.4.64)

two molecules of methane combine with one molecule of oxygen to form two molecules of methanol. Suppose that there are m atoms, labeled a,,..., a,„, some or all of which are contained in molecule Mj. The chemical formula for Mj is then

M^ = (a, )„,,(a2)„,,..., («,„)„„. (4.4.65)

Mj is thus totally specified by the column vector

'2j (4.4.66)

For example, if H, O, and C are the atoms 1, 2, and 3, and methane is designated as molecule 1, then the corresponding vector

a, = (4.4.67)

tells us that the formula for methane is (H)4(0)o(C)j or CH4. If we are interested in reactions involving the n molecules M^ .. ., M„, the vectors specifying the atomic compositions of the molecules form the atomic matrix

'*22 A = [ai,...,aJ (4.4.68)

^7n2 We know from the solvability theory developed above that if the rank of A is r, then only r of the vectors of A are linearly independent. The remaining n — r vectors are linear combinations of these r vectors; i.e., if {a^,..., a^} denotes the set of linearly independent vectors, then there exist numbers p^j such that

^k = T.Pkj^j^ k = r -\- \,.. . ,n. (4.4.69) j=i I 48 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

These equations represent chemical reactions among the molecular species. Since each vector represents a different molecule, Eq. (4.4.69) implies that the number of independent molecular components is r and a minimum of n — r reactions exists for all of the different molecular species since each equation in Eq. (4.4.69) contains a species not present in the other equations. As an application, consider the atoms H, O, and C and the molecules H2, O2, H2O, CH4 and CH3OH. The atomic matrix is given by

H, O2 H2O CH4 CH3OH H 2 0 2 4 4 0 0 2 1 0 1 C 0 0 0 1 1

The rank of this matrix is 3, and so there are three independent molecular com- ponents and there have to be at least 5 — 3 = 2 reactions to account for all the molecular species. Hj, O2, and CH4 can be chosen to be the independent compo- nents (because aj, 82, and Si^ are linearly independent) and the equilibrium of the two reactions in Eqs. (4.4.63) and (4.4.64) suffice to account theimodynamically for reactions among the independent species. H2, O2, and H2O cannot be chosen since 33 = BJ + a2/2, reflecting the physical fact that carbon is missing from these three molecules. ILLUSTRATION 4.4.1 (Virial Coefficients of a Gas Mixture). Statistical mechanics provides a rigorous set of mixing rules when describing gas mixtures with the virial equation of state. The compressibility of a mixture at low density can be written as

z = l + B^,p + C^.p' + ... , (4.4.70)

where p is the molar density of the fluid and the virial coefficients, B^^i^, C^^^, etc., are functions only of temperature and composition. At low enough density, we can truncate the series after the second term. From statistical mechanics, we can define pair coefficients that are only functions of temperature. The second virial coefficient for an A^-component mixture is then given by

Br^. = T.Y.yiyjBijiT). (4.4.71)

where y, refers to the mole fraction of component /. We desire to find the virial coefficients for a three-component gas mixture from the experimental values of B^^^^^ given in Table 4.4.1. For a three-component system, the relevant coefficients are B^, ^22, ^33, B12, ^B, and B23. The mixing rule is given by

^.ix =y]Bu -f- 3^2^22 + J3 ^33 ^ GENERAL SOLVABILITY THEOREM FOR Ax = b 149

lABLE 4.4.1 Second Virial Coefficient at 200 K for Ternary Gas Mixture

^mlx Y\ Yi ¥3 10.3 0 0.25 0.75 17.8 0 0.50 0.50 26.2 0 0.75 0.25 13.7 0.50 0.25 0.25 7.64 0.75 0 0.25

We can recast this problem in matrix form as follows. We define the vectors

mix, 1

i5^ bs B„ X = (4.4.73)

By B.mix , 5 B 23 and the 5 X 6 matrix A by

yh yh yh >'l,l>'2,l yi.iXi.i y2,iy3.i yii yii y\.i yi.iyi,! >'l,2}'3,2 yz.iy^.i A = yh yh yh 3'l,3>'2.3 3'l,3>'3.3 3'2.3>'3,3 (4.4.74) y'U yU yh V|,4^2.4 >'l,4>'3,4 .V2,4>'3,4 yU yh yh >'l,53'2.5 >'l,53'3,5 3'2,53'3,5 where }\j refers to the ith component and the jth measurement in Table 4.4.1. Similarly, the subscripts in the components of b refer to the measurements. Solving for the virial coefficients has then been reduced to solving the linear system

Ax = b. (4.4.75) (i) Using the solvability theorems of this chapter, determine if a solution exists for Eq. (4.4.75) using the data in Table 4.4.1. If a solution exists, is it unique? Find the most general solution to the equation if one exists. If a solution does not exist, explain why. We can find the "best fit" solution to Eq. (4.4.75) by applying the least squares analysis from Chapter 3. We define the quantity

2 (4.4.76) 1=1 \ j=i / which represents a measure of the relative error in the parameters x,. The vector x that minimizes L can be found from the requirement that each of the derivatives of L with respect to x^ be equal to 0. The solution (see Chapter 3, Problem 8) is

A^Ax = A^b. (4.4.77) ISO CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

Here x contains the best fit parameters B^j for the given data. (ii) Show that Eq. (4.4.77) has a unique solution for the data given in Table 4.4.1. Find the best fit virial coefficients. How does this best fit solution III compare to the general solution (if it exists) found in (i)?

4.5. LINEAR DEPENDENCE OF A VECTOR SET AND THE RANK OF ITS MATRIX

Consider the set {ap ..., a„} of m-dimensional vectors. Whether this set is linearly dependent depends on whether a set of numbers {jc,,..., jc„}, not all of which are 0, can be found such that

(4.5.1)

7 = 1 We can pose the problem in a slightly different manner by defining a matrix A whose column vectors are a^,..., a„. In partition form, this is

L^l» ^2' • • • » ^«J- (4.5.2) Next, we define the vector x by

^2 X = (4.5.3)

Now we ask the question: Does the equation

Ax = 0 (4.5.4)

have a nontrivial (x ^ 0) solution? This equation can be expressed as

[a^, a2,..., a„J = 0 (4.5.5)

or, carrying out the vector product,

(4.5.6)

which is the same as Eq. (4.5.1) anXi + --- + ai„x„ =0

(4.5.7)

«ml^l+---+flmn-«„=0- LINEAR DEPENDENCE OF A VECTOR SET AND THE RANK OF ITS MATRIX 151

The question of whether the vectors aj,..., a„ are linearly dependent is thus seen to be the question of whether the homogeneous equation Ax = 0 has a solution, and the answer to this latter question depends on the rank of A. From what we have learned from the solvability theory of Ax = b, we can immediately draw several conclusions: (1) If A is a square nonsingular matrix, then X = 0 is the only solution to Eq. (4.5.4). Thus, a set of n n-dimensional vectors a^ are linearly independent if and only if their matrix A has a rank n, i.e., |A| ^ 0. (2) This conclusion is a subcase of a more general one: a set of n m-dimensional vectors a^ are linearly independent if and only if their matrix A has a rank n. This follows from the fact that Ax = 0 admits n — r solutions. (3) A set of n m-dimensional vectors a^ are linearly dependent if and only if the rank r of their matrix A is less than n. This also follows from the fact that Ax = 0 admits n — r solutions. (4) From this it follows that if n > m, then the set of n m-dimensional vectors a^ will always be linearly dependent since r < (m,n). We are familiar with these properties from Euclidean vector space. We know that no three coplanar vectors can be used as a basis set for an arbitrary three- dimensional vector. What is useful here is the fact that analysis of the rank of the matrix whose Cartesian vector components form the column vectors of the matrix will establish whether a given set of three vectors is coplanar. Coplanar means that one of the vectors is a linear combination of the other two (linear dependence). From the solvability theory developed in this chapter, we can use the rank of [a. , a„] not only to determine whether the set {a, , a„} is linearly dependent but also to find how many of the a^ are linearly dependent on a subset of the vectors in the set. The rank of A is equal to the number of linearly independent column vectors and to the number of linearly independent row vectors that A contains. To prove this, consider first an m x n matrix A of rank r in which the upper left comer of A contains an rth-order nonzero minor. Proof of this special case will be shown to suffice in establishing the general case. We can rearrange the homogeneous equation Ax = 0 into the form

flll^l 4- • • • + ^ir-^r — ^l.r+l-^r+l a\„x,, (4.5.8)

^ml^l + ' * * I (^mr^r — ^m,r + l'^r+l ^MIM-^M •

Since the minor

^21 ^22 (4.5.9)

is nonzero and the rank of A is r, the equations in Eq. (4.5.8) have a solution {jCi,..., jc^} for arbitrary jc^_j.,,..., jc„. Note that Ax = 0 always has a nontrivial solution for r < min(m, n) because the rank of [A, 0] is the same as the rank of A. I 5 2 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

One solution to Eq. (4.5.8) is obtained by setting x^^i = 1 and Xj = 0 and solving for x[^\ ..., jc^^^^ With this solution, Eq. (4.5.8) can be rearranged to get

*l,r+l = -(a„xr'+fl,,xf + ... + a.,^^'>) (4.5.10)

or, in vector notation.

a.+i = -E^j%' (4.5.11)

This proves that the (r + l)th column vector of A is a linear combination of the set {ap ..., a^}. In general, if x^ = 1 for r > r and Xj =0 for j > r and j =^ /, then the solution {xf\ ..., xj!^] of Eq. (4.5.8) can be found, and so

: : (4.5.12)

or

r a, = -X]4S' ^>^' (4.5.13)

Thus, we have proven that all the column vectors a^^j,..., a„ are linear combina- tions of the first r column vectors aj,..., a^. The vectors aj,..., a^ are linearly independent because, otherwise, there would exist a set of numbers {cj,..., c^}, not all 0, such that Eqa,=0 1 = 1 or

: : (4.5.14)

If this set of equations has a nontrivial solution, then the rank of the matrix [aj,..., a,.] has to be less than r, which contradicts our hypothesis. In summary, for a matrix of rank r and of the form considered here, the last n — r column vectors are linearly dependent on the first r column vectors, which themselves are linearly independent. Since the rank of the transpose A^ of A is also r, it follows that the last m — r column vectors of A^ are linearly dependent on the first r column vectors, which themselves are linearly dependent. But the column vectors of A^ are simply the row vectors of A, and so we conclude that LINEAR DEPENDENCE OF A VECTOR SET AND THE RANK OF ITS MATRIX 153

the last m — r row vectors of A are linearly dependent on the first r row vectors, which are themselves linearly independent. Next, consider the general case, i.e., an m x n matrix A of rank r, but in which the upper left comer does not contain a nonzero minor. By the interchange of columns and rows, however, a matrix A' can be obtained that does have a nonzero rth-order minor in the upper left comer. We have already shown that the rank of A' is also r, and so if

A'=:[a;,a^,...,a;], (4.5.15)

then the r m-dimensional column vectors a'^ ..., a^ are linearly independent and the n — r column vectors a|.,.p ..., a„ are linear combinations of the first r column vectors. Also, from what we presented above, the r /i-dimensional row vectors [a,^p ..., i^f^y, / = 1,..., r, are linearly independent and the remaining m — r row vectors are linear combinations of the first r row vectors. To prove what was just stated, note first that the relationship between A and A'is

A' = Q^^^AQ^^,(2)\ (4.5.16)

where the square matrices Q^^^ and Q^^^ are products of the 1,^ matrices that accom- plish the appropriate row interchanges and column interchanges, respectively. Since the determinants of Q^^^ and Q^^^ are ibl, it follows that the ranks of A' and A are uic same, riuiii uic piupciiy *^ij*-ij = *> ^^ IIUWA^ S

Q(.)Q(/) ^ 1 (4.5.17)

or that Q^" equals its own inverse. Consequently,

A = Q"'A'Q<2' = Q">[a;,...,a:]Q"'

Hn Hhx =:[Q<'V„...,Q<"<] (4.5.18) ,(2)

U=i k=\ J and, therefore,

(4.5.19) k=\ We proved already that

(4.5.20) 7=1 I 54 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

which, when inserted into Eq. (4.5.19), yields

a/ = EA;Q^^X- (4-5-21)

where Pij = Y!k=\ ^u^^kj- "^he vector a^ is related to one of the set {aj,..., a„}, say a^ , by the row interchange operation Q^^\ i.e.,

a;. = Q^^^a;., (4.5.22)

and it follows that

Q^^^a;. = Q^^^Q^'^a^, = a^.. (4.5.23)

Thus, Eq. (4.5.21) reads

where [Ij} indicates the indices of the r column vectors of A that were moved to the columns 1,..., r in A to put a nonzero rth-order minor in its upper left comer. This proves that any column vector in A is a linear combination of the r column vectors a;^,a;^,...,a^. These r vectors are linearly independent. To prove this, assume the contrary; i.e., assume that there exists a set of numbers {cp ..., c^}, not all 0, such that

T.^j^ij=0. (4.5.25) 10)

By multiplying Eq. (4.5.24) by Q^^\ it follows that Eq. (4.5.24) implies

or that the set {a'p ..., a^} is linearly dependent. However, this is a contradiction, and so the vectors a^^,..., a/^ must be linearly independent. Similarly, by considering the transpose of A', we can prove that r of the row vectors of A are linearly independent and that the other m — r row vectors are linear combinations of these r row vectors. THE FREDHOLM ALTERNATIVE THEOREM 155

The "take-home" lesson of this section is as follows:

THEOREM. If the rank of the m x n matrix A is r, then (a) there are r m- dimensional column vectors {and r n-dimensional row vectors) that are linearly independent and (b) the remaining n — r column vectors {and m — r row vectors) are linear combinations of the r linearly independent vectors.

EXAMPLE 4.5.1. How many of the vectors

fl' "l" '2~ "3l

M - 2 , 82 = 1 , a3 = 3 , 34 = 5 (4.5.26) [3_ 1 _4_ _1 \

are linearly independent? Since the rank r of A,

112 3

A — 13], 32' 33,34J — 2 13 5 (4.5.27) 3 14 7

is less than or equal to 3, we know at most three vectors are linearly independent. By Gauss elimination, we transform A to

1 1 2 3 K = 0 -1 -1 -1 (4.5.28) 0 0 0 0

Thus, the rank of A is 2. Therefore, only two of the vectors are linearly indepen- dent. Indeed, the pair aj and a2 are linearly independent and

• • • 83 -- a^ + 82 and VLA — z«a-5 a^. (4.5.29)

4.6. THE FREDHOLM ALTERNATIVE THEOREM

For algebraic equations, the condition of solvability, stated as the equality of the ranks of the augmented matrix [A, b] and the matrix A, is especially attractive. Straightforward Gauss-Jordan elimination establishes the ranks of both [A, b] and A and results in a final set of equations, which, when solvable, require very little further work to obtain the solution or solutions to Ax = b. There is, however, another way to state the solvability theorem, known as the Fredholm alternative theorem. While not suggestive of a method of solution, it is powerful because its form carries over to much more general vector spaces (e.g., function spaces whose operators are differential or integral operators instead of matrices) where the concept of the rank of a determinant is not defined. Before stating the theorem, some additional properties of the matrix A need to be established. Recall that the adjoint A^ is the complex conjugate of the transpose 156 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

A^ of A; i.e., if

«2n A = (4.6.1)

then

^ml

•*22 •*m2 A^ = (4.6.2)

Since the interchange of rows and columns does not change the value of a deter- minant, it follows that the rank of the adjoint A^ is the same as the rank of A. This can be seen explicitly by recalling the form of A^^ in Eq. (4.4.39). The adjoint of At, is

• K.i 0 0 0 •• 0 0 Yii 0 0 ... 0 0 0 0 0 ... 0

A"^ - (4.6.3) 0 0 y* 0 ••• 0 frr Vl.rM rlr+t • ' • n!.+i 0 ••• 0

Yin Yin

Since only r columns of A^^, have nonzero elements and since ni=i Yu ¥" 0^ it follows that the rank of Af, is r, the rank of A. Recall, however, that A^^. = PAQ, where |P| and |Q| are ±1. Thus, Af, = Q^A^P^ from which it follows that the rank of AJ, is the same as the rank of A^ which proves our claim that the rank of A^ is the same as the rank r of A. The matrix A^ has m column vectors (which are the complex conjugates of the transpose of the row vectors of A), and so, according to the solvability theorem in Eqs. (4.4.52) and (4.4.53), the homogeneous equation

A+z = 0 (4.6.4)

has m - r linearly independent solutions; i.e., there exist m — r n-dimensional vec- tors z. Z2,..., z„,_r satisfying Eq. (4.6.4). Recall that the homogeneous equation

Ax = 0 (4.6.5)

has n — r linearly independent m-dimensional vector solutions x,,..., x„_,.. Thus, only if A is a square matrix do A and A^ have the same number of solutions to their homogeneous equations. THE FREDHOLM ALTERNATIVE THEOREM 157

FREDHOLM ALTERNATIVE THEOREM. The equation

Ax = b (4.6.6)

has a solution if and only ifb is orthogonal to the solutions of Eq. (4.6.4)

b^z^. = 0, (4.6.7)

where Zj is any of the m — r linearly independent solutions of the homogeneous adjoint equation (4.6.4). The solvability condition required by the Fredholm alternative theorem places m — r conditions on b, namely, b^Zy = Y!i=\ ^t^tj = 0, 7 = 1,..., m — r. The conditions required to ensure that the rank of [A, b] is the same as the rank of A are that

^r+l Ofr+2, «m = 0, (4.6.8)

where a, = Yl^=i Ptk^k ^^^ Pik ^^^ elements of the matrix P in the transformation Aj^ = PAQ. Thus, the solvability conditions in Eq. (4.6.8) also place m — r con- ditions on b and must, of course, be equivalent to the conditions of the Fredholm alternative theorem. The proof of the necessity ("only if) on the conditions of the Fredholm alter- native theorem is quite simple. Suppose the solution x to Eq. (4.6.6) exists and take the inner product of z, and Eq. (4.6.6) to obtain

ZJAX = ztb, (4.6.9)

where z, is any solution to Eq. (4.6.4). Taking the adjoint of each side of Eq. (4.6.9)—using the rule (Eq. (2.4.4b)) for forming adjoints of products of matrices—we obtain

X^A'^'Z; = b'^'Z;. (4.6.10)

But A^Zj = 0, or b^z, = 0, proving that Eq. (4.6.7) is a necessary condition for Ax = b to have a solution. To prove the sufficiency condition, we must assume that the conditions in Eq. (4.5.7) are true and prove that this implies the existence of X. This part of the proof is somewhat tedious and will not be given here. EXAMPLE 4.6.1. Under what conditions does

[2 ll P -, Xt '^1 1 2 1 1 = bi or Ax = b (4.6.11) [3 4J 1 ^2 JA have a solution? The homogeneous equation A^z = 0 is given by

2 1 3 = 0 (4.6.12) 1 2 4 ^3 158 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

or 2zi + ^2 + 3z3 = 0 (4.6.13)

and has the solution

^2

(4.6.14) 3

where we have set Z3 equal to 1 (but a solution exists for arbitrary ^3). The solvability condition, b^z = 0, is then

(4,6,15)

If, for example, h^ = 1, ^2 = U and b^ = |, then Eq. (4.6,11) has a solution. In this case. Ax = b is

IXy 4- JC2 = 1

Xi H- 2JC2 = 1 (4.6.16)

3x, +4JC2 = -.

Gauss-Jordan elimination yields

_ 2 2JCI ~ 3 3 _ 1 (4.6.17) ~X'} — —• 2 ^ 2 0 = 0, or X, = I and X2 = \. Since the rank of A is 2 and n = 2, n — r = 0, there are no homogeneous solutions to Ax = b, and so

1 3 X = (4.6.18) 1 3 J

• is a unique solution to Ax = b, even though A is not a square matrix. PROBLEMS 159

EXERCISE 4.6.1. Show the solvability conditions that the Fredholm alterna- tive theorem requires for b when [l 11 1 1 (4.6.19) [l 1J Pick a b satisfying the conditions and find the most general solution to Ax = b for this case. *

PROBLEMS

1. A and B are defined as

-2 1 A=: ' and B -2 1 1

(a) Compute the determinants |A|, |B|, and |AB| and verify that |AB| = |A||B|. (b) Compute A"' and B"' and verify that |A-'| = 1/|A| and |B-M = 1/|B|. 2. If B and C are of ranks rg and r^, show that the rank of A, where

B 0 A 0 C

is r^ + re. 3. Find the general solutions to Ax = b for the following: (a) 1 -1 2 6 -10 10 A = b = -2 4 -3 2 -2 4 (b) 3 -1 6 0 2 4 A = 6 -4 b = 8 3 1 10 3 --3 2 (c) 2 0 1 0 2 A = 0 1 1 1 0 b = 2 1 2 1 1 I 60 CHAPTER 4 GENERAL THEORY OF SOLVABILITY OF LINEAR ALGEBRAIC EQUATIONS

4. Consider the system of equations

2x -h y + az = P 2x-ay + 2z = P X — 2y -\-2az = 1.

(a) For what values of a and p does the system have a unique solution? (b) Use Cramer's rule to obtain the unique solution in terms of a and p. (c) Are there any other nonunique solutions for other values of a and pi If so, give a single example (i.e., choose specific values for a and p and give the general form of the solution.) 5. Consider the set of equations

X — 3y = —2

2JC + J = 3 3x — 2y = a.

(a) Are there values of a for which this set has no solution? If so, what are they? (b) Are there values of a for which this set has a solution? If so, give an example. 6. For what values of k will the system

2x+ky + z = 0 (k-\)x-y-2z = 0 4x-hy-\-4z = 0

have nontrivial solutions? 7. Prove that the equations

X 4- (cos y)y + (cos P)z = 0 (cos y)x 4- y + (cos a)z = 0 (cos P)x -f (cos a)y + z = 0

have a nontrivial solution if a + P -{- y = 0. 8. Consider the equations

ax + by + cz ^ fci

a^x + b'^y + c^z = ^2: a^x+by-^c^z = b^.

Give the solution to these equations when a,b, and c are different. Give the conditions for a solution when a = b ^ c and give the most general solution in this case. FURTHER READING 161

9. Consider the augmented matrix

1 3 -8 2 [A, b] = 1 -9 -10 -3 -1 3 9 0

(a) Use simple Gauss elimination to find the rank of [A, b] and A. (b) How many of the column vectors of [A, b] are linearly independent? Why? How many row vectors are linearly independent? Why? (c) If the problem Ax = b (1) has a solution, find the most general one. If there is no solution, why not? (d) Is there a solution to A^z = 0? What are the implications of the answer to this question to the solvability of Eq. (1)? 10. Repeat parts (a)-(d) in Problem 9 using the following augmented matrix:

3-12 -1 6 0 2 -2 [A,b] = 0 2 2 0 -9 3 0 3 3 2 3 -1

11. Prove that (I + B)x = b has a solution for arbitrary real b if B is a real skew (B^ = -B) 12. In the pyrolysis of a low-molecular-weight hydrocarbon, the following species are present: C2H6, H, C2H5, CH3, CH4, H2, C2H4, CgHg, and C4H10. Determine the number of independent reactions among these species. 13. A reaction mixture is found to consist of O2, H2, CO, CO2, H2CO, CH3OH, C2H5OH, (CH3)2CO, CH3CHO, CH4, and H2O. How many independent components are there in the mixture? Which ones can they be? What is the minimum number of reactions possible to produce this mixture?

FURTHER READING

Amundson, A. R. (1964). "Mathematical Methods in Chemical Engineering." Prentice Hall, Englewood Cliifs, NJ. Bellman, R. (1970). "Introduction to Matrix Analysis." McGraw-Hill, New York. Noble B., and Daniel, J. W. (1977). "Applied Linear Algebra." Prentice Hall, Englewood Cliffs, NJ. This Page Intentionally Left Blank THE EIGENPROBLEM

5. L SYNOPSIS

In this chapter we will present the concept of an n x n matrix as a linear operator in an n-dimensional linear vector space £„. We can, as shown in Chapter 2, define an inner product from which we can create an inner product space. The inner product provides us with convenient vector and matrix norms. We will show that the inner product (x, y> obeys the Schwarz inequality, |{x, y>| < ||x|| ||y||, where ||x|| denotes the norm or length of the vector x. The Schwarz inequality, in turn, implies the triangle inequality ||x4-y|| < ||x|| + ||y|| for the norms (lengths) of the vectors, and ||A + B|| < ||A|| + ||B|| for the norms of the matrices. The inequality also implies that ||AB|| < ||A|| ||B||. Any set of n linearly independent vectors {x^,..., x„} forms a basis set in £*„ such that any vector x belonging to E„ can be expanded in the form

- = Eof.x. .

where a, are complex scalar numbers. We can subsequently define the reciprocal basis set {z^,..., z„} by the properties xjz^ = S^j. We will show that the vectors {z,} can be formed from the column vectors of (X~*)^ where X = [x,,...,x,J. We say that {x,} and {z,}, satisfying the orthonormality conditions xjzy = 5,^, form biorthogonal sets.

163 I 64 CHAPTER 5 THE EIGENPROBLEM

The concept of basis and reciprocal basis sets is immensely important in the analysis of perfect matrices. We say that the n x n matrix A is a perfect matrix if A has n eigenvectors, where an eigenvector x^ of A obeys the eigenequation

Ax,- = A.,x,-,

and A, is a scalar (real or complex number) called the eigenvalue. Specifically, we say X, is an eigenvector of A with eigenvalue X^. We will show that the n x n identity matrix I„ can be expressed by the equation

n

for any linearly independent basis set {x,} and its corresponding reciprocal basis set {z^}. Since A = AI, it follows that a perfect matrix A obeys the spectral resolution theorem n

From this we will show that the reciprocal vectors {z,} are the eigenvectors of the adjoint matrix A^ and that the eigenvalues of the adjoint matrix are A.* (the complex conjugate of A,,). The spectral resolution theorem enables us to express the function /(A) as /(A) = Yll=i /(^j)x,zj for any function f{t) that can be expanded as a Taylor or Laurent series in / near the eigenvalues A-. As we will show, the spectral reso- lution theorem for exp(aA) provides a simple solution to the differential equation dx/dt = Ax, x(/ = 0) = XQ. We will see that the eigenvalues of A obey the characteristic equation |A — Xl\ = 0, which is an /ith-degree polynomial P„(A.) = Jl%o^j(~^'^^~^^ where aj = lYj A. Note that tr^ A is the 7th trace of A, which is the sum of the yth-order principal minors of A. The traces of A are invariant to a similarity transformation; i.e., try A = try(S~^AS), where S is any nonsingular matrix. This invariance implies that

i.e., the yth trace of A is the sum of all distinct y-tuples of the eigenvalues Ai,..., A,,,. A is a diagonal matrix with the eigenvalues of A on the main diag- onal, i.e., A = [X^Sij]. We will also prove that the traces obey the property try(AB) = try(BA). This implies that the eigenvalues of AB are the same as the eigenvalues of BA. We can determine the degree of degeneracy of an eigenvalue without actually solving for the eigenvectors. The number «, of linearly independent eigenvectors of A that correspond to a given eigenvalue A., (called the degeneracy of A.,) is equal to n — T;^., where r^^, is the rank of the characteristic determinant |A — X,I|. If A has n, eigenvectors {x,} corresponding to the eigenvalue A.,, then the adjoint A^ has n, eigenvectors {z,} corresponding to the eigenvalue A.*. If A,, 7«^ A,^, then the eigenvectors of A corresponding to Ay are orthogonal to the eigenvectors of A^ corresponding to A*. LINEAR OPERATORS IN A NORMED LINEAR VECTOR SPACE I 65

Solving the characteristic polynomial equation P_n(λ) = 0 can be numerically tricky, and so other methods are sometimes sought for finding the eigenvalues. For instance, tridiagonal matrices are often handled via a recurrence method. The power method is an iterative method that is especially easy to implement for computing the eigenvalue of maximum magnitude, but the method can also be used to compute any eigenvalue. We will end the chapter by presenting Gerschgorin's theorem, which gives a domain in complex space containing all the eigenvalues of a given matrix. As we will see, this is sometimes useful for estimating eigenvalues.
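All of the quantities surveyed in this synopsis are easy to experiment with in Mathematica. The short sketch below (our own illustration, using an arbitrarily chosen real matrix that happens to be perfect, not one of the text's examples) computes the eigenvalues and eigenvectors, forms the reciprocal vectors from the columns of (X^{-1})†, and checks the biorthogonality condition and the spectral resolution numerically.

    (* illustrative sketch: eigenpairs, reciprocal basis, and spectral resolution *)
    a = {{2., -1., 0.}, {-1., 2., -1.}, {0., -1., 2.}};   (* arbitrary sample matrix *)
    {lam, vecs} = Eigensystem[a];            (* eigenvalues and eigenvectors (rows) *)
    x = Transpose[vecs];                     (* X = [x1, ..., xn], eigenvectors as columns *)
    z = ConjugateTranspose[Inverse[x]];      (* reciprocal vectors = columns of (X^-1)^dagger *)
    Chop[ConjugateTranspose[z].x]            (* biorthogonality: should be the identity *)
    Chop[Sum[lam[[i]] Outer[Times, x[[All, i]], Conjugate[z[[All, i]]]], {i, 3}] - a]
                                             (* spectral resolution: should be the zero matrix *)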

5.2. LINEAR OPERATORS IN A NORMED LINEAR VECTOR SPACE

We asserted in Section 2.6 that S is a linear vector space if it has the properties:

1. If x, y ∈ S, then

x + y ∈ S. (5.2.1)

2.

x + y = y + x. (5.2.2)

3. There exists a zero vector 0 such that

x + 0 = x. (5.2.3)

4. For every x ∈ S, there exists −x such that

x + (−x) = 0. (5.2.4)

5. If α and β are complex numbers, then

(α + β)x = αx + βx
α(x + y) = αx + αy. (5.2.5)

To produce the inner product space S, we add to the above properties of S an inner product ⟨x, y⟩, which is a scalar object defined by the following properties: if x, y ∈ S, then

⟨x, αy⟩ = α⟨x, y⟩, ⟨y, x⟩ = ⟨x, y⟩*, ⟨x, x⟩ > 0 if x ≠ 0,

and

⟨x, x⟩ = 0 if and only if x = 0, (5.2.6)

where ⟨x, y⟩* denotes the complex conjugate of ⟨x, y⟩.

We define the norm or length ||x|| of a vector x in an inner product space by

||x|| ≡ √⟨x, x⟩. (5.2.7)

The properties of the inner product given in Eq. (5.2.6) yield a vector norm (or length) obeying the defining conditions of a normed linear vector space (Eq. (2.6.8)), which are the physical conditions commonly associated with the concept of length in the three-dimensional Euclidean vector space we occupy. Namely,

1. ||x|| ≥ 0 and ||x|| = 0 only if x = 0.

2. ||αx|| = |α| ||x|| for any complex number α. (5.2.8)

3. ||x + y|| ≤ ||x|| + ||y||.

Here |α| denotes the absolute value of the complex number α, i.e., |α| = √(α_r² + α_i²), where α_r and α_i are the real and imaginary parts of α (α = α_r + iα_i). The proof of item (3), called the triangle inequality, proceeds by first proving the Schwarz inequality. To prove this, we note that J ≥ 0, where

J ≡ ⟨x + αy, x + αy⟩ = ||x||² + α*⟨x, y⟩* + α⟨x, y⟩ + |α|² ||y||², (5.2.9)

and α (= α_r + iα_i) is an arbitrary complex number. Since J ≥ 0, we seek the value of α that minimizes the value of J for a given x and y. At the minimum, α_r and α_i obey the equations

∂J/∂α_r = ⟨x, y⟩* + ⟨x, y⟩ + 2α_r ||y||² = 0 (5.2.10)

∂J/∂α_i = −i⟨x, y⟩* + i⟨x, y⟩ + 2α_i ||y||² = 0, (5.2.11)

whose solutions yield

α_r = −Re⟨x, y⟩ / ||y||², α_i = Im⟨x, y⟩ / ||y||², (5.2.12)

or

α = −⟨x, y⟩* / ||y||². (5.2.13)

The notation Re⟨x, y⟩ and Im⟨x, y⟩ denotes the real and imaginary parts of ⟨x, y⟩. Insertion of Eq. (5.2.13) into Eq. (5.2.9) yields

0 ≤ J_min = ||x||² − |⟨x, y⟩|² / ||y||², (5.2.14)

or, upon rearranging,

|⟨x, y⟩| ≤ ||x|| ||y||, (5.2.15)

which is the Schwarz inequality.

In the ordinary Euclidean vector space, ⟨a, b⟩ = a · b = ||a|| ||b|| cos θ, where θ is the angle between the vectors a and b. If a and b are colinear, then |cos θ| = 1 and |a · b| = ||a|| ||b||. Otherwise, |cos θ| < 1 since the projection of a onto the direction of b is shorter than the length of a, as shown in Fig. 5.2.1. In an abstract linear vector space, the Schwarz inequality is geometrically equivalent to the projection of the vector x onto the direction ŷ of y, where ŷ is the unit vector

ŷ = y / ||y||. (5.2.16)

In terms of this unit vector, Eq. (5.2.15) reads

|⟨x, ŷ⟩| ≤ ||x||, (5.2.17)

which is analogous to |a · b̂| = ||a|| |cos θ| ≤ ||a||. We see that only if x and y are colinear will the equality hold in Eq. (5.2.15). To prove the triangle inequality, note that

||x + y||² = ||x||² + 2 Re⟨x, y⟩ + ||y||² ≤ ||x||² + 2|⟨x, y⟩| + ||y||², (5.2.18)

which, when combined with Eq. (5.2.15), yields

||x + y||² ≤ ||x||² + 2||x|| ||y|| + ||y||² = (||x|| + ||y||)²,

or

||x + y|| ≤ ||x|| + ||y||. (5.2.19)
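Both inequalities are easy to spot-check numerically. The following sketch (ours, not from the text) draws random complex vectors and tests Eqs. (5.2.15) and (5.2.19) with the inner product ⟨u, v⟩ = v†u used later in this chapter.

    (* numerical spot-check of the Schwarz and triangle inequalities *)
    n = 5;
    x = RandomComplex[{-1 - I, 1 + I}, n];
    y = RandomComplex[{-1 - I, 1 + I}, n];
    ip[u_, v_] := Conjugate[v].u;               (* <u, v> = v^dagger u *)
    norm[u_] := Sqrt[Re[ip[u, u]]];
    {Abs[ip[x, y]] <= norm[x] norm[y],          (* Schwarz inequality, Eq. (5.2.15) *)
     norm[x + y] <= norm[x] + norm[y]}          (* triangle inequality, Eq. (5.2.19) *)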

Linear operators in a vector space were also defined in Section 2.6. We say that A is a linear operator in S if it has the properties

1. If x ∈ S, then Ax ∈ S. (5.2.20)
2. If x, y ∈ S, then A(αx + βy) = αAx + βAy. (5.2.21)

We saw in Section 2.6 that if the norm of a linear operator is defined by

||A|| = max_{x ≠ 0} ||Ax|| / ||x||, (5.2.22)

FIGURE 5.2.1

then the vector norm properties listed in Eq. (5.2.8) imply the following properties for the norms of linear operators:

1. ||A|| ≥ 0 and ||A|| = 0 if and only if A = 0. (5.2.23)
2. ||αA|| = |α| ||A|| for any complex number α. (5.2.24)
3. ||A + B|| ≤ ||A|| + ||B||. (5.2.25)
4. ||AB|| ≤ ||A|| ||B||. (5.2.26)

Any n × n matrix A obeying the previously defined properties of multiplication and addition is a linear operator in the n-dimensional linear vector space E_n. If we define the inner product by

⟨x, y⟩ = y†x, (5.2.27)

then A becomes a linear operator in a normed linear vector space with the norm √(x†x). As an operator, A transforms a vector x in E_n into another vector in E_n. Geometrically, this transformation can involve rotation, stretching (or shrinking), or both. The possibilities are illustrated in Fig. 5.2.2 for vectors in E_3. Note that the vector y is the same length as x, i.e., ||x|| = ||y||, and so in this case the action of A on x is purely a rotation since x and y are not colinear. The vector z is neither colinear with x nor of the same length, and so in this case the action of A is to rotate and stretch (if ||z|| > ||x||) or shrink (if ||z|| < ||x||) the vector x. In the third case, Ax = λx, where λ is a scalar (real or complex). Thus, λx is colinear with x and so A merely stretches (|λ| > 1) or shrinks (|λ| < 1) the vector x. In this last case, i.e., when

Ax = λx, (5.2.28)

we say that x is an eigenvector of A and λ is an eigenvalue of A. Determining what vectors x and numbers λ satisfy Eq. (5.2.28) is referred to as the eigenproblem.

FIGURE 5.2.2

We will soon learn that a matrix can be usefully classified in terms of how many linearly independent eigenvectors it has. If an n × n matrix has n eigenvectors, it is called a perfect matrix. Otherwise, it is called an imperfect or defective matrix. The value of this classification will become clear later. The infinite-dimensional vector space E_∞, whose vectors x have an infinite number of components, occurs in physical problems such as those encountered in Chapters 9 and 10. With the inner product defined by

x†y = Σ_{i=1}^∞ x_i* y_i, (5.2.29)

the requirement that vectors x and y belong to a normed linear vector space is that

||x||² = Σ_{i=1}^∞ |x_i|² < ∞ and ||y||² = Σ_{i=1}^∞ |y_i|² < ∞. (5.2.30)

Thus, the vector

x = (1, 1/2, 1/3, ..., 1/i, ...)^T (5.2.31)

with the norm

||x|| = [Σ_{i=1}^∞ 1/i²]^{1/2} = π/√6 ≈ 1.28 (5.2.32)

belongs to a normed linear vector space, whereas the vector

y = (1, 1/√2, 1/√3, ..., 1/√i, ...)^T (5.2.33)

with the norm

||y|| = [Σ_{i=1}^∞ 1/i]^{1/2} = ∞ (5.2.34)

does not belong to a normed linear vector space.

We can define inner products other than x†y, which are sometimes useful. For example, if A is a positive-definite matrix (i.e., if A = A† and x†Ax > 0 for all x ≠ 0 in E_n), then the inner products

⟨y, x⟩ = y†Ax (5.2.35)

and

⟨y, x⟩ = y†A†Ax (5.2.36)

satisfy the defining criteria in Eq. (5.2.6). The inner product in either Eq. (5.2.35) or (5.2.36), with the norm defined by ||x|| = √⟨x, x⟩, will yield a normed linear vector space obeying the conditions in Eq. (5.2.8). We can, of course, define a normed linear vector space in terms of the conditions in Eq. (5.2.8) without defining an inner product. For example, the norm ||x||_p = [Σ_{i=1}^n |x_i|^p]^{1/p}, for any real p ≥ 1, defines an acceptable normed linear vector space in E_n. However, ||x||_p is not necessarily related to an inner product. In the remainder of this and the next two chapters, we will concentrate primarily on n × n matrices as linear operators in the finite-dimensional linear vector space E_n, in which we will usually define the inner product as y†x. We will, however, demonstrate the utility of alternative inner products such as Eq. (5.2.35) or (5.2.36) for symmetrizing certain problems and for expediting certain proofs. In the last three chapters, we will explore linear operators in infinite-dimensional spaces. These include integral and differential operators in function spaces.
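As a quick numerical illustration of these alternative inner products and norms (our own sketch; the matrix is generated at random and is not from the text), one can build a positive-definite A, verify the inner-product conditions of Eq. (5.2.6) for ⟨y, x⟩ = y†Ax, and evaluate a few p-norms:

    (* an A-weighted inner product and several p-norms *)
    b = RandomComplex[{-1 - I, 1 + I}, {3, 3}];
    a = ConjugateTranspose[b].b + IdentityMatrix[3];   (* self-adjoint, positive definite *)
    ipA[u_, v_] := Conjugate[v].(a.u);                 (* <u, v> = v^dagger A u *)
    x = RandomComplex[{-1 - I, 1 + I}, 3];
    y = RandomComplex[{-1 - I, 1 + I}, 3];
    {Chop[ipA[x, y] - Conjugate[ipA[y, x]]],           (* <x, y> = <y, x>*: should be 0 *)
     Re[ipA[x, x]] > 0,                                (* positivity for x != 0 *)
     Norm[x, 1], Norm[x, 2], Norm[x, 4]}               (* ||x||_p for p = 1, 2, 4 *)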

EXERCISE 5.2.1. Consider the matrix

A =

Show that the inner product ⟨y, x⟩ = y†Ax

obeys all the conditions in Eq. (5.2.6).

5.3. BASIS SETS IN A NORMED LINEAR VECTOR SPACE

If x is a vector in E_n, it can always be expressed in the form

x = Σ_{i=1}^n x_i e_i, (5.3.1)

where

e_i = (0, ..., 0, 1, 0, ..., 0)^T, with the 1 in the ith row. (5.3.2)

The set {e_1, ..., e_n} is thus a basis set in E_n in the sense that any vector x can be written as a linear combination of the {e_i}. Since the set obeys the condition

e_i†e_j = δ_ij, (5.3.3)

where δ_ij = 1 for i = j and δ_ij = 0 for i ≠ j, we say that the set {e_i} forms an orthonormal basis set. That is to say, the vectors e_i are mutually orthogonal (e_i†e_j = 0, i ≠ j) and of unit length (||e_i||² = e_i†e_i = 1). Although the set {e_i} is perhaps the simplest basis set, it is by no means the only one. Suppose {x_1, x_2, ..., x_n} is any linearly independent set in E_n. Then this set provides a basis set in E_n. The requirement for a basis set is that for any vector x there exists a unique set of numbers {α_1, ..., α_n} such that

x = Σ_{i=1}^n α_i x_i. (5.3.4)

This equation can be rewritten as

x = Xα, (5.3.5)

where X is a square matrix whose column vectors are the x_i, i.e.,

X = [x_1, ..., x_n], (5.3.6)

and α is the vector

α = (α_1, ..., α_n)^T. (5.3.7)

Since the vectors {x_1, ..., x_n} are linearly independent, X is nonsingular (|X| ≠ 0) and Eq. (5.3.5) has the unique solution

α = X^{-1}x, (5.3.8)

proving that the set {x_1, ..., x_n} forms a basis set in E_n. We say {x_1, ..., x_n} is an orthonormal basis set if

x_i†x_j = δ_ij. (5.3.9)

As we have seen, one example of such a set is {e_1, e_2, ..., e_n}. However, there are numerous other orthonormal sets. In fact, we can use the Gram-Schmidt procedure to construct an orthonormal set from any linearly independent set {z_1, ..., z_n}. The Gram-Schmidt procedure constructs an orthogonal set {y_1, ..., y_n} from the set {z_i} as follows:

y_1 = z_1

y_2 = z_2 − (y_1†z_2 / ||y_1||²) y_1

y_3 = z_3 − (y_1†z_3 / ||y_1||²) y_1 − (y_2†z_3 / ||y_2||²) y_2 (5.3.10)

⋮

y_i = z_i − Σ_{j=1}^{i−1} (y_j†z_i / ||y_j||²) y_j.

Note that

y_1†y_2 = 0
y_1†y_3 = y_2†y_3 = 0 (5.3.11)
⋮
y_j†y_i = 0, j ≠ i, j = 1, ..., i − 1, i = 1, ..., n,

i.e., each vector y_i is orthogonal to every other vector y_j, j ≠ i. There are exactly n orthogonal vectors y_i corresponding to the n linearly independent vectors z_i. If there were fewer y_i, say n − 1, then y_n = 0, which would imply that a linear combination of the z_i is 0. This, of course, contradicts our hypothesis that the z_i are linearly independent. If we now define

x_i = y_i / ||y_i||, i = 1, ..., n, (5.3.12)

then the vectors x_i obey

x_i†x_j = δ_ij, (5.3.13)

and so they form an orthonormal set. The Gram-Schmidt procedure is the same for a more general inner product—simply replace y_j†z_i in Eq. (5.3.10) by ⟨y_j, z_i⟩.

EXAMPLE 5.3.1. Consider the set

z_1 = (2, 1, 1)^T, z_2 = (1, 2, 1)^T, z_3 = (1, 1, 2)^T. (5.3.14)

Since the determinant of

[z_1, z_2, z_3] =
| 2 1 1 |
| 1 2 1 | (5.3.15)
| 1 1 2 |

is nonzero (it equals 4), the z_i are linearly independent. By the Gram-Schmidt procedure, we obtain

y_1 = (2, 1, 1)^T

y_2 = (1, 2, 1)^T − (5/6)(2, 1, 1)^T = (−2/3, 7/6, 1/6)^T (5.3.16)

y_3 = (1, 1, 2)^T − (5/6)(2, 1, 1)^T − (5/11)(−2/3, 7/6, 1/6)^T = (−4/11, −4/11, 12/11)^T.

Next, defining x_i = y_i / ||y_i||, we find the orthonormal set

x_1 = (1/√6)(2, 1, 1)^T, x_2 = (1/√66)(−4, 7, 1)^T, x_3 = (1/√11)(−1, −1, 3)^T, (5.3.17)

for which x_i†x_j = δ_ij.
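The hand computation above can be checked with Mathematica's built-in Gram-Schmidt routine (a sketch; Orthogonalize both orthogonalizes and normalizes, so its output should match Eq. (5.3.17) up to sign):

    (* check of Example 5.3.1 *)
    z = {{2, 1, 1}, {1, 2, 1}, {1, 1, 2}};
    xset = Orthogonalize[z];              (* Gram-Schmidt followed by normalization *)
    Simplify[xset]                        (* compare with Eq. (5.3.17) *)
    Simplify[xset.Transpose[xset]]        (* orthonormality: the identity matrix *)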

When {x_1, ..., x_n} is an orthonormal set, the coefficients α_i in the expansion

x = Σ_{i=1}^n α_i x_i (5.3.18)

are especially simple to calculate. If we multiply Eq. (5.3.18) by x_j†, we find

x_j†x = Σ_{i=1}^n α_i x_j†x_i = Σ_{i=1}^n α_i δ_ij, (5.3.19)

or

α_j = x_j†x. (5.3.20)

Thus, for an orthonormal basis set, the coefficients α_i in Eq. (5.3.18) are just the inner products of the x_i with x. Since the product x_i α_i is the same as α_i x_i, insertion of Eq. (5.3.20) into Eq. (5.3.18) allows the rearrangement

x = Σ_{i=1}^n x_i (x_i†x) = (Σ_{i=1}^n x_i x_i†) x. (5.3.21)

This relationship is valid for any vector x in E_n. Therefore, the matrix Σ_{i=1}^n x_i x_i† has the property that it maps any vector x in E_n onto itself. This is, by definition, the unit matrix I, i.e.,

I = Σ_{i=1}^n x_i x_i†. (5.3.22)

Equation (5.3.22) is called a resolution of the identity matrix. Application of the right-hand side of Eq. (5.3.22) to x yields x again, but expands it as a linear combination of the basis set {x_i}. Different orthonormal basis sets are analogous to different Cartesian coordinate frames in a three-dimensional Euclidean vector space.

EXERCISE 5.3.1. Show by direct summation that

Σ_{i=1}^3 x_i x_i† =
| 1 0 0 |
| 0 1 0 | (5.3.23)
| 0 0 1 |

where the x_i are given by Eq. (5.3.17).

When {x_1, ..., x_n} is a non-orthonormal basis set, finding the coefficients in the expansion x = Σ_i α_i x_i is not quite so simple. However, all we have to do is find the reciprocal basis {z_1, ..., z_n}, which is related to {x_1, ..., x_n} by the conditions

z_i†x_j = δ_ij. (5.3.24)

Those familiar with crystallography or other areas of physics might recall that the three non-coplanar Euclidean vectors a, b, and c have the reciprocal vectors

ā = (b × c)/(a · b × c), b̄ = (c × a)/(a · b × c), c̄ = (a × b)/(a · b × c), (5.3.25)

with the properties

ā · a = 1, b̄ · b = 1, c̄ · c = 1,
ā · b = ā · c = b̄ · a = b̄ · c = c̄ · a = c̄ · b = 0, (5.3.26)

where a · b and c × b denote the dot and cross products of Euclidean vectors. By definition of the cross product, the direction of a × b is perpendicular to the plane defined by a and b. In any case, let us first assume the existence of the reciprocal basis set, explore its implications, and then prove its existence. Since {x_1, ..., x_n} is a linearly independent set, it is a basis set, and so a unique set {α_i} exists for a given x such that

x = Σ_{i=1}^n α_i x_i. (5.3.27)

Multiplying Eq. (5.3.27) by z_i† and using Eq. (5.3.24), we find

α_i = z_i†x. (5.3.28)

Thus, if the reciprocal set {z_i} is known, finding the {α_i} for the expansion of x in the basis set {x_i} is as easy as it is for an orthonormal basis set. Also, insertion of Eq. (5.3.28) into Eq. (5.3.27) yields

x = Σ_{i=1}^n x_i z_i†x = (Σ_{i=1}^n x_i z_i†) x. (5.3.29)

Again, this equation is valid for any vector x in E_n, and so we again conclude

I = Σ_{i=1}^n x_i z_i†. (5.3.30)

Thus, the resolution of the identity is a sum of the dyadics x_i z_i† for an arbitrary basis set, and an orthonormal basis set, for which z_i = x_i, is just a special case. Let us now return to the problem of proving the existence of and finding the reciprocal set {z_i} for the set {x_i}. Let

X = [x_1, ..., x_n]. (5.3.31)

Since the vectors x_i are linearly independent, X^{-1} exists and

X^{-1}X = I = [δ_ij]. (5.3.32)

Next, we define

Z† = X^{-1} or Z = (X^{-1})†, (5.3.33)

which we can write in partitioned form with the column vectors {z_i} as

Z = [z_1, ..., z_n], (5.3.34)

and so

Z† = [z_1, ..., z_n]† = X^{-1}. (5.3.35)

However,

Z†X = [z_i†x_j] = X^{-1}X = [δ_ij], (5.3.36)

proving that z_i†x_j = δ_ij,

which is the property we sought for the reciprocal set {z_i} of the set {x_i}. We have just proved that the set {z_i} exists and consists simply of the column vectors of (X^{-1})†.

EXAMPLE 5.3.2. Consider the set

x_1 = (2, 1, 1)^T, x_2 = (1, 2, 1)^T, x_3 = (1, 1, 2)^T. (5.3.37)

The inverse of X = [x_1, x_2, x_3] is

X^{-1} =
|  3/4 −1/4 −1/4 |
| −1/4  3/4 −1/4 | (5.3.38)
| −1/4 −1/4  3/4 |

and

Z = (X^{-1})^T =
|  3/4 −1/4 −1/4 |
| −1/4  3/4 −1/4 |. (5.3.39)
| −1/4 −1/4  3/4 |

Therefore, the reciprocal vectors of x_1, x_2, and x_3 are

z_1 = (3/4, −1/4, −1/4)^T, z_2 = (−1/4, 3/4, −1/4)^T, z_3 = (−1/4, −1/4, 3/4)^T. (5.3.40)

EXERCISE 5.3.2. Show that x_i†z_j = z_j†x_i = δ_ij and that

Σ_{i=1}^3 x_i z_i† =
| 1 0 0 |
| 0 1 0 | (5.3.41)
| 0 0 1 |

for the preceding example.

We can now demonstrate why perfect matrices are attractive. Suppose that the n × n matrix A is perfect and that x obeys the equation

dx/dt = Ax, x_0 = x(t = 0). (5.3.42)

Since A is perfect, it has n linearly independent eigenvectors x_i (Ax_i = λ_i x_i, i = 1, ..., n). The identity I can, therefore, be resolved as in Eq. (5.3.30) as a sum of the dyadics x_i z_i†, where the set {z_i} is the reciprocal of the set {x_i}. It follows that

A = AI = Σ_{i=1}^n A x_i z_i† = Σ_{i=1}^n λ_i x_i z_i†. (5.3.43)

Thus, we have proven

THEOREM. When A is perfect, the spectral resolution A = Σ_{i=1}^n λ_i x_i z_i† exists, where x_i and z_i, i = 1, ..., n, are the eigenvectors of A and their reciprocal vectors.

The eigenvalues are called the spectrum of A in a tradition going back to quantum mechanics. Taking the adjoint of A, we find

A† = Σ_{i=1}^n λ_i* z_i x_i†, (5.3.44)

from which it follows that

A†z_i = λ_i* z_i. (5.3.45)

This implies the following:

1. If A is perfect, so is its adjoint A†.
2. The eigenvalues of A† are the complex conjugates of the eigenvalues of A.
3. The eigenvectors z_i of A† are the reciprocal vectors of the eigenvectors x_i of A.

From repeated multiplication of Eq. (5.3.43) by A, it follows that

A^k = Σ_{i=1}^n λ_i^k x_i z_i† for any positive integer k, (5.3.46)

or that any positive power of a perfect matrix has a spectral resolution into a linear combination of the dyadics x_i z_i†. We can use this result to expand the matrix function exp(tA) in a Taylor series as

exp(tA) = Σ_{k=0}^∞ (t^k/k!) A^k = Σ_{k=0}^∞ (t^k/k!) Σ_{i=1}^n λ_i^k x_i z_i†
        = Σ_{i=1}^n (Σ_{k=0}^∞ (tλ_i)^k/k!) x_i z_i† = Σ_{i=1}^n exp(tλ_i) x_i z_i†, (5.3.47)

thus obtaining a spectral resolution of exp(tA). In fact, for any function f(t) that can be expanded in a Taylor or Laurent series for t near λ_i, i = 1, ..., n, it follows from Eq. (5.3.46) that

f(A) = Σ_{i=1}^n f(λ_i) x_i z_i†. (5.3.48)

Clearly, when A is perfect, the spectral resolution theorem greatly simplifies evaluating functions of A. As an example, consider the formal solution of Eq. (5.3.42),

x = exp(tA)x_0, (5.3.49)

which, with the spectral resolution in Eq. (5.3.47), becomes

x = Σ_{i=1}^n exp(λ_i t) x_i (z_i†x_0). (5.3.50)

EXAMPLE 5.3.3. Suppose that the vectors in Eq. (5.3.37) are eigenvectors of A with eigenvalues λ_1 = −1, λ_2 = −2, and λ_3 = −3. Assume that

x_0 = (1, 1, 1)^T (5.3.51)

in Eq. (5.3.42). The reciprocal vectors in Eq. (5.3.40) yield

z_1†x_0 = 1/4, z_2†x_0 = 1/4, z_3†x_0 = 1/4, (5.3.52)

and so

x = (1/4)[e^{−t}(2, 1, 1)^T + e^{−2t}(1, 2, 1)^T + e^{−3t}(1, 1, 2)^T]
  = (1/4)(2e^{−t} + e^{−2t} + e^{−3t}, e^{−t} + 2e^{−2t} + e^{−3t}, e^{−t} + e^{−2t} + 2e^{−3t})^T. (5.3.53)

In the next section we will address the problem of finding the eigenvalues and eigenvectors of a matrix. In Chapter 8 we will see that, in more general normed linear vector spaces, perfect operators exist whose eigenvectors form a basis set for the space. We will see that these operators yield a powerful spectral resolution theorem and, eventually, we will discover that, even for imperfect or defective matrices, eigenanalysis will be of great benefit in solving matrix problems.
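Example 5.3.3 can be reproduced in Mathematica by building A from its eigenpairs and comparing the spectral-resolution solution (5.3.50) with the matrix exponential. This is a sketch under the data of the example as given above (in particular the initial condition x_0 = (1, 1, 1)^T):

    (* Example 5.3.3: dx/dt = A x solved two ways *)
    xmat = Transpose[{{2, 1, 1}, {1, 2, 1}, {1, 1, 2}}];   (* X = [x1, x2, x3] *)
    lam  = {-1, -2, -3};
    a    = xmat.DiagonalMatrix[lam].Inverse[xmat];          (* A = X Lambda X^-1 *)
    zmat = Transpose[Inverse[xmat]];                        (* reciprocal vectors as columns *)
    x0   = {1, 1, 1};
    sol1 = Sum[Exp[lam[[i]] t] (zmat[[All, i]].x0) xmat[[All, i]], {i, 3}];  (* Eq. (5.3.50) *)
    sol2 = MatrixExp[a t].x0;                               (* x = exp(t A) x0, Eq. (5.3.49) *)
    Simplify[sol1 - sol2]                                   (* should be {0, 0, 0} *)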

5.4. EIGENVALUE ANALYSIS

An n × n square matrix A has an eigenvalue if there exists a number λ such that the eigenequation

(A − λI)x = 0 (5.4.1)

has a nontrivial solution. From solvability theory, we know that this homogeneous equation has a nontrivial solution if and only if the determinant of (A − λI) is 0, i.e.,

|A − λI| = 0. (5.4.2)

Equation (5.4.2) is known as the characteristic equation of A. Such a determinant will yield an nth-degree polynomial in λ, i.e.,

|A − λI| = P_n(λ) = (−λ)^n + a_1(−λ)^{n−1} + a_2(−λ)^{n−2} + ··· + a_n
         = Σ_{j=0}^n a_j(−λ)^{n−j}, (5.4.3)

where a_0 = 1 and the a_j, j > 0, are functions of the elements of A. Thus, the eigenvalues of A are roots of the polynomial equation (known as the characteristic polynomial equation)

P_n(λ) = Σ_{j=0}^n a_j(−λ)^{n−j} = 0. (5.4.4)

From the theory of equations, we know that an nth-degree polynomial will have at least one distinct root and can have as many as n distinct roots. For example, the polynomial

(1 − λ)^n = 0 (5.4.5)

has one root, λ_1 = 1, whereas the polynomial

Π_{i=1}^n (i − λ) = 0 (5.4.6)

has n distinct roots, λ_i = i, i = 1, ..., n. We say that the root λ_1 = 1 of Eq. (5.4.5) is a root of multiplicity n. In another example,

(1 − λ)³(2 − λ)³ Π_{i=3}^{n−6} (i − λ) = 0, (5.4.7)

we say that the root λ_1 = 1 is of multiplicity p_{λ_1} = 3, the root λ_2 = 2 is of multiplicity p_{λ_2} = 3, and the roots λ_3 = 3, λ_4 = 4, ..., λ_{n−6} = n − 6 are distinct roots, or roots of multiplicity p_λ = 1. The multiplicities always sum to n, i.e.,

Σ_i p_{λ_i} = n. (5.4.8)

The coefficients a_j can be related to the matrix A through the polynomial property

a_j = ((−1)^{n−j}/(n − j)!) (d^{n−j}P_n/dλ^{n−j})|_{λ=0}, (5.4.9)

which follows from the definition P_n(λ) = Σ_{j=0}^n a_j(−λ)^{n−j}. We note from the elementary properties of determinants that

dP_n/dλ = (d/dλ)|A − λI| = Σ_{i=1}^n det[c_1, ..., c_{i−1}, −e_i, c_{i+1}, ..., c_n], (5.4.10)

where c_j denotes the jth column of A − λI. Thus, differentiation of |A − λI| with respect to λ generates a sum of n determinants, where the ith column of the ith determinant is replaced by −e_i, whereas the other columns are the same as those of |A − λI|. Cofactor expansion of the ith determinant by the column containing −e_i strikes the ith row and ith column and leaves (−1) times the principal minor of A − λI obtained by deleting row i and column i. Summing over i yields

dP_n/dλ = (−1) tr_{n−1}(A − λI), (5.4.11)

where tr_i A denotes the ith trace of A as defined by Eqs. (1.7.6)-(1.7.8) for a 3 × 3 matrix. Recall that, for an n × n matrix, the trace tr_i A is the sum of the ith-order minors generated by striking n − i rows and columns intersecting on the main diagonal of A. The second derivative of P_n(λ) is obtained by differentiating Eq. (5.4.10) once more. The result is

d²P_n/dλ² = Σ_{i=1}^n Σ_{j≠i} det[c_1, ..., −e_i, ..., −e_j, ..., c_n], (5.4.12)

in which columns i and j of A − λI have been replaced by −e_i and −e_j, respectively, and the remaining columns c_k are those of A − λI.

Successive cofactor expansions by the columns containing the vectors −e_i yield

d²P_n/dλ² = (−1)² 2 tr_{n−2}(A − λI). (5.4.13)

Continuing this process results in the following expression for the general case:

d^iP_n/dλ^i = (−1)^i i! tr_{n−i}(A − λI). (5.4.14)

The factor of i! comes from the fact that, in taking the ith derivative of |A − λI|, the same i columns are differentiated in i! different orders. Setting i = n − j and using Eq. (5.4.14) in Eq. (5.4.9), we find

a_j = tr_j A. (5.4.15)

The coefficient a_j, j > 0, in the characteristic polynomial P_n(λ) is simply the jth trace of A. Since

tr_1 A = Σ_{i=1}^n a_ii and tr_n A = |A|, (5.4.16)

we see that the coefficient a_1 is equal to the sum of the diagonal elements of A, often simply called the trace of A, and the coefficient a_n is equal to the determinant of A. Mathematica routines are provided in the Appendix for calculating the jth-order trace and characteristic polynomial of a square matrix.
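The relation a_j = tr_j A can also be checked without the Appendix routines by reading the traces off the characteristic polynomial itself (a sketch, using the matrix of Example 5.4.1 given below):

    (* the coefficients of P_n(lambda) = Sum_j a_j (-lambda)^(n-j) are the traces tr_j A *)
    a = {{2, -1, -1}, {-1, 2, -1}, {-2, 2, -1}};
    n = Length[a];
    p = CharacteristicPolynomial[a, lambda];   (* equals Det[a - lambda IdentityMatrix[n]] *)
    Table[(-1)^(n - j) Coefficient[p, lambda, n - j], {j, 0, n}]
                                               (* {a_0, a_1, a_2, a_3} = {1, 3, -1, -3} *)
    {Tr[a], Det[a]}                            (* a_1 and a_n, by Eq. (5.4.16) *)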

EXAMPLE 5.4.1. Find the eigenvalues, eigenvectors, and reciprocal eigenvectors of

A =
|  2 −1 −1 |
| −1  2 −1 |. (5.4.17)
| −2  2 −1 |

We find that

a_3 = tr_3 A = |A| = −3, (5.4.18)

a_2 = tr_2 A = |2 −1; −1 2| + |2 −1; −2 −1| + |2 −1; 2 −1| = 3 − 4 + 0 = −1, (5.4.19)

a_1 = tr_1 A = 2 + 2 − 1 = 3. (5.4.20)

Thus,

P_3(λ) = −λ³ + 3λ² + λ − 3 = 0. (5.4.21)

The eigenvalues are the roots of Eq. (5.4.21), which are

λ_1 = −1, λ_2 = 1, λ_3 = 3. (5.4.22)

The eigenvector corresponding to λ_i obeys the homogeneous equation

(A − λ_i I)x_i = 0, (5.4.23)

and the solutions to these equations are

x_1 = (1, 1, 2)^T, x_2 = (1, 1, 0)^T, x_3 = (−3, 1, 2)^T. (5.4.24)

The reciprocal vectors are the column vectors of (X^{-1})^T, where X = [x_1, x_2, x_3]. They are

z_1 = (1/4, −1/4, 1/2)^T, z_2 = (0, 1, −1/2)^T, z_3 = (−1/4, 1/4, 0)^T. (5.4.25)

EXERCISE 5.4.1. Verify that the vectors z_i in Eq. (5.4.25) are eigenvectors of the transpose A^T of A, where A is given by Eq. (5.4.17). Also, verify the spectral resolution theorem for A; i.e., show by direct calculation that

Σ_{i=1}^3 λ_i x_i z_i^T = A. (5.4.26)
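Exercise 5.4.1 can be checked numerically as follows (a sketch using the eigenvectors and reciprocal vectors listed above; both outputs should be zero):

    (* check of Exercise 5.4.1 for the matrix of Eq. (5.4.17) *)
    a = {{2, -1, -1}, {-1, 2, -1}, {-2, 2, -1}};
    xmat = Transpose[{{1, 1, 2}, {1, 1, 0}, {-3, 1, 2}}];   (* eigenvectors as columns *)
    lam  = {-1, 1, 3};
    zmat = Transpose[Inverse[xmat]];                        (* reciprocal vectors as columns *)
    Table[Transpose[a].zmat[[All, i]] - lam[[i]] zmat[[All, i]], {i, 3}]   (* A^T z_i = lambda_i z_i *)
    Sum[lam[[i]] Outer[Times, xmat[[All, i]], zmat[[All, i]]], {i, 3}] - a (* Eq. (5.4.26) *)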

EXERCISE 5.4.2. Use the spectral resolution theorem to prove the Hamilton-Cayley theorem for perfect matrices. The Hamilton-Cayley theorem claims that A obeys its own characteristic equation, i.e.,

P_n(A) = Σ_{j=0}^n a_j(−A)^{n−j} = 0. (5.4.27)

The theorem is valid for any square matrix, but the proof is harder for the general case.

If the multiplicity p_{λ_i} of the eigenvalue λ_i is greater than 1, then the eigenvalue might have multiple linearly independent eigenvectors. The number of eigenvectors, however, is not necessarily equal to p_{λ_i}. For example, if A = I, then any basis set {x_1, ..., x_n} is a linearly independent set of eigenvectors of I, i.e., Ix_i = x_i, and all n eigenvectors have the same eigenvalue λ_i = 1. Consider the eigenequation

(A − λ_i I)x_i = 0 (5.4.28)

for an n × n matrix A. The number of linearly independent solutions of Eq. (5.4.28) is n − r_{λ_i}, where r_{λ_i} is the rank of the characteristic determinant |A − λ_i I|.

5.5. SOME SPECIAL PROPERTIES OF EIGENVALUES

From the general theory of polynomial equations, we know that a polynomial f(x) = Σ_{j=0}^n a_j x^{n−j}, a_0 = 1, can be written in the factored form

f(x) = Π_{j=1}^n (x − x_j), (5.5.1)

where the x_j are the roots of f(x); i.e., they obey the equation

f(x_j) = 0. (5.5.2)

Thus, the characteristic polynomial

P_n(λ) = (−λ)^n + a_1(−λ)^{n−1} + ··· + a_{n−1}(−λ) + a_n (5.5.3)

can be expressed in the factored form

P_n(λ) = Π_{j=1}^n (λ_j − λ), (5.5.4)

where the λ_j are the roots of P_n(λ) = 0. The λ_j are, of course, just the eigenvalues of A, and the coefficients a_j were shown in the previous section to be related to A by the equation

a_j = tr_j A. (5.5.5)

Expanding the factors in Eq. (5.5.4), we obtain

P_n(λ) = (−λ)^n + (Σ_{j=1}^n λ_j)(−λ)^{n−1} + (Σ_{i_2>i_1} λ_{i_1}λ_{i_2})(−λ)^{n−2} + ··· + Π_{j=1}^n λ_j. (5.5.6)

Comparison of Eqs. (5.5.3) and (5.5.6) leads to the relationships

a_1 = tr_1 A = Σ_{j=1}^n λ_j

a_k = tr_k A = Σ_{i_k > i_{k−1}} ··· Σ_{i_2 > i_1} λ_{i_1} λ_{i_2} ··· λ_{i_k} (5.5.7)

a_n = tr_n A = Π_{j=1}^n λ_j.

The first and last of these relationships are encountered rather frequently in matrix analysis. They can be usefully restated as

Σ_{j=1}^n a_jj = Σ_{j=1}^n λ_j (5.5.8)

and

|A| = Π_{j=1}^n λ_j; (5.5.9)

i.e., the sum of the eigenvalues of A equals the sum of the diagonal elements of A, and the product of the eigenvalues of A equals the determinant of A. The kth trace of A is equal to the sum of all distinct k-tuples of products of the eigenvalues of A, i.e., every product of k eigenvalues that contains a different combination of eigenvalues. For example, if n = 3, then

tr_2 A = λ_1λ_2 + λ_1λ_3 + λ_2λ_3. (5.5.10)
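These trace-eigenvalue relations are easy to verify numerically for a randomly generated matrix (our own sketch; Chop suppresses round-off residue, and tr_2 A is read off the characteristic polynomial):

    (* spot-check of Eqs. (5.5.8)-(5.5.10) for an arbitrary 3 x 3 matrix *)
    a = RandomReal[{-1, 1}, {3, 3}];
    lam = Eigenvalues[a];
    {Chop[Total[lam] - Tr[a]],                 (* sum of eigenvalues = trace *)
     Chop[Times @@ lam - Det[a]],              (* product of eigenvalues = determinant *)
     Chop[lam[[1]] lam[[2]] + lam[[1]] lam[[3]] + lam[[2]] lam[[3]]
        + Coefficient[CharacteristicPolynomial[a, s], s, 1]]}
    (* tr_2 A equals minus the coefficient of s in P_3(s), so this last entry should be 0 *)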

The result in Eq. (5.5.7) implies a very special relationship between A and the diagonal eigenvalue matrix

Λ =
| λ_1  0  ···  0  |
|  0  λ_2 ···  0  | = [λ_i δ_ij]. (5.5.11)
|  ⋮           ⋮  |
|  0   0  ··· λ_n |

From the definition of the kth trace, it follows that

tr_k Λ = Σ_{i_k > i_{k−1}} ··· Σ_{i_2 > i_1} λ_{i_1} λ_{i_2} ··· λ_{i_k}, (5.5.12)

which, when combined with Eq. (5.5.7), yields the conclusion that A and Λ have identical traces. Namely,

tr_k A = tr_k Λ. (5.5.13)

This result can be summarized in the following theorem:

THEOREM. The traces of A are equal to the corresponding traces of the diagonal matrix Λ whose diagonal elements are the eigenvalues of the matrix A.

To give a related theorem, let us first define a similarity transformation.

DEFINITION. We say A and B are related by a similarity transformation if

B = S^{-1}AS, (5.5.14)

where S is a nonsingular matrix.

From this definition, we can now prove the following theorem:

THEOREM. The traces of A are unchanged by a similarity transformation, i.e.,

tr_j A = tr_j(S^{-1}AS). (5.5.15)

The theorem follows from the fact that A and S^{-1}AS have the same eigenvalues and, therefore, from Eq. (5.5.13), they have the same traces. Note that the eigenvalues of S^{-1}AS obey the characteristic equation

|S^{-1}AS − λI| = 0. (5.5.16)

But since I = S^{-1}IS, Eq. (5.5.16) can be written as

|S^{-1}(A − λI)S| = |S^{-1}| |A − λI| |S|
                  = |S^{-1}S| |A − λI| (5.5.17)
                  = |A − λI| = 0,

and thus it follows that

THEOREM. The eigenvalues of A are unchanged by a similarity transformation.

In Section 5.3 we showed that the eigenvalues of the adjoint A† of A are the complex conjugates λ_i* of the eigenvalues of A if A is perfect. In fact, whether or not A is perfect, the eigenvalues of A† are the λ_i*. To prove this, consider the characteristic polynomial of A†:

Q_n(ν) = |A† − νI|. (5.5.18)

The eigenvalues ν_i of A† obey the equation Q_n(ν) = 0, and since the complex conjugate of 0 is 0, it follows that

0 = Q_n*(ν) = |A† − νI|* = |A^T − ν*I| (5.5.19)

or

0 = Σ_{j=0}^n b_j(−ν*)^{n−j}, (5.5.20)

where b_j = tr_j A^T. But since the minors of A^T are the same as the minors of A, it follows that

b_j = tr_j A = a_j, (5.5.21)

and so the roots ν* of Eq. (5.5.20) are the roots of P_n(ν*) = Σ_{j=0}^n a_j(−ν*)^{n−j} = 0. Thus, ν_i* = λ_i, or ν_i = λ_i*, proving that

THEOREM. The eigenvalues of the adjoint A† of A are the complex conjugates of the eigenvalues of A for any square matrix A.

There are two other general relationships between the eigenproperties of A and its adjoint A†. The first is expressed in the following theorem:

THEOREM. The number of eigenvectors that A has corresponding to λ_i is equal to the number of eigenvectors that A† has corresponding to λ_i*.

This conclusion follows from the fact that the rank of |A − λ_iI| equals the rank of |A† − λ_i*I|. If the rank is r_{λ_i} and A is n × n, then there are n − r_{λ_i} linearly independent solutions to Ax_i = λ_i x_i and n − r_{λ_i} linearly independent solutions to A†z_i = λ_i*z_i. The other property is that if λ_i ≠ λ_j, then ⟨z_j, x_i⟩ = ⟨x_i, z_j⟩ = 0. To show this, consider the eigenequations Ax_i = λ_i x_i and A†z_j = λ_j*z_j. It follows that ⟨z_j, Ax_i⟩ = λ_i⟨z_j, x_i⟩ and ⟨A†z_j, x_i⟩ = λ_j⟨z_j, x_i⟩. However, ⟨A†z_j, x_i⟩ = ⟨z_j, Ax_i⟩, which follows from the definition of the adjoint, and so we find that (λ_i − λ_j)⟨z_j, x_i⟩ = 0, or ⟨z_j, x_i⟩ = 0 if λ_i ≠ λ_j. Thus, we have shown that

THEOREM. The eigenvectors x_i of A corresponding to λ_i are orthogonal to the eigenvectors z_j of A† corresponding to λ_j* if λ_i ≠ λ_j.

EXERCISE 5.5.1. Define the nonsingular matrix

S =
| −2  1  0 |
|  1 −2  1 | (5.5.22)
|  0  1 −2 |

(a) Find the inverse S^{-1}.
(b) Calculate the matrix B, where

B = S^{-1}AS (5.5.23)

and A is the matrix defined in Eq. (5.4.17).

(c) Compute the eigenvalues of B. These should be the same as the eigenvalues of A.
(d) Find the eigenvectors of B. Are they eigenvectors of A?

EXAMPLE 5.5.1. The Cartesian components of the stress tensor are

T =
|  4 −2 −1 |
| −2  5  2 | (5.5.24)
| −1  2  4 |

in a convenient set of units. (a) Find the values of the maximum and minimum traction force for the mechanical loading giving rise to the state of stress represented by T. (b) Find the directions of the maximum and minimum traction forces.

These forces are eigenvalues of the equation

Tn = λn, (5.5.25)

and the corresponding eigenvectors are their corresponding directions. The eigenvalues of Eq. (5.5.25) are

λ_1 = 3, λ_2 = 5 − 2√2, λ_3 = 5 + 2√2, (5.5.26)

or

λ_1 = 3, λ_2 = 2.17157, λ_3 = 7.82843. (5.5.27)

Thus, the maximum and minimum traction forces are λ_3 = 7.82843 and λ_2 = 2.17157, respectively. The normalized eigenvectors of Eq. (5.5.25) are

n_1 = (1/√2)(1, 0, 1)^T, n_2 = (1/2)(−1, −√2, 1)^T, n_3 = (1/2)(−1, √2, 1)^T. (5.5.28)

Thus, if i, j, and k are the mutually orthogonal unit vectors of the Cartesian coordinate system in which T is expressed, the direction of the maximum traction force in three-dimensional Euclidean space is

n_3 = −(1/2)i + (√2/2)j + (1/2)k, (5.5.29)

and for the minimum traction force it is

n_2 = −(1/2)i − (√2/2)j + (1/2)k. (5.5.30)

What are the physical angles between the n_2 and n_3 directions and the directions of i, j, and k?
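The principal values and directions can be confirmed directly in Mathematica (a sketch; the ordering that Eigensystem returns may differ from the ordering used above):

    (* principal traction values and directions for the stress tensor of Eq. (5.5.24) *)
    t = {{4, -2, -1}, {-2, 5, 2}, {-1, 2, 4}};
    {vals, vecs} = Eigensystem[t];
    Simplify[vals]                      (* {5 + 2 Sqrt[2], 5 - 2 Sqrt[2], 3}, in some order *)
    Simplify[Map[Normalize, vecs]]      (* unit vectors along the principal directions *)
    N[vals]                             (* numerical values: about 7.83, 2.17, and 3 *)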

5.6. CALCULATION OF EIGENVALUES

The most obvious way to compute the eigenvalues of A is to find the roots of the characteristic polynomial P_n(λ). This is, indeed, the preferred method for small matrices, say n < 5. However, for large n, the polynomial can be ill-conditioned; i.e., its roots can be very sensitive to round-off errors. For example, suppose that the exact eigenvalues of A are λ_i = i, i = 1, ..., 20. Since a_j = tr_j A, the characteristic polynomial of A is

P_20(λ) = λ^20 − 210λ^19 + 20615λ^18 − ··· + 20!. (5.6.1)

If the coefficient −210 of λ^19 is replaced by −210 − 2^{−23} = −210.000000119... and all of the other coefficients are unchanged, the roots λ_16 = 16 and λ_17 = 17 become (to nine decimals)

λ_16 = 16.730737466 − 2.812624894i
λ_17 = 16.730737466 + 2.812624894i. (5.6.2)

As we can see, a change in a_1 on the order of 10^{−7}% changes the eigenvalues substantially—even resulting in complex values instead of real ones.

EXERCISE 5.6.1. You can verify by plotting the polynomial

P_3(λ) = −λ³ + 8λ² − 13λ + 5 (5.6.3)

that it has three real roots. However, if you use Mathematica to solve P_3(λ) = 0 at various levels of precision, you will see that one of the roots can acquire a small spurious imaginary part that decreases with increasing precision in the calculation.

For tridiagonal matrices, there is a simple iterative method for computing eigenvalues. Let

P_n(λ) = |A − λI| =
| a_1−λ   b_1     0      ···   0       |
| c_1     a_2−λ   b_2    ···   0       |
| 0       c_2     a_3−λ  ···   0       | (5.6.4)
| ⋮                       ⋱    b_{n−1} |
| 0       ···     0    c_{n−1} a_n−λ   |

If the last row and last column of this determinant are struck, the remaining determinant is P_{n−1}(λ), the characteristic polynomial of the (n − 1) × (n − 1) tridiagonal matrix. Similarly, if the last two rows and columns are struck, the result is P_{n−2}(λ), the characteristic polynomial of the (n − 2) × (n − 2) tridiagonal matrix. Thus, by using the cofactor expansion of |A − λI| by its last column, followed by the cofactor expansion on the coefficient b_{n−1} in the first cofactor expansion, we obtain

P_n(λ) = (a_n − λ)P_{n−1}(λ) − b_{n−1}c_{n−1}P_{n−2}(λ). (5.6.5)

This result is valid for any n, and so the expansion process generates the recurrence formulas

P_r(λ) = (a_r − λ)P_{r−1}(λ) − b_{r−1}c_{r−1}P_{r−2}(λ), r = n, ..., 2,
P_1(λ) = (a_1 − λ)P_0(λ), (5.6.6)

where P_0(λ) = 1. The eigenvalues can now be computed by guessing a value λ′ and computing P_1(λ′), P_2(λ′), ..., P_n(λ′). If P_n(λ′) = 0, λ′ is an eigenvalue. If not, we make another guess and continue the process. As we can see from Eq. (5.6.6), the calculation of P_n(λ′) for each guess takes 3(n − 1) operations. A Mathematica program implementing this method has been included in the Appendix. In the next chapter we will prove that any self-adjoint matrix can be transformed into a tridiagonal matrix, and thus the recurrence method for computing eigenvalues can be used.

EXERCISE 5.6.2. Use Eq. (5.6.6) to find the eigenvalues of

A =
| 1 2 0 |
| 1 3 2 |. (5.6.7)
| 0 1 4 |

Once an eigenvalue λ of a tridiagonal matrix is found, the corresponding eigenvector satisfies (A − λI)x = 0 and can be easily computed (if b_r ≠ 0) from

x_2 = −(1/b_1)(a_1 − λ)x_1,
x_{r+1} = −(1/b_r)[(a_r − λ)x_r + c_{r−1}x_{r−1}], r = 2, ..., n − 1. (5.6.8)

The initial value x_1 can be given any nonzero value or can be chosen to normalize x, i.e., chosen such that ||x|| = 1.

An easy method to program for computing the maximum-magnitude eigenvalue of a matrix A is the power method. The method works if the eigenvalues can be ordered as

|λ_1| ≤ |λ_2| ≤ ··· ≤ |λ_{n−1}| < |λ_n|, (5.6.9)

such that λ_n is the eigenvalue of maximum magnitude. If λ_n is of multiplicity 1, i.e., |λ_n| ≠ |λ_i|, i = 1, ..., n − 1, then the power method works. We will prove

the method only for perfect matrices; however, it works for any matrix as long as the conditions in Eq. (5.6.9) are valid. For a perfect matrix, the spectral resolution theorem holds, namely,

A = Σ_{i=1}^n λ_i x_i z_i†, (5.6.10)

where Ax_i = λ_i x_i and z_i†x_j = δ_ij. We begin with an arbitrary vector u_0. Then we define

u_k ≡ A^k u_0 = Σ_{i=1}^n λ_i^k (z_i†u_0) x_i
            = λ_n^k [(z_n†u_0)x_n + Σ_{i=1}^{n−1} (λ_i/λ_n)^k (z_i†u_0) x_i]. (5.6.11)

As the value of k increases, the terms in Eq. (5.6.11) multiplied by (λ_i/λ_n)^k contribute less and less since (λ_i/λ_n)^k → 0 as k → ∞. Thus, at large enough k,

u_k ≈ λ_n^k (z_n†u_0) x_n. (5.6.12)

We do have to assume that the arbitrarily chosen u_0 is not orthogonal to z_n. Usually, round-off errors prevent this coincidence. If we define

λ^(k) ≡ u_k†u_{k+1} / (u_k†u_k) and w^(k) ≡ u_k / ||u_k||, (5.6.13)

then, by Eq. (5.6.11),

λ^(k) = λ_n [1 + O(|λ_{n−1}/λ_n|^k)], (5.6.14)

and so

λ^(k) → λ_n as k → ∞ (5.6.15)

and

w^(k) → αx_n as k → ∞, (5.6.16)

where α is a scalar constant.

Since any multiple of an eigenvector of A is also an eigenvector of A, Eqs. (5.6.13) and (5.6.14) yield the eigenvalue of maximum magnitude and its eigenvector. A simple computer program utilizing the power method is given below.

Choose an initial eigenvector guess u_0:

u = u_0.

Initialize the variable TOLERANCE used in the convergence criterion:

TOLERANCE = (some value).

Initialize the variable ε to some value greater than TOLERANCE:

ε = (some value greater than TOLERANCE).

Choose an initial guess for the eigenvalue and set it equal to λ_old:

λ_old = (some value).

Begin the iteration loop and continue until the solution converges:

While (ε > TOLERANCE)
    u_new = A u
    λ = u†u_new / (u†u)
    u = u_new
    ε = |λ − λ_old|
    λ_old = λ
Continue
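A direct Mathematica transcription of this loop might look as follows (a sketch; the test matrix, starting vector, and tolerance are placeholders of our own choosing, and the iterate is renormalized each pass to avoid overflow):

    (* power method for the maximum-magnitude eigenvalue *)
    powerMethod[a_, u0_, tol_: 10^-10, maxit_: 1000] :=
     Module[{u = N[u0], unew, lam = 0., lamold = Infinity, k = 0},
      While[Abs[lam - lamold] > tol && k < maxit,
       lamold = lam;
       unew = a.u;
       lam = Conjugate[u].unew/(Conjugate[u].u);   (* eigenvalue estimate, as in the loop above *)
       u = unew/Norm[unew];                        (* renormalize the iterate *)
       k++];
      {lam, u, k}]

    powerMethod[{{2., 1., 0.}, {1., 3., 1.}, {0., 1., 4.}}, {1., 1., 1.}]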

With a sufficiently stringent tolerance, the power method will give a good approximation to λ_n and x_n, which, if normalized, is x_n = u_new/||u_new||. The power method can also be used to find the eigenvalue of minimum magnitude, as well as all of the distinct eigenvalues. However, matrix inverses must be computed in this case. Again, we begin with an arbitrary guess x_0. For a perfect matrix with only nonzero eigenvalues,

A^{-1}x_i = λ_i^{-1}x_i, i = 1, ..., n, (5.6.17)

and so the spectral resolution of A^{-1} is

A^{-1} = Σ_{i=1}^n λ_i^{-1} x_i z_i†, (5.6.18)

where the x_i and z_i are the eigenvectors of A and A†. From this it follows that, at large k,

A^{-k}x_0 ≈ λ_1^{-k}(z_1†x_0)x_1, (5.6.19)

where |λ_1| < |λ_2| ≤ ··· ≤ |λ_n|. Thus, with the iterative sequence

u_1 = A^{-1}x_0, w_1 = u_1/||u_1||, (5.6.20)

u_2 = A^{-1}w_1, w_2 = u_2/||u_2||, (5.6.21)

and so on until

u_k = A^{-1}w_{k−1}, w_k = u_k/||u_k||, (5.6.22)

it follows that

w_k†Aw_k/(w_k†w_k) → λ_1 and w_k → αx_1 as k → ∞. (5.6.23)

Similarly, if λ_i is distinct, i.e., if no other eigenvalue is equal to λ_i, and if β is closer to λ_i than any other eigenvalue, then the inverse power method for A − βI generates

(A − βI)^{-k}x_0 ≈ (λ_i − β)^{-k}(z_i†x_0)x_i. (5.6.24)

Thus, if u_1 = (A − βI)^{-1}x_0 and w_1 = u_1/||u_1||, the iterative sequence

u_k = (A − βI)^{-1}w_{k−1}, w_k = u_k/||u_k||, k = 2, 3, ..., (5.6.25)

yields

w_k†Aw_k/(w_k†w_k) → λ_i as k → ∞ (5.6.26)

and

w_k → αx_i as k → ∞. (5.6.27)

If λ_i has a multiplicity greater than 1, the power method will still generate the eigenvalue and one of its eigenvectors, but not all of them.

EXAMPLE 5.6.1. Use the power method to find the eigenvectors and eigenvalues of the matrix

4 + i ^(3 + 20 1 -d+O ~2 1 -d + 0 ^(3-20 l+i A = 2 1 5(3 + 0 2 -1(1- 2^ 5 1+i ^(3 + 20 -d+O

We choose a tolerance of 10 ^^, an initial eigenvalue guess of 2, and an initial eigenvector guess of ""1

Xn = 194 CHAPTER 5 THE EIGENPROBLEM

Application of the power method outlined above results in convergence after 406 steps (i.e., ||X<*+" - A.'*'|| < 10"'^). The resulting solution is A., = 3 + / and

1 -1 X, = -(0.0335351 +0.498740 -1 1

which can be normalized to 1 -1 X, = -1 1

Next, we apply the inverse power method on the matrix A"' given by

1 1 1 1 (7 - 9/) (2 + 6/) (1+30 (1 + 3/) 30 Ts To 30 1 (1 + 30 1(7 + 60 -1(1+30 -1(1 + 30 A-' = To -1(1+30 -1(7 + 210 ^(2 + 0 1(1+30 J J oU .1(1 + 30 -1(2 + 60 1(1+30 1(11+30

Using the same tolerance and initial guesses, the solution converges after 33 itera- tions, yielding ^2= I —i and

Xo =

For the intermediate eigenvalues, we use the inverse power method on the matrix {X — bl)~K Choosing b = 2 gives

1 _)_ ^(3-0 \i2-i) :(l-0 ~2 1 1 1 1 :(l-0 (A - fcl)-' = 2 -i2 2 1 1 \i5-i) 0 1 1 _\(l-i) r(l-0 5(2-0 2 CALCULATION OF EIGENVALUES 195

and by using the original values of the tolerance, eigenvalue, and eigenvector, the solution converges after 68 steps, giving A.3 = 3 and

X, = 0.37794

By changing the initial eigenvector guess to

Xo

and recomputing, we find that the solution again converges after 68 steps and the resulting eigenvalue is X4 = 3, as before. However, we find that the eigenvector is different: "-1

X4 = 0.57735

and, by inspection, we see that it is not colinear with x_3 and is, in fact, a separate solution. This is an example of a degenerate eigenvalue. More specifically, we say that λ_3 = 3 with multiplicity p_{λ_3} = 2. The two eigenvectors, x_3 and x_4, form a linearly independent basis in E_2 (the two-dimensional eigenspace) and, after some careful examination, we see that any linear combination of these vectors is also an eigenvector of A with eigenvalue λ = 3. A Mathematica program for this example has been included in the Appendix.

A way of roughly estimating the neighborhood of eigenvalues is provided by Gerschgorin's theorem:

GERSCHGORIN'S THEOREM. Each of the eigenvalues of A lies in the union of the circles

|z − a_ii| ≤ Σ_{j≠i} |a_ij|, i = 1, ..., n. (5.6.28)

The proof of the theorem is straightforward. If λ is an eigenvalue of A, then there exists an eigenvector x such that Ax = λx. We can rewrite this as

(λ − a_ii)x_i = Σ_{j≠i} a_ij x_j, i = 1, ..., n, (5.6.29)

FIGURE 5.6.1. Gerschgorin circles in the complex plane (axes Re(z), Im(z)).

where we have isolated the diagonal terms from the nondiagonal terms. But if we choose i such that |x_i| = ||x||_max, then

|λ − a_ii| ≤ Σ_{j≠i} |a_ij| |x_j|/|x_i| ≤ Σ_{j≠i} |a_ij|, (5.6.30)

which proves the theorem.

EXAMPLE 5.6.2. Consider the matrix

A =
|  1 1 1 |
|  1 3 2 |. (5.6.31)
| −2 1 4 |

The circles in Eq. (5.6.28) are thus

|z − 1| = 2, |z − 3| = 3, |z − 4| = 3, (5.6.32)

and the eigenvalues lie in their union. The eigenvalues for this example turn out to be real and lie between 1/2 and 6, an interval that is, indeed, contained in the union of the three circles shown in Fig. 5.6.1.
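Gerschgorin's circles are equally easy to tabulate (a sketch, using the matrix of Eq. (5.6.31) as given above):

    (* Gerschgorin circles and eigenvalues for Eq. (5.6.31) *)
    a = {{1, 1, 1}, {1, 3, 2}, {-2, 1, 4}};
    centers = Diagonal[a]                                         (* {1, 3, 4} *)
    radii = Table[Total[Abs[Delete[a[[i]], i]]], {i, Length[a]}]  (* {2, 3, 3} *)
    N[Eigenvalues[a]]
    (* each eigenvalue lies in the union of the disks |z - centers[[i]]| <= radii[[i]] *)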

PROBLEMS

1. Consider the following inner product candidates: (a) Does (x, y) = x^Ay, where

form an inner product? Why?

(b) Does (x, y) = x'^A^Ay form an inner product? Why? (c) Does (x, y) = x^By, where 4 1 B = 1 4 form an inner product? Why? (d) Find the eigenvalues of A, A^A, and B. 2. Use the Gram-Schmidt procedure to construct an orthonormal set from the vectors

1 2 1 0 X2 = 1 X3 = 2 1 0 3

3. (a) Compute exp(rA) for

2 1 2 A = 1 2 1 1 1 2

(b) Compute sin(AO. 4. Consider the matrix 2 1 A = 3 2 1 4

(a) Evaluate tr^A, 7 = 1, 2, 3. (b) Find the eigenvalues of A. (c) How many eigenvectors does A have? Find them. (d) Find the reciprocal vectors of the eigenvectors and give the eigenvalues and eigenvectors of the adjoint A^ 5. Consider the polynomial

(a) What would be the solution to this equation if solved on a machine using floating-point arithmetic and nine decimals? (b) Use the Newton-Raphson method to solve the polynomial to 12-place accuracy to test the solution found in part (a). 198 CHAPTER 5 THE EIGENPROBLEM

6. Consider a 10 x 10 matrix A whose eigenvalues are X^ = i. Suppose that the jih trace is evaluated with a systematic error of 10"^(—1)^; i.e., the characteristic polynomial is evaluated as PnW = i:[aj-\-i-iyio-']{~kr-^ j=0

instead of

7=0 where aj = tr^ A. (a) Calculate aj from the eigenvalues of A. (b) Find the roots of P„(A.) and compare them to the roots of P„(A,). 7. Does the infinite set

^2» ^3» ' 7 = 2, 3,...,

where the iih component of e^ is 5,^, form a basis set in E^l Why? 8. Prove that AB and BA have the same eigenvalues, where A and B are both perfect nonsingular matrices. How are the eigenvectors of AB and BA related? 9. Prove that A^A is positive semidefinite, i.e., x^A^Ax > 0. 10. (a) Show that the matrix

0 1 0 0 0 0 0 1 0 0 A =

-a„ -a„ has eigenvectors 1

X; =

where λ_i is an eigenvalue of A. Show also that the characteristic equation becomes

λ^n + a_1λ^{n−1} + a_2λ^{n−2} + ··· + a_n = 0.

A is known as the companion matrix because of its relationship to an (n + 1)th-order ordinary differential equation.
(b) Suppose n = 10, a_1 = a_2 = ··· = a_9 = 0, and a_10 = 10^{−10}. Find the eigenvalues and eigenvectors of A.

11. Consider the matrix 2-2 3 1 1 1 1 3 -1

(a) Compute the eigenvalues of A. (b) Compute the eigenvectors x, of A and z, of A^ Show by computation that X, and z, are a biorthogonal set, i.e., xjzj :^0 if i = j and = 0 if i ^ j. Construct the reciprocal set for {xj. (c) Show by direct calculation that A obeys the Hamilton-Cayley theorem, i.e., that A satisfies its characteristic polynomial, i.e., P^iA) — 0. (d) Compute A"^ using the Hamilton-Cayley theorem. (e) Compute A~^ (f) Show that the eigenvectors of A are linearly independent. (g) Calculate the eigenvalues of

2A^ + 5A-^-f 3A2-7A- 8I.

12. Consider the space S of square integrable functions f{t) defined on the domain —l

{f.g) = / rit)g{t)dt.

Suppose Zi(0 = l, Z2(0 = ^ ^3(0 = ^'. Use the Gram-Schmidt procedure to construct an orthonormal set of functions jC|(0,^2(0»-^3(0 from these functions. 13. Suppose -2 1 1

(a) Find the normalized eigenvectors x^ and the eigenvalues A., of A. (b) Show by direct substitution that

J. ^^^ XiXi "T" X2X2.

(c) Using part (b), show that

A. — A1X1X1 "T" A2X2X2

and exp(«A) = exp(A.,Ox,x| + e,x^{k2t)x2A.- (d) Solve dx = Ax, dt where x(t = 0) = 200 CHAPTER 5 THE EIGENPROBLEM

14. (a) Is the inner product x'^Ax positive for all nonzero, real vectors x, where

-2 1 O" 0 -2 1 0 0 -2_

(b) Is A positive definite? Why? (c) What are the eigenvalues of A? Find the eigenvectors of A.
15. Find the eigenvalues and eigenvectors of the matrices
(a) A = | 2 1 |
        | 1 2 |
(b) A = | 4 1 |
        | 0 4 |
(c) A = | 2 1 0 |
        | 0 2 1 |
        | 0 0 2 |
(d) A = | 2 1 0 0 0 0 |
        | 0 2 1 0 0 0 |
        | 0 0 2 0 0 0 |
        | 0 0 0 4 1 0 |
        | 0 0 0 0 4 0 |
        | 0 0 0 0 0 3 |
16. According to the Hamilton-Cayley theorem, a square matrix satisfies its characteristic equation, i.e.,

P„W=0.

For the matrix -1 0 1 A=: 5 -1 -5 2 -4 -3

(a) Find the traces tr^ A, j = 1,2, and 3. (b) Show by direct substitution that A satisfies its characteristic equation. (c) Find the eigenvalues of A. (d) Is A perfect? Why? (e) Find the eigenvectors of A. PROBLEMS 201

(f) Find the matrix X that diagonalizes A under a similarity transformation. (g) Find the inverse of the matrix X. (h) Find the set of vectors z, that are the reciprocal of the set of eigenvectors of A. (i) Evaluate A"* using the Hamilton-Cayley theorem and the spectral resolution theorem. 17. According to the power method, the maximum-magnitude eigenvalue and its eigenvector can be found by starting with an arbitrary vector XQ and hunting the vector

x^ = AXjt_, =A%, ^ = 1,2,....

An estimate of the maximum-magnitude eigenvalue is t X = xjx Consider the matrix 2 1 0 A = 1 -2 1 0 I -2 (a) Find the eigenvalues and eigenvectors of A by a direct method. (b) Use the power method to find the maximum-magnitude eigenvalue and its eigenvector of A. How large must k be to find the estimated eigenvalue A and its estimated normalized eigenvector x to a tolerance

\k-k\ 10"

and ||x-x||<10- where X is the maximum-magnitude eigenvalue and x its normalized eigenvalue. It is probably best to do this part of the exercise on a computer, although it is not impossible to do by hand, (c) Calculate the adjugate of A and from it the inverse of A. Use the power method to determine the minimum-magnitude eigenvalue of A to the same tolerance as given in part (b). 18. The traction vector t^ on a surface whose normal is in the direction it is related to the stress tensor T of a material by

t, = T-n, (1) where n is a unit vector. In Cartesian coordinates, Eq. (1) can be expressed as

Tn N2

'23

'32 '33 202 CHAPTER 5 THE EIGENPROBLEM

or as t = Tn in matrix notation. From physical principles it is known that the stress tensor is real and that 7), T . If the material is loaded so that T,, 3, 7^33 = 4, 7^12 = 2, and ri3 = T22 = 0, we wish to find the principal directions, i.e., the directions in which the traction vector is the largest and the smallest. To find these directions, we want to find the extrema of Q — A,n^n, where Q — n^t = n^Tn = Yl T^j^t^j ^^^ ^ is the Lagrange multiplier resulting from the constraint that n be a unit vector, i.e., that n^n = I. The extremum condition is

[e-Wn]=0, / = 1, 2, and 3. (2) dn (a) How many linearly independent solutions does Eq. (2) have? Why? (b) Find the direction vectors n and the values of the traction vectors in these directions (principal directions and values). (c) It turns out that the material is nonlinear and a pair of points in the material originally separated by x is separated by y,

y = T^/^x

after the load is applied. If x^ = [1, 1,1], find the length of y after loading. 19. Consider the matrix equation

A^ - BA + C = 0,

where 2 1 1 2 B C = 1 2 2 1 Find A with the aid of the spectral resolution theorem for B and C. 20. Prove that tr^(BAB"^) = tr^(A). 21. Prove that tri((AB)*) = tri((BA)^). 22. Use the result from Problem 21 to prove that tr^(AB) = trjt(BA). 23. Prove that if la^J > X^y^, Wtjl for all / = 1, 2,..., /i, then the n x n matrix A is nonsingular. 24. Prove Perrin's theorem, which states that, for an n x n real-valued matrix A with a^j > 0 for all / and j, the eigenvalue of the largest magnitude is: (a) Real and positive. (b) Nondegenerate. 25 Use the recurrence formulas in Eq. (5.6.6) to compute the eigenvalues of the tridiagonal matrix

5 1 0 0 0 3 -5 1 0 0 A = 0 3 -6 2 0 0 0 4 -6 1 0 0 0 5 -7 FURTHER READING 203

Compute the eigenvectors of A. 26. Use the power method to find the maximum eigenvalue of

1 2 1 A = 1 3 2 2 1 4

27. Use the power method to find the maximum eigenvalue of an n x n matrix with elements an — " <• + /•-!• Find the eigenvalue as a function of n for n between 1 and 50. Plot the results. 28. Evaluate the contour integral:

r^f/U)(zI-A)-'Jz, Inilit I Jc'Jc where A is an n x n nonsingular matrix and / = \f^.

FURTHER READING

Amundson, N. R. (1964). "Mathematical Methods in Chemical Engineering." Prentice Hall, Englewood Cliffs, NJ.
Bellman, R. (1970). "Introduction to Matrix Analysis." McGraw-Hill, New York.
Golub, G. H., and Van Loan, C. F. (1989). "Matrix Computations," 2nd Ed. Johns Hopkins University Press, Baltimore.
Noble, B. (1969). "Applied Linear Algebra." Prentice Hall, Englewood Cliffs, NJ.
Noble, B., and Daniel, J. W. (1977). "Applied Linear Algebra." Prentice Hall, Englewood Cliffs, NJ.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1988). "Numerical Recipes in Fortran," 2nd Ed. Cambridge University Press, Cambridge, UK.
Smith, B. T. (1974). "Matrix Eigensystem Routines, EISPACK Guide." Springer-Verlag, New York.
Wilkinson, J. H. (1965). "The Algebraic Eigenvalue Problem." Clarendon Press, Oxford.

6 PERFECT MATRICES

6.1. SYNOPSIS

The most important property of a perfect matrix is that it obeys the spectral resolution theorem, A = Σ_{i=1}^n λ_i x_i z_i†, where λ_i and x_i, i = 1, ..., n, are the eigenvalues and eigenvectors of A and z_i, i = 1, ..., n, are the eigenvectors of the adjoint A†. We say the vectors {x_i} and {z_i} form a biorthogonal set, i.e., x_i†z_j = z_j†x_i = δ_ij. The set {z_i} are the column vectors of the matrix (X^{-1})†, where X = [x_1, ..., x_n]. Thus, the reciprocal vectors {z_i} can easily be computed from the eigenvectors {x_i}. We will show that the spectral decomposition of A leads to the result f(A) = Σ_{i=1}^n f(λ_i) x_i z_i† when the function f(t) is defined for values of t equal to the eigenvalues of A. The spectral resolution theorem for a perfect matrix is equivalent to the following theorem: if A is perfect, it can be diagonalized by a similarity transformation; i.e., there exists a matrix X such that X^{-1}AX = Λ, where Λ is a diagonal matrix. The column vectors of X are the eigenvectors of A and the diagonal elements of Λ are the eigenvalues of A. Furthermore, if the eigenvalues of A are distinct, i.e., if all the roots of the characteristic polynomial P_n(λ) = |A − λI| are of multiplicity 1, then A is a perfect matrix, but a perfect matrix need not have distinct eigenvalues. We say a matrix U is unitary if it has the property U† = U^{-1}. This property is equivalent to the property ||x|| = ||y|| for every x ∈ E_n, where y = Ux. In other words, as a linear operator in E_n, a unitary matrix rotates a vector without changing its length. We will prove in Section 6.6 that any square matrix can be semidiagonalized by a unitary transformation; i.e., for any square matrix A, there exists a unitary matrix U such that U†AU has only zero elements below the main diagonal.


A normal matrix A is defined by the property AA† = A†A. The class of normal matrices includes unitary matrices (A† = A^{-1}), orthogonal matrices (real matrices with the property A^T = A^{-1}), and self-adjoint matrices (A† = A). With the aid of the semidiagonalization theorem, we will prove that a normal matrix has the spectral decomposition A = Σ_i λ_i x_i x_i†, where λ_i and x_i, i = 1, ..., n, are the eigenvalues and the eigenvectors of A. The eigenvectors {x_i}, therefore, form an orthonormal basis set, i.e., x_i†x_j = δ_ij. The spectral decomposition theorem for normal matrices is equivalent to the following theorem: if A is normal, there exists a unitary matrix X such that X†AX = Λ; i.e., A can be diagonalized by a unitary transformation. The column vectors of X are the eigenvectors of A and the elements of the diagonal matrix Λ are its eigenvalues. If A is a unitary or orthogonal matrix, its eigenvalues are of modulus 1, i.e., |λ_i| = 1. If A is self-adjoint, its eigenvalues are real. If all of the eigenvalues of a self-adjoint matrix are positive, then it is positive definite. In the most general case of a normal matrix, the eigenvalues can be arbitrary complex numbers. We will find that the spectral resolution theorem for perfect matrices greatly simplifies the solution of the differential equations dx/dt = Ax and d²x/dt² = Ax. For self-adjoint and normal matrices, the theorem can be used to find a coordinate system in which coupled harmonic oscillators are decoupled. The theorem can also be used in the analysis of the extrema of a multivariable function. For positive-definite matrices, the theorem will be used to find the covariance and mean square deviation for a multivariate Gaussian distribution. As an example of a negative-definite matrix, the problem of one-dimensional diffusion or heat transfer will be analyzed. The spectral resolution theorem for normal matrices will also be used to prove that, in the p = 2 norm, the norm ||A|| of a normal matrix A is |λ_max|. Furthermore, the condition number κ(A) equals |λ_max|/|λ_min|, where λ_max is the eigenvalue of maximum absolute value and λ_min is the eigenvalue of minimum absolute value. We will prove that the product of a positive-definite (or negative-definite) matrix and a self-adjoint matrix is a perfect matrix and has real eigenvalues. We will also show that if A is perfect and has only distinct eigenvalues, and if A and B commute (AB = BA), then A and B have the same eigenvectors. Also, if A and B are self-adjoint and commute, then they have the same eigenvectors. If A and B commute and are self-adjoint, they can be diagonalized by the same unitary transformation. We can prove that if A is positive definite and B is self-adjoint, the bilinear form x†(A + B)x can, by a coordinate transformation, be transformed into w†(I + M)w, where M is a diagonal matrix with real elements along the diagonal. We will also show that if A is perfect and has only distinct eigenvalues, it can be represented by Sylvester's formula in terms of the eigenvalues and the adjugates of A − λ_iI. Finally, we will show that a pth-order initial value problem can be reduced to the solution of the problem dx/dt = Ax for one or more equations. For a single pth-order differential equation with constant coefficients, we will derive the eigenvectors of the companion matrix, and the necessary and sufficient conditions for A to be a perfect matrix are given.

6.2. IMPLICATIONS OF THE SPECTRAL RESOLUTION THEOREM

In Chapter 5 we saw that a perfect n × n matrix A has n linearly independent eigenvectors x_i, i = 1, ..., n. These form a basis set and imply the spectral resolution theorem

A = Σ_{i=1}^n λ_i x_i z_i†. (6.2.1)

Equation (6.2.1) is known as the spectral resolution or spectral decomposition of A. The vector set {z_i} is composed of the n eigenvectors of A† (with eigenvalues {λ_i*}), which are also the reciprocal vectors to the set {x_i}, i.e.,

x_i†z_j = z_j†x_i = δ_ij. (6.2.2)

If {x_i} and {z_i} are calculated independently as eigenvectors of A and A†, respectively, one must binormalize the vectors z_i so that x_i†z_i = 1. An implication of Eq. (6.2.1) is that, to generate the class of all perfect matrices in E_n, we would first construct every linearly independent set of n vectors {x_i}, then construct their reciprocal vectors {z_i}, and, finally, generate every linear combination Σ_i λ_i x_i z_i† for every possible set {λ_i} of complex numbers. Since

A^k x_i = λ_i^k x_i and (A†)^k z_i = (λ_i*)^k z_i, (6.2.3)

where k is any positive integer, it follows that the eigenvectors x_1, ..., x_n are also eigenvectors of A^k with eigenvalues λ_1^k, ..., λ_n^k, and thus the spectral decomposition of A^k is

A^k = Σ_{i=1}^n λ_i^k x_i z_i†. (6.2.4)

If A is nonsingular, then A^{-1} exists and λ_i ≠ 0, i = 1, ..., n. If some λ_j were 0, then |A| = Π_{i=1}^n λ_i = 0 and the matrix would be singular. Since A^{-1} exists, the equation Ax_i = λ_i x_i can be rearranged to get

A^{-k}x_i = λ_i^{-k}x_i, i = 1, ..., n, (6.2.5)

where k is any positive integer. We see that the eigenvectors x_1, ..., x_n are also eigenvectors of A^{-k} with eigenvalues λ_1^{-k}, ..., λ_n^{-k}. The spectral decomposition of A^{-k} is, therefore,

A^{-k} = Σ_{i=1}^n λ_i^{-k} x_i z_i†. (6.2.6)

If f(t) is a function that can be expressed in the form

f(t) = Σ_{k=0}^∞ c_k t^k + Σ_{k=1}^∞ d_k t^{-k} (6.2.7)

for t equal to the eigenvalues {λ_i}, then Eqs. (6.2.4) and (6.2.6) lead to the spectral decomposition

f(A) = Σ_{i=1}^n f(λ_i) x_i z_i†. (6.2.8)

Thus, as Stated in the previous chapter, if f(t) is expressible in a Laurent series or a Taylor series (J^ = 0), the matrix function /(A) has eigenvectors x, and eigenvalues /(A,,) and obeys the spectral decomposition theorem given by Eq. (6.2.8). Note that if f{t) is a Laurent series, then A must be nonsingular for Eq. (6.2.8) to hold. If f{t) is a Taylor series, then A simply has to be perfect for Eq. (6.2.8) to be valid. EXAMPLE 6.2.1. Consider the matrix 0 1 A = (6.2.9) 1 0

The eigenvalues of A are λ_1 = −1 and λ_2 = 1, and the corresponding eigenvectors are

x_1 = (1/√2) [ 1, −1 ]ᵀ  and  x_2 = (1/√2) [ 1, 1 ]ᵀ.    (6.2.10)

For this matrix, it turns out that the reciprocal basis vectors are identical to the eigenvectors (z_i = x_i, i = 1, 2) since the x_i are orthonormal. From Eq. (6.2.10) we can construct the dyads

x_1 x_1† = [  1/2  −1/2
             −1/2   1/2 ]  and  x_2 x_2† = [ 1/2  1/2
                                             1/2  1/2 ],    (6.2.11)

and using the spectral resolution theorem of Eq. (6.2.8), the matrices exp(At), sin At, and cos At can be evaluated as

exp(At) = exp(−t) x_1 x_1† + exp(t) x_2 x_2† = [ cosh t  sinh t
                                                 sinh t  cosh t ],    (6.2.12)

sin At = sin(−t) x_1 x_1† + sin(t) x_2 x_2† = [ 0      sin t
                                                sin t  0     ],    (6.2.13)

cos At = cos(−t) x_1 x_1† + cos(t) x_2 x_2† = [ cos t  0
                                                0      cos t ].    (6.2.14)

For the case of simple perfect matrices, the spectral resolution theorem can be applied using Mathematica for most functions (built-in or user defined)—see the Appendix.
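Along these lines, the following Mathematica sketch builds f(At) directly from Eq. (6.2.8) for the matrix of Example 6.2.1; the symbol names (a, vals, vecs, specF) are illustrative, not from the text.

a = {{0, 1}, {1, 0}};
{vals, vecs} = Eigensystem[a];
vecs = Normalize /@ vecs;   (* here the x_i are real and orthonormal, so z_i = x_i *)
specF[f_, t_] := Sum[f[vals[[i]] t] Outer[Times, vecs[[i]], vecs[[i]]], {i, Length[vals]}];
Simplify[specF[Exp, t]]                   (* {{Cosh[t], Sinh[t]}, {Sinh[t], Cosh[t]}}, Eq. (6.2.12) *)
Simplify[specF[Sin, t]]                   (* {{0, Sin[t]}, {Sin[t], 0}}, Eq. (6.2.13) *)
Simplify[specF[Exp, t] - MatrixExp[a t]]  (* the zero matrix: agrees with the built-in MatrixExp *)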

EXERCISE 6.2.1. Using the spectral resolution theorem, derive expressions for the matrices exp(At), sin At, and cos At for the matrix

A = [ 0  −i
      i   0 ],

where i = √−1.

The spectral resolution theorem for perfect matrices can be used to compute f(A) even if f(t) cannot be expanded in a series in t. For example, consider the square root of A, i.e., f(A) when f(t) = t^{1/2}. If we set

f{k) = 'tkWT], (6.2.15) 1 = 1 we see by direct calculation that

i j = E x;^^x;/^5,,x,z] = J: X,X,ZJ (6.2.16)

= A.

The property we want of the square root √A is that, when it is squared, the result equals A. The matrix defined by Eq. (6.2.15) does just this. Analogously, the νth power of A is Σ_i λ_i^ν x_i z_i†, where −∞ < ν < ∞ if |A| ≠ 0 and 0 ≤ ν < ∞ otherwise. Another quantity of interest is the logarithm of A, i.e., ln A. We require that the matrix ln A have the property

exp(ln A) = A;    (6.2.17)

i.e., the exponential of the logarithm of A equals A. Let us define

ln A = Σ_{i=1}^{n} (ln λ_i) x_i z_i†.    (6.2.18)

The eigenvectors x_1, ..., x_n of A are thus eigenvectors in Eq. (6.2.18) and form a basis set. According to the spectral resolution theorem, if f(t) = exp(t), then

f(ln A) = Σ_{i=1}^{n} exp(ln λ_i) x_i z_i† = Σ_{i=1}^{n} λ_i x_i z_i† = A.    (6.2.19)

Thus, the exponential of Eq. (6.2.18) yields A, the desired property of the logarithm of A. Since the logarithm of 0 is −∞, the spectral decomposition fails when λ_i = 0. Also, when an eigenvalue λ_i is negative or complex, one has to decide what meaning to give ln λ_i.

The spectral resolution theorem is especially useful for investigating the stability of nonlinear systems. Suppose

dx/dt = f(x),    (6.2.20)

where x ∈ E_n and the components f_i(x) of f(x) are nonlinear functions of x. At steady state,

f(x_s) = 0.    (6.2.21)

To find the steady states, we can use the Newton–Raphson scheme; i.e., we begin with a guess, x^(0), and iteratively solve

J(x^(k)) (x^(k+1) − x^(k)) = −f(x^(k)),    (6.2.22)

k = 0, 1, ..., as described in Chapter 3. The elements of the Jacobian J(x) are just J_ij = ∂f_i/∂x_j.

Suppose we have found a steady-state solution x_s and we would like to know if the solution is stable to small perturbations, i.e., small values of x − x_s. For x near x_s, one can linearize the right-hand side of Eq. (6.2.20) to obtain

dx/dt = f(x_s) + J(x_s)(x − x_s).    (6.2.23)

Defining y = x − x_s and recalling that f(x_s) = 0, we find

dy/dt = J(x_s) y.    (6.2.24)

Conveniently, the Jacobian J(x_s) would already have been evaluated in the Newton–Raphson solution for x_s. The formal solution to Eq. (6.2.24) is

y(t) = exp(tJ) y_0,    (6.2.25)

where y_0 = y(t = 0) is the initial perturbation. If the matrix J is perfect and if {x_i} and {z_i} are the eigenvectors of J and J†, then the spectral resolution theorem yields

y(t) = Σ_i exp(λ_i t) (z_i† y_0) x_i.    (6.2.26)

Thus, if the real parts of the eigenvalues λ_i of J are negative, then y → 0 as t → ∞ and so the steady state is stable to small perturbations. Otherwise, it is not. If stability of the steady state is all we want to know, we only need to know the eigenvalues of the Jacobian.

EXAMPLE 6.2.2. In a chemical reaction process, the concentrations obey the kinetic equations

dx/dt = 4x²y² − 2xy³ + 5y − 9    (6.2.27)

dy/dt = −3x³y − 12xy² − 3y + 9.    (6.2.28)

The Jacobian for the system is

J = [ 8xy² − 2y³     8x²y − 6xy² + 5
      −9x²y − 12y²   −3x³ − 24xy − 3 ].    (6.2.29)

A steady-state solution to the problem is

x = 0.07709,  y = 1.89351,    (6.2.30)

for which the corresponding eigenvalues of J are

λ_1 = −5.73216 + 13.9189 i,  λ_2 = −5.73216 − 13.9189 i.    (6.2.31)
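A quick Mathematica check of these numbers, using the kinetic equations as written above (the names f, jac, and ss are illustrative), might read:

f = {4 x^2 y^2 - 2 x y^3 + 5 y - 9, -3 x^3 y - 12 x y^2 - 3 y + 9};
jac = D[f, {{x, y}}];                               (* the Jacobian of Eq. (6.2.29) *)
ss = FindRoot[Thread[f == 0], {{x, 0.1}, {y, 2}}];  (* a steady state near Eq. (6.2.30) *)
Eigenvalues[jac /. ss]                              (* a complex pair with negative real parts *)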

Thus, the steady state is stable. Since λ_1 and λ_2 are different, we know that the matrix J is perfect.

EXERCISE 6.2.2. Give the spectral decomposition of exp(tJ) for the above example and plot x and y versus t for x_0 − x_s = y_0 − y_s = 1.

It is interesting to think of the spectral resolution of the matrix A as a decomposition into a weighted sum of special dyads. If we define the dyad

E_i = x_i z_i†,    (6.2.32)

then the spectral decomposition yields

A = Σ_{i=1}^{n} λ_i E_i,    (6.2.33)

a dyadic representation of A. It is easy to show that, since x_i† z_j = δ_ij, the dyads E_i have the properties

E_i² = E_i,  E_i E_j = 0 if i ≠ j,  I = Σ_{i=1}^{n} E_i.    (6.2.34)

As an operator, E_i projects a vector y onto the direction of the eigenvector x_i, i.e.,

E_i y = (z_i† y) x_i.    (6.2.35)

The E_i are said to be projection operators and the set {E_i} for a perfect matrix is said to form a complete set of projection operators because

Σ_{i=1}^{n} E_i y = y;    (6.2.36)

i.e., the sum Σ_i E_i projects any vector y ∈ E_n onto itself. A given E_i projects its eigenvector x_i onto itself, i.e., E_i x_i = x_i. Furthermore, E_i x_j = 0 if i ≠ j.

ILLUSTRATION 6.2.1 (Process Control of a Phase Separator). A process stream of benzene is partially condensed and flashed in a phase separator at P = 14 atm and T = 200°C. The flow rates to the separator for the liquid and vapor streams are W_l = 300 lb_m/min and W_v = 160 lb_m/min, respectively. A common strategy for controlling the process environment is illustrated in Fig. 6.2.1. The liquid level in the separator is maintained at a constant set point value of z = z* by a liquid level controller (LLC) and a control valve on the exiting liquid stream. Similarly, the pressure in the vessel is maintained at P = P* by a pressure controller (PC) and a control valve on the exiting vapor stream.

FIGURE 6.2.1 Phase separator with a liquid level controller (LLC) and a pressure controller (PC) on the exiting liquid and vapor streams.

We can derive expressions for the time derivative of our control variables (z and P) by performing mass balances on both the liquid and the vapor streams. The liquid stream mass balance yields

ρ_l S dz/dt = W_l − ρ_l Q_l,    (6.2.37)

where S is the internal cross-sectional area of the separator vessel, ρ_l is the liquid benzene density, and Q_l is the volumetric flow rate of the exiting liquid stream. If we assume ideal gas behavior, we can write the vapor mass balance as

d/dt [ MPS(L − z)/(RT) ] = W_v − (MP/RT) Q_v,    (6.2.38)

where M is the molecular weight of benzene, R is the universal gas constant, and Q_v is the volumetric flow rate of the exiting vapor stream measured at P and T in the vessel. S(L − z) is the total vapor phase volume. The volumetric flow rate of the liquid can be estimated using Bernoulli's equation as

Q_l = πD_1² √[ (ρ_l z + (P − P_1)) / (8α) ],    (6.2.39)

where P_1 is the downstream liquid pressure and D_1 is the inner pipe diameter for the liquid stream. The parameter α represents the collective energy-dispersive terms due to friction within the pipe, bends in the pipe, entrance and exit effects, and, most important, the dispersive factor corresponding to the control valve. The value of α is adjusted by opening and closing the control valve and hence regulating the liquid flow rate out of the vessel.

A similar expression can be derived for the vapor flow rate by assuming adiabatic expansion through the vapor stream control valve. For an ideal gas, we get

, Mnw^)^!!!^^. (6.2.40,

where P_2 is the downstream vapor pressure, D_2 is the inner pipe diameter for the vapor stream, γ is the ratio of heat capacities, and β again represents the collective energy-dispersive terms for the vapor stream. Substituting these expressions into the mass balances gives the following differential equations for the control variables:

dz/dt = W_l/(Sρ_l) − (πD_1²/S) √[ (ρ_l z + (P − P_1)) / (8α) ]    (6.2.41)

and

dP PW, PixD\ p,z + {P-P,) dt Sp^(L-z) S(L-z)y Sa %l'- (6.2.42)

"^ SM{L - z) ~ SiL-z)i SM{y - \)P "

(i) For the process variables, P* = 14 atm, T = 200°C, D_1 = 3 in., D_2 = 4 in., P_1 = 5 atm, P_2 = 5 atm, M = 78.11, ρ_l = 54 lb_m/ft³, S = 113.1 in.², z* = 5 ft, L = 7 ft, and γ = 1.21, solve for the equilibrium values α* and β*. By constructing the appropriate Jacobian matrix, determine whether this equilibrium is stable or not.

(ii) To account for fluctuations in W_l and W_v, as well as both downstream pressures, the parameters α and β are to be controlled with the following proportional control schemes:

dα/dt = −k_1 (z − z*)    (6.2.43)

dβ/dt = −k_2 (P − P*).    (6.2.44)

Determine whether this system is stable for the control parameter values k_1 = 1.0 sec⁻¹ ft⁻¹ and k_2 = 0.5 sec⁻¹ psi⁻¹. Will the control scheme produce oscillatory behavior?

6.3. DIAGONALIZATION BY A SIMILARITY TRANSFORMATION

We have previously defined a perfect nxn matrix as one that has n linearly indepen- dent eigenvectors. An equivalent definition of a perfect matrix is that A is perfect if it can be diagonalized by a similarity transformation. To see the equivalence, consider the matrix

X = [x_1, x_2, ..., x_n],    (6.3.1)

where the column vectors are the eigenvectors of A. Taking the product

AX = [Ax_1, Ax_2, ..., Ax_n] = [λ_1 x_1, λ_2 x_2, ..., λ_n x_n]    (6.3.2)

and noting that if A is the diagonal matrix

Λ = [ λ_1  0    ⋯  0
      0    λ_2  ⋯  0
      ⋮          ⋱
      0    0    ⋯  λ_n ],    (6.3.3)

then

XΛ = [x_1, ..., x_n] Λ = [λ_1 x_1, ..., λ_n x_n],    (6.3.4)

and so we find

AX = XΛ  or  X⁻¹AX = Λ.    (6.3.5)

Thus,

THEOREM. If A is perfect, it can be diagonalized by a similarity transformation. The column vectors of X are the eigenvectors of A and the elements of the diagonal matrix are the eigenvalues of A.

When the matrix function f(A) can be spectrally decomposed, i.e., when f(λ_i) exists for i = 1, ..., n, it can also be diagonalized by a similarity transformation. This follows from the fact that f(A)x_i = f(λ_i)x_i, i = 1, ..., n, and so the argument leading to Eq. (6.3.5) yields the result

X⁻¹ f(A) X = f(Λ) = [ f(λ_1)  0       ⋯  0
                      0       f(λ_2)  ⋯  0
                      ⋮                ⋱
                      0       0       ⋯  f(λ_n) ].    (6.3.6)

Rearranging Eq. (6.3.6) in the form

f(A) = X f(Λ) X⁻¹,    (6.3.7)

we see that if A is perfect, then the solution to

dy/dt = Ay,  y_0 = y(t = 0),    (6.3.8)

is

y = X exp(tΛ) X⁻¹ y_0 = X [ exp(tλ_1)  ⋯  0
                            ⋮          ⋱
                            0          ⋯  exp(tλ_n) ] X⁻¹ y_0,    (6.3.9)

again illustrating that the asymptotic behavior of y can be understood by knowing only the eigenvalues of A.

ILLUSTRATION 6.3.1 (Heat Exchanger Profile). Countercurrent heat exchangers are used extensively to cool hot process fluids. Consider a process stream at temperature T_1 entering a water-cooled heat exchanger, illustrated in Fig. 6.3.1. It is desired to cool the stream down to the temperature shown exiting the exchanger on the right. Cooling water is supplied to the exchanger jacket and exits at temperature T_2. The process stream has a mass flow rate of W_p (kg/s) and water is supplied at a rate of W_w. We would like to determine the temperature profiles of each stream within the heat exchanger. The overall length of the fluid "contact zone" is given by L and we represent the position along this contact zone by x. We can then write enthalpy balances at any arbitrary position x along the exchanger for each stream. The process stream gives

W_p C_p [T_1 − T_ref] − W_p C_p [t_p(x) − T_ref] = ∫_0^x U A_t [t_p(x′) − t_w(x′)] dx′,    (6.3.10)

where t_p(x) and t_w(x) represent the process and water stream temperatures, T_ref is the reference temperature for calculating enthalpies, C_p and C_w are the respective heat capacities, U is the local overall heat transfer coefficient, and A_t is the area per unit length of heat transfer surface along the exchanger. The first two terms in Eq. (6.3.10) represent the net rate of enthalpy (we are neglecting pressure drops) entering a control volume encompassing the range x′ = 0 to x′ = x within the exchanger. The last term

FIGURE 6.3.1 Countercurrent water-cooled heat exchanger.

represents the total heat transferred to the water stream from the process stream. A similar equation can be derived for the water stream:

W_w C_w [T_2 − T_ref] − W_w C_w [t_w(x) − T_ref] = ∫_0^x U A_t [t_p(x′) − t_w(x′)] dx′.    (6.3.11)

Differentiating both balance equations with respect to x, we arrive at the following coupled set of differential equations:

dt_p/dx = −(U A_t / (W_p C_p)) (t_p − t_w),    (6.3.12)

dt_w/dx = −(U A_t / (W_w C_w)) (t_p − t_w).    (6.3.13)

We can express the above system of equations in matrix form using the following definitions:

t = [ t_p
      t_w ]    (6.3.14)

and

A = [ −α_p  α_p
      −α_w  α_w ],    (6.3.15)

with

α_p = U A_t / (W_p C_p)  and  α_w = U A_t / (W_w C_w).    (6.3.16)

Equations (6.3.12) and (6.3.13) then become

dt/dx = At,    (6.3.17)

with boundary conditions

t(x = 0) = [ T_1
             T_2 ].    (6.3.18)

(i) We now make use of the theorems in this section by defining the matrix S in the similarity transformation that diagonalizes A. Namely,

S⁻¹AS = Λ,    (6.3.19)

where Λ is the diagonal matrix containing the eigenvalues of A. We can then make the linear transformation

s = S⁻¹ t.    (6.3.20)

Multiplying Eq. (6.3.17) by S⁻¹ and making use of the identity I = SS⁻¹, show that Eq. (6.3.17) can be rewritten as

ds/dx = Λs.    (6.3.21)

(ii) For the general case where α_p ≠ α_w, find the eigenvalues of A and the matrices S and S⁻¹.

(iii) Solve the decoupled system in Eq. (6.3.21) using Eq. (6.3.20) to transform the boundary conditions into the new representation. How are the temperature profiles different when α_p > α_w as opposed to when α_p < α_w?

C_i ⇌ C_j,  i ≠ j,  i, j = 1, 2, ..., n,    (6.3.22)

where k_ji is the rate constant for conversion of component i to j and k_ij is the rate constant for conversion of component j to i. If ω_i represents the mole fraction of component i, the rate equations for the reaction system are

dω_i/dt = Σ_{j≠i} k_ij ω_j − Σ_{j≠i} k_ji ω_i    (6.3.23)

or

dω/dt = Kω,    (6.3.24)

where

K = [ −Σ_{j≠1} k_j1   k_12            ⋯  k_1n
      k_21            −Σ_{j≠2} k_j2   ⋯  k_2n
      ⋮                               ⋱
      k_n1            k_n2            ⋯  −Σ_{j≠n} k_jn ].    (6.3.25)

Note that Σ_{i=1}^{n} ω_i = 1.

In general, K is not symmetric, but usually it is perfect. In this case, the concentrations of the chemical components as a function of time can be calculated from the formula

ω(t) = exp(Kt) ω_0 = Σ_{i=1}^{n} exp(λ_i t) x_i z_i† ω_0,    (6.3.26)

where λ_i and x_i, i = 1, ..., n, are the eigenvalues and eigenvectors of K and z_i are the corresponding reciprocal vectors.

A system that has been studied carefully is the set of isomerization reactions of 1-butene, trans-2-butene, and cis-2-butene. The three reactions are

1-butene ⇌ cis-2-butene

1-butene ⇌ trans-2-butene

cis-2-butene ⇌ trans-2-butene

The chemical formulas for the isomers 1-butene and 2-butene are

H₂C=CH—CH₂—CH₃

and

H₃C—CH=CH—CH₃,

where C and H denote carbon and hydrogen and "—" and "=" denote single and double chemical bonds between atoms. The cis and trans forms of 2-butene are distinguished by the spatial angles of the single bonds (not depicted here). By comparing experiment and predictions, one can determine the rate constants k_ij. J. Wei and C. D. Prater report (Adv. Catal. 13 (1962), 256) the following values:

K = [ −14.068    4.623    1.000
       10.344  −10.239    3.371
        3.724    5.616   −4.373 ].    (6.3.27)

Using these values, they give a comparison of calculated reaction paths with observed compositions for butene isomerization, shown in Fig. 6.3.2. Note that, plotted in this form, the units of t do not matter. K has been scaled so that k_13 = 1. This amounts to choosing a particular set of units for the time t. Thus, one can calculate ω_i versus t, but, without the scale factor, the absolute values of t are unknown.
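As a sketch of how such paths can be generated in Mathematica from Eq. (6.3.26) (the names kmat, w0, and w are illustrative):

kmat = {{-14.068, 4.623, 1.000},
        {10.344, -10.239, 3.371},
        {3.724, 5.616, -4.373}};
w0 = {1., 0., 0.};                        (* start from pure 1-butene *)
w[t_?NumericQ] := MatrixExp[kmat t] . w0  (* omega(t) = exp(Kt) omega_0, Eq. (6.3.26) *)
w[0.5]                                    (* composition at t = 0.5 (arbitrary time units) *)
Eigenvalues[kmat]                         (* one eigenvalue is ~0; its eigenvector, scaled to
                                             sum to 1, gives the equilibrium composition *)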

(i) Find the eigenvalues, eigenvectors, and reciprocal vectors for the matrix K given for the butene isomerization reaction.

(ii) Calculate as a function of time the concentrations of 1-butene and cis- and trans-2-butene for several initial concentrations.

(iii) Explain how to calculate the reaction paths shown in Fig. 6.3.2.

FIGURE 6.3.2 Reaction paths in the composition triangle with vertices 1-butene, cis-2-butene, and trans-2-butene. Reproduced from J. Wei and C. D. Prater (1962), "Advances in Catalysis and Related Subjects," Vol. 13, Academic Press.

(iv) Calculate the six reaction paths initially at the compositions (ω_1, ω_2, ω_3) = (0.25, 0.75, 0), (0.75, 0.25, 0), (0, 0.25, 0.75), (0, 0.75, 0.25), (0.25, 0, 0.75), and (0.75, 0, 0.25).

(v) Plot these new paths in a figure like that shown in Fig. 6.3.2.

6.4. MATRICES WITH DISTINCT EIGENVALUES

In Sections 5.3, 6.2, and 6.3, we have explored the implications of A being a perfect matrix. When can we know that A will be perfect? In subsequent sections we will show that normal, unitary, orthogonal, skew-symmetric, self-adjoint, and positive- definite matrices are always perfect. However, from the eigenvalue analysis presented in Section 5.4 we can already prove the following:

THEOREM. A matrix A is perfect if all of its eigenvalues are distinct, i.e., if each root of P_n(λ) = |A − λI| = 0 is of multiplicity 1.

Recall from Eq. (5.4.11) that

dP_n/dλ = (−1) tr_{n−1}(A − λI).    (6.4.1)

But the nth-degree characteristic polynomial can be expressed in the factored form

P_n(λ) = Π_{i=1}^{n} (λ_i − λ),    (6.4.2)

and so

dP_n/dλ = (−1) Σ_{k=1}^{n} Π_{i≠k} (λ_i − λ).    (6.4.3)

If each eigenvalue is of multiplicity 1, it follows from Eqs. (6.4.3) and (6.4.1) that

dP_n/dλ |_{λ=λ_k} = (−1) Π_{i≠k} (λ_i − λ_k) ≠ 0,    (6.4.4)

and so

tr_{n−1}(A − λ_k I) ≠ 0,  k = 1, ..., n.    (6.4.5)

According to Eq. (6.4.5), since tr_{n−1}(A − λ_k I) is the sum of the principal minors of A − λ_k I, at least one principal minor of A − λ_k I is not 0, and so the rank of A − λ_k I is n − 1. From the solvability theorem, we know that the number of solutions of the homogeneous problem

(A − λ_i I) x_i = 0    (6.4.6)

equals the difference between n and the rank of A − λ_i I, i.e., n − (n − 1) = 1. Thus, A is perfect if all of its eigenvalues are distinct.

It should be noted that having distinct eigenvalues is a sufficient but not a necessary condition for a matrix to be perfect. For instance, since Ix = x for any vector x, the eigenvalues of the unit matrix are all equal to 1 and, therefore, any basis set {x_1, ..., x_n} in E_n is a set of eigenvectors for I.
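The distinction can be checked numerically. A minimal Mathematica sketch (the name perfectQ is illustrative) counts linearly independent eigenvectors, since Eigenvectors pads a defective matrix with zero vectors:

perfectQ[m_] := MatrixRank[Eigenvectors[m]] == Length[m];
perfectQ[{{1, 1}, {0, 2}}]     (* True: distinct eigenvalues 1 and 2 *)
perfectQ[IdentityMatrix[3]]    (* True: a repeated eigenvalue, yet still perfect *)
perfectQ[{{1, 1}, {0, 1}}]     (* False: a defective (non-perfect) matrix *)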

6.5. UNITARY AND ORTHOGONAL MATRICES

All perfect matrices can be diagonalized by a similarity transformation. However, there is a large class of matrices, namely, normal matrices, that can be diagonalized by a unitary transformation. Before discussing this class, let us describe unitary and orthogonal matrices.

In a complex linear vector space E_n, the matrix A is a unitary matrix if it is a rotation operator in E_n, i.e., if the length of y = Ax is the same as the length of x. The equation representing this property is

‖x‖² = ‖Ax‖²  for all x ∈ E_n.    (6.5.1)

As an example of a rotation, consider the following two Cartesian coordinate frames related by a rotation of angle θ in Fig. 6.5.1. If the coordinates of r in the frame (i, j) are x and y and in the frame (i′, j′) they are x′ and y′, it follows that

x = x′ cos θ − y′ sin θ
y = x′ sin θ + y′ cos θ.    (6.5.2)

In matrix notation, this reads

x = Ax′,    (6.5.3)

FIGURE 6.5.1 Two Cartesian coordinate frames related by a rotation through the angle θ.

where

x = [ x
      y ],  x′ = [ x′
                   y′ ],  A = [ cos θ  −sin θ
                                sin θ   cos θ ].    (6.5.4)

It is easy to show that

x² + y² = x′² + y′²  or  ‖x‖ = ‖x′‖,    (6.5.5)

which says that the length of r is unchanged—as required for the rotational transformation. We call the transformation matrix of a rotation in a Euclidean vector space an orthogonal matrix instead of a unitary matrix. This is because the Euclidean vector space is real. In what follows we shall see that orthogonal matrices are a subclass of unitary matrices, the former being restricted to rotations in real linear vector spaces and the latter including complex linear vector spaces. To investigate the properties of unitary and orthogonal matrices, the following two theorems are useful. The first is:

THEOREM. If

x†Bx = 0  for all x ∈ E_n,    (6.5.6)

then B = 0, where 0 is the zero matrix, all of whose elements are 0.

To prove this theorem, we write x†Bx = 0 as

Σ_{i,j} b_ij x_i* x_j = 0.    (6.5.7)

Choosing x_k ≠ 0 and x_i = 0, i ≠ k, Eq. (6.5.7) implies

b_kk x_k* x_k = 0  or  b_kk = 0.    (6.5.8)

Since k is arbitrary, Eq. (6.5.8) proves that b_kk = 0, k = 1, ..., n. Next, we choose x_k ≠ 0, x_l ≠ 0, and x_i = 0, i ≠ k or l. Then Eq. (6.5.7) implies

b_kl x_k* x_l + b_lk x_l* x_k = 0    (6.5.9)

or

(b_kl + b_lk) Re(x_k* x_l) + i(b_kl − b_lk) Im(x_k* x_l) = 0.    (6.5.10)

Since the real and imaginary parts of x_k* x_l can be varied independently, Eq. (6.5.10) implies

b_kl + b_lk = 0
b_kl − b_lk = 0    (6.5.11)

or b_kl = b_lk = 0 for arbitrary k and l. This completes the proof that Eq. (6.5.6) holds if and only if B = 0.

The next theorem is:

THEOREM. If

B = Bᵀ  and  xᵀBx = 0  for all real x ∈ E_n,    (6.5.12)

then B = 0.

The expression xᵀBx = 0 can be written as

Σ_{i,j} b_ij x_i x_j = 0.    (6.5.13)

Again, we choose x_k ≠ 0, x_i = 0, i ≠ k, which leads to the conclusion b_kk = 0, k = 1, ..., n. And, if we choose x_k ≠ 0, x_l ≠ 0, x_i = 0, i ≠ k or l, we find

(b_kl + b_lk) x_k x_l = 0.    (6.5.14)

Since B = Bᵀ by hypothesis (b_kl = b_lk), it follows that b_kl = 0 for arbitrary k and l, thus proving the theorem.

Note that if B ≠ Bᵀ, the condition xᵀBx = 0 for all real x ∈ E_n does not prove B = 0. For example, if

B = [ 0  −1
      1   0 ],    (6.5.15)

then

xᵀBx = −x_1 x_2 + x_2 x_1 = 0  for all x ∈ E_2.    (6.5.16)

For this example, B has the property B = −Bᵀ, for which

(xᵀBx)ᵀ = −xᵀBx.    (6.5.17)

But xᵀBx is a scalar (1 × 1 matrix) and so it must be equal to its transpose. Thus, Eq. (6.5.17) becomes

xᵀBx = −xᵀBx    (6.5.18)

or xᵀBx = 0 for any x ∈ E_n if B = −Bᵀ.

If B = Bᵀ, we say that B is symmetric, whereas if B = −Bᵀ, we say that B is asymmetric. Any arbitrary matrix can be decomposed into its symmetric and asymmetric parts:

B = ½(B + Bᵀ) + ½(B − Bᵀ).    (6.5.19)

Thus, in the quantity xᵀBx, only the symmetric part of B contributes, i.e., xᵀB_asym x = 0 for all x ∈ E_n, and so

xᵀBx = xᵀB_sym x = ½ xᵀ(B + Bᵀ)x.    (6.5.20)

This is, of course, not the case for the inner product x†Bx. Let us return now to Eq. (6.5.1), the defining equation for a unitary matrix, and rewrite it as

x†Ix = x†A†Ax

or

x†(A†A − I)x = 0  for all x ∈ E_n.    (6.5.21)

According to the theorem in Eq. (6.5.6), Eq. (6.5.21) establishes that a unitary matrix obeys the expression

A†A = I    (6.5.22)

or, equivalently,

A⁻¹ = A†;    (6.5.23)

the inverse of a unitary matrix is its adjoint. Also, since |A†A| = |A|*|A| = 1, we see that the determinant of a unitary matrix is of unit magnitude; i.e., it is of modulus 1.

Equation (6.5.22) proves that the column vectors of a unitary matrix form an orthonormal set. To see this, let A = [a_1, ..., a_n] so that

A†A = [a_i† a_j].    (6.5.24)

However, A†A = I = [δ_ij] and so it follows from Eq. (6.5.24) that

a_i† a_j = δ_ij,  i, j = 1, ..., n,    (6.5.25)

which is the defining property of an orthonormal basis set in E_n. In fact, Eqs. (6.5.23) and (6.5.25) are equivalent expressions for the defining property of a unitary matrix.

An orthogonal matrix is defined as a real matrix that rotates real vectors in E_n; i.e., if A is an orthogonal matrix, then

‖x‖² = ‖Ax‖²  or  xᵀ(AᵀA − I)x = 0  for all real x ∈ E_n.    (6.5.26)

Since the matrix AᵀA − I is symmetric, the theorem in Eq. (6.5.12) is valid, and so an orthogonal matrix has the property

AᵀA = I  or  A⁻¹ = Aᵀ.    (6.5.27)

In this case, the determinant of A is equal to ±1. Furthermore, the column vectors of A obey the conditions

a_iᵀ a_j = δ_ij,  i, j = 1, ..., n.    (6.5.28)

Again, the column vectors of an orthogonal matrix form an orthonormal set of real vectors, which form a basis set in E_n. For the example given in Eq. (6.5.4),

a_1 = [ cos θ
        sin θ ]  and  a_2 = [ −sin θ
                              cos θ ].    (6.5.29)

It is easy to show that these are orthonormal vectors. Examples of 2 x 2 unitary matrices are

(1/√2) [ 1  i
         i  1 ]    (6.5.30)

and

(1/√2) [ exp(iθ)    exp(iθ)
         exp(−iθ)  −exp(−iθ) ].    (6.5.31)

The eigenvalues of the matrices with the column vectors in Eqs. (6.5.29), (6.5.30), and (6.5.31) are, respectively,

λ_1 = cos θ − i sin θ,  λ_2 = cos θ + i sin θ,    (6.5.32)

λ_1 = (1/√2)(1 − i),  λ_2 = (1/√2)(1 + i),    (6.5.33)

and

λ_1 = [exp(−iθ)/(2√2)] [−1 + exp(2iθ) − (1 + 6 exp(2iθ) + exp(4iθ))^{1/2}],
λ_2 = [exp(−iθ)/(2√2)] [−1 + exp(2iθ) + (1 + 6 exp(2iθ) + exp(4iθ))^{1/2}].    (6.5.34)

All of these eigenvalues share the common property that their magnitude |λ_i| is equal to 1. In a later section we will show that all of the eigenvalues of any unitary or orthogonal matrix have an absolute value of 1.
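A two-line Mathematica check of these claims for the rotation matrix of Eq. (6.5.4) (the symbols th and rot are illustrative):

rot = {{Cos[th], -Sin[th]}, {Sin[th], Cos[th]}};
Simplify[Transpose[rot] . rot]   (* the identity matrix: rot is orthogonal *)
Eigenvalues[rot]                 (* Cos[th] -/+ I Sin[th] = E^(-/+ I th), each of modulus 1 *)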

6.6. SEMIDIAGONALIZATION THEOREM

Suppose U is a unitary matrix and A is a square matrix of the same dimension. We say B is related to A by a unitary transformation if

B = U†AU.    (6.6.1)

Since U† = U⁻¹ for a unitary matrix, a unitary transformation is simply a similarity transformation with a unitary matrix. B and A have the same eigenvalues since the eigenvalues of A are unchanged under a similarity transformation. The unitary transformation plays an important role in the powerful semidiagonalization theorem:

SEMIDIAGONALIZATION THEOREM. Given any square matrix A, there exists a unitary matrix U such that

U†AU = [ λ_1  b_12  ⋯  b_1n
         0    λ_2   ⋯  b_2n
         ⋮          ⋱
         0    0     ⋯  λ_n ],    (6.6.2)

where the quantities λ_i are the eigenvalues of A.

Thus, the theorem guarantees the existence of a unitary matrix that transforms A in a unitary transformation to an upper triangular matrix with the eigenvalues of A on the main diagonal. To prove the semidiagonalization theorem, consider first a 2 × 2 matrix. Let x_1 be a normalized (‖x_1‖ = 1) eigenvector of A and choose x_2 to be a normalized vector orthogonal to x_1, i.e., x_1†x_2 = 0 and ‖x_2‖ = 1. We can choose any vector that is linearly independent of x_1 and use the Gram–Schmidt orthogonalization procedure to find a vector orthogonal to x_1. Define the unitary matrix U = [x_1, x_2]. Then

AU = A[x_1, x_2] = [λ_1 x_1, Ax_2]    (6.6.3)

and

U†AU = U†[λ_1 x_1, Ax_2] = [ λ_1  x_1†Ax_2
                             0    x_2†Ax_2 ].    (6.6.4)

Since the eigenvalues of an upper triangular matrix are equal to its diagonal elements, x_2†Ax_2 is an eigenvalue of U†AU, and, therefore, it must be an eigenvalue of A (since a unitary transformation does not affect the eigenvalues of A). This proves the correctness of the semidiagonalization theorem for a 2 × 2 matrix.

The rest of the proof will be accomplished by induction. Assume that the theorem is true for an n × n matrix and consider the (n + 1) × (n + 1) matrix A. Let x_1 be the normalized eigenvector of A corresponding to the eigenvalue λ_1. Choose the vectors x_2, ..., x_{n+1} such that

x_i†x_j = δ_ij,  i, j = 1, ..., n + 1,    (6.6.5)

and let

U = [x_1, x_2, ..., x_{n+1}].    (6.6.6)

The unitary transform of A is

U†AU = [ λ_1  b_12  ⋯  b_{1,n+1}
         0
         ⋮         B
         0                       ],    (6.6.7)

where b_1j = x_1†Ax_j for j = 2, ..., n + 1 and B is an n × n matrix with elements x_i†Ax_j for i, j = 2, ..., n + 1. The eigenvalues of B are the eigenvalues λ_2, ..., λ_{n+1} of A. This follows because

Π_{i=1}^{n+1} (λ_i − λ) = |U†AU − λI^{(n+1)}| = (λ_1 − λ) |B − λI^{(n)}|,    (6.6.8)

where I^{(n+1)} and I^{(n)} are the (n + 1) × (n + 1) and n × n unit matrices, respectively. By hypothesis, there exists an n × n unitary matrix V such that

V†BV = [ λ_2  b_23  ⋯  b_{2,n+1}
         0    λ_3   ⋯  b_{3,n+1}
         ⋮          ⋱
         0    0     ⋯  λ_{n+1} ].    (6.6.9)

We can, therefore, define the (n + 1) × (n + 1) unitary matrix

T = [ 1  0ᵀ
      0  V ],    (6.6.10)

from which Eq. (6.6.7) becomes

T†U†AUT = [ λ_1  ⋯
            0    V†BV ] = [ λ_1  b_12  ⋯  b_{1,n+1}
                            0    λ_2   ⋯
                            ⋮          ⋱
                            0    0     ⋯  λ_{n+1} ].    (6.6.11)

T and U are unitary, and so if Q = UT, then Q⁻¹ = T⁻¹U⁻¹ = T†U† = Q†. Thus, Q is a unitary matrix and Eq. (6.6.11) proves that if the semidiagonalization theorem is true for n × n matrices, it is true for (n + 1) × (n + 1) matrices. Since we proved the theorem for the 2 × 2 case, this completes the proof for the general case.
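The 2 × 2 step of this proof is easy to reproduce numerically; the following Mathematica sketch (with illustrative names a, x1, x2, u) builds U from one normalized eigenvector and one orthogonal unit vector, and the unitary transform comes out upper triangular. For general n, the built-in SchurDecomposition carries out the same kind of unitary triangularization.

a = {{1., 2.}, {3., 4.}};
x1 = Normalize[First[Eigenvectors[a]]];    (* a normalized eigenvector of a *)
x2 = Last[Orthogonalize[{x1, {1., 0.}}]];  (* a unit vector orthogonal to x1 *)
u = Transpose[{x1, x2}];                   (* the unitary matrix U = [x1, x2] *)
Chop[ConjugateTranspose[u] . a . u]        (* upper triangular; eigenvalues of a on the diagonal *)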

6.7. SELF-ADJOINT MATRICES

Self-adjoint matrices, defined by the property

A = A†  or  a_ij = a_ji*,    (6.7.1)

are a very important class of matrices that appear in numerous physical problems. According to the semidiagonalization theorem, there exists a unitary matrix such that

X†AX = [ λ_1  b_12  ⋯  b_1n
         0    λ_2   ⋯  b_2n
         ⋮          ⋱
         0    0     ⋯  λ_n ].    (6.7.2)

But taking the adjoint of Eq. (6.7.2) and using the property (X†AX)† = X†A†X = X†AX, we find that, for a self-adjoint matrix,

X†AX = [ λ_1*    0      ⋯  0
         b_12*   λ_2*   ⋯  0
         ⋮              ⋱
         b_1n*   b_2n*  ⋯  λ_n* ].    (6.7.3)

Comparison of Eqs. (6.7.2) and (6.7.3) leads to the conclusions that b_ij = 0, λ_i = λ_i*, and

X†AX = Λ,    (6.7.4)

where Λ = [λ_i δ_ij]. Thus, the eigenvalues of a self-adjoint matrix are real, and the matrix can be diagonalized by a unitary transformation. Since X† = X⁻¹, Eq. (6.7.4) can be rearranged to give

AX = XΛ  or  [Ax_1, ..., Ax_n] = [λ_1 x_1, ..., λ_n x_n],    (6.7.5)

which yields

Ax_i = λ_i x_i,  i = 1, ..., n.    (6.7.6)

Since X is unitary, the x_i form an orthonormal set (x_i†x_j = δ_ij). Stated in terms of the eigenproblem, we have just proved the following:

THEOREM. A self-adjoint matrix is perfect, its eigenvalues are real, and its eigenvectors can always be chosen to form an orthonormal set.
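A quick numerical illustration of this theorem in Mathematica (h is an arbitrary Hermitian example, not from the text):

h = {{2., 1. - I, 0.}, {1. + I, 3., 2. I}, {0., -2. I, 1.}};
ConjugateTranspose[h] == h          (* True: h is self-adjoint *)
Chop[Eigenvalues[h]]                (* all real *)
x = Orthogonalize[Eigenvectors[h]]; (* orthonormal eigenvectors *)
Chop[x . ConjugateTranspose[x]]     (* the identity matrix *)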

In Chapter 3 we defined a positive-definite matrix by the following property:

DEFINITION. We say A is positive definite if, for any nonzero vector x,

x†Ax = p > 0,    (6.7.7)

where p is a real scalar quantity.

The requirement that p be real valued can be stated as p = p*, and so by taking the adjoint of Eq. (6.7.7), we find

x†Ax = x†A†x,    (6.7.8)

from which we immediately see that A = A†. Thus, a positive-definite matrix must be self-adjoint. We can now use the above theorem and write x = Σ_{i=1}^{n} α_i x_i for any arbitrary n-dimensional vector x. We can choose the vectors {x_i} to be the eigenvectors of A since we have proven they form a complete basis set (or at least can be made to form one). Substitution into Eq. (6.7.7) yields

p = Σ_{i=1}^{n} Σ_{j=1}^{n} α_i* α_j x_i†Ax_j = Σ_{i=1}^{n} λ_i |α_i|²,    (6.7.9)

where we have used the orthonormality conditions of {x_i} to obtain the last line. Since x is arbitrary, the coefficients α_i are arbitrary as well, and so in order to ensure that p > 0 for any nonzero vector x, all eigenvalues λ_i must be positive and nonzero. We have just proven the following theorem:

THEOREM. A positive-definite matrix is a self-adjoint matrix whose eigenvalues are positive and greater than 0 (λ_i > 0).

A similar proof can be constructed to show that a negative-definite matrix is self-adjoint with all eigenvalues λ_i < 0. Furthermore, we say a matrix is positive (negative) semidefinite if it is self-adjoint and all of its eigenvalues are greater (less) than or equal to 0 (i.e., some of its eigenvalues are 0). The point here is that each of the above classes of matrices is simply a subclass of self-adjoint matrices.

It is important to realize that, although the eigenvectors of a self-adjoint matrix always form a basis set, they might not be orthonormal. Let us explore this point further. Suppose λ_i ≠ λ_j. Then

x_i†Ax_j = λ_j x_i†x_j    (6.7.10)

and

x_j†Ax_i = λ_i x_j†x_i.    (6.7.11)

Taking the adjoint of Eq. (6.7.11) and using the properties A† = A and λ_i* = λ_i, we obtain

x_i†Ax_j = λ_i x_i†x_j,    (6.7.12)

which, when subtracted from Eq. (6.7.10), yields

(λ_j − λ_i) x_i†x_j = 0  or  x_i†x_j = 0.    (6.7.13)

Thus, if the eigenvalues λ_i and λ_j are not equal, then x_i and x_j are orthogonal. However, if λ_i = λ_j, then we cannot conclude from Eq. (6.7.13) that x_i and x_j are orthogonal. As an example, consider the matrix

A = [  7/2  −1/2  0
      −1/2   7/2  0
       0     0    4 ].    (6.7.14)

The eigenvalues are λ_1 = 3, λ_2 = 4, and λ_3 = 4, with the corresponding eigenvectors

x_1 = [ 1
        1
        0 ],  x_2 = [ −1
                       1
                       1 ],  x_3 = [ −1
                                      1
                                     −1 ].    (6.7.15)

A number of things are noteworthy here. The vector x_1 is orthogonal to x_2 and x_3, but x_2 and x_3 are not orthogonal and none of the vectors is normalized. Thus, the matrix [x_1, x_2, x_3] is not a unitary matrix. It will, however, diagonalize A in a similarity transformation.

We can transform x_2 and x_3 into orthogonal vectors using the Gram–Schmidt orthogonalization procedure. The resulting orthogonal vectors are linear combinations of x_2 and x_3, and, since A(c_2 x_2 + c_3 x_3) = 4(c_2 x_2 + c_3 x_3), these linear combinations are also eigenvectors of A. Also, the eigenvectors can be normalized by multiplying by scalar factors and so they remain eigenvectors of A. Orthogonalizing and normalizing x_1, x_2, and x_3 yields the orthonormal eigenvectors

x_1 = (1/√2) [ 1
               1
               0 ],  x_2 = [ 0
                             0
                             1 ],  x_3 = (1/√2) [ −1
                                                   1
                                                   0 ].    (6.7.16)

With these eigenvectors, X = [x_1, x_2, x_3] is a unitary matrix and will diagonalize A in a unitary transformation.

The above example illustrates the facts that the eigenvectors of a self-adjoint matrix may be provided in an unnormalized form and that if the eigenvalue λ_i is of multiplicity p_λ greater than 1, then the eigenvectors corresponding to λ_i may not be orthogonal. However, since Σ_i c_i x_i is an eigenvector of A, and since the p_λ eigenvectors have the same eigenvalue, the Gram–Schmidt procedure can be used to obtain p_λ orthogonal eigenvectors. So, whereas we frequently say that the eigenvectors of a self-adjoint matrix form an orthonormal basis set, what we really mean is that the eigenvectors of a self-adjoint matrix can be arranged into an orthonormal basis set. Since a self-adjoint matrix A is perfect and its eigenvectors can be chosen to be an orthonormal basis, the spectral decomposition of A can be expressed as

A = Σ_{i=1}^{n} λ_i x_i x_i†.    (6.7.17)
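In Mathematica the orthonormalization step above is one call to Orthogonalize; a sketch for the matrix of Eq. (6.7.14) (a, vals, vecs, and x are illustrative names):

a = {{7/2, -1/2, 0}, {-1/2, 7/2, 0}, {0, 0, 4}};
{vals, vecs} = Eigensystem[a];  (* the rows of vecs need not be orthogonal within the lambda = 4 subspace *)
x = Orthogonalize[vecs];        (* Gram-Schmidt; the rows remain eigenvectors of a *)
x . a . Transpose[x]            (* the diagonal matrix of eigenvalues, as in Eq. (6.7.4) *)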

Let us illustrate the usefulness of the unitary transformation theorem by applications to physical examples in the remainder of this section. A good approximation to the Hamiltonian H of an atomic crystal far from its melting point is the harmonic oscillator or Hookean spring approximation

^^ 1 /dxV 1 ^^

where m is the mass of the atoms, jc, is a Cartesian coordinate measuring the displace- ment along a Cartesian axis of an atom from its equilibrium lattice site, dxjdt is the velocity along the axis, a^j is a component of the force restoring atoms to their lattice sites, and UQ is the equilibrium potential energy (a constant). With this Hamiltonian (i.e., mechanical energy), Newton's law of motion yields the equations

m~^ = - E^O^r / = 1,..., 3iV. (6.7.19)

Since all of the degrees of freedom {jcj are coupled, the solution to these equations is usually difficult. However, if we rewrite Eqs. (6.7.18) and (6.7.19) in vector form, it becomes clear how to simplify them:

„ I dxUx 1 .

(6.7.20) m—, = -Ax,

where x is a 3A^-dimensional vector whose components are x^ and A = [a^j]. Since the coefficients a^j are related to a potential energy function l/(xi,..., JC3;y) by

(6.7.21) dXi dXj '

they are real and invariant to the interchange of the order of partial differentiation (i.e., a^j = Uji), Thus, A is a self-adjoint matrix that can be diagonalized by a unitary transformation. Actually, since x and A are real, the orthonormalized eigenvectors of A form a real orthonormal set and the eigenvector matrix X is an orthogonal matrix. SELF-ADJOINT MATRICES 231

To exploit the fact that A is self-adjoint, we note that XX^ = I and we rewrite Eq. (6.7.20) as

H = i^XX^— + ix+XX^AXX^x 1 dt dt 1 (6.7.22) m—xX^x = -X^AXX^x. dr Defining the vector

<=X^x (6.7.23)

and using the relation X^AX = A, we obtain

md^^d^ 1 , (6.7.24)

or

1 3N m +^,f' + Uo (6.7.25) 2 ^-^ m and

-A<, (6.7.26) m-df which in expanded form is

L ,37V. (6.7.27) m-dt-" = -K^i^ The unitary transformation has greatly simplified the equation of motion of the atoms of the crystal. It amounts to introducing new Cartesian coordinates f,,.. ., f3;v—^related to the old Cartesian coordinates (jC|,.. ., x^f^) by the rotational transformation x = X^—which are decoupled. By decoupled we mean that, in the new coordinate system, the degrees of freedom f, behave as independent harmonic oscillators. The solution to Eq. (6.7.25) is simply

^, = a sin cOit + b cos(o^t, (6.7.28)

where co^ = yfkjm is the characteristic frequency of vibration of the degree of freedom f,. The quantities a and h are constants that depend on the initial conditions. Physicists refer to the transformation of Eq. (6.7.19) to Eq. (6.7.25) as finding the "normal modes" of a coupled harmonic oscillator system. EXERCISE 6.7.1. Suppose the atomic masses are different so that

1^ (dx,^ 2 \L^ (6.7.29) 232 CHAPTER 6 PERFECT MATRICES

and

m-^ = -Z^U^j' (6.7.30)

Find the transformation that converts these equations to the forms of Eqs. (6.7.23) • I • and (6.7.24). As another example, consider the problem of finding the minimum or maximum of a multivariable real function /(JCJ, ..., X„) of real variables. At an extremum (minimum or maximum),

-^=0, / = I,...,n. (6.7.31) dxi We can expand the perturbed function /(x^ +5x,,..., x„ +5x„) (near the extremum) in a Taylor series around the points {xj,. .., xj, giving

fix + 8x) = f(x) + J^8x,^ + ]-j: 8xMj//- + • • • . (6.7.32)

Since df/dx^ =0, / = 1,..., n, and defining the matrix A by its components

a,, = —, (6.7.33)

Eq. (6.7.30) can be expressed in the form

/(x + (5x) = fix) + ^ 5x^ A 5x + .. • . (6.7.34)

The matrix A is often referred to as the . Let us say that we are hunting a minimum in /. If x is at a minimum, then, in a very small displacement 8x away from X, fix 4- 8x) will increase over its value fix). From Eq. (6.7.34) we see that this will be the case if

<5x^ A 5x > 0 for arbitrary 8x. (6.7.35)

Since a^^ = a^, and the values of a^^ are real, the matrix A is self-adjoint. Therefore, A can be diagonalized by a unitary transformation, or, equivalently, it has the spectral decomposition

A = EM.-XJ' (6.7.36)

where the eigenvalues X^ are real and the eigenvectors {x,} form an orthonormal set. Insertion of Eq. (6.7.36) into Eq. (6.7.35) yields

n Sx' A5x = X;A.,I«, 1^ > 0, (6.7.37) SELF-ADJOINT MATRICES 233

where

Of,. =x/5x. (6.7.38)

Since |ofJ^ is arbitrary and positive, we conclude from Eq. (6.7.35) that /(x) is a minimum if all of the eigenvalues of A are positive (and is a maximum if all of the eigenvalues are negative). If some of the eigenvalues are 0 and the rest are positive, then X might be a point of inflection rather than a minimum. High-order derivatives of / would be necessary to analyze this situation. Another way to express Eq. (6.7.35) is to use the relation A = XAX^ to obtain

8x^A8x = Sx^ XAX^ 8x = ct^Aot, (6.7.39)

where a = X^8x. This is equivalent to Eq. (6.7.37). In the thermodynamic theory of phase equilibria, / is the Helmholtz free energy and Xi is the density of component /. Previously, we said that A is positive definite if it is self-adjoint and if

x^Ax > 0 for all x € £„. (6.7.40)

With the aid of Eq. (6.7.36), it can now be stated that

n x+Ax = E^/KI'' ^i = xJx' (6.7.41) (=1 and so it follows that A is positive definite if and only if all of its eigenvalues are positive. Because a self-adjoint operator can be diagonalized by a unitary transformation (or, equivalently, because the eigenvectors of a self-adjoint matrix can be airanged as an orthonormal basis set), the p = 2 norm and the condition number fc^(A) are simple functions of eigenvalues of A. The p = 2 norm is given by

.2 (Ax)+Ax ||A|"r =— mamax ^——r-^ . (6.7.42)

With the aid of I = J^i x,^J and A = J], A.,x,x/, we obtain

x = lx = ]^x,xjx and Ax=:^A,X;ay, (6.7.43) i i

where a, = xjx. Equation (6.7.42) then becomes

||Af = max 2 E,ki (6.7.44) :E,KP

where

\k^J = max |X,.|. (6.7.45) l

Denoting x_max as the eigenvector having the eigenvalue of the greatest absolute value, we can choose x = x_max. The equality in Eq. (6.7.44) can then be achieved and so

‖A‖ = |λ_max|.    (6.7.46)

Thus, for the p = 2 norm of a self-adjoint matrix, we simply have to find the eigenvalue of maximum magnitude. The power method described in Chapter 5 is a way of finding this eigenvalue. If A is not singular, then the eigenvalues of A⁻¹ are λ_i⁻¹ and the eigenvectors are the same as those of A. The p = 2 norm of A⁻¹ is then

‖A⁻¹‖ = 1/|λ_min|,    (6.7.47)

where

|λ_min| = min_{1≤i≤n} |λ_i|.    (6.7.48)

Here we denote λ_min as the eigenvalue of minimum absolute value. The inverse power method yields this eigenvalue. The condition number of a self-adjoint matrix in the p = 2 norm is simply

κ(A) = ‖A‖ ‖A⁻¹‖ = |λ_max| / |λ_min|.    (6.7.49)

It is important to note that if A is perfect but is not self-adjoint, then the p = 2 norm of A is not in general |λ_max|. This can be proved by a simple example:

A = [ 1  1
      0  0 ],  λ_1 = 0,  λ_2 = 1,  x_1 = [  1
                                           −1 ],  x_2 = [ 1
                                                          0 ].    (6.7.50)

Thus, A is perfect but

Ay = [ y_1 + y_2
       0         ]    (6.7.51)

and

‖Ay‖² / ‖y‖² = (y_1 + y_2)² / (y_1² + y_2²) = (1 + a)² / (1 + a²),  a = y_2/y_1.    (6.7.52)

The maximum of (1 + a)²/(1 + a²) occurs when a = 1, and so

‖A‖ = √2,    (6.7.53)

which is larger than |λ_max| = 1.
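A numerical confirmation of both statements in Mathematica (matrix names are illustrative; Norm[m, 2] returns the largest singular value):

a = {{7/2, -1/2, 0}, {-1/2, 7/2, 0}, {0, 0, 4}};  (* self-adjoint, eigenvalues 3, 4, 4 *)
Norm[N[a], 2]                                     (* 4. = |lambda_max| *)
b = {{1, 1}, {0, 0}};                             (* perfect but not self-adjoint *)
Norm[N[b], 2]                                     (* 1.41421... = Sqrt[2] > |lambda_max| = 1 *)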

In Statistics and in the thermodynamic theory of density fluctuations, one fre- quently encounters the multivariate Gaussian distribution function

P = a expi — Yl ^ij^i^j I = ^ exp(—x^Ax), (6.7.54)

where a^j and jc, are real and a^j = «j,. The physical meaning of P is that

P{x^j,.. ,x^)fifjCj... Jjc„ (6.7.55)

is the probability that the variables will be observed to lie between JCJ ,..., x„ and jc,+^jci,... ,A:„-f-t/jc„. Therefore, if we integrated over all the variables {jCi,..., JC„} the result must be 1, i.e.,

OO /.OO / •••/ Pix^,...,x„)dxr-dx„ = l. (6.7.56) -OO -^-OO In using a probability distribution, we generally want to evaluate average quantities such as

OO /.OO / •••/ 8{x)P(x)dxr-dx„. (6.7.57) -00 •^—OO The variables x = {ACJ ,..., A:„} are defined as the deviations from a set of mean values (i.e., X, = ^. — ^.) and so jc^ = 0. The quantities JJ^i are elements of the G, which we define by [x^i] or xx^. Note that the trace of G, trjG = J^Li ^h is the mean square dispersion, a^, of the variables. Evaluation of the various averages could be very difficult in general since all of the variables are coupled. However, for the Gaussian distribution, A is a real, self- adjoint matrix, and so it is diagonalizable by an orthogonal transformation; i.e., there exists an orthogonal matrix X such that

X'^AX = A and X'^X = I. (6.7.58)

As in the analysis of the coupled harmonic oscillators, the new variables

y = X'^x (6.7.59)

can be introduced to simplify the problem. The relations X^ = X"^ and x = Xy yield

x^Ax = y'^X'^AXy = y'^ Ay = f^ X,yf. (6.7.60) 1=1

To carry out the integrations in the variables y,, the Jacobian |J| of the coordinate transformation must be calculated. The volume elements of the x and y coordinates are related by

dx^"' dx„ = |J| dy^'" dy„, (6.7.61) 236 CHAPTER 6 PERFECT MATRICES

where

dXi dy^ dy2 iJi = (6.7.62) dx^ dy^ dy2

The relationships jc, = Ylj ^ijyj ^^^ ^^i/^yj = ^ij show that |J| = |X| = 1 and thus the Jacobian of coordinate transformation is unity. Any student of mechanics will, of course, recall that in Cartesian coordinates dx dy dz = dx' dy' dz' whenever the coordinates x,y,z are related to the coordinates x\y', z' by a rotation (i.e., by an orthogonal transformation). We have simply proven that this property is valid in any linear vector space regardless of its dimension. We can now calculate averages in the y coordinate system. The first task is to compute the normalization constant a from Eq. (6.7.56)

• ••/ ^M~Y.^,yndyr-dy„ = \. (6.7.63) -OO •'—OO \ j_j /

Note that if any A.^ < 0, then the integrals in Eq. (6.7.62) are singular. This means that A has to be positive definite to properly define a probability distribution in which A and x are real. From mathematical tables (or using Mathematica), we can findth e integral

(6.7.64) -OO V A;

which gives for a the result ^nu K nil ^njlw\ ' (6.7.65) TV Although we used eigenvalue analysis to obtain a, the relationship |A| = f]/ \ leads to a result requiring only the evaluation of the determinant of A. Next, consider the average value of xx^. This quantity is related to yy^ by Xyy^X^. It suffices then to determine yy^ or its elements fvfv ^^ simply write

ykyi ^ •••/ y,y,^M-j:Kyf)dyr-dy„. (6.7.66) n "/-(X) *'~oo \ j_j /

Ifk^l, then y^i = 0 since the average value of every y, is 0. If A: = /, then we note that the integral in Eq. (6.7.66) is made up of the product of « — 1 integrals of the form in Eq. (6.7.64), and one integral of the form

/_ yl^^P{~hyl)dyk = :^-^^ (6.7.67) SELF-ADJOINT MATRICES 237

The value of this integral was obtained by differentiating Eq. (6.7.64) with respect to kf^. Equation (6.7.66) can now be solved

^'''=' ' (6.7.68) 1^ IK' The result for yy^ is then

yy^ = ^A-^ (6.7.69)

The relationships A'^ = XA~*X^ and xx'^ = Xyy^X^, and Eq. (6.7.69) yield

G = i?=iA-^ (6.7.70)

Thus, the ij elements x^x^ of the covariance matrix G are simply one half of the ij elements of the inverse of the matrix A. Again, although eigenanalysis was needed to obtain Eq. (6.7.69), the result is that computation of the covariance requires only calculation of the inverse of A—no eigenvalue analysis is required. EXAMPLE 6.7.1. As an example of a negative-definite matrix, let us consider a transport process. The concentration in Fickean diffusion or the temperature in heat transfer obeys the partial differential equation

du d^U = D—^ (6.7.71) dt dx^ where t is time, x is position, and D is either the diffusion coefficient or the thermal diffusivity. With the initial condition

u{x,t =0) = fix) (6.7.72)

and the boundary conditions

M(0,O = M(/,O = 0, (6.7.73)

the equation can be solved by functional analysis, as will be shown in a later chapter. As described in Section 3.6, the finite-difference approximation

a^ . "••^•-f-+"'-. (6.7.74) ax^ {AxY where M, is the value of u at position x^ = / AJC, approximates the differential equa- tion as du D , — = TAU, dt (Axf 238 CHAPTER 6 PERFECT MATRICES

where A is the tridiagonal matrix

A = [ −2   1   0   0  ⋯   0   0   0
       1  −2   1   0  ⋯   0   0   0
       0   1  −2   1  ⋯   0   0   0
       ⋮                  ⋱
       0   0   0   0  ⋯   1  −2   1
       0   0   0   0  ⋯   0   1  −2 ]    (6.7.75)

and

and

u = (6.7.76)

Here M, = u{i Ax, t), UQ = H(0, 0 = 0, M„^I — u(l, t) = 0, and n = l/Ax — 1. The initial condition yields

u(t = 0) = u_0 = [ f(x_1)
                   ⋮
                   f(x_n) ].    (6.7.77)

A is a self-adjoint operator and so its eigenvectors x_i, i = 1, ..., n, can be chosen to be an orthonormal basis set. It can be shown that the eigenvalues of A are

λ_i = −2 [ 1 − cos( iπ/(n + 1) ) ],  i = 1, ..., n.    (6.7.78)

Since Aᵀ = A and λ_i < 0, i = 1, ..., n, it follows that A is a negative-definite matrix. Thus, with the spectral resolution theorem

A = Σ_{i=1}^{n} λ_i x_i x_iᵀ,    (6.7.79)

the solution to the finite-difference equation is

u = Σ_{i=1}^{n} exp( −(2Dt/(Δx)²) [ 1 − cos( iπ/(n + 1) ) ] ) (x_iᵀ u_0) x_i.    (6.7.80)

At large t, the i = 1 term of the series dominates. Since n ≫ 1 for a good finite-difference approximation, 1/(n + 1) ≪ 1, 1 − cos[π/(n + 1)] ≈ ½[π/(n + 1)]², Δx(n + 1) = l, and so

u ≈ exp( −π²Dt/l² ) (x_1ᵀ u_0) x_1  for large t.    (6.7.81)

Thus, l²/(π²D) is the characteristic time determining the approach of concentration or temperature to its asymptotic value.
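The same calculation takes only a few lines of Mathematica; in this sketch n, l, d (the diffusivity), and the initial profile are the values used in Exercise 6.7.2 below, and the variable names are illustrative.

n = 10; l = 1.; d = 10.^-3; dx = l/(n + 1);
a = Normal[SparseArray[{Band[{1, 1}] -> -2., Band[{1, 2}] -> 1., Band[{2, 1}] -> 1.}, {n, n}]];
Eigenvalues[a]                       (* equal to -2 (1 - Cos[i Pi/(n + 1)]), all negative *)
u0 = Table[Sin[Pi i dx/l], {i, n}];  (* the initial profile f(x_i) *)
u[t_] := MatrixExp[(d/dx^2) a t] . u0;
u[100.]                              (* the approximate profile at t = 100 s *)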

•••I EXERCISE 6.7.2. For the example just given, assume

/(A:) = sm(^—j, (6.7.82)

/ = 1 cm and djl^ = 10"^ s"^ Consider the discretizations n = 10 and n — 20. Using the eigenvectors given in Problem 29 of this chapter, determine u for r = 1, 10, 100, and 1000 s. The components of u approximate the profile u(x, i) at jc, = / AJC, / = 1,..., n. Plot the approximate profiles for n = 10 and n = 20 at r = 1,10, 100, and 1000 s. Another useful property of self-adjoint matrices, which is used extensively in quantum mechanics, is:

THEOREM. If the self-adjoint matrices A and B commute, i.e., if

AB = BA, (6.7.83)

then A and B have the same eigenvectors. Consider the eigenvalue X of A and suppose there are p eigenvectors Xj,..., x^ corresponding to X. The eigenvectors can always be chosen to be orthonormal. Since

Ax, = Ax,, / = 1,..., /7, (6.7.84)

Eq. (6.7.83) yields BAx, = ABx, = XBx, or Ay, = Ay,, (6.7.85)

where y/=Bx, (6.7.86)

Since y, is an eigenvector of A corresponding to A, it must be a linear combination of Xj,... ,Xp, i.e., p Bx/ = EO/Xr ' = 1---./^- (6.7.87)

The orthonormality of the x,(x,^Xy = 5,y) enables us to deduce form Eq. (6.7.87) that

c = xtBx (6.7.88)

and the fact that B is self-adjoint yields the result C- • = C .. (6.7.89)

Thus, the p X p matrix

^21 <-22 '2P C = [c,j] = (6.7.90)

L ^pl ^p2 ''PP J is self-adjoint. 240 CHAPTER 6 PERFECT MATRICES

The equations in Eq. (6.7.87) can be expressed as

B[Xi,...,x^] = [Xi,...,x^]C. (6.7.91)

Since C is self-adjoint, there exists a /? x p unitary matrix U such that

0 0 a, 0 U^CU = A = (6.7.92)

0 0

and Eq. (6.7.91) can be transformed into

B[Xj,...,x^]U = [Xi,...,x^]UU+CU, (6.7.93)

which can be rewritten as

[Bzj,...,Bz^] = [Zi,...,z^]A (6.7.94) = [aiZi,...,a„z„]

or

Bz,=Qr,z,, / = !,, (6.7.95)

where

(6.7.96) 7=1

Thus, the p vectors Zj,..., z^ are eigenvectors of B, and since Ax, = Xx,, / = 1,..., /7, they are also eigenvectors of A. This completes the proof of the theorem. The eigenvalues of B for the p eigenvectors {zj are eigenvalues of the matrix C, and the column vectors of U are the orthonormalized eigenvectors of C. Since B is self-adjoint, the eigenvectors in Eq. (6.7.95) are orthogonal if the eigenvalues a, are distinct or they can be orthogonalized by the Gram-Schmidt pro- cedure if the multiplicity of a given eigenvalue a, is greater than 1. Furthermore, they remain eigenvectors of B and of A in such an orthogonalization procedure. Thus, the spectral decompositions of B and A are

B = E^/2;^zJ, A = J]A,z,.zJ, (6.7.97) 1=1 i=\

where the eigenvectors z, are orthonormalized (i.e., zjz^ = 5,^). The theorem can be restated as follows:

THEOREM. If the self-adjoint matrices A and B commute, then they can be diagonalized by the same unitary transformation', i,e., there exists a unitary matrix Y SELF-ADJOINT MATRICES 241

such that A.1 0 0 «! 0 0

0 X. 0 0 0^2 0 Y^BY = . (6.7.98) Y^AY = 0 0 0 0

The column vectors of Y are orthonormal eigenvectors of A and B. Finally, in connection with evaluating averages by Gaussian distributions, we prove the following theorem:

THEOREM. If A is positive definite and B is self-adjoint, then there exists a nonsingidar matrix T such that if

x = Tw, (6.7.99)

then the bilinear form /,

/ = xUA + B)x, (6.7.100)

becomes

/ = wt(I + M)w = j:(l+/i,.)|u;,p (6.7.101) 1 = 1

where M is a diagonal matrix with real diagonal elements /i,, / = 1,..., n. The first step in the proof is to note that since A is positive definite, there exists a unitary matrix U such that U^AU = A, where A is a diagonal matrix with positive diagonal elements A,. If we define y by

x = Uy, (6.7.102)

then / becomes

/ = yUy + y^Cy, (6.7.103)

where C is the self-adjoint matrix

C = U^BU. (6.7.104)

Next, we introduce the transformation

(6.7.105)

where A~^^^ is a diagonal matrix with elements X'[^^^. In terms of z, / becomes

/ = z+z + z^Dz, (6.7.106) 242 CHAPTER 6 PERFECT MATRICES

where

D = K-'/^CK-"\ (6.7.107)

D is self-adjoint, and so there exists a unitary matrix V such that

iJi, 0 0 0 /X2 0 V^DV = M = (6.7.108)

0 0

Inserting the transformation (6.7.109)

into Eq. (6.7.106) leads to Eq. (6.7.101), thus proving the theorem. Combining Eqs. (6.7.102), (6.7.105), and (6.7.109), we find that x = UA'^/^V+w , or that the trans- formation matrix is

T = UA-^/^V^ (6.7.110)

and its inverse is T~^ = VA^^^U^ The column vectors of U are the orthonormalized eigenvectors of A. The elements of A are the eigenvalues of A and the column vectors of V are the orthonormalized eigenvectors of the matrix A~^^^U^BU A"^^*^. ILLUSTRATION 6.7.1 (Conditions of Chemical Stability). The second law of thermodynamics, sometimes referred to as the entropy maximum principle^ requires that, for a system in thermodynamic equilibrium, 5^5 = 0 for any and all fluctuations W (internal energy), bV (volume), and <5«, (composition). As the principle implies, the entropy S, which is a function of [/, V, and n^ (number of moles of component /), is in a local maximum at an equilibrium point. It is often convenient to work with free energies when discussing fluids. The sec- ond law can be restated for a system at constant temperature and volume as requiring the Helmholtz free energy, F{V,T, {«,)), be in a local minimum at equilibrium. We can expand F in a Taylor series around the equilibrium point at constant T and V, giving

F—-} dtiiOn:. (6.7.111) 2j-dn,dnj • ' Recognizing that the chemical potential is defined as

(6.7.112)

the second law of thermodynamics requires that the matrix A, with components

A-/ = . (6.7.113) '•' * dn J ^ T,V be positive definite. SELF-ADJOINT MATRICES 243

A similar expression can be derived for the Gibbs free energy G{T, P, {n,}) by noting that, at constant T and P,

1 ^ d^G G=::j:^-^SnMj- (6.7.114)

The second law requires that, for a system in equilibrium at constant temperature and pressure, the Gibbs free energy must be in a local minimum. However, in this case, the volume is no longer constrained. We can imagine a fluctuation in which all of the components change in equal proportions such that the composition is constant and the total density of the system is unchanged. Such a fluctuation, in a bulk fluid, consists of an arbitrary change of the boundary of the system and should thus not affect stability. (i) Use the Gibbs-Duhem equation,

X;^Mf«c =0, (6.7.115)

to show that, for the fluctuation 5n, = an,, where a is a constant, the second-order fluctuation in the Gibbs free energy is given by

8^G = 0. (6.7.116)

The above result requires that we modify the stability conditions for the Gibbs free energy such that this one fluctuation is considered. We hence require that the matrix A' defined by

(6.7.117) '•'-{1^) T,P be positive semidefinite with exactly one eigenvalue equal to 0. (ii) A useful free-energy approximation for binary mixtures is the Wilson free energy given by

G(T, F, n,,n,) = n,g,(T, P) + n^g^{T, P)

^-RT{n^\nx^-^n^\nx^ (6.7.118)

- RT\n^\n{x^ + A 12^2) -^^i^^i^i -^ ^2\^\)\

where jc, is the molar fraction of component /, g^ is the molar Gibbs free energy of pure component /, and A,y are model parameters. Determine under what conditions a single-phase, binary mixture obeying the Wilson equation is thermodynamically stable. ILLUSTRATION 6.7.2. (Reaction Diffusion in a Thin Film). The multi-com- ponent diffusion system offers a good example of the power of eigenanalysis. Con- sider a permeable, thin-film membrane of thickness 5. We wish to construct the steady-state concentration profile of a three-component system. Reagent A is in 244 CHAPTER 6 PERFECT MATRICES

contact with one boundary of the film at concentration Q^. Similarly, reagent B con- tacts the other boundary with concentration c^ . Reagent A is converted to reagent C according to the first-order reaction A -^ C. The rate expression is given by

-kCj^, (6.7.119)

We can write the multicomponent reaction-diffusion equation for this system in matrix form as 9c _a^c ^ (6.7.120)

where

(6.7.121)

and D contains the multicomponent diffusion coefficients

D AA D AB D AC

D ^BB ^BC {6,1.Ill) CA ACB acc The matrix R contains the first-order reaction terms and is defined for this system as

r -^-^ 0 0 00 1 R= I 0 0 0 1. (6.7.123) k 0 Q

The boundary conditions are

c^(0) = c^^, c^(5) = 0,

0^(0) =0, CB(5) = C5^, (6.7.124) Cc(0) = 0, Cc(5)=0.

We can recast Eq. (6.7.120) in a simpler form by making the following linear transformation:

u = X-^c, (6.7.125)

where X diagonalizes D in the similarity transformation

X-^DX = A. (6.7.126)

(i) Show that Eq. (6.7.120) can be transformed in terms of u into

- = A^ + Qu. (6.7.127)

Express the matrix Q in terms of X, D, and R. NORMAL MATRICES 245

(ii) Generate a fully decoupled system of equations to describe the steady-state concentrations by first multiplying Eq. (6.7.127) by A~^ Solve the equations and plot the steady-state values of c^, c^, and c^ within the membrane film (0 < x < 8) for 8 = 0.5 mm, c^^ = 0.05 mol/L, c^^ = 0.05 mol/L, k = 0.55 s"^ and

2.05 1.96 1.10 D 1.96 1.75 1.25 X 10"^ cmVs. (6.7.128) 1.10 1.25 0.96

6.8. NORMAL MATRICES

We say that A is a normal matrix if

AA† = A†A.    (6.8.1)

Self-adjoint matrices are, of course, normal, but the class also includes unitary, orthogonal, and skew-symmetric (A† = −A) matrices, as well as other matrices. For example, the 2 × 2 matrix

A = ½ [ 7 + 4i   1
        1        7 + 4i ]    (6.8.2)

is a normal matrix, but is neither a unitary nor a self-adjoint matrix. Figure 6.8.1 gives a qualitative picture of the classes of matrices we have been discussing. In the figure, P.D. means positive definite.

FIGURE 6.8.1 Qualitative picture of the classes of matrices discussed in the text (P.D. = positive definite).
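The normality, and the non-membership in the unitary and self-adjoint subclasses, of the matrix in Eq. (6.8.2) (as written above) can be checked directly in Mathematica:

a = {{7 + 4 I, 1}, {1, 7 + 4 I}}/2;
ConjugateTranspose[a] . a == a . ConjugateTranspose[a]  (* True: a is normal *)
ConjugateTranspose[a] == a                              (* False: not self-adjoint *)
ConjugateTranspose[a] . a == IdentityMatrix[2]          (* False: not unitary *)
Eigenvalues[a]                                          (* {4 + 2 I, 3 + 2 I} *)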

The normal matrices are, in fact, the class of all matrices whose eigenvectors form an orthonormal basis set. This fact follows from the theorem:

THEOREM. If A is a normal matrix, there exists a unitary matrix X that diagonalizes A in a unitary transformation, i.e.,

X†AX = Λ  and  X†X = I.    (6.8.3)

As stated in the previous section, this theorem is also equivalent to the spectral resolution theorem that there exists a basis set of orthonormal eigenvectors of A such that

A = Σ_{i=1}^{n} λ_i x_i x_i†.    (6.8.4)

The λ_i are not necessarily real. In fact, the eigenvalues of the matrix given by Eq. (6.8.2) are λ_1 = 3 + 2i and λ_2 = 4 + 2i. Normal matrices represent the largest class of matrices whose eigenvectors form an orthonormal basis set. In fact, the whole class can be constructed by selecting every orthonormal basis set {x_i} in E_n and taking linear combinations Σ_i λ_i x_i x_i† with every possible set of complex numbers {λ_i}. The proof of the above theorem makes use of the semidiagonalization theorem, which states that there exists a unitary matrix Q such that

Ai ^12 ••• Z?i„

0 A2 •• fc2« A = QB„Q+ = Q Q'

0 0 (6.8.5) Xy by2 ^\n

Q^ B«-l

where B„_i is an upper triangular matrix with eigenvalues X2,..., A.„ on the main diagonal. The adjoint of Eq. (6.8.5) yields the result

A^ = Q Q^ (6.8.6) BL,

L "^In NORMAL MATRICES 247

The product A+A, with the aid of the property QBj,Q+QB„Qf = QB);B„Qt, is

IX, P XIfo,2 Ki>u

A,fcl2 AU = Q Q^ (6.8.7)

x.fct

whereas

|X,P + El^/;l' 7=2 AAt=Q Q^ (6.8.8)

Using the property that AA^ - A^A = 0 for a normal matrix, we findtha t

bij =0, j = 2,...,n. (6.8.9)

This reduces A to

A,i 0 0 A = Q Q^ (6.8.10) B«- l

and gives the result

IX, p 0 Ol 0 AA+ = Q Q^ (6.8.11) B„- iB„_, 0

and

n — i n—i n — i n — i (6.8.12)

(since AA^ = A^ A). B„_i is an upper triangular matrix with elements A,2,..., X„ along the main diagonal and the elements of its first row are A.2, ib23,..., ^2fr Repeating the process we just used to obtain Eq. (6.8.11) leads to the conclusion that

hi =0, 7 = 3,..., ^?, (6.8.13) 248 CHAPTER 6 PERFECT MATRICES

and

Ai 0

A = Q Q^ (6.8.14) B„ 0 0

where B„_2 is an upper diagonal matrix with elements A3,..., A„ along the main diagonal. Continuing the process yields the final result

A = QAQ^ (6.8.15)

which completes the proof of the theorem. The orthonormal eigenvectors of A are, in fact, the column vectors of the unitary matrix Q that was guaranteed to exist by the semidiagonalization theorem. If U is a unitary matrix, then U^U = UU^ = I, and so, as mentioned earlier, a unitary matrix is a normal matrix. Thus, if JLI, ,..., /x„ are the eigenvalues of U and X,,..., x„ are the corresponding orthonormalized eigenvectors, the spectral decom- position is

V = J^fiiXixJ. (6.8.16)

From the fact that

i=i j=i (6.8.17) n n = Y. \fii\\xj = I = Y.Xixl

itfollowsthat |ju.,p = l;i.e.,theeigenvaluesof a unitary matrix are of unit magnitude. This leads to the conclusion that

THEOREM. A unitary matrix U can always be expressed in the form

U = exp(/A), (6.8.18)

where A is a self-adjoint matrix. To prove this, set

fjij=cxp{iXj), (6.8.19)

where Xj is a real number. Note that, since |/iy| = 1, if we define A = X]y=i ^yX^xJ, where the x, are the orthonormalized eigenvectors of U, it follows that

exp (/A) = J2^^P {^^j)^j^] — U* (6.8.20) MISCELLANEA 249

By definition, A = AMf U is unitary, which proves the conclusion. The meaning of Eq. (6.8.18) is that unitary and self-adjoint matrices are related through an exponential mapping: for any self-adjoint matrix A, there exists a unitary matrix such that U = exp(iA) and for any unitary matrix U there exists a self-adjoint matrix such that U = expO'A). As we did for self-adjoint matrices, we can use the spectral resolution theorem to prove that, for a normal matrix, in the /? = 2 norm.

l|A|| = |X,,J (6.8.21)

and

^(A) = {^, (6.8.22)

where

l^maxl = P?x|^,l and IX^iJ = min |A.,.|. (6.8.23)

6.9. NISCELUNEA

The product AB of a self-adjoint matrix A and a positive-definite matrix B is not self-adjoint. Nevertheless, the following theorem can be proved:

THEOREM. If A is self-adjoint and B is positive definite {or negative definite), then the product AB (and BA) is a perfect matrix and its eigenvalues are real. The proof is straightforward. Consider the eigenproblem

ABy,=X,y,. (6.9.1)

Since B is positive definite, it is perfect and nonsingular and its square root R = X], lj}/^Xi\] is also positive definite (/x, is the eigenvalue of B and x, is its orthonormal eigenvector). Putting B = RR in Eq. (6.9.1), defining z, = Ry,, and multiplying Eq. (6.9.1) by R, we obtain

RARz,. =X,.z.. (6.9.2)

Since the matrix RAR is self-adjoint, the eigenvalues A, are real and the eigenvectors {z,} form an orthonormal basis set. Note that the eigenvectors y, = R~^z,, / = 1,..., n, are linearly independent since the equation

0 = E «,y,- = E «,R-'z, = R-' E «,z, (6.9.3) i=l 1=1 i=l

has only the solution a, =0, / = 1,..., n. This completes the proof that AB is a perfect matrix and its eigenvalues are real. Similarly, it is easy to prove that BA is perfect and has real eigenvalues. 250 CHAPTER 6 PERFECT MATRICES

If both A and B are positive definite, then an even more specific theorem can be proved:

THEOREM. If A and B are positive definite, then the product AB {or BA) is a perfect matrix and its eigenvahies are real and positive. In the previous theorem, we established that AB is perfect and has real eigen- values. According to Eq. (6.9.2), the eigenvalues of AB are the eigenvalues of RAR, where R is a positive definite matrix (more specifically, R is the square root of B and R = R^) Consider the inner product x^RARx = x^R^ARx. Defining w = Rx, we note that if X 7^ 0, then vf ^Q since R is positive definite, and so the inner product

x^RARx = w^Aw > 0

for all vectors x in E,^ (because A is, by choice, positive definite). Thus, from Eq. (6.9.2), it follows that ZJRARZ, = X,||z,||^ > 0, and so the eigenvalues of AB must be positive, proving the theorem. By simply interchanging the symbols A and B above, we can prove the same theorem for the product BA. Note that if both A and B are self-adjoint but neither is positive definite (or negative definite), then their product might not be perfect. For example, if

1 1 0 2 A = and B = (6.9.4) 1 0 2 -1

then

2 1 AB (6.9.5) 0 2

The eigenvalues of AB are ^.j = A,2 = 2 and the only eigenvector is

X, = (6.9.6)

Thus, AB is not a perfect matrix. EXAMPLE 6.9.1. In a multicomponent material, the concentration c^ of compo- nent / obeys the diffusion equation

8 c, L = J2D,JW'CJ, / = l,...,n, (6.9.7) Yt ;=i where t is time, V^ = d^/dx'^ -\- d^/dy'^ + d^/dz^ is the Laplacian differential oper- ator, Dij is a diffusion coefficient through which a gradient in the concentration of component j causes diffusion of component /, and n is the number of components in the material. The quantity D,y is related to the Onsager transport coefficients l^j and the components of the Helmholtz free-energy curvature matrix A = [a^j], where

a:; = (6.9.8) '^ dc, ac, MISCELLANEA 251

In particular,

A7 = EV^; (6.9.9)

or

D = LA, (6.9.10)

where

D = [D,,], L = [l,jl A = [a,jl (6.9.11)

A relevant physical question is whether diffusion always gives rise to mixing in a closed multicomponent system; i.e., does diffusion always result in a homogeneous material? For present purposes, we assume that D is constant in space and time. Equation (6.9.7) can be summarized as

(6.9.12)

where

(6.9.13)

The condition that the system is closed is

ED,.Vc,.=0 or VDc=:0 (6.9.14)

on the boundary dV of the system. The system is contained in the volume V. From irreversible thermodynamics, it is known that the Onsager matrix L is positive definite. From thermodynamics, it is known that the free-energy curvature matrix is also positive definite if the material is thermodynamically stable. Thus, we can use the theorem proved above to show that the eigenvalues of D are all positive, and there exists a matrix X, which diagonalizes D in a similarity transformation, i.e..

X-*DX = A = [XiSijl Xi > 0, / = 1, ,/z. (6.9.15)

Defining

f=X-^c or Xf (6.9.16)

and multiplying Eqs. (6.9.12) and (6.9.14) by X"', we find

= V'Af (6.9.17a) dt 252 CHAPTER 6 PERFECT MATRICES

or

^=:ki VVi, / = !,...,«, (6.9.17b)

and

VA^ = 0 onaV (6.9.18a)

or X. Vx/f. =0, / = 1,..., n, on dV. (6.9.18b)

Multiplying Eq. (6.9.17b) by t/r. and using the identity

/ fi VV,. dV = - jiVfi) . Wilfi dV (6.9.19) + [ V^{f,Vf,)dV, Jy with the property

( V • ifi^fi) dV = [ fi Wfi • dA, (6.9.20)

where dA is an element of area on 8 V, we obtain

ld_ jffdV = -X, j{Vf,fdV. (6.9.21) lit The right-hand side of Eq. (6.9.20) vanishes because of the closed-system boundary condition, Eq. (6.9.18b). It follows from Eq. (6.9.21) that the positive quantity f -[[rf dV decreases in time until it reaches 0, at which time Vi/r, =0 everywhere in V. Thus, we have proven that, in a thermodynamically stable material, diffusional transport in a closed system will always lead in time to a homogeneous system. If any eigenvalue X, changes sign, then the conclusion does not hold. Physically, this situation arises when the composition of the system is such that the material is thermodynamically unstable, in which case |A| < 0 and |D| = |LA| = |L| |A| < 0. In this situation, the system "un-diffuses" and splits into two or more coexisting phases. Another interesting property of a product of arbitrary square matrices is

THEOREM. If A and B commute, i.e., if

AB = BA, (6.9.22)

and ifXi is an eigenvalue of A of multiplicity 1, then the corresponding eigenvector, Xi, is also an eigenvector of B. To prove this, multiply Ax, = A,x, by B and use Eq. (6.9.22) to obtain

Ay, = X J,., where y,- = Bx,.. (6.9.23) MISCELLANEA 253

Since y, is an eigenvector of A of eigenvalue A,, then it differs from x, only by a scalar factor, i.e., y, = a,x,, which yields the result

Bx, =a,x,, (6.9.24)

proving the theorem. Note that this is a more general version of the theorem we proved in Section 6.7 for self-adjoint matrices. We can also prove the following:

THEOREM. If all of the eigenvalues are distinct or of multiplicity 1, i. e., X, -^ kj, i ^ y, then it follows that

A = E ^i^i^l and ^ = E M,x,zJ (6.9.25) i=l if A and B commute. This also means that A and B can be diagonalized by the same similarity trans- formation; i.e., there exists a nonsingular matrix X such that

Xi 0 0 a, 0 0 0 Xo 0 0 a. 0 X-^AX = and X-^BX =

0 0 0 0 (6.9.26)

The column vectors of X are the eigenvectors of A and B. Note that if A is not perfect, then A and B can commute and yet do not have the same eigenvectors. For example, the matrix

1 1 A = (6.9.27) 0 1

has only the eigenvector

X, = (6.9.28)

However, the eigenvectors of I can be chosen to be

X. = (6.9.29)

Since lA = AI, I and A commute, but X2 is not an eigenvector of A. An interesting property of a perfect matrix with distinct eigenvalues is that

I adj(XJ —A) (6.9.30) 254 CHAPTER 6 PERFECT MATRICES

where adj(XI — A) is the adjugate of A.I — A, x, is the eigenvector corresponding to X,, and z^ is the reciprocal of x^ (xjzy = 1). In Chapter 1 it was shown that

[adj(A.I - A)](AI - A) = \Xl - A| I, (6.9.31)

and so

n [adj(AI - A)](AI - A) = l\(k - Xj)l, (6.9.32)

Multiplying Eq. (6.9.32) from the right by x^zj yields

n [adj(XJ - A)](X, - X,)x,zl = YliX, - Xj)x,zl (6.9.33)

and dividing by (X, — X^) gives

[adj(AJ - A)]x,zl = f\{X, - Apx,zl. (6.9.34) ;¥^

Since ELi ^A = I and since ELi ITj^K - ^jM = UU^^t - ^y)x.zj, sum- ming Eq. (6.9.33) over all k and dividing by Fly^/C^, - >^j) yields Eq. (6.9.29). Therefore, when all of the eigenvalues of a matrix are distinct, the spectral resolution theorem, /(A) = Ei /(^/)X/zJ, can be expressed in the form

/(A) = E/(^.)nwf"tV (^-^-^^^

which is known as Sylvester's formula,

6.10. THE INITIAL VALUE PROBLEM

The /7th-order differential equation

d^u dP-^u du

or

d^u A ^^"'M ^ + E«..w^ = / (6.10.1)

becomes an initial value problem (IVP) if the initial conditions

—^ = y. at r = 0 for / = 0, 1,..., p - 1, (6.10.2) df THE INITIAL VALUE PROBLEM 255

are given. If the values of a, are constant and / is a known function of time, the IVP can be converted to a simple first-order equation as follows. Define jCi,..., x^ in the following way: x^ = u and

dxi dii

dxo d^u dt '""'' df (6.10.3)

dXp_^ _ _ f/P-'w dt p dt"-^ •

Then Eq. (6.10.1) becomes

dx P (6.10.4)

Combining Eqs. (6.10.3) and (6.10.4), we obtain

dx = Ax + b, (6.10.5) dt

where

X = b = (6.10.6) 0 LP J /

and

0 10 0 0 0 0 10 0 A = (6.10.7) 0 0 0 0

—at —a^ —ci-x —CIA ^p -I

where A is known as the companion matrix. The formal solution to Eq. (6.10.6) is

X = exp(AOxo + / exp(A(r - r))b(r)t/r, (6.10.8) 256 CHAPTER 6 PERFECT MATRICES

where

(6.10.9)

Yp-iP-i J If A is perfect, the spectral decomposition of exp(AO allows us to express the solution in the form

P rt X = X]e^P(^/0(zJxo)x, + ^ / exp(A,a ~ r))[zjb(r)]x, dx. (6.10.10)

Thus, when A is perfect, the solution to the IVP is a Hnear combination of exponen- tial functions of time. This could have been found by trying exponential solutions. However, if A is not perfect, we shall learn in the next chapter that exponentials also arise in the solution of the IVP. For the companion matrix given in Eq. (6.10.7), we can show that

tr,A = (-l)'a^^i_,. (6.10.11)

The characteristic polynomial is given by

(6.10.12)

and so the eigenvalues of A satisfy the equation

X^ + a^k^'-^ + ap_^XP-^ + • • • + fl2^ 4- fli = 0, (6.10.13)

a result obtained by multiplying Pp(X) by (—1)^ and noting that (—1)^'' (_l)2p-2i _ I PQJ. ^ gjygj^ eigenvalue X, the eigenvector equation

-A 0 0 0 0 1 0 0 = 0 (6.10.14) 0 0 0 1

—«! —a2 —a p-i -(^n+A)J L^PJ

has the solution

-XX2 + A:3 = 0

(6.10.15) -Xxp_^ -\-Xp=0

J^a^Xi-XXp^O THE INITIAL VALUE PROBLEM 257

or

1 k

X = JC, (6.10.16)

where jc, is arbitrary. From this result, we can conclude that there will be n linearly independent eigenvectors if all of the eigenvalues k^ are distinct, i.e., if all of the roots of Eq. (6.10.13) are of multiplicity 1. In this case, A is perfect. If any root is of multiplicity greater than 1, then A is defective since it has fewer than p eigenvectors. Restated, we have just learned that

THEOREM. The necessary and sufficient condition for the companion matrix to be perfect is that its eigenvalues be distinct.

We will explore this further in Chapter 7.

EXERCISE 6.10.1. For p = 3,ai = 4, ^2 = 4, and a^ = \, find the eigenvalues and eigenvectors of the companion matrix and its adjoint.

A system of two linear differential equations (with initial conditions) can be expressed in vector form as

d\ ^AjX-fB^y-f b^ ~dt (6.10.17) dy ^=A2X + B2y + b2,

where the /7 x /? matrices are

0 1 0 0 0 0 0 1 0 0 / = 1,2, (6.10.18) 0 1

-a. -a ip _j

0 1 0 0 0 0 0 1 0 0 B.= i = l,2, (6.10.19) 0 0 0 0 1 •bn -bn -bn -bn • • • -b 258 CHAPTER 6 PERFECT MATRICES

and the p-dimensional vectors are

b.= 1,2. (6.10.20) 0

The problem can be further consolidated to the form

d bi (6.10.21) dt A2 B2 + b.

or

w = Tw + b, (6.10.22) dt where w and b are 2/7-dimensional vectors and T is a 2p x 2p matrix. For an initial value problem, the solution to Eq. (6.10.22) is

w = exp(TOwo + / exp(T(^ - T))b(T) dx, (6.10.23)

where the vector WQ is composed of the values of M, i; and their first p — 1 derivatives at time t = Q. Analogously, the three pth-order differential equations

dt i=\ "' r=l "dt^' 1=1 "*

dPy ^ dP-'u ^ dP-'v ^^ dP^w _ (6.10.24) dt 1=1 dPyo P dP-'u ^ dP-'v P dP-'w _

reduce to

1 B, c,1 X fb,] d_ 2 82 C2 y + b2 (6.10.25) Jt 3 B3 C3J z Lbs J or

w = Tw + b, dt where, this time, w and b are 3/7-dimensional vectors and T is a 3/? x 3p matrix. The generalization to an arbitrary number of /?th-order differential equations ought to be obvious at this point. PERTURBATION THEORY 259

6.11. PERTURBATION THEORY

In physical problems, one frequently encounters a matrix that can be split into a sum A 4- €B, where A. is a matrix whose eigenproblem is solved and eB is small in the sense that

|x^Ax|»€|x^Bx| (6.11.1)

for the vectors x of interest. When this is the case, the eigenvalue problem

(A + €B)x,. =kiXi (6.11.2)

can be solved by perturbation theory. Perturbation theory has been especially useful in quantum mechanics, where A represents the Hamiltonian of some solvable system, B is some small external electric or magnetic field, and A,, is the electronic or nuclear energy of the quantum state of interest. We will restrict the analysis to the case of a distinct eigenvalue A,. From the characteristic equation

|A-1-6B-AI|=0, (6.11.3)

it follows that A, is a function of eB, For small ^B, we expect to be able to expand a given eigenvalue A, in a series in € (^ is a dimensionless index parameter that just keeps track of the power of B contributing to A,, or x,). Thus, we expand

^, = Ef'"^*"" (6.11.4) m=0

and

oo X, = 5: e-xf"', (6.11.5) m=0

where k^^^ and x^^^ are the eigenvalue and eigenvector of A in the absence of B, i.e.,

Axf'=xrxf\ (6.11.6)

Inserting the expansions of Eqs. (6.11.4) and (6.11.5) into Eq. (6.11.2) gives

oo oo oo oo A 53 e'-x*'"' + B 53 e^+'xl"" = E E e^^'^f'^t?"" (6.11.7) m=0 m=0 m=0 /=0

or Axr + €(Ax<" + Bxf>) + e^(Ax<^» + Bx<") + • • • = X'O'x^ + 6(X('"x<" + X<'>xr) (6.11.8) + ^^(Pi'V + ^N" + ^* V) + • • • • 260 CHAPTERS PERFECT MATRICES

Equating coefficients of like powers of € on each side of Eq. (6.11.8), we obtain the perturbation sequence

(A-X<"'l)x<'> = APxf-Bxr

: (6.11.9)

(A - A*°>I)xf' = Y. ^r^f-'^ - Bxf-'\ / = 1, 2,. • • .

In quantum mechanics, and in the analysis of x-ray scattering, one is frequently only interested in first-order perturbation analysis, i.e. in the equations Axf'=X«'>xr' (6.11.10)

and

(A - Xf I)xf > = Af >xf - Bxf \ (6.11.11)

By hypothesis, we assume that the eigenvalue xf^ and eigenvector xf^ of A are known. According to the Fredholm alternative theorem, Eq. (6.11.11) is solvable if and only if

z;(X<"xr-Bxr) = 0 or X<" = ?^, (6.11.12)

where z, is the solution to the equation

(AJ - A,,^°**I)z,. = 0. (6.11.13)

In quantum mechanics, A is always self-adjoint and so A^ = A, z, = x}*", and

C^ioFloTxP^^Bx-. (6.11.14) XV'X^ Thus, the first-order shift in the eigenvalue caused by a small perturbation can be computed directly from the perturbation operator B and the unperturbed eigen- vector xf^ Expanding the unknown vector x^ in the basis set [xf\ ..., xf^}

xr^ = E«o-^r^ (6.11.15) 7 = 1 we can write Eq. (6.11.11) as E(xf -xr)a,,x,=xr'xr -Bxr. (6.11.16) PROBLEMS 261

Multiplication of Eq. (6.11.16) by the reciprocal vector z^, where zlxf^ = 5^^, leads to the result

(Xr-Xr)«,. = ^l"«..-4Bxr (6.11.17)

or -z Bx" »a=.J .' for kjti. (6.11.18)

For fc = /, the left-hand side of Eq. (6.11.17) vanishes because A.^ — xf^ = 0 and the right-hand side vanishes owing to the solvability condition, x\^^ = zjBxf ^ (where the eigenvectors of A^ have been biorthonormalized with the eigenvectors {xf ^}). a^ can be set to 0 without loss of generality and the first-order perturbation solution to the eigenproblem becomes

A.. ^ Xf + zJfBx,.

ZVBX^^^ (6.11.19) X| ~ X,. - 2^ —X^. .

EXERCISE 6.1 I.I. Develop the first-orderperturbatio n theory for the case of degenerate eigenvalues, i.e., when more than one eigenvector has the same eigenvalue A.,. The results of the first-order theory are

V -f «<^V0)4-. V K(^ri-B)E;..<>xf>]xf (6.11.20) Jik "~ 2^ Pii ^ '^^ JL, 1 (0) 1 (0) /=! Ml ir ^} ~^i

for fc = 1,..., r. Here the \f^,..., x^^^^ are the r eigenvectors of A corresponding to the eigenvalue xf^ and the z,^,..., z, are the r eigenvectors of A^ corresponding to the eigenvalue xf^*, pf^,..., p\^^ and xf^ are the eigenvectors and eigenvalues of

CP = Xp, Cj^i = zJ^Bx,.^, it, / = 1,.. ., r. (6.11.21)

Although the unperturbed eigenvectors xf\ ..., x[^^ have the same eigenvalue, the perturbed eigenvectors y,^,..., y,^ have different eigenvalues A.,.^,..., X, . Thus, the effect of the perturbation is to split or separate the degenerate eigenvalues. In quantum mechanics, it is well known that degenerate energy levels (different quantum states corresponding to the same energy level) can be split by imposing a small electric or magnetic field on the system.

PROBLEMS

1. Find the eigenvalues and eigenvectors of

l+i -1+r --\ 1 + j \-i 262 CHAPTER 6 PERFECT MATRICES

and verify that U is unitary. Verify that the eigenvalues are of modulus 1 (i.e., of unit magnitude). 2. (a) Form an orthonormal set Xj, X2, X3 from

1 2 -1 1 -1 2 y2 = y3 = 1 -1 2 1 1 1

(b) Find the orthonormal vector X4 such that x,, X2, X3, and X4 form an orthonormal basis set. (c) Find the reciprocal vectors of the set yi, yi, and y3. 3. Find an orthonormal set of eigenvectors for

7 -16 -8 A = -16 7 8 -8 8 -5

Solve this problem manually, not with a computer. Show your work. 4. Suppose A = XAX"', where A is a diagonal matrix. (a) Find the matrix Y that diagonalizes the matrix

0 A ^A 0

in a similar transformation. (b) Show that the same matrix diagonalizes

/(A) g(A) g(A) /(A)

where f{t) and g(t) are expressible as Taylor series in t. 5. Show that the n x n Hilbert matrix 1 A = [ay], «• + ;•-1 is positive definite for n = 10. What about the general case? 6. Suppose

A^ = -Bx,

where A and B are positive-definite matrices independent of t. (a) Prove that the solutions to this equation are purely oscillatory. PROBLEMS 263

(b) Solve the equation for

A = B =

and dx x(0) = ~dt •1

7. Let A be a real, symmetric n x n tridiagonal matrix such that det(Ajt) ^ 0, /c = 1,..., n, where the matrix A^ is formed by striking the last n — k rows and columns of A. (a) Derive LU-decomposition formulas for A and use these to derive a recursion formula for computing det(Ajt), A; = 1,..., n. (b) Determine the largest n for which A is positive definite if a,, = 2, a^^^ — 1.01, a,4.ij = 1.01, anda,^ = 0 otherwise. 8. If U is unitary, prove that, in the /? = 2 norm,

IIUAII = ||A||.

9. Given 1 0 0 findexp(/A). 10. Show that the (AZ -f m) x (n + m) matrix V o o u is unitary, where V is an n x /i unit matrix and U is an m x m unitary matrix. 11. The moments of inertia of a system of particles of masses m, at positions (^11 Ji' ^1) ^^ defined by the matrix

yl + 2/ -xiyi -X.,Zi J = E'"- -yiXi xf + zj -y.-Zf -Z,Xi -ZiVi ^f+J? Show that i = ~Y.m,Rl where

0 -z,- y> R< = Zi 0 -X

-yi Xi 0 264 CHAPTER 6 PERFECT MATRICES

Deduce that J is a Cartesian tensor of order 2. If the body is continuous, the sums are replaced by integrals. Prove that the diagonal elements of the inertia tensor for a cube are all equal and that the off-diagonal elements are all 0, irrespective of the orientation of the coordinate axes with respect to the cube. 12. The moment of inertia of a rigid body is a second-rank tensor given by

Jo= f p(r)(r-rI-rr)^V, Jv where p{v) is the local density of the body and the subscript "0" on JQ indicates that the origin with respect to which the position vector r is measured is at point O. For a Cartesian coordinate system, r = jci -f- 3^j -f zk and the unit matrix I = ii+jj-fkk. Choosing the point O as the center of mass for any given body, the "principal directions" (i', j', and k') are defined as the Cartesian unit vectors such that the moment of inertia is diagonal, i.e., i^ = J,\\-\-Jj]' + J,\^k,

where the quantities Jy, J2, and ^3 are called the principal moments of inertia. Solve for the principal directions and principal moments of inertia for the following rigid bodies at constant density p(r) = p: (a) Sphereof radius/?. (b) Cylinder of radius R and length L. (c) Rectangular parallelepiped of lengths L^, L2, and L3. (d) Spherical dumbbell consisting of two equal spheres of radius R held together by a rod of zero diameter and length L. (e) Tetrahedral dumbbell consisting of four identical spheres of radius R held together at the center of mass by four equivalent cylinders of zero diameter and length L. 13. Prove that | exp(A)| > 1 if A is positive definite. 14. Prove that for a perfect matrix A the determinant of exp(r A) is equal to expC^ELi^n)- 15. B is positive definite and C is positive semidefinite (x^Cx < 0 for all X € £•„). Define A = B + C

and prove that (a) A is positive definite, (b) |B| < |A|, and (c) B"^ - A"^ is positive semidefinite. 16. If A is positive definite, prove that

Use this to prove that

a relationship known as Hadamard's inequality. PROBLEMS 265

17. If A is self-adjoint, prove that I-f cA is positive definite if 6 > 0 and sufficiently small. What condition must e satisfy for the above to be true? 18. Prove that if A is anti-Hermitian, such that A^ = —A, then (a) The eigenvalues of A (X^) are either purely imaginary or 0. (b) The eigenvectors of A satisfy x/xy = 0 if X, ^ kj. 19. Prove that if A is skew-symmetric, such that A^ = —/A, then (a) The eigenvalues of A (A,) are of the form a, -h /a,, where a, is a real scalar value. (b) The eigenvectors of A satisfy xjx^ = 0 if X, :^ kj. 20. Let A be a real, anti-Hermitian (anti-symmetric) matrix. If z = x -h /y is an eigenvector of A with eigenvalue if^t (where x, y, and /x are all real and nonzero), then show that x^ = 0. 21. Prove that every matrix is uniquely expressible in the form A = H -f- S, where H is self-adjoint and S is anti-Hermitian. 22. Consider the matrix C = A 4- /B, where A and B are both n-dimensional square, normal matrices. Note: i — ^J~—i, (a) Under what conditions is C a normal matrix? (b) If C is normal, what can we say about the eigenvalues of the commutator: [A, B^]? (c) Under what conditions is C a self-adjoint matrix? (d) If C is self-adjoint, do A and B necessarily commute? Explain why or why not. 23. Prove that any n-dimensional matrix C can be written in the form C = A H- /B, where A and B are both n x « self-adjoint matrices. 24. Consider the 2 x 2 matrix equation:

1 0 A^ + = 0. 0 1

(a) Find all possible eigenvalues of the matrix A. (b) Do there exist any real-valued matrices A that satisfy the above equation? Explain why or why not and, if so, give an example. 25. Consider a point mass m moving in a three-dimensional force field whose potential energy is given by

where a and VQ are positive constants and jc, y, and z are the Cartesian coordinates of the particle's position vector. (a) Show that V has a single minimum point. Find the position of that minimum (jc*, y*, z*). (b) Find the normal frequencies and modes of vibration about the minimum. 266 CHAPTER 6 PERFECT MATRICES

26. For the Gaussian distribution function given by Eq. (6.7.52), derive a formula for the average value of the product xxxx. Evaluate the average value of A:^A:| when

4 -1 0 A = -1 4 -1 0 -1 4

27. Prove that the n-dimensional integral:

1 /-^ /"^ . -—^ / ••• / dx^"'dx^exp(-x^Ax) = -~= ^TC J-oo J-oo V" {-by where (elements not indicated are zero)

a b a A = b

and a and b are real with a > b. Hint: Show that A is a normal matrix and can be diagonalized by a unitary transformation. 28. Evaluate the expression

fZo'-'fZo^^i"-^^n exp(-x'^Ax + b'^x) fZo'" fZo ^-^1 • • • At exp(-x^Ax) where A is a real, positive-definite matrix and b is a real vector. 29. Consider the tridiagonal matrix

-(« + /6) a 0 0 ••• 0 P -(« + /3) a 0 ••• 0 A = 0 P -(« + ^) a 0

-(a-i-p)

i.e., a^i = -{a -f P), «,+! , = P, a.^+i = a, and a^j = 0 otherwise. By expansion by minors, it can be shown that

P„{k) = -(« + ^ + A)P„_,(A) - apP„_,{X), (1)

with Po(^) = 1 and P^iX) =-{a + p + A), where P„{X) is the characteristic polynomial for the « x n tridiagonal matrix. Assume that the solution to Eq. (1) is of the form PROBLEMS 267

and prove that

-(a~\-p+k)± y/(a-\-p + A)2 - 4a y± =

and that the general form of P„ is

Evaluate c, and C2 using the special cases w = 0 and n = 1. Show that the eigenvalues A are

Xj =^ -a - p - >JAOLP COS ——-, j = 1,..., n,

and that the component jc,y of the eigenvector x^ is

. . , ,,^sinr7r/(n - / -I- l)/(n + 1)1 / = l,...,n. ' ' ^' ' ^' sin[7rj/(n +1)]

Give the eigenvalues and eigenvectors of A^ 30, Prove the following inequalities for any complex matrix A:

(a)

1=1 i\;=l

(b)

2:|Re(X,)f< E «,7+«;,•

(c)

E|lm(X,)r< E % - «;.-

31. Prove the HSlder inequality

where x^ and y, are real, p > 1, and ^ = /7/(/7 — 1). 32. Prove the determinant inequality

|XA + (1-X)B|>|A|^|BXimi-| x 268 CHAPTER 6 PERFECT MATRICES

for 0 < A, < 1, where A and B are n x n positive-definite matrices. Hint: Note that

• • • / exp(—Ax^Ax — (1 — X)x^Bx) rfjCj • • • dx„ -OO •^—OO

|AA + (l-X)B|^/2

and use the Holder inequality on the left-hand side of the equation. 33. Use the results of Problem 31 to prove

where / and g are real functions, p > 1, and q — p/{p ~\). 34. Show that if A is positive definite and B is self-adjoint, then

njl • • • I exp(—x^Ax — /x^BxdX]) '' • dx„ = n -OO 'f —OO " iA+/Br/2'

35. In the Fig. 6.P.35, a sequence of rotations is outlined by means of which a set of axes (x, j, z) is transferred to a set (JC*, J*, Z*) by three two-dimensional rotations involving angles ^, ^, , which are called the

7^ ^ x' = x" FIGURE 6.P.35 PROBLEMS 269

Eulerian angles. Show that

x*" cos (/) sin 0 0 1 0 0 y* = — sin (f) cos 0 0 0 cos^ sin^ z*_ 0 0 L 0 -sin^ cos^ J

cos ir sin V^ 0 X X X — sin ^ cos xfr 0 y = A y , 0 0 1 z _z ^

where

cos 0 COS V^ cos 0 sin -^ sin 0 sin ^ — sin COS 0 sin y^r + sin cos S cos i/r A = — sin Q cos ^ — sin 0 sin 0^ cos 0 sin ^ — cos 0 cos B sin ^ + cos cos ^ cos 0- sin 0 sin V^ — sin ^ cos i/r, cos^

Show (without forming A^A or AA^) directly that A is orthogonal. The rotations illustrated in the Fig. 6.P.35 are:

(i) Positive rotation about the z axis by the angle ^; new axes x'y'z!. (ii) Positive rotation about the x' axis by the angle Q\ new axes x"y"7!'. (iii) Positive rotation about the z!^ axis by the angle 0; new axes A:*y*z*.

36. Prove that half of the eigenvalues of A are equal to those of B and the other half are equal to those of C if

B O A = O C

What is the relationship among the eigenvectors of A, B, and C? 37. Consider the generalized eigenvalue problem

Ax = XBx.

Suppose

4 -1 0 0 1 4 -1 0 0 -1 4 -1 0 0 -1 4 270 CHAPTER 6 PERFECT MATRICES

and

1 2 0 0 ~2 1 1 2 0 B 2 ~2 1 1 0 2 ~2 ~2 0 0 1 2 ~2

Find the generalized eigenvalues A,, and eigenvectors x, for these matrices. Give the spectral decomposition of the matrix B"^ A. 38. A chemical reactor process may be described by the following equations:

dx 1 i^y dt ""

dt -•^(yo-y)- K-\-y'

where 9 is the holding time; y^ is the constant-feed concentration of species y; and a, /x, and K are positive rate constants. Concentrations must lie in the ranges jc > 0 and 0 < y < yo- Find the two steady-state solutions to these equations. Determine the ranges for the quantities ^, yo, a, /it, and K for which each of the steady-state solutions is stable to small perturbations. For a microbial population for which yo is nutrient and /x = 1/h, A' = 0.2 g/L, and a = 2, find the holding time 6 for which a feed of yo = 0.3 g/L will produce at steady state a concentration jc = 0.1 g/L of microorganisms. 39. Evaluate the integral

-OO y.0O /2^'2/ ^ / .. / exp(-x'^Ax);cf'•jcf dx^---dx„ Jo Jo

in terms of appropriate derivatives of the determinant of A for r, / zero or positive integers. Assume A is an w x n, positive-definite, real matrix. 40. If A and B are n x n, real, positive-definite matrices, evaluate

/^^ = / ... / exp(—x^ABx) dxi • • • dx„. Jo Jo

Is AB positive definite? Prove your answer. What is the value of I„ for the special case

2 1 1 1 A = B = 1 2 2 3 PROBLEMS 271

41. Consider two simultaneous reactions involving n components with stoichiometric equations

n J:V„JCJ=0, a = 1,2,

where v„y represents the stoichiometric coefficient of components / in the ath reaction. The rates of transformation of components are given by

dc, / = 1,... ,a, It '7 = i:%(Krf\cf'-KfhcT\ a=l \ ;=:1 7 = 1 / where k„f and k^^ are the forward and reverse reaction rate constants of reaction a and fi^j and y„j are constants. At equilibrium, dcy/dt = 0 and the compositions c, = c? can be determined from the above equations. Derive the conditions under which the equilibrium will be stable to small fluctuations. 42. (a) Consider the symmetric matrix

1 2 A = 2 1 Find the solution to the problem

dx Ax, x(0) = 1^ 1=0 (b) Give the general form of the eigenvector solution to the equation

dH , d\ —J = Ax, x(0) = Xo, = Xo, dt^ dt /=0 where A is an n x n self-adjoint matrix. 43. A chemical reactor process is described by the following equations:

dcA "^'^Kwv)^^

Vc,p^ = qc^piT, -T)- k,exp(^-^y^V(AH) - U(T - TJ

Data: €/R = 10" K i-AH/CpP) = 200 K L/gmol iU/VCpP) = l min-' q/V = lmm-^ kg = e^' min~' f^Ao = 1 gmol/L Tf = T, = 350 K. 272 CHAPTER 6 PERFECT MATRICES

(a) Verify that the following are steady-state solutions of the problem:

7; = 354 K c^^ = 0.964 gmol/L T, = 400 K c^^ = 0.5 gmol/L T, = 441 K c^^ = 0.0885 gmol/L.

(b) Check for the stability of each of the three steady states. (c) While the reactor is operating at the steady state 73 = 441 K and c^ = 0.0885 gmol/L, a small perturbation brings the system to a new state (7^ = c^.). Find T and c^ as a function of time after that small disturbance. 44. Two masses are suspended by springs as shown in Fig. 6.P.44. We can neglect gravity. If JC, is the displacement of mass / from its motionless or static position, the motion of the masses is governed by the equations

m, — KtXt "T" '^2V"^2 -^1/ dhc2 ntj dt' — ~^2(-^2 -^l)*

(a) Let mj = m2 = m. Show that the independent vibrational motions (independent modes) of this system can be deduced from the eigenproblem

Ax = Ax,

where

/Ci "T" K'2 Kn A = m

mi

7779 FIGURE 6.P.44 PROBLEMS 273

(b) Let ki = 1^2 ^^^ kjm =: \. Suppose that, initially, x^ = —0.5 and X2 = 0.5. Plot Xi (t) and X2(t) versus time t. Also plot the positions f j (t) and ^2(0 of the independent modes versus time t, 45. Consider the two-mass system shown in Fig. 6.P.45. This is a simplified schematic of a vibratory feeder for moving particulate material. The top mass ("feeder trough") is angled with respect to the floor and is designed to convey material through high-frequency oscillations induced by the lower mass ("exciter"). The set of springs fixing the feeder trough to the floor (isolation springs) are assumed to have a combined effective spring constant of Ki, and the springs attaching the exciter to the trough (reactor springs) have an effective spring constant of K^. The feeder trough and exciter have masses of Mj and M2, respectively, and the heights of the centers of mass are labeled as a^ and ^2 (as shown in the figure) for the case when there is zero tension on both sets of springs. (a) By defining the displacement of My and M2 from a^ and ^2 by the vector

X = Z2-a2

show that the equations of motion for the two masses can be written in the form ^ = Ax + b. (1)

Express A and b in terms of the system masses, spring constants, and g (the acceleration due to gravity). Note: In this case, gravity cannot be neglected. (b) Transform the above coupled system in Eq. (1) to a decoupled system by introducing the transformation ^ = X~^x, where X~^AX = A. (c) Prove that the natural frequencies (eigenvalues of —A) are real and positive for all real, positive values of Mj, M2, A^j, and Kj^. (d) Find the steady-state solution x^^^ to Eq. (1). What does this state represent physically? (e) Under normal operations, a motor-driven forcing function is applied to the exciter (lower mass) of the form F(0 = K sin cot in the z direction. By defining a new displacement vector u = x — x^^^ solve for

FIGURE 6.P.45 274 CHAPTER 6 PERFECT MATRICES

displacement of each mass, assumed to be initially at rest, as a function of time. What operation frequencies co lead to instabilities in this system? (f) In actual operation, oscillations are damped by air resistance and through the dynamics of the particulate material being transported. As an approximation, assume each vibrational mode is individually damped by a force factor of the form —K^ d^i/dt, / = 1, 2, in the decoupled system. Solve for the total stroke (defined as twice the amplitude) of the feeder trough in the large time limit (i.e., as r -> oo). What effect does damping have on the natural frequencies and the trough stroke? 46. A damped pendulum in small oscillations obeys the equation

(f'O dd_ de 0{t = 0) = a, {t = 0) = P. ~dt dt

where 0 is the angle of displacement of the pendulum and k^ and k2 are positive constants. Setting x^ =6 and X2 = dx^/dt, we can convert this to the system

dx 0 1 = Ax, A = dt —/C2 —k^

where

x(t = 0) = Xo =

From the spectrum of A, determine the relationship between k^ and /c2 such that the motion is (a) periodic, (b) damped periodic, and (c) overdamped. 47. Consider the augmented matrix

1 1 -8 -14 [A,b] = 3 -4 -3 0 2 -1 -7 -10

(a) Use simple Gauss elimination to find the rank of [A, b]. What is the rank of A? (b) How many of the column vectors of [A, b] are linearly independent? Why? (c) If the problem Ax = b (1) has a solution, find the most general one. If it does not have a solution, why not? (d) Evaluate the traces, tr^ A, 7 = 1,2, 3, of the matrix A. (e) Find the eigenvalues of A and At. (f) Is A perfect? Why? Can A be diagonalized by a similarity transforma- tion? Why? PROBLEMS 275

(g) Is there a solution to the homogeneous problem

A+z = 0?

What are the implications of the answer to this question to the solvability ofEq.(l)? 48. Consider the following predator-prey model for big fish with population x^ (predators) and little fish with population x^ (prey):

dx It (a) Using the Newton-Raphson method, find the nontrivial steady-state populations JC^^^ and x^^^ of little fish and big fish for the case where K = 1000, r = 100, Of = 50, ^ = 12, it = 10, and /i = 2. (b) At steady state, would the population be stable to small perturbations, i.e., if at t^ the population densities were perturbed to jc„ = xj^^^ + €„ and Xp = jc^^^ +€p,€^ and €p quite small, would the population return to steady state in time? Why? 49. The generalized eigenvalue problem, with two n x n matrices A and B, is defined by the equation

Ax = A.Bx.

(a) Assume that the determinants of A and B are not 0 and prove that

det(A) = det(B)n^r-

(b) If A and B are positive definite, show that X is positive. (c) If

2 1 2 -1 A = B = 1 2 -1 2

find the eigenvalues and the eigenvectors, (d) Solve the differential equation

dx r 1 B—=Ax, x(f = 0) = dt 0

where A and B are given in part (c). 276 CHAPTER 6 PERFECT MATRICES

i. C'2 rn rT\

FIGURE 6.P.50

50. For the electric circuit shown in Fig. 6.P.50, the currents /] and Ij obey the equations

+ M •• EQ sin cot df + (1) '2 , h W^ + ^2^ + 7f = £;cosv/

where L, and M are inductances, C, are capacitances, EQ, E'Q are electric fields, and co and v are frequencies. Show that Eq. (1) can be expressed in the form

— = Ax + b, dt where x and b are four-dimensional vectors and A is a 4 x 4 matrix. Under what conditions will the currents I^ and 4 be purely oscillational in the case EQ = EQ = 0? Sketch the current versus time for the case ^o 7^ 0, EQ^O under the same conditions. 51. A system whose transient behavior is described by

dx = Ax It may be unstable. A forcing function f can be imposed on the system to yield

dx = -Ax + f. dt In terms of eigenvalue analysis, describe how a "proportional" forcing function.

f = Kix, PROBLEMS 277

FIGURE 6.P.52

or a differential forcing function,

can be used to stabilize the system. Can stability always be obtained if K| and K2 are suitable constants times the unit matrix? 52. Consider the three beads on the string shown in Fig. 6.P.52. The string is under tension T, its total length is 61, and its mass is negligible. The mass of each bead is m and the ratio ml = U'

For small displacements, the equations of motion for the beads are

= —rXx {X2-X^) m 21

2 =-Yi^-2 •^1)+2/^-^3--^2) m-dt d^x. m dt^ = -:^(-^2/''^3' --^2)'^'^ - r^'

(a) Transform these equations into the set

d^v -^=co^yi, / = 1,2,3,

by a linear transformation of the jc,. Give the relationship between x, and J, and the values of co^. (b) If, initially, JC, = 0.02/, X2 = 0.04/, x^ = 0.02/, and dxi/dt — dx2/dt = dx^/dt = 0, find JC, as functions of time. (c) Suppose each bead is subjected to an external force equal to / sin cot at t = 0. Solve this problem / with the same initial conditions. What happens if ft) = co^l 53. Is the following matrix positive definite; i.e., is

Q = x^Ax > 0 278 CHAPTER 6 PERFECT MATRICES

for all real and complex vectors x 7^ 0?

1 2 1 A = 1 3 2 2 1 4 Is Q = x^Ax > 0 for all real vectors x ^ 0? Is A^ positive definite, where

A, = -[A + A^]?

As is the symmetric part of A.

FURTHER READING

Bellman, R. (1970). "Introduction to Matrix Analysis." McGraw-Hill, New York. Noble, B. (1969). "Applied Linear Algebra." Prentice Hall International, Englewood Cliffs, NJ. Noble B., and Daniel, J. W. (1977). "Applied Linear Algebra" Prentice Hall. Englewood Cliffs, NJ. Parlett, B. N. (1980). "The Symmetric Eigenvalue Problem." Prentice Hall International, Englewood Cliffs, NJ. Watkins, S. W. (1991). "Fundamentals of Matrix Computations." Wiley, New York. Wilkinson, J. H. (1965). "The Algebraic Eigenvalue Problem." Clarendon, Oxford. 7 rMPERFECT OR DEFECTIVE MATRICES

T.I.SYNOPSrS

We have found that matrices with nondistinct eigenvalues may or may not be per- fect. If the number of eigenvectors corresponding to the eigenvalue A, is less than the multiplicity p^,, then we say that the matrix is imperfect or defective. Thus, by definition, an nxn matrix A is imperfect if it has fewer than n eigenvectors. In this chapter we examine the properties of imperfect matrices and, when possible, we will extend the theorems and properties, or present analogous relations, presented in Chapter 6 for perfect matrices. We will begin by proving that for any square matrix A there exists a non- singular matrix Q that, in a similarity transformation, transforms A to the Jordan canonical form, i.e., Q~^AQ = Aj, where the partitioned matrix Aj has Jordan block matrices J, along the main diagonal and O's elsewhere. The Jordan block J is a square matrix whose elements are 0 except for those on the main diagonal, which are all equal, and those on the first diagonal above the main diagonal, which are all equal to 1. We will see that, analogous to perfect matrices, if f(t) can be expressed as a Taylor series in t, then /(A) = Qf{Aj)Q~\ where /(A) is a block diago- nal matrix with the elements /(J,) along the main diagonal and O's elsewhere. The exponential of a A: x /: Jordan block is, therefore, of the form

exp(rj) = exp(ar)K(0,

279 280 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

where a is the value of a main diagonal element of J and

t' 1 t 2! (k-\)\ fk-2 0 1^. K(f) = {k-2)\

0 0 0 .

The general solution to dx/dt = Ax, x(0) = x{t = 0), is, therefore.

exp(Xir)Ki 0 0 x = Q 0 exp(A.^f)K2 0 Q-^x(O), 0 0 exp(kj)K,

where exp(X^OKi = exp(rj-), J, being the iih Jordan block of A. The column vectors of Q contain all of the eigenvectors of A. If A is imperfect, then Q has more column vectors than there are eigenvectors of A. The excess column vectors are called generalized eigenvectors. From the Jordan canonical transformation theorem, we can deduce an algorithm for calculating the generalized eigenvectors for any imperfect matrix. We will see that the Fredholm alternative theorem must be invoked in computing the generalized eigenvectors and examples of the calculation of Q will be given. We will also see an application of an initial value problem for which the companion matrix is imperfect. Finally, we will show that any square matrix, perfect or imperfect, can be expressed in Schmidt normal form as

n

1=1 where {x,,...,x„} and [y^,... ,y„] are orthonormal basis sets and K^ are positive real numbers. The vectors {x,} are eigenvectors of A^A and AA^ respectively, and Ki are the positive square roots of the eigenvalues of A^A (or of AA^ since its eigenvalues are the same as those of A^A).

7.2. RANK OF THE CHARACTERISTIC MATRIX

The eigenvector or eigenvectors of the matrix A, corresponding to the eigenvalue A,, are obtained by solving the characteristic equation

(A - kil)Xi = 0. (7.2.1)

According to the general solvability theorem, the number of solutions to Eq. (7.2.1)

IS n — r A-A,I' where n is the number of columns of A (and thus the dimension of X,) and r^_x i is the rank of the characteristic matrix A — XJ. In Section 6.4 we RANK OF THE CHARACTERISTIC MATRIX 28 I

showed that if X^ is a distinct eigenvalue, then Eq. (7.2.1) admits only one linearly independent eigenvector. The starting point of the proof was the relationship

^(A) = (-iyj\ir„_j(A - AI), (7.2.2)

where P„(A,) = |A — Xl\ is the characteristic polynomial. Suppose the multiplicity of the root X^ is /?,. Then the characteristic polynomial can be factored as

P,(X) = {k,-kr^ n'(^-^) (7-2.3)

from which it follows that

0 ifk

From Eqs. (7.2.2) and (7.2.4), it follows that

tr„_^^.(A-X,I)/0, (7.2.5)

which proves that at least one (n — /7-)th-order principal minor is not 0. This, in turn, implies that

^A-x,i > « - A . (7.2.6)

If the eigenvalue is distinct, i.e., p- = \, then Eq. (7.2.6), plus the fact that the rank of A — XJ is less than n, requires that r^_x i = n — 1. If /?, > 1, then we only know that

n-r^-,,i

which means that there might be fewer eigenvectors than p^. Since X!/ Pi = n, Eq. (7.2.1) implies that

EO'-^x-x^)

which means that a matrix might be imperfect if any eigenvalue is degenerate. As pointed out in Chapter 5, having degenerate eigenvalues is a necessary but not a sufficient condition for a matrix to be imperfect. For example, the character- istic polynomial of I is P„{X) = (1 —A)", and so Xj = 1 and p^ = n. Nevertheless, I is a perfect matrix because any basis set {x,,... ,x„} in E„ forms a linearly independent set of eigenvectors of I. 282 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

On the other hand, consider the /c x A: matrix

a 1 0 •• • 0 0 0 a 1 •• • 0 0 (7.2.9)

0 0 0 •• a 1 0 0 0 •• • 0 a

which has a's on the main diagonal, I's on the diagonal just above the main diagonal, and O's everywhere else. The characteristic polynomial for this matrix is

p,(A)=:(a-x)^ (7.2.10)

and so Xi = a and p, = k. However, the rank of J — X^l is fc — 1, which means that J has only one eigenvector, namely,

e, = (7.2.11)

A matrix of the form of J is known as a Jordan . In what follows, it will be shown that Jordan block matrices play a fundamental role in the theory of imperfect matrices.

7.3. JORDAN BLOCK DIAGONAL MATRICES

The k X k Jordan block matrix J was defined in Eq. (7.2.9). It is a square matrix whose elements are 0 everywhere except on the main diagonal, which are all equal to a, and those on the first diagonal above the main diagonal, which are all equal to 1. A Jordan block matrix has only one eigenvalue of multiplicity k, and only one eigenvector, namely, the ^-dimensional unit vector e,. We define a Jordan block diagonal matrix as a partitioned matrix of the form

J, 0 0 0 0 J2 0 0 A,= (7.3.1)

0 0 0

where the matrices J, along the main diagonal of the partitioned matrix are Jordan blocks and all the other matrices are 0. An example of a Jordan block diagonal JORDAN BLOCK DIAGONAL MATRICES 283

matnx is

^1 1 0 0 0 0

0 ^1 1 0 0 0 J> Oi 0 0 0 0 0 A.= "21 "23 (7.3.2) 0 0 0 A2 1 0 0„ 0 J3 0 0 0 0 0 32 0 0 0 0 0 X,

where

0 [0 01 J. = 0 1 On = 0 0 0,3 = 0 0 [o oj 0 0 0 ©21 = (7.3.3) 0 0 0 ©23 =

03, = [0,0,0], ©32 = [0,0] J3 = X3.

In the above example, there are three Jordan blocks of dimension 3, 2, and 1, respectively. And since |Aj — AI| = (A,j — A)^ (A.2 — A,)^ (A.3 — A), there is an eigenvalue X^ of multiplicity 3, X2 of multiplicity 2, and A3 of multiplicity 1. The characteristic matrix Aj - A, I has the rank 5 for all three eigenvalues. Thus, the 6x6 matrix has only three eigenvectors. These are

1 0 0 0 0 0 0 0 0 X, = , X2 — X3 = (7.3.4) 0 1 0 0 0 0 0 0 1

This result points out an important aspect of a Jordan block diagonal matrix. Let us partition the eigenvectors of Aj and the unit matrix I to conform with the block diagonal form in Eq. (7.3.2); i.e., let us write

rx<»] I<" 0 0 X = x<2> 1 = 0 I<2) 0 (7.3.5) [x<3)J 0 0 I")

where x*" is a three-dimensional vector, x*^^ a two-dimensional vector, and x*" a one-dimensional vector, and I"\ I*^', and I*^* are 3 x 3, 2 x 2, and 1 x 1 unit matrices. Then the eigenproblem given by

(A. - AI)x = 0 (7.3.6) 284 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

now becomes

J, - AI<'* 0,2 0, TxW " 0<" 1 x(2' 0(2) 0, J, - AI'2> 0.2 3 = (7.3.7) 0. J3-AI«> Lx"> 0*3) J

or

(J,. - XI<'>)x<'> = 0"', i = 1,2,3. (7.3.8)

Three linearly independent solutions to these equations are

1 .0) 0 A — A J, X(1j ) _— C^(1 ) 0 = 0«' , xf'=0"'=0, 1 0 o_ o" „(2) (2) r A — A.?' ^2 — — 0 , A2 , xf = 0<3> = 0, (7.3.9) er = 0 o_ 'o 'o A — A^, XT — U xf = 0<2' = , xf> = ef> = l. 0 0 0 The three solutions fxPI X,. = xp / = 1,2,3, (7,3.10) xf' J correspond to the three eigenvectors given by Eq. (7.3.4). If, in the general case Eq. (7.3.1), we partition the eigenvectors as

X = (7.3.11)

As)

we can reduce the eigenproblem

(Aj-AI)x = 0 (7.3.12)

to

(J, - A.I^^"^)x^''^ = 0^'\ / = 1,...,5, (7.3.13) JORDAN BLOCK DIAGONAL MATRICES 285

which yields

0(2)

QO-l) X, = where AjX, = X,x,, / = 1,..., 5'. (7.3.14)

00+1)

O(^)

What we have just seen is that the eigenvalues of a Jordan block diagonal matrix are the eigenvalues of each of the Jordan blocks and that if the n x n matrix Aj is composed of s Jordan block matrices, then Aj has exactly s linearly independent eigenvectors. These eigenvectors are n-dimensional unit vectors e, of the form given by Eq. (7.3.14). If we designate the dimension of a Jordan block as /:,, then it follows that X]f=i ^t = ^' By carrying out the matrix multiplication, it is easy to show from Eq. (7.3.1) that positive powers of Aj are given by

Jt 0 0 0 0 J* 0 0 A5 = (7.3.15)

0 0 0

where ^ is a positive integer. If none of the eigenvalues is 0, then the inverse Aj' exists and is of the form

•jr' 0 0 1 0 0 -1 J2' • J = (7.3.16) 0 0 •• J;'J

as can be verified by showing that the product Aj^Aj equals the unit matrix. Similar to Eq. (7.3.15), the *th power of AJ^ is

Jr* 0 0

0 J2»2 * 0 A- = (7.3.17)

0 0 286 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

From Eq. (7.3.15) it follows that if f(t) = E£o A^^ then

J\ 0 0 0 J5 0 /(Aj) = EA (7.3.18a) k=0 0 0 I*

or

/(J,) 0 0 0 /(J2) 0 /(Aj) = (7.3.18b)

0 0 /(J.) If Aj is not singular, then it follows from Eqs. (7.3.15) and (7.3.17) that if f{t) = X!£_oo A'*' the function /(Aj) is also of the form given by Eq. (7.3.18b). We saw in the previous chapter that if A is perfect the solution to the equation dx/dt = Ax is an exponential function of time. What if A is imperfect? The simplest such case is d\ Jx, x(0) = x(( = 0), (7.3.19) Tt where J is a fc x fc Jordan block. In terms of its components, this vector equation becomes dx^ — A,X\ ~t" Xy

dxj, A.X'j "T~ X'l dt (7.3.20)

dxu — ^^k-\ + ^k dt dx^ — KXu, dt The solution to these equations can be obtained by solving first the fcth equation, then the {k — l)th, etc. From this we obtain

Xj^ = exp(A.r)jc^(0), (7.3.21)

then

— (exp(—AOA:jt_i) = exp(-AO^)k = Xj^{0) dt (7.3.22) jjf^j = exp(XO^it_i(0) 4- exp(>-0^-^jt(0)» JORDAN BLOCK DIAGONAL MATRICES 287

and so on until

xj^_^ = exp(Xt)Xf,_j{0)'^-exp{Xt)tXf,_j^ii0)

fJ (7.3.23) 2! ^ j\

In vector form these equations become

X = exp(A/)Kx(0), (7.3.24)

where

^fe-3 ^k-2 1 t 2! (A:-3)! (it-2)! (k - l)\ ,k-3 0 1 (it-4)! (it-3)! (it-2)!

K = (7.3.25)

0 0 0 ••• 1 t f

0 0 0 ••• 0 1 t

0 0 0 ••• 0 0 1

Since the formal solution to Eq. (7.3.19) is

X = exp(rj)x(0) (7.3.26)

for an arbitrary initial condition x(0), Eqs. (7.3.24) and (7.3.26) prove that

exp(rj) = exp(XOK(0. (7.3.27)

For a Jordan block, then, the elements of the exponential function exp(r J)x(O) are not linear combinations of exponential functions of time, but are rather linear combinations of exp(A.r)r^ y == 0,..., A: — 1. In fact, asymptotically.

||exp(fj)|| -> exp(X/)r* (7.3.28)

Although the asymptotic form of ||exp(rj)|| is not a simple exponential function of time, it still follows that whether

X = exp(rj)x(0) (7.3.29)

tends to 0 or cx) will depend on whether the real part of the eigenvalue k is positive or negative. 288 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

Since exp(fJi) 0 0 0 expC/Ji) 0 expC^Aj) (7.3.30)

exp(f J,) J

it follows that exp(AiOKi 0 0 0 exp(X2t)K2 0 expo A j) = (7.3.31)

exp(X,OK,

and so the solution to dx x(0) = xO = 0), (7.3.32) dt ^

IS

~exp(A.iOKi 0 0 0 exp(X20K2 0 X = x(0). (7.3.33)

0 0 • • • exp(A.,OK, J If all of the Jordan blocks are 1 x 1 matrices, Aj is just a diagonal matrix and the diagonal elements of the above matrix become simple exponentials.

7.4. THE JORDAN CANONICAL FORM

Why was the Jordan block diagonal matrix discussed in such detail in Section 7.3? The answer is that this form, known as the Jordan canonical form, is the closest we can come to diagonalizing an imperfect matrix through a similarity transformation. The theorem stating this is: THEOREM. IfAisannxn square matrix, there exists a nonsingular matrix Q such that

J, 0 0 0 J, 0 Q 'AQ = Aj = (7.4.1)

0 0

where J, are Jordan blocks. The same eigenvalue may occur in different blocks but the number of different blocks having the same eigenvalue is equal to the number of independent eigenvectors corresponding to that eigenvalue. The number of distinct eigenvectors of A equals the number of Jordan blocks in Aj. THE JORDAN CANONICAL FORM 289

The proof of this theorem is rather detailed and will not be reproduced here. Instead, the consequences of the theorem will be explored. In the special case that A is perfect, the theorem becomes the spectral resolution theorem and all Jordan blocks are 1 X 1 matrices (i.e., J, = X,, / = 1,..., n). From what we learned in the previous section, we know there are s linearly independent eigenvectors of Aj, one for each Jordan block. For a given eigenvalue A.,, the rank of Aj — X,I will determine how many eigenvectors correspond to this eigenvalue. Since the rank of a matrix is not changed when it is multiplied by a nonsingular matrix, the rank of Q(Aj — A,I)Q~^ = A — A.,1 is the same as the rank of AJ — A,I. This establishes that if Q transforms A, as shown in Eq. (7.4.1), then the number of distinct eigenvectors of A equals the number of Jordan blocks. In fact, if y^ is the eigenvector of Aj arising from the /th Jordan block, i.e., if

0(1)

00-1) y. (7.4.2) OO'+i)

0^0

then the eigenproblems,

(Aj-A,I)y,=0, (7.4.3)

can be transformed into

Q(Aj ~ AJ)Q-Qy, =0 or (A - A,I)x, = 0, (7.4.4)

where

Qy« (7.4.5)

Thus, if Q is known, the eigenvectors of A can be obtained from the eigenvec- tors of A J by the simple relationship x, = Qy,. Since the eigenvectors y, are n- dimensional unit vectors e^, this relationship establishes that the eigenvectors x, of A are some subset of s column vectors of Q. The remaining n — s column vectors of Q are called generalized eigenvectors. In the next section, we will discuss how to calculate the generalized eigen- vectors. First, however, let us consider some of the implications of the theorem. Note that

Q 1 A^Q = Q-^ AQQ^ AQ Q^ AQ = Aj, (7.4.6) 290 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

where A: is a positive integer. Thus, if f(t) is a series function of t, f(t) = Er=o/*«*.then

/(J,) 0 0 0 fih) 0 Q-'/(A)Q = /(Aj) (7.4.7)

fQs)

or fih) 0 0 0 /(J2) 0 /(A) = Q Q'. (7.4.8)

0 0 fih) If A is nonsingular, then Eq. (7.4.8) also holds if fit) is a Laurent series in t, i.e., /(o = Er=-oo//- The general solution to dx — = -Ax, x(0) = x(t = 0), (7.4.9) dt now becomes

exp(A,,OK, 0 0 0 e\p{X2t)K2 0 x = Q Q-'x(O), (7.4.10)

0 0 exp(A,jOKj J

where, according to Eq. (7.3.22),

,2 fk,-l I'- ,*,-2 1 t ll iki~2V. ik, - 1)!

0 1 t (fc;-3)! (^,-2)! K, (7.4.11)

0 0 0 1

0 0 0 0

Just as in the case of 2i perfect matrix, if all we want to know is whether ||x|| -> 0 or oo as f -> (30, then all we need to know is whether the real parts of the eigenvalues of A are negative or positive. The solution to Eq. (7.4.9) for an imperfect matrix is a linear combination of terms of the fonn T exp(A,0» where the integers v depend on the sizes of the Jordan blocks. THE JORDAN CANONICAL FORM 291

ILLUSTRATION 7.4.1 (Damped Oscillator). Consider the case of the damped oscillator depicted schematically in Fig. 7.4.1. This illustrates a problem in control theory involving a second-order system and is related to the way shock absorbers on an automobile work. We denote the position of the mass m by M. The restoring force acting on the mass when it is displaced from its equilibrium position (defined in the absence of gravity) is —k(u—UQ), where k is the spring constant. The force of gravity is ~mg, where g is the acceleration of gravity. In addition, a viscous force of —r]du/dt is exerted by a dash pot (a plunger being pulled through a viscous fluid). Here, the damping factor rj is proportional to the viscosity of the fluid. The equation of motion for the mass is

dhi du m It mg. (7.4.12) With the definitions

32 = ? = t/r = 1, (7.4.13) 2y/mk' mg and uk (7.4.14) mg' ^ e' the equation of motion becomes in dimensionless form d^y dy (7.4.15) ^^^+2C-+dx^ dx , = V^. We convert this to a first-order system of equations by defining

x^=y (7.4.16) dx' to obtain dxi = Xo dx (7.4.17) dxy —^ = -Xi - 2^X2 4- f dx

FIGURE 7.4.1 Damped oscillator. 292 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

or dx = Kx + b, (7.4.18) dx where

0 0 1 X = b = K = (7.4.19) -1 -2?

The eigenvalues of K are

Ai=-f-Vf'-l and ^2 = -? + \/? 1. (7.4.20)

With the above solution, we note that if f > 1 both eigenvalues are real and negative. Likewise, if f < 1, both eigenvalues are complex, but have negative real parts. If f = 1, A-i = ^2 = — 1. The solution to the first-order system is

y = exp(Kr)yo, (7.4.21)

where

y = X - Xp and yo = x(0) - x^, (7.4.22)

and

x„ = -K^b. (7.4.23)

(i) Discuss how a disturbance given by y(r = 0) = 1 decays in time for the cases ^ > 1, f < 1, and ^ = 1 in terms of the properties of K, i.e., when K is perfect and when it is not. (ii) Evaluate exp(KT) for the three cases f < 1, ^ > 1, and f = 1. In what follows assume that the datum of the coordinate system is chosen so that ^ = 0 (and so b = 0). Also, consider a disturbance such that Ji = 1 and y^ = 0 at r = 0. (iii) For ^ < 1 the system is an oscillatory, underdamped system. Give the solution for y^ versus r for this case. (iv) For f > 1 the system is a nonoscillatory, overdamped system. Give the solution for y^ versus r for this case. (v) For f = 1 the system is critically damped. Give the solution for y, versus r for this case. (vi) Plot yi versus r for the values f = 0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, and 1.6. The Jordan canonical form enables us to prove for an arbitrary square matrix the Hamilton-Cayley theorem:

HAMILTON-CAYLEY THEOREM. An arbitrary square matrix satisfies its own characteristic equation. THE JORDAN CANONICAL FORM 293

To carry out the proof, we need to define the .

DEFINITION. We say that the square matrix N is nilpotent of index p if W-^ ^ 0 and N'' = 0. Consider tht k x k matrix

0 1 0 0 •• • 0 0 0 1 0 •• • 0 H = 0 0 0 1 • 0 (7.4.24)

0 0 0 0 •• • 0

i.e., all elements are 0 except on the diagonal just above the main diagonal, all of whose elements are unity. Consider what happens when a matrix is left-multiplied byH:

0 1 0 0 •• 0" a,, «12 «U 1

0 0 1 0 •• 0 «21 «22 "ik HA

0 0 0 0 •• 1 «t-i,i ^k-l,2 ^k-l,k 0 0 0 0 •• • o_ - «*! «t2 Okk J (7.4.25) ^21 "22 "2fe

«32

%2 0 0 0

Multiplication of H by A eliminates the top row of A, raises each other row up one, and replaces the bottom row by 0*s. Thus, the product H^ will have only O's in the last two rows and the (k — 2)th row will be 0, 0,..., 0, 1. Continuing the process k — \ times will place O's in rows 2,..., ^ and the first row will be 0,0,..., 0, 1, i.e..

0 1 0 0 H k-\ (7.4.26)

0 • . 0 0

Finally, multiplication of Eq. (7.4.26) by H yields

H*=0, (7.4.27)

which, since H*~^ ^ 0, proves that the nilpotent index of a A; x /: matrix of the form of Eq. (7.4.24) is k. 294 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

Next, consider the characteristic polynomial of an arbitrary matrix A. In fac- tored form, it can be expressed as

(7.4.28) 1 = 1

where k^ is the dimension of the Jordan block J, appearing in the Jordan canonical form Aj given in Eq. (7.4.1). Let us consider the Jordan block J^ and examine the function P„(jp, which can be expressed as

^n(J>) = n(M*'^'-J;)* (7.4.29) i=l

where I**^' is a kj x kj unit matrix. The factor in the product corresponding to i = j is

iXjr''^-Jjf'=(-lfiu]', (7.4.30)

where Hj is a kj x kj matrix of the form defined by Eq. (7.4.24). We have just proven that the nilpotency of such a matrix is kj, and thus it follows that

/'„(!;) = 0, 7 = 1 s. (7.4.31)

But, according to Eq. (7.4.8),

0 0 P„(J2) 0 ^(A) = Q Q (7.4.32)

0 0 PniJs) which proves that

P„(A) = 0 (7.4.33)

and establishes the Hamilton-Cayley theorem.

7.5. DETERMINATION OF GENERALIZED EIGENVECTORS

If A is a 2 X 2 defective matrix, the Jordan block diagonalization theorem gives

Xi 1 AQ = Q (7.5.1) 0 X,

or 1 A[qi,q2] = [qi,q2] (7.5.2) X, DETERMINATION OF GENERALIZED EIGENVECTORS 295

where q^ and q2 are the two-dimensional column vectors of Q. Carrying out the matrix multiplication in Eq. (7.5.2), we obtain

[Aq,, Aq2] = [A-iq^, X,q2 + qj. (7.5.3)

Equating the two column vectors, we find

Aqi =X,qi (7.5.4)

and

(A-A,il)q2 = qi, (7.5.5)

We see that the column vector q^ is, in fact, the eigenvector of A, whereas q2 satis- fies an inhomogeneous equation. Recall that, according to the solvability theorem, Eq. (7.5.5) has a solution if and only if the rank of A — A,I is the same as the rank of [A — Xil, q,]. Or stated as the Fredholm alternative theorem, Eq. (7.5.5) has a solution if and only if

z^q^ = 0, where z satisfies (A^ — k\l)z = 0. (7.5.6)

The matrix 4 4 3 3 A = (7.5.7) 1 ? 3 3 J can help to illustrate the process. In this case, the characteristic polynomial is P2 = (-A)^ + 4(-A,) + 4, whose root A, = 2 is of multiplicity 2. The rank of A — 21 is 1, and so there is only one eigenvector. We easily find it to be

X, = (7.5.8)

which we set equal to qj. The solution to (A^ — 2I)z = 0 is

-~1 (7.5.9) z = 2

By Eq. (7.5.6) we find that the required condition, z^qj = 0, is obeyed, and so the equation

4 3 ^12 (7.5.10) ^22 296 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

is ensured a solution. Either ^12 ^^ ^22 ^^^ ^^ chosen arbitrarily. With ^22 = ^» the solution to Eq. (7.5.10) is

-3 q2 (7.5.11) 0

and 0 1 2 -3 Q-' = (7.5.12) 1 0 1 2 L "3 3 J The reader can easily verify that

2 1 Q 'AQ (7.5.13) 0 2

Consider now the general case of an n x n matrix:

J, 0 0 0 h 0 AQ = Q (7.5.14)

0 0

where the dimension of each Jordan block J^ will be denoted by A;,. It is convenient to repartition the matrix

Q = [q,,...,qJ, (7.5.15)

Q = [Q*",....Q<^'], (7.5.16)

where each matrix Q*'* contains A, of the n-dimensional column vectors of Q, which will be relabeled as

(7.5.17)

for convenience. With this partitioning, Eq. (7.5.14) becomes

J, 0 0 0 J2 0 A[Q<'\...,Q<'>] = [Q<" Q<^'] (7.5.18)

0 0

or

[AQ<",...,AQ«] = [Q<'>J.,...,QWjJ. (7.5.19) DETERMINATION OF GENERALIZED EIGENVECTORS 297

Equating partitioned components of Eq. (7.5.19) yields

AQ^'^ = Q^'^J,., / = 1,...,5. (7.5.20)

The result is that the generalized eigenvectors of each Jordan block can be deter- mined separately. Using Eq. (7.5.17) in Eq. (7.5.20), we get the system of equations

Aqf = X,qf> (A-A,.I)qf = ql'' (A-A,I)q«=qf (7.5.21)

Thus, as in the 2 x 2 case, the first column vector in Q^'^ is the eigenvector cor- responding to X, and the other generalized eigenvectors have to be determined by solving the appropriate inhomogeneous equations. If each Jordan block has a different eigenvalue, then the multiplicity of each eigenvalue determines the dimension of each Jordan block and the system repre- sented by Eq. (7.5.21) can be solved for each block without further consideration. However, suppose we have the case

Jr 0 1 Aj = (7.5.22) 0 J2J where the eigenvalues of Ji and J2 are the same, say Aj. A priori, we know from the rank of A — Ajl that A has only two eigenvectors and, therefore, only two Jordan blocks. However, we do not know the dimension of the blocks. For an n X n matrix, Ji can have any dimension ranging from ^^ = « — 1 to 1 as long SiS ki + k2 = n. The strategy for finding the appropriate Jordan blocks and the generalized eigenvectors has to take into account this uncertainty. The following strategy is recommended: Step 1. Find the eigenvalues A, and their multiplicities pj^.. Step 2. For each distinct eigenvector A, (p;^. = 1), set q, = x,, the eigenvector corresponding to X^, and set J, = X^. Step 3. For each case p^. > \, compute the rank r^^.;^ i of the characteristic matrix. The number of eigenvectors equals n — r^_x i- This is also the number of Jordan blocks corresponding to A,. Step 4. For the case n — r^,;^, = P)^, > 1, each Jordan block is a 1 x 1 matrix equal to A,,. For each such case, compute the p;^. eigenvectors x, and set them equal to the appropriate q,. Step 5. For each case pj^, > 1 and n — r^_x.i = 1» there is only one Jordan block and it is a p^ ^ Px, niatrix. The generalized eigenvectors can be computed according to Eq. (7.5.21) with A:, = p^,. 298 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

Step 6. When p^^, > n — r^_^i > 1, there will be n — r^_x.i Jordan blocks for the eigenvalue A,,, but we will not usually know at the outset the dimension of the block (if p^, = 3, « — r^_x i = 2, then one block will be 1 X 1 and the other 2x2, but for py^, > 3, there is no unique choice). (6a) The procedure for this case is to calculate the n — r^.^.i eigen- vectors {x,}. (6b) Next, calculate the rank of the augmented matrix [A — X,!, x] for each eigenvector x. If, for any eigenvector, the rank of the augmented matrix is not equal to r^_x,i, then its Jordan block is simply A,,. (6c) If the ranks of the augmented matrix with some of the eigenvec- tors (say Xj and X2 for illustration) are equal to r^ — AJ, then the generalized eigenvectors q^{^ and q2''^ can be computed from

(A - A.,.I)q^-^^ = X, and (A - A.,.I)qf = X2. (7.5,23)

Next, try to find generalized eigenvectors by solving

(A - X,.I)q<^> = qf (7.5.24)

and

(A-A,I)qf*=qf. (7.5.25)

Keep going until no further solutions can be found. (6d) Suppose Eqs. (7.5.24) and (7.5.25) do not admit solutions, but we know (from p^) that the Jordan blocks have to be bigger than two 2 x 2's. The failure to find a solution means that zjq2^^ or zjq2*^ are not all 0, where z,, / = 1,2, are the solutions to (A^ — A.*I)z = 0 corresponding to Xj and X2. We have to return to Eq. (7.5.23) and obtain a solution to

(A - X,.I)q^'"^ = axi 4- ^^2 (7.5.26) (A-Aj)qf = )/Xi-|-5x2.

q|^^ = aXi H- Px2 ^^^ ^f ^ = Y^x + ^^^2 ^^ still eigenvectors of A corresponding to A,,. The new solutions are

q<^> = aq«> + ^qf (7.5.27) qf = Kqf+5qf, where q2^' and qf are the solutions found in Eq. (7.5.23). We now impose the conditions

z/q<^* = 0 = az,^q<^> + ^Sz/qf, i = 1, 2, (7.5.28) ztqf = 0 = y zjq<^> + 5zjqf, i^\,l. (7.5.29) DETERMINATION OF GENERALIZED EIGENVECTORS 299

Equation (7.5.28) or (7.5.29) will have a solution for a and P ory and 5 if a solution to Eq. (7.5.24) or (7.5.25) exists. We continue this process until all of the Jordan blocks are accounted for. A couple of examples will perhaps help to illustrate the procedure for finding the transformation matrix Q. EXAMPLE 7.5.1. Define the matrix " 14 8 2 1 1 12 8 4 (7.5.30) -^ -1 -2 7 1 -2 -4 -6 2

and reduce it to its Jordan canonical form. The eigenvalues of A are A^ = 2, p^^ = 3 and X2 = l, p^^ = I. The rank of A — A, 11 is 3 and the rank of A — A,2l is 3. Thus, there is one eigenvector corresponding to each of the two eigenvalues. These are

0 -2 0 1 X, = and X, = (7.5.31) 0 1 0 '2 1

Since A2 is distinct, its Jordan block is Jj = A-j = 1. The Jordan block of A., = 2 is

2 1 0 J.= 0 2 1 (7.5.32) 0 0 2

and the solution of (A^ — A,iI)Zi = 0 is r2 3 4 z, = 3 (7.5.33)

1 The generalized eigenvectors of X, =2 obey the equations

(A - 2I)q2 = X, (7.5.34)

and (A - 2I)q3 = q^. (7.5.35) 300 CHAPTER? IMPERFECT OR DEFECTIVE MATRICES

Since z|x, = 0, Eq. (7.5.34) is solvable and the solution is

-3 0 q2 (7.5.36) 1 0

Also, since zjqj = 0, Eq. (7.4.34) is solvable with the solution

q3 = (7.5.37)

The transformation matrix Q = [x,, q2, qs, Xj] is given by

2 -3 -4 0

1 0 0 0 Q = (7.5.38) 1 0 1 0 2

0 1 1 1

and through the similarity transformation Q 'AQ, A is transformed into the Jordan canonical form. With 2 4 6 8l 0 1 0 0 (7.5.39) •'-^ 1 2 8 4 _-2 -4 -6 -3 J it is straightforward to show that 0 ' Q-'AQ = Ji (7.5.40) 0 J2

In this example, the detennination of Q represents an easy application of the trans- formation theorem. The next example represents a more complex application. It should be noted that, since the column vectors of Q are calculated from homogeneous equations and particular solutions to singular equations, Q is not unique. In fact, the matrix

2 1 0 0 1 -2 1 0 P = (7.5.41) 0 1 -2 1 0 0 1 -2 DETERMINATION OF GENERALIZED EIGENVECTORS 301

also reduces the matrix A defined in Eq, (7.5.30) to the Jordan canonical form in a similarity transformation.

EXAMPLE 7.5.2. Find the Jordan canonical form of

-20 I 4 -8 5 -1 -5 -2 15 -16 10 -2 -4 0 -2 -17 15 -3 (7.5.42) -; -3 0 9 -32 20 -4 -2 0 6 -12 -3 2 -1 0 3 -6 2 -6

The eigenvalues of A are A,i = —2, pj^^ =5 and X2 =^ —I, Px^ ^ ^' The rank of A — X,I is 4, and so there are two eigenvectors corresponding to Xj = —2. This means that there will be two Jordan blocks of dimension 4x4 and I x I or 3 x 3 and 2x2. There is one eigenvector for X2 = —1 and its Jordan block is X2.

The eigenvectors of A are

-6 3 1 -3 2 2 0 1 1 3 X, = X2 = (7.5.43) 3 0 4 2 0 5 1 0 6

where Ax, = —2x,, AX2 = —2x2, and AX3 = —X3. The solutions to (A^ —Xjl)z, = 0, i = 1, 2, are

Zi = and (7.5.44)

Since zjx^ =0, / = 1,2, the equations for the generalized eigenvectors

(A-XiI)q2=Xi (7.5.45)

and

(A - A,,I)P2 = X2 (7.5.46) 302 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

have solutions. These are 17 -6 4 -1 0 0 ^2 and P2 = (7.5.47) -4 1 0 0 0 0

Since the two generalized eigenvectors exist, the Jordan blocks must be of dimension 3x3 and 2x2. However, since zjqj = —4 and zjpa = 1, neither

(A - X,I)q3 = q^ nor (A - A.il)p3 = Pz (7.5.48)

has a solution. Since any linear combination of Xj and Xj is also an eigenvector corresponding to Xj, we replace the eigenvector x, by the eigenvector

y, = X, + 4x2, (7.5.49)

and compute q^ from

(A - A,|I)q2 = y. (7.5.50)

to obtain

q2 (7.5.51)

for which zjqj = Z2q2 = 0. The linear combination of Xj and X2 chosen to form y, thus satisfies Eq. (7.5.28). Now the equation

(A-A.,I)q3 = q2 (7.5.52)

has a solution, namely.

q3 = (7.5.53)

The eigenvectors and generalized eigenvectors form the matrix

Q=[yi.q2.

or

6 -7 0 3 -6 1 5 0 -7 2 -1 2 4 0 0 1 0 3 Q = (7.5.55) 3 0 0 0 1 4 2 0 0 0 0 5 I 0 0 0 0 6

With this matrix,

Jl 0 0 Q-'AQ = 0 h 0 (7.5.56) 0 0 J3

where

-2 1 0 J,= 0 -2 1 0 -2 (7.5.57) ' -2 r h = 0 -2 h = -1.

7.6. DYADIC FORM OF AN IMPERFECT MATRIX

For a perfect matrix, the spectral decomposition, A = YTi^i ^i^i^]^ amounts to the expression of A as a dyadic consisting of a linear combination of the dyads x,zj, / = !,...,«, where {x,} are the eigenvectors of A and {z,} their reciprocal set. We are interested in searching for an equivalent decomposition of an imperfect matrix into a linear combination of n dyads. Suppose Q = [q^,..., q,J is a matrix that transforms A into the Jordan canon- ical form in a similarity transformation. Define

z=[z„...,zJ^(Q-^)^ (7.6.1)

The column vectors of Z are the reciprocal set to the column vectors of Q, since Z^Q = tzjqy] = I = [8ij], or zjq^ = 5,^. If {q,} is taken as a basis set in £"„, then the identity matrix can be expressed as

i = Eq.=^J (7.6.2) 1 = 1 304 CHAPTER? IMPERFECT OR DEFECTIVE MATRICES

From this, and the property A = AI, it follows that

A = i:y,zJ, (7.6.3) 1=1

where

y,= Aq,. (7.6.4)

Thus, any arbitrary matrix can be expressed as a dyadic consisting of a sum of n dyads. If an imperfect matrix has s eigenvectors, denoting these by Xj, X2,..., x_y and noting that y, = Ax, = A,x, for this set, we can rewrite Eq. (7.6.3) as

A = J:M,Z?+ 1: y,zj, (7.6.5) 1=1 i=s+l

where the y, are linearly independent vectors related to the generalized eigenvectors through Eq. (7.6.4). The proof that the vectors y^ form a linearly independent set proceeds as follows. Assume that the y, are linearly dependent; i.e., assume that there exists a set of numbers {a^+i,..., a„), not all 0, such that

i: a J, = 0. (7.6.6)

From Eq. (7.6.4), it follows that

A E a^qt = 0 (7.6.7) i=.v-M or that J2"-^^i Qf^q, is an eigenvector of A. This is a contradiction, since the eigen- vectors Xj,..., x^ are not included in the set of generalized eigenvectors generat- ing the set {y,^i,..., y„} in Eq. (7.6.6). Equation (7.6.5) indicates that a dyadic representation of A needs at most n dyads. If k of the eigenvalues of A are 0, then Eq. (7.6.5) will contain n — k dyads. ••• EXERCISE 7.5.1. Suppose {x,} is a basis set in E„ and {z,} is its reciprocal set, and let

A = Ex»^J-fi- 1=1 Find the eigenvectors and eigenvalues of A. Give the Jordan canonical form Aj • • I of A. What is the nilpotent index of A?

7.7. SCHMIDT'S NORMAL FORM OF AN ARBITRARY SQUARE MATRIX

Since its proof is somewhat tedious, we will first state the theorem of interest in this section, examine its consequences, and then give its proof: SCHMIDTS NORMAL FORM OF AN ARBITRARY SQUARE MATRIX 305

THEOREM. For an arbitrary square matrix A, the solutions Xj, Xj,..., x„ and

yi» y2» • • •»Yn ^fthe equations

AXi=Kji (7.7.1)

and

AV,=^/X,- (7.7.2)

form orthonormal basis sets in E„ and the quantities KJ are real numbers equal to or greater than 0. Note that this theorem is valid whether or not A is a perfect matrix. By multiplying Eq. (7.7.1) by A^ and Eq. (7.7.2) by A and then substituting Eqs. (7.7.2) and (7.7.1) into the resulting expressions, we find

A^Ax,. = Kfxi {1.13)

and

AAV, =/c,Ve. (7.7.4)

Thus, we see that the orthonormal sets {x,,...,x,,} and {yi,...,y„) are, respec- tively, the eigenvectors of the self-adjoint matrices A^A and AA^ In Chapter 5 we saw that the eigenvalues of AB and BA are the same, in agreement with Eqs. (7.7.3) and (7.7.4). It also follows from the equation

(Ax,, Ax,) = (x„ A^Ax,) = K^\\x,f (7.7.5)

that fc? is positive. Thus, as an alternative to finding ^,,x,, and y, by solving Eqs. (7.7.1) and (7.7.2), we can solve the two self-adjoint eigenproblems posed by Eqs. (7.7.3) and (7.7.4) to obtain x,, y,, and /c, = y/Kf, An important implication of the above theorem is that the matrix A can be expressed in the biorthogonal diagonal form

n

This is known as Schmidt's normal form of a square matrix. The proof of Eq. (7.7.6) is easy. Since {Xj,..., x,,} is an orthonormal basis set,

I = Ex,x;. (7.7.7) 1=1

But since A = AI = X]IUiAx,xJ and Ax, = /c,y,, we immediately see that Eq. (7.7.6) is true. As an application of Eq. (7.7.6), consider the inhomogeneous equation

Az = b (7.7.8) 306 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

for the case |A| 7*^ 0, or Kf :^ 0. Expanding b as b = I]i(yi»^>yi» and noting that Az = I]"=i'c:,(x,, z>y,, we obtain /c,{x,, z> = (y,, b). Thus, the solution to Eq. (7.7.8) is

z=:T^lll^x,. (7.7.9)

For the case |A| = 0, or some /c, = 0, Eq. (7.7.8) has a solution if and only if (y., b> = 0 for the vectors y^ satisfying AV, = 0. The solution is then

z = E'-^^^'^,+E'V,-, (7.7.10) i ^i i where a, are arbitrary constants. The primed sum Y! nieans summation over the vectors for which K^ ^ 0 and the double primed sum Yl' nieans summation over the vectors for which Ax, = 0, or K^ = 0. This result is consistent with our earlier solvability theorem. What is new is that the solution is given in terms of the eigenvalues and eigenvectors of AA^ and A^A. Let us now turn to the proof of the theorem. Consider the vectors satisfying

Ax,=0 and AV,=0. (7.7.11)

The vectors x, and y, are, by definition, the null vectors of A and A^ Furthermore, we know that there will be n — r^ vectors x, and y, each, where r^ is the rank of A (and of A^). These eigenvectors can, of course, be orthonormalized by the Gram- Schmidt procedure. The next step is to hunt the maximum of |(y, Ax)|^ among all normalized vectors x, y in £"„ such that x is orthogonal to the null vectors of A, and y is orthogonal to the null vectors of A^ Since

|(y,Ax>|^<||Af||x||||y|| = ||Af, (7.7.12)

we know that the quantity |{y, Ax)|^ has a maximum M (which is, in fact, equal to ||A|p). We can, therefore, find x^ and y^ such that

|(yi,Ax,)p = M. (7.7.13)

This is because there exist sequences of normalized vectors x^"^ and y^"^ such that lim |{y(''>, Ax^"^)|2 = M, and so x^ = limx<«^ and y^ = limy^"^ We then let

(yi, Axi> = % + iKi = K, (7.7.14)

where /CR and KI are real numbers and |/c|^ = fc| + /Cj^ = M. Consider next a normalized vector x* that is orthogonal to Xj and to the null vectors of A. We can show that

(yi,Ax*)=0 (7.7.15)

by assuming that (y^ Ax*) = fx^ + ifi2 = M» where /Xj and />t2 are real numbers. We then consider the vector c^Xj + C2X*, where Cj and C2 are chosen so that llcjX, + C2X*|| = 1. This requires that IcJ^ + |c2p = 1 and now we have

|(yi, A(ciXi + C2X*))|^ = \c,K + C2MI' = c^Hc, (7.7.16) SCHMIDT'S NORMAL FORM OF AN ARBITRARY SQUARE MATRIX 307

where

-1 c = and H = {1.1.11)

Since H is a self-adjoint matrix, the maximum value of c^Hc is attained when c is the normalized eigenvector c^, of H, corresponding to the maximum eigenvalue A^ (which equals \K\^ + |/xp). Thus, choosing c = c,„, we obtain

|(yi, A(qxi + C2X*)>p = XJ\cJ\' = \K\' + M' (7.7.18)

However, we already know that |(yi, Ax)|^ < \K\^ for any normalized vector x. Therefore, it follows that fJi = 0, which proves Eq. (7.7.15) for all normalized vectors x* that are orthogonal to Xj and the null vectors of A. We can rewrite Eq. (7.7.14) as

(AVI,XI) =K, (7.7.19)

using the property (y, Ax) = (A^y, x) of the adjoint matrix, and then prove in a similar way that (AV*, Xi) = 0 for all normalized vectors y* that are orthogonal to yi and the null vectors of A^ Next, we will prove that

Ax, =/cyi. (7.7.20)

First, assume that Ax, - fcy, = x. Then, for any vector y, in the null vector set of AHA^Yi = 0), it follows that

(yi» x) = (y,., Axi - ^yi) = {Ay,., x,) - ^(y,, y,) = 0 (7.7.21)

and (yi, x) = (yi, Axi) - /<:(y,, y,) =K-K =0. (7.7.22)

Also, since (AV*, x,) = 0 for all normalized vectors orthogonal to y, and the null vectors of A^, we find

(y*,x> = (AV,Xi)-^(y*,y,)=0. (7.7.23)

Thus, we find that x is orthogonal to a complete orthonormal basis set and so X = 0, which proves Eq. (7.7.20). A similar proof establishes that

AVI =K%. (7.7.24)

Let us express the complex number K in the form K = exp(/^)/<:,, where K^ = \K\ and % = K^ cosO and ACJ = K^ sinO. If the vectors x, and y, are replaced by exp(—/^/2)Xi and exp(/^/2)yi, they remain unit vectors. Thus, without loss of generality, Eqs. (7.7.20) and (7.7.24) can be rewritten as

Ax, = KjIJl and AVI =^IXI, (7.7.25) where ^c, > 0 and ||Xi|| = HyJI = 1. 308 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

The process of obtaining x^.y^, and K^ can be repeated to find X2, y2, and KJ with the constraint that X2 is orthogonal to Xj and the null vectors of A, and y2 is orthogonal to yi and the null vectors of A^. The result is

AX2 = K2y2 and -^ y2 "~ '^2^2' 0 < /C9 < /c. (7.7.26) Continuing this process will generate two sets of vectors corresponding to nonzero

values of /c,. The vectors x,,1, xA2 , . x^^ plus the null vectors of A form an orthonormal basis set in E„, Likewise, the vectors y^Yi^ - • - ^Yr^ P^"^ ^^^ ^^^^ ^^^" tors of A^ form an orthonormal basis set in £"„. This completes the proof of our theorem. There is a subtlety that might be perplexing without a little thought. Suppose A is a self-adjoint matrix whose eigenproblem is

Az.. = A...Z.., (7.7.27)

The eigenvalues X, are real and can be either negative or positive. Clearly, z, are the eigenvectors of AA^ and A^A with eigenvalues X?. Thus, Kf in Eqs. (1,13) and (7.7.4) must equal Xf. However, we cannot simply set x, = y, = z, in Eqs. (7.7.1) and (7.7.2) because this would imply that X, = /c^, and so all of the eigenvalues of a self-adjoint matrix would have to be positive. The subtlety is that x, and y, differ by the factor exp(/^), where A., = exp(/^,)/c,. If A- is negative, then O^ = n. Thus, in Eqs. (7.7.1) and (7.7.2), we set x- = z- and y, = exp(/^,)z, for which they correctly reduce to Az = A,,z, and A^z, — A*z,. Of course, A* = X, for self-adjoint matrices.

7.8. THE INITIAL VALUE PROBLEM

The solutions to the initial value problems (IVPs) given by Eqs. (6.10.8) and (6.10.24) are valid whether or not A or T is perfect. However, in the general case, the simplest form that the equation

X = exp(AOxo + / exp(A(r — r))b(T) dr (7.8.1) Jo can take is expOA.,)K 0 0 0 exp(fA.2)K2 0 x=Q Q 'Xo

0 ••• exp(fXJKj exp((f-r)A,,)K, 0 0 0 exp((/ —T)A.2)K2 0 dr •'0 exp((r-r)X,)K, xQ-'b(r). (7.8.2) THE INITIAL VALUE PROBLEM 309

The following example illustrates the value of the Jordan canonical form in solving the IVP. EXAMPLE 7.8.1. The quantity u(t) obeys the second-order equation

d^u du -TT + 2-7- + M = cos (7.8.3) df dt (0' with du u{t) = 2, —- = -1 atr =0. (7.8.4) dt The problem transforms (see Section 6.10) to

d% 2 = Ax + f, x(0) (7.8.5) dt -1

where the companion matrix is

0 1 A=: (7.8.6) -1 -2

Recall that

(7.8.7) cos(0

and

X = (7.8.8) du Tt A Thus, A is imperfect; it has one eigenvalue, A, = — 1, and one eigenvector,

X, = (7.8.9)

The solution q2 to

(A - X,I)q2 = x. (7.8.10)

is

? (7.8.11) q2 = 0 310 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

and so 1 1 0 -1 Q = Q' = (7.8.12) -1 0 1 1

and

-1 1 A = QJQ • = Q Q-' (7.8.13) 0 -1

The solution to Eq. (7.8.5) is

X = exp(rA)x(0) + I exp((f - T)A)f(T) dx Jo (7.8.14) = Qexp(/J)Q-'x(0) + Q /' exp((/ - r)J)Q-'f(r)rfT. •'0 When the matrix products and integrations are carried out in Eq. (7.8.14), the final result is

X = du L'dt (7.8.15) -3 - 5f + ^M 3COS - + 4sin - I 2 + t 4 = e -l-t + 25^" ( f 3 A 1 2 + 5f+ eM 2cos- - -sin- V 2 2 2/ J Thus, 4 r t u = (2-^ t)e' + — -(3 + 5t)e~' + 3cos - +4sin ^]- (7.8.16) The first part of the solution comes from the initial conditions x(0) and the second part comes from the inhomogeneous part, or the driving force, f. Note that, for 0> 1,

M(O^^6cos^+4sin0; (7.8.17)

• • I i.e., asymptotically, the solution to Eq. (7.8.3) is an oscillating function of time.

PROBLEMS

1. Find the Jordan block form and transformation matrix Q for the following imperfect matrices: (a) "l 3 4 2 PROBLEMS 31

(b) 5 1 2 0 3 0 2 1 5

(c) 1 1 2 -13 4 0 0 2

(d) 7T n/3 —n 0 n nil 0 0 ;r

(e) -1-4 4 -2 -2 3 -3 -5 -6 2. Find exp (AO for the matrices A given in parts (a)-(e) of Problem 1. 3. Suppose N is an n X M nilpotent matrix of index p, i.e., W~^ ^ 0 and N*" = 0. Prove that if u is a vector such that N''~'u ^ 0, then the vectors u, Nu,..., N''"''u form a linearly independent set. 4. (a) Suppose x,, Xj,..., x„ is an orthonormal basis set in E„. Prove that the matrix A = XjxJ, i i^ y,

is an imperfect matrix, (b) Suppose {x,,..., x„} is a basis set in E„ and {z,,..., z„) is the reciprocal set (z/x^ = 5,^). Prove that the matrix

x,-zj-, «• 7^;,

is an imperfect matrix. 5. Consider the fourth-order equation

1^ ~d? 'IF It with the initial conditions {t = 0)

d^u du = 2, = 0, -3, H=4. IF dt^ It Convert this problem into a first-ordersyste m and, using the Jordan canonical transformation, find and plot « and du/dt as a function of time. 312 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

6. Consider the matrix

10 0 0 0 0 0 0 2 0 0 0 0 0 0 0 10 0 0 0 A = 0 0 0 10 10 0 0 0 0 10 0 10 0 0 0 10 0 0 0 0 0 2 0

(a) Determine the eigenvalues of A and their respective multiplicities. (b) Find the rank of A — A,jl. How many eigenvectors does each eigenvalue have? (c) Find the matrix Q that Jordan block diagonalizes A in a similarity transformation. (d) Consider the problem

dx -^=A\ x(f = 0) = Xo, ^(f = 0) = 0, Tt

and its solution

X = cos(A/) XQ.

Give the solution in terms of the Jordan block diagonalization.

7. Transform the matrix

5 4 3 -1 0 -3 1 -2 1

into its Jordan block form using a similarity transformation. Give the solution to

-1 d\ — = Ax, x(f = 0) = 2 dt 1

and plot jc,,; = 1, 2, 3, versus t. PROBLEMS 313

8. Find the eigenvalues of A and the matrix Q that transfonns A into Jordan block diagonal form in a similarity transformation.

5 1 1 1 ~6 3 2 3 6 1 8 4 2 2 3 3 3 3 -1 -2 5 5 5 16 13 6 3 2 3 6 1 1 1 2 13 6 3 2 3 6-1 9. A galvanometer is an electromagnetic device for measuring electric current via the deflection of a needle through some angle ^. In a typical galvanometer, the deflection obeys the equation of motion

d'^e dO _ EG

where G is a calibration coefficient, D is a damping coefficient, S is the effective force constant of a spring opposing the deflection, J is the moment of inertia of the needle about its rotation axis, and R is the resistance of the galvanometer. The quantity E is the applied initial voltage (at t - 0) when 0 = dO/dt = 0. Depending on the parameters J, D, and 5, the galvanometer can undergo undamped 0 = Op(l -cos (o^t), underdamped 0 = Op[l - K exp(-«r) sm{(jDt 4- 0)], and critical damped motion

^=^^[l-(l+ft)oOexp(-a>oO].

(a) Convert the second-order problem in Eq. (1) into the first-order system

0 dx = Ax, X = de_ (2) L dt (b) From the properties of A, determine the necessary conditions on D, /, and S needed to satisfy each type of motion described above. (c) Solve Eq. (2) and give the formulas for 0^, 0)^, w, K, a, and 0 in terms of the constants J, D, S, E, G, and R. 3 I 4 CHAPTER 7 IMPERFECT OR DEFECTIVE MATRICES

FURTHER READING

Bellman, R. (1970). "Introduction to Matrix Analysis," McGraw-Hill, New York. Noble, B. (1969). "Applied Linear Algebra," Prentice Hall International, Englewood Cliffs, NJ. Noble B., and Daniel, J. W. (1977). "Applied Linear Algebra." Prentice Hall International, Englewood Cliffs, NJ. Watkins, S. W. (1991). "Fundamentals of Matrix Computations." Wiley, New York. INFINITE-DIMENSIONAL LINEAR VECTOR SPACES

8.1. SYNOPSIS

The remaining chapters of this text will be devoted to the theory of infinite- dimensional linear vector spaces and function spaces. We will find, however, that a great deal of what we learned about finite-dimensional spaces will still apply— with a few added stipulations. Therefore, we will, whenever possible, draw upon the theorems and definitions presented in Chapters 1-7 with the aim of forming a solid analogy between matrix theory and linear operator theory. We will begin by showing with examples that function spaces are, in fact, infinite dimensional, where functions of the form f{t) become analogous to vec- tors in finite-dimensional spaces. We will quickly see, however, that the concept of infinite dimensionality allows for many types of function spaces. We will con- sequently define several of the more common ones. In the finite-dimensional vector space, E„, n linearly independent vectors x,, / = !,...,«, always form a basis set; i.e., any vector in E„ can be expressed as a linear combination of the x,. In an infinite-dimensional space, however, an infinite number of linearly independent vectors is not necessarily a basis set. Hence, infinite dimensionality requires us to develop the concept of the completeness of a vector space. This, in turn, leads to the need to enlarge the concept of an integral from the definition of Riemann to the definition of Lebesgue. The abstract linear vector spaces having the closest analog to finite linear vector spaces are Hilbert spaces. A Hilbert space ?^ is a linear vector space in

315 316 CHAPTER 8 INFINITE-DIMENSIONAL LINEAR VECTOR SPACES

which an inner or scalar product is defined, and in which every Cauchy sequence converges to a vector contained inH. A Cauchy sequence is a sequence of vectors Vj, V2,... such that the length of the difference v„ — v^^ converges in the infinite limit. Namely, I|V„-V^|| <€ if n and m are greater than some integer N(€) that depends on €. In a Hilbert space, the vector v = lim„_^oo v„ always lies in the space if v„, n = 1, 2,..., is a sequence converging in the Cauchy sense. A typical function space that forms a Hilbert space is C2{ci, b), whose functions are square integrable, i.e..

/ r(t)f(t)dt< 00.

The need for square integrability arises directly from the definition of the inner product of f and g: (f,8)-1 r(t)g{t)dt, ''a where f*(t) is the complex conjugate of fit). As we shall see, the integrals must be defined according to the measure theory of Lebesgue. We will introduce the concept of linear operators and compare their properties with the finite-dimensional analogs of matrices. We will see, for instance, that, for a vector x e £„, the matrix product Ax is also contained in £"„. However, this is not necessarily the case in Hilbert spaces. We will show that the domain of a linear operator in H is actually, in general, only a subset of H. We will define a special class of operators called dyadic operators represented in the form K = i:a,bJ. 1 = 1 We will discuss the solutions of the linear equations, Lu = f, in which the operator L is either a /:-term dyadic or the sum of the identity operator and a A:-term dyadic. These problems can be solved using the solvability theorem for linear systems in a finite-dimensional vector space. Finally, we will show that perfect operators in a Hilbert space can be rep- resented by a dyadic operator just as in finite linear vector spaces—although the dyadic may contain an infinite number of dyads. This representation is a spectral decomposition totally analogous to perfect matrices in finite-dimensional vector spaces. In function spaces, the implication of this is that, whether the operator is an integral or a differential operator, the spectral representation is with certainty an integral operator.

8.2. INFINITE-DIMENSIONAL SPACES

The definition of a linear vector space is the same whether the space is finite or infinite dimensional. As described in Chapter 2, a vector space 5 is a collection of elements x, y, z... having the operation of addition (denoted by -j-) and possessing INFINITE-DIMENSIONAL SPACES 317

the following properties: 1. If x,y € S, then

x-\-y € S. (8.2.1)

x + y = y4-x. (8.2.2)

3. There exists a zero vector 0 such that

x + 0 = x. (8.2.3)

4. For every x e 5, there exists -x such that

X + (-X) = 0. (8.2.4)

5. If a and p are complex numbers, then aiPx) = iaP)x {a + p)x = OCX-{-fix (8.2.5) of(x + y) =ax + ay. Since a real vector space is contained within a complex vector space, we will deal with complex vector spaces throughout the rest of this text. In both £•„ and E^, a vector is denoted by a boldface, lowercase letter, e.g., x, and its /th component is denoted by a lowercase letter with the subscript /, e.g., jc,. We have concentrated in previous chapters on the vector space £"„ in which

X = (8.2.6)

and we defined the adjoint x^ of x as

x^^[x\..,,,x:i (8.2.7)

Analogously to £„, a vector in the space E^ is of the form

X = (8.2.8)

and its adjoint is given by

.rt-- [x\,xl...]. (8.2.9)

We say that x has a denumerably infinite number of components. 318 CHAPTER 8 INFINITE-DIMENSIONAL LINEAR VECTOR SPACES

Function spaces can also form a linear vector space. For example, C^(0, 1), the space of all continuous functions defined on the interval (0, 1), is a linear vector space. We will denote a vector in C^(0,1) by a boldface, lowercase letter, e.g., f. This vector represents a continuous function /(r), 0 < r < 1, in our function space. Whereas f represents all values of the function (analogous to x in E^), the quantity f(t) is the value of the function at t and is analogous to a component A:, of X. In fact, in a discretized representation of the function f(t), we can introduce the approximation

(8.2.10)

fn

where / = /(r,) and r,, / = 1,..., n, are the values of the continuous variable t at which we estimate f(t). In function spaces, we also introduce the concept of an adjoint vector. Thus, if f is a vector in the function space C^(0, 1) representing the function f{t),0

/ f{t)dt (8.2.11)

exists, where the above integral is the Riemann integral. RIEMANN AND LEBESGUE INTEGRATION 319

4. Ciia, b): the space of functions whose Lebesgue integral exists; i.e., if f eC^(a,b), then

/ fiOdt (8.2.12)

exists, where the above integral is the Lebesgue integral. The difference between the Riemann and the Lebesgue definitions of an integral will be explained in the next section. £2(^0)' the space of functions, of D variables, whose squared absolute values are Lebesgue integrable; i.e., if f € £2(^0), then

/ "'I nr)f{7)d^r (8.2.13)

exists, where the D-dimensional integral is a Lebesgue integral. In the subsequent sections, we shall utilize the concept of a complete vector space. In essence, this property will ensure that a set of basis functions will exist for that space. Thus, if a space S is complete, then we can find a basis set f^, £2,... such that any vector f in S can be represented by the series

f = E«nf.. (8.2.14)

where the scalars a^ are unique for a given vector f. For example, £"„ forms a complete vector space, but of the function spaces defined above, only C^ and £2 ^r^ complete. This is why Lebesgue integrability is important in the general theory of linear vector spaces and why we shall take it up in the next section.

8.3. RIEMANN AND LEBESGUE INTEGRATION

The integral of a continuous function f{t) in the interval a < t < b h iht area under the curve representing f{t) as shown in Fig. 8.3.1. The way we define a Riemann integral is to discretize the interval {a, b) into the subintervals (r,, r,-|-Af,), / = 1,..., n, and then construct the upper and lower Darboux sums D^ and DJ'. In the interval (^,, r, 4- A?,), we define the upper bound on f{t) as /)" = max^ f{t)

ti ti^i ti^2 ^i-f3

I FIGURE 8.3.1 320 CHAPTER 8 INFINITE-DIMENSIONAL LINEAR VECTOR SPACES

and the lower bound as fl = min^/Cr). These are illustrated in Fig. 8.3.1. The upper Darboux sum is the sum of the areas of the rectangles of height /" and width A^,, i.e.,

Dl = j:f^hu. (8.3.1)

Likewise, the lower Darboux sum is defined as the sum of the areas of the rectan- gles of height fl and width A/^, i.e.,

D" = f:f!At,. (8.3.2)

If, as the discretization is refined, i.e., n -> oo and Ar, -> 0, the Darboux sums converge and are equal to each other, we say the Riemann integral exists and has the value

/ f{t)dt = lim Dl = lim D^. (8.3.3) •'a It is easy to show that continuous and piecewise-continuous functions are Riemann integrable, but what about the situation when f{t) is continuous except at a set of points f^, j = 1, 2,..., where f{tj) ^ Hm,^,. /(f)? The area under an isolated point is 0, and so one does not expect the area under a curve representing f{t) to be affected by these isolated points. However, the Riemann integral of a function can fail to exist even though the area under its curve exists. For example, consider the interval {a, b) and the function

/(/) = !, /an irrational number (8.3.4) = 0, fa rational number.

In any interval Af,, there will be both rational and irrational numbers, and so

Dl = 1 and /)," = 0. (8.3.5)

Since DJI and Df have different limits, the function is not Riemann integrable. The Lebesgue measure theory introduced early in the 20th century leads to a definition of the integral of a function that captures the concept of the integral as the area under the curve, avoiding the problem caused by a function having discontinuities such as in Eq. (8.3.4). In Lebesgue's theory, one considers the class of functions /(f), which can be approximated by a sequence of step functions if„it) almost everywhere, i.e., everywhere but on a denumerable set of isolated points (denumerable means the number can be infinite, but countably infinite in the sense that the set can be mapped one to one onto the set of all positive integers—the irrational numbers are an example of an uncountable set). The step functions are defined in the function fn(0 as

fnit)^f-, f,

where /j" is the vakie of f{t) within the interval i not including the isolated points in question. The integral of the step functions is then

tir„(t)dt = f2fpAt,. (8.3.7)

The requirement for the existence of a Lebesgue integral of f{t) is that

\l ^"^^^"^^

fit) = lim f„{t) (8.3.9) n->co

almost everywhere. Consequently, the the Lebesgue integral is given by

[ f{t)dt= lim / ifnit)dt. (8.3.10) J a n-^coJa

Clearly, when a Riemann integral exists, the Lebesgue integral exists. For example, we could sufficiently choose as f„{t) either v^n(o = y;", r,

or

fniO = f!. t,

On the other hand, while the Riemann integral of the function defined by Eq. (8.3.4) does not exist, the step function

xlr„(t) = l, a

is equal to f(t) almost everywhere, and so the Lebesgue integral of f{t) does exist and gives the result

/ fit)dt= lim I \dt = h-a, (8.3.14)

In practical problems, functions are generally Riemann integrable and so the subtleties involved in Lebesgue integration need not concern us except in as much as they guarantee the completeness of certain function spaces. The need for com- pleteness of a linear vector space will be addressed in the next section. Those who are interested in a deeper appreciation of the measure theory and the Lebesgue inte- gral are encouraged to explore references recommended at the end of this chapter. 322 CHAPTER 8 INFINITE-DIMENSIONAL LINEAR VECTOR SPACES

8.4. INNER PRODUCT SPACES

In the space £„, the simplest inner or scalar product we can define is x^y = YTi=i ^tyi' Analogously, in E^, we can define the inner product x^y = YlZi ^tyt- More generally, we defined in Section 5.2 the inner product for an arbitrary abstract linear vector space S by the properties

{x,y) = {y,x)* (8.4.1) (ax + ^y, z) = a* (x, z) + /8* (y, z) (8.4.2)

and

(x, x) > 0 for any X 7^ 0 in S. (8.4.3)

The length (or norm) of x in 5 was defined by

||x||^V(x,x), (8.4.4)

and we also proved in Section 5.2 that the properties in Eqs. (8.4.1)~(8.4.3) imply the triangle inequality

l|x + y||<||xl| + ||y|| (8.4.5)

and the Schwarz inequality

|{x,y)|<||x||||y||. (8.4.6)

These proofs are valid for an arbitrary inner product space. Some examples of inner products in linear vector spaces include: (i) The space £*„, with

(x,y)=xV = i:x;y,-. (8.4.7) 1=1 (ii) The space E^, with

{x,y)=x'Ay = f:a,jx:yj, (8.4.8) i = l where A is a positive-definite matrix, (iii) The space E^, with

oo

(iv) The space E^, with

CO {x,y)=x'Ay = J^a,jX*yj, (8.4.10)

where A is a positive-definite matrix. INNER PRODUCT SPACES 3 23

(v) The space £2(^11) of functions square integrable in the interval (0, 1), with

(f^g) ^fg= f r(t)g{t)dt. (8.4.11)

(vi) The space £2(0^ 1; ^(0), with

{f,g)^ / nt)g{t)k{t)dt. (8.4.12) where k{t) > ^ almost everywhere, (vii) The space £2(^0) of functions square integrable in the D-dimensional volume Qj), with

(f, g) = fg ^ /*. •. / /*(r)g(r)^^r. (8.4.13)

(viii) The space £2(^0* ^)» with

(f,g> ^ /•- f rir)g{r)kir)d''r, (8.4.14)

where k{r) > 0 almost everywhere in Q^^, In Eqs. (8.4.11) and (8.4.13), we have generalized the notation x^y of the vector space £„ to function spaces. In £„, the notation x^ means the sum over all / of the elements jcf>?,. Likewise, in a function space 5, the notation f^g means the integral of the elements f*(r)g(r) over the volume (or interval in one dimension) in which the functions are defined. Notice in the function spaces £2(0,1) and £2(^0) we can define the inner product with or without a weighting function k. We can even define an inner product analogously to Eqs. (8.4.8) and (8.4.10) in which A is replaced by a positive-definite integral operator. In this and in subsequent chapters, we will use the notation (x, y> and x^y interchangeably as inner products. In handling dyadic expres- sions, x^y and yx^ turn out to be quite suggestive and will aid us significantly in generalizing our thinking from finite-dimensional vector spaces to function spaces. Another inner product that further illustrates the generality of the concept is (ix) The space C^O, 1) of functions with continuous first derivatives in the closed interval (0, 1), with '''^''^'^''^'^dt^rmsio). (8.4.15) •'0 dt dt Note that the temptingly simpler definition r'dr(Orfg(f) {f.g> = /" ^ dt (8.4.16) ^0 dt dt does not define an inner product. This is because the vector f = 1 (or the function f{t) = 1) is a nonzero vector belonging to C*(0,1), and, if the inner product is defined by Eq. (8.4.16),

(f,f)= f OxOdt=0, (8.4.17)

which violates the property required in Eq. (8.4.3). 324 CHAPTER 8 INFINITE-DIMENSIONAL LINEAR VECTOR SPACES

8.5. HUBERT SPACES

We say a vector belongs to a normed linear vector space S if every vector x in «S has a finite norm or length, i.e., if ||x|| < oo. We claimed in earlier chapters that norms can be defined independently of inner products. For example, in E„, the p = oo norm of a vector is given by \\\\\oo = maxi

1 1

X = J_ (8.5.1) V3

then

lrf = ET=oo. (8.5.2)

On the other hand, the space S = E^ with the condition ||x|p = {x, x) < oo does form a normed linear vector space. Interestingly, if we chose the /? = oo norm, ||x||^ = 1 for this example. However, the vector x with components x^ = i has an infinite p = oo norm. Unlike finite-dimensional spaces, in an infinite-dimensional space we must concern ourselves with the issues of convergence (of a sequence of vectors) and completeness of the space. A sequence of vectors x,, X2,... in 5 is said to con- verge to a vector x if, for any number e > 0, there exists an integer N{€) such that

||x — x„|| < € for rt > N{€), (8.5.3)

The vector x is said to be the limit of the sequence, namely.

X = lim x„. (8.5.4)

From the triangle inequality, it follows that if Eq. (8.5.3) is true, then

l|x„, - xjl = ||x ~ x„ - (X - x^)|| < ||x - x„ + l|x - X, (8.5.5) <6 for n, m > NI-V

Such a sequence of vectors is said to converge in the Cauchy sense; i.e., we say that if a sequence x„ in S converges to a vector x in

If the sequence Xj, X2,... converges in the Cauchy sense, for vectors in the finite-dimensional vector space E„, then

X = lim x^ € £„; (8.5.6)

i.e., the sequence converges to a vector in E„ if it converges in the Cauchy sense. This is because of the well-known property that if, for the sequence of complex numbers ofj, a2» • • • • ^^r^ exists an integer N{€) such that

for m, p > N(€), (8.5.7)

then lim^_^^^ a^ exists and is equal to a complex number. On the other hand, in an infinite-dimensional vector space 5, Cauchy con- vergence of a sequence does not always imply that the limit of the sequence is a vector in the space S. For example, consider the space C^(0, 1) of continuous functions defined on the interval (0, 1). Defining the inner product as

(8.5.8) Jo we can define the following sequence of continuous functions:

1 1 0, 0

fnit)

FIGURE 8.5.1 326 CHAPTER 8 INFINITE-DIMENSIONAL LINEAR VECTOR SPACES

EXERCISE 8.5.1. For a sequence defined by Eq. (8.5.9), show that, for any number 6 > 0, there exists an integer N(€) such that

K-U = I \fniO - fmiONt < 6, «, m > N{€). (8.5.10)

Equation (8.5.10) establishes that f„ converges in the Cauchy sense. However, the limit of this sequence (n -^ oo) is the function

0, o<'<5. f(t) = (8.5.11) 1 1. -,

This function is not continuous in the interval C^(0,1), and so the limit vector f = lim„_^^ f„ is not in the vector space of continuous functions. We say that a vector space S is complete if the limit vector of any sequence converging in the Cauchy sense is a vector in the space S. Clearly, if a given space is not complete, it can be made complete by adding to it all of the limit vectors of Cauchy sequences. When this is done for the space of continuous functions with the inner product defined by Eq. (8.5.8), the resulting complete space is £2(0* 1)— the space of functions square integrable in the sense of Lebesgue. The need to complete a vector space forced analysts to abandon the Riemann integral in favor of the Lebesgue integral. Without proving it, let us remark that the linear spaces (i)~(viii), with the inner products indicated by Eqs. (8.4.7)-(8.4.14), and with the requirement that ||x||^ = (x, x) < 00, form complete vector spaces. If a sequence in any of these spaces converges in the Cauchy sense, then it converges to a vector in the space. Another way of saying this is that if the sequence x„ converges in the Cauchy sense in a complete space 5, then the limit x of the sequence exists and is in the space S. A complete linear vector space with an inner product norm is called a Hilbert space. In the remainder of this text, the theory of linear operators will be conducted in an appropriate Hilbert space. We shall denote a Hilbert space generically by H, although more suggestive notation such as £„, €2(0, 1), and £2(^0) ^^^^ ^^ ^sed at times.

8.6. BASIS VECTORS

Just as in finite-dimensional vector spaces, all Hilbert spaces possess basis sets. We say that the linearly independent vectors u^, U2,... form a basis set in H if, for any vector f in K, there exists a unique set of complex numbers aj, 0^2 such that

n

In function spaces, the basis sets are denumerably infinite; i.e., the number of basis vectors is infinite, but the vectors are countable in the sense that they are in one- to-one correspondence with the positive integers. BASIS VECTORS 327

As a simple example of a basis set, consider the set of all analytic functions f(t) in the interval (—1,1). Since the functions are analytic, the series expansion

CO

always exists. This means that the functions M„ = r'', n = 0,1, 2,..., form a basis set for analytic functions in the interval (—1,1). Although the functions M„ = r", n = 0, 1,..., are linearly independent, they are not orthogonal since

fu:u„,dt= /^ (8.6.3) j-\ /T-f/n + 1 However, the Gram-Schmidt procedure can be used to orthogonalize a linearly independent set in any Hilbert space. Recall the procedure: if u,,U2,... is a linearly independent set, then the set Vj, V2,..., where

Vt =U, (Vi,U2)

(8.6.4)

t^ (Vi,U>

is an orthogonal set, i.e.,

{v,,v,.)#0 and {v,,v,.>=0, / / >. (8.6.5)

EXAMPLE 8.6.1. Consider the vectors defined by the set of functions u„ = t", n = 0,1, 2,..., in the space C2{—1, 1). Orthogonalize the set. By the Gram-Schmidt procedure, the first three orthogonal vectors are defined by the functions Ug, V], and V2:

Vo(t) = Mo(r) = 1

u,(0 = M,(0-7^^^Vo(0

f\tdt

V2U) = M2(0 - -; rVo(0 - -, zMO (Vo.Vo) (V„V,> 2 !\t'dt f_^tUt = t

= t^ ~3' 328 CHAPTER 8 INFINITE-DIMENSIONAL LINEAR VECTOR SPACES

Aside from multiplicative factors, these functions are the first tliree Legendre poly- nomials defined by

P^(t) = -i——{t^ - l)\ (8.6.7)

The Legendre polynomials are known (e.g., from quantum mechanics) to be a basis • • • set in the Hilbert space C2{-'\, 1). From the theory of Fourier series, we know that the set of functions

v^{t) = Vlsinnnt, « = 1, 2,..., (8.6.8)

form an orthonormal basis set in the Hilbert space £2(0* ^)- ^^ shall use this set to illustrate what is meant when we say that vectors in a Hilbert space are equal. Consider the function f{t). We define the vector f by the expansion

00 CO

fit) = E^ni^nCO = E«nV2sinn7rr (8.6.9)

and claim it is equal to the vector f if

5„ = f vl{t)f{t)dt = V2 f smn7Ttfit)dt (8.6.10) for all n. In other words, two vectors are equivalent if they have the same Fourier coefficients. In £2(0^ 1)' vector equality means

l|f-f|l'= / \f{t)~m\'dt = 0. (8.6.11)

If /(/) is a continuous function, the theory of Fourier series teaches us that f{t) = f(t) at every point in the interval (0,1). Thus, Eq. (8.6.11) follows since f(t) — f(t) = 0 for all values of Hn (0, 1). On the other hand, if f{t) is piece- wise continuous, f(t) — f(t) =0 everywhere f(t) is continuous, but, at a point of discontinuity t\

f(0 = \[fiO + f(Ol (8.6.12)

where /(/I) is the value of f(t) as t^ is approached from below and f{t'^) is the value of f{t) as t' is approached from above. For example, if

0, ^^^^5' f{t) = (8.6.13) 1, \

its Fourier expansion is

fit) = Y,a„V2 sin nnt, (8.6.14) BASIS VECTORS 329

with

«„ = —(kl + (-l)T'-(-l)"!- (8-6.15)

Then f(t) = 0, 0 < r < |; f(t) = 1, | < r < 1, but

/(, = i) = i(0+l) = i. (8.6.16)

Thus, f(t) - f(t) 7.^ 0 at r = |. However, even though f{t) and f(t) are not equal at every point in the interval t, since they differ only on a set of measure 0, it follows that ||f - ff = /J \f(t) - /(OP dt = 0, and so the vector f is equal to the vector f in the Hilbert space £2(^' !)• Another point to make with the expansion in Eq. (8.6.14) is that the sequence

m fm(0 = J]oi^VlsinriTTt, m = 1,2,..., (8.6.17)

with a„ given by Eq. (8.6.15), is a sequence of continuous functions, i.e., func- tions in C^(0,1), which converges to a piecewise-continuous function in the limit m -» 00. This points out the need to enlarge the space to include functions outside C^(0, 1) in order to complete it. This next example is indicative of the need to use Lebesgue instead of Riemann integration in completing function spaces. Consider again the orthonormal basis set defined by Eq. (8.6.8). We define the piecewise-continuous function in (0, 1):

/(0 = T, —-7

and from the properties of Fourier series it follows that

fit) - fit) = 0, T-^ < f < -, / = 1, 2,..., (8.6.20) I H- 1 / and

fit)-fit) = U\-T^) atr = T, /=2,3,. (8.6.21) 2\i i-\-\/ I

If we try to take the upper and lower Darboux sums for the function fit) — fit) for arbitrary discretizations of the interval (0, 1), we find that the Riemann integral does not exist because the upper and lower Darboux sums differ. However, since fit) and fit) only differ on a denumerable set of measure 0, the Lebesgue integral of fit) - fit) is 0, and so ||f - f || = 0 and f = f in the Hilbert space £3(0,1). 330 CHAPTER 8 INFINITE-DIMENSIONAL LINEAR VECTOR SPACES

We should note that, although n linearly independent vectors in E^ always form a basis set, an infinite number of linearly independent vectors do not necessarily form a basis set in an infinite-dimensional space. For example, if v^ = V2 sin TT? is left out of the set defined by Eq. (8.6.8), then the remaining infinite number of vectors do not form a basis. This follows since v^ cannot be expanded in terms of them since v^ is linearly independent oi v^,n > 1.

8.7. LINEAR OPERATORS

Analogous to matrices in E„, we define a linear operator L in a Hilbert space H as follows: (1) if x, y € 'H and Lx, hy eH and (2) if

L(ax + ^y) = aLx 4- /3Ly, (8.7.1)

where a and p are complex numbers, then we say L is a linear operator in %. An n X n matrix A is, of course, a linear operator in E„ and, with an appropriate inner product, an oo x oo matrix is a linear operator in the space E^. For example, suppose

(X, y)=xV = £•**>'.• (8.7.2)

is the inner product in E^ and the matrix A is given by

10 0 0 0 2'/^ 0 0 0 0 Z^'^ 0 A = (8.7.3)

i.e., ttjj = i^^^Sjj, i = 1,2,.... The vectors

1 1 1 W ¥ X = (8.7.4) y = 1 z = 1 w 3^

are in E^, and since \\xf = YZi 1//' < oo, ||yf = E~i V'^ < oo, and ||z||^ = YiZi V''' < oo, X, y, and z belong to the Hilbert space formed by imposing LINEAR OPERATORS 331

the inner product x^y on E^. The vectors

1 1 1 w 2 1 Ay = and Az = (8.7.5) 1 3372 3

also belong to the Hilbert space because ||Ay|| < 00 and ||Az|| < 00. However, Ax does not belong to the Hilbert space since

00 I IIAX|P=ET=OO. (8.7.6)

Indeed, A is a linear operator in the space 71 for all x € £„ for which ||x|| < 00, but for some vectors x € H, the vector Ax lies outside the Hilbert space. This property of linear operators did not arise in our analysis of finite-dimensional vector spaces. Because of this property, in infinite-dimensional Hilbert spaces we must define the domain P of a linear operator.

DEFINITION. We say that a vector x in H belongs to the domain V of the linear operator L in H if Lx belongs to %.

For the operator A defined by Eq. (8.7.3), and the inner product x^y = Y.i ^t'yi^ it follows that y and z belong to the domain V of A, but x does not. As another example, consider the differential operator L operating on the vec- tors u e £2(^» !)• ^y definition, the domain X> of L consists of the vectors v in £2(0,1) for which Lv belongs to £2(^1 O- Consider, in particular, the operator defined by the differential expression

dMt) Lv(t) = (8.7.7) '~dF'' with the boundary conditions

viO) = y,. i;(l) = K2. (8.7.8)

Note that a differential operator must be defined by its differential expression and its boundary conditions. The vectors forming the domain of L in £2(0* 1) tnust be square-integrable functions whose second derivatives exist. Furthermore, the functions must satisfy the boundary conditions in Eq. (8.7.8). The eigenvalue problem for a linear operator follows analogously from matrix theory as

Lv = Xv, (8.7.9) 332 CHAPTER 8 INFINITE-DIMENSIONAL LINEAR VECTOR SPACES

where v belongs to the domain of L. If yj and ^2 equal 0 in the differential example above, then the eigenvalue equation becomes

(f-v = -\v, (8.7.10) dt

whose general solution is

V = a sinv^r-f Z? cosv^r. (8.7.11)

Upon applying the boundary conditions i;(0) = i;(l) = 0, it follows that ^ = 0 and sin VA = 0, requiring that A = {nnY for n = 1, 2, The solutions to the eigenproblem are then given by

i;„ = a sinnTrr, n = l,2, (8.7.12)

We know from the theory of Fourier series that the functions a ^mnnt, n — 1,2,..., form a basis set for £2(0^ !)• Thus, the operator L defined by Eq. (8.7.7) with i;(0) = i;(l) = 0 is a perfect operator in the sense that its eigenvectors form a complete set (i.e., a basis set) in £2(0, !)• In fact, the set is a complete, orthonor- mal basis set if a is set to \/2. In Chapter 9 we will explore the spectral theory of integral and differential operators and we will find that there are perfect as well as imperfect (defective) operators. Another example of a differential operator is the operator L in £2(^D)> where Lv is defined by the differential expression

. d^v d^v d^v ^„^^^^

with the boundary condition

i;(r) = }/(?) forrona^^, (8.7.14)

where yij) is a given function for values of r lying on the boundary dO^j^ oi the volume Qj), An integral operator K in £2(^' ^) is defined by the expression

Kv= / k{t,s)v{s)ds, (8.7.15)

k{t, s) is referred to as the kernel of the integral operator K and it is analogous to the component a^j of the matrix operator. Thus, if y = Ax, then the /th component of y is y, = Yij ciijXj, and if f = Kv, then the "component" of f corresponding to the index t is f(t) = f^ k(t, s)v(s)ds. What about the domain of an integral operator K? If K is an integral operator in £2(^' ^)' i^ ^ ^nd b are finite, and if the kernel k{t, s) is a continuous function of t and s, then f{t) = /^ k{t, s)v(s) ds is a continuous function for any function v(s) in C2{a, b). Since any continuous function belongs to C2{a, b), every vector in £2(^5 ^) belongs to the domain K. LINEAR OPERATORS 333

On the other hand, if K is an operator in L2(0, 1) with kernel k(t, s) = 1/(s + t)², and if v(s) = 1 and f = Kv, then

f(t) = 1/t - 1/(1 + t). (8.7.16)

Since f(t) is not square integrable, the function v = 1 does not define a vector in the domain of K. However, if v(s) = s and f = Kv, then

f(t) = ln((1 + t)/t) - 1/(1 + t), (8.7.17)

which is square integrable in L2(0, 1):

∫_0^1 f(t)² dt = 1/2 + (ln 2)². (8.7.18)

Thus, the function v = s does define a vector in the domain of K. In fact, the functions v_n(s) = s^n, n = 1, 2, ..., all belong to the domain of K. Perhaps the simplest kind of integral operator is the one with the kernel k(t, s) = g(t)h*(s). Symbolically, the operator corresponding to this kernel is a dyad, i.e.,

K = gh†. (8.7.19)

Application of such an operator to v yields

f = Kv = gh†v = g(h, v), (8.7.20)

or

f(t) = (h, v) g(t), (8.7.21)

where

(h, v) = ∫_a^b h*(s)v(s) ds. (8.7.22)

We can define the dyad operator for an arbitrary Hilbert space H by Eq. (8.7.19) and the expression

Kv = g(h, v), (8.7.23)

where (h, v) denotes the inner product defining the Hilbert space H. For example, suppose H = L2(Ω_D; k), such that

(h, v) = ∫_{Ω_D} h*(r)v(r)k(r) d^D r. (8.7.24)

The kernel of K = gh† is given by g(r)h*(s), but the action of K on v is to take the inner product of h† and v in L2(Ω_D; k), and so if f = Kv, it follows that

f(r) = (h, v) g(r), (8.7.25)

where (h, v) is given by Eq. (8.7.24).

A dyadic operator is, by definition, just a sum of dyads, i.e.,

K = Σ_i g_i h_i†. (8.7.26)

The dyadic form of an operator figured prominently in the spectral theory of matrix operators in finite-dimensional vector spaces. As we shall see in Chapter 9, this form will also become important in the spectral theory (i.e., the eigenanalysis) of operators in infinite-dimensional vector spaces. If f is a vector in a Hilbert space H, then αf, |α| < ∞, also belongs to the space H. Moreover, the adjoint (αf)† of αf obeys the relationship

(αf)† = α* f†, (8.7.27)

where α* is the complex conjugate of α. Correspondingly, the adjoint K† of the k-term dyadic K = Σ_{i=1}^k α_i a_i b_i† is

K† = Σ_{i=1}^k α_i* b_i a_i†. (8.7.28)

Dyadics provide the simplest examples of perfect and imperfect operators in Hilbert spaces. For example, consider the operator

K = P₀P₀† + P₁P₁† (8.7.29)

in L2(-1, 1), where P₀(t) and P₁(t) are the zeroth- and first-degree Legendre polynomials P₀(t) = 1 and P₁(t) = t. The solutions to the eigenproblem

Ku = λu (8.7.30)

are u_n = P_n for n = 0, 1, ..., with the eigenvalues λ₀ = 2, λ₁ = 2/3, and λ_n = 0 for n > 1. This results from the orthogonality of the Legendre polynomials and the nature of the dyadic operator. Namely,

KP_n = (P₀, P_n)P₀ + (P₁, P_n)P₁, (8.7.31)

and (P_m, P_n) = 0 if m ≠ n. Since the Legendre polynomials form a basis set in L2(-1, 1), the dyadic defined by Eq. (8.7.29) is a perfect operator. Consider, however, the operator

K = P₁P₀† + P₁P₁† (8.7.32)

and the corresponding eigenproblem

Ku = (P₀, u)P₁ + (P₁, u)P₁ = λu. (8.7.33)

For this case, u₁ = P₁, λ₁ = 2/3, and u_n = P_n, λ_n = 0 for n = 2, 3, .... However, P₀ is not an eigenvector of K, and so the eigenvectors of K do not form a basis set. The operator defined by Eq. (8.7.32) is, therefore, imperfect.
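The eigenvalues quoted in Eqs. (8.7.30) and (8.7.31) are easy to confirm by direct integration. The following Mathematica sketch (ours, not the text's) applies K = P₀P₀† + P₁P₁† to the first few Legendre polynomials.

    (* Check (not from the text) of Eq. (8.7.31): apply K = P0 P0† + P1 P1†
       to P_n on (-1, 1) and read off the eigenvalues.                      *)
    KP[n_] := LegendreP[0, t] Integrate[LegendreP[0, s] LegendreP[n, s], {s, -1, 1}] +
              LegendreP[1, t] Integrate[LegendreP[1, s] LegendreP[n, s], {s, -1, 1}];
    {KP[0], KP[1], KP[2]}
    (* -> {2, 2 t/3, 0}, i.e., eigenvalues 2, 2/3, and 0 *)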

EXERCISE 8.7.1. Determine whether or not the operator

K = P₁P₀† + P₀P₁† (8.7.34)

is perfect, where P₀ and P₁ are the first two Legendre polynomials. Find all of the eigenvectors and eigenvalues of K.

We end this section by deriving an explicit expression for the inverse of the operator

L = I + K, (8.7.35)

where ||K|| ≤ γ, 0 < γ < 1. Heuristically, we expect the inverse, denoted by

(I + K)⁻¹ or 1/(I + K), (8.7.36)

to be expandable in the series

S = I - K + K² - K³ + ⋯. (8.7.37)

Thus, we must first prove that this series converges. Consider the finite sum

S_n = I + Σ_{i=1}^n (-K)^i. (8.7.38)

Recalling the definition of the norm of an operator in the Hilbert space H,

||L||² = max_{u∈H, u≠0} (Lu, Lu)/(u, u), (8.7.39)

we observe that

||Lu|| ≤ ||L|| ||u||, (8.7.40)

where the length ||u|| of a vector is defined by ||u||² = (u, u). Thus, for any vector u ∈ H, it follows that

||(S_n - S_m)u|| = ||Σ_{i=m+1}^n (-K)^i u|| ≤ Σ_{i=m+1}^n ||(-K)^i u||. (8.7.41)

Observing that ||K^i u|| ≤ γ^i ||u||, Eq. (8.7.41) becomes

||(S_n - S_m)u|| ≤ [γ^{m+1}(1 - γ^{n-m})/(1 - γ)] ||u||. (8.7.42)

Since γ < 1, there exists an m(ε) such that γ^{m+1} < ε(1 - γ), and so

||(S_n - S_m)u|| ≤ ε(1 - γ^{n-m}) ||u|| < ε ||u||,   n > m ≥ m(ε). (8.7.43)

This implies that S_n u is a Cauchy sequence, and so lim_{n→∞} S_n u converges to a vector v in H. Thus, we define the operator S by

v = Su = lim_{n→∞} S_n u. (8.7.44)

The next question is whether S = (I + K)⁻¹. Note that

||S_n(I + K)u - S(I + K)u|| = ||Σ_{i=n+1}^∞ (-K)^i (I + K)u|| ≤ [γ^{n+1}/(1 - γ)](1 + γ) ||u||. (8.7.45)

Since the right-hand side of Eq. (8.7.45) goes to 0 as n → ∞, it follows that lim_{n→∞} S_n(I + K) = S(I + K). A similar argument establishes that lim_{n→∞} (I + K)S_n = (I + K)S. In summary, we have proved the following:

THEOREM. If L = I + K and ||K|| ≤ γ, 0 < γ < 1, then the inverse of (I + K) exists and can be represented as

(I + K)⁻¹ = I + Σ_{i=1}^∞ (-K)^i. (8.7.46)

This also leads to the conclusion that the solution of

(I + K)u = f (8.7.47)

always exists and is unique. It could be computed from the series

u = f + Σ_{i=1}^∞ (-1)^i K^i f, (8.7.48)

although this might not be the most economical way to obtain the solution. Alternatively, an iterative solution to the problem could be obtained from the sequence

u₁ = f,
u₂ = f - Ku₁, (8.7.49)
⋮
u_n = f - Ku_{n-1},

since the theorem at Eq. (8.7.46) guarantees convergence of the series.
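As a concrete illustration of the iteration (8.7.49), consider the dyad kernel k(t, s) = ts/2 on (0, 1) with f(t) = t; here ||K|| = 1/6 < 1, so the iterates converge, and the exact solution is u(t) = 6t/7. The Mathematica sketch below is ours, not the text's; the kernel and right-hand side are hypothetical choices for illustration only.

    (* Successive approximations u_n = f - K u_{n-1}, Eq. (8.7.49), for the
       kernel k(t, s) = t s/2 on (0, 1) and f(t) = t; here ||K|| = 1/6 < 1. *)
    k[t_, s_] := t s/2;
    f[t_] := t;
    u[1][t_] := f[t];
    u[n_][t_] := Module[{s}, f[t] - Integrate[k[t, s] u[n - 1][s], {s, 0, 1}]];
    Simplify[u[5][t]]
    (* -> 1111 t/1296, already close to the exact solution 6 t/7 *)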

8.8. SOLUTIONS TO PROBLEMS INVOLVING k-TERM DYADICS

In the previous section we encountered the k-term dyadic operator of the form

K = Σ_{i=1}^k a_i b_i†, (8.8.1)

where the sets {a₁, ..., a_k} and {b₁, ..., b_k} are linearly independent sets. (Note: Some texts refer to operators of the form of Eq. (8.8.1) as degenerate operators.) If the sets {a_i} and {b_i} are not linearly independent, then the dyadic operator in Eq. (8.8.1) can be transformed into another dyadic operator composed of smaller sets {ã_i} and {b̃_i} that are linearly independent. For example, suppose a_k is linearly dependent on the vectors a₁, ..., a_{k-1}. Then there exists a set of numbers {α₁, ..., α_{k-1}} such that

a_k = Σ_{i=1}^{k-1} α_i a_i. (8.8.2)

Inserting this expression into Eq. (8.8.1) gives

K = Σ_{i=1}^{k-1} a_i b_i† + Σ_{i=1}^{k-1} α_i a_i b_k† = Σ_{i=1}^{k-1} a_i b̃_i†, (8.8.3)

a (k - 1)-term dyadic in which

b̃_i = b_i + α_i* b_k,   i = 1, ..., k - 1. (8.8.4)

If one vector in the set {b̃_i} is linearly dependent on the rest, then a similar process will enable reduction of Eq. (8.8.3) to

K = Σ_{i=1}^{k-2} ã_i b̃_i†. (8.8.5)

The process can be continued until K is reduced to a sum of dyads a_i b_i† whose vectors {a_i} and {b_i} form linearly independent sets. We say such a dyadic is irreducible. In what follows, we will always assume that the k-term dyadic is irreducible. Consider next the inhomogeneous equation

Ku = f, (8.8.6)

where K is a k-term irreducible dyadic. Since Eq. (8.8.6) can be expressed as

Σ_{i=1}^k (b_i, u) a_i = f, (8.8.7)

it follows that Eq. (8.8.6) will have a solution only if f is a linear combination of the set {a_i}, i.e., only if

f = Σ_{i=1}^k α_i a_i. (8.8.8)

If f is, indeed, of the form of Eq. (8.8.8), it follows from Eq. (8.8.7) that

(b_i, u) = α_i,   i = 1, ..., k. (8.8.9)

To find a particular solution u_p, we assume that

u_p = Σ_{j=1}^k β_j b_j (8.8.10)

and substitute this into Eq. (8.8.9) to yield the matrix problem

Aβ = α, (8.8.11)

where A is a self-adjoint, k × k matrix given by

A = [(b_i, b_j)], (8.8.12)

and β and α are k-dimensional vectors with components β_i and α_i, respectively. From the linear independence of the set {b₁, ..., b_k}, it follows that the matrix A is nonsingular, i.e., |A| ≠ 0. To prove this, let us begin with the hypothesis that A is singular; i.e., its rank is less than k. From the theory of matrices, this implies that at least one column, say column k, of A is a linear combination of the other columns; i.e., there exists a set of numbers c₁, ..., c_{k-1}, not all 0, such that

[(b₁, b_k), ..., (b_k, b_k)]ᵀ = Σ_{j=1}^{k-1} c_j [(b₁, b_j), ..., (b_k, b_j)]ᵀ, (8.8.13)

or

(b_i, b_k) = Σ_{j=1}^{k-1} c_j (b_i, b_j),   i = 1, ..., k. (8.8.14)

Rearrangement gives

Σ_{j=1}^k c̃_j (b_i, b_j) = 0,   i = 1, ..., k, (8.8.15)

where c̃_j = -1 if j = k and c̃_j = c_j otherwise. Equation (8.8.15) implies that the vector b_{k+1} = Σ_{j=1}^k c̃_j b_j is orthogonal to the linearly independent set b₁, ..., b_k, and thus the set b₁, ..., b_{k+1} is linearly independent. This is a contradiction since the equation

Σ_{i=1}^{k+1} δ_i b_i = 0 (8.8.16)

has the solution δ_{k+1} = 1 and δ_i = -c̃_i for i = 1, ..., k. Thus, we have proven that if the set {b_i} is linearly independent, then A must be nonsingular. We can restate this point in the following theorem:

THEOREM. If b₁, ..., b_k is a linearly independent set in an abstract linear vector space H (of arbitrary dimension, as long as the dimension is greater than or equal to k), then the k × k matrix A = [(b_i, b_j)] is nonsingular.

Since A is nonsingular, the inverse of A exists, and so Eq. (8.8.11) has the unique solution β = A⁻¹α, yielding

u_p = Σ_{i=1}^k Σ_{j=1}^k (A⁻¹)_{ij} α_j b_i. (8.8.17)

We can assign the vectors b₁, ..., b_k as the first k vectors of a basis set containing an infinite number of basis vectors. To complete the basis set, we can add the vectors b_{k+1}, b_{k+2}, ..., and with the aid of the Gram-Schmidt process the added vectors can even be chosen to be orthogonal to the vectors b₁, ..., b_k. The added vectors then obey the homogeneous equation

Kb_i = 0,   i = k + 1, k + 2, .... (8.8.18)

Thus, the solvability theorem for the k-term dyadic problem reads as follows:

THEOREM. The equation

Ku = Σ_{i=1}^k a_i (b_i, u) = f (8.8.19)

has the solution

u = u_p + Σ_{i=k+1}^∞ c_i b_i, (8.8.20)

where u_p is a particular solution to Eq. (8.8.19) and the b_i, i = k + 1, k + 2, ..., are solutions to the homogeneous equation Ku = 0. Equation (8.8.11) guarantees the existence of a particular solution, and the solutions to the homogeneous problem, {b_{k+1}, b_{k+2}, ...}, can always be found by constructing basis vectors orthogonal to the set {b₁, ..., b_k}. The theorem, of course, assumes that the k-term dyadic has been reduced to a sum of linearly independent dyads.

EXAMPLE 8.8.1. Consider the space L2(-1, 1) and the two-term dyadic operator K with the kernel

k(t, s) = 2 exp(-t) + s sin t (8.8.21)

and the equation

Ku = f, (8.8.22)

where

f(t) = 3 exp(-t) + 5 sin t. (8.8.23)

Equation (8.8.22) is given explicitly as

2 exp(-t) ∫_{-1}^1 u(s) ds + sin t ∫_{-1}^1 s u(s) ds = 3 exp(-t) + 5 sin t. (8.8.24)

The operator K is represented by Eq. (8.8.1) with b₁(t) = 2, b₂(t) = t, a₁(t) = e^{-t}, and a₂(t) = sin t. The vector f yields the coefficients α₁ = 3 and α₂ = 5, and the relevant matrix elements are given by (b₁, b₁) = 4 ∫_{-1}^1 dt = 8, (b₁, b₂) = (b₂, b₁) = 2 ∫_{-1}^1 t dt = 0, and (b₂, b₂) = ∫_{-1}^1 t² dt = 2/3. Correspondingly, the equation Aβ = α, Eq. (8.8.11), becomes

[ 8    0  ] [β₁]   [3]
[ 0   2/3 ] [β₂] = [5], (8.8.25)

whose solution is β₁ = 3/8 and β₂ = 15/2. Thus, the particular solution is

u_p(t) = 3/4 + (15/2) t. (8.8.26)
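A direct substitution confirms this result. The following Mathematica check is ours, not the text's; it inserts u_p into the left-hand side of Eq. (8.8.24).

    (* Check (not from the text) that u_p(t) = 3/4 + 15 t/2 satisfies Eq. (8.8.24). *)
    up[t_] := 3/4 + 15 t/2;
    lhs[t_] = 2 Exp[-t] Integrate[up[s], {s, -1, 1}] +
              Sin[t] Integrate[s up[s], {s, -1, 1}];
    Simplify[lhs[t]]
    (* -> 3 E^-t + 5 Sin[t], which is f(t) in Eq. (8.8.23) *)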

Since b₁(t) = 2P₀(t) and b₂(t) = P₁(t), it follows that

KP_l = 0,   l = 2, 3, ..., (8.8.27)

where P_l is the Legendre polynomial

P_l(t) = (1/(2^l l!)) d^l(t² - 1)^l / dt^l. (8.8.28)

The general solution to the problem is, therefore,

u(t) = 3/4 + (15/2) t + Σ_{l=2}^∞ c_l P_l(t), (8.8.29)

where the coefficients c_l are arbitrary and

∫_{-1}^1 P_l(t) P_m(t) dt = 0,   l ≠ m. (8.8.30)

The set {P_l} is known to be an orthogonal basis set in L2(-1, 1). Another interesting problem involves an operator of the form L = I + K, where I is the identity operator and K is a k-term dyadic. The corresponding inhomogeneous equation is

(I + K)u = f or u + Ku = f, (8.8.31)

or

u + Σ_{i=1}^k a_i (b_i, u) = f. (8.8.32)

Clearly, if the quantities (b_i, u) can be found, a solution of Eq. (8.8.32) is

u = f - Σ_{i=1}^k (b_i, u) a_i. (8.8.33)

To generate a set of equations for the coefficients (b_i, u), we take the inner product of Eq. (8.8.32) with b_i, i = 1, ..., k. The result is the matrix problem

Aβ = α, (8.8.34)

where A is a k × k matrix with elements

a_ii = 1 + (b_i, a_i) and a_ij = (b_i, a_j), i ≠ j, (8.8.35)

and β and α are k-dimensional vectors with elements

β_i = (b_i, u) and α_i = (b_i, f). (8.8.36)

According to the theory of the solvability of linear algebraic equations, Eq. (8.8.34) has a solution if and only if the rank of A is the same as the rank of the augmented matrix [A, α]. When Eq. (8.8.34) is solvable, we know from Chapter 4 that the solutions are of the form

β = β^p + Σ_{j=1}^{k-r} c_j β_j^h, (8.8.37)

where r is the rank of A, β^p is a particular solution, and Aβ_j^h = 0, j = 1, ..., k - r. The general solution to Eq. (8.8.32) then becomes

u = f - Σ_{i=1}^k β_i^p a_i - Σ_{j=1}^{k-r} c_j Σ_{i=1}^k β_{j,i}^h a_i. (8.8.38)

EXAMPLE 8.8.2. For the space L2(0, 1), find the conditions for the solution of

u(t) + λ ∫_0^1 s u(s) ds = f(t). (8.8.39)

Here l(t, s) = δ(t - s) + λs, where l(t, s) is the kernel of L = I + K, and so a₁(t) = λ and b₁(s) = s. We can multiply Eq. (8.8.39) by t and integrate to get

∫_0^1 t u(t) dt + λ ∫_0^1 t dt ∫_0^1 s u(s) ds = ∫_0^1 t f(t) dt, (8.8.40)

or

(1 + λ/2)(b₁, u) = ∫_0^1 t f(t) dt. (8.8.41)

We see that Eq. (8.8.40) has a unique solution only if λ ≠ -2. Namely,

u(t) = f(t) - [λ/(1 + λ/2)] ∫_0^1 t f(t) dt (8.8.42)

for arbitrary f(t) in L2(0, 1).
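For a specific right-hand side, the solution formula (8.8.42) is easy to verify by substitution. The Mathematica sketch below is ours, not the text's, and uses the hypothetical choices λ = 1 and f(t) = t.

    (* Check (not from the text) of Eq. (8.8.42) for lambda = 1 and f(t) = t. *)
    lambda = 1; f[t_] := t;
    u[t_] = f[t] - (lambda/(1 + lambda/2)) Integrate[s f[s], {s, 0, 1}];
    Simplify[u[t] + lambda Integrate[s u[s], {s, 0, 1}] - f[t]]
    (* -> 0, so Eq. (8.8.39) is satisfied *)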

If λ = -2, Eq. (8.8.40) requires that f(t) satisfy the equation

0 = ∫_0^1 t f(t) dt. (8.8.43)

The resulting homogeneous solution is u(t) = ρ, where ρ is an arbitrary complex number. The solution to Eq. (8.8.39) then becomes u(t) = f(t) + ρ. On the other hand, if λ = -2 and f(t) does not satisfy Eq. (8.8.43), the problem does not have a solution. We will end this section with the following theorem:

THEOREM. A k-term dyadic operator in H,

L = Σ_{i=1}^k u_i v_i†, (8.8.44)

can always be transformed into a p-term dyadic of the form

L = Σ_{i,j=1}^p ω_ij φ_i φ_j†, (8.8.45)

where p ≤ 2k and the φ_i are orthonormal vectors, i.e.,

(φ_i, φ_j) = δ_ij. (8.8.46)

The proof of this is simple. By the definition of a k-term dyadic, the sets {u₁, ..., u_k} and {v₁, ..., v_k} are both linearly independent. The vectors v₁, ..., v_k, however, are not necessarily linearly independent of the vectors u₁, ..., u_k. If they are, the set {u₁, ..., u_k, v₁, ..., v_k} forms a set of 2k linearly independent vectors, and the Gram-Schmidt procedure can be used to transform this set into the orthonormal set {φ₁, ..., φ_{2k}}. However, if some of the vectors v₁, ..., v_k are linearly dependent on the vectors u₁, ..., u_k, then the Gram-Schmidt procedure yields the orthonormal set {φ₁, ..., φ_p} with p < 2k. In either case, u_i and v_i can be expressed as the linear combinations

u_i = Σ_{j=1}^p α_ij φ_j and v_i = Σ_{j=1}^p β_ij φ_j, (8.8.47)

and so L becomes

L = Σ_{i=1}^k u_i v_i† = Σ_{i=1}^k (Σ_{j=1}^p α_ij φ_j)(Σ_{l=1}^p β_il φ_l)† = Σ_{j,l=1}^p ω_jl φ_j φ_l†, (8.8.48)

where

ω_jl = Σ_{i=1}^k α_ij β_il*. (8.8.49)

Note that if L is self-adjoint, i.e., if L = L†, then

Σ_{i,j=1}^p ω_ij φ_i φ_j† = Σ_{i,j=1}^p ω_ij* φ_j φ_i† = Σ_{i,j=1}^p ω_ji* φ_i φ_j†, (8.8.50)

and, therefore, ω_ij = ω_ji*.

8.9. PERFECT OPERATORS

We call an operator in H perfect if its eigenvectors form a basis set. For example, the operator L in L2(0, 1) defined by the differential expression

Lv(t) = -d²v(t)/dt², (8.9.1)

with the boundary conditions

v(0) = v(1) = 0, (8.9.2)

is perfect since its eigenvectors

v_n = √2 sin nπt,   n = 1, 2, ..., (8.9.3)

with the corresponding eigenvalues λ_n = (nπ)² for n = 1, 2, ..., form a basis set in L2(0, 1). In fact, not only does (8.9.3) form a basis set, it is an orthonormal basis set, since we observe that

(v_n, v_m) = v_n†v_m = 2 ∫_0^1 sin nπt sin mπt dt = δ_nm. (8.9.4)
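The orthonormality relation (8.9.4) can be confirmed directly; the short Mathematica check below (ours, not the text's) tabulates the inner products for the first few basis functions.

    (* Check (not from the text) of Eq. (8.9.4) for n, m = 1, ..., 4. *)
    v[n_][t_] := Sqrt[2] Sin[n Pi t];
    Table[Integrate[v[n][t] v[m][t], {t, 0, 1}], {n, 1, 4}, {m, 1, 4}]
    (* -> the 4 x 4 identity matrix *)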

Thus, any vector f in L2(0, 1) can be expressed as

f = Σ_{n=1}^∞ a_n v_n, (8.9.5)

where

a_n = (v_n, f) = v_n†f. (8.9.6)

By substituting Eq. (8.9.6) and rearranging, Eq. (8.9.5) becomes

f = Σ_{n=1}^∞ v_n v_n†f = (Σ_{n=1}^∞ v_n v_n†) f. (8.9.7)

Since Eq. (8.9.7) is valid for an arbitrary vector f in L2(0, 1), it follows that the expression

I = Σ_{n=1}^∞ v_n v_n† (8.9.8)

is the identity operator in L2(0, 1). The individual dyads v_n v_n† are integral operators with the kernels v_n(t)v_n*(s) = 2 sin nπt sin nπs, and so I is an integral operator with the kernel

i(t, s) = Σ_{n=1}^∞ v_n(t) v_n*(s). (8.9.9)

Therefore, Eq. (8.9.7) reads f = If, or

f(t) = ∫_0^1 i(t, s) f(s) ds (8.9.10)

for an arbitrary Lebesgue square-integrable function in L2(0, 1). From this it follows that the kernel i(t, s) must actually be the Dirac delta function δ(t - s), i.e.,

δ(t - s) = Σ_{n=1}^∞ v_n(t) v_n*(s). (8.9.11)

Equation (8.9.8) represents the resolution of the identity, and Eq. (8.9.11) provides a representation of the Dirac delta function in terms of the functions of an orthonormal basis set in L2(0, 1). Equation (8.9.11) is sometimes referred to as the completeness relation. For the purpose of illustration, we considered the basis set given in Eq. (8.9.3). However, for any orthonormal basis set in L2(0, 1), Eqs. (8.9.8) and (8.9.11) follow. Similar arguments show that if v_n, n = 1, 2, ..., is an orthonormal basis set in L2(Ω_D), then the resolution of the identity is given by

I = Σ_{n=1}^∞ v_n v_n†, (8.9.12)

where I is an integral operator with the kernel

i(r, s) = Σ_{n=1}^∞ v_n(r) v_n*(s) = δ(r - s). (8.9.13)

In this case, δ(r - s) is the Dirac delta function in the D-dimensional Euclidean space. Suppose the eigenvectors v_n, n = 1, 2, ..., of the operator L in H form an orthonormal basis set. Then, since f = Σ_{n=1}^∞ v_n v_n†f and Lv_n = λ_n v_n, it follows that

Lf = Σ_{n=1}^∞ λ_n v_n v_n†f (8.9.14)

for arbitrary f. From this it follows that the operator L can be represented by

L = Σ_{n=1}^∞ λ_n v_n v_n†. (8.9.15)

We have just derived the spectral resolution theorem for perfect operators in H that have orthonormal eigenvectors. We will explore this theorem in greater detail in Chapter 9.

The significance of Eq. (8.9.15) is tremendous. Consider the case of the second-order differential operator defined by Eqs. (8.9.1) and (8.9.2). According to Eq. (8.9.15), this differential operator can be represented in the Hilbert space L2(0, 1) by an integral operator whose kernel is given by

l(t, s) = Σ_{n=1}^∞ λ_n v_n(t) v_n*(s) = Σ_{n=1}^∞ 2(nπ)² sin nπt sin nπs. (8.9.16)
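A finite truncation of this kernel already behaves like the differential operator on vectors in the domain of L. The Mathematica sketch below is ours, not the text's; it applies the truncated kernel to the hypothetical test function f(t) = t(1 - t), for which Lf = -f'' = 2.

    (* Numerical check (not from the text): the truncated kernel of Eq. (8.9.16)
       applied to f(t) = t(1 - t) approximates L f = -f'' = 2.                  *)
    lN[t_, s_, nmax_] := Sum[2 (n Pi)^2 Sin[n Pi t] Sin[n Pi s], {n, 1, nmax}];
    f[s_] := s (1 - s);
    NIntegrate[lN[0.3, s, 50] f[s], {s, 0, 1}]
    (* -> approximately 2 *)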

It is straightforward to show that

L^k = Σ_{n=1}^∞ λ_n^k v_n v_n†, (8.9.17)

and, from that,

f(L) = Σ_{n=1}^∞ f(λ_n) v_n v_n†, (8.9.18)

where f(t) is any function that can be represented by a series in t. In the notation of quantum mechanics, a dyadic K = Σ_n a_n b_n† is denoted by

K = Σ_n |a_n⟩⟨b_n|. (8.9.19)

The meaning of the dyadic is that the operation of K on u gives

Ku = Σ_n |a_n⟩(b_n, u); (8.9.20)

i.e., Ku is a linear combination of the vectors a_n whose coefficients are the inner products (b_n, u). Our notation means the same thing. We write

Ku = Σ_n a_n b_n†u, (8.9.21)

in which b_n†u denotes the inner product in H. In the space L2(Ω_D), the inner product is simply

b_n†u = ∫_{Ω_D} b_n*(r)u(r) d^D r, (8.9.22)

whereas, in the space L2(Ω_D; k), the inner product is

b_n†u = ∫_{Ω_D} b_n*(r)u(r)k(r) d^D r. (8.9.23)

Thus, the operation of the dyad a_n b_n† on u is defined in this case as taking the k(r)-weighted inner product between b_n and u. We prefer our dyad notation ab† to the so-called "bra" and "ket" notation of quantum mechanics, |a⟩⟨b|, because the

former notation is more suggestive of the analogy between operators in abstract linear vector spaces and matrix operators in E_n. As was true for basis sets in E_n, basis sets in an arbitrary Hilbert space need not be orthogonal. In such cases, the notion of a reciprocal basis set becomes important. If v_n, n = 1, 2, ..., is a basis set, the reciprocal set u_n, n = 1, 2, ..., is defined by the conditions

(v_n, u_m) = δ_nm. (8.9.24)

How we go about finding the reciprocal basis set is an issue that we will address in later chapters. If v_n and u_n, n = 1, 2, ..., denote reciprocally related sets in H, then, since coefficients α_n exist such that

f = Σ_n α_n v_n (8.9.25)

for any f, it follows from Eq. (8.9.24) that α_n = (u_n, f). Equation (8.9.25) can, thus, be written as

f = Σ_n v_n(u_n, f) = (Σ_n v_n u_n†) f, (8.9.26)

and so the resolution of the identity becomes

I = Σ_n v_n u_n† (8.9.27)

for a nonorthogonal basis set in H. If the set {v_n}, n = 1, 2, ..., are eigenvectors of a perfect operator L, it follows from Eq. (8.9.27) that

L = Σ_n λ_n v_n u_n† (8.9.28)

and, similarly,

f(L) = Σ_n f(λ_n) v_n u_n†, (8.9.29)

where f(t) is a function that can be represented by a series in t. An important property of the dyad ab†, as we have defined it, is that its adjoint is given by

(ab†)† = ba†. (8.9.30)

Thus, if H = E_n, the quantity ab† is a matrix whose components are a_i b_j*, and its adjoint matrix, ba†, has components b_i a_j*. If H is a function space, say L2(Ω_D), then ab† is an integral operator with the kernel a(r)b*(s). In this case, the adjoint ba† is also an integral operator, with the kernel b(r)a*(s). This property has important implications for a perfect operator whose eigenvectors are not orthogonal. The spectral resolution of such an operator is given by Eq. (8.9.28). The adjoint of this operator is

L† = Σ_n λ_n* u_n v_n†, (8.9.31)

from which it follows that

L†u_n = λ_n* u_n,   n = 1, 2, .... (8.9.32)

Hence, the eigenvectors of the adjoint L† of a perfect operator L form the reciprocal basis set for the eigenvectors of L. Moreover, the eigenvalues of L† are the complex conjugates of the eigenvalues of L. We see that the analogy between linear operators in abstract linear vector spaces and matrix operators in E_n is strikingly close, especially for perfect operators. The challenge, however, is to determine what classes of operators are perfect. In the following chapters we will investigate certain classes of integral and differential operators and we will try to provide an answer to this question. Next, let us consider the following theorem:

THEOREM. A self-adjoint, k-term dyadic is a perfect operator, has a complete set of orthonormal eigenvectors, and has real eigenvalues.

To prove this, suppose

L = Σ_{i=1}^k u_i v_i†, (8.9.33)

where the sets {u₁, ..., u_k} and {v₁, ..., v_k} are linearly independent sets in a Hilbert space H. We define the adjoint of L by

L† = Σ_{i=1}^k v_i u_i†. (8.9.34)

The operations Lu and L†u can then be composed as follows:

Lu = Σ_{i=1}^k u_i v_i†u = Σ_{i=1}^k (v_i, u) u_i (8.9.35)

and

L†u = Σ_{i=1}^k v_i u_i†u = Σ_{i=1}^k (u_i, u) v_i, (8.9.36)

where v_i†u and (v_i, u) are, again, our two different notations for the inner product defining the Hilbert space. Using the Gram-Schmidt procedure, a set of orthonormal vectors {φ₁, ..., φ_k} can be constructed as a linear combination of {v₁, ..., v_k}, and thus we can find coefficients γ_ij such that

v_i = Σ_{j=1}^k γ_ij φ_j,   where (φ_i, φ_j) = δ_ij. (8.9.37)

Recall that the orthonormal functions φ_{k+1}, φ_{k+2}, ... can always be found to complete an orthonormal basis set φ_i, i = 1, 2, ..., in H. Insertion of the expression for v_i in Eq. (8.9.37) into Eq. (8.9.33) yields

L = Σ_{j=1}^k f_j φ_j†, (8.9.38)

where

f_j = Σ_{i=1}^k γ_ij* u_i. (8.9.39)

Since the set {φ_i} forms a basis set, we can express the vectors f_j as

f_j = Σ_{i=1}^∞ α_ij φ_i,   j = 1, ..., k. (8.9.40)

The operator L and its adjoint can now be represented as

L = Σ_{j=1}^k Σ_{i=1}^∞ α_ij φ_i φ_j† (8.9.41)

and

L† = Σ_{j=1}^k Σ_{i=1}^∞ α_ij* φ_j φ_i†. (8.9.42)

Operating on the vector φ_n, these expressions become

Lφ_n = Σ_{i=1}^∞ α_in φ_i for n ≤ k,   Lφ_n = 0 for n > k, (8.9.43)

and

L†φ_n = Σ_{j=1}^k α_nj* φ_j. (8.9.44)

Since Lu = L†u for a self-adjoint operator, comparison of Eqs. (8.9.43) and (8.9.44) leads to the conclusions α_nj = α_jn* for n ≤ k and α_nj = 0 for n > k. So the k × k matrix A = [α_nj] has the property

A = A†; (8.9.45)

i.e., A is self-adjoint. We know from Chapter 6 that a self-adjoint matrix can be expressed in the form

A = UΛU†, (8.9.46)

where Λ is a diagonal matrix whose diagonal elements are the real eigenvalues of A, and U is a unitary matrix, i.e.,

U†U = I or Σ_{i=1}^k u_ij* u_il = δ_jl. (8.9.47)

Thus, we can write the coefficients of A as

α_ij = Σ_{l=1}^k λ_l u_il u_jl*, (8.9.48)

and express the operator L as

L = Σ_{i,j,l=1}^k λ_l u_il u_jl* φ_i φ_j†. (8.9.49)

With the definitions

χ_l = Σ_{i=1}^k u_il φ_i and χ_l† = Σ_{j=1}^k u_jl* φ_j†, (8.9.50)

Eq. (8.9.49) becomes

L = Σ_{l=1}^k λ_l χ_l χ_l†. (8.9.51)

Finally, we note that

(χ_l, χ_l') = Σ_{i=1}^k Σ_{j=1}^k u_il* u_jl' (φ_i, φ_j) = Σ_{i=1}^k u_il* u_il' = δ_ll'. (8.9.52)

Thus, the vectors χ_l, l = 1, ..., k, are the orthonormal eigenvectors of L, and the eigenvalues of A are the eigenvalues of L corresponding to the χ_l. Moreover, since (φ_i, φ_j) = 0 for i ≤ k and j > k, it follows that

Lφ_j = 0,   j = k + 1, k + 2, ...; (8.9.53)

i.e., the vectors φ_j, for which j > k, are eigenvectors of L with zero eigenvalue. We see that the eigenvectors χ₁, ..., χ_k, φ_{k+1}, φ_{k+2}, ... form an orthonormal basis set in H, and a k-term dyadic in an infinite-dimensional vector space will always have an infinite number of eigenvectors having a zero eigenvalue. A similar proof can be given for the following theorem:

THEOREM. A normal k-term dyadic operator (defined by the property LL† = L†L) in H has a complete set of orthonormal eigenvectors in H.

L can, again, be shown to be expressible in the form

L = Σ_{j=1}^k Σ_{i=1}^∞ α_ij φ_i φ_j†, (8.9.54)

where the set {φ_i} forms an orthonormal set. The condition LL† = L†L and the inner products (φ_r, LL†φ_s) and (φ_r, L†Lφ_s) lead to the conditions

Σ_{j=1}^k α_rj α_sj* = 0,   r or s > k, (8.9.55)

and

Σ_{j=1}^k α_rj α_sj* = Σ_{i=1}^∞ α_ir* α_is. (8.9.56)

Equation (8.9.55) for r = s > k implies that α_rj = 0 for r > k. Thus, the k × k matrix A = [α_ij], according to Eq. (8.9.56), obeys the condition

AA† = A†A; (8.9.57)

i.e., A is a normal matrix. We proved previously that normal matrices can be expressed as

A = UΛU†, (8.9.58)

where U is a unitary matrix and the diagonal matrix Λ contains the eigenvalues λ_i of A. This result again allows us to transform L into the form of Eq. (8.9.51), in which the χ_l are orthonormal eigenvectors of L with eigenvalues λ_l. The λ_l are not necessarily real, as they are in the case of a self-adjoint matrix. It follows from the spectral theorem that, for any function f(t) that can be expressed as a series in t,

f(L) = Σ_{l=1}^k f(λ_l) χ_l χ_l† (8.9.59)

for a normal k-term dyadic operator (of which a self-adjoint operator is a special case). A further generalization is that, for any f(t) that exists at t = λ_i, i = 1, 2, ..., Eq. (8.9.59) defines f(L) for a normal operator. Examples of this are f(t) = t^{1/2} and f(t) = ln t, for which

f(L) = Σ_{l=1}^k λ_l^{1/2} χ_l χ_l† and f(L) = Σ_{l=1}^k (ln λ_l) χ_l χ_l†. (8.9.60)

Let us close this chapter by noting that a linear problem in any Hilbert space can be mapped into a linear problem in the space E_∞. Consider the equation

Lu = f,   u, f ∈ H. (8.9.61)

Suppose v_n, n = 1, 2, ..., is an orthonormal basis set in H. Then it follows that u = Σ_n v_n(v_n, u) and f = Σ_n v_n(v_n, f), from which Eq. (8.9.61) becomes

Σ_n Lv_n(v_n, u) = Σ_n v_n(v_n, f). (8.9.62)

Taking the inner product of Eq. (8.9.62) with v_m, we obtain

Σ_n (v_m, Lv_n)(v_n, u) = (v_m, f),   m = 1, 2, .... (8.9.63)

We can define the matrix A in E_∞ with components

a_mn = (v_m, Lv_n), (8.9.64)

and the vectors x and b with components

x_n = (v_n, u) and b_n = (v_n, f). (8.9.65)

Equation (8.9.63) then becomes the matrix equation

Ax = b,   x, b ∈ E_∞. (8.9.66)

Hence, the problem posed in Eq. (8.9.61) has been converted into an equivalent problem in E_∞. Of course, handling vectors in the infinite-dimensional space E_∞ is not necessarily easier than dealing directly with differential and integral equations. However, the equivalence of any H to E_∞ does provide encouragement to seek analogies between problems in H and known results in E_n. In the usual solution of practical problems, one almost always converts differential and integral problems to algebraic problems of finite dimension. Finite-element analysis is a popular example. The unknowns are approximated by a finite number of basis functions, and then the methods and theory laid out in Chapters 1-7 for solving algebraic systems are employed to get answers. This being the case, why pursue the theory of abstract vector spaces further? The answer is that the solvability of differential and integral problems is an important issue underlying any approximate solution. Also, there exists a great body of theory from classical analysis that often provides analytical solutions to important, illustrative problems. These solutions often identify similarities and important differences between finite- and infinite-dimensional problems and can be used to test approximating computer codes.
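The projection in Eqs. (8.9.63)-(8.9.66) is easy to carry out explicitly for the operator of Eqs. (8.9.1) and (8.9.2). In the Mathematica sketch below (ours, not the text's), the matrix a_mn = (v_m, Lv_n) built from the basis (8.9.3) comes out diagonal, with the eigenvalues (nπ)² on the diagonal.

    (* Sketch (not from the text): matrix elements a_mn = (v_m, L v_n), Eq. (8.9.64),
       for L = -d^2/dt^2 with v(0) = v(1) = 0 and v_n(t) = Sqrt[2] Sin[n Pi t].      *)
    v[n_][t_] := Sqrt[2] Sin[n Pi t];
    a[m_, n_] := Integrate[v[m][t] (-D[v[n][t], {t, 2}]), {t, 0, 1}];
    Table[a[m, n], {m, 1, 3}, {n, 1, 3}]
    (* -> {{Pi^2, 0, 0}, {0, 4 Pi^2, 0}, {0, 0, 9 Pi^2}} *)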

PROBLEMS

1. Solve the dyadic equation

.1 3

2. Discuss the solution of the equation

∫_0^π cos 3(t - s) u(s) ds = sin 3t + 2 cos 3t

in L2(0, π).

3. Consider the dyadic operators

(X)

and

where {v_i} is an orthonormal basis set in a Hilbert space H and λ_i = i, μ_i = 1/√i, and ν_i = 1/i. The square of the norm of an operator in H is

||L||² = max_{u∈H} (Lu, Lu)/(u, u).

(a) Prove that L₁ and L₂ are unbounded and that L₃ is bounded, i.e., ||L₁|| = ||L₂|| = ∞ and ||L₃|| < ∞.
(b) Which of the three operators are perfect?

4. Consider the operator L defined by

u(1) = u(0) and du(1)/dx = du(0)/dx

in L2(0, 1).
(a) Find the eigenvectors and eigenvalues of L.
(b) Normalize the eigenvectors.
(c) As will be proved in a later chapter, L is a perfect operator. Give its dyadic form; i.e., give its spectral representation. Note that this enables you to express a differential operator as an integral operator.
(d) Is L a bounded operator in the sense described in Problem 3?

5. Prove that if f belongs to L2(0, 1), then it belongs to L2(0, 1; e^{-t}); i.e., prove that if

∫_0^1 |f(t)|² dt < ∞,

then

∫_0^1 |f(t)|² exp(-t) dt < ∞,

where the integrals are Lebesgue integrals. The proof is simple.

6. The Hermite functions, n = 0, 1, 2, ..., form a basis set in L2(-∞, ∞). Prove that they form an orthonormal basis set.

7. Suppose f_n ∈ L2(0, 1), where f_n(t) = t^n. Show by direct calculation that

(f_n, f_m) = 1/(n + m + 1).

8. Consider the vector space L2(-∞, ∞) and the sequence of linearly independent vectors

v_n(t) = t^n exp(-t²/2),   n = 0, 1, 2, ....

Use the Gram-Schmidt procedure to orthonormalize this sequence for n = 0, 1, 2. Compare the orthonormalized vectors w_n, n = 0, 1, 2, with the Hermite functions defined in Problem 6.

9. Find the eigenvectors of the differential operator L in L2(0, 1), where

Lv(t) = -d²v(t)/dt²,   v(0) = 0, and dv/dt = v at t = 1.

10. Consider the equation u + Ku = f, where u and f are vectors in L2(0, 1) and K is an integral operator with the kernel

(a) Prove that ||K|| < 1, and so the equation has a unique solution.
(b) Find the solution when f(t) = 1. Note that K is a dyad operator.
(c) Find the solution when f(t) = sin t.

FURTHER READING

Bear, H. S. (1995). "A Primer of Lebesgue Integration." Academic Press, San Diego.
Bell, D. J. (1990). "Mathematics of Linear and Nonlinear Systems: for Engineers and Applied Scientists." Clarendon Press, Oxford.
Churchill, R. V. (1958). "Operational Mathematics." McGraw-Hill, New York.
Friedman, B. (1956). "Principles and Techniques of Applied Mathematics." Wiley, New York.
Kantorovich, L. V. and Akilov, G. P. (1964). "Functional Analysis in Normed Spaces." Pergamon, New York.
Kolmogorov, A. N. and Fomin, S. V. (1961). "Measure, Lebesgue Integrals, and Hilbert Spaces." Academic, New York.
Kolmogorov, A. N. and Fomin, S. V. (1970). "Introductory Real Analysis." Prentice Hall International, Englewood Cliffs, NJ.
Riesz, F. and Nagy, B. S. (1965). "Functional Analysis." Ungar, New York.
Schmeidler, W. (1965). "Linear Operators in Hilbert Spaces." Academic, New York.

9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

9.1. SYNOPSIS

In this chapter we will study linear integral operators in Hilbert spaces. The emphasis here will be primarily on bounded operators (i.e., operators whose norms ||K|| are less than ∞). The domain of an unbounded operator is a subset of vectors in the Hilbert space. In the class of bounded operators, completely continuous operators are of special interest since almost everything we have learned about operators in finite-dimensional vector spaces is true for these operators. A completely continuous operator is bounded and can be approximated as closely as we please by a finite, n-term dyadic. Finally, we will consider the class of Hilbert-Schmidt operators, which satisfy the condition Σ_n ||Kφ_n||² < ∞, where {φ_n} is a complete orthonormal set in the Hilbert space. We will see that Hilbert-Schmidt operators are actually a subclass of completely continuous operators. As we did in Chapter 4 for finite-dimensional vector spaces, we will prove the Fredholm solvability theorems as they apply to the above classes of operators, allowing us to explore the solvability of linear integral equations. In particular, we will show that the equation Lu = f has a unique solution if and only if the only solution to the homogeneous equation Lu = 0 is u = 0. Furthermore, a solution to the inhomogeneous equation exists if and only if (v_r, f) = 0, r = 1, ..., m, where {v_r} are all the solutions to the adjoint equation L†v = 0. Section 4 of this chapter is devoted to Volterra equations of the first and second kind. This important class of equations is unusual in that solutions are generally unique and Volterra operators have no eigenvectors. We will show that

one-dimensional Volterra equations can be solved by the method of Laplace transforms when the kernel k(t, s) depends only on t - s. Volterra equations of the second kind can generally be solved by a method of iteration, which always converges for these equations. We will further show that one-dimensional initial value problems involving pth-order linear differential equations can be converted to the problem of solving corresponding Volterra equations. The important and extremely useful spectral theory of integral operators will be presented and derived for the class of completely continuous operators. We will show that any completely continuous operator can be represented in Schmidt's normal form K = Σ_i κ_i ψ_i φ_i†, where κ_i ≠ 0. The orthonormal sets {φ₁, φ₂, ...} and {ψ₁, ψ₂, ...} form orthonormal basis sets in H and are solutions of the self-adjoint equations K†Kφ_i = κ_i²φ_i and KK†ψ_i = κ_i²ψ_i. For the cases of normal, completely continuous operators (with K†K = KK†) and self-adjoint operators (with K† = K), the spectral decomposition becomes K = Σ_i λ_i φ_i φ_i†, (φ_i, φ_j) = δ_ij, where the eigenfunctions φ_i form complete orthonormal basis sets in the appropriate Hilbert space. The eigenvalues {λ_i}, analogous to the case for matrices in the finite-dimensional theory, are always real numbers for self-adjoint operators and, in general, are complex for normal operators. We will subsequently show that the spectral resolution for a function, f(K) = Σ_i f(λ_i) φ_i φ_i†, follows quite naturally, provided f(t) exists for t = λ_i, i = 1, 2, .... In our presentation of the spectral resolution theorem, we will call upon various definitions and theorems that are useful in their own right, including Bessel's inequality: Σ_i |(φ_i, f)|² ≤ ||f||², where the set {φ_i} is an arbitrary orthonormal set in H. We will also show that completely continuous operators have a finite number of eigenvectors for any nonzero eigenvalue λ_i. We note that if φ_i is an eigenvector of K of eigenvalue λ_i, then φ_i is also an eigenvector of f(K) of eigenvalue f(λ_i) if the function f(t) is defined for t = λ_i. We will also prove that if the operators L and K commute, i.e., if LK = KL, then K and L have at least one common eigenvector. Virtually everything we have learned about matrix operators in finite-dimensional vector spaces also holds for completely continuous operators in a Hilbert space. The class of completely continuous integral operators contains operators with continuous or piecewise-continuous kernels in a closed, bounded domain, Hilbert-Schmidt operators in bounded and unbounded domains, and operators with weakly singular kernels (kernels diverging as 1/|r - s|^α, where 0 < α < D/2 and D is the number of independent components of r or s).

9.2. SOLVABILITY THEOREMS

The linear integral equations we will study in this chapter will be either of the first kind

Ku = f (9.2.1)

or of the second kind

u + Ku = f, (9.2.2)

where K is a linear integral operator. For the one-dimensional problem, these equations become

∫_a^b k(t, s)u(s) ds = f(t) (9.2.3)

and

u(t) + ∫_a^b k(t, s)u(s) ds = f(t), (9.2.4)

where k(t, s) is the kernel of the integral operator K, k(t, s) and f(t) are known functions, and u(t) is to be determined. For the D-dimensional problem, these equations take the form

∫_{Ω_D} k(r, s)u(s) d^D s = f(r) (9.2.5)

and

u(r) + ∫_{Ω_D} k(r, s)u(s) d^D s = f(r), (9.2.6)

where r and s represent the D independent variables, d^D s represents a volume element in the D-dimensional space in which r and s are defined, and Ω_D is the volume of space over which these variables range. Whether r and s are D-dimensional Euclidean vectors or simply sets of D independent variables (such as concentrations of chemicals in solution) makes no difference. The kernel k(r, s) and the quantity f(r) are, again, known functions and u(r) is to be determined. In proving the solvability theorems presented in this section, we will assume that u, k, and f are Riemann integrable functions in the closed domains of the independent variables. Examples of such functions include continuous and piecewise-continuous functions. We define the adjoint K† of the integral operator K, in a one-variable problem, by

where r and s represent the D independent variables, d^s represents a volume element in the D-dimensional space in which r and s are defined, and ^£, is the volume of space over which these variables range. Whether r and s are D- dimensional Euclidean vectors or simply sets of D independent variables (such as concentrations of chemicals in solution) makes no difference. The kernel k{r, s) and the quantity f(r) are, again, known functions and ii(r) is to be determined. In proving the solvability theorems presented in this section, we will assume that w, k, and / are Riemann integrable functions in the closed domains of the inde- pendent variables. Examples of such functions include continuous and piecewise- continuous functions. We define the adjoint K^ of the integral operator K, in a one-variable prob- lem, by

KV=/ k*is,t)v{s)ds; (9.2.7)

i.e., if k(t, s) is the kernel of K, then k*(s, t) is the kernel of K^ where k*{s, t) is the complex conjugate of k{s, t). The adjoint (Ku)^ of Ku is then given by

(Ku)^ = u^K^ = / u\s)k\t, s)ds. (9.2.8)

So in the Hilbert space C2{a, b), with the inner product defined as

(v,u)= / v\t)u{t)dt, (9.2.9)

the inner product of Kv and Ku becomes

(Kv, Ku) = v^K+Ku =111 v\s^)k\t, s^)k{t, 52)1/(^2) ds^ ds^ dt. (9.2,10) 358 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

We define the norm of K by (Ku, Ku) ||K|r = max u^O (U, U) b rh ch.... .,...... (9.2.11) fa fa fa ^*(^' '^l)^^^' S2)u*(Si)u(S2) ds^ dS2 dt = max -^» r„\u{s)\^ds Similarly, for the D-dimensional problem in the Hilbert space £2(^0)* ^^ adjoint K^ of K is defined as

K^v = Jr(5, r)i;(?)rf^5. (9.2.12)

Likewise, (Kv)^ is given by

v+K^=f i;*(?)A:*(r,?)J^^, (9.2.13)

and the inner product (Kv, Ku) by

(Kv, Ku) = v^K^Ku etc (9.2.14) = / / / vXsi)^^(r.H)^(r,'s^uCs'dd^s^d^S2d^r. JQD JQD J^O With the above definitions in hand, we can now state the following theorem:

THEOREM. If Ω_D is a finite volume (or interval in one dimension) and k(r, s) is a bounded function (which it certainly is if it is continuous on the closed domain Ω_D), then the norm of K is bounded; i.e., it is finite.

To prove this, first note that

|∫_{Ω_D} u(s) d^D s| = |(u, 1)| ≤ (|u|, 1) ≤ ||u|| ||1|| = ||u|| Ω_D^{1/2}. (9.2.15)

If M = max_{r,s} |k(r, s)|, it follows that

|(Ku, Ku)| ≤ M² Ω_D (|u|, 1)(|u|, 1) ≤ M² Ω_D² ||u||², (9.2.16)

or, using Eq. (9.2.11),

||K||² = max_{u≠0} (Ku, Ku)/||u||² ≤ M² Ω_D². (9.2.17)

Since ||I + K|| ≤ ||I|| + ||K||, it also follows from Eq. (9.2.17) that ||I + K|| ≤ 1 + MΩ_D, proving the theorem. All of the integral operators studied in this section will be of the type described above and will, therefore, have bounded norms. Let us now turn to the solvability theory of equations of the first and second kind. We will extend the problem to include D independent variables, since the one-dimensional case is included in this more general case. First, imagine that the volume Ω_D is filled with small volume elements Δ^D s. We shall number these volume elements 1, 2, ..., n, with the center of the ith element being located by

the vector r_i. Since the integral in Eq. (9.2.5) is Riemann integrable, the integral equations of the first and second kind can be approximated as

Σ_{j=1}^n k_ij Δ^D s u_j = f_i,   i = 1, ..., n, (9.2.18a)

and

Σ_{j=1}^n (δ_ij + k_ij Δ^D s) u_j = f_i,   i = 1, ..., n, (9.2.18b)

where k_ij = k(r_i, r_j), u_i = u(r_i), and f_i = f(r_i). The homogeneous adjoint equations, K†v = 0 and (I + K†)v = 0, are similarly approximated as

Σ_{j=1}^n k_ji* Δ^D s v_j = 0,   i = 1, ..., n, (9.2.19a)

and

Σ_{j=1}^n (δ_ij + k_ji* Δ^D s) v_j = 0,   i = 1, ..., n, (9.2.19b)

where k_ji* = k*(r_j, r_i) and v_i = v(r_i). In the limit Δ^D s → 0, Eqs. (9.2.18) and (9.2.19) become the integral equations

Ku = f (9.2.20a)

and

(I + K)u = f, (9.2.20b)

and

K†v = 0 (9.2.21a)

and

(I + K†)v = 0. (9.2.21b)

From Section 4.6, we know that Eq. (9.2.18) has a solution if and only if the condition

Σ_{i=1}^n v_i* f_i = 0 (9.2.22)

is obeyed for any solution to the homogeneous adjoint equation (9.2.19). Multiplying Eq. (9.2.22) by Δ^D s, we can express the solvability condition as

Σ_{i=1}^n v_i* f_i Δ^D s = 0. (9.2.23)

Taking the limit Δ^D s → 0, Eq. (9.2.23) becomes

∫_{Ω_D} v*(r) f(r) d^D r = (v, f) = 0. (9.2.24)
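Before passing to the limit, it is worth noting that the finite approximation (9.2.18b) is itself a practical way to solve such equations. The Mathematica sketch below is ours, not the text's; it discretizes the hypothetical second-kind equation u(t) - (1/2)∫_0^1 t s u(s) ds = 5t/6, whose exact solution is u(t) = t, on a grid of n cells.

    (* Sketch (not from the text) of the discretization (9.2.18b) for the equation
       u(t) - (1/2) Integral[t s u(s), {s, 0, 1}] = 5 t/6 (exact solution u = t).  *)
    n = 50;
    ds = 1./n;
    tt = Table[(i - 1/2) ds, {i, n}];                  (* cell midpoints *)
    kk = Table[-(1/2) tt[[i]] tt[[j]] ds, {i, n}, {j, n}];
    ff = 5 tt/6;
    uu = LinearSolve[IdentityMatrix[n] + kk, ff];
    Max[Abs[uu - tt]]
    (* -> a small number; the discrete solution approximates u(t) = t *)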

We also know from Section 4.6 that the number of solutions to the homogeneous equations

Σ_{j=1}^n k_ij Δ^D s u_j = 0,   i = 1, ..., n, (9.2.25a)

and

Σ_{j=1}^n (δ_ij + k_ij Δ^D s) u_j = 0,   i = 1, ..., n, (9.2.25b)

is the same as the number of solutions to the homogeneous adjoint equations. And when there are no solutions to the homogeneous equation, there is a unique solution to Eq. (9.2.18) for arbitrary f₁, ..., f_n. In the limit Δ^D s → 0, Eq. (9.2.25) becomes

Ku = 0 (9.2.26a)

and

(I + K)u = 0. (9.2.26b)

Since the algebraic approximations to the integral equations converge to the continuum equations, it seems reasonable to expect the algebraic solutions to converge to solutions of the continuum equations. If u, k, and f are continuous functions in a closed finite domain of independent variables, this is indeed true. Detailed mathematical proofs can be found in W. V. Lovitt's Linear Integral Equations, Dover, 1950, and in R. Courant and D. Hilbert's Methods of Mathematical Physics, Interscience, 1953. Thus, for the continuum case, when L = K or I + K and u, k, and f are continuous and Ω_D is finite and closed (such that u, k, and f do not diverge on the boundary of Ω_D), we can summarize the Fredholm alternative theorems as follows:

FREDHOLM'S ALTERNATIVE THEOREM. (1) The equation

Lu = f (9.2.27)

has a unique solution if and only if the only continuous solution to the homogeneous equation

Lu = 0 (9.2.28)

is u = 0. (2) Alternatively, the homogeneous equation has at least one solution if the homogeneous adjoint equation

L†v = 0 (9.2.29)

has at least one solution. The homogeneous equation (9.2.28) and the adjoint homogeneous equation (9.2.29) have the same number of solutions. (3) When solutions to the homogeneous equation exist, the inhomogeneous equation (9.2.27) has a solution if and only if

(v_i, f) = 0,   i = 1, ..., m, (9.2.30)

where v_i is a solution to the homogeneous adjoint equation. The number m of solutions to the homogeneous equation can be infinite in infinite-dimensional vector spaces (as are the function spaces appropriate to integral equations).

Although we have outlined the proof of the theorem only for Riemann integrable kernels k and functions f and u in bounded (closed) domains Ω_D of independent variables, the solvability theorem often applies to singular problems, which involve unbounded kernels and/or cases in which one or more of the independent variables goes to ∞. Such cases will be illustrated in some of the examples to follow. In Section 9.3 we will examine the subclass of completely continuous operators for which the Fredholm solvability theorems apply.

EXAMPLE 9.2.1. Solve the equation u + Ku = f, where K is a dyad operator, i.e., K = ab† and u, f ∈ L2(0, 1). The equation is given by

u(x) - (1/2) ∫_0^1 x t u(t) dt = 5x/6. (9.2.31)

The methods for solving dyadic equations were described in Chapter 8. For the equation u + ab†u = f, take the inner product with respect to b (to obtain (1 + b†a)b†u = b†f) and solve for b†u. Here a(x) = -x/2, b(x) = x, and f(x) = 5x/6. Multiplying Eq. (9.2.31) by x and integrating over dx, we obtain

∫_0^1 x u(x) dx - (1/2) ∫_0^1 x² dx ∫_0^1 t u(t) dt = (5/6) ∫_0^1 x² dx, (9.2.32)

or b†u = ∫_0^1 x u(x) dx = 1/3, and so u(x) = x is the unique solution to Eq. (9.2.31); it is unique because u(x) = 0 is the only solution to the homogeneous equation.

EXAMPLE 9.2.2. Solve the equation u + ab†u = f, where u, f ∈ L2(0, ∞). The equation is given by

u(x) + ∫_0^∞ exp(-(x + t)) u(t) dt = x exp(-x). (9.2.33)

Again, solve by taking the inner product of the equation with b:

∫_0^∞ exp(-x)u(x) dx + ∫_0^∞ exp(-2x) dx ∫_0^∞ exp(-t)u(t) dt = ∫_0^∞ x exp(-2x) dx, (9.2.34)

and so b†u = ∫_0^∞ exp(-x)u(x) dx = 1/6. Thus,

u(x) = (x - 1/6) exp(-x) (9.2.35)

is the unique solution to Eq. (9.2.33).

EXAMPLE 9.2.3. Solve the equation Ku = f, u, f ∈ L2(-∞, ∞), where

exp(-(jc - tf)u{t)dt = exp I —- j. (9.2.36) 362 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

This equation is solved by the method of Fourier transforms. The Fourier transform u{k) of a function u(x) is, by definition, 1 r'^ u(k) = —==z \ exp{ikx)u{x)dx. (9.2.37)

The inverse Fourier transform of v{k) is likewise 1 c"^ u(x) = -j= / exp(-ikx)uik)dk (9.2.38) V 27r '^-oo and nicely reciprocal in form to Eq. (9.2.37). According to the theory of Fourier transforms, the Fourier transform of fT^k(x — y)u(y)dy is V27tk(k)u{k). Thus, the Fourier transform of Eq. (9.2.36) is

exp (-^y(k) = exp (-^\ (9-2.39) V2 and so

«« = ,/l3HrfZl). (,,2.40)

which, since

1= r expiikx - x^) dx = £!^PLi!Zl), (9.2.41) 27r "^-oo v2 inverts to

u{x) = J - exp(-jc^). (9.2.42)

This is the only solution to Eq. (9.2.36). EXAMPLE 9.2.4. Solve the equation Ku = f, where u, f € AC/^s)* and R^ is an unbounded three-dimensional Euclidean vector space, i.e., r = xi -\- yj -f zi, —oo < jc, y, z < oo. In polar coordinates, r = rsin^cos0i' 4-rsin^sin0j + r cos^^, where 0

j exp(-(r - s)^)u(s)d^s = exp (^ )• (9.2.43)

Again, the method of Fourier transforms is useful. In three dimensions, the Fourier transform is defined by

«(^) = TTW I ^^PO'^ • ?)"('') d'r (9.2.44)

and has the inverse 1 r ^ - u{r) = / exp(-/A: • r)u(k) d^k. (9.2.45) SOLVABILITY THEOREMS 363

The three-dimensional Fourier transform of f^ k(r—s)u(s) (Ps is {2n)^^^k(Jk)u(k), which yields for Eq. (9.2.43) the result

The transforms of exp(—r^) and exp(—r^/2) are

(9.2.47) 23/372 ^^PCT") ^""^ ^""PIT")' and so

^(^) = ;^^^P(-^)' (9-2.48)

which inverts to «.

u{r) = (^-j exp(-r^). (9.2.49)

This is the only solution to Eq. (9.2.43). EXAMPLE 9.2.5. Solve the equation Ku = f for u, f € C^i-l, 1), where K is the two-term dyadic

K = POP1 + )SP,PI, (9.2.50)

^ is a number, and P^ is a Legendre polynomial defined by

Assume that f{x) = 1 -f 4jc so that the integral equation is

Po{Pi,u)-f^Pi(Pi,u)=f (9.2.52)

or

/ tu{t) dt + pxf tu(t)dt = l-\-4x. (9.2.53)

This has a solution only if

/ tu(t)dt = \ and pf tu{t)dt=A. (9.2.54)

Thus, only if j8 = 4 and /_j tu{t) dr = 1 is there a solution—for example, u{t) = t. Since {P/,P;„) = 0 if / 7?^ m, the homogeneous equation Ku = 0 has the solution

Ufc = aoPo + E«.P.-' (9-2-55) i=2 364 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

where the a^ are arbitrary complex numbers. Thus, if ^ = 4, the general solution to Eq. (9.2.52) can be written as

oo

/=0 where QJJ = 1 and the other a, are arbitrary. If j6 7^ 4, there is no solution. Expressed as the Fredholm alternative theorem, the condition on f is

(v, f) = 0, (9.2.57)

where v is any solution to K^v = 0 or any solution to

Pi(Po,v) + ^Pi(Pi,v>=0. (9.2.58)

^^ The solutions to this equation are v, = P,, / = 2, 3,..., and v such that

/ v(t) dt + p I tv{t) dt = 0. (9.2.59)

The solution to this equation that is linearly independent of P^, / > 1, is i; = b(t - P/2) or V = 6(Pi - (iS/2)Po), where b is arbitrary. Since f = PQ -f 4P,, the Fredholm condition, Eq. (9.2.57), yields

^r-^(Po,Po>+4(P„P,)l=0, (9.2.60)

or, as before, we find that j6 = 4 is the condition for solution to the inhomogeneous III equation. ••• EXAMPLE 9.2.6. Consider an orthonormal basis set v^, V2,... in an arbitrary infinite-dimensional Hilbert space H. The operator

00 K = E^/V,vJ (9.2.61)

is a perfect operator for any set of numbers X,. Suppose X, ^ 0 and consider an arbitrary vector f eH. The equation Ku = f, or

^X,v,(v,,u>=f, (9.2.62)

has the unique solution

u = EV'(Vi,f)v,- (9.2.63) i For this operator, a bound of the norm is simple to compute. Any vector v in H can be expressed as

v = E«.v,' (9.2.64) SOLVABILITY THEOREMS 365

and so

Kv = X!«r^«^* (9.2.65)

Then

(v,v) T.i\oiir i

Consider the three cases: (a) )^i = l/i; (b) X, = l//'/2; (c) X, = i. In case (a), X!, A? = X!. 1/'^ < oo. and so K^ is a bounded operator. In cases (b) and (c), J], |A, p = oo, and so Eq. (9.2.66) does not establish a bound for Kb or K^.. However, since (Kv,,Kv,) , , ' .' = kl (9.2.67) (v,-.v,.) it follows that

r->oo {v.,V,.)

is 0 for case (b) and is infinite for case (c). Thus, the operator

oo I K^ = ETV,VJ (9.2.69)

is definitely bounded and

oo K'= = J2iyi^i (9.2.70)

is definitely unbounded, whereas

oo I (9.2.71)

is different from both. K*' has the property that there exists a sequence K„ of finite dyadics such that

lim ||(K' -- KJull = 0 for any neH. (9.2.72) n->oo In particular, we can choose

K« = i:T^v,vJ. (9.2.73) 366 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

SO that

OO 1 IKK^-KJuf = Y: TI(V,,U)|2

1 «^ <;7:^ E l(v,'U)p (9.2.74)

1 ~ 1 III n + 11 —.^,^ ' n ++ 1

Clearly, lim ||(K — K„)u|| = 0 for the operator K*'. Operators K obeying the con- dition that there exists a sequence of finite dyadic operators converging to K in the sense of Eq. (9.2.72) are called completely continuous operators. We shall see later that this special class of operators obeys most of the known theorems of finite linear vector spaces. For instance, the Fredholm alternative theorems of solvabil- ity are true for this class of operators, even though it is bigger than the class of integral operators having continuous kernels in a finite domain. K^ and K*' are both examples of completely continuous operators. However, the kernel of K^ is a continuous function in its domain of definition, whereas the kernel of K^ is not.

9.3. COMPLETELY CONTINUOUS AND HILBERT-SCHMIDT OPERATORS

A completely continuous operator K is one that can be uniformly approximated by a sequence of finite-term dyadic operators:

K„ = i:u,vJ. (9.3.1) i=l

This means that, for any vector u, there exists an integer n(€) such that

||(K-KJu||<€||u|| (9.3.2)

for all values of « > n(^). Heuristically, if an operator is completely continuous, then it can always be well approximated by a finite /i-term dyadic, and so all of the properties of matrix operators in finite-dimensional vector spaces can be expected to hold. An example of a completely continuous operator is

OO K = E^(0.*J' (9-3.3)

where 0,, / = 1, 2,..., is a complete orthonormal basis set in a Hilbert space H and A., :^ 0, A,_^i < A,, and A,, -> 0 as / -> oo. The «-term dyadic

K„ = EX,^,^J (9.3.4) r:=i COMPLETELY CONTINUOUS AND HILBERT-SCHMIDT OPERATORS 367

forms a sequence that approximates K uniformly. Since any vector uinH can be expressed as

00

1=1 it follows that

(K-KJu= J2 ^i^i^i^ (9.3.6) i=«+i and so

IKK - K„)u|p = X: ^/Kf < KM E KI' < V.llull'- (9-3.7)

Thus, if we choose n{€) such that A.^(^)_^j = 6, then Eq. (9.3.2) holds for any n > n(€). Actually, any perfect operator K = Y!h=i ^i4^i4>] forwhic h X, -> 0 as / -> oo is a completely continuous operator. As a counterexample, the operator

oo K = E'*-*J (9.3.8) / = 1 is not completely continuous, even though it is perfect (perfect since its eigenvec- tors form a basis set in V). We define a Hilbert-Schmidt operator as follows:

DEFINITION. An operator K is a Hilbert-Schmidt operator if K is hounded and if

oo 5:i|Kiir;|p

THEOREM. A Hilbert-Schmidt operator is completely continuous. To prove this, we note that I = Yl^x fii^] ^^^ make use of the identity u = lu, or

oo « = E^i>Jw' (9.3.10) 1=1 giving

oo Ku = ^^(Kf ,)^Ju. (9.3.11) 1=1 The fact that K is bounded (i.e., ||K|| < M, where M < co) ensures that the series in Eq. (9.3.11) converges to Ku. Since K^, is a vector in V,, the operator

K„ = i:(K^,)f J (9.3.12) 368 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HUBERT SPACE

is an w-term dyadic operator and

oo (K-K„)u- Yl Kf,^Ju. (9.3.13)

However, by the triangle and Schwarz inequalities, it follows that II oo 11 oo X: Kf,if]u)\< J2 ||K^,|||lfr/u| / oo oo \ '/2 < E IIKf.lP E ll«^l»n (9.3.14)

/ oo \ J/2 < E liKlJ'.lP) Hull.

and the convergence of the series in Eq. (9.3.9) implies that there exists an integer n(€) such that

oo E l|Klfr,lP

for n > n(€). Thus, from Eqs. (9.3.13) and (9.3.14), we conclude that

||(K-KJu||<6||u|| (9.3.16)

for n > n(€). This proves that K can be uniformly approximated by the sequence of «-term dyadics K„, and so K is completely continuous. As an example, consider the operator

oo 1 K = ET0.0J, (9.3.17) / = l ' where {0,} is a complete orthonormal basis set. K is, by definition, a completely continuous operator. Furthermore, K is a Hilbert-Schmidt operator. This is easily seen by letting ^, = 0,. We then find that

oo oo 1 oo 1 E IIK^.f = T,-2{i'i)=j:-2<^- (9.3.18)

As we have shown, all Hilbert-Schmidt operators are completely continuous. However, Hilbert-Schmidt operators form a subclass of the class of completely continuous operators. As an example, consider the completely continuous operator defined by Eq. (9.3.3). It follows that

oo oo 1 ^||K0,f = X:T = OO, (9.3.19)

illustrating that K is an operator that is completely continuous but it is not a Hilbert-Schmidt operator. It is important to note that boundedness alone is not COMPLETELY CONTINUOUS AND HILBERT-SCHMIDT OPERATORS 369

enough to make an operator either completely continuous or Hilbert-Schmidt. For example, the unit operator I = Ydli ^i4>l is bounded, since ||Iu|| = ||u||, whereas the sum of its eigenvalues, A., = 1, is infinite and there is no sequence of finite- term dyadics that converges uniformly to I. Thus, I is neither Hilbert-Schmidt nor completely continuous. Let us now consider the following theorem:

THEOREM: If the kernel k(r, s) of an integral operator K is Lebesgue square integrable in H with respect to the variables r and ?, then the operator K is completely continuous. This means that k(r, s) is square integrable in the two-dimensional domain spanned by r and ?, i.e., H — C^fS^^ x ^^). To prove the theorem, consider an oithonormal basis set ^,. Recall that I = J^"^^ fif], where I is the identity integral operator whose kernel /(r, ?) = Y^L} ^i(^)f*(^) = 5(? - ?)» the Dirac delta function. Therefore,

E IIK^.II' = E/ ^""^ / k\r, s)f;{s)d''s I k(r, 7)f,{7)d^s' i—\ 1=1

= / d''rd''sd''s'k\r,s)k{rJ')Tfi{s')ir-{s) ^ i=\ (9.3.20) = jd^rd^sd^s'k*(r,s)k(rj')8{s' -s)

= fd^rd^s\k(rj')\\

Since f d^rd^s\k(r,s)\^ < oo by hypothesis, it follows from Eq. (9.3.20) that K is a Hilbert-Schmidt operator, and so K is also completely continuous. Equa- tion (9.3.20) also proves that if K is a Hilbert-Schmidt operator, i.e., Eq. (9.3.2) holds, then fd^rd^s\k(r, s)\^ < oo. Thus:

THEOREM. An integral operator K is a Hilbert-Schmidt operator if and only ifk(rj) € €2(^0 X ^D)' EXAMPLE 9.3.1. Let k(t, s) be a continuous function of t and s in the finite interval [a,b]. Then

/ / \kit,s)\Usdt

where M = max^<^^<;., k{t, s), and so K is a Hilbert-Schmidt operator in C2(a, b). EXAMPLE 9.3.2. Let k{t, s) = exp(-r2 _ ^2) ^^^ ^ ^ C2{-oo, 00). Then

/ \k{t,s)f-dsdt= \ / Q\^{-2t'^-ls'^)dsdt = ~, (9,3.22) -OO •^—oo *'—oo "^-oo 2 and so K is a Hilbert-Schmidt operator in C2{—oo, oo). EXAMPLE 9.3.3. Suppose K is an integral operator in C2{(i, b) with a kernel of the form k{t, s) = kit, s)l\t — s\", where a and b are finite, 0 < a < |, and 3 70 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

\k(t,s)\ is a continuous function of t and s in the interval [a,b]. Prove that K is a Hilbert-Schmidt operator. Since k{t, s) is continuous for t,s e [a, b], it follows that \k{t, 5)| < M < cx), and so

/ f \k(t,s)\^dtds= f f \k{t,s)\\ ^ dtds

<^' t tTT-^dtds (9.3.23)

M^2{b - fl)2-2a < < oo. (l-2a)(2-2a) This proves that K is a Hilbert-Schmidt operator. EXERCISE 9.3.1. Suppose K is an integral operator in €2(^0), where ^p is a finite D-dimensional closed domain. Assume that the kernel of K is of the form kirj) = pl4^, a<^, (9.3.24) \r-sr 2 where k(r, s) is a continuous function of r and s in ^£,. Prove that K is a Hilbert- Schmidt operator. Hint: Use D-dimensional polar coordinates to establish that the integral /^ f^ \r — s\~^"d^rd'^s is bounded. \r — s\ is the distance between r • and s. In Cartesian coordinates, \r -s\ = [E,^i(^/ - •y,)^]*^^- To prove the solvability of the equation (I -f- K)u = f, where K is completely continuous, consider the equation

(V + W)u = f, (9.3.25)

where V has an inverse and W is the fc-term dyadic

W = J2^jh]. (9.3.26) r = l Then Eq. (9.3.25) can be rearranged to give

(I + T)u = g, (9.3.27)

where I is the identity operator, T is the fc-term dyadic

T^J^Cjb], (9.3.28)

with

c^. =V-^a^ (9.3.29)

and

g = V-^f. (9.3.30) COMPLETELY CONTINUOUS AND HILBERT-SCHMIDT OPERATORS 3 7 I

Taking the inner product of Eq. (9.3.27) with b, yields the algebraic system

k J2{Sij -f a,j)xj = A. / = 1,..., ^, (9.3.31) 7=1

which can be written as

Ax = P, (9.3.32)

where the components of A, x, and fi are

(9.3.33) X. =bju= (b,,u)

The solvability theory of the algebraic problem was presented in great detail in Chapter 4 and will not be repeated here. We note only that if L is the sum of I and a A:-term dyadic operator, the solution to Lu = g reduces to solving the problem Ax = P, where A is a ^ x A: matrix and x, j8 € Ef^. Furthermore, the problem has a solution if and only if z^P — 0, where z is any solution to A^z = 0. If there is a solution to the algebraic system, then the solution to Eq. (9.3.27) is

k-r u = uP + x;y;u*^'

k k~r k = g-E4s-EKE^Ny' j=i 1=1 j=i

where jcj are components of a particular solution to Eq. (9.3.32), and Xj' are com- ponents of the rth solution to the homogeneous equation Ax'^ = 0. The coefficients Yi are arbitrary, and r is the rank of the matrix A. As stated above, the purpose of discussing the solution to Eq. (9.3.25) is to recognize that the equation

(I + K)u = f (9.3.35)

can be converted to the form of Eq, (9.3.32) if K is a completely continuous operator. If K is completely continuous, then there exists an /i-term dyadic K„ such that if

R„=K~K„, (9.3.36)

the remainder operator R„ has a norm less than unity, i.e.,

IIRJI < 1. (9.3.37) 372 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

Accordingly, Eq. (9.3.35) can be written as

(I-|-R„ + KJu = f, (9.3.38)

where

K„ = i:u,vJ. (9.3.39) i=l

We showed in Section 8.7 that if ||RJ| < 1, then the inverse (I + RJ"^ of I + R„ exists and can be expressed as

oo (I + RJ-'=I+^(-R„)^ (9.3.40)

Thus, by multiplying Eq. (9.3.38) by the inverse (I + R„)~^ we obtain

(I + T)u = g, (9.3.41)

where

(9.3.42) w, = (I + R„)-'u, g = (I + R„)-'f.

This proves that the solution of Eq. (9.3.35) can be reduced to solving a linear system in a finite-dimensional vector space if K is completely continuous.

••• EXAMPLE 9.3.4. Solve the equation of the form

u + aibju + a2b^u = f, (9.3.43)

given by

1 /-i u(t) + - sin(7rO / COS(7Ts)u(s)ds "' , (9.3.44) + - sin(27rO / (cos 2ns)u{s)ds = t^. 4 J-i

Define

Xi = cos(7ts)u{s)ds and -^2=/ cos(27ts)u{s)ds. (9.3.45) COMPLETELY CONTINUOUS AND HILBERT-SCHMIDT OPERATORS 3 73

Multiply Eq. (9.3.44) by cos(7rO^^ and cos{27tt)dt, respectively, and integrate to obtain

I 1-i-- / cos(;rO^M^i 4--I / cos(7rOsinr JMJC2

= / tcos(jTt)dt (9.3.46) I - I cosilnt) cos r JMjci + I 1 -f - / cos(27rf) sin(27rO dt p

A = / t'^ cosilnt) dt. -1 The unique solution to this system is -4 1 jci = —IT and JC2 = —:r, (9.3.47) and so the unique solution to Eq. (9.3.44) is 1 1 u{t) = r 4- -T sin nt - -—r sin Int. (9.3.48) 7T^ 47r^ EXAMPLE 9.3.5. Solve the equation

u + Ku = f, (9.3.49)

where u, f G C2{—OO, OO\ exp(—^^)) and

K = aa^ (9.3.50)

with

a(t) = t^ and f(t) = t\ (9.3.51)

In this Hilbert space, v^u denotes the inner product

CO v*{t)u{t) exp(-r^) dt. (9.3.52) / -oo Thus, Eq. (9.3.49) corresponds to the expression oo Qxp(-s^)sMs) ds = t\ (9.3.53) / -oo Let

CO exp(-5^)5^M(^) ds. (9.3.54) / -co Multiply Eq. (9.3.53) by t^tx^{—t^)dt and integrate (this is equivalent to taking the inner product between a and u + Ku = f) to obtain

(1 + r exp(-r^)r2dt]x^ = r txp(-t^)t'^dt (9.3.55) \ •'-oo / ^'-oo 3 74 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

or

„ = ^^, ,,.3.56, 1 -f- v^/2 since

e\p{-t^y dt = ^ and / exp{-t^)tUt =-^/n. (9.3.57) -CX5 ^ *'—OO 4 Thus, the unique solution to Eq. (9.3.53) is

«(0 = t'- -^^Pr/. (9.3.58)

Completely continuous operators are of special interest for two reasons. First, the Fredholm alternative theorem is obeyed for these operators. That is:

THEOREM. Let K be a completely continuous operator. If L, = 1 -i-K or L = K, then

Lu = f (9.3.59)

has a solution if and only if

(v,f>=0 (9.3.60)

for any solution to the homogeneous adjoint equation

V\ = 0. (9.3.61)

Equation (9.3.61) and the homogeneous equation

Lu = 0 (9.3.62)

have the same number of solutions. If Eq. (9.3.61) has no solution, then a unique solution to Eq. (9.3.59) exists for any vector f in the Hilbert space H in which K is defined. When Eqs. (9.3.61) and (9.3.62) have homogeneous solutions, the general solution to Eq. (9.3.59) is given by

m u = uP-f ^y.uj', (9.3.63) »=i where u^ is a particular solution to the inhomogeneous equation (9.3.59) and v^,i = 1,. .. ,m, are linearly independent solutions to the homogeneous equa- tion (9.3.70). The proof of the above theorem is accomplished by proving it for the operators L„ = I + K„ or L„ = K„, where K„, n = 1, 2,..., is the n-term dyadic sequence that converges uniformly to L = I -f K or L = K. The essence of the proof is to prove that the solutions u„ to L„u„ = f converge to the solution u of Lu = f. We will not present all of the details of the proof. However, the necessity of the condition expressed by Eq. (9.3.60) is easy to prove. We assume that a solution VOLTERRA EQUATIONS 375

u to Lu = f exists and take the inner product of a solution v of L^v = 0 and Lu, i.e., (v, Lu). By the definition of the adjoint operator, it follows that

(V, Lu) = (LV, U) for V, u € -H. (9.3.64)

For example, consider the Hilbert space £2(^D)- Then Eq. (9.3.64) becomes

{v,Lu)=f (Prv*{r)i k{rj)u(s)d^s ^"^ ^"^ (9.3.65) = 11 k(rj)v*{r)u(s)d^rd^s,

whereas

(L+v,u)=:/ d^^lf k^s,r)v(s)d^s) u{r) ^"^ ^^"^ ^ (9.3.66) = / / k{s,r)v*(sMr)d^rd^s.

The integrands of Eqs. (9.3,65) and (9.3.66) differ only in the interchange of dummy variables, and so the integrals are identical, proving that Eq. (9.3.64) is true. Thus, if we assume that u is a solution to Lu = f and v is a solution to V\ = 0, the Fredholm condition, E^q. (9.3.60), follows from Eq. (9.3.64). The second reason that completely continuous operators are of interest is that completely continuous operators that are self-adjoint or normal are perfect opera- tors and so obey the spectral resolution theorem of integral operators—examined in Section 9.5. In the next section, however, we will examine Volterra equations, which form a special, and somewhat different, class of integral equations.

9.4. VOLTERRA EQUATIONS

A Volterra equation of the first kind is a linear integral equation of the special form Ku = f, or, in integral form,

f ^(r,?)M(?)J^ = /(r), (9.4.1)

where

kir, ?) = 0 if any 5^ > r,., (9.4.2)

and Si and r^ are the independent components of s and 7. In one dimension, the Volterra equation reads

/ k(t,sMs)ds = fit). (9.4.3)

The lower limit of s and t can be a nonzero constant a, \a\ < oo, but without loss of generality the variables can be redefined so that the lower limit becomes 0. 376 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

FIGURE 9.4.1

Volterra equations in more than one dimension are not so common, and so we will restrict ourselves to analyzing one-dimensional problems in the rest of this section. An example of a Volterra equation of the first kind in which k is not continuous arises from Abel's problem. Consider a curve /i(§) in the ^ — r; plane as shown in Fig. 9.4.1. A particle beginning at rest at point P can slide freely under gravity (which is in the —rj direction) down the curve. From Newton's second law, it follows that the velocity of the particle at a point Q on the curve is given by ds = -^2g{y - ri). (9.4.4) It where s represents the arc length along /?(§). Thus, the time it takes to go from point P at jc, y to the origin O is given by

MO) ds (9.4.5) (>•) ^2g(y-T])' But since s is a function of § (or, equivalently, t] through the relation rj = h{^) or ^ == h\r])), we can write ds = u(i])dr], where u{r]) = v 1 + (dh'/dt])'^. Equation (9.4.5) then becomes u{r])dri (9.4.6) •^0 y/2g(y-r])' Abel's problem was to find that curve u(r]) for which the time t to slide from the point x, y to the point 0,0 is a prescribed function f{y) of y. Thus, Abel's problem is to solve the integral equation

u{r))dr) fiy) = f (9.4.7) Jo ^2g{y - T]) for u{ri). This is just a Volterra equation of the first kind with the kernel 1 k{y. n) = (9.4.8) y/2g(y-r])' VOLTERRA EQUATIONS 377

A generalization of Eq. (9.4.7) is the Volterra equation

/ , dn^fiy). 0

This problem can be solved by the method of Laplace transforms. The Laplace transform Cg of a function g{x) is defined as

/.OO Cg = g{s) = / exp(-^jc)g(jc) dx, (9.4.10)

If g(jc) is continuously differentiable, then the Laplace transform of dg{x)/dx = g\x) is

g\s) = / txp{-^sx)~f-{x)dx = sg{s) - g(0), (9.4.11) ^0 dx as can be seen by integrating by parts. An important property of the Laplace transform is the convolution theorem, namely,

c(fkix-^M^)dn =k(s)u(s), (9.4.12)

Thus, if a particular Volterra kernel k{x, §) depends only on the difference JC — $, i.e.,

f k{x-H)u{H)dH = f{x), (9.4.13)

then the Laplace transform of Eq. (9.4.13) yields the simple relation

k{s)u{s) = />), (9.4.14)

and so the solution to Eq. (9.4.13) can be obtained by finding the inverse Laplace transform of M(5) = f{s)/k{s). Consider again Eq. (9.4.9). The Laplace transform of l/x^' is r^ 1 1 r^ 1 r(l - a) / cxp(-sx) —dx = -jz^ / exp(j) — dy = j^^^—, (9.4. 15)

where r(l — a) is a gamma function. From this result and the convolution theorem, we find by taking the Laplace transform of Eq. (9.4.9) that

r(i-Qf)

Given the Laplace transform of fiy), we can find u{y) by inverting the Laplace transform in Eq. (9.4.16). For the special case that /(>) is continuously differen- tiable, Eq. (9.4.11) implies that sfis) = fis) -f /(O), allowing the rearrangement

5"r(i —Of) 5'^r(i —a) 3 78 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

where f\s) denotes the Laplace transform of fix) = df(x)/dx. Since the inverse Laplace transform of 1/5"" is x'^'^/Tia), it follows from Eq. (9.4.17) that the solution to Eq. (9.4.9) is sin(Q:7r) f(0) . r ^ u(y) = +r 7—^f(^)^^i (9.4.18) n r"' Jo (y-riy(y-ri) " where we have made use of the relationship l/[r(a)r(l —a)] = sin(a:7r)/7r when 0

u(y) = / , dt], (9.4.19) TT Jo y/y - r] when fiy) is continuously differentiable. Even if f(y) is not continuously differ- entiable, the solution can be found by Laplace inversion of Eq. (9.4.16). In any case, Eq. (9.4.16) implies that Abel's problem and its generalization has a unique solution since the only solution for f{x) = 0 is u(s) = 0. A further generalization of Abel's problem is the Volterra equation

y /i(y, ri) I, , ^ 'u(rj)drj = f{y), 0 < a < 1, (9.4.20) where h(y, rj) is a bounded function of y and t]. Although more complicated, it can be shown that this equation also admits a unique solution. Equations (9.4.7), (9.4.9), and (9.4.20) are all examples of singular equations. An integral equation is said to be singular if either its kernel becomes infinite for one or more points or any of the limits of integration are infinite. Being singular does not, of course, mean that the equation is unsolvable. In fact, all three variations of Abel's problem given above admit unique solutions. If the kernel k(y, rj) is itself a bounded function of y and r], then the solution to a Volterra equation of the first kind will be unique for continuous functions f(y) for which /(O) = 0. Volterra equations of the second kind are linear integral equations that can be expressed in the form

(I + K)u==f (9.4.21)

or

M(0 + / k{t, s)u(s) ds = fit), (9.4.22)

If the kernel kit.s) and function fit) are differentiable, then the Volterra equation of the first kind, Eq. (9.4.3), can be differentiated with respect to t to yield

kit, t)uit) + / k,it, s)uis)ds = fit), (9.4.23)

where k,it,s) = dk/dt and fit) = df/dt. If kitj) :^ 0, then Eq. (9.4.23) becomes a Volterra equation of the second kind. Similar to Eq. (9.4.13), if the VOLTERRA EQUATIONS 379

kernel in Eq. (9.4.22) is of the form k(t — s), then the equation can be solved by Laplace transforms. The Laplace transform of Eq. (9.4.22), in this case, is hz) u{z)=^ -^V . (9.4.24) l+k(z) Turning now to the solvability of these equations, we begin by introducing the following theorem: THEOREM. If f{t) is continuous and the kernel k{t, s) is a continuous func- tion oft and s in the interval [a, b], then the Volterra equation of the second kind has one and only one continuous solution u{t).

To prove this theorem, consider the sequence of functions

Uy{t) = f{t)

U2{t) = fit)- I k(t,s)Uy{s)ds (9.4.25)

«„(0 = /(O - / k{t,s)u„_^(s)ds.

By successive substitution, the function M„(0 can be expressed in the form

u„{t) = f{t)-(k{t,s,)f{sOds,

+ k{t,Sy) k{Sy,S2)f{S2)dSydS2-{"" (9.4.26) + (-1)"-' f k(t, S^) r kiSi,S2) • • • f"'' k{S„_2, 5„_,) •'a •'a "a X f(s„-i)dsids2--ds„_j

or, in operator form,

u„ = f - Kf + K^f + • • • + (-l)"-'K"-'f (9.4.27) = S„f,

where

S„=I + E(-1)'K^''- (9.4.28)

Since k and / are continuous, it follows that

/ k(t, sOfiSi) dsy \

where A = max^<^^<^ \k(t, s)\ and B = max^<,<^ 1/(01- Similarly, M-df_ J n J a I J a "a 2! (9.4.30) and, in general,

\ k{t,s^) k(s^,S2)"' k(s„_^,sjf(sjdsi"-ds„

These inequalities imply that

""-' M-aY K(OI

which, in turn, implies that the series MJ, M2» • • • converges absolutely and uni- formly. Thus, the function u{t) = lim„_^oo M„(0 is a continuous function. Since

(I + K)S„=I + (-1)"-*n-l^nK (9.4.33)

it follows that

(I + K)u„ = f + (-l)"-^K'^f. (9.4.34)

However, according to the bound obtained at Eq. (9.4.30), lim„_^^K"f = 0, and so taking the limit of Eq. (9.4.34) as n ^- oo, we find

(I + K)U=:f, (9.4.35)

where u = lim„_^^u„. This proves that the continuous function u{t) = lim„^oo"«(0 is a solution to the Volterra equation. To prove that the solution is unique, assume that u and v are two different solutions. The vector w = u — v then satisfies the homogeneous equation (I + K)w = 0, or

w = -Kw. (9.4.36)

Successive substitution of —Kw for w on the right-hand side of this equation yields

w = (-l)"K"w (9.4.37)

for an arbitrary positive integer n. Since w(t) is a continuous function, the argument leading to Eq. (9.4.31) leads to the conclusion (t - aY (b - aY Mt)\ < A"C- < A''C- ^, (9.4.38) n! n\ where C = max^<,<^ |w;(OI- From this result, we find

\w{t)\ < lim A"C^—^ = 0, (9.4.39) VOLTERRA EQUATIONS 381

requiring w = 0, thus proving that the solution u to the Volterra equation is unique. Besides completing the proof, we have also established that no continuous function satisfies a homogeneous Volterra equation of the second kind. We can extend our analysis of Volterra equations by repeating the above argu- ments for the general equation

/ k(t,s)u(s) ds-]-oiu{t)^ fit), (9.4.40) Jo where or is a finite, arbitrary constant. The sequence analogous to Eq. (9.4.25) in this case is

M,(0 = -/(0 a 1 1 c' WiW = -fit) - — / k{t,s)u^{s)ds a 01^ Ja (9.4.41)

1 1 c' Unit) = -fit) - — / kit,s)u,_^is)ds. a a Ja By using our previous definitions of A = max^<,^<^ |A;(f,5^)| and B = max^<,<^ 1/(01, we can rederive Eq. (9.4.32) to get

We see that as long as a is non zero, the sequence {u„} converges as n -> oo. From here, the analysis follows exactly as above for the case or = 1. Namely, we can prove that the continuous function uit) = Wm,^^^ u„it) is a unique solution to Eq. (9.4.40), and furthermore, there is no continuous function satisfying the homogeneous equation

f kit, s)uis) ds 4- auit) = 0 (9.4.43)

for a 7«^ 0. Equation (9.4.43) is, in fact, simply the eigenequation (for negative a) for the Volterra operator K. We have, therefore, shown that the Volterra operator does not have any non zero eigenvectors. There still remains the question of whether the equation

/ kit,s)iiis)ds=0 (9.4.44) Jo has a solution. However, the only continuous function uis) that can satisfy Eq. (9.4.44) for any arbitrary value 5 is M = 0 since kit, s) is continuous. Combining this result with our above results allows us to form the following theorem:

THEOREM 9.4.2. Ifkix,y) is continuous in the finite rectangle a < x < b, a Sy Sb, then the corresponding Volterra operators of the first and second kind have no eigenvectors. 382 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

It does not, however, follow from our analysis that the general Volterra equa- tion of the first kind,

/ k{x,y)u{y)ds= f{x), (9.4.45)

always has a unique solution. For example, the equation

f\x - y)u{y) ds=x (9.4.46) Jo has no solution, whereas the equation

f(x-yMy)ds = ^ (9.4.47) ^0 O does have the unique solution u{x) = JC. One should note that a homogeneous Volterra equation of the second kind can have a discontinuous solution. For example, consider the equation

u(t) - f s'-'u(s) ds = f(t). (9.4.48)

The kernel k{t, s) — —s^~^ is continuous, and so, provided that f{t) is continuous, the theorem proved above says that there is one and only one continuous solution to Eq. (9.4.39). This also means that M(0 = 0 is the only continuous function satisfying the homogeneous equation u + Ku = 0. However, the function

u{t) = ct'-^ (9.4.49)

satisfies the homogeneous equation

u{t) - ( s'-'u{s) ds = 0, (9.4.50)

where c is an arbitrary constant. But ct^~^ is not a continuous function, since it diverges as f -> 0. We will end this section with an example of the applicability of Volterra equa- tions. It so happens that linear initial value problems can be converted into Volterra equations. Consider, for example, the equation

^j^a2 +«i(^)~(x) —: +^2W}^ = fM (9.4.51) dx^ dx for X > 0. If we define

d^ w(x) = ^, (9.4.52) dx^ then di f w(xi)^^i+Cj (9.4.53) dx VOLTERRA EQUATIONS 383

and

y= dx2 M(XJ) J^,+Cijc 4-C2, (9.4.54)

where c, and C2 are constants. By interchanging the order of integration of x^ and X2 in Eq. (9.4.54) according to the rule

I dx2 I dx^A^ I dxi I dx2A, (9.4.55) •'O •'O -^0 Jxi Eq. (9.4.54) becomes

y{x) = I {x — Xi)u{x^)dxi-\-C]X + €2^ (9.4.56)

Equation (9.4.51) can now be transformed into the Volterra equation

.X 2 u(x) + / [a,(x)-^a2{x)(x - t)]u{t)dt = f(x) - X^c,a,(x), (9.4.57)

where

ai(x) = a,(jc) + a2(x)x and 0^2 (A:) = a2(x). (9.4.58)

The constants c, and C2 are fixed by the conditions yiO) — YI and y'{0) = 3/2 i" the initial value problem. The pth-order initial value problem dPy dP-^y d^ "^'''^''^d^ + • • • + '''^''^^ == ^^""^ ^^-^-^^^ can similarly be transformed into the Volterra equation

^(^) + r [E«i(^)T--TTrl"(^)^^ = /(^) - EQ«,(^)^ (9.4.60)

where the q are unknown constants, the «, are given by

«iW = E«i+;(^)^' (9.4.61)

and y and its derivatives are related to u through the equations

1^ = / (f_f> ^(0 J, 4- EOT:^- (9.4.62) dxP ' Jo (i- 1)1 p; ^(i -j)\ The values of c, have to be set by the initial conditions dJy(x = 0) = r;>i, 7=0,l,...,p-l. (9.4.63) dxJ When the coefficients a, are all constant, the initial value problem is easier to solve by transformation to a problem in a finite vector space as given in Sections 6.10 and 7.7 rather than by transformation to a Volterra integral equation. 384 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

••• EXAMPLE 9.4.1. Find the Volterra equation corresponding to the initial value problem

^ + exp{-x)^ + x^y = jc, (9.4.64) dx^ ax where

^ = 1 and y = 2 at jc = 0. (9.4.65) dx Set d^y 2 = M. (9.4.66) dx Then

di = f u(t)dt + l (9.4.67) dx and

y=: fix- t)u(t) dt-^x+2, (9.4.68)

and so

u{x) H- / [exp(-jc) + x^(x - t)]u{t) =x- exp(-jc) - 2x^ - jc^ (9.4.69) • • • EXERCISE 9.4.1. Find the Volterra equation corresponding to the initial value problem

d^y nd^y dy T4 + ^VT + sm JC -f- -f J = exp(-j) (9.4.70) dx^ dx dx

y = l, TI=^' T4=^ atjc=0. (9.4.71) dx ' dx^ Write a computer program to solve the Volterra equation iteratively as given by Eq. (9.4.25). EXAMPLE 9.4.2. Let us define p as the capillary pressure needed to force water out of a certain sample of porous rock until it occupies only the fraction s (saturation) of the pore space. How p depends on s is an important question in soil science and in the characterization of aquifers and oil reservoirs. To measure p versus s, one spins in a centrifuge a water-filled cylindrical sample of the porous rock and measures the volume of water removed (spun out) as a function of the spinning rate. The average saturation s of water left in the sample at the spin rate u) is given by

. - r. ^iP')sip) s = Ir^p h (l-Bp'/p) VOLTERRA EQUATIONS 385

where r^ and r2 are the distance of the two ends of the cylinder from the axis of rotation in the centrifuge and

1 r^ p = -Ap w^{rl - r\) and B = \ - A^, (9.4.73)

where A/o is the difference between water and air densities. Equation (9.4.72) is a Volterra equation of the first kind. One varies p to vary s and solves Eq. (9.4.72) for s{p') versus p'. The problem is that experiments are accompanied by error and a Volterra equation of the first kind is ill conditioned (see Linz, 1982), which means that many solution techniques for solving integral equations are overly sensitive to error in the data {s and p). For example, if the data are available on a uniform mesh h = p^^^ — Pj, / being the /th measurement or the /th setting of rotation rate w, then the integral in Eq. (9.4.72) can be approximated numerically by

Si = 2^2^ L 2 ^,^l-Bpj/p,

fori = 1,..., M, where M is the total number of measurements made. This is a linear algebraic system that can be solved by forward substitution. However, since the left-hand side of Eq. (9.4.74) is multiplied by h, it follows that, upon grid refinement (smaller h), error in 5, is amplified as l/h in the solution. The problem can be alleviated by using a least squares technique to smooth the data (Linz, 1982). Other techniques can also be used (Ayappa, 1989). To illustrate the numerical problem in solving Eq. (9.4.74) when s^ contains error, consider the theoretical capillary pressure curve

1, 0 2, P where p is in an appropriate set of units. Suppose r^/r2 = 0.5. Then (ri+r2)/2r2 = 0.75 and B = 0.75. From Eq. (9.4.72), it follows that

s{p) = / . 0 < /? < 2, (9.4.76) P Jo yi-0.75p7p and 0.75 f^ dp' 0.75 fP 1.5/p'-f0.25 , , P Jo yi-0.75/77/? P J2 yi-0.75p7/7

(9.4.77)

+ -— -(tanh-yi~^-tanh-l)( tanh"^ J 1 ~ -i- - tanh"^ ^ ). , /? > 2, 386 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

and Eq. (9.4.74) becomes

0.75 (9.4,78) 2 f^^ VI-0.75;// + ^«

where s^ = s{ih) and .y^ = s{ih). We can write Eq. (9.4.78) in matrix form as

s = As 4- b, (9.4.79)

where bi = 0.75/2/ and 0.75 T» 7 < ^

0.75 (9.4.80) J =1. 2/VI-0.75'

0, J > I, for / = 1,..., M. Suppose now that we generate "experimental" values for s by adding a fixed percentage of random fluctuation to the theoretical values from Eqs. (9.4.76) and (9.4,77). We can generate such data on a uniform mesh of values for /?, resulting in the vector s, which can subsequently be used in Eq. (9.4.79) to solve for s. A Mathematica program is provided in the Appendix for this example, where we show that even a fluctuation of 0.01% results in large deviations from III the theoretical function in Eq. (9.4.75). •••i ILLUSTRATION 9.4.1 (Linear Viscoelastic Stress). The non-Newtonian stress/ strain behavior of certain non-glassy polymers and plastics can be described by a viscoelastic model first conceived by Boltzmann. For an isotropic material (in which we assume the effects of deformation are independent of direction), the modulus G is defined as the ratio of linear stress, r (force/area), to degree of strain, y. G = -. (9.4.81) y For perfectly elastic materials, this relation reflects the fact that an elastic response is essentially an instantaneous one. However, many materials, particularly poly- mer liquids and melts, exhibit time-dependent responses. The stress/strain behav- ior of these viscoelastic materials—as we call them—follows a generalization of Eq, (9,4.81) given by xit) ^ ^ ^ G{yj) = -^, (9.4.82)

where we recognize that the relaxation modulus G is a function of time. In general, the relaxation modulus decays with time under applied strain. Physically, we can attribute this to molecular rearrangement of weakly interacting macromolecules. Since these materials are typically highly disordered, the exact functional form of G{y, t) is very complicated. However, we can approximate the behavior by assuming an "instantaneously elastic" response of the form

dx = y dG. (9.4.83) SPECTRAL THEORY OF INTEGRAL OPERATORS 387

(i) By defining a memory function M{t) such that

M(r,t) = -^^, (9.4.84) at show that the isotropic stress at time t is given by

T(r) = - / M{y, t - t')y{t')ds. (9.4.85)

(ii) One method of modeling the relaxation modulus involves imagining a linear collection of independent modes of relaxation. We write

C^(0 = EC7,exp(—Y (9.4.86)

where the sum is over "modes" (n in total), and the constants Gf^ and a^ represent the nominal modulus and relaxation time for mode k. Show that the operator M(t — t') in Eq. (9.4.85) represented by Eq. (9.4.86) is a completely continuous operator in the Hilbert space C2{—oo, b), where b is some finite, yet large time. (iii) Find an eigenfunction (and corresponding eigenvalue) for M{t — f) derived from Eq. (9.4.86). What is the physical significance of this eigenpair? Justify why such an eigenfunction should exist in light of the theorem preceding Eq. (9.4.45). Is Eq. (9.4.85) a Volterra equation? (iv) Consider a viscoelastic material with relaxation modulus given by the general expression Eq. (9.4.86). Derive an expression for the time-dependent stress resulting from an oscillatory strain of the form y = KQ sinft;r. Notice that the stress oscillates with the same frequency as the strain but not generally with the same phase. Derive an expression for the phase shift in the stress for a single mode as a III function of a^^ and co. What are the limits of the shift at low and high frequency?

9.5. SPECTRAL THEORY OF INTEGRAL OPERATORS

9.5.1. Bessel's Inequality Throughout this section we will make important use of Bessel's inequality. Thus, it is appropriate to begin with its derivation. The inequality can be expressed in terms of the following theorem: THEOREM, /f^p 02» • • • '-s* <^'^ orthonotmal set {not necessarily a complete set or even an infinite set) in the Hilbert space H, then, for any vector f in H, the inequality

i

holds. To prove the theorem, we define

a, = (0,-,f) (9.5.2) 388 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

and note that

||f-E«.-^.f >0- (9-5.3) i Specifically, the above expression will be equal to 0 only if the set is complete. Using the inner product property ||x+Q:y||^ = ||x||^+a{x, y)+a*(y, x)-f |«|^||y||^, we obtain from Eq. (9.5.3)

i i i

where the property ||0,||^ = 1 of an orthonormal function has been used. Since a,. = (0., f) and a* = {f, 0,), Eq. (9.5.4) can be rearranged to yield Eq. (9.5.1)— BesseFs inequality.

9.5.2. Eigenvalue Degeneracy Suppose the kernel kijr, s) of K is square integrable in the Hilbert space H — £2(^0 ^ ^D) (i-^' ^(?.?) is square integrable with respect to r and ?); then K is a Hilbert-Schmidt operator. Suppose also that ^,, / = 1, 2,..., /i, are the eigenvectors of K corresponding to the same eigenvalue A, i.e.,

K^. =A.^, / = l,...,/i, (9.5.5)

where the number h is the degeneracy (or multiplicity in some texts) of A. With- out loss of generality, these eigenvectors can be assumed to be orthogonal (due to the Gram-Schmidt procedure) and nonnalized. Furthermore, they belong to the Hilbert space for £2(^0)- ^^ ^ is an eigenvector in a function space, as is the case for integral operators, we say that ^(r) is an eigenfunction. The terms eigen- function and eigenvector are often used interchangeably in function spaces. Strictly speaking, ^ denotes the eigenvector in H, whereas ^^(0 denotes a component of ^—analogous to x in £„, and x, a component of x. Consider now the following theorem:

THEOREM. An operator K, whose kernel is square integrable in the Hilbert space H = £2(^0 x ^D)> ^^"^ ^^fy ^ finite number h of eigenvectors for any nonzero eigenvalue A. To prove this, let us consider k*{r, s) to be a function of s and apply Bessel's inequality to obtain

h j:\ik*, fi)\^<\\k*f (9.5.6) 1 = 1

or, equivalently,

EI/ /:(r,?)t/r-(?)rf^|

However, /^ k(r, ?)^,(?) J^i- = AV^,(?), and so Eq. (9.5.7) implies that

\M'E\fi(f)\'< / k\(rJ)\'d''s, (9.5.8)

Integrating Eq. (9.5.8) with respect to d^r and using the property ||^,||^ = 1, we obtain

M^h

We know that the right-hand side of Eq. (9.5.9) is finite since k(r, s) is a function in €2(^0 X Qj)) by hypothesis. Thus, h must be finite as well, proving the theorem. Even though fe(r, s) is a function in £2(^0 x Qjy), the number of eigenvectors of K having zero eigenvalues can be infinite; i.e., a zero eigenvalue can have infinite degeneracy. For example, assume ^,, / = 1, 2..., is an infinite orthonormal basis set in H and suppose k(r, s) = ^i(r)V^*(?). Then

Kf / = fiifu fi) =0, / = 2, 3 (9.5.10)

Thus, the vectors ^,, / = 2, 3, form an infinite number of eigenvectors of K having zero eigenvalue. Actually, we can prove the even stronger theorem:

THEOREM. If K is a completely continuous operator in H and X is a nonzero eigenvalue of K, then the number of eigenvectors corresponding to X is finite.

We proceed by assuming that there is an infinite number of eigenvectors 01, 02» • • • corresponding to X. These can be assumed to make up an orthonor- mal set. We can add to this set the orthonormal set ^j, ^2' • • • required to form a complete set (or basis set) in H. The ^^ can be chosen to be orthogonal to the 0,. Since 0^, 02,... and ^j, ^2' • • • ^^^ f^^m a complete orthonormal set, we can express the identity operator as

00 I = E*«0l + E^/W- (9.5.11)

Note that the set {fj] may be finite or infinite. Since KI = K, it follows from Eq. (9.5.11) that

00 K = X E0/0J + Y:(^fj)fj'. (9.5.12) r = l J

Because the quantity X YlZi ^j^] cannot be approximated uniformly by a sequence of n-term dyadics, this is a contradiction to our hypothesis that K is a completely continuous operator. Therefore, there cannot be an infinite number of eigenvectors corresponding to a nonzero eigenvalue A of a completely continuous operator. 390 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

9.5.3. Eigenvectors and Eigenvalues of the Function f(K) Just as in the case of a matrix operator, if the function f{t) = YlT=i ^k^^ exists, then the eigenvector 0 of the integral operator K is an eigenvector of /(K). In fact, we can show that

/(K)0, =/(A,)0,. (9.5.13)

If X. ^ 0, then, for a Laurent series f{t) = Y!kL-oo^k^^^ ^^ ^^^^ follows that

/(K)0, = /(A,)0,. (9.5.14)

Equations (9.5.13) and (9.5.14) follow from the properties

K^0, = K(X,0) = X,K0, = X?0, (9.5.15)

and, if A, yt 0,

K-'K = I, 0, = K-'K0i = K-'A,,.^,. = A,K-'0,

or K-Vi = V'0,. •• •' K-*0, = Xr%. (9.5.16)

In order for Eqs. (9.5.13) and (9.5.14) to have meaning, it is, of course, required that fit) exists for f = X,.. For example, if f{t) = YZi ^^"^ f(^i) = 1/(1 - K) for |A.,| < 1 but /(A.j) does not exist if |XJ > 1. We can extend Eqs. (9.5.13) and (9.5.14) to nonseries functions f(t) of t if fit) exists for t = A,. For example, if fit) = f^/^, /~^/^, or In/, respectively, then 0, is an eigenvector of /(K) with eigenvalue A./ , A^ , or InX,, respectively— again, so long as these functions exist for t = X^.

9.5.4. Some Special Properties of Spectral Operators We saw in Section 9.5.3 that, from the definition of an integral operator given in Eq. (9.2.7) or (9.2.12), it follows that

(V, Ku) = (K^v, u) for every v,u eH, (9.5.17)

or that

jd^r v\r)jd^s kir, s)uis) = jd^sljd^r ^(5, r)i;(r)l*M(?). (9.5.18)

A subtlety of Eq. (9.5.17) is that Ku and K^v are defined for any function in the Hilbert space H. This condition defines a large and interesting class of integral operators. However, for differential operators, the domains of v and u of L^ and L are always subsets of the Hilbert space. In the general treatment of integral operators, the domains of K and K^ can be different but are still subsets of the Hilbert space. This more general class of operators lies outside the scope of this text and the interested reader should consult the texts listed under Further Reading for a more general treatment. SPECTRAL THEORY OF INTEGRAL OPERATORS 3 9 I

With the aid of Eq. (9.5.17), we can easily prove the following theorem:

THEOREM. IfK is self-adjoint, i.e., ifK = K^ then (i) the eigenvalues X, of K are real and (ii) the eigenvectors 0, of K fonn an orthogonal set {which can be normalized). To prove this, we assume that K0, = A,,0,. We can, therefore, express Eq. (9.5.17) for these eigenvectors as

{0,,K0,) = (r0,,0,). (9.5.19)

However (0,,K0,> = (0,,A/0,) = X,(0,0,) and (r0,,0,) = (K0,,0,) = (A.,.0,.,0.) = A*(0,.0). Thus, it follows from Eq. (9.5.19) that XMif = A,*||0.f, meaning that A,, must be equal to its complex conjugate X* and, therefore, must be real. To prove part (ii), we set v = 4>j and u = 0^ in Eq. (9.5.17) to obtain

(X,-A,.){0^.,0,)=O. (9.5.20)

It then follows that

(0,.,0,)=O ifX, T^X,.. (9.5.21)

Finally, we note that if the degeneracy of the eigenvalue X, is /?, i.e., if there are p eigenvectors corresponding to A.,, then these eigenvectors can be orthogonalized by the Gram-Schmidt procedure. We can also prove Eq. (9.5.21) for a normal operator. However, we will need to establish another property first. Recall the following definition:

DEFINITION. A normal operator is defined by the property

KK^ = K^K; (9.5.22)

i.e.J K commutes with its adjoint K^ First, we assume that K is a normal operator and then consider the inner products

(V, K^Kv) = (Kv, Kv) (9.5.23)

and

(V, KK^v) = (KV, K^V). (9.5.24)

With the aid of Eq. (9.5.22), it follows that

{Kv, Kv) = (K^, KV). (9.5.25)

Now suppose 0, is an eigenvector of K with zero eigenvalue, i.e., K0, = 0. It follows from Eq. (9.5.25) that K^0, = 0; i.e., 0, is also an eigenvector of the adjoint operator K^ corresponding to a zero eigenvalue. Suppose next that 0, is 3 92 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

an eigenvector of K corresponding to the eigenvalue X, ^ 0, i.e., K0, = A,0, or (K - A, I) 0^ = 0. We now define

L = K - ;t,.I. (9.5.26)

The adjoint of this operator is L^ = K^ — X*I, and since LL^ = L^L, this operator is also a normal operator for which L0, = 0. Thus, according to Eq. (9.5.25) for normal operators, it follows that L^0, = 0, or

K^0,=A*0,. (9.5.27)

We have, in fact, proven the theorem:

THEOREM. If K is a normal operator, every eigenvector 0^ of K with eigen- value ki is an eigenvector of K^ with eigenvalue A,*. It is now easy to prove the following:

THEOREM. IfK is a normal operator and if ki ^ Xj, then from

(f.,K0,) = {Kt0,,0,) (9.5.28)

it follows that

(X.-X,)(0,,0,)=O, (9.5.29)

and so {4^i,4^j) = 0, proving that if X^ ^ Xj the eigenvectors 0, and 4^^ are orthogonal.

If A,, has a degeneracy /?, i.e., if there are p eigenvectors corresponding to the eigenvalue A-, these eigenvectors can be orthogonalized by the Gram-Schmidt procedure and thus we can summarize the above in the following theorem:

THEOREM. If K is a normal operator, its eigenvectors form an orthogonal set {which, of course, can he normalized). The eigenvectors if^^ of K are also eigenvectors of the adjoint operator K^ and the eigenvalues of K^ are X*, the complex values of the eigenvalues A,, of K.

An example of a normal operator is

L = exp(/fK), (9.5.30)

where / = V —1, Ms a real number, and K is a self-adjoint operator. L is also a unitary operator since LL^ = L^L = I. Next, consider the linear operators K and L, which commute; i.e., they have the property

KL = LK. (9.5.31) SPECTRAL THEORY OF INTEGRAL OPERATORS 393

Such operators obey the theorem:

THEOREM. If K and L commute and if one of the operators, say L, has an eigenvalue k of finite degeneracy h, then K and L have a common eigenvector— i.e., there exists at least one vector x such that

Lx = Xx and Kx = /xx. (9.5.32)

To prove this theorem, we introduce the concept of the null space of an opera- tor L. The null space Mi is the collection of all vectors that are linear combinations of the zero-eigenvalue eigenvectors of L; i.e., any vector x € Af^ can be expressed as a linear combination of the x,, where

Lx,=:0, / = l,2,.... (9.5.33)

Afi can be of finite or infinite dimension. For example, suppose ^,, / = 1, 2,..., is a complete oithonormal basis set, then the null space of

U=fif\ (9.5.34)

is the infinite set ^,, / = 2, 3,..., and all Unear combinations thereof. The null space of the operator

oo L^ = j:^:f>f], X,^0, (9.5.35) 1=2

on the other hand, consists of all scalar multiples of ^^ and the null space of

00 L, = J2^ififl ^./O. (9-5.36) 1 = 1

is empty. With this definition, we can now prove the following theorem:

THEOREM. If L and K commute, then the null space of L {or K) is an invariant manifold of K (or L). What this means is that if x e J\fi (or A/"^), then Kx (or Lx) also belongs to Afi (or Aff^). Still another way to say this is that if x is a vector that L (or K) maps to the zero vector 0, then the vector Kx (or Lx) is also mapped to 0 by L (or K). To prove the theorem, assume that x, e A/^. Then, since Lx, = 0 and KL = LK, it follows that

0 = KLx,. = L(Kx,.). (9.5.37)

Thus, Kx, belongs to Af^ for all vectors x, in Af^. Now we can return to the theorem expressed at Eq. (9.5.32). The h eigenvec- tors Xi,..., X,,, corresponding to the eigenvalue A. of L, form a finite-dimensional manifold for the null space Afi_xi of L — A.I. According to the theorem just proved. 394 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

the null space of Afi_xi is invariant to K because (L — AI)K = K(L — kl). Thus, if Lx, = A,x,, then

(9.5.38)

Equation (9,5.38) can be summarized in the form

K[x,,...,x,] = [x„...,xJA, (9.5.39)

where [x^,..., x^,] is a row vector with elements x^, and A is the square matrix

*;i2 A = (9.5.40)

^hh

Suppose ^ is an eigenvector (a column vector) of A corresponding to the eigenvalue M, i.e.,

A| = /x|. (9.5.41)

Multiplying each side of Eq. (9.5.39) from the right by ^, we obtain

h h KE^.X,. = [xj,... ,xJA$ = ML?/X,. (9.5.42)

or

Ky = /xy, y = E?,x,.. (9.5.43) 1=1

Since (L - XI)x, = 0, it follows that

Ly = -^y. (9.5.44)

thus proving that L and K possess a common eigenvector. In fact, the number of eigenvectors L and K have in conmion depends on the rank of A. EXAMPLE 9.5.1. Suppose ^,, / = 1, 2,..., is a complete orthonormal set in % and define

^^fxf\-^f2f\-^Y.>^ifif\. X,^l, />2, (9.5.45) 1=3

and

K = 2^,^^ + 2Vr,^I. (9.5.46) SPECTRAL THEORY OF INTEGRAL OPERATORS 395

Note that

LK - KL = {2f,irl + 2ir,it\) - (2^,f ^ + 2f,fl) = 0, (9.5.47)

and so K and L commute. Since

L^,=f,, / = 1,2, (9.5.48)

it follows from the theorem at Eq. (9.5.32) that K must possess at least one eigen- vector in the manifold spanned by |^i and ^2- Indeed, this is true, since

K(^i+^2) = 2(fi + ^2) (9.5.49)

and

Lit, + f^) = (f 1 + f 2)- (9.5.50)

K, however, has only one eigenvector, ^i + ^2* whereas L has two, which can be • • • chosen as ^j and ^j -h ^2* Finally, we note that if K0, = X,0, and K^^y = Vjfj and X^ ^ v*, then {fj,4^i} = 0. To prove this, note that, from the property of the adjoint operator (v, Ku> = (K^v, u>, it follows that (X, — v*)(fj, 0^> =0. Thus, we have proved the following theorem:

THEOREM. If 0, is an eigenvector of K of eigenvalue X- and f^ is an eigen- vector ofK^ of eigenvalue Vj, then 4>i is orthogonal to i^j if X^ ^ vj.

9.5.5. Completely Continuous Self-Adjoint Operators We shall begin by stating the following main theorem for the operators of interest in this section:

THEOREM. If K is a completely continuous integral operator in a Hilbert space and is self-adjoint^ i.e. e/K = K^ then the eigenvectors 0i, 02» • • • ^/^ form a complete orthonormal set {i.e., an orthonormal basis set) in H and the eigenvalues of K are real. Incidentally, this theorem is valid for any completely continuous self-adjoint linear operator in a Hilbert space. For example, K could be a matrix operator in E^. As noted before, a linear operator in a function space can be mapped into a matrix operator in E^. The proof of this theorem will be addressed in what follows. We will first, however, review the implications of the theorem and define the spectral resolution of an operator and its functions. Of course, the above theorem means that self- adjoint, completely continuous operators in H are perfect; i.e., their eigenvectors form a basis set in 7^. If y is an arbitrary vector in H and K is a normal, completely continuous operator, then there exists a set of numbers aj, a2». • • such that y = X^,a,0,, where

0„ / = 1,2,..., (9.5.51) 3 96 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

are the eigenvectors of K. Since

{4>i,j)=S,j, (9.5.52)

it follows that a, = (0j, y) = 0fy, or

y = E0-(f-y) = (E^.>l)y (9-5.53)

for arbitrary y eH. This implies that the identity operator can be resolved as

I = E0-f'. (9-5.54)

which, in turn, implies the spectral resolution theorem:

SPECTRAL RESOLUTION THEOREM

K = i:^.-0r0'' {4>iAi)=^ir (9.5.55)

for any normal, completely continuous operator K. Of course, the function /(K) of K also obeys the spectral resolution theorem:

/(K) = E/(^.)^.^J (9-5.56)

if f{t) is defined at / = X,, / = 1, 2,.. .. Let us now outline the proof of the theorem that a completely continuous, self- adjoint operator has a complete set of orthonormal eigenvectors. As a first step, we will prove that a normalized vector y is an eigenvector of a linear self-adjoint operator K in H if it is the vector satisfying the maximum condition

max(y,Ky), ||yf = 1. (9.5.57) y Equation (9.5.57) is a constrained maximum and by introducing the Lagrange multiplier X, the unconstrained maximum of

/ = (y,Ky)-X(y,y) (9.5.58)

gives the vector satisfying Eq. (9.5.57). Expanding y in terms of an arbitrary orthonormal basis set ^,, ^2' - - -

y = E«.T^i- (9-5.59)

and inserting this into Eq. (9.5.58), we obtain

/ = E ^o<«; - ^ E ^>i' (9.5.60) SPECTRAL THEORY OF INTEGRAL OPERATORS 397

where kij = (^,, K^^). The maximum can be found from the conditions df/daf = df/da] = 0, where af and a] are the real and imaginary parts of a,. These conditions yield the eigenproblem

J2kijaj=kai, / = 1,2,.... (9.5.61) j However, Eq. (9.5.61) is equivalent to the eigenproblem

Ky = Xy, (9.5.62)

whose equivalence can be proved by inserting Eq. (9.5.59) into Eq. (9.5.62) and taking the inner product of the result with ^,, / = 1, 2,.... Thus, the solution to Eq. (9.5.57) is a solution to the eigenproblem in Eq. (9.5.62). Actually, Eq. (9.5.62) is the solution only to the extremal problem. If K has no positive eigenvalues, then we would consider the operator —K so as to find a positive maximum. This is permissible since K and —K have the same eigenvectors. The next step in the proof is to recall that a completely continuous operator is bounded, i.e., ||K||^ = maXx^o(Kx, Kx)/(x, x) = M^ < oo. This means that, for any normalized vector y in ^,

(y, Ky) < llyll IIKyll < M||y||2 = M. (9.5.63)

If the largest magnitude eigenvalue of K is positive (as we assume, or otherwise we consider —K), then there exists a sequence of normalized vectors yi, y2, • • • such that

lim(y„,Ky„> = M. (9.5.64)

Recall that a property of completely continuous, self-adjoint operators is that y„ -> y; i.e., there is a vector y eH such that

(y, Ky) = M. (9.5.65)

Thus, y is an eigenvector (say 0^) of K and M is the maximum eigenvalue (say A-i) of K. Next, we hunt maxima of {y, Ky) among the normalized vectors in H that are orthogonal to 0i. If there is more than one eigenvector corresponding to Xj, then this step will generate a second eigenvector 02- Continuing the process will generate all h eigenvectors corresponding to X^. Recall that we proved earlier that h has to be finite for a nonzero eigenvalue of a completely continuous operator. The next step in the process will generate the eigenvectors of the second largest positive eigenvalue of K. Continuing in this manner will eventually generate all the nonzero positive eigenvalues and eigenvectors 0,,02» • • • of K. Carrying out the same process for —K will generate all the nonzero negative eigenvalues and eigenvectors of K. As we can see, the eigenvalues of a nonnal, completely continuous operator form a countable or denumerable (though possibly infinite) set. This property arises from the defining property that a completely continuous operator is the limit of a sequence of n-term dyadic operators. 398 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

Let US label all of the nonzero eigenvalues of K by the sequence Ai < A2 < X3 < • • • and the corresponding eigenvectors by 0|, 02» 03» • • • • ^^ ^^^^ define a new self-adjoint operator

K = K-^X,^,0t, (9.5.66)

and suppose that K is nonzero. There then exist a nonzero eigenvalue /x and its eigenvector ^, i.e.,

Kir = iiiif. (9.5.67)

However, the eigenvectors 0i, 02' • • • of K are also eigenvectors of K of zero eigenvalue. Thus, (0,, ^) = 0, / = 1, 2,..., which implies that

Kf = Kf = fif, (9.5.68)

and, therefore, f must be an eigenvector of K of nonzero eigenvalue. This is, of course, a contradiction since the process we described above generated all of the eigenvectors of K having nonzero eigenvalues. Thus, it follows that K = 0, or

K = X:A,0,0J, (9.5.69) i proving the spectral resolution theorem for self-adjoint, completely continuous operators. If K has zero eigenvalues, then the set of eigenvectors in Eq. (9.5.69) does not form a basis set in H. However, we can add to the eigenvectors 01, 02,..., for which X, ^ 0, an orthonormal set ^j, fj^ • • • needed to form a basis set in H. Since the ^, can be chosen to be orthogonal to the 0,, it follows from Eq. (9.5.69) that K^, = 0. Thus, the basis set 0j02,..., ^i^2» • • • in H are all eigenvectors of K. Since (0^, K0,) = (K^0,, 0^) and K = K^ for a self-adjoint operator, it follows that (k^ — A.*)||0,||^ = 0, and so the eigenvalues of K are real. ••• EXAMPLE 9,5.2. Suppose the vectors 0i02,... form an orthonormal basis set in an infinite-dimensional Hilbert space K. (a) Then

K = E-^0.0j (9.5.70) e=i y/i is a self-adjoint, completely continuous operator with eigenvalues and eigenvectors

A, = -^, 0„ / = l,...,/z, (9.5.71) y/i

and

X. = 0, 0,., / = n -M, n -f 2,.... (9.5.72)

For this case, there is an infinite number of eigenvectors with X, = 0. SPECTRAL THEORY OF INTEGRAL OPERATORS 399

(b) On the other hand, if

00 i (9.5.73)

K is a self-adjoint operator with eigenvalues and eigenvectors

ki = 0, 0,, i = I,... ,n- 1, (9.5.74)

1 0,, / = /i,n -f 1,.... (9.5.75)

In this case, there is a finite number of eigenvectors with A,, = 0. EXAMPLE 9.5.3. Suppose k(t, s) — exp(-/;rr + ins), -I < s,t < I. Then k(t,s) = k*(s,t), and so K is self-adjoint. Since f_^f_^ \k{t,s)\^dt ds = 4, it follows that K is completely continuous. The eigenvectors of K obey the equation

exp{—int) / cxp(i7Ts)v(s)ds = Xv{t). (9.5.76)

VQ = \l\Jl is an eigenvector with AQ = 0 since f ^exp(i7rs)ds = (exp(/7r) — exp(—/7r))//7r = 0. v^ = cxp(—i7Tt)/y/2 is an eigenvector with X2 = 2. Since /_i ^xp{inms)ds = 0, m = ±1, ±2,..., it follows that all of the eigenvectors and eigenvalues of K are

A. = 0 "^=7=2' exp(-r;rO , - 1 (9.5.77) exp(/7rO

exp(-/n7rr) , ^ . ^ ..i i>^ = —-—— , K—^^ « = db2, ±3, ±4, v2 According to our theorem, the set v„ = exp(—/n7rf)/\/2, « = 0, ±1, ±2,..., forms a complete orthonormal set in C2(—l, 1), confirming that any function /(r), — lC2(~l'l) can be expanded in the series ^-A ofn v-^ exp(—/«7rO ^ ^ « /(0 = E«n'^.(0 = -|+ E «n ^ ;;. • (9-5.78) « V2 „=±i,db2,... v2 This is well known from the theory of Fourier series. EXAMPLE 9.5.4. Consider in C2(—oo, oo) the operator

.2 1 „2\ Kv = f^exp(^-^^^^^y(y)dy. (9.5.79) 400 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

(a) Prove that K is self-adjoint. (b) Prove that K is completely continuous. (c) Find the eigenvectors of K. Since k(x,y) — exp(—(x^ + y^)/2) = /:*(j,jc), the operator is self-adjoint. Also,

OO AOO y.OO /.OO / / [^(^, yyfdxdy = / exp{-x^)dx I exp{-y^)dy = TT < oo, -OO *'—OO '^—OO *'—OO (9.5.80) which is a sufficient condition that K is completely continuous. The eigenproblem is

expf ^"l f^cxp(^y(y)dy = kvix). (9.5.81)

One eigenvector is VQix) = a exp(—A:^/2), for which Eq. (9.5.81) yields

Qxp(^ja I ^exp{-y^)dy = X^aexpl^j (9.5.82)

or XQ = -JH. Normalizing v^, we obtain

1 = ||vi f = a^ P exp( ^ j dx = or^V^ (9.5.83)

or

VQ(x) = ~^expl-^Y XQ = 7T'^\ (9.5.84)

This is the only eigenfunction of K with a nonzero eigenvalue. The eigenfunctions corresponding to A. = 0 obey the equation 0''K~Ty• {y)dy = 0. (9.5.85) The first two of these are

v,ix) = -jl=2xcxp(-^\ (9.5.86)

and

Vjix) = } {4x^ - 2)exp( ^ 1. (9.5.87) 27 v^ m In general,

v„ix) = —J==H„(x)cxp(-^), (9.5.88) SPECTRAL THEORY OF INTEGRAL OPERATORS 40 I

where //„(JC) is a Hermite polynomial generated by the formula

H„(x) = (-\rexpix')£-^{expi~x^)). (9.5.89)

From the theorem for completely continuous self-adjoint operators, it follows that ^oMi Vi(jc), V2{x),... form an orthonormal basis set in C2(—oo, oo). The func- tions

^"^^^ n =0,1,2,..., (9.5.90) V2"n!v^

• I • therefore, form an orthonormal basis set in Cii—oo, oo; exp(~x^)).

9.5.6. Schmidt's Normal Form for Completely Continuous Operators The theorem of interest in this section is:

THEOREM. If K is a completely continuous operator in H, then it can be expressed in the form

K = Y.K,f,4>\, (9.5.91)

where the /c, are real numbers greater than 0 and the vectors 0, and ^,, / = 1,2,..., are orthonormal sets in H satisfying the equations

Kf, = K,4>. (9.5.92)

and

K^4>,=K,f,, (9.5.93)

From Eqs. (9.5.92) and (9.5.93), it follows that ^, and 0, are eigenvectors of the self-adjoint operators K^K and KK^; namely, they satisfy the eigenequations

K^K0,. = ^20. (9.5.94)

and

KK^f,= /cf^,.. (9.5.95)

A completely continuous operator can be approximated as closely as we please by an n-term dyadic. With this approximation, linear equations involving com- pletely continuous operators can be transformed into matrix equations in a finite- dimensional vector space, in which case, the theorem has already been proven in Section 7.7. Thus, heuristically, we could anticipate the validity of the theorem for completely continuous operators without further work. Those with little patience for theorem proving can ignore the rest of this section as we will outline the details of the rigorous proof in what follows. 402 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

Completely continuous operators have the property that if x„ -> x and y„ -^ y, then (y„, Kx„) ~> (y, Kx) as n -^ oo. By definition, the null spaces J\fjc and A/^t of K and K^ are subspaces of H spanned by the linearly independent solutions of

Kx = 0 (9.5.96)

and

KV = 0. (9.5.97)

We note that the solutions of Eqs. (9.5.96) and (9.5.97) are the zero-eigenvalue eigenvectors of K and K^ respectively, and these eigenvectors can be orthonor- malized. We also note that every vector in A/^ or A/^t can be expressed as a linear combination of the corresponding orthonormal set. However, these orthonor- mal sets do not form a complete set because K and K^ have at least one nonzero eigenvalue unless K = K^ = 0. We consider now all vectors x and y in H such that ||x|| = ||y|| = 1. Since K is bounded, it follows that

|{y,Kx)|2<^, (9.5.98)

where B, the least upper bound of |(y, Kx)|^, is positive and finite. From Eq. (9.5.98) it follows that there exist weakly convergent sequences x^"^ and y^"^ such that

lim |{y^'^\ Kx^''^>|2 = B. (9.5.99) n->oo We denote the Umits of x^"^ and y^"^ by Xj and y^, i.e., limx^"^ = x^ and limy^"^ = yi (recall that weakly converging sequences—Cauchy sequences—converge to a limit in a Hilbert space). Because K is completely continuous, it follows that

lim |(y^'^\ Kx^''^)p = Ky^, Kxi>p = B. (9.5.100) n~>oo Without loss of generality, we can assume that X| is orthogonal to all of the vectors in the null space jV^, and y^ is orthogonal to all of the vectors in the null space JV^t. We now set

(yi, Kxi) = /CR + //ci = /c, (9.5.101)

where K^ and K^ are the real and imaginary parts of /c. It immediately follows that B = ^^ + ATj^. Consider next a normalized vector x that is orthogonal to Xj and the null vectors of K. We assume that (y, Kx) :^ 0 and set

(yi, Kx) = /XR + ifji, = M. (9.5.102)

Choosing Cj and C2 such that \c^\^ -f |c2p = 1, we consider the vector c^x^ + C2X, noting that ||c,Xi 4-C2xf = kiPHxif + k2p||xf = 1, and, therefore,

|(yi, K(ciXi +C2X)>|' = Icj/c +C2/xp = c^Hc, (9.5.103) SPECTRAL THEORY OF INTEGRAL OPERATORS 403

where

and H = (9.5.104)

The maximum value of c^Hc is attained when c is the normalized eigenvector c„ of H corresponding to the largest eigenvalue X^(= |^|^ + |M|^). Setting c = c^, we obtain

|(yi, K(c,Xi + C2X))P = X^llc^ll^ = kp + M\ (9.5.105)

However, we already know that

|(y„Ax)|^

and so it follows that /JL must be equal to 0, and, therefore,

{yi,Kx)=0. (9.5.107)

This proves that Eq. (9.5.107) is true for all vectors x in 7{ that are orthogonal to Xj and to the vectors in the null space Af^^ of K. We can rewrite Eq. (9.5.101) as

(KVi,Xi)=fc (9.5.108)

using the general property, (K^v, u) = (v, Ku), of the adjoint operator, and prove by similar considerations that

(ry,xi)=:0 (9.5.109)

for all vectors y orthogonal to y^ and to the vectors in the null space AT^t of K^ Next, we assume that

Kxi-/cy, =x. (9.5.110)

It follows that, for any vector y in A/^t (all vectors such that K^y = 0),

(y,x) = {y,Kx,)-/c(y,yi) (9.5.111) = (KV,X,)=0. Also, (yi.x> = (y,,Kx,>-/<:(y,,y,> (9.5.112) = K — K =0,

and from Eq. (9.5.109) it follows that

(y,x) = {y,Kx,>-fc(y,y,> (9.5.113) = (ry,x,)=0, where y is any vector orthogonal to y, and to the vectors in M^i. 404 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

From Eqs. (9.5.1 ll)-(9.5.113), we can conclude that x is orthogonal to an orthonormal basis set in the Hilbert space H. Therefore, x = 0, and

Kxi=^yi. (9.5.114)

Similar considerations can be used to prove that

KVI =/C*X,, (9.5.115)

Finally, if we express K in the form K^ cxp(iO), where K^ = \K\, and we replace the unit vectors Xj and yj with the unit vectors exp(—/^/2)Xj and exp(f^/2)yi, Eqs. (9.5.114) and (9.5.115) take the form

Kx, = ^,yi and KVI='<^IXI, (9.5.116)

where ||Xi|| = HyJI = 1 and 0 < K^ < oo. Note that, for the vectors exp(-/^/2)Xi and exp(/^/2)y,, we have again used the symbols x, and y^ The procedure used above can be used again to find the maximum of | (y, Kx)p among the normalized vectors x and y in ?{ such that x is orthogonal to x^ and the vectors in A/^, and y is orthogonal to y^ and the vectors in A/^t. The result is that there exist vectors X2 and y2 such that (y2, KX2) = KJ, where K2 < K^ and

Kx2 = K2y2 and KV2 = '^2^2- (9.5.117)

It is, of course, possible that K^ = ^2* ^^^ there will only be a finite number of pairs (x,, y,) for which this is tme. Continuation of the procedure will yield the orthonormal sets {x,, X2,...} and {yi»y2'- • •} such that

Kx,. = /c,y, and KV, = ^iX^, ATJ > /Cj > /C3 > • • • . (9.5.118)

The vectors x,, X2,... constitute a basis set for all of the vectors in H that are orthogonal to Afj^. Likewise, the vectors y^ y2, • • • constitute a basis set for all of the vectors in K that are orthogonal to Mf^u The union of {x,} and an orthonormal set {x9), spanning jV)^, forms an orthonormal basis set for H as does the union of {y,} and an orthonormal set {yj^}, spanning A/^t. Thus, if we define the set 0i, 02' • • to be the set Xj, X2,..., Xp x^,... and the set ^^ ^2^ • • to be the set yi, y2» • •, y?* y2,. ., we have proved Eqs. (9.5.92) and (9.5.93), and since

I = E*/0j^ (9.5.119)

it follows from K = KI = Ef K0,.0j = X!, 'c.^e^J that Eq. (9.5.91) is true. This completes the proof of the theorem. As an illustration of the use of Schmidt's form for K, consider the integral equation

Ku = f. (9.5.120) SPECTRAL THEORY OF INTEGRAL OPERATORS 405

This problem has a solution only if

f = E«/^.- (9.5.121)

where the prime on X)' means that the sum is restricted to the functions 0, for which K0, 7«^ 0. When this condition is obeyed, the solution to Eq. (9.5.118) is

u = E'^^^^^/ + E"^^0.' (9.5.122) i ^i i

where the c, are arbitrary and the double prime on Y!' indicates that the sum is over the basis functions in M^. The quantities 0,, f^, and /c, can be computed from the self-adjoint equations (9.5.94) and (9.5.93). Another application of the theorem will be given in the next section.

9.5.7. Completely Continuous Normal Operators Again, let us begin by stating the main theorem of interest in this section:

THEOREM. If K is a completely continuous integral operator in a Hilbert space and is nonnal, i.e., KK^ = K^K, then the eigenvectors 0,,02» • • • <^/K form a complete orthonomial basis set in H. The theorem is actually valid for any completely continuous, normal, linear operator in a Hilbert space. For example, K could be a matrix operator in E^. As noted before, a linear operator in a function space can be mapped into a matrix operator in E^. As was the case for self-adjoint operators, this theorem immedi- ately leads to the spectral resolution for the operator: K = E^-^/^J' (9.5.123) i and for a function /(K) of the operator:

f(K) = Y:f(X,)4>,il>], (9.5.124)

as long as the function f{t) is defined for ^ = X,, / = 1, 2,.... The proof of the theorem follows easily from Schmidt's normal form of K. We first note that K = XI, '<^i^/0j» where 0, and ^, satisfy the equations K^K0, = /cf0, and KK^^, = /c?^,. Since K^K = KK^ for normal operators, 0, and ^, can differ only by a multiplier of modulus 1, i.e., ^, = exp(/^,)0,, where O^ is real. Thus, Schmidt's normal form becomes

K = X^/c,exp(/^,)0,0j, (9.5.125)

where {0,} is an orthonormal set and the eigenvalues of K are A., = AC, exp(/^,). We note that, in the case of a normal operator, X, can be either real or imaginary. Also, the vectors {0,} in Eq. (9.5.92), plus an orthonormal set spanning the null space Mjc, form an orthonormal basis set in the Hilbert space H—completing the proof. 406 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

The spectral resolution theorem for self-adjoint operators is, of course, a subcase of the theorem for normal operators, and so, in actuality, Section 9.5.5 could have been skipped. •^•1 EXAMPLE 9.5.5. Consider the equation

^ = /Ku, u{t = 0) = Uo, (9.5.126) at where i = ^/^ and K is a completely continuous, normal operator in H. Assume that the imaginary part of the eigenvalue X„ goes as l//t^/^ and examine the asymp- totic behavior of u(r). The formal solution to Eq. (9.5.126) is

u = exp(//K)Uo = X]exp(/aj{0„, Uo)0„. (9.5.127) n The length of u is given by ||u|p = ^exp(/r(X„ -x:))|(0„,uo)p

Eexpf^jl(0„,uo>|' (9.5.128)

Thus, u remains a vector in the Hilbert space ?{ as r -> oo. In fact, its length is • • • bounded by the initial length ||UQ||.

PROBLEMS

1. (a) What are the eigenvalues and eigenvectors of the integral K whose kernel is f2, sinnrsinn^

in the Hilbert space £2(^» ^V- Note that K is self-adjoint, (b) Give the solution to the problem dn

where

u{t = 0) = 1.

2. (a) Convert the equation d^u + / sin2(5* - t)u{s) ds = cosr, 'dF with M(0) = duiO)/dt = 0, into an integral equation, (b) Solve the equation. PROBLEMS 407

3. The exact solution of the equation

u(t) 4- / exp((r — s))u{s)ds = exp(0

is

u{t) = 1.

Use the trapezoidal method to find an approximation u{t) to the equation. Find the discretization interval A^ below which iiu-uf

where \\uf = f^\u{t))^dt.

4. Use the method of Laplace transforms to solve the equation

u(t) — I sm{t — s)u{s)ds = 1.

5. Consider the Abel-type equation of the form

I {t- sY^'hit, s)u{s) ds = /(O, 0 < a < 1, where h(t, s) is a continuous function. If the variable s is discretized such that ti < s < r,^i, and h{t, s)u{s) is approximated in the interval (r,, r,^i) as k(t, ^,+1/2)^(^,4.1/2), the integral equation becomes

E ^nMtn. ^/+l/2)w, + l/2 = fiQ. H = 1, 2, . . . , 1=0 where

^ni = / i^n-s) "^ ds

and M,^|.i/2 = u(t — ^(/, 4- ^,+1)). This is known as the midpoint method. Assume that /?(/, s) — I-{-s, a = \, f = ^t^^^ + ^t^^^, and t^^^ - t^ = h. Solve the equation for /i = ^, ^, and ^ for 0 < / < 2. Plot the midpoint solutions versus the analytical solution, which is u{t) — t (verify). 6. Prove that the operator

Ku = / uu(s)ds, 0 < a < 1, J-i (t - sY is completely continuous m C2{—\A)' 7. Prove that the operator

I h(t c) ^ ' ^ u(s)ds, 0

8. Suppose u(x) satisfies the equation du f"^ -— =au + P exp(-(A: - y))u{y) dy, dx JQ where u{0) = y. Integrate the equation to obtain the Volterra equation

u{x) = r -f / k{x- y)u{y)dy. (*) Jo What is k{x — y)l Solve (*) by the method of Laplace transforms for a = p = y = {. 9. In a certain dielectric material, the electric displacement D(t) is related to the electric field £"(0 by

D{t) = €E(t) -\' f 4>{t- s)E{s) ds, JQ where D{t = 0) = €EQ, € is the dielectric constant and (j)(t — s) is the "memory function" of the material. If the memory is exponential, i.e..

0(0 =:aexp( — j,

use the method of Laplace transforms to find E{i) as a function of D(i), Suppose 6 = 10, a = 5, and r,. = 1 and assume that E{t) = E^ siuTrr. Plot D/JE'O and EjE^ versus t. 10. Consider the equation

= Ax -h / A(? — 5)x(5*) ds. (*) dt where x(0 is an n-dimensional vector function of t and A(0 is an « x n matrix function of t. Suppose

x(/ = 0)

and

a,/0 = «/; exp(-y,.^.0,

where Qf,^ and i',y are constants (y,, > 0). Moreover, assume that A is real and Of., = a^, and y,^ = y^,. Use the method of Laplace transforms to find the formal solution to (*). For the special case n = 2, a,i = 0^22 = —2, 0^12 = 0^2 y,, = 1, Of = 1, and x\—x\ = 1. Find x^{t) and X2(t). Plot the results for 0 < / < 5. 11. Consider the following kernels of self-adjoint operators in C^i—n, n): (a) k{t,s) =:4cos(r - s). (b) k{t,s) — 1 +cos(r ~ s). PROBLEMS 409

(c) k(t, 5) = 1 + cos(r - s)-\- sin 2(r + s). (d) k(t,s) = sin3(^ - s). Determine all of the nonzero eigenvalues and their eigenvectors for each of these operators. Give the spectral decomposition of operator K. Give the spectral decomposition of exp(aK). 12. Find the Schmidt's normal form for the operator K in C2{—7T, TT) if the kernel of K is given by

k{t,s) = sin r cos 5".

Give the spectral decomposition of KK^ and K^K. 13. Solve

+ / sinaO -s)u(s)ds — f(t), Jo with the conditions u(t) = du{t)/dt = 0 at r = 0. Plot the solution u{t) versus r for a = 1 and f{t) = exp(—0- 14. Consider the operator K such that

Ku = / k{t — s)u{s) ds,

where kit) = /~^ 0 < r < 1, and k{t) = 0, r < 0. Prove that K is a completely continuous operator in C2{—\, 1) when 0 < v < 1. Hint: Show that K is a Hilbert-Schmidt operator using the orthonormal basis set 0„ = |expO>i7rO, « = 0, ±1, ±2,. .., in C2{—\, 1). 15. Consider the operator K with the kernel

k{x, y) =.y^-^ sin(n + \)x sinny, 0 < jc, y < jr.

(a) Prove that K is a Hilbert-Schmidt operator and that K has no eigenvalue. (b) Solve

(K + I)u = f,

where

f{x) =x.

16. Consider the eigenproblem oo k{x, y)u{y) dy = Xu{x), —00 < jc < 00, / -00 where .2 _L „2 \ / ^2 ^ y2 _ 2yxy \ k{x. y) = (1 - v^r'f^tx^i^^\ expj"-

where v is a real number lying between 0 and 1. 4 I 0 CHAPTER 9 LINEAR INTEGRAL OPERATORS IN A HILBERT SPACE

(a) Show by substitution that

UQ{X) = exp(-f)

is an eigenfunction corresponding to the eigenvalue k^ — A/TF. (b) Let

«nW=expf — j//„(x),

where ^„(JC) is a Hermite polynomial, given by

H,{x) = (-ir exp(x2)|^[exp(-x2)].

Assume that Ku„ = A,„u„ and show that Ku„_,.| = A„^,u„^i, where X„_j.i = vX„. Thus, by induction, establish that u„, n = 0,1, 2,..., are eigenfunctions of K of eigenvalue X„ = v^^/ic. (c) Prove that

oo

/ -oo and give the spectral decomposition of K (the vectors {u„} form an orthogonal basis set in C^i—oo, oo)). 17. Consider the self-adjoint operator K in ll2{—oo, oo) whose kernel is

/2 , ,2> kit, s) = ~^ expf-^ j j ^ exp(-f ^) Jf I exp(~^^)^?, t < S,

(a) Show that the Hermite functions //^\ J"exp(-r^)

are eigenfunctions of K with eigenvalues A„ = 2n -f- 2. These Hermite functions form a complete orthogonal basis set in C2(—oo, oo). (b) Give the solution to the equation du/dt = —Ku, M(f = 0) = 1. 18. Show that the eigenfunctions of the self-adjoint operator K in £2(0* ^ with the kernel /f + 2\ /-^expC-r) , kit, s) = exp( -— 1 / —^—- dx, 0 < r < 5,

are the orthogonal Laguerre functions

with eigenvalues X„ = n + 1. FURTHER READING 411

FURTHER READING

Akhiezer, N. I. and Glazman, I. M. (1963). "Theory of Linear Operators in Hilbert Space." Vol. II, Ungar, New York. Akhiezer, N. I. and Glazman, I. M. (1966). "Theory of Linear Operators in Hilbert Space." Vol. I, Ungar, New York. Courant, R. and Hilbert, D. (1953). "Methods of Mathematical Physics." Interscience Pub., Inc., New York. Green, C. D. (1969). "Integral Equation Methods." Barnes & Noble, New York. Hochstadt, H. (1973). "Integral Equations." Interscience, New York. Korevaar, J. (1968). "Mathematical Methods." Vol. I, Academic, New York. Linz, P. (1985). "Analytical and Volterra Methods for Volterra Equations." Soc. for Industr. & Appl. Math., Philadelphia. Lovitt, W. V. (1950). Linear Integral Equations, Dover, New York. Petrovskii, I. G. (1957). "Integral Equations." Graylock Press, Rochester, New York. Porter, D. and Stirling, D. S. G. (1990). "Integral Equations, a Practical Treatment, from Spectral Theory to Applications." Cambridge Univ. Press, Cambridge. Riesz, F. and Nagy, B. S. (1965). "Functional Analysis." Ungar, New York. Schmeidler, W. (1965). "Linear Operators in Hilbert Space." Academic, New York. Schwabik, S. and Turdy, M. (1979). "Differential and Integral Equations, Boundary Value Problems," Reidel, Dordrecht. Smithies, F. (1958). Integral Equations. Cambridge Univ. Press, Cambridge. This Page Intentionally Left Blank LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

10.1. SYNOPSIS

In this chapter we will extend our analysis of operators by considering differential operators in a Hilbert space. The primary emphasis will be on ordinary differential equations and their differential operators. However, much of our analysis can be applied to partial differential equations and operators, for a certain class of prob- lems, and we will touch upon this in the last section of the chapter. A differential operator differs from the integral operator discussed in Chapter 9 in that it requires specification of boundary conditions. Furthermore, a differential operator L has a domain Vi that is generally a subset of the Hilbert space H in which it is defined, and a differential operator is always unbounded. Like integral operators, the adjoint operator of a differential operator exists and is itself a differential operator. We say a differential operator L is self-adjoint if its differential expression L is the same as the differential expression L^ of the adjoint operator L^ and if the domain P^ of L is the same as the domain Vij of L^ With the required specification of boundary conditions in V^ and X>^t, self-adjointness is not as straightforward to assess as it was for integral operators or matrices. We will show that if L is a pth-order differential expression,

Lu = J2a,{x)pr, (10.1.1)

413 4 I 4 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

and a^ix) are continuous and a^ix) ^ 0, then the inhomogeneous equation Lu = /, with the initial conditions

^ = y., X = a, / = 0,..., /7 - 1, (10.1.2)

has a unique solution for any piecewise continuous function f(x). This is the initial value problem (IVP). The equation

Lw = 0, (10.1.3)

where L is defined in Eq. (10.1.1), has p linearly independent solutions [u^(x),.,,, Up(x)], where Uj(x) corresponds to the initial conditions

-£f ^ K,^, jc = a, /= 0,...,/?- 1, (10.1.4)

such that the vectors yj = {y^j,..., Kp-i.y), 7 = 1,...,/?, form a linearly inde- pendent set. Thus, the functions {M^, ..., M^} are solutions to the homogeneous IVP corresponding to a p linearly independent set of initial values of u{x) and its first /7 — 1 derivatives. The set {M,} is referred to as a fundamental system. Fundamental systems are useful for solving boundary value problems and eigenvalue problems. The boundary value problem (BVP)

LM = /, a < X < b,

for a /7th-order differential equation places constraints on u and its first p — I derivatives at both ends of the interval [a,b]. These conditions can be summarized as

BiU = yi, /=:l,...,m, (10.1.5)

where B^u is some linear combination of u and its first p — I derivatives at jc = a and b. The BVP may have a unique solution, a nonunique solution for certain functions /, or no solution at all. If, for the operator defined in Eq. (10.1.1), a^ix) e 0(a,b), i.e., a, has a continuous /th derivative, then the following Fredholm alternative theorem holds: the equation

LM=:/, B,M = 0, z = l,...,m, (10.1.6)

has a solution if and only if

(v,f>=0, (10.1.7)

where v is any solution to the homogeneous adjoint equation L^v = 0. If there is no solution to L^i; = 0, then the solution to Eq. (10.1.6) is unique. If L^i; = 0 has V solutions, then the solution to Eq. (10.1.6) will be Up + X^J^j«/";,., where a^ are arbitrary, Lup = /, and LM;, = 0, / = 1,..., V. The functions M,J. are solutions to the homogeneous problem LM = 0, B,M = 0, / = 1,..., m. If 5,M = y^, i = 1,..., m, then the necessary and sufficient condition for a solution to Lu = f is that (v, f) equal a particular linear combination of y, for each solution v to the SYNOPSIS 415

homogeneous adjoint equation L^u = 0. Note that this theorem is analogous to the Fredholm theorem in Chapter 4 for finite vector spaces. We will prove that the inverse of a differential operator L is an integral opera- tor G, where the kernel of G is called Green's function. If the homogeneous equa- tion LM = 0 has no solution for the given boundary conditions, then we will show that G exists. For the pth-order differential operator defined in Eq. (10.1.1), the operator G is a completely continuous integral operator. This property enables us to use what we learned in the previous chapter for completely continuous integral operators to show that differential operators are often closely analogous to matrix operators with respect to spectral (eigen) properties. If g{x, y) is Green's function corresponding to the inverse of L, then g*(y, x) is equal to the kernel g\x, y) corresponding to the inverse of L^ We will show that the inverse G of L always exists for the IVP and it may or may not exist for the BVP. In many cases, L can be transformed to an operator that does have an inverse and that has the same spectral (eigen) properties as L. We say that L is a perfect operator if its eigenvectors {0„) form a complete set. For such an operator, the resolution of the identity is I = JZn ^ni^l ^^^ the spectral decomposition of L is L = J2nK^ni^l^ where the set {^„} is the reciprocal of the set {0„} (i.e., they obey the biorthonormality condition (0^, ^„> = 8„^) and are the eigenvectors of the adjoint L^ If f(t) is defined for r = A,„, n = 1, 2,..., then the spectral decomposition of /(L) is /(L) = 5^„ f(.K)4>ni^l' ^^ ^^^ ^^^^ much of our analysis of perfect operators from Chapters 6 and 9 apply also to differential operators. We will show that if L is a pth-order differential operator, then the number of eigenvectors corresponding to a given eigenvalue must be less than or equal to /?. If the differential operator L is a regular self-adjoint operator, then its eigen- vectors {0„} form a complete orthonormal set in the Hilbert space of the operator. Furthermore, its eigenvalues are real. We require that the coefficients in the dif- ferential expression of a regular differential operator have no singularities, a^ is strictly positive, and the domain of definitions of the variable x of u{x) (interval [a, b] or volume ^„) is finite. We will see that some regular operators whose dif- ferential expression is normal also have a complete orthonormal set of eigenvalues. Second-order differential operators, whose boundary conditions make them self-adjoint, are called self-adjoint Sturm-Liouville operators. With certain bound- ary conditions, self-adjoint Sturm-Liouville operators are bounded from below and so the eigenvalues obey the conditions |X„| > M < oo and |X„| -^ oo as n -^ oo. We will see that regular self-adjoint Sturm-Liouville operators have a complete orthonormal set of eigenvectors. Furthermore, the eigenfunctions of these opera- tors obey an oscillation theorem; i.e., if we label the eigenfunctions by increasing value of their corresponding eigenvalues, the nth eigenfunction will have exactly n zeros in the open interval (a.b). Singular self-adjoint Sturm-Liouville operators (i.e., operators having singular coefficients in their differential expression, having zeros in ap(x), or being defined on an infinite interval [a, b] or volume Q„) may have a complete set of eigen- functions or may possess continuous spectra. 
In the case of singular self-adjoint operators, the spectral decomposition of L is L = X^m ^m4^m4^m + I ^k^k^l^^ ^^ one dimension and L = JZ^ ^m^m^m + / ^k^k^l ^"^ i^ ^ dimensions. Here, the 0 4 I 6 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

are eigenvectors and the vectors u obey Lu = Au. However, the u are not eigen- vectors since they do not belong to the Hilbert space. Specific examples are given for the various types of singular self-adjoint operators. The reason that these self-adjoint Sturm-Liouville operators are given so much attention is that they occur often in problems of heat and mass transfer, structural vibrational problems, fluid mechanics, and the quantum mechanics of particles. Several examples of applications to these problems are given in this chapter.

10.2. THE DIFFERENTIAL OPERATOR

Matrices in finite-dimensional vector spaces, and the classes of integral operators of the types considered in Chapter 9, form linear operators for any vector in the respective Hilbert spaces. For this reason, we have not yet made a distinction between the domain D^ of an operator L and the Hilbert space H in which the operator is defined. The domain X>^ is composed of the subset of vectors u in K for which the object Lu is defined. For example, if the differential expression for the operator L is

Lu = -V + q{x)u{x) (10.2.1)

and H = C2(a,b), then the domain of L will include only those functions in £2(^5 b) that are twice differentiable. If we further restrict the domain of functions to those for which Lu belongs to the Hilbert space C2(a, b), then P^ contains only functions such that ||Lu|| < 00. Thus, for the example given by Eq. (10.2.1), we required that

(Lu, Lu) = / ( --i-r + '7*w* 11 -—^ -^qujdx < 00, (10.2.2)

The definition of an operator is completed by specifying the boundary conditions that the functions u(x) obey. Consider the inhomogeneous equation

Lu(x) = fix), a

where the interval [a, b] is finite and L is a /?th-order differential expression of the form

d^uix) dP-'u{x) Lu{x) = a^(x)-^—- -f S-i(^) ^^p-x + * • • "" (10.2.4) du{x) -f-fli(x)—; \-aAx)u{x). dx THE DIFFERENTIAL OPERATOR 417

In this chapter we assume that the boundary conditions will be set by fixing the values of linear functional of the form

BiM ^ X: «U"^'*~'H«) + E «l, .+;"^'"'H^) 7=1 7=1 (10.2.5)

.7 = 1 7 = 1

where M^'^^,§ = a,b, denotes the ith derivative of u (i.e., d^u/d^\ evaluated at ^ = a and b). We assume that these functional are linearly independent, which requires that the rank of the matrix

'U2p B = (10.2.6)

a,m l ^m,2p J is m, or, equivalently, that the row vectors

/ = 1,... ,m, (10.2.7)

are linearly independent. If the boundary conditions are set by B^u = y,, / = 1,..., m, where the y, are given numbers, then the domain of the operator defined by Eqs. (10.2.4) and (10.2.5) is expressed as

D^ = {u, LUG?/; ^,M = }/,, / = 1,...,W}. (10.2.8)

A /?th-order differential equation with p linearly independent boundary con- ditions is called a "balanced problem." Only for a balanced problem is there any hope of obtaining a unique solution. On the other hand, an unbalanced problem can have a solution and a balanced one can fail to have one. For example, the equation

(10.2.9)

with no conditions on M, i.e., m = 0, has the solution

u(x) = a -^ Px —— 6 for arbitrary a and p. Thus, there are many solutions to this problem. However, the balanced problem d^u(x) MX0) = W'(1) = 0 (10.2.10) "~d? = ^, has no solution because the boundary condition M'(0) = 0 implies ^ = 0, whereas the condition u\\) = 0 implies j0 = |, a contradiction. On the other hand, the balanced problem

= jc, M(0) = M(1) = 0 (10.2.11) dx^ 4 I 8 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

has the unique solution

u(x) = -x(\-x^), (10.2.12) 6 It is important to realize that a differential operator L is defined by its differ- ential expression L and its domain V^. For example, even though L is the same in the following cases, the operator L is different in each case:

(i) Lu = --j-j, w(0) = 0 and M(1) = 0, (10.2.13)

d^u (ii) Lu = -—^, M(0) = M(1) and u\l) = 0, (10.2.14) dx d^u (iii) Lu = -—y, M(0) = M(1) and u'{0) = M'(1), (10.2.15) dx^' d^ (iv) Lu = -—^, M(0) = 0 and u'il) = 0, (10.2.16) "dx^

where M'(JC) = du(x)/dx. In addition to the need to identify the domain X>^ of a differential operator as a subset of a Hilbert space H, the differential operator differs in another funda- mental way from integral operators. Whereas some classes of integral operators are bounded, there are no bounded differential operators. A simple example illustrates this point. Suppose

Lu = --y-j,fu M(0) = M(1)=0. (10.2.17) 'dx^ Then the sequence of functions

,3/2 ft /—nx\ "n = ^^(^ - ^^^^A~Y~) (10.2.18) belongs to the domain P^ defining the operator L in £2(0* 1). It is straightforward to show that

ujr->l asn->oo (10.2.19)

and

IILuJI^ 2::: — -> 00 as n -» 00. (10.2.20) 16 Thus, since

IILuli max V7^ = oo, (10.2.21) ueP, ||u||

the norm of L is unbounded. THE DIFFERENTIAL OPERATOR 419

If L is a /7th-order differential operator, it can be easily proven that L is unbounded. Consider the sequence of functions

u„{x) = K(x) exp(^=^^^^^—^) (10.2.22)

in Vj^, where V^ C C2{a,b) and the interval [a,b] is finite. Assume, moreover, that the functions h„{x) are chosen such that the functions M„ obey the boundary conditions of L and that

IKlP = l. (10.2.23)

Such functions can always be constructed, as we did in Eq. (10.2.18). For this sequence of functions, it follows that II Lull 2 •V-;;!^ -> oo (as n^^) as /i -> oo, (10.2.24) ll«lr proving that any pth-order differential operator is unbounded in a finite interval [a,bl A similar proof can be constructed if the interval is infinite, say [—oo, oo]. In the Hilbert space C2(—oo, oo), the boundary conditions on the vectors in the domain of a differential operator are that ||u|| < oo and ||Lu|| < oo if u G I>£^. We can construct functions

u^(x) = h„(x)txp(-nx^), n = 1, 2,..., (10.2.25)

where ||u„|p = 1, h„(x) has the necessary differentiability, and \h„{x)\ < exp(—jc^) as |jc| -> oo. These functions belong to V^ and it is straightforward to show that

\\Luf -> oo (as n^) as n -» oo. (10.2.26)

Thus, we see that all differential operators are unbounded. To anticipate what we will learn later, let us consider a differential operator L whose eigenvectors 0„, n = 1, 2,..., form a complete orthonormal set. Then the resolution of the identity takes the form

CXD I = E^»^I or i(x.y) = Hx-y) = J2„{x):(y) (10.2.27) «=1 n

if 0„ ^ C2{a,b). L0„ = X„4^„, and LI = L, and so it follows that

n

This has two implications. The first is that the differential operator L can be rep- resented by an integral operator with the kernel

l{x, y) = E^n0nU)C(>')- (10.2,29) 420 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

The second is that |>.„| -> oo as n -> oo (where the labeling of eigenvalues is chosen such that |A.„_,.i| > |A,„|). This conclusion follows from the property that if |A„| < oo for all n, then u = Xln ^n^n ^^^^

M^=^MY

contradicting the fact that all differential operators are unbounded. In the Hilbert space C2{a,b\s), s(x) > 0, there is a technical detail that should be kept in mind as regards the resolution of the identity in terms of a basis set 0„ in the space. Suppose 0„ is an orthonormal set. Then u = lu is written as

u = E^-.(f!'"> (10-2-31) n or

u{x) = J^f (l>nM(t>:(y)s(y)u(y) dy. (10.2.32)

Since Eq. (10.2.32) holds for arbitrary u(x), it follows that

8(x - y) = s{x)Y.(l>,{x)(t>:{y), (10.2.33)

Thus, in the space Ciia.h'.s), the kernel of the identity operator is the product of the sum Yin 4>ni^)^liy) ^^^ ^^e weighting factor s{x). Similarly, if {0^} is the set of eigenvectors of L, the relationship

Lu Y^XAMn^^) = E l\n(x)

implies that the kernel of the integral operator representing L is

l(x, y) = s{x)Y.X,(t),{xWn{y). (10.2.35) n

10.3. THE ADJOINT OF A DIFFERENTIAL OPERATOR

Consider the /7th-order differential expression of the form dPu(x) dP~^u{x) Lu(x) = Qpix)——- + fl^_i(x)-—3j— H + aQ{x)u{x), (10.3.1)

where ap{x) ^ 0 and a^ix) e C^'\a, h), i = 1,..., p. C^'\a, b) is the space of functions whose /th derivative is continuous in the finite interval [a,b]. The boundary conditions for the problem will be determined by setting the m linearly independent functionals

B^u=a^^u{a)^"'-\-a^yp-^\a)^-a^p^^u{b)-\-"--^a^^2p^^^~^\b)

: (10.3.2) THE ADJOINT OF A DIFFERENTIAL OPERATOR 421

where u^'\^) = d'u(^)/d^\ ^ = a or b. Accompanying the differential expression L is its formal adjoint differential expression L\ We can derive the form of the adjoint by considering the expression

^{x)^ f v\y)Lu{y)dy. (10.3.3) J a or, in expanded form,

K(x) = £ v\y)a^{y)u{y)dy + £ v\y)a,(y)^^dy (10.3.4) + / v'{y)a2{y)—~r^dy-^'-, Ja dy^ Integrating the second term by parts once:

f v\y)a,{y)^^dy = v\y)a,{y)u{y) Ja dy (10.3.5) -ff-[vHy)a,(y)]u(y)dy,

the third term by parts twice:

dMy) du{y) '^a dy dy

- ^[v\y)a2{y)]ii(y)\ (10.3.6)

+ [^b*(y)a2(y)]u{y)dy,

etc., we obtain

ax) = fyoiy)v*iy) u(y)dy - £ ^[a,iy)v*iy)]uiy)dy

f" d^ + / Tl[(i2iy)v*iy)]ii{y)dy +a^{y)vHy)u{y) (10.3.7) J a CIV du{y) + a2(y)v*(y) [a^iy)v*iy)]uiy) + • dy dy By comparing Eqs. (10.3.3) and (10.3.7), we find

f[v*Lu - {L^v)*u] dx = 7(u, V) (10.3.8)

where JL .dHa*v) (10.3.9) dx} ;=0 422 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

and

(10.3.10) dx^-^ dx^-^'

Equation (10.3,8) is known as Green's formula. If we differentiate this equation, we obtain Lagrange's identity:

v\x)Lu{x) - (Vv%x))u{x) = — J(u, V). (10.3.11) ^ dx If we define the vectors

v{a) u{d)

and X = (10.3.12) v{b) u{h)

u^P-^\b)

we find from Eqs. (10.3.8) and (10.3.10) that

f [v'Lu - iL^v*)u]dx = J^ ylp^jxj = y+Px, (10.3.13) '.7=1 where the elements p^j of the 2p x 2p matrix P are various derivatives of a^ix) evaluated at x = a or ^. The m functional of Eq. (10.3.2) will be used to define the boundary con- ditions for the operator L. In general, we assume that m < 2p. Thus, to these functionals we will add the 2p — m linearly independent functionals

^m+l« ==«m+l, M^) + • • • + «m+l,2p«^^ ^H^)

(10.3.14)

B2pU =a2p,iu(a)^2p,\ -\-' " + a2n2pU^P2p,2p^ ^\b). These supplemental boundary functionals can be defined by any linearly indepen- dent set of vectors

Pj = [«n' • • •' «,,2p]' / = m + 1,.. ., 2/7 (10.3.15)

that is also linearly independent of the set

p^ = [«,!,..., a,. 2p]> / = l,...,m. (10.3.16)

Combining Eqs. (10.3.2) and (10.3.14), we can write

b = Ax, THE ADJOINT OF A DIFFERENTIAL OPERATOR 423

where bf = B^u, i = 1,..., 2p; x is defined in Eq. (10.3.12); and the elements of A are a^j. Since the 2/7 row vectors of A are linearly independent, A is nonsingular and so its inverse exists. Thus, x = A~*b, which, when inserted into y^Px, yields

with Cij = (PA~^),y and ^2p-i-i-r^ = Yl%\^*jyj' What we have found is that Eq. (10.3.13) can be expressed as

(V, Lu) - (L+v, u) = (BlvTB.u + • • • + {BI^.VTB^U ^ (10.3.18) + (5;i;)*B^^iM + ... + (5|u)*B2p". In solving the equation Lu = /, we have to consider the following problems: (i) The inhomogeneous problem

Lu = /, B^u = y,, / = 1,..., m (10.3.19)

(ii) The homogeneous problem

LM = 0, 5,M = 0, / = 1,..., m (10.3.20)

(iii) The homogeneous adjoint problem

Vv = 0, BJV = 0, / = 1,..., 2/7 - m (10.3.21) The boundary functionals for the adjoint operator are determined by requir- ing that

(V, Lu) = (LV, u) (10.3.22)

for any vector u in the domain

D^ = {u, Lu e n. BiU = 0, / = 1,..., m}. (10.3.23)

From Eq. (10.3.18), it follows that, for such vectors u,

(V, Lu) = (LV, U) + {Bl__^VyB^^,U -f (^2Vm-l^)*^m+2« + ' * * + {BlvYB^pU, Since the boundary functionals B^u, / = 1,..., m, for L are linearly independent of the functionals 5,M, / = m + 1,..., 2/7, the values of B^u, / = m + 1,..., 2/7, are arbitrary and so it follows from Eq. (10.3.21) that the boundary conditions for the adjoint operator L^ are

BJV =0, / = 1,..., 2/7 - m. (10.3.25)

The domain of L^ for the problem defined by Eq. (10.3.21) is

P^t = {v, V\ € n, B]V = 0, / = 1,..., 2p - m}. (10.3.26) 424 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

Since the functionals BjU, / = m -I-1,..., 2p, are not unique, neither are the func- tionals B^v, / = !,..., 2/7—m. However, the domain V^i defined by Eq. (10.3.26) is unique because any linearly independent set B,M, / = m + 1,..., 2/7, is a linear combination of the set B^w, / = m + 1,..., 2/7, which means, in turn, that the set BJV, / = 1,..., 2/7 — m, corresponding to B^u, / = m + 1,..., 2/7, will be a linear combination of the set ^/i;, / = 1,..., 2/7 — m. If we started with the adjoint problem

L^v = h, BJV = (5,, / = 1,..., 2p - m, (10.3.27)

we would find that its formal adjoint differential expression is L and the boundary functionals of L are B^u, i = 1,..., m. This means that (L^)^ = L, which is equivalent to the properties (A^)^ = A and (K^)^ = K of matrix and integral operators. Let us study some simple examples. Consider the differential expression

Lu = -~;-j (10.3.28) "d? for the following five sets of boundary conditions:

(i) M(0) = 0, M(1) = 0,

(ii) M(0)-M(1) = 0, u\l)=0,

(iii) M(0)-M(1)=:0, W'(0)-M'(1) = 0, (10.3.29)

(iv) w(0) = 0, w'(l) = 0,

(v) u\0) - M'(1) = 0.

For this operator, V = L and the boundary conditions for V are determined from the equation

ji y(u, V) = v\0)u'{Q) - V*(1)M'(1) •o (10.3,30) -f [v'{\)Xu{\) - [i;'(0)]*w(0) = 0.

We will see that, even though the adjoint differential expression V is the same as L, the adjoint operator \J is not the same as the operator L. For case (i), Eq. (10.3.30) implies that

I;*(0)M'(0) - I;*(1)M'(1) = 0 for all u e V^. (10.3.31)

Note that the boundary conditions on L put no constraint on the derivatives u'(x) at X = 0 and x = I. Thus, u'{0) and u'(l) are arbitrary and so the boundary conditions on L^ deduced from Eq. (10.2.26) are

i;(0) = i;(l) = 0. (10.3.32) THE ADJOINT OF A DIFFERENTIAL OPERATOR 425

This implies that P^ = P^t, and since the differential expressions L and V are the same and the domains of the operators are the same, the operator L of case (i) is self-adjoint. Consider next case (ii). With the corresponding boundary conditions, Eq. (10.3.30) becomes

v\Q)u'{Q) + [i;'(l) - i;'(0)]*M(0) = 0 for all u G P^^. (10.3.33)

Again, the quantities M'(0) and M(0) are arbitrary since the boundary conditions for case (ii) put no constraints on them. Consequently, Eq. (10.2.29) implies the boundary conditions

i;(0) = 0 and i;'(l) = i;'(0) (10.3.34)

for the adjoint operator L^ Since P/t ^ P^r^, the operator L for case (ii) is not equal to its adjoint L^ even though their differential expressions are the same. The boundary conditions for case (iii) give the condition

[i;(0) - U(1)]M'(0) + [i;'(0) - V'(1)]*M(0) = 0 for all u € P^. (10.3.35)

Since u'{Qi) and M(0) are unconstrained, it follows that the boundary conditions of L^ are

i;(0) = i;(l) and i;'(0) = i;'(l). (10.3.36)

Thus, Vi — Vjj and so the operator L is self-adjoint in this case. The boundary conditions of case (iv) lead to the condition that

I;*(0)M'(0) + [I;'(1)]*M(1) = 0 for all u € P^. (10.3.37)

Since u\0) and M(1) are unconstrained, the boundary conditions of L^ become

v(0) = i;'(l) = 0, (10.3.38)

and so L is self-adjoint.

Finally, the boundary conditions for case (v) yield

[i;*(0) - I;*(1)]M'(1) -f [I;'(1)]*M(1) - [I;'(0)]*M(0) = 0. (10.3.39)

Since M'(1), M(1), and M(0) can have arbitrary values, the boundary conditions for L^ are i;(0) - i;(l) = 0, i;'(l) = 0, i;'(0) = 0. (10.3.40)

Thus, the operators of cases (i), (iii), and (iv) are self-adjoint, whereas the operators of cases (ii) and (v) are not. Of course, if L ^ L\ then the operator L will not be self-adjoint for any conditions. For example, if

d^u{x) du(x) Lu = a2{x)—j-^—^"^i(^)~7I—\-aQ{x)u{x), (10.3.41) dx^ ^ dx 426 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

then

d^(ao(x)u(x)) d(aUx)u(x)) Vu = ^ ' , ^ - ^ ^\ '^ + al{x)u{x), (10.3.42) dx^ dx

However, if L = L\ we say that an operator is formally self-adjoint. For the second-order case, formal self-adjointness requires that

a^{x) = a*(jc), fli(jc) = 2-%^ - fl*(;c) (10.3.43a) dx and

aoW = ^-^+«o*W. (10.3.43b)

If the a, are real, then formal self-adjointness requires that da^ldx = A^, or

d ( Jw(jc)\ Lu = — U2(Jt)—r-^ -f ao(-^)w(^). (10.3.44) dx\ dx J

10.4. SOLUTION TO THE GENERAL INHOMOGENEOUS PROBLEM

Consider the homogeneous equation

A dUi Lu^y a.(jc)—~r=0, a

u^^\x) = f., X = c, / = 0 p - 1, (10.4.2)

where c is some point in the interval [a, b] (here a and Z? may be finite or infi- nite). We assume that a^ix) 7^ 0 for jc in {a,b] and that the coefficients a,(x), / = 0,..., /?, are continuous in the interval [a, fe].Th e following theorem can be proved:

THEOREM. The solution to the pth-order homogeneous equation (10.4.1) with initial conditions Eq. (10.4.2) exists and is unique.

The proof of the above theorem was given in Section 9.4, where we showed that the initial value problem can be transformed into the problem of solving a Volterra equation of the second kind—which was shown to possess a unique solu- tion. In this section, however, we prefer to examine an alternative proof of the theorem since it allows us to introduce "fundamental solutions" to pth-order dif- ferential equations. SOLUTION TO THE GENERAL INHOMOGENEOUS PROBLEM 427

First, we convert the pth-order equation to a first-order system by defining Ui, i = 1,...,/?, by

Zi =W

du dz\ ^2 = dx dx d^u dz2 dx^ dx (10.4.3a)

__dP''^u dZp.i ^P ^ dx^ ^ dx

and noting that

dPu _ 1 y.^ _ dzp (10.4.3b)

Thus, the problem of solving Eq. (10.4.1) with initial conditions in Eq. (10.4.2) is transformed into the problem of solving system dz 1^ = Az, (10.4.4) with the initial condition

z{x) = f for jc = c, (10.4.5)

where

?1 u{c)

< = (10.4.6)

^p J and 0 1 0 0 0 1 0 0 1 0 0 A = ^p(^) ; _ -cioix) -a^ix) -a2{x) -a^{x) •• • -«y,-lU) (10.4.7)

The theorem now becomes:

THEOREM. The solution to Eq. (10.4.4) with the initial condition in Eq, (10.4.5) exists and is unique. 428 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

If all of the coefficients a, are constant, we already know from matrix theory that the unique solution to Eqs. (10.4.6) and (10.4.7) is

z(jc) = exp((x - c)A)<. (10.4.8)

To prove the theorem for the general case, let us integrate Eq. (10.4.4) between c and X to obtain

z= rA(y)z(j)t/>' + < = Kz + <. (10.4.9) "^ C

This is a Volterra equation of the second kind. Writing Eq. (10.4.9) in the form

(I-K)z = <, (10.4.10)

we next consider the sequence of vectors

Z2 = < + K:< (10.4.11)

z„ = <+K<„_,.

By successive substitution, we find

z„ = S„<, (10.4.12)

where

S„=I + X;K'. (10.4.13)

From the identity

(I-K)S„=I-K", (10.4.14)

it follows that

(I-K)z, = <-K"<. (10.4.15)

Note that the quantity R„ = —K"f is a /7-dimensional vector given by

R„ = - f dy, My,) r dy, A(y^) •••('" dy„ A{y„)(. (10.4.16)

We can examine the vector as n -> oo in the norm ||v||oo = ^'^'^^\

IIRJIoc < f dy, f'dy,... rdy„\\A(y,)\\^---\\A(y„)\Uaoo- (10.4.17) SOLUTION TO THE GENERAL INHOMOGENEOUS PROBLEM 429

From the structure of A(jc), we find that either ||A(jc)||^ < 1 or

|a,(x) l|A(jc)||oo < max max (10.4.18) \

IIRJIoo < ^^^^F" max Kl as n -> oo (10.4.19)

for a < X < b, where the interval [a, b] is finite. This proves that (I —K)z„ — ^ = 0 as n ^- oo, or that z = lim„_,^ z„ is a solution to Eq. (10.4.10) and the quantity

oo (I-K)-* =J2K^ (10.4.20) 1=0 is the inverse of I — K. To prove that the solution z is unique, we assume that there is another solution w. Then (I—K)(z—w) = 0, or z—w = K(z—w), from which it follows that (z-w) = K"(z-w). As shown in Eqs. (10.4.17)-(10.4.19), this result implies that z — w = 0, and thus the solution is unique since lim„_^^ S„0 = 0. Since the initial conditions can be represented as a /7-dimensional vector {, it follows that there are only p possible sets of linearly independent initial conditions. Suppose {f J,..., (p] is a set of p linearly independent /7-dimensional vectors. Then any other set of initial conditions can be represented as a linear combination in these p vectors. We define a fundamental set of solutions (also known as a fun- damental system) to Eq. (10.4.5) as a set of solutions {z,,..., z^} corresponding to p linearly independent sets of initial conditions {

THEOREM. The fundamental solutions of a pth-order linear differential equa- tion are linearly independent. To prove this, we assume that the contrary is true, namely, that there exists a set of numbers {^,,..., ^p], not all 0, such that Y.i Pi'^i = 0- However, when A: = 0, z, = 5,, and so J], p^^i = 0, which is a contradiction since the f, were chosen to be linearly independent. We can now express solutions to arbitrary initial conditions in terms of the fundamental system by using the following theorem:

THEOREM. If ( represents an arbitrary set of initial conditions, then there exists a unique set of numbers [a^,, . . ,ap] such that ^ = J2i=i^iKi ^^^ ^he solution to

— =A(jc)z, z(c) = < (10.4.21) dx is a linear combination of the fundamental solutions, namely,

z = j:a,z,, (10.4.22) 1=1 where z,(c) = ^,. 430 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

We can restate these conclusions for the equivalent /7th-order problem in the following theorems:

THEOREM. The homogeneous equation

d'u Lu = ^a.(^)-_ =0 (10.4.23) dx'

has a fundamental system^ i.e., a set of p fundamental solutions {ui{x),... ,Upix)] corresponding to p sets of linearly independent initial conditions

uf(c) = ^j, 0,...,/?- 1, 7 = 1,.,.,/?. (10.4.24)

THEOREM. The solution u{x) for an arbitrary set of initial conditions u^^\c) = ^i, i =0,..., p—l, is a linear combination of the fundamental solutions [Ui{x),...,Up{x)). A fundamental system is not unique, but any fundamental system can be expressed as a linear combination of the fundamental solutions of any other fun- damental system. It might not be obvious that linear independence of the fundamen- tal solutions {Zi(jc),... ,z^(jc)} implies linear independence of the functions {u^{x),..., Up(x)}, which are the first components of {Zj,..., z^}. Assume that the solutions u^ix) and U2(x) are linearly dependent while f j and (2 ^^ linearly independent. But if M, and MJ ^^^ linearly dependent, Ui(x) = Pu2(x), and so u^l\x) = cu2\x), i = 0,...,/? — 1 (remember M^'^ = d^u/dx'). This, in turn, implies that z^(x) = ^^2^ ^^» setting jc = c, that ^1 = /5f 2' contradicting the fact that 5, and f 2 ^^^ linearly independent. Thus, the functions [u^ix),.,. Up{x)} must be linearly independent functions if the vectors {f j,..., f ^j are linearly indepen- dent vectors. EXAMPLE 10.4.1. The homogeneous equation

d^u = 0 (10.4.25) dx^ has Mj = 1 and M2 = ^ as fundamental solutions for the initial conditions {M(0) = 1, M'(0) = 0 and M(1) = 0, u\l) = 1}, respectively. Thus,

1 (10.4.26) 0 ^2 =

For the arbitrary initial conditions.

< = = «?l+^<2. (10.4.27)

the solution is

• • • M = aui + Pui =a + fix. (10.4.28) SOLUTION TO THE GENERAL INHOMOGENEOUS PROBLEM 431

EXAMPLE 10.4.2. The homogeneous equation

5? + " = ' (10.4.29) has Mj = sinjc and U2 = cosjc as fundamental solutions for the initial conditions {u(0) = 0, M'(0) = 1 and M(0) = 1, u\0) = 0}, respectively, and so the solution for the arbitrary initial conditions u{0) ~ a and M'(0) = J^ is

u{x) = jSsinjc + a COSJC. (10.4.30)

EXAMPLE 10.4.3. The homogeneous equation

d^u d^u du + M =0 (10.4.31) dx^ dx^ dx

has Ui{x) = ^ ^, M2(jc) = e^, and UT^{X) = xe^ as fundamental solutions for the initial conditions M(0) == 1, M'(0) = -1, u"{Q) = 1; M(0) = M'(0) = u"{^) = 1; and M(0) = 0, u'{0) = 1, M"(0) = 2, respectively. The solution for the general initial value problem w(0) = a, u'{Q) = )3, u"{Q) = y is

u{x) = \{oc -ip-h y)e-' + \0a + 2^ - y)e^ (10.4.32) • • • Let us next consider the inhomogeneous initial value problem:

^ d'u L.^iM-[~i = fix), (10.4.33)

M^'Hc) = C., / =0,...,/?- 1, (10.4.34)

where the functions f{x) and a fix) are continuous and ap(x) 7"^ 0 in the interval [a,b]. The corresponding first-order system is

dz = A(jc)z4-b(jc), (10.4.35) dx

where

fix) 0 b(jc) = (10.4.36)

We again integrate between c and x to obtain

il^K)z = ^-^ fbiy)dy. (10.4.37) 432 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

The solution to this equation is

z = (I - K)-^< + (I - K)-^ r b(3;) dy, (10.4.38)

We proved above that the integral operator (I —K)~^ exists. The solution is unique and is just the sum of the solution of the homogeneous equation with the inho- mogeneous initial conditions z(c) = f and of the solution of the inhomogeneous equation with the homogeneous initial conditions z(c) = 0. We have learned so far that the solution to the general initial value problem exists and is unique. In the case of the general boundary value problem, however, things are more complicated. A solution might exist but not be unique or a solu- tion might not even exist. The rest of this section will be devoted to the general boundary value problem. For the pth-order differential equation, the fully inhomogeneous boundary value problem we want to consider is

Lu(x) = /(jc), a

with balanced boundary conditions

: (10.4.40)

where not all y, are 0. The differential operator is defined in Eq. (10.3.2) and the functional B^u, i = 1,..., p, are defined in Eq. (10.3.1). The difference between the operators defined in Eq. (10.3.1) and in Eq. (10.4.1) is that the former requires that ai(x) e C^'\a, ^), / = 0,..., p. Suppose u(x) is a solution to the homogeneous equation

LM(JC) = 0, a

obeying the inhomogeneous boundary conditions

5.M = K, / = l,...,/7. (10.4.42)

Suppose also that u(x) is a solution to the inhomogeneous equation

Lu(x) = fix), a

obeying the homogeneous boundary conditions

BiU=0, i = \,...,p. (10.4.44)

Clearly, the function u(x) = u(x) -^ u(x) obeys the equation Lu = f and the boundary conditions B^u = y^. Thus, the problem of solving the fully inhomo- geneous problem defined by Eqs. (10.4.39) and (10.4.40) can be divided into the problem of solving the homogeneous equation for inhomogeneous boundary con- ditions defined by Eqs. (10.4.41) and (10.4.42) and the problem of solving the SOLUTION TO THE GENERAL INHOMOGENEOUS PROBLEM 43 3

inhomogeneous equation for the homogeneous boundary conditions defined by Eqs. (10.4.43) and (10.4.44). This is a special case of the superposition principle, which states

SUPERPOSITION PRINCIPLE. Ifu is the solution to the problem Lu = /, B^u — y,, i — 1,..., p, and U is the solution to the problem LU = F, B^U — F,, / = 1,..., p, then c^u -i- CjU is the solution to the problem Lw = Cj/ + C2F, BiW - C^Yy + C2r2, / = 1, . . . , /7. The principle follows from the linearity of L and B-u. The superposition principle leads to the theorem:

THEOREM. If the homogeneous problem Lu =0, B,M = 0, i = 1,..., /?, has only the trivial solution M = 0, then the fully inhomogeneous problem has at most one solution. The proof of the theorem is easy. Assume that Wj and M2 are solutions to the fully inhomogeneous problem. Then U2 — u^ obeys the equation L(M2 ~^i) — ^ ^"^ the boundary conditions B^{u2 — Wi) =0, implying that U2 — U\ = 0 or M2 = Wj. Let us examine a few simple examples before continuing with the general theory. •••I EXAMPLE 10.4.4. Consider the inhomogeneous equation

-^ = fix), u(0) = K,, u(\) = K2. (10.4.45)

The general solution to the homogeneous equation —d^u/dx^ = 0 is M = a + fix, and from the boundary conditions M(0) = y, and M(1) = 72 it follows that

« = yi + (y2-ri)^- (10.4.46) We will see in Section 10.5 that the solution to the inhomogeneous equation -d^a/dx^ = fix), M(0) = w(l) = 0 is

u= f [r^{x - y)y{\ - x) + rj{y - x)x{l - y)]f{y) dy, (10,4.47)

and, thus, the solution to Eq. (10.4.45) is

M = Ki + (K2 - y\)x A (10.4.48) + / [^{x~y)y{\-x)^n{y-x)x{\-y)]f(y)dy.

Since the only solution to —d^u/dx^ = 0 with w(0) = M(1) = 0 is M = 0, it III follows that Eq. (10.4.48) is the unique solution to Eq. (10.4.45). ••• EXAMPLE 10.4.5. Consider the inhomogeneous equation

-^ = 2, u\0) = 1, u\l) = -1. (10.4.49) dx^ The general solution is

u=a + Px- x^, (10.4.50) 434 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

and the boundary conditions yield P = I, and so the solution is

M =:Q:4-JC-JC^ (10.4.51)

III where a is an arbitrary constant. In this case, the solution exists but is not unique, •••i EXAMPLE 10.4.6. Consider the inhomogeneous equation

-^ = fix). u\0) = K,, u\\) = -K2. (10.4.52)

where the function f(x) is such that

f f(x)dx^y,-{-y,, (10.4.53)

Integrating Eq. (10.4.52) once yields

-u\l) + M'(0) = f f{x)dx, (10.4.54)

which contradicts Eq. (10.4.53) since —M'(1) + M'(0) = 72 + Yv Thus, no solution exists for any f{x) obeying Eq. (10.4.53). To understand the general solvability conditions for the fully inhomogeneous problem, we need to return to Eq. (10.3.18), which is

/ [v*Lu-{Lhyu]dx (10.4.55)

The domain I>^ for the inhomogeneous problem is

Vi^ = {u, Lu e n. BiU = K, / = 1,..., /7}. (10.4.56)

The domain I^^t of the operator iJ of the homogeneous adjoint problem is

Vjj = {v, L+v e n, BJV = 0, / = 1,..., p}. (10.4.57)

Thus, if we restrict ourselves to vectors v e I>^t and u e X>^, Eq. (10.4.56) becomes

{V, Lu) - (Vy, u) = j:n(^Ip+i-.-'^)*- (»0.4.58)

We have now developed the ideas needed to state the alternative theorem for boundary value problems. Consider the three problems:

Lu = /, a < X < b, BiU = y., I = 1,.. .,P, (10.4.59)

Lu = 0, a < X < b, BiU = 0, I = 1,.. .,p, (10.4.60)

L^v = 0, a < X < b, BJV = 0, I = 1,.. '.p. (10.4.61) SOLUTION TO THE GENERAL INHOMOGENEOUS PROBLEM 435

where Lu and B^u are defined by Eqs. (10.3.1) and (10.3.2). The following alter- native theorem holds:

ALTERNATIVE THEOREM, (a) If the homogeneous problem, Eq. (10.4.60), possesses only the trivial solution, so does the homogeneous adjoint problem, Eq. (10.4.61). (b) If the homogeneous problem, Eq. (10.4.60), has k linearly independent solutions, then the homogeneous adjoint problem, Eq, (10.4.61), has k linearly independent solutions. (c) If the homogeneous problem, Eq. (10.4.60), has k linearly independent solutions uj,, / = 1,..., ^, then the fully inhomogeneous problem, Eq. (10.4.59), has a solution if and only if

«^> = E yj{BUi-Ay^ / = !,...,/:, (10.4.62) y=i

where the y^, i = \,... ,k, are the linearly independent solutions to the homoge- neous adjoint problem, Eq. (10.4.61). If the solvability conditions hold, the solution to the fully homogeneous problem will be of the form

k u = Up + i:qu|„ (10.4.63)

where Up is a particular solution to Eq. (10.4.59); uj,, / = I,... ,k, are the lin- early independent solutions to the homogeneous problem, Eq. (10.4.60); and the constants Ci, i = I,... ,k, are arbitrary.

The necessity of the conditions in Eq. (10.4.62) follows easily from Eq. (10.4.58): if u is a solution to Lu = f and if v is a solution to V\ = 0, then Eq. (10.4.58) reduces to Eq. (10.4.62). The proof that the conditions at Eq. (10.4.62) are sufficient to ensure solvability of the inhomogeneous problem is significantly more difficult and will not be given here. A more advanced textbook should be consulted for the proof. When the solvability conditions are met, the solution to the fully inhomo- geneous problem, Eq. (10.4.59), can be constructed in the following way. Let {MJ, ..., Up] denote a fundamental system of Lu = 0. Let MJ denote the solution to the initial value problem

Luj = /, u^'\x = a) = 0, i=:Q,...,p-l, a

The set {u^,..., u^} exists according to the theorem proved above. MJ exists and can be computed from a Volterra equation as indicated by Eqs. (9.4.59)-(9.4.63). The function

p u(x) = Y^ajUjix) -t- Uiix) (10.4.65) 436 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

is, therefore, a solution to LM = / for arbitrary constants {a,,..., oc^}. If we can choose these constants to satisfy iB^w = y,, / = 1,..., p, i.e.,

p Yl(^iUj)aj = K, - B,u,, / = 1,..., /7, (10.4.66)

then Eq. (10.4.65) is the solution to the fully inhomogeneous problem, Eq. (10.4.59). Let us illustrate this method of solving LM = /, B,w = y,, i = 1,..., p, with the simple examples given above for the equation

^=f(x), 0

Mj(jc) = - / (jc ~t)f(t)dt. (10.4.68)

Consider again the following examples. EXAMPLE 10.4.7. Consider again the inhomogeneous equation

d^u ^^2 = /(^), w(0) = Ki, u{\) = Y^. (10.4.69)

A linearly independent set of fundamental solutions to Lw = 0 (i.e., a fundamental system of Lw = 0) is

Wi(jc) = 1, u^ix) = X, (10.4.70)

(An equally acceptable system would be MJ = \+x and M2 = 1 —x.) The solution to Eq. (10.4.69) is then

M = QflMj +Qf2W2 4-Mi, (10.4.71)

where M(0) = y^ and u{\) = Y2, or

(10.4.72) «i+«2- f (l-r)/(0^^ = K2-

Solving for a, and a2» we obtain

M(X)=:)/I4-(K2-KI)-^-+--^ i (l-t)f(t)dt (10.4.73) - f\x-t)f(t)dt. Jo This result (when appropriately rearranged) is in agreement with the solution given • • • earlier by Eq. (10.4.48). SOLUTION TO THE GENERAL INHOMOGENEOUS PROBLEM 437

EXAMPLE 10.4.8. Consider again the inhomogeneous equation

—-"d?. = 2, M'(0) = 1, M'(1) = -1. (10.4.74) In this case,

MI = -JC^ (10.4.75)

and again we choose MJ = 1 and M2 = JC as the fundamental system of Lu = 0. With

u = «!«! -f a2U2 + Mj, (10.4.76)

the boundary conditions require

Oil = 1 (10.4.77) a2-2 = -1 or the solution to Eq. (10.4.74) is

for arbitrary a^ in agreement with the solution given by Eq. (10.4.49). EXAMPLE 10.4.9. Consider again the inhomogeneous equation dhi ~^ji = /(^)' "'(0) = Ki, u(l) = ^72, (10.4.78) dx where

/ f{x)dx^y,+y2. (10.4.79)

Then

u{x) =a^+a2X- f (x- t)f{t) dt (10.4.80) Jo and

u\x) = ^2 - r /(O dt, (10.4.81)

We seek a^ and ^2 such that

^1 (10.4.82) «2- / fO)dt = -K2.

which requires that

f fit)dt = y, + y2. (10.4.83)

Since we consider now f{x) such that the inequality at Eq. (10.4.79) holds, there I • • is no solution to this example. 438 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

••• EXERCISE 10.4.1. Find the adjoint operators for the operators defined in Examples 10.4.7-10.4.9 and demonstrate that the alternative theorem is obeyed in • • • each example. Let us close this section by showing that the solution to the general second- order problem d^u du Lu — a2 3T + ^1 T~ "^ ^0^ = /' a < X < h, (10.4.84)

with 5,M = y,, / = 1,..., /?, and a2(x) 7^ 0, can be constructed from a funda- mental system M^ and U2, of Lu = 0. We define the determinant

Ui(x) Ujix) W(u^,U2,x) - = Ui(x)u2{x) — U2{x)u\(x) (10.4.85) Mj(jc) Ujix)

(called the ). If u^ and U2 is a fundamental set, then Lu^ — Lu2 — 0. From this it follows that 0 = UyLu2 — U2LU1 = a2iuiU2 — U2u'[) -{- ai{uiU2 — W2M1) (10.4.86)

dx ^2 Thus,

iyW = W(xo)exp(-£[^].,), (10.4.87)

where W(XQ) is an arbitrary point in the interval [a,b]. From this it follows that W{x) is either identically 0 or nowhere 0 in [a, b]. Suppose W{x) is identically 0. Then

u^(x)u2(x) - U2(x)u[(x) = 0, (10.4,88)

or dlnui/dx — d\nu2/dx — 0, or u^{x) = Au2{x), where A is an arbitrary constant. This is a contradiction since MJ and M2 f^^*^ ^ fundamental system and so are linearly independent. Thus, W{x) is nowhere 0 in the interval [a, fo]. We assert that Uy{x), the solution to Eq. (10.4.64) for this second-order case, is

Ja a2{y)[u^{y)u2{y) - U2{y)uy{y)\

Since a2{y) i=- 0 and ^{u^,U2\y) ^ 0, the function M,(JC) G C^^Ha,^), and by direct differentiation we can show that Lu^ = / and Ui{a) = u\d) — 0. The general solution to the second-order case, when it exists, can thus be expressed as u{x) — aiU^ix) + a2U2ix) u,{x)u2{y) ~ U2{x)u,{y) ^^ ^^ (10.4.90) Jaf a2(y)[ui(yW2{y) - U2(y)u[(y)] with Of J and ^2 determined by the conditions B^u = y^ and B2U = )/2- GREEN'S FUNCTION: INVERSE OF A DIFFERENTIAL OPERATOR 439

••• EXAMPLE 10.4.10. Consider the problem

-—-l-TT^M =cos7rx, 0 < X < jr, (10.4.91)

with M(0) = M(1) = 0. The adjoint operator L^ equals L since L =^ V and Vj = Viu The solution to the homogeneous adjoint equation L^u = 0 is u = sin Tfjc. Since

(v, f) = / sin TTJc cosTTJcdjc = 0, (10.4.92)

it follows that Eq. (10.4.91) has a solution. A fundamental system for Lu = 0 is Wj = siuTT^ and M2 = cos TTX. WJ satisfies the initial conditions M,(0) = 0, u\{0) = —n and M2 satisfies the initial conditions M2(^) = 1* ^2(0) = 0. Insertion of M, and M2 into Eq. (10.4.90) yields 1 c^ u{x) = «! sin Ttx + (X2 cos nx -\— sin nx j cos^ ny dy J , "" " (10.4.93) cosTTjc / sin^rj co^nydy.

n JQ The boundary conditions w(0) = M(1) = 0 yield 0^2 = 0- Thus, 1 r^ u(x) — ay sin nx ^— sin nx / cos^ ny dy , ^, ^' (10.4.94) COS nx I sin ny cos Try t/y, TT ^0 • • • where ar, is arbitrary. Thus, u = u^-^-a^u^ as required by the alternative theorem. 10.5. GREEN'S FUNCTION: INVERSE OF A DIFFERENTIAL OPERATOR

Let us consider again the equation

Lu{x) = f{x), a

where the interval [a, b] is finite and L is a pth-order differential expression defined by

.dPuix) dP''u{x) , Lu{x) = a^{x)-—~ + «p_i , + • • • ax ax (10.5.2)

We assume that ap{x) i^ 0 and that a,(A:) € C^'H^» ^)» ' = 0,..., /?, where, as stated before, C^'^(a, fc) is the space of functions whose /th derivative is continuous in [a, h\ As pointed out in previous sections, in order to associate an operator L with the differential expression L, we must specify its domain P^. For example, we can say 440 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

that u G X>L if " ^"d Lu € L^ia, b). Usually, p independent boundary conditions are also specified for a pth-order differential equation. This need not be done, but if p boundary conditions are not stated, a 77th-order differential equation will certainly not have a unique solution. Of course, if the homogeneous equation Lu = 0 has a nontrivial solution, then the inhomogeneous equation will not have a unique solution even with p independent boundary conditions. In this section we consider p linear, homogeneous boundary conditions of the form

: (10.5.3)

where u^'\^), ^ = a,b, denotes the /th derivative of u {d^u{^)/d^^) evaluated at ^ = a and b. We assume that the row vectors

pj = [a,i,... ,a,. 2^, / = 1,..., /?, (10.5.4)

are linearly independent. We pointed out earlier that this assumption implies that the rank of the matrix [/>!,..., p^] is p and ensures that Eq. (10.5.3) defines p linearly independent boundary conditions; thus, the problem is a balanced one. In the special cases a^j = 0, 7 = 1,..., /?, or or^y = 0, 7 = p -f 1,..., 2/7, Eqs. (10.5.1) and (10.5.3) define the initial value problem (IVP). We know from Section 10.4 that, for the functions such that a^(jc) ^ 0 and a,(x) G C^{a,b), i = 1,..., /7, the IVP has a unique solution. On the other hand, if some of the a,^, 1 5 7 :< P, and some of the a,^, p + I < j < 2/?, are nonzero, Eqs. (10.5.1) and (10.5.3) define a boundary value problem (BVP). The BVP may or may not have a unique solution as stated in the alternative theorem in Section 10.4. For example, consider the equation

1, 0<;c

The general solution is

u{x) =a-{-px--j, (10.5.6)

With the boundary conditions u'{0) = u\l) =0, we find

0 = ^ and 0 = ^-1, (10.5.7)

which is clearly impossible. Thus, this simple BVP has no solution. Since the homogeneous equation Lu = 0 has the solution M = a for this BVP, we already knew the problem would not have a unique solution. It turns out that this problem has no solution, which can be confirmed from the general solvability conditions for a BVP—given in Section 10.4. GREEN'S FUNCTION: INVERSE OF A DIFFERENTIAL OPERATOR 44 I

We will now restrict our attention to problems in which the homogeneous equations defined by LM = 0 and J5,M = 0, / = 1,..., /?, have only the trivial solution II = 0. For this class of problems, the inverse of the differential operator L exists. In summary, the operator problem in this section becomes

Lu = f, where Vj^ = {u, Lu G H; B,M = 0, / = 1,..., /?}; (10.5.8)

the differential expression for L is defined by Eq. (10.5.2); and the boundary con- ditions are defined by Eq. (10.5.3). One choice of the Hilbert space V, is Ciici, b). However, since a^,{x) never equals 0 on the interval [a, b], it is either strictly posi- tive or strictly negative. An alternative Hilbert space is C2{a, b\ ±l/tz^(jc)), where the expression +l/a^(x) is the weighting function if ^^ is positive and —\/ap(x) is the weighting function if a^ix) is negative. If u, v G Ciia, b\ l/a^(x)), then the inner product (u, v> is, by definition.

(u,v)= /' ^ ' y dx, (10.5.9)

Let us now consider the simplest of linear, inhomogeneous differential equations du(x) —A_Z :=, fi^x), M(0) = 0, 0 < jc < 1. (10.5.10) dx In operator form, the equation becomes Lu = f. (10.5.11)

If L~* is defined as the inverse of L, then L~'L = LL~^ = I, where I is the identity operator. Multiplication of Eq. (10.5.11) by L"^ yields the solution

u = L-'f. (10.5.12)

This works, of course, only if L~^ exists and is useful only if we can construct L"'. Integration of Eq. (10.5.10) yields

ii(x)= f n{x-y)f{y)dy. (10.5.13)

where ;; is the step function defined by

.W=j'' ^'^'' (10.5.14) II, X > 0. In order to accommodate operator theory, the simpler solution u{x) = f^ f{y) dy has been rewritten in the form given in Eq. (10.5.13). In operator form, Eq. (10.5.13) becomes

u = Gf, (10.5.15)

where G is an integral operator with the kernel gix,y) = r]{x,y). The kernel of the operator G is called Green's function. Since Lu = LGf = f for arbitrary f. 442 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

it follows that LG = I. Since I is the identity operator in a function space, its kernel is 8(x — y), where 8 is the Dirac delta function. Thus, for the operator L, it follows that Green's function must satisfy the equation

Lg(x, y) = ^^^ = 8(x - y). (10.5.16) ax It is well known from functional analysis that the derivative of a step function is the Dirac delta function, i.e., dr](x — y)/dx = 8(x — y). To see that this is true, consider the basic properties of a Dirac delta function: 8{x) = 0 if x ^ 0 and 8(x) —>• oo as jc -> 0 such that

J a 10, Otherwise.

From this property it follows that

f8i^-y)d^ = r ""^y' (10.5.18)

or f 8{^-y)dH^r){x-y), (10.5.19)

Thus, since g{x, y) = r\{x — y), it follows that dg{x, y)ldx = 8{x — y), as required by the condition that LG = I. Let us next examine the slightly more complicated differential equation d^u(x) Lu = ~d?JY- = Z^-^)' "(^) = ^(1) = ^- (10.5.20) The operator L in £2(^' 0 corresponding to this equation has the domain

D^ = {M(0) = M(l) = 0, u, Lu € £2(0. !)}• (10.5.21)

Formal integration of Lu — f yields

u{x) ^a + bx- f d^ [ f{y)dy, (10.5.22) Jo JQ

where a and b are constants. By interchange of variables of integration, we obtain

r d^ I f(y) dy = f f(y) dy f d^ = f{x - y)f{y) dy, (10.5.23) Jo Jo 'fO Jy Jo From this and the boundary conditions for L, it follows that fl = 0 and b = - f {I- y)f{y) dy, (10.5.24) Jo and so

u(x) = / [-n(x -yKx- y) 4-,Jc(l - y)]f(y) dy, (10.5.25) GREEN'S FUNCTION: INVERSE OF A DIFFERENTIAL OPERATOR 443

The kernel of Eq. (10.5.25) possesses a symmetry that, though important to us in relation to the adjoint Green's function, is not apparent as written. Equation (10.5.25) can be rewritten in the form

u{x) = f[-ix - y)-^x(\ - y)]fiy)dy -f f xH - y)f{y)dy Jo Jx = f y(l-x)f(y)dy+ I x(l-y)f{y)dy (10.5.26) JQ JX

= i b{x - y)y{\ -x) + viy - x)x{l - y)]f{y) dy. JQ Thus, we have found that the solution to Eq. (10.5.20) is u = Gf, where Green's function is

gix. y) = riix - y)y{\ - x)-\- rj{y - x)x{\ - y). (10.5.27)

The condition LG = I requires that Green's function obey Lg{x,y) = 8{x — y). Differentiation of Eq. (10.5.27) with respect to x yields

——^— = Six - y)y{\ -x)- rj(x - y)y - 8{y - x)x{l - y) dx + 7](y-x){\-y) (10.5.28) = -r](x - y)y + T](y - x)(l - y).

We arrived at this result by noting that 8(x) is an even function, i.e., 8(x — y) = 8(y — x), and that 8(x — y)y(l — x) = 8ix — y)x{l — y) since 8{x — y) is nonzero only at jc = y. Differentiating Eq. (10.5.28) with respect to jc, we find

d^oix v) d^ " ^^^ " ^^^ "^ ^^^ "" ""^^^ " ^^ " ^^^ ~ ^^' (10.5.29) as required for Green's function. Another property of the kernel g{x,y) is that

g(0, y) = g(l, y) = 0 for 0 < y < 1; (10.5.30)

i.e.. Green's function satisfies the boundary condition for its corresponding opera- tor L. The simple examples just analyzed point out some general properties of Green's {functions. In terms of Green's function, the solution to Lu = f is u{x) = /^ g(jc, y)f{y)dy for arbitrary functions /(y). From the boundary condi- tions for L, it follows that

\\B,g)f{y)dy = 0, / = 1,..., p. (10.5.31) J a From this one concludes (since we can choose /(y) = Big) that Green's function g(jc, y) satisfies, with respect to the variable jc, the boundary conditions of L, i.e.,

B.^ = 0, / = l,...,/7. (10.5.32) 444 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

Also, since LG = I, it follows that for the /7th-order operator g(x,y) satisfies the differential equation

l^diix)——— = 8(x - y). (10.5.33) dx' /=o If g{x, y) obeys Eq. (10.5.33), then it has to have p — 2 continuous derivatives at X = y. Otherwise, if the derivative d^~^g{x, y) jdx^~^ were discontinuous at X — V, then the derivative dP~^g(x, y)/dx^~^ would be a Dirac delta function of X — y and the derivative d^g{x, y)/dx^ would be the derivative of a Dirac delta function of JC — j, instead of just the delta function as in Eq. (10.5.33). Given that the first p — 2 derivatives of g(x, y) are continuous, division of Eq. (10.5.33) by ap(x) and integration over x from y — € to y + € (where € is an infinitesimal positive number), we obtain

d'^'gix^y) d'-'gix.y) dxP-^ dxP-^ x=y+€ (10.5.34) p~\ y-^^ ai{x)i d'g{x,y)dx + cipix) dx' «p(y)

In the limit as 6 -^ 0^, the integrals in Eq. (10.5.34) vanish because the quantities {ai/Gp) are continuous, and so

^+^ ai(x)d^g(x,y)dx a,(y) y+' d'g{x,y)dy lim r ^ lim = 0 (10.5.35) (X) dx' /, Jx' for / = 0 to p — 1. Thus, Eq. (10.5.34), in the limit as e -> 0^, yields the following jump condition on g{x, y):

d^-^g(x,y) dP-'g{x,y) 1 (10.5.36) dxP-^ dxP-^ ^piy) Let us now examine Green's functions for some special cases.

10.5.1. The Initial Value Problem Suppose the coefficients af,y, y = /? -f 1,..., 2/?, in Eq. (10.5.3) are 0. Then, since the rank of [/)i,..., p^] = [a^y] is p, we find

u(a) = u\a) = ... = u^P-'\a) = 0 (10.5.37)

for the initial value problem with homogeneous boundary conditions. Consider the solution Uyix) to the following pth-order IVP problem (called the causal funda- mental solution in some texts):

LUy{x) = 0; (10.5.38) w,(j) = i^l'\y) = '" = uf-'\y) = 0, uf-'\y) = ^p(y) GREEN'S FUNCTION: INVERSE OF A DIFFERENTIAL OPERATOR 445

Note that the boundary conditions in Eq. (10.5.38) are defined at the point x = y. The boundary condition for u^yF^^^y) was purposely chosen for reasons that will soon become clear. It turns out the function u{x) defined by

u(x) ^ fu,(x)f(y)dy = f vix - y)u^(x)f(y)dy (10.5.39)

is a solution to Lu = f with the initial conditions of Eq. (10.5.37). To see this, we simply apply L to Eq. (10.5.39) and use the initial conditions to obtain

LM = £LM,(x)/(y)Jy+^/jc)["£^M/x)j f(x)==f(x). (10.5 40) x=y Thus, Green's function for the IVP in this section is

g(x,y) = r](x-~y)u,{x\ (10.5.41)

where Uy{x) is the solution to Eq. (10.5.38). From this we can state the following theorem: THEOREM. The inverse operator G of the operator L for the IVP problem is a Volterra integral operator of the first kind. EXAMPLE 10.5.1. Find Green's function for the problem

2+M=Jc, w(0) = M'(0) = 0. (10.5.42) dx The formal solution to d'^uJx) -f M,(jc)=0 (10.5.43) dx^ is

Uy{x) = a sin(jc + 8)-\- p cos(jc + y). (10.5.44)

With the boundary conditions duJx) I uJy) = 0 and ^ =1, (10.5.45) ^ dx \x=y we find

u (x) = sin(jc - y). (10.5.46)

Thus,

g(x, y) = r](x - y) sin(x - y) (10.5.47)

and the solution to Eq. (10.5.42) is

u(x) = I sin(jK: — y)ydy Jo (10.5.48) • • • = jc — sinjc. 446 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

EXERCISE 10.5.1. Show by integration by parts that GLu = u for the oper- ators G and L defined in Example 10.5.1. Since LG = I by construction, this completes the proof that GL = LG = I, and so the inverse L"* of L is the integral operator G, In the remainder of this section, we will restrict our attention to the BVP for second-order equations such that a2(x) ^ 0, ai(x) are continuous, and a < x

10.5.2. The Boundary Value Problem for Unmixed Boundary Conditions In this case,

(10.5.49) B2U = a2^u{b) -)r a24u{b) = 0,

and Green's function can be constructed from the solution of two IVPs, i.e., from a fundamental system of LM = 0. Consider first the function u^ix) satisfying the boundary condition B^Ui =0 and the differential equation

Lui = a2{x)---Y -\-ai{x)-~- h«o(-^)Wi = 0, a < X < y. (10.5.50) ax ax We are free to choose u^ia) = —aj2 ^^^ u\{a) = a^ to satisfy B^u = 0. The solution to this IVP exists and is nontrivial if at least one of the two coefficients otii and 0^12 are nonzero—in order to ensure that the rank of the matrix

0 0 (10.5.51) 0 0 ^23

is 2. Similarly, the solution to the IVP

d^Uo dU'f Lu2 = ^2M-rY +^i(-^)~7 f-«o(-^)"2 = 0^ y < X < b. (10.5.52) dx dx exists when U2ib) = —0^24 and U2(b) = Qf23, ensuring that ^2^2 = ^- Again, at least one of the quantities «23 and a24 must be nonzero. If we define the kernel

|ciMi(x), a < X < y, g(x,y) = (10.5.53) |c2M2(-^)' y < X < b,

where Cj and C2 are constants, it follows that

B,g = B2g = 0 (10.5.54)

and

Lg{x, y) = 0 if A: ^ y. (10.5.55) GREEN'S FUNCTION: INVERSE OF A DIFFERENTIAL OPERATOR 447

According to our arguments in Eqs. (10.5.32), (10.5.33), and (10.5.36), g(jc, 3^) will be Green's function for the second-order problem if Eqs. (10.5.54) and (10.5.55) hold, if g(jc, y) is continuous at x = y, and if

dg(x,y) dg(x,y) 1 (10.5.56) dx dx ^liy) To enforce the continuity and jump conditions, we require that Cy and C2 be chosen to satisfy the equations

-CiU^(y) + C2U2(y) = 0 1 (10.5.57) -Ciu[(y) + C2U2{y) = aiy) Solving these equations for Cy and C2, we find

ih(y) i^iiy) Ci = and C2 = (10.5.58) a2{y)W{u^,U2,y) ^ diiyWi^u^h^y) where W{ui,U2,y) is the Wronskian determinant defined in Eq. (10.4.85)

uiiy) U2{y) W(u^,U2,y) = (10.5.59) ^\iy) wi(y) We proved in Section 10.4 that the Wronskian is nowhere 0. We have, thus, found that, for the second-order differential equation with unmixed boundary con- ditions. Green's function is given by

U2(y)u^ix) a < X < y, ci2{y)W{u^,U2,y)' gix. y) = (10.5.60) Ui(y)u2(x) , y < X < b, a2{y)W(u^,U2,y) where u^ix) and U2(x) are solutions of the initial value problems posed at Eqs. (10.5.50) and (10.5.52). EXAMPLE 10.5.2. Find Green's function for the problem

.d^u du LM = -(1+JC)^—^+4(l+jc)—+2M = /(JC), 0

A solution to Liii = 0 with MI(0) = 0 and M',(0) = 1 is X (10.5.63)

A solution to Luj = 0 with H2(1) = 1 and M'2(1) = —1 is 4 M, = (10.5.64) (1+JC) 2- 448 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

These fundamental solutions satisfy Lwj =0, B^u^ = 0 and Lu2 = 0, B2U2 = 0. Thus, Green's function for this example is (from Eq. (10.5.60))

g(x. y) = < (10.5.65) y , _ y < X

EXAMPLE 10.5.3. Find Green's function for the problem

d^u du Lu — —^ -f 3 \-2u — fix), -1 < jc < 1, (10.5.66) dx dx u{-\)-2u\~l)=0 (10.5.67) u{l) + u{l)=0 and solve the inhomogeneous problem for the cases

(a) fix) = X (10,5.68)

and

(b) /(x) = exp(-jc^). (10.5.69)

The general solution to LM = 0 for Eq. (10.5.66) is

u =a exp(—2x) + p exp(—x). (10.5.70)

To obtain MJ, set M(-1) = 2 and u'i-l) = 1. This leads to a = —3/exp(2) and P = 5/e, and so

u^ix) = -3exp(-2(jc -f 1)) -|-5exp(-(jc 4-1)). (10.5.71)

Similarly, to find U2, set M(1) = 1 and M'(1) = —1 to obtain a = 0, jS = e, giving

U2ix) = exp(-(jc - 1)). (10.5.72)

From Mj and M2' we find

W(M,, U2, y) = -3 exp(-(3j 4-1)) (10.5.73)

and

[-^[-3exp{-2(x + l)) +5exp(-(x + l))]exp(2(>; + l)), -\

The solution to the inhomogeneous equation is then

«(^) = / g(x, y)f(y) dy, (10.5.75)

or, upon substitution of Eq. (10.5.74),

uix) = --exp{-(x-D) f [-3expiy-l)+5exp{2y)jf{y)dy

- ^ [-3exp(-2(;c +1)) +5exp(-(A: +1))] (10.5.76)

X l\xp{2{y + \))f{y)dy.

For case (a), f{y) = y, and direct substitution yields

u{x) = - —[-3exp(~2(l -f jc)) +5exp(-(l +Jc))]

X [exp(4) - exp(2(l + JC))(2JC - 1)1 (10.5.77) — — exp(l — ;c)| —9exp(—2) 4- exp(jc — 1)

X [l2-5exp(l-hJc)-t-2A:(5exp(l+x)~6)l|.

For case (b), f{y) = exp(—>'^), and Eq. (10.5.76) does not yield an analytical solu- tion. We can, however, solve for a solution by utilizing the numerical techniques of Chapter 3 for integration. Both solutions are plotted in Fig. 10.5.1 for the inter- val —1 < jc < 1. (A Mathematica program has been provided in the Appendix for • • • this example.)

III! -3.5

-6.0 ^^y^ J

u{x) -8.5 /(a;) ^xy^ X J

-11.0

1Q K III -1.0 -0.5 0.0 0.5 1.0

FIGURE 10.5.1 450 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

••• EXERCISE 10.5.2. Calculate and plot the solution u(x) to Eqs. (10.5.66) and III (10.5.67) when /(;c) = sinnx, f{x) = 1/(2 - jc), and f{x) = ^/x + 1.

10.5.3. The Boundary Value Problem for General Boundary Conditions In this case, the boundary conditions are

BiU = a^uia) + a,oM'(a) -\- at^u(b) + auu\b) = 0 (10.5.78) B2I1 = a2iu(a) + a22u\a) + a2^uib) -j- a2/^u\h) = 0,

with the rank of the matrix

(10.5.79) *22 "23 "24

equaling 2. The differential expression is again

Lu =«2(-^)T^ +^I(-^)3—\-aQ{x)u, a < x < h, (10.5.80)

where the functions a,(x) are continuous, a2{x) ^ 0, and [a, b] is a finite interval. If we wish to define the adjoint differential expression L^u, we require that a,(jc) belong to C^'\a, b), i.e., that they have continuous /th-order derivatives. Let Ml be a nontrivial solution to Lwj =0 with B^UY = 0 and let M2 be a nontrivial solution to Lu2 = 0 with ^3^2 = 0. Solutions to the second-order differ- ential equation satisfying only one homogeneous boundary condition always exist and can always be computed from linear combinations of fundamental solutions. Recall also that the causal fundamental solution, M^(JC), defined by Eq. (10.5.38) exists and satisfies the continuity and jump conditions for Green's function. Thus, if we set

g{x, y) = r](x — y)Uy(x) + aui(x) 4- Pu2(x), (10.5.81)

then Lg(x,y) = 8(x — y) and g(x,y) is the desired Green's function if a and p are chosen so that g{x,y) satisfies the boundary conditions Big = B2g = 0, or

a^^uAb) -f- a^j^u'Sb) + ciBxU2 = 0 (10.5.82) a2iUy{b) 4- a2^u'y{b) -f 01B2U1 = 0.

The quantities Uy{a) and u'^id) are missing from these equations because Tfia ~ y) = Q fox a < y

« = -[a23Uy{b)+a24u' {b)]/B2Ui ; (10.5.83) P = -[ai,Uy{b)+a,yy(b)]/BiU2. GREEN'S FUNCTION: INVERSE OF A DIFFERENTIAL OPERATOR 45 I

••• EXAMPLE 10.5.4. Use the method just described to find Green's function gix,y) for the differential expression

Lu = —j-^, (10.5.84) dx with

B.u = M(0) = 0 (10.5.85) B2U = M(1) = 0.

We have already clearly determined g{x,y) directly (Eq. (10.5.27)) and so we will only use the problem to illustrate the procedure for incorporating the general boundary conditions. The solution to

-TT"yW=^' w,(x) =0, —p-^ =—- (10.5.86) aX \x=y dx \x=y ^2\y) is

u^(x) = -{x-y), (10.5.87)

Furthermore, the solution to d^ujdx^ = 0 with MI(0) = 0 is ^^(JC) = CiX and to d^U2/dx^ = 0 with U2{\) = 0 is U2{x) = C2(l — x), where Cj and C2 are arbitrary constants. From these results, it follows that a — —Uy{\)jc^ = (1 — j)/c, and jg = 0. Thus, Eq. (10.5.81) becomes

g{x,y) = -r){x-y){x~y)^x{\-y) (10.5.88) = ni^ - y)yi^ -x)-\- r]{y - x)x{\ - y), which is in agreement with Green's function (Eq. (10.5.27)) found by direct • • • integration. Let us end this section with a few comments on Green's function for the adjoint L^ of the operator L. Remember that the boundary conditions for L^ are B/M = 0, M = 1,..., /7. The functional BJV can be derived from the functional BfU and the condition 7(u, v)|J = 0 for u € P^^^. If, however, we have Green's function g(x,y) for L, we do not have to find the boundary conditions BJV = 0, i = 1,..., p, to find Green's function g^(x, y) for L^ This is because of the theorem:

THEOREM. If g(x,y) is Green's function for L, then

gHx.y) = g*iy,x), (10.5.89)

where g\x, y) is Green's function for L^ The proof of this theorem is straightforward. By construction of Green's func- tion, we have

Lg(x. y) = 8{x - y) and L^gHx, y) = 8(x - y), (10.5.90) 452 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

But from the property of the adjoint, (v, Lu) = (L^v, u), it follows that

flsHx, y)YLgix, z) dx = flL'gHx, y)Yg{x, z) dx. (10.5.91)

Together, Eqs. (10.5.90) and (10.5.91) imply

f [gHx,y)]*Hx-z)dx= f S{x-y)g(x,z)dx, (10.5.92)

or

gHz.y) = g%y,z), (10.5.93)

in agreement with Eq. (10.5.89). From the above theorem, it follows that the kernel of the adjoint G^ of the integral operator G is, as expected. Green's function of the adjoint L^ of the oper- ator L. Given Green's function g(x,y) for L, the solution to the equation

L^ = h (10.5.94)

is simply

v = G^h or v(x)= g*(y,x)h{y)dy (10.5.95)

for the boundary conditions BJV = 0, i = I,.,., p. What is nice is that we do not even have to determine the boundary conditions of L^ to find its inverse G^ or to solve Eq. (10.5.94). ••• EXERCISE 10.5.3. Use the result in Eq. (10.5.93) to find the boundary con- • I • ditions for the adjoint L^ of the operator L defined in Example 10.5.3. f 0.6. SPECTRAL THEORY OF DIFFERENTIAL OPERATORS: SOME GENERAL PROPERTIES

The eigenproblem for differential operators is, of course, to find values of the scalar X for which the equation

L0 = X0, 0eP^, (10.6.1)

has solutions—where L is a pth-order differential expression and

r>^ = {u, Lu e n, BiU = 0, / = 1,..., /?}.

Certain things we know for matrix and integral operators follow immediately for differential operators. If L^ is the adjoint operator with the domain Vj^ = {v, L^v G Ti, B]V = 0, / = 1,..., /7} and obeying the relationship (v, Lu) = (L^v, u), it follows immediately that

THEOREM. //"0,, X^ are an eigenvector and an eigenvalue ofL and ^,, Vj are an eigenvector and an eigenvalue ofh and ifv* ^ X,, then 4>i is orthogonal to fj. SPECTRAL THEORY OF DIFFERENTIAL OPERATORS: SOME GENERAL PROPERTIES 453

We can see this from

{fj, L^,) = {L+ ir^ ,,)= A, if J. ^,) = v* {ir.,,1,,). (10.6.2)

Another property can be expressed by:

THEOREM. If L is self-adjoint, then the eigenvalues X, are real, since

Of course, if L is self-adjoint, it follows that (^,, j) =0 if A, :/: Xj. If L is a perfect operator, i.e., if the eigenvectors 0,, / = 1,..., form a complete set, then the spectral resolution theorem holds:

00 L = E^/X|0„ (10.6.3) 1 = 1

where (xJ, 0,) = S^j, The set {x,} is the reciprocal set to {0,}. Moreover, since the adjoint of Eq. (10.6.3) is

00 L* = E^;*Jx„ (10.6.4) t = l

it follows that the vectors Xe ^^^ the eigenvectors of the adjoint L^ of L and that the eigenvalues of L^ are the complex conjugates of the eigenvalues of L. Of course, when Eq. (10.6.3) holds, the function /(L) obeys the decomposition theorem

oo /(L) = 5]/(X,)xj0i, (10.6.5) j = l as long as the function f{t) is defined at r = A,, / = 1, 2.... A couple of simple examples of perfect differential operators are du (i) Lu = i — , 0 < jc < 1, M(0) = M(l), (10.6.6) dx

and

d^u (ii) LM = --J-217^, 0 < JC < 1, M(0) = M(l) = 0. (10.6.7) The reader can verify that these operators are self-adjoint. The eigenvectors and eigenvalues of case (i) are

0^(x) = exp(—/X„.x:), X^ = 2nn, n = 0, ±1, ±2,..., (10.6.8)

and of case (ii) are

(/>„(jc) = Vlsitiy/I^^x, X„ =nn, /i = 1,2, (10.6.9)

We know from the theory of Fourier series that Eqs. (10.6.8) and (10.6.9) each define complete orthonormal sets in £2(^' !)• ^^ ^^^o see that for these perfect differential operators |X„| -> oo as n -^ oo. 454 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

We can also establish completeness of the eigenvectors of cases (i) and (ii) from the theory of integral equations. Consider first case (ii). The eigenproblem is

L0 = A.^, 0€X>^. (10.6.10)

Since L0 = 0 has only the trivial solution, the inverse G of L exists. Thus, since GL = I, multiplication of Eq. (10.6.10) by G yields (since X / 0)

G0 = X-V (10.6.11)

or, recalling Green's function from Eq. (10.5.26), we obtain

\ri(x - y)y(l - x) + t](y - x)x(l - y)](l>(y) = X'^cPix). (10.6.12)

Since /^ \g(x,y)\^dx dy < oo and G = G^ it follows that the integral operator G is a self-adjoint, Hilbert-Schmidt operator. In Chapter 9, we proved that such operators are perfect and have a complete orthonormal set of eigenvectors {^,} in H. Since these are also the eigenvectors of L, it follows that L is perfect and that its eigenvalues k^ are the reciprocal of the eigenvalues /x, of G. We also proved in Chapter 9 that the eigenvalues jit, have a finite degeneracy (multiplicity) from which we conclude that so also do the eigenvalues A,. In case (i), the homogeneous equation Lu =0 has the solution M = 1, and so L has no inverse. However, we can define the operator L by

Lu = Lu + u, M(0)=:M(1) (10.6.13)

for which solutions to

L0 = A0 (10.6.14)

obey the equation

L0 = (X - 1)0. (10.6.15)

Thus, the eigenvectors of L are eigenvectors of L and the eigenvalues are related by A. = A. — 1. Green's function corresponding to L is

g(x , y) = \-r]{x -y)-h ^—~^]i exp(/(jc - y)). (10.6,16) L l-exp(~-Oj Again, /^ \g{x, y)\^ dx dy < oo and L and G are self-adjoint so that Eq. (10.6.14) can be transformed to

G0 = X-^0, (10.6.17)

where G is again a self-adjoint, Hilbert-Schmidt operator. Thus, the eigenvectors of G and, therefore, of L and L, form a complete orthonormal set in £2(0, !)• The device we just used in studying case (i) can be employed to obtain some general results for the spectral theory of differential operators. Suppose a is a num- ber, real or complex, that is not an eigenvalue of a regular self-adjoint pth-order SPECTRAL THEORY OF DIFFERENTIAL OPERATORS: SOME GENERAL PROPERTIES 455

differential operator L (where a,(jc) have continuous ith derivatives in the finite interval [a,b] and a^ix) / 0). Since L is self-adjoint, its eigenvalues are always real, and so any complex number or, Ima 7*^ 0, will not be an eigenvalue of L. Suppose the operator L is defined by

Lu = J^a.ix)^. Bju = 0, J = 1,..., p, (10.6.18)

with the requirement u, Lu € H. We can define the new operator L = L + al by

^ d'u Lu = yaAx)—-r-{'au, B:U=0, 7 = l,...,p, (10.6.19)

again with the requirement u, Lu e W. L and L have the same eigenvectors because the equation

L0,. = X^^i (10.6.20)

can be rearranged to give

L0,. =:(A,.-a)0,.. (10.6.21)

Thus, the eigenvalues of L are X, = X, — a. Since a is not an eigenvalue of L, the only solution to the homogeneous equation Lu = 0 is the trivial one u = 0, and so the inverse L~^ of L exists. This inverse is the integral operator G whose kernel g(x, y) is Green's function for L. The eigenvectors of L and G are the same because the equation

G0, = K,0, (10.6.22)

can be rearranged to give

L0, = K,-V,, (10.6.23)

where we used L~* = G. Equations (10.6.21) and (10.6.23) demonstrate that L and G have the same eigenvectors. For the eigenvector 0,, the eigenvalue X, of L is related to the eigenvalue y, of G by the expression }/, = XJ'^ = (A., 4- a)~^. The eigenvalues A, cannot be 0 since L is nonsingular. However, X, can be 0 and y, may approach 0 as / -> cx). Correspondingly, |X,| may approach 00 as / -> 00, in keeping with the fact that a differential operator is unbounded (keep in mind that L may not even have an eigenvector—as, for example, in the case of initial value boundary conditions.) Note that if 0, is an eigenvector such that L0, = 0, then 0, will be an eigenvector of G with eigenvalue y, = 1/a. Thus, L and G have exactly the same eigenvectors. The kernel g{x,y) of G is a continuous function of x and y, and so, since [fl, b] is a finite interval, g(x,y) is square integrable in the bounded rectangle [a, b] X [a, b]. We proved in Section 9.5.2 that if the kernel of an integral operator G is square integrable, then G has only a finite number of eigenvectors for any 456 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

nonzero eigenvalue. This enables us to state the theorem:

THEOREM. Ifh is a pth-order differential operator in a finite interval [a,b], ifap(x) ^ 0 and a^ix) is continuous in [a, b], and if there exists a number a not equal to an eigenvalue of L, then the number of eigenvectors corresponding to a given eigenvalue A,, is finite. X, can be zero or nonzero. The property implied by this theorem distinguishes a differential operator from an integral operator. Consider, for example, the self-adjoint integral operator K = \4>l (10.6.24) where 0^ is the first of a complete orthonormal set {0i, 02* • • )• ^^ ^his case, K has an infinite number of eigenvectors, 02' 03' • •» corresponding to the eigenvalue A = 0. By the theorem just given, this cannot happen for a self-adjoint differential operator defined in a finite interval [a,b]. There is another way to prove that the degeneracy of eigenvectors of a differ- ential operator is finite. Consider the eigenproblem

Lu ~ Xu = 0, (10.6.25)

where L is a pth-order differential operator with the boundary conditions

B,M=0, / = !,..., p. (10.6.26)

Suppose X is an eigenvalue of L, 0 or otherwise. Equation (10.6.25) is a pth-order homogeneous equation. d^u du f . (10.6.27) Lu =ap{x)-—^dxP ^ f-ai(x)—dx -h \a^{x) - X)u =0. If {MJ, ..., Up] is a fundamental system of LM = 0, a solution to Eq. (10.6.27)— which exists by the hypothesis that A is an eigenvalue of L—can be expressed as

U = ^Qf.M.. (10.6.28)

The boundary conditions, Eq. (10.6.26), determine the values of a^. Insertion of Eq. (10.6.28) into Eq. (10.6.26) yields

^i«i BiUp a, (10.6.29)

BpU^ Bp^P ot„ The rank r of the matrix [B^Uj] determines how many linearly independent solu- tions {«!,..., a^} Eq. (10.6.29) admits. The rank r must lie between 0 and p — 1, since we assumed at the outset that A, is an eigenvalue. The number of linearly independent solutions to Eq. (10.6.29), p — r, corresponds to the number of eigen- functions L has for the eigenvalue A. This yields the theorem:

THEOREM. Ifhis a pth-order differential operator, then the number of eigen- vectors corresponding to a given eigenvalue is less than or equal to p. SPECTRAL THEORY OF DIFFERENTIAL OPERATORS: SOME GENERAL PROPERTIES 457

Consider a pth-order differential operator L that has an adjoint, i.e., ap{x) ^ 0 and aiix) € C^^\a,b). And suppose L is an operator for which there exists a real or complex number a that is not an eigenvalue. A self-adjoint operator is an example of such an operator. Since such an operator, if singular, can be converted to a nonsingular one by the addition of aV to L (where a is not an eigenvalue of L), and since LX = LL^ if L^L = LL^ we can assume, for the purpose of proving the next theorem, that L is nonsingular. Consider the /7th-order differential operator L and its adjoint with boundary conditions

BiU = BJU=0, / = 1,..., /7. (10.6.30)

We assume that Lu = 0 has only the trivial solution u = 0 and that the differential expressions L and L^ obey the relationship

LL^w = L^Lw. (10.6.31)

We say L is a normal differential operator if it obeys Eqs. (10.6.30) and (10.6.31). We define the inverse of L as G and that of L^ as G^ From Eqs. (10.6.30) and (10.6.31), it follows that

LL^ = L^L or (L^)-^L-^ = L'^Vy^ (10.6.32)

giving

G^G = GG^ (10.6.33)

Since the kernel of G is square integrable in the bounded rectangle [a,h] x [a, h] and obeys Eq. (10.6.33), we conclude that G is a normal, completely continu- ous operator. According to the theorem proved in Section 9.5.7, such an operator possesses a complete set of orthonormal eigenvectors 0,, f = 1,2,..., and since L0J = Vj'Vr if G^0r = ^i4^i^ the following theorem is obeyed:

THEOREM. IfL is a normal differential expression (i.e., it obeys Eqs. (10.6.30) and (10.6.31)) and is nonsingular, or can be made nonsingular by the addition of Oil, then the eigenvectors 4>i^ i = 1,2,..., of the operator h fonn a complete orthonormal set. Since the eigenvalues of a self-adjoint operator are real, there always exists a complex number such that L + al is not singular, and so the theorem always holds for self-adjoint operators. The theorem yields the spectral decomposition of L and /(L), namely, L = E^.^.^I and /(L) = 2:/(X,)0,0j, (10.6.34)

as long as f{t) exists for r = A,,, / = 1, 2,.... The theorem generalizes to pth- order differential operators the theorem proved in Chapter 9 for normal, completely continuous operators. Self-adjoint operators are a subset of the class defined by Eqs. (10.6.30) and (10.6.31). 458 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

It is important to note that there are differential operators for which there does not exist a complex number that is not an eigenvalue. For example, consider

Lu = -—r, M(0) = -M(1), M'(0) = u\l). (10.6.35)

The general solution to LM = A,M is

M = Ci sin \/Xx + C2 cos VAX, (10.6.36)

where c^ and C2 are arbitrary constants (sin \/KX and cos \/Xx form a fundamen- tal system for the equation Lu — ku = 0). Applying the boundary conditions to Eq. (10.6.36), we find

Ci sin VA -f C2(l + cos vA) = 0 (10.6.37) Ci\/I(l - cos \/A) + <:2Vx = 0.

Equation (10.6.37) has a nontrivial resolution for c^ and C2 if and only if

sin ^/X 1 + cos \/X = 0. (10.6.38) VX(l — COS A/X) \/A sin VX

However, this determinant is

VX sin^ vX + cos^ VX -1=0 for any k. (10.6.39)

Thus, the eigenvectors of the operator L defined by Eq. (10.6.35) are given by Eq. (10.6.36), where the eigenvalues A, are any complex number. There is then no number a such that L + ofl is nonsingular. Of course, in this case L is neither self-adjoint nor normal. There are also non-self-adjoint operators L that have no eigenvectors or eigen- values. For example.

d^u Lu = w(0) = ~2u(7t), M'(0) = lu'in). (10.6.40)

The general solution to Lu = ku is again given by Eq. (10.6.34) and nonzero values of Cj and C2 can be found if

2 sin TTvX 1 4-2 cos TTVX 0 = \/X(l —2cos7t^/k) 2\/X sin TTvX (10.6.41)

= vX;(4 sin^ JT VX -h 4 sin^ TT VX •- 1) = 3 VX.

From Eq. (10.6.41), we see that the only candidate for an eigenvalue is A. = 0. In this case, u = €2. But the boundary condition M(0) = —2u{7t) yields C2 ~ —2c2 or C2 = 0. Therefore, the operator defined by Eq. (10.6.40) has no eigenvalues. SPECTRAL THEORY OF REGULAR STURM-LIOUVILLE OPERATORS 459

10 J. SPECTRAL THEORY OF REGULAR STURM-LIOUVILLE OPERATORS

Consider the second-order differential expression 1 d ( du\ q(x^ Lu = --— — Ipix)— 4- —u, a

(v,u)=/ s{x)v\x)u(x)dx, (10.7.2)

The formal adjoint differential expression V corresponding to L, defined by the expression

\ s{x){v*Lu - {L^vyu]dx = p{x)[v\x)u\x) - {v\x)yu{x)][, (10.7.3)

is the same as L. Thus, the Sturm-Liouville operator is formally self-adjoint. The operator hm C2{a,b\s)\^ defined by the differential expression L and the bound- ary conditions B^u = Qfiiw(a) -f oti2u'{a) + a^^^uQ)) -f- a^^u'ib) = 0 (10.7.4) B2U = a2iu{a) -\- a22u\a) -h a2^u{b) -\- a2^u\b) = 0,

where the coefficients a,y are real numbers. The adjoint operator L^ is determined by the differential expression L and the boundary conditions B\V = BIV — {) derived from the condition

p(x){v\x)u\x) - {v'{x))\{x)f^ = 0, for all u € P^^, (10.7.5)

where

Vi^ = {u, Lu G C2{a. b; s); B^u = B^M = O}. (10.7.6)

The boundary functional B,M of interest here are those for which B}U = B^u, / = 1, 2, so that L^ = L (i.e., L is a self-adjoint operator). It is interesting to note that L cannot be self-adjoint if the functional B^u define an initial value problem. To prove this, consider the case of initial conditions

B.u = or,iM(a) -I- a^ru'{o) — 0 (10.7.7) B2U = a2iuia) -f 0^22w (a) = 0. Since the matrix

""" '''' (10.7.8) ^21 ^22 460 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

has rank 2, it follows that u{a) = u'{a) = 0 in this case. Equation (10.7.5) then becomes p{b)[v*{bWib)-{v\b)yu(b)] = o (10.7.9) for any u e P^. Since u(b) and u'(b) are arbitrary, it follows that the boundary conditions for L^ are v{b) = v\b) = 0—which are different from the boundary conditions of L—and so L cannot be self-adjoint for the initial value problem. The general condition for self-adjointness of the Sturm-Liouville operator is summarized in the following theorem:

THEOREM. The Sturm-Liouville operator defined by Eqs. (10.7.1) and (10.7.4) is self-adjoint if and only if

or, p{a) «I4 = pib) «12 (10.7.10) *23 «24 ^21 «22

Since L = L^ self-adjointness merely requires that fij^ = JB^, / = 1, 2, i.e., that V and u satisfy the same boundary conditions. This requirement can be summarized by the matrix equation

B^u B^v* 0 0 (10.7.11) B2U B,v^ 0 0

which can be rearranged to give

a^^u{a) -\-a^^2^'{a) Qfjii;*(a) 4-ai2(i''(^)) a2\u{a) + oi22u\a) oi2iV*{a) -\- a22{y'{ci)) (10.7.12) a^^uib) -{- a^^u'ib) a^^v*(b) -f ai^{v'(b)y a23,u(b) + a24u\b) a23V*{b) + a24{v\b)y

Taking the determinant of each side of Eq. (10.7.12) and using the elementary properties of determinants, we find that the expression

a, u{a) v*{a) ^21 *22 u'(a) (v\a)r (10.7.13) u{b) v*ib)

a 23 X24 , I u^(b) (v\b)r is the condition that the boundary conditions of L and L^ are the same. However, by the definition of L^ in Eq. (10.7.3), u and v satisfy

u{a) v*{a) u(b) i;*(fc) p{a) = p(b) (10.7.14) u\a) (v\a)r u'(b) (v\b)y

There are two possibilities to consider here. The first is that the determinants ^11^22 ~ ^21^12 ^^^ ^i3<^24 "^23^14 ^^^ ^^t ^-1^ this case, division of the left- and SPECTRAL THEORY OF REGULAR STURM-LIOUVILLE OPERATORS 461

right-hand sides of Eq. (10.7.13) by the left- and right-hand sides, respectively, of Eq. (10.7.10) yields Eq. (10.7.14), proving that the condition at Eq. (10.7.10) yields self-adjoint boundary conditions for this case. The other possibility is that the determinants Qr,iQf22 — ^ai^^ia ^"^ ^i3<^24 ~ <^23^i4 ^^^ ^^^^ ^- The only other alternative—that one of the determinants is 0 and the other is not—defines an ini- tial value problem that we have already shown will not yield a self-adjoint operator. Since the rank of the matrix [a^j] (for / = 1, 2, 7 = 1,..., 4) is 2, at least one of the determinants

«ii «14 «12 «14 /?! = /?2 = «21 «24 «22 «24 (10.7.15) «11 «13 «12 «13 ' R,= /?4 = ^21 «24 ^22 «23

is nonzero since

Ofi Otr M4 (10.7.16) ^21 ^22 a 23 *24

for this case. From the linear combinations 0^24^1 ^13^2^ = 0, and the relation (Xi^otjA ~ ^23^14 = ^^ ^^ obtain

R^a) -f- /?2w'(«) = 0 (10.7.17) R^u(a) + /?4M'(a) = 0

The equations in Eq. (10.7.17) are also satisfied by v*, i.e.,

R,v*ia)-\-R2{v\a)y = 0 (10.7,18) R,v*(a)^R,{v\a)y =0

Consider R^,..,, R^ to be unknowns of the four equations in Eqs. (10.7.17) and (10.7.18). Since at least one of the /?,, i = 1,..., 4, is nonzero, it follows that the determinant

u(a) M'(fl) 0 0 v*(a) {v'(a)y 0 0 (10.7.19) 0 0 u{a) u'(a) 0 0 v*ia) {v'(a)y

is 0, or, equivalently.

u{a) u'{a) = 0. (10.7.20) v*(a) (v'(a))' 462 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

Analogously, the equations ctii^x^ ~ ^n^i^ = ^' ^2\B\U — otuBjU = 0, and the relation ot^^a22 — (^n^^ix = ^ ^^^"^ ^^ ^^e condition

u{b) u'{b) : 0. (10.7.21) v\b) {v'{b)y

It follows from Eqs. (10.7.20) and (10.7.21) that Eq. (10.7.14) holds, and so Vj^ = P^t for this case. This completes the proof of the sufficiency part of the theorem. The proof of necessity requires the assumption that Eqs. (10.7.13) and (10.7.14) are true and then proving that Eq. (10.7.10) is implied. This part will be left to the reader. It is important to note that there is a Hilbert space in which Eq. (10.7.10) guarantees self-adjointness for any real, regular second-order differential operator. Consider the general operator r (fu dy 1 Lu = —\a2{x)—-iz -[-aAx)-—\-aQ{x)u , a < x 0, and [a, b] is a finite interval. Next consider the differential operator defined by

L'u = -aAx) exp(- £ ^ ^?) £ {exp(£ ^ ^?) ^ } + a,ix)u, (10.7.23) where c is an arbitrary point in [a,b]. Carrying out the differentiation on the right- hand side of Eq. (10.7.23), we find that V = L. It follows then that if we set

/7(^) = exp(r^jA, six) = ^ andq(x)=a,(x)s(x), (10.7.24) Vc a2{;) ) p{x) we can transform any real, regular second-order differential expression Lu into a self-adjoint form with the appropriate Hilbert space C2(a,b\s{x)). Thus, if and only if the boundary functions B^u and B2U obey Eq. (10.7.10), the operator defined by Eq. (10.7.22) is self-adjoint in the Hilbert space C2ia, b\ s), where s{x) is given in Eq. (10.7.24). It follows from the transformation of Eq. (10.7.27) into Eq. (10.7.23) that Sturm-Liouville operators represent the entire class of real, reg- ular second-order differential operators. EXERCISE 10.7.1. Consider the operator L defined by

d^u dii Lu = ~---r - -—, 0 < jc < 1, M(0) = M(1) = 0. (10.7.25) dx^ dx' (a) Find the Hilbert space in which Lu is formally self-adjoint. (b) Is L self-adjoint for the boundary conditions given? (c) Find the eigenvalues and eigenvectors of L. (d) Solve the problem

Lu=—, (10.7.26) at where M(A:,0) = 1. SPECTRAL THEORY OF REGULAR STURM-LIOUVILLE OPERATORS 463

(e) Solve the problem

Lu = f, (10.7.27)

where f{x) = x.

Some examples of self-adjoint boundary conditions for the operator

Lu = — 0 < X < 7t, (10.7.28) dx^'

are

(i) M(0) = u(7r) = 0

(ii) M(0) = u(n), u\0) = u'(n) (10.7.29) (iii) M(0) sin0 + u\0) cos^ = 0, 0 arbitrary u(n) sin0 -I- U{7T) cos0 = 0, 0 arbitrary,

Consider now the eigenproblem

Lu = Xu (10.7.30)

for the operator in Eq. (10.7.28). A fundamental system for Lu — Xu =0 is

Ui=smVXx, U2 = cosVXx. (10.7.31)

The eigenproblem is solved by setting

2 w = E^i"^ (10.7.32)

and applying the boundary conditions B^u = 0, / = 1, 2. This leads to the equations

Cj^iMj + C2B1M2 = 0 (10.7.33) C1B2M1 -\-C2B2U2 = 0,

which have a solution if and only if

OjMj ByU2 = 0. (10.7.34) B2U1 B2U2 464 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

For the three examples in Eq. (10.7.29), we find the following equations for the eigenvalues of L:

0 1 (i) 0 (10.7.35) sinjrVx cos TTVX

— sin;r\/X 1—COSTTVX (ii) = 0 (10.7.36)

VXcos^ sin^ (iii) sin 0 sin n VX sin 0 cos n vx = 0 (10.7.37) + VXCOS^COSTTVX — ^/X COS 0 sin TTA/X

For case (i), we find that the eigenvalues must satisfy

sin TTVX = 0. (10.7.38)

Candidate eigenvalues are then X^ = 7^, y = 0, I, 2,.... Note, however, that the case >.Q = 0 is not an eigenvalue since the eigenvector

Uj(x) = Ci sin J)^jX + C2 cos J^jX (10.7.39)

yields MQ = C2 for XQ == 0. This does not satisfy the boundary condition M(0) = u(7z) = 0. For Xj = /, 7 = 1, 2,..., M^.(O) = Ujin) = 0 if C2 = 0. Thus, the normalized eigenvectors of L for case (i) are

with eigenvalues Xj = j^, j = 1,2,.... We know from the theorem given in the previous section that 0y, 7 = 1,2,..., form a complete orthonormal set in A(0,7r). For case (ii), X = 0 is an eigenvalue and corresponds to the eigenfunction 00 = I/A/TT- P^^" ^ 7^ 0» Eq- (10.7.36) requires

— sin^TrVX— 1 4-2cos;r\/x - cos^7r\/X = 0

or

cosSTTV A = 1. (10.7.41)

It follows from Eq. (10.7.41) that Vx = 7, 7 = 0, 1, 2, The rank of the deter- minant in Eq. (10.7.36) is 0, and so there are two linearly independent solutions corresponding to each eigenvalue Xj = 4j^, j = 1,2,.... For XQ = 0, there is only one eigenvector since sin ^/J^x = 0. The orthonormal eigenfunctions for SPECTRAL THEORY OF REGULAR STURM-LIOUVILLE OPERATORS 465

case (ii) are thus 1 00 =

sin Ijx (10.7.42)

M(0) = 0 and u(n) = u'(7t), (10.7.43)

These conditions are appropriate for a heat transfer problem in a slab in which the slab is insulated at jc = 0 and is in contact with a fluid at jc = TT. Again, A = 0 is not an eigenvalue for this case and the eigenvalues obey the equation

siuTrVX = VXcosTZy/x. (10.7.44)

For this problem, the eigenvalues kj have to be found numerically. From the ele- mentary properties of the equation tanj^c = jc, we can say that A.^ -> oo as 7^ as 7 -> 00 and that, for large 7, Xj ^ {2j + 1)^/4. The normalized eigenfunctions for this example are

/27r -sin27ryAA'^^ /—- t>j = [ 4-^^V )M si.;n^ 1^A J = 1,2,.... (10.7.45) Although it is not apparent from Eq. (10.7.45) that these eigenfunctions are orthog- onal in £2(0' ^)' w^ know that they are from the basic theory of self-adjoint operators. EXAMPLE 10.7.1. We can solve case (iii) above numerically for arbitrary val- ues of 0 and 0 by using the Newton-Raphson method of Chapter 3 to solve for X in Eq. (10.7.37). The eigenvalues of L defined in Eq. (10.7.28), in this case, must satisfy

0 =vXcos^rsin0cos(7rVA) — VAcos0sin(7rVA-)l (10.7.46) — smO[VXcos(f)cos(7T\/k) -f sinsin(7rA/A)]. As an example, we will choose 0 = 2 and 0 = 1.5 (radians). Figure 10.7.1 contains a plot of the right-hand side of Eq. (10.7.46). We choose, as our initial guesses, estimates of the zeros read right off the graph. Using the set of initial guesses {1, 3,7,13,21, 30,42,55, 71, 88}, the first 10 corresponding eigenvalues are [kJ = {0.741483, 3.04912, 7.07338, 12.9198, 20.6431, (10.7.47) 30.2711,41.8202, 55.3014,70.7238, 88.0946,...}. 466 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

/(A) /(A) = 0

FIGURE 10.7.1

The corresponding eigenvectors can easily be determined from Eqs. (10.7.31) and (10.7.32) as

U^(X) = Ci„ sin y/X„X + C2^„ COS y/^X, (10.7.48)

where the coefficients are determined from

Ci = imO, (10.7.49) Vk with the appropriate normalization. Note that, although the solution X = 0 satisfies Eq. (10.7.37), it is not an eigenvalue for this example. We can see this directly by noting that the coefficients Ci and C2 are solutions to the matrix equation

0 sin^ = 0 (10.7.50) 0 sin0

when A, = 0. The only nontrivial solution possible is when sin^ = sin0 = 0, in which case Cj = 0 (by default since sin(O) = 0 in Eq. (10.7.48)) and C2 = 1/v^ I I (after normalization). •• EXAMPLE 10.7.2. In the classical physics of heat and mass transfer, one fre- quently encounters problems of the form

du = —DLu, B^u = B2I1 =0, a < X < b, (10.7.51)

where u(x,t) is a function of two variables (position x and time 0- Typically, initial conditions of the type u(x, 0) = f(x) are considered. In Eq. (10.7.51), D is a transport coefficient and is a positive scalar. In these problems, Lu = —d^u/dx^. SPECTRAL THEORY OF REGULAR STURM-LIOUVILLE OPERATORS 467

and when the boundary conditions are such that L is self-adjoint, the solution to Eq. (10.7.51) is

u = exp(-rDL)f = J^txp(-tDXj){4>j, f)if>j, (10.7.52) j or

u(x) = J2Qxp(-tDXj)(t)j(x) / (t>]{x)f{x)dx, (10.7.53)

where 0^ and A.^, 7 = 1, 2,..., are the eigenvectors and eigenvalues of L. It is interesting to note that Eq. (10.7.53) yields a very general solution to Eq. (10.7.51) (owing to the spectral resolution theorem); of course, the particular eigenvectors • • • and eigenvalues depend on the boundary conditions imposed. ••• EXERCISE 10.7.2. Choose units such that D = 1; let f{x) = exp(-jc) and let [a, b] = [0, 1]. Plot the solution u{x) versus JC for / = 0.1,0.5, 1, and 5 for the case analyzed in Example 10.7.2 (with 0 = 2 and (^ = 1.5). Plot the same curves in real units where fe = 5 cm, D = 0.245 cm^/s, and / and u are in kelvins. This III corresponds to the cooling of thin iron rods insulated everywhere except the ends. ••• EXAMPLE 10.7.3. In quantum mechanics, one encounters the one- dimensional problem of finding the eigenvalues of the equation Lu = A.M, where the self-adjoint Sturm-Liouville operator L is the Schrodinger operator and the eigenvalue X is the energy of the system. As an example, consider the wavefunc- tion of a particle in a one-dimensional box obeying the equation

-^ = Ef, 0

-^ = AVr, 0 < ? < 1, V^(O) = V^(l) = 0. (10.7.56)

A fundamental system for Eq. (10.7.56) is u^ = sin VX^ and U2 = cos \fx^. Taking V^ = CjMi -(- C2M2 ^^d applying the boundary conditions ^(0) = V^(l) = 0, we find c^ = 0 and the condition

sm VT = 0 (10.7.57) 468 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

for the eigenvalues A.. Thus, kj = (JTT)^, j = 1,2,..., and xj/j = CiSinTry^. Replacing § by x/b and determining c^ from the normalization criterion, JQ xjrjdx = 1, the wavefunctions become

irj=^yjl sin ^, 7 = 1,2,..., (10.7.58)

corresponding to the particle energies

... Ej = - -f-, 7 = 1,2,.... (10.7.59) III ^ 2m b^ ••• EXAMPLE 10.7.4. The initial concentration of carbon in a certain long bar of steel is given by C(JC, r = 0) = CQ(X). The ends of the bar, say at jc = 0 and x = b, are insulated, which means that dc/dx = 0 at jc = 0 and x = b. The change in concentration in time is governed by the diffusion equation

dc d^c - = «^. (10.760,

where D is the diffusion coefficient. The problem of finding the time evolution concentration can be restated as the problem of solving

— = -DLc, (10.7.61) dt

where Lu = —d^u/dx^, with boundary conditions M'(0) = u^{b) = 0, defines a self-adjoint operator. Thus, the solution to Eq. (10.7.61) is

c = exp(-rDL)Co = J^Qxp{~tDkj)(j, Co)0^-, (10.7.62) J

where A.^ and 0^ obey the eigenequation L0y = ^y^y Applying the boundary con- ditions to u(x) = Ci sin \/Xx-\-C2 cos ^/kx, we find that c^ =0 and the eigenvalues must satisfy

VksmVkb = 0, (10.7.63)

namely, XQ = 0 and .Jkjb = jn, 7 = 1, 2,.... Thus, the eigenfunctions are

1 <^o (10.7.64) nix cos , j = \,2,.... > =# b and the concentration c{x,t) is given by

1 /•'' ~ /-tDK^j^\ [2 nix c{x,t) = -j^ CoWJj: + l]exp( -^ j(0.,Co>^-cos ^, (10.7.65) SPECTRAL THEORY OF REGULAR STURM-LIOUVILLE OPERATORS 469

with

^,Co> = /^{co (x)cos^dx. (10.7.66) b Thus, we see that diffusion drives the concentration of carbon in an isolated bar toward a constant value c — b~^ f^ CQ{x)dx, which is the mean value of the initial • • concentration. The concentration approaches the mean value exponentially in time. Consider now the operator L defined by

Lu = ——r1?', 0 < jc < 1, (10.7.67)

with the following boundary conditions:

(a) M(0) = M(1) = 0 (10.7.68a)

(b) M'(0) = M'(1)=0 (10.7.68b)

(c) M(0) = M'(1)=0 (10.7.68C) (d) w(0) = M(1), W'(0) = u\\). (10.7.68d)

For each of the above conditions, the operator L is self-adjoint and the correspond- ing eigenvalues and eigenfunctions are

(a) A„ = (n7^)^ 0„(jc) = \/2sin«7rjc, « = 1, 2,... (10.7.69a) (b) Ao = 0, 0oW = l

X^ = {n7xf, 0„(A:) = \/2 cos/ITT JC, w = 1, 2,... (10.7.69b)

(c) X„ = (n + -j:^^, 0„(x) = V2sin( n-f-- ITTX, n = l,2,... (10.7.69c) (d) A.^*^ = {Innf, ^^\x) = V2sin2n7rA:, n = 1, 2,. .. kf=0, ^^\x) = \ Xf^ = (2n7^)^

EXERCISE 10.7.3. Show that the eigenvalues and eigenvectors in Eq. (10.7.69) are truly the eigenpairs of Eq. (10.7.67) for each of the boundary conditions in Eq. (10.7.68).

Since each of the examples above defines a self-adjoint operator, each set of eigenvectors forms a complete orthonormal set. Thus, the spectral resolution I = Y.ni^n4^\ O^ ^(-^ — >') = Y.n^ni^)^liy) ^^ ^^^'^ fOT Cach of thcSC CaSCS. Moreover, L = E« K4>n4^l and /(L) = Y.n f(K)n^l when /(A,J exists. If we denote the kernel of /(L) by /(x, y), then the spectral resolution of /(L) for 470 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

cases (a)-(d) above becomes

oo (a) f{x,y) = J^ f{n^n^)2 sin nx sin nny (10.7.70)

oo (b) fix, y) = /(O) + J2 f(n^7t^)2cos7txcosnKy (10.7.71)

(c) f(x,y) = f;^f(^^(n-h^)nA2sm(n^ (10.7.72)

CO (d) fix, y) = /(O) -f X] /(4n^7r^)2(cos2/i7rxcos2/i7r}? n=l + sin Innx sin 2:7r«>') (10.7.73)

By setting fix) = 1, we recover the resolution of the identity operator I in C2i0, 1) (/(jc, y) = Six - y)). When Fit) = t, fix, y) is simply the kernel lix, y) of the • • integral operator representing L. ••i EXERCISE 10.7.4. Use the spectral resolution of /(L) to solve the equation

— =:-Lu, M(jc,f =0)=jc, 0

^^=D^, 0

iron bar ^d r FIGURE 10.7.2 Case-hardened iron rod. SPECTRAL THEORY OF REGULAR STURM-LIOUVILLE OPERATORS 47 I

where t is time, x is distance from the flat surface of the bar, and d = 0.5 cm is the bar thickness. Edge effects will be neglected. The initial and boundary conditions are ^(jc, r = 0) = 0, for all x, 0^x = Oj)=0^, t >0, (10.7.76) 0(x =dj) = e^, f > 0, where 0^ = 0.012. Consider the operator

Lu = --4, M(0) = u(d) = 0. (10.7.77) dx"^ This is a self-adjoint operator whose eigenvectors and eigenvalues are

2n nn {nn\ M^(A:)= /-—sm—-jc and >.„=-—), M = 1,2, .... (10.7.78) y nd d \ " / (i) Using the fact that M„ for n = 1,... forms a complete orthonormal basis set in £2(0* d) space, solve the diffusion equation for the bar. Plot the concentration 0 versus x at various times. Determine how long the bar should be left in the methane bath for the concentration of carbon to be 0.005 at jc = 0.05 cm. (ii) Suppose the concentration of carbon in the bar is initially

e(x, r = 0) = 0.004 - 0.016(jc - 0.5 cm)Vcml

Find the concentration profiles (0 versus JC) for various times for this case. Assume again that the surface concentration is fixed at 0^ = 0.012 for t > 0. The reader may have noticed that in all of the examples of self-adjoint Sturm- Liouville operators that we have studied, the eigenvalues A„ are bounded from below by M, where \M\ < 00 and A,„ -> 00 as ft -> 00. Boundedness from below is a general property for a large class of regular self-adjoint Sturm-Liouville operators. By bounded from below, we mean that, for any u in X>^, the property

(u, Lu) > M{u, u), \M\ < cx), (10.7.79)

holds. Before stating the basic boundedness theorem and proving it, let us note that, for a self-adjoint Sturm-Liouville operator, one can, without loss of generality, limit the Hilbert space to real functions. This is because L is real and its eigenvalues are real. Suppose 0„ is an eigenvector of L and suppose it is complex. Then 0„ = tf„ + /^„, where 0„ and ^„ are real-valued functions and i = ^/—l. But L0„ = L0„ -h /L^„ = X^0„ -f i>^ni^n, and since L^„ and hifr^ are real it follows that L(?„ = kj„ and L^„ = A.„f „. If the multiplicity of X„ is 1, then $„ = ^„. If the multiplicity of A„ is larger than 1, say /?„, then the p^ vectors 0„ can always be chosen such that tf„ = ^„. Thus, the eigenvectors of L are real except for the complex factor. Since an eigenvector is unchanged when multiplied by an arbitrary constant, it follows that the eigenvectors of L can be chosen to be real without loss of generality. Accordingly, we can prove the following theorem for real functions in the Hilbert space C2{a,b;s) 472 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

THEOREM. If L, is a regular Sturm-Liouville operator in C2(a,b;s), it is bounded from below if the boundary conditions are either

u{a) = u{b)=0 (10.7.80)

or

u{a) = aiiM(a) and u'{b) = —a2iu{b), (10.7.81)

where a^^ and a2i ^^^ arbitrary real numbers. Consider again the Sturm-Liouville operator defined in Eq. (10.7.1). The quan- tity (u, Lu), after integration by parts, yields

(u, Lu) = p{a)u(a)u(a) - p{h)u{b)u'{b) .b b (10.7.82) + / /^(•^)[W'(A:)J dx + j q{x)[u{x)\ dx.

For the boundary conditions u{a) = u(b) =0 and u\a) = u'{b) = 0 (the latter is a special case of the conditions in Eq. (10.7.81)), it follows immediately that L is bounded from below since

/ p(x)[u\x)fdx > 0 (10.7.83)

and

/ q{x)[u(x)fdx >M( s{x)[u(,x)fdx = M(u, u), (10.7.84)

where

M= min ^. (10.7.85) a

v = min oii^pix), (10.7.86)

and use the boundary conditions u\a) = a^uia) and u^(b) = —a2iu(b) in Eq. (10.7.86), we find

(u, Lu) > v([u(a)f + [u(b)f^ + j p{u'fdx + j qu^dx, (10.7.87)

Moreover, if we define

p^ = max p(x) and 8 = min(p^, v), (10.7.88) a

then it follows from Eq. (10.7.87) that

(u, Lu) > 5 u{af + u{bf + / (uYdx + f qu^dx, (10.7.89)

Next we need to find a bound for the quantity in the square brackets in Eq. (10.7.89). We do this by starting with the relation

/ —(xu^)dx = b[u{b)f - a[u(a)f (10.7.90) Ja dX

or

i u^dx = - f Ixuu'dx + b[u{b)f - a{u{a)f. (10.7.91)

Next consider the inequality

W\ < ^cc^ + ~P\ y>0{a.p real), (10.7.92)

which is easily derived from the expression

(jy\a\ - -j^p) = \y\a' + :^^f ^ 2\ap\ > 0. (10.7.93)

Letting a = u and fi = u\ we can express the integrand on the right-hand side of Eq. (10.7.91) as

I - 2JCM//| < 2fi\uu'\ < 2M(|W^ + Y~^l^ (10.7.94)

where

/x = max \x\. (10.7.95) a

Next we set y = l/(2/x) and use the inequality in Eq. (10.7.94) to obtain

j u^dx < Aix^ j {ufdx^2(b[u{b)f -a[u{a)fY (10.7.96)

But, since b[u{b)f - a[u{a)f < ii{[u{b)f + [u(a)]^), Eq. (10.7.96) implies

/ u^dx

where

€ = max(4At^ 2fjL). (10.7.98) 474 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

Finally, combining Eqs. (10.7.89) and (10.7.97) yields

(u,Lu) > ts(x)\-^^-\-^]u'dx (10.7.99) > M(u,u),

where

M = min + —xy\ I (10.7.100) a

and (u,u) = /^ s{x)u{x)^dx. Since |M| < oo, this completes the proof. The lower bound of L, of course, does not have to be 0. For example, the operator

Lu^—~z-Mu, 0<;c

A„ = {nnf + M and 0„ = V2 sin WTTJC, n = 1, 2.... (10.7.102)

We can let M be a large negative number and, thus, A,„ > M. However, A.„ < 0 until n is large enough such that {nn)^ > \M\. The various examples of regular, self-adjoint Sturm-Liouville operators bring out the special properties that are identified in the following theorem:

THEOREM. Let

where p{x), p\x), q(x), and s{x) are real and continuous functions on the finite interval {a,b]\ p(x) and s(x) are positive on [a, b]. Let

BxU = a^^u{d) -h ay2u\d) — 0 (10.7.104) B2U ~ a2T,u{b) + a2^u'{b) — 0,

where the coefficients ot^ are real and obey the conditions a\^ + QL\2 ^ 0 and ^23 + ^24 7^ 0 {these boundary conditions are equivalent to those of case (iii) in Eq. (10.7.29)). Then L is self-adjoint and its eigenvalues can be arranged in the sequence

Xo

where

|Xol < 00 and lim A.„ = 00. (10.7.106) n->oo Furthermore, the eigenfunction 0„(JC) corresponding to A„ has exactly n zeros in the open interval {a, b). SPECTRAL THEORY OF REGULAR STURM-LIOUVILLE OPERATORS 475

Consider cases (a)-(d) given by Eq. (10.7.68). In case (a), A.„ = (w + l)^7r^ and n(x) = V2sin{n + l)7rjc, n = 0,1,2... (we have transformed the integer sequence labeling X„ so that n = 0 gives the smallest eigenvalue). Notice that the eigenfunction 0o = \/2 sin JTX has no zeros in the interval (0, 1). The zeros at jc = 0 and jc = 1 do not violate the theorem, because jc = 0 and jc = I do not belong to the open interval (0, 1). The second eigenfunction (p^ = ^/isinlnx has a 0 at JC = ~; the third eigenfunction (/>2 = V2sin 3nx has zeros at A: = | and ^ = |; etc. In this case, the eigenvalues order as XQ < Xj < ^2 < • • •. Furthermore, cases (b) and (c) can easily be seen to obey the theorem with the eigenvalue ordering

An - 00 as n -> CO follow from the facts that the Sturm-Liouville operator with the boundary conditions given by Eq. (10.7.104) is bounded from below and that a differential operator is unbounded as n -> 00. The only thing we have not proved is that the eigenfunctions can be ordered such that 0„ has exactly n zeros in the open interval {a,b). This is the so-called oscillation theorem for Sturm-Liouville equations. The proof is a bit tedious, and so we refer the reader to texts on differ- ential equations instead of reproducing it here. (Coddington and Levinson, Theory of Ordinary Differential Equations, is an excellent reference for this theorem.) The theorem pertaining to case (d) is as follows:

THEOREM. Let

s{x)dx\ ax J s{x) where p(x), p'(jc), ^(jc), and s(x) are real and continuous on the finite interval {a, b] and p{a) = p{b) (this last condition is needed to render L periodic in C2(a, b\ s)). Suppose the boundary conditions are either periodic,

u{a) = u{b) and u{a):=u\b), (10.7.108)

or anti-periodic,

uia) = -uib) and u'ia) = -u\b). (10.7.109) 476 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

With either set of boundary conditions, L is a self-adjoint operator. If X^,n > 0, denote the eigenvalues for periodic boundary conditions and k„, n > 1, denote those for anti-periodic boundary conditions, then the complete set of eigenvalues can be ordered into the following sequence:

AQ < Xi < ^2 < Ai < A.2 < X3 < X4 < • • • , (10.7.110)

where

\XQ\ < 00 and lim A.„ = lim X„ = 00. (10.7.111) n->oo «->oo For XQ, there is a unique eigenfunction 0o- When A2„^i = ^2^+2 ^^ ^2n+\ = ^in-^i^ there are two eigenfunctions, 02n4-i' ^in+i ^^ 4>2n-^i^^2n+2^ corresponding to the same eigenvalue. The function 0o has no zeros in [a, b]. 2n4-i ^f^d 02«4-2^ ^ ^ 0, have exactly 2n-\-2 zeros in (a, b] and ^2n+\ ^^^ 02n+2' '^ ^ 0, have exactly 2n-{-\ zeros in [a, b). Note that the semi-open interval [a, b) includes jc = ^ but not x —b, whereas the semi-open interval {a, b] contains x = fo but not jc = a. We refer the reader to the literature on differential equations for the proof of this theorem. (Again, Coddington and Levinson, Theory of Ordinary Differential Equations, is an espe- cially good text.) Let us examine case (d) as an example of an operator obeying this theorem. For the periodic boundary conditions, we find from Eq. (10.7.69) that

Ao = 0, 00 = 1 Ai = {Inf, 0j == v2sin27rjc

X2 = (27^)^ 02 = V2cos2;rx (10.7.112)

^3 = (47r)^ 03 = v2sin47rjc

A4 = (4;^)^ 04 = V2cos47rjc

while the eigenproblem for M(0) =: —u{n) and M'(0) —d^u/dx^ yields

Aj = 7r , 01 = V2 sin TTj^:

A.2 = TT , 02 = v2cos7rjc

A^3 = (37^)^ 03 = V^ sin STT jc (10.7.113) K = On)\ 04 = \/2cos3;rx

The eigenvalues of the above two problems obey the sequence

AQ < Xj = ^2 < A,| = A2 < A.3 = ^4 < A,3 = A4 < • • • , (10.7.114) SPECTRAL THEORY OF SINGULAR STURM-LIOUVILLE OPERATORS 477

and the functions 02„+i and 02«+2' '^ - ^» have 2n 4- 2 zeros in [0, 1), and the functions 02n+i ^^^ 02n+2' ^ ^ ^^ havc 2/t 4- 1 zeros in (0, 1]—as required by the theorem. According to the theorem, the eigenvalues of the periodic and the anti- periodic operators obey the separate sequences

-oo < Ao < A, < ^2 < ^3 < -^4 < • • • (10.7.115)

and

-00 < Ai < A2 < X3 < A4 < • • • , (10.7.116)

respectively. In the special case (d), the inequalities "<" are replaced by equalities.

10.8. SPECTRAL THEORY OF SINGULAR STURM-LIOUVILLE OPERATORS

The differential expression of a singular Sturm-Liouville operator is of the same form as that of a regular Sturm-Liouville operator. An operator is classified as singular if either \a\ or \b\ of the interval [a,b] is infinite, if s{x) or p{x) has a zero in [a,b], or if \qix)\ is infinite at some point in [a,b]. The condition for self-adjointness is the same as for a regular operator, namely, that L — L^ and Vj^ = D^t. The domain X>^t is defined as those vectors y m H such that L+v € U and

(V, Lu> = (LV, U) for all u € P^. (10.8.1)

Two examples of singular, self-adjoint Sturm-Liouville equations are (i) The Legendre equation:

d V ^ dill LM = (1 - x^)— = AM, -1

L2U = —-j-j + x^u = AM, —00 < X < 00. (10.8.3) dx The Legendre equation arises in the quantum mechanical analysis of radially distributed potentials and angular momentum. The Hermite equation is encountered in the quantum mechanical analysis of the energy of a harmonic oscillator. In our analysis, u denotes the wavefunction and A the energy expressed in appropriate units. The domain of the operator in Eq. (10.8.2) is

P^j = {u,LiU€£2(-M)' M(~1) = M(1), M'(-1)=:M'(1)}, (10.8.4)

and the domain of the operator in Eq. (10.8.3) is

Vj^^ = {u, L2U € C2(-oo, 00)}. (10.8.5)

It is easily shown that Ly = L\ and D^ = P^t and that L2 = L\ and P^^ = P^t. Thus, the operators Lj and L2 are self-adjoint. However, Lj is singular because 478 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

p(x) = 1 — jc^ is 0 at jc = — 1 and x — I. Likewise, L2 is singular because it is defined in the infinite interval (—00, 00). The eigenfunctions of Eq. (10.8.2) are the Legendre polynomials

Pi(x) = :^J^(^' - ly. / = 0,1,2,..., (10.8.6)

and the corresponding eigenvalues are A,/ = /(/ + 1). The eigenfunctions of Eq. (10.8.3) are the Hermite functions 2 u,(x) = H,(x) exp(^), V = 0,1, 2,..., (10.8.7)

where H^ is the Hermite polynomial given by

H,(x) = (-I)^exp(x2)£^[exp(-x2)]. (10.8.8)

The eigenvalue corresponding to u^, is k^, = 2(v + ^). In the theory of special functions, it can be proven that the Legendre polyno- mials form a complete orthogonal set in C2(—IA) and the Hermite functions form a complete orthogonal set in C2{'-oo, 00) (F/ and w^, are not normalized as given). By inspection, one can show that P^ix) has exactly / zeros in the interval (—1,1) and My has exactly v zeros in the interval (—00, 00). Moreover, the eigenvalues corresponding to Pi and u^ are ordered in the sequence

-00 < Xo < Ai < X2 < (10.8.9)

Thus, even though the Legendre and Hermite operators are singular self-adjoint Sturm-Liouville operators, they have a complete set of orthogonal eigenvectors and their eigenfunctions and eigenvalues obey the theorems proved for regular, self-adjoint Sturm-Liouville operators. Still another such case is given in Exam- ple 10.7.3. It is from examples like these that one says energy is quantized in the quantum mechanical formalism; i.e., it takes on only discrete values. ••• EXAMPLE 10.8.1. The vibrational modes A, of a circularly symmetric annular membrane with fixed boundaries satisfy the equation 1 d / du\ Lu = Ir—)=XM, a a. Physically, the eigenvalues in Eq. (10.8.10) are A = a//v^, where co is the frequency of oscillation and v is the transverse wave velocity for the membrane (note that Eq. (10.8.10) derives from the one-dimensional wave equation in circular coordinates where M(r, t) = u{r) exp(/ft)0)- The above equation is singular if tz = 0, because s{r) = p{r) = r. A fundamental system for LM — AM = 0 is Wj = J^iy/Xr) and M2 = NQiVXr), where JQ is the Bessel function J^, for f = 0 and NQ is the Neumann function (Bessel function of the second kind) N^ for v = 0. If we introduce the notation

? = -, K = -, M = A«^ (10.8.11) a a SPECTRAL THEORY OF SINGULAR STURM-LIOUVILLE OPERATORS 479

Eq. (10.8.10) becomes

LM = -i-^f^^)=/xM, 1

uiH) = cM^^) + c^N^i^H) (10.8.13)

for which the boundary conditions in Eq. (10.8.12) (for nonzero a) give the fol- lowing conditions for the eigenvalues /x:

0 = Ci7o(v^) + c^N^i^) (10.8.14) 0 = C^JQ(^K) + C2NQ{^K).

Equation (10.8.14) yields a nontrivial solution for the constants c^ and C2 if and only if

= 0. (10.8.15)

Notice that the redefined eigenvalues fi depend only on /c, the ratio of b/a. The expHcit dependence on a comes in the definition of /x. We can solve for the eigenvalues of a given K by expanding Eq. (10.8.15):

JQ{^)NQ{^K) = NQ{^)JQ{^K). (10.8.16)

Consider, for example, the case where /c = 1.5. In Fig. 10.8.1, we have plotted both the left- and right-hand sides of Eq. (10.8.16). Here, as in Example 10.7.1, we will solve for the values of fx using the Newton-Raphson method, taking our

0.15

0.10

0.05 h

/(v^) 0.00

-0.05 h

-0.10

-0.15

FIGURE 10.8.1 480 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

initial guesses directly from the intersecting curves in the figure. Using the initial guesses {6, 12.5, 19} for ^, the first three resulting eigenvalues are

{v^} = {6.27022,12.5598,18.8451,...}. (10.8.17)

The eigenfunctions are thus as in Eq. (10.8.13) with the coefficients Cj and c^ from Eq. (10.9.76) given by

For the special case a = 0, as mentioned above, Eq. (10.8.12) is singular. Physically, this case represents a circular membrane (drum). If we introduce the following new notation

? = ^ and ii = Xb^, (10.8.19) b Eq. (10.8.10) for the singular problem becomes

1 d / du\ Lu = TZ\^-7r] =^tJiu, 0 < ^ < 1, M(0) = w(l) = 0, (10.8.20) ^ d^ \ a§ / with the eigenfunctions M(^) given in Eq. (10.8.13). Note, however, that the bound- ary condition M(0) = 0 can only be satisfied if Cj = C2 = 0 since JQ(0) = 1 and ^o(^) -^ — oc. Thus, with the singularity at ^ = r = 0, the operator L defined in Eq. (10.8.20) has no eigenvectors. We can, however, alter the boundary conditions in Eq. (10.8.20) to make the problem nontrivial if we require that

M'(0)=0 and M(1) = 0. (10.8.21)

This case represents physically the condition that the membrane at the center of the circle (§ = 0) is free to move but is bounded at the edge (^ = 1). Since /o(0) = 0 and NQ{0) —> oo, we can set Cj = 1 and C2 = 0 in Eq. (10.8.13). The second boundary condition in Eq. (10.8.21) then leads to the equation

/O(VM) = 0. (10.8.22)

Thus, we see that the ^ can only have values of the zeros of /Q, namely,

• • I (V^} = {2.40483, 5.52008, 8.65373, 11.7915, 14.9309,...}. (10.8.23)

••• EXAMPLE 10.8,2. Consider the problem of diffusion of carbon into an iron cannonball initially free of carbon. This process is often conducted in order to case-harden (i.e., produce a steel casing on) cannonballs. Suppose the cannonball is a sphere of radius r = b. Initially, the cannonball contains no carbon, but at r = 0 it is contacted with a carbon source that keeps the surface concentration at the constant value C^. We want to find the concentration distribution of carbon as a function of time. SPECTRAL THEORY OF SINGULAR STURM-LIOUVILLE OPERATORS 48 I

Assuming that the concentration of carbon in the cannonball depends only on the radial distance r from the center of the ball, the concentration obeys the diffusion equation dC Id/ ^dC\

where D is the diffusivity of carbon in iron. The initial and boundary conditions for this problem are

C(r, 0) = 0 (10.8.25)

and

C{bj) = C, and —(0,0=0. (10.8.26) dr It is convenient to define C(t\ t) = C(r, t) — Q, such that C still satisfies Eq. (10.8.24) and the initial and boundary conditions

C(r, t-0) = -C, C{r = fo, 0 = 0, —(r =bj) = 0. (10.8.27) dr The Sturm-Liouville operator 1 d / ^du\ Lu = —o-r-l^ -T-) (10.8.28) r^ dr\ dr / is self-adjoint for the boundary conditions in Eq. (10.8.27). Thus, we wish to solve the eigenproblem

—r —(r^—) = A.M, ueC2{0,b;r^), (10.8.29) r^ dr\ dr / with uQ)) — 0 and du(0)/dr = 0. Using the transformation

u(r) = , r the eigenproblem becomes _dhv 2 =kw, w€C2iO,b), (10.8.30) ~'d? with

w(b)=0 and w{Q)=0. (10.8.31)

The boundary condition for w(0) comes from the physical requirement that C < oo at the center of the cannonball and is equivalent to the condition dC/dr = 0 at r = 0. Equations (10.8.30) and (10.8.31) define a regular, self-adjoint Sturm- Liouville eigenproblem in £2(0* b). Thus, the eigenfunctions <^„, n = 1,2,..., for this problem form a complete orthonormal set in £2(0, b). Likewise, since }l/„{r) = n{r)/r is an eigenfunction of the operator in Eq. (10.8.29), it follows that ^n{r)lr, n = 1, 2,..., is a complete orthonormal set in £2(0* b\ r^). 482 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

The eigenfunctions and eigenvalues of Eq. (10.8.30) are

2 , nnr 0„(O = A/^sin—, n = l,2, (10.8.32) '" = (T) ' Thus, if Eq. (10.8.24) is expressed as ac = -DLC, (10.8.33) dt its solution is

C = exp(-f DL)Co, (10.8.34)

or

C(r,r) = C, + J]exp(-?D(^—j j^l-^sm —

& Jljb . nnr' X / —;—sm (10.8.35) h r -i-cy'dr ^ ^ ^. ..n { /^^\^^\26 . nnr = C.+C.E(-l)"exp(-(-) />^)—sm—.

In Fig. 10.8.2, we show C/C^ versus t (in units of b'^jD) for r/h = 0.9, 0.5, III and 0. A Mathematica program is included in the Appendix for this example. EXERCISE 10.8.1. Repeat Example 10.8.2, but for an initial concentration of • • • C(r, r = 0) = 0.2Q(r - fc)V6^ ILLUSTRATION 10.8.1 (Stokes' Oscillating Plate). Consider the problem of an incompressible Newtonian fluid between two parallel plates separated by a dis- tance / as shown in Fig. 10.8.3. Keeping the top plate fixed, the bottom plate

l.V j r/b = O^^^^...^...-—-'-'- 11^^^==*^ 0.8 -j /r/6==0.5/y^ 0.6 H C/Cs / // 0.4 1 / A/^=^ ~

0.21 -l

n n i y y^ 1 1 1 1 t).0 0.1 0.2 0.3 0.4 0.5 Dt/b'^

FIGURE 10.8.2 Time dependence of the concentration profile in Example 10.8.2 (Eq. (10.8.35)). SPECTRAL THEORY OF SINGULAR STURM-LIOUVILLE OPERATORS 483

X

-^-z Oscillation FIGURE 10.8.3

oscillates in the z direction (parallel to the top plate) with velocity v = VQ sin cot. We desire to find the velocity profile of the fluid as a function of its perpendicular distance away from the moving plate {x direction). For a Newtonian fluid, the velocity in the x direction obeys the Navier-Stokes equation, which for this problem reduces to

3i; _ 9^; (10.8.36)

where the kinematic viscosity is given by v = ix/p. The initial condition and boundary conditions are

i;(r =0,jc)=0 v{t > 0, jc = 0) = 1^0 sin o)t (10.8.37) v{t >0,;c =0=0.

It is convenient to begin by defining the dimensionless quantities

H^ / (10.8.38) u = •^0 X s = (Ot,

(10.8.37) become du a—T-, (10.8.39) 9T ^ a?2 and

U(T = 0,?) = 0 u(x >o,? = 0) = sinr (10.8.40)

M(T >0,| = 1) = 0,

where we define a = V/(CD/^). 484 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

The next step is to split the solution into a steady-state part and a transient part

w(r, ^) = UsiT, ^) + Uj(T, ?). (10.8.41)

We now characterize the steady-state solution as a "pure oscillation" for which we assume a solution of the complex form

Ms = /(§)exp(/r). (10.8.42)

Here we specify the true velocity as the imaginary part of Eq. (10.8.42), i.e., u^ = Im{/(jc) exp(/r)}. This particular method is very common in physics and often, as in this case, offers the only viable method of obtaining a closed-form solution. We can now solve for the steady-state solution by noting that at large time the transient solution, by definition, is 0. Substituting Eq. (10.8.42) into Eq. (10.8.39) gives

^ = r/(^)- (10.8.43)

(i) Show that the function

exp(-(l/V2^)(l + /)(1 - ^)) - exp((l/V2^)(l + /)(1 - ^)) /(?) exp (~(1/V2^)(1 + /)) - exp((l/x/2^)(l + /)) (10.8.44)

is a solution to Eq. (10.8.43) and that the corresponding steady-state solution sat- isfies the two boundary conditions in Eq. (10.8.40). Derive an expression for the true steady state velocity profile U^{T,^). Plot this solution for several values of 0 < r < 27r and of = 1. To solve for the transient solution, we first note that it, too, is a solution to Eq. (10.8.39):

—L=a—1. (10.8.45)

However, we need the transient solution to approach 0 as r ^- oo. We accomplish this by requiring the initial condition

Uj(T = 0, §) = -U^(T = 0, §). (10.8.46)

Equation (10.8.46) ensures that the initial condition in Eq. (10.8.37) is satisfied. Noting that at ^ = 0 the steady-state solution gives the correct velocity for all times, we can impose the boundary conditions

UJ(T > 0, 1 = 0) == 0 (10.8.47) Mj(r > 0,^ = 1) == 0. (ii) Defining the operator J2„ Lu = (10.8.48) SPECTRAL THEORY OF SINGULAR STURM-LIOUVILLE OPERATORS 485

we note that

Lu = -a-j^ = Xu, w(0) = 0, w(l) = 0, (10.8.49)

defines a regular, self-adjoint Sturm-Liouville eigenproblem in £2(0* 0- Show that the functions

M„(^) = \/2sinn7r?, X„ = n^n^a, (10.8.50)

are eigenfunctions of L with the corresponding eigenvalues given above for n = 1,2,.... Prove that there are no eigenvalues that are negative or 0. (iii) In operator form, we can write Eq. (10.8.48) as au —• = -LU, (10.8.51) OT which yields the formal solution

U(r) = exp(-rL)U(0). (10.8.52)

Using the spectral resolution theorem, show that

00 WT(T,§) = Y,a„u„{^)exp{-n^7t^ax). (10.8.53)

Use the initial condition in Eq. (10.8.46) to solve for the parameters a„. Plot the transient solution for the case a = 1 for several cycles of r. After approximately how many cycles can we neglect the transient behavior? With the four cases given by Eqs. (10.8.2), (10.8.3), (10.8.10), and (10.8.29), it seems tempting to conclude that singular self-adjoint Sturm-Liouville operators also always possess a complete set of orthogonal eigenvectors. This, alas, turns out not to be the case. Consider the self-adjoint problem

LM = —3-2=AM, U e C2(—00,00). (10.8.54) d^ This is the quantum mechanical problem for the energy k and wavefunction u of a particle in free space. A fundamental system for LM — AM = 0 is

Ml = exp(iVXx) and M2 = exp(—/VX;c) (10.8.55)

(an equally good set would be MJ = sin VAJC and M2 = cos VXA:). The problem with Eq. (10.8.55) is that there is no linear combination,

M = Cj exp{i^/kx) 4- C2 exp(—/VXJC) (10.8.56)

for which u would be square integrable in C2(—oo, 00). Thus, the operator L defined by Eq. (10.8.54) has no eigenvector. On the other hand, Eq. (10.8.56) satisfies Eq. (10.8.54) for \/A = k, where k is any real number. We say that A^ = k^ (k any real number) is the continuous spectrum of L. Although the operator L does not have eigenvectors, it turns out it does have a spectral resolution. 486 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

Let US consider a function h{x), which has a Fourier transform; i.e., the func- tion h{k) defined by 1 r~ h{k) = —= / exp{ikx)h(x)dx (10.8.57) lln exists. From the theory of Fourier transforms, it is known that 1 r^ ~ h{k) = -j= \ h(k)Qxp{-ikx)dk. (10.8.58)

Insertion of Eq. (10.8.57) into Eq. (10.8.58) yields

h{x) = —l / QX^(-ik{x-y))dk\h{y)dy, (10.8.59) In }- and since Eq. (10.8.59) holds for any function having a Fourier transform, it follows that the quantity in square brackets must equal the Dirac delta function, i.e., 1 r^ 5(x - j) = —- / exp(-/A:(x - y)) dk. (10.8.60)

Suppose u G P^. Then, since u = lu, it follows that

oo h{x - y)u{y) dy / -oo I -oo i.oo (10.8.61) = — / / exp ( — ik{x — }'))w(j) dk dy Lu J—oo ''—oo and oo r J -oo n / — / k^exp{-ik{x - y))dk \u{y)dy. (10.8.62) The quantity in square brackets on the right-hand side of Eq. (10.8.62) is a kernel of an integral operator representing the operator L. If we denote this kernel by l{x,y), we have 1 r^ /(jc, y)^— e exp(-/i^(jc - y)) dk, (10,8.63) ZTT •^—oo which is analogous to the spectral resolution

L = E^A*I or Kx,y) = Y.KnixWniy) (10.8.64) n n of a regular self-adjoint Sturm-Liouville operator. (Note that the summation is replaced by an integral in Eq. (10.8.64).) The generalization of the spectral decom- position for the operator function /(L) is 1 r^ fix, y) = — fih) cw{-iHx - y)) dk (10.8.65) ZTZ •/—oo (where A,^ = k^) for any function f{t) that is defined for /(A^). /(JC, y) denotes the kernel of the integral operator representing /(L). SPECTRAL THEORY OF SINGULAR STURM-LIOUVILLE OPERATORS 487

EXAMPLE 10.8.3. Consider the problem

Lu = ——- =i — ^ u £ C2i—oo,oo), (10.8.66)

and u(x, 0) = UQ(X) 6 C2{—oo, oo). The problem can be solved formally as

u = exp(-z7L)Uo, (10.8.67)

or with the aid of the spectral decomposition theorem

u{x) = — / / exp[—/[fc(jc - y) -\- tk^]]uQ{y)dkdy ^"^ ~ ~ (10.8.68) 1 f^ = —- / expiitk^) exp(~-ikx)riQ(k) dk. 2n J-oo Thus, the solution to Eq. (10.8.66) is an integral transform. Interestingly, the solu- tion corresponds to the process of Fourier transforming the initial concentration, followed by "time propagation" (through multiplication by the factor exp(itk^)) and concluded by a Fourier transformation back to "real space." If the boundary conditions for L were such that L is self-adjoint in a finite interval [a,b], the solu- tion to Eq. (10.8.66) would be

u(x) = J]exp(/X„O0„(^){^n. «o). (10.8.69) n

Equations (10.8.68) and (10.8.69) are quite similar. The eigenfunction 0„(JC) is analogous to exp(—/^jc), the inner product (0„,Uo) is analogous to MQ(A;) = (I/ITT) f^^exp(ikx)uQ(x)dx, and the factors expiitk^) and exp(itXJ are equiv- alent. The major difference is that the summation in Eq. (10.8.69) is replaced by an integral in Eq. (10.8.68). In fact, the next exercise shows just how similar the spectral representations are in Eqs. (10.8.68) and (10.8.69). EXERCISE 10.8.2. Consider the problem cPx Lu = —-7-3 = ^w, —a < X < a, (10.8.70) "dx with the periodic boundary conditions M(—a) = u{a) and u'{—a) = u^{a). Find the spectral representation of L for finite a. Show that in the limit that a -> 00, • • I the spectral decomposition of L becomes that given in Eq. (10.8.63). For some other examples of the decomposition of the identity in terms of the continuous spectra of singular self-adjoint Sturm-Liouville equations, consider the following: d^u Lu = 0 < J: < 00, 1 £2(0,00), (10.8.71)

for the cases (a) M(0) = 0

(b) H'(0) = 0 (10.8.72) (c) M'(0) = aM(0), a>0. 488 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

The corresponding decompositions of the kernels of I are 2 /"^ (a) i{x, y) = S(x — y) = — I sinA:jc sinkydk 2 r^ (b) i(x, y) = S(x — y) = — coskx cosky dk n Jo (10.8.73) 2 f'^/ Of \ (c) i{x,y) =z 8{x — y) = — / (COS^A: + — sin^x) 7t JQ \ k / /y \ 1^^ (cos ky -\- - sin ky) -x ^ dk.

wmmm EXERCISE 10.8.3. Derive from the results in Eq. (10.8.73) the spectral reso- • • I lution of L for the above three cases. From what we have examined thus far, one might hope that singular self- adjoint Sturm-Liouville operators have either only discrete spectra (when the eigenfunctions form a complete orthogonal set) or continuous spectra. Unfortu- nately, this hope is also soon dashed. Consider case (c) in Eq. (10.8.72), but now assume that a < 0. The function sfloitx^iax) is then an eigenfunction (with eigenvalue X^ — a^) of the operator defined by Lu — —d^u/dx^ and M'(0) = aM(0). Note that when a > 0 the function exp(ajc) is not square integrable on (0, oo) and thus it is not an eigenfunction. The continuous spectra are the same as for case (c). Thus, the spectral decomposition of I when a < 0 is the same as in Eq. (10.8.73) with an addition term:

/(x,y) = (5(x —y) = 2aexp(a(A: + y)) 2 r/ , « . , ^/ , « . , \ ^' ., (10.8.74) + —- / I cos/cjcH- —smArx If cosKy-j- —sm/cy )-^r -dk n Jo \ k /V k /k^-{-oL^ Thus, the operator has both discrete and continuous spectra. As another example, consider the following problem from quantum mechanics:

h^ d^u -^ + V{x)u = Eu, ue A(-oo, oo), (10.8.75) 2m dx^ where the potential energy function,

V{x) = VQ, X < —a = 0, -a

= Vo' ^ > ^'

is shown in Fig. 10.8.4. a and VQ are positive and the eigenvalue E is the energy of a particle. This is a singular self-adjoint problem corresponding to a particle in a space contain- ing a potential well (often called a rectangular well because of its symmetry about X = 0). The eigenvalues E are bounded from below by 0. Since |M(JC)P represents the probability density that the particle can be located between x and x -{-dx, the mathematical requirement that /^ \u{x)\^ dx < oo, or u G C2{—OO, OO), means SPECTRAL THEORY OF SINGULAR STURM-LIOUVILLE OPERATORS 489

Vix)

Vo

->- X —a FIGURE 10.8.4

in quantum mechanics that the particle is in bound state. Formal solutions to Eq. (10.8.74) that are not in C2i—co, oo) are called continuum states. Outside the potential well (i.e., when |;c| > a), we need to solve the equation

2m — K^Ui = 0, |;c| > a and K^ = -^(VQ- E). (10.8.77)

Inside the well, we need to solve the equation

(fu2 2m -\-KIU^ =0, |;c| < a and /c-j = yr^- (10.8.78)

From these two equations, the solution to Eq. (10.8.76) is obtained by requiring u^{x) — u^ix) and u\(x) = u'^ix) aX x = —a and a. Since the eigenfunctions must be square integrable in (—oo, oo), i.e., u G C2(—oo, oo), we can expect to find a solution to Eq. (10.8.76) only if /c, > 0, or £" < VQ. In this case, a fundamental system for Eq. (10.8.77) is

QxpiK^x) and exp(—^jjc). (10.8.79)

Likewise, a fundamental system for Eq. (10.8.78) is

sinAc:2JC and cos/i:2^. (10.8.80)

For X > a, only the fundamental solution exp(—ACJJC) can be chosen since cxp(K^x) —> oo as jc —> oo. Correspondingly, for x < —a, only the fundamental solution exp(^ijc) can be chosen since exp(~K^x) -> oo as jc ~> —oo. Thus, one solution to Eq. (10.8.78) is

Aexp(fCiJc), —oo < jc < —a, u^^\x) BsinK2X, —a

Another solution is

Dtxp{Kix), —oo < X < —a, u^''\x) = FC0SK2X, —a

These two solutions are equivalent to the more generally stated solution

c^expiKix), -00 < jc < —a, u{x) = { C2 sin K2X + C3 cos K2X, —a

C expi—K^a) = Bs'mK2a and — /CjCexp(—/c,a) = K2BcosK2a, (10.8.84)

and so the energy eigenvalues are determined by

/Ci = —K2COtK2a (10.8.85)

and the coefficients C and B obey the equation

C — — txp{K^a) sin/C2fl. (10.8.86) B Similarly, we find for solutions of even parity that the eigenvalues are deter- mined by

K^— K2 tan K2(i (10.8.87)

and the coefficients obey

— = exp(^ia)cos/C2a. (10.8.88)

If we use the relation KI -\- KI = (2m/h^)VQ to eliminate fCj from the squares of Eqs. (10.8.85) and (10.8.87), and define the dimensionless quantities

2ma^ a = K^a and p' = •Vo, (10.8.89)

we can transform Eq. (10.8.85) into

a^csc^a = 0^ (10.8.90) SPECTRAL THEORY OF SINGULAR STURM-LIOUVILLE OPERATORS 491

and Eq. (10.8.87) into

op- sec^ a = p> (10.8.91)

The positive roots of a^^^ of Eq. (10.8.90) determine the eigenvalues of the eigen- functions (p^^^ of odd parity and the roots oif^ of Eq. (10.8.90) determine the eigen- values of the eigenfunctions 0^"^ of even parity. Acceptable roots of Eq. (10.8.78) have to meet the requirement that tan/c:2« is positive, and those of Eq, (10.8.91) have to meet the requirement that cot K2a is negative. From the above definitions, the energy £"„ is related to a root a„ by

fi2 ^. = (10.8.92) 2ma If £•„ > Vo, there is no solution to the problem because K^ would be imaginary and neither expiK^x) nor exp(—^,jc) would be in the Hilbert space C2{~oo, oo) because it would not be square integrable. Thus, there are eigenfunctions only for solutions to Eqs. (10.8.90) and (10.8.91) for a < p. Figure 10.8.5 shows how the quantum mechanical energies E vary with the depth VQ of the potential well. Notice that, for values fi below n/2, there is only one eigenfunction or bound state for a particle in a potential well. With increasing p, or well depth, the number of eigenfunctions increases. However, for any finite value of VQ there will be a finite number of eigenfunctions. Therefore, the eigenfunctions in a finite potential well cannot form a basis set in C2{—oo, oo). Consider next the situation when the particle energy E is greater than VQ- We then define K^ = (2m/h^)(E - VQ) and AC| = {2m/h^)E and construct the eigenproblems

-\-K^Ui — 0, \x\ > a, (10.8.93) lb?

/32^2rM

10.0 -=^[^ FIGURE 10.8.5 Energy levels for the bound states of a particle in a rectangular well. The full lines and broken lines refer to states of even and odd parity, respectively. 492 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

and

^ -f KIU^ = 0, \x\ < a, (10.8.94) ax applying the continuity conditions u^{x) = ii2{x) and u\(x) = U2(x) aX x = —a and a. In terms of the fundamental systems exp{±iKix) and cxp{±iK2x), we can express the solution as

-IKxX L2 r -oo < j^ < —a, u{x) = C3 ^"^2^ + C4 ^-''^2^, -a < X < a, (10.8.95) C5 ^'"••^ + C5 ^~"^'^, a < X < 00. For arbitrary r3, C4, /Cj, and /C2, the continuity conditions at jc = —A can be used to determine Cj and C2 as linear combinations of c^ and Q. Furthermore, the continuity conditions oX x = a can be used to determine c^ and c^ as a linear combination of c^ and C4. Thus, Eq. (10.8.92) has a solution for any value of E greater than VQ. However, the solution is not square integrable, and so the values of E lying in the range VQ < E < 00 represent the continuum spectrum for the eigenproblem defined by the particle in a finite potential well. The electronic states of the hydrogen atom represent another example. In that case, the potential has the form V(r) = —e^/r, and there exist an infinite number of orthonormal eigenfunctions for M < £ < 0, |M| < 00, but they do not form a complete set. The energy range 0 < £ < 00 represents the continuum spectrum for the electronic eigenstates (called the scattering states) of the hydrogen atom. We see from the special cases examined in this section that a singular self- adjoint Sturm-Liouville operator may have a complete orthonormal set of eigen- vectors, may have no eigenvectors (and thus only continuous spectra), or may have some orthonormal eigenvectors (even an infinite number) that are not a complete set and thus possess both continuous and dicrete spectra. So what is the most gen- eral theorem that can be stated for self-adjoint differential operators? At the end of this section, we will answer this question with a theorem that we will give with- out proof. Let us first reexamine the spectral decompositions given above for the operator Lu = —(fu/dx^ in (—00, 00) and (0, 00). In the case of a self-adjoint Sturm-Liouville operator in £2(0' ^^^ ^) ^^ C2(~oo, 00; s) having a complete set of eigenfunctions 0„, the spectral resolution of I and /(L) is

00 iix,y) = Y,nix):iy) (10.8.96) « = 1 00 fix, y) = E fa„)'P„(x):(y), (10.8.97)

where i{x, y) is the kernel representing I and f(x,y) is the kernel representing /(L) when f{t) exists for r = A„, n = 1, 2, For the operators defined by Eq. (10.8.54) and Eqs. (10.8.71) and (10.8.72), the spectral decomposition of I and /(L) is of the form

/.oo iix,y)= ui^{x)uliy)dk (10.8.98) SPECTRAL THEORY OF SINGULAR STURM-LIOUVILLE OPERATORS 493

and

f{k,)u,(x)ul{y)dK (10.8.99)

where k^ = —oo for the operator defined by Eq. (10.8.54) and fcg = 0 in the other three cases. The quantity A.^^ corresponds to LM;^ = ^jt"it» although the functions Uj^ are not eigenfunctions since they do not belong to X>^ (which in these cases means they are not square integrable). For the self-adjoint operator having a complete set of eigenfunctions, the relation u = lu = Yin ^n^l^ corresponds to a Fourier expan- sion of u in terms of {u„}. For the self-adjoint operator having only continuous spectra, the relation u = lu = fUf^uludk is an integral transform. In the examples given by Eq. (10.8.74) and by the particle in the potential well, the operators have both discrete spectra (eigenfunctions) and continuous spectra. These correspond to the most general case, which can be stated in the following theorem:

THEOREM. If L is a linear self-adjoint operator in some domain V^ in a Hilbert space, then the identity operator I can be represented by the spectral decomposition

i(x. y) = J2nM€(y) + / " u,ix)uliy)dK (10.8.100)

where the 0„ are eigenfunctions of L and the Uf^ are functions satisfying Luj^ = ^k^k (^k ^yi^S ^^ ^he continuous spectra of L). Depending on the operator, /:„, |A;„|, and \ki\ can be finite or infinite.

The proof of this theorem lies outside the scope of this text and advanced works on differential equations or functional analysis should be consulted for those interested. Equation (10.8.100), through the relation /(L) = /(L)I, yields

fix, y) = iZ f(K)(t>nM(l>:(y) + f " f{X,)udx)ul{y)dk (10.8.101)

if f{t) exist for t = X^ and A.^ values. Here f{x,y) is the spectral decomposition of the integral operator representing /(L). It is important for the reader to remember that in the weighted Hilbert space C2{a, b\ s) the inner product is u^v = (u, v> = /^ u''{x)v{x)s{x)dx, and so

lu = / i{x, y)s{y)u{y) dy (10.8.102)

and <

/(L)u= / f{x,y)s{y)u{y)dy, (10.8.103) 494 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

10.9. PARTIAL DIFFERENTIAL EQUATIONS

Although this chapter is primarily dedicated to ordinary differential equations, we will touch briefly in this section on partial differential equations. One reason for mentioning partial differential equations here is that frequently, by separation of variables, they can be solved by solution of ordinary differential equations. For example, diffusion or heat transfer in a solid rectangular parallelepiped obeys the equation

1 du Lu = —, (10.9.1) D dt where

,„_V^„_(^_ + _ + _j, aO.9.2)

D is the molecular or thermal diffusivity (for physical reasons D > 0), and t is time. Suppose the boundary conditions on the solid are

M(0, y,z) = «(a, y- z) = 0 du M(JC,0, b, z) =: (X, z) = 0 (10.9.3) du du , U,y,0) = —(X,3;,C:)=0. 9z 9z In a typical mass or heat transfer situation, the initial concentration or temperature is given, i.e.,

M(JC, J, Z, ? = 0) = u^{x, y, Z). (10.9.4)

Since the boundary conditions are separable (i.e., the constraints are on u in the A:, y, and z directions separately), the method of separation of variables can be used. By this method, we choose the form of the solution as M = X{x)Y(y)Z{z). We then define the operator L from the differential expression in Eq. (10.9.2) and the boundary conditions at Eq. (10.9.3), and seek the solution to the eigenproblem

Lu = Xu. (10.9,5)

Inserting u = XYZ into Eq. (10.9.5) and rearranging leads to

X\x) Y'\y) Z'iz) = X, (10.9.6) X(x) Y(y) Z(z) where the double primes on X,Y, and Z indicate second derivatives. Since the three quantities on the left-hand side of Eq. (10.9.6) depend only on jc, y, and z, respectively, it follows that the three quantities are constant, i.e.,

-X" = k^'^X, -Y" = X^y^Y, -Z" = X^^^Z, (10.9.7) PARTIAL DIFFERENTIAL EQUATIONS 495

where X = A^^^ + A^^^ + A^^\ The boundary conditions for the three eigenproblems in Eq. (10.9.7) are

X(0) = X(fl) = 0 r(0) = Y'{b) = 0 (10.9.8) Z'(0) = Z'{c) = 0,

and thus all three equations in Eq. (10.9.7) are regular, self-adjoint Sturm-Liouville equations. The eigenfunctions and eigenvalues of these equations are

X„ = J-sin , A^^^ = { —) , /T = l,2,...

/T n;rjc ,. /nn\^ Z.• =:J-coV c s c , ^^^r=m'" ' 'V c ^/ . " = ..2,. (10.9.9) Since {X^}, {F„}, and {Z^,} are complete orthonormal sets in Cii^.a), €2(0, b), and £2(^' <^)' respectively, the set

0,„^ = X,Y„Z^, m, fi = 1, 2,..., p = 0, 1, 2..., (10.9.10)

forms a complete orthonormal set in £2(^3)» where ^3 is the volume of the rect- angular parallelepiped defined byO

Thus, we have constructed the eigenvectors and eigenvalues of a regular, self-adjoint, three-dimensional Sturm-Liouville operator from the eigenvectors and eigenvalues of three regular, self-adjoint, one-dimensional Sturm-Liouville operators.

The formal solution to Eq. (10.9.1) is

u=:exp(-rDL)Uo, (10.9.12)

and so, from the spectral decomposition exp(-/DL) Yl exp(-/DX„„,)0„„^0Lp. (^0.9.13) m,n, p 496 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

we find

oo uix,y,z)= E txpi-tDX„„p)X„ix)Y„{y)Z„iz) •n.n.p^l (10.9.14) /.fl pb pC X / / / ^mixi)Y^{yi)Z (zi)uQ{X],yi,Zi)dx^dyydzi. JQ JO JQ The operator L defined by Eqs. (10.9.2) and (10.9.3) is an example of a reg- ular, selfadjoint, multidimensional Sturm-Liouville operator. For the general n- dimensional case, consider the finite volume Q„ and surface d^„. ^„ will denote the volume Q„ plus its boundary, i.e., ^„ = Q,^ + 9^„. The differential expression defining the regular Sturm-Liouville operator is

where 3c is an /i-dimensional Euclidean vector with Cartesian components {j^i,.. .,xj and (i) Pi J, s, and q are real-valued functions of x and p^j = pj^', (ii) Pijj C^(^J,^ six), q(x) e C«(^J; (iii) s(x) > 0 for 3c e ^,; (iv) Yllj=\ Pij(^)^i ^ ^QT!i=i^i^ for all 3c in Q^ arbitrary real numbers ^1, ...,§„; CQ is a constant greater than 0. This condition is that the matrix [p,y(3c)] is positive definite for all 3c € ^„. This operator is formally self-adjoint in the Hilbert space £2(^n*' ^)^ since it can be shown by integration by parts that V ^ L. The boundary conditions for the operator L are denoted by

5M =0, 3cea^„. (10.9.16)

Thus, the domain Vj of the Sturm-Liouville operator defined by Eqs. (10.9.15)- (10.9.16) is

Vi^ = {u, Lu G £2(^«; s). u 6 C^(^„); 5M = 0, 3c G a^J. (10.9.17)

To derive the boundary conditions for the adjoint L^ we need to use the Gauss theorem

/ u(x)v(x)dS„= f Vu(x)dQ„, (10.9.18)

where V is the /i-dimensional gradient operator and v{x) is the outward-pointing normal at the point x locating the element dS„ of the surface dQ„. In Cartesian coordinates, the directional vectors are orthogonal and we can construct the Gauss theorem for individual coordinates as follows

/•••/ ^^dxr"dx„:= f^'^f u{x)v.{x)dS,, (10.9.19) J JQ„ OX: J JdQ„ PARTIAL DIFFERENTIAL EQUATIONS 497

where v, is the component of 0 in the direction of jc,. Using Eq. (10.9.20) and (v, u) = f^ v*(x)u(x)s(x) dQ„, we find

(V, Lu> - (L^, u> = f J2 PiM^'^ - ''^y) ^^- (10.9.20)

where M^. and v^_ denote the partial derivatives of u and v with respect to Xj. Thus, the domain V^^ is defined by the condition

/ E PIMKJ^ - "x/*) dS„ = 0 for all u € Vj. (10.9.21)

If the domain D^t of vectors v defined by Eq. (10.9.21) is equal to P^, the Sturm- Liouville operator will be self-adjoint since L = LK Some examples for which the multidimensional Sturm-Liouville operator is self-adjoint are: (i) M(jc) = 0 for jc € dQ,,. (ii) u^Xx) = 0, 7 = 1,..., n, for Jc € dQ„. (iii) /?M = 0 for 3c € 9fi„, where Ru = Jllj=\ Pij(^)i*x^j' (iv) Ru -f aw = 0 for x e 8fi„, a real and a(x) G C^(9^J. Since condition (i) places no constraint on u^,, it follows from Eq. (10.9.21) that v = 0 for 3c € 9fi„ for this case. Similarly, condition (ii) places no constraint on u and so Eq. (10.9.21) implies that v^, = 0, 7 = 1,..., n, for 3c € dQ„ for this case. Equation (10.9.21) can be expressed in the form

f (uRv* - v*Ru)dS„ = 0 (10.9.22) JdQ„

from which it follows that /?M = 0 for 3c € 9^„ for condition (iii) and Ru-^av = 0 for X € dQ^ for condition (iv). Thus, the regular, real Sturm-Liouville operators defined by Eqs. (10.9.15) and (10.9.16), with the boundary conditions (i)-(iv), are self-adjoint. One can also prove the theorem:

THEOREM. With the boundary conditions (i)~(iv), the regular, real Sturm- Liouville operator in C2{^n'^ s) is hounded from below, i.e.,

(u, Lu) > M{v, v), \M\ < 00. (10.9.23)

The proof of the theorem is more difficult than was the case in one dimension and will not be given here. We will also give here without proof the following fundamental theorem for regular, real Sturm-Liouville operators:

THEOREM. If the boundary conditions Bu — 0 for x e dQ,^^ are such that the n-dimensional regular Stunn-Liouville operator L—defined by Eqs. (10.9.15) and (10.9.16)—is a subset of Cji^n'^s), then the eigenvectors {0^} of la form a complete orthononnal set in £2(^n» *^) ^^^ ^^^ eigenvalues A^, are real 498 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

Thus, the same spectral resolution theorem

m

and

L = E^«.0m^m. /(L) = E fiK)K^L (10.9.25) ni in

when /(X^) exists, holds for regular self-adjoint Sturm-Liouville operators in any dimension. As was evident in the example given by Eqs. (10.9.10) and (10.9.11), an eigenvector in n dimensions will generally be characterized by n indices. Thus, the boldface m in the above theorem indicates that n indices are required to specify the mth eigenvector. One route to proving the theorem is to prove that the Green's function opera- tor, G, for a regular Sturm-Liouville operator is a completely continuous integral operator. The kernel of the operator G is determined from

Lg(3c,y)=:5(jc, j), Bg{x,y) = Q forJc€a^„. (10.9.26)

Once it is well established that the Green's function for an w-dimensional regular Sturm-Liouville operator is completely continuous, then the proof of the theorem would be as easy as was the proof for the one-dimensional case. The interested reader should pursue more advanced texts for the complete proof. The situation for singular self-adjoint operators in n dimensions is the same as that in one dimension. If 5'(3c) = 0 or p{x) = 0 at some point, if |^(Jc)| is infinite at some point, or if the domain fi„ is infinite in any dimension, then a self- adjoint equation is singular. Singular self-adjoint operators can have exclusively discrete spectra (their eigenfunctions form a complete orthonormal set), exclusively continuous spectra, or a mixture of discrete and continuous. Again, the simple example

Lu = —^—2' n€ Cji—oo < Xi < oo, i = I,... ,n) (10.9.27)

provides us with a self-adjoint operator with no eigenfunctions. For this case, the functions

^k = 77-^2 expO"^ • ^), k^x = JZKx,^ (10.9.28)

obey the equation Lu^ = Xiui, where A,]^ = Z:^ = X]"^j /:? and k^ can be any real number. However, ui is not square integrable and so does not belong to C2{R„), where R„ denotes the entire n-dimensional Euclidean space. From the theory of Fourier transforms in /?„, it is known that if

m = -^^ f^ cxp{ik . x)h{x) d'x (10.9.29) PARTIAL DIFFERENTIAL EQUATIONS 499

(here dQ„ and d'^x denote the same thing), then

h{x) = j exp{-ik . x)h{k) d"k (10.9.30)

from which it follows that

(10.9.31) {lit) JR„ But this means that

= / UjtutdU (10.9.32)

and

Xj^u^-uIJU. (10.9.33)

since LI = L and Luj^ = ^k^k- This is the n-dimensional version of the one- dimensional case of the spectral decomposition of a self-adjoint operator having only continuous spectra. The Schrodinger operator for the electronic states of the hydrogen atom is an example of a singular (because H = AC^/?) ^^^ 1^(^)1 = ^^ at |3c| = 0) Sturm-Liouville operator having an infinite number of eigenvectors plus continuous spectra. The general theorem for self-adjoint partial differential equations is the same as that for ordinary self-adjoint differential equations:

THEOREM. If h is a linear self-adjoint operator in some domain P^ in a Hilbert space of functions u(x) of the n-dimensional Euclidean vector x, then the identity operator I can be represented by the spectral decomposition

I = E*n.*m + / nv^\d"k, (10.9.34) in ^n

where L(/)^(x) — ^m^^ix) and Luj^(x) = Xpiix). The 0^ are eigenvectors; i.e., they are square integrable in the Hilbert space. The u^ are not square integrable and are therefore not eigenvectors and represent the continuous spectra. The X„ and X^ are real numbers, representing discrete and continuous spectra, respectively. Again, the reader is referred to more advanced texts for the proof of this theorem. ILLUSTRATION 10.9.1 (Heat Transfer in a Laminar-Flow Pipe). As an illus- tration of the separation of variables technique, consider a fluid flowing in a cylindrical pipe as shown in Figure 10.9.1. Assuming that the flow is laminar, the velocity in the z direction can be shown to be parabolic, satisfying the formula vJr) = 2D,•{"&] (10.9.35) where VQ is the average (or bulk) velocity. Assume that the fluid enters a length of pipe at z = 0 with a uniform temperature TQ. We would like to heat the fluid 500 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

R 1. ) -— z

FIGURE 10.9.1

by holding the inside surface of the pipe (at r = R) fixed at some temperature T^, Our goal is to find the temperature of the fluid as a function of r and z further downstream. A differential energy balance on the appropriate control volume gives the well- know"' n equatiottion of change for the temperature profile .(?--)= kV^T, (10.9.36) where p is the fluid density, Cp is the fluid heat capacity, and k is the fluid thermal conductivity. We wish to solve for the steady-state temperature profile in which the time derivative is 0. Further, we note that the velocity of the fluid only has a component in the z direction. In cylindrical coordinates, the above equation for steady-state conditions is thus given by

dT [I d / dT , vAr) (10.9.37) '-) \

where we define a = k/[pCp). If the velocity is large enough or the pipe radius small enough, we can neglect the axial component of conduction; i.e., we can assume that d'^T/dz^ ^ 0. In this case, using the velocity profile in Eq. (10.9.35), the above equation becomes

and the boundary conditions are r(z = o,r) = ro

(10.9.39) dT\ = 0. dr =0 Before proceeding, it is convenient to define the following dimensionless variables: T,-T zot 0 = (10.9.40) ?^ «=«• PARTIAL DIFFERENTIAL EQUATIONS 5 0 I

for which Eq. (10.9.38) becomes

(l-?')^ = r^U^I- (10.9.41)

The boundary conditions become

0(^=0,^) = I 9(;,l = l) = 0 (10.9.42) = 0. 9? (i) We can solve this equation by using the separation of variables technique and assuming a solution of the form

0(f,?) = Z(f)S(t). (10.9.43)

Show that, with this solution form, Eq. (10.9.41) can be separated into the two coupled equations dZ , (10.9.44) -— + A,^Z = 0

The solution to Eq. (10.9.44) is simply

Z(0 = exp(-X^O. (10.9.46)

This exponential behavior in ^ seems reasonable in light of the boundary conditions for the problem. Thus, we are sure we chose the correct sign for A^. Next, we turn to the solution for Eq. (10.9.45). (ii) First, show that the differential equation in Eq. (10.9.45) represents a Sturm-Liouville eigenproblem. What are the corresponding values of 5* (5), p($), and g(^)? What are the boundary conditions? (iii) Show that the polynomial functions

CO «x(?) = Ec„t'", (10.9.47) n=0 where

Co = 1

2 ^' "~ ~T (10.9.48) '^^{i) (^«-2--«-i)^ are solutions to Eq. (10.9.45). Write an expression representing the orthogonality conditions for the u^{^). 502 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

(iv) Applying the boundary conditions for 3(^), obtain an expression for the eigenvalues X^, Use the Newton-Raphson method to solve for the first 10 lowest eigenvalues. (v) The spectral resolution theorem allows us to compose a solution from a superposition of eigenfunctions as

oo m,^) = E«««x,(?)exp(-A^O. (10.9.49)

Derive an expression for the parameters a„ using the boundary condition when • • I C = 0.

PROBLEMS

1. Find the adjoint L^ of the operator L, where

d^u d^u du ^ Lu = -—r — 2-—r —;—h 2M, 0 < X < 1, dx^ dx^ dx

and M(0) = 0 and u'{\) = 0. Give V and D^t. 2. Find the adjoints L^ of the operators L, where

d'^u Lu = —T, 0 < jc < 1, dx^ and the boundary conditions are

(a) M(0) = w(l) = M^^O) = u"{\) = 0 (b) u'{0) = u'{0) = u'"(0) = ii"{l) = 0 (c) M(0) = u(l) = u'(0) = u"{\) = 0 and M^(0) = u\\) (d) M(0) = u"{0) = u{\) = u''\l) = 0. For which set (or sets) of boundaiy conditions is the operator self-adjoint? 3. Consider the operator

Lu = exp(r^)V • (exp(-r^)VM),

where V is the gradient in three dimensions; i.e., in Cartesian coordinates, V has the form

.d .a .a V = i—+j—^k — . dx dy az

Also r = jcf + yj -f zj and r^ = x^ -}- y^ + z^ in these coordinates. Suppose the Euclidean domain for the problem is R^, namely, all of the three-dimensional Euclidean space. In what Hilbert space will the operator L be self-adjoint? Is L a Sturm-Liouville operator? Is it regular or singular? Why? PROBLEMS 503

4. Consider the differential expression

du - — -h2u, dx (a) Solve the initial value problems

Lii = 1, M(0) = 1, u\0) = 0, u'(0) = 2

and

L^i; = jc, u(0)=0, v\0) = h v\0) = 1,

where L^ is the differential expression of the formal adjoint of L. (b) Plot M(JC) and v{x) for -2 < jc < 2. 5. Give fundamental systems for LM = 0 and L^v = 0, where L and L^ are defined in Problem 4. 6. Give a fundamental system for Lii = 0, where

d'^u Lu = —d7J — qu, and ^ is a positive real constant. 7. Consider the differential operator

d^u du Lu — -—T: -\-a{x)~—\-b{x)u, a < X

— -jx^u = f(x), 0 < ^ < 1,

with M(0) = u{a) = u"{Q) ~ u"{\) = 0 has a solution. 10. Repeat Problem 9, but with the boundary conditions

M'(0) = M'(1) = u\0) = M'''(1). 5 04 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

11. Repeat Problem 9, but with the boundary conditions

M(0) = u'\0) = M(1) = u''{i) = 0.

12. Find the adjoint operator and the solvability condition for the problem

— z=f{x), 0

M(0) = M(1) = M''(0) = u'\l) = 0, u'{0) - u\l) = 2. 13. Find the adjoint operator and the solvability conditions for the problem "T1?T — M = fM, —n < X

Lu = —-7-7 — 9w, 0 < jc < 1,

15. For the operator given in Problem 14, solve the problem

Lu = fix), 0 < ^ < 1,

with M(0) = yj, u\l) = y2. Plot u{x) versus x for f{x) = x^, yj = 1, and y2 = 2. 16. Find Green's function for the operator

Lu = :z—ku, 0 < X < 1, dx^ with M(0) = M(1) and M'(0) = u\\), 17. Prove that Green's function g(r, r') for the operator

Lu = V^M -f ^^M, u, Lu € £2(^3)

is -» -*

|r-r'| where q is a. positive constant and V^ is the Laplacian in a three-dimensional Euclidean vector space. R^ means that the domain is the entire space. Hint: Show that if

u(r)= f g{r~r')f{r')d'r\

then Lu = f for an arbitrary f e £2(^3)- PROBLEMS 505

18. Consider the eigenproblem d { ^du\ Lu — I X ~- I = Xu, \ < X < e, M(1) = u(e) = 0. dx \ dx ) Show that the eigenfunctions of this equation are

(\)^{x) — (2A:)~^/^sin(n7rlnjc), n = 1,2,...,

and that they form a complete set \vi C^iy.e), What are the correspond- ing eigenvalues? Prove that (0„, 0^,) = 0, n ^m. Give the spectral decomposition of the Green's function operator G for L. Solve the problem Lu = x^ and plot u(x) versus JC for 1 < jc < ^. 19. Find the eigenvalues and normalized eigenfunctions of

Lu = r, —1 < X < 1, dx"^ where u'{—\) = cotaM(—1) and u'(\) = cot^w(l), and 0 < or < TT, 0 < p < n. The situation when a = )9 = 0 corresponds to M(—1) = M(1) = 0. Can an eigenvalue be less than 0? Prove your answer. 20. Find the eigenfunctions and eigenvalues of the operators in Problems 2 and 6 for the boundary conditions (a) and (b) in Problem 2. Do the eigenfunctions form a complete orthonormal set in these cases? Why? 21. Find the spectral decomposition of L, where

dSi Lu = —T, 0 < jc < oo, dx for the boundary conditions (a) u{0) = u'\0) = 0. (b) u'(0) = u'"(0) = 0. :. Suppose dhi Lu ~ r, 0 < JC < OO, dx^ where w(0):= Oand u e £2(0' ^- Use the spectral resolution of exp(—rL) to solve d^u du a? ~ i7' where M(0, f) = 0 and

M(JC,0) = exp(—x^).

23. Consider the operator

Lu = —-r-Tfu , 0 < JC < OO, "d? 506 CHAPTER 10 UNEAR DIFFERENTIAL OPERATORS IN A HUBERT SPACE

where cosQfM(O) — sinorw'CO) = 0, or is real, and 0 < a < TT. This problem can have discrete and continuous spectra, depending on a. The Hilbert space for this operator is ^2(0' ^' (a) Give the spectral resolution of I, L, and G = L~^ for the case 0 < a < (7r/2). (b) Give the spectral resolution of I, L, and G = L~* for the case (n/l) < a < 7T. 24. The temperature of a slab of material held between two plates and internally heated obeys the equation

d^T dT

where b is the length of the slab, a is the thermal diffusivity (thermal conductivity divided by the product of the density and the heat capacity), and /(jc) is the internally generated power divided by the product of the density and the heat capacity. Suppose one end of the slab is insulated, so that dT — (0,0 = 0, ax and the other end is at a fixed temperature, i.e.,

T{x = b,t) = T,,

Suppose also that the initial temperature distribution is given by

T{x, r = 0) = TQ(X), (a) Use the spectral decomposition of cxp(—ath) to formally solve for T(x,t), where L is the operator defined by

Lu = —-4, w(0) = 0, u\b) = 0

and u, Lu e £2(0, b). (b) Define a convenient set of units such that a = 1 and b = \. Suppose that TQ{X) = 100 and / = IOJC. Plot T versus JC for r = 0, 1, 2, 5, and 10. 25. Consider a long cylinder of radius r = a, which has an initial temperature distribution of

T{rJ = 0) = To(r).

Suppose the cylinder is immersed in a uniform heat bath that fixes the surface temperature at T^, i.e.,

Tir=a,t) = Ty,. PROBLEMS 507

From symmetry, it follows that

— =0 at r = 0. dr The differential equation for this problem is

1 a / dT\ _dT_ a I rdr \ ^ dr)'~~dt'

Using the fact that a solution to the eigenproblem

du\ 'rTr\

is

find the temperature T of the cylinder as a function of r and t. Assuming that r = 2 in. T^ = 600°C and TQ = 25^C, and that the value of a is 3.3 ft^/hr (the value of the thermal diffusivity of aluminum), calculate T(r, t). Plot T versus r for / = 10 sec, 10 min, 20 min, and 1 hr. Also, plot r(r = 0, t) versus t, 26. Suppose a can of beer at room temperature (75°F) is put in a refrigerator at 40°F. The can is 6 in. long and 2| in. in diameter. Calculate how long the beer must be left in the refrigerator for its center to reach 60°F. Assume the following:

(i) The temperature depends on the radial distance r from the center of the can and axial distance x from the bottom of the can, i.e., T = T{r,x,t). (ii) The temperature of the surface of the can is 40°F, i.e.,

T(r = 1.25 in., x,t) = T{r,x =0,0 = r(r,jc = 6in.,0=40T.

(iii) Because of symmetry, dT/dr = 0 at r = 0. (iv) The effect of the metal can on heat transfer is negligible and the thermal diffusivity of beer is the same as that of water, namely, a = 5.3 X 10-^ ft^/hr.

The differential equation for this problem is

ri a / dT\ d^Tl dT

Hint: Use the method of separation of variables to solve the problem. 508 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

27. Suppose a particle is confined to a cube of length 5 x 10~^ cm on a side. The energy E of such a particle satisfies the Schrodinger equation

2m

Confinement in quantum mechanics means i/r = 0 on all the faces of the cube (and outside the cube). Find the lowest energy of the confined particle for (a) a proton and (b) an electron. The value of Planck's constant h is 1.054 x 10"^^ ergs. For an electron m^ = 9.11 x 10~^^ g and for a proton m^ = 1836me. 28. In organic molecules having conjugated double bonds, some electrons move freely from one end of the molecule to the other (or from one end of the conjugation sequence to the other). Thus, the electrons behave approximately like particles in a one-dimensional box. When white visible light shines on such molecules, photons with the frequency V = (^26 ~ ^\e)/h ^i^l b^ absorbed, where e^^ is the ith electronic energy state. Thus, molecules with conjugated bonds can be used as dyes. Using the quantum theory of a particle in a one-dimensional box, estimate the length 8 that an organic molecule with conjugated bonds must have to absorb photons with wavelengths X = 6500 or 4000 A. These correspond to red and violet, respectively, in the spectrum of white light. 29. Consider the cannonball problem in Example 10.8.2. Suppose the initial concentration in the ball is 0 and that for r > 0 the concentration on the surface varies as C^ = h{t). Find the concentration distribution C(r, 0 for f > 0. Assume that

/i(0 = Co(l-exp(-j80).

and compute ^M/Ana^C^ versus Dt/b^ for ^D/Z?^ = 0.1,0.5, 1, and 2, where M = 47T f C(r,tydr.

M is the total amount of carbon in the cannonball at time t, 30. Consider a string fixed at its ends (jc = 0 and jc = L) but free to move in the z direction everywhere else (0 < x < L). For small vibrations, the time-dependent displacement u of the string is given by the partial differential equation:

—7 — V T-T =0, 0 < JC < L,

with the boundary conditions

M(0, 0 = M(r, L) = 0, M(0, JC) = UQ{X), M,(0, jc) = 0,

where u^ denotes the partial derivative of u with respect to r, v is the transverse wave velocity of the string and is assumed to be constant. PROBLEMS 509

X = Xo x = L FIGURE iO.P.30

(a) Use the separation of variables technique to show that the vibrational motion of the string is given by the equation

nnx nnvt u{x, 0 = 5Z ^n si^ c^s

(b) Solve for the constants a„ when the string is initially deformed as UQ{X) as shown in Fig. 10.P.30. 31. Solve the eigenproblem

= AM, 0 < ;t < :7r,

u{0) = 0, U{JT) = Xu{n). Do the eigenfunctions of this problem form a complete set? This is not easy to answer since the eigenvalue also appears in the boundary conditions. Hint'. Study the operator

'~u'{xy LU =

for vector functions.

u{x) e C^(0, 7T) and Wj constant.

Define the inner product

(U, V)= / u(x)v(x)dx-{-UiVi

for this vector space. The domain of L is completed with the requirement that

M(0) = 0 and u^ = u{n).

Is L self-adjoint? What conclusions about the eigenproblem posed above can be drawn from the spectral theory of LU? 5 I 0 CHAPTER 10 LINEAR DIFFERENTIAL OPERATORS IN A HILBERT SPACE

FURTHER READING

Bates, D. R., editor, (1961). "Quantum theory." Academic, New York. Carlslaw, H. S. and Jaeger, J. C. (1959). "Conduction of Heat in Solid." Oxford Univ. Press, Oxford. Churchill, R. V. (1958). "Operational Mathematics." McGraw-Hill, New York. Codington, E. H. and Levinson, N. (1955). "Theory of Ordinary Differential Equations." McGraw Hill, New York. Conti, R. (1977). "Linear Differential Equations and Control." Academic, London. Crank, J. (1975). "The Mathematics of Diffusion." Clarendon, Oxford. Dym, H. and McKean, H. R (1972). "Fourier Series and Integrals." Academic, New York. Friedman, A. (1969). "Partial Differential Equations." Holt, Reinhart and Winston, New York. Friedman, B. (1956). "Principles and Techniques of Applied Mathematics." Wiley, New York. Garabedian, P. R. (1964). "Partial Differential Equations." Wiley, New York. Hellwig, E. (1964). "Partial Differential Equations," Blaisdell, New York. Hellwig, E. (1964). "Differential Operations of Mathematical Physics." Addison-Wesley, Reading, MA. Kovach, L. D. (1984). "Boundary Value Problems." Addison-Wesley, Reading, MA. Krein, M. G. (1983). "Topics in Differential and Integral Equations and Operator Theory," BirkhSuser, Basel. Krieth, F. (1973). "Principles of Heat Transfer." Harper & Row, New York. Lanczos, C. (1961). "Linear Differential Operators." Van Nostrand, London. Merzbacher, E. (1970). "Quantum Mechanics." Wiley, New York. Millen, K. S. (1963). "Linear Differential Equations in the Real Domain." W. W Norton and Co., Inc., New York. Morse, P. M. and Feshbach, H. (1953). "Methods of Theoretical Physics," McGraw-Hill Book Co., New York. Naylor, A. W. and Sell, G. R. (1982). "Linear Operator Theory in Engineering and Science," Springer- Verlag, New York. Ramkrishna, D. and Amundson, N. R. (1985). "Linear Operator Methods in Chemical Engineering with Applications to Transport and Chemical Reaction Systems," Prentice Hall, Englewood Cliffs, NJ. Rapp, D. (1971). "Quantum Mechanics," Holt, Rinehart and Winston, Inc., New York. Schwabik, S. and Turdy, M. (1979). "Differential and Integral Equations: Boundary Value Problems," Reidel, Dordrecht. Stakgold, I. (1979). "Green's Functions and Boundary Value Problems," Wiley, New York. Titchmarsh, E. C. (1946). "Eigenfunction Expansions Associated with Second Order Differential Equa- tions," Oxford Univ. Press, Oxford. Zwillinger, D. (1989). "Handbook of Differential Equations," Academic Press, San Diego. APPENDIX

A.I. SECTION 3.2: GAUSS ELIMINATION AND THE SOLUTION TO THE LINEAR SYSTEM Ax = b

The following codes illustrate how to implement the Gauss elimination algorithms of Sections 3.2 and 3.3 to solve the linear system Ax = b. We include, in the following, implementations of simple Gauss elimination and Gauss elimination with partial and complete pivoting. The main structure of the routines follows that outlined in Section 3.2.

Auxiliary Functions The following will be needed in the Gauss elimination routines that follow. The function RowSwap[a,i,j] is used to swap rows / and j in the matrix a. Likewise, the function ColumnSwap[a,i,j] swaps columns / and j of a. Note that the function RowSwap[] is called in ColumnSwap[].

RowSwap [ a_, i-Integer, j .Integer ] : = If[ i==j, a, If[ j>i. Delete[ ReplacePart[ Insert[a, Part[a,i],j], Part[a,j], i], j+1],

511 5 I 2 APPENDIX

RowSwap[a,j,i] ] ] ColumnSwap[a_, i_Integer, j_Integer] := Transpose [ RowSwap[Transpose[a],i,j] ]

Simple Gauss Elimination The simple Gauss elimination routine is defined below in the function SimpleGaussElim[a,b], where a is a square nonsingular matrix and b is a vector of equivalent dimension. The output contains the transformed augmented matrix ([a,b])„.

SimpleGaussElim[a_,b_] : = Module[{aa, len, factor, i, j }, len = Length[a]; aa = Transpose[Append[Transpose[a],h]]; For[ i=l, i

Solution of Linear Systems The function SimpleGaussElim[a,b] defined above is used below in the function GaussSolve[a,b], which implements back substitution to generate the solution vec- tor to the linear system ax = b.

GaussSolve[a_,b_] : = Module[ {aa,len,xsol,i,j}, len = Length[a]; aa = SiinpleGaussElim[a,b] ; If[ aa == -1, Print["Error: no solution GAUSS ELIMINATION AND THE SOLUTION TO THE LINEAR SYSTEM Ax = b 513

found"]; Return[ ]; ]; xsol = Table[0,{len}] ; xsol[[len]] = aa[[len,len+1]]/aa[[len,len]]; For[ i=len-l, i>l, i--, xsol[[i]] = aa[[i, len+1]]; For[ j=i+l, j

Gauss Elimination with Pivoting

The Gauss elimination routine below (defined in the function GaussElim[a,b,pivot]) is an extended version of SimpleGaussElim[a,b] and includes both partial and complete pivoting. The pivot flag can be set to pivot=1 for partial or pivot=2 for complete pivoting. The default (pivot=0) means no pivoting, in which case the function is identical to SimpleGaussElim[a,b] defined above.

GaussElim[a_, b_, pivot_: 0] :=
  Module[{aa, len, factor, i, j, k, max, imax, jmax, piv},
    piv = pivot;
    If[ (piv != 1 && piv != 2), piv = 0];
    len = Length[a];
    aa = Transpose[Append[Transpose[a], b]];
    For[ i = 1, i <= len, i++,
      If[ piv == 1,
        max = Abs[aa[[i, i]]];
        imax = i;
        For[ j = i + 1, j <= len, j++,
          If[ Abs[aa[[j, i]]] > max,
            max = Abs[aa[[j, i]]];
            imax = j;
          ];
        ];
        If[ max == 0, Print["Error: singular matrix"]; Return[-1]; ];
        aa = RowSwap[aa, i, imax];
      ];
      If[ piv == 2,
        max = Abs[aa[[i, i]]];
        imax = i; jmax = i;
        For[ j = i + 1, j <= len, j++,
          For[ k = i + 1, k <= len, k++,
            If[ Abs[aa[[j, k]]] > max,
              max = Abs[aa[[j, k]]];
              imax = j; jmax = k;
            ];
          ];
        ];
        If[ max == 0, Print["Error: singular matrix"]; Return[-1]; ];
        aa = RowSwap[aa, i, imax];
        aa = ColumnSwap[aa, i, jmax];
      ];
      For[ j = i + 1, j <= len, j++,
        factor = aa[[j, i]] / aa[[i, i]];
        aa = ReplacePart[ aa, aa[[j]] - factor aa[[i]], j];
      ];
    ];
    aa
  ]
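A short usage sketch of GaussElim (again with an illustrative system, not one from the text); here the leading entry of a is zero, so some form of pivoting is required:

a = {{0., 2., 1.}, {1., 1., 1.}, {2., 1., -1.}};
b = {3., 6., 1.};
GaussElim[a, b, 1]   (* partial pivoting *)
GaussElim[a, b, 2]   (* complete pivoting; column swaps reorder the unknowns *)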

A.2. EXAMPLE 3.6.1: MASS SEPARATION WITH A STAGED ABSORBER

The following code is an example of how to include Gauss elimination in the definition of complicated functions in Mathematica. Below, we construct the function Xout[n], which returns the outlet concentration of solute in an n-stage ideal absorber based on liquid and vapor mass flow rates (L and G), inlet liquid and vapor solute concentrations (xin and yin), and the vapor-liquid partition coefficient (K) defined by y = Kx.

Initialization of flow rates and partition coefficient: L and G are in units of lbm/hr and K [=] (weight fraction solute in vapor)/(weight fraction solute in liquid).

L = 3500; G = 4000; K = 0.876;

Initialization of inlet solute concentration (in units of weight fraction):

xin = 0.01; yin = 0.0;

Program module for the function Xout[n]. The variables A and b are the corresponding matrix and vector used in Example 3.6.1 (Eqs. (3.6.37) and (3.6.38)). Here we use LinearSolve, the Mathematica built-in Gauss elimination routine, for solving the system Ax = b. Alternatively, the Gauss elimination routines above could be substituted.

Xout[n_Integer] :=
  Module[{A, b, x, i, j},
    A = Table[
      If[ i == j, -(L + G K),
        If[ j == i + 1, G K,
          If[ j == i - 1, L, 0] ] ],
      {i, 1, n}, {j, 1, n} ];
    b = Table[
      If[ i == 1, L xin,
        If[ i == n, G yin, 0] ],
      {i, 1, n} ];
    x = LinearSolve[A, -b];
    x[[n]]
  ]
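With the parameters initialized above, the outlet concentration for any number of stages follows directly; for instance (the stage counts here are arbitrary):

Xout[6]
Table[Xout[n], {n, 2, 10}]    (* outlet concentration versus number of stages *)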

A.3. SECTION 3.7: ITERATIVE METHODS FOR SOLVING THE LINEAR SYSTEM Ax = b

The following codes illustrate how to implement the iterative methods of Section 3.7 to solve the linear system Ax = b. We include the Jacobi, Gauss-Seidel, and SOR methods in the form of user-defined functions in Mathematica. It should be noted that the control variables (TOLERANCE and Nmax) can be adjusted to the user's taste. TOLERANCE is the convergence criterion for the residual defined by |x^(k+1) - x^(k)|. Nmax is the maximum allowed number of iterations.

Jacobi Method

LinearJacobi[a_List, b_List, x0_List] :=
  Module[{len, xnew, xold, Residual, TOLERANCE, Nmax},
    TOLERANCE = 10^(-15);
    Nmax = 100;
    len = Length[b];
    xnew = Table[0, {len}];
    xold = x0;
    Residual = 1.0;
    n = 0;
    While[Residual > TOLERANCE,
      n++;
      If[n > Nmax,
        Print["Warning: Exceeded maximum iterations."];
        Break[];
      ];
      For[i = 1, i <= len, i++,
        xnew[[i]] = - Sum[ a[[i, j]] xold[[j]] // N, {j, 1, i - 1}]
                    - Sum[ a[[i, j]] xold[[j]] // N, {j, i + 1, len}];
        xnew[[i]] += b[[i]] // N;
        xnew[[i]] /= a[[i, i]] // N;
      ];
      Residual = Sqrt[(xnew - xold) . (xnew - xold)] // N;
      xold = xnew;
    ];
    xnew
  ]

Gauss-Seidel Method

LinearGaussSeidel[a_List, b_List, x0_List] :=
  Module[{len, xnew, xold, Residual, TOLERANCE, Nmax},
    TOLERANCE = 10^(-15);
    Nmax = 100;
    len = Length[b];
    xnew = Table[0, {len}];
    xold = x0;
    Residual = 1.0;
    n = 0;
    While[Residual > TOLERANCE,
      n++;
      If[n > Nmax,
        Print["Warning: Exceeded maximum iterations."];
        Break[];
      ];
      For[i = 1, i <= len, i++,
        xnew[[i]] = - Sum[ a[[i, j]] xnew[[j]] // N, {j, 1, i - 1}]
                    - Sum[ a[[i, j]] xold[[j]] // N, {j, i + 1, len}];
        xnew[[i]] += b[[i]] // N;
        xnew[[i]] /= a[[i, i]] // N;
      ];
      Residual = Sqrt[(xnew - xold) . (xnew - xold)] // N;
      xold = xnew;
    ];
    xnew
  ]

Successive Overrelaxation Method

LinearSOR[a_List, b_List, x0_List, ω_] :=
  Module[{len, xnew, xold, Residual, TOLERANCE, Nmax},
    TOLERANCE = 10^(-15);
    Nmax = 100;
    len = Length[b];
    xnew = Table[0, {len}];
    xold = x0;
    Residual = 1.0;
    n = 0;
    While[Residual > TOLERANCE,
      n++;
      If[n > Nmax,
        Print["Warning: Exceeded maximum iterations."];
        Break[];
      ];
      For[i = 1, i <= len, i++,
        xnew[[i]] = - Sum[ a[[i, j]] xnew[[j]] // N, {j, 1, i - 1}]
                    - Sum[ a[[i, j]] xold[[j]] // N, {j, i + 1, len}];
        xnew[[i]] += b[[i]] // N;
        xnew[[i]] /= a[[i, i]] // N;
        xnew[[i]] *= ω // N;
        xnew[[i]] += (1 - ω) xold[[i]] // N;
      ];
      Residual = Sqrt[(xnew - xold) . (xnew - xold)] // N;
      xold = xnew;
    ];
    xnew
  ]
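A usage sketch for the three iterative routines (the diagonally dominant test system and the relaxation factor are illustrative, not from the text); all three calls should return essentially the same solution vector:

a = {{4., -1., 0.}, {-1., 4., -1.}, {0., -1., 4.}};
b = {2., 6., 2.};
x0 = {0., 0., 0.};
LinearJacobi[a, b, x0]
LinearGaussSeidel[a, b, x0]
LinearSOR[a, b, x0, 1.2]      (* 1 < ω < 2 corresponds to overrelaxation *)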

A.4. EXERCISE 3.7.2: ITERATIVE SOLUTION TO Ax = b — CONJUGATE GRADIENT METHOD

The following code illustrates how to implement the conjugate gradient method of Exercise 3.7.2 to solve the linear system Ax = b. The routine is in the form of the user-defined Mathematica function LinearConGrad[]. The control variables (TOLERANCE and Nmax) can be adjusted to the user's taste. TOLERANCE is the convergence criterion for the residual defined by |x^(k+1) - x^(k)|. Nmax is the maximum allowed number of iterations.

Conjugate Gradient Method

The following routine, LinearConGrad[a,b,x0], solves ax = b using the initial guess vector x0.

LinearConGrad[a_List, b_List, x0_List] :=
  Module[{xnew, xold, Rnew, Rold, TOLERANCE, Nmax, n, r1, r0, p0, α, β},
    TOLERANCE = 10^(-15);
    Nmax = 500;
    r0 = b - a . x0;
    p0 = r0;
    xold = x0;
    Rold = Sqrt[Conjugate[r0] . r0];
    n = 0;
    While[Rold > TOLERANCE,
      n++;
      If[ n > Nmax,
        Print["Warning: Exceeded maximum iterations."];
        Break[];
      ];
      α = Rold^2 / (Conjugate[p0] . a . p0);
      xnew = xold + α p0;
      r1 = r0 - α a . p0;
      Rnew = Sqrt[Conjugate[r1] . r1] // N;
      β = (Rnew / Rold)^2;
      p0 = r1 + β p0;
      xold = xnew;
      Rold = Rnew;
      r0 = r1;
    ];
    xnew
  ]
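A usage sketch (illustrative symmetric positive-definite system): the conjugate gradient result should agree with the built-in solver.

a = {{4., 1.}, {1., 3.}};
b = {1., 2.};
LinearConGrad[a, b, {0., 0.}]
LinearSolve[a, b]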

A.5. EXAMPLE 3.8.1: CONVERGENCE OF THE PICARD AND NEWTON-RAPHSON METHODS

The following codes are examples of how to implement the Picard and Newton-Raphson methods. In both cases, the residual functions are defined externally and the solutions at each iteration are stored in the lists PSolution1, PSolution2 and NRSolution1, NRSolution2.

Picard Method

Define the Picard residual vector g:

g1[x1_, x2_] := x1 - x1 Tan[x1] + Sqrt[x2^2 - x1^2]
g2[x1_, x2_] := x2 + (x1 + Pi/4)/Tan[x1 + Pi/4] + Sqrt[x2^2 - (x1 + Pi/4)^2]

Define the guess variables and provide initial guesses:

x1 = 1.0;
x2 = 1.0;

Initialize the solution lists. PSolution1 and PSolution2 contain the accumulated guesses for x1 and x2, respectively, at each step. PSolutionNorm contains the accumulated solution norms at each step.

PSolution1 = {x1};
PSolution2 = {x2};
PSolutionNorm = {};

Initialize the program parameters. TOLERANCE is the user-defined convergence criterion. Nmax is the user-defined maximum number of iterations.

TOLERANCE = 10^(-6);
Nmax = 100;

Program module for the Picard method. The output values of the calculation are printed after the last iteration.

Module[{i, x1new, x2new, norm},
  norm = 1.0;
  i = 0;
  While[norm > TOLERANCE,
    x1new = g1[x1, x2];
    x2new = g2[x1, x2];
    norm = Sqrt[(x1new - x1)^2 + (x2new - x2)^2];
    AppendTo[PSolution1, x1new];
    AppendTo[PSolution2, x2new];
    AppendTo[PSolutionNorm, norm];
    x1 = x1new;
    x2 = x2new;
    i++;
    If[ i > Nmax,
      Print["WARNING: Exceeded maximum iterations."];
      Break[];
    ];
  ];
  Print[x1, " = x1"];
  Print[x2, " = x2"];
]

Newton-Raphson Method

Initialize the solution lists. NRSolution1 and NRSolution2 contain the accumulated guesses for x1 and x2, respectively, at each step. NRSolutionNorm1 and NRSolutionNorm2 contain the accumulated values of the norms, defined as the modulus of the difference in the update vectors and the modulus of the residuals, respectively.

NRSolution1 = {x1};
NRSolution2 = {x2};
NRSolutionNorm1 = {};
NRSolutionNorm2 = {};

Program module for the Newton-Raphson method. The output values of the calculation are printed after the last iteration.

Module[{i, jac, residual, xdel, norm1, norm2, TOLERANCE, Nmax},
  TOLERANCE = 10^(-15);
  Nmax = 100;
  norm1 = 1.0;
  norm2 = 1.0;
  i = 0;
  While[norm1 > TOLERANCE && norm2 > TOLERANCE,
    jac = {{j11[x1, x2], j12[x1, x2]}, {j21[x1, x2], j22[x1, x2]}};
    residual = {-f1[x1, x2], -f2[x1, x2]};
    xdel = LinearSolve[jac, residual];
    norm1 = Sqrt[xdel[[1]]^2 + xdel[[2]]^2];
    norm2 = Sqrt[residual[[1]]^2 + residual[[2]]^2];
    x1 += xdel[[1]];
    x2 += xdel[[2]];
    AppendTo[NRSolution1, x1];
    AppendTo[NRSolution2, x2];
    AppendTo[NRSolutionNorm1, norm1];
    AppendTo[NRSolutionNorm2, norm2];
    i++;
    If[ i > Nmax,
      Print["WARNING: Exceeded maximum iterations."];
      Break[];
    ];
  ];
  Print[x1, " = x1"];
  Print[x2, " = x2"];
]
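The module above assumes that the residual components f1 and f2 and the Jacobian elements j11, j12, j21, and j22 of Example 3.8.1 have already been defined; they are not reproduced here. The sketch below only illustrates how the Jacobian elements might be generated from the residuals by symbolic differentiation, following the same pattern used in Section A.6 (the two residuals shown are placeholders, not the ones from the example):

f1[x1_, x2_] := x1 Tan[x1] - Sqrt[x2^2 - x1^2]   (* placeholder residual *)
f2[x1_, x2_] := x1^2 + x2^2 - 2                  (* placeholder residual *)

j11[x1_, x2_] := D[f1[x, y], x] /. {x -> x1, y -> x2}
j12[x1_, x2_] := D[f1[x, y], y] /. {x -> x1, y -> x2}
j21[x1_, x2_] := D[f2[x, y], x] /. {x -> x1, y -> x2}
j22[x1_, x2_] := D[f2[x, y], y] /. {x -> x1, y -> x2}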

A.6. EXAMPLE 3.8.2: STEADY-STATE SOLUTIONS FOR A CONTINUOUSLY STIRRED TANK REACTOR

The following code is an example of how to implement the Newton-Raphson method to solve for the steady-state solutions (T and cA) for an ideal, continuously stirred tank reactor (CSTR). Following Example 3.8.2, we first input the problem parameters, which include the inlet stream properties, reactor dimensions, kinetic constants, the heat transfer coefficient, and any other physical constants. We then define the reaction rate of component A as a user-defined function "reaction." Next, we define the residual functions and the Jacobian element functions and construct the function CSTRSolve as the Newton-Raphson routine for solving the problem.

Physical Constants

Inlet stream conditions:

T0 := 273;   (* Inlet Temperature [deg. F] *)
q = 1250;    (* Inlet Volumetric Flow Rate [lbm/hr] *)
cA0 = 0.15;  (* Inlet Concentration [moles/ft^3] *)

Reactor dimensions:

V = 4500;    (* Volume of Reactor [ft^3] *)
A = 25;      (* Effective Surface Area of Reactor [ft^2] *)

Kinetic constants:

k0 := 0.590; (* Preexponential Rate Constant [1/hr] *)
E0 := 2500;  (* Activation Energy [Btu/mole] *)
R = 8.314;   (* Gas Constant [Btu/lbmole °R] *)

Thermodynamic and heat transfer constants:

H = -33.5;   (* Heat of Reaction [Btu/lbmole] *)
U = 258;     (* Overall Heat Transfer Coefficient *)
Tb = 298;    (* Ambient Temperature [deg. F] *)
p = 15.5;    (* Liquid Density of Solute [lbm/ft^3] *)
Cp = 4.54;   (* Heat Capacity of Solute [Btu/lbmole (deg. F)] *)

Auxiliary Functions

Define the function reaction[T, c] as the production rate of component A via the reaction:

reaction[T_, c_] := -k0 Exp[-E0 / (R T)] c

Define the residual vector components:

f1[T_, c_] := q (cA0 - c) + reaction[T, c] V
f2[T_, c_] := q p Cp (T0 - T) + reaction[T, c] V H - A U (T - Tb)

Define the Jacobian matrix elements:

j11[x1_, x2_] := D[f1[x, y], x] /. {x -> x1, y -> x2}
j12[x1_, x2_] := D[f1[x, y], y] /. {x -> x1, y -> x2}
j21[x1_, x2_] := D[f2[x, y], x] /. {x -> x1, y -> x2}
j22[x1_, x2_] := D[f2[x, y], y] /. {x -> x1, y -> x2}

Newton-Raphson Method

Program module for the Newton-Raphson method. Ti and ci are the initial guesses for the outlet temperature and concentration.

CSTRSolve[Ti_, ci_] :=
  Module[{i, T, c, jac, residual, xdel, norm1, norm2, TOLERANCE, Nmax},
    TOLERANCE = 10^(-15);
    Nmax = 100;
    norm1 = 1.0;
    norm2 = 1.0;
    T = Ti;
    c = ci;
    i = 0;
    While[norm1 > TOLERANCE && norm2 > TOLERANCE,
      jac = {{j11[T, c], j12[T, c]}, {j21[T, c], j22[T, c]}};
      residual = {-f1[T, c], -f2[T, c]};
      xdel = LinearSolve[jac, residual] // N;
      norm1 = Sqrt[xdel . xdel] // N;
      norm2 = Sqrt[residual . residual] // N;
      T += xdel[[1]] // N;
      c += xdel[[2]] // N;
      i++;
      If[i > Nmax,
        Print["WARNING: Exceeded maximum iterations."];
        Break[];
      ];
    ];
    {T, c}
  ]
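A usage sketch (the initial guesses for temperature and concentration are illustrative, not the ones discussed in Example 3.8.2); the routine returns the converged pair {T, c}:

CSTRSolve[350, 0.05]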

A.7. EXAMPLE 3.8.3: THE DENSITY PROFILE IN A LIQUID-VAPOR INTERFACE (ITERATIVE SOLUTION OF AN INTEGRAL EQUATION)

The following codes illustrate how to implement the iterative solution of an integral equation. We have included two iterative methods for solving Eq. (3.8.50). The first routine is a modification of the successive overrelaxation method in which the vector g is updated at each step using the previous value of n, the density profile. The second routine uses the Newton-Raphson method to handle the nonlinear function g(n). It should be noted that in each routine the control variables (TOLERANCE and Nmax) can be adjusted to the user's taste. TOLERANCE is the convergence criterion for the residual defined by |x^(k+1) - x^(k)|. Nmax is the maximum allowed number of iterations.

Physical Constants

van der Waals coefficients:

a = 10.5; b = 0.06;

Gas constant:

Rg = 0.082058; T = 300;

Equilibrium conditions:

P = 2.096; μ = -130.682; nl = 13.8478; ng = 0.0879781;

Auxiliary Functions

The following functions (and constants) are required in the routines below and are defined in Example 3.8.3.

K[x_] := -2 a Exp[-2 Abs[x]]

α = NIntegrate[K[x], {x, -Infinity, Infinity}];

g[x_, L_] := μ - ng NIntegrate[K[y - x], {y, L, Infinity}] -
             nl NIntegrate[K[y - x], {y, -Infinity, -L}]

Equilibrium chemical potential:

u0[n_] := -Rg T Log[(1 - n b)/(n b)] + n b Rg T/(1 - n b) - 2 n a

Successive Overrelaxation Method

The function DensitySOR[L,δ] uses the successive overrelaxation technique to solve Eq. (3.8.50) iteratively, where L is defined in Example 3.8.3 and δ is the user-input discretization length for the array n. The value of g(n) at each step is updated using the previous step's value of n. As an initial value for n, a linear density profile is imposed across the interface.

DensitySOR[L_, δ_] :=
  Module[{},
    ω = 1.15;
    TOLERANCE = 10^(-15);
    Nmax = 100;
    Nstep = Ceiling[L / δ];
    delta = L / Nstep;
    Kmat = Table[K[delta (j - i)] delta,
      {i, -Nstep, Nstep}, {j, -Nstep, Nstep}];
    For[ i = 1, i <= 2 Nstep + 1, i++,
      Kmat[[i, i]] -= α;
    ];
    bvec = Table[ g[i delta, L] -
        (1/2) (K[delta (-Nstep - 1 - i)] nl + K[delta (Nstep + 1 - i)] ng),
      {i, -Nstep, Nstep}];
    xold = Table[ng + (nl - ng)/2 (1 - i delta/L), {i, -Nstep, Nstep}];
    xnew = xold;
    Residual = 1.0;
    n = 0;
    While[Residual > TOLERANCE,
      n++;
      If[ n > Nmax,
        Print["Warning: Exceeded maximum iterations."];
        Break[];
      ];
      For[i = 1, i <= 2 Nstep + 1, i++,
        xnew[[i]] = - Sum[Kmat[[i, j]] xnew[[j]] // N, {j, 1, i - 1}]
                    - Sum[Kmat[[i, j]] xold[[j]] // N, {j, i + 1, 2 Nstep + 1}];
        xnew[[i]] += (bvec[[i]] + u0[xold[[i]]]) // N;
        xnew[[i]] /= Kmat[[i, i]] // N;
        xnew[[i]] *= ω // N;
        xnew[[i]] += (1 - ω) xold[[i]] // N;
      ];
      Residual = Sqrt[(xnew - xold) . (xnew - xold)] // N;
      xold = xnew;
    ];
    xnew
  ]

Newton-Raphson Method

The function DensityNR[L,δ] uses the Newton-Raphson technique to solve Eq. (3.8.50) iteratively. L and δ are as defined above for the SOR implementation. As above, for the initial value of n, a linear density profile is imposed across the interface.

DensityNR[L_, δ_] :=
  Module[{n, i, jac, residual, x, xdel, norm1, norm2, TOLERANCE, Nmax,
          Kmat, delta, bvec, Nstep},
    TOLERANCE = 10^(-15);
    Nmax = 100;
    norm1 = 1.0;
    norm2 = 1.0;
    n = 0;
    Nstep = Ceiling[L / δ];
    delta = L / Nstep;
    Kmat = Table[K[delta (j - i)] delta,
      {i, -Nstep, Nstep}, {j, -Nstep, Nstep}];
    For[ i = 1, i <= 2 Nstep + 1, i++,
      Kmat[[i, i]] -= α;
    ];
    bvec = Table[g[i delta, L] -
        (1/2) (K[delta (-Nstep - 1 - i)] nl + K[delta (Nstep + 1 - i)] ng),
      {i, -Nstep, Nstep}];
    x = Table[ng + (nl - ng)/2 (1 - i delta/L), {i, -Nstep, Nstep}];
    While[norm1 > TOLERANCE && norm2 > TOLERANCE,
      If[n > Nmax,
        Print["Warning: Exceeded maximum iterations."];
        Break[];
      ];
      jac = Table[Kmat[[i, j]] - If[i == j, D[u0[z], z] /. {z -> x[[i]]}, 0],
        {i, 1, 2 Nstep + 1}, {j, 1, 2 Nstep + 1}];
      residual = Kmat . x - bvec;
      For[ i = 1, i <= 2 Nstep + 1, i++,
        residual[[i]] -= u0[x[[i]]];
      ];
      xdel = LinearSolve[jac, -residual];
      norm1 = Sqrt[xdel . xdel];
      norm2 = Sqrt[residual . residual];
      x += xdel;
      n++;
    ];
    x
  ]
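A usage sketch (the half-width L and mesh δ are illustrative; they should be chosen so that the profile reaches its bulk values nl and ng well inside ±L):

profileSOR = DensitySOR[10.0, 0.25];
profileNR  = DensityNR[10.0, 0.25];
ListPlot[profileNR]   (* density at each grid point across the interface *)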

A.8. EXAMPLE 3.8.4: PHASE DIAGRAM OF A POLYMER SOLUTION

The following code is an example of how to implement the Newton-Raphson method with first-order continuation. The example involves calculation of the coexistence curves of two phases in a polymer solution using the Flory-Huggins theory. The residual functions for this case are defined externally.

Newton-Raphson Method

Define the residual vector f:

f1[x1_, y1_, χ1_, ν_] := Log[x1] - Log[y1] + (1 - 1/ν) (1 - x1) -
    (1 - 1/ν) (1 - y1) + (1 - x1)^2/χ1 - (1 - y1)^2/χ1

f2[x1_, y1_, χ1_, ν_] := Log[1 - x1] - Log[1 - y1] + (1 - ν) x1 - (1 - ν) y1 +
    ν x1^2/χ1 - ν y1^2/χ1

Define the Jacobian matrix elements and the derivatives with respect to ν used in the continuation step:

j11[x1_, y1_, χ1_, ν_] := D[f1[xx, yy, χχ, νν], xx] /.
    {xx -> x1, yy -> y1, χχ -> χ1, νν -> ν};
j12[x1_, y1_, χ1_, ν_] := D[f1[xx, yy, χχ, νν], yy] /.
    {xx -> x1, yy -> y1, χχ -> χ1, νν -> ν};
j21[x1_, y1_, χ1_, ν_] := D[f2[xx, yy, χχ, νν], xx] /.
    {xx -> x1, yy -> y1, χχ -> χ1, νν -> ν};
j22[x1_, y1_, χ1_, ν_] := D[f2[xx, yy, χχ, νν], yy] /.
    {xx -> x1, yy -> y1, χχ -> χ1, νν -> ν};
d1[x1_, y1_, χ1_, ν_] := D[f1[xx, yy, χχ, νν], νν] /.
    {xx -> x1, yy -> y1, χχ -> χ1, νν -> ν};
d2[x1_, y1_, χ1_, ν_] := D[f2[xx, yy, χχ, νν], νν] /.
    {xx -> x1, yy -> y1, χχ -> χ1, νν -> ν};

Parameter initialization: νmax is the maximum value of ν to be calculated, νstep is the number of steps for incrementing ν in the calculation, and χstep is the number of steps for incrementing χ, the dimensionless temperature.

νmax = 5; νstep = 40; χstep = 20;

Calculate the step size for incrementing ν.

deltaν = (νmax - 1)/νstep;

Initialize the program parameters. TOLERANCE is the user-defined convergence criterion. Nmax is the user-defined maximum number of iterations.

TOLERANCE = 1/10^6;
Nmax = 100;

Define the guess variables: x[n] is the solvent concentration in the solvent-rich phase at the nth value of the dimensionless temperature, and y[n] is the solvent concentration in the polymer-rich phase at the nth value. χ[n] is the corresponding dimensionless temperature.

x[0] = 0.0;
y[0] = 0.0;
χ[0] = 0.0;

Initialize all of the values of x[n], y[n], and χ[n] to their values at the critical point (ν = 1). This will be the starting point for the continuation steps in χ.

For[n = 1, n <= χstep - 1, n++,
  x[n] = 0.5 (n/χstep)^0.5;
  y[n] = 1.0 - x[n];
  χ[n] = (1.0 - 2 x[n])/Log[(1.0 - x[n])/x[n]];
];

Initialize the values for the starting point where x = 0.5 (the critical point at ν = 1):

x[χstep] = 0.5;
y[χstep] = 0.5;
χ[χstep] = 0.5;

Program module. The calculation begins at ν = 1. First-order continuation is used to increment ν; at each new value of ν the dimensionless-temperature grid χ[i] is rebuilt below the new critical value, and x and y are recalculated at every χ[i] by the Newton-Raphson method. Each set of curves generated at a specific value of ν (i.e., x and y as functions of χ) is the corresponding phase diagram (coexistence curves) for that value of ν. Note that in order to calculate a phase diagram for some target ν much greater than 1, this continuation scheme must be used.

Module[{i, j, k, ν, χc, jac, residual, xdel, delta, norm},
  ν = 1.0;
  For[k = 1, k <= νstep, k++,
    For[i = 1, i <= χstep - 1, i++,
      residual = {-d1[x[i], y[i], χ[i], ν] deltaν,
                  -d2[x[i], y[i], χ[i], ν] deltaν};
      jac = {{j11[x[i], y[i], χ[i], ν], j12[x[i], y[i], χ[i], ν]},
             {j21[x[i], y[i], χ[i], ν], j22[x[i], y[i], χ[i], ν]}};
      xdel = LinearSolve[jac, residual];
      x[i] += xdel[[1]];
      y[i] += xdel[[2]];
    ];
    ν += deltaν;
    χc = 2.0/(1 + 1/Sqrt[ν])^2;
    delta = χc/χstep;
    For[i = 1, i <= χstep - 1, i++,
      χ[i] = i delta;
      norm = 1.0;
      j = 0;
      While[norm > TOLERANCE,
        jac = {{j11[x[i], y[i], χ[i], ν], j12[x[i], y[i], χ[i], ν]},
               {j21[x[i], y[i], χ[i], ν], j22[x[i], y[i], χ[i], ν]}};
        residual = {-f1[x[i], y[i], χ[i], ν], -f2[x[i], y[i], χ[i], ν]};
        xdel = LinearSolve[jac, residual];
        norm = Sqrt[xdel[[1]]^2 + xdel[[2]]^2];
        x[i] += xdel[[1]];
        y[i] += xdel[[2]];
        j++;
        If[ j > Nmax,
          Print["WARNING: Exceeded maximum iterations."];
          Break[];
        ];
      ];
    ];
    x[χstep] = 1.0/(1 + Sqrt[ν]);
    y[χstep] = x[χstep];
    χ[χstep] = χc;
  ];
]
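Once the module has run, the coexistence curves at the final value of ν can be displayed, for example as follows (a sketch, assuming the tables x[n], y[n], and χ[n] generated above):

ListPlot[
  Join[Table[{x[n], χ[n]}, {n, 0, χstep}],
       Table[{y[n], χ[n]}, {n, 0, χstep}]],
  AxesLabel -> {"composition", "dimensionless temperature"}]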

A.9. SECTION 4.3: GAUSS-JORDAN ELIMINATION AND THE SOLUTION TO THE LINEAR SYSTEM Ax = b

The following code illustrates how to implement the Gauss-Jordan elimination algorithm of Section 4.3 to solve the linear system Ax = b. The main structure of the routine follows that outlined in Sections 3.2, 3.3, and 4.3.

Auxiliary Functions

The following functions will be needed in the Gauss-Jordan elimination routine below. The function RowSwap[a,i,j] is used to swap rows i and j in the matrix a. Likewise, the function ColumnSwap[a,i,j] swaps columns i and j of a. Note that the function RowSwap[] is called in ColumnSwap[].

RowSwap[a_, i_Integer, j_Integer] :=
  If[ i == j, a,
    If[ j > i,
      Delete[ ReplacePart[ Insert[a, Part[a, i], j], Part[a, j], i], j + 1],
      RowSwap[a, j, i] ] ]

ColumnSwap[a_, i_Integer, j_Integer] :=
  Transpose[ RowSwap[Transpose[a], i, j] ]

Gauss-Jordan Elimination

The simple Gauss-Jordan elimination routine (without pivoting) is defined below in the function SimpleGaussJordan[a,b], where a is a square nonsingular matrix and b is a vector of equivalent dimension. The output contains the transformed (row-reduced) augmented matrix [a, b]. Note that the function ColumnSwap[] is not included in the routine below. We have included it in this section in case the reader wishes to include pivoting in a separate routine.

SimpleGaussJordan[a_, b_] :=
  Module[{len, aa, factor, i, j},
    len = Length[a];
    aa = Transpose[Append[Transpose[a], b]];
    For[ i = 1, i <= len, i++,
      For[ j = i + 1, j <= len, j++,
        factor = aa[[j, i]] / aa[[i, i]];
        aa = ReplacePart[ aa, aa[[j]] - factor aa[[i]], j];
      ];
      For[ j = i - 1, j >= 1, j--,
        factor = aa[[j, i]] / aa[[i, i]];
        aa = ReplacePart[ aa, aa[[j]] - factor aa[[i]], j];
      ];
    ];
    aa
  ]
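A short usage sketch (illustrative system, not from the text); the left block of the output is reduced to diagonal form, so dividing the last column by the corresponding diagonal entries reproduces the solution of a.x == b:

a = {{2., 1., 1.}, {1., 3., 2.}, {1., 0., 3.}};
b = {4., 5., 6.};
SimpleGaussJordan[a, b] // MatrixForm
LinearSolve[a, b]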

A.10. SECTION 5.4: CHARACTERISTIC POLYNOMIALS AND THE TRACES OF A SQUARE MATRIX

The following codes are Mathematica programs for generating the traces and characteristic polynomial of a square matrix. The first routine is contained in the function Tr[j,a], which evaluates the jth trace of a, where 1 <= j <= N. The second routine is contained in the function CharPoly[a,λ], which expresses the characteristic polynomial in terms of the variable λ (i.e., P_n(-λ)).

Trace Routine

The following routine calculates the jth-order trace of a by summing up the determinants of the corresponding jth-order minors (as outlined in Section 1.7). The function Tr[j,a] successively strikes out rows and columns using a recursive routine to nest For loops. The variable "case" contains the list of rows (and columns) to be stricken and is passed internally in the recursive call. The routine is somewhat sophisticated and inexperienced programmers may have difficulty understanding the algorithm. It is, however, a good exercise to explore the routine, and the interested reader may find Wolfram's textbook The Mathematica Book useful in doing so.

Tr[n_Integer, a_List, case_: {}] :=
  Module[{ans, vec, dropit, start, i},
    vec = case;
    ans = 0;
    If[n > Length[a] || n < 1,
      Print["Error: Illegal integer value."];
      Return[];
    ];
    If[Length[vec] == Length[a] - n,
      dropit = Partition[vec, 1];
      ans = Det[Delete[Transpose[Delete[a, dropit]], dropit]];,
      If[Length[vec] == 0, start = 0;, start = vec[[Length[vec]]];];
      For[ i = start + 1, i <= n + 1 + Length[vec], i++,
        AppendTo[vec, i];
        ans += Tr[n, a, vec];
        vec = Drop[vec, -1];
      ];
    ];
    Return[ans];
  ]

Characteristic Polynomial Routine

The following routine CharPoly[a,x] returns the characteristic polynomial of the matrix a with respect to the variable x. The output is in the form P_n(-x) and derives from Eqs. (5.5.3) and (5.5.5). Note that the function Tr[] is called in CharPoly[].

CharPoly[a_, x_] :=
  Module[{},
    len = Length[a];
    ans = (-x)^len;
    For[i = 1, i <= len, i++,
      ans += Tr[i, a] (-x)^(len - i);
    ];
    ans
  ]

Example

The following is an example of the uses of Tr[ ] and CharPoly[ ]:

mat = {{1, 4, 3, 1}, {2, 1, 0, 1}, {1, 2, -1, 1}, {0, 1, 1, 1}};

Tr[1, mat]

2

Tr[2, mat]

-13

Tr[4, mat]

14

CharPoly[mat, λ]

14 - 7 λ - 13 λ^2 - 2 λ^3 + λ^4

Solve[CharPoly[mat, λ] == 0, λ] // N

{{λ -> 0.781784}, {λ -> 4.8533},
 {λ -> -1.81754 - 0.621562 I}, {λ -> -1.81754 + 0.621562 I}}

A.11. SECTION 5.6: ITERATIVE METHOD FOR CALCULATING THE EIGENVALUES OF TRIDIAGONAL MATRICES

The following code is the implementation of the iterative method described in Section 5.6 to calculate the eigenvalues of a tridiagonal matrix. The method involves the calculation of the polynomial expression P_n(λ0) for some initial guess eigenvalue λ0, which is handled below in a separate function PolyTri. The Newton-Raphson method is then used to iteratively solve the equation P_n(λ) = 0, where n is the dimension of the matrix.

Polynomial Functions

The function PolyTri[a,λ,n] calculates the nth polynomial P_n(λ) using the recursion formula Eq. (5.6.4). Notice that we must include three function definitions for the three separate cases n = 0, n = 1, and n > 1.

PolyTri[a_List, λ_, n_Integer] :=
  (a[[n, n]] - λ) PolyTri[a, λ, n - 1] -
    a[[n - 1, n]] a[[n, n - 1]] PolyTri[a, λ, n - 2] /; n > 1

PolyTri[a_List, λ_, 1] := a[[1, 1]] - λ
PolyTri[a_List, λ_, 0] := 1.0

The function PolyTriD[a,λ,n] is the derivative with respect to λ of the corresponding function PolyTri defined above.

PolyTriD[a_List, λ_, n_Integer] :=
  (a[[n, n]] - λ) PolyTriD[a, λ, n - 1] - PolyTri[a, λ, n - 1] -
    a[[n - 1, n]] a[[n, n - 1]] PolyTriD[a, λ, n - 2] /; n > 1

PolyTriD[a_List, λ_, 1] := -1.0
PolyTriD[a_List, λ_, 0] := 0.0

Newton-Raphson Method

We now define the function EigenvalueTri as an implementation of the one-dimensional Newton-Raphson scheme to solve for an eigenvalue of a, given an initial guess λ0. We use the functions PolyTri and PolyTriD above to calculate the residual and the Jacobian.

EigenvalueTri[a_List, λ0_] :=
  Module[{norm1, norm2, TOLERANCE, Nmax, jac, residual, len, λnew, λdel, i},
    TOLERANCE = 10^(-6);
    Nmax = 100;
    len = Length[a];
    norm1 = 1.0;
    norm2 = 1.0;
    λnew = λ0;
    residual = -PolyTri[a, λnew, len];
    i = 1;
    While[norm1 > TOLERANCE && norm2 > TOLERANCE,
      jac = PolyTriD[a, λnew, len];
      λdel = residual / jac;
      λnew += λdel;
      residual = -PolyTri[a, λnew, len];
      norm1 = Sqrt[λdel^2];
      norm2 = Sqrt[residual^2];
      i++;
      If[i > Nmax,
        Print["WARNING: Exceeded maximum iterations."];
        Break[];
      ];
    ];
    λnew
  ]
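A usage sketch (illustrative tridiagonal matrix, not from the text): the routine converges to an eigenvalue near the initial guess. For the matrix below the exact eigenvalues are 2 - Sqrt[2], 2, and 2 + Sqrt[2].

tri = {{2., -1., 0.}, {-1., 2., -1.}, {0., -1., 2.}};
EigenvalueTri[tri, 3.0]    (* should approach 2 + Sqrt[2], about 3.414 *)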

A.12. EXAMPLE 5.6.1: POWER METHOD FOR ITERATIVE CALCULATION OF EIGENVALUES

The following code is an implementation of the power method in Section 5.6 to calculate the largest-magnitude eigenvalue and the corresponding eigenvector of a square matrix. The form of the program is a user-defined function called PowerMethod[a,x0], where a is the input matrix and x0 is the initial guess of the eigenvector. The function has been designed to handle complex eigenvalues and eigenvectors.

PowerMethod[a_List, x0_List] :=
  Module[{TOLERANCE, Nmax, u, λ, λold, norm, i},
    TOLERANCE = 10^(-15);
    Nmax = 100;
    norm = 1.0;
    u = Chop[ x0/Sqrt[Conjugate[x0] . x0] // N];
    λ = Chop[Conjugate[u] . a . u // N];
    i = 0;
    While[norm > TOLERANCE,
      i++;
      u = a . u // N;
      u = Chop[ u/Sqrt[Conjugate[u] . u] // N];
      λold = λ;
      λ = Chop[Conjugate[u] . a . u // N];
      norm = Chop[Sqrt[Conjugate[λ - λold] (λ - λold)] // N];
      If[ i > Nmax,
        Print["WARNING: Exceeded maximum iterations."];
        Break[];
      ];
    ];
    {λ, u}
  ]
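A quick check (illustrative symmetric matrix): the dominant eigenvalue of the matrix below is 3 with eigenvector proportional to {1, 1}, so the routine should return approximately {3., {0.707, 0.707}}.

PowerMethod[{{2., 1.}, {1., 2.}}, {1., 0.}]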

A.13. EXAMPLE 6.2.1: IMPLEMENTATION OF THE SPECTRAL RESOLUTION THEOREM—MATRIX FUNCTIONS

The following code is a Mathematica program for generating functions of matrices using the spectral resolution theorem of Chapter 6. Two built-in functions (Eigenvalues[] and Eigenvectors[]) are used to generate the required eigenanalysis; however, these routines can easily be replaced with the appropriate user-defined routines. The code is contained in the function MatrixFunction[f,a], where f is the Mathematica function head (e.g., Exp, Cos, Tanh), which can be a Mathematica built-in function or any user-defined function, and a is the matrix (or matrix expression). The function will only work for perfect matrices of a size feasible for handling by the functions Eigenvalues[] and Eigenvectors[]. Note that the function f(t) must be defined at the eigenvalues of a.

Routine

In the following program, we have made use of the built-in routine Map and the Mathematica implementation of "pure functions." The reader not familiar with such concepts is encouraged to consult Wolfram's text, The Mathematica Book, for a full description.

MatrixFunction[f_, a_] :=
  Module[{ans, eval, evec, revec, feval, i, j},
    ans = Table[0, {i, Length[a]}, {j, Length[a]}];
    eval = Eigenvalues[a];
    evec = Eigenvectors[a];
    evec = Map[(#/Sqrt[# . Conjugate[#]]) &, evec];
    revec = Transpose[Inverse[Conjugate[evec]]];
    feval = Map[f, eval];
    For[i = 1, i <= Length[a], i++,
      ans += feval[[i]] Transpose[{evec[[i]]}] . {revec[[i]]};
    ];
    ans
  ]

Examples

The following are examples of how to use the function MatrixFunction[f,a] defined above.

a = {{3, 1}, {1, 2}};

f(a):

MatrixFunction[f, a] // MatrixForm

( (5 + Sqrt[5]) f[(5 + Sqrt[5])/2]/10 + (5 - Sqrt[5]) f[(5 - Sqrt[5])/2]/10    (f[(5 + Sqrt[5])/2] - f[(5 - Sqrt[5])/2])/Sqrt[5] )
( (f[(5 + Sqrt[5])/2] - f[(5 - Sqrt[5])/2])/Sqrt[5]    (5 - Sqrt[5]) f[(5 + Sqrt[5])/2]/10 + (5 + Sqrt[5]) f[(5 - Sqrt[5])/2]/10 )

f(a) = e^a:

MatrixFunction[Exp, a] // N // MatrixForm

( 28.0655  14.8839 )
( 14.8839  13.1815 )

f(a) = J0(a):

MatrixFunction[(BesselJ[0, #])&, a] //N// MatrixForm

( -0.125315  -0.433807 )
( -0.433807   0.308492 )

f(a) = ∫_0^a exp(-x^2) sin(x) dx:

MatrixFunction[(NIntegrate[Exp[-x^2] Sin[x], {x, 0, #}]) &, a] // N // MatrixForm

( 0.412416   0.0194503 )
( 0.0194503  0.392965  )

A.14. EXAMPLE 9.4.2: NUMERICAL SOLUTION OF A VOLTERRA EQUATION (SATURATION IN POROUS MEDIA)

The following codes are Mathematica programs for the numerical solution of Eq. (9.4.33), a Volterra equation of the first kind. One of the goals of Example 9.4.2 was to demonstrate the numerical difficulty involved with convergence of Volterra problems. Thus, the following codes have been designed to allow variable mesh sizes and variable inherent "data error." The user is urged to play with both these parameters and explore the stability of the resulting solutions.

Exact Solution

We begin by defining the functions that represent the "exact" solution to some physical system. The function sx[p] represents the exact physical saturation in a given sample of porous medium. The goal of our experiment is to back out this function by measuring the average saturation as a function of p, the capillary pressure. We will therefore eventually compare our numerical solution to sx[p]:

sx[p_] := 1.0 /; p <= 2.0
sx[p_] := 1.5/p + 0.25 /; p > 2.0

Next, we define the theoretical average saturation, ssx[p], by formally integrating Eq. (9.4.29) giving Eqs. (9.4.30) and (9.4.31):

ssx[p_] := 1.0 /; p <= 2.0
ssx[p_] := 1.75 - 1.5 Sqrt[1.0 - 1.5/p] -
    3.0 0.75 (ArcTanh[Sqrt[0.25]] - ArcTanh[Sqrt[1 - 1.5/p]])/p /; p > 2.0

Numerical Solution

The routine NSat[pmax, error, n] is the numerical solution implemented by Eq. (9.4.32). The variable pmax is the user-specified maximum capillary pressure for evaluation (note that pmax must be greater than 0). The variable "error" is the requested inherent error to be imposed on the data. The data (representing values of the average saturation on a regular grid of capillary pressures) are generated by evaluating the theoretical value (via ssx[p]) at each pressure and then adding or subtracting a random fluctuation whose magnitude does not exceed the fraction "error" of the theoretical value. The variable n is simply the number of discretizations of capillary pressure on the interval (0 < p <= pmax). The output of NSat is a list of points {p, s} representing the computed saturation (s) versus pressure (p).

NSat[pmax_, error_, n_: 200] :=
  Module[{d, ss, a, b, ans},
    If[ pmax <= 0.0,
      Print["Error: pmax must be greater than zero"];
      Return[];
    ];
    d = pmax/n;
    ss = Table[ssx[d j] (1 + error Random[Real, {-1, 1}]), {j, 1, n}];
    a = Table[
      If[ j <= i, 0.75/(i Sqrt[1.0 - 0.75 j/i]), 0.0] If[i == j, 0.5, 1.0],
      {i, 1, n}, {j, 1, n}];
    b = Table[0.375/i, {i, 1, n}];
    ans = LinearSolve[a, ss - b];
    ans = Transpose[{Table[i d, {i, 1, n}], ans}];
    Return[ans];
  ]

Plotting Function

The function PlotSolution[data] uses the Mathematica function Show to generate a combined plot of the numerical solution (in data) and the theoretical solution (via sx[p]). The function Plot is used to draw the theoretical curve and ListPlot is used for the numerical data points. The option DisplayFunction is used to suppress the graphical output until both plots have been combined by Show.

PlotSolution[data_List] :=
  Show[
    {Plot[sx[p], {p, 0.0, data[[Length[data], 1]]},
       DisplayFunction -> Identity],
     ListPlot[data, DisplayFunction -> Identity]},
    DisplayFunction -> $DisplayFunction]

Examples The following examples show the sensitivity of the Volterra equation to random noise in the data: NUMERICAL SOLUTION OF A VOLTERRA EQUATION (SATURATION IN POROUS MEDIA) 539

First, the solution with zero noise:

data = NSat[4.0, 0, 200];
PlotSolution[data]


- Graphics -

The numerical solution with 0.001% noise:

data = NSat[4.0, 0.00001, 200];
PlotSolution[data]


- Graphics -

The numerical solution with 0.01% noise:

data = NSat[4.0, 0.0001, 200];
PlotSolution[data]


- Graphics -

A.15. EXAMPLE 10.5.3: NUMERICAL GREEN'S FUNCTION SOLUTION TO A SECOND-ORDER INHOMOGENEOUS EQUATION

The following codes illustrate how to implement the techniques of numerical integration (from Chapter 3) to define a Green's function solution to an inhomogeneous differential equation. The function u[x] is defined as the integral operation in Eq. (10.5.76) on the interval -1 < x < 1.

Auxiliary Functions

We first define the function f[x], which is the forcing function in Eq. (10.5.66):

f[x_] := Exp[-x^2]

We then define a routine for numerical integration. The function NInt uses the trapezoidal rule, where "integrand" contains the expression for the integrand; the list "limits" contains (in order) the variable of integration in "integrand", the lower integration limit, and the upper integration limit; and "delta" is the requested step size of the integration (dx). Note that the actual step size used will be less than or equal to "delta".

NInt[integrand_, limits_List, delta_] :=
  Module[{y, ymin, ymax, n, dy, ans, i},
    y = limits[[1]];
    ymin = limits[[2]];
    ymax = limits[[3]];
    n = Ceiling[(ymax - ymin)/delta] - 1;
    dy = (ymax - ymin)/(n + 1);
    ans = 0.5 dy ((integrand /. y -> ymin) + (integrand /. y -> ymax)) // N;
    ans += Sum[ dy (integrand /. y -> (ymin + i dy)), {i, n}] // N
  ]

Solution

Finally, we define the module for the function u[x], incorporating the routine NInt above.

u[x_] :=
  Module[{z, int, ans, dx},
    dx = 0.001;
    z = -1/3 Exp[-(x - 1)];
    int = NInt[(-3 Exp[y - 1] + 5 Exp[2 y]) f[y], {y, -1, x}, dx];
    ans = z int;
    z = -1/3 (-3 Exp[-2 (x + 1)] + 5 Exp[-(x + 1)]);
    int = NInt[Exp[2 (y + 1)] f[y], {y, x, 1}, dx];
    ans += z int // N
  ]

Note that we could have used the built-in Mathematica function NIntegrate instead of NInt, in which case the function u[x] would be redefined as:

u2[x_] :=
  Module[{z, int, ans},
    z = -1/3 Exp[-(x - 1)];
    int = NIntegrate[(-3 Exp[y - 1] + 5 Exp[2 y]) f[y], {y, -1, x}];
    ans = z int;
    z = -1/3 (-3 Exp[-2 (x + 1)] + 5 Exp[-(x + 1)]);
    int = NIntegrate[Exp[2 (y + 1)] f[y], {y, x, 1}];
    ans += z int // N
  ]
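As a quick consistency check (a sketch), the two implementations should agree to within the quadrature error of NInt at any point of the interval, and the full solution can be plotted with the built-in version:

u[0.3]
u2[0.3]
Plot[u2[x], {x, -1, 1}]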

A.16. EXAMPLE 10.8.2: SERIES SOLUTION TO THE SPHERICAL DIFFUSION EQUATION (CARBON IN A CANNONBALL)

The following code illustrates how to define a series solution (with an imposed cutoff) to the diffusion equation in spherical coordinates. In Example 10.8.2, the boundary conditions are such that the problem is spherically symmetric (i.e., the spatial solution only depends on the radius r and not on the angular orientation). We proceed by defining the function f[r,t] as the solution C/C0 in Eq. (10.8.22), where r is the dimensionless radius (r/b) and t the dimensionless time (tD/b^2).

f[r_, t_] :=
  Module[{TOLERANCE, Nmax, norm, ans, del, n},
    TOLERANCE = 10^(-15);
    Nmax = 100;
    norm = 1.0;
    ans = 1.0;
    n = 0;
    While[norm > TOLERANCE,
      n++;
      If[ r == 0.0,
        del = (-1)^n 2 Exp[-(Pi n)^2 t];,
        del = (-1)^n 2 Exp[-(Pi n)^2 t] Sin[Pi n r]/(Pi n r)
      ];
      ans += del;
      norm = Sqrt[del^2];
      If[ n > Nmax,
        Print["WARNING: Exceeded maximum iterations."];
        Break[];
      ];
    ];
    ans
  ]
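A usage sketch (illustrative arguments): the dimensionless concentration at the center of the sphere and a radial profile at a fixed dimensionless time can be obtained with

f[0.0, 0.05]
Plot[f[r, 0.05], {r, 0.0001, 1}]   (* C/C0 versus dimensionless radius r *)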
