Calculus 1 to 4 (2004–2006)

Axel Schüler

January 3, 2007

Contents

1 Real and Complex Numbers
   Basics
      Notations
      Sums and Products
      Mathematical Induction
      Binomial Coefficients
   1.1 Real Numbers
      1.1.1 Ordered Sets
      1.1.2 Fields
      1.1.3 Ordered Fields
      1.1.4 Embedding of natural numbers into the real numbers

      1.1.5 The completeness of ℝ
      1.1.6 The Absolute Value
      1.1.7 Supremum and Infimum revisited
      1.1.8 Powers of real numbers
      1.1.9 Logarithms
   1.2 Complex numbers
      1.2.1 The Complex Plane and the Polar form
      1.2.2 Roots of Complex Numbers
   1.3 Inequalities
      1.3.1 Monotony of the Power and Exponential Functions
      1.3.2 The Arithmetic-Geometric mean inequality
      1.3.3 The Cauchy–Schwarz Inequality
   1.4 Appendix A

2 Sequences and Series
   2.1 Convergent Sequences
      2.1.1 Algebraic operations with sequences
      2.1.2 Some special sequences
      2.1.3 Monotonic Sequences
      2.1.4 Subsequences
   2.2 Cauchy Sequences
   2.3 Series


      2.3.1 Properties of Convergent Series
      2.3.2 Operations with Convergent Series
      2.3.3 Series of Nonnegative Numbers
      2.3.4 The Number e
      2.3.5 The Root and the Ratio Tests
      2.3.6 Absolute Convergence
      2.3.7 Decimal Expansion of Real Numbers
      2.3.8 Complex Sequences and Series
      2.3.9 Power Series
      2.3.10 Rearrangements
      2.3.11 Products of Series

3 Functions and Continuity
   3.1 Limits of a Function
      3.1.1 One-sided Limits, Infinite Limits, and Limits at Infinity
   3.2 Continuous Functions
      3.2.1 The Intermediate Value Theorem
      3.2.2 Continuous Functions on Bounded and Closed Intervals—The Theorem about Maximum and Minimum
   3.3 Uniform Continuity
   3.4 Monotonic Functions
   3.5 Exponential, Trigonometric, and Hyperbolic Functions and their Inverses
      3.5.1 Exponential and Logarithm Functions
      3.5.2 Trigonometric Functions and their Inverses
      3.5.3 Hyperbolic Functions and their Inverses
   3.6 Appendix B
      3.6.1 Monotonic Functions have One-Sided Limits
      3.6.2 Proofs for sin x and cos x inequalities
      3.6.3 Estimates for π

4 Differentiation
   4.1 The Derivative of a Function
   4.2 The Derivatives of Elementary Functions
      4.2.1 Derivatives of Higher Order
   4.3 Local Extrema and the Mean Value Theorem
      4.3.1 Local Extrema and Convexity
   4.4 L’Hospital’s Rule
   4.5 Taylor’s Theorem
      4.5.1 Examples of Taylor Series
   4.6 Appendix C

5 Integration
   5.1 The Riemann–Stieltjes Integral
      5.1.1 Properties of the Integral

   5.2 Integration and Differentiation
      5.2.1 Table of Antiderivatives
      5.2.2 Integration Rules
      5.2.3 Integration of Rational Functions
      5.2.4 Partial Fraction Decomposition
      5.2.5 Other Classes of Elementary Integrable Functions
   5.3 Improper Integrals
      5.3.1 Integrals on unbounded intervals
      5.3.2 Integrals of Unbounded Functions
      5.3.3 The Gamma function
   5.4 Integration of Vector-Valued Functions
   5.5 Inequalities
   5.6 Appendix D
      5.6.1 More on the Gamma Function

6 Sequences of Functions and Basic Topology
   6.1 Discussion of the Main Problem
   6.2 Uniform Convergence
      6.2.1 Definitions and Example
      6.2.2 Uniform Convergence and Continuity
      6.2.3 Uniform Convergence and Integration
      6.2.4 Uniform Convergence and Differentiation
   6.3 Fourier Series
      6.3.1 An Inner Product on the Periodic Functions
   6.4 Basic Topology
      6.4.1 Finite, Countable, and Uncountable Sets
      6.4.2 Metric Spaces and Normed Spaces
      6.4.3 Open and Closed Sets
      6.4.4 Limits and Continuity
      6.4.5 Completeness and Compactness
      6.4.6 Continuous Functions in ℝ^k
   6.5 Appendix E

7 Differentiation of Functions of Several Variables
   7.1 Partial Derivatives
      7.1.1 Higher Partial Derivatives
      7.1.2 The Laplacian
   7.2 Total Differentiation
      7.2.1 Basic Theorems
   7.3 Taylor’s Formula
      7.3.1 Directional Derivatives
      7.3.2 Taylor’s Formula
   7.4 Extrema of Functions of Several Variables

   7.5 The Inverse Mapping Theorem
   7.6 The Implicit Function Theorem
   7.7 Lagrange Multiplier Rule
   7.8 Integrals depending on Parameters
      7.8.1 Continuity of I(y)
      7.8.2 Differentiation of Integrals
      7.8.3 Improper Integrals with Parameters
   7.9 Appendix

8 Curves and Line Integrals
   8.1 Rectifiable Curves
      8.1.1 Curves in ℝ^k
      8.1.2 Rectifiable Curves
   8.2 Line Integrals
      8.2.1 Path Independence

9 Integration of Functions of Several Variables
   9.1 Basic Definition
      9.1.1 Properties of the Riemann Integral
   9.2 Integrable Functions
      9.2.1 Integration over More General Sets
      9.2.2 Fubini’s Theorem and Iterated Integrals
   9.3 Change of Variable
   9.4 Appendix

10 Surface Integrals
   10.1 Surfaces in ℝ^3
      10.1.1 The Area of a Surface
   10.2 Scalar Surface Integrals
      10.2.1 Other Forms for dS
      10.2.2 Physical Application
   10.3 Surface Integrals
      10.3.1 Orientation
   10.4 Gauß’ Divergence Theorem
   10.5 Stokes’ Theorem
      10.5.1 Green’s Theorem
      10.5.2 Stokes’ Theorem
      10.5.3 Vector Potential and the Inverse Problem of Vector Analysis


11 Differential Forms on ℝ^n
   11.1 The Exterior Algebra Λ(ℝ^n)

      11.1.1 The Dual V^∗
      11.1.2 The Pull-Back of k-forms
      11.1.3 Orientation of ℝ^n

   11.2 Differential Forms
      11.2.1 Definition
      11.2.2 Differentiation
      11.2.3 Pull-Back
      11.2.4 Closed and Exact Forms
   11.3 Stokes’ Theorem
      11.3.1 Singular Cubes, Singular Chains, and the Boundary Operator
      11.3.2 Integration
      11.3.3 Stokes’ Theorem
      11.3.4 Special Cases
      11.3.5 Applications

12 Measure Theory and Integration
   12.1 Measure Theory
      12.1.1 Algebras, σ-algebras, and Borel Sets
      12.1.2 Additive Functions and Measures
      12.1.3 Extension of Countably Additive Functions
      12.1.4 The Lebesgue Measure on ℝ^n
   12.2 Measurable Functions
   12.3 The Lebesgue Integral
      12.3.1 Simple Functions
      12.3.2 Positive Measurable Functions
   12.4 Some Theorems on Lebesgue Integrals
      12.4.1 The Role Played by Measure Zero Sets
      12.4.2 The space L^p(X, µ)
      12.4.3 The Monotone Convergence Theorem
      12.4.4 The Dominated Convergence Theorem
      12.4.5 Application of Lebesgue’s Theorem to Parametric Integrals
      12.4.6 The Riemann and the Lebesgue Integrals
      12.4.7 Appendix: Fubini’s Theorem

13 Hilbert Space
   13.1 The Geometry of the Hilbert Space
      13.1.1 Unitary Spaces
      13.1.2 Norm and Inner product
      13.1.3 Two Theorems of F. Riesz
      13.1.4 Orthogonal Sets and Fourier Expansion
      13.1.5 Appendix
   13.2 Bounded Linear Operators in Hilbert Spaces
      13.2.1 Bounded Linear Operators
      13.2.2 The Adjoint Operator
      13.2.3 Classes of Bounded Linear Operators
      13.2.4 Orthogonal Projections

      13.2.5 Spectrum and Resolvent
      13.2.6 The Spectrum of Self-Adjoint Operators

14 Complex Analysis
   14.1 Holomorphic Functions
      14.1.1 Complex Differentiation
      14.1.2 Power Series
      14.1.3 Cauchy–Riemann Equations
   14.2 Cauchy’s Integral Formula
      14.2.1 Integration
      14.2.2 Cauchy’s Theorem
      14.2.3 Cauchy’s Integral Formula
      14.2.4 Applications of the Coefficient Formula
      14.2.5 Power Series
   14.3 Local Properties of Holomorphic Functions
   14.4 Singularities
      14.4.1 Classification of Singularities
      14.4.2 Laurent Series
   14.5 Residues
      14.5.1 Calculating Residues
   14.6 Real Integrals
      14.6.1 Rational Functions in Sine and Cosine
      14.6.2 Integrals of the form ∫_{−∞}^{∞} f(x) dx

15 Partial Differential Equations I — An Introduction
   15.1 Classification of PDE
      15.1.1 Introduction
      15.1.2 Examples
   15.2 First Order PDE — The Method of Characteristics
   15.3 Classification of Semi-Linear Second-Order PDEs
      15.3.1 Quadratic Forms
      15.3.2 Elliptic, Parabolic and Hyperbolic
      15.3.3 Change of Coordinates
      15.3.4 Characteristics
      15.3.5 The Vibrating String

16 Distributions
   16.1 Introduction — Test Functions and Distributions
      16.1.1 Motivation
      16.1.2 Test Functions D(ℝ^n) and D(Ω)
   16.2 The Distributions D′(ℝ^n)
      16.2.1 Regular Distributions
      16.2.2 Other Examples of Distributions

      16.2.3 Convergence and Limits of Distributions
      16.2.4 The distribution P(1/x)
      16.2.5 Operation with Distributions
   16.3 Tensor Product and Convolution Product
      16.3.1 The Support of a Distribution
      16.3.2 Tensor Products
      16.3.3 Convolution Product
      16.3.4 Linear Change of Variables
      16.3.5 Fundamental Solutions

   16.4 Fourier Transformation in S(ℝ^n) and S′(ℝ^n)
      16.4.1 The Space S(ℝ^n)
      16.4.2 The Space S′(ℝ^n)
      16.4.3 Fourier Transformation in S′(ℝ^n)
   16.5 Appendix — More about Convolutions

17 PDE II — The Equations of Mathematical Physics
   17.1 Fundamental Solutions
      17.1.1 The Laplace Equation
      17.1.2 The Heat Equation
      17.1.3 The Wave Equation
   17.2 The Cauchy Problem
      17.2.1 Motivation of the Method
      17.2.2 The Wave Equation
      17.2.3 The Heat Equation
      17.2.4 Physical Interpretation of the Results
   17.3 Fourier Method for Boundary Value Problems
      17.3.1 Initial Boundary Value Problems
      17.3.2 Eigenvalue Problems for the Laplace Equation
   17.4 Boundary Value Problems for the Laplace and the Poisson Equations
      17.4.1 Formulation of Boundary Value Problems
      17.4.2 Basic Properties of Harmonic Functions
   17.5 Appendix
      17.5.1 Existence of Solutions to the Boundary Value Problems
      17.5.2 Extremal Properties of Harmonic Functions and the Dirichlet Principle
      17.5.3 Numerical Methods

Chapter 1

Real and Complex Numbers

Basics

Notations

ℝ   Real numbers

ℂ   Complex numbers

ℚ   Rational numbers

ℕ = {1, 2, ...}   positive integers (natural numbers)

ℤ   Integers

We know that ℕ ⊆ ℤ ⊆ ℚ ⊆ ℝ ⊆ ℂ. We write ℝ_+, ℚ_+, and ℤ_+ for the non-negative real, rational, and integer numbers x ≥ 0, respectively. The notions A ⊂ B and A ⊆ B are equivalent. If we want to point out that B is strictly bigger than A we write A ⊊ B. We use the following symbols:

:=    defining equation
⇒    implication, “if ..., then ...”
⇔    “if and only if”, equivalence
∀    for all
∃    there exists

Let a < b be fixed real numbers. We denote the intervals as follows:

[a, b] := {x ∈ ℝ | a ≤ x ≤ b}    closed interval

(a, b) := {x ∈ ℝ | a < x < b}    open interval

[a, b) := {x ∈ ℝ | a ≤ x < b}    half-open interval

(a, b] := {x ∈ ℝ | a < x ≤ b}    half-open interval

[a, ∞) := {x ∈ ℝ | a ≤ x}    closed half-line

(a, ∞) := {x ∈ ℝ | a < x}    open half-line

(−∞, b] := {x ∈ ℝ | x ≤ b}    closed half-line

(−∞, b) := {x ∈ ℝ | x < b}    open half-line


(a) Sums and Products

Let us recall the meaning of the sum sign ∑ and the product sign ∏. Suppose m ≤ n are integers, and a_k, k = m, ..., n, are real numbers. Then we set

    ∑_{k=m}^{n} a_k := a_m + a_{m+1} + ··· + a_n,    ∏_{k=m}^{n} a_k := a_m a_{m+1} ··· a_n.

In case m = n the sum and the product consist of one summand and one factor only, respectively. In case n < m it is customary to set

    ∑_{k=m}^{n} a_k := 0  (empty sum),    ∏_{k=m}^{n} a_k := 1  (empty product).

The following rules are obvious: if m ≤ n ≤ p and d ∈ ℤ are integers, then

    ∑_{k=m}^{n} a_k + ∑_{k=n+1}^{p} a_k = ∑_{k=m}^{p} a_k,    ∑_{k=m}^{n} a_k = ∑_{k=m+d}^{n+d} a_{k−d}  (index shift).

We have, for a ∈ ℝ,

    ∑_{k=m}^{n} a = (n − m + 1) a.

(b) Mathematical Induction

Mathematical induction is a powerful method to prove theorems about natural numbers.
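These conventions and rules can be checked numerically. The following sketch is our own illustration (the helper name `sigma` is not notation from the notes); it mirrors the definitions term by term:

```python
import math

def sigma(a, m, n):
    """Sum of a(k) for k = m, ..., n; for n < m this is the empty sum 0."""
    return sum(a(k) for k in range(m, n + 1))

a = lambda k: k * k

assert sigma(a, 5, 3) == 0               # empty sum convention
assert math.prod(range(5, 4 + 1)) == 1   # empty product convention (range is empty)

# splitting: sum_{k=m}^{n} + sum_{k=n+1}^{p} = sum_{k=m}^{p}
assert sigma(a, 2, 5) + sigma(a, 6, 9) == sigma(a, 2, 9)

# index shift: sum_{k=m}^{n} a_k = sum_{k=m+d}^{n+d} a_{k-d}
d = 7
assert sigma(a, 2, 9) == sigma(lambda k: a(k - d), 2 + d, 9 + d)

# constant summand: sum_{k=m}^{n} a = (n - m + 1) a
assert sigma(lambda k: 3.5, 4, 10) == (10 - 4 + 1) * 3.5
```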

Theorem 1.1 (Principle of Mathematical Induction) Let n₀ ∈ ℤ be an integer. To prove a statement A(n) for all integers n ≥ n₀ it is sufficient to show:

(I) A(n₀) is true.
(II) For any n ≥ n₀: if A(n) is true, so is A(n + 1) (induction step).

It is easy to see how the principle works: first, A(n₀) is true. Applying (II) to n = n₀ we obtain that A(n₀ + 1) is true. Successive application of (II) yields that A(n₀ + 2), A(n₀ + 3), and so on, are true.

Example 1.1 (a) For all nonnegative integers n we have

    ∑_{k=1}^{n} (2k − 1) = n².

Proof. We use induction over n. In case n = 0 we have an empty sum on the left-hand side (lhs) and 0² = 0 on the right-hand side (rhs). Hence, the statement is true for n = 0. Suppose it is true for some fixed n. We shall prove it for n + 1. By the definition of the sum and by the induction hypothesis, ∑_{k=1}^{n} (2k − 1) = n², we have

    ∑_{k=1}^{n+1} (2k − 1) = ∑_{k=1}^{n} (2k − 1) + 2(n + 1) − 1 = n² + 2n + 1 = (n + 1)².

This proves the claim for n + 1.
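A finite sanity check of the identity (of course no substitute for the induction proof):

```python
# sum of the first n odd numbers equals n^2, checked for n = 0, ..., 299
for n in range(0, 300):
    assert sum(2 * k - 1 for k in range(1, n + 1)) == n ** 2
```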

(b) For all positive integers n ≥ 8 we have 2^n > 3n².

Proof. In case n = 8 we have

    2^n = 2^8 = 256 > 192 = 3 · 64 = 3 · 8² = 3n²,

and the statement is true in this case. Suppose it is true for some fixed n ≥ 8, i.e. 2^n > 3n² (induction hypothesis). We will show that the statement is true for n + 1, i.e. 2^{n+1} > 3(n + 1)² (induction assertion). Note that n ≥ 8 implies

    n − 1 ≥ 7 > 2  ⇒  (n − 1)² > 4 > 2  ⇒  n² − 2n − 1 > 0
    ⇒  3(n² − 2n − 1) > 0  ⇒  3n² − 6n − 3 > 0    | + 3n² + 6n + 3
    ⇒  6n² > 3n² + 6n + 3  ⇒  2 · 3n² > 3(n² + 2n + 1)
    ⇒  2 · 3n² > 3(n + 1)².    (1.1)

By the induction hypothesis, 2^{n+1} = 2 · 2^n > 2 · 3n². This together with (1.1) yields 2^{n+1} > 3(n + 1)². Thus, we have shown the induction assertion. Hence the statement is true for all positive integers n ≥ 8.

For a positive integer n ∈ ℕ we set

    n! := ∏_{k=1}^{n} k,   read: “n factorial”;   0! = 1! = 1.

(c) Binomial Coefficients

For non-negative integers n, k ∈ ℤ_+ we define

    \binom{n}{k} := ∏_{i=1}^{k} (n − i + 1)/i = n(n − 1) ··· (n − k + 1) / (k(k − 1) ··· 2 · 1).

The numbers \binom{n}{k} (read: “n choose k”) are called binomial coefficients since they appear in the binomial theorem, see Proposition 1.4 below. It follows directly from the definition that

    \binom{n}{k} = 0  for k > n,
    \binom{n}{k} = n! / (k!(n − k)!) = \binom{n}{n−k}  for 0 ≤ k ≤ n.

Lemma 1.2 For 0 ≤ k ≤ n we have:

    \binom{n+1}{k+1} = \binom{n}{k} + \binom{n}{k+1}.

Proof. For k = n the formula is obvious. For 0 ≤ k ≤ n − 1 we have

    \binom{n}{k} + \binom{n}{k+1} = n! / (k!(n − k)!) + n! / ((k + 1)!(n − k − 1)!)
    = ((k + 1) n! + (n − k) n!) / ((k + 1)!(n − k)!) = (n + 1)! / ((k + 1)!(n − k)!) = \binom{n+1}{k+1}.
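Lemma 1.2 and the boundary facts above are easy to spot-check with Python's `math.comb`, which computes binomial coefficients:

```python
from math import comb

# Pascal's rule of Lemma 1.2: C(n+1, k+1) = C(n, k) + C(n, k+1),
# together with the boundary facts used in the text
for n in range(0, 30):
    assert comb(n, n + 1) == 0                       # C(n, k) = 0 for k > n
    for k in range(0, n + 1):
        assert comb(n, k) == comb(n, n - k)          # symmetry
        assert comb(n + 1, k + 1) == comb(n, k) + comb(n, k + 1)
```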

We say that X is an n-set if X has exactly n elements. We write Card X = n (from “cardinality”) to denote the number of elements of X.

Lemma 1.3 The number of k-subsets of an n-set is \binom{n}{k}.

The Lemma in particular shows that \binom{n}{k} is always an integer (which is not obvious from its definition).

Proof. We denote the number of k-subsets of an n-set X_n by C^n_k. It is clear that C^n_0 = C^n_n = 1 since ∅ is the only 0-subset of X_n and X_n itself is the only n-subset of X_n. We use induction over n. The case n = 1 is obvious since C^1_0 = C^1_1 = \binom{1}{0} = \binom{1}{1} = 1. Suppose that the claim is true for some fixed n. We will show the statement for the (n + 1)-set X = {1, ..., n + 1} and all k with 1 ≤ k ≤ n. The family of (k + 1)-subsets of X splits into two disjoint classes. In the first class A₁ every subset contains n + 1; in the second class A₂, not. To form a subset in A₁ one has to choose another k elements out of {1, ..., n}. By the induction assumption this number is Card A₁ = C^n_k = \binom{n}{k}. To form a subset in A₂ one has to choose k + 1 elements out of {1, ..., n}. By the induction assumption this number is Card A₂ = C^n_{k+1} = \binom{n}{k+1}. By Lemma 1.2 we obtain

    C^{n+1}_{k+1} = Card A₁ + Card A₂ = \binom{n}{k} + \binom{n}{k+1} = \binom{n+1}{k+1},

which proves the induction assertion.
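For small n, Lemma 1.3 can be confirmed by literally enumerating the k-subsets with `itertools.combinations` (a brute-force sketch of ours, not the notes' argument):

```python
from itertools import combinations
from math import comb

# count the k-subsets of an n-set directly and compare with C(n, k)
for n in range(0, 8):
    X = range(n)
    for k in range(0, n + 1):
        assert sum(1 for _ in combinations(X, k)) == comb(n, k)
```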

Proposition 1.4 (Binomial Theorem) Let x, y ∈ ℝ and n ∈ ℕ. Then we have

    (x + y)^n = ∑_{k=0}^{n} \binom{n}{k} x^{n−k} y^k.

Proof. We give a direct proof. Using the distributive law we find that each of the 2^n summands of the product (x + y)^n has the form x^{n−k} y^k for some k = 0, ..., n. We number the n factors as (x + y)^n = f₁ · f₂ ··· f_n, f₁ = f₂ = ··· = f_n = x + y. Let us count how often the summand x^{n−k} y^k appears. We have to choose k factors y out of the n factors f₁, ..., f_n. The remaining n − k factors must be x. This gives a 1-1-correspondence between the k-subsets of {f₁, ..., f_n} and the different summands of the form x^{n−k} y^k. Hence, by Lemma 1.3 their number is C^n_k = \binom{n}{k}. This proves the proposition.
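A numerical spot check of the binomial theorem for a few sample values (floating-point comparison via `math.isclose`; the sample pairs are our own choice):

```python
from math import comb, isclose

# (x + y)^n versus the expanded sum, for several (x, y) and n = 0, ..., 11
for x, y in [(1.5, -0.25), (3.0, 2.0), (-1.0, 4.0)]:
    for n in range(0, 12):
        rhs = sum(comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))
        assert isclose((x + y) ** n, rhs, rel_tol=1e-12, abs_tol=1e-12)
```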

1.1 Real Numbers

In this lecture course we assume the system of real numbers to be given. Recall that the set of integers is ℤ = {0, ±1, ±2, ...}, while the fractions of integers, ℚ = {m/n | m, n ∈ ℤ, n ≠ 0}, form the set of rational numbers. A satisfactory discussion of the main concepts of analysis such as convergence, continuity, differentiation and integration must be based on an accurately defined number concept.

An existence proof for the real numbers is given in [Rud76, Appendix to Chapter 1]. The author explicitly constructs the real numbers ℝ starting from the rational numbers ℚ. The aim of the following two sections is to formulate the axioms which are sufficient to derive all properties and theorems of the real number system.

The rational numbers are inadequate for many purposes, both as a field and as an ordered set. For instance, there is no rational x with x² = 2. This leads to the introduction of irrational numbers, which are often written as infinite decimal expansions and are considered to be “approximated” by the corresponding finite decimals. Thus the sequence

1, 1.4, 1.41, 1.414, 1.4142, ...

“tends to √2.” But unless the irrational number √2 has been clearly defined, the question must arise: What is it that this sequence “tends to”? This sort of question can be answered as soon as the so-called “real number system” is constructed.
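The truncated decimal expansions above can be generated mechanically. The following sketch uses Python's `decimal` module (exact decimal arithmetic) to produce, for each number of digits, the largest finite decimal whose square is still below 2:

```python
from decimal import Decimal, getcontext

# squares of the finite decimals 1, 1.4, 1.41, 1.414, ... approach 2 from
# below, although no finite decimal ever reaches it
getcontext().prec = 50
two = Decimal(2)
x = Decimal(0)
for digits in range(0, 12):
    step = Decimal(1).scaleb(-digits)        # 10^(-digits)
    while (x + step) ** 2 < two:
        x += step
    assert x ** 2 < two < (x + step) ** 2    # best truncation with this many digits

assert str(x).startswith("1.414213562")
```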

Example 1.2 As shown in the exercise class, there is no rational number x with x2 =2. Set

    A = {x ∈ ℚ_+ | x² < 2}   and   B = {x ∈ ℚ_+ | x² > 2}.

Then A ∪ B = ℚ_+ and A ∩ B = ∅. One can show that, in the rational number system, A has no largest element and B has no smallest element; for details see Appendix A or Rudin’s book [Rud76, Example 1.1, page 2]. This example shows that the system of rational numbers has certain gaps, in spite of the fact that between any two rationals there is another: if r < s then r < (r + s)/2 < s.

1.1.1 Ordered Sets

Definition 1.1 (a) Let S be a set. An order (or total order) on S is a relation, denoted by <, with the following properties. Let x, y, z ∈ S.

(i) One and only one of the following statements is true:

    x < y,   x = y,   y < x.

(ii) x < y and y < z imply x < z (transitivity).

In this case S is called an ordered set.
(b) Suppose (S, <) is an ordered set, and E ⊆ S. If there exists a β ∈ S such that x ≤ β for all x ∈ E, we say that E is bounded above, and call β an upper bound of E. Lower bounds are defined in the same way with ≥ in place of ≤. If E is both bounded above and below, we say that E is bounded.

The statement x ≤ y means x < y or x = y; we write x > y instead of y < x. For example, ℝ is an ordered set if r < s is defined to mean that s − r is a positive real number.

Example 1.3 (a) The intervals [a, b], (a, b], [a, b), (a, b), (−∞, b), and (−∞, b] are bounded above by b and by all numbers greater than b.
(b) E := {1/n | n ∈ ℕ} = {1, 1/2, 1/3, ...} is bounded above by any α ≥ 1. It is bounded below by 0.

Definition 1.2 Suppose S is an ordered set, E ⊆ S, and E is bounded above. Suppose there exists an α ∈ S such that

(i) α is an upper bound of E,
(ii) if β is an upper bound of E then α ≤ β.

Then α is called the supremum (or least upper bound) of E. We write

    α = sup E.

An equivalent formulation of (ii) is the following:

(ii)′ If β < α then β is not an upper bound of E.

The infimum (or greatest lower bound) of a set E which is bounded below is defined in the same manner: the statement α = inf E means that α is a lower bound of E and for all lower bounds β of E we have β ≤ α.

Example 1.4 (a) If α = sup E exists, then α may or may not belong to E. For instance, consider [0, 1) and [0, 1]. Then 1 = sup[0, 1) = sup[0, 1]; however, 1 ∉ [0, 1) but 1 ∈ [0, 1].
We will show that sup[0, 1] = 1. Obviously, 1 is an upper bound of [0, 1]. Suppose that β < 1; then β is not an upper bound of [0, 1] since β ≱ 1. Hence 1 = sup[0, 1].
We will show that sup[0, 1) = 1. Obviously, 1 is an upper bound of this interval. Suppose that β < 1. Then β < (β + 1)/2 < 1. Since (β + 1)/2 ∈ [0, 1), β is not an upper bound. Consequently,

1 = sup[0, 1).

(b) Consider the sets A and B of Example 1.2 as subsets of the ordered set ℚ. Since A ∪ B = ℚ_+ (there is no rational number with x² = 2), the upper bounds of A are exactly the elements of B.

Indeed, if a ∈ A and b ∈ B then a² < 2 < b². Taking the square root we have a < b. Since B contains no smallest member, A has no supremum in ℚ_+.
Similarly, B is bounded below by any element of A. Since A has no largest member, B has no

infimum in ℚ.

Remarks 1.1 (a) It is clear from (ii) and the trichotomy of < that there is at most one such α.

Indeed, suppose α′ also satisfies (i) and (ii); by (ii) we have α ≤ α′ and α′ ≤ α; hence α = α′.
(b) If sup E exists and belongs to E, we call it the maximum of E and denote it by max E. Hence, max E = sup E and max E ∈ E. Similarly, if the infimum of E exists and belongs to E, we call it the minimum and denote it by min E; min E = inf E, min E ∈ E.

bounded subset of ℚ    an upper bound    sup    max
[0, 1]                 2                 1      1
[0, 1)                 2                 1      —
A                      2                 —      —

(c) Suppose that α is an upper bound of E and α ∈ E; then α = max E, that is, property (ii) in Definition 1.2 is automatically satisfied. Similarly, if β ∈ E is a lower bound, then β = min E.
(d) If E is a finite set, it always has a maximum and a minimum.

1.1.2 Fields

Definition 1.3 A field is a set F with two operations, called addition and multiplication, which satisfy the following so-called “field axioms” (A), (M), and (D):

(A) Axioms for addition

(A1) If x ∈ F and y ∈ F then their sum x + y is in F.
(A2) Addition is commutative: x + y = y + x for all x, y ∈ F.
(A3) Addition is associative: (x + y) + z = x + (y + z) for all x, y, z ∈ F.
(A4) F contains an element 0 such that 0 + x = x for all x ∈ F.
(A5) To every x ∈ F there exists an element −x ∈ F such that x + (−x) = 0.

(M1) If x ∈ F and y ∈ F then their product xy is in F.
(M2) Multiplication is commutative: xy = yx for all x, y ∈ F.
(M3) Multiplication is associative: (xy)z = x(yz) for all x, y, z ∈ F.
(M4) F contains an element 1 such that 1x = x for all x ∈ F.
(M5) If x ∈ F and x ≠ 0 then there exists an element 1/x ∈ F such that x · (1/x) = 1.

The distributive law

    x(y + z) = xy + xz

holds for all x, y, z ∈ F.

Remarks 1.2 (a) One usually writes

    x − y,   x/y,   x + y + z,   xyz,   x²,   x³,   2x, ...

in place of

    x + (−y),   x · (1/y),   (x + y) + z,   (xy)z,   x · x,   x · x · x,   x + x, ...

(b) The field axioms clearly hold in ℚ if addition and multiplication have their customary meaning. Thus ℚ is a field. The integers ℤ do not form a field since 2 ∈ ℤ has no multiplicative inverse in ℤ (axiom (M5) is not fulfilled).

(c) The smallest field is F₂ = {0, 1}, consisting of the neutral element 0 for addition and the neutral element 1 for multiplication. Addition and multiplication are defined as follows:

    +  | 0  1        ·  | 0  1
    0  | 0  1        0  | 0  0
    1  | 1  0        1  | 0  1

It is easy to check the field axioms (A), (M), and (D) directly.
(d) (A1) to (A5) and (M1) to (M5) mean that (F, +) and (F \ {0}, ·) are both commutative (or abelian) groups.
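Representing F₂ as arithmetic modulo 2, the tables above and several of the axioms can be checked exhaustively; a small sketch of ours:

```python
from itertools import product

# the two-element field F2 = {0, 1}: addition and multiplication mod 2
add = lambda x, y: (x + y) % 2
mul = lambda x, y: (x * y) % 2

F = (0, 1)
assert add(1, 1) == 0                                                 # table entry 1 + 1 = 0
for x, y, z in product(F, repeat=3):
    assert add(x, y) == add(y, x) and mul(x, y) == mul(y, x)          # (A2), (M2)
    assert add(add(x, y), z) == add(x, add(y, z))                     # (A3)
    assert mul(mul(x, y), z) == mul(x, mul(y, z))                     # (M3)
    assert mul(x, add(y, z)) == add(mul(x, y), mul(x, z))             # (D)
assert all(add(0, x) == x and mul(1, x) == x for x in F)              # (A4), (M4)
assert mul(1, 1) == 1                                                 # 1 is its own inverse
```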

Proposition 1.5 The axioms of addition imply the following statements.
(a) If x + y = x + z then y = z (cancellation law).
(b) If x + y = x then y = 0 (the element 0 is unique).
(c) If x + y = 0 then y = −x (the inverse −x is unique).
(d) −(−x) = x.

y = 0+ y = ( x + x)+ y = x +(x + y) = x +(x + z) A 4 A 5 − A 3 − assump. − = ( x + x)+ z = 0+ z = z. A 3 − A 5 A 4 This proves (a). Take z = 0 in (a) to obtain (b). Take z = x in (a) to obtain (c). Since − x + x =0, (c) with x in place of x and x in place of y, gives (d). − −

Proposition 1.6 The axioms for multiplication imply the following statements.
(a) If x ≠ 0 and xy = xz then y = z (cancellation law).
(b) If x ≠ 0 and xy = x then y = 1 (the element 1 is unique).
(c) If x ≠ 0 and xy = 1 then y = 1/x (the inverse 1/x is unique).
(d) If x ≠ 0 then 1/(1/x) = x.

The proof is so similar to that of Proposition 1.5 that we omit it.

Proposition 1.7 The field axioms imply the following statements, for any x, y, z ∈ F.
(a) 0x = 0.
(b) If xy = 0 then x = 0 or y = 0.
(c) (−x)y = −(xy) = x(−y).
(d) (−x)(−y) = xy.

Proof. 0x + 0x = (0 + 0)x = 0x. Hence 1.5 (b) implies that 0x = 0, and (a) holds.
Suppose to the contrary that both x ≠ 0 and y ≠ 0; then (a) gives

    1 = (1/y)(1/x) · xy = (1/y)(1/x) · 0 = 0,

a contradiction. Thus (b) holds. The first equality in (c) comes from

    (−x)y + xy = (−x + x)y = 0y = 0,

combined with 1.5 (b); the other half of (c) is proved in the same way. Finally,

    (−x)(−y) = −[x(−y)] = −[−xy] = xy

by (c) and 1.5 (d).

1.1.3 Ordered Fields

In analysis, dealing with equations is as important as dealing with inequalities. Calculations with inequalities are based on the ordering axioms. It turns out that all of them can be reduced to the notion of positivity: in F there are distinguished positive elements (x > 0) such that the following axioms are valid.

Definition 1.4 An ordered field is a field F which is also an ordered set, such that for all x, y ∈ F:

(O) Axioms for ordered fields

(O1) x > 0 and y > 0 implies x + y > 0,
(O2) x > 0 and y > 0 implies xy > 0.

If x > 0 we call x positive; if x < 0, x is negative. For example, ℚ and ℝ are ordered fields if x > y is defined to mean that x − y is positive.

Proposition 1.8 The following statements are true in every ordered field F.
(a) If x < y then a + x < a + y for all a ∈ F.
(b) If x < y and x′ < y′ then x + x′ < y + y′.

Proof. (a) By assumption (a + y) − (a + x) = y − x > 0. Hence a + x < a + y.
(b) By assumption and by (a) we have x + x′ < y + x′ and x′ + y < y′ + y; hence x + x′ < y + y′.

Proposition 1.9 The following statements are true in every ordered field.
(a) If x > 0 then −x < 0, and if x < 0 then −x > 0.
(b) If x > 0 and y < z then xy < xz.
(c) If x < 0 and y < z then xy > xz.
(d) If x ≠ 0 then x² > 0. In particular, 1 > 0.
(e) If 0 < x < y then 0 < 1/y < 1/x.

Proof. (a) If x > 0 then 0 = −x + x > −x + 0 = −x, so that −x < 0. If x < 0 then 0 = −x + x < −x + 0 = −x, so that −x > 0. This proves (a).
(b) Since z > y, we have z − y > 0, hence x(z − y) > 0 by axiom (O2), and therefore

    xz = x(z − y) + xy > 0 + xy = xy

by Proposition 1.8.

(c) By (a), (b) and Proposition 1.7 (c),

    −[x(z − y)] = (−x)(z − y) > 0,

so that x(z − y) < 0, hence xz < xy.
(d) If x > 0, axiom (O2) gives x² > 0. If x < 0, then −x > 0, hence (−x)² > 0. But x² = (−x)² by Proposition 1.7 (d). Since 1² = 1, we get 1 > 0.
(e) If y > 0 and v ≤ 0, then yv ≤ 0. But y · (1/y) = 1 > 0. Hence 1/y > 0, and likewise 1/x > 0. If we multiply the inequality x < y by the positive quantity (1/x)(1/y), we obtain 1/y < 1/x.

Remarks 1.3 (a) The finite field F₂ = {0, 1}, see Remarks 1.2, is not an ordered field, since 1 + 1 = 0 contradicts 1 > 0 (by (O1), 1 + 1 would have to be positive).
(b) The field of complex numbers ℂ (see below) is not an ordered field, since i² = −1 contradicts Proposition 1.9 (a), (d).

1.1.4 Embedding of natural numbers into the real numbers

Let F be an ordered field. We want to recover the integers inside F. In order to distinguish 0 and 1 in F from the integers 0 and 1, we temporarily write 0_F and 1_F. For a positive integer n ∈ ℕ, n ≥ 2, we define

    n_F := 1_F + 1_F + ··· + 1_F   (n times).

Lemma 1.10 We have n_F > 0_F for all n ∈ ℕ.

Proof. We use induction over n. By Proposition 1.9 (d) the statement is true for n = 1. Suppose it is true for a fixed n, i.e. n_F > 0_F. Moreover 1_F > 0_F. Using axiom (O1) we obtain (n + 1)_F = n_F + 1_F > 0_F.

From Lemma 1.10 it follows that m ≠ n implies n_F ≠ m_F. Indeed, let n be greater than m, say n = m + k for some k ∈ ℕ; then n_F = m_F + k_F. Since k_F > 0, it follows from 1.8 (a) that n_F > m_F. In particular, n_F ≠ m_F. Hence, the mapping

    ℕ → F,   n ↦ n_F

is a one-to-one correspondence (injective). In this way the positive integers are embedded into the real numbers. Addition and multiplication of natural numbers and of their embeddings agree:

    n_F + m_F = (n + m)_F,   n_F · m_F = (nm)_F.

From now on we identify a natural number with the associated real number. We write n for n_F.

Definition 1.5 (The Archimedean Axiom) An ordered field F is called Archimedean if for all x, y ∈ F with x > 0 and y > 0 there exists an n ∈ ℕ such that nx > y.

An equivalent formulation is: the subset ℕ ⊂ F of positive integers is not bounded above.

Indeed, choose x = 1 in the above definition; then for any y ∈ F there is an n ∈ ℕ such that n > y; hence ℕ is not bounded above. Conversely, suppose ℕ is not bounded above and x > 0, y > 0 are given. Then y/x is not an upper bound for ℕ, that is, there is some n ∈ ℕ with n > y/x, i.e. nx > y.

1.1.5 The completeness of ℝ

Using the axioms so far we are not yet able to prove the existence of irrational numbers. We need the completeness axiom.

Definition 1.6 (Order Completeness) An ordered set S is said to be order complete if every non-empty subset E ⊂ S which is bounded above has a supremum sup E in S.

(C) Completeness Axiom

The real numbers are order complete, i.e. every non-empty subset E ⊂ ℝ which is bounded above has a supremum.

The set ℚ of rational numbers is not order complete since, for example, the bounded set A = {x ∈ ℚ_+ | x² < 2} has no supremum in ℚ. Later we will define √2 := sup A.

The existence of √2 in ℝ is furnished by the completeness axiom (C).
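Numerically, the supremum can be squeezed into ever smaller intervals by bisection; the following floating-point sketch of ours mimics how completeness pins down √2, without being part of the axiomatic development:

```python
# bisection: shrink an interval [lo, hi] with lo^2 < 2 < hi^2 around sup A
lo, hi = 1.0, 2.0
for _ in range(60):
    mid = (lo + hi) / 2
    if mid * mid < 2:
        lo = mid
    else:
        hi = mid

assert hi - lo < 1e-15          # the interval has collapsed (up to rounding)
assert abs(lo * lo - 2) < 1e-14 # lo^2 is as close to 2 as doubles allow
```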

Axiom (C) implies that every non-empty subset E ⊂ ℝ which is bounded below has an infimum. This is an easy consequence of Homework 1.4 (a). We will see that an order complete field is always Archimedean.

Proposition 1.11 (a) ℝ is Archimedean.
(b) If x, y ∈ ℝ and x < y, then there exists a p ∈ ℚ with x < p < y.

Part (b) may be stated by saying that ℚ is dense in ℝ.
Proof. (a) Let x, y > 0 be real numbers which do not fulfill the Archimedean property. That is, if A := {nx | n ∈ ℕ}, then y would be an upper bound of A. Then (C) furnishes that A has a supremum α = sup A. Since x > 0, α − x < α and α − x is not an upper bound of A. Hence α − x < mx for some m ∈ ℕ. But then α < (m + 1)x, which is impossible, since α is an upper bound of A.

(b) See [Rud76, Theorem 29].

Remarks 1.4 (a) If x, y ∈ ℚ with x < y, then there exists an irrational z ∈ ℝ \ ℚ with x < z < y.
(b) We shall show that inf {1/n | n ∈ ℕ} = 0. Since n > 0 for all n ∈ ℕ, 1/n > 0 by

Proposition 1.9 (e), and 0 is a lower bound. Suppose α > 0. Since ℝ is Archimedean, we find m ∈ ℕ such that 1 < mα or, equivalently, 1/m < α. Hence, α is not a lower bound for E, which proves the claim.
(c) Axiom (C) is equivalent to the Archimedean property together with the topological completeness (“Every Cauchy sequence in ℝ is convergent,” see Proposition 2.18).
(d) Axiom (C) is equivalent to the axiom of nested intervals, see Proposition 2.11 below:

Let I_n := [a_n, b_n] be a sequence of nested closed intervals (I₁ ⊇ I₂ ⊇ I₃ ⊇ ···) such that for all ε > 0 there exists n₀ with 0 ≤ b_n − a_n < ε for all n ≥ n₀. Then there exists a unique real number a ∈ ℝ which is a member of all intervals, i.e. {a} = ⋂_{n∈ℕ} I_n.

1.1.6 The Absolute Value

For x ∈ ℝ one defines

    |x| := x, if x ≥ 0;    |x| := −x, if x < 0.

Lemma 1.12 For $a, x, y \in \mathbb{R}$ we have
(a) $|x| \ge 0$, and $|x| = 0$ if and only if $x = 0$. Further, $|-x| = |x|$.
(b) $\pm x \le |x|$, $|x| = \max\{x, -x\}$, and $|x| \le a \iff (x \le a \text{ and } -x \le a)$.
(c) $|xy| = |x|\,|y|$ and $\left|\frac{x}{y}\right| = \frac{|x|}{|y|}$ if $y \ne 0$.
(d) $|x + y| \le |x| + |y|$ (triangle inequality).
(e) $\big||x| - |y|\big| \le |x + y|$.

Proof. (a) By Proposition 1.9 (a), $x < 0$ implies $|x| = -x > 0$. Also, $x > 0$ implies $|x| > 0$. Putting both together we obtain: $x \ne 0$ implies $|x| > 0$, and thus $|x| = 0$ implies $x = 0$. Moreover $|0| = 0$. This shows the first part. The statement $|x| = |-x|$ follows from (b) and $-(-x) = x$.
(b) Suppose first that $x \ge 0$. Then $x \ge 0 \ge -x$ and we have $\max\{x, -x\} = x = |x|$. If $x < 0$ then $-x > 0 > x$ and $\max\{x, -x\} = -x = |x|$. This proves $\max\{x, -x\} = |x|$. Since the maximum is an upper bound, $|x| \ge x$ and $|x| \ge -x$. Suppose now $a$ is an upper bound of $\{x, -x\}$. Then $|x| = \max\{x, -x\} \le a$. On the other hand, $\max\{x, -x\} \le a$ implies that $a$ is an upper bound of $\{x, -x\}$, since the maximum is.
(c) One proves the first part by verifying the four cases (i) $x, y \ge 0$, (ii) $x \ge 0$, $y < 0$, (iii) $x < 0$, $y \ge 0$, and (iv) $x, y < 0$ separately. Case (i) is clear. In case (ii) we have by Proposition 1.9 (a) and (b) that $xy \le 0$, and by Proposition 1.7 (c),
$$|x|\,|y| = x(-y) = -(xy) = |xy|.$$
The cases (iii) and (iv) are similar. For the second part: since $x = \frac{x}{y} \cdot y$, we have by the first part of (c) that $|x| = \left|\frac{x}{y}\right| |y|$. The claim follows by multiplication with $\frac{1}{|y|}$.
(d) By (b) we have $\pm x \le |x|$ and $\pm y \le |y|$. It follows from Proposition 1.8 (b) that
$$\pm(x + y) \le |x| + |y|.$$
By the second part of (b) with $a = |x| + |y|$, we obtain $|x + y| \le |x| + |y|$.
(e) Inserting $u := x + y$ and $v := -y$ into $|u + v| \le |u| + |v|$ one obtains
$$|x| \le |x + y| + |-y| = |x + y| + |y|.$$
Adding $-|y|$ on both sides one obtains $|x| - |y| \le |x + y|$. Changing the roles of $x$ and $y$ in the last inequality yields $-(|x| - |y|) \le |x + y|$. The claim follows again by (b) with $a = |x + y|$.
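The inequalities of Lemma 1.12 (d) and (e) are easy to spot-check numerically. The following is only an illustrative sketch with arbitrarily chosen sample pairs, not part of the proof:

```python
# Numerically spot-check the triangle inequality |x+y| <= |x|+|y|
# and the reverse triangle inequality ||x|-|y|| <= |x+y| from Lemma 1.12.
pairs = [(3.0, 4.0), (-3.0, 4.0), (-2.5, -7.5), (0.0, 1.0)]

for x, y in pairs:
    assert abs(x + y) <= abs(x) + abs(y)        # (d)
    assert abs(abs(x) - abs(y)) <= abs(x + y)   # (e)

print("Lemma 1.12 (d) and (e) hold for all sample pairs")
```

Note that (e) is sharp for the pair $(-3, 4)$: both sides equal $1$.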

1.1.7 Supremum and Infimum revisited

The following equivalent definition of the supremum of a set of real numbers is often used in the sequel. Note that
$$x \le \beta \ \ \forall\, x \in M \implies \sup M \le \beta.$$
Similarly, $\alpha \le x$ for all $x \in M$ implies $\alpha \le \inf M$.

Remarks 1.5 (a) Suppose that $E \subset \mathbb{R}$. Then $\alpha$ is the supremum of $E$ if and only if
(1) $\alpha$ is an upper bound for $E$,
(2) for all $\varepsilon > 0$ there exists $x \in E$ with $\alpha - \varepsilon < x$.
Using the Archimedean axiom, (2) can be replaced by
(2') for all $n \in \mathbb{N}$ there exists $x \in E$ such that $\alpha - \frac{1}{n} < x$.

(b) Let $M \subset \mathbb{R}$ and $N \subset \mathbb{R}$ be nonempty subsets which are bounded above. Then $M + N := \{m + n \mid m \in M,\ n \in N\}$ is bounded above and
$$\sup(M + N) = \sup M + \sup N.$$
(c) Let $M \subset \mathbb{R}_+$ and $N \subset \mathbb{R}_+$ be nonempty subsets which are bounded above. Then $MN := \{mn \mid m \in M,\ n \in N\}$ is bounded above and
$$\sup(MN) = \sup M \cdot \sup N.$$

1.1.8 Powers of real numbers

We shall prove the existence of $n$th roots of positive reals. We already know $x^n$ for $n \in \mathbb{Z}$. It is recursively defined by $x^n := x^{n-1} \cdot x$, $x^1 := x$ for $n \in \mathbb{N}$, and $x^n := \frac{1}{x^{-n}}$ for $n < 0$.

Proposition 1.13 (Bernoulli's inequality) Let $x \ge -1$ and $n \in \mathbb{N}$. Then we have
$$(1 + x)^n \ge 1 + nx.$$
Equality holds if and only if $x = 0$ or $n = 1$.

Proof. We use induction over $n$. In the cases $n = 1$ and $x = 0$ we have equality. The strict inequality (with a $>$ sign in place of the $\ge$ sign) holds for $n = 2$ and $x \ne 0$ since $(1 + x)^2 = 1 + 2x + x^2 > 1 + 2x$. Suppose the strict inequality is true for some fixed $n \ge 2$ and $x \ne 0$. Since $1 + x \ge 0$, by Proposition 1.9 (b) multiplication of the induction assumption by this factor yields

$$(1 + x)^{n+1} \ge (1 + nx)(1 + x) = 1 + (n + 1)x + nx^2 > 1 + (n + 1)x.$$

This proves the strict assertion for $n + 1$. We have equality only if $n = 1$ or $x = 0$.

Lemma 1.14 (a) For $x, y \in \mathbb{R}$ with $x, y > 0$ and $n \in \mathbb{N}$ we have
$$x < y \iff x^n < y^n.$$

(b) For $x, y \in \mathbb{R}_+$ and $n \in \mathbb{N}$ we have
$$n x^{n-1}(y - x) \le y^n - x^n \le n y^{n-1}(y - x). \quad (1.2)$$
We have equality if and only if $n = 1$ or $x = y$.

Proof. (a) Observe that
$$y^n - x^n = (y - x) \sum_{k=1}^{n} y^{n-k} x^{k-1} = c\,(y - x)$$
with $c := \sum_{k=1}^{n} y^{n-k} x^{k-1} > 0$ since $x, y > 0$. The claim follows.
(b) We have
$$y^n - x^n - n x^{n-1}(y - x) = (y - x)\left(\sum_{k=1}^{n} y^{n-k} x^{k-1} - n x^{n-1}\right) = (y - x) \sum_{k=1}^{n} x^{k-1}\left(y^{n-k} - x^{n-k}\right) \ge 0,$$
since by (a) $y - x$ and $y^{n-k} - x^{n-k}$ have the same sign. The proof of the second inequality is quite analogous.
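Both Bernoulli's inequality and the two-sided estimate (1.2) can be checked on sample values. A minimal numerical sketch (the sample values are chosen arbitrarily):

```python
# Spot-check Bernoulli's inequality (1+x)^n >= 1+n*x for x >= -1,
# and the estimate n*x^(n-1)*(y-x) <= y^n - x^n <= n*y^(n-1)*(y-x) from (1.2).
for x in (-0.9, -0.5, 0.0, 0.3, 2.0):
    for n in (1, 2, 3, 7):
        assert (1 + x) ** n >= 1 + n * x

for x, y in ((0.5, 2.0), (1.0, 1.0), (2.0, 1.0), (2.0, 3.5)):
    for n in (1, 2, 5):
        lower = n * x ** (n - 1) * (y - x)
        upper = n * y ** (n - 1) * (y - x)
        assert lower <= y ** n - x ** n <= upper

print("Bernoulli's inequality and (1.2) hold on the sample values")
```

The pair $(2.0, 1.0)$ shows that (1.2) also holds when $y < x$; both outer terms are then negative.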

Proposition 1.15 For every real $x > 0$ and every positive integer $n \in \mathbb{N}$ there is one and only one $y > 0$ such that $y^n = x$.

This number $y$ is written $\sqrt[n]{x}$ or $x^{1/n}$, and it is called "the $n$th root of $x$".
Proof. The uniqueness is clear since, by Lemma 1.14 (a), $0 < y_1 < y_2$ implies $y_1^n < y_2^n$. To prove the existence, consider the set

$$E := \{t \in \mathbb{R}_+ \mid t^n < x\}.$$
Observe that $E \ne \emptyset$ since $0 \in E$. We show that $E$ is bounded above. By Bernoulli's inequality and since $0 < x \le nx$ we have

$$t \in E \implies t^n < x \le nx < 1 + nx \le (1 + x)^n \overset{\text{Lemma 1.14}}{\implies} t < 1 + x.$$
Hence $1 + x$ is an upper bound for $E$. By the order completeness of $\mathbb{R}$ there exists $y \in \mathbb{R}$ such that $y = \sup E$. We have to show that $y^n = x$. To this end we will show that each of the inequalities $y^n < x$ and $y^n > x$ leads to a contradiction.

Assume $y^n < x$ and consider $(y + h)^n$ with "small" $h$, $0 < h < 1$. By (1.2),
$$0 \le (y + h)^n - y^n \le n (y + h)^{n-1}(y + h - y) < h\, n (y + 1)^{n-1}.$$
Choosing $h$ small enough that $h\, n (y + 1)^{n-1} < x - y^n$ we may continue
$$(y + h)^n - y^n < x - y^n.$$
Consequently, $(y + h)^n < x$ and therefore $y + h \in E$. Since $y + h > y$, this contradicts the fact that $y$ is an upper bound of $E$.

Assume $y^n > x$ and consider $(y - h)^n$ with "small" $h$, $0 < h < y$. By (1.2),
$$0 \le y^n - (y - h)^n \le n y^{n-1}\big(y - (y - h)\big) = h\, n y^{n-1}.$$
Choosing $h$ small enough that $h\, n y^{n-1} < y^n - x$ we may continue
$$y^n - (y - h)^n < y^n - x.$$
Consequently, $x < (y - h)^n$, and therefore $t^n < x < (y - h)^n$ for all $t \in E$. By Lemma 1.14 (a), $t < y - h$ for all $t \in E$, so $y - h$ is an upper bound of $E$ smaller than $y = \sup E$, a contradiction. Hence $y^n = x$.
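The proof is non-constructive (it works with the supremum), but the same squeezing idea, "$y^n < x$ means $y$ is too small, $y^n > x$ means $y$ is too large", yields a simple bisection scheme for approximating $\sqrt[n]{x}$. This is a sketch of our own, not the textbook's algorithm; the function name `nth_root` is ours:

```python
# Approximate the n-th root of x > 0 by bisection, mirroring the proof idea:
# mid^n < x forces the root to lie above mid, mid^n > x forces it below.
def nth_root(x, n, tol=1e-12):
    lo, hi = 0.0, max(1.0, x)   # 1 + x would also serve as an upper bound
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid ** n < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(nth_root(2.0, 2))   # close to sqrt(2) = 1.41421356...
print(nth_root(27.0, 3))  # close to 3
```

The upper bound `max(1.0, x)` works because $\sqrt[n]{x} \le 1$ for $x \le 1$ and $\sqrt[n]{x} \le x$ for $x \ge 1$.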

Remarks 1.6 (a) If $a$ and $b$ are positive real numbers and $n \in \mathbb{N}$ then $(ab)^{1/n} = a^{1/n} b^{1/n}$.
Proof. Put $\alpha = a^{1/n}$ and $\beta = b^{1/n}$. Then $ab = \alpha^n \beta^n = (\alpha\beta)^n$, since multiplication is commutative. The uniqueness assertion of Proposition 1.15 shows therefore that $(ab)^{1/n} = \alpha\beta = a^{1/n} b^{1/n}$.

(b) Fix $b > 0$. If $m, n, p, q \in \mathbb{Z}$ with $n > 0$, $q > 0$, and $r = m/n = p/q$, then we have
$$(b^m)^{1/n} = (b^p)^{1/q}. \quad (1.3)$$

Hence it makes sense to define $b^r := (b^m)^{1/n}$.

(c) Fix $b > 1$. If $x \in \mathbb{R}$, define
$$b^x := \sup\{b^p \mid p \in \mathbb{Q},\ p < x\}. \quad (1.4)$$

(d) If $a, b > 0$ and $x, y \in \mathbb{R}$, then
(i) $b^{x+y} = b^x b^y$ and $b^{x-y} = \dfrac{b^x}{b^y}$, (ii) $(b^x)^y = b^{xy}$, (iii) $(ab)^x = a^x b^x$.

1.1.9 Logarithms

Fix $b > 1$ and $y > 0$. As in the preceding subsection, one can prove the existence of a unique real $x$ such that $b^x = y$. This number $x$ is called the logarithm of $y$ to the base $b$, and we write $x = \log_b y$. Knowing existence and uniqueness of the logarithm, it is not difficult to prove the following properties.

Lemma 1.16 For any $a > 0$, $a \ne 1$, we have
(a) $\log_a(bc) = \log_a b + \log_a c$ if $b, c > 0$;
(b) $\log_a(b^c) = c \log_a b$ if $b > 0$;
(c) $\log_a b = \dfrac{\log_d b}{\log_d a}$ if $b, d > 0$ and $d \ne 1$.
Later we will give an alternative definition of the logarithm.
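The rules of Lemma 1.16 can be illustrated with the library logarithm. This is a numerical sketch only; the helper name `log_base` is ours, and floating-point equality is checked up to rounding with `math.isclose`:

```python
import math

# log_base(a, b) is the logarithm of b to base a, via the change-of-base rule (c).
def log_base(a, b):
    return math.log(b) / math.log(a)

a, b, c = 2.0, 8.0, 5.0
# (a) log_a(b*c) = log_a b + log_a c
assert math.isclose(log_base(a, b * c), log_base(a, b) + log_base(a, c))
# (b) log_a(b^c) = c * log_a b
assert math.isclose(log_base(a, b ** c), c * log_base(a, b))
# (c) change of base: log_a b = log_d b / log_d a
d = 10.0
assert math.isclose(log_base(a, b), log_base(d, b) / log_base(d, a))
print("Lemma 1.16 (a)-(c) verified numerically")
```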

Review of Trigonometric Functions

(a) Degrees and Radians

The following table gives some important angles in degrees and radians. The precise definition of $\pi$ is given below; for the moment it is just an abbreviation used to measure angles. An angle $\alpha_{\deg}$ measured in degrees is transformed into the angle $\alpha_{\mathrm{rad}}$ measured in radians by $\alpha_{\mathrm{rad}} = \alpha_{\deg} \cdot \frac{2\pi}{360°}$.

Degrees:  0°   30°   45°   60°   90°   120°   135°   150°   180°   270°   360°
Radians:  0    π/6   π/4   π/3   π/2   2π/3   3π/4   5π/6   π      3π/2   2π

(b) Sine and Cosine

The sine, cosine, and tangent functions are defined in terms of ratios of sides of a right triangle:

$$\cos\varphi = \frac{\text{length of the side adjacent to }\varphi}{\text{length of the hypotenuse}}, \qquad \sin\varphi = \frac{\text{length of the side opposite to }\varphi}{\text{length of the hypotenuse}}, \qquad \tan\varphi = \frac{\text{length of the side opposite to }\varphi}{\text{length of the side adjacent to }\varphi}.$$

Let $\varphi$ be any angle between 0° and 360°. Further, let $P$ be the point on the unit circle (with center $(0, 0)$ and radius $1$) such that the ray from the origin $(0, 0)$ through $P$ and the positive $x$-axis make the angle $\varphi$. Then $\cos\varphi$ and $\sin\varphi$ are defined to be the $x$-coordinate and the $y$-coordinate of the point $P$, respectively.

If the angle $\varphi$ is between 0° and 90°, this new definition coincides with the definition using the right triangle, since the hypotenuse, which is a radius of the unit circle, now has length $1$.

If $90° < \varphi < 180°$ we find
$$\cos\varphi = -\cos(180° - \varphi) < 0, \qquad \sin\varphi = \sin(180° - \varphi) > 0.$$

If $180° < \varphi < 270°$ we find
$$\cos\varphi = -\cos(\varphi - 180°) < 0, \qquad \sin\varphi = -\sin(\varphi - 180°) < 0.$$

If $270° < \varphi < 360°$ we find
$$\cos\varphi = \cos(360° - \varphi) > 0, \qquad \sin\varphi = -\sin(360° - \varphi) < 0.$$

For angles greater than 360◦ or less than 0◦ define

$$\cos\varphi = \cos(\varphi + k \cdot 360°), \qquad \sin\varphi = \sin(\varphi + k \cdot 360°),$$

where $k \in \mathbb{Z}$ is chosen such that $0° \le \varphi + k \cdot 360° < 360°$. Thinking of $\varphi$ as given in radians, cosine and sine are functions defined for all real $\varphi$, taking values in the closed interval $[-1, 1]$. If $\varphi \ne \frac{\pi}{2} + k\pi$, $k \in \mathbb{Z}$, then $\cos\varphi \ne 0$ and we define
$$\tan\varphi := \frac{\sin\varphi}{\cos\varphi}.$$

If $\varphi \ne k\pi$, $k \in \mathbb{Z}$, then $\sin\varphi \ne 0$ and we define
$$\cot\varphi := \frac{\cos\varphi}{\sin\varphi}.$$
In this way we have defined cosine, sine, tangent, and cotangent for arbitrary angles.

(c) Special Values

x in degrees:  0°   30°    45°    60°    90°          120°    135°     150°     180°   270°         360°
x in radians:  0    π/6    π/4    π/3    π/2          2π/3    3π/4     5π/6     π      3π/2         2π
sin x:         0    1/2    √2/2   √3/2   1            √3/2    √2/2     1/2      0      -1           0
cos x:         1    √3/2   √2/2   1/2    0            -1/2    -√2/2    -√3/2    -1     0            1
tan x:         0    √3/3   1      √3     undefined    -√3     -1       -√3/3    0      undefined    0

Recall the addition formulas for cosine and sine and the trigonometric Pythagorean identity:
$$\cos(x + y) = \cos x \cos y - \sin x \sin y, \qquad \sin(x + y) = \sin x \cos y + \cos x \sin y, \quad (1.5)$$
$$\sin^2 x + \cos^2 x = 1. \quad (1.6)$$

1.2 Complex numbers

Some algebraic equations do not have solutions in the real number system. For instance, the quadratic equation $x^2 - 4x + 8 = 0$ 'formally' gives
$$x_1 = 2 + \sqrt{-4} \quad \text{and} \quad x_2 = 2 - \sqrt{-4}.$$
We will see that one can work with this notation.

Definition 1.7 A complex number is an ordered pair $(a, b)$ of real numbers. "Ordered" means that $(a, b) \ne (b, a)$ if $a \ne b$. Two complex numbers $x = (a, b)$ and $y = (c, d)$ are said to be equal if and only if $a = c$ and $b = d$. We define

$$x + y := (a + c,\ b + d), \qquad xy := (ac - bd,\ ad + bc).$$

Theorem 1.17 These definitions turn the set of all complex numbers into a field, with $(0, 0)$ and $(1, 0)$ in the role of $0$ and $1$.
Proof. We simply verify the field axioms as listed in Definition 1.3. Of course, we use the field structure of $\mathbb{R}$. Let $x = (a, b)$, $y = (c, d)$, and $z = (e, f)$.
(A1) is clear.
(A2) $x + y = (a + c, b + d) = (c + a, d + b) = y + x$.
(A3) $(x + y) + z = (a + c, b + d) + (e, f) = (a + c + e,\ b + d + f) = (a, b) + (c + e, d + f) = x + (y + z)$.
(A4) $x + 0 = (a, b) + (0, 0) = (a, b) = x$.
(A5) Put $-x := (-a, -b)$. Then $x + (-x) = (a, b) + (-a, -b) = (0, 0) = 0$.
(M1) is clear.
(M2) $xy = (ac - bd, ad + bc) = (ca - db, da + cb) = yx$.
(M3) $(xy)z = (ac - bd, ad + bc)(e, f) = (ace - bde - adf - bcf,\ acf - bdf + ade + bce) = (a, b)(ce - df, cf + de) = x(yz)$.
(M4) $x \cdot 1 = (a, b)(1, 0) = (a, b) = x$.
(M5) If $x \ne 0$ then $(a, b) \ne (0, 0)$, which means that at least one of the real numbers $a$, $b$ is different from $0$. Hence $a^2 + b^2 > 0$ and we can define
$$\frac{1}{x} := \left(\frac{a}{a^2 + b^2},\ \frac{-b}{a^2 + b^2}\right).$$
Then
$$x \cdot \frac{1}{x} = (a, b)\left(\frac{a}{a^2 + b^2},\ \frac{-b}{a^2 + b^2}\right) = (1, 0) = 1.$$
(D)
$$x(y + z) = (a, b)(c + e,\ d + f) = (ac + ae - bd - bf,\ ad + af + bc + be) = (ac - bd,\ ad + bc) + (ae - bf,\ af + be) = xy + xz.$$
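The pair arithmetic of Definition 1.7 translates directly into code. The following is a minimal sketch of our own (the helper names `cadd`, `cmul`, `cinv` are ours, not the text's):

```python
# Complex numbers as ordered pairs (a, b) of floats, following Definition 1.7.
def cadd(x, y):
    return (x[0] + y[0], x[1] + y[1])

def cmul(x, y):
    a, b = x
    c, d = y
    return (a * c - b * d, a * d + b * c)

def cinv(x):
    a, b = x
    s = a * a + b * b          # positive since x != (0, 0)
    return (a / s, -b / s)

x, y = (1.0, 2.0), (3.0, -1.0)
assert cadd(x, y) == (4.0, 1.0)
assert cmul(x, y) == (5.0, 5.0)                      # (1+2i)(3-i) = 5+5i
assert cmul(x, cinv(x)) == (1.0, 0.0)                # x * (1/x) = 1
assert cmul((0.0, 1.0), (0.0, 1.0)) == (-1.0, 0.0)   # i^2 = -1, cf. Lemma 1.18
print("field operations on pairs behave as claimed")
```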

Remark 1.7 For any two real numbers $a$ and $b$ we have $(a, 0) + (b, 0) = (a + b, 0)$ and $(a, 0)(b, 0) = (ab, 0)$. This shows that the complex numbers $(a, 0)$ have the same arithmetic properties as the corresponding real numbers $a$. We can therefore identify $(a, 0)$ with $a$. This gives us the real field as a subfield of the complex field. Note that we have defined the complex numbers without any reference to the mysterious square root of $-1$. We now show that the notation $(a, b)$ is equivalent to the more customary $a + bi$.

Definition 1.8 $i := (0, 1)$.

Lemma 1.18 (a) $i^2 = -1$. (b) If $a, b \in \mathbb{R}$ then $(a, b) = a + bi$.
Proof. (a) $i^2 = (0, 1)(0, 1) = (-1, 0) = -1$.
(b) $a + bi = (a, 0) + (b, 0)(0, 1) = (a, 0) + (0, b) = (a, b)$.

Definition 1.9 If $a, b$ are real and $z = a + bi$, then the complex number $\bar{z} := a - bi$ is called the conjugate of $z$. The numbers $a$ and $b$ are the real part and the imaginary part of $z$, respectively. We shall write $a = \operatorname{Re} z$ and $b = \operatorname{Im} z$.

Proposition 1.19 If $z$ and $w$ are complex, then
(a) $\overline{z + w} = \bar{z} + \bar{w}$,
(b) $\overline{zw} = \bar{z} \cdot \bar{w}$,
(c) $z + \bar{z} = 2\operatorname{Re} z$ and $z - \bar{z} = 2i \operatorname{Im} z$,
(d) $z\bar{z}$ is positive real except when $z = 0$.
Proof. (a), (b), and (c) are quite trivial. To prove (d), write $z = a + bi$ and note that $z\bar{z} = a^2 + b^2$.

Definition 1.10 If $z$ is a complex number, its absolute value $|z|$ is the (nonnegative) square root of $z\bar{z}$; that is, $|z| := \sqrt{z\bar{z}}$. The existence (and uniqueness) of $|z|$ follows from Proposition 1.19 (d). Note that when $x$ is real, then $\bar{x} = x$, hence $|x| = \sqrt{x^2}$. Thus $|x| = x$ if $x > 0$ and $|x| = -x$ if $x < 0$. We have recovered the definition of the absolute value for real numbers, see Subsection 1.1.6.

Proposition 1.20 Let $z$ and $w$ be complex numbers. Then
(a) $|z| > 0$ unless $z = 0$,
(b) $|\bar{z}| = |z|$,
(c) $|zw| = |z|\,|w|$,
(d) $|\operatorname{Re} z| \le |z|$,
(e) $|z + w| \le |z| + |w|$.
Proof. (a) and (b) are trivial. Put $z = a + bi$ and $w = c + di$, with $a, b, c, d$ real. Then
$$|zw|^2 = (ac - bd)^2 + (ad + bc)^2 = (a^2 + b^2)(c^2 + d^2) = |z|^2 |w|^2,$$
or $|zw|^2 = (|z|\,|w|)^2$. Now (c) follows from the uniqueness assertion for roots.
To prove (d), note that $a^2 \le a^2 + b^2$, hence
$$|a| = \sqrt{a^2} \le \sqrt{a^2 + b^2} = |z|.$$
To prove (e), note that $\bar{z}w$ is the conjugate of $z\bar{w}$, so that $z\bar{w} + \bar{z}w = 2\operatorname{Re}(z\bar{w})$. Hence
$$|z + w|^2 = (z + w)(\bar{z} + \bar{w}) = z\bar{z} + z\bar{w} + \bar{z}w + w\bar{w} = |z|^2 + 2\operatorname{Re}(z\bar{w}) + |w|^2 \le |z|^2 + 2|z|\,|w| + |w|^2 = (|z| + |w|)^2.$$
Now (e) follows by taking square roots.

1.2.1 The Complex Plane and the Polar form

There is a bijective correspondence between complex numbers and the points of a plane. By the Pythagorean theorem it is clear that $|z| = \sqrt{a^2 + b^2}$ is exactly the distance of $z = a + bi$ from the origin $0$. The angle $\varphi$ between the positive real axis and the half-line $0z$ is called the argument of $z$ and is denoted by $\varphi = \arg z$. If $z \ne 0$, the argument $\varphi$ is uniquely determined up to integer multiples of $2\pi$. Elementary trigonometry gives
$$\sin\varphi = \frac{b}{|z|}, \qquad \cos\varphi = \frac{a}{|z|}.$$
With $r = |z|$ this gives $a = r\cos\varphi$ and $b = r\sin\varphi$. Inserting these into the rectangular form of $z$ yields

$$z = r(\cos\varphi + i\sin\varphi), \quad (1.7)$$
which is called the polar form of the complex number $z$.

Example 1.5 a) $z = 1 + i$. Then $|z| = \sqrt{2}$ and $\sin\varphi = 1/\sqrt{2} = \cos\varphi$. This implies $\varphi = \pi/4$. Hence, the polar form of $z$ is $1 + i = \sqrt{2}\,(\cos\pi/4 + i\sin\pi/4)$.
b) $z = -i$. We have $|-i| = 1$ and $\sin\varphi = -1$, $\cos\varphi = 0$. Hence $\varphi = 3\pi/2$ and $-i = 1 \cdot (\cos 3\pi/2 + i\sin 3\pi/2)$.
c) Computing the rectangular form of $z$ from the polar form is easier:

$$z = 32\,(\cos 7\pi/6 + i\sin 7\pi/6) = 32\,(-\sqrt{3}/2 - i/2) = -16\sqrt{3} - 16i.$$

The addition of complex numbers corresponds to the addition of vectors in the plane. The geometric meaning of the inequality $|z + w| \le |z| + |w|$ is: the length of one edge of a triangle is at most the sum of the lengths of the other two edges.

Multiplying complex numbers $z = r(\cos\varphi + i\sin\varphi)$ and $w = s(\cos\psi + i\sin\psi)$ in the polar form gives
$$zw = rs(\cos\varphi + i\sin\varphi)(\cos\psi + i\sin\psi) = rs\big((\cos\varphi\cos\psi - \sin\varphi\sin\psi) + i(\cos\varphi\sin\psi + \sin\varphi\cos\psi)\big),$$
$$zw = rs\big(\cos(\varphi + \psi) + i\sin(\varphi + \psi)\big), \quad (1.8)$$
where we made use of the addition laws for $\sin$ and $\cos$ in the last equation. Hence, the product of complex numbers is formed by taking the product of their absolute values and the sum of their arguments.

The geometric meaning of multiplication by $w$ is a similarity transformation of $\mathbb{C}$. More precisely, we have a rotation around $0$ by the angle $\psi$ followed by a dilation with factor $s$ and center $0$. Similarly, if $w \ne 0$ we have
$$\frac{z}{w} = \frac{r}{s}\big(\cos(\varphi - \psi) + i\sin(\varphi - \psi)\big). \quad (1.9)$$

Proposition 1.21 (De Moivre's formula) Let $z = r(\cos\varphi + i\sin\varphi)$ be a complex number with absolute value $r$ and argument $\varphi$. Then for all $n \in \mathbb{Z}$ one has
$$z^n = r^n\big(\cos(n\varphi) + i\sin(n\varphi)\big). \quad (1.10)$$
Proof. (a) First let $n > 0$. We use induction over $n$ to prove De Moivre's formula. For $n = 1$ there is nothing to prove. Suppose (1.10) is true for some fixed $n$. We will show that the assertion is true for $n + 1$. Using the induction hypothesis and (1.8) we find
$$z^{n+1} = z^n \cdot z = r^n\big(\cos(n\varphi) + i\sin(n\varphi)\big)\, r(\cos\varphi + i\sin\varphi) = r^{n+1}\big(\cos(n\varphi + \varphi) + i\sin(n\varphi + \varphi)\big).$$
This proves the induction assertion.
(b) If $n < 0$, then $z^n = 1/z^{-n}$. Since $1 = 1 \cdot (\cos 0 + i\sin 0)$, (1.9) and the result of (a) give
$$z^n = \frac{1}{z^{-n}} = \frac{1}{r^{-n}}\big(\cos(0 - (-n)\varphi) + i\sin(0 - (-n)\varphi)\big) = r^n\big(\cos(n\varphi) + i\sin(n\varphi)\big).$$
This completes the proof.
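De Moivre's formula (1.10) can be checked with Python's `cmath` module, including negative exponents; a numerical sketch with an arbitrarily chosen $z$ (comparisons are approximate because of rounding):

```python
import cmath
import math

# Check z^n = r^n (cos(n*phi) + i*sin(n*phi)) for several integer exponents.
z = complex(1.0, 2.0)
r, phi = cmath.polar(z)   # absolute value and argument of z

for n in (-3, -1, 1, 2, 5):
    lhs = z ** n
    rhs = (r ** n) * complex(math.cos(n * phi), math.sin(n * phi))
    assert abs(lhs - rhs) < 1e-9 * max(1.0, abs(lhs))

print("De Moivre's formula verified for n in {-3, -1, 1, 2, 5}")
```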

Example 1.6 Compute the polar form of $z = \sqrt{3} - 3i$ and compute $z^{15}$.
We have $|z| = \sqrt{3 + 9} = 2\sqrt{3}$, $\cos\varphi = 1/2$, and $\sin\varphi = -\sqrt{3}/2$. Therefore $\varphi = -\pi/3$ and $z = 2\sqrt{3}\,(\cos(-\pi/3) + i\sin(-\pi/3))$. By De Moivre's formula we have
$$z^{15} = (2\sqrt{3})^{15}\left(\cos\left(-15\,\frac{\pi}{3}\right) + i\sin\left(-15\,\frac{\pi}{3}\right)\right) = 2^{15}\, 3^7 \sqrt{3}\,\big(\cos(-5\pi) + i\sin(-5\pi)\big) = -2^{15}\, 3^7 \sqrt{3}.$$

1.2.2 Roots of Complex Numbers

Let $z \in \mathbb{C}$ and $n \in \mathbb{N}$. A complex number $w$ is called an $n$th root of $z$ if $w^n = z$. In contrast to the real case, roots of complex numbers are not unique. We will see that there are exactly $n$ different $n$th roots of $z$ for every $z \ne 0$.
Let $z = r(\cos\varphi + i\sin\varphi)$ and let $w = s(\cos\psi + i\sin\psi)$ be an $n$th root of $z$. De Moivre's formula gives $w^n = s^n(\cos n\psi + i\sin n\psi)$. If we compare $w^n$ and $z$ we get $s^n = r$, i.e. $s = \sqrt[n]{r} \ge 0$. Moreover, $n\psi = \varphi + 2k\pi$, $k \in \mathbb{Z}$, i.e. $\psi = \frac{\varphi}{n} + \frac{2k\pi}{n}$, $k \in \mathbb{Z}$. For $k = 0, 1, \ldots, n - 1$ we obtain different values $\psi_0, \psi_1, \ldots, \psi_{n-1}$ modulo $2\pi$. We summarize:

Lemma 1.22 Let $n \in \mathbb{N}$ and let $z = r(\cos\varphi + i\sin\varphi) \ne 0$ be a complex number. Then the complex numbers
$$w_k = \sqrt[n]{r}\left(\cos\frac{\varphi + 2k\pi}{n} + i\sin\frac{\varphi + 2k\pi}{n}\right), \qquad k = 0, 1, \ldots, n - 1,$$
are the $n$ different $n$th roots of $z$.

Example 1.7 Compute the 4th roots of $z = -1$.
We have $|z| = 1$, hence $|w_k| = \sqrt[4]{1} = 1$, and $\arg z = \varphi = 180°$. Hence
$$\psi_0 = \frac{\varphi}{4} = 45°, \qquad \psi_1 = \frac{\varphi}{4} + \frac{1 \cdot 360°}{4} = 135°, \qquad \psi_2 = \frac{\varphi}{4} + \frac{2 \cdot 360°}{4} = 225°, \qquad \psi_3 = \frac{\varphi}{4} + \frac{3 \cdot 360°}{4} = 315°.$$
We obtain
$$w_0 = \cos 45° + i\sin 45° = \tfrac{1}{2}\sqrt{2} + \tfrac{1}{2}\sqrt{2}\, i, \qquad w_1 = \cos 135° + i\sin 135° = -\tfrac{1}{2}\sqrt{2} + \tfrac{1}{2}\sqrt{2}\, i,$$
$$w_2 = \cos 225° + i\sin 225° = -\tfrac{1}{2}\sqrt{2} - \tfrac{1}{2}\sqrt{2}\, i, \qquad w_3 = \cos 315° + i\sin 315° = \tfrac{1}{2}\sqrt{2} - \tfrac{1}{2}\sqrt{2}\, i.$$
Geometric interpretation of the $n$th roots: the $n$th roots of $z \ne 0$ form a regular $n$-gon in the complex plane with center $0$. The vertices lie on a circle with center $0$ and radius $\sqrt[n]{|z|}$.
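Lemma 1.22 translates directly into code; the following sketch uses `cmath` and reproduces Example 1.7 (the function name `nth_roots` is ours):

```python
import cmath

# All n different n-th roots of z != 0, following the formula of Lemma 1.22.
def nth_roots(z, n):
    r, phi = cmath.polar(z)
    s = r ** (1.0 / n)
    return [s * cmath.exp(1j * (phi + 2 * cmath.pi * k) / n) for k in range(n)]

roots = nth_roots(-1.0 + 0j, 4)
for w in roots:
    assert abs(w ** 4 - (-1.0)) < 1e-12       # each root satisfies w^4 = -1
# the first root is cos 45 deg + i sin 45 deg = (sqrt(2) + i sqrt(2)) / 2
assert abs(roots[0] - (2 ** 0.5 / 2 + 1j * 2 ** 0.5 / 2)) < 1e-12
print("4th roots of -1:", [complex(round(w.real, 3), round(w.imag, 3)) for w in roots])
```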

1.3 Inequalities

1.3.1 Monotony of the Power and Exponential Functions

Lemma 1.23 (a) For $a, b > 0$ and $r \in \mathbb{Q}$ we have
$$a < b \iff a^r < b^r \quad \text{if } r > 0,$$
$$a < b \iff a^r > b^r \quad \text{if } r < 0.$$

(b) For $a > 0$ and $r, s \in \mathbb{Q}$ we have
$$r < s \iff a^r < a^s \quad \text{if } a > 1,$$
$$r < s \iff a^r > a^s \quad \text{if } a < 1.$$

Proof. Suppose that $r > 0$, $r = m/n$ with integers $m, n \in \mathbb{Z}$, $n > 0$. Using Lemma 1.14 (a) twice we get
$$a < b \iff a^m < b^m \iff (a^m)^{\frac{1}{n}} < (b^m)^{\frac{1}{n}},$$
which proves the first claim. The second part ($r < 0$) can be obtained by setting $-r$ in place of $r$ in the first part and using Proposition 1.9 (e).

(b) Suppose that $s > r$. Put $x = s - r$; then $x \in \mathbb{Q}$ and $x > 0$. By (a), $1 < a$ implies $1 = 1^x < a^x$. Hence $1 < a^{s-r} = a^s / a^r$ (here we used Remark 1.6 (d)), and therefore $a^r < a^s$. Changing the roles of $r$ and $s$ shows that $s < r$ implies $a^s < a^r$; hence $a^r < a^s$ implies $r < s$.

1.3.2 The Arithmetic-Geometric mean inequality

Proposition 1.24 Let $n \in \mathbb{N}$ and let $x_1, \ldots, x_n$ be in $\mathbb{R}_+$. Then
$$\frac{x_1 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 \cdots x_n}. \quad (1.11)$$
We have equality if and only if $x_1 = x_2 = \cdots = x_n$.
Proof. We use forward-backward induction over $n$: first we show that (1.11) is true for all $n$ which are powers of $2$; then we prove that if (1.11) is true for some $n + 1$, then it is true for $n$. Hence, it is true for all positive integers.
The inequality is true for $n = 1$. Let $a, b \ge 0$; then $(\sqrt{a} - \sqrt{b})^2 \ge 0$ implies $a + b \ge 2\sqrt{ab}$, and the inequality is true in case $n = 2$. Equality holds if and only if $a = b$. Suppose it is true for some fixed $k \in \mathbb{N}$; we will show that it is true for $2k$. Let $x_1, \ldots, x_k, y_1, \ldots, y_k \in \mathbb{R}_+$. Using the induction assumption and the inequality in case $n = 2$, we have
$$\frac{1}{2k}\left(\sum_{i=1}^{k} x_i + \sum_{i=1}^{k} y_i\right) = \frac{1}{2}\left(\frac{1}{k}\sum_{i=1}^{k} x_i + \frac{1}{k}\sum_{i=1}^{k} y_i\right) \ge \frac{1}{2}\left(\left(\prod_{i=1}^{k} x_i\right)^{\frac{1}{k}} + \left(\prod_{i=1}^{k} y_i\right)^{\frac{1}{k}}\right) \ge \left(\prod_{i=1}^{k} x_i \prod_{i=1}^{k} y_i\right)^{\frac{1}{2k}}.$$
This completes the 'forward' part. Assume now (1.11) is true for $n + 1$; we will show it for $n$. Let $x_1, \ldots, x_n \in \mathbb{R}_+$ and set $A := \left(\sum_{i=1}^{n} x_i\right)/n$. By the induction assumption we have
$$\frac{1}{n+1}\left(x_1 + \cdots + x_n + A\right) \ge \left(\prod_{i=1}^{n} x_i \cdot A\right)^{\frac{1}{n+1}} \iff \frac{1}{n+1}(nA + A) \ge \left(\prod_{i=1}^{n} x_i \cdot A\right)^{\frac{1}{n+1}}$$
$$\iff A \ge \left(\prod_{i=1}^{n} x_i\right)^{\frac{1}{n+1}} A^{\frac{1}{n+1}} \iff A^{\frac{n}{n+1}} \ge \left(\prod_{i=1}^{n} x_i\right)^{\frac{1}{n+1}} \iff A \ge \left(\prod_{i=1}^{n} x_i\right)^{\frac{1}{n}}.$$
It is trivial that in case $x_1 = x_2 = \cdots = x_n$ we have equality. Suppose that equality holds in a case where at least two of the $x_i$ are different, say $x_1 < x_2$. Consider the inequality with the new values $x_1' := x_2' := (x_1 + x_2)/2$ and $x_i' := x_i$ for $i \ge 3$. Then
$$\left(\prod_{k=1}^{n} x_k\right)^{\frac{1}{n}} = \frac{1}{n}\sum_{k=1}^{n} x_k = \frac{1}{n}\sum_{k=1}^{n} x_k' \ge \left(\prod_{k=1}^{n} x_k'\right)^{\frac{1}{n}},$$
hence
$$x_1 x_2 \ge x_1' x_2' = \left(\frac{x_1 + x_2}{2}\right)^2 \iff 4 x_1 x_2 \ge x_1^2 + 2 x_1 x_2 + x_2^2 \iff 0 \ge (x_1 - x_2)^2.$$
This contradicts the choice $x_1 < x_2$. Hence, $x_1 = x_2 = \cdots = x_n$ is the only case where equality holds. This completes the proof.
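A numerical spot-check of (1.11), including the equality case where all $x_i$ coincide (an illustrative sketch; `am` and `gm` are our helper names):

```python
import math

# Arithmetic and geometric means of a tuple of positive reals.
def am(xs):
    return sum(xs) / len(xs)

def gm(xs):
    return math.prod(xs) ** (1.0 / len(xs))

assert am((1.0, 2.0, 4.0)) >= gm((1.0, 2.0, 4.0))              # strict: 7/3 > 2
assert math.isclose(am((3.0, 3.0, 3.0)), gm((3.0, 3.0, 3.0)))  # equality case
print(am((1.0, 2.0, 4.0)), gm((1.0, 2.0, 4.0)))
```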

1.3.3 The Cauchy–Schwarz Inequality

Proposition 1.25 (Cauchy–Schwarz inequality) Suppose that x1,...,xn,y1,...,yn are real numbers. Then we have

$$\left(\sum_{k=1}^{n} x_k y_k\right)^2 \le \sum_{k=1}^{n} x_k^2 \cdot \sum_{k=1}^{n} y_k^2. \quad (1.12)$$

Equality holds if and only if there exists $t \in \mathbb{R}$ such that $y_k = t x_k$ for $k = 1, \ldots, n$; that is, the vector $y = (y_1, \ldots, y_n)$ is a scalar multiple of the vector $x = (x_1, \ldots, x_n)$.
Proof. Consider the quadratic function $f(t) = at^2 - 2bt + c$, where
$$a = \sum_{k=1}^{n} x_k^2, \qquad b = \sum_{k=1}^{n} x_k y_k, \qquad c = \sum_{k=1}^{n} y_k^2.$$
Then
$$f(t) = \sum_{k=1}^{n} x_k^2 t^2 - 2\sum_{k=1}^{n} x_k y_k\, t + \sum_{k=1}^{n} y_k^2 = \sum_{k=1}^{n}\left(x_k^2 t^2 - 2 x_k y_k t + y_k^2\right) = \sum_{k=1}^{n} (x_k t - y_k)^2 \ge 0.$$
We have $f(t) = 0$ for some $t \in \mathbb{R}$ if and only if $y_k = t x_k$ for all $k$. Suppose now there is no such $t \in \mathbb{R}$; that is,
$$f(t) > 0 \quad \text{for all } t \in \mathbb{R}.$$
In other words, the polynomial $f(t) = at^2 - 2bt + c$ has no real zeros $t_{1,2} = \frac{1}{a}\left(b \pm \sqrt{b^2 - ac}\right)$; that is, the discriminant $D = b^2 - ac$ is negative (only complex roots), hence $b^2 < ac$:
$$\left(\sum_{k=1}^{n} x_k y_k\right)^2 < \sum_{k=1}^{n} x_k^2 \cdot \sum_{k=1}^{n} y_k^2.$$
This proves the claim.
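The inequality and its equality case are easy to mirror numerically; a sketch with arbitrarily chosen vectors (the helper `dot` is our name):

```python
# Spot-check the Cauchy-Schwarz inequality (1.12) and its equality case.
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x = (1.0, 2.0, 3.0)
y = (4.0, -1.0, 0.5)
assert dot(x, y) ** 2 <= dot(x, x) * dot(y, y)

# equality when y is a scalar multiple of x
t = 2.5
y2 = tuple(t * a for a in x)
assert abs(dot(x, y2) ** 2 - dot(x, x) * dot(y2, y2)) < 1e-9
print("Cauchy-Schwarz verified on the sample vectors")
```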

Corollary 1.26 (The Complex Cauchy–Schwarz inequality) If $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ are complex numbers, then
$$\left|\sum_{k=1}^{n} x_k y_k\right|^2 \le \sum_{k=1}^{n} |x_k|^2 \cdot \sum_{k=1}^{n} |y_k|^2. \quad (1.13)$$

Equality holds if and only if there exists a $\lambda \in \mathbb{C}$ such that $y_k = \lambda\, \overline{x_k}$ for $k = 1, \ldots, n$.
Proof. Using the generalized triangle inequality $|z_1 + \cdots + z_n| \le |z_1| + \cdots + |z_n|$ and the real Cauchy–Schwarz inequality we obtain
$$\left|\sum_{k=1}^{n} x_k y_k\right|^2 \le \left(\sum_{k=1}^{n} |x_k y_k|\right)^2 = \left(\sum_{k=1}^{n} |x_k|\,|y_k|\right)^2 \le \sum_{k=1}^{n} |x_k|^2 \cdot \sum_{k=1}^{n} |y_k|^2.$$
This proves the inequality.

The right "less or equal" is an equality if and only if there is a $t \in \mathbb{R}$ such that $|y_k| = t\,|x_k|$ for all $k$. In the first "less or equal" sign we have equality if and only if all $z_k = x_k y_k$ have the same argument $\varphi$; that is, $\arg y_k = \varphi - \arg x_k$. Putting both together yields $y_k = \lambda\, \overline{x_k}$ with $\lambda = t(\cos\varphi + i\sin\varphi)$.

1.4 AppendixA

In this appendix we collect some additional facts which were not covered in the lecture. We now show that the equation

$$x^2 = 2 \quad (1.14)$$
is not satisfied by any rational number $x$. Suppose to the contrary that there were such an $x$; we could write $x = m/n$ with integers $m$ and $n$, $n \ne 0$, that are not both even. Then (1.14) implies
$$m^2 = 2n^2. \quad (1.15)$$

This shows that $m^2$ is even and hence $m$ is even. Therefore $m^2$ is divisible by $4$. It follows that the right hand side of (1.15) is divisible by $4$, so that $n^2$ is even, which implies that $n$ is even. But this contradicts our choice of $m$ and $n$. Hence (1.14) is impossible for rational $x$.

Let $A$ be the set of all positive rationals $p$ with $p^2 < 2$, and let $B$ be the set of all positive rationals $p$ with $p^2 > 2$. We shall show that $A$ contains no largest element and $B$ contains no smallest. That is, for every $p \in A$ we can find a rational $q \in A$ with $p < q$, and for every $p \in B$ we can find a rational $q \in B$ such that $q < p$. To do this we associate with each rational $p > 0$ the rational number
$$q = p + \frac{2 - p^2}{p + 2} = \frac{2p + 2}{p + 2}. \quad (1.16)$$
Then
$$q^2 - 2 = \frac{4p^2 + 8p + 4 - 2p^2 - 8p - 8}{(p + 2)^2} = \frac{2(p^2 - 2)}{(p + 2)^2}. \quad (1.17)$$
If $p$ is in $A$ then $2 - p^2 > 0$, so (1.16) shows that $q > p$, and (1.17) shows that $q^2 < 2$; thus $q \in A$. If $p$ is in $B$ then $2 - p^2 < 0$, so (1.16) shows that $0 < q < p$, and (1.17) shows that $q^2 > 2$; thus $q \in B$.

A Non-Archimedean Ordered Field

The fields $\mathbb{Q}$ and $\mathbb{R}$ are Archimedean, see below. But there exist ordered fields without this property. Let $F := \mathbb{R}(t)$ be the field of rational functions $f(t) = p(t)/q(t)$, where $p$ and $q$ are polynomials with real coefficients. Since $p$ and $q$ have only finitely many zeros, for large $t$ the value $f(t)$ is either positive or negative. In the first case we set $f > 0$. In this way $\mathbb{R}(t)$ becomes an ordered field. But $t > n$ for all $n \in \mathbb{N}$, since the polynomial $f(t) = t - n$ becomes positive for large $t$ (and fixed $n$).

Our aim is to define $b^x$ for arbitrary real $x$.

Lemma 1.27 Let $b, p$ be real numbers with $b > 1$ and $p > 0$. Set
$$M = \{b^r \mid r \in \mathbb{Q},\ r < p\}, \qquad M' = \{b^s \mid s \in \mathbb{Q},\ s > p\}.$$
Then (a) $M$ is bounded above and $M'$ is bounded below, (b) $\sup M \le \inf M'$, and (c) $\sup M = \inf M'$.

Proof. (a) $M$ is bounded above by an arbitrary $b^s$, $s \in \mathbb{Q}$, with $s > p$, and $M'$ is bounded below by any $b^r$, $r \in \mathbb{Q}$, with $r < p$.

(c) Let $\alpha = \sup M$ and let $\varepsilon > 0$ be given. We want to show that $\inf M' < \alpha + \varepsilon$. Choose $n \in \mathbb{N}$ such that
$$\frac{1}{n} < \frac{\varepsilon}{\alpha(b - 1)}. \quad (1.18)$$
By Proposition 1.11 there exist $r, s \in \mathbb{Q}$ with
$$r < p < s, \qquad s - r < \frac{1}{n}.$$
Using $s - r < 1/n$, Bernoulli's inequality (part 2), and (1.18), we compute
$$b^s - b^r = b^r\left(b^{s-r} - 1\right) \le \alpha\left(b^{\frac{1}{n}} - 1\right) \le \alpha \cdot \frac{1}{n}(b - 1) < \varepsilon.$$
Hence
$$\inf M' \le b^s < b^r + \varepsilon \le \sup M + \varepsilon.$$
Since $\varepsilon$ was arbitrary, $\inf M' \le \sup M$, and finally, with the result of (b), $\inf M' = \sup M$.

Corollary 1.28 Suppose $p \in \mathbb{Q}$ and $b > 1$ is real. Then
$$b^p = \sup\{b^r \mid r \in \mathbb{Q},\ r < p\}.$$
In particular, the definition (1.4) of $b^x$ for real $x$ is consistent with the earlier definition of $b^p$ for rational $p$.

Inequalities

Now we extend Bernoulli’s inequality to rational exponents.

Proposition 1.29 (Bernoulli's inequality) Let $a \ge -1$ be real and $r \in \mathbb{Q}$. Then
(a) $(1 + a)^r \ge 1 + ra$ if $r \ge 1$,
(b) $(1 + a)^r \le 1 + ra$ if $0 \le r \le 1$.
Equality holds if and only if $a = 0$ or $r = 1$.

Proof. (b) Let $r = m/n$ with $m \le n$, $m, n \in \mathbb{N}$. Apply (1.11) to $x_i := 1 + a$ for $i = 1, \ldots, m$ and $x_i := 1$ for $i = m + 1, \ldots, n$. We obtain
$$\frac{1}{n}\big(m(1 + a) + (n - m) \cdot 1\big) \ge \big((1 + a)^m \cdot 1^{n-m}\big)^{\frac{1}{n}},$$
$$\frac{m}{n}\, a + 1 \ge (1 + a)^{\frac{m}{n}},$$
which proves (b). Equality holds if $n = 1$ or if $x_1 = \cdots = x_n$, i.e. $a = 0$.
(a) Now let $s \ge 1$ and $z \ge -1$. Setting $r = 1/s$ and $a := (1 + z)^{1/r} - 1 = (1 + z)^s - 1$, we obtain $r \le 1$ and $a \ge -1$. Inserting this into (b) yields
$$(1 + a)^r = (1 + z)^{s \cdot \frac{1}{s}} = 1 + z \le 1 + r\big((1 + z)^s - 1\big)$$
$$\iff z \le \frac{1}{s}\big((1 + z)^s - 1\big) \iff 1 + sz \le (1 + z)^s.$$
This completes the proof of (a).

Corollary 1.30 (Bernoulli's inequality) Let $a \ge -1$ be real and $x \in \mathbb{R}$. Then
(a) $(1 + a)^x \ge 1 + xa$ if $x \ge 1$,
(b) $(1 + a)^x \le 1 + xa$ if $x \le 1$.
Equality holds if and only if $a = 0$ or $x = 1$.
Proof. (a) First let $a > 0$. By Proposition 1.29 (a), $(1 + a)^r \ge 1 + ra$ for $r \in \mathbb{Q}$, $r \ge 1$. Hence
$$(1 + a)^x = \sup\{(1 + a)^r \mid r \in \mathbb{Q},\ r < x\} \ge \sup\{1 + ra \mid r \in \mathbb{Q},\ r < x\} = 1 + xa.$$
Now let $-1 \le a < 0$. Then $r < x$ implies $ra > xa$, and Proposition 1.29 (a) implies
$$(1 + a)^r \ge 1 + ra > 1 + xa. \quad (1.20)$$
By the definition of the power with a real exponent, see (1.4),
$$(1 + a)^x = \frac{1}{\left(\frac{1}{1 + a}\right)^x} = \inf\{(1 + a)^r \mid r \in \mathbb{Q},\ r < x\}.$$
Taking in (1.20) the infimum over all $r \in \mathbb{Q}$ with $r < x$ we obtain
$$(1 + a)^x = \inf\{(1 + a)^r \mid r \in \mathbb{Q},\ r < x\} \ge 1 + xa.$$

Proposition 1.31 (Young's inequality) If $a, b \in \mathbb{R}_+$ and $p > 1$, then
$$ab \le \frac{1}{p}\, a^p + \frac{1}{q}\, b^q, \quad (1.21)$$
where $1/p + 1/q = 1$. Equality holds if and only if $a^p = b^q$.

Proof. First note that $1/q = 1 - 1/p$. We reformulate Bernoulli's inequality for $y \in \mathbb{R}_+$ and $p > 1$:
$$y^p - 1 \ge p(y - 1) \iff \frac{1}{p}\left(y^p - 1\right) + 1 \ge y \iff \frac{1}{p}\, y^p + \frac{1}{q} \ge y.$$
If $b = 0$ the statement is always true. If $b \ne 0$, insert $y := ab/b^q$ into the above inequality:
$$\frac{1}{p}\left(\frac{ab}{b^q}\right)^p + \frac{1}{q} \ge \frac{ab}{b^q}$$
$$\frac{1}{p}\,\frac{a^p b^p}{b^{pq}} + \frac{1}{q} \ge \frac{ab}{b^q} \qquad \Big|\ \cdot\, b^q$$
$$\frac{1}{p}\, a^p + \frac{1}{q}\, b^q \ge ab,$$
since $b^{p+q} = b^{pq}$. We have equality if and only if $y = 1$ or $p = 1$. The latter is impossible by assumption. $y = 1$ is equivalent to $b^q = ab$, i.e. $b^{q-1} = a$, i.e. $b^{(q-1)p} = a^p$ (for $b \ne 0$), which is $b^q = a^p$. If $b = 0$, equality holds if and only if $a = 0$.
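Young's inequality (1.21) with conjugate exponents can be sanity-checked numerically; an illustrative sketch with sample values:

```python
# Spot-check Young's inequality a*b <= a^p/p + b^q/q with 1/p + 1/q = 1.
p = 3.0
q = p / (p - 1)          # conjugate exponent, so 1/p + 1/q = 1
assert abs(1 / p + 1 / q - 1.0) < 1e-12

for a, b in ((0.5, 2.0), (1.0, 1.0), (4.0, 0.1)):
    assert a * b <= a ** p / p + b ** q / q + 1e-12

# equality exactly when a^p = b^q: take a = 2 and b = a^(p/q), so b^q = a^p
a = 2.0
b = a ** (p / q)
assert abs(a * b - (a ** p / p + b ** q / q)) < 1e-9
print("Young's inequality verified on the sample values")
```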

Proposition 1.32 (Hölder's inequality) Let $p > 1$, $1/p + 1/q = 1$, and let $x_1, \ldots, x_n \in \mathbb{R}_+$ and $y_1, \ldots, y_n \in \mathbb{R}_+$ be non-negative real numbers. Then
$$\sum_{k=1}^{n} x_k y_k \le \left(\sum_{k=1}^{n} x_k^p\right)^{\frac{1}{p}} \left(\sum_{k=1}^{n} y_k^q\right)^{\frac{1}{q}}. \quad (1.22)$$

We have equality if and only if there exists $c \in \mathbb{R}$ such that $x_k^p / y_k^q = c$ for all $k = 1, \ldots, n$ (they are proportional).

Proof. Set $A := \left(\sum_{k=1}^{n} x_k^p\right)^{1/p}$ and $B := \left(\sum_{k=1}^{n} y_k^q\right)^{1/q}$. The cases $A = 0$ and $B = 0$ are trivial, so we assume $A, B > 0$. By Young's inequality we have
$$\frac{x_k}{A} \cdot \frac{y_k}{B} \le \frac{1}{p}\,\frac{x_k^p}{A^p} + \frac{1}{q}\,\frac{y_k^q}{B^q}$$
$$\implies \frac{1}{AB}\sum_{k=1}^{n} x_k y_k \le \frac{1}{p A^p}\sum_{k=1}^{n} x_k^p + \frac{1}{q B^q}\sum_{k=1}^{n} y_k^q = \frac{1}{p} + \frac{1}{q} = 1$$
$$\implies \sum_{k=1}^{n} x_k y_k \le AB = \left(\sum_{k=1}^{n} x_k^p\right)^{\frac{1}{p}} \left(\sum_{k=1}^{n} y_k^q\right)^{\frac{1}{q}}.$$
Equality holds if and only if $x_k^p / A^p = y_k^q / B^q$ for all $k = 1, \ldots, n$; therefore $x_k^p / y_k^q = \mathrm{const}$.

Corollary 1.33 (Complex Hölder's inequality) Let $p > 1$, $1/p + 1/q = 1$, and $x_k, y_k \in \mathbb{C}$, $k = 1, \ldots, n$. Then
$$\sum_{k=1}^{n} |x_k y_k| \le \left(\sum_{k=1}^{n} |x_k|^p\right)^{\frac{1}{p}} \left(\sum_{k=1}^{n} |y_k|^q\right)^{\frac{1}{q}}.$$
Equality holds if and only if $|x_k|^p / |y_k|^q = \mathrm{const}$ for $k = 1, \ldots, n$.
Proof. Set $x_k := |x_k|$ and $y_k := |y_k|$ in (1.22). This proves the statement.

Proposition 1.34 (Minkowski's inequality) If $x_1, \ldots, x_n \in \mathbb{R}_+$ and $y_1, \ldots, y_n \in \mathbb{R}_+$ and $p \ge 1$, then
$$\left(\sum_{k=1}^{n} (x_k + y_k)^p\right)^{\frac{1}{p}} \le \left(\sum_{k=1}^{n} x_k^p\right)^{\frac{1}{p}} + \left(\sum_{k=1}^{n} y_k^p\right)^{\frac{1}{p}}. \quad (1.23)$$
Equality holds if $p = 1$ or if $p > 1$ and $x_k / y_k = \mathrm{const}$.

k | k | k | k | Ê Proposition 1.34 (Minkowski’s inequality) If x ,...,x Ê and y ,...,y and 1 n ∈ + 1 n ∈ + p 1 then ≥ 1 1 1 n p n p n p p p p (xk + yk) xk + yk . (1.23) ! ≤ ! ! Xk=1 Xk=1 Xk=1 Equality holds if p =1 or if p> 1 and xk/yk = const. 1.4 Appendix A 41

Proof. The case p =1 is obvious. Let p> 1. As before let q > 0 be the unique positive number with 1/p +1/q =1. We compute

n n n n p p 1 p 1 p 1 (xk + yk) = (xk + yk)(xk + yk) − = xk(xk + yk) − + yk(xk + yk) − k=1 k=1 k=1 k=1 X X 1 X X 1 1 q 1 q p p (p 1)q p p (p 1)q xk (xk + yk) − + yk (xk + yk) − (1.22)≤ ! !   k   k X 1/p X 1/q X 1/q X xp + yq (x + y )p . ≤ k k k k X  X   X  1 1 We can assume that (x + y )p > 0. Using 1 = by taking the quotient of the last k k − q p p 1/q inequality by ( (xk P+ yk) ) we obtain the claim. Equality holds if xp/(x + y )(p 1)q = const. and yp/(x + y )(p 1)q) = const.; that is P k k k − k k k − xk/yk = const.

Corollary 1.35 (Complex Minkowski's inequality) If $x_1, \ldots, x_n, y_1, \ldots, y_n \in \mathbb{C}$ and $p \ge 1$, then
$$\left(\sum_{k=1}^{n} |x_k + y_k|^p\right)^{\frac{1}{p}} \le \left(\sum_{k=1}^{n} |x_k|^p\right)^{\frac{1}{p}} + \left(\sum_{k=1}^{n} |y_k|^p\right)^{\frac{1}{p}}. \quad (1.24)$$

Equality holds if $p = 1$ or if $p > 1$ and $x_k / y_k = \lambda > 0$.
Proof. The triangle inequality gives $|x_k + y_k| \le |x_k| + |y_k|$; hence
$$\sum_{k=1}^{n} |x_k + y_k|^p \le \sum_{k=1}^{n} \left(|x_k| + |y_k|\right)^p.$$
The real version of Minkowski's inequality now proves the assertion.
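Minkowski's inequality is the triangle inequality for a natural notion of length; a numerical sketch with arbitrarily chosen vectors (the helper name `pnorm` is ours):

```python
# Check the triangle inequality ( sum |x_k+y_k|^p )^(1/p) <= (...x...) + (...y...)
# from Minkowski's inequality, for several exponents p >= 1.
def pnorm(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = (1.0, -2.0, 3.0)
y = (0.5, 4.0, -1.0)
s = tuple(a + b for a, b in zip(x, y))

for p in (1, 2, 3, 7.5):
    assert pnorm(s, p) <= pnorm(x, p) + pnorm(y, p) + 1e-12
print("Minkowski's inequality holds for the sample vectors")
```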

If $x = (x_1, \ldots, x_n)$ is a vector in $\mathbb{R}^n$ or $\mathbb{C}^n$, the (non-negative) number
$$\|x\|_p := \left(\sum_{k=1}^{n} |x_k|^p\right)^{\frac{1}{p}}$$
is called the $p$-norm of the vector $x$. Minkowski's inequality then reads as
$$\|x + y\|_p \le \|x\|_p + \|y\|_p,$$
which is the triangle inequality for the $p$-norm.

Chapter 2

Sequences and Series

This chapter deals with one of the main notions of calculus, the limit of a sequence. Although we are concerned with real sequences, almost all notions make sense in arbitrary metric spaces like $\mathbb{R}^n$ or $\mathbb{C}^n$.

Given $a \in \mathbb{R}$ and $\varepsilon > 0$ we define the $\varepsilon$-neighborhood of $a$ as
$$U_\varepsilon(a) := (a - \varepsilon,\ a + \varepsilon) = \{x \in \mathbb{R} \mid a - \varepsilon < x < a + \varepsilon\}.$$

2.1 Convergent Sequences

A sequence is a mapping $x \colon \mathbb{N} \to \mathbb{R}$. To every $n \in \mathbb{N}$ we associate a real number $x_n$. We write this as $(x_n)_{n \in \mathbb{N}}$ or $(x_1, x_2, \ldots)$. For different sequences we use different letters, such as $(a_n)$, $(b_n)$, $(y_n)$.

Example 2.1 (a) $x_n = \frac{1}{n}$, $n \in \mathbb{N}$: $(x_n) = \left(1, \frac{1}{2}, \frac{1}{3}, \ldots\right)$;
(b) $x_n = (-1)^n + 1$: $(x_n) = (0, 2, 0, 2, \ldots)$;
(c) $x_n = a$ ($a \in \mathbb{R}$ fixed): $(x_n) = (a, a, \ldots)$ (constant sequence);
(d) $x_n = 2n - 1$: $(x_n) = (1, 3, 5, 7, \ldots)$, the sequence of odd positive integers;
(e) $x_n = a^n$ ($a \in \mathbb{R}$ fixed): $(x_n) = (a, a^2, a^3, \ldots)$ (geometric sequence).

Definition 2.1 A sequence $(x_n)$ is said to be convergent to $x \in \mathbb{R}$ if:

For every $\varepsilon > 0$ there exists $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $|x_n - x| < \varepsilon$.

$x$ is called the limit of $(x_n)$ and we write
$$x = \lim_{n \to \infty} x_n \quad \text{or simply} \quad x = \lim x_n \quad \text{or} \quad x_n \to x.$$
If there is no such $x$ with the above property, the sequence $(x_n)$ is said to be divergent.

In other words: $(x_n)$ converges to $x$ if every neighborhood $U_\varepsilon(x)$, $\varepsilon > 0$, contains "almost all" elements of the sequence $(x_n)$. "Almost all" means "all but finitely many." Sometimes we say "for sufficiently large $n$," which means the same. This is an equivalent formulation since $x_n \in U_\varepsilon(x)$ means $x - \varepsilon < x_n < x + \varepsilon$, hence $|x - x_n| < \varepsilon$. The $n_0$ in question need not be the smallest possible. We write

$$\lim_{n \to \infty} x_n = +\infty \quad (2.1)$$
if for all $E > 0$ there exists $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $x_n \ge E$. Similarly, we write
$$\lim_{n \to \infty} x_n = -\infty \quad (2.2)$$
if for all $E > 0$ there exists $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $x_n \le -E$. In these cases we say that $+\infty$ and $-\infty$ are improper limits of $(x_n)$. Note that in both cases $(x_n)$ is divergent.

Example 2.2 This is Example 2.1 continued.
(a) $\lim_{n \to \infty} \frac{1}{n} = 0$. Indeed, let $\varepsilon > 0$ be fixed. We are looking for some $n_0$ with $\left|\frac{1}{n} - 0\right| < \varepsilon$ for all $n \ge n_0$. This is equivalent to $1/\varepsilon < n$. Choose $n_0 > 1/\varepsilon$ (which is possible by the Archimedean property). Then for all $n \ge n_0$ we have
$$n \ge n_0 > \frac{1}{\varepsilon} \implies \frac{1}{n} < \varepsilon \implies |x_n - 0| < \varepsilon.$$
Therefore, $(x_n)$ tends to $0$ as $n \to \infty$.
(b) $x_n = (-1)^n + 1$ is divergent. Suppose to the contrary that $x$ is the limit. For $\varepsilon = 1$ there is $n_0$ such that for $n \ge n_0$ we have $|x_n - x| < 1$. For even $n \ge n_0$ this implies $|2 - x| < 1$; for odd $n \ge n_0$, $|0 - x| = |x| < 1$. The triangle inequality gives
$$2 = |(2 - x) + x| \le |2 - x| + |x| < 1 + 1 = 2.$$

This is a contradiction. Hence, (xn) is divergent.

(c) $x_n = a$: $\lim x_n = a$ since $|x_n - a| = |a - a| = 0 < \varepsilon$ for all $\varepsilon > 0$ and all $n \in \mathbb{N}$.
(d) $\lim(2n - 1) = +\infty$. Indeed, suppose that $E > 0$ is given. Choose $n_0 > \frac{E}{2} + 1$. Then
$$n \ge n_0 \implies n > \frac{E}{2} + 1 \implies 2n - 2 > E \implies x_n = 2n - 1 > 2n - 2 > E.$$
This proves the claim. Similarly, one can show that $\lim(-n^3) = -\infty$. But both $((-n)^n)$ and $(1, 2, 1, 3, 1, 4, 1, 5, \ldots)$ have no improper limit. Indeed, the first one becomes arbitrarily large for even $n$ and arbitrarily small for odd $n$. The second one becomes large for even $n$ but is constant for odd $n$.
(e) $x_n = a^n$ ($a \ge 0$):
$$\lim_{n \to \infty} a^n = \begin{cases} 1, & \text{if } a = 1, \\ 0, & \text{if } 0 \le a < 1. \end{cases}$$
$(a^n)$ is divergent for $a > 1$; moreover, $\lim a^n = +\infty$. To prove this, let $E > 0$ be given. By the Archimedean property of $\mathbb{R}$ and since $a - 1 > 0$, we find $m \in \mathbb{N}$ such that $m(a - 1) > E$. Bernoulli's inequality gives
$$a^m \ge m(a - 1) + 1 > m(a - 1) > E.$$
By Lemma 1.23 (b), $n \ge m$ implies
$$a^n \ge a^m > E.$$
This proves the claim. Clearly $(a^n)$ is convergent in the cases $a = 0$ and $a = 1$ since it is constant then. Let $0 < a < 1$ and set $b := \frac{1}{a} - 1 > 0$. Bernoulli's inequality gives
$$\frac{1}{a^n} = \left(\frac{1}{a}\right)^n = (b + 1)^n \ge 1 + nb > nb$$
$$\implies 0 < a^n < \frac{1}{nb}. \quad (2.3)$$
Let $\varepsilon > 0$. Choose $n_0 > \frac{1}{\varepsilon b}$. Then $\varepsilon > \frac{1}{n_0 b}$, and $n \ge n_0$ implies
$$|a^n - 0| = a^n \overset{(2.3)}{<} \frac{1}{nb} \le \frac{1}{n_0 b} < \varepsilon.$$
Hence $a^n \to 0$.

Proposition 2.1 The limit of a convergent sequence is uniquely determined.

Proof. Suppose that $x = \lim x_n$ and $y = \lim x_n$ with $x \ne y$. Put $\varepsilon := |x - y|/2 > 0$. Then
\[ \exists\, n_1 \in \mathbb{N}\ \forall\, n \ge n_1: |x - x_n| < \varepsilon, \]
\[ \exists\, n_2 \in \mathbb{N}\ \forall\, n \ge n_2: |y - x_n| < \varepsilon. \]
Choose $m \ge \max\{n_1, n_2\}$. Then $|x - x_m| < \varepsilon$ and $|y - x_m| < \varepsilon$. Hence,
\[ |x - y| \le |x - x_m| + |y - x_m| < 2\varepsilon = |x - y|. \]
This contradiction establishes the statement.

Proposition 2.1 holds in arbitrary metric spaces.

Definition 2.2 A sequence $(x_n)$ is said to be bounded if the set of its elements is a bounded set; i.e., there is a $C \ge 0$ such that
\[ |x_n| \le C \quad \text{for all } n \in \mathbb{N}. \]
Similarly, $(x_n)$ is said to be bounded above or bounded below if there exists $C \in \mathbb{R}$ such that $x_n \le C$ or $x_n \ge C$, respectively, for all $n \in \mathbb{N}$.

Proposition 2.2 If $(x_n)$ is convergent, then $(x_n)$ is bounded.

Proof. Let $x = \lim x_n$. To $\varepsilon = 1$ there exists $n_0 \in \mathbb{N}$ such that $|x - x_n| < 1$ for all $n \ge n_0$. Then $|x_n| = |x_n - x + x| \le |x_n - x| + |x| < |x| + 1$ for all $n \ge n_0$. Put
\[ C := \max\{|x_1|, \dots, |x_{n_0 - 1}|, |x| + 1\}. \]
Then $|x_n| \le C$ for all $n \in \mathbb{N}$.

The converse statement is not true; there are bounded sequences which are not convergent, see Example 2.1 (b).

Ex Class: If $(x_n)$ has an improper limit, then $(x_n)$ is divergent.
Proof. Suppose to the contrary that $(x_n)$ is convergent; then it is bounded, say $|x_n| \le C$ for all $n$. This contradicts $x_n > E$ as well as $x_n < -E$ for $E = C$ and sufficiently large $n$. Hence, $(x_n)$ has no improper limit, a contradiction.
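The $\varepsilon$-$n_0$ bookkeeping of Example 2.2 can be checked numerically. The following Python sketch is not part of the notes; the function names are ours. It computes the index $n_0$ promised by the Archimedean argument for $x_n = 1/n$ and, via inequality (2.3), for $x_n = a^n$ with $0 < a < 1$, and verifies the $\varepsilon$-condition on a long stretch of indices.

```python
# Numerical sketch (hypothetical helper names): find n0 as in Example 2.2
# and check that |x_n| < eps holds from n0 on.
import math

def n0_for_one_over_n(eps):
    # Example 2.2 (a): any n0 > 1/eps works.
    return math.floor(1 / eps) + 1

def n0_for_geometric(a, eps):
    # Inequality (2.3): 0 < a**n < 1/(n*b) with b = 1/a - 1, so n0 > 1/(eps*b).
    b = 1 / a - 1
    return math.floor(1 / (eps * b)) + 1

eps = 1e-3
n0 = n0_for_one_over_n(eps)
assert all(abs(1 / n) < eps for n in range(n0, n0 + 1000))

a = 0.9
n0 = n0_for_geometric(a, eps)
assert all(a ** n < eps for n in range(n0, n0 + 1000))
```

Note that the Bernoulli bound $1/(nb)$ is far from sharp, so the computed $n_0$ is much larger than necessary; it only needs to exist.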

2.1.1 Algebraic operations with sequences

The sum, difference, product, quotient, and absolute value of sequences $(x_n)$ and $(y_n)$ are defined as follows:
\[ (x_n) \pm (y_n) := (x_n \pm y_n), \quad (x_n)\cdot(y_n) := (x_n y_n), \quad \frac{(x_n)}{(y_n)} := \left(\frac{x_n}{y_n}\right)\ (y_n \ne 0), \quad |(x_n)| := (|x_n|). \]

Proposition 2.3 If $(x_n)$ and $(y_n)$ are convergent sequences and $c \in \mathbb{R}$, then their sum, difference, product, quotient (provided $y_n \ne 0$ and $\lim y_n \ne 0$), and their absolute values are also convergent:
(a) $\lim(x_n \pm y_n) = \lim x_n \pm \lim y_n$;
(b) $\lim(c x_n) = c \lim x_n$, $\lim(x_n + c) = \lim x_n + c$;
(c) $\lim(x_n y_n) = \lim x_n \cdot \lim y_n$;
(d) $\lim \dfrac{x_n}{y_n} = \dfrac{\lim x_n}{\lim y_n}$ if $y_n \ne 0$ for all $n$ and $\lim y_n \ne 0$;
(e) $\lim |x_n| = |\lim x_n|$.

Proof. Let $x_n \to x$ and $y_n \to y$.
(a) Given $\varepsilon > 0$ there exist integers $n_1$ and $n_2$ such that $n \ge n_1$ implies $|x_n - x| < \varepsilon/2$ and $n \ge n_2$ implies $|y_n - y| < \varepsilon/2$. If $n_0 := \max\{n_1, n_2\}$, then $n \ge n_0$ implies
\[ |(x_n + y_n) - (x + y)| \le |x_n - x| + |y_n - y| < \varepsilon. \]
The proof for the difference is quite similar.
(b) follows from $|c x_n - c x| = |c|\,|x_n - x|$ and $|(x_n + c) - (x + c)| = |x_n - x|$.
(c) We use the identity
\[ x_n y_n - xy = (x_n - x)(y_n - y) + x(y_n - y) + y(x_n - x). \tag{2.4} \]
Given $\varepsilon > 0$ there are integers $n_1$ and $n_2$ such that $n \ge n_1$ implies $|x_n - x| < \sqrt{\varepsilon}$ and $n \ge n_2$ implies $|y_n - y| < \sqrt{\varepsilon}$. If we take $n_0 = \max\{n_1, n_2\}$, then $n \ge n_0$ implies
\[ |(x_n - x)(y_n - y)| < \varepsilon, \]
so that
\[ \lim_{n\to\infty}(x_n - x)(y_n - y) = 0. \]
Now we apply (a) and (b) to (2.4) and conclude that
\[ \lim_{n\to\infty}(x_n y_n - xy) = 0. \]
(d) Choosing $n_1$ such that $|y_n - y| < |y|/2$ if $n \ge n_1$, we see that
\[ |y| \le |y - y_n| + |y_n| < |y|/2 + |y_n| \implies |y_n| > |y|/2. \]
Given $\varepsilon > 0$, there is an integer $n_2 > n_1$ such that $n \ge n_2$ implies
\[ |y_n - y| < |y|^2 \varepsilon/2. \]
Hence, for $n \ge n_2$,
\[ \left|\frac{1}{y_n} - \frac{1}{y}\right| = \left|\frac{y_n - y}{y_n y}\right| < \frac{2}{|y|^2}\,|y_n - y| < \varepsilon, \]
and we get $\lim \frac{1}{y_n} = \frac{1}{\lim y_n}$. The general case can be reduced to the above case using (c) and $(x_n/y_n) = (x_n \cdot 1/y_n)$.
(e) By Lemma 1.12 (e) we have $\bigl||x_n| - |x|\bigr| \le |x_n - x|$. Given $\varepsilon > 0$, there is $n_0$ such that $n \ge n_0$ implies $|x_n - x| < \varepsilon$. By the above inequality, also $\bigl||x_n| - |x|\bigr| \le \varepsilon$ and we are done.

Example 2.3 (a) $z_n := \frac{n+1}{n}$. Set $x_n = 1$ and $y_n = 1/n$. Then $z_n = x_n + y_n$ and we already know that $\lim x_n = 1$ and $\lim y_n = 0$. Hence, $\lim \frac{n+1}{n} = \lim 1 + \lim \frac1n = 1 + 0 = 1$.
(b) $a_n = \frac{3n^2 + 13n}{n^2 - 2}$. We can write this as
\[ a_n = \frac{3 + \frac{13}{n}}{1 - \frac{2}{n^2}}. \]
Since $\lim 1/n = 0$, by Proposition 2.3 we obtain $\lim 1/n^2 = 0$ and $\lim 13/n = 0$. Hence $\lim 2/n^2 = 0$ and $\lim(3 + 13/n) = 3$. Finally,
\[ \lim_{n\to\infty} \frac{3n^2 + 13n}{n^2 - 2} = \frac{\lim_{n\to\infty}\left(3 + \frac{13}{n}\right)}{\lim_{n\to\infty}\left(1 - \frac{2}{n^2}\right)} = \frac{3}{1} = 3. \]
(c) We introduce the notions of a polynomial and a rational function.
Given $a_0, a_1, \dots, a_n \in \mathbb{R}$, $a_n \ne 0$, the function $p\colon \mathbb{R} \to \mathbb{R}$ given by $p(t) := a_n t^n + a_{n-1} t^{n-1} + \dots + a_1 t + a_0$ is called a polynomial. The integer $n \ge 0$ is the degree of the polynomial $p(t)$, and $a_0, a_1, \dots, a_n$ are called the coefficients of $p(t)$. The set of all real polynomials forms a real vector space denoted by $\mathbb{R}[x]$.
Given two polynomials $p$ and $q$, put $D := \{t \in \mathbb{R} \mid q(t) \ne 0\}$. Then $r = \frac{p}{q}$ is called a rational function, where $r\colon D \to \mathbb{R}$ is defined by
\[ r(t) := \frac{p(t)}{q(t)}. \]
Polynomials are special rational functions with $q(t) \equiv 1$. The set of rational functions with real coefficients forms both a real vector space and a field. It is denoted by $\mathbb{R}(x)$.
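The limit behaviour of a rational function at a concrete sequence of arguments can be watched directly. The sketch below is not part of the notes; it evaluates $p$ and $q$ by Horner's rule (a standard evaluation scheme, not introduced in the text) and checks that $a_n = (3n^2+13n)/(n^2-2)$ from Example 2.3 (b) approaches $3$.

```python
# Sketch (helper names are ours): evaluate p(n)/q(n) and watch the limit.
def horner(coeffs, t):
    """Evaluate a polynomial with coefficients [a_n, ..., a_1, a_0] at t."""
    value = 0.0
    for c in coeffs:
        value = value * t + c
    return value

def rational(p, q, t):
    return horner(p, t) / horner(q, t)

# a_n = (3n^2 + 13n) / (n^2 - 2); the limit is the quotient of leading
# coefficients, 3/1 = 3, since the degrees agree.
p, q = [3.0, 13.0, 0.0], [1.0, 0.0, -2.0]
values = [rational(p, q, n) for n in (10, 100, 10_000)]
assert abs(values[-1] - 3.0) < 1e-2
```

The same experiment with $\deg p < \deg q$ or $\deg p > \deg q$ illustrates the remaining cases of Lemma 2.5 below.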

Lemma 2.4 (a) Let $a_n \to 0$ be a sequence tending to zero with $a_n \ne 0$ for every $n$. Then
\[ \lim_{n\to\infty} \frac{1}{a_n} = \begin{cases} +\infty, & \text{if } a_n > 0 \text{ for almost all } n;\\ -\infty, & \text{if } a_n < 0 \text{ for almost all } n. \end{cases} \]
(b) Let $y_n \to a$ be a sequence converging to $a$ with $a > 0$. Then $y_n > 0$ for almost all $n \in \mathbb{N}$.

Proof. (a) We will prove the case with $-\infty$. Let $\varepsilon > 0$. By assumption there is a positive integer $n_0$ such that $n \ge n_0$ implies $-\varepsilon < a_n < 0$. This implies $0 < -a_n < \varepsilon$ and further $\frac{1}{a_n} < -\frac{1}{\varepsilon} < 0$. Suppose $E > 0$ is given; choose $\varepsilon = 1/E$ and $n_0$ as above. Then by the previous argument, $n \ge n_0$ implies
\[ \frac{1}{a_n} < -\frac{1}{\varepsilon} = -E. \]
This shows $\lim_{n\to\infty} \frac{1}{a_n} = -\infty$.
(b) To $\varepsilon = a$ there exists $n_0$ such that $n \ge n_0$ implies $|y_n - a| < a$, that is, $0 < y_n < 2a$ for almost all $n$.

Lemma 2.5 Suppose that $p(t) = \sum_{k=0}^{r} a_k t^k$ and $q(t) = \sum_{k=0}^{s} b_k t^k$ are real polynomials with $a_r \ne 0$ and $b_s \ne 0$. Then
\[ \lim_{n\to\infty} \frac{p(n)}{q(n)} = \begin{cases} 0, & r < s,\\[2pt] \dfrac{a_r}{b_s}, & r = s,\\[2pt] +\infty, & r > s \text{ and } \dfrac{a_r}{b_s} > 0,\\[2pt] -\infty, & r > s \text{ and } \dfrac{a_r}{b_s} < 0. \end{cases} \]

Proof. Note first that
\[ \frac{p(n)}{q(n)} = \frac{n^r\left(a_r + a_{r-1}\frac1n + \dots + a_0\frac{1}{n^r}\right)}{n^s\left(b_s + b_{s-1}\frac1n + \dots + b_0\frac{1}{n^s}\right)} = \frac{a_r + a_{r-1}\frac1n + \dots + a_0\frac{1}{n^r}}{b_s + b_{s-1}\frac1n + \dots + b_0\frac{1}{n^s}} \cdot \frac{1}{n^{s-r}} =: c_n \cdot \frac{1}{n^{s-r}}. \]
Suppose that $r = s$. By Proposition 2.3, $\frac{1}{n^k} \to 0$ for all $k \in \mathbb{N}$. By the same proposition, the limit of each summand in the numerator and denominator is $0$ except for the first in each sum. Hence,
\[ \lim_{n\to\infty} \frac{p(n)}{q(n)} = \frac{\lim_{n\to\infty}\left(a_r + a_{r-1}\frac1n + \dots + a_0\frac{1}{n^r}\right)}{\lim_{n\to\infty}\left(b_s + b_{s-1}\frac1n + \dots + b_0\frac{1}{n^s}\right)} = \frac{a_r}{b_s}. \]
Suppose now that $r < s$. Then $c_n \to a_r/b_s$ as above and $\frac{1}{n^{s-r}} \to 0$; hence $\frac{p(n)}{q(n)} = c_n \cdot \frac{1}{n^{s-r}} \to 0$ by Proposition 2.3 (c). Finally, let $r > s$ and $a_r/b_s > 0$. The sequence $(c_n)$ has a positive limit. By Lemma 2.4 (b), almost all $c_n > 0$. Hence,
\[ d_n := \frac{q(n)}{p(n)} = \frac{1}{n^{r-s}} \cdot \frac{1}{c_n} \]
tends to $0$ by the above part, and $d_n > 0$ for almost all $n$. By Lemma 2.4 (a), the sequence $\frac{1}{d_n} = \frac{p(n)}{q(n)}$ tends to $+\infty$ as $n \to \infty$, which proves the claim in the first case. The case $a_r/b_s < 0$ can be obtained by multiplying with $-1$ and noting that $\lim_{n\to\infty} x_n = +\infty$ implies $\lim_{n\to\infty}(-x_n) = -\infty$.

In the German literature the next proposition is known as the ‘Theorem of the two policemen’.

Proposition 2.6 (Sandwich Theorem) Let $(a_n)$, $(b_n)$, and $(x_n)$ be real sequences with $a_n \le x_n \le b_n$ for all but finitely many $n \in \mathbb{N}$. Further let $\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = x$. Then $(x_n)$ is also convergent to $x$.

Proof. Let $\varepsilon > 0$. There exist $n_1, n_2, n_3 \in \mathbb{N}$ such that $n \ge n_1$ implies $a_n \in U_\varepsilon(x)$, $n \ge n_2$ implies $b_n \in U_\varepsilon(x)$, and $n \ge n_3$ implies $a_n \le x_n \le b_n$. Choosing $n_0 = \max\{n_1, n_2, n_3\}$, $n \ge n_0$ implies $x_n \in U_\varepsilon(x)$. Hence, $x_n \to x$.

Remark. (a) If two sequences $(a_n)$ and $(b_n)$ differ only in finitely many elements, then both sequences converge to the same limit or both diverge.
(b) Define the "shifted sequence" $b_n := a_{n+k}$, $n \in \mathbb{N}$, where $k$ is a fixed positive integer. Then both sequences converge to the same limit or both diverge.

2.1.2 Some special sequences

Proposition 2.7 (a) If $p > 0$, then $\lim_{n\to\infty} \frac{1}{n^p} = 0$.
(b) If $p > 0$, then $\lim_{n\to\infty} \sqrt[n]{p} = 1$.
(c) $\lim_{n\to\infty} \sqrt[n]{n} = 1$.
(d) If $a > 1$ and $\alpha \in \mathbb{R}$, then $\lim_{n\to\infty} \frac{n^\alpha}{a^n} = 0$.

Proof. (a) Let $\varepsilon > 0$. Take $n_0 > (1/\varepsilon)^{1/p}$ (note that the Archimedean property of the real numbers is used here). Then $n \ge n_0$ implies $1/n^p < \varepsilon$.
(b) If $p > 1$, put $x_n = \sqrt[n]{p} - 1$. Then $x_n > 0$ and by Bernoulli's inequality (that is, by homework 4.1) we have $p = (1 + x_n)^n \ge 1 + n x_n$, that is,
\[ 0 < x_n \le \frac{p-1}{n}. \]
By Proposition 2.6, $x_n \to 0$. If $p = 1$, (b) is trivial, and if $0 < p < 1$, the claim follows by taking reciprocals: $\sqrt[n]{p} = 1/\sqrt[n]{1/p} \to 1$ by Proposition 2.3 (d).
(c) Put $x_n = \sqrt[n]{n} - 1$. Then $x_n \ge 0$, and, by the binomial theorem,
\[ n = (1 + x_n)^n \ge \frac{n(n-1)}{2}\, x_n^2. \]
Hence
\[ 0 \le x_n \le \sqrt{\frac{2}{n-1}}, \quad (n \ge 2). \]
By (a), $\frac{1}{\sqrt{n-1}} \to 0$. Applying the sandwich theorem again, $x_n \to 0$ and so $\sqrt[n]{n} \to 1$.
(d) Put $p = a - 1$; then $p > 0$. Let $k$ be an integer such that $k > \alpha$, $k > 0$. For $n > 2k$,
\[ (1+p)^n > \binom{n}{k} p^k = \frac{n(n-1)\cdots(n-k+1)}{k!}\, p^k > \frac{n^k p^k}{2^k k!}. \]
Hence,
\[ 0 < \frac{n^\alpha}{a^n} = \frac{n^\alpha}{(1+p)^n} < \frac{2^k k!}{p^k}\, n^{\alpha - k} \quad (n > 2k). \]
Since $\alpha - k < 0$, $n^{\alpha - k} \to 0$ by (a).
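The squeeze in part (c) is concrete enough to test numerically. The sketch below is not part of the notes; it checks the bound $0 \le \sqrt[n]{n} - 1 \le \sqrt{2/(n-1)}$ from the proof on a long range of indices and confirms that $\sqrt[n]{n}$ approaches $1$.

```python
# Sketch (helper name is ours): the two sides of the sandwich in
# Proposition 2.7 (c) really enclose n**(1/n) - 1.
import math

def nth_root_of_n(n):
    return n ** (1.0 / n)

for n in range(2, 5000):
    x = nth_root_of_n(n) - 1.0
    assert 0.0 <= x <= math.sqrt(2.0 / (n - 1.0)) + 1e-12

# The sequence really approaches 1:
assert abs(nth_root_of_n(10**6) - 1.0) < 1e-4
```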

Q 1. Let $(x_n)$ be a convergent sequence, $x_n \to x$. Then the sequence of arithmetic means
\[ s_n := \frac1n \sum_{k=1}^{n} x_k \]
also converges to $x$.
Q 2. Let $(x_n)$ be a convergent sequence of positive numbers with $\lim x_n = x > 0$. Then $\sqrt[n]{x_1 x_2 \cdots x_n} \to x$. Hint: Consider $y_n = \log x_n$.

2.1.3 Monotonic Sequences

Definition 2.3 A real sequence $(x_n)$ is said to be
(a) monotonically increasing if $x_n \le x_{n+1}$ for all $n$;
(b) monotonically decreasing if $x_n \ge x_{n+1}$ for all $n$.
The class of monotonic sequences consists of the increasing and the decreasing sequences.
A sequence is said to be strictly monotonically increasing or decreasing if $x_n < x_{n+1}$ or $x_n > x_{n+1}$ for all $n$, respectively. We write $x_n \nearrow$ and $x_n \searrow$.

Proposition 2.8 A monotonic and bounded sequence is convergent. More precisely, if $(x_n)$ is increasing and bounded above, then $\lim x_n = \sup\{x_n\}$. If $(x_n)$ is decreasing and bounded below, then $\lim x_n = \inf\{x_n\}$.

Proof. Suppose $x_n \le x_{n+1}$ for all $n$ (the proof is analogous in the other case). Let $E := \{x_n \mid n \in \mathbb{N}\}$ and $x = \sup E$. Then $x_n \le x$ for all $n \in \mathbb{N}$. For every $\varepsilon > 0$ there is an integer $n_0$ such that
\[ x - \varepsilon < x_{n_0} \le x, \]
for otherwise $x - \varepsilon$ would be an upper bound of $E$. Since $(x_n)$ increases, $n \ge n_0$ implies
\[ x - \varepsilon < x_n \le x, \]
which shows that $(x_n)$ converges to $x$.

Example 2.4 Let $x_n = \frac{c^n}{n!}$ with some fixed $c > 0$. We will show that $x_n \to 0$ as $n \to \infty$. Writing $(x_n)$ recursively,
\[ x_{n+1} = \frac{c}{n+1}\, x_n, \tag{2.5} \]
we observe that $(x_n)$ is strictly decreasing for $n \ge c$. Indeed, $n \ge c$ implies $x_{n+1} = \frac{c}{n+1} x_n < x_n$. On the other hand, $x_n > 0$ for all $n$, so that $(x_n)$ is bounded below by $0$. By Proposition 2.8, $(x_n)$ converges to some $x \in \mathbb{R}$. Taking the limit $n \to \infty$ in (2.5), we have
\[ x = \lim_{n\to\infty} x_{n+1} = \lim_{n\to\infty} \frac{c}{n+1} \cdot \lim_{n\to\infty} x_n = 0 \cdot x = 0. \]
Hence, the sequence tends to $0$.

2.1.4 Subsequences

Definition 2.4 Let $(x_n)$ be a sequence and $(n_k)_{k\in\mathbb{N}}$ a strictly increasing sequence of positive integers $n_k \in \mathbb{N}$. We call $(x_{n_k})_{k\in\mathbb{N}}$ a subsequence of $(x_n)_{n\in\mathbb{N}}$. If $(x_{n_k})$ converges, its limit is called a subsequential limit of $(x_n)$.

Example 2.5 (a) $x_n = 1/n$, $n_k = 2^k$. Then $(x_{n_k}) = (1/2, 1/4, 1/8, \dots)$.
(b) $(x_n) = (1, -1, 1, -1, \dots)$. $(x_{2k}) = (-1, -1, \dots)$ has the subsequential limit $-1$; $(x_{2k+1}) = (1, 1, 1, \dots)$ has the subsequential limit $1$.

Proposition 2.9 Subsequences of convergent sequences are convergent with the same limit.

Proof. Let $\lim x_n = x$ and let $(x_{n_k})$ be a subsequence. To $\varepsilon > 0$ there exists $m_0 \in \mathbb{N}$ such that $n \ge m_0$ implies $|x_n - x| < \varepsilon$. Since $n_m \ge m$ for all $m$, $m \ge m_0$ implies $|x_{n_m} - x| < \varepsilon$; hence $\lim x_{n_m} = x$.

Definition 2.5 Let $(x_n)$ be a sequence. We call $x \in \mathbb{R}$ a limit point of $(x_n)$ if every neighborhood of $x$ contains infinitely many elements of $(x_n)$.

Proposition 2.10 The point $x$ is a limit point of the sequence $(x_n)$ if and only if $x$ is a subsequential limit.

Proof. If $\lim_{k\to\infty} x_{n_k} = x$, then every neighborhood $U_\varepsilon(x)$ contains all but finitely many $x_{n_k}$; in particular, it contains infinitely many elements $x_n$. That is, $x$ is a limit point of $(x_n)$.
Suppose $x$ is a limit point of $(x_n)$. To $\varepsilon = 1$ there exists $x_{n_1} \in U_1(x)$. To $\varepsilon = 1/k$ there exists $n_k$ with $x_{n_k} \in U_{1/k}(x)$ and $n_k > n_{k-1}$. We have constructed a subsequence $(x_{n_k})$ of $(x_n)$ with
\[ |x - x_{n_k}| < \frac1k; \]
hence, $(x_{n_k})$ converges to $x$.

Question: Which sequences have limit points? The answer is: every bounded sequence has limit points.

Proposition 2.11 (Principle of nested intervals) Let $I_n := [a_n, b_n]$ be a sequence of closed nested intervals $I_{n+1} \subseteq I_n$ such that their lengths $b_n - a_n$ tend to $0$:
given $\varepsilon > 0$ there exists $n_0$ such that $0 \le b_n - a_n < \varepsilon$ for all $n \ge n_0$.
For any such interval sequence $\{I_n\}$ there exists a unique real number $x \in \mathbb{R}$ which is a member of all intervals, i.e. $\{x\} = \bigcap_{n\in\mathbb{N}} I_n$.

Proof. Since the intervals are nested, $(a_n) \nearrow$ is an increasing sequence bounded above by each of the $b_k$, and $(b_n) \searrow$ is a decreasing sequence bounded below by each of the $a_k$. Consequently, by Proposition 2.8 we have
\[ \exists\, x = \lim_{n\to\infty} a_n = \sup\{a_n\} \le b_m \text{ for all } m, \quad\text{and}\quad \exists\, y = \lim_{m\to\infty} b_m = \inf\{b_m\} \ge x. \]
Since $a_n \le x \le y \le b_n$ for all $n \in \mathbb{N}$,
\[ \emptyset \ne [x, y] \subseteq \bigcap_{n\in\mathbb{N}} I_n. \]
We show the converse inclusion, namely that $\bigcap_{n\in\mathbb{N}} [a_n, b_n] \subseteq [x, y]$. Let $p \in I_n$ for all $n$, that is, $a_n \le p \le b_n$ for all $n \in \mathbb{N}$. Hence $\sup_n a_n \le p \le \inf_n b_n$; that is $p \in [x, y]$. Thus, $[x, y] = \bigcap_{n\in\mathbb{N}} I_n$. We show uniqueness, that is $x = y$. Given $\varepsilon > 0$ we find $n$ such that $y - x \le b_n - a_n \le \varepsilon$. Hence $y - x \le 0$; therefore $x = y$. The intersection contains a unique point $x$.

Proposition 2.12 (Bolzano–Weierstraß) A bounded real sequence has a limit point.

Proof. We use the principle of nested intervals. Let $(x_n)$ be bounded, say $|x_n| \le C$. Hence, the interval $[-C, C]$ contains infinitely many $x_k$. Consider the intervals $[-C, 0]$ and $[0, C]$. At least one of them contains infinitely many $x_k$, say $I_1 := [a_1, b_1]$. Suppose we have already constructed $I_n = [a_n, b_n]$ of length $b_n - a_n = C/2^{n-1}$ which contains infinitely many $x_k$. Consider the two intervals $[a_n, (a_n + b_n)/2]$ and $[(a_n + b_n)/2, b_n]$ of length $C/2^{n}$. At least one of them still contains infinitely many $x_k$, say $I_{n+1} := [a_{n+1}, b_{n+1}]$. In this way we have constructed a nested sequence of intervals whose lengths go to $0$. By Proposition 2.11, there exists a unique $x \in \bigcap_{n\in\mathbb{N}} I_n$. We will show that $x$ is a subsequential limit of $(x_n)$ (and hence a limit point). For, choose $x_{n_k} \in I_k$; this is possible since $I_k$ contains infinitely many $x_m$. Then $a_k \le x_{n_k} \le b_k$ for all $k \in \mathbb{N}$. Proposition 2.6 gives $x = \lim a_k \le \lim x_{n_k} \le \lim b_k = x$; hence $\lim x_{n_k} = x$.

Remark. The principle of nested intervals is equivalent to the order completeness of $\mathbb{R}$.

Example 2.6 (a) $x_n = (-1)^{n-1} + \frac1n$; the set of limit points is $\{-1, 1\}$. First note that $-1 = \lim_{n\to\infty} x_{2n}$ and $1 = \lim_{n\to\infty} x_{2n+1}$ are subsequential limits of $(x_n)$. We show, for example, that $\frac13$ is not a limit point. Indeed, for $n \ge 4$, there exists a small neighborhood of $\frac13$ which has no intersection with $U_{\frac14}(1)$ and $U_{\frac14}(-1)$. Hence, $\frac13$ is not a limit point.
(b) $x_n = n - 5\left[\frac{n}{5}\right]$, where $[x]$ denotes the greatest integer less than or equal to $x$ ($[\pi] = [3] = 3$, $[-2.8] = -3$, $[1/2] = 0$). $(x_n) = (1, 2, 3, 4, 0, 1, 2, 3, 4, 0, \dots)$; the set of limit points is $\{0, 1, 2, 3, 4\}$.
(c) One can enumerate the rational numbers in $(0,1)$ in the following way:
\[ \underbrace{\tfrac12}_{x_1},\quad \underbrace{\tfrac13, \tfrac23}_{x_2,\, x_3},\quad \underbrace{\tfrac14, \tfrac24, \tfrac34}_{x_4,\, x_5,\, x_6},\quad \dots \]
The set of limit points is the whole interval $[0,1]$ since in any neighborhood of any real number there is a rational number, see Proposition 1.11 (b). Any rational number of $(0,1)$ appears infinitely often in this sequence, namely as $\frac{p}{q} = \frac{2p}{2q} = \frac{3p}{3q} = \cdots$.
(d) $x_n = n$ has no limit point. Since $(x_n)$ is not bounded, Bolzano–Weierstraß fails to apply.

Definition 2.6 (a) Let $(x_n)$ be a bounded sequence and $A$ its set of limit points. Then $\sup A$ is called the upper limit of $(x_n)$ and $\inf A$ is called the lower limit of $(x_n)$. We write
\[ \varlimsup_{n\to\infty} x_n \quad\text{and}\quad \varliminf_{n\to\infty} x_n \]
for the upper and lower limits of $(x_n)$, respectively.
(b) If $(x_n)$ is not bounded above, we write $\varlimsup x_n = +\infty$. If moreover $+\infty$ is the only limit point, $\varliminf x_n = +\infty$, and we can also write $\lim x_n = +\infty$. If $(x_n)$ is not bounded below, $\varliminf x_n = -\infty$.

Proposition 2.13 Let $(x_n)$ be a bounded sequence and $A$ the set of limit points of $(x_n)$. Then $\varlimsup x_n$ and $\varliminf x_n$ are also limit points of $(x_n)$.

Proof. Let $x = \varlimsup x_n$ and let $\varepsilon > 0$. By the definition of the supremum of $A$ there exists $x' \in A$ with
\[ x - \frac{\varepsilon}{2} < x' \le x. \]
Since $x'$ is a limit point, $U_{\varepsilon/2}(x')$ contains infinitely many elements $x_k$. By construction, $U_{\varepsilon/2}(x') \subseteq U_\varepsilon(x)$. Indeed, $z \in U_{\varepsilon/2}(x')$ implies $|z - x'| < \frac{\varepsilon}{2}$ and therefore
\[ |z - x| = |z - x' + x' - x| \le |z - x'| + |x' - x| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon. \]
Hence, $x$ is a limit point, too. The proof for $\varliminf x_n$ is similar.

Proposition 2.14 Let $b \in \mathbb{R}$ be fixed. Suppose $(x_n)$ is a sequence which is bounded above; then
\[ x_n \le b \text{ for all but finitely many } n \quad\text{implies}\quad \varlimsup_{n\to\infty} x_n \le b. \tag{2.6} \]
Similarly, if $(x_n)$ is bounded below, then
\[ x_n \ge b \text{ for all but finitely many } n \quad\text{implies}\quad \varliminf_{n\to\infty} x_n \ge b. \tag{2.7} \]

Proof. We prove only the first part, for $\varlimsup x_n$; the proof of the statement for $\varliminf x_n$ is similar. Let $t := \varlimsup x_n$. Suppose to the contrary that $t > b$. Set $\varepsilon = (t - b)/2$; then $U_\varepsilon(t)$ contains infinitely many $x_n$ ($t$ is a limit point), which are all greater than $b$; this contradicts $x_n \le b$ for all but finitely many $n$. Hence $\varlimsup x_n \le b$.

Applying the first part to $b = \sup_n\{x_n\}$ and noting that $\inf A \le \sup A$, we have
\[ \inf_n\{x_n\} \le \varliminf_{n\to\infty} x_n \le \varlimsup_{n\to\infty} x_n \le \sup_n\{x_n\}. \]
The next proposition is a converse statement to Proposition 2.9.

Proposition 2.15 Let $(x_n)$ be a bounded sequence with a unique limit point $x$. Then $(x_n)$ converges to $x$.

Proof. Suppose to the contrary that $(x_n)$ diverges; that is, there exists some $\varepsilon > 0$ such that infinitely many $x_n$ are outside $U_\varepsilon(x)$. We view these elements as a subsequence $(y_k) := (x_{n_k})$ of $(x_n)$. Since $(x_n)$ is bounded, so is $(y_k)$. By Proposition 2.12 there exists a limit point $y$ of $(y_k)$ which is in turn also a limit point of $(x_n)$. Since $y \notin U_\varepsilon(x)$, $y \ne x$ is a second limit point; a contradiction! We conclude that $(x_n)$ converges to $x$.

Note that $t := \varlimsup x_n$ is uniquely characterized by the following two properties: for every $\varepsilon > 0$,
\[ t - \varepsilon < x_n \text{ for infinitely many } n, \quad\text{and}\quad x_n < t + \varepsilon \text{ for all but finitely many } n. \]
(See also homework 6.2.) Let us consider the above examples.

Example 2.7 (a) $x_n = (-1)^{n-1} + 1/n$; $\varliminf x_n = -1$, $\varlimsup x_n = 1$.
(b) $x_n = n - 5\left[\frac{n}{5}\right]$; $\varliminf x_n = 0$, $\varlimsup x_n = 4$.
(c) $(x_n)$ is the sequence of rational numbers of $(0,1)$; $\varliminf x_n = 0$, $\varlimsup x_n = 1$.
(d) $x_n = n$; $\varliminf x_n = \varlimsup x_n = +\infty$.
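For a bounded sequence, $\varlimsup x_n$ and $\varliminf x_n$ can be approximated by the tail quantities $\sup_{k \ge n} x_k$ and $\inf_{k \ge n} x_k$ over a long finite stretch. The sketch below is not from the notes; it does this for Example 2.7 (a).

```python
# Sketch (names are ours): tail sup/inf of x_n = (-1)**(n-1) + 1/n
# approximate limsup = 1 and liminf = -1.
def x(n):
    return (-1) ** (n - 1) + 1 / n

N = 10_000
tail = [x(n) for n in range(5_000, N + 1)]
approx_limsup = max(tail)   # decreases towards limsup as the cutoff grows
approx_liminf = min(tail)   # increases towards liminf as the cutoff grows

assert abs(approx_limsup - 1.0) < 1e-3
assert abs(approx_liminf - (-1.0)) < 1e-3
```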

Proposition 2.16 If $s_n \le t_n$ for all but finitely many $n$, then
\[ \varliminf_{n\to\infty} s_n \le \varliminf_{n\to\infty} t_n, \qquad \varlimsup_{n\to\infty} s_n \le \varlimsup_{n\to\infty} t_n. \]

Proof. (a) We keep the notations $s^*$ and $t^*$ for the upper limits of $(s_n)$ and $(t_n)$, respectively. Set $s = \varliminf s_n$ and $t = \varliminf t_n$. Let $\varepsilon > 0$. By homework 6.3 (a),
\[ s - \varepsilon \le s_n \text{ for all but finitely many } n \]
\[ \implies s - \varepsilon \le s_n \le t_n \text{ for all but finitely many } n \quad \text{(by assumption)} \]
\[ \implies s - \varepsilon \le \varliminf_{n\to\infty} t_n \quad \text{(by Proposition 2.14)} \]
\[ \implies \sup\{s - \varepsilon \mid \varepsilon > 0\} \le t \quad \text{(by the first remark in Subsection 1.1.7)} \]
\[ \implies s \le t. \]
(b) The proof for the upper limits, $s^* \le t^*$, follows from (a) and $\sup E = -\inf(-E)$.

2.2 Cauchy Sequences

The aim of this section is to characterize convergent sequences without knowing their limits.

Definition 2.7 A sequence $(x_n)$ is said to be a Cauchy sequence if:
For every $\varepsilon > 0$ there exists a positive integer $n_0$ such that $|x_n - x_m| < \varepsilon$ for all $m, n \ge n_0$.

The definition makes sense in arbitrary metric spaces. The definition is equivalent to
\[ \forall\, \varepsilon > 0\ \exists\, n_0 \in \mathbb{N}\ \forall\, n \ge n_0\ \forall\, k \in \mathbb{N}: |x_{n+k} - x_n| < \varepsilon. \]

Lemma 2.17 Every convergent sequence is a Cauchy sequence.

Proof. Let $x_n \to x$. To $\varepsilon > 0$ there is $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $x_n \in U_{\varepsilon/2}(x)$. By the triangle inequality, $m, n \ge n_0$ implies
\[ |x_n - x_m| \le |x_n - x| + |x_m - x| \le \varepsilon/2 + \varepsilon/2 = \varepsilon. \]
Hence, $(x_n)$ is a Cauchy sequence.

Proposition 2.18 (Cauchy convergence criterion) A real sequence is convergent if and only if it is a Cauchy sequence.

Proof. One direction is Lemma 2.17. We prove the other direction. Let $(x_n)$ be a Cauchy sequence. First we show that $(x_n)$ is bounded. To $\varepsilon = 1$ there is a positive integer $n_0$ such that $m, n \ge n_0$ implies $|x_m - x_n| < 1$. In particular $|x_n - x_{n_0}| < 1$ for all $n \ge n_0$; hence $|x_n| < 1 + |x_{n_0}|$. Setting
\[ C = \max\{|x_1|, |x_2|, \dots, |x_{n_0 - 1}|, |x_{n_0}| + 1\}, \]
we obtain $|x_n| \le C$ for all $n$, so $(x_n)$ is bounded. By the Bolzano–Weierstraß theorem (Proposition 2.12) there is a subsequence $(x_{n_k})$ converging to some $x$. We will show that $\lim_{n\to\infty} x_n = x$. Let $\varepsilon > 0$. Since $x_{n_k} \to x$ we find $k_0 \in \mathbb{N}$ such that $k \ge k_0$ implies $|x_{n_k} - x| < \varepsilon/2$. Since $(x_n)$ is a Cauchy sequence, there exists $n_0 \in \mathbb{N}$ such that $m, n \ge n_0$ implies $|x_n - x_m| < \varepsilon/2$. Put $n_1 := \max\{n_0, n_{k_0}\}$ and choose $k_1$ with $n_{k_1} \ge n_1$. Then $n \ge n_1$ implies
\[ |x - x_n| \le |x - x_{n_{k_1}}| + |x_{n_{k_1}} - x_n| < 2 \cdot \varepsilon/2 = \varepsilon. \]

Example 2.8 (a) $x_n = \sum_{k=1}^{n} \frac1k = 1 + \frac12 + \frac13 + \dots + \frac1n$. We show that $(x_n)$ is not a Cauchy sequence. For, consider
\[ x_{2m} - x_m = \sum_{k=m+1}^{2m} \frac1k \ge \sum_{k=m+1}^{2m} \frac{1}{2m} = m \cdot \frac{1}{2m} = \frac12. \]
Hence, there is no $n_0$ such that $p, n \ge n_0$ implies $|x_p - x_n| < \frac12$.
(b) $x_n = \sum_{k=1}^{n} \frac{(-1)^{k+1}}{k} = 1 - 1/2 + 1/3 - \dots + (-1)^{n+1}/n$. Consider
\[ x_{n+k} - x_n = (-1)^n\left(\frac{1}{n+1} - \frac{1}{n+2} + \frac{1}{n+3} - \dots + (-1)^{k+1}\frac{1}{n+k}\right). \]
Grouping the terms in pairs from the front,
\[ |x_{n+k} - x_n| = \left(\frac{1}{n+1} - \frac{1}{n+2}\right) + \left(\frac{1}{n+3} - \frac{1}{n+4}\right) + \dots + \begin{cases} \frac{1}{n+k-1} - \frac{1}{n+k}, & k \text{ even},\\[2pt] \frac{1}{n+k}, & k \text{ odd}, \end{cases} \]
since all summands in parentheses are positive. Grouping instead from the second term on,
\[ |x_{n+k} - x_n| = \frac{1}{n+1} - \left(\frac{1}{n+2} - \frac{1}{n+3}\right) - \dots \le \frac{1}{n+1}, \]
again since all summands in parentheses are positive. Hence, $(x_n)$ is a Cauchy sequence and converges.

2.3 Series

Definition 2.8 Given a sequence $(a_n)$, we associate with $(a_n)$ a sequence $(s_n)$, where
\[ s_n = \sum_{k=1}^{n} a_k = a_1 + a_2 + \dots + a_n. \]
For $(s_n)$ we also use the symbol
\[ \sum_{k=1}^{\infty} a_k, \tag{2.8} \]
and we call it an infinite series or just a series. The numbers $s_n$ are called the partial sums of the series. If $(s_n)$ converges to $s$, we say that the series converges, and write
\[ \sum_{k=1}^{\infty} a_k = s. \]
The number $s$ is called the sum of the series.

Remarks 2.1 (a) The sum of a series should be clearly understood as the limit of the sequence of partial sums; it is not simply obtained by addition.
(b) If $(s_n)$ diverges, the series is said to be divergent.
(c) The symbol $\sum_{k=1}^{\infty} a_k$ means both the sequence of partial sums as well as the limit of this sequence (if it exists). Sometimes we use series of the form $\sum_{k=k_0}^{\infty} a_k$, $k_0 \in \mathbb{N}$. We simply write $\sum a_k$ if there is no ambiguity about the bounds of the index $k$.

Example 2.9 (Example 2.8 continued)
(1) $\sum_{n=1}^{\infty} \frac1n$ is divergent. This is the harmonic series.
(2) $\sum_{n=1}^{\infty} (-1)^{n+1}\frac1n$ is convergent. It is an example of an alternating series (the summands change their signs, and the absolute values of the summands form a sequence decreasing to $0$).
(3) $\sum_{n=0}^{\infty} q^n$ is called the geometric series. It is convergent for $|q| < 1$ with $\sum_{n=0}^{\infty} q^n = \frac{1}{1-q}$. This is seen from $\sum_{k=0}^{n} q^k = \frac{1 - q^{n+1}}{1 - q}$, see the proof of Lemma 1.14, first formula with $y = 1$, $x = q$. The series diverges for $|q| \ge 1$. The general formula in case $|q| < 1$ is
\[ \sum_{n=n_0}^{\infty} c\, q^n = \frac{c\, q^{n_0}}{1 - q}. \tag{2.9} \]

2.3.1 Properties of Convergent Series

Lemma 2.19 (1) If $\sum_{n=1}^{\infty} a_n$ is convergent, then $\sum_{k=m}^{\infty} a_k$ is convergent for any $m \in \mathbb{N}$.
(2) If $\sum a_n$ is convergent, then the sequence $r_n := \sum_{k=n+1}^{\infty} a_k$ tends to $0$ as $n \to \infty$.
(3) If $(a_n)$ is a sequence of nonnegative real numbers, then $\sum a_n$ converges if and only if the partial sums are bounded.

Proof. (1) Suppose that $\sum_{n=1}^{\infty} a_n = s$; we show that $\sum_{n=m}^{\infty} a_n = s - (a_1 + a_2 + \dots + a_{m-1})$. Indeed, let $(s_n)$ and $(t_n)$ denote the $n$th partial sums of $\sum_{k=1}^{\infty} a_k$ and $\sum_{k=m}^{\infty} a_k$, respectively. Then for $n > m$ one has $t_n = s_n - \sum_{k=1}^{m-1} a_k$. Taking the limit $n \to \infty$ proves the claim.
(2) Suppose that $\sum_{1}^{\infty} a_n$ converges to $s$. By (1), $r_n = \sum_{k=n+1}^{\infty} a_k$ is also a convergent series for all $n$. We have
\[ \sum_{k=1}^{\infty} a_k = \sum_{k=1}^{n} a_k + \sum_{k=n+1}^{\infty} a_k \implies s = s_n + r_n \implies r_n = s - s_n \implies \lim_{n\to\infty} r_n = s - s = 0. \]
(3) Suppose $a_n \ge 0$; then $s_{n+1} = s_n + a_{n+1} \ge s_n$. Hence, $(s_n)$ is an increasing sequence; if it is bounded, it converges by Proposition 2.8. The other direction is trivial since every convergent sequence is bounded.
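Lemma 2.19 (2) can be watched concretely on the geometric series, whose remainders have the closed form (2.9). The sketch below is not from the notes; the function name is ours.

```python
# Sketch: for |q| < 1 the remainders r_n = sum_{k=n+1}^inf q**k equal
# q**(n+1) / (1 - q) by (2.9), and they tend to 0 as Lemma 2.19 (2) states.
def remainder(q, n, terms=10_000):
    # truncate the infinite tail at a large index; for |q| < 1 the error
    # is negligible
    return sum(q ** k for k in range(n + 1, terms))

q = 0.5
for n in (0, 5, 20):
    closed = q ** (n + 1) / (1 - q)
    assert abs(remainder(q, n) - closed) < 1e-12

assert remainder(q, 40) < 1e-11   # the tails r_n tend to 0
```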

Proposition 2.20 (Cauchy criterion) $\sum a_n$ converges if and only if for every $\varepsilon > 0$ there is an integer $n_0 \in \mathbb{N}$ such that
\[ \left|\sum_{k=m}^{n} a_k\right| < \varepsilon \tag{2.10} \]
if $n \ge m \ge n_0$.

Proof. Clear from Proposition 2.18. Consider the sequence of partial sums $s_n = \sum_{k=1}^{n} a_k$ and note that for $n \ge m$ one has $|s_n - s_{m-1}| = \left|\sum_{k=m}^{n} a_k\right|$.

Corollary 2.21 If $\sum a_n$ converges, then $(a_n)$ converges to $0$.

Proof. Take $m = n$ in (2.10); this yields $|a_n| < \varepsilon$. Hence $(a_n)$ tends to $0$.

Proposition 2.22 (Comparison test) (a) If $|a_n| \le C b_n$ for some $C > 0$ and for almost all $n \in \mathbb{N}$, and if $\sum b_n$ converges, then $\sum a_n$ converges.
(b) If $a_n \ge C d_n \ge 0$ for some $C > 0$ and for almost all $n$, and if $\sum d_n$ diverges, then $\sum a_n$ diverges.

Proof. (a) Suppose $n \ge n_1$ implies $|a_n| \le C b_n$. Given $\varepsilon > 0$, there exists $n_0 \ge n_1$ such that $n \ge m \ge n_0$ implies
\[ \sum_{k=m}^{n} b_k < \frac{\varepsilon}{C} \]
by the Cauchy criterion. Hence
\[ \left|\sum_{k=m}^{n} a_k\right| \le \sum_{k=m}^{n} |a_k| \le C \sum_{k=m}^{n} b_k < \varepsilon, \]
and (a) follows by the Cauchy criterion.
(b) follows from (a), for if $\sum a_n$ converges, so must $\sum d_n$.

2.3.2 Operations with Convergent Series

Definition 2.9 If $\sum a_n$ and $\sum b_n$ are series, we define sums and differences as follows:
\[ \sum a_n \pm \sum b_n := \sum (a_n \pm b_n) \quad\text{and}\quad c \sum a_n := \sum c\, a_n, \quad c \in \mathbb{R}. \]
Let $c_n := \sum_{k=1}^{n} a_k b_{n-k+1}$; then $\sum c_n$ is called the Cauchy product of $\sum a_n$ and $\sum b_n$.

If $\sum a_n$ and $\sum b_n$ are convergent, it is easy to see that $\sum_{1}^{\infty}(a_n + b_n) = \sum_{1}^{\infty} a_n + \sum_{1}^{\infty} b_n$ and $\sum_{1}^{\infty} c\, a_n = c \sum_{1}^{\infty} a_n$.

Caution: the product series $\sum c_n$ need not be convergent. Indeed, let $a_n := b_n := \frac{(-1)^n}{\sqrt{n}}$. One can show that $\sum a_n$ and $\sum b_n$ are convergent (see Proposition 2.29 below); however, $\sum c_n$ with $c_n = \sum_{k=1}^{n} a_k b_{n-k+1}$ is not convergent. Proof: by the arithmetic–geometric mean inequality, $\sqrt{k(n+1-k)} \le \frac{n+1}{2}$, hence
\[ |c_n| = \sum_{k=1}^{n} \frac{1}{\sqrt{k(n+1-k)}} \ge \sum_{k=1}^{n} \frac{2}{n+1} = \frac{2n}{n+1}. \]
Since $|c_n|$ does not converge to $0$ as $n \to \infty$, $\sum_{n} c_n$ diverges by Corollary 2.21.

2.3.3 Series of Nonnegative Numbers

Proposition 2.23 (Compression Theorem) Suppose $a_1 \ge a_2 \ge \dots \ge 0$. Then the series $\sum_{n=1}^{\infty} a_n$ converges if and only if the series
\[ \sum_{k=0}^{\infty} 2^k a_{2^k} = a_1 + 2a_2 + 4a_4 + 8a_8 + \cdots \tag{2.11} \]
converges.

Proof. By Lemma 2.19 (3) it suffices to consider boundedness of the partial sums. Let
\[ s_n = a_1 + \dots + a_n, \qquad t_k = a_1 + 2a_2 + \dots + 2^k a_{2^k}. \]
For $n < 2^k$,
\[ s_n \le a_1 + (a_2 + a_3) + \dots + (a_{2^k} + \dots + a_{2^{k+1}-1}) \le a_1 + 2a_2 + \dots + 2^k a_{2^k} = t_k. \tag{2.12} \]
On the other hand, if $n > 2^k$,
\[ s_n \ge a_1 + a_2 + (a_3 + a_4) + \dots + (a_{2^{k-1}+1} + \dots + a_{2^k}) \ge \frac12 a_1 + a_2 + 2a_4 + \dots + 2^{k-1} a_{2^k} = \frac12 t_k. \tag{2.13} \]
By (2.12) and (2.13), the sequences $(s_n)$ and $(t_k)$ are either both bounded or both unbounded. This completes the proof.

Example 2.10 (a) $\sum_{n=1}^{\infty} \frac{1}{n^p}$ converges if $p > 1$ and diverges if $p \le 1$.
If $p \le 0$, divergence follows from Corollary 2.21. If $p > 0$, Proposition 2.23 is applicable, and we are led to the series
\[ \sum_{k=0}^{\infty} 2^k \frac{1}{2^{kp}} = \sum_{k=0}^{\infty} \left(\frac{1}{2^{p-1}}\right)^k. \]
This is a geometric series with $q = \frac{1}{2^{p-1}}$. It converges if and only if $2^{p-1} > 1$ if and only if $p > 1$.
(b) If $p > 1$,
\[ \sum_{n=2}^{\infty} \frac{1}{n(\log n)^p} \tag{2.14} \]
converges; if $p \le 1$, the series diverges. Here "$\log n$" denotes the logarithm to the base $e$.
If $p < 0$, then $\frac{1}{n(\log n)^p} > \frac{1}{n}$ for $n \ge 3$, and divergence follows by comparison with the harmonic series. Now let $p > 0$. By Lemma 1.23 (b), $\log n < \log(n+1)$. Hence $(n(\log n)^p)$ increases and $\left(\frac{1}{n(\log n)^p}\right)$ decreases; we can apply Proposition 2.23 to (2.14). This leads us to the series
\[ \sum_{k=1}^{\infty} 2^k \cdot \frac{1}{2^k (\log 2^k)^p} = \sum_{k=1}^{\infty} \frac{1}{(k \log 2)^p} = \frac{1}{(\log 2)^p} \sum_{k=1}^{\infty} \frac{1}{k^p}, \]
and the assertion follows from example (a).
This procedure can evidently be continued. For instance $\sum_{n=3}^{\infty} \frac{1}{n \log n \log\log n}$ diverges, whereas $\sum_{n=3}^{\infty} \frac{1}{n \log n (\log\log n)^2}$ converges.

2.3.4 The Number e

Leonhard Euler (Basel 1707 – St. Petersburg 1783) was one of the greatest mathematicians. He made contributions to number theory, ordinary differential equations, the calculus of variations, astronomy, and mechanics. Fermat (1635) claimed that all numbers of the form $f_n = 2^{2^n} + 1$, $n \in \mathbb{N}$, are prime numbers. This is obviously true for the first five numbers ($3, 5, 17, 257, 65537$). Euler showed that $641 \mid 2^{32} + 1$. In fact, it is open whether any other element $f_n$ is a prime number. Euler showed that the equation $x^3 + y^3 = z^3$ has no solution in positive integers $x, y, z$. This is a special case of Fermat's last theorem. It is known that the limit
\[ \gamma = \lim_{n\to\infty}\left(1 + \frac12 + \frac13 + \dots + \frac1n - \log n\right) \]
exists and gives a finite number $\gamma$, the so-called Euler constant. It is not known whether $\gamma$ is rational or not. Further, the Euler numbers $E_r$ play a role in calculating the series $\sum_n (-1)^n \frac{1}{(2n+1)^r}$. Soon we will speak about the Euler formula $e^{ix} = \cos x + i \sin x$. More about the life and work of famous mathematicians is to be found at www-history.mcs.st-andrews.ac.uk/

We define
\[ e := \sum_{n=0}^{\infty} \frac{1}{n!}, \tag{2.15} \]
where $0! = 1! = 1$ by definition. Since
\[ s_n = 1 + 1 + \frac{1}{1\cdot2} + \frac{1}{1\cdot2\cdot3} + \dots + \frac{1}{1\cdot2\cdots n} < 1 + 1 + \frac12 + \frac{1}{2^2} + \dots + \frac{1}{2^{n-1}} < 3, \]
the series converges (by comparing it with the geometric series with $q = \frac12$), and the definition makes sense. In fact, the series converges very rapidly and allows us to compute $e$ with great accuracy. It is of interest to note that $e$ can also be defined by means of another limit process. $e$ is called the Euler number.

Proposition 2.24
\[ e = \lim_{n\to\infty}\left(1 + \frac1n\right)^n. \tag{2.16} \]

Proof. Let
\[ s_n = \sum_{k=0}^{n} \frac{1}{k!}, \qquad t_n = \left(1 + \frac1n\right)^n. \]
By the binomial theorem,
\[ t_n = 1 + n\cdot\frac1n + \frac{n(n-1)}{2!}\cdot\frac{1}{n^2} + \frac{n(n-1)(n-2)}{3!}\cdot\frac{1}{n^3} + \dots + \frac{n(n-1)\cdots 1}{n!}\cdot\frac{1}{n^n} \]
\[ = 1 + 1 + \frac{1}{2!}\left(1 - \frac1n\right) + \frac{1}{3!}\left(1 - \frac1n\right)\left(1 - \frac2n\right) + \dots + \frac{1}{n!}\left(1 - \frac1n\right)\left(1 - \frac2n\right)\cdots\left(1 - \frac{n-1}{n}\right) \]
\[ \le 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \dots + \frac{1}{n!}. \]
Hence, $t_n \le s_n$, so that by Proposition 2.16
\[ \varlimsup_{n\to\infty} t_n \le \varlimsup_{n\to\infty} s_n = \lim_{n\to\infty} s_n = e. \tag{2.17} \]
Next, if $n \ge m$,
\[ t_n \ge 1 + 1 + \frac{1}{2!}\left(1 - \frac1n\right) + \dots + \frac{1}{m!}\left(1 - \frac1n\right)\cdots\left(1 - \frac{m-1}{n}\right). \]
Let $n \to \infty$, keeping $m$ fixed; again by Proposition 2.16 we get
\[ \varliminf_{n\to\infty} t_n \ge 1 + 1 + \frac{1}{2!} + \dots + \frac{1}{m!} = s_m. \]
Letting $m \to \infty$, we finally get
\[ e \le \varliminf_{n\to\infty} t_n. \tag{2.18} \]
The proposition follows from (2.17) and (2.18).

The rapidity with which the series $\sum 1/n!$ converges can be estimated as follows:
\[ e - s_n = \frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \cdots < \frac{1}{(n+1)!}\left(1 + \frac{1}{n+1} + \frac{1}{(n+1)^2} + \cdots\right) = \frac{1}{(n+1)!}\cdot\frac{1}{1 - \frac{1}{n+1}} = \frac{1}{n!\, n}, \]
so that
\[ 0 < e - s_n < \frac{1}{n!\, n}. \tag{2.19} \]
We use the preceding inequality to compute $e$. For $n = 9$ we find
\[ s_9 = 1 + 1 + \frac12 + \frac16 + \frac{1}{24} + \frac{1}{120} + \frac{1}{720} + \frac{1}{5040} + \frac{1}{40320} + \frac{1}{362880} = 2.718281526\ldots \tag{2.20} \]
By (2.19),
\[ e - s_9 < \frac{3.1}{10^7}, \]
so that the first six digits of $e$ in (2.20) are correct.

Example 2.11
(a)
\[ \lim_{n\to\infty}\left(1 - \frac1n\right)^n = \lim_{n\to\infty}\left(\frac{n-1}{n}\right)^n = \lim_{n\to\infty}\frac{1}{\left(1 + \frac{1}{n-1}\right)^{n-1}\left(1 + \frac{1}{n-1}\right)} = \frac{1}{e}. \]
(b)
\[ \lim_{n\to\infty}\left(\frac{3n+1}{3n-1}\right)^{4n} = \lim_{n\to\infty}\left(\frac{3n+1}{3n}\right)^{4n} \cdot \lim_{n\to\infty}\left(\frac{3n}{3n-1}\right)^{4n} = \lim_{n\to\infty}\left(\left(1 + \frac{1}{3n}\right)^{3n}\right)^{\frac43} \cdot \lim_{n\to\infty}\left(\left(1 + \frac{1}{3n-1}\right)^{3n-1}\right)^{\frac{4n}{3n-1}} = e^{\frac43}\, e^{\frac43} = e^{\frac83}. \]

Proposition 2.25 $e$ is irrational.

Proof. Suppose $e$ is rational, say $e = p/q$ with positive integers $p$ and $q$. By (2.19),
\[ 0 < q!\,(e - s_q) < \frac{1}{q}. \tag{2.21} \]
By our assumption, $q!\,e$ is an integer. Since
\[ q!\, s_q = q!\left(1 + 1 + \frac{1}{2!} + \dots + \frac{1}{q!}\right) \]
is also an integer, we see that $q!\,(e - s_q)$ is an integer. Since $q \ge 1$, (2.21) implies the existence of an integer between $0$ and $1$, which is absurd.

2.3.5 The Root and the Ratio Tests

n Theorem 2.26 (Root Test) Given an, put α = lim an . n | | Then →∞ P p

(a) if α< 1, an converges; (b) if α> 1, a diverges; P n (c) if α =1, the test gives no information. P

Proof. (a) If α< 1 choose β such that α<β< 1, and an integer n0 such that

n a < β | n | for n n (such n exists since α is thep supremum of the limit set of ( n a )). That is, ≥ 0 0 | n | n n implies ≥ 0 p a < βn. | n | n Since 0 <β< 1, β converges. Convergence of an now follows from the comparison test. P P (b) If α> 1 there is a subsequence (a ) such that nk a α. Hence a > 1 for infinitely nk | nk |→ | n | many n, so that the necessary condition for convergence, an 0, fails. 1 1 p → To prove (c) consider the series and . For each of the series α = 1, but the first n n2 diverges, the second converges. X X

Remark. (a) $\sum a_n$ converges if there exists $q < 1$ such that $\sqrt[n]{|a_n|} \le q$ for almost all $n$.
(b) $\sum a_n$ diverges if $\sqrt[n]{|a_n|} \ge 1$ for infinitely many $n$.

Theorem 2.27 (Ratio Test) The series $\sum a_n$

(a) converges if $\limsup_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right| < 1$,

(b) diverges if $\left|\frac{a_{n+1}}{a_n}\right| \ge 1$ for all but finitely many $n$.


In place of (b) one can also use the (weaker) statement:
(b′) $\sum a_n$ diverges if $\liminf_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right| > 1$.
Indeed, if (b′) is satisfied, then $\left|\frac{a_{n+1}}{a_n}\right| \ge 1$ for almost all $n$.

Corollary 2.28 The series $\sum a_n$
(a) converges if $\lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right| < 1$,
(b) diverges if $\lim_{n\to\infty}\left|\frac{a_{n+1}}{a_n}\right| > 1$.

Proof of Theorem 2.27. If condition (a) holds, we can find $\beta < 1$ and an integer $m$ such that $n \ge m$ implies
$$\left|\frac{a_{n+1}}{a_n}\right| < \beta.$$

In particular,
$$|a_{m+1}| < \beta\,|a_m|, \quad |a_{m+2}| < \beta\,|a_{m+1}| < \beta^2\,|a_m|, \quad\ldots,\quad |a_{m+p}| < \beta^p\,|a_m|.$$
That is,
$$|a_n| < \frac{|a_m|}{\beta^m}\,\beta^n \quad\text{for } n \ge m,$$
and (a) follows from the comparison test, since $\sum \beta^n$ converges. If $|a_{n+1}| \ge |a_n|$ for $n \ge n_0$, it is seen that the condition $a_n \to 0$ does not hold, and (b) follows.

Remark 2.2 Homework 7.5 shows that in (b) “all but finitely many” cannot be replaced by the weaker assumption “infinitely many.”

Example 2.12 (a) The series $\sum_{n=0}^{\infty} n^2/2^n$ converges since, for $n \ge 3$,
$$\frac{a_{n+1}}{a_n} = \frac{(n+1)^2\,2^n}{2^{n+1}\,n^2} = \frac{1}{2}\left(1 + \frac{1}{n}\right)^2 \le \frac{1}{2}\left(1 + \frac{1}{3}\right)^2 = \frac{8}{9} < 1.$$

(b) Consider the series

$$\frac12 + 1 + \frac18 + \frac14 + \frac1{32} + \frac1{16} + \frac1{128} + \frac1{64} + \cdots = \frac{1}{2^1} + \frac{1}{2^0} + \frac{1}{2^3} + \frac{1}{2^2} + \frac{1}{2^5} + \frac{1}{2^4} + \cdots,$$
where $\liminf_{n\to\infty}\frac{a_{n+1}}{a_n} = \frac18$, $\limsup_{n\to\infty}\frac{a_{n+1}}{a_n} = 2$, but $\lim_{n\to\infty}\sqrt[n]{a_n} = \frac12$. Indeed, $a_{2n} = 1/2^{2n-2}$ and $a_{2n+1} = 1/2^{2n+1}$ yield
$$\frac{a_{2n+1}}{a_{2n}} = \frac18, \qquad \frac{a_{2n}}{a_{2n-1}} = 2.$$
The root test indicates convergence; the ratio test does not apply.
(c) For $\sum \frac1n$ and $\sum \frac1{n^2}$ both the ratio and the root test do not apply, since both $(a_{n+1}/a_n)$ and $(\sqrt[n]{a_n})$ converge to $1$.
The ratio test is frequently easier to apply than the root test. However, the root test has wider scope.
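The behavior in Example 2.12(b) can be seen numerically: the ratios oscillate between $1/8$ and $2$, while the $n$th roots settle near $1/2$. A sketch (the helper `a` is ours):

```python
def a(n):
    """Terms of the series in Example 2.12(b), n = 1, 2, 3, ..."""
    if n % 2 == 0:
        return 2.0 ** -(n - 2)    # a_{2m} = 1/2^(2m-2)
    return 2.0 ** -n              # a_{2m+1} = 1/2^(2m+1), and a_1 = 1/2

ratios = [a(n + 1) / a(n) for n in range(1, 40)]
roots = [a(n) ** (1.0 / n) for n in range(1, 400)]

print(min(ratios), max(ratios))   # 0.125 2.0 -- the ratio test fails
print(roots[-1])                  # near 0.5 -- the root test applies
```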

Remark 2.3 For any sequence $(c_n)$ of positive real numbers,
$$\liminf_{n\to\infty}\frac{c_{n+1}}{c_n} \le \liminf_{n\to\infty}\sqrt[n]{c_n} \le \limsup_{n\to\infty}\sqrt[n]{c_n} \le \limsup_{n\to\infty}\frac{c_{n+1}}{c_n}.$$
For the proof, see [Rud76, 3.37 Theorem]. In particular, if $\lim_{n\to\infty}\frac{c_{n+1}}{c_n}$ exists, then $\lim_{n\to\infty}\sqrt[n]{c_n}$ also exists and both limits coincide.

Proposition 2.29 (Leibniz criterion) Let $\sum b_n$ be an alternating series, that is, $b_n = (-1)^{n+1} a_n$ with a decreasing sequence of positive numbers $a_1 \ge a_2 \ge \cdots \ge 0$. If $\lim_{n\to\infty} a_n = 0$, then $\sum b_n$ converges.

Proof. The proof is quite the same as in Example 2.8(b). For the partial sums $s_n$ of $\sum b_n$ we find
$$|s_n - s_m| \le a_{m+1} \quad\text{if } n \ge m.$$
Since $(a_n)$ tends to $0$, the Cauchy criterion applies to $(s_n)$. Hence $\sum b_n$ is convergent.
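The estimate $|s_n - s_m| \le a_{m+1}$ from the proof can be observed for $a_n = 1/n$. The sketch below also compares the sum with $\log 2$, the known value of the alternating harmonic series (a fact used here only for illustration, not proved in this section):

```python
import math

# b_n = (-1)^(n+1) a_n with a_n = 1/n; partial sums of the alternating series
N = 2000
s, partial = 0.0, []
for n in range(1, N + 1):
    s += (-1) ** (n + 1) / n
    partial.append(s)

# the proof's estimate |s_n - s_m| <= a_{m+1}, checked for m = 100
m = 100
tail = [abs(partial[n] - partial[m - 1]) for n in range(m, N)]
assert max(tail) <= 1 / (m + 1) + 1e-15

print(partial[-1], math.log(2))
```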

2.3.6 Absolute Convergence

The series $\sum a_n$ is said to converge absolutely if the series $\sum |a_n|$ converges.

Proposition 2.30 If $\sum a_n$ converges absolutely, then $\sum a_n$ converges.

Proof. The assertion follows from the inequality
$$\left|\sum_{k=m}^{n} a_k\right| \le \sum_{k=m}^{n} |a_k|$$
plus the Cauchy criterion.

Remarks 2.4 For series with positive terms, absolute convergence is the same as convergence. If $\sum a_n$ converges but $\sum |a_n|$ diverges, we say that $\sum a_n$ converges nonabsolutely. For instance, $\sum (-1)^{n+1}/n$ converges nonabsolutely. The comparison test as well as the root and ratio tests are really tests for absolute convergence and therefore cannot give any information about nonabsolutely convergent series. We shall see that we may operate with absolutely convergent series very much as with finite sums: we may multiply them, and we may change the order in which the additions are carried out without affecting the sum of the series. For nonabsolutely convergent series this is no longer true, and more care has to be taken when dealing with them. Without proof we mention the fact that one can multiply absolutely convergent series; for the proof, see [Rud76, Theorem 3.50].

Proposition 2.31 If $\sum_{n=0}^{\infty} a_n$ converges absolutely with $\sum_{n=0}^{\infty} a_n = A$, $\sum_{n=0}^{\infty} b_n$ converges with $\sum_{n=0}^{\infty} b_n = B$, and
$$c_n = \sum_{k=0}^{n} a_k b_{n-k}, \quad n \in \mathbb{Z}_+,$$
then $\sum_{n=0}^{\infty} c_n = AB$.

2.3.7 Decimal Expansion of Real Numbers

Proposition 2.32 (a) Let $\alpha$ be a real number with $0 \le \alpha < 1$. Then there exists a sequence $(a_n)$, $a_n \in \{0, 1, 2, \ldots, 9\}$, such that
$$\alpha = \sum_{n=1}^{\infty} a_n\,10^{-n}. \qquad(2.22)$$
The sequence $(a_n)$ is called a decimal expansion of $\alpha$.
(b) Given a sequence $(a_k)$, $a_k \in \{0, 1, \ldots, 9\}$, there exists a real number $\alpha \in [0,1]$ such that
$$\alpha = \sum_{n=1}^{\infty} a_n\,10^{-n}.$$

Proof. (b) Comparison with the geometric series yields

$$\sum_{n=1}^{\infty} a_n\,10^{-n} \le 9\sum_{n=1}^{\infty} 10^{-n} = \frac{9}{10}\cdot\frac{1}{1 - 1/10} = 1.$$
Hence the series $\sum_{n=1}^{\infty} a_n\,10^{-n}$ converges to some $\alpha \in [0,1]$.
(a) Given $\alpha \in [0,1)$, we use induction to construct a sequence $(a_n)$ with (2.22) such that the partial sums $s_n = \sum_{k=1}^{n} a_k\,10^{-k}$ satisfy
$$s_n \le \alpha < s_n + 10^{-n}.$$
Put $a_1 = [10\alpha]$; then $a_1 \in \{0, \ldots, 9\}$ and $s_1 \le \alpha < s_1 + 10^{-1}$. Suppose $a_1, \ldots, a_n$ are already constructed and
$$s_n \le \alpha < s_n + 10^{-n}.$$
Put $a_{n+1} = [10^{n+1}(\alpha - s_n)]$. Since $0 \le 10^{n+1}(\alpha - s_n) < 10$, we have $a_{n+1} \in \{0, \ldots, 9\}$, and $s_{n+1} = s_n + a_{n+1}\,10^{-(n+1)}$ satisfies $s_{n+1} \le \alpha < s_{n+1} + 10^{-(n+1)}$. From $0 \le \alpha - s_n < 10^{-n}$ it follows that $(s_n)$ converges to $\alpha$, which proves (2.22).
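The inductive construction in part (a) is easy to run. A sketch with exact rational arithmetic (the helper `digits` is ours), including the invariant $s_n \le \alpha < s_n + 10^{-n}$ as an assertion:

```python
from fractions import Fraction

def digits(alpha, n):
    """First n decimal digits a_1, ..., a_n of alpha in [0, 1), by the greedy
    construction of the proof: a_k = [10^k (alpha - s_{k-1})]."""
    out, s = [], Fraction(0)
    for k in range(1, n + 1):
        a = int(10 ** k * (alpha - s))          # an integer in {0, ..., 9}
        out.append(a)
        s += Fraction(a, 10 ** k)
        assert s <= alpha < s + Fraction(1, 10 ** k)   # the invariant
    return out

print(digits(Fraction(1, 7), 8))    # [1, 4, 2, 8, 5, 7, 1, 4]: period 142857
```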

Remarks 2.5 (a) The proof shows that any real number $\alpha \in [0,1)$ can be approximated by rational numbers.
(b) The construction avoids decimal expansions of the form $\alpha = \ldots a999\ldots$, $a < 9$, and gives instead $\alpha = \ldots(a+1)000\ldots$. It gives a bijective correspondence between the real numbers of the interval $[0,1)$ and the sequences $(a_n)$, $a_n \in \{0, 1, \ldots, 9\}$, not ending with nines. Note, however, that the excluded sequence $(a_n) = (0, 1, 9, 9, \ldots)$ also corresponds to the real number $0.02$.
(c) It is not difficult to see that $\alpha \in [0,1)$ is rational if and only if there exist positive integers $n_0$ and $p$ such that $n \ge n_0$ implies $a_n = a_{n+p}$; that is, the decimal expansion is periodic from $n_0$ on.

2.3.8 Complex Sequences and Series

Almost all notions and theorems carry over from real sequences to complex sequences. For example:

A sequence $(z_n)$ of complex numbers converges to $z$ if for every (real) $\varepsilon > 0$ there exists a positive integer $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $|z - z_n| < \varepsilon$.

The following proposition shows that convergence of a complex sequence can be reduced to the convergence of two real sequences.

Proposition 2.33 The complex sequence $(z_n)$ converges to some complex number $z$ if and only if the real sequence $(\operatorname{Re} z_n)$ converges to $\operatorname{Re} z$ and the real sequence $(\operatorname{Im} z_n)$ converges to $\operatorname{Im} z$.

Proof. Using the (complex) limit law $\lim(z_n + c) = c + \lim z_n$, it is easy to see that we can restrict ourselves to the case $z = 0$. Suppose first $z_n \to 0$. Proposition 1.20(d) gives $|\operatorname{Re} z_n| \le |z_n|$. Hence $\operatorname{Re} z_n$ tends to $0$ as $n \to \infty$. Similarly, $|\operatorname{Im} z_n| \le |z_n|$ and therefore $\operatorname{Im} z_n \to 0$.
Suppose now $x_n := \operatorname{Re} z_n \to 0$ and $y_n := \operatorname{Im} z_n \to 0$ as $n$ goes to infinity. Since $|z_n|^2 = x_n^2 + y_n^2$, $|z_n|^2 \to 0$ as $n \to \infty$; this implies $z_n \to 0$.
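Proposition 2.33 can be illustrated with the sequence $z_n = (1 + i/n)^n$, which converges to $e^{i}$ (a standard limit, used here only for illustration): the real and imaginary parts converge separately.

```python
import cmath

# z_n = (1 + i/n)^n -> e^i; check Re and Im separately
target = cmath.exp(1j)
z = [(1 + 1j / n) ** n for n in (10, 100, 1000, 10000)]

for w in z:
    print(abs(w.real - target.real), abs(w.imag - target.imag))

assert abs(z[-1] - target) < 1e-3
```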

Since the complex field $\mathbb{C}$ is not an ordered field, all notions and propositions where the order is involved do not make sense for complex series, or they need modifications. The sandwich theorem does not hold; there is no notion of monotonic sequences or of upper and lower limits. But there are still bounded sequences ($|z_n| \le C$), limit points, subsequences, Cauchy sequences, series, and absolute convergence. The following results remain true for complex sequences: Propositions/Lemmas/Theorems 1, 2, 3, 9, 10, 12, 15, 17, 18.

The Bolzano–Weierstraß theorem for bounded complex sequences $(z_n)$ can be proved by considering the real and the imaginary sequences $(\operatorname{Re} z_n)$ and $(\operatorname{Im} z_n)$ separately. The comparison test for series now reads:

(a) If $|a_n| \le C\,|b_n|$ for some $C > 0$ and for almost all $n \in \mathbb{N}$, and if $\sum |b_n|$ converges, then $\sum a_n$ converges.
(b) If $|a_n| \ge C\,|d_n|$ for some $C > 0$ and for almost all $n$, and if $\sum |d_n|$ diverges, then $\sum |a_n|$ diverges.

The Cauchy criterion, the root, and the ratio tests are true for complex series as well. Propositions 19, 20, 26, 27, 28, 30, and 31 are true for complex series.

2.3.9 Power Series

Definition 2.10 Given a sequence $(c_n)$ of complex numbers, the series
$$\sum_{n=0}^{\infty} c_n z^n \qquad(2.23)$$
is called a power series. The numbers $c_n$ are called the coefficients of the series; $z$ is a complex number.
In general, the series will converge or diverge, depending on the choice of $z$. More precisely, with every power series there is associated a circle with center $0$, the circle of convergence, such that (2.23) converges if $z$ is in the interior of the circle and diverges if $z$ is in the exterior. The radius $R$ of this disc of convergence is called the radius of convergence. On the disc of convergence, a power series defines a function, since it associates to each $z$ with $|z| < R$ a complex number, namely the sum of the numerical series $\sum_n c_n z^n$. For example, $\sum_{n=0}^{\infty} z^n$ defines the function $f(z) = \frac{1}{1-z}$ for $|z| < 1$. If almost all coefficients $c_n$ are $0$, say $c_n = 0$ for all $n \ge m+1$, the power series is a finite sum and the corresponding function is a polynomial:
$$\sum_{n=0}^{\infty} c_n z^n = \sum_{n=0}^{m} c_n z^n = c_0 + c_1 z + c_2 z^2 + \cdots + c_m z^m.$$

Theorem 2.34 Given a power series $\sum c_n z^n$, put

$$\alpha = \limsup_{n\to\infty}\sqrt[n]{|c_n|}, \qquad R = \frac{1}{\alpha}. \qquad(2.24)$$
If $\alpha = 0$, $R = +\infty$; if $\alpha = +\infty$, $R = 0$. Then $\sum c_n z^n$ converges if $|z| < R$, and diverges if $|z| > R$.

The behavior on the circle of convergence cannot be described so simply.
Proof. Put $a_n = c_n z^n$ and apply the root test:
$$\limsup_{n\to\infty}\sqrt[n]{|a_n|} = |z|\limsup_{n\to\infty}\sqrt[n]{|c_n|} = \frac{|z|}{R}.$$
This gives convergence if $|z| < R$ and divergence if $|z| > R$.

The nonnegative number R is called the radius of convergence.

Example 2.13 (a) The series $\sum_{n=0}^{m} c_n z^n$ has $c_n = 0$ for almost all $n$. Hence $\alpha = \limsup_{n\to\infty}\sqrt[n]{|c_n|} = \lim_{n\to\infty} 0 = 0$ and $R = +\infty$.
(b) The series $\sum n^n z^n$ has $R = 0$.
(c) The series $\sum \frac{z^n}{n!}$ has $R = +\infty$. (In this case the ratio test is easier to apply than the root test. Indeed,
$$\lim_{n\to\infty}\left|\frac{c_{n+1}}{c_n}\right| = \lim_{n\to\infty}\frac{n!}{(n+1)!} = \lim_{n\to\infty}\frac{1}{n+1} = 0,$$
and therefore $R = +\infty$.)
(d) The series $\sum z^n$ has $R = 1$. It diverges if $|z| = 1$ since $(z^n)$ does not tend to $0$. This generalizes the geometric series; formula (2.9) still holds if $|q| < 1$. For example,
$$\sum_{n=2}^{\infty} 2\left(\frac{i}{3}\right)^n = \frac{2(i/3)^2}{1 - i/3} = -\frac{3+i}{15}.$$
(e) The series $\sum \frac{z^n}{n}$ has $R = 1$. It diverges if $z = 1$. It converges for all other $z$ with $|z| = 1$ (without proof).
(f) The series $\sum \frac{z^n}{n^2}$ has $R = 1$. It converges for all $z$ with $|z| = 1$ by the comparison test, since $|z^n/n^2| = 1/n^2$.
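Formula (2.24) can be approximated numerically by evaluating $|c_n|^{1/n}$ at one large $n$ instead of taking the upper limit. This is only a crude sketch (the helper `R_estimate` is ours), but it reproduces the radii of Example 2.13:

```python
from math import factorial

def R_estimate(c, n):
    """Crude numerical stand-in for R = 1 / limsup |c_n|^(1/n):
    evaluate the n-th root of |c_n| at a single large n."""
    return 1.0 / abs(c(n)) ** (1.0 / n)

print(R_estimate(lambda k: 1.0, 200))                 # geometric series: R = 1
print(R_estimate(lambda k: 1.0 / k ** 2, 200))        # z^n/n^2: tends to 1
print(R_estimate(lambda k: float(k) ** k, 50))        # n^n z^n: tends to 0
print(R_estimate(lambda k: 1.0 / factorial(k), 100))  # z^n/n!: grows without bound
```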

2.3.10 Rearrangements

The generalized associative law for finite sums says that we can insert brackets without affecting the sum, for example, $((a_1 + a_2) + (a_3 + a_4)) = (a_1 + (a_2 + (a_3 + a_4)))$. We will see that a similar statement holds for series:

Suppose that $\sum_k a_k$ is a convergent series and $\sum_l b_l$ is a series obtained from $\sum_k a_k$ by “inserting brackets,” for example
$$b_1 + b_2 + b_3 + \cdots = \underbrace{(a_1 + a_2)}_{b_1} + \underbrace{(a_3 + \cdots + a_{10})}_{b_2} + \underbrace{(a_{11} + a_{12})}_{b_3} + \cdots$$
Then $\sum_l b_l$ converges and the sum is the same. If $\sum_k a_k$ diverges to $+\infty$, the same is true for $\sum_l b_l$. However, divergence of $\sum_k a_k$ does not imply divergence of $\sum_l b_l$ in general, since $1 - 1 + 1 - 1 + 1 - 1 + \cdots$ diverges but $(1-1) + (1-1) + \cdots$ converges. For the proof, let $s_n = \sum_{k=1}^{n} a_k$ and $t_m = \sum_{l=1}^{m} b_l$. By construction, $t_m = s_{n_m}$ for a suitable subsequence $(s_{n_m})$

of the partial sums of $\sum_k a_k$. Convergence (proper or improper) of $(s_n)$ implies convergence (proper or improper) of any subsequence. Hence $\sum_l b_l$ converges.
For finite sums, the generalized commutative law holds:

$$a_1 + a_2 + a_3 + a_4 = a_2 + a_4 + a_1 + a_3;$$
that is, any rearrangement of the summands does not affect the sum. We will see in Example 2.14 below that this is not true for arbitrary series, but it is true for absolutely convergent ones (see Proposition 2.36 below).

Definition 2.11 Let $\sigma\colon \mathbb{N} \to \mathbb{N}$ be a bijective mapping, that is, in the sequence $(\sigma(1), \sigma(2), \ldots)$ every positive integer appears once and only once. Putting

$$a'_n = a_{\sigma(n)}, \quad (n = 1, 2, \ldots),$$
we say that $\sum a'_n$ is a rearrangement of $\sum a_n$.

If $(s_n)$ and $(s'_n)$ are the partial sums of $\sum a_n$ and of a rearrangement $\sum a'_n$ of $\sum a_n$, it is easily seen that, in general, these two sequences consist of entirely different numbers. We are led to the problem of determining under what conditions all rearrangements of a convergent series converge and whether the sums are necessarily the same.

Example 2.14 (a) Consider the convergent series

$$\sum_{n=1}^{\infty}\frac{(-1)^{n+1}}{n} = 1 - \frac12 + \frac13 - \cdots \qquad(2.25)$$

$$1 - \frac12 - \frac14 + \frac13 - \frac16 - \frac18 + \frac15 - \frac1{10} - \frac1{12} + \cdots \qquad(2.26)$$
If $s$ is the sum of (2.25), then $s > 0$ since
$$\left(1 - \frac12\right) + \left(\frac13 - \frac14\right) + \cdots > 0.$$

We will show that (2.26) converges to s′ = s/2. Namely

$$s' = \sum a'_n = \left(1 - \frac12 - \frac14\right) + \left(\frac13 - \frac16 - \frac18\right) + \left(\frac15 - \frac1{10} - \frac1{12}\right) + \cdots$$
$$= \left(\frac12 - \frac14\right) + \left(\frac16 - \frac18\right) + \left(\frac1{10} - \frac1{12}\right) + \cdots = \frac12\left(1 - \frac12 + \frac13 - \frac14 + \cdots\right) = \frac{s}{2}.$$

Since $s \ne 0$, $s' \ne s$. Hence, there exist rearrangements which converge, however to a different limit.
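The convergence of the rearrangement (2.26) to $s/2$ can be observed numerically. The block structure $\frac{1}{2k-1} - \frac{1}{4k-2} - \frac{1}{4k}$ used below matches (2.26), and $s = \log 2$ is the known sum of (2.25) (used only for comparison, not proved in this section):

```python
import math

def rearranged_sum(blocks):
    """Partial sums of (2.26), grouped as (1/(2k-1) - 1/(4k-2) - 1/(4k))."""
    s = 0.0
    for k in range(1, blocks + 1):
        s += 1.0 / (2 * k - 1) - 1.0 / (4 * k - 2) - 1.0 / (4 * k)
    return s

print(rearranged_sum(100000))   # close to log(2)/2
print(math.log(2) / 2)
```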

(b) Consider the following rearrangement of the series (2.25):
$$\sum a'_n = 1 - \frac12 + \frac13 - \frac14 + \left(\frac15 + \frac17 - \frac16\right) + \left(\frac19 + \frac1{11} + \frac1{13} + \frac1{15} - \frac18\right) + \cdots + \left(\frac{1}{2^n+1} + \frac{1}{2^n+3} + \cdots + \frac{1}{2^{n+1}-1} - \frac{1}{2n+2}\right) + \cdots$$
Since for every positive integer $n \ge 10$
$$\frac{1}{2^n+1} + \frac{1}{2^n+3} + \cdots + \frac{1}{2^{n+1}-1} - \frac{1}{2n+2} > 2^{n-1}\cdot\frac{1}{2^{n+1}} - \frac{1}{2n+2} = \frac14 - \frac{1}{2n+2} > \frac15,$$
the rearranged series diverges to $+\infty$.
Without proof (see [Rud76, 3.54 Theorem]) we state the following surprising theorem. It shows (together with Proposition 2.36) that the absolute convergence of a series is necessary and sufficient for every rearrangement to be convergent (to the same limit).

Proposition 2.35 Let $\sum a_n$ be a series of real numbers which converges, but not absolutely. Suppose $-\infty \le \alpha \le \beta \le +\infty$. Then there exists a rearrangement $\sum a'_n$ with partial sums $s'_n$ such that
$$\liminf_{n\to\infty} s'_n = \alpha, \qquad \limsup_{n\to\infty} s'_n = \beta.$$

Proposition 2.36 If $\sum a_n$ is a series of complex numbers which converges absolutely, then every rearrangement of $\sum a_n$ converges, and they all converge to the same sum.

Proof. Let $\sum a'_n$ be a rearrangement with partial sums $s'_n$. Given $\varepsilon > 0$, by the Cauchy criterion for the series $\sum |a_n|$ there exists $n_0 \in \mathbb{N}$ such that $n \ge m \ge n_0$ implies
$$\sum_{k=m}^{n} |a_k| < \varepsilon. \qquad(2.27)$$
Now choose $p$ such that the integers $1, 2, \ldots, n_0$ are all contained in the set $\{\sigma(1), \sigma(2), \ldots, \sigma(p)\}$:

$$\{1, 2, \ldots, n_0\} \subseteq \{\sigma(1), \sigma(2), \ldots, \sigma(p)\}.$$

Then, if $n \ge p$, the numbers $a_1, a_2, \ldots, a_{n_0}$ cancel in the difference $s_n - s'_n$, so that
$$|s_n - s'_n| = \left|\sum_{k=1}^{n} a_k - \sum_{k=1}^{n} a_{\sigma(k)}\right| \le \sum_{k=n_0+1}^{N} |a_k| < \varepsilon$$
for a suitable $N$, by (2.27), since only finitely many terms $\pm a_k$ with $k > n_0$ remain. Hence $(s'_n)$ converges to the same sum as $(s_n)$. The same argument shows that $\sum a'_n$ also converges absolutely.

2.3.11 Products of Series

If we multiply two finite sums $a_1 + a_2 + \cdots + a_n$ and $b_1 + b_2 + \cdots + b_m$ by the distributive law, we form all products $a_i b_j$, put them into a sequence $p_0, p_1, \ldots, p_s$, $s = mn$, and add up $p_0 + p_1 + p_2 + \cdots + p_s$. This method can be generalized to series $a_0 + a_1 + \cdots$ and $b_0 + b_1 + \cdots$. Surely, we can form all products $a_i b_j$, arrange them in a sequence $p_0, p_1, p_2, \ldots$, and form the product series $p_0 + p_1 + \cdots$. For example, consider the table

    a0b0  a0b1  a0b2  ...        p0  p1  p3  ...
    a1b0  a1b1  a1b2  ...        p2  p4  p7  ...
    a2b0  a2b1  a2b2  ...        p5  p8  p12 ...
    ...   ...   ...              ... ... ...

and the diagonal enumeration of the products. The question is: under which conditions on $\sum a_n$ and $\sum b_n$ does the product series converge, and does its sum not depend on the arrangement of the products $a_i b_k$?

Proposition 2.37 If both series $\sum_{k=0}^{\infty} a_k$ and $\sum_{k=0}^{\infty} b_k$ converge absolutely with $A = \sum_{k=0}^{\infty} a_k$ and $B = \sum_{k=0}^{\infty} b_k$, then any of their product series $\sum_{k=0}^{\infty} p_k$ converges absolutely and $\sum_{k=0}^{\infty} p_k = AB$.

Proof. For the $n$th partial sum of any product series $\sum_k |p_k|$ we have
$$|p_0| + |p_1| + \cdots + |p_n| \le (|a_0| + \cdots + |a_m|)\cdot(|b_0| + \cdots + |b_m|),$$
if $m$ is sufficiently large. A fortiori,
$$|p_0| + |p_1| + \cdots + |p_n| \le \sum_{k=0}^{\infty}|a_k|\cdot\sum_{k=0}^{\infty}|b_k|.$$
That is, any series $\sum_{k=0}^{\infty}|p_k|$ is bounded and hence convergent by Lemma 2.19(c). By Proposition 2.36, all product series converge to the same sum $s = \sum_{k=0}^{\infty} p_k$. Consider now the very special product series $\sum_{k=1}^{\infty} q_k$ whose partial sums consist of the sums of the elements in the upper left squares. Then
$$q_1 + q_2 + \cdots + q_{(n+1)^2} = (a_0 + a_1 + \cdots + a_n)(b_0 + \cdots + b_n)$$
converges to $s = AB$.

Arranging the elements $a_i b_j$ as above in a diagonal array and summing up the elements on the $n$th diagonal, $c_n = a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0$, we obtain the Cauchy product
$$\sum_{n=0}^{\infty} c_n = \sum_{n=0}^{\infty}(a_0 b_n + a_1 b_{n-1} + \cdots + a_n b_0).$$

Corollary 2.38 If both series $\sum_{k=0}^{\infty} a_k$ and $\sum_{k=0}^{\infty} b_k$ converge absolutely with $A = \sum_{k=0}^{\infty} a_k$ and $B = \sum_{k=0}^{\infty} b_k$, their Cauchy product $\sum_{k=0}^{\infty} c_k$ converges absolutely and $\sum_{k=0}^{\infty} c_k = AB$.
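Corollary 2.38 can be checked numerically for two geometric series; a sketch (the helper `cauchy_product` is ours) comparing the truncated Cauchy product with the product of the two sums:

```python
def cauchy_product(a, b, N):
    """c_n = sum_{k=0}^{n} a_k b_{n-k} for n = 0, ..., N-1."""
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(N)]

p, q, N = 0.5, 0.25, 60
a = [p ** n for n in range(N)]
b = [q ** n for n in range(N)]
c = cauchy_product(a, b, N)

lhs = sum(c)                      # truncated sum of the Cauchy product
rhs = 1.0 / ((1 - p) * (1 - q))   # product of the two geometric sums
print(lhs, rhs)                   # both equal 8/3 up to a tiny truncation error
```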

Example 2.15 We compute the Cauchy product of two geometric series:

$$(1 + p + p^2 + \cdots)(1 + q + q^2 + \cdots) = 1 + (p+q) + (p^2 + pq + q^2) + (p^3 + p^2 q + p q^2 + q^3) + \cdots$$
$$= \frac{p-q}{p-q} + \frac{p^2-q^2}{p-q} + \frac{p^3-q^3}{p-q} + \cdots = \frac{1}{p-q}\sum_{n=1}^{\infty}(p^n - q^n)$$
$$\overset{|p|<1,\,|q|<1}{=} \frac{1}{p-q}\left(\frac{p}{1-p} - \frac{q}{1-q}\right) = \frac{1}{p-q}\cdot\frac{p(1-q) - q(1-p)}{(1-p)(1-q)} = \frac{1}{1-p}\cdot\frac{1}{1-q}.$$

Cauchy Product of Power Series

In case of power series the Cauchy product is the appropriate product since it is again a power series (which is not the case for other types of product series). Indeed, the Cauchy product of $\sum_{k=0}^{\infty} a_k z^k$ and $\sum_{k=0}^{\infty} b_k z^k$ has the general element
$$\sum_{k=0}^{n} a_k z^k\, b_{n-k} z^{n-k} = z^n \sum_{k=0}^{n} a_k b_{n-k},$$
such that
$$\sum_{k=0}^{\infty} a_k z^k \cdot \sum_{k=0}^{\infty} b_k z^k = \sum_{n=0}^{\infty}(a_0 b_n + \cdots + a_n b_0)\,z^n.$$

Corollary 2.39 Suppose that $\sum_n a_n z^n$ and $\sum_n b_n z^n$ are power series with positive radii of convergence $R_1$ and $R_2$, respectively. Let $R = \min\{R_1, R_2\}$. Then the Cauchy product $\sum_{n=0}^{\infty} c_n z^n$, $c_n = a_0 b_n + \cdots + a_n b_0$, converges absolutely for $|z| < R$ and

$$\sum_{n=0}^{\infty} a_n z^n \cdot \sum_{n=0}^{\infty} b_n z^n = \sum_{n=0}^{\infty} c_n z^n, \qquad |z| < R.$$
This follows from the previous corollary and the fact that both series converge absolutely for $|z| < R$.

Example 2.16 (a)
$$\sum_{n=0}^{\infty}(n+1)\,z^n = \frac{1}{(1-z)^2}, \qquad |z| < 1.$$
Indeed, consider the Cauchy product of $\sum_{n=0}^{\infty} z^n = \frac{1}{1-z}$, $|z| < 1$, with itself. Since $a_n = b_n = 1$, $c_n = \sum_{k=0}^{n} a_k b_{n-k} = \sum_{k=0}^{n} 1\cdot 1 = n+1$, and the claim follows.
(b)

$$(z + z^2)\sum_{n=0}^{\infty} z^n = \sum_{n=0}^{\infty}(z^{n+1} + z^{n+2}) = \sum_{n=1}^{\infty} z^n + \sum_{n=2}^{\infty} z^n = z + 2\sum_{n=2}^{\infty} z^n = z + 2z^2 + 2z^3 + 2z^4 + \cdots = z + \frac{2z^2}{1-z} = \frac{z + z^2}{1-z}.$$

Chapter 3

Functions and Continuity

This chapter is devoted to another central notion in analysis: the notion of a continuous function. We will see that sums, products, quotients, and compositions of continuous functions are continuous. If nothing else is specified, $D$ will denote a finite union of intervals.

Definition 3.1 Let $D \subset \mathbb{R}$ be a subset of $\mathbb{R}$. A function is a map $f\colon D \to \mathbb{R}$.
(a) The set $D$ is called the domain of $f$; we write $D = D(f)$.
(b) If $A \subseteq D$, $f(A) := \{f(x) \mid x \in A\}$ is called the image of $A$ under $f$. The function $f{\restriction}A\colon A \to \mathbb{R}$ given by $f{\restriction}A(a) = f(a)$, $a \in A$, is called the restriction of $f$ to $A$.
(c) If $B \subset \mathbb{R}$, we call $f^{-1}(B) := \{x \in D \mid f(x) \in B\}$ the preimage of $B$ under $f$.
(d) The graph of $f$ is the set $\operatorname{graph}(f) := \{(x, f(x)) \mid x \in D\}$.

Later we will consider functions in a wider sense: from the complex numbers into the complex numbers, and from $\mathbb{F}^n$ into $\mathbb{F}^m$ where $\mathbb{F} = \mathbb{R}$ or $\mathbb{F} = \mathbb{C}$.
We say that a function $f\colon D \to \mathbb{R}$ is bounded if $f(D) \subset \mathbb{R}$ is a bounded set of real numbers, i.e. there is a $C > 0$ such that $|f(x)| \le C$ for all $x \in D$. We say that $f$ is bounded above (resp. bounded below) if there exists $C \in \mathbb{R}$ such that $f(x) < C$ (resp. $f(x) > C$) for all $x$ in the domain of $f$.

Example 3.1 (a) Power series (with radius of convergence $R > 0$), polynomials, and rational functions are the most important examples of functions. Let $c \in \mathbb{R}$. Then $f(x) = c$, $f\colon \mathbb{R} \to \mathbb{R}$, is called the constant function.
(b) Properties of a function change drastically if we change the domain or the image set. Let $f\colon \mathbb{R} \to \mathbb{R}$, $g\colon \mathbb{R} \to \mathbb{R}_+$, $k\colon \mathbb{R}_+ \to \mathbb{R}$, and $h\colon \mathbb{R}_+ \to \mathbb{R}_+$ all be given by $x \mapsto x^2$. Then $g$ is surjective, $k$ is injective, $h$ is bijective, and $f$ is neither injective nor surjective. Obviously, $f{\restriction}\mathbb{R}_+ = k$ and $g{\restriction}\mathbb{R}_+ = h$.
(c) Let $f(x) = \sum_{n=0}^{\infty} x^n$, $f\colon (-1,1) \to \mathbb{R}$, and $h(x) = \frac{1}{1-x}$, $h\colon \mathbb{R}\setminus\{1\} \to \mathbb{R}$. Then $h{\restriction}(-1,1) = f$.

Figure: the graphs of the constant function $f(x) = c$, the identity $f(x) = x$, and the absolute value function $f(x) = |x|$.

3.1 Limits of a Function

Definition 3.2 ($\varepsilon$-$\delta$-definition) Let $(a,b)$ be a finite or infinite interval and $x_0 \in (a,b)$. Let $f\colon (a,b)\setminus\{x_0\} \to \mathbb{R}$ be a real-valued function. We call $A$ the limit of $f$ at $x_0$ (“the limit of $f(x)$ is $A$ as $x$ approaches $x_0$”; “$f$ approaches $A$ near $x_0$”) if the following is satisfied:

For any $\varepsilon > 0$ there exists $\delta > 0$ such that $x \in (a,b)$ and $0 < |x - x_0| < \delta$ imply $|f(x) - A| < \varepsilon$.

We write $\lim_{x\to x_0} f(x) = A$.

Roughly speaking, if $x$ is close to $x_0$, then $f(x)$ must be close to $A$.

Figure: once the argument lies in $(x - \delta,\, x + \delta)$, the values of $f$ stay in the band between $f(x) - \varepsilon$ and $f(x) + \varepsilon$.

Using quantifiers, $\lim_{x\to x_0} f(x) = A$ reads as
$$\forall\, \varepsilon > 0\ \exists\, \delta > 0\ \forall\, x \in (a,b)\colon\ 0 < |x - x_0| < \delta \implies |f(x) - A| < \varepsilon.$$
Note that the formal negation of $\lim_{x\to x_0} f(x) = A$ is
$$\exists\, \varepsilon > 0\ \forall\, \delta > 0\ \exists\, x \in (a,b)\colon\ 0 < |x - x_0| < \delta \ \text{ and }\ |f(x) - A| \ge \varepsilon.$$

Proposition 3.1 (sequence definition) Let $f$ and $x_0$ be as above. Then $\lim_{x\to x_0} f(x) = A$ if and only if for every sequence $(x_n)$ with $x_n \in (a,b)$, $x_n \ne x_0$ for all $n$, and $\lim_{n\to\infty} x_n = x_0$, we have $\lim_{n\to\infty} f(x_n) = A$.

Proof. Suppose $\lim_{x\to x_0} f(x) = A$, and $x_n \to x_0$ where $x_n \ne x_0$ for all $n$. Given $\varepsilon > 0$, we find $\delta > 0$ such that $|f(x) - A| < \varepsilon$ if $0 < |x - x_0| < \delta$. Since $x_n \to x_0$, there is a positive integer $n_0$ such that $n \ge n_0$ implies $|x_n - x_0| < \delta$. Therefore $n \ge n_0$ implies $|f(x_n) - A| < \varepsilon$; that is, $\lim_{n\to\infty} f(x_n) = A$.
Suppose to the contrary that the condition of the proposition is fulfilled but $\lim_{x\to x_0} f(x) \ne A$. Then there is some $\varepsilon > 0$ such that for every $\delta = 1/n$, $n \in \mathbb{N}$, there is an $x_n \in (a,b)$ with $0 < |x_n - x_0| < 1/n$ but $|f(x_n) - A| \ge \varepsilon$. We have constructed a sequence $(x_n)$, $x_n \ne x_0$, with $x_n \to x_0$ as $n \to \infty$ such that $\lim_{n\to\infty} f(x_n) \ne A$, which contradicts our assumption. Hence $\lim_{x\to x_0} f(x) = A$.

Example. $\lim_{x\to 1}(x + 3) = 4$. Indeed, given $\varepsilon > 0$ choose $\delta = \varepsilon$. Then $|x - 1| < \delta$ implies $|(x+3) - 4| < \delta = \varepsilon$.

3.1.1 One-sided Limits, Infinite Limits, and Limits at Infinity

Definition 3.3 (a) We write $\lim_{x\to x_0+0} f(x) = A$ if for all sequences $(x_n)$ with $x_n > x_0$ and $\lim_{n\to\infty} x_n = x_0$ we have $\lim_{n\to\infty} f(x_n) = A$. Sometimes we use the notation $f(x_0+0)$ in place of $\lim_{x\to x_0+0} f(x)$. We call $f(x_0+0)$ the right-hand limit of $f$ at $x_0$, or we say “$A$ is the limit of $f$ as $x$ approaches $x_0$ from above (from the right).” Similarly, one defines the left-hand limit of $f$ at $x_0$, $\lim_{x\to x_0-0} f(x) = A$, with $x_n < x_0$ in place of $x_n > x_0$. Sometimes we use the notation $f(x_0-0)$.
(b) We write $\lim_{x\to+\infty} f(x) = A$ if for all sequences $(x_n)$ with $\lim_{n\to\infty} x_n = +\infty$ we have $\lim_{n\to\infty} f(x_n) = A$. Sometimes we use the notation $f(+\infty)$. In a similar way we define $\lim_{x\to-\infty} f(x) = A$.
(c) Finally, the notions of (a), (b), and Definition 3.2 still make sense in case $A = +\infty$ and $A = -\infty$. For example, $\lim_{x\to x_0-0} f(x) = -\infty$ if for all sequences $(x_n)$ with $x_n < x_0$ and $\lim_{n\to\infty} x_n = x_0$ we have $\lim_{n\to\infty} f(x_n) = -\infty$.

Remark 3.1 All notions in the above definition can be given in $\varepsilon$-$\delta$ or $\varepsilon$-$D$ or $E$-$\delta$ or $E$-$D$ language using inequalities. For example, $\lim_{x\to x_0-0} f(x) = -\infty$ if and only if
$$\forall\, E > 0\ \exists\, \delta > 0\ \forall\, x \in D(f)\colon\ 0 < x_0 - x < \delta \implies f(x) < -E.$$

For example, we show that $\lim_{x\to 0-0}\frac{1}{x} = -\infty$. To $E > 0$ choose $\delta = \frac{1}{E}$. Then $0 < 0 - x < \delta = \frac{1}{E}$ implies $\frac{1}{x} < -E$. Similarly, $\lim_{x\to+\infty} f(x) = +\infty$ if and only if
$$\forall\, E > 0\ \exists\, D > 0\ \forall\, x \in D(f)\colon\ x > D \implies f(x) > E.$$
The proofs of the equivalence of the $\varepsilon$-$\delta$ definitions and the sequence definitions are along the lines of Proposition 3.1. For example, $\lim_{x\to+\infty} x^2 = +\infty$: to $E > 0$ choose $D = \sqrt{E}$; then $x > D$ implies $x > \sqrt{E}$, thus $x^2 > E$.

Example 3.2 (a) $\lim_{x\to+\infty}\frac{1}{x} = 0$. For, let $\varepsilon > 0$; choose $D = 1/\varepsilon$. Then $x > D = 1/\varepsilon$ implies $0 < 1/x < \varepsilon$. This proves the claim.
(b) Consider the entier function $f(x) = [x]$, defined in Example 2.6(b). If $n \in \mathbb{Z}$, then $\lim_{x\to n-0} f(x) = n - 1$ whereas $\lim_{x\to n+0} f(x) = n$.
Proof. We use the $\varepsilon$-$\delta$ definition of the one-sided limits to prove the first claim. Let $\varepsilon > 0$. Choose $\delta = \frac12$; then $0 < n - x < \delta$ implies $n - 1 < x < n$, hence $f(x) = [x] = n - 1$ and $|f(x) - (n-1)| = 0 < \varepsilon$. The proof of the second claim is analogous.

Since the one-sided limits are different, $\lim_{x\to n} f(x)$ does not exist.

Definition 3.4 Suppose we are given two functions $f$ and $g$, both defined on $(a,b)\setminus\{x_0\}$. By $f + g$ we mean the function which assigns to each point $x \ne x_0$ of $(a,b)$ the number $f(x) + g(x)$. Similarly, we define the difference $f - g$, the product $fg$, and the quotient $f/g$, with the understanding that the quotient is defined only at those points $x$ at which $g(x) \ne 0$.

Proposition 3.2 Suppose that $f$ and $g$ are functions defined on $(a,b)\setminus\{x_0\}$, $a < x_0 < b$, and $\lim_{x\to x_0} f(x) = A$, $\lim_{x\to x_0} g(x) = B$, $\alpha, \beta \in \mathbb{R}$. Then
(a) $\lim_{x\to x_0} f(x) = A'$ implies $A' = A$;
(b) $\lim_{x\to x_0}(\alpha f + \beta g)(x) = \alpha A + \beta B$;
(c) $\lim_{x\to x_0}(fg)(x) = AB$;
(d) $\lim_{x\to x_0}\frac{f}{g}(x) = \frac{A}{B}$, if $B \ne 0$;
(e) $\lim_{x\to x_0}|f(x)| = |A|$.
Proof. In view of Proposition 3.1, all these assertions follow immediately from the analogous properties of sequences, see Proposition 2.3. As an example, we show (c). Let $(x_n)$, $x_n \ne x_0$, be a sequence tending to $x_0$. By assumption, $\lim_{n\to\infty} f(x_n) = A$ and $\lim_{n\to\infty} g(x_n) = B$. By Proposition 2.3, $\lim_{n\to\infty} f(x_n)g(x_n) = AB$, that is, $\lim_{n\to\infty}(fg)(x_n) = AB$. By Proposition 3.1, $\lim_{x\to x_0}(fg)(x) = AB$.

Remark 3.2 The proposition remains true if we replace (at the same time in all places) $x \to x_0$ by $x \to x_0+0$, $x \to x_0-0$, $x \to +\infty$, or $x \to -\infty$. Moreover, we can replace $A$ or $B$ by $+\infty$ or by $-\infty$ provided the right members of (b), (c), (d), and (e) are defined.
Note that $+\infty + (-\infty)$, $0\cdot\infty$, $\infty/\infty$, and $A/0$ are not defined.

The extended real number system consists of the real field $\mathbb{R}$ and two symbols, $+\infty$ and $-\infty$. We preserve the original order in $\mathbb{R}$ and define
$$-\infty < x < +\infty$$
for every $x \in \mathbb{R}$.
It is then clear that $+\infty$ is an upper bound of every subset of the extended real number system, and that every nonempty subset has a least upper bound. If, for example, $E$ is a set of real numbers

which is not bounded above in $\mathbb{R}$, then $\sup E = +\infty$ in the extended real system. Exactly the same remarks apply to lower bounds. The extended real system does not form a field, but it is customary to make the following conventions:

(a) If $x$ is real, then
$$x + \infty = +\infty, \qquad x - \infty = -\infty, \qquad \frac{x}{+\infty} = \frac{x}{-\infty} = 0.$$
(b) If $x > 0$, then $x\cdot(+\infty) = +\infty$ and $x\cdot(-\infty) = -\infty$.
(c) If $x < 0$, then $x\cdot(+\infty) = -\infty$ and $x\cdot(-\infty) = +\infty$.
When it is desired to make the distinction between the real numbers on the one hand and the symbols $+\infty$ and $-\infty$ on the other hand quite explicit, the real numbers are called finite.
In Homework 9.2 (a) and (b) you are invited to give explicit proofs in two special cases.

Example 3.3 (a) Let $p(x)$ and $q(x)$ be polynomials and $a \in \mathbb{R}$. Then
$$\lim_{x\to a} p(x) = p(a).$$

This immediately follows from $\lim_{x\to a} x = a$, $\lim_{x\to a} c = c$, and Proposition 3.2. Indeed, by (b) and (c), for $p(x) = 3x^2 - 4x + 7$ we have $\lim_{x\to a}(3x^2 - 4x + 7) = 3(\lim_{x\to a} x)^2 - 4\lim_{x\to a} x + 7 = 3a^2 - 4a + 7 = p(a)$. This works for arbitrary polynomials. Suppose moreover that $q(a) \ne 0$. Then by (d),
$$\lim_{x\to a}\frac{p(x)}{q(x)} = \frac{p(a)}{q(a)}.$$
Hence, the limit of a rational function $f(x)$, as $x$ approaches a point $a$ of the domain of $f$, is $f(a)$.

(b) Let $f(x) = \frac{p(x)}{q(x)}$ be a rational function with polynomials $p(x) = \sum_{k=0}^{r} a_k x^k$ and $q(x) = \sum_{k=0}^{s} b_k x^k$ with real coefficients $a_k$ and $b_k$ and of degrees $r$ and $s$, respectively. Then
$$\lim_{x\to+\infty} f(x) = \begin{cases} 0, & \text{if } r < s,\\[2pt] \dfrac{a_r}{b_s}, & \text{if } r = s,\\[2pt] +\infty, & \text{if } r > s \text{ and } \dfrac{a_r}{b_s} > 0,\\[2pt] -\infty, & \text{if } r > s \text{ and } \dfrac{a_r}{b_s} < 0.\end{cases}$$
The first two statements ($r \le s$) follow from Example 3.2 together with Proposition 3.2; namely, $a_k x^{k-r} \to 0$ as $x \to +\infty$ provided $0 \le k < r$. The statements for $r > s$ follow from $x^{r-s} \to +\infty$ as $x \to +\infty$ and the above remark.
Note that

$$\lim_{x\to-\infty} f(x) = (-1)^{r+s}\lim_{x\to+\infty} f(x),$$
since
$$\frac{p(-x)}{q(-x)} = \frac{(-1)^r a_r x^r + \cdots}{(-1)^s b_s x^s + \cdots} = (-1)^{r+s}\,\frac{a_r x^r + \cdots}{b_s x^s + \cdots}.$$

3.2 Continuous Functions

Definition 3.5 Let $f$ be a function and $x_0 \in D(f)$. We say that $f$ is continuous at $x_0$ if
$$\forall\, \varepsilon > 0\ \exists\, \delta > 0\ \forall\, x \in D(f)\colon\ |x - x_0| < \delta \implies |f(x) - f(x_0)| < \varepsilon. \qquad(3.1)$$
We say that $f$ is continuous in $A \subset D(f)$ if $f$ is continuous at all points $x_0 \in A$.

Proposition 3.1 shows that the above definition of continuity at $x_0$ is equivalent to: for all sequences $(x_n)$, $x_n \in D(f)$, with $\lim_{n\to\infty} x_n = x_0$ we have $\lim_{n\to\infty} f(x_n) = f(x_0)$. In other words, $f$ is continuous at $x_0$ if $\lim_{x\to x_0} f(x) = f(x_0)$.

Example 3.4 (a) In Example 3.3 we have seen that every polynomial is continuous in $\mathbb{R}$ and every rational function $f$ is continuous in its domain $D(f)$. Also, $f(x) = |x|$ is continuous in $\mathbb{R}$.
(b) Continuity is a local property: if two functions $f, g\colon D \to \mathbb{R}$ coincide in a neighborhood $U_\varepsilon(x_0) \subset D$ of some point $x_0$, then $f$ is continuous at $x_0$ if and only if $g$ is continuous at $x_0$.
(c) $f(x) = [x]$ is continuous in $\mathbb{R}\setminus\mathbb{Z}$. If $x_0$ is not an integer, then $n < x_0 < n+1$ for some $n \in \mathbb{Z}$, and $f(x) = n$ coincides with a constant function in a neighborhood $U_\varepsilon(x_0)$. By (b), $f$ is continuous at $x_0$. If $x_0 = n \in \mathbb{Z}$, $\lim_{x\to n}[x]$ does not exist; hence $f$ is not continuous at $n$.
(d) $f(x) = \frac{x^2-1}{x-1}$ if $x \ne 1$ and $f(1) = 1$. Then $f$ is not continuous at $x_0 = 1$ since
$$\lim_{x\to1}\frac{x^2-1}{x-1} = \lim_{x\to1}(x+1) = 2 \ne 1 = f(1).$$

There are two reasons for a function not being continuous at $x_0$: first, $\lim_{x\to x_0} f(x)$ may not exist; secondly, $f$ may have a limit at $x_0$ but $\lim_{x\to x_0} f(x) \ne f(x_0)$.

Proposition 3.3 Suppose $f, g\colon D \to \mathbb{R}$ are continuous at $x_0 \in D$. Then $f + g$ and $fg$ are also continuous at $x_0$. If $g(x_0) \ne 0$, then $f/g$ is continuous at $x_0$.
The proof is obvious from Proposition 3.2.

The set $C(D)$ of continuous functions on $D \subset \mathbb{R}$ forms a commutative algebra with $1$.

Proposition 3.4 Let $f\colon D \to \mathbb{R}$ and $g\colon E \to \mathbb{R}$ be functions with $f(D) \subset E$. Suppose $f$ is continuous at $a \in D$, and $g$ is continuous at $b = f(a) \in E$. Then the composite function $g\circ f\colon D \to \mathbb{R}$ is continuous at $a$.
Proof. Let $(x_n)$ be a sequence with $x_n \in D$ and $\lim_{n\to\infty} x_n = a$. Since $f$ is continuous at $a$, $\lim_{n\to\infty} f(x_n) = b$. Since $g$ is continuous at $b$, $\lim_{n\to\infty} g(f(x_n)) = g(b)$; hence $g\circ f(x_n) \to g\circ f(a)$. This completes the proof.

Example 3.5 $f(x) = \frac{1}{x}$ is continuous for $x \ne 0$, and $g(x) = \sin x$ is continuous (see below); hence $(g\circ f)(x) = \sin\frac{1}{x}$ is continuous on $\mathbb{R}\setminus\{0\}$.

3.2.1 The Intermediate Value Theorem

In this paragraph, $[a,b] \subset \mathbb{R}$ is a closed, bounded interval, $a, b \in \mathbb{R}$. The intermediate value theorem is the basis for several existence theorems in analysis. It is again equivalent to the order completeness of $\mathbb{R}$.

Theorem 3.5 (Intermediate Value Theorem) Let $f\colon [a,b] \to \mathbb{R}$ be a continuous function and $\gamma$ a real number between $f(a)$ and $f(b)$. Then there exists $c \in [a,b]$ such that $f(c) = \gamma$.
The statement is clear from the graphical presentation. Nevertheless, it needs a proof, since pictures do not prove anything. The statement is wrong for rational numbers: for example, let $D = \{x \in \mathbb{Q} \mid 1 \le x \le 2\}$ and $f(x) = x^2 - 2$. Then $f(1) = -1$ and $f(2) = 2$, but there is no $p \in D$ with $f(p) = 0$, since $2$ has no rational square root.
Proof. Without loss of generality suppose $f(a) \le f(b)$. Starting with $[a_1, b_1] = [a,b]$, we successively construct a nested sequence of intervals $[a_n, b_n]$ such that $f(a_n) \le \gamma \le f(b_n)$. As in the proof of Proposition 2.12, $[a_n, b_n]$ is one of the two half-intervals $[a_{n-1}, m]$ and $[m, b_{n-1}]$, where $m = (a_{n-1} + b_{n-1})/2$ is the midpoint of the $(n-1)$st interval. By Proposition 2.11, the monotonic sequences $(a_n)$ and $(b_n)$ both converge to a common point $c$. Since $f$ is continuous,

$$\lim_{n\to\infty} f(a_n) = f\Bigl(\lim_{n\to\infty} a_n\Bigr) = f(c) = f\Bigl(\lim_{n\to\infty} b_n\Bigr) = \lim_{n\to\infty} f(b_n).$$
By Proposition 2.14, $f(a_n) \le \gamma \le f(b_n)$ implies
$$\lim_{n\to\infty} f(a_n) \le \gamma \le \lim_{n\to\infty} f(b_n).$$

Hence, γ = f(c).
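The interval-halving argument of the proof is effective and can be run directly. A sketch (the helper `bisect` is ours), specialized to $\gamma = 0$ and assuming $f(a) \le 0 \le f(b)$:

```python
def bisect(f, a, b, steps=60):
    """Interval halving from the proof of Theorem 3.5, with gamma = 0;
    assumes f is continuous and f(a) <= 0 <= f(b)."""
    for _ in range(steps):
        m = (a + b) / 2
        if f(m) <= 0:
            a = m       # keep f(a) <= 0
        else:
            b = m       # keep f(b) >= 0
    return (a + b) / 2

# a root of x^3 - 2 in (0, 3), i.e. the cube root of 2
c = bisect(lambda x: x ** 3 - 2, 0.0, 3.0)
print(c)    # about 1.2599
```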

Example 3.6 (a) We again show the existence of the $n$th root of a positive real number $a > 0$, $n \in \mathbb{N}$. By Example 3.3, the polynomial $p(x) = x^n - a$ is continuous in $\mathbb{R}$. We find $p(0) = -a < 0$, and by Bernoulli's inequality
$$p(1+a) = (1+a)^n - a \ge 1 + (n-1)a \ge 1 > 0.$$
Theorem 3.5 shows that $p$ has a root in the interval $(0, 1+a)$.
(b) A polynomial $p$ of odd degree with real coefficients has a real zero. Namely, by Example 3.3, if the leading coefficient $a_r$ of $p$ is positive, $\lim_{x\to-\infty} p(x) = -\infty$ and $\lim_{x\to+\infty} p(x) = +\infty$. Hence there are $a$ and $b$ with $a < b$ and $p(a) < 0 < p(b)$. Therefore there is a $c \in (a,b)$ such that $p(c) = 0$.

There are polynomials of even degree having no real zeros, for example $f(x) = x^{2k} + 1$.

Remark 3.3 Theorem 3.5 is not true for continuous functions $f : \mathbb{Q} \to \mathbb{R}$. For example, $f(x) = x^2 - 2$ is continuous and $f(0) = -2 < 0 < 2 = f(2)$. However, there is no $r \in \mathbb{Q}$ with $f(r) = 0$.

3.2.2 Continuous Functions on Bounded and Closed Intervals—The Theorem about Maximum and Minimum

We say that $f : [a,b] \to \mathbb{R}$ is continuous if $f$ is continuous on $(a,b)$, $f(a+0) = f(a)$, and $f(b-0) = f(b)$.

Theorem 3.6 (Theorem about Maximum and Minimum) Let $f : [a,b] \to \mathbb{R}$ be continuous. Then $f$ is bounded and attains its maximum and its minimum; that is, there exists $C > 0$ with $|f(x)| \le C$ for all $x \in [a,b]$, and there exist $p, q \in [a,b]$ with
$$\sup_{a\le x\le b} f(x) = \max_{a\le x\le b} f(x) = f(p) \qquad\text{and}\qquad \inf_{a\le x\le b} f(x) = \min_{a\le x\le b} f(x) = f(q).$$

Remarks 3.4 (a) The theorem is not true for open, half-open, or infinite intervals. For example, $f : (0,1] \to \mathbb{R}$, $f(x) = \frac{1}{x}$, is continuous but not bounded. The function $f : (0,1) \to \mathbb{R}$, $f(x) = x$, is continuous and bounded; however, it attains neither maximum nor minimum. Finally, $f(x) = x^2$ on $\mathbb{R}_+$ is continuous but not bounded.
(b) Put $M := \max_{x\in[a,b]} f(x)$ and $m := \min_{x\in[a,b]} f(x)$. By the theorem about maximum and minimum and the intermediate value theorem, for every $\gamma \in \mathbb{R}$ with $m \le \gamma \le M$ there exists $c \in [a,b]$ such that $f(c) = \gamma$; that is, $f$ attains all values between $m$ and $M$.

Proof. We give the proof for the maximum; replacing $f$ by $-f$ yields the proof for the minimum. Let
$$A = \sup_{a\le x\le b} f(x) \in \mathbb{R} \cup \{+\infty\}.$$
(Note that $A = +\infty$ is equivalent to $f$ being not bounded above.) Then there exists a sequence $(x_n) \subset [a,b]$ such that $\lim_{n\to\infty} f(x_n) = A$. Since $(x_n)$ is bounded, by the Theorem of Weierstraß there exists a convergent subsequence $(x_{n_k})$ with $p = \lim_{k\to\infty} x_{n_k}$ and $a \le p \le b$. Since $f$ is continuous,
$$A = \lim_{k\to\infty} f(x_{n_k}) = f(p).$$
In particular, $A$ is a finite real number; that is, $f$ is bounded above by $A$, and $f$ attains its maximum $A$ at the point $p \in [a,b]$.

3.3 Uniform Continuity

Let D be a finite or infinite interval.

Definition 3.6 A function $f : D \to \mathbb{R}$ is called uniformly continuous if for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for all $x, x' \in D$, $|x - x'| < \delta$ implies $|f(x) - f(x')| < \varepsilon$.

Thus $f$ is uniformly continuous on $[a,b]$ if and only if
$$\forall\, \varepsilon > 0\ \exists\, \delta > 0\ \forall\, x, y \in [a,b] :\ |x - y| < \delta \implies |f(x) - f(y)| < \varepsilon. \qquad (3.2)$$

Remark 3.5 If $f$ is uniformly continuous on $D$, then $f$ is continuous on $D$. However, the converse is not true. Consider, for example, $f : (0,1) \to \mathbb{R}$, $f(x) = \frac{1}{x}$, which is continuous. Suppose to the contrary that $f$ is uniformly continuous. Then for $\varepsilon = 1$ there exists $\delta > 0$ as in (3.2). By the Archimedean property there exists $n \in \mathbb{N}$ such that $\frac{1}{2n} < \delta$. Consider $x_n = \frac{1}{n}$ and $y_n = \frac{1}{2n}$. Then $|x_n - y_n| = \frac{1}{2n} < \delta$. However,
$$|f(x_n) - f(y_n)| = 2n - n = n \ge 1,$$
a contradiction. Hence $f$ is not uniformly continuous on $(0,1)$.
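The witness sequences from the remark are easy to observe numerically. A minimal sketch (variable names are ours):

```python
# f(x) = 1/x on (0, 1): the points x_n = 1/n and y_n = 1/(2n) get
# arbitrarily close, yet |f(x_n) - f(y_n)| = n grows without bound,
# so no single delta can work for all points.
f = lambda x: 1 / x

for n in [10, 100, 1000]:
    xn, yn = 1 / n, 1 / (2 * n)
    gap = abs(xn - yn)           # = 1/(2n), shrinks to 0
    jump = abs(f(xn) - f(yn))    # = 2n - n = n, blows up
    print(n, gap, jump)
```

This is the quantitative content of the remark: continuity holds at each point, but the required $\delta$ degenerates as the point approaches $0$.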

Let us consider the differences between the concepts of continuity and uniform continuity. First, uniform continuity is a property of a function on a set, whereas continuity can be defined at a single point. To ask whether a given function is uniformly continuous at a certain point is meaningless. Secondly, if $f$ is continuous on $D$, then it is possible to find, for each $\varepsilon > 0$ and for each point $x_0 \in D$, a number $\delta = \delta(x_0, \varepsilon) > 0$ having the property specified in Definition 3.5; this $\delta$ depends on $\varepsilon$ and on $x_0$. If $f$ is, however, uniformly continuous on $D$, then it is possible, for each $\varepsilon > 0$, to find one $\delta = \delta(\varepsilon) > 0$ which works for all points $x_0$ of $D$. That the two concepts are equivalent on bounded and closed intervals follows from the next proposition.

Proposition 3.7 Let $f : [a,b] \to \mathbb{R}$ be a continuous function on a bounded and closed interval. Then $f$ is uniformly continuous on $[a,b]$.

Proof. Suppose to the contrary that $f$ is not uniformly continuous. Then there exists $\varepsilon_0 > 0$ without a matching $\delta > 0$; that is, for every positive integer $n \in \mathbb{N}$ there exists a pair of points $x_n, x_n'$ with $|x_n - x_n'| < 1/n$ but $|f(x_n) - f(x_n')| \ge \varepsilon_0$. Since $[a,b]$ is bounded and closed, $(x_n)$ has a subsequence $(x_{n_k})$ converging to some point $p \in [a,b]$. Since $|x_n - x_n'| < 1/n$, the sequence $(x_{n_k}')$ also converges to $p$. Hence
$$\lim_{k\to\infty} \bigl( f(x_{n_k}) - f(x_{n_k}') \bigr) = f(p) - f(p) = 0,$$
which contradicts $|f(x_{n_k}) - f(x_{n_k}')| \ge \varepsilon_0$ for all $k$.

There exists an example of a bounded continuous function $f : [0,1) \to \mathbb{R}$ which is not uniformly continuous; see [Kön90, p. 91].

Discontinuities

If $x$ is a point in the domain of a function $f$ at which $f$ is not continuous, we say $f$ is discontinuous at $x$, or $f$ has a discontinuity at $x$. It is customary to divide discontinuities into two types.

Definition 3.7 Let $f : (a,b) \to \mathbb{R}$ be a function which is discontinuous at a point $x_0$. If the one-sided limits $\lim_{x\to x_0+0} f(x)$ and $\lim_{x\to x_0-0} f(x)$ exist, then $f$ is said to have a simple discontinuity, or a discontinuity of the first kind. Otherwise the discontinuity is said to be of the second kind.

Example 3.7 (a) $f(x) = \operatorname{sign}(x)$ is continuous on $\mathbb{R} \setminus \{0\}$ since it is locally constant. Moreover, $f(0+0) = 1$ and $f(0-0) = -1$. Hence $\operatorname{sign}(x)$ has a simple discontinuity at $x_0 = 0$.
(b) Define $f(x) = 0$ if $x$ is rational and $f(x) = 1$ if $x$ is irrational. Then $f$ has a discontinuity of the second kind at every point $x$, since neither $f(x+0)$ nor $f(x-0)$ exists.
(c) Define
$$f(x) = \begin{cases} \sin\frac{1}{x}, & \text{if } x \ne 0; \\ 0, & \text{if } x = 0. \end{cases}$$

Consider the two sequences

$$x_n = \frac{1}{\frac{\pi}{2} + n\pi} \qquad\text{and}\qquad y_n = \frac{1}{n\pi}.$$

Then both sequences $(x_n)$ and $(y_n)$ approach $0$ from above, but $\lim_{n\to\infty} f(x_n) = 1$ and $\lim_{n\to\infty} f(y_n) = 0$; hence $f(0+0)$ does not exist. Therefore $f$ has a discontinuity of the second kind at $x = 0$. We have not yet shown that $\sin x$ is a continuous function; this will be done in Section 3.5.2.

3.4 Monotonic Functions

Definition 3.8 Let $f$ be a real function on the interval $(a,b)$. Then $f$ is said to be monotonically increasing on $(a,b)$ if $a < x < y < b$ implies $f(x) \le f(y)$. If the last inequality is reversed, we obtain the definition of a monotonically decreasing function.

Theorem 3.8 Let $f$ be a monotonically increasing function on $(a,b)$. Then the one-sided limits $f(x+0)$ and $f(x-0)$ exist at every point $x$ of $(a,b)$. More precisely,
$$\sup_{t\in(a,x)} f(t) = f(x-0) \le f(x) \le f(x+0) = \inf_{t\in(x,b)} f(t). \qquad (3.3)$$
Furthermore, if $a < x < y < b$, then
$$f(x+0) \le f(y-0). \qquad (3.4)$$
Analogous results evidently hold for monotonically decreasing functions.
Proof. See Appendix B to this chapter.

Proposition 3.9 Let $f : [a,b] \to \mathbb{R}$ be a strictly monotonically increasing continuous function and let $A = f(a)$ and $B = f(b)$. Then $f$ maps $[a,b]$ bijectively onto $[A,B]$, and the inverse function
$$f^{-1} : [A,B] \to \mathbb{R}$$
is again strictly monotonically increasing and continuous.

Note that the inverse function $f^{-1} : [A,B] \to [a,b]$ is defined by $f^{-1}(y_0) = x_0$, $y_0 \in [A,B]$, where $x_0$ is the unique element of $[a,b]$ with $f(x_0) = y_0$. However, we can also think of $f^{-1}$ as a function into $\mathbb{R}$. A similar statement is true for strictly decreasing functions.

Proof. By Remark 3.4, $f$ maps $[a,b]$ onto the whole closed interval $[A,B]$ (intermediate value theorem). Since $x < y$ implies $f(x) < f(y)$, $f$ is injective, hence bijective, and the inverse function $g = f^{-1}$ is again strictly increasing. It remains to show that $g$ is continuous. Let $u = f(x) \in [A,B]$ and let $(u_n) \subset [A,B]$ be a sequence with $u_n \to u$; put $x_n = g(u_n)$. Suppose $(x_n)$ does not converge to $x$. Then there exists $\varepsilon_0 > 0$ such that $|x_n - x| \ge \varepsilon_0$ for infinitely many $n$. Since $(x_n) \subseteq [a,b]$ is bounded, there exists a converging subsequence $(x_{n_k})$, say $x_{n_k} \to c$ as $k \to \infty$. The above inequality is true for the limit $c$, too; that is, $|c - x| \ge \varepsilon_0$. By continuity of $f$, $x_{n_k} \to c$ implies $f(x_{n_k}) \to f(c)$, that is, $u_{n_k} \to f(c)$. Since $u_{n_k} \to u = f(x)$ and the limit of a converging sequence is unique, $f(c) = f(x)$. Since $f$ is bijective, $x = c$; this contradicts $|c - x| \ge \varepsilon_0$. Hence $g$ is continuous at $u$.

Example 3.8 The function $f : \mathbb{R}_+ \to \mathbb{R}_+$, $f(x) = x^n$, is continuous and strictly increasing. Hence $x = g(y) = \sqrt[n]{y}$ is continuous, too. This gives an alternative proof of homework 5.5.

3.5 Exponential, Trigonometric, and Hyperbolic Functions and their Inverses

3.5.1 Exponential and Logarithm Functions

In this section we deal with the exponential function, one of the most important functions in analysis. We use the exponential series to define it. We will see that this definition is consistent with the definition of $e^x$ for rational $x \in \mathbb{Q}$ given in Chapter 1.

Definition 3.9 For $z \in \mathbb{C}$ put
$$E(z) = \sum_{n=0}^{\infty} \frac{z^n}{n!} = 1 + z + \frac{z^2}{2} + \frac{z^3}{6} + \cdots. \qquad (3.5)$$
Note that $E(0) = 1$ and $E(1) = e$ by the definition on page 61. The radius of convergence of the exponential series (3.5) is $R = +\infty$, i.e. the series converges absolutely for all $z \in \mathbb{C}$; see Example 2.13 (c). Applying Proposition 2.31 (Cauchy product) on multiplication of absolutely convergent series, we obtain

$$E(z)E(w) = \sum_{n=0}^\infty \frac{z^n}{n!} \sum_{m=0}^\infty \frac{w^m}{m!} = \sum_{n=0}^\infty \sum_{k=0}^n \frac{z^k w^{n-k}}{k!\,(n-k)!} = \sum_{n=0}^\infty \frac{1}{n!} \sum_{k=0}^n \binom{n}{k} z^k w^{n-k} = \sum_{n=0}^\infty \frac{(z+w)^n}{n!},$$
which gives us the important addition formula

$$E(z+w) = E(z)E(w), \quad z, w \in \mathbb{C}. \qquad (3.6)$$
One consequence is that

$$E(z)E(-z) = E(0) = 1, \quad z \in \mathbb{C}. \qquad (3.7)$$
This shows that $E(z) \ne 0$ for all $z$. By (3.5), $E(x) > 0$ if $x > 0$; hence (3.7) shows $E(x) > 0$ for all real $x$. Iteration of (3.6) gives

$$E(z_1 + \cdots + z_n) = E(z_1)\cdots E(z_n). \qquad (3.8)$$
Let us take $z_1 = \cdots = z_n = 1$. Since $E(1) = e$ by (2.15), we obtain

$$E(n) = e^n, \quad n \in \mathbb{N}. \qquad (3.9)$$

If p = m/n, where m, n are positive integers, then

$$E(p)^n = E(pn) = E(m) = e^m, \qquad (3.10)$$
so that

$$E(p) = e^p, \quad p \in \mathbb{Q}_+. \qquad (3.11)$$
It follows from (3.7) that $E(-p) = e^{-p}$ if $p$ is positive and rational. Thus (3.11) holds for all rational $p$. This justifies the redefinition
$$e^x := E(x), \quad x \in \mathbb{C}.$$
The notation $\exp(x)$ is often used in place of $e^x$.

Proposition 3.10 We can estimate the remainder term $r_n(z) := \sum_{k=n}^\infty z^k/k!$ as follows:
$$|r_n(z)| \le \frac{2\,|z|^n}{n!} \qquad\text{if } |z| \le \frac{n+1}{2}. \qquad (3.12)$$
Proof. We have

$$|r_n(z)| \le \sum_{k=n}^\infty \frac{|z|^k}{k!} = \frac{|z|^n}{n!}\left(1 + \frac{|z|}{n+1} + \frac{|z|^2}{(n+1)(n+2)} + \cdots + \frac{|z|^k}{(n+1)\cdots(n+k)} + \cdots\right)$$
$$\le \frac{|z|^n}{n!}\left(1 + \frac{|z|}{n+1} + \frac{|z|^2}{(n+1)^2} + \cdots + \frac{|z|^k}{(n+1)^k} + \cdots\right).$$
Now $|z| \le (n+1)/2$ implies
$$|r_n(z)| \le \frac{|z|^n}{n!}\left(1 + \frac{1}{2} + \frac{1}{4} + \cdots + \frac{1}{2^k} + \cdots\right) \le \frac{2\,|z|^n}{n!}.$$
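The bound in Proposition 3.10 can be checked numerically by comparing the true tail of the series with $2|z|^n/n!$. A small sketch (the helper name `E_partial` and the sample values are ours):

```python
import math

def E_partial(z, n):
    """Partial sum sum_{k=0}^{n-1} z^k / k! of the exponential series."""
    return sum(z**k / math.factorial(k) for k in range(n))

z, n = 0.8, 4                               # |z| <= (n+1)/2 = 2.5 holds
tail = abs(math.exp(z) - E_partial(z, n))   # |r_n(z)|, since E(z) = e^z
bound = 2 * abs(z)**n / math.factorial(n)
print(tail, "<=", bound, ":", tail <= bound)
```

For $z = 0.8$, $n = 4$ the true tail is about $0.020$ while the bound is about $0.034$, so the estimate holds with some room to spare.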

Example 3.9 (a) Inserting $n = 1$ gives
$$|E(z) - 1| = |r_1(z)| \le 2\,|z|, \quad |z| \le 1.$$
In particular, $E(z)$ is continuous at $z_0 = 0$. Indeed, given $\varepsilon > 0$ choose $\delta = \varepsilon/2$; then $|z| < \delta$ implies $|E(z) - 1| \le 2\,|z| \le \varepsilon$. Hence $\lim_{z\to 0} E(z) = E(0) = 1$, and $E$ is continuous at $0$.
(b) Inserting $n = 2$ gives
$$|e^z - 1 - z| = |r_2(z)| \le |z|^2, \quad |z| \le \frac{3}{2}.$$
This implies
$$\left|\frac{e^z - 1}{z} - 1\right| \le |z|, \quad 0 < |z| < \frac{3}{2}.$$
The sandwich theorem gives $\displaystyle\lim_{z\to 0} \frac{e^z - 1}{z} = 1$.

By (3.5), $\lim_{x\to+\infty} E(x) = +\infty$; hence (3.7) shows that $\lim_{x\to-\infty} E(x) = 0$. By (3.6),
$$\lim_{h\to 0}\bigl(E(z+h) - E(z)\bigr) = E(z)\lim_{h\to 0}\bigl(E(h) - 1\bigr) = E(z)\cdot 0 = 0, \qquad (3.13)$$
where $\lim_{h\to 0} E(h) = 1$ directly follows from Example 3.9. Hence $E(z)$ is continuous for all $z$.

Proposition 3.11 Let $e^x$ be defined on $\mathbb{R}$ by the power series (3.5). Then
(a) $e^x$ is continuous for all $x$.
(b) $e^x$ is a strictly increasing function and $e^x > 0$.
(c) $e^{x+y} = e^x e^y$.
(d) $\lim_{x\to+\infty} e^x = +\infty$, $\lim_{x\to-\infty} e^x = 0$.
(e) $\displaystyle\lim_{x\to+\infty} \frac{x^n}{e^x} = 0$ for every $n \in \mathbb{N}$.

Proof. We have already proved (a) to (d); (3.5) shows that
$$e^x > \frac{x^{n+1}}{(n+1)!}$$
for $x > 0$, so that
$$\frac{x^n}{e^x} < \frac{(n+1)!}{x},$$
and (e) follows. Part (e) shows that $e^x$ tends to $+\infty$ faster than any power of $x$ as $x \to +\infty$.
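Part (e), including the explicit bound $x^n/e^x < (n+1)!/x$ from the proof, is easy to observe numerically. A small sketch (sample values are ours):

```python
import math

# x^n / e^x tends to 0, dominated by the bound (n+1)!/x from the proof
n = 5
for x in [10.0, 50.0, 100.0]:
    ratio = x**n / math.exp(x)
    bound = math.factorial(n + 1) / x
    print(x, ratio, ratio < bound)
```

Already at $x = 50$ the ratio is of order $10^{-13}$, illustrating how quickly the exponential outgrows every power.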

Since $e^x$, $x \in \mathbb{R}$, is a strictly increasing continuous function, by Proposition 3.9 $e^x$ has a strictly increasing continuous inverse function $\log y$, $\log : (0,+\infty) \to \mathbb{R}$. The function $\log$ is defined by
$$e^{\log y} = y, \quad y > 0, \qquad (3.14)$$
or, equivalently, by
$$\log(e^x) = x, \quad x \in \mathbb{R}. \qquad (3.15)$$
Writing $u = e^x$ and $v = e^y$, (3.6) gives
$$\log(uv) = \log(e^x e^y) = \log(e^{x+y}) = x + y,$$
such that
$$\log(uv) = \log u + \log v, \quad u > 0,\ v > 0. \qquad (3.16)$$
This shows that $\log$ has the familiar property which makes the logarithm useful for computations. Another customary notation for $\log x$ is $\ln x$. Proposition 3.11 shows that
$$\lim_{x\to+\infty} \log x = +\infty, \qquad \lim_{x\to 0+0} \log x = -\infty.$$
We summarize what we have proved so far.

Proposition 3.12 Let the logarithm $\log : (0,+\infty) \to \mathbb{R}$ be the inverse function to the exponential function $e^x$. Then
(a) $\log$ is continuous on $(0,+\infty)$.
(b) $\log$ is strictly increasing.
(c) $\log(uv) = \log u + \log v$ for $u, v > 0$.
(d) $\lim_{x\to+\infty} \log x = +\infty$, $\lim_{x\to 0+0} \log x = -\infty$.

It is seen from (3.14) that

$$x = e^{\log x} \implies x^n = e^{n\log x} \qquad (3.17)$$
if $x > 0$ and $n$ is an integer. Similarly, if $m$ is a positive integer, we have
$$x^{\frac{1}{m}} = e^{\frac{1}{m}\log x}. \qquad (3.18)$$
Combining (3.17) and (3.18), we obtain
$$x^\alpha = e^{\alpha\log x} \qquad (3.19)$$
for any rational $\alpha$. We now define $x^\alpha$ for any real $\alpha$ and $x > 0$ by (3.19). In the same way, we redefine the general exponential function

$$a^x = e^{x\log a}, \quad a > 0,\ x \in \mathbb{R}.$$
It turns out that in case $a \ne 1$, $f(x) = a^x$ is strictly monotonic and continuous since $e^x$ is so. Hence $f$ has a strictly monotonic continuous inverse function $\log_a : (0,+\infty) \to \mathbb{R}$ defined by
$$\log_a(a^x) = x, \quad x \in \mathbb{R}, \qquad a^{\log_a x} = x, \quad x > 0.$$

3.5.2 Trigonometric Functions and their Inverses

In this section we redefine the trigonometric functions using the exponential function $e^z$. We will see that the new definitions coincide with the old ones.

Definition 3.10 For $z \in \mathbb{C}$ define
$$\cos z = \frac{1}{2}\left(e^{iz} + e^{-iz}\right), \qquad \sin z = \frac{1}{2i}\left(e^{iz} - e^{-iz}\right), \qquad (3.20)$$
such that
$$e^{iz} = \cos z + i\sin z \quad \text{(Euler formula)}. \qquad (3.21)$$

[Figure: graphs of sine and cosine]

Proposition 3.13 (a) The functions $\sin z$ and $\cos z$ can be written as power series which converge absolutely for all $z \in \mathbb{C}$:
$$\cos z = \sum_{n=0}^\infty \frac{(-1)^n}{(2n)!}\, z^{2n} = 1 - \frac{1}{2}z^2 + \frac{1}{4!}z^4 - \frac{1}{6!}z^6 + \cdots,$$
$$\sin z = \sum_{n=0}^\infty \frac{(-1)^n}{(2n+1)!}\, z^{2n+1} = z - \frac{1}{3!}z^3 + \frac{1}{5!}z^5 - \cdots. \qquad (3.22)$$
(b) $\sin x$ and $\cos x$ are real valued and continuous on $\mathbb{R}$, where $\cos x$ is an even and $\sin x$ is an odd function, i.e. $\cos(-x) = \cos x$, $\sin(-x) = -\sin x$. We have
$$\sin^2 x + \cos^2 x = 1; \qquad (3.23)$$
$$\cos(x+y) = \cos x\cos y - \sin x\sin y, \qquad \sin(x+y) = \sin x\cos y + \cos x\sin y. \qquad (3.24)$$
Proof. (a) Inserting $iz$ into (3.5) in place of $z$ and using $(i^n)_{n\ge 0} = (1, i, -1, -i, 1, i, \dots)$, we have
$$e^{iz} = \sum_{n=0}^\infty i^n \frac{z^n}{n!} = \sum_{k=0}^\infty (-1)^k \frac{z^{2k}}{(2k)!} + i\sum_{k=0}^\infty (-1)^k \frac{z^{2k+1}}{(2k+1)!}.$$
Inserting $-iz$ into (3.5) in place of $z$ we have
$$e^{-iz} = \sum_{n=0}^\infty (-i)^n \frac{z^n}{n!} = \sum_{k=0}^\infty (-1)^k \frac{z^{2k}}{(2k)!} - i\sum_{k=0}^\infty (-1)^k \frac{z^{2k+1}}{(2k+1)!}.$$
Inserting this into (3.20) proves (a).

(b) Since the exponential function is continuous on $\mathbb{C}$, $\sin z$ and $\cos z$ are also continuous on $\mathbb{C}$; in particular, their restrictions to $\mathbb{R}$ are continuous. Now let $x \in \mathbb{R}$; then $\overline{ix} = -ix$. By Homework 11.3 and (3.20) we obtain
$$\cos x = \frac{1}{2}\left(e^{ix} + e^{-ix}\right) = \frac{1}{2}\left(e^{ix} + \overline{e^{ix}}\right) = \operatorname{Re}\bigl(e^{ix}\bigr),$$
and similarly $\sin x = \operatorname{Im}\bigl(e^{ix}\bigr)$. Hence $\sin x$ and $\cos x$ are real for real $x$.
For $x \in \mathbb{R}$ we have $\bigl|e^{ix}\bigr| = 1$. Namely, by (3.7) and Homework 11.3,
$$\bigl|e^{ix}\bigr|^2 = e^{ix}\,\overline{e^{ix}} = e^{ix}e^{-ix} = e^0 = 1,$$
so that for $x \in \mathbb{R}$
$$\bigl|e^{ix}\bigr| = 1. \qquad (3.25)$$
On the other hand, the Euler formula and the fact that $\cos x$ and $\sin x$ are real give
$$1 = \bigl|e^{ix}\bigr|^2 = |\cos x + i\sin x|^2 = \cos^2 x + \sin^2 x.$$

3.5 Exponential, Trigonometric, and Hyperbolic Functions and their Inverses 91

Hence $e^{ix} = \cos x + i\sin x$ is a point on the unit circle in the complex plane, and $\cos x$ and $\sin x$ are its coordinates. This establishes the equivalence between the old definition of $\cos x$ as the length of the adjacent side in a right triangle with hypotenuse $1$ and angle $\frac{180^\circ}{\pi}x$, and the power series definition of $\cos x$. The only missing link is: the length of the arc from $1$ to $e^{ix}$ is $x$. It follows directly from the definition that $\cos(-z) = \cos z$ and $\sin(-z) = -\sin z$ for all $z \in \mathbb{C}$. The addition laws for $\sin x$ and $\cos x$ follow from (3.6) applied to $e^{i(x+y)}$. This completes the proof of (b).

Lemma 3.14 There exists a unique number $\tau \in (0,2)$ such that $\cos\tau = 0$. We define the number $\pi$ by
$$\pi = 2\tau. \qquad (3.26)$$

The proof is based on the following Lemma.

Lemma 3.15
(a) For $0 < x < \sqrt{6}$ we have
$$0 < x - \frac{x^3}{3!} < \sin x < x \qquad\text{and}\qquad x\cos x < \sin x.$$
(b) If $0 < x\cos x < \sin x$, then $\cos^2 x < \dfrac{1}{1+x^2}$; in particular, $\cos^2 1 < \dfrac{1}{2}$.
(c) $\cos x$ is strictly decreasing on $[0,\pi]$, whereas $\sin x$ is strictly increasing on $[-\pi/2,\pi/2]$.

In particular, the sandwich theorem applied to statement (a), $1 - \frac{x^2}{6} < \frac{\sin x}{x} < 1$ as $x \to 0+0$, gives $\lim_{x\to 0+0} \frac{\sin x}{x} = 1$. Since $\frac{\sin x}{x}$ is an even function, this implies $\lim_{x\to 0} \frac{\sin x}{x} = 1$. The proof of the lemma is in Appendix B to this chapter.

Proof of Lemma 3.14. $\cos 0 = 1$. By Lemma 3.15, $\cos^2 1 < 1/2$. By the double angle formula for cosine, $\cos 2 = 2\cos^2 1 - 1 < 0$. By continuity of $\cos x$ and Theorem 3.5, $\cos$ has a zero $\tau$ in the interval $(0,2)$. By the addition laws,
$$\cos x - \cos y = -2\sin\frac{x+y}{2}\,\sin\frac{x-y}{2}.$$
For $0 \le y < x \le 2$ both $\sin\frac{x+y}{2} > 0$ and $\sin\frac{x-y}{2} > 0$ by Lemma 3.15 (a); hence $\cos x < \cos y$. Thus $\cos x$ is strictly decreasing on $(0,2)$, and the zero $\tau$ is therefore unique.

By definition, $\cos\frac{\pi}{2} = 0$, and (3.23) shows $\sin\frac{\pi}{2} = \pm 1$. By (3.27), $\sin\frac{\pi}{2} = 1$. Thus $e^{i\pi/2} = i$, and the addition formula for $e^z$ gives
$$e^{\pi i} = -1, \qquad e^{2\pi i} = 1; \qquad (3.31)$$
hence
$$e^{z + 2\pi i} = e^z, \quad z \in \mathbb{C}. \qquad (3.32)$$

Proposition 3.16 (a) The function $e^z$ is periodic with period $2\pi i$. We have $e^{ix} = 1$, $x \in \mathbb{R}$, if and only if $x = 2k\pi$, $k \in \mathbb{Z}$.
(b) The functions $\sin z$ and $\cos z$ are periodic with period $2\pi$. The real zeros of the sine and cosine functions are $\{k\pi \mid k \in \mathbb{Z}\}$ and $\{\pi/2 + k\pi \mid k \in \mathbb{Z}\}$, respectively.

Proof. We have already proved (a). (b) follows from (a) and (3.20).

Tangent and Cotangent Functions

[Figure: graphs of tangent and cotangent]

For $x \ne \pi/2 + k\pi$, $k \in \mathbb{Z}$, define
$$\tan x = \frac{\sin x}{\cos x}. \qquad (3.33)$$
For $x \ne k\pi$, $k \in \mathbb{Z}$, define
$$\cot x = \frac{\cos x}{\sin x}. \qquad (3.34)$$

Lemma 3.17 (a) $\tan x$ is continuous at every $x \in \mathbb{R} \setminus \{\pi/2 + k\pi \mid k \in \mathbb{Z}\}$, and $\tan(x+\pi) = \tan x$;
(b) $\lim_{x\to\pi/2-0} \tan x = +\infty$, $\lim_{x\to\pi/2+0} \tan x = -\infty$;
(c) $\tan x$ is strictly increasing on $(-\pi/2,\pi/2)$.

Proof. (a) is clear by Proposition 3.3 since $\sin x$ and $\cos x$ are continuous. We show only (c) and leave (b) as an exercise. Let $0 \le x < y < \pi/2$. Then $\sin x < \sin y$ and $\cos x > \cos y > 0$. Therefore
$$\tan x = \frac{\sin x}{\cos x} < \frac{\sin y}{\cos y} = \tan y.$$
Hence $\tan$ is strictly increasing on $[0,\pi/2)$. Since $\tan(-x) = -\tan x$, $\tan$ is strictly increasing on the whole interval $(-\pi/2,\pi/2)$.

Similarly to Lemma 3.17 one proves the next lemma.

Lemma 3.18 (a) $\cot x$ is continuous at every $x \in \mathbb{R} \setminus \{k\pi \mid k \in \mathbb{Z}\}$, and $\cot(x+\pi) = \cot x$;
(b) $\lim_{x\to 0-0} \cot x = -\infty$, $\lim_{x\to 0+0} \cot x = +\infty$;
(c) $\cot x$ is strictly decreasing on $(0,\pi)$.

Inverse Trigonometric Functions

We have seen in Lemma 3.15 that $\cos x$ is strictly decreasing on $[0,\pi]$ and $\sin x$ is strictly increasing on $[-\pi/2,\pi/2]$. Obviously, the images are $\cos[0,\pi] = \sin[-\pi/2,\pi/2] = [-1,1]$. Using Proposition 3.9 we obtain that the inverse functions exist and that they are monotonic and continuous.

Proposition 3.19 (and Definition) There exists the inverse function to $\cos$,
$$\arccos : [-1,1] \to [0,\pi], \qquad (3.35)$$
given by $\arccos(\cos x) = x$, $x \in [0,\pi]$, or $\cos(\arccos y) = y$, $y \in [-1,1]$. The function $\arccos x$ is strictly decreasing and continuous.
There exists the inverse function to $\sin$,
$$\arcsin : [-1,1] \to [-\pi/2,\pi/2], \qquad (3.36)$$
given by $\arcsin(\sin x) = x$, $x \in [-\pi/2,\pi/2]$, or $\sin(\arcsin y) = y$, $y \in [-1,1]$. The function $\arcsin x$ is strictly increasing and continuous.

[Figure: graphs of arcsine and arccosine]

Note that $\arcsin x + \arccos x = \pi/2$ for $x \in [-1,1]$. Indeed, let $y = \arcsin x$; then $x = \sin y = \cos(\pi/2 - y)$. Since $y \in [-\pi/2,\pi/2]$, we have $\pi/2 - y \in [0,\pi]$, so $\arccos x = \pi/2 - y$. Therefore $y + \arccos x = \pi/2$.
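The identity $\arcsin x + \arccos x = \pi/2$ can be spot-checked with the standard library. A small sketch (sample points are ours):

```python
import math

# arcsin x + arccos x = pi/2 for every x in [-1, 1]
for x in [-1.0, -0.3, 0.0, 0.5, 1.0]:
    s = math.asin(x) + math.acos(x)
    print(x, s, abs(s - math.pi / 2) < 1e-12)
```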

By Lemma 3.17, $\tan x$ is strictly increasing on $(-\pi/2,\pi/2)$. Therefore there exists the inverse function on the image $\tan(-\pi/2,\pi/2) = \mathbb{R}$.

Proposition 3.20 (and Definition) There exists the inverse function to $\tan$,
$$\arctan : \mathbb{R} \to (-\pi/2,\pi/2), \qquad (3.37)$$
given by $\arctan(\tan x) = x$, $x \in (-\pi/2,\pi/2)$, or $\tan(\arctan y) = y$, $y \in \mathbb{R}$. The function $\arctan x$ is strictly increasing and continuous.
There exists the inverse function to $\cot$,
$$\operatorname{arccot} : \mathbb{R} \to (0,\pi), \qquad (3.38)$$
given by $\operatorname{arccot}(\cot x) = x$, $x \in (0,\pi)$, or $\cot(\operatorname{arccot} y) = y$, $y \in \mathbb{R}$. The function $\operatorname{arccot} x$ is strictly decreasing and continuous.

[Figure: graphs of arctangent and arccotangent]

3.5.3 Hyperbolic Functions and their Inverses

The functions
$$\sinh x = \frac{e^x - e^{-x}}{2}, \qquad (3.39)$$
$$\cosh x = \frac{e^x + e^{-x}}{2}, \qquad (3.40)$$
$$\tanh x = \frac{e^x - e^{-x}}{e^x + e^{-x}} = \frac{\sinh x}{\cosh x}, \qquad (3.41)$$
$$\coth x = \frac{e^x + e^{-x}}{e^x - e^{-x}} = \frac{\cosh x}{\sinh x} \qquad (3.42)$$
are called hyperbolic sine, hyperbolic cosine, hyperbolic tangent, and hyperbolic cotangent, respectively. There are many analogies between these functions and their ordinary trigonometric counterparts.

[Figure: graphs of hyperbolic sine and cosine]

[Figure: graphs of hyperbolic tangent and cotangent]

The functions $\sinh x$ and $\tanh x$ are strictly increasing with $\sinh(\mathbb{R}) = \mathbb{R}$ and $\tanh(\mathbb{R}) = (-1,1)$. Hence their inverse functions are defined on $\mathbb{R}$ and on $(-1,1)$, respectively, and are also strictly increasing and continuous. The function
$$\operatorname{arsinh} : \mathbb{R} \to \mathbb{R} \qquad (3.43)$$
is given by $\operatorname{arsinh}(\sinh x) = x$, $x \in \mathbb{R}$, or $\sinh(\operatorname{arsinh} y) = y$, $y \in \mathbb{R}$.
The function
$$\operatorname{artanh} : (-1,1) \to \mathbb{R} \qquad (3.44)$$
is defined by $\operatorname{artanh}(\tanh x) = x$, $x \in \mathbb{R}$, or $\tanh(\operatorname{artanh} y) = y$, $y \in (-1,1)$.
The function $\cosh$ is strictly increasing on the half line $\mathbb{R}_+$ with $\cosh(\mathbb{R}_+) = [1,\infty)$. Hence the inverse function is defined on $[1,\infty)$, taking values in $\mathbb{R}_+$. It is also strictly increasing and continuous:
$$\operatorname{arcosh} : [1,\infty) \to \mathbb{R}_+ \qquad (3.45)$$
is defined via $\operatorname{arcosh}(\cosh x) = x$, $x \ge 0$, or by $\cosh(\operatorname{arcosh} y) = y$, $y \ge 1$.
The function $\coth$ is strictly decreasing on $x < 0$ and on $x > 0$ with $\coth(\mathbb{R}\setminus\{0\}) = \mathbb{R}\setminus[-1,1]$. Hence the inverse function is defined on $\mathbb{R}\setminus[-1,1]$, taking values in $\mathbb{R}\setminus\{0\}$. It is also strictly decreasing and continuous:
$$\operatorname{arcoth} : \mathbb{R}\setminus[-1,1] \to \mathbb{R} \qquad (3.46)$$
is defined via $\operatorname{arcoth}(\coth x) = x$, $x \ne 0$, or by $\coth(\operatorname{arcoth} y) = y$, $y < -1$ or $y > 1$.

3.6 Appendix B

3.6.1 Monotonic Functions have One-Sided Limits

Proof of Theorem 3.8. By hypothesis, the set $\{f(t) \mid a < t < x\}$ is bounded above by $f(x)$ and therefore has a least upper bound, which we denote by $A$; evidently $A \le f(x)$. We have to show that $A = f(x-0)$. Let $\varepsilon > 0$ be given. It follows from the definition of $A$ as a least upper bound that there exists $\delta > 0$ such that $a < x - \delta < x$ and
$$A - \varepsilon < f(x-\delta) \le A. \qquad (3.47)$$
Since $f$ is monotonic, we have
$$f(x-\delta) \le f(t) \le A \quad\text{if } x-\delta < t < x. \qquad (3.48)$$
Combining (3.47) and (3.48), we see that
$$|f(t) - A| < \varepsilon \quad\text{if } x-\delta < t < x.$$
Hence $f(x-0) = A$. The second half of (3.3) is proved in precisely the same way. Next, if $a < x < y < b$, we see from (3.3) that
$$f(x+0) = \inf_{x<t<b} f(t) = \inf_{x<t<y} f(t). \qquad (3.49)$$
The last equality is obtained by applying (3.3) to $(a,y)$ instead of $(a,b)$. Similarly,
$$f(y-0) = \sup_{a<t<y} f(t) = \sup_{x<t<y} f(t). \qquad (3.50)$$
Comparison of (3.49) and (3.50) gives (3.4).

3.6.2 Proofs for sin x and cos x inequalities

Proof of Lemma 3.15. (a) By (3.22),
$$\cos x = \left(1 - \frac{1}{2!}x^2\right) + x^4\left(\frac{1}{4!} - \frac{1}{6!}x^2\right) + \cdots.$$
For $0 < x < \sqrt{2}$ the first bracket is positive and, moreover, $\frac{1}{(2n)!} - \frac{x^2}{(2n+2)!} > 0$ for all $n \in \mathbb{N}$; hence $\cos x > 0$. By (3.22),
$$\sin x = x\left(1 - \frac{1}{3!}x^2\right) + x^5\left(\frac{1}{5!} - \frac{1}{7!}x^2\right) + \cdots.$$
Now,
$$1 - \frac{1}{3!}x^2 > 0 \iff x < \sqrt{6}, \qquad \frac{1}{5!} - \frac{1}{7!}x^2 > 0 \iff x < \sqrt{42}, \quad \dots$$
Hence $\sin x > 0$ if $0 < x < \sqrt{6}$; grouping from the $x^5$ term on also gives $\sin x > x - \frac{x^3}{3!}$ there. Furthermore,
$$x - \sin x = x^3\left(\frac{1}{3!} - \frac{1}{5!}x^2\right) + x^7\left(\frac{1}{7!} - \frac{1}{9!}x^2\right) + \cdots,$$
and we obtain $\sin x < x$ if $0 < x < \sqrt{20}$. It remains to show that $\sin x - x\cos x > 0$; equivalently,
$$0 < x^3\left(\frac{1}{2!} - \frac{1}{3!}\right) - x^5\left(\frac{1}{4!} - \frac{1}{5!}\right) + x^7\left(\frac{1}{6!} - \frac{1}{7!}\right) - \cdots,$$
that is,
$$0 < x^3\left(\frac{2}{3!} - \frac{4}{5!}x^2\right) + x^7\left(\frac{6}{7!} - \frac{8}{9!}x^2\right) + \cdots.$$
Now $0 < x < \sqrt{10}$ implies
$$\frac{2n}{(2n+1)!} - \frac{2n+2}{(2n+3)!}x^2 > 0$$
for all $n \in \mathbb{N}$. This completes the proof of (a).
(b) Using (3.23), we get
$$0 < x\cos x < \sin x \implies 0 < x^2\cos^2 x < \sin^2 x \implies x^2\cos^2 x + \cos^2 x < 1 \implies \cos^2 x < \frac{1}{1+x^2}.$$
(c) In the proof of Lemma 3.14 we have seen that $\cos x$ is strictly decreasing on $(0,\pi/2)$. By (3.23), $\sin x = \sqrt{1-\cos^2 x}$ is strictly increasing there. Since $\sin x$ is an odd function, $\sin x$ is strictly increasing on $[-\pi/2,\pi/2]$. Since $\cos x = -\sin(x - \pi/2)$, the statement for $\cos x$ follows.
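The inequalities in (a) are easy to spot-check numerically. A small sketch (sample points are ours):

```python
import math

# 0 < x - x^3/6 < sin x < x and x*cos x < sin x for small positive x
for x in [0.1, 0.5, 1.0, 2.0]:
    lower = x - x**3 / 6
    ok = 0 < lower < math.sin(x) < x and x * math.cos(x) < math.sin(x)
    print(x, ok)
```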

3.6.3 Estimates for π

Proposition 3.21 For real $x$ we have
$$\cos x = \sum_{k=0}^n (-1)^k \frac{x^{2k}}{(2k)!} + r_{2n+2}(x), \qquad (3.51)$$
$$\sin x = \sum_{k=0}^n (-1)^k \frac{x^{2k+1}}{(2k+1)!} + r_{2n+3}(x), \qquad (3.52)$$
where
$$|r_{2n+2}(x)| \le \frac{|x|^{2n+2}}{(2n+2)!} \quad\text{if } |x| \le 2n+3, \qquad (3.53)$$
$$|r_{2n+3}(x)| \le \frac{|x|^{2n+3}}{(2n+3)!} \quad\text{if } |x| \le 2n+4. \qquad (3.54)$$

Proof. Let
$$r_{2n+2}(x) = \pm\frac{x^{2n+2}}{(2n+2)!}\left(1 - \frac{x^2}{(2n+3)(2n+4)} \pm \cdots\right).$$
Put
$$a_k := \frac{x^{2k}}{(2n+3)(2n+4)\cdots(2n+2(k+1))}.$$
Then we have, by definition,
$$r_{2n+2}(x) = \pm\frac{x^{2n+2}}{(2n+2)!}\bigl(1 - a_1 + a_2 - \cdots\bigr).$$
Since
$$a_k = a_{k-1}\,\frac{x^2}{(2n+2k+1)(2n+2k+2)},$$
$|x| \le 2n+3$ implies
$$1 > a_1 > a_2 > \cdots > 0,$$
and finally, as in the proof of the Leibniz criterion,
$$0 \le 1 - a_1 + a_2 - a_3 \mp \cdots \le 1.$$
Hence $|r_{2n+2}(x)| \le |x|^{2n+2}/(2n+2)!$. The estimate for the remainder of the sine series is similar.

As an application of Proposition 3.21 we estimate $\pi$. For numerical calculations it is convenient to use the following (Horner-like) order of operations:
$$\cos x = \left(\cdots\left(\left(\frac{-x^2}{2n(2n-1)} + 1\right)\frac{-x^2}{(2n-2)(2n-3)} + 1\right)\cdots\right)\frac{-x^2}{2} + 1 + r_{2n+2}(x).$$
First we compute $\cos 1.5$ and $\cos 1.6$. Choosing $n = 7$ we obtain
$$\cos x = \left(\left(\left(\left(\left(\left(\frac{-x^2}{182} + 1\right)\frac{-x^2}{132} + 1\right)\frac{-x^2}{90} + 1\right)\frac{-x^2}{56} + 1\right)\frac{-x^2}{30} + 1\right)\frac{-x^2}{12} + 1\right)\frac{-x^2}{2} + 1 + r_{16}(x).$$
By Proposition 3.21,
$$|r_{16}(x)| \le \frac{|x|^{16}}{16!} \le 0.9\cdot 10^{-10} \quad\text{if } |x| \le 1.6.$$
The calculations give
$$\cos 1.5 = 0.07073720163 \pm 20\cdot 10^{-11} > 0, \qquad \cos 1.6 = -0.02919952239 \pm 20\cdot 10^{-11} < 0.$$
By the intermediate value theorem, $1.5 < \pi/2 < 1.6$.
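The Horner evaluation with $n = 7$ and the sign check at $1.5$ and $1.6$ translate directly into code. A sketch (the name `horner_cos` is ours):

```python
def horner_cos(x, n=7):
    """Evaluate the degree-2n cosine Taylor polynomial by the nested
    (Horner) scheme used in the text; the error is <= |x|^(2n+2)/(2n+2)!."""
    t = 1.0
    for k in range(n, 0, -1):
        t = 1.0 - x * x / (2 * k * (2 * k - 1)) * t
    return t

print(horner_cos(1.5) > 0, horner_cos(1.6) < 0)  # True True
```

The sign change confirms $1.5 < \pi/2 < 1.6$, exactly as the intermediate value theorem argument requires.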

Now we compute $\cos x$ for two values of $x$ close to $\pi/2$. Linear interpolation between $\cos 1.5$ and $\cos 1.6$ suggests
$$a = 1.5 + 0.1\,\frac{\cos 1.5}{\cos 1.5 - \cos 1.6} = 1.57078\ldots$$
The calculations give
$$\cos 1.5707 = 0.000096326273 \pm 20\cdot 10^{-11} > 0, \qquad \cos 1.5708 = -0.00000367326 \pm 20\cdot 10^{-11} < 0.$$
Hence $1.5707 < \pi/2 < 1.5708$. The next linear interpolation gives
$$b = 1.5707 + 0.0001\,\frac{\cos 1.5707}{\cos 1.5707 - \cos 1.5708} = 1.570796326\ldots,$$
$$\cos 1.570796326 = 0.00000000073 \pm 20\cdot 10^{-11} > 0, \qquad \cos 1.570796327 = -0.00000000027 \pm 20\cdot 10^{-11} < 0.$$
Therefore $1.570796326 < \pi/2 < 1.570796327$, so that
$$\pi = 3.141592653 \pm 10^{-9}.$$

Chapter 4

Differentiation

4.1 The Derivative of a Function

We define the derivative of a function and prove the main properties like product, quotient and chain rule. We relate the derivative of a function with the derivative of its inverse function. We prove the mean value theorem and consider local extrema. Taylor’s theorem will be formulated.

Definition 4.1 Let $f : (a,b) \to \mathbb{R}$ be a function and $x_0 \in (a,b)$. If the limit
$$\lim_{x\to x_0} \frac{f(x) - f(x_0)}{x - x_0} \qquad (4.1)$$
exists, we call $f$ differentiable at $x_0$. The limit is denoted by $f'(x_0)$. We say $f$ is differentiable if $f$ is differentiable at every point $x \in (a,b)$. We have thus associated to every function $f$ a function $f'$ whose domain is the set of points $x_0$ where the limit (4.1) exists; $f'$ is called the derivative of $f$. Sometimes the Leibniz notation is used to denote the derivative of $f$:
$$f'(x_0) = \frac{df(x_0)}{dx} = \frac{d}{dx}f(x_0).$$

Remarks 4.1 (a) Replacing $x - x_0$ by $h$, we see that $f'(x_0) = \lim_{h\to 0} \frac{f(x_0+h) - f(x_0)}{h}$.
(b) The limits
$$\lim_{h\to 0-0} \frac{f(x_0+h) - f(x_0)}{h}, \qquad \lim_{h\to 0+0} \frac{f(x_0+h) - f(x_0)}{h}$$
are called the left-hand and right-hand derivatives of $f$ at $x_0$, respectively. In particular, for $f : [a,b] \to \mathbb{R}$ we can consider the right-hand derivative at $a$ and the left-hand derivative at $b$.

Example 4.1 (a) For the constant function $f(x) = c$,
$$f'(x_0) = \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x - x_0} = \lim_{x\to x_0} \frac{c - c}{x - x_0} = 0.$$
(b) For $f(x) = x$,
$$f'(x_0) = \lim_{x\to x_0} \frac{x - x_0}{x - x_0} = 1.$$

(c) The slope of the tangent line. Let $f : (a,b) \to \mathbb{R}$ be differentiable at $x_0$. Then $f'(x_0)$ is the slope of the tangent line to the graph of $f$ through the point $(x_0, f(x_0))$. The slope of the secant line through $(x_0, f(x_0))$ and $(x_1, f(x_1))$ is
$$m_1 = \tan\alpha_1 = \frac{f(x_1) - f(x_0)}{x_1 - x_0}.$$
One can see: if $x_1$ approaches $x_0$, the secant line through $(x_0, f(x_0))$ and $(x_1, f(x_1))$ approaches the tangent line through $(x_0, f(x_0))$. Hence the slope of the tangent line is the limit of the slopes of the secant lines as $x$ approaches $x_0$:
$$f'(x_0) = \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x - x_0}.$$

Proposition 4.1 If $f$ is differentiable at $x_0 \in (a,b)$, then $f$ is continuous at $x_0$.
Proof. By Proposition 3.2 we have
$$\lim_{x\to x_0}\bigl(f(x) - f(x_0)\bigr) = \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x - x_0}\,(x - x_0) = f'(x_0)\lim_{x\to x_0}(x - x_0) = f'(x_0)\cdot 0 = 0.$$

The converse of this proposition is not true. For example, $f(x) = |x|$ is continuous on $\mathbb{R}$ but differentiable only on $\mathbb{R}\setminus\{0\}$, since $\lim_{h\to 0+0}\frac{|h|}{h} = 1$ whereas $\lim_{h\to 0-0}\frac{|h|}{h} = -1$. Later we will become acquainted with a function which is continuous on the whole line without being differentiable at any point!

Proposition 4.2 Let $f : (r,s) \to \mathbb{R}$ be a function and $a \in (r,s)$. Then $f$ is differentiable at $a$ if and only if there exist a number $c \in \mathbb{R}$ and a function $\varphi$ defined in a neighborhood of $a$ such that
$$f(x) = f(a) + (x-a)c + \varphi(x), \qquad (4.2)$$
where
$$\lim_{x\to a} \frac{\varphi(x)}{x-a} = 0. \qquad (4.3)$$
In this case $f'(a) = c$.

The proposition says that a function $f$ differentiable at $a$ can be approximated by a linear function, in our case by $y = f(a) + (x-a)f'(a)$. The graph of this linear function is the tangent line to the graph of $f$ at the point $(a, f(a))$. Later we will use this point of view to define differentiability of functions $f : \mathbb{R}^n \to \mathbb{R}^m$.

Proof. Suppose first $f$ satisfies (4.2) and (4.3). Then
$$\lim_{x\to a} \frac{f(x) - f(a)}{x-a} = \lim_{x\to a}\left(c + \frac{\varphi(x)}{x-a}\right) = c.$$
Hence $f$ is differentiable at $a$ with $f'(a) = c$.
Now let $f$ be differentiable at $a$ with $f'(a) = c$. Put $\varphi(x) = f(x) - f(a) - (x-a)f'(a)$. Then
$$\lim_{x\to a} \frac{\varphi(x)}{x-a} = \lim_{x\to a} \frac{f(x) - f(a)}{x-a} - f'(a) = 0.$$

Let us compute the linear function whose graph is the tangent line through $(x_0, f(x_0))$. Considering the right triangle $PP_0Q_0$ in the figure, by Example 4.1 (c) we have
$$f'(x_0) = \tan\alpha = \frac{y - y_0}{x - x_0},$$
such that the tangent line has the equation
$$y = g(x) = f(x_0) + f'(x_0)(x - x_0).$$
This function is called the linearization of $f$ at $x_0$. It is also the Taylor polynomial of degree $1$ of $f$ at $x_0$; see Section 4.5 below.

Proposition 4.3 Suppose $f$ and $g$ are defined on $(a,b)$ and are differentiable at a point $x \in (a,b)$. Then $f+g$, $fg$, and $f/g$ are differentiable at $x$, and
(a) $(f+g)'(x) = f'(x) + g'(x)$;
(b) $(fg)'(x) = f'(x)g(x) + f(x)g'(x)$;
(c) $\left(\dfrac{f}{g}\right)'(x) = \dfrac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}$.
In (c), we assume that $g(x) \ne 0$.

Proof. (a) Since
$$\frac{(f+g)(x+h) - (f+g)(x)}{h} = \frac{f(x+h) - f(x)}{h} + \frac{g(x+h) - g(x)}{h},$$
the claim follows from Proposition 3.2.
(b) Let $h = fg$ and let $t$ be variable. Then
$$h(t) - h(x) = f(t)\bigl(g(t) - g(x)\bigr) + g(x)\bigl(f(t) - f(x)\bigr),$$
$$\frac{h(t) - h(x)}{t-x} = f(t)\,\frac{g(t) - g(x)}{t-x} + g(x)\,\frac{f(t) - f(x)}{t-x}.$$
Noting that $f(t) \to f(x)$ as $t \to x$, (b) follows.
(c) Next let $h = f/g$. Then
$$\frac{h(t) - h(x)}{t-x} = \frac{f(t)g(x) - f(x)g(t)}{g(x)g(t)(t-x)} = \frac{1}{g(t)g(x)}\cdot\frac{f(t)g(x) - f(x)g(x) + f(x)g(x) - f(x)g(t)}{t-x}$$
$$= \frac{1}{g(t)g(x)}\left(g(x)\,\frac{f(t) - f(x)}{t-x} - f(x)\,\frac{g(t) - g(x)}{t-x}\right).$$
Letting $t \to x$ and applying Propositions 3.2 and 4.1, we obtain (c).
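The product and quotient rules can be sanity-checked with symmetric difference quotients. A minimal sketch (the helper name, the test functions, and the step size are ours):

```python
def diff_quot(F, x, h=1e-6):
    """Symmetric difference quotient approximating F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# f(t) = t^2 and g(t) = t^3 + 1 with their known derivatives
f  = lambda t: t * t
fp = lambda t: 2 * t
g  = lambda t: t ** 3 + 1
gp = lambda t: 3 * t * t
x = 0.9

prod = diff_quot(lambda t: f(t) * g(t), x)
quot = diff_quot(lambda t: f(t) / g(t), x)
print(abs(prod - (fp(x) * g(x) + f(x) * gp(x))) < 1e-6)                # rule (b)
print(abs(quot - (fp(x) * g(x) - f(x) * gp(x)) / g(x) ** 2) < 1e-6)    # rule (c)
```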

Example 4.2 (a) $f(x) = x^n$, $n \in \mathbb{Z}$. We will prove $f'(x) = nx^{n-1}$ by induction on $n \in \mathbb{N}$. The cases $n = 0, 1$ are settled by Example 4.1. Suppose the statement is true for some fixed $n$; we show that $(x^{n+1})' = (n+1)x^n$. By the product rule and the induction hypothesis,
$$(x^{n+1})' = (x^n\cdot x)' = (x^n)'x + x^n(x') = nx^{n-1}x + x^n = (n+1)x^n.$$
This proves the claim for positive integers $n$. For negative $n$, consider $f(x) = 1/x^{-n}$ and use the quotient rule.
(b) $(e^x)' = e^x$:
$$(e^x)' = \lim_{h\to 0} \frac{e^{x+h} - e^x}{h} = \lim_{h\to 0} \frac{e^x e^h - e^x}{h} = e^x \lim_{h\to 0} \frac{e^h - 1}{h} = e^x; \qquad (4.4)$$
the last equation follows from Example 3.9 (b).
(c) $(\sin x)' = \cos x$, $(\cos x)' = -\sin x$. Using $\sin(x+y) - \sin(x-y) = 2\cos x\sin y$ we have
$$(\sin x)' = \lim_{h\to 0} \frac{\sin(x+h) - \sin x}{h} = \lim_{h\to 0} \frac{2\cos\frac{2x+h}{2}\sin\frac{h}{2}}{h} = \lim_{h\to 0}\cos\left(x + \frac{h}{2}\right)\cdot\lim_{h\to 0}\frac{\sin\frac{h}{2}}{\frac{h}{2}}.$$
Since $\cos x$ is continuous and $\lim_{h\to 0}\frac{\sin h}{h} = 1$ (by the argument after Lemma 3.15), we obtain $(\sin x)' = \cos x$. The proof for $\cos x$ is analogous.
(d) $(\tan x)' = \dfrac{1}{\cos^2 x}$. Using the quotient rule for $\tan x = \sin x/\cos x$ we have
$$(\tan x)' = \frac{(\sin x)'\cos x - \sin x(\cos x)'}{\cos^2 x} = \frac{\cos^2 x + \sin^2 x}{\cos^2 x} = \frac{1}{\cos^2 x}.$$
The next proposition deals with composite functions and is probably the most important statement about derivatives.

Proposition 4.4 (Chain rule) Let g: (α, β) → ℝ be differentiable at x₀ ∈ (α, β) and let f: (a, b) → ℝ be differentiable at y₀ = g(x₀) ∈ (a, b). Then h = f ∘ g is differentiable at x₀, and

h'(x₀) = f'(y₀) g'(x₀). (4.5)

Proof. We have

(f(g(x)) − f(g(x₀)))/(x − x₀) = (f(g(x)) − f(g(x₀)))/(g(x) − g(x₀)) · (g(x) − g(x₀))/(x − x₀) → f'(y₀)·g'(x₀) as x → x₀.

Here we used that y = g(x) tends to y₀ = g(x₀) as x → x₀, since g is continuous at x₀.
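Formula (4.5) is easy to check numerically. A small Python illustration (not from the original text) for h = f ∘ g with f = sin and g(x) = x²:

```python
import math

def num_deriv(F, x, h=1e-6):
    # symmetric difference quotient
    return (F(x + h) - F(x - h)) / (2 * h)

x0 = 1.3
approx = num_deriv(lambda t: math.sin(t * t), x0)   # h = f∘g, f = sin, g(x) = x^2
exact = math.cos(x0 * x0) * 2 * x0                  # f'(g(x0))·g'(x0), as in (4.5)
print(abs(approx - exact))
```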

Proposition 4.5 Let f: (a, b) → ℝ be strictly monotonic and continuous. Suppose f is differentiable at x. Then the inverse function g = f⁻¹: f((a, b)) → ℝ is differentiable at y = f(x) with

g'(y) = 1/f'(x) = 1/f'(g(y)). (4.6)

Proof. Let (yₙ) ⊂ f((a, b)) be a sequence with yₙ → y and yₙ ≠ y for all n. Put xₙ = g(yₙ). Since g is continuous (by Corollary 3.9), lim_{n→∞} xₙ = x. Since g is injective, xₙ ≠ x for all n. We have

lim_{n→∞} (g(yₙ) − g(y))/(yₙ − y) = lim_{n→∞} (xₙ − x)/(f(xₙ) − f(x)) = lim_{n→∞} 1/((f(xₙ) − f(x))/(xₙ − x)) = 1/f'(x).

Hence g'(y) = 1/f'(x) = 1/f'(g(y)).

We give some applications of the last two propositions.

Example 4.3 (a) Let f: ℝ → ℝ be differentiable; define F: ℝ → ℝ by F(x) := f(ax + b) with some a, b ∈ ℝ. Then

F'(x) = a f'(ax + b).

(b) In what follows f is the original function (with known derivative) and g is the inverse function to f. We fix the notation y = f(x) and x = g(y).

log: ℝ₊ → ℝ is the inverse function to f(x) = e^x. By the above proposition,

(log y)' = 1/(e^x)' = 1/e^x = 1/y.

(c) x^α = e^{α log x}. Hence, (x^α)' = (e^{α log x})' = e^{α log x}·α·(1/x) = α x^{α−1}.
(d) Suppose f > 0 and g = log f. Then g' = f'/f; hence f' = f g'.

(e) arcsin: [−1, 1] → ℝ is the inverse function to y = f(x) = sin x. If y ∈ (−1, 1) then

(arcsin y)' = 1/(sin x)' = 1/cos x.

Since y ∈ [−1, 1] implies x = arcsin y ∈ [−π/2, π/2], cos x ≥ 0. Therefore, cos x = √(1 − sin^2 x) = √(1 − y^2). Hence

(arcsin y)' = 1/√(1 − y^2), −1 < y < 1.

(f) arctan: ℝ → (−π/2, π/2) is the inverse function to y = f(x) = tan x. Since

y^2 = tan^2 x = sin^2 x/cos^2 x = (1 − cos^2 x)/cos^2 x = 1/cos^2 x − 1,

we have 1/cos^2 x = 1 + y^2, and therefore

(arctan y)' = 1/(tan x)' = cos^2 x = 1/(1 + y^2).
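The inverse-function formula (4.6) can be sanity-checked numerically; here a small Python sketch (illustrative only) compares difference quotients of arcsin and arctan against the closed forms just derived:

```python
import math

def num_deriv(F, y, h=1e-7):
    # symmetric difference quotient
    return (F(y + h) - F(y - h)) / (2 * h)

y = 0.4
asin_err = abs(num_deriv(math.asin, y) - 1 / math.sqrt(1 - y * y))  # (arcsin y)'
atan_err = abs(num_deriv(math.atan, y) - 1 / (1 + y * y))           # (arctan y)'
print(asin_err, atan_err)
```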

4.2 The Derivatives of Elementary Functions

function                 derivative
const.                   0
x^n  (n ∈ ℕ)             n x^{n−1}
x^α  (α ∈ ℝ, x > 0)      α x^{α−1}
e^x                      e^x
a^x  (a > 0)             a^x log a
log x                    1/x
log_a x                  1/(x log a)
sin x                    cos x
cos x                    −sin x
tan x                    1/cos^2 x
cot x                    −1/sin^2 x
sinh x                   cosh x
cosh x                   sinh x
tanh x                   1/cosh^2 x
coth x                   −1/sinh^2 x
arcsin x                 1/√(1 − x^2)
arccos x                 −1/√(1 − x^2)
arctan x                 1/(1 + x^2)
arccot x                 −1/(1 + x^2)
arsinh x                 1/√(x^2 + 1)
arcosh x                 1/√(x^2 − 1)
artanh x                 1/(1 − x^2)
arcoth x                 1/(1 − x^2)

4.2.1 Derivatives of Higher Order
Let f: D → ℝ be differentiable. If the derivative f': D → ℝ is differentiable at x ∈ D, then

f''(x) = (f')'(x) = d²f(x)/dx²

is called the second derivative of f at x. Similarly, one defines inductively higher order derivatives. Continuing in this manner, we obtain functions

f, f', f'', f^{(3)}, …, f^{(k)},

each of which is the derivative of the preceding one. f^{(n)} is called the nth derivative of f, or the derivative of order n of f. We also use the Leibniz notation

f^{(k)}(x) = d^k f(x)/dx^k = (d/dx)^k f(x).

Definition 4.2 Let D ⊂ ℝ and let k ∈ ℕ be a positive integer. We denote by C^k(D) the set of all functions f: D → ℝ such that f^{(k)}(x) exists for all x ∈ D and f^{(k)}(x) is continuous. Obviously C(D) ⊃ C¹(D) ⊃ C²(D) ⊃ ⋯. Further, we set

C^∞(D) = ⋂_{k∈ℕ} C^k(D) = { f: D → ℝ | f^{(k)}(x) exists for all k ∈ ℕ, x ∈ D }. (4.7)

f ∈ C^k(D) is called k times continuously differentiable. C(D) = C⁰(D) is the vector space of continuous functions on D.

Using induction over n, one proves the following proposition.

Proposition 4.6 (Leibniz formula) Let f and g be n times differentiable. Then fg is n times differentiable with

(f(x)g(x))^{(n)} = Σ_{k=0}^{n} (n choose k) f^{(k)}(x) g^{(n−k)}(x). (4.8)

4.3 Local Extrema and the Mean Value Theorem
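For polynomials, the Leibniz formula (4.8) can be verified exactly with integer arithmetic. The Python sketch below (an illustration, not from the notes; polynomials are encoded as coefficient lists) checks (fg)^{(3)} for f = x³, g = x², where fg = x⁵ and (x⁵)^{(3)} = 60x²:

```python
from math import comb

# polynomials as coefficient lists: p[i] is the coefficient of x^i
def deriv(p, k=1):
    for _ in range(k):
        p = [i * c for i, c in enumerate(p)][1:] or [0]
    return p

def mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def strip(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

f, g, n = [0, 0, 0, 1], [0, 0, 1], 3          # f = x^3, g = x^2
lhs = deriv(mul(f, g), n)                      # (fg)^(3) = (x^5)^(3) = 60 x^2
rhs = [0]
for k in range(n + 1):                         # sum C(n,k) f^(k) g^(n-k), formula (4.8)
    rhs = add(rhs, [comb(n, k) * c for c in mul(deriv(f, k), deriv(g, n - k))])
print(strip(lhs), strip(rhs))
```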

Many properties of a function f like monotony, convexity, and existence of local extrema can be studied using the derivative f ′. From estimates for f ′ we obtain estimates for the growth of f.

Definition 4.3 Let f: [a, b] → ℝ be a function. We say that f has a local maximum at the point ξ, ξ ∈ (a, b), if there exists δ > 0 such that f(x) ≤ f(ξ) for all x ∈ [a, b] with |x − ξ| < δ. Local minima are defined likewise.

We say that ξ is a local extremum if it is either a local maximum or a local minimum.

Proposition 4.7 Let f be defined on [a, b]. If f has a local extremum at a point ξ ∈ (a, b), and if f'(ξ) exists, then f'(ξ) = 0.

Proof. Suppose f has a local maximum at ξ. In accordance with the definition choose δ > 0 such that a < ξ − δ < ξ < ξ + δ < b and f(x) ≤ f(ξ) for |x − ξ| < δ.
If ξ − δ < x < ξ, then (f(x) − f(ξ))/(x − ξ) ≥ 0, and letting x → ξ − 0 gives f'(ξ) ≥ 0.
If ξ < x < ξ + δ, then (f(x) − f(ξ))/(x − ξ) ≤ 0, and letting x → ξ + 0 gives f'(ξ) ≤ 0. Hence f'(ξ) = 0.

Remarks 4.2 (a) f'(ξ) = 0 is a necessary but not a sufficient condition for a local extremum at ξ. For example f(x) = x³ has f'(0) = 0, but x³ has no local extremum.
(b) If f attains its local extrema at the boundary, like f(x) = x on [0, 1], we need not have f'(ξ) = 0.

Theorem 4.8 (Rolle's Theorem) Let f: [a, b] → ℝ be continuous with f(a) = f(b) and let f be differentiable in (a, b). Then there exists a point ξ ∈ (a, b) with f'(ξ) = 0.
In particular, between two zeros of a differentiable function there is a zero of its derivative.

Proof. If f is the constant function, the theorem is trivial since f'(x) ≡ 0 on (a, b). Otherwise, there exists x₀ ∈ (a, b) such that f(x₀) > f(a) or f(x₀) < f(a). Then f attains its maximum or minimum, respectively, at a point ξ ∈ (a, b). By Proposition 4.7, f'(ξ) = 0.

Theorem 4.9 (Mean Value Theorem) Let f: [a, b] → ℝ be continuous and differentiable in (a, b). Then there exists a point ξ ∈ (a, b) such that

f'(ξ) = (f(b) − f(a))/(b − a). (4.9)

Geometrically, the mean value theorem states that there exists a tangent line through some point (ξ, f(ξ)) which is parallel to the secant line AB, A = (a, f(a)), B = (b, f(b)).
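A concrete instance of the mean value theorem: for f(x) = x³ on [0, 1] the secant slope is 1, and (4.9) demands 3ξ² = 1, i.e. ξ = 1/√3. The Python sketch below (illustrative; it exploits that f' is increasing here) locates ξ by bisection:

```python
f = lambda x: x ** 3
a, b = 0.0, 1.0
slope = (f(b) - f(a)) / (b - a)       # secant slope, here 1
fprime = lambda x: 3 * x * x          # f' is continuous and increasing on [0, 1]
lo, hi = a, b
for _ in range(80):                   # bisect for f'(xi) = slope
    mid = (lo + hi) / 2
    if fprime(mid) < slope:
        lo = mid
    else:
        hi = mid
xi = (lo + hi) / 2
print(xi, 3 ** -0.5)                  # xi = 1/sqrt(3) ~ 0.5774
```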

Theorem 4.10 (Generalized Mean Value Theorem) Let f and g be continuous functions on [a, b] which are differentiable on (a, b). Then there exists a point ξ ∈ (a, b) such that

(f(b) − f(a)) g'(ξ) = (g(b) − g(a)) f'(ξ).

Proof. Put h(t) = (f(b) − f(a)) g(t) − (g(b) − g(a)) f(t). Then h is continuous in [a, b] and differentiable in (a, b) and

h(a) = f(b)g(a) − f(a)g(b) = h(b).

Rolle's theorem shows that there exists ξ ∈ (a, b) such that

h'(ξ) = (f(b) − f(a)) g'(ξ) − (g(b) − g(a)) f'(ξ) = 0.

The theorem follows.

In case g' is nonzero on (a, b) and g(b) − g(a) ≠ 0, the generalized MVT states the existence of some ξ ∈ (a, b) such that

(f(b) − f(a))/(g(b) − g(a)) = f'(ξ)/g'(ξ).

This is in particular true for g(x) = x and g' = 1, which gives the assertion of the Mean Value Theorem.

Remark 4.3 Note that the MVT fails if f is complex-valued, continuous on [a, b], and differentiable on (a, b). Indeed, f(x) = e^{ix} on [0, 2π] is a counterexample: f is continuous on [0, 2π], differentiable on (0, 2π), and f(0) = f(2π) = 1. However, there is no ξ ∈ (0, 2π) such that

0 = (f(2π) − f(0))/(2π) = f'(ξ) = i e^{iξ},

since the exponential function has no zero, see (3.7) (e^z e^{−z} = 1) in Subsection 3.5.1.

Corollary 4.11 Suppose f is differentiable on (a, b).

If f'(x) ≥ 0 for all x ∈ (a, b), then f is monotonically increasing.
If f'(x) = 0 for all x ∈ (a, b), then f is constant.
If f'(x) ≤ 0 for all x ∈ (a, b), then f is monotonically decreasing.

Proof. All conclusions can be read off from the equality

f(x) − f(t) = (x − t) f'(ξ),

which holds, for each pair x, t with a < t < x < b, for some ξ ∈ (t, x) by the mean value theorem.

4.3.1 Local Extrema and Convexity

Proposition 4.12 Let f: (a, b) → ℝ be differentiable and suppose f''(ξ) exists at a point ξ ∈ (a, b). If

f'(ξ) = 0 and f''(ξ) > 0,

then f has a local minimum at ξ. Similarly, if

f'(ξ) = 0 and f''(ξ) < 0,

f has a local maximum at ξ.

Remark 4.4 The condition of Proposition 4.12 is sufficient but not necessary for the existence of a local extremum. For example, f(x) = x⁴ has a local minimum at x = 0, but f''(0) = 0.

Proof. We consider the case f''(ξ) > 0; the proof of the other case is analogous. Since

f''(ξ) = lim_{x→ξ} (f'(x) − f'(ξ))/(x − ξ) > 0,

by Homework 10.4 there exists δ > 0 such that

(f'(x) − f'(ξ))/(x − ξ) > |f''(ξ)|/2 > 0, for all x with 0 < |x − ξ| < δ.

Since f'(ξ) = 0 it follows that

f'(x) < 0 if ξ − δ < x < ξ, and f'(x) > 0 if ξ < x < ξ + δ.

Hence, by Corollary 4.11, f is decreasing in (ξ − δ, ξ) and increasing in (ξ, ξ + δ). Therefore, f has a local minimum at ξ.

Definition 4.4 A function f: (a, b) → ℝ is said to be convex if for all x, y ∈ (a, b) and all λ ∈ [0, 1]

f(λx + (1 − λ)y) ≤ λ f(x) + (1 − λ) f(y). (4.10)

A function f is said to be concave if −f is convex.

Proposition 4.13 (a) Convex functions are continuous.

(b) Suppose f: (a, b) → ℝ is twice differentiable. Then f is convex if and only if f''(x) ≥ 0 for all x ∈ (a, b).

Proof. The proof is in Appendix C to this chapter.

4.4 L’Hospital’s Rule

Theorem 4.14 (L'Hospital's Rule) Suppose f and g are differentiable in (a, b) with g'(x) ≠ 0 for all x ∈ (a, b), where −∞ ≤ a < b ≤ +∞. Suppose

lim_{x→a+0} f'(x)/g'(x) = A. (4.11)

If

(a) lim_{x→a+0} f(x) = lim_{x→a+0} g(x) = 0, or (4.12)
(b) lim_{x→a+0} f(x) = lim_{x→a+0} g(x) = +∞, (4.13)

then

lim_{x→a+0} f(x)/g(x) = A. (4.14)

The analogous statements are of course also true if x → b − 0, or if g(x) → −∞.

Proof. First we consider the case of finite a ∈ ℝ. (a) One can extend the definition of f and g via f(a) = g(a) = 0. Then f and g are continuous at a. By the generalized mean value theorem, for every x ∈ (a, b) there exists a ξ ∈ (a, x) such that

f(x)/g(x) = (f(x) − f(a))/(g(x) − g(a)) = f'(ξ)/g'(ξ).

If x approaches a then ξ also approaches a, and (a) follows.
(b) Now let f(a + 0) = g(a + 0) = +∞. Given ε > 0 choose δ > 0 such that

|f'(t)/g'(t) − A| < ε

if t ∈ (a, a + δ). By the generalized mean value theorem, for any x, y ∈ (a, a + δ) with x ≠ y,

|(f(x) − f(y))/(g(x) − g(y)) − A| < ε.

We have

f(x)/g(x) = (f(x) − f(y))/(g(x) − g(y)) · (1 − g(y)/g(x))/(1 − f(y)/f(x)).

The right factor tends to 1 as x approaches a; in particular there exists δ₁ > 0 with δ₁ < δ such that x ∈ (a, a + δ₁) implies

|f(x)/g(x) − (f(x) − f(y))/(g(x) − g(y))| < ε.

Further, the triangle inequality gives

|f(x)/g(x) − A| < 2ε.

This proves (b).
The case x → +∞ can be reduced to the limit process y → 0 + 0 using the substitution y = 1/x.

L'Hospital's rule also applies in the cases A = +∞ and A = −∞.

Example 4.4 (a) lim_{x→0} sin x / x = lim_{x→0} cos x / 1 = 1.
(b) lim_{x→0+0} √x/(1 − cos x) = lim_{x→0+0} (1/(2√x))/sin x = lim_{x→0+0} 1/(2√x sin x) = +∞.
(c) lim_{x→0+0} x log x = lim_{x→0+0} log x/(1/x) = lim_{x→0+0} (1/x)/(−1/x²) = lim_{x→0+0} (−x) = 0.

Remark 4.5 It is easy to transform other indefinite expressions to the forms 0/0 or ∞/∞ of l'Hospital's rule:

0·∞: f·g = f/(1/g);
∞ − ∞: f − g = (1/g − 1/f)/(1/(fg));
0⁰: f^g = e^{g log f}.

Similarly, expressions of the form 1^∞ and ∞⁰ can be transformed.
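The limits in Example 4.4 can be watched converging numerically. A quick Python check (illustrative only) of (a) and (c) as x → 0+0:

```python
import math

for x in (1e-1, 1e-3, 1e-6):
    # (a) sin x / x -> 1 and (c) x log x -> 0 as x -> 0+0
    print(x, math.sin(x) / x, x * math.log(x))
```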

4.5 Taylor’s Theorem

The aim of this section is to show how n times differentiable functions can be approximated by polynomials of degree n. First consider a polynomial p(x) = aₙxⁿ + ⋯ + a₁x + a₀. We compute

p'(x) = n aₙ x^{n−1} + (n−1) a_{n−1} x^{n−2} + ⋯ + a₁,
p''(x) = n(n−1) aₙ x^{n−2} + (n−1)(n−2) a_{n−1} x^{n−3} + ⋯ + 2a₂,
⋮
p^{(n)}(x) = n! aₙ.

Inserting x = 0 gives p(0) = a₀, p'(0) = a₁, p''(0) = 2a₂, …, p^{(n)}(0) = n! aₙ. Hence,

p(x) = p(0) + (p'(0)/1!) x + (p''(0)/2!) x² + ⋯ + (p^{(n)}(0)/n!) xⁿ. (4.15)

Now fix a ∈ ℝ and let q(x) = p(x + a). Since q^{(k)}(0) = p^{(k)}(a), (4.15) gives

p(x + a) = q(x) = Σ_{k=0}^{n} (q^{(k)}(0)/k!) x^k = Σ_{k=0}^{n} (p^{(k)}(a)/k!) x^k.

Replacing in the above equation x + a by x yields

p(x) = Σ_{k=0}^{n} (p^{(k)}(a)/k!) (x − a)^k. (4.16)

Theorem 4.15 (Taylor's Theorem) Suppose f is a real function on [r, s], n ∈ ℕ, f^{(n)} is continuous on [r, s], and f^{(n+1)}(t) exists for all t ∈ (r, s). Let a and x be distinct points of [r, s] and define

Pₙ(x) = Σ_{k=0}^{n} (f^{(k)}(a)/k!) (x − a)^k. (4.17)

Then there exists a point ξ between x and a such that

f(x) = Pₙ(x) + (f^{(n+1)}(ξ)/(n+1)!) (x − a)^{n+1}. (4.18)

For n = 0, this is just the mean value theorem. Pₙ(x) is called the nth Taylor polynomial of f at x = a, and the second summand of (4.18),

R_{n+1}(x, a) = (f^{(n+1)}(ξ)/(n+1)!) (x − a)^{n+1},

is called the Lagrangian remainder term. In general, the theorem shows that f can be approximated by a polynomial of degree n, and (4.18) allows us to estimate the error if we know bounds for |f^{(n+1)}(x)|.

Proof. Consider a and x to be fixed; let M be the number defined by

f(x) = Pₙ(x) + M (x − a)^{n+1}

and put

g(t) = f(t) − Pₙ(t) − M (t − a)^{n+1}, for r ≤ t ≤ s. (4.19)

We have to show that (n+1)! M = f^{(n+1)}(ξ) for some ξ between a and x. By (4.17) and (4.19),

g^{(n+1)}(t) = f^{(n+1)}(t) − (n+1)! M, for r < t < s. (4.20)

Hence the proof will be complete if we can show that g^{(n+1)}(ξ) = 0 for some ξ between a and x. Since Pₙ^{(k)}(a) = f^{(k)}(a) for k = 0, …, n, we have

g(a) = g'(a) = ⋯ = g^{(n)}(a) = 0.

Our choice of M shows that g(x) = 0, so that g'(ξ₁) = 0 for some ξ₁ between a and x, by Rolle's theorem. Since g'(a) = 0 we conclude similarly that g''(ξ₂) = 0 for some ξ₂ between a and ξ₁. After n + 1 steps we arrive at the conclusion that g^{(n+1)}(ξ_{n+1}) = 0 for some ξ_{n+1} between a and ξₙ, that is, between a and x.
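Taylor's theorem in action: for f(x) = e^x at a = 0 we have f^{(n+1)} = e^x, so on [0, x] the Lagrangian remainder is bounded by e^x·x^{n+1}/(n+1)!. The Python sketch below (illustrative only) checks that the actual error of the 10th Taylor polynomial at x = 1 stays within this bound:

```python
import math

def taylor_exp(x, n):
    # nth Taylor polynomial of e^x at a = 0, formula (4.17)
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

x, n = 1.0, 10
err = abs(math.exp(x) - taylor_exp(x, n))
# Lagrangian remainder bound from (4.18): |f^(n+1)(xi)| <= e^x for xi in (0, x)
bound = math.exp(x) * x ** (n + 1) / math.factorial(n + 1)
print(err, bound)
```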

Definition 4.5 Suppose that f is a real function defined on [r, s] such that f^{(n)}(t) exists for all t ∈ (r, s) and all n ∈ ℕ. Let x and a be points of [r, s]. Then

T_f(x) = Σ_{k=0}^{∞} (f^{(k)}(a)/k!) (x − a)^k (4.21)

is called the Taylor series of f at a.

Remarks 4.6 (a) The radius r of convergence of a Taylor series can be 0.
(b) If T_f converges, it may happen that T_f(x) ≠ f(x). If T_f at a point a converges to f(x) in a certain neighborhood U_r(a), r > 0, then f is said to be analytic at a.

Example 4.5 We give an example for (b). Define f: ℝ → ℝ via

f(x) = e^{−1/x²} if x ≠ 0, and f(0) = 0.

We will show that f ∈ C^∞(ℝ) with f^{(k)}(0) = 0 for all k. For this we will prove by induction on n that there exists a polynomial pₙ such that

f^{(n)}(x) = pₙ(1/x) e^{−1/x²}, x ≠ 0,

and f^{(n)}(0) = 0. For n = 0 the statement is clear, taking p₀(x) ≡ 1. Suppose the statement is true for n. First, let x ≠ 0; then

f^{(n+1)}(x) = (pₙ(1/x) e^{−1/x²})' = (−(1/x²) pₙ'(1/x) + (2/x³) pₙ(1/x)) e^{−1/x²}.

Choose p_{n+1}(t) = −pₙ'(t) t² + 2 pₙ(t) t³.
Secondly,

f^{(n+1)}(0) = lim_{h→0} (f^{(n)}(h) − f^{(n)}(0))/h = lim_{h→0} pₙ(1/h) e^{−1/h²}/h = lim_{x→±∞} x pₙ(x) e^{−x²} = 0,

where we used Proposition 2.7 in the last equality.
Hence T_f ≡ 0 at 0—the Taylor series is identically 0—and T_f(x) does not converge to f(x) in a neighborhood of 0.

4.5.1 Examples of Taylor Series
(a) Power series coincide with their Taylor series.

e^x = Σ_{n=0}^{∞} xⁿ/n!, x ∈ ℝ;  1/(1 − x) = Σ_{n=0}^{∞} xⁿ, x ∈ (−1, 1).

(b) f(x) = log(1 + x), see Homework 13.4.

(c) f(x) = (1 + x)^α, α ∈ ℝ, a = 0. We have

f^{(k)}(x) = α(α−1)⋯(α−k+1)(1 + x)^{α−k}, in particular f^{(k)}(0) = α(α−1)⋯(α−k+1).

Therefore,

(1 + x)^α = Σ_{k=0}^{n} (α(α−1)⋯(α−k+1)/k!) x^k + Rₙ(x) = Σ_{k=0}^{n} (α choose k) x^k + Rₙ(x). (4.22)

The ratio test shows that the corresponding power series converges for |x| < 1. Consider the Lagrangian remainder term with 0 < ξ < x < 1. Then

|R_{n+1}(x)| = |(α choose n+1)| (1 + ξ)^{α−n−1} x^{n+1} ≤ |(α choose n+1)| x^{n+1} → 0

as n → ∞ (for n + 1 > α the exponent α − n − 1 is negative and 1 + ξ > 1). Hence,

(1 + x)^α = Σ_{n=0}^{∞} (α choose n) xⁿ, 0 < x < 1.

(d) f(x) = arctan x, a = 0. Writing y = arctan x, we have y'(1 + x²) = 1. Differentiating this n times and using Leibniz's formula, Proposition 4.6, we have

Σ_{k=0}^{n} (n choose k) (y')^{(k)} (1 + x²)^{(n−k)} = 0
⟹ y^{(n+1)}(1 + x²) + (n choose 1) y^{(n)}·2x + (n choose 2) y^{(n−1)}·2 = 0;
x = 0: y^{(n+1)}(0) + n(n−1) y^{(n−1)}(0) = 0.

This yields

y^{(n)}(0) = 0 if n = 2k, and y^{(n)}(0) = (−1)^k (2k)! if n = 2k + 1.

Therefore,

arctan x = Σ_{k=0}^{n} ((−1)^k/(2k+1)) x^{2k+1} + R_{2n+2}(x). (4.24)

One can prove that −1 < x ≤ 1 implies R_{2n+2}(x) → 0 as n → ∞. In particular, x = 1 gives

π/4 = 1 − 1/3 + 1/5 − 1/7 + ⋯.
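The Leibniz series for π/4 converges, but slowly: the error of a partial sum of an alternating series is below the first omitted term. A quick Python check (illustrative only):

```python
import math

# partial sum of 1 - 1/3 + 1/5 - ... with 100000 terms
partial = sum((-1) ** k / (2 * k + 1) for k in range(100000))
print(partial, math.pi / 4)   # agree to about 5e-6, the size of the next term
```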

4.6 Appendix C

Corollary 4.16 (to the mean value theorem) Let f: ℝ → ℝ be a differentiable function with

f'(x) = c f(x) for all x ∈ ℝ, (4.25)

where c ∈ ℝ is a fixed number. Let A = f(0). Then

f(x) = A e^{cx} for all x ∈ ℝ. (4.26)

Proof. Consider F(x) = f(x) e^{−cx}. Using the product rule for derivatives and (4.25) we obtain

F'(x) = f'(x) e^{−cx} + f(x)(−c) e^{−cx} = (f'(x) − c f(x)) e^{−cx} = 0.

By Corollary 4.11, F(x) is constant. Since F(0) = f(0) = A, F(x) = A for all x ∈ ℝ; the statement follows.

The Continuity of Derivatives

We have seen that there exist derivatives f' which are not continuous at some point. However, not every function is a derivative. In particular, derivatives which exist at every point of an interval have one important property: the intermediate value theorem holds. The precise statement follows.

Proposition 4.17 Suppose f is differentiable on [a, b] and suppose f'(a) < λ < f'(b). Then there is a point x ∈ (a, b) such that f'(x) = λ.

Proof. Put g(t) = f(t) − λt. Then g'(a) < 0, so that g(t₁) < g(a) for some t₁ ∈ (a, b), and g'(b) > 0, so that g(t₂) < g(b) for some t₂ ∈ (a, b). Hence g attains its minimum on [a, b] in the open interval (a, b), at some point x ∈ (a, b). By Proposition 4.7, g'(x) = 0. Hence, f'(x) = λ.

Corollary 4.18 If f is differentiable on [a, b], then f' cannot have discontinuities of the first kind.

Proof of Proposition 4.13. (a) Suppose first that f'' ≥ 0 for all x. By Corollary 4.11, f' is increasing. Let a < x < u < y < b, say u = λx + (1 − λ)y with λ ∈ (0, 1). By the mean value theorem there are ξ₁ ∈ (x, u) and ξ₂ ∈ (u, y) with

(f(u) − f(x))/(u − x) = f'(ξ₁) ≤ f'(ξ₂) = (f(y) − f(u))/(y − u).

Since u − x = (1 − λ)(y − x) and y − u = λ(y − x), this rearranges to f(u) ≤ λ f(x) + (1 − λ) f(y).

Hence, f is convex.

(b) Let f: (a, b) → ℝ be convex and twice differentiable. Suppose to the contrary f''(x₀) < 0 for some x₀ ∈ (a, b). Let c = f'(x₀); put

ϕ(x) = f(x) − (x − x₀) c.

Then ϕ: (a, b) → ℝ is twice differentiable with ϕ'(x₀) = 0 and ϕ''(x₀) < 0. Hence, by Proposition 4.12, ϕ has a local maximum at x₀. By definition, there is a δ > 0 such that U_δ(x₀) ⊂ (a, b) and

ϕ(x₀ − δ) < ϕ(x₀), ϕ(x₀ + δ) < ϕ(x₀).

It follows that

f(x₀) = ϕ(x₀) > (1/2)(ϕ(x₀ − δ) + ϕ(x₀ + δ)) = (1/2)(f(x₀ − δ) + f(x₀ + δ)).

This contradicts the convexity of f if we set x = x₀ − δ, y = x₀ + δ, and λ = 1/2.

Chapter 5

Integration

In the first section of this chapter derivatives will not appear! Roughly speaking, integration generalizes “addition”. The formula distance = velocity × time is only valid for constant velocity. The right formula is s = ∫_{t₀}^{t₁} v(t) dt. We need integrals to compute lengths of curves, areas of surfaces, and volumes.
The study of integrals requires a long preparation, but once this preliminary work has been completed, integrals will be an invaluable tool for creating new functions, and the derivative will reappear more powerful than ever. The relation between the integral and derivatives is given in the Fundamental Theorem of Calculus.

The integral formalizes a simple intuitive concept—that of area. It is no surprise that learning the definition of an intuitive concept can present great difficulties—“area” is certainly not an exception.

5.1 The Riemann–Stieltjes Integral

In this section we will only define the area of some very special regions—those which are bounded by the horizontal axis, the vertical lines through (a, 0) and (b, 0), and the graph of a function f such that f(x) ≥ 0 for all x in [a, b]. If f is negative on a subinterval of [a, b], the integral will represent the difference of the areas above and below the x-axis. All intervals [a, b] are finite intervals.

Definition 5.1 Let [a, b] be an interval. By a partition of [a, b] we mean a finite set of points x₀, x₁, …, xₙ, where

a = x₀ ≤ x₁ ≤ ⋯ ≤ xₙ = b.

We write

Δxᵢ = xᵢ − x_{i−1}, i = 1, …, n.

Now suppose f is a bounded real function defined on [a, b]. Corresponding to each partition P of [a, b] we put

Mᵢ = sup { f(x) | x ∈ [x_{i−1}, xᵢ] }, (5.1)
mᵢ = inf { f(x) | x ∈ [x_{i−1}, xᵢ] }, (5.2)
U(P, f) = Σ_{i=1}^{n} Mᵢ Δxᵢ, L(P, f) = Σ_{i=1}^{n} mᵢ Δxᵢ, (5.3)

and finally

∫_a^b f dx (upper integral) = inf U(P, f), (5.4)
∫_a^b f dx (lower integral) = sup L(P, f), (5.5)

where the infimum and supremum are taken over all partitions P of [a, b]. The left members of (5.4) and (5.5) are called the upper and lower Riemann integrals of f over [a, b], respectively. If the upper and lower integrals are equal, we say that f is Riemann-integrable on [a, b] and we write f ∈ R (that is, R denotes the set of Riemann-integrable functions), and we denote the common value of (5.4) and (5.5) by

∫_a^b f dx or by ∫_a^b f(x) dx. (5.6)

This is the Riemann integral of f over [a, b]. Since f is bounded, there exist two numbers m and M such that m ≤ f(x) ≤ M for all x ∈ [a, b]. Hence for every partition P,

m(b − a) ≤ L(P, f) ≤ U(P, f) ≤ M(b − a),

so that the numbers L(P, f) and U(P, f) form a bounded set. This shows that the upper and the lower integrals are defined for every bounded function f. The question of their equality, and hence the question of the integrability of f, is a more delicate one. Instead of investigating it separately for the Riemann integral, we shall immediately consider a more general situation.
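For a concrete feel for (5.3)–(5.5), here is a small Python sketch (illustrative, not from the text) computing U(P, f) and L(P, f) for f(x) = x² on [0, 1] with an equidistant partition; since f is increasing, mᵢ = f(x_{i−1}) and Mᵢ = f(xᵢ):

```python
n = 1000
xs = [i / n for i in range(n + 1)]          # equidistant partition of [0, 1]
# f(x) = x^2 is increasing, so m_i = f(x_{i-1}) and M_i = f(x_i)
L = sum(xs[i - 1] ** 2 / n for i in range(1, n + 1))
U = sum(xs[i] ** 2 / n for i in range(1, n + 1))
print(L, U)                                  # both approach the integral 1/3
```

Here U − L = (f(1) − f(0))/n = 1/n, so the two sums squeeze the integral as n grows.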

Definition 5.2 Let α be a monotonically increasing function on [a, b] (since α(a) and α(b) are finite, it follows that α is bounded on [a, b]). Corresponding to each partition P of [a, b], we write

Δαᵢ = α(xᵢ) − α(x_{i−1}).

It is clear that Δαᵢ ≥ 0. For any real function f which is bounded on [a, b] we put

U(P, f, α) = Σ_{i=1}^{n} Mᵢ Δαᵢ, (5.7)
L(P, f, α) = Σ_{i=1}^{n} mᵢ Δαᵢ, (5.8)

where Mᵢ and mᵢ have the same meaning as in Definition 5.1, and we define

∫_a^b f dα (upper integral) = inf U(P, f, α), (5.9)
∫_a^b f dα (lower integral) = sup L(P, f, α), (5.10)

where the infimum and the supremum are taken over all partitions P. If the left members of (5.9) and (5.10) are equal, we denote their common value by

∫_a^b f dα or sometimes by ∫_a^b f(x) dα(x). (5.11)

This is the Riemann–Stieltjes integral (or simply the Stieltjes integral) of f with respect to α over [a, b]. If (5.11) exists, we say that f is integrable with respect to α in the Riemann sense, and write f ∈ R(α).
By taking α(x) = x, the Riemann integral is seen to be a special case of the Riemann–Stieltjes integral. Let us mention explicitly that in the general case α need not even be continuous.
We shall now investigate the existence of the integral (5.11). Without saying so every time, f will be assumed real and bounded, and α increasing on [a, b].

Definition 5.3 We say that a partition P* is a refinement of the partition P if P* ⊃ P (that is, every point of P is a point of P*). Given two partitions P₁ and P₂, we say that P* is their common refinement if P* = P₁ ∪ P₂.

Lemma 5.1 If P* is a refinement of P, then

L(P, f, α) ≤ L(P*, f, α) and U(P, f, α) ≥ U(P*, f, α). (5.12)

Proof. We only prove the first inequality of (5.12); the proof of the second one is analogous.
Suppose first that P* contains just one point more than P. Let this extra point be x*, and suppose x_{i−1} ≤ x* < xᵢ, where x_{i−1} and xᵢ are two consecutive points of P. Put

w₁ = inf { f(x) | x ∈ [x_{i−1}, x*] }, w₂ = inf { f(x) | x ∈ [x*, xᵢ] }.

Clearly, w₁ ≥ mᵢ and w₂ ≥ mᵢ (since inf M ≥ inf N if M ⊂ N, see Homework 1.4 (b)), where, as before, mᵢ = inf { f(x) | x ∈ [x_{i−1}, xᵢ] }. Hence

L(P*, f, α) − L(P, f, α) = w₁(α(x*) − α(x_{i−1})) + w₂(α(xᵢ) − α(x*)) − mᵢ(α(xᵢ) − α(x_{i−1}))
= (w₁ − mᵢ)(α(x*) − α(x_{i−1})) + (w₂ − mᵢ)(α(xᵢ) − α(x*)) ≥ 0.

If P* contains k points more than P, we repeat this reasoning k times and arrive at (5.12).

Proposition 5.2 The lower integral never exceeds the upper integral:

∫_a^b f dα (lower) ≤ ∫_a^b f dα (upper).

Proof. Let P* be the common refinement of two partitions P₁ and P₂. By Lemma 5.1,

L(P₁, f, α) ≤ L(P*, f, α) ≤ U(P*, f, α) ≤ U(P₂, f, α).

Hence

L(P₁, f, α) ≤ U(P₂, f, α). (5.13)

If P₂ is fixed and the supremum is taken over all P₁, (5.13) gives

∫_a^b f dα (lower) ≤ U(P₂, f, α). (5.14)

The proposition follows by taking the infimum over all P₂ in (5.14).

Proposition 5.3 (Riemann Criterion) f ∈ R(α) on [a, b] if and only if for every ε > 0 there exists a partition P such that

U(P, f, α) − L(P, f, α) < ε. (RC)

Proof. For every P we have

L(P, f, α) ≤ ∫_a^b f dα (lower) ≤ ∫_a^b f dα (upper) ≤ U(P, f, α).

Thus (RC) implies

0 ≤ ∫_a^b f dα (upper) − ∫_a^b f dα (lower) < ε.

Since the above inequality can be satisfied for every ε > 0, we have

∫_a^b f dα (upper) = ∫_a^b f dα (lower),

that is, f ∈ R(α).
Conversely, suppose f ∈ R(α), and let ε > 0 be given. Then there exist partitions P₁ and P₂ such that

U(P₂, f, α) − ∫_a^b f dα < ε/2, ∫_a^b f dα − L(P₁, f, α) < ε/2. (5.15)

We choose P to be the common refinement of P₁ and P₂. Then Lemma 5.1, together with (5.15), shows that

U(P, f, α) ≤ U(P₂, f, α) < ∫_a^b f dα + ε/2 < L(P₁, f, α) + ε ≤ L(P, f, α) + ε,

so that (RC) holds for this partition P.

Proposition5.3 furnishes a convenient criterion for integrability. Before we apply it, we state some closely related facts.

Lemma 5.4 (a) If (RC) holds for P and some ε, then (RC) holds with the same ε for every refinement of P.

(b) If (RC) holds for P = {x₀, …, xₙ} and if sᵢ, tᵢ are arbitrary points in [x_{i−1}, xᵢ], then

Σ_{i=1}^{n} |f(sᵢ) − f(tᵢ)| Δαᵢ < ε.

(c) If f ∈ R(α) and (RC) holds as in (b), then

|Σ_{i=1}^{n} f(tᵢ) Δαᵢ − ∫_a^b f dα| < ε.

Proof. Lemma 5.1 implies (a). Under the assumptions made in (b), both f(sᵢ) and f(tᵢ) lie in [mᵢ, Mᵢ], so that |f(sᵢ) − f(tᵢ)| ≤ Mᵢ − mᵢ. Thus

Σ_{i=1}^{n} |f(sᵢ) − f(tᵢ)| Δαᵢ ≤ U(P, f, α) − L(P, f, α),

which proves (b). The obvious inequalities

L(P, f, α) ≤ Σᵢ f(tᵢ) Δαᵢ ≤ U(P, f, α)

and

L(P, f, α) ≤ ∫_a^b f dα ≤ U(P, f, α)

prove (c).

Theorem 5.5 If f is continuous on [a, b] then f ∈ R(α) on [a, b].

Proof. Let ε > 0 be given. Choose η > 0 so that

(α(b) − α(a)) η < ε.

Since f is uniformly continuous on [a, b] (Proposition 3.7), there exists a δ > 0 such that

|f(x) − f(t)| < η (5.16)

if x, t ∈ [a, b] and |x − t| < δ. If P is any partition of [a, b] such that Δxᵢ < δ for all i, then (5.16) implies that

Mᵢ − mᵢ ≤ η, i = 1, …, n, (5.17)

and therefore

U(P, f, α) − L(P, f, α) = Σ_{i=1}^{n} (Mᵢ − mᵢ) Δαᵢ ≤ η Σ_{i=1}^{n} Δαᵢ = η (α(b) − α(a)) < ε.

By Proposition 5.3, f ∈ R(α).

Example 5.1 (a) The proof of Theorem 5.5 together with Lemma 5.4 shows that

|Σ_{i=1}^{n} f(tᵢ) Δαᵢ − ∫_a^b f dα| < ε

if Δxᵢ < δ for all i.
We compute I = ∫_a^b sin x dx. Let ε > 0. Since sin x is continuous, sin ∈ R. There exists δ > 0 such that |x − t| < δ implies

|sin x − sin t| < ε/(b − a). (5.18)

In this case (RC) is satisfied, and consequently

|Σ_{i=1}^{n} sin(tᵢ) Δxᵢ − ∫_a^b sin x dx| < ε

for every partition P with Δxᵢ < δ, i = 1, …, n.
We choose an equidistant partition of [a, b], xᵢ = a + (b − a)i/n, i = 0, …, n. Then h = Δxᵢ = (b − a)/n, and condition (5.18) is satisfied provided n > (b − a)²/ε. By the addition formula cos(x − y) − cos(x + y) = 2 sin x sin y we have

Σ_{i=1}^{n} sin(xᵢ) Δxᵢ = Σ_{i=1}^{n} sin(a + ih) h = (h/(2 sin(h/2))) Σ_{i=1}^{n} 2 sin(h/2) sin(a + ih)
= (h/(2 sin(h/2))) Σ_{i=1}^{n} (cos(a + (i − 1/2)h) − cos(a + (i + 1/2)h))
= (h/(2 sin(h/2))) (cos(a + h/2) − cos(a + (n + 1/2)h))
= ((h/2)/sin(h/2)) (cos(a + h/2) − cos(b + h/2)).

Since lim_{h→0} sin h / h = 1 and cos x is continuous, we find that the above expression tends to cos a − cos b. Hence ∫_a^b sin x dx = cos a − cos b.
(b) For x ∈ [a, b] define
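The telescoping computation above can be cross-checked numerically: a plain right-endpoint Riemann sum for sin on [0, π] should approach cos 0 − cos π = 2. A quick Python sketch (illustrative only):

```python
import math

a, b, n = 0.0, math.pi, 100000
h = (b - a) / n
# right-endpoint Riemann sum  sum sin(x_i) * h  on the equidistant partition
riemann = sum(math.sin(a + i * h) for i in range(1, n + 1)) * h
print(riemann, math.cos(a) - math.cos(b))   # both close to 2
```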

f(x) = 1 if x ∈ ℚ, and f(x) = 0 if x ∉ ℚ.

We will show f ∉ R. Let P be any partition of [a, b]. Since any interval contains rational as well as irrational points, mᵢ = 0 and Mᵢ = 1 for all i. Hence L(P, f) = 0 whereas U(P, f) = Σ_{i=1}^{n} Δxᵢ = b − a. We conclude that the upper and lower Riemann integrals don't coincide; f ∉ R. A similar reasoning shows f ∉ R(α) if α(b) > α(a), since L(P, f, α) = 0 < U(P, f, α) = α(b) − α(a).

Proposition 5.6 If f is monotonic on [a, b], and α is continuous on [a, b], then f ∈ R(α).

Proof. Let ε > 0 be given. For any positive integer n, choose a partition P such that

Δαᵢ = (α(b) − α(a))/n, i = 1, …, n.

This is possible by the intermediate value theorem (Theorem 3.5) since α is continuous.
We suppose that f is monotonically increasing (the proof is analogous in the other case). Then

Mᵢ = f(xᵢ), mᵢ = f(x_{i−1}), i = 1, …, n,

so that

U(P, f, α) − L(P, f, α) = ((α(b) − α(a))/n) Σ_{i=1}^{n} (f(xᵢ) − f(x_{i−1})) = ((α(b) − α(a))/n)(f(b) − f(a)) < ε

if n is taken large enough. By Proposition 5.3, f ∈ R(α).

We note the following facts; the proofs, which we omit, can be found in [Rud76, pp. 126–128].

Proposition 5.7 Suppose f is bounded on [a, b], f has finitely many points of discontinuity on [a, b], and α is continuous at every point at which f is discontinuous. Then f ∈ R(α).

Proof. We give an idea of the proof in the case of the Riemann integral (α(x) = x) and one single discontinuity at c, a < c < b. Let ε > 0 be given, let m ≤ f(x) ≤ M for all x ∈ [a, b], and put C = M − m. First choose points a' and b' with a < a' < c < b' < b and C(b' − a') < ε. Since f is continuous, hence integrable, on [a, a'] and on [b', b], there are partitions P₁ of [a, a'] and P₂ of [b', b] with U(Pⱼ, f) − L(Pⱼ, f) ≤ ε, j = 1, 2. Let P be the partition of [a, b] consisting of the points of P₁ and P₂. Then

U(P, f) − L(P, f) = U(P₁, f) − L(P₁, f) + U(P₂, f) − L(P₂, f) + (M₀ − m₀)(b' − a')
≤ ε + ε + C(b' − a') < 3ε,

where M₀ and m₀ are the supremum and infimum of f(x) on [a', b']. The Riemann criterion is satisfied for f on [a, b]; f ∈ R.

Proposition 5.8 If f ∈ R(α) on [a, b], m ≤ f(x) ≤ M, ϕ is continuous on [m, M], and h(x) = ϕ(f(x)) on [a, b], then h ∈ R(α) on [a, b].

Remark 5.1 (a) A bounded function f is Riemann-integrable on [a, b] if and only if f is continuous almost everywhere on [a, b]. (The proof of this fact can be found in [Rud76, Theorem 11.33].)
“Almost everywhere” means that the discontinuities form a set of (Lebesgue) measure 0. A set M ⊂ ℝ has measure 0 if for given ε > 0 there exist intervals Iₙ, n ∈ ℕ, such that M ⊂ ⋃_{n∈ℕ} Iₙ and Σ_{n∈ℕ} |Iₙ| < ε. Here, |I| denotes the length of the interval I. Examples of sets of measure 0 are finite sets, countable sets, and the Cantor set (which is uncountable).
(b) Note that such a “chaotic” function (at the point 0) as

f(x) = cos(1/x) if x ≠ 0, and f(0) = 0,

is integrable on [−π, π] since there is only one single discontinuity, at 0.

5.1.1 Properties of the Integral

Proposition 5.9 (a) If f₁, f₂ ∈ R(α) on [a, b] then f₁ + f₂ ∈ R(α), cf ∈ R(α) for every constant c, and

∫_a^b (f₁ + f₂) dα = ∫_a^b f₁ dα + ∫_a^b f₂ dα, ∫_a^b cf dα = c ∫_a^b f dα.

(b) If f₁, f₂ ∈ R(α) and f₁(x) ≤ f₂(x) on [a, b], then

∫_a^b f₁ dα ≤ ∫_a^b f₂ dα.

(c) If f ∈ R(α) on [a, b] and if a < c < b, then f ∈ R(α) on [a, c] and on [c, b], and

∫_a^c f dα + ∫_c^b f dα = ∫_a^b f dα.

(d) If f ∈ R(α) on [a, b] and |f(x)| ≤ M on [a, b], then

|∫_a^b f dα| ≤ M (α(b) − α(a)).

(e) If f ∈ R(α₁) and f ∈ R(α₂), then f ∈ R(α₁ + α₂) and

∫_a^b f d(α₁ + α₂) = ∫_a^b f dα₁ + ∫_a^b f dα₂;

if f ∈ R(α) and c is a positive constant, then f ∈ R(cα) and

∫_a^b f d(cα) = c ∫_a^b f dα.

Proof. If f = f₁ + f₂ and P is any partition of [a, b], we have

L(P, f₁, α) + L(P, f₂, α) ≤ L(P, f, α) ≤ U(P, f, α) ≤ U(P, f₁, α) + U(P, f₂, α) (5.19)

since inf_{Iᵢ} f₁ + inf_{Iᵢ} f₂ ≤ inf_{Iᵢ} (f₁ + f₂) and sup_{Iᵢ} f₁ + sup_{Iᵢ} f₂ ≥ sup_{Iᵢ} (f₁ + f₂).
If f₁ ∈ R(α) and f₂ ∈ R(α), let ε > 0 be given. There are partitions Pⱼ, j = 1, 2, such that

U(Pⱼ, fⱼ, α) − L(Pⱼ, fⱼ, α) < ε.

These inequalities persist if P₁ and P₂ are replaced by their common refinement P. Then (5.19) implies

U(P, f, α) − L(P, f, α) < 2ε,

which proves that f ∈ R(α). With the same P we have

U(P, fⱼ, α) < ∫_a^b fⱼ dα + ε, j = 1, 2;

since L(P, f, α) ≤ ∫_a^b f dα ≤ U(P, f, α), (5.19) implies

∫_a^b f dα ≤ U(P, f, α) < ∫_a^b f₁ dα + ∫_a^b f₂ dα + 2ε.

Since ε was arbitrary, we conclude that

∫_a^b f dα ≤ ∫_a^b f₁ dα + ∫_a^b f₂ dα. (5.20)

If we replace f₁ and f₂ in (5.20) by −f₁ and −f₂, respectively, the inequality is reversed, and the equality is proved.
(b) Put f = f₂ − f₁. It suffices to prove that ∫_a^b f dα ≥ 0. For every partition P we have mᵢ ≥ 0 since f ≥ 0. Hence

∫_a^b f dα ≥ L(P, f, α) = Σ_{i=1}^{n} mᵢ Δαᵢ ≥ 0,

since in addition Δαᵢ = α(xᵢ) − α(x_{i−1}) ≥ 0 (α is increasing).
The proofs of the other assertions are so similar that we omit the details. In part (c) the point is that (by passing to refinements) we may restrict ourselves to partitions which contain the point c, in approximating ∫_a^b f dα, cf. Homework 14.5.
Note that in (c), f ∈ R(α) on [a, c] and on [c, b] in general does not imply that f ∈ R(α) on [a, b]. For example, consider the interval [−1, 1] with

f(x) = α(x) = 0 for −1 ≤ x < 0, and f(x) = α(x) = 1 for 0 ≤ x ≤ 1.

Then ∫₀¹ f dα = 0; the integral vanishes since α is constant on [0, 1]. However, f ∉ R(α) on [−1, 1], since for any partition P including the point 0 we have U(P, f, α) = 1 and L(P, f, α) = 0.

Proposition 5.10 If f, g ∈ R(α) on [a, b], then
(a) fg ∈ R(α);
(b) |f| ∈ R(α) and |∫_a^b f dα| ≤ ∫_a^b |f| dα.

Proof. If we take ϕ(t) = t², Proposition 5.8 shows that f² ∈ R(α) if f ∈ R(α). The identity

4fg = (f + g)² − (f − g)²

completes the proof of (a).
If we take ϕ(t) = |t|, Proposition 5.8 shows that |f| ∈ R(α). Choose c = ±1 so that c ∫ f dα ≥ 0. Then

|∫ f dα| = c ∫ f dα = ∫ cf dα ≤ ∫ |f| dα,

since ±f ≤ |f|.

The unit step function or Heaviside function H(x) is defined by H(x) = 0 if x < 0 and H(x) = 1 if x ≥ 0.

Example 5.2 (a) Let $a<s<b$, let $f$ be bounded on $[a,b]$ and continuous at $s$, and put $\alpha(x)=H(x-s)$. Then $\int_a^b f\,d\alpha=f(s)$. Indeed, consider partitions $P=\{x_0,x_1,x_2,x_3\}$ with $x_0=a$, $x_3=b$, and $x_1<s\le x_2$. Then $\Delta\alpha_2=1$ is the only non-zero increment, so
$$U(P,f,\alpha)=M_2,\qquad L(P,f,\alpha)=m_2.$$
Since $f$ is continuous at $s$, we see that $M_2$ and $m_2$ converge to $f(s)$ as $x_1,x_2\to s$.
(b) Suppose $c_n\ge0$ for all $n=1,\dots,N$ and $(s_n)$, $n=1,\dots,N$, is a strictly increasing finite sequence of distinct points in $(a,b)$. Further, let $\alpha(x)=\sum_{n=1}^N c_nH(x-s_n)$. Then
$$\int_a^b f\,d\alpha=\sum_{n=1}^N c_nf(s_n).$$
This follows from (a) and Proposition 5.9(e).

Proposition 5.11 Suppose $c_n\ge0$ for all positive integers $n\in\mathbb N$, $\sum_{n=1}^\infty c_n$ converges, $(s_n)$ is a strictly increasing sequence of distinct points in $(a,b)$, and
$$\alpha(x)=\sum_{n=1}^\infty c_nH(x-s_n). \tag{5.21}$$
(The accompanying figure shows such a step function $\alpha$ with jumps $c_1,c_2,c_3,\dots$ at the points $s_1,s_2,s_3,s_4$.) Let $f$ be continuous on $[a,b]$. Then
$$\int_a^b f\,d\alpha=\sum_{n=1}^\infty c_nf(s_n). \tag{5.22}$$

Proof. The comparison test shows that the series (5.21) converges for every $x$. Its sum $\alpha$ is evidently an increasing function with $\alpha(a)=0$ and $\alpha(b)=\sum c_n$. Let $\varepsilon>0$ be given and choose $N$ so that
$$\sum_{n=N+1}^\infty c_n<\varepsilon.$$
Put
$$\alpha_1(x)=\sum_{n=1}^N c_nH(x-s_n),\qquad \alpha_2(x)=\sum_{n=N+1}^\infty c_nH(x-s_n).$$
By Proposition 5.9 and Example 5.2,
$$\int_a^b f\,d\alpha_1=\sum_{n=1}^N c_nf(s_n).$$
Since $\alpha_2(b)-\alpha_2(a)<\varepsilon$, Proposition 5.9(d) gives
$$\left|\int_a^b f\,d\alpha_2\right|\le M\varepsilon,$$
where $M=\sup|f(x)|$. Since $\alpha=\alpha_1+\alpha_2$ it follows that
$$\left|\int_a^b f\,d\alpha-\sum_{n=1}^N c_nf(s_n)\right|\le M\varepsilon.$$
If we let $N\to\infty$ we obtain (5.22).
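The series formula (5.22) can be checked numerically. The following Python sketch uses illustrative jump data (the values of $c_n$ and $s_n$ are my choice, not from the text) and compares a fine Riemann–Stieltjes sum against $\sum c_nf(s_n)$:

```python
import math

# Illustrative data (my choice, not from the text): jumps c_n at points s_n.
cs = [0.5, 0.25, 0.125, 0.0625]   # c_n >= 0, sum of c_n converges
ss = [0.2, 0.4, 0.6, 0.8]         # strictly increasing points in (0, 1)
f = math.cos                      # continuous on [0, 1]

def alpha(x):
    """Step function: alpha(x) = sum of c_n * H(x - s_n), with H(0) = 1."""
    return sum(c for c, s in zip(cs, ss) if x >= s)

# Riemann-Stieltjes sum over a fine partition of [0, 1], tags at right endpoints
n = 100000
rs_sum = sum(f(i / n) * (alpha(i / n) - alpha((i - 1) / n)) for i in range(1, n + 1))

# Value predicted by (5.22)
series = sum(c * f(s) for c, s in zip(cs, ss))
print(rs_sum, series)  # the two values agree
```

Only the four subintervals containing a jump contribute to the sum, which is exactly why the integral collapses to a series.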

Proposition 5.12 Assume that $\alpha$ is increasing and $\alpha'\in\mathscr R$ on $[a,b]$. Let $f$ be a bounded real function on $[a,b]$.

Then $f\in\mathscr R(\alpha)$ if and only if $f\alpha'\in\mathscr R$. In that case
$$\int_a^b f\,d\alpha=\int_a^b f(x)\alpha'(x)\,dx. \tag{5.23}$$
The statement remains true if $\alpha$ is continuous on $[a,b]$ and differentiable up to finitely many points $c_1,c_2,\dots,c_n$.

Proof. Let $\varepsilon>0$ be given and apply the Riemann criterion (Proposition 5.3) to $\alpha'$: there is a partition $P=\{x_0,\dots,x_n\}$ of $[a,b]$ such that
$$U(P,\alpha')-L(P,\alpha')<\varepsilon. \tag{5.24}$$

The mean value theorem furnishes points $t_i\in[x_{i-1},x_i]$ such that
$$\Delta\alpha_i=\alpha(x_i)-\alpha(x_{i-1})=\alpha'(t_i)(x_i-x_{i-1})=\alpha'(t_i)\Delta x_i,\qquad i=1,\dots,n.$$

If $s_i\in[x_{i-1},x_i]$, then
$$\sum_{i=1}^n\bigl|\alpha'(s_i)-\alpha'(t_i)\bigr|\,\Delta x_i<\varepsilon \tag{5.25}$$
by (5.24) and Lemma 5.4(b). Put $M=\sup|f(x)|$. Since
$$\sum_{i=1}^n f(s_i)\,\Delta\alpha_i=\sum_{i=1}^n f(s_i)\alpha'(t_i)\,\Delta x_i,$$
it follows from (5.25) that
$$\left|\sum_{i=1}^n f(s_i)\,\Delta\alpha_i-\sum_{i=1}^n f(s_i)\alpha'(s_i)\,\Delta x_i\right|\le M\varepsilon. \tag{5.26}$$
In particular,
$$\sum_{i=1}^n f(s_i)\,\Delta\alpha_i\le U(P,f\alpha')+M\varepsilon$$
for all choices of $s_i\in[x_{i-1},x_i]$, so that
$$U(P,f,\alpha)\le U(P,f\alpha')+M\varepsilon.$$
The same argument leads from (5.26) to
$$U(P,f\alpha')\le U(P,f,\alpha)+M\varepsilon.$$
Thus
$$\bigl|U(P,f,\alpha)-U(P,f\alpha')\bigr|\le M\varepsilon. \tag{5.27}$$
Now (5.25) remains true if $P$ is replaced by any refinement; hence (5.27) also remains true. We conclude that
$$\left|\overline{\int_a^b}f\,d\alpha-\overline{\int_a^b}f(x)\alpha'(x)\,dx\right|\le M\varepsilon.$$

But $\varepsilon$ is arbitrary. Hence
$$\overline{\int_a^b}f\,d\alpha=\overline{\int_a^b}f(x)\alpha'(x)\,dx$$
for any bounded $f$. The equality for the lower integrals follows from (5.26) in exactly the same way. The proposition follows.

We now summarize the two cases.

Proposition 5.13 Let $f$ be continuous on $[a,b]$. Suppose that, except for finitely many points $c_0,c_1,\dots,c_n$ with $c_0=a$ and $c_n=b$, the derivative $\alpha'(x)$ exists and is continuous and bounded on $[a,b]\setminus\{c_0,\dots,c_n\}$. Then $f\in\mathscr R(\alpha)$ and
$$\int_a^b f\,d\alpha=\int_a^b f(x)\alpha'(x)\,dx+\sum_{i=1}^{n-1}f(c_i)\bigl(\alpha(c_i+0)-\alpha(c_i-0)\bigr)+f(a)\bigl(\alpha(a+0)-\alpha(a)\bigr)+f(b)\bigl(\alpha(b)-\alpha(b-0)\bigr).$$

Proof (sketch). (a) Note that $A_i^+=\alpha(c_i+0)-\alpha(c_i)$ and $A_i^-=\alpha(c_i)-\alpha(c_i-0)$ exist by Theorem 3.8. Define
$$\alpha_1(x)=\sum_{i=0}^{n-1}A_i^+\bigl(1-H(c_i-x)\bigr)+\sum_{i=1}^{n}A_i^-\,H(x-c_i).$$
(b) Then $\alpha_2=\alpha-\alpha_1$ is continuous.
(c) Since $\alpha_1$ is piecewise constant, $\alpha_1'(x)=0$ for $x\ne c_k$. Hence $\alpha_2'(x)=\alpha'(x)$ for $x\ne c_i$. Applying Proposition 5.12 gives
$$\int_a^b f\,d\alpha_2=\int_a^b f\alpha_2'\,dx=\int_a^b f\alpha'\,dx.$$
Further,
$$\int_a^b f\,d\alpha=\int_a^b f\,d(\alpha_1+\alpha_2)=\int_a^b f\alpha'\,dx+\int_a^b f\,d\alpha_1.$$
By Proposition 5.11,
$$\int_a^b f\,d\alpha_1=\sum_{i=0}^{n-1}A_i^+f(c_i)+\sum_{i=1}^{n}A_i^-f(c_i),$$
and since $A_i^++A_i^-=\alpha(c_i+0)-\alpha(c_i-0)$ for the interior points, $A_0^+=\alpha(a+0)-\alpha(a)$, and $A_n^-=\alpha(b)-\alpha(b-0)$, the claimed formula follows.

Example 5.3 (a) The Fundamental Theorem of Calculus (see Theorem 5.15) yields
$$\int_0^2 x\,dx^3=\int_0^2 x\cdot3x^2\,dx=\frac{3x^4}{4}\Big|_0^2=12.$$

(b) Let $f(x)=x^2$ and
$$\alpha(x)=\begin{cases}x,&0\le x<1,\\ 7,&x=1,\\ x^2+10,&1<x<2,\\ 64,&x=2.\end{cases}$$
Then
$$\begin{aligned}\int_0^2 f\,d\alpha&=\int_0^2 f\alpha'\,dx+f(1)\bigl(\alpha(1+0)-\alpha(1-0)\bigr)+f(2)\bigl(\alpha(2)-\alpha(2-0)\bigr)\\&=\int_0^1x^2\cdot1\,dx+\int_1^2x^2\cdot2x\,dx+1\cdot(11-1)+4\cdot(64-14)\\&=\frac{x^3}{3}\Big|_0^1+\frac{x^4}{2}\Big|_1^2+10+200=\frac13+8-\frac12+210=217\tfrac56.\end{aligned}$$
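Example 5.3(b) can be double-checked with exact rational arithmetic; the smooth part and the two jump terms are entered exactly as in the computation above:

```python
from fractions import Fraction

# Smooth part: integral of f * alpha' with alpha' = 1 on (0,1), alpha' = 2x on (1,2)
int_01 = Fraction(1, 3)                  # integral of x^2 on [0,1] = x^3/3
int_12 = Fraction(1, 2) * (2**4 - 1**4)  # integral of 2x^3 on [1,2] = x^4/2
# Jump contributions f(c)(alpha(c+0) - alpha(c-0)) at c = 1 and endpoint c = 2
jump_1 = 1 * (11 - 1)                    # f(1) = 1, alpha(1+0) = 11, alpha(1-0) = 1
jump_2 = 4 * (64 - 14)                   # f(2) = 4, alpha(2) = 64, alpha(2-0) = 14
total = int_01 + int_12 + jump_1 + jump_2
print(total)                             # 1307/6 = 217 5/6
```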

Remark 5.2 The three preceding propositions show the flexibility of the Stieltjes process of integration. If $\alpha$ is a pure step function, the integral reduces to a finite or infinite series. If $\alpha$ has an integrable derivative, the integral reduces to the ordinary Riemann integral. This makes it possible to study series and integrals simultaneously, rather than separately.

5.2 Integration and Differentiation

We shall see that integration and differentiation are, in a certain sense, inverse operations.

Theorem 5.14 Let $f\in\mathscr R$ on $[a,b]$. For $a\le x\le b$ put
$$F(x)=\int_a^x f(t)\,dt.$$
Then $F$ is continuous on $[a,b]$; furthermore, if $f$ is continuous at $x_0\in[a,b]$, then $F$ is differentiable at $x_0$ and $F'(x_0)=f(x_0)$.

Proof. Since $f\in\mathscr R$, $f$ is bounded. Suppose $|f(t)|\le M$ on $[a,b]$. If $a\le x<y\le b$, then
$$|F(y)-F(x)|=\left|\int_x^y f(t)\,dt\right|\le M(y-x).$$
Given $\varepsilon>0$, we see that
$$|F(y)-F(x)|<\varepsilon,$$
provided that $|y-x|<\varepsilon/M$. This proves continuity (and, in fact, uniform continuity) of $F$.
Now suppose that $f$ is continuous at $x_0$. Given $\varepsilon>0$, choose $\delta>0$ such that
$$|f(t)-f(x_0)|<\varepsilon$$
if $|t-x_0|<\delta$, $t\in[a,b]$. Hence, if
$$x_0-\delta<s\le x_0\le t<x_0+\delta,\qquad a\le s<t\le b,$$
we have
$$\left|\frac{F(t)-F(s)}{t-s}-f(x_0)\right|=\left|\frac1{t-s}\int_s^t\bigl(f(u)-f(x_0)\bigr)\,du\right|<\varepsilon.$$


This in particular holds for s = x0, that is

$$\left|\frac{F(t)-F(x_0)}{t-x_0}-f(x_0)\right|<\varepsilon.$$

It follows that F ′(x0)= f(x0).

Definition 5.4 A function $F\colon[a,b]\to\mathbb R$ is called an antiderivative or a primitive of a function $f\colon[a,b]\to\mathbb R$ if $F$ is differentiable and $F'=f$.

Remarks 5.3 (a) There exist functions $f$ not having an antiderivative; for example, the Heaviside function $H(x)$ has a simple discontinuity at $0$, but by Corollary 4.18 derivatives cannot have simple discontinuities.
(b) The antiderivative $F$ of a function $f$ (if it exists) is unique up to an additive constant. More precisely, if $F$ is an antiderivative of $f$ on $[a,b]$, then $F_1(x)=F(x)+c$ is also an antiderivative of $f$; and if $F$ and $G$ are antiderivatives of $f$ on $[a,b]$, then there is a constant $c$ such that $F(x)-G(x)=c$.
The first part is obvious since $F_1'(x)=F'(x)=f(x)$. For the second part, suppose $F$ and $G$ are antiderivatives of $f$ and put $H(x)=F(x)-G(x)$; then $H'(x)=0$ and $H(x)$ is constant by Corollary 4.11.
Notation for the antiderivative:
$$F(x)=\int f(x)\,dx=\int f\,dx.$$
The function $f$ is called the integrand. Integration and differentiation are inverse to each other:
$$\frac{d}{dx}\int f(x)\,dx=f(x),\qquad\int f'(x)\,dx=f(x).$$

Theorem 5.15 (Fundamental Theorem of Calculus) Let $f\colon[a,b]\to\mathbb R$ be continuous.
(a) The function
$$F(x)=\int_a^x f(t)\,dt$$
is an antiderivative of $f$ on $[a,b]$.
(b) If $G$ is an antiderivative of $f$, then
$$\int_a^b f(t)\,dt=G(b)-G(a).$$

Proof. (a) By Theorem 5.14, $F(x)=\int_a^x f(t)\,dt$ is differentiable at every point $x_0\in[a,b]$ with $F'(x_0)=f(x_0)$.
(b) By the above remark, the antiderivative is unique up to a constant; hence $F(x)-G(x)=C$. Since $F(a)=\int_a^a f(t)\,dt=0$ we obtain
$$G(b)-G(a)=(F(b)-C)-(F(a)-C)=F(b)-F(a)=F(b)=\int_a^b f(t)\,dt.$$

Note that the FTC remains true if merely $f\in\mathscr R$ and $G$ is an antiderivative of $f$ on $[a,b]$. Indeed, let $\varepsilon>0$ be given. By the Riemann criterion (Proposition 5.3) there exists a partition $P=\{x_0,\dots,x_n\}$ of $[a,b]$ such that $U(P,f)-L(P,f)<\varepsilon$. By the mean value theorem, there exist points $t_i\in[x_{i-1},x_i]$ such that
$$G(x_i)-G(x_{i-1})=f(t_i)(x_i-x_{i-1}),\qquad i=1,\dots,n.$$
Thus
$$G(b)-G(a)=\sum_{i=1}^n f(t_i)\Delta x_i.$$
It follows from Lemma 5.4(c) and the above equation that
$$\left|\sum_{i=1}^n f(t_i)\Delta x_i-\int_a^b f(x)\,dx\right|=\left|G(b)-G(a)-\int_a^b f(x)\,dx\right|<\varepsilon.$$
Since $\varepsilon>0$ was arbitrary, the proof is complete.
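A quick numerical illustration of this argument: a Riemann sum of $f(x)=\sin x$ over $[0,\pi]$ approaches $G(b)-G(a)$ for the antiderivative $G(x)=-\cos x$. The midpoint tags and the step count are arbitrary choices made here for the test:

```python
import math

f = math.sin
G = lambda x: -math.cos(x)      # antiderivative: G' = f
a, b, n = 0.0, math.pi, 200000
h = (b - a) / n
riemann = h * sum(f(a + (i + 0.5) * h) for i in range(n))  # midpoint tags
print(riemann, G(b) - G(a))     # both close to 2
```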

5.2.1 Table of Antiderivatives

Differentiating the right-hand column of the table yields the corresponding entry in the left-hand column.

function | domain | antiderivative

$x^\alpha$ | $\alpha\in\mathbb R\setminus\{-1\}$, $x>0$ | $\dfrac{x^{\alpha+1}}{\alpha+1}$

$\dfrac1x$ | $x<0$ or $x>0$ | $\log|x|$

$e^x$ | $x\in\mathbb R$ | $e^x$

$a^x$ | $a>0$, $a\ne1$, $x\in\mathbb R$ | $\dfrac{a^x}{\log a}$

$\sin x$ | $x\in\mathbb R$ | $-\cos x$

$\cos x$ | $x\in\mathbb R$ | $\sin x$

$\dfrac1{\sin^2x}$ | $x\in\mathbb R\setminus\{k\pi\mid k\in\mathbb Z\}$ | $-\cot x$

$\dfrac1{\cos^2x}$ | $x\in\mathbb R\setminus\bigl\{\frac\pi2+k\pi\mid k\in\mathbb Z\bigr\}$ | $\tan x$

$\dfrac1{1+x^2}$ | $x\in\mathbb R$ | $\arctan x$

$\dfrac1{\sqrt{1+x^2}}$ | $x\in\mathbb R$ | $\operatorname{arsinh}x=\log\bigl(x+\sqrt{x^2+1}\bigr)$

$\dfrac1{\sqrt{x^2-1}}$ | $x>1$ | $\log\bigl(x+\sqrt{x^2-1}\bigr)$

5.2.2 Integration Rules

The aim of this subsection is to calculate antiderivatives of composed functions using antiderivatives of (already known) simpler functions. Notation: $f(x)\big|_a^b:=f(b)-f(a)$.

Proposition 5.16 (a) Let $f$ and $g$ be functions with antiderivatives $F$ and $G$, respectively. Then $af(x)+bg(x)$, $a,b\in\mathbb R$, has the antiderivative $aF(x)+bG(x)$:
$$\int(af+bg)\,dx=a\int f\,dx+b\int g\,dx\qquad\text{(linearity)}.$$
(b) If $f$ and $g$ are differentiable and $f(x)g'(x)$ has an antiderivative, then $f'(x)g(x)$ has an antiderivative, too:
$$\int f'g\,dx=fg-\int fg'\,dx\qquad\text{(integration by parts)}. \tag{5.28}$$
If $f$ and $g$ are continuously differentiable on $[a,b]$, then
$$\int_a^b f'g\,dx=f(x)g(x)\Big|_a^b-\int_a^b fg'\,dx. \tag{5.29}$$
(c) If $\varphi\colon D\to\mathbb R$ is continuously differentiable with $\varphi(D)\subset I$, and $f\colon I\to\mathbb R$ has an antiderivative $F$, then
$$\int f(\varphi(x))\varphi'(x)\,dx=F(\varphi(x))\qquad\text{(change of variable)}. \tag{5.30}$$
If $\varphi\colon[a,b]\to\mathbb R$ is continuously differentiable with $\varphi([a,b])\subset I$ and $f\colon I\to\mathbb R$ is continuous, then
$$\int_a^b f(\varphi(t))\varphi'(t)\,dt=\int_{\varphi(a)}^{\varphi(b)}f(x)\,dx.$$

Proof. (a) follows since differentiation is linear.
(b) Differentiating the right-hand side, we obtain
$$\frac{d}{dx}\Bigl(fg-\int fg'\,dx\Bigr)=f'g+fg'-fg'=f'g,$$
which proves the statement.
(c) By the chain rule, $F(\varphi(x))$ is differentiable with
$$\frac{d}{dx}F(\varphi(x))=F'(\varphi(x))\varphi'(x)=f(\varphi(x))\varphi'(x),$$
and (c) follows.
The statements about the Riemann integrals follow from the statements about antiderivatives using the fundamental theorem of calculus. For example, we show the second part of (c). By the above, $F(\varphi(t))$ is an antiderivative of $f(\varphi(t))\varphi'(t)$. By the FTC we have
$$\int_a^b f(\varphi(t))\varphi'(t)\,dt=F(\varphi(t))\Big|_a^b=F(\varphi(b))-F(\varphi(a)).$$
On the other hand, again by the FTC,
$$\int_{\varphi(a)}^{\varphi(b)}f(x)\,dx=F(x)\Big|_{\varphi(a)}^{\varphi(b)}=F(\varphi(b))-F(\varphi(a)).$$
This completes the proof of (c).
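The second part of (c) is easy to test numerically. The sketch below uses a small Simpson-rule helper (written here only for the test) with the illustrative choice $\varphi(t)=t^2$ and $f=\exp$ on $[0,1]$:

```python
import math

def simpson(g, a, b, n=2000):
    # composite Simpson rule, n even
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return s * h / 3

phi = lambda t: t * t           # phi(t) = t^2, continuously differentiable
dphi = lambda t: 2 * t
f = math.exp

lhs = simpson(lambda t: f(phi(t)) * dphi(t), 0.0, 1.0)  # int f(phi(t)) phi'(t) dt
rhs = simpson(f, phi(0.0), phi(1.0))                    # int_{phi(0)}^{phi(1)} f dx = e - 1
print(lhs, rhs)
```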

Corollary 5.17 Suppose $F$ is an antiderivative of $f$. Then
$$\int f(ax+b)\,dx=\frac1aF(ax+b),\qquad a\ne0; \tag{5.31}$$
$$\int\frac{g'(x)}{g(x)}\,dx=\log|g(x)|\qquad(g\text{ differentiable and }g(x)\ne0). \tag{5.32}$$

Example 5.4 (a) The antiderivative of a polynomial. If $p(x)=\sum_{k=0}^n a_kx^k$, then $\int p(x)\,dx=\sum_{k=0}^n\frac{a_k}{k+1}x^{k+1}$.
(b) Put $f'(x)=e^x$ and $g(x)=x$; then $f(x)=e^x$, $g'(x)=1$, and we obtain
$$\int xe^x\,dx=xe^x-\int1\cdot e^x\,dx=e^x(x-1).$$
(c) On $I=(0,\infty)$: $\int\log x\,dx=\int1\cdot\log x\,dx=x\log x-\int x\cdot\frac1x\,dx=x\log x-x$.
(d)
$$\int\arctan x\,dx=\int1\cdot\arctan x\,dx=x\arctan x-\int\frac{x}{1+x^2}\,dx=x\arctan x-\frac12\int\frac{(1+x^2)'}{1+x^2}\,dx=x\arctan x-\frac12\log(1+x^2).$$
In the last equation we made use of (5.32).
(e) Recursive computation of integrals. Set
$$I_n:=\int\frac{dx}{(1+x^2)^n},\qquad n\in\mathbb N.$$
Then $I_1=\arctan x$ and
$$I_n=\int\frac{(1+x^2)-x^2}{(1+x^2)^n}\,dx=I_{n-1}-\int\frac{x^2\,dx}{(1+x^2)^n}.$$
Put $u=x$ and $v'=\frac{x}{(1+x^2)^n}$. Then $u'=1$ and
$$v=\int\frac{x\,dx}{(1+x^2)^n}=\frac12\cdot\frac{(1+x^2)^{1-n}}{1-n}.$$
Hence,
$$I_n=I_{n-1}-\frac{x(1+x^2)^{1-n}}{2(1-n)}+\frac1{2(1-n)}\int(1+x^2)^{1-n}\,dx,$$
that is,
$$I_n=\frac{x}{(2n-2)(1+x^2)^{n-1}}+\frac{2n-3}{2n-2}\,I_{n-1}.$$
In particular, $I_2=\frac{x}{2(1+x^2)}+\frac12\arctan x$ and $I_3=\frac{x}{4(1+x^2)^2}+\frac34I_2$.
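The closed form for $I_2$ obtained from the recursion can be verified against numerical quadrature of the definite integral $\int_0^1 dx/(1+x^2)^2=\frac14+\frac\pi8$:

```python
import math

def I2(x):
    # closed form from the recursion: I_2 = x/(2(1+x^2)) + (1/2) arctan x
    return x / (2 * (1 + x * x)) + math.atan(x) / 2

def simpson(g, a, b, n=2000):
    # composite Simpson rule, n even
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return s * h / 3

numeric = simpson(lambda x: 1 / (1 + x * x) ** 2, 0.0, 1.0)
exact = I2(1.0) - I2(0.0)     # = 1/4 + pi/8
print(numeric, exact)
```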

Proposition 5.18 (Mean Value Theorem of Integration) Let $f,\varphi\colon[a,b]\to\mathbb R$ be continuous functions and $\varphi\ge0$. Then there exists $\xi\in[a,b]$ such that
$$\int_a^b f(x)\varphi(x)\,dx=f(\xi)\int_a^b\varphi(x)\,dx. \tag{5.33}$$
In particular, in case $\varphi=1$ we have
$$\int_a^b f(x)\,dx=f(\xi)(b-a)$$
for some $\xi\in[a,b]$.

Proof. Put $m=\inf\{f(x)\mid x\in[a,b]\}$ and $M=\sup\{f(x)\mid x\in[a,b]\}$. Since $\varphi\ge0$ we obtain $m\varphi(x)\le f(x)\varphi(x)\le M\varphi(x)$. By Proposition 5.9(a) and (b) we have
$$m\int_a^b\varphi(x)\,dx\le\int_a^b f(x)\varphi(x)\,dx\le M\int_a^b\varphi(x)\,dx.$$
Hence there is a $\mu\in[m,M]$ such that
$$\int_a^b f(x)\varphi(x)\,dx=\mu\int_a^b\varphi(x)\,dx.$$
Since $f$ is continuous on $[a,b]$, the intermediate value theorem (Theorem 3.5) ensures that there is a $\xi$ with $\mu=f(\xi)$. The claim follows.

Example 5.5 (The trapezoid rule) Let $f\colon[0,1]\to\mathbb R$ be twice continuously differentiable. Then there exists $\xi\in[0,1]$ such that
$$\int_0^1 f(x)\,dx=\frac12\bigl(f(0)+f(1)\bigr)-\frac1{12}f''(\xi). \tag{5.34}$$
Proof. Let $\varphi(x)=\frac12x(1-x)$, so that $\varphi(x)\ge0$ for $x\in[0,1]$, $\varphi'(x)=\frac12-x$, and $\varphi''(x)=-1$. Using integration by parts twice as well as Proposition 5.18 we find
$$\begin{aligned}\int_0^1 f(x)\,dx&=-\int_0^1\varphi''(x)f(x)\,dx=-\varphi'(x)f(x)\Big|_0^1+\int_0^1\varphi'(x)f'(x)\,dx\\&=\frac12\bigl(f(0)+f(1)\bigr)+\varphi(x)f'(x)\Big|_0^1-\int_0^1\varphi(x)f''(x)\,dx\\&=\frac12\bigl(f(0)+f(1)\bigr)-f''(\xi)\int_0^1\varphi(x)\,dx\\&=\frac12\bigl(f(0)+f(1)\bigr)-\frac1{12}f''(\xi).\end{aligned}$$
Indeed, $\int_0^1\frac x2(1-x)\,dx=\Bigl(\frac{x^2}4-\frac{x^3}6\Bigr)\Big|_0^1=\frac14-\frac16=\frac1{12}$.
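Formula (5.34) can be illustrated with $f(x)=e^x$ (an illustrative choice): the trapezoid error must then lie between $\min f''/12=1/12$ and $\max f''/12=e/12$ on $[0,1]$:

```python
import math

exact = math.e - 1                       # integral of e^x over [0, 1]
trap = (math.exp(0) + math.exp(1)) / 2   # (f(0) + f(1)) / 2
error = trap - exact                     # = f''(xi)/12 for some xi in [0, 1]
print(error, 1 / 12, math.e / 12)        # error lies between 1/12 and e/12
```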

5.2.3 Integration of Rational Functions

We will give a useful method to compute antiderivatives of an arbitrary rational function. Consider a rational function $p/q$ where $p$ and $q$ are polynomials. We will assume that $\deg p<\deg q$; otherwise we can express $p/q$ as a polynomial plus a rational function of this form, for example
$$\frac{x^2}{x-1}=x+1+\frac1{x-1}.$$

Polynomials

We need some preliminary facts about polynomials, which are stated here without proof. Recall that a non-zero constant polynomial has degree zero, $\deg c=0$ if $c\ne0$; by definition, the zero polynomial has degree $-\infty$, $\deg0=-\infty$.

Theorem 5.19 (Fundamental Theorem of Algebra) Every polynomial $p$ of positive degree with complex coefficients has a complex root, i.e. there exists a complex number $z$ such that $p(z)=0$.

Lemma 5.20 (Long Division) Let $p$ and $q\ne0$ be polynomials. Then there exist unique polynomials $r$ and $s$ such that
$$p=qs+r,\qquad\deg r<\deg q.$$

Lemma 5.21 Let $p$ be a complex polynomial of degree $n\ge1$ with leading coefficient $a_n$. Then there exist $n$ uniquely determined numbers $z_1,\dots,z_n$ (which may be equal) such that
$$p(z)=a_n(z-z_1)(z-z_2)\cdots(z-z_n).$$
Proof. We use induction on $n$ and the two preceding statements. In case $n=1$ the linear polynomial $p(z)=az+b$ can be written in the desired form
$$p(z)=a\Bigl(z-\Bigl(-\frac ba\Bigr)\Bigr)\qquad\text{with the unique root }z_1=-\frac ba.$$
Suppose the statement is true for all polynomials of degree $n-1$; we show it for polynomials of degree $n$. Let $z_n$ be a complex root of $p$, which exists by Theorem 5.19: $p(z_n)=0$. Using long division of $p$ by the linear polynomial $q(z)=z-z_n$ we obtain a quotient polynomial $p_1(z)$ and a remainder polynomial $r(z)$ of degree at most $0$ (a constant polynomial) such that
$$p(z)=(z-z_n)p_1(z)+r(z).$$
Inserting $z=z_n$ gives $p(z_n)=0=r(z_n)$; hence the constant $r$ vanishes and we have
$$p(z)=(z-z_n)p_1(z)$$
with a polynomial $p_1(z)$ of degree $n-1$. Applying the induction hypothesis to $p_1$, the statement follows.

A root $\alpha$ of $p$ is said to be a root of multiplicity $k$, $k\in\mathbb N$, if $\alpha$ appears exactly $k$ times among the zeros $z_1,z_2,\dots,z_n$. In that case $(z-\alpha)^k$ divides $p(z)$ but $(z-\alpha)^{k+1}$ does not.
If $p$ is a real polynomial, i.e. a polynomial with real coefficients, and $\alpha$ is a root of multiplicity $k$ of $p$, then $\bar\alpha$ is also a root of multiplicity $k$ of $p$. Indeed, taking the complex conjugate of the equation
$$p(z)=(z-\alpha)^k q(z),$$
we have, since $\overline{p(z)}=p(\bar z)$,
$$p(\bar z)=(\bar z-\bar\alpha)^k\,\overline{q(z)}\quad\overset{z:=\bar z}{\Longrightarrow}\quad p(z)=(z-\bar\alpha)^k\,\bar q(z).$$
Note that the product of the two complex linear factors $z-\alpha$ and $z-\bar\alpha$ yields a real quadratic factor
$$(z-\alpha)(z-\bar\alpha)=z^2-(\alpha+\bar\alpha)z+\alpha\bar\alpha=z^2-2(\operatorname{Re}\alpha)\,z+|\alpha|^2.$$
Using this fact, the real version of Lemma 5.21 is as follows.

Lemma 5.22 Let $q$ be a real polynomial of degree $n$ with leading coefficient $a_n$. Then there exist real numbers $\alpha_i,\beta_j,\gamma_j$ and multiplicities $r_i,s_j\in\mathbb N$, $i=1,\dots,k$, $j=1,\dots,l$, such that
$$q(x)=a_n\prod_{i=1}^k(x-\alpha_i)^{r_i}\prod_{j=1}^l\bigl(x^2-2\beta_jx+\gamma_j\bigr)^{s_j}.$$
We assume that the quadratic factors cannot be factored further; this means
$$\beta_j^2-\gamma_j<0,\qquad j=1,\dots,l.$$

Of course, $\deg q=\sum_i r_i+2\sum_j s_j=n$.

Example 5.6 (a) $x^4-4=(x^2+2)(x^2-2)=(x-\sqrt2)(x+\sqrt2)(x-i\sqrt2)(x+i\sqrt2)$; over the reals, $x^4-4=(x-\sqrt2)(x+\sqrt2)(x^2+2)$.
(b) $x^3+x-2$. One can guess the first zero $x_1=1$. Long division gives $x^3+x-2=(x-1)(x^2+x+2)$:

      x^3     + x - 2  : (x - 1) = x^2 + x + 2
    -(x^3 - x^2)
     -----------
      x^2     + x - 2
    -(x^2     - x)
     -----------
            2x - 2
          -(2x - 2)
     -----------
                 0

There are no further real zeros, since $x^2+x+2$ has negative discriminant.

5.2.4 Partial Fraction Decomposition

Proposition 5.23 Let $p(x)$ and $q(x)$ be real polynomials with $\deg p<\deg q$. There exist real numbers $A_{ir}$, $B_{js}$, and $C_{js}$ such that
$$\frac{p(x)}{q(x)}=\sum_{i=1}^k\Biggl(\sum_{r=1}^{r_i}\frac{A_{ir}}{(x-\alpha_i)^r}\Biggr)+\sum_{j=1}^l\Biggl(\sum_{s=1}^{s_j}\frac{B_{js}x+C_{js}}{(x^2-2\beta_jx+\gamma_j)^s}\Biggr), \tag{5.35}$$
where the $\alpha_i$, $\beta_j$, $\gamma_j$, $r_i$, and $s_j$ have the same meaning as in Lemma 5.22.

Example 5.7 (a) Compute $\int f(x)\,dx=\int\frac{x^4}{x^3-1}\,dx$. We use long division to obtain a rational function $p/q$ with $\deg p<\deg q$:
$$f(x)=x+\frac{x}{x^3-1}.$$
To obtain the partial fraction decomposition (PFD), we need the factorization of the denominator polynomial $q(x)=x^3-1$. One can guess the first real zero $x_1=1$ and divide $q$ by $x-1$: $q(x)=(x-1)(x^2+x+1)$.
The PFD then reads
$$\frac{x}{x^3-1}=\frac a{x-1}+\frac{bx+c}{x^2+x+1}.$$
We have to determine $a$, $b$, $c$. Multiplication by $x^3-1$ gives
$$0\cdot x^2+1\cdot x+0=a(x^2+x+1)+(bx+c)(x-1)=(a+b)x^2+(a-b+c)x+(a-c).$$
The two polynomials on the left and on the right must coincide, that is, their coefficients must be equal:
$$0=a+b,\qquad1=a-b+c,\qquad0=a-c,$$
which gives $a=c=\frac13$ and $b=-\frac13$. Hence,
$$\frac x{x^3-1}=\frac13\cdot\frac1{x-1}-\frac13\cdot\frac{x-1}{x^2+x+1}.$$
We can integrate the first two terms directly, but we have to rewrite the last one:
$$\frac{x-1}{x^2+x+1}=\frac12\cdot\frac{2x+1}{x^2+x+1}-\frac32\cdot\frac1{\bigl(x+\frac12\bigr)^2+\frac34}.$$
Recall that
$$\int\frac{2x-2\beta}{x^2-2\beta x+\gamma}\,dx=\log\bigl|x^2-2\beta x+\gamma\bigr|,\qquad\int\frac{dx}{(x+b)^2+a^2}=\frac1a\arctan\frac{x+b}a.$$
Therefore,
$$\int\frac{x^4}{x^3-1}\,dx=\frac12x^2+\frac13\log|x-1|-\frac16\log(x^2+x+1)+\frac1{\sqrt3}\arctan\frac{2x+1}{\sqrt3}.$$
(b) If $q(x)=(x-1)^3(x+2)(x^2+2)^2(x^2+1)$ and $p(x)$ is any polynomial with $\deg p<\deg q=10$, then the partial fraction decomposition reads
$$\frac{p(x)}{q(x)}=\frac{A_{11}}{x-1}+\frac{A_{12}}{(x-1)^2}+\frac{A_{13}}{(x-1)^3}+\frac{A_{21}}{x+2}+\frac{B_{11}x+C_{11}}{x^2+2}+\frac{B_{12}x+C_{12}}{(x^2+2)^2}+\frac{B_{21}x+C_{21}}{x^2+1}. \tag{5.36}$$

Suppose now that $p(x)\equiv1$. One can immediately compute $A_{13}$ and $A_{21}$. Multiplying (5.36) by $(x-1)^3$ yields
$$\frac1{(x+2)(x^2+2)^2(x^2+1)}=A_{13}+(x-1)p_1(x)$$
with a rational function $p_1$ not having $x-1$ in the denominator. Inserting $x=1$ gives $A_{13}=\frac1{3\cdot3^2\cdot2}=\frac1{54}$. Similarly,
$$A_{21}=\frac1{(x-1)^3(x^2+2)^2(x^2+1)}\bigg|_{x=-2}=\frac1{(-3)^3\cdot6^2\cdot5}=-\frac1{4860}.$$
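The decomposition found in part (a) can be confirmed in exact rational arithmetic at a few sample points (the sample points are arbitrary, excluding the pole $x=1$):

```python
from fractions import Fraction

# x/(x^3 - 1) = (1/3)/(x - 1) - (1/3)(x - 1)/(x^2 + x + 1), checked exactly
for x in [Fraction(2), Fraction(3), Fraction(-1), Fraction(1, 2), Fraction(7, 3)]:
    lhs = x / (x**3 - 1)
    rhs = Fraction(1, 3) / (x - 1) - Fraction(1, 3) * (x - 1) / (x**2 + x + 1)
    assert lhs == rhs
print("decomposition verified")
```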

5.2.5 Other Classes of Elementary Integrable Functions

An elementary function is a composition of rational, exponential, and trigonometric functions and their inverse functions, for example
$$f(x)=\frac{e^{\sin(\sqrt{x-1})}}{x+\log x}.$$
A function is called elementary integrable if it has an elementary antiderivative. Rational functions are elementary integrable. "Most" functions are not elementary integrable, such as
$$e^{-x^2},\qquad\frac{e^x}x,\qquad\frac1{\log x},\qquad\frac{\sin x}x.$$
They define "new" functions:
$$W(x):=\int_0^x e^{-\frac{t^2}2}\,dt\quad\text{(Gaussian integral)},\qquad\operatorname{li}(x):=\int_0^x\frac{dt}{\log t}\quad\text{(integral logarithm)},$$
$$F(\varphi,k):=\int_0^\varphi\frac{dx}{\sqrt{1-k^2\sin^2x}}\quad\text{(elliptic integral of the first kind)},$$
$$E(\varphi,k):=\int_0^\varphi\sqrt{1-k^2\sin^2x}\,dx\quad\text{(elliptic integral of the second kind)}.$$

Integrals of the form $\int R(\cos x,\sin x)\,dx$

Let $R(u,v)$ be a rational function in two variables $u$ and $v$, that is, $R(u,v)=\frac{p(u,v)}{q(u,v)}$ with polynomials $p$ and $q$ in two variables. We substitute $t=\tan\frac x2$. Then
$$\sin x=\frac{2t}{1+t^2},\qquad\cos x=\frac{1-t^2}{1+t^2},\qquad dx=\frac{2\,dt}{1+t^2}.$$
Hence
$$\int R(\cos x,\sin x)\,dx=\int R\Bigl(\frac{1-t^2}{1+t^2},\frac{2t}{1+t^2}\Bigr)\frac{2\,dt}{1+t^2}=\int R_1(t)\,dt$$
with another rational function $R_1(t)$. There are three special cases where another substitution is appropriate.

(a) $R(-u,v)=-R(u,v)$ ($R$ is odd in $u$): substitute $t=\sin x$.
(b) $R(u,-v)=-R(u,v)$ ($R$ is odd in $v$): substitute $t=\cos x$.
(c) $R(-u,-v)=R(u,v)$: substitute $t=\tan x$.

Example 5.8 (1) $\int\sin^3x\,dx$. Here $R(u,v)=v^3$ is an odd function in $v$, so that (b) applies: $t=\cos x$, $dt=-\sin x\,dx$, $\sin^2x=1-\cos^2x=1-t^2$. This yields
$$\int\sin^3x\,dx=-\int\sin^2x\,(-\sin x\,dx)=-\int(1-t^2)\,dt=-t+\frac{t^3}3+\text{const.}=-\cos x+\frac{\cos^3x}3+\text{const.}$$
(2) $\int\tan x\,dx$. Here $R(u,v)=\frac vu$; (a), (b), and (c) all apply to this situation. For example, let $t=\sin x$. Then $\cos^2x=1-t^2$, $dt=\cos x\,dx$, and
$$\int\tan x\,dx=\int\frac{\sin x\cos x\,dx}{\cos^2x}=\int\frac{t\,dt}{1-t^2}=-\frac12\int\frac{d(1-t^2)}{1-t^2}=-\frac12\log(1-t^2)=-\log|\cos x|.$$

Integrals of the form $\int R\bigl(x,\sqrt[n]{ax+b}\bigr)\,dx$

The substitution $t=\sqrt[n]{ax+b}$ yields $x=(t^n-b)/a$, $dx=\frac na t^{n-1}\,dt$, and therefore
$$\int R\bigl(x,\sqrt[n]{ax+b}\bigr)\,dx=\int R\Bigl(\frac{t^n-b}a,\,t\Bigr)\frac na t^{n-1}\,dt.$$

Integrals of the form $\int R\bigl(x,\sqrt{ax^2+2bx+c}\bigr)\,dx$

Completing the square, the above integral can be written in one of the three basic forms
$$\int R\bigl(t,\sqrt{t^2+1}\bigr)\,dt,\qquad\int R\bigl(t,\sqrt{t^2-1}\bigr)\,dt,\qquad\int R\bigl(t,\sqrt{1-t^2}\bigr)\,dt.$$
Further substitutions
$$t=\sinh u,\quad\sqrt{t^2+1}=\cosh u,\quad dt=\cosh u\,du;$$
$$t=\pm\cosh u,\quad\sqrt{t^2-1}=\sinh u,\quad dt=\pm\sinh u\,du;$$
$$t=\pm\cos u,\quad\sqrt{1-t^2}=\sin u,\quad dt=\mp\sin u\,du$$
reduce the integral to integrals already known.

Example 5.9 Compute $I=\int\frac{dx}{\sqrt{x^2+6x+5}}$. Hint: substitute $t=\sqrt{x^2+6x+5}-x$.
Then $(x+t)^2=x^2+2tx+t^2=x^2+6x+5$, so that $t^2+2tx=6x+5$ and therefore
$$x=\frac{t^2-5}{6-2t},\qquad dx=\frac{2t(6-2t)+2(t^2-5)}{(6-2t)^2}\,dt=\frac{-2t^2+12t-10}{(6-2t)^2}\,dt.$$
Hence, using $t+x=t+\frac{t^2-5}{6-2t}=\frac{-t^2+6t-5}{6-2t}$,
$$I=\int\frac{(-2t^2+12t-10)\,dt}{(6-2t)^2\,(t+x)}=\int\frac{2(6-2t)(-t^2+6t-5)}{(-t^2+6t-5)(6-2t)^2}\,dt=2\int\frac{dt}{6-2t}=-\log|6-2t|+\text{const.}=-\log\bigl|6-2\sqrt{x^2+6x+5}+2x\bigr|+\text{const.}$$
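As a sanity check, the antiderivative found above can be differentiated numerically; a central difference (with step $h=10^{-6}$, an illustrative choice) should reproduce the integrand where the radicand is positive:

```python
import math

def F(x):
    # the antiderivative found above (valid where x^2 + 6x + 5 > 0, e.g. x > -1)
    return -math.log(abs(6 - 2 * math.sqrt(x * x + 6 * x + 5) + 2 * x))

def integrand(x):
    return 1 / math.sqrt(x * x + 6 * x + 5)

h = 1e-6
for x in [0.0, 1.0, 2.5, 10.0]:
    deriv = (F(x + h) - F(x - h)) / (2 * h)  # central difference
    assert abs(deriv - integrand(x)) < 1e-5
print("antiderivative verified")
```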

5.3 Improper Integrals

The notion of the Riemann integral defined so far is too narrow for some applications: we can integrate only over finite intervals, and the functions are necessarily bounded. If the integration interval is unbounded, or the function to be integrated is unbounded, we speak of improper integrals. We consider three cases: one limit of integration is infinite; the function is not defined at one of the endpoints $a$ or $b$ of the interval; both $a$ and $b$ are critical points (either infinite, or the function is not defined there).

5.3.1 Integrals on unbounded intervals

Definition 5.5 Suppose $f\in\mathscr R$ on $[a,b]$ for all $b>a$, where $a$ is fixed. Define
$$\int_a^\infty f(x)\,dx=\lim_{b\to+\infty}\int_a^b f(x)\,dx \tag{5.37}$$
if this limit exists (and is finite). In that case, we say that the integral on the left converges. If it also converges after $f$ has been replaced by $|f|$, it is said to converge absolutely.
If an integral converges absolutely, then it converges (see Example 5.11 below), and
$$\left|\int_a^\infty f\,dx\right|\le\int_a^\infty|f|\,dx.$$
Similarly, one defines $\int_{-\infty}^b f(x)\,dx$. Moreover,
$$\int_{-\infty}^\infty f\,dx:=\int_{-\infty}^0 f\,dx+\int_0^\infty f\,dx$$
if both integrals on the right side converge.

Example 5.10 (a) The integral $\int_1^\infty\frac{dx}{x^s}$ converges for $s>1$ and diverges for $0<s\le1$. Indeed, for $s\ne1$,
$$\int_1^R\frac{dx}{x^s}=\frac1{1-s}\cdot\frac1{x^{s-1}}\Big|_1^R=\frac1{s-1}\Bigl(1-\frac1{R^{s-1}}\Bigr).$$
Since
$$\lim_{R\to+\infty}\frac1{R^{s-1}}=\begin{cases}0,&\text{if }s>1,\\+\infty,&\text{if }0<s<1,\end{cases}$$
and $\int_1^R\frac{dx}x=\log R\to+\infty$, we obtain
$$\int_1^\infty\frac{dx}{x^s}=\frac1{s-1},\qquad s>1.$$
(b)
$$\int_0^R e^{-x}\,dx=-e^{-x}\Big|_0^R=1-\frac1{e^R}.$$
Hence $\int_0^\infty e^{-x}\,dx=1$.

Proposition 5.24 (Cauchy criterion) The improper integral $\int_a^\infty f\,dx$ converges if and only if for every $\varepsilon>0$ there exists some $b>a$ such that for all $c,d>b$
$$\left|\int_c^d f\,dx\right|<\varepsilon.$$

Proof. The following Cauchy criterion for limits of functions is easily proved using sequences:

The limit $\lim_{x\to\infty}F(x)$ exists if and only if
$$\forall\varepsilon>0\ \exists R>0\ \forall x,y>R:\ |F(x)-F(y)|<\varepsilon. \tag{5.38}$$
Indeed, suppose that $(x_n)$ is any sequence converging to $+\infty$ as $n\to\infty$. We will show that $(F(x_n))$ converges if (5.38) is satisfied. Let $\varepsilon>0$. By assumption, there exists $R>0$ with the above property. Since $x_n\to+\infty$ there exists $n_0\in\mathbb N$ such that $n\ge n_0$ implies $x_n>R$. Hence $|F(x_n)-F(x_m)|<\varepsilon$ for $m,n\ge n_0$. Thus $(F(x_n))$ is a Cauchy sequence and therefore convergent. This proves one direction of the above criterion. The converse direction is even simpler: suppose that $\lim_{x\to+\infty}F(x)=A$ exists (and is finite!). We will show that the above criterion is satisfied. Let $\varepsilon>0$. By the definition of the limit there exists $R>0$ such that $x,y>R$ imply $|F(x)-A|<\varepsilon/2$ and $|F(y)-A|<\varepsilon/2$. By the triangle inequality,
$$|F(x)-F(y)|=|F(x)-A-(F(y)-A)|\le|F(x)-A|+|F(y)-A|<\frac\varepsilon2+\frac\varepsilon2=\varepsilon$$
for $x,y>R$, which completes the proof of this Cauchy criterion.
Applying this criterion to the function $F(t)=\int_a^t f\,dx$ and noting that $|F(d)-F(c)|=\bigl|\int_c^d f\,dx\bigr|$, the proposition follows.


Example 5.11 (a) If $\int_a^\infty f\,dx$ converges absolutely, then $\int_a^\infty f\,dx$ converges. Indeed, let $\varepsilon>0$ be given; $\int_a^\infty|f|\,dx$ converges. By the Cauchy criterion for the latter integral and by the triangle inequality (Proposition 5.10), there exists $b>0$ such that for all $c,d>b$
$$\left|\int_c^d f\,dx\right|\le\int_c^d|f|\,dx<\varepsilon. \tag{5.39}$$
Hence the Cauchy criterion is satisfied for $f$ if it holds for $|f|$; thus $\int_a^\infty f\,dx$ converges.
(b) $\int_1^\infty\frac{\sin x}x\,dx$. Integration by parts with $u=\frac1x$ and $v'=\sin x$ yields $u'=-\frac1{x^2}$, $v=-\cos x$, and
$$\int_c^d\frac{\sin x}x\,dx=-\frac{\cos x}x\Big|_c^d-\int_c^d\frac{\cos x}{x^2}\,dx,$$
$$\left|\int_c^d\frac{\sin x}x\,dx\right|\le\left|\frac{\cos d}d\right|+\left|\frac{\cos c}c\right|+\int_c^d\frac{dx}{x^2}\le\frac1c+\frac1d+\frac1c-\frac1d\le2\Bigl(\frac1c+\frac1d\Bigr)<\varepsilon$$
if $c$ and $d$ are sufficiently large. Hence $\int_1^\infty\frac{\sin x}x\,dx$ converges.

The integral does not converge absolutely. For non-negative integers $n$ we have
$$\int_{n\pi}^{(n+1)\pi}\frac{|\sin x|}x\,dx\ge\frac1{(n+1)\pi}\int_{n\pi}^{(n+1)\pi}|\sin x|\,dx=\frac2{(n+1)\pi};$$
hence
$$\int_1^{(n+1)\pi}\frac{|\sin x|}x\,dx\ge\frac2\pi\sum_{k=1}^n\frac1{k+1}.$$
Since the harmonic series diverges, so does the integral $\int_1^\infty\frac{|\sin x|}x\,dx$.
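Both claims are visible numerically: the oscillatory integral stabilizes (the tail bound above gives $|\int_c^d|\le2/c$), while the integral of $|\sin x|/x$ keeps growing like $(2/\pi)\log$. The cutoffs 100 and 400 are arbitrary illustrative choices:

```python
import math

def simpson(g, a, b, n):
    # composite Simpson rule, n even
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return s * h / 3

osc = lambda x: math.sin(x) / x
absf = lambda x: abs(math.sin(x)) / x

I_100 = simpson(osc, 1.0, 100.0, 50000)
I_400 = simpson(osc, 1.0, 400.0, 200000)
A_100 = simpson(absf, 1.0, 100.0, 50000)
A_400 = simpson(absf, 1.0, 400.0, 200000)

print(abs(I_400 - I_100))  # small: the tail bound gives |int_c^d| <= 2/c = 0.02
print(A_400 - A_100)       # roughly (2/pi) log 4: still growing
```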

Proposition 5.25 Suppose $f\in\mathscr R$ on $[a,b]$ for all $b>a$ and $f\ge0$. Then $\int_a^\infty f\,dx$ converges if there exists $C>0$ such that
$$\int_a^b f\,dx\le C\qquad\text{for all }b>a.$$
The proof is similar to the proof of Lemma 2.19(c); we omit it. Analogous propositions hold for integrals $\int_{-\infty}^b f\,dx$.

Proposition 5.26 (Integral criterion for series) Assume that $f\in\mathscr R$ is nonnegative, $f\ge0$, and decreasing on $[1,+\infty)$. Then $\int_1^\infty f\,dx$ converges if and only if the series $\sum_{n=1}^\infty f(n)$ converges.

Proof. Since $f(n)\le f(x)\le f(n-1)$ for $n-1\le x\le n$,
$$f(n)\le\int_{n-1}^n f\,dx\le f(n-1).$$
Summation over $n=2,3,\dots,N$ yields
$$\sum_{n=2}^N f(n)\le\int_1^N f\,dx\le\sum_{n=1}^{N-1}f(n).$$
If $\int_1^\infty f\,dx$ converges, the partial sums of $\sum_{n=1}^\infty f(n)$ are bounded and the series is therefore convergent. Conversely, if $\sum_{n=1}^\infty f(n)$ converges, the integral $\int_1^R f\,dx\le\sum_{n=1}^\infty f(n)$ is bounded as $R\to\infty$, hence convergent by Proposition 5.25.

Example 5.12 $\sum_{n=2}^\infty\frac1{n(\log n)^\alpha}$ converges if and only if $\int_2^\infty\frac{dx}{x(\log x)^\alpha}$ converges. The substitution $y=\log x$, $dy=\frac{dx}x$, gives
$$\int_2^\infty\frac{dx}{x(\log x)^\alpha}=\int_{\log2}^\infty\frac{dy}{y^\alpha},$$
which converges if and only if $\alpha>1$ (see Example 5.10).
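A numerical sketch of this example: partial sums for $\alpha=1$ keep growing (like $\log\log N$), while for $\alpha=2$ they stay below the bound $f(2)+\int_2^\infty\frac{dx}{x\log^2x}\approx2.49$ supplied by the integral criterion:

```python
import math

# Partial sums of the series 1/(n (log n)^alpha): bounded for alpha = 2,
# slowly unbounded (growing like log log N) for alpha = 1.
def partial(alpha, N):
    return sum(1 / (n * math.log(n) ** alpha) for n in range(2, N + 1))

s1_small, s1_big = partial(1, 10**3), partial(1, 10**6)
s2_big = partial(2, 10**6)
print(s1_big - s1_small)  # roughly log log 10^6 - log log 10^3 = log 2
print(s2_big)             # bounded: stays below about 2.49
```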

5.3.2 Integrals of Unbounded Functions

Definition 5.6 Suppose $f$ is a real function on $[a,b)$, unbounded near $b$, and $f\in\mathscr R$ on $[a,t]$ for every $t$ with $a<t<b$. Define
$$\int_a^b f\,dx=\lim_{t\to b-0}\int_a^t f\,dx$$
if this limit exists. Similarly, define
$$\int_a^b f\,dx=\lim_{t\to a+0}\int_t^b f\,dx$$
if $f$ is unbounded at $a$ and integrable on $[t,b]$ for all $t$ with $a<t<b$.

In both cases we say that $\int_a^b f\,dx$ converges.

Example 5.13 (a)
$$\int_0^1\frac{dx}{\sqrt{1-x^2}}=\lim_{t\to1-0}\int_0^t\frac{dx}{\sqrt{1-x^2}}=\lim_{t\to1-0}\arcsin x\Big|_0^t=\lim_{t\to1-0}\arcsin t=\arcsin1=\frac\pi2.$$
(b)
$$\int_0^1\frac{dx}{x^\alpha}=\lim_{t\to0+0}\int_t^1\frac{dx}{x^\alpha}=\lim_{t\to0+0}\begin{cases}\dfrac{1-t^{1-\alpha}}{1-\alpha},&\alpha\ne1,\\[1ex]\log x\Big|_t^1,&\alpha=1\end{cases}=\begin{cases}\dfrac1{1-\alpha},&\alpha<1,\\[1ex]+\infty,&\alpha\ge1.\end{cases}$$

Remarks 5.4 (a) The analogous statements to Proposition 5.24 and Proposition 5.25 hold for improper integrals $\int_a^b f\,dx$.
For example, $\int_0^1\frac{dx}{x(1-x)}$ diverges, since both improper integrals $\int_0^{1/2}f\,dx$ and $\int_{1/2}^1 f\,dx$ diverge; $\int_0^1\frac{dx}{\sqrt x\,(1-x)}$ diverges, since it diverges at $x=1$; finally, $I=\int_0^1\frac{dx}{\sqrt{x(1-x)}}$ converges. Indeed, the substitution $x=\sin^2t$ gives $I=\pi$.
(b) If $f$ is unbounded both at $a$ and at $b$ we define the improper integral

$$\int_a^b f\,dx=\int_a^c f\,dx+\int_c^b f\,dx$$
where $c$ lies between $a$ and $b$, provided both improper integrals on the right side exist.
(c) Also, if $f$ is unbounded at $a$, define
$$\int_a^\infty f\,dx=\int_a^b f\,dx+\int_b^\infty f\,dx$$
if the two improper integrals on the right side exist.
(d) If $f$ is unbounded at a point $c$ in the interior of the interval $[a,b]$, we define the improper integral
$$\int_a^b f\,dx=\int_a^c f\,dx+\int_c^b f\,dx$$
if the two improper integrals on the right side exist. For example,
$$\int_{-1}^1\frac{dx}{\sqrt{|x|}}=\int_{-1}^0\frac{dx}{\sqrt{|x|}}+\int_0^1\frac{dx}{\sqrt{|x|}}=\lim_{t\to0-0}\int_{-1}^t\frac{dx}{\sqrt{|x|}}+\lim_{t\to0+0}\int_t^1\frac{dx}{\sqrt{|x|}}=\lim_{t\to0-0}\bigl(-2\sqrt{-x}\,\bigr)\Big|_{-1}^t+\lim_{t\to0+0}2\sqrt x\,\Big|_t^1=4.$$
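The last computation can be mimicked numerically by shrinking a cutoff around the singularity at $0$; by symmetry it suffices to integrate twice over $[t,1]$ (Simpson's rule and the cutoff values are illustrative choices):

```python
import math

def simpson(g, a, b, n=20000):
    # composite Simpson rule, n even
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return s * h / 3

# 2 * integral of 1/sqrt(x) over [t, 1]; exact values are 2(2 - 2*sqrt(t)) -> 4
vals = [2 * simpson(lambda x: 1 / math.sqrt(x), t, 1.0) for t in (1e-2, 1e-3, 1e-4)]
print(vals)  # roughly 3.6, 3.87, 3.96 -- increasing toward 4
```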

5.3.3 The Gamma function

For $x>0$ set
$$\Gamma(x)=\int_0^\infty t^{x-1}e^{-t}\,dt. \tag{5.40}$$
By Example 5.13, $\Gamma_1(x)=\int_0^1 t^{x-1}e^{-t}\,dt$ converges, since for every $t>0$
$$t^{x-1}e^{-t}\le\frac1{t^{1-x}}.$$
By Example 5.10, $\Gamma_2(x)=\int_1^\infty t^{x-1}e^{-t}\,dt$ converges, since for every $t\ge t_0$
$$t^{x-1}e^{-t}\le\frac1{t^2}.$$
Note that $\lim_{t\to\infty}t^{x+1}e^{-t}=0$ by Proposition 3.11. Hence $\Gamma(x)$ is defined for every $x>0$.

Proposition 5.27 For every positive $x$,

$$x\Gamma(x)=\Gamma(x+1). \tag{5.41}$$
In particular, for $n\in\mathbb N$ we have $\Gamma(n+1)=n!$.

Proof. Using integration by parts,
$$\int_\varepsilon^R t^xe^{-t}\,dt=-t^xe^{-t}\Big|_\varepsilon^R+x\int_\varepsilon^R t^{x-1}e^{-t}\,dt.$$
Taking the limits $\varepsilon\to0+0$ and $R\to+\infty$ one obtains $\Gamma(x+1)=x\Gamma(x)$. Since by Example 5.10
$$\Gamma(1)=\int_0^\infty e^{-t}\,dt=1,$$
it follows from (5.41) that
$$\Gamma(n+1)=n\Gamma(n)=\cdots=n(n-1)(n-2)\cdots\Gamma(1)=n!$$
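The identity $\Gamma(5)=4!=24$ can be checked by truncating the integral at $T=50$, where the integrand $t^4e^{-t}$ is already negligible; Python's `math.gamma` provides an independent reference value:

```python
import math

def simpson(g, a, b, n=50000):
    # composite Simpson rule, n even
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return s * h / 3

gamma5 = simpson(lambda t: t**4 * math.exp(-t), 0.0, 50.0)  # approximates Gamma(5)
print(gamma5, math.factorial(4), math.gamma(5))             # all close to 24
```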

The Gamma function interpolates the factorial $n!$, which is defined only for positive integers $n$. However, this property alone is not sufficient for a complete characterization of the Gamma function; another property is needed. This is carried out in more detail in the appendix to this chapter.

5.4 Integration of Vector-Valued Functions

A mapping $\gamma\colon[a,b]\to\mathbb R^k$, $\gamma(t)=(\gamma_1(t),\dots,\gamma_k(t))$, is said to be continuous if all the mappings $\gamma_i$, $i=1,\dots,k$, are continuous. Moreover, if all the $\gamma_i$ are differentiable, we write $\gamma'(t)=(\gamma_1'(t),\dots,\gamma_k'(t))$.

Definition 5.7 Let $f_1,\dots,f_k$ be real functions on $[a,b]$ and let $f=(f_1,\dots,f_k)$ be the corresponding mapping from $[a,b]$ into $\mathbb R^k$. If $\alpha$ increases on $[a,b]$, to say that $f\in\mathscr R(\alpha)$ means that $f_j\in\mathscr R(\alpha)$ for $j=1,\dots,k$. In this case we define
$$\int_a^b f\,d\alpha=\Bigl(\int_a^b f_1\,d\alpha,\dots,\int_a^b f_k\,d\alpha\Bigr).$$
In other words, $\int_a^b f\,d\alpha$ is the point in $\mathbb R^k$ whose $j$th coordinate is $\int_a^b f_j\,d\alpha$. It is clear that parts (a), (c), and (e) of Proposition 5.9 are valid for these vector-valued integrals; we simply apply the earlier results to each coordinate. The same is true for Proposition 5.12, Theorem 5.14, and Theorem 5.15. To illustrate this, we state the analog of the fundamental theorem of calculus.

Theorem 5.28 If $f=(f_1,\dots,f_k)\in\mathscr R$ on $[a,b]$ and if $F=(F_1,\dots,F_k)$ is an antiderivative of $f$ on $[a,b]$, then
$$\int_a^b f(x)\,dx=F(b)-F(a).$$
The analog of Proposition 5.10(b) offers some new features. Let $x=(x_1,\dots,x_k)\in\mathbb R^k$ be any vector in $\mathbb R^k$; we denote its Euclidean norm by $\|x\|=\sqrt{x_1^2+\cdots+x_k^2}$.

Proposition 5.29 If $f=(f_1,\dots,f_k)\in\mathscr R(\alpha)$ on $[a,b]$, then $\|f\|\in\mathscr R(\alpha)$ and
$$\left\|\int_a^b f\,d\alpha\right\|\le\int_a^b\|f\|\,d\alpha. \tag{5.42}$$

Proof. By the definition of the norm,

$$\|f\|=\bigl(f_1^2+f_2^2+\cdots+f_k^2\bigr)^{1/2}.$$
By Proposition 5.10(a) each of the functions $f_i^2$ belongs to $\mathscr R(\alpha)$; hence so does their sum $f_1^2+f_2^2+\cdots+f_k^2$. Since the square root is a continuous function on the positive half line, applying Proposition 5.8 we see that $\|f\|\in\mathscr R(\alpha)$.
To prove (5.42), put $y=(y_1,\dots,y_k)$ with $y_j=\int_a^b f_j\,d\alpha$. Then $y=\int_a^b f\,d\alpha$ and
$$\|y\|^2=\sum_{j=1}^k y_j^2=\sum_{j=1}^k y_j\int_a^b f_j\,d\alpha=\int_a^b\Bigl(\sum_{j=1}^k y_jf_j\Bigr)d\alpha.$$
By the Cauchy–Schwarz inequality (Proposition 1.25),
$$\sum_{j=1}^k y_jf_j(t)\le\|y\|\,\|f(t)\|,\qquad t\in[a,b].$$
Inserting this into the preceding equation, the monotonicity of the integral gives
$$\|y\|^2\le\|y\|\int_a^b\|f\|\,d\alpha.$$
If $y=0$, (5.42) is trivial. If $y\ne0$, division by $\|y\|$ gives (5.42).

Integration of Complex Valued Functions

This is the special case $k=2$ of the above arguments. Let $\varphi\colon[a,b]\to\mathbb C$ be a complex-valued function, and let $u,v\colon[a,b]\to\mathbb R$ be the real and imaginary parts of $\varphi$, respectively: $u=\operatorname{Re}\varphi$ and $v=\operatorname{Im}\varphi$. The function $\varphi=u+iv$ is said to be integrable if $u,v\in\mathscr R$ on $[a,b]$, and we set
$$\int_a^b\varphi\,dx=\int_a^b u\,dx+i\int_a^b v\,dx.$$
The fundamental theorem of calculus holds: if the complex function $\varphi$ is Riemann integrable, $\varphi\in\mathscr R$ on $[a,b]$, and $F(x)$ is an antiderivative of $\varphi$, then
$$\int_a^b\varphi(x)\,dx=F(b)-F(a).$$
Similarly, if $u$ and $v$ are both continuous, $F(x)=\int_a^x\varphi(t)\,dt$ is an antiderivative of $\varphi(x)$.
Proof. Let $F=U+iV$ be the antiderivative of $\varphi$, where $U'=u$ and $V'=v$. By the fundamental theorem of calculus,
$$\int_a^b\varphi\,dx=\int_a^b u\,dx+i\int_a^b v\,dx=U(b)-U(a)+i\bigl(V(b)-V(a)\bigr)=F(b)-F(a).$$

Example:
$$\int_a^b e^{\alpha t}\,dt=\frac1\alpha e^{\alpha t}\Big|_a^b,\qquad\alpha\in\mathbb C,\ \alpha\ne0.$$


5.5 Inequalities

Besides the triangle inequality $\bigl|\int_a^b f\,d\alpha\bigr|\le\int_a^b|f|\,d\alpha$, which was shown in Proposition 5.10, we can formulate Hölder's, Minkowski's, and the Cauchy–Schwarz inequalities for Riemann–Stieltjes integrals. For this, let $p>0$ be a fixed positive real number and $\alpha$ an increasing function on $[a,b]$. For $f\in\mathscr R(\alpha)$ define the $L^p$-norm
$$\|f\|_p=\Bigl(\int_a^b|f|^p\,d\alpha\Bigr)^{1/p}. \tag{5.43}$$

Cauchy–Schwarz Inequality

Proposition 5.30 Let $f, g\colon [a,b] \to \mathbb{C}$ be complex-valued functions with $f, g \in R$ on $[a,b]$. Then
\[
\left( \int_a^b |fg|\,dx \right)^2 \le \int_a^b |f|^2\,dx \cdot \int_a^b |g|^2\,dx. \tag{5.44}
\]

Proof. Replacing $f$ by $|f|$ and $g$ by $|g|$, it suffices to show $\left( \int fg\,dx \right)^2 \le \int f^2\,dx \cdot \int g^2\,dx$ for nonnegative real $f$ and $g$. For this, put $A = \int_a^b g^2\,dx$, $B = \int_a^b fg\,dx$, and $C = \int_a^b f^2\,dx$. Let $\lambda \in \mathbb{R}$ be arbitrary. By the positivity and linearity of the integral,
\[
0 \le \int_a^b (f + \lambda g)^2\,dx = \int_a^b f^2\,dx + 2\lambda \int_a^b fg\,dx + \lambda^2 \int_a^b g^2\,dx = C + 2B\lambda + A\lambda^2 =: h(\lambda).
\]
Thus, $h$ is nonnegative for all real values $\lambda$.

Case 1: $A = 0$. Inserting this, we get $2B\lambda + C \ge 0$ for all $\lambda \in \mathbb{R}$. This implies $B = 0$ and $C \ge 0$; the inequality is satisfied.

Case 2: $A > 0$. Dividing the above inequality by $A$, we have
\[
0 \le \lambda^2 + \frac{2B}{A}\lambda + \frac{C}{A} = \left( \lambda + \frac{B}{A} \right)^2 - \left( \frac{B}{A} \right)^2 + \frac{C}{A}.
\]
This is satisfied for all $\lambda$ if and only if
\[
\left( \frac{B}{A} \right)^2 \le \frac{C}{A}, \quad \text{and finally} \quad B^2 \le AC.
\]
This completes the proof.
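As a quick numerical sanity check of (5.44) (an illustration of mine, with arbitrarily chosen $f = \sin$ and $g = \exp$ on $[0,1]$), one can compare both sides on midpoint Riemann sums:

```python
import math

def integral(f, a, b, n=4000):
    """Midpoint Riemann sum of f on [a, b]."""
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

a, b = 0.0, 1.0
f, g = math.sin, math.exp
lhs = integral(lambda x: abs(f(x) * g(x)), a, b) ** 2
rhs = integral(lambda x: f(x) ** 2, a, b) * integral(lambda x: g(x) ** 2, a, b)
```

Since $\sin$ and $\exp$ are not proportional, the inequality is strict here.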

Proposition 5.31 (a) Cauchy–Schwarz inequality. Suppose $f, g \in R(\alpha)$; then
\[
\left| \int_a^b fg\,d\alpha \right| \le \int_a^b |fg|\,d\alpha \le \sqrt{ \int_a^b |f|^2\,d\alpha } \, \sqrt{ \int_a^b |g|^2\,d\alpha }, \tag{5.45}
\]
or
\[
\int_a^b |fg|\,d\alpha \le \|f\|_2\,\|g\|_2. \tag{5.46}
\]

(b) Hölder's inequality. Let $p$ and $q$ be positive real numbers such that $\frac1p + \frac1q = 1$. If $f, g \in R(\alpha)$, then
\[
\left| \int_a^b fg\,d\alpha \right| \le \int_a^b |fg|\,d\alpha \le \|f\|_p\,\|g\|_q. \tag{5.47}
\]

(c) Minkowski's inequality. Let $p \ge 1$ and $f, g \in R(\alpha)$; then
\[
\|f + g\|_p \le \|f\|_p + \|g\|_p. \tag{5.48}
\]
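These inequalities can likewise be tested numerically. The sketch below (my own, for the Riemann case $\alpha(x) = x$ with arbitrarily chosen $f$, $g$ and exponents $p = 3$, $q = 3/2$) checks Hölder and Minkowski on midpoint sums:

```python
import math

def integral(f, a, b, n=4000):
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

def lp_norm(f, p, a, b):
    """Discrete version of the L^p norm (5.43) with alpha(x) = x."""
    return integral(lambda x: abs(f(x)) ** p, a, b) ** (1 / p)

a, b = 0.0, 1.0
p, q = 3.0, 1.5                        # 1/p + 1/q = 1/3 + 2/3 = 1
f = lambda x: x
g = math.cos

holder_lhs = integral(lambda x: abs(f(x) * g(x)), a, b)
holder_rhs = lp_norm(f, p, a, b) * lp_norm(g, q, a, b)
mink_lhs = lp_norm(lambda x: f(x) + g(x), p, a, b)
mink_rhs = lp_norm(f, p, a, b) + lp_norm(g, p, a, b)
```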

5.6 Appendix D

The composition of an integrable and a continuous function is integrable

Proof of Proposition 5.8. Let $\varepsilon > 0$. Since $\varphi$ is uniformly continuous on $[m, M]$, there exists $\delta > 0$ such that $\delta < \varepsilon$ and $|\varphi(s) - \varphi(t)| < \varepsilon$ if $|s - t| < \delta$ and $s, t \in [m, M]$.

Since $f \in R(\alpha)$, there exists a partition $P = \{x_0, x_1, \dots, x_n\}$ of $[a,b]$ such that
\[
U(P, f, \alpha) - L(P, f, \alpha) < \delta^2. \tag{5.49}
\]

Let $M_i$ and $m_i$ have the same meaning as in Definition 5.1, and let $M_i^*$ and $m_i^*$ be the analogous numbers for $h$. Divide the numbers $1, 2, \dots, n$ into two classes: $i \in A$ if $M_i - m_i < \delta$, and $i \in B$ if $M_i - m_i \ge \delta$. For $i \in A$, our choice of $\delta$ shows that $M_i^* - m_i^* \le \varepsilon$. For $i \in B$, $M_i^* - m_i^* \le 2K$, where $K = \sup\{ |\varphi(t)| \mid m \le t \le M \}$. By (5.49), we have
\[
\delta \sum_{i \in B} \Delta\alpha_i \le \sum_{i \in B} (M_i - m_i)\,\Delta\alpha_i < \delta^2, \tag{5.50}
\]
so that $\sum_{i \in B} \Delta\alpha_i < \delta$. It follows that
\[
U(P, h, \alpha) - L(P, h, \alpha) = \sum_{i \in A} (M_i^* - m_i^*)\,\Delta\alpha_i + \sum_{i \in B} (M_i^* - m_i^*)\,\Delta\alpha_i \le \varepsilon\bigl( \alpha(b) - \alpha(a) \bigr) + 2K\delta < \varepsilon\bigl( \alpha(b) - \alpha(a) + 2K \bigr).
\]
Since $\varepsilon$ was arbitrary, Proposition 5.3 implies that $h \in R(\alpha)$.

Convex Functions are Continuous

Proposition 5.32 Every convex function $f\colon (a,b) \to \mathbb{R}$, $-\infty \le a < b \le +\infty$, is continuous.

Proof. There is a very nice geometric proof in Rudin's book "Real and Complex Analysis", see [Rud66, 3.2 Theorem]. We give another proof here. Let $x \in (a,b)$; choose points $x_3, x_2$ with $a < x_3 < x < x_2 < b$. By convexity $f$ is bounded above on $[x_3, x_2]$ by the chord through $(x_3, f(x_3))$ and $(x_2, f(x_2))$, and $f$ is bounded below on $[x_3, x_2]$ by a linear function; hence $f$ is bounded on $[x_3, x_2]$, say $|f(x)| \le C$ on $[x_3, x_2]$.

The convexity implies
\[
f\left( \tfrac12 (x+h) + \tfrac12 (x-h) \right) \le \tfrac12 \bigl( f(x+h) + f(x-h) \bigr) \;\Longrightarrow\; f(x) - f(x-h) \le f(x+h) - f(x).
\]
Iteration yields
\[
f(x - (\nu-1)h) - f(x - \nu h) \le f(x+h) - f(x) \le f(x + \nu h) - f(x + (\nu-1)h).
\]
Summing up over $\nu = 1, \dots, n$ we have
\[
f(x) - f(x - nh) \le n\bigl( f(x+h) - f(x) \bigr) \le f(x + nh) - f(x)
\]
\[
\Longrightarrow\; \frac1n \bigl( f(x) - f(x - nh) \bigr) \le f(x+h) - f(x) \le \frac1n \bigl( f(x + nh) - f(x) \bigr).
\]

Let $\varepsilon > 0$ be given; choose $n \in \mathbb{N}$ such that $2C/n < \varepsilon$, and choose $h$ small enough that $x_3 < x - nh < x < x + nh < x_2$. The above inequality then implies
\[
|f(x+h) - f(x)| \le \frac{2C}{n} < \varepsilon.
\]
This shows continuity of $f$ at $x$.

If $g$ is an increasing convex function and $f$ is a convex function, then $g \circ f$ is convex, since $f(\lambda x + \mu y) \le \lambda f(x) + \mu f(y)$, $\lambda + \mu = 1$, $\lambda, \mu \ge 0$, implies
\[
g\bigl( f(\lambda x + \mu y) \bigr) \le g\bigl( \lambda f(x) + \mu f(y) \bigr) \le \lambda g(f(x)) + \mu g(f(y)).
\]

5.6.1 More on the Gamma Function

Let $I \subset \mathbb{R}$ be an interval. A positive function $F\colon I \to \mathbb{R}$ is called logarithmically convex if $\log F\colon I \to \mathbb{R}$ is convex, i.e. for every $x, y \in I$ and every $\lambda$, $0 \le \lambda \le 1$, we have
\[
F(\lambda x + (1-\lambda) y) \le F(x)^{\lambda}\, F(y)^{1-\lambda}.
\]

Proposition 5.33 The Gamma function is logarithmically convex.

Proof. Let $x, y > 0$ and $0 < \lambda < 1$ be given. Set $p = 1/\lambda$ and $q = 1/(1-\lambda)$. Then $1/p + 1/q = 1$, and we apply Hölder's inequality to the functions
\[
f(t) = t^{\frac{x-1}{p}} e^{-\frac{t}{p}}, \qquad g(t) = t^{\frac{y-1}{q}} e^{-\frac{t}{q}}
\]
and obtain
\[
\int_\varepsilon^R f(t) g(t)\,dt \le \left( \int_\varepsilon^R f(t)^p\,dt \right)^{\frac1p} \left( \int_\varepsilon^R g(t)^q\,dt \right)^{\frac1q}.
\]
Note that
\[
f(t) g(t) = t^{\frac{x}{p} + \frac{y}{q} - 1} e^{-t}, \qquad f(t)^p = t^{x-1} e^{-t}, \qquad g(t)^q = t^{y-1} e^{-t}.
\]
Taking the limits $\varepsilon \to 0+0$ and $R \to +\infty$ we obtain
\[
\Gamma\!\left( \frac{x}{p} + \frac{y}{q} \right) \le \Gamma(x)^{\frac1p}\, \Gamma(y)^{\frac1q}.
\]
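Logarithmic convexity of $\Gamma$ can be observed directly with the standard library's `math.lgamma`, which computes $\log\Gamma$ on the positive axis; the following check (illustrative, with sample points chosen by me) verifies the defining inequality:

```python
import math

def logconvex_gap(x, y, lam):
    """lam*logGamma(x) + (1-lam)*logGamma(y) - logGamma(lam*x + (1-lam)*y).
    Nonnegative for all x, y > 0 and 0 <= lam <= 1 iff Gamma is log-convex."""
    return (lam * math.lgamma(x) + (1 - lam) * math.lgamma(y)
            - math.lgamma(lam * x + (1 - lam) * y))

gaps = [logconvex_gap(x, y, lam)
        for x in (0.5, 1.7, 3.0)
        for y in (0.9, 2.4, 6.0)
        for lam in (0.1, 0.5, 0.9)]
```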

Remark 5.5 One can prove that a convex function (see Definition 4.4) is continuous, see Proposition 5.32. Also, an increasing convex function of a convex function $f$ is convex; for example, $e^f$ is convex if $f$ is. We conclude that $\Gamma(x)$ is continuous for $x > 0$.

Theorem 5.34 Let $F\colon (0, +\infty) \to (0, +\infty)$ be a function with
(a) $F(1) = 1$,
(b) $F(x+1) = x F(x)$,
(c) $F$ is logarithmically convex.
Then $F(x) = \Gamma(x)$ for all $x > 0$.

Proof. Since $\Gamma(x)$ has the properties (a), (b), and (c), it suffices to prove that $F$ is uniquely determined by (a), (b), and (c). By (b),
\[
F(x+n) = F(x)\, x (x+1) \cdots (x+n-1)
\]
for every positive $x$ and every positive integer $n$. In particular $F(n+1) = n!$, and it suffices to show that $F(x)$ is uniquely determined for every $x \in (0,1)$. Since $n + x = (1-x) n + x(n+1)$, from (c) it follows that
\[
F(n+x) \le F(n)^{1-x} F(n+1)^x = F(n)^{1-x} F(n)^x n^x = (n-1)!\, n^x.
\]
Similarly, from $n+1 = x(n+x) + (1-x)(n+1+x)$ it follows that
\[
n! = F(n+1) \le F(n+x)^x F(n+1+x)^{1-x} = F(n+x)^x \bigl( F(n+x)(n+x) \bigr)^{1-x} = F(n+x)(n+x)^{1-x}.
\]
Combining both inequalities,
\[
n!\,(n+x)^{x-1} \le F(n+x) \le (n-1)!\, n^x,
\]
and moreover
\[
a_n(x) := \frac{n!\,(n+x)^{x-1}}{x(x+1)\cdots(x+n-1)} \le F(x) \le \frac{(n-1)!\, n^x}{x(x+1)\cdots(x+n-1)} =: b_n(x).
\]
Since $\frac{b_n(x)}{a_n(x)} = \left( \frac{n}{n+x} \right)^x \frac{n+x}{n}$ converges to $1$ as $n \to \infty$,
\[
F(x) = \lim_{n\to\infty} \frac{(n-1)!\, n^x}{x(x+1)\cdots(x+n-1)}.
\]
Hence $F$ is uniquely determined.
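The limit formula at the end of the proof can be tested against `math.gamma`; the sketch below (my own, evaluated in logarithms via `math.lgamma` to avoid overflow for large $n$) computes $b_n(x) = (n-1)!\,n^x / (x(x+1)\cdots(x+n-1))$ for a moderate $n$:

```python
import math

def b_n(x, n):
    """Log-domain evaluation of (n-1)! n^x / (x(x+1)...(x+n-1))."""
    logval = math.lgamma(n) + x * math.log(n)        # lgamma(n) = log (n-1)!
    logval -= sum(math.log(x + k) for k in range(n))
    return math.exp(logval)

x = 0.5
approx = b_n(x, 4000)
rel_err = abs(approx - math.gamma(x)) / math.gamma(x)   # Gamma(1/2) = sqrt(pi)
```

Convergence is only of order $1/n$, which is why a fairly large $n$ is used.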

Stirling’s Formula

We give an asymptotic formula for $n!$ as $n \to \infty$. We call two sequences $(a_n)$ and $(b_n)$ asymptotically equal if $\lim_{n\to\infty} \frac{a_n}{b_n} = 1$, and we write $a_n \sim b_n$.

Proposition 5.35 (Stirling's Formula) The asymptotic behavior of $n!$ is
\[
n! \sim \sqrt{2\pi n}\, \left( \frac{n}{e} \right)^n.
\]

Proof. Using the trapezoid rule (5.34) with $f(x) = \log x$, $f''(x) = -1/x^2$, we have
\[
\int_k^{k+1} \log x\,dx = \frac12 \bigl( \log k + \log(k+1) \bigr) + \frac{1}{12\,\xi_k^2}
\]
with $k \le \xi_k \le k+1$. Summation over $k = 1, \dots, n-1$ gives
\[
\int_1^n \log x\,dx = \sum_{k=1}^{n} \log k - \frac12 \log n + \frac{1}{12} \sum_{k=1}^{n-1} \frac{1}{\xi_k^2}.
\]
Since $\int \log x\,dx = x \log x - x$ (integration by parts), we have
\[
n \log n - n + 1 = \sum_{k=1}^{n} \log k - \frac12 \log n + \frac{1}{12} \sum_{k=1}^{n-1} \frac{1}{\xi_k^2}
\]
\[
\Longrightarrow\; \sum_{k=1}^{n} \log k = \left( n + \frac12 \right) \log n - n + \gamma_n,
\]
where $\gamma_n = 1 - \frac{1}{12} \sum_{k=1}^{n-1} \frac{1}{\xi_k^2}$. Exponentiating both sides of the equation we find, with $c_n = e^{\gamma_n}$,
\[
n! = n^{n + \frac12}\, e^{-n}\, c_n. \tag{5.51}
\]

Since $0 < 1/\xi_k^2 \le 1/k^2$, the limit
\[
\gamma = \lim_{n\to\infty} \gamma_n = 1 - \frac{1}{12} \sum_{k=1}^{\infty} \frac{1}{\xi_k^2}
\]
exists, and so does the limit $c = \lim_{n\to\infty} c_n = e^{\gamma}$.

Proof of $c_n \to \sqrt{2\pi}$. Using (5.51) we have
\[
\frac{c_n^2}{c_{2n}} = \frac{(n!)^2 \sqrt{2n}\,(2n)^{2n}}{n^{2n+1}\,(2n)!} = \sqrt{2}\, \frac{2^{2n} (n!)^2}{\sqrt{n}\,(2n)!}
\]
and $\lim_{n\to\infty} \frac{c_n^2}{c_{2n}} = \frac{c^2}{c} = c$. Using Wallis's product formula for $\pi$,
\[
\frac{\pi}{2} = \prod_{k=1}^{\infty} \frac{4k^2}{4k^2 - 1} = \lim_{n\to\infty} \frac{2\cdot2\cdot4\cdot4\cdots 2n\cdot 2n}{1\cdot3\cdot3\cdot5\cdots(2n-1)(2n+1)}, \tag{5.52}
\]
we have
\[
\left( \prod_{k=1}^{n} \frac{4k^2}{4k^2-1} \right)^{\frac12} = \frac{2\cdot4\cdots 2n}{1\cdot3\cdot5\cdots(2n-1)\,\sqrt{2n+1}} = \frac{2^2\cdot4^2\cdots(2n)^2}{1\cdot2\cdot3\cdots(2n-1)(2n)\,\sqrt{2n+1}} = \frac{2^{2n}(n!)^2}{(2n)!\,\sqrt{2n+1}},
\]
so that
\[
\sqrt{\pi} = \lim_{n\to\infty} \frac{2^{2n} (n!)^2}{\sqrt{n}\,(2n)!}.
\]
Consequently, $c = \sqrt{2}\,\sqrt{\pi} = \sqrt{2\pi}$, which completes the proof.
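Stirling's formula is easy to check numerically: the ratio $n! / \bigl( \sqrt{2\pi n}\,(n/e)^n \bigr)$ decreases towards $1$ (an illustration of mine; the classical refinement says the ratio is roughly $1 + \frac{1}{12n}$, but only the monotone approach to $1$ is asserted here):

```python
import math

def stirling(n):
    """The asymptotic approximation sqrt(2 pi n) (n/e)^n of n!."""
    return math.sqrt(2 * math.pi * n) * (n / math.e) ** n

ratios = [math.factorial(n) / stirling(n) for n in (1, 5, 10, 50)]
```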

Proof of Hölder's Inequality

Proof of Proposition 5.31. We prove (b); the other two statements are consequences, and their proofs are along the lines of Section 1.3. The main idea is to approximate the integral on the left by Riemann sums and use Hölder's inequality (1.22). Let $\varepsilon > 0$; without loss of generality, let $f, g \ge 0$. By Proposition 5.10, $fg, f^p, g^q \in R(\alpha)$, and by Proposition 5.3 there exist partitions $P_1$, $P_2$, and $P_3$ of $[a,b]$ such that $U(fg, P_1, \alpha) - L(fg, P_1, \alpha) < \varepsilon$, $U(f^p, P_2, \alpha) - L(f^p, P_2, \alpha) < \varepsilon$, and $U(g^q, P_3, \alpha) - L(g^q, P_3, \alpha) < \varepsilon$. Let $P = \{x_0, x_1, \dots, x_n\}$ be the common refinement of $P_1$, $P_2$, and $P_3$. By Lemma 5.4 (a) and (c),
\[
\int_a^b fg\,d\alpha < \sum_{i=1}^{n} (fg)(t_i)\,\Delta\alpha_i + \varepsilon, \tag{5.53}
\]
\[
\sum_{i=1}^{n} f(t_i)^p\,\Delta\alpha_i < \int_a^b f^p\,d\alpha + \varepsilon, \tag{5.54}
\]
\[
\sum_{i=1}^{n} g(t_i)^q\,\Delta\alpha_i < \int_a^b g^q\,d\alpha + \varepsilon, \tag{5.55}
\]
for any $t_i \in [x_{i-1}, x_i]$. Using the two preceding inequalities and Hölder's inequality (1.22) we have
\[
\sum_{i=1}^{n} \left( f(t_i)\,\Delta\alpha_i^{\frac1p} \right) \left( g(t_i)\,\Delta\alpha_i^{\frac1q} \right) \le \left( \sum_{i=1}^{n} f(t_i)^p\,\Delta\alpha_i \right)^{\frac1p} \left( \sum_{i=1}^{n} g(t_i)^q\,\Delta\alpha_i \right)^{\frac1q} < \left( \int_a^b f^p\,d\alpha + \varepsilon \right)^{\frac1p} \left( \int_a^b g^q\,d\alpha + \varepsilon \right)^{\frac1q}.
\]
By (5.53),
\[
\int_a^b fg\,d\alpha < \sum_{i=1}^{n} (fg)(t_i)\,\Delta\alpha_i + \varepsilon < \left( \int_a^b f^p\,d\alpha + \varepsilon \right)^{\frac1p} \left( \int_a^b g^q\,d\alpha + \varepsilon \right)^{\frac1q} + \varepsilon.
\]
Since $\varepsilon > 0$ was arbitrary, the claim follows.

Chapter 6

Sequences of Functions and Basic Topology

In the present chapter we turn our attention to complex-valued functions (including the real-valued ones), although many of the theorems and proofs which follow extend to vector-valued functions without difficulty, and even to mappings into more general spaces. We stay within this simple framework in order to focus attention on the most important aspects of the problems that arise when limit processes are interchanged.

6.1 Discussion of the Main Problem

Definition 6.1 Suppose $(f_n)$, $n \in \mathbb{N}$, is a sequence of functions defined on a set $E$, and suppose that the sequence of numbers $(f_n(x))$ converges for every $x \in E$. We can then define a function $f$ by
\[
f(x) = \lim_{n\to\infty} f_n(x), \qquad x \in E. \tag{6.1}
\]

Under these circumstances we say that $(f_n)$ converges on $E$ and that $f$ is the limit (or the limit function) of $(f_n)$. Sometimes we say that "$(f_n)$ converges pointwise to $f$ on $E$" if (6.1) holds. Similarly, if $\sum_{n=1}^{\infty} f_n(x)$ converges for every $x \in E$, and if we define
\[
f(x) = \sum_{n=1}^{\infty} f_n(x), \qquad x \in E, \tag{6.2}
\]
the function $f$ is called the sum of the series $\sum_{n=1}^{\infty} f_n$.

The main problem which arises is to determine whether important properties of the functions $f_n$ are preserved under the limit operations (6.1) and (6.2). For instance, if the functions $f_n$ are continuous, or differentiable, or integrable, is the same true of the limit function? What are the relations between $f_n'$ and $f'$, say, or between the integrals of $f_n$ and that of $f$? To say that $f$ is continuous at $x$ means
\[
\lim_{t\to x} f(t) = f(x).
\]

Hence, to ask whether the limit of a sequence of continuous functions is continuous is the same as to ask whether
\[
\lim_{t\to x} \lim_{n\to\infty} f_n(t) = \lim_{n\to\infty} \lim_{t\to x} f_n(t), \tag{6.3}
\]
i.e. whether the order in which limit processes are carried out is immaterial. We shall now show by means of several examples that limit processes cannot in general be interchanged without affecting the result. Afterwards, we shall prove that under certain conditions the order in which limit operations are carried out is inessential.

Example 6.1 (a) Our first example, and the simplest one, concerns a "double sequence." For positive integers $m, n \in \mathbb{N}$ let
\[
s_{mn} = \frac{m}{m+n}.
\]
Then, for fixed $n$,
\[
\lim_{m\to\infty} s_{mn} = 1, \quad \text{so that} \quad \lim_{n\to\infty} \lim_{m\to\infty} s_{mn} = 1.
\]
On the other hand, for every fixed $m$,
\[
\lim_{n\to\infty} s_{mn} = 0, \quad \text{so that} \quad \lim_{m\to\infty} \lim_{n\to\infty} s_{mn} = 0.
\]
The two limits cannot be interchanged.

(b) Let $f_n(x) = x^n$ on $[0,1]$. Then
\[
f(x) = \begin{cases} 0, & 0 \le x < 1, \\ 1, & x = 1. \end{cases}
\]
All functions $f_n(x)$ are continuous on $[0,1]$; however, the limit $f(x)$ is discontinuous at $x = 1$; that is,
\[
\lim_{t\to 1-0} \lim_{n\to\infty} t^n = 0 \neq 1 = \lim_{n\to\infty} \lim_{t\to 1-0} t^n.
\]
The limits cannot be interchanged.

After these examples, which show what can go wrong if limit processes are interchanged carelessly, we now define a new notion of convergence, stronger than pointwise convergence as defined in Definition 6.1, which will enable us to arrive at positive results.

6.2 Uniform Convergence

6.2.1 Definitions and Example

Definition 6.2 A sequence of functions $(f_n)$ converges uniformly on $E$ to a function $f$ if for every $\varepsilon > 0$ there is a positive integer $n_0$ such that $n \ge n_0$ implies
\[
|f_n(x) - f(x)| \le \varepsilon \tag{6.4}
\]
for all $x \in E$. We write $f_n \rightrightarrows f$ on $E$.

As a formula, $f_n \rightrightarrows f$ on $E$ if
\[
\forall\, \varepsilon > 0\ \ \exists\, n_0 \in \mathbb{N}\ \ \forall\, n \ge n_0\ \ \forall\, x \in E:\quad |f_n(x) - f(x)| \le \varepsilon.
\]

[Figure: uniform convergence of $f_n$ to $f$ on $[a,b]$ means that $f_n$ lies in the $\varepsilon$-tube between $f - \varepsilon$ and $f + \varepsilon$ for sufficiently large $n$.]

It is clear that every uniformly convergent sequence is pointwise convergent (to the same limit function). Quite explicitly, the difference between the two concepts is this: if $(f_n)$ converges pointwise on $E$ to a function $f$, then for every $\varepsilon > 0$ and for every $x \in E$ there exists an integer $n_0$, depending on both $\varepsilon$ and $x \in E$, such that (6.4) holds if $n \ge n_0$. If $(f_n)$ converges uniformly on $E$, it is possible, for each $\varepsilon > 0$, to find one integer $n_0$ which will do for all $x \in E$.

We say that the series $\sum_{k=1}^{\infty} f_k(x)$ converges uniformly on $E$ if the sequence $(s_n(x))$ of partial sums defined by
\[
s_n(x) = \sum_{k=1}^{n} f_k(x)
\]
converges uniformly on $E$.

Proposition 6.1 (Cauchy criterion) (a) The sequence of functions $(f_n)$ defined on $E$ converges uniformly on $E$ if and only if for every $\varepsilon > 0$ there is an integer $n_0$ such that $n, m \ge n_0$ and $x \in E$ imply
\[
|f_n(x) - f_m(x)| \le \varepsilon. \tag{6.5}
\]

(b) The series of functions $\sum_{k=1}^{\infty} g_k(x)$ defined on $E$ converges uniformly on $E$ if and only if for every $\varepsilon > 0$ there is an integer $n_0$ such that $n \ge m \ge n_0$ and $x \in E$ imply
\[
\left| \sum_{k=m}^{n} g_k(x) \right| \le \varepsilon.
\]

Proof. Suppose $(f_n)$ converges uniformly on $E$, and let $f$ be the limit function. Then there is an integer $n_0$ such that $n \ge n_0$, $x \in E$ implies
\[
|f_n(x) - f(x)| \le \frac{\varepsilon}{2},
\]
so that
\[
|f_n(x) - f_m(x)| \le |f_n(x) - f(x)| + |f_m(x) - f(x)| \le \varepsilon
\]
if $m, n \ge n_0$, $x \in E$.

Conversely, suppose the Cauchy condition holds. By Proposition 2.18, the sequence $(f_n(x))$ converges for every $x$ to a limit which we may call $f(x)$. Thus the sequence $(f_n)$ converges pointwise on $E$ to $f$. We have to prove that the convergence is uniform. Let $\varepsilon > 0$ be given, and choose $n_0$ such that (6.5) holds. Fix $n$ and let $m \to \infty$ in (6.5). Since $f_m(x) \to f(x)$ as $m \to \infty$, this gives
\[
|f_n(x) - f(x)| \le \varepsilon
\]
for every $n \ge n_0$ and $x \in E$.

(b) immediately follows from (a) with $f_n(x) = \sum_{k=1}^{n} g_k(x)$.

Remark 6.1 Suppose
\[
\lim_{n\to\infty} f_n(x) = f(x), \qquad x \in E.
\]
Put
\[
M_n = \sup_{x \in E} |f_n(x) - f(x)|.
\]
Then $f_n \rightrightarrows f$ uniformly on $E$ if and only if $M_n \to 0$ as $n \to \infty$. (Prove!)

The following comparison test of a function series with a numerical series gives a sufficient criterion for uniform convergence.

Theorem 6.2 (Weierstraß) Suppose $(f_n)$ is a sequence of functions defined on $E$, and suppose
\[
|f_n(x)| \le M_n, \qquad x \in E,\ n \in \mathbb{N}. \tag{6.6}
\]
Then $\sum_{n=1}^{\infty} f_n$ converges uniformly on $E$ if $\sum_{n=1}^{\infty} M_n$ converges.

Proof. If $\sum M_n$ converges, then for arbitrary $\varepsilon > 0$ there exists $n_0$ such that $m, n \ge n_0$ implies $\sum_{i=m}^{n} M_i \le \varepsilon$. Hence, by the triangle inequality and (6.6),
\[
\left| \sum_{i=m}^{n} f_i(x) \right| \le \sum_{i=m}^{n} |f_i(x)| \le \sum_{i=m}^{n} M_i \le \varepsilon, \qquad \forall\, x \in E.
\]

Uniform convergence now follows from Proposition 6.1.
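A small numerical illustration of the Weierstraß test (my own, with the arbitrarily chosen series $\sum \sin(nx)/n^2$ and $M_n = 1/n^2$): the deviation between two partial sums is bounded by the tail of $\sum M_n$, uniformly in $x$:

```python
import math

def partial_sum(x, n):
    """Partial sum of sum_k sin(kx)/k^2, dominated termwise by M_k = 1/k^2."""
    return sum(math.sin(k * x) / k ** 2 for k in range(1, n + 1))

N, M = 50, 400
tail_bound = sum(1.0 / k ** 2 for k in range(N + 1, M + 1))   # sum of M_k over the tail
xs = [2 * math.pi * j / 100 for j in range(101)]
max_dev = max(abs(partial_sum(x, M) - partial_sum(x, N)) for x in xs)
```

The triangle inequality guarantees `max_dev <= tail_bound` at every grid point, which is exactly the uniform Cauchy estimate in the proof.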

Proposition 6.3 (Comparison Test) If $\sum_{n=1}^{\infty} g_n(x)$ converges uniformly on $E$ and $|f_n(x)| \le g_n(x)$ for all sufficiently large $n$ and all $x \in E$, then $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly on $E$.

Proof. Apply the Cauchy criterion; note that
\[
\left| \sum_{n=k}^{m} f_n(x) \right| \le \sum_{n=k}^{m} |f_n(x)| \le \sum_{n=k}^{m} g_n(x) < \varepsilon.
\]


Application of Weierstraß' Theorem to Power Series and Fourier Series

Proposition 6.4 Let
\[
\sum_{n=0}^{\infty} a_n z^n, \qquad a_n \in \mathbb{C}, \tag{6.7}
\]
be a power series with radius of convergence $R > 0$. Then (6.7) converges uniformly on the closed disc $\{ z \mid |z| \le r \}$ for every $r$ with $0 \le r < R$.

Proof. We apply Weierstraß' theorem to $f_n(z) = a_n z^n$. Note that
\[
|f_n(z)| = |a_n|\,|z|^n \le |a_n|\, r^n.
\]
Since $r < R$, $r$ belongs to the disc of convergence, so the series $\sum_{n=0}^{\infty} |a_n|\, r^n$ converges by Theorem 2.34. By Theorem 6.2, the series $\sum_{n=0}^{\infty} a_n z^n$ converges uniformly on $\{ z \mid |z| \le r \}$.

Remark 6.2 (a) The power series
\[
\sum_{n=1}^{\infty} n a_n z^{n-1}
\]
has the same radius of convergence $R$ as the series (6.7) and hence also converges uniformly on the closed disc $\{ z \mid |z| \le r \}$. Indeed, this simply follows from the fact that
\[
\lim_{n\to\infty} \sqrt[n]{(n+1)\,|a_{n+1}|} = \lim_{n\to\infty} \sqrt[n]{n+1} \cdot \lim_{n\to\infty} \sqrt[n]{|a_{n+1}|} = \frac{1}{R}.
\]

(b) Note that a power series in general does not converge uniformly on the whole open disc of convergence $|z| < R$. As an example, consider the geometric series
\[
f(z) = \frac{1}{1-z} = \sum_{k=0}^{\infty} z^k, \qquad |z| < 1.
\]
Note that the condition
\[
\exists\, \varepsilon_0 > 0\ \ \forall\, n \in \mathbb{N}\ \ \exists\, x_n \in E:\quad |f_n(x_n) - f(x_n)| \ge \varepsilon_0
\]
implies that $(f_n)$ does not converge uniformly to $f$ on $E$. (Prove!)

To $\varepsilon_0 = 1$ and every $n \in \mathbb{N}$ choose $z_n = \frac{n}{n+1}$; using Bernoulli's inequality we obtain
\[
z_n^n = \left( 1 - \frac{1}{n+1} \right)^n \ge 1 - \frac{n}{n+1} = 1 - z_n, \quad \text{hence} \quad \frac{z_n^n}{1 - z_n} \ge 1, \tag{6.8}
\]
so that
\[
|s_{n-1}(z_n) - f(z_n)| = \left| \sum_{k=n}^{\infty} z_n^k \right| = \frac{z_n^n}{1 - z_n} \ge 1.
\]
The geometric series doesn't converge uniformly on the whole open unit disc.


Example 6.2 (a) A series of the form
\[
\sum_{n=0}^{\infty} a_n \cos(nx) + \sum_{n=1}^{\infty} b_n \sin(nx), \qquad a_n, b_n, x \in \mathbb{R}, \tag{6.9}
\]
is called a Fourier series (see Section 6.3 below). If both $\sum_{n=0}^{\infty} |a_n|$ and $\sum_{n=1}^{\infty} |b_n|$ converge, then the series (6.9) converges uniformly on $\mathbb{R}$ to a function $F(x)$. Indeed, since $|a_n \cos(nx)| \le |a_n|$ and $|b_n \sin(nx)| \le |b_n|$, by Theorem 6.2 the series (6.9) converges uniformly on $\mathbb{R}$.

(b) Let $f\colon \mathbb{R} \to \mathbb{R}$ be the sum of the Fourier series
\[
f(x) = \sum_{n=1}^{\infty} \frac{\sin nx}{n}. \tag{6.10}
\]
Note that (a) does not apply since $\sum_n |b_n| = \sum_n \frac1n$ diverges. If $f(x)$ exists, so does $f(x + 2\pi) = f(x)$, and $f(0) = 0$. We will show that the series converges uniformly on $[\delta, 2\pi - \delta]$ for every $\delta > 0$. For this, put
\[
s_n(x) = \sum_{k=1}^{n} \sin kx = \operatorname{Im}\left( \sum_{k=1}^{n} e^{ikx} \right).
\]
If $\delta \le x \le 2\pi - \delta$ we have
\[
|s_n(x)| \le \left| \sum_{k=1}^{n} e^{ikx} \right| = \left| \frac{e^{i(n+1)x} - e^{ix}}{e^{ix} - 1} \right| \le \frac{2}{|e^{ix/2} - e^{-ix/2}|} = \frac{1}{\sin\frac{x}{2}} \le \frac{1}{\sin\frac{\delta}{2}}.
\]
Note that $|\operatorname{Im} z| \le |z|$ and $|e^{ix}| = 1$. Since $\sin\frac{x}{2} \ge \sin\frac{\delta}{2}$ for $\delta/2 \le x/2 \le \pi - \delta/2$, we have for $0 < \delta \le x \le 2\pi - \delta$, using Abel summation,
\[
\left| \sum_{k=m}^{n} \frac{\sin kx}{k} \right| = \left| \sum_{k=m}^{n} \frac{s_k(x) - s_{k-1}(x)}{k} \right| = \left| \sum_{k=m}^{n} s_k(x) \left( \frac1k - \frac{1}{k+1} \right) + \frac{s_n(x)}{n+1} - \frac{s_{m-1}(x)}{m} \right|
\]
\[
\le \frac{1}{\sin\frac{\delta}{2}} \left( \sum_{k=m}^{n} \left( \frac1k - \frac{1}{k+1} \right) + \frac{1}{n+1} + \frac1m \right) = \frac{1}{\sin\frac{\delta}{2}} \left( \frac1m - \frac{1}{n+1} + \frac{1}{n+1} + \frac1m \right) \le \frac{2}{m \sin\frac{\delta}{2}}.
\]
The right side becomes arbitrarily small as $m \to \infty$. Using Proposition 6.1 (b), uniform convergence of (6.10) on $[\delta, 2\pi - \delta]$ follows.
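The bound $\bigl| \sum_{k=m}^{n} \sin(kx)/k \bigr| \le 2/(m \sin\frac{\delta}{2})$ derived above can be checked numerically on a grid (a sketch of mine; $\delta$, $m$, and the tail length are chosen arbitrarily):

```python
import math

def tail(x, m, n):
    """The block sum sum_{k=m}^{n} sin(kx)/k."""
    return sum(math.sin(k * x) / k for k in range(m, n + 1))

delta = 0.5

def bound(m):
    """The Abel-summation bound 2 / (m sin(delta/2)) from the text."""
    return 2.0 / (m * math.sin(delta / 2))

xs = [delta + (2 * math.pi - 2 * delta) * j / 200 for j in range(201)]
worst = max(abs(tail(x, m, m + 500)) - bound(m) for m in (10, 40) for x in xs)
```

A negative `worst` means the bound held at every sampled point and block.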

6.2.2 Uniform Convergence and Continuity

Theorem 6.5 Let $E \subset \mathbb{R}$ be a subset and $f_n\colon E \to \mathbb{R}$, $n \in \mathbb{N}$, a sequence of continuous functions on $E$ converging uniformly to some function $f\colon E \to \mathbb{R}$. Then $f$ is continuous on $E$.

Proof. Let $a \in E$ and $\varepsilon > 0$ be given. Since $f_n \rightrightarrows f$, there is an $r \in \mathbb{N}$ such that
\[
|f_r(x) - f(x)| \le \varepsilon/3 \quad \text{for all } x \in E.
\]
Since $f_r$ is continuous at $a$, there exists $\delta > 0$ such that $|x - a| < \delta$ implies
\[
|f_r(x) - f_r(a)| \le \varepsilon/3.
\]
Hence $|x - a| < \delta$ implies
\[
|f(x) - f(a)| \le |f(x) - f_r(x)| + |f_r(x) - f_r(a)| + |f_r(a) - f(a)| \le \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon.
\]
This proves the assertion.

The same is true for functions $f\colon E \to \mathbb{C}$ where $E \subset \mathbb{C}$.

Example 6.3 (Example 6.2 continued) (a) A power series defines a continuous function on the disc of convergence $\{ z \mid |z| < R \}$.

Indeed, let $|z_0| < R$. Then there exists $r \in \mathbb{R}$ such that $|z_0| < r < R$. By Proposition 6.4 the power series converges uniformly on the closed disc $\{ z \mid |z| \le r \}$, and all partial sums are continuous; hence the sum is continuous at $z_0$ by Theorem 6.5.

(b) The sum of the Fourier series (6.9) is a continuous function on $\mathbb{R}$ if both $\sum_n |a_n|$ and $\sum_n |b_n|$ converge.

(c) The sum of the Fourier series $f(x) = \sum_n \frac{\sin(nx)}{n}$ is continuous on $[\delta, 2\pi - \delta]$ for all $\delta$ with $0 < \delta < \pi$ by the above theorem. Clearly, $f$ is $2\pi$-periodic since all partial sums are. Later (see the section on Fourier series) we will show that
\[
f(x) = \begin{cases} 0, & x = 0, \\ \dfrac{\pi - x}{2}, & x \in (0, 2\pi). \end{cases}
\]
Since $f$ is discontinuous at $x_0 = 2\pi n$, the Fourier series does not converge uniformly on $\mathbb{R}$.

Also, Example 6.1 (b) shows that the continuity of the $f_n(x) = x^n$ alone is not sufficient for the continuity of the limit function. On the other hand, the sequence of continuous functions $(x^n)$ on $(0,1)$ converges to the continuous function $0$; however, the convergence is not uniform. (Prove!)

6.2.3 Uniform Convergence and Integration

Example 6.4 Let $f_n(x) = 2n^2 x\, e^{-n^2 x^2}$; clearly $\lim_{n\to\infty} f_n(x) = 0$ for all $x \in \mathbb{R}$. Further,
\[
\int_0^1 f_n(x)\,dx = \left[ -e^{-n^2 x^2} \right]_0^1 = 1 - e^{-n^2} \longrightarrow 1.
\]

On the other hand, $\int_0^1 \lim_{n\to\infty} f_n(x)\,dx = \int_0^1 0\,dx = 0$. Thus, $\lim_{n\to\infty}$ and integration cannot be interchanged. The reason: $(f_n)$ converges pointwise to $0$ but not uniformly. Indeed,
\[
f_n\!\left( \frac1n \right) = \frac{2n}{e} \longrightarrow +\infty \quad \text{as } n \to \infty.
\]

Theorem 6.6 Let $\alpha$ be an increasing function on $[a,b]$. Suppose $f_n \in R(\alpha)$ on $[a,b]$ for all $n \in \mathbb{N}$, and suppose $f_n \to f$ uniformly on $[a,b]$. Then $f \in R(\alpha)$ on $[a,b]$ and
\[
\int_a^b f\,d\alpha = \lim_{n\to\infty} \int_a^b f_n\,d\alpha. \tag{6.11}
\]
Proof. Put

\[
\varepsilon_n = \sup_{x \in [a,b]} |f_n(x) - f(x)|.
\]
Then
\[
f_n - \varepsilon_n \le f \le f_n + \varepsilon_n,
\]
so that the upper and the lower integrals of $f$ satisfy
\[
\int_a^b (f_n - \varepsilon_n)\,d\alpha \le \underline{\int_a^b} f\,d\alpha \le \overline{\int_a^b} f\,d\alpha \le \int_a^b (f_n + \varepsilon_n)\,d\alpha. \tag{6.12}
\]
Hence,
\[
0 \le \overline{\int} f\,d\alpha - \underline{\int} f\,d\alpha \le 2\varepsilon_n \bigl( \alpha(b) - \alpha(a) \bigr).
\]
Since $\varepsilon_n \to 0$ as $n \to \infty$ (Remark 6.1), the upper and the lower integrals of $f$ are equal. Thus $f \in R(\alpha)$. Another application of (6.12) yields
\[
\int_a^b (f_n - \varepsilon_n)\,d\alpha \le \int_a^b f\,d\alpha \le \int_a^b (f_n + \varepsilon_n)\,d\alpha
\]
\[
\Longrightarrow\; \left| \int_a^b f\,d\alpha - \int_a^b f_n\,d\alpha \right| \le \varepsilon_n \bigl( \alpha(b) - \alpha(a) \bigr).
\]

This implies (6.11).
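Example 6.4 above can be replayed numerically; the sketch below (my own) compares a midpoint sum for $\int_0^1 2n^2 x e^{-n^2 x^2}\,dx$ with the exact value $1 - e^{-n^2}$, which tends to $1$ even though the pointwise limit integrates to $0$:

```python
import math

def int_fn(n, steps=20000):
    """Midpoint sum of f_n(x) = 2 n^2 x exp(-n^2 x^2) over [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += 2 * n * n * x * math.exp(-(n * x) ** 2)
    return total * h

ns = (1, 3, 6)
vals = [int_fn(n) for n in ns]
errs = [abs(v - (1 - math.exp(-n * n))) for v, n in zip(vals, ns)]   # antiderivative -exp(-n^2 x^2)
```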

Corollary 6.7 If $f_n \in R(\alpha)$ on $[a,b]$ and if the series
\[
f(x) = \sum_{n=1}^{\infty} f_n(x), \qquad a \le x \le b,
\]
converges uniformly on $[a,b]$, then
\[
\int_a^b \left( \sum_{n=1}^{\infty} f_n \right) d\alpha = \sum_{n=1}^{\infty} \left( \int_a^b f_n\,d\alpha \right).
\]
In other words, the series may be integrated term by term.

Corollary 6.8 Let $f_n\colon [a,b] \to \mathbb{R}$ be a sequence of continuous functions converging uniformly on $[a,b]$ to $f$, and let $x_0 \in [a,b]$. Then the sequence $F_n(x) = \int_{x_0}^x f_n(t)\,dt$ converges uniformly to $F(x) = \int_{x_0}^x f(t)\,dt$.

Proof. The pointwise convergence of $F_n$ follows from the above theorem with $\alpha(t) = t$ and $a$ and $b$ replaced by $x_0$ and $x$.

We show uniform convergence. Let $\varepsilon > 0$. Since $f_n \rightrightarrows f$ on $[a,b]$, there exists $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $|f_n(t) - f(t)| \le \frac{\varepsilon}{b-a}$ for all $t \in [a,b]$. For $n \ge n_0$ and all $x \in [a,b]$ we thus have
\[
|F_n(x) - F(x)| = \left| \int_{x_0}^x \bigl( f_n(t) - f(t) \bigr)\,dt \right| \le \left| \int_{x_0}^x |f_n(t) - f(t)|\,dt \right| \le \frac{\varepsilon}{b-a}\,(b-a) = \varepsilon.
\]
Hence $F_n \rightrightarrows F$ on $[a,b]$.

Example 6.5 (a) For every real $t \in (-1, 1)$ we have
\[
\log(1+t) = t - \frac{t^2}{2} + \frac{t^3}{3} \mp \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}\, t^n. \tag{6.13}
\]
Proof. In Homework 13.5 (a) the Taylor series
\[
T(x) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}\, x^n
\]
of $\log(1+x)$ was computed, and it was shown that $T(x) = \log(1+x)$ if $x \in (0,1)$. By Proposition 6.4 the geometric series $\sum_{n=0}^{\infty} (-1)^n x^n$ converges uniformly to the function $\frac{1}{1+x}$ on $[-r, r]$ for all $0 < r < 1$. Hence, by Corollary 6.8, for $|t| < 1$,
\[
\log(1+t) = \int_0^t \frac{dx}{1+x} = \sum_{n=0}^{\infty} (-1)^n \int_0^t x^n\,dx = \sum_{n=0}^{\infty} \frac{(-1)^n}{n+1}\, t^{n+1} = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}\, t^n.
\]

(b) For $|t| < 1$ we have
\[
\arctan t = t - \frac{t^3}{3} + \frac{t^5}{5} \mp \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{t^{2n+1}}{2n+1}. \tag{6.14}
\]
As in the previous example we use the uniform convergence of the geometric series on $[-r, r]$ for every $0 < r < 1$:
\[
\arctan t = \int_0^t \frac{dx}{1+x^2} = \int_0^t \sum_{n=0}^{\infty} (-1)^n x^{2n}\,dx = \sum_{n=0}^{\infty} (-1)^n \int_0^t x^{2n}\,dx = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1}\, t^{2n+1}.
\]

Note that one is, in general, not allowed to insert $t = 1$ into the equations (6.13) and (6.14). However, the following proposition (the proof is in the appendix to this chapter) fills this gap.

Proposition 6.9 (Abel's Limit Theorem) Let $\sum_{n=0}^{\infty} a_n$ be a convergent series of real numbers. Then the power series
\[
f(x) = \sum_{n=0}^{\infty} a_n x^n
\]
converges for $x \in [0,1]$ and is continuous on $[0,1]$.

As a consequence of the above proposition we have
\[
\log 2 = 1 - \frac12 + \frac13 - \frac14 \pm \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n},
\]
\[
\frac{\pi}{4} = 1 - \frac13 + \frac15 - \frac17 \pm \cdots = \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1}.
\]
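Both series converge slowly; by the alternating series estimate the error is at most the first omitted term. A quick numeric check of mine:

```python
import math

# Leibniz-type partial sums for log 2 and pi/4; error <= first omitted term ~ 1e-5
log2_partial = sum((-1) ** (n - 1) / n for n in range(1, 100001))
pi_partial = 4 * sum((-1) ** n / (2 * n + 1) for n in range(100000))
```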

Example 6.6 We have $f_n(x) = \frac1n e^{-\frac{x}{n}} \rightrightarrows f(x) \equiv 0$ on $[0, +\infty)$. Indeed, $|f_n(x) - 0| \le \frac1n < \varepsilon$ if $n \ge \frac1\varepsilon$, for all $x \in \mathbb{R}_+$. However,
\[
\int_0^{+\infty} f_n(t)\,dt = \left[ -e^{-\frac{t}{n}} \right]_0^{+\infty} = \lim_{t\to+\infty} \left( 1 - e^{-\frac{t}{n}} \right) = 1.
\]

Hence
\[
\lim_{n\to\infty} \int_0^{\infty} f_n(t)\,dt = 1 \neq 0 = \int_0^{\infty} f(t)\,dt.
\]
That is, Theorem 6.6 fails in the case of improper integrals.

6.2.4 Uniform Convergence and Differentiation

Example 6.7 Let
\[
f_n(x) = \frac{\sin(nx)}{\sqrt{n}}, \qquad x \in \mathbb{R},\ n \in \mathbb{N}, \tag{6.15}
\]
and $f(x) = \lim_{n\to\infty} f_n(x) = 0$. Then $f'(x) = 0$, and
\[
f_n'(x) = \sqrt{n} \cos(nx),
\]
so that $(f_n')$ does not converge to $f'$. For instance, $f_n'(0) = \sqrt{n} \to +\infty$ as $n \to \infty$, whereas $f'(0) = 0$. Note that $(f_n)$ converges uniformly to $0$ on $\mathbb{R}$ since $|\sin(nx)/\sqrt{n}| \le 1/\sqrt{n}$ becomes small independently of $x \in \mathbb{R}$.

Consequently, uniform convergence of $(f_n)$ implies nothing about the sequence $(f_n')$. Thus, stronger hypotheses are required for the assertion that $f_n \to f$ implies $f_n' \to f'$.

Theorem 6.10 Suppose $(f_n)$ is a sequence of continuously differentiable functions on $[a,b]$ converging pointwise to some function $f$. Suppose further that $(f_n')$ converges uniformly on $[a,b]$.

Then $f_n$ converges uniformly to $f$ on $[a,b]$, $f$ is continuously differentiable on $[a,b]$, and
\[
f'(x) = \lim_{n\to\infty} f_n'(x), \qquad a \le x \le b. \tag{6.16}
\]

Proof. Put $g(x) = \lim_{n\to\infty} f_n'(x)$; then $g$ is continuous by Theorem 6.5. By the Fundamental Theorem of Calculus, Theorem 5.14,
\[
f_n(x) = f_n(a) + \int_a^x f_n'(t)\,dt.
\]
By the assumption on $(f_n')$ and by Corollary 6.8, the sequence
\[
\left( \int_a^x f_n'(t)\,dt \right)_{n \in \mathbb{N}}
\]
converges uniformly on $[a,b]$ to $\int_a^x g(t)\,dt$. Taking the limit $n \to \infty$ in the above equation, we thus obtain
\[
f(x) = f(a) + \int_a^x g(t)\,dt.
\]
Since $g$ is continuous, the right-hand side defines a differentiable function, namely the antiderivative of $g(x)$, by the FTC. Hence $f'(x) = g(x)$; since $g$ is continuous, the proof is now complete.

For a more general result (without the additional assumption of continuity of fn′ ) see [Rud76, 7.17 Theorem].

Corollary 6.11 Let $f(x) = \sum_{n=0}^{\infty} a_n x^n$ be a power series with radius of convergence $R$.
(a) Then $f$ is differentiable on $(-R, R)$ and we have
\[
f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}, \qquad x \in (-R, R). \tag{6.17}
\]
(b) The function $f$ is infinitely often differentiable on $(-R, R)$ and we have
\[
f^{(k)}(x) = \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\, a_n x^{n-k}, \tag{6.18}
\]
\[
a_n = \frac{1}{n!}\, f^{(n)}(0), \qquad n \in \mathbb{N}_0. \tag{6.19}
\]
In particular, $f$ coincides with its Taylor series.

Proof. (a) By Remark 6.2 (a), the power series $\sum_{n=0}^{\infty} (a_n x^n)'$ has the same radius of convergence and converges uniformly on every closed subinterval $[-r, r]$ of $(-R, R)$. By Theorem 6.10, $f(x)$ is differentiable, and differentiation and summation can be interchanged.

(b) Iterated application of (a) yields that $f^{(k-1)}$ is differentiable on $(-R, R)$ with (6.18). In particular, inserting $x = 0$ into (6.18) we find
\[
f^{(k)}(0) = k!\, a_k \;\Longrightarrow\; a_k = \frac{f^{(k)}(0)}{k!}.
\]
These are exactly the Taylor coefficients of $f$ at $a = 0$. Hence, $f$ coincides with its Taylor series.

Example 6.8 For $x \in (-1, 1)$ we have
\[
\sum_{n=1}^{\infty} n x^n = \frac{x}{(1-x)^2}.
\]
Since the geometric series $f(x) = \sum_{n=0}^{\infty} x^n$ equals $1/(1-x)$ on $(-1,1)$, by Corollary 6.11 we have
\[
\frac{1}{(1-x)^2} = \frac{d}{dx}\left( \frac{1}{1-x} \right) = \frac{d}{dx} \sum_{n=0}^{\infty} x^n = \sum_{n=1}^{\infty} \frac{d}{dx}(x^n) = \sum_{n=1}^{\infty} n x^{n-1}.
\]
Multiplying the preceding equation by $x$ gives the result.
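A numeric spot check of this identity (my own, at the arbitrarily chosen point $x = 0.3$; the geometric tail beyond 200 terms is negligible at machine precision):

```python
x = 0.3
series = sum(n * x ** n for n in range(1, 200))   # partial sum of sum_n n x^n
closed = x / (1 - x) ** 2                         # closed form from Example 6.8
```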

6.3 Fourier Series

In this section we consider basic notions and results of the theory of Fourier series. The question is how to write a periodic function as a series of $\cos kx$ and $\sin kx$, $k \in \mathbb{N}$. In contrast to Taylor expansions, the periodic function need not be infinitely often differentiable. Two Fourier series may have the same behavior in one interval, but may behave in different ways in some other interval; we have here a very striking contrast between Fourier series and power series.

In this section a periodic function is meant to be a $2\pi$-periodic complex-valued function on $\mathbb{R}$, that is, $f\colon \mathbb{R} \to \mathbb{C}$ satisfies $f(x + 2\pi) = f(x)$ for all $x \in \mathbb{R}$. Special periodic functions are the trigonometric polynomials.

Definition 6.3 A function $f\colon \mathbb{R} \to \mathbb{R}$ is called a trigonometric polynomial if there are real numbers $a_k, b_k$, $k = 0, \dots, n$, with
\[
f(x) = \frac{a_0}{2} + \sum_{k=1}^{n} a_k \cos kx + b_k \sin kx. \tag{6.20}
\]

The coefficients $a_k$ and $b_k$ are uniquely determined by $f$, since
\[
a_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos kx\,dx, \quad k = 0, 1, \dots, n, \qquad
b_k = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin kx\,dx, \quad k = 1, \dots, n. \tag{6.21}
\]

This is immediate from
\[
\int_0^{2\pi} \cos kx \sin mx\,dx = 0, \qquad
\int_0^{2\pi} \cos kx \cos mx\,dx = \pi \delta_{km}, \qquad
\int_0^{2\pi} \sin kx \sin mx\,dx = \pi \delta_{km}, \qquad k, m \in \mathbb{N}, \tag{6.22}
\]
where $\delta_{km} = 1$ if $k = m$ and $\delta_{km} = 0$ if $k \neq m$ is the so-called Kronecker symbol; see Homework 19.2. For example, if $m \ge 1$ we have
\[
\frac{1}{\pi} \int_0^{2\pi} f(x) \cos mx\,dx = \frac{1}{\pi} \int_0^{2\pi} \left( \frac{a_0}{2} + \sum_{k=1}^{n} a_k \cos kx + b_k \sin kx \right) \cos mx\,dx
\]
\[
= \frac{1}{\pi} \sum_{k=1}^{n} \int_0^{2\pi} \bigl( a_k \cos kx \cos mx + b_k \sin kx \cos mx \bigr)\,dx = \frac{1}{\pi} \sum_{k=1}^{n} a_k \pi \delta_{km} = a_m.
\]

Sometimes it is useful to consider complex trigonometric polynomials. Using the formulas expressing $\cos x$ and $\sin x$ in terms of $e^{ix}$ and $e^{-ix}$, we can write the above polynomial (6.20) as
\[
f(x) = \sum_{k=-n}^{n} c_k e^{ikx}, \tag{6.23}
\]
where $c_0 = a_0/2$ and
\[
c_k = \frac12 (a_k - i b_k), \qquad c_{-k} = \frac12 (a_k + i b_k), \qquad k \ge 1.
\]

To obtain the coefficients $c_k$ using integration, we need the notion of an integral of a complex-valued function; see Section 5.5. If $m \neq 0$ we have
\[
\int_a^b e^{imx}\,dx = \frac{1}{im}\, e^{imx} \Big|_a^b.
\]
If $a = 0$ and $b = 2\pi$ and $m \in \mathbb{Z}$ we obtain
\[
\int_0^{2\pi} e^{imx}\,dx = \begin{cases} 0, & m \in \mathbb{Z} \setminus \{0\}, \\ 2\pi, & m = 0. \end{cases} \tag{6.24}
\]
We conclude
\[
c_k = \frac{1}{2\pi} \int_0^{2\pi} f(x)\, e^{-ikx}\,dx, \qquad k = 0, \pm1, \dots, \pm n.
\]
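The orthogonality relations (6.22) and the coefficient formulas can be verified numerically; the following sketch (mine, with an arbitrarily chosen trigonometric polynomial) recovers $a_1$ and $b_4$ by integration:

```python
import math

def integral(f, n=20000):
    """Midpoint sum over one period [0, 2*pi]; exact up to rounding for trig polynomials."""
    h = 2 * math.pi / n
    return sum(f((k + 0.5) * h) for k in range(n)) * h

# orthogonality relations (6.22)
i_cross = integral(lambda x: math.cos(2 * x) * math.cos(3 * x))   # should be 0
i_same = integral(lambda x: math.cos(3 * x) ** 2)                 # should be pi

# coefficient recovery for a hand-picked trigonometric polynomial
f = lambda x: 0.5 + 2 * math.cos(x) - 3 * math.sin(4 * x)
a1 = integral(lambda x: f(x) * math.cos(x)) / math.pi             # should be 2
b4 = integral(lambda x: f(x) * math.sin(4 * x)) / math.pi         # should be -3
```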

Definition 6.4 Let $f\colon \mathbb{R} \to \mathbb{C}$ be a periodic function with $f \in R$ on $[0, 2\pi]$. We call
\[
c_k = \frac{1}{2\pi} \int_0^{2\pi} f(x)\, e^{-ikx}\,dx, \qquad k \in \mathbb{Z}, \tag{6.25}
\]
the Fourier coefficients of $f$, and the series
\[
\sum_{k=-\infty}^{\infty} c_k e^{ikx}, \tag{6.26}
\]
i.e. the sequence of partial sums
\[
s_n = \sum_{k=-n}^{n} c_k e^{ikx}, \qquad n \in \mathbb{N},
\]
the Fourier series of $f$.

The Fourier series can also be written as
\[
\frac{a_0}{2} + \sum_{k=1}^{\infty} a_k \cos kx + b_k \sin kx, \tag{6.27}
\]
where $a_k$ and $b_k$ are given by (6.21). One can ask whether the Fourier series of a function converges to the function itself. It is easy to see: if the function $f$ is the uniform limit of a series of trigonometric polynomials,
\[
f(x) = \sum_{k=-\infty}^{\infty} \gamma_k e^{ikx}, \tag{6.28}
\]
then $f$ coincides with its Fourier series. Indeed, since the series (6.28) converges uniformly, by Theorem 6.6 we can change the order of summation and integration and obtain
\[
c_k = \frac{1}{2\pi} \int_0^{2\pi} \left( \sum_{m=-\infty}^{\infty} \gamma_m e^{imx} \right) e^{-ikx}\,dx = \frac{1}{2\pi} \sum_{m=-\infty}^{\infty} \gamma_m \int_0^{2\pi} e^{i(m-k)x}\,dx = \gamma_k.
\]
In general, the Fourier series of $f$ converges neither uniformly nor pointwise to $f$. For Fourier series, convergence with respect to the $L^2$-norm
\[
\|f\|_2 = \left( \frac{1}{2\pi} \int_0^{2\pi} |f|^2\,dx \right)^{\frac12} \tag{6.29}
\]
is the appropriate notion.

6.3.1 An Inner Product on the Periodic Functions

Let $V$ be the linear space of periodic functions $f\colon \mathbb{R} \to \mathbb{C}$ with $f \in R$ on $[0, 2\pi]$. We introduce an inner product on $V$ by
\[
f \cdot g = \frac{1}{2\pi} \int_0^{2\pi} f(x)\, \overline{g(x)}\,dx, \qquad f, g \in V.
\]

One easily checks the following properties for $f, g, h \in V$, $\lambda, \mu \in \mathbb{C}$:
\[
(f + g) \cdot h = f \cdot h + g \cdot h, \qquad
f \cdot (g + h) = f \cdot g + f \cdot h, \qquad
(\lambda f) \cdot (\mu g) = \lambda \overline{\mu}\, (f \cdot g), \qquad
f \cdot g = \overline{g \cdot f}.
\]
For every $f \in V$ we have $f \cdot f = \frac{1}{2\pi} \int_0^{2\pi} |f|^2\,dx \ge 0$. However, $f \cdot f = 0$ does not imply $f = 0$ (one can change $f$ at finitely many points without any impact on $f \cdot f$). If $f \in V$ is continuous, then $f \cdot f = 0$ implies $f = 0$; see Homework 14.3. Put $\|f\|_2 = \sqrt{f \cdot f}$.

Note that in the physical literature the inner product on $L^2(X)$ is often linear in the second component and antilinear in the first component. Defining for $k \in \mathbb{Z}$ the periodic function $e_k\colon \mathbb{R} \to \mathbb{C}$ by $e_k(x) = e^{ikx}$, the Fourier coefficients of $f \in V$ take the form
\[
c_k = f \cdot e_k, \qquad k \in \mathbb{Z}.
\]
From (6.24) it follows that the functions $e_k$, $k \in \mathbb{Z}$, satisfy
\[
e_k \cdot e_l = \delta_{kl}. \tag{6.30}
\]
Any such subset $\{ e_k \mid k \in \mathbb{N} \}$ of an inner product space $V$ satisfying (6.30) is called an orthonormal system (ONS). Using $e_k(x) = \cos kx + i \sin kx$, the real orthogonality relations (6.22) immediately follow from (6.30).

The next lemma shows that the Fourier series of $f$ is the best $L^2$-approximation of a periodic function $f \in V$ by trigonometric polynomials.

Lemma 6.12 (Least Square Approximation) Suppose $f \in V$ has the Fourier coefficients $c_k$, $k \in \mathbb{Z}$, and let $\gamma_k \in \mathbb{C}$ be arbitrary. Then
\[
\left\| f - \sum_{k=-n}^{n} c_k e_k \right\|_2^2 \le \left\| f - \sum_{k=-n}^{n} \gamma_k e_k \right\|_2^2, \tag{6.31}
\]
and equality holds if and only if $c_k = \gamma_k$ for all $k$. Further,
\[
\left\| f - \sum_{k=-n}^{n} c_k e_k \right\|_2^2 = \|f\|_2^2 - \sum_{k=-n}^{n} |c_k|^2. \tag{6.32}
\]


Proof. Let $\sum$ always denote $\sum_{k=-n}^{n}$, and put $g_n = \sum \gamma_k e_k$. Then
\[
f \cdot g_n = f \cdot \sum \gamma_k e_k = \sum \overline{\gamma_k}\, (f \cdot e_k) = \sum c_k \overline{\gamma_k}
\]
and $g_n \cdot e_k = \gamma_k$, such that
\[
g_n \cdot g_n = \sum |\gamma_k|^2.
\]
Noting that $|a - b|^2 = (a-b)\overline{(a-b)} = |a|^2 + |b|^2 - a\overline{b} - \overline{a}b$, it follows that
\[
\|f - g_n\|_2^2 = (f - g_n) \cdot (f - g_n) = f \cdot f - f \cdot g_n - g_n \cdot f + g_n \cdot g_n
\]
\[
= \|f\|_2^2 - \sum c_k \overline{\gamma_k} - \sum \overline{c_k} \gamma_k + \sum |\gamma_k|^2 = \|f\|_2^2 - \sum |c_k|^2 + \sum |\gamma_k - c_k|^2, \tag{6.33}
\]
which is evidently minimized if and only if $\gamma_k = c_k$. Inserting this into (6.33), equation (6.32) follows.

Corollary 6.13 (Bessel's Inequality) Under the assumptions of the above lemma we have
\[
\sum_{k=-\infty}^{\infty} |c_k|^2 \le \|f\|_2^2. \tag{6.34}
\]

Proof. By equation (6.32), for every $n \in \mathbb{N}$ we have
\[
\sum_{k=-n}^{n} |c_k|^2 \le \|f\|_2^2.
\]
Taking the limit $n \to \infty$ (or $\sup_{n \in \mathbb{N}}$) shows the assertion.

An ONS {e_k | k ∈ ℤ} is said to be complete if instead of Bessel's inequality, equality holds for all f ∈ V.

Definition 6.5 Let f, f_n ∈ V. We say that (f_n) converges to f in L² (denoted by f_n → f in ‖·‖₂) if

lim_{n→∞} ‖f_n − f‖₂ = 0.

Explicitly,

∫₀^{2π} |f_n(x) − f(x)|² dx → 0 as n → ∞.

Remarks 6.3 (a) Note that the L²-limit in V is not unique; changing f(x) at finitely many points of [0, 2π] does not change the integral ∫₀^{2π} |f − f_n|² dx.
(b) If f_n ⇉ f on ℝ, then f_n → f in ‖·‖₂. Indeed, let ε > 0. Then there exists n₀ ∈ ℕ such that n ≥ n₀ implies sup_{x∈ℝ} |f_n(x) − f(x)| ≤ ε. Hence

∫₀^{2π} |f_n − f|² dx ≤ ∫₀^{2π} ε² dx = 2πε².

This shows f_n − f → 0 in ‖·‖₂.
(c) The above lemma, in particular (6.32), shows that the Fourier series converges in L² to f if and only if

‖f‖₂² = ∑_{k=−∞}^{∞} |c_k|². (6.35)

This is called Parseval's Completeness Relation. We will see that it holds for all f ∈ V. Let us write

f(x) ∼ ∑_{k=−∞}^{∞} c_k e^{ikx}

to express the fact that (c_k) are the (complex) Fourier coefficients of f. Further,

s_n(f) = s_n(f; x) = ∑_{k=−n}^{n} c_k e^{ikx} (6.36)

denotes the nth partial sum.

Theorem 6.14 (Parseval's Completeness Theorem) The ONS {e_k | k ∈ ℤ} is complete. More precisely, if f, g ∈ V with

f ∼ ∑_{k=−∞}^{∞} c_k e_k, g ∼ ∑_{k=−∞}^{∞} γ_k e_k,

then

(i) lim_{n→∞} (1/2π) ∫₀^{2π} |f − s_n(f)|² dx = 0, (6.37)
(ii) (1/2π) ∫₀^{2π} f ḡ dx = ∑_{k=−∞}^{∞} c_k γ̄_k, (6.38)
(iii) (1/2π) ∫₀^{2π} |f|² dx = ∑_{k=−∞}^{∞} |c_k|² = a₀²/4 + (1/2) ∑_{k=1}^{∞} (a_k² + b_k²) (Parseval's formula). (6.39)
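As a numerical illustration of (iii) (a Python sketch, not part of the original notes): for the sawtooth f(x) = x on [0, 2π) one computes by hand c₀ = π and c_k = i/k for k ≠ 0, so Parseval's formula (6.39) turns (1/2π) ∫₀^{2π} x² dx = 4π²/3 into π² + 2 ∑_{k≥1} 1/k², i.e. ∑ 1/k² = π²/6:

```python
import math

lhs = (1 / (2 * math.pi)) * (2 * math.pi) ** 3 / 3   # (1/2π)∫₀^{2π} x² dx = 4π²/3
K = 100_000                                          # truncation (arbitrary choice)
partial = sum(1.0 / k ** 2 for k in range(1, K + 1))
rhs = math.pi ** 2 + 2 * partial                     # Σ_{k∈Z} |c_k|², truncated
assert abs(lhs - rhs) < 1e-4
assert abs(partial - math.pi ** 2 / 6) < 1e-4        # Euler's series
```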

The proof is in Rudin's book, [Rud76, 8.16, p. 191]. It uses the Stone–Weierstraß theorem about the uniform approximation of a continuous function by polynomials. An elementary proof is in Forster's book [For01, § 23].

Example 6.9 (a) Consider the periodic function f ∈ V given by

f(x) = 1 for 0 ≤ x < π, f(x) = −1 for π ≤ x < 2π.

Since f is an odd function, the coefficients a_k vanish. We compute the Fourier coefficients b_k.

b_k = (2/π) ∫₀^{π} sin kx dx = −(2/(kπ)) cos kx |₀^{π} = (2/(kπ)) ((−1)^{k+1} + 1)
    = 0 if k is even, 4/(kπ) if k is odd.

The Fourier series of f reads

f ∼ (4/π) ∑_{n=0}^{∞} sin((2n+1)x)/(2n+1).

Noting that

∑_{k∈ℤ} |c_k|² = a₀²/4 + (1/2) ∑_{n∈ℕ} (a_n² + b_n²),

Parseval's formula gives

‖f‖₂² = (1/2π) ∫₀^{2π} dx = 1 = (1/2) ∑_{n=0}^{∞} b_{2n+1}² = (8/π²) ∑_{n=0}^{∞} 1/(2n+1)² =: (8/π²) s₁  ⟹  s₁ = π²/8.

Now we can compute s = ∑_{n=1}^{∞} 1/n². Since this series converges absolutely, we are allowed to rearrange the elements in such a way that we first add all the odd terms, which gives s₁, and then all the even terms, which gives s₀. Using s₁ = π²/8 we find

s = s₀ + s₁ = s₁ + 1/2² + 1/4² + 1/6² + ⋯,
s₀ = (1/2²)(1/1² + 1/2² + ⋯) = s/4,

hence

(3/4) s = s₁  ⟹  s = (4/3) s₁ = π²/6.

(b) Fix a ∈ [0, 2π] and consider f ∈ V with

f(x) = 1 for 0 ≤ x ≤ a, f(x) = 0 for a < x < 2π.

The Fourier coefficients of f are c₀ = (1/2π) ∫₀^{a} dx = a/(2π) and, for k ≠ 0,

c_k = f·e_k = (1/2π) ∫₀^{a} e^{−ikx} dx = (i/(2πk)) (e^{−ika} − 1).

If k ≠ 0,

|c_k|² = (1/(4π²k²)) (1 − e^{ika})(1 − e^{−ika}) = (1 − cos ka)/(2π²k²),

hence Parseval's formula gives

∑_{k=−∞}^{∞} |c_k|² = a²/(4π²) + ∑_{k=1}^{∞} (1 − cos ak)/(π²k²)
= a²/(4π²) + (1/π²) ∑_{k=1}^{∞} 1/k² − (1/π²) ∑_{k=1}^{∞} (cos ak)/k²
= (1/π²) (a²/4 + s − ∑_{k=1}^{∞} (cos ak)/k²),

where s = ∑ 1/k². On the other hand,

‖f‖₂² = (1/2π) ∫₀^{a} dx = a/(2π).

Hence, Parseval's formula (6.39) reads

(1/π²) (a²/4 + s − ∑_{k=1}^{∞} (cos ka)/k²) = a/(2π),

that is,

∑_{k=1}^{∞} (cos ka)/k² = a²/4 − aπ/2 + π²/6 = (a − π)²/4 − π²/12. (6.40)

Since the series

∑_{k=1}^{∞} (cos kx)/k² (6.41)

converges uniformly on ℝ (use Theorem 6.2; ∑ 1/k² is an upper bound), (6.41) is the Fourier series of the function

(x − π)²/4 − π²/12, x ∈ [0, 2π],

and the Fourier series converges uniformly on ℝ to the above function. Since the term-by-term differentiated series converges uniformly on [δ, 2π − δ], see Example 6.2, we obtain

−∑_{k=1}^{∞} (sin kx)/k = (∑_{k=1}^{∞} (cos kx)/k²)′ = ((x − π)²/4 − π²/12)′ = (x − π)/2,

which is true for x ∈ (0, 2π).
We can also integrate the Fourier series and obtain by Corollary 6.7

∫₀^{x} ∑_{k=1}^{∞} (cos kt)/k² dt = ∑_{k=1}^{∞} (1/k²) ∫₀^{x} cos kt dt = ∑_{k=1}^{∞} (1/k³) sin kt |₀^{x} = ∑_{k=1}^{∞} (sin kx)/k³.
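Identities like (6.40) and the differentiated series above lend themselves to a quick numerical spot check (a Python sketch, not part of the original notes; the truncation lengths are arbitrary):

```python
import math

def cos_series(a, terms=200_000):
    return sum(math.cos(k * a) / k ** 2 for k in range(1, terms + 1))

def closed_form(a):
    return (a - math.pi) ** 2 / 4 - math.pi ** 2 / 12

for a in (0.0, 1.0, math.pi, 5.0):                    # identity (6.40)
    assert abs(cos_series(a) - closed_form(a)) < 1e-4

def sin_series(x, terms=200_000):
    return sum(math.sin(k * x) / k for k in range(1, terms + 1))

for x in (1.0, math.pi, 5.0):                         # Σ sin kx / k = (π − x)/2
    assert abs(sin_series(x) - (math.pi - x) / 2) < 1e-3
```

At a = 0 the first check reduces to Euler's ∑ 1/k² = π²/6; the sin-series check is restricted to points in (0, 2π), where the differentiated identity is valid.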

On the other hand,

∫₀^{x} ((t − π)²/4 − π²/12) dt = (x − π)³/12 − π²x/12 + π³/12.

By homework 19.5,

f(x) = ∑_{k=1}^{∞} (sin kx)/k³ = (x − π)³/12 − π²x/12 + π³/12

defines a continuously differentiable periodic function on ℝ.

Theorem 6.15 Let f: ℝ → ℝ be a continuous periodic function which is piecewise continuously differentiable, i.e. there exists a partition {t₀, …, t_r} of [0, 2π] such that f|[t_{i−1}, t_i] is continuously differentiable. Then the Fourier series of f converges uniformly to f.


Proof. Let ϕ_i: [t_{i−1}, t_i] → ℝ denote the continuous derivative of f|[t_{i−1}, t_i] and ϕ: ℝ → ℝ the periodic function that coincides with ϕ_i on [t_{i−1}, t_i]. By Bessel's inequality, the Fourier coefficients γ_k of ϕ satisfy

∑_{k=−∞}^{∞} |γ_k|² ≤ ‖ϕ‖₂² < ∞.

If k ≠ 0, the Fourier coefficients c_k of f can be found by integration by parts from the Fourier coefficients γ_k of ϕ:

∫_{t_{i−1}}^{t_i} f(x) e^{−ikx} dx = (i/k) ( f(x) e^{−ikx} |_{t_{i−1}}^{t_i} − ∫_{t_{i−1}}^{t_i} ϕ(x) e^{−ikx} dx ).

Hence summation over i = 1, …, r yields

c_k = (1/2π) ∫₀^{2π} f(x) e^{−ikx} dx = (1/2π) ∑_{i=1}^{r} ∫_{t_{i−1}}^{t_i} f(x) e^{−ikx} dx
= −(i/(2πk)) ∫₀^{2π} ϕ(x) e^{−ikx} dx = −(i/k) γ_k.

Note that the term

∑_{i=1}^{r} f(x) e^{−ikx} |_{t_{i−1}}^{t_i}

vanishes since f is continuous and f(2π) = f(0). Since for α, β ∈ ℂ we have |αβ| ≤ (1/2)(|α|² + |β|²), we obtain

|c_k| ≤ (1/2) (1/k² + |γ_k|²).

Since both ∑_{k=1}^{∞} 1/k² and ∑_{k=−∞}^{∞} |γ_k|² converge,

∑_{k=−∞}^{∞} |c_k| < ∞.

Thus, the Fourier series converges uniformly to a continuous function g (see Theorem 6.5). Since the Fourier series converges both to f and to g in the L² norm, ‖f − g‖₂ = 0. Since both f and g are continuous, they coincide. This completes the proof.

Note that for any f ∈ V, the series ∑_{k∈ℤ} |c_k|² converges, while the series ∑_{k∈ℤ} |c_k| converges only if the Fourier series converges uniformly to f.
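The contrast in this note shows up in the decay rate of the coefficients: for a continuous, piecewise C¹ function |c_k| decays like 1/k² (so ∑ |c_k| < ∞), while across a jump discontinuity it only decays like 1/k. A Python sketch (not part of the original notes; the grid size and test functions are arbitrary choices):

```python
import cmath, math

def fourier_coeff(f, k, n=4096):
    """Riemann-sum approximation of c_k = (1/2π)∫₀^{2π} f(x) e^{-ikx} dx."""
    h = 2 * math.pi / n
    return sum(f(j * h) * cmath.exp(-1j * k * j * h) for j in range(n)) / n

tri = lambda x: abs(x - math.pi)                  # continuous, piecewise C¹
step = lambda x: 1.0 if x < math.pi else -1.0     # jump discontinuity

for k in (11, 21, 41):                            # odd k, where both are nonzero
    assert abs(fourier_coeff(tri, k)) < 5.0 / k ** 2    # ~1/k² decay
    assert abs(fourier_coeff(step, k)) > 0.5 / k        # only ~1/k decay
```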

6.4 Basic Topology

In the study of functions of several variables we need some topological notions like neighbor- hood, open set, closed set, and compactness.

6.4.1 Finite, Countable, and Uncountable Sets

Definition 6.6 If there exists a 1-1 mapping of the set A onto the set B (a bijection), we say that A and B have the same cardinality or that A and B are equivalent; we write A ∼ B.

Definition 6.7 For any nonnegative integer n ∈ ℕ₀ let N_n be the set {1, 2, …, n}. For any set A we say that:

(a) A is finite if A ∼ N_n for some n. The empty set ∅ is also considered to be finite.
(b) A is infinite if A is not finite.
(c) A is countable if A ∼ ℕ.
(d) A is uncountable if A is neither finite nor countable.
(e) A is at most countable if A is finite or countable.

For finite sets A and B we evidently have A ∼ B if A and B have the same number of elements. For infinite sets, however, the idea of "having the same number of elements" becomes quite vague, whereas the notion of 1-1 correspondence retains its clarity.

Example 6.10 (a) ℤ is countable. Indeed, the arrangement

0, 1, −1, 2, −2, 3, −3, …

gives a bijection between ℕ and ℤ. An infinite set can be equivalent to one of its proper subsets.

(b) Countable sets represent the "smallest" infinite cardinality: no uncountable set can be a subset of a countable set. Any countable set can be arranged in a sequence. In particular, ℚ is countable, see Example 2.6 (c).
(c) The countable union of countable sets is a countable set; this is Cantor's First Diagonal Process:

x₁₁ x₁₂ x₁₃ x₁₄ …
x₂₁ x₂₂ x₂₃ x₂₄ …
x₃₁ x₃₂ x₃₃ x₃₄ …
x₄₁ x₄₂ x₄₃ x₄₄ …
x₅₁ …

The elements are enumerated along the diagonals: x₁₁, x₁₂, x₂₁, x₃₁, x₂₂, x₁₃, …
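The diagonal process can be made explicit: walking along the finite diagonals i + j = const lists every pair (i, j) of positive integers exactly once. A Python sketch (not part of the original notes):

```python
def diagonal_enumeration(limit):
    """List pairs (i, j), i, j ≥ 1, diagonal by diagonal (i + j constant)."""
    pairs = []
    d = 2                          # current diagonal: i + j = d
    while len(pairs) < limit:
        for i in range(1, d):      # (1, d-1), (2, d-2), ..., (d-1, 1)
            pairs.append((i, d - i))
        d += 1
    return pairs[:limit]

seq = diagonal_enumeration(210)
assert len(set(seq)) == 210        # no pair is listed twice
assert (3, 4) in seq               # every fixed pair shows up eventually
```

Since each diagonal is finite and every pair lies on some diagonal, this is a bijection between ℕ and ℕ × ℕ, which is the heart of the countable-union argument.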

(d) Let A = {(x_n) | x_n ∈ {0, 1} for all n ∈ ℕ} be the set of all sequences whose elements are 0 and 1. This set A is uncountable. In particular, ℝ is uncountable.
Proof. Suppose to the contrary that A is countable, and arrange the elements of A in a sequence (s_n)_{n∈ℕ} of distinct elements of A. We construct a sequence s as follows: if the nth element of s_n is 1, we let the nth digit of s be 0, and vice versa. Then the sequence s differs from every member s₁, s₂, … in at least one place; hence s ∉ A, a contradiction since s is indeed an element of A. This proves that A is uncountable.

6.4.2 Metric Spaces and Normed Spaces

Definition 6.8 A set X is said to be a metric space if to any two points x, y ∈ X there is associated a real number d(x, y), called the distance of x and y, such that

(a) d(x, x) = 0 and d(x, y) > 0 for all x, y ∈ X with x ≠ y (positive definiteness);
(b) d(x, y) = d(y, x) (symmetry);
(c) d(x, y) ≤ d(x, z) + d(z, y) for any z ∈ X (triangle inequality).

Any function d: X × X → ℝ with these three properties is called a distance function or metric

on X.

Example 6.11 (a) ℂ, ℝ, ℚ, and ℤ are metric spaces with d(x, y) := |y − x|. Any subset of a metric space is again a metric space.
(b) The real plane ℝ² is a metric space with respect to

d₂((x₁, y₁), (x₂, y₂)) := √((x₁ − x₂)² + (y₁ − y₂)²),
d₁((x₁, y₁), (x₂, y₂)) := |x₁ − x₂| + |y₁ − y₂|.

d₂ is called the euclidean metric.

(c) Let X be a set. Define

d(x, y) := 1 if x ≠ y, d(x, y) := 0 if x = y.

Then (X, d) becomes a metric space. It is called the discrete metric space.

Definition 6.9 Let E be a vector space over ℂ (or ℝ). Suppose on E there is given a function ‖·‖: E → ℝ which associates to each x ∈ E a real number ‖x‖ such that the following three conditions are satisfied:

(i) ‖x‖ ≥ 0 for every x ∈ E, and ‖x‖ = 0 if and only if x = 0,
(ii) ‖λx‖ = |λ| ‖x‖ for all λ ∈ ℂ (in ℝ, resp.),
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ E.

Then E is called a normed (vector) space and ‖x‖ is the norm of x. ‖x‖ generalizes the "length" of the vector x ∈ E. Every normed vector space E is a metric space if we put d(x, y) = ‖x − y‖. Prove!

However, there are metric spaces that are not normed spaces, for example (ℕ, d(m, n) = |n − m|).

Example 6.12 (a) E = ℝ^k or E = ℂ^k. Let x = (x₁, …, x_k) ∈ E and define

‖x‖₂ = √(∑_{i=1}^{k} |x_i|²).

Then ‖·‖₂ is a norm on E. It is called the Euclidean norm. There are other possibilities to define a norm on E. For example,

‖x‖_∞ = max_{1≤i≤k} |x_i|,
‖x‖₁ = ∑_{i=1}^{k} |x_i|,
‖x‖_a = ‖x‖₂ + 3‖x‖₁,  ‖x‖_b = max(‖x‖₁, ‖x‖₂).

(b) E = C([a, b]). Let p ≥ 1. Then

‖f‖_∞ = sup_{x∈[a,b]} |f(x)|,
‖f‖_p = (∫_a^b |f(t)|^p dt)^{1/p}

define norms on E. Note that ‖f‖_p ≤ (b − a)^{1/p} ‖f‖_∞.
(c) Hilbert's sequence space. E = ℓ² = {(x_n) | ∑_{n=1}^{∞} |x_n|² < ∞}. Then

‖x‖₂ = (∑_{n=1}^{∞} |x_n|²)^{1/2}

defines a norm on ℓ².
(d) The bounded sequences. E = ℓ^∞ = {(x_n) | sup_{n∈ℕ} |x_n| < ∞}. Then

‖x‖_∞ = sup_{n∈ℕ} |x_n|

defines a norm on E.
(e) E = C([a, b]). Then

‖f‖₁ = ∫_a^b |f(t)| dt

defines a norm on E.
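The norms of Example 6.12 (a) are easy to experiment with. The following Python sketch (not part of the original notes; the dimension and sample count are arbitrary choices) implements ‖·‖₁, ‖·‖₂, ‖·‖∞ on ℝ^k and checks the triangle inequality (iii) together with the comparisons ‖x‖∞ ≤ ‖x‖₂ ≤ ‖x‖₁ ≤ k‖x‖∞ on random vectors:

```python
import math, random

def norm1(x): return sum(abs(t) for t in x)
def norm2(x): return math.sqrt(sum(t * t for t in x))
def norm_inf(x): return max(abs(t) for t in x)

random.seed(1)
k = 5
for _ in range(100):
    x = [random.uniform(-1, 1) for _ in range(k)]
    y = [random.uniform(-1, 1) for _ in range(k)]
    s = [a + b for a, b in zip(x, y)]
    for norm in (norm1, norm2, norm_inf):
        assert norm(s) <= norm(x) + norm(y) + 1e-12   # triangle inequality (iii)
    assert norm_inf(x) <= norm2(x) + 1e-12
    assert norm2(x) <= norm1(x) + 1e-12
    assert norm1(x) <= k * norm_inf(x) + 1e-12
```

The chain of comparisons is exactly the kind of two-sided estimate that makes all these norms equivalent, anticipating (6.42) and (6.43) below.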

6.4.3 Open and Closed Sets

Definition 6.10 Let X be a metric space with metric d. All points and subsets mentioned below are understood to be elements and subsets of X; in particular, let E ⊂ X be a subset of X.

(a) The set U_ε(x) = {y | d(x, y) < ε} with some ε > 0 is called the ε-neighborhood (or ε-ball with center x) of x. The number ε is called the radius of

the neighborhood Uε(x).

(b) A point p is an interior or inner point of E if there is a neighborhood U_ε(p) completely contained in E. E is open if every point of E is an interior point.
(c) A point p is called an accumulation or limit point of E if every neighborhood of p has a point q ≠ p such that q ∈ E.
(d) E is said to be closed if every accumulation point of E is a point of E. The closure of E (denoted by Ē) is E together with all accumulation points of E. In other words, p ∈ Ē if and only if every neighborhood of p has a non-empty intersection with E.
(e) The complement of E (denoted by E^c) is the set of all points p ∈ X such that p ∉ E.
(f) E is bounded if there exists a real number C > 0 such that d(x, y) ≤ C for all x, y ∈ E.

(g) E is dense in X if Ē = X.

Example 6.13 (a) X = ℝ with the standard metric d(x, y) = |x − y|. E = (a, b) ⊂ ℝ is an open set. Indeed, for every x ∈ (a, b) we have U_ε(x) ⊂ (a, b) if ε is small enough, say ε ≤ min{|x − a|, |x − b|}. Hence, x is an inner point of (a, b). Since x was arbitrary, (a, b) is open.
F = [a, b) is not open since a is not an inner point of [a, b). Indeed, U_ε(a) ⊄ [a, b) for every ε > 0. We have

Ē = F̄ = set of accumulation points = [a, b].

Indeed, a is an accumulation point of both (a, b) and [a, b). This is true since every neighborhood U_ε(a), ε < b − a, contains a + ε/2 ∈ (a, b) (resp. in [a, b)), which is different from a. For any point x ∉ [a, b] we find a neighborhood U_ε(x) with U_ε(x) ∩ [a, b) = ∅; hence x ∉ Ē.
The set of rational numbers ℚ is dense in ℝ. Indeed, every neighborhood U_ε(r) of every real number r contains a rational number, see Proposition 1.11 (b).
For the real line one can prove: every open set is the at most countable union of disjoint open

(finite or infinite) intervals. A similar description for closed subsets of ℝ is false. There is no similar description of open subsets of ℝ^k, k ≥ 2.
(b) For every metric space X, both the whole space X and the empty set ∅ are open as well as closed.
(c) Let B = {x ∈ ℝ^k | ‖x‖₂ < 1} be the open unit ball in ℝ^k. B is open (see Lemma 6.16 below); B is not closed. For example, x₀ = (1, 0, …, 0) is an accumulation point of B since x_n = (1 − 1/n, 0, …, 0) is a sequence of elements of B converging to x₀; however, x₀ ∉ B. The set of accumulation points of B is B̄ = {x ∈ ℝ^k | ‖x‖₂ ≤ 1}. This is also the closure of B in ℝ^k.

(d) Consider E = C([a, b]) with the supremum norm. Then g ∈ E is in the ε-neighborhood of a function f ∈ E if and only if

|f(t) − g(t)| < ε for all t ∈ [a, b],

that is, if and only if the graph of g lies in the ε-tube between f − ε and f + ε.

Lemma 6.16 Every neighborhood U_r(p), r > 0, of a point p is an open set.

Proof. Let q ∈ U_r(p). Then there exists ε > 0 such that d(q, p) = r − ε. We will show that U_ε(q) ⊂ U_r(p). For, let x ∈ U_ε(q). Then by the triangle inequality we have

d(x, p) ≤ d(x, q) + d(q, p) < ε + (r − ε) = r.

Hence x ∈ U_r(p) and q is an interior point of U_r(p). Since q was arbitrary, U_r(p) is open.

Remarks 6.4 (a) If p is an accumulation point of a set E, then every neighborhood of p contains infinitely many points of E. (b) A finite set has no accumulation points; hence any finite set is closed.

Example 6.14 (a) The open complex unit disc, {z ∈ ℂ | |z| < 1}.
(b) The closed unit disc, {z ∈ ℂ | |z| ≤ 1}.
(c) A finite set.
(d) The set ℤ of all integers.
(e) {1/n | n ∈ ℕ}.
(f) The set ℂ of all complex numbers.
(g) The interval (a, b).

Here (d), (e), and (g) are regarded as subsets of ℝ. Some properties of these sets are tabulated below:

      Closed  Open  Bounded
(a)   No      Yes   Yes
(b)   Yes     No    Yes
(c)   Yes     No    Yes
(d)   Yes     No    No
(e)   No      No    Yes
(f)   Yes     Yes   No
(g)   No      Yes   Yes

Proposition 6.17 A subset E ⊂ X of a metric space X is open if and only if its complement E^c is closed.

Proof. First, suppose E^c is closed. Choose x ∈ E. Then x ∉ E^c, and x is not an accumulation point of E^c. Hence there exists a neighborhood U of x such that U ∩ E^c is empty, that is, U ⊂ E. Thus x is an interior point of E and E is open.
Next, suppose that E is open. Let x be an accumulation point of E^c. Then every neighborhood of x contains a point of E^c, so that x is not an interior point of E. Since E is open, this means that x ∈ E^c. It follows that E^c is closed.

6.4.4 Limits and Continuity

In this section we generalize the notions of convergent sequences and continuous functions to arbitrary metric spaces.

Definition 6.11 Let X be a metric space and (x_n) a sequence of elements of X. We say that (x_n) converges to x ∈ X if lim_{n→∞} d(x_n, x) = 0. We write lim_{n→∞} x_n = x or x_n → x.

In other words, lim_{n→∞} x_n = x if for every neighborhood U_ε, ε > 0, of x there exists an n₀ ∈ ℕ such that n ≥ n₀ implies x_n ∈ U_ε.
Note that a subset F of a metric space X is closed if and only if F contains all limits of convergent sequences (x_n), x_n ∈ F.
Two metrics d₁ and d₂ on a space X are said to be topologically equivalent if lim_{n→∞} x_n = x w.r.t. d₁ if and only if lim_{n→∞} x_n = x w.r.t. d₂. In particular, two norms ‖·‖₁ and ‖·‖₂ on the same linear space E are said to be equivalent if the corresponding metric spaces are topologically equivalent.

Proposition 6.18 Let E₁ = (E, ‖·‖₁) and E₂ = (E, ‖·‖₂) be normed vector spaces such that there exist positive numbers c₁, c₂ > 0 with

c₁ ‖x‖₁ ≤ ‖x‖₂ ≤ c₂ ‖x‖₁ for all x ∈ E. (6.42)

Then ‖·‖₁ and ‖·‖₂ are equivalent.

Proof. Condition (6.42) is obviously symmetric with respect to E₁ and E₂ since ‖x‖₂/c₂ ≤ ‖x‖₁ ≤ ‖x‖₂/c₁. Therefore, it is sufficient to show the following: if x_n → x w.r.t. ‖·‖₁, then x_n → x w.r.t. ‖·‖₂. Indeed, by definition, lim_{n→∞} ‖x_n − x‖₁ = 0. By assumption,

c₁ ‖x_n − x‖₁ ≤ ‖x_n − x‖₂ ≤ c₂ ‖x_n − x‖₁, n ∈ ℕ.

Since the first and the last expressions tend to 0 as n → ∞, the sandwich theorem shows that lim_{n→∞} ‖x_n − x‖₂ = 0, too. This proves x_n → x w.r.t. ‖·‖₂.

Example 6.15 Let E = ℝ^k or E = ℂ^k with the norm ‖x‖_p = (|x₁|^p + ⋯ + |x_k|^p)^{1/p}, p ∈ [1, ∞). All these norms are equivalent. Indeed,

‖x‖_∞^p ≤ ∑_{i=1}^{k} |x_i|^p = ‖x‖_p^p ≤ k ‖x‖_∞^p,
⟹ ‖x‖_∞ ≤ ‖x‖_p ≤ k^{1/p} ‖x‖_∞. (6.43)

The following proposition is quite analogous to Proposition 2.33 with k = 2. Recall that a complex sequence (z_n) converges if and only if both Re z_n and Im z_n converge.

Proposition 6.19 Let (x_n) be a sequence of vectors of the euclidean space (ℝ^k, ‖·‖₂),

x_n = (x_{n1}, …, x_{nk}).

Then (x_n) converges to a = (a₁, …, a_k) ∈ ℝ^k if and only if

lim_{n→∞} x_{ni} = a_i, i = 1, …, k.

Proof. Suppose that lim_{n→∞} x_n = a. Given ε > 0 there is an n₀ ∈ ℕ such that n ≥ n₀ implies ‖x_n − a‖₂ < ε. Thus, for i = 1, …, k we have

|x_{ni} − a_i| ≤ ‖x_n − a‖₂ < ε;

hence lim_{n→∞} x_{ni} = a_i.
Conversely, suppose that lim_{n→∞} x_{ni} = a_i for i = 1, …, k. Given ε > 0 there are n_{0i} ∈ ℕ such that n ≥ n_{0i} implies

|x_{ni} − a_i| < ε/√k.

For n ≥ max{n_{01}, …, n_{0k}} we have (see (6.43))

‖x_n − a‖₂ ≤ √k ‖x_n − a‖_∞ < ε;

hence lim_{n→∞} x_n = a.

Corollary 6.20 Let B ⊂ ℝ^k be a bounded subset and (x_n) a sequence of elements of B. Then (x_n) has a converging subsequence.

Proof. Since B is bounded, all coordinates of B are bounded; hence there is a subsequence (x_n^{(1)}) of (x_n) such that the first coordinate converges. Further, there is a subsequence (x_n^{(2)}) of (x_n^{(1)}) such that the second coordinate converges. Finally there is a subsequence (x_n^{(k)}) of (x_n^{(k−1)}) such that all coordinates converge. By the above proposition the subsequence (x_n^{(k)}) converges in ℝ^k.

The same statement is true for subsets B ⊂ ℂ^k.

Definition 6.12 A mapping f: X → Y from the metric space X into the metric space Y is said to be continuous at a ∈ X if one of the following equivalent conditions is satisfied:
(a) For every ε > 0 there exists δ > 0 such that for every x ∈ X

d(x, a) < δ implies d(f(x), f(a)) < ε. (6.44)

(b) For any sequence (x_n), x_n ∈ X, with lim_{n→∞} x_n = a it follows that lim_{n→∞} f(x_n) = f(a).
The mapping f is said to be continuous on X if f is continuous at every point a of X.

Proposition 6.21 The composition of two continuous mappings is continuous.

The proof is completely the same as in the real case (see Proposition 3.4) and we omit it. We give the topological description of continuous functions.

Proposition 6.22 Let X and Y be metric spaces. A mapping f: X → Y is continuous if and only if the preimage of any open set in Y is open in X.

Proof. Suppose that f is continuous and G ⊂ Y is open. If f⁻¹(G) = ∅, there is nothing to prove; the empty set is open. Otherwise there exists x₀ ∈ f⁻¹(G), and therefore f(x₀) ∈ G. Since G is open, there is ε > 0 such that U_ε(f(x₀)) ⊂ G. Since f is continuous at x₀, there exists δ > 0 such that x ∈ U_δ(x₀) implies f(x) ∈ U_ε(f(x₀)) ⊂ G. That is, U_δ(x₀) ⊂ f⁻¹(G), and x₀ is an inner point of f⁻¹(G); hence f⁻¹(G) is open.
Suppose now that the condition of the proposition is fulfilled. We will show that f is continuous. Fix x₀ ∈ X and ε > 0. Since G = U_ε(f(x₀)) is open by Lemma 6.16, f⁻¹(G) is open by assumption. In particular, x₀ ∈ f⁻¹(G) is an inner point. Hence, there exists δ > 0 such that U_δ(x₀) ⊂ f⁻¹(G). It follows that f(U_δ(x₀)) ⊂ U_ε(f(x₀)); this means that f is continuous at x₀. Since x₀ was arbitrary, f is continuous on X.

Remark 6.5 Since the complement of an open set is a closed set, it is obvious that the proposition holds if we replace "open set" by "closed set."
In general, the image of an open set under a continuous function need not be open; consider for example f(x) = sin x and G = (0, 2π), which is open; however, f((0, 2π)) = [−1, 1] is not open.

6.4.5 Completeness and Compactness

(a) Completeness

Definition 6.13 Let (X, d) be a metric space. A sequence (x_n) of elements of X is said to be a Cauchy sequence if for every ε > 0 there exists a positive integer n₀ ∈ ℕ such that

d(x_n, x_m) < ε for all m, n ≥ n₀.

A metric space is said to be complete if every Cauchy sequence converges. A complete normed vector space is called a Banach space.

Remark. The euclidean k-spaces ℝ^k and ℂ^k are complete.
The function space (C([a, b]), ‖·‖_∞) is complete, see homework 21.2. The Hilbert space ℓ² is complete.

(b) Compactness

The notion of compactness is of great importance in analysis, especially in connection with continuity.
By an open cover of a set E in a metric space X we mean a collection {G_α | α ∈ I} of open subsets of X such that E ⊂ ⋃_α G_α. Here I is any index set and

⋃_{α∈I} G_α = {x ∈ X | ∃ β ∈ I: x ∈ G_β}.

Definition 6.14 (Covering definition) A subset K of a metric space X is said to be compact if every open cover of K contains a finite subcover. More explicitly, if {G_α} is an open cover of K, then there are finitely many indices α₁, …, α_n such that K ⊂ G_{α₁} ∪ ⋯ ∪ G_{α_n}.
Note that the definition does not state that a set is compact if there exists a finite open cover (the whole space X is open and is a cover consisting of only one member). Instead, every open cover must have a finite subcover.

Example 6.16 (a) It is clear that every finite set is compact.

(b) Let (x_n) be a sequence in a metric space X converging to x. Then

A = {x_n | n ∈ ℕ} ∪ {x}

is compact.
Proof. Let {G_α} be any open cover of A. In particular, the limit point x is covered by some member, say G₀. Then there is an n₀ ∈ ℕ such that x_n ∈ G₀ for every n ≥ n₀. Further, each x_k, k = 1, …, n₀ − 1, is covered by some G_k. Hence the collection

{G_k | k = 0, 1, …, n₀ − 1}

is a finite subcover of A; therefore A is compact.

Proposition 6.23 (Sequence Definition) A subset K of a metric space X is compact if and only if every sequence in K contains a convergent subsequence with limit in K.

Proof. (a) Let K be compact and suppose to the contrary that (x_n) is a sequence in K without any subsequence converging to some point of K. Then every x ∈ K has a neighborhood U_x containing only finitely many elements of the sequence (x_n). (Otherwise x would be a limit point of (x_n) and there would be a subsequence converging to x.) By construction,

K ⊂ ⋃_{x∈K} U_x.

Since K is compact, there are finitely many points y₁, …, y_m ∈ K with

K ⊂ U_{y₁} ∪ ⋯ ∪ U_{y_m}.

Since every U_{y_i} contains only finitely many elements of (x_n), there are only finitely many elements of (x_n) in K, a contradiction.
(b) The proof is in the appendix to this chapter.

Remark 6.6 Further properties.
(a) A compact subset of a metric space is closed and bounded.
(b) A closed subset of a compact set is compact.
(c) A subset K of ℝ^k or ℂ^k is compact if and only if K is bounded and closed.

Proof. Suppose K is closed and bounded. Let (x_n) be a sequence in K. By Corollary 6.20, (x_n) has a convergent subsequence. Since K is closed, the limit is in K. By the above proposition, K is compact. The other direction follows from (a).

(c) Compactness and Continuity

As in the real case (see Theorem 3.6) we have analogous results for metric spaces.

Proposition 6.24 Let X be a compact metric space.
(a) Let f: X → Y be a continuous mapping into the metric space Y. Then f(X) is compact.
(b) Let f: X → ℝ be a continuous mapping. Then f is bounded and attains its maximum and minimum, that is, there are points p and q in X such that

f(p) = sup_{x∈X} f(x), f(q) = inf_{x∈X} f(x).

Proof. (a) Let {G_α} be an open covering of f(X). By Proposition 6.22, f⁻¹(G_α) is open for every α. Hence, {f⁻¹(G_α)} is an open cover of X. Since X is compact there is a finite subcover of X, say {f⁻¹(G_{α₁}), …, f⁻¹(G_{α_n})}. Then {G_{α₁}, …, G_{α_n}} is a finite subcover of {G_α} covering f(X). Hence, f(X) is compact. We skip (b).

Similarly as for real functions we have the following proposition about uniform continuity. The proof is in the appendix.

Proposition 6.25 Let f: K → ℝ be a continuous function on a compact set K ⊂ ℝ. Then f is uniformly continuous on K.

6.4.6 Continuous Functions in ℝ^k

Proposition 6.26 (a) The projection mappings p_i: ℝ^k → ℝ, i = 1, …, k, given by p_i(x₁, …, x_k) = x_i are continuous.

(b) Let U ⊆ ℝ^k be open and f, g: U → ℝ be continuous on U. Then f + g, fg, |f|, and f/g (g ≠ 0) are continuous functions on U.
(c) Let X be a metric space. A mapping

f = (f₁, …, f_k): X → ℝ^k

is continuous if and only if all components f_i: X → ℝ, i = 1, …, k, are continuous.

Proof. (a) Let (x_n) be a sequence converging to a = (a₁, …, a_k) ∈ ℝ^k. Then the sequence (p_i(x_n)) converges to a_i = p_i(a) by Proposition 6.19. This shows continuity of p_i at a.
(b) The proofs are quite similar to the proofs in the real case, see Proposition 2.3. As a sample we carry out the proof in the case fg. Let a ∈ U and put M = max{|f(a)|, |g(a)|}. Let ε > 0, ε < 3M², be given. Since f and g are continuous at a, there exists δ > 0 such that

‖x − a‖ < δ implies |f(x) − f(a)| < ε/(3M) and |g(x) − g(a)| < ε/(3M). (6.45)

Note that

fg(x) − fg(a) = (f(x) − f(a))(g(x) − g(a)) + f(a)(g(x) − g(a)) + g(a)(f(x) − f(a)).

Taking the absolute value of the above identity and using the triangle inequality as well as (6.45), we have that ‖x − a‖ < δ implies

|fg(x) − fg(a)| ≤ |f(x) − f(a)| |g(x) − g(a)| + |f(a)| |g(x) − g(a)| + |g(a)| |f(x) − f(a)|
≤ ε²/(9M²) + M·ε/(3M) + M·ε/(3M) ≤ ε/3 + ε/3 + ε/3 = ε.

This proves continuity of fg at a.
(c) Suppose first that f is continuous at a ∈ X. Since f_i = p_i ∘ f, f_i is continuous by the result of (a) and Proposition 6.21.
Suppose now that all the f_i, i = 1, …, k, are continuous at a. Let (x_n), x_n ≠ a, be a sequence in X with lim_{n→∞} x_n = a in X. Since f_i is continuous, the sequences (f_i(x_n)) of numbers converge to f_i(a). By Proposition 6.19, the sequence of vectors f(x_n) converges to f(a); hence f is continuous at a.

Example 6.17 Let f: ℝ³ → ℝ² be given by

f(x, y, z) = ( sin((x² + e^z)/√(x² + y² + z² + 1)), log|x² + y² + z² + 1| ).

Then f is continuous on ℝ³. Indeed, since products, sums, and compositions of continuous functions are continuous, √(x² + y² + z² + 1) is a continuous function on ℝ³. We also made use of Proposition 6.26 (a); the coordinate functions x, y, and z are continuous. Since the denominator is nonzero, f₁(x, y, z) = sin((x² + e^z)/√(x² + y² + z² + 1)) is continuous. Since x² + y² + z² + 1 > 0, f₂(x, y, z) = log|x² + y² + z² + 1| is continuous. By Proposition 6.26 (c), f is continuous.

6.5 Appendix E

(a) A compact subset is closed

Proof. Let K be a compact subset of a metric space X. We shall prove that the complement of K is an open subset of X.
Suppose that p ∈ X, p ∉ K. If q ∈ K, let V_q and U_q be neighborhoods of p and q, respectively, of radius less than d(p, q)/2. Since K is compact, there are finitely many points q₁, …, q_n in K such that

K ⊂ U_{q₁} ∪ ⋯ ∪ U_{q_n} =: U.

If V = V_{q₁} ∩ ⋯ ∩ V_{q_n}, then V is a neighborhood of p which does not intersect U. Hence V ⊂ K^c, so that p is an interior point of K^c, and K is closed.
We show that K is bounded. Let ε > 0 be given. Since K is compact, the open cover {U_ε(x) | x ∈ K} of K has a finite subcover, say {U_ε(x₁), …, U_ε(x_n)}. Let U = ⋃_{i=1}^{n} U_ε(x_i); then the distance of any two points x and y in U is bounded by

2ε + max_{1≤i<j≤n} d(x_i, x_j).

A closed subset of a compact set is compact

Proof. Suppose F ⊂ K ⊂ X, F is closed in X, and K is compact. Let {U_α} be an open cover of F. Since F^c is open, {U_α} ∪ {F^c} is an open cover Ω of K. Since K is compact, there is a finite subcover Φ of Ω which covers K. If F^c is a member of Φ, we may remove it from Φ and still retain an open cover of F. Thus we have shown that a finite subcollection of {U_α} covers F.

Equivalence of Compactness and Sequential Compactness

Proof of Proposition 6.23 (b). This direction is hard to prove. It does not work in arbitrary topological spaces and essentially uses that X is a metric space. The proof is roughly along the lines of Exercises 22 to 26 in [Rud76]. We give the proof of Bredon (see [Bre97, 9.4 Theorem]).
Suppose that every sequence in K contains a subsequence converging in K.
1) K contains a countable dense set. For this, we show that for every ε > 0, K can be covered by a finite number of ε-balls (ε is fixed). Suppose this is not true, i.e. K cannot be covered by any

finite number of ε-balls. Then we construct a sequence (xn) as follows. Take an arbitrary x1.

Suppose x₁, …, x_n are already found; since K is not covered by a finite number of ε-balls, we find x_{n+1} whose distance to every preceding element of the sequence is greater than or equal to ε. Consider a limit point x of this sequence and an ε/2-neighborhood U of x. Almost all elements of a suitable subsequence of (x_n) belong to U, say x_r and x_s with s > r. Since both are in U, their distance is less than ε. But this contradicts the construction of the sequence.

Now take the union of all those finite sets corresponding to ε = 1/n, n ∈ ℕ. This is a countable dense set of K.

2) Any open cover {U_α}_{α∈I} of K has a countable subcover. Let x ∈ K be given. Since {U_α}_{α∈I} is an open cover of K, we find β ∈ I and n ∈ ℕ such that U_{2/n}(x) ⊂ U_β. Further, since {x_i}_{i∈ℕ} is dense in K, we find i ∈ ℕ such that d(x, x_i) < 1/n. By the triangle inequality,

x ∈ U_{1/n}(x_i) ⊂ U_{2/n}(x) ⊂ U_β.

To each of the countably many U_{1/n}(x_i) choose one U_β ⊃ U_{1/n}(x_i). This is a countable subcover of {U_α}.

3) Rename the countable open subcover as {V_n}_{n∈ℕ} and consider the decreasing sequence of closed sets

C_n = K ∖ ⋃_{k=1}^{n} V_k, C₁ ⊃ C₂ ⊃ ⋯.

If some C_k = ∅, we have found a finite subcover, namely V₁, V₂, …, V_k. Suppose that all the C_n are nonempty, say x_n ∈ C_n. Further, let x be the limit of a subsequence (x_{n_i}). Since x_{n_i} ∈ C_m for all n_i ≥ m and C_m is closed, x ∈ C_m for all m. Hence x ∈ ⋂_{m∈ℕ} C_m. However,

⋂_{m∈ℕ} C_m = K ∖ ⋃_{m∈ℕ} V_m = ∅.

This contradiction completes the proof.

Proof of Proposition 6.25. Let ε > 0 be given. Since f is continuous, we can associate to each point p ∈ K a positive number δ(p) such that q ∈ K ∩ U_{δ(p)}(p) implies |f(q) − f(p)| < ε/2. Let

J(p) = {q ∈ K | |p − q| < δ(p)/2}.

Since p ∈ J(p), the collection {J(p) | p ∈ K} is an open cover of K; and since K is compact, there is a finite set of points p₁, …, p_n in K such that

K ⊂ J(p₁) ∪ ⋯ ∪ J(p_n). (6.46)

We put δ := (1/2) min{δ(p₁), …, δ(p_n)}. Then δ > 0. Now let p and q be points of K with |p − q| < δ. By (6.46), there is an integer m, 1 ≤ m ≤ n, such that p ∈ J(p_m); hence

|p − p_m| < δ(p_m)/2,

and we also have

|q − p_m| ≤ |q − p| + |p − p_m| < δ + δ(p_m)/2 ≤ δ(p_m).

Finally, continuity at pm gives

f(p) f(q) f(p) f(p ) + f(p ) f(q) < ε. | − |≤| − m | | m − |

Proposition 6.27 There exists a real continuous function on the real line which is nowhere differentiable.

Proof. Define $\varphi(x)=|x|$, $x\in[-1,1]$, and extend the definition of $\varphi$ to all real $x$ by requiring periodicity:
\[ \varphi(x+2)=\varphi(x). \]
Then for all $s,t\in\mathbb R$,
\[ |\varphi(s)-\varphi(t)| \le |s-t|. \tag{6.47} \]
In particular, $\varphi$ is continuous on $\mathbb R$. Define
\[ f(x) = \sum_{n=0}^{\infty}\left(\frac34\right)^{n}\varphi(4^{n}x). \tag{6.48} \]
Since $0\le\varphi\le1$, Theorem 6.2 shows that the series (6.48) converges uniformly on $\mathbb R$. By Theorem 6.5, $f$ is continuous on $\mathbb R$.

Now fix a real number $x$ and a positive integer $m\in\mathbb N$. Put
\[ \delta_m = \pm\tfrac12\cdot 4^{-m}, \]
where the sign is chosen so that no integer lies between $4^m x$ and $4^m(x+\delta_m)$. This can be done since $4^m|\delta_m|=\tfrac12$. It follows that $|\varphi(4^m x)-\varphi(4^m x+4^m\delta_m)|=\tfrac12$. Define
\[ \gamma_n = \frac{\varphi(4^n(x+\delta_m)) - \varphi(4^n x)}{\delta_m}. \]
When $n>m$, then $4^n\delta_m$ is an even integer, so that $\gamma_n=0$ by periodicity of $\varphi$. When $0\le n\le m$, (6.47) implies $|\gamma_n|\le 4^n$. Since $|\gamma_m|=4^m$, we conclude that
\[ \left|\frac{f(x+\delta_m)-f(x)}{\delta_m}\right| = \left|\sum_{n=0}^{m}\left(\frac34\right)^{n}\gamma_n\right| \ge 3^m - \sum_{n=0}^{m-1}3^{n} = \frac12\,(3^m+1). \]
As $m\to\infty$, $\delta_m\to0$. It follows that $f$ is not differentiable at $x$.
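The growth of the difference quotients can be checked numerically. The sketch below (an illustration, not part of the original text) evaluates a partial sum of (6.48); since $\gamma_n=0$ for $n>m$, truncating the series after 16 terms does not change the difference quotients used here for $m\le 8$:

```python
import math

def phi(x):
    # phi(x) = |x| on [-1, 1], extended with period 2:
    # the distance from x to the nearest even integer
    return abs(x - 2 * round(x / 2))

def f(x, N=16):
    # partial sum of f(x) = sum_{n>=0} (3/4)^n phi(4^n x)
    return sum((0.75 ** n) * phi(4 ** n * x) for n in range(N))

def difference_quotient(x, m):
    # delta_m = +-(1/2) 4^{-m}, sign chosen so that no integer lies
    # between 4^m x and 4^m (x + delta_m)
    y = (4 ** m) * x
    sign = 1.0 if y - math.floor(y) < 0.5 else -1.0
    delta = sign * 0.5 * 4.0 ** (-m)
    return (f(x + delta) - f(x)) / delta

x = 0.3
for m in range(2, 9):
    # the proof gives |quotient| >= (3^m + 1)/2, which blows up with m
    assert abs(difference_quotient(x, m)) >= (3 ** m + 1) / 2 - 1e-3
```

The lower bound $(3^m+1)/2$ is attained up to floating-point noise; the quotients grow without bound, which is exactly why no derivative can exist.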

Proof of Abel's Limit Theorem, Proposition 6.9. By Proposition 6.4, the series converges on $(-1,1)$ and the limit function is continuous there, since the radius of convergence is at least $1$ by assumption. Hence it suffices to prove continuity at $x=1$, i.e. that $\lim_{x\to1-0}f(x)=f(1)$.

Put $r_n=\sum_{k=n}^{\infty}a_k$; then $r_0=f(1)$, $r_{n+1}-r_n=-a_n$ for all nonnegative integers $n\in\mathbb Z_+$, and $\lim_{n\to\infty}r_n=0$. Hence there is a constant $C$ with $|r_n|\le C$, and the series $\sum_{n=0}^{\infty}r_{n+1}x^{n}$ converges for $|x|<1$ by the comparison test. We have
\begin{align*}
(1-x)\sum_{n=0}^{\infty}r_{n+1}x^{n} &= \sum_{n=0}^{\infty}r_{n+1}x^{n} - \sum_{n=0}^{\infty}r_{n+1}x^{n+1} \\
&= \sum_{n=0}^{\infty}r_{n+1}x^{n} - \sum_{n=0}^{\infty}r_{n}x^{n} + r_0 = -\sum_{n=0}^{\infty}a_{n}x^{n} + f(1);
\end{align*}
hence
\[ f(1)-f(x) = (1-x)\sum_{n=0}^{\infty}r_{n+1}x^{n}. \]
Let $\varepsilon>0$ be given. Choose $N\in\mathbb N$ such that $n\ge N$ implies $|r_n|<\varepsilon$. Put $\delta=\varepsilon/(CN)$; then $x\in(1-\delta,1)$ implies
\[ |f(1)-f(x)| \le (1-x)\sum_{n=0}^{N-1}|r_{n+1}|\,x^{n} + (1-x)\sum_{n=N}^{\infty}|r_{n+1}|\,x^{n} \le (1-x)\,CN + (1-x)\,\varepsilon\sum_{n=0}^{\infty}x^{n} \le 2\varepsilon; \]
hence $f$ tends to $f(1)$ as $x\to1-0$.
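Abel's theorem can be illustrated numerically (a sketch, not from the original text): for $a_n=(-1)^n/(n+1)$ the series $\sum a_n$ converges to $\log 2$, and the power series $f(x)=\sum a_n x^n$ (which equals $\log(1+x)/x$ on $(-1,1)$) indeed approaches $f(1)=\log 2$ as $x\to1-0$:

```python
import math

def f(x, N=20000):
    # partial sum of sum_{n>=0} (-1)^n x^n / (n+1)
    return sum((-1) ** n * x ** n / (n + 1) for n in range(N))

# sum a_n = log 2 converges, so Abel's theorem gives lim_{x->1-} f(x) = log 2
for x, tol in [(0.9, 0.06), (0.99, 0.006), (0.999, 0.0006)]:
    assert abs(f(x) - math.log(2)) <= tol
```

The tolerances shrink roughly linearly in $1-x$, matching the bound $(1-x)CN+\varepsilon$ from the proof.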

Definition 6.15 If $X$ is a metric space, $C(X)$ will denote the set of all continuous, bounded functions with domain $X$. We associate with each $f\in C(X)$ its supremum norm
\[ \|f\| = \|f\|_{\infty} = \sup_{x\in X}|f(x)|. \tag{6.49} \]
Since $f$ is assumed to be bounded, $\|f\|<\infty$. Note that the boundedness assumption is redundant if $X$ is a compact metric space (Proposition 6.24); thus $C(X)$ consists of all continuous functions in that case.
It is clear that $C(X)$ is a vector space, since the sum of bounded functions is again a bounded function (see the triangle inequality below) and the sum of continuous functions is continuous (see Proposition 6.26). We show that $\|f\|_{\infty}$ is indeed a norm on $C(X)$.
(i) Obviously, $\|f\|_{\infty}\ge0$ since the absolute value $|f(x)|$ is nonnegative, and $\|0\|=0$. Suppose now $\|f\|=0$. This implies $|f(x)|=0$ for all $x$; hence $f=0$.
(ii) Clearly, for every (real or complex) number $\lambda$ we have
\[ \|\lambda f\| = \sup_{x\in X}|\lambda f(x)| = |\lambda|\sup_{x\in X}|f(x)| = |\lambda|\,\|f\|. \]
(iii) If $h=f+g$ then
\[ |h(x)| \le |f(x)|+|g(x)| \le \|f\|+\|g\|,\qquad x\in X; \]
hence $\|f+g\|\le\|f\|+\|g\|$.
We have thus made $C(X)$ into a normed vector space. Remark 6.1 can be rephrased as:

A sequence $(f_n)$ converges to $f$ with respect to the norm in $C(X)$ if and only if $f_n\to f$ uniformly on $X$.

Accordingly, closed subsets of $C(X)$ are sometimes called uniformly closed, the closure of a set $A\subset C(X)$ is called the uniform closure, and so on.

Theorem 6.28 The above norm makes $C(X)$ into a Banach space (a complete normed space).

Proof. Let $(f_n)$ be a Cauchy sequence in $C(X)$. This means: to every $\varepsilon>0$ there corresponds an $n_0\in\mathbb N$ such that $n,m\ge n_0$ implies $\|f_n-f_m\|<\varepsilon$. It follows by Proposition 6.1 that there is a function $f$ with domain $X$ to which $(f_n)$ converges uniformly. By Theorem 6.5, $f$ is continuous. Moreover, $f$ is bounded, since there is an $n$ such that $|f(x)-f_n(x)|<1$ for all $x\in X$, and $f_n$ is bounded.
Thus $f\in C(X)$, and since $f_n\to f$ uniformly on $X$, we have $\|f-f_n\|\to0$ as $n\to\infty$.

Chapter 7

Calculus of Functions of Several Variables

In this chapter we consider functions $f\colon U\to\mathbb R$ or $f\colon U\to\mathbb R^m$ where $U\subset\mathbb R^n$ is an open set. In Subsection 6.4.6 we collected the main properties of continuous functions $f$. Now we will study differentiation and integration of such functions in more detail.

The Norm of a Linear Mapping

Proposition 7.1 Let $T\in L(\mathbb R^n,\mathbb R^m)$ be a linear mapping of the euclidean space $\mathbb R^n$ into $\mathbb R^m$.
(a) There exists some $C>0$ such that
\[ \|T(x)\|_2 \le C\,\|x\|_2 \quad\text{for all } x\in\mathbb R^n. \tag{7.1} \]
(b) $T$ is uniformly continuous on $\mathbb R^n$.

Proof. (a) Using the standard bases of $\mathbb R^n$ and $\mathbb R^m$ we identify $T$ with its matrix $T=(a_{ij})$, $Te_j=\sum_{i=1}^{m}a_{ij}e_i$. For $x=(x_1,\dots,x_n)$ we have
\[ T(x) = \Bigl(\sum_{j=1}^{n}a_{1j}x_j,\ \dots,\ \sum_{j=1}^{n}a_{mj}x_j\Bigr); \]
hence by the Cauchy–Schwarz inequality,
\[ \|T(x)\|_2^2 = \sum_{i=1}^{m}\Bigl(\sum_{j=1}^{n}a_{ij}x_j\Bigr)^{2} \le \sum_{i=1}^{m}\Bigl(\sum_{j=1}^{n}|a_{ij}x_j|\Bigr)^{2} \le \sum_{i=1}^{m}\Bigl(\sum_{j=1}^{n}a_{ij}^2\Bigr)\Bigl(\sum_{j=1}^{n}x_j^2\Bigr) = C^{2}\,\|x\|_2^{2}, \]
where $C=\sqrt{\sum_{i,j}a_{ij}^2}$. Consequently, $\|Tx\|\le C\|x\|$.
(b) Let $\varepsilon>0$. Put $\delta=\varepsilon/C$ with the above $C$. Then $\|x-y\|<\delta$ implies
\[ \|Tx-Ty\| = \|T(x-y)\| \le C\,\|x-y\| < \varepsilon, \]
which proves (b).
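The bound (7.1) with $C=\sqrt{\sum_{i,j}a_{ij}^2}$ can be checked numerically; the following sketch (an illustration, not part of the original text) tests it on a sample $2\times3$ matrix:

```python
import math
import random

def frobenius(A):
    # C = sqrt(sum_ij a_ij^2), the constant from the Cauchy-Schwarz estimate
    return math.sqrt(sum(a * a for row in A for a in row))

def apply(A, x):
    # matrix-vector product T(x)
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm2(v):
    return math.sqrt(sum(vi * vi for vi in v))

A = [[1.0, -2.0, 0.5], [3.0, 0.0, -1.0]]   # a sample matrix T
C = frobenius(A)
random.seed(0)
for _ in range(100):
    x = [random.uniform(-1, 1) for _ in range(3)]
    assert norm2(apply(A, x)) <= C * norm2(x) + 1e-12   # (7.1)
```

The Frobenius constant $C$ is generally larger than the operator norm $\|T\|$ defined next, but it always satisfies (7.1).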


Definition 7.1 Let $V$ and $W$ be normed vector spaces and $A\in L(V,W)$. The smallest number $C$ with (7.1) is called the norm of the linear map $A$ and is denoted by $\|A\|$:
\[ \|A\| = \inf\{\, C \mid \|Ax\|\le C\|x\| \text{ for all } x\in V \,\}. \tag{7.2} \]
By definition,
\[ \|Ax\| \le \|A\|\,\|x\|. \tag{7.3} \]
Let $T\in L(\mathbb R^n,\mathbb R^m)$ be a linear mapping. One can show that
\[ \|T\| = \sup_{x\ne0}\frac{\|Tx\|}{\|x\|} = \sup_{\|x\|=1}\|Tx\| = \sup_{\|x\|\le1}\|Tx\|. \]

7.1 Partial Derivatives

We consider functions $f\colon U\to\mathbb R$ where $U\subset\mathbb R^n$ is an open set. We want to find derivatives “one variable at a time.”

Definition 7.2 Let $U\subset\mathbb R^n$ be open and $f\colon U\to\mathbb R$ a real function. Then $f$ is called partial differentiable at $a=(a_1,\dots,a_n)\in U$ with respect to the $i$th coordinate if the limit
\[ D_i f(a) = \lim_{h\to0}\frac{f(a_1,\dots,a_i+h,\dots,a_n) - f(a_1,\dots,a_n)}{h} \tag{7.4} \]
exists, where $h$ is real and sufficiently small (such that $(a_1,\dots,a_i+h,\dots,a_n)\in U$). $D_i f(a)$ is called the $i$th partial derivative of $f$ at $a$. We also use the notations
\[ D_i f(a) = \frac{\partial f}{\partial x_i}(a) = \frac{\partial f(a)}{\partial x_i} = f_{x_i}(a). \]

It is important that $D_i f(a)$ is the ordinary derivative of a certain function; in fact, if $g(x)=f(a_1,\dots,x,\dots,a_n)$, then $D_i f(a)=g'(a_i)$. That is, $D_i f(a)$ is the slope of the tangent line at $(a,f(a))$ to the curve obtained by intersecting the graph of $f$ with the planes $x_j=a_j$, $j\ne i$. It also means that the computation of $D_i f(a)$ is a problem we can already solve.

Example 7.1 (a) $f(x,y)=\sin(xy^2)$. Then $D_1 f(x,y)=y^2\cos(xy^2)$ and $D_2 f(x,y)=2xy\cos(xy^2)$.

(b) Consider the radius function $r\colon\mathbb R^n\to\mathbb R$,
\[ r(x) = \|x\|_2 = \sqrt{x_1^2+\cdots+x_n^2},\qquad x=(x_1,\dots,x_n)\in\mathbb R^n. \]
Then $r$ is partial differentiable on $\mathbb R^n\setminus\{0\}$ with
\[ \frac{\partial r}{\partial x_i}(x) = \frac{x_i}{r(x)},\qquad x\ne0. \tag{7.5} \]
Indeed, the function
\[ f(\xi) = \sqrt{x_1^2+\cdots+\xi^2+\cdots+x_n^2} \]
is differentiable, where $x_1,\dots,x_{i-1},x_{i+1},\dots,x_n$ are considered to be constant. Using the chain rule one obtains (with $\xi=x_i$)
\[ \frac{\partial r}{\partial x_i}(x) = f'(\xi) = \frac{2\xi}{2\sqrt{x_1^2+\cdots+\xi^2+\cdots+x_n^2}} = \frac{x_i}{r}. \]

(c) Let $f\colon(0,+\infty)\to\mathbb R$ be differentiable. The composition $x\mapsto f(r(x))$ (with the above radius function $r$) is denoted by $f(r)$; it is partial differentiable on $\mathbb R^n\setminus\{0\}$. The chain rule gives
\[ \frac{\partial}{\partial x_i}f(r) = f'(r)\,\frac{\partial r}{\partial x_i} = f'(r)\,\frac{x_i}{r}. \]
(d) Partial differentiability does not imply continuity. Define
\[ f(x,y) = \begin{cases} \dfrac{xy}{(x^2+y^2)^2} = \dfrac{xy}{r^4}, & (x,y)\ne(0,0),\\[1ex] 0, & (x,y)=(0,0). \end{cases} \]
Obviously, $f$ is partial differentiable on $\mathbb R^2\setminus\{0\}$ and at the origin as well: by definition of the partial derivative,
\[ \frac{\partial f}{\partial x}(0,0) = \lim_{h\to0}\frac{f(h,0)}{h} = \lim_{h\to0}0 = 0. \]
Since $f$ is symmetric in $x$ and $y$, $\frac{\partial f}{\partial y}(0,0)=0$, too. However, $f$ is not continuous at $0$ since $f(\varepsilon,\varepsilon)=1/(4\varepsilon^2)$ becomes large as $\varepsilon$ tends to $0$.
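A quick numerical sketch of Example 7.1 (d) (an illustration, not part of the original text): both partial derivatives at the origin vanish, yet $f$ is unbounded near $(0,0)$ along the diagonal:

```python
def f(x, y):
    # xy / (x^2 + y^2)^2 away from the origin, 0 at the origin
    return 0.0 if (x, y) == (0.0, 0.0) else x * y / (x ** 2 + y ** 2) ** 2

# both partial derivatives at the origin exist and are 0 ...
h = 1e-6
assert f(h, 0.0) / h == 0.0
assert f(0.0, h) / h == 0.0

# ... yet f blows up along the diagonal: f(eps, eps) = 1/(4 eps^2)
for eps in [1e-1, 1e-2, 1e-3]:
    assert abs(f(eps, eps) - 1 / (4 * eps ** 2)) < 1e-6
```

The difference quotients along the axes see only the zero values $f(h,0)=f(0,h)=0$, which is why partial derivatives miss the singular behavior along other directions.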

Remark 7.1 In the next section we will become acquainted with a stronger notion of differentiability which implies continuity. In particular, a continuously partial differentiable function is continuous.

Definition 7.3 Let $U\subset\mathbb R^n$ be open and $f\colon U\to\mathbb R$ partial differentiable. The vector
\[ \operatorname{grad} f(x) = \Bigl(\frac{\partial f}{\partial x_1}(x),\dots,\frac{\partial f}{\partial x_n}(x)\Bigr) \tag{7.6} \]
is called the gradient of $f$ at $x\in U$.

Example 7.2 (a) For the radius function $r(x)$ defined in Example 7.1 (b) we have
\[ \operatorname{grad} r(x) = \frac{x}{r}. \]
Note that $x/r$ is a unit vector (of euclidean norm $1$) in the direction of $x$. With the notations of Example 7.1 (c),
\[ \operatorname{grad} f(r) = f'(r)\,\frac{x}{r}. \]
(b) Let $f,g\colon U\to\mathbb R$ be partial differentiable functions. Then we have the following product rule:
\[ \operatorname{grad}(fg) = g\operatorname{grad} f + f\operatorname{grad} g. \tag{7.7} \]
This is immediate from the product rule for functions of one variable,
\[ \frac{\partial}{\partial x_i}(fg) = \frac{\partial f}{\partial x_i}\,g + f\,\frac{\partial g}{\partial x_i}. \]
(c) $f(x,y)=x^y$. Then $\operatorname{grad} f(x,y) = (y\,x^{y-1},\,x^y\log x)$.
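The product rule (7.7) can be verified with central-difference gradients; the functions $f$ and $g$ below are illustrative choices, not taken from the text:

```python
import math

def grad(f, a, h=1e-6):
    # central-difference gradient of a scalar function f at point a
    g = []
    for i in range(len(a)):
        ap, am = list(a), list(a)
        ap[i] += h
        am[i] -= h
        g.append((f(ap) - f(am)) / (2 * h))
    return g

f = lambda p: p[0] ** 2 + 3 * p[1]      # sample scalar fields
g = lambda p: math.sin(p[0]) * p[1]
a = [0.8, 1.5]

lhs = grad(lambda p: f(p) * g(p), a)                       # grad(fg)
gf, gg = grad(f, a), grad(g, a)
rhs = [g(a) * gf[i] + f(a) * gg[i] for i in range(2)]      # g grad f + f grad g
for l, r in zip(lhs, rhs):
    assert abs(l - r) < 1e-5
```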

Notation. Instead of $\operatorname{grad} f$ one also writes $\nabla f$ (“nabla $f$”). $\nabla$ is a vector-valued differential operator:
\[ \nabla = \Bigl(\frac{\partial}{\partial x_1},\dots,\frac{\partial}{\partial x_n}\Bigr). \]

Definition 7.4 Let $U\subset\mathbb R^n$. A vector field on $U$ is a mapping
\[ v = (v_1,\dots,v_n)\colon U\to\mathbb R^n. \tag{7.8} \]
To every point $x\in U$ there is associated a vector $v(x)\in\mathbb R^n$.
If the vector field $v$ is partial differentiable (i.e. all components $v_i$ are partial differentiable) then
\[ \operatorname{div} v = \sum_{i=1}^{n}\frac{\partial v_i}{\partial x_i} \tag{7.9} \]
is called the divergence of the vector field $v$. Formally the divergence of $v$ can be written as an inner product of $\nabla$ and $v$:
\[ \operatorname{div} v = \nabla\cdot v = \sum_{i=1}^{n}\frac{\partial}{\partial x_i}\,v_i. \]

The product rule gives the following rule for the divergence. Let $f\colon U\to\mathbb R$ be a partial differentiable function and $v=(v_1,\dots,v_n)\colon U\to\mathbb R^n$ a partial differentiable vector field; then
\[ \frac{\partial}{\partial x_i}(f v_i) = \frac{\partial f}{\partial x_i}\,v_i + f\,\frac{\partial v_i}{\partial x_i}. \]
Summation over $i$ gives
\[ \operatorname{div}(fv) = \operatorname{grad} f\cdot v + f\operatorname{div} v. \tag{7.10} \]
Using the nabla operator this can be rewritten as
\[ \nabla\cdot(fv) = \nabla f\cdot v + f\,\nabla\cdot v. \]

Example 7.3 Let $F\colon\mathbb R^n\setminus\{0\}\to\mathbb R^n$ be the vector field $F(x)=\dfrac{x}{r}$, $r=\|x\|$. Since
\[ \operatorname{div} x = \sum_{i=1}^{n}\frac{\partial x_i}{\partial x_i} = n \quad\text{and}\quad x\cdot x = r^2, \]
Example 7.2 gives, with $v=x$ and $f(r)=1/r$,
\[ \operatorname{div}\frac{x}{r} = \operatorname{grad}\frac1r\cdot x + \frac1r\operatorname{div} x = -\frac{1}{r^3}\,x\cdot x + \frac{n}{r} = \frac{n-1}{r}. \]
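A finite-difference check of Example 7.3 for $n=3$ (a numerical sketch, not part of the original text): at $x=(1,-2,2)$ we have $r=3$, so $\operatorname{div}(x/r)$ should equal $(n-1)/r=2/3$:

```python
import math

def div(v, a, h=1e-6):
    # central-difference divergence of a vector field v at point a
    s = 0.0
    for i in range(len(a)):
        ap, am = list(a), list(a)
        ap[i] += h
        am[i] -= h
        s += (v(ap)[i] - v(am)[i]) / (2 * h)
    return s

def F(x):
    # F(x) = x / ||x||
    r = math.sqrt(sum(xi * xi for xi in x))
    return [xi / r for xi in x]

a = [1.0, -2.0, 2.0]          # n = 3, r = 3
assert abs(div(F, a) - (3 - 1) / 3.0) < 1e-7
```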

7.1.1 Higher Partial Derivatives

Let $U\subset\mathbb R^n$ be open and $f\colon U\to\mathbb R$ a partial differentiable function. If all partial derivatives $D_i f\colon U\to\mathbb R$ are again partial differentiable, $f$ is called twice partial differentiable, and we can form the second-order partial derivatives $D_j D_i f$.
More generally, $f\colon U\to\mathbb R$ is said to be $(k+1)$-times partial differentiable if it is $k$-times partial differentiable and all partial derivatives of order $k$,
\[ D_{i_k}D_{i_{k-1}}\cdots D_{i_1}f\colon U\to\mathbb R, \]
are partial differentiable.
A function $f\colon U\to\mathbb R$ is said to be $k$-times continuously partial differentiable if it is $k$-times partial differentiable and all partial derivatives of order less than or equal to $k$ are continuous. The set of all such functions on $U$ is denoted by $C^k(U)$. We also use the notation
\[ D_j D_i f = \frac{\partial^2 f}{\partial x_j\,\partial x_i} = f_{x_i x_j},\qquad D_i D_i f = \frac{\partial^2 f}{\partial x_i^2},\qquad D_{i_k}\cdots D_{i_1} f = \frac{\partial^k f}{\partial x_{i_k}\cdots\partial x_{i_1}}. \]
Example. Let $f(x,y)=\sin(xy^2)$. One easily sees that
\[ f_{yx} = f_{xy} = 2y\cos(xy^2) - 2xy^3\sin(xy^2). \]
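The equality $f_{yx}=f_{xy}$ for $f(x,y)=\sin(xy^2)$ can be checked with a second-order central difference (a sketch, not part of the original text):

```python
import math

def mixed(f, x, y, h=1e-4):
    # central-difference approximation of f_xy at (x, y)
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)

f = lambda x, y: math.sin(x * y ** 2)
x, y = 0.9, 0.6
exact = 2 * y * math.cos(x * y ** 2) - 2 * x * y ** 3 * math.sin(x * y ** 2)
assert abs(mixed(f, x, y) - exact) < 1e-6
```

The stencil is symmetric in the two differentiation orders, so it approximates both $f_{xy}$ and $f_{yx}$ at once.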

Proposition 7.2 (Schwarz's Lemma) Let $U\subset\mathbb R^n$ be open and $f\colon U\to\mathbb R$ twice continuously partial differentiable. Then for every $a\in U$ and all $i,j=1,\dots,n$ we have
\[ D_j D_i f(a) = D_i D_j f(a). \tag{7.11} \]

Proof. Without loss of generality we assume $n=2$, $i=1$, $j=2$, and $a=0$; we write $(x,y)$ in place of $(x_1,x_2)$. Since $U$ is open, there is a small square of side length $2\delta>0$ completely contained in $U$:
\[ \{(x,y)\in\mathbb R^2 \mid |x|<\delta,\ |y|<\delta\} \subset U. \]
For fixed $y\in U_\delta(0)$ define the function $F\colon(-\delta,\delta)\to\mathbb R$ via
\[ F(x) = f(x,y) - f(x,0). \]
By the mean value theorem (Theorem 4.9) there is a $\xi$ with $|\xi|\le|x|$ such that
\[ F(x) - F(0) = x\,F'(\xi). \]
But $F'(\xi) = f_x(\xi,y) - f_x(\xi,0)$. Applying the mean value theorem to the function $h(y)=f_x(\xi,y)$, there is an $\eta$ with $|\eta|\le|y|$ and
\[ f_x(\xi,y) - f_x(\xi,0) = h'(\eta)\,y = \frac{\partial}{\partial y}f_x(\xi,\eta)\,y = f_{xy}(\xi,\eta)\,y. \]
Altogether we have
\[ F(x)-F(0) = f(x,y)-f(x,0)-f(0,y)+f(0,0) = f_{xy}(\xi,\eta)\,xy. \tag{7.12} \]
The same arguments, but starting with the function $G(y)=f(x,y)-f(0,y)$, show the existence of $\xi'$ and $\eta'$ with $|\xi'|\le|x|$, $|\eta'|\le|y|$ and
\[ f(x,y)-f(x,0)-f(0,y)+f(0,0) = f_{yx}(\xi',\eta')\,xy. \tag{7.13} \]
From (7.12) and (7.13) it follows for $xy\ne0$ that
\[ f_{xy}(\xi,\eta) = f_{yx}(\xi',\eta'). \]
If $(x,y)$ approaches $(0,0)$, so do $(\xi,\eta)$ and $(\xi',\eta')$. Since $f_{xy}$ and $f_{yx}$ are both continuous, it follows from the above equation that
\[ D_2 D_1 f(0,0) = D_1 D_2 f(0,0). \]

Corollary 7.3 Let $U\subset\mathbb R^n$ be open and $f\colon U\to\mathbb R$ $k$-times continuously partial differentiable. Then
\[ D_{i_k}\cdots D_{i_1} f = D_{i_{\pi(k)}}\cdots D_{i_{\pi(1)}} f \]
for every permutation $\pi$ of $1,\dots,k$.
Proof. The proof is by induction on $k$, using the fact that any permutation can be written as a product of transpositions $(j\leftrightarrow j+1)$.

Example 7.4 Let $U\subset\mathbb R^3$ be open and let $v\colon U\to\mathbb R^3$ be a partial differentiable vector field. One defines a new vector field $\operatorname{curl} v\colon U\to\mathbb R^3$, the curl of $v$, by
\[ \operatorname{curl} v = \Bigl(\frac{\partial v_3}{\partial x_2}-\frac{\partial v_2}{\partial x_3},\ \frac{\partial v_1}{\partial x_3}-\frac{\partial v_3}{\partial x_1},\ \frac{\partial v_2}{\partial x_1}-\frac{\partial v_1}{\partial x_2}\Bigr). \tag{7.14} \]
Formally one can think of $\operatorname{curl} v$ as the vector product of $\nabla$ and $v$:
\[ \operatorname{curl} v = \nabla\times v = \begin{vmatrix} e_1 & e_2 & e_3 \\[0.5ex] \dfrac{\partial}{\partial x_1} & \dfrac{\partial}{\partial x_2} & \dfrac{\partial}{\partial x_3} \\[1ex] v_1 & v_2 & v_3 \end{vmatrix}, \]
where $e_1$, $e_2$, and $e_3$ are the unit vectors in $\mathbb R^3$. If $f\colon U\to\mathbb R$ has continuous second partial derivatives then, by Proposition 7.2,
\[ \operatorname{curl}\operatorname{grad} f = 0. \tag{7.15} \]
Indeed, the first coordinate of $\operatorname{curl}\operatorname{grad} f$ is by definition
\[ \frac{\partial^2 f}{\partial x_2\,\partial x_3} - \frac{\partial^2 f}{\partial x_3\,\partial x_2} = 0. \]
The other two components are obtained by cyclic permutation of the indices. We have found: $\operatorname{curl} v = 0$ is a necessary condition for a continuously partial differentiable vector field $v\colon U\to\mathbb R^3$ to be the gradient of a function $f\colon U\to\mathbb R$.
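Relation (7.15) can be illustrated numerically; the scalar field $f$ below is an arbitrary test choice, and all derivatives are approximated by central differences:

```python
import math

def partial(g, a, i, h=1e-4):
    # central difference of a scalar function g in coordinate i
    ap, am = list(a), list(a)
    ap[i] += h
    am[i] -= h
    return (g(ap) - g(am)) / (2 * h)

def grad(f, a):
    return [partial(f, a, i) for i in range(3)]

def curl(v, a):
    # d[i][j] = D_i v_j (0-based); components follow (7.14)
    d = [[partial(lambda p, j=j: v(p)[j], a, i) for j in range(3)]
         for i in range(3)]
    return [d[1][2] - d[2][1], d[2][0] - d[0][2], d[0][1] - d[1][0]]

f = lambda p: math.sin(p[0]) * p[1] + p[2] ** 2 * p[0]   # sample C^2 field
a = [0.4, 1.1, -0.7]
for c in curl(lambda p: grad(f, p), a):
    assert abs(c) < 1e-5      # curl grad f = 0
```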

7.1.2 The Laplacian

Let $U\subset\mathbb R^n$ be open and $f\in C^2(U)$. Put
\[ \Delta f = \operatorname{div}\operatorname{grad} f = \frac{\partial^2 f}{\partial x_1^2}+\cdots+\frac{\partial^2 f}{\partial x_n^2}, \tag{7.16} \]
and call
\[ \Delta = \frac{\partial^2}{\partial x_1^2}+\cdots+\frac{\partial^2}{\partial x_n^2} \]
the Laplacian or Laplace operator. The equation $\Delta f = 0$ is called the Laplace equation; its solutions are the harmonic functions. If $f$ depends on an additional time variable $t$, $f\colon U\times I\to\mathbb R$, $(x,t)\mapsto f(x,t)$, one considers the so-called wave equation
\[ f_{tt} - a^2\Delta f = 0 \tag{7.17} \]
and the so-called heat equation
\[ f_t - k\Delta f = 0. \tag{7.18} \]

Example 7.5 Let $f\colon(0,+\infty)\to\mathbb R$ be twice continuously differentiable. We want to compute the Laplacian $\Delta f(r)$, $r=\|x\|$, $x\in\mathbb R^n\setminus\{0\}$. By Example 7.2 we have
\[ \operatorname{grad} f(r) = f'(r)\,\frac{x}{r}, \]
and by the product rule and Example 7.3 we obtain
\[ \Delta f(r) = \operatorname{div}\operatorname{grad} f(r) = \operatorname{grad} f'(r)\cdot\frac{x}{r} + f'(r)\operatorname{div}\frac{x}{r} = f''(r)\,\frac{x}{r}\cdot\frac{x}{r} + f'(r)\,\frac{n-1}{r}; \]
thus
\[ \Delta f(r) = f''(r) + \frac{n-1}{r}\,f'(r). \]
In particular, $\Delta\dfrac{1}{r^{n-2}}=0$ if $n\ge3$ and $\Delta\log r=0$ if $n=2$. Prove!
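The closing claim can at least be checked numerically: with a finite-difference Laplacian, $1/r$ is harmonic for $n=3$ and $\log r$ for $n=2$ away from the origin (a sketch, not part of the original text):

```python
import math

def laplacian(f, a, h=1e-4):
    # sum of second central differences in each coordinate
    s = 0.0
    for i in range(len(a)):
        ap, am = list(a), list(a)
        ap[i] += h
        am[i] -= h
        s += (f(ap) - 2 * f(a) + f(am)) / (h * h)
    return s

r = lambda p: math.sqrt(sum(x * x for x in p))

# n = 3: 1/r is harmonic away from 0
assert abs(laplacian(lambda p: 1 / r(p), [1.0, 2.0, 2.0])) < 1e-5
# n = 2: log r is harmonic away from 0
assert abs(laplacian(lambda p: math.log(r(p)), [0.6, 0.8])) < 1e-5
```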

7.2 Total Differentiation

In this section we define (total) differentiability of a function $f$ from $\mathbb R^n$ to $\mathbb R^m$. Roughly speaking, $f$ is differentiable (at some point) if it can be approximated by a linear mapping. In contrast to partial differentiability we need not refer to single coordinates. Differentiable functions are continuous. In this section $U$ always denotes an open subset of $\mathbb R^n$. The vector space of linear mappings of a vector space $V$ into a vector space $W$ will be denoted by $L(V,W)$.

Motivation: If $f\colon\mathbb R\to\mathbb R$ is differentiable at $x\in\mathbb R$ and $f'(x)=a$, then
\[ \lim_{h\to0}\frac{f(x+h)-f(x)-a\cdot h}{h} = 0. \]
Note that the mapping $h\mapsto ah$ is linear from $\mathbb R$ to $\mathbb R$, and any such linear mapping is of that form.

Definition 7.5 The mapping $f\colon U\to\mathbb R^m$ is said to be differentiable at a point $x\in U$ if there exists a linear map $A\colon\mathbb R^n\to\mathbb R^m$ such that
\[ \lim_{h\to0}\frac{\|f(x+h)-f(x)-A(h)\|}{\|h\|} = 0. \tag{7.19} \]
The linear map $A\in L(\mathbb R^n,\mathbb R^m)$ is called the derivative of $f$ at $x$ and will be denoted by $Df(x)$. In case $n=m=1$ this notion coincides with the ordinary differentiability of a function.

Remark 7.2 We reformulate the definition of differentiability of $f$ at $a\in U$: define a function $\varphi_a\colon U_\varepsilon(0)\subset\mathbb R^n\to\mathbb R^m$ (depending on both $a$ and $h$) by
\[ f(a+h) = f(a) + A(h) + \varphi_a(h). \tag{7.20} \]
Then $f$ is differentiable at $a$ if and only if $\lim_{h\to0}\frac{\|\varphi_a(h)\|}{\|h\|}=0$. Replacing the r.h.s. of (7.20) by $f(a)+A(h)$ (forgetting about $\varphi_a$) and inserting $x$ in place of $a+h$ and $Df(a)$ in place of $A$, we obtain the linearization $L\colon\mathbb R^n\to\mathbb R^m$ of $f$ at $a$:
\[ L(x) = f(a) + Df(a)(x-a). \tag{7.21} \]

Lemma 7.4 If $f$ is differentiable at $x\in U$, the linear mapping $A$ is uniquely determined.

Proof. Throughout we refer to the euclidean norms on $\mathbb R^n$ and $\mathbb R^m$. Suppose that $A'\in L(\mathbb R^n,\mathbb R^m)$ is another linear mapping satisfying (7.19). Then for $h\in\mathbb R^n$, $h\ne0$,
\begin{align*}
\|A(h)-A'(h)\| &= \|(f(x+h)-f(x)-A'(h)) - (f(x+h)-f(x)-A(h))\|\\
&\le \|f(x+h)-f(x)-A(h)\| + \|f(x+h)-f(x)-A'(h)\|,\\
\frac{\|A(h)-A'(h)\|}{\|h\|} &\le \frac{\|f(x+h)-f(x)-A(h)\|}{\|h\|} + \frac{\|f(x+h)-f(x)-A'(h)\|}{\|h\|}.
\end{align*}
Since the limit of the right hand side as $h\to0$ exists and equals $0$, the l.h.s. also tends to $0$ as $h\to0$; that is,
\[ \lim_{h\to0}\frac{\|A(h)-A'(h)\|}{\|h\|} = 0. \]
Now fix $h_0\in\mathbb R^n$, $h_0\ne0$, and put $h=th_0$, $t\in\mathbb R$, $t\to0$. Then $h\to0$ and hence
\[ 0 = \lim_{t\to0}\frac{\|A(th_0)-A'(th_0)\|}{\|th_0\|} = \lim_{t\to0}\frac{|t|\,\|A(h_0)-A'(h_0)\|}{|t|\,\|h_0\|} = \frac{\|A(h_0)-A'(h_0)\|}{\|h_0\|}. \]
Hence $\|A(h_0)-A'(h_0)\|=0$, which implies $A(h_0)=A'(h_0)$, so that $A=A'$.

Definition 7.6 The matrix $(a_{ij})\in\mathbb R^{m\times n}$ of the linear map $Df(x)$ with respect to the standard bases in $\mathbb R^n$ and $\mathbb R^m$ is called the Jacobi matrix of $f$ at $x$. It is denoted by $f'(x)$; that is,
\[ a_{ij} = (f'(x))_{ij} = e_i\cdot Df(x)(e_j). \]

Example. Let $f\colon\mathbb R^n\to\mathbb R^m$ be linear, $f(x)=B(x)$ with $B\in L(\mathbb R^n,\mathbb R^m)$. Then $Df(x)=B$ is the constant linear mapping. Indeed,
\[ f(x+h)-f(x)-B(h) = B(x+h)-B(x)-B(h) = 0 \]
since $B$ is linear. Hence $\lim_{h\to0}\|f(x+h)-f(x)-B(h)\|/\|h\| = 0$, which proves the claim.

Remark 7.3 (a) Using a column vector $h=(h_1,\dots,h_n)^\top$, the map $Df(x)(h)$ is given by matrix multiplication:
\[ Df(x)(h) = f'(x)\cdot h = \begin{pmatrix} a_{11} & \dots & a_{1n}\\ \vdots & & \vdots\\ a_{m1} & \dots & a_{mn} \end{pmatrix} \begin{pmatrix} h_1\\ \vdots\\ h_n \end{pmatrix} = \begin{pmatrix} \sum_{j=1}^{n}a_{1j}h_j\\ \vdots\\ \sum_{j=1}^{n}a_{mj}h_j \end{pmatrix}. \]
Once the standard basis in $\mathbb R^m$ is chosen, we can write $f(x)=(f_1(x),\dots,f_m(x))$ as a vector of $m$ scalar functions $f_i\colon U\to\mathbb R$. By Proposition 6.19 the limit of the vector function
\[ \lim_{h\to0}\frac{1}{\|h\|}\bigl(f(x+h)-f(x)-Df(x)(h)\bigr) \]
exists and is equal to $0$ if and only if the limit exists for every coordinate $i=1,\dots,m$ and is $0$:
\[ \lim_{h\to0}\frac{1}{\|h\|}\Bigl(f_i(x+h)-f_i(x)-\sum_{j=1}^{n}a_{ij}h_j\Bigr) = 0,\qquad i=1,\dots,m. \tag{7.22} \]
We see: $f$ is differentiable at $x$ if and only if all $f_i$, $i=1,\dots,m$, are. In this case the Jacobi matrix $f'(x)$ is just the stack of the row vectors $f_i'(x)$, $i=1,\dots,m$:
\[ f'(x) = \begin{pmatrix} f_1'(x)\\ \vdots\\ f_m'(x) \end{pmatrix}, \]
where $f_i'(x)=(a_{i1},a_{i2},\dots,a_{in})$.
(b) In case $m=1$ ($f=f_1$), $f'(a)\in\mathbb R^{1\times n}$ is a linear functional (a row vector). Proposition 7.6 below will show that $f'(a)=\operatorname{grad} f(a)$. The linearization $L(x)$ of $f$ at $a$ is given by the linear functional $Df(a)\colon\mathbb R^n\to\mathbb R$. The graph of $L$ is an $n$-dimensional hyperplane in $\mathbb R^{n+1}$ touching the graph of $f$ at the point $(a,f(a))$. In coordinates, the linearization (the hyperplane equation) is
\[ x_{n+1} = L(x) = f(a) + f'(a)\cdot(x-a). \]
Here $f'(a)$ is the row vector corresponding to the linear functional $Df(a)\colon\mathbb R^n\to\mathbb R$ w.r.t. the standard basis.

Example 7.6 Let $C=(c_{ij})\in\mathbb R^{n\times n}$ be a symmetric $n\times n$ matrix, that is, $c_{ij}=c_{ji}$ for all $i,j=1,\dots,n$, and define $f\colon\mathbb R^n\to\mathbb R$ by
\[ f(x) = x\cdot C(x) = \sum_{i,j=1}^{n}c_{ij}x_i x_j,\qquad x=(x_1,\dots,x_n)\in\mathbb R^n. \]
If $a,h\in\mathbb R^n$ we have
\begin{align*}
f(a+h) &= (a+h)\cdot C(a+h) = a\cdot C(a) + h\cdot C(a) + a\cdot C(h) + h\cdot C(h)\\
&= a\cdot Ca + 2C(a)\cdot h + h\cdot C(h)\\
&= f(a) + v\cdot h + \varphi(h),
\end{align*}
where $v=2C(a)$ and $\varphi(h)=h\cdot C(h)$. Since, by the Cauchy–Schwarz inequality,
\[ |\varphi(h)| = |h\cdot C(h)| \le \|h\|\,\|C(h)\| \le \|h\|\,\|C\|\,\|h\| = \|C\|\,\|h\|^2, \]
$\lim_{h\to0}\frac{\varphi(h)}{\|h\|}=0$. This proves $f$ to be differentiable at $a\in\mathbb R^n$ with derivative $Df(a)(h)=2C(a)\cdot h$. The Jacobi matrix is the row vector $f'(a)=2C(a)^\top$.

7.2.1 Basic Theorems

Lemma 7.5 Let $f\colon U\to\mathbb R^m$ be differentiable at $x$; then $f$ is continuous at $x$.

Proof. Define $\varphi_x(h)$ as in Remark 7.2 with $Df(x)=A$; then
\[ \lim_{h\to0}\|\varphi_x(h)\| = 0 \]
since $f$ is differentiable at $x$. Since $A$ is continuous by Proposition 7.1, $\lim_{h\to0}A(h)=A(0)=0$. This gives
\[ \lim_{h\to0}f(x+h) = f(x) + \lim_{h\to0}A(h) + \lim_{h\to0}\varphi_x(h) = f(x). \]
This shows continuity of $f$ at $x$.

Proposition 7.6 Let $f\colon U\to\mathbb R^m$, $f(x)=(f_1(x),\dots,f_m(x))$, be differentiable at $x\in U$. Then all partial derivatives $\frac{\partial f_i(x)}{\partial x_j}$, $i=1,\dots,m$, $j=1,\dots,n$, exist, and the Jacobi matrix $f'(x)\in\mathbb R^{m\times n}$ has the form
\[ (a_{ij}) = f'(x) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \dots & \frac{\partial f_1}{\partial x_n}\\ \vdots & & \vdots\\ \frac{\partial f_m}{\partial x_1} & \dots & \frac{\partial f_m}{\partial x_n} \end{pmatrix}(x) = \Bigl(\frac{\partial f_i(x)}{\partial x_j}\Bigr)_{\substack{i=1,\dots,m\\ j=1,\dots,n}}. \tag{7.23} \]

Notation. (a) For the Jacobi matrix we also use the notation
\[ f'(x) = \Bigl(\frac{\partial(f_1,\dots,f_m)}{\partial(x_1,\dots,x_n)}\Bigr)(x). \]
(b) In case $n=m$ the determinant $\det(f'(x))$ of the Jacobi matrix is called the Jacobian or functional determinant of $f$ at $x$. It is denoted by
\[ \det(f'(x)) = \frac{\partial(f_1,\dots,f_n)}{\partial(x_1,\dots,x_n)}(x). \]

Proof. Inserting $h=te_j=(0,\dots,t,\dots,0)$ into (7.22) (see Remark 7.3) we have, since $\|h\|=|t|$ and $h_k=t\delta_{jk}$, for all $i=1,\dots,m$:
\begin{align*}
0 &= \lim_{t\to0}\frac{\bigl|f_i(x+te_j)-f_i(x)-\sum_{k=1}^{n}a_{ik}h_k\bigr|}{\|te_j\|}\\
&= \lim_{t\to0}\frac{|f_i(x_1,\dots,x_j+t,\dots,x_n)-f_i(x)-ta_{ij}|}{|t|}\\
&= \lim_{t\to0}\left|\frac{f_i(x_1,\dots,x_j+t,\dots,x_n)-f_i(x)}{t}-a_{ij}\right| = \left|\frac{\partial f_i(x)}{\partial x_j}-a_{ij}\right|.
\end{align*}
Hence $a_{ij}=\dfrac{\partial f_i(x)}{\partial x_j}$.

Hyper Planes

A plane in $\mathbb R^3$ is the set $H=\{(x_1,x_2,x_3)\in\mathbb R^3 \mid a_1x_1+a_2x_2+a_3x_3=a_4\}$, where $a_i\in\mathbb R$, $i=1,\dots,4$. The vector $a=(a_1,a_2,a_3)$ is the normal vector to $H$; $a$ is orthogonal to any vector $x-x'$, $x,x'\in H$. Indeed, $a\cdot(x-x') = a\cdot x - a\cdot x' = a_4-a_4 = 0$.
The plane $H$ is $2$-dimensional since $H$ can be written with two parameters $\alpha_1,\alpha_2\in\mathbb R$ as $(x_1^0,x_2^0,x_3^0)+\alpha_1v_1+\alpha_2v_2$, where $(x_1^0,x_2^0,x_3^0)$ is some point in $H$ and $v_1,v_2\in\mathbb R^3$ are linearly independent vectors spanning $H$.
This concept can be generalized to $\mathbb R^n$. A hyperplane in $\mathbb R^n$ is the set of points
\[ H = \{(x_1,\dots,x_n)\in\mathbb R^n \mid a_1x_1+a_2x_2+\cdots+a_nx_n = a_{n+1}\}, \]
where $a_1,\dots,a_{n+1}\in\mathbb R$. The vector $a=(a_1,\dots,a_n)\in\mathbb R^n$ is called the normal vector to the hyperplane $H$. Note that $a$ is unique only up to scalar multiples. A hyperplane in $\mathbb R^n$ is of dimension $n-1$ since there are $n-1$ linearly independent vectors $v_1,\dots,v_{n-1}\in\mathbb R^n$ and a point $h\in H$ such that
\[ H = \{\, h+\alpha_1v_1+\cdots+\alpha_{n-1}v_{n-1} \mid \alpha_1,\dots,\alpha_{n-1}\in\mathbb R \,\}. \]

Example 7.7 (a) Special case $m=1$; let $f\colon U\to\mathbb R$ be differentiable. Then
\[ f'(x) = \Bigl(\frac{\partial f}{\partial x_1}(x),\dots,\frac{\partial f}{\partial x_n}(x)\Bigr) = \operatorname{grad} f(x). \]
It is a row vector and gives a linear functional on $\mathbb R^n$ which linearly associates to each vector $y=(y_1,\dots,y_n)^\top\in\mathbb R^n$ the real number
\[ Df(x)\begin{pmatrix} y_1\\ \vdots\\ y_n \end{pmatrix} = \operatorname{grad} f(x)\cdot y = \sum_{j=1}^{n}f_{x_j}(x)\,y_j. \]
In particular, by Remark 7.3 (b), the equation of the linearization of $f$ at $a$ (the touching hyperplane) is
\begin{align*}
x_{n+1} &= L(x) = f(a) + \operatorname{grad} f(a)\cdot(x-a)\\
x_{n+1} &= f(a) + (f_{x_1}(a),\dots,f_{x_n}(a))\cdot(x_1-a_1,\dots,x_n-a_n)\\
x_{n+1} &= f(a) + \sum_{j=1}^{n}f_{x_j}(a)(x_j-a_j)\\
0 &= -\sum_{j=1}^{n}f_{x_j}(a)(x_j-a_j) + 1\cdot(x_{n+1}-f(a))\\
0 &= \tilde n\cdot(\tilde x-\tilde a),
\end{align*}
where $\tilde x=(x_1,\dots,x_n,x_{n+1})$, $\tilde a=(a_1,\dots,a_n,f(a))$, and $\tilde n=(-\operatorname{grad} f(a),1)\in\mathbb R^{n+1}$ is the normal vector to the hyperplane at $\tilde a$.
(b) Special case $n=1$; let $f\colon(a,b)\to\mathbb R^m$, $f=(f_1,\dots,f_m)$, be differentiable. Then $f$ is a curve in $\mathbb R^m$ with initial point $f(a)$ and end point $f(b)$. The Jacobi matrix $f'(t)=(f_1'(t),\dots,f_m'(t))^\top\in\mathbb R^{m\times1}$ of $f$ at $t$ is a column vector; it is the tangent vector to the curve $f$ at $t\in(a,b)$.
(c) Let $f\colon\mathbb R^3\to\mathbb R^2$ be given by
\[ f(x,y,z) = (f_1,f_2) = \begin{pmatrix} x^3-3xy^2+z\\ \sin(xyz^2) \end{pmatrix}. \]
Then
\[ f'(x,y,z) = \Bigl(\frac{\partial(f_1,f_2)}{\partial(x,y,z)}\Bigr) = \begin{pmatrix} 3x^2-3y^2 & -6xy & 1\\ yz^2\cos(xyz^2) & xz^2\cos(xyz^2) & 2xyz\cos(xyz^2) \end{pmatrix}. \]
The linearization of $f$ at $(a,b,c)$ is
\[ L(x,y,z) = f(a,b,c) + f'(a,b,c)\cdot(x-a,\,y-b,\,z-c). \]

Remark 7.4 Note that the existence of all partial derivatives does not imply the existence of $f'(a)$. Recall Example 7.1 (d): there we gave a function having partial derivatives at the origin which is not continuous at $(0,0)$, and hence not differentiable at $(0,0)$. However, the next proposition shows that the converse implication holds provided all partial derivatives are continuous.
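The Jacobi matrix from Example 7.7 (c) can be verified entry by entry with central differences (a numerical sketch, not part of the original text):

```python
import math

def jacobian(f, a, h=1e-6):
    # central-difference Jacobi matrix of f: R^n -> R^m at a
    m = len(f(a))
    J = []
    for i in range(m):
        row = []
        for j in range(len(a)):
            ap, am = list(a), list(a)
            ap[j] += h
            am[j] -= h
            row.append((f(ap)[i] - f(am)[i]) / (2 * h))
        J.append(row)
    return J

f = lambda p: [p[0] ** 3 - 3 * p[0] * p[1] ** 2 + p[2],
               math.sin(p[0] * p[1] * p[2] ** 2)]
x, y, z = 0.5, -0.3, 0.8
c = math.cos(x * y * z ** 2)
exact = [[3 * x ** 2 - 3 * y ** 2, -6 * x * y, 1.0],
         [y * z ** 2 * c, x * z ** 2 * c, 2 * x * y * z * c]]
J = jacobian(f, [x, y, z])
for i in range(2):
    for j in range(3):
        assert abs(J[i][j] - exact[i][j]) < 1e-8
```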

Proposition 7.7 Let $f\colon U\to\mathbb R^m$ be continuously partial differentiable at a point $a\in U$. Then $f$ is differentiable at $a$ and $f'\colon U\to L(\mathbb R^n,\mathbb R^m)$ is continuous at $a$.
The proof in the case $n=2$, $m=1$ is in the appendix to this chapter.

Theorem 7.8 (Chain Rule) If $f\colon\mathbb R^n\to\mathbb R^m$ is differentiable at a point $a$ and $g\colon\mathbb R^m\to\mathbb R^p$ is differentiable at $b=f(a)$, then the composition $k=g\circ f\colon\mathbb R^n\to\mathbb R^p$ is differentiable at $a$ and
\[ Dk(a) = Dg(b)\circ Df(a). \tag{7.24} \]

Using Jacobi matrices, this can be written as
\[ k'(a) = g'(b)\cdot f'(a). \tag{7.25} \]
Proof. Let $A=Df(a)$, $B=Dg(b)$, and $y=f(x)$. Define functions $\varphi$, $\psi$, and $\rho$ by
\begin{align}
\varphi(x) &= f(x)-f(a)-A(x-a), \tag{7.26}\\
\psi(y) &= g(y)-g(b)-B(y-b), \tag{7.27}\\
\rho(x) &= g{\circ}f(x)-g{\circ}f(a)-B{\circ}A(x-a); \tag{7.28}
\end{align}
then
\[ \lim_{x\to a}\frac{\|\varphi(x)\|}{\|x-a\|} = 0,\qquad \lim_{y\to b}\frac{\|\psi(y)\|}{\|y-b\|} = 0, \tag{7.29} \]
and we have to show that
\[ \lim_{x\to a}\frac{\|\rho(x)\|}{\|x-a\|} = 0. \]
Inserting (7.26) and (7.27) we find
\begin{align*}
\rho(x) &= g(f(x))-g(f(a))-BA(x-a) = g(f(x))-g(f(a))-B(f(x)-f(a)-\varphi(x))\\
&= \bigl[g(f(x))-g(f(a))-B(f(x)-f(a))\bigr] + B{\circ}\varphi(x)\\
&= \psi(f(x)) + B(\varphi(x)).
\end{align*}
Using $\|T(x)\|\le\|T\|\,\|x\|$ (see Proposition 7.1) this shows
\[ \frac{\|\rho(x)\|}{\|x-a\|} \le \frac{\|\psi(f(x))\|}{\|x-a\|} + \frac{\|B{\circ}\varphi(x)\|}{\|x-a\|} \le \frac{\|\psi(y)\|}{\|y-b\|}\cdot\frac{\|f(x)-f(a)\|}{\|x-a\|} + \|B\|\,\frac{\|\varphi(x)\|}{\|x-a\|}. \]
Inserting (7.26) again into the above inequality we continue:
\begin{align*}
&= \frac{\|\psi(y)\|}{\|y-b\|}\cdot\frac{\|\varphi(x)+A(x-a)\|}{\|x-a\|} + \|B\|\,\frac{\|\varphi(x)\|}{\|x-a\|}\\
&\le \frac{\|\psi(y)\|}{\|y-b\|}\Bigl(\frac{\|\varphi(x)\|}{\|x-a\|}+\|A\|\Bigr) + \|B\|\,\frac{\|\varphi(x)\|}{\|x-a\|}.
\end{align*}
All terms on the right side tend to $0$ as $x$ approaches $a$. This completes the proof.

Remarks 7.5 (a) The chain rule in coordinates. If $A=f'(a)$, $B=g'(f(a))$, and $C=k'(a)$, then $A\in\mathbb R^{m\times n}$, $B\in\mathbb R^{p\times m}$, $C\in\mathbb R^{p\times n}$, and
\[ \Bigl(\frac{\partial(k_1,\dots,k_p)}{\partial(x_1,\dots,x_n)}\Bigr) = \Bigl(\frac{\partial(g_1,\dots,g_p)}{\partial(y_1,\dots,y_m)}\Bigr)\Bigl(\frac{\partial(f_1,\dots,f_m)}{\partial(x_1,\dots,x_n)}\Bigr), \tag{7.30} \]
\[ \frac{\partial k_r}{\partial x_j}(a) = \sum_{i=1}^{m}\frac{\partial g_r}{\partial y_i}(f(a))\,\frac{\partial f_i}{\partial x_j}(a),\qquad r=1,\dots,p,\ j=1,\dots,n. \tag{7.31} \]
(b) In particular, in case $p=1$, $k(x)=g(f(x))$, we have
\[ \frac{\partial k}{\partial x_j} = \frac{\partial g}{\partial y_1}\frac{\partial f_1}{\partial x_j} + \cdots + \frac{\partial g}{\partial y_m}\frac{\partial f_m}{\partial x_j}. \]

Example 7.8 (a) Let $f(u,v)=uv$, $u=g(x,y)=x^2+y^2$, $v=h(x,y)=xy$, and $z=f(g(x,y),h(x,y))=(x^2+y^2)xy=x^3y+xy^3$. Then
\[ \frac{\partial z}{\partial x} = \frac{\partial f}{\partial u}\cdot\frac{\partial g}{\partial x} + \frac{\partial f}{\partial v}\cdot\frac{\partial h}{\partial x} = v\cdot 2x + u\cdot y = 2x^2y + y(x^2+y^2) = 3x^2y + y^3. \]
(b) Let $f(u,v)=u^v$, $u(t)=v(t)=t$. Then $F(t)=f(u(t),v(t))=t^t$ and
\[ F'(t) = \frac{\partial f}{\partial u}\,u'(t) + \frac{\partial f}{\partial v}\,v'(t) = v\,u^{v-1}\cdot1 + u^v\log u\cdot1 = t\cdot t^{t-1} + t^t\log t = t^t(\log t + 1). \]
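Example 7.8 (b) can be checked numerically: the chain-rule formula $F'(t)=t^t(\log t+1)$ should agree with a central difference quotient of $F(t)=t^t$ (a sketch, not part of the original text):

```python
import math

def F(t):
    return t ** t            # f(u, v) = u^v with u(t) = v(t) = t

def F_prime(t):
    return t ** t * (math.log(t) + 1)   # chain-rule result

h = 1e-6
for t in [0.5, 1.0, 2.0, 3.0]:
    fd = (F(t + h) - F(t - h)) / (2 * h)
    assert abs(fd - F_prime(t)) < 1e-5
```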

7.3 Taylor’s Formula

The gradient of $f$ gives an approximation of a scalar function $f$ by a linear functional. Taylor's formula generalizes this concept of approximation to higher order. We consider quadratic approximation of $f$ to determine local extrema. Throughout this section we refer to the euclidean norm $\|x\|=\|x\|_2=\sqrt{x_1^2+\cdots+x_n^2}$.

7.3.1 Directional Derivatives

Definition 7.7 Let $f\colon U\to\mathbb R$ be a function, $a\in U$, and $e\in\mathbb R^n$ a unit vector, $\|e\|=1$. The directional derivative of $f$ at $a$ in the direction of the unit vector $e$ is the limit
\[ (D_e f)(a) = \lim_{t\to0}\frac{f(a+te)-f(a)}{t}. \tag{7.32} \]
Note that for $e=e_j$ we have $D_e f = D_j f = \frac{\partial f}{\partial x_j}$.

Proposition 7.9 Let $f\colon U\to\mathbb R$ be continuously differentiable. Then for every $a\in U$ and every unit vector $e\in\mathbb R^n$, $\|e\|=1$, we have
\[ D_e f(a) = e\cdot\operatorname{grad} f(a). \tag{7.33} \]

Proof. Define $g\colon\mathbb R\to\mathbb R^n$ by $g(t)=a+te=(a_1+te_1,\dots,a_n+te_n)$. For sufficiently small $t\in\mathbb R$, say $|t|\le\varepsilon$, the composition $k=f\circ g$,
\[ \mathbb R\xrightarrow{\;g\;}\mathbb R^n\xrightarrow{\;f\;}\mathbb R,\qquad k(t) = f(g(t)) = f(a_1+te_1,\dots,a_n+te_n), \]
is defined. We compute $k'(t)$ using the chain rule:
\[ k'(t) = \sum_{j=1}^{n}\frac{\partial f}{\partial x_j}(a+te)\,g_j'(t). \]
Since $g_j'(t)=(a_j+te_j)'=e_j$ and $g(0)=a$, it follows that
\[ k'(t) = \sum_{j=1}^{n}\frac{\partial f}{\partial x_j}(a+te)\,e_j, \tag{7.34} \]
\[ k'(0) = \sum_{j=1}^{n}f_{x_j}(a)\,e_j = \operatorname{grad} f(a)\cdot e. \]
On the other hand, by definition of the directional derivative,
\[ k'(0) = \lim_{t\to0}\frac{k(t)-k(0)}{t} = \lim_{t\to0}\frac{f(a+te)-f(a)}{t} = D_e f(a). \]
This completes the proof.
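Formula (7.33) can be tested numerically; the function $f$ and the unit vector $e=(3/5,4/5)$ below are illustrative choices, not from the text:

```python
import math

def grad(f, a, h=1e-6):
    # central-difference gradient at a
    g = []
    for i in range(len(a)):
        ap, am = list(a), list(a)
        ap[i] += h
        am[i] -= h
        g.append((f(ap) - f(am)) / (2 * h))
    return g

f = lambda p: p[0] ** 2 * p[1] + math.exp(p[1])
a = [1.0, 0.5]
e = [3 / 5, 4 / 5]                      # a unit vector

t = 1e-6
directional = (f([a[i] + t * e[i] for i in range(2)]) - f(a)) / t
dot = sum(ei * gi for ei, gi in zip(e, grad(f, a)))
assert abs(directional - dot) < 1e-4    # D_e f(a) = e . grad f(a)
```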

Remark 7.6 (Geometric meaning of $\operatorname{grad} f$) Suppose that $\operatorname{grad} f(a)\ne0$ and let $e$ be a unit vector, $\|e\|=1$. Varying $e$, $D_e f(a)=e\cdot\operatorname{grad} f(a)$ becomes maximal if and only if $e$ and $\nabla f(a)$ have the same direction. Hence the vector $\operatorname{grad} f(a)$ points in the direction of maximal slope of $f$ at $a$. Similarly, $-\operatorname{grad} f(a)$ is the direction of maximal decline.
For example, $f(x,y)=\sqrt{1-x^2-y^2}$ has
\[ \operatorname{grad} f(x,y) = \Bigl(\frac{-x}{\sqrt{1-x^2-y^2}},\ \frac{-y}{\sqrt{1-x^2-y^2}}\Bigr). \]
The maximal slope of $f$ at $(x,y)$ is in the direction $e=(-x,-y)/\sqrt{x^2+y^2}$. In this case the tangent line to the graph points towards the $z$-axis and has maximal slope.

Corollary 7.10 Let $f\colon U\to\mathbb R$ be $k$-times continuously differentiable, $a\in U$, and $x\in\mathbb R^n$ such that the whole segment $a+tx$, $t\in[0,1]$, is contained in $U$. Then the function $h\colon[0,1]\to\mathbb R$, $h(t)=f(a+tx)$, is $k$-times continuously differentiable, with
\[ h^{(k)}(t) = \sum_{i_1,\dots,i_k=1}^{n} D_{i_k}\cdots D_{i_1}f(a+tx)\,x_{i_1}\cdots x_{i_k}. \tag{7.35} \]
In particular,
\[ h^{(k)}(0) = \sum_{i_1,\dots,i_k=1}^{n} D_{i_k}\cdots D_{i_1}f(a)\,x_{i_1}\cdots x_{i_k}. \tag{7.36} \]
Proof. The proof is by induction on $k$. For $k=1$ it is exactly the statement of the proposition. We demonstrate the step from $k=1$ to $k=2$. By (7.34),
\[ h''(t) = \sum_{i_1=1}^{n}\frac{d}{dt}\Bigl(\frac{\partial f(a+tx)}{\partial x_{i_1}}\Bigr)x_{i_1} = \sum_{i_1=1}^{n}\sum_{i_2=1}^{n}\frac{\partial^2 f}{\partial x_{i_2}\,\partial x_{i_1}}(a+tx)\,x_{i_1}x_{i_2}. \]
In the second equality we applied the chain rule to $\tilde h(t)=f_{x_{i_1}}(a+tx)$.

For brevity we use the following notation for the term on the right of (7.36):
\[ (x\nabla)^k f(a) = \sum_{i_1,\dots,i_k=1}^{n} x_{i_1}\cdots x_{i_k}\,D_{i_k}\cdots D_{i_1}f(a). \]
In particular, $(x\nabla)f(a) = x_1f_{x_1}(a)+x_2f_{x_2}(a)+\cdots+x_nf_{x_n}(a)$ and $(x\nabla)^2f(a) = \sum_{i,j=1}^{n}x_ix_j\,\frac{\partial^2 f}{\partial x_i\,\partial x_j}(a)$.

7.3.2 Taylor's Formula

Theorem 7.11 Let $f\in C^{k+1}(U)$, $a\in U$, and $x\in\mathbb R^n$ such that $a+tx\in U$ for all $t\in[0,1]$. Then there exists $\theta\in[0,1]$ such that
\[ f(a+x) = \sum_{m=0}^{k}\frac{1}{m!}(x\nabla)^m f(a) + \frac{1}{(k+1)!}(x\nabla)^{k+1}f(a+\theta x); \tag{7.37} \]
written out,
\begin{align*}
f(a+x) &= f(a) + \sum_{i=1}^{n}x_i f_{x_i}(a) + \frac{1}{2!}\sum_{i,j=1}^{n}x_ix_j f_{x_ix_j}(a) + \cdots\\
&\quad + \frac{1}{(k+1)!}\sum_{i_1,\dots,i_{k+1}} x_{i_1}\cdots x_{i_{k+1}}\,f_{x_{i_1}\cdots x_{i_{k+1}}}(a+\theta x).
\end{align*}
The expression $R_{k+1}(a,x)=\frac{1}{(k+1)!}(x\nabla)^{k+1}f(a+\theta x)$ is called the Lagrange remainder term.

Proof. Consider the function $h\colon[0,1]\to\mathbb R$, $h(t)=f(a+tx)$. By Corollary 7.10, $h$ is $(k+1)$-times continuously differentiable. By Taylor's theorem for functions of one variable (Theorem 4.15 with $x=1$ and $a=0$ therein), we have
\[ f(a+x) = h(1) = \sum_{m=0}^{k}\frac{h^{(m)}(0)}{m!} + \frac{h^{(k+1)}(\theta)}{(k+1)!}. \]
By Corollary 7.10, for $m=1,\dots,k$ we have
\[ \frac{h^{(m)}(0)}{m!} = \frac{1}{m!}(x\nabla)^m f(a) \quad\text{and}\quad \frac{h^{(k+1)}(\theta)}{(k+1)!} = \frac{1}{(k+1)!}(x\nabla)^{k+1}f(a+\theta x); \]
the assertion follows.

It is often convenient to substitute $x := x-a$. Then the Taylor expansion reads
\[ f(x) = \sum_{m=0}^{k}\frac{1}{m!}\bigl((x-a)\nabla\bigr)^m f(a) + \frac{1}{(k+1)!}\bigl((x-a)\nabla\bigr)^{k+1}f\bigl(a+\theta(x-a)\bigr); \]
written out,
\begin{align*}
f(x) &= f(a) + \sum_{i=1}^{n}(x_i-a_i)f_{x_i}(a) + \frac{1}{2!}\sum_{i,j=1}^{n}(x_i-a_i)(x_j-a_j)f_{x_ix_j}(a) + \cdots\\
&\quad + \frac{1}{(k+1)!}\sum_{i_1,\dots,i_{k+1}}(x_{i_1}-a_{i_1})\cdots(x_{i_{k+1}}-a_{i_{k+1}})\,f_{x_{i_1}\cdots x_{i_{k+1}}}\bigl(a+\theta(x-a)\bigr).
\end{align*}

We write the Taylor formula for the case $n=2$, $k=3$:
\begin{align*}
f(a+x,b+y) &= f(a,b) + \bigl(f_x(a,b)\,x + f_y(a,b)\,y\bigr)\\
&\quad + \frac{1}{2!}\bigl(f_{xx}(a,b)\,x^2 + 2f_{xy}(a,b)\,xy + f_{yy}(a,b)\,y^2\bigr)\\
&\quad + \frac{1}{3!}\bigl(f_{xxx}(a,b)\,x^3 + 3f_{xxy}(a,b)\,x^2y + 3f_{xyy}(a,b)\,xy^2 + f_{yyy}(a,b)\,y^3\bigr) + R_4(a,x).
\end{align*}

If $f\in\bigcap_{k=0}^{\infty}C^k(U) = \{f : f\in C^k(U)\ \forall k\in\mathbb N\}$ and $\lim_{k\to\infty}R_k(a,x)=0$ for all $x\in U$, then
\[ f(x) = \sum_{m=0}^{\infty}\frac{1}{m!}\bigl((x-a)\nabla\bigr)^m f(a). \]
The r.h.s. is called the Taylor series of $f$ at $a$.

Example 7.9 (a) We compute the Taylor expansion of f(x, y) = cos x sin y at (0, 0) to the third order. We have

f = sin x sin y, f = cos x cos y, x − y fx(0, 0)=0, fy(0, 0)=1, f = cos x sin y, f = cos x sin y, f = sin x cos y xx − yy − xy − fxx(0, 0)=0, fyy(0, 0)=0, fxy(0, 0)=0. f = cos x cos y, f = cos x cos y, xxy − yyy − f (0, 0) = 1, f (0, 0) = 1, f (0, 0) = f (0, 0)=0. xxy − yyy − xyy xxx Inserting this gives 1 f(x, y)= y + 3x2y y3 + R (x, y;0). 3! − − 4 The same result can be obtained by multiplying the Taylor series for cos x and sin y:

(1 − x²/2 + x⁴/4! ∓ ⋯)(y − y³/3! ± ⋯) = y − x²y/2 − y³/6 + ⋯.
(b) The Taylor series of f(x,y) = e^{xy²} at (0,0) is

Σ_{n=0}^{∞} (xy²)ⁿ/n! = 1 + xy² + x²y⁴/2 + ⋯;
it converges on all of ℝ² to f(x,y).
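As a quick numerical sanity check (not part of the text; the function names below are mine), one can compare both third-order expansions with the exact values at a small point, where the remainder is of fourth order:

```python
import math

# Compare cos(x)*sin(y) with its Taylor polynomial y - x^2*y/2 - y^3/6 at (0,0).
def f(x, y):
    return math.cos(x) * math.sin(y)

def taylor3(x, y):
    return y - 0.5 * x**2 * y - y**3 / 6.0

x, y = 0.1, 0.2
err = abs(f(x, y) - taylor3(x, y))
# The remainder R_4 is of order ||(x,y)||^4, so the error should be tiny here.
assert err < 1e-4

# Same check for exp(x*y^2) against 1 + x*y^2 + (x*y^2)^2/2.
g = math.exp(x * y**2)
g3 = 1 + x * y**2 + (x * y**2)**2 / 2
assert abs(g - g3) < 1e-6
```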

Corollary 7.12 (Mean Value Theorem) Let f: U → ℝ be continuously differentiable, a ∈ U, and x ∈ ℝⁿ such that a + tx ∈ U for all t ∈ [0,1]. Then there exists θ ∈ [0,1] such that

f(a+x) − f(a) = ∇f(a+θx)·x, f(y) − f(x) = ∇f((1−θ)x + θy)·(y−x). (7.38)
This is the special case of Taylor's formula with k = 0.

Corollary 7.13 Let f: U → ℝ be k times continuously differentiable, a ∈ U, and x ∈ ℝⁿ such that a + tx ∈ U for all t ∈ [0,1]. Then there exists φ: U → ℝ such that
f(a+x) = Σ_{m=0}^{k} (1/m!) (x·∇)^m f(a) + φ(x), (7.39)
where lim_{x→0} φ(x)/‖x‖^k = 0.
Proof. By Taylor's theorem for f ∈ C^k(U), there exists θ ∈ [0,1] such that
f(a+x) = Σ_{m=0}^{k−1} (1/m!) (x·∇)^m f(a) + (1/k!) (x·∇)^k f(a+θx) =: Σ_{m=0}^{k} (1/m!) (x·∇)^m f(a) + φ(x).
This implies
φ(x) = (1/k!) ((x·∇)^k f(a+θx) − (x·∇)^k f(a)).
Since |x_{i₁} ⋯ x_{i_k}| ≤ ‖x‖^k for x ≠ 0,
|φ(x)|/‖x‖^k ≤ (1/k!) Σ_{i₁,…,i_k} |D_{i₁⋯i_k} f(a+θx) − D_{i₁⋯i_k} f(a)|.
Since all kth partial derivatives of f are continuous,

D_{i₁⋯i_k} f(a+θx) − D_{i₁⋯i_k} f(a) → 0 as x → 0.
This proves the claim.

Remarks 7.7 (a) With the above notations let
P_m(x) = (1/m!) ((x−a)·∇)^m f(a).

Then P_m is a polynomial of degree m in the variables x = (x₁,…,xₙ), and we have

f(x) = Σ_{m=0}^{k} P_m(x) + φ(x), lim_{x→a} φ(x)/‖x−a‖^k = 0.
Let us consider in more detail the cases m = 0, 1, 2.
Case m = 0.
P₀(x) = ((x−a)·∇)⁰ f(a)/0! = f(a).

P₀ is the constant polynomial with value f(a).
Case m = 1. We have
P₁(x) = Σ_{j=1}^{n} f_{x_j}(a)(x_j − a_j) = grad f(a)·(x − a).

Using Corollary 7.13, the first-order approximation of a continuously differentiable function is
f(x) = f(a) + grad f(a)·(x − a) + φ(x), lim_{x→a} φ(x)/‖x−a‖ = 0. (7.40)
The linearization of f at a is L(x) = P₀(x) + P₁(x).
Case m = 2.
P₂(x) = (1/2) Σ_{i,j=1}^{n} f_{x_i x_j}(a)(x_i − a_i)(x_j − a_j).
Hence P₂(x) is a quadratic form with the corresponding matrix (1/2)(f_{x_i x_j}(a)). As a special case of Corollary 7.13 (k = 2) we have for f ∈ C²(U)
f(a+x) = f(a) + grad f(a)·x + (1/2) xᵀ·Hess f(a)·x + φ(x), lim_{x→0} φ(x)/‖x‖² = 0, (7.41)
where

(Hess f)(a) = (f_{x_i x_j}(a))_{i,j=1}^{n} (7.42)
is called the Hessian matrix of f at a ∈ U. The Hessian matrix is symmetric by Schwarz's lemma.
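The Hessian matrix can be approximated numerically by central differences. The following sketch (the helper `hessian` is my own, not from the text) also illustrates the symmetry guaranteed by Schwarz's lemma:

```python
# Approximate the Hessian (f_{x_i x_j}(a)) by central finite differences.
def hessian(f, a, h=1e-4):
    n = len(a)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shift(si, sj):
                # evaluate f at a with the i-th and j-th coordinates shifted
                p = list(a)
                p[i] += si * h
                p[j] += sj * h
                return f(p)
            H[i][j] = (shift(1, 1) - shift(1, -1) - shift(-1, 1) + shift(-1, -1)) / (4 * h * h)
    return H

f = lambda p: p[0]**2 * p[1] + p[0] * p[1]**3
H = hessian(f, [1.0, 2.0])
# For this f at (1,2): f_xx = 2y = 4, f_xy = 2x + 3y^2 = 14, f_yy = 6xy = 12.
assert abs(H[0][0] - 4) < 1e-3 and abs(H[0][1] - 14) < 1e-3
assert abs(H[0][1] - H[1][0]) < 1e-3  # symmetry (Schwarz's lemma)
```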

7.4 Extrema of Functions of Several Variables

Definition 7.8 Let f: U → ℝ be a function. The point x ∈ U is called a local maximum (minimum) of f if there exists a neighborhood U_ε(x) ⊂ U of x such that
f(x) ≥ f(y) (respectively f(x) ≤ f(y)) for all y ∈ U_ε(x).
A local extremum is either a local maximum or a local minimum.

Proposition 7.14 Let f: U → ℝ be partially differentiable. If f has a local extremum at x ∈ U, then grad f(x) = 0.
Proof. For i = 1,…,n consider the function

gi(t)= f(x + tei).

This is a differentiable function of one variable, defined on a certain interval (−ε, ε). If f has an extremum at x, then g_i has an extremum at t = 0. By Proposition 4.7,

gi′(0) = 0.

Since g_i′(0) = lim_{t→0} (f(x + t e_i) − f(x))/t = f_{x_i}(x) and i was arbitrary, it follows that

grad f(x) = (D₁f(x),…,Dₙf(x)) = 0.

Example 7.10 Let f(x,y) = √(1 − x² − y²) be defined on the open unit disc U = {(x,y) ∈ ℝ² | x² + y² < 1}. Then grad f(x,y) = (−x/√(1−x²−y²), −y/√(1−x²−y²)) = 0 if and only if x = y = 0. Hence, if f has an extremum in U, it is at the origin. Obviously f(x,y) = √(1−x²−y²) ≤ 1 = f(0,0) for all points in U, so that f attains its global (and local) maximum at (0,0).
To obtain a sufficient criterion for the existence of local extrema we have to consider the Hessian matrix. Before that, we need some facts from linear algebra.

Definition 7.9 Let A ∈ ℝ^{n×n} be a real symmetric n×n matrix, that is, a_ij = a_ji for all i, j = 1,…,n. The associated quadratic form
Q(x) = Σ_{i,j=1}^{n} a_ij x_i x_j = xᵀ·A·x
is called
positive definite if Q(x) > 0 for all x ≠ 0,
negative definite if Q(x) < 0 for all x ≠ 0,
indefinite if Q(x) > 0 and Q(y) < 0 for some x, y,
positive semidefinite if Q(x) ≥ 0 for all x and Q is not positive definite,
negative semidefinite if Q(x) ≤ 0 for all x and Q is not negative definite.
Also, we say that the corresponding matrix A is positive definite if Q is.

Example 7.11 Let n = 2, Q(x) = Q(x₁,x₂). Then Q₁(x) = 3x₁² + 7x₂² is positive definite, Q₂(x) = −x₁² − 2x₂² is negative definite, Q₃(x) = x₁² − 2x₂² is indefinite, Q₄(x) = x₁² is positive semidefinite, and Q₅(x) = −x₂² is negative semidefinite.
Proposition 7.15 (Sylvester) Let A be a real symmetric n×n matrix and Q(x) = x·Ax the corresponding quadratic form. For k = 1,…,n let
A_k = (a_ij)_{i,j=1}^{k}, D_k = det A_k.
Let λ₁,…,λₙ be the eigenvalues of A. Then

(a) Q is positive definite if and only if λ₁ > 0, λ₂ > 0, …, λₙ > 0. This is the case if and only if D₁ > 0, D₂ > 0, …, Dₙ > 0.
(b) Q is negative definite if and only if λ₁ < 0, λ₂ < 0, …, λₙ < 0. This is the case if and only if (−1)^k D_k > 0 for all k = 1,…,n.
(c) Q is indefinite if and only if A has both positive and negative eigenvalues.

Example 7.12 Case n = 2. Let A ∈ ℝ^{2×2}, A = (a b; b c), be a symmetric matrix. By Sylvester's criterion, A is
(a) positive definite if and only if det A > 0 and a > 0,
(b) negative definite if and only if det A > 0 and a < 0,
(c) indefinite if and only if det A < 0,
(d) semidefinite if and only if det A = 0.
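The 2×2 case of Sylvester's criterion can be coded directly. In the sketch below the helper name `classify` is mine, and the `semidefinite` branch lumps the positive and negative semidefinite subcases of (d) together:

```python
# Classify a symmetric 2x2 matrix A = [[a, b], [b, c]] by Sylvester's criterion.
def classify(a, b, c):
    det = a * c - b * b
    if det > 0:
        return "positive definite" if a > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    return "semidefinite"

assert classify(2, 0, 2) == "positive definite"   # Hessian of x^2 + y^2
assert classify(8, 0, -2) == "indefinite"         # Hessian of 4x^2 - y^2
assert classify(2, 0, 0) == "semidefinite"        # Hessian of x^2 + y^3 at 0
```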

Proposition 7.16 Let f: U → ℝ be twice continuously differentiable and let grad f(a) = 0 at some point a ∈ U.
(a) If Hess f(a) is positive definite, then f has a local minimum at a.
(b) If Hess f(a) is negative definite, then f has a local maximum at a.
(c) If Hess f(a) is indefinite, then f has no local extremum at a.
Note that in general there is no information on a if Hess f(a) is semidefinite.
Proof. By (7.41), and since grad f(a) = 0,
f(a+x) = f(a) + (1/2) x·A(x) + φ(x), lim_{x→0} φ(x)/‖x‖² = 0, (7.43)
where A = Hess f(a).
(a) Let A be positive definite. Since the unit sphere S = {x ∈ ℝⁿ | ‖x‖ = 1} is compact (closed and bounded) and the map Q(x) = x·A(x) is continuous, the function attains its minimum, say m, on S (see Proposition 6.24):
m = min{x·A(x) | x ∈ S}.
Since Q is positive definite and 0 ∉ S, m > 0. If x is nonzero, y = x/‖x‖ ∈ S, and therefore
m ≤ y·A(y) = (x/‖x‖)·A(x/‖x‖) = (1/‖x‖²) x·A(x).
This implies Q(x) = x·A(x) ≥ m‖x‖² for all x ∈ ℝⁿ. Since φ(x)/‖x‖² → 0 as x → 0, there exists δ > 0 such that ‖x‖ < δ implies
−(m/4)‖x‖² ≤ φ(x) ≤ (m/4)‖x‖².
From (7.43) it follows that
f(a+x) = f(a) + (1/2)Q(x) + φ(x) ≥ f(a) + (1/2)m‖x‖² − (1/4)m‖x‖² ≥ f(a) + (m/4)‖x‖²,
hence
f(a+x) > f(a) if 0 < ‖x‖ < δ,
and f has a strict (isolated) local minimum at a.
(b) If A = Hess f(a) is negative definite, consider −f in place of f and apply (a).
(c) Let A = Hess f(a) be indefinite. We have to show that in every neighborhood of a there exist x′ and x″ such that f(x″) < f(a) < f(x′). Since A is indefinite, there is a vector x ∈ ℝⁿ \ {0} such that x·A(x) = m > 0. Then for small t we have
f(a+tx) = f(a) + (1/2) tx·A(tx) + φ(tx) = f(a) + (m/2)t² + φ(tx).
If t is small enough, −(m/4)t² ≤ φ(tx) ≤ (m/4)t², hence
f(a+tx) > f(a) if 0 < |t| < δ.
Similarly, if y ∈ ℝⁿ \ {0} satisfies y·A(y) < 0, for sufficiently small t we have f(a+ty) < f(a).

Example 7.13 (a) f(x,y) = x² + y². Here ∇f = (2x, 2y) = 0 if and only if x = y = 0. Furthermore,
Hess f(x,y) = (2 0; 0 2)
is positive definite; f has a (strict) local minimum at (0,0).
(b) Find the local extrema of z = f(x,y) = 4x² − y² on ℝ² (the graph is a hyperbolic paraboloid). The necessary condition ∇f = 0 implies f_x = 8x = 0, f_y = −2y = 0; thus x = y = 0. Further,
Hess f(x,y) = (f_xx f_xy; f_yx f_yy) = (8 0; 0 −2).
The Hessian matrix at (0,0) is indefinite; the function has no extremum at the origin (0,0).
(c) f(x,y) = x² + y³. ∇f(x,y) = (2x, 3y²) vanishes if and only if x = y = 0. Furthermore,
Hess f(0,0) = (2 0; 0 0)
is positive semidefinite. However, there is no local extremum at the origin since f(ε,0) = ε² > 0 = f(0,0) > −ε³ = f(0,−ε).
(d) f(x,y) = x² + y⁴. Again the Hessian matrix at (0,0) is positive semidefinite. However, (0,0) is a strict local minimum.

Local and Global Extrema

To compute the global extrema of a function f: Ū → ℝ, where U ⊂ ℝⁿ is open and Ū is the closure of U, we can go along the following lines:
(a) Compute the local extrema on U;
(b) Compute the global extrema on the boundary ∂U = Ū \ U;
(c) If U is unbounded without boundary (as U = ℝⁿ), consider the limits at infinity.
Note that sup_{x∈Ū} f(x) = max{maximum of all local maxima in U, sup_{x∈∂U} f(x)}. To compute the global extremum of f on the boundary, one has to find the local extrema in the interior of the boundary and compare them with the values on the boundary of the boundary.

Example 7.14 Find the global extrema of f(x,y) = x²y on Ū = {(x,y) ∈ ℝ² | x² + y² ≤ 1} (where U is the open unit disc).
Since grad f = (f_x, f_y) = (2xy, x²), local extrema can appear only on the y-axis: x = 0, y arbitrary. The Hessian matrix at (0,y) is
Hess f(0,y) = (f_xx f_xy; f_xy f_yy)|_{x=0} = (2y 2x; 2x 0)|_{x=0} = (2y 0; 0 0).
This matrix is positive semidefinite in case y > 0, negative semidefinite in case y < 0, and 0 at (0,0). Hence the above criterion gives no answer, and we have to apply the definition directly. In case y > 0 we have f(x,y) = x²y ≥ 0 for all x; in particular f(x,y) ≥ f(0,y) = 0, so (0,y) is a local minimum. Similarly, in case y < 0, f(x,y) ≤ f(0,y) = 0 for all x; hence f has a local maximum at (0,y), y < 0. However, f takes both positive and negative values in every neighborhood of (0,0), for example f(ε,ε) = ε³ and f(ε,−ε) = −ε³. Thus (0,0) is not a local extremum.
We have to consider the boundary x² + y² = 1. Inserting x² = 1 − y² we obtain
g(y) = f(x,y)|_{x²+y²=1} = x²y|_{x²+y²=1} = (1 − y²)y = y − y³, |y| ≤ 1.
We compute the local extrema on the circle x² + y² = 1 (note that the circle has no boundary, so the local extrema are actually the global extrema):
g′(y) = 1 − 3y² = 0, |y| = 1/√3.
Since g″(1/√3) < 0 and g″(−1/√3) > 0, g attains its maximum 2/(3√3) at y = 1/√3. Since this is greater than the local maximum value 0 of f at (0,y), y < 0, f attains its global maximum at the two points
M_{1,2} = (±√(2/3), 1/√3),
where f(M_{1,2}) = x²y = 2/(3√3). g attains its minimum −2/(3√3) at y = −1/√3. Since this is less than the local minimum value 0 of f at (0,y), y > 0, f attains its global minimum at the two points
m_{1,2} = (±√(2/3), −1/√3),
where f(m_{1,2}) = x²y = −2/(3√3).
The arithmetic-geometric mean inequality shows the same result for x, y > 0:

1/3 = (x² + y²)/3 = (x²/2 + x²/2 + y²)/3 ≥ ∛((x²/2)·(x²/2)·y²) = ∛(x⁴y²/4) ⇒ x²y ≤ 2/(3√3).
(b) Among all boxes with volume 1, find the one for which the sum of the lengths of the 12 edges is minimal. Let x, y and z denote the lengths of the three perpendicular edges at one vertex. By assumption xyz = 1, and g(x,y,z) = 4(x + y + z) is the function to minimize.

Local Extrema. Inserting the constraint z = 1/(xy), we have to minimize
f(x,y) = 4(x + y + 1/(xy)) on U = {(x,y) | x > 0, y > 0}.
The necessary conditions are
f_x = 4(1 − 1/(x²y)) = 0, f_y = 4(1 − 1/(xy²)) = 0 ⇒ x²y = xy² = 1 ⇒ x = y = 1.
Further,
f_xx = 8/(x³y), f_yy = 8/(xy³), f_xy = 4/(x²y²),
so that
det Hess f(1,1) = det(8 4; 4 8) = 64 − 16 > 0;
hence f has an extremum at (1,1). Since f_xx(1,1) = 8 > 0, f has a local minimum at (1,1).

Global Extrema. We show that (1,1) is even the global minimum on the first quadrant U. Consider N = {(x,y) | 1/25 ≤ x, y ≤ 5}. If (x,y) ∉ N, then
f(x,y) ≥ 4(5 + 0 + 0) = 20.
Since f(1,1) = 12 < 20, the global minimum of f on the quadrant is attained on the compact rectangle N. Inserting the four boundary pieces x = 5, y = 5, x = 1/25, and y = 1/25, in all cases f(x,y) ≥ 20, so that the local minimum (1,1) is also the global minimum.
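This conclusion can be spot-checked by brute force on a grid; the sketch below is my own and not part of the text:

```python
# Check numerically that f(x, y) = 4*(x + y + 1/(x*y)) attains its minimum
# value 12 at (1, 1) on a grid covering the rectangle [1/25, 5]^2.
def f(x, y):
    return 4 * (x + y + 1 / (x * y))

best = min(
    (f(1/25 + i * 0.01, 1/25 + j * 0.01), i, j)
    for i in range(500) for j in range(500)
)
assert abs(best[0] - 12) < 0.01
assert f(1, 1) == 12
```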

7.5 The Inverse Mapping Theorem

Suppose that f: ℝ → ℝ is differentiable on an open set U ⊂ ℝ containing a, and f′(a) ≠ 0. If f′(a) > 0, then there is an open interval V ⊂ U containing a such that f′(x) > 0 for all x ∈ V. Thus f is strictly increasing on V and therefore injective, with an inverse function g defined on some open interval W containing f(a). Moreover, g is differentiable (see Proposition 4.5) and g′(y) = 1/f′(x) if f(x) = y. An analogous result in higher dimensions is more involved, but the result is very important.

[Figure: f maps an open set V ⊂ U ⊂ ℝⁿ containing a onto an open set W ⊂ ℝⁿ containing f(a).]

Theorem 7.17 (Inverse Mapping Theorem) Suppose that f: ℝⁿ → ℝⁿ is continuously differentiable on an open set U containing a, and det f′(a) ≠ 0. Then there is an open set V ⊂ U containing a and an open set W containing f(a) such that f: V → W has a continuous inverse g: W → V which is differentiable on W. For y = f(x) we have
g′(y) = (f′(x))⁻¹, Dg(y) = (Df(x))⁻¹. (7.44)

For the proof see [Rud76, 9.24 Theorem] or [Spi65, 2-11].

Corollary 7.18 Let U ⊂ ℝⁿ be open, f: U → ℝⁿ continuously differentiable, and det f′(x) ≠ 0 for all x ∈ U. Then f(U) is open in ℝⁿ.
Remarks 7.8 (a) One main part is to show that there is an open set V ⊂ U which is mapped onto an open set W. In general, this is not true for continuous mappings. For example, sin x maps the open interval (0, 2π) onto the closed set [−1, 1]. Note that sin x does not satisfy the assumptions of the corollary since sin′(π/2) = cos(π/2) = 0.

(b) Note that continuity of f′(x) in a neighborhood of a, continuity of the determinant mapping det: ℝ^{n×n} → ℝ, and det f′(a) ≠ 0 imply that det f′(x) ≠ 0 in a neighborhood V₁ of a, see homework 10.4. This implies that the linear mapping Df(x) is invertible for x ∈ V₁. Thus Df(x)⁻¹ and (f′(x))⁻¹ exist for x ∈ V₁ — the linear mapping Df(x) is regular.
(c) Let us reformulate the statement of the theorem. Suppose

y1 = f1(x1,...,xn),

y2 = f2(x1,...,xn), . .

yn = fn(x1,...,xn) is a system of n equations in n variables x1,...,xn; y1,...,yn are given in a neighborhood W of f(a). Under the assumptions of the theorem, there exists a unique solution x = g(y) of this system of equations

x1 = g1(y1,...,yn),

x2 = g2(y1,...,yn), . .

xn = gn(y1,...,yn) in a certain neighborhood (x ,...,x ) V of a. Note that the theorem states the existence of 1 n ∈

such a solution; it does not provide an explicit formula.
(d) Note that the inverse function g may exist even if det f′(x) = 0. For example, f: ℝ → ℝ defined by f(x) = x³ has f′(0) = 0; however, g(y) = ∛y is inverse to f. One thing is certain: if det f′(a) = 0, then g cannot be differentiable at f(a). If g were differentiable at f(a), the chain rule applied to g(f(x)) = x would give
g′(f(a)) · f′(a) = id
and consequently det g′(f(a)) · det f′(a) = det id = 1, contradicting det f′(a) = 0.
(e) Note that the theorem states that, under the given assumptions, f is locally invertible. There is no information about the existence of an inverse function g to f on a fixed open set; see Example 7.15 (a) below.

Example 7.15 (a) Let x = r cos φ and y = r sin φ be the polar coordinates in ℝ². More precisely, let

f(r, φ) = (x, y) = (r cos φ, r sin φ), f: ℝ² → ℝ².
The Jacobian is
∂(x,y)/∂(r,φ) = det(x_r x_φ; y_r y_φ) = det(cos φ  −r sin φ; sin φ  r cos φ) = r.

Let f(r₀,φ₀) = (x₀,y₀) ≠ (0,0); then r₀ ≠ 0 and the Jacobian of f at (r₀,φ₀) is non-zero. Since all partial derivatives of f with respect to r and φ exist and are continuous on ℝ², the assumptions of the theorem are satisfied. Hence, in a neighborhood of (x₀,y₀) there exists a continuously differentiable inverse function r = r(x,y), φ = φ(x,y). In this case, the inverse can be given explicitly: r = √(x² + y²), φ = arg(x,y). We want to compute the Jacobi matrix of the inverse function. Since the inverse matrix is
(cos φ  −r sin φ; sin φ  r cos φ)⁻¹ = (cos φ  sin φ; −(1/r) sin φ  (1/r) cos φ),
we obtain by the theorem

g′(x,y) = ∂(r,φ)/∂(x,y) = (cos φ  sin φ; −(1/r) sin φ  (1/r) cos φ) = (x/√(x²+y²)  y/√(x²+y²); −y/(x²+y²)  x/(x²+y²));
in particular, the second row gives the partial derivatives of the argument function with respect to x and y:
∂arg(x,y)/∂x = −y/(x² + y²), ∂arg(x,y)/∂y = x/(x² + y²).
Note that we have not determined the explicit form of the argument function, which is not unique since f(r, φ + 2kπ) = f(r, φ) for all k ∈ ℤ. However, the gradient always takes the above form. Note also that det f′(r,φ) ≠ 0 for all r ≠ 0 is not sufficient for f to be injective on ℝ² \ {(0,0)}.

(b) Let f: ℝ² → ℝ² be given by (u,v) = f(x,y), where
u(x,y) = sin x − cos y, v(x,y) = −cos x + sin y.
Since

∂(u,v)/∂(x,y) = det(u_x u_y; v_x v_y) = det(cos x  sin y; sin x  cos y) = cos x cos y − sin x sin y = cos(x + y),

f is locally invertible at (x₀,y₀) = (π/4, −π/4), since the Jacobian at (x₀,y₀) is cos 0 = 1 ≠ 0. Since f(π/4, −π/4) = (0, −√2), the inverse function g(u,v) = (x,y) is defined in a neighborhood of (0, −√2), and the Jacobi matrix of g at (0, −√2) is
(1/√2  −1/√2; 1/√2  1/√2)⁻¹ = (1/√2  1/√2; −1/√2  1/√2).
Note that at the point (π/4, π/4) the Jacobian of f vanishes. There is indeed no neighborhood of (π/4, π/4)

where f is injective, since for all t ∈ ℝ
f(π/4 + t, π/4 − t) = f(π/4, π/4) = (0, 0).

Motivation: Hyper Surfaces

Suppose that F: U → ℝ is a continuously differentiable function and grad F(x) ≠ 0 for all x ∈ U. Then
S = {(x₁,…,xₙ) | F(x₁,…,xₙ) = 0}
is called a hypersurface in ℝⁿ. A hypersurface in ℝⁿ has dimension n − 1. Examples are hyperplanes a₁x₁ + ⋯ + aₙxₙ + c = 0 ((a₁,…,aₙ) ≠ 0) and spheres x₁² + ⋯ + xₙ² = r². The graph of a differentiable function f: U → ℝ, U ⊂ ℝⁿ, is also a hypersurface in ℝⁿ⁺¹:
Γ_f = {(x, f(x)) ∈ ℝⁿ⁺¹ | x ∈ U}.
Question: Is any hypersurface locally the graph of a differentiable function? More precisely, we may ask the following question: suppose that f: ℝⁿ × ℝ → ℝ is differentiable and f(a₁,…,aₙ,b) = 0. Can we find, for each (x₁,…,xₙ) near (a₁,…,aₙ), a unique y near b such that f(x₁,…,xₙ,y) = 0? The answer to this question is provided by the Implicit Function Theorem (IFT).

Consider the function f: ℝ² → ℝ defined by f(x,y) = x² + y² − 1. If we choose (a,b) with a, b > 0, there are open intervals A and B containing a and b with the following property: if x ∈ A, there is a unique y ∈ B with f(x,y) = 0. We can therefore define a function g: A → B by the conditions g(x) ∈ B and f(x, g(x)) = 0. If b > 0 then g(x) = √(1−x²); if b < 0 then g(x) = −√(1−x²). Both functions g are differentiable. These functions are said to be defined implicitly by the equation f(x,y) = 0. On the other hand, there exists no neighborhood of (1,0) in which f(x,y) = 0 can locally be solved for y; note that f_y(1,0) = 0. However, it can there be solved for x = h(y) = √(1−y²).

Theorem 7.19 Suppose that f: ℝⁿ × ℝᵐ → ℝᵐ, f = f(x,y), is continuously differentiable in an open set containing (a,b) ∈ ℝⁿ⁺ᵐ and f(a,b) = 0. Let D_y f(x,y) be the linear mapping from ℝᵐ into ℝᵐ given by
D_y f(x,y) = (D_{n+j} f_i(x,y)) = ∂(f₁,…,f_m)/∂(y₁,…,y_m) (x,y) = (∂f_i/∂y_j), i, j = 1,…,m. (7.45)

If det D_y f(a,b) ≠ 0, there is an open set A ⊂ ℝⁿ containing a and an open set B ⊂ ℝᵐ containing b with the following properties: there exists a unique continuously differentiable function g: A → B such that
(a) g(a) = b,
(b) f(x, g(x)) = 0 for all x ∈ A.

For the derivative Dg(x) ∈ L(ℝⁿ, ℝᵐ) we have
Dg(x) = −(D_y f(x, g(x)))⁻¹ ∘ D_x f(x, g(x)),
g′(x) = −(f′_y(x, g(x)))⁻¹ · f′_x(x, g(x)).
The Jacobi matrix g′(x) is given by
∂(g₁,…,g_m)/∂(x₁,…,xₙ) (x) = −f′_y(x, g(x))⁻¹ · ∂(f₁,…,f_m)/∂(x₁,…,xₙ) (x, g(x)), (7.46)
∂g_k(x)/∂x_j = −Σ_{l=1}^{m} (f′_y(x, g(x))⁻¹)_{kl} · ∂f_l(x, g(x))/∂x_j, k = 1,…,m, j = 1,…,n.

Idea of Proof. Define F: ℝⁿ × ℝᵐ → ℝⁿ × ℝᵐ by F(x,y) = (x, f(x,y)). Let M = f′_y(a,b). Then
F′(a,b) = (I_n  0_{n,m}; f′_x(a,b)  M) ⇒ det F′(a,b) = det M ≠ 0.
By the inverse mapping theorem (Theorem 7.17) there exists an open set W ⊂ ℝⁿ × ℝᵐ containing F(a,b) = (a, 0) and an open set V ⊂ ℝⁿ × ℝᵐ containing (a,b), which may be taken of the form A × B, such that F: A × B → W has a differentiable inverse h: W → A × B.
Since g is differentiable, it is easy to find the Jacobi matrix. In fact, since f_i(x, g(x)) = 0 for i = 1,…,m, taking the partial derivative ∂/∂x_j on both sides gives, by the chain rule,
0 = ∂f_i(x, g(x))/∂x_j + Σ_{k=1}^{m} ∂f_i(x, g(x))/∂y_k · ∂g_k(x)/∂x_j,
that is,
0 = f′_x(x, g(x)) + f′_y(x, g(x)) · g′(x).
Since det f′_y(a,b) ≠ 0, we have det f′_y(x,y) ≠ 0 in a small neighborhood of (a,b). Hence f′_y(x, g(x)) is invertible, and we can multiply the preceding equation from the left by (f′_y(x, g(x)))⁻¹, which gives (7.46).

Remarks 7.9 (a) The theorem gives a sufficient condition for “locally” solving the system of equations

0 = f₁(x₁,…,xₙ, y₁,…,y_m),
. .

0 = f_m(x₁,…,xₙ, y₁,…,y_m)

with given x₁,…,xₙ for y₁,…,y_m.
(b) We rewrite the statement in the case n = m = 1: suppose f(x,y) is continuously differentiable on an open set G ⊂ ℝ² which contains (a,b), and f(a,b) = 0. If f_y(a,b) ≠ 0, then there exist δ, ε > 0 such that the following holds: for every x ∈ U_δ(a) there exists a unique y = g(x) ∈ U_ε(b) with f(x,y) = 0. We have g(a) = b; the function y = g(x) is continuously differentiable with
g′(x) = −f_x(x, g(x))/f_y(x, g(x)).

Be careful: note that f_x(x, g(x)) ≠ d/dx (f(x, g(x))).
Example 7.16 (a) Let f(x,y) = sin(x+y) + e^{xy} − 1. Note that f(0,0) = 0. Since
f_y(0,0) = (cos(x+y) + x e^{xy})|_{(0,0)} = cos 0 + 0 = 1 ≠ 0,
f(x,y) = 0 can uniquely be solved for y = g(x) in a neighborhood of x = 0, y = 0. Further,

f_x(0,0) = (cos(x+y) + y e^{xy})|_{(0,0)} = 1.

g′(x) = −f_x(x,y)/f_y(x,y)|_{y=g(x)} = −(cos(x + g(x)) + g(x) e^{x g(x)})/(cos(x + g(x)) + x e^{x g(x)}).

In particular, g′(0) = −1.
Remark. Differentiating the equation f_x + f_y g′ = 0 once more, we obtain
0 = f_xx + f_xy g′ + (f_yx + f_yy g′) g′ + f_y g″,
g″ = −(1/f_y)(f_xx + 2 f_xy g′ + f_yy (g′)²),
and, inserting g′ = −f_x/f_y,
g″ = (−f_xx f_y² + 2 f_xy f_x f_y − f_yy f_x²)/f_y³.
Since

f_xx(0,0) = (−sin(x+y) + y² e^{xy})|_{(0,0)} = 0,
f_yy(0,0) = (−sin(x+y) + x² e^{xy})|_{(0,0)} = 0,
f_xy(0,0) = (−sin(x+y) + e^{xy}(1 + xy))|_{(0,0)} = 1,
we obtain g″(0) = 2. Therefore the Taylor expansion of g(x) around 0 reads
g(x) = −x + x² + r₃(x).
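The implicitly defined g and its Taylor coefficients can be checked numerically. The bisection solver below is my own sketch; it relies on f being monotone in y near (0,0), which holds since f_y > 0 there:

```python
import math

# Solve f(x, y) = sin(x+y) + exp(x*y) - 1 = 0 for y = g(x) near 0 by bisection,
# then check the Taylor coefficients g'(0) = -1 and g''(0) = 2 by differences.
def f(x, y):
    return math.sin(x + y) + math.exp(x * y) - 1

def g(x, lo=-1.0, hi=1.0):
    # f(x, lo) < 0 < f(x, hi) for small x, and f is increasing in y here
    for _ in range(200):
        mid = (lo + hi) / 2
        if f(x, lo) * f(x, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

h = 1e-3
g1 = (g(h) - g(-h)) / (2 * h)          # central difference for g'(0)
g2 = (g(h) - 2 * g(0) + g(-h)) / h**2  # central difference for g''(0)
assert abs(g1 + 1) < 1e-3
assert abs(g2 - 2) < 1e-2
```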

(b) Let γ(t) = (x(t), y(t)) be a differentiable curve γ ∈ C²([0,1]) in ℝ². Suppose that in a neighborhood of t = 0 the curve describes a function y = g(x). Find the Taylor polynomial of degree 2 of g at x₀ = x(0).

Inserting the curve into the equation y = g(x) we have y(t)= g(x(t)). Differentiation gives

ẏ = g′ẋ, ÿ = g″ẋ² + g′ẍ.

Thus

g′(x) = ẏ/ẋ, g″(x) = (ÿ − g′ẍ)/ẋ² = (ÿẋ − ẍẏ)/ẋ³.

Now we have the Taylor polynomial of g at x₀:
T₂(g)(x) = g(x₀) + g′(x₀)(x − x₀) + (g″(x₀)/2)(x − x₀)².
(c) The tangent hyperplane to a hypersurface.

Anyone who understands geometry can understand everything in this world (Galileo Galilei, 1564 – 1642)

Suppose that F: U → ℝ is continuously differentiable, a ∈ U, F(a) = 0, and grad F(a) ≠ 0. Then
∇F(a)·(x − a) = Σ_{i=1}^{n} F_{x_i}(a)(x_i − a_i) = 0
is the equation of the tangent hyperplane to the surface F(x) = 0 at the point a.
Proof. Indeed, since the gradient at a is nonzero, we may assume without loss of generality that F_{xₙ}(a) ≠ 0. By the IFT, F(x₁,…,x_{n−1}, xₙ) = 0 is locally solvable for xₙ = g(x₁,…,x_{n−1}) in a neighborhood of a = (a₁,…,aₙ) with g(ã) = aₙ, where ã = (a₁,…,a_{n−1}) and x̃ = (x₁,…,x_{n−1}). Define the tangent hyperplane to be the graph of the linearization of g at ã. By Example 7.7 (a) the tangent hyperplane to the graph of g at ã is given by

xₙ = g(ã) + grad g(ã)·(x̃ − ã). (7.47)
Since F(x̃, g(x̃)) = 0, by the implicit function theorem
∂g(ã)/∂x_j = −F_{x_j}(a)/F_{xₙ}(a), j = 1,…,n−1.
Inserting this into (7.47) we have
xₙ − aₙ = −(1/F_{xₙ}(a)) Σ_{j=1}^{n−1} F_{x_j}(a)(x_j − a_j).
Multiplication by −F_{xₙ}(a) gives
−F_{xₙ}(a)(xₙ − aₙ) = Σ_{j=1}^{n−1} F_{x_j}(a)(x_j − a_j) ⇒ grad F(a)·(x − a) = 0.
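A small numerical illustration (the example surface and names are my own choices): for the unit sphere F(x) = x₁² + x₂² + x₃² − 1, the tangent-plane equation says that grad F(a) is orthogonal to the velocity of any curve on the surface through a:

```python
import math

a = (1 / math.sqrt(3),) * 3            # point on the unit sphere

def curve(t):
    # a great circle through a, with direction u orthogonal to a
    u = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
    return tuple(math.cos(t) * a[i] + math.sin(t) * u[i] for i in range(3))

h = 1e-6
vel = tuple((curve(h)[i] - curve(-h)[i]) / (2 * h) for i in range(3))
grad = tuple(2 * ai for ai in a)       # grad F(a) = 2a
dot = sum(vel[i] * grad[i] for i in range(3))
assert abs(dot) < 1e-9                 # velocity lies in the tangent plane
```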

Let f: U → ℝ be differentiable. For c ∈ ℝ define the level set U_c = {x ∈ U | f(x) = c}. The set U_c may be empty, may consist of a single point (in the case of local extrema), or, in the "generic" case (that is, if grad f(a) ≠ 0 and U_c is non-empty), it is an (n−1)-dimensional hypersurface. {U_c | c ∈ ℝ} is a family of non-intersecting subsets of U which cover U.
[Figure: level sets U_{−1}, U₀, U₁, U_c of f.]

7.7 Lagrange Multiplier Rule

This is a method to find local extrema of a function under certain constraints. Consider the following problem: find the local extrema of a function f(x,y) of two variables where x and y are not independent of each other but satisfy the constraint
φ(x,y) = 0.
Suppose further that f and φ are continuously differentiable. Note that the level sets U_c = {(x,y) ∈ ℝ² | f(x,y) = c} form a family of non-intersecting curves in the plane. We have to find the curve f(x,y) = c intersecting the constraint curve φ(x,y) = 0 where c is as large or as small as possible. Usually f = c intersects φ = 0 as c changes monotonically. However, if c is maximal, the curve f = c touches the curve φ = 0; in other words, the tangent lines coincide. This means that the normal vectors defining the tangent lines are scalar multiples of each other.
[Figure: level curves f = c touching the constraint curve φ = 0.]

Theorem 7.20 (Lagrange Multiplier Rule) Let f, φ: U → ℝ, U ⊂ ℝⁿ open, be continuously differentiable, and suppose that f has a local extremum at a ∈ U under the constraint φ(x) = 0. Suppose that grad φ(a) ≠ 0.
Then there exists a real number λ such that
grad f(a) = λ grad φ(a).

This number λ is called a Lagrange multiplier.
Proof. The idea is to solve the constraint φ(x) = 0 for one variable and to consider the resulting "free" extremum problem with one variable less. Suppose without loss of generality that φ_{xₙ}(a) ≠ 0. By the implicit function theorem we can solve φ(x) = 0 for xₙ = g(x₁,…,x_{n−1}) in a neighborhood of x = a. Differentiating φ(x̃, g(x̃)) = 0 and inserting a = (ã, aₙ) as before, we have
φ_{x_j}(a) + φ_{xₙ}(a) g_{x_j}(ã) = 0, j = 1,…,n−1. (7.48)

Since h(x̃) = f(x̃, g(x̃)) has a local extremum at ã, all partial derivatives of h vanish at ã:
f_{x_j}(a) + f_{xₙ}(a) g_{x_j}(ã) = 0, j = 1,…,n−1. (7.49)
Setting λ = f_{xₙ}(a)/φ_{xₙ}(a) and comparing (7.48) and (7.49), we find
f_{x_j}(a) = λ φ_{x_j}(a), j = 1,…,n−1.
Since by definition f_{xₙ}(a) = λ φ_{xₙ}(a), we finally obtain grad f(a) = λ grad φ(a), which completes the proof.

Example 7.17 (a) Let A = (a_ij) be a real symmetric n×n matrix, and define f(x) = x·Ax = Σ_{i,j} a_ij x_i x_j. We ask for the local extrema of f on the unit sphere S^{n−1} = {x ∈ ℝⁿ | ‖x‖ = 1}. This constraint can be written as φ(x) = ‖x‖² − 1 = Σ_{i=1}^{n} x_i² − 1 = 0. Suppose that f attains a local minimum at a ∈ S^{n−1}. By Example 7.6 (b),
grad f(a) = 2A(a).
On the other hand,
grad φ(a) = (2x₁,…,2xₙ)|_{x=a} = 2a.
By Theorem 7.20 there exists a real number λ₁ such that
grad f(a) = 2A(a) = λ₁ grad φ(a) = 2λ₁ a.
Hence A(a) = λ₁ a; that is, λ₁ is an eigenvalue of A and a a corresponding eigenvector. In particular, A has a real eigenvalue. Since S^{n−1} has no boundary, the global minimum is also a local one. We find: if f(a) = a·A(a) = a·λ₁a = λ₁ is the global minimum, then λ₁ is the smallest eigenvalue.
(b) Let a be the point of a hypersurface M = {x | φ(x) = 0} with minimal distance to a given point b ∉ M. Then the line through a and b is orthogonal to M.
Indeed, the function f(x) = ‖x − b‖² attains its minimum under the constraint φ(x) = 0 at a. By the theorem, there is a real number λ such that grad f(a) = 2(a − b) = λ grad φ(a). The assertion follows since, by Example 7.16 (c), grad φ(a) is orthogonal to M at a, and b − a is a multiple of the normal vector ∇φ(a).

Theorem 7.21 (Lagrange Multiplier Rule — extended version) Let f, φ_i: U → ℝ, i = 1,…,m, m < n, be continuously differentiable functions. Let M = {x ∈ U | φ₁(x) = ⋯ = φ_m(x) = 0} and suppose that f has a local extremum at a under the constraint x ∈ M. Suppose further that the Jacobi matrix φ′(a) ∈ ℝ^{m×n} has maximal rank m.
Then there exist real numbers λ₁,…,λ_m such that
grad f(a) − grad(λ₁φ₁ + ⋯ + λ_mφ_m)(a) = 0.
Note that the rank condition ensures that there is a choice of m variables out of x₁,…,xₙ such that the Jacobian of φ₁,…,φ_m with respect to this set of variables is nonzero at a.

7.8 Integrals depending on Parameters

Problem: Define I(y) = ∫_a^b f(x,y) dx; what are the relations between properties of f(x,y) and properties of I(y), for example with respect to continuity and differentiability?

7.8.1 Continuity of I(y)

Proposition 7.22 Let f(x,y) be continuous on the rectangle R = [a,b] × [c,d]. Then I(y) = ∫_a^b f(x,y) dx is continuous on [c,d].
Proof. Let ε > 0. Since f is continuous on the compact set R, f is uniformly continuous on R (see Proposition 6.25). Hence there is a δ > 0 such that |x − x′| < δ, |y − y′| < δ and (x,y), (x′,y′) ∈ R imply
|f(x,y) − f(x′,y′)| < ε.
Therefore, |y − y₀| < δ and y, y₀ ∈ [c,d] imply
|I(y) − I(y₀)| = |∫_a^b (f(x,y) − f(x,y₀)) dx| ≤ ε(b − a).

This shows continuity of I(y) at y0.

For example, I(y) = ∫₀¹ arctan(x/y) dx is continuous for y > 0.
Remark 7.10 (a) Note that continuity at y₀ means that we can interchange the limit and the integral:
lim_{y→y₀} ∫_a^b f(x,y) dx = ∫_a^b lim_{y→y₀} f(x,y) dx = ∫_a^b f(x,y₀) dx.
(b) A similar statement holds for y → ∞: suppose that f(x,y) is continuous on [a,b] × [c,+∞) and lim_{y→+∞} f(x,y) = φ(x) exists uniformly in x ∈ [a,b], that is,
∀ε > 0 ∃R > 0 ∀x ∈ [a,b], y ≥ R: |f(x,y) − φ(x)| < ε.
Then ∫_a^b φ(x) dx exists and lim_{y→∞} I(y) = ∫_a^b φ(x) dx.

Proposition 7.23 Let f(x,y) be defined on R = [a,b] × [c,d] and continuous as a function of x for every fixed y. Suppose that f_y(x,y) exists for all (x,y) ∈ R and is continuous as a function of the two variables x and y. Then I(y) is differentiable and
I′(y) = d/dy ∫_a^b f(x,y) dx = ∫_a^b f_y(x,y) dx.

Proof. Let ε > 0. Since f_y(x,y) is continuous, it is uniformly continuous on R. Hence there exists δ > 0 such that |x′ − x″| < δ and |y′ − y″| < δ imply |f_y(x′,y′) − f_y(x″,y″)| < ε. For |h| < δ we have
|(I(y₀+h) − I(y₀))/h − ∫_a^b f_y(x,y₀) dx| ≤ ∫_a^b |(f(x,y₀+h) − f(x,y₀))/h − f_y(x,y₀)| dx
= ∫_a^b |f_y(x, y₀+θh) − f_y(x,y₀)| dx < ε(b − a)
for some θ ∈ (0,1), by the mean value theorem. Since this inequality holds for all small h, it holds for the limit as h → 0, too. Thus
|I′(y₀) − ∫_a^b f_y(x,y₀) dx| ≤ ε(b − a).

Since ε was arbitrary, the claim follows.

In case of variable integration limits we have the following theorem.

Proposition 7.24 Let f(x,y) be as in Proposition 7.23. Let α(y) and β(y) be differentiable on [c,d], and suppose that α([c,d]) and β([c,d]) are contained in [a,b]. Let I(y) = ∫_{α(y)}^{β(y)} f(x,y) dx. Then I(y) is differentiable and
I′(y) = ∫_{α(y)}^{β(y)} f_y(x,y) dx + β′(y) f(β(y), y) − α′(y) f(α(y), y). (7.50)
Proof. Let F(y,u,v) = ∫_u^v f(x,y) dx; then I(y) = F(y, α(y), β(y)). The fundamental theorem of calculus yields
∂F/∂v (y,u,v) = ∂/∂v ∫_u^v f(x,y) dx = f(v,y),
∂F/∂u (y,u,v) = ∂/∂u (−∫_v^u f(x,y) dx) = −f(u,y). (7.51)
By the chain rule, the previous proposition, and (7.51) we have
I′(y) = ∂F/∂y + ∂F/∂u α′(y) + ∂F/∂v β′(y)
= ∂F/∂y (y, α(y), β(y)) + ∂F/∂u (y, α(y), β(y)) α′(y) + ∂F/∂v (y, α(y), β(y)) β′(y)
= ∫_{α(y)}^{β(y)} f_y(x,y) dx + α′(y)(−f(α(y), y)) + β′(y) f(β(y), y).

Example 7.18 (a) $I(y)=\int_3^4\frac{\sin(xy)}{x}\,dx$ is differentiable by Proposition 7.23 since $f_y(x,y)=\frac{\partial}{\partial y}\frac{\sin(xy)}{x}=\cos(xy)$ is continuous. Hence
$$I'(y)=\int_3^4\cos(xy)\,dx=\frac{\sin(xy)}{y}\Big|_3^4=\frac{\sin 4y-\sin 3y}{y}.$$
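As a sanity check, the closed form for $I'(y)$ in Example 7.18 (a) can be compared with a numerical derivative of the integral. The following sketch uses a midpoint rule and a central difference; the sample point $y_0$ and all step sizes are illustrative choices, not from the text.

```python
import math

# I(y) = ∫_3^4 sin(xy)/x dx should satisfy I'(y) = (sin 4y - sin 3y)/y.
def midpoint(f, a, b, n=2000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

def I(y):
    return midpoint(lambda x: math.sin(x * y) / x, 3.0, 4.0)

y0, h = 1.7, 1e-5
numeric = (I(y0 + h) - I(y0 - h)) / (2 * h)          # ≈ I'(y0)
closed_form = (math.sin(4 * y0) - math.sin(3 * y0)) / y0
```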

(b) $I(y)=\int_{\log y}^{\sin y} e^{x^2 y}\,dx$ is differentiable with
$$I'(y)=\int_{\log y}^{\sin y} x^2 e^{x^2 y}\,dx+\cos y\,e^{y\sin^2 y}-\frac{1}{y}\,e^{y(\log y)^2}.$$

7.8.3 Improper Integrals with Parameters

Suppose that the improper integral $\int_a^\infty f(x,y)\,dx$ exists for $y\in[c,d]$.

Definition 7.10 We say that the improper integral $\int_a^\infty f(x,y)\,dx$ converges uniformly with respect to $y$ on $[c,d]$ if for every $\varepsilon>0$ there is an $A_0>0$ such that $A>A_0$ implies
$$\left|I(y)-\int_a^A f(x,y)\,dx\right|\equiv\left|\int_A^\infty f(x,y)\,dx\right|<\varepsilon$$
for all $y\in[c,d]$.

Note that the Cauchy and Weierstraß criteria (see Proposition 6.1 and Theorem 6.2) for uniform convergence of series of functions also hold for improper parametric integrals. For example, the theorem of Weierstraß now reads as follows.

Proposition 7.25 Suppose that $\int_a^A f(x,y)\,dx$ exists for all $A\ge a$ and $y\in[c,d]$. Suppose further that $|f(x,y)|\le\varphi(x)$ for all $x\ge a$ and $\int_a^\infty\varphi(x)\,dx$ converges. Then $\int_a^\infty f(x,y)\,dx$ converges uniformly with respect to $y\in[c,d]$.

Example 7.19 $I(y)=\int_1^\infty e^{-xy}x^y y^2\,dx$ converges uniformly on $[2,4]$ since
$$|f(x,y)|=e^{-xy}x^y y^2\le e^{-2x}x^4\cdot 4^2=\varphi(x)$$
and $\int_1^\infty e^{-2x}x^4\cdot 4^2\,dx<\infty$.

If we add the assumption of uniform convergence, the preceding theorems remain true for improper integrals.

Proposition 7.26 Let $f(x,y)$ be continuous on $\{(x,y)\in\mathbb{R}^2\mid a\le x<\infty,\ c\le y\le d\}$. Suppose that $I(y)=\int_a^\infty f(x,y)\,dx$ converges uniformly with respect to $y\in[c,d]$. Then $I(y)$ is continuous on $[c,d]$.

Proof. This proof was not carried out in the lecture. Let $\varepsilon>0$. Since the improper integral converges uniformly, there exists $A_0>0$ such that for all $A\ge A_0$ we have
$$\left|\int_A^\infty f(x,y)\,dx\right|<\varepsilon$$
for all $y\in[c,d]$. Let $A\ge A_0$ be fixed. On $\{(x,y)\in\mathbb{R}^2\mid a\le x\le A,\ c\le y\le d\}$, $f(x,y)$ is uniformly continuous; hence there is a $\delta>0$ such that $|x'-x''|<\delta$ and $|y'-y''|<\delta$ imply
$$|f(x',y')-f(x'',y'')|<\frac{\varepsilon}{A-a}.$$
Therefore,
$$\int_a^A|f(x,y)-f(x,y_0)|\,dx<\frac{\varepsilon}{A-a}(A-a)=\varepsilon\quad\text{for } |y-y_0|<\delta.$$
Finally,
$$|I(y)-I(y_0)|\le\int_a^A|f(x,y)-f(x,y_0)|\,dx+\left|\int_A^\infty f(x,y)\,dx\right|+\left|\int_A^\infty f(x,y_0)\,dx\right|<3\varepsilon\quad\text{for } |y-y_0|<\delta.$$

We skip the proof of the following proposition.

Proposition 7.27 Let $f_y(x,y)$ be continuous on $\{(x,y)\mid a\le x<\infty,\ c\le y\le d\}$ and $f(x,y)$ continuous with respect to $x$ for every fixed $y\in[c,d]$. Suppose that for all $y\in[c,d]$ the integral $I(y)=\int_a^\infty f(x,y)\,dx$ exists and the integral $\int_a^\infty f_y(x,y)\,dx$ converges uniformly with respect to $y\in[c,d]$.
Then $I(y)$ is differentiable and $I'(y)=\int_a^\infty f_y(x,y)\,dx$.

Combining the results of the last proposition and Proposition 7.25 we obtain the following corollary.

Corollary 7.28 Let $f_y(x,y)$ be continuous on $\{(x,y)\mid a\le x<\infty,\ c\le y\le d\}$ and $f(x,y)$ continuous with respect to $x$ for every fixed $y\in[c,d]$. Suppose that

(a) for all $y\in[c,d]$ the integral $I(y)=\int_a^\infty f(x,y)\,dx$ exists,
(b) $|f_y(x,y)|\le\varphi(x)$ for all $x\ge a$ and all $y\in[c,d]$,
(c) $\int_a^\infty\varphi(x)\,dx$ exists.

Then $I(y)$ is differentiable and $I'(y)=\int_a^\infty f_y(x,y)\,dx$.

Example 7.20 (a) $I(y)=\int_0^\infty e^{-x^2}\cos(2yx)\,dx$. Here $f(x,y)=e^{-x^2}\cos(2yx)$ and $f_y(x,y)=-2x\sin(2yx)\,e^{-x^2}$; the integral $\int_0^\infty f_y(x,y)\,dx$ converges uniformly with respect to $y$ since
$$|f_y(x,y)|\le 2x\,e^{-x^2}\le K e^{-x^2/2}.$$
Hence,
$$I'(y)=-\int_0^\infty 2x\sin(2yx)\,e^{-x^2}\,dx.$$
Integration by parts with $u=\sin(2yx)$, $v'=-2x\,e^{-x^2}$ gives $u'=2y\cos(2yx)$, $v=e^{-x^2}$ and
$$-\int_0^A e^{-x^2}\,2x\sin(2yx)\,dx=\sin(2yA)\,e^{-A^2}-2y\int_0^A\cos(2yx)\,e^{-x^2}\,dx.$$
As $A\to\infty$ the first summand on the right tends to $0$; thus $I(y)$ satisfies the ordinary differential equation
$$I'(y)=-2yI(y).$$
This ODE, $y'=-2xy$, is solved by separation of variables: $dy/y=-2x\,dx$; integration yields $\log y=-x^2+c$, i.e. $y=c'e^{-x^2}$. The general solution is $I(y)=Ce^{-y^2}$. We determine the constant $C$ by inserting $y=0$. Since $I(0)=\int_0^\infty e^{-x^2}\,dx=\sqrt{\pi}/2$, we find
$$I(y)=\frac{\sqrt{\pi}}{2}\,e^{-y^2}.$$
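The closed form just derived can be checked numerically by truncating the improper integral; in the following sketch the truncation point, grid size and sample value of $y$ are illustrative choices.

```python
import math

# I(y) = ∫_0^∞ e^{-x²} cos(2yx) dx should equal (√π/2) e^{-y²}.
# Midpoint rule, truncated at x = 10 where e^{-x²} is negligible.
def I(y, upper=10.0, n=20000):
    h = upper / n
    return sum(math.exp(-((i + 0.5) * h) ** 2) * math.cos(2 * y * (i + 0.5) * h)
               for i in range(n)) * h

y0 = 0.8
closed_form = math.sqrt(math.pi) / 2 * math.exp(-y0 ** 2)
```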

Gamma Function

(b) The Gamma function $\Gamma(x)=\int_0^\infty t^{x-1}e^{-t}\,dt$ is in $C^\infty(\mathbb{R}_{>0})$. [Figure: graph of the Gamma function]

Let $x>0$, say $x\in[c,d]$ with $0<c<d$. Recall from Subsection 5.3.3 the definition and the proof of the convergence of the improper integrals $\Gamma_1=\int_0^1 f(x,t)\,dt$ and $\Gamma_2=\int_1^\infty f(x,t)\,dt$, where $f(x,t)=t^{x-1}e^{-t}$. Note that $\Gamma_1(x)$ is an improper integral at $t=0+0$. By L'Hospital's rule $\lim_{t\to 0+0} t^\alpha\log t=0$ for all $\alpha>0$. In particular, $|\log t|<t^{-c/2}$ if $0<t<t_0$ for some $t_0$. Since $e^{-t}<1$ and moreover $t^{x-1}<t^{c-1}$ for $t<t_0$ by Lemma 1.23 (b), we conclude that
$$\left|\frac{\partial}{\partial x}f(x,t)\right|=t^{x-1}|\log t|\,e^{-t}\le|\log t|\,t^{c-1}\le t^{-c/2}\,t^{c-1}=t^{\frac{c}{2}-1}$$
for $0<t<t_0$; since $\int_0^1 t^{\frac{c}{2}-1}\,dt$ converges, $\Gamma_1(x)$ is differentiable with $\Gamma_1'(x)=\int_0^1 t^{x-1}\log t\,e^{-t}\,dt$. For $t\ge 1$ we have $\log t<t$ and $t^x\le t^d$, such that
$$\left|\frac{\partial}{\partial x}f(x,t)\right|=t^{x-1}\log t\,e^{-t}\le t^x e^{-t}\le t^d e^{-t}=t^d e^{-t/2}\,e^{-t/2}\le M e^{-t/2}.$$
Since $t^d e^{-t/2}$ tends to $0$ as $t\to\infty$, it is bounded by some constant $M$, and $e^{-t/2}$ is integrable on $[1,+\infty)$, such that $\Gamma_2(x)$ is differentiable with
$$\Gamma_2'(x)=\int_1^\infty t^{x-1}\log t\,e^{-t}\,dt.$$
Consequently, $\Gamma(x)$ is differentiable for all $x>0$ with
$$\Gamma'(x)=\int_0^\infty t^{x-1}\log t\,e^{-t}\,dt.$$
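The integral formula for $\Gamma'(x)$ can be compared with a finite difference of Python's built-in `math.gamma`; truncation point, grid and sample point are illustrative choices.

```python
import math

# Γ'(x) = ∫_0^∞ t^{x-1} log t e^{-t} dt, midpoint rule truncated at t = 60.
def gamma_prime(x, upper=60.0, n=60000):
    h = upper / n
    s = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        s += t ** (x - 1) * math.log(t) * math.exp(-t)
    return s * h

x0, h = 2.5, 1e-5
fd = (math.gamma(x0 + h) - math.gamma(x0 - h)) / (2 * h)  # ≈ Γ'(x0)
```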

Similarly one can show that $\Gamma\in C^\infty(\mathbb{R}_{>0})$ with
$$\Gamma^{(k)}(x)=\int_0^\infty t^{x-1}(\log t)^k e^{-t}\,dt.$$

7.9 Appendix

Proof of Proposition 7.7. Let $A=(A_{ij})=\left(\frac{\partial f_i}{\partial x_j}\right)$ be the matrix of partial derivatives, considered as a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$. Our aim is to show that
$$\lim_{h\to 0}\frac{\|f(a+h)-f(a)-A\,h\|}{\|h\|}=0.$$
For this, it suffices to prove the convergence to $0$ for each coordinate $i=1,\dots,m$ by Proposition 6.26:
$$\lim_{h\to 0}\frac{\left|f_i(a+h)-f_i(a)-\sum_{j=1}^n A_{ij}h_j\right|}{\|h\|}=0. \qquad (7.52)$$
Without loss of generality we assume $m=1$ and $f=f_1$. For simplicity, let $n=2$, $f=f(x,y)$, $a=(a,b)$, and $h=(h,k)$. Note first that by the mean value theorem we have
$$f(a+h,b+k)-f(a,b)=f(a+h,b+k)-f(a,b+k)+f(a,b+k)-f(a,b)=\frac{\partial f}{\partial x}(\xi,b+k)\,h+\frac{\partial f}{\partial y}(a,\eta)\,k,$$
where $\xi\in(a,a+h)$ and $\eta\in(b,b+k)$. Using this, the expression in (7.52) reads
$$\frac{\left|f(a+h,b+k)-f(a,b)-\frac{\partial f}{\partial x}(a,b)\,h-\frac{\partial f}{\partial y}(a,b)\,k\right|}{\sqrt{h^2+k^2}}=\frac{\left|\left(\frac{\partial f}{\partial x}(\xi,b+k)-\frac{\partial f}{\partial x}(a,b)\right)h+\left(\frac{\partial f}{\partial y}(a,\eta)-\frac{\partial f}{\partial y}(a,b)\right)k\right|}{\sqrt{h^2+k^2}}.$$

Since both $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are continuous at $(a,b)$, given $\varepsilon>0$ we find $\delta>0$ such that $(x,y)\in U_\delta((a,b))$ implies
$$\left|\frac{\partial f}{\partial x}(x,y)-\frac{\partial f}{\partial x}(a,b)\right|<\varepsilon,\qquad\left|\frac{\partial f}{\partial y}(x,y)-\frac{\partial f}{\partial y}(a,b)\right|<\varepsilon.$$

This shows
$$\frac{\left|\left(\frac{\partial f}{\partial x}(\xi,b+k)-\frac{\partial f}{\partial x}(a,b)\right)h+\left(\frac{\partial f}{\partial y}(a,\eta)-\frac{\partial f}{\partial y}(a,b)\right)k\right|}{\sqrt{h^2+k^2}}\le\frac{\varepsilon|h|+\varepsilon|k|}{\sqrt{h^2+k^2}}\le 2\varepsilon,$$
hence $f$ is differentiable at $(a,b)$ with Jacobi matrix $A=\left(\frac{\partial f}{\partial x}(a,b)\ \ \frac{\partial f}{\partial y}(a,b)\right)$. Since both components of $A$, the partial derivatives, are continuous functions of $(x,y)$, the assignment $x\mapsto f'(x)$ is continuous by Proposition 6.26.

Chapter 8

Curves and Line Integrals

8.1 Rectifiable Curves

8.1.1 Curves in $\mathbb{R}^k$

We consider curves in $\mathbb{R}^k$. We define the tangent vector, regular points, and the angle of intersection.

Definition 8.1 A curve in $\mathbb{R}^k$ is a continuous mapping $\gamma\colon I\to\mathbb{R}^k$, where $I\subset\mathbb{R}$ is a closed interval consisting of more than one point.

The interval can be $I=[a,b]$, $I=[a,+\infty)$, or $I=\mathbb{R}$. In the first case $\gamma(a)$ and $\gamma(b)$ are called the initial and end point of $\gamma$. These two points define a natural orientation of the curve "from $\gamma(a)$ to $\gamma(b)$". Replacing $\gamma(t)$ by $\gamma(a+b-t)$ we obtain the curve from $\gamma(b)$ to $\gamma(a)$ with opposite orientation.

If $\gamma(a)=\gamma(b)$, $\gamma$ is said to be a closed curve. The curve $\gamma$ is given by a $k$-tuple $\gamma=(\gamma_1,\dots,\gamma_k)$ of continuous real-valued functions. If $\gamma$ is differentiable, the curve is said to be differentiable. Note that we have defined the curve to be a mapping, not a set of points in $\mathbb{R}^k$. Of course, with each curve $\gamma$ in $\mathbb{R}^k$ there is associated a subset of $\mathbb{R}^k$, namely the image of $\gamma$,
$$C=\gamma(I)=\{\gamma(t)\in\mathbb{R}^k\mid t\in I\},$$
but different curves $\gamma$ may have the same image $C=\gamma(I)$. The curve is said to be simple if $\gamma$ is injective on the interior $I^\circ$ of $I$. A simple curve has no self-intersection.

Example 8.1 (a) A circle in $\mathbb{R}^2$ of radius $r>0$ with center $(0,0)$ is described by the curve
$$\gamma\colon[0,2\pi]\to\mathbb{R}^2,\quad \gamma(t)=(r\cos t,\ r\sin t).$$
Note that $\tilde\gamma\colon[0,4\pi]\to\mathbb{R}^2$ with $\tilde\gamma(t)=\gamma(t)$ has the same image but is different from $\gamma$; $\gamma$ is a simple curve, $\tilde\gamma$ is not.

(b) Let $p,q\in\mathbb{R}^k$ be fixed points, $p\ne q$. Then
$$\gamma_1(t)=(1-t)p+tq,\quad t\in[0,1],$$
$$\gamma_2(t)=(1-t)p+tq,\quad t\in\mathbb{R},$$
are the segment $\overline{pq}$ from $p$ to $q$ and the line through $p$ and $q$, respectively. If $v\in\mathbb{R}^k$ is a vector, then $\gamma_3(t)=p+tv$, $t\in\mathbb{R}$, is the line through $p$ with direction $v$.

(c) If $f\colon[a,b]\to\mathbb{R}$ is a continuous function, the graph of $f$ is a curve in $\mathbb{R}^2$:
$$\gamma\colon[a,b]\to\mathbb{R}^2,\quad \gamma(t)=(t,f(t)).$$

(d) Implicit curves. Let $F\colon U\subset\mathbb{R}^2\to\mathbb{R}$ be continuously differentiable, $F(a,b)=0$, and $\nabla F(a,b)\ne 0$ for some point $(a,b)\in U$. By the implicit function theorem, $F(x,y)=0$ can locally be solved for $y=g(x)$ or $x=f(y)$. In both cases $\gamma(t)=(t,g(t))$ and $\gamma(t)=(f(t),t)$ is a curve through $(a,b)$. For example,
$$F(x,y)=y^2-x^3-x^2=0$$
is locally solvable except for $(a,b)=(0,0)$. The corresponding curve is Newton's knot.

Definition 8.2 (a) A simple curve $\gamma\colon I\to\mathbb{R}^k$ is said to be regular at $t_0$ if $\gamma$ is continuously differentiable on $I$ and $\gamma'(t_0)\ne 0$. $\gamma$ is regular if it is regular at every point $t_0\in I$.

(b) The vector $\gamma'(t_0)$ is called the tangent vector; $\alpha(t)=\gamma(t_0)+t\gamma'(t_0)$, $t\in\mathbb{R}$, is called the tangent line to the curve $\gamma$ at the point $\gamma(t_0)$.

Remark 8.1 The moving particle. Let $t$ be the time variable and $s(t)$ the coordinates of a point moving in $\mathbb{R}^k$. Then the tangent vector $v(t)=s'(t)$ is the velocity vector of the moving point. The instantaneous velocity is the euclidean norm of $v(t)$, $\|v(t)\|=\sqrt{s_1'(t)^2+\cdots+s_k'(t)^2}$. The acceleration vector is the second derivative of $s(t)$, $a(t)=v'(t)=s''(t)$.

Let $\gamma_i\colon I_i\to\mathbb{R}^k$, $i=1,2$, be two regular curves with a common point $\gamma_1(t_1)=\gamma_2(t_2)$. The angle of intersection $\varphi$ between the two curves $\gamma_i$ at $t_i$ is defined to be the angle between the two tangent vectors $\gamma_1'(t_1)$ and $\gamma_2'(t_2)$. Hence,
$$\cos\varphi=\frac{\gamma_1'(t_1)\cdot\gamma_2'(t_2)}{\|\gamma_1'(t_1)\|\,\|\gamma_2'(t_2)\|},\quad \varphi\in[0,\pi].$$

Example 8.2 (a) Newton's knot. The curve $\gamma\colon\mathbb{R}\to\mathbb{R}^2$ given by $\gamma(t)=(t^2-1,\ t^3-t)$ is not injective since $\gamma(-1)=\gamma(1)=(0,0)=x_0$. The point $x_0$ is a double point of the curve. In general $\gamma$ has two different tangent lines at a double point. Since $\gamma'(t)=(2t,\ 3t^2-1)$ we have $\gamma'(-1)=(-2,2)$ and $\gamma'(1)=(2,2)$. The curve is regular since $\gamma'(t)\ne 0$ for all $t$. [Figure: Newton's knot]

Let us compute the angle of self-intersection. Since $\gamma(-1)=\gamma(1)=(0,0)$, the self-intersection angle $\varphi$ satisfies
$$\cos\varphi=\frac{(-2,2)\cdot(2,2)}{\sqrt{8}\,\sqrt{8}}=0,$$
hence $\varphi=90^\circ$; the intersection is orthogonal.

(b) Neil's parabola. Let $\gamma\colon\mathbb{R}\to\mathbb{R}^2$ be given by $\gamma(t)=(t^2,t^3)$. Since $\gamma'(t)=(2t,3t^2)$, the origin is the only singular point.

8.1.2 Rectifiable Curves

The goal of this subsection is to define the length of a curve. For differentiable curves, there is a formula using the tangent vector. However, the "length of a curve" makes sense for some non-differentiable, continuous curves.

Let $\gamma\colon[a,b]\to\mathbb{R}^k$ be a curve. We associate to each partition $P=\{t_0,\dots,t_n\}$ of $[a,b]$ the points $x_i=\gamma(t_i)$, $i=0,\dots,n$, and the number
$$\ell(P,\gamma)=\sum_{i=1}^n\|\gamma(t_i)-\gamma(t_{i-1})\|. \qquad (8.1)$$
The $i$th term in this sum is the euclidean distance of the points $x_{i-1}=\gamma(t_{i-1})$ and $x_i=\gamma(t_i)$. Hence $\ell(P,\gamma)$ is the length of the polygonal path with vertices $x_0,\dots,x_n$. As our partition becomes finer and finer, this polygon approaches the image of $\gamma$ more and more closely. [Figure: a polygonal path inscribed in the curve]

Definition 8.3 A curve $\gamma\colon[a,b]\to\mathbb{R}^k$ is said to be rectifiable if the set of non-negative real numbers $\{\ell(P,\gamma)\mid P \text{ is a partition of } [a,b]\}$ is bounded. In this case
$$\ell(\gamma)=\sup_P\ell(P,\gamma),$$
where the supremum is taken over all partitions $P$ of $[a,b]$, is called the length of $\gamma$.

In certain cases, $\ell(\gamma)$ is given by a Riemann integral. We shall prove this for continuously differentiable curves, i.e. for curves $\gamma$ whose derivative $\gamma'$ is continuous.

Proposition 8.1 If $\gamma'$ is continuous on $[a,b]$, then $\gamma$ is rectifiable, and
$$\ell(\gamma)=\int_a^b\|\gamma'(t)\|\,dt.$$

Proof. If $a\le t_{i-1}<t_i\le b$, then by Theorem 5.28, $\gamma(t_i)-\gamma(t_{i-1})=\int_{t_{i-1}}^{t_i}\gamma'(t)\,dt$. Applying Proposition 5.29 we have
$$\|\gamma(t_i)-\gamma(t_{i-1})\|=\left\|\int_{t_{i-1}}^{t_i}\gamma'(t)\,dt\right\|\le\int_{t_{i-1}}^{t_i}\|\gamma'(t)\|\,dt.$$

Hence
$$\ell(P,\gamma)\le\int_a^b\|\gamma'(t)\|\,dt$$
for every partition $P$ of $[a,b]$. Consequently,
$$\ell(\gamma)\le\int_a^b\|\gamma'(t)\|\,dt.$$
To prove the opposite inequality, let $\varepsilon>0$ be given. Since $\gamma'$ is uniformly continuous on $[a,b]$, there exists $\delta>0$ such that
$$\|\gamma'(s)-\gamma'(t)\|<\varepsilon\quad\text{if } |s-t|<\delta.$$
Let $P$ be a partition with $\Delta t_i\le\delta$ for all $i$. If $t_{i-1}\le t\le t_i$ it follows that
$$\|\gamma'(t)\|\le\|\gamma'(t_i)\|+\varepsilon.$$
Hence
$$\int_{t_{i-1}}^{t_i}\|\gamma'(t)\|\,dt\le\|\gamma'(t_i)\|\,\Delta t_i+\varepsilon\,\Delta t_i$$
$$=\left\|\int_{t_{i-1}}^{t_i}\bigl(\gamma'(t)+\gamma'(t_i)-\gamma'(t)\bigr)\,dt\right\|+\varepsilon\,\Delta t_i$$
$$\le\left\|\int_{t_{i-1}}^{t_i}\gamma'(t)\,dt\right\|+\left\|\int_{t_{i-1}}^{t_i}\bigl(\gamma'(t_i)-\gamma'(t)\bigr)\,dt\right\|+\varepsilon\,\Delta t_i$$
$$\le\|\gamma(t_i)-\gamma(t_{i-1})\|+2\varepsilon\,\Delta t_i.$$
If we add these inequalities, we obtain
$$\int_a^b\|\gamma'(t)\|\,dt\le\ell(P,\gamma)+2\varepsilon(b-a)\le\ell(\gamma)+2\varepsilon(b-a).$$
Since $\varepsilon$ was arbitrary,
$$\int_a^b\|\gamma'(t)\|\,dt\le\ell(\gamma).$$
This completes the proof.

Special Case $k=2$

Let $\gamma(t)=(x(t),y(t))$, $t\in[a,b]$. Then
$$\ell(\gamma)=\int_a^b\sqrt{x'(t)^2+y'(t)^2}\,dt.$$
In particular, let $\gamma(t)=(t,f(t))$ be the graph of a continuously differentiable function $f\colon[a,b]\to\mathbb{R}$. Then $\ell(\gamma)=\int_a^b\sqrt{1+f'(t)^2}\,dt$.

Example 8.3 (a) Catenary curve. Let $f(t)=a\cosh\frac{t}{a}$, $t\in[0,b]$, $b>0$. Then $f'(t)=\sinh\frac{t}{a}$ and moreover
$$\ell(\gamma)=\int_0^b\sqrt{1+\sinh^2\frac{t}{a}}\,dt=\int_0^b\cosh\frac{t}{a}\,dt=a\sinh\frac{t}{a}\Big|_0^b=a\sinh\frac{b}{a}.$$
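The catenary length formula can be checked against a direct numerical arc-length computation; the parameter values and grid size below are illustrative choices.

```python
import math

# ℓ = ∫_a^b sqrt(1 + f'(t)²) dt by the midpoint rule.
def arc_length(fprime, a, b, n=4000):
    h = (b - a) / n
    return sum(math.sqrt(1.0 + fprime(a + (i + 0.5) * h) ** 2)
               for i in range(n)) * h

a_par, b_end = 2.0, 3.0   # catenary f(t) = a cosh(t/a) on [0, b]
num = arc_length(lambda t: math.sinh(t / a_par), 0.0, b_end)
closed = a_par * math.sinh(b_end / a_par)   # a sinh(b/a)
```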

(b) Cycloid. The position of a bulge in a bicycle tire as it rolls down the street can be parametrized by an angle $\theta$ as shown in the figure. [Figure: cycloid and its defining circle]

Let the radius of the tire be $a$. It can be verified by plane trigonometry that
$$\gamma(\theta)=\begin{pmatrix} a(\theta-\sin\theta)\\ a(1-\cos\theta)\end{pmatrix}.$$
This curve is called a cycloid. Find the distance travelled by the bulge for $0\le\theta\le 2\pi$. Using $1-\cos\theta=2\sin^2\frac{\theta}{2}$ we have
$$\gamma'(\theta)=a(1-\cos\theta,\ \sin\theta),$$
$$\|\gamma'(\theta)\|=a\sqrt{(1-\cos\theta)^2+\sin^2\theta}=a\sqrt{2-2\cos\theta}=a\sqrt{2}\sqrt{1-\cos\theta}=2a\sin\frac{\theta}{2}.$$
Therefore,
$$\ell(\gamma)=2a\int_0^{2\pi}\sin\frac{\theta}{2}\,d\theta=-4a\cos\frac{\theta}{2}\Big|_0^{2\pi}=4a(-\cos\pi+\cos 0)=8a.$$
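The value $8a$ can be confirmed by integrating the speed $\|\gamma'(\theta)\|$ numerically; the radius and grid size are illustrative choices.

```python
import math

a = 1.5  # tire radius (illustrative value)

def speed(theta):
    # ||γ'(θ)|| for the cycloid γ(θ) = (a(θ - sin θ), a(1 - cos θ))
    return a * math.sqrt((1 - math.cos(theta)) ** 2 + math.sin(theta) ** 2)

n = 20000
h = 2 * math.pi / n
length = sum(speed((i + 0.5) * h) for i in range(n)) * h  # midpoint rule
```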

(c) The arc element $ds$. Formally, the arc element of a plane differentiable curve can be computed using the Pythagorean theorem:
$$(ds)^2=(dx)^2+(dy)^2\ \implies\ ds=\sqrt{dx^2+dy^2},\qquad ds=dx\sqrt{1+\frac{dy^2}{dx^2}},\qquad ds=\sqrt{1+f'(x)^2}\,dx.$$

(d) Arc of an ellipse. The ellipse with equation $x^2/a^2+y^2/b^2=1$, $0<b\le a$, is parametrized by $\gamma(t)=(a\cos t,\ b\sin t)$, $t\in[0,t_0]$, such that $\gamma'(t)=(-a\sin t,\ b\cos t)$. Hence,
$$\ell(\gamma)=\int_0^{t_0}\sqrt{a^2\sin^2 t+b^2\cos^2 t}\,dt=\int_0^{t_0}\sqrt{a^2-(a^2-b^2)\cos^2 t}\,dt=a\int_0^{t_0}\sqrt{1-\varepsilon^2\cos^2 t}\,dt,$$
where $\varepsilon=\frac{\sqrt{a^2-b^2}}{a}$. This integral can be transformed into
$$E(\tau,\varepsilon)=\int_0^\tau\sqrt{1-\varepsilon^2\sin^2 t}\,dt,$$
the elliptic integral of the second kind as defined in Chapter 5.

(e) A non-rectifiable curve. Consider the graph $\gamma(t)=(t,f(t))$, $t\in[0,1]$, of the function
$$f(t)=\begin{cases} t\cos\frac{\pi}{2t}, & 0<t\le 1,\\ 0, & t=0.\end{cases}$$
Since $\lim_{t\to 0+0} f(t)=f(0)=0$, $f$ is continuous and $\gamma(t)$ is a curve. However, this curve is not rectifiable. Indeed, choose the partition $P_k=\{0,\ \frac{1}{4k},\ \frac{1}{4k-2},\dots,\frac14,\ \frac12,\ 1\}$ with $t_i=\frac{1}{4k-2i+2}$ for $i=1,\dots,2k$, $t_0=0$, $t_{2k+1}=1$. Note that $t_0=0$ and $t_{2k+1}=1$ play a special role and will be omitted in the calculations below. Then $\cos\frac{\pi}{2t_i}=(-1)^{i+1}$, $i=1,\dots,2k$, so the signs alternate. Thus
$$\ell(P_k,\gamma)\ge\sum_{i=2}^{2k}\sqrt{(t_i-t_{i-1})^2+(f(t_i)-f(t_{i-1}))^2}\ge\sum_{i=2}^{2k}|f(t_i)-f(t_{i-1})|$$
$$\ge\left(\frac{1}{4k-2}+\frac{1}{4k}\right)+\left(\frac{1}{4k-4}+\frac{1}{4k-2}\right)+\cdots+\left(\frac12+\frac14\right)=\frac12+2\left(\frac14+\cdots+\frac{1}{4k-2}\right)+\frac{1}{4k},$$
which is unbounded for $k\to\infty$ since the harmonic series is unbounded. Hence $\gamma$ is not rectifiable.
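The harmonic-series growth of the inscribed polygon lengths in part (e) is visible numerically; the values of $k$ below are illustrative choices.

```python
import math

# ℓ(P_k, γ) for the curve γ(t) = (t, t cos(π/(2t))) with the partitions
# P_k from the text; the lengths grow without bound as k increases.
def f(t):
    return t * math.cos(math.pi / (2 * t)) if t > 0 else 0.0

def inscribed_length(k):
    ts = [0.0] + [1.0 / (4 * k - 2 * i + 2) for i in range(1, 2 * k + 1)] + [1.0]
    return sum(math.hypot(ts[i] - ts[i - 1], f(ts[i]) - f(ts[i - 1]))
               for i in range(1, len(ts)))

lengths = [inscribed_length(k) for k in (10, 100, 1000)]
```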

8.2 Line Integrals

A lot of physical applications can be found in [MW85, Chapter 18]. Integration of vector fields along curves is of fundamental importance in both mathematics and physics. We use the concept of work to motivate the material in this section. The motion of an object is described by a parametric curve $\vec x=\vec x(t)=(x(t),y(t),z(t))$. By differentiating this function, we obtain the velocity $\vec v(t)=\vec x'(t)$ and the acceleration $\vec a(t)=\vec x''(t)$. We use the physicists' notation $\dot{\vec x}(t)$ and $\ddot{\vec x}(t)$ to denote derivatives with respect to the time $t$. According to Newton's law, the total force $\tilde F$ acting on an object of mass $m$ is
$$\tilde F=m\vec a.$$

Since the kinetic energy $K$ is defined by $K=\frac12 m\vec v^2=\frac12 m\,\vec v\cdot\vec v$, we have
$$\dot K(t)=\frac12 m(\dot{\vec v}\cdot\vec v+\vec v\cdot\dot{\vec v})=m\vec a\cdot\vec v=\tilde F\cdot\vec v.$$

The total change of the kinetic energy from time $t_1$ to $t_2$, denoted $W$, is called the work done by the force $\tilde F$ along the path $\vec x(t)$:
$$W=\int_{t_1}^{t_2}\dot K(t)\,dt=\int_{t_1}^{t_2}\tilde F\cdot\vec v\,dt=\int_{t_1}^{t_2}\tilde F(t)\cdot\dot{\vec x}(t)\,dt.$$

Let us now suppose that the force $\tilde F$ at time $t$ depends only on the position $\vec x(t)$. That is, we assume that there is a vector field $\vec F(\vec x)$ such that $\tilde F(t)=\vec F(\vec x(t))$ (gravitational and electrostatic attraction are position-dependent, while magnetic forces are velocity-dependent). Then we may rewrite the above integral as
$$W=\int_{t_1}^{t_2}\vec F(\vec x(t))\cdot\dot{\vec x}(t)\,dt.$$
In the one-dimensional case, by a change of variables, this can be simplified to
$$W=\int_a^b F(x)\,dx,$$
where $a$ and $b$ are the starting and ending positions.

Definition 8.4 Let $\Gamma=\{\vec x(t)\mid t\in[r,s]\}$ be a continuously differentiable curve, $\vec x(t)\in C^1([r,s])$, in $\mathbb{R}^n$ and $\vec f\colon\Gamma\to\mathbb{R}^n$ a continuous vector field on $\Gamma$. The integral
$$\int_\Gamma\vec f(\vec x)\cdot d\vec x=\int_r^s\vec f(\vec x(t))\cdot\dot{\vec x}(t)\,dt$$
is called the line integral of the vector field $\vec f$ along the curve $\Gamma$.

Remark 8.2 (a) The definition of the line integral does not depend on the parametrization of $\Gamma$.
(b) If we take different curves between the same endpoints, the line integral may be different.
(c) If the vector field $\vec f$ is orthogonal to the tangent vector, then $\int_\Gamma\vec f\cdot d\vec x=0$.
(d) Other notations. If $\vec f=(P,Q)$ is a vector field in $\mathbb{R}^2$,
$$\int_\Gamma\vec f\cdot d\vec x=\int_\Gamma P\,dx+Q\,dy,$$
where the right side is either a symbol or $\int_\Gamma P\,dx=\int_\Gamma(P,0)\cdot d\vec x$.

Example 8.4 (a) Find the line integral $\int_{\Gamma_i} y\,dx+(x-y)\,dy$, $i=1,2$, where $\Gamma_1=\{\vec x(t)=(t,t^2)\mid t\in[0,1]\}$ and $\Gamma_2=\Gamma_3\cup\Gamma_4$ with $\Gamma_3=\{(t,0)\mid t\in[0,1]\}$, $\Gamma_4=\{(1,t)\mid t\in[0,1]\}$. [Figure: the parabola $\Gamma_1$ and the polygonal path $\Gamma_3\cup\Gamma_4$ from $(0,0)$ via $(1,0)$ to $(1,1)$]

In the first case $\dot{\vec x}(t)=(1,2t)$; hence
$$\int_{\Gamma_1} y\,dx+(x-y)\,dy=\int_0^1\bigl(t^2\cdot 1+(t-t^2)\cdot 2t\bigr)\,dt=\int_0^1(3t^2-2t^3)\,dt=\frac12.$$
In the second case $\int_{\Gamma_2}f\,d\vec x=\int_{\Gamma_3}f\,d\vec x+\int_{\Gamma_4}f\,d\vec x$. For the first part $(dx,dy)=(dt,0)$, for the second part $(dx,dy)=(0,dt)$, such that
$$\int_{\Gamma_2} y\,dx+(x-y)\,dy=\int_0^1\bigl(0\,dt+(t-0)\cdot 0\bigr)+\int_0^1\bigl(t\cdot 0+(1-t)\,dt\bigr)=\left(t-\frac{t^2}{2}\right)\Big|_0^1=\frac12.$$
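Both values in Example 8.4 (a) can be reproduced numerically from the definition of the line integral; the grid size is an illustrative choice.

```python
# Line integral ∫_Γ P dx + Q dy = ∫ (P x'(t) + Q y'(t)) dt, midpoint rule.
def line_integral(P, Q, x, y, xdot, ydot, t0, t1, n=4000):
    h = (t1 - t0) / n
    s = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * h
        s += P(x(t), y(t)) * xdot(t) + Q(x(t), y(t)) * ydot(t)
    return s * h

P = lambda x, y: y
Q = lambda x, y: x - y
# Γ1: (t, t²) on [0, 1]
val1 = line_integral(P, Q, lambda t: t, lambda t: t * t,
                     lambda t: 1.0, lambda t: 2 * t, 0.0, 1.0)
# Γ2 = Γ3 ∪ Γ4: (t, 0) then (1, t), each on [0, 1]
val2 = (line_integral(P, Q, lambda t: t, lambda t: 0.0,
                      lambda t: 1.0, lambda t: 0.0, 0.0, 1.0)
        + line_integral(P, Q, lambda t: 1.0, lambda t: t,
                        lambda t: 0.0, lambda t: 1.0, 0.0, 1.0))
```

Here both paths happen to give the same value $1/2$; as Remark 8.2 (b) notes, this is not the case for general vector fields.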

(b) Find the work done by the force field $\vec F(x,y,z)=(y,-x,1)$ as a particle moves from $(1,0,0)$ to $(1,0,1)$ along the following paths ($\varepsilon=\pm 1$):
$$\vec x(t)=\left(\cos t,\ \varepsilon\sin t,\ \frac{t}{2\pi}\right),\quad t\in[0,2\pi].$$
We find
$$\int_{\Gamma_\varepsilon}\vec F\cdot d\vec x=\int_0^{2\pi}(\varepsilon\sin t,\ -\cos t,\ 1)\cdot\left(-\sin t,\ \varepsilon\cos t,\ \frac{1}{2\pi}\right)dt=\int_0^{2\pi}\left(-\varepsilon\sin^2 t-\varepsilon\cos^2 t+\frac{1}{2\pi}\right)dt=-2\pi\varepsilon+1.$$
In case $\varepsilon=-1$, the motion is "with the force", so the work is positive; for the path $\varepsilon=1$, the motion is against the force and the work is negative.

We can also define a scalar line integral in the following way. Let $\gamma\colon[a,b]\to\mathbb{R}^n$ be a continuously differentiable curve, $\Gamma=\gamma([a,b])$, and $f\colon\Gamma\to\mathbb{R}$ a continuous function. The integral
$$\int_\Gamma f(x)\,ds:=\int_a^b f(\gamma(t))\,\|\gamma'(t)\|\,dt$$
is called the scalar line integral of $f$ along $\Gamma$.

Properties of Line Integrals

Remark 8.3 (a) Linearity.
$$\int_\Gamma(\vec f+\vec g)\,d\vec x=\int_\Gamma\vec f\,d\vec x+\int_\Gamma\vec g\,d\vec x,\qquad \int_\Gamma\lambda\vec f\,d\vec x=\lambda\int_\Gamma\vec f\,d\vec x.$$
(b) Change of orientation. If $\vec x(t)$, $t\in[r,s]$, defines a curve $\Gamma$ which goes from $a=\vec x(r)$ to $b=\vec x(s)$, then $\vec y(t)=\vec x(r+s-t)$, $t\in[r,s]$, defines the curve $-\Gamma$ which goes in the opposite direction from $b$ to $a$. It is easy to see that
$$\int_{-\Gamma}\vec f\,d\vec x=-\int_\Gamma\vec f\,d\vec x.$$
(c) Triangle inequality.
$$\left|\int_\Gamma\vec f\,d\vec x\right|\le\ell(\Gamma)\,\sup_{x\in\Gamma}\|\vec f(x)\|.$$

Proof. Let $\vec x(t)$, $t\in[t_0,t_1]$, be a parametrization of $\Gamma$; then, by the triangle and Cauchy-Schwarz inequalities,
$$\left|\int_\Gamma\vec f\,d\vec x\right|=\left|\int_{t_0}^{t_1}\vec f(\vec x(t))\cdot\vec x'(t)\,dt\right|\le\int_{t_0}^{t_1}\|\vec f(\vec x(t))\|\,\|\vec x'(t)\|\,dt\le\sup_{\vec x\in\Gamma}\|\vec f(\vec x)\|\int_{t_0}^{t_1}\|\vec x'(t)\|\,dt=\sup_{\vec x\in\Gamma}\|\vec f(\vec x)\|\,\ell(\Gamma).$$

(d) Splitting. If $\Gamma_1$ and $\Gamma_2$ are two curves such that the ending point of $\Gamma_1$ equals the starting point of $\Gamma_2$, then
$$\int_{\Gamma_1\cup\Gamma_2}\vec f\,d\vec x=\int_{\Gamma_1}\vec f\,d\vec x+\int_{\Gamma_2}\vec f\,d\vec x.$$

8.2.1 Path Independence

Problem: For which vector fields $\vec f$ is the line integral from $a$ to $b$ independent of the path (see Example 8.4 (a))?

Definition 8.5 A vector field $\vec f\colon G\to\mathbb{R}^n$, $G\subset\mathbb{R}^n$, is called conservative if for any points $a$ and $b$ in $G$ and any curves $\Gamma_1$ and $\Gamma_2$ from $a$ to $b$ we have
$$\int_{\Gamma_1}\vec f\,d\vec x=\int_{\Gamma_2}\vec f\,d\vec x.$$
In this case we say that the line integral $\int_\Gamma\vec f\,d\vec x$ is path independent, and we use the notation $\int_a^b\vec f\,d\vec x$.

Definition 8.6 A vector field $\vec f\colon G\to\mathbb{R}^n$ is called a potential field or gradient vector field if there exists a continuously differentiable function $U\colon G\to\mathbb{R}$ such that $\vec f(x)=\operatorname{grad} U(x)$ for $x\in G$. We call $U$ the potential or antiderivative of $\vec f$.

Example 8.5 The gravitational force is given by
$$\vec F(x)=-\alpha\frac{x}{\|x\|^3},$$
where $\alpha=\gamma m M$. It is a potential field with potential
$$U(x)=\alpha\frac{1}{\|x\|}.$$
This follows from Example 7.2 (a), $\operatorname{grad} f(\|x\|)=f'(\|x\|)\frac{x}{\|x\|}$ with $f(y)=1/y$ and $f'(y)=-1/y^2$.

Remark 8.4 (a) A vector field $\vec f$ is conservative if and only if the line integral over any closed curve in $G$ is $0$. Indeed, suppose that $\vec f$ is conservative and $\Gamma=\Gamma_1\cup\Gamma_2$ is a closed curve, where $\Gamma_1$ is a curve from $a$ to $b$ and $\Gamma_2$ is a curve from $b$ to $a$. By Remark 8.3 (b), changing the orientation of $\Gamma_2$ changes the sign of the line integral, and $-\Gamma_2$ is again a curve from $a$ to $b$:
$$\int_\Gamma\vec f\,d\vec x=\left(\int_{\Gamma_1}+\int_{\Gamma_2}\right)\vec f\,d\vec x=\int_{\Gamma_1}\vec f\,d\vec x-\int_{-\Gamma_2}\vec f\,d\vec x=0.$$
The proof of the other direction is similar.

(b) Uniqueness of a potential. An open subset $G\subset\mathbb{R}^n$ is said to be connected if any two points $x,y\in G$ can be connected by a polygonal path from $x$ to $y$ inside $G$. [Figure: a connected region and a region where $x$ and $y$ cannot be joined by a single segment] If it exists, $U(x)$ is uniquely determined up to a constant.

Indeed, if $\operatorname{grad} U_1(x)=\operatorname{grad} U_2(x)=\vec f$, put $U=U_1-U_2$; then we have $\nabla U=\nabla U_1-\nabla U_2=\vec f-\vec f=0$. Now suppose that $x,y\in G$ can be connected by a segment $\overline{xy}\subset G$ inside $G$. By the MVT (Corollary 7.12)
$$U(y)-U(x)=\nabla U((1-t)x+ty)\cdot(y-x)=0,$$
since $\nabla U=0$ on the segment $\overline{xy}$. This shows that $U=U_1-U_2$ is constant on any polygonal path inside $G$. Since $G$ is connected, $U_1-U_2$ is constant on $G$.

Theorem 8.2 Let $G\subset\mathbb{R}^n$ be a domain.
(i) If $U\colon G\to\mathbb{R}$ is continuously differentiable and $\vec f=\operatorname{grad} U$, then $\vec f$ is conservative, and for every (piecewise continuously differentiable) curve $\Gamma$ from $a$ to $b$, $a,b\in G$, we have
$$\int_\Gamma\vec f\,d\vec x=U(b)-U(a).$$
(ii) Let $\vec f\colon G\to\mathbb{R}^n$ be a continuous, conservative vector field and $a\in G$. Put
$$U(x)=\int_a^x\vec f\,d\vec y,\quad x\in G.$$
Then $U(x)$ is a potential for $\vec f$, that is, $\operatorname{grad} U=\vec f$.
(iii) A continuous vector field $\vec f$ is conservative in $G$ if and only if it is a potential field.

Proof. (i) Let $\Gamma=\{\vec x(t)\mid t\in[r,s]\}$ be a continuously differentiable curve from $a=\vec x(r)$ to $b=\vec x(s)$. We define $\varphi(t)=U(\vec x(t))$ and compute the derivative using the chain rule:
$$\dot\varphi(t)=\operatorname{grad} U(\vec x(t))\cdot\dot{\vec x}(t)=\vec f(\vec x(t))\cdot\dot{\vec x}(t).$$
By definition of the line integral we have
$$\int_\Gamma\vec f\,d\vec x=\int_r^s\vec f(\vec x(t))\cdot\dot{\vec x}(t)\,dt.$$
Inserting the above expression and applying the fundamental theorem of calculus, we find
$$\int_\Gamma\vec f\,d\vec x=\int_r^s\dot\varphi(t)\,dt=\varphi(s)-\varphi(r)=U(\vec x(s))-U(\vec x(r))=U(b)-U(a).$$
(ii) Choose $h\in\mathbb{R}^n$ small such that $x+th\in G$ for all $t\in[0,1]$. By the path independence of the line integral
$$U(x+h)-U(x)=\int_a^{x+h}\vec f\cdot d\vec y-\int_a^x\vec f\cdot d\vec y=\int_x^{x+h}\vec f\cdot d\vec y.$$
Consider the curve $\vec x(t)=x+th$, $t\in[0,1]$, from $x$ to $x+h$. Then $\dot{\vec x}(t)=h$. By the mean value theorem of integration (Theorem 5.18 with $\varphi=1$, $a=0$ and $b=1$) we have
$$\int_x^{x+h}\vec f\cdot d\vec y=\int_0^1\vec f(\vec x(t))\cdot h\,dt=\vec f(x+\theta h)\cdot h,$$
where $\theta\in[0,1]$. We check $\operatorname{grad} U(x)=\vec f(x)$ using the definition of the derivative; by the Cauchy-Schwarz inequality,
$$\frac{|U(x+h)-U(x)-\vec f(x)\cdot h|}{\|h\|}=\frac{|(\vec f(x+\theta h)-\vec f(x))\cdot h|}{\|h\|}\le\frac{\|\vec f(x+\theta h)-\vec f(x)\|\,\|h\|}{\|h\|}=\|\vec f(x+\theta h)-\vec f(x)\|\xrightarrow{h\to 0}0,$$
since $\vec f$ is continuous at $x$. This shows that $\nabla U=\vec f$.
(iii) follows immediately from (i) and (ii).
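Part (i) of Theorem 8.2 is easy to test numerically: for a gradient field the line integral along a curve from $a$ to $b$ equals $U(b)-U(a)$. The potential, curve and grid size below are illustrative choices, not from the text.

```python
import math

# U and its gradient field f = grad U (illustrative choices).
def U(x, y):
    return x * x * y + math.sin(y)

def grad_U(x, y):
    return (2 * x * y, x * x + math.cos(y))

# curve γ(t) = (cos t, t²) from γ(0) = (1, 0) to γ(1) = (cos 1, 1)
n = 20000
h = 1.0 / n
total = 0.0
for i in range(n):
    t = (i + 0.5) * h
    x, y = math.cos(t), t * t
    fx, fy = grad_U(x, y)
    total += (fx * (-math.sin(t)) + fy * (2 * t)) * h  # f(γ(t))·γ'(t)
```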

Remark 8.5 (a) In the case $n=2$, a simple path to compute the line integral (and so the potential $U$) in (ii) consists of two segments: from $(0,0)$ via $(x,0)$ to $(x,y)$. The line integral of $P\,dx+Q\,dy$ then reads as ordinary Riemann integrals
$$U(x,y)=\int_0^x P(t,0)\,dt+\int_0^y Q(x,t)\,dt.$$
(b) Case $n=3$. You can also use just one single segment from the origin to the endpoint $(x,y,z)$. This path is parametrized by the curve
$$\vec x(t)=(tx,ty,tz),\quad t\in[0,1],\qquad \dot{\vec x}(t)=(x,y,z).$$
We obtain
$$U(x,y,z)=\int_{(0,0,0)}^{(x,y,z)} f_1\,dx+f_2\,dy+f_3\,dz \qquad (8.2)$$
$$=x\int_0^1 f_1(tx,ty,tz)\,dt+y\int_0^1 f_2(tx,ty,tz)\,dt+z\int_0^1 f_3(tx,ty,tz)\,dt. \qquad (8.3)$$
(c) Although Theorem 8.2 gives a necessary and sufficient condition for a vector field to be conservative, we are missing an easy criterion. Recall from Example 7.4 that a necessary condition for $\vec f=(f_1,\dots,f_n)$ to be a potential vector field is
$$\frac{\partial f_i}{\partial x_j}=\frac{\partial f_j}{\partial x_i},\quad 1\le i<j\le n,$$
which is a simple consequence of Schwarz's lemma, since if $f_i=U_{x_i}$ then
$$U_{x_i x_j}=\frac{\partial U_{x_i}}{\partial x_j}=\frac{\partial f_i}{\partial x_j}=\frac{\partial f_j}{\partial x_i}=\frac{\partial U_{x_j}}{\partial x_i}=U_{x_j x_i}.$$

The condition $\frac{\partial f_i}{\partial x_j}=\frac{\partial f_j}{\partial x_i}$, $1\le i<j\le n$, is called the integrability condition for $\vec f$. It is a necessary condition for $\vec f$ to be conservative. However, it is not sufficient.

Remark 8.6 Counterexample. Let $G=\mathbb{R}^2\setminus\{(0,0)\}$ and
$$\vec f=(P,Q)=\left(\frac{-y}{x^2+y^2},\ \frac{x}{x^2+y^2}\right).$$

The vector field satisfies the integrability condition $P_y=Q_x$. However, it is not conservative. For, consider the unit circle $\gamma(t)=(\cos t,\sin t)$, $t\in[0,2\pi]$. Then $\gamma'(t)=(-\sin t,\cos t)$ and
$$\int_\gamma\vec f\,d\vec x=\int_\gamma\frac{-y\,dx}{x^2+y^2}+\frac{x\,dy}{x^2+y^2}=\int_0^{2\pi}\left(\frac{\sin t\,\sin t}{1}+\frac{\cos t\,\cos t}{1}\right)dt=\int_0^{2\pi}dt=2\pi.$$
This contradicts $\int_\Gamma\vec f\,d\vec x=0$ for closed curves and conservative vector fields. Hence, $\vec f$ is not conservative. $\vec f$ fails to be conservative since $G=\mathbb{R}^2\setminus\{(0,0)\}$ has a hole. For more details, see homework 30.1.

The next proposition shows that under one additional assumption this criterion is also sufficient. A connected open subset $G$ (a region) of $\mathbb{R}^n$ is called simply connected if every closed polygonal path inside $G$ can be shrunk inside $G$ to a single point. Roughly speaking, simply connected sets do not have holes.
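The value $2\pi$ for the counterexample can be reproduced directly from the definition of the line integral; the grid size is an illustrative choice.

```python
import math

# f = (-y/(x²+y²), x/(x²+y²)) around the unit circle γ(t) = (cos t, sin t).
n = 10000
h = 2 * math.pi / n
total = 0.0
for i in range(n):
    t = (i + 0.5) * h
    x, y = math.cos(t), math.sin(t)
    r2 = x * x + y * y
    P, Q = -y / r2, x / r2
    total += (P * (-math.sin(t)) + Q * math.cos(t)) * h  # f(γ(t))·γ'(t)
```

The integrand is identically $1$ on the unit circle, so the sum recovers $2\pi$ rather than the $0$ a conservative field would give.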

convex subset of $\mathbb{R}^n$: simply connected
1-torus $S^1=\{z\in\mathbb{C}\mid |z|=1\}$: not simply connected
annulus $\{(x,y)\in\mathbb{R}^2\mid r^2<x^2+y^2\le R^2\}$, $0\le r<R\le\infty$: not simply connected
$\mathbb{R}^2\setminus\{(0,0)\}$: not simply connected
$\mathbb{R}^3\setminus\{(0,0,0)\}$: simply connected

The precise mathematical term for a curve $\gamma$ to be "shrinkable to a point" is null-homotopic.

Definition 8.7 (a) A closed curve $\gamma\colon[a,b]\to G$, $G\subset\mathbb{R}^n$ open, is said to be null-homotopic if there exists a continuous mapping $h\colon[a,b]\times[0,1]\to G$ and a point $x_0\in G$ such that

(a) $h(t,0)=\gamma(t)$ for all $t$,
(b) $h(t,1)=x_0$ for all $t$,
(c) $h(a,s)=h(b,s)=x_0$ for all $s\in[0,1]$.

(b) $G$ is simply connected if any closed curve in $G$ is null-homotopic.

Proposition 8.3 Let $\vec f=(f_1,f_2,f_3)$ be a continuously differentiable vector field on a region $G\subset\mathbb{R}^3$.
(a) If $\vec f$ is conservative then $\operatorname{curl}\vec f=0$, i.e.
$$\frac{\partial f_3}{\partial x_2}-\frac{\partial f_2}{\partial x_3}=0,\quad \frac{\partial f_1}{\partial x_3}-\frac{\partial f_3}{\partial x_1}=0,\quad \frac{\partial f_2}{\partial x_1}-\frac{\partial f_1}{\partial x_2}=0.$$
(b) If $\operatorname{curl}\vec f=0$ and $G$ is simply connected, then $\vec f$ is conservative.

Proof. (a) Let $\vec f$ be conservative; by Theorem 8.2 there exists a potential $U$ with $\operatorname{grad} U=\vec f$. However, $\operatorname{curl}\operatorname{grad} U=0$ since
$$\frac{\partial f_3}{\partial x_2}-\frac{\partial f_2}{\partial x_3}=\frac{\partial^2 U}{\partial x_2\partial x_3}-\frac{\partial^2 U}{\partial x_3\partial x_2}=0$$
by Schwarz's lemma.
(b) This will be an application of Stokes' theorem; see below.

Example 8.6 On $\mathbb{R}^3$, let $\vec f=(P,Q,R)=(6xy^2+e^x,\ 6x^2y,\ 1)$. Then
$$\operatorname{curl}\vec f=(R_y-Q_z,\ P_z-R_x,\ Q_x-P_y)=(0,\ 0,\ 12xy-12xy)=0;$$
hence, $\vec f$ is conservative with a potential $U(x,y,z)$.

First method to compute the potential $U$: ODE ansatz. The ansatz $U_x=6xy^2+e^x$ is integrated with respect to $x$:
$$U(x,y,z)=\int U_x\,dx+C(y,z)=\int(6xy^2+e^x)\,dx+C(y,z)=3x^2y^2+e^x+C(y,z).$$
Hence,
$$U_y=6x^2y+C_y(y,z)\overset{!}{=}6x^2y,\qquad U_z=C_z\overset{!}{=}1.$$
This implies $C_y=0$ and $C_z=1$. The solution here is $C(y,z)=z+c_1$, such that $U=3x^2y^2+e^x+z+c_1$.

Second method: line integrals (see Remark 8.5 (b)).
$$U(x,y,z)=x\int_0^1 f_1(tx,ty,tz)\,dt+y\int_0^1 f_2(tx,ty,tz)\,dt+z\int_0^1 f_3(tx,ty,tz)\,dt$$
$$=x\int_0^1(6t^3xy^2+e^{tx})\,dt+y\int_0^1 6t^3x^2y\,dt+z\int_0^1 dt=3x^2y^2+e^x+z-1,$$
which differs from the first result only by a constant.

Chapter 9

Integration of Functions of Several Variables

References for this chapter are [O'N75, Section 4], which is quite elementary and easily accessible. Another elementary approach is [MW85, Chapter 17] (part III). A more advanced but still accessible treatment is [Spi65, Chapter 3]; this will be our main reference here. Rudin's book [Rud76] is not recommended for an introduction to integration.

9.1 Basic Definition

The definition of the Riemann integral of a function $f\colon A\to\mathbb{R}$, where $A\subset\mathbb{R}^n$ is a closed rectangle, is so similar to that of the ordinary integral that a rapid treatment will be given; see Section 5.1. If nothing else is specified, $A$ denotes a rectangle. A rectangle $A$ is the cartesian product of $n$ intervals,
$$A=[a_1,b_1]\times\cdots\times[a_n,b_n]=\{(x_1,\dots,x_n)\in\mathbb{R}^n\mid a_k\le x_k\le b_k,\ k=1,\dots,n\}.$$
Recall that a partition of a closed interval $[a,b]$ is a sequence $t_0,\dots,t_k$ where $a=t_0\le t_1\le\cdots\le t_k=b$. The partition divides the interval $[a,b]$ into $k$ subintervals $[t_{i-1},t_i]$. A partition of a rectangle $[a_1,b_1]\times\cdots\times[a_n,b_n]$ is a collection $P=(P_1,\dots,P_n)$ where each $P_i$ is a partition of the interval $[a_i,b_i]$. Suppose for example that $P_1=(t_0,\dots,t_k)$ is a partition of $[a_1,b_1]$ and $P_2=(s_0,\dots,s_l)$ is a partition of $[a_2,b_2]$. Then the partition $P=(P_1,P_2)$ of $[a_1,b_1]\times[a_2,b_2]$ divides the closed rectangle $[a_1,b_1]\times[a_2,b_2]$ into $kl$ subrectangles, a typical one being $[t_{i-1},t_i]\times[s_{j-1},s_j]$. In general, if $P_i$ divides $[a_i,b_i]$ into $N_i$ subintervals, then $P=(P_1,\dots,P_n)$ divides $[a_1,b_1]\times\cdots\times[a_n,b_n]$ into $N_1\cdots N_n$ subrectangles. These subrectangles will be called subrectangles of the partition $P$.

Suppose now $A$ is a rectangle, $f\colon A\to\mathbb{R}$ is a bounded function, and $P$ is a partition of $A$. For each subrectangle $S$ of the partition let
$$m_S=\inf\{f(x)\mid x\in S\},\qquad M_S=\sup\{f(x)\mid x\in S\},$$
and let $v(S)$ be the volume of the rectangle $S$. Note that the volume of the rectangle $A=[a_1,b_1]\times\cdots\times[a_n,b_n]$ is
$$v(A)=(b_1-a_1)(b_2-a_2)\cdots(b_n-a_n).$$
The lower and the upper sums of $f$ for $P$ are defined by
$$L(P,f)=\sum_S m_S\,v(S)\qquad\text{and}\qquad U(P,f)=\sum_S M_S\,v(S),$$
where the sum is taken over all subrectangles $S$ of the partition $P$. Clearly, if $f$ is bounded with $m\le f(x)\le M$ for $x\in A$, then
$$m\,v(A)\le L(P,f)\le U(P,f)\le M\,v(A),$$
so that the numbers $L(P,f)$ and $U(P,f)$ form bounded sets. Lemma 5.1 remains true; the proof is completely the same.
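The lower and upper sums just defined can be computed directly for a simple function; a Python sketch for $f(x,y)=x+y$ on the unit square, where the grid size is an illustrative choice. Since $f$ increases in both variables, the infimum and supremum on each subrectangle sit at opposite corners.

```python
# Lower sum L(P, f) and upper sum U(P, f) for f(x, y) = x + y on
# A = [0, 1] × [0, 1] with a uniform n × n partition.
n = 100
h = 1.0 / n
L = 0.0  # lower sum
U = 0.0  # upper sum
for i in range(n):
    for j in range(n):
        v = h * h                              # volume of the subrectangle
        L += (i * h + j * h) * v               # inf at lower-left corner
        U += ((i + 1) * h + (j + 1) * h) * v   # sup at upper-right corner
# both sums squeeze the integral ∫_A (x + y) dx = 1
```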

Lemma 9.1 (a) Suppose the partition $P^*$ is a refinement of $P$ (that is, each subrectangle of $P^*$ is contained in a subrectangle of $P$). Then
$$L(P,f)\le L(P^*,f)\qquad\text{and}\qquad U(P^*,f)\le U(P,f).$$
(b) If $P$ and $P'$ are any two partitions, then $L(P,f)\le U(P',f)$.

It follows from the above lemma that all lower sums are bounded above by any upper sum and vice versa.

Definition 9.1 Let $f\colon A\to\mathbb{R}$ be a bounded function. The function $f$ is called Riemann integrable on the rectangle $A$ if
$$\underline{\int_A} f\,dx:=\sup_P\{L(P,f)\}=\inf_P\{U(P,f)\}=:\overline{\int_A} f\,dx,$$
where the supremum and the infimum are taken over all partitions $P$ of $A$. This common number is the Riemann integral of $f$ on $A$ and is denoted by
$$\int_A f\,dx\qquad\text{or}\qquad\int_A f(x_1,\dots,x_n)\,dx_1\cdots dx_n.$$
$\underline{\int_A} f\,dx$ and $\overline{\int_A} f\,dx$ are called the lower and the upper integral of $f$ on $A$, respectively. They always exist. The set of integrable functions on $A$ is denoted by $R(A)$.

As in the one-dimensional case we have the following criterion.

Proposition 9.2 (Riemann Criterion) A bounded function f : A Ê is integrable if and only → if for every ε> 0 there exists a partition P of A such that U(P, f) L(P, f) <ε. − 9.1 Basic Definition 247

Example 9.1 (a) Let f : A → ℝ be the constant function f(x) = c. Then for any partition P and any subrectangle S we have m_S = M_S = c, so that

L(P, f) = U(P, f) = Σ_S c v(S) = c Σ_S v(S) = c v(A).

Hence, ∫_A c dx = c v(A).

(b) Let f : [0, 1] × [0, 1] → ℝ be defined by

f(x, y) = 0 if x is rational,   f(x, y) = 1 if x is irrational.

If P is a partition, then every subrectangle S will contain points (x, y) with x rational, and also points (x, y) with x irrational. Hence m_S = 0 and M_S = 1, so

L(P, f) = Σ_S 0 · v(S) = 0   and   U(P, f) = Σ_S 1 · v(S) = v(A) = v([0, 1] × [0, 1]) = 1.

Therefore, ∫̄_A f dx = 1 ≠ 0 = ∫̲_A f dx and f is not integrable.

9.1.1 Properties of the Riemann Integral

We briefly write R for R(A).

Remark 9.1 (a) R is a linear space and ∫_A (·) dx is a linear functional, i.e. f, g ∈ R imply λf + µg ∈ R for all λ, µ ∈ ℝ and

∫_A (λf + µg) dx = λ ∫_A f dx + µ ∫_A g dx.

(b) R is a lattice, i.e., f ∈ R implies |f| ∈ R. If f, g ∈ R, then max{f, g} ∈ R and min{f, g} ∈ R.
(c) R is an algebra, i.e., f, g ∈ R imply fg ∈ R.
(d) The triangle inequality holds:

| ∫_A f dx | ≤ ∫_A |f| dx.

(e) C(A) ⊂ R(A).
(f) If f ∈ R(A), f(A) ⊂ [a, b], and g ∈ C[a, b], then g∘f ∈ R(A).
(g) If f ∈ R and f = g except at finitely many points, then g ∈ R and ∫_A f dx = ∫_A g dx.

(h) Let f : A → ℝ and let P be a partition of A. Then f ∈ R(A) if and only if f↾S is integrable for each subrectangle S. In this case

∫_A f dx = Σ_S ∫_S f↾S dx.

9.2 Integrable Functions

We are going to characterize integrable functions. For this we need the notion of a set of measure zero.

Definition 9.2 Let A be a subset of ℝ^n. A has (n-dimensional) measure zero if for every ε > 0 there exists a sequence (U_i)_{i∈ℕ} of closed rectangles U_i which cover A such that Σ_{i=1}^∞ v(U_i) < ε. Open rectangles can also be used in the definition.

Remark 9.2 (a) Any finite set {a_1, ..., a_m} ⊂ ℝ^n is of measure 0. Indeed, let ε > 0 and choose U_i to be a rectangle with midpoint a_i and volume ε/m. Then {U_i | i = 1, ..., m} covers A and Σ_i v(U_i) ≤ ε.
(b) Any countable set is of measure 0.
(c) If each A_i, i ∈ ℕ, has measure 0, then A = A_1 ∪ A_2 ∪ ··· has measure 0.

Proof. Let ε > 0. Since A_i has measure 0, there exist closed rectangles U_{ik}, i, k ∈ ℕ, such that for fixed i the family {U_{ik} | k ∈ ℕ} covers A_i, i.e. ∪_{k∈ℕ} U_{ik} ⊇ A_i, and Σ_{k∈ℕ} v(U_{ik}) ≤ ε/2^{i−1}. In this way we have constructed an infinite array {U_{ik}} which covers A. Arranging those sets in a sequence (cf. Cantor's first diagonal process), we obtain a sequence of rectangles which covers A with

Σ_{i,k=1}^∞ v(U_{ik}) ≤ Σ_{i=1}^∞ ε/2^{i−1} = 2ε.

Hence A has measure 0.

(d) Let A = [a_1, b_1] × ··· × [a_n, b_n] be a non-singular rectangle, that is, a_i < b_i for all i = 1, ..., n. Then A is not of measure 0. Indeed, we use the following two facts about the volume of finite unions of rectangles:

(i) v(U_1 ∪ ··· ∪ U_m) ≤ Σ_{i=1}^m v(U_i),
(ii) U ⊆ V implies v(U) ≤ v(V).

Now let ε = v(A)/2 = (b_1 − a_1) ··· (b_n − a_n)/2 and suppose that the open rectangles (U_i)_{i∈ℕ} cover the compact set A. Then there exists a finite subcover U_1 ∪ ··· ∪ U_m ⊇ A. This and (i), (ii) imply

ε < v(A) ≤ v( ∪_{i=1}^m U_i ) ≤ Σ_{i=1}^m v(U_i) ≤ Σ_{i=1}^∞ v(U_i).

This contradicts Σ_i v(U_i) ≤ ε; thus A is not of measure 0.

Theorem 9.3 Let A be a closed rectangle and f : A → ℝ a bounded function. Let

B = { x ∈ A | f is discontinuous at x }.

Then f is integrable if and only if B is a set of measure 0. For the proof see [Spi65, 3-8 Theorem] or [Rud76, Theorem 11.33].

9.2.1 Integration over More General Sets

We have so far dealt only with integrals of functions over rectangles. Integrals over other sets are easily reduced to this type. If C ⊂ ℝ^n, the characteristic function χ_C of C is defined by

χ_C(x) = 1 if x ∈ C,   χ_C(x) = 0 if x ∉ C.

Definition 9.3 Let f : C → ℝ be bounded and let A be a rectangle with C ⊂ A. We call f Riemann integrable on C if the product function f·χ_C : A → ℝ is Riemann integrable on A. In this case we define

∫_C f dx = ∫_A f χ_C dx.

This certainly occurs if both f and χ_C are integrable on A. Note that ∫_C 1 dx = ∫_A χ_C dx =: v(C) is defined to be the volume or measure of C.

Problem: Under which conditions on C does the volume v(C) = ∫_C dx exist? By Theorem 9.3, χ_C is integrable if and only if the set B of discontinuities of χ_C in A has measure 0.

The boundary of a set C. Let C ⊂ A. For every x ∈ A exactly one of the following three cases occurs:

(a) x has a neighborhood which is completely contained in C (x is an inner point of C),
(b) x has a neighborhood which is completely contained in C^c (x is an inner point of C^c),
(c) every neighborhood of x intersects both C and C^c; in this case we say that x belongs to the boundary ∂C of C.

By definition ∂C = cl C ∩ cl(C^c); also ∂C = cl C \ C°. By the above discussion, A is the disjoint union of two open sets and one closed set:

A = C° ∪ ∂C ∪ (C^c)°.

Theorem 9.4 The characteristic function χ_C : A → ℝ is integrable if and only if the boundary of C has measure 0.
Proof. Since the boundary ∂C is closed and contained in the bounded set A, ∂C is compact. Suppose first that x is an inner point of C. Then there is an open set U ⊂ C containing x. Thus χ_C = 1 on U; clearly χ_C is continuous at x (since it is locally constant). Similarly, if x is an inner point of C^c, χ_C is locally constant, namely χ_C = 0 in a neighborhood of x; hence χ_C is continuous at x. Finally, if x is in the boundary of C, then for every open neighborhood U of x there are y_1 ∈ U ∩ C and y_2 ∈ U ∩ C^c, so that χ_C(y_1) = 1 whereas χ_C(y_2) = 0. Hence χ_C is not continuous at x. Thus the set of discontinuities of χ_C is exactly the boundary ∂C. The rest follows from Theorem 9.3.
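The theorem can be watched numerically. The sketch below (an added illustration, not part of the original notes; the disc and grid size are arbitrary choices) computes lower and upper sums of χ_C for the unit disc C inside A = [−1, 1]²: cells lying entirely in C feed the lower sum, cells merely meeting C feed the upper sum, and the gap consists exactly of the cells touching the boundary circle — a set of measure 0, so the gap shrinks under refinement.

```python
import math

def disc_bounds(n):
    # Lower/upper sums of the characteristic function of the unit
    # disc C = {x^2 + y^2 <= 1} over an n-by-n grid on A = [-1, 1]^2.
    h = 2.0 / n
    lower = upper = 0.0
    for i in range(n):
        for j in range(n):
            x0, x1 = -1.0 + i * h, -1.0 + (i + 1) * h
            y0, y1 = -1.0 + j * h, -1.0 + (j + 1) * h
            # farthest corner from the origin: the cell lies in C iff it does
            if max(x0 * x0, x1 * x1) + max(y0 * y0, y1 * y1) <= 1.0:
                lower += h * h
            # nearest point of the cell to the origin: the cell meets C iff it does
            px = min(max(0.0, x0), x1)
            py = min(max(0.0, y0), y1)
            if px * px + py * py <= 1.0:
                upper += h * h
    return lower, upper

lo100, hi100 = disc_bounds(100)   # both sums bracket the area pi
```

Both bounds converge to v(C) = π, the Jordan measure of the disc.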

Definition 9.4 A bounded set C is called Jordan measurable, or simply a Jordan set, if its boundary has measure 0. The integral v(C) = ∫_C 1 dx is called the n-dimensional Jordan measure of C or the n-dimensional volume of C; sometimes we write µ(C) in place of v(C).
Naturally, the one-dimensional volume is the length, and the two-dimensional volume is the area.

Typical Examples of Jordan Measurable Sets

Hyperplanes Σ_{i=1}^n a_i x_i = c and, more generally, hypersurfaces f(x_1, ..., x_n) = c with f ∈ C¹(G) are sets of measure 0 in ℝ^n. Curves in ℝ^n have measure 0. Graphs Γ_f = { (x, f(x)) ∈ ℝ^{n+1} | x ∈ G } of continuous functions f are of measure 0 in ℝ^{n+1}. If G is a bounded region in ℝ^n, the boundary ∂G has measure 0. If G ⊂ ℝ^n is a region, the cylinder C = ∂G × ℝ = { (x, x_{n+1}) | x ∈ ∂G } ⊂ ℝ^{n+1} is a measure-0 set.

Let D ⊂ ℝ^{n+1} be given by

D = { (x, x_{n+1}) | x ∈ K, 0 ≤ x_{n+1} ≤ f(x) },

where K ⊂ ℝ^n is a compact set and f : K → ℝ is continuous. Then D is Jordan measurable. Indeed, D is bounded by the graph Γ_f, the hyperplane x_{n+1} = 0, and the cylinder ∂K × ℝ = { (x, x_{n+1}) | x ∈ ∂K }, and all three sets have measure 0 in ℝ^{n+1}.

9.2.2 Fubini’s Theorem and Iterated Integrals

Our goal is to evaluate Riemann integrals; so far, however, we have no method to compute multiple integrals. The following theorem fills this gap.

Theorem 9.5 (Fubini's Theorem) Let A ⊂ ℝ^n and B ⊂ ℝ^m be closed rectangles, and let f : A × B → ℝ be integrable. For x ∈ A let g_x : B → ℝ be defined by g_x(y) = f(x, y), and let

L(x) = ∫̲_B g_x dy = ∫̲_B f(x, y) dy,

U(x) = ∫̄_B g_x dy = ∫̄_B f(x, y) dy.

Then L(x) and U(x) are integrable on A and

∫_{A×B} f dxdy = ∫_A L(x) dx = ∫_A ( ∫̲_B f(x, y) dy ) dx,

∫_{A×B} f dxdy = ∫_A U(x) dx = ∫_A ( ∫̄_B f(x, y) dy ) dx.

The integrals on the right are called iterated integrals. The proof is in the appendix to this chapter.

Remarks 9.3 (a) A similar proof shows that we can exchange the order of integration:

∫_{A×B} f dxdy = ∫_B ( ∫̲_A f(x, y) dx ) dy = ∫_B ( ∫̄_A f(x, y) dx ) dy.

These integrals are also called iterated integrals for f.

(b) In practice it is often the case that each g_x is integrable, so that

∫_{A×B} f dxdy = ∫_A ( ∫_B f(x, y) dy ) dx.

This certainly occurs if f is continuous.

(c) If A = [a_1, b_1] × ··· × [a_n, b_n] and f : A → ℝ is continuous, we can apply Fubini's theorem repeatedly to obtain

∫_A f dx = ∫_{a_n}^{b_n} ( ··· ( ∫_{a_1}^{b_1} f(x_1, ..., x_n) dx_1 ) ··· ) dx_n.

(d) If C ⊂ A × B, Fubini's theorem can be used to compute ∫_C f dx, since this is by definition ∫_{A×B} f χ_C dx. Here are two examples, in the cases n = 2 and n = 3.
Let a < b and let ϕ(x) and ψ(x) be continuous real-valued functions on [a, b] with ϕ(x) < ψ(x) on [a, b]. Put

C = { (x, y) ∈ ℝ² | a ≤ x ≤ b, ϕ(x) ≤ y ≤ ψ(x) }.

Let f(x, y) be continuous on C. Then f is integrable on C and

∫∫_C f dxdy = ∫_a^b ( ∫_{ϕ(x)}^{ψ(x)} f(x, y) dy ) dx.

Let

G = { (x, y, z) ∈ ℝ³ | a ≤ x ≤ b, ϕ(x) ≤ y ≤ ψ(x), α(x, y) ≤ z ≤ β(x, y) },

where all functions are sufficiently nice. Then

∫∫∫_G f(x, y, z) dxdydz = ∫_a^b ( ∫_{ϕ(x)}^{ψ(x)} ( ∫_{α(x,y)}^{β(x,y)} f(x, y, z) dz ) dy ) dx.

(e) Cavalieri's Principle. Let A and B be Jordan sets in ℝ³ and let A_c = { (x, y) | (x, y, c) ∈ A } be the section of A with the plane z = c; B_c is defined similarly. Suppose each A_c and B_c is Jordan measurable (in ℝ²) and they have the same area, v(A_c) = v(B_c), for all c ∈ ℝ. Then A and B have the same volume, v(A) = v(B).

Example 9.2 (a) Let f(x, y) = xy and

C = { (x, y) ∈ ℝ² | 0 ≤ x ≤ 1, x² ≤ y ≤ x }
  = { (x, y) ∈ ℝ² | 0 ≤ y ≤ 1, y ≤ x ≤ √y }.

Then

∫∫_C xy dxdy = ∫_0^1 ( ∫_{x²}^x xy dy ) dx = ∫_0^1 [ x y²/2 ]_{y=x²}^{y=x} dx = (1/2) ∫_0^1 (x³ − x⁵) dx
  = [ x⁴/8 − x⁶/12 ]_0^1 = 1/8 − 1/12 = 1/24.

Interchanging the order of integration we obtain

∫∫_C xy dxdy = ∫_0^1 ( ∫_y^{√y} xy dx ) dy = ∫_0^1 [ x² y/2 ]_{x=y}^{x=√y} dy = (1/2) ∫_0^1 (y² − y³) dy
  = [ y³/6 − y⁴/8 ]_0^1 = 1/6 − 1/8 = 1/24.

(b) Let G = { (x, y, z) ∈ ℝ³ | x, y, z ≥ 0, x + y + z ≤ 1 } and f(x, y, z) = 1/(x + y + z + 1)³. The set G can be parametrized as follows:

∫∫∫_G f dxdydz = ∫_0^1 ( ∫_0^{1−x} ( ∫_0^{1−x−y} dz/(1 + x + y + z)³ ) dy ) dx
= ∫_0^1 ( ∫_0^{1−x} (−1/2) [ 1/(1 + x + y + z)² ]_{z=0}^{z=1−x−y} dy ) dx
= (1/2) ∫_0^1 ( ∫_0^{1−x} ( 1/(1 + x + y)² − 1/4 ) dy ) dx
= (1/2) ∫_0^1 ( 1/(x + 1) + x/4 − 3/4 ) dx = (1/2) ( log 2 − 5/8 ).
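A numerical check of this value (an added sketch, not in the original lecture notes): nested midpoint sums over the simplex, with variable upper limits as in the iterated integral above, reproduce (1/2)(log 2 − 5/8) ≈ 0.0341.

```python
import math

def simplex_integral(n=60):
    # Nested midpoint sums of 1/(1 + x + y + z)^3 over the simplex
    # x, y, z >= 0, x + y + z <= 1, with variable upper limits.
    total = 0.0
    hx = 1.0 / n
    for i in range(n):
        x = (i + 0.5) * hx
        hy = (1.0 - x) / n
        for j in range(n):
            y = (j + 0.5) * hy
            hz = (1.0 - x - y) / n
            for k in range(n):
                z = (k + 0.5) * hz
                total += hx * hy * hz / (1.0 + x + y + z) ** 3
    return total

approx = simplex_integral()
exact = 0.5 * (math.log(2.0) - 5.0 / 8.0)   # = (1/2)(log 2 - 5/8)
```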

(c) Let f(x, y) = e^{y/x} and let D be the region bounded by the lines y = x, y = 2x, x = 1, x = 2 (a trapezoid with vertices (1,1), (1,2), (2,2), (2,4)). Compute the integral of f on D. D can be parametrized as follows: D = { (x, y) | 1 ≤ x ≤ 2, x ≤ y ≤ 2x }. Hence

∫∫_D f dxdy = ∫_1^2 dx ∫_x^{2x} e^{y/x} dy = ∫_1^2 [ x e^{y/x} ]_{y=x}^{y=2x} dx = ∫_1^2 (e² x − e x) dx = (3/2)(e² − e).


But trying to reverse the order of integration we encounter two problems. First, we must break D into several regions:

∫∫_D f dxdy = ∫_1^2 dy ∫_1^y e^{y/x} dx + ∫_2^4 dy ∫_{y/2}^2 e^{y/x} dx.

This is not a serious problem. A greater problem is that e^{y/x}, as a function of x, has no elementary antiderivative, so ∫_1^y e^{y/x} dx and ∫_{y/2}^2 e^{y/x} dx are very difficult to evaluate. In this example there is a considerable advantage in one order of integration over the other.

The Cosmopolitan Integral

Let f ≥ 0 be a continuous real-valued function on [a, b]. There are some very special solids whose volumes can be expressed by integrals. The simplest such solid G is a "volume of revolution" obtained by revolving the region under the graph of f on [a, b] around the horizontal axis. We apply Fubini's theorem to the set

G = { (x, y, z) ∈ ℝ³ | a ≤ x ≤ b, y² + z² ≤ f(x)² }.

Consequently, the volume v(G) is given by

v(G) = ∫∫∫_G dxdydz = ∫_a^b ( ∫∫_{G_x} dydz ) dx,   (9.1)

where G_x = { (y, z) ∈ ℝ² | y² + z² ≤ f(x)² } is the closed disc of radius f(x) around (0, 0). For any fixed x ∈ [a, b] its area is v(G_x) = ∫∫_{G_x} dydz = π f(x)². Hence

v(G) = π ∫_a^b f(x)² dx.   (9.2)

Example 9.3 We compute the volume of the ellipsoid obtained by revolving the graph of the ellipse

x²/a² + y²/b² = 1

around the x-axis. We have y² = f(x)² = b² (1 − x²/a²); hence

v(G) = π b² ∫_{−a}^a (1 − x²/a²) dx = π b² [ x − x³/(3a²) ]_{−a}^a = π b² (2a − 2a/3) = (4π/3) b² a.

9.3 Change of Variable

We want to generalize the change of variables formula ∫_{g(a)}^{g(b)} f(x) dx = ∫_a^b f(g(y)) g′(y) dy.

If B_R is the ball in ℝ³ with radius R around the origin, we have in cartesian coordinates

∫∫∫_{B_R} f dxdydz = ∫_{−R}^R dx ∫_{−√(R²−x²)}^{√(R²−x²)} dy ∫_{−√(R²−x²−y²)}^{√(R²−x²−y²)} f(x, y, z) dz.

Usually the complicated limits yield hard computations. Here spherical coordinates are appropriate. To motivate the formula, consider the area of a parallelogram D in the x-y-plane spanned by the two vectors a = (a_1, a_2) and b = (b_1, b_2).

D = { λa + µb | λ, µ ∈ [0, 1] } = { (g_1(λ, µ), g_2(λ, µ)) | λ, µ ∈ [0, 1] },

where g_1(λ, µ) = λa_1 + µb_1 and g_2(λ, µ) = λa_2 + µb_2. As known from linear algebra, the area of D equals the norm of the vector product

v(D) = ‖a × b‖ = ‖ det ( e_1 e_2 e_3 ; a_1 a_2 0 ; b_1 b_2 0 ) ‖ = ‖(0, 0, a_1 b_2 − a_2 b_1)‖ = |a_1 b_2 − a_2 b_1| =: d.

Introducing new variables λ and µ with

x = λa_1 + µb_1,   y = λa_2 + µb_2,

the parallelogram D in the x-y-plane is the image of the unit square C = [0, 1] × [0, 1] in the λ-µ-plane: D = g(C). We want to compare the area d of D with the area 1 of C. Note that d is exactly the absolute value of the Jacobian ∂(g_1, g_2)/∂(λ, µ); indeed

∂(g_1, g_2)/∂(λ, µ) = det ( ∂g_1/∂λ  ∂g_1/∂µ ; ∂g_2/∂λ  ∂g_2/∂µ ) = det ( a_1 b_1 ; a_2 b_2 ) = a_1 b_2 − a_2 b_1.

Hence,

∫∫_D dxdy = ∫∫_C | ∂(g_1, g_2)/∂(λ, µ) | dλdµ.

This is true in any ℝ^n for any regular map g : C → D.

Theorem 9.6 (Change of variable) Let C and D be compact Jordan sets in ℝ^n and let M ⊂ C be a set of measure 0. Let g : C → D be continuously differentiable with the following properties:

(i) g is injective on C \ M.
(ii) g′(x) is regular on C \ M.

Let f : D → ℝ be continuous. Then

∫_D f(y) dy = ∫_C f(g(x)) | ∂(g_1, ..., g_n)/∂(x_1, ..., x_n) (x) | dx.   (9.3)


Remark 9.4 Why the absolute value of the Jacobian? In ℝ¹ we don't need the absolute value, because in contrast to ℝ^n, n > 1, in ℝ¹ we have an orientation of the integration set: ∫_a^b f dx = −∫_b^a f dx.
For the proof see [Rud76, 10.9 Theorem]. The main steps of the proof are: 1) In a small open set g can be written as the composition of n "flips" and n "primitive mappings". A flip exchanges two variables x_i and x_k, whereas a primitive mapping H is equal to the identity except for one variable, H(x) = x + (h(x) − x_m) e_m, where h : U → ℝ.
2) If the statement is true for transformations S and T, then it is true for the composition S∘T; this follows from det(AB) = det A · det B. 3) Use a partition of unity.

Example 9.4 (a) Polar coordinates. Let A = { (r, ϕ) | 0 ≤ r ≤ R, 0 ≤ ϕ < 2π } be a rectangle in polar coordinates. The mapping g(r, ϕ) = (x, y), x = r cos ϕ, y = r sin ϕ, maps this rectangle continuously differentiably onto the disc D with radius R. Let M = { (r, ϕ) | r = 0 }. Since ∂(x, y)/∂(r, ϕ) = r, the map g is bijective and regular on A \ M. The assumptions of the theorem are satisfied and we have

∫∫_D f(x, y) dxdy = ∫∫_A f(r cos ϕ, r sin ϕ) r drdϕ = ∫_0^R ∫_0^{2π} f(r cos ϕ, r sin ϕ) r dϕ dr   (by Fubini).

(b) Spherical coordinates. Recall from the exercise class the spherical coordinates r ∈ [0, ∞), ϕ ∈ [0, 2π], and ϑ ∈ [0, π]:

x = r sin ϑ cos ϕ,   y = r sin ϑ sin ϕ,   z = r cos ϑ.

The Jacobian reads

∂(x, y, z)/∂(r, ϑ, ϕ) = det ( x_r x_ϑ x_ϕ ; y_r y_ϑ y_ϕ ; z_r z_ϑ z_ϕ )
= det ( sin ϑ cos ϕ  r cos ϑ cos ϕ  −r sin ϑ sin ϕ ; sin ϑ sin ϕ  r cos ϑ sin ϕ  r sin ϑ cos ϕ ; cos ϑ  −r sin ϑ  0 ) = r² sin ϑ.

Sometimes one uses ∂(x, y, z)/∂(r, ϕ, ϑ) = −r² sin ϑ. Hence

∫∫∫_{B_1} f(x, y, z) dxdydz = ∫_0^1 ∫_0^{2π} ∫_0^π f(x, y, z) r² sin ϑ dϑ dϕ dr.

This example was not covered in the lecture. Compute the volume of the ellipsoid E given by u²/a² + v²/b² + w²/c² = 1. We use scaled spherical coordinates:

u = a r sin ϑ cos ϕ,   v = b r sin ϑ sin ϕ,   w = c r cos ϑ,

where r ∈ [0, 1], ϑ ∈ [0, π], ϕ ∈ [0, 2π]. Since the rows of the spherical Jacobian matrix ∂(x, y, z)/∂(r, ϑ, ϕ) are simply multiplied by a, b, and c, respectively, we have

∂(u, v, w)/∂(r, ϑ, ϕ) = abc r² sin ϑ.

Hence, if B_1 is the unit ball around 0, we have, using iterated integrals,

v(E) = ∫∫∫_E dudvdw = abc ∫∫∫_{B_1} r² sin ϑ drdϑdϕ
= abc ∫_0^1 r² dr ∫_0^{2π} dϕ ∫_0^π sin ϑ dϑ
= abc · (1/3) · 2π · [ −cos ϑ ]_0^π = (4π/3) abc.
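A numerical check of the scaled-spherical-coordinates computation (an added sketch, not in the original notes; the semi-axes are arbitrary choices): by Fubini the triple integral of the Jacobian a b c r² sin ϑ factorizes into three one-dimensional integrals.

```python
import math

def ellipsoid_volume(a, b, c, n=200):
    # Integral of the Jacobian a*b*c*r^2*sin(theta) over
    # r in [0, 1], theta in [0, pi], phi in [0, 2*pi]; by Fubini
    # the triple integral factorizes into 1-D midpoint sums.
    hr, ht = 1.0 / n, math.pi / n
    int_r2 = sum(((i + 0.5) * hr) ** 2 * hr for i in range(n))      # -> 1/3
    int_sin = sum(math.sin((i + 0.5) * ht) * ht for i in range(n))  # -> 2
    return a * b * c * int_r2 * int_sin * 2.0 * math.pi

vol = ellipsoid_volume(2.0, 3.0, 5.0)
exact = 4.0 * math.pi / 3.0 * 2.0 * 3.0 * 5.0
```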

(c) Compute ∫∫_C (x² + y²) dxdy, where C is bounded by the four hyperbolas

xy = 1,   xy = 2,   x² − y² = 1,   x² − y² = 4.

We change coordinates, g(x, y) = (u, v), with

u = xy,   v = x² − y².

The Jacobian is

∂(u, v)/∂(x, y) = det ( y x ; 2x −2y ) = −2(x² + y²).

The Jacobian of the inverse transform is

∂(x, y)/∂(u, v) = −1/( 2(x² + y²) ).

In the (u, v)-plane the region is the rectangle D = { (u, v) ∈ ℝ² | 1 ≤ u ≤ 2, 1 ≤ v ≤ 4 }. Hence,

∫∫_C (x² + y²) dxdy = ∫∫_D (x² + y²) | ∂(x, y)/∂(u, v) | dudv = ∫∫_D (x² + y²)/( 2(x² + y²) ) dudv = (1/2) v(D) = 3/2.

Physical Applications

If ρ(x) = ρ(x_1, x_2, x_3) is the mass density of a solid C ⊂ ℝ³, then

m = ∫_C ρ dx is the mass of C, and

x̄_i = (1/m) ∫_C x_i ρ(x) dx, i = 1, 2, 3, are the coordinates of the mass center x̄ of C.

The moments of inertia of C are defined as follows:

I_xx = ∫∫∫_C (y² + z²) ρ dxdydz,   I_yy = ∫∫∫_C (x² + z²) ρ dxdydz,   I_zz = ∫∫∫_C (x² + y²) ρ dxdydz,

I_xy = ∫∫∫_C xy ρ dxdydz,   I_xz = ∫∫∫_C xz ρ dxdydz,   I_yz = ∫∫∫_C yz ρ dxdydz.

Here Ixx, Iyy, and Izz are the moments of inertia of the solid with respect to the x-axis, y-axis, and z-axis, respectively.

Example 9.5 Compute the mass center of a homogeneous half-plate of radius R, C = { (x, y) | x² + y² ≤ R², y ≥ 0 }.
Solution. By the symmetry of C with respect to the y-axis, x̄ = 0. Using polar coordinates we find

ȳ = (1/m) ∫∫_C y dxdy = (1/m) ∫_0^R ( ∫_0^π r sin ϕ · r dϕ ) dr = (1/m) ∫_0^R r² dr [ −cos ϕ ]_0^π = (1/m) · (2R³/3).

Since the mass is proportional to the area, m = πR²/2, and we find that (0, 4R/(3π)) is the mass center of the half-plate.
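The mass-center computation can be verified with one-dimensional midpoint sums in polar coordinates (an added sketch, not in the original notes; R = 1.5 is an arbitrary choice):

```python
import math

R, n = 1.5, 1000
hr, hp = R / n, math.pi / n
int_r = sum((i + 0.5) * hr * hr for i in range(n))               # int_0^R r dr = R^2/2
int_r2 = sum(((i + 0.5) * hr) ** 2 * hr for i in range(n))       # int_0^R r^2 dr = R^3/3
int_sin = sum(math.sin((j + 0.5) * hp) * hp for j in range(n))   # int_0^pi sin = 2
mass = math.pi * int_r            # area of the half-plate, pi*R^2/2
ybar = int_r2 * int_sin / mass    # y-coordinate of the mass center
```

The result matches ȳ = 4R/(3π) ≈ 0.4244 R.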

9.4 Appendix

Proof of Fubini's Theorem. Let P_A be a partition of A and P_B a partition of B. Together they give a partition P of A × B for which any subrectangle S is of the form S_A × S_B, where S_A is a subrectangle of the partition P_A and S_B is a subrectangle of the partition P_B. Thus

L(P, f) = Σ_S m_S v(S) = Σ_{S_A, S_B} m_{S_A×S_B} v(S_A × S_B)

= Σ_{S_A} ( Σ_{S_B} m_{S_A×S_B} v(S_B) ) v(S_A).

Now, if x ∈ S_A, then clearly m_{S_A×S_B}(f) ≤ m_{S_B}(g_x), since the reference set S_A × S_B on the left is bigger than the reference set {x} × S_B on the right. Consequently, for x ∈ S_A we have

Σ_{S_B} m_{S_A×S_B} v(S_B) ≤ Σ_{S_B} m_{S_B}(g_x) v(S_B) ≤ ∫̲_B g_x dy = L(x).

Since this holds for every x ∈ S_A,

Σ_{S_B} m_{S_A×S_B} v(S_B) ≤ m_{S_A}(L).

Therefore,

Σ_{S_A} ( Σ_{S_B} m_{S_A×S_B} v(S_B) ) v(S_A) ≤ Σ_{S_A} m_{S_A}(L) v(S_A) = L(P_A, L).

We thus obtain

L(P, f) ≤ L(P_A, L) ≤ U(P_A, L) ≤ U(P_A, U) ≤ U(P, f),

where the proof of the last inequality is entirely analogous to the proof of the first. Since f is integrable,

sup{ L(P, f) } = inf{ U(P, f) } = ∫_{A×B} f dxdy.

Hence,

sup{ L(P_A, L) } = inf{ U(P_A, L) } = ∫_{A×B} f dxdy.

In other words, L(x) is integrable on A and ∫_{A×B} f dxdy = ∫_A L(x) dx. The assertion for U(x) follows similarly from the inequalities

L(P, f) ≤ L(P_A, L) ≤ L(P_A, U) ≤ U(P_A, U) ≤ U(P, f).

Chapter 10

Surface Integrals

10.1 Surfaces in ℝ³

Recall that a domain G is an open and connected subset of ℝ^n; connected means that for any two points x and y in G there exist points x_0, x_1, ..., x_k with x_0 = x and x_k = y such that every segment x_{i−1}x_i, i = 1, ..., k, is completely contained in G.

Definition 10.1 Let G ⊂ ℝ² be a domain and F : G → ℝ³ continuously differentiable. The mapping F as well as the set F = F(G) = { F(s, t) | (s, t) ∈ G } is called an open regular surface if the Jacobian matrix F′(s, t) has rank 2 for all (s, t) ∈ G.
If

F(s, t) = (x(s, t), y(s, t), z(s, t)),

the Jacobian matrix of F is

F′(s, t) = ( x_s x_t ; y_s y_t ; z_s z_t ).

The two column vectors of F′(s, t) span the tangent plane to F at (s, t):

D_1F(s, t) = ( ∂x/∂s (s, t), ∂y/∂s (s, t), ∂z/∂s (s, t) ),
D_2F(s, t) = ( ∂x/∂t (s, t), ∂y/∂t (s, t), ∂z/∂t (s, t) ).

Justification: Suppose (s, t_0) ∈ G, where t_0 is fixed. Then γ(s) = F(s, t_0) defines a curve in F with tangent vector γ′(s) = D_1F(s, t_0). Similarly, for fixed s_0 we obtain another curve

γ̃(t) = F(s_0, t) with tangent vector γ̃′(t) = D_2F(s_0, t). Since F′(s, t) has rank 2 at every point of G, the vectors D_1F and D_2F are linearly independent; hence they span a plane.


Definition 10.2 Let F : G → ℝ³ be an open regular surface and let (s_0, t_0) ∈ G. Then

x⃗ = F(s_0, t_0) + α D_1F(s_0, t_0) + β D_2F(s_0, t_0),   α, β ∈ ℝ,

is called the tangent plane E to F at F(s_0, t_0). The line through F(s_0, t_0) which is orthogonal to E is called the normal line to F at F(s_0, t_0).

Recall that the vector product x⃗ × y⃗ of vectors x⃗ = (x_1, x_2, x_3) and y⃗ = (y_1, y_2, y_3) from ℝ³ is the vector

x⃗ × y⃗ = det ( e_1 e_2 e_3 ; x_1 x_2 x_3 ; y_1 y_2 y_3 ) = (x_2 y_3 − y_2 x_3, x_3 y_1 − y_3 x_1, x_1 y_2 − y_1 x_2).

It is orthogonal to the plane spanned by the parallelogram P with edges ~x and ~y. Its length is the area of the parallelogram P . A vector which points in the direction of the normal line is

D_1F(s_0, t_0) × D_2F(s_0, t_0) = det ( e_1 e_2 e_3 ; x_s y_s z_s ; x_t y_t z_t ),   (10.1)

n⃗ = ± (D_1F × D_2F) / ‖D_1F × D_2F‖,   (10.2)

where n⃗ is the unit normal vector at (s_0, t_0).

Example 10.1 (Graph of a function) Let F be given by the graph of a function f : G → ℝ, namely F(x, y) = (x, y, f(x, y)). By definition,

D_1F = (1, 0, f_x),   D_2F = (0, 1, f_y),

hence

D_1F × D_2F = det ( e_1 e_2 e_3 ; 1 0 f_x ; 0 1 f_y ) = (−f_x, −f_y, 1).

Therefore, the tangent plane has the equation

−f_x (x − x_0) − f_y (y − y_0) + (z − z_0) = 0.

Further, the unit normal vector to the tangent plane is

n⃗ = ± (f_x, f_y, −1) / √(f_x² + f_y² + 1).

10.1.1 The Area of a Surface

Let the parametrization F and the surface F = F(G) be as above. We assume that the continuous vector fields D_1F and D_2F on G can be extended to continuous functions on the closure Ḡ.

Definition 10.3 The number

|F| := ∫∫_G ‖D_1F × D_2F‖ dsdt   (10.3)

is called the area of the surface F. We call

dS = ‖D_1F × D_2F‖ dsdt

the scalar surface element of F. In this notation, |F| = ∫∫_G dS.

Example 10.2 Let F = { (s cos t, s sin t, 2t) | s ∈ [0, 2], t ∈ [0, 4π] } be the surface spanned by a helix. We shall compute its area. The tangent vectors are

D_1F = (cos t, sin t, 0),   D_2F = (−s sin t, s cos t, 2),

such that

D_1F × D_2F = det ( e_1 e_2 e_3 ; cos t sin t 0 ; −s sin t s cos t 2 ) = (2 sin t, −2 cos t, s).

Therefore,

|F| = ∫_0^{4π} ∫_0^2 √(4 cos² t + 4 sin² t + s²) dsdt = 4π ∫_0^2 √(4 + s²) ds = 8π (√2 − log(√2 − 1)).
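A midpoint-rule check of the helix area (an added illustration, not part of the original notes): note that 8π(√2 − log(√2 − 1)) = 8π(√2 + log(1 + √2)) ≈ 57.69.

```python
import math

n = 2000
h = 2.0 / n
# ||D1F x D2F|| = sqrt(4 + s^2); the t-integration contributes 4*pi
int_s = sum(math.sqrt(4.0 + ((k + 0.5) * h) ** 2) * h for k in range(n))
area = 4.0 * math.pi * int_s
exact = 8.0 * math.pi * (math.sqrt(2.0) - math.log(math.sqrt(2.0) - 1.0))
```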

Example 10.3 (Guldin's Rule (Paul Guldin, 1577–1643, Swiss mathematician)) Let f be a continuously differentiable function on [a, b] with f(x) ≥ 0 for all x ∈ [a, b]. Let the graph of f revolve around the x-axis and let F be the corresponding surface. We have

|F| = 2π ∫_a^b f(x) √(1 + f′(x)²) dx.

F = (x, f(x) cos ϕ, f(x) sin ϕ) x [a, b],ϕ [0, 2π] . { | ∈ ∈ } 262 10 Surface Integrals

We have

D_1F = (1, f′(x) cos ϕ, f′(x) sin ϕ),   D_2F = (0, −f sin ϕ, f cos ϕ),
D_1F × D_2F = (f f′, −f cos ϕ, −f sin ϕ);

so that dS = f(x) √(1 + f′(x)²) dxdϕ. Hence

|F| = ∫_a^b ( ∫_0^{2π} f(x) √(1 + f′(x)²) dϕ ) dx = 2π ∫_a^b f(x) √(1 + f′(x)²) dx.
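Guldin's rule can be sanity-checked on the sphere: for f(x) = √(R² − x²) the integrand f √(1 + f′²) collapses to the constant R, giving |F| = 4πR². The sketch below (an added illustration, not part of the original notes) evaluates the general formula numerically; midpoints keep the evaluation away from the endpoints, where f′ blows up.

```python
import math

def guldin_area(f, fp, a, b, n=2000):
    # |F| = 2*pi * integral of f(x)*sqrt(1 + f'(x)^2) over [a, b],
    # approximated by the midpoint rule.
    h = (b - a) / n
    return 2.0 * math.pi * sum(
        f(a + (k + 0.5) * h) * math.sqrt(1.0 + fp(a + (k + 0.5) * h) ** 2) * h
        for k in range(n))

R = 2.0
f = lambda x: math.sqrt(R * R - x * x)
fp = lambda x: -x / math.sqrt(R * R - x * x)
area = guldin_area(f, fp, -R, R)
exact = 4.0 * math.pi * R * R        # surface area of the sphere
```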

10.2 Scalar Surface Integrals

Let F and F = F(G) be as above, and let f : F → ℝ be a continuous function on the compact subset F ⊂ ℝ³.

f(~x) dS := f(F (s, t)) D F (s, t) D (s, t) dsdt k 1 × 2 k ZZF ZZG is called the scalar surface integral of f on F.

10.2.1 Other Forms for dS

(a) Let the surface F be given as the graph of a function, F(x, y) = (x, y, f(x, y)), (x, y) ∈ G. Then, see Example 10.1,

dS = √(1 + f_x² + f_y²) dxdy.

(b) Let the surface be given implicitly as F(x, y, z) = 0. Suppose the equation is locally solvable for z in a neighborhood of some point (x_0, y_0, z_0). Then the surface element (up to the sign) is given by

dS = √(F_x² + F_y² + F_z²) / |F_z| dxdy = ‖grad F‖ / |F_z| dxdy.

One checks that D_1F × D_2F = (F_x, F_y, F_z)/F_z.
(c) If F is given by F(s, t) = (x(s, t), y(s, t), z(s, t)), we have

dS = √(EG − H²) dsdt,

where

E = x_s² + y_s² + z_s²,   G = x_t² + y_t² + z_t²,   H = x_s x_t + y_s y_t + z_s z_t.

Indeed, using ‖a⃗ × b⃗‖ = ‖a⃗‖ ‖b⃗‖ sin ϕ and sin² ϕ = 1 − cos² ϕ, where ϕ is the angle spanned by a⃗ and b⃗, we get

EG − H² = ‖D_1F‖² ‖D_2F‖² − (D_1F · D_2F)² = ‖D_1F‖² ‖D_2F‖² (1 − cos² ϕ)
= ‖D_1F‖² ‖D_2F‖² sin² ϕ = ‖D_1F × D_2F‖²,

which proves the claim.

Example 10.4 (a) We give two different forms for the scalar surface element of a sphere. By (b), the sphere x² + y² + z² = R² has surface element

dS = ‖(2x, 2y, 2z)‖ / |2z| dxdy = (R/|z|) dxdy.

If

x = R cos ϕ sin ϑ,   y = R sin ϕ sin ϑ,   z = R cos ϑ,

we obtain

D_1 = F_ϑ = R (cos ϕ cos ϑ, sin ϕ cos ϑ, −sin ϑ),
D_2 = F_ϕ = R (−sin ϕ sin ϑ, cos ϕ sin ϑ, 0),
D_1 × D_2 = R² (cos ϕ sin² ϑ, sin ϕ sin² ϑ, sin ϑ cos ϑ).

Hence, dS = ‖D_1 × D_2‖ dϑdϕ = R² sin ϑ dϑdϕ.

(b) Riemann integral in ℝ³ and surface integrals over spheres. Let M = { (x, y, z) ∈ ℝ³ | ρ ≤ ‖(x, y, z)‖ ≤ R }, where R > ρ ≥ 0. Let f : M → ℝ be continuous. Let us denote the sphere of radius r by S_r = { (x, y, z) ∈ ℝ³ | x² + y² + z² = r² }. Then

∫∫∫_M f dxdydz = ∫_ρ^R ( ∫∫_{S_r} f(x⃗) dS ) dr = ∫_ρ^R r² ( ∫∫_{S_1} f(r x⃗) dS(x⃗) ) dr.

Indeed, by the previous example, and by our knowledge of spherical coordinates (r, ϑ, ϕ),

dxdydz = r² sin ϑ dr dϑ dϕ = dr dS_r.

On the other hand, on the unit sphere S_1, dS = sin ϑ dϑ dϕ, such that

dxdydz = r² dr dS,

which establishes the second formula.

10.2.2 Physical Application

(a) If ρ(x, y, z) is the mass density on a surface F, then ∫∫_F ρ dS is the total mass of F. The mass center of F has coordinates (x_c, y_c, z_c) with

x_c = ( 1 / ∫∫_F ρ dS ) ∫∫_F x ρ(x, y, z) dS,

and similarly for y_c and z_c.
(b) Let σ(x⃗) be a charge density on a surface F. Then

U(y⃗) = ∫∫_F σ(x⃗) / ‖y⃗ − x⃗‖ dS(x⃗),   y⃗ ∉ F,

is the potential generated by F.

10.3 Surface Integrals

10.3.1 Orientation

We want to define the notion of orientation for a regular surface. Let F be a regular (injective) surface with or without boundary. Then for every point x_0 ∈ F there exists the tangent plane E_{x_0}; the normal line to F at x_0 is uniquely defined. However, a unit vector on the normal line can have two different directions.

Definition 10.5 (a) Let F be a surface as above. A unit normal field to F is a continuous function n⃗ : F → ℝ³ with the following two properties for every x_0 ∈ F:

(i) n⃗(x_0) is orthogonal to the tangent plane to F at x_0.
(ii) ‖n⃗(x_0)‖ = 1.

(b) A regular surface F is called orientable if there exists a unit normal field on F.

Suppose F is an oriented, open, regular surface with piecewise smooth boundary ∂F. Let F(s, t) be a parametrization of F. We assume that the vector functions F, D_1F, and D_2F can be extended to continuous functions on F̄. The unit normal vector is given by

n⃗ = ε (D_1F × D_2F) / ‖D_1F × D_2F‖,

where ε = +1 or ε = −1 fixes the orientation of F. It turns out that for a regular surface F there either exist exactly two unit normal fields or there is no such field. If F is provided with an orientation, we write F_+ for the pair (F, n⃗); for F with the opposite orientation we write F_−.

Examples of non-orientable surfaces are the Möbius band and the real projective plane. Analytically the Möbius band is given by

F(s, t) = ( (1 + t cos(s/2)) sin s, (1 + t cos(s/2)) cos s, t sin(s/2) ),   (s, t) ∈ [0, 2π] × [−1/2, 1/2].

Definition 10.6 Let f⃗ : F → ℝ³ be a continuous vector field on F. The number

∫∫_{F_+} f⃗(x⃗) · n⃗ dS   (10.4)

is called the surface integral of the vector field f⃗ on F_+. We call

dS⃗ = n⃗ dS = ε D_1F × D_2F dsdt

the surface element of F.

Remark 10.1 (a) The surface integral is independent of the parametrization of F but depends on the orientation: ∫∫_{F_−} f⃗ · dS⃗ = −∫∫_{F_+} f⃗ · dS⃗.
For, let (s, t) = (s(ξ, η), t(ξ, η)) be a new parametrization with F(s(ξ, η), t(ξ, η)) = G(ξ, η). Then the Jacobian gives

dsdt = ∂(s, t)/∂(ξ, η) dξdη = (s_ξ t_η − s_η t_ξ) dξdη.

Further,

D_1G = D_1F s_ξ + D_2F t_ξ,   D_2G = D_1F s_η + D_2F t_η,

so that, using x⃗ × x⃗ = 0 and x⃗ × y⃗ = −y⃗ × x⃗,

D_1G × D_2G dξdη = (D_1F s_ξ + D_2F t_ξ) × (D_1F s_η + D_2F t_η) dξdη
= (s_ξ t_η − s_η t_ξ) D_1F × D_2F dξdη = D_1F × D_2F dsdt.

(b) The scalar surface integral is a special case of the surface integral, namely ∫∫_F f dS = ∫∫_F (f n⃗) · n⃗ dS.
(c) Special cases. Let F be the graph of a function f, F = { (x, y, f(x, y)) | (x, y) ∈ C }; then

dS⃗ = ± (−f_x, −f_y, 1) dxdy.

If the surface is given implicitly by F(x, y, z) = 0, and it is locally solvable for z, then

dS⃗ = ± (grad F / F_z) dxdy.

(d) Still another form of dS⃗:

∫∫_F f⃗ · dS⃗ = ε ∫∫_G f⃗(F(s, t)) · (D_1F × D_2F) dsdt,

f⃗(F(s, t)) · (D_1F × D_2F) = det ( f_1(F(s, t)) f_2(F(s, t)) f_3(F(s, t)) ; x_s(s, t) y_s(s, t) z_s(s, t) ; x_t(s, t) y_t(s, t) z_t(s, t) ).   (10.5)

(e) Again another notation. Computing the previous determinant, or the determinant (10.1), explicitly, we have

f⃗ · (D_1F × D_2F) = f_1 det ( y_s z_s ; y_t z_t ) + f_2 det ( z_s x_s ; z_t x_t ) + f_3 det ( x_s y_s ; x_t y_t )
= f_1 ∂(y, z)/∂(s, t) + f_2 ∂(z, x)/∂(s, t) + f_3 ∂(x, y)/∂(s, t).

Hence,

dS⃗ = ε D_1F × D_2F dsdt = ε ( ∂(y, z)/∂(s, t) dsdt, ∂(z, x)/∂(s, t) dsdt, ∂(x, y)/∂(s, t) dsdt ),

dS⃗ = ε (dydz, dzdx, dxdy).

Therefore we can write

∫∫_F f⃗ · dS⃗ = ∫∫_F (f_1 dydz + f_2 dzdx + f_3 dxdy).

In this setting,

∫∫_F f_1 dydz = ∫∫_F (f_1, 0, 0) · dS⃗ = ± ∫∫_G f_1(F(s, t)) ∂(y, z)/∂(s, t) dsdt.

Sometimes one uses

dS⃗ = ( cos(n⃗, e_1), cos(n⃗, e_2), cos(n⃗, e_3) ) dS,

since cos(n⃗, e_i) = n⃗ · e_i = n_i and dS⃗ = n⃗ dS. Note that we have surface integrals in the last two lines, not ordinary double integrals, since F is a surface in ℝ³ and f_1 = f_1(x, y, z) can also depend on x. The physical meaning of ∫∫_F f⃗ · dS⃗ is the flow of the vector field f⃗ through the surface F. The flow is (locally) positive if n⃗ and f⃗ are on the same side of the tangent plane to F, and negative in the other case.

Example 10.5 (a) Compute the surface integral

∫∫_{F_+} f dzdx

of f(x, y, z) = x²yz, where F is the graph of g(x, y) = x² + y over the unit square G = [0, 1] × [0, 1] with the downward directed unit normal field.
By Remark 10.1 (c),

dS⃗ = (g_x, g_y, −1) dxdy = (2x, 1, −1) dxdy.

Hence

∫∫_{F_+} f dzdx = ∫∫_{F_+} (0, f, 0) · dS⃗ = ∫∫_G x²y (x² + y) dxdy = ∫_0^1 dx ∫_0^1 (x⁴ y + x² y²) dy = 19/90.
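The value 19/90 is quickly confirmed by a midpoint double sum (an added illustration, not part of the original notes):

```python
# Midpoint double sum of x^2*y*(x^2 + y) over the unit square;
# exact value is 1/10 + 1/9 = 19/90.
n = 200
h = 1.0 / n
total = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(n):
        y = (j + 0.5) * h
        total += x * x * y * (x * x + y) * h * h
```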

(b) Let G denote the upper half-ball of radius R in ℝ³:

G = { (x, y, z) | x² + y² + z² ≤ R², z ≥ 0 },

and let F be the boundary of G with the orientation of the outer normal. Then F consists of the upper half-sphere

F_1 = { (x, y, √(R² − x² − y²)) | x² + y² ≤ R² },   z = g(x, y) = √(R² − x² − y²),

with the upward directed unit normal field, and of the disc F_2 in the x-y-plane,

F_2 = { (x, y, 0) | x² + y² ≤ R² },   z = g(x, y) = 0,

with the downward directed normal. Let f⃗(x, y, z) = (ax, by, cz). We want to compute

∫∫_{F_+} f⃗ · dS⃗.

By Remark 10.1 (c), the surface element of the half-sphere F_1 is dS⃗ = (1/z)(x, y, z) dxdy. Hence

I_1 = ∫∫_{F_1+} f⃗ · dS⃗ = ∫∫_{B_R} (ax, by, cz) · (1/z)(x, y, z) dxdy = ∫∫_{B_R} ( (ax² + by² + cz²)/z )|_{z=g(x,y)} dxdy.

Using polar coordinates x = r cos ϕ, y = r sin ϕ, r ∈ [0, R], and z = √(R² − x² − y²) = √(R² − r²), we get

I_1 = ∫_0^{2π} ∫_0^R ( a r² cos² ϕ + b r² sin² ϕ + c (R² − r²) ) / √(R² − r²) · r dr dϕ.

Noting ∫_0^{2π} sin² ϕ dϕ = ∫_0^{2π} cos² ϕ dϕ = π, we continue

I_1 = π ∫_0^R ( a r³/√(R² − r²) + b r³/√(R² − r²) + 2c r √(R² − r²) ) dr.

Using r = R sin t, dr = R cos t dt, we have

∫_0^R r³/√(R² − r²) dr = ∫_0^{π/2} ( R³ sin³ t · R cos t )/( R √(1 − sin² t) ) dt = R³ ∫_0^{π/2} sin³ t dt = (2/3) R³.

Hence,

I_1 = (2π/3) R³ (a + b) + π c ∫_0^R (R² − r²)^{1/2} d(r²)
= (2π/3) R³ (a + b) + π c [ −(2/3)(R² − r²)^{3/2} ]_0^R
= (2π/3) R³ (a + b + c).

In the case of the disc F_2 we have z = g(x, y) = 0, so that g_x = g_y = 0 and

dS⃗ = (0, 0, −1) dxdy

by Remark 10.1 (c). Hence

∫∫_{F_2+} f⃗ · dS⃗ = ∫∫_{B_R} (ax, by, cz) · (0, 0, −1) dxdy = −c ∫∫_{B_R} z|_{z=0} dxdy = 0.

Hence,

∫∫_F (ax, by, cz) · dS⃗ = (2π/3) R³ (a + b + c).

10.4 Gauß’ Divergence Theorem

The aim is to generalize the fundamental theorem of calculus,
\[
\int_a^b f'(x)\,dx = f(b) - f(a),
\]
to higher dimensions. Note that $a$ and $b$ form the boundary of the segment $[a,b]$. There are three ways to do this:
\[
\iiint_G \operatorname{div}\vec f\,dx\,dy\,dz = \iint_{(\partial G)_+} \vec f\cdot d\vec S \quad\Rightarrow\quad \text{Gauß' theorem in } \mathbb{R}^3,
\]
\[
\iint_G \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy = \int_{(\partial G)_+} \vec f\cdot d\vec x \quad\Rightarrow\quad \text{Green's theorem in } \mathbb{R}^2,
\]
\[
\iint_{F_+} \operatorname{curl}\vec f\cdot d\vec S = \int_{(\partial F)_+} \vec f\cdot d\vec x \quad\Rightarrow\quad \text{Stokes' theorem}.
\]
Let $G \subset \mathbb{R}^3$ be a bounded domain (open, connected) such that its boundary $F = \partial G$ satisfies the following assumptions:

1. $F$ is a union of regular, orientable surfaces $F_i$. The parametrization $F_i(s,t)$, $(s,t)\in C_i$, of $F_i$ as well as $D_1F_i$ and $D_2F_i$ are continuous vector functions on $C_i$; $C_i$ is a domain in $\mathbb{R}^2$.

2. Let $F_i$ be oriented by the outer normal (with respect to $G$).

3. There is given a continuously differentiable vector field $\vec f\colon \overline G \to \mathbb{R}^3$ on $\overline G$. (More precisely, there exist an open set $U \supset \overline G$ and a continuously differentiable function $\tilde f\colon U \to \mathbb{R}^3$ such that $\tilde f|_{\overline G} = \vec f$.)

Theorem 10.1 (Gauß' Divergence Theorem) Under the above assumptions we have

\[
\iiint_G \operatorname{div}\vec f\,dx\,dy\,dz = \iint_{\partial G_+} \vec f\cdot d\vec S. \qquad (10.6)
\]
Sometimes the theorem is called the Gauß–Ostrogradski theorem or simply Ostrogradski's theorem. Other ways of writing it:
\[
\iiint_G \left(\frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y} + \frac{\partial f_3}{\partial z}\right) dx\,dy\,dz
= \iint_{\partial G} (f_1\,dy\,dz + f_2\,dz\,dx + f_3\,dx\,dy). \qquad (10.7)
\]
The theorem holds for more general regions $G \subset \mathbb{R}^3$. We give a proof for

\[
G = \{(x,y,z) \mid (x,y)\in C,\ \alpha(x,y) \le z \le \beta(x,y)\},
\]
where $C \subset \mathbb{R}^2$ is a domain and $\alpha, \beta \in C^1(C)$ define regular top and bottom surfaces $F_1$ and $F_2$ of $F$, respectively.

Proof. We prove only one part of (10.7), namely for $\vec f = (0,0,f_3)$:
\[
\iiint_G \frac{\partial f_3}{\partial z}\,dx\,dy\,dz = \iint_{\partial G} f_3\,dx\,dy. \qquad (10.8)
\]
By Fubini's theorem, the left side reads
\[
\iiint_G \frac{\partial f_3}{\partial z}\,dx\,dy\,dz
= \iint_C \left(\int_{\alpha(x,y)}^{\beta(x,y)} \frac{\partial f_3}{\partial z}\,dz\right) dx\,dy
= \iint_C \bigl(f_3(x,y,\beta(x,y)) - f_3(x,y,\alpha(x,y))\bigr)\,dx\,dy, \qquad (10.9)
\]
where the last equality is by the fundamental theorem of calculus. Now we are going to compute the surface integral. The outer normal for the top surface is $(-\beta_x(x,y), -\beta_y(x,y), 1)$, such that
\[
I_1 = \iint_{F_{1+}} f_3\,dx\,dy = \iint_C (0,0,f_3)\cdot(-\beta_x(x,y), -\beta_y(x,y), 1)\,dx\,dy
= \iint_C f_3(x,y,\beta(x,y))\,dx\,dy.
\]

Since the bottom surface $F_2$ is oriented downward, the outer normal is $(\alpha_x(x,y), \alpha_y(x,y), -1)$, such that
\[
I_2 = \iint_{F_{2+}} f_3\,dx\,dy = -\iint_C f_3(x,y,\alpha(x,y))\,dx\,dy.
\]

Finally, the shell $F_3$ is parametrized by an angle $\varphi$ and $z$:
\[
F_3 = \{(r(\varphi)\cos\varphi,\, r(\varphi)\sin\varphi,\, z) \mid \alpha(x,y) \le z \le \beta(x,y),\ \varphi\in[0,2\pi]\}.
\]
Since $D_2F_3 = (0,0,1)$, the normal vector is orthogonal to the $z$-axis, $\vec n = (n_1, n_2, 0)$. Therefore,
\[
I_3 = \iint_{F_{3+}} f_3\,dx\,dy = \iint_{F_{3+}} (0,0,f_3)\cdot(n_1,n_2,0)\,dS = 0.
\]

Comparing I1 + I2 + I3 with (10.9) proves the theorem in this special case.

Remarks 10.2 (a) Gauß' divergence theorem can be used to compute the volume of the domain $G \subset \mathbb{R}^3$. Suppose the boundary $\partial G$ of $G$ has the orientation of the outer normal. Then
\[
v(G) = \iint_{\partial G} x\,dy\,dz = \iint_{\partial G} y\,dz\,dx = \iint_{\partial G} z\,dx\,dy.
\]
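As a hedged numerical illustration, take $G$ the ball of radius $R$ and the last of the three formulas. The top and bottom hemispheres each contribute $\iint_{B_R}\sqrt{R^2-r^2}\,dx\,dy$ (on the bottom, both $z$ and the orientation of $dx\,dy$ change sign), so in polar coordinates $v(G) = 2\int_0^{2\pi}\!\int_0^R \sqrt{R^2-r^2}\,r\,dr\,d\varphi$:

```python
import math

# Check of Remark 10.2 (a) for the ball of radius R:
# v(G) = ∬_{∂G} z dxdy, reduced to a 1-d radial integral as explained above.
R = 2.0

def ball_volume_by_flux(n=1000):
    h = R / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        total += math.sqrt(R**2 - r**2) * r
    return 2 * (2 * math.pi) * total * h

vol = ball_volume_by_flux()
exact = 4 / 3 * math.pi * R**3
print(vol, exact)
```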

(b) Applying the mean value theorem to the left-hand side of Gauß' formula, we have for any bounded region $G$ containing $x_0$
\[
\iiint_G \operatorname{div}\vec f\,dx\,dy\,dz = \operatorname{div}\vec f(x_0+h)\,v(G) = \iint_{\partial G_+} \vec f\cdot d\vec S,
\]
where $h$ is a small vector and $v(G) = \iiint_G dx\,dy\,dz$ is the volume of $G$. Hence
\[
\operatorname{div}\vec f(x_0) = \lim_{G\to x_0} \frac{1}{v(G)} \iint_{\partial G_+} \vec f\cdot d\vec S
= \lim_{\varepsilon\to 0} \frac{1}{\frac{4}{3}\pi\varepsilon^3} \iint_{S_\varepsilon(x_0)_+} \vec f\cdot d\vec S,
\]
where the region $G$ tends to $x_0$. In the second formula we have chosen $G = B_\varepsilon(x_0)$, the open ball of radius $\varepsilon$ with center $x_0$. The right-hand side can be thought of as the source density of the field $\vec f$. In particular, the right side gives a basis-independent description of $\operatorname{div}\vec f$.
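This limit can be observed numerically. For the sample field $\vec f(x,y,z) = (x^2, y^2, z^2)$ (our choice, not from the text), $\operatorname{div}\vec f = 2(x+y+z)$; the flux through a small sphere around $x_0$, divided by the ball's volume, should approach that value:

```python
import math

# Flux of f = (x^2, y^2, z^2) through the sphere S_eps(x0), per unit volume;
# midpoint rule in spherical angles (theta, phi), dS = eps^2 sin(theta) dtheta dphi.
x0, y0, z0 = 0.5, 0.25, -0.25
eps = 1e-3

def flux_over_volume(n=200):
    ht, hp = math.pi / n, 2 * math.pi / n
    flux = 0.0
    for i in range(n):
        th = (i + 0.5) * ht
        for j in range(n):
            ph = (j + 0.5) * hp
            nx = math.sin(th) * math.cos(ph)   # outer unit normal on the sphere
            ny = math.sin(th) * math.sin(ph)
            nz = math.cos(th)
            x, y, z = x0 + eps * nx, y0 + eps * ny, z0 + eps * nz
            flux += (x**2 * nx + y**2 * ny + z**2 * nz) * math.sin(th)
    flux *= eps**2 * ht * hp
    return flux / (4 / 3 * math.pi * eps**3)

density = flux_over_volume()
print(density, 2 * (x0 + y0 + z0))   # both close to 1.0
```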

Example 10.6 We want to compute the surface integral from Example 10.5 (b) using Gauß' theorem:
\[
\iint_{F_+} \vec f\cdot d\vec S = \iiint_G \operatorname{div}\vec f\,dx\,dy\,dz
= \iiint_{x^2+y^2+z^2 \le R^2,\ z\ge 0} (a+b+c)\,dx\,dy\,dz = \frac{2\pi R^3}{3}(a+b+c).
\]
We now derive some consequences of Gauß' divergence theorem which play an important role in partial differential equations.

Recall (Proposition 8.9) that the directional derivative of a function $v\colon U \to \mathbb{R}$, $U \subset \mathbb{R}^n$, at $x_0$ in the direction of the unit vector $\vec n$ is given by $D_{\vec n}v(x_0) = \operatorname{grad} v(x_0)\cdot\vec n$.

Notation. Let $U \subset \mathbb{R}^3$ be open and $F_+ \subset U$ an oriented, regular open surface with the unit normal vector $\vec n(x_0)$ at $x_0 \in F$. Let $g\colon U \to \mathbb{R}$ be differentiable. Then
\[
\frac{\partial g}{\partial\vec n}(x_0) = \operatorname{grad} g(x_0)\cdot\vec n(x_0) \qquad (10.10)
\]
is called the normal derivative of $g$ on $F_+$ at $x_0$.

Proposition 10.2 Let $G$ be a region as in Gauß' theorem, let the boundary $\partial G$ be oriented with the outer normal, and let $u, v$ be twice continuously differentiable on an open set $U$ with $\overline G \subset U$. Then we have Green's identities:
\[
\iiint_G \nabla(u)\cdot\nabla(v)\,dx\,dy\,dz = \iint_{\partial G} u\,\frac{\partial v}{\partial\vec n}\,dS - \iiint_G u\,\Delta(v)\,dx\,dy\,dz, \qquad (10.11)
\]
\[
\iiint_G \bigl(u\,\Delta(v) - v\,\Delta(u)\bigr)\,dx\,dy\,dz = \iint_{\partial G} \left(u\,\frac{\partial v}{\partial\vec n} - v\,\frac{\partial u}{\partial\vec n}\right) dS, \qquad (10.12)
\]
\[
\iiint_G \Delta(u)\,dx\,dy\,dz = \iint_{\partial G} \frac{\partial u}{\partial\vec n}\,dS. \qquad (10.13)
\]
Proof. Put $f = u\,\nabla(v)$. Then by nabla calculus
\[
\operatorname{div} f = \nabla\cdot(u\,\nabla v) = \nabla(u)\cdot\nabla(v) + u\,\nabla\cdot(\nabla v) = \operatorname{grad} u\cdot\operatorname{grad} v + u\,\Delta(v).
\]
Applying Gauß' theorem, we obtain
\[
\iiint_G \operatorname{div} f\,dx\,dy\,dz
= \iiint_G \operatorname{grad} u\cdot\operatorname{grad} v\,dx\,dy\,dz + \iiint_G u\,\Delta(v)\,dx\,dy\,dz
= \iint_{\partial G} u\,\operatorname{grad} v\cdot\vec n\,dS = \iint_{\partial G} u\,\frac{\partial v}{\partial\vec n}\,dS.
\]
This proves Green's first identity. Changing the roles of $u$ and $v$ and taking the difference, we obtain the second formula. Inserting $v = 1$ into (10.12) and multiplying by $-1$, we get (10.13).

Application to Laplace equation

Let $u_1$ and $u_2$ be functions on $G$ with $\Delta u_1 = \Delta u_2$ which coincide on the boundary $\partial G$, i.e. $u_1(x) = u_2(x)$ for all $x \in \partial G$. Then $u_1 \equiv u_2$ in $G$. In other words, a harmonic function is uniquely determined by its boundary values.

Proof. Put $u = u_1 - u_2$ and apply Green's first formula (10.11) with $u = v$. Note that $\Delta(u) = \Delta(u_1) - \Delta(u_2) = 0$ ($u$ is harmonic in $G$) and $u(x) = u_1(x) - u_2(x) = 0$ on the boundary, $x \in \partial G$. Hence
\[
\iiint_G \nabla(u)\cdot\nabla(u)\,dx\,dy\,dz
= \iint_{\partial G} \underbrace{u}_{=0 \text{ on } \partial G}\,\frac{\partial u}{\partial\vec n}\,dS
- \iiint_G u\,\underbrace{\Delta(u)}_{=0}\,dx\,dy\,dz = 0.
\]
Since $\nabla(u)\cdot\nabla(u) = \|\nabla(u)\|^2 \ge 0$ and $\|\nabla(u)\|^2$ is a continuous function on $G$, by homework 14.3, $\|\nabla(u)\|^2 = 0$ on $G$; hence $\nabla(u) = 0$ on $G$. By the Mean Value Theorem, Corollary 7.12, $u$ is constant on $G$. Since $u = 0$ on $\partial G$, $u(x) = 0$ for all $x \in G$.

10.5 Stokes’ Theorem

Roughly speaking, Stokes' theorem relates a surface integral over a surface $F$ to a line integral over the boundary $\partial F$. In the case of a plane surface in $\mathbb{R}^2$, it is called Green's theorem.

10.5.1 Green’s Theorem

Let $G$ be a domain in $\mathbb{R}^2$ with piecewise smooth (differentiable) boundaries $\Gamma_1, \Gamma_2, \dots, \Gamma_k$. We give an orientation to the boundary: the outer curve is oriented counterclockwise (mathematically positive), the inner boundaries are oriented in the opposite direction.

Theorem 10.3 (Green's Theorem) Let $(P,Q)$ be a continuously differentiable vector field on $G$ and let the boundary $\Gamma = \partial G$ be oriented as above. Then
\[
\iint_G \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy = \int_\Gamma P\,dx + Q\,dy. \qquad (10.14)
\]
Proof. (a) First, we consider a region $G$ of type 1 in the plane, bounded below by the graph $y = \varphi(x)$ and above by $y = \psi(x)$ for $x \in [a,b]$, and we will prove that
\[
\iint_G \frac{\partial P}{\partial y}\,dx\,dy = -\int_\Gamma P\,dx. \qquad (10.15)
\]

The double integral on the left may be evaluated as an iterated integral (Fubini's theorem); writing $\Gamma$ as the union of the left edge $\Gamma_1$, the lower graph $\Gamma_2$, the right edge $\Gamma_3$, and the upper graph $\Gamma_4$, we have
\[
\iint_G \frac{\partial P}{\partial y}\,dx\,dy = \int_a^b \left(\int_{\varphi(x)}^{\psi(x)} P_y(x,y)\,dy\right) dx
= \int_a^b \bigl(P(x,\psi(x)) - P(x,\varphi(x))\bigr)\,dx.
\]

The latter equality is due to the fundamental theorem of calculus. To compute the line integral, we parametrize the four parts of $\Gamma$ in a natural way:
\[
\begin{aligned}
-\Gamma_1:\ & \vec x_1(t) = (a,t), && t\in[\varphi(a),\psi(a)], && dx = 0,\ dy = dt,\\
\Gamma_2:\ & \vec x_2(t) = (t,\varphi(t)), && t\in[a,b], && dx = dt,\ dy = \varphi'(t)\,dt,\\
\Gamma_3:\ & \vec x_3(t) = (b,t), && t\in[\varphi(b),\psi(b)], && dx = 0,\ dy = dt,\\
-\Gamma_4:\ & \vec x_4(t) = (t,\psi(t)), && t\in[a,b], && dx = dt,\ dy = \psi'(t)\,dt.
\end{aligned}
\]

Since $dx = 0$ on $\Gamma_1$ and $\Gamma_3$, we are left with the line integrals over $\Gamma_2$ and $\Gamma_4$:
\[
\int_\Gamma P\,dx = \int_a^b P(t,\varphi(t))\,dt - \int_a^b P(t,\psi(t))\,dt.
\]
Let us prove the second part, $\iint_G \frac{\partial Q}{\partial x}\,dx\,dy = \int_\Gamma Q\,dy$. Using Proposition 7.24 we have
\[
\int_{\varphi(x)}^{\psi(x)} \frac{\partial Q}{\partial x}(x,y)\,dy
= \frac{d}{dx}\left(\int_{\varphi(x)}^{\psi(x)} Q(x,y)\,dy\right) - \psi'(x)\,Q(x,\psi(x)) + \varphi'(x)\,Q(x,\varphi(x)).
\]
Inserting this into $\iint_G \frac{\partial Q}{\partial x}\,dx\,dy = \int_a^b \left(\int_{\varphi(x)}^{\psi(x)} \frac{\partial Q}{\partial x}\,dy\right) dx$, we get
\[
\begin{aligned}
\iint_G \frac{\partial Q}{\partial x}\,dx\,dy
&= \int_a^b \left(\frac{d}{dx}\int_{\varphi(x)}^{\psi(x)} Q(x,y)\,dy - \psi'(x)\,Q(x,\psi(x)) + \varphi'(x)\,Q(x,\varphi(x))\right) dx\\
&= \int_{\varphi(b)}^{\psi(b)} Q(b,y)\,dy - \int_{\varphi(a)}^{\psi(a)} Q(a,y)\,dy - \int_a^b Q(x,\psi(x))\,\psi'(x)\,dx \qquad (10.16)\\
&\quad + \int_a^b Q(x,\varphi(x))\,\varphi'(x)\,dx. \qquad (10.17)
\end{aligned}
\]
We compute the line integrals:

\[
\int_{\Gamma_1} Q\,dy = -\int_{\varphi(a)}^{\psi(a)} Q(a,y)\,dy, \qquad
\int_{\Gamma_3} Q\,dy = \int_{\varphi(b)}^{\psi(b)} Q(b,y)\,dy.
\]
Further,
\[
\int_{\Gamma_2} Q\,dy = \int_a^b Q(t,\varphi(t))\,\varphi'(t)\,dt, \qquad
\int_{\Gamma_4} Q\,dy = -\int_a^b Q(t,\psi(t))\,\psi'(t)\,dt.
\]
Adding up these integrals and comparing the result with (10.16)–(10.17), the proof for type 1 regions is complete.

[Figure: a region of type 1, bounded by the graphs $y = \varphi(x)$ and $y = \psi(x)$, and a region decomposed into subregions of types 1 and 2.]

Exactly in the same way one proves that (10.14) holds if $G$ is a type 2 region. (b) By breaking a region $G$ up into smaller regions, each of which is both of type 1 and of type 2, one obtains Green's theorem for $G$: the line integrals along the inner boundaries cancel, leaving the line integral around the boundary of $G$. (c) If the region has a hole, one can split it into two simply connected regions, for which Green's theorem is valid by the arguments of (b).

Application: Area of a Region

If $\Gamma$ is a curve which bounds a region $G$, then the area of $G$ is
\[
A = \int_\Gamma (1-\alpha)\,x\,dy - \alpha\,y\,dx,
\]
where $\alpha \in \mathbb{R}$ is arbitrary; in particular,
\[
A = \frac12 \int_\Gamma x\,dy - y\,dx = \int_\Gamma x\,dy = -\int_\Gamma y\,dx. \qquad (10.18)
\]
Proof. Choosing $Q = (1-\alpha)x$, $P = -\alpha y$, one has
\[
A = \iint_G dx\,dy = \iint_G \bigl((1-\alpha) - (-\alpha)\bigr)\,dx\,dy = \iint_G (Q_x - P_y)\,dx\,dy
= \int_\Gamma P\,dx + Q\,dy = -\alpha\int_\Gamma y\,dx + (1-\alpha)\int_\Gamma x\,dy.
\]
Inserting $\alpha = 0$, $\alpha = 1$, and $\alpha = \frac12$ yields the assertion.
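The area formula (10.18) can be tried out numerically on any closed curve; here a circle of radius $r$ (our sample), for which $x\,dy - y\,dx = r^2\,dt$, so the discretized line integral is essentially exact:

```python
import math

# Area enclosed by a curve as a line integral, A = (1/2) ∮ (x dy − y dx),
# for the circle (r cos t, r sin t), t in [0, 2π]; midpoint discretization.
def area_by_line_integral(r, n=10000):
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = r * math.cos(t), r * math.sin(t)
        dx, dy = -r * math.sin(t) * h, r * math.cos(t) * h
        total += 0.5 * (x * dy - y * dx)
    return total

A = area_by_line_integral(2.0)
print(A, math.pi * 4)   # both close to 12.566
```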

Example 10.7 Find the area bounded by the ellipse $\Gamma\colon \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$. We parametrize $\Gamma$ by $\vec x(t) = (a\cos t, b\sin t)$, $t\in[0,2\pi]$, $\dot{\vec x}(t) = (-a\sin t, b\cos t)$. Then (10.18) gives
\[
A = \frac12\int_0^{2\pi} a\cos t\; b\cos t\,dt - \frac12\int_0^{2\pi} b\sin t\,(-a\sin t)\,dt = \frac{ab}{2}\int_0^{2\pi} dt = \pi ab.
\]

10.5.2 Stokes' Theorem

Conventions: Let $F_+$ be a regular, oriented surface and let $\Gamma = \partial F_+$ be the boundary of $F$ with the induced orientation: the orientation of the surface (normal vector) together with the orientation of the boundary form a right-handed screw. A second way to get the induced orientation: seen from the arrowhead of the unit normal vector to the surface, the boundary curve has counterclockwise orientation.

Theorem 10.4 (Stokes' theorem) Let $F_+$ be a smooth regular oriented surface with a parametrization $F \in C^2(G)$, where $G$ is a plane region to which Green's theorem applies. Let $\Gamma = \partial F$ be the boundary with the above orientation. Further, let $\vec f$ be a continuously differentiable vector field on $F$. Then we have
\[
\iint_{F_+} \operatorname{curl}\vec f\cdot d\vec S = \int_{\partial F_+} \vec f\cdot d\vec x. \qquad (10.19)
\]

This can also be written as

\[
\iint_{F_+} \left(\frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z}\right) dy\,dz
+ \left(\frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x}\right) dz\,dx
+ \left(\frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y}\right) dx\,dy
= \int_\Gamma f_1\,dx + f_2\,dy + f_3\,dz.
\]

Proof. Main idea: reduction to Green's theorem. Since both sides of the equation are additive with respect to the vector field $\vec f$, it suffices to prove the statement for the vector fields $(f_1,0,0)$, $(0,f_2,0)$, and $(0,0,f_3)$. We show the theorem for $\vec f = (f,0,0)$; the other cases are quite analogous:
\[
\iint_F \left(\frac{\partial f}{\partial z}\,dz\,dx - \frac{\partial f}{\partial y}\,dx\,dy\right) = \int_{\partial F} f\,dx.
\]
Let $F(u,v)$, $(u,v)\in G$, be the parametrization of the surface $F$. Then
\[
dx = \frac{\partial x}{\partial u}\,du + \frac{\partial x}{\partial v}\,dv,
\]
such that the line integral on the right reads, with $P(u,v) = f(x(u,v), y(u,v), z(u,v))\,\frac{\partial x}{\partial u}(u,v)$ and $Q(u,v) = f\,\frac{\partial x}{\partial v}$,
\[
\begin{aligned}
\int_{\partial F} f\,dx &= \int_{\partial G} f x_u\,du + f x_v\,dv = \int_{\partial G} P\,du + Q\,dv
\underset{\text{Green's th.}}{=} \iint_G \left(-\frac{\partial P}{\partial v} + \frac{\partial Q}{\partial u}\right) du\,dv\\
&= \iint_G \bigl(-(f_v x_u + f x_{uv}) + (f_u x_v + f x_{vu})\bigr)\,du\,dv
= \iint_G (f_u x_v - f_v x_u)\,du\,dv\\
&= \iint_G \bigl((f_x x_u + f_y y_u + f_z z_u)x_v - (f_x x_v + f_y y_v + f_z z_v)x_u\bigr)\,du\,dv\\
&= \iint_G \bigl(-f_y(x_u y_v - x_v y_u) + f_z(z_u x_v - z_v x_u)\bigr)\,du\,dv\\
&= \iint_G \left(-f_y\,\frac{\partial(x,y)}{\partial(u,v)} + f_z\,\frac{\partial(z,x)}{\partial(u,v)}\right) du\,dv
= \iint_F -f_y\,dx\,dy + f_z\,dz\,dx.
\end{aligned}
\]

This completes the proof.

Remark 10.3 (a) Green's theorem is a special case with $F = G\times\{0\}$, $\vec n = (0,0,1)$ (orientation) and $\vec f = (P,Q,0)$.

(b) The right side of (10.19) is called the circulation of the vector field $\vec f$ along the closed curve $\Gamma$. Now let $x_0 \in F$ be fixed and consider smaller and smaller neighborhoods $F_{0+}$ of $x_0$ with boundaries $\Gamma_0$. By Stokes' theorem and by the mean value theorem of integration,
\[
\int_{\Gamma_0} \vec f\cdot d\vec x = \iint_{F_0} \operatorname{curl}\vec f\cdot\vec n\,dS = \operatorname{curl}\vec f(\tilde x)\cdot\vec n(\tilde x)\,\operatorname{area}(F_0)
\]
for some point $\tilde x \in F_0$. Hence,
\[
\operatorname{curl}\vec f(x_0)\cdot\vec n(x_0) = \lim_{F_0\to x_0} \frac{1}{|F_0|}\int_{\partial F_0} \vec f\cdot d\vec x.
\]
We call $\operatorname{curl}\vec f(x_0)\cdot\vec n(x_0)$ the infinitesimal circulation of the vector field $\vec f$ at $x_0$ corresponding to the unit normal vector $\vec n$.
(c) Stokes' theorem then says that the integral of the infinitesimal circulation of a vector field $\vec f$, corresponding to the unit normal vector $\vec n$, over $F$ equals the circulation of the vector field along the boundary of $F$.

Path Independence of Line Integrals

We are now going to complete the proof of Proposition 8.3 and show that for a simply connected region $G \subset \mathbb{R}^3$ and a twice continuously differentiable vector field $\vec f$ with $\operatorname{curl}\vec f = 0$ for all $x \in G$, the vector field $\vec f$ is conservative.

Proof. Indeed, let $\Gamma$ be a closed, regular, piecewise differentiable curve $\Gamma \subset G$ and let $\Gamma$ be the boundary of a smooth regular oriented surface $F_+$, $\Gamma = \partial F_+$, such that $\Gamma$ has the induced orientation. Inserting $\operatorname{curl}\vec f = 0$ into Stokes' theorem gives
\[
\iint_{F_+} \operatorname{curl}\vec f\cdot d\vec S = 0 = \int_\Gamma \vec f\cdot d\vec x;
\]
the line integral is path independent and hence $\vec f$ is conservative. Note that the region must be simply connected; otherwise it is in general impossible to find $F$ with boundary $\Gamma$.
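For a concrete (hedged) illustration: the field $\vec f = \operatorname{grad}(xyz) = (yz, xz, xy)$ is curl-free on all of $\mathbb{R}^3$, so its circulation over any closed curve vanishes. We check this numerically over the sample circle $(\cos t, \sin t, \tfrac12)$:

```python
import math

# Circulation of the curl-free field f = (yz, xz, xy) over a closed curve;
# the discretized line integral should be (numerically) zero.
def circulation(n=20000):
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y, z = math.cos(t), math.sin(t), 0.5
        dx, dy, dz = -math.sin(t) * h, math.cos(t) * h, 0.0
        total += y * z * dx + x * z * dy + x * y * dz
    return total

c = circulation()
print(c)   # close to 0
```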

10.5.3 Vector Potential and the Inverse Problem of Vector Analysis

Let $\vec f$ be a continuously differentiable vector field on the simply connected region $G \subset \mathbb{R}^3$.

Definition 10.7 The vector field $\vec f$ on $G$ is called a source-free field (solenoidal field) if there exists a vector field $\vec g$ on $G$ with $\vec f = \operatorname{curl}\vec g$. Then $\vec g$ is called a vector potential of $\vec f$.

Theorem 10.5 $\vec f$ is source-free if and only if $\operatorname{div}\vec f = 0$.

Proof. (a) If $\vec f = \operatorname{curl}\vec g$, then $\operatorname{div}\vec f = \operatorname{div}(\operatorname{curl}\vec g) = 0$.
(b) To simplify notation, we skip the arrows. We explicitly construct a vector potential $g$ to $f$ with $g = (g_1, g_2, 0)$ and $\operatorname{curl} g = f$. This means
\[
f_1 = -\frac{\partial g_2}{\partial z}, \qquad
f_2 = \frac{\partial g_1}{\partial z}, \qquad
f_3 = \frac{\partial g_2}{\partial x} - \frac{\partial g_1}{\partial y}.
\]
Integrating the first two equations, we obtain
\[
g_2 = -\int_{z_0}^z f_1(x,y,t)\,dt + h(x,y), \qquad
g_1 = \int_{z_0}^z f_2(x,y,t)\,dt,
\]
where $h(x,y)$ is the integration constant, not depending on $z$. Inserting this into the third equation, we obtain
\[
\begin{aligned}
\frac{\partial g_2}{\partial x} - \frac{\partial g_1}{\partial y}
&= -\int_{z_0}^z \frac{\partial f_1}{\partial x}(x,y,t)\,dt + h_x(x,y) - \int_{z_0}^z \frac{\partial f_2}{\partial y}(x,y,t)\,dt\\
&= -\int_{z_0}^z \left(\frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y}\right) dt + h_x
\;\underset{\operatorname{div} f = 0}{=}\; \int_{z_0}^z \frac{\partial f_3}{\partial z}(x,y,t)\,dt + h_x\\
&= f_3(x,y,z) - f_3(x,y,z_0) + h_x(x,y).
\end{aligned}
\]
This yields $h_x(x,y) = f_3(x,y,z_0)$. Integration with respect to $x$ finally gives $h(x,y) = \int_{x_0}^x f_3(t,y,z_0)\,dt$; the third equation is satisfied and $\operatorname{curl} g = f$.
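The construction can be carried out explicitly. For the divergence-free sample field $f = (y, z, x)$ with $z_0 = x_0 = 0$ (our choice), the formulas give $g_1 = \int_0^z t\,dt = z^2/2$ and $g_2 = -yz + h$ with $h(x,y) = \int_0^x t\,dt = x^2/2$. The sketch below verifies $\operatorname{curl} g = f$ by central finite differences:

```python
# Vector potential g = (z^2/2, -y z + x^2/2, 0) for f = (y, z, x),
# obtained from the proof's construction; check curl g = f numerically.
def g(x, y, z):
    return (z**2 / 2, -y * z + x**2 / 2, 0.0)

def curl_g(x, y, z, eps=1e-5):
    def d(comp, axis):
        """Central difference of component `comp` of g along `axis`."""
        plus = [v + eps * (axis == k) for k, v in enumerate((x, y, z))]
        minus = [v - eps * (axis == k) for k, v in enumerate((x, y, z))]
        return (g(*plus)[comp] - g(*minus)[comp]) / (2 * eps)
    return (d(2, 1) - d(1, 2),   # ∂g3/∂y − ∂g2/∂z
            d(0, 2) - d(2, 0),   # ∂g1/∂z − ∂g3/∂x
            d(1, 0) - d(0, 1))   # ∂g2/∂x − ∂g1/∂y

p = (0.7, -0.3, 1.2)
print(curl_g(*p))   # close to f(p) = (y, z, x) = (-0.3, 1.2, 0.7)
```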

Remarks 10.4 (a) The proof of the second direction is constructive; you can use this method to calculate a vector potential explicitly. You can also try another ansatz, say $g = (0, g_2, g_3)$ or $g = (g_1, 0, g_3)$.
(b) If $g$ is a vector potential for $f$ and $U \in C^2(G)$, then $\tilde g = g + \operatorname{grad} U$ is also a vector potential for $f$. Indeed, $\operatorname{curl}\tilde g = \operatorname{curl} g + \operatorname{curl}\operatorname{grad} U = f$.

The Inverse Problem of Vector Analysis

Let $h$ be a function and $\vec a$ a vector field on $G$, both continuously differentiable. Problem: Does there exist a vector field $\vec f$ such that
\[
\operatorname{div}\vec f = h \quad\text{and}\quad \operatorname{curl}\vec f = \vec a\,?
\]

Proposition 10.6 The above problem has a solution if and only if $\operatorname{div}\vec a = 0$.

Proof. The condition is necessary since $\operatorname{div}\vec a = \operatorname{div}\operatorname{curl}\vec f = 0$. We skip the vector arrows. For the other direction we use the ansatz $f = r + s$ with
\[
\operatorname{curl} r = 0, \quad \operatorname{div} r = h, \qquad (10.20)
\]
\[
\operatorname{curl} s = a, \quad \operatorname{div} s = 0. \qquad (10.21)
\]
Since $\operatorname{curl} r = 0$, by Proposition 8.3 there exists a potential $U$ with $r = \operatorname{grad} U$. Then $\operatorname{curl} r = 0$ and $\operatorname{div} r = \operatorname{div}\operatorname{grad} U = \Delta(U)$. Hence (10.20) is satisfied if and only if $r = \operatorname{grad} U$ and $\Delta(U) = h$. Since $\operatorname{div} a = 0$ by assumption, there exists a vector potential $g$ such that $\operatorname{curl} g = a$. Let $\varphi$ be twice continuously differentiable on $G$ and set $s = g + \operatorname{grad}\varphi$. Then $\operatorname{curl} s = \operatorname{curl} g = a$ and $\operatorname{div} s = \operatorname{div} g + \operatorname{div}\operatorname{grad}\varphi = \operatorname{div} g + \Delta(\varphi)$. Hence, $\operatorname{div} s = 0$ if and only if $\Delta(\varphi) = -\operatorname{div} g$. Both equations, $\Delta(U) = h$ and $\Delta(\varphi) = -\operatorname{div} g$, are so-called Poisson equations, which can be solved within the theory of partial differential equations (PDE).

The inverse problem does not have a unique solution. Choose a harmonic function $\psi$, $\Delta(\psi) = 0$, and put $f_1 = f + \operatorname{grad}\psi$. Then
\[
\operatorname{div} f_1 = \operatorname{div} f + \operatorname{div}\operatorname{grad}\psi = \operatorname{div} f + \Delta(\psi) = \operatorname{div} f = h,
\]
\[
\operatorname{curl} f_1 = \operatorname{curl} f + \operatorname{curl}\operatorname{grad}\psi = \operatorname{curl} f = a.
\]

Chapter 11

Differential Forms on $\mathbb{R}^n$

We show that Gauß', Green's, and Stokes' theorems are three cases of a "general" theorem which is also named after Stokes. The simple formula now reads $\int_c d\omega = \int_{\partial c} \omega$. The appearance of the Jacobian in the change of variables theorem will become clear. We formulate the Poincaré lemma. Good references are [Spi65], [AF01], and [vW81].

11.1 The Exterior Algebra $\Lambda(\mathbb{R}^n)$

Although we are working with the ground field $\mathbb{R}$, all constructions make sense for an arbitrary field $\mathbb{K}$, in particular $\mathbb{K} = \mathbb{C}$. Let $\{e_1, \dots, e_n\}$ be the standard basis of $\mathbb{R}^n$; for $h \in \mathbb{R}^n$ we write $h = (h_1, \dots, h_n)$ with respect to the standard basis, $h = \sum_i h_i e_i$.

11.1.1 The Dual Vector Space $V^*$

The interplay between a normed space E and its dual space E′ forms the basis of functional analysis. We start with the definition of the (algebraic) dual.

Definition 11.1 Let $V$ be a linear space. The dual vector space $V^*$ to $V$ is the set of all linear functionals $f\colon V \to \mathbb{R}$,
\[
V^* = \{f\colon V \to \mathbb{R} \mid f \text{ is linear}\}.
\]

It turns out that $V^*$ is again a linear space if we introduce addition and scalar multiples in the natural way. For $f, g \in V^*$, $\alpha \in \mathbb{R}$ put
\[
(f+g)(v) := f(v) + g(v), \qquad (\alpha f)(v) := \alpha f(v).
\]

The evaluation of $f \in V^*$ on $v \in V$ is sometimes denoted by
\[
f(v) = \langle f, v\rangle \in \mathbb{K}.
\]
In this case, the brackets denote the dual pairing between $V^*$ and $V$. By definition, the pairing is linear in both components; that is, for all $v, w \in V$ and all $\lambda, \mu \in \mathbb{R}$,
\[
\langle \lambda f + \mu g,\, v\rangle = \lambda\langle f, v\rangle + \mu\langle g, v\rangle, \qquad
\langle f,\, \lambda v + \mu w\rangle = \lambda\langle f, v\rangle + \mu\langle f, w\rangle.
\]

Example 11.1 (a) Let $V = \mathbb{R}^n$ with the above standard basis. For $i = 1, \dots, n$ define the $i$th coordinate functional $dx_i\colon \mathbb{R}^n \to \mathbb{R}$ by
\[
dx_i(h) = dx_i(h_1, \dots, h_n) = h_i, \qquad h \in \mathbb{R}^n.
\]
The functional $dx_i$ associates to each vector $h \in \mathbb{R}^n$ its $i$th coordinate $h_i$. The functional $dx_i$ is indeed linear since for all $v, w \in \mathbb{R}^n$ and $\alpha, \beta \in \mathbb{R}$, $dx_i(\alpha v + \beta w) = (\alpha v + \beta w)_i = \alpha v_i + \beta w_i = \alpha\,dx_i(v) + \beta\,dx_i(w)$.
The linear space $(\mathbb{R}^n)^*$ also has dimension $n$. We will show that $\{dx_1, dx_2, \dots, dx_n\}$ is a basis of $(\mathbb{R}^n)^*$; we call it the dual basis to $\{e_1, \dots, e_n\}$. Using the Kronecker symbol, the evaluation of $dx_i$ on $e_j$ reads as follows:
\[
dx_i(e_j) = \delta_{ij}, \qquad i, j = 1, \dots, n.
\]
$\{dx_1, dx_2, \dots, dx_n\}$ generates $V^*$. Indeed, let $f \in V^*$. Then $f = \sum_{i=1}^n f(e_i)\,dx_i$ since both sides coincide for all $h \in V$:
\[
\sum_{i=1}^n f(e_i)\,dx_i(h) = \sum_{i=1}^n f(e_i)\,h_i = \sum_i f(h_i e_i) = f\Bigl(\sum_i h_i e_i\Bigr) = f(h),
\]
where we used that $f$ is linear. In Proposition 11.1 below, we will see that $\{dx_1, \dots, dx_n\}$ is not only generating but also linearly independent.
(b) If $V = C([0,1])$, the continuous functions on $[0,1]$, and $\alpha$ is an increasing function on $[0,1]$, then the Riemann–Stieltjes integral

\[
\varphi_\alpha(f) = \int_0^1 f\,d\alpha, \qquad f \in V,
\]
defines a linear functional $\varphi_\alpha$ on $V$. If $a \in [0,1]$,
\[
\delta_a(f) = f(a), \qquad f \in V,
\]
defines the evaluation functional of $f$ at $a$. In case $a = 0$ this is Dirac's $\delta$-functional, which plays an important role in the theory of distributions (generalized functions).

(c) Let $a \in \mathbb{R}^n$. Then $\langle a, x\rangle = \sum_{i=1}^n a_i x_i$, $x \in \mathbb{R}^n$, defines a linear functional on $\mathbb{R}^n$. By (a), this is already the most general form of a linear functional on $\mathbb{R}^n$.

Definition 11.2 Let $k \in \mathbb{N}$. An alternating (or skew-symmetric) multilinear form of degree $k$ on $\mathbb{R}^n$, a $k$-form for short, is a mapping $\omega\colon \mathbb{R}^n\times\cdots\times\mathbb{R}^n \to \mathbb{R}$ ($k$ factors) which is multilinear and skew-symmetric, i.e.
\[
\text{(MULT)}\quad \omega(\dots, \alpha v_i + \beta w_i, \dots) = \alpha\,\omega(\dots, v_i, \dots) + \beta\,\omega(\dots, w_i, \dots), \qquad (11.1)
\]
\[
\text{(SKEW)}\quad \omega(\dots, v_i, \dots, v_j, \dots) = -\omega(\dots, v_j, \dots, v_i, \dots), \quad i, j = 1, \dots, k,\ i \ne j, \qquad (11.2)
\]
for all vectors $v_1, v_2, \dots, v_k, w_i \in \mathbb{R}^n$.

We denote the linear space of all $k$-forms on $\mathbb{R}^n$ by $\Lambda^k(\mathbb{R}^n)$, with the convention $\Lambda^0(\mathbb{R}^n) = \mathbb{R}$. In case $k = 1$, property (11.2) is an empty condition, such that $\Lambda^1(\mathbb{R}^n) = (\mathbb{R}^n)^*$ is just the dual space.

Let $f_1, \dots, f_k \in (\mathbb{R}^n)^*$ be linear functionals on $\mathbb{R}^n$. Then we define the $k$-form $f_1\wedge\cdots\wedge f_k \in \Lambda^k(\mathbb{R}^n)$ (read: "$f_1$ wedge $f_2$ ... wedge $f_k$") as follows:
\[
f_1\wedge\cdots\wedge f_k(h_1, \dots, h_k) =
\begin{vmatrix}
f_1(h_1) & \cdots & f_1(h_k)\\
\vdots & & \vdots\\
f_k(h_1) & \cdots & f_k(h_k)
\end{vmatrix}. \qquad (11.3)
\]
In particular, let $i_1, \dots, i_k \in \{1, \dots, n\}$ be fixed and choose $f_j = dx_{i_j}$, $j = 1, \dots, k$. Then
\[
dx_{i_1}\wedge\cdots\wedge dx_{i_k}(h_1, \dots, h_k) =
\begin{vmatrix}
h_{1 i_1} & \cdots & h_{k i_1}\\
\vdots & & \vdots\\
h_{1 i_k} & \cdots & h_{k i_k}
\end{vmatrix}.
\]
$f_1\wedge\cdots\wedge f_k$ is indeed a $k$-form since the $f_i$ are linear and the determinant is multilinear,
\[
\begin{vmatrix}
\lambda a + \mu a' & b & c\\
\lambda d + \mu d' & e & f\\
\lambda g + \mu g' & h & i
\end{vmatrix}
= \lambda \begin{vmatrix} a & b & c\\ d & e & f\\ g & h & i \end{vmatrix}
+ \mu \begin{vmatrix} a' & b & c\\ d' & e & f\\ g' & h & i \end{vmatrix},
\]
and skew-symmetric,
\[
\begin{vmatrix} a & b & c\\ d & e & f\\ g & h & i \end{vmatrix}
= -\begin{vmatrix} b & a & c\\ e & d & f\\ h & g & i \end{vmatrix}.
\]
For example, let $y = (y_1, \dots, y_n)$, $z = (z_1, \dots, z_n) \in \mathbb{R}^n$; then
\[
dx_3\wedge dx_1(y, z) = \begin{vmatrix} y_3 & z_3\\ y_1 & z_1 \end{vmatrix} = y_3 z_1 - y_1 z_3.
\]
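The determinant definition (11.3) is directly implementable. The sketch below (our helper names; indices are 0-based, so the text's $dx_3\wedge dx_1$ becomes `wedge(2, 0)`) evaluates wedges of coordinate functionals on vectors:

```python
# Wedge of coordinate functionals via the determinant definition (11.3):
# dx_{i1} ∧ ... ∧ dx_{ik} (h1, ..., hk) = det( h_j[i_m] ).
def det(m):
    """Laplace expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def wedge(*indices):
    """dx_{i1} ∧ ... ∧ dx_{ik} as a function of k vectors (0-based indices)."""
    def form(*vectors):
        return det([[v[i] for v in vectors] for i in indices])
    return form

y = (1.0, 2.0, 3.0)
z = (4.0, 5.0, 6.0)
print(wedge(2, 0)(y, z))   # y3*z1 - y1*z3 = 3*4 - 1*6 = 6
```

Swapping the indices flips the sign, as skew-symmetry requires.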

If $f_r = f_s$ for some $r \ne s$, we have $f_1\wedge\cdots\wedge f_r\wedge\cdots\wedge f_s\wedge\cdots\wedge f_k = 0$ since determinants with identical rows vanish. Also, for any $\omega \in \Lambda^k(\mathbb{R}^n)$,
\[
\omega(h_1, \dots, h, \dots, h, \dots, h_k) = 0, \qquad h_1, \dots, h_k, h \in \mathbb{R}^n,
\]
since the defining determinant has two identical columns.

Proposition 11.1 For $k \le n$ the $k$-forms $\{dx_{i_1}\wedge\cdots\wedge dx_{i_k} \mid 1 \le i_1 < i_2 < \cdots < i_k \le n\}$ form a basis of the vector space $\Lambda^k(\mathbb{R}^n)$. A $k$-form with $k > n$ is identically zero. We have
\[
\dim \Lambda^k(\mathbb{R}^n) = \binom{n}{k}.
\]

Proof. Any $k$-form $\omega$ is uniquely determined by its values on the $k$-tuples of vectors $(e_{i_1}, \dots, e_{i_k})$ with $1 \le i_1 < i_2 < \cdots < i_k \le n$. Indeed, using skew-symmetry of $\omega$, we know $\omega$ on all $k$-tuples of basis vectors; using linearity in each component, we get $\omega$ on all $k$-tuples of vectors. This shows that the $dx_{i_1}\wedge\cdots\wedge dx_{i_k}$ with $1 \le i_1 < i_2 < \cdots < i_k \le n$ generate the linear space $\Lambda^k(\mathbb{R}^n)$. We make this precise in case $k = 2$. With $y = \sum_i y_i e_i$, $z = \sum_j z_j e_j$ we have, by linearity and skew-symmetry of $\omega$,
\[
\omega(y, z) = \sum_{i,j=1}^n y_i z_j\,\omega(e_i, e_j)
\underset{\text{SKEW}}{=} \sum_{1\le i<j\le n} (y_i z_j - y_j z_i)\,\omega(e_i, e_j).
\]
Hence, $\omega = \sum_{i<j} \omega(e_i, e_j)\,dx_i\wedge dx_j$.

We show linear independence. Suppose that $\sum_{i<j} \alpha_{ij}\,dx_i\wedge dx_j = 0$ for some $\alpha_{ij} \in \mathbb{R}$. Applying this $2$-form to $(e_r, e_s)$, $r < s$, all summands vanish except the one with $(i,j) = (r,s)$, which yields $\alpha_{rs} = 0$; hence all coefficients vanish.

In general, let $\omega \in \Lambda^k(\mathbb{R}^n)$. Then there exist unique numbers $a_{i_1\cdots i_k} = \omega(e_{i_1}, \dots, e_{i_k}) \in \mathbb{R}$, $i_1 < i_2 < \cdots < i_k$, such that
\[
\omega = \sum_{1\le i_1 < \cdots < i_k \le n} a_{i_1\cdots i_k}\,dx_{i_1}\wedge\cdots\wedge dx_{i_k}.
\]

Definition 11.3 An algebra $A$ over $\mathbb{R}$ is a linear space together with a product map $(a,b) \mapsto ab$, $A\times A \to A$, such that the following holds for all $a, b, c \in A$ and $\alpha \in \mathbb{R}$:

(i) $a(bc) = (ab)c$ (associativity),
(ii) $(a+b)c = ac + bc$, $a(b+c) = ab + ac$,
(iii) $\alpha(ab) = (\alpha a)b = a(\alpha b)$.

Standard examples are $C(X)$, the continuous functions on a metric space $X$, or $\mathbb{R}^{n\times n}$, the full $n\times n$-matrix algebra over $\mathbb{R}$, or the algebra of polynomials $\mathbb{R}[X]$.

Let $\Lambda(\mathbb{R}^n) = \bigoplus_{k=0}^n \Lambda^k(\mathbb{R}^n)$ be the direct sum of the linear spaces $\Lambda^k(\mathbb{R}^n)$.

Proposition 11.2 (i) $\Lambda(\mathbb{R}^n)$ is an $\mathbb{R}$-algebra with unity $1$ and product $\wedge$ defined by
\[
(dx_{i_1}\wedge\cdots\wedge dx_{i_k})\wedge(dx_{j_1}\wedge\cdots\wedge dx_{j_l}) = dx_{i_1}\wedge\cdots\wedge dx_{i_k}\wedge dx_{j_1}\wedge\cdots\wedge dx_{j_l}.
\]
(ii) If $\omega_k \in \Lambda^k(\mathbb{R}^n)$ and $\omega_l \in \Lambda^l(\mathbb{R}^n)$, then $\omega_k\wedge\omega_l \in \Lambda^{k+l}(\mathbb{R}^n)$ and
\[
\omega_k\wedge\omega_l = (-1)^{kl}\,\omega_l\wedge\omega_k.
\]
Proof. (i) Associativity is clear since concatenation of strings is associative. The distributive laws are used to extend the multiplication from the basis to the entire space $\Lambda(\mathbb{R}^n)$. We show (ii) for $\omega_k = dx_{i_1}\wedge\cdots\wedge dx_{i_k}$ and $\omega_l = dx_{j_1}\wedge\cdots\wedge dx_{j_l}$. We already know $dx_i\wedge dx_j = -dx_j\wedge dx_i$. There are $kl$ transpositions $dx_{i_r} \leftrightarrow dx_{j_s}$ necessary to transport all $dx_{j_s}$ from the right to the left of $\omega_k$. Hence the sign is $(-1)^{kl}$.

In particular, $dx_i\wedge dx_i = 0$. The formula $dx_i\wedge dx_j = -dx_j\wedge dx_i$ determines the product in $\Lambda(\mathbb{R}^n)$ uniquely. We call $\Lambda(\mathbb{R}^n)$ the exterior algebra of the vector space $\mathbb{R}^n$.

The following formula will be used in the next subsection. Let $\omega \in \Lambda^k(\mathbb{R}^n)$ and $\eta \in \Lambda^l(\mathbb{R}^n)$; then for all $v_1, \dots, v_{k+l} \in \mathbb{R}^n$,
\[
(\omega\wedge\eta)(v_1, \dots, v_{k+l}) = \frac{1}{k!\,l!} \sum_{\sigma\in S_{k+l}} (-1)^\sigma\,\omega(v_{\sigma(1)}, \dots, v_{\sigma(k)})\,\eta(v_{\sigma(k+1)}, \dots, v_{\sigma(k+l)}). \qquad (11.4)
\]
Indeed, let $\omega = f_1\wedge\cdots\wedge f_k$, $\eta = f_{k+1}\wedge\cdots\wedge f_{k+l}$, $f_i \in (\mathbb{R}^n)^*$. The above formula can be obtained from
\[
(f_1\wedge\cdots\wedge f_k)\wedge(f_{k+1}\wedge\cdots\wedge f_{k+l})(v_1, \dots, v_{k+l}) =
\begin{vmatrix}
f_1(v_1) & \cdots & f_1(v_{k+l})\\
\vdots & & \vdots\\
f_{k+l}(v_1) & \cdots & f_{k+l}(v_{k+l})
\end{vmatrix}
\]
by expanding this determinant with respect to the last $l$ rows. This can be done using the Laplace expansion
\[
|A| = \sum_{1\le j_1 < \cdots < j_k \le k+l} (-1)^{\sum_{m=1}^k (i_m + j_m)}
\begin{vmatrix} a_{i_1 j_1} & \cdots & a_{i_1 j_k}\\ \vdots & & \vdots\\ a_{i_k j_1} & \cdots & a_{i_k j_k} \end{vmatrix}
\begin{vmatrix} a_{i_{k+1} j_{k+1}} & \cdots & a_{i_{k+1} j_{k+l}}\\ \vdots & & \vdots\\ a_{i_{k+l} j_{k+1}} & \cdots & a_{i_{k+l} j_{k+l}} \end{vmatrix},
\]
where $(i_1, \dots, i_k)$ is any fixed ordered multi-index and $(j_{k+1}, \dots, j_{k+l})$ is the complementary ordered multi-index to $(j_1, \dots, j_k)$ such that all integers $1, 2, \dots, k+l$ appear.

11.1.2 The Pull-Back of k-forms

Definition 11.4 Let $A \in L(\mathbb{R}^n, \mathbb{R}^m)$ be a linear mapping and $k \in \mathbb{N}$. For $\omega \in \Lambda^k(\mathbb{R}^m)$ we define a $k$-form $A^*(\omega) \in \Lambda^k(\mathbb{R}^n)$ by
\[
(A^*\omega)(h_1, \dots, h_k) = \omega(A(h_1), A(h_2), \dots, A(h_k)), \qquad h_1, \dots, h_k \in \mathbb{R}^n.
\]
We call $A^*(\omega)$ the pull-back of $\omega$ under $A$.

Note that $A^* \in L(\Lambda^k(\mathbb{R}^m), \Lambda^k(\mathbb{R}^n))$ is a linear mapping. In case $k = 1$ we call $A^*$ the dual mapping to $A$. In case $k = 0$, $\omega \in \mathbb{R}$, we simply set $A^*(\omega) = \omega$. We have $A^*(\omega\wedge\eta) = A^*(\omega)\wedge A^*(\eta)$. Indeed, let $\omega \in \Lambda^k(\mathbb{R}^m)$, $\eta \in \Lambda^l(\mathbb{R}^m)$, and $h_i \in \mathbb{R}^n$, $i = 1, \dots, k+l$; then by (11.4)
\[
\begin{aligned}
A^*(\omega\wedge\eta)(h_1, \dots, h_{k+l}) &= (\omega\wedge\eta)(A(h_1), \dots, A(h_{k+l}))\\
&= \frac{1}{k!\,l!} \sum_{\sigma\in S_{k+l}} (-1)^\sigma\,\omega(A(h_{\sigma(1)}), \dots, A(h_{\sigma(k)}))\,\eta(A(h_{\sigma(k+1)}), \dots, A(h_{\sigma(k+l)}))\\
&= \frac{1}{k!\,l!} \sum_{\sigma\in S_{k+l}} (-1)^\sigma\,A^*(\omega)(h_{\sigma(1)}, \dots, h_{\sigma(k)})\,A^*(\eta)(h_{\sigma(k+1)}, \dots, h_{\sigma(k+l)})\\
&= (A^*(\omega)\wedge A^*(\eta))(h_1, \dots, h_{k+l}).
\end{aligned}
\]

Example 11.3 (a) Let $A = \begin{pmatrix} 1 & 0 & 3\\ 2 & 1 & 0 \end{pmatrix} \in \mathbb{R}^{2\times 3}$ define a linear map $A\colon \mathbb{R}^3 \to \mathbb{R}^2$ by matrix multiplication, $A(v) = A\cdot v$, $v \in \mathbb{R}^3$. Let $\{e_1, e_2, e_3\}$ and $\{f_1, f_2\}$ be the standard bases of $\mathbb{R}^3$ and $\mathbb{R}^2$, respectively, and let $\{dx_1, dx_2, dx_3\}$ and $\{dy_1, dy_2\}$ be their dual bases. First we compute $A^*(dy_1)$ and $A^*(dy_2)$:
\[
A^*(dy_1)(e_i) = dy_1(A(e_i)) = a_{1i}, \qquad A^*(dy_2)(e_i) = a_{2i}.
\]
In particular,
\[
A^*(dy_1) = 1\,dx_1 + 0\,dx_2 + 3\,dx_3, \qquad A^*(dy_2) = 2\,dx_1 + dx_2.
\]
Now compute $A^*(dy_2\wedge dy_1)$. By definition, for $1 \le i < j \le 3$,
\[
A^*(dy_2\wedge dy_1)(e_i, e_j) = dy_2\wedge dy_1(A(e_i), A(e_j)) =
\begin{vmatrix} a_{2i} & a_{2j}\\ a_{1i} & a_{1j} \end{vmatrix}.
\]
In particular,
\[
A^*(dy_2\wedge dy_1)(e_1, e_2) = \begin{vmatrix} 2 & 1\\ 1 & 0 \end{vmatrix} = -1, \qquad
A^*(dy_2\wedge dy_1)(e_1, e_3) = \begin{vmatrix} 2 & 0\\ 1 & 3 \end{vmatrix} = 6,
\]
\[
A^*(dy_2\wedge dy_1)(e_2, e_3) = \begin{vmatrix} 1 & 0\\ 0 & 3 \end{vmatrix} = 3.
\]
Hence, $A^*(dy_2\wedge dy_1) = -dx_1\wedge dx_2 + 6\,dx_1\wedge dx_3 + 3\,dx_2\wedge dx_3$. On the other hand,
\[
A^*(dy_2)\wedge A^*(dy_1) = (2\,dx_1 + dx_2)\wedge(dx_1 + 3\,dx_3) = -dx_1\wedge dx_2 + 6\,dx_1\wedge dx_3 + 3\,dx_2\wedge dx_3.
\]
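The three coefficients $-1$, $6$, $3$ can be reproduced mechanically from the pull-back definition; the helper names below are ours:

```python
# Pull-back of dy2 ∧ dy1 under the matrix A of Example 11.3 (a):
# (A*(dy2 ∧ dy1))(u, v) = det [[ (Au)_2, (Av)_2 ], [ (Au)_1, (Av)_1 ]].
A = [[1.0, 0.0, 3.0],
     [2.0, 1.0, 0.0]]

def apply(A, v):
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def pullback_dy2_dy1(u, v):
    Au, Av = apply(A, u), apply(A, v)
    return Au[1] * Av[0] - Av[1] * Au[0]

e = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(pullback_dy2_dy1(e[0], e[1]),   # -1
      pullback_dy2_dy1(e[0], e[2]),   #  6
      pullback_dy2_dy1(e[1], e[2]))   #  3
```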

(b) Let $A \in \mathbb{R}^{n\times n}$, $A\colon \mathbb{R}^n \to \mathbb{R}^n$, and $\omega = dx_1\wedge\cdots\wedge dx_n \in \Lambda^n(\mathbb{R}^n)$. Then
\[
A^*(\omega) = \det(A)\,\omega.
\]

11.1.3 Orientation of $\mathbb{R}^n$

If $\{e_1, \dots, e_n\}$ and $\{f_1, \dots, f_n\}$ are two bases of $\mathbb{R}^n$, there exists a unique regular matrix $A = (a_{ij})$ ($\det A \ne 0$) such that $e_i = \sum_j a_{ij} f_j$. We say that $\{e_1, \dots, e_n\}$ and $\{f_1, \dots, f_n\}$ are equivalent if and only if $\det A > 0$. Since $\det A \ne 0$, there are exactly two equivalence classes. We say that the two bases $\{e_i \mid i = 1, \dots, n\}$ and $\{f_i \mid i = 1, \dots, n\}$ define the same orientation if and only if $\det A > 0$.

Definition 11.5 An orientation of $\mathbb{R}^n$ is given by fixing one of the two equivalence classes.

Example 11.4 (a) In $\mathbb{R}^2$ the bases $\{e_1, e_2\}$ and $\{e_2, e_1\}$ have different orientations since $A = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}$ and $\det A = -1$.
(b) In $\mathbb{R}^3$ the bases $\{e_1, e_2, e_3\}$, $\{e_3, e_1, e_2\}$, and $\{e_2, e_3, e_1\}$ have the same orientation, whereas $\{e_1, e_3, e_2\}$, $\{e_2, e_1, e_3\}$, and $\{e_3, e_2, e_1\}$ have the opposite orientation.
(c) The standard basis $\{e_1, \dots, e_n\}$ and $\{e_2, e_1, e_3, \dots, e_n\}$ define different orientations.

11.2 Differential Forms

11.2.1 Definition

Throughout this section let $U \subset \mathbb{R}^n$ be an open and connected set.

Definition 11.6 (a) A differential $k$-form on $U$ is a mapping $\omega\colon U \to \Lambda^k(\mathbb{R}^n)$, i.e. to every point $p \in U$ we associate a $k$-form $\omega(p) \in \Lambda^k(\mathbb{R}^n)$. The linear space of differential $k$-forms on $U$ is denoted by $\Omega^k(U)$.

(b) Let $\omega$ be a differential $k$-form on $U$. Since $\{dx_{i_1}\wedge\cdots\wedge dx_{i_k} \mid 1 \le i_1 < i_2 < \cdots < i_k \le n\}$ forms a basis of $\Lambda^k(\mathbb{R}^n)$, there exist uniquely determined functions $a_{i_1\cdots i_k}$ on $U$ such that
\[
\omega(p) = \sum_{1\le i_1 < \cdots < i_k \le n} a_{i_1\cdots i_k}(p)\,dx_{i_1}\wedge\cdots\wedge dx_{i_k}. \qquad (11.5)
\]
For example,
\[
\omega_1 = x^2\,dy\wedge dz + xyz\,dx\wedge dy, \qquad \omega_2 = (xy^2 + 3z^2)\,dx
\]
define a differential $2$-form and a $1$-form on $\mathbb{R}^3$, $\omega_1 \in \Omega^2(\mathbb{R}^3)$, $\omega_2 \in \Omega^1(\mathbb{R}^3)$; then
\[
\omega_1\wedge\omega_2 = (x^3y^2 + 3x^2z^2)\,dx\wedge dy\wedge dz.
\]

11.2.2 Differentiation

Definition 11.7 Let $f \in \Omega^0(U) = C^r(U)$ and $p \in U$. We define
\[
df(p) = Df(p);
\]
then $df$ is a differential $1$-form on $U$. If $\omega(p) = \sum_{1\le i_1<\cdots<i_k\le n} a_{i_1\cdots i_k}(p)\,dx_{i_1}\wedge\cdots\wedge dx_{i_k}$ is a differential $k$-form, we define
\[
d\omega(p) = \sum_{1\le i_1<\cdots<i_k\le n} da_{i_1\cdots i_k}(p)\wedge dx_{i_1}\wedge\cdots\wedge dx_{i_k}. \qquad (11.6)
\]

Remarks 11.1 (a) Note that for a function $f\colon U \to \mathbb{R}$, $Df \in L(\mathbb{R}^n, \mathbb{R}) = \Lambda^1(\mathbb{R}^n)$. By Example 7.7 (a),
\[
Df(x)(h) = \operatorname{grad} f(x)\cdot h = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)\,h_i = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)\,dx_i(h),
\]
hence
\[
df(x) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)\,dx_i. \qquad (11.7)
\]
Viewing the coordinate function $x_i\colon U \to \mathbb{R}$ as a $C^\infty$-function, by the above formula
\[
d x_i(x) = dx_i.
\]
This justifies the notation $dx_i$. If $f \in C^\infty(\mathbb{R})$ we have $df(x) = f'(x)\,dx$.
(b) One can show that the definition of $d\omega$ does not depend on the choice of the basis $\{dx_1, \dots, dx_n\}$ of $\Lambda^1(\mathbb{R}^n)$.

Example 11.5 (a) Let $G = \mathbb{R}^2$ and $\omega = e^{xy}\,dx + xy^3\,dy$. Then
\[
\begin{aligned}
d\omega &= d(e^{xy})\wedge dx + d(xy^3)\wedge dy\\
&= (y e^{xy}\,dx + x e^{xy}\,dy)\wedge dx + (y^3\,dx + 3xy^2\,dy)\wedge dy\\
&= (-x e^{xy} + y^3)\,dx\wedge dy.
\end{aligned}
\]
(b) Let $f$ be continuously differentiable. Then
\[
df = f_x\,dx + f_y\,dy + f_z\,dz = \operatorname{grad} f\cdot(dx, dy, dz) = \operatorname{grad} f\cdot d\vec x.
\]
(c) Let $v = (v_1, v_2, v_3)$ be a $C^1$-vector field. Put $\omega = v_1\,dx + v_2\,dy + v_3\,dz$. Then we have
\[
\begin{aligned}
d\omega &= \left(\frac{\partial v_3}{\partial y} - \frac{\partial v_2}{\partial z}\right) dy\wedge dz
+ \left(\frac{\partial v_1}{\partial z} - \frac{\partial v_3}{\partial x}\right) dz\wedge dx
+ \left(\frac{\partial v_2}{\partial x} - \frac{\partial v_1}{\partial y}\right) dx\wedge dy\\
&= \operatorname{curl}(v)\cdot(dy\wedge dz,\ dz\wedge dx,\ dx\wedge dy) = \operatorname{curl}(v)\cdot d\vec S.
\end{aligned}
\]

(d) Let $v$ be as above. Put $\omega = v_1\,dy\wedge dz + v_2\,dz\wedge dx + v_3\,dx\wedge dy$. Then we have
\[
d\omega = \operatorname{div}(v)\,dx\wedge dy\wedge dz.
\]
Proposition 11.3 The exterior differential $d$ is a linear mapping which satisfies
(i) $d(\omega\wedge\eta) = d\omega\wedge\eta + (-1)^k\,\omega\wedge d\eta$, $\omega \in \Omega^k(U)$, $\eta \in \Omega^l(U)$;
(ii) $d(d\omega) = 0$, $\omega \in \Omega^k(U)$.
Proof. (i) We first prove Leibniz' rule for functions $f, g \in \Omega^0(U)$. By Remarks 11.1 (a),
\[
d(fg) = \sum_i \frac{\partial}{\partial x_i}(fg)\,dx_i
= \sum_i \left(\frac{\partial f}{\partial x_i}\,g + f\,\frac{\partial g}{\partial x_i}\right) dx_i
= \left(\sum_i \frac{\partial f}{\partial x_i}\,dx_i\right) g + f \sum_i \frac{\partial g}{\partial x_i}\,dx_i
= df\,g + f\,dg.
\]

For $I = (i_1, \dots, i_k)$ and $J = (j_1, \dots, j_l)$ we abbreviate $dx_I = dx_{i_1}\wedge\cdots\wedge dx_{i_k}$ and $dx_J = dx_{j_1}\wedge\cdots\wedge dx_{j_l}$. Let $\omega = \sum_I a_I\,dx_I$ and $\eta = \sum_J b_J\,dx_J$. By definition,
\[
\begin{aligned}
d(\omega\wedge\eta) &= d\left(\sum_{I,J} a_I b_J\,dx_I\wedge dx_J\right) = \sum_{I,J} d(a_I b_J)\wedge dx_I\wedge dx_J\\
&= \sum_{I,J} (da_I\,b_J + a_I\,db_J)\wedge dx_I\wedge dx_J\\
&= \sum_{I,J} da_I\wedge dx_I\wedge b_J\,dx_J + (-1)^k \sum_{I,J} a_I\,dx_I\wedge db_J\wedge dx_J\\
&= d\omega\wedge\eta + (-1)^k\,\omega\wedge d\eta,
\end{aligned}
\]
where in the third line we used $db_J\wedge dx_I = (-1)^k\,dx_I\wedge db_J$.
(ii) Again by the definition of $d$:
\[
d(d\omega) = d\left(\sum_I da_I\wedge dx_I\right)
= \sum_{I,j} d\left(\frac{\partial a_I}{\partial x_j}\right)\wedge dx_j\wedge dx_I
= \sum_{I,i,j} \frac{\partial^2 a_I}{\partial x_i\,\partial x_j}\,dx_i\wedge dx_j\wedge dx_I.
\]
By Schwarz' lemma the coefficients are symmetric in $i$ and $j$, while $dx_i\wedge dx_j = -dx_j\wedge dx_i$; interchanging the summation indices therefore gives $d(d\omega) = -d(d\omega)$. It follows that $d(d\omega) = d^2\omega = 0$.
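The first step of the proof, the Leibniz rule $d(fg) = df\,g + f\,dg$ for $0$-forms, amounts to the product rule $\partial_{x}(fg) = f_x g + f g_x$ in each coordinate; a quick finite-difference spot check with a sample pair of functions (our choice):

```python
import math

# Central-difference spot check of the product rule for the sample pair
# f(x, y) = sin(x) * y and g(x, y) = exp(x * y) at one point.
def f(x, y): return math.sin(x) * y
def g(x, y): return math.exp(x * y)

def dx(h, x, y, eps=1e-6):
    """Central finite difference of h in the x-direction."""
    return (h(x + eps, y) - h(x - eps, y)) / (2 * eps)

x, y = 0.4, 0.7
lhs = dx(lambda s, t: f(s, t) * g(s, t), x, y)
rhs = dx(f, x, y) * g(x, y) + f(x, y) * dx(g, x, y)
print(lhs, rhs)
```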

11.2.3 Pull-Back

Definition 11.8 Let $f\colon U \to V$ be a differentiable function with open sets $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$. Let $\omega \in \Omega^k(V)$ be a differential $k$-form. We define a differential $k$-form $f^*(\omega) \in \Omega^k(U)$ by
\[
(f^*\omega)(p) = (Df(p))^*\,\omega(f(p)),
\]
i.e.
\[
(f^*\omega)(p;\, h_1, \dots, h_k) = \omega\bigl(f(p);\, Df(p)(h_1), \dots, Df(p)(h_k)\bigr), \qquad p \in U,\ h_1, \dots, h_k \in \mathbb{R}^n.
\]
In case $k = 0$ and $\omega \in \Omega^0(V) = C^\infty(V)$ we simply set
\[
(f^*\omega)(p) = \omega(f(p)), \qquad f^*\omega = \omega\circ f.
\]
We call $f^*(\omega)$ the pull-back of the differential $k$-form $\omega$ with respect to $f$.

Note that by definition the pull-back $f^*$ is a linear mapping from the space of differential $k$-forms on $V$ to the space of differential $k$-forms on $U$, $f^*\colon \Omega^k(V) \to \Omega^k(U)$.

Proposition 11.4 Let $f$ be as above and $\omega, \eta \in \Omega(V)$. Let $\{dy_1, \dots, dy_m\}$ be the dual basis to the standard basis in $(\mathbb{R}^m)^*$. Then we have, with $f = (f_1, \dots, f_m)$:
\[
\text{(a)}\quad f^*(dy_i) = \sum_{j=1}^n \frac{\partial f_i}{\partial x_j}\,dx_j = df_i, \quad i = 1, \dots, m, \qquad (11.8)
\]
\[
\text{(b)}\quad f^*(d\omega) = d(f^*\omega), \qquad (11.9)
\]
\[
\text{(c)}\quad f^*(a\,\omega) = (a\circ f)\,f^*(\omega), \quad a \in C^\infty(V), \qquad (11.10)
\]
\[
\text{(d)}\quad f^*(\omega\wedge\eta) = f^*(\omega)\wedge f^*(\eta). \qquad (11.11)
\]
If $n = m$, then
\[
\text{(e)}\quad f^*(dy_1\wedge\cdots\wedge dy_n) = \frac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)}\,dx_1\wedge\cdots\wedge dx_n. \qquad (11.12)
\]
Proof. We show (a). Let $h \in \mathbb{R}^n$; by Definition 11.7 and the definition of the derivative we have
\[
f^*(dy_i)(h) = dy_i(Df(p)(h))
= \left\langle dy_i,\ \left(\sum_{j=1}^n \frac{\partial f_k(p)}{\partial x_j}\,h_j\right)_{k=1,\dots,m}\right\rangle
= \sum_{j=1}^n \frac{\partial f_i(p)}{\partial x_j}\,h_j = \sum_{j=1}^n \frac{\partial f_i(p)}{\partial x_j}\,dx_j(h).
\]
This shows (a). Equation (11.10) is a special case of (11.11); we prove (d). Let $p \in U$. Using the pull-back formula for $k$-forms we obtain

f ∗(ω η)(p)=(Df(p))∗(ω η(f(p))) = (Df(p))∗(ω(f(p)) η(f(p))) ∧ ∧ ∧ =(Df(p))∗(ω(f(p))) (Df(p))∗(η(f(p))) ∧ = f ∗(ω)(p) f ∗(η)(p)=(f ∗(ω) f ∗(η))(p) ∧ ∧

To show (11.9) we start with a $0$-form $g$ and prove that $f^*(dg) = d(f^*g)$ for functions $g\colon V\to\mathbb{R}$. By (11.7) and (11.10) we have
\begin{align*}
f^*(dg)(p) &= f^*\Bigl(\sum_{i=1}^m \frac{\partial g}{\partial y_i}\,dy_i\Bigr)(p) = \sum_{i=1}^m \frac{\partial g(f(p))}{\partial y_i}\, f^*(dy_i)\\
&= \sum_{i=1}^m \frac{\partial g(f(p))}{\partial y_i}\,\sum_{j=1}^n \frac{\partial f_i}{\partial x_j}(p)\,dx_j
= \sum_{j=1}^n \Bigl(\sum_{i=1}^m \frac{\partial g(f(p))}{\partial y_i}\,\frac{\partial f_i(p)}{\partial x_j}\Bigr) dx_j\\
&= \sum_{j=1}^n \frac{\partial}{\partial x_j}(g\circ f)(p)\,dx_j \qquad\text{(chain rule)}\\
&= d(g\circ f)(p) = d(f^*g)(p).
\end{align*}
Now let $\omega = \sum_I a_I\,dx_I$ be an arbitrary form. Since by the Leibniz rule
\[ d\bigl(f^*(dx_I)\bigr) = d\bigl(d(f_{i_1})\wedge\cdots\wedge d(f_{i_k})\bigr) = 0, \]
we get, again by the Leibniz rule,
\begin{align*}
d(f^*\omega) &= d\Bigl(\sum_I f^*(a_I)\, f^*(dx_I)\Bigr)
= \sum_I \bigl(d(f^*(a_I))\wedge f^*(dx_I) + f^*(a_I)\, d(f^*(dx_I))\bigr)\\
&= \sum_I d(f^*(a_I))\wedge f^*(dx_I).
\end{align*}
On the other hand, by (d) we have
\[ f^*\Bigl(d\sum_I a_I\,dx_I\Bigr) = f^*\Bigl(\sum_I da_I\wedge dx_I\Bigr) = \sum_I f^*(da_I)\wedge f^*(dx_I). \]
By the first part of the proof of (b), $f^*(da_I) = d(f^*(a_I))$, so both expressions coincide. This completes the proof of (b). We finally prove (e). By (a) and (d) we have
\begin{align*}
f^*(dy_1\wedge\cdots\wedge dy_n) &= f^*(dy_1)\wedge\cdots\wedge f^*(dy_n)\\
&= \Bigl(\sum_{i_1=1}^n \frac{\partial f_1}{\partial x_{i_1}}\,dx_{i_1}\Bigr)\wedge\cdots\wedge\Bigl(\sum_{i_n=1}^n \frac{\partial f_n}{\partial x_{i_n}}\,dx_{i_n}\Bigr)\\
&= \sum_{i_1,\dots,i_n=1}^n \frac{\partial f_1}{\partial x_{i_1}}\cdots\frac{\partial f_n}{\partial x_{i_n}}\; dx_{i_1}\wedge\cdots\wedge dx_{i_n}.
\end{align*}
Since the square of a $1$-form vanishes, the only non-vanishing terms in the above sum are those where $(i_1,\dots,i_n)$ is a permutation of $(1,\dots,n)$. Using skew-symmetry to write $dx_{i_1}\wedge\cdots\wedge dx_{i_n}$ as a multiple of $dx_1\wedge\cdots\wedge dx_n$, we obtain the sign of the permutation $(i_1,\dots,i_n)$:
\begin{align*}
f^*(dy_1\wedge\cdots\wedge dy_n) &= \sum_{I=(i_1,\dots,i_n)\in S_n} \operatorname{sign}(I)\,\frac{\partial f_1}{\partial x_{i_1}}\cdots\frac{\partial f_n}{\partial x_{i_n}}\; dx_1\wedge\cdots\wedge dx_n\\
&= \frac{\partial(f_1,\dots,f_n)}{\partial(x_1,\dots,x_n)}\; dx_1\wedge\cdots\wedge dx_n.
\end{align*}

Example 11.6 (a) Let $f(r,\varphi) = (r\cos\varphi,\ r\sin\varphi)$ be given on $(0,\infty)\times\mathbb{R}$, and let $\{dr, d\varphi\}$ and $\{dx, dy\}$ be the dual bases to $\{e_r, e_\varphi\}$ and $\{e_1, e_2\}$. We have
\[ f^*(x) = r\cos\varphi,\qquad f^*(y) = r\sin\varphi, \]
\begin{align*}
f^*(dx) &= \cos\varphi\,dr - r\sin\varphi\,d\varphi, & f^*(dy) &= \sin\varphi\,dr + r\cos\varphi\,d\varphi,\\
f^*(dx\wedge dy) &= r\,dr\wedge d\varphi, & f^*\Bigl(\frac{-y}{x^2+y^2}\,dx + \frac{x}{x^2+y^2}\,dy\Bigr) &= d\varphi.
\end{align*}
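The coefficient $r$ in $f^*(dx\wedge dy) = r\,dr\wedge d\varphi$ is the Jacobian determinant of the polar map, in line with Proposition 11.4 (e). A short numeric sketch (the sample point is an arbitrary choice) verifies this:

```python
import math

def f(r, phi):
    # the polar-coordinate map f(r, phi) = (r cos phi, r sin phi)
    return (r * math.cos(phi), r * math.sin(phi))

def pullback_area_coeff(r, phi, h=1e-6):
    # coefficient of dr ^ dphi in f*(dx ^ dy), i.e. the Jacobian
    # determinant d(x, y)/d(r, phi), computed by central differences
    xr = (f(r + h, phi)[0] - f(r - h, phi)[0]) / (2 * h)
    xp = (f(r, phi + h)[0] - f(r, phi - h)[0]) / (2 * h)
    yr = (f(r + h, phi)[1] - f(r - h, phi)[1]) / (2 * h)
    yp = (f(r, phi + h)[1] - f(r, phi - h)[1]) / (2 * h)
    return xr * yp - xp * yr

r0, phi0 = 2.0, 0.9
print(abs(pullback_area_coeff(r0, phi0) - r0) < 1e-5)
```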

(b) Let $k\in\mathbb{N}$, $r\in\{1,\dots,k\}$, and $\alpha\in\mathbb{R}$. Define a mapping $I\colon\mathbb{R}^k\to\mathbb{R}^{k+1}$ and a form $\omega\in\Omega^k(\mathbb{R}^{k+1})$ by
\begin{align*}
I(x_1,\dots,x_k) &= (x_1,\dots,x_{r-1},\ \alpha,\ x_r,\dots,x_k),\\
\omega(y_1,\dots,y_{k+1}) &= \sum_{i=1}^{k+1} f_i(y)\; dy_1\wedge\cdots\wedge\widehat{dy_i}\wedge\cdots\wedge dy_{k+1},
\end{align*}
where $f_i\in C^\infty(\mathbb{R}^{k+1})$ for all $i$; the hat means omission of the factor $dy_i$. Then
\[ I^*(\omega)(x) = f_r(x_1,\dots,x_{r-1},\ \alpha,\ x_r,\dots,x_k)\; dx_1\wedge\cdots\wedge dx_k. \]
This follows from
\begin{align*}
I^*(dy_i) &= dx_i, & i &= 1,\dots,r-1,\\
I^*(dy_r) &= 0,\\
I^*(dy_{i+1}) &= dx_i, & i &= r,\dots,k.
\end{align*}
Roughly speaking: $f^*(\omega)$ is obtained by substituting the new variables at all places.

11.2.4 Closed and Exact Forms

Motivation: Let $f(x)$ be a continuous function on $\mathbb{R}$. Then $\omega = f(x)\,dx$ is a $1$-form. By the fundamental theorem of calculus, there exists an antiderivative $F(x)$ of $f(x)$ such that
\[ dF(x) = f(x)\,dx = \omega. \]
Problem: Given $\omega\in\Omega^k(U)$, does there exist $\eta\in\Omega^{k-1}(U)$ with $d\eta = \omega$?

Definition 11.9 $\omega\in\Omega^k(U)$ is called closed if $d\omega = 0$.
$\omega\in\Omega^k(U)$ is called exact if there exists $\eta\in\Omega^{k-1}(U)$ such that $d\eta = \omega$.

Remarks 11.2 (a) An exact form $\omega$ is closed; indeed, $d\omega = d(d\eta) = 0$.
(b) A $1$-form $\omega = \sum_i f_i\,dx_i$ is closed if and only if $\operatorname{curl}\vec f = 0$ for the corresponding vector field $\vec f = (f_1,\dots,f_n)$. Here the general curl can be defined as a vector with $n(n-1)/2$ components
\[ (\operatorname{curl}\vec f)_{ij} = \frac{\partial f_j}{\partial x_i} - \frac{\partial f_i}{\partial x_j}. \]
The form $\omega$ is exact if and only if $\vec f$ is conservative, that is, $\vec f$ is a gradient vector field, $\vec f = \operatorname{grad} U$. Then $\omega = dU$.
(c) There are closed forms that are not exact; for example, the winding form
\[ \omega = \frac{-y}{x^2+y^2}\,dx + \frac{x}{x^2+y^2}\,dy \]
on $\mathbb{R}^2\setminus\{(0,0)\}$ is not exact, cf. Homework 30.1.
(d) If $d\eta = \omega$ then $d(\eta + d\xi) = \omega$, too, for all $\xi\in\Omega^{k-2}(U)$.

Definition 11.10 An open set $U$ is called star-shaped if there exists an $x_0\in U$ such that for all $x\in U$ the segment from $x_0$ to $x$ lies in $U$, i.e. $(1-t)x + t x_0\in U$ for all $t\in[0,1]$.

Convex sets $U$ are star-shaped (take any $x_0\in U$); any star-shaped set is connected and simply connected.

Lemma 11.5 Let $U\subset\mathbb{R}^n$ be star-shaped with respect to the origin. Let
\[ \omega = \sum_{i_1<\cdots<i_k} a_{i_1\cdots i_k}\; dx_{i_1}\wedge\cdots\wedge dx_{i_k} \in \Omega^k(U). \]
Define
\[ I(\omega)(x) = \sum_{i_1<\cdots<i_k}\ \sum_{r=1}^{k} (-1)^{r-1}\Bigl(\int_0^1 t^{k-1}\, a_{i_1\cdots i_k}(tx)\,dt\Bigr)\, x_{i_r}\; dx_{i_1}\wedge\cdots\wedge\widehat{dx_{i_r}}\wedge\cdots\wedge dx_{i_k}, \tag{11.13} \]
where the hat means omission of the factor $dx_{i_r}$. Then we have
\[ I(d\omega) + d(I\omega) = \omega. \tag{11.14} \]

(Without proof.)

Example 11.7 (a) Let $k=1$, $n=3$, and $\omega = a_1\,dx_1 + a_2\,dx_2 + a_3\,dx_3$. Then
\[ I(\omega) = x_1\int_0^1 a_1(tx)\,dt + x_2\int_0^1 a_2(tx)\,dt + x_3\int_0^1 a_3(tx)\,dt. \]
Note that this is exactly the formula for the potential $U(x_1,x_2,x_3)$ from Remark 8.5 (b). Let $(a_1,a_2,a_3)$ be a vector field on $U$ with $d\omega = 0$. This is equivalent to $\operatorname{curl} a = 0$ by Example 11.5 (c). The above lemma shows $dU = \omega$ for $U = I(\omega)$; this means $\operatorname{grad} U = (a_1,a_2,a_3)$, i.e. $U$ is the potential of the vector field $(a_1,a_2,a_3)$.
(b) Let $k=2$, $n=3$, and $\omega = a_1\,dx_2\wedge dx_3 + a_2\,dx_3\wedge dx_1 + a_3\,dx_1\wedge dx_2$, where $a$ is a $C^1$-vector field on $U$. Then
\begin{align*}
I(\omega) = {}&\Bigl(x_3\int_0^1 t\,a_2(tx)\,dt - x_2\int_0^1 t\,a_3(tx)\,dt\Bigr) dx_1\\
{}+{}&\Bigl(x_1\int_0^1 t\,a_3(tx)\,dt - x_3\int_0^1 t\,a_1(tx)\,dt\Bigr) dx_2\\
{}+{}&\Bigl(x_2\int_0^1 t\,a_1(tx)\,dt - x_1\int_0^1 t\,a_2(tx)\,dt\Bigr) dx_3.
\end{align*}
By Example 11.5 (d), $\omega$ is closed if and only if $\operatorname{div}(a) = 0$ on $U$. Let $\eta = b_1\,dx_1 + b_2\,dx_2 + b_3\,dx_3$ be such that $d\eta = \omega$. This means $\operatorname{curl} b = a$. The Poincaré lemma shows that $b$ with $\operatorname{curl} b = a$ exists if and only if $\operatorname{div}(a) = 0$. Then $b$ is the vector potential of $a$. In case $d\omega = 0$ we can choose $\vec b\cdot d\vec x = I(\omega)$.
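The potential formula of part (a) can be tried out numerically. The sketch below uses the sample field $a = \operatorname{grad}(x_1x_2 + x_3^2)$ (an arbitrary curl-free choice made for this demo) and checks that $I(\omega)$ reproduces the potential:

```python
def a(x):
    # sample closed 1-form a1 dx1 + a2 dx2 + a3 dx3 with curl a = 0:
    # here a = grad(x1*x2 + x3^2), an assumption made for this sketch
    x1, x2, x3 = x
    return (x2, x1, 2.0 * x3)

def I_omega(x, n=1000):
    # I(omega)(x) = sum_i x_i * (integral_0^1 a_i(t x) dt), midpoint rule
    total = 0.0
    for k in range(n):
        t = (k + 0.5) / n
        at = a(tuple(t * c for c in x))
        total += sum(xi * ai for xi, ai in zip(x, at)) / n
    return total

x = (1.0, 2.0, 3.0)
U = I_omega(x)  # should reproduce the potential x1*x2 + x3^2 = 11
print(abs(U - (x[0] * x[1] + x[2] ** 2)) < 1e-9)
```

The midpoint rule is exact here because the integrand is linear in $t$; for a general smooth field the agreement is up to quadrature error.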

Theorem 11.6 (Poincaré Lemma) Let $U$ be star-shaped. Then every closed differential form on $U$ is exact.

Proof. Without loss of generality let $U$ be star-shaped with respect to the origin and $d\omega = 0$. By Lemma 11.5, $d(I\omega) = \omega$.

Remarks 11.3 (a) Let $U$ be star-shaped, $\omega\in\Omega^k(U)$. Suppose $d\eta_0 = \omega$ for some $\eta_0\in\Omega^{k-1}(U)$. Then the general solution of $d\eta = \omega$ is given by $\eta_0 + d\xi$ with $\xi\in\Omega^{k-2}(U)$. Indeed, let $\eta$ be a second solution of $d\eta = \omega$. Then $d(\eta - \eta_0) = 0$. By the Poincaré lemma, there exists $\xi\in\Omega^{k-2}(U)$ with $\eta - \eta_0 = d\xi$, hence $\eta = \eta_0 + d\xi$.
(b) Let $V$ be a linear space and $W$ a linear subspace of $V$. We define an equivalence relation on $V$ by $v_1\sim v_2$ if $v_1 - v_2\in W$. The equivalence class of $v$ is denoted by $v+W$. One easily sees that the set of equivalence classes, denoted by $V/W$, is again a linear space: $\alpha(v+W) + \beta(u+W) := \alpha v + \beta u + W$.
Let $U$ be an arbitrary open subset of $\mathbb{R}^n$. We define
\begin{align*}
C^k(U) &= \{\omega\in\Omega^k(U) \mid d\omega = 0\}, && \text{the cocycles on } U,\\
B^k(U) &= \{\omega\in\Omega^k(U) \mid \omega \text{ is exact}\}, && \text{the coboundaries on } U.
\end{align*}
Since exact forms are closed, $B^k(U)$ is a linear subspace of $C^k(U)$. The factor space
\[ H^k_{\mathrm{deR}}(U) = C^k(U)/B^k(U) \]
is called the de Rham cohomology of $U$. If $U$ is star-shaped, then $H^k_{\mathrm{deR}}(U) = 0$ for $k\ge 1$ by Poincaré's lemma. The first de Rham cohomology $H^1_{\mathrm{deR}}$ of $\mathbb{R}^2\setminus\{(0,0)\}$ is non-zero; the winding form is a non-zero element. We have
\[ H^0_{\mathrm{deR}}(U) \cong \mathbb{R}^p \]
if and only if $U$ has exactly $p$ connected components, $U = U_1\cup\cdots\cup U_p$ (disjoint union). Then the characteristic functions $\chi_{U_i}$, $i=1,\dots,p$, form a basis of the $0$-cocycles $C^0(U)$ (note that $B^0(U) = 0$).

11.3 Stokes’ Theorem

11.3.1 Singular Cubes, Singular Chains, and the Boundary Operator

A very nice treatment of the topics of this section is [Spi65, Chapter 4]. The set $[0,1]^k = [0,1]\times\cdots\times[0,1] = \{x\in\mathbb{R}^k \mid 0\le x_i\le 1,\ i=1,\dots,k\}$ is called the $k$-dimensional unit cube. Let $U\subseteq\mathbb{R}^n$ be open.

Definition 11.11 (a) A singular $k$-cube in $U\subseteq\mathbb{R}^n$ is a continuously differentiable mapping $c_k\colon [0,1]^k\to U$.
(b) A singular $k$-chain in $U$ is a formal sum
\[ s_k = n_1 c_{k,1} + \cdots + n_r c_{k,r} \]
with singular $k$-cubes $c_{k,i}$ and integers $n_i\in\mathbb{Z}$.
A singular $0$-cube is a point, a singular $1$-cube is a curve; in general, a singular $2$-cube (in $\mathbb{R}^3$) is a surface with a boundary consisting of $4$ pieces which are differentiable curves. Note that a singular $2$-cube can also be a single point—that is where the name “singular” comes from.
Let $I_k\colon[0,1]^k\to\mathbb{R}^k$ be the identity map, i.e. $I_k(x) = x$, $x\in[0,1]^k$. It is called the standard $k$-cube in $\mathbb{R}^k$. We are going to define the boundary $\partial s_k$ of a singular $k$-chain $s_k$. For $i=1,\dots,k$ define
\begin{align*}
I^k_{(i,0)}(x_1,\dots,x_{k-1}) &= (x_1,\dots,x_{i-1},\ 0,\ x_i,\dots,x_{k-1}),\\
I^k_{(i,1)}(x_1,\dots,x_{k-1}) &= (x_1,\dots,x_{i-1},\ 1,\ x_i,\dots,x_{k-1});
\end{align*}
that is, insert a $0$ and a $1$ at the $i$th component, respectively. The boundary of the standard $k$-cube $I_k$ is now defined as a formal sum of mappings $[0,1]^{k-1}\to[0,1]^k$:
\[ \partial I_k = \sum_{i=1}^k (-1)^i\,\bigl(I^k_{(i,0)} - I^k_{(i,1)}\bigr). \tag{11.15} \]
It is the formal sum of $2k$ singular $(k-1)$-cubes, the faces of the $k$-cube. The boundary of an arbitrary singular $k$-cube $c_k\colon[0,1]^k\to U\subseteq\mathbb{R}^n$ is defined by composing the above mappings $[0,1]^{k-1}\to[0,1]^k$ with the $k$-cube $c_k$:
\[ \partial c_k = c_k\circ\partial I_k = \sum_{i=1}^k (-1)^i\,\bigl(c_k\circ I^k_{(i,0)} - c_k\circ I^k_{(i,1)}\bigr), \tag{11.16} \]
and for a singular $k$-chain $s_k = n_1 c_{k,1} + \cdots + n_r c_{k,r}$ we set
\[ \partial s_k = n_1\,\partial c_{k,1} + \cdots + n_r\,\partial c_{k,r}. \]
The boundary operator $\partial$ associates to each singular $k$-chain a singular $(k-1)$-chain (since both $I^k_{(i,0)}$ and $I^k_{(i,1)}$ depend on $k-1$ variables, all from the segment $[0,1]$). One can show that

\[ \partial(\partial s_k) = 0 \]
for any singular $k$-chain $s_k$.

Example 11.8 (a) In case $n=k=3$ we have
\[ \partial I_3 = -I^3_{(1,0)} + I^3_{(1,1)} + I^3_{(2,0)} - I^3_{(2,1)} - I^3_{(3,0)} + I^3_{(3,1)}, \]
where
\begin{align*}
-I^3_{(1,0)}(x_1,x_2) &= -(0,x_1,x_2), & +I^3_{(1,1)}(x_1,x_2) &= +(1,x_1,x_2),\\
+I^3_{(2,0)}(x_1,x_2) &= +(x_1,0,x_2), & -I^3_{(2,1)}(x_1,x_2) &= -(x_1,1,x_2),\\
-I^3_{(3,0)}(x_1,x_2) &= -(x_1,x_2,0), & +I^3_{(3,1)}(x_1,x_2) &= +(x_1,x_2,1).
\end{align*}
Note that if we take care of the signs in (11.15), all $6$ unit normal vectors $D_1 I^3_{(i,j)}\times D_2 I^3_{(i,j)}$ to the faces have the orientation of the outer normal with respect to the unit $3$-cube $[0,1]^3$. The above sum $\partial I_3$ is a formal sum of singular $2$-cubes; you are not allowed to add componentwise: $-(0,x_1,x_2) + (1,x_1,x_2) \ne (1,0,0)$.

(b) In case $k=2$ we have
\begin{align*}
\partial I_2(x) &= -I^2_{(1,0)} + I^2_{(1,1)} + I^2_{(2,0)} - I^2_{(2,1)} = -(0,x) + (1,x) + (x,0) - (x,1),\\
\partial\partial I_2 &= (E_3 - E_2) - (E_4 - E_1) + (E_2 - E_1) - (E_3 - E_4) = 0.
\end{align*}
Here we have $\partial c_2 = \Gamma_1 + \Gamma_2 - \Gamma_3 - \Gamma_4$.
(c) Let $c_2\colon[0,2\pi]\times[0,\pi]\to\mathbb{R}^3\setminus\{(0,0,0)\}$ be the singular $2$-cube
\[ c_2(s,t) = (\cos s\,\sin t,\ \sin s\,\sin t,\ \cos t). \]

By (b),
\begin{align*}
\partial c_2(x) &= c_2\circ\partial I_2 = c_2(2\pi, x) - c_2(0, x) + c_2(x, 0) - c_2(x, \pi)\\
&= (\cos 2\pi\sin x,\ \sin 2\pi\sin x,\ \cos x) - (\cos 0\sin x,\ \sin 0\sin x,\ \cos x)\\
&\quad + (\cos x\sin 0,\ \sin x\sin 0,\ \cos 0) - (\cos x\sin\pi,\ \sin x\sin\pi,\ \cos\pi)\\
&= (\sin x, 0, \cos x) - (\sin x, 0, \cos x) + (0,0,1) - (0,0,-1)\\
&= (0,0,1) - (0,0,-1).
\end{align*}
Hence, the boundary $\partial c_2$ of the singular $2$-cube $c_2$ is a degenerate singular $1$-chain. We will come back to this example.

11.3.2 Integration

Definition 11.12 Let $c_k\colon[0,1]^k\to U\subseteq\mathbb{R}^n$, $\vec x = c_k(t_1,\dots,t_k)$, be a singular $k$-cube and $\omega$ a $k$-form on $U$. Then $(c_k)^*(\omega)$ is a $k$-form on the unit cube $[0,1]^k$. Thus there exists a unique function $f(t)$, $t\in[0,1]^k$, such that
\[ (c_k)^*(\omega) = f(t)\; dt_1\wedge\cdots\wedge dt_k. \]
Then
\[ \int_{c_k}\omega := \int_{I_k} (c_k)^*(\omega) := \int_{[0,1]^k} f(t)\,dt_1\cdots dt_k \]
is called the integral of $\omega$ over the singular cube $c_k$; on the right there is the $k$-dimensional Riemann integral.
If $s_k = \sum_{i=1}^r n_i\, c_{k,i}$ is a $k$-chain, set
\[ \int_{s_k}\omega = \sum_{i=1}^r n_i \int_{c_{k,i}}\omega. \]
If $k=0$, a $0$-cube is a single point $c_0(0) = x_0$ and a $0$-form is a function $\omega\in C^\infty(G)$. We set $\int_{c_0}\omega = c_0^*(\omega)\big|_{t=0} = \omega(c_0(0)) = \omega(x_0)$. We discuss the two special cases $k=1$ and $k=n$.

Example 11.9 (a) $k=1$. Let $c\colon[0,1]\to\mathbb{R}^n$ be an oriented, smooth curve with $\Gamma = c([0,1])$, and let $\omega = f_1(x)\,dx_1 + \cdots + f_n(x)\,dx_n$ be a $1$-form on $\mathbb{R}^n$. Then
\[ c^*(\omega) = \bigl(f_1(c(t))\,c_1'(t) + \cdots + f_n(c(t))\,c_n'(t)\bigr)\,dt \]
is a $1$-form on $[0,1]$ such that
\[ \int_c\omega = \int_{[0,1]} c^*\omega = \int_0^1 \bigl(f_1(c(t))\,c_1'(t) + \cdots + f_n(c(t))\,c_n'(t)\bigr)\,dt = \int_\Gamma \vec f\cdot d\vec x. \]
Obviously, $\int_c\omega$ is the line integral of $\vec f$ over $\Gamma$.
(b) $k=n$. Let $c\colon[0,1]^k\to\mathbb{R}^k$ be continuously differentiable and let $x = c(t)$. Let $\omega = f(x)\,dx_1\wedge\cdots\wedge dx_k$ be a differential $k$-form on $\mathbb{R}^k$. By Proposition 11.4 (e),
\[ c^*(\omega) = f(c(t))\,\frac{\partial(c_1,\dots,c_k)}{\partial(t_1,\dots,t_k)}\; dt_1\wedge\cdots\wedge dt_k. \]
Therefore,
\[ \int_c\omega = \int_{[0,1]^k} f(c(t))\,\frac{\partial(c_1,\dots,c_k)}{\partial(t_1,\dots,t_k)}\; dt_1\cdots dt_k. \tag{11.17} \]
Let $c = I_k$ be the standard $k$-cube. Then
\[ \int_{I_k}\omega = \int_{[0,1]^k} f(x)\,dx_1\cdots dx_k \]
is the $k$-dimensional Riemann integral of $f$ over $[0,1]^k$.
Now let $\tilde I_k(x_1,\dots,x_k) = (x_2,x_1,x_3,\dots,x_k)$. Then $I_k([0,1]^k) = \tilde I_k([0,1]^k) = [0,1]^k$; however,
\[ \int_{\tilde I_k}\omega = \int_{[0,1]^k} f(x_2,x_1,x_3,\dots,x_k)\,(-1)\,dx_1\cdots dx_k = -\int_{[0,1]^k} f(x_1,x_2,x_3,\dots,x_k)\,dx_1\cdots dx_k = -\int_{I_k}\omega. \]
We see that $\int_{I_k}\omega$ is an oriented Riemann integral. Note that in the above formula (11.17) we do not have the absolute value of the Jacobian.

11.3.3 Stokes’ Theorem

Theorem 11.7 Let $U$ be an open subset of $\mathbb{R}^n$, $k\ge 0$ a non-negative integer, and let $s_{k+1}$ be a singular $(k+1)$-chain in $U$. Let $\omega$ be a differential $k$-form on $U$, $\omega\in\Omega^k(U)$. Then we have
\[ \int_{\partial s_{k+1}}\omega = \int_{s_{k+1}} d\omega. \]

Proof. (a) Let $s_{k+1} = I_{k+1}$ be the standard $(k+1)$-cube; in particular $n = k+1$. Let
\[ \omega = \sum_{i=1}^{k+1} f_i(x)\; dx_1\wedge\cdots\wedge\widehat{dx_i}\wedge\cdots\wedge dx_{k+1}. \]
Then
\[ d\omega = \sum_{i=1}^{k+1} (-1)^{i+1}\,\frac{\partial f_i(x)}{\partial x_i}\; dx_1\wedge\cdots\wedge dx_{k+1}, \]
hence by Example 11.6 (b), Fubini's theorem, and the fundamental theorem of calculus,
\begin{align*}
\int_{I_{k+1}} d\omega &= \sum_{i=1}^{k+1} (-1)^{i+1}\int_{[0,1]^{k+1}} \frac{\partial f_i}{\partial x_i}\; dx_1\cdots dx_{k+1}\\
&= \sum_{i=1}^{k+1} (-1)^{i+1}\int_{[0,1]^k}\Bigl(\int_0^1 \frac{\partial f_i}{\partial x_i}(x_1,\dots,t,\dots,x_{k+1})\,dt\Bigr)\, dx_1\cdots\widehat{dx_i}\cdots dx_{k+1}\\
&= \sum_{i=1}^{k+1} (-1)^{i+1}\int_{[0,1]^k}\bigl(f_i(x_1,\dots,1,\dots,x_{k+1}) - f_i(x_1,\dots,0,\dots,x_{k+1})\bigr)\, dx_1\cdots\widehat{dx_i}\cdots dx_{k+1}\\
&= \sum_{i=1}^{k+1} (-1)^{i+1}\Bigl(\int_{[0,1]^k}\bigl(I^{k+1}_{(i,1)}\bigr)^*\omega - \int_{[0,1]^k}\bigl(I^{k+1}_{(i,0)}\bigr)^*\omega\Bigr) \qquad\text{(Example 11.6 (b))}\\
&= \int_{\partial I_{k+1}}\omega,
\end{align*}
by the definition of $\partial I_{k+1}$. The assertion is shown in the case of the identity map.
(b) The general case. Let $I_{k+1}$ be the standard $(k+1)$-cube. Since the pull-back and the differential commute (Proposition 11.4) we have
\begin{align*}
\int_{c_{k+1}} d\omega &= \int_{I_{k+1}} (c_{k+1})^*(d\omega) = \int_{I_{k+1}} d\bigl((c_{k+1})^*\omega\bigr) = \int_{\partial I_{k+1}} (c_{k+1})^*\omega\\
&= \sum_{i=1}^{k+1} (-1)^i\Bigl(\int_{I^{k+1}_{(i,0)}} (c_{k+1})^*\omega - \int_{I^{k+1}_{(i,1)}} (c_{k+1})^*\omega\Bigr)\\
&= \sum_{i=1}^{k+1} (-1)^i \int_{c_{k+1}\circ I^{k+1}_{(i,0)} - c_{k+1}\circ I^{k+1}_{(i,1)}}\omega = \int_{\partial c_{k+1}}\omega.
\end{align*}
(c) Finally, let $s_{k+1} = \sum_i n_i\, c_{k+1,i}$ with $n_i\in\mathbb{Z}$ and singular $(k+1)$-cubes $c_{k+1,i}$. By definition and by (b),
\[ \int_{s_{k+1}} d\omega = \sum_i n_i \int_{c_{k+1,i}} d\omega = \sum_i n_i \int_{\partial c_{k+1,i}}\omega = \int_{\partial s_{k+1}}\omega. \]

Remark 11.4 Stokes' theorem is valid for arbitrary oriented compact differentiable $k$-dimensional manifolds $F$ and continuously differentiable $(k-1)$-forms $\omega$ on $F$.

Example 11.10 We come back to Example 11.8 (c). Let $\omega = (x\,dy\wedge dz + y\,dz\wedge dx + z\,dx\wedge dy)/r^3$, with $r = \sqrt{x^2+y^2+z^2}$, be a $2$-form on $\mathbb{R}^3\setminus\{(0,0,0)\}$. It is easy to show that $\omega$ is closed, $d\omega = 0$. We compute $\int_{c_2}\omega$. First note that $c_2^*(r^3) = 1$, $c_2^*(x) = \cos s\sin t$, $c_2^*(y) = \sin s\sin t$, $c_2^*(z) = \cos t$, such that
\begin{align*}
c_2^*(dx) &= d(\cos s\sin t) = -\sin s\sin t\,ds + \cos s\cos t\,dt,\\
c_2^*(dy) &= d(\sin s\sin t) = \cos s\sin t\,ds + \sin s\cos t\,dt,\\
c_2^*(dz) &= -\sin t\,dt,
\end{align*}
and
\begin{align*}
c_2^*(\omega) &= c_2^*(x\,dy\wedge dz + y\,dz\wedge dx + z\,dx\wedge dy)\\
&= c_2^*(x)\,c_2^*(dy)\wedge c_2^*(dz) + c_2^*(y\,dz\wedge dx) + c_2^*(z\,dx\wedge dy)\\
&= \bigl(-\cos^2 s\sin^3 t - \sin^2 s\sin^3 t - \cos t\,(\sin^2 s\sin t\cos t + \cos^2 s\sin t\cos t)\bigr)\; ds\wedge dt\\
&= -\sin t\; ds\wedge dt,
\end{align*}
such that
\[ \int_{c_2}\omega = \int_{[0,2\pi]\times[0,\pi]} c_2^*(\omega) = -\int_0^{2\pi}\!\!\int_0^{\pi} \sin t\;dt\,ds = -4\pi. \]
Stokes' theorem shows that $\omega$ is not exact on $\mathbb{R}^3\setminus\{(0,0,0)\}$. Suppose to the contrary that $\omega = d\eta$ for some $\eta\in\Omega^1(\mathbb{R}^3\setminus\{(0,0,0)\})$. Since by Example 11.8 (c), $\partial c_2$ is a degenerate $1$-chain (it consists of two points), the pull-back $(\partial c_2)^*(\eta)$ is $0$ and so is the integral
\[ 0 = \int_{I_1} (\partial c_2)^*(\eta) = \int_{\partial c_2}\eta = \int_{c_2} d\eta = \int_{c_2}\omega = -4\pi, \]
a contradiction; hence, $\omega$ is not exact. We come back to the two special cases $k=1$, $n=3$ and $k=2$, $n=3$.
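As a sanity check, the value $\int_{c_2}\omega = -4\pi$ can be recomputed from the pulled-back form $c_2^*(\omega) = -\sin t\,ds\wedge dt$ by a one-dimensional quadrature, since the integrand does not depend on $s$:

```python
import math

def integral_over_sphere(n=20000):
    # c2*(omega) = -sin t ds ^ dt on [0, 2*pi] x [0, pi]; the integrand is
    # independent of s, so the integral is 2*pi * int_0^pi (-sin t) dt
    # (composite midpoint rule with n nodes)
    h = math.pi / n
    return 2.0 * math.pi * sum(-math.sin((k + 0.5) * h) * h for k in range(n))

val = integral_over_sphere()
print(abs(val + 4.0 * math.pi) < 1e-6)
```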

11.3.4 Special Cases

$k=1$, $n=3$. Let $c\colon[0,1]^2\to U\subseteq\mathbb{R}^3$ be a singular $2$-cube such that $F = c([0,1]^2)$ is a regular smooth surface in $\mathbb{R}^3$. Then $\partial F$ is a closed path consisting of $4$ parts with the counter-clockwise orientation. Let $\omega = f_1\,dx_1 + f_2\,dx_2 + f_3\,dx_3$ be a differential $1$-form on $U$. By Example 11.9 (a),
\[ \int_{\partial c_2}\omega = \int_{\partial F} f_1\,dx_1 + f_2\,dx_2 + f_3\,dx_3. \]
On the other hand, by Example 11.5 (c),
\[ d\omega = \operatorname{curl} f\cdot(dx_2\wedge dx_3,\ dx_3\wedge dx_1,\ dx_1\wedge dx_2). \]
In this case Stokes' theorem gives
\[ \int_{\partial c_2}\omega = \int_{c_2} d\omega, \]
that is,
\[ \int_{\partial F} f_1\,dx_1 + f_2\,dx_2 + f_3\,dx_3 = \int_F \operatorname{curl} f\cdot(dx_2\wedge dx_3,\ dx_3\wedge dx_1,\ dx_1\wedge dx_2). \]
If $F$ lies in the $x_1$–$x_2$ plane, we get Green's theorem.
$k=2$, $n=3$. Let $c_3$ be a singular $3$-cube in $\mathbb{R}^3$ and $G = c_3([0,1]^3)$. Further let
\[ \omega = v_1\,dx_2\wedge dx_3 + v_2\,dx_3\wedge dx_1 + v_3\,dx_1\wedge dx_2 \]
with a continuously differentiable vector field $v\in C^1(G)$. By Example 11.5 (d), $d\omega = \operatorname{div}(v)\; dx_1\wedge dx_2\wedge dx_3$. The boundary of $G$ consists of the $6$ faces $\partial c_3([0,1]^3)$, oriented with the outer unit normal vector. Stokes' theorem then gives
\[ \int_{c_3} d\omega = \int_{\partial c_3} v_1\,dx_2\wedge dx_3 + v_2\,dx_3\wedge dx_1 + v_3\,dx_1\wedge dx_2, \]
that is,
\[ \int_G \operatorname{div} v\; dx\,dy\,dz = \int_{\partial G} \vec v\cdot d\vec S. \]
This is Gauß' divergence theorem.

11.3.5 Applications

The following two applications were not covered in the lecture.

(a) Brouwer's Fixed Point Theorem

Proposition 11.8 (Retraction Theorem) Let $G\subset\mathbb{R}^n$ be a compact, connected, simply connected set with smooth boundary $\partial G$. There exists no vector field $f\colon G\to\mathbb{R}^n$, $f_i\in C^2(G)$, $i=1,\dots,n$, such that $f(G)\subset\partial G$ and $f(x) = x$ for all $x\in\partial G$.

Proof. Suppose to the contrary that such an $f$ exists; consider $\omega\in\Omega^{n-1}(U)$,
\[ \omega = x_1\; dx_2\wedge dx_3\wedge\cdots\wedge dx_n. \]
First we show that $f^*(d\omega) = 0$. By definition, for $v_1,\dots,v_n\in\mathbb{R}^n$ we have
\[ f^*(d\omega)(p)(v_1,\dots,v_n) = d\omega(f(p))\bigl(Df(p)v_1,\ Df(p)v_2,\dots,\ Df(p)v_n\bigr). \]
Since $\dim f(G) = \dim\partial G = n-1$, the $n$ vectors $Df(p)v_1, Df(p)v_2,\dots,Df(p)v_n$ can be thought of as being $n$ vectors in an $(n-1)$-dimensional linear space; hence, they are linearly dependent. Consequently, any alternating $n$-form on those vectors is $0$. Thus $f^*(d\omega) = 0$. By Stokes' theorem
\[ \int_{\partial G} f^*(\omega) = \int_G f^*(d\omega) = 0. \]
On the other hand, $f = \operatorname{id}$ on $\partial G$, such that
\[ f^*(\omega)\big|_{\partial G} = \omega\big|_{\partial G} = x_1\; dx_2\wedge\cdots\wedge dx_n\big|_{\partial G}. \]
Again, by Stokes' theorem,
\[ 0 = \int_{\partial G} f^*(\omega) = \int_{\partial G}\omega = \int_G d\omega = \int_G dx_1\wedge\cdots\wedge dx_n = |G|, \]
a contradiction.

Theorem 11.9 (Brouwer's Fixed Point Theorem) Let $g\colon B_1\to B_1$ be a continuous mapping of the closed unit ball $B_1\subset\mathbb{R}^n$ into itself. Then $g$ has a fixed point $p$, $g(p) = p$.

Proof. (a) We first prove that the theorem holds true for a twice continuously differentiable mapping $g$. Suppose to the contrary that $g$ has no fixed point. For any $p\in B_1$ the line through $p$ and $g(p)$ is then well defined. Let $h(p)$ be the intersection point of this line with the unit sphere $S^{n-1}$ such that $h(p) - p$ is a positive multiple of $p - g(p)$. In particular, $h$ is a $C^2$-mapping from $B_1$ into $S^{n-1}$ which is the identity on $S^{n-1}$. By the previous proposition, such a mapping does not exist. Hence, $g$ has a fixed point.
(b) For a merely continuous mapping $g$ one uses the Stone–Weierstraß theorem to approximate $g$ by polynomials.

In case $n=1$ Brouwer's theorem is just the intermediate value theorem.
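The $n=1$ case can be made concrete: for a continuous $g\colon[-1,1]\to[-1,1]$, the function $h(x)=g(x)-x$ changes sign, so bisection locates a fixed point. A minimal sketch (the map $g=\cos$ is an arbitrary sample choice):

```python
import math

def fixed_point(g, a=-1.0, b=1.0, tol=1e-12):
    # n = 1: h(x) = g(x) - x satisfies h(a) >= 0 >= h(b) for g: [a,b] -> [a,b],
    # so bisection on h finds a root, i.e. a fixed point of g
    # (this is exactly the intermediate value theorem in action)
    h = lambda x: g(x) - x
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

g = lambda x: math.cos(x)   # a sample continuous map of [-1, 1] into itself
p = fixed_point(g)
print(abs(g(p) - p) < 1e-9)
```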

(b) The Fundamental Theorem of Algebra

We give a first proof of the fundamental theorem of algebra, Theorem 5.19:

Every polynomial $f(z) = z^n + a_1 z^{n-1} + \cdots + a_n$ with complex coefficients $a_i\in\mathbb{C}$ has a root in $\mathbb{C}$.

We use two facts: the winding form $\omega$ on $\mathbb{R}^2\setminus\{(0,0)\}$ is closed but not exact, and $z^n$ and $f(z)$ are “close together” for sufficiently large $|z|$.

We view $\mathbb{C}$ as $\mathbb{R}^2$ with $(a,b) = a + b\mathrm{i}$. Define the following singular $1$-cubes:
\begin{align*}
c_{R,n}(s) &= (R^n\cos(2\pi ns),\ R^n\sin(2\pi ns)) = z^n,\\
c_{R,f}(s) &= f\circ c_{R,1}(s) = f(R\cos(2\pi s),\ R\sin(2\pi s)) = f(z),
\end{align*}
where $z = z(s) = R(\cos 2\pi s + \mathrm{i}\sin 2\pi s)$, $s\in[0,1]$; note that $|z| = R$. Further, let
\begin{align*}
c(s,t) &= (1-t)\,c_{R,f}(s) + t\,c_{R,n}(s) = (1-t)f(z) + t z^n, & (s,t)\in[0,1]^2,\\
b(s,t) &= f\bigl((1-t)R(\cos 2\pi s,\ \sin 2\pi s)\bigr) = f((1-t)z), & (s,t)\in[0,1]^2,
\end{align*}
be singular $2$-cubes in $\mathbb{R}^2$.

Lemma 11.10 If $|z| = R$ is sufficiently large, then
\[ |c(s,t)| \ge \frac{R^n}{2},\qquad (s,t)\in[0,1]^2. \]
Proof. Since $f(z) - z^n$ is a polynomial of degree less than $n$,
\[ \frac{f(z) - z^n}{z^n} \longrightarrow 0 \quad\text{as } z\to\infty, \]
in particular $|f(z) - z^n| \le R^n/2$ if $R$ is sufficiently large. Then we have
\begin{align*}
|c(s,t)| &= |(1-t)f(z) + t z^n| = |z^n + (1-t)(f(z) - z^n)|\\
&\ge |z^n| - (1-t)\,|f(z) - z^n| \ge R^n - \frac{R^n}{2} = \frac{R^n}{2}.
\end{align*}

The only fact we need is $c(s,t)\ne 0$ for sufficiently large $R$; hence, $c$ maps the unit square into $\mathbb{R}^2\setminus\{(0,0)\}$.

Lemma 11.11 Let $\omega = \omega(x,y) = (-y\,dx + x\,dy)/(x^2 + y^2)$ be the winding form on $\mathbb{R}^2\setminus\{(0,0)\}$. Then we have
(a)
\begin{align*}
\partial c &= c_{R,f} - c_{R,n},\\
\partial b &= f(z) - f(0).
\end{align*}
(b) For sufficiently large $R$, $c$, $c_{R,n}$, and $c_{R,f}$ are chains in $\mathbb{R}^2\setminus\{(0,0)\}$ and
\[ \int_{c_{R,n}}\omega = \int_{c_{R,f}}\omega = 2\pi n. \]
Proof. (a) Note that $z(0) = z(1) = R$. Since $\partial I_2(x) = (x,0) - (x,1) + (1,x) - (0,x)$ we have
\begin{align*}
\partial c(s) &= c(s,0) - c(s,1) + c(1,s) - c(0,s)\\
&= f(z) - z^n - \bigl((1-s)f(R) + sR^n\bigr) + \bigl((1-s)f(R) + sR^n\bigr) = f(z) - z^n.
\end{align*}
This proves (a). Similarly, we have
\begin{align*}
\partial b(s) &= b(s,0) - b(s,1) + b(1,s) - b(0,s)\\
&= f(z) - f(0) + f((1-s)R) - f((1-s)R) = f(z) - f(0).
\end{align*}
(b) By Lemma 11.10, $c$ is a singular $2$-chain in $\mathbb{R}^2\setminus\{(0,0)\}$ for sufficiently large $R$. Hence $\partial c$ is a $1$-chain in $\mathbb{R}^2\setminus\{(0,0)\}$. In particular, both $c_{R,n}$ and $c_{R,f}$ take values in $\mathbb{R}^2\setminus\{(0,0)\}$. Hence $(\partial c)^*(\omega)$ is well-defined. We compute $c_{R,n}^*(\omega)$ using the pull-backs of $dx$ and $dy$:
\begin{align*}
c_{R,n}^*(x^2 + y^2) &= R^{2n},\\
c_{R,n}^*(dx) &= -2\pi n R^n\sin(2\pi ns)\,ds,\\
c_{R,n}^*(dy) &= 2\pi n R^n\cos(2\pi ns)\,ds,\\
c_{R,n}^*(\omega) &= 2\pi n\,ds.
\end{align*}

Hence
\[ \int_{c_{R,n}}\omega = \int_0^1 2\pi n\,ds = 2\pi n. \]
By Stokes' theorem, and since $\omega$ is closed,
\[ \int_{\partial c}\omega = \int_c d\omega = 0, \]
such that by (a) and the above calculation
\[ 0 = \int_{\partial c}\omega = \int_{c_{R,f}}\omega - \int_{c_{R,n}}\omega,\qquad\text{hence}\qquad \int_{c_{R,f}}\omega = \int_{c_{R,n}}\omega = 2\pi n. \]
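The value $\int_{c_{R,n}}\omega = 2\pi n$ can be confirmed by direct numerical integration of the winding form along a circle traversed $n$ times (the radius and $n$ below are arbitrary choices; the result is independent of the radius):

```python
import math

def winding_integral(rho, n, steps=10000):
    # line integral of omega = (-y dx + x dy)/(x^2 + y^2) along the
    # circle of radius rho traversed n times, parametrized by s in [0, 1]
    # (midpoint rule)
    total = 0.0
    h = 1.0 / steps
    for k in range(steps):
        s = (k + 0.5) * h
        w = 2.0 * math.pi * n * s
        x, y = rho * math.cos(w), rho * math.sin(w)
        dx = -2.0 * math.pi * n * rho * math.sin(w) * h
        dy = 2.0 * math.pi * n * rho * math.cos(w) * h
        total += (-y * dx + x * dy) / (x * x + y * y)
    return total

val = winding_integral(3.0, 4)
print(abs(val - 8.0 * math.pi) < 1e-8)
```

Here the pulled-back integrand is the constant $2\pi n$, so the quadrature is exact up to rounding.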

We complete the proof of the fundamental theorem of algebra. Suppose to the contrary that the polynomial $f(z)$ has no zero in $\mathbb{C}$. Then $b$ as well as $\partial b$ are singular chains in $\mathbb{R}^2\setminus\{(0,0)\}$. By Lemma 11.11 (b) and again by Stokes' theorem we have
\[ \int_{c_{R,f}}\omega = \int_{c_{R,f} - f(0)}\omega = \int_{\partial b}\omega = \int_b d\omega = 0. \]
But this is a contradiction to Lemma 11.11 (b). Hence, $b$ is not a $2$-chain in $\mathbb{R}^2\setminus\{(0,0)\}$, that is, there exist $s,t\in[0,1]$ such that $b(s,t) = f((1-t)z) = 0$. We have found that $(1-t)R(\cos(2\pi s) + \mathrm{i}\sin(2\pi s))$ is a zero of $f$. Actually, we have shown a little more: there is a zero of $f$ in the disc $\{z\in\mathbb{C} \mid |z|\le R\}$, where $R \ge \max\{1,\ 2\sum_i |a_i|\}$. Indeed, in this case
\[ |f(z) - z^n| \le \sum_{k=1}^{n} |a_k|\,|z|^{n-k} \le \sum_{k=1}^{n} |a_k|\, R^{n-1} \le \frac{R^n}{2}, \]
and this condition ensures $|c(s,t)| \ne 0$ as in the proof of Lemma 11.10.

Chapter 12

Measure Theory and Integration

12.1 Measure Theory

Citation from Rudin's book, [Rud66, Chapter 1]: Towards the end of the 19th century it became clear to many mathematicians that the Riemann integral should be replaced by some other type of integral, more general and more flexible, better suited for dealing with limit processes. Among the attempts made in this direction, the most notable ones were due to Jordan, Borel, W. H. Young, and Lebesgue. It was Lebesgue's construction which turned out to be the most successful. In a brief outline, here is the main idea: The Riemann integral of a function $f$ over an interval $[a,b]$ can be approximated by sums of the form
\[ \sum_{i=1}^n f(t_i)\, m(E_i), \]
where $E_1,\dots,E_n$ are disjoint intervals whose union is $[a,b]$, $m(E_i)$ denotes the length of $E_i$, and $t_i\in E_i$ for $i=1,\dots,n$. Lebesgue discovered that a completely satisfactory theory of integration results if the sets $E_i$ in the above sum are allowed to belong to a larger class of subsets of the line, the so-called “measurable sets,” and if the class of functions under consideration is enlarged to what we call “measurable functions.” The crucial set-theoretic properties involved are the following: the union and the intersection of any countable family of measurable sets are measurable; … the notion of “length” (now called “measure”) can be extended to them in such a way that
\[ m(E_1\cup E_2\cup\cdots) = m(E_1) + m(E_2) + \cdots \]
for any countable collection $\{E_i\}$ of pairwise disjoint measurable sets. This property of $m$ is called countable additivity.
The passage from Riemann's theory of integration to that of Lebesgue is a process of completion. It is of the same fundamental importance in analysis as the construction of the real number system from the rationals.
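The Riemann sums above are easy to sketch in code; with $E_i$ equal subintervals and $t_i$ their midpoints, the sum approaches $\int_a^b f\,dx$:

```python
def riemann_sum(f, a, b, n):
    # sum of f(t_i) * m(E_i), where the E_i are n equal subintervals
    # of [a, b] and t_i is the midpoint of E_i
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * h for i in range(n))

# example: integral of x^2 over [0, 1] is 1/3
approx = riemann_sum(lambda x: x * x, 0.0, 1.0, 1000)
print(abs(approx - 1.0 / 3.0) < 1e-5)
```

Lebesgue's point is that the same sums make sense for far more general sets $E_i$, once $m$ is extended to measurable sets.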


12.1.1 Algebras, σ-algebras, and Borel Sets

(a) The Measure Problem. Definitions

Lebesgue (1904) states the following problem: We want to associate to each bounded subset E of the real line a positive real number m(E), called measure of E, such that the following properties are satisfied:

(1) Any two congruent sets (by shift or reflexion) have the same measure. (2) The measure is countably additive. (3) The measure of the unit interval [0, 1] is 1.

He emphasized that he was not able to solve this problem in full generality, but only for a certain class of sets which he called measurable. We will see that this restriction to a large family of bounded sets is unavoidable—the measure problem has no solution.

Definition 12.1 Let $X$ be a set. A non-empty family $\mathcal{A}$ of subsets of $X$ is called an algebra if $A, B\in\mathcal{A}$ implies $A^c\in\mathcal{A}$ and $A\cup B\in\mathcal{A}$.
An algebra $\mathcal{A}$ is called a $\sigma$-algebra if for every countable family $\{A_n \mid n\in\mathbb{N}\}$ with $A_n\in\mathcal{A}$ we have
\[ \bigcup_{n\in\mathbb{N}} A_n = A_1\cup A_2\cup\cdots\cup A_n\cup\cdots \in \mathcal{A}. \]

Remarks 12.1 (a) Since $A\in\mathcal{A}$ implies $A\cup A^c\in\mathcal{A}$, we have $X\in\mathcal{A}$ and $\varnothing = X^c\in\mathcal{A}$.
(b) If $\mathcal{A}$ is an algebra, then $A\cap B\in\mathcal{A}$ for all $A, B\in\mathcal{A}$. Indeed, by de Morgan's rules $\bigl(\bigcup_\alpha A_\alpha\bigr)^c = \bigcap_\alpha A_\alpha^c$ and $\bigl(\bigcap_\alpha A_\alpha\bigr)^c = \bigcup_\alpha A_\alpha^c$, we have $A\cap B = (A^c\cup B^c)^c$, and all the members on the right are in $\mathcal{A}$ by the definition of an algebra.
(c) Let $\mathcal{A}$ be a $\sigma$-algebra. Then $\bigcap_{n\in\mathbb{N}} A_n\in\mathcal{A}$ if $A_n\in\mathcal{A}$ for all $n\in\mathbb{N}$. Again by de Morgan's rule,
\[ \bigcap_{n\in\mathbb{N}} A_n = \Bigl(\bigcup_{n\in\mathbb{N}} A_n^c\Bigr)^c. \]
(d) The family $\mathcal{P}(X)$ of all subsets of $X$ is both an algebra and a $\sigma$-algebra.
(e) Any $\sigma$-algebra is an algebra, but there are algebras which are not $\sigma$-algebras.
(f) The finite and the cofinite subsets (the latter are complements of finite sets) of an infinite set form an algebra. Do they form a $\sigma$-algebra?

(b) Elementary Sets and Borel Sets in $\mathbb{R}^n$

Let $\overline{\mathbb{R}} = \mathbb{R}\cup\{+\infty\}\cup\{-\infty\}$ be the extended real axis. We use the old rules as introduced in Section 3.1.1 at page 79. The new rule, which is used in measure theory only, is $0\cdot(\pm\infty) = (\pm\infty)\cdot 0 = 0$.
The set
\[ I = \{(x_1,\dots,x_n)\in\mathbb{R}^n \mid a_i \mathrel{\Box} x_i \mathrel{\Box} b_i,\ i=1,\dots,n\} \]
is called a rectangle or a box in $\mathbb{R}^n$, where $\Box$ stands either for $<$ or for $\le$, and $a_i, b_i\in\overline{\mathbb{R}}$. For example, $a_i = -\infty$ and $b_i = +\infty$ yields $I = \mathbb{R}^n$, whereas $a_1 = 2$, $b_1 = 1$ yields $I = \varnothing$. A subset of $\mathbb{R}^n$ is called an elementary set if it is the union of a finite number of rectangles in $\mathbb{R}^n$. Let $\mathsf{E}_n$ denote the set of elementary subsets of $\mathbb{R}^n$:
\[ \mathsf{E}_n = \{ I_1\cup I_2\cup\cdots\cup I_r \mid r\in\mathbb{N},\ I_j \text{ is a box in } \mathbb{R}^n \}. \]

Lemma 12.1 $\mathsf{E}_n$ is an algebra but not a $\sigma$-algebra.

Proof. The complement of a finite interval is the union of two intervals; the complement of an infinite interval is an infinite interval. Hence, the complement of a rectangle in $\mathbb{R}^n$ is the finite union of rectangles. The countable (disjoint) union $M = \bigcup_{n\in\mathbb{N}} [n,\ n + \tfrac12]$ is not an elementary set.

Note that any elementary set is the disjoint union of a finite number of rectangles. Let $\mathsf{B}$ be any non-empty family of subsets of $X$. Let $\sigma(\mathsf{B})$ denote the intersection of all $\sigma$-algebras containing $\mathsf{B}$, i.e. $\sigma(\mathsf{B}) = \bigcap_{i\in I}\mathcal{A}_i$, where $\{\mathcal{A}_i \mid i\in I\}$ is the family of all $\sigma$-algebras $\mathcal{A}_i$ which contain $\mathsf{B}$, i.e. $\mathsf{B}\subseteq\mathcal{A}_i$ for all $i\in I$.
Note that the $\sigma$-algebra $\mathcal{P}(X)$ of all subsets is always a member of the family $\{\mathcal{A}_i\}$, such that $\sigma(\mathsf{B})$ exists. We call $\sigma(\mathsf{B})$ the $\sigma$-algebra generated by $\mathsf{B}$. By definition, $\sigma(\mathsf{B})$ is the smallest $\sigma$-algebra which contains the sets of $\mathsf{B}$; it is indeed a $\sigma$-algebra. Moreover, $\sigma(\sigma(\mathsf{B})) = \sigma(\mathsf{B})$, and if $\mathsf{B}_1\subset\mathsf{B}_2$ then $\sigma(\mathsf{B}_1)\subset\sigma(\mathsf{B}_2)$.
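On a finite ground set every algebra is automatically a $\sigma$-algebra, so $\sigma(\mathsf{B})$ can be computed by brute-force closure under complement and union. A small sketch (the generating family is an arbitrary choice):

```python
from itertools import combinations

def generated_sigma_algebra(X, B):
    # On a finite X, sigma(B) is the closure of B together with the empty set
    # and X under complement and pairwise union; iterate to a fixed point.
    X = frozenset(X)
    A = {frozenset(), X} | {frozenset(b) for b in B}
    changed = True
    while changed:
        changed = False
        for S in list(A):
            if X - S not in A:          # close under complement
                A.add(X - S)
                changed = True
        for S, T in combinations(list(A), 2):
            if S | T not in A:          # close under union
                A.add(S | T)
                changed = True
    return A

A = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
print(len(A))  # the atoms {1}, {2}, {3,4} generate 2^3 = 8 sets
```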

Definition 12.2 The Borel algebra $\mathsf{B}_n$ in $\mathbb{R}^n$ is the $\sigma$-algebra generated by the elementary sets $\mathsf{E}_n$. Its elements are called Borel sets.

The Borel algebra $\mathsf{B}_n = \sigma(\mathsf{E}_n)$ is the smallest $\sigma$-algebra which contains all boxes in $\mathbb{R}^n$. We will see that the Borel algebra is a huge family of subsets of $\mathbb{R}^n$ which contains “all sets we are ever interested in.” Later, we will construct a non-Borel set.

Proposition 12.2 Open and closed subsets of $\mathbb{R}^n$ are Borel sets.

Proof. We give the proof in the case of $\mathbb{R}^2$. Let $I_\varepsilon(x_0,y_0) = (x_0-\varepsilon,\ x_0+\varepsilon)\times(y_0-\varepsilon,\ y_0+\varepsilon)$ denote the open square of size $2\varepsilon$ by $2\varepsilon$ with midpoint $(x_0,y_0)$; note that $I_{\frac{1}{n+1}}\subseteq I_{\frac1n}$ for $n\in\mathbb{N}$. Let $M\subset\mathbb{R}^2$ be open. To every point $(x_0,y_0)\in M$ with rational coordinates $x_0, y_0$ we choose the largest square $I_{1/n}(x_0,y_0)\subseteq M$ and denote it by $J(x_0,y_0)$. We show that

\[ M = \bigcup_{\substack{(x_0,y_0)\in M,\\ x_0,y_0\in\mathbb{Q}}} J(x_0,y_0). \]
Since the set of rational points in $M$ is at most countable, the right side is in $\sigma(\mathsf{E}_2)$. Now let $(a,b)\in M$ be arbitrary. Since $M$ is open, there exists $n\in\mathbb{N}$ such that $I_{2/n}(a,b)\subseteq M$. Since the rational points are dense in $\mathbb{R}^2$, there is a rational point $(x_0,y_0)$ which is contained in $I_{1/n}(a,b)$. Then we have
\[ I_{\frac1n}(x_0,y_0) \subseteq I_{\frac2n}(a,b) \subseteq M. \]
Since $(a,b)\in I_{\frac1n}(x_0,y_0)\subseteq J(x_0,y_0)$, we have shown that $M$ is the union of the countable family of sets $J$. Since closed sets are the complements of open sets and complements are again in the $\sigma$-algebra, the assertion follows for closed sets.

Remarks 12.2 (a) We have proved that any open subset $M$ of $\mathbb{R}^n$ is the countable union of rectangles $I\subseteq M$.
(b) The Borel algebra $\mathsf{B}_n$ is also the $\sigma$-algebra generated by the family of open or of closed sets in $\mathbb{R}^n$: $\mathsf{B}_n = \sigma(\mathsf{G}_n) = \sigma(\mathsf{F}_n)$. Countable unions and intersections of open or closed sets are in $\mathsf{B}_n$.

Let us look in more detail at some of the sets in $\sigma(\mathsf{E}_n)$. Let $\mathsf{G}$ and $\mathsf{F}$ be the families of all open and closed subsets of $\mathbb{R}^n$, respectively. Let $\mathsf{G}_\delta$ be the collection of all intersections of sequences of open sets (from $\mathsf{G}$), and let $\mathsf{F}_\sigma$ be the collection of all unions of sequences of sets of $\mathsf{F}$. One can prove that $\mathsf{F}\subset\mathsf{G}_\delta$ and $\mathsf{G}\subset\mathsf{F}_\sigma$; these inclusions are strict. Since countable intersections and unions of countable intersections and unions are still countable operations, $\mathsf{G}_\delta, \mathsf{F}_\sigma\subset\sigma(\mathsf{E}_n)$.
For an arbitrary family $\mathsf{S}$ of sets, let $\mathsf{S}_\sigma$ be the collection of all unions of sequences of sets in $\mathsf{S}$, and let $\mathsf{S}_\delta$ be the collection of all intersections of sequences of sets in $\mathsf{S}$. We can iterate the operations represented by $\sigma$ and $\delta$, obtaining from the class $\mathsf{G}$ the classes $\mathsf{G}_\delta, \mathsf{G}_{\delta\sigma}, \mathsf{G}_{\delta\sigma\delta},\dots$ and from $\mathsf{F}$ the classes $\mathsf{F}_\sigma, \mathsf{F}_{\sigma\delta},\dots$. It turns out that we have the inclusions
\begin{align*}
\mathsf{G} &\subset \mathsf{G}_\delta \subset \mathsf{G}_{\delta\sigma} \subset\cdots\subset \sigma(\mathsf{E}_n),\\
\mathsf{F} &\subset \mathsf{F}_\sigma \subset \mathsf{F}_{\sigma\delta} \subset\cdots\subset \sigma(\mathsf{E}_n).
\end{align*}
No two of these classes are equal. There are Borel sets that belong to none of them.

12.1.2 Additive Functions and Measures

Definition 12.3 (a) Let A be an algebra over X. An additive function or content µ on A is a function µ: A [0, + ] such that → ∞ (i) µ(∅)=0, (ii) µ(A B)= µ(A)+ µ(B) for all A, B A with A B = ∅. ∪ ∈ ∩ (b) An additive function µ is called countably additive (or σ-additive in the German literature)

on A if for any disjoint family A A A, n Æ , that is A A = ∅ for all i = j, with { n | n ∈ ∈ } i ∩ j 6 12.1 Measure Theory 309

A A we have n Æ n ∈ ∈ S

µ An = µ(An). Æ n Æ ! n [∈ X∈ (c) A measure is a countably additive function on a σ-algebra A. If X is a set, A a σ-algebra on X and µ a measure on A, then the tripel (X, A,µ) is called a measure space. Likewise, if X is a set and A a σ-algebra on X, the pair (X, A) is called a measurable space.

Notation. We write $\sum_n A_n$ in place of $\bigcup_n A_n$ if $\{A_n\}$ is a disjoint family of subsets. The countable additivity then reads as follows:
$$\mu(A_1 \cup A_2 \cup \cdots) = \mu(A_1) + \mu(A_2) + \cdots, \qquad \mu\Bigl(\sum_{n=1}^{\infty} A_n\Bigr) = \sum_{n=1}^{\infty} \mu(A_n).$$
We say $\mu$ is finite if $\mu(X) < \infty$. If $\mu(X) = 1$, we call $(X, \mathcal{A}, \mu)$ a probability space. We call $\mu$ $\sigma$-finite if there exist sets $A_n \in \mathcal{A}$ with $\mu(A_n) < \infty$ and $X = \bigcup_{n=1}^{\infty} A_n$.

Example 12.1 (a) Let $X$ be a set, $x_0 \in X$, and $\mathcal{A} = \mathcal{P}(X)$. Then
$$\mu(A) = \begin{cases} 1, & x_0 \in A, \\ 0, & x_0 \notin A, \end{cases}$$
defines a finite measure on $\mathcal{A}$; $\mu$ is called the point mass concentrated at $x_0$.
(b1) Let $X$ be a set and $\mathcal{A} = \mathcal{P}(X)$. Put

$$\mu(A) = \begin{cases} n, & \text{if } A \text{ has } n \text{ elements}, \\ +\infty, & \text{if } A \text{ has infinitely many elements}. \end{cases}$$
Then $\mu$ is a measure on $\mathcal{A}$, the so-called counting measure.
(b2) Let $X$ be a set and $\mathcal{A} = \mathcal{P}(X)$. Put
$$\mu(A) = \begin{cases} 0, & \text{if } A \text{ has finitely many or countably many elements}, \\ +\infty, & \text{if } A \text{ has uncountably many elements}. \end{cases}$$
Then $\mu$ is countably additive but not $\sigma$-finite (when $X$ is uncountable).
(b3) Let $X$ be a set and $\mathcal{A} = \mathcal{P}(X)$. Put
$$\mu(A) = \begin{cases} 0, & \text{if } A \text{ is finite}, \\ +\infty, & \text{if } A \text{ is infinite}. \end{cases}$$
Then $\mu$ is additive but not $\sigma$-additive (when $X$ is infinite).
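The finite additivity required in Definition 12.3 can be checked mechanically for the point mass of Example 12.1 (a) and the counting measure of (b1). The following Python sketch does this for finite subsets; the particular sets chosen are illustrative, not from the text.

```python
def counting_measure(A):
    """Counting measure of Example 12.1 (b1), restricted to finite subsets:
    mu(A) = number of elements of A."""
    return len(A)

def point_mass(A, x0):
    """Point mass of Example 12.1 (a): mu(A) = 1 if x0 lies in A, else 0."""
    return 1 if x0 in A else 0

# additivity mu(A + B) = mu(A) + mu(B) for disjoint A, B (Definition 12.3 (ii))
A, B = frozenset({1, 2, 3}), frozenset({10, 20})
assert A & B == frozenset()   # A and B are disjoint
assert counting_measure(A | B) == counting_measure(A) + counting_measure(B)
assert point_mass(A | B, 2) == point_mass(A, 2) + point_mass(B, 2)
```

Countable additivity cannot, of course, be verified by a finite computation; the sketch only illustrates property (ii).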

(c) Let $X = \mathbb{R}^n$ and let $\mathcal{A} = \mathcal{E}_n$ be the algebra of elementary sets of $\mathbb{R}^n$. Every $A \in \mathcal{E}_n$ is the finite disjoint union of rectangles, $A = \sum_{k=1}^{m} I_k$. We set $\mu(A) = \sum_{k=1}^{m} \mu(I_k)$, where
$$\mu(I) = (b_1 - a_1) \cdots (b_n - a_n)$$
if $I = \{(x_1, \dots, x_n) \in \mathbb{R}^n \mid a_i \le x_i \le b_i,\ i = 1, \dots, n\}$ (with any combination of strict and non-strict inequalities) and $a_i \le b_i$; $\mu(\emptyset) = 0$. Then $\mu$ is an additive function on $\mathcal{E}_n$, called the Lebesgue content on $\mathbb{R}^n$. Note that $\mu$ is not a measure (since $\mathcal{E}_n$ is not a $\sigma$-algebra and $\mu$ is not yet shown to be countably additive). However, we will see in Proposition 12.5 below that $\mu$ is even countably additive. By definition, $\mu(\text{line in } \mathbb{R}^2) = 0$ and $\mu(\text{plane in } \mathbb{R}^3) = 0$.

(d) Let $X = \mathbb{R}$, $\mathcal{A} = \mathcal{E}_1$, and let $\alpha$ be an increasing function on $\mathbb{R}$. For $a, b \in \mathbb{R}$ with $a < b$ set
$$\mu_\alpha([a, b]) = \alpha(b + 0) - \alpha(a - 0),$$
$$\mu_\alpha([a, b)) = \alpha(b - 0) - \alpha(a - 0),$$
$$\mu_\alpha((a, b]) = \alpha(b + 0) - \alpha(a + 0),$$
$$\mu_\alpha((a, b)) = \alpha(b - 0) - \alpha(a + 0).$$
Then $\mu_\alpha$ is an additive function on $\mathcal{E}_1$ if we set
$$\mu_\alpha(A) = \sum_{i=1}^{n} \mu_\alpha(I_i), \quad \text{if } A = \sum_{i=1}^{n} I_i.$$
We call $\mu_\alpha$ the Lebesgue–Stieltjes content.
On the other hand, if $\mu\colon \mathcal{E}_1 \to \mathbb{R}$ is an additive function, then $\alpha_\mu\colon \mathbb{R} \to \mathbb{R}$ defined by
$$\alpha_\mu(x) = \begin{cases} \mu((0, x]), & x \ge 0, \\ -\mu((x, 0]), & x < 0, \end{cases}$$
is an increasing, right-continuous function on $\mathbb{R}$ such that $\mu = \mu_{\alpha_\mu}$. In general $\alpha \neq \alpha_{\mu_\alpha}$, since the function on the right-hand side is continuous from the right whereas $\alpha$ is, in general, not.

Properties of Additive Functions

Proposition 12.3 Let $\mathcal{A}$ be an algebra over $X$ and $\mu$ an additive function on $\mathcal{A}$. Then
(a) $\mu\bigl(\sum_{k=1}^{n} A_k\bigr) = \sum_{k=1}^{n} \mu(A_k)$ if $A_k \in \mathcal{A}$, $k = 1, \dots, n$, form a disjoint family of subsets.
(b) $\mu(A \cup B) + \mu(A \cap B) = \mu(A) + \mu(B)$ for all $A, B \in \mathcal{A}$.
(c) $A \subseteq B$ implies $\mu(A) \le \mu(B)$ ($\mu$ is monotone).
(d) If $A \subseteq B$, $A, B \in \mathcal{A}$, and $\mu(A) < +\infty$, then $\mu(B \setminus A) = \mu(B) - \mu(A)$ ($\mu$ is subtractive).
(e) $\mu\bigl(\bigcup_{k=1}^{n} A_k\bigr) \le \sum_{k=1}^{n} \mu(A_k)$ if $A_k \in \mathcal{A}$, $k = 1, \dots, n$ ($\mu$ is finitely subadditive).
(f) If $\{A_k \mid k \in \mathbb{N}\}$ is a disjoint family in $\mathcal{A}$ with $\sum_{k=1}^{\infty} A_k \in \mathcal{A}$, then
$$\mu\Bigl(\sum_{k=1}^{\infty} A_k\Bigr) \ge \sum_{k=1}^{\infty} \mu(A_k).$$
Proof. (a) is by induction. (d), (c), and (b) are easy (cf. Homework 34.4).

(e) We can write $\bigcup_{k=1}^{n} A_k$ as the finite disjoint union of $n$ sets of $\mathcal{A}$:
$$\bigcup_{k=1}^{n} A_k = A_1 \cup (A_2 \setminus A_1) \cup \bigl(A_3 \setminus (A_1 \cup A_2)\bigr) \cup \cdots \cup \bigl(A_n \setminus (A_1 \cup \cdots \cup A_{n-1})\bigr).$$
Since $\mu$ is additive,
$$\mu\Bigl(\bigcup_{k=1}^{n} A_k\Bigr) = \sum_{k=1}^{n} \mu\bigl(A_k \setminus (A_1 \cup \cdots \cup A_{k-1})\bigr) \le \sum_{k=1}^{n} \mu(A_k),$$
where we used $\mu(A_k \setminus B) \le \mu(A_k)$ (from (c)).
(f) Since $\mu$ is additive and monotone,
$$\sum_{k=1}^{n} \mu(A_k) = \mu\Bigl(\sum_{k=1}^{n} A_k\Bigr) \le \mu\Bigl(\sum_{k=1}^{\infty} A_k\Bigr).$$
Taking the supremum over $n$ on the left gives the assertion.

Proposition 12.4 Let $\mu$ be an additive function on the algebra $\mathcal{A}$. Consider the following statements:
(a) $\mu$ is countably additive.
(b) For any increasing sequence $A_n \subseteq A_{n+1}$, $A_n \in \mathcal{A}$, with $A = \bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$ we have $\lim_{n\to\infty} \mu(A_n) = \mu(A)$.
(c) For any decreasing sequence $A_n \supseteq A_{n+1}$, $A_n \in \mathcal{A}$, with $A = \bigcap_{n=1}^{\infty} A_n \in \mathcal{A}$ and $\mu(A_n) < \infty$ we have $\lim_{n\to\infty} \mu(A_n) = \mu(A)$.
(d) Statement (c) with $A = \emptyset$ only.
We have (a) $\leftrightarrow$ (b) $\to$ (c) $\to$ (d). In case $\mu(X) < \infty$ ($\mu$ is finite), all statements are equivalent.

Proof. (a) $\to$ (b). Without loss of generality $A_1 = \emptyset$. Put $B_n = A_n \setminus A_{n-1}$ for $n = 2, 3, \dots$. Then $\{B_n\}$ is a disjoint family with $A_n = B_2 \cup B_3 \cup \cdots \cup B_n$ and $A = \sum_{n=2}^{\infty} B_n$. Hence, by countable additivity of $\mu$,
$$\mu(A) = \sum_{k=2}^{\infty} \mu(B_k) = \lim_{n\to\infty} \sum_{k=2}^{n} \mu(B_k) = \lim_{n\to\infty} \mu\Bigl(\sum_{k=2}^{n} B_k\Bigr) = \lim_{n\to\infty} \mu(A_n).$$
(b) $\to$ (a). Let $\{A_n\}$ be a family of disjoint sets in $\mathcal{A}$ with $A = \sum_n A_n \in \mathcal{A}$; put $B_k = A_1 \cup \cdots \cup A_k$. Then $(B_k)$ is a sequence increasing to $A$. By (b) and since $\mu$ is additive,
$$\mu(B_n) = \mu\Bigl(\sum_{k=1}^{n} A_k\Bigr) = \sum_{k=1}^{n} \mu(A_k) \xrightarrow[n\to\infty]{} \mu(A) = \mu\Bigl(\sum_{k=1}^{\infty} A_k\Bigr).$$
Thus, $\sum_{k=1}^{\infty} \mu(A_k) = \mu\bigl(\sum_{k=1}^{\infty} A_k\bigr)$.
(b) $\to$ (c). Since $(A_n)$ is decreasing to $A$, $(A_1 \setminus A_n)$ is increasing to $A_1 \setminus A$. By (b),
$$\mu(A_1 \setminus A_n) \xrightarrow[n\to\infty]{} \mu(A_1 \setminus A),$$
hence $\mu(A_1) - \mu(A_n) \to \mu(A_1) - \mu(A)$, which implies the assertion.
(c) $\to$ (d) is trivial.
Now let $\mu$ be finite; in particular, $\mu(B) < \infty$ for all $B \in \mathcal{A}$. We show (d) $\to$ (b). Let $(A_n)$ be a sequence of subsets $A_n \in \mathcal{A}$ increasing to $A \in \mathcal{A}$. Then $(A \setminus A_n)$ is a sequence decreasing to $\emptyset$. By (d), $\mu(A \setminus A_n) \to 0$. Since $\mu$ is subtractive (Proposition 12.3 (d)) and all values are finite, $\mu(A_n) \to \mu(A)$.

Proposition 12.5 Let $\alpha\colon \mathbb{R} \to \mathbb{R}$ be a right-continuous increasing function and $\mu_\alpha$ the corresponding Lebesgue–Stieltjes content on $\mathcal{E}_1$. Then $\mu_\alpha$ is countably additive.
Proof. For simplicity we write $\mu$ for $\mu_\alpha$. Recall that $\mu((a, b]) = \alpha(b) - \alpha(a)$. We will perform the proof in the case
$$(a, b] = \sum_{k=1}^{\infty} (a_k, b_k]$$
with a disjoint family $(a_k, b_k]$ of intervals. By Proposition 12.3 (f) we already know

$$\mu((a, b]) \ge \sum_{k=1}^{\infty} \mu((a_k, b_k]). \tag{12.1}$$
We prove the opposite direction. Let $\varepsilon > 0$. Since $\alpha$ is continuous from the right at $a$, there exists $a_0 \in (a, b)$ such that $\alpha(a_0) - \alpha(a) < \varepsilon$ and, similarly, for every $k \in \mathbb{N}$ there exists $c_k > b_k$ such that $\alpha(c_k) - \alpha(b_k) < \varepsilon/2^k$. Hence,
$$[a_0, b] \subset \sum_{k=1}^{\infty} (a_k, b_k] \subset \bigcup_{k=1}^{\infty} (a_k, c_k)$$
is an open covering of a compact set. By Heine–Borel (Definition 6.14) there exists a finite subcover
$$[a_0, b] \subset \bigcup_{k=1}^{N} (a_k, c_k), \quad \text{hence} \quad (a_0, b] \subset \bigcup_{k=1}^{N} (a_k, c_k],$$
such that by Proposition 12.3 (e)
$$\mu((a_0, b]) \le \sum_{k=1}^{N} \mu((a_k, c_k]).$$
By the choice of $a_0$ and $c_k$,
$$\mu((a_k, c_k]) = \mu((a_k, b_k]) + \alpha(c_k) - \alpha(b_k) \le \mu((a_k, b_k]) + \frac{\varepsilon}{2^k}.$$
Similarly, $\mu((a, b]) = \mu((a, a_0]) + \mu((a_0, b])$, such that
$$\mu((a, b]) \le \mu((a_0, b]) + \varepsilon \le \sum_{k=1}^{N} \Bigl(\mu((a_k, b_k]) + \frac{\varepsilon}{2^k}\Bigr) + \varepsilon \le \sum_{k=1}^{N} \mu((a_k, b_k]) + 2\varepsilon \le \sum_{k=1}^{\infty} \mu((a_k, b_k]) + 2\varepsilon.$$
Since $\varepsilon$ was arbitrary,
$$\mu((a, b]) \le \sum_{k=1}^{\infty} \mu((a_k, b_k]).$$
In view of (12.1), $\mu_\alpha$ is countably additive.

Corollary 12.6 The correspondence $\mu \mapsto \alpha_\mu$ from Example 12.1 (d) defines a bijection between the countably additive functions $\mu$ on $\mathcal{E}_1$ and the monotonically increasing, right-continuous functions $\alpha$ on $\mathbb{R}$ (up to constant functions, i.e. $\alpha$ and $\alpha + c$ define the same additive function).
Historical Note. It was the great achievement of Émile Borel (1871–1956) that he really proved the countable additivity of the Lebesgue measure. He realized that the countable additivity of $\mu$ is a serious mathematical problem, far from being evident.

12.1.3 Extension of Countably Additive Functions

Here we must stop the rigorous treatment of measure theory. Up to now, we know only two trivial examples of measures (Example 12.1 (a) and (b)). We give an outline of the steps toward the construction of the Lebesgue measure.

• Construction of an outer measure $\mu^*$ on $\mathcal{P}(X)$ from a countably additive function $\mu$ on an algebra $\mathcal{A}$.
• Construction of the $\sigma$-algebra $\mathcal{A}_{\mu^*}$ of measurable sets.

The extension theory is due to Carathéodory (1914). For a detailed treatment, see [Els02, Section II.4].

Theorem 12.7 (Extension and Uniqueness) Let $\mu$ be a countably additive function on the algebra $\mathcal{A}$.
(a) There exists an extension of $\mu$ to a measure on the $\sigma$-algebra $\sigma(\mathcal{A})$ which coincides with $\mu$ on $\mathcal{A}$. We denote the measure on $\sigma(\mathcal{A})$ also by $\mu$. It is defined as the restriction of the outer measure $\mu^*\colon \mathcal{P}(X) \to [0, \infty]$,
$$\mu^*(A) = \inf\Bigl\{\sum_{n=1}^{\infty} \mu(A_n) \,\Bigm|\, A \subset \bigcup_{n=1}^{\infty} A_n,\ A_n \in \mathcal{A},\ n \in \mathbb{N}\Bigr\},$$
to the $\mu$-measurable sets $\mathcal{A}_{\mu^*}$.
(b) This extension is unique if $(X, \mathcal{A}, \mu)$ is $\sigma$-finite.

(For a proof, see [Brö92, (2.6), p. 68].)

Remark 12.3 (a) A subset $A \subset X$ is said to be $\mu$-measurable if for all $Y \subset X$
$$\mu^*(Y) = \mu^*(A \cap Y) + \mu^*(A^c \cap Y).$$
The family of $\mu$-measurable sets forms a $\sigma$-algebra $\mathcal{A}_{\mu^*}$.
(b) We have $\mathcal{A} \subset \sigma(\mathcal{A}) \subset \mathcal{A}_{\mu^*}$ and $\mu^*(A) = \mu(A)$ for all $A \in \mathcal{A}$.
(c) $(X, \mathcal{A}_{\mu^*}, \mu^*)$ is a measure space; in particular, $\mu^*$ is countably additive on the measurable sets $\mathcal{A}_{\mu^*}$, and we redenote it by $\mu$.

12.1.4 The Lebesgue Measure on $\mathbb{R}^n$

Using the facts from the previous subsection we conclude that for any increasing, right-continuous function $\alpha$ on $\mathbb{R}$ there exists a measure $\mu_\alpha$ on the $\sigma$-algebra of Borel sets. We call this measure the Lebesgue–Stieltjes measure on $\mathbb{R}$. In case $\alpha(x) = x$ we call it the Lebesgue measure. Extending the Lebesgue content on elementary sets of $\mathbb{R}^n$ to the Borel algebra $\mathfrak{B}_n$, we obtain the $n$-dimensional Lebesgue measure $\lambda_n$ on $\mathbb{R}^n$.

Completeness

A measure $\mu\colon \mathcal{A} \to \overline{\mathbb{R}}_+$ on a $\sigma$-algebra $\mathcal{A}$ is said to be complete if $A \in \mathcal{A}$, $\mu(A) = 0$, and $B \subset A$ imply $B \in \mathcal{A}$. It turns out that the Lebesgue measure $\lambda_n$ on the Borel sets of $\mathbb{R}^n$ is not complete. Adjoining to $\mathfrak{B}_n$ the subsets of measure-zero sets, we obtain the $\sigma$-algebra $\mathcal{A}_{\lambda_n}$ of Lebesgue measurable sets:
$$\mathcal{A}_{\lambda_n} = \sigma\bigl(\{B \cup X \subseteq \mathbb{R}^n \mid B \in \mathfrak{B}_n,\ \exists\, E \in \mathfrak{B}_n\colon X \subset E,\ \lambda_n(E) = 0\}\bigr).$$
The Lebesgue measure $\lambda_n$ on $\mathcal{A}_{\lambda_n}$ is now complete.

Remarks 12.4 (a) The Lebesgue measure is invariant under the motion group of $\mathbb{R}^n$. More precisely, let $O(n) = \{T \in \mathbb{R}^{n\times n} \mid T^\top T = T T^\top = E_n\}$ be the group of real orthogonal $n \times n$ matrices ("motions"); then
$$\lambda_n(T(A)) = \lambda_n(A), \quad A \in \mathcal{A}_{\lambda_n},\ T \in O(n).$$
(b) $\lambda_n$ is translation invariant, i.e. $\lambda_n(A) = \lambda_n(x + A)$ for all $x \in \mathbb{R}^n$. Moreover, the invariance of $\lambda_n$ under translations uniquely characterizes the Lebesgue measure $\lambda_n$: if $\lambda$ is a translation invariant measure on $\mathfrak{B}_n$ (finite on bounded sets), then $\lambda = c\lambda_n$ for some $c \in \mathbb{R}_+$.
(c) There exist non-measurable subsets of $\mathbb{R}^n$. We construct a subset $E$ of $\mathbb{R}$ that is not Lebesgue measurable.
We write $x \sim y$ if $x - y$ is rational. This is an equivalence relation since $x \sim x$ for all $x \in \mathbb{R}$, $x \sim y$ implies $y \sim x$ for all $x$ and $y$, and $x \sim y$ and $y \sim z$ imply $x \sim z$. Let $E$ be a subset of $(0, 1)$ that contains exactly one point of every equivalence class (the assertion that there is such a set $E$ is a direct application of the axiom of choice). We claim that $E$ is not measurable. Let $E + r = \{x + r \mid x \in E\}$. We need the following two properties of $E$:
(a) If $x \in (0, 1)$, then $x \in E + r$ for some rational $r \in (-1, 1)$.
(b) If $r$ and $s$ are distinct rationals, then $(E + r) \cap (E + s) = \emptyset$.
To prove (a), note that for every $x \in (0, 1)$ there exists $y \in E$ with $x \sim y$. If $r = x - y$, then $x = y + r \in E + r$.
To prove (b), suppose that $x \in (E + r) \cap (E + s)$. Then $x = y + r = z + s$ for some $y, z \in E$. Since $y - z = s - r \neq 0$, we have $y \sim z$, and $E$ contains two equivalent points, in contradiction to our choice of $E$.
Now assume that $E$ is Lebesgue measurable with $\lambda(E) = \alpha$. Define $S = \bigcup (E + r)$, where the union is over all rational $r \in (-1, 1)$. By (b), the sets $E + r$ are pairwise disjoint; since $\lambda$ is translation invariant, $\lambda(E + r) = \lambda(E) = \alpha$ for all $r$. Since $S \subset (-1, 2)$, $\lambda(S) \le 3$. The countable additivity of $\lambda$ now forces $\alpha = 0$ and hence $\lambda(S) = 0$. But (a) implies $(0, 1) \subset S$, hence $1 \le \lambda(S)$, and we have a contradiction.
(d) Any countable set has Lebesgue measure zero. Indeed, every single point is a box with edges of length $0$; hence $\lambda(\{\mathrm{pt}\}) = 0$. Since $\lambda$ is countably additive,
$$\lambda(\{x_1, x_2, \dots, x_n, \dots\}) = \sum_{n=1}^{\infty} \lambda(\{x_n\}) = 0.$$

In particular, the rational numbers have Lebesgue measure $0$, $\lambda(\mathbb{Q}) = 0$.
(e) There are uncountable sets with measure zero. The Cantor set (Cantor: 1845–1918, inventor of set theory) is a prominent example:
$$C = \Bigl\{\sum_{i=1}^{\infty} \frac{a_i}{3^i} \,\Bigm|\, a_i \in \{0, 2\}\ \forall\, i \in \mathbb{N}\Bigr\}.$$
Obviously, $C \subset [0, 1]$; $C$ is compact and can be written as the intersection of a decreasing sequence $(C_n)$ of closed subsets: $C_1 = [0, 1/3] \cup [2/3, 1]$, $\lambda(C_1) = 2/3$, and, recursively,
$$C_{n+1} = \frac{1}{3} C_n \cup \Bigl(\frac{1}{3} C_n + \frac{2}{3}\Bigr) \implies \lambda(C_{n+1}) = \frac{1}{3}\lambda(C_n) + \frac{1}{3}\lambda(C_n) = \frac{2}{3}\lambda(C_n).$$
It turns out that $C_n = \bigl\{\sum_{i=1}^{\infty} \frac{a_i}{3^i} \mid a_i \in \{0, 2\}\ \forall\, i = 1, \dots, n\bigr\}$. Clearly,
$$\lambda(C_{n+1}) = \frac{2}{3}\lambda(C_n) = \cdots = \Bigl(\frac{2}{3}\Bigr)^n \lambda(C_1) = \Bigl(\frac{2}{3}\Bigr)^{n+1}.$$
By Proposition 12.4 (c), $\lambda(C) = \lim_{n\to\infty} \lambda(C_n) = 0$. However, $C$ has the same cardinality as $\{0, 2\}^{\mathbb{N}} \sim \{0, 1\}^{\mathbb{N}} \sim \mathbb{R}$, which is uncountable.

12.2 Measurable Functions

Let $\mathcal{A}$ be a $\sigma$-algebra over $X$.

Definition 12.4 A real function $f\colon X \to \mathbb{R}$ is called $\mathcal{A}$-measurable if for all $a \in \mathbb{R}$ the set $\{x \in X \mid f(x) > a\}$ belongs to $\mathcal{A}$.
A complex function $f\colon X \to \mathbb{C}$ is said to be $\mathcal{A}$-measurable if both $\operatorname{Re} f$ and $\operatorname{Im} f$ are $\mathcal{A}$-measurable.
A function $f\colon U \to \mathbb{R}$, $U \subset \mathbb{R}^n$, is said to be a Borel function if $f$ is $\mathfrak{B}_n$-measurable, i.e. $f$ is measurable with respect to the Borel algebra on $\mathbb{R}^n$.
A function $f\colon U \to V$, $U \subset \mathbb{R}^n$, $V \subset \mathbb{R}^m$, is called a Borel function if $f^{-1}(B)$ is a Borel set for all Borel sets $B \subset V$. It is part of homework 39.3 (b) to prove that in case $m = 1$ these definitions coincide. Also, $f = (f_1, \dots, f_m)$ is a Borel function if all $f_i$ are.
Note that $\{x \in X \mid f(x) > a\} = f^{-1}((a, +\infty))$. From Proposition 12.8 below it becomes clear that the last two notions are consistent. Note that no measure on $(X, \mathcal{A})$ needs to be specified to define a measurable function.

Example 12.2 (a) Any continuous function $f\colon U \to \mathbb{R}$, $U \subset \mathbb{R}^n$, is a Borel function. Indeed, since $f$ is continuous and $(a, +\infty)$ is open, $f^{-1}((a, +\infty))$ is open as well and hence a Borel set (cf. Proposition 12.2).
(b) The characteristic function $\chi_A$ is $\mathcal{A}$-measurable if and only if $A \in \mathcal{A}$ (see homework 35.3).
(c) Let $f\colon U \to V$ and $g\colon V \to W$ be Borel functions. Then $g \circ f\colon U \to W$ is a Borel function, too. Indeed, for any Borel set $C \subset W$, $g^{-1}(C)$ is a Borel set in $V$ since $g$ is a Borel function. Since $f$ is a Borel function, $(g \circ f)^{-1}(C) = f^{-1}(g^{-1}(C))$ is a Borel subset of $U$, which shows the assertion.

Proposition 12.8 Let $f\colon X \to \mathbb{R}$ be a function. The following are equivalent:
(a) $\{x \mid f(x) > a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$ (i.e. $f$ is $\mathcal{A}$-measurable).
(b) $\{x \mid f(x) \ge a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$.
(c) $\{x \mid f(x) < a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$.
(d) $\{x \mid f(x) \le a\} \in \mathcal{A}$ for all $a \in \mathbb{R}$.
(e) $f^{-1}(B) \in \mathcal{A}$ for all Borel sets $B \in \mathfrak{B}_1$.
Proof. (a) $\to$ (b) follows from the identity
$$[a, +\infty] = \bigcap_{n\in\mathbb{N}} (a - 1/n, +\infty]$$
and the invariance of intersections under preimages, $f^{-1}(A \cap B) = f^{-1}(A) \cap f^{-1}(B)$. Since $f$ is $\mathcal{A}$-measurable and $\mathcal{A}$ is a $\sigma$-algebra, the countable intersection on the right is in $\mathcal{A}$.
(a) $\to$ (d) follows from $\{x \mid f(x) \le a\} = \{x \mid f(x) > a\}^c$. The remaining directions are left to the reader (see also homework 35.5).

Remark 12.5 (a) Let $f, g\colon X \to \mathbb{R}$ be $\mathcal{A}$-measurable. Then $\{x \mid f(x) > g(x)\}$ and $\{x \mid f(x) = g(x)\}$ are in $\mathcal{A}$.
Proof. Since
$$\{x \mid f(x) < g(x)\} = \bigcup_{q\in\mathbb{Q}} \bigl(\{f < q\} \cap \{q < g\}\bigr)$$
and all sets $\{f < q\}$ and $\{q < g\}$ belong to $\mathcal{A}$, so does $\{f < g\}$; by symmetry, $\{f > g\} \in \mathcal{A}$. Note that the sets $\{f \ge g\}$ and $\{f \le g\}$ are the complements of $\{f < g\}$ and $\{f > g\}$, respectively; hence they belong to $\mathcal{A}$ as well. Finally, $\{f = g\} = \{f \ge g\} \cap \{f \le g\}$.

(b) It is not difficult to see that for any sequence $(a_n)$ of real numbers
$$\varlimsup_{n\to\infty} a_n = \inf_{n\in\mathbb{N}} \sup_{k\ge n} a_k \quad \text{and} \quad \varliminf_{n\to\infty} a_n = \sup_{n\in\mathbb{N}} \inf_{k\ge n} a_k. \tag{12.2}$$
As a consequence we can construct new measurable functions using $\sup$ and $\lim_{n\to\infty}$. Let $(f_n)$ be a sequence of $\mathcal{A}$-measurable real functions on $X$. Then $\sup_{n\in\mathbb{N}} f_n$, $\inf_{n\in\mathbb{N}} f_n$, $\varlimsup_{n\to\infty} f_n$, and $\varliminf_{n\to\infty} f_n$ are $\mathcal{A}$-measurable. In particular, $\lim_{n\to\infty} f_n$ is measurable if the limit exists.
Proof. Note that for all $a \in \mathbb{R}$ we have
$$\Bigl\{\sup_{n} f_n \le a\Bigr\} = \bigcap_{n\in\mathbb{N}} \{f_n \le a\}.$$
Since all $f_n$ are measurable, so is $\sup_n f_n$. A similar proof works for $\inf_n f_n$. By (12.2), $\varlimsup_{n\to\infty} f_n$ and $\varliminf_{n\to\infty} f_n$ are measurable, too.
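Formula (12.2) can be checked numerically on a finite window: truncating the inner supremum at index $2N$ and the outer infimum at $N$ gives an approximation that tends to the limsup as $N$ grows. The sample sequence below is an illustrative assumption, not from the text.

```python
def limsup_window(a, N):
    """Finite-window version of (12.2): inf over n < N of sup over n <= k < 2N.
    Truncating the inner sup at 2N makes this only an approximation."""
    return min(max(a(k) for k in range(n, 2 * N)) for n in range(N))

def liminf_window(a, N):
    """Finite-window version of liminf a_n = sup_n inf_{k >= n} a_k."""
    return max(min(a(k) for k in range(n, 2 * N)) for n in range(N))

# illustrative sequence: a_n = (-1)^n + 1/(n+1) has limsup 1 and liminf -1
a = lambda n: (-1) ** n + 1.0 / (n + 1)
assert abs(limsup_window(a, 200) - 1.0) < 0.01
assert abs(liminf_window(a, 200) + 1.0) < 0.01
```

The window trick mirrors the structure of (12.2): the inner `max` plays the role of $\sup_{k\ge n} a_k$ and the outer `min` the role of $\inf_n$.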

Proposition 12.9 Let $f, g\colon X \to \mathbb{R}$ be Borel functions on $X \subset \mathbb{R}^n$. Then $\alpha f + \beta g$, $f \cdot g$, and $|f|$ are Borel functions, too.
Proof. The function $h(x) = (f(x), g(x))\colon X \to \mathbb{R}^2$ is a Borel function since its coordinate functions are so. Since the sum $s(x, y) = x + y$ and the product $p(x, y) = xy$ are continuous functions, the compositions $s \circ h$ and $p \circ h$ are Borel functions by Example 12.2 (c). Since the constant functions $\alpha$ and $\beta$ are Borel, so are $\alpha f$, $\beta g$, and finally $\alpha f + \beta g$. Hence, the Borel functions over $X$ form a linear space, moreover a real algebra. In particular $-f$ is Borel, and so is $|f| = \max\{f, -f\}$.

Let $(X, \mathcal{A}, \mu)$ be a measure space and $f\colon X \to \mathbb{R}$ arbitrary. Let $f^+ = \max\{f, 0\}$ and $f^- = \max\{-f, 0\}$ denote the positive and negative parts of $f$. We have $f = f^+ - f^-$ and $|f| = f^+ + f^-$; moreover $f^+, f^- \ge 0$.

Corollary 12.10 $f$ is a Borel function if and only if both $f^+$ and $f^-$ are Borel.

12.3 The Lebesgue Integral

We define the Lebesgue integral of a complex function in three steps: first for positive simple functions, then for positive measurable functions, and finally for arbitrary measurable functions. In this section $(X, \mathcal{A}, \mu)$ is a measure space.

12.3.1 Simple Functions

Definition 12.5 Let $M \subseteq X$ be a subset. The function
$$\chi_M(x) = \begin{cases} 1, & x \in M, \\ 0, & x \notin M, \end{cases}$$
is called the characteristic function of $M$.
An $\mathcal{A}$-measurable function $f\colon X \to \mathbb{R}$ is called simple if $f$ takes only finitely many values $c_1, \dots, c_n$. We denote the set of simple functions on $(X, \mathcal{A})$ by $\mathcal{S}$; the set of non-negative simple functions is denoted by $\mathcal{S}_+$.
Clearly, if $c_1, \dots, c_n$ are the distinct values of the simple function $f$, then
$$f = \sum_{i=1}^{n} c_i \chi_{A_i},$$
where $A_i = \{x \mid f(x) = c_i\}$. It is clear that $f$ is measurable if and only if $A_i \in \mathcal{A}$ for all $i$. Obviously, $\{A_i \mid i = 1, \dots, n\}$ is a disjoint family of subsets of $X$.
It is easy to see that $f, g \in \mathcal{S}$ implies $\alpha f + \beta g \in \mathcal{S}$, $\max\{f, g\} \in \mathcal{S}$, $\min\{f, g\} \in \mathcal{S}$, and $fg \in \mathcal{S}$.

Step 1: Positive Simple Functions

For $f = \sum_{i=1}^{n} c_i \chi_{A_i} \in \mathcal{S}_+$ define
$$\int_X f \, d\mu = \sum_{i=1}^{n} c_i\, \mu(A_i). \tag{12.3}$$
The convention $0 \cdot (+\infty) = 0$ is used here; it may happen that $c_i = 0$ for some $i$ and $\mu(A_i) = +\infty$.

Remarks 12.6 (a) Since $c_i \ge 0$ for all $i$, the right-hand side of (12.3) is well-defined in $\overline{\mathbb{R}}$.
(b) Given another presentation of $f$, say $f(x) = \sum_{j=1}^{m} d_j \chi_{B_j}$, the sum $\sum_{j=1}^{m} d_j\, \mu(B_j)$ gives the same value as (12.3).

The following properties are easily checked.

Lemma 12.11 For $f, g \in \mathcal{S}_+$, $A \in \mathcal{A}$, $c \in \mathbb{R}_+$ we have
(1) $\int_X \chi_A \, d\mu = \mu(A)$.
(2) $\int_X c f \, d\mu = c \int_X f \, d\mu$.
(3) $\int_X (f + g) \, d\mu = \int_X f \, d\mu + \int_X g \, d\mu$.
(4) $f \le g$ implies $\int_X f \, d\mu \le \int_X g \, d\mu$.

12.3.2 Positive Measurable Functions

The idea is to approximate a positive measurable function by an increasing sequence of positive simple ones.

Theorem 12.12 Let $f\colon X \to [0, +\infty]$ be measurable. There exist simple functions $s_n$, $n \in \mathbb{N}$, on $X$ such that
(a) $0 \le s_1 \le s_2 \le \cdots \le f$;
(b) $s_n(x) \to f(x)$ as $n \to \infty$, for every $x \in X$.

[Figure: the graph of $f$ over $X = (a, b)$ together with the first approximation $s_1$, which takes the values $0$, $1/2$, and $1$ on the sets $E_{11} = f^{-1}([0, 1/2))$, $E_{12} = f^{-1}([1/2, 1))$, and $F_1 = f^{-1}([1, +\infty])$.]

Proof. For $n \in \mathbb{N}$ and for $1 \le i \le n 2^n$, define
$$E_{ni} = f^{-1}\Bigl(\Bigl[\frac{i-1}{2^n}, \frac{i}{2^n}\Bigr)\Bigr) \quad \text{and} \quad F_n = f^{-1}([n, \infty])$$
and put
$$s_n = \sum_{i=1}^{n 2^n} \frac{i-1}{2^n}\, \chi_{E_{ni}} + n\, \chi_{F_n}.$$
Proposition 12.8 shows that $E_{ni}$ and $F_n$ are measurable sets. It is easily seen that the functions $s_n$ satisfy (a). If $x$ is such that $f(x) < +\infty$, then
$$0 \le f(x) - s_n(x) \le \frac{1}{2^n} \tag{12.4}$$
as soon as $n$ is large enough, that is, $x \in E_{ni}$ for some $n, i \in \mathbb{N}$ and not $x \in F_n$. If $f(x) = +\infty$, then $s_n(x) = n$; this proves (b).
From (12.4) it follows that $s_n \rightrightarrows f$ uniformly on $X$ if $f$ is bounded.
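The construction of $s_n$ in the proof is easy to implement: round $f(x)$ down to the dyadic grid of step $2^{-n}$ and cap the result at $n$. The sketch below checks property (a) and estimate (12.4) pointwise for an illustrative choice of $f$ (the bound (12.4) applies once $n > f(x)$).

```python
import math

def s_n(f, x, n):
    """The simple approximation from Theorem 12.12 evaluated at x:
    s_n(x) = (i-1)/2^n on E_ni = f^{-1}([(i-1)/2^n, i/2^n)), and n on F_n."""
    y = f(x)
    if y >= n:
        return n                              # x lies in F_n
    return math.floor(y * 2 ** n) / 2 ** n    # round down to the 2^-n grid

f = lambda x: x * x                           # illustrative f >= 0
for x in [0.0, 0.3, 0.7, 1.9, 3.0]:
    vals = [s_n(f, x, n) for n in range(1, 12)]
    # (a): the sequence is increasing in n and dominated by f
    assert all(v1 <= v2 for v1, v2 in zip(vals, vals[1:]))
    assert all(v <= f(x) for v in vals)
    # (12.4): 0 <= f(x) - s_n(x) <= 2^-n once n exceeds f(x) (here f(x) <= 9)
    assert 0 <= f(x) - s_n(f, x, 10) <= 2 ** -10
```

The cap at $n$ is what makes the construction work for unbounded $f$: every level set $F_n$ shrinks as $n$ grows, and on $\{f = +\infty\}$ the approximants $s_n(x) = n$ still tend to $f(x)$.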

Step 2: Positive Measurable Real Functions

Definition 12.6 (Lebesgue Integral) Let $f\colon X \to [0, +\infty]$ be measurable. Let $(s_n)$ be an increasing sequence of non-negative simple functions converging to $f(x)$ for all $x \in X$, $\lim_{n\to\infty} s_n(x) = \sup_{n\in\mathbb{N}} s_n(x) = f(x)$. Define
$$\int_X f \, d\mu = \lim_{n\to\infty} \int_X s_n \, d\mu = \sup_{n\in\mathbb{N}} \int_X s_n \, d\mu \tag{12.5}$$
and call this number in $[0, +\infty]$ the Lebesgue integral of $f$ over $X$ with respect to the measure $\mu$, or the $\mu$-integral of $f$ over $X$. The definition of the Lebesgue integral does not depend on the special choice of the increasing functions $s_n \nearrow f$. One can equivalently define
$$\int_X f \, d\mu = \sup\Bigl\{\int_X s \, d\mu \,\Bigm|\, s \le f,\ s \text{ is a non-negative simple function}\Bigr\}.$$
Observe that we apparently have two definitions of $\int_X f \, d\mu$ if $f$ is a simple function. However, these assign the same value to the integral, since $f$ itself is the largest simple function $s$ with $s \le f$.

Proposition 12.13 The properties (1) to (4) from Lemma 12.11 hold for any non-negative measurable functions $f, g\colon X \to [0, +\infty]$, $c \in \mathbb{R}_+$.
(Without proof.)

Step 3: Measurable Real Functions

Let $f\colon X \to \mathbb{R}$ be measurable and $f^+(x) = \max(f(x), 0)$, $f^-(x) = \max(-f(x), 0)$. Then $f^+$ and $f^-$ are both positive and measurable. Define
$$\int_X f \, d\mu = \int_X f^+ \, d\mu - \int_X f^- \, d\mu$$
if at least one of the integrals on the right is finite. We say that $f$ is $\mu$-integrable if both are finite.

Step 4: Measurable Complex Functions

Definition 12.7 (Lebesgue Integral—Continued) A complex, measurable function $f\colon X \to \mathbb{C}$ is called $\mu$-integrable if
$$\int_X |f| \, d\mu < \infty.$$
If $f = u + iv$ is $\mu$-integrable, where $u = \operatorname{Re} f$ and $v = \operatorname{Im} f$ are the real and imaginary parts of $f$, then $u$ and $v$ are real measurable functions on $X$. Define the $\mu$-integral of $f$ over $X$ by
$$\int_X f \, d\mu = \int_X u^+ \, d\mu - \int_X u^- \, d\mu + i \int_X v^+ \, d\mu - i \int_X v^- \, d\mu. \tag{12.6}$$
These four functions $u^+$, $u^-$, $v^+$, and $v^-$ are measurable, real, and non-negative. Since we have $u^+ \le |u| \le |f|$ etc., each of these four integrals is finite. Thus, (12.6) defines the integral on the left as a complex number.
We define $\mathscr{L}^1(X, \mu)$ to be the collection of all complex $\mu$-integrable functions $f$ on $X$. Note that for an integrable function $f$, $\int_X f \, d\mu$ is a finite number.

Proposition 12.14 Let $f, g\colon X \to \mathbb{C}$ be measurable.
(a) $f$ is $\mu$-integrable if and only if $|f|$ is $\mu$-integrable, and we have
$$\Bigl|\int_X f \, d\mu\Bigr| \le \int_X |f| \, d\mu.$$
(b) $f$ is $\mu$-integrable if and only if there exists an integrable function $h$ with $|f| \le h$.
(c) If $f, g$ are integrable, so is $c_1 f + c_2 g$, and
$$\int_X (c_1 f + c_2 g) \, d\mu = c_1 \int_X f \, d\mu + c_2 \int_X g \, d\mu.$$
(d) If $f \le g$ on $X$, then $\int_X f \, d\mu \le \int_X g \, d\mu$.
It follows that the set $\mathscr{L}^1(X, \mu)$ of $\mu$-integrable complex-valued functions on $X$ is a linear space. The Lebesgue integral defines a positive linear functional on $\mathscr{L}^1(X, \mu)$. Note that (b) implies that any measurable and bounded function $f$ on a space $X$ with $\mu(X) < \infty$ is integrable.

Step 5: $\int_A f \, d\mu$

Definition 12.8 Let $A \in \mathcal{A}$ and let $f\colon X \to \mathbb{R}$ or $f\colon X \to \mathbb{C}$ be measurable. The function $f$ is called $\mu$-integrable over $A$ if $\chi_A f$ is $\mu$-integrable over $X$. In this case we put
$$\int_A f \, d\mu = \int_X \chi_A f \, d\mu.$$
In particular, Lemma 12.11 (1) now reads $\int_A d\mu = \mu(A)$.

12.4 Some Theorems on Lebesgue Integrals

12.4.1 The Role Played by Measure Zero Sets

Equivalence Relations

Let $X$ be a set and $R \subset X \times X$. For $a, b \in X$ we write $a \sim b$ if $(a, b) \in R$.

Definition 12.9 (a) The subset $R \subset X \times X$ is said to be an equivalence relation if $R$ is reflexive, symmetric, and transitive, that is,
(r) $\forall\, x \in X\colon x \sim x$;
(s) $\forall\, x, y \in X\colon x \sim y \implies y \sim x$;
(t) $\forall\, x, y, z \in X\colon x \sim y \wedge y \sim z \implies x \sim z$.
For $a \in X$ the set $\overline{a} := \{x \in X \mid x \sim a\}$ is called the equivalence class of $a$. We have $\overline{a} = \overline{b}$ if and only if $a \sim b$.
(b) A partition $P$ of $X$ is a disjoint family $P = \{A_\alpha \mid \alpha \in I\}$ of subsets $A_\alpha \subset X$ such that $\sum_{\alpha\in I} A_\alpha = X$.
The set of equivalence classes is sometimes denoted by $X/\!\sim$.

Example 12.3 (a) On $\mathbb{Z}$ define $a \sim b$ if $2 \mid (a - b)$; thus $a$ and $b$ are equivalent if both are odd or both are even. There are two equivalence classes, $\overline{1} = \overline{-5} = 2\mathbb{Z} + 1$ (the odd numbers) and $\overline{0} = \overline{100} = 2\mathbb{Z}$ (the even numbers).
(b) Let $W \subset V$ be a subspace of the linear space $V$. For $x, y \in V$ define $x \sim y$ if $x - y \in W$. This is an equivalence relation: indeed, the relation is reflexive since $x - x = 0 \in W$; it is symmetric since $x - y \in W$ implies $y - x = -(x - y) \in W$; and it is transitive since $x - y, y - z \in W$ implies that their sum $(x - y) + (y - z) = x - z \in W$, so that $x \sim z$. One has $\overline{0} = W$ and $\overline{x} = x + W := \{x + w \mid w \in W\}$. The set of equivalence classes with respect to this equivalence relation is called the factor space or quotient space of $V$ with respect to $W$ and is denoted by $V/W$. The factor space becomes a linear space if we define $\overline{x} + \overline{y} := \overline{x + y}$ and $\lambda \overline{x} := \overline{\lambda x}$, $\lambda \in \mathbb{C}$. Addition is indeed well-defined: $x \sim x'$ and $y \sim y'$, say $x - x' = w_1$ and $y - y' = w_2$ with $w_1, w_2 \in W$, imply $x + y - (x' + y') = w_1 + w_2 \in W$, so that $\overline{x + y} = \overline{x' + y'}$.
(c) Similarly as in (a), for $m \in \mathbb{N}$ define the equivalence relation $a \equiv b \pmod m$ if $m \mid (a - b)$. We say "$a$ is congruent to $b$ modulo $m$". This defines a partition of the integers into $m$ disjoint equivalence classes $\overline{0}, \overline{1}, \dots, \overline{m-1}$, where $\overline{r} = \{am + r \mid a \in \mathbb{Z}\}$.
(d) Two triangles in the plane can be called equivalent if
(1) there exists a translation mapping the first one onto the second one;
(2) there exists a rotation around $(0, 0)$ mapping the first one onto the second one;
(3) there exists a motion (rotation, translation, reflection, or a composition of these) mapping the first one onto the second one.
Then (1) – (3) define different equivalence relations on triangles, or more generally on subsets of the plane.
(e) Having the same cardinality is an equivalence relation on sets.

Proposition 12.15 (a) Let $\sim$ be an equivalence relation on $X$. Then $P = \{\overline{x} \mid x \in X\}$ defines a partition of $X$, denoted by $P_\sim$.
(b) Let $P$ be a partition of $X$. Then "$x \sim y$ if there exists $A \in P$ with $x, y \in A$" defines an equivalence relation $\sim_P$ on $X$.
(c) $P_{\sim_P} = P$ and $\sim_{P_\sim} \,=\, \sim$.

Let $\mathsf{P}$ be a property which a point $x$ may or may not have. For instance, $\mathsf{P}$ may be the property "$f(x) > 0$" if $f$ is a given function, or "$(f_n(x))$ converges" if $(f_n)$ is a given sequence of functions.

Definition 12.10 If $(X, \mathcal{A}, \mu)$ is a measure space and $A \in \mathcal{A}$, we say "$\mathsf{P}$ holds almost everywhere on $A$", abbreviated "$\mathsf{P}$ holds a.e. on $A$", if there exists $N \in \mathcal{A}$ such that $\mu(N) = 0$ and $\mathsf{P}$ holds for every point $x \in A \setminus N$.
This concept of course strongly depends on the measure $\mu$, and sometimes we write $\mu$-a.e. to emphasize the dependence on $\mu$.

(a) Main example. On the set of measurable functions $f\colon X \to \mathbb{C}$ we define an equivalence relation by
$$f \sim g \quad \text{if } f = g \text{ a.e. on } X.$$
This is indeed an equivalence relation: $f(x) = f(x)$ for all $x \in X$, and $f(x) = g(x)$ implies $g(x) = f(x)$. For transitivity, let $f = g$ a.e. on $X$ and $g = h$ a.e. on $X$; that is, there exist $M, N \in \mathcal{A}$ with $\mu(M) = \mu(N) = 0$ and $f(x) = g(x)$ for all $x \in X \setminus M$, $g(x) = h(x)$ for all $x \in X \setminus N$. Hence, $f(x) = h(x)$ for all $x \in X \setminus (M \cup N)$. Since $0 \le \mu(M \cup N) \le \mu(M) + \mu(N) = 0 + 0 = 0$ by Proposition 12.3 (e), we have $\mu(M \cup N) = 0$ and finally $f = h$ a.e. on $X$.
(b) Note that $f = g$ a.e. on $X$ implies
$$\int_X f \, d\mu = \int_X g \, d\mu.$$
Indeed, let $N$ denote the zero-set where $f \neq g$. Then
$$\Bigl|\int_X f \, d\mu - \int_X g \, d\mu\Bigr| \le \int_X |f - g| \, d\mu = \int_N |f - g| \, d\mu + \int_{X\setminus N} |f - g| \, d\mu \le \mu(N) \cdot (\infty) + \mu(X \setminus N) \cdot 0 = 0.$$
Here we used that for disjoint sets $A, B \in \mathcal{A}$,
$$\int_{A\cup B} f \, d\mu = \int_X \chi_{A\cup B} f \, d\mu = \int_X \chi_A f \, d\mu + \int_X \chi_B f \, d\mu = \int_A f \, d\mu + \int_B f \, d\mu.$$

Proposition 12.16 Let $f\colon X \to [0, +\infty]$ be measurable. Then
$$\int_X f \, d\mu = 0 \quad \text{if and only if} \quad f = 0 \text{ a.e. on } X.$$
Proof. By the above argument in (b), $f = 0$ a.e. implies $\int_X f \, d\mu = \int_X 0 \, d\mu = 0$, which proves one direction. The other direction is homework 40.4.

12.4.2 The Space $L^p(X, \mu)$

For any measurable function $f\colon X \to \mathbb{C}$ and any real $p$, $1 \le p < \infty$, define
$$\|f\|_p = \Bigl(\int_X |f|^p \, d\mu\Bigr)^{1/p}. \tag{12.7}$$
This number may be finite or $\infty$. In the first case, $|f|^p$ is integrable and we write $f \in \mathscr{L}^p(X, \mu)$.

Proposition 12.17 Let $p, q \ge 1$ be given such that $\frac{1}{p} + \frac{1}{q} = 1$.
(a) Let $f, g\colon X \to \mathbb{C}$ be measurable functions such that $f \in \mathscr{L}^p$ and $g \in \mathscr{L}^q$. Then $fg \in \mathscr{L}^1$ and
$$\int_X |fg| \, d\mu \le \|f\|_p \, \|g\|_q \quad \text{(Hölder inequality)}. \tag{12.8}$$
(b) Let $f, g \in \mathscr{L}^q$. Then $f + g \in \mathscr{L}^q$ and
$$\|f + g\|_q \le \|f\|_q + \|g\|_q \quad \text{(Minkowski inequality)}. \tag{12.9}$$
Idea of proof. Hölder's inequality follows from Young's inequality (Proposition 1.31), as in the classical case of Hölder's inequality in $\mathbb{R}^n$, see Proposition 1.32. Minkowski's inequality follows from Hölder's inequality as in Proposition 1.34.
Note that Minkowski's inequality implies that $f, g \in \mathscr{L}^p$ yields $\|f + g\|_p < \infty$, such that $f + g \in \mathscr{L}^p$. In particular, $\mathscr{L}^p$ is a linear space.
Let us check the properties of $\|\cdot\|_p$. For all measurable $f, g$ we have
$$\|f\|_p \ge 0, \qquad \|\lambda f\|_p = |\lambda|\, \|f\|_p, \qquad \|f + g\|_p \le \|f\|_p + \|g\|_p.$$
All properties of a norm (see Definition 6.9 on page 179) are satisfied except for definiteness: $\|f\|_p = 0$ implies $\int_X |f|^p \, d\mu = 0$, which by Proposition 12.16 implies $|f|^p = 0$ a.e., hence $f = 0$ a.e.; however, it does not imply $f = 0$. To overcome this problem, we use the equivalence relation $f = g$ a.e. and consider from now on only equivalence classes of functions in $\mathscr{L}^p$; that is, we identify functions $f$ and $g$ which are equal a.e.
The space $\mathcal{N} = \{f\colon X \to \mathbb{C} \mid f \text{ is measurable and } f = 0 \text{ a.e.}\}$ is a linear subspace of $\mathscr{L}^p(X, \mu)$ for all $p$, and $f = g$ a.e. if and only if $f - g \in \mathcal{N}$. The factor space $\mathscr{L}^p/\mathcal{N}$, see Example 12.3 (b), is again a linear space.

Definition 12.11 Let $(X, \mathcal{A}, \mu)$ be a measure space. $L^p(X, \mu)$ denotes the set of equivalence classes of functions of $\mathscr{L}^p(X, \mu)$ with respect to the equivalence relation $f = g$ a.e.; that is,
$$L^p(X, \mu) = \mathscr{L}^p(X, \mu)/\mathcal{N}$$
is the quotient space. $(L^p(X, \mu), \|\cdot\|_p)$ is a normed space. With this norm, $L^p(X, \mu)$ is complete.

Example 12.4 (a) We have $\chi_{\mathbb{Q}} = 0$ in $L^p(\mathbb{R}, \lambda)$, since $\chi_{\mathbb{Q}} = 0$ a.e. on $\mathbb{R}$ with respect to the Lebesgue measure (note that $\mathbb{Q}$ is a set of measure zero).
(b) In the case of the sequence spaces $L^p(\mathbb{N})$ with respect to the counting measure, $L^p = \mathscr{L}^p$, since $f = 0$ a.e. implies $f = 0$.
(c) $f(x) = \frac{1}{x^\alpha}$, $\alpha > 0$, is in $L^2(0, 1)$ if and only if $2\alpha < 1$. We identify functions and their equivalence classes.
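The criterion in (c) amounts to the convergence of $\int_0^1 x^{-2\alpha}\,dx$. A numeric check using the exact antiderivative on $[\varepsilon, 1]$ (the cutoff $\varepsilon$ only avoids the singularity; the function names are illustrative):

```python
import math

def int_neg_power(s, eps):
    """int_{eps}^{1} x^{-s} dx, computed from the exact antiderivative."""
    if s == 1:
        return -math.log(eps)
    return (1 - eps ** (1 - s)) / (1 - s)

# s = 2*alpha = 0.5 < 1: the truncated integrals stay bounded
# (limit 1/(1 - s) = 2), so x^{-1/4} lies in L^2(0, 1)
assert abs(int_neg_power(0.5, 1e-12) - 2.0) < 1e-5
# s = 2*alpha = 1.5 > 1: the truncated integrals blow up as eps -> 0
assert int_neg_power(1.5, 1e-12) > 1e5
```

The boundary case $2\alpha = 1$ also fails, since $\int_\varepsilon^1 x^{-1}\,dx = -\ln\varepsilon \to \infty$.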

12.4.3 The Monotone Convergence Theorem

The following theorem about monotone convergence, due to Beppo Levi (1875–1961), is one of the most important in the theory of integration. The theorem holds for an arbitrary increasing sequence of measurable functions with, possibly, $\int_X f_n \, d\mu = +\infty$.

Theorem 12.18 (Monotone Convergence Theorem/Lebesgue) Let $(f_n)$ be a sequence of measurable functions on $X$ and suppose that
(1) $0 \le f_1(x) \le f_2(x) \le \cdots \le +\infty$ for all $x \in X$,
(2) $f_n(x) \to f(x)$ as $n \to \infty$, for every $x \in X$.
Then $f$ is measurable, and
$$\lim_{n\to\infty} \int_X f_n \, d\mu = \int_X f \, d\mu = \int_X \Bigl(\lim_{n\to\infty} f_n\Bigr) \, d\mu.$$
(Without proof.) Note that the measurability of $f$ is a consequence of Remark 12.5 (b).

Corollary 12.19 (Beppo Levi) Let $f_n\colon X \to [0, +\infty]$ be measurable for all $n \in \mathbb{N}$, and let $f(x) = \sum_{n=1}^{\infty} f_n(x)$ for $x \in X$. Then
$$\int_X \sum_{n=1}^{\infty} f_n \, d\mu = \sum_{n=1}^{\infty} \int_X f_n \, d\mu.$$

Example 12.5 (a) Let $X = \mathbb{N}$, $\mathcal{A} = \mathcal{P}(\mathbb{N})$ the $\sigma$-algebra of all subsets, and $\mu$ the counting measure on $\mathbb{N}$. The functions on $\mathbb{N}$ can be identified with the sequences $(x_n)$, $f(n) = x_n$. Trivially, any function is $\mathcal{A}$-measurable.
What is $\int_{\mathbb{N}} x_n \, d\mu$? First, let $f \ge 0$. For the simple function $g_n$ given by $g_n = x_n \chi_{\{n\}}$ we obtain $\int g_n \, d\mu = x_n\, \mu(\{n\}) = x_n$. Note that $f = \sum_{n=1}^{\infty} g_n$ and $g_n \ge 0$ since $x_n \ge 0$. By Corollary 12.19,
$$\int_{\mathbb{N}} f \, d\mu = \sum_{n=1}^{\infty} \int_{\mathbb{N}} g_n \, d\mu = \sum_{n=1}^{\infty} x_n.$$
Now let $f$ be arbitrary integrable, i.e. $\int_{\mathbb{N}} |f| \, d\mu < \infty$; thus $\sum_{n=1}^{\infty} |x_n| < \infty$. Therefore, $(x_n) \in \mathscr{L}^1(\mathbb{N}, \mu)$ if and only if $\sum x_n$ converges absolutely. The space of absolutely convergent series is denoted by $\ell_1$ or $\ell_1(\mathbb{N})$.

(b) Let $a_{mn} \ge 0$ for all $n, m \in \mathbb{N}$. Then
$$\sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{mn} = \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{mn}.$$
Proof. Consider the measure space $(\mathbb{N}, \mathcal{P}(\mathbb{N}), \mu)$ from (a). For $n \in \mathbb{N}$ define the functions $f_n(m) = a_{mn}$ and put $f(m) = \sum_{n=1}^{\infty} f_n(m)$. By Corollary 12.19 we then have, on the one hand,
$$\int_X \sum_{n=1}^{\infty} f_n(m) \, d\mu = \int_X f \, d\mu \underset{\text{(a)}}{=} \sum_{m=1}^{\infty} f(m) = \sum_{m=1}^{\infty} \sum_{n=1}^{\infty} a_{mn},$$
and on the other hand
$$\int_X \sum_{n=1}^{\infty} f_n(m) \, d\mu = \sum_{n=1}^{\infty} \int_X f_n(m) \, d\mu = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} a_{mn}.$$
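The interchange of summation order in (b) can be observed concretely on a large truncation. The sketch below uses the illustrative choice $a_{mn} = 2^{-m} 3^{-n}$ with exact rational arithmetic; both iterated sums approach $\bigl(\sum_m 2^{-m}\bigr)\bigl(\sum_n 3^{-n}\bigr) = 1 \cdot \tfrac{1}{2} = \tfrac{1}{2}$.

```python
from fractions import Fraction

# a_mn = 1/(2^m * 3^n) >= 0, m, n >= 1 (illustrative choice); the tails are
# geometric, so a finite truncation already shows the agreement of (b)
N = 30
a = lambda m, n: Fraction(1, 2 ** m * 3 ** n)
row_first = sum(sum(a(m, n) for n in range(1, N)) for m in range(1, N))
col_first = sum(sum(a(m, n) for m in range(1, N)) for n in range(1, N))
assert row_first == col_first                 # exact equality of both orders
assert abs(float(row_first) - 0.5) < 1e-6    # close to the full double sum 1/2
```

For non-negative terms no cancellation can occur, which is exactly why (b) needs no further hypotheses; for series with mixed signs the order of summation may matter.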

Proposition 12.20 Let $f\colon X \to [0, +\infty]$ be measurable. Then
$$\varphi(A) = \int_A f \, d\mu, \quad A \in \mathcal{A},$$
defines a measure $\varphi$ on $\mathcal{A}$.
Proof. Since $f \ge 0$, $\varphi(A) \ge 0$ for all $A \in \mathcal{A}$. Let $(A_n)$ be a countable disjoint family of measurable sets $A_n \in \mathcal{A}$ and let $A = \sum_{n=1}^{\infty} A_n$. By homework 40.1, $\chi_A = \sum_{n=1}^{\infty} \chi_{A_n}$, and therefore, by Beppo Levi's theorem,
$$\varphi(A) = \int_A f \, d\mu = \int_X \chi_A f \, d\mu = \int_X \sum_{n=1}^{\infty} \chi_{A_n} f \, d\mu = \sum_{n=1}^{\infty} \int_X \chi_{A_n} f \, d\mu = \sum_{n=1}^{\infty} \int_{A_n} f \, d\mu = \sum_{n=1}^{\infty} \varphi(A_n).$$

12.4.4 The Dominated Convergence Theorem Besides the monotone convergence theorem the present theorem is the most important one. It is due to Henry Lebesgue. The great advantage, compared with Theorem6.6, is that µ(X)= is ∞ allowed, that is, non-compact domains X are included. We only need the pointwise convergence of (fn), not the uniform convergence. The main assumtion here is the existence of an integrable upper bound for all fn.

Theorem 12.21 (Dominated Convergence Theorem of Lebesgue) Let $f_n: X \to \mathbb R$ or $g, f_n: X \to \mathbb C$ be measurable functions such that
(1) $f_n(x) \to f(x)$ as $n \to \infty$ a.e. on $X$,
(2) $|f_n(x)| \le g(x)$ a.e. on $X$,
(3) $\int_X g\,d\mu < +\infty$.
Then $f$ is measurable and integrable, $\int_X |f|\,d\mu < \infty$, and
$$\lim_{n\to\infty} \int_X f_n\,d\mu = \int_X f\,d\mu = \int_X \lim_{n\to\infty} f_n\,d\mu,$$
$$\lim_{n\to\infty} \int_X |f_n - f|\,d\mu = 0. \tag{12.10}$$
Note that (12.10) shows that $(f_n)$ converges to $f$ in the normed space $L^1(X,\mu)$.
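The integrable bound $g$ cannot be dropped. A standard counterexample (not from the text) is $f_n = n\,\chi_{(0,1/n)}$ on $(0,1)$: it tends to $0$ at every point, yet every integral equals $1$, and no integrable $g$ dominates all $f_n$. A numerical sketch:

```python
# f_n = n * chi_(0, 1/n) converges to 0 pointwise on (0,1), but the integrals
# do not converge to 0: each equals 1.  This is why Theorem 12.21 insists on
# an integrable dominating function g.
def integral_fn(n, steps=100000):
    # midpoint rule for the integral of f_n over (0,1)
    h = 1.0 / steps
    return sum((n if (i + 0.5) * h < 1.0 / n else 0.0) * h for i in range(steps))

vals = [integral_fn(n) for n in (1, 10, 100)]  # all close to 1
```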

Example 12.6 (a) Let $A_n \in \mathcal A$, $n \in \mathbb N$, $A_1 \subset A_2 \subset \cdots$ be an increasing sequence with $\bigcup_{n\in\mathbb N} A_n = A$. If $f \in L^1(A,\mu)$, then $f \in L^1(A_n)$ for all $n$ and
$$\lim_{n\to\infty} \int_{A_n} f\,d\mu = \int_A f\,d\mu. \tag{12.11}$$
Indeed, the sequence $(\chi_{A_n} f)$ converges pointwise to $\chi_A f$ since $\chi_A(x) = 1$ iff $x \in A$ iff $x \in A_n$ for all $n \ge n_0$ iff $\lim_{n\to\infty}\chi_{A_n}(x) = 1$. Moreover, $|\chi_{A_n} f| \le |\chi_A f|$, which is integrable. By Lebesgue's theorem,
$$\lim_{n\to\infty}\int_{A_n} f\,d\mu = \lim_{n\to\infty}\int_X \chi_{A_n} f\,d\mu = \int_X \chi_A f\,d\mu = \int_A f\,d\mu.$$
However, if we do not assume $f \in L^1(A,\mu)$, the statement is not true (see Remark 12.7 below).

Exhaustion theorem. Let $(A_n)$ be an increasing sequence of measurable sets and $A = \bigcup_{n=1}^\infty A_n$. Suppose that $f$ is measurable and $\left(\int_{A_n} |f|\,d\mu\right)$ is a bounded sequence. Then $f \in L^1(A,\mu)$ and (12.11) holds.

(b) Let $f_n(x) = (-1)^n x^n$ on $[0,1]$. The sequence is dominated by the integrable function $1 \ge |f_n(x)|$ for all $x \in [0,1]$. Hence $\lim_{n\to\infty}\int_{[0,1]} f_n\,d\lambda = 0 = \int_{[0,1]} \lim_{n\to\infty} f_n\,d\lambda$.

12.4.5 Application of Lebesgue's Theorem to Parametric Integrals

As a direct application of the dominated convergence theorem we now treat parameter-dependent integrals; see Propositions 7.22 and 7.23.

Proposition 12.22 (Continuity) Let $U \subset \mathbb R^n$ be an open connected set, $t_0 \in U$, and $f: \mathbb R^m \times U \to \mathbb R$ a function. Assume that
(a) for a.e. $x \in \mathbb R^m$, the function $t \mapsto f(x,t)$ is continuous at $t_0$,
(b) there exists an integrable function $F: \mathbb R^m \to \mathbb R$ such that for every $t \in U$,
$$|f(x,t)| \le F(x) \quad \text{a.e. on } \mathbb R^m.$$
Then the function
$$g(t) = \int_{\mathbb R^m} f(x,t)\,dx$$
is continuous at $t_0$.

Proof. First we note that for any fixed $t \in U$, the function $f_t(x) = f(x,t)$ is integrable on $\mathbb R^m$ since it is dominated by the integrable function $F$. We have to show that for any sequence $t_j \to t_0$, $t_j \in U$, $g(t_j)$ tends to $g(t_0)$ as $j \to \infty$. We set $f_j(x) = f(x,t_j)$ and $f_0(x) = f(x,t_0)$ for all $j \in \mathbb N$. By (a) we have
$$f_0(x) = \lim_{j\to\infty} f_j(x), \quad \text{a.e. } x \in \mathbb R^m.$$
By (b), the assumptions of the dominated convergence theorem are satisfied and thus
$$\lim_{j\to\infty} g(t_j) = \lim_{j\to\infty}\int_{\mathbb R^m} f_j(x)\,dx = \int_{\mathbb R^m} \lim_{j\to\infty} f_j(x)\,dx = \int_{\mathbb R^m} f_0(x)\,dx = g(t_0).$$

Proposition 12.23 (Differentiation under the Integral Sign) Let $I \subset \mathbb R$ be an open interval and $f: \mathbb R^m \times I \to \mathbb R$ a function such that
(a) for every $t \in I$ the function $x \mapsto f(x,t)$ is integrable,
(b) for almost all $x \in \mathbb R^m$, the function $t \mapsto f(x,t)$ is finite and continuously differentiable,
(c) there exists an integrable function $F: \mathbb R^m \to \mathbb R$ such that for every $t \in I$,
$$\left|\frac{\partial f}{\partial t}(x,t)\right| \le F(x), \quad \text{a.e. } x \in \mathbb R^m.$$
Then the function $g(t) = \int_{\mathbb R^m} f(x,t)\,dx$ is differentiable on $I$ with
$$g'(t) = \int_{\mathbb R^m} \frac{\partial f}{\partial t}(x,t)\,dx.$$
The proof uses the previous theorem about the continuity of the parametric integral. A detailed proof can be found in [Kön90, p. 283].

Example 12.7 (a) Let $f \in L^1(\mathbb R)$. Then the Fourier transform $\hat f: \mathbb R \to \mathbb C$,
$$\hat f(t) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb R} e^{-itx} f(x)\,dx,$$
is continuous on $\mathbb R$; see homework 41.3.

(b) Let $K \subset \mathbb R^3$ be a compact subset and $\rho: K \to \mathbb R$ integrable. The Newton potential (with mass density $\rho$) is given by
$$u(t) = \int_K \frac{\rho(x)}{\|x - t\|}\,dx, \quad t \notin K.$$
Then $u(t)$ is a harmonic function on $\mathbb R^3 \setminus K$.
Similarly, if $K \subset \mathbb R^2$ is compact and $\rho \in L^1(K)$, the Newton potential is given by
$$u(t) = \int_K \rho(x)\log\|x - t\|\,dx, \quad t \notin K.$$
Then $u(t)$ is a harmonic function on $\mathbb R^2 \setminus K$.

12.4.6 The Riemann and the Lebesgue Integrals

Proposition 12.24 Let $f$ be a bounded function on the finite interval $[a,b]$.
(a) $f$ is Riemann integrable on $[a,b]$ if and only if $f$ is continuous a.e. on $[a,b]$.
(b) If $f$ is Riemann integrable on $[a,b]$, then $f$ is Lebesgue integrable, too. Both integrals coincide.
Let $I \subset \mathbb R$ be an interval such that $f$ is Riemann integrable on all compact subintervals of $I$.
(c) $f$ is Lebesgue integrable on $I$ if and only if $|f|$ is improperly Riemann integrable on $I$ (see Section 5.4); both integrals coincide.

Remarks 12.7 (a) The characteristic function $\chi_{\mathbb Q}$ on $[0,1]$ is Lebesgue but not Riemann integrable; $\chi_{\mathbb Q}$ is nowhere continuous on $[0,1]$.
(b) The (improper) Riemann integral
$$\int_1^\infty \frac{\sin x}{x}\,dx$$
converges (see Example 5.11); however, the Lebesgue integral does not exist since the integral does not converge absolutely. Indeed, for integers $n \ge 1$ we have with some $c > 0$
$$\int_{n\pi}^{(n+1)\pi} \left|\frac{\sin x}{x}\right| dx \ge \frac{1}{(n+1)\pi}\int_{n\pi}^{(n+1)\pi} |\sin x|\,dx = \frac{c}{(n+1)\pi};$$
hence
$$\int_\pi^{(n+1)\pi} \left|\frac{\sin x}{x}\right| dx \ge \frac{c}{\pi}\sum_{k=1}^n \frac{1}{k+1}.$$
Since the harmonic series diverges, so does the integral $\int_1^\infty \left|\frac{\sin x}{x}\right| dx$.
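The two behaviours can be seen numerically: the oscillating integral stabilises while the integral of the absolute value keeps growing. A rough midpoint-rule sketch (the truncation points are arbitrary choices):

```python
import math

# The integral of sin(x)/x over [1, T] settles near its limit as T grows,
# whereas the integral of |sin(x)/x| grows without bound, like the harmonic
# series in the estimate above.
def integrate(f, a, b, steps):
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) * h for i in range(steps))

f = lambda x: math.sin(x) / x
g = lambda x: abs(math.sin(x)) / x

osc_100 = integrate(f, 1.0, 100 * math.pi, 50000)
osc_400 = integrate(f, 1.0, 400 * math.pi, 200000)
abs_100 = integrate(g, 1.0, 100 * math.pi, 50000)
abs_400 = integrate(g, 1.0, 400 * math.pi, 200000)
```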

12.4.7 Appendix: Fubini's Theorem

Theorem 12.25 Let $(X_1, \mathcal A_1, \mu_1)$ and $(X_2, \mathcal A_2, \mu_2)$ be σ-finite measure spaces, let $f$ be an $\mathcal A_1 \otimes \mathcal A_2$-measurable function, and $X = X_1 \times X_2$.
(a) If $f: X \to [0,+\infty]$, $\varphi(x_1) = \int_{X_2} f(x_1,x_2)\,d\mu_2$, $\psi(x_2) = \int_{X_1} f(x_1,x_2)\,d\mu_1$, then
$$\int_{X_2} \psi(x_2)\,d\mu_2 = \int_{X_1\times X_2} f\,d(\mu_1\otimes\mu_2) = \int_{X_1} \varphi(x_1)\,d\mu_1.$$
(b) If $f \in L^1(X, \mu_1\otimes\mu_2)$, then
$$\int_{X_1\times X_2} f\,d(\mu_1\otimes\mu_2) = \int_{X_2}\left(\int_{X_1} f(x_1,x_2)\,d\mu_1\right) d\mu_2.$$
Here $\mathcal A_1 \otimes \mathcal A_2$ denotes the smallest σ-algebra over $X$ which contains all sets $A \times B$, $A \in \mathcal A_1$ and $B \in \mathcal A_2$. Define $\mu(A\times B) = \mu_1(A)\mu_2(B)$ and extend $\mu$ to a measure $\mu_1\otimes\mu_2$ on $\mathcal A_1 \otimes \mathcal A_2$.

Remark 12.8 In (a), as in Levi's theorem, we don't need any assumption on $f$ to change the order of integration since $f \ge 0$. In (b), $f$ is an arbitrary measurable function on $X_1 \times X_2$; however, the integral $\int_X |f|\,d\mu$ needs to be finite.

Chapter 13

Hilbert Space

Functional analysis is a fruitful interplay between linear algebra and analysis. One defines function spaces with certain properties and certain topologies and considers linear operators between such spaces. The friendliest examples of such spaces are Hilbert spaces. This chapter is divided into two parts: the first describes the geometry of a Hilbert space, the second is concerned with linear operators on the Hilbert space.

13.1 The Geometry of the Hilbert Space

13.1.1 Unitary Spaces

Let $E$ be a linear space over $\mathbb K = \mathbb R$ or over $\mathbb K = \mathbb C$.

Definition 13.1 An inner product on $E$ is a function $\langle\cdot,\cdot\rangle: E \times E \to \mathbb K$ with
(a) $\langle\lambda_1 x_1 + \lambda_2 x_2, y\rangle = \lambda_1\langle x_1,y\rangle + \lambda_2\langle x_2,y\rangle$ (linearity),
(b) $\langle x,y\rangle = \overline{\langle y,x\rangle}$ (Hermitian property),
(c) $\langle x,x\rangle \ge 0$ for all $x \in E$, and $\langle x,x\rangle = 0$ implies $x = 0$ (positive definiteness).

A unitary space is a linear space together with an inner product.

Let us list some immediate consequences of these axioms. From (a) and (b) it follows that
(d) $\langle y, \lambda_1 x_1 + \lambda_2 x_2\rangle = \overline{\lambda_1}\langle y,x_1\rangle + \overline{\lambda_2}\langle y,x_2\rangle$.
A form on $E \times E$ satisfying (a) and (d) is called a sesquilinear form. (a) implies $\langle 0,y\rangle = 0$ for all $y \in E$. The mapping $x \mapsto \langle x,y\rangle$ is a linear mapping into $\mathbb K$ (a linear functional) for all $y \in E$.
By (c), we may define $\|x\|$, the norm of the vector $x \in E$, to be the square root of $\langle x,x\rangle$; thus
$$\|x\|^2 = \langle x,x\rangle. \tag{13.1}$$


Proposition 13.1 (Cauchy–Schwarz Inequality) Let $(E, \langle\cdot,\cdot\rangle)$ be a unitary space. For $x,y \in E$ we have
$$|\langle x,y\rangle| \le \|x\|\,\|y\|.$$
Equality holds if and only if $x = \beta y$ for some $\beta \in \mathbb K$.

Proof. Choose $\alpha \in \mathbb C$, $|\alpha| = 1$, such that $\alpha\langle y,x\rangle = |\langle x,y\rangle|$. For $\lambda \in \mathbb R$ we then have (since $\overline\alpha\langle x,y\rangle = \overline{\alpha\langle y,x\rangle} = |\langle x,y\rangle|$)
$$\langle x - \alpha\lambda y,\, x - \alpha\lambda y\rangle = \langle x,x\rangle - \alpha\lambda\langle y,x\rangle - \overline\alpha\lambda\langle x,y\rangle + \lambda^2\langle y,y\rangle = \langle x,x\rangle - 2\lambda|\langle x,y\rangle| + \lambda^2\langle y,y\rangle \ge 0.$$
This is a quadratic polynomial $a\lambda^2 + b\lambda + c$ in $\lambda$ with real coefficients. Since this polynomial takes only non-negative values, its discriminant $b^2 - 4ac$ must be non-positive:
$$4|\langle x,y\rangle|^2 - 4\|x\|^2\|y\|^2 \le 0.$$
This implies $|\langle x,y\rangle| \le \|x\|\,\|y\|$.

Corollary 13.2 $\|\cdot\|$ defines a norm on $E$.
Proof. It is clear that $\|x\| \ge 0$. From (c) it follows that $\|x\| = 0$ implies $x = 0$. Further, $\|\lambda x\| = \sqrt{\langle\lambda x,\lambda x\rangle} = \sqrt{|\lambda|^2\langle x,x\rangle} = |\lambda|\,\|x\|$. We prove the triangle inequality. Since $2\operatorname{Re}(z) = z + \overline z$, we have by Proposition 1.20 and the Cauchy–Schwarz inequality
$$\|x+y\|^2 = \langle x+y, x+y\rangle = \langle x,x\rangle + \langle x,y\rangle + \langle y,x\rangle + \langle y,y\rangle = \|x\|^2 + \|y\|^2 + 2\operatorname{Re}\langle x,y\rangle \le \|x\|^2 + \|y\|^2 + 2|\langle x,y\rangle| \le \|x\|^2 + \|y\|^2 + 2\|x\|\,\|y\| = (\|x\| + \|y\|)^2;$$
hence $\|x+y\| \le \|x\| + \|y\|$.

By the corollary, any unitary space is a normed space with the norm $\|x\| = \sqrt{\langle x,x\rangle}$. Recall that any normed vector space is a metric space with the metric $d(x,y) = \|x - y\|$. Hence the notions of open and closed sets, neighborhoods, converging sequences, Cauchy sequences, continuous mappings, and so on make sense in a unitary space. In particular, $\lim_{n\to\infty} x_n = x$ means that the sequence $(\|x_n - x\|)$ of non-negative real numbers tends to $0$. Recall from Definition 6.8 that a metric space is said to be complete if every Cauchy sequence converges.

Definition 13.2 A complete unitary space is called a Hilbert space.

Example 13.1 Let $\mathbb K = \mathbb C$.
(a) $E = \mathbb C^n$, $x = (x_1,\dots,x_n) \in \mathbb C^n$, $y = (y_1,\dots,y_n) \in \mathbb C^n$. Then $\langle x,y\rangle = \sum_{k=1}^n x_k\overline{y_k}$ defines an inner product, with the euclidean norm $\|x\| = \left(\sum_{k=1}^n |x_k|^2\right)^{1/2}$. $(\mathbb C^n, \langle\cdot,\cdot\rangle)$ is a Hilbert space.

(b) $E = L^2(X,\mu)$ is a Hilbert space with the inner product $\langle f,g\rangle = \int_X f\overline g\,d\mu$.
By Proposition 12.17 with $p = q = 2$ we obtain the Cauchy–Schwarz inequality
$$\left|\int_X f\overline g\,d\mu\right| \le \left(\int_X |f|^2\,d\mu\right)^{1/2}\left(\int_X |g|^2\,d\mu\right)^{1/2}.$$
Using the Cauchy–Schwarz inequality one can prove Minkowski's inequality, that is, $f,g \in L^2(X,\mu)$ implies $f+g \in L^2(X,\mu)$. Also, $\langle f,g\rangle$ is a finite complex number, since $f\overline g \in L^1(X,\mu)$.
Note that the inner product is positive definite since $\int_X |f|^2\,d\mu = 0$ implies (by Proposition 12.16) $|f| = 0$ a.e. and therefore $f = 0$ in $L^2(X,\mu)$. The proof of the completeness of $L^2(X,\mu)$ is more involved; we skip it.

(c) $E = \ell_2$, i.e.
$$\ell_2 = \Big\{(x_n) \;\Big|\; x_n \in \mathbb C,\ n \in \mathbb N,\ \sum_{n=1}^\infty |x_n|^2 < \infty\Big\}.$$
Note that the Cauchy–Schwarz inequality in $\mathbb R^k$ (Corollary 1.26) implies
$$\left|\sum_{n=1}^k x_n\overline{y_n}\right|^2 \le \sum_{n=1}^k |x_n|^2\sum_{n=1}^k |y_n|^2 \le \sum_{n=1}^\infty |x_n|^2\sum_{n=1}^\infty |y_n|^2.$$
Taking the supremum over all $k \in \mathbb N$ on the left, we have
$$\left|\sum_{n=1}^\infty x_n\overline{y_n}\right|^2 \le \sum_{n=1}^\infty |x_n|^2\sum_{n=1}^\infty |y_n|^2;$$
hence
$$\langle (x_n), (y_n)\rangle = \sum_{n=1}^\infty x_n\overline{y_n}$$
is an absolutely converging series such that the inner product is well-defined on $\ell_2$.
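A truncated numerical check of this estimate, with the hypothetical real sequences $x_n = 1/n$ and $y_n = 1/n^2$:

```python
# The l_2 Cauchy-Schwarz estimate for x_n = 1/n, y_n = 1/n^2, truncated at N
# terms: the inner product (a sum of 1/n^3 terms, close to zeta(3)) is bounded
# by sqrt((sum of 1/n^2) * (sum of 1/n^4)).
N = 1000
x = [1.0 / n for n in range(1, N + 1)]
y = [1.0 / n ** 2 for n in range(1, N + 1)]

inner = sum(a * b for a, b in zip(x, y))
bound = (sum(a * a for a in x) * sum(b * b for b in y)) ** 0.5
```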

Lemma 13.3 Let $E$ be a unitary space. For any fixed $y \in E$ the mappings $f,g: E \to \mathbb C$ given by
$$f(x) = \langle x,y\rangle \quad\text{and}\quad g(x) = \langle y,x\rangle$$
are continuous functions on $E$.
Proof. First proof. Let $(x_n)$ be a sequence in $E$ converging to $x \in E$, that is, $\lim_{n\to\infty}\|x_n - x\| = 0$. Then
$$|\langle x_n,y\rangle - \langle x,y\rangle| = |\langle x_n - x, y\rangle| \underset{\text{CSI}}{\le} \|x_n - x\|\,\|y\| \to 0$$
as $n \to \infty$. This proves the continuity of $f$. The same proof works for $g$.
Second proof. The Cauchy–Schwarz inequality implies that for $x_1, x_2 \in E$
$$|\langle x_1,y\rangle - \langle x_2,y\rangle| = |\langle x_1 - x_2, y\rangle| \le \|x_1 - x_2\|\,\|y\|,$$
which proves that the map $x \mapsto \langle x,y\rangle$ is in fact uniformly continuous (given $\varepsilon > 0$ choose $\delta = \varepsilon/\|y\|$; then $\|x_1 - x_2\| < \delta$ implies $|\langle x_1,y\rangle - \langle x_2,y\rangle| < \varepsilon$). The same is true for $x \mapsto \langle y,x\rangle$.

Definition 13.3 Let $H$ be a unitary space. We call $x$ and $y$ orthogonal to each other, and write $x \perp y$, if $\langle x,y\rangle = 0$. Two subsets $M, N \subset H$ are called orthogonal to each other if $x \perp y$ for all $x \in M$ and $y \in N$.
For a subset $M \subset H$ define the orthogonal complement $M^\perp$ of $M$ to be the set
$$M^\perp = \{x \in H \mid \langle x,m\rangle = 0 \text{ for all } m \in M\}.$$

For example, $E = \mathbb R^n$ with the standard inner product and $v = (v_1,\dots,v_n) \in \mathbb R^n$, $v \ne 0$, yields
$$\{v\}^\perp = \Big\{x \in \mathbb R^n \;\Big|\; \sum_{k=1}^n x_k v_k = 0\Big\}.$$
This is a hyperplane in $\mathbb R^n$ which is orthogonal to $v$.

Lemma 13.4 Let $H$ be a unitary space and $M \subset H$ an arbitrary subset. Then $M^\perp$ is a closed linear subspace of $H$.

Proof. (a) Suppose that $x,y \in M^\perp$. Then for $m \in M$ we have
$$\langle\lambda_1 x + \lambda_2 y, m\rangle = \lambda_1\langle x,m\rangle + \lambda_2\langle y,m\rangle = 0;$$
hence $\lambda_1 x + \lambda_2 y \in M^\perp$. This shows that $M^\perp$ is a linear subspace.
(b) We show that any converging sequence $(x_n)$ of elements of $M^\perp$ has its limit in $M^\perp$. Suppose $\lim_{n\to\infty} x_n = x$, $x_n \in M^\perp$, $x \in H$. Then for all $m \in M$, $\langle x_n,m\rangle = 0$. Since the inner product is continuous in the first argument (see Lemma 13.3) we obtain
$$0 = \lim_{n\to\infty}\langle x_n,m\rangle = \langle x,m\rangle.$$
This shows $x \in M^\perp$; hence $M^\perp$ is closed.

13.1.2 Norm and Inner Product

Problem. Given a normed linear space $(E, \|\cdot\|)$, does there exist an inner product $\langle\cdot,\cdot\rangle$ on $E$ such that $\|x\| = \sqrt{\langle x,x\rangle}$ for all $x \in E$? In this case we call $\|\cdot\|$ an inner product norm.

Proposition 13.5 (a) A norm $\|\cdot\|$ on a linear space $E$ over $\mathbb K = \mathbb R$ or $\mathbb K = \mathbb C$ is an inner product norm if and only if the parallelogram law
$$\|x+y\|^2 + \|x-y\|^2 = 2(\|x\|^2 + \|y\|^2), \quad x,y \in E, \tag{13.2}$$
is satisfied.
(b) If (13.2) is satisfied, the inner product $\langle\cdot,\cdot\rangle$ is given by (13.3) in the real case $\mathbb K = \mathbb R$ and by (13.4) in the complex case $\mathbb K = \mathbb C$:
$$\langle x,y\rangle = \frac14\left(\|x+y\|^2 - \|x-y\|^2\right), \quad \text{if } \mathbb K = \mathbb R, \tag{13.3}$$
$$\langle x,y\rangle = \frac14\left(\|x+y\|^2 - \|x-y\|^2 + i\|x+iy\|^2 - i\|x-iy\|^2\right), \quad \text{if } \mathbb K = \mathbb C. \tag{13.4}$$
These equations are called polarization identities.

Proof. We check the parallelogram and the polarization identity in the real case $\mathbb K = \mathbb R$:
$$\|x+y\|^2 + \|x-y\|^2 = \langle x+y, x+y\rangle + \langle x-y, x-y\rangle = \langle x,x\rangle + \langle y,x\rangle + \langle x,y\rangle + \langle y,y\rangle + \left(\langle x,x\rangle - \langle y,x\rangle - \langle x,y\rangle + \langle y,y\rangle\right) = 2\|x\|^2 + 2\|y\|^2.$$
Further,
$$\|x+y\|^2 - \|x-y\|^2 = \left(\langle x,x\rangle + \langle y,x\rangle + \langle x,y\rangle + \langle y,y\rangle\right) - \left(\langle x,x\rangle - \langle y,x\rangle - \langle x,y\rangle + \langle y,y\rangle\right) = 4\langle x,y\rangle.$$

The proof that the parallelogram identity is sufficient for E being a unitary space is in the appendix to this section.

Example 13.2 We show that $\|f\|_1 = \int_0^2 |f|\,dx$ is not an inner product norm on $L^1([0,2])$. Indeed, let $f = \chi_{[1,2]}$ and $g = \chi_{[0,1]}$. Then $|f+g| = |f-g| = 1$ on $[0,2]$ and $\|f\|_1 = \|g\|_1 = \int_0^1 dx = 1$, such that
$$\|f+g\|_1^2 + \|f-g\|_1^2 = 2^2 + 2^2 = 8 \ne 4 = 2\left(\|f\|_1^2 + \|g\|_1^2\right).$$
The parallelogram identity is not satisfied for $\|\cdot\|_1$, so $L^1([0,2])$ is not an inner product space.
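The same computation can be repeated on a grid: the discretised 1-norm violates (13.2) with exactly the values above, while the discretised 2-norm satisfies it. A sketch (the grid size is an arbitrary choice):

```python
# chi_[1,2] and chi_[0,1] sampled at midpoints of a grid over [0,2]: the
# 1-norm gives 8 != 4 in the parallelogram law, the 2-norm gives 4 = 4.
n = 2000
h = 2.0 / n
xs = [(i + 0.5) * h for i in range(n)]
f = [1.0 if x >= 1.0 else 0.0 for x in xs]  # chi_[1,2]
g = [1.0 if x < 1.0 else 0.0 for x in xs]   # chi_[0,1]

norm1 = lambda u: sum(abs(v) for v in u) * h
norm2 = lambda u: (sum(v * v for v in u) * h) ** 0.5

plus = [a + b for a, b in zip(f, g)]
minus = [a - b for a, b in zip(f, g)]

lhs1 = norm1(plus) ** 2 + norm1(minus) ** 2  # 8
rhs1 = 2 * (norm1(f) ** 2 + norm1(g) ** 2)   # 4
lhs2 = norm2(plus) ** 2 + norm2(minus) ** 2  # 4
rhs2 = 2 * (norm2(f) ** 2 + norm2(g) ** 2)   # 4
```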

13.1.3 Two Theorems of F. Riesz

(born January 22, 1880, in Austria-Hungary; died February 28, 1956; a founder of functional analysis)

Definition 13.4 Let $(H_1, \langle\cdot,\cdot\rangle_1)$ and $(H_2, \langle\cdot,\cdot\rangle_2)$ be Hilbert spaces. Let $H = \{(x_1,x_2) \mid x_1 \in H_1,\ x_2 \in H_2\}$ be the direct sum of the Hilbert spaces $H_1$ and $H_2$. Then
$$\langle (x_1,x_2), (y_1,y_2)\rangle = \langle x_1,y_1\rangle_1 + \langle x_2,y_2\rangle_2$$
defines an inner product on $H$. With this inner product $H$ becomes a Hilbert space. $H = H_1 \oplus H_2$ is called the (direct) orthogonal sum of $H_1$ and $H_2$.

Definition 13.5 Two Hilbert spaces $H_1$ and $H_2$ are called isomorphic if there exists a bijective linear mapping $\varphi: H_1 \to H_2$ such that
$$\langle\varphi(x), \varphi(y)\rangle_2 = \langle x,y\rangle_1, \quad x,y \in H_1.$$
$\varphi$ is called an isometric isomorphism or a unitary map.
Back to the orthogonal sum $H = H_1 \oplus H_2$. Let $\tilde H_1 = \{(x_1,0) \mid x_1 \in H_1\}$ and $\tilde H_2 = \{(0,x_2) \mid x_2 \in H_2\}$. Then $x_1 \mapsto (x_1,0)$ and $x_2 \mapsto (0,x_2)$ are isometric isomorphisms from $H_i \to \tilde H_i$, $i = 1,2$. We have $\tilde H_1 \perp \tilde H_2$, and $\tilde H_i$, $i = 1,2$, are closed linear subspaces of $H$.
In this situation we say that $H$ is the inner orthogonal sum of the two closed subspaces $\tilde H_1$ and $\tilde H_2$.

(a) Riesz’s First Theorem

Problem. Let $H_1$ be a closed linear subspace of $H$. Does there exist another closed linear subspace $H_2$ such that $H = H_1 \oplus H_2$? Answer: yes.

Lemma 13.6 (Minimal Distance Lemma) Let $C$ be a convex and closed subset of the Hilbert space $H$. For $x \in H$ let
$$\varrho(x) = \inf\{\|x - y\| \mid y \in C\}.$$
Then there exists a unique element $c \in C$ such that $\varrho(x) = \|x - c\|$.

Proof. Existence. Since $\varrho(x)$ is an infimum, there exists a sequence $(y_n)$, $y_n \in C$, which approximates the infimum, $\lim_{n\to\infty}\|x - y_n\| = \varrho(x)$. We will show that $(y_n)$ is a Cauchy sequence. By the parallelogram law (see Proposition 13.5) we have
$$\|y_n - y_m\|^2 = \|(y_n - x) + (x - y_m)\|^2 = 2\|y_n - x\|^2 + 2\|x - y_m\|^2 - \|2x - y_n - y_m\|^2 = 2\|y_n - x\|^2 + 2\|x - y_m\|^2 - 4\left\|x - \frac{y_n + y_m}{2}\right\|^2.$$
Since $C$ is convex, $(y_n + y_m)/2 \in C$ and therefore $\left\|x - \frac{y_n+y_m}{2}\right\| \ge \varrho(x)$. Hence
$$\|y_n - y_m\|^2 \le 2\|y_n - x\|^2 + 2\|x - y_m\|^2 - 4\varrho(x)^2.$$
By the choice of $(y_n)$, $\|y_n - x\|^2$ and $\|x - y_m\|^2$ both tend to $\varrho(x)^2$ as $m,n \to \infty$. Thus
$$\limsup_{m,n\to\infty}\|y_n - y_m\|^2 \le 2\left(\varrho(x)^2 + \varrho(x)^2\right) - 4\varrho(x)^2 = 0;$$
hence $(y_n)$ is a Cauchy sequence. Since $H$ is complete, there exists an element $c \in H$ such that $\lim_{n\to\infty} y_n = c$. Since $y_n \in C$ and $C$ is closed, $c \in C$. By construction we have $\|y_n - x\| \to \varrho(x)$. On the other hand, since $y_n \to c$ and the norm is continuous (see homework 42.1(b)), we have $\|y_n - x\| \to \|c - x\|$. This implies $\varrho(x) = \|c - x\|$.
Uniqueness. Let $c, c'$ be two such elements. Then, by the parallelogram law,
$$0 \le \|c - c'\|^2 = \|(c - x) + (x - c')\|^2 = 2\|c - x\|^2 + 2\|x - c'\|^2 - 4\left\|x - \frac{c + c'}{2}\right\|^2 \le 2\left(\varrho(x)^2 + \varrho(x)^2\right) - 4\varrho(x)^2 = 0.$$
This implies $c = c'$; the point $c \in C$ which realizes the infimum is unique.

Theorem 13.7 (Riesz's first theorem) Let $H_1$ be a closed linear subspace of the Hilbert space $H$. Then we have
$$H = H_1 \oplus H_1^\perp,$$
that is, any $x \in H$ has a unique representation $x = x_1 + x_2$ with $x_1 \in H_1$ and $x_2 \in H_1^\perp$.

Proof. Existence. Apply Lemma 13.6 to the convex, closed set $H_1$. There exists a unique $x_1 \in H_1$ such that
$$\varrho(x) = \inf\{\|x - y\| \mid y \in H_1\} = \|x - x_1\| \le \|x - x_1 - t y_1\|$$
for all $t \in \mathbb K$ and $y_1 \in H_1$. Homework 42.2(c) now implies $x_2 = x - x_1 \perp y_1$ for all $y_1 \in H_1$. Hence $x_2 \in H_1^\perp$. Therefore $x = x_1 + x_2$, and the existence of such a representation is shown.
Uniqueness. Suppose that $x = x_1 + x_2 = x_1' + x_2'$ are two possibilities to write $x$ as a sum of elements $x_1, x_1' \in H_1$ and $x_2, x_2' \in H_1^\perp$. Then
$$u = x_1 - x_1' = x_2' - x_2$$
belongs to both $H_1$ and $H_1^\perp$ (by linearity of $H_1$ and $H_1^\perp$). Hence $\langle u,u\rangle = 0$, which implies $u = 0$. That is, $x_1 = x_1'$ and $x_2 = x_2'$.

Let $x = x_1 + x_2$ be as above. Then the mappings $P_1(x) = x_1$ and $P_2(x) = x_2$ are well-defined on $H$. They are called the orthogonal projections of $H$ onto $H_1$ and $H_1^\perp$, respectively. We will consider projections in more detail later.

Example 13.3 Let $H$ be a Hilbert space, $z \in H$, $z \ne 0$, and $H_1 = \mathbb K z$ the one-dimensional linear subspace spanned by the single vector $z$. Since any finite dimensional subspace is closed, Riesz's first theorem applies. We want to compute the projections of $x \in H$ with respect to $H_1$ and $H_1^\perp$. Let $x_1 = \alpha z$; we have to determine $\alpha$ such that $\langle x - x_1, z\rangle = 0$, that is,
$$\langle x - \alpha z, z\rangle = \langle x,z\rangle - \alpha\langle z,z\rangle = 0.$$
Hence
$$\alpha = \frac{\langle x,z\rangle}{\langle z,z\rangle} = \frac{\langle x,z\rangle}{\|z\|^2}.$$
The decomposition from Riesz's first theorem with respect to $H_1 = \mathbb K z$ and $H_1^\perp$ is
$$x = \frac{\langle x,z\rangle}{\|z\|^2}\,z + \left(x - \frac{\langle x,z\rangle}{\|z\|^2}\,z\right).$$
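The formula for $\alpha$ is easy to test numerically; here is a sketch in $\mathbb R^3$ with arbitrary vectors $x$ and $z$:

```python
# Projection of x onto the line spanned by z (Example 13.3): x1 = alpha*z with
# alpha = <x,z>/||z||^2, and the remainder x2 = x - x1 is orthogonal to z.
x = [3.0, 1.0, 2.0]
z = [1.0, 1.0, 0.0]

dot = lambda u, v: sum(a * b for a, b in zip(u, v))

alpha = dot(x, z) / dot(z, z)        # = 4/2 = 2
x1 = [alpha * c for c in z]          # component in the span of z
x2 = [a - b for a, b in zip(x, x1)]  # component in the orthogonal complement
```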

(b) Riesz’s Representation Theorem

Recall from Section 11 that a linear functional on the vector space $E$ is a mapping $F: E \to \mathbb K$ such that $F(\lambda_1 x_1 + \lambda_2 x_2) = \lambda_1 F(x_1) + \lambda_2 F(x_2)$ for all $x_1, x_2 \in E$ and $\lambda_1, \lambda_2 \in \mathbb K$.
Let $(E, \|\cdot\|)$ be a normed linear space over $\mathbb K$. Recall that a linear functional $F: E \to \mathbb K$ is called continuous if $x_n \to x$ in $E$ implies $F(x_n) \to F(x)$.
The set of all continuous linear functionals $F$ on $E$ forms a linear space $E'$ with the same linear operations as in $E^*$.

Now let $(H, \langle\cdot,\cdot\rangle)$ be a Hilbert space. By Lemma 13.3, $F_y: H \to \mathbb K$, $F_y(x) = \langle x,y\rangle$, defines a continuous linear functional on $H$. Riesz's representation theorem states that any continuous linear functional on $H$ is of this form.

Theorem 13.8 (Riesz's Representation Theorem) Let $F$ be a continuous linear functional on the Hilbert space $H$. Then there exists a unique element $y \in H$ such that $F(x) = F_y(x) = \langle x,y\rangle$ for all $x \in H$.

Proof. Existence. Let $H_1 = \ker F$ be the null-space of the linear functional $F$. $H_1$ is a linear subspace (since $F$ is linear). $H_1$ is closed since $H_1 = F^{-1}(\{0\})$ is the preimage of the closed set $\{0\}$ under the continuous map $F$. By Riesz's first theorem, $H = H_1 \oplus H_1^\perp$.
Case 1. $H_1^\perp = \{0\}$. Then $H = H_1$ and $F(x) = 0$ for all $x$. We can choose $y = 0$; $F(x) = \langle x,0\rangle$.
Case 2. $H_1^\perp \ne \{0\}$. Suppose $u \in H_1^\perp$, $u \ne 0$. Then $F(u) \ne 0$ (otherwise $u \in H_1^\perp \cap H_1$ such that $\langle u,u\rangle = 0$, which implies $u = 0$). We have
$$F\left(x - \frac{F(x)}{F(u)}\,u\right) = F(x) - \frac{F(x)}{F(u)}\,F(u) = 0.$$
Hence $x - \frac{F(x)}{F(u)}\,u \in H_1$. Since $u \in H_1^\perp$ we have
$$0 = \left\langle x - \frac{F(x)}{F(u)}\,u,\ u\right\rangle = \langle x,u\rangle - \frac{F(x)}{F(u)}\langle u,u\rangle,$$
$$F(x) = \frac{F(u)}{\langle u,u\rangle}\langle x,u\rangle = \left\langle x,\ \frac{\overline{F(u)}}{\|u\|^2}\,u\right\rangle = F_y(x), \quad \text{where } y = \frac{\overline{F(u)}}{\|u\|^2}\,u.$$
Uniqueness. Suppose that both $y_1, y_2 \in H$ give the same functional $F$, i.e. $F(x) = \langle x,y_1\rangle = \langle x,y_2\rangle$ for all $x$. This implies
$$\langle x,\ y_1 - y_2\rangle = 0, \quad x \in H.$$
In particular, choose $x = y_1 - y_2$. This gives $\|y_1 - y_2\|^2 = 0$; hence $y_1 = y_2$.

(c) Example

Any continuous linear functional on $L^2(X,\mu)$ is of the form $F(f) = \int_X f\overline g\,d\mu$ with some $g \in L^2(X,\mu)$. Any continuous linear functional on $\ell_2$ is given by
$$F((x_n)) = \sum_{n=1}^\infty x_n\overline{y_n}, \quad \text{with } (y_n) \in \ell_2.$$

13.1.4 Orthogonal Sets and Fourier Expansion

Motivation. Let $E = \mathbb R^n$ be the euclidean space with the standard inner product and standard basis $\{e_1,\dots,e_n\}$. Then, with $x_i = \langle x,e_i\rangle$,
$$x = \sum_{k=1}^n x_k e_k, \qquad \|x\|^2 = \sum_{k=1}^n |x_k|^2, \qquad \langle x,y\rangle = \sum_{k=1}^n x_k y_k.$$
We want to generalize these formulas to arbitrary Hilbert spaces.

(a) Orthonormal Sets

Let $(H, \langle\cdot,\cdot\rangle)$ be a Hilbert space.

Definition 13.6 Let $\{x_i \mid i \in I\}$ be a family of elements of $H$.
$\{x_i\}$ is called an orthogonal set, or OS, if $\langle x_i,x_j\rangle = 0$ for $i \ne j$.
$\{x_i\}$ is called an orthonormal set, or NOS, if $\langle x_i,x_j\rangle = \delta_{ij}$ for all $i,j \in I$.

Example 13.4 (a) $H = \ell_2$, $e_n = (0,0,\dots,0,1,0,\dots)$ with the $1$ at the $n$th component. Then $\{e_n \mid n \in \mathbb N\}$ is an OS in $H$.
(b) $H = L^2((0,2\pi))$ with the Lebesgue measure, $\langle f,g\rangle = \int_0^{2\pi} f\overline g\,d\lambda$. Then
$$\{1,\ \sin(nx),\ \cos(nx) \mid n \in \mathbb N\}$$
is an OS in $H$. One checks
$$\left\{\frac{1}{\sqrt{2\pi}},\ \frac{\sin(nx)}{\sqrt\pi},\ \frac{\cos(nx)}{\sqrt\pi} \ \Big|\ n \in \mathbb N\right\} \quad\text{and}\quad \left\{\frac{e^{inx}}{\sqrt{2\pi}} \ \Big|\ n \in \mathbb Z\right\}$$
to be orthonormal sets of $H$.

Lemma 13.9 (The Pythagorean Theorem) Let $\{x_1,\dots,x_k\}$ be an OS in $H$. Then
$$\|x_1 + \cdots + x_k\|^2 = \|x_1\|^2 + \cdots + \|x_k\|^2.$$
The easy proof is left to the reader.

Lemma 13.10 Let $\{x_n\}$ be an OS in $H$. Then $\sum_{k=1}^\infty x_k$ converges if and only if $\sum_{k=1}^\infty \|x_k\|^2$ converges.
The proof is in the appendix.

(b) Fourier Expansion and Completeness

Throughout this paragraph let $\{x_n \mid n \in \mathbb N\}$ be an NOS in the Hilbert space $H$.

Definition 13.7 The numbers $\langle x,x_n\rangle$, $n \in \mathbb N$, are called the Fourier coefficients of $x \in H$ with respect to the NOS $\{x_n\}$.

Example 13.5 Consider the NOS $\left\{\frac{1}{\sqrt{2\pi}}, \frac{\sin(nx)}{\sqrt\pi}, \frac{\cos(nx)}{\sqrt\pi} \mid n \in \mathbb N\right\}$ from the previous example on $H = L^2((0,2\pi))$. Let $f \in H$. Then
$$\left\langle f, \frac{\sin(nx)}{\sqrt\pi}\right\rangle = \frac{1}{\sqrt\pi}\int_0^{2\pi} f(t)\sin(nt)\,dt, \qquad \left\langle f, \frac{\cos(nx)}{\sqrt\pi}\right\rangle = \frac{1}{\sqrt\pi}\int_0^{2\pi} f(t)\cos(nt)\,dt, \qquad \left\langle f, \frac{1}{\sqrt{2\pi}}\right\rangle = \frac{1}{\sqrt{2\pi}}\int_0^{2\pi} f(t)\,dt.$$
These are the usual Fourier coefficients, up to a factor. Note that we have a different normalization than in Definition 6.3 since the inner product there has the factor $1/(2\pi)$.

Proposition 13.11 (Bessel's Inequality) For $x \in H$ we have
$$\sum_{k=1}^\infty |\langle x,x_k\rangle|^2 \le \|x\|^2. \tag{13.5}$$

Proof. Let $n \in \mathbb N$ be a positive integer and $y_n = x - \sum_{k=1}^n \langle x,x_k\rangle x_k$. Then
$$\langle y_n, x_m\rangle = \langle x,x_m\rangle - \sum_{k=1}^n \langle x,x_k\rangle\langle x_k,x_m\rangle = \langle x,x_m\rangle - \sum_{k=1}^n \langle x,x_k\rangle\delta_{km} = 0$$
for $m = 1,\dots,n$. Hence $\{y_n, \langle x,x_1\rangle x_1, \dots, \langle x,x_n\rangle x_n\}$ is an OS. By Lemma 13.9,
$$\|x\|^2 = \|y_n\|^2 + \sum_{k=1}^n \|\langle x,x_k\rangle x_k\|^2 = \|y_n\|^2 + \sum_{k=1}^n |\langle x,x_k\rangle|^2\|x_k\|^2 \ge \sum_{k=1}^n |\langle x,x_k\rangle|^2,$$
since $\|x_k\|^2 = 1$ for all $k$. Taking the supremum over all $n$ on the right, the assertion follows.
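Bessel's inequality can be observed numerically, e.g. for $f(x) = x$ on $(0,2\pi)$ with the trigonometric NOS of Example 13.5; the coefficients below are computed with a midpoint rule, and the truncation level 30 is an arbitrary choice:

```python
import math

# Partial Bessel sums for f(x) = x with the NOS {1/sqrt(2*pi),
# sin(nx)/sqrt(pi), cos(nx)/sqrt(pi)} stay below ||f||^2 = 8*pi^3/3.
def integrate(g, steps=5000):
    h = 2 * math.pi / steps
    return sum(g((i + 0.5) * h) * h for i in range(steps))

norm_sq = integrate(lambda x: x * x)  # squared L^2 norm of f(x) = x
bessel = integrate(lambda x: x / math.sqrt(2 * math.pi)) ** 2
for n in range(1, 31):
    bessel += integrate(lambda x, n=n: x * math.sin(n * x) / math.sqrt(math.pi)) ** 2
    bessel += integrate(lambda x, n=n: x * math.cos(n * x) / math.sqrt(math.pi)) ** 2
```

The gap that remains is the tail $\sum_{n>30} 4\pi/n^2$ of the full Parseval sum.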

Corollary 13.12 For any $x \in H$ the series $\sum_{k=1}^\infty \langle x,x_k\rangle x_k$ converges in $H$.
Proof. Since $\{\langle x,x_k\rangle x_k\}$ is an OS, by Lemma 13.10 the series converges if and only if the series $\sum_{k=1}^\infty \|\langle x,x_k\rangle x_k\|^2 = \sum_{k=1}^\infty |\langle x,x_k\rangle|^2$ converges. By Bessel's inequality, this series converges.

We call $\sum_{k=1}^\infty \langle x,x_k\rangle x_k$ the Fourier series of $x$ with respect to the NOS $\{x_k\}$.

Remarks 13.1 (a) In general, the Fourier series of $x$ does not converge to $x$.
(b) The NOS $\left\{\frac{1}{\sqrt{2\pi}}, \frac{\sin(nx)}{\sqrt\pi}, \frac{\cos(nx)}{\sqrt\pi}\right\}$ gives the ordinary Fourier series of a function $f$ which is integrable over $(0,2\pi)$.

Theorem 13.13 Let $\{x_k \mid k \in \mathbb N\}$ be an NOS in $H$. The following are equivalent:
(a) $x = \sum_{k=1}^\infty \langle x,x_k\rangle x_k$ for all $x \in H$, i.e. the Fourier series of $x$ converges to $x$.
(b) $\langle z,x_k\rangle = 0$ for all $k \in \mathbb N$ implies $z = 0$, i.e. the NOS is maximal.
(c) For every $x \in H$ we have $\|x\|^2 = \sum_{k=1}^\infty |\langle x,x_k\rangle|^2$.
(d) If $x \in H$ and $y \in H$, then $\langle x,y\rangle = \sum_{k=1}^\infty \langle x,x_k\rangle\langle x_k,y\rangle$.
Formula (c) is called Parseval's identity.

Definition 13.8 An orthonormal set $\{x_i \mid i \in \mathbb N\}$ which satisfies the above (equivalent) properties is called a complete orthonormal system, CNOS for short.

Proof. (a) $\Rightarrow$ (d): Since the inner product is continuous in both components we have
$$\langle x,y\rangle = \left\langle\sum_{k=1}^\infty \langle x,x_k\rangle x_k,\ \sum_{n=1}^\infty \langle y,x_n\rangle x_n\right\rangle = \sum_{k,n=1}^\infty \langle x,x_k\rangle\,\overline{\langle y,x_n\rangle}\underbrace{\langle x_k,x_n\rangle}_{\delta_{kn}} = \sum_{k=1}^\infty \langle x,x_k\rangle\langle x_k,y\rangle.$$
(d) $\Rightarrow$ (c): Put $y = x$.
(c) $\Rightarrow$ (b): Suppose $\langle z,x_k\rangle = 0$ for all $k$. By (c) we then have
$$\|z\|^2 = \sum_{k=1}^\infty |\langle z,x_k\rangle|^2 = 0; \quad \text{hence } z = 0.$$
(b) $\Rightarrow$ (a): Fix $x \in H$ and put $y = \sum_{k=1}^\infty \langle x,x_k\rangle x_k$, which converges according to Corollary 13.12. With $z = x - y$ we have, for all positive integers $n \in \mathbb N$,
$$\langle z,x_n\rangle = \langle x - y, x_n\rangle = \langle x,x_n\rangle - \sum_{k=1}^\infty \langle x,x_k\rangle\langle x_k,x_n\rangle = \langle x,x_n\rangle - \langle x,x_n\rangle = 0.$$
By (b) this shows $z = 0$ and therefore $x = y$, i.e. the Fourier series of $x$ converges to $x$.

Example 13.6 (a) $H = \ell_2$; $\{e_n \mid n \in \mathbb N\}$ is an NOS. We show that this NOS is complete. For, let $x = (x_n)$ be orthogonal to every $e_n$, $n \in \mathbb N$; that is, $0 = \langle x,e_n\rangle = x_n$. Hence $x = (0,0,\dots) = 0$. By (b), $\{e_n\}$ is a CNOS. What does the Fourier series of $x$ look like? The Fourier coefficients of $x$ are $\langle x,e_n\rangle = x_n$, such that
$$x = \sum_{n=1}^\infty x_n e_n$$
is the Fourier series of $x$. The NOS $\{e_n \mid n \ge 2\}$ is not complete.
(b) $H = L^2((0,2\pi))$;
$$\left\{\frac{1}{\sqrt{2\pi}},\ \frac{\sin(nx)}{\sqrt\pi},\ \frac{\cos(nx)}{\sqrt\pi} \ \Big|\ n \in \mathbb N\right\} \quad\text{and}\quad \left\{\frac{e^{inx}}{\sqrt{2\pi}} \ \Big|\ n \in \mathbb Z\right\}$$
are both CNOSs in $H$. This was stated in Theorem 6.14.

(c) Existence of CNOS in a Separable Hilbert Space

Definition 13.9 A metric space E is called separable if there exists a countable dense subset of E.

Example 13.7 (a) $\mathbb R^n$ is separable. $M = \{(r_1,\dots,r_n) \mid r_1,\dots,r_n \in \mathbb Q\}$ is a countable dense set in $\mathbb R^n$.
(b) $\mathbb C^n$ is separable. $M = \{(r_1 + is_1,\dots,r_n + is_n) \mid r_1,\dots,r_n,s_1,\dots,s_n \in \mathbb Q\}$ is a countable dense subset of $\mathbb C^n$.
(c) $L^2([a,b])$ is separable. The polynomials $\{1, x, x^2, \dots\}$ are linearly independent in $L^2([a,b])$ and they can be orthonormalized via Schmidt's process. As a result we get a countable CNOS in $L^2([a,b])$ (Legendre polynomials in case $a = -1$, $b = 1$). However, $L^2(\mathbb R)$ contains no polynomial; in this case the Hermite functions, which are of the form $p_n(x)e^{-x^2}$ with polynomials $p_n$, form a countable CNOS.
More generally, $L^2(G,\lambda_n)$ is separable for any region $G \subset \mathbb R^n$ with respect to the Lebesgue measure.
(d) Any Hilbert space is isomorphic to some $L^2(X,\mu)$ where $\mu$ is the counting measure on $X$; $X = \mathbb N$ gives $\ell_2$. An uncountable $X$ gives a non-separable Hilbert space.

Proposition 13.14 (Schmidt's Orthogonalization Process) Let $\{y_k\}$ be an at most countable linearly independent subset of the Hilbert space $H$. Then there exists an NOS $\{x_k\}$ such that for every $n$
$$\operatorname{lin}\{y_1,\dots,y_n\} = \operatorname{lin}\{x_1,\dots,x_n\}.$$
The NOS can be computed recursively:
$$x_1 = \frac{y_1}{\|y_1\|}, \qquad \tilde x_{n+1} = y_{n+1} - \sum_{k=1}^n \langle y_{n+1},x_k\rangle x_k, \qquad x_{n+1} = \frac{\tilde x_{n+1}}{\|\tilde x_{n+1}\|}.$$
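A minimal sketch of the recursion in $\mathbb R^3$ with the standard dot product (the starting vectors are an arbitrary linearly independent set):

```python
# Schmidt's process: subtract the projections onto the already constructed
# orthonormal vectors, then normalise the remainder.
def schmidt(ys):
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    xs = []
    for y in ys:
        w = list(y)
        for x in xs:
            c = dot(y, x)  # Fourier coefficient <y, x_k>
            w = [wi - c * xi for wi, xi in zip(w, x)]
        nrm = dot(w, w) ** 0.5
        xs.append([wi / nrm for wi in w])
    return xs

xs = schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```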

Corollary 13.15 Let $\{e_k \mid k \in N\}$ be an NOS where $N = \{1,\dots,n\}$ for some $n \in \mathbb N$ or $N = \mathbb N$. Suppose that $H_1 = \overline{\operatorname{lin}}\{e_k \mid k \in N\}$ is the closed linear span of the NOS. Then $x_1 = \sum_{k\in N}\langle x,e_k\rangle e_k$ is the orthogonal projection of $x \in H$ onto $H_1$.

Proposition 13.16 (a) A Hilbert space $H$ has an at most countable complete orthonormal system (CNOS) if and only if $H$ is separable.
(b) Let $H$ be a separable Hilbert space. Then $H$ is isomorphic either to $\mathbb K^n$ for some $n \in \mathbb N$ or to $\ell_2$.

13.1.5 Appendix

(a) The Inner Product Constructed from an Inner Product Norm

Proof of Proposition 13.5. We consider only the case $\mathbb K = \mathbb R$. Assume that the parallelogram identity is satisfied. We will show that
$$\langle x,y\rangle = \frac14\left(\|x+y\|^2 - \|x-y\|^2\right)$$
defines a bilinear form on $E$.
(a) We show additivity. First note that the parallelogram identity (applied to the pairs $x_1+y,\,x_2$ and $x_2+y,\,x_1$) implies
$$\|x_1+x_2+y\|^2 = \tfrac12\left(2\|x_1+y\|^2 + 2\|x_2\|^2 - \|x_1-x_2+y\|^2\right) + \tfrac12\left(2\|x_2+y\|^2 + 2\|x_1\|^2 - \|x_2-x_1+y\|^2\right)$$
$$= \|x_1+y\|^2 + \|x_2+y\|^2 + \|x_1\|^2 + \|x_2\|^2 - \tfrac12\left(\|x_1-x_2+y\|^2 + \|x_2-x_1+y\|^2\right).$$
Replacing $y$ by $-y$, we have
$$\|x_1+x_2-y\|^2 = \|x_1-y\|^2 + \|x_2-y\|^2 + \|x_1\|^2 + \|x_2\|^2 - \tfrac12\left(\|x_1-x_2-y\|^2 + \|x_2-x_1-y\|^2\right).$$
By definition and the above two formulas (the subtracted terms coincide, since $\|x_1-x_2-y\| = \|x_2-x_1+y\|$ and $\|x_2-x_1-y\| = \|x_1-x_2+y\|$),
$$\langle x_1+x_2, y\rangle = \frac14\left(\|x_1+x_2+y\|^2 - \|x_1+x_2-y\|^2\right) = \frac14\left(\|x_1+y\|^2 - \|x_1-y\|^2\right) + \frac14\left(\|x_2+y\|^2 - \|x_2-y\|^2\right) = \langle x_1,y\rangle + \langle x_2,y\rangle;$$
that is, $\langle\cdot,\cdot\rangle$ is additive in the first variable. It is obviously symmetric and hence additive in the second variable, too.

(b) We show $\langle\lambda x,y\rangle = \lambda\langle x,y\rangle$ for all $\lambda \in \mathbb R$, $x,y \in E$. By (a), $\langle 2x,y\rangle = 2\langle x,y\rangle$. By induction on $n$, $\langle nx,y\rangle = n\langle x,y\rangle$ for all $n \in \mathbb N$. Now let $\lambda = \frac mn$, $m,n \in \mathbb N$. Then
$$n\langle\lambda x,y\rangle = n\left\langle\frac mn x, y\right\rangle = \left\langle n\,\frac mn x, y\right\rangle = m\langle x,y\rangle \implies \langle\lambda x,y\rangle = \frac mn\langle x,y\rangle = \lambda\langle x,y\rangle.$$
Hence $\langle\lambda x,y\rangle = \lambda\langle x,y\rangle$ holds for all positive rational numbers $\lambda$. Next,
$$0 = \langle x + (-x), y\rangle = \langle x,y\rangle + \langle -x,y\rangle$$
implies $\langle -x,y\rangle = -\langle x,y\rangle$ and, moreover, $\langle -\lambda x,y\rangle = -\lambda\langle x,y\rangle$, such that the equation holds for all $\lambda \in \mathbb Q$. Suppose that $\lambda \in \mathbb R$ is given. Then there exists a sequence $(\lambda_n)$, $\lambda_n \in \mathbb Q$, of rational numbers with $\lambda_n \to \lambda$. This implies $\lambda_n x \to \lambda x$ for all $x \in E$ and, since $\|\cdot\|$ is continuous,
$$\langle\lambda x,y\rangle = \lim_{n\to\infty}\langle\lambda_n x,y\rangle = \lim_{n\to\infty}\lambda_n\langle x,y\rangle = \lambda\langle x,y\rangle.$$
This completes the proof.
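In $\mathbb R^n$ with the euclidean norm, the polarization identity (13.3) indeed returns the usual dot product; a quick numerical check with arbitrary vectors:

```python
# Real polarization: <x,y> = (||x+y||^2 - ||x-y||^2)/4 recovers the dot
# product from the norm alone.
norm = lambda u: sum(c * c for c in u) ** 0.5
x = [1.0, 2.0, -1.0]
y = [0.5, -1.0, 3.0]

plus = [a + b for a, b in zip(x, y)]
minus = [a - b for a, b in zip(x, y)]
polar = 0.25 * (norm(plus) ** 2 - norm(minus) ** 2)
dot = sum(a * b for a, b in zip(x, y))  # 0.5 - 2.0 - 3.0 = -4.5
```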

(b) Convergence of Orthogonal Series

We reformulate Lemma 13.10:

Let $\{x_n\}$ be an OS in $H$. Then $\sum_{k=1}^\infty x_k$ converges if and only if $\sum_{k=1}^\infty \|x_k\|^2$ converges.

Note that the convergence of a series $\sum_{i=1}^\infty x_i$ of elements $x_i$ of a Hilbert space $H$ is defined to be the limit of the partial sums, $\lim_{n\to\infty}\sum_{i=1}^n x_i$. In particular, the Cauchy criterion applies since $H$ is complete:

The series $\sum y_i$ converges if and only if for every $\varepsilon > 0$ there exists $n_0 \in \mathbb N$ such that $m,n \ge n_0$ implies $\left\|\sum_{i=m}^n y_i\right\| < \varepsilon$.

Proof. By the above discussion, $\sum_{k=1}^\infty x_k$ converges if and only if $\left\|\sum_{k=m}^n x_k\right\|^2$ becomes small for sufficiently large $m,n \in \mathbb N$. By the Pythagorean theorem this term equals
$$\sum_{k=m}^n \|x_k\|^2;$$
hence the series $\sum x_k$ converges if and only if the series $\sum \|x_k\|^2$ converges.

13.2 Bounded Linear Operators in Hilbert Spaces

13.2.1 Bounded Linear Operators

Let $(E_1, \|\cdot\|_1)$ and $(E_2, \|\cdot\|_2)$ be normed linear spaces. Recall that a linear map $T: E_1 \to E_2$ is called continuous if $x_n \to x$ in $E_1$ implies $T(x_n) \to T(x)$ in $E_2$.

Definition 13.10 (a) A linear map $T: E_1 \to E_2$ is called bounded if there exists a positive real number $C > 0$ such that
$$\|T(x)\|_2 \le C\|x\|_1 \quad \text{for all } x \in E_1. \tag{13.6}$$

(b) Suppose that $T: E_1 \to E_2$ is a bounded linear map. Then the operator norm $\|T\|$ is the smallest number $C$ satisfying (13.6) for all $x \in E_1$, that is,
$$\|T\| = \inf\{C > 0 \mid \forall x \in E_1: \|T(x)\|_2 \le C\|x\|_1\}.$$
One can show that
(a) $\|T\| = \sup\left\{\|T(x)\|_2/\|x\|_1 \mid x \in E_1,\ x \ne 0\right\}$,
(b) $\|T\| = \sup\{\|T(x)\|_2 \mid \|x\|_1 \le 1\}$,
(c) $\|T\| = \sup\{\|T(x)\|_2 \mid \|x\|_1 = 1\}$.
Indeed, we may restrict ourselves to unit vectors since
$$\frac{\|T(\alpha x)\|_2}{\|\alpha x\|_1} = \frac{|\alpha|\,\|T(x)\|_2}{|\alpha|\,\|x\|_1} = \frac{\|T(x)\|_2}{\|x\|_1}.$$
This shows the equivalence of (a) and (c). Since $\|T(\alpha x)\|_2 = |\alpha|\,\|T(x)\|_2$, the suprema (b) and (c) are equal. That these suprema coincide with the infimum in the definition follows from the fact that the least upper bound is the infimum over all upper bounds. From (a) it follows that
$$\|T(x)\|_2 \le \|T\|\,\|x\|_1. \tag{13.7}$$
Also, if $E_1 \xrightarrow{\;S\;} E_2 \xrightarrow{\;T\;} E_3$ are bounded linear mappings, then $T\circ S$ is a bounded linear mapping with $\|T\circ S\| \le \|T\|\,\|S\|$. Indeed, for $x \ne 0$ one has by (13.7)
$$\|T(S(x))\|_3 \le \|T\|\,\|S(x)\|_2 \le \|T\|\,\|S\|\,\|x\|_1;$$
hence $\|(T\circ S)(x)\|_3/\|x\|_1 \le \|T\|\,\|S\|$.

Proposition 13.17 For a linear map $T: E_1 \to E_2$ of a normed space $E_1$ into a normed space $E_2$ the following are equivalent:
(a) $T$ is bounded.
(b) $T$ is continuous.

(c) T is continuous at one point of E1. Proof. (a) (b). This follows from the fact → T (x ) T (x ) = T (x x ) T x x , k 1 − 2 k k 1 − 2 k≤k kk 1 − 2k and T is even uniformly continuous on E1. (b) trivially implies (c). (c) (a). Suppose T is continuous at x . To each ε > 0 one can find δ > 0 such that → 0 x x < δ implies T (x) T (x ) <ε. Let y = x x . In other words y < δ implies k − 0k k − 0 k − 0 k k T (y + x ) T (x ) = T (y) < ε. k 0 − 0 k k k Suppose z E , z 1. Then δ/2z δ/2 < δ; hence T (δ/2z) <ε. By linearity of T , ∈ 1 k k ≤ k k ≤ k k T (z) < 2ε/δ. This shows T 2ε/δ. k k k k ≤ 346 13 Hilbert Space
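The equivalent formulas (a)–(c) for the operator norm can be checked numerically. The sketch below uses a hypothetical 2×2 matrix (my own example, not from the text): it approximates sup{‖Ax‖ : ‖x‖ = 1} by sampling the unit circle and compares the result with the exact operator norm, the largest singular value of A.

```python
import math

# Hypothetical example matrix; any real 2x2 matrix works here.
A = [[1.0, 2.0],
     [0.0, 1.0]]

def apply(A, x):
    return [A[0][0]*x[0] + A[0][1]*x[1],
            A[1][0]*x[0] + A[1][1]*x[1]]

def norm(x):
    return math.sqrt(x[0]*x[0] + x[1]*x[1])

# Formula (c): ||A|| = sup ||A x|| over unit vectors, sampled on the circle.
N = 20000
op_norm_sampled = max(norm(apply(A, (math.cos(2*math.pi*k/N),
                                     math.sin(2*math.pi*k/N)))) for k in range(N))

# Exact value: the largest singular value. For this A, A^T A = [[1,2],[2,5]]
# has largest eigenvalue 3 + 2*sqrt(2), so ||A|| = sqrt(3 + 2*sqrt(2)) = 1 + sqrt(2).
exact = 1 + math.sqrt(2)
print(op_norm_sampled, exact)
```

The sampled supremum never exceeds the exact norm and approaches it as the sampling is refined.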

Definition 13.11 Let E and F be normed linear spaces. Let L(E, F) denote the set of all bounded linear maps from E to F. In case E = F we simply write L(E) in place of L(E, F).

Proposition 13.18 Let E and F be normed linear spaces. Then L(E, F) is a normed linear space if we define the linear structure by

(S + T)(x) = S(x) + T(x),  (λT)(x) = λ T(x)

for S, T ∈ L(E, F), λ ∈ ℂ. The operator norm ‖T‖ makes L(E, F) a normed linear space.
Note that L(E, F) is complete if and only if F is complete.

n m à Example 13.8 (a) Recall that L (à , ) is a normed vector space with A 1 k k ≤ a 2 2 , where A = (a ) is the matrix representation of the linear operator A, see i,j | ij | ij Proposition 7.1 P 

(b) The space E′ = L (E, Ã) of continuous linear functionals on E. (c) H =L2((0, 1)), g C([0, 1]), ∈

Tg(f)(t)= g(t)f(t) defines a bounded linear operator on H. (see homework) (d) H =L2((0, 1)), k(s, t) L2([0, 1] [0, 1]). Then ∈ × 1 (Kf)(t)= k(s, t)f(s) ds, f H =L2([0, 1]) ∈ Z0 defines a continuous linear operator K L (H). We have ∈ 1 2 1 2 (Kf)(t) 2 = k(s, t)f(s) ds k(s, t) f(s) ds | | ≤ | | | | Z0 Z0  1 1 2 2 k(s, t) ds f(s) ds C-S-I≤ 0 | | 0 | | Z1 Z = k(s, t) 2 ds f 2 . | | k kH Z0 Hence,

1 1 K(f) 2 k(s, t) 2 ds dt f 2 k kH ≤ | | k kH Z0 Z0  K(f) H k L2([0,1] [0,1]) f H . k k ≤k k × k k

This shows Kf H and further, K k 2 2 . K is called an integral operator; K is ∈ k k≤k kL ([0,1] ) compact, i.e. it maps the unit ball into a set whose closure is compact.
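The Cauchy–Schwarz estimate in (d) survives discretization: replacing the integrals by Riemann sums on a grid turns K into a matrix, and the same inequality ‖Kf‖ ≤ ‖k‖_{L²} ‖f‖ holds exactly for the sums. A sketch with an assumed kernel k(s,t) = s + t and sample function f (both my choices, not from the text):

```python
import math

n = 200                                 # grid points on [0,1]
h = 1.0 / n
s = [(j + 0.5) * h for j in range(n)]   # midpoint grid

k = lambda x, y: x + y                  # hypothetical kernel in L^2([0,1]^2)
f = [math.sin(2 * math.pi * x) for x in s]

# (K f)(t_i) = sum_j k(s_j, t_i) f(s_j) h   (Riemann sum for the integral)
Kf = [sum(k(s[j], s[i]) * f[j] for j in range(n)) * h for i in range(n)]

norm_f  = math.sqrt(sum(v * v for v in f) * h)
norm_Kf = math.sqrt(sum(v * v for v in Kf) * h)
norm_k  = math.sqrt(sum(k(s[j], s[i]) ** 2 for i in range(n) for j in range(n)) * h * h)

print(norm_Kf, norm_k * norm_f)   # the first never exceeds the second
```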

(e) H = L²(ℝ), a ∈ ℝ. Then

(V_a f)(t) = f(t − a),  t ∈ ℝ,

defines a bounded linear operator called the shift operator. Indeed,

‖V_a f‖²₂ = ∫_ℝ |f(t − a)|² dt = ∫_ℝ |f(t)|² dt = ‖f‖²₂;

since all quotients ‖V_a f‖/‖f‖ = 1, we have ‖V_a‖ = 1.

(f) H = ℓ². We define the right-shift S by

S(x₁, x₂, …) = (0, x₁, x₂, …).

Obviously, ‖S(x)‖ = ‖x‖ = (∑_{n=1}^∞ |xₙ|²)^{1/2}. Hence ‖S‖ = 1.

(g) Let E₁ = C¹([0,1]) and E₂ = C([0,1]). Define the differentiation operator (Tf)(t) = f′(t). Let ‖f‖₁ = ‖f‖₂ = sup_{t∈[0,1]} |f(t)|. Then T is linear but not bounded. Indeed, let fₙ(t) = 1 − tⁿ. Then ‖fₙ‖₁ = 1 and (Tfₙ)(t) = −n t^{n−1}, such that ‖Tfₙ‖₂ = n. Thus ‖Tfₙ‖₂/‖fₙ‖₁ = n → ∞ as n → ∞; T is unbounded.
However, if we put ‖f‖₁ = sup_{t∈[0,1]} |f(t)| + sup_{t∈[0,1]} |f′(t)| and ‖f‖₂ as before, then T is bounded since

‖Tf‖₂ = sup_{t∈[0,1]} |f′(t)| ≤ ‖f‖₁  ⟹  ‖T‖ ≤ 1.

13.2.2 The Adjoint Operator

In this subsection H is a Hilbert space and L(H) the space of bounded linear operators on H. Let T ∈ L(H) be a bounded linear operator and y ∈ H. Then F(x) = ⟨T(x), y⟩ defines a continuous linear functional on H. Indeed,

|F(x)| = |⟨T(x), y⟩| ≤ ‖T(x)‖‖y‖ ≤ ‖T‖‖y‖‖x‖ = C‖x‖,  C = ‖T‖‖y‖,

by the Cauchy–Schwarz inequality. Hence F is bounded and therefore continuous; in particular,

‖F‖ ≤ ‖T‖‖y‖.

By Riesz's representation theorem, there exists a unique vector z ∈ H such that

⟨T(x), y⟩ = F(x) = ⟨x, z⟩.

Note that by the above inequality

‖z‖ = ‖F‖ ≤ ‖T‖‖y‖.  (13.8)

Suppose y₁ is another element of H which corresponds to z₁ ∈ H with

⟨T(x), y₁⟩ = ⟨x, z₁⟩.

Finally, let u ∈ H be the element which corresponds to y + y₁,

⟨T(x), y + y₁⟩ = ⟨x, u⟩.

Since the element u which is given by Riesz's representation theorem is unique, we have u = z + z₁. Similarly,

⟨T(x), λy⟩ = λ̄ F(x) = ⟨x, λz⟩

shows that λz corresponds to λy.

Definition 13.12 The above correspondence y ↦ z is linear. Define the linear operator T* by z = T*(y). By definition,

⟨T(x), y⟩ = ⟨x, T*(y)⟩,  x, y ∈ H.  (13.9)

T* is called the adjoint operator to T.

Proposition 13.19 Let T, T₁, T₂ ∈ L(H). Then T* is a bounded linear operator with ‖T*‖ = ‖T‖. We have
(a) (T₁ + T₂)* = T₁* + T₂*,
(b) (λT)* = λ̄ T*,
(c) (T₁T₂)* = T₂* T₁*,
(d) if T is invertible in L(H), so is T*, and we have (T*)⁻¹ = (T⁻¹)*,
(e) (T*)* = T.

Proof. Inequality (13.8) shows that

‖T*(y)‖ ≤ ‖T‖‖y‖,  y ∈ H.

By definition, this implies ‖T*‖ ≤ ‖T‖ and T* is bounded. Since

⟨T*(x), y⟩ = conj⟨y, T*(x)⟩ = conj⟨T(y), x⟩ = ⟨x, T(y)⟩,

we get (T*)* = T. We conclude ‖T‖ = ‖(T*)*‖ ≤ ‖T*‖ ≤ ‖T‖, such that ‖T*‖ = ‖T‖.
(a) For x, y ∈ H we have

⟨(T₁ + T₂)(x), y⟩ = ⟨T₁(x) + T₂(x), y⟩ = ⟨T₁(x), y⟩ + ⟨T₂(x), y⟩ = ⟨x, T₁*(y)⟩ + ⟨x, T₂*(y)⟩ = ⟨x, (T₁* + T₂*)(y)⟩,

which proves (a). (c) and (d) are left to the reader.

A mapping *: A → A such that the above properties (a), (b), and (c) are satisfied is called an involution. An algebra with involution is called a *-algebra. We have seen that L(H) is a (non-commutative) *-algebra. An example of a commutative *-algebra is C(K) with the involution f*(x) = conj f(x).

Example 13.9 (Example 13.8 continued)

(a) H = ℂⁿ, A = (a_ij) ∈ M(n×n, ℂ). Then A* = (b_ij) has the matrix elements b_ij = conj a_ji.
(b) H = L²([0,1]), T_g* = T_{conj g}.
(c) H = L²(ℝ), V_a(f)(t) = f(t − a) (shift operator), V_a* = V_{−a}.
(d) H = ℓ². The right-shift S is defined by S((xₙ)) = (0, x₁, x₂, …). We compute the adjoint S*:

⟨S(x), y⟩ = ∑_{n=2}^∞ x_{n−1} conj yₙ = ∑_{n=1}^∞ xₙ conj y_{n+1} = ⟨(x₁, x₂, …), (y₂, y₃, …)⟩.

Hence S*((yₙ)) = (y₂, y₃, …) is the left-shift.
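The defining relation ⟨S(x), y⟩ = ⟨x, S*(y)⟩ can be verified directly on truncated real sequences; the sample vectors below are my own choice:

```python
# Truncated l^2 vectors; right-shift S and its adjoint, the left-shift S*.
def inner(x, y):                 # real inner product, sum x_n * y_n
    return sum(a * b for a, b in zip(x, y))

def S(x):                        # right-shift: (x1, x2, ...) -> (0, x1, x2, ...)
    return [0.0] + x[:-1]

def S_star(y):                   # left-shift: (y1, y2, ...) -> (y2, y3, ...)
    return y[1:] + [0.0]

x = [1.0, -2.0, 3.0, 0.5, 4.0]
y = [2.0, 1.0, -1.0, 3.0, -2.0]
print(inner(S(x), y), inner(x, S_star(y)))   # equal: <S x, y> = <x, S* y>
```

The equality is exact here because the truncation drops the same terms on both sides of the pairing.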

13.2.3 Classes of Bounded Linear Operators

Let H be a complex Hilbert space.

(a) Self-Adjoint and Normal Operators

Definition 13.13 An operator A ∈ L(H) is called
(a) self-adjoint, if A* = A,
(b) normal, if A*A = AA*.
A self-adjoint operator A is called positive if ⟨Ax, x⟩ ≥ 0 for all x ∈ H. We write A ≥ 0. If A and B are self-adjoint, we write A ≥ B if A − B ≥ 0.
A crucial role in proving the simplest properties is played by the polarization identity, which generalizes the polarization identity from Subsection 13.1.2. However, it exists only in complex Hilbert spaces:

4⟨A(x), y⟩ = ⟨A(x+y), x+y⟩ − ⟨A(x−y), x−y⟩ + i⟨A(x+iy), x+iy⟩ − i⟨A(x−iy), x−iy⟩.

We use the identity as follows: ⟨A(x), x⟩ = 0 for all x ∈ H implies A = 0. Indeed, by the polarization identity, ⟨A(x), y⟩ = 0 for all x, y ∈ H. In particular, y = A(x) yields A(x) = 0 for all x; thus A = 0.

Remarks 13.2 (a) A is normal if and only if ‖A(x)‖ = ‖A*(x)‖ for all x ∈ H. Indeed, if A is normal, then for all x ∈ H we have ⟨A*A(x), x⟩ = ⟨AA*(x), x⟩, which implies ‖A(x)‖² = ⟨A(x), A(x)⟩ = ⟨A*(x), A*(x)⟩ = ‖A*(x)‖². On the other hand, the polarization identity and ⟨A*A(x), x⟩ = ⟨AA*(x), x⟩ imply ⟨(A*A − AA*)(x), x⟩ = 0 for all x; hence A*A − AA* = 0, which proves the claim.
(b) Sums and real scalar multiples of self-adjoint operators are self-adjoint.
(c) The product AB of self-adjoint operators is self-adjoint if and only if A and B commute with each other, AB = BA.
(d) A is self-adjoint if and only if ⟨Ax, x⟩ is real for all x ∈ H.
Proof. Let A* = A. Then ⟨Ax, x⟩ = ⟨x, Ax⟩ = conj⟨Ax, x⟩ is real. For the opposite direction, ⟨A(x), x⟩ = ⟨x, A(x)⟩ and the polarization identity yield ⟨A(x), y⟩ = ⟨x, A(y)⟩ for all x, y; hence A* = A.

(b) Unitary and Isometric Operators

Definition 13.14 Let T ∈ L(H). Then T is called
(a) unitary, if T*T = I = TT*,
(b) isometric, if ‖T(x)‖ = ‖x‖ for all x ∈ H.

Proposition 13.20 (a) T is isometric if and only if T*T = I and if and only if ⟨T(x), T(y)⟩ = ⟨x, y⟩ for all x, y ∈ H.
(b) T is unitary if and only if T is isometric and surjective.
(c) If S, T are unitary, so are ST and T⁻¹. The unitary operators of L(H) form a group.

Proof. (a) T isometric yields ⟨T(x), T(x)⟩ = ⟨x, x⟩ and further ⟨(T*T − I)(x), x⟩ = 0 for all x. The polarization identity implies T*T = I. This implies ⟨(T*T − I)(x), y⟩ = 0 for all x, y ∈ H; hence ⟨T(x), T(y)⟩ = ⟨x, y⟩. Inserting y = x shows that T is isometric.
(b) Suppose T is unitary. T*T = I shows T is isometric. Since TT* = I, T is surjective. Suppose now T is isometric and surjective. Since T is isometric, T(x) = 0 implies x = 0; hence T is bijective with an inverse operator T⁻¹. Insert y = T⁻¹(z) into ⟨T(x), T(y)⟩ = ⟨x, y⟩. This gives

⟨T(x), z⟩ = ⟨x, T⁻¹(z)⟩,  x, z ∈ H.

Hence T⁻¹ = T* and therefore T*T = TT* = I.
(c) is easy (see homework 45.4).

Note that an isometric operator is injective with norm 1 (since ‖T(x)‖/‖x‖ = 1 for all x). In case H = ℂⁿ, the unitary operators on ℂⁿ form the unitary group U(n). In case H = ℝⁿ, the unitary operators on H form the orthogonal group O(n).

Example 13.10 (a) H = L²(ℝ). The shift operator V_a is unitary since V_aV_b = V_{a+b} and V_a* = V_{−a}, hence V_a*V_a = V_aV_a* = V₀ = I. The multiplication operator T_g f = gf is unitary if and only if |g| = 1. T_g is self-adjoint (resp. positive) if and only if g is real (resp. positive).
(b) H = ℓ²; the right-shift S((xₙ)) = (0, x₁, x₂, …) is isometric but not unitary since S is not surjective. S* is not isometric since S*(1, 0, …) = 0; hence S* is not injective.
(c) Fourier transform. For f ∈ L¹(ℝ) define

(Ff)(t) = (1/√(2π)) ∫_ℝ e^{−itx} f(x) dx.

Let S(ℝ) = {f ∈ C^∞(ℝ) | sup_{t∈ℝ} |tⁿ f^{(k)}(t)| < ∞ for all n, k ∈ ℤ₊}. S(ℝ) is called the Schwartz space after Laurent Schwartz. We have S(ℝ) ⊆ L¹(ℝ) ∩ L²(ℝ); for example, f(x) = e^{−x²} ∈ S(ℝ). We will show later that F: S(ℝ) → S(ℝ) is bijective and norm preserving,

‖F(f)‖_{L²(ℝ)} = ‖f‖_{L²(ℝ)},  f ∈ S(ℝ).

F has a unique extension to a unitary operator on L²(ℝ). The inverse Fourier transform is

(F⁻¹f)(t) = (1/√(2π)) ∫_ℝ e^{itx} f(x) dx,  f ∈ S(ℝ).

13.2.4 Orthogonal Projections

(a) Riesz's First Theorem—revisited

Let H₁ ⊆ H be a closed linear subspace. By Theorem 13.7 any x ∈ H has a unique decomposition x = x₁ + x₂ with x₁ ∈ H₁ and x₂ ∈ H₁⊥. The map P_{H₁}(x) = x₁ is a linear operator from H to H (see homework 44.1). P_{H₁} is called the orthogonal projection from H onto the closed subspace H₁. Obviously, H₁ is the image of P_{H₁}; in particular, P_{H₁} is surjective if and only if H₁ = H. In this case, P_H = I is the identity. Since

‖P_{H₁}(x)‖² = ‖x₁‖² ≤ ‖x₁‖² + ‖x₂‖² = ‖x‖²,

we have ‖P_{H₁}‖ ≤ 1. If H₁ ≠ {0}, there exists a non-zero x₁ ∈ H₁ such that ‖P_{H₁}(x₁)‖ = ‖x₁‖. This shows ‖P_{H₁}‖ = 1.
Here is the algebraic characterization of orthogonal projections.

Proposition 13.21 A linear operator P ∈ L(H) is an orthogonal projection if and only if P² = P and P* = P. In this case H₁ = {x ∈ H | P(x) = x}.

Proof. "⟹": Suppose that P = P_{H₁} is the projection onto H₁. Since P is the identity on H₁, P²(x) = P(x₁) = x₁ = P(x) for all x ∈ H; hence P² = P.
Let x = x₁ + x₂ and y = y₁ + y₂ be the unique decompositions of x and y in elements of H₁ and H₁⊥, respectively. Then, since ⟨x₁, y₂⟩ = ⟨x₂, y₁⟩ = 0,

⟨P(x), y⟩ = ⟨x₁, y₁ + y₂⟩ = ⟨x₁, y₁⟩ = ⟨x₁ + x₂, y₁⟩ = ⟨x, P(y)⟩,

that is, P* = P.
"⟸": Suppose P² = P* = P and put H₁ = {x | P(x) = x}. First note that for P ≠ 0, H₁ ≠ {0} is non-trivial. Indeed, since P(P(x)) = P(x), the image of P is part of the eigenspace of P to the eigenvalue 1, P(H) ⊂ H₁. Since P(z) = z for z ∈ H₁, H₁ ⊂ P(H), and thus H₁ = P(H).
Since P is continuous and {0} is closed, H₁ = (P − I)⁻¹({0}) is a closed linear subspace of H. By Riesz's first theorem, H = H₁ ⊕ H₁⊥. We have to show that P(x) = x₁ for all x.
Since P² = P, P(P(x)) = P(x) for all x; hence P(x) ∈ H₁. We show x − P(x) ∈ H₁⊥, which completes the proof. For, let z ∈ H₁; then

⟨x − P(x), z⟩ = ⟨x, z⟩ − ⟨P(x), z⟩ = ⟨x, z⟩ − ⟨x, P(z)⟩ = ⟨x, z⟩ − ⟨x, z⟩ = 0.

Hence x = P(x) + (I − P)(x) is the unique Riesz decomposition of x with respect to H₁ and H₁⊥.
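For a concrete instance of this characterization, take the projection onto the span of an orthonormal system in ℝ³ (the vectors and test points below are my own example) and check P² = P and ⟨Px, y⟩ = ⟨x, Py⟩ numerically:

```python
import math

# Orthonormal system in R^3 (assumed example): e1, e2
e1 = [1/math.sqrt(2), 1/math.sqrt(2), 0.0]
e2 = [0.0, 0.0, 1.0]

def inner(x, y):
    return sum(a*b for a, b in zip(x, y))

def P(x):   # P(x) = <x,e1> e1 + <x,e2> e2, projection onto lin{e1, e2}
    c1, c2 = inner(x, e1), inner(x, e2)
    return [c1*a + c2*b for a, b in zip(e1, e2)]

x = [3.0, -1.0, 2.0]
Px, PPx = P(x), P(P(x))
print(Px, PPx)                           # P^2 = P
y = [0.5, 2.0, -1.0]
print(inner(P(x), y), inner(x, P(y)))    # <Px, y> = <x, Py>, i.e. P* = P
```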

Example 13.11 (a) Let {x₁, …, xₙ} be an NOS in H. Then

P(x) = ∑_{k=1}^n ⟨x, x_k⟩ x_k,  x ∈ H,

defines the orthogonal projection P: H → H onto lin{x₁, …, xₙ}. Indeed, since P(x_m) = ∑_{k=1}^n ⟨x_m, x_k⟩ x_k = x_m, we have P² = P, and since

⟨P(x), y⟩ = ∑_{k=1}^n ⟨x, x_k⟩⟨x_k, y⟩ = ⟨x, ∑_{k=1}^n ⟨y, x_k⟩ x_k⟩ = ⟨x, P(y)⟩,

P* = P and P is a projection.
(b) H = L²([0,1] ∪ [2,3]), g ∈ C([0,1] ∪ [2,3]). For f ∈ H define T_g f = gf. Then T_g = (T_g)* if and only if g(t) is real for all t. T_g is an orthogonal projection if g(t)² = g(t), i.e. g(t) = 0 or g(t) = 1. Since g is continuous, there are only four solutions: g₁ = 0, g₂ = 1, g₃ = χ_{[0,1]}, and g₄ = χ_{[2,3]}.
In case of g₃, the subspace H₁ can be identified with L²([0,1]) since f ∈ H₁ iff T_g f = f iff gf = f iff f(t) = 0 for all t ∈ [2,3].

(b) Properties of Orthogonal Projections

Throughout this paragraph let P1 and P2 be orthogonal projections on the closed subspaces H1 and H2, respectively.

Lemma 13.22 The following are equivalent.

(a) P₁ + P₂ is an orthogonal projection.
(b) P₁P₂ = 0.
(c) H₁ ⊥ H₂.

Proof. (a) → (b). Let P₁ + P₂ be a projection. Then

(P₁ + P₂)² = P₁² + P₁P₂ + P₂P₁ + P₂² = P₁ + P₂ + P₁P₂ + P₂P₁ = P₁ + P₂;

hence P₁P₂ + P₂P₁ = 0. Multiplying this from the left by P₁ and from the right by P₁ yields

P₁P₂ + P₁P₂P₁ = 0 = P₁P₂P₁ + P₂P₁.

This implies P₁P₂ = P₂P₁ and finally P₁P₂ = P₂P₁ = 0.
(b) → (c). Let x₁ ∈ H₁ and x₂ ∈ H₂. Then

0 = ⟨P₁P₂(x₂), x₁⟩ = ⟨P₂(x₂), P₁(x₁)⟩ = ⟨x₂, x₁⟩.

This shows H₁ ⊥ H₂.
(c) → (b). Let x, z ∈ H be arbitrary. Then

⟨P₁P₂(x), z⟩ = ⟨P₂(x), P₁(z)⟩ = 0,

since P₂(x) ∈ H₂ and P₁(z) ∈ H₁; hence P₁P₂(x) = 0 and therefore P₁P₂ = 0. The same argument works for P₂P₁ = 0.
(b) → (a). Since P₁P₂ = 0 implies P₂P₁ = 0 (via H₁ ⊥ H₂),

(P₁ + P₂)* = P₁* + P₂* = P₁ + P₂,
(P₁ + P₂)² = P₁² + P₁P₂ + P₂P₁ + P₂² = P₁ + 0 + 0 + P₂.

Lemma 13.23 The following are equivalent

(a) P1P2 is an orthogonal projection. (b) P1P2 = P2P1.

In this case, P₁P₂ is the orthogonal projection onto H₁ ∩ H₂.

Proof. (b) → (a). (P₁P₂)* = P₂*P₁* = P₂P₁ = P₁P₂, by assumption. Moreover, (P₁P₂)² = P₁P₂P₁P₂ = P₁P₁P₂P₂ = P₁P₂, which completes this direction.
(a) → (b). P₁P₂ = (P₁P₂)* = P₂*P₁* = P₂P₁.
Clearly, P₁P₂(H) ⊆ H₁ and P₂P₁(H) ⊆ H₂; hence P₁P₂(H) ⊆ H₁ ∩ H₂. On the other hand, x ∈ H₁ ∩ H₂ implies P₁P₂x = x. This shows P₁P₂(H) = H₁ ∩ H₂.

The proof of the following lemma is quite similar to that of the previous two lemmas, so we omit it (see homework 40.5).

Lemma 13.24 The following are equivalent.

(a) H₁ ⊆ H₂,
(b) P₁P₂ = P₁,
(c) P₂P₁ = P₁,
(d) P₁ ≤ P₂,
(e) P₂ − P₁ is an orthogonal projection,
(f) ‖P₁(x)‖ ≤ ‖P₂(x)‖, x ∈ H.

Proof. We show (d) ⟹ (c). From P₁ ≤ P₂ we conclude that I − P₂ ≤ I − P₁. Note that both I − P₁ and I − P₂ are again orthogonal projections, onto H₁⊥ and H₂⊥, respectively. Thus for all x ∈ H:

‖(I − P₂)P₁(x)‖² = ⟨(I − P₂)P₁(x), (I − P₂)P₁(x)⟩ = ⟨(I − P₂)*(I − P₂)P₁(x), P₁(x)⟩ = ⟨(I − P₂)P₁(x), P₁(x)⟩
 ≤ ⟨(I − P₁)P₁(x), P₁(x)⟩ = ⟨P₁(x) − P₁(x), P₁(x)⟩ = ⟨0, P₁(x)⟩ = 0.

Hence ‖(I − P₂)P₁(x)‖² = 0, which implies (I − P₂)P₁ = 0 and therefore P₁ = P₂P₁.

13.2.5 Spectrum and Resolvent

Let T ∈ L(H) be a bounded linear operator.

(a) Definitions

Definition 13.15 (a) The resolvent set of T, denoted by ρ(T), is the set of all λ ∈ ℂ such that there exists a bounded linear operator R_λ(T) ∈ L(H) with

R_λ(T)(T − λI) = (T − λI)R_λ(T) = I,

i.e. T − λI has a bounded (continuous) inverse R_λ(T). We call R_λ(T) the resolvent of T at λ.

(b) The set ℂ \ ρ(T) is called the spectrum of T and is denoted by σ(T).
(c) λ ∈ ℂ is called an eigenvalue of T if there exists a nonzero vector x, called eigenvector, with (T − λI)x = 0. The set of all eigenvalues is the point spectrum σ_p(T).

Remark 13.3 (a) Note that the point spectrum is a subset of the spectrum, σ_p(T) ⊆ σ(T). Suppose to the contrary that the eigenvalue λ with eigenvector y belongs to the resolvent set. Then there exists R_λ(T) ∈ L(H) with

y = R_λ(T)(T − λI)(y) = R_λ(T)(0) = 0,

which contradicts the definition of an eigenvector; hence eigenvalues belong to the spectrum.
(b) λ ∈ σ_p(T) is equivalent to T − λI not being injective. It may happen that T − λI is injective but not surjective, which also implies λ ∈ σ(T) (see Example 13.12(b) below).

Example 13.12 (a) H = ℂⁿ, A ∈ M(n×n, ℂ). Since in finite dimensional spaces T ∈ L(H) is injective if and only if T is surjective, σ(A) = σ_p(A).
(b) H = L²([0,1]), (Tf)(x) = xf(x). We have

σ_p(T) = ∅.

Indeed, suppose λ is an eigenvalue and f ∈ L²([0,1]) an eigenfunction of T, that is, (T − λI)(f) = 0; hence (x − λ)f(x) ≡ 0 a.e. on [0,1]. Since x − λ is nonzero a.e., f = 0 a.e. on [0,1]. That is, f = 0 in H, which contradicts the definition of an eigenvector. We have

ℂ \ [0,1] ⊆ ρ(T).

Suppose λ ∉ [0,1]. Since x − λ ≠ 0 for all x ∈ [0,1], g(x) = 1/(x − λ) is a continuous (hence bounded) function on [0,1]. Hence

(R_λ f)(x) = f(x)/(x − λ)

defines a bounded linear operator which is inverse to T − λI since

(T − λI)(f(x)/(x − λ)) = (x − λ) · f(x)/(x − λ) = f(x).

We have σ(T) = [0,1]. Suppose to the contrary that there exists λ ∈ ρ(T) ∩ [0,1]. Then there exists R_λ ∈ L(H) with

R_λ(T − λI) = I.  (13.10)

By homework 39.5(a), the norm of the multiplication operator T_g is less than or equal to ‖g‖_∞ (the supremum norm of g). Choose f_ε = χ_{(λ−ε, λ+ε)}. Since χ_M² = χ_M,

‖(T − λI)f_ε‖ = ‖(x − λ)χ_{U_ε(λ)}(x) f_ε(x)‖ ≤ sup_{x∈[0,1]} |(x − λ)χ_{U_ε(λ)}(x)| · ‖f_ε‖.

However,

sup_{x∈[0,1]} |(x − λ)χ_{U_ε(λ)}(x)| = sup_{x∈U_ε(λ)} |x − λ| = ε.

This shows

‖(T − λI)f_ε‖ ≤ ε‖f_ε‖.

Inserting f_ε into (13.10) we obtain

‖f_ε‖ = ‖R_λ(T − λI)f_ε‖ ≤ ‖R_λ‖ ‖(T − λI)f_ε‖ ≤ ‖R_λ‖ ε ‖f_ε‖,

which implies ‖R_λ‖ ≥ 1/ε. This contradicts the boundedness of R_λ since ε > 0 was arbitrary.
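The blow-up of ‖R_λ‖ near the spectrum is visible in a finite-dimensional discretization: sampling (Tf)(x) = xf(x) on a grid gives a diagonal matrix, and for λ inside [0,1] the norm of the inverse of T − λI, which is 1/min|x_i − λ|, grows without bound as the grid is refined. A sketch (the value λ = 0.37 is an arbitrary choice):

```python
# Discretized multiplication operator (Tf)(x) = x f(x): a diagonal matrix
# with entries x_i on a midpoint grid of [0,1]. The inverse of T - lam*I is
# diagonal with entries 1/(x_i - lam), so its norm is 1/min|x_i - lam|.
lam = 0.37
for n in (10, 100, 1000, 10000):
    xs = [(i + 0.5) / n for i in range(n)]
    resolvent_norm = 1.0 / min(abs(x - lam) for x in xs)
    print(n, resolvent_norm)   # grows as the grid is refined
```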

(b) Properties of the Spectrum

Lemma 13.25 Let T ∈ L(H). Then

σ(T*) = σ(T)*  (complex conjugation),  ρ(T*) = ρ(T)*.

Proof. Suppose that λ ∈ ρ(T). Then there exists R_λ(T) ∈ L(H) such that

R_λ(T)(T − λI) = (T − λI)R_λ(T) = I,
(R_λ(T)(T − λI))* = ((T − λI)R_λ(T))* = I,
(T* − λ̄I)R_λ(T)* = R_λ(T)*(T* − λ̄I) = I.

This shows that R_{λ̄}(T*) = R_λ(T)* is again a bounded linear operator on H. Hence ρ(T)* ⊆ ρ(T*). Since * is an involution (T** = T), the opposite inclusion follows.
Since σ(T) is the complement of the resolvent set, the claim for the spectrum follows as well.

For λ, µ ∈ ρ(T) and bounded operators T, S we have

R_λ(T) − R_µ(T) = (λ − µ)R_λ(T)R_µ(T) = (λ − µ)R_µ(T)R_λ(T),
R_λ(T) − R_λ(S) = R_λ(T)(S − T)R_λ(S).

Proposition 13.26 (a) ρ(T) is open and σ(T) is closed.
(b) If λ₀ ∈ ρ(T) and |λ − λ₀| < ‖R_{λ₀}(T)‖⁻¹, then λ ∈ ρ(T) and

R_λ(T) = ∑_{n=0}^∞ (λ − λ₀)ⁿ R_{λ₀}(T)^{n+1}.

(c) If |λ| > ‖T‖, then λ ∈ ρ(T) and

R_λ(T) = −∑_{n=0}^∞ λ^{−n−1} Tⁿ.

Proof. (a) follows from (b).
(b) For brevity, we write R_{λ₀} in place of R_{λ₀}(T). With q = |λ − λ₀| ‖R_{λ₀}‖, q ∈ (0,1), we have

∑_{n=0}^∞ |λ − λ₀|ⁿ ‖R_{λ₀}‖^{n+1} = ∑_{n=0}^∞ qⁿ ‖R_{λ₀}‖ = ‖R_{λ₀}‖/(1 − q),

which converges. By homework 38.4, ∑ xₙ converges if ∑ ‖xₙ‖ converges. Hence

B = ∑_{n=0}^∞ (λ − λ₀)ⁿ R_{λ₀}^{n+1}

converges in L(H) with respect to the operator norm. Moreover, using (T − λ₀I)R_{λ₀}^{n+1} = R_{λ₀}ⁿ,

(T − λI)B = (T − λ₀I)B − (λ − λ₀)B = ∑_{n=0}^∞ (λ − λ₀)ⁿ R_{λ₀}ⁿ − ∑_{n=0}^∞ (λ − λ₀)^{n+1} R_{λ₀}^{n+1} = (λ − λ₀)⁰ R_{λ₀}⁰ = I.

Similarly, one shows B(T − λI) = I. Thus R_λ(T) = B.
(c) Since |λ| > ‖T‖, the series converges with respect to the operator norm, say

C = −∑_{n=0}^∞ λ^{−n−1} Tⁿ.

We have

(T − λI)C = −∑_{n=0}^∞ λ^{−n−1} T^{n+1} + ∑_{n=0}^∞ λ^{−n} Tⁿ = λ⁰T⁰ = I.

Similarly, C(T − λI) = I; hence R_λ(T) = C.
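Part (c) can be tested in finite dimensions: for a small matrix T with ‖T‖ < |λ|, the truncated series −∑_{n=0}^N λ^{−n−1}Tⁿ multiplied by T − λI is close to the identity. A pure-Python sketch with an assumed 2×2 example:

```python
# 2x2 matrices as nested lists; minimal matrix helpers.
def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def madd(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def mscale(c, A):
    return [[c*A[i][j] for j in range(2)] for i in range(2)]

I = [[1.0, 0.0], [0.0, 1.0]]
T = [[0.5, 0.2], [0.1, 0.3]]     # assumed example operator with small norm
lam = 2.0                        # |lambda| > ||T||, so the series applies

# R_lambda(T) = - sum_{n>=0} lam^(-n-1) T^n, truncated after 60 terms
R, Tn = [[0.0]*2 for _ in range(2)], I
for n in range(60):
    R = madd(R, mscale(-lam**(-n-1), Tn))
    Tn = matmul(Tn, T)

check = matmul(madd(T, mscale(-lam, I)), R)   # (T - lam I) R should be close to I
print(check)
```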

Remarks 13.4 (a) By (b), R_λ(T) is a holomorphic (i.e. complex differentiable) function in the variable λ with values in L(H). One can use this to show that the spectrum is non-empty, σ(T) ≠ ∅.
(b) If ‖T‖ < 1, T − I is invertible with inverse −∑_{n=0}^∞ Tⁿ.
(c) Proposition 13.26(c) means: if λ ∈ σ(T) then |λ| ≤ ‖T‖. However, there is in general a smaller disc around 0 which contains the spectrum. By definition, the spectral radius r(T) of T is the smallest non-negative number such that the spectrum is completely contained in the disc around 0 with radius r(T):

r(T) = sup{|λ| : λ ∈ σ(T)}.

(d) λ ∈ σ(T) implies λⁿ ∈ σ(Tⁿ) for all non-negative integers n. Indeed, suppose λⁿ ∈ ρ(Tⁿ), that is, B(Tⁿ − λⁿI) = (Tⁿ − λⁿI)B = I for some bounded B. With C = ∑_{k=0}^{n−1} λ^{n−1−k} T^k we have Tⁿ − λⁿI = C(T − λI) = (T − λI)C; hence

BC(T − λI) = (T − λI)CB = I,

thus λ ∈ ρ(T).
We shall refine the above statement and give a better upper bound for {|λ| : λ ∈ σ(T)} than ‖T‖.

Proposition 13.27 Let T ∈ L(H) be a bounded linear operator. Then the spectral radius of T is

r(T) = lim_{n→∞} ‖Tⁿ‖^{1/n}.  (13.11)

The proof is in the appendix.
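Formula (13.11) can be watched converging for a non-normal matrix whose norm is much larger than its spectral radius. The matrix below is an assumed example; the Frobenius norm is used because it is easy to compute and submultiplicative, and the limit in (13.11) is the same for any such norm.

```python
# Gelfand's formula r(T) = lim ||T^n||^(1/n) on a non-normal 2x2 matrix.
def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def frob(A):   # Frobenius norm (submultiplicative)
    return sum(v*v for row in A for v in row) ** 0.5

T = [[0.5, 10.0], [0.0, 0.5]]   # both eigenvalues 0.5, so r(T) = 0.5, but ||T|| ~ 10
P = [[1.0, 0.0], [0.0, 1.0]]
for n in range(1, 201):
    P = matmul(P, T)            # P = T^n
    if n in (1, 10, 50, 200):
        print(n, frob(P) ** (1.0 / n))   # tends (slowly) to r(T) = 0.5
```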

13.2.6 The Spectrum of Self-Adjoint Operators

Proposition 13.28 Let T = T* be a self-adjoint operator in L(H). Then λ ∈ ρ(T) if and only if there exists C > 0 such that

‖(T − λI)x‖ ≥ C‖x‖,  x ∈ H.

Proof. Suppose that λ ∈ ρ(T). Then there exists a (non-zero) bounded operator R_λ(T) such that

‖x‖ = ‖R_λ(T)(T − λI)x‖ ≤ ‖R_λ(T)‖ ‖(T − λI)x‖.

Hence,

‖(T − λI)x‖ ≥ ‖x‖/‖R_λ(T)‖,  x ∈ H.

We can choose C = 1/‖R_λ(T)‖ and the condition of the proposition is satisfied.
Suppose now the condition is satisfied. We prove the other direction in 3 steps, i.e. we show that T − λI has a bounded inverse operator which is defined on the whole space H.
Step 1. T − λI is injective. Suppose to the contrary that (T − λI)x₁ = (T − λI)x₂. Then

0 = ‖(T − λI)(x₁ − x₂)‖ ≥ C‖x₁ − x₂‖,

and ‖x₁ − x₂‖ = 0 follows; that is, x₁ = x₂. Hence T − λI is injective.
Step 2. H₁ = (T − λI)H, the range of T − λI, is closed. Suppose that yₙ = (T − λI)xₙ, xₙ ∈ H, converges to some y ∈ H. We want to show that y ∈ H₁. Clearly (yₙ) is a Cauchy sequence, i.e. ‖y_m − yₙ‖ → 0 as m, n → ∞. By assumption,

‖y_m − yₙ‖ = ‖(T − λI)(xₙ − x_m)‖ ≥ C‖xₙ − x_m‖.

Thus (xₙ) is a Cauchy sequence in H. Since H is complete, xₙ → x for some x ∈ H. Since T − λI is continuous,

yₙ = (T − λI)xₙ → (T − λI)x as n → ∞.

Hence y = (T − λI)x and H₁ is a closed subspace.
Step 3. H₁ = H. By Riesz's first theorem, H = H₁ ⊕ H₁⊥. We have to show that H₁⊥ = {0}. Let u ∈ H₁⊥; that is, since T = T*,

0 = ⟨(T − λI)x, u⟩ = ⟨x, (T − λ̄I)u⟩, for all x ∈ H.

This shows (T − λ̄I)u = 0, hence T(u) = λ̄u. This implies

⟨T(u), u⟩ = λ̄⟨u, u⟩.

However, T = T* implies that the left side is real, by Remark 13.2(d). Hence λ = λ̄ is real. We conclude (T − λI)u = 0. By injectivity of T − λI, u = 0. That is, H₁ = H.
We have shown that there exists a linear operator S = (T − λI)⁻¹ which is inverse to T − λI and defined on the whole space H. Since

‖y‖ = ‖(T − λI)S(y)‖ ≥ C‖S(y)‖,

S is bounded with ‖S‖ ≤ 1/C. Hence S = R_λ(T).

Note that for any bounded real function f(x, y) we have

sup_{x,y} f(x, y) = sup_x (sup_y f(x, y)) = sup_y (sup_x f(x, y)).

In particular, ‖x‖ = sup_{‖y‖≤1} |⟨x, y⟩|, since y = x/‖x‖ yields the supremum and the Cauchy–Schwarz inequality gives the upper bound. Further, ‖T(x)‖ = sup_{‖y‖≤1} |⟨T(x), y⟩|, such that

‖T‖ = sup_{‖x‖≤1} sup_{‖y‖≤1} |⟨T(x), y⟩| = sup_{‖x‖≤1, ‖y‖≤1} |⟨T(x), y⟩| = sup_{‖y‖≤1} sup_{‖x‖≤1} |⟨T(x), y⟩|.

In the case of self-adjoint operators we can say more.

Proposition 13.29 Let T = T* ∈ L(H). Then we have

‖T‖ = sup_{‖x‖≤1} |⟨T(x), x⟩|.  (13.12)

Proof. Let C = sup_{‖x‖≤1} |⟨T(x), x⟩|. By the Cauchy–Schwarz inequality, |⟨T(x), x⟩| ≤ ‖T‖‖x‖², such that C ≤ ‖T‖.
For any real α > 0 we have:

‖T(x)‖² = ⟨T(x), T(x)⟩ = (1/4)(⟨T(αx + α⁻¹T(x)), αx + α⁻¹T(x)⟩ − ⟨T(αx − α⁻¹T(x)), αx − α⁻¹T(x)⟩)
 ≤ (C/4)(‖αx + α⁻¹T(x)‖² + ‖αx − α⁻¹T(x)‖²)
 = (C/4)(2‖αx‖² + 2‖α⁻¹T(x)‖²) = (C/2)(α²‖x‖² + α⁻²‖T(x)‖²),

using the parallelogram identity in the last step.

Inserting α² = ‖T(x)‖/‖x‖ we obtain

‖T(x)‖² ≤ (C/2)(‖T(x)‖‖x‖ + ‖x‖‖T(x)‖) = C‖T(x)‖‖x‖,

which implies ‖T(x)‖ ≤ C‖x‖. Thus ‖T‖ = C.

Let m = inf_{‖x‖=1} ⟨T(x), x⟩ and M = sup_{‖x‖=1} ⟨T(x), x⟩ denote the lower and upper bounds of T. Then we have

sup_{‖x‖≤1} |⟨T(x), x⟩| = max{|m|, |M|} = ‖T‖

and

m‖x‖² ≤ ⟨T(x), x⟩ ≤ M‖x‖²,  for all x ∈ H.

Corollary 13.30 Let T = T* ∈ L(H) be a self-adjoint operator. Then

σ(T) ⊂ [m, M].

Proof. Suppose that λ₀ ∉ [m, M]. Then

C := inf_{µ∈[m,M]} |λ₀ − µ| > 0.

Since m = inf_{‖x‖=1} ⟨T(x), x⟩ and M = sup_{‖x‖=1} ⟨T(x), x⟩, we have for ‖x‖ = 1, by the Cauchy–Schwarz inequality and ⟨T(x), x⟩ ∈ [m, M],

‖(T − λ₀I)x‖ = ‖x‖ ‖(T − λ₀I)x‖ ≥ |⟨(T − λ₀I)x, x⟩| = |⟨T(x), x⟩ − λ₀| ≥ C.

This implies

‖(T − λ₀I)x‖ ≥ C‖x‖ for all x ∈ H.

By Proposition 13.28, λ₀ ∈ ρ(T).

Example 13.13 (a) Let H = L²[0,1], g ∈ C[0,1] a real-valued function, and (T_g f)(t) = g(t)f(t). Let m = inf_{t∈[0,1]} g(t), M = sup_{t∈[0,1]} g(t). One proves that m and M are the lower and upper bounds of T_g, such that σ(T_g) ⊆ [m, M]. Since g is continuous, by the intermediate value theorem, σ(T_g) = [m, M].
(b) Let T = T* ∈ L(H) be self-adjoint. Then all eigenvalues of T are real, and eigenvectors to different eigenvalues are orthogonal to each other.
Proof. The first statement is clear from Corollary 13.30. Suppose that T(x) = λx and T(y) = µy with λ ≠ µ. Then

λ⟨x, y⟩ = ⟨T(x), y⟩ = ⟨x, T(y)⟩ = ⟨x, µy⟩ = µ⟨x, y⟩,

since µ is real. Since λ ≠ µ, ⟨x, y⟩ = 0.

The statement about orthogonality holds for arbitrary normal operators.

Appendix: Compact Self-Adjoint Operators in Hilbert Space

Proof of Proposition 13.27. From the theory of power series (Theorem 2.34) we know that the series

∑_{n=0}^∞ ‖Tⁿ‖ zⁿ  (13.13)

converges if |z| < R and diverges if |z| > R, where

R = 1 / lim sup_{n→∞} ‖Tⁿ‖^{1/n}.  (13.14)

Inserting z = 1/λ and using homework 38.4, we have that

∑_{n=0}^∞ λ^{−n−1} Tⁿ

diverges if |λ| < lim sup_{n→∞} ‖Tⁿ‖^{1/n} (and converges if |λ| > lim sup_{n→∞} ‖Tⁿ‖^{1/n}). The reason for the divergence of the power series is that the spectrum σ(T) and the circle with radius lim sup_{n→∞} ‖Tⁿ‖^{1/n} have points in common; hence

r(T) = lim sup_{n→∞} ‖Tⁿ‖^{1/n}.

On the other hand, by Remark 13.4(d), λ ∈ σ(T) implies λⁿ ∈ σ(Tⁿ); hence, by Remark 13.4(c),

|λⁿ| ≤ ‖Tⁿ‖  ⟹  |λ| ≤ ‖Tⁿ‖^{1/n}.

Taking the supremum over all λ ∈ σ(T) on the left and the lim inf over all n on the right, we have

r(T) ≤ lim inf_{n→∞} ‖Tⁿ‖^{1/n} ≤ lim sup_{n→∞} ‖Tⁿ‖^{1/n} = r(T).

Hence the sequence ‖Tⁿ‖^{1/n} converges to r(T) as n tends to ∞.

Compact operators generalize finite rank operators. Integral operators on compact sets are compact.

Definition 13.16 A linear operator T ∈ L(H) is called compact if the closure of the image T(U₁) of the unit ball U₁ = {x | ‖x‖ ≤ 1} is compact in H. In other words, for every sequence (xₙ), xₙ ∈ U₁, there exists a subsequence (x_{n_k}) such that T(x_{n_k}) converges.

Proposition 13.31 For T ∈ L(H) the following are equivalent:
(a) T is compact.
(b) T* is compact.
(c) For all sequences (xₙ) with ⟨xₙ, y⟩ → ⟨x, y⟩ for all y we have T(xₙ) → T(x).
(d) There exists a sequence (Tₙ) of operators of finite rank such that ‖T − Tₙ‖ → 0.

Definition 13.17 Let T be an operator on H and H₁ a closed subspace of H. We call H₁ a reducing subspace if both H₁ and H₁⊥ are T-invariant, i.e. T(H₁) ⊂ H₁ and T(H₁⊥) ⊂ H₁⊥.

Proposition 13.32 Let T ∈ L(H) be normal.
(a) The eigenspace ker(T − λI) is a reducing subspace for T and ker(T − λI) = ker(T* − λ̄I).
(b) If λ, µ are distinct eigenvalues of T, ker(T − λI) ⊥ ker(T − µI).

Proof. (a) Since T is normal, so is T − λI. Hence ‖(T − λI)(x)‖ = ‖(T − λI)*(x)‖. Thus ker(T − λI) = ker((T − λI)*) = ker(T* − λ̄I). In particular, T*(x) = λ̄x if x ∈ ker(T − λI).
We show invariance. Let x ∈ ker(T − λI); then T(x) = λx ∈ ker(T − λI). Similarly, x ∈ ker(T − λI)⊥ and y ∈ ker(T − λI) imply

⟨T(x), y⟩ = ⟨x, T*(y)⟩ = ⟨x, λ̄y⟩ = λ⟨x, y⟩ = 0.

Hence ker(T − λI)⊥ is T-invariant, too.
(b) Let T(x) = λx and T(y) = µy. Then (a) and T*(y) = µ̄y imply

λ⟨x, y⟩ = ⟨T(x), y⟩ = ⟨x, T*(y)⟩ = ⟨x, µ̄y⟩ = µ⟨x, y⟩.

Thus (λ − µ)⟨x, y⟩ = 0; since λ ≠ µ, x ⊥ y.
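The orthogonality of eigenvectors and the eigenvector expansion can be seen concretely in ℝ²: the symmetric matrix A = [[2,1],[1,2]] (my own example) has orthonormal eigenvectors, and Ax agrees with the expansion ∑ₙ λₙ⟨x, eₙ⟩eₙ:

```python
import math

# A = [[2,1],[1,2]] has eigenvalues 1 and 3 with orthonormal eigenvectors e1, e2.
A    = [[2.0, 1.0], [1.0, 2.0]]
lams = [1.0, 3.0]
e    = [[1/math.sqrt(2), -1/math.sqrt(2)],
        [1/math.sqrt(2),  1/math.sqrt(2)]]

def inner(x, y):
    return sum(a*b for a, b in zip(x, y))

x  = [0.7, -1.9]
Ax = [sum(A[i][j]*x[j] for j in range(2)) for i in range(2)]
# eigenvector expansion: A x = sum_n lambda_n <x, e_n> e_n
expansion = [sum(lams[n]*inner(x, e[n])*e[n][i] for n in range(2)) for i in range(2)]
print(Ax, expansion)          # the two vectors coincide
print(inner(e[0], e[1]))      # eigenvectors to distinct eigenvalues are orthogonal
```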

Theorem 13.33 (Spectral Theorem for Compact Self-Adjoint Operators) Let H be an infinite dimensional separable Hilbert space and T ∈ L(H) compact and self-adjoint.
Then there exists a real sequence (λₙ) with λₙ → 0 as n → ∞ and a CNOS {eₙ | n ∈ ℕ} ∪ {f_k | k ∈ N ⊂ ℕ} such that

T(eₙ) = λₙeₙ, n ∈ ℕ;  T(f_k) = 0, k ∈ N.

Moreover,

T(x) = ∑_{n=1}^∞ λₙ⟨x, eₙ⟩eₙ,  x ∈ H.  (13.15)

Remarks 13.5 (a) Since {eₙ} ∪ {f_k} is a CNOS, any x ∈ H can be written as its Fourier series

x = ∑_{n=1}^∞ ⟨x, eₙ⟩eₙ + ∑_{k∈N} ⟨x, f_k⟩f_k.

Applying T and using T(eₙ) = λₙeₙ, T(f_k) = 0, we have

T(x) = ∑_{n=1}^∞ ⟨x, eₙ⟩λₙeₙ + ∑_{k∈N} ⟨x, f_k⟩T(f_k) = ∑_{n=1}^∞ λₙ⟨x, eₙ⟩eₙ,

which establishes (13.15). The main point is the existence of a CNOS of eigenvectors {eₙ} ∪ {f_k}.
(b) In case H = ℂⁿ (ℝⁿ) the theorem says that any hermitean (symmetric) matrix A is diagonalizable with only real eigenvalues.

Chapter 14

Complex Analysis

Here are some useful textbooks on Complex Analysis: [FL88] (in German), [Kno78] (in German), [Nee97], [Rüh83] (in German), [Hen88].
The main part of this chapter deals with holomorphic functions, which is another name for functions that are complex differentiable in an open set. On the one hand, we are already familiar with a huge class of holomorphic functions: polynomials, the exponential function, the sine and cosine functions. On the other hand, holomorphic functions possess quite amazing properties, completely unusual from the viewpoint of real analysis, and these properties are very strong. For example, it is easy to construct a real function which is 17 times differentiable but not 18 times; a complex differentiable function (in a small region) is automatically infinitely often differentiable.
Good references are Ahlfors [Ahl78]; a little harder is Conway [Con78]; easier is Howie [How03].

14.1 Holomorphic Functions

14.1.1 Complex Differentiation

We start with some notations.

[Figure: the open disc U_r = {z ∈ ℂ | |z| < r}]

Definition 14.1 Let U ⊂ ℂ be an open subset of ℂ and f: U → ℂ a complex function.
(a) If z₀ ∈ U and the limit

lim_{z→z₀} (f(z) − f(z₀))/(z − z₀) =: f′(z₀)

exists, we call f complex differentiable at z₀ and f′(z₀) the derivative of f at z₀.


(b) If f is complex differentiable for every z₀ ∈ U, we say that f is holomorphic in U. We call f′ the derivative of f on U.

(c) f is holomorphic at z₀ if it is complex differentiable in a certain neighborhood of z₀.

To be quite explicit, f′(z₀) exists if to every ε > 0 there exists some δ > 0 such that z ∈ U_δ(z₀) implies

|(f(z) − f(z₀))/(z − z₀) − f′(z₀)| < ε.

Remarks 14.1 (a) Differentiability of f at z₀ forces f to be continuous at z₀. Indeed, f is differentiable at z₀ with derivative f′(z₀) if and only if there exists a function r(z, z₀) such that

f(z) = f(z₀) + f′(z₀)(z − z₀) + (z − z₀)r(z, z₀),

where lim_{z→z₀} r(z, z₀) = 0. In particular, taking the limit z → z₀ in the above equation we get

lim_{z→z₀} f(z) = f(z₀),

which proves continuity at z₀.
Complex conjugation is a uniformly continuous function on ℂ since |z̄ − z̄₀| = |z − z₀| for all z, z₀ ∈ ℂ.
(b) The derivative satisfies the well-known sum, product, and quotient rules; that is, if both f and g are holomorphic in U, so are f + g, fg, and f/g (provided g ≠ 0 in U), and we have

(f + g)′ = f′ + g′,  (fg)′ = f′g + fg′,  (f/g)′ = (f′g − fg′)/g².

Also, the chain rule holds: if f: U → V and g: V → ℂ are holomorphic, so is g∘f and

(g∘f)′(z) = g′(f(z)) f′(z).

The proofs are exactly the same as in the real case. Since the constant functions $f(z) = c$ and the identity $f(z) = z$ are holomorphic in $\mathbb{C}$, so is every polynomial with complex coefficients and, moreover, every rational function (quotient of two polynomials) $f\colon U \to \mathbb{C}$, provided the denominator has no zeros in $U$. So we already know a large class of holomorphic functions. A still larger class is given by the convergent power series.

Example 14.1 $f(z) = |z|^2$ is complex differentiable at $0$ with $f'(0) = 0$; $f$ is not differentiable at $z_0 = 1$. Indeed,
\[
\lim_{h \to 0} \frac{f(0 + h) - f(0)}{h} = \lim_{h \to 0} \frac{|h|^2}{h} = \lim_{h \to 0} \bar h = 0.
\]
On the other hand, let $\varepsilon \in \mathbb{R}$; then
\[
\lim_{\varepsilon \to 0} \frac{|1 + \varepsilon|^2 - 1}{\varepsilon} = \lim_{\varepsilon \to 0} \frac{2\varepsilon + \varepsilon^2}{\varepsilon} = 2,
\]
whereas
\[
\lim_{\varepsilon \to 0} \frac{|1 + i\varepsilon|^2 - 1}{i\varepsilon} = \lim_{\varepsilon \to 0} \frac{\varepsilon^2}{i\varepsilon} = 0.
\]
This shows that $f'(1)$ does not exist.
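The notes contain no code, but the two directional limits above are easy to test numerically. A minimal Python sketch (my own illustration, not from the text): one-sided difference quotients of $f(z) = |z|^2$ along the real and the imaginary axis approach different values at $z_0 = 1$, while at $z_0 = 0$ they tend to $0$ in every direction.

```python
# Hypothetical sketch (not from the notes): difference quotients of f(z) = |z|^2.
def dq(f, z0, h):
    """One-sided difference quotient (f(z0 + h) - f(z0)) / h."""
    return (f(z0 + h) - f(z0)) / h

f = lambda z: abs(z) ** 2
eps = 1e-6
real_dir = dq(f, 1, eps)             # along the real axis: close to 2
imag_dir = dq(f, 1, 1j * eps)        # along the imaginary axis: close to 0
at_zero = dq(f, 0, eps * (1 + 1j))   # at z0 = 0: close to 0 in any direction
```

The two different limits at $z_0 = 1$ confirm numerically that $f'(1)$ cannot exist.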

14.1.2 Power Series

Recall from Subsection 2.3.9 that a power series $\sum c_n z^n$ has a radius of convergence
\[
R = \frac{1}{\varlimsup_{n \to \infty} \sqrt[n]{|c_n|}}.
\]
That is, the series converges absolutely for all $z$ with $|z| < R$; the series diverges for $|z| > R$; the behaviour for $|z| = R$ depends on the sequence $(c_n)$. Moreover, it converges uniformly on every closed ball $\overline{U_r}$ with $0 < r < R$.

Proposition 14.1 Let $a \in \mathbb{C}$ and
\[
f(z) = \sum_{n=0}^{\infty} c_n (z - a)^n \tag{14.1}
\]
be a power series with radius of convergence $R$. Then $f\colon U_R(a) \to \mathbb{C}$ is holomorphic and the derivative is

\[
f'(z) = \sum_{n=1}^{\infty} n c_n (z - a)^{n-1}. \tag{14.2}
\]
Proof. If the series (14.1) converges in $U_R(a)$, the root test shows that the series (14.2) also converges there. Without loss of generality, take $a = 0$. Denote the sum of the series (14.2) by $g(z)$, fix $w \in U_R(0)$, and choose $r$ so that $|w| < r < R$. For $z \ne w$,
\[
\frac{f(z) - f(w)}{z - w} - g(w) = \sum_{n=2}^{\infty} c_n \left( \frac{z^n - w^n}{z - w} - n w^{n-1} \right),
\]
and
\[
\frac{z^n - w^n}{z - w} - n w^{n-1} = (z - w) \sum_{k=1}^{n-1} k w^{k-1} z^{n-k-1}, \quad\text{since}\quad
(z - w) \sum_{k=1}^{n-1} k w^{k-1} z^{n-k-1} = \sum_{k=1}^{n-1} \left( k w^{k-1} z^{n-k} - k w^{k} z^{n-k-1} \right), \tag{14.3}
\]
which gives a telescoping sum if we shift $k := k + 1$ in the first summand. If $|z| \le r$ and $|w| \le r$, then
\[
\left| \sum_{k=1}^{n-1} k w^{k-1} z^{n-k-1} \right| \le \sum_{k=1}^{n-1} k\, r^{n-2} = \binom{n}{2} r^{n-2},
\]
so that
\[
\left| \frac{f(z) - f(w)}{z - w} - g(w) \right| \le |z - w| \sum_{n=2}^{\infty} \binom{n}{2} |c_n|\, r^{n-2}. \tag{14.4}
\]
Since $r < R$, the last series converges. Hence the left side of (14.4) tends to $0$ as $z \to w$. This says that $f'(w) = g(w)$, and completes the proof.

Corollary 14.2 Since $f'(z)$ is again a power series with the same radius of convergence $R$, the proposition can be applied to $f'(z)$. It follows that $f$ has derivatives of all orders and that each derivative has a power series expansion around $a$:
\[
f^{(k)}(z) = \sum_{n=k}^{\infty} n(n-1) \cdots (n-k+1)\, c_n (z - a)^{n-k}. \tag{14.5}
\]
Inserting $z = a$ implies
\[
f^{(k)}(a) = k!\, c_k, \qquad k = 0, 1, \dots.
\]

This shows that the coefficients $c_n$ in the power series expansion $f(z) = \sum_{n=0}^{\infty} c_n (z - a)^n$ of $f$ with midpoint $a$ are unique.

Example 14.2 The exponential function $e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}$ is holomorphic on the whole complex plane with $(e^z)' = e^z$; similarly, the trigonometric functions $\sin z$ and $\cos z$ are holomorphic in $\mathbb{C}$ since
\[
\sin z = \sum_{n=0}^{\infty} (-1)^n \frac{z^{2n+1}}{(2n+1)!}, \qquad
\cos z = \sum_{n=0}^{\infty} (-1)^n \frac{z^{2n}}{(2n)!}.
\]
We have $(\sin z)' = \cos z$ and $(\cos z)' = -\sin z$.

Definition 14.2 A complex function which is defined and holomorphic on the entire complex plane is called an entire function.

14.1.3 Cauchy–Riemann Equations

Let us identify the complex field $\mathbb{C}$ and the two-dimensional real plane $\mathbb{R}^2$ via $z = x + iy$; that is, every complex number $z$ corresponds to an ordered pair $(x, y)$ of real numbers. In this way, a complex function $w = f(z)$ corresponds to a function $U \to \mathbb{R}^2$, where $U \subset \mathbb{C}$ is open. We have $w = u + iv$, where $u = u(x, y)$ and $v = v(x, y)$ are the real and the imaginary parts of the function $f$; $u = \operatorname{Re} w$ and $v = \operatorname{Im} w$. Problem: What is the relation between complex differentiability and the differentiability of $f$ as a function from $\mathbb{R}^2$ to $\mathbb{R}^2$?

Proposition 14.3 Let $f\colon U \to \mathbb{C}$, $U \subset \mathbb{C}$ open, $a \in U$, be a function. Then the following are equivalent:

(a) $f$ is complex differentiable at $a$.
(b) $f(x, y) = u(x, y) + iv(x, y)$ is real differentiable at $a$ as a function $f\colon U \subset \mathbb{R}^2 \to \mathbb{R}^2$, and the Cauchy–Riemann equations are satisfied at $a$:
\[
\frac{\partial u}{\partial x}(a) = \frac{\partial v}{\partial y}(a), \qquad
\frac{\partial u}{\partial y}(a) = -\frac{\partial v}{\partial x}(a);
\]
in short,
\[
u_x = v_y, \qquad u_y = -v_x. \tag{14.6}
\]

In this case,
\[
f' = u_x + i v_x = v_y - i u_y.
\]
Proof. (a) $\Rightarrow$ (b): Suppose that $z = h + ik$ is a complex number such that $a + z \in U$; put $f'(a) = b_1 + i b_2$. By assumption,
\[
\lim_{z \to 0} \frac{|f(a + z) - f(a) - z f'(a)|}{|z|} = 0.
\]
We shall write this in the real form with real variables $h$ and $k$. Note that
\[
z f'(a) = (h + ik)(b_1 + i b_2) = h b_1 - k b_2 + i(h b_2 + k b_1),
\]
which under the identification $\mathbb{C} \cong \mathbb{R}^2$ corresponds to
\[
\begin{pmatrix} h b_1 - k b_2 \\ h b_2 + k b_1 \end{pmatrix}
= \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix}
\begin{pmatrix} h \\ k \end{pmatrix}.
\]
This implies, with the identification $z = (h, k)$,
\[
\lim_{z \to 0} \frac{\left\| f(a + z) - f(a) - \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix} \begin{pmatrix} h \\ k \end{pmatrix} \right\|}{\| z \|} = 0.
\]
That is (see Subsection 7.2), $f$ is real differentiable at $a$ with the Jacobian matrix
\[
f'(a) = Df(a) = \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix}. \tag{14.7}
\]
By Proposition 7.6, the Jacobian matrix is exactly the matrix of the partial derivatives, that is,
\[
Df(a) = \begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix}.
\]
Comparing this with (14.7), we obtain $u_x(a) = v_y(a) = \operatorname{Re} f'(a)$ and $u_y(a) = -v_x(a) = -\operatorname{Im} f'(a)$. This completes the proof of the first direction.
(b) $\Rightarrow$ (a): Since $f = (u, v)$ is differentiable at $a \in U$ as a real function, there exists a linear mapping $Df(a) \in L(\mathbb{R}^2)$ such that
\[
\lim_{(h,k) \to 0} \frac{\left\| f(a + (h, k)) - f(a) - Df(a) \begin{pmatrix} h \\ k \end{pmatrix} \right\|}{\|(h, k)\|} = 0.
\]
By Proposition 7.6,
\[
Df(a) = \begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix}.
\]
The Cauchy–Riemann equations show that $Df(a)$ takes the form
\[
Df(a) = \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix},
\]

where $u_x = b_1$ and $v_x = b_2$. Writing
\[
Df(a) \begin{pmatrix} h \\ k \end{pmatrix}
= \begin{pmatrix} h b_1 - k b_2 \\ h b_2 + k b_1 \end{pmatrix}
\]
in the complex form with $z = h + ik$, this is $z(b_1 + i b_2)$; hence $f$ is complex differentiable at $a$ with $f'(a) = b_1 + i b_2$.

Example 14.3 (a) We already know that $f(z) = z^2$ is complex differentiable. Hence, the Cauchy–Riemann equations must be fulfilled. From
\[
f(z) = z^2 = (x + iy)^2 = x^2 - y^2 + 2ixy, \qquad u(x, y) = x^2 - y^2, \quad v(x, y) = 2xy,
\]
we conclude
\[
u_x = 2x, \quad u_y = -2y, \quad v_x = 2y, \quad v_y = 2x.
\]
The Cauchy–Riemann equations are satisfied.
(b) $f(z) = |z|^2$. Since $f(z) = x^2 + y^2$, we have $u(x, y) = x^2 + y^2$ and $v(x, y) = 0$. The Cauchy–Riemann equations yield $u_x = 2x = 0 = v_y$ and $u_y = 2y = 0 = -v_x$, such that $z = 0$ is the only solution of the CRE; $z = 0$ is the only point where $f$ is differentiable.
$f(z) = \bar z$ is nowhere differentiable, since $u(x, y) = x$, $v(x, y) = -y$; thus
\[
1 = u_x \ne v_y = -1.
\]

A function $f\colon U \to \mathbb{C}$, $U \subset \mathbb{C}$ open, is called locally constant in $U$ if for every point $a \in U$ there exists a ball $V$ with $a \in V \subset U$ such that $f$ is constant on $V$. Clearly, $f$ is constant on every connectedness component of $U$. In fact, one can define $U$ to be connected if every locally constant function $f\colon U \to \mathbb{C}$ is constant.
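The Cauchy–Riemann computations of Example 14.3 can also be verified numerically. A Python sketch (my own illustration, using central difference quotients as the derivative approximation):

```python
# Hypothetical sketch: test the Cauchy-Riemann equations (14.6)
# u_x = v_y, u_y = -v_x numerically via central differences.
def cr_defect(f, x, y, h=1e-5):
    u = lambda x, y: f(complex(x, y)).real
    v = lambda x, y: f(complex(x, y)).imag
    ux = (u(x + h, y) - u(x - h, y)) / (2 * h)
    uy = (u(x, y + h) - u(x, y - h)) / (2 * h)
    vx = (v(x + h, y) - v(x - h, y)) / (2 * h)
    vy = (v(x, y + h) - v(x, y - h)) / (2 * h)
    return abs(ux - vy) + abs(uy + vx)   # zero iff the CRE hold at (x, y)

d_square = cr_defect(lambda z: z * z, 1.3, -0.7)         # ~ 0 for z^2
d_conj = cr_defect(lambda z: z.conjugate(), 1.3, -0.7)   # |1-(-1)| = 2 for conj(z)
```

The defect vanishes for the holomorphic $z^2$ and equals $2$ for $\bar z$, matching the computation $1 = u_x \ne v_y = -1$.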

Corollary 14.4 Let $U \subset \mathbb{C}$ be open and $f\colon U \to \mathbb{C}$ be a holomorphic function on $U$.
(a) If $f'(z) = 0$ for all $z \in U$, then $f$ is locally constant in $U$.
(b) If $f$ takes real values only, then $f$ is locally constant.
(c) If $f$ has a continuous second derivative, then $u = \operatorname{Re} f$ and $v = \operatorname{Im} f$ are harmonic functions, i.e., they satisfy the Laplace equation $\Delta(u) = u_{xx} + u_{yy} = 0$ and $\Delta(v) = 0$.

Proof. (a) Since $f'(z) = 0$ for all $z \in U$, the Cauchy–Riemann equations imply $u_x = u_y = v_x = v_y = 0$ in $U$. From real analysis it is known that $u$ and $v$ are locally constant in $U$ (apply Corollary 7.12 with $\operatorname{grad} f(a + \theta x) = 0$).
(b) Since $f$ takes only real values, $v(x, y) = 0$ for all $(x, y) \in U$. This implies $v_x = v_y = 0$ on $U$. By the Cauchy–Riemann equations, $u_x = u_y = 0$, and $f$ is locally constant by (a).
(c) $u_x = v_y$ implies $u_{xx} = v_{yx}$, and differentiating $u_y = -v_x$ with respect to $y$ yields $u_{yy} = -v_{xy}$. Since both $u$ and $v$ are twice continuously differentiable (since so is $f$), by Schwarz' Lemma the sum is $u_{xx} + u_{yy} = v_{yx} - v_{xy} = 0$. The same argument works for $v_{xx} + v_{yy} = 0$.
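Part (c) can be illustrated numerically: $u = \operatorname{Re}(z^3) = x^3 - 3xy^2$ is harmonic, and a five-point stencil approximation of $\Delta u$ vanishes. A Python sketch (my own; the test point is arbitrary):

```python
# Hypothetical sketch: u = Re(z^3) is harmonic; approximate u_xx + u_yy by the
# standard five-point stencil, which is exact (up to rounding) for cubics.
def laplacian(u, x, y, h=1e-3):
    return (u(x + h, y) + u(x - h, y) + u(x, y + h) + u(x, y - h)
            - 4 * u(x, y)) / h ** 2

u = lambda x, y: (complex(x, y) ** 3).real   # x^3 - 3*x*y^2
lap = laplacian(u, 0.7, -1.2)                # ~ 0
```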

Remarks 14.2 (a) We will see soon that the additional differentiability assumption in (c) is superfluous.
(b) Note that an inverse-like statement to (c) is easily proved: If $Q = (a, b) \times (c, d)$ is an open rectangle and $u\colon Q \to \mathbb{R}$ is harmonic, then there exists a holomorphic function $f\colon Q \to \mathbb{C}$ such that $u = \operatorname{Re} f$.

14.2 Cauchy’s Integral Formula

14.2.1 Integration

The major objective of this section is to prove the converse to Proposition 14.1: every function holomorphic in $D$ is representable as a power series in $D$. The quickest route to this is via Cauchy's Theorem and Cauchy's Integral Formula. The required integration theory will be developed; it is a useful tool to study holomorphic functions.
Recall from Section 5.4 the definition of the Riemann integral of a bounded complex-valued function $\varphi\colon [a, b] \to \mathbb{C}$. It was defined by integrating both the real and the imaginary parts of $\varphi$. In what follows, a path is always a piecewise continuously differentiable curve.

Definition 14.3 Let $U \subset \mathbb{C}$ be open and $f\colon U \to \mathbb{C}$ a continuous function on $U$. Suppose that $\gamma\colon [t_0, t_n] \to U$ is a path in $U$. The integral of $f$ along $\gamma$ is defined as the line integral
\[
\int_\gamma f(z)\, dz := \sum_{k=1}^{n} \int_{t_{k-1}}^{t_k} f(\gamma(t))\, \gamma'(t)\, dt, \tag{14.8}
\]
where $\gamma$ is continuously differentiable on $[t_{k-1}, t_k]$ for all $k = 1, \dots, n$.
By the change of variable rule, the integral of $f$ along $\gamma$ does not depend on the parametrization $\gamma$ of the path $\{ \gamma(t) \mid t \in [t_0, t_n] \}$. However, if we exchange the initial and the end point of $\gamma(t)$, we obtain a negative sign.

Remarks 14.3 (Properties of the complex integral) (a) The integral of $f$ along $\gamma$ is linear over $\mathbb{C}$:
\[
\int_\gamma (\alpha f_1 + \beta f_2)\, dz = \alpha \int_\gamma f_1\, dz + \beta \int_\gamma f_2\, dz, \qquad
\int_{\gamma^-} f(z)\, dz = -\int_{\gamma^+} f(z)\, dz,
\]

where $\gamma^-$ has the opposite orientation of $\gamma^+$.
(b) If $\gamma_1$ and $\gamma_2$ are two paths so that $\gamma_1$ and $\gamma_2$ join to form $\gamma$, then we have
\[
\int_\gamma f(z)\, dz = \int_{\gamma_1} f(z)\, dz + \int_{\gamma_2} f(z)\, dz.
\]

(c) From the definition and the triangle inequality, it follows that for a continuously differentiable path $\gamma$
\[
\left| \int_\gamma f(z)\, dz \right| \le M \ell,
\]
where $|f(z)| \le M$ for all $z$ on $\gamma$ and $\ell = \int_a^b |\gamma'(t)|\, dt$ is the length of the curve $\gamma(t)$.
(d) The integral of $f$ over $\gamma$ generalizes the real integral $\int_a^b f(t)\, dt$. Indeed, let $\gamma(t) = t$, $t \in [a, b]$; then
\[
\int_\gamma f(z)\, dz = \int_a^b f(t)\, dt.
\]

(e) Let $\gamma$ be the circle $S_r(a)$ of radius $r$ with center $a$. We can parametrize the positively oriented circle as $\gamma(t) = a + re^{it}$, $t \in [0, 2\pi]$. Then
\[
\int_\gamma f(z)\, dz = ir \int_0^{2\pi} f\!\left( a + re^{it} \right) e^{it}\, dt.
\]

Example 14.4 (a) Let $\gamma_1(t) = e^{it}$, $t \in [0, \pi]$, be the half of the unit circle from $1$ to $-1$ via $i$, and $\gamma_2(t) = -t$, $t \in [-1, 1]$, the segment from $1$ to $-1$. Then $\gamma_1'(t) = i e^{it}$ and $\gamma_2'(t) = -1$. Hence, for $f(z) = \bar z^2$,
\[
\int_{\gamma_1} \bar z^2\, dz = i \int_0^\pi \overline{e^{2it}}\, e^{it}\, dt = i \int_0^\pi e^{-it}\, dt
= \left. \frac{i}{-i}\, e^{-it} \right|_0^\pi = -(-1 - 1) = 2,
\]
\[
\int_{\gamma_2} \bar z^2\, dz = -\int_{-1}^{1} t^2\, dt = -\frac{2}{3}.
\]
In particular, the integral of $\bar z^2$ is not path independent.
(b)
\[
\int_{S_r} \frac{dz}{z^n} = \int_0^{2\pi} \frac{i r e^{it}}{r^n e^{int}}\, dt = i r^{1-n} \int_0^{2\pi} e^{-(n-1)it}\, dt =
\begin{cases} 0, & n \ne 1, \\ 2\pi i, & n = 1. \end{cases}
\]
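Example 14.4 can be reproduced numerically by approximating the defining sum (14.8) with a midpoint Riemann rule. A Python sketch (my own illustration; the helper name is hypothetical):

```python
import cmath

# Hypothetical sketch: approximate the line integral (14.8) by a midpoint
# Riemann sum and reproduce Example 14.4.
def contour_integral(f, gamma, dgamma, t0, t1, n=20000):
    dt = (t1 - t0) / n
    total = 0j
    for k in range(n):
        t = t0 + (k + 0.5) * dt          # midpoint rule
        total += f(gamma(t)) * dgamma(t) * dt
    return total

pi = cmath.pi
f = lambda z: z.conjugate() ** 2
# gamma_1: upper half of the unit circle from 1 to -1
I1 = contour_integral(f, lambda t: cmath.exp(1j * t),
                      lambda t: 1j * cmath.exp(1j * t), 0, pi)
# gamma_2: segment from 1 to -1
I2 = contour_integral(f, lambda t: -t, lambda t: -1.0, -1, 1)
# part (b) with n = 1: dz/z over the positively oriented unit circle
I3 = contour_integral(lambda z: 1 / z, lambda t: cmath.exp(1j * t),
                      lambda t: 1j * cmath.exp(1j * t), 0, 2 * pi)
```

Up to discretization error, `I1` is $2$, `I2` is $-2/3$, and `I3` is $2\pi i$, matching the example.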

14.2.2 Cauchy's Theorem

Cauchy's theorem is the main part in the proof that every function holomorphic at $a$ can be written as a power series with midpoint $a$. As a consequence of Corollary 14.2, holomorphic functions then have derivatives of all orders. We start with a very weak form; the additional assumption is that $f$ has an antiderivative.

Lemma 14.5 Let $f\colon U \to \mathbb{C}$ be continuous, and suppose that $f$ has an antiderivative $F$ which is holomorphic on $U$, $F' = f$. If $\gamma$ is any path in $U$ joining $z_0$ and $z_1$ from $U$, we have
\[
\int_\gamma f(z)\, dz = F(z_1) - F(z_0).
\]

In particular, if $\gamma$ is a closed path in $U$,
\[
\int_\gamma f(z)\, dz = 0.
\]
Proof. It suffices to prove the statement for a continuously differentiable curve $\gamma(t)$. Put $h(t) = F(\gamma(t))$. By the chain rule,
\[
h'(t) = \frac{d}{dt} F(\gamma(t)) = F'(\gamma(t))\, \gamma'(t) = f(\gamma(t))\, \gamma'(t).
\]
By the definition of the integral and the fundamental theorem of calculus (see Subsection 5.5),
\[
\int_\gamma f(z)\, dz = \int_a^b f(\gamma(t))\, \gamma'(t)\, dt = \int_a^b h'(t)\, dt = h(t) \Big|_a^b = h(b) - h(a) = F(z_1) - F(z_0).
\]

Example 14.5 (a)
\[
\int_{2+3i}^{1-i} z^3\, dz = \frac{(1 - i)^4 - (2 + 3i)^4}{4}.
\]
(b) $\displaystyle \int_1^{\pi i} e^z\, dz = e^{\pi i} - e = -1 - e$.

Theorem 14.6 (Cauchy's Theorem) Let $U$ be a simply connected region in $\mathbb{C}$ and let $f(z)$ be holomorphic in $U$. Suppose that $\gamma(t)$ is a path in $U$ joining $z_0$ and $z_1$ in $U$. Then $\int_\gamma f(z)\, dz$ depends on $z_0$ and $z_1$ only and not on the choice of the path. In particular, $\int_\gamma f(z)\, dz = 0$ for any closed path $\gamma$ in $U$.

Proof. We give the proof under the additional assumption that $f'$ not only exists but is continuous in $U$. In this case, the partial derivatives $u_x$, $u_y$, $v_x$, and $v_y$ are continuous and we can apply the integrability criterion Proposition 8.3, which was a consequence of Green's theorem, see Theorem 10.3. Note that we need $U$ to be simply connected, in contrast to Lemma 14.5.
Without this additional assumption ($f'$ is continuous), the proof is lengthy (see [FB93, Lan89, Jän93]); it starts with triangular or rectangular paths and is then generalized to arbitrary paths. We have
\[
\int_\gamma f(z)\, dz = \int_\gamma (u + iv)(dx + i\, dy) = \int_\gamma (u\, dx - v\, dy) + i \int_\gamma (v\, dx + u\, dy).
\]
We have path independence of the line integral $\int_\gamma P\, dx + Q\, dy$ if and only if the integrability condition $Q_x = P_y$ is satisfied, if and only if $P\, dx + Q\, dy$ is a closed form.
In our case, the real part is path independent if and only if $-v_x = u_y$; the imaginary part is path independent if and only if $u_x = v_y$. These are exactly the Cauchy–Riemann equations, which are satisfied since $f$ is holomorphic.

Remarks 14.4 (a) The proposition holds under the following weaker assumption: $f$ is continuous in the closure $\overline{U}$ and holomorphic in $U$, $U$ is a simply connected region, and $\gamma = \partial U$ is a path.
(b) The statement is wrong without the assumption "U is simply connected". Indeed, consider the circle of radius $r$ with center $a$, that is, $\gamma(t) = a + re^{it}$. Then $f(z) = 1/(z - a)$ is singular at $a$ and we have
\[
\int_{S_r(a)} \frac{dz}{z - a} = \int_0^{2\pi} \frac{i r e^{it}}{r e^{it}}\, dt = i \int_0^{2\pi} dt = 2\pi i.
\]
(c) For a non-simply connected region $G$, one cuts $G$ with pairwise mutually inverse paths (in the picture: $\delta_1$, $\delta_2$, $\delta_3$, and $\delta_4$). The resulting region $\tilde G$ is simply connected, such that $\int_{\partial \tilde G} f(z)\, dz = 0$ by (a). Since the integrals along $\delta_i$, $i = 1, \dots, 4$, cancel, we have
\[
\int_{\gamma + \gamma_1 + \gamma_2} f(z)\, dz = 0.
\]
In particular, if $f$ is holomorphic in the annulus $\{ z \mid 0 < |z - a| < R \}$ and $0 < r_1 < r_2 < R$, then
\[
\int_{S_{r_1}(a)} f(z)\, dz = \int_{S_{r_2}(a)} f(z)\, dz
\]
if both circles are positively oriented.

Proposition 14.7 Let $U$ be a simply connected region, $z_0 \in U$, $U_0 = U \setminus \{ z_0 \}$. Suppose that $f$ is holomorphic in $U_0$ and bounded in a certain neighborhood of $z_0$. Then
\[
\int_\gamma f(z)\, dz = 0
\]
for every non-selfintersecting closed path $\gamma$ in $U_0$.

Proof. Suppose that $|f(z)| \le C$ for $|z - z_0| < \varepsilon_0$. For any $\varepsilon$ with $0 < \varepsilon < \varepsilon_0$ we then have, by Remark 14.3 (c),
\[
\left| \int_{S_\varepsilon(z_0)} f(z)\, dz \right| \le 2\pi \varepsilon C.
\]
By Remark 14.4 (c),
\[
\int_\gamma f(z)\, dz = \int_{S_{\varepsilon_0}(z_0)} f(z)\, dz = \int_{S_\varepsilon(z_0)} f(z)\, dz.
\]
Hence
\[
\left| \int_\gamma f(z)\, dz \right| = \left| \int_{S_\varepsilon(z_0)} f(z)\, dz \right| \le 2\pi \varepsilon C.
\]
Since this is true for all small $\varepsilon > 0$, $\int_\gamma f(z)\, dz = 0$.

We will see soon that under the conditions of the proposition, $f$ can be made holomorphic at $z_0$, too.

14.2.3 Cauchy’s Integral Formula

Theorem 14.8 (Cauchy's Integral Formula) Let $U$ be a region. Suppose that $f$ is holomorphic in $U$, and $\gamma$ is a non-selfintersecting, positively oriented path in $U$ such that $\gamma$ is the boundary of $U_0 \subset U$; in particular, $U_0$ is simply connected. Then for every $a \in U_0$ we have
\[
f(a) = \frac{1}{2\pi i} \int_\gamma \frac{f(z)\, dz}{z - a}. \tag{14.9}
\]

Proof. $a \in U_0$ is fixed. For $z \in U$ we define
\[
F(z) =
\begin{cases}
\dfrac{f(z) - f(a)}{z - a}, & z \ne a, \\[1ex]
0, & z = a.
\end{cases}
\]
Then $F(z)$ is holomorphic in $U \setminus \{ a \}$ and bounded in a neighborhood of $a$, since $f'(a)$ exists and therefore
\[
\left| \frac{f(z) - f(a)}{z - a} - f'(a) \right| < \varepsilon
\]
as $z$ approaches $a$. Using Proposition 14.7 and Remark 14.4 (b) we have $\int_\gamma F(z)\, dz = 0$, that is,
\[
\int_\gamma \frac{f(z) - f(a)}{z - a}\, dz = 0,
\]
such that
\[
\int_\gamma \frac{f(z)\, dz}{z - a} = \int_\gamma \frac{f(a)\, dz}{z - a} = f(a) \int_\gamma \frac{dz}{z - a} = 2\pi i\, f(a).
\]
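Formula (14.9) lends itself to a direct numerical test: the value of a holomorphic function at an interior point is recovered from its values on a surrounding circle. A Python sketch (my own illustration; $f = \exp$ and the point $a$ are arbitrary choices):

```python
import cmath

# Hypothetical sketch: check Cauchy's Integral Formula (14.9) by recovering
# f(a) for the entire function f = exp from its values on the unit circle.
f = cmath.exp
a = 0.3 + 0.2j                      # a point inside the unit circle
n = 5000
dt = 2 * cmath.pi / n
total = 0j
for k in range(n):
    t = (k + 0.5) * dt
    z = cmath.exp(1j * t)           # gamma(t) on S_1(0)
    dz = 1j * cmath.exp(1j * t) * dt
    total += f(z) / (z - a) * dz
recovered = total / (2j * cmath.pi)  # should equal f(a) = e^a
```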

Remark 14.5 The values of a holomorphic function f inside a path γ are completely deter- mined by the values of f on γ.

Example 14.6 Evaluate
\[
I_r := \int_{S_r(a)} \frac{\sin z}{z^2 + 1}\, dz
\]
in the cases $a = 1 + i$ and $r = \frac{1}{2}, 2, 3$.
Solution. We use the partial fraction decomposition of $1/(z^2 + 1)$ to obtain linear terms in the denominator:
\[
\frac{1}{z^2 + 1} = \frac{1}{2i} \left( \frac{1}{z - i} - \frac{1}{z + i} \right).
\]
Hence, with $f(z) = \sin z$ we have in the case $r = 3$ (both $i$ and $-i$ lie inside $S_3(a)$):
\[
I_3 = \int_{S_3(a)} \frac{\sin z\, dz}{z^2 + 1}
= \frac{1}{2i} \int_{S_3(a)} \frac{\sin z}{z - i}\, dz - \frac{1}{2i} \int_{S_3(a)} \frac{\sin z}{z + i}\, dz
= \pi \left( f(i) - f(-i) \right) = 2\pi \sin(i) = \pi i \left( e - \tfrac{1}{e} \right).
\]
In the case $r = 2$, the function $\frac{\sin z}{z + i}$ is holomorphic inside the circle of radius $2$ with center $a$, since $-i$ lies outside it. Hence,
\[
I_2 = \pi \sin(i) = I_3 / 2.
\]

In the case $r = \frac{1}{2}$, both integrands are holomorphic inside the circle, such that $I_{1/2} = 0$.

Example 14.7 Consider the function $f(z) = e^{iz^2}$, which is an entire function. Let $\gamma_1(t) = t$, $t \in [0, R]$, be the segment from $0$ to $R$ on the real line; let $\gamma_2(t) = Re^{it}$, $t \in [0, \pi/4]$, be the arc of the circle of radius $R$ with center $0$; and let finally $\gamma_3(t) = t e^{i\pi/4}$, $t \in [0, R]$, be the segment from $0$ to $Re^{i\pi/4}$. By Cauchy's Theorem,
\[
I_1 + I_2 - I_3 = \int_{\gamma_1 + \gamma_2 - \gamma_3} f(z)\, dz = 0.
\]

Obviously, since $\left( e^{i\pi/4} \right)^2 = e^{i\pi/2} = i$,
\[
I_1 = \int_0^R e^{it^2}\, dt, \qquad
I_3 = e^{i\pi/4} \int_0^R e^{-t^2}\, dt.
\]
We shall show that $|I_2(R)| \to 0$ as $R \to \infty$. We have
\[
|I_2(R)| = \left| \int_0^{\pi/4} e^{i R^2 e^{2it}}\, R i e^{it}\, dt \right|
\le R \int_0^{\pi/4} \left| e^{i R^2 (\cos 2t + i \sin 2t)} \right| dt
= R \int_0^{\pi/4} e^{-R^2 \sin 2t}\, dt.
\]
Note that $\sin t$ is a concave function on $[0, \pi/2]$; that is, the graph of the sine function lies above the graph of the corresponding linear function through $(0, 0)$ and $(\pi/2, 1)$; thus $\sin t \ge 2t/\pi$ for $t \in [0, \pi/2]$. We have
\[
|I_2(R)| \le R \int_0^{\pi/4} e^{-R^2 \cdot 4t/\pi}\, dt
= -\frac{\pi}{4R} \left( e^{-R^2} - 1 \right)
= \frac{\pi}{4R} \left( 1 - e^{-R^2} \right).
\]
We conclude that $|I_2(R)|$ tends to $0$ as $R \to \infty$. Since $I_1 + I_2 - I_3 = 0$ for all $R$ by Cauchy's Theorem, we conclude

\[
\lim_{R \to \infty} I_1(R) = \int_0^\infty e^{it^2}\, dt = e^{i\pi/4} \int_0^\infty e^{-t^2}\, dt = \lim_{R \to \infty} I_3(R).
\]
The integral on the right is $\sqrt{\pi}/2$ (see below); hence, comparing real and imaginary parts in $e^{it^2} = \cos(t^2) + i \sin(t^2)$ and $e^{i\pi/4} = \frac{\sqrt 2}{2}(1 + i)$ implies
\[
\int_0^\infty \cos(t^2)\, dt = \frac{\sqrt{2\pi}}{4} = \int_0^\infty \sin(t^2)\, dt.
\]
These are the so-called Fresnel integrals. We now show that $I = \int_0^\infty e^{-x^2}\, dx = \sqrt{\pi}/2$. (This was already done in Homework 41.) For this, we compute the double integral using Fubini's theorem:
\[
\int_0^\infty \!\! \int_0^\infty e^{-x^2 - y^2}\, dx\, dy = \int_0^\infty e^{-x^2}\, dx \int_0^\infty e^{-y^2}\, dy = I^2.
\]

Passing to polar coordinates yields $dx\, dy = r\, dr\, d\varphi$, $x^2 + y^2 = r^2$, such that
\[
\iint_{(\mathbb{R}_+)^2} e^{-x^2 - y^2}\, dx\, dy = \lim_{R \to \infty} \int_0^{\pi/2} d\varphi \int_0^R e^{-r^2}\, r\, dr.
\]
The change of variables $r^2 = t$, $dt = 2r\, dr$, yields
\[
I^2 = \frac{\pi}{2} \cdot \frac{1}{2} \int_0^\infty e^{-t}\, dt = \frac{\pi}{4}
\quad \Longrightarrow \quad
I = \frac{\sqrt{\pi}}{2}.
\]
This proves the claim. In addition, the change of variables $\sqrt{x} = s$ also yields
\[
\Gamma\!\left( \tfrac{1}{2} \right) = \int_0^\infty \frac{e^{-x}}{\sqrt{x}}\, dx = \sqrt{\pi}.
\]
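The Gaussian integral just derived can be confirmed numerically; the tail beyond $T = 8$ is of order $e^{-64}$ and thus negligible. A Python sketch (my own illustration):

```python
import math

# Hypothetical sketch: I = int_0^inf e^{-x^2} dx equals sqrt(pi)/2;
# a midpoint Riemann sum on [0, 8] already agrees to high accuracy.
n, T = 200000, 8.0
dt = T / n
I = sum(math.exp(-((k + 0.5) * dt) ** 2) for k in range(n)) * dt
```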

Theorem 14.9 Let $\gamma$ be a path in an open set $U$ and $g$ be a continuous function on $U$. If $a$ is not on $\gamma$, define
\[
h(a) = \int_\gamma \frac{g(z)}{z - a}\, dz.
\]
Then $h$ is holomorphic on the complement of $\gamma$ in $U$ and has derivatives of all orders. They are given by
\[
h^{(n)}(a) = n! \int_\gamma \frac{g(z)}{(z - a)^{n+1}}\, dz.
\]

Proof. Let $b \in U$ and not on $\gamma$. Then there exists some $r > 0$ such that $|z - b| \ge r$ for all points $z$ on $\gamma$. Let $0 < s < r$. We show that $h$ has a power series expansion in the ball $U_s(b)$. We write
\[
\frac{1}{z - a} = \frac{1}{(z - b) - (a - b)}
= \frac{1}{z - b} \cdot \frac{1}{1 - \frac{a - b}{z - b}}
= \frac{1}{z - b} \left( 1 + \frac{a - b}{z - b} + \left( \frac{a - b}{z - b} \right)^2 + \cdots \right).
\]
This geometric series converges absolutely and uniformly for $|a - b| \le s$ because
\[
\left| \frac{a - b}{z - b} \right| \le \frac{s}{r} < 1.
\]
Since $g$ is continuous and $\gamma$ is a compact set, $g(z)$ is bounded on $\gamma$, such that by Theorem 6.6 the series $\sum_{n=0}^\infty g(z) \frac{(a - b)^n}{(z - b)^{n+1}}$ can be integrated term by term, and we find
\[
h(a) = \int_\gamma \sum_{n=0}^\infty g(z) \frac{(a - b)^n}{(z - b)^{n+1}}\, dz
= \sum_{n=0}^\infty (a - b)^n \int_\gamma \frac{g(z)}{(z - b)^{n+1}}\, dz
= \sum_{n=0}^\infty c_n (a - b)^n,
\]
where
\[
c_n = \int_\gamma \frac{g(z)\, dz}{(z - b)^{n+1}}.
\]
This proves that $h$ can be expanded into a power series in a neighborhood of $b$. By Proposition 14.1 and Corollary 14.2, $h$ has derivatives of all orders in a neighborhood of $b$, and by the formula in Corollary 14.2,
\[
h^{(n)}(b) = n!\, c_n = n! \int_\gamma \frac{g(z)\, dz}{(z - b)^{n+1}}.
\]

Remark 14.6 There is an easy way to deduce the formula. Formally, we can exchange the differentiation $\frac{d}{da}$ and $\int_\gamma$:
\[
h'(a) = \frac{d}{da} \int_\gamma \frac{g(z)}{z - a}\, dz
= \int_\gamma \frac{d}{da} \left( g(z)(z - a)^{-1} \right) dz
= \int_\gamma \frac{g(z)}{(z - a)^2}\, dz,
\]
\[
h''(a) = \int_\gamma \frac{d}{da} \left( g(z)(z - a)^{-2} \right) dz
= 2 \int_\gamma \frac{g(z)}{(z - a)^3}\, dz.
\]

Theorem 14.10 Suppose that $f$ is holomorphic in $U$ and $\overline{U_r(a)} \subset U$. Then $f$ has a power series expansion in $U_r(a)$,
\[
f(z) = \sum_{n=0}^\infty c_n (z - a)^n.
\]
In particular, $f$ has derivatives of all orders, and we have the following coefficient formula:
\[
c_n = \frac{f^{(n)}(a)}{n!} = \frac{1}{2\pi i} \int_{S_r(a)} \frac{f(z)\, dz}{(z - a)^{n+1}}. \tag{14.10}
\]
Proof. In view of Cauchy's Integral Formula (Theorem 14.8) we obtain
\[
f(a) = \frac{1}{2\pi i} \int_{S_r(a)} \frac{f(z)\, dz}{z - a}.
\]
Inserting $g(z) = f(z)/(2\pi i)$ ($f$ is continuous) into Theorem 14.9, we see that $f$ can be expanded into a power series with center $a$ and, therefore, it has derivatives of all orders at $a$, with
\[
f^{(n)}(a) = \frac{n!}{2\pi i} \int_{S_r(a)} \frac{f(z)\, dz}{(z - a)^{n+1}}.
\]
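The coefficient formula (14.10) is also easy to check numerically: the Taylor coefficients $c_n = 1/n!$ of $e^z$ at $a = 0$ are recovered from a contour integral over the unit circle. A Python sketch (my own illustration; the helper name is hypothetical):

```python
import cmath
import math

# Hypothetical sketch: recover the Taylor coefficients c_n = 1/n! of e^z at
# a = 0 from the coefficient formula (14.10) on the circle S_1(0).
def coefficient(f, a, n, r=1.0, m=4096):
    dt = 2 * cmath.pi / m
    total = 0j
    for k in range(m):
        t = (k + 0.5) * dt
        z = a + r * cmath.exp(1j * t)
        dz = 1j * r * cmath.exp(1j * t) * dt
        total += f(z) / (z - a) ** (n + 1) * dz
    return total / (2j * cmath.pi)

cs = [coefficient(cmath.exp, 0, n) for n in range(6)]
```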

14.2.4 Applications of the Coefficient Formula

Proposition 14.11 (Growth of Taylor Coefficients) Suppose that $f$ is holomorphic in $U$ and is bounded by $M > 0$ in $U_r(a) \subset U$; that is, $|f(z)| \le M$ for $|z - a| < r$. Then the Taylor coefficients $c_n$ of $f$ at $a$ satisfy
\[
|c_n| \le \frac{M}{r^n}.
\]

Proof. By the coefficient formula (14.10) and Remark 14.3 (c) we have, noting that
\[
\left| \frac{f(z)}{(z - a)^{n+1}} \right| \le \frac{M}{r^{n+1}} \quad \text{for } z \in S_r(a),
\]
\[
|c_n| = \left| \frac{1}{2\pi i} \int_{S_r(a)} \frac{f(z)}{(z - a)^{n+1}}\, dz \right|
\le \frac{1}{2\pi} \cdot \frac{M}{r^{n+1}} \cdot \ell(S_r(a))
= \frac{1}{2\pi} \cdot \frac{M}{r^{n+1}} \cdot 2\pi r = \frac{M}{r^n}.
\]

Theorem 14.12 (Liouville’s Theorem) A bounded entire function is constant.

Proof. Suppose that $|f(z)| \le M$ for all $z \in \mathbb{C}$. Since $f$ is given by a power series $f(z) = \sum_{n=0}^\infty c_n z^n$ with radius of convergence $R = \infty$, the previous proposition gives
\[
|c_n| \le \frac{M}{r^n}
\]
for all $r > 0$. This shows $c_n = 0$ for all $n \ne 0$; hence $f(z) = c_0$ is constant.

Remarks 14.7 (a) Note that we explicitly assume $f$ to be holomorphic on the entire complex plane. For example, $f(z) = e^{1/z}$ is holomorphic and bounded outside every ball $U_\varepsilon(0)$. However, $f$ is not constant.
(b) Note that $f(z) = \sin z$ is an entire function which is not constant. Hence, $\sin z$ is unbounded as a complex function.

Theorem 14.13 (Fundamental Theorem of Algebra) A polynomial $p(z)$ with complex coefficients of degree $\deg p \ge 1$ has a complex root.

Proof. Suppose to the contrary that $p(z) \ne 0$ for all $z \in \mathbb{C}$. It is known, see Example 3.3, that $\lim_{|z| \to \infty} |p(z)| = +\infty$. In particular there exists $R > 0$ such that
\[
|z| \ge R \implies |p(z)| \ge 1.
\]
That is, $f(z) = 1/p(z)$ is bounded by $1$ if $|z| \ge R$. On the other hand, $f$ is a continuous function and $\{ z \mid |z| \le R \}$ is a compact subset of $\mathbb{C}$; hence $f(z) = 1/p(z)$ is bounded on $\overline{U_R}$, too. That is, $f$ is bounded on the entire plane. By Liouville's theorem, $f$ is constant and so is $p$. This contradicts our assumption $\deg p \ge 1$. Hence, $p$ has a root in $\mathbb{C}$.

Now, there is an inverse-like statement to Cauchy's Theorem.

Theorem 14.14 (Morera's Theorem) Let $f\colon U \to \mathbb{C}$ be a continuous function, where $U \subset \mathbb{C}$ is open. Suppose that the integral of $f$ along each closed triangular path $[z_1, z_2, z_3]$ in $U$ is $0$. Then $f$ is holomorphic in $U$.

Proof. Fix $z_0 \in U$. We show that $f$ has an antiderivative in a small neighborhood $U_\varepsilon(z_0) \subset U$. For $a \in U_\varepsilon(z_0)$ define
\[
F(a) = \int_{z_0}^{a} f(z)\, dz.
\]
Note that $F(a)$ takes the same value for all polygonal paths from $z_0$ to $a$, by the assumption of the theorem. We have
\[
\left| \frac{F(a + h) - F(a)}{h} - f(a) \right| = \left| \frac{1}{h} \int_a^{a+h} \left( f(z) - f(a) \right) dz \right|,
\]
where the integral on the right is over the segment from $a$ to $a + h$ and we used $\int_a^{a+h} c\, dz = ch$. By Remark 14.3 (c), the right side is less than or equal to
\[
\frac{1}{|h|} \sup_{z \in U_{|h|}(a)} |f(z) - f(a)| \cdot |h| = \sup_{z \in U_{|h|}(a)} |f(z) - f(a)|.
\]
Since $f$ is continuous, the above term tends to $0$ as $h$ tends to $0$. This shows that $F$ is differentiable at $a$ with $F'(a) = f(a)$. Since $F$ is holomorphic in $U_\varepsilon(z_0)$, by Theorem 14.10 it has derivatives of all orders; in particular $f = F'$ is holomorphic.

Corollary 14.15 Suppose that (fn) is a sequence of holomorphic functions on U, uniformly converging to f on U. Then f is holomorphic on U.

Proof. Since the $f_n$ are continuous and uniformly converging, $f$ is continuous on $U$. Let $\gamma$ be any closed triangular path in $U$. Since $(f_n)$ converges uniformly, we may exchange integration and limit:
\[
\int_\gamma f(z)\, dz = \int_\gamma \lim_{n \to \infty} f_n(z)\, dz = \lim_{n \to \infty} \int_\gamma f_n(z)\, dz = \lim_{n \to \infty} 0 = 0,
\]
since each $f_n$ is holomorphic. By Morera's theorem, $f$ is holomorphic in $U$.

Summary

Let $U$ be a region and $f\colon U \to \mathbb{C}$ be a function on $U$. The following are equivalent:

(a) $f$ is holomorphic in $U$.

(b) $f = u + iv$ is real differentiable and the Cauchy–Riemann equations $u_x = v_y$ and $u_y = -v_x$ are satisfied in $U$.

(c) If $U$ is simply connected: $f$ is continuous and for every closed triangular path $\gamma = [z_1, z_2, z_3]$ in $U$, $\int_\gamma f(z)\, dz = 0$ (Morera condition).

(d) $f$ possesses locally an antiderivative; that is, for every $a \in U$ there is a ball $U_\varepsilon(a) \subset U$ and a holomorphic function $F$ such that $F'(z) = f(z)$ for all $z \in U_\varepsilon(a)$.

(e) $f$ is continuous and for every ball $U_r(a)$ with $\overline{U_r(a)} \subset U$ we have
\[
f(b) = \frac{1}{2\pi i} \int_{S_r(a)} \frac{f(z)}{z - b}\, dz \qquad \forall\, b \in U_r(a).
\]

(f) For every $a \in U$ there exists a ball with center $a$ such that $f$ can be expanded in that ball into a power series.

(g) For every ball $B$ which is completely contained in $U$, $f$ can be expanded into a power series in $B$.

14.2.5 Power Series

Since holomorphic functions are locally representable by power series, it is quite useful to know how to operate with power series. In case a holomorphic function $f$ is represented by a power series, we say that $f$ is an analytic function. Thus, by Theorem 14.10, any holomorphic function is analytic and vice versa.

(a) Uniqueness

If both $\sum c_n z^n$ and $\sum b_n z^n$ converge in a ball around $0$ and define the same function, then $c_n = b_n$ for all $n \in \mathbb{N}_0$.

(b) Multiplication

If both $\sum c_n z^n$ and $\sum b_n z^n$ converge in a ball $U_r(0)$ around $0$, then
\[
\sum_{n=0}^\infty c_n z^n \cdot \sum_{n=0}^\infty b_n z^n = \sum_{n=0}^\infty d_n z^n, \qquad |z| < r,
\]
where $d_n = \sum_{k=0}^n c_{n-k} b_k$.

Let $f(z) = \sum_{n=0}^\infty c_n z^n$ be a convergent power series with $c_0 \ne 0$. Then $f(0) = c_0 \ne 0$ and, by continuity of $f$, there exists $r > 0$ such that the power series converges in the ball $U_r(0)$ and is non-zero there. Hence, $1/f(z)$ is holomorphic in $U_r(0)$ and therefore it can be expanded into a converging power series in $U_r(0)$, see summary (f). Suppose that $1/f(z) = g(z) = \sum_{n=0}^\infty b_n z^n$, $|z| < r$. Then $f(z) g(z) = 1 = 1 + 0 z + 0 z^2 + \cdots$; the uniqueness (a) and the multiplication rule (b) yield
\[
1 = c_0 b_0, \quad 0 = c_0 b_1 + c_1 b_0, \quad 0 = c_0 b_2 + c_1 b_1 + c_2 b_0, \quad \dots
\]
This system of equations can be solved recursively for $b_n$, $n \in \mathbb{N}_0$; for example, $b_0 = 1/c_0$, $b_1 = -c_1 b_0 / c_0$.

(d) Double Series

Suppose that
\[
f_k(z) = \sum_{n=0}^\infty c_{kn} (z - a)^n, \qquad k \in \mathbb{N},
\]
are power series converging in $U_r(a)$. Suppose further that the series
\[
\sum_{k=1}^\infty f_k(z)
\]
converges locally uniformly in $U_r(a)$ as well. Then
\[
\sum_{k=1}^\infty f_k(z) = \sum_{n=0}^\infty \left( \sum_{k=1}^\infty c_{kn} \right) (z - a)^n.
\]
In particular, one can form the sum of a locally uniformly convergent series $\sum f_k(z)$ of power series coefficientwise. Note that a series of functions $\sum_{k=1}^\infty f_k(z)$ converges locally uniformly at $b$ if there exists $\varepsilon > 0$ such that the series converges uniformly in $U_\varepsilon(b)$.
Note that any locally uniformly converging series of holomorphic functions defines a holomorphic function (Theorem of Weierstraß). Indeed, since the series converges uniformly, line integral and summation can be exchanged: let $\gamma = [z_0, z_1, z_2]$ be any closed triangular path inside $U$; then by Cauchy's theorem
\[
\int_\gamma f(z)\, dz = \int_\gamma \sum_{k=1}^\infty f_k(z)\, dz = \sum_{k=1}^\infty \int_\gamma f_k(z)\, dz = \sum_{k=1}^\infty 0 = 0.
\]
By Morera's theorem, $f(z) = \sum_{k=1}^\infty f_k(z)$ is holomorphic.

(e) Change of Center
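The recursion from (c) for the reciprocal series is easy to implement. A Python sketch (my own illustration): for $f(z) = e^z$, i.e. $c_n = 1/n!$, the reciprocal must be $e^{-z}$, i.e. $b_n = (-1)^n / n!$.

```python
import math

# Hypothetical sketch of the recursion in (c): b_0 = 1/c_0 and
# b_n = -(1/c_0) * sum_{k=0}^{n-1} c_{n-k} b_k  (from 0 = sum_k c_{n-k} b_k).
def reciprocal_coeffs(c, N):
    b = [1.0 / c[0]]
    for n in range(1, N):
        b.append(-sum(c[n - k] * b[k] for k in range(n)) / c[0])
    return b

c = [1.0 / math.factorial(n) for n in range(8)]   # coefficients of e^z
b = reciprocal_coeffs(c, 8)                        # should be (-1)^n / n!
```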

Let $f(z) = \sum_{n=0}^\infty c_n (z - a)^n$ be convergent in $U_r(a)$, $r > 0$, and $b \in U_r(a)$. Then $f$ can be expanded into a power series with center $b$,
\[
f(z) = \sum_{n=0}^\infty b_n (z - b)^n, \qquad b_n = \frac{f^{(n)}(b)}{n!},
\]
with radius of convergence at least $r - |b - a|$. Also, the coefficients can be obtained by reordering for powers of $(z - b)^k$ using the binomial formula
\[
(z - a)^n = (z - b + b - a)^n = \sum_{k=0}^n \binom{n}{k} (z - b)^k (b - a)^{n-k}.
\]

(f) Composition

We restrict ourselves to the case

\[
f(z) = a_0 + a_1 z + a_2 z^2 + \cdots, \qquad
g(z) = b_1 z + b_2 z^2 + \cdots,
\]
where $g(0) = 0$ and therefore the image under $g$ of a small neighborhood of $0$ is again a small neighborhood of $0$, and we assume that the first power series $f$ is defined there; thus $f(g(z))$ is defined and holomorphic in a certain neighborhood of $0$, see Remark 14.1. Hence

\[
h(z) = f(g(z)) = c_0 + c_1 z + c_2 z^2 + \cdots,
\]
where the coefficients $c_n = h^{(n)}(0)/n!$ can be computed using the chain rule; for example, $c_0 = f(g(0)) = a_0$, $c_1 = f'(g(0))\, g'(0) = a_1 b_1$.

(g) The Composition Inverse $f^{-1}$

Suppose that $f(z) = \sum_{n=1}^\infty a_n z^n$, $a_1 \ne 0$, has radius of convergence $r > 0$. Then there exists a power series $g(z) = \sum_{n=1}^\infty b_n z^n$ converging on $U_\varepsilon(0)$ such that $f(g(z)) = z = g(f(z))$ for all $z \in U_\varepsilon(0)$. Using (f) and the uniqueness, the coefficients $b_n$ can be computed recursively.

Example 14.8 (a) The function
\[
f(z) = \frac{1}{1 + z^2} + \frac{1}{3 - z}
\]

is holomorphic in $\mathbb{C} \setminus \{ i, -i, 3 \}$. Expanding $f$ into a power series with center $1$, the closest singularities to $1$ are $\pm i$. Since the disc of convergence cannot contain $\pm i$, the radius of convergence is $|1 - i| = \sqrt{2}$. Expanding the power series around $a = 2$, the closest singularity of $f$ is $3$; hence, the radius of convergence is now $|3 - 2| = 1$.
(b) Change of center. We want to expand $f(z) = \frac{1}{1 - z}$, which is holomorphic in $\mathbb{C} \setminus \{ 1 \}$, into a power series around $b = i/2$. For arbitrary $b$ with $|b| < 1$ we have
\[
\frac{1}{1 - z} = \frac{1}{1 - b - (z - b)} = \frac{1}{1 - b} \cdot \frac{1}{1 - \frac{z - b}{1 - b}}
= \sum_{n=0}^\infty \frac{1}{(1 - b)^{n+1}} (z - b)^n = \tilde f(z).
\]
By the root test, the radius of convergence of this series is $|1 - b|$. In the case $b = i/2$ we have $r = |1 - i/2| = \sqrt{1 + 1/4} = \sqrt{5}/2$. Note that the power series $1 + z + z^2 + \cdots$ has radius of convergence $1$ and a priori defines an analytic (= holomorphic) function in the open unit ball. However, changing the center we obtain an analytic continuation of $f$ to a larger region. This example shows that (under certain assumptions) analytic functions can be extended into a larger region by changing the center of the series.
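The analytic continuation in Example 14.8 (b) can be seen numerically: the recentered series reproduces $1/(1-z)$ even at a point outside the unit disc, where the original series $\sum z^n$ diverges. A Python sketch (my own illustration; the evaluation point is an arbitrary choice):

```python
# Hypothetical sketch: the series around b = i/2 with coefficients
# 1/(1-b)^(n+1) reproduces f(z) = 1/(1-z) for |z - b| < |1 - b| = sqrt(5)/2,
# even for z outside the unit disc where sum z^n diverges.
b = 0.5j
z = 0.2 + 1.2j                     # |z| > 1, but |z - b| ~ 0.73 < sqrt(5)/2
s = sum((z - b) ** n / (1 - b) ** (n + 1) for n in range(200))
exact = 1 / (1 - z)
```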

14.3 Local Properties of Holomorphic Functions

We omit the proof of the Open Mapping Theorem and refer to [Jän93, Satz 11, Satz 13]; see also [Con78].

Theorem 14.16 (Open Mapping Theorem) Let $G$ be a region and $f$ a non-constant holomorphic function on $G$. Then for every open subset $U \subset G$, $f(U)$ is open.

The main idea is to show that any function $f$ holomorphic at some point $a$ with $f(a) = 0$ looks like a power function; that is, there exists a positive integer $k$ such that $f(z) = h(z)^k$ in a small neighborhood of $a$, where $h$ is holomorphic at $a$ with a zero of order $1$ at $a$.

Theorem 14.17 (Maximum Modulus Theorem) Let $f$ be holomorphic in the region $U$ and let $a \in U$ be a point such that $|f(a)| \ge |f(z)|$ for all $z \in U$. Then $f$ must be a constant function.

Proof. First proof. Let $V = f(U)$ and $b = f(a)$. By assumption, $|b| \ge |w|$ for all $w \in V$. Now $b$ cannot be an inner point of $V$, since otherwise there is some $c$ in a neighborhood of $b$ with $|c| > |b|$, which contradicts the assumption. Hence $b$ lies in $\partial V \cap V$. In particular, $V$ is not open. Hence the Open Mapping Theorem says that $f$ is constant.
Second proof. We give a direct proof using Cauchy's Integral Formula. For simplicity let $a = 0$ and let $U_r(0) \subseteq U$ be a small ball in $U$. By Cauchy's formula with $\gamma = S_r(0)$, $z = \gamma(t) = re^{it}$, $t \in [0, 2\pi]$, $dz = rie^{it}\, dt$, we get
\[
f(0) = \frac{1}{2\pi i} \int_0^{2\pi} \frac{f(re^{it})\, rie^{it}}{re^{it}}\, dt = \frac{1}{2\pi} \int_0^{2\pi} f(re^{it})\, dt.
\]
In other words, $f(0)$ is the arithmetic mean of the values of $f$ on any circle with center $0$. Let $M = |f(a)| \ge |f(z)|$ be the maximal modulus of $f$ on $U$. Suppose there exists $z_0 = r_0 e^{it_0}$ with $r_0 < r$ and $|f(z_0)| < M$. Since $f$ is continuous, there exists a whole neighborhood $t \in U_\varepsilon(t_0)$ with $|f(r_0 e^{it})| < M$. However, in this case
\[
M = |f(0)| = \frac{1}{2\pi} \left| \int_0^{2\pi} f(r_0 e^{it})\, dt \right|
\le \frac{1}{2\pi} \int_0^{2\pi} |f(r_0 e^{it})|\, dt < M,
\]
which contradicts the mean value property. Hence $|f(z)| = M$ is constant in any sufficiently small neighborhood of $0$. Let $z_1 \in U$ be any point in $U$. We connect $0$ and $z_1$ by a path in $U$. Let $d$ be its distance from the boundary $\partial U$. Let $z$ move continuously from $0$ to $z_1$ and consider the chain of balls with center $z$ and radius $d/2$. By the above, $|f(z)| = M$ in any such ball; hence $|f(z)| = M$ in $U$. It follows from Homework 47.2 that $f$ is constant.

Remark 14.8 In other words, if $f$ is holomorphic in $G$ and $U \subset G$, then $\sup_{z \in U} |f(z)|$ is attained on the boundary $\partial U$. Note that both theorems are not true in the real setting: the image of the open set $(0, 2\pi)$ under the sine function is $[-1, 1]$, which is not open. The maximum of $f(x) = 1 - x^2$ over $(-1, 1)$ is not attained on the boundary, since $f(-1) = f(1) = 0$ while $f(0) = 1$. However, $|z^2 - 1|$ on the closed complex unit ball attains its maximum at $z = \pm i$, that is, on the boundary.
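The last observation can be tested by brute force: sampling $|z^2 - 1|$ on a grid over the closed unit disc, the maximum (equal to $2$) is found on the boundary, at $z = \pm i$. A Python sketch (my own illustration; the grid resolution is arbitrary):

```python
# Hypothetical sketch: the maximum of |z^2 - 1| over the closed unit disc is 2,
# attained at z = +-i on the boundary; scan a grid over [-1, 1] x [-1, 1].
best, best_z = 0.0, 0j
m = 400
for i in range(m + 1):
    for j in range(m + 1):
        z = complex(-1 + 2 * i / m, -1 + 2 * j / m)
        if abs(z) <= 1 and abs(z * z - 1) > best:
            best, best_z = abs(z * z - 1), z
```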

Recall from topology:

• An accumulation point of a set $M \subset \mathbb{C}$ is a point $z \in \mathbb{C}$ such that for every $\varepsilon > 0$, $U_\varepsilon(z)$ contains infinitely many elements of $M$. Accumulation points are in the closure of $M$, not necessarily in the boundary of $M$. The set of accumulation points of $M$ is closed. (Indeed, suppose that $a$ is in the closure of the set of accumulation points of $M$. Then every neighborhood of $a$ meets the set of accumulation points of $M$; in particular, every neighborhood contains infinitely many elements of $M$. Hence $a$ itself is an accumulation point of $M$.)

• $M$ is connected if every locally constant function on $M$ is constant. $M$ is not connected if $M$ is the disjoint union of two non-empty subsets $A$ and $B$, both open and closed in $M$.

For example, $M = \{1/n \mid n \in \mathbb{N}\}$ has no accumulation point in $\mathbb{C}\setminus\{0\}$, but it has one accumulation point, $0$, in $\mathbb{C}$.

Proposition 14.18 Let $U$ be a region and let $f : U \to \mathbb{C}$ be holomorphic in $U$. If the set $Z(f)$ of the zeros of $f$ has an accumulation point in $U$, then $f$ is identically $0$ in $U$.

Example 14.9 Consider the holomorphic function $f(z) = \sin\frac{1}{z}$, $f : U \to \mathbb{C}$, $U = \mathbb{C}\setminus\{0\}$, with the set of zeros $Z(f) = \{z_n = \frac{1}{n\pi} \mid n \in \mathbb{Z}\}$. The only accumulation point $0$ of $Z(f)$ does not belong to $U$. The proposition does not apply.

Proof. Suppose $a \in U$ is an accumulation point of $Z(f)$. Expand $f$ into a power series with center $a$:
\[ f(z) = \sum_{n=0}^{\infty} c_n(z-a)^n, \quad |z-a| < r. \]
Since $a$ is an accumulation point of the zeros, there exists a sequence $(z_n)$ of zeros converging to $a$. Since $f$ is continuous at $a$, $\lim_{n\to\infty} f(z_n) = 0 = f(a)$. This shows $c_0 = 0$. The same argument works with the function
\[ f_1(z) = c_1 + c_2(z-a) + c_3(z-a)^2 + \cdots = \frac{f(z)}{z-a}, \]
which is holomorphic in the same ball with center $a$ and has $a$ as an accumulation point of zeros. Hence $c_1 = 0$. In the same way we conclude that $c_2 = c_3 = \cdots = c_n = \cdots = 0$. This shows that $f$ is identically $0$ on $U_r(a)$. That is, the set
\[ A = \{a \in U \mid a \text{ is an accumulation point of } Z(f)\} \]
is an open set. Also,
\[ B = \{a \in U \mid a \text{ is not an accumulation point of } Z(f)\} \]
is open (with every non-accumulation point $z$, there is a whole neighborhood of $z$ containing no accumulation points of $Z(f)$). Now $U$ is the disjoint union of $A$ and $B$, both open as well as closed in $U$. Hence the characteristic function of $A$ is a locally constant function on $U$. Since $U$ is connected, either $U = A$ or $U = B$. Since by assumption $A$ is non-empty, $A = U$; that is, $f$ is identically $0$ on $U$.
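A quick numerical illustration of Example 14.9 (a sketch; the cutoff 200 is an arbitrary choice): the points $z_n = 1/(n\pi)$ are zeros of $\sin(1/z)$ and march toward $0$, which lies outside $U$.

```python
import cmath, math

# Zeros z_n = 1/(n*pi) of f(z) = sin(1/z): f vanishes there (up to rounding),
# and the zeros accumulate at 0, which is not in U = C \ {0}.
def f(z):
    return cmath.sin(1 / z)

zeros = [1 / (n * math.pi) for n in range(1, 200)]
max_abs_value = max(abs(f(z)) for z in zeros)   # numerically zero at every z_n
closest = min(abs(z) for z in zeros)            # tends to 0 as n grows
```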

Theorem 14.19 (Uniqueness Theorem) Suppose that $f$ and $g$ are both holomorphic functions on $U$ and $U$ is a region. Then the following are equivalent:

(a) $f = g$.
(b) The set $D = \{z \in U \mid f(z) = g(z)\}$ where $f$ and $g$ are equal has an accumulation point in $U$.
(c) There exists $z_0 \in U$ such that $f^{(n)}(z_0) = g^{(n)}(z_0)$ for all non-negative integers $n \in \mathbb{N}_0$.

Proof. (a) $\Leftrightarrow$ (b): Apply the previous proposition to the function $f - g$. That (a) implies (c) is trivial. Suppose that (c) is satisfied. Then the power series expansion of $f - g$ at $z_0$ is identically $0$. In particular, the set $Z(f-g)$ contains a ball $B_\varepsilon(z_0)$, which has an accumulation point. Hence $f - g = 0$.

The following proposition is an immediate consequence of the uniqueness theorem.

Proposition 14.20 (Uniqueness of Analytic Continuation) Suppose that $M \subset U \subset \mathbb{C}$ where $U$ is a region and $M$ has an accumulation point in $U$. Let $g$ be a function on $M$ and suppose that $f$ is a holomorphic function on $U$ which extends $g$, that is, $f(z) = g(z)$ on $M$. Then $f$ is unique.

Remarks 14.9 (a) The previous proposition shows a quite amazing property of a holomorphic function: it is completely determined by "very few values". This is in striking contrast to $C^\infty$-functions on the real line. For example, the "hat function"
\[ h(x) = \begin{cases} e^{-\frac{1}{1-x^2}}, & |x| < 1, \\ 0, & |x| \ge 1 \end{cases} \]
is identically $0$ on $[2,3]$ (a set with accumulation points); however, $h$ is not identically $0$. This shows that $h$ is not holomorphic.
(b) For the uniqueness theorem it is an essential point that $U$ is connected.
(c) It is now clear that the real functions $e^x$, $\sin x$, and $\cos x$ have a unique analytic continuation into the complex plane.
(d) The algebra $\mathcal{O}(U)$ of holomorphic functions on a region $U$ is a domain, that is, $fg = 0$ implies $f = 0$ or $g = 0$. Indeed, suppose that $f(z_0) \ne 0$; then $f(z) \ne 0$ in a certain neighborhood of $z_0$ (by continuity of $f$). Then $g = 0$ on that neighborhood. Since an open set always has an accumulation point in itself, $g = 0$.

14.4 Singularities

We consider functions which are holomorphic in a punctured ball $\dot{U}_r(a)$. From information about the behaviour of the function near the center $a$, a number of interesting and useful results will be derived. In particular, we will use these results to evaluate certain improper integrals over the real line which cannot be evaluated by methods of real calculus.

14.4.1 Classification of Singularities

Throughout this subsection $U$ is a region, $a \in U$, and $f : U\setminus\{a\} \to \mathbb{C}$ is holomorphic.

Definition 14.4 (a) Let $f$ be holomorphic in $U\setminus\{a\}$ where $U$ is a region and $a \in U$. Then $a$ is said to be an isolated singularity of $f$.
(b) The point $a$ is called a removable singularity if there exists a holomorphic function $g : U_r(a) \to \mathbb{C}$ such that $g(z) = f(z)$ for all $z$ with $0 < |z-a| < r$.

Proposition 14.21 (Riemann, 1851) Suppose that $f : U\setminus\{a\} \to \mathbb{C}$, $a \in U$, is holomorphic. Then $a$ is a removable singularity of $f$ if and only if there exists a punctured neighborhood $\dot{U}_r(a)$ on which $f$ is bounded.

Proof. The necessity of the condition follows from the fact that a holomorphic function $g$ is continuous, and the continuous function $|g(z)|$ on the compact set $\overline{U_{r/2}(a)}$ is bounded; hence $f$ is bounded. For the sufficiency we assume without loss of generality $a = 0$ (if $a$ is non-zero, consider the function $\tilde{f}(z) = f(z+a)$ instead). The function
\[ h(z) = \begin{cases} z^2 f(z), & z \ne 0, \\ 0, & z = 0 \end{cases} \]
is holomorphic in $\dot{U}_r(0)$. Moreover, $h$ is differentiable at $0$ since $f$ is bounded in a neighborhood of $0$ and
\[ h'(0) = \lim_{z\to 0}\frac{h(z)-h(0)}{z} = \lim_{z\to 0} zf(z) = 0. \]
Thus, $h$ can be expanded into a power series at $0$,
\[ h(z) = c_0 + c_1 z + c_2 z^2 + c_3 z^3 + \cdots = c_2 z^2 + c_3 z^3 + \cdots, \]
with $c_0 = c_1 = 0$ since $h(0) = h'(0) = 0$. For non-zero $z$ we have
\[ f(z) = \frac{h(z)}{z^2} = c_2 + c_3 z + c_4 z^2 + \cdots. \]
The right side defines a holomorphic function in a neighborhood of $0$ which coincides with $f$ for $z \ne 0$. Setting $f(0) = c_2$ removes the singularity at $0$.

Definition 14.5 (a) An isolated singularity $a$ of $f$ is called a pole of $f$ if there exist a positive integer $m \in \mathbb{N}$ and a holomorphic function $g : U_r(a) \to \mathbb{C}$ such that
\[ f(z) = \frac{g(z)}{(z-a)^m}. \]
The smallest number $m$ such that $(z-a)^m f(z)$ has a removable singularity at $a$ is called the order of the pole.
(b) An isolated singularity $a$ of $f$ which is neither removable nor a pole is called an essential singularity.
(c) If $f$ is holomorphic at $a$ and there exist a positive integer $m$ and a holomorphic function $g$ such that $f(z) = (z-a)^m g(z)$ and $g(a) \ne 0$, then $a$ is called a zero of order $m$ of $f$.

Note that $m = 0$ corresponds to removable singularities. If $f(z)$ has a zero of order $m$ at $a$, then $1/f(z)$ has a pole of order $m$ at $a$, and vice versa.

Example 14.11 The function $f(z) = 1/z^2$ has a pole of order $2$ at $z = 0$ since $z^2 f(z) = 1$ has a removable singularity at $0$ while $zf(z) = 1/z$ does not. The function $f(z) = (\cos z - 1)/z^3$ has a pole of order $1$ at $0$ since $(\cos z - 1)/z^3 = -\frac{1}{2z} + \frac{z}{4!} \mp \cdots$.
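A numerical sanity check of the second claim (a sketch; the sample radius $10^{-3}$ is an arbitrary choice): multiplying $(\cos z - 1)/z^3$ by $z$ removes the singularity, and the limit is the leading Laurent coefficient $-1/2$.

```python
import cmath

# (cos z - 1)/z^3 has a simple pole at 0: z*f(z) -> -1/2, while |f(z)| blows up.
def f(z):
    return (cmath.cos(z) - 1) / z ** 3

val = 1e-3 * f(1e-3)     # close to -0.5
blow_up = abs(f(1e-3))   # of order 500
```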

14.4.2 Laurent Series

In a neighborhood of an isolated singularity a holomorphic function cannot be expanded into a power series; it can, however, be expanded into a so-called Laurent series.

Definition 14.6 A Laurent series with center $a$ is a series of the form
\[ \sum_{n=-\infty}^{\infty} c_n(z-a)^n, \]
or, more precisely, the pair of series
\[ f_-(z) = \sum_{n=1}^{\infty} c_{-n}(z-a)^{-n} \quad\text{and}\quad f_+(z) = \sum_{n=0}^{\infty} c_n(z-a)^n. \]
The Laurent series is said to be convergent if both series converge.

[Figure: the two regions of convergence with center $a$ — $f_+(z)$ converges in the disc $|z-a| < R$, $f_-(z)$ converges outside the disc, for $|z-a| > r$.]

Remark 14.10 (a) $f_-(z)$ is "a power series in $\frac{1}{z-a}$." Thus we can derive facts about the convergence of Laurent series from the convergence of power series. In fact, suppose that $1/r$ is the radius of convergence of the power series $\sum_{n=1}^{\infty} c_{-n}\zeta^n$ and $R$ is the radius of convergence of the series $\sum_{n=0}^{\infty} c_n z^n$; then the Laurent series $\sum_{n\in\mathbb{Z}} c_n(z-a)^n$ converges in the annulus $A_{r,R}(a) = \{z \mid r < |z-a| < R\}$.

Proposition 14.22 Suppose that $f$ is holomorphic in the open annulus $A_{r,R}(a) = \{z \mid r < |z-a| < R\}$. Then $f(z)$ has an expansion into a convergent Laurent series: for $z \in A_{r,R}$,
\[ f(z) = \sum_{n=0}^{\infty} c_n(z-a)^n + \sum_{n=1}^{\infty} c_{-n}\frac{1}{(z-a)^n} \tag{14.12} \]
with coefficients
\[ c_n = \frac{1}{2\pi i}\int_{S_\rho(a)} \frac{f(z)}{(z-a)^{n+1}}\,dz, \quad n \in \mathbb{Z}, \tag{14.13} \]
where $r < \rho < R$. The Laurent series converges uniformly on every compact subset of the annulus.

Proof. Let $z$ be in the annulus $A_{s_1,s_2}$ (with $r < s_1 < s_2 < R$) and let $\gamma$ be the closed path around $z$ in the annulus consisting of the two circles $-S_{s_1}(a)$, $S_{s_2}(a)$ and the two "bridges." By Cauchy's integral formula,
\[ f(z) = \frac{1}{2\pi i}\int_\gamma \frac{f(w)}{w-z}\,dw = f_1(z) + f_2(z) = \frac{1}{2\pi i}\int_{S_{s_2}(a)}\frac{f(w)}{w-z}\,dw - \frac{1}{2\pi i}\int_{S_{s_1}(a)}\frac{f(w)}{w-z}\,dw. \]
We consider the two functions
\[ f_1(z) = \frac{1}{2\pi i}\int_{S_{s_2}(a)}\frac{f(w)}{w-z}\,dw, \qquad f_2(z) = -\frac{1}{2\pi i}\int_{S_{s_1}(a)}\frac{f(w)}{w-z}\,dw \]
separately. In what follows we will see that $f_1(z)$ is a power series $\sum_{n=0}^{\infty} c_n(z-a)^n$ and $f_2(z) = \sum_{n=1}^{\infty} c_{-n}\frac{1}{(z-a)^n}$. The first part is completely analogous to the proof of Theorem 14.9.

Case 1: $w \in S_{s_2}(a)$. Then $|z-a| < |w-a|$ and $q = \left|\frac{z-a}{w-a}\right| < 1$, such that
\[ \frac{1}{w-z} = \frac{1}{w-a}\cdot\frac{1}{1-\frac{z-a}{w-a}} = \frac{1}{w-a}\sum_{n=0}^{\infty}\left(\frac{z-a}{w-a}\right)^n = \sum_{n=0}^{\infty}\frac{(z-a)^n}{(w-a)^{n+1}}. \]
Since $f(w)$ is bounded on $S_{s_2}(a)$, the geometric series has a convergent numerical upper bound. Hence the series converges uniformly with respect to $w$; we can exchange integration and summation:
\[ f_1(z) = \frac{1}{2\pi i}\int_{S_{s_2}(a)} f(w)\sum_{n=0}^{\infty}\frac{(z-a)^n}{(w-a)^{n+1}}\,dw = \sum_{n=0}^{\infty}(z-a)^n\,\frac{1}{2\pi i}\int_{S_{s_2}(a)}\frac{f(w)\,dw}{(w-a)^{n+1}} = \sum_{n=0}^{\infty} c_n(z-a)^n, \]
where $c_n = \frac{1}{2\pi i}\int_{S_{s_2}(a)}\frac{f(w)\,dw}{(w-a)^{n+1}}$ are the coefficients of the power series $f_1(z)$.

Case 2: $w \in S_{s_1}(a)$. Then $|z-a| > |w-a|$ and $q = \left|\frac{w-a}{z-a}\right| < 1$, such that
\[ \frac{1}{w-z} = -\frac{1}{z-a}\cdot\frac{1}{1-\frac{w-a}{z-a}} = -\sum_{n=0}^{\infty}\frac{(w-a)^n}{(z-a)^{n+1}}. \]
Since $f(w)$ is bounded on $S_{s_1}(a)$, the geometric series has a convergent numerical upper bound. Hence the series converges uniformly with respect to $w$; we can exchange integration and summation:
\[ f_2(z) = \frac{1}{2\pi i}\int_{S_{s_1}(a)} f(w)\sum_{n=0}^{\infty}\frac{(w-a)^n}{(z-a)^{n+1}}\,dw = \sum_{n=0}^{\infty}\frac{1}{(z-a)^{n+1}}\,\frac{1}{2\pi i}\int_{S_{s_1}(a)} f(w)(w-a)^n\,dw = \sum_{n=1}^{\infty} c_{-n}(z-a)^{-n}, \]
where $c_{-n} = \frac{1}{2\pi i}\int_{S_{s_1}(a)} f(w)(w-a)^{n-1}\,dw$ are the coefficients of the series $f_2(z)$.

Since the integrand $\frac{f(w)}{(w-a)^k}$, $k \in \mathbb{Z}$, is holomorphic in both annuli $A_{s_1,\rho}$ and $A_{\rho,s_2}$, by Remark 14.4 (c)
\[ \int_{S_{s_2}(a)}\frac{f(w)\,dw}{(w-a)^k} = \int_{S_\rho(a)}\frac{f(w)\,dw}{(w-a)^k} \quad\text{and}\quad \int_{S_{s_1}(a)}\frac{f(w)\,dw}{(w-a)^k} = \int_{S_\rho(a)}\frac{f(w)\,dw}{(w-a)^k}, \]
that is, in the coefficient formulas we can replace both circles $S_{s_1}(a)$ and $S_{s_2}(a)$ by a common circle $S_\rho(a)$. Since a power series converges uniformly on every compact subset of its disc of convergence, the last assertion follows.

Remark 14.11 The Laurent series of $f$ on $A_{r,R}(a)$ is unique. Its coefficients $c_n$, $n \in \mathbb{Z}$, are uniquely determined by (14.13). Another value of $\rho$ with $r < \rho < R$ yields the same coefficients.

Example 14.12 Find the Laurent expansion of $f(z) = \frac{2}{z^2 - 4z + 3}$ in the three annuli with midpoint $0$:
\[ 0 < |z| < 1, \qquad 1 < |z| < 3, \qquad 3 < |z|. \]
Using partial fraction decomposition, $f(z) = \frac{1}{1-z} + \frac{1}{z-3}$, we find:

(a) For $|z| < 1$,
\[ \frac{1}{1-z} = \sum_{n=0}^{\infty} z^n, \qquad \frac{1}{z-3} = -\frac{1}{3}\cdot\frac{1}{1-\frac{z}{3}} = -\frac{1}{3}\sum_{n=0}^{\infty}\left(\frac{z}{3}\right)^n. \]
Hence,
\[ f(z) = \sum_{n=0}^{\infty}\left(1 - \frac{1}{3^{n+1}}\right) z^n, \quad |z| < 1. \]
(b) In the case $|z| > 1$,
\[ \frac{1}{1-z} = -\frac{1}{z}\cdot\frac{1}{1-\frac{1}{z}} = -\sum_{n=0}^{\infty}\frac{1}{z^{n+1}}, \]
and, as in (a), for $|z| < 3$,
\[ \frac{1}{z-3} = -\frac{1}{3}\sum_{n=0}^{\infty}\left(\frac{z}{3}\right)^n, \]
such that
\[ f(z) = \sum_{n=1}^{\infty}\frac{-1}{z^n} + \sum_{n=0}^{\infty}\frac{-1}{3^{n+1}}\,z^n. \]
(c) In the case $|z| > 3$ we have
\[ \frac{1}{z-3} = \frac{1}{z}\cdot\frac{1}{1-\frac{3}{z}} = \sum_{n=0}^{\infty}\frac{3^n}{z^{n+1}}, \]
such that
\[ f(z) = \sum_{n=1}^{\infty}\left(3^{n-1} - 1\right)\frac{1}{z^n}. \]
We want to study the behaviour of $f$ in a neighborhood of an essential singularity $a$. It is characterized by the following theorem. For the proof see Conway, [Con78, p. 300].
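The three expansions can be checked against truncated partial sums (a sketch; the sample points and the truncation order are arbitrary choices):

```python
# Compare f(z) = 2/(z^2 - 4z + 3) with truncations of its three Laurent expansions.
def f(z):
    return 2 / (z * z - 4 * z + 3)

N = 60  # truncation order

def inner(z):   # expansion valid for |z| < 1
    return sum((1 - 1 / 3 ** (n + 1)) * z ** n for n in range(N))

def middle(z):  # expansion valid for 1 < |z| < 3
    return (sum(-z ** (-n) for n in range(1, N))
            + sum(-(z ** n) / 3 ** (n + 1) for n in range(N)))

def outer(z):   # expansion valid for |z| > 3
    return sum((3 ** (n - 1) - 1) * z ** (-n) for n in range(1, N))
```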

Theorem 14.23 (Great Picard Theorem (1879)) Suppose that $f(z)$ is holomorphic in the annulus $G = \{z \mid 0 < |z-a| < r\}$ and has an essential singularity at $a$. Then in every neighborhood of $a$ the function $f$ attains every complex value, with at most one exception, infinitely often.

Proposition 14.24 (Casorati–Weierstraß) Suppose that $f(z)$ is holomorphic in the annulus $G = \{z \mid 0 < |z-a| < r\}$ and has an essential singularity at $a$. Then the image of any neighborhood of $a$ in $G$ is dense in $\mathbb{C}$; that is, for every $w \in \mathbb{C}$ and any $\varepsilon > 0$ and $\delta > 0$ there exists $z \in G$ such that $|z-a| < \delta$ and $|f(z) - w| < \varepsilon$.

Proof. For simplicity, assume that $\delta < r$. Suppose, on the contrary, that there exist $w \in \mathbb{C}$ and $\varepsilon > 0$ such that $|f(z) - w| \ge \varepsilon$ for all $z \in \dot{U}_\delta(a)$. Then the function
\[ g(z) = \frac{1}{f(z)-w}, \quad z \in \dot{U}_\delta(a), \]
is bounded (by $1/\varepsilon$) in some neighborhood of $a$; hence, by Proposition 14.21, $a$ is a removable singularity of $g(z)$. We conclude that
\[ f(z) = \frac{1}{g(z)} + w \]
has a removable singularity at $a$ if $g(a) \ne 0$. If, on the other hand, $g(z)$ has a zero at $a$ of order $m$, that is,
\[ g(z) = \sum_{n=m}^{\infty} c_n(z-a)^n, \quad c_m \ne 0, \]
then the function $(z-a)^m f(z)$ has a removable singularity at $a$; thus $f$ has a pole of order $m$ at $a$. Both conclusions contradict our assumption that $f$ has an essential singularity at $a$.

The Laurent expansion establishes an easy classification of the singularity of $f$ at $a$. We summarize the main facts about isolated singularities.

Proposition 14.25 Suppose that $f(z)$ is holomorphic in the punctured disc $U = \dot{U}_R(a)$ and possesses there the Laurent expansion $f(z) = \sum_{n=-\infty}^{\infty} c_n(z-a)^n$. Then the singularity at $a$

(a) is removable if $c_n = 0$ for all $n < 0$. In this case, $|f(z)|$ is bounded near $a$.

(b) is a pole of order $m$ if $c_{-m} \ne 0$ and $c_n = 0$ for all $n < -m$. In this case, $\lim_{z\to a}|f(z)| = +\infty$.

(c) is an essential singularity if $c_n \ne 0$ for infinitely many $n < 0$. In this case, $|f(z)|$ has no finite or infinite limit as $z \to a$.

The easy proof is left to the reader. Note that Casorati–Weierstraß implies that $|f(z)|$ has no limit at $a$.

Example 14.13 $f(z) = e^{1/z}$ has in $\mathbb{C}\setminus\{0\}$ the Laurent expansion
\[ e^{\frac{1}{z}} = \sum_{n=0}^{\infty}\frac{1}{n!\,z^n}, \quad |z| > 0. \]
Since $c_{-n} \ne 0$ for all $n$, $f$ has an essential singularity at $0$.

14.5 Residues

Throughout, $U \subset \mathbb{C}$ is an open connected subset of $\mathbb{C}$.

Definition 14.7 Suppose that $f : \dot{U}_r(a) \to \mathbb{C}$ is holomorphic, $r > 0$, and let
\[ f(z) = \sum_{n\in\mathbb{Z}} c_n(z-a)^n, \qquad c_n = \frac{1}{2\pi i}\int_{S_{r_1}(a)}\frac{f(z)\,dz}{(z-a)^{n+1}}, \]
be the Laurent expansion of $f$ in the annulus $\{z \mid 0 < |z-a| < r\}$, where $0 < r_1 < r$. The coefficient $c_{-1}$ is called the residue of $f$ at $a$; we write $\operatorname{Res}_a f = c_{-1}$.

Remarks 14.12 (a) If $f$ is holomorphic at $a$, $\operatorname{Res}_a f = 0$ by Cauchy's theorem.
(b) The integral $\int_\gamma f(z)\,dz$ over a small closed path $\gamma$ around $a$ depends only on the coefficient $c_{-1}$ in the Laurent expansion of $f(z)$ around $a$. Indeed, every summand $c_n(z-a)^n$, $n \ne -1$, has an antiderivative in $U\setminus\{a\}$, such that its integral over a closed path is $0$.
(c) $\operatorname{Res}_a f + \operatorname{Res}_a g = \operatorname{Res}_a(f+g)$ and $\operatorname{Res}_a \lambda f = \lambda\operatorname{Res}_a f$.

Theorem 14.26 (Residue Theorem) Suppose that $f : U\setminus\{a_1,\dots,a_m\} \to \mathbb{C}$, $a_1,\dots,a_m \in U$, is holomorphic. Further, let $\gamma$ be a non-self-intersecting, positively oriented, closed curve in $U$ such that the points $a_1,\dots,a_m$ are in the inner part of $\gamma$. Then
\[ \int_\gamma f(z)\,dz = 2\pi i\sum_{k=1}^{m}\operatorname{Res}_{a_k} f. \tag{14.14} \]

Proof. As in Remark 14.4 we can replace $\gamma$ by the sum of integrals over small circles, one around each singularity. As before, we obtain
\[ \int_\gamma f(z)\,dz = \sum_{k=1}^{m}\int_{S_\varepsilon(a_k)} f(z)\,dz, \]
where all circles are positively oriented. Applying the definition of the residue we obtain the assertion.

Remarks 14.13 (a) The residue theorem generalizes Cauchy's Theorem, see Theorem 14.6. Indeed, if $f(z)$ possesses an analytic continuation to the points $a_1,\dots,a_m$, all the residues are zero and therefore $\int_\gamma f(z)\,dz = 0$.
(b) If $g(z)$ is holomorphic in the region $U$, $g(z) = \sum_{n=0}^{\infty} c_n(z-a)^n$, $c_0 = g(a)$, then
\[ f(z) = \frac{g(z)}{z-a}, \quad z \in U\setminus\{a\}, \]
is holomorphic in $U\setminus\{a\}$ with a Laurent expansion around $a$:
\[ f(z) = \frac{c_0}{z-a} + c_1 + c_2(z-a) + \cdots, \]
where $c_0 = g(a) = \operatorname{Res}_a f$. The residue theorem gives
\[ \int_{S_r(a)}\frac{g(z)}{z-a}\,dz = 2\pi i\operatorname{Res}_a f = 2\pi i\,c_0 = 2\pi i\,g(a). \]
We have recovered Cauchy's integral formula.

14.5.1 Calculating Residues

(a) Pole of order 1

As in the previous remark, suppose that $f$ has a pole of order $1$ at $a$ and $g(z) = (z-a)f(z)$ is the corresponding holomorphic function in $U_r(a)$. Then
\[ \operatorname{Res}_a f = g(a) = \lim_{z\to a,\, z\ne a} g(z) = \lim_{z\to a}(z-a)f(z). \tag{14.15} \]

(b) Pole of order m

Suppose that $f$ has a pole of order $m$ at $a$. Then
\[ f(z) = \frac{c_{-m}}{(z-a)^m} + \frac{c_{-m+1}}{(z-a)^{m-1}} + \cdots + \frac{c_{-1}}{z-a} + c_0 + c_1(z-a) + \cdots, \quad 0 < |z-a| < r, \]
\[ (z-a)^m f(z) = c_{-m} + c_{-m+1}(z-a) + \cdots + c_{-1}(z-a)^{m-1} + \cdots, \quad |z-a| < r. \]
Differentiating this $(m-1)$ times, all terms having coefficients $c_{-m}, c_{-m+1}, \dots, c_{-2}$ vanish and we are left with the power series
\[ \frac{d^{m-1}}{dz^{m-1}}\bigl((z-a)^m f(z)\bigr) = (m-1)!\,c_{-1} + m(m-1)\cdots 2\,c_0(z-a) + \cdots. \]
Inserting $z = a$ on the left, we obtain $c_{-1}$. However, on the left we have to take the limit $z \to a$ since $f$ is not defined at $a$. Thus, if $f$ has a pole of order $m$ at $a$,
\[ \operatorname{Res}_a f(z) = \frac{1}{(m-1)!}\lim_{z\to a}\frac{d^{m-1}}{dz^{m-1}}\bigl((z-a)^m f(z)\bigr). \tag{14.17} \]

(c) Quotients of Holomorphic Functions

Suppose that $f = \frac{p}{q}$ where $p$ and $q$ are holomorphic at $a$ and $q$ has a zero of order $1$ at $a$, that is, $q(a) = 0 \ne q'(a)$. Then, by (a),
\[ \operatorname{Res}_a\frac{p}{q} = \lim_{z\to a}(z-a)\frac{p(z)}{q(z)} = \lim_{z\to a}\frac{p(z)}{\frac{q(z)-q(a)}{z-a}} = \frac{\lim_{z\to a} p(z)}{\lim_{z\to a}\frac{q(z)-q(a)}{z-a}} = \frac{p(a)}{q'(a)}. \tag{14.18} \]

Example 14.14 Compute $\int_{S_1(i)}\frac{dz}{1+z^4}$. The only singularities of $f(z) = 1/(1+z^4)$ inside the disc $\{z \mid |z-i| < 1\}$ are $a_1 = e^{\pi i/4} = (1+i)/\sqrt{2}$ and $a_2 = e^{3\pi i/4} = (-1+i)/\sqrt{2}$. Indeed, $|a_1 - i|^2 = 2 - \sqrt{2} < 1$. We apply the Residue Theorem and (c) and obtain
\[ \int_{S_1(i)}\frac{dz}{1+z^4} = 2\pi i\left(\operatorname{Res}_{a_1} f + \operatorname{Res}_{a_2} f\right) = 2\pi i\left(\frac{1}{4a_1^3} + \frac{1}{4a_2^3}\right) = 2\pi i\,\frac{-a_1 - a_2}{4} = \frac{\sqrt{2}\,\pi}{2}. \]

14.6 Real Integrals

14.6.1 Rational Functions in Sine and Cosine

Suppose we have to compute the integral of such a function over a full period $[0, 2\pi]$. The idea is to replace $t$ by $z = e^{it}$ on the unit circle, $\cos t$ and $\sin t$ by $(z + 1/z)/2$ and $(z - 1/z)/(2i)$, respectively, and finally $dt = dz/(iz)$.

Proposition 14.27 Suppose that $R(x,y)$ is a rational function in two variables and $R(\cos t, \sin t)$ is defined for all $t \in [0, 2\pi]$. Then
\[ \int_0^{2\pi} R(\cos t, \sin t)\,dt = 2\pi i\sum_{a\in U_1(0)}\operatorname{Res}_a f(z), \tag{14.19} \]
where
\[ f(z) = \frac{1}{iz}\,R\!\left(\frac{1}{2}\left(z + \frac{1}{z}\right), \frac{1}{2i}\left(z - \frac{1}{z}\right)\right) \]
and the sum is over all isolated singularities of $f(z)$ in the open unit ball.

Proof. By the residue theorem,
\[ \int_{S_1(0)} f(z)\,dz = 2\pi i\sum_{a\in U_1(0)}\operatorname{Res}_a f. \]
Let $z = e^{it}$ for $t \in [0, 2\pi]$. Rewriting the integral on the left using $dz = ie^{it}\,dt = iz\,dt$,
\[ \int_{S_1(0)} f(z)\,dz = \int_0^{2\pi} R(\cos t, \sin t)\,dt \]
completes the proof.

Example 14.15 For $|a| < 1$,
\[ \int_0^{2\pi}\frac{dt}{1 - 2a\cos t + a^2} = \frac{2\pi}{1-a^2}. \]
For $a = 0$ the statement is trivially true; suppose now $a \ne 0$. The complex function corresponding to the integrand is
\[ f(z) = \frac{1}{iz\left(1 + a^2 - az - \frac{a}{z}\right)} = \frac{1}{i\left(-az^2 - a + (1+a^2)z\right)} = \frac{i/a}{(z-a)\left(z - \frac{1}{a}\right)}. \]
In the unit disc, $f(z)$ has exactly one pole of order $1$, namely $z = a$. By (14.15), the formula in Subsection 14.5.1,
\[ \operatorname{Res}_a f = \lim_{z\to a}(z-a)f(z) = \frac{i/a}{a - \frac{1}{a}} = \frac{i}{a^2 - 1}; \]
the assertion follows from the proposition:
\[ \int_0^{2\pi}\frac{dt}{1 - 2a\cos t + a^2} = 2\pi i\,\frac{i}{a^2-1} = \frac{2\pi}{1-a^2}. \]

Specializing $R = 1$ and $r = a \in \mathbb{R}$ in Homework 49.1, we obtain the same formula.
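A direct numerical check of Example 14.15, here with the arbitrary choice $a = 1/2$ (the equidistant rule converges very quickly for smooth periodic integrands):

```python
import math

# int_0^{2pi} dt / (1 - 2a cos t + a^2) should equal 2pi / (1 - a^2) for |a| < 1.
a = 0.5
N = 2000
integral = sum((2 * math.pi / N) / (1 - 2 * a * math.cos(2 * math.pi * k / N) + a * a)
               for k in range(N))
expected = 2 * math.pi / (1 - a * a)
```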

14.6.2 Integrals of the form $\int_{-\infty}^{\infty} f(x)\,dx$

(a) The Principal Value

We often compute improper integrals of the form $\int_{-\infty}^{\infty} f(x)\,dx$. Using the residue theorem, we calculate limits
\[ \lim_{R\to\infty}\int_{-R}^{R} f(x)\,dx, \tag{14.20} \]
which is called the principal value (or Cauchy mean value) of the integral over $\mathbb{R}$; we denote it by
\[ \mathrm{Vp}\int_{-\infty}^{\infty} f(x)\,dx. \]
The existence of the "coupled" limit (14.20) in general does not imply the existence of the improper integral
\[ \int_{-\infty}^{\infty} f(x)\,dx = \lim_{r\to-\infty}\int_{r}^{0} f(x)\,dx + \lim_{s\to\infty}\int_{0}^{s} f(x)\,dx. \]
For example, $\mathrm{Vp}\int_{-\infty}^{\infty} x\,dx = 0$, whereas $\int_{-\infty}^{\infty} x\,dx$ does not exist since $\int_0^{\infty} x\,dx = +\infty$. In general, the existence of the improper integral implies the existence of the principal value. If $f$ is an even function or $f(x) \ge 0$, the existence of the principal value implies the existence of the improper integral.

(b) Rational Functions

The main idea to evaluate the integral $\int_{\mathbb{R}} f(x)\,dx$ is as follows. Let $\mathbb{H} = \{z \mid \operatorname{Im}(z) > 0\}$ be the upper half-plane and let $f : \mathbb{H}\setminus\{a_1,\dots,a_m\} \to \mathbb{C}$ be holomorphic. Choose $R > 0$ large enough such that $|a_k| < R$ for all $k = 1,\dots,m$; that is, all isolated singularities of $f$ are in the upper half-disc of radius $R$ around $0$. Consider the path which consists of the segment from $-R$ to $R$ on the real line and the half-circle $\gamma_R$ of radius $R$. By the residue theorem,
\[ \int_{-R}^{R} f(x)\,dx + \int_{\gamma_R} f(z)\,dz = 2\pi i\sum_{k=1}^{m}\operatorname{Res}_{a_k} f(z). \]
If
\[ \lim_{R\to\infty}\int_{\gamma_R} f(z)\,dz = 0, \tag{14.21} \]
the above formula implies
\[ \lim_{R\to\infty}\int_{-R}^{R} f(x)\,dx = 2\pi i\sum_{k=1}^{m}\operatorname{Res}_{a_k} f. \]
Knowing the existence of the improper integral $\int_{-\infty}^{\infty} f(x)\,dx$, one has
\[ \int_{-\infty}^{\infty} f(x)\,dx = 2\pi i\sum_{k=1}^{m}\operatorname{Res}_{a_k} f. \]
Suppose that $f = \frac{p}{q}$ is a rational function such that $q$ has no real zeros and $\deg q \ge \deg p + 2$. Then (14.21) is satisfied. Indeed, since only the two leading terms of $p$ and $q$ determine the limit behaviour of $f(z)$ for $|z|\to\infty$, there exists $C > 0$ with $\left|\frac{p(z)}{q(z)}\right| \le \frac{C}{R^2}$ on $\gamma_R$. Using the estimate $M\ell(\gamma)$ from Remark 14.3 (c) we get
\[ \left|\int_{\gamma_R}\frac{p(z)}{q(z)}\,dz\right| \le \frac{C}{R^2}\,\ell(\gamma_R) = \frac{\pi C}{R}\;\xrightarrow{R\to\infty}\;0. \]
For the same reason, namely $|p(x)/q(x)| \le C/x^2$ for large $|x|$, the improper real integral exists (comparison test) and converges absolutely. Thus, we have shown the following proposition.

Proposition 14.28 Suppose that $p$ and $q$ are polynomials with $\deg q \ge \deg p + 2$. Further, $q$ has no real zeros and $a_1,\dots,a_m$ are all the poles of the rational function $f(z) = \frac{p(z)}{q(z)}$ in the open upper half-plane $\mathbb{H}$. Then
\[ \int_{-\infty}^{\infty} f(x)\,dx = 2\pi i\sum_{k=1}^{m}\operatorname{Res}_{a_k} f. \]

Example 14.16 (a) $\int_{-\infty}^{\infty}\frac{dx}{1+x^2}$. The only zero of $q(z) = z^2 + 1$ in $\mathbb{H}$ is $a_1 = i$, and $\deg(1+z^2) = 2 \ge \deg(1) + 2$, such that
\[ \int_{-\infty}^{\infty}\frac{dx}{1+x^2} = 2\pi i\operatorname{Res}_i\frac{1}{1+z^2} = 2\pi i\,\frac{1}{2z}\Big|_{z=i} = \pi. \]


(b) It follows from Example 14.14 that
\[ \int_{-\infty}^{\infty}\frac{dx}{1+x^4} = 2\pi i\left(\operatorname{Res}_{a_1} f + \operatorname{Res}_{a_2} f\right) = \frac{\pi\sqrt{2}}{2}. \]
(c) We compute the integral
\[ \int_0^{\infty}\frac{dt}{1+t^6} = \frac{1}{2}\int_{-\infty}^{\infty}\frac{dt}{1+t^6}. \]
The zeros of $q(z) = z^6 + 1$ in the upper half-plane are $a_1 = e^{i\pi/6}$, $a_2 = e^{i\pi/2} = i$, and $a_3 = e^{5i\pi/6}$. They are all of multiplicity $1$, such that formula (14.18) applies:
\[ \operatorname{Res}_{a_k}\frac{1}{q(z)} = \frac{1}{q'(a_k)} = \frac{1}{6a_k^5} = -\frac{a_k}{6}. \]
By Proposition 14.28, and noting that $a_1 + a_3 = i$,
\[ \int_0^{\infty}\frac{dt}{1+t^6} = \frac{1}{2}\cdot 2\pi i\left(-\frac{1}{6}\right)\left(e^{i\pi/6} + i + e^{5i\pi/6}\right) = -\frac{\pi i}{6}\cdot 2i = \frac{\pi}{3}. \]

(c) Functions of the Type $g(z)e^{i\alpha z}$

Proposition 14.29 Suppose that $p$ and $q$ are polynomials with $\deg q \ge \deg p + 1$. Further, $q$ has no real zeros and $a_1,\dots,a_m$ are all the poles of the rational function $g = \frac{p}{q}$ in the open upper half-plane $\mathbb{H}$. Put $f(z) = g(z)e^{i\alpha z}$, where $\alpha \in \mathbb{R}$ is positive, $\alpha > 0$. Then
\[ \int_{-\infty}^{\infty} f(x)\,dx = 2\pi i\sum_{k=1}^{m}\operatorname{Res}_{a_k} f. \]

Proof. (The proof was omitted in the lecture.)

[Figure: the rectangular contour with vertices $-r$, $r$, $r+ir$, $-r+ir$.]

Instead of a semi-circle it is more appropriate to consider a rectangle now. According to the residue theorem,
\[ \int_{-r}^{r} f(x)\,dx + \int_{r}^{r+ir} f(z)\,dz + \int_{r+ir}^{-r+ir} f(z)\,dz + \int_{-r+ir}^{-r} f(z)\,dz = 2\pi i\sum_{k=1}^{m}\operatorname{Res}_{a_k} f. \]
Since $\deg q \ge \deg p + 1$, $\lim_{z\to\infty}\frac{p(z)}{q(z)} = 0$. Thus $s_r = \sup_{|z|\ge r}\left|\frac{p(z)}{q(z)}\right|$ exists and tends to $0$ as $r\to\infty$.


Consider the second integral $I_2$ with $z = r + it$, $t \in [0,r]$, $dz = i\,dt$. On this segment we have the following estimate:
\[ \left|\frac{p(z)}{q(z)}\,e^{i\alpha(r+it)}\right| \le s_r\,e^{-\alpha t}, \]
which implies
\[ |I_2| \le s_r\int_0^r e^{-\alpha t}\,dt = \frac{s_r}{\alpha}\left(1 - e^{-\alpha r}\right) \le \frac{s_r}{\alpha}. \]
A similar estimate holds for the fourth integral from $-r+ir$ to $-r$. In case of the third integral one has $z = t + ir$, $t \in [-r,r]$, $dz = dt$, such that
\[ |I_3| \le s_r\int_{-r}^{r}\left|e^{i\alpha(t+ri)}\right|\,dt = \int_{-r}^{r} s_r e^{-\alpha r}\,dt = 2rs_r e^{-\alpha r}. \]
Since $2re^{-\alpha r}$ is bounded and $s_r \to 0$ as $r\to\infty$, all three integrals $I_2$, $I_3$, and $I_4$ tend to $0$ as $r\to\infty$. This completes the proof.

Example 14.17 For $a > 0$,
\[ \int_0^{\infty}\frac{\cos t}{t^2+a^2}\,dt = \frac{\pi}{2a}\,e^{-a}. \]
Obviously,
\[ \int_0^{\infty}\frac{\cos t}{t^2+a^2}\,dt = \frac{1}{2}\operatorname{Re}\left(\int_{-\infty}^{\infty}\frac{e^{it}}{t^2+a^2}\,dt\right). \]
The function $f(z) = \frac{e^{iz}}{z^2+a^2}$ has a single pole of order $1$ in the upper half-plane, at $z = ai$. By formula (14.18),
\[ \operatorname{Res}_{ai}\frac{e^{iz}}{z^2+a^2} = \frac{e^{iz}}{2z}\Big|_{z=ai} = \frac{e^{-a}}{2ai}. \]

Proposition 14.29 gives the result.
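A brute-force check of Example 14.17 with the arbitrary choice $a = 1$ (the integrand oscillates, so we integrate to a large cutoff; integration by parts shows the tail is of order $1/R^2$):

```python
import math

# int_0^inf cos t / (t^2 + 1) dt should equal (pi/2) * e^{-1}.
R, N = 500.0, 500_000
h = R / N
I = sum(math.cos((k + 0.5) * h) / (((k + 0.5) * h) ** 2 + 1.0) for k in range(N)) * h
expected = (math.pi / 2) * math.exp(-1.0)
```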

(d) A Fourier Transformation

Lemma 14.30 For $a \in \mathbb{R}$,
\[ \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{-\frac{1}{2}x^2 - iax}\,dx = e^{-\frac{1}{2}a^2}. \tag{14.22} \]

Proof.

[Figure: the closed rectangular path $\gamma = \gamma_1 + \gamma_2 + \gamma_3 + \gamma_4$ with vertices $-R$, $R$, $R+ai$, $-R+ai$.]

Let $f(z) = e^{-\frac{1}{2}z^2}$, $z \in \mathbb{C}$, and let $\gamma$ be the closed rectangular path $\gamma = \gamma_1+\gamma_2+\gamma_3+\gamma_4$ as in the picture. Since $f$ is an entire function, by Cauchy's theorem $\int_\gamma f(z)\,dz = 0$. Note that $\gamma_2$ is parametrized as $z = R + ti$, $t \in [0,a]$, $dz = i\,dt$, such that
\[ \int_{\gamma_2} f(z)\,dz = \int_0^a e^{-\frac{1}{2}(R+it)^2}\,i\,dt = \int_0^a e^{-\frac{1}{2}(R^2+2Rit-t^2)}\,i\,dt, \]
\[ \left|\int_{\gamma_2} f(z)\,dz\right| \le \int_0^a e^{-\frac{1}{2}R^2+\frac{1}{2}t^2}\,dt = e^{-\frac{1}{2}R^2}\int_0^a e^{\frac{1}{2}t^2}\,dt = C\,e^{-\frac{1}{2}R^2}. \]
Since $e^{-\frac{1}{2}R^2}$ tends to $0$ as $R\to\infty$, the above integral tends to $0$ as well; hence $\lim_{R\to\infty}\int_{\gamma_2} f(z)\,dz = 0$. Similarly, one can show that $\lim_{R\to\infty}\int_{\gamma_4} f(z)\,dz = 0$. Since $\int_\gamma f(z)\,dz = 0$, we have $\lim_{R\to\infty}\int_{\gamma_1+\gamma_3} f(z)\,dz = 0$, that is,
\[ \int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} f(x+ai)\,dx. \]
Using
\[ \int_{\mathbb{R}} e^{-\frac{1}{2}x^2}\,dx = \sqrt{2\pi}, \]
which follows from Example 14.7, page 374, or from Homework 41.3, we have
\[ \sqrt{2\pi} = \int_{\mathbb{R}} e^{-\frac{1}{2}x^2}\,dx = \int_{\mathbb{R}} e^{-\frac{1}{2}(x^2+2iax-a^2)}\,dx = e^{\frac{1}{2}a^2}\int_{\mathbb{R}} e^{-\frac{1}{2}x^2-iax}\,dx, \]
that is,
\[ \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{-\frac{1}{2}x^2-iax}\,dx = e^{-\frac{1}{2}a^2}. \]

Chapter 15

Partial Differential Equations I — an Introduction

15.1 Classification of PDE

15.1.1 Introduction

There is no general theory known concerning the solvability of all PDE. Such a theory is ex- tremely unlikely to exist, given the rich variety of physical, geometric, probabilistic phenomena which can be modelled by PDE. Instead, research focuses on various particular PDEs that are important for applications in mathematics and physics.

Definition 15.1 A partial differential equation (abbreviated as PDE) is an equation of the form

\[ F(x, y, \dots, u, u_x, u_y, \dots, u_{xx}, u_{xy}, \dots) = 0 \tag{15.1} \]
where $F$ is a given function of the independent variables $x, y, \dots$, of the unknown function $u$, and of a finite number of its partial derivatives. We call $u$ a solution of (15.1) if, after substitution of $u(x,y,\dots)$ and its partial derivatives, (15.1) is identically satisfied in some region $\Omega$ in the space of the independent variables $x, y, \dots$. The order of a PDE is the order of the highest derivative that occurs.

A PDE is called linear if it is linear in the unknown function $u$ and its derivatives $u_x$, $u_y$, $u_{xy}$, ..., with coefficients depending only on the variables $x, y, \dots$. In other words, a linear PDE can be written in the form
\[ G(u, u_x, u_y, \dots, u_{xx}, u_{xy}, \dots) = f(x, y, \dots), \tag{15.2} \]
where the function $f$ on the right depends only on the variables $x, y, \dots$ and $G$ is linear in all components with coefficients depending on $x, y, \dots$. More precisely, the formal differential operator $L(u) = G(u, u_x, u_y, \dots, u_{xx}, u_{xy}, \dots)$, which associates to each function $u(x,y,\dots)$ a new function $L(u)(x,y,\dots)$, is a linear operator. The linear PDE (15.2) ($L(u) = f$) is called homogeneous if $f = 0$ and inhomogeneous otherwise. For example, $\cos(xy^2)u_{xxy} - y^2u_x + u\sin x + \tan(x^2+y^2) = 0$ is a linear inhomogeneous PDE of order 3; the corresponding homogeneous linear PDE is $\cos(xy^2)u_{xxy} - y^2u_x + u\sin x = 0$.
A PDE is called quasi-linear if it is linear in all partial derivatives of order $m$ (the order of the PDE), with coefficients which depend on the variables $x, y, \dots$ and on partial derivatives of order less than $m$; for example, $u_xu_{xx} + u^2 = 0$ is quasi-linear, $u_{xy}u_{xx} + 1 = 0$ is not. Semi-linear equations are those quasi-linear equations in which the coefficients of the highest-order terms do not depend on $u$ and its partial derivatives; $\sin x\,u_{xx} + u^2 = 0$ is semi-linear, $u_xu_{xx} + u^2 = 0$ is not. Sometimes one considers systems of PDEs involving one or more unknown functions and their derivatives.

15.1.2 Examples

(1) The Laplace equation in $n$ dimensions for a function $u(x_1,\dots,x_n)$ is the linear second-order equation
\[ \Delta u = u_{x_1x_1} + \cdots + u_{x_nx_n} = 0. \]
The solutions $u$ are called harmonic (or potential) functions. In case $n = 2$ we associate with a harmonic function $u(x,y)$ its "conjugate" harmonic function $v(x,y)$ such that the first-order system of Cauchy–Riemann equations
\[ u_x = v_y, \qquad u_y = -v_x \]
is satisfied. A real solution $(u,v)$ gives rise to the analytic function $f(z) = u + iv$. The Poisson equation is
\[ \Delta u = f \]
for a given function $f : \Omega \to \mathbb{R}$. The Laplace equation models equilibrium states, while the Poisson equation is important in electrostatics. Laplace and Poisson equations always describe stationary processes (there is no time dependence).

(2) The heat equation. Here one coordinate $t$ is distinguished as the "time" coordinate, while the remaining coordinates $x_1,\dots,x_n$ represent spatial coordinates. We consider
\[ u : \Omega\times\mathbb{R}^+ \to \mathbb{R}, \quad \Omega \text{ open in } \mathbb{R}^n, \]
where $\mathbb{R}^+ = \{t \in \mathbb{R} \mid t > 0\}$ is the positive time axis, and pose the equation
\[ ku_t = \Delta u, \quad\text{where } \Delta u = u_{x_1x_1} + \cdots + u_{x_nx_n}. \]
The heat equation models heat conduction and other diffusion processes.

(3) The wave equation. With the same notation as in (2), here we have the equation
\[ u_{tt} - a^2\Delta u = 0. \]
It models wave and oscillation phenomena.

(4) The Korteweg–de Vries equation

\[ u_t - 6uu_x + u_{xxx} = 0 \]
models the propagation of waves in shallow water.

(5) The Monge–Ampère equation
\[ u_{xx}u_{yy} - u_{xy}^2 = f, \]
with a given function $f$, is used for finding surfaces with prescribed curvature.

(6) The Maxwell equations for the electric field strength E =(E1, E2, E3) and the magnetic field strength B =(B1, B2, B3) as functions of (t, x1, x2, x3):

\[ \operatorname{div} B = 0 \quad\text{(magnetostatic law)}, \]
\[ B_t + \operatorname{curl} E = 0 \quad\text{(magnetodynamic law)}, \]
\[ \operatorname{div} E = 4\pi\rho \quad\text{(electrostatic law, } \rho = \text{charge density)}, \]
\[ E_t - \operatorname{curl} B = -4\pi j \quad\text{(electrodynamic law, } j = \text{current density)}. \]

(7) The Navier–Stokes equations for the velocity v(x, t)=(v1, v2, v3) and the pressure p(x, t) of an incompressible fluid of density ρ and viscosity η:

\[ \rho v^j_t + \rho\sum_{i=1}^{3} v^i v^j_{x_i} - \eta\Delta v^j = -p_{x_j}, \quad j = 1,2,3, \]
\[ \operatorname{div} v = 0. \]

(8) The Schrödinger equation
\[ i\hbar u_t = -\frac{\hbar^2}{2m}\Delta u + V(x,u) \]
($m$ = mass, $V$ = given potential, $u : \Omega \to \mathbb{C}$) from quantum mechanics is formally similar to the heat equation, in particular in the case $V = 0$. The factor $i$, however, leads to crucial differences.

Classification

We have now seen many rather different-looking PDEs, and it is hopeless to develop a theory that can treat all these diverse equations. In order to proceed, we look for criteria to classify PDEs. Here are some possibilities:

(I) Algebraically.

(a) Linear equations are (1), (2), (3), (6) (which is of first order), and (8).
(b) Semi-linear equations are (4) and (7).
(c) A non-linear equation is (5).

Naturally, linear equations are simpler than non-linear ones. We shall therefore mostly study linear equations.

(II) The order of the equation. The Cauchy–Riemann equations and the Maxwell equations are linear first order equations. (1), (2), (3), (5), (7), (8) are of second order; (4) is of third order. Equations of higher order rarely occur. The most important PDEs are second order PDEs.

(III) Elliptic, parabolic, hyperbolic. In particular, for second-order equations the following classification turns out to be useful. Let $x = (x_1,\dots,x_n) \in \Omega$ and let
\[ F(x, u, u_{x_i}, u_{x_ix_j}) = 0 \]
be a second-order PDE. We introduce auxiliary variables $p_i$, $p_{ij}$, $i,j = 1,\dots,n$, and study the function $F(x, u, p_i, p_{ij})$. The equation is called elliptic in $\Omega$ if the matrix
\[ \left(F_{p_{ij}}(x, u(x), u_{x_i}(x), u_{x_ix_j}(x))\right)_{i,j=1,\dots,n} \]
of the first derivatives of $F$ with respect to the variables $p_{ij}$ is positive definite or negative definite for all $x \in \Omega$. This may depend on the function $u$. The Laplace equation is the prominent example of an elliptic equation. Example (5) is elliptic if $f(x) > 0$.
The equation is called hyperbolic if the above matrix has precisely one negative and $n-1$ positive eigenvalues (or conversely, depending on the choice of the sign). Example (3) is hyperbolic, and so is (5) if $f(x) < 0$.
Finally, the equation is parabolic if one eigenvalue of the above matrix is $0$ and all the other eigenvalues have the same sign. More precisely, the equation can be written in the form
\[ u_t = F(t, x, u, u_{x_i}, u_{x_ix_j}) \]
with an elliptic $F$.

(IV) According to solvability. We consider the second-order PDE F (x,u,uxi ,uxixj )=0 for

u: Ω Ê, and wish to impose additional conditions upon the solution u, typically → prescribing the values of u or of certain first derivatives of u on the boundary ∂Ω or part of it. Ideally such a boundary problem satisfies the three conditions of Hadamard for a well- posed problem

(a) Existence of a solution u for the given boundary values;
(b) Uniqueness of the solution;
(c) Stability, meaning continuous dependence on the boundary values.

Example 15.1 In the following examples Ω = ℝ² and u = u(x, y).
(a) Find all solutions u ∈ C²(ℝ²) with u_{xx} = 0. We first integrate with respect to x and find that u_x is independent of x, say u_x = a(y). We again integrate with respect to x and obtain u(x, y) = x a(y) + b(y) with arbitrary functions a and b. Note that the ODE u′′ = 0 has the general solution ax + b with constants a, b. Now the coefficients are functions of y.
(b) Solve u_{xx} + u = 0, u ∈ C²(ℝ²). The solution of the corresponding ODE u′′ + u = 0, u = u(x), u ∈ C²(ℝ), is a cos x + b sin x, such that the general solution of the corresponding PDE in the 2 variables x and y is a(y) cos x + b(y) sin x with arbitrary functions a and b.
(c) Solve u_{xy} = 0, u ∈ C²(ℝ²). First integrate ∂/∂y (u_x) = 0 with respect to y; we obtain u_x = f̃(x). Integration with respect to x yields u = ∫ f̃(x) dx + g(y) = f(x) + g(y), where f is differentiable and g is arbitrary.
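These general solutions can be checked symbolically; a minimal sketch using sympy (the placeholder functions a, b, f, g stand for the arbitrary functions in the example):

```python
import sympy as sp

x, y = sp.symbols('x y')
a, b, f, g = (sp.Function(n) for n in ('a', 'b', 'f', 'g'))

# (b): u = a(y) cos x + b(y) sin x should satisfy u_xx + u = 0
u = a(y) * sp.cos(x) + b(y) * sp.sin(x)
res_b = sp.simplify(sp.diff(u, x, 2) + u)

# (c): v = f(x) + g(y) should satisfy v_xy = 0
v = f(x) + g(y)
res_c = sp.simplify(sp.diff(v, x, 1, y, 1))

print(res_b, res_c)  # → 0 0
```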

15.2 First Order PDE — The Method of Characteristics

We solve first-order PDEs by the method of characteristics. It applies to quasi-linear equations

a(x, y, u) u_x + b(x, y, u) u_y = c(x, y, u)   (15.3)

as well as to the linear equation

a(x, y) u_x + b(x, y) u_y = c_0(x, y) u + c_1(x, y).   (15.4)

We restrict ourselves to the linear equation with an initial condition given as a parametric curve in the xyu-space

Γ = Γ(s) = (x_0(s), y_0(s), u_0(s)),  s ∈ (a, b) ⊆ ℝ.   (15.5)

The curve Γ will be called the initial curve. The initial condition then reads

u(x_0(s), y_0(s)) = u_0(s),  s ∈ (a, b).

The geometric idea behind this method is the following. The solution u = u(x, y) can be thought of as a surface in ℝ³ = {(x, y, u) | x, y, u ∈ ℝ}. Starting from a point (x(0, s), y(0, s), u(0, s)) = Γ(s) on the initial curve, we construct a characteristic curve in the surface u. If we do so for every point of the initial curve, we obtain a one-parameter family of characteristic curves; glueing all these curves we get the solution surface u. [Figure: the initial curve with an initial point, the characteristic curves emanating from it, and the resulting integral surface in xyu-space.] The linear equation (15.4) can be rewritten as

(a, b, c_0 u + c_1) · (u_x, u_y, −1) = 0.   (15.6)

Recall that (u_x, u_y, −1) is the normal vector to the surface (x, y, u(x, y)); that is, the tangent equation to u at (x_0, y_0, u_0) is

u − u_0 = u_x (x − x_0) + u_y (y − y_0)  ⇔  (x − x_0, y − y_0, u − u_0) · (u_x, u_y, −1) = 0.

It follows from (15.6) that (a, b, c_0 u + c_1) is a vector in the tangent plane. Finding a curve (x(t), y(t), u(t)) with exactly this tangent vector

(a(x(t), y(t)), b(x(t), y(t)), c_0(x(t), y(t)) u(t) + c_1(x(t), y(t)))

is equivalent to solving the ODE system

x′(t) = a(x(t), y(t)),   (15.7)

y′(t) = b(x(t), y(t)),   (15.8)

u′(t) = c_0(x(t), y(t)) u(t) + c_1(x(t), y(t)).   (15.9)

This system is called the characteristic equations. The solutions are called characteristic curves of the equation. Note that the above system is autonomous, i.e. there is no explicit dependence on the parameter t. In order to determine characteristic curves we need an initial condition. We shall require the initial point to lie on the initial curve Γ(s). Since each curve (x(t), y(t), u(t)) emanates from a different point Γ(s), we shall explicitly write the curves in the form (x(t, s), y(t, s), u(t, s)). The initial conditions are written as

x(0, s) = x_0(s),  y(0, s) = y_0(s),  u(0, s) = u_0(s).

Notice that we selected the parameter t such that the characteristic curve is located at the initial curve at t = 0. Note further that the parametrization (x(t, s), y(t, s), u(t, s)) represents a surface in ℝ³. The method of characteristics also applies to quasi-linear equations. To summarize the method: in the first step we identify the initial curve Γ. In the second step we select a point s on Γ and solve the characteristic equations using the point we selected on Γ as an initial point. After performing these steps for all points on Γ, we obtain a portion of the solution surface, also called the integral surface, which consists of the union of the characteristic curves.
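The recipe can also be carried out numerically. The equation u_x + 2u_y = u with u(x, 0) = sin x is an illustrative choice (not from the text); following the characteristics gives the exact solution u = e^{y/2} sin(x − y/2):

```python
import math

def solve_characteristic(s, t_end, n=2000):
    """Euler-integrate the characteristic ODEs (15.7)-(15.9) with
    a = 1, b = 2, c0 = 1, c1 = 0, starting on the initial curve
    (x0(s), y0(s), u0(s)) = (s, 0, sin s)."""
    x, y, u = s, 0.0, math.sin(s)
    dt = t_end / n
    for _ in range(n):
        x += dt * 1.0   # x' = a(x, y) = 1
        y += dt * 2.0   # y' = b(x, y) = 2
        u += dt * u     # u' = c0*u + c1 = u
    return x, y, u

# follow the characteristic from s = 0.3 up to t = 1 and compare
x, y, u = solve_characteristic(0.3, 1.0)
exact = math.exp(y / 2) * math.sin(x - y / 2)
print(abs(u - exact) < 1e-3)  # → True
```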

Example 15.2 Solve the equation

u_x + u_y = 2

subject to the initial condition u(x, 0) = x². The characteristic equations and the parametric initial conditions are

x_t(t, s) = 1,  y_t(t, s) = 1,  u_t(t, s) = 2,  x(0, s) = s,  y(0, s) = 0,  u(0, s) = s².

It is easy to solve the characteristic equations:

x(t, s) = t + f_1(s),  y(t, s) = t + f_2(s),  u(t, s) = 2t + f_3(s).

Inserting the initial conditions, we find

x(t, s) = t + s,  y(t, s) = t,  u(t, s) = 2t + s².

We have obtained a parametric representation of the integral surface. To find an explicit representation we have to invert the transformation (x(t, s), y(t, s)) in the form (t(x, y), s(x, y)); namely, we have to solve for s and t. In the current example we find t = y, s = x − y. Thus the explicit formula for the integral surface is

u(x, y) = 2y + (x − y)².

Remark 15.1 (a) This simple example might lead us to think that each initial value problem for a first-order PDE possesses a unique solution. But this is not the case. Is the problem (15.3) together with the initial condition (15.5) well-posed? Under which conditions does there exist a unique integral surface that contains the initial curve?
(b) Notice that even if the PDE is linear, the characteristic equations are non-linear. It follows that one can expect at most a local existence theorem for a first-order PDE.
(c) The inversion of the parametric representation of the integral surface might hide further difficulties. Recall that the implicit function theorem implies that the inversion locally exists if the Jacobian ∂(x, y)/∂(t, s) ≠ 0. An explicit computation of the Jacobian at a point s of the initial curve gives

J = (∂x/∂t)(∂y/∂s) − (∂x/∂s)(∂y/∂t) = a y_0′ − b x_0′,

the determinant of the matrix with rows (a, b) and (x_0′, y_0′).

Thus, the Jacobian vanishes at some point if and only if the vectors (a, b) and (x_0′, y_0′) are linearly dependent. The geometrical meaning of J = 0 is that the projection of Γ into the xy plane is tangent to the projection of the characteristic curve into the xy plane. To ensure a unique solution near the initial curve we must have J ≠ 0. This condition is called the transversality condition.
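The explicit solution of Example 15.2 and the transversality condition of Remark 15.1 (c) can be checked symbolically; a small sympy sketch:

```python
import sympy as sp

x, y, s = sp.symbols('x y s')

# the solution of Example 15.2
u = 2*y + (x - y)**2
assert sp.simplify(sp.diff(u, x) + sp.diff(u, y) - 2) == 0  # PDE u_x + u_y = 2
assert sp.simplify(u.subs(y, 0) - x**2) == 0                # initial condition

# transversality: J = a*y0' - b*x0' with a = b = 1, x0(s) = s, y0(s) = 0
x0, y0 = s, sp.Integer(0)
J = 1 * sp.diff(y0, s) - 1 * sp.diff(x0, s)
print(J)  # → -1, non-zero, so a unique solution exists near the initial curve
```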

Example 15.3 (Well-posed and Ill-posed Problems) (a) Solve u_x = 1 subject to the initial condition u(0, y) = g(y). The characteristic equations and the initial conditions are given by

x_t = 1,  y_t(t, s) = 0,  u_t(t, s) = 1,  x(0, s) = 0,  y(0, s) = s,  u(0, s) = g(s).

The parametric integral surface is (x(t, s), y(t, s), u(t, s)) = (t, s, t + g(s)), such that the explicit solution is u(x, y) = x + g(y).

(b) If we keep u_x = 1 but modify the initial condition into u(x, 0) = h(x), the picture changes dramatically.

x_t = 1,  y_t(t, s) = 0,  u_t(t, s) = 1,  x(0, s) = s,  y(0, s) = 0,  u(0, s) = h(s).

In this case the parametric solution is

(x(t, s), y(t, s), u(t, s)) = (t + s, 0, t + h(s)).

Now, however, the transformation x = t + s, y = 0 cannot be inverted. Geometrically, the projection of the initial curve is the x axis, but this is also the projection of the characteristic curve. In the special case h(x) = x + c for some constant c, we obtain u(t, s) = t + s + c. Then it is not necessary to invert (x, y) since we find at once u = x + c + f(y) for every differentiable function f with f(0) = 0. We have infinitely many solutions — uniqueness fails.
(c) However, for any other choice of h existence fails — the problem has no solution at all. Note that the Jacobian is

J = a y_0′ − b x_0′ = 1·0 − 0·1 = 0.

Remark 15.2 Because of the special role played by the projections of the characteristics on the xy plane, we also use the term characteristics to denote them. In case of the linear PDE (15.4) the ODE for the projection is

x′(t) = dx/dt = a(x(t), y(t)),  y′(t) = dy/dt = b(x(t), y(t)),   (15.10)

which yields y′(x) = dy/dx = b(x, y)/a(x, y).

15.3 Classification of Semi-Linear Second-Order PDEs

15.3.1 Quadratic Forms

We recall some basic facts about quadratic forms and symmetric matrices.

Proposition 15.1 (Sylvester’s Law of Inertia) Suppose that A ∈ ℝ^{n×n} is a symmetric matrix.

(a) Then there exist an invertible matrix B ∈ ℝ^{n×n}, numbers r, s, t ∈ ℕ_0 with r + s + t = n, and a diagonal matrix D = diag(d_1, d_2, ..., d_{r+s}, 0, ..., 0) with d_i > 0 for i = 1, ..., r and d_i < 0 for i = r + 1, ..., r + s such that

B A B^⊤ = diag(d_1, ..., d_{r+s}, 0, ..., 0).

We call (r, s, t) the signature of A.
(b) The signature does not depend on the change of coordinates, i.e. if there exists another regular matrix B′ and a diagonal matrix D′ with D′ = B′A(B′)^⊤, then the signatures of D′ and D coincide.
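Sylvester's law can be illustrated numerically: congruence transformations B A B^⊤ preserve the signature. The concrete matrix and random seed below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def signature(A, tol=1e-10):
    """(r, s, t): the numbers of positive, negative, and zero eigenvalues."""
    w = np.linalg.eigvalsh(A)
    return (int(np.sum(w > tol)), int(np.sum(w < -tol)),
            int(np.sum(np.abs(w) <= tol)))

# a symmetric matrix with signature (2, 1, 1)
A = np.diag([3.0, 1.0, -2.0, 0.0])
B = rng.normal(size=(4, 4))   # invertible with probability 1
print(signature(A), signature(B @ A @ B.T))  # → (2, 1, 1) (2, 1, 1)
```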

15.3.2 Elliptic, Parabolic and Hyperbolic

Consider the semi-linear second-order PDE in n variables x_1, ..., x_n in a region Ω ⊂ ℝⁿ

∑_{i,j=1}^{n} a_{ij}(x) u_{x_i x_j} + F(x, u, u_{x_1}, ..., u_{x_n}) = 0   (15.11)

with continuous coefficients a_{ij}(x). Since we assume u ∈ C²(Ω), by Schwarz’s lemma we assume without loss of generality that a_{ij} = a_{ji}. Using the terminology of the introduction

(Classification (III), see page 404) we find that the matrix A(x) := (a_{ij}(x))_{i,j=1,...,n} coincides with the matrix (F_{p_{ij}})_{i,j} defined therein.

Definition 15.2 We call the PDE (15.11) elliptic at x_0 if the matrix A(x_0) is positive definite or negative definite. We call it parabolic at x_0 if A(x_0) is positive or negative semidefinite with exactly one eigenvalue 0. We call it hyperbolic if A(x_0) has the signature (n − 1, 1, 0), i.e. A is indefinite with n − 1 positive eigenvalues, one negative eigenvalue, and no zero eigenvalue (or vice versa).

15.3.3 Change of Coordinates

First we study how the coefficients a_{ij} change if we impose a non-singular transformation of coordinates y = ϕ(x);

y_l = ϕ_l(x_1, ..., x_n),  l = 1, ..., n.

The transformation is called non-singular if the Jacobian ∂(ϕ_1, ..., ϕ_n)/∂(x_1, ..., x_n)(x_0) is non-zero at every point x_0 ∈ Ω. By the Inverse Mapping Theorem, the transformation possesses locally an inverse transformation denoted by x = ψ(y),

x_l = ψ_l(y_1, ..., y_n),  l = 1, ..., n.

Putting ũ(y) := u(ψ(y)), we have u(x) = ũ(ϕ(x)), and if moreover ϕ_l ∈ C²(Ω) we have by the chain rule

u_{x_i} = ∑_{l=1}^{n} ũ_{y_l} ∂ϕ_l/∂x_i,

u_{x_i x_j} = (u_{x_i})_{x_j} = ∑_{k,l=1}^{n} ũ_{y_l y_k} (∂ϕ_l/∂x_i)(∂ϕ_k/∂x_j) + ∑_{l=1}^{n} ũ_{y_l} ∂²ϕ_l/(∂x_i ∂x_j).   (15.12)

Inserting (15.12) into (15.11) one has

∑_{k,l=1}^{n} ũ_{y_l y_k} ∑_{i,j=1}^{n} a_{ij} (∂ϕ_l/∂x_i)(∂ϕ_k/∂x_j) + ∑_{l=1}^{n} ũ_{y_l} ∑_{i,j=1}^{n} a_{ij} ∂²ϕ_l/(∂x_i ∂x_j) + F̃(y, ũ, ũ_{y_1}, ..., ũ_{y_n}) = 0.   (15.13)

We denote by ã_{lk} the new coefficients of the partial second derivatives of ũ,

ã_{lk} = ∑_{i,j=1}^{n} a_{ij}(x) (∂ϕ_l/∂x_i)(∂ϕ_k/∂x_j),   (15.14)

and write (15.13) in the same form as (15.11):

∑_{k,l=1}^{n} ã_{lk}(y) ũ_{y_l y_k} + F̃(y, ũ, ũ_{y_1}, ..., ũ_{y_n}) = 0.

Equation (15.14) later plays a crucial role in simplifying PDE (15.11): if we want some of the coefficients ã_{lk} to be 0, the right-hand side of (15.14) has to be 0. Writing

b_{lj} = ∂ϕ_l/∂x_j,  l, j = 1, ..., n,  B = (b_{lj}),

the new coefficient matrix Ã(y) = (ã_{lk}(y)) reads as follows:

Ã = B · A · B^⊤.

By Proposition 15.1, A and Ã have the same signature. We have shown the following proposition.

Proposition 15.2 The type of a semi-linear second-order PDE is invariant under a change of coordinates.

Notation. We call the operator L with

L(u) = ∑_{i,j=1}^{n} a_{ij}(x) ∂²u/(∂x_i ∂x_j) + F(x, u, u_{x_1}, ..., u_{x_n})

a differential operator, and denote by L_2

L_2(u) = ∑_{i,j=1}^{n} a_{ij}(x) ∂²u/(∂x_i ∂x_j)

the sum of its highest-order terms; L_2 is a linear operator.

Definition 15.3 The second-order PDE L(u) = 0 has normal form if

L_2(u) = ∑_{j=1}^{m} u_{x_j x_j} − ∑_{j=m+1}^{r} u_{x_j x_j}

with some positive integers m ≤ r ≤ n.

Remarks 15.3 (a) It can happen that the type of the equation depends on the point x_0 ∈ Ω. For example, the Tricomi equation

y u_{xx} + u_{yy} = 0

is of mixed type. More precisely, it is elliptic if y > 0, parabolic if y = 0, and hyperbolic if y < 0.
(b) The Laplace equation is elliptic, the heat equation is parabolic, the wave equation is hyperbolic.
(c) The classification is not complete in case n ≥ 3; for example, the quadratic form can be of type (n − 2, 1, 1).
(d) Case n = 2. The PDE

a u_{xx} + 2b u_{xy} + c u_{yy} + F(x, y, u, u_x, u_y) = 0

with coefficients a = a(x, y), b = b(x, y), and c = c(x, y) is elliptic, parabolic, or hyperbolic at (x_0, y_0) if and only if ac − b² > 0, ac − b² = 0, or ac − b² < 0 at (x_0, y_0), respectively.
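The n = 2 criterion can be sketched as a small classification helper; applying it to the Tricomi equation (a = y, b = 0, c = 1) reproduces the mixed type of Remark 15.3 (a):

```python
def classify(a, b, c):
    """Type of a*u_xx + 2b*u_xy + c*u_yy + F = 0 at a point, by sign of ac - b^2."""
    d = a * c - b * b
    if d > 0:
        return "elliptic"
    if d == 0:
        return "parabolic"
    return "hyperbolic"

# Tricomi equation: a = y, b = 0, c = 1 at y = 1, y = 0, y = -1
print(classify(1.0, 0, 1), classify(0.0, 0, 1), classify(-1.0, 0, 1))
# → elliptic parabolic hyperbolic
```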

15.3.4 Characteristics

Suppose we are given the semi-linear second-order PDE in Ω ⊂ ℝⁿ

∑_{i,j=1}^{n} a_{ij}(x) ∂²u/(∂x_i ∂x_j) + F(x, u, u_{x_1}, ..., u_{x_n}) = 0   (15.15)

with continuous a_{ij}; a_{ij}(x) = a_{ji}(x). We define the concept of characteristics, which plays an important role in the theory of PDEs, not only of second-order PDEs.

Definition 15.4 Suppose that σ ∈ C¹(Ω) is continuously differentiable, grad σ ≠ 0, and

(a) for some point x_0 of the hypersurface F = {x ∈ Ω | σ(x) = c}, c ∈ ℝ, we have

∑_{i,j=1}^{n} a_{ij}(x_0) (∂σ(x_0)/∂x_i)(∂σ(x_0)/∂x_j) = 0.   (15.16)

Then F is said to be characteristic at x_0.
(b) If F is characteristic at every point of F, then F is called a characteristic hypersurface or simply a characteristic of the PDE (15.11). Equation (15.16) is called the characteristic equation of (15.11).

In case n = 2 we speak of characteristic lines. If all hypersurfaces σ(x) = c are characteristic, then, choosing a new coordinate

y_1 = σ(x),

we see from (15.14) that ã_{11} = 0. That is, the knowledge of one or more characteristic hypersurfaces can simplify the PDE.

Example 15.4 (a) The characteristic equation of u_{xy} = 0 is σ_x σ_y = 0, such that σ_x = 0 and σ_y = 0 define the characteristic lines; the lines parallel to the coordinate axes, y = c_1 and x = c_2, are the characteristics.
(b) Find type and characteristic lines of

x² u_{xx} − y² u_{yy} = 0,  x ≠ 0, y ≠ 0.

Since det A = x²(−y²) − 0 = −x²y² < 0, the equation is hyperbolic. The characteristic equation, in the most general case, is

a σ_x² + 2b σ_x σ_y + c σ_y² = 0.

Since grad σ ≠ 0, σ(x, y) = c is locally solvable for y = y(x), such that y′ = −σ_x/σ_y. Another way to obtain this is as follows: differentiating the equation σ(x, y) = c yields σ_x dx + σ_y dy = 0, or dy/dx = −σ_x/σ_y. Inserting this into the previous equation we obtain a quadratic ODE

a (y′)² − 2b y′ + c = 0,

with solutions

y′ = (b ± √(b² − ac))/a,  if a ≠ 0.

We see that an elliptic equation has no characteristic lines, a parabolic equation has one family of characteristics, and a hyperbolic equation has two families of characteristic lines.

Hyperbolic case. In general, if c_1 = ϕ_1(x, y) is the first family of characteristic lines and c_2 = ϕ_2(x, y) is the second family of characteristic lines, then

ξ = ϕ_1(x, y),  η = ϕ_2(x, y)

is the appropriate change of variables, which gives ã = c̃ = 0. The transformed equation then reads

2b̃ ũ_{ξη} + F̃(ξ, η, ũ, ũ_ξ, ũ_η) = 0.

Parabolic case. Since det A = 0, there is only one real family of characteristic hyperplanes, say c_1 = ϕ_1(x, y). We impose the change of variables

z = ϕ_1(x, y),  y = y.

Since det Ã = 0, the coefficients b̃ vanish (together with ã). The transformed equation reads

c̃ ũ_{yy} + F(z, y, ũ, ũ_z, ũ_y) = 0.

The above two equations are called the characteristic forms of the PDE (15.11). In our case the characteristic equation is

x² (y′)² − y² = 0,  y′ = ±y/x.

This yields

dy/y = ±dx/x,  log |y| = ±log |x| + c_0.

We obtain the two families of characteristic lines

y = c_1 x,  y = c_2/x.

Indeed, in our example

ξ = y/x = c_1,  η = xy = c_2

gives

η_x = y,  η_y = x,  η_{xx} = 0,  η_{yy} = 0,  η_{xy} = 1,
ξ_x = −y/x²,  ξ_y = 1/x,  ξ_{xx} = 2y/x³,  ξ_{yy} = 0,  ξ_{xy} = −1/x².

In our case (15.12) reads

u_{xx} = ũ_{ξξ} ξ_x² + 2ũ_{ξη} ξ_x η_x + ũ_{ηη} η_x² + ũ_ξ ξ_{xx} + ũ_η η_{xx},
u_{yy} = ũ_{ξξ} ξ_y² + 2ũ_{ξη} ξ_y η_y + ũ_{ηη} η_y² + ũ_ξ ξ_{yy} + ũ_η η_{yy}.

Noting x² = η/ξ, y² = ξη, and inserting the values of the partial derivatives of ξ and η, we get

u_{xx} = ũ_{ξξ} y²/x⁴ − 2ũ_{ξη} y²/x² + ũ_{ηη} y² + 2(y/x³) ũ_ξ,
u_{yy} = ũ_{ξξ} (1/x²) + 2ũ_{ξη} + ũ_{ηη} x².

Hence

x²u_{xx} − y²u_{yy} = −4y² ũ_{ξη} + 2(y/x) ũ_ξ = 0,

that is,

ũ_{ξη} − (1/(2xy)) ũ_ξ = 0.

Since η = xy, we obtain the characteristic form of the equation to be

ũ_{ξη} − (1/(2η)) ũ_ξ = 0.

Using the substitution v = ũ_ξ, we obtain v_η − (1/(2η)) v = 0, which corresponds to the ODE v′ − (1/(2η)) v = 0. Hence, v(ξ, η) = c(ξ)√η. Integration with respect to ξ gives ũ(ξ, η) = A(ξ)√η + B(η). Transforming back to the variables x and y, the general solution is

u(x, y) = A(y/x) √(xy) + B(xy).

(c) The one-dimensional wave equation u_{tt} − a²u_{xx} = 0. The characteristic equation σ_t² = a²σ_x² yields

−σ_t/σ_x = dx/dt = ẋ = ±a.

The characteristics are x = −at + c_1 and x = at + c_2. The change of variables ξ = x − at and η = x + at yields ũ_{ξη} = 0, which has general solution ũ(ξ, η) = f(ξ) + g(η). Hence, u(x, t) = f(x − at) + g(x + at) is the general solution; see also homework 23.2.
(d) The wave equation in n dimensions has characteristic equation

σ_t² − a² ∑_{i=1}^{n} σ_{x_i}² = 0.

This equation is satisfied by the characteristic cone

σ(x, t) = a²(t − t⁽⁰⁾)² − ∑_{i=1}^{n} (x_i − x_i⁽⁰⁾)² = 0,

where the point (x⁽⁰⁾, t⁽⁰⁾) is the peak of the cone. Indeed,

σ_t = 2a²(t − t⁽⁰⁾),  σ_{x_i} = −2(x_i − x_i⁽⁰⁾)

imply

σ_t² − a² ∑_{i=1}^{n} σ_{x_i}² = 4a² ( a²(t − t⁽⁰⁾)² − ∑_{i=1}^{n} (x_i − x_i⁽⁰⁾)² ) = 0

on the cone σ = 0.

Further, there are other characteristic surfaces: the hyperplanes

σ(x, t) = at + ∑_{i=1}^{n} b_i x_i = 0,  where ‖b‖ = 1.

(e) The heat equation has characteristic equation ∑_{i=1}^{n} σ_{x_i}² = 0, which implies σ_{x_i} = 0 for all i = 1, ..., n, such that t = c is the only family of characteristic surfaces (the coordinate hyperplanes).
(f) The Poisson and Laplace equations have the same characteristic equation; however, we have one variable less (no t) and obtain grad σ = 0, which is impossible. The Poisson and Laplace equations don’t have characteristic surfaces.
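The general solutions obtained in parts (b) and (c) of Example 15.4 can be verified symbolically; in (b) the concrete choices A(z) = z² and B(z) = sin z stand in for the arbitrary functions:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)  # x, y > 0 so sqrt(x*y) is defined
t = sp.Symbol('t', real=True)
a = sp.Symbol('a', positive=True)

# (b): u = A(y/x)*sqrt(x*y) + B(x*y) with sample A(z) = z**2, B(z) = sin(z)
u = (y/x)**2 * sp.sqrt(x*y) + sp.sin(x*y)
res_b = sp.simplify(x**2 * sp.diff(u, x, 2) - y**2 * sp.diff(u, y, 2))

# (c): u = f(x - a t) + g(x + a t) with arbitrary f, g
f, g = sp.Function('f'), sp.Function('g')
w = f(x - a*t) + g(x + a*t)
res_c = sp.simplify(sp.diff(w, t, 2) - a**2 * sp.diff(w, x, 2))

print(res_b, res_c)  # → 0 0
```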

15.3.5 The Vibrating String

(a) The Infinite String on ℝ

We consider the Cauchy problem for an infinite string (no boundary values):

u_{tt} − a²u_{xx} = 0,
u(x, 0) = u_0(x),  u_t(x, 0) = u_1(x),

where u_0 and u_1 are given. Inserting the initial values into the general solution (see Example 15.4 (c)) u(x, t) = f(x − at) + g(x + at) we get

u_0(x) = f(x) + g(x),  u_1(x) = −a f′(x) + a g′(x).

Differentiating the first one yields u_0′(x) = f′(x) + g′(x), such that

f′(x) = (1/2) u_0′(x) − (1/(2a)) u_1(x),  g′(x) = (1/2) u_0′(x) + (1/(2a)) u_1(x).

Integrating these equations we obtain

f(x) = (1/2) u_0(x) − (1/(2a)) ∫_0^x u_1(y) dy + A,  g(x) = (1/2) u_0(x) + (1/(2a)) ∫_0^x u_1(y) dy + B,

where A and B are constants with A + B = 0 (since f(x) + g(x) = u_0(x)). Finally we have

u(x, t) = f(x − at) + g(x + at)
 = (1/2)(u_0(x + at) + u_0(x − at)) − (1/(2a)) ∫_0^{x−at} u_1(y) dy + (1/(2a)) ∫_0^{x+at} u_1(y) dy
 = (1/2)(u_0(x + at) + u_0(x − at)) + (1/(2a)) ∫_{x−at}^{x+at} u_1(y) dy.   (15.17)

It is clear from (15.17) that u(x, t) is uniquely determined by the values of the initial functions u_0 and u_1 in the interval [x − at, x + at], whose end points are cut out by the characteristic lines through the point (x, t). This interval represents the domain of dependence for the solution at the point (x, t), as shown in the figure. [Figure: the characteristic triangle with apex (x, t) over the interval [x − at, x + at] on the x-axis.] Conversely, the initial values at a point (ξ, 0) of the x-axis influence u(x, t) at points (x, t) in the wedge-shaped region bounded by the characteristics through (ξ, 0), i.e. for |x − ξ| ≤ at.
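D'Alembert's formula (15.17) can be checked symbolically for sample initial data (u_0 = sin, u_1 = cos are illustrative choices):

```python
import sympy as sp

x, t, y = sp.symbols('x t y', real=True)
a = sp.Symbol('a', positive=True)
u0 = sp.sin(x)   # sample initial displacement
u1 = sp.cos(x)   # sample initial velocity

# d'Alembert's formula (15.17)
u = (sp.Rational(1, 2) * (u0.subs(x, x + a*t) + u0.subs(x, x - a*t))
     + sp.integrate(u1.subs(x, y), (y, x - a*t, x + a*t)) / (2*a))

wave_residual = sp.simplify(sp.diff(u, t, 2) - a**2 * sp.diff(u, x, 2))
ic_pos = sp.simplify(u.subs(t, 0) - u0)
ic_vel = sp.simplify(sp.diff(u, t).subs(t, 0) - u1)
print(wave_residual, ic_pos, ic_vel)  # → 0 0 0
```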

In this example we consider the vibrating string which is plucked at time t = 0 (given u_0(x)). The initial velocity is zero (u_1 = 0). [Figure: snapshots of the string at t = 1/2, t = 1, and t = 2.] In the pictures one can see the behaviour of the string. The initial peak is divided into two smaller peaks with half the displacement, one moving to the right and one moving to the left with speed a.

Formula (15.17) is due to d’Alembert (1746). Usually one assumes u_0 ∈ C²(ℝ) and u_1 ∈ C¹(ℝ). In this case, u ∈ C²(ℝ²) and we are able to evaluate the classical Laplacian ∆(u), which gives a continuous function. On the other hand, the right-hand side of (15.17) makes sense for an arbitrary continuous function u_1 and arbitrary u_0. If we want to call such a u(x, t) a generalized solution of the Cauchy problem, we have to alter the meaning of ∆(u). In particular, we need a more general notion of functions and derivatives. This is our main objective in the next section.

(b) The Finite String over [0, l]

We consider the initial boundary value problem (IBVP)

u_{tt} = a²u_{xx},  u(x, 0) = u_0(x),  u_t(x, 0) = u_1(x),  x ∈ [0, l],  u(0, t) = u(l, t) = 0,  t ∈ ℝ.

Suppose we are given functions u_0 ∈ C²([0, l]) and u_1 ∈ C¹([0, l]) on [0, l] with

u_0(0) = u_0(l) = 0,  u_1(0) = u_1(l) = 0,  u_0′′(0) = u_0′′(l) = 0.
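The odd 2l-periodic extension used in the construction below can be sketched in code (u_0(x) = sin(πx) on [0, 1] is a sample choice satisfying the above conditions):

```python
import math

def odd_periodic_extension(f, l):
    """Extend f on [0, l] to an odd, 2l-periodic function on all of R."""
    def f_ext(x):
        x = ((x + l) % (2 * l)) - l   # reduce to [-l, l)
        return f(x) if x >= 0 else -f(-x)
    return f_ext

u0 = lambda x: math.sin(math.pi * x)   # u0(0) = u0(1) = 0 on [0, 1]
u0_ext = odd_periodic_extension(u0, 1.0)
print(u0_ext(0.3), u0_ext(-0.3), u0_ext(2.3))  # odd and 2-periodic
```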

To solve the IBVP, we define new functions ũ_0 and ũ_1 on ℝ as follows: first extend both functions to [−l, l] as odd functions, that is, ũ_i(−x) = −u_i(x), i = 0, 1. Then extend ũ_i as a 2l-periodic function to the entire real line. The above assumptions ensure that ũ_0 ∈ C²(ℝ) and ũ_1 ∈ C¹(ℝ). Put

u(x, t) = (1/2)(ũ_0(x + at) + ũ_0(x − at)) + (1/(2a)) ∫_{x−at}^{x+at} ũ_1(y) dy.

Then u(x, t) solves the IBVP.

Chapter 16

Distributions

16.1 Introduction — Test Functions and Distributions

In this section we introduce the notion of distributions. Distributions are generalized functions. The class of distributions has many very nice properties: they are differentiable up to arbitrary order, one can exchange limit procedures and differentiation, and Schwarz’s lemma holds. Distributions play an important role in the theory of PDEs; in particular, the notion of a fundamental solution of a differential operator can be made rigorous only within the theory of distributions. Generalized functions were first used by P. Dirac to study quantum mechanical phenomena. He made systematic use of the so-called δ-function (better: δ-distribution). The mathematical foundations of this theory are due to S. L. Sobolev (1936) and L. Schwartz (1950; 1915–2002). Since then many mathematicians have made progress in the theory of distributions. Motivation comes from problems in mathematical physics and the theory of partial differential equations. Good accessible (German) introductions are given in the books of W. Walter [Wal74] and O. Forster [For81, §17]. More detailed expositions of the theory are to be found in the books of H. Triebel (in English and German), V. S. Wladimirow (in Russian and German), and Gelfand/Schilow (in Russian and German, parts I, II, and III), [Tri92, Wla72, GS69, GS64].

16.1.1 Motivation

Distributions generalize the notion of a function. They are linear functionals on certain spaces of test functions. Using distributions one can express rigorously the density of a mass point, the charge density of a point charge, and the single-layer and double-layer potentials, see [Arn04, pp. 92]. Roughly speaking, a generalized function is given at a point by its “mean values” in neighborhoods of that point.

The main idea to associate to each “sufficiently nice” function f a linear functional Tf (a distri- bution) on an appropriate function space D is described by the following formula.

⟨T_f, ϕ⟩ = ∫_ℝ f(x)ϕ(x) dx,  ϕ ∈ D.   (16.1)


On the left we adopt the notation of a dual pairing of vector spaces from Definition 11.1. In general the bracket ⟨T, ϕ⟩ denotes the evaluation of the functional T on the test function ϕ. Sometimes it is also written as T(ϕ). It does not denote an inner product; the left and the right arguments are from completely different spaces.

What we really want of Tf is

(a) The correspondence should be one-to-one, i.e. different functionals T_f and T_g correspond to different functions f and g. To achieve this, we need the function space D to be sufficiently large.

(b) The class of functions f should contain at least the continuous functions. However, if f(x) = xⁿ, the function f(x)ϕ(x) must be integrable over ℝ, that is, xⁿϕ(x) ∈ L¹(ℝ). Since polynomials are not in L¹(ℝ), the functions ϕ must be “very small” for large |x|. Roughly speaking, there are two possibilities to this end. First, take only those functions ϕ which are identically zero outside a compact set (which depends on ϕ). This leads to the test functions D(ℝ). Then T_f is well-defined if f is integrable over every compact subset of ℝ. These functions f are called locally integrable. Secondly, we take ϕ(x) to be rapidly decreasing as |x| tends to ∞. More precisely, we want

sup_{x∈ℝ} |xⁿ ϕ(x)| < ∞

for all non-negative integers n ∈ ℤ₊. This concept leads to the notion of the so-called Schwartz space S(ℝ).

(c) We want to differentiate f arbitrarily often, even in case f has discontinuities. The only thing we have to do is to give the expression

∫_ℝ f′(x)ϕ(x) dx,  ϕ ∈ D,

a meaning. Using integration by parts and the fact that ϕ(+∞) = ϕ(−∞) = 0, the above expression equals −∫_ℝ f(x)ϕ′(x) dx. That is, instead of differentiating f, we differentiate the test function ϕ. In this way, the functional T_{f′} makes sense as long as fϕ′ is integrable. Since we want to differentiate f arbitrarily often, we need the test function ϕ to be arbitrarily differentiable, ϕ ∈ C^∞(ℝ).
Note that conditions (b) and (c) make the space of test functions sufficiently small.

16.1.2 Test Functions D(ℝⁿ) and D(Ω)

We want fϕ to be integrable for all polynomials f. We use the first approach and consider only functions ϕ which are 0 outside a bounded set. If nothing is stated otherwise, Ω ⊆ ℝⁿ denotes an open, connected subset of ℝⁿ.

(a) The Support of a Function and the Space of Test Functions

Let f be a function defined on Ω. The set

supp f := cl {x ∈ Ω | f(x) ≠ 0} ⊂ ℝⁿ  (the closure taken in ℝⁿ)

is called the support of f, denoted by supp f.

Remark 16.1 (a) supp f is always closed; it is the smallest closed subset M such that f(x) = 0 for all x ∈ ℝⁿ \ M.
(b) A point x_0 ∉ supp f if and only if there exists ε > 0 such that f ≡ 0 in U_ε(x_0). This in particular implies that for f ∈ C^∞(ℝⁿ) we have f^{(k)}(x_0) = 0 for all k ∈ ℕ.
(c) supp f is compact if and only if it is bounded.

Example 16.1 (a) supp sin = ℝ.
(b) Let f: (−1, 1) → ℝ, f(x) = x(1 − x). Then supp f = [−1, 1].
(c) The characteristic function χ_M has support the closure of M.
(d) Let h be the hat function on ℝ — note that supp h = [−1, 1] — and f(x) = 2h(x) − 3h(x + 10). Then supp f = [−1, 1] ∪ [−11, −9].

Definition 16.1 (a) The space D(ℝⁿ) consists of all infinitely differentiable functions f on ℝⁿ with compact support:

D(ℝⁿ) = C_0^∞(ℝⁿ) = {f ∈ C^∞(ℝⁿ) | supp f is compact}.

(b) Let Ω be a region in ℝⁿ. Define D(Ω) as follows:

D(Ω) = {f ∈ C^∞(Ω) | supp f is compact in ℝⁿ and supp f ⊂ Ω}.

We call D(Ω) the space of test functions on Ω.

First of all let us make sure that such functions exist. On the real axis consider the “hat” function (also called bump function)

h(t) = c e^{−1/(1−t²)} for |t| < 1,  h(t) = 0 for |t| ≥ 1.

The constant c is chosen such that ∫_ℝ h(t) dt = 1. The function h is continuous on ℝ. It was already shown in Example 4.5 that h^{(k)}(−1) = h^{(k)}(1) = 0 for all k ∈ ℕ. Hence h ∈ D(ℝ) is a test function with supp h = [−1, 1]. Accordingly, the function

h(x) = c_n e^{−1/(1−‖x‖²)} for ‖x‖ < 1,  h(x) = 0 for ‖x‖ ≥ 1,

is an element of D(ℝⁿ) with support the closed unit ball supp h = {x | ‖x‖ ≤ 1}. The constant c_n is chosen such that ∫_{ℝⁿ} h(x) dx = ∫_{U_1(0)} h(x) dx = 1.
For ε > 0 we introduce the notation

h_ε(x) = (1/εⁿ) h(x/ε).

Then supp h_ε = Ū_ε(0) and

∫_{ℝⁿ} h_ε(x) dx = (1/εⁿ) ∫_{U_ε(0)} h(x/ε) dx = ∫_{U_1(0)} h(y) dy = 1.

So far, we have constructed only one function h(x) (as well as its scaled relatives h_ε(x)) which is C^∞ and has compact support. Using this single hat function h_ε we are able
(a) to restrict the support of an arbitrary integrable function f to a given domain, by replacing f by f h_ε(x − a), which has support in Ū_ε(a), and
(b) to make f smooth.

(b) Mollification

In this way, we obtain a supply of C^∞ functions with compact support which is large enough for our purposes (especially, to recover the function f from the functional T_f). Using the function h_ε, S. L. Sobolev developed the following mollification method.

Definition 16.2 (a) Let f ∈ L¹(ℝⁿ) and g ∈ D(ℝⁿ); define the convolution product f ∗ g by

(f ∗ g)(x) = ∫_{ℝⁿ} f(y)g(x − y) dy = ∫_{ℝⁿ} f(x − y)g(y) dy = (g ∗ f)(x).

(b) We define the mollified function f_ε of f by f_ε = f ∗ h_ε. Note that

f_ε(x) = ∫_{ℝⁿ} h_ε(x − y)f(y) dy = ∫_{U_ε(x)} h_ε(x − y)f(y) dy.   (16.2)

Roughly speaking, f_ε(x) is a mean value of f in the ε-neighborhood of x. If f is continuous at x_0 then f_ε(x_0) = f(ξ) for some ξ ∈ U_ε(x_0). This follows from Proposition 5.18.
In particular, let f = χ_{[1,2]} be the characteristic function of the interval [1, 2]. The mollification f_ε (for 0 < ε < 1/2) looks as follows:

f_ε(x) = 0 for x ≤ 1 − ε,
f_ε(x) = ∫_{−1}^{(x−1)/ε} h(t) dt for 1 − ε < x < 1 + ε,
f_ε(x) = 1 for 1 + ε ≤ x ≤ 2 − ε,
f_ε(x) = ∫_{(x−2)/ε}^{1} h(t) dt for 2 − ε < x < 2 + ε,
f_ε(x) = 0 for x ≥ 2 + ε.
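The mollification of χ_{[1,2]} can also be computed numerically from (16.2); a sketch (the quadrature parameters are arbitrary):

```python
import math

def h(t):
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0

n = 20_000
dt = 2.0 / n
Z = sum(h(-1 + (k + 0.5) * dt) for k in range(n)) * dt   # normalizing constant

def mollify_indicator(x, eps):
    """(chi_[1,2] * h_eps)(x) = integral of h(t) * chi_[1,2](x - eps t) dt,
    by the midpoint rule over supp h = [-1, 1]."""
    total = 0.0
    for k in range(n):
        t = -1 + (k + 0.5) * dt
        if 1.0 <= x - eps * t <= 2.0:
            total += h(t) * dt
    return total / Z

eps = 0.25
# 0 away from [1, 2], 1 well inside, smooth in between
print(mollify_indicator(0.5, eps), mollify_indicator(1.5, eps),
      mollify_indicator(3.0, eps))
```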

Remarks 16.2 (a) For f ∈ L¹(ℝⁿ), f_ε ∈ C^∞(ℝⁿ).
(b) f_ε → f in L¹(ℝⁿ) as ε → 0.
(c) C_0(ℝⁿ) ⊂ L¹(ℝⁿ) is dense (with respect to the L¹-norm). In other words, for any f ∈ L¹(ℝⁿ) and ε > 0 there exists g ∈ C(ℝⁿ) with compact support and ∫_{ℝⁿ} |f − g| dx < ε.
(Sketch of Proof). (A) Any integrable function can be approximated by integrable functions with compact support. This follows from Example 12.6. (B) Any integrable function with compact support can be approximated by simple functions (finite linear combinations of characteristic functions) with compact support. (C) Any characteristic function with compact support can be approximated by characteristic functions χ_Q where Q is a finite union of boxes. (D) Any χ_Q where Q is a closed box can be approximated by a sequence f_n of continuous functions with compact support:

f_n(x) = max{0, 1 − n d(x, Q)},  n ∈ ℕ,

where d(x, Q) denotes the distance of x from Q. Note that f_n is 1 in Q and 0 outside U_{1/n}(Q).
(d) C_0^∞(ℝⁿ) ⊂ L¹(ℝⁿ) is dense.

(c) Convergence in D

Notations. For x ∈ ℝⁿ and a multi-index α ∈ ℕ₀ⁿ, α = (α_1, ..., α_n), we write

|α| = α_1 + α_2 + ⋯ + α_n,
α! = α_1! ⋯ α_n!,
x^α = x_1^{α_1} x_2^{α_2} ⋯ x_n^{α_n},
D^α u(x) = ∂^{|α|} u(x) / (∂x_1^{α_1} ⋯ ∂x_n^{α_n}).

It is clear that D(ℝⁿ) is a linear space. We shall introduce an appropriate notion of convergence.
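A small symbolic example of the multi-index notation (the function u and the multi-index α are arbitrary sample choices):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
u = sp.sin(x1) * sp.exp(x2)

alpha = (1, 2)          # a sample multi-index
abs_alpha = sum(alpha)  # |alpha| = 1 + 2 = 3
# D^alpha u = d^3 u / (dx1 dx2^2)
D_alpha_u = sp.diff(u, x1, alpha[0], x2, alpha[1])
print(abs_alpha, D_alpha_u)  # → 3 exp(x2)*cos(x1)
```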

Definition 16.3 A sequence (ϕ_n(x)) of functions of D(ℝⁿ) converges to ϕ ∈ D(ℝⁿ) if there exists a compact set K ⊆ ℝⁿ such that

(a) supp ϕ_n ⊆ K for all n ∈ ℕ, and
(b) D^α ϕ_n ⇉ D^α ϕ uniformly on K for all multi-indices α.

We denote this type of convergence by ϕ_n →_D ϕ.

Example 16.2 Let ϕ ∈ D be a fixed test function and consider the sequences (ϕ_n(x)) given by
(a) (ϕ(x)/n). This sequence converges to 0 in D since supp(ϕ/n) = supp ϕ for all n and the convergence is uniform for all x ∈ ℝ (in fact, it suffices to consider x ∈ supp ϕ).
(b) (ϕ(x/n)/n). This sequence does not converge to 0 in D since the supports supp ϕ(·/n) = n · supp ϕ, n ∈ ℕ, are not contained in any common compact subset.
(c) (ϕ(nx)/n) has no limit if ϕ ≠ 0; see homework 49.2.

Note that D(ℝⁿ) is not a metric space; more precisely, there is no metric on D(ℝⁿ) such that metric convergence and the above convergence coincide.

16.2 The Distributions D′(ℝⁿ)

Definition 16.4 A distribution (generalized function) is a continuous linear functional on the space D(ℝⁿ) of test functions.

Here, a linear functional T on D is said to be continuous if and only if for all sequences (ϕ_n), ϕ_n, ϕ ∈ D, with ϕ_n →_D ϕ we have ⟨T, ϕ_n⟩ → ⟨T, ϕ⟩ in ℂ. The set of distributions is denoted by D′(ℝⁿ) or simply by D′.

The evaluation of a distribution T ∈ D′ on a test function ϕ ∈ D is denoted by ⟨T, ϕ⟩. Two distributions T_1 and T_2 are equal if and only if ⟨T_1, ϕ⟩ = ⟨T_2, ϕ⟩ for all ϕ ∈ D.

Remark 16.3 (Characterization of continuity.) (a) A linear functional T on D(ℝⁿ) is continuous if and only if ϕ_n →_D 0 implies ⟨T, ϕ_n⟩ → 0 in ℂ. Indeed, T continuous trivially implies this statement. Suppose now that ϕ_n →_D ϕ. Then (ϕ_n − ϕ) →_D 0; thus ⟨T, ϕ_n − ϕ⟩ → 0 as n → ∞. Since T is linear, this shows ⟨T, ϕ_n⟩ → ⟨T, ϕ⟩ and T is continuous.
(b) A linear functional T on D is continuous if and only if for every compact set K there exist a constant C > 0 and l ∈ ℤ₊ such that

|⟨T, ϕ⟩| ≤ C sup_{x∈K, |α|≤l} |D^α ϕ(x)|,  ∀ ϕ ∈ D with supp ϕ ⊂ K.   (16.3)

We show that the criterion (16.3) implies continuity of T. Indeed, let ϕ_n →_D 0. Then there exists a compact subset K ⊂ ℝⁿ such that supp ϕ_n ⊆ K for all n. By the criterion, there are a C > 0 and an l ∈ ℤ₊ with |⟨T, ϕ_n⟩| ≤ C sup |D^α ϕ_n(x)|, where the supremum is taken over all x ∈ K and multi-indices α with |α| ≤ l. Since D^α ϕ_n ⇉ 0 on K for all α, we in particular have sup |D^α ϕ_n(x)| → 0 as n → ∞. This shows ⟨T, ϕ_n⟩ → 0 and T is continuous. For the proof of the converse direction, see [Tri92, p. 52].

16.2.1 Regular Distributions

A large subclass of distributions of $D'$ is given by ordinary functions via the correspondence $f \leftrightarrow T_f$ given by $\langle T_f, \varphi \rangle = \int_{\mathbb{R}^n} f(x)\varphi(x)\,dx$. We are looking for a class of functions which is as large as possible.

Definition 16.5 Let $\Omega$ be an open subset of $\mathbb{R}^n$. A function $f(x)$ on $\Omega$ is said to be locally integrable over $\Omega$ if $f(x)$ is integrable over every compact subset $K \subseteq \Omega$; in this case we write $f \in L^1_{\mathrm{loc}}(\Omega)$.

Remark 16.4 The following are equivalent:
(a) $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$.
(b) For any $R > 0$, $f \in L^1(U_R(0))$.
(c) For any $x_0 \in \mathbb{R}^n$ there exists $\varepsilon > 0$ such that $f \in L^1(U_\varepsilon(x_0))$.

Lemma 16.1 If $f$ is locally integrable, $f \in L^1_{\mathrm{loc}}(\Omega)$, then $T_f$ is a distribution, $T_f \in D'(\Omega)$.
A distribution $T$ which is of the form $T = T_f$ with some locally integrable function $f$ is called regular.

Proof. First, $T_f$ is a linear functional on $D$ since integration is a linear operation. Secondly, if $\varphi_m \xrightarrow{D} 0$, then there exists a compact set $K$ with $\operatorname{supp}\varphi_m \subseteq K$ for all $m$. We have the following estimate:
$$\left| \int_{\mathbb{R}^n} f(x)\varphi_m(x)\,dx \right| \le \sup_{x \in K} |\varphi_m(x)| \int_K |f(x)|\,dx = C \sup_{x \in K} |\varphi_m(x)|,$$
where $C = \int_K |f|\,dx$ exists since $f \in L^1_{\mathrm{loc}}$. The expression on the right tends to $0$ since $\varphi_m(x)$ tends to $0$ uniformly. Hence $\langle T_f, \varphi_m \rangle \to 0$, and $T_f$ belongs to $D'$.

Example 16.3 (a) $C(\Omega) \subset L^1_{\mathrm{loc}}(\Omega)$, $L^1(\Omega) \subseteq L^1_{\mathrm{loc}}(\Omega)$.
(b) $f(x) = \frac{1}{x}$ is in $L^1_{\mathrm{loc}}((0,1))$; however, $f \notin L^1((0,1))$ and $f \notin L^1_{\mathrm{loc}}(\mathbb{R})$ since $f$ is not integrable over $[-1, 1]$.

Lemma 16.2 (Du Bois-Reymond, Fundamental Lemma of the Calculus of Variations) Let $\Omega \subseteq \mathbb{R}^n$ be a region. Suppose that $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ and $\langle T_f, \varphi \rangle = 0$ for all $\varphi \in D(\Omega)$.
Then $f = 0$ almost everywhere in $\Omega$.

Proof. For simplicity we consider the case $n = 1$, $\Omega = (-\pi, \pi)$. Fix $\varepsilon$ with $0 < \varepsilon < \pi$. Let $\varphi_n(x) = e^{-inx} h_\varepsilon(x)$, $n \in \mathbb{Z}$. Then $\operatorname{supp}\varphi_n \subset [-\pi, \pi]$. Since both $e^{-inx}$ and $h_\varepsilon$ are $C^\infty$-functions, $\varphi_n \in D(\Omega)$ and
$$c_n = \langle T_f, \varphi_n \rangle = \int_{-\pi}^{\pi} f(x) e^{-inx} h_\varepsilon(x)\,dx = 0, \quad n \in \mathbb{Z};$$
hence all Fourier coefficients of $f h_\varepsilon \in L^2[-\pi, \pi]$ vanish. From Theorem 13.13 (b) it follows that $f h_\varepsilon$ is $0$ in $L^2(-\pi, \pi)$. By Proposition 12.16 it follows that $f h_\varepsilon$ is $0$ a.e. in $(-\pi, \pi)$. Since $h_\varepsilon > 0$ on $(-\pi, \pi)$, $f = 0$ a.e. on $(-\pi, \pi)$.

Remark 16.5 The previous lemma shows that if $f_1$ and $f_2$ are locally integrable and $T_{f_1} = T_{f_2}$, then $f_1 = f_2$ a.e.; that is, the correspondence $f \leftrightarrow T_f$ is one-to-one. In this way we can identify the locally integrable functions with a subspace of the distributions, $L^1_{\mathrm{loc}}(\mathbb{R}^n) \subseteq D'(\mathbb{R}^n)$.

16.2.2 Other Examples of Distributions

Definition 16.6 Every non-regular distribution is called singular. The most important example of a singular distribution is the $\delta$-distribution $\delta_a$ defined by
$$\langle \delta_a, \varphi \rangle = \varphi(a), \quad a \in \mathbb{R}^n,\ \varphi \in D.$$
It is immediate that $\delta_a$ is a linear functional on $D$. Suppose that $\varphi_n \xrightarrow{D} 0$; then $\varphi_n(x) \to 0$ pointwise. Hence $\delta_a(\varphi_n) = \varphi_n(a) \to 0$; the functional is continuous on $D$ and therefore a distribution. We will also use the notation $\delta(x - a)$ in place of $\delta_a$, and $\delta$ or $\delta(x)$ in place of $\delta_0$.

Proof that $\delta_a$ is singular. If $\delta_a \in D'$ were regular, there would exist a function $f \in L^1_{\mathrm{loc}}$ such that $\delta_a = T_f$, that is, $\varphi(a) = \int_{\mathbb{R}^n} f(x)\varphi(x)\,dx$.

First proof. Let $\Omega \subset \mathbb{R}^n$ be an open set such that $a \notin \Omega$. Let $\varphi \in D(\Omega)$, that is, $\operatorname{supp}\varphi \subset \Omega$. In particular $\varphi(a) = 0$. That is, $\int_\Omega f(x)\varphi(x)\,dx = 0$ for all $\varphi \in D(\Omega)$. By Du Bois-Reymond's Lemma, $f = 0$ a.e. in $\Omega$. Since $\Omega$ was arbitrary, $f = 0$ a.e. in $\mathbb{R}^n \setminus \{a\}$ and therefore $f = 0$ a.e. in $\mathbb{R}^n$. It follows that $T_f = 0$ in $D'(\mathbb{R}^n)$; however $\delta_a \neq 0$, a contradiction.

Second proof for $a = 0$. Since $f \in L^1_{\mathrm{loc}}$, there exists $\varepsilon > 0$ such that

$$d := \int_{U_\varepsilon(0)} |f(x)|\,dx < 1.$$
Putting $\varphi(x) = h(x/\varepsilon)$ with the bump function $h$, we have $\operatorname{supp}\varphi = \overline{U_\varepsilon(0)}$ and $\sup_{x \in \mathbb{R}^n} \varphi(x) = \varphi(0) > 0$, such that
$$\left| \int_{\mathbb{R}^n} f(x)\varphi(x)\,dx \right| \le \sup_x |\varphi(x)| \int_{U_\varepsilon(0)} |f(x)|\,dx = \varphi(0)\,d < \varphi(0).$$
This contradicts $\int_{\mathbb{R}^n} f(x)\varphi(x)\,dx = \varphi(0) = |\varphi(0)|$.

In the same way one can show that the assignment
$$\langle T, \varphi \rangle = D^\alpha \varphi(a), \quad a \in \mathbb{R}^n,\ \varphi \in D,$$
defines an element of $D'$ which is singular. The distribution
$$\langle T, \varphi \rangle = \int_{\mathbb{R}^n} f(x)\,D^\alpha \varphi(x)\,dx, \quad f \in L^1_{\mathrm{loc}},$$
may be regular or singular, depending on the properties of $f$.

Locally integrable functions as well as $\delta_a$ describe mass, force, or charge densities. That is why L. Schwartz named the generalized functions "distributions."

16.2.3 Convergence and Limits of Distributions

There are many ways to approximate the distribution $\delta$ by a sequence of $L^1_{\mathrm{loc}}$ functions.

Definition 16.7 A sequence $(T_n)$, $T_n \in D'(\mathbb{R}^n)$, is said to converge to $T \in D'(\mathbb{R}^n)$ if and only if for all $\varphi \in D(\mathbb{R}^n)$
$$\lim_{n \to \infty} T_n(\varphi) = T(\varphi).$$
Similarly, for a family $T_\varepsilon$, $\varepsilon > 0$, of distributions in $D'$, we say that $\lim_{\varepsilon \to 0} T_\varepsilon = T$ if $\lim_{\varepsilon \to 0} T_\varepsilon(\varphi) = T(\varphi)$ for all $\varphi \in D$.

Note that $D'(\mathbb{R}^n)$ with the above notion of convergence is complete; see [Wal02, p. 39].

Example 16.4 Let $f(x) = \frac{1}{2}\chi_{[-1,1]}$ and let $f_\varepsilon(x) = \frac{1}{\varepsilon} f\!\left(\frac{x}{\varepsilon}\right)$ be the scaling of $f$. Note that $f_\varepsilon = \frac{1}{2\varepsilon}\chi_{[-\varepsilon,\varepsilon]}$. We will show that $f_\varepsilon \to \delta$ in $D'(\mathbb{R})$. Indeed, for $\varphi \in D(\mathbb{R})$, by the Mean Value Theorem of integration,
$$T_{f_\varepsilon}(\varphi) = \int_{\mathbb{R}} \frac{1}{2\varepsilon}\chi_{[-\varepsilon,\varepsilon]}\,\varphi\,dx = \frac{1}{2\varepsilon}\int_{-\varepsilon}^{\varepsilon} \varphi(x)\,dx = \frac{1}{2\varepsilon}\,2\varepsilon\,\varphi(\xi) = \varphi(\xi)$$
for some $\xi \in [-\varepsilon, \varepsilon]$. Since $\varphi$ is continuous at $0$, $\varphi(\xi)$ tends to $\varphi(0)$ as $\varepsilon \to 0$, such that
$$\lim_{\varepsilon \to 0} T_{f_\varepsilon}(\varphi) = \varphi(0) = \delta(\varphi).$$
This proves the claim.
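Numerically the statement is easy to watch: pairing $f_\varepsilon = \frac{1}{2\varepsilon}\chi_{[-\varepsilon,\varepsilon]}$ against a smooth function approaches its value at $0$. A minimal sketch (we use $\varphi(x) = e^{-x^2}$, smooth with rapid decay rather than compact support, which does not affect the limit):

```python
import math

def phi(x):
    # stand-in for a test function; phi(0) = 1
    return math.exp(-x * x)

def pairing(eps, n=20000):
    # midpoint rule for <T_{f_eps}, phi> = (1/(2 eps)) * int_{-eps}^{eps} phi(x) dx
    h = 2.0 * eps / n
    total = sum(phi(-eps + (i + 0.5) * h) for i in range(n))
    return total * h / (2.0 * eps)

for eps in (0.5, 0.1, 0.01):
    print(eps, pairing(eps))   # values approach phi(0) = 1 as eps -> 0
```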

The following lemma generalizes this example.

Lemma 16.3 Suppose that $f \in L^1(\mathbb{R})$ with $\int_{\mathbb{R}} f(x)\,dx = 1$. For $\varepsilon > 0$ define the scaled function $f_\varepsilon(x) = \frac{1}{\varepsilon} f\!\left(\frac{x}{\varepsilon}\right)$.

Then $\lim_{\varepsilon \to 0+0} T_{f_\varepsilon} = \delta$ in $D'(\mathbb{R})$.

Proof. By the change of variables theorem, $\int_{\mathbb{R}} f_\varepsilon(x)\,dx = 1$ for all $\varepsilon > 0$. To prove the claim we have to show that for all $\varphi \in D$
$$\int_{\mathbb{R}} f_\varepsilon(x)\varphi(x)\,dx \longrightarrow \varphi(0) = \int_{\mathbb{R}} f_\varepsilon(x)\varphi(0)\,dx \quad \text{as } \varepsilon \to 0;$$
or, equivalently,
$$\int_{\mathbb{R}} f_\varepsilon(x)\big(\varphi(x) - \varphi(0)\big)\,dx \longrightarrow 0 \quad \text{as } \varepsilon \to 0.$$
Using the new coordinate $y$ with $x = \varepsilon y$, $dx = \varepsilon\,dy$, the above integral equals
$$\int_{\mathbb{R}} \varepsilon f_\varepsilon(\varepsilon y)\big(\varphi(\varepsilon y) - \varphi(0)\big)\,dy = \int_{\mathbb{R}} f(y)\big(\varphi(\varepsilon y) - \varphi(0)\big)\,dy.$$
Since $\varphi$ is continuous at $0$, for every fixed $y$ the family of functions $(\varphi(\varepsilon y) - \varphi(0))$ tends to $0$ as $\varepsilon \to 0$. Hence the family of functions $g_\varepsilon(y) = f(y)(\varphi(\varepsilon y) - \varphi(0))$ tends to $0$ pointwise. Further, $g_\varepsilon$ has an integrable upper bound, $2C|f|$, where $C = \sup_{x \in \mathbb{R}} |\varphi(x)|$. By Lebesgue's theorem on dominated convergence, the limit of the integrals is $0$:
$$\lim_{\varepsilon \to 0} \int_{\mathbb{R}} |f(y)|\,|\varphi(\varepsilon y) - \varphi(0)|\,dy = \int_{\mathbb{R}} |f(y)|\,\lim_{\varepsilon \to 0}|\varphi(\varepsilon y) - \varphi(0)|\,dy = 0.$$
This proves the claim.

The following sequences of locally integrable functions approximate $\delta$ as $\varepsilon \to 0$:
$$f_\varepsilon(x) = \frac{\varepsilon}{\pi x^2}\,\sin^2\frac{x}{\varepsilon}, \qquad f_\varepsilon(x) = \frac{1}{\pi}\,\frac{\varepsilon}{x^2 + \varepsilon^2}, \tag{16.4}$$
$$f_\varepsilon(x) = \frac{1}{2\varepsilon\sqrt{\pi}}\,e^{-\frac{x^2}{4\varepsilon^2}}, \qquad f_\varepsilon(x) = \frac{1}{\pi x}\,\sin\frac{x}{\varepsilon}.$$
The first three functions satisfy the assumptions of the Lemma; the last one does not, since $\frac{\sin x}{x}$ is not in $L^1(\mathbb{R})$. Later we will see that the above lemma even holds if $\int_{-\infty}^{\infty} f(x)\,dx = 1$ as an improper Riemann integral.

16.2.4 The distribution $P\frac{1}{x}$

Since the function $\frac{1}{x}$ is not locally integrable in a neighborhood of $0$, $1/x$ does not define a regular distribution. However, we can define a substitute that coincides with $1/x$ for all $x \neq 0$.
Recall that the principal value (or Cauchy mean value) of an improper Riemann integral is defined as follows. Suppose $f(x)$ has a singularity at $c \in [a, b]$; then
$$\mathrm{Vp}\int_a^b f(x)\,dx := \lim_{\varepsilon \to 0}\left( \int_a^{c-\varepsilon} + \int_{c+\varepsilon}^{b} \right) f(x)\,dx.$$
For example, $\mathrm{Vp}\int_{-1}^{1} \frac{dx}{x^{2n+1}} = 0$, $n \in \mathbb{N}$.
For $\varphi \in D$ define
$$F(\varphi) = \mathrm{Vp}\int_{-\infty}^{\infty} \frac{\varphi(x)}{x}\,dx = \lim_{\varepsilon \to 0}\left( \int_{-\infty}^{-\varepsilon} + \int_{\varepsilon}^{\infty} \right) \frac{\varphi(x)}{x}\,dx.$$
Then $F$ is obviously linear. We have to show that $F(\varphi)$ is finite and that $F$ is continuous on $D$. Suppose that $\operatorname{supp}\varphi \subseteq [-R, R]$. Define the auxiliary function
$$\psi(x) = \begin{cases} \dfrac{\varphi(x) - \varphi(0)}{x}, & x \neq 0, \\[1ex] \varphi'(0), & x = 0. \end{cases}$$

Since $\varphi$ is differentiable at $0$, $\psi \in C(\mathbb{R})$. Since $1/x$ is odd, $\left( \int_{-R}^{-\varepsilon} + \int_{\varepsilon}^{R} \right)\frac{dx}{x} = 0$, and we get
$$F(\varphi) = \lim_{\varepsilon \to 0}\left( \int_{-\infty}^{-\varepsilon} + \int_{\varepsilon}^{\infty} \right)\frac{\varphi(x)}{x}\,dx = \lim_{\varepsilon \to 0}\left( \int_{-R}^{-\varepsilon} + \int_{\varepsilon}^{R} \right)\frac{\varphi(x) - \varphi(0)}{x}\,dx = \lim_{\varepsilon \to 0}\left( \int_{-R}^{-\varepsilon} + \int_{\varepsilon}^{R} \right)\psi(x)\,dx = \int_{-R}^{R}\psi(x)\,dx.$$
Since $\psi$ is continuous, the above integral exists.

We now prove continuity of $F$. By Taylor's theorem, $\varphi(x) = \varphi(0) + x\varphi'(\xi_x)$ for some value $\xi_x$ between $x$ and $0$. Therefore
$$|F(\varphi)| = \left| \lim_{\varepsilon \to 0}\left( \int_{-\infty}^{-\varepsilon} + \int_{\varepsilon}^{\infty} \right)\frac{\varphi(x)}{x}\,dx \right| = \left| \lim_{\varepsilon \to 0}\left( \int_{-R}^{-\varepsilon} + \int_{\varepsilon}^{R} \right)\frac{\varphi(0) + x\varphi'(\xi_x)}{x}\,dx \right| = \left| \int_{-R}^{R} \varphi'(\xi_x)\,dx \right| \le 2R \sup_{x \in \mathbb{R}} |\varphi'(x)|.$$
This shows that condition (16.3) in Remark 16.3 is satisfied with $C = 2R$ and $l = 1$, such that $F$ is a continuous linear functional on $D(\mathbb{R})$, $F \in D'(\mathbb{R})$. We denote this distribution by $P\frac{1}{x}$.

In quantum physics one needs the so-called Sokhotsky formulas [Wla72, p. 76]:
$$\lim_{\varepsilon \to 0+0} \frac{1}{x + \varepsilon i} = -\pi i\,\delta(x) + P\frac{1}{x}, \qquad \lim_{\varepsilon \to 0+0} \frac{1}{x - \varepsilon i} = \pi i\,\delta(x) + P\frac{1}{x}.$$
Idea of proof: show the sum and the difference of the above formulas instead,
$$\lim_{\varepsilon \to 0+0} \frac{2x}{x^2 + \varepsilon^2} = 2P\frac{1}{x}, \qquad \lim_{\varepsilon \to 0+0} \frac{-2i\varepsilon}{x^2 + \varepsilon^2} = -2\pi i\,\delta.$$
The second limit follows from (16.4).
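The regularization $F(\varphi) = \int_{-R}^{R} \psi(x)\,dx$ is also the practical way to compute principal values numerically. A small sketch under a convenience assumption: we take $\varphi(x) = (1+x)e^{-x^2}$ (rapidly decaying rather than compactly supported), for which $\mathrm{Vp}\int \varphi(x)/x\,dx = \mathrm{Vp}\int e^{-x^2}/x\,dx + \int e^{-x^2}\,dx = 0 + \sqrt{\pi}$, the first term vanishing by oddness:

```python
import math

def phi(x):
    # hypothetical test function (not compactly supported, but decays fast)
    return (1.0 + x) * math.exp(-x * x)

def psi(x):
    # psi(x) = (phi(x) - phi(0)) / x, extended by phi'(0) = 1 at x = 0
    return (phi(x) - phi(0.0)) / x if x != 0.0 else 1.0

def vp_integral(R=10.0, n=200000):
    # Vp int phi(x)/x dx = int_{-R}^{R} psi(x) dx  (midpoint rule, never hits x = 0)
    h = 2.0 * R / n
    return h * sum(psi(-R + (i + 0.5) * h) for i in range(n))

print(vp_integral())   # close to sqrt(pi) = 1.7724...
```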

16.2.5 Operations with Distributions

Distributions are distinguished by the fact that in many calculations they are much easier to handle than functions. For this purpose it is necessary to define operations on the set $D'$. We already know how to add distributions and how to multiply them by complex numbers, since $D'$ is a linear space. Our guiding principle for defining multiplication, derivatives, tensor products, convolution, and the Fourier transform is always the same: for regular distributions, i.e. locally integrable functions, we want to recover the old well-known operation.

(a) Multiplication

There is no notion of a product $T_1 T_2$ of two distributions. However, we can define $a \cdot T = T \cdot a$ for $T \in D'(\mathbb{R}^n)$, $a \in C^\infty(\mathbb{R}^n)$. What happens in the case of a regular distribution $T = T_f$?
$$\langle a T_f, \varphi \rangle = \int_{\mathbb{R}^n} a(x) f(x) \varphi(x)\,dx = \int_{\mathbb{R}^n} f(x)\,a(x)\varphi(x)\,dx = \langle T_f, a\varphi \rangle. \tag{16.5}$$
Obviously, $a\varphi \in D(\mathbb{R}^n)$ since $a \in C^\infty(\mathbb{R}^n)$ and $\varphi$ has compact support; thus $a\varphi$ has compact support, too. Hence the right-hand side of (16.5) defines a linear functional on $D(\mathbb{R}^n)$. We have to show continuity. Suppose that $\varphi_n \xrightarrow{D} 0$; then $a\varphi_n \xrightarrow{D} 0$. Then $\langle T, a\varphi_n \rangle \to 0$ since $T$ is continuous.

Definition 16.8 For $a \in C^\infty(\mathbb{R}^n)$ and $T \in D'(\mathbb{R}^n)$ we define $aT \in D'(\mathbb{R}^n)$ by
$$\langle aT, \varphi \rangle = \langle T, a\varphi \rangle$$
and call $aT$ the product of $a$ and $T$.

Example 16.5 (a) $x\,P\frac{1}{x} = 1$. Indeed, for $\varphi \in D(\mathbb{R})$,
$$\left\langle x P\frac{1}{x}, \varphi \right\rangle = \left\langle P\frac{1}{x}, x\varphi(x) \right\rangle = \mathrm{Vp}\int_{-\infty}^{\infty} \frac{x\varphi(x)}{x}\,dx = \int_{\mathbb{R}} \varphi(x)\,dx = \langle 1, \varphi \rangle.$$
(b) If $f(x) \in C^\infty(\mathbb{R}^n)$ then
$$\langle f(x)\delta_a, \varphi \rangle = \langle \delta_a, f(x)\varphi(x) \rangle = f(a)\varphi(a) = f(a)\langle \delta_a, \varphi \rangle.$$
This shows $f(x)\delta_a = f(a)\delta_a$.
(c) Note that multiplication of distributions is no longer associative:
$$(\delta \cdot x)\cdot P\frac{1}{x} \underset{(b)}{=} 0 \cdot P\frac{1}{x} = 0, \qquad \delta \cdot \left( x \cdot P\frac{1}{x} \right) \underset{(a)}{=} \delta \cdot 1 = \delta.$$

(b) Differentiation

Consider $n = 1$. Suppose that $f \in L^1_{\mathrm{loc}}$ is continuously differentiable. Suppose further that $\varphi \in D$ with $\operatorname{supp}\varphi \subset (-a, a)$, such that $\varphi(-a) = \varphi(a) = 0$. We want to define $(T_f)'$ to be $T_{f'}$. Using integration by parts we find
$$\langle T_{f'}, \varphi \rangle = \int_{-a}^{a} f'(x)\varphi(x)\,dx = f(x)\varphi(x)\Big|_{-a}^{a} - \int_{-a}^{a} f(x)\varphi'(x)\,dx = -\int_{-a}^{a} f(x)\varphi'(x)\,dx = -\langle T_f, \varphi' \rangle,$$
where we used that $\varphi(-a) = \varphi(a) = 0$. Hence it makes sense to define $\langle T_f', \varphi \rangle = -\langle T_f, \varphi' \rangle$. This can easily be generalized to arbitrary partial derivatives $D^\alpha T_f$.

Definition 16.9 For $T \in D'(\mathbb{R}^n)$ and a multi-index $\alpha \in \mathbb{N}_0^n$ define $D^\alpha T \in D'(\mathbb{R}^n)$ by
$$\langle D^\alpha T, \varphi \rangle = (-1)^{|\alpha|}\langle T, D^\alpha \varphi \rangle.$$
We have to make sure that $D^\alpha T$ is indeed a distribution. The linearity of $D^\alpha T$ is obvious. To prove continuity, let $\varphi_n \xrightarrow{D} 0$. By definition, this implies $D^\alpha \varphi_n \xrightarrow{D} 0$. Since $T$ is continuous, $\langle T, D^\alpha \varphi_n \rangle \to 0$. This shows $\langle D^\alpha T, \varphi_n \rangle \to 0$; hence $D^\alpha T$ is a continuous linear functional on $D$.

Note that it is exactly the fact $D^\alpha T \in D'$ that needs the complicated-looking notion of convergence in $D$, $D^\alpha \varphi_n \to D^\alpha \varphi$. Further, a distribution has partial derivatives of all orders.

Lemma 16.4 Let $a \in C^\infty(\mathbb{R}^n)$ and $T \in D'(\mathbb{R}^n)$. Then
(a) Differentiation $D^\alpha : D' \to D'$ is continuous in $D'$, that is, $T_n \to T$ in $D'$ implies $D^\alpha T_n \to D^\alpha T$ in $D'$.
(b) $\dfrac{\partial}{\partial x_i}(a\,T) = \dfrac{\partial a}{\partial x_i}\,T + a\,\dfrac{\partial T}{\partial x_i}$, $i = 1, \dots, n$ (product rule).
(c) For any two multi-indices $\alpha$ and $\beta$,
$$D^{\alpha+\beta} T = D^\alpha(D^\beta T) = D^\beta(D^\alpha T) \quad \text{(Schwarz's Lemma)}.$$

Proof. (a) Suppose that $T_n \to T$ in $D'$, that is, $\langle T_n, \psi \rangle \to \langle T, \psi \rangle$ for all $\psi \in D$. In particular, for $\psi = D^\alpha \varphi$, $\varphi \in D$, we get
$$(-1)^{|\alpha|}\langle D^\alpha T_n, \varphi \rangle = \langle T_n, D^\alpha \varphi \rangle \longrightarrow \langle T, D^\alpha \varphi \rangle = (-1)^{|\alpha|}\langle D^\alpha T, \varphi \rangle.$$
Since this holds for all $\varphi \in D$, the assertion follows.
(b) This follows from
$$\left\langle a\frac{\partial T}{\partial x_i}, \varphi \right\rangle = \left\langle \frac{\partial T}{\partial x_i}, a\varphi \right\rangle = -\left\langle T, \frac{\partial}{\partial x_i}(a\varphi) \right\rangle = -\langle T, a_{x_i}\varphi \rangle - \left\langle T, a\frac{\partial \varphi}{\partial x_i} \right\rangle = -\langle a_{x_i} T, \varphi \rangle - \left\langle aT, \frac{\partial \varphi}{\partial x_i} \right\rangle$$
$$= -\langle a_{x_i} T, \varphi \rangle + \left\langle \frac{\partial}{\partial x_i}(aT), \varphi \right\rangle = \left\langle -a_{x_i} T + \frac{\partial}{\partial x_i}(aT), \varphi \right\rangle.$$
"Cancelling" $\varphi$ on both sides proves the claim.
(c) The easy proof uses $D^{\alpha+\beta}\varphi = D^\alpha(D^\beta \varphi)$ for $\varphi \in D$.

Example 16.6 (a) Let $a \in \mathbb{R}^n$, $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$, $\varphi \in D$. Then
$$\langle D^\alpha \delta_a, \varphi \rangle = (-1)^{|\alpha|}\langle \delta_a, D^\alpha \varphi \rangle = (-1)^{|\alpha|} D^\alpha \varphi(a),$$
$$\langle D^\alpha f, \varphi \rangle = (-1)^{|\alpha|}\int_{\mathbb{R}^n} f\,D^\alpha \varphi\,dx.$$
(b) Recall that the so-called Heaviside function $H(x)$ is defined as the characteristic function of the half-line $(0, +\infty)$. We compute its derivative in $D'$:
$$\langle T_H', \varphi(x) \rangle = -\int_{\mathbb{R}} H(x)\varphi'(x)\,dx = -\int_0^{\infty} \varphi'(x)\,dx = -\varphi(x)\Big|_0^{\infty} = \varphi(0) = \langle \delta, \varphi \rangle.$$
This shows $T_H' = \delta$.
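The computation $\langle T_H', \varphi \rangle = -\int_0^\infty \varphi'(x)\,dx = \varphi(0)$ can be checked numerically. A sketch, again with the non-compactly-supported stand-in $\varphi(x) = e^{-x^2}$, for which $\varphi'(x) = -2x e^{-x^2}$ and $\varphi(0) = 1$:

```python
import math

def dphi(x):
    # derivative of phi(x) = exp(-x^2)
    return -2.0 * x * math.exp(-x * x)

def heaviside_pairing(R=10.0, n=100000):
    # <T_H', phi> = -int_0^R phi'(x) dx  (midpoint rule; phi is negligible beyond R)
    h = R / n
    return -h * sum(dphi((i + 0.5) * h) for i in range(n))

print(heaviside_pairing())   # close to phi(0) = 1
```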

(c) More generally, let $f(x)$ be differentiable on $G = \mathbb{R} \setminus \{c\} = (-\infty, c) \cup (c, \infty)$ with a discontinuity of the first kind at $c$. The derivative of $T_f$ in $D'$ is
$$T_f' = T_{f'} + h\,\delta_c, \qquad h = f(c+0) - f(c-0),$$
where $h$ is the difference between the right-hand and left-hand limits of $f$ at $c$. Indeed, for $\varphi \in D$ we have
$$\langle T_f', \varphi \rangle = -\left( \int_{-\infty}^{c} + \int_{c}^{\infty} \right) f(x)\varphi'(x)\,dx = -f(c-0)\varphi(c) + f(c+0)\varphi(c) + \int_G f'(x)\varphi(x)\,dx = \langle (f(c+0) - f(c-0))\,\delta_c + T_{f'(x)}, \varphi \rangle = \langle h\,\delta_c + T_{f'}, \varphi \rangle.$$
(d) We prove that $f(x) = \log|x|$ is in $L^1_{\mathrm{loc}}(\mathbb{R})$ (see homeworks 50.4 and 51.5) and compute its derivative in $D'(\mathbb{R})$.

Proof. Since $f(x)$ is continuous on $\mathbb{R} \setminus \{0\}$, the only critical point is $0$. Since the integral (improper Riemann or Lebesgue) $\int_0^1 \log x\,dx = \int_{-\infty}^0 t e^t\,dt = -1$ exists, $f$ is locally integrable at $0$ and therefore defines a regular distribution. We will show that $f'(x) = P\frac{1}{x}$. We use the fact that $\int_{-\infty}^{\infty} = \int_{-\infty}^{-\varepsilon} + \int_{-\varepsilon}^{\varepsilon} + \int_{\varepsilon}^{\infty}$ for all $\varepsilon > 0$; the limit $\varepsilon \to 0$ of the right-hand side gives $\int_{-\infty}^{\infty}$. By the definition of the derivative,
$$\langle \log'|x|, \varphi(x) \rangle = -\langle \log|x|, \varphi'(x) \rangle = -\int_{-\infty}^{\infty} \log|x|\,\varphi'(x)\,dx = -\left( \int_{-\infty}^{-\varepsilon} + \int_{-\varepsilon}^{\varepsilon} + \int_{\varepsilon}^{\infty} \right)\log|x|\,\varphi'(x)\,dx.$$
Since $\int_{-1}^{1} |\log|x|\,\varphi'(x)|\,dx < \infty$, the middle integral $\int_{-\varepsilon}^{\varepsilon}\log|x|\,\varphi'(x)\,dx$ tends to $0$ as $\varepsilon \to 0$. (Apply Lebesgue's theorem to the family $g_\varepsilon(x) = \chi_{[-\varepsilon,\varepsilon]}(x)\log|x|\,\varphi'(x)$, which tends to $0$ pointwise and is dominated by the integrable function $\log|x|\,\varphi'(x)$.) We consider the third integral. Integration by parts and $\varphi(+\infty) = 0$ give
$$\int_\varepsilon^\infty \log x\,\varphi'(x)\,dx = \log x\,\varphi(x)\Big|_\varepsilon^\infty - \int_\varepsilon^\infty \frac{\varphi(x)}{x}\,dx = -\log\varepsilon\,\varphi(\varepsilon) - \int_\varepsilon^\infty \frac{\varphi(x)}{x}\,dx.$$
Similarly,
$$\int_{-\infty}^{-\varepsilon}\log(-x)\,\varphi'(x)\,dx = \log\varepsilon\,\varphi(-\varepsilon) - \int_{-\infty}^{-\varepsilon}\frac{\varphi(x)}{x}\,dx.$$
The sum of the two non-integral terms tends to $0$ as $\varepsilon \to 0$ since $\varepsilon\log\varepsilon \to 0$. Indeed,
$$\log\varepsilon\,\varphi(\varepsilon) - \log\varepsilon\,\varphi(-\varepsilon) = \log\varepsilon\,\frac{\varphi(\varepsilon) - \varphi(-\varepsilon)}{2\varepsilon}\,2\varepsilon \longrightarrow 2\lim_{\varepsilon\to 0}\varepsilon\log\varepsilon\;\varphi'(0) = 0.$$
Hence,
$$\langle f', \varphi \rangle = \lim_{\varepsilon\to 0}\left( \int_{-\infty}^{-\varepsilon} + \int_{\varepsilon}^{\infty} \right)\frac{\varphi(x)}{x}\,dx = \left\langle P\frac{1}{x}, \varphi \right\rangle.$$

(c) Convergence and Fourier Series

Lemma 16.5 Suppose that $(f_n)$ converges locally uniformly to some function $f$, that is, $f_n \rightrightarrows f$ uniformly on every compact set; assume further that $f_n$ is locally integrable for all $n$, $f_n \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$.
(a) Then $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ and $T_{f_n} \to T_f$ in $D'(\mathbb{R}^n)$.
(b) $D^\alpha T_{f_n} \to D^\alpha T_f$ in $D'(\mathbb{R}^n)$ for all multi-indices $\alpha$.

Proof. (a) Let $K$ be a compact subset of $\mathbb{R}^n$; we will show that $f \in L^1(K)$. Since the $f_n$ converge uniformly on $K$ to $f$, by Theorem 6.6 $f$ is integrable over $K$ and $\lim_{n\to\infty}\int_K f_n(x)\,dx = \int_K f\,dx$, such that $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$.
We show that $T_{f_n} \to T_f$ in $D'$. Indeed, for any $\varphi \in D$ with compact support $K$, again by Theorem 6.6 and the uniform convergence of $f_n\varphi$ on $K$,
$$\lim_{n\to\infty} T_{f_n}(\varphi) = \lim_{n\to\infty}\int_K f_n(x)\varphi(x)\,dx = \int_K \left( \lim_{n\to\infty} f_n(x) \right)\varphi(x)\,dx = \int_K f(x)\varphi(x)\,dx = T_f(\varphi);$$
we are allowed to exchange limit and integration since $(f_n(x)\varphi(x))$ converges uniformly on $K$. Since this is true for all $\varphi \in D$, it follows that $T_{f_n} \to T_f$.
(b) By Lemma 16.4 (a), differentiation is a continuous operation in $D'$. Thus $D^\alpha T_{f_n} \to D^\alpha T_f$.

Example 16.7 (a) Suppose that $a, b > 0$ and $m \in \mathbb{N}$ are given such that $|c_n| \le a|n|^m + b$ for all $n \in \mathbb{Z}$. Then the Fourier series
$$\sum_{n\in\mathbb{Z}} c_n e^{inx}$$
converges in $D'(\mathbb{R})$. First consider the series
$$\frac{c_0\,x^{m+2}}{(m+2)!} + \sum_{n\in\mathbb{Z},\,n\neq 0} \frac{c_n}{(ni)^{m+2}}\,e^{inx}. \tag{16.6}$$
By assumption,
$$\left| \frac{c_n}{(ni)^{m+2}}\,e^{inx} \right| = \frac{|c_n|}{|n|^{m+2}} \le \frac{a|n|^m + b}{|n|^{m+2}} \le \frac{\tilde a}{n^2}.$$
Since $\sum_{n\neq 0} \frac{\tilde a}{n^2} < \infty$, the series (16.6) converges uniformly on $\mathbb{R}$ by the Weierstraß criterion (Theorem 6.2). By Lemma 16.5, the series (16.6) converges in $D'$, too, and can be differentiated term by term. The $(m+2)$nd derivative of (16.6) is exactly the given Fourier series.

The $2\pi$-periodic function $f(x) = \frac{1}{2} - \frac{x}{2\pi}$, $x \in [0, 2\pi)$, has discontinuities of the first kind at $2\pi n$, $n \in \mathbb{Z}$; the jump has height $1$ since $f(0+0) - f(0-0) = \frac{1}{2} + \frac{1}{2} = 1$. Therefore, in $D'$,
$$f'(x) = -\frac{1}{2\pi} + \sum_{n\in\mathbb{Z}} \delta(x - 2\pi n).$$
The Fourier series of $f(x)$ is
$$f(x) \sim \frac{1}{2\pi i}\sum_{n\neq 0} \frac{e^{inx}}{n}.$$
Note that $f$ and the Fourier series $g$ on the right are equal in $L^2(0, 2\pi)$. Hence $\int_0^{2\pi} |f - g|^2 = 0$. This implies $f = g$ a.e. on $[0, 2\pi]$; moreover $f = g$ a.e. on $\mathbb{R}$. Thus $f = g$ in $L^1_{\mathrm{loc}}(\mathbb{R})$ and $f$ coincides with $g$ in $D'(\mathbb{R})$:
$$f(x) = \frac{1}{2\pi i}\sum_{n\neq 0} \frac{e^{inx}}{n} \quad \text{in } D'(\mathbb{R}).$$
By Lemma 16.5 the series can be differentiated termwise up to arbitrary order. Applying Example 16.6 we obtain
$$f'(x) + \frac{1}{2\pi} = \sum_{n\in\mathbb{Z}} \delta(x - 2\pi n) = \frac{1}{2\pi}\sum_{n\in\mathbb{Z}} e^{inx} \quad \text{in } D'(\mathbb{R}).$$
(b) A solution of $x^m u(x) = 0$ in $D'$ is

$$u(x) = \sum_{n=0}^{m-1} c_n\,\delta^{(n)}(x), \quad c_n \in \mathbb{C}.$$
Indeed, for every $\varphi \in D$ and $n = 0, \dots, m-1$ we have
$$\langle x^m\delta^{(n)}(x), \varphi \rangle = (-1)^n\langle \delta, (x^m\varphi(x))^{(n)} \rangle = (-1)^n\,(x^m\varphi(x))^{(n)}\Big|_{x=0} = 0;$$
thus the given $u$ satisfies $x^m u = 0$. One can show that this is the general solution; see [Wla72, p. 84].
(c) The general solution of the ODE $u^{(m)} = 0$ in $D'$ is a polynomial of degree $m - 1$.

Proof. We only prove that $u' = 0$ implies $u = c$ in $D'$. The general statement follows by induction on $m$.
Suppose that $u' = 0$. That is, for all $\psi \in D$ we have $0 = \langle u', \psi \rangle = -\langle u, \psi' \rangle$. Fix an auxiliary $\varphi_1 \in D$ with $\langle 1, \varphi_1 \rangle = 1$. For $\varphi \in D$ the function
$$\psi(x) = \int_{-\infty}^{x} \big( \varphi(t) - \varphi_1(t)\,I \big)\,dt, \quad \text{where } I = \langle 1, \varphi \rangle,$$
belongs to $D$: it is smooth since both $\varphi$ and $\varphi_1$ are, and it has compact support since $\int_{\mathbb{R}} (\varphi - I\varphi_1)\,dt = I - I\langle 1, \varphi_1 \rangle = 0$. Since $\langle u, \psi' \rangle = 0$ and $\psi' = \varphi - I\varphi_1$, we obtain
$$0 = \langle u, \psi' \rangle = \langle u, \varphi - \varphi_1 I \rangle = \langle u, \varphi \rangle - \langle u, \varphi_1 \rangle\langle 1, \varphi \rangle = \langle u - \langle u, \varphi_1 \rangle\,1, \varphi \rangle = \langle u - c\,1, \varphi \rangle,$$
where $c = \langle u, \varphi_1 \rangle$. Since this is true for all test functions $\varphi \in D(\mathbb{R})$, we obtain $0 = u - c$, that is, $u = c$, which proves the assertion.
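Returning to Example 16.7 (a): the sawtooth identity there rests on the classical pointwise fact that $\frac{1}{2\pi i}\sum_{n\neq 0} e^{inx}/n = \frac{1}{\pi}\sum_{n\ge 1}\frac{\sin nx}{n}$ converges to $f(x) = \frac{1}{2} - \frac{x}{2\pi}$ on $(0, 2\pi)$. A quick numerical sketch of this convergence:

```python
import math

def sawtooth(x):
    # f(x) = 1/2 - x/(2*pi) on (0, 2*pi)
    return 0.5 - x / (2.0 * math.pi)

def partial_sum(x, N):
    # (1/pi) * sum_{n=1}^{N} sin(n x)/n, the symmetric partial Fourier sum
    return sum(math.sin(n * x) / n for n in range(1, N + 1)) / math.pi

x = 1.0
print(sawtooth(x), partial_sum(x, 100000))   # the two values agree closely
```

The convergence is only $O(1/N)$ pointwise, and fails at the jump points $2\pi n$; in $D'$, by contrast, the series may be differentiated termwise without any such worries.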

16.3 Tensor Product and Convolution Product

16.3.1 The Support of a Distribution

Let $T \in D'$ be a distribution. We say that $T$ vanishes at $x_0$ if there exists $\varepsilon > 0$ such that $\langle T, \varphi \rangle = 0$ for all functions $\varphi \in D$ with $\operatorname{supp}\varphi \subseteq U_\varepsilon(x_0)$. Similarly, we say that two distributions $T_1$ and $T_2$ are equal at $x_0$, $T_1(x_0) = T_2(x_0)$, if $T_1 - T_2$ vanishes at $x_0$. Note that $T_1 = T_2$ if and only if $T_1 = T_2$ at $a$ for all points $a \in \mathbb{R}^n$.

Definition 16.10 Let $T \in D'$ be a distribution. The support of $T$, denoted by $\operatorname{supp} T$, is the set of all points $x$ such that $T$ does not vanish at $x$, that is,
$$\operatorname{supp} T = \{ x \mid \forall\,\varepsilon > 0\ \exists\,\varphi \in D(U_\varepsilon(x)) : \langle T, \varphi \rangle \neq 0 \}.$$

Remarks 16.6 (a) If $f$ is continuous, then $\operatorname{supp} T_f = \operatorname{supp} f$; for an arbitrary locally integrable function we have, in general, $\operatorname{supp} T_f \subset \operatorname{supp} f$. The support of a distribution is closed. Its complement is the largest open subset $G$ of $\mathbb{R}^n$ such that $T{\restriction}_G = 0$.
(b) $\operatorname{supp}\delta_a = \{a\}$, that is, $\delta_a$ vanishes at all points $b \neq a$. $\operatorname{supp} T_H = [0, +\infty)$, $\operatorname{supp} T_{\chi_{\mathbb{Q}}} = \emptyset$.

16.3.2 Tensor Products

(a) Tensor Product of Functions

Let $f : \mathbb{R}^n \to \mathbb{C}$, $g : \mathbb{R}^m \to \mathbb{C}$ be functions. Then the tensor product $f \otimes g : \mathbb{R}^{n+m} \to \mathbb{C}$ is defined via $f \otimes g(x, y) = f(x)g(y)$, $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$.
If $\varphi_k \in D(\mathbb{R}^n)$ and $\psi_k \in D(\mathbb{R}^m)$, $k = 1, \dots, r$, we call the function $\varphi(x, y) = \sum_{k=1}^{r} \varphi_k(x)\psi_k(y)$, which is defined on $\mathbb{R}^{n+m}$, the tensor product of the functions $\varphi_k$ and $\psi_k$. It is denoted by $\sum_k \varphi_k \otimes \psi_k$. The set of such tensors $\sum_k \varphi_k \otimes \psi_k$ is denoted by $D(\mathbb{R}^n) \otimes D(\mathbb{R}^m)$. It is a linear space.
Note first that under the above assumptions on $\varphi_k$ and $\psi_k$ the tensor product $\varphi = \sum_k \varphi_k \otimes \psi_k$ is in $C^\infty(\mathbb{R}^{n+m})$. Let $K_1 \subset \mathbb{R}^n$ and $K_2 \subset \mathbb{R}^m$ denote the common compact supports of the families $\{\varphi_k\}$ and $\{\psi_k\}$, respectively. Then $\operatorname{supp}\varphi \subset K_1 \times K_2$. Since both $K_1$ and $K_2$ are compact, their product $K_1 \times K_2$ is again compact. Hence $\varphi(x, y) \in D(\mathbb{R}^{n+m})$. Thus, $D(\mathbb{R}^n) \otimes D(\mathbb{R}^m) \subset D(\mathbb{R}^{n+m})$. Moreover, $D(\mathbb{R}^n) \otimes D(\mathbb{R}^m)$ is a dense subspace of $D(\mathbb{R}^{n+m})$. That is, for any $\eta \in D(\mathbb{R}^{n+m})$ there exist positive integers $r_m$ and test functions $\varphi_k^{(m)}, \psi_k^{(m)}$ such that
$$\sum_{k=1}^{r_m} \varphi_k^{(m)} \otimes \psi_k^{(m)} \xrightarrow{D} \eta \quad \text{as } m \to \infty.$$

(b) Tensor Product of Distributions

Definition 16.11 Let $T \in D'(\mathbb{R}^n)$ and $S \in D'(\mathbb{R}^m)$ be two distributions. Then there exists a unique distribution $F \in D'(\mathbb{R}^{n+m})$ such that for all $\varphi \in D(\mathbb{R}^n)$ and $\psi \in D(\mathbb{R}^m)$
$$F(\varphi \otimes \psi) = T(\varphi)S(\psi).$$
This distribution $F$ is denoted by $T \otimes S$.

Indeed, $T \otimes S$ is linear on $D(\mathbb{R}^n) \otimes D(\mathbb{R}^m)$, such that $(T \otimes S)\!\left( \sum_{k=1}^{r} \varphi_k \otimes \psi_k \right) = \sum_{k=1}^{r} T(\varphi_k)S(\psi_k)$. By continuity it is extended from $D(\mathbb{R}^n) \otimes D(\mathbb{R}^m)$ to $D(\mathbb{R}^{n+m})$. For example, if $a \in \mathbb{R}^n$, $b \in \mathbb{R}^m$, then $\delta_a \otimes \delta_b = \delta_{(a,b)}$. Indeed, for $\varphi \in D(\mathbb{R}^n)$ and $\psi \in D(\mathbb{R}^m)$ we have
$$(\delta_a \otimes \delta_b)(\varphi \otimes \psi) = \varphi(a)\psi(b) = (\varphi \otimes \psi)(a, b) = \delta_{(a,b)}(\varphi \otimes \psi).$$

Lemma 16.6 Let $F = T \otimes S$ be the unique distribution in $D'(\mathbb{R}^{n+m})$, where $T \in D'(\mathbb{R}^n)$ and $S \in D'(\mathbb{R}^m)$, and let $\eta(x, y) \in D(\mathbb{R}^{n+m})$.
Then $\varphi(x) = \langle S(y), \eta(x, y) \rangle$ is in $D(\mathbb{R}^n)$, $\psi(y) = \langle T(x), \eta(x, y) \rangle$ is in $D(\mathbb{R}^m)$, and we have
$$\langle T \otimes S, \eta \rangle = \langle S, \langle T, \eta \rangle \rangle = \langle T, \langle S, \eta \rangle \rangle.$$
For the proof, see [Wla72, II.7].

Example 16.8 (a) Regular distributions. Let $f \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ and $g \in L^1_{\mathrm{loc}}(\mathbb{R}^m)$. Then $f \otimes g \in L^1_{\mathrm{loc}}(\mathbb{R}^{n+m})$ and $T_f \otimes T_g = T_{f\otimes g}$. Indeed, by Fubini's theorem, for test functions $\varphi$ and $\psi$ one has
$$\langle T_f \otimes T_g, \varphi \otimes \psi \rangle = \langle T_f, \varphi \rangle\langle T_g, \psi \rangle = \int_{\mathbb{R}^n} f(x)\varphi(x)\,dx \int_{\mathbb{R}^m} g(y)\psi(y)\,dy = \int_{\mathbb{R}^{n+m}} f(x)g(y)\,\varphi(x)\psi(y)\,dx\,dy = \langle T_{f\otimes g}, \varphi \otimes \psi \rangle.$$

(b) $\langle \delta_{x_0} \otimes T, \eta \rangle = \langle T, \eta(x_0, y) \rangle$. Indeed,
$$\langle \delta_{x_0} \otimes T, \varphi(x)\psi(y) \rangle = \langle \delta_{x_0}, \varphi(x) \rangle\langle T, \psi \rangle = \varphi(x_0)\langle T, \psi(y) \rangle = \langle T, \varphi(x_0)\psi(y) \rangle.$$
In particular,
$$(\delta_a \otimes T_g)(\eta) = \int_{\mathbb{R}^m} g(y)\,\eta(a, y)\,dy.$$
(c) For any $\alpha \in \mathbb{N}_0^n$, $\beta \in \mathbb{N}_0^m$,
$$D^{\alpha+\beta}(T \otimes S) = (D_x^\alpha T) \otimes (D_y^\beta S) = D^\beta\big( (D^\alpha T) \otimes S \big) = D^\alpha\big( T \otimes D^\beta S \big).$$

Idea of proof in case $n = m = 1$. Let $\varphi, \psi \in D(\mathbb{R})$. Then
$$\left( \frac{\partial}{\partial x}(T \otimes S) \right)(\varphi \otimes \psi) = -(T \otimes S)\left( \frac{\partial}{\partial x}(\varphi \otimes \psi) \right) = -(T \otimes S)(\varphi' \otimes \psi) = -T(\varphi')S(\psi) = T'(\varphi)S(\psi) = (T' \otimes S)(\varphi \otimes \psi).$$

16.3.3 Convolution Product

Motivation: knowing the fundamental solution $E$ of a linear differential operator $L$, that is, $L(E) = \delta$, one immediately has a solution of the equation $L[u] = f$ for arbitrary $f$, namely $u = E * f$, where $*$ is the convolution product already defined for functions in Definition 16.2.

(a) Convolution Product of Functions

The main problem with convolutions is that we run into trouble with the support. Even in the case that $f$ and $g$ are locally integrable, $f * g$ need not be a locally integrable function. However, there are three cases where all is fine:
1. One of the two functions $f$ or $g$ has compact support.
2. Both functions have support in $[0, +\infty)$.
3. Both functions are in $L^1(\mathbb{R})$.

In the last case $(f * g)(x) = \int_{\mathbb{R}} f(y)g(x - y)\,dy$ is again integrable. The convolution product is a commutative and associative operation on $L^1(\mathbb{R}^n)$.

(b) Convolution Product of Distributions

Let us consider the case of regular distributions. Suppose that $f, g, f * g \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$. As usual we want to have $T_f * T_g = T_{f*g}$. Let $\varphi \in D(\mathbb{R}^n)$; then
$$\langle T_{f*g}, \varphi \rangle = \int_{\mathbb{R}^n} (f * g)(x)\varphi(x)\,dx = \iint_{\mathbb{R}^{2n}} f(y)g(x - y)\varphi(x)\,dx\,dy \underset{t = x - y}{=} \iint_{\mathbb{R}^{2n}} f(y)g(t)\varphi(y + t)\,dy\,dt = T_f \otimes T_g(\tilde\varphi), \tag{16.7}$$
where $\tilde\varphi(y, t) = \varphi(y + t)$. There are two problems: (a) in general $\tilde\varphi$ is not a test function since it has unbounded support in $\mathbb{R}^{2n}$. Indeed, $(y, t) \in \operatorname{supp}\tilde\varphi$ if $y + t = c \in \operatorname{supp}\varphi$, which is a family of parallel hyperplanes forming an unbounded strip. (b) The integral need not exist. We overcome the second problem if we impose the condition that the set
$$K_\varphi = \{ (y, t) \in \mathbb{R}^{2n} \mid y \in \operatorname{supp} T_f,\ t \in \operatorname{supp} T_g,\ y + t \in \operatorname{supp}\varphi \}$$
is bounded for any $\varphi \in D(\mathbb{R}^n)$; then the integral (16.7) makes sense. We solve problem (a) by "cutting" $\tilde\varphi$. Define
$$T_{f*g}(\varphi) = \lim_{k\to\infty} (T_f \otimes T_g)\big( \varphi(y + t)\,\eta_k(y, t) \big),$$
where $\eta_k \to 1$ as $k \to \infty$ and $\eta_k \in D(\mathbb{R}^{2n})$. Such a sequence exists; let $\eta(y, t) \in D(\mathbb{R}^{2n})$ with $\eta(y, t) = 1$ for $\|y\|^2 + \|t\|^2 \le 1$. Put $\eta_k(y, t) = \eta\!\left( \frac{y}{k}, \frac{t}{k} \right)$, $k \in \mathbb{N}$. Then $\lim_{k\to\infty} \eta_k(y, t) = 1$ for all $(y, t) \in \mathbb{R}^{2n}$.

Definition 16.12 Let $T, S \in D'(\mathbb{R}^n)$ be distributions and assume that for every $\varphi \in D(\mathbb{R}^n)$ the set
$$K_\varphi := \{ (x, y) \in \mathbb{R}^{2n} \mid x + y \in \operatorname{supp}\varphi,\ x \in \operatorname{supp} T,\ y \in \operatorname{supp} S \}$$
is bounded. Define
$$\langle T * S, \varphi \rangle = \lim_{k\to\infty} \langle T \otimes S, \varphi(x + y)\,\eta_k(x, y) \rangle. \tag{16.8}$$
$T * S$ is called the convolution product of the distributions $S$ and $T$.

Remark 16.7 (a) The sequence (16.8) becomes stationary for large $k$, such that the limit exists. Indeed, for fixed $\varphi$ the set $K_\varphi$ is bounded; hence there exists $k_0 \in \mathbb{N}$ such that $\eta_k(x, y) = 1$ for all $(x, y) \in K_\varphi$ and all $k \ge k_0$. That is, $\langle T \otimes S, \varphi(x + y)\eta_k(x, y) \rangle$ does not change for $k \ge k_0$.
(b) The limit is a distribution in $D'(\mathbb{R}^n)$.
(c) The limit does not depend on the special choice of the sequence $\eta_k$.

Remarks 16.8 (Properties) (a) If $S$ or $T$ has compact support, then $T * S$ exists. Indeed, suppose that $\operatorname{supp} T$ is compact. Then $x + y \in \operatorname{supp}\varphi$ and $x \in \operatorname{supp} T$ imply $y \in \operatorname{supp}\varphi - \operatorname{supp} T = \{ y_1 - y_2 \mid y_1 \in \operatorname{supp}\varphi,\ y_2 \in \operatorname{supp} T \}$. Hence $(x, y) \in K_\varphi$ implies
$$\|(x, y)\| \le \|x\| + \|y\| \le \|x\| + \|y_1\| + \|y_2\| \le 2C + D$$
if $\operatorname{supp} T \subset U_C(0)$ and $\operatorname{supp}\varphi \subset U_D(0)$. That is, $K_\varphi$ is bounded.
(b) If $S * T$ exists, so does $T * S$, and $S * T = T * S$.
(c) If $T * S$ exists, so do $D^\alpha T * S$, $T * D^\alpha S$, and $D^\alpha(T * S)$, and they coincide:
$$D^\alpha(T * S) = D^\alpha T * S = T * D^\alpha S.$$

Proof. For simplicity let $n = 1$ and $D^\alpha = \frac{d}{dx}$. Suppose that $\varphi \in D(\mathbb{R})$; then
$$\langle (S * T)', \varphi \rangle = -\langle S * T, \varphi' \rangle = -\lim_{k\to\infty} \langle S \otimes T, \varphi'(x + y)\,\eta_k(x, y) \rangle$$
$$= -\lim_{k\to\infty} \left\langle S \otimes T, \frac{\partial}{\partial x}\big( \varphi(x + y)\eta_k(x, y) \big) - \varphi(x + y)\frac{\partial \eta_k}{\partial x} \right\rangle$$
$$= \lim_{k\to\infty} \langle S' \otimes T, \varphi(x + y)\eta_k(x, y) \rangle - \lim_{k\to\infty} \underbrace{\left\langle S \otimes T, \varphi(x + y)\frac{\partial \eta_k}{\partial x} \right\rangle}_{=\,0 \text{ for large } k} = \langle S' * T, \varphi \rangle.$$
The proof of the second equality uses the commutativity of the convolution product.

(d) If $\operatorname{supp} S$ is compact and $\psi \in D(\mathbb{R}^n)$ is such that $\psi(y) = 1$ in a neighborhood of $\operatorname{supp} S$, then
$$(T * S)(\varphi) = \langle T \otimes S, \varphi(x + y)\psi(y) \rangle \quad \forall\,\varphi \in D(\mathbb{R}^n).$$
(e) If $T_1, T_2, T_3 \in D'(\mathbb{R}^n)$ all have compact support, then $T_1 * (T_2 * T_3)$ and $(T_1 * T_2) * T_3$ exist and $T_1 * (T_2 * T_3) = (T_1 * T_2) * T_3$.

16.3.4 Linear Change of Variables

Suppose that $y = Ax + b$ is a regular linear change of variables; that is, $A$ is a regular $n \times n$ matrix. As usual, consider first the case of a regular distribution $f(x)$. Let $\tilde f(x) = f(Ax + b)$ with $y = Ax + b$, $x = A^{-1}(y - b)$, $dy = |\det A|\,dx$. Then
$$\langle \tilde f(x), \varphi(x) \rangle = \int f(Ax + b)\varphi(x)\,dx = \frac{1}{|\det A|}\int f(y)\,\varphi(A^{-1}(y - b))\,dy = \frac{1}{|\det A|}\left\langle f(y), \varphi(A^{-1}(y - b)) \right\rangle.$$

Definition 16.13 Let $T \in D'(\mathbb{R}^n)$, $A$ a regular $n \times n$ matrix, and $b \in \mathbb{R}^n$. Then $T(Ax + b)$ denotes the distribution
$$\langle T(Ax + b), \varphi(x) \rangle := \frac{1}{|\det A|}\left\langle T(y), \varphi(A^{-1}(y - b)) \right\rangle.$$
For example, $T(x) = T$ and $\langle T(x - a), \varphi(x) \rangle = \langle T, \varphi(x + a) \rangle$; in particular, $\delta(x - a) = \delta_a$:
$$\langle \delta(x - b), \varphi(x) \rangle = \langle \delta(x), \varphi(x + b) \rangle = \varphi(0 + b) = \varphi(b) = \langle \delta_b, \varphi \rangle.$$

Example 16.9 (a) $\delta * S = S * \delta = S$ for all $S \in D'$. The existence is clear since $\delta$ has compact support.
$$\langle \delta * S, \varphi \rangle = \lim_{k\to\infty} \langle \delta(x) \otimes S(y), \varphi(x + y)\eta_k(x, y) \rangle = \lim_{k\to\infty} \langle S(y), \varphi(y)\eta_k(0, y) \rangle = \langle S, \varphi \rangle.$$
(b) $\delta_a * S = S(x - a)$. Indeed,
$$(\delta_a * S)(\varphi) = \lim_{k\to\infty} \langle \delta_a \otimes S, \varphi(x + y)\eta_k(x, y) \rangle = \lim_{k\to\infty} \langle S(y), \varphi(a + y)\eta_k(a, y) \rangle = \langle S(y), \varphi(a + y) \rangle = \langle S(y - a), \varphi \rangle.$$
In particular, $\delta_a * \delta_b = \delta_{a+b}$.
(c) Let $\varrho \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ with $\operatorname{supp} T_\varrho$ compact.
Case $n = 2$: $f(x) = \log\frac{1}{\|x\|} \in L^1_{\mathrm{loc}}(\mathbb{R}^2)$. We call
$$V(x) = (\varrho * f)(x) = \iint_{\mathbb{R}^2} \varrho(y)\log\frac{1}{\|x - y\|}\,dy$$
the surface potential with density $\varrho$.
Case $n \ge 3$: $f(x) = \frac{1}{\|x\|^{n-2}} \in L^1_{\mathrm{loc}}(\mathbb{R}^n)$. We call
$$V(x) = (\varrho * f)(x) = \int_{\mathbb{R}^n} \frac{\varrho(y)}{\|x - y\|^{n-2}}\,dy$$
the volume potential with density $\varrho$.

(d) For $\alpha > 0$ and $x \in \mathbb{R}$ put $f_\alpha(x) = \frac{1}{\alpha\sqrt{2\pi}}\,e^{-\frac{x^2}{2\alpha^2}}$. Then $f_\alpha * f_\beta = f_{\sqrt{\alpha^2+\beta^2}}$.
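Claim (d) is easy to verify numerically: convolving two centered Gaussians adds their variances. A minimal sketch evaluating $(f_\alpha * f_\beta)(x_0)$ by a midpoint rule and comparing with $f_{\sqrt{\alpha^2+\beta^2}}(x_0)$ (parameters chosen arbitrarily for illustration):

```python
import math

def gauss(x, a):
    # f_a(x) = 1/(a*sqrt(2*pi)) * exp(-x^2/(2*a^2))
    return math.exp(-x * x / (2.0 * a * a)) / (a * math.sqrt(2.0 * math.pi))

def conv_at(x0, a, b, R=12.0, n=200000):
    # (f_a * f_b)(x0) = int f_a(y) f_b(x0 - y) dy, midpoint rule on [-R, R]
    h = 2.0 * R / n
    s = 0.0
    for i in range(n):
        y = -R + (i + 0.5) * h
        s += gauss(y, a) * gauss(x0 - y, b)
    return s * h

x0, a, b = 0.7, 1.0, 1.0
print(conv_at(x0, a, b), gauss(x0, math.hypot(a, b)))  # the two values agree closely
```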

16.3.5 Fundamental Solutions

Suppose that $L[u]$ is a linear differential operator on $\mathbb{R}^n$,
$$L[u] = \sum_{|\alpha| \le k} c_\alpha(x)\,D^\alpha u,$$
where $c_\alpha \in C^\infty(\mathbb{R}^n)$.

Definition 16.14 A distribution $E \in D'(\mathbb{R}^n)$ is said to be a fundamental solution of the differential operator $L$ if $L(E) = \delta$.

Note that $E \in D'(\mathbb{R}^n)$ need not be unique. It is a general result due to Malgrange and Ehrenpreis (1952) that any linear partial differential operator with constant coefficients possesses a fundamental solution.

(a) ODE

We start with an example from the theory of ordinary differential equations. Recall that $H = \chi_{(0,+\infty)}$ denotes the Heaviside function.

Lemma 16.7 Suppose that $u(t)$ is a solution of the following initial value problem for the ODE
$$L[u] = u^{(m)} + a_1(t)u^{(m-1)} + \cdots + a_m(t)u = 0,$$
$$u(0) = u'(0) = \cdots = u^{(m-2)}(0) = 0, \qquad u^{(m-1)}(0) = 1.$$
Then $E = T_{H(t)u(t)}$ is a fundamental solution of $L$, that is, it satisfies $L(E) = \delta$.

Proof. Using the Leibniz rule, Example 16.5 (b), and $u(0) = 0$, we find
$$E' = \delta\cdot u + T_{Hu'} = u(0)\,\delta + T_{Hu'} = T_{Hu'}.$$
Similarly, one has
$$E'' = T_{Hu''},\ \dots,\ E^{(m-1)} = T_{Hu^{(m-1)}}, \qquad E^{(m)} = T_{Hu^{(m)}} + \delta(t),$$
where the last $\delta$ appears since $u^{(m-1)}(0) = 1$. This yields
$$L(E) = E^{(m)} + a_1(t)E^{(m-1)} + \cdots + a_m(t)E = T_{H(t)L[u](t)} + \delta = T_0 + \delta = \delta.$$

Example 16.10 We have the following examples of fundamental solutions:
$$y' + ay = 0, \qquad E = T_{H(x)e^{-ax}};$$
$$y'' + a^2 y = 0, \qquad E = T_{H(x)\frac{\sin ax}{a}}.$$
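That $E = T_{H(x)e^{-ax}}$ is a fundamental solution of $y' + ay$ means precisely $\langle E' + aE, \varphi \rangle = \langle E, -\varphi' + a\varphi \rangle = \varphi(0)$ for every test function $\varphi$. A numerical sketch of this weak identity, with the stand-in $\varphi(x) = e^{-x^2}$ and an arbitrary choice $a = 2$:

```python
import math

a = 2.0

def E(x):
    # kernel of the fundamental solution: H(x) * exp(-a*x)
    return math.exp(-a * x) if x > 0.0 else 0.0

def phi(x):
    return math.exp(-x * x)            # stand-in test function, phi(0) = 1

def dphi(x):
    return -2.0 * x * math.exp(-x * x)

def weak_pairing(R=15.0, n=200000):
    # <E' + a E, phi> = int_0^R exp(-a x) * (-phi'(x) + a*phi(x)) dx  (midpoint rule)
    h = R / n
    s = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        s += E(x) * (-dphi(x) + a * phi(x))
    return s * h

print(weak_pairing())   # close to phi(0) = 1
```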


(b) PDE

Here is the main application of the convolution product: knowing a fundamental solution of a partial differential operator $L$, one immediately knows a weak solution of the inhomogeneous equation $L(u) = f$ for $f \in D'(\mathbb{R}^n)$.

Theorem 16.8 Suppose that $L[u] = \sum_{|\alpha| \le k} c_\alpha D^\alpha u$ is a linear differential operator in $\mathbb{R}^n$ with constant coefficients $c_\alpha$. Suppose further that $E \in D'(\mathbb{R}^n)$ is a fundamental solution of $L$. Let $f \in D'(\mathbb{R}^n)$ be a distribution such that the convolution product $S = E * f$ exists.
Then $L(S) = f$ in $D'$. In the set of distributions of $D'$ which possess a convolution with $E$, $S$ is the unique solution of $L(S) = f$.

Proof. By Remark 16.8 (c) we have
$$L(S) = \sum_{|\alpha| \le k} c_\alpha D^\alpha(E * f) = \sum_{|\alpha| \le k} c_\alpha D^\alpha(E) * f = L(E) * f = \delta * f = f.$$
Suppose that $S_1$ and $S_2$ are both solutions of $L(S) = f$, i.e. $L(S_1) = L(S_2) = f$. Then
$$S_1 - S_2 = (S_1 - S_2) * \delta = (S_1 - S_2) * \sum_{|\alpha| \le k} c_\alpha D^\alpha E = \left( \sum_{|\alpha| \le k} c_\alpha D^\alpha(S_1 - S_2) \right) * E = (f - f) * E = 0. \tag{16.9}$$

16.4 Fourier Transformation in $S(\mathbb{R}^n)$ and $S'(\mathbb{R}^n)$

We want to define the Fourier transformation for test functions $\varphi$ as well as for distributions. The problem with $D(\mathbb{R}^n)$ is that the Fourier transform
$$F\varphi(\xi) = \alpha_n \int_{\mathbb{R}^n} e^{-ix\xi}\varphi(x)\,dx$$
of $\varphi$ is an entire (analytic) function with support all of $\mathbb{R}$. That is, $F\varphi$ does not have compact support; the only test function in $D$ which is analytic is $0$. To overcome this problem, we enlarge the space of test functions, $D \subset S$, in such a way that $S$ becomes invariant under the Fourier transformation, $F(S) \subset S$.

Lemma 16.9 Let $\varphi \in D(\mathbb{R})$. Then the Fourier transform $g(z) = \alpha_n \int_{\mathbb{R}} e^{-itz}\varphi(t)\,dt$ is holomorphic in the whole complex plane and bounded in any half-plane $H_a = \{ z \in \mathbb{C} \mid \operatorname{Im}(z) \le a \}$.

Proof. (a) We show that the complex limit $\lim_{h\to0}\bigl(g(z+h)-g(z)\bigr)/h$ exists for all $z\in\mathbb{C}$. Indeed,
$$\frac{g(z+h)-g(z)}{h}=\alpha_n\int_{\mathbb{R}}e^{-izt}\,\frac{e^{-iht}-1}{h}\,\varphi(t)\,dt.$$
Since $\bigl|e^{-izt}\,\frac{e^{-iht}-1}{h}\,\varphi(t)\bigr|\le C$ for all $t\in\operatorname{supp}(\varphi)$, $h\in\mathbb{C}$, $|h|\le1$, we can apply Lebesgue's dominated convergence theorem:
$$\lim_{h\to0}\frac{g(z+h)-g(z)}{h}=\alpha_n\int_{\mathbb{R}}e^{-izt}\lim_{h\to0}\frac{e^{-iht}-1}{h}\,\varphi(t)\,dt=\alpha_n\int_{\mathbb{R}}e^{-izt}(-it)\varphi(t)\,dt=F(-it\varphi(t)).$$
(b) Suppose that $\operatorname{Im}(z)\le a$. Then
$$|g(z)|\le\alpha_n\int_{\mathbb{R}}\bigl|e^{-it\operatorname{Re}(z)}\bigr|\,e^{t\operatorname{Im}(z)}\,|\varphi(t)|\,dt\le\alpha_n\sup_{t\in K}|\varphi(t)|\int_Ke^{ta}\,dt,$$
where $K$ is a compact set which contains $\operatorname{supp}\varphi$.

16.4.1 The Space $S(\mathbb{R}^n)$

Definition 16.15 $S(\mathbb{R}^n)$ is the set of all functions $f\in C^\infty(\mathbb{R}^n)$ such that for all multi-indices $\alpha$ and $\beta$
$$p_{\alpha,\beta}(f)=\sup_{x\in\mathbb{R}^n}\bigl|x^\beta D^\alpha f(x)\bigr|<\infty.$$
$S$ is called the Schwartz space or the space of rapidly decreasing functions.

$$S(\mathbb{R}^n)=\bigl\{f\in C^\infty(\mathbb{R}^n)\mid\forall\,\alpha,\beta:\ p_{\alpha,\beta}(f)<\infty\bigr\}.$$
Roughly speaking, a Schwartz space function is a function decreasing to $0$ (together with all its partial derivatives) faster than any rational function $\frac{1}{P(x)}$ as $\|x\|\to\infty$. In place of $p_{\alpha,\beta}$ one can also use the norms
$$p_{k,l}(\varphi)=\sum_{|\alpha|\le k,\,|\beta|\le l}p_{\alpha,\beta}(\varphi),\qquad k,l\in\mathbb{Z}_+,$$
to describe $S(\mathbb{R}^n)$. The set $S(\mathbb{R}^n)$ is a linear space, and the $p_{\alpha,\beta}$ are norms on $S$.
For example, $P(x)\notin S$ for any non-zero polynomial $P(x)$; however, $e^{-\|x\|^2}\in S(\mathbb{R}^n)$. $S(\mathbb{R}^n)$ is an algebra: the generalized Leibniz rule ensures $p_{kl}(\varphi\cdot\psi)<\infty$. For example, $f(x)=p(x)\,e^{-ax^2+bx+c}$, $a>0$, belongs to $S(\mathbb{R})$ for $p$ a polynomial; $g(x)=e^{-|x|}$ is not differentiable at $0$ and hence not in $S(\mathbb{R})$.

Convergence in S

Definition 16.16 Let $\varphi_n,\varphi\in S$. We say that the sequence $(\varphi_n)$ converges in $S$ to $\varphi$, abbreviated by $\varphi_n\xrightarrow{S}\varphi$, if one of the following equivalent conditions is satisfied for all multi-indices $\alpha$ and $\beta$:
$$p_{\alpha,\beta}(\varphi-\varphi_n)\xrightarrow{n\to\infty}0;$$
$$x^\beta D^\alpha(\varphi-\varphi_n)\longrightarrow0\quad\text{uniformly on }\mathbb{R}^n;$$
$$x^\beta D^\alpha\varphi_n\longrightarrow x^\beta D^\alpha\varphi\quad\text{uniformly on }\mathbb{R}^n.$$

Remarks 16.9 (a) In quantum mechanics one defines the position and momentum operators $Q_k$ and $P_k$, $k=1,\dots,n$, by
$$(Q_k\varphi)(x)=x_k\varphi(x),\qquad(P_k\varphi)(x)=-i\,\frac{\partial\varphi}{\partial x_k},$$
respectively. The space $S$ is invariant under both operators $Q_k$ and $P_k$; that is, $x^\beta D^\alpha\varphi(x)\in S(\mathbb{R}^n)$ for all $\varphi\in S(\mathbb{R}^n)$.

(b) $S(\mathbb{R}^n)\subset L^1(\mathbb{R}^n)$.
Recall that a rational function $P(x)/Q(x)$ is integrable over $[1,+\infty)$ if and only if $Q(x)\neq0$ for $x\ge1$ and $\deg Q\ge\deg P+2$; indeed, $C/x^2$ is then an integrable upper bound. We want to find a condition on $m$ such that
$$\int_{\mathbb{R}^n}\frac{dx}{(1+\|x\|^2)^m}<\infty.$$
For this, we use that any non-zero $x\in\mathbb{R}^n$ can uniquely be written as $x=ry$, where $r=\|x\|$ and $y$ is on the unit sphere $S^{n-1}$. One can show that $dx_1\,dx_2\cdots dx_n=r^{n-1}\,dr\,dS$, where $dS$ is the surface element of the unit sphere $S^{n-1}$. Using this and Fubini's theorem,
$$\int_{\mathbb{R}^n}\frac{dx}{(1+\|x\|^2)^m}=\int_0^\infty\!\!\int_{S^{n-1}}\frac{r^{n-1}\,dS\,dr}{(1+r^2)^m}=\omega_{n-1}\int_0^\infty\frac{r^{n-1}\,dr}{(1+r^2)^m},$$
where $\omega_{n-1}$ is the $(n-1)$-dimensional measure of the unit sphere $S^{n-1}$. By the above criterion, the integral is finite if and only if $2m-n+1>1$, that is, if and only if $m>n/2$. In particular,
$$\int_{\mathbb{R}^n}\frac{dx}{1+\|x\|^{n+1}}<\infty.$$
In case $n=1$ the integral is $\pi$. By the above argument,
$$\int_{\mathbb{R}^n}|\varphi(x)|\,dx=\int_{\mathbb{R}^n}\bigl(1+\|x\|^{2n}\bigr)|\varphi(x)|\,\frac{dx}{1+\|x\|^{2n}}\le C\,p_{0,2n}(\varphi)\int_{\mathbb{R}^n}\frac{dx}{1+\|x\|^{2n}}<\infty.$$

(c) $D(\mathbb{R}^n)\subset S(\mathbb{R}^n)$; indeed, the supremum $p_{\alpha,\beta}(\varphi)$ of any test function $\varphi\in D(\mathbb{R}^n)$ is finite, since the supremum of a continuous function over a compact set is finite. On the other hand, $D\subsetneq S$ since $e^{-\|x\|^2}$ is in $S$ but not in $D$.

(d) In contrast to $D(\mathbb{R}^n)$, the space of rapidly decreasing functions $S(\mathbb{R}^n)$ is a metric space. Indeed, $S(\mathbb{R}^n)$ is a locally convex space, that is, a linear space $V$ whose topology is given by a set of semi-norms $p_\alpha$ separating the elements of $V$, i.e. $p_\alpha(x)=0$ for all $\alpha$ implies $x=0$. Any locally convex linear space whose topology is given by a countable set of semi-norms is metrizable. Let $(p_n)_{n\in\mathbb{N}}$ be the defining family of semi-norms. Then
$$d(\varphi,\psi)=\sum_{n=1}^\infty\frac{1}{2^n}\,\frac{p_n(\varphi-\psi)}{1+p_n(\varphi-\psi)},\qquad\varphi,\psi\in V,$$
defines a metric on $V$ describing the same topology. (In our case, use Cantor's first diagonal method to write the norms $p_{kl}$, $k,l\in\mathbb{N}$, from the array into a sequence $p_n$.)
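The countable-seminorm metric above can be illustrated numerically by truncating the family $p_{\alpha,\beta}$ to finitely many seminorms evaluated on a finite grid; the grid, the truncation order, and the sample functions below are ad-hoc choices, not part of the construction in the text.

```python
import numpy as np

# Truncated version of d(phi, psi) = sum 2^{-n} p_n/(1 + p_n) with the Schwartz
# seminorms p_{alpha,beta}(f) = sup |x^beta D^alpha f|; everything is illustrative.
x = np.linspace(-10.0, 10.0, 4001)

def seminorm(f_vals, alpha, beta, dx):
    """p_{alpha,beta}(f) ~ sup |x^beta D^alpha f|, D^alpha by finite differences."""
    d = f_vals
    for _ in range(alpha):
        d = np.gradient(d, dx)
    return np.abs(x**beta * d).max()

def metric(f, g, order=4):
    dx = x[1] - x[0]
    diff = f(x) - g(x)
    total, n = 0.0, 1
    for alpha in range(order):
        for beta in range(order):
            p = seminorm(diff, alpha, beta, dx)
            total += 2.0**(-n) * p / (1.0 + p)
            n += 1
    return total

phi = lambda t: np.exp(-t**2)
psi = lambda t: np.exp(-t**2 / 2)

d_same = metric(phi, phi)   # distance of a function to itself: 0
d_diff = metric(phi, psi)   # positive, and < 1 since each term is < 2^{-n}
```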

Definition 16.17 Let $f(x)\in L^1(\mathbb{R}^n)$; then the Fourier transform $Ff$ of the function $f(x)$ is given by
$$Ff(\xi)=\hat f(\xi)=\frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n}e^{-ix\cdot\xi}f(x)\,dx,$$
where $x=(x_1,\dots,x_n)$, $\xi=(\xi_1,\dots,\xi_n)$, and $x\cdot\xi=\sum_{k=1}^nx_k\xi_k$. Let us abbreviate the normalization factor $\alpha_n=\frac{1}{\sqrt{2\pi}^{\,n}}$. Caution: Wladimirow [Wla72] uses another convention, with $e^{+i\xi\cdot x}$ under the integral and normalization factor $1$ in place of $\alpha_n$.

Note that $Ff(0)=\alpha_n\int_{\mathbb{R}^n}f(x)\,dx$.

Example 16.11 We calculate the Fourier transform $F\varphi$ of the function
$$\varphi(x)=e^{-\|x\|^2/2}=e^{-\frac12x\cdot x},\qquad x\in\mathbb{R}^n.$$
(a) $n=1$. From complex analysis, Lemma 14.30, we know
$$F\bigl(e^{-\frac{x^2}{2}}\bigr)(\xi)=\frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}}e^{-\frac{x^2}{2}}e^{-ix\xi}\,dx=e^{-\frac{\xi^2}{2}}. \tag{16.10}$$
(b) Arbitrary $n$. Thus,
$$F\varphi(\xi)=\hat\varphi(\xi)=\alpha_n\int_{\mathbb{R}^n}e^{-\frac12\sum_{k=1}^nx_k^2}\,e^{-i\sum_{k=1}^nx_k\xi_k}\,dx=\alpha_n\int_{\mathbb{R}^n}\prod_{k=1}^ne^{-\frac12x_k^2-ix_k\xi_k}\,dx_1\cdots dx_n$$
$$=\prod_{k=1}^n\alpha_1\int_{\mathbb{R}}e^{-\frac12x_k^2-ix_k\xi_k}\,dx_k=\prod_{k=1}^ne^{-\frac12\xi_k^2}=e^{-\frac12\xi^2}. \tag{16.10}$$
Hence, the Fourier transform of $e^{-\frac12x^2}$ is the function itself. It follows via scaling $x\mapsto cx$ that
$$F\bigl(e^{-\frac{c^2x^2}{2}}\bigr)(\xi)=\frac{1}{c^n}\,e^{-\frac{\xi^2}{2c^2}}.$$

Theorem 16.10 Let $\varphi,\psi\in S(\mathbb{R}^n)$. Then we have
(i) $F(x^\alpha\varphi(x))=i^{|\alpha|}D^\alpha(F\varphi)$, that is $F\circ Q_k=-P_k\circ F$, $k=1,\dots,n$.


(ii) $F(D^\alpha\varphi(x))(\xi)=i^{|\alpha|}\xi^\alpha(F\varphi)(\xi)$, that is $F\circ P_k=Q_k\circ F$, $k=1,\dots,n$.
(iii) $F(\varphi)\in S(\mathbb{R}^n)$; moreover, $\varphi_n\xrightarrow{S}\varphi$ implies $F\varphi_n\xrightarrow{S}F\varphi$, that is, the Fourier transform $F$ is a continuous linear operator on $S$.
(iv) $F(\varphi*\psi)=\alpha_n^{-1}\,F(\varphi)\,F(\psi)$.
(v) $F(\varphi\cdot\psi)=\alpha_n\,F(\varphi)*F(\psi)$.
(vi)
$$F\bigl(\varphi(Ax+b)\bigr)(\xi)=\frac{1}{|\det A|}\,e^{iA^{-1}b\cdot\xi}\,F\varphi\bigl(A^{-\top}\xi\bigr),$$
where $A$ is a regular $n\times n$ matrix and $A^{-\top}$ denotes the transpose of $A^{-1}$. In particular,
$$F\bigl(\varphi(\lambda x)\bigr)(\xi)=\frac{1}{|\lambda|^n}\,(F\varphi)\Bigl(\frac{\xi}{\lambda}\Bigr),\qquad F\bigl(\varphi(x+b)\bigr)(\xi)=e^{ib\cdot\xi}\,(F\varphi)(\xi).$$
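Two of these statements are easy to check numerically for $n=1$: the Gaussian fixed point of Example 16.11 and rule (ii). The quadrature below uses the text's normalization $\alpha_1=1/\sqrt{2\pi}$; grid sizes and tolerances are arbitrary.

```python
import numpy as np

# Check (1) F(e^{-x^2/2}) = e^{-xi^2/2} and (2) F(phi')(xi) = i xi (F phi)(xi)
# by direct Riemann-sum quadrature with alpha_1 = 1/sqrt(2 pi).
x = np.linspace(-20.0, 20.0, 8001)
dx = x[1] - x[0]
xi = np.linspace(-5.0, 5.0, 101)

def fourier(f_vals):
    # F f(xi) = alpha_1 * integral e^{-i x xi} f(x) dx
    return np.array([np.sum(np.exp(-1j * x * s) * f_vals) * dx
                     for s in xi]) / np.sqrt(2 * np.pi)

phi = np.exp(-x**2 / 2)
Fphi = fourier(phi)

# (1) the Gaussian is a fixed point of F
err_fixed = np.abs(Fphi - np.exp(-xi**2 / 2)).max()

# (2) F(D phi) = i xi F(phi), with D phi by finite differences
Fdphi = fourier(np.gradient(phi, dx))
err_deriv = np.abs(Fdphi - 1j * xi * Fphi).max()
```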

Proof. (i) We carry out the proof in case $\alpha=(1,0,\dots,0)$; the general case follows by iteration.
$$\frac{\partial}{\partial\xi_1}(F\varphi)(\xi)=\frac{\partial}{\partial\xi_1}\,\alpha_n\int_{\mathbb{R}^n}e^{-i\xi\cdot x}\varphi(x)\,dx.$$
Since $\frac{\partial}{\partial\xi_1}\bigl(e^{-i\xi\cdot x}\varphi(x)\bigr)=-ix_1e^{-i\xi\cdot x}\varphi(x)$ tends to $0$ as $\|x\|\to\infty$, we can exchange partial differentiation and integration, see Proposition 12.23. Hence,
$$\frac{\partial}{\partial\xi_1}(F\varphi)(\xi)=-\alpha_n\int_{\mathbb{R}^n}e^{-i\xi\cdot x}\,ix_1\varphi(x)\,dx=F\bigl(-ix_1\varphi(x)\bigr)(\xi).$$
(ii) Without loss of generality we again assume $\alpha=(1,0,\dots,0)$. Using integration by parts, we obtain
$$F\Bigl(\frac{\partial}{\partial x_1}\varphi(x)\Bigr)(\xi)=\alpha_n\int_{\mathbb{R}^n}e^{-i\xi\cdot x}\varphi_{x_1}(x)\,dx=-\alpha_n\int_{\mathbb{R}^n}\Bigl(\frac{\partial}{\partial x_1}e^{-i\xi\cdot x}\Bigr)\varphi(x)\,dx=i\xi_1\,\alpha_n\int_{\mathbb{R}^n}e^{-i\xi\cdot x}\varphi(x)\,dx=i\xi_1\,(F\varphi)(\xi).$$
(iii) By (i) and (ii) we have, for $|\alpha|\le k$ and $|\beta|\le l$,
$$\bigl|\xi^\alpha D^\beta F\varphi\bigr|\le\alpha_n\int_{\mathbb{R}^n}\bigl|D^\alpha\bigl(x^\beta\varphi(x)\bigr)\bigr|\,dx\le c_1\sum_{|\gamma|\le k}\int_{\mathbb{R}^n}(1+\|x\|)^l\,\bigl|D^\gamma\varphi(x)\bigr|\,dx\le c_2\sum_{|\gamma|\le k}\int_{\mathbb{R}^n}\frac{(1+\|x\|)^{l+n+1}}{(1+\|x\|)^{n+1}}\,\bigl|D^\gamma\varphi(x)\bigr|\,dx$$
$$\le c_3\sup_{x\in\mathbb{R}^n}\Bigl((1+\|x\|)^{l+n+1}\sum_{|\gamma|\le k}\bigl|D^\gamma\varphi(x)\bigr|\Bigr)\le c_4\,p_{k,l+n+1}(\varphi).$$

This implies $F\varphi\in S(\mathbb{R}^n)$ and, moreover, that $F:S\to S$ is continuous.
(iv) First note that $L^1(\mathbb{R}^n)$ is an algebra with respect to the convolution product, with $\|f*g\|_{L^1}\le\|f\|_{L^1}\|g\|_{L^1}$. Indeed,
$$\|f*g\|_{L^1}=\int_{\mathbb{R}^n}|f*g|\,dx=\int_{\mathbb{R}^n}\Bigl|\int_{\mathbb{R}^n}f(y)g(x-y)\,dy\Bigr|\,dx\le\int_{\mathbb{R}^n}\Bigl(\int_{\mathbb{R}^n}|f(y)|\,|g(x-y)|\,dx\Bigr)dy\le\|g\|_{L^1}\int_{\mathbb{R}^n}|f(y)|\,dy=\|f\|_{L^1}\|g\|_{L^1}.$$
This in particular shows that $f*g(x)$ is finite a.e. on $\mathbb{R}^n$. By definition and Fubini's theorem we have
$$F(\varphi*\psi)(\xi)=\alpha_n\int_{\mathbb{R}^n}e^{-ix\cdot\xi}\int_{\mathbb{R}^n}\varphi(y)\psi(x-y)\,dy\,dx=\alpha_n^{-1}\int_{\mathbb{R}^n}\Bigl(\alpha_n\int_{\mathbb{R}^n}e^{-i(x-y)\cdot\xi}\psi(x-y)\,dx\Bigr)\alpha_ne^{-iy\cdot\xi}\varphi(y)\,dy\underset{z=x-y}{=}\alpha_n^{-1}\,F\psi(\xi)\,F\varphi(\xi).$$
(v) will be done later, after Proposition 16.11.
(vi) is straightforward using $\langle A^{-1}(y),\xi\rangle=\langle y,A^{-\top}(\xi)\rangle$.
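Statement (iv), $F(\varphi*\psi)=\alpha_1^{-1}F(\varphi)F(\psi)$ in $n=1$, can also be verified numerically for two sample Gaussians; the discretization below is an illustrative sketch, not part of the proof.

```python
import numpy as np

# Check F(phi * psi) = sqrt(2 pi) F(phi) F(psi) (i.e. alpha_1^{-1} = sqrt(2 pi))
# for two Gaussians, using Riemann-sum quadrature on a symmetric odd grid.
x = np.linspace(-30.0, 30.0, 6001)
dx = x[1] - x[0]
xi = np.linspace(-3.0, 3.0, 61)

def fourier(f_vals):
    return np.array([np.sum(np.exp(-1j * x * s) * f_vals) * dx
                     for s in xi]) / np.sqrt(2 * np.pi)

phi = np.exp(-x**2 / 2)
psi = np.exp(-x**2)

# with an odd, symmetric grid, mode="same" aligns (phi * psi) with the x grid
conv = np.convolve(phi, psi, mode="same") * dx

lhs = fourier(conv)
rhs = np.sqrt(2 * np.pi) * fourier(phi) * fourier(psi)

err = np.abs(lhs - rhs).max()
```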

Remark 16.10 The operator $G$, which is also defined on $L^1(\mathbb{R}^n)$, has properties similar to those of $F$:
$$G\varphi(\xi)=\check\varphi(\xi)=\alpha_n\int_{\mathbb{R}^n}e^{+ix\cdot\xi}\varphi(x)\,dx.$$
Put $\varphi_-(x):=\varphi(-x)$. Then
$$G\varphi=F\varphi_-=\overline{F(\overline\varphi)}\qquad\text{and}\qquad F\varphi=G\varphi_-=\overline{G(\overline\varphi)}.$$
It is easy to see that (iv) holds for $G$, too, that is,
$$G(\varphi*\psi)=\alpha_n^{-1}\,G(\varphi)\,G(\psi).$$

Proposition 16.11 (Fourier Inversion Formula) The Fourier transformation is a one-to-one mapping of $S(\mathbb{R}^n)$ onto $S(\mathbb{R}^n)$. The inverse Fourier transformation is given by $G$:
$$F(G\varphi)=G(F\varphi)=\varphi,\qquad\varphi\in S(\mathbb{R}^n).$$
Proof. Let $\psi(x)=e^{-\frac{x\cdot x}{2}}$ and $\Psi(x)=\psi(\varepsilon x)=e^{-\frac{\varepsilon^2x^2}{2}}$. Then $\Psi_\varepsilon(x):=F\Psi(x)=\frac{1}{\varepsilon^n}\,\hat\psi\bigl(\frac x\varepsilon\bigr)$. We have
$$\alpha_n\int_{\mathbb{R}^n}\Psi_\varepsilon(x)\,dx=\alpha_n\int_{\mathbb{R}^n}\frac{1}{\varepsilon^n}\,\hat\psi\Bigl(\frac x\varepsilon\Bigr)dx=\alpha_n\int_{\mathbb{R}^n}\hat\psi(x)\,dx=\psi(0)=1.$$


Further,
$$\frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n}\Psi_\varepsilon(x)\varphi(x)\,dx=\frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n}\psi(x)\varphi(\varepsilon x)\,dx\xrightarrow{\varepsilon\to0}\frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n}\psi(x)\varphi(0)\,dx=\varphi(0). \tag{16.11}$$
In other words, $\alpha_n\Psi_\varepsilon(x)$ is a $\delta$-sequence. We compute $G(F\varphi\,\Psi)(x)$. Using Fubini's theorem we have
$$G(F\varphi\,\Psi)(x)=\frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n}(F\varphi)(\xi)\,\Psi(\xi)\,e^{ix\cdot\xi}\,d\xi=\frac{1}{(2\pi)^n}\int_{\mathbb{R}^n}\!\int_{\mathbb{R}^n}\Psi(\xi)\,e^{ix\cdot\xi}e^{-i\xi\cdot y}\varphi(y)\,dy\,d\xi$$
$$=\frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n}\varphi(y)\Bigl(\frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n}e^{-i(y-x)\cdot\xi}\Psi(\xi)\,d\xi\Bigr)dy=\frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n}\varphi(y)\,(F\Psi)(y-x)\,dy$$
$$\underset{z:=y-x}{=}\frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n}F\Psi(z)\,\varphi(z+x)\,dz=\frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n}\Psi_\varepsilon(z)\,\varphi(z+x)\,dz\xrightarrow{\varepsilon\to0}\varphi(x),$$
see (16.11). On the other hand, by Lebesgue's dominated convergence theorem,
$$\lim_{\varepsilon\to0}G(F\varphi\cdot\Psi)(x)=\alpha_n\int_{\mathbb{R}^n}(F\varphi)(\xi)\,\psi(0)\,e^{ix\cdot\xi}\,d\xi=\alpha_n\int_{\mathbb{R}^n}(F\varphi)(\xi)\,e^{ix\cdot\xi}\,d\xi=G(F\varphi)(x).$$
This proves the first part. The second part, $F(G\varphi)=\varphi$, follows from $G(\varphi)=\overline{F(\overline\varphi)}$, $F(\varphi)=\overline{G(\overline\varphi)}$, and the first part.

We are now going to complete the proof of Theorem 16.10 (v). For this, let $\varphi=G\varphi_1$ and $\psi=G\psi_1$ with $\varphi_1,\psi_1\in S$. By (iv) (which holds for $G$ as well) we have
$$F(\varphi\cdot\psi)=F\bigl(G(\varphi_1)\,G(\psi_1)\bigr)=F\bigl(\alpha_n\,G(\varphi_1*\psi_1)\bigr)=\alpha_n\,\varphi_1*\psi_1=\alpha_n\,F\varphi*F\psi.$$

Proposition 16.12 (Fourier–Plancherel formula) For $\varphi,\psi\in S(\mathbb{R}^n)$ we have
$$\int_{\mathbb{R}^n}\varphi\,\overline\psi\,dx=\int_{\mathbb{R}^n}F(\varphi)\,\overline{F(\psi)}\,dx;$$
in particular, $\|\varphi\|_{L^2(\mathbb{R}^n)}=\|F(\varphi)\|_{L^2(\mathbb{R}^n)}$.
Proof. First note that
$$F(\overline\psi)(-y)=\alpha_n\int_{\mathbb{R}^n}e^{ix\cdot y}\,\overline{\psi(x)}\,dx=\overline{\alpha_n\int_{\mathbb{R}^n}e^{-ix\cdot y}\psi(x)\,dx}=\overline{F(\psi)(y)}. \tag{16.12}$$
By Theorem 16.10 (v),
$$\int_{\mathbb{R}^n}\varphi(x)\,\overline{\psi(x)}\,dx=\alpha_n^{-1}F\bigl(\varphi\,\overline\psi\bigr)(0)=\bigl(F(\varphi)*F(\overline\psi)\bigr)(0)=\int_{\mathbb{R}^n}F(\varphi)(y)\,F(\overline\psi)(0-y)\,dy\underset{(16.12)}{=}\int_{\mathbb{R}^n}F(\varphi)(y)\,\overline{F(\psi)(y)}\,dy.$$
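The Plancherel identity survives discretization: with an FFT rescaled to the unitary $\alpha_1=1/\sqrt{2\pi}$ convention, the discrete Parseval theorem reproduces $\|\varphi\|_{L^2}=\|F\varphi\|_{L^2}$. The sample function below is an arbitrary Schwartz-class choice.

```python
import numpy as np

# Discrete check of ||phi||_{L^2} = ||F phi||_{L^2} using an FFT scaled to the
# alpha_1 = 1/sqrt(2 pi) convention of the text.
N = 4096
x = np.linspace(-40.0, 40.0, N, endpoint=False)
dx = x[1] - x[0]

phi = np.exp(-x**2 / 2) * (1 + np.sin(3 * x))    # some rapidly decreasing sample

# samples of F phi at xi_k = 2 pi k/(N dx), up to a phase from the grid offset
Fphi = np.fft.fft(phi) * dx / np.sqrt(2 * np.pi)
dxi = 2 * np.pi / (N * dx)

norm_x = np.sum(np.abs(phi)**2) * dx
norm_xi = np.sum(np.abs(Fphi)**2) * dxi

err = abs(norm_x - norm_xi)
```

The phases of the FFT samples drop out of the $L^2$ norm, so the grid offset is harmless here.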

Remark 16.11 $S(\mathbb{R}^n)\subset L^2(\mathbb{R}^n)$ is dense. Thus, the Fourier transformation has a unique extension to a unitary operator $F:L^2(\mathbb{R}^n)\to L^2(\mathbb{R}^n)$. (Given $f\in L^2$, choose a sequence $\varphi_n\in S$ converging to $f$ in the $L^2$-norm. Since $F$ preserves the $L^2$-norm, $\|F\varphi_n-F\varphi_m\|=\|\varphi_n-\varphi_m\|$; as $(\varphi_n)$ is a Cauchy sequence in $L^2$, $(F\varphi_n)$ is a Cauchy sequence, too; hence it converges to some $g\in L^2$. We define $F(f)=g$.)

16.4.2 The Space $S'(\mathbb{R}^n)$

Definition 16.18 A tempered distribution (or slowly increasing distribution) is a continuous linear functional $T$ on the space $S(\mathbb{R}^n)$. The set of all tempered distributions is denoted by $S'(\mathbb{R}^n)$.
A linear functional $T$ on $S(\mathbb{R}^n)$ is continuous if for all sequences $\varphi_n\in S$ with $\varphi_n\xrightarrow{S}0$ we have $\langle T,\varphi_n\rangle\to0$.

For $\varphi_n\in D$ with $\varphi_n\xrightarrow{D}0$ it follows that $\varphi_n\xrightarrow{S}0$. So every continuous linear functional on $S$ (restricted to $D$) is continuous on $D$. Moreover, the mapping $\iota:S'\to D'$, $\iota(T)=T{\restriction}_D$, is injective: $T(\varphi)=0$ for all $\varphi\in D$ implies $T(\psi)=0$ for all $\psi\in S$, which follows since $D\subset S$ is dense and $T$ is continuous. Using the injection $\iota$, we can identify $S'$ with a subspace of $D'$, $S'\subseteq D'$. That is, every tempered distribution "is" a distribution.

Lemma 16.13 (Characterization of $S'$) A linear functional $T$ defined on $S(\mathbb{R}^n)$ belongs to $S'(\mathbb{R}^n)$ if and only if there exist non-negative integers $k$ and $l$ and a positive number $C$ such that for all $\varphi\in S(\mathbb{R}^n)$
$$|T(\varphi)|\le C\,p_{kl}(\varphi),\qquad\text{where }p_{kl}(\varphi)=\sum_{|\alpha|\le k,\,|\beta|\le l}p_{\alpha\beta}(\varphi).$$

Remarks 16.12 (a) With the usual identification $f\leftrightarrow T_f$ of functions and regular distributions,
$$L^1(\mathbb{R}^n)\subseteq S'(\mathbb{R}^n),\qquad L^2(\mathbb{R}^n)\subseteq S'(\mathbb{R}^n).$$
(b) $L^1_{\mathrm{loc}}\not\subset S'$: for example, $T_f\notin S'(\mathbb{R})$ for $f(x)=e^{x^2}$, since $T_f(\varphi)$ is not well-defined for all Schwartz functions $\varphi$; for example, $T_{e^{x^2}}\bigl(e^{-x^2}\bigr)=+\infty$.
(c) If $T\in D'(\mathbb{R}^n)$ and $\operatorname{supp}T$ is compact, then $T\in S'(\mathbb{R}^n)$.
(d) Let $f(x)$ be measurable. If there exist $C>0$ and $m\in\mathbb{N}$ such that
$$|f(x)|\le C(1+\|x\|^2)^m\quad\text{a.e. }x\in\mathbb{R}^n,$$
then $T_f\in S'$. Indeed, the above estimate and Remark 16.9 imply
$$|\langle T_f,\varphi\rangle|=\Bigl|\int_{\mathbb{R}^n}f(x)\varphi(x)\,dx\Bigr|\le C\int_{\mathbb{R}^n}(1+\|x\|^2)^m(1+\|x\|^2)^n\,|\varphi(x)|\,\frac{dx}{(1+\|x\|^2)^n}\le C\,p_{0,2m+2n}(\varphi)\int_{\mathbb{R}^n}\frac{dx}{(1+\|x\|^2)^n}<\infty.$$
By Lemma 16.13, $f$ is a tempered regular distribution, $f(x)\in S'$.


Operations on S′

The operations are defined in the same way as in the case of $D'$. One has to show that the result is again in the (smaller) space $S'$. If $T\in S'$ then:
(a) $D^\alpha T\in S'$ for all multi-indices $\alpha$.
(b) $f\cdot T\in S'$ for all $f\in C^\infty(\mathbb{R}^n)$ such that $D^\alpha f$ grows at most polynomially at infinity for all multi-indices $\alpha$ (i.e. for all multi-indices $\alpha$ there exist $C_\alpha>0$ and $k_\alpha$ such that $|D^\alpha f(x)|\le C_\alpha(1+\|x\|)^{k_\alpha}$).
(c) $T(Ax+b)\in S'$ for any regular real $n\times n$ matrix $A$ and $b\in\mathbb{R}^n$.
(d) $T\in S'(\mathbb{R}^n)$ and $S\in S'(\mathbb{R}^m)$ implies $T\otimes S\in S'(\mathbb{R}^{n+m})$.
(e) Let $T\in S'(\mathbb{R}^n)$, $\psi\in S(\mathbb{R}^n)$. Define the convolution product
$$\langle\psi*T,\varphi\rangle=\langle1(x)\otimes T(y),\,\psi(x)\varphi(x+y)\rangle=\Bigl\langle T,\int_{\mathbb{R}^n}\psi(x)\varphi(x+y)\,dx\Bigr\rangle,\qquad\varphi\in S(\mathbb{R}^n).$$
Note that this definition coincides with the more general Definition 16.12 since
$$\lim_{k\to\infty}\psi(x)\varphi(x+y)\eta_k(x,y)=\psi(x)\varphi(x+y)\in S(\mathbb{R}^{2n}).$$

16.4.3 Fourier Transformation in $S'(\mathbb{R}^n)$

We follow our guiding principle to define the Fourier transform of a distribution $T\in S'$: first consider the case of a regular tempered distribution. We want to define $F(T_f):=T_{Ff}$. Suppose that $f(x)\in L^1(\mathbb{R}^n)$ is integrable. Then its Fourier transform $Ff$ exists and is a bounded continuous function:
$$|Ff(\xi)|\le\alpha_n\int_{\mathbb{R}^n}\bigl|e^{-i\xi\cdot x}f(x)\bigr|\,dx=\alpha_n\int_{\mathbb{R}^n}|f(x)|\,dx=\alpha_n\|f\|_{L^1}<\infty.$$
By Remark 16.12 (d), $Ff$ defines a distribution in $S'$. By Fubini's theorem,
$$\langle T_{Ff},\varphi\rangle=\int_{\mathbb{R}^n}Ff(\xi)\varphi(\xi)\,d\xi=\alpha_n\iint_{\mathbb{R}^{2n}}f(x)\,e^{-i\xi\cdot x}\varphi(\xi)\,d\xi\,dx=\int_{\mathbb{R}^n}f(x)\,F\varphi(x)\,dx=\langle T_f,F\varphi\rangle.$$
Hence $\langle T_{Ff},\varphi\rangle=\langle T_f,F\varphi\rangle$. We take this equation as the definition of the Fourier transformation of a distribution $T\in S'$.

Definition 16.19 For $T\in S'$ and $\varphi\in S$ define
$$\langle FT,\varphi\rangle=\langle T,F\varphi\rangle. \tag{16.13}$$
We call $FT$ the Fourier transform of the distribution $T$.

Since $F\varphi\in S(\mathbb{R}^n)$, $FT$ is well-defined on $S$. Further, $F$ is a linear operator and $T$ a linear functional; hence $FT$ is again a linear functional. We show that $FT$ is a continuous linear functional on $S$. For this, let $\varphi_n\xrightarrow{S}\varphi$ in $S$. By Theorem 16.10, $F\varphi_n\xrightarrow{S}F\varphi$. Since $T$ is continuous,
$$\langle FT,\varphi_n\rangle=\langle T,F\varphi_n\rangle\longrightarrow\langle T,F\varphi\rangle=\langle FT,\varphi\rangle,$$
which proves continuity.

Lemma 16.14 The Fourier transformation $F:S'(\mathbb{R}^n)\to S'(\mathbb{R}^n)$ is a continuous bijection.
Proof. (a) We show continuity (see also homework 53.3). Suppose that $T_n\to T$ in $S'$; that is, for all $\varphi\in S$, $\langle T_n,\varphi\rangle\to\langle T,\varphi\rangle$. Hence,
$$\langle FT_n,\varphi\rangle=\langle T_n,F\varphi\rangle\xrightarrow{n\to\infty}\langle T,F\varphi\rangle=\langle FT,\varphi\rangle,$$
which proves the assertion.
(b) We define a second transformation $G:S'\to S'$ via
$$\langle GT,\varphi\rangle:=\langle T,G\varphi\rangle$$
and show that $F\circ G=G\circ F=\operatorname{id}$ on $S'$. Taking into account Proposition 16.11 we have
$$\langle G(FT),\varphi\rangle=\langle FT,G\varphi\rangle=\langle T,F(G\varphi)\rangle=\langle T,\varphi\rangle;$$
thus $G\circ F=\operatorname{id}$. The proof of the direction $F\circ G=\operatorname{id}$ is similar; hence $F$ is a bijection.

Remark 16.13 All the properties of the Fourier transformation stated in Theorem 16.10 (i), (ii), (iii), (iv), and (v) remain valid in the case of $S'$. In particular, $F(x^\alpha T)=i^{|\alpha|}D^\alpha(FT)$. Indeed, for $\varphi\in S(\mathbb{R}^n)$, by Theorem 16.10 (ii),
$$F(x^\alpha T)(\varphi)=\langle x^\alpha T,F\varphi\rangle=\langle T,x^\alpha F\varphi\rangle=\langle T,(-i)^{|\alpha|}F(D^\alpha\varphi)\rangle=(-1)^{|\alpha|}(-i)^{|\alpha|}\langle D^\alpha(FT),\varphi\rangle=\langle i^{|\alpha|}D^\alpha(FT),\varphi\rangle.$$

Example 16.12 (a) Let $a\in\mathbb{R}^n$. We compute $F\delta_a$. For $\varphi\in S(\mathbb{R}^n)$,
$$F\delta_a(\varphi)=\delta_a(F\varphi)=(F\varphi)(a)=\alpha_n\int_{\mathbb{R}^n}e^{-ix\cdot a}\varphi(x)\,dx=T_{\alpha_ne^{-ix\cdot a}}(\varphi).$$
Hence $F\delta_a$ is the regular distribution corresponding to $f(x)=\alpha_ne^{-ix\cdot a}$. In particular, $F(\delta)=\alpha_nT_1$ is a constant function: $F(\delta)=G(\delta)=\frac{1}{\sqrt{2\pi}^{\,n}}\,T_1$. Moreover, $F(T_1)=G(T_1)=\alpha_n^{-1}\delta$.
(b) $n=1$, $b>0$:
$$F\bigl(H(b-|x|)\bigr)=\alpha_1\int_{\mathbb{R}}e^{-ix\xi}H(b-|x|)\,dx=\alpha_1\int_{-b}^{b}e^{-ix\xi}\,dx=\frac{2}{\sqrt{2\pi}}\,\frac{\sin(b\xi)}{\xi}.$$
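Part (b) can be confirmed by direct quadrature of the defining integral; the value of $b$ and the evaluation points $\xi$ below are arbitrary.

```python
import numpy as np

# Check F(H(b - |x|))(xi) = 2 sin(b xi)/(sqrt(2 pi) xi) by trapezoid quadrature.
b = 2.0
x = np.linspace(-b, b, 200001)       # the integrand vanishes outside [-b, b]
dx = x[1] - x[0]
xi = np.array([0.5, 1.0, 2.5, 4.0])

def ft_indicator(s):
    vals = np.exp(-1j * x * s)
    integral = (vals.sum() - 0.5 * (vals[0] + vals[-1])) * dx   # trapezoid rule
    return integral / np.sqrt(2 * np.pi)

numeric = np.array([ft_indicator(s) for s in xi])
exact = 2 * np.sin(b * xi) / (np.sqrt(2 * np.pi) * xi)

err = np.abs(numeric - exact).max()
```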


(c) The single-layer distribution. Suppose that $S$ is a compact, regular, piecewise differentiable, non-self-intersecting surface in $\mathbb{R}^3$ and $\varrho(x)\in L^1_{\mathrm{loc}}(\mathbb{R}^3)$ is a function on $S$ (a density function or distribution in the physical sense). We define the distribution $\varrho\,\delta_S$ by the scalar surface integral
$$\langle\varrho\,\delta_S,\varphi\rangle=\iint_S\varrho(x)\varphi(x)\,dS.$$
The support of $\varrho\,\delta_S$ is $S$, a set of measure zero with respect to the $3$-dimensional Lebesgue measure. Hence $\varrho\,\delta_S$ is a singular distribution.
Similarly, one defines the double-layer distribution (which comes from dipoles) by
$$\Bigl\langle-\frac{\partial}{\partial\vec n}(\varrho\,\delta_S),\varphi\Bigr\rangle=\iint_S\varrho(x)\,\frac{\partial\varphi(x)}{\partial\vec n}\,dS,$$
where $\vec n$ denotes the unit normal vector to the surface.

We compute the Fourier transform of the single layer, $F(\varrho\,\delta_S)$, in the case of a sphere of radius $r$, $S_r=\{x\in\mathbb{R}^3\mid\|x\|=r\}$, and density $\varrho=1$. By Fubini's theorem,
$$\langle F\delta_{S_r},\varphi\rangle=\langle\delta_{S_r},F\varphi\rangle=\frac{1}{\sqrt{2\pi}^{\,3}}\iint_{S_r}\Bigl(\int_{\mathbb{R}^3}e^{-ix\cdot\xi}\varphi(x)\,dx\Bigr)dS_\xi=\frac{1}{\sqrt{2\pi}^{\,3}}\int_{\mathbb{R}^3}\Bigl(\iint_{S_r}\bigl(\cos(x\cdot\xi)-i\underbrace{\sin(x\cdot\xi)}_{\text{odd, contributes }0}\bigr)dS_\xi\Bigr)\varphi(x)\,dx.$$
Using spherical coordinates on $S_r$, where $x$ is fixed to be the $z$-axis and $\vartheta$ is the angle between $x$ and $\xi\in S_r$, we have $dS=r^2\sin\vartheta\,d\varphi\,d\vartheta$ and $x\cdot\xi=r\|x\|\cos\vartheta$. Hence the inner (surface) integral reads
$$\int_0^{2\pi}\!\!\int_0^{\pi}\cos\bigl(\|x\|r\cos\vartheta\bigr)\,r^2\sin\vartheta\,d\vartheta\,d\varphi\underset{s=\|x\|r\cos\vartheta}{=}2\pi\,\frac{r}{\|x\|}\int_{-\|x\|r}^{\|x\|r}\cos s\,ds=4\pi\,\frac{r}{\|x\|}\,\sin(\|x\|r),$$
using $ds=-\|x\|r\sin\vartheta\,d\vartheta$. Hence,
$$\langle F\delta_{S_r},\varphi\rangle=\frac{2r}{\sqrt{2\pi}}\int_{\mathbb{R}^3}\varphi(x)\,\frac{\sin(r\|x\|)}{\|x\|}\,dx;$$
the Fourier transform of $\delta_{S_r}$ is the regular distribution
$$F\delta_{S_r}(x)=\frac{2r}{\sqrt{2\pi}}\,\frac{\sin(r\|x\|)}{\|x\|}.$$

(d) The resolvent of the Laplacian $-\Delta$. Consider the Hilbert space $H=L^2(\mathbb{R}^n)$ and its dense subspace $S(\mathbb{R}^n)$; for $\varphi\in S$ the Laplacian $-\Delta\varphi$ is defined. Recall that the resolvent of a linear operator $A$ at $\lambda$ is the bounded linear operator on $H$ given by $R_\lambda(A)=(A-\lambda I)^{-1}$. Given $f\in H$ we are looking for $u\in H$ with $R_\lambda(A)f=u$. This is equivalent to solving $f=(A-\lambda I)(u)$ for $u$. In case $A=-\Delta$ we can apply the Fourier transformation to solve this equation. By Theorem 16.10 (ii),
$$-\Delta u-\lambda u=f,\qquad F\Bigl(-\sum_{k=1}^n\frac{\partial^2u}{\partial x_k^2}\Bigr)-\lambda Fu=Ff,$$
$$\sum_{k=1}^n\xi_k^2(Fu)(\xi)-\lambda Fu(\xi)=(Fu)(\xi)\,(\xi^2-\lambda)=Ff(\xi),$$
$$Fu(\xi)=\frac{Ff(\xi)}{\xi^2-\lambda},\qquad u(x)=G\Bigl(\frac{1}{\xi^2-\lambda}\,Ff(\xi)\Bigr)(x).$$
Hence,
$$R_\lambda(-\Delta)=F^{-1}\circ\frac{1}{\xi^2-\lambda}\circ F,$$
where in the middle is the multiplication operator by the function $1/(\xi^2-\lambda)$. One can see that this operator is bounded on $H$ if and only if $\lambda\in\mathbb{C}\setminus\mathbb{R}_+$, such that the spectrum of $-\Delta$ satisfies $\sigma(-\Delta)\subset\mathbb{R}_+$.
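The representation $R_\lambda(-\Delta)=F^{-1}\circ\frac{1}{\xi^2-\lambda}\circ F$ can be imitated with a periodic FFT on a large interval for $n=1$ and $\lambda=-1\in\mathbb{C}\setminus\mathbb{R}_+$; this is only a discretized illustration of the formula, not the operator on all of $L^2(\mathbb{R})$.

```python
import numpy as np

# Solve (-u'' - lam*u) = f for lam = -1 via the Fourier-side multiplier
# 1/(xi^2 - lam), realized with a periodic FFT on a large interval.
N = 2048
L = 80.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = x[1] - x[0]
xi = 2 * np.pi * np.fft.fftfreq(N, d=dx)

lam = -1.0
f = np.exp(-x**2)                      # rapidly decreasing right-hand side

u = np.real(np.fft.ifft(np.fft.fft(f) / (xi**2 - lam)))

# check (-u'' - lam*u) = f with spectral differentiation
upp = np.real(np.fft.ifft(-(xi**2) * np.fft.fft(u)))
residual = np.abs(-upp - lam * u - f).max()
```

In the discrete Fourier basis the identity $(\xi^2-\lambda)\hat u=\hat f$ holds exactly, so the residual is at machine-precision level.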

16.5 Appendix—More about Convolutions

Since the following proposition is used in several places, we make the statement explicit.

Proposition 16.15 Let $T(x,t)$ and $S(x,t)$ be distributions in $D'(\mathbb{R}^{n+1})$ with $\operatorname{supp}T\subset\mathbb{R}^n\times\mathbb{R}_+$ and $\operatorname{supp}S\subset\Gamma_+(0,0)$. Here $\Gamma_+(0,0)=\{(x,t)\in\mathbb{R}^{n+1}\mid\|x\|\le at\}$ denotes the forward light cone at the origin.
Then the convolution $T*S$ exists in $D'(\mathbb{R}^{n+1})$ and can be written as
$$\langle T*S,\varphi\rangle=\bigl\langle T(x,t)\otimes S(y,s),\ \eta(t)\eta(s)\eta(as-\|y\|)\,\varphi(x+y,t+s)\bigr\rangle, \tag{16.14}$$
$\varphi\in D(\mathbb{R}^{n+1})$, where $\eta\in D(\mathbb{R})$ with $\eta(t)=1$ for $t>-\varepsilon$, and $\varepsilon>0$ is any fixed positive number. The convolution $(T*S)(x,t)$ vanishes for $t<0$ and is continuous in both components, that is:
(a) If $T_k\to T$ in $D'(\mathbb{R}^{n+1})$ and $\operatorname{supp}T_k,\operatorname{supp}T\subseteq\mathbb{R}^n\times\mathbb{R}_+$, then $T_k*S\to T*S$ in $D'(\mathbb{R}^{n+1})$.
(b) If $S_k\to S$ in $D'(\mathbb{R}^{n+1})$ and $\operatorname{supp}S_k,\operatorname{supp}S\subseteq\Gamma_+(0,0)$, then $T*S_k\to T*S$ in $D'(\mathbb{R}^{n+1})$.

Proof. Since $\eta\in D(\mathbb{R})$, there exists $\delta>0$ with $\eta(x)=0$ for $x<-\delta$. Let $\varphi(x,t)\in D(\mathbb{R}^{n+1})$ with $\operatorname{supp}\varphi\subseteq U_R(0)$ for some $R>0$. Let $\eta_K(x,t,y,s)$, $K\to\mathbb{R}^{2n+2}$, be a sequence in $D(\mathbb{R}^{2n+2})$ converging to $1$ in $\mathbb{R}^{2n+2}$, see before Definition 16.12. For sufficiently large $K$ we then have
$$\psi_K:=\eta(s)\eta(t)\eta(as-\|y\|)\,\eta_K(x,t,y,s)\,\varphi(x+y,t+s)=\eta(s)\eta(t)\eta(as-\|y\|)\,\varphi(x+y,t+s)=:\psi. \tag{16.15}$$
To prove this it suffices to show that $\psi\in D(\mathbb{R}^{2n+2})$. Indeed, $\psi$ is arbitrarily often differentiable and its support is contained in
$$\{(x,t,y,s)\mid s,t\ge-\delta,\ as-\|y\|\ge-\delta,\ \|x+y\|^2+|t+s|^2\le R^2\},$$
which is a bounded set. Since $\eta(t)=1$ in a neighborhood of $\operatorname{supp}T$ and $\eta(s)\eta(as-\|y\|)=1$ in a neighborhood of $\operatorname{supp}S$, we have $T(x,t)=\eta(t)T(x,t)$ and $S(y,s)=\eta(s)\eta(as-\|y\|)S(y,s)$. Using (16.15) we obtain, for $\varphi\in D(\mathbb{R}^{n+1})$,
$$\langle T*S,\varphi\rangle=\lim_{K\to\mathbb{R}^{2n+2}}\langle T(x,t)\otimes S(y,s),\ \eta_K(x,t,y,s)\,\varphi(x+y,t+s)\rangle=\lim_{K\to\mathbb{R}^{2n+2}}\langle T(x,t)\otimes S(y,s),\,\psi_K\rangle.$$
This proves the first assertion.
We now prove that the right-hand side of (16.14) defines a continuous linear functional on $D(\mathbb{R}^{n+1})$. Let $\varphi_k\xrightarrow{D}\varphi$ as $k\to\infty$. Then
$$\psi_k:=\eta(t)\eta(s)\eta(as-\|y\|)\,\varphi_k(x+y,t+s)\xrightarrow{D}\psi$$
as $k\to\infty$. Hence,
$$\langle T*S,\varphi_k\rangle=\langle T(x,t)\otimes S(y,s),\psi_k\rangle\to\langle T(x,t)\otimes S(y,s),\psi\rangle=\langle T*S,\varphi\rangle,\qquad k\to\infty,$$
and $T*S$ is continuous.
We show that $T*S$ vanishes for $t<0$. For this, let $\varphi\in D(\mathbb{R}^{n+1})$ with $\operatorname{supp}\varphi\subseteq\mathbb{R}^n\times(-\infty,-\delta_1]$. Choosing $\delta<\delta_1/2$ one has
$$\eta(t)\eta(s)\eta(as-\|y\|)\,\varphi(x+y,t+s)=0,$$
such that $\langle T*S,\varphi\rangle=0$. Continuity of the convolution product follows from the continuity of the tensor product.

Chapter 17

PDE II — The Equations of Mathematical Physics

In this chapter we study in detail the Laplace equation, the wave equation, and the heat equation. First, for all space dimensions $n$ we determine the fundamental solutions of the corresponding differential operators; then we consider initial value problems and initial boundary value problems. We also study eigenvalue problems for the Laplace equation.
Recall Green's identities, see Proposition 10.2:
$$\iiint_G\bigl(u\,\Delta v-v\,\Delta u\bigr)\,dx\,dy\,dz=\iint_{\partial G}\Bigl(u\,\frac{\partial v}{\partial\vec n}-v\,\frac{\partial u}{\partial\vec n}\Bigr)dS,$$
$$\iiint_G\Delta u\,dx\,dy\,dz=\iint_{\partial G}\frac{\partial u}{\partial\vec n}\,dS. \tag{17.1}$$

We also need that for $x\in\mathbb{R}^n\setminus\{0\}$,
$$\Delta\Bigl(\frac{1}{\|x\|^{n-2}}\Bigr)=0,\quad n\ge3;\qquad\Delta\bigl(\log\|x\|\bigr)=0,\quad n=2,$$
see Example 7.5.

17.1 Fundamental Solutions

17.1.1 The Laplace Equation

Let us denote by $\omega_n$ the measure of the unit sphere $S^{n-1}$ in $\mathbb{R}^n$; thus $\omega_2=2\pi$, $\omega_3=4\pi$.

Theorem 17.1 The function

$$E_n(x)=\begin{cases}\dfrac{1}{2\pi}\,\log\|x\|,&n=2,\\[2mm]-\dfrac{1}{(n-2)\,\omega_n}\,\dfrac{1}{\|x\|^{n-2}},&n\ge3,\end{cases}$$
is locally integrable; the corresponding regular distribution $E_n$ satisfies the equation $\Delta E_n=\delta$ and hence is a fundamental solution of the Laplacian in $\mathbb{R}^n$.


Proof. Step 1. Example 7.5 shows that $\Delta(E_n(x))=0$ if $x\neq0$.
Step 2. By homework 50.4, $\log\|x\|$ is in $L^1_{\mathrm{loc}}(\mathbb{R}^2)$, and $1/\|x\|^\alpha\in L^1_{\mathrm{loc}}(\mathbb{R}^n)$ if and only if $\alpha<n$. Hence $E_n$, $n\ge2$, defines a regular distribution in $\mathbb{R}^n$.
Let $n\ge3$ and $\varphi\in D(\mathbb{R}^n)$. Using that $1/\|x\|^{n-2}$ is locally integrable and Example 12.6 (a), $\psi\in D$ implies
$$\int_{\mathbb{R}^n}\frac{\psi(x)}{r^{n-2}}\,dx=\lim_{\varepsilon\to0}\int_{\|x\|\ge\varepsilon}\frac{\psi(x)}{r^{n-2}}\,dx.$$
Abbreviating $\beta_n=-1/((n-2)\omega_n)$ we have
$$\langle\Delta E_n,\varphi\rangle=\beta_n\int_{\mathbb{R}^n}\frac{\Delta\varphi(x)\,dx}{\|x\|^{n-2}}=\beta_n\lim_{\varepsilon\to0}\int_{\|x\|\ge\varepsilon}\frac{\Delta\varphi(x)\,dx}{\|x\|^{n-2}}.$$
We compute the integral on the right using $v(x)=\frac{1}{r^{n-2}}$, which is harmonic for $\|x\|\ge\varepsilon$, $\Delta v=0$. Applying Green's identity (the outer unit normal of the region $\|x\|\ge\varepsilon$ on the sphere $\|x\|=\varepsilon$ is $\vec n=-\frac{x}{\|x\|}$), we have
$$\beta_n\int_{\|x\|\ge\varepsilon}\frac{\Delta\varphi(x)\,dx}{\|x\|^{n-2}}=\beta_n\int_{\|x\|=\varepsilon}\Bigl(\frac{1}{r^{n-2}}\,\frac{\partial\varphi(x)}{\partial\vec n}-\varphi(x)\,\frac{\partial}{\partial\vec n}\frac{1}{r^{n-2}}\Bigr)dS.$$
Let us consider the first integral as $\varepsilon\to0$. Note that $\varphi$ and $\operatorname{grad}\varphi$ are both bounded by a constant $C$ since $\varphi$ is a test function. We make use of the estimate
$$\Bigl|\beta_n\int_{\|x\|=\varepsilon}\frac{1}{r^{n-2}}\,\frac{\partial\varphi(x)}{\partial\vec n}\,dS\Bigr|\le\frac{c}{\varepsilon^{n-2}}\int_{\|x\|=\varepsilon}\Bigl|\frac{\partial\varphi(x)}{\partial\vec n}\Bigr|\,dS\le\frac{c}{\varepsilon^{n-2}}\int_{\|x\|=\varepsilon}dS=\frac{c\,\omega_n\varepsilon^{n-1}}{\varepsilon^{n-2}}=c'\varepsilon,$$
which tends to $0$ as $\varepsilon\to0$.
Hence we are left with computing the second integral. Since $\vec n=-\frac{x}{\|x\|}$, we have $\frac{\partial}{\partial\vec n}\bigl(\frac{1}{r^{n-2}}\bigr)=(n-2)\,\frac{1}{r^{n-1}}$, and therefore
$$\text{second integral}=-\beta_n\int_{\|x\|=\varepsilon}\varphi(x)\,\frac{n-2}{r^{n-1}}\,dS=\frac{1}{\omega_n\varepsilon^{n-1}}\int_{\|x\|=\varepsilon}\varphi(x)\,dS.$$
Note that $\omega_n\varepsilon^{n-1}$ is exactly the $(n-1)$-dimensional measure of the sphere of radius $\varepsilon$, so the integral is the mean value of $\varphi$ over the sphere of radius $\varepsilon$. Since $\varphi$ is continuous at $0$, the mean value tends to $\varphi(0)$. This proves the assertion in case $n\ge3$. The proof in case $n=2$ is quite analogous.

Corollary 17.2 Suppose that $f(x)$ is a continuous function with compact support. Then $S=E*f$ is a regular distribution and we have $\Delta S=f$ in $D'$. In particular,
$$S(x)=\frac{1}{2\pi}\iint_{\mathbb{R}^2}\log\|x-y\|\,f(y)\,dy,\qquad n=2;$$
$$S(x)=-\frac{1}{4\pi}\iiint_{\mathbb{R}^3}\frac{f(y)}{\|x-y\|}\,dy,\qquad n=3. \tag{17.2}$$

Proof. By Theorem 16.8, $S=E*f$ is a solution of $Lu=f$ if $E$ is a fundamental solution of the differential operator $L$. Inserting the fundamental solution of the Laplacian for $n=2$ and $n=3$ and using that $f$ has compact support, the assertion follows.
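The same convolution principle can be tested in the one-dimensional analogue, which the text does not treat explicitly: the fundamental solution of $d^2/dx^2$ is $E(x)=|x|/2$, and $S=E*f$ should satisfy $S''=f$. Grid and source term below are illustrative assumptions.

```python
import numpy as np

# n = 1 analogue of Corollary 17.2: E(x) = |x|/2, S = E * f, check S'' = f.
x = np.linspace(-20.0, 20.0, 4001)     # odd, symmetric grid so "same" aligns
h = x[1] - x[0]

E = np.abs(x) / 2.0
f = (1 - 2 * x**2) * np.exp(-x**2)     # smooth, rapidly decaying source

S = np.convolve(E, f, mode="same") * h   # S(x) ~ integral E(x - y) f(y) dy

# second difference quotient approximates S''
Spp = (S[2:] - 2 * S[1:-1] + S[:-2]) / h**2

# compare away from the boundary, where the truncated kernel is accurate
mid = np.abs(x[1:-1]) < 10.0
err = np.abs(Spp - f[1:-1])[mid].max()
```

The second difference of the discretized $|x|/2$ kernel concentrates exactly at the diagonal, so the agreement here is essentially exact — a discrete shadow of $\bigl(|x|/2\bigr)''=\delta$.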

Remarks 17.1 (a) The solution given in (17.2) is even a classical solution of the Poisson equation; indeed, we can differentiate the parameter integral as usual.
(b) The function $G(x,y)=E_n(x-y)$ is called the Green's function of the Laplace equation.

17.1.2 The Heat Equation

Proposition 17.3 The function
$$F(x,t)=\frac{H(t)}{(4\pi a^2t)^{n/2}}\,e^{-\frac{\|x\|^2}{4a^2t}}$$
defines a regular distribution $E=T_F$ and a fundamental solution of the heat equation $u_t-a^2\Delta u=0$, that is,
$$E_t-a^2\Delta_xE=\delta(x)\otimes\delta(t). \tag{17.3}$$
Proof. Step 1. The function $F(x,t)$ is locally integrable, since $F=0$ for $t\le0$, $F\ge0$ for $t>0$, and
$$\int_{\mathbb{R}^n}F(x,t)\,dx=\frac{1}{(4\pi a^2t)^{n/2}}\int_{\mathbb{R}^n}e^{-\frac{r^2}{4a^2t}}\,dx=\prod_{k=1}^n\Bigl(\frac{1}{\sqrt\pi}\int_{\mathbb{R}}e^{-\xi_k^2}\,d\xi_k\Bigr)=1. \tag{17.4}$$

Step 2. For $t>0$, $F\in C^\infty$ and therefore
$$\frac{\partial F}{\partial t}=\Bigl(\frac{x^2}{4a^2t^2}-\frac{n}{2t}\Bigr)F,\qquad\frac{\partial F}{\partial x_i}=-\frac{x_i}{2a^2t}\,F,\qquad\frac{\partial^2F}{\partial x_i^2}=\Bigl(\frac{x_i^2}{4a^4t^2}-\frac{1}{2a^2t}\Bigr)F,$$
$$\frac{\partial F}{\partial t}-a^2\Delta F=0. \tag{17.5}$$
See also homework 59.2.

We give a proof using the Fourier transformation with respect to the spatial variables. Let $\hat E(\xi,t)=(F_xE)(\xi,t)$. We apply the Fourier transformation to (17.3) and obtain a first-order ODE with respect to the time variable $t$:
$$\frac{\partial}{\partial t}\hat E(\xi,t)+a^2\xi^2\hat E(\xi,t)=\alpha_n1(\xi)\,\delta(t).$$
Recall from Example 16.10 that $u'+bu=\delta$ has the fundamental solution $u(t)=H(t)e^{-bt}$; hence
$$\hat E(\xi,t)=H(t)\,\alpha_n\,e^{-a^2\xi^2t}.$$
We want to apply the inverse Fourier transformation with respect to the spatial variables. For this, note that by Example 16.11,
$$F^{-1}\Bigl(e^{-\frac{\xi^2}{2c^2}}\Bigr)=c^n\,e^{-\frac{c^2x^2}{2}},$$
where in our case $\frac{1}{2c^2}=a^2t$, i.e. $c=\frac{1}{\sqrt{2a^2t}}$. Hence,
$$E(x,t)=H(t)\,\alpha_n\,F^{-1}\bigl(e^{-a^2\xi^2t}\bigr)=H(t)\,\frac{1}{(2\pi)^{n/2}}\cdot\frac{1}{(2a^2t)^{n/2}}\,e^{-\frac{x^2}{2\cdot2a^2t}}=\frac{H(t)}{(4\pi a^2t)^{n/2}}\,e^{-\frac{x^2}{4a^2t}}.$$
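Two consequences of this computation are easy to check numerically for $n=1$: the unit mass (17.4) of the kernel, and the semigroup property $F(\cdot,t)*F(\cdot,s)=F(\cdot,t+s)$, which on the Fourier side is just $e^{-a^2\xi^2t}\,e^{-a^2\xi^2s}=e^{-a^2\xi^2(t+s)}$. All grid parameters are arbitrary.

```python
import numpy as np

# Heat kernel in n = 1: unit mass and the semigroup (Chapman-Kolmogorov) property.
a = 1.0
x = np.linspace(-25.0, 25.0, 10001)    # odd, symmetric grid so "same" aligns
dx = x[1] - x[0]

def heat_kernel(t):
    return np.exp(-x**2 / (4 * a**2 * t)) / np.sqrt(4 * np.pi * a**2 * t)

mass = heat_kernel(0.7).sum() * dx     # should be 1, cf. (17.4)

conv = np.convolve(heat_kernel(0.3), heat_kernel(0.4), mode="same") * dx
err_semigroup = np.abs(conv - heat_kernel(0.7)).max()
```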

Corollary 17.4 Suppose that $f(x,t)$ is a continuous function on $\mathbb{R}^n\times\mathbb{R}_+$ with compact support. Let
$$V(x,t)=\frac{H(t)}{(4a^2\pi)^{n/2}}\int_0^t\!\!\int_{\mathbb{R}^n}\frac{e^{-\frac{\|x-y\|^2}{4a^2(t-s)}}}{(t-s)^{n/2}}\,f(y,s)\,dy\,ds.$$
Then $V(x,t)$ is a regular distribution in $D'(\mathbb{R}^n\times\mathbb{R}_+)$ and a solution of $u_t-a^2\Delta u=f$ in $D'(\mathbb{R}^n\times\mathbb{R}_+)$.
Proof. This follows from Theorem 16.8.

17.1.3 The Wave Equation

We shall determine the fundamental solutions of the wave equation in dimensions $n=3$, $n=2$, and $n=1$. In case $n=3$ we again apply the Fourier transformation; for the other dimensions we use the method of descent.

(a) Case n = 3

Proposition 17.5 The distribution
$$E(x,t)=\frac{H(t)}{4\pi a^2t}\,\delta_{S_{at}}\in D'(\mathbb{R}^4)$$
is a fundamental solution of the wave operator $\Box_au=u_{tt}-a^2(u_{x_1x_1}+u_{x_2x_2}+u_{x_3x_3})$, where $\delta_{S_{at}}$ denotes the single-layer distribution of the sphere of radius $a\cdot t$ around $0$.

Proof. As in the case of the heat equation, let $\hat E(\xi,t)=F_xE(\xi,t)$ be the Fourier transform of the fundamental solution $E(x,t)$. Then $\hat E(\xi,t)$ satisfies
$$\frac{\partial^2}{\partial t^2}\hat E+a^2\xi^2\hat E=\alpha_31(\xi)\,\delta(t).$$
Again, this is an ODE of order $2$ in $t$. Recall from Example 16.10 that $u''+a^2u=\delta$, $a\neq0$, has a solution $u(t)=H(t)\frac{\sin at}{a}$. Thus,
$$\hat E(\xi,t)=\alpha_3\,H(t)\,\frac{\sin(a\|\xi\|t)}{a\|\xi\|},$$
where $\xi$ is thought of as a parameter. Apply the inverse Fourier transformation $F_x^{-1}$ to this function. Recall from Example 16.12 (c) that the Fourier transform of the single layer of the sphere of radius $at$ around $0$ is
$$F\delta_{S_{at}}(\xi)=\frac{2at}{\sqrt{2\pi}}\,\frac{\sin(at\|\xi\|)}{\|\xi\|}.$$
This shows
$$E(x,t)=H(t)\,\frac{1}{2\pi}\cdot\frac{1}{2at}\cdot\frac1a\,\delta_{S_{at}}(x)=H(t)\,\frac{1}{4\pi a^2t}\,\delta_{S_{at}}(x).$$

Let us evaluate $\langle E_3,\varphi(x,t)\rangle$. Using $dx_1\,dx_2\,dx_3=dS_r\,dr$, where $x=(x_1,x_2,x_3)$ and $r=\|x\|$, as well as the transformation $r=at$, $dr=a\,dt$ ($dS$ is the surface element of the sphere $S_r(0)$), we obtain
$$\langle E_3,\varphi(x,t)\rangle=\frac{1}{4\pi a^2}\int_0^\infty\frac1t\iint_{S_{at}}\varphi(x,t)\,dS\,dt \tag{17.6}$$
$$=\frac{1}{4\pi a^2}\int_0^\infty\frac ar\iint_{S_r}\varphi\Bigl(x,\frac ra\Bigr)dS\,\frac{dr}a=\frac{1}{4\pi a^2}\int_{\mathbb{R}^3}\frac{\varphi\bigl(x,\frac{\|x\|}a\bigr)}{\|x\|}\,dx. \tag{17.7}$$

(b) The Dimensions n = 2 and n = 1

To construct the fundamental solution $E_2(x,t)$, $x=(x_1,x_2)$, we use the so-called method of descent.

Lemma 17.6 A fundamental solution $E_2$ of the $2$-dimensional wave operator $\Box_{a,2}$ is given by
$$\langle E_2,\varphi(x_1,x_2,t)\rangle=\lim_{k\to\infty}\langle E_3(x_1,x_2,x_3,t),\,\varphi(x_1,x_2,t)\,\eta_k(x_3)\rangle,$$
where $E_3$ denotes a fundamental solution of the $3$-dimensional wave operator $\Box_{a,3}$ and $\eta_k\in D(\mathbb{R})$ is a sequence of functions converging to $1$ as $k\to\infty$.

Proof. Let $\varphi\in D(\mathbb{R}^2\times\mathbb{R})$. Noting that $\eta_k''\to0$ uniformly on $\mathbb{R}$ as $k\to\infty$, we get
$$\langle\Box_{a,2}E_2,\varphi(x_1,x_2,t)\rangle=\langle E_2,\Box_{a,2}\varphi(x_1,x_2,t)\rangle=\lim_{k\to\infty}\langle E_3,\Box_{a,2}\varphi(x_1,x_2,t)\cdot\eta_k(x_3)\rangle$$
$$=\lim_{k\to\infty}\bigl\langle E_3,\ \Box_{a,3}\bigl(\varphi(x_1,x_2,t)\,\eta_k(x_3)\bigr)+a^2\varphi\cdot\eta_k''(x_3)\bigr\rangle=\lim_{k\to\infty}\langle\Box_{a,3}E_3,\ \varphi(x_1,x_2,t)\,\eta_k(x_3)\rangle$$
$$=\lim_{k\to\infty}\langle\delta(x_1,x_2,x_3)\,\delta(t),\ \varphi(x_1,x_2,t)\,\eta_k(x_3)\rangle=\varphi(0,0,0)=\langle\delta(x_1,x_2)\,\delta(t),\varphi(x_1,x_2,t)\rangle.$$
In the third line we used $\Delta_{x_1,x_2,x_3}\bigl(\varphi(x_1,x_2,t)\cdot\eta(x_3)\bigr)=\Delta_{x_1,x_2}\varphi(x_1,x_2,t)\cdot\eta+\varphi\cdot\eta''(x_3)$, hence $\Box_{a,2}\varphi\cdot\eta_k=\Box_{a,3}(\varphi\,\eta_k)+a^2\varphi\,\eta_k''$.

Proposition 17.7 (a) For $x=(x_1,x_2)\in\mathbb{R}^2$ and $t\in\mathbb{R}$, the regular distribution
$$E_2(x,t)=\frac{1}{2\pi a}\,\frac{H(at-\|x\|)}{\sqrt{a^2t^2-x^2}}=\begin{cases}\dfrac{1}{2\pi a}\,\dfrac{1}{\sqrt{a^2t^2-x^2}},&at>\|x\|,\\[2mm]0,&at\le\|x\|,\end{cases}$$
is a fundamental solution of the $2$-dimensional wave operator.
(b) The regular distribution
$$E_1(x,t)=\frac{1}{2a}\,H(at-|x|)=\begin{cases}\dfrac{1}{2a},&|x|<at,\\[2mm]0,&|x|\ge at,\end{cases}$$
is a fundamental solution of the one-dimensional wave operator.
Proof. (a) By the above lemma,
$$\langle E_2,\varphi(x_1,x_2,t)\rangle=\langle E_3,\varphi(x_1,x_2,t)\,1(x_3)\rangle=\frac{1}{4\pi a^2}\int_0^\infty\frac1t\iint_{S_{at}}\varphi(x_1,x_2,t)\,dS\,dt.$$

We compute the surface element of the sphere of radius $at$ around $0$ in terms of $x_1,x_2$. The upper half-sphere is the graph of the function $x_3=f(x_1,x_2)=\sqrt{a^2t^2-x_1^2-x_2^2}$. By the formula before Example 10.4, $dS=\sqrt{1+f_{x_1}^2+f_{x_2}^2}\,dx_1\,dx_2$. In case of the sphere we have
$$dS_{x_1,x_2}=\frac{at\,dx_1\,dx_2}{\sqrt{a^2t^2-x_1^2-x_2^2}}.$$
Integration over both the upper and the lower half-sphere yields a factor $2$:
$$\langle E_2,\varphi\rangle=\frac{2}{4\pi a^2}\int_0^\infty\frac1t\iint_{x_1^2+x_2^2\le a^2t^2}\frac{at\,\varphi(x_1,x_2,t)}{\sqrt{a^2t^2-x_1^2-x_2^2}}\,dx_1\,dx_2\,dt=\frac{1}{2\pi a}\int_0^\infty\iint_{\|x\|\le at}\frac{\varphi(x_1,x_2,t)}{\sqrt{a^2t^2-x_1^2-x_2^2}}\,dx_1\,dx_2\,dt.$$
This shows that $E_2(x,t)$ is a regular distribution of the above form. One can also show directly that $E_2\in L^1_{\mathrm{loc}}(\mathbb{R}^3)$; indeed, $\iiint_{\mathbb{R}^2\times[-R,R]}|E_2(x_1,x_2,t)|\,dx\,dt<\infty$ for all $R>0$.

(b) It was already shown in homework 57.2 that $E_1$ is the fundamental solution of the one-dimensional wave operator; a short proof can be found in [Wla72, II, §6.5, Example g)].
Alternatively, use the method of descent to complete the proof. Let $\varphi\in D(\mathbb{R}^2)$. Since $\int_{\mathbb{R}}|E_2(x_1,x_2,t)|\,dx_2<\infty$ and $\int_{\mathbb{R}}E_2(x_1,x_2,t)\,dx_2$ again defines a locally integrable function, we have as in Lemma 17.6
$$E_1(\varphi)=\lim_{k\to\infty}\langle E_2(x_1,x_2,t),\,\varphi(x_1,t)\,\eta_k(x_2)\rangle=\lim_{k\to\infty}\int_{\mathbb{R}^3}E_2(x_1,x_2,t)\,\eta_k(x_2)\,\varphi(x_1,t)\,dx_1\,dx_2\,dt.$$
Hence, the fundamental solution $E_1$ is the regular distribution
$$E_1(x_1,t)=\int_{-\infty}^{\infty}E_2(x_1,x_2,t)\,dx_2.$$
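The last formula can be checked numerically at a sample point inside the light cone: for $|x_1|<at$, the integral of $E_2$ over the admissible $x_2$ should equal $1/(2a)$, independently of $x_1$ and $t$. The midpoint rule below copes with the integrable endpoint singularity; the sample values of $a$, $t$, $x_1$ are arbitrary.

```python
import numpy as np

# Descent check: integral of 1/(2 pi a sqrt(a^2 t^2 - x_1^2 - x_2^2)) over
# |x_2| < c, c = sqrt(a^2 t^2 - x_1^2), should equal E_1 = 1/(2a).
a, t, x1 = 2.0, 1.5, 0.8
c = np.sqrt(a**2 * t**2 - x1**2)       # half-width of the x_2 support

edges = np.linspace(-c, c, 200001)
mid = 0.5 * (edges[:-1] + edges[1:])   # midpoints avoid the endpoint singularity
h = edges[1] - edges[0]

E2_vals = 1.0 / (2 * np.pi * a * np.sqrt(a**2 * t**2 - x1**2 - mid**2))
E1_numeric = np.sum(E2_vals) * h

err = abs(E1_numeric - 1 / (2 * a))
```

Analytically, the substitution $x_2=c\sin\theta$ turns the integral into $\pi/(2\pi a)=1/(2a)$, which is what the quadrature reproduces.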

In this section we formulate and study the classical and generalized Cauchy problems for the wave equation and for the heat equation.

17.2.1 Motivation of the Method

To explain the method, we first apply the theory of distributions to solve an initial value problem for a linear second-order ODE. Consider the Cauchy problem
$$u''(t)+a^2u(t)=f(t),\qquad u|_{t=0+}=u_0,\quad u'|_{t=0+}=u_1, \tag{17.8}$$
where $f\in C(\mathbb{R}_+)$. We extend the solution $u(t)$ as well as $f(t)$ by $0$ for negative values of $t$, $t<0$, and denote the new functions by $\tilde u$ and $\tilde f$, respectively. Since $\tilde u$ has a jump of height $u_0$ at $0$, by Example 16.6, $\tilde u'(t)=\{u'(t)\}+u_0\delta(t)$. Similarly, $u'(t)$ jumps at $0$ by $u_1$, such that $\tilde u''(t)=\{u''(t)\}+u_0\delta'(t)+u_1\delta(t)$. Hence $\tilde u$ satisfies on $\mathbb{R}$ the equation
$$\tilde u''+a^2\tilde u=\tilde f(t)+u_0\delta'(t)+u_1\delta(t). \tag{17.9}$$
We construct the solution $\tilde u$. Since the fundamental solution $E(t)=H(t)\sin at/a$ as well as the right-hand side of (17.9) have positive support, the convolution product exists and equals
$$\tilde u=E*\bigl(\tilde f+u_0\delta'(t)+u_1\delta(t)\bigr)=E*\tilde f+u_0E'(t)+u_1E(t),$$
$$\tilde u=\frac1a\int_0^tf(\tau)\sin a(t-\tau)\,d\tau+u_0E'(t)+u_1E(t).$$
Since for $t>0$ $\tilde u$ satisfies (17.9) and the solution of the Cauchy problem is unique, the above formula gives the classical solution for $t>0$, that is,
$$u(t)=\frac1a\int_0^tf(\tau)\sin a(t-\tau)\,d\tau+u_0\cos at+u_1\,\frac{\sin at}{a}.$$
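The final formula can be verified numerically: compute $u$ by quadrature of the Duhamel term and check the ODE and the initial data by difference quotients. The sample data $a$, $u_0$, $u_1$, $f$ below are arbitrary.

```python
import numpy as np

# Check u(t) = (1/a) int_0^t f(tau) sin a(t - tau) d tau + u0 cos at + u1 sin(at)/a
# against u'' + a^2 u = f, u(0) = u0, u'(0) = u1.
a, u0, u1 = 2.0, 0.5, -1.0
dt = 1e-3
t = np.arange(0.0, 5.0, dt)
f = np.exp(-t)

# causal convolution (E * f)(t) with E(t) = H(t) sin(a t)/a
E = np.sin(a * t) / a
u = np.convolve(E, f)[: len(t)] * dt + u0 * np.cos(a * t) + u1 * np.sin(a * t) / a

# ODE residual via second difference quotient
upp = (u[2:] - 2 * u[1:-1] + u[:-2]) / dt**2
residual = np.abs(upp + a**2 * u[1:-1] - f[1:-1]).max()

# initial conditions via forward differences
ic_err = max(abs(u[0] - u0), abs((u[1] - u[0]) / dt - u1))
```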

17.2.2 The Wave Equation

(a) The Classical and the Generalized Initial Value Problem—Existence, Uniqueness, and Continuity

Definition 17.1 (a) The problem
$$\square_a^2u = f(x,t),\qquad x\in\mathbb{R}^n,\ t>0, \tag{17.10}$$
$$u|_{t=0+} = u_0(x), \tag{17.11}$$
$$\frac{\partial u}{\partial t}\Big|_{t=0+} = u_1(x), \tag{17.12}$$
where we assume that
$$f\in C(\mathbb{R}^n\times\mathbb{R}_+),\qquad u_0\in C^1(\mathbb{R}^n),\qquad u_1\in C(\mathbb{R}^n),$$
is called the classical initial value problem (CIVP, for short) for the wave equation. A function $u(x,t)$ is called a classical solution of the CIVP if
$$u(x,t)\in C^2(\mathbb{R}^n\times\mathbb{R}_+)\cap C^1(\mathbb{R}^n\times\overline{\mathbb{R}_+}),$$
$u(x,t)$ satisfies the wave equation (17.10) for $t>0$, and the initial conditions (17.11) and (17.12) hold as $t\to 0+0$.
(b) The problem

$$\square_a^2U = F(x,t) + U_0(x)\otimes\delta'(t) + U_1(x)\otimes\delta(t)$$
with $F\in\mathcal{D}'(\mathbb{R}^{n+1})$, $U_0,U_1\in\mathcal{D}'(\mathbb{R}^n)$, and $\operatorname{supp}F\subset\mathbb{R}^n\times[0,+\infty)$ is called the generalized initial value problem (GIVP). A generalized function $U\in\mathcal{D}'(\mathbb{R}^{n+1})$ with $\operatorname{supp}U\subset\mathbb{R}^n\times[0,+\infty)$ which satisfies the above equation is called a (generalized, weak) solution of the GIVP.

Proposition 17.8 (a) Suppose that $u(x,t)$ is a solution of the CIVP with the given data $f$, $u_0$, and $u_1$. Then the regular distribution $T_u$ is a solution of the GIVP with the right-hand side $T_f + T_{u_0}\otimes\delta'(t) + T_{u_1}\otimes\delta(t)$, provided that $f(x,t)$ and $u(x,t)$ are extended by $0$ into the domain $\{(x,t)\mid(x,t)\in\mathbb{R}^{n+1},\ t<0\}$.
(b) Conversely, suppose that $U$ is a solution of the GIVP. Let the distributions $F = T_f$, $U_0 = T_{u_0}$, $U_1 = T_{u_1}$, and $U = T_u$ be regular and satisfy the regularity assumptions of the CIVP. Then $u(x,t)$ is a solution of the CIVP.

Proof. (b) Suppose that $U$ is a solution of the GIVP; let $\varphi\in\mathcal{D}(\mathbb{R}^{n+1})$. By definition of the tensor product and the derivative,
$$\langle U_{tt} - a^2\Delta U,\varphi\rangle = \langle F,\varphi\rangle + \langle U_0\otimes\delta',\varphi\rangle + \langle U_1\otimes\delta,\varphi\rangle$$
$$= \int_0^\infty\!\!\int_{\mathbb{R}^n}f(x,t)\varphi(x,t)\,dx\,dt - \int_{\mathbb{R}^n}u_0(x)\frac{\partial\varphi}{\partial t}(x,0)\,dx + \int_{\mathbb{R}^n}u_1(x)\varphi(x,0)\,dx. \tag{17.13}$$

Applying integration by parts with respect to $t$ twice, we find
$$\int_0^\infty u\varphi_{tt}\,dt = u\varphi_t\big|_0^\infty - \int_0^\infty u_t\varphi_t\,dt = -u(x,0)\varphi_t(x,0) - u_t\varphi\big|_0^\infty + \int_0^\infty u_{tt}\varphi\,dt$$
$$= -u(x,0)\varphi_t(x,0) + u_t(x,0)\varphi(x,0) + \int_0^\infty u_{tt}\varphi\,dt.$$
Since $\mathbb{R}^n$ has no boundary and $\varphi$ has compact support, integration by parts with respect to the spatial variables $x$ yields no boundary terms. Hence, by the above formula and $\iint u\,\Delta\varphi\,dt\,dx = \iint\Delta u\,\varphi\,dt\,dx$, we obtain
$$\langle U_{tt} - a^2\Delta U,\varphi\rangle = \langle U,\varphi_{tt} - a^2\Delta\varphi\rangle = \int_{\mathbb{R}^n}\int_0^\infty u(x,t)\bigl(\varphi_{tt} - a^2\Delta\varphi\bigr)\,dt\,dx$$
$$= \int_{\mathbb{R}^n}\int_0^\infty\bigl(u_{tt} - a^2\Delta u\bigr)\varphi(x,t)\,dt\,dx - \int_{\mathbb{R}^n}\bigl(u(x,0)\varphi_t(x,0) - u_t(x,0)\varphi(x,0)\bigr)\,dx. \tag{17.14}$$

For any $\varphi\in\mathcal{D}(\mathbb{R}^n\times(0,+\infty))$, $\operatorname{supp}\varphi$ is contained in $\mathbb{R}^n\times(0,+\infty)$, so that $\varphi(x,0) = \varphi_t(x,0) = 0$. From (17.13) and (17.14) it follows that
$$\int_{\mathbb{R}^n}\int_0^\infty\bigl(f(x,t) - u_{tt} + a^2\Delta u\bigr)\varphi(x,t)\,dt\,dx = 0.$$

By Lemma 16.2 (Du Bois-Reymond) it follows that $u_{tt} - a^2\Delta u = f$ on $\mathbb{R}^n\times\mathbb{R}_+$. Inserting this into (17.13) and (17.14), we have
$$\int_{\mathbb{R}^n}(u_0(x) - u(x,0))\varphi_t(x,0)\,dx - \int_{\mathbb{R}^n}(u_1(x) - u_t(x,0))\varphi(x,0)\,dx = 0.$$

If we set $\varphi(x,t) = \psi(x)\eta(t)$, where $\eta\in\mathcal{D}(\mathbb{R})$ and $\eta(t) = 1$ is constant in a neighborhood of $0$, then $\varphi_t(x,0) = 0$ and therefore
$$\int_{\mathbb{R}^n}(u_1(x) - u_t(x,0))\psi(x)\,dx = 0,\qquad \psi\in\mathcal{D}(\mathbb{R}^n).$$
Moreover,
$$\int_{\mathbb{R}^n}(u_0(x) - u(x,0))\psi(x)\,dx = 0,\qquad \psi\in\mathcal{D}(\mathbb{R}^n),$$
if we set $\varphi(x,t) = t\eta(t)\psi(x)$. Again, Lemma 16.2 yields
$$u_0(x) = u(x,0),\qquad u_1(x) = u_t(x,0),$$
and $u(x,t)$ is a solution of the CIVP.
(a) Conversely, if $u(x,t)$ is a solution of the CIVP, then (17.14) holds with
$$U(x,t) = H(t)\,u(x,t).$$

Comparing this with (17.13), it is seen that
$$U_{tt} - a^2\Delta U = F + U_0\otimes\delta' + U_1\otimes\delta,$$
where $F(x,t) = H(t)f(x,t)$, $U_0(x) = u(x,0)$, and $U_1(x) = u_t(x,0)$.

Corollary 17.9 Suppose that $F$, $U_0$, and $U_1$ are data of the GIVP. Then there exists a unique solution $U$ of the GIVP. It can be written as $U = V + V^{(0)} + V^{(1)}$, where
$$V = E_n*F,\qquad V^{(1)} = E_n*_xU_1,\qquad V^{(0)} = \frac{\partial E_n}{\partial t}*_xU_0.$$
Here $E_n*_xU_1 := E_n*(U_1(x)\otimes\delta(t))$ denotes the convolution product with respect to the spatial variables $x$ only. The solution $U$ depends continuously on $F$, $U_0$, and $U_1$ in the sense of convergence in $\mathcal{D}'$. Here $E_n$ denotes the fundamental solution of the $n$-dimensional wave operator $\square_{a,n}^2$.

Proof. The supports of the distributions $U_0\otimes\delta'$ and $U_1\otimes\delta$ are contained in the hyperplane $\{(x,t)\in\mathbb{R}^{n+1}\mid t = 0\}$. Hence the support of the distribution $F + U_0\otimes\delta' + U_1\otimes\delta$ is contained in the half space $\mathbb{R}^n\times\mathbb{R}_+$. It follows from Proposition 16.15 below that the convolution product
$$U = E_n*(F + U_0\otimes\delta' + U_1\otimes\delta)$$
exists and has support in the half space $t\ge 0$. It follows from Theorem 16.8 that $U$ is a solution of the GIVP. On the other hand, any solution of the GIVP has support in $\mathbb{R}^n\times\mathbb{R}_+$ and therefore, by Proposition 16.15, possesses the convolution with $E_n$. By Theorem 16.8, the solution $U$ is unique.
Suppose that $U_k\longrightarrow U$ as $k\to\infty$ in $\mathcal{D}'(\mathbb{R}^{n+1})$; then $E_n*U_k\longrightarrow E_n*U$ by the continuity of the convolution product in $\mathcal{D}'$ (see Proposition 16.15).

(b) Explicit Solutions for n = 1, 2, 3

We will make the above formulas from Corollary 17.9 explicit, that is, we compute the above convolutions to obtain the potentials $V$, $V^{(0)}$, and $V^{(1)}$.

Proposition 17.10 Let $f\in C^2(\mathbb{R}^n\times\mathbb{R}_+)$, $u_0\in C^3(\mathbb{R}^n)$, and $u_1\in C^2(\mathbb{R}^n)$ for $n = 2,3$; let $f\in C^1(\mathbb{R}\times\mathbb{R}_+)$, $u_0\in C^2(\mathbb{R})$, and $u_1\in C^1(\mathbb{R})$ in case $n = 1$. Then there exists a unique solution of the CIVP. It is given in case $n = 3$ by Kirchhoff's formula
$$u(x,t) = \frac{1}{4\pi a^2}\left(\,\iiint_{U_{at}(x)}\frac{f\!\left(y,\,t-\frac{\|x-y\|}{a}\right)}{\|x-y\|}\,dy + \frac{1}{t}\iint_{S_{at}(x)}u_1(y)\,dS_y + \frac{\partial}{\partial t}\,\frac{1}{t}\iint_{S_{at}(x)}u_0(y)\,dS_y\right). \tag{17.15}$$

The first term $V$ is called the retarded potential.

In case $n = 2$, $x = (x_1,x_2)$, $y = (y_1,y_2)$, it is given by Poisson's formula
$$u(x,t) = \frac{1}{2\pi a}\int_0^t\!\!\iint_{U_{a(t-s)}(x)}\frac{f(y,s)\,dy\,ds}{\sqrt{a^2(t-s)^2 - \|x-y\|^2}} + \frac{1}{2\pi a}\iint_{U_{at}(x)}\frac{u_1(y)\,dy}{\sqrt{a^2t^2 - \|x-y\|^2}} + \frac{1}{2\pi a}\frac{\partial}{\partial t}\iint_{U_{at}(x)}\frac{u_0(y)\,dy}{\sqrt{a^2t^2 - \|x-y\|^2}}. \tag{17.16}$$
In case $n = 1$ it is given by d'Alembert's formula
$$u(x,t) = \frac{1}{2a}\int_0^t\!\!\int_{x-a(t-s)}^{x+a(t-s)}f(y,s)\,dy\,ds + \frac{1}{2a}\int_{x-at}^{x+at}u_1(y)\,dy + \frac{1}{2}\bigl(u_0(x+at) + u_0(x-at)\bigr). \tag{17.17}$$

The solution $u(x,t)$ depends continuously on $u_0$, $u_1$, and $f$ in the following sense: if
$$|f - \tilde f| < \varepsilon,\qquad |u_0 - \tilde u_0| < \varepsilon_0,\qquad |u_1 - \tilde u_1| < \varepsilon_1,\qquad \|\operatorname{grad}(u_0 - \tilde u_0)\| < \varepsilon_0'$$
(where we impose the last inequality only in the cases $n = 3$ and $n = 2$), then the corresponding solutions $u(x,t)$ and $\tilde u(x,t)$ satisfy, in a strip $0\le t\le T$,
$$|u(x,t) - \tilde u(x,t)| < \frac{1}{2}T^2\varepsilon + T\varepsilon_1 + \varepsilon_0 + aT\varepsilon_0',$$
where the last term is omitted in case $n = 1$.
Proof. (Idea of proof) We show Kirchhoff's formula.
(a) The potential term with $f$.

By Proposition 16.15 below, the convolution product $E_3*f$ exists. It is shown in [Wla72, p. 153] that for a locally integrable function $f\in L^1_{loc}(\mathbb{R}^{n+1})$ with $\operatorname{supp}f\subset\mathbb{R}^n\times\mathbb{R}_+$, $E_n*T_f$ is again a locally integrable function. Formally, the convolution product is given by
$$(E_3*f)(x,t) = \int_{\mathbb{R}^4}E_3(y,s)f(x-y,t-s)\,dy\,ds = \int_{\mathbb{R}^4}E_3(x-y,t-s)f(y,s)\,dy\,ds,$$
where the integral is to be understood as the evaluation of $E_3(y,s)$ on the shifted function $f(x-y,t-s)$. Since $f$ has support on the positive time axis, one can restrict oneself to $s>0$ and $t-s>0$, that is, to $0<s<t$.

Using $dy_1\,dy_2\,dy_3 = dr\,dS_y$ as well as $\|y\| = r = as$, we can proceed to
$$V(x,t) = \frac{1}{4\pi a^2}\iiint_{U_{at}(0)}\frac{f\!\left(x-y,\,t-\frac{\|y\|}{a}\right)}{\|y\|}\,dy_1\,dy_2\,dy_3.$$

The shift $z = x-y$, $dz_1\,dz_2\,dz_3 = dy_1\,dy_2\,dy_3$, finally yields
$$V(x,t) = \frac{1}{4\pi a^2}\iiint_{U_{at}(x)}\frac{f\!\left(z,\,t-\frac{\|x-z\|}{a}\right)}{\|x-z\|}\,dz.$$

This is the first potential term of Kirchhoff's formula.
(b) We compute $V^{(1)}(x,t)$. By definition,
$$V^{(1)} = E_3*(u_1\otimes\delta) = E_3*_xu_1.$$
Formally, this is given by
$$V^{(1)}(x,t) = \frac{1}{4\pi a^2t}\iiint_{\mathbb{R}^3}\delta_{S_{at}}(y)\,u_1(x-y)\,dy = \frac{1}{4\pi a^2t}\iint_{S_{at}}u_1(x-y)\,dS(y) = \frac{1}{4\pi a^2t}\iint_{S_{at}(x)}u_1(y)\,dS(y).$$

(c) Recall that $(D^\alpha S)*T = D^\alpha(S*T)$, by Remark 16.8 (b). In particular,
$$E_3*(u_0\otimes\delta') = \frac{\partial}{\partial t}\bigl(E_3*_xu_0\bigr),$$
which immediately gives the third term of (17.15) in view of (b).
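For the one-dimensional case, d'Alembert's formula (17.17) is easy to test numerically. The following Python sketch (our own illustration, with hypothetical function names; the homogeneous case $f = 0$, $a = 1$ is assumed) evaluates the formula by quadrature and compares it with a known exact solution.

```python
import math

def dalembert(u0, u1, a, x, t, n=2000):
    """d'Alembert's formula (17.17) for u_tt = a^2 u_xx, f = 0, with data
    u(x,0) = u0(x), u_t(x,0) = u1(x):
    u(x,t) = (1/2a) int_{x-at}^{x+at} u1(y) dy + (u0(x+at) + u0(x-at))/2."""
    lo, hi = x - a * t, x + a * t
    h = (hi - lo) / n
    s = 0.5 * (u1(lo) + u1(hi)) + sum(u1(lo + k * h) for k in range(1, n))
    return s * h / (2 * a) + 0.5 * (u0(x + a * t) + u0(x - a * t))

# u0(x) = sin x, u1(x) = cos x, a = 1 has the exact solution
# u(x,t) = sin x cos t + cos x sin t = sin(x + t), a left-travelling wave.
x, t = 0.7, 1.3
u_num = dalembert(math.sin, math.cos, 1.0, x, t)
print(abs(u_num - math.sin(x + t)) < 1e-6)  # True
```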

Remark 17.2 (a) The stronger regularity (differentiability) conditions on $f$, $u_0$, $u_1$ are necessary to prove $u\in C^2(\mathbb{R}^n\times\mathbb{R}_+)$ and to show stability.
(b) Proposition 17.10 and Corollary 17.9 show that the GIVP for the wave equation is a well-posed problem (existence, uniqueness, stability).

17.2.3 The Heat Equation

Definition 17.2 (a) The problem
$$u_t - a^2\Delta u = f(x,t),\qquad x\in\mathbb{R}^n,\ t>0, \tag{17.18}$$
$$u(x,0) = u_0(x), \tag{17.19}$$
where we assume that
$$f\in C(\mathbb{R}^n\times\mathbb{R}_+),\qquad u_0\in C(\mathbb{R}^n),$$
is called the classical initial value problem (CIVP, for short) for the heat equation. A function $u(x,t)$ is called a classical solution of the CIVP if
$$u(x,t)\in C^2(\mathbb{R}^n\times(0,+\infty))\cap C(\mathbb{R}^n\times[0,+\infty)),$$
and $u(x,t)$ satisfies the heat equation (17.18) and the initial condition (17.19).
(b) The problem
$$U_t - a^2\Delta U = F + U_0\otimes\delta$$

with $F\in\mathcal{D}'(\mathbb{R}^{n+1})$, $U_0\in\mathcal{D}'(\mathbb{R}^n)$, and $\operatorname{supp}F\subset\mathbb{R}^n\times\mathbb{R}_+$ is called the generalized initial value problem (GIVP). A generalized function $U\in\mathcal{D}'(\mathbb{R}^{n+1})$ with $\operatorname{supp}U\subset\mathbb{R}^n\times\mathbb{R}_+$ which satisfies the above equation is called a generalized solution of the GIVP.
The fundamental solution of the heat operator has the following properties:
$$\int_{\mathbb{R}^n}E(x,t)\,dx = 1,\qquad E(x,t)\longrightarrow\delta(x)\quad\text{as } t\to 0+.$$
The fundamental solution describes the heat distribution of a point source at the origin $(0,0)$. Since $E(x,t)>0$ for all $t>0$ and all $x\in\mathbb{R}^n$, heat propagates with infinite speed. This contradicts our experience. However, for short distances the heat equation gives sufficiently good results. For long distances one uses the transport equation. We summarize the results, which are similar to those for the wave equation.
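The first of these two properties, conservation of total heat, can be illustrated numerically in one space dimension, where $E(x,t) = (4\pi a^2t)^{-1/2}e^{-x^2/(4a^2t)}$. The Python sketch below (our own addition; truncation length and grid size are ad-hoc choices) checks the mass identity for several times $t$.

```python
import math

def heat_kernel(x, t, a=1.0):
    """Fundamental solution of u_t = a^2 u_xx in one space dimension."""
    return math.exp(-x * x / (4 * a * a * t)) / math.sqrt(4 * math.pi * a * a * t)

def total_mass(t, a=1.0, L=30.0, n=4000):
    """Trapezoidal approximation of int_R E(x,t) dx, truncated to [-L, L];
    L is chosen large enough that the Gaussian tail is negligible."""
    h = 2 * L / n
    s = 0.5 * (heat_kernel(-L, t, a) + heat_kernel(L, t, a))
    s += sum(heat_kernel(-L + k * h, t, a) for k in range(1, n))
    return s * h

for t in (0.01, 0.5, 3.0):
    print(abs(total_mass(t) - 1.0) < 1e-8)  # True for each t
```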

Proposition 17.11 (a) Suppose that $u(x,t)$ is a solution of the CIVP with the given data $f$ and $u_0$. Then the regular distribution $T_{\tilde u}$ is a solution of the GIVP with the right-hand side $T_{\tilde f} + T_{u_0}\otimes\delta$, provided that $f(x,t)$ and $u(x,t)$ are extended to $\tilde f(x,t)$ and $\tilde u(x,t)$ by $0$ into the left half space $\{(x,t)\mid(x,t)\in\mathbb{R}^{n+1},\ t<0\}$.
(b) Conversely, suppose that $U$ is a solution of the GIVP. Let the distributions $F = T_f$, $U_0 = T_{u_0}$, and $U = T_u$ be regular and satisfy the regularity assumptions of the CIVP. Then $u(x,t)$ is a solution of the CIVP.

Proposition 17.12 Suppose that $F$ and $U_0$ are data of the GIVP, and suppose further that $F$ and $U_0$ both have compact support. Then there exists a solution $U$ of the GIVP which can be written as
$$U = V + V^{(0)},\qquad\text{where}\quad V = E*F,\quad V^{(0)} = E*_xU_0.$$
The solution $U$ varies continuously with $F$ and $U_0$.

Remark 17.3 The theorem differs from the corresponding result for the wave equation in that there is no assertion of uniqueness. It turns out that the GIVP cannot be solved uniquely: A. Friedman, Partial Differential Equations of Parabolic Type, gives an example of a non-vanishing distribution which solves the GIVP with $F = 0$ and $U_0 = 0$. However, if all distributions are regular and we place additional requirements on the growth of the regular distribution as $t,\|x\|\to\infty$, uniqueness can be achieved.
For existence and uniqueness we introduce the following classes of functions:

$$M = \{f\in C(\mathbb{R}^n\times\mathbb{R}_+)\mid f\ \text{is bounded on the strip}\ \mathbb{R}^n\times[0,T]\ \text{for all}\ T>0\},$$
$$C_b(\mathbb{R}^n) = \{f\in C(\mathbb{R}^n)\mid f\ \text{is bounded on}\ \mathbb{R}^n\}.$$

Corollary 17.13 (a) Let $f\in M$ and $u_0\in C_b(\mathbb{R}^n)$. Then the two potentials $V(x,t)$ as in Corollary 17.4 and
$$V^{(0)}(x,t) = E*\bigl(T_{u_0}\otimes\delta\bigr) = \frac{H(t)}{(4\pi a^2t)^{n/2}}\int_{\mathbb{R}^n}u_0(y)\,e^{-\frac{\|x-y\|^2}{4a^2t}}\,dy$$
are regular distributions, and $u = V + V^{(0)}$ is a solution of the GIVP.
(b) In case $f\in C^2(\mathbb{R}^n\times\mathbb{R}_+)$ with $D^\alpha f\in M$ for all $\alpha$ with $|\alpha|\le 1$ (first order partial derivatives), the solution in (a) is a solution of the CIVP. In particular, $V^{(0)}(x,t)\longrightarrow u_0(x)$ as $t\to 0+$.
(c) The solution $u$ of the GIVP is unique in the class $M$.
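The convergence $V^{(0)}(x,t)\to u_0(x)$ as $t\to 0+$ in part (b) can be observed numerically in one dimension. The sketch below (our own illustration; the initial datum and the quadrature parameters are ad-hoc choices) evaluates the heat potential for decreasing $t$ and watches the error shrink.

```python
import math

def heat_potential(u0, x, t, a=1.0, L=20.0, n=4000):
    """V0(x,t) = (4*pi*a^2*t)^(-1/2) int u0(y) exp(-(x-y)^2/(4a^2 t)) dy,
    the solution of the homogeneous heat CIVP with f = 0 (n = 1),
    evaluated by the trapezoidal rule on [x-L, x+L]."""
    h = 2 * L / n
    c = 1.0 / math.sqrt(4 * math.pi * a * a * t)
    def g(y):
        return u0(y) * math.exp(-(x - y) ** 2 / (4 * a * a * t))
    s = 0.5 * (g(x - L) + g(x + L)) + sum(g(x - L + k * h) for k in range(1, n))
    return c * s * h

u0 = lambda x: 1.0 / (1.0 + x * x)   # bounded continuous initial datum
x = 0.5
# V0(x,t) -> u0(x) as t -> 0+  (Corollary 17.13 (b))
vals = [heat_potential(u0, x, t) for t in (0.1, 0.01, 0.001)]
print(all(abs(v - u0(x)) > abs(w - u0(x)) for v, w in zip(vals, vals[1:])))  # errors decrease
print(abs(vals[-1] - u0(x)) < 1e-3)  # True
```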

17.2.4 Physical Interpretation of the Results

[Figure: the backward light cone $\Gamma_-(x,t)$ and the forward light cone $\Gamma_+(x,t)$ with vertex $(x,t)$; at $t = 0$ the backward cone cuts out a ball of radius $at$.]

Definition 17.3 We introduce the two cones in $\mathbb{R}^{n+1}$,
$$\Gamma_-(x,t) = \{(y,s)\mid \|x-y\| < a(t-s),\ s<t\},\qquad \Gamma_+(x,t) = \{(y,s)\mid \|x-y\| < a(s-t),\ s>t\},$$
which are called the domain of dependence (backward light cone) and the domain of influence (forward light cone), respectively.

Recall that the boundaries $\partial\Gamma_+$ and $\partial\Gamma_-$ are characteristic surfaces of the wave equation.

(a) Propagation of Waves in Space

Consider the fundamental solution
$$E_3(x,t) = \frac{1}{4\pi a^2t}\,\delta_{S_{at}}$$
of the 3-dimensional wave equation.

It shows that the disturbance at time $t>0$ effected by a point source $\delta(x)\delta(t)$ at the origin is located on the sphere of radius $at$ around $0$. The disturbance moves like a spherical wave, $\|x\| = at$, with velocity $a$. In the beginning there is silence, then disturbance (on the sphere), and afterwards again silence. This is called Huygens' principle.

[Figure: the disturbance lies on a sphere of radius $at$.]

It follows by the superposition principle that the solution $u(x_0,t_0)$ of an initial disturbance $u_0(x)\delta'(t) + u_1(x)\delta(t)$ is completely determined by the values of $u_0$ and $u_1$ on the sphere cut out of the hyperplane $t = 0$ by the backward light cone; that is, by the values $u_0(x)$ and $u_1(x)$ at all $x$ with $\|x - x_0\| = at_0$.
Now let the disturbance be situated in a compact set $K$ rather than at a single point. Suppose that $d$ and $D$ are the minimal and maximal distances of $x$ from $K$. Then the disturbance starts to act at $x$ at time $t_0 = d/a$; it lasts for $(D-d)/a$; and again, for $t > D/a = t_1$ there is silence at $x$. Therefore, we can observe a forward wave front at time $t_0$ and a backward wave front at time $t_1$.

This shows that the domain of influence $M(K)$ of a compact set $K$ is the union of all boundaries of forward light cones $\Gamma_+(y,0)$ with $y\in K$ at time $t = 0$:
$$M(K) = \{(y,s)\mid \exists\,x\in K:\ \|x-y\| = as\}.$$

(b) Propagation of Plane Waves

Consider the fundamental solution
$$E_2(x,t) = \frac{H(at - \|x\|)}{2\pi a\sqrt{a^2t^2 - \|x\|^2}},\qquad x = (x_1,x_2),$$
of the 2-dimensional wave equation.

It shows that the disturbance effected by a point source $\delta(x)\delta(t)$ at the origin at time $0$ is a disc $U_{at}$ of radius $at$ around $0$. One observes a forward wave front moving with speed $a$. In contrast to the 3-dimensional picture, there exists no back front: the disturbance is permanently present from $t_0$ on. We speak of wave diffusion; Huygens' principle does not hold.

Diffusion can also be observed in the case of an arbitrary initial disturbance $u_0(x)\delta'(t) + u_1(x)\delta(t)$. Indeed, the superposition principle shows that the domain of dependence of a compact initial disturbance $K$ is the union of all discs $U_{at}(y)$ with $y\in K$.

(c) Propagation on a Line

Recall that $E_1(x,t) = \frac{1}{2a}H(at - |x|)$. The disturbance at time $t>0$ effected by a point source $\delta(x)\delta(t)$ is the whole closed interval $[-at,at]$. We have two forward wave “fronts”, one at the point $x = at$ and one at $x = -at$, one moving to the right and one moving to the left. As in the plane case, there is no backward wave front; we observe diffusion. For more details, see the discussion in Wladimirow [Wla72, pp. 155–159].

17.3 Fourier Method for Boundary Value Problems

A good, easily accessible introduction to the Fourier method can be found in [KK71]. In this section we use Fourier series to solve BEVPs for the Laplace equation as well as initial boundary value problems for the wave and heat equations. Recall that the following sets are CNOS (complete normalized orthogonal systems) in the Hilbert space $H$:

$$\left\{\frac{1}{\sqrt{2\pi}}e^{int}\ \Big|\ n\in\mathbb{Z}\right\},\qquad H = L^2(a,a+2\pi),$$
$$\left\{\frac{1}{\sqrt{b-a}}\,e^{\frac{2\pi i}{b-a}nt}\ \Big|\ n\in\mathbb{Z}\right\},\qquad H = L^2(a,b),$$
$$\left\{\frac{1}{\sqrt{2\pi}},\ \frac{1}{\sqrt{\pi}}\sin(nt),\ \frac{1}{\sqrt{\pi}}\cos(nt)\ \Big|\ n\in\mathbb{N}\right\},\qquad H = L^2(a,a+2\pi),$$
$$\left\{\frac{1}{\sqrt{b-a}},\ \sqrt{\frac{2}{b-a}}\sin\!\left(\frac{2\pi}{b-a}nt\right),\ \sqrt{\frac{2}{b-a}}\cos\!\left(\frac{2\pi}{b-a}nt\right)\ \Big|\ n\in\mathbb{N}\right\},\qquad H = L^2(a,b).$$
For any function $f\in L^1(0,2\pi)$ one has an associated Fourier series
$$f\sim\sum_{n\in\mathbb{Z}}c_n\,\frac{e^{int}}{\sqrt{2\pi}},\qquad c_n = \frac{1}{\sqrt{2\pi}}\int_0^{2\pi}f(t)e^{-int}\,dt.$$

Lemma 17.14 Each of the following two sets forms a CNOS in $H = L^2(0,\pi)$ (on the half interval):
$$\left\{\sqrt{\tfrac{2}{\pi}}\sin(nt)\ \Big|\ n\in\mathbb{N}\right\},\qquad \left\{\sqrt{\tfrac{2}{\pi}}\cos(nt)\ \Big|\ n\in\mathbb{N}_0\right\}.$$
Proof. That they form an NOS is left to the reader. We show completeness of the first set. Let $f\in L^2(0,\pi)$. Extend $f$ to an odd function $\tilde f\in L^2(-\pi,\pi)$, that is, $\tilde f(x) = f(x)$ and $\tilde f(-x) = -f(x)$ for $x\in(0,\pi)$. Since $\tilde f$ is an odd function, in its Fourier series
$$\frac{a_0}{2} + \sum_{n=1}^\infty\bigl(a_n\cos(nt) + b_n\sin(nt)\bigr)$$
we have $a_n = 0$ for all $n$. Since the Fourier series $\sum_{n=1}^\infty b_n\sin(nt)$ converges to $\tilde f$ in $L^2(-\pi,\pi)$, it converges to $f$ in $L^2(0,\pi)$. Thus the sine system is complete. The proof for the cosine system is analogous.
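Completeness of the sine system can be illustrated via Parseval's identity, which holds exactly when the system is complete. For $f(t) = t$ on $(0,\pi)$ the coefficients with respect to $\sqrt{2/\pi}\sin(nt)$ work out (by hand) to $b_n = (-1)^{n+1}\sqrt{2\pi}/n$, so $\sum_n b_n^2$ must equal $\|f\|^2 = \pi^3/3$. The Python sketch below (our own numerical illustration) checks this.

```python
import math

# Parseval check for {sqrt(2/pi) sin(nt)} in L^2(0, pi) with f(t) = t:
# b_n = <f, sqrt(2/pi) sin(n.)> = (-1)^(n+1) sqrt(2*pi)/n, so b_n^2 = 2*pi/n^2,
# and completeness forces  sum b_n^2 = ||f||^2 = int_0^pi t^2 dt = pi^3/3.
norm_sq = math.pi ** 3 / 3
partial = sum(2 * math.pi / n ** 2 for n in range(1, 100001))  # sum of b_n^2
print(abs(partial - norm_sq) / norm_sq < 1e-4)  # True: Parseval holds
```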

17.3.1 Initial Boundary Value Problems

(a) The Homogeneous Heat Equation, Periodic Boundary Conditions

We consider heat conduction in a closed wire loop of length $2\pi$. Let $u(x,t)$ be the temperature of the wire at position $x$ and time $t$. Since the wire is a closed loop, $u(x,t) = u(x+2\pi,t)$; $u$ is thought of as a $2\pi$-periodic function on $\mathbb{R}$ for every fixed $t$. Thus we have the following periodic boundary conditions:
$$u(0,t) = u(2\pi,t),\qquad u_x(0,t) = u_x(2\pi,t),\qquad t\in\mathbb{R}_+. \tag{PBC}$$
The initial temperature distribution at time $t = 0$ is given, so that the BIVP reads

$$u_t - a^2u_{xx} = 0,\qquad x\in\mathbb{R},\ t>0,$$
$$u(x,0) = u_0(x),\qquad x\in\mathbb{R}, \tag{17.20}$$
together with (PBC).

Separation of variables. We seek solutions of the form
$$u(x,t) = f(x)\cdot g(t),$$
ignoring the initial condition for a while. The heat equation then takes the form
$$f(x)g'(t) = a^2f''(x)g(t)\quad\Longleftrightarrow\quad \frac{1}{a^2}\frac{g'(t)}{g(t)} = \frac{f''(x)}{f(x)} = \kappa = \mathrm{const.}$$
We obtain a system of two independent ODEs, coupled only through $\kappa$:
$$f''(x) - \kappa f(x) = 0,\qquad g'(t) - a^2\kappa g(t) = 0.$$

The periodic boundary conditions imply
$$f(0)g(t) = f(2\pi)g(t),\qquad f'(0)g(t) = f'(2\pi)g(t),$$
which for nontrivial $g$ gives $f(0) = f(2\pi)$ and $f'(0) = f'(2\pi)$. In case $\kappa = 0$, $f''(x) = 0$ has the general solution $f(x) = ax + b$. The only periodic solution is $f(x) = b = \mathrm{const.}$ Suppose now that $\kappa = -\nu^2 < 0$. Then the general solution of the second order ODE is
$$f(x) = c_1\cos(\nu x) + c_2\sin(\nu x).$$
Since $f$ is periodic with period $2\pi$, only a discrete set of values $\nu$ is possible, namely $\nu_n = n$, $n\in\mathbb{Z}$. This implies $\kappa_n = -n^2$, $n\in\mathbb{N}$. Finally, in case $\kappa = \nu^2 > 0$, the general solution
$$f(x) = c_1e^{\nu x} + c_2e^{-\nu x}$$
provides no periodic solutions $f$. So far we have obtained a set of solutions
$$f_n(x) = a_n\cos(nx) + b_n\sin(nx),\qquad n\in\mathbb{N}_0,$$
corresponding to $\kappa = -n^2$. The ODE for $g_n(t)$ now reads
$$g_n'(t) + a^2n^2g_n(t) = 0.$$

Its solution is $g_n(t) = c\,e^{-a^2n^2t}$, $n\in\mathbb{N}$. Hence, the solutions of the boundary value problem are given by
$$u_n(x,t) = e^{-a^2n^2t}\bigl(a_n\cos(nx) + b_n\sin(nx)\bigr),\quad n\in\mathbb{N},\qquad u_0(x,t) = \frac{a_0}{2},$$
and by finite or “infinite” linear combinations
$$u(x,t) = \frac{a_0}{2} + \sum_{n=1}^\infty e^{-a^2n^2t}\bigl(a_n\cos(nx) + b_n\sin(nx)\bigr). \tag{17.21}$$
Consider now the initial value, that is, $t = 0$. The corresponding series is
$$u(x,0) = \frac{a_0}{2} + \sum_{n=1}^\infty a_n\cos(nx) + b_n\sin(nx), \tag{17.22}$$
which is the ordinary Fourier series of $u_0(x)$. That is, the Fourier coefficients $a_n$ and $b_n$ of the initial function $u_0(x)$ formally give a solution $u(x,t)$.

1. If the Fourier series of $u_0$ converges pointwise to $u_0$, the initial condition is satisfied by the function $u(x,t)$ given in (17.21).
2. If the Fourier series of $u_0$ is twice differentiable (with respect to $x$), so is the function $u(x,t)$ given by (17.21).
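The construction can be checked end to end for a trigonometric-polynomial initial datum, where the exact solution is known. The Python sketch below (our own illustration; the periodic trapezoidal rule used for the coefficients is exact for trigonometric polynomials) computes the Fourier coefficients of $u_0$, builds the truncated series (17.21), and compares it with the closed form.

```python
import math

def fourier_coeffs(u0, N, M=512):
    """Fourier coefficients a_n, b_n of a 2*pi-periodic u0 via the periodic
    trapezoidal rule on [0, 2*pi] (exact here for trig polynomials)."""
    xs = [2 * math.pi * j / M for j in range(M)]
    a = [2 / M * sum(u0(x) * math.cos(n * x) for x in xs) for n in range(N + 1)]
    b = [2 / M * sum(u0(x) * math.sin(n * x) for x in xs) for n in range(N + 1)]
    return a, b

def heat_series(u0, x, t, a_diff=1.0, N=8):
    """Truncated series (17.21): u = a_0/2 + sum e^(-a^2 n^2 t)(a_n cos nx + b_n sin nx)."""
    a, b = fourier_coeffs(u0, N)
    return a[0] / 2 + sum(
        math.exp(-a_diff ** 2 * n ** 2 * t) * (a[n] * math.cos(n * x) + b[n] * math.sin(n * x))
        for n in range(1, N + 1))

u0 = lambda x: 3 + math.cos(x) - 0.5 * math.sin(2 * x)
x, t = 1.0, 0.3   # exact solution: 3 + e^(-t) cos x - 0.5 e^(-4t) sin 2x  (a = 1)
exact = 3 + math.exp(-t) * math.cos(x) - 0.5 * math.exp(-4 * t) * math.sin(2 * x)
print(abs(heat_series(u0, x, t) - exact) < 1e-10)  # True
```

Note how each mode decays at its own rate $e^{-a^2n^2t}$, so high frequencies in the initial temperature are damped almost immediately.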

Lemma 17.15 Consider the BIVP (17.20).
(a) Existence. Suppose that $u_0\in C^4(\mathbb{R})$ is periodic. Then the function $u(x,t)$ given by (17.21) is in $C^{2,1}_{x,t}([0,2\pi]\times\mathbb{R}_+)$ and solves the classical BIVP (17.20).
(b) Uniqueness and Stability. In the class of functions $C^{2,1}_{x,t}([0,2\pi]\times\mathbb{R}_+)$ the solution of the above BIVP is unique.
Proof. (a) The Fourier coefficients of $u_0^{(4)}$ are bounded, so that the Fourier coefficients of $u_0$ have growth $1/n^4$ (integrate the Fourier series of $u_0^{(4)}$ four times). Then the series for $u_{xx}(x,t)$ and $u_t(x,t)$ are both dominated by the series $\sum_{n=1}^\infty\frac{1}{n^2}$; hence they converge uniformly. This shows that the series $u(x,t)$ can be differentiated term by term twice w.r.t. $x$ and once w.r.t. $t$.
(b) For any fixed $t\ge 0$, $u(x,t)$ is continuous in $x$. Consider $v(t) := \|u(x,t)\|^2_{L^2(0,2\pi)}$. Then
$$v'(t) = \frac{d}{dt}\int_0^{2\pi}u(x,t)^2\,dx = 2\int_0^{2\pi}u(x,t)u_t(x,t)\,dx = 2\int_0^{2\pi}u(x,t)\,a^2u_{xx}(x,t)\,dx$$
$$= 2\left(a^2uu_x\big|_0^{2\pi} - a^2\int_0^{2\pi}(u_x(x,t))^2\,dx\right) = -2a^2\int_0^{2\pi}u_x^2\,dx \le 0,$$
since the boundary term vanishes by the periodic boundary conditions.

This shows that $v(t)$ is monotonically decreasing in $t$. Suppose now that $u_1$ and $u_2$ both solve the BIVP in the given class. Then $u = u_1 - u_2$ solves the BIVP in this class with homogeneous initial values, that is, $u(x,0) = 0$; hence $v(0) = \|u(x,0)\|^2_{L^2} = 0$. Since $v(t)$ is decreasing for $t\ge 0$ and non-negative, $v(t) = 0$ for all $t\ge 0$; hence $u(x,t) = 0$ in $L^2(0,2\pi)$ for all $t\ge 0$. Since $u(x,t)$ is continuous in $x$, this implies $u(x,t) = 0$ for all $x$ and $t$. Thus $u_1(x,t) = u_2(x,t)$; the solution is unique.
Stability. Since $v(t)$ is decreasing,
$$\sup_{t\in\mathbb{R}_+}\|u(x,t)\|_{L^2(0,2\pi)} \le \|u_0\|_{L^2(0,2\pi)}.$$
This shows that small changes in the initial condition $u_0$ imply small changes in the solution $u(x,t)$. The problem is well-posed.

(b) The Inhomogeneous Heat Equation, Periodic Boundary Conditions

We study the IBVP
$$u_t - a^2u_{xx} = f(x,t),$$
$$u(x,0) = 0, \tag{17.23}$$
together with (PBC).

Solution. Let $e_n(x) = e^{inx}/\sqrt{2\pi}$, $n\in\mathbb{Z}$, be the CNOS in $L^2(0,2\pi)$. These functions are all eigenfunctions of the differential operator $\frac{d^2}{dx^2}$: $e_n''(x) = -n^2e_n(x)$. Let $t>0$ be fixed and let
$$f(x,t)\sim\sum_{n\in\mathbb{Z}}c_n(t)\,e_n(x)$$
be the Fourier series of $f(x,t)$ with coefficients $c_n(t)$. For $u$ we try the following ansatz:
$$u(x,t)\sim\sum_{n\in\mathbb{Z}}d_n(t)\,e_n(x). \tag{17.24}$$
If $f(x,t)$ is continuous in $x$ and piecewise continuously differentiable with respect to $x$, its Fourier series converges pointwise, and we have
$$u_t - a^2u_{xx} = \sum_{n\in\mathbb{Z}}\bigl(d_n'(t) + a^2n^2d_n(t)\bigr)e_n = f(x,t) = \sum_{n\in\mathbb{Z}}c_n(t)e_n.$$
For each $n$ this is an ODE in $t$:
$$d_n'(t) + a^2n^2d_n(t) = c_n(t),\qquad d_n(0) = 0.$$
From ODE theory the solution is well known:
$$d_n(t) = e^{-a^2n^2t}\int_0^t e^{a^2n^2s}c_n(s)\,ds.$$
Under certain regularity and growth conditions on $f$, (17.24) solves the inhomogeneous IBVP.
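The variation-of-constants formula for $d_n$ is easy to verify numerically. The Python sketch below (our own illustration; the constant forcing $c_n\equiv 1$ is a convenient test case) evaluates the integral by the trapezoidal rule, compares it with the closed form $(1-e^{-\lambda t})/\lambda$, and checks the ODE residual with a central difference.

```python
import math

def d_n(t, c, lam, m=2000):
    """d_n(t) = e^(-lam*t) int_0^t e^(lam*s) c(s) ds with lam = a^2 n^2,
    evaluated by the composite trapezoidal rule."""
    h = t / m
    s = 0.5 * (c(0.0) + math.exp(lam * t) * c(t))
    s += sum(math.exp(lam * k * h) * c(k * h) for k in range(1, m))
    return math.exp(-lam * t) * s * h

lam, t = 4.0, 0.8            # lam = a^2 n^2 with a = 1, n = 2
c = lambda s: 1.0            # constant forcing coefficient c_n(s) = 1
closed_form = (1 - math.exp(-lam * t)) / lam
print(abs(d_n(t, c, lam) - closed_form) < 1e-6)  # True

# The ODE d' + lam*d = c is satisfied (central difference for d'):
h = 1e-4
deriv = (d_n(t + h, c, lam) - d_n(t - h, c, lam)) / (2 * h)
print(abs(deriv + lam * d_n(t, c, lam) - c(t)) < 1e-4)  # True
```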

(c) The Homogeneous Wave Equation with Dirichlet Conditions

Consider the initial boundary value problem for the vibrating string of length $\pi$:
$$\text{(E)}\quad u_{tt} - a^2u_{xx} = 0,\qquad 0<x<\pi,\ t>0;$$
$$\text{(BC)}\quad u(0,t) = u(\pi,t) = 0;$$
$$\text{(IC)}\quad u(x,0) = \varphi(x),\quad u_t(x,0) = \psi(x),\qquad 0<x<\pi.$$

The ansatz $u(x,t) = f(x)g(t)$ yields
$$\frac{f''}{f} = \kappa = \frac{g''}{a^2g},\qquad f''(x) = \kappa f(x),\quad g'' = \kappa a^2g.$$
The boundary conditions imply $f(0) = f(\pi) = 0$. Hence, the first ODE has the only solutions
$$f_n(x) = c_n\sin(nx),\qquad \kappa_n = -n^2,\quad n\in\mathbb{N}.$$
The corresponding ODEs for $g$ then read
$$g_n'' + n^2a^2g_n = 0,$$
with general solution $a_n\cos(nat) + b_n\sin(nat)$. Hence,
$$u(x,t) = \sum_{n=1}^\infty\bigl(a_n\cos(nat) + b_n\sin(nat)\bigr)\sin(nx)$$
solves the boundary value problem in the sense of $\mathcal{D}'(\mathbb{R}^2)$ (choose any $a_n$, $b_n$ of polynomial growth). Now insert the initial conditions, $t = 0$:
$$u(x,0) = \sum_{n=1}^\infty a_n\sin(nx) \overset{!}{=} \varphi(x),\qquad u_t(x,0) = \sum_{n=1}^\infty na\,b_n\sin(nx) \overset{!}{=} \psi(x).$$
Since $\{\sqrt{2/\pi}\,\sin(nx)\mid n\in\mathbb{N}\}$ is a CNOS in $L^2(0,\pi)$, we can determine the Fourier coefficients of $\varphi$ and $\psi$ with respect to this CNOS and obtain $a_n$ and $b_n$, respectively.
Regularity. Suppose that $\varphi\in C^4([0,\pi])$ and $\psi\in C^3([0,\pi])$. Then the Fourier sine coefficients $a_n$ and $na\,b_n$ of $\varphi$ and $\psi$ have growth $1/n^4$ and $1/n^3$, respectively. Hence, the series
$$u(x,t) = \sum_{n=1}^\infty\bigl(a_n\cos(nat) + b_n\sin(nat)\bigr)\sin(nx) \tag{17.25}$$
can be differentiated twice with respect to $x$ or $t$, since the differentiated series have a summable upper bound $\sum c/n^2$. Hence, (17.25) solves the IBVP.

(d) The Wave Equation with Inhomogeneous Boundary Conditions

Consider the following problem in $\Omega\subset\mathbb{R}^n$:
$$u_{tt} - a^2\Delta u = 0,$$
$$u(x,0) = u_t(x,0) = 0,$$
$$u|_{\partial\Omega} = w(x,t).$$
Idea. Find an extension $v(x,t)$ of $w(x,t)$ with $v\in C^2(\overline\Omega\times\mathbb{R}_+)$, and look for the function $\tilde u = u - v$. Then $\tilde u$ has homogeneous boundary conditions and satisfies the IBVP

$$\tilde u_{tt} - a^2\Delta\tilde u = -v_{tt} + a^2\Delta v,$$
$$\tilde u(x,0) = -v(x,0),\qquad \tilde u_t(x,0) = -v_t(x,0),$$
$$\tilde u|_{\partial\Omega} = 0.$$
This problem can be split into two problems, one with zero initial conditions and one with a homogeneous wave equation.
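Returning to the vibrating string of part (c), the series solution (17.25) can be tested numerically. The Python sketch below (our own illustration) uses the plucked-string profile $\varphi(x) = x(\pi - x)$, $\psi = 0$, whose sine coefficients $a_n = 8/(\pi n^3)$ for odd $n$ (and $0$ for even $n$) were computed by hand, and checks the initial condition as well as the time periodicity of the string.

```python
import math

def string_solution(phi_coeffs, x, t, a=1.0):
    """Truncated series (17.25) with b_n = 0 (string released from rest):
    u(x,t) = sum a_n cos(n a t) sin(n x)."""
    return sum(c * math.cos(n * a * t) * math.sin(n * x)
               for n, c in phi_coeffs)

# Plucked-string profile phi(x) = x(pi - x), psi = 0; sine coefficients
# a_n = 8/(pi n^3) for odd n, 0 for even n (computed by hand).
coeffs = [(n, 8 / (math.pi * n ** 3)) for n in range(1, 200, 2)]

x = 1.1
phi = x * (math.pi - x)
print(abs(string_solution(coeffs, x, 0.0) - phi) < 1e-4)  # initial condition holds
# With a = 1 the motion is 2*pi-periodic in t: u(x, t + 2*pi) = u(x, t).
print(abs(string_solution(coeffs, x, 0.7)
          - string_solution(coeffs, x, 0.7 + 2 * math.pi)) < 1e-9)  # True
```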

17.3.2 Eigenvalue Problems for the Laplace Equation

In the previous subsection we have seen that IBVPs treated with Fourier's method often lead to boundary eigenvalue problems (BEVP) for the Laplace equation. We formulate the problems. Let $n = 1$ and $\Omega = (0,l)$. One considers the following types of BEVPs for the Laplace equation, $f'' = \lambda f$:

• Dirichlet boundary conditions: $f(0) = f(l) = 0$.
• Neumann boundary conditions: $f'(0) = f'(l) = 0$.

• Periodic boundary conditions: $f(0) = f(l)$, $f'(0) = f'(l)$.
• Mixed boundary conditions: $\alpha_1f(0) + \alpha_2f'(0) = 0$, $\beta_1f(l) + \beta_2f'(l) = 0$.
• Symmetric boundary conditions: if $u$ and $v$ are functions satisfying these boundary conditions, then $(u'v - uv')|_0^l = 0$. In this case integration by parts gives
$$\int_0^l(u''v - uv'')\,dx = (u'v - v'u)\big|_0^l - \int_0^l(u'v' - v'u')\,dx = 0.$$
That is, $\int_0^l u''v\,dx = \int_0^l uv''\,dx$ and the Laplace operator becomes symmetric.

Proposition 17.16 Let $\Omega\subset\mathbb{R}^n$. The BEVP with Dirichlet conditions
$$\Delta u = \lambda u,\qquad u|_{\partial\Omega} = 0,\qquad u\in C^2(\Omega)\cap C^1(\overline\Omega) \tag{17.26}$$
has countably many eigenvalues $\lambda_k$. All eigenvalues are negative and of finite multiplicity. Let $0 > \lambda_1 > \lambda_2 > \cdots$; then the sequence $(1/\lambda_k)$ tends to $0$. The eigenfunctions $u_k$ corresponding to $\lambda_k$ form a CNOS in $L^2(\Omega)$.
Sketch of proof. (a) Let $H = L^2(\Omega)$. We use Green's first formula with $v = u$, $u|_{\partial\Omega} = 0$,
$$\int_\Omega u\,\Delta u\,dx + \int_\Omega(\nabla u)^2\,dx = 0,$$
to show that all eigenvalues of $\Delta$ are negative. Let $\Delta u = \lambda u$. First note that $\lambda = 0$ is not an eigenvalue of $\Delta$: suppose to the contrary that $\Delta u = 0$, that is, $u$ is harmonic. Since $u|_{\partial\Omega} = 0$, by the uniqueness theorem for the Dirichlet problem, $u = 0$ in $\Omega$. Then

$$\lambda\|u\|^2 = \lambda\langle u,u\rangle = \langle\lambda u,u\rangle = \langle\Delta u,u\rangle = \int_\Omega u\,\Delta u\,dx = -\int_\Omega(\nabla u)^2\,dx < 0.$$
Hence, $\lambda$ is negative.
(b) Assume that a Green's function $G$ for $\Omega$ exists. By (17.34), that is,
$$u(y) = \int_\Omega G(x,y)\,\Delta u(x)\,dx + \int_{\partial\Omega}u(x)\frac{\partial G(x,y)}{\partial\vec n_x}\,dS(x),$$
the condition $u|_{\partial\Omega} = 0$ implies
$$u(y) = \int_\Omega G(x,y)\,\Delta u(x)\,dx.$$
This shows that the integral operator $A\colon L^2(\Omega)\to L^2(\Omega)$ defined by
$$(Av)(y) := \int_\Omega G(x,y)v(x)\,dx$$
is inverse to the Laplacian. Since $G(x,y) = G(y,x)$ is real, $A$ is self-adjoint. By (a), its eigenvalues $1/\lambda_k$ are all negative. If
$$\iint_{\Omega\times\Omega}|G(x,y)|^2\,dx\,dy < \infty,$$

then $A$ is a compact operator.

We want to justify the last statement. Let $(Kf)(x) = \int_\Omega k(x,y)f(y)\,dy$ be an integral operator on $H = L^2(\Omega)$ with kernel $k(x,y)\in\mathcal{H} = L^2(\Omega\times\Omega)$. Let $\{u_n\mid n\in\mathbb{N}\}$ be a CNOS in $H$; then $\{u_n(x)u_m(y)\mid n,m\in\mathbb{N}\}$ is a CNOS in $\mathcal{H}$. Let $k_{nm}$ be the Fourier coefficients of $k$ with respect to the basis $\{u_n(x)u_m(y)\}$ of $\mathcal{H}$. Then
$$(Kf)(x) = \int_\Omega f(y)\sum_{n,m}k_{nm}u_n(x)u_m(y)\,dy = \sum_nu_n(x)\sum_mk_{nm}\int_\Omega u_m(y)f(y)\,dy = \sum_nu_n(x)\sum_mk_{nm}\langle f,u_m\rangle = \sum_{n,m}k_{nm}\langle f,u_m\rangle u_n.$$
This in particular shows that

$$\|Kf\|^2 = \sum_n\Bigl(\sum_mk_{nm}\langle f,u_m\rangle\Bigr)^2 \le \sum_{m,n}k_{nm}^2\,\|f\|^2 = \|f\|^2\iint_{\Omega\times\Omega}k(x,y)^2\,dx\,dy = \|f\|^2\sum_n\|Ku_n\|^2,$$
where the inequality follows from Cauchy–Schwarz, and hence
$$\|K\|^2 \le \sum_n\|Ku_n\|^2.$$

n K f = k f,u u n rm h mi r m r=1 X X of finite rank operators. Indeed,

∞ ∞ 2 2 2 2 2 (K Kn)f = krm f,um sup krm f k − k |h i| ≤ m k k m r=n+1 r=n+1 X X X such that ∞ 2 2 K Kn = sup krm 0 k − k m −→ r=n+1 X as n . Hence, K is compact. →∞ (c) By (a) and (b), A is a negative, compact, self-adjoint operator. By the spectral theorem for compact self-adjoint operators, Theorem 13.33, there exists an NOS (uk) of eigenfunctions to

1/λk of A. The NOS (uk) is complete since 0 is not an eigenvalue of A.

Example 17.1 Dirichlet Conditions on the Square. Let $Q = (0,\pi)\times(0,\pi)\subset\mathbb{R}^2$. The Laplace operator with Dirichlet boundary conditions on $Q$ has eigenfunctions
$$u_{mn}(x,y) = \frac{2}{\pi}\sin(mx)\sin(ny),$$
corresponding to the eigenvalues $\lambda_{mn} = -(m^2 + n^2)$. The eigenfunctions $\{u_{mn}\mid m,n\in\mathbb{N}\}$ form a CNOS in the Hilbert space $L^2(Q)$.
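The eigenvalue relation $\Delta u_{mn} = -(m^2 + n^2)u_{mn}$ can be verified numerically with a five-point finite-difference Laplacian. The following Python sketch (our own illustration) does this for $m = 2$, $n = 3$ at an interior point of $Q$.

```python
import math

m, n = 2, 3

def u(x, y):
    """Eigenfunction u_mn(x, y) = (2/pi) sin(mx) sin(ny) on Q = (0, pi)^2."""
    return 2 / math.pi * math.sin(m * x) * math.sin(n * y)

def laplace_fd(f, x, y, h=1e-4):
    """Five-point central-difference approximation of the Laplacian at (x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4 * f(x, y)) / h ** 2

x, y = 0.8, 1.9            # interior point of Q
lam = -(m ** 2 + n ** 2)   # expected eigenvalue: lambda_23 = -13
print(abs(laplace_fd(u, x, y) - lam * u(x, y)) < 1e-5)  # True
```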

Example 17.2 Dirichlet Conditions on the Ball $U_1(0)$ in $\mathbb{R}^2$. We consider the BEVP with Dirichlet boundary conditions on the ball:
$$-\Delta u = \lambda u,\qquad u|_{S_1(0)} = 0.$$
In polar coordinates $u(x,y) = \tilde u(r,\varphi)$ this reads
$$\Delta\tilde u(r,\varphi) = \frac{1}{r}\frac{\partial}{\partial r}(r\tilde u_r) + \frac{1}{r^2}\tilde u_{\varphi\varphi} = -\lambda\tilde u,\qquad 0<r<1.$$
The separation ansatz $\tilde u(r,\varphi) = R(r)\Phi(\varphi)$ leads to the two problems
$$\Phi'' + \mu\Phi = 0,\qquad \Phi(0) = \Phi(2\pi);$$
$$r^2R'' + rR' + (\lambda r^2 - \mu)R = 0,\qquad |R(0)| < \infty,\quad R(1) = 0. \tag{17.27}$$

The eigenvalues and eigenfunctions of the first problem are
$$\mu_k = k^2,\qquad \Phi_k(\varphi) = e^{ik\varphi},\qquad k\in\mathbb{Z}.$$
Equation (17.27) is the Bessel ODE. For $\mu = k^2$ the solution of (17.27) bounded at $r = 0$ is given by the Bessel function $J_k(r\sqrt{\lambda})$. Recall from homework 21.2 that

$$J_k(x) = \sum_{n=0}^\infty\frac{(-1)^n}{n!\,(n+k)!}\left(\frac{x}{2}\right)^{2n+k},\qquad k\in\mathbb{N}_0.$$
To determine the eigenvalues $\lambda$ we use the boundary condition $R(1) = 0$ in (17.27), namely
$$J_k(\sqrt{\lambda}) = 0.$$
Hence $\sqrt{\lambda} = \mu_{kj}$, where $\mu_{kj}$, $j = 1,2,\dots$, denote the positive zeros of $J_k$. We obtain
$$\lambda_{kj} = \mu_{kj}^2,\qquad R_{kj}(r) = J_k(\mu_{kj}r),\qquad j = 1,2,\dots$$
The solution of the BEVP is

$$\lambda_{kj} = \mu_{|k|j}^2,\qquad u_{kj}(x) = J_{|k|}(\mu_{|k|j}r)\,e^{ik\varphi},\qquad k\in\mathbb{Z},\ j = 1,2,\dots$$

Note that the Bessel functions $\{J_k(\mu_{kj}r)\mid j\in\mathbb{Z}_+\}$ (for each fixed $k$) and the system $\{e^{ikt}\mid k\in\mathbb{Z}\}$ form a complete OS in $L^2((0,1),r\,dr)$ and in $L^2(0,2\pi)$, respectively. Hence, the OS $\{u_{kl}\mid k\in\mathbb{Z},\ l\in\mathbb{Z}_+\}$ is a complete OS in $L^2(U_1(0))$. Thus, there are no further solutions to the given BEVP. For more details on Bessel functions, see [FK98, p. 383].
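The lowest Dirichlet eigenvalue of the disc is $\lambda_{01} = \mu_{01}^2$, where $\mu_{01}\approx 2.4048$ is the first positive zero of $J_0$. The Python sketch below (our own illustration) evaluates $J_0$ via the series from homework 21.2 and locates this zero by bisection.

```python
import math

def bessel_j(k, x, terms=40):
    """Series from homework 21.2: J_k(x) = sum (-1)^n/(n!(n+k)!) (x/2)^(2n+k)."""
    return sum((-1) ** n / (math.factorial(n) * math.factorial(n + k))
               * (x / 2) ** (2 * n + k) for n in range(terms))

# J_0 changes sign on [2, 3]; bisect to find the first positive zero mu_01.
lo, hi = 2.0, 3.0
for _ in range(60):
    mid = (lo + hi) / 2
    if bessel_j(0, lo) * bessel_j(0, mid) <= 0:
        hi = mid
    else:
        lo = mid
mu_01 = (lo + hi) / 2
print(abs(mu_01 - 2.404825557695773) < 1e-9)  # first zero of J_0
```

The corresponding lowest eigenvalue is then $\lambda_{01} = \mu_{01}^2 \approx 5.78$.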

17.4 Boundary Value Problems for the Laplace and the Poisson Equations

Throughout this section (if nothing is stated otherwise) we will assume that $\Omega$ is a bounded region in $\mathbb{R}^n$, $n\ge 2$. We suppose further that $\Omega$ belongs to the class $C^2$, that is, the boundary $\partial\Omega$ consists of finitely many twice continuously differentiable hypersurfaces; $\Omega' := \mathbb{R}^n\setminus\overline\Omega$ is assumed to be connected (i.e. it is a region, too). All functions are assumed to be real valued.

17.4.1 Formulation of Boundary Value Problems

(a) The Inner Dirichlet Problem:

Given $\varphi\in C(\partial\Omega)$ and $f\in C(\Omega)$, find $u\in C(\overline\Omega)\cap C^2(\Omega)$ such that
$$\Delta u(x) = f(x)\quad\forall x\in\Omega,\qquad u(y) = \varphi(y)\quad\forall y\in\partial\Omega.$$

(b) The Exterior Dirichlet Problem:

Given $\varphi\in C(\partial\Omega)$ and $f\in C(\Omega')$, find $u\in C(\overline{\Omega'})\cap C^2(\Omega')$ such that
$$\Delta u(x) = f(x)\quad\forall x\in\Omega',\qquad u(y) = \varphi(y)\quad\forall y\in\partial\Omega,\qquad \lim_{|x|\to\infty}u(x) = 0.$$

(c) The Inner Neumann Problem:

Given $\varphi\in C(\partial\Omega)$ and $f\in C(\Omega)$, find $u\in C^1(\overline\Omega)\cap C^2(\Omega)$ such that
$$\Delta u(x) = f(x)\quad\forall x\in\Omega,\qquad \frac{\partial u}{\partial\vec n_-}(y) = \varphi(y)\quad\forall y\in\partial\Omega.$$

Here $\frac{\partial u}{\partial\vec n_-}(y)$ denotes the limit of the directional derivative
$$\frac{\partial u}{\partial\vec n_-}(y) = \lim_{t\to 0+0}\vec n(y)\cdot\operatorname{grad}u(y - t\vec n(y)),$$
where $\vec n(y)$ is the outer normal to $\Omega$ at $y\in\partial\Omega$. That is, $x\in\Omega$ approaches $y\in\partial\Omega$ in the direction of the normal vector $\vec n(y)$. We assume that this limit exists for all boundary points $y\in\partial\Omega$.

(d) The Exterior Neumann Problem:

Given $\varphi\in C(\partial\Omega)$ and $f\in C(\Omega')$, find $u\in C^1(\overline{\Omega'})\cap C^2(\Omega')$ such that
$$\Delta u(x) = f(x)\quad\forall x\in\Omega',\qquad \frac{\partial u}{\partial\vec n_+}(y) = \varphi(y)\quad\forall y\in\partial\Omega,\qquad \lim_{|x|\to\infty}u(x) = 0.$$
Here $\frac{\partial u}{\partial\vec n_+}(y)$ denotes the limit of the directional derivative
$$\frac{\partial u}{\partial\vec n_+}(y) = \lim_{t\to 0+0}\vec n(y)\cdot\operatorname{grad}u(y + t\vec n(y)),$$
where $\vec n(y)$ is the outer normal to $\Omega$ at $y\in\partial\Omega$. We assume that this limit exists for all boundary points $y\in\partial\Omega$. In both Neumann problems one can also look for a function $u\in C^2(\Omega)\cap C(\overline\Omega)$ or $u\in C^2(\Omega')\cap C(\overline{\Omega'})$, respectively, provided the above limits exist and define continuous functions on the boundary. These four problems are intimately connected with each other, and we will obtain solutions to all of them simultaneously.

17.4.2 Basic Properties of Harmonic Functions

Recall that a function $u\in C^2(\Omega)$ is said to be harmonic if $\Delta u = 0$ in $\Omega$.
We say that an operator $L$ on a function space $V$ over $\mathbb{R}^n$ is invariant under an affine transformation $T$, $T(x) = A(x) + b$, where $A\in L(\mathbb{R}^n)$, $b\in\mathbb{R}^n$, if
$$L\circ T^* = T^*\circ L,$$
where $T^*\colon V\to V$ is given by $(T^*f)(x) = f(T(x))$, $f\in V$. It follows that the Laplacian is invariant under translations ($T(x) = x + b$) and rotations $T$ (i.e. $T^\top T = TT^\top = I$). Indeed, for translations the matrix $B$ with $\tilde A = BAB^\top$ is the identity, and in case of a rotation, $B = T^{-1}$. Since $A = I$, in both cases $\tilde A = A = I$; the Laplacian is invariant.
In this section we assume that $\Omega\subset\mathbb{R}^n$ is a region where Gauß' divergence theorem is valid for all vector fields $f\in C^1(\Omega)\cap C(\overline\Omega)$:
$$\int_\Omega\operatorname{div}f(x)\,dx = \int_{\partial\Omega}f(y)\cdot d\vec S(y),$$
where the dot $\cdot$ denotes the inner product in $\mathbb{R}^n$. The term under the integral can be written as
$$\omega(y) = f(y)\cdot d\vec S = (f_1(y),\dots,f_n(y))\cdot\bigl(dy_2\wedge dy_3\wedge\dots\wedge dy_n,\ -dy_1\wedge dy_3\wedge\dots\wedge dy_n,\ \dots,\ (-1)^{n-1}dy_1\wedge\dots\wedge dy_{n-1}\bigr)$$
$$= \sum_{k=1}^n(-1)^{k-1}f_k(y)\,dy_1\wedge\dots\wedge\widehat{dy_k}\wedge\dots\wedge dy_n,$$
where the hat means omission of that factor. In this way $\omega(y)$ becomes a differential $(n-1)$-form. Using differentiation of forms, see Definition 11.7, we obtain

\[
d\omega = \operatorname{div} f(y)\, dy_1\wedge dy_2\wedge\cdots\wedge dy_n .
\]

This establishes the above generalized form of Gauß' divergence theorem. For a continuous scalar function $U\colon \partial\Omega \to \mathbb{R}$ one can define $\int_{\partial\Omega} U(y)\, dS(y) := \int_{\partial\Omega} U(y)\,\vec n(y)\cdot d\vec S(y)$, where $\vec n$ is the outer unit normal vector to the surface $\partial\Omega$. Recall that we obtain Green's first formula by inserting $f(x) = v(x)\,\nabla u(x)$, $u, v \in C^2(\Omega)$:
\[
\int_\Omega v(x)\,\Delta u(x)\, dx + \int_\Omega \nabla u(x)\cdot\nabla v(x)\, dx
= \int_{\partial\Omega} v(y)\,\frac{\partial u}{\partial\vec n}(y)\, dS(y).
\]
Interchanging the roles of $u$ and $v$ and taking the difference, we obtain Green's second formula
\[
\int_\Omega \bigl( v(x)\,\Delta u(x) - u(x)\,\Delta v(x) \bigr)\, dx
= \int_{\partial\Omega} \Bigl( v(y)\,\frac{\partial u}{\partial\vec n}(y) - u(y)\,\frac{\partial v}{\partial\vec n}(y) \Bigr)\, dS(y). \tag{17.28}
\]
Recall that
\[
E_2(x) = \frac{1}{2\pi}\,\log\|x\|, \qquad
E_n(x) = -\frac{1}{(n-2)\,\omega_n}\,\|x\|^{-n+2}, \quad n \ge 3,
\]
are the fundamental solutions of the Laplacian in $\mathbb{R}^n$.

Theorem 17.17 (Green's representation formula) Let $u \in C^2(\overline\Omega)$. Then for $x \in \Omega$ we have
\[
u(x) = \int_\Omega E_n(x-y)\,\Delta u(y)\, dy
+ \int_{\partial\Omega} \Bigl( u(y)\,\frac{\partial E_n}{\partial\vec n_y}(x-y) - E_n(x-y)\,\frac{\partial u}{\partial\vec n}(y) \Bigr)\, dS(y). \tag{17.29}
\]

Here $\frac{\partial}{\partial\vec n_y}$ denotes the derivative in the direction of the outer normal with respect to the variable $y$. Note that the distributions $\{\Delta u\}$, $\frac{\partial u}{\partial\vec n}\,\delta_{\partial\Omega}$, and $\frac{\partial}{\partial\vec n}(u\,\delta_{\partial\Omega})$ have compact support, such that the convolution products with $E_n$ exist.

Proof (idea). For sufficiently small $\varepsilon > 0$, $\overline{U_\varepsilon(x)} \subset \Omega$, since $\Omega$ is open. We apply Green's second formula with $v(y) = E_n(x-y)$ and $\Omega\setminus U_\varepsilon(x)$ in place of $\Omega$. Since $E_n(x-y)$ is harmonic with respect to the variable $y$ in $\Omega\setminus\{x\}$ (recall from Example 7.5 that $E_n(x)$ is harmonic in $\mathbb{R}^n\setminus\{0\}$), we obtain
\[
\int_{\Omega\setminus U_\varepsilon(x)} E_n(x-y)\,\Delta u(y)\, dy
= \int_{\partial\Omega} \Bigl( E_n(x-y)\,\frac{\partial u}{\partial\vec n}(y) - u(y)\,\frac{\partial E_n(x-y)}{\partial\vec n_y} \Bigr)\, dS(y)
+ \int_{\partial U_\varepsilon(x)} \Bigl( E_n(x-y)\,\frac{\partial u}{\partial\vec n}(y) - u(y)\,\frac{\partial E_n(x-y)}{\partial\vec n_y} \Bigr)\, dS(y). \tag{17.30}
\]

In the second integral $\vec n$ denotes the outer normal to $\Omega\setminus U_\varepsilon(x)$, hence the inner normal of $U_\varepsilon(x)$. We wish to evaluate the limits of the individual integrals in this formula as $\varepsilon \to 0$. Consider the left-hand side of (17.30). Since $u \in C^2(\overline\Omega)$, $\Delta u$ is bounded; since $E_n(x-y)$ is locally integrable, the left-hand side converges to

\[
\int_\Omega E_n(x-y)\,\Delta u(y)\, dy.
\]
On $\partial U_\varepsilon(x)$ we have $E_n(x-y) = -\beta_n\,\varepsilon^{-n+2}$, $\beta_n = 1/(\omega_n(n-2))$. Thus as $\varepsilon \to 0$,

\[
\Bigl| \int_{\partial U_\varepsilon(x)} E_n(x-y)\,\frac{\partial u}{\partial\vec n}(y)\, dS \Bigr|
\le \frac{\beta_n}{\varepsilon^{n-2}} \int_{\partial U_\varepsilon(x)} \Bigl|\frac{\partial u}{\partial\vec n}(y)\Bigr|\, dS
\le \frac{\beta_n}{\varepsilon^{n-2}}\,\sup_{U_\varepsilon(x)}\Bigl|\frac{\partial u}{\partial\vec n}\Bigr| \int_{S_\varepsilon(x)} dS
\le \frac{1}{(n-2)\,\omega_n\,\varepsilon^{n-2}}\;\omega_n\,\varepsilon^{n-1}\,\sup_{U_\varepsilon(x)}\Bigl|\frac{\partial u}{\partial\vec n}\Bigr|
= C\varepsilon \longrightarrow 0 .
\]

Furthermore, since $\vec n$ is the interior normal of the ball $U_\varepsilon(x)$, the same calculation as in the proof of Theorem 17.1 shows that $\frac{\partial E_n(x-y)}{\partial\vec n_y} = \beta_n\,\frac{d}{d\varepsilon}\bigl(\varepsilon^{-n+2}\bigr) = -\varepsilon^{-n+1}/\omega_n$. We obtain

\[
-\int_{\partial U_\varepsilon(x)} u(y)\,\frac{\partial E_n(x-y)}{\partial\vec n_y}\, dS(y)
= \underbrace{\frac{1}{\omega_n\,\varepsilon^{n-1}} \int_{S_\varepsilon(x)} u(y)\, dS(y)}_{\text{spherical mean}}
\;\longrightarrow\; u(x).
\]

In the last line we used that the integral is the mean value of $u$ over the sphere $S_\varepsilon(x)$, and $u$ is continuous at $x$.

Remarks 17.4 (a) Green's representation formula is also true for functions $u \in C^2(\Omega)\cap C^1(\overline\Omega)$. To prove this, consider Green's representation theorem on smaller regions $\Omega_\varepsilon \subset \Omega$ such that $\overline{\Omega_\varepsilon} \subset \Omega$.
(b) Applying Green's representation formula to a test function $\varphi \in \mathcal D(\Omega)$, see Definition 16.1, with $\varphi(y) = \frac{\partial\varphi}{\partial\vec n}(y) = 0$, $y \in \partial\Omega$, we obtain
\[
\varphi(x) = \int_\Omega E_n(x-y)\,\Delta\varphi(y)\, dy.
\]
(c) We may now draw the following consequence from Green's representation formula: if one knows $\Delta u$, then $u$ is completely determined by its values and those of its normal derivative on $\partial\Omega$. In particular, a harmonic function on $\Omega$ can be reconstructed from its boundary data. One may ask conversely whether one can construct a harmonic function for arbitrarily given values of $u$ and $\frac{\partial u}{\partial\vec n}$ on $\partial\Omega$. Ignoring regularity conditions, we will find out that this is not possible in general. Roughly speaking, only one of these data is sufficient to describe $u$ completely.
(d) In case of a harmonic function $u \in C^2(\Omega)\cap C^1(\overline\Omega)$, $\Delta u = 0$, Green's representation formula reads ($n = 3$):
\[
u(x) = \frac{1}{4\pi} \int_{\partial\Omega}
\Bigl( \frac{1}{\|x-y\|}\,\frac{\partial u(y)}{\partial\vec n}
- u(y)\,\frac{\partial}{\partial\vec n_y}\,\frac{1}{\|x-y\|} \Bigr)\, dS(y). \tag{17.31}
\]

In particular, the surface potentials $V^{(0)}(x)$ and $V^{(1)}(x)$ can be differentiated arbitrarily often for $x \in \Omega$. Outside $\partial\Omega$, $V^{(0)}$ and $V^{(1)}$ are harmonic. It follows from (17.31) that any harmonic function is a $C^\infty$-function.

Spherical Means and Ball Means

First of all note that $u \in C^1(\overline\Omega)$ and $u$ harmonic in $\Omega$ implies
\[
\int_{\partial\Omega} \frac{\partial u(y)}{\partial\vec n}\, dS = 0. \tag{17.32}
\]
Indeed, this follows from Green's first formula inserting $v = 1$ and $u$ harmonic, $\Delta u = 0$.

Proposition 17.18 (Mean Value Property) Suppose that $u$ is harmonic in $U_R(x_0)$ and continuous in $\overline{U_R(x_0)}$.
(a) Then $u(x_0)$ coincides with its spherical mean over the sphere $S_R(x_0)$:
\[
u(x_0) = \frac{1}{\omega_n R^{n-1}} \int_{S_R(x_0)} u(y)\, dS(y) \qquad \text{(spherical mean)}. \tag{17.33}
\]
(b) Further,
\[
u(x_0) = \frac{n}{\omega_n R^n} \int_{U_R(x_0)} u(x)\, dx \qquad \text{(ball mean)}.
\]

Proof. (a) For simplicity, we consider only the case $n = 3$ and $x_0 = 0$. Apply Green's representation formula (17.31) to the ball $\Omega = U_\rho(0)$ with $\rho < R$. Noting (17.32), from (17.31) it follows that

\begin{align*}
u(0) &= \frac{1}{4\pi}\Bigl( \int_{S_\rho(0)} \frac{1}{\rho}\,\frac{\partial u(y)}{\partial\vec n}\, dS
- \int_{S_\rho(0)} u(y)\,\frac{\partial}{\partial\vec n_y}\,\frac{1}{\|y\|}\, dS \Bigr) \\
&= -\frac{1}{4\pi} \int_{S_\rho(0)} u(y)\,\frac{\partial}{\partial\vec n_y}\,\frac{1}{\|y\|}\, dS
= \frac{1}{4\pi} \int_{S_\rho(0)} u(y)\,\frac{1}{\rho^2}\, dS
= \frac{1}{4\pi\rho^2} \int_{S_\rho(0)} u(y)\, dS .
\end{align*}
Since $u$ is continuous on the closed ball of radius $R$, the formula remains valid as $\rho \to R$.
(b) Use $dx = dx_1\cdots dx_n = dr\, dS_r$, where $\|x\| = r$. Multiply both sides of (17.33) by $r^{n-1}\, dr$ and integrate with respect to $r$ from $0$ to $R$:

\[
\int_0^R r^{n-1} u(x_0)\, dr
= \int_0^R r^{n-1} \Bigl( \frac{1}{\omega_n r^{n-1}} \int_{S_r(x_0)} u(y)\, dS \Bigr)\, dr,
\qquad
\frac{1}{n}\, R^n\, u(x_0) = \frac{1}{\omega_n} \int_{U_R(x_0)} u(x)\, dx.
\]
The assertion follows. Note that $R^n \omega_n/n$ is exactly the $n$-dimensional volume of $U_R(x_0)$. The proof in case $n = 2$ is similar.
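The spherical mean property lends itself to a quick numerical sanity check. The following sketch (our own illustration; the test function, center, and radius are arbitrary choices) averages the harmonic function $u(x,y) = x^2 - y^2$ over a circle and compares with the value at the center, as (17.33) predicts for $n = 2$:

```python
import math

# Numerical check of the spherical mean value property (17.33) for n = 2:
# for harmonic u, the average of u over any circle S_R(x0) equals u(x0).
def u(x, y):
    return x * x - y * y          # harmonic: u_xx + u_yy = 2 - 2 = 0

def spherical_mean(x0, y0, R, m=10_000):
    # average of u over the circle of radius R around (x0, y0)
    total = 0.0
    for k in range(m):
        t = 2.0 * math.pi * k / m
        total += u(x0 + R * math.cos(t), y0 + R * math.sin(t))
    return total / m

mean = spherical_mean(0.3, -0.2, 0.5)
center = u(0.3, -0.2)
```

Since the integrand is smooth and periodic, the equally spaced Riemann sum converges very quickly, and `mean` agrees with `center` to machine precision.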

Proposition 17.19 (Minimum–Maximum Principle) Let $u$ be harmonic in $\Omega$ and continuous in $\overline\Omega$. Then
\[
\max_{x\in\overline\Omega} u(x) = \max_{x\in\partial\Omega} u(x);
\]
i.e. $u$ attains its maximum on the boundary $\partial\Omega$. The same is true for the minimum.

Proof. Suppose to the contrary that $M = u(x_0) = \max_{x\in\overline\Omega} u(x)$ is attained at an inner point $x_0 \in \Omega$ and $M > m = \max_{x\in\partial\Omega} u(x) = u(y_0)$, $y_0 \in \partial\Omega$.
(a) We show that $u(x) = M$ is constant in any ball $U_\varepsilon(x_0) \subset \Omega$ around $x_0$. Suppose to the contrary that $u(x_1) < M$ for some $x_1 \in U_\varepsilon(x_0)$. By continuity of $u$, $u(x) < M$ for all $x \in U_\varepsilon(x_0)\cap U_\eta(x_1)$. In particular
\[
M = u(x_0) = \frac{n}{\omega_n \varepsilon^n} \int_{U_\varepsilon(x_0)} u(x)\, dx
< \frac{n}{\omega_n \varepsilon^n} \int_{U_\varepsilon(x_0)} M\, dy = M;
\]
this is a contradiction; hence $u$ is constant in $U_\varepsilon(x_0)$.
(b) $u(x) = M$ is constant in $\Omega$. Let $x_1 \in \Omega$; we will show that $u(x_1) = M$. Since $\Omega$ is connected and bounded, there exists a path from $x_0$ to $x_1$ which can be covered by a chain of finitely many balls in $\Omega$. In all balls, starting with the ball around $x_0$ from (a), $u(x) = M$ is constant. Hence $u$ is constant in $\Omega$. Since $u$ is continuous, $u$ is constant in $\overline\Omega$. This contradicts the assumption; hence the maximum is attained on the boundary $\partial\Omega$. Passing from $u$ to $-u$, the statement about the minimum follows.
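The maximum principle can also be illustrated numerically. A small sketch (our own; the grid and the harmonic polynomial are arbitrary choices): sampling the harmonic function $u(x,y) = x^2 - y^2$ on a grid over the square $[-1,1]^2$, the largest sampled value occurs at a boundary grid point, never strictly inside:

```python
# Discrete illustration of the maximum principle: for the harmonic function
# u(x, y) = x^2 - y^2 on the square [-1, 1]^2, the maximum over a grid is
# attained on the boundary of the square, not at an interior grid point.
def u(x, y):
    return x * x - y * y

n = 40
pts = [(-1.0 + 2.0 * i / n, -1.0 + 2.0 * j / n)
       for i in range(n + 1) for j in range(n + 1)]
boundary_max = max(u(x, y) for (x, y) in pts if abs(x) == 1.0 or abs(y) == 1.0)
interior_max = max(u(x, y) for (x, y) in pts if abs(x) < 1.0 and abs(y) < 1.0)
```

Here `boundary_max` is $1$, attained at $(\pm 1, 0)$, while every interior grid point gives a strictly smaller value.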

Remarks 17.5 (a) A stronger proposition holds with "local maximum" in place of "maximum".
(b) Another, stricter version of the maximum principle is:

Let $u \in C^2(\Omega)\cap C(\overline\Omega)$ and $\Delta u \ge 0$ in $\Omega$. Then either $u$ is constant or
\[
u(y) < \max_{x\in\partial\Omega} u(x)
\]
for all $y \in \Omega$.

Corollary 17.20 (Uniqueness) The inner and the outer Dirichlet problems each have at most one solution.

Proof. Suppose that $u_1$ and $u_2$ are both solutions of the Dirichlet problem, $\Delta u_1 = \Delta u_2 = f$. Put $u = u_1 - u_2$. Then $\Delta u(x) = 0$ for all $x \in \Omega$ and $u(y) = 0$ on the boundary, $y \in \partial\Omega$.
(a) Inner problem. By the maximum principle, $u(x) = 0$ for all $x \in \Omega$; that is, $u_1 = u_2$.
(b) Outer problem. Suppose that $u \not\equiv 0$. Without loss of generality we may assume that

$u(x_1) = \alpha > 0$ for some $x_1 \in \Omega'$. By assumption, $u(x) \to 0$ as $|x| \to \infty$. Hence there exists $r > 0$ such that $|u(x)| < \alpha/2$ for all $\|x\| \ge r$. Since $u$ is harmonic in $B_r(0)\setminus\overline\Omega$, the maximum principle yields
\[
\alpha = u(x_1) \le \max_{x \in S_r(0)\cup\partial\Omega} u(x) \le \alpha/2;
\]
a contradiction.

Corollary 17.21 (Stability) Suppose that $u_1$ and $u_2$ are solutions of the inner Dirichlet problem $\Delta u_1 = \Delta u_2 = f$ with boundary values $\varphi_1(y)$ and $\varphi_2(y)$ on $\partial\Omega$, respectively. Suppose further that
\[
|\varphi_1(y) - \varphi_2(y)| \le \varepsilon \quad \forall y \in \partial\Omega.
\]
Then $|u_1(x) - u_2(x)| \le \varepsilon$ for all $x \in \Omega$. A similar statement is true for the exterior Dirichlet problem.

Proof. Put $u = u_1 - u_2$. Then $\Delta u = 0$ and $|u(y)| \le \varepsilon$ for all $y \in \partial\Omega$. By the maximum principle, $|u(x)| \le \varepsilon$ for all $x \in \Omega$.

Lemma 17.22 Suppose that $u$ is a non-constant harmonic function on $\overline\Omega$ and the maximum of $u(x)$ is attained at $y \in \partial\Omega$. Then $\frac{\partial u}{\partial\vec n}(y) > 0$. For the proof see [Tri92, 3.4.2. Theorem, p. 174].

Proposition 17.23 (Uniqueness) (a) The exterior Neumann problem has at most one solution. (b) A necessary condition for solvability of the inner Neumann problem is

\[
\int_{\partial\Omega} \varphi\, dS = \int_\Omega f(x)\, dx.
\]
Two solutions of the inner Neumann problem differ by a constant.

Proof. (a) Suppose that $u_1$ and $u_2$ are solutions of the exterior Neumann problem; then $u = u_1 - u_2$ satisfies $\Delta u = 0$ and $\frac{\partial u}{\partial\vec n}(y) = 0$. The above lemma shows that $u(x) = c$ is constant in $\Omega'$. Since $\lim_{|x|\to\infty} u(x) = 0$, the constant $c$ is $0$; hence $u_1 = u_2$.
(b) Inner problem. The uniqueness follows as in (a). The necessity of the formula follows from (17.28) with $v \equiv 1$, $\Delta u = f$, $\frac{\partial v}{\partial\vec n} = 0$.
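The necessary condition of part (b) can be checked on a concrete example (our own illustration, not from the text): for $u = x^2 + y^2$ on the unit disk we have $f = \Delta u = 4$ and $\varphi = \partial u/\partial\vec n = 2r = 2$ on the unit circle, so both integrals must equal $4\pi$:

```python
import math

# Check of the Neumann compatibility condition on the unit disk:
#   integral of phi over the boundary  =  integral of f over the disk,
# with u = x^2 + y^2, f = 4, phi = du/dn = 2 on the unit circle.
phi = 2.0
boundary_integral = phi * 2.0 * math.pi      # circumference of the unit circle

m = 2000
area_integral = 0.0
for k in range(m):
    r = (k + 0.5) / m                        # midpoint rule in the radius
    area_integral += 4.0 * r * (1.0 / m) * 2.0 * math.pi   # f * r dr dtheta
```

Both quantities come out as $4\pi \approx 12.566$, confirming the compatibility condition for this example.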

Proposition 17.24 (Converse Mean Value Theorem) Suppose that $u \in C(\Omega)$ and that whenever $x_0 \in \Omega$ is such that $\overline{U_r(x_0)} \subset \Omega$, we have the mean value property
\[
u(x_0) = \frac{1}{\omega_n r^{n-1}} \int_{S_r(x_0)} u(y)\, dS(y)
= \frac{1}{\omega_n} \int_{S_1(0)} u(x_0 + r y)\, dS(y).
\]

Then $u \in C^\infty(\Omega)$ and $u$ is harmonic in $\Omega$.

Proof. (a) We show that $u \in C^\infty(\Omega)$. The mean value property ensures that the mollification $h_\varepsilon * u$ equals $u$ as long as $U_\varepsilon(x_0) \subset \Omega$; that is, the mollification does not change $u$. By

homework 49.4, $u \in C^\infty$ since $h_\varepsilon$ is. We prove $h_\varepsilon * u = u$. Here $g$ denotes the 1-dimensional bump function, see page 419.

\begin{align*}
u_\varepsilon(x) = (u * h_\varepsilon)(x)
&= \int_{\mathbb{R}^n} u(y)\, h_\varepsilon(x-y)\, dy
= \int_{\mathbb{R}^n} u(x-y)\, h_\varepsilon(y)\, dy \\
&= \int_{U_\varepsilon(0)} u(x-y)\, h(y/\varepsilon)\,\varepsilon^{-n}\, dy
\overset{z_i = y_i/\varepsilon}{=} \int_{U_1(0)} u(x - \varepsilon z)\, h(z)\, dz \\
&= \int_0^1 \int_{S_1(0)} u(x - r\varepsilon y)\, g(r)\, r^{n-1}\, dS(y)\, dr \\
&= \omega_n\, u(x) \int_0^1 g(r)\, r^{n-1}\, dr
= u(x) \int_{\mathbb{R}^n} h(y)\, dy = u(x).
\end{align*}
Second part. Differentiating the above equation with respect to $r$ yields $\int_{U_r(x)} \Delta u(y)\, dy = 0$ for any ball in $\Omega$, since the left-hand side $u(x)$ does not depend on $r$:
\begin{align*}
0 = \frac{d}{dr} \int_{S_1(0)} u(x + r y)\, dS(y)
&= \int_{S_1(0)} y\cdot\nabla u(x + r y)\, dS(y) \\
&= \int_{S_r(0)} (r^{-1} z)\cdot\nabla u(x + z)\, r^{1-n}\, dS(z) \\
&= r^{-n} \int_{S_r(0)} \vec n(z)\cdot\nabla u(x + z)\, dS(z)
= r^{-n} \int_{S_r(0)} \frac{\partial u}{\partial\vec n}(x + z)\, dS(z) \\
&= r^{-n} \int_{S_r(x)} \frac{\partial u(y)}{\partial\vec n}\, dS(y)
= r^{-n} \int_{U_r(x)} \Delta u(y)\, dy.
\end{align*}
In the last line we used Green's formula with $v = 1$. Thus $\Delta u = 0$: suppose to the contrary that $\Delta u(x_0) \ne 0$, say $\Delta u(x_0) > 0$. By continuity of $\Delta u(x)$, $\Delta u(x) > 0$ for $x \in U_\varepsilon(x_0)$. Hence $\int_{U_\varepsilon(x_0)} \Delta u(x)\, dx > 0$, which contradicts the above equation. We conclude that $u$ is harmonic in $\Omega$.

Remark 17.6 A regular distribution $u \in \mathcal D'(\Omega)$ is called harmonic if $\Delta u = 0$, that is,
\[
\langle \Delta u, \varphi \rangle = \int_\Omega u(x)\,\Delta\varphi(x)\, dx = 0, \quad \varphi \in \mathcal D(\Omega).
\]
Weyl's Lemma: Any harmonic regular distribution is a harmonic function; in particular, $u \in C^\infty(\Omega)$.

Example 17.3 Solve

\[
\Delta u = -2, \quad (x, y) \in \Omega = (0, a)\times(-b/2,\, b/2), \qquad u|_{\partial\Omega} = 0.
\]

Ansatz: $u = w + v$ with $\Delta w = -2$. For example, choose $w = -x^2 + ax$. Then $\Delta v = 0$ with boundary conditions

\[
v(0, y) = v(a, y) = 0, \qquad v(x, -b/2) = v(x, b/2) = x^2 - ax.
\]
Use separation of variables, $v(x, y) = X(x)\,Y(y)$, to solve the problem.
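A brief sketch of the separation step (our own addition; the series ansatz below is the standard one for this geometry): inserting $v(x,y) = X(x)Y(y)$ into $\Delta v = 0$ and dividing by $XY$ separates the variables,

```latex
\[
\frac{X''(x)}{X(x)} = -\frac{Y''(y)}{Y(y)} = -\lambda .
\]
% The boundary conditions X(0) = X(a) = 0 force \lambda = (n\pi/a)^2 with
% X_n(x) = \sin(n\pi x/a); the symmetry v(x, -y) = v(x, y) suggests
% Y_n(y) = \cosh(n\pi y/a), so one tries
\[
v(x,y) \;=\; \sum_{n\ge 1} c_n\,\sin\!\frac{n\pi x}{a}\,\cosh\!\frac{n\pi y}{a},
\]
% where the coefficients c_n are fixed by expanding x^2 - ax into its sine
% series on (0, a) and matching the condition v(x, \pm b/2) = x^2 - ax.
```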

17.5 Appendix

17.5.1 Existence of Solutions to the Boundary Value Problems (a) Green’s Function

Let $u \in C^2(\Omega)\cap C^1(\overline\Omega)$. Let us combine Green's representation formula and Green's second formula with a harmonic function $v(x) = v_y(x)$, $x \in \Omega$, where $y \in \Omega$ is thought of as a parameter.

\begin{align*}
u(y) &= \int_\Omega E_n(x-y)\,\Delta u(x)\, dx
+ \int_{\partial\Omega} \Bigl( u(x)\,\frac{\partial E_n}{\partial\vec n_x}(x-y) - E_n(x-y)\,\frac{\partial u}{\partial\vec n}(x) \Bigr)\, dS(x), \\
0 &= \int_\Omega v_y(x)\,\Delta u(x)\, dx
+ \int_{\partial\Omega} \Bigl( u(x)\,\frac{\partial v_y}{\partial\vec n}(x) - v_y(x)\,\frac{\partial u}{\partial\vec n}(x) \Bigr)\, dS(x).
\end{align*}
Adding these two lines and denoting $G(x, y) = E_n(x-y) + v_y(x)$, we get
\[
u(y) = \int_\Omega G(x, y)\,\Delta u(x)\, dx
+ \int_{\partial\Omega} \Bigl( u(x)\,\frac{\partial G(x, y)}{\partial\vec n_x} - G(x, y)\,\frac{\partial u}{\partial\vec n}(x) \Bigr)\, dS(x).
\]
Suppose now that $G(x, y)$ vanishes for all $x \in \partial\Omega$; then the last surface integral is $0$ and
\[
u(y) = \int_\Omega G(x, y)\,\Delta u(x)\, dx
+ \int_{\partial\Omega} u(x)\,\frac{\partial G(x, y)}{\partial\vec n_x}\, dS(x). \tag{17.34}
\]
In the above formula, $u$ is completely determined by its boundary values and $\Delta u$ in $\Omega$. This motivates the following definition.

Definition 17.4 A function $G\colon \Omega\times\Omega \to \mathbb{R}$ satisfying
(a) $G(x, y) = 0$ for all $x \in \partial\Omega$, $y \in \Omega$, $x \ne y$,
(b) $v_y(x) = G(x, y) - E_n(x-y)$ is harmonic in $x \in \Omega$ for all $y \in \Omega$,
is called a Green's function of $\Omega$. More precisely, $G(x, y)$ is a Green's function for the inner Dirichlet problem on $\Omega$.

Remarks 17.7 (a) The function $v_y(x)$ is in particular harmonic at $x = y$. Since $E_n(x-y)$ has a pole at $x = y$, $G(x, y)$ has a pole of the same order at $x = y$, such that $G(x, y) - E_n(x-y)$ has no singularity. If such a function $G(x, y)$ exists, then (17.34) holds for all $u \in C^2(\overline\Omega)$.

In particular, if in addition, u is harmonic in Ω,

\[
u(y) = \int_{\partial\Omega} u(x)\,\frac{\partial G(x, y)}{\partial\vec n_x}\, dS(x). \tag{17.35}
\]
This is the so-called Poisson's formula for $\Omega$. In general it is difficult to find Green's function. For most regions $\Omega$ it is even impossible to give $G(x, y)$ explicitly. However, if $\Omega$ has some kind of symmetry, one can use the reflection principle to construct $G(x, y)$ explicitly. Nevertheless, $G(x, y)$ exists for all "well-behaved" $\Omega$ (the boundary is a $C^2$-set and Gauß' divergence theorem holds for $\Omega$).

(b) The Reflection Principle

This is a method to calculate Green's function explicitly for domains $\Omega$ with the following property: using repeated reflections in spheres and in hyperplanes occurring as boundaries of $\Omega$ and its reflections, the whole $\mathbb{R}^n$ can be filled up without overlapping.

Example 17.4 Green's function on a ball $U_R(0)$. We use the reflection in the sphere $S_R(0)$: for $y \in \mathbb{R}^n$ put
\[
\bar y := \begin{cases} \dfrac{R^2}{\|y\|^2}\, y, & y \ne 0, \\[1ex] \infty, & y = 0. \end{cases}
\]

Note that this map has the properties $y\cdot\bar y = R^2$ and $\|y\|^2\,\bar y = R^2\, y$. Points on the sphere $S_R(0)$ are fixed under this map, $\bar y = y$. Let $E_n\colon \mathbb{R}_+ \to \mathbb{R}$ also denote the radial scalar function corresponding to $E_n$, with $E_n(x) = E_n(\|x\|)$, that is, $E_n(r) = -1/\bigl((n-2)\,\omega_n\, r^{n-2}\bigr)$, $n \ge 3$. Then we put

\[
G(x, y) = \begin{cases}
E_n(\|x - y\|) - E_n\Bigl(\dfrac{\|y\|}{R}\,\|x - \bar y\|\Bigr), & y \ne 0, \\[1.5ex]
E_n(\|x\|) - E_n(R), & y = 0.
\end{cases} \tag{17.36}
\]

For $x \ne y$, $G(x, y)$ is harmonic in $x$, since for $\|y\| < R$ we have $\|\bar y\| > R$ and therefore $x - \bar y \ne 0$. The function $G(x, y)$ has only one singularity in $U_R(0)$, namely at $x = y$, and this is the same as that of $E_n(x-y)$. Therefore,

\[
v_y(x) = G(x, y) - E_n(x - y)
= \begin{cases} -E_n\Bigl(\dfrac{\|y\|}{R}\,\|x - \bar y\|\Bigr), & y \ne 0, \\[1.5ex] -E_n(R), & y = 0, \end{cases}
\]
is harmonic for all $x \in \Omega$. For $x \in \partial\Omega = S_R(0)$ we have, for $y \ne 0$,
\begin{align*}
G(x, y)
&= E_n\Bigl( \bigl( \|x\|^2 + \|y\|^2 - 2\,x\cdot y \bigr)^{\frac12} \Bigr)
- E_n\Bigl( \frac{\|y\|}{R}\,\bigl( \|x\|^2 + \|\bar y\|^2 - 2\,x\cdot\bar y \bigr)^{\frac12} \Bigr) \\
&= E_n\Bigl( \bigl( R^2 + \|y\|^2 - 2\,x\cdot y \bigr)^{\frac12} \Bigr)
- E_n\Bigl( \Bigl( \|y\|^2 + \frac{\|y\|^2}{R^2}\,\|\bar y\|^2 - 2\,\frac{\|y\|^2}{R^2}\, x\cdot\bar y \Bigr)^{\frac12} \Bigr) \\
&= E_n\Bigl( \bigl( R^2 + \|y\|^2 - 2\,x\cdot y \bigr)^{\frac12} \Bigr)
- E_n\Bigl( \bigl( \|y\|^2 + R^2 - 2\,x\cdot y \bigr)^{\frac12} \Bigr) = 0.
\end{align*}
For $y = 0$ we have
\[
G(x, 0) = E_n(\|x\|) - E_n(R) = E_n(R) - E_n(R) = 0.
\]
This proves that $G(x, y)$ is a Green's function for $U_R(0)$. In particular, the above calculation shows that $r = \|x - y\|$ and $\bar r = \frac{\|y\|}{R}\,\|x - \bar y\|$ are equal if $x \in \partial\Omega$. One can show that Green's function is symmetric, that is, $G(x, y) = G(y, x)$. This is a general property of Green's functions.

To apply formula (17.34) we have to compute the normal derivative $\frac{\partial}{\partial\vec n_x} G(x, y)$. Note first that for any constant $z \in \mathbb{R}^n$ and $x \in S_R(0)$,
\[
\frac{\partial}{\partial\vec n_x} f(\|x - z\|) = \vec n_x\cdot\nabla f(\|x - z\|)
= f'(\|x - z\|)\,\frac{x}{\|x\|}\cdot\frac{x - z}{\|x - z\|}.
\]
Note further that for $\|x\| = R$ we have
\begin{align}
r &= \|x - y\| = \frac{\|y\|}{R}\,\|x - \bar y\|, \tag{17.37} \\
(x - y)\cdot x - \frac{\|y\|^2}{R^2}\,(x - \bar y)\cdot x &= R^2 - \|y\|^2. \tag{17.38}
\end{align}
Hence, for $y \ne 0$,
\begin{align*}
\frac{\partial}{\partial\vec n_x} G(x, y)
&= -\frac{1}{(n-2)\,\omega_n}\,\frac{\partial}{\partial\vec n_x}
\Bigl( \|x - y\|^{-n+2} - \frac{\|y\|^{-n+2}}{R^{-n+2}}\,\|x - \bar y\|^{-n+2} \Bigr) \\
&= \frac{1}{\omega_n}\Bigl( \|x - y\|^{-n+1}\,\frac{x - y}{\|x - y\|}\cdot\frac{x}{\|x\|}
- \frac{\|y\|^{-n+2}}{R^{-n+2}}\,\|x - \bar y\|^{-n+1}\,\frac{x - \bar y}{\|x - \bar y\|}\cdot\frac{x}{\|x\|} \Bigr) \\
&= \frac{1}{\omega_n\, r^n\, R}\Bigl( (x - y)\cdot x - \frac{\|y\|^2}{R^2}\,(x - \bar y)\cdot x \Bigr).
\end{align*}
By (17.38), the expression in the brackets is $R^2 - \|y\|^2$. Hence,
\[
\frac{\partial}{\partial\vec n_x} G(x, y) = \frac{R^2 - \|y\|^2}{\omega_n\, R\,\|x - y\|^n}.
\]
This formula holds true also in case $y = 0$. Inserting this into (17.34), for any harmonic function $u \in C^2(U_R(0))\cap C(\overline{U_R(0)})$ we have
\[
u(y) = \frac{R^2 - \|y\|^2}{\omega_n R} \int_{S_R(0)} \frac{u(x)}{\|x - y\|^n}\, dS(x). \tag{17.39}
\]

This is the so-called Poisson's formula for the ball $U_R(0)$.
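Formula (17.36) can be verified numerically for $n = 3$ (our own sketch; the radius and the chosen points are arbitrary): for $x$ on the sphere $S_R(0)$ the two terms of $G$ cancel exactly, and $G$ is symmetric for interior points.

```python
import math

# Numerical check that the Green's function (17.36) of the ball U_R(0) in R^3
# vanishes for x on the sphere S_R(0), and that G(x, y) = G(y, x) inside.
# Here E_3(r) = -1/(4*pi*r) and ybar = R^2 * y / ||y||^2.
def E3(r):
    return -1.0 / (4.0 * math.pi * r)

def G(x, y, R):
    ny = math.sqrt(sum(t * t for t in y))
    ybar = tuple(R * R * t / (ny * ny) for t in y)
    return E3(math.dist(x, y)) - E3(ny / R * math.dist(x, ybar))

R = 2.0
y = (0.5, 0.3, -0.7)                         # interior point
x_on = (R * math.sin(1.0) * math.cos(2.0),   # a point on the sphere S_R(0)
        R * math.sin(1.0) * math.sin(2.0),
        R * math.cos(1.0))
val = G(x_on, y, R)                          # should be 0 up to rounding
x_in = (0.2, -0.4, 0.1)                      # second interior point
sym = G(x_in, y, R) - G(y, x_in, R)          # symmetry defect, should be 0
```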

Proposition 17.25 Let $n \ge 2$. Consider the inner Dirichlet problem in $\Omega = U_R(0)$ with $f = 0$. The function
\[
u(y) = \begin{cases}
\dfrac{R^2 - \|y\|^2}{\omega_n R} \displaystyle\int_{S_R(0)} \frac{\varphi(x)}{\|x - y\|^n}\, dS(x), & \|y\| < R, \\[1.5ex]
\varphi(y), & \|y\| = R,
\end{cases}
\]
is continuous on the closed ball $\overline{U_R(0)}$ and harmonic in $U_R(0)$. In case $n = 2$ the function $u(y)$ can be written in the following form

\[
u(y) = \operatorname{Re}\Bigl( \frac{1}{2\pi i} \int_{S_R(0)} \varphi(z)\,\frac{z + y}{z - y}\,\frac{dz}{z} \Bigr),
\quad y \in U_R(0) \subset \mathbb{C}.
\]
For the proof of the general statement with $n \ge 2$, see [Jos02, Theorem 1.1.2] or [Joh82, p. 107]. We show the last statement for $n = 2$. Since $y\bar z - \bar y z$ is purely imaginary,
\[
\operatorname{Re}\,\frac{z + y}{z - y}
= \operatorname{Re}\,\frac{(z + y)(\bar z - \bar y)}{(z - y)(\bar z - \bar y)}
= \operatorname{Re}\,\frac{|z|^2 - |y|^2 + y\bar z - \bar y z}{|z - y|^2}
= \frac{R^2 - |y|^2}{|z - y|^2}.
\]
Using the parametrization $z = R\,e^{it}$, $dt = \frac{dz}{iz}$, we obtain
\[
\operatorname{Re}\Bigl( \frac{1}{2\pi i} \int_{S_R(0)} \varphi(z)\,\frac{z + y}{z - y}\,\frac{dz}{z} \Bigr)
= \frac{1}{2\pi} \int_0^{2\pi} \frac{R^2 - |y|^2}{|z - y|^2}\,\varphi(z)\, dt
= \frac{R^2 - |y|^2}{2\pi R} \int_{S_R(0)} \frac{\varphi(x)}{|x - y|^2}\, |dx| .
\]

In the last line we have a (real) line integral of the first kind, using $x = (x_1, x_2) = x_1 + i x_2 = z$, $x \in S_R(0)$, and $|dx| = R\, dt$ on the circle.

Other Examples. (a) $n = 3$. The half-space $\Omega = \{(x_1, x_2, x_3) \in \mathbb{R}^3 \mid x_3 > 0\}$. We use the ordinary reflection with respect to the plane $x_3 = 0$, given by $y = (y_1, y_2, y_3) \mapsto y' = (y_1, y_2, -y_3)$. Then Green's function for $\Omega$ is
\[
G(x, y) = E_3(x - y) - E_3(x - y')
= \frac{1}{4\pi}\Bigl( \frac{1}{\|x - y'\|} - \frac{1}{\|x - y\|} \Bigr)
\]
(see homework 57.1).
(b) $n = 3$. The half ball $\Omega = \{(x_1, x_2, x_3) \in \mathbb{R}^3 \mid \|x\| < R,\; x_3 > 0\}$. We use the reflections $y \mapsto y'$ and $y \mapsto \bar y$ (reflection with respect to the sphere $S_R(0)$). Then
\[
G(x, y) = E_3(x - y) - \frac{R}{\|y\|}\, E_3(x - \bar y) - E_3(x - y') + \frac{R}{\|y\|}\, E_3(x - \bar{y'})
\]
is Green's function for $\Omega$.
(c) $n = 3$, $\Omega = \{(x_1, x_2, x_3) \in \mathbb{R}^3 \mid x_2 > 0,\; x_3 > 0\}$. We introduce the reflection $y = (y_1, y_2, y_3) \mapsto y_* = (y_1, -y_2, y_3)$. Then Green's function for $\Omega$ is

\[
G(x, y) = E_3(x - y) - E_3(x - y') - E_3(x - y_*) + E_3\bigl(x - (y_*)'\bigr).
\]
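Poisson's formula for the disk can be tested numerically as well (our own sketch; the harmonic test function and the interior point are arbitrary choices). We take boundary data $\varphi$ from $h(x) = x_1^2 - x_2^2$ and recover $h$ at an interior point via the $n = 2$ formula of Proposition 17.25:

```python
import math

# Numerical check of Poisson's formula for the disk U_R(0), n = 2:
#   u(y) = (R^2 - |y|^2)/(2*pi*R) * integral over S_R(0) of phi(x)/|x-y|^2 dS,
# with phi taken from the harmonic function h(x) = x1^2 - x2^2, so the
# integral must reproduce h at the interior point y.
def h(x1, x2):
    return x1 * x1 - x2 * x2

R, y = 1.5, (0.4, -0.3)
m = 20_000
integral = 0.0
for k in range(m):
    t = 2.0 * math.pi * k / m
    x = (R * math.cos(t), R * math.sin(t))
    d2 = (x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
    integral += h(*x) / d2 * (2.0 * math.pi * R / m)   # dS = R dt
val = (R * R - (y[0] ** 2 + y[1] ** 2)) / (2.0 * math.pi * R) * integral
```

The computed `val` agrees with $h(0.4, -0.3) = 0.07$ to high accuracy, since the periodic trapezoidal rule converges rapidly for the smooth Poisson kernel.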

Consider the Neumann problem and the ansatz used for Green's function in the case of the Dirichlet problem:

\[
u(y) = \int_\Omega H(x, y)\,\Delta u(x)\, dx
+ \int_{\partial\Omega} \Bigl( u(x)\,\frac{\partial H(x, y)}{\partial\vec n_x} - H(x, y)\,\frac{\partial u}{\partial\vec n}(x) \Bigr)\, dS(x). \tag{17.40}
\]
We want to choose a Green's function of the second kind, $H(x, y)$, in such a way that only the last surface integral remains present. Inserting $u = 1$ in the above formula, we obtain

\[
1 = \int_{\partial\Omega} \frac{\partial H(x, y)}{\partial\vec n_x}\, dS(x).
\]
Imposing $\frac{\partial H(x, y)}{\partial\vec n_x} = \alpha = \text{const.}$, this constant must be $\alpha = 1/\operatorname{vol}(\partial\Omega)$. Note that one defines a Green's function for the Neumann problem by replacing the condition $G(x, y) = 0$ on $\partial\Omega$ by $\frac{\partial}{\partial\vec n_x} G(x, y) = \text{const}$. Green's function of the second kind (Neumann problem) for the ball $U_R(0)$ of radius $R$ in $\mathbb{R}^3$:

\[
H(x, y) = -\frac{1}{4\pi}\Bigl( \frac{1}{\|x - y\|} + \frac{R}{\|y\|\,\|x - \bar y\|}
+ \frac{1}{R}\,\log\frac{2R^2}{R^2 - x\cdot y + \|y\|\,\|x - \bar y\|} \Bigr).
\]

(c) Existence Theorems

The aim of these considerations is to sketch the method of proving existence of solutions of the four BVPs.

Definition. Suppose that $\Omega \subset \mathbb{R}^n$, $n \ge 3$, and let $\mu(y)$ be a continuous function on the boundary $\partial\Omega$, that is, $\mu \in C(\partial\Omega)$. We call


\[
w(x) = \int_{\partial\Omega} \mu(y)\, E_n(x - y)\, dS(y), \quad x \in \mathbb{R}^n, \tag{17.41}
\]
a single-layer potential and


\[
v(x) = \int_{\partial\Omega} \mu(y)\,\frac{\partial E_n}{\partial\vec n_y}(x - y)\, dS(y), \quad x \in \mathbb{R}^n, \tag{17.42}
\]
a double-layer potential.

Remarks 17.8 (a) For $x \notin \partial\Omega$ the integrals (17.41) and (17.42) exist.
(b) The single-layer potential $w(x)$ is continuous on $\mathbb{R}^n$. The double-layer potential jumps by $\mu(y_0)$ at $y_0 \in \partial\Omega$ as $x$ approaches $y_0$, see (17.43) below.

Theorem 17.26 Let $\Omega$ be a connected bounded region in $\mathbb{R}^n$ of class $C^2$ and let $\Omega' = \mathbb{R}^n\setminus\overline\Omega$ also be connected. Then the interior Dirichlet problem for the Laplace equation has a unique solution. It can be represented in the form of a double-layer potential. The exterior Neumann problem likewise has a unique solution, which can be represented in the form of a single-layer potential.

Theorem 17.27 Under the same assumptions as in the previous theorem, the inner Neumann problem for the Laplace equation has a solution if and only if $\int_{\partial\Omega} \varphi(y)\, dS(y) = 0$. If this condition is satisfied, the solution is unique up to a constant. The exterior Dirichlet problem has a unique solution.

Remark. Let $\mu_{\mathrm{ID}}$ denote the continuous function which produces the solution $v(x)$ of the interior Dirichlet problem, i.e. $v(x) = \int_{\partial\Omega} \mu_{\mathrm{ID}}(y)\, K(x, y)\, dS(y)$, where $K(x, y) = \frac{\partial E_n}{\partial\vec n_y}(x - y)$. Because of the jump relation for $v(x)$ at $x_0 \in \partial\Omega$,
\[
\lim_{x \to x_0,\, x \in \Omega} v(x) - \frac12\,\mu(x_0) = v(x_0)
= \lim_{x \to x_0,\, x \in \Omega'} v(x) + \frac12\,\mu(x_0), \tag{17.43}
\]

$\mu_{\mathrm{ID}}$ satisfies the integral equation
\[
\varphi(x) = \frac12\,\mu_{\mathrm{ID}}(x) + \int_{\partial\Omega} \mu_{\mathrm{ID}}(y)\, K(x, y)\, dS(y), \quad x \in \partial\Omega.
\]
The above equation can be written as $\varphi = (A + \frac12 I)\,\mu_{\mathrm{ID}}$, where $A$ is the above integral operator in $L^2(\partial\Omega)$. One can prove the following facts: $A$ is compact, $A + \frac12 I$ is injective and surjective, and $\varphi$ continuous implies $\mu_{\mathrm{ID}}$ continuous. For details, see [Tri92, 3.4].

Application to the Poisson Equation

Consider the inner Dirichlet problem $\Delta u = f$, and $u = \varphi$ on $\partial\Omega$. We suppose that $f \in C(\overline\Omega)\cap C^1(\Omega)$. We already know that
\[
w(x) = (E_n * f)(x) = -\frac{1}{(n-2)\,\omega_n} \int_{\mathbb{R}^n} \frac{f(y)}{\|x - y\|^{n-2}}\, dy
\]
is a distributive solution of the Poisson equation, $\Delta w = f$. By the assumptions on $f$, $w \in C^2(\Omega)$ and therefore $w$ is a classical solution. To solve the problem we try the ansatz $u = w + v$. Then $\Delta u = \Delta w + \Delta v = f + \Delta v$. Hence $\Delta u = f$ if and only if $\Delta v = 0$. Thus the inner Dirichlet problem for the Poisson equation reduces to the inner Dirichlet problem for the Laplace equation $\Delta v = 0$ with boundary values

\[
v(y) = u(y) - w(y) = \varphi(y) - w(y) =: \tilde\varphi(y), \quad y \in \partial\Omega.
\]
Since $\varphi$ and $w$ are continuous on $\partial\Omega$, so is $\tilde\varphi$.

17.5.2 Extremal Properties of Harmonic Functions and the Dirichlet Principle (a) The Dirichlet Principle

Consider the inner Dirichlet problem for the Poisson equation with given data $f \in C(\Omega)$ and $\varphi \in C(\partial\Omega)$. Put
\[
C^1_\varphi(\Omega) := \{ v \in C^1(\Omega) \mid v = \varphi \text{ on } \partial\Omega \}.
\]

On this space define the Dirichlet integral by

\[
E(v) = \frac12 \int_\Omega \|\nabla v\|^2\, dx + \int_\Omega f\, v\, dx, \quad v \in C^1_\varphi(\Omega). \tag{17.44}
\]
This integral is also called the energy integral. The Dirichlet principle says that among all functions $v$ with given boundary values $\varphi$, the function $u$ with $\Delta u = f$ minimizes the energy integral $E$.

Proposition 17.28 A function $u \in C^1_\varphi(\Omega)\cap C^2(\Omega)$ is a solution of the inner Dirichlet problem if and only if the energy integral $E$ attains its minimum on $C^1_\varphi(\Omega)$ at $u$.

Proof. (a) Suppose first that $u \in C^1_\varphi(\Omega)\cap C^2(\Omega)$ is a solution of the inner Dirichlet problem, $\Delta u = f$. For $v \in C^1_\varphi(\Omega)$ let $w = v - u \in C^1_0(\Omega)$. Then
\begin{align*}
E(v) = E(u + w)
&= \frac12 \int_\Omega (\nabla u + \nabla w)\cdot(\nabla u + \nabla w)\, dx + \int_\Omega (u + w)\, f\, dx \\
&= \frac12 \int_\Omega \|\nabla u\|^2\, dx + \frac12 \int_\Omega \|\nabla w\|^2\, dx
+ \int_\Omega \nabla u\cdot\nabla w\, dx + \int_\Omega (u + w)\, f\, dx .
\end{align*}
Since $u$ and $v$ satisfy the same boundary conditions, $w|_{\partial\Omega} = 0$. Further, $\Delta u = f$. By Green's first formula,
\[
\int_\Omega \nabla u\cdot\nabla w\, dx
= -\int_\Omega (\Delta u)\, w\, dx + \int_{\partial\Omega} \frac{\partial u}{\partial\vec n}\, w\, dS
= -\int_\Omega f\, w\, dx.
\]
Inserting this into the above equation, we have

\begin{align*}
E(v) &= \frac12 \int_\Omega \|\nabla u\|^2\, dx + \frac12 \int_\Omega \|\nabla w\|^2\, dx
- \int_\Omega f\, w\, dx + \int_\Omega (u + w)\, f\, dx \\
&= E(u) + \frac12 \int_\Omega \|\nabla w\|^2\, dx \;\ge\; E(u).
\end{align*}
This shows that $E(u)$ is minimal.
(b) Conversely, let $u \in C^1_\varphi(\Omega)\cap C^2(\Omega)$ minimize the energy integral. In particular, for any test function $\psi \in \mathcal D(\Omega)$ ($\psi$ has zero boundary values), the function
\[
g(t) = E(u + t\psi) = E(u) + t \int_\Omega (\nabla u\cdot\nabla\psi + f\psi)\, dx
+ \frac{t^2}{2} \int_\Omega \|\nabla\psi\|^2\, dx
\]
has a local minimum at $t = 0$. Hence $g'(0) = 0$, which is, again by Green's first formula and $\psi|_{\partial\Omega} = 0$, equivalent to

\[
0 = \int_\Omega (\nabla u\cdot\nabla\psi + f\psi)\, dx = \int_\Omega (-\Delta u + f)\,\psi\, dx.
\]
By the fundamental lemma of the calculus of variations, $\Delta u = f$ almost everywhere on $\Omega$. Since both $\Delta u$ and $f$ are continuous, this equation holds pointwise for all $x \in \Omega$.
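The Dirichlet principle can be illustrated in one dimension (our own sketch; $f$, $u$, and the perturbation $\psi$ are arbitrary choices): with $f = 2$ on $(0,1)$ and zero boundary values, $u(x) = x^2 - x$ solves $u'' = f$, and the energy $E(u + t\psi)$ exceeds $E(u)$ for every $t \ne 0$:

```python
import math

# 1-D illustration of the Dirichlet principle: among v = u + t*psi with zero
# boundary values, the energy E(v) = 1/2 * int v'^2 + int f*v is smallest at
# t = 0 when u solves u'' = f.  Here f = 2 and u(x) = x^2 - x on (0, 1).
N = 4000

def energy(v, dv, f):
    # midpoint rule for 1/2 * int dv^2 + int f*v over (0, 1)
    total = 0.0
    for i in range(N):
        xm = (i + 0.5) / N
        total += (0.5 * dv(xm) ** 2 + f(xm) * v(xm)) / N
    return total

u = lambda x: x * x - x
du = lambda x: 2.0 * x - 1.0
psi = lambda x: math.sin(math.pi * x)        # perturbation, zero on boundary
dpsi = lambda x: math.pi * math.cos(math.pi * x)
f = lambda x: 2.0

E0 = energy(u, du, f)
Et = [energy(lambda x, t=t: u(x) + t * psi(x),
             lambda x, t=t: du(x) + t * dpsi(x), f)
      for t in (-1.0, -0.1, 0.1, 1.0)]
```

Here $E(u+t\psi) = E(u) + \tfrac{t^2}{2}\int \psi'^2\,dx$, so every perturbed energy in `Et` is strictly larger than `E0`.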

(b) Hilbert Space Methods

We want to give another reformulation of the Dirichlet problem. Consider the problem

\[
-\Delta u = f, \qquad u|_{\partial\Omega} = 0.
\]
On $C^1(\overline\Omega)$ define a bilinear map

\[
\langle u, v \rangle_E = \int_\Omega \nabla u\cdot\nabla v\, dx.
\]
$C^1(\overline\Omega)$ is not yet an inner product space, since $\langle u, u \rangle_E = 0$ for any non-vanishing constant function $u$. Denote by $C^1_0(\Omega)$ the subspace of functions in $C^1(\overline\Omega)$ vanishing on the boundary $\partial\Omega$. Now $\langle u, v \rangle_E$ is an inner product on $C^1_0(\Omega)$. The positive definiteness is a consequence of the Poincaré inequality below. Its corresponding norm is $\|u\|_E^2 = \int_\Omega \|\nabla u\|^2\, dx$. Let $u$ be a solution of the above Dirichlet problem. Then for any $v \in C^1_0(\Omega)$, by Green's first formula,

\[
\langle v, u \rangle_E = \int_\Omega \nabla v\cdot\nabla u\, dx
= -\int_\Omega v\,\Delta u\, dx = \int_\Omega v\, f\, dx = \langle v, f \rangle_{L^2}.
\]
This suggests that $u$ can be found by representing the known linear functional in $v$,

\[
F(v) = \int_\Omega v\, f\, dx,
\]
as an inner product $\langle v, u \rangle_E$. To make use of Riesz's representation theorem, Theorem 13.8, we have to complete $C^1_0(\Omega)$ into a Hilbert space $W$ with respect to the energy norm $\|\cdot\|_E$ and to prove that the above linear functional $F$ is bounded with respect to the energy norm. This is a consequence of the next lemma. We make the same assumptions on $\Omega$ as in the beginning of Section 17.4.

Lemma 17.29 (Poincaré inequality) Let $\Omega \subset \mathbb{R}^n$ be bounded. Then there exists $C > 0$ such that for all $u \in C^1_0(\Omega)$
\[
\|u\|_{L^2(\Omega)} \le C\, \|u\|_E .
\]
Proof. Let $\Omega$ be contained in the cube $\Gamma = \{x \in \mathbb{R}^n \mid |x_i| \le a,\; i = 1, \dots, n\}$. We extend $u$ by zero outside $\Omega$. For any $x = (x_1, \dots, x_n)$, by the Fundamental Theorem of Calculus,

\begin{align*}
u(x)^2 &= \Bigl( \int_{-a}^{x_1} u_{x_1}(y_1, x_2, \dots, x_n)\, dy_1 \Bigr)^2
= \Bigl( \int_{-a}^{x_1} 1\cdot u_{x_1}(y_1, x_2, \dots, x_n)\, dy_1 \Bigr)^2 \\
&\underset{\text{CSI}}{\le} \int_{-a}^{x_1} dy_1 \int_{-a}^{x_1} u_{x_1}^2\, dy_1
= (x_1 + a) \int_{-a}^{x_1} u_{x_1}^2\, dy_1
\le 2a \int_{-a}^{a} u_{x_1}^2\, dy_1 .
\end{align*}
Since the last integral does not depend on $x_1$, integration with respect to $x_1$ gives
\[
\int_{-a}^{a} u(x)^2\, dx_1 \le 4a^2 \int_{-a}^{a} u_{x_1}^2\, dy_1 .
\]
Integrating over $x_2, \dots, x_n$ from $-a$ to $a$, we find
\[
\int_\Gamma u^2\, dx \le 4a^2 \int_\Gamma u_{x_1}^2\, dy .
\]

The same inequality holds with $x_i$, $i = 2, \dots, n$, in place of $x_1$, such that

\[
\|u\|_{L^2}^2 = \int_\Gamma u^2\, dx \le \frac{4a^2}{n} \int_\Gamma (\nabla u)^2\, dx = C^2\, \|u\|_E^2,
\]
where $C = 2a/\sqrt n$.
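A one-dimensional check of the constant $C = 2a/\sqrt n$ (our own sketch; the test function is an arbitrary choice): for $n = 1$, $\Omega \subset (-a, a)$, and $u(x) = \cos(\pi x/(2a))$, which vanishes at $\pm a$, the inequality $\|u\|_{L^2} \le 2a\,\|u'\|_{L^2}$ holds with room to spare:

```python
import math

# 1-D check of the Poincare inequality with the constant from the proof:
#   ||u||_{L^2} <= 2a * ||u'||_{L^2}   for u vanishing at x = +-a.
# Test function: u(x) = cos(pi*x/(2a)).
a, N = 1.0, 4000
l2u = l2du = 0.0
for i in range(N):
    x = -a + (i + 0.5) * 2.0 * a / N            # midpoint rule
    u = math.cos(math.pi * x / (2.0 * a))
    du = -math.pi / (2.0 * a) * math.sin(math.pi * x / (2.0 * a))
    l2u += u * u * 2.0 * a / N
    l2du += du * du * 2.0 * a / N
lhs, rhs = math.sqrt(l2u), 2.0 * a * math.sqrt(l2du)
```

For $a = 1$ this gives $\|u\|_{L^2} = 1$ versus $2a\,\|u'\|_{L^2} = \pi$, consistent with the lemma.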

The Poincaré inequality is sometimes called the Poincaré–Friedrichs inequality. It remains true for functions $u$ in the completion $W$. Let us discuss the elements of $W$ in more detail. By definition, $f \in W$ if there is a Cauchy sequence $(f_n)$ in $C^1_0(\Omega)$ such that $(f_n)$ "converges to $f$" in the energy norm. By the Poincaré inequality, $(f_n)$ is also an $L^2$-Cauchy sequence. Since $L^2(\Omega)$ is complete, $(f_n)$ has an $L^2$-limit $f$. This shows $W \subseteq L^2(\Omega)$. For simplicity, let $\Omega \subset \mathbb{R}^1$. By definition of the energy norm, $\int_\Omega |(f_n - f_m)'|^2\, dx \to 0$ as $m, n \to \infty$; that is, $(f_n')$ is an $L^2$-Cauchy sequence, too. Hence $(f_n')$ also has an $L^2$-limit, say $g \in L^2(\Omega)$. So far,

\[
\|f_n - f\|_{L^2} \longrightarrow 0, \qquad \|f_n' - g\|_{L^2} \longrightarrow 0. \tag{17.45}
\]

We will show that the above limits imply $f' = g$ in $\mathcal D'(\Omega)$. Indeed, by (17.45) and the Cauchy–Schwarz inequality, for all $\varphi \in \mathcal D(\Omega)$,
\begin{align*}
\Bigl| \int_\Omega (f_n - f)\,\varphi'\, dx \Bigr|
&\le \Bigl( \int_\Omega |f_n - f|^2\, dx \Bigr)^{\frac12} \Bigl( \int_\Omega |\varphi'|^2\, dx \Bigr)^{\frac12} \longrightarrow 0, \\
\Bigl| \int_\Omega (f_n' - g)\,\varphi\, dx \Bigr|
&\le \Bigl( \int_\Omega |f_n' - g|^2\, dx \Bigr)^{\frac12} \Bigl( \int_\Omega |\varphi|^2\, dx \Bigr)^{\frac12} \longrightarrow 0.
\end{align*}
Hence,

\[
\int_\Omega f'\,\varphi\, dx = -\int_\Omega f\,\varphi'\, dx
= -\lim_{n\to\infty} \int_\Omega f_n\,\varphi'\, dx
= \lim_{n\to\infty} \int_\Omega f_n'\,\varphi\, dx
= \int_\Omega g\,\varphi\, dx.
\]
This shows $f' = g$ in $\mathcal D'(\Omega)$. One says that the elements of $W$ possess weak derivatives; that is, their distributive derivative is an $L^2$-function (and hence a regular distribution). Also, the inner product $\langle\cdot,\cdot\rangle_E$ is positive definite since the $L^2$-inner product is. It turns out that $W$ is a separable Hilbert space. $W$ is the so-called Sobolev space $W^{1,2}_0(\Omega)$, sometimes also denoted by $H^1_0(\Omega)$. The upper indices $1$ and $2$ in $W^{1,2}_0(\Omega)$ refer to the highest order of partial derivatives ($|\alpha| = 1$) and the $L^p$-space ($p = 2$) in the definition of $W$, respectively. The lower index $0$ refers to the so-called generalized boundary values $0$. For further reading on Sobolev spaces, see [Fol95, Chapter 6].

Corollary 17.30 $F(v) = \langle v, f \rangle_{L^2} = \int_\Omega f\, v\, dx$ defines a bounded linear functional on $W$.
Proof. By the Cauchy–Schwarz and Poincaré inequalities,

\[
|F(v)| \le \int_\Omega |f\, v|\, dx \le \|f\|_{L^2}\,\|v\|_{L^2} \le C\,\|f\|_{L^2}\,\|v\|_E .
\]

Hence, $F$ is bounded with $\|F\| \le C\,\|f\|_{L^2}$.

Corollary 17.31 Let $f \in C(\Omega)$. Then there exists a unique $u \in W$ such that

\[
\langle v, u \rangle_E = \langle v, f \rangle_{L^2} \quad \forall v \in W.
\]
This $u$ solves $-\Delta u = f$ in $\mathcal D'(\Omega)$.
The first statement is a consequence of Riesz's representation theorem; note that $F$ is a bounded linear functional on the Hilbert space $W$. The last statement follows from $\mathcal D(\Omega) \subset C^1_0(\Omega)$ and $\langle u, \varphi \rangle_E = -\langle \Delta u, \varphi \rangle = \langle f, \varphi \rangle_{L^2}$. This is the so-called modified Dirichlet problem. It remains an open task to identify the solution $u \in W$ with an ordinary function $u \in C^2(\Omega)$.

17.5.3 Numerical Methods (a) Difference Methods

Since most ODEs and PDEs are not solvable in closed form, there are many methods to find approximate solutions to a given equation or problem. A general principle is discretization. One replaces the derivative $u'(x)$ by one of its difference quotients

\[
\partial^+ u(x) = \frac{u(x + h) - u(x)}{h}, \qquad \partial^- u(x) = \frac{u(x) - u(x - h)}{h},
\]

where $h$ is called the step size. One can also use the symmetric difference $\frac{u(x+h) - u(x-h)}{2h}$. The five-point formula for the Laplacian in $\mathbb{R}^2$ is then given by

\[
\Delta_h u(x, y) := (\partial_x^-\partial_x^+ + \partial_y^-\partial_y^+)\, u(x, y)
= \frac{u(x-h, y) + u(x+h, y) + u(x, y-h) + u(x, y+h) - 4\,u(x, y)}{h^2}.
\]
Besides the equation, the domain $\Omega$ as well as its boundary $\partial\Omega$ undergo a discretization: if $\Omega = (0, 1)\times(0, 1)$, then

\[
\Omega_h = \{ (nh, mh) \in \Omega \mid n, m \in \mathbb{N} \}, \qquad
\partial\Omega_h = \{ (nh, mh) \in \partial\Omega \mid n, m \in \mathbb{Z} \}.
\]
The discretization of the inner Dirichlet problem then reads

\[
\Delta_h u = f, \quad x \in \Omega_h, \qquad u|_{\partial\Omega_h} = \varphi.
\]
Neumann problems also have discretizations; see [Hac92, Chapter 4].
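A minimal implementation of the five-point scheme (our own sketch; the Jacobi iteration, grid size, and test data are our choices, not from the text) solves the discrete Dirichlet problem on the unit square:

```python
# Solve the discrete Dirichlet problem  Delta_h u = f  on the unit square
# with the five-point formula, using plain Jacobi iteration.
def solve_dirichlet(n, f, phi, sweeps=2000):
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]
    # initialize with boundary values of phi, zero in the interior
    u = [[phi(xs[i], xs[j]) if i in (0, n) or j in (0, n) else 0.0
          for j in range(n + 1)] for i in range(n + 1)]
    for _ in range(sweeps):
        # Jacobi sweep:  u_ij = (sum of the 4 neighbours - h^2 * f_ij) / 4
        new = [row[:] for row in u]
        for i in range(1, n):
            for j in range(1, n):
                new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                    + u[i][j - 1] + u[i][j + 1]
                                    - h * h * f(xs[i], xs[j]))
        u = new
    return xs, u

# Test with the harmonic function u(x, y) = x*y (so f = 0): the scheme must
# reproduce x*y exactly, since x*y is discretely harmonic for this stencil.
xs, u = solve_dirichlet(8, lambda x, y: 0.0, lambda x, y: x * y)
err = max(abs(u[i][j] - xs[i] * xs[j]) for i in range(9) for j in range(9))
```

Because $\Delta_h(xy) = 0$ at every grid point, the iteration converges to the exact grid values of $xy$, and `err` is at the level of rounding error.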

(b) The Ritz–Galerkin Method

Suppose we have a boundary value problem in its variational formulation:

Find $u \in V$, so that $\langle u, v \rangle_E = F(v)$ for all $v \in V$,

where we are thinking of the Sobolev space $V = W$ from the previous paragraph. Of course, $F$ is assumed to be bounded. Difference methods arise through discretizing the differential operator. Now we wish to leave the differential operator, which is hidden in $\langle\cdot,\cdot\rangle_E$, unchanged. The Ritz–Galerkin method consists in replacing the infinite-dimensional space $V$ by a finite-dimensional space $V_N$,

\[
V_N \subset V, \qquad \dim V_N = N < \infty.
\]

VN ). An introductory example is to be found in [Hac92, 8.1.11, p. 164], see also [Bra01, Chapter 2]. 496 17 PDE II — The Equations of Mathematical Physics Bibliography

[AF01] I. Agricola and T. Friedrich. Globale Analysis (in German). Friedr. Vieweg & Sohn, Braunschweig, 2001.

[Ahl78] L. V. Ahlfors. Complex analysis. An introduction to the theory of analytic func- tions of one complex variable. International Series in Pure and Applied Mathematics. McGraw-Hill Book Co., New York, 3 edition, 1978.

[Arn04] V. I. Arnold. Lectures in Partial Differential Equations. Universitext. Springer and Phasis, Berlin. Moscow, 2004.

[Bra01] D. Braess. Finite elements. Theory, fast solvers, and applications in solid mechanics. Cambridge University Press, Cambridge, 2001.

[Bre97] Glen E. Bredon. Topology and geometry. Number 139 in Graduate Texts in Mathe- matics. Springer-Verlag, New York, 1997.

[Brö92] Th. Bröcker. Analysis II (German). B. I. Wissenschaftsverlag, Mannheim, 1992.

[Con78] J. B. Conway. Functions of one complex variable. Number 11 in Graduate Texts in Mathematics. Springer-Verlag, New York, 1978.

[Con90] J. B. Conway. A course in functional analysis. Number 96 in Graduate Texts in Mathematics. Springer-Verlag, New York, 1990.

[Cou88] R. Courant. Differential and integral calculus I–II. Wiley Classics Library. John Wiley & Sons, New York etc., 1988.

[Die93] J. Dieudonné. Treatise on analysis. Volume I – IX. Pure and Applied Mathematics. Academic Press, Boston, 1993.

[Els02] J. Elstrodt. Maß- und Integrationstheorie. Springer, Berlin, third edition, 2002.

[Eva98] L. C. Evans. Partial differential equations. Number 19 in Graduate Studies in Math- ematics. AMS, Providence, 1998.

[FB93] E. Freitag and R. Busam. Funktionentheorie. Springer-Lehrbuch. Springer-Verlag, Berlin, 1993.


[FK98] H. Fischer and H. Kaul. Mathematik für Physiker. Band 2 (in German). Teubner Studienbücher: Mathematik. Teubner, Stuttgart, 1998.

[FL88] W. Fischer and I. Lieb. Ausgewählte Kapitel der Funktionentheorie (in German). Number 48 in Vieweg Studium. Friedrich Vieweg & Sohn, Braunschweig, 1988.

[Fol95] G. B. Folland. Introduction to partial differential equations. Princeton University Press, Princeton, 1995.

[For81] O. Forster. Analysis 3 (in German). Vieweg Studium: Aufbaukurs Mathematik. Vieweg, Braunschweig, 1981.

[For01] O. Forster. Analysis 1 – 3 (in German). Vieweg Studium: Grundkurs Mathematik. Vieweg, Braunschweig, 2001.

[GS64] I. M. Gelfand and G. E. Schilow. Verallgemeinerte Funktionen (Distributionen). III: Einige Fragen zur Theorie der Differentialgleichungen (in German). Number 49 in Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, 1964.

[GS69] I. M. Gelfand and G. E. Schilow. Verallgemeinerte Funktionen (Distributionen). I: Verallgemeinerte Funktionen und das Rechnen mit ihnen (in German). Number 47 in Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, 1969.

[Hac92] W. Hackbusch. Elliptic differential equations. Theory and numerical treatment. Number 18 in Springer Series in Computational Mathematics. Springer-Verlag, Berlin, 1992.

[Hen88] P. Henrici. Applied and computational complex analysis. Vol. 1. Wiley Classics Library. John Wiley & Sons, Inc., New York, 1988.

[How03] J. M. Howie. Complex analysis. Springer Undergraduate Mathematics Series. Springer-Verlag, London, 2003.

[HS91] F. Hirzebruch and W. Scharlau. Einführung in die Funktionalanalysis (in German). Number 296 in BI-Hochschultaschenbücher. BI-Wissenschaftsverlag, Mannheim, 1991.

[HW96] E. Hairer and G. Wanner. Analysis by its history. Undergraduate texts in Mathematics. Readings in Mathematics. Springer-Verlag, New York, 1996.

[Jän93] K. Jänich. Funktionentheorie (in German). Springer-Lehrbuch. Springer-Verlag, Berlin, third edition, 1993.

[Joh82] F. John. Partial differential equations. Number 1 in Applied Mathematical Sciences. Springer-Verlag, New York, 1982.

[Jos02] J. Jost. Partial differential equations. Number 214 in Graduate Texts in Mathematics. Springer-Verlag, New York, 2002.

[KK71] A. Kufner and J. Kadlec. Fourier Series. G. A. Toombs Iliffe Books, London, 1971.

[Kno78] K. Knopp. Elemente der Funktionentheorie (in German). Number 2124 in Sammlung Göschen. Walter de Gruyter, Berlin, New York, ninth edition, 1978.

[Kön90] K. Königsberger. Analysis 1 (English). Springer-Verlag, Berlin, Heidelberg, New York, 1990.

[Lan89] S. Lang. Undergraduate Analysis. Undergraduate texts in mathematics. Springer, New York-Heidelberg, second edition, 1989.

[MW85] J. Marsden and A. Weinstein. Calculus. I, II, III. Undergraduate Texts in Mathematics. Springer-Verlag, New York etc., 1985.

[Nee97] T. Needham. Visual complex analysis. Oxford University Press, New York, 1997.

[O’N75] P. V. O’Neil. Advanced calculus. Collier Macmillan Publishing Co., London, 1975.

[RS80] M. Reed and B. Simon. Methods of modern mathematical physics. I. Functional Analysis. Academic Press, Inc., New York, 1980.

[Rud66] W. Rudin. Real and Complex Analysis. International Student Edition. McGraw-Hill Book Co., New York-Toronto, 1966.

[Rud76] W. Rudin. Principles of mathematical analysis. International Series in Pure and Applied Mathematics. McGraw-Hill Book Co., New York-Auckland-Düsseldorf, third edition, 1976.

[Rüh83] F. Rühs. Funktionentheorie (in German). Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, fourth edition, 1983.

[Spi65] M. Spivak. Calculus on manifolds. W. A. Benjamin, New York, Amsterdam, 1965.

[Spi80] M. Spivak. Calculus. Publish or Perish, Inc., Berkeley, California, 1980.

[Str92] W. A. Strauss. Partial differential equations. John Wiley & Sons, New York, 1992.

[Tri92] H. Triebel. Higher analysis. Hochschulbücher für Mathematik. Johann Ambrosius Barth Verlag GmbH, Leipzig, 1992.

[vW81] C. von Westenholz. Differential forms in mathematical physics. Number 3 in Studies in Mathematics and its Applications. North-Holland Publishing Co., Amsterdam-New York, second edition, 1981.

[Wal74] W. Walter. Einführung in die Theorie der Distributionen (in German). Bibliographisches Institut, B.I.-Wissenschaftsverlag, Mannheim-Wien-Zürich, 1974.

[Wal02] W. Walter. Analysis 1–2 (in German). Springer-Lehrbuch. Springer, Berlin, fifth edition, 2002.

[Wla72] V. S. Wladimirow. Gleichungen der mathematischen Physik (in German). Number 74 in Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, 1972.

PD Dr. A. Schüler
Mathematisches Institut
Universität Leipzig
04009 Leipzig
[email protected]