<<

Chapter 6

Differential

In this chapter, it is assumed that all linear spaces and flat spaces under consideration are finite-dimensional.

61 Differentiation of Processes

Let be a flat space with translation space . A mapping p : I from E V → E some I Sub R to will be called a process. It is useful to think ∈ E of the value p(t) as describing the state of some physical system at ∈E t. In special cases, the mapping p describes the motion of a particle and p(t) is the place of the particle at time t. The concept of differentiability for real-valued functions (see Sect.08) extends without difficulty to processes as follows: Definition 1: The process p : I is said to be differentiable at → E t I if the ∈ 1 ∂tp := lim (p(t + s) p(t)) (61.1) s→0 s − exists. Its value ∂tp is then called the of p at t. We say ∈ V that p is differentiable if it is differentiable at all t I. In that case, the ∈ mapping ∂p : I defined by (∂p)(t) := ∂tp for all t I is called the → V ∈ derivative of p. Given n N×, we say that p is n differentiable if ∂np : I ∈ → V can be defined by the recursion

∂1p := ∂p, ∂k+1p := ∂(∂kp) for all k (n 1)]. (61.2) ∈ − We say that p is of class Cn if it is n times differentiable and ∂np is continuous. We say that p is of class C∞ if it is of class Cn for all n N×. ∈ 209 210 CHAPTER 6.

As for a real-valued , it is easily seen that a process p is contin- uous at t Dom p if it is differentiable at t. Hence p is continuous if it is ∈ differentiable, but it may also be continuous without being differentiable. In analogy to (08.34) and (08.35), we also use the notation

p(k) := ∂kp for all k (n 1)] (61.3) ∈ − when p is an n-times differentiable process, and we use

p• := p(1) = ∂p, p•• := p(2) = ∂2p, p••• := p(3) = ∂3p, (61.4) if meaningful. We use the term “process” also for a mapping p : I from some → D interval I into a subset of the flat space . In that case, we use poetic D E license and ascribe to p any of the properties defined above if p E has that | property. Also, we write ∂p instead of ∂(p E ) if p is differentiable, etc. (If | D is included in some flat , then one can take the direction space of rather F F than all of as the codomain of ∂p. This ambiguity will usually not cause V any difficulty.) The following facts are immediate consequences of Def.1, and Prop.5 of Sect.56 and Prop.6 of Sect.57. Proposition 1: The process p : I is differentiable at t I if and → E ∈ only if, for each λ in some basis of ∗, the function λ(p p(t)) : I R is V − → differentiable at t. The process p is differentiable if and only if, for every flat function a ∈ Flf , the function a p is differentiable. E ◦ Proposition 2: Let , ′ be flat spaces and α : ′ a flat mapping. E E E→E If p : I is a process that is differentiable at t I, then α p : I ′ →E ∈ ◦ →E is also differentiable at t I and ∈ ∂t(α p)=( α)(∂tp). (61.5) ◦ ∇ If p is differentiable then (61.5) holds for all t I and we get ∈ ∂(α p)=( α)∂p. (61.6) ◦ ∇ Let p : I and q : I ′ be processes having the same domain I. → E → E Then (p, q) : I ′, defined by value-wise pair formation, (see (04.13)) →E×E is another process. It is easily seen that p and q are both differentiable at t I if and only if (p, q) is differentiable at t. If this is the case we have ∈ ∂t(p, q)=(∂tq,∂tq). (61.7) 61. DIFFERENTIATION OF PROCESSES 211

Both p and q are differentiable if and only if (p, q) is, and, in that case, ∂(p, q)=(∂p, ∂q). (61.8) Let p and q be processes having the same domain I and the same codomain . Since the point-difference (x, y) x y is a flat mapping E 7→ − from into whose is the vector-difference (u, v) u v E×E V 7→ − from into we can apply Prop.2 to obtain V × V V Proposition 3: If p : I and q : I are both differentiable at → E → E t I, so is the value-wise difference p q : I , and ∈ − → V ∂t(p q)=(∂tp) (∂tq). (61.9) − − If p and q are both differentiable, then (61.9) holds for all t I and we ∈ get ∂(p q) = ∂p ∂q. (61.10) − − The following result generalizes the Difference-Quotient Theorem stated in Sect.08. Difference-Quotient Theorem: Let p : I be a process and let → E t ,t I with t < t . If p is continuous and if p is differentiable at 1 2 ∈ 1 2 |[t1,t2] each t ]t ,t [ then ∈ 1 2 p(t2) p(t1) − Clo Cxh ∂tp t ]t ,t [ . (61.11) t t ∈ { | ∈ 1 2 } 2 − 1 Proof: Let a Flf be given. Then (a p) is continuous and, ∈ E ◦ |[t1,t2] by Prop.1, a p is differentiable at each t ]t ,t [. By the elementary ◦ ∈ 1 2 Difference-Quotient Theorem (see Sect.08) we have

(a p)(t2) (a p)(t2) ◦ − ◦ ∂t(a p) t ]t ,t [ . t t ∈{ ◦ | ∈ 1 2 } 2 − 1 Using (61.5) and (33.4), we obtain

p(t2) p(t1) a − ( a)>( ), (61.12) ∇ t t ∈ ∇ S  2 − 1  where := ∂tp t ]t ,t [ . S { | ∈ 1 2 } Since (61.12) holds for all a Flf we can conclude that b( p(t2)−p(t1) ) 0 ∈ E t2−t1 ≥ holds for all those b Flf that satisfy b>( ) P. Using the Half-Space ∈ V S ⊂ Intersection Theorem of Sect.54, we obtain the desired result (61.11). Notes 61

n · (n) (1) See Note (8) to Sect.08 concerning notations such as ∂tp, ∂p, ∂ p, p , p , etc. 212 CHAPTER 6. DIFFERENTIAL CALCULUS

62 Small and Confined Mappings

Let and ′ be linear spaces of strictly positive . Consider a V V mapping n from a neighborhood of zero in to a neighborhood of zero in V ′. If n(0) = 0 and if n is continuous at 0, then we can say, intuitively, that V n(v) approaches 0 in ′ as v approaches 0 in . We wish to make precise V V the idea that n is small near 0 in the sense that n(v) approaches 0 ′ ∈ V ∈ V faster than v approaches 0 . ∈ V Definition 1: We say that a mapping n from a neighborhood of 0 in V to a neighborhood of 0 in ′ is small near 0 if n(0) = 0 and, for all norms V ν and ν′ on and ′, respectively, we have V V ν′(n(u)) lim =0. (62.1) u→0 ν(u) The set of all such small mappings will be denoted by Small( , ′). V V Proposition 1: Let n be a mapping from a neighborhood of 0 in to a V neighborhood of 0 in ′. Then the following conditions are equivalent: V (i) n Small( , ′). ∈ V V (ii) n(0) = 0 and the limit-relation (62.1) holds for some norm ν on V and some norm ν′ on ′. V (iii) For every bounded subset of and every ′ Nhd ( ′) there is a S V N ∈ 0 V δ P× such that ∈ n(sv) s ′ for all s ] δ, δ[ (62.2) ∈ N ∈ − and all v such that sv Dom n. ∈ S ∈ Proof: (i) (ii): This implication is trivial. ⇒ (ii) (iii): Assume that (ii) is valid. Let ′ Nhd0( ′) and a bounded ⇒ N ∈ V subset of be given. By Cor.1 to the Cell-Inclusion Theorem of Sect.52, S V we can choose b P× such that ∈ ν(v) b for all v . (62.3) ≤ ∈ S By Prop.3 of Sect.53 we can choose ε P× such that ∈ εbCe(ν′) ′. (62.4) ⊂ N Applying Prop.4 of Sect.57 to the assumption (ii) we obtain δ P× such ∈ that, for all u Dom n, ∈ ν′(n(u)) < εν(u) if 0 <ν(u) < δb. (62.5) 62. SMALL AND CONFINED MAPPINGS 213

Now let v be given. Then ν(sv) = s ν(v) s b<δb for all s ] δ, δ[ ∈ S | | ≤| | ∈ − such that sv Dom n. Therefore, by (62.5), we have ∈ ν′(n(sv)) < εν(sv) = ε s ν(v) s εb | | ≤| | if sv = 0, and hence 6 n(sv) sεbCe(ν′) ∈ for all s ] δ, δ[ such that sv Dom n. The desired conclusion (62.2) now ∈ − ∈ follows from (62.4). (iii) (i): Assume that (iii) is valid. Let a norm ν on , a norm ⇒ V ν′ on ′, and ε P× be given. We apply (iii) to the choices := Ce(ν), V ∈ S ′ := εCe(ν′) and determine δ P× such that (62.2) holds. If we put N ∈ s := 0 in (62.2) we obtain n(0) = 0. Now let u Dom n be given such that ∈ 1 0 < ν(u) < δ. If we apply (62.2) with the choices s := ν(u) and v := s u, we see that n(u) ν(u)εCe(ν′), which yields ∈ ν′(n(u)) < ε. ν(u) The assertion follows by applying Prop.4 of Sect.57. The condition (iii) of Prop.1 states that 1 lim n(sv)=0 (62.6) s→0 s for all v and, roughly, that the limit is approached uniformly as v varies ∈ V in an arbitrary bounded set. We also wish to make precise the intuitive idea that a mapping h from a neighborhood of 0 in to a neighborhood of 0 in ′ is confined near zero in V V the sense that h(v) approaches 0 ′ not more slowly than v approaches ∈ V 0 . ∈ V Definition 2: A mapping h from a neighborhood of 0 in to a neigh- V borhood of 0 in ′ is said to be confined near 0 if for every norm ν on V V and every norm ν′ on ′ there is Nhd0( ) and κ P× such that V N ∈ V ∈ ν′(h(u)) κν(u) for all u Dom h. (62.7) ≤ ∈ N ∩ The set of all such confined mappings will be denoted by Conf( , ′). V V Proposition 2: Let h be a mapping from a neighborhood of 0 in to a V neighborhood of 0 in ′. Then the following are equivalent: V (i) h Conf( , ′). ∈ V V 214 CHAPTER 6. DIFFERENTIAL CALCULUS

(ii) There exists a norm ν on , a norm ν′ on ′, a neighborhood of 0 V V N in , and κ P× such that (62.7) holds. V ∈ (iii) For every bounded subset of there is δ P× and a bounded subset S V ∈ ′ of ′ such that S V h(sv) s ′ for all s ] δ, δ[ (62.8) ∈ S ∈ − and all v such that sv Dom h. ∈ S ∈ Proof: (i) (ii): This is trivial. ⇒ (ii) (iii): Assume that (ii) holds. Let a bounded subset of be ⇒ S V given. By Cor.1 to the Cell-Inclusion Theorem of Sect.52, we can choose b P× such that ∈ ν(v) b for all v . ≤ ∈ S By Prop.3 of Sect.53, we can determine δ P× such that δbCe(ν) . ∈ ⊂ N Hence, by (62.7), we have for all u Dom h ∈ ν′(h(u)) κν(u) if ν(u) < δb. (62.9) ≤ Now let v be given. Then ν(sv) = s ν(v) s b<δb for all s ] δ, δ[ ∈ S | | ≤| | ∈ − such that sv Dom h. Therefore, by (62.9) we have ∈ ν′(h(sv) κν(sv) = κ s ν(v) < s κb ≤ | | | | and hence

h(sv) sκbCe(ν′) ∈ for all s ] δ, δ[ such that sv Dom h. If we put ′ := κbCe(ν′), we obtain ∈ − ∈ S the desired conclusion (62.8). (iii) (i): Assume that (iii) is valid. Let a norm ν on and a norm ν′ ⇒ V on ′ be given. We apply (iii) to the choice := Ce(ν) and determine ′ and V S S δ P× such that (62.8) holds. Since ′ is bounded, we can apply the Cell- ∈ S Inclusion Theorem of Sect.52 and determine κ P× such that ′ κCe(ν′). ∈ S ⊂ We put := δCe(ν) Dom h, which belongs to Nhd0( ). If we put s := 0 N ∩ V in (62.8) we obtain h(0) = 0, which shows that (62.7) holds for u := 0. Now let u × be given, so that 0 <ν(u) <δ. If we apply (62.8) with the ∈ N 1 choices s := ν(u) and v := s u, we see that h(u) ν(u) ′ ν(u)κCe(ν′), ∈ S ⊂ which yields the assertion (62.7). The following results are immediate consequences of the definition and of the properties on linear mappings discussed in Sect.52: 62. SMALL AND CONFINED MAPPINGS 215

(I) Value-wise sums and value-wise scalar multiples of mappings that are small [confined] near zero are again small [confined] near zero.

(II) Every mapping that is small near zero is also confined near zero, i.e.

Small( , ′) Conf( , ′). V V ⊂ V V (III) If h Conf( , ′), then h(0) = 0 and h is continuous at 0. ∈ V V (IV) Every linear mapping is confined near zero, i.e.

Lin( , ′) Conf( , ′). V V ⊂ V V

(V) The only linear mapping that is small near zero is the zero-mapping, i.e. Lin( , ′) Small( , ′) = 0 . V V ∩ V V { } Proposition 3: Let , ′, ′′ be linear spaces and let h V V V ∈ Conf( , ′) and k Conf( ′, ′′) be such that Dom k = Cod h. Then V V ∈ V V k h Conf( , ′′). Moreover, if one of k or h is small near zero so is ◦ ∈ V V k h. ◦ Proof: Let norms ν,ν′,ν′′ on , ′, ′′, respectively, be given. Since h V V V and k are confined we can find κ,κ′ P× and Nhd0( ), ′ Nhd0( ′) ∈ N ∈ V N ∈ V such that ν′′((k h)(u) κ′ν′(h(u)) κ′κν(u) (62.10) ◦ ≤ ≤ for all u Dom h such that h(u) ′ Dom k, i.e. for all ∈ N ∩ ∈ N ∩ u h<( ′ Dom k). Since h is continuous at 0 , we have ∈ N ∩ N ∩ ∈ V h<( ′ Dom k) Nhd0( ) and hence h<( ′ Dom k) Nhd0( ). N ∩ ∈ V N ∩ N ∩ ∈ V Thus, (62.7) remains satisfied when we replace h,κ and by k h,κ′κ, and N ◦ h<( ′ Dom k), respectively, which shows that k h Conf( , ′′). N ∩ N ∩ ◦ ∈ V V Assume now, that one of k and h, say h, is small. Let ε P× be given. ∈ Then we can choose Nhd0( ) such that ν′(h(u)) κν(u) holds for all N ∈ ε V ≤′′ u Dom h with κ := ′ . Therefore, (62.10) gives ν ((k h)(u)) εν(u) ∈ N ∩ κ ◦ ≤ for all u h<( ′ Dom k). Since ε P× was arbitrary this proves ∈ N ∩ N ∩ ∈ that ν′′((k h)(u)) lim ◦ =0, u→0 ν(u) i.e. that k h is small near zero. ◦ Now let and ′ be flat spaces with translation spaces and ′, respec- E E V V tively. 216 CHAPTER 6. DIFFERENTIAL CALCULUS

Definition 3: Let x be given. We say that a mapping σ from a ∈ E neighborhood of x to a neighborhood of 0 ′ is small near x if the ∈ E ∈ V mapping v σ(x + v) from (Dom σ) x to Cod σ is small near 0. The 7→ − ′ set of all such small mappings will be denoted by Smallx( , ). E V We say that a mapping ϕ from a neighborhood of x to a neighborhood ∈E of ϕ(x) ′ is confined near x if the mapping v (ϕ(x+v) ϕ(x)) from ∈E 7→ − (Dom ϕ) x to (Cod ϕ) ϕ(x) is confined near zero. − − The following characterization is immediate. Proposition 4: The mapping ϕ is confined near x if and only if ′ ′ ∈ E for every norm ν on and every norm ν on there is Nhdx( ) and V V N ∈ E κ P× such that ∈ ν′(ϕ(y) ϕ(x)) κν(y x) for all y . (62.11) − ≤ − ∈ N We now state a few facts that are direct consequences of the definitions, the results (I)–(V) stated above, and Prop.3: (VI) Value-wise sums and differences of mappings that are small [confined] near x are again small [confined] near x. Here, “sum” can mean either the sum of two vectors or sum of a point and a vector, while “differ- ence” can mean either the difference of two vectors or the difference of two points.

′ (VII) Every σ Smallx( , ) is confined near x. ∈ E V (VIII) If a mapping is confined near x it is continuous at x. (IX) A flat mapping α : ′ is confined near every x . E→E ∈E (X) The only flat mapping β : ′ that is small near some x is E → V ∈ E the 0E→V′ . (XI) If ϕ is confined near x and if ψ is a mapping with Dom ψ = Cod ϕ ∈E that is confined near ϕ(x) then ψ ϕ is confined near x. ◦ ′ ′ ′′ (XII) If σ Smallx( , ) and h Conf( , ) with Cod σ = Dom h, then ∈ E V ′′ ∈ V V h σ Smallx( , ). ◦ ∈ E V (XIII) If ϕ is confined near x and if σ is a mapping with Dom σ = Cod ϕ ∈E ′′ ′′ that is small near ϕ(x) then σ ϕ Smallx( , ), where is the ◦ ∈ E V V linear space for which Cod σ Nhd0( ′′). ∈ V (XIV) An adjustment of a mapping that is small [confined] near x is again small [confined] near x, provided only that the concept small [confined] near x remains meaningful after the adjustment. 63. , 217

Notes 62

′ (1) In the conventional treatments, the norms ν and ν in Defs.1 and 2 are assumed to be prescribed and fixed. The notation n = o(ν), and the phrase “n is small oh ′ of ν”, are often used to express the assertion that n ∈ Small(V, V ). The notation h = O(ν), and the phrase “h is big oh of ν”, are often used to express the assertion ′ that h ∈ Conf(V, V ). I am introducing the terms “small” and “confined” here for the first time because I believe that the conventional terminology is intolerably awkward and involves a misuse of the = sign.

63 Gradients, Chain Rule

Let I be an open interval in R. One learns in elementary calculus that if a function f : I R is differentiable at a point t I, then the graph of f has → ∈ a at (t, f(t)). This tangent is the graph of a flat function a Flf(R). ∈ Using poetic license, we refer to this function itself as the tangent to f at t I. In this sense, the tangent a is given by a(r) := f(t)+(∂tf)(r t) for ∈ − all r R. ∈

If we put σ := f a I , then σ(r) = f(r) f(t) (∂tf)(r t) for all r I. −σ(t+|s) − − − ∈ We have lims = 0, from which it follows that σ Smallt(R, R). →0 s ∈ One can use the existence of a tangent to define differentiability at t. Such a definition generalizes directly to mappings involving flat spaces. Let , ′ be flat spaces with translation spaces , ′, respectively. We E E V V consider a mapping ϕ : ′ from an open subset of into an open D → D D E subset ′ of ′. D E Proposition 1: Given x , there can be at most one flat mapping ∈ D α : ′ such that the value-wise difference ϕ α : ′ is small E →E − |D D → V near x. Proof: If the flat mappings α1, α2 both have this property, then the value-wise difference (α α ) = (ϕ α ) (ϕ α ) is small near 2 − 1 |D − 1|D − − 2|D 218 CHAPTER 6. DIFFERENTIAL CALCULUS x . Since α α is flat, it follows from (X) and (XIV) of Sect.62 that ∈ E 2 − 1 α α is the zero mapping and hence that α = α . 2 − 1 1 2 Definition 1: The mapping ϕ : ′ is said to be differentiable D → D at x if there is a flat mapping α : ′ such that ∈ D E→E ′ ϕ α Smallx( , ). (63.1) − |D ∈ E V This (unique) flat mapping α is then called the tangent to ϕ at x. The gradient of ϕ at x is defined to be the gradient of α and is denoted by

xϕ := α. (63.2) ∇ ∇ We say that ϕ is differentiable if it is differentiable at all x . If this ∈ D is the case, the mapping

ϕ : Lin( , ′) (63.3) ∇ D→ V V defined by ( ϕ)(x) := xϕ for all x (63.4) ∇ ∇ ∈ D is called the gradient of ϕ. We say that ϕ is of class C1 if it is differentiable and if its gradient ϕ is continuous. We say that ϕ is twice differentiable ∇ if it is differentiable and if its gradient ϕ is also differentiable. The gradient ∇ of ϕ is then called the second gradient of ϕ and is denoted by ∇ (2)ϕ := ( ϕ): Lin( , Lin( , ′)) = Lin ( 2, ′). (63.5) ∇ ∇ ∇ D→ V V V ∼ 2 V V We say that ϕ is of class C2 if it is twice differentiable and if (2)ϕ is ∇ continuous. If the subsets and ′ are arbitrary, not necessarily open, and if x D D ′ E ∈ Int , we say that ϕ : is differentiable at x if ϕ Int D is differentiable D D → D E | at x and we write xϕ for x(ϕ ). ∇ ∇ |Int D The differentiability properties of a mapping ϕ remain unchanged if the codomain of ϕ is changed to any open subset of ′ that includes Rng ϕ. The E gradient of ϕ remains unaltered. If Rng ϕ is included in some flat ′ in ′, F E one may change the codomain to a subset that is open in ′. in that case, F the gradient of ϕ at a point x must be replaced by the adjustment U ′ ′ ∈ D ′ xϕ of xϕ, where is the direction space of . ∇ | ∇ U F The differentiability and the gradient of a mapping at a point depend only on the values of the mapping near that point. To be more precise, let ϕ1 and ϕ be two mappings whose domains are neighborhoods of a given x 2 ∈E and whose codomains are open subsets of ′. Assume that ϕ and ϕ agree E 1 2 63. GRADIENTS, CHAIN RULE 219

E′ E′ on some neighborhood of x, i.e. that ϕ = ϕ for some Nhdx( ). 1|N 2|N N ∈ E Then ϕ1 is differentiable at x if and only if ϕ2 is differentiable at x. If this is the case, we have xϕ = xϕ . ∇ 1 ∇ 2 Every flat mapping α : ′ is differentiable. The tangent of α at E →E every point x is α itself. The gradient of α as defined in this section ∈ E is the constant ( α) ′ whose value is the gradient α of α in the ∇ E→Lin(V,V ) ∇ sense of Sect.33. The gradient of a linear mapping is the constant whose value is this linear mapping itself. In the case when := := R and when := I is an open interval, E V D differentiability at t I of a process p : I ′ in the sense of Def.1 above ∈ → D reduces to differentiability of p at t in the sense of Def.1 of Sect.1. The ′ ′ gradient tp Lin(R, ) becomes associated with the derivative ∂tp ∇ ∈ V ′ ′ ∈ V by the natural isomorphism from to Lin(R, ), so that tp =(∂tp) and V V ∇ ⊗ ∂tp =( tp)1 (see Sect.25). If I is an interval that is not open and if t is an ∇ endpoint of it, then the derivative of p : I ′ at t, if it exists, cannot be → D associated with a gradient. If ϕ is differentiable at x then it is confined and hence continuous at x. This follows from the fact that its tangent α, being flat, is confined near x and that the difference ϕ α , being small near x, is confined near x. The − |D converse is not true. For example, it is easily seen that the absolute-value function (t t ): R R is confined near 0 R but not differentiable at 0. 7→ | | → ∈ The following criterion is immediate from the definition: Characterization of Gradients: The mapping ϕ : ′ is dif- D → D ferentiable at x if and only if there is an L Lin( , ′) such that ∈ D ∈ V V n :( x) , defined by D − → V n(v):=(ϕ(x + v) ϕ(x)) Lv for all v x, (63.6) − − ∈ D − is small near 0 in . If this is the case, then xϕ = L. V ∇ Let , , be open subsets of flat spaces , , with translation D D1 D2 E E1 E2 spaces , , , respectively. The following result follows immediately from V V1 V2 the definitions if we use the term-wise evaluations (04.13) and (14.12). Proposition 2: The mapping (ϕ ,ϕ ) : is differentiable 1 2 D → D1 × D2 at x if and only if both ϕ and ϕ are differentiable at x. If this is the ∈ D 1 2 case, then

x(ϕ ,ϕ )=( xϕ , xϕ ) Lin( , ) Lin( , ) = Lin( , ) ∇ 1 2 ∇ 1 ∇ 2 ∈ V V1 × V V2 ∼ V V1 × V2 (63.7) General Chain Rule: Let , ′, ′′ be open subsets of flat spaces D D D , ′, ′′ with translation spaces , ′, ′′, respectively. If ϕ : ′ is E E E V V V D → D 220 CHAPTER 6. DIFFERENTIAL CALCULUS differentiable at x and if ψ : ′ ′′ is differentiable at ϕ(x), then the ∈ D D → D composite ψ ϕ : ′′ is differentiable at x. The tangent to the compos- ◦ D → D ite ψ ϕ at x is the composite of the tangent to ϕ at x and the tangent to ψ ◦ at ϕ(x) and we have

x(ψ ϕ)=( ψ)( xϕ). (63.8) ∇ ◦ ∇ϕ(x) ∇ If ϕ and ψ are both differentiable, so is ψ ϕ, and we have ◦ (ψ ϕ)=( ψ ϕ)( ϕ), (63.9) ∇ ◦ ∇ ◦ ∇ where the product on the right is understood as value-wise composition. Proof: Let α be the tangent to ϕ at x and β the tangent to ψ at ϕ(x). Then ′ σ := ϕ α Smallx( , ), − |D ∈ E V ′ ′′ τ := ψ β ′ Small ( , ). − |D ∈ ϕ(x) E V We have

ψ ϕ = (β + τ ) (α + σ) ◦ ◦ = β (α + σ) + τ (α + σ) ◦ ◦ = β α +( β) σ + τ (α + σ) ◦ ∇ ◦ ◦ where domain restriction symbols have been omitted to avoid clutter. It follows from (VI), (IX), (XII), and (XIII) of Sect.62 that ( β) σ + τ (α + ′′ ∇ ◦ ◦ σ) Smallx( , ), which means that ∈ E V ′′ ψ ϕ β α Smallx( , ). ◦ − ◦ ∈ E V If follows that ψ ϕ is differentiable at x with tangent β α. The assertion ◦ ◦ (63.8) follows from the Chain Rule for Flat Mappings of Sect.33. Let ϕ : ′ and ψ : ′ both be differentiable at x . Then D→E D→E ∈ D the value-wise difference ϕ ψ : ′ is differentiable at x and − D → V

x(ϕ ψ) = xϕ xψ. (63.10) ∇ − ∇ −∇ This follows from Prop.2, the fact that the point-difference (x′, y′) (x′ y′) 7→ − is a flat mapping from ′ ′ into ′, and the General Chain Rule. E ×E V When the General Chain Rule is applied to the composite of a vector- valued mapping with a linear mapping it yields 63. GRADIENTS, CHAIN RULE 221

Proposition 3: If h : ′ is differentiable at x and if L D → V ∈ D ∈ Lin( ′, ), where is some linear space, then Lh : is differentiable V W W D→W at x and ∈ D x(Lh) = L( xh). (63.11) ∇ ∇ Using the fact that vector-addition, transpositions of linear mappings, and the trace operations are all linear operations, we obtain the following special cases of Prop.3:

(I) Let h : ′ and k : ′ both be differentiable at x . D → V D → V ∈ D Then the value-wise sum h + k : ′ is differentiable at x and D → V

x(h + k) = xh + xk. (63.12) ∇ ∇ ∇

(II) Let and be linear spaces and let F : Lin( , ) be dif- W Z D → W Z ferentiable at x. If F⊤ : Lin( ∗, ∗) is defined by value-wise D → Z W transposition, then F⊤ is differentiable at x and ∈ D ⊤ ⊤ x(F )v = (( xF)v) for all v . (63.13) ∇ ∇ ∈ V In particular, if I is an open interval and if F : I Lin( , ) is → W Z differentiable, so is F⊤ : I Lin( ∗, ∗), and → Z W (F⊤)• =(F•)⊤. (63.14)

(III) Let be a linear space and let F : Lin( ) be differentiable at x. W D→ W If trF : R is the value-wise trace of F, then trF is differentiable D → at x and

( x(trF))v = tr(( xF)v) for all v . (63.15) ∇ ∇ ∈ V In particular, if I is an open interval and if F : I Lin( ) is differ- → W entiable, so is trF : I R, and → (trF)• = tr(F•). (63.16)

We note three special cases of the General Chain Rule: Let I be an open interval and let and ′ be open subsets of and ′, respectively. If D D E E p : I and ϕ : ′ are differentiable, so is ϕ p : I ′, and → D D → D ◦ → D (ϕ p)• = (( ϕ) p)p•. (63.17) ◦ ∇ ◦ 222 CHAPTER 6. DIFFERENTIAL CALCULUS

If ϕ : ′ and f : ′ R are differentiable, so is f ϕ : R, and D → D D → ◦ D→ (f ϕ)=( ϕ)⊤(( f) ϕ) (63.18) ∇ ◦ ∇ ∇ ◦ (see (21.3)). If f : I and p : I ′ are differentiable, so is p f, and D→ → D ◦ (p f)=(p• f) f. (63.19) ∇ ◦ ◦ ⊗∇

Notes 63

(1) Other common terms for the concept of “gradient” of Def.1 are “differential”, “Fr´echet differential”, “derivative”, and “Fr´echet derivative”. Some authors make an artificial distinction between “gradient” and “differential”. We cannot use “derivative” because, for processes, “gradient” and “derivative” are distinct though related concepts.

(2) The conventional definitions of gradient depend, at first view, on the prescription of a norm. Many texts never even mention the fact that the gradient is a norm- invariant concept. In some contexts, as when one deals with genuine Euclidean spaces, this norm-invariance is perhaps not very important. However, when one deals with mathematical models for space-time in the theory of relativity, the norm- invariance is crucial because it shows that the concepts of differential calculus have a “Lorentz-invariant” meaning.

(3) I am introducing the notation ∇xϕ for the gradient of ϕ at x because the more conventional notation ∇ϕ(x) suggests, incorrectly, that ∇ϕ(x) is necessarily the value at x of a gradient-mapping ∇ϕ. In fact, one cannot define the gradient- mapping ∇ϕ without first having a notation for the gradient at a point (see 63.4).

′ (4) Other notations for ∇xϕ in the literature are dϕ(x), Dϕ(x), and ϕ (x).

(5) I conjecture that the “Chain” of “Chain Rule” comes from an old terminology that used “chaining” (in the sense of “concatenation”) for “composition”. The term “Composition Rule” would need less explanation, but I retained “Chain Rule” because it is more traditional and almost as good.

64 Constricted Mappings

In this section, and ′ denote arbitrary subsets of flat spaces and ′ D D E E with translation spaces and ′, respectively. V V Definition 1: We say that the mapping ϕ : ′ is constricted if D → D for every norm ν on and every norm ν′ on ′ there is κ P× such that V V ∈ ν′(ϕ(y) ϕ(x)) κν(y x) (64.1) − ≤ − 64. CONSTRICTED MAPPINGS 223 holds for all x, y . The infimum of the set of all κ P× for which (64.1) ∈ D ∈ holds for all x, y is called the striction of ϕ relative to ν and ν′; it is ∈ D denoted by str(ϕ; ν,ν′). It is clear that ν′(ϕ(y) ϕ(x)) str(ϕ; ν,ν′) = sup − x, y ,x = y . (64.2) ν(y x) ∈ D 6   −

Proposition 1: For every mapping ϕ : ′, the following are D → D equivalent:

(i) ϕ is constricted.

(ii) There exist norms ν and ν′ on and ′, respectively, and κ P× such V V ∈ that (64.1) holds for all x, y . ∈ D (iii) For every bounded subset of and every ′ Nhd ( ′) there is C V N ∈ 0 V ρ P× such that ∈ x y s = ϕ(x) ϕ(y) sρ ′ (64.3) − ∈ C ⇒ − ∈ N for all x, y and s P×. ∈ D ∈ Proof: (i) (ii): This implication is trivial. ⇒ (ii) (iii): Assume that (ii) holds, and let a bounded subset of ⇒ C V and ′ Nhd0( ′) be given. By Cor.1 of the Cell-Inclusion Theorem of N ∈ V Sect.52, we can choose b P× such that ∈ ν(u) b for all u . (64.4) ≤ ∈C By Prop.3 of Sect.53, we can choose σ P× such that σCe(ν′) ′. Now ∈ ⊂ N let x, y and s P× be given and assume that x y s . Then ∈ D ∈ − ∈ C 1 (x y) and hence, by (64.4), we have ν( 1 (x y)) b, which gives s − ∈ C s − ≤ ν(x y) sb. Using (64.1) we obtain ν′(ϕ(y) ϕ(x)) sκb. If we put −κb ≤ − ≤ ρ := σ , this means that

ϕ(y) ϕ(x) sρσCe(ν′) sρ ′, − ∈ ⊂ N i.e. that (64.3) holds. (iii) (i): Assume that (iii) holds and let a norm ν on and a norm ⇒ V ν′ on ′ be given. We apply (iii) with the choices := BdyCe(ν) and V C ′ := Ce(ν′). Let x, y with x = y be given. If we put s := ν(x y) N ∈ D 6 − 224 CHAPTER 6. DIFFERENTIAL CALCULUS then x y sBdy Ce(ν) and hence, by (64.3), ϕ(x) ϕ(y) sρCe(ν′), which − ∈ − ∈ implies ν′(ϕ(x) ϕ(y)) < sρ = ρν(x y). − − Thus, if we put κ := ρ, then (64.1) holds for all x, y . ∈ D If and ′ are open sets and if ϕ : ′ is constricted, it is confined D D D → D near every x , as is evident from Prop.4 of Sect.62. The converse is not ∈ D true. For example, one can show that the function f :] 1, 1[ R defined − → by t sin( 1 ) if t ]0, 1[ f(t) := t ∈ (64.5) 0 if t ] 1, 0]  ∈ −  is not constricted, but is confined near every t ] 1, 1[. ∈ − Every flat mapping α : ′ is constricted and E→E ′ str(α; ν,ν ) = α ν,ν′ , ||∇ || ′ ′ where ν,ν′ is the operator norm on Lin( , ) corresponding to ν and ν || || V V (see Sect.52). Constrictedness and strictions remain unaffected by a change of codo- main. If the domain of a constricted mapping is restricted, then it remains constricted and the striction of the restriction is less than or equal to the striction of the original mapping. (Pardon the puns.) Proposition 2: If ϕ : ′ is constricted then it is uniformly D → D continuous. Proof: We use condition (iii) of Prop.1. Let ′ Nhd ( ′) be given. N ∈ 0 V We choose a bounded neighborhood of 0 in and determine ρ P× 1 C V 1 ∈ according to (iii). Putting := Nhd0( ) and s := , we see that N ρ C ∈ V ρ (64.3) gives

x y = ϕ(x) ϕ(y) ′ for all x, y . − ∈ N ⇒ − ∈ N ∈ D Pitfall: The converse of Prop.2 is not valid. A counterexample is the square-root function √ : P P, which is uniformly continuous but not → constricted. Another counterexample is the function defined by (64.5). The following result is the most useful criterion for showing that a given mapping is constricted. Striction Estimate for Differentiable Mappings: Assume that D is an open convex subset of , that ϕ : ′ is differentiable and that the E D → D gradient ϕ : Lin( , ′) has a bounded range. Then ϕ is constricted ∇ D → V V and ′ str(ϕ; ν,ν ) sup zϕ ′ z (64.6) ≤ {||∇ ||ν,ν | ∈ D} 64. CONSTRICTED MAPPINGS 225 for all norms ν,ν′ on , ′, respectively. V V Proof: Let x, y . Since is convex, we have tx + (1 t)y for ∈ D D − ∈ D all t [0, 1] and hence we can define a process ∈ p : [0, 1] ′ by p(t) := ϕ(tx + (1 t)y). → D − By the Chain Rule, p is differentiable at t when t ]0, 1[ and we have ∈

∂tp =( zϕ)(x y) with z := tx + (1 t)y . ∇ − − ∈ D Applying the Difference-Quotient Theorem of Sect.61 to p, we obtain

ϕ(x) ϕ(y) = p(1) p(0) Clo Cxh ( zϕ)(x y) z . (64.7) − − ∈ { ∇ − | ∈ D} If ν,ν′ are norms on , ′ then, by (52.7), V V ′ ν (( zϕ)(x y)) zϕ ν,ν′ ν(x y) (64.8) ∇ − ≤ ||∇ || − for all z . To say that ϕ has a bounded range is equivalent, by Cor.1 ∈ D ∇ to the Cell-Inclusion Theorem of Sect.52, to

κ := sup zϕ ν,ν′ z < . (64.9) {||∇ || | ∈ D} ∞ ′ It follows from (64.8) that ν (( zϕ)(x y) κν(x y) for all z , which ∇ − ≤ − ∈ D can be expressed in the form

′ ( zϕ)(x y) κν(x y)Ce(ν ) for all z . ∇ − ∈ − ∈ D Since the set on the right is closed and convex we get

′ Clo Cxh ( zϕ)(y x) z κν(x y)Ce(ν ) { ∇ − | ∈D}⊂ − and hence, by (64.7), ϕ(x) ϕ(y) κν(x y)Ce(ν′). This may be expressed − ∈ − in the form ν′(ϕ(x) ϕ(y)) κν(x y). − ≤ − Since x, y were arbitrary it follows that ϕ is constricted. The definition ∈ D (64.9) shows that (64.6) holds. Remark: It is not hard to prove that the inequality in (64.6) is actually an equality. Proposition 3: If is a non-empty open convex set and if ϕ : ′ D D → D is differentiable with gradient zero, then ϕ is constant. Proof: Choose norms ν,ν′ on , ′, respectively. The assumption ϕ = V V ∇′ 0 gives zϕ ′ = 0 for all z . Hence, by (64.6), we have str(ϕ,ν,ν ) = ||∇ ||ν,ν ∈ D 226 CHAPTER 6. DIFFERENTIAL CALCULUS

0. Using (64.2), we conclude that ν′(ϕ(y) ϕ(x)) = 0 and hence ϕ(x) = ϕ(y) − for all x, y . ∈ D Remark: In Prop.3, the condition that be convex can be replaced by D the weaker one that be “connected”. This means, intuitively, that every D two points in can be connected by a continuous entirely within . D D Proposition 4: Assume that and ′ are open subsets and that D D ϕ : ′ is of class C1. Let k be a compact subset of . For every D → D D norm ν on , every norm ν′ on ′, and every ε P× there exists δ P× V V ∈ ∈ such that k + δCe(ν) and such that, for every x k, the function ′ ⊂ D ∈ nx : δCe(ν) defined by → V

nx(v) := ϕ(x + v) ϕ(x) ( xϕ)v (64.10) − − ∇ is constricted with ′ str(nx; ν,ν ) ε. (64.11) ≤ Proof: Let a norm ν on be given. By Prop.6 of Sect.58, we can × V obtain δ1 P such that k + δ1Ce(ν) . For each x k, we define ∈ ′ ⊂ D ∈ mx : δ Ce(ν) by 1 → V

mx(v) := ϕ(x + v) ϕ(x) ( xϕ)v. (64.12) − − ∇ Differentiation gives

vmx = ϕ(x + v) ϕ(x) (64.13) ∇ ∇ −∇ for all x k and all v δ Ce(ν). Since k + δ Ce(ν) is compact by Prop.6 ∈ ∈ 1 1 of Sect.58 and since ϕ is continuous, it follows by the Uniform Continuity ∇ Theorem of Sect.58 that ϕ k is uniformly continuous. Now let a ∇ | +δ1Ce(ν) norm ν′ on ′ and ε P× be given. By Prop.4 of Sect.56, we can then V ∈ determine δ P× such that 2 ∈ ν(y x) <δ = ϕ(y) ϕ(x) ′ <ε − 2 ⇒ ||∇ −∇ ||ν,ν for all x, y k + δ Ce(ν). In view of (64.13), it follows that ∈ 1

v δ Ce(ν) = vmx ′ <ε ∈ 2 ⇒ ||∇ ||ν,ν for all x k and all v δ Ce(ν). If we put δ := min δ ,δ and if we define ∈ ∈ 1 { 1 2} nx := mx for every x k, we see that vnx ν,ν′ v δCe(ν) |δCe(ν) ∈ {||∇ || | ∈ } is bounded by ε for all x k. By (64.12) and the Striction Estimate for ∈ Differentiable Mappings, the desired result follows. 64. CONSTRICTED MAPPINGS 227

Definition 2: Let ϕ : be a constricted mapping from a set D → D D into itself. Then

str(ϕ) := inf str(ϕ; ν,ν) ν a norm on (64.14) { | V} is called the absolute striction of ϕ. If str(ϕ) < 1 we say that ϕ is a contraction. Contraction Fixed Point Theorem: Every contraction has at most one fixed point and, if its domain is closed and not empty, it has exactly one fixed point. Proof: Let ϕ : be a contraction. We choose a norm ν on such D → D V that κ := str(ϕ; ν,ν) < 1 and hence, by (64.1),

ν(ϕ(x) ϕ(y)) κν(x y) for all x, y . (64.15) − ≤ − ∈ D If x and y are fixed points of ϕ, so that ϕ(x) = x, ϕ(y) = y, then (64.15) gives ν(x y)(1 κ) 0. Since 1 κ > 0 this is possible only when − − ≤ − ν(x y) = 0 and hence x = y. Therefore, ϕ can have at most one fixed − point. We now assume = , choose s arbitrarily, and define D 6 ∅ 0 ∈ D ◦n × sn := ϕ (s ) for all n N 0 ∈ (see Sect.03). It follows from (64.15) that

ν(sm sm) = ν(ϕ(sm) ϕ(sm )) κν(sm sm ) +1 − − −1 ≤ − −1 for all m N×. Using induction, one concludes that ∈ m ν(sm sm) κ ν(s s ) for all m N. +1 − ≤ 1 − 0 ∈ Now, if n N and r N, then ∈ ∈

sn r sn = (sn k sn k) + − + +1 − + [ kX∈r and hence n n+k κ ν(sn r sn) κ ν(s s ) ν(s s ). + − ≤ 1 − 0 ≤ 1 κ 1 − 0 [ kX∈r − n × Since κ < 1, we have lim n κ = 0, and it follows that for every ε P →∞ ∈ there is an m N such that ν(sn r sn) <ε whenever n m+N, r N. By ∈ + − ∈ ∈ 228 CHAPTER 6. DIFFERENTIAL CALCULUS the Basic Convergence Criterion of Sect.55 it follows that s := (sn n N) | ∈ converges. Since ϕ(sn) = sn for all n N, we have ϕ s =(sn n N), +1 ∈ ◦ +1 | ∈ which converges to the same limit as s. We put

x := lim s = lim(ϕ s). ◦ Now assume that is closed. By Prop.6 of Sect.55 it follows that x . D ∈ D Since ϕ is continuous, we can apply Prop.2 of Sect.56 to conclude that lim(ϕ s) = ϕ(lim s), i.e. that ϕ(x) = x. ◦ Notes 64

(1) The traditional terms for “constricted” and “striction” are “Lipschitzian” and “Lipschitz number (or constant)”, respectively. I am introducing the terms “con- stricted” and “striction” here because they are much more descriptive.

(2) It turns out that the absolute striction of a lineon coincides with what is often called its “spectral radius”.

(3) The Contraction Fixed Point Theorem is often called the “Contraction Mapping Theorem” or the “Banach Fixed Point Theorem”.

65 Partial Gradients, Directional

Let , be flat spaces with translation spaces , . As we have seen in E1 E2 V1 V2 Sect.33, the set-product := is then a flat space with translation E E1 ×E2 space := . We consider a mapping ϕ : ′ from an open V V1 × V2 D → D subset of into an open subset ′ of a flat space ′ with translation D′ E D E space . Given any x , we define ( ,x ): according to (04.21), V 2 ∈E2 · 2 E1 →E put := ( ,x )<( ) = z (z,x ) , (65.1) D(·,x2) · 2 D { ∈E1 | 2 ∈ D} which is an open subset of because ( ,x ) is flat and hence continuous, E1 · 1 and define ϕ( ,x ) : ′ according to (04.22). If ϕ( ,x ) is dif- · 2 D(·,x2) → D · 2 ferentiable at x for all (x ,x ) we define the partial 1-gradient 1 1 2 ∈ D ϕ : Lin( , ′) of ϕ by ∇(1) D→ V1 V

ϕ(x ,x ) := x ϕ( ,x ) forall (x ,x ) . (65.2) ∇(1) 1 2 ∇ 1 · 2 1 2 ∈ D

In the special case when 1 := R, we have 1 = R, and the partial ′ E V 1-derivative ϕ, : , defined by 1 D → V

ϕ, (t,x ):=(∂ϕ( ,x ))(t) forall (t,x ) , (65.3) 1 2 · 2 2 ∈ D 65. PARTIAL GRADIENTS, DIRECTIONAL DERIVATIVES 229 is related to ϕ by ϕ = ϕ, and ϕ, =( ϕ)1 (value-wise). ∇(1) ∇(1) 1⊗ 1 ∇(1) Similar definitions are employed to define (x1,·), the mapping ′ D ′ ϕ(x1, ): (x1,·) , the partial 2-gradient (2)ϕ : Lin( 2, ) and · D → D ′ ∇ D→ V V the partial 2-derivative ϕ, : . 2 D → V The following result follows immediately from the definitions. Proposition 1: If ϕ : ′ is differentiable at x := (x ,x ) , D → D 1 2 ∈ D then the gradients x ϕ( ,x ) and x ϕ(x , ) exist and ∇ 1 · 2 ∇ 2 1 ·

( xϕ)v =( x ϕ( ,x ))v +( x ϕ(x , ))v (65.4) ∇ ∇ 1 · 2 1 ∇ 2 1 · 2 for all v := (v , v ) . 1 2 ∈ V If ϕ is differentiable, then the partial gradients ϕ and ϕ exist ∇(1) ∇(2) and we have ϕ = ϕ ϕ (65.5) ∇ ∇(1) ⊕∇(2) where the operation on the right is understood as value-wise application ⊕ of (14.13). Pitfall: The converse of Prop.1 is not true: A mapping can have partial gradients without being differentiable. For example, the mapping ϕ : R R R, defined by × → st if (s, t) = (0, 0) ϕ(s, t) := s2+t2 6 , 0 if (s, t)=(0, 0)   has partial derivatives at (0, 0) since both ϕ( , 0) and ϕ(0, ) are equal to the · · constant 0. Since ϕ(t,t) = 1 for t R× but ϕ(0, 0) = 0, it is clear that ϕ is 2 ∈ not even continuous at (0, 0), let alone differentiable. The second assertion of Prop.1 shows that if ϕ is of class C1, then ϕ ∇(1) and ϕ exist and are continuous. The converse of this statement is true, ∇(2) but the proof is highly nontrivial: Proposition 2: Let , , ′ be flat spaces. Let be an open subset of E1 E2 E D and ′ an open subset of ′. A mapping ϕ : ′ is of class C1 if E1 ×E2 D E D → D (and only if) the partial gradients ϕ and ϕ exist and are continuous. ∇(1) ∇(2) Proof: Let (x ,x ) . Since (x ,x ) is a neighborhood of (0, 0) 1 2 ∈ D D − 1 2 in , we may choose Nhd0( ) and Nhd0( ) such that V1 × V2 M1 ∈ V1 M2 ∈ V2 := ( (x ,x )). M M1 × M2 ⊂ D − 1 2 We define m : ′ by M → V m(v , v ) := ϕ((x ,x )+(v , v )) ϕ(x ,x ) 1 2 1 2 1 2 − 1 2 ( ϕ(x ,x )v + ϕ(x ,x )v ). − ∇(1) 1 2 1 ∇(2) 1 2 2 230 CHAPTER 6. DIFFERENTIAL CALCULUS

It suffices to show that m Small( , ′), for if this is the case, then ∈ V1 × V2 V the Characterization of Gradients of Sect.63 tells us that ϕ is differentiable at (x1,x2) and that its gradient at (x1,x2) is given by (65.4). In addition, since (x ,x ) is arbitrary, we can then conclude that (65.5) holds and, 1 2 ∈ D since both ϕ and ϕ are continuous, that ϕ is continuous. ∇(1) ∇(2) ∇ We note that m := h+k, when h : ′ and k : ′ are defined M → V M → V by h(v , v ) := ϕ(x ,x + v ) ϕ(x ,x ) ϕ(x ,x )v , 1 2 1 2 2 − 1 2 −∇(2) 1 2 2 k(v , v ) := ϕ(x + v ,x + v ) ϕ(x ,x + v ) ϕ(x ,x )v . 1 2 1 1 2 2 − 1 2 2 −∇(1) 1 2 1  (65.6) The differentiability of ϕ(x , ) at x insures that h belongs to 1 · 2 Small( , ′) and it only remains to be shown that k belongs to V1 × V2 V Small( , ′). V1 × V2 V We choose norms ν ,ν , and ν′ on , and ′, respectively. Let ε 1 2 V1 V2 V ∈ P× be given. Since ϕ is continuous at (x ,x ), there are open convex ∇(1) 1 2 neighborhoods and of 0 in and , respectively, such that N1 N2 V1 V2 N1 ⊂ , , and M1 N2 ⊂ M2 ϕ((x ,x )+(v , v )) ϕ(x ,x ) ν ,ν′ <ε ||∇(1) 1 2 1 2 −∇(1) 1 2 || 1 for all (v , v ) . Noting (65.6), we see that this is equivalent to 1 2 ∈ N1 × N2 v k( , v ) ν ,ν′ <ε for all v , v . ||∇ 1 · 2 || 1 1 ∈ N1 2 ∈ N2 Applying the Striction Estimate of Sect.64 to k( , v ) , we infer that · 2 |N1 str(k( , v ) ; ν ,ν′) ε. It is clear from (65.6) that k(0, v ) = 0 for · 2 |N1 1 ≤ 2 all v . Hence we can conclude, in view of (64.1), that 2 ∈ N2 ν′(k(v , v )) εν (v ) ε max ν (v ),ν (v ) for all v , v . 1 2 ≤ 1 1 ≤ { 1 1 2 2 } 1 ∈ N1 2 ∈ N2 Since ε P× was arbitrary, it follows that ∈ ν′(k(v , v )) lim 1 2 =0, (v ,v )→(0,0) max ν (v ),ν (v ) 1 2 { 1 1 2 2 } which, in view of (62.1) and (51.23), shows that k Small( , ′) as ∈ V1 × V2 V required. Remark: In examining the proof above one observes that one needs merely the existence of the partial gradients and the continuity of one of them in order to conclude that the mapping is differentiable.

We now generalize Prop.2 to the case when := ( i i I) is the E E | ∈ set-product of a finite family ( i i I) of flat spaces. This product is a E | ∈ × E 65. PARTIAL GRADIENTS, DIRECTIONAL DERIVATIVES 231

flat space whose translation space is the set-product := ( i i I) of V V | ∈ the translation spaces i of the i, i I. Given any x and j I, we V E ∈ ∈× E ∈ define the mapping (x.j): j according to (04.24). E →E We consider a mapping ϕ : ′ from an open subset of into an D → D D E open subset ′ of ′. Given any x and j I, we put D E ∈ D ∈ < := (x.j) ( ) j (65.7) D(x.j) D ⊂E and define ϕ(x.j) : ′ according to (04.25). If ϕ(x.j) is differen- D(x.j) → D tiable at xj for all x , we define the partial j-gradient (j)ϕ : ′ ∈ D ∇ D → Lin( j, ) of ϕ by ϕ(x) := x ϕ(x.j) for all x . V V ∇(j) ∇ j ∈ D In the special case when j := R for some j I, the partial j- ′ E ∈ derivative ϕ,j : , defined by D → V

ϕ,j(x):=(∂ϕ(x.j))(xj) for all x , (65.8) ∈ D is related to ϕ by ϕ = ϕ,j and ϕ,j =( ϕ)1. ∇(j) ∇(j) ⊗ ∇(j) In the case when I := 1, 2 , these notations and concepts reduce to the { } ones explained in the beginning (see (04.21)). The following result generalizes Prop.1. ′ Proposition 3: If ϕ : is differentiable at x =(xi i I) , D → D | ∈ ∈ D then the gradients x ϕ(x.j) exist for all j I and ∇ j ∈

( xϕ)v = ( x ϕ(x.j))vj (65.9) ∇ ∇ j Xj∈I for all v =(vi i I) . | ∈ ∈ V If ϕ is differentiable, then the partial gradients ϕ exist for all j I ∇(j) ∈ and we have ϕ = ( ϕ), (65.10) ∇ ∇(j) j I M∈ where the operation on the right is understood as value-wise application of (14.18). L I In the case when := R , we can put i := R for all i I and (65.10) E E ∈ reduces to ϕ = lnc , (65.11) ∇ (ϕ,i | i∈I) where the value at x of the right side is understood to be the linear- ∈ D ′ ′ combination mapping of the family (ϕ,i(x) i I) in . If moreover, | ∈ V E is also a space of the form ′ := RK with some finite index set K, then E 232 CHAPTER 6. DIFFERENTIAL CALCULUS lnc can be identified with the function which assigns to each x (ϕ,i | i∈I) ∈ D the matrix

K×I I K (ϕk,i(x) (k,i) K I) R = Lin(R , R ). | ∈ × ∈ ∼ Hence ϕ can be identified with the matrix ∇

ϕ =(ϕk,i (k,i) K I). (65.12) ∇ | ∈ × of the partial derivatives ϕk,i : R of the component functions. As we D → have seen, the mere existence of these partial derivatives is not sufficient to insure the existence of ϕ. Only if ϕ is known a priori to exist is it ∇ ∇ possible to use the identification (65.12). Using the natural isomorphism between

( i i I) and j ( ( i i I j )) E | ∈ E × E | ∈ \{ } and Prop.2, one× easily proves, by induction,× the following generalization. Partial : Let I be a finite index set and let i, i E ∈ ′ I and be flat spaces. Let be an open subset of := ( i i I) and E D E E | ∈ ′ an open subset of ′. A mapping ϕ : ′ is of class C1 if and only D E D → D × if the partial gradients ϕ exist and are continuous for all i I. ∇(i) ∈ The following result is an immediate corollary: Proposition 4: Let I and K be finite index sets and let and ′ be D D open subsets of RI and RK , respectively. A mapping ϕ : ′ is of 1 D → D class C if and only if the partial derivatives ϕk,i : R exist and are D → continuous for all k K and all i I. ∈ ∈ Let , ′ be flat spaces with translation spaces and ′, respectively. E E V V Let , ′ be open subsets of and ′, respectively, and consider a mapping D D E E ϕ : ′. Given any x and v ×, the range of the mapping D → D ∈ D ∈ V (s (x + sv)) : R is a line through x whose direction space is Rv. 7→ → E Let Sx,v := s R x + sv be the pre-image of under this { ∈ | ∈ D} D mapping and x +1S v : Sx,v a corresponding adjustment of the x,v → D mapping. Since is open, Sx,v is an open neighborhood of zero in R and D ′ ϕ (x +1S v) : Sx,v is a process. The derivative at 0 R of this ◦ x,v → D ∈ process, if it exists, is called the of ϕ at x and is denoted by

ϕ(x + sv) ϕ(x) (ddvϕ)(x) := ∂0(ϕ (x +1Sx, v)) = lim − . (65.13) ◦ v s→0 s 65. PARTIAL GRADIENTS, DIRECTIONAL DERIVATIVES 233

If this directional derivative exists for all x , it defines a function ∈ D ′ ddvϕ : D → V which is called the directional v-derivative of ϕ. The following result is immediate from the General Chain Rule: Proposition 5: If ϕ : ′ is differentiable at x then the D → D ∈ D directional v-derivative of ϕ at x exists for all v and is given by ∈ V

(ddvϕ)(x)=( xϕ)v. (65.14) ∇ Pitfall: The converse of this Proposition is false. In fact, a mapping can have directional derivatives in all directions at all points in its domain without being differentiable. For example, the mapping ϕ : R2 R defined → by 2 s t if (s, t) = (0, 0) ϕ(s, t) := s4+t2 6 ( 0 if (s, t)=(0, 0) ) has the directional derivatives

α2 β if β =0 (dd(α,β)ϕ)(0, 0) = 6 ( 0 if β =0 )

1 at (0, 0). Using Prop.4 one easily shows that ϕ R2 is of class C and | \{(0,0)} hence, by Prop.5, ϕ has directional derivatives in all directions at all (s, t) ∈ R2. Nevertheless, since ϕ(s, s2) = 1 for all s R×, ϕ is not even continuous 2 ∈ at (0, 0), let alone differentiable. Proposition 6: Let b be a set basis of . Then ϕ : ′ is of class V D → D C1 if and only if the directional b-derivatives ddbϕ : ′ exist and are D → V continuous for all b b. ∈ b Proof: We choose q and define α : R by ∈ DRb b → E α(λ) := q + b∈b λbb for all λ . Since is a basis, α is a flat iso- ∈ b morphism. If we define := α<( ) R and ϕ := ϕ α D : ′, we P D D ⊂ ◦ |D D → D see that the directional b-derivatives of ϕ correspond to the partial deriva- tives of ϕ. The assertion follows from the Partial Gradient Theorem, applied to the case when I := b and i := R for all i I. E ∈ Combining Prop.5 and Prop.6, we obtain Proposition 7: Assume that Sub spans . The mapping S ∈ V V ϕ : ′ is of class C1 if and only if the directional derivatives ddvϕ : D → D ′ exist and are continuous for all v . D → V ∈ S 234 CHAPTER 6. DIFFERENTIAL CALCULUS

Notes 65

(1) The notation ∇(i) for the partial i-gradient is introduced here for the first time. I am not aware of any other notation in the literature.

(2) The notation ϕ,i for the partial i-derivative of ϕ is the only one that occurs fre- quently in the literature and is not objectionable. Frequently seen notations such

as ∂ϕ/∂xi or ϕxi for ϕ,i are poison to me because they contain dangling dummies (see Part D of the Introduction).

(3) Some people use the notation ∇v for the directional derivative ddv. I am intro- ducing ddv because (by Def.1 of Sect.63) ∇v means something else, namely the gradient at v.

66 The General

The following result shows that bilinear mappings (see Sect.24) are of class C1 and hence continuous. (They are not uniformly continuous except when zero.) Proposition 1: Let , and be linear spaces. Every bi- V1 V2 W linear mapping B : is of class C1 and its gradient V1 × V2 → W B : Lin( , ) is the linear mapping given by ∇ V1 × V2 → V1 × V2 W B(v , v )=(B∼v ) (Bv ) (66.1) ∇ 1 2 2 ⊕ 1 for all v , v . 1 ∈ V1 2 ∈ V2 Proof: Since B(v , ) = Bv : (see (24.2)) is linear for each 1 · 1 V2 → W v , the partial 2-gradient of B exists and is given by B(v , v ) = 1 ∈ V ∇(2) 1 2 Bv for all (v , v ) . It is evident that B : 1 1 2 ∈ V1 × V2 ∇(2) V1 × V2 → Lin( , ) is linear and hence continuous. A similar argument shows that V2 W B is given by B(v , v ) = B∼v for all (v , v ) and hence ∇(1) ∇(1) 1 2 2 1 2 ∈ V1 ×V2 is also linear and continuous. By the Partial Gradient Theorem of Sect.65, it follows that B is of class C1. The formula (66.1) is a consequence of (65.5).

Let be an open set in a flat space with translation space . If D E V h : , h : and B Lin ( , ) we write 1 D → V1 2 D → V2 ∈ 2 V1 × V2 W B(h , h ) := B (h , h ): 1 2 ◦ 1 2 D→W so that B(h , h )(x) = B(h (x), h (x)) for all x . 1 2 1 2 ∈ D 66. THE GENERAL PRODUCT RULE 235

The following theorem, which follows directly from Prop.1 and the Gen- eral Chain Rule of Sect.63, is called “Product Rule” because the term “prod- uct” is often used for the bilinear forms to which the theorem is applied. General Product Rule: Let , and be linear spaces and let V1 V2 W B Lin ( , ) be given. Let be an open subset of a flat space and ∈ 2 V1 × V2 W D let mappings h : and h : be given. If h and h are both 1 D → V1 2 D → V2 1 2 differentiable at x so is B(h , h ), and we have ∈ D 1 2 ∼ xB(h , h )=(Bh (x)) xh +(B h (x)) xh . (66.2) ∇ 1 2 1 ∇ 2 2 ∇ 1 1 If h1 and h2 are both differentiable [of class C ], so is B(h1, h2), and we have B(h , h )=(Bh ) h +(B∼h ) h , (66.3) ∇ 1 2 1 ∇ 2 2 ∇ 1 where the products on the right are understood as value-wise compositions. We now apply the General Product Rule to special bilinear mappings, namely to the ordinary product in R and to the four examples given in Sect.24. In the following list of results denotes an open subset of a flat D space having as its translation space; , ′ and ′′ denote linear spaces. V W W W (I) If f : R and g : R are differentiable [of class C1], so is the D → D → value-wise product fg : R and we have D→ (fg) = f g + g f. (66.4) ∇ ∇ ∇ (II) If f : R and h : are differentiable [of class C1], so is the D → D→W value-wise scalar multiple fh : , and we have D→W (fh) = h f + f h. (66.5) ∇ ⊗∇ ∇ (III) If h : and η : ∗ are differentiable [of class C1], so is the D→W D→W function ηh : R defined by (ηh)(x) := η(x)h(x) for all x , D → ∈ D and we have (ηh)=( h)⊤η +( η)⊤h. (66.6) ∇ ∇ ∇ (IV) If F : Lin( , ′) and h : are differentiable at x , so D→ W W D→W ∈ D is Fh (defined by (Fh)(y) := F(y)h(y) for all y ) and we have ∈ D

x(Fh)v = (( xF)v)h(x)+(F(x) xh)v (66.7) ∇ ∇ ∇ for all v . If F and h are differentiable [of class C1], so is Fh and ∈ V (66.7) holds for all x , v . ∈ D ∈ V 236 CHAPTER 6. DIFFERENTIAL CALCULUS

(V) If F : Lin( , ′) and G : Lin( ′, ′′) are differentiable D → W W D → W W at x , so is GF, defined by value-wise composition, and we have ∈ D

x(GF)v = (( xG)v)F(x) + G(x)(( xF)v) (66.8) ∇ ∇ ∇ for all v . If F and G are differentiable [of class C1], so is GF, and ∈ V (66.8) holds for all x , v . ∈ D ∈ V If is an inner-product space, then ∗ can be identified with , (see W W W Sect.41) and (III) reduces to the following result.

(VI) If h, k : are differentiable [of class C1], so is their value-wise D→W inner product h k and · (h k)=( h)⊤k +( k)⊤h. (66.9) ∇ · ∇ ∇ In the case when reduces to an interval in R, (66.4) becomes D the Product Rule (08.32)2 of elementary calculus. The following for- mulas apply to differentiable processes f, h,η, F, and G with values in R, , ∗, Lin( , ′), and Lin( ′, ′′), respectively: W W W W W W (fh)• = f •h + fh•, (66.10)

(ηh)• = η•h + ηh•, (66.11) (Fh)• = F•h + Fh•, (66.12) (GF)• = G•F + GF•. (66.13) If h and k are differentiable processes with values in an inner product space, then (h k)• = h• k + h k•. (66.14) · · · Proposition 2: Let be an inner-product space and let R : Lin W D→ W be given. If R is differentiable at x and if Rng R Orth then ∈ D ⊂ W ⊤ R(x) (( xR)v) Skew for all v . (66.15) ∇ ∈ W ∈ V Conversely, if R is differentiable, if is convex, if (66.15) holds for all D x , and if R(q) Orth for some q then Rng R Orth . ∈ D ∈ W ∈ D ⊂ W Proof: Assume that R is differentiable at x . Using (66.8) with the ∈ D choice F := R and G := R⊤ we find that

⊤ ⊤ ⊤ x(R R)v = (( xR )v)R(x) + R (x)(( xR)v) ∇ ∇ ∇ 66. THE GENERAL PRODUCT RULE 237 holds for all v . By (63.13) we obtain ∈ V ⊤ ⊤ ⊤ x(R R)v = W + W with W := R (x)(( xR)v) (66.16) ∇ ∇ for all v . Now, if Rng R Orth then R⊤R = 1 is constant (see ∈ V ⊂ V W Prop.2 of Sect.43). Hence the left side of (66.16) is zero and W must be skew for all v , i.e. (66.15) holds. ∈ V ⊤ Conversely, if (66.15) holds for all x , then, by (66.16) x(R R) = 0 ∈ D ∇ for all x . Since is convex, we can apply Prop.3 of Sect.64 to conclude ∈ D D that R⊤R must be constant. Hence, if R(q) Orth , i.e. if (R⊤R)(q) = ∈ W 1 , then R⊤R is the constant 1 , i.e. Rng R Orth . W W ⊂ W Remark: In the second part of Prop.2, the condition that be con- D vex can be replaced by the one that be “connected” as explained in the D Remark after Prop.3 of Sect.64. Corollary: Let I be an open interval and let R : I Lin be a → W differentiable process. Then Rng R Orth if and only if R(t) Orth ⊂ W ∈ W for some t I and Rng(R⊤R•) Skew . ∈ ⊂ W The following result shows that quadratic forms (see Sect.27) are of class C1 and hence continuous. Proposition 3: Let be a linear space. Every quadratic form Q : V V → R is of class C1 and its gradient Q : ∗ is the linear mapping ∇ V → V Q =2 Q , (66.17) ∇ where Q is identified with the symmetric bilinear form Q Sym ( 2, R) = ∈ 2 V ∼ Sym( , ∗) associated with Q (see (27.14)). V V Proof: By (27.14), we have

Q(u) = Q (u, u)=(Q (1V , 1V ))(u) for all u . Hence, if we apply the General Product Rule with the choices ∈ V ∼ B := Q and h := h := 1 , we obtain Q = Q + Q . Since Q is 1 2 V ∇ symmetric, this reduces to (66.17). Let be a linear space. The lineonic nth pow : Lin Lin V n V → V on the algebra Lin of lineons (see Sect.18) is defined by V pow (L) := Ln for all n N. (66.18) n ∈ The following result is a generalization of the familiar differentiation rule (ιn)• = nιn−1. 238 CHAPTER 6. DIFFERENTIAL CALCULUS

Proposition 4: For every n N the lineonic nth power pow on ∈ n Lin defined by (66.18) is of class C1 and, for each L Lin , its gradi- V ∈ V ent Lpow Lin(Lin ) at L Lin is given by ∇ n ∈ V ∈ V k−1 n−k ( Lpow )M = L ML for all M Lin . (66.19) ∇ n ∈ V ] kX∈n

Proof: For n = 0 the assertion is trivial. For n = 1, we have pow1 = 1 and hence Lpow = 1 for all L Lin , which is consistent with LinV ∇ 1 LinV ∈ V (66.19). Also, since pow is constant, pow is of class C1. Assume, then, ∇ 1 1 that the assertion is valid for a given n R. Since pow = pow pow ∈ n+1 1 n holds in terms of value-wise composition, we can apply the result (V) above to the case when F := pown, G := pow1 in order to conclude that pown+1 is of class C1 and that

( Lpow )M = (( Lpow )M)pow (L) + pow (L)(( Lpow )M) ∇ n+1 ∇ 1 n 1 ∇ n for all M Lin . Using (66.18) and (66.19) we get ∈ V n k−1 n−k ( Lpow )M = ML + L( L ML ), ∇ n+1 ] kX∈n which shows that (66.19) remains valid when n is replaced by n + 1. The desired result follows by induction. If M commutes with L, then (66.19) reduces to

n−1 ( Lpow )M =(nL )M. ∇ n Using Prop.4 and the form (63.17) of the Chain Rule, we obtain Proposition 5: Let I be an interval and let F : I Lin be a process. If → V F is differentiable [of class C1], so is its value-wise nth power Fn : I Lin → V and we have

(Fn)• = Fk−1F•Fn−k for all n N. (66.20) ∈ ] kX∈n 67 , Laplacian

In this section, denotes an open subset of a flat space with translation D E space , and denotes a linear space. V W 67. DIVERGENCE, LAPLACIAN 239

If h : is differentiable at x , we can form the trace (see D → V ∈ D Sect.26) of the gradient xh Lin . The result ∇ ∈ V

divxh := tr( xh) (67.1) ∇ is called the divergence of h at x. If h is differentiable we can define the divergence div h : R of h by D→

(div h)(x) := divxh for all x . (67.2) ∈ D Using the product rule (66.5) and (26.3) we obtain Proposition 1: Let h : and f : R be differentiable. Then D → V D → the divergence of fh : is given by D → V div (fh)=( f)h + fdiv h. (67.3) ∇ Consider now a mapping H : Lin( ∗, ) that is differentiable D → V W at x . For every ω ∗ we can form the value-wise composite ∈ D ∈ W ωH : Lin( ∗, R) = ∗∗ = (see Sect.22). Since ωH is differen- D → V V ∼ V tiable at x for every ω ∗ (see Prop.3 of Sect.63) we may consider the ∈ W mapping ∗ (ω tr( x(ωH))) : R. 7→ ∇ W → It is clear that this mapping is linear and hence an element of ∗∗ = . W ∼ W Definition 1: Let H : Lin( ∗, ) be differentiable at x . Then D→ V W ∈ D the divergence of H at x is defined to be the (unique) element divxH of which satisfies W ∗ ω(divxH) = tr( x(ωH)) for all ω . (67.4) ∇ ∈W If H is differentiable, then its divergence div H : is defined by D→W

(div H)(x) := divxH for all x . (67.5) ∈ D In the case when := R, we also have ∗ = R and Lin( ∗, ) = W W ∼ V W ∗∗ = . Thus, using (67.4) with ω := 1 R, we see that the definition V ∼ V ∈ just given is consistent with (67.1) and (67.2). If we replace h and L in Prop.3 of Sect.63 by H and

(K ωK) Lin(Lin( ∗, ), ), 7→ ∈ V W V respectively, we obtain

∗ x(ωH) = ω xH for all ω , (67.6) ∇ ∇ ∈W 240 CHAPTER 6. DIFFERENTIAL CALCULUS where the right side must be interpreted as the composite of ∗ xH Lin ( , ) with ω Lin( , R). This composite, being an ele- ∇ ∈ 2 V × V W ∈ W ment of Lin ( ∗, R), must be reinterpreted as an element of Lin via the 2 V ×V V identifications Lin ( ∗, R) = Lin( , Lin( ∗, R)) = Lin( , ∗∗) = Lin 2 V × V ∼ V V V V ∼ V to make (67.6) meaningful. Using (67.6), (67.4), and (67.1) we see that div H satisfies ω(divxH) = divx(ωH) = tr(ω xH) (67.7) ∇ for all ω ∗. ∈W The following results generalize Prop.1. Proposition 2: Let H : Lin( ∗, ) and ρ : ∗ D → V W D→W be differentiable. Then the divergence of the value-wise composite ρH : Lin( ∗, R) = is given by D→ V ∼ V div (ρH) = ρdiv H + tr(H⊤ ρ), (67.8) ∇ where value-wise evaluation and composition are understood. Proof: Let x and v be given. Using (66.8) with the choices ∈ D ∈ V G := ρ and F := H we obtain

x(ρH)v = (( xρ)v)H(x) + ρ(x)(( xH)v). (67.9) ∇ ∇ ∇ ∗ Since ( xρ)v , (21.3) gives ∇ ∈W ⊤ (( xρ)v)H(x)=(H(x) xρ)v. ∇ ∇ On the other hand, we have

ρ(x)(( xH)v)=(ρ(x)( xH))v ∇ ∇ ∗ if we interpret xH on the right as an element of Lin ( , ). Hence, ∇ 2 V × V W since v was arbitrary, (67.9) gives ∈ V ⊤ x(ρH) = ρ(x) xH + H(x) xρ. ∇ ∇ ∇ Taking the trace, using (67.1), and using (67.7) with ω := ρ(x), we get

⊤ divx(ρH) = ρ(x)divxH + tr(H(x) xρ). ∇ Since x was arbitrary, the desired result (67.8) follows. ∈ D Proposition 3: Let h : and k : be differentiable. The D → V D→W divergence of k h : Lin( ∗, ), defined by taking the value-wise ⊗ D → V W tensor product (see Sect.25), is then given by

div (k h)=( k)h + (div h)k. (67.10) ⊗ ∇ 67. DIVERGENCE, LAPLACIAN 241

Proof: By (67.7) we have

ω divx(k h) = divx(ω(k h)) = divx((ω k)h) ⊗ ⊗ for all x and all ω ∗. Hence, by Prop.1, ∈ D ∈W ω div (k h)=( (ωk))h +(ωk)div h = ω(( k)h + k div h) ⊗ ∇ ∇ holds for all ω ∗, and (67.10) follows. ∈W Proposition 4: Let H : Lin( ∗, ) and f : R be differen- D → V W D → tiable. The divergence of fH : Lin( ∗, ) is then given by D→ V W div (fH) = fdiv H + H( f). (67.11) ∇ Proof: Let ω ∗ be given. If we apply Prop.2 to the case when ∈ W ρ := fω we obtain

div (ω(fH)) = ω(fdiv H)+tr(H⊤ (fω)). (67.12) ∇ Using (fω) = ω f and using (25.9), (26.3), and (22.3), we obtain ∇ ⊗∇ tr(H⊤ (fω)) = tr(H⊤(ω f)) = f(H⊤ω) = ω(H f). Hence (67.12) ∇ ⊗∇ ∇ ∇ and (67.7) yield ω div (fH) = ω(fdiv H + H f). Since ω ∗ was ∇ ∈ W arbitrary, (67.11) follows. From now on we assume that has the structure of a , E so that becomes an inner-product space and we can use the identification V ∗ = (see Sect.41). Thus, if k : is twice differentiable, we can V ∼ V D→W consider the divergence of k : Lin( , ) = Lin( ∗, ). ∇ D→ V W ∼ V W Definition 2: Let k : be twice differentiable. Then the Lapla- D→W cian of k is defined to be

∆k := div ( k). (67.13) ∇ If ∆k =0, then k is called a harmonic function. Remark: In the case when is not a genuine Euclidean space, but one E whose translation space has index 1 (see Sect.47), the term D’Alembertian or Wave-Operator and the symbol are often used instead of Laplacian and ∆. The following result is a direct consequence of (66.5) and Props.3 and 4. Proposition 5: Let k : and f : R be twice differentiable. D→W D → The Laplacian of fk : is then given by D→W ∆(fk)=(∆f)k + f∆k + 2( k)( f). (67.14) ∇ ∇ 242 CHAPTER 6. DIFFERENTIAL CALCULUS

The following result follows directly from the form (63.19) of the Chain Rule and Prop.3. Proposition 6: Let I be an open interval and let f : I and D → g : I both be twice differentiable. The Laplacian of g f : is →W ◦ D→W then given by

∆(g f)=( f)·2(g•• f)+(∆f)(g• f). (67.15) ◦ ∇ ◦ ◦ We now discuss an important application of Prop.6. We consider the case when f : I is given by D→ f(x):=(x q)·2 for all x , (67.16) − ∈ D where q is given. We wish to solve the following problem: How can ,I ∈E D and g : I be chosen such that g f is harmonic? →W ◦ It follows directly from (67.16) and Prop.3 of Sect.66, applied to Q := sq, •2 that xf = 2(x q) for all x and that hence ( f) =4f. Also, ( f) ∇ − ∈ D ∇ ∇ ∇ is the constant with value 21V and hence, by (26.9),

∆f = div ( f) = 2tr1 = 2dim = 2dim . ∇ V V E Substitution of these results into (67.15) gives

∆(g f)=4f(g•• f) + 2(dim )g• f. (67.17) ◦ ◦ E ◦ Hence, g f is harmonic if and only if ◦ (2ιg•• + ng•) f = 0 with n := dim , (67.18) ◦ E where ι is the identity mapping of R, suitably adjusted (see Sect.08). Now, if we choose I := Rng f, then (67.18) is satisfied if and only if g satisfies the ordinary differential equation 2ιg•• + ng• = 0. It follows that g must be of the form −( n −1) aι 2 + b if n =2 g = 6 , (67.19) a log+b if n =2   where a, b , provided that I is an interval. ∈W If is a genuine Euclidean space, we may take := q and I := E D E\{ } Rng f = P×. We then obtain the harmonic function

ar−(n−2) + b if n =2 h = 6 , (67.20) a(log r) + b if n =2  ◦  where a, b and where r : q P× is defined by r(x) := x q . ∈W E\{ }→ | − | 68. LOCAL INVERSION, IMPLICIT MAPPINGS 243

If the Euclidean space is not genuine and is double-signed we have E V two possibilities. For all n N× we may take := x (x ∈ D { ∈E | − q)•2 > 0 and I := P×. If n is even and n = 2, we may instead take } 6 := x (x q)•2 < 0 and I := P×. D { ∈E| − } − Notes 67

(1) In some of the literature on “Vector ”, the notation ∇• h instead of div h is used for the divergence of a vector field h, and the notation ∇2f instead of ∆f for the Laplacian of a function f. These notations come from a formalistic and erroneous understanding of the meaning of the symbol ∇ and should be avoided.

68 Local Inversion, Implicit Mappings

In this section, and ′ denote open subsets of flat spaces and ′ with D D ′ E E translation spaces and , respectively. V V To say that a mapping ϕ : ′ is differentiable at x means, D → D ∈ D roughly, that ϕ may be approximated, near x, by a flat mapping α, the tangent of ϕ at x. One might expect that if the tangent α is invertible, then ϕ itself is, in some sense, “locally invertible near x”. To decide whether this expectation is justified, we must first give a precise meaning to “locally invertible near x”. Definition: Let a mapping ϕ : ′ be given. We say that ψ is a ′ D → D local inverse of ϕ if ψ =(ϕ N )← for suitable open subsets = Cod ψ = |N N Rng ψ and ′ = Dom ψ of and ′, respectively. We say that ψ is a local N D D inverse of ϕ near x if x Rng ψ. We say that ϕ is locally invertible ∈ D ∈ near x if it has some local inverse near x. ∈ D To say that ψ is a local inverse of ϕ means that

N ′ N ′ ψ ϕ =1 and ϕ ψ =1 ′ (68.1) ◦ |N N |N ◦ N ′ for suitable open subsets = Cod ψ and = Dom ψ of and ′, respec- N N D D tively. If ψ1 and ψ2 are local inverses of ϕ and := (Rng ψ1) (Rng ψ2), then < M < ∩ ψ and ψ must agree on ϕ>( ) = ψ ( ) = ψ ( ), i.e. 1 2 M 1 M 2 M ψ M = ψ M if := (Rng ψ ) (Rng ψ ). (68.2) 1|ϕ>(M) 2|ϕ>(M) M 1 ∩ 2 Pitfall: A mapping ϕ : ′ may be locally invertible near every D → D point in without being invertible, even if it is surjective. In fact, if ψ is a D local inverse of ϕ, ϕ<(Dom ψ) need not be included in Rng ψ. 244 CHAPTER 6. DIFFERENTIAL CALCULUS

For example, the function f :] 1, 1[× ]0, 1[ defined by f(s) := s for − → | | all s ] 1, 1[× is easily seen to be locally invertible, surjective, and even ∈ − continuous. The identity function 1]0,1[ of ]0, 1[ is a local inverse of f, and we have

f <(Dom1 ) = f <( ]0, 1[)=] 1, 1[× ]0, 1[= Rng1 . ]0,1[ − 6⊂ [0,1] The function h : ]0, 1[ ] 1, 0[ defined by h(s) = s for all s ]0, 1[ is → − − ∈ another local inverse of f. If dim 2, one can even give counter-examples of mappings E ≥ ϕ : ′ of the type described above for which is convex (see Sect.74). D → D D The following two results are recorded for later use. Proposition 1: Assume that ϕ : ′ is differentiable at x and D → D ∈ D locally invertible near x. Then, if some local inverse near x is differentiable −1 at ϕ(x), so are all others and all have the same gradient, namely ( xϕ) . ∇ Proof: We choose a local inverse ψ1 of ϕ near x such that ψ1 is differ- entiable at ϕ(x). Applying the Chain Rule to (68.1) gives

( ψ )( xϕ) = 1 and ( xϕ)( ψ ) = 1 ′ , ∇ϕ(x) 1 ∇ V ∇ ∇ϕ(x) 1 V −1 which shows that xϕ is invertible and ψ =( xϕ) . ∇ ∇ϕ(x) 1 ∇ Let now a local inverse ψ of ϕ near x be given. Let be defined as in 2 M (68.2). Since is open and since ψ1, being differentiable at x, is continuous M < ′ at x, it follows that ϕ>( ) = ψ ( ) is a neighborhood of ϕ(x) in . By M 1 M E (68.2), ψ2 agrees with ψ1 on the neighborhood ϕ>( ) of ϕ(x). Hence ψ2 M −1 must also be differentiable at ϕ(x), and ψ = ψ =( xϕ) . ∇ϕ(x) 2 ∇ϕ(x) 1 ∇ Proposition 2: Assume that ϕ : ′ is continuous and that ψ is a D → D local inverse of ϕ near x. (i) If ′ is an open neighborhood of ϕ(x) and ′ Dom ψ, then M ′ < ′ M ⊂ ψ>( ) = ϕ ( ) Rng ψ is an open neighborhood of x; hence the ′ M ψ>M(M )∩ adjustment ψ ′ is again a local inverse of ϕ near x. |M (ii) Let be an open subset of with x . If ψ is continuous at ϕ(x) G E ∈ G then there is an open neighborhood of x with Rng ψ such < M M ⊂ M ∩ G that ψ ( ) = ϕ>( ) is open; hence the adjustment ψ < is again M M |ψ (M) a local inverse of ϕ near x.

Proof: Part (i) is an immediate consequence of Prop.3 of Sect.56. To prove (ii), we observe first that ψ<(Rng ψ ) must be a neighborhood of ∩ G ϕ(x) = ψ←(x). We can choose an open subset ′ of ψ<(Rng ψ ) with M ∩ G 68. LOCAL INVERSION, IMPLICIT MAPPINGS 245

ϕ(x) ′ (see Sect.53). Application of (i) gives the desired result with ∈ M ′ := ψ>( ). M M Remark: It is in fact true that if ϕ : ′ is continuous, then every D → D local inverse of ϕ is also continuous, but the proof is highly non-trivial and goes beyond the scope of this presentation. Thus, in Part (ii) of Prop.2, the requirement that ψ be continuous at ϕ(x) is, in fact, redundant. Pitfall: The expectation mentioned in the beginning is not justified. A continuous mapping ϕ : ′ can have an invertible tangent at x D → D ∈ D without being locally invertible near x. An example is the function f : R → R defined by 2t2 sin 1 + t if t R× f(t) := t ∈ . (68.3) 0 if t =0    It is differentiable and hence continuous. Since f •(0) = 1, the tangent to f at 0 is invertible. However, f is not monotone, let alone locally invertible, near 0, because one can find numbers s arbitrarly close to 0 such that f •(s) < 0.

Let I be an open interval and let f : I R be a function of class C1. → If the tangent to f at a given t I is invertible, i.e. if f •(t) = 0, one can ∈ 6 easily prove, not only that f is locally invertible near t, but also that it has a local inverse near t that is of class C1. This result generalizes to mappings ϕ : ′, but the proof is far from easy. D → D Local Inversion Theorem: Let and ′ be open subsets of flat spaces, D D let ϕ : ′ be of class C1 and let x be such that the tangent to ϕ at D → D ∈ D x is invertible. Then ϕ is locally invertible near x and every local inverse of ϕ near x is differentiable at ϕ(x). Moreover, there exists a local inverse ψ of ϕ near x that is of class C1 and satisfies −1 yψ =( ϕ) for all y Dom ψ. (68.4) ∇ ∇ψ(y) ∈ Before proceeding with the proof, we state two important results that are closely related to the Local Inversion Theorem. Implicit Mapping Theorem: Let , ′, ′′ be flat spaces, let be an E E E A open subset of ′ and let ω : ′′ be a mapping of class C1. Let E×E A→E (xo, yo) , zo := ω(xo, yo), and assume that yo ω(xo, ) is invertible. ∈ A ∇ • ′ Then there exist an open neighborhood of xo and a mapping ϕ : , D D→E differentiable at xo, such that ϕ(xo) = yo and Gr(ϕ) , and ⊂ A ω(x,ϕ(x)) = zo for all x . (68.5) ∈ D Moreover, and ϕ can be chosen such that ϕ is of class C1 and D ϕ(x) = ( ω(x,ϕ(x)))−1 ω(x,ϕ(x)) for all x . (68.6) ∇ − ∇(2) ∇(1) ∈ D 246 CHAPTER 6. DIFFERENTIAL CALCULUS

Remark: The mapping ϕ is determined implicitly by an equation in this sense: for any given x , ϕ(x) is a solution of the equation ∈ D ′ ? y , ω(x, y) = zo. (68.7) ∈E ′ In fact, one can find a neighborhood of (xo, yo) in such that ϕ(x) M E×E is the only solution of

? y , ω(x, y) = zo, (68.8) ∈ M(x,•) where := y ′ (x, y) . M(x,•) { ∈E | ∈ M} Differentiation Theorem for Inversion Mappings: Let and ′ V V be linear spaces of equal dimension. Then the set Lis( , ′) of all linear V V isomorphorisms from onto ′ is a (non-empty) open subset of Lin( , ′), V V V V the inversion mapping inv : Lis( , ′) Lin( ′, ) defined by inv(L) := V V → V V L−1 is of class C1, and its gradient is given by

−1 −1 ′ ( Linv)M = L ML for all M Lin( , ). (68.9) ∇ − ∈ V V The proof of these three theorems will be given in stages, which will be designated as lemmas. After a basic preliminary lemma, we will prove a weak version of the first theorem and then derive from it weak versions of the other two. The weak version of the last theorem will then be used, like a bootstrap, to prove the final version of the first theorem; the final versions of the other two theorems will follow. Lemma 1: Let be a linear space, let ν be a norm on , and put V V := Ce(ν). Let f : be a mapping with f(0) = 0 such that f 1 B B → V − B⊂V is constricted with κ := str(f 1 ; ν,ν) < 1. (68.10) − B⊂V Then the adjustment f (1−κ)B (see Sect.03) of f has an inverse and this | inverse is confined near 0. Proof: We will show that for every w (1 κ) , the equation ∈ − B ? z , f(z) = w (68.11) ∈ B has a unique solution. To prove uniqueness, suppose that z , z are solutions of (68.11) for 1 2 ∈ B a given w . We then have ∈ V (f 1 )(z ) (f 1 )(z ) = z z − B⊂V 1 − − B⊂V 2 2 − 1 68. LOCAL INVERSION, IMPLICIT MAPPINGS 247 and hence, by (68.10), ν(z z ) κν(z z ), which amounts to 2 − 1 ≤ 2 − 1 (1 κ)ν(z z ) 0. This is compatible with κ < 1 only if ν(z z )=0 − 2 − 1 ≤ 1 − 2 and hence z1 = z2. To prove existence, we assume that w (1 κ) is given and we put w ∈ − B α := ν( ) , so that α [0, 1[. For every v α , we have ν(v) α and 1−κ ∈ ∈ B ≤ hence, by (68.10)

ν((f 1 )(v)) κν(v) κα, − B⊂V ≤ ≤ which implies that

ν(w (f(v) v)) ν(w) + ν((f 1 )(v)) − − ≤ − B−V (1 κ)α + κα = α. ≤ −

Therefore, it makes sense to define hw : α α by B → B hw(v) := w (f(v) v) for all v α . (68.12) − − ∈ B Since hw is the difference of a constant with value w and a restriction of f 1 , it is constricted and has an absolute striction no greater than − B−V κ. Therefore, it is a contraction, and since its domain α is closed, the B Contraction Fixed Point Theorem states that hw has a fixed point z α . ∈ B It is evident from (68.12) that a fixed point of hw is a solution of (68.11). We proved that (68.11) has a unique solution z and that this solution ∈ B satisfies ν(w) ν(z) α := . (68.13) ≤ 1 κ − If we define g : (1 κ) f <((1 κ) ) in such a way that g(w) is this − B → − B solution of (68.11), then g is the inverse we are seeking. It follows from (68.13) that ν(g(w)) 1 ν(w) for all w (1 κ) . In view of part (ii) ≤ 1−κ ∈ − B of Prop.2 of Sect.62, this proves that g is confined. Lemma 2: The assertion of the Local Inversion Theorem, except possi- bly the statement starting with “Moreover”, is valid. Proof: Let α : ′ be the (invertible) tangent to ϕ at x. Since E →E D is open, we may choose a norming cell such that x + . We define B B ⊂ D f : by B → V ← x+B f := (α ′ ϕ x (x + 1 ) ) x. (68.14) |D ◦ | +B ◦ V |B − Since ϕ is of class C1, so is f. Clearly, we have

f(0) = 0, 0f = 1 . (68.15) ∇ V 248 CHAPTER 6. DIFFERENTIAL CALCULUS

Put ν := no , so that = Ce(ν). Choose ε ]0, 1[ (ε := 1 would do). If we B B ∈ 2 apply Prop.4 of Sect.64 to the case when ϕ there is replaced by f and k by 0 , we see that there is δ ]0, 1] such that { } ∈

κ := str(f δ 1δ ; ν,ν) ε. (68.16) | B − B⊂V ≤ Now, it is clear from (64.2) that a striction relative to ν and ν′ remains unchanged if ν and ν′ are multiplied by the same strictly positive factor. 1 1 Hence, (68.16) remains valid if ν there is replaced by δ ν. Since Ce( δ ν) = δCe(ν) = δ (see Prop.6 of Sect.51), it follows from (68.16) that Lemma B 1 1 can be applied when ν, , and f there are replaced by ν,δ and f δ , B δ B | B respectively. We infer that if we put

< o := (1 κ)δ , o := f ( o), M − B N M Mo then f has an inverse g : o o that is confined near zero. Note |No M → N that o and o are both open neighborhoods of zero, the latter because it M N is the pre-image of an open set under a continuous mapping. We evidently have g V 1 =(1 f ) g. | − Mo⊂V No⊂V − |No ◦ In view of (68.15), 1 f is small near 0 . Since g is confined near No⊂V − |No ∈ V 0, it follows by Prop.3 of Sect.62 that g V 1 is small near zero. By the | − Mo⊂V Characterization of Gradients of Sect.63, we conclude that g is differentiable at 0 with 0g = 1 . ∇ V We now define

′ := x + o, := ϕ(x)+( α)>( o). N N N ∇ M These are open neighborhoods of x and ϕ(x), respectively. A simple calcu- lation, based on (68.14) and g =(f Mo )←, shows that |No

N ← Mo ψ := (x +1 ) g (α x) ′ (68.17) V |No ◦ ◦ − |N ′ is the inverse of ϕ N . Using the Chain Rule and the fact that g is dif- |N ferentiable at 0, we conclude from (68.17) that ψ is differentiable at ϕ(x).

Lemma 3: The assertion of the Implicit Mapping Theorem, except pos- sibly the statement starting with “Moreover”, is valid. Proof: We defineω ˜ : ′′ by A→E×E ω˜(x, y):=(x, ω(x, y)) for all (x, y) . (68.18) ∈ A 68. LOCAL INVERSION, IMPLICIT MAPPINGS 249

Of course,ω ˜ is of class C1. Using Prop.2 of Sect.63 and (65.4), we see that

( ω˜)(u, v)=(u, ( x ω( , yo))u +( y ω(xo, ))v) ∇(xo,yo) ∇ o · ∇ o · ′ for all (u, v) . Since y ω(xo, ) is invertible, so is ω˜. In- ∈V×V ∇ o · ∇(xo,yo) deed,we have

−1 −1 ( ω˜) (u, w)=(u, ( y ω(xo, )) (w ( x ω( , yo))u)) ∇(xo,yo) ∇ o · − ∇ o · ′′ for all (u, w) . Lemma 2 applies and we conclude thatω ˜ has a local ∈V×V inverse ρ with (xo, yo) Rng ρ that is differentiable atω ˜(xo, yo)=(xo,zo). ∈ In view of Prop.2, (i), we may assume that the domain of ρ is of the form ′′ ′′ , where and are open neighborhoods of xo and zo, respectively. D × D D D Let ψ : ′′ and ψ′ : ′′ ′ be the value-wise terms of ρ, so D × D →E D × D →E that ρ(x,z)=(ψ(x,z), ψ′(x,z)) for all x , z ′′. ∈ D ∈ D Then, by (68.18),

ω˜(ρ(x, y))=(ψ(x,z), ω(ψ(x,z), ψ′(x,z))) = (x,z) and hence

ψ(x,z) = x, ω(x, ψ′(x,z)) = z for all x , z ′′. (68.19) ∈ D ∈ D ′ ′ Since ρ(xo,zo)=(ψ(xo,zo), ψ (xo,zo)) = (xo, yo), we have ψ (xo,zo) = yo. ′′ ′ Therefore, if we define ϕ : by ϕ(x) := ψ (x,zo), we see that ϕ(xo) = D→E yo and (68.5) are satisfied. The differentiability of ϕ at xo follows from the differentiablity of ρ at (xo,zo). If we define := Rng ρ, it is immediate that ϕ(x) is the only solution M of (68.8). Lemma 4: Let and ′ be linear spaces of equal dimension. Then ′ V V ′ Lis( , ) is an open subset of Lin( , ) and the inversion mapping V V ′ ′ V V inv : Lis( , ) Lin( , ) defined by inv(L) := L−1 is differentiable. V V → V V ′ Proof: We apply Lemma 3 with replaced by Lin( , ), ′ by ′ E V V E Lin( , ), and ′′ by Lin . For ω of Lemma 3 we take the mapping V V E V ′ ′ (L, M) ML from Lin( , ) Lin( , ) to Lin . Being bilinear, 7→ V V × V V V this mapping is of class C1 (see Prop.1 of Sect.66). Its partial 2-gradient at (L, M) does not depend on M. It is the right-multiplication RiL ′ ′∈ Lin(Lin( , ), Lin ) defined by RiL(K) := KL for all K Lin( , ). V V V ∈ V V It is clear that RiL is invertible if and only if L is invertible. (We have −1 ′ (RiL) = RiL−1 if this is the case.) Thus, given Lo Lis( , ), we can ∈ V V 250 CHAPTER 6. DIFFERENTIAL CALCULUS

−1 apply Lemma 3 with Lo, Lo , and 1V playing the roles of xo, yo, and zo, respectively. We conclude that there is an open neighborhood of Lo in ′ D Lin( , ) such that the equation V V ′ ? M Lin( , ), LM = 1 ∈ V V V has a solution for every L . Since this solution can only be L−1 = inv(L), ∈ D it follows that the role of the mapping ϕ of Lemma 3 is played by inv D. ′ | Therefore, we must have Lis( , ) and inv must be differentiable at ′ D ⊂ V V ′ Lo. Since Lo Lis( , ) was arbitrary, it follows that Lis( , ) is open ∈ V V V V and that inv is differentiable. Completion of Proofs: We assume that hypotheses of the Local Inver- sion Theorem are satisfied. The assumption that ϕ has an invertible tangent ′ at x is equivalent to xϕ Lis( , ). Since ϕ is continuous and ∈ D ′ ∇ ∈ V V ′ ∇ since Lis( , ) is an open subset of Lin( , ) by Lemma 4, it follows that V V ′ V V := ( ϕ)<(Lis( , )) is an open subset of . By Lemma 2, we can choose G ∇ V V E a local inverse of ϕ near x which is differentiable and hence continuous at ϕ(x). By Prop.2, (ii), we can adjust this local inverse such that its range is included in . Let ψ be the local inverse so adjusted. Then ϕ has an G invertible tangent at every z Rng ψ. Let z Rng ψ be given. Applying ∈ ∈ Lemma 2 to z, we see that ϕ must have a local inverse near z that is differ- entiable at ϕ(z). By Prop.1, it follows that ψ must also be differentiable at ϕ(z). Since z Rng ψ was arbitrary, it follows that ψ is differentiable. The ∈ ′ ′ formula (68.4) follows from Prop.1. Since inv : Lis( , ) Lin( , ) is V V → V V differentiable and hence continuous by Lemma 4, since ϕ is continuous by ∇ assumption, and since ψ is continuous because it is differentiable, it follows ′ that the composite inv ϕ Lis(V,V ) ψ is continuous. By (68.4), this com- ◦∇ |Rng ψ ◦ posite is none other than ψ, and hence the proof of the Local Inversion ∇ Theorem is complete. Assume, now, that the hypotheses of the Implicit Mapping Theorem are satisfied. By the Local Inversion Theorem, we can then choose a local inverse ρ ofω ˜, as defined by (68.18), that is of class C1. It then follows that the function ϕ : ′ defined in the proof of Lemma 3 is also of class D→E C1. The formula (68.6) is obtained easily by differentiating the constant x ω(x,ϕ(x)), using Prop.1 of Sect.65 and the Chain Rule. 7→ To prove the Differentiation Theorem for Inversion Mappings, one applies the Implicit Mapping Theorem in the same way as Lemma 3 was applied to obtain Lemma 4. Pitfall: Let I and I′ be intervals and let f : I I′ be of class C1 and → surjective. If f has an invertible tangent at every t I, then f is not only ∈ 69. EXTREME VALUES, CONSTRAINTS 251 locally invertible near every t I but (globally) invertible, as is easily seen. ∈ This conclusion does not generalize to the case when I and I′ are replaced by connected open subsets of flat spaces of higher dimension. The curvilinear coordinate systems discussed in Sect.74 give rise to counterexamples. One can even give counterexamples in which I and I′ are replaced by convex open subsets of flat spaces.

Notes 68

(1) The Local Inversion Theorem is often called the “ Theorem”. Our term is more descriptive.

(2) The Implicit Mapping Theorem is most often called the “ Theo- rem”.

′ (3) If the sets D and D of the Local Inversion Theorem are both subsets of Rn for some n ∈ N, then ∇xϕ can be identified with an n-by-n matrix (see (65.12)). This matrix is often called the “Jacobian matrix” and its determinant the “Jacobian” of ϕ at x. Some textbook authors replace the condition that ∇xϕ be invertible by the condition that the Jacobian be non-zero. I believe it is a red herring to drag in determinants here. The Local Inversion Theorem has an extension to infinite- dimensional spaces, where it makes no sense to talk about determinants.

69 Extreme Values, Constraints

In this section and denote flat spaces with translation spaces and , E F V W respectively, and denotes a subset of . D E The following definition is “local” variant of the definition of an ex- tremum given in Sect.08. Definition 1: We say that a function f : R attains a local D → maximum [local minimum] at x if there is Nhdx( ) (see ∈ D N ∈ D (56.1)) such that f attains a maximum [minimum] at x. We say that |N f attains a local extremum at x if it attains a local maximum or local minimum at x. The following result is a direct generalization of the Extremum Theorem of elementary calculus (see Sect.08). Extremum Theorem: If f : R attains a local extremum at D → x Int and if f is differentiable at x, then xf = 0. ∈ D ∇ Proof: Let v be given. It is clear that Sx,v := s R x + sv ∈ V { ∈ | ∈ Int is an open subset of R and the function (s f(x + sv)) : Sx,v R D} 7→ → attains a local extremum at 0 R and is differentiable at 0 R. Hence, by ∈ ∈ 252 CHAPTER 6. DIFFERENTIAL CALCULUS the Extremum Theorem of elementary calculus, and by (65.13) and (65.14), we have 0 = ∂ (s f(x + sv)) = (ddvf)(x)=( xf)v. 0 7→ ∇ Since v was arbitrary, the desired result xf = 0 follows. ∈ V ∇ From now on we assume that is an open subset of . D E The next result deals with the case when a restriction of f : R D → to a suitable subset of , but not necessarily f itself, has an extremum at D a given point x . This subset of is assumed by the set on which a ∈ D D given mapping ϕ : has a constant value, i.e. the set ϕ<( ϕ(x) ). D→F { } The assertion that f < has an extremum at x is often expressed by |ϕ ({ϕ(x)}) saying that f attains an extremum at x subject to the constraint that ϕ be constant. Constrained-Extremum Theorem: Assume that f : R and 1 D → ϕ : are both of class C and that xϕ Lin( , ) is surjective for D→F ∇ ∈ V W a given x . If f < attains a local extremum at x, then ∈ D |ϕ ( ϕ(x)}) ⊤ xf Rng( xϕ) . (69.1) ∇ ∈ ∇ Proof: We put := Null xϕ and choose a supplement of in . U ∇ Z U V Every neighborhood of 0 in includes a set of the form + , where V N M N is an open neighborhood of 0 in and is an open neighborhood of 0 in U M . Since is open, we may select and such that x + + . Z D N M N M ⊂ D We now define ω : by N×M→F ω(u, z) := ϕ(x + u + z) for all u , z . (69.2) ∈ N ∈ M It is clear that ω is of class C1 and we have

0ω(0, ) = xϕ . (69.3) ∇ · ∇ |Z Since xϕ is surjective, it follows from Prop.5 of Sect.13 that xϕ is ∇ ∇ |Z invertible. Hence we can apply the Implicit Mapping Theorem of Sect.68 to the case when there is replaced by , the points xo, yo by 0, and zo A N × M by ϕ(x). We obtain an open neighborhood of 0 in with and a G U G ⊂ N mapping h : , of class C1, such that h(0) = 0 and G→Z ω(u, h(u)) = ϕ(x) for all u . (69.4) ∈ G Since ω(0, 0) = 0ω( , 0) = xϕ = 0 by the definition ∇(1) ∇ · ∇ |U := Null xϕ, the formula (68.6) yields in our case that 0h = 0, i.e. U ∇ ∇ we have h(0) = 0, 0h = 0. (69.5) ∇ 69. EXTREME VALUES, CONSTRAINTS 253

It follows from (69.2) and (69.4) that x + u + h(u) ϕ<( ϕ(x) ) for all ∈ { } u and hence that ∈ G (u f(x + u + h(u))) : R 7→ G → has a local extremum at 0. By the Extremum Theorem and the Chain Rule, it follows that

V 0 = 0(u f(x + u + h(u))) = xf(1 + 0h ), ∇ 7→ ∇ U⊂V ∇ | and therefore, by (69.5), that xf U = ( xf)1U⊂V = 0, which means that ⊥ ⊥ ∇ | ∇ xf = (Null xϕ) . The desired result (69.1) is obtained by applying ∇ ∈U ∇ (22.9). In the case when := R, the Constrained-Extremum Theorem reduces F to Corollary 1: Assume that f,g : R are both of class C1 and that D → xg = 0 for a given x . If f < attains a local extremum at x, ∇ 6 ∈ D |g ({g(x)}) then there is λ R such that ∈

xf = λ xg. (69.6) ∇ ∇ In the case when := RI , we can use (23.19) and obtain F Corollary 2: Assume that f : R and all terms gi : R D → I 1 D → in a finite family g := (gi i I) : R are of class C and that | ∈ D → ( xgi i I) is linearly independent for a given x . If f < ∇ | ∈ ∈ D |g ({g(x)}) attains a local extremum at x, then there is λ RI such that ∈

xf = λi xgi. (69.7) ∇ ∇ Xi∈I Remark 1: If is two-dimensional then Cor.1 can be given a geo- E < metrical interpretation as follows: The sets g := g ( g(x) ) and f := L { } L f <( f(x) ) are the “level-lines” through x of f and g, respectively (see { } Figure). 254 CHAPTER 6. DIFFERENTIAL CALCULUS

If these lines cross at x, then the value f(y) of f at y g strictly ∈ L increases or decreases as y moves along g from one side of the line f to L L the other and hence f cannot have an extremum at x. Hence, if f |Lg |Lg has an extremum at x, the level-lines must be tangent. The assertion (69.6) expresses this tangency. The condition xg = 0 insures that the level-line ∇ 6 g does not degenerate to a point. L Notes 69

(1) The number λ of (69.6) or the terms λi occuring in (69.7) are often called “Lagrange multipliers”.

610 Representations

Let I Sub R be some genuine interval and let h : I be a continuous ∈ → V process with values in a given linear space . For every λ ∗ the composite V ∈ V λh : I R is then continuous. Given a, b I one can therefore form the →b ∈ integral a λh in the sense of elementary integral calculus (see Sect.08). It is clear that the mapping R b (λ λh): ∗ R 7→ V → Za is linear and hence an element of ∗∗ = . V ∼ V Definition 1: Let h : I be a continuous process with values in the → V linear space . Given any a, b I the integral of h from a to b is defined V b ∈ to be the unique element h of which satisfies a V b R b λ h = λh for all λ ∗. (610.1) ∈ V Za Za Proposition 1: Let h : I be a continuous process and let L be a → V linear mapping from to a given linear space . Then Lh : I is a V W → W continuous process and

b b L h = Lh for all a, b I. (610.2) ∈ Za Za Proof: Let ω ∗ and a, b I be given. Using Def.1 twice, we obtain ∈W ∈ b b b b b ω(L h)=(ωL) h = (ωL)h = ω(Lh) = ω Lh. Za Za Za Za Za 610. INTEGRAL REPRESENTATIONS 255

Since ω ∗ was arbitrary, (610.2) follows. ∈W Proposition 2: Let ν be a norm on the linear space and let h : I V → V be a continuous process. Then ν h : I R is continuous and, for every ◦ → a, b I, we have ∈ b b ν( h) ν h . (610.3) ≤ ◦ Za Za

Proof: The continuity of ν h follows from the continuity of ν (Prop.7 ◦ of Sect.56) and the Composition Theorem for Continuity of Sect.56. It follows from (52.14) that

λh (t) = λh(t) ν∗(λ)ν(h(t)) = ν∗(λ)(ν h)(t) | | | | ≤ ◦ holds for all λ ∗ and all t I, and hence that ∈ V ∈ λh ν h for all λ Ce(ν∗). | | ≤ ◦ ∈ Using this result and (610.1), (08.40), and (08.42), we obtain

b b a b λ h = λh λh ν h ≤ | | ≤ ◦ Za Za Zb Za for all λ Ce(ν ∗) and all a, b I . The desired result now follows from the ∈ ∈ Norm-Duality Theorem, i.e. (52.15). Most of the rules of elementary integral calculus extend directly to inte- grals of processes with values in a linear space. For example, if h : I is → V continuous and if a, b, c I, then ∈ b c b h = h + h. (610.4) Za Za Zc The following result is another example. Fundamental Theorem of Calculus: Let be a flat space with trans- E lation space . If h : I is a continuous process and if x and a I V → V ∈E ∈ are given, then the process p : I defined by →E t p(t) := x + h for all t I (610.5) ∈ Za is of class C1 and p• = h. Proof: Let λ ∗ be given. By (610.5) and (610.1) we have ∈ V t t λ(p(t) x) = λ h = λh for all t I. − ∈ Za Za 256 CHAPTER 6. DIFFERENTIAL CALCULUS

By Prop.1 of Sect.61 p is differentiable and we have (λ(p x))• = λp•. − Hence, the elementary Fundamental Theorem of Calculus (see Sect.08) gives λp• = λh. Since λ ∗ was arbitrary, we conclude that p• = h, which is ∈ V continuous by assumption. In the remainder of this section, we assume that the following items are given: (i) a flat space with translation space , (ii) An open subset of E V D , (iii) An open subset of R such that, for each x , E D ×E ∈ D := s R (s, x) D(·,x) { ∈ | ∈ D} is a non-empty open inteval, (iv) a linear space . W Consider now a mapping h : such that h( ,x) : is D→W · D(·,x) → W continuous for all x . Given a, b R such that a, b for all x , ∈ D ∈ ∈ D(·,x) ∈ D we can then define k : by D→W b k(x) := h( ,x) for all x . (610.6) · ∈ D Za We say that k is defined by an integral representation. Proposition 3: If h : is continuous, so is the mapping D → W k : defined by the integral representation (610.6). D→W Proof: Let x be given. Since is open, we can choose a norm ν ∈ D D on such that x + Ce(ν) . We may assume, without loss of generality, V ⊂ D that a < b. Then [a, b] is a compact interval and hence, in view of Prop.4 of Sect.58, [a, b] (x + Ce(ν)) is a compact subset of . By the Uniform × D Continuity Theorem of Sect. 58, it follows that the restriction of h to [a, b] × (x + Ce(ν)) is uniformly continuous. Let µ be a norm on and let ε P× W ∈ be given. By Prop.4 of Sect.56, we can determine δ ]0, 1] such that ∈ ε µ(h(s, y) h(s, x)) < − b a − for all s [a, b] and all y x + δCe(ν). Hence, by (610.6) and Prop.2, we ∈ ∈ have b b µ(k(y) k(x)) = µ (h( , y) h( ,x)) µ (h( , y) h( ,x)) − a · − · ≤ a ◦ · − · Z ε  Z (b a) = ε ≤ − b a − whenever ν(y x) < δ. Since ε P× was arbitrary, the continuity of k − ∈ at x follows by Prop.1 of Sect.56. Since x was arbitrary, the assertion ∈ D follows. 610. INTEGRAL REPRESENTATIONS 257

The following is a stronger version of Prop.3. Proposition 4: Let R R be defined by D ⊂ × ×E := (a, b, x) x , (a, x) , (b, x) . (610.7) D { | ∈ D ∈ D ∈ D} Then, if h : is continuous, so is the mapping D → W (a, b, x) b h( ,x) : . 7→ a · D→W  Proof: RChoose a norm µ on . Let (a, b, x) be given. Since W ∈ D µ h is continuous at (a, x) and at (b, x) and since is open, we can find ◦ × D Nhdx( ) and σ P such that N ∈ D ∈ := ((a+] σ, σ[) (b+] σ, σ[)) N − ∪ − × N is a subset of and such that the restriction of µ h to is bounded by D ◦ N some β P×. By Prop.2, it follows that ∈ a t µ( h( , y) + h( , y)) ( s a + t b )β (610.8) · · ≤ | − | | − | Zs Zb for all y , all s a +] σ, σ[, and all t b +] σ, σ]. ∈ N ∈ − ∈ − Now let ε P× be given. If we put δ := min σ, ε , it follows from ∈ { 4β } (610.8) that a t ε µ h( , y) + h( , y) < (610.9) · · 2 Zs Zb  for all y , all s a +] δ, δ], and all t b +] δ, δ]. On the other hand, ∈ N ∈ − ∈ − by Prop.3, we can determine Nhdx( ) such that M ∈ D b b ε µ h( , y) h( ,x) < (610.10) · − · 2 Za Za  for all y . By (610.4) we have ∈ M t b b b h( , y) h( ,x) = h( , y) h( ,x) s · − a · a · − a · Z Z Z a Z t + h( , y) + h( , y). · · Zs Zb Hence it follows from (610.9) and (610.10) that

t b µ h( , y) h( ,x) <ε · − · Zs Za  258 CHAPTER 6. DIFFERENTIAL CALCULUS for all y Nhdx( ), all s a+] δ, δ], and all t b+] δ, δ[. Since ∈M∩N ∈ D ∈ − ∈ − ε P× was arbitrary, it follows from Prop.1 of Sect.56 that the mapping ∈ under consideration is continuous at (a, b, x). Differentiation Theorem for Integral Representations: Assume that h : satisfies the following conditions: D→W (i) h( ,x) is continuous for every x . • ∈ D (ii) h(s, ) is differentiable at x for all (s, x) and the partial 2-gradient • ∈ D h : Lin( , ) is continuous. ∇(2) D→ V W Then the mapping k : defined by the integral representation D→W (610.6) is of class C1 and its gradient is given by the integral representation

b xk = h( ,x) for all x . (610.11) ∇ ∇(2) • ∈ D Za Roughly, this theorem states that if h satisfies the conditions (i) and (ii), one can differentiate (610.6) with respect to x by “differentiating under the integral sign”. Proof: Let x be given. As in the proof of Prop.3, we can choose ∈ D a norm ν on such that x + Ce(ν) , and we may assume that a < b. V ⊂ D Then [a, b] x + Ce(ν) is a compact subset of . By the Uniform Conti- × D nuity Theorem of Sect.58, the restriction of h to [a, b] x + Ce(ν) is  ∇(2) × uniformly continuous. Hence, if a norm µ on and ε P× are given, we W ∈  can determine δ ]0, 1 ] such that ∈ ε h(s, x + u) h(s, x) ν,µ < (610.12) k∇(2) −∇(2) k b a − holds for all s [a, b] and u δCe(ν). ∈ ∈ We now define n : (0,x) by D − →W n(s, u) := h(s, x + u) h(s, x) h(s, x) u. (610.13) − − ∇(2) It is clear that n exists and is given by  ∇(2) n(s, u) = h(s, x + u) h(s, x). ∇(2) ∇(2) −∇(2) By (610.12), we hence have ε n(s, u) ν,µ < k∇(2) k b a − 610. INTEGRAL REPRESENTATIONS 259 for all s [a, b] and u δCe(ν). By the Striction Estimate for Differentiable ∈ ∈ Mapping of Sect.64, it follows that n(s, ) is constricted for all s [a, b] • |δCe(ν) ∈ and that ε str(n(s, ) ; ν,µ) for all s [a, b]. • |δCe(ν) ≤ b a ∈ − Since n(s, 0) = 0 for all s [a, b], the definition (64.1) shows that ∈ ε µ (n(s, u)) ν(u) for all s [a, b] ≤ b a ∈ − and all u δCe(ν). Using Prop.2, we conclude that ∈ b b µ n( , u) µ (n( , u)) εν(u) • ≤ ◦ • ≤ Za  Za whenever u δCe(ν). Since ε P× was arbitrary, it follows that the map- ∈ ∈ ping u b n( , u) is small near 0 (see Sect.62). Now, integrating 7→ a • ∈ V (610.13) with respect to s [a, b] and observing the representation (610.6) R ∈ of k, we obtain b b n( , u) = k(x + u) k(u) h( ,x) u • − − ∇(2) • Za Za  for all u x. Therefore, by the Characterization of Gradients of Sect.63, ∈ D− k is differentiable at x and its gradient is given by (610.11). The continuity of k follows from Prop.3. ∇ The following corollary deals with generalizations of integral representa- tions of the type (610.6). Corollary: Assume that h : satisfies the conditions (i) and D→W (ii) of the Theorem. Assume, further, that f : R and g : R are D → D → differentiable and satisfy f(x),g(x) for all x . ∈ D(•,x) ∈ D Then k : , defined by D→W g(x) k(x) := h( ,x) for all x , (610.14) • ∈ D Zf(x) is of class C1 and its gradient is given by

xk = h(g(x),x) xg h(f(x),x) xf (610.15) ∇ ⊗∇ − ⊗∇ g(x) + h( ,x) for all x . ∇(2) • ∈ D Zf(x) 260 CHAPTER 6. DIFFERENTIAL CALCULUS

Proof: Consider the set R R defined by (610.7), so that D ⊂ × ×E a, b for all (a, b, x) . Since is assumed to be an interval ∈ D(•,x) ∈ D D(•,x) for each x , we can define m : by ∈ D D→W b m(a, b, x) := h( ,x). (610.16) • Za

By the Theorem, m : Lin( , ) exists and is given by ∇(3) D→ V W

b m(a, b, x) = h( ,x). (610.17) ∇(3) ∇(2) • Za It follows from Prop.4 that m is continuous. By the Fundamental The- ∇(3) orem of Calculus and (610.16), the partial derivatives m,1 and m,2 exist, are continuous, and are given by

m, (a, b, x) = h(a, x), m, (a, b, x) = h(b, x). (610.18) 1 − 2 By the Partial Gradient Theorem of Sect.65, it follows that m is of class C1. Since m = m, and m = m, , it follows from (65.9), (610.17), ∇(1) 1 ⊗ ∇(2) 2 ⊗ and (610.18) that

( m)(α,β, u) = h(b, x) β h(a, x) α (610.19) ∇(a,b,x) ⊗ − ⊗ b + h( ,x) u ∇(2) • Za  for all (a, b, x) and all (α,β, u) R R . ∈ D ∈ × × V Now, since f and g are differentiable, so is (f,g, 1 ) : R R D⊂E D→ × ×E and we have

x(f,g, 1 )=( xf, xg, 1 ) (610.20) ∇ D⊂E ∇ ∇ V for all x . (See Prop.2 of Sect.63). By (610.14) and (610.16) we have ∈ D k = m (f,g, 1 ) D. Therefore, by the General Chain Rule of Sect.63, k ◦ D⊂E | is of class C1 and we have

xk =( m) x(f,g, 1 ) ∇ ∇(f(x),g(x),x) ∇ D⊂E for all x . Using (610.19) and (610.20), we obtain (610.15). ∈ D 611. , SYMMETRY OF SECOND GRADIENTS 261

611 Curl, Symmetry of Second Gradients

In this section, we assume that flat spaces and ′, with translation spaces E E and ′, and an open subset of are given. V V D E If H : Lin( , ′) is differentiable, we can apply the identification D → V V (24.1) to the codomain of the gradient

H : Lin , Lin( , ′) = Lin ( 2, ′), ∇ D→ V V V ∼ 2 V V and hence we can apply the switching defined by (24.4) to the values of H. ∇ Definition 1: The curl of a differentiable mapping H : Lin( , ′) D→ V V is the mapping Curl H : Lin ( 2, ′) defined by D→ 2 V V ∼ (Curl H)(x) := xH ( xH) for all x . (611.1) ∇ − ∇ ∈ D The values of Curl H are skew, i.e. Rng Curl H Skew ( 2, ′). ⊂ 2 V V The following result deals with conditions that are sufficient or necessary for the curl to be zero. Curl-Gradient Theorem: Assume that H : Lin( , ′) is of class D→ V V C1. If H = ϕ for some ϕ : ′ then Curl H = 0. Conversely, if is ∇ D→E D convex and Curl H = 0 then H = ϕ for some ϕ : ′. ∇ D→E The proof will be based on the following Lemma: Assume that is convex. Let q and q′ ′ be given and D ∈ D ∈E let ϕ : ′ be defined by D→E 1 ϕ(q + v) := q′ + H(q + sv)ds v for all v q. (611.2) ∈ D − Z0  Then ϕ is of class C1 and we have

1 ϕ(q + v) = H(q + v) s(Curl H)(q + sv)ds v (611.3) ∇ − Z0  for all v q. ∈ D − Proof: We define R by D ⊂ × V := (s, u) u q, q + su . D { | ∈ D − ∈ D} Since is open and convex, it follows that, for each u q, u := D ∈ D − D(•, ) s R q + su is an open interval that includes [0, 1]. Since H is of { ∈ | ∈ D} class C1, so is

((s, u) H(q + su)) : Lin( , ′); 7→ D→ V V 262 CHAPTER 6. DIFFERENTIAL CALCULUS hence we can apply the Differentiation Theorem for Integral Representations of Sect.610 to conclude that u 1 H(q + su)ds is of class C1 and that its 7→ 0 gradient is given by R 1 1 v(u H(q + su)ds) = s H(q + sv)ds ∇ 7→ ∇ Z0 Z0 for all v q. By the Product Rule (66.7), it follows that the mapping ∈ D − ϕ defined by (611.2) is of class C1 and that its gradient is given by

1 1 ( q vϕ)u = s H(q + sv)uds v + H(q + sv)ds u ∇ + ∇ Z0  Z0  for all v q and all u . Applying the definitions (24.2), (24.4), and ∈ D − ∈ V (611.1), and using the linearity of the switching and Prop.1 of Sect.610, we obtain

1 ( q vϕ)u = s(Curl H)(q + sv)ds (u, v) (611.4) ∇ + Z0  1 + (s( H(q + sv))v + H(q + sv))ds u ∇ Z0  for all v q and all u . By the Product Rule (66.10) and the Chain ∈ D − ∈ V Rule we have, for all s [0, 1], ∈

∂s(t tH(q + tv)) = H(q + sv) + s( H(q + sv))v. 7→ ∇ Hence, by the Fundamental Theorem of Calculus, the second integral on the right side of (611.4) reduces to H(q + v). Therefore, since Curl H has skew values and since u was arbitrary, (611.4) reduces to (611.3). ∈ V Proof of the Theorem: Assume, first, that is convex and that D Curl H = 0. Choose q , q′ ′ and define ϕ : ′ by (611.2). Then ∈ D ∈E D→E H = ϕ by (611.3). ∇ Conversely, assume that H = ϕ for some ϕ : ′. Let q ∇ D→E ∈ D be given. Since is open, we can choose a convex open neighborhood D N of q with . Let v q by given. Since is convex, we have N ⊂ D ∈ N − N q + sv for all s [0, 1], and hence we can apply the Fundamental ∈ N ∈ Theorem of Calculus to the derivative of s ϕ(q + sv), with the result 7→ 1 ϕ(q + v) = ϕ(q) + H(q + sv)ds v. Z0  611. CURL, SYMMETRY OF SECOND GRADIENTS 263

Since v q was arbitrary, we see that the hypotheses of the Lemma are ∈ N − satisfied for instead of , q′ := ϕ(q), ϕ instead of ϕ, and H = ϕ N D |N |N ∇ |N instead of H. Hence, by (611.3), we have

1 s(Curl H)(q + sv)ds v = 0 (611.5) Z0  for all v q. Now let u be given. Since q is a neighborhood ∈ N − ∈ V N − of 0 , there is a δ P× such that tu q for all t ]0,δ]. Hence, ∈ V ∈ ∈ N − ∈ substituting tu for v in (611.5) and dividing by t gives

1 s(Curl H)(q + stu)ds u = 0 for all t ]0,δ]. ∈ Z0  since Curl H is continuous, we can apply Prop.3 of Sect.610 and, in the limit t 0, obtain ((Curl H)(q))u = 0. Since u and q were arbitrary, → ∈ V ∈ D we conclude that Curl H = 0. Remark 1: In the second part of the Theorem, the condition that be D convex can be replaced by the weaker one that be “simply connected”. D This means, intuitively, that every closed curve in can be continuously D shrunk entirely within to a point. D Since Curl ϕ = (2)ϕ ( (2)ϕ)∼ by (611.1), we can restate the first ∇ ∇ − ∇ part of the Curl-Gradient Theorem as follows: Theorem on Symmetry of Second Gradients: If ϕ : ′ D→E is of class C2, then its second gradient (2)ϕ : Lin ( 2, ′) has ∇ D → 2 V V symmetric values, i.e. Rng (2)ϕ Sym ( 2, ′). ∇ ⊂ 2 V V Remark 2: The assertion that x( ϕ) is symmetric for a given x ∇ ∇ ∈ D remains valid if one merely assumes that ϕ is differentiable and that ϕ ∇ is differentiable at x. A direct proof of this fact, based on the results of Sect.64, is straightforward although somewhat tedious. We assume now that := is the set-product of flat spaces , E E1 ×E2 E1 E2 with translation spaces , , respectively. Assume that ϕ : ′ is V1 V2 D→E twice differentiable. Using Prop.1 of Sect.65 repeatedly, it is easily seen that

(2)ϕ(x) = ∇ ( ϕ)(x)(ev , ev )+( ϕ)(x)(ev , ev )+ (611.6) ∇(1)∇(1) 1 1 ∇(1)∇(2) 1 2 ( ϕ)(x)(ev , ev )+( ϕ)(x)(ev , ev ) ∇(2)∇(1) 2 1 ∇(2)∇(2) 2 2 for all x , where ev and ev are the evaluation mappings from := ∈ D 1 2 V to and , respectively (see Sect.04). Evaluation of (611.6) at V1 × V2 V1 V2 264 CHAPTER 6. DIFFERENTIAL CALCULUS

(u, v)=((u , u ), (v , v )) 2 gives 1 2 1 2 ∈ V (2)ϕ(x)(u, v) = ∇ ( ϕ)(x)(u , v )+( ϕ)(x)(u , v )+ (611.7) ∇(1)∇(1) 1 1 ∇(1)∇(2) 1 2 ( ϕ)(x)(u , v )+( ϕ)(x)(u , v ). ∇(2)∇(1) 2 1 ∇(2)∇(2) 2 2 The following is a corollary to the preceding theorem. Theorem on the Interchange of Partial Gradients: Assume that is an open subset of a product space := and that ϕ : ′ D E E1 ×E2 D→E is of class C2. Then ϕ =( ϕ)∼, (611.8) ∇(1)∇(2) ∇(2)∇(1) where the operation is to be understood as value-wise switching. ∼ Proof: Let x and w be given. If we apply (611.7) to the ∈ D ∈ V1 × V2 case when u := (w1, 0), v := (0, w2) and then with u and v interchanged we obtain (2)ϕ(x)(u, v)=( ϕ)(x)(w , w ), ∇ ∇(1)∇(2) 1 2 (2)ϕ(x)(v, u)=( ϕ)(x)(w , w ). ∇ ∇(2)∇(1) 2 1 Since w was arbitrary, the symmetry of (2)ϕ(x) gives (611.8). ∈ V1 × V2 ∇ Corollary: Let I be a finite index set and let be an open subset D of RI . If ϕ : ′ is of class C2, then the second partial derivatives D→E′ ϕ,i ,k : satisfy D → V ϕ,i ,k = ϕ,k ,i for all i,k I. (611.9) ∈ Remark 3: Assume is an open subset of a Euclidean space with D E translation space . A mapping h : = ∗ is then called a vector-field V D → V ∼ V (see Sect.71). If h is differentiable, then the range of Curl h is included in Skew ( 2, R) = Skew . If is 3-dimensional, there is a natural doubleton 2 V ∼ V V of orthogonal isomorphisms from Skew to , as will be explained in Vol.II. V V If one of these two isomorphisms, say V Orth(Skew , ), is singled out, ∈ V V we may consider the vector-curl curl h := V(Curl h) SkewV of the vector | field h (Note the lower-case “c”). This vector-curl rather than the curl of Def.1 is used in much of the literature.

Notes 611

(1) In some of the literature on “Vector Analysis”, the notation ∇ × h instead of curl h is used for the vector-curl of h, which is explained in the Remark above. This notation should be avoided for the reason mentioned in Note (1) to Sect.67. 612. LINEONIC EXPONENTIALS 265

612 Lineonic Exponentials

We note the following generalization of a familiar result of elementary cal- culus. Proposition 1: Let be a flat space with translation space and let E V I be an interval. Let g be a sequence of continuous processes in Map(I, ) V that converges locally uniformly to h Map(I, ). Let a I and q be ∈ V ∈ ∈ E given and define the sequence z in Map(I, ) by E t × zn(t) := q + gn for all n N and t I. (612.1) ∈ ∈ Za Then h is continuous and z converges locally uniformly to the process p : I defined by →E t p(t) := q + h for all t I. (612.2) ∈ Za

Proof: The continuity of h follows from the Theorem on Continuity of Uniform Limits of Sect.56. Hence (612.2) is meaningful. Let ν be a norm on . By Prop.2 of Sect.610 and (612.1) and (612.2) we have V t t ν(zn(t) p(t)) = ν (gn h) ν (gn h) (612.3) − − ≤ ◦ − Za  Za for all n N× and all t I. Now let s I be given. We can choose ∈ ∈ ∈ a compact interval [b, c] I such that a [b, c] and such that [b, c] is a ⊂ ∈ neightborhood of s relative to I (see Sect.56). By Prop.7 of Sect.58, g |[b,c] converges uniformly to h . Now let ǫ P× be given. By Prop.7 of Sect.55 |[b,c] ∈ we can determine m N× such that ∈ ǫ ν (gn h) < for all n m + N. ◦ − |[b,c] c b ∈ − Hence, by (612.3), we have ǫ ν(zn(t) p(t)) < t a <ǫ for all n m + N − | − |c b ∈ − and all t [b, c]. Since ε P× was arbitrary, we can use Prop.7 of Sect.55 ∈ ∈ again to conclude that z converges uniformly to p . Since s I was |[b,c] |[b,c] ∈ arbitrary and since [b, c] Nhds(I), the conclusion follows. ∈ From now on we assume that a linear space is given and we consider V the algebra Lin of lineons on (see Sect.18). V V 266 CHAPTER 6. DIFFERENTIAL CALCULUS

Proposition 2: The sequence S in Map (Lin , Lin ) defined by V V 1 S (L) := Lk (612.4) n k! [ kX∈n for all L Lin and all n N× converges locally uniformly. ∈ V ∈ Its limit is called the (lineonic) exponential for and is denoted by V exp : Lin Lin . The exponential exp is continuous. V V → V V Proof: We choose a norm ν on . Using Prop.6 of Sect.52 and induction, k k V we see that L ν L ν for all k N and all L Lin (see (52.9)). k k ≤ k k ∈ ∈ V Hence, given σ P×, we have ∈ k 1 k σ L ν kk! k ≤ k! σk for all k N and all L σCe( ν). Since the sum-sequence of ( k N) ∈ ∈ k•k k! | ∈ converges (to eσ), we can use Prop.9 of Sect.55 to conclude that the restric- 1 k tion of the sum-sequence S of ( L k N) to σCe( ν) converges k! | ∈ k•k uniformly. Now, given L Lin , σCe( ν) is a neighborhood of L if ∈ V k•k σ > L ν. Therefore S converges locally uniformly. The continuity of its k k limit expV follows from the Continuity Theorem for Uniform Limits. For later use we need the following: Proposition 3: Let L Lin be given. Then the constant zero is the ∈ V only differentiable process D : R Lin that satisfies → V D• = LD and D(0) = 0. (612.5)

Proof: By the Fundamental Theorem of Calculus, (612.5) is equivalent to t D(t) = (LD) for all t R. (612.6) ∈ Z0 Choose a norm ν on . Using Prop.2 of Sect.610 and Prop.6 of Sect.52, we V conclude from (612.6) that

t t D(t) ν LD(s) νds L ν D(s) νds, k k ≤ k k ≤k k k k Z0 Z0 i.e., with the abbreviations

σ(t) := D(t) ν for all t P, κ := L ν, (612.7) k k ∈ k k that t 0 σ(t) κ σ for all t P. (612.8) ≤ ≤ ∈ Z0 612. LINEONIC EXPONENTIALS 267

We define ϕ : P R by → t ϕ(t):=e−κt σ for all t P. (612.9) ∈ Z0 Using elementary calculus, we infer from (612.9) and (612.8) that

t ϕ•(t) = κe−κt σ +e−κtσ(t) 0 − ≤ Z0 for all t P and hence that ϕ is antitone. On the other hand, it is evident ∈ from (612.9) that ϕ(0) = 0 and ϕ 0. This can happen only if ϕ = 0, ≥ which, in turn, can happen only if σ(t) = 0 for all t P and hence, by ∈ (612.7), if D(t) = 0 for all t P. To show that D(t) = 0 for all t P, one ∈ ∈ − need only replace D by D ( ι) and L by L in (612.5) and then use the ◦ − − result just proved. Proposition 4: Let L Lin be given. Then the only differentiable ∈ V process E : R Lin that satisfies → V • E = LE and E(0) = 1V (612.10) is the one given by E := exp (ιL). (612.11) V ◦ Proof: We consider the sequence G in Map(R, Lin ) defined by V tk G (t) := Lk+1 (612.12) n k! [ kX∈n × for all n N and all t R. Since, by (612.4), Gn(t) = LSn(tL) for ∈ ∈ all n N× and all t R, it follows from Prop.2 that G converges locally ∈ ∈ uniformly to L exp (ιL) : R Lin . On the other hand, it follows from ◦ → V (612.12) that

t tk+1 1 + G = 1 + Lk+1 = S (tL) V n V (k + 1)! n+1 0 [ Z kX∈n for all n N× and all t R. Applying Prop.1 to the case when and are ∈ ∈ E V replaced by Lin , I by R, g by G, a by 0, and q by 1 , we conclude that V V t exp (tL) = 1 + (L exp (ιL)) for all t R, (612.13) V V V ◦ ∈ Z0 268 CHAPTER 6. DIFFERENTIAL CALCULUS which, by the Fundamental Theorem of Calculus, is equivalent to the asser- tion that (612.10) holds when E is defined by (612.11). The uniqueness of E is an immediate consequence of Prop.3. Proposition 5: Let L Lin and a continuous process F : R Lin ∈ V → V be given. Then the only differentiable process D : R Lin that satisfies → V D• = LD + F and D(0) = 0 (612.14) is the one given by

t D(t) = E(t s)F(s)ds for all t R, (612.15) − ∈ Z0 where E : R Lin is given by (612.11). → V Proof: Let D be defined by (612.15). Using the Corollary to the Dif- ferentiation Theorem for Integral Representations of Sect.610, we see that D is of class C1 and that D• is given by

t D•(t) = E(0)F(t) + E•(t s)F(s)ds for all t R. − ∈ Z0 Using (612.10) and Prop.1 of Sect.610, and then (612.15), we find that (612.14) is satisfied. The uniqueness of D is again an immediate consequence of Prop.3. Differentiation Theorem for Lineonic Exponentials: Let be a V linear space. The exponential exp : Lin Lin is of class C1 and its V V → V gradient is given by

1 ( L exp )M = exp (sL)M exp ((1 s)L)ds (612.16) ∇ V V V − Z0 for all L, M Lin . ∈ V Proof: Let L, M Lin be given. Also, let r R× be given and put ∈ V ∈ K := L + rM, E := exp (ιL), E := exp (ιK). (612.17) 1 V ◦ 2 V ◦ • • By Prop.4 we have E1 = LE1, E2 = KE2, and E1(0) = E2(0) = 1V . Taking the difference and observing (612.17) we see that D := E E 1 2 − 1 satisfies D• = KE LE = LD + rME , D(0) = 0. 2 − 1 2 Hence, if we apply Prop.5 with the choice F := rME2 we obtain t (E E )(t) = r E (t s)ME (s)ds for all t R. 2 − 1 1 − 2 ∈ Z0 612. LINEONIC EXPONENTIALS 269

For t := 1 we obtain, in view of (612.17),

1 1 (exp (L + rM) exp (L)) = (exp ((1 s)L))M exp (s(L + rM))ds. r V − V V − V Z0

Since expV is continuous by Prop.2, and since bilinear mappings are contin- uous (see Sect.66), we see that the integrand on the right depends contin- uously on (s, r) R2. Hence, by Prop.3 of Sect.610 and by (65.13), in the ∈ limit r 0 we find → 1 (ddM exp )(L) = (exp ((1 s)L))M exp (sL)ds. (612.18) V V − V Z0 Again, we see that the integrand on the right depends continuously on (s, L) R Lin . Hence, using Prop.3 of Sect.610 again, we conclude that ∈ × V the mapping ddM expV : Lin Lin is continuous. Since M Lin V → V ∈1 V was arbitrary, it follows from Prop.7 of Sect.65 that expV is of class C . The formula (612.16) is the result of combining (65.14) with (612.18). Proposition 6: Assume that L, M Lin commute, i.e. that LM = ∈ V ML. Then:

(i) M and expV (L) commute. (ii) We have

expV (L + M) = (expV (L))(expV (M)). (612.19)

(iii) We have ( L exp )M = (exp (L))M. (612.20) ∇ V V Proof: Put E := exp (ιL) and D := EM ME. By Prop.4, we find v ◦ − D• = E•M ME• = LEM MLE. Hence, since LM = ML, we obtain − − D• = LD. Since D(0) = 1 M M1 = 0, it follows from Prop.3 that V − V D = 0 and hence D(1) = 0, which proves (i). Now put F := exp (ιM). By the Product Rule (66.13) and by Prop.4, v ◦ we find (EF)• = E•F + EF• = LEF + EMF. Since EM = ME by part (i), we obtain (EF)• = (L + M)EF. Since (EF)(0) = E(0)F(0) = 1 , by Prop.4, EF = exp (ι(L + M)). Evaluation V v ◦ at 1 gives (612.19), which proves (ii). Part (iii) is an immediate consequence of (612.16) and of (i) and (ii). 270 CHAPTER 6. DIFFERENTIAL CALCULUS

Pitfalls: Of course, when = R, then Lin = LinR = R and the V V ∼ lineonic exponential reduces to the ordinary exponential exp. Prop.6 shows how the rule exp(s+t) = exp(s)exp(t) for all s, t R and the rule exp• = exp ∈ generalize to the case when = R. The assumption, in Prop.6, that L and V 6 M commute cannot be omitted. The formulas (612.19) and (612.20) need not be valid when LM = ML. 6 If the codomain of the ordinary exponential is restricted to P× it becomes invertible and its inverse log : P× R is of class C1. If dim > 1, then exp → V V does not have differentiable local inverses near certain values of L Lin ∈ V because for these values of L, the gradient L exp fails to be injective (see ∇ V Problem 10). In fact, one can prove that expV is not locally invertible near the values of L in question. Therefore, there is no general lineonic analogue of the logarithm. See, however, Sect.85.

613 Problems for Chapter 6

(1) Let I be a genuine interval, let be a flat space with translation space E , and let p : I be a differentiable process. Define h : I2 by V →E → V

p(s)−p(t) s−t if s = t h(s, t) := • 6 . (P6.1) ( p (s) if s = t )

(a) Prove: If p• is continuous at t I, then h is continuous at ∈ (t,t) I2. ∈ (b) Find a counterexample which shows that h need not be continu- ous if p is merely differentiable and not of class C1.

(2) Let f : R be a function whose domain is an open convex subset D→ D of a flat space with translation space . E V (a) Show: If f is differentiable and f is constant, then f = a for ∇ |D some flat function a Flf( ) (see Sect.36). ∈ E (b) Show: If f is twice differentiable and (2)f is constant, then f ∇ has the form

f = a + Q (1 q ), (P6.2) |D ◦ D⊂E − D→E where a Flf( ), q , and Q Qu( ) (see Sect. 27). ∈ E ∈E ∈ V 613. PROBLEMS FOR CHAPTER 6 271

(3) Let be an open subset of a flat space and let be a genuine inner- D E W product space. Let k : × be a mapping that is differentiable D→W at a given x . ∈ D (a) Show that the value-wise magnitude k : P×, defined by | | D → k (y) := k(y) for all y , is differentiable at x and that | | | | ∈ D 1 ⊤ x k = ( xk) k(x). (P6.3) ∇ | | |k(x)| ∇ (b) Show that k/ k : × is differentiable at x and that | | D→W

k 1 2 ⊤ x = 3 k(x) xk k(x) ( xk) k(x) . (P6.4) ∇ |k| |k(x)| | | ∇ − ⊗ ∇   (4) Let be a genuine inner-product space with translation space , let E V q be given, and define r : q × by ∈E E\{ } → V r(x) := x q for all x q (P6.5) − ∈E\{ } and put r := r (see Part (a) of Problem 3). | | (a) Show that r/r : q × is of class C1 and that E\{ } → V r = 1 (r21 r r). (P6.6) ∇ r r3 V − ⊗ (Hint: Use Part (b) of Problem  3.) (b) Let the function h : P× R be twice differentiable. Show that → h r : q R is twice differentiable and that ◦ E\{ }→ (2)(h r) = 1 1 (r(h•• r) (h• r))r r +(h• r)1 . ∇ ◦ r r2 ◦ − ◦ ⊗ ◦ V (P6.7)  (c) Evaluate the Laplacian ∆(h r) and reconcile your result with ◦ (67.17). (5) A Euclidean space of dimension n with n 2, a point q , and a E ≥ ∈E linear space are assumed given. W (a) Let I be an open interval, let be an open subset of , let D E f : I be given by (67.16), let a be a flat function on , let D → E g : I be twice differentiable, and put h := a (g f): →W |D ◦ D→ . Show that W ∆h =2a ((2ιg•• +(n + 2)g•) f) 4a(q)(g• f). (P6.8) |D ◦ − ◦ 272 CHAPTER 6. DIFFERENTIAL CALCULUS

(b) Assuming that the Euclidean space is genuine, show that the E function h : q given by E\{ }→W h(x) := ((e (x q)) x q −n)a + b (P6.9) · − | − | is harmonic for all e := and all a, b . (Hint: Use ∈ V E−E ∈W Part (a) with a determined by a = e, a(q) = 0.) ∇ (6) Let be a flat space, let q , let Q be a non-degenerate quadratic E ∈ E form (see Sect.27) on := , and define f : R by V E−E E →

f(y) := Q(y q) for all y . (P6.10) − ∈E

Let a be a non-constant flat function on (see Sect.36). E (a) Prove: If the restriction of f to the hyperplane := a<( 0 ) F { } attains an extremum at x , then x must be given by ∈F −1 x = q λ Q ( a), where λ := a(q) . (P6.11) − ∇ −1 Q (∇a,∇a)

(Hint: Use the Constrained-Extremum Theorem.) (b) Under what condition does f actually attain a maximum or a |F minimum at the point x given by (P6.11)?

(7) Let be a 2-dimensional Euclidean space with translation-space E V and let J Orth Skew be given (see Problem 2 of Chapt.4). Let ∈ V ∩ V be an open subset of and let h : be a vector-field of class D E D → V C1.

(a) Prove that

Curl h = (div(Jh))J. (P6.12) − (b) Assuming that is convex and that div h = 0, prove that h = D J f for some function f : R of class C2. ∇ D→ Note: If h is interpreted as the field of a volume-preserving flow, then div h = 0 is valid and a function f as described in Part (b) is called a “stream- function” of the flow. 613. PROBLEMS FOR CHAPTER 6 273

(8) Let a linear space , a lineon L Lin , and u be given. Prove: V ∈ V ∈ V The only differentiable process h : R that satisfies → V h• = Lh and h(0) = u (P6.13)

is the one given by

h := (exp (ιL))u. (P6.14) V ◦ (Hint: To prove existence, use Prop. 4 of Sect. 612. To prove unique- ness, use Prop.3 of Sect.612 with the choice D := h λ (value-wise), ⊗ where λ ∗ is arbitrary.) ∈ V (9) Let a linear space and a lineon J on that satisfies J2 = 1 be V V − V given. (There are such J if dim is even; see Sect.89.) V (a) Show that there are functions c : R R and d : R R, of class → → C1, such that

exp (ιJ) = c1 + dJ. (P6.15) V ◦ V (Hint: Apply the result of Problem 8 to the case when is re- V placed by := Lsp(1 , J) Lin and when L is replaced by C V ⊂ V LeJ Lin , defined by LeJU = JU for all U .) ∈ C ∈C (b) Show that the functions c and d of Part (a) satisfy

d• = c, c• = d, c(0) = 1, d(0) = 0, (P6.16) − and

c(t + s) = c(t)c(s) d(t)d(s) − for all s, t R. (P6.17) d(t + s) = c(t)d(s) + d(t)c(s) ∈  (c) Show that c = cos and d = sin.

Remark: One could, in fact, use Part (a) to define sin and cos.

(10) Let a linear space and J Lin satisfying J2 = 1 be given and V ∈ V − V put

:= L Lin LJ = JL . (P6.18) A { ∈ V | − } 274 CHAPTER 6. DIFFERENTIAL CALCULUS

(a) Show that = Null ( πJ exp ) and conclude that πJ exp fails A ∇ V ∇ V to be invertible when dim > 0. (Hint: Use the Differentiation V Theorem for Lineonic Exponentials, Part (a) of Problem 9, and Part (d) of Problem 12 of Chap. 1). (b) Prove that 1 Bdy Rng exp and hence that exp fails to be − V ∈ V V locally invertible near πJ.

(11) Let be an open subset of a flat space with translation space and D E V let ′ be a linear space. V (a) Let H : Lin( , ′) be of class C2. Let x and v be D → V V ∈ D ∈ V given. Note that

′ 2 ′ xCurl H Lin( , Lin( , Lin( , ))) = Lin ( , Lin( , )) ∇ ∈ V V V V ∼ 2 V V V and hence

∼ 2 ′ ′ ( xCurl H) Lin ( , Lin( , )) = Lin( , Lin( , Lin( , ))). ∇ ∈ 2 V V V ∼ V V V V Therefore we have

∼ ′ 2 ′ G := ( xCurl H) v Lin( , Lin( , )) = Lin ( , ). ∇ ∈ V V V ∼ 2 V V Show that

∼ G G =( xCurl H)v. (P6.19) − ∇ (Hint: Use the Symmetry Theorem for Second Gradients.) (b) Let η : ∗ be of class C2 and put D → V W := Curl η : Lin( , ∗) = Lin ( 2, R). D→ V V ∼ 2 V Prove: If h : is of class C1, then D → V Curl (Wh)=( W)h + W h +( h)⊤W, (P6.20) ∇ ∇ ∇ where value-wise evaluation and composition are understood.

(12) Let and ′ be flat spaces with dim = dim ′ 2, let ϕ : ′ be E E E E ≥ E→E a mapping of class C1 and put

:= x xϕ is not invertible . C { ∈E |∇ } 613. PROBLEMS FOR CHAPTER 6 275

(a) Show that ϕ>( ) Rng ϕ Bdy (Rng ϕ). C ⊃ ∩ (b) Prove: If the pre-image under ϕ of every bounded subset of ′ is E bounded and if Acc ϕ>( ) = (see Def.1 of Sect.57), then ϕ is C ∅ surjective. (Hint: Use Problem 13 of Chap.5.)

(13) By a complex p we mean an element of C(N), i.e. a sequence in C indexed on N and with finite support (see (07.10)). If p = 0, we define the degree of p by 6

deg p := max Supt p = max k N pk =0 . { ∈ | 6 } By the derivative p′ of p we mean the complex polynomial p′ C(N) ∈ defined by

′ (p )k := (k + 1)pk for all k N. (P6.21) +1 ∈ The polynomial function p˜ : C C of p is defined by →

k p˜(z) := N pkz for all z C. (P6.22) k∈ ∈ P Let p (C(N))× be given. ∈ (a) Show thatp ˜<( 0 ) is finite and ♯p˜<( 0 ) deg p. { } { } ≤ (b) Regarding C as a two-dimensional linear space over R, show that p˜ is of class C2 and that

′ ( zp˜)w =p ˜ (z)w for all z, w C. (P6.23) ∇ ∈ (c) Prove thatp ˜ is surjective if deg p > 0. (Hint: Use Part (b) of Problem 12.) (d) Show: If deg p > 0, then the equation ? z C,p ˜(z) = 0 has at ∈ least one and no more than deg p solutions.

Note: The assertion of Part (d) is usually called the “Fundamental Theorem of Algebra”. It is really a theorem of analysis, not algebra, and it is not all that fundamental.