Multivariable Calculus: Inverse and Implicit Function Theorems¹

A. K. Nandakumaran²

1. Introduction

We plan to introduce calculus on Rn, namely the concept of the total derivative of vector-valued functions f : Rn → Rm in more than one variable. We are indeed familiar with the notion of partial derivatives ∂ifj = ∂fj/∂xi, 1 ≤ i ≤ n, 1 ≤ j ≤ m. In the sequel, we see the importance of introducing the powerful concept of the total derivative and its connection to the partial derivatives. The reader is expected to have a sound knowledge of basic linear algebra and notions like: Rn as a normed linear space, basis, dimension, linear operators and transformations and so on. We remark that the total derivative (also known as the Fréchet derivative) can be extended to infinite dimensional normed linear spaces, which allows one to solve more complicated problems, especially those arising from optimal control problems, partial differential equations and so on.

Motivation: One of the fundamental problems in mathematics (and hence in applications as well) is as follows: Let f : Rn → Rn. Given y ∈ Rn, solve for x from the system of equations

(1.1) f(x) = y

and represent x = g(y), and if possible find good properties of g, namely smoothness. More generally, if f : Rn+m → Rn, x ∈ Rn, y ∈ Rm, solve the implicit system of equations

(1.2) f(x, y) = 0

and represent x = g(y). Note that f : Rn → Rn can be written as f = (f1, ··· , fn)T, where fi : Rn → R are real-valued functions.

Linear system: Let us look at the well-known linear system

(1.3) Ax = y

¹The lectures were delivered at the Science Academy workshop held at the Department of Mathematics, Jain University, Bangalore during the period February 16-18, 2012.
²Department of Mathematics, Indian Institute of Science, Bangalore, Email: [email protected]

where A is a given n × n matrix. The system (1.3) can be rewritten as

(1.4) Σ_{j=1}^n aij xj = yi, 1 ≤ i ≤ n.

The system (1.3) or (1.4) is uniquely solvable for x in terms of y if and only if det A ≠ 0. In this case x = A⁻¹y and A⁻¹ is also an n × n matrix. We would like to address the solvability of (1.1) and (1.2) by giving appropriate conditions like det A ≠ 0 as in the linear system.
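As a quick numerical illustration of this solvability criterion, the following Python/NumPy sketch (with a sample 2 × 2 matrix chosen here purely for illustration) checks det A ≠ 0 and recovers x = A⁻¹y:

```python
import numpy as np

# A sample 2x2 instance of the linear system (1.3)-(1.4): Ax = y is
# uniquely solvable iff det A != 0, in which case x = A^{-1} y.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
y = np.array([5.0, 10.0])

assert abs(np.linalg.det(A)) > 1e-12   # the solvability condition det A != 0
x = np.linalg.solve(A, y)              # x = A^{-1} y
assert np.allclose(A @ x, y)           # check: Ax recovers y
print(x)                               # -> [1. 3.]
```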

Example 1.1. Define f : R → R by f(x) = x². Clearly f(0) = 0. For y > 0, the equation x² = y has two solutions x1 = +√y and x2 = −√y (no uniqueness), and for y < 0, the equation has no solution. Thus we sense trouble around y = 0. Note that ∂f/∂x|x=0 = 2x|x=0 = 0 is the cause of the trouble, as we will see later. If we take any a ≠ 0 and put b = a² = f(a), then for any y ∈ (b − ε, b + ε), ε small, ∃! x ∈ (a − δ, a + δ) for some δ such that f(x) = y. That is, the equation is solvable in a neighbourhood of the point (a, b). Here observe that ∂f/∂x|x=a = 2x|x=a = 2a ≠ 0.

Example 1.2. More generally, consider f : R × R → R defined by f(x, y) = x² + y² − 1. Indeed one knows that the solutions (x, y) of the equation f(x, y) = 0 are points on the unit circle. Consider the solvability of x in terms of y near the solution (0, 1) of x² + y² − 1 = 0, that is x² = 1 − y². For y near 1, we have two solutions x1 = +√(1 − y²), x2 = −√(1 − y²). Similar is the case near the point (0, −1). Again observe that ∂f/∂x|(0,±1) = 2x|(0,±1) = 0. On the other hand, consider the point (1, 0). For y near 0, ∃! x = +√(1 − y²), and for (−1, 0) and y near 0, ∃! x = −√(1 − y²). In fact, for any (a, b) with a² + b² − 1 = 0 and a ≠ 0, we get ∂f/∂x|(a,b) ≠ 0 and the system is uniquely solvable for x for y in a neighborhood of b. The situation is reversed if one looks at the possibility of solving y in terms of x.

Remark 1.3. Thus we see the impact of the non-vanishing of the derivative on the solvability, similar to det A ≠ 0 in linear systems. In the higher dimensional case, we have many derivatives and we need a systematic procedure to deal with such complicated cases. In other words, we would like to understand the solvability of a system of non-linear equations in many unknowns. This is given via the inverse and implicit function theorems. We also remark that we will only get a local theorem, not a global theorem like in linear systems.

2. Partial, Directional and Fréchet Derivatives

Let f : R → R and x0 ∈ R. Then f′(x0) is normally defined as

(2.1) f′(x0) = lim_{h→0} (f(x0 + h) − f(x0))/h.

We are also aware of the fact that f′(x0) is the slope of the tangent to the curve y = f(x) at the point (x0, f(x0)). We will soon give another interpretation of the derivative via a linear transformation, which is at the heart of the concept of the Fréchet derivative.

Let U be an open subset of Rn and f : U → Rm be a vector-valued map represented by f = (f1, ··· , fm)T, where fi : U → R are real-valued maps. The limit definition can easily be used to define the directional derivatives in any direction, and in particular the partial derivatives, which are nothing but the directional derivatives along the co-ordinate axes.

Directional and Partial Derivatives: Recall that the derivative in (2.1) is the instantaneous rate of change of the output f(x) with respect to the input x. Thus, if we consider f(x) at x0 ∈ Rn, there are infinitely many radial directions emanating from x0. In particular, if we have a two-variable function f(x, y), then ∂f/∂x(x0, y0) is the instantaneous rate of change of f along the x-axis (keeping y fixed) and is given by

(2.2) ∂f/∂x(x0, y0) = lim_{h→0} (f(x0 + h, y0) − f(x0, y0))/h.

Similarly for ∂f/∂y(x0, y0). Any given vector v ∈ Rn determines a direction given by its position vector. Thus for x0 ∈ Rn, f(x0 + hv) − f(x0), h ∈ R, is the change in f in the direction v. This motivates the definition of the directional derivative of f at x0 ∈ Rn in the direction v, denoted by Dvf(x0):

(2.3) Dvf(x0) = lim_{h→0} (f(x0 + hv) − f(x0))/h

whenever the limit exists. Note that if f = (f1, ··· , fm)T, then

Dvf(x0) = (Dvf1(x0), ··· , Dvfm(x0))T.

Corollary 2.1. If v = ei = (0, ··· , 0, 1, 0, ··· , 0) is the i-th co-ordinate axis vector, then clearly

Dei f(x0) = ∂f/∂xi(x0) = (∂f1/∂xi(x0), ··· , ∂fm/∂xi(x0))T.

Example 2.2. Define f : Rn → R by f(x) = |x|² = Σ_{i=1}^n xi² = Σ_{i=1}^n xixi = ⟨x, x⟩. Then ∂f/∂xi(x0) = 2x0i.

Now for v ∈ Rn,

f(x0 + hv) = Σ_{i=1}^n (x0i + hvi)²
           = Σ_{i=1}^n x0i² + 2h Σ_{i=1}^n x0ivi + h² Σ_{i=1}^n vi²
           = f(x0) + 2h(x0, v) + h²|v|².

It follows that

Dvf(x0) = 2(x0, v).
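This formula is easy to verify numerically. The following Python sketch (with a sample point and direction chosen here for illustration) compares the difference quotient (2.3) against 2(x0, v):

```python
import numpy as np

f = lambda x: np.dot(x, x)              # f(x) = |x|^2
x0 = np.array([1.0, 2.0, -1.0])         # sample point (our choice)
v  = np.array([0.5, -1.0, 2.0])         # sample direction (our choice)

h = 1e-6
numeric = (f(x0 + h * v) - f(x0)) / h   # difference quotient (2.3)
exact   = 2 * np.dot(x0, v)             # the formula D_v f(x0) = 2(x0, v)
assert abs(numeric - exact) < 1e-4
```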

Remark 2.3. As seen earlier, the existence of all directional derivatives implies the existence of partial derivatives. But the converse is not true.

Exercise 2.4. Let f : R² → R be defined by f(x, y) = x + y if x = 0 or y = 0, and f(x, y) = 1 otherwise.

Then D(1,0)f(0, 0) = D(0,1)f(0, 0) = 1, but D(a,b)f(0, 0), a ≠ 0, b ≠ 0, does not exist.

Remark 2.5. Normally, we expect differentiable functions to be continuous. But the existence of all directional derivatives at a point does not imply continuity at that point. This is a serious drawback and prompts us to look for a stronger concept of derivative, namely the notion of the total derivative.

Exercise 2.6. Consider f(x, y) = xy²/(x² + y⁴) for x ≠ 0, and f(x, y) = 0 for x = 0.

Show that Dvf(0, 0) exists for all v ∈ R², but f is not continuous at (0, 0).
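A numerical sketch of this phenomenon (one can check by hand that D_(a,b)f(0,0) = b²/a for a ≠ 0, while f equals 1/2 along the parabola x = y²):

```python
# The function of Exercise 2.6: all directional derivatives exist at the
# origin, yet f is discontinuous there.
def f(x, y):
    return x * y**2 / (x**2 + y**4) if x != 0 else 0.0

# Directional derivative at (0,0) in direction v = (a,b), a != 0:
# the difference quotient tends to b^2 / a.
a, b, h = 1.0, 2.0, 1e-6
assert abs((f(h * a, h * b) - f(0.0, 0.0)) / h - b**2 / a) < 1e-3

# Discontinuity: along x = t^2, y = t the values stay at 1/2 as t -> 0.
for t in [0.1, 0.01, 0.001]:
    assert abs(f(t**2, t) - 0.5) < 1e-12
```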

Exercise 2.7. Consider f(x, y) = xy(x² − y²)/(x² + y²) for (x, y) ≠ (0, 0), and f(0, 0) = 0. Then show that ∂²f/∂x∂y(0, 0) = 1 and ∂²f/∂y∂x(0, 0) = −1, which shows that, in general, the order of partial derivatives cannot be interchanged.
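The two mixed partials can be approximated by finite differences (a Python sketch; the step sizes are our choice, with the inner step kept much smaller than the outer one):

```python
# The function of Exercise 2.7.
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)

k = 1e-6   # inner step, for the first derivatives
h = 1e-3   # outer step, for the second derivatives (kept >> k)

fy = lambda x: (f(x, k) - f(x, -k)) / (2 * k)   # df/dy at (x, 0)
fx = lambda y: (f(k, y) - f(-k, y)) / (2 * k)   # df/dx at (0, y)

d2_xy = (fy(h) - fy(0.0)) / h   # approximates d2f/dxdy(0,0) = 1
d2_yx = (fx(h) - fx(0.0)) / h   # approximates d2f/dydx(0,0) = -1
assert abs(d2_xy - 1.0) < 1e-2
assert abs(d2_yx + 1.0) < 1e-2
```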

Total (Fréchet) Derivative: Recall that if f : R → R, then f′(x0) = α represents the slope of the tangent to the curve y = f(x) at the point (x0, f(x0)). In this case, one can associate the linear equation, namely the line y = αx = f′(x0)x. In other words, the derivative can be viewed as the linear mapping Tα : R → R defined by

(2.4) Tαx = αx = f′(x0)x.

Thus interpreting any differentiation concept as a linearization is the crux of the matter, not only in finite dimensions, but in infinite dimensions as well. Once again recall f′(x0) in dimension one, which can be re-cast as

(2.5) f(x0 + h) = f(x0) + f′(x0)h + r(h)

where the remainder (or error) term satisfies

(2.6) lim_{h→0} r(h)/h = 0.

That is,

f(x0 + h) = value of f at x0 + linearized term + remainder o(h).

This can be easily extended to vector-valued functions of one variable, namely f : R → Rm. Here f(x) = (f1(x), ··· , fm(x))T with x ∈ R and f′(x0) = (f1′(x0), ··· , fm′(x0))T. If we define αi = fi′(x0), then α = (α1, ··· , αm) ∈ Rm. Correspondingly, we can associate a linear operator Tα : R → Rm defined by

Tα(x) = αx = (α1x, ··· , αmx)T = (f1′(x0)x, ··· , fm′(x0)x)T, x ∈ R.

Further, for each i,

fi(x0 + h) = fi(x0) + fi′(x0)h + ri(h), h ∈ R,

where ri(h)/h → 0 as h → 0. In vector notation,

(2.7) f(x0 + h) = f(x0) + f′(x0)h + r(h), lim_{h→0} r(h)/h = 0.

If f : Rn → Rm, then x0, h are vectors and one has to interpret the meaning of the product f′(x0)h and the division r(h)/h. Equation (2.7) can equivalently be rewritten as

lim_{h→0} |f(x0 + h) − f(x0) − f′(x0)h| / |h| = 0,

where f′(x0)h is the action of the linear operator f′(x0) on h.

Definition 2.8 (Fréchet Derivative): Let U ⊂ Rn be open, f : U → Rm and x0 ∈ U. We say f is differentiable (F-differentiable) at x0 if ∃ a linear operator T : Rn → Rm such that

(2.8) lim_{h→0} |f(x0 + h) − f(x0) − Th| / |h| = 0.

If such a linear transformation T exists, we denote T = f′(x0). If f is differentiable at all points of U, we say f is differentiable in U, and f′(x0) is also known as the total derivative or Fréchet derivative of f at x0.

Example 2.9. Let A ∈ L(Rn, Rm) be an m × n matrix. Define f : Rn → Rm by f(x) = Ax. Then clearly f(x0 + h) − f(x0) = A(x0 + h) − Ax0 = Ah by linearity. Therefore r(h) = 0 and f′(x0) = A.

Example 2.10. Let f : Rn → R be given by f(x) = |x|² = (x, x). Then f′(x0)h = 2(x0, h), that is, f′(x0) = 2x0.

Remark 2.11. The definition can be extended to infinite dimensional normed linear spaces without much difficulty, hence the enormous range of applications.

The following results can be verified by the reader.

Proposition 2.12. If f : Rn → Rm is differentiable at x0, then f is continuous at x0.

Proposition 2.13 (Chain Rule): Let f : U ⊂ Rn → Rm, U open, be differentiable at x0 and g : Rm → Rk be differentiable at y0 = f(x0). Then the composite function F(x) = g ◦ f(x) = g(f(x)), F : Rn → Rk, is differentiable at x0 and

(2.9) F′(x0) = g′(f(x0))f′(x0).

We have to interpret the product g′(f(x0))f′(x0) as a composition, since these are linear operators.

Linear operators and matrices: Let A be an m × n matrix; then f : Rn → Rm defined by f(x) = Ax is a linear operator. Conversely, if f is a linear operator, then ∃ an m × n matrix A such that f(x) = Ax. Of course the matrix representation depends on the bases of Rn and Rm. Thus every linear operator f ∈ L(Rn, Rm) can be identified with a matrix A ∈ M(m, n).

In the above proposition, let A ∈ M(m, n), B ∈ M(k, m) and C ∈ M(k, n) be, respectively, the matrices corresponding to f′(x0), g′(f(x0)) and F′(x0). Then the equality

(2.9) can be read as C = BA. Notice that the orders of the matrices are such that the product BA is well defined.
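The matrix identity C = BA can be checked numerically with finite-difference Jacobians (a Python sketch; the maps f and g below are sample smooth maps of our choosing):

```python
import numpy as np

def jacobian(F, x, h=1e-6):
    """Central-difference Jacobian of F at x (rows = components of F)."""
    x = np.asarray(x, dtype=float)
    cols = [(F(x + h * e) - F(x - h * e)) / (2 * h) for e in np.eye(len(x))]
    return np.column_stack(cols)

# Sample maps: f : R^2 -> R^3 and g : R^3 -> R^2.
f = lambda x: np.array([x[0] * x[1], np.sin(x[0]), x[1]**2])
g = lambda y: np.array([y[0] + y[2], y[1] * y[2]])
F = lambda x: g(f(x))                  # the composite g o f

x0 = np.array([0.3, -1.2])
A = jacobian(f, x0)                    # matrix of f'(x0),    3 x 2
B = jacobian(g, f(x0))                 # matrix of g'(f(x0)), 2 x 3
C = jacobian(F, x0)                    # matrix of F'(x0),    2 x 2
assert np.allclose(C, B @ A, atol=1e-5)   # chain rule: C = BA
```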

Exercise 2.14. Show that in Exercises 2.4, 2.6 and 2.7, f′(0, 0) does not exist.

Remark 2.15. This indicates that the existence of all directional derivatives is not enough to guarantee the existence of the total derivative. But if the total derivative exists, then all the directional derivatives exist and, in fact, one can compute the total derivative using the partial derivatives.

Let {e1, ··· , en} and {e˜1, ··· , e˜m} be the standard bases of Rn and Rm, respectively. If f′(x0) exists, then for i fixed, by definition and linearity of f′(x0), we get

f(x0 + hei) − f(x0) = f′(x0)(hei) + r(hei) = h f′(x0)ei + r(hei),

where |r(hei)|/h → 0 as h → 0. Dividing by h and taking the limit as h → 0, we get

(2.10) ∂f/∂xi(x0) = f′(x0)ei = (∂f1/∂xi, ··· , ∂fm/∂xi)T.

More generally, if v ∈ Rn, then v = Σ viei and

Dvf(x0) = f′(x0)v = Σ vi f′(x0)ei (by linearity) = Σ vi ∂f/∂xi.

Thus, the matrix representation of f′(x0) is given by

(2.11) f′(x0) = (∂fj/∂xi), 1 ≤ j ≤ m, 1 ≤ i ≤ n.

The above results can be consolidated in

Theorem 2.16. Let f : U ⊂ Rn → Rm be differentiable at x0 ∈ U. Then ∂fj/∂xi(x0) exists for all 1 ≤ i ≤ n, 1 ≤ j ≤ m, and f′(x0) is given as in (2.11). That is,

f′(x0)ei = Σ_{j=1}^m (∂fj/∂xi) e˜j.

3. Inverse Function Theorem

In this section, we address the solvability of the non-linear equation in explicit form:

(3.1) f(x) = y

where f : E ⊂ Rn → Rm, E is open and y ∈ Rm is given. This is a set of m non-linear equations in n unknowns:

(3.2)  f1(x1, ··· , xn) = y1
       ⋮
       fm(x1, ··· , xn) = ym

Given a ∈ E, let b = f(a); then (a, b) is a solution to (3.1). We want to give conditions under which one can solve for x for all y in a neighborhood of b.

Theorem 3.1 (Inverse Function Theorem). Let f be as above with m = n, satisfying
(1) f is a C1 map, that is, f′(x) exists for all x ∈ E and the mapping x 7→ f′(x), E → L(Rn, Rn), is continuous;
(2) the matrix f′(a) is invertible, that is, det f′(a) ≠ 0.
Then ∃ open sets U and V in Rn containing a and b, respectively, such that
(i) f : U → V is 1-1 and onto;
(ii) g = f⁻¹ : V → U given by g(f(x)) = x, ∀ x ∈ U, is a C1 map.

The above theorem tells us that y = f(x) can be uniquely solved for x for y in a neighborhood of b. Further, the inverse map is also smooth.

We will not present a proof of the above theorem, but it is based on the contraction mapping theorem from functional analysis.

Theorem 3.2. Let φ : Rn → Rn be a contraction map, that is, ∃ 0 ≤ α < 1 such that |φ(x) − φ(y)| ≤ α|x − y| for all x, y ∈ Rn. Then ∃ a unique solution to the problem φ(x) = x.

Corollary 3.3. The above theorem is true for any complete metric space X in place of Rn.

Remark 3.4. The proof is beautiful and constructive. Take an arbitrary point x0 ∈ Rn. Construct inductively xn+1 = φ(xn), n = 0, 1, 2, ··· . Then one can prove that xn → x and φ(x) = x.
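The constructive iteration of Remark 3.4 is easy to run. A classical one-dimensional illustration (our choice of φ) is φ(x) = cos x, which is a contraction on [−1, 1] since |φ′(x)| = |sin x| ≤ sin 1 < 1 there:

```python
import math

# Fixed-point iteration x_{n+1} = phi(x_n) for phi = cos,
# converging to the unique solution of cos(x) = x.
phi = math.cos
x = 0.0                           # arbitrary starting point
for _ in range(100):
    x = phi(x)
assert abs(phi(x) - x) < 1e-12    # x is (numerically) the fixed point
print(x)                          # approx 0.7390851332151607
```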

Proof. (Inverse Function Theorem; sketch): Given that A = f′(a) is invertible. Since f′ is continuous, given ε > 0, ∃ U ⊂ E such that ‖f′(x) − A‖ ≤ ε ∀ x ∈ U. Now define, for y ∈ Rn,

φ(x) = x + A⁻¹(y − f(x)).

Then f(x) = y is solvable if and only if φ(x) = x has a solution.
Step 1: Show that φ is a contraction to get the solution.
Step 2: Prove that the inverse map thus obtained is C1. □
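The contraction φ in the proof sketch can be iterated numerically to invert a nonlinear map near a point. A Python sketch with a sample map of our choosing (f(x1, x2) = (x1 + x2², x2 + x1²), a = (0, 0), so that A = f′(a) is the identity):

```python
import numpy as np

# Solve f(x) = y near a by iterating phi(x) = x + A^{-1}(y - f(x)),
# where A = f'(a), as in the proof sketch of the Inverse Function Theorem.
f = lambda x: np.array([x[0] + x[1]**2, x[1] + x[0]**2])
a = np.zeros(2)
A = np.eye(2)                     # f'(a) = identity for this map
Ainv = np.linalg.inv(A)

y = np.array([0.10, 0.05])        # target value close to b = f(a) = 0
x = a.copy()
for _ in range(50):
    x = x + Ainv @ (y - f(x))     # the contraction phi
assert np.allclose(f(x), y, atol=1e-12)   # f(x) = y solved
```

Near a, ‖I − f′(x)‖ is small, so φ is indeed a contraction and the iterates converge rapidly.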

Corollary 3.5. Let f : E ⊂ Rn → Rn be C1 with det f′(x) ≠ 0, ∀ x ∈ E; then f is an open map. The matrix of f′(x) is also known as the Jacobian matrix.

4. Implicit Function Theorem

Quite often, we cannot expect to get equations in explicit form y = f(x); as in x² + y² − 1 = 0, we may only get a relation connecting the variables x and y. Let f : E ⊂ Rn × Rm → Rn be a C1 map. We wish to solve for x in terms of y from the system of equations

(4.1) f(x, y) = 0.

This is a system of n equations in n + m variables:

(4.2)  f1(x1, ··· , xn, y1, ··· , ym) = 0
       ⋮
       fn(x1, ··· , xn, y1, ··· , ym) = 0

Linear System: If the fi's are linear, then ∃ an n × n matrix A and an n × m matrix B so that (4.2) reduces to

(4.3) Ax + By = 0.

If A is invertible, then x can be solved as x = −A⁻¹By.

Let T ∈ L(Rn+m, Rn) be a linear operator on Rn+m. Indeed, T can be represented as an n × (n + m) matrix like [A, B]. For (h, k) ∈ Rn+m, we write (h, k) = (h, 0) + (0, k), h ∈ Rn, k ∈ Rm, and by linearity of T, we get

T(h, k) = T(h, 0) + T(0, k).

Thus, define Tx : Rn → Rn and Ty : Rm → Rn as

Txh = T(h, 0), Tyk = T(0, k)

and

T(h, k) = Txh + Tyk.

That is, we have

T = Tx + Ty

with Tx ∈ L(Rn, Rn) and Ty ∈ L(Rm, Rn).

Theorem 4.1 (Implicit Function Theorem (Linear version)). Assume T ∈ L(Rn+m, Rn) and Tx is invertible. Then for any k ∈ Rm, ∃! h ∈ Rn such that T(h, k) = 0, and the solution is given by h = −Tx⁻¹Ty(k).

Theorem 4.2 (Implicit Function Theorem (Non-linear version)). Let f : E ⊂ Rn+m → Rn be a C1 map such that f(a, b) = 0 for some (a, b) ∈ E. Put T = f′(a, b) ∈ L(Rn+m, Rn), write T = Tx + Ty as above, and assume Tx is invertible. Then ∃ open sets U ⊂ E ⊂ Rn+m and W ⊂ Rm, with (a, b) ∈ U and b ∈ W, satisfying:
(i) for every y ∈ W, ∃! x such that (x, y) ∈ U and f(x, y) = 0;
(ii) defining g : W → Rn by g(y) = x, then g is a C1 map such that g(b) = a and f(g(y), y) = 0. Further,

g′(b) = −Tx⁻¹Ty.

Proof. (Idea): The proof is an application of the Inverse Function Theorem to F : E ⊂ Rn+m → Rn+m defined by F(x, y) = (f(x, y), y).

We will not get into the details. 

Remark 4.3. For the system (4.2), we have T = [Tx, Ty], where

Tx = [ Dx1f1  ···  Dxnf1 ]          Ty = [ Dy1f1  ···  Dymf1 ]
     [   ⋮            ⋮   ]   and        [   ⋮            ⋮   ]
     [ Dx1fn  ···  Dxnfn ]               [ Dy1fn  ···  Dymfn ]

Example 4.4. Define f : R2+3 → R2, n = 2, m = 3, by

f1(x1, x2, y1, y2, y3) = 2e^{x1} + x2y1 − 4y2 + 3

f2(x1, x2, y1, y2, y3) = x2 cos x1 − 6x1 + 2y1 − y3

Take a = (0, 1), b = (3, 2, 7); then f(a, b) = 0. Compute T = (Tx, Ty), where

Tx = [  2   3 ]          Ty = [ 1  −4   0 ]
     [ −6   1 ]   and         [ 2   0  −1 ]

Clearly Tx is invertible and

Tx⁻¹ = (1/20) [ 1  −3 ]
              [ 6   2 ].

Hence one can solve for x = g(y) for y in a neighborhood of b.
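A Python/NumPy check of this example with a = (0, 1), b = (3, 2, 7) (these values satisfy f(a, b) = 0), verifying the invertibility of Tx and using g′(b) = −Tx⁻¹Ty from Theorem 4.2 to predict g(y) to first order for a small increment in y (the increment k below is our choice):

```python
import numpy as np

def f(x, y):
    # the map of Example 4.4: f : R^{2+3} -> R^2
    x1, x2 = x
    y1, y2, y3 = y
    return np.array([2 * np.exp(x1) + x2 * y1 - 4 * y2 + 3,
                     x2 * np.cos(x1) - 6 * x1 + 2 * y1 - y3])

a = np.array([0.0, 1.0])
b = np.array([3.0, 2.0, 7.0])
assert np.allclose(f(a, b), 0.0)          # (a, b) solves f = 0

Tx = np.array([[2.0, 3.0], [-6.0, 1.0]])
Ty = np.array([[1.0, -4.0, 0.0], [2.0, 0.0, -1.0]])
assert abs(np.linalg.det(Tx) - 20.0) < 1e-12   # Tx is invertible

# First-order prediction of g(y) near b via g'(b) = -Tx^{-1} Ty:
k = np.array([0.01, -0.02, 0.005])        # small increment in y
x_lin = a + (-np.linalg.inv(Tx) @ Ty) @ k
assert np.linalg.norm(f(x_lin, b + k)) < 1e-3   # f approx 0 to first order
```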
