
QE Optimization, WS 2016/17 Part 2. Differential Calculus for Functions of n Variables

(about 5 Lectures)

Supporting Literature: Angel de la Fuente, "Mathematical Methods and Models for Economists", Chapter 2

C. Simon, L. Blume, "Mathematics for Economists"

Contents

2 Differential Calculus for Functions of n Variables
  2.1 Partial Derivatives
  2.2 Directional Derivatives
  2.3 Higher Order Partials
  2.4 Total Differentiability
  2.5 Chain Rule
  2.6 Taylor's Formula
  2.7 Implicit Functions
  2.8 Inverse Functions
  2.9 Unconstrained Optimization
  2.10 First-Order Conditions
  2.11 Second-Order Conditions
  2.12 A Rough Guide: How to Find the Global Maxima/Minima
  2.13 Envelope Theorems
  2.14 Gâteaux and Fréchet Differentials

In this chapter we consider functions

$$f : \mathbb{R}^n \to \mathbb{R}$$

of $n \ge 1$ variables (multivariate functions). Such functions are the basic building block of formal economic models.

2 Differential Calculus for Functions of n Variables

2.1 Partial Derivatives

Everywhere below, $U \subseteq \mathbb{R}^n$ will be an open set in the space $(\mathbb{R}^n, \|\cdot\|)$ (with the Euclidean norm $\|\cdot\|$) and $f : U \to \mathbb{R}$,

$$U \ni (x_1, \dots, x_n) \mapsto f(x_1, \dots, x_n) \in \mathbb{R}.$$

Definition 2.1.1. The function $f$ is partially differentiable with respect to the $i$-th coordinate (or variable) $x_i$, at a given point $x \in U$, if the following limit exists:

$$D_i f(x) := \lim_{h \to 0} \frac{f(x + h e_i) - f(x)}{h} = \lim_{h \to 0} \frac{1}{h}\left[ f(x_1, \dots, x_{i-1}, x_i + h, x_{i+1}, \dots, x_n) - f(x_1, \dots, x_n) \right],$$

where $e_i := (0, \dots, 0, 1, 0, \dots, 0)$ (with the $1$ in the $i$-th position) is the $i$-th basis vector in $\mathbb{R}^n$.

Since $U$ is open, there exists an open ball $B_\varepsilon(x) \subseteq U$. In the definition of $\lim_{h \to 0}$ one considers only "small" $h$ with $|h| < \varepsilon$.

$D_i f(x)$ is called the $i$-th partial derivative of $f$ at the point $x$.

Notation: We also write $D_{x_i} f(x)$, $\partial_i f(x)$, $\partial f(x)/\partial x_i$.

The partial derivative $D_i f(x)$ can be interpreted as a usual derivative w.r.t. the $i$-th coordinate, whereby all the other $n-1$ coordinates are kept fixed. Namely, in the $\varepsilon$-neighbourhood of $x_i$, let us define a function

$$(x_i - \varepsilon, x_i + \varepsilon) \ni \xi \mapsto g_i(\xi) := f(x_1, \dots, x_{i-1}, \xi, x_{i+1}, \dots, x_n).$$

Then by Definition 2.1.1,

$$D_i f(x) := \lim_{h \to 0} \frac{g_i(x_i + h) - g_i(x_i)}{h} = g_i'(x_i).$$
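This one-variable interpretation translates directly into a numerical check. A minimal sketch in plain Python: it approximates $D_i f(x)$ by a central difference quotient, perturbing only the $i$-th coordinate. The test function and the step size $h$ are illustrative choices, not part of the lecture notes.

```python
def partial_derivative(f, x, i, h=1e-6):
    """Central-difference approximation of D_i f(x):
    perturb only the i-th coordinate, keep the others fixed."""
    x_plus, x_minus = list(x), list(x)
    x_plus[i] += h
    x_minus[i] -= h
    return (f(x_plus) - f(x_minus)) / (2 * h)

# Illustrative test function f(x1, x2) = x1^2 * x2, with D_1 f = 2*x1*x2.
f = lambda x: x[0] ** 2 * x[1]
print(partial_derivative(f, [1.0, 3.0], 0))  # ~ 6.0 = 2*1*3
```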

Definition 2.1.2. A function $f : U \to \mathbb{R}$ is called partially differentiable if $D_i f(x)$ exists for all $x \in U$ and all $1 \le i \le n$. Furthermore, $f$ is called continuously partially differentiable if all partial derivatives

$$D_i f : U \to \mathbb{R}, \quad 1 \le i \le n,$$

are continuous functions.

Example 2.1.3.

(i) Distance function

$$r(x) := \|x\| = \sqrt{x_1^2 + \dots + x_n^2}, \quad x \in \mathbb{R}^n.$$

Let us show that $r(x)$ is partially differentiable at all points $x \in \mathbb{R}^n \setminus \{0\}$. Fix $x$ and set

$$\xi \mapsto g_i(\xi) := \sqrt{x_1^2 + \dots + \xi^2 + \dots + x_n^2} \in \mathbb{R}.$$

Using the chain rule for derivatives of real-valued functions (cf. standard courses in Calculus), we get

$$\frac{\partial r}{\partial x_i}(x) = \frac{1}{2} \cdot \frac{2 x_i}{\sqrt{x_1^2 + \dots + x_n^2}} = \frac{x_i}{r(x)}.$$

Generalization: Let $f : \mathbb{R}_+ \to \mathbb{R}$ be differentiable; then $\mathbb{R}^n \ni x \mapsto f(r(x))$ is partially differentiable at all points $x \in \mathbb{R}^n \setminus \{0\}$ and

$$\frac{\partial}{\partial x_i} f(r) = f'(r) \cdot \frac{\partial r}{\partial x_i} = f'(r) \cdot \frac{x_i}{r}.$$

(ii) Cobb–Douglas production function with $n$ inputs:

$$f(x) := x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n} \quad \text{for } \alpha_i > 0,\ 1 \le i \le n,$$

defined on

$$U := \{(x_1, \dots, x_n) \mid x_i > 0,\ 1 \le i \le n\}.$$

Calculate the so-called marginal-product function of input $i$:

$$\frac{\partial f}{\partial x_i}(x) = \alpha_i x_1^{\alpha_1} \cdots x_i^{\alpha_i - 1} \cdots x_n^{\alpha_n} = \alpha_i \frac{f(x)}{x_i}.$$

Mathematicians would say: multiplicative functions with separable variables; polynomials. Economists are especially interested in the case $\alpha_i \in (0, 1)$. This is an example of a homogeneous function of order (degree)

$$a = \alpha_1 + \dots + \alpha_n,$$

which means

$$f(\lambda x) = \lambda^a f(x), \quad \forall \lambda > 0,\ x \in U.$$

Moreover, the Cobb–Douglas function is log-linear:

$$\log f(x) = \alpha_1 \log x_1 + \dots + \alpha_n \log x_n.$$
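As a sanity check, the identity $\partial f/\partial x_i = \alpha_i f(x)/x_i$ and the homogeneity $f(\lambda x) = \lambda^a f(x)$ are easy to confirm numerically. A minimal sketch in plain Python; the exponents $\alpha = (0.3, 0.5)$ and the evaluation point are illustrative:

```python
alpha = [0.3, 0.5]                     # illustrative exponents
f = lambda x: x[0] ** alpha[0] * x[1] ** alpha[1]

x, h = [2.0, 5.0], 1e-6
# Central difference for df/dx1 vs. the closed form alpha_1 * f(x) / x1.
num = (f([x[0] + h, x[1]]) - f([x[0] - h, x[1]])) / (2 * h)
print(num, alpha[0] * f(x) / x[0])     # both ~ 0.4129

# Homogeneity of degree a = alpha_1 + alpha_2: f(t*x) = t^a * f(x).
t, a = 3.0, sum(alpha)
print(f([t * x[0], t * x[1]]), t ** a * f(x))   # equal values
```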

(iii) Quasilinear utility function:

$$f(m, x) := m + u(x)$$

with $m \in \mathbb{R}_+$ (i.e., $m \ge 0$) and some differentiable $u : \mathbb{R} \to \mathbb{R}$. Then

$$\frac{\partial f}{\partial m} = 1, \qquad \frac{\partial f}{\partial x} = u'(x).$$

(iv) Constant elasticity of substitution (CES) production function with $n$ inputs, which also describes aggregate consumption for $n$ types of goods:

$$f(x_1, \dots, x_n) := (\delta_1 x_1^\alpha + \dots + \delta_n x_n^\alpha)^{1/\alpha},$$

with $\alpha > 0$, $\delta_i > 0$ and $\sum_{1 \le i \le n} \delta_i = 1$, defined on the open domain

$$U := \{(x_1, \dots, x_n) \mid x_i > 0,\ 1 \le i \le n\}.$$

We calculate the marginal-product function:

$$\frac{\partial f}{\partial x_i}(x) = \frac{1}{\alpha} (\delta_1 x_1^\alpha + \dots + \delta_n x_n^\alpha)^{\frac{1}{\alpha} - 1} \cdot \alpha \delta_i x_i^{\alpha - 1} = \delta_i x_i^{\alpha - 1} (\delta_1 x_1^\alpha + \dots + \delta_n x_n^\alpha)^{\frac{1 - \alpha}{\alpha}}.$$

Note that $f$ is homogeneous of degree one: $f(\lambda x) = \lambda f(x)$.
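The same kind of numerical check works for the CES marginal product. A sketch in plain Python; $\alpha = 0.5$ and $\delta = (0.4, 0.6)$ are illustrative parameter values:

```python
alpha, delta = 0.5, [0.4, 0.6]         # illustrative CES parameters
f = lambda x: (delta[0] * x[0] ** alpha + delta[1] * x[1] ** alpha) ** (1 / alpha)

x, h = [2.0, 3.0], 1e-6
num = (f([x[0] + h, x[1]]) - f([x[0] - h, x[1]])) / (2 * h)
inner = delta[0] * x[0] ** alpha + delta[1] * x[1] ** alpha
closed = delta[0] * x[0] ** (alpha - 1) * inner ** ((1 - alpha) / alpha)
print(num, closed)                      # the two values agree to ~1e-9
```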

Definition 2.1.4. Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$ be partially differentiable. Then the vector

$$\nabla f(x) := \operatorname{grad} f(x) := \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right) \in \mathbb{R}^n$$

is called the gradient of $f$ at the point $x \in U$.

Example 2.1.5.

(i) Distance function $r(x)$:

$$\operatorname{grad} r(x) = \frac{x}{r(x)} \in \mathbb{R}^n, \quad x \in U := \mathbb{R}^n \setminus \{0\}.$$

(ii) Let $f, g : U \to \mathbb{R}$ be partially differentiable. Then

$$\nabla (f \cdot g) = f \cdot \nabla g + g \cdot \nabla f.$$

Proof. This follows from the product rule

$$\frac{\partial}{\partial x_i}(fg) = f \frac{\partial g}{\partial x_i} + g \frac{\partial f}{\partial x_i}.$$

2.2 Directional Derivatives

Fix a directional vector $v \in \mathbb{R}^n$ with $\|v\| = 1$ (of unit length!).

Definition 2.2.1. The directional derivative of $f : U \to \mathbb{R}$ at a point $x \in U$ along the unit vector $v \in \mathbb{R}^n$ (i.e., with $\|v\| = 1$) is given by

$$\partial_v f(x) := D_v f(x) := \lim_{h \to 0} \frac{f(x + h v) - f(x)}{h}.$$

Remark 2.2.2.

(i) Define a new function

$$h \mapsto g_v(h) := f(x + h v).$$

If $g_v(h)$ is differentiable at $h = 0$, then $f$ is differentiable at the point $x \in U$ along the direction $v$, and

$$D_v f(x) = g_v'(0).$$

(ii) From the above definitions it is clear that the partial derivatives are exactly the directional derivatives along the basis vectors $e_i$, $1 \le i \le n$:

$$\frac{\partial f}{\partial x_i}(x) = D_{e_i} f(x), \quad 1 \le i \le n.$$

Example 2.2.3. Consider the "saddle" function in $\mathbb{R}^2$,

$$f(x_1, x_2) := -x_1^2 + x_2^2,$$

and find $D_v f(x)$ along the direction $v := \left(\sqrt{2}/2, \sqrt{2}/2\right)$, $\|v\| = 1$. Define

$$g_v(h) := -\left(x_1 + h\sqrt{2}/2\right)^2 + \left(x_2 + h\sqrt{2}/2\right)^2 = -x_1^2 + x_2^2 + \sqrt{2}\, h (x_2 - x_1).$$

Then $D_v f(x) = g_v'(0) = \sqrt{2}(x_2 - x_1)$. Note that $D_v f(x) = 0$ if $x_1 = x_2$: along the direction $v$, the function $f$ attains its minimum on the diagonal $x_1 = x_2$.

Relation between $\nabla f(x)$ and $D_v f(x)$:

$$D_v f(x) = \langle \nabla f(x), v \rangle_{\mathbb{R}^n} = \sum_{i=1}^n \partial_i f(x) \cdot v_i. \tag{$*$}$$

The proof will be given later, as soon as we prove the chain rule for $\nabla f$.
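Formula ($*$) can be checked numerically for the saddle function above. The sketch below (plain Python; the point and step size are illustrative) compares a difference quotient along $v$ with the inner product $\langle \nabla f(x), v \rangle$:

```python
import math

f = lambda x1, x2: -x1 ** 2 + x2 ** 2
grad_f = lambda x1, x2: (-2 * x1, 2 * x2)          # exact gradient

x1, x2 = 1.0, 4.0
v = (math.sqrt(2) / 2, math.sqrt(2) / 2)            # unit direction
h = 1e-6

# Difference quotient for D_v f(x) vs. <grad f(x), v>.
dq = (f(x1 + h * v[0], x2 + h * v[1]) - f(x1, x2)) / h
ip = grad_f(x1, x2)[0] * v[0] + grad_f(x1, x2)[1] * v[1]
print(dq, ip)   # both ~ sqrt(2) * (x2 - x1) = 3*sqrt(2) = 4.2426...
```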

2.3 Higher Order Partials

Let $f : U \to \mathbb{R}$ be partially differentiable, i.e.,

$$\exists\, \frac{\partial}{\partial x_i} f : U \to \mathbb{R}, \quad 1 \le i \le n.$$

Analogously, for $1 \le j \le n$ we can define (if it exists)

$$\frac{\partial}{\partial x_j}\left( \frac{\partial}{\partial x_i} f \right) : U \to \mathbb{R}.$$

Notation:

$$\frac{\partial^2 f}{\partial x_j \partial x_i}, \quad \text{or } \frac{\partial^2 f}{\partial x_i^2} \text{ if } i = j.$$

Warning: In general,

$$\frac{\partial^2 f}{\partial x_j \partial x_i} \ne \frac{\partial^2 f}{\partial x_i \partial x_j} \quad \text{if } i \ne j.$$

Theorem 2.3.1 ((A. Schwarz); also known as Young's theorem). Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$ be twice continuously differentiable, $f \in C^2(U)$ (i.e., all derivatives $\frac{\partial^2 f}{\partial x_j \partial x_i}$, $1 \le i, j \le n$, are continuous). Then for all $x \in U$ and $1 \le i, j \le n$,

$$\frac{\partial^2 f(x)}{\partial x_j \partial x_i} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j},$$

i.e., for cross-partial derivatives, the order of differentiation is irrelevant.

Example: (i) The above theorem works:

$$f(x_1, x_2) := x_1^2 + b x_1 x_2 + x_2^2, \quad (x_1, x_2) \in \mathbb{R}^2.$$

Counterexample: (ii) The above theorem does not work:

$$f(x_1, x_2) := \begin{cases} x_1 x_2 \dfrac{x_1^2 - x_2^2}{x_1^2 + x_2^2}, & (x_1, x_2) \in \mathbb{R}^2 \setminus \{(0,0)\}, \\ 0, & (x_1, x_2) = (0, 0). \end{cases}$$

We calculate

$$\frac{\partial^2 f}{\partial x_1 \partial x_2}(0, 0) = -1 \ne 1 = \frac{\partial^2 f}{\partial x_2 \partial x_1}(0, 0).$$

Reason: $f \notin C^2(U)$.
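The asymmetry of the mixed partials at the origin can also be seen numerically: approximate one first partial by a difference quotient, then differentiate it in the other variable. A sketch in plain Python; the two step sizes are illustrative, with the inner step much smaller than the outer one so the nested quotient stays accurate:

```python
def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

h1, h2 = 1e-7, 1e-3   # inner step << outer step
fx = lambda y: (f(h1, y) - f(-h1, y)) / (2 * h1)   # ~ df/dx1 at (0, y)
fy = lambda x: (f(x, h1) - f(x, -h1)) / (2 * h1)   # ~ df/dx2 at (x, 0)

# Differentiate once more in the other variable, at the origin:
print((fx(h2) - fx(-h2)) / (2 * h2))   # ~ -1
print((fy(h2) - fy(-h2)) / (2 * h2))   # ~ +1
```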

Notation:

$$D_{i_k} \cdots D_{i_1} f, \qquad \frac{\partial^k f}{\partial x_{i_k} \cdots \partial x_{i_1}},$$

for any $i_1, \dots, i_k \in \{1, \dots, n\}$.

In general, for any $v \in \mathbb{R}^n$ with $\|v\| = 1$, we have by ($*$)

$$|D_v f(x)| \le \|\nabla f(x)\|_{\mathbb{R}^n}.$$

Geometrical interpretation of $\nabla f$: Define the normalized vector

$$v := \frac{\nabla f(x)}{\|\nabla f(x)\|_{\mathbb{R}^n}} \in \mathbb{R}^n.$$

Then, for this $v$,

$$D_v f(x) = \langle \nabla f(x), v \rangle_{\mathbb{R}^n} = \|\nabla f(x)\|_{\mathbb{R}^n}.$$

In other words, the gradient $\nabla f(x)$ of $f$ at the point $x$ points in the direction in which the slope of $f$ is largest in absolute value.

2.4 Total Differentiability

Intuition: Repetition of the one-dimensional case.

Definition 2.4.1. A function $g : \mathbb{R} \to \mathbb{R}$ is differentiable at a point $x \in \mathbb{R}$ if the following limit exists:

$$\lim_{h \to 0} \frac{g(x+h) - g(x)}{h} =: g'(x) \in \mathbb{R}. \tag{$*$}$$

Geometrical picture: Locally, i.e., for small $|h| \to 0$, we can approximate the values of $g(x+h)$ by the linear function $g(x) + ah$ with $a := g'(x) \in \mathbb{R}$. Indeed, the limit ($*$) can be rewritten as

$$\lim_{h \to 0} \frac{g(x+h) - [g(x) + ah]}{h} = 0.$$

The approximation error $E_g(h)$ equals

$$E_g(h) := g(x+h) - [g(x) + ah] \in \mathbb{R},$$

and it goes to zero with $h$:

$$\lim_{h \to 0} \frac{E_g(h)}{h} = 0, \quad \text{i.e.,} \quad \lim_{h \to 0} \frac{|E_g(h)|}{|h|} = 0.$$

The latter can be written as

$$E_g(h) = o(h) \quad \text{and} \quad g(x+h) \sim g(x) + ah, \quad \text{as } h \to 0.$$

Summary: $g : \mathbb{R} \to \mathbb{R}$ is differentiable at $x \in \mathbb{R}$ if, for points $x+h$ sufficiently close to $x$, the values $g(x+h)$ admit a "nice" approximation by the linear function $g(x) + ah$, with an error

$$E_g(h) := g(x+h) - g(x) - ah$$

that goes to zero "faster" than $h$ itself, i.e.,

$$\lim_{h \to 0} \frac{|E_g(h)|}{|h|} = 0.$$

Now we extend the notion of differentiability to functions $f : \mathbb{R}^n \to \mathbb{R}^m$, for arbitrary $n, m \ge 1$:

$$\mathbb{R}^n \ni x = (x_1, \dots, x_n) \mapsto f(x) = \begin{pmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{pmatrix} = \begin{pmatrix} f_1(x_1, \dots, x_n) \\ \vdots \\ f_m(x_1, \dots, x_n) \end{pmatrix} \in \mathbb{R}^m.$$

Definition 2.4.2. Let $U \subseteq \mathbb{R}^n$ be open, and let $f : U \to \mathbb{R}^m$. The function $f$ is (totally) differentiable at a point $x \in U$ if there exists a linear mapping

$$A : \mathbb{R}^n \to \mathbb{R}^m$$

such that in some neighbourhood of $x$ (i.e., for small enough $h \in \mathbb{R}^n$ with $\|h\| < \varepsilon$) there is a representation

$$f(x+h) = f(x) + Ah + E_f(h), \tag{$**$}$$

where the error term

$$E_f(h) := f(x+h) - f(x) - Ah \in \mathbb{R}^m$$

obeys

$$\lim_{h \to 0} \frac{\|E_f(h)\|_{\mathbb{R}^m}}{\|h\|_{\mathbb{R}^n}} = 0.$$

The derivative $Df(x)$ of $f$ at the point $x$ is the matrix $A$.

Remark 2.4.3.

(i) Each linear map $A : \mathbb{R}^n \to \mathbb{R}^m$ can be represented by the $m \times n$ matrix (with $m$ rows and $n$ columns)

$$(a_{ij})_{\substack{1 \le i \le m \\ 1 \le j \le n}} = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix},$$

which describes the action of the linear map $A$ on the canonical basis $(e_j)_{1 \le j \le n}$ in $\mathbb{R}^n$, $e_j = (0, \dots, 0, 1, 0, \dots, 0)^t$ (with the $1$ in the $j$-th position; a vertical column, or $n \times 1$ matrix):

$$A e_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix} \in \mathbb{R}^m, \quad 1 \le j \le n.$$

Below we always identify the linear mapping $A : \mathbb{R}^n \to \mathbb{R}^m$ with this matrix, which acts as

$$Ah := \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{pmatrix} = \begin{pmatrix} a_{11} h_1 + \dots + a_{1n} h_n \\ a_{21} h_1 + \dots + a_{2n} h_n \\ \vdots \\ a_{m1} h_1 + \dots + a_{mn} h_n \end{pmatrix} \in \mathbb{R}^m,$$

whereby the vector $h = (h_1, \dots, h_n) \in \mathbb{R}^n$ is considered as an $n \times 1$ matrix. The identity ($**$) can be rewritten in coordinate form as

$$f_i(x+h) = f_i(x) + \sum_{j=1}^n a_{ij} h_j + E_i(h), \quad i = 1, \dots, m,$$

with

$$\lim_{h \to 0} \frac{|E_i(h)|}{\|h\|_{\mathbb{R}^n}} = 0.$$

It is obvious that the vector-valued function $f : U \to \mathbb{R}^m$ is differentiable at a point $x \in U$ if and only if all coordinate mappings $f_i : U \to \mathbb{R}$, $1 \le i \le m$, are differentiable.

(ii) Symbolically we write

$$E_f(h) = o(\|h\|_{\mathbb{R}^n}), \quad \text{as } h \to 0.$$

(iii) Let $f : \mathbb{R}^n \to \mathbb{R}$ (i.e., $m = 1$). Then

$$A = (a_1, a_2, \dots, a_n) = (a_j)_{j=1}^n \quad (1 \times n \text{ matrix})$$

and

$$f(x+h) = f(x) + \sum_{j=1}^n a_j h_j + E_f(h),$$

where $E_f(h) \in \mathbb{R}$ is such that

$$\lim_{h \to 0} \frac{|E_f(h)|}{\|h\|} = 0.$$

Theorem 2.4.4. Let $f : U \to \mathbb{R}^m$ be differentiable at a point $x \in U$, i.e.,

$$f(x+h) = f(x) + Ah + o(\|h\|_{\mathbb{R}^n})$$

with a matrix

$$A = (a_{ij})_{\substack{1 \le i \le m \\ 1 \le j \le n}}.$$

Then:

(i) $f$ is continuous at $x$.

(ii) All components $f_i : U \to \mathbb{R}$, $1 \le i \le m$, are partially differentiable at the point $x$ and

$$\frac{\partial f_i(x)}{\partial x_j} = a_{ij}, \quad 1 \le j \le n.$$

In other words, the derivative $Df(x)$ of $f$ at $x$ is the matrix of first partial derivatives $\frac{\partial f_i(x)}{\partial x_j}$ of the component functions $f_i$:

$$Df(x) = \begin{pmatrix} \frac{\partial f_1(x)}{\partial x_1} & \frac{\partial f_1(x)}{\partial x_2} & \dots & \frac{\partial f_1(x)}{\partial x_n} \\ \frac{\partial f_2(x)}{\partial x_1} & \frac{\partial f_2(x)}{\partial x_2} & \dots & \frac{\partial f_2(x)}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial f_m(x)}{\partial x_1} & \frac{\partial f_m(x)}{\partial x_2} & \dots & \frac{\partial f_m(x)}{\partial x_n} \end{pmatrix}.$$

Such a matrix is called the Jacobian matrix of the function $f$. Notation:

$$Df(x) = \frac{\partial(f_1, \dots, f_m)}{\partial(x_1, \dots, x_n)}(x) = \left( \frac{\partial f_i}{\partial x_j}(x) \right)_{\substack{1 \le i \le m \\ 1 \le j \le n}}.$$

Proof of Theorem 2.4.4.

(i) We have

$$f(x+h) = f(x) + Ah + o(\|h\|), \quad \text{as } h \to 0.$$

Since $\lim_{h \to 0} Ah = 0$ and $\lim_{h \to 0} o(\|h\|) = 0$, finally

$$\lim_{h \to 0} f(x+h) = f(x).$$

(ii) For each $1 \le i \le m$,

$$f_i(x+h) = f_i(x) + \sum_{j=1}^n a_{ij} h_j + E_i(h), \quad \text{with } E_i(h) = o(\|h\|) \text{ as } h \to 0.$$

Hence for

$$h := t e_j \in \mathbb{R}^n, \quad \|h\| = |t|, \quad t \in \mathbb{R},\ 1 \le j \le n,$$

with $e_j = (0, \dots, 0, 1, 0, \dots, 0)$ (the $1$ in the $j$-th position) being the canonical basis vector in $\mathbb{R}^n$, it holds that

$$f_i(x + t e_j) = f_i(x) + t a_{ij} + E_i(t e_j),$$

$$\frac{\partial f_i}{\partial x_j}(x) := \lim_{t \to 0} \frac{f_i(x + t e_j) - f_i(x)}{t} = a_{ij} + \lim_{t \to 0} \frac{E_i(t e_j)}{t} = a_{ij}.$$

Warning: The converse statement is not true! Partial differentiability alone does not imply total differentiability. However, continuity of all $x \mapsto \frac{\partial f_i}{\partial x_j}(x)$ is sufficient to guarantee total differentiability (cf. Theorem 2.4.5 below). For functions of one real variable, $f : U \to \mathbb{R}^m$ with $U \subseteq \mathbb{R}$ (i.e., $n = 1$), the notions of partial and total differentiability coincide. So total differentiability is a new concept only in the multidimensional case $n > 1$.

Theorem 2.4.5 (without proof here). Let $U \subseteq \mathbb{R}^n$ be open, and let $f : U \to \mathbb{R}^m$ be partially differentiable. If all partial derivatives

$$\frac{\partial f_i}{\partial x_j}, \quad 1 \le i \le m,\ 1 \le j \le n,$$

are continuous at the point $x \in U$, then $f$ is (totally) differentiable at $x$.

We summarize: For $f : U \to \mathbb{R}^m$ the following implications hold:

continuously partially differentiable $\Longrightarrow$ totally differentiable $\Longrightarrow$ partially differentiable.

Example 2.4.6. Let $C := (c_{ij})_{1 \le i,j \le n}$ be a symmetric $n \times n$ matrix, i.e.,

$$c_{ij} = c_{ji} \quad \text{for all } i, j,$$

and let

$$f(x) := \langle Cx, x \rangle_{\mathbb{R}^n} := \sum_{i,j=1}^n c_{ij} x_i x_j, \qquad f : \mathbb{R}^n \to \mathbb{R},$$

be the corresponding quadratic form. Then

$$f(x+h) = \langle C(x+h), x+h \rangle_{\mathbb{R}^n} = \langle Cx, x \rangle + \langle Cx, h \rangle + \langle Ch, x \rangle + \langle Ch, h \rangle = f(x) + 2\langle Cx, h \rangle + \langle Ch, h \rangle = f(x) + \langle a, h \rangle + E(h),$$

with

$$a = 2Cx, \qquad E(h) = \langle Ch, h \rangle_{\mathbb{R}^n}, \qquad |E(h)| \le \|C\| \cdot \|h\|_{\mathbb{R}^n}^2,$$

$$\|C\| := \|C\|_{\mathbb{R}^n \to \mathbb{R}^n} := \max_{1 \le i \le n} \Big( \sum_{1 \le j \le n} c_{ij}^2 \Big)^{1/2}.$$

Since

$$\lim_{h \to 0} \frac{|E_f(h)|}{\|h\|} = 0,$$

we conclude that

$$\exists\, Df(x) = 2Cx \in \mathbb{R}^n.$$

Alternatively, we can calculate the partial derivatives

$$\frac{\partial f}{\partial x_j}(x) = 2 \sum_{i=1}^n c_{ij} x_i = 2 \sum_{i=1}^n c_{ji} x_i = 2 (Cx)_j \in \mathbb{R},$$

which are continuous functions of $x$. So, by Theorem 2.4.5,

$$\exists\, Df(x) = 2Cx = 2\big((Cx)_j\big)_{j=1}^n \in \mathbb{R}^n \quad (1 \times n \text{ matrix}).$$

Remark 2.4.7 (Remark to Theorem 2.4.5). Partially differentiable functions need not be continuous! The reason is that partial derivatives only control limits along the coordinate axes, but not along arbitrary sequences $(x_k)_{k \ge 1} \subset U$ converging to a given point $x \in U$.
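Returning to Example 2.4.6: the identity $Df(x) = 2Cx$ invites a quick finite-difference check. A minimal sketch assuming numpy is available; the matrix size and random seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))
C = (B + B.T) / 2                       # a symmetric 3x3 matrix
f = lambda x: x @ C @ x                 # quadratic form <Cx, x>

x, h = rng.normal(size=3), 1e-6
grad_fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(3)])
print(np.allclose(grad_fd, 2 * C @ x))  # True: Df(x) = 2Cx
```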

Exercise 2.4.8. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by

$$f(x, y) = \begin{cases} \dfrac{y}{x^2}\, e^{-|y|/x^2}, & x \ne 0, \\ 0, & x = 0. \end{cases}$$

Show that:

(i) $f$ is continuous on every line drawn through $(0, 0)$;

(ii) $f$ is not continuous at $(0, 0)$. (Hint: Consider $y_k := c x_k^2$ with $x_k \to 0$ as $k \to \infty$.)

2.5 Chain Rule

Theorem 2.5.1 (Chain Rule, without proof). Let us be given two functions,

$$f : U \to \mathbb{R}^m \quad \text{and} \quad g : V \to \mathbb{R}^p,$$

where $U \subseteq \mathbb{R}^n$, $V \subseteq \mathbb{R}^m$ are open and $f(U) \subseteq V$. Suppose that $f$ is differentiable at some $x \in U$ and $g$ is differentiable at $y := f(x)$. Then the composite function

$$h := g \circ f : U \to \mathbb{R}^p$$

is differentiable at $x$, and its derivative is given by (via matrix multiplication)

$$Dh(x) = \underbrace{Dg(f(x))}_{p \times m}\, \underbrace{Df(x)}_{m \times n} \quad (p \times n \text{ matrix}). \tag{$*$}$$

Idea of the proof. For any $x, \tilde{x} \in U$,

$$h(x) - h(\tilde{x}) = g(f(x)) - g(f(\tilde{x}));$$

$g$ differentiable $\Rightarrow h(x) - h(\tilde{x}) \sim Dg(f(x))\, (f(x) - f(\tilde{x}))$, as $f(\tilde{x}) \to f(x)$;

$f$ differentiable $\Rightarrow h(x) - h(\tilde{x}) \sim Dg(f(x))\, Df(x)\, (x - \tilde{x})$, as $\tilde{x} \to x$.

A rigorous proof should take into account the error terms.

In ($*$) we have the product of two matrices: Let $B$ be a $p \times m$ matrix and $A$ be an $m \times n$ matrix,

$$A = (a_{ij})_{\substack{1 \le i \le m \\ 1 \le j \le n}} = \begin{pmatrix} a_{11} & \dots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \dots & a_{mn} \end{pmatrix} \quad (m \times n), \qquad B = (b_{ki})_{\substack{1 \le k \le p \\ 1 \le i \le m}} = \begin{pmatrix} b_{11} & \dots & b_{1m} \\ \vdots & & \vdots \\ b_{p1} & \dots & b_{pm} \end{pmatrix} \quad (p \times m).$$

Then their product $C := BA$ is a $p \times n$ matrix, $C = (c_{kj})_{\substack{1 \le k \le p \\ 1 \le j \le n}}$, with the entries

$$c_{kj} := \sum_{i=1}^m b_{ki} \cdot a_{ij}, \quad 1 \le k \le p,\ 1 \le j \le n.$$

Typical applications of the Chain Rule

(i) Let $f : \mathbb{R} \to \mathbb{R}^n$ and $g : \mathbb{R}^n \to \mathbb{R}$; we define $h := g \circ f : \mathbb{R} \to \mathbb{R}$:

$$\mathbb{R} \ni t \mapsto \begin{pmatrix} f_1(t) \\ \vdots \\ f_n(t) \end{pmatrix} =: x \in \mathbb{R}^n, \qquad \mathbb{R}^n \ni x = (x_1, \dots, x_n) \mapsto g(x_1, \dots, x_n) \in \mathbb{R},$$

$$\mathbb{R} \ni t \mapsto h(t) := g(f_1(t), \dots, f_n(t)).$$

Then

$$Df(t) = \begin{pmatrix} f_1'(t) \\ \vdots \\ f_n'(t) \end{pmatrix} \in \mathbb{R}^n, \qquad Dg(x) = \nabla g(x) = \left( \frac{\partial g}{\partial x_1}(x), \dots, \frac{\partial g}{\partial x_n}(x) \right) \in \mathbb{R}^n.$$

By Theorem 2.5.1,

$$h'(t) = Dg[f(t)]\, Df(t) = \left( \frac{\partial g}{\partial x_1}(f(t)), \dots, \frac{\partial g}{\partial x_n}(f(t)) \right) \times \begin{pmatrix} f_1'(t) \\ \vdots \\ f_n'(t) \end{pmatrix} = \langle \nabla_x g(f(t)), f'(t) \rangle_{\mathbb{R}^n} = \sum_{i=1}^n \frac{\partial g}{\partial x_i}(f(t)) \cdot f_i'(t) \in \mathbb{R}.$$

Example 2.5.2 (Numerical Example). Let

$$f(t) = \begin{pmatrix} t \\ t^2 \end{pmatrix} =: \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = x \in \mathbb{R}^2, \qquad g(x) = g(x_1, x_2) := x_1 - x_2^2.$$

Then

$$h(t) = g(f(t)) = t - t^4, \qquad h'(t) = 1 - 4t^3, \quad t \in \mathbb{R}.$$

On the other hand,

$$f'(t) = \begin{pmatrix} 1 \\ 2t \end{pmatrix}, \qquad \nabla g(x_1, x_2) = (1, -2x_2),$$

and hence (substituting $x_2$ by $t^2$)

$$h'(t) = (1, -2t^2) \times \begin{pmatrix} 1 \\ 2t \end{pmatrix} = 1 - 4t^3.$$
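A two-line numerical confirmation of Example 2.5.2 (plain Python; the evaluation point and step size are illustrative):

```python
g = lambda x1, x2: x1 - x2 ** 2
h = lambda t: g(t, t ** 2)                 # h = g o f with f(t) = (t, t^2)

t, eps = 1.5, 1e-6
num = (h(t + eps) - h(t - eps)) / (2 * eps)
print(num, 1 - 4 * t ** 3)                 # both ~ -12.5
```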

(ii) Applications to directional derivatives (Section 2.2 revisited)

Let $U \subseteq \mathbb{R}^n$ be open, and let $f : U \to \mathbb{R}$ be differentiable. Choose some unit vector $v \in \mathbb{R}^n$ with $\|v\| = 1$. Then the directional derivative along $v$ is defined by

$$\partial_v f(x) := \lim_{t \to 0} \frac{f(x + t v) - f(x)}{t} = \left. \frac{d f(x + t v)}{d t} \right|_{t=0}.$$

Theorem 2.5.3. Let $f : U \to \mathbb{R}$ be totally differentiable and let $v \in \mathbb{R}^n$ with $\|v\| = 1$. Then, for any $x \in U$,

$$\partial_v f(x) = \langle \nabla f(x), v \rangle_{\mathbb{R}^n} = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x) \cdot v_i.$$

Proof. By the above definition,

$$\partial_v f(x) = g_v'(t)\big|_{t=0},$$

with a scalar function

$$g_v : I \to \mathbb{R}, \quad I := (-\varepsilon, \varepsilon) \subset \mathbb{R} \quad (\text{i.e., } n = m = 1), \qquad I \ni t \mapsto g_v(t) = f(x + t v) \in \mathbb{R},$$

where $\varepsilon > 0$ is small enough such that $B_\varepsilon(x) \subset U$. But

$$g_v(t) = f(\varphi(t)),$$

where we set

$$I \ni t \mapsto \varphi(t) := x + t v \in \mathbb{R}^n, \quad \varphi(0) := x.$$

Obviously, $\varphi$ is differentiable and $\varphi'(t) = v \in \mathbb{R}^n$ for all $t \in I$. By the chain rule (Theorem 2.5.1),

$$g_v'(t) = Df(\varphi(t)) \cdot \varphi'(t) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(\varphi(t)) \cdot v_i = \langle \nabla f(\varphi(t)), v \rangle_{\mathbb{R}^n},$$

and for $t = 0$,

$$\partial_v f(x) = g_v'(0) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x) \cdot v_i = \langle \nabla f(x), v \rangle.$$

(iii) Further rules: Linearity, i.e., for any $f, g : U \to \mathbb{R}^m$,

$$D(f + g) = Df + Dg, \qquad D(\alpha f) = \alpha\, Df, \quad \alpha \in \mathbb{R}.$$

Example: Polar coordinates

$$x = \begin{pmatrix} r \cos\varphi \\ r \sin\varphi \end{pmatrix}, \quad r > 0,\ \varphi \in \mathbb{R}.$$

Let us be given a differentiable function $f : \mathbb{R}^2 \to \mathbb{R}$, $(x_1, x_2) \mapsto f(x_1, x_2) \in \mathbb{R}$. Then

$$g(r, \varphi) := f\begin{pmatrix} r \cos\varphi \\ r \sin\varphi \end{pmatrix}, \quad r > 0,\ \varphi \in \mathbb{R},$$

defines a differentiable function $g : (0, +\infty) \times \mathbb{R} \to \mathbb{R}$ with partial derivatives

$$\frac{\partial g(r, \varphi)}{\partial r} = \frac{\partial f}{\partial x_1} \cos\varphi + \frac{\partial f}{\partial x_2} \sin\varphi, \qquad \frac{\partial g(r, \varphi)}{\partial \varphi} = -\frac{\partial f}{\partial x_1}\, r \sin\varphi + \frac{\partial f}{\partial x_2}\, r \cos\varphi,$$

where the partials of $f$ are evaluated at the point $(r \cos\varphi, r \sin\varphi)$.

2.6 Taylor's Formula

Intuition: Review of the one-dimensional case. Let us recall the following:

Theorem 2.6.1 (Mean value theorem). Let $f : \mathbb{R} \to \mathbb{R}$ be a continuously differentiable function (i.e., $f \in C^1(\mathbb{R})$). Then for each $a, b \in \mathbb{R}$, $a < b$, there exists $\theta \in (a, b)$ such that

$$f(b) - f(a) = f'(\theta) \cdot (b - a). \tag{$*$}$$

Taylor's formula is a generalization of ($*$) to $(k+1)$-times differentiable functions ($k = 0, 1, 2, \dots$). As a result we get a (finite) series expansion of a function $f$ about a fixed point, up to the $(k+1)$-th Taylor remainder. The following is well known from Calculus:

Theorem 2.6.2 (Taylor's Formula). Let $f : \mathbb{R} \to \mathbb{R}$ be a $(k+1)$-times continuously differentiable function on an open interval $I \subseteq \mathbb{R}$. Then for all $x, x+h \in I$, we have the Taylor approximation of $f$:

$$f(x+h) = f(x) + \sum_{l=1}^k \frac{f^{(l)}(x)}{l!} h^l + E_{k+1}, \tag{$**$}$$

where the $(k+1)$-th error term $E_{k+1}$ can be represented by

$$E_{k+1}(x, h) = \frac{f^{(k+1)}(x + \lambda h)}{(k+1)!}\, h^{k+1}$$

for some $\lambda = \lambda(x, h) \in (0, 1)$. Recall that $l! := l \cdot (l-1) \cdots 2 \cdot 1$ and $0! := 1$.

This is the so-called Lagrange form of the remainder term $E_{k+1}$. Of course, $E_{k+1}$ and $\lambda$ depend on the point $x$, around which we write the expansion, as well as on the increment $h$. Since $\lambda \in (0, 1)$, we see that $x + \lambda h$ is some intermediate point between $x$ and $x + h$. Obviously, $\lim_{h \to 0} E_{k+1}(x, h)/h^k = 0$ and hence $E_{k+1}(x, h) = o(h^k)$, $h \to 0$.

Sometimes, Taylor's formula is written in the equivalent form

$$f(x+h) = f(x) + \sum_{l=1}^k \frac{f^{(l)}(x)}{l!} h^l + o(h^k), \quad h \to 0.$$

If $k = 0$, we just get the mean value theorem ($*$):

$$f(x+h) - f(x) = f'(x + \lambda h)\, h, \quad \lambda \in (0, 1).$$

Generalization to several variables

Theorem 2.6.3 (Multi-dimensional Taylor's Formula). Let $U \subseteq \mathbb{R}^n$ be open; let $x \in U$ and hence $B_\delta(x) \subset U$ for some $\delta > 0$. Let

$$f : U \to \mathbb{R}$$

be $(k+1)$-times continuously differentiable (i.e., $f \in C^{k+1}(U)$). Then for any $h \in \mathbb{R}^n$ with $\|h\|_{\mathbb{R}^n} < \delta$ there exists $\theta = \theta(x, h) \in (0, 1)$ such that

$$f(x+h) = \sum_{0 \le |\alpha| \le k} \frac{D^\alpha f(x)}{|\alpha|!}\, h^\alpha + E_{k+1}, \tag{$***$}$$

with

$$E_{k+1}(x, h) = \sum_{|\alpha| = k+1} \frac{D^\alpha f(x + \theta h)}{|\alpha|!}\, h^\alpha,$$

where the summation is over all (i.e., with all possible permutations) multi-indices

$$\alpha = (\alpha_1, \dots, \alpha_n) \in (\mathbb{Z}_+)^n$$

with order (degree) $|\alpha| \le k$. Multi-index notation:

$$|\alpha| := \alpha_1 + \dots + \alpha_n, \quad \alpha_i \in \mathbb{Z}_+ := \{0, 1, 2, \dots\}, \qquad k! := k \cdot (k-1) \cdots 2 \cdot 1, \quad 0! := 1,$$

$$h^\alpha = h_1^{\alpha_1} h_2^{\alpha_2} \cdots h_n^{\alpha_n}, \quad h = (h_1, h_2, \dots, h_n) \in \mathbb{R}^n,$$

$$D^\alpha f(x) = D_{x_1}^{\alpha_1} D_{x_2}^{\alpha_2} \cdots D_{x_n}^{\alpha_n} f(x) := \frac{\partial^{|\alpha|} f(x)}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}.$$

The proof will be given below.

Corollary 2.6.4. Under the above conditions,

$$f(x+h) = \sum_{0 \le |\alpha| \le k} \frac{D^\alpha f(x)}{|\alpha|!}\, h^\alpha + o(\|h\|^k), \quad h \to 0.$$

Remark 2.6.5. (i) Actually, the latter formula with $o(\|h\|^k)$ is true if we just know that $f$ is $k$-times differentiable at the point $x$. But for the Lagrange representation of the error term $E_{k+1}(x, h)$ in Theorem 2.6.3, we have to assume that $f \in C^{k+1}(U)$.

(ii) If we do not allow permutations of indices, then in Taylor's formula instead of $|\alpha|!$ we should take $\alpha_1! \cdots \alpha_n!$.

Example 2.6.6 (Particular Cases). (i) Taylor approximation of order $k = 2$ for $f \in C^2(U)$:

$$f(x+h) = f(x) + \sum_{i=1}^n \partial_i f(x) \cdot h_i + \frac{1}{2} \sum_{i,j=1}^n \partial^2_{i,j} f(x) \cdot h_i h_j + o(\|h\|^2)$$

$$= f(x) + \langle \operatorname{grad} f(x), h \rangle_{\mathbb{R}^n} + \frac{1}{2} \langle h, \operatorname{Hess} f(x) \cdot h \rangle_{\mathbb{R}^n} + o(\|h\|^2), \quad h \to 0.$$

We here use the gradient of $f$,

$$\operatorname{grad} f(x) := \nabla f(x) := Df(x) := (\partial_1 f(x), \dots, \partial_n f(x)) \in \mathbb{R}^n,$$

and the Hessian of $f$ (i.e., its matrix of second derivatives)

$$\operatorname{Hess} f(x) := D^2 f(x) := \begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \dots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \dots & \frac{\partial^2 f(x)}{\partial x_n^2} \end{pmatrix}.$$

For shorthand,

$$\operatorname{Hess} f(x) := \left( \partial_{x_i} \partial_{x_j} f(x) \right)_{\substack{1 \le i \le n \\ 1 \le j \le n}},$$

which is a symmetric $n \times n$ matrix by Theorem 2.3.1.

(ii) For $n = k = 2$ we have, in coordinate form,

$$f(x_1 + h_1, x_2 + h_2) = f(x_1, x_2) + \partial_{x_1} f(x_1, x_2)\, h_1 + \partial_{x_2} f(x_1, x_2)\, h_2 + \frac{1}{2} \partial^2_{x_1} f(x_1, x_2)\, h_1^2 + \partial^2_{x_1 x_2} f(x_1, x_2)\, h_1 h_2 + \frac{1}{2} \partial^2_{x_2} f(x_1, x_2)\, h_2^2 + o(h_1^2 + h_2^2).$$

Proof of Theorem 2.6.3. Set

$$g(t) = f(x + t h), \quad t \in I \supseteq [0, 1].$$

Applying the one-dimensional Taylor formula to the function $g$ on an open interval $I \subset \mathbb{R}$, we get, with some $\lambda = \lambda(t) \in (0, 1)$,

$$g(t) = g(0) + g'(0)\, t + \frac{1}{2} g''(0)\, t^2 + \frac{1}{6} g'''(0)\, t^3 + \dots + \frac{1}{k!} g^{(k)}(0)\, t^k + \frac{1}{(k+1)!} g^{(k+1)}(\lambda)\, t^{k+1}.$$

By the chain rule,

$$g'(t) = \langle Df(x + t h), h \rangle_{\mathbb{R}^n}, \qquad g''(t) = \langle D^2 f(x + t h)\, h, h \rangle_{\mathbb{R}^n}, \quad \dots$$

and hence $g'(0) = \langle Df(x), h \rangle_{\mathbb{R}^n}$, $g''(0) = \langle D^2 f(x)\, h, h \rangle_{\mathbb{R}^n}, \dots$ Finally, we put $t = 1$ and get the required expression for $g(1) = f(x + h)$.

Example 2.6.7. Compute the Taylor approximation of order two ($k = n = 2$) of the Cobb–Douglas function $f(x, y) = x^{1/4} y^{3/4}$ at the point $(1, 1)$.

Solution 2.6.8. In the open domain $U = \{x > 0, y > 0\} \subset \mathbb{R}^2$,

$$\frac{\partial f}{\partial x} = \frac{1}{4} x^{-3/4} y^{3/4}, \qquad \frac{\partial f}{\partial y} = \frac{3}{4} x^{1/4} y^{-1/4},$$

$$\frac{\partial^2 f}{\partial x^2} = -\frac{3}{16} x^{-7/4} y^{3/4}, \qquad \frac{\partial^2 f}{\partial y^2} = -\frac{3}{16} x^{1/4} y^{-5/4}, \qquad \frac{\partial^2 f}{\partial x \partial y} = \frac{\partial^2 f}{\partial y \partial x} = \frac{3}{16} x^{-3/4} y^{-1/4}.$$

Evaluating these derivatives at $x = y = 1$ gives

$$\frac{\partial f}{\partial x} = \frac{1}{4}, \quad \frac{\partial f}{\partial y} = \frac{3}{4}, \quad \frac{\partial^2 f}{\partial x^2} = \frac{\partial^2 f}{\partial y^2} = -\frac{3}{16}, \quad \frac{\partial^2 f}{\partial x \partial y} = \frac{3}{16}.$$

Therefore,

$$f(1 + h, 1 + g) = 1 + \frac{1}{4} h + \frac{3}{4} g - \frac{3}{32}(h^2 + g^2) + \frac{3}{16} h g + o(h^2 + g^2), \quad \text{as } h, g \to 0.$$
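A quick numerical comparison of $f$ with its quadratic Taylor polynomial at $(1, 1)$ (plain Python; the displacement $(h, g)$ is an illustrative choice):

```python
f = lambda x, y: x ** 0.25 * y ** 0.75

def taylor2(h, g):
    """Second-order Taylor polynomial of f at (1, 1)."""
    return 1 + h / 4 + 3 * g / 4 - 3 / 32 * (h ** 2 + g ** 2) + 3 / 16 * h * g

h, g = 0.05, -0.03
print(f(1 + h, 1 + g), taylor2(h, g))   # agree up to third-order terms
```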

2.7 Implicit Functions

Before, we studied explicit functions

$$f : U \to \mathbb{R}^m, \qquad \mathbb{R}^n \supseteq \underbrace{U}_{\text{open}} \ni x \mapsto f(x) =: y \in \mathbb{R}^m,$$

$$\begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} f_1(x_1, \dots, x_n) \\ \vdots \\ f_m(x_1, \dots, x_n) \end{pmatrix} \in \mathbb{R}^m.$$

This ideal situation does not always occur in economic models. Frequently, such models are described by "mixed" equations like

$$F(x, y) = 0, \qquad F : \underbrace{U_1}_{\subseteq \mathbb{R}^n} \times \underbrace{U_2}_{\subseteq \mathbb{R}^m} \to \mathbb{R}^m, \tag{$*$}$$

i.e., in coordinate form,

$$\begin{cases} F_1(x_1, \dots, x_n; y_1, \dots, y_m) = 0, \\ \quad \vdots \\ F_m(x_1, \dots, x_n; y_1, \dots, y_m) = 0, \end{cases}$$

where $x_1, \dots, x_n \in \mathbb{R}$ are called exogenous variables and $y_1, \dots, y_m \in \mathbb{R}$, respectively, endogenous variables. In particular, for $m = n = 1$ we have

$$F(x, y) = 0, \quad x, y \in \mathbb{R}.$$

As a rule, we cannot solve ($*$) by some explicit formula separating the independent variables $x_1, \dots, x_n$ on one side and $y_1, \dots, y_m$ on the other.

Interpretation: $x = (x_1, \dots, x_n)$ is a vector of parameters and $y = (y_1, \dots, y_m)$ is the output vector by which we seek to describe the model. If for each $(x_1, \dots, x_n) \in U$ the equation ($*$) determines a unique value $(y_1, \dots, y_m) \in \mathbb{R}^m$, we say that we have an implicit function

$$y = g(x) \in \mathbb{R}^m, \quad x \in U.$$

Below we study existence and differentiability properties of implicit functions.

Intuition, 1-dim case: Let $U \subseteq \mathbb{R}^2$ be open, and consider a differentiable function

$$F : U \to \mathbb{R}, \quad (x, y) \mapsto F(x, y).$$

Fix some point $(x_0, y_0) \in U$ such that $F(x_0, y_0) = 0$, and suppose (!!!) that there exists a differentiable function

$$g : I \to \mathbb{R}, \quad x \mapsto g(x), \quad g(x_0) = y_0,$$

defined on some open interval $I \ni x_0$ such that

$$(x, g(x)) \in U \quad \text{and} \quad F(x, g(x)) = 0 \quad \text{for all } x \in I.$$

Differentiating the equation $F(x, g(x)) = 0$, we get by the Chain Rule that

$$\frac{\partial}{\partial x} F(x, g(x)) + \frac{\partial}{\partial y} F(x, g(x)) \cdot g'(x) = 0, \quad x \in I.$$

Assuming that

$$\frac{\partial}{\partial y} F(x_0, y_0) \ne 0,$$

we conclude that

$$g'(x_0) = -\frac{\frac{\partial}{\partial x} F(x_0, y_0)}{\frac{\partial}{\partial y} F(x_0, y_0)}.$$

Indeed, we have the following classical theorem from Calculus.

Theorem 2.7.1 (1-dim Implicit Function Theorem, IFT). Suppose that $F(x, y)$ is a continuously differentiable function on an open domain $U \subseteq \mathbb{R}^2$ (i.e., $F \in C^1(U)$, which means that $\partial_x F, \partial_y F : U \to \mathbb{R}$ are continuous). Let a point $(x_0, y_0) \in U$ be such that $F(x_0, y_0) = 0$. If

$$\frac{\partial}{\partial y} F(x_0, y_0) \ne 0,$$

then there exist open intervals

$$I \ni x_0, \quad J \ni y_0, \quad I \times J \subseteq U,$$

and a continuously differentiable function

$$g : I \to J, \quad g(x_0) = y_0,$$

such that $F(x, g(x)) = 0$ for all $x \in I$ and

$$g'(x_0) = -\frac{\frac{\partial}{\partial x} F(x_0, y_0)}{\frac{\partial}{\partial y} F(x_0, y_0)}.$$

Furthermore, such $g$ is unique: if $(x, y) \in I \times J$ and $F(x, y) = 0$, then surely $y = g(x)$.

Remark 2.7.2. The proof of the existence of $g$ in Theorem 2.7.1 is based on the Banach Contraction Theorem (Th. 1.12) and is highly non-trivial. This is a local result, since it is stated on some (possibly very small) open intervals $I \ni x_0$, $J \ni y_0$.

Interpretation in economics, Comparative Statics: The IFT allows one to study in which direction the equilibrium $y(x)$ changes with a control variable $x$. The equilibrium is typically described by some equation $F(x, y) = 0$.

Example 2.7.3. (i) Let $I = (-a, a)$; consider the function describing an upper half-circle,

$$y := g(x) = \sqrt{a^2 - x^2}, \quad x \in I.$$

By direct calculation,

$$\exists\, g'(x) = -\frac{x}{\sqrt{a^2 - x^2}}, \quad x \in I.$$

Let us check that the IFT gives the same result. We have

$$y^2 := g^2(x) = a^2 - x^2 \iff F(x, y) := x^2 + y^2 - a^2 = 0$$

on the open domain $U := \{(x, y) \mid x \in I,\ y > 0\} \subset \mathbb{R}^2$. So, for any $(x, y) \in U$,

$$\frac{\partial F}{\partial x} = 2x, \qquad \frac{\partial F}{\partial y} = 2y \ne 0, \qquad \text{and} \qquad \exists\, g'(x) = -\frac{2x}{2\sqrt{a^2 - x^2}} = -\frac{x}{\sqrt{a^2 - x^2}}.$$

(ii) A cubic implicit function:

$$F(x, y) = x^2 - 3xy + y^3 - 7 = 0, \quad (x, y) \in \mathbb{R}^2,$$

with

$$(x_0, y_0) = (4, 3) \quad \text{and} \quad F(x_0, y_0) = 0.$$

Indeed,

$$\frac{\partial F}{\partial x} = 2x - 3y = -1 \quad \text{at } (x_0, y_0), \qquad \frac{\partial F}{\partial y} = -3x + 3y^2 = 15 \quad \text{at } (x_0, y_0).$$

Theorem 2.7.1 tells us that $F(x, y) = 0$ indeed defines $y = g(x)$ as a $C^1$ function of $x$ around the point with coordinates $x_0 = 4$ and $y_0 = 3$. Furthermore,

$$y'(x_0) = g'(x_0) = -\frac{\frac{\partial F}{\partial x}(x_0, y_0)}{\frac{\partial F}{\partial y}(x_0, y_0)} = \frac{1}{15}.$$
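The implicit slope $g'(4) = 1/15$ can be cross-checked numerically by solving $F(x, y) = 0$ for $y$ near $x_0 = 4$. A sketch in plain Python using a simple Newton iteration in $y$ (the iteration count and step size are illustrative choices):

```python
F  = lambda x, y: x ** 2 - 3 * x * y + y ** 3 - 7
Fy = lambda x, y: -3 * x + 3 * y ** 2

def g(x, y0=3.0):
    """Solve F(x, y) = 0 for y by Newton's method, starting near y0 = 3."""
    y = y0
    for _ in range(50):
        y -= F(x, y) / Fy(x, y)
    return y

dx = 1e-5
print((g(4 + dx) - g(4 - dx)) / (2 * dx), 1 / 15)   # both ~ 0.0667
```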

(iii) A unit circle is described by

$$F(x, y) = x^2 + y^2 - 1 = 0, \qquad \text{with} \quad \frac{\partial F}{\partial x} = 2x, \quad \frac{\partial F}{\partial y} = 2y.$$

(a) Let first $(x_0, y_0) = (0, 1)$, so that $\frac{\partial F}{\partial x}(x_0, y_0) = 0$ and $\frac{\partial F}{\partial y}(x_0, y_0) = 2 \ne 0$. By Theorem 2.7.1 the implicit function $y = g(x)$ exists around $x_0 = 0$ and $y_0 = 1$, with $g'(x_0) = -0/2 = 0$. In this case we have an explicit formula:

$$y^2(x) = 1 - x^2 \;\Rightarrow\; y(x) = \sqrt{1 - x^2} > 0.$$

We also can compute directly:

$$y'(x) = \frac{-2x}{2\sqrt{1 - x^2}} = -\frac{x}{\sqrt{1 - x^2}}, \qquad y'(x_0) = 0.$$

(b) On the other hand, no nice function $y = g(x)$ exists around the initial point $(x_0, y_0) = (1, 0)$. Actually, Theorem 2.7.1 does not apply, since $\frac{\partial F}{\partial y}(x_0, y_0) = 0$. In the picture one sees two branches tending to the point $(1, 0)$:

$$y(x) = \pm\sqrt{1 - x^2}.$$

IFT, Multidimensional Case

Theorem 2.7.4 (Multidimensional IFT). Let $U_1 \subseteq \mathbb{R}^n$ and $U_2 \subseteq \mathbb{R}^m$ be open domains and let

$$F : U_1 \times U_2 \to \mathbb{R}^m, \quad (x, y) \mapsto F(x, y),$$

be continuously differentiable, i.e., $F \in C^1(U_1 \times U_2)$, which means that all $\frac{\partial F_i}{\partial x_j}, \frac{\partial F_i}{\partial y_k} : U_1 \times U_2 \to \mathbb{R}$ are continuous, $1 \le j \le n$, $1 \le i, k \le m$. Let a point $(x_0, y_0) \in U_1 \times U_2$ be such that $F(x_0, y_0) = 0$. Suppose that the $m \times m$ matrix of partial derivatives w.r.t. $y = (y_1, \dots, y_m)$,

$$\frac{\partial F}{\partial y} = \frac{\partial(F_1, \dots, F_m)}{\partial(y_1, \dots, y_m)} = \begin{pmatrix} \frac{\partial F_1}{\partial y_1} & \frac{\partial F_1}{\partial y_2} & \dots & \frac{\partial F_1}{\partial y_m} \\ \frac{\partial F_2}{\partial y_1} & \frac{\partial F_2}{\partial y_2} & \dots & \frac{\partial F_2}{\partial y_m} \\ \vdots & \vdots & & \vdots \\ \frac{\partial F_m}{\partial y_1} & \frac{\partial F_m}{\partial y_2} & \dots & \frac{\partial F_m}{\partial y_m} \end{pmatrix},$$

is invertible at the point $(x_0, y_0)$, i.e., its determinant

$$\det \frac{\partial F}{\partial y}(x_0, y_0) \ne 0.$$

Then, there exist:

(i) open neighbourhoods $V_1 \subseteq U_1$ of $x_0$ resp. $V_2 \subseteq U_2$ of $y_0$ (in general, they can be smaller than $U_1$ resp. $U_2$),

(ii) a continuously differentiable function

$$g : V_1 \to V_2, \quad \text{with } g(x_0) = y_0,$$

such that

$$F(x, g(x)) = 0 \quad \text{for all } x \in V_1.$$

Such a function is unique in the following sense: if $(x, y) \in V_1 \times V_2$ obeys $F(x, y) = 0$, then $y = g(x)$. Furthermore, the derivative at the point $x_0$ equals

$$\underbrace{Dg(x_0)}_{m \times n} = -\underbrace{\left[ \frac{\partial F}{\partial y}(x_0, y_0) \right]^{-1}}_{m \times m} \cdot \underbrace{\frac{\partial F}{\partial x}(x_0, y_0)}_{m \times n}.$$

Example 2.7.5 (Special Cases).

(i) $m = 1$, i.e., $F : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$,

$$F(x_1, \dots, x_n; y) = 0.$$

The implicit function $y = g(x_1, \dots, x_n) \in \mathbb{R}$ exists under the sufficient condition

$$\frac{\partial F}{\partial y}(x_0, y_0) \ne 0.$$

Then $Dg(x_0) = \nabla g(x_0) = (\partial_i g(x_0))_{i=1}^n$, whereby the partial derivatives $\partial_j g(x_0)$ w.r.t. $x_j$ are given by

$$\partial_j g(x_0) = -\frac{\partial_j F(x_0, y_0)}{\frac{\partial F}{\partial y}(x_0, y_0)}, \quad 1 \le j \le n.$$

(ii) $n = 1$, $m = 2$, i.e., $F : \mathbb{R} \times \mathbb{R}^2 \to \mathbb{R}^2$,

$$\begin{cases} F_1(x, y_1, y_2) = 0, \\ F_2(x, y_1, y_2) = 0. \end{cases}$$

The sufficient condition is stated in terms of

$$\frac{\partial F}{\partial y} = \begin{pmatrix} \frac{\partial F_1}{\partial y_1} & \frac{\partial F_1}{\partial y_2} \\ \frac{\partial F_2}{\partial y_1} & \frac{\partial F_2}{\partial y_2} \end{pmatrix},$$

namely,

$$\det \frac{\partial F}{\partial y}(x_0, y_0) = \left( \frac{\partial F_1}{\partial y_1} \cdot \frac{\partial F_2}{\partial y_2} - \frac{\partial F_1}{\partial y_2} \cdot \frac{\partial F_2}{\partial y_1} \right)(x_0, y_0) \ne 0.$$

Then there exists $g(x) = (g_1(x), g_2(x)) \in \mathbb{R}^2$ around $x_0$ and

$$Dg(x_0) = \begin{pmatrix} g_1'(x_0) \\ g_2'(x_0) \end{pmatrix} = -\left[ \frac{\partial F}{\partial y}(x_0, y_0) \right]^{-1} \cdot \begin{pmatrix} \frac{\partial F_1}{\partial x}(x_0, y_0) \\ \frac{\partial F_2}{\partial x}(x_0, y_0) \end{pmatrix}.$$

Numerical Example: $n = 1$, $m = 2$, i.e., $F : \mathbb{R} \times \mathbb{R}^2 \to \mathbb{R}^2$,

$$F(x, y_1, y_2) = \begin{cases} -2x^2 + y_1^2 + y_2^2 = 0, \\ x^2 + e^{y_1 - 1} - 2 y_2 = 0, \end{cases}$$

at the point $x_0 = 1$, $y_0 = (1, 1)$. After calculations,

$$D_y F(x, y_1, y_2) = \begin{pmatrix} 2 y_1 & 2 y_2 \\ e^{y_1 - 1} & -2 \end{pmatrix},$$

and at the point $(x_0, y_0) \in \mathbb{R} \times \mathbb{R}^2$,

$$D_y F(x_0, y_0) = \begin{pmatrix} 2 & 2 \\ 1 & -2 \end{pmatrix}, \qquad \det D_y F(x_0, y_0) = 2 \cdot (-2) - 1 \cdot 2 = -6 \ne 0.$$

The inverse matrix:

$$\left[ \frac{\partial F}{\partial y}(x_0, y_0) \right]^{-1} = \frac{1}{-6} \cdot \begin{pmatrix} -2 & -2 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 1/3 & 1/3 \\ 1/6 & -1/3 \end{pmatrix}.$$

Also, by direct calculations,

$$D_x F(x_0, y_0) = \begin{pmatrix} -4 \\ 2 \end{pmatrix}.$$

Thus,

$$\frac{dg}{dx}(x_0) = -\begin{pmatrix} 1/3 & 1/3 \\ 1/6 & -1/3 \end{pmatrix} \cdot \begin{pmatrix} -4 \\ 2 \end{pmatrix} = \begin{pmatrix} 2/3 \\ 4/3 \end{pmatrix}.$$

Reminder: Let

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad \text{with} \quad \det A := ad - bc \ne 0.$$

Then the inverse matrix is calculated by

$$A^{-1} = \frac{1}{\det A} \cdot \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.$$
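The IFT prediction $dg/dx(x_0) = (2/3, 4/3)$ can be verified by solving $F(x, y) = 0$ for $y$ at nearby values of $x$ and differencing. A sketch assuming numpy is available; the iteration count and step size are illustrative:

```python
import numpy as np

F = lambda x, y: np.array([-2 * x**2 + y[0]**2 + y[1]**2,
                           x**2 + np.exp(y[0] - 1) - 2 * y[1]])
DyF = lambda x, y: np.array([[2 * y[0],          2 * y[1]],
                             [np.exp(y[0] - 1), -2.0     ]])

def g(x, y=np.array([1.0, 1.0])):
    """Solve F(x, y) = 0 for y by Newton's method near y0 = (1, 1)."""
    for _ in range(30):
        y = y - np.linalg.solve(DyF(x, y), F(x, y))
    return y

dx = 1e-6
print((g(1 + dx) - g(1 - dx)) / (2 * dx))   # ~ [0.6667, 1.3333] = (2/3, 4/3)
```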

2.8 Inverse Functions

Let $f : U \to \mathbb{R}^n$, with $U \subseteq \mathbb{R}^n$ an open set (now $m = n$!). Problem: Does there exist an inverse mapping

$$g := f^{-1} : f(U) \to U?$$

Theorem 2.8.1. Let $U \subseteq \mathbb{R}^n$ be an open domain and let $f : U \to \mathbb{R}^n$ be continuously differentiable, i.e., $f \in C^1(U)$. Let $x_0 \in U$ and $y_0 := f(x_0)$. Suppose that the Jacobi matrix of partial derivatives

$$Df = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \dots & \frac{\partial f_1}{\partial x_n} \\ \frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \dots & \frac{\partial f_2}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1} & \frac{\partial f_n}{\partial x_2} & \dots & \frac{\partial f_n}{\partial x_n} \end{pmatrix}$$

is invertible at the point $x_0$, i.e., its determinant is $\ne 0$. Then there exist open neighbourhoods $U_0 \subseteq U$ of $x_0$ resp. $V_0 \subseteq \mathbb{R}^n$ of $y_0$ such that the mapping

$$f : U_0 \to V_0$$

is one-to-one (a bijection), and the inverse function

$$g := f^{-1} : V_0 \to U_0,$$

acting by

$$(f^{-1} \circ f)(x) = x, \quad (f \circ f^{-1})(y) = y, \quad \forall x \in U_0,\ \forall y \in V_0,$$

is continuously differentiable on $V_0$. Furthermore, the following holds:

$$Dg(y_0) = [Df(x_0)]^{-1}.$$

Proof. Define the function

$$F : U \times \mathbb{R}^n \to \mathbb{R}^n, \quad F(x, y) := y - f(x).$$

Then $F(x_0, y_0) = 0$ and

$$\frac{\partial F}{\partial x}(x, y) = -Df(x), \qquad \det \frac{\partial F}{\partial x}(x_0, y_0) = \det(-Df(x_0)) \ne 0, \qquad \frac{\partial F}{\partial y}(x, y) = \mathrm{Id}_{\mathbb{R}^n},$$

where $\mathrm{Id}_{\mathbb{R}^n}$ is the identity $n \times n$ matrix. We claim that the equation

$$F(x, y) := y - f(x) = 0$$

locally defines the implicit function $x := g(y) = f^{-1}(y)$. Indeed, by Theorem 2.7.4 (with the roles of the variables $x$ and $y$ interchanged) there exist $V_0 \subseteq \mathbb{R}^n$ and a function $g : V_0 \to \mathbb{R}^n$, $g \in C^1(V_0)$, such that

$$x = g(y), \quad y = f(g(y)), \quad y \in V_0.$$

So $g = f^{-1}$ on $V_0$, and

$$Dg(y_0) = -\left[ \frac{\partial F}{\partial x}(x_0, y_0) \right]^{-1} \cdot \frac{\partial F}{\partial y}(x_0, y_0) = -[-Df(x_0)]^{-1} \cdot \mathrm{Id}_{\mathbb{R}^n} = [Df(x_0)]^{-1}.$$

Special case: $n = 1$ and $f : U \to \mathbb{R}$. The sufficient condition is

$$f'(x_0) \ne 0.$$

Then

$$g'(y_0) = \frac{1}{f'(x_0)}.$$

Example 2.8.2. Let

$$f\begin{pmatrix} x \\ y \end{pmatrix} := \begin{pmatrix} x^2 - y^2 \\ 2xy \end{pmatrix} \in \mathbb{R}^2, \quad x, y \in \mathbb{R}.$$

Then

$$Df(x, y) = \frac{\partial f(x, y)}{\partial(x, y)} = \begin{pmatrix} 2x & -2y \\ 2y & 2x \end{pmatrix}, \qquad \det Df(x, y) = 4(x^2 + y^2).$$

By Theorem 2.8.1, $f$ is (locally) invertible at every point $(x, y) \in \mathbb{R}^2$ except $(0, 0)$. But globally $f$ is not one-to-one, since for all $(x, y) \in \mathbb{R}^2$,

$$f\begin{pmatrix} x \\ y \end{pmatrix} = f\begin{pmatrix} -x \\ -y \end{pmatrix}.$$
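The identity $Dg(y_0) = [Df(x_0)]^{-1}$ from Theorem 2.8.1 can be illustrated numerically for this map: invert $f$ near a point by Newton's method and difference the inverse. A sketch assuming numpy is available; the base point and tolerances are illustrative:

```python
import numpy as np

f  = lambda v: np.array([v[0]**2 - v[1]**2, 2 * v[0] * v[1]])
Df = lambda v: np.array([[2 * v[0], -2 * v[1]],
                         [2 * v[1],  2 * v[0]]])

def f_inv(y, x=np.array([1.0, 1.0])):
    """Newton iteration solving f(x) = y, started near x0 = (1, 1)."""
    for _ in range(40):
        x = x - np.linalg.solve(Df(x), f(x) - y)
    return x

x0 = np.array([1.0, 1.0]); y0 = f(x0)
h = 1e-6
Dg = np.column_stack([(f_inv(y0 + h * e) - f_inv(y0 - h * e)) / (2 * h)
                      for e in np.eye(2)])
print(np.allclose(Dg, np.linalg.inv(Df(x0)), atol=1e-4))   # True
```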

2.9 Unconstrained Optimization

We now turn to the study of optimization theory under assumptions of differentiability.

Definition 2.9.1. Let $U \subseteq \mathbb{R}^n$ be an open domain and let $f : U \to \mathbb{R}$ be an objective function whose extrema we would like to analyse.

(i) A point $x^* \in U$ is a local maximum (resp. minimum) of $f$ if there exists a ball $B_\varepsilon(x^*) \subseteq U$ such that for all $x \in B_\varepsilon(x^*)$,

$$f(x^*) \ge f(x) \quad (\text{resp. } f(x^*) \le f(x)).$$

Local maxima or minima are called local extrema.

(ii) A point $x^* \in U$ is a global (or absolute) maximum (resp. minimum) of $f$ if for all $x \in U$,

$$f(x^*) \ge f(x) \quad (\text{resp. } f(x^*) \le f(x)).$$

(iii) A point $x^* \in U$ is a strict local maximum (resp. minimum) of $f$ if there exists a ball $B_\varepsilon(x^*) \subseteq U$ such that for all $x \ne x^*$ in $B_\varepsilon(x^*)$,

$$f(x^*) > f(x) \quad (\text{resp. } f(x^*) < f(x)).$$

Remark 2.9.2. In the definition of the global extrema, the function $f : U \to \mathbb{R}$ can be defined on any domain $U$, which is not necessarily open.

We want to use methods of Calculus to find local extrema. So we need smoothness (i.e., differentiability) of $f$.

2.10 First-Order Conditions

Aim: To find necessary conditions for local extrema.

Theorem 2.10.1 (Necessary Condition for Local Extrema). Let $U \subseteq \mathbb{R}^n$ be an open domain and $f : U \to \mathbb{R}$ be partially differentiable on $U$ (i.e., all its partial derivatives $\partial f / \partial x_i : U \to \mathbb{R}$, $1 \le i \le n$, exist). Then:

$$x^* \in U \text{ is a local extremum for } f \implies \operatorname{grad} f(x^*) := \nabla f(x^*) = \left( \frac{\partial f}{\partial x_1}(x^*), \dots, \frac{\partial f}{\partial x_n}(x^*) \right) = 0.$$

Proof. For $i = 1, \dots, n$ define a function

$$t \mapsto g_i(t) := f(x^* + t e_i),$$

where $e_i = (0, \dots, 0, 1, 0, \dots, 0) \in \mathbb{R}^n$ (with the $1$ in the $i$-th position) is a unit basis vector in $\mathbb{R}^n$. Here $t \in (-\varepsilon, \varepsilon)$ with a sufficiently small $\varepsilon > 0$ such that

$$\{x^* + t e_i \mid -\varepsilon < t < \varepsilon\} \subset B_\varepsilon(x^*) \subseteq U \quad \text{for all } 1 \le i \le n.$$

If $x^*$ is a local extremum for $f(x_1, \dots, x_n)$, then clearly each real function $g_i : (-\varepsilon, \varepsilon) \to \mathbb{R}$ has a local extremum at $t = 0$. Applying the one-dimensional necessary condition for extrema (well known from Calculus), we conclude that

$$\frac{\partial f}{\partial x_i}(x^*) = g_i'(0) = 0.$$

2.11 Second-Order Conditions

Aim: To find sufficient conditions for local extrema.

Definition 2.11.1. Any point $x^* \in U$ satisfying the first-order condition $\nabla f(x^*) = 0$ is called a critical point of $f$ on $U$.

The first-order conditions for local optima do not distinguish between maxima and minima. To determine whether some critical point $x^* \in U$ is a local max or min, we need to examine the behaviour of the second derivative $D^2 f(x^*)$. To this end, we assume that $f$ is twice continuously differentiable on $U$, i.e., $f \in C^2(U)$, which means that all $\frac{\partial^2 f}{\partial x_i \partial x_j} : U \to \mathbb{R}$ are continuous, $1 \le i, j \le n$. To formulate the sufficient conditions we need the Hessian of $f$, which is the $n \times n$ matrix of second partial derivatives:

$$\operatorname{Hess} f(x) := D^2 f(x) := \begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \dots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial^2 f(x)}{\partial x_n \partial x_1} & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} & \dots & \frac{\partial^2 f(x)}{\partial x_n^2} \end{pmatrix}.$$

Since $f \in C^2(U)$, by Theorem 2.3.1,

$$\frac{\partial^2 f(x)}{\partial x_i \partial x_j} = \frac{\partial^2 f(x)}{\partial x_j \partial x_i}, \quad 1 \le i, j \le n,$$

so that $D^2 f(x)$ is a symmetric matrix. By Taylor's approximation of the second order,

$$f(x^* + h) = f(x^*) + \langle \operatorname{grad} f(x^*), h \rangle_{\mathbb{R}^n} + \frac{1}{2} \langle h, \operatorname{Hess} f(x^*) \cdot h \rangle_{\mathbb{R}^n} + o(\|h\|^2), \quad h \to 0.$$

Since $\nabla f(x^*) = 0$,

$$f(x^* + h) \sim f(x^*) + \frac{1}{2} \langle h, \operatorname{Hess} f(x^*) \cdot h \rangle_{\mathbb{R}^n}, \quad h \to 0.$$

If $\operatorname{Hess} f(x^*)$ is a negative definite matrix, i.e.,

$$\langle y, \operatorname{Hess} f(x^*)\, y \rangle_{\mathbb{R}^n} < 0 \quad \text{for all } 0 \ne y \in \mathbb{R}^n,$$

then $f(x^* + h) < f(x^*)$ for all small $h \ne 0$, i.e., $x^*$ is a strict local max. If $\operatorname{Hess} f(x^*)$ is a positive definite matrix, i.e.,

$$\langle y, \operatorname{Hess} f(x^*)\, y \rangle_{\mathbb{R}^n} > 0 \quad \text{for all } 0 \ne y \in \mathbb{R}^n,$$

then $f(x^* + h) > f(x^*)$ for all small $h \ne 0$, i.e., $x^*$ is a strict local min. We summarize the above analysis in the following theorem:

Theorem 2.11.2 (Sufficient Conditions for Local Extrema). Let $U \subseteq \mathbb{R}^n$ be open, let the function $f : U \to \mathbb{R}$ be twice continuously differentiable on $U$, and let $x^* \in U$ obey $\nabla f(x^*) = 0$. Then:

(i) $\operatorname{Hess} f(x^*)$ is positive definite (i.e., $\operatorname{Hess} f(x^*) > 0$ as a symmetric $n \times n$ matrix) $\implies x^*$ is a strict local min. The positive definiteness of $\operatorname{Hess} f(x^*)$ is equivalent to the positivity of all $n$ leading principal minors of $D^2 f(x^*)$:

$$\partial^2_{1,1} f(x^*) > 0, \quad \begin{vmatrix} \partial^2_{1,1} f(x^*) & \partial^2_{1,2} f(x^*) \\ \partial^2_{2,1} f(x^*) & \partial^2_{2,2} f(x^*) \end{vmatrix} > 0, \quad \begin{vmatrix} \partial^2_{1,1} f(x^*) & \partial^2_{1,2} f(x^*) & \partial^2_{1,3} f(x^*) \\ \partial^2_{2,1} f(x^*) & \partial^2_{2,2} f(x^*) & \partial^2_{2,3} f(x^*) \\ \partial^2_{3,1} f(x^*) & \partial^2_{3,2} f(x^*) & \partial^2_{3,3} f(x^*) \end{vmatrix} > 0, \quad \dots, \quad \det D^2 f(x^*) > 0.$$

(ii) $\operatorname{Hess} f(x^*)$ is negative definite (i.e., $\operatorname{Hess} f(x^*) < 0$ as a symmetric $n \times n$ matrix) $\implies x^*$ is a strict local max. The negative definiteness of $\operatorname{Hess} f(x^*)$ means that the leading principal minors alternate in sign:

$$\partial^2_{1,1} f(x^*) < 0, \quad \begin{vmatrix} \partial^2_{1,1} f(x^*) & \partial^2_{1,2} f(x^*) \\ \partial^2_{2,1} f(x^*) & \partial^2_{2,2} f(x^*) \end{vmatrix} > 0, \quad \begin{vmatrix} \partial^2_{1,1} f(x^*) & \partial^2_{1,2} f(x^*) & \partial^2_{1,3} f(x^*) \\ \partial^2_{2,1} f(x^*) & \partial^2_{2,2} f(x^*) & \partial^2_{2,3} f(x^*) \\ \partial^2_{3,1} f(x^*) & \partial^2_{3,2} f(x^*) & \partial^2_{3,3} f(x^*) \end{vmatrix} < 0, \quad \dots, \quad (-1)^n \det D^2 f(x^*) > 0.$$

(iii) $\operatorname{Hess} f(x^*)$ is indefinite, i.e., for some vectors $y_1 \ne 0$, $y_2 \ne 0$,

$$\langle y_1, \operatorname{Hess} f(x^*)\, y_1 \rangle_{\mathbb{R}^n} > 0 \quad \text{but} \quad \langle y_2, \operatorname{Hess} f(x^*)\, y_2 \rangle_{\mathbb{R}^n} < 0,$$

$\implies x^*$ is not a local extremum (i.e., $x^*$ is a saddle point).

Remark 2.11.3. A saddle point $x^*$ is a min of $f$ in some direction $h_1 \ne 0$ and a max of $f$ in another direction $h_2 \ne 0$ (such that $\langle h_1, \operatorname{Hess} f(x^*)\, h_1 \rangle_{\mathbb{R}^n} > 0$ and $\langle h_2, \operatorname{Hess} f(x^*)\, h_2 \rangle_{\mathbb{R}^n} < 0$).

Warning: The positive semidefiniteness $\operatorname{Hess} f(x^*) \ge 0$, i.e.,

$$\langle y, \operatorname{Hess} f(x^*)\, y \rangle_{\mathbb{R}^n} \ge 0 \quad \text{for all } y \in \mathbb{R}^n,$$

or the negative semidefiniteness $\operatorname{Hess} f(x^*) \le 0$, i.e.,

$$\langle y, \operatorname{Hess} f(x^*)\, y \rangle_{\mathbb{R}^n} \le 0 \quad \text{for all } y \in \mathbb{R}^n,$$

does not imply in general that $x^*$ is a local (non-strict) minimum or, respectively, maximum. Here we cannot ignore the terms $o(\|h\|^2)$ in Taylor's formula. Unlike Theorem 2.10.1, the conditions of Theorem 2.11.2 are not necessary conditions! Remember a standard counterexample:

$$f_1(x) = x^4, \quad f_2(x) = -x^4,$$

$$f_1'(0) = f_1''(0) = 0, \quad f_2'(0) = f_2''(0) = 0.$$

But $f_1$ (resp. $f_2$) has a strict global min (resp. max) at $x = 0$.

Numerical Examples: $f : \mathbb{R}^2 \to \mathbb{R}$, $(x, y) \mapsto f(x, y)$.

(i) $f(x, y) := x^2 + y^2$, $\nabla f(x, y) = (2x, 2y) = 0 \iff x = y = 0$.

$$D^2 f(0) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \quad (\text{the same for all } (x, y)), \qquad \det D^2 f(0) = 4 > 0.$$

Answer: $(0, 0)$ is a strict local min.

(ii) $f(x, y) := x^2 - y^2$, $\nabla f(x, y) = (2x, -2y) = 0 \iff x = y = 0$.

$$D^2 f(0) = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}, \qquad \det D^2 f(0) = -4 < 0.$$

Answer: $(0, 0)$ is a saddle point.

(iii) $\operatorname{Hess} f(x^*)$ is semidefinite; then we cannot conclude anything about the critical point. Consider the functions

$$f_1(x, y) := x^2 + y^4, \qquad f_2(x, y) := x^2, \qquad f_3(x, y) := x^2 + y^3.$$

For each $i = 1, 2, 3$ we have $f_i(0) = 0$, $\nabla f_i(0) = 0$,

$$\operatorname{Hess} f_i(0) = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix} \quad \text{is positive semidefinite},$$

i.e., $\langle h, \operatorname{Hess} f_i(0)\, h \rangle_{\mathbb{R}^2} \ge 0$ for any $h \in \mathbb{R}^2$. But the point $(0, 0)$ is:

(1) a strict local min for $f_1$;

(2) a non-strict local min for $f_2$ (since $f_2(0, y) = 0$, $\forall y \in \mathbb{R}$);

(3) not a local extremum for $f_3$ (since $f_3(t, 0) = t^2 > 0$ and $f_3(0, t) = t^3 < 0$ if $t < 0$).
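In practice one classifies a critical point by the eigenvalues of the Hessian (positive definite iff all eigenvalues are positive, and so on). A minimal sketch assuming numpy is available, applied to the Hessians from examples (i)-(iii) above; the tolerance is an illustrative choice:

```python
import numpy as np

def classify(H, tol=1e-10):
    """Classify a symmetric Hessian at a critical point by its eigenvalues."""
    ev = np.linalg.eigvalsh(H)
    if np.all(ev > tol):
        return "strict local min"
    if np.all(ev < -tol):
        return "strict local max"
    if np.any(ev > tol) and np.any(ev < -tol):
        return "saddle point"
    return "semidefinite: test is inconclusive"

print(classify(np.array([[2.0, 0.0], [0.0,  2.0]])))  # strict local min
print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))  # saddle point
print(classify(np.array([[2.0, 0.0], [0.0,  0.0]])))  # inconclusive
```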

Reminder from Linear Algebra:

Proposition 2.11.4. A symmetric $2 \times 2$ matrix

$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}, \quad a_{12} = a_{21},$$

is positive definite if and only if

$$a_{11} > 0 \quad \text{and} \quad \det A := a_{11} a_{22} - a_{12}^2 > 0.$$

The matrix $A$ is negative definite if and only if

$$a_{11} < 0 \quad \text{and} \quad \det A = a_{11} a_{22} - a_{12}^2 > 0.$$

If $\det A < 0$, the matrix $A$ is surely indefinite.

Indeed, for any vector $y = (y_1, y_2) \in \mathbb{R}^2$:

$$\langle A y, y \rangle = a_{11} y_1^2 + a_{22} y_2^2 + 2 a_{12} y_1 y_2.$$

Let us assume that $y_2 \ne 0$ and set $z = y_1 / y_2$; then the quadratic polynomial

$$\frac{\langle A y, y \rangle}{y_2^2} = P(z) = a_{11} z^2 + 2 a_{12} z + a_{22}, \quad z \in \mathbb{R},$$

takes only positive (resp. negative) values for all $z \in \mathbb{R}$ iff $a_{11} > 0$ (resp. $a_{11} < 0$) and its discriminant

$$\Delta := a_{12}^2 - a_{11} a_{22} = -\det A < 0.$$

2.12 A Rough Guide: How to Find the Global Maxima/Minima

Problem: to find global maxima (minima) for

$$f : D \to \mathbb{R}, \quad D \subseteq \mathbb{R}^n \ (\text{an arbitrary set, not necessarily open}).$$

(i) Find and compare the local maxima (minima) in $\operatorname{int} D$ (the interior of $D$) and choose the best.

(ii) Compare with the boundary values $f(x)$, $x \in D \setminus \operatorname{int} D$.

Numerical Example: Find the max/min of

$$f(x) = 4x^3 - 5x^2 + 2x \quad \text{over } x \in [0, 1].$$

Since $I := [0, 1]$ is compact and $f$ is continuous on $I$, the Weierstrass theorem guarantees that $f$ has a global max on this interval. There are two possibilities: either the maximum is a local maximum attained in the open interval $(0, 1)$, or it occurs at one of the boundary points $x = 0, 1$. In the first case we must have the first-order condition:

$$f'(x) = 12x^2 - 10x + 2 = 0 \implies x_1 = 1/2 \text{ or } x_2 = 1/3.$$

So, we have two critical points $x_1$ and $x_2$. The second-order condition says that

$$f''(x) = 24x - 10 \implies f''(x_1) = 2 > 0 \quad \text{and} \quad f''(x_2) = -2 < 0.$$

Thus, $x_1 = 1/2$ is a local min and $x_2 = 1/3$ is a local max. Evaluating $f$ at the four points $0$, $1/3$, $1/2$, and $1$ shows that

$$f(0) = 0, \quad f(1/3) = 7/27, \quad f(1/2) = 1/4, \quad f(1) = 1;$$

so $x = 1$ is the global max and $x = 0$ is the global min for $f(x)$, $x \in [0, 1]$.
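The comparison of candidate points is easy to automate; a minimal sketch in plain Python, listing the interior critical points together with the boundary points:

```python
f = lambda x: 4 * x**3 - 5 * x**2 + 2 * x

# Candidates: interior critical points (roots of f') plus the boundary points.
candidates = [0.0, 1 / 3, 1 / 2, 1.0]
values = {x: f(x) for x in candidates}
print(max(values, key=values.get), min(values, key=values.get))  # 1.0, 0.0
```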

Literature: Chapters 16, 17 of C. Simon, L. Blume “Mathematics for Economists”.

Example 2.12.1 (Economic Example: Cobb–Douglas Function). Cobb–Douglas production function:

$$f(x, y) = x^a y^b, \quad x, y > 0.$$

Find the maximum of the profit

$$V(x, y) = p x^a y^b - k_x x - k_y y.$$

First-order conditions:

$$\begin{cases} p a x^{a-1} y^b = k_x, \\ p b x^a y^{b-1} = k_y. \end{cases} \tag{$*$}$$

After dividing the first line by the second one, we get

$$\frac{a}{b} \cdot \frac{y}{x} = \frac{k_x}{k_y} \implies y = \frac{b k_x}{a k_y}\, x.$$

Substituting back into ($*$), we have

$$k_x = p a x^{a-1} \left( \frac{b k_x}{a k_y} \right)^b x^b = p a \left( \frac{b k_x}{a k_y} \right)^b x^{a+b-1},$$

which allows us to find a unique critical point $(x^*, y^*)$:

$$x^* = \left( \frac{k_x^{1-b}\, k_y^b}{p\, a^{1-b}\, b^b} \right)^{\frac{1}{a+b-1}}, \qquad y^* = \frac{b k_x}{a k_y}\, x^*.$$

Is it a maximum? Calculate

$$\operatorname{Hess} V(x, y) = p \operatorname{Hess} f(x, y) = p \begin{pmatrix} a(a-1) x^{a-2} y^b & a b\, x^{a-1} y^{b-1} \\ a b\, x^{a-1} y^{b-1} & b(b-1) x^a y^{b-2} \end{pmatrix},$$

$$\det \operatorname{Hess} V(x, y) = p^2 \left( a(a-1) b(b-1) - a^2 b^2 \right) x^{2a-2} y^{2b-2} > 0 \quad \text{if } (a-1)(b-1) > ab, \text{ i.e., } a + b < 1.$$

We also have that

$$\frac{\partial^2 V}{\partial x^2}(x, y) < 0 \quad \text{if } a < 1.$$

So, a sufficient condition for a maximum is $a + b < 1$.
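Under $a + b < 1$ the first-order conditions pin down the unique maximizer, which is easy to confirm numerically. A sketch in plain Python; the parameter values $a, b, p, k_x, k_y$ below are illustrative:

```python
a, b = 0.3, 0.5                        # illustrative, a + b < 1
p, kx, ky = 2.0, 1.0, 1.5
V = lambda x, y: p * x**a * y**b - kx * x - ky * y

# Closed-form critical point from the first-order conditions:
x_star = (kx**(1 - b) * ky**b / (p * a**(1 - b) * b**b)) ** (1 / (a + b - 1))
y_star = b * kx / (a * ky) * x_star

# The FOCs hold at (x*, y*), and nearby profits are lower:
h = 1e-6
print((V(x_star + h, y_star) - V(x_star - h, y_star)) / (2 * h))  # ~ 0
print((V(x_star, y_star + h) - V(x_star, y_star - h)) / (2 * h))  # ~ 0
print(V(x_star, y_star) > V(1.1 * x_star, 0.9 * y_star))          # True
```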

2.13 Envelope Theorems

The Envelope Theorem (Umhüllenden-Theorem) is a general principle describing how the optimal value of the objective function in a parametrized optimization problem changes as the parameters of the problem change. In economics, such parameters can be prices, tax rates, income levels, etc. Such problems constitute the subject of Comparative Statics. In microeconomic theory, the envelope theorem is used, e.g., to prove Hotelling's lemma (1932), Shephard's lemma (1953) and Roy's identity (1947). In applications, it is usually stated non-rigorously, i.e., without the suitable assumptions which guarantee the differentiability of the so-called optimal value function.

Let

$$f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}$$

be a continuously differentiable function. We call it the objective function $f(x; \alpha)$; it depends on the choice variable $x \in \mathbb{R}^n$ and the parameter $\alpha \in \mathbb{R}^m$. We consider the unconstrained maximization problem for $f$, i.e.,

$$\text{maximize } f(x; \alpha) \text{ w.r.t. } x \in \mathbb{R}^n.$$

Let $x^*(\alpha) \in \mathbb{R}^n$ be a solution of the above problem, i.e.,

$$f(x^*(\alpha); \alpha) \ge f(x; \alpha) \quad \text{for all } x \in \mathbb{R}^n.$$

Here we assume that, at each $\alpha \in \mathbb{R}^m$, such a solution $x^*(\alpha) \in \mathbb{R}^n$ exists; in the case of non-uniqueness we take for $x^*(\alpha)$ any one of the maximum points $x$ of $f(x; \alpha)$. Then

$$V(\alpha) := \max_{x \in \mathbb{R}^n} f(x; \alpha) = f(x^*(\alpha); \alpha)$$

is the corresponding (optimal) value function. We are interested in how $V(\alpha)$ depends on $\alpha \in \mathbb{R}^m$. Note that $V(\alpha) = f(x^*(\alpha); \alpha)$ changes for two reasons:

(i) directly w.r.t. $\alpha$, because $\alpha$ is the second variable in $f(x; \alpha)$;

(ii) indirectly, since $x^*(\alpha)$ itself nontrivially depends on $\alpha$.

Theorem 2.13.1 (Envelope Theorem). Suppose that $f(x; \alpha)$ is continuously differentiable w.r.t. $x \in \mathbb{R}^n$ and $\alpha \in \mathbb{R}^m$. Suppose additionally that $x^*(\alpha)$ is a continuously differentiable function of $\alpha \in \mathbb{R}^m$. Then $V(\alpha)$ is also continuously differentiable, and for any $\alpha \in \mathbb{R}^m$ and $1 \le i \le m$,

$$\frac{\partial V}{\partial \alpha_i}(\alpha) = \frac{\partial f}{\partial \alpha_i}(x^*(\alpha); \alpha).$$

Proof. By our assumption we have

$$V(\alpha) = f(x^*(\alpha); \alpha), \quad \forall \alpha \in \mathbb{R}^m.$$

Therefore, by the chain rule,

$$\frac{\partial V}{\partial \alpha_i}(\alpha) = \frac{\partial f}{\partial \alpha_i}(x^*(\alpha); \alpha) + \sum_{j=1}^n \frac{\partial f}{\partial x_j}(x^*(\alpha); \alpha)\, \frac{\partial x_j^*}{\partial \alpha_i}(\alpha), \quad 1 \le i \le m.$$

The second sum vanishes, since by the first-order condition for extrema (cf. Theorem 2.10.1),

$$\frac{\partial f}{\partial x_j}(x^*(\alpha); \alpha) = 0 \quad \text{for all } 1 \le j \le n.$$

Thus we get

$$\frac{\partial V}{\partial \alpha_i}(\alpha) = \frac{\partial f}{\partial \alpha_i}(x^*(\alpha); \alpha).$$

Remark 2.13.2. The same equality holds if we minimize $f(x; \alpha)$.

Simplified rule: When calculating $\partial V / \partial \alpha_i$, just forget the $\max_{x \in \mathbb{R}^n}$, take the derivative of $f(x; \alpha)$ w.r.t. $\alpha_i$, and then plug in the optimal solution $x^*(\alpha)$. So we need to consider only the direct effect of $\alpha$ on $V(\alpha)$, ignoring the indirect effect through $x^*(\alpha)$.

At this point it would be useful to know when $x^*(\alpha)$ exists and is continuously differentiable w.r.t. $\alpha$. To answer this question we can use the Implicit Function Theorem (IFT). Assume that $f \in C^2(\mathbb{R}^n \times \mathbb{R}^m)$. We know that $x^*(\alpha)$ is a solution to

$$\nabla_x f(x, \alpha) = 0$$

(the necessary condition for extrema), i.e.,

$$\begin{cases} \frac{\partial f}{\partial x_1}(x, \alpha) = 0, \\ \quad \vdots \\ \frac{\partial f}{\partial x_n}(x, \alpha) = 0. \end{cases}$$

Consider the function

$$g : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n, \qquad (x; \alpha) \mapsto \left( \frac{\partial f}{\partial x_j}(x; \alpha) \right)_{1 \le j \le n}.$$

The IFT (cf. Theorem 2.7.4) tells us that $x^*(\alpha)$ exists as an implicit function and is continuously differentiable w.r.t. $\alpha$ if the $n \times n$ matrix of partial derivatives of $g$ w.r.t. $x = (x_1, \dots, x_n)$ is invertible, i.e., $\det D_x g(x, \alpha) \ne 0$, where

$$D_x g = \begin{pmatrix} \frac{\partial g_1}{\partial x_1} & \dots & \frac{\partial g_1}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial g_n}{\partial x_1} & \dots & \frac{\partial g_n}{\partial x_n} \end{pmatrix} = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \dots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix}, \qquad D_x g = \operatorname{Hess}_x f = D_x^2 f.$$

Assume that $\operatorname{Hess}_x f$ at the point $(x^*(\alpha); \alpha)$ is a negative definite matrix (which is the sufficient condition for a strict local maximum w.r.t. $x$). Hence $\det D_x g(x^*(\alpha); \alpha) > 0$ if $n = 2, 4, 6, \dots$ (or $< 0$ if, respectively, $n = 1, 3, 5, \dots$). These arguments lead to the following result.

Theorem 2.13.3 (Deep Envelope Theorem; Samuelson (1947), Auspitz–Lieben (1889)). Let $U_1 \subseteq \mathbb{R}^n$ and $U_2 \subseteq \mathbb{R}^m$ be open domains and let

$$f : U_1 \times U_2 \to \mathbb{R}, \quad (x, \alpha) \mapsto f(x; \alpha),$$

be twice continuously differentiable (i.e., $f \in C^2(U_1 \times U_2)$). Suppose that $\operatorname{Hess}_x f(x; \alpha)$ is negative definite for all $x \in U_1$, $\alpha \in U_2$. Fix some $\alpha \in U_2$, and let $x^*(\alpha) \in U_1$ be a maximum of $f(x; \alpha)$ on $U_1$, i.e.,

$$f(x^*(\alpha); \alpha) = \max_{x \in U_1} f(x; \alpha),$$

which, by Theorem 2.10.1, implies

$$\nabla_x f(x^*(\alpha); \alpha) = 0.$$

Then there exists a continuously differentiable function $x^* : V_2 \to \mathbb{R}^n$, defined on some open set $V_2 \subseteq U_2$, such that

$$V(\alpha) := \max_{x \in U_1} f(x; \alpha) = f(x^*(\alpha); \alpha) \qquad \text{and} \qquad \frac{\partial V}{\partial \alpha_i}(\alpha) = \frac{\partial f}{\partial \alpha_i}(x^*(\alpha); \alpha).$$

Geometrical picture: The curve $\mathbb{R}^m \ni \alpha \mapsto y = V(\alpha) := f(x^*(\alpha); \alpha)$ is the envelope of the family of curves $\mathbb{R}^m \ni \alpha \mapsto y = V_x(\alpha) := f(x; \alpha)$, indexed by the parameter $x \in \mathbb{R}^n$. Indeed, for each $x$ and $\alpha$ we have

$$f(x; \alpha) \le V(\alpha).$$

None of the $V_x(\alpha)$-curves can lie above the curve $y = V(\alpha)$. On the other hand, for each value of $\alpha$ there exists at least one value $x^*(\alpha)$ of $x$ such that $f(x^*(\alpha); \alpha) = V(\alpha)$. The curve $\alpha \mapsto V_{x^*(\alpha)}(\alpha)$ just touches the curve $\alpha \mapsto y = V(\alpha)$ at the point $(\alpha, V(\alpha))$, and so must have exactly the same tangent as the graph of $V$ at this point, i.e.,

$$\frac{\partial V}{\partial \alpha_i}(\alpha) = \frac{\partial f}{\partial \alpha_i}(x^*(\alpha); \alpha).$$

So, the graph of $V(\alpha)$ is like an envelope that is used to "wrap" or cover all the curves $y = V_x(\alpha)$.

Example 2.13.4 (Hotelling's Lemma). A competitive firm cannot change:

(i) output prices $p$ (if you increase $p$, you lose customers);

(ii) wages $w$ (workers will go to other firms).

But the firm can choose $x$, the number of workers it employs. Let $f(x)$ be the corresponding production function. The profit of the firm at given $x, p, w$ is

$$\pi(x; p, w) = p f(x) - w x.$$

The maximum profit function (also called the firm's profit function) is

$$V(p, w) = \max_{x \ge 0} \{ p f(x) - w x \}.$$

It is important to know how the profit of the firm changes if $p, w$ change:

$$\frac{\partial V}{\partial p}, \quad \frac{\partial V}{\partial w}\,?$$

By the Envelope Theorem, if the model is "nice" (i.e., we have a continuously differentiable function $x^*(p, w)$), then formally

$$\frac{\partial V}{\partial p} = f(x^*(p, w)), \qquad \frac{\partial V}{\partial w} = -x^*(p, w),$$

where $x^*(p, w)$ is the optimal number of workers.

Conclusion: when wages increase, the maximum profit decreases at a rate equal to the number of workers employed. Formally, $x^*$ obeys

$$g(x, w, p) = p f'(x^*) - w = 0.$$

By the IFT, a "nice" solution exists if $f''(x^*) < 0$.
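A numerical illustration of Hotelling's lemma for a concrete technology: take $f(x) = \sqrt{x}$ (an illustrative choice with $f'' < 0$, not from the lecture), so that $x^*(p, w) = p^2/(4w^2)$ and $V(p, w) = p^2/(4w)$. The sketch below (plain Python) checks $\partial V/\partial w = -x^*$ and $\partial V/\partial p = f(x^*)$ by finite differences:

```python
import math

f = lambda x: math.sqrt(x)                 # illustrative concave technology
x_star = lambda p, w: p**2 / (4 * w**2)    # solves p * f'(x) = w
V = lambda p, w: p * f(x_star(p, w)) - w * x_star(p, w)

p, w, h = 2.0, 0.5, 1e-6
dV_dw = (V(p, w + h) - V(p, w - h)) / (2 * h)
dV_dp = (V(p + h, w) - V(p - h, w)) / (2 * h)
print(dV_dw, -x_star(p, w))    # both ~ -4.0
print(dV_dp, f(x_star(p, w)))  # both ~ 2.0
```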

2.14 Gâteaux and Fréchet Differentials

The notions of directional and total differentiability can be naturally extended to infinite-dimensional spaces. Let $(X, \|\cdot\|)$ be a normed space, $U \subseteq X$ an open set, and $f : U \to \mathbb{R}$.

Definition 2.14.1 (Gâteaux differentiability). The function $f : U \to \mathbb{R}$ is Gâteaux differentiable at a point $x \in U$ along the direction $v \in X$, $\|v\| = 1$, if the following limit exists:

$$\lim_{t \to 0} \frac{1}{t}\left[ f(x + t v) - f(x) \right] =: D_v f(x).$$

$D_v f(x) \in \mathbb{R}$ is called the Gâteaux derivative.

Definition 2.14.2 (Fréchet differentiability). The function $f : U \to \mathbb{R}$ is Fréchet differentiable at a point $x \in U$ if there exists a linear continuous mapping $Df(x) : X \to \mathbb{R}$ such that

$$\lim_{\|h\| \to 0} \frac{1}{\|h\|}\left[ f(x + h) - f(x) - Df(x) h \right] = 0;$$

$Df(x) \in L(X, \mathbb{R})$ is called the Fréchet derivative.

Fréchet differentiability $\implies$ Gâteaux differentiability along all directions $v \in X$, $\|v\| = 1$.

Proposition 2.14.3 (Sufficient condition for Fréchet differentiability). If all directional derivatives

$$D_v f(x), \quad \forall v \in X,\ \|v\| = 1,$$

exist at all points $x \in U$ and can be represented as

$$D_v f(x) = L(x) v$$

with a linear continuous $L(x) : X \to \mathbb{R}$, and the mapping

$$U \ni x \mapsto L(x) \in L(X, \mathbb{R})$$

is continuous (in the operator norm), then $f : U \to \mathbb{R}$ is also Fréchet differentiable at all points $x \in U$ and $Df(x) = L(x)$.

Proposition 2.14.4 (Necessary condition for extrema). If $f$ has a local extremum at $x \in U$, then $D_v f(x) = 0$ for every $v \in X$, $\|v\| = 1$ (provided this directional derivative exists).
