QE Optimization, WS 2016/17 Part 2. Differential Calculus for Functions of n Variables
(about 5 Lectures)
Supporting Literature: Angel de la Fuente, “Mathematical Methods and Models for Economists”, Chapter 2
C. Simon, L. Blume, “Mathematics for Economists”
Contents
2 Differential Calculus for Functions of n Variables 2
 2.1 Partial Derivatives 2
 2.2 Directional Derivatives 5
 2.3 Higher Order Partials 6
 2.4 Total Differentiability 7
 2.5 Chain Rule 13
 2.6 Taylor’s Formula 16
 2.7 Implicit Functions 20
 2.8 Inverse Functions 26
 2.9 Unconstrained Optimization 27
 2.10 First-Order Conditions 28
 2.11 Second-Order Conditions 29
 2.12 A Rough Guide: How to Find the Global Maxima/Minima 32
 2.13 Envelope Theorems 34
 2.14 Gâteaux and Fréchet Differentials 38
In this chapter we consider functions
$$f : \mathbb{R}^n \to \mathbb{R}$$
of $n \ge 1$ variables (multivariate functions). Such functions are the basic building blocks of formal economic models.
2 Differential Calculus for Functions of n Variables
2.1 Partial Derivatives
Everywhere below, $U \subseteq \mathbb{R}^n$ will be an open set in the space $(\mathbb{R}^n, \|\cdot\|)$ (with the Euclidean norm $\|\cdot\|$) and $f : U \to \mathbb{R}$,
$$U \ni (x_1, \dots, x_n) \mapsto f(x_1, \dots, x_n) \in \mathbb{R}.$$
Definition 2.1.1. The function f is partially differentiable with respect to the i-th
coordinate (or variable) xi, at a given point x ∈ U, if the following limit exists
$$D_i f(x) := \lim_{h \to 0} \frac{f(x + h e_i) - f(x)}{h} = \lim_{h \to 0} \frac{1}{h}\bigl[f(x_1, \dots, x_{i-1}, x_i + h, x_{i+1}, \dots, x_n) - f(x_1, \dots, x_n)\bigr],$$
where $e_i := (0, \dots, 0, 1, 0, \dots, 0)$, with the $1$ in the $i$-th position, is the $i$-th standard basis vector in $\mathbb{R}^n$.
Since $U$ is open, there exists an open ball $B_\varepsilon(x) \subseteq U$. In the definition of $\lim_{h \to 0}$ one considers only “small” $h$ with $|h| < \varepsilon$.
$D_i f(x)$ is called the $i$-th partial derivative of $f$ at the point $x$.
Notation: We also write $D_{x_i} f(x)$, $\partial_i f(x)$, $\partial f(x)/\partial x_i$.
The partial derivative $D_i f(x)$ can be interpreted as a usual derivative w.r.t. the $i$-th coordinate, whereby all the other $n - 1$ coordinates are kept fixed. Namely, in the $\varepsilon$-neighbourhood of $x_i$, let us define a function
$$(x_i - \varepsilon, x_i + \varepsilon) \ni \xi \mapsto g_i(\xi) := f(x_1, \dots, x_{i-1}, \xi, x_{i+1}, \dots, x_n).$$
Then by Definition 2.1.1,
$$D_i f(x) := \lim_{h \to 0} \frac{g_i(x_i + h) - g_i(x_i)}{h} = g_i'(x_i).$$
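The difference quotient in Definition 2.1.1 can be illustrated numerically: fixing all coordinates but the $i$-th and differencing with a small step $h$. The function $f$ below is a hypothetical example chosen for this sketch, not one from the text, and a symmetric quotient is used for better accuracy than the one-sided quotient in the definition.

```python
# Approximate the i-th partial derivative D_i f(x) by a difference quotient.
# (Illustrative sketch; f is a made-up example function.)

def partial_derivative(f, x, i, h=1e-6):
    """Symmetric difference quotient approximating D_i f(x)."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)

# Example: f(x1, x2) = x1^2 * x2, so D_1 f = 2 x1 x2 and D_2 f = x1^2.
f = lambda x: x[0] ** 2 * x[1]
x = [3.0, 2.0]
d1 = partial_derivative(f, x, 0)   # close to 2 * 3 * 2 = 12
d2 = partial_derivative(f, x, 1)   # close to 3^2 = 9
```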
Definition 2.1.2. A function $f : U \to \mathbb{R}$ is called partially differentiable if $D_i f(x)$ exists for all $x \in U$ and all $1 \le i \le n$. Furthermore, $f$ is called continuously partially differentiable if all partial derivatives
$$D_i f : U \to \mathbb{R}, \qquad 1 \le i \le n,$$
are continuous functions.
Example 2.1.3.
(i) Distance function
$$r(x) := |x| = \sqrt{x_1^2 + \dots + x_n^2}, \qquad x \in \mathbb{R}^n.$$
Let us show that $r$ is partially differentiable at all points $x \in \mathbb{R}^n \setminus \{0\}$. Define
$$\xi \mapsto g_i(\xi) := \sqrt{x_1^2 + \dots + \xi^2 + \dots + x_n^2} \in \mathbb{R}.$$
Use the chain rule for the derivatives of real-valued functions (cf. standard courses in Calculus):
$$\frac{\partial r}{\partial x_i}(x) = \frac{1}{2} \cdot \frac{2 x_i}{\sqrt{x_1^2 + \dots + x_n^2}} = \frac{x_i}{r(x)}.$$
Generalization: Let $f : \mathbb{R}_+ \to \mathbb{R}$ be differentiable; then $\mathbb{R}^n \ni x \mapsto f(r(x))$ is partially differentiable at all points $x \in \mathbb{R}^n \setminus \{0\}$ and
$$\frac{\partial}{\partial x_i} f(r) = f'(r) \cdot \frac{\partial r}{\partial x_i} = f'(r) \cdot \frac{x_i}{r}.$$
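The identity $\partial r / \partial x_i = x_i / r(x)$ can be checked numerically at any nonzero point; the point below, with $r(x) = 3$, is an arbitrary choice for this sketch.

```python
# Numerical check of dr/dx_i = x_i / r(x) for the distance function.
import math

def r(x):
    return math.sqrt(sum(t * t for t in x))

def partial(F, x, i, h=1e-6):
    """Symmetric difference quotient for the i-th partial of F at x."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (F(xp) - F(xm)) / (2 * h)

x = [1.0, 2.0, 2.0]          # r(x) = sqrt(1 + 4 + 4) = 3
errors = [abs(partial(r, x, i) - x[i] / r(x)) for i in range(3)]
```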
(ii) Cobb–Douglas production function with n inputs
$$f(x) := x_1^{\alpha_1} x_2^{\alpha_2} \cdots x_n^{\alpha_n} \qquad \text{for } \alpha_i > 0,\ 1 \le i \le n,$$
defined on
$$U := \{(x_1, \dots, x_n) \mid x_i > 0,\ 1 \le i \le n\}.$$
Calculate the so-called marginal-product function of input $i$:
$$\frac{\partial f}{\partial x_i}(x) = \alpha_i x_1^{\alpha_1} \cdots x_i^{\alpha_i - 1} \cdots x_n^{\alpha_n} = \alpha_i \frac{f(x)}{x_i}.$$
Mathematicians would say: multiplicative functions with separable variables, polynomials. Economists are especially interested in the case $\alpha_i \in (0, 1)$. This is an example of a homogeneous function of order (degree)
$$a = \alpha_1 + \dots + \alpha_n,$$
which means
$$f(\lambda x) = \lambda^a f(x), \qquad \forall \lambda > 0,\ x \in U.$$
Moreover, the Cobb–Douglas function is log-linear:
$$\log f(x) = \alpha_1 \log x_1 + \dots + \alpha_n \log x_n.$$
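Both Cobb–Douglas properties, the marginal product $\alpha_i f(x)/x_i$ and homogeneity of degree $a = \alpha_1 + \dots + \alpha_n$, lend themselves to a quick numerical check; the exponents and the point $x$ below are hypothetical choices for this sketch.

```python
# Cobb-Douglas sketch: marginal products and homogeneity of degree a.
alphas = [0.3, 0.5]                  # hypothetical exponents in (0, 1)

def f(x):
    p = 1.0
    for xi, ai in zip(x, alphas):
        p *= xi ** ai
    return p

x = [2.0, 5.0]
h = 1e-6
mp_errors = []
for i in range(2):
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    approx = (f(xp) - f(xm)) / (2 * h)          # difference quotient
    mp_errors.append(abs(approx - alphas[i] * f(x) / x[i]))

lam = 3.0
a = sum(alphas)                       # degree of homogeneity
hom_error = abs(f([lam * t for t in x]) - lam ** a * f(x))
```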
(iii) Quasilinear utility function:
$$f(m, x) := m + u(x)$$
with $m \in \mathbb{R}_+$ (i.e., $m \ge 0$) and some $u : \mathbb{R} \to \mathbb{R}$. Then
$$\frac{\partial f}{\partial m} = 1, \qquad \frac{\partial f}{\partial x} = u'(x).$$
(iv) Constant elasticity of substitution (CES) production function with $n$ inputs, which also describes aggregate consumption for $n$ types of goods.
$$f(x_1, \dots, x_n) := (\delta_1 x_1^\alpha + \dots + \delta_n x_n^\alpha)^{1/\alpha},$$
with $\alpha > 0$, $\delta_i > 0$ and $\sum_{1 \le i \le n} \delta_i = 1$, defined on the open domain
$$U := \{(x_1, \dots, x_n) \mid x_i > 0,\ 1 \le i \le n\}.$$
We calculate the marginal-product function:
$$\frac{\partial f}{\partial x_i}(x) = \frac{1}{\alpha}(\delta_1 x_1^\alpha + \dots + \delta_n x_n^\alpha)^{\frac{1}{\alpha} - 1} \cdot \alpha \delta_i x_i^{\alpha - 1} = \delta_i x_i^{\alpha - 1} (\delta_1 x_1^\alpha + \dots + \delta_n x_n^\alpha)^{\frac{1 - \alpha}{\alpha}}.$$
Note that $f$ is homogeneous of degree 1: $f(\lambda x) = \lambda f(x)$.
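The CES marginal product and the degree-one homogeneity can likewise be checked numerically; the parameters $\alpha$, $\delta_i$ and the point $x$ below are hypothetical choices for this sketch.

```python
# CES sketch: marginal product and homogeneity of degree one.
alpha = 0.5
deltas = [0.4, 0.6]                   # delta_i > 0 with sum 1

def f(x):
    s = sum(d * xi ** alpha for d, xi in zip(deltas, x))
    return s ** (1.0 / alpha)

x = [4.0, 9.0]
s = sum(d * xi ** alpha for d, xi in zip(deltas, x))
h = 1e-6
ces_errors = []
for i in range(2):
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    approx = (f(xp) - f(xm)) / (2 * h)
    exact = deltas[i] * x[i] ** (alpha - 1) * s ** ((1 - alpha) / alpha)
    ces_errors.append(abs(approx - exact))

ces_hom = abs(f([3.0 * t for t in x]) - 3.0 * f(x))   # f(3x) = 3 f(x)
```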
Definition 2.1.4. Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$ be partially differentiable. Then the vector
$$\nabla f(x) := \mathrm{grad}\, f(x) := \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right) \in \mathbb{R}^n$$
is called the gradient of $f$ at point $x \in U$.
Example 2.1.5.
(i) Distance function $r(x)$:
$$\mathrm{grad}\, r(x) = \frac{x}{r(x)} \in \mathbb{R}^n, \qquad x \in U := \mathbb{R}^n \setminus \{0\}.$$
(ii) Let $f, g : U \to \mathbb{R}$ be partially differentiable. Then
$$\nabla (f \cdot g) = f \cdot \nabla g + g \cdot \nabla f.$$
Proof. This follows from the product rule
$$\frac{\partial}{\partial x_i}(f g) = f \frac{\partial g}{\partial x_i} + g \frac{\partial f}{\partial x_i}.$$
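The product rule for gradients can be verified numerically coordinate by coordinate; the functions $f$, $g$ and the point $x$ below are hypothetical examples for this sketch.

```python
# Numerical check of the product rule grad(f*g) = f*grad(g) + g*grad(f).

def grad(F, x, h=1e-6):
    """Gradient of F at x via symmetric difference quotients."""
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((F(xp) - F(xm)) / (2 * h))
    return out

f = lambda x: x[0] ** 2 + x[1]
g = lambda x: x[0] * x[1]
x = [1.5, -2.0]

lhs = grad(lambda y: f(y) * g(y), x)
rhs = [f(x) * dg + g(x) * df for df, dg in zip(grad(f, x), grad(g, x))]
pr_errors = [abs(a - b) for a, b in zip(lhs, rhs)]
```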
2.2 Directional Derivatives
Fix a directional vector $v \in \mathbb{R}^n$ with $|v| = 1$ (of unit length!).
Definition 2.2.1. The directional derivative of $f : U \to \mathbb{R}$ at a point $x \in U$ along the unit vector $v \in \mathbb{R}^n$ (i.e., with $|v| = 1$) is given by
$$\partial_v f(x) := D_v f(x) := \lim_{h \to 0} \frac{f(x + h v) - f(x)}{h}.$$
Remark 2.2.2.
(i) Define a new function
$$h \mapsto g_v(h) := f(x + h v).$$
If $g_v$ is differentiable at $h = 0$, then $f$ is differentiable at the point $x \in U$ along the direction $v$, and
$$D_v f(x) = g_v'(0).$$
(ii) From the above definitions it is clear that the partial derivatives are exactly the directional derivatives along the basis vectors $e_i$, $1 \le i \le n$:
$$\frac{\partial f}{\partial x_i}(x) = D_{e_i} f(x), \qquad 1 \le i \le n.$$
Example 2.2.3. Consider the “saddle” function in $\mathbb{R}^2$
$$f(x_1, x_2) := -x_1^2 + x_2^2,$$
and find $D_v f(x)$ along the direction $v := \left(\sqrt{2}/2, \sqrt{2}/2\right)$, $|v| = 1$. Define
$$g_v(h) := -\left(x_1 + h\sqrt{2}/2\right)^2 + \left(x_2 + h\sqrt{2}/2\right)^2 = -x_1^2 + x_2^2 + \sqrt{2}\, h (x_2 - x_1).$$
Then $D_v f(x) = g_v'(0) = \sqrt{2}(x_2 - x_1)$. Note that $D_v f(x) = 0$ if $x_1 = x_2$. The function $f$ has its minimum at the diagonal $x_1 = x_2$.
Relation between $\nabla f(x)$ and $D_v f(x)$:
$$D_v f(x) = \langle \nabla f(x), v \rangle_{\mathbb{R}^n} = \sum_{i=1}^{n} \partial_i f(x)\, v_i. \qquad (*)$$
Proof: this will be done later, as soon as we prove the chain rule for $\nabla f$.
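Formula $(*)$ can be sanity-checked on the saddle function of Example 2.2.3: the difference quotient and the inner product $\langle \nabla f(x), v \rangle$ should both give $\sqrt{2}(x_2 - x_1)$. The base point below is an arbitrary choice for this sketch.

```python
# Directional derivative of f(x1,x2) = -x1^2 + x2^2 along v = (s, s),
# s = sqrt(2)/2, two ways: difference quotient and <grad f, v>.
import math

f = lambda x1, x2: -x1 ** 2 + x2 ** 2
x1, x2 = 1.0, 4.0
s = math.sqrt(2) / 2
v = (s, s)
h = 1e-6

quotient = (f(x1 + h * v[0], x2 + h * v[1]) - f(x1, x2)) / h
grad = (-2 * x1, 2 * x2)                       # grad f = (-2 x1, 2 x2)
inner = grad[0] * v[0] + grad[1] * v[1]
exact = math.sqrt(2) * (x2 - x1)               # the value from Example 2.2.3
```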
2.3 Higher Order Partials
Let $f : U \to \mathbb{R}$ be partially differentiable, i.e.,
$$\exists\, \frac{\partial}{\partial x_i} f : U \to \mathbb{R}, \qquad 1 \le i \le n.$$
Analogously, for $1 \le j \le n$ we can define (if it exists)
$$\frac{\partial}{\partial x_j} \frac{\partial}{\partial x_i} f : U \to \mathbb{R}.$$
Notation:
$$\frac{\partial^2 f}{\partial x_j \partial x_i}, \quad \text{or} \quad \frac{\partial^2 f}{\partial x_i^2} \ \text{if } i = j.$$
Warning: In general,
$$\frac{\partial^2 f}{\partial x_j \partial x_i} \ne \frac{\partial^2 f}{\partial x_i \partial x_j} \quad \text{if } i \ne j.$$
Theorem 2.3.1 ((A. Schwarz); also known as Young’s theorem). Let $U \subseteq \mathbb{R}^n$ be open and $f : U \to \mathbb{R}$ be twice continuously differentiable, $f \in C^2(U)$ (i.e., all derivatives $\frac{\partial^2 f}{\partial x_j \partial x_i}$, $1 \le i, j \le n$, are continuous). Then for all $x \in U$ and $1 \le i, j \le n$
$$\frac{\partial^2 f(x)}{\partial x_j \partial x_i} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j},$$
i.e., for cross-partial derivatives the order of differentiation is irrelevant.
Example: (i) The above theorem applies:
$$f(x_1, x_2) := x_1^2 + b x_1 x_2 + x_2^2, \qquad (x_1, x_2) \in \mathbb{R}^2.$$
Counterexample: (ii) The above theorem does not apply:
$$f(x_1, x_2) := \begin{cases} x_1 x_2 \dfrac{x_1^2 - x_2^2}{x_1^2 + x_2^2}, & (x_1, x_2) \in \mathbb{R}^2 \setminus \{(0, 0)\}, \\[4pt] 0, & (x_1, x_2) = (0, 0). \end{cases}$$
We calculate
$$\frac{\partial^2 f}{\partial x_1 \partial x_2}(0, 0) = -1 \ne 1 = \frac{\partial^2 f}{\partial x_2 \partial x_1}(0, 0).$$
Reason: $f \notin C^2(\mathbb{R}^2)$.
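The two mixed partials at the origin can be approximated by differencing the first partial derivatives (a numerical sketch; the step sizes are ad-hoc choices, with the inner step much smaller than the outer one so that the first partials are accurate). The two iterated limits indeed give the distinct values $1$ and $-1$.

```python
# Mixed partials at (0,0) of the counterexample function, numerically.
def f(x1, x2):
    if x1 == 0.0 and x2 == 0.0:
        return 0.0
    return x1 * x2 * (x1 ** 2 - x2 ** 2) / (x1 ** 2 + x2 ** 2)

H = 1e-3          # outer differencing step
K = 1e-6          # inner differencing step (much smaller than H)

def d1(x1, x2):   # first partial w.r.t. x1
    return (f(x1 + K, x2) - f(x1 - K, x2)) / (2 * K)

def d2(x1, x2):   # first partial w.r.t. x2
    return (f(x1, x2 + K) - f(x1, x2 - K)) / (2 * K)

d12 = (d2(H, 0.0) - d2(-H, 0.0)) / (2 * H)   # differencing d2 in x1: ~ 1
d21 = (d1(0.0, H) - d1(0.0, -H)) / (2 * H)   # differencing d1 in x2: ~ -1
```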
Notation:
$$D_{i_k} \cdots D_{i_1} f, \qquad \frac{\partial^k f}{\partial x_{i_k} \cdots \partial x_{i_1}},$$
for any $i_1, \dots, i_k \in \{1, \dots, n\}$.
In general, for any $v \in \mathbb{R}^n$ with $|v| = 1$, we have by $(*)$ and the Cauchy–Schwarz inequality
$$|D_v f(x)| \le |\nabla f(x)|_{\mathbb{R}^n}.$$
Geometrical interpretation of $\nabla f$: Assume $\nabla f(x) \ne 0$ and define the normalized vector
$$v := \frac{\nabla f(x)}{|\nabla f(x)|_{\mathbb{R}^n}} \in \mathbb{R}^n.$$
Then, for this $v$,
$$D_v f(x) = \langle \nabla f(x), v \rangle_{\mathbb{R}^n} = |\nabla f(x)|_{\mathbb{R}^n}.$$
In other words, the gradient $\nabla f(x)$ of $f$ at point $x$ points in the direction in which the slope of $f$ is the largest, i.e., the direction of steepest ascent.
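The steepest-ascent property can be illustrated by sampling many unit directions $v$ and comparing $\langle \nabla f(x), v \rangle$ with $|\nabla f(x)|$; the function $f$ and the point $x$ below are hypothetical choices for this sketch.

```python
# The maximal directional derivative over unit vectors equals |grad f(x)|.
import math

x = (2.0, 1.0)
grad = (2 * x[0], 3.0)            # grad f(x) for f(x1, x2) = x1^2 + 3 x2
norm = math.hypot(*grad)          # |grad f(x)| = 5 here

# Sample unit vectors v = (cos t, sin t) and take the largest <grad f, v>.
best = max(grad[0] * math.cos(t) + grad[1] * math.sin(t)
           for t in (2 * math.pi * k / 3600 for k in range(3600)))
# best never exceeds norm, and approaches it as the sampling gets finer.
```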
2.4 Total Differentiability
Intuition: repetition of the 1-dim case.
Definition 2.4.1. A function $g : \mathbb{R} \to \mathbb{R}$ is differentiable at a point $x \in \mathbb{R}$ if the following limit exists:
$$\lim_{h \to 0} \frac{g(x + h) - g(x)}{h} =: g'(x) \in \mathbb{R}. \qquad (*)$$
Geometrical picture: Locally, i.e., for small $|h| \to 0$, we can approximate the values of $g(x + h)$ by the linear function $g(x) + a h$ with $a := g'(x) \in \mathbb{R}$. Indeed, the limit $(*)$ can be rewritten as
$$\lim_{h \to 0} \frac{g(x + h) - [g(x) + a h]}{h} = 0.$$
The approximation error $E_g(h)$ equals
$$E_g(h) := g(x + h) - [g(x) + a h] \in \mathbb{R},$$
and it goes to zero with $h$:
$$\lim_{h \to 0} \frac{E_g(h)}{h} = 0, \quad \text{i.e.,} \quad \lim_{h \to 0} \frac{|E_g(h)|}{|h|} = 0.$$
The latter can be written as
$$E_g(h) = o(h) \ \text{as } h \to 0, \qquad g(x + h) \sim g(x) + a h \ \text{as } h \to 0.$$
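The $o(h)$ behaviour of the error term is easy to observe numerically: for a concrete $g$ the ratio $E_g(h)/h$ shrinks as $h$ does. The function $g(x) = x^3$ and the point $x = 2$ below are hypothetical choices for this sketch.

```python
# For g(x) = x^3 at x = 2 with a = g'(2) = 12, the error
# E_g(h) = g(x+h) - g(x) - a h is o(h): E_g(h)/h shrinks with h.
g = lambda t: t ** 3
x, a = 2.0, 12.0

ratios = []
for h in [1e-1, 1e-2, 1e-3]:
    E = g(x + h) - g(x) - a * h
    ratios.append(E / h)
# Here E_g(h) = 6 h^2 + h^3, so the ratios behave like 6 h.
```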
Summary: $g : \mathbb{R} \to \mathbb{R}$ is differentiable at $x \in \mathbb{R}$ if, for points $x + h$ sufficiently close to $x$, the values $g(x + h)$ admit a “nice” approximation by a linear function $g(x) + a h$, with an error
$$E_g(h) := g(x + h) - g(x) - a h$$
that goes to zero “faster” than $h$ itself, i.e.,
$$\lim_{h \to 0} \frac{|E_g(h)|}{|h|} = 0.$$
Now we extend the notion of differentiability to functions $f : \mathbb{R}^n \to \mathbb{R}^m$, for arbitrary $n, m \ge 1$:
$$\mathbb{R}^n \ni x = (x_1, \dots, x_n) \mapsto f(x) = \begin{pmatrix} f_1(x) \\ \vdots \\ f_m(x) \end{pmatrix} = \begin{pmatrix} f_1(x_1, \dots, x_n) \\ \vdots \\ f_m(x_1, \dots, x_n) \end{pmatrix} \in \mathbb{R}^m.$$
Definition 2.4.2. Let $U \subseteq \mathbb{R}^n$ be open, and let $f : U \to \mathbb{R}^m$. The function $f$ is (totally) differentiable at a point $x \in U$ if there exists a linear mapping
$$A : \mathbb{R}^n \to \mathbb{R}^m$$
such that in some neighbourhood of $x$ (i.e., for small enough $h \in \mathbb{R}^n$ with $|h| < \varepsilon$) there is a representation
$$f(x + h) = f(x) + A h + E_f(h), \qquad (**)$$
where the error term
$$E_f(h) := f(x + h) - f(x) - A h \in \mathbb{R}^m$$
obeys
$$\lim_{h \to 0} \frac{\|E_f(h)\|_{\mathbb{R}^m}}{\|h\|_{\mathbb{R}^n}} = 0.$$
The derivative $Df(x)$ of $f$ at point $x$ is the matrix $A$.
Remark 2.4.3.
(i) Each linear map $A : \mathbb{R}^n \to \mathbb{R}^m$ can be represented by the $m \times n$ matrix (with $m$ rows and $n$ columns)
$$(a_{ij})_{\substack{1 \le i \le m \\ 1 \le j \le n}} = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix},$$
which describes the action of the linear map $A$ on the canonical basis $(e_j)_{1 \le j \le n}$ in $\mathbb{R}^n$, where $e_j = (0, \dots, 0, 1, 0, \dots, 0)^t$ (a vertical column, i.e., an $n \times 1$ matrix, with the $1$ in the $j$-th position):
$$A e_j = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix} \in \mathbb{R}^m, \qquad 1 \le j \le n.$$
Below we always identify the linear mapping $A : \mathbb{R}^n \to \mathbb{R}^m$ with this matrix, which acts as
$$A h := \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \\ \vdots \\ h_n \end{pmatrix} = \begin{pmatrix} a_{11} h_1 + \dots + a_{1n} h_n \\ a_{21} h_1 + \dots + a_{2n} h_n \\ \vdots \\ a_{m1} h_1 + \dots + a_{mn} h_n \end{pmatrix} \in \mathbb{R}^m,$$
whereby the vector $h = (h_1, \dots, h_n) \in \mathbb{R}^n$ is considered as an $n \times 1$ matrix. The identity $(**)$ can be rewritten in coordinate form as
$$f_i(x + h) = f_i(x) + \sum_{j=1}^{n} a_{ij} h_j + E_i(h), \qquad i = 1, \dots, m,$$
with
$$\lim_{h \to 0} \frac{|E_i(h)|}{\|h\|_{\mathbb{R}^n}} = 0.$$
It is obvious that the vector-valued function $f : U \to \mathbb{R}^m$ is differentiable at a point $x \in U$ if and only if all coordinate mappings $f_i : U \to \mathbb{R}$, $1 \le i \le m$, are differentiable.
(ii) Symbolically we write
$$E_f(h) = o(\|h\|_{\mathbb{R}^n}) \qquad \text{as } h \to 0.$$
(iii) Let $f : \mathbb{R}^n \to \mathbb{R}$ (i.e., $m = 1$). Then