Taylor Expansions and (log)linearizing

Stéphane Dupraz

1 Mean Value Theorem

Theorem 1.1 (Mean Value Theorem in R). Let f : [a, b] → R. If f is continuous on [a, b] and differentiable on (a, b), then there exists z ∈ (a, b) such that:

f(b) − f(a) = f'(z)(b − a)

Proof. The proof is in two steps. First we prove the result—called Rolle’s theorem—in the particular case where f(b) = f(a). In this case we want to find z such that f'(z) = 0: a critical point. Since f is continuous on the compact [a, b], f has a maximum and a minimum on [a, b] by the Weierstrass theorem. To conclude, we just need to make sure that a maximum or a minimum is attained in the interior (a, b). There are three cases. First, if f is constant over [a, b], then f' = 0 and any z will do. Second, if there exists some x such that f(x) > f(a), then there is an interior maximum. Finally, if there exists some x such that f(x) < f(a), then there is an interior minimum. For the general case, the trick is to define g(t) = (f(b) − f(a))t − (b − a)f(t). It satisfies g(b) = g(a), so we can apply Rolle’s theorem to g: there exists z ∈ (a, b) such that g'(z) = f(b) − f(a) − (b − a)f'(z) = 0. QED.
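For instance, take f(x) = x² on [0, 2]: f(2) − f(0) = 4 and b − a = 2, so the theorem asserts the existence of z ∈ (0, 2) with f'(z) = 2z = 2, namely z = 1.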

The mean value theorem has a straightforward extension to functions from R^n to R.

Theorem 1.2 (Mean Value Theorem in R^n). Let f : S → R, S an open subset of R^n. If f is differentiable, then for any x, y ∈ S such that [x, y] ⊂ S, there exists z ∈ (x, y) (that is, z = λx + (1 − λ)y for some λ ∈ (0, 1)) such that:

f(x) − f(y) = f'(z)(x − y)

Proof. Just define g(λ) = f(λx + (1 − λ)y) for λ ∈ [0, 1] and apply the mean value theorem for functions from R to R: g(1) − g(0) = g'(λ∗) for some λ∗ ∈ (0, 1), which by the chain rule is exactly the statement with z = λ∗x + (1 − λ∗)y. QED.

2 Taylor Expansions

We start with kth-order Taylor expansions for functions from R to R, then consider first- and second-order Taylor expansions for functions from R^n to R.

2.1 Taylor formula for functions from R to R

We have seen that we can look at the derivative of a function f at a point x as providing the best affine approximation of f around x. We say that we linearize the function f around x. A Taylor expansion of order k of f at x generalizes this notion by looking at the best approximation of f around x by a polynomial of order k. There exist variants of the Taylor theorem, depending on the precise meaning of “best approximation” (some require f to be k + 1 times differentiable, others only k times differentiable). Here we state the Taylor-Lagrange theorem, whose proof relies on the mean value theorem.

Theorem 2.1 (Taylor-Lagrange theorem for functions from R to R). Let f : [a, b] → R. If f is k + 1 times differentiable on [a, b], then for all x, y ∈ [a, b], there exists z ∈ (x, y) such that:

f(y) = f(x) + \sum_{i=1}^{k} [f^{(i)}(x)/i!] (y − x)^i + [f^{(k+1)}(z)/(k + 1)!] (y − x)^{k+1}

We call the term f(x) + \sum_{i=1}^{k} [f^{(i)}(x)/i!] (y − x)^i the Taylor expansion of order k of f at x. We often note the result less rigorously as:

f(y) ≈ f(x) + \sum_{i=1}^{k} [f^{(i)}(x)/i!] (y − x)^i

Proof. The proof is by induction on k. The base case k = 0 is the mean value theorem. The induction step relies on the mean-value theorem. We do not go into the details here.
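For instance, for f(y) = e^y around x = 0 with k = 2, the theorem gives e^y = 1 + y + y²/2 + [e^z/3!] y³ for some z between 0 and y. The order-2 Taylor expansion 1 + y + y²/2 evaluated at y = 0.1 gives 1.105, against an exact value e^{0.1} ≈ 1.10517.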

2.2 Second-order Taylor approximations for functions from R^n to R

For functions of n variables, we restrict to first- and second-order Taylor expansions. We have already seen first-order Taylor expansions—differentiation/linearization. Stated not too rigorously, the second-order Taylor expansion of a twice differentiable function from R^n to R is

f(y) ≈ f(x) + f'(x)(y − x) + (1/2)(y − x)'f''(x)(y − x)

≈ f(x) + \sum_{i=1}^{n} f'_i(x)(y_i − x_i) + (1/2) \sum_{i=1}^{n} \sum_{j=1}^{n} f''_{ij}(x)(y_i − x_i)(y_j − x_j)
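For instance, take f(x_1, x_2) = x_1 x_2 around x = (1, 1): the gradient is (x_2, x_1) = (1, 1) and the Hessian has zeros on the diagonal and ones off it, so f(y) ≈ 1 + (y_1 − 1) + (y_2 − 1) + (y_1 − 1)(y_2 − 1)—which is in fact exact here, since f is itself a polynomial of degree 2.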

The main application of first- and second-order Taylor approximations you will see will be in solving Dynamic Stochastic General Equilibrium (DSGE) models in macroeconomics. The perturbation method solves numerically for the equilibrium of an economy defined by a set of difference equations by first taking a first-order or second-order approximation to these equations around the steady state.

3 Linearizing

We now take a more applied turn on first-order approximations: linearization.

3.1 Linearizing a function

As we have seen, to get the first-order Taylor expansion of a function f at a point x∗—to linearize the function f at x∗—we simply need to calculate its derivative at x∗, f'(x∗). Then the approximation is f(x) = f(x∗) + f'(x∗)(x − x∗). Defining dx ≡ x − x∗ and df(x) ≡ f(x) − f(x∗), we can note this as:

df(x) = f'(x∗)dx = f'_1(x∗)dx_1 + f'_2(x∗)dx_2 + ... + f'_n(x∗)dx_n

and see it as relating the deviation of f(x) from f(x∗) to the deviation of x from x∗. This is most useful to linearize equations. Consider an equation of the form:

f(x1, ..., xn) = 0

We can differentiate both sides at x∗ to get:

f'(x∗)dx = f'_1(x∗)dx_1 + f'_2(x∗)dx_2 + ... + f'_n(x∗)dx_n = 0

This turns the equation that relates the variables x_1, ..., x_n into a linear equation that relates the deviations of each variable, dx_1, ..., dx_n.

3.2 Practical rules

The rules of differentiation can be easily expressed in those terms. The linearity of the differentiation operator and the chain rule translate into:

1. d(x + y) = dx + dy.

2. d(λx) = λdx.

3. df(g(x)) = f'(g(x∗))dg(x) = f'(g(x∗))g'(x∗)dx.

For example, to linearize a resource constraint:

f(K_t) = C_t + K_{t+1} − (1 − δ)K_t

around C_t = C∗ and K_{t+1} = K_t = K∗:

f'(K∗)dK_t = dC_t + dK_{t+1} − (1 − δ)dK_t
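Rearranged, the linearized constraint reads dK_{t+1} = (f'(K∗) + 1 − δ)dK_t − dC_t: a linear difference equation in the deviations from steady state.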

4 Loglinearizing

In economics, it is often more relevant to focus on relative deviations (or percentage deviations) rather than on absolute deviations (or level deviations). For instance, knowing that “consumption is 5% below its steady-state value” is more meaningful than knowing that “consumption is $100 below its steady-state value”, the meaning of which depends on how high steady-state consumption is. We are more interested in the relative deviations of variables, which we will denote with a hat:

x̂ = dx/x∗ = (x − x∗)/x∗

But it turns out that at first order1:

x̂ = dx/x∗ = dx × ln'(x∗) = d ln(x)

where the last term means (as for any function): d ln(x) = ln(x) − ln(x∗). For this reason, turning a function—a relationship between inputs x_1, ..., x_n and output y = f(x)—into a linear relationship between the relative variations in x, x̂_1, ..., x̂_n, and the relative variation in y, ŷ, is called loglinearizing the function. (Turning an equation—a relationship between variables x_1, ..., x_n—into a linear equation that relates the relative variations of each variable, x̂_1, ..., x̂_n, is called loglinearizing the equation.)
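For instance, if steady-state consumption is C∗ = 200 and current consumption is C = 210, then Ĉ = 10/200 = 5%, while ln(210) − ln(200) = ln(1.05) ≈ 4.88%: the relative deviation and the log deviation coincide up to first order.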

4.1 Formally

Formally, to loglinearize a function f(x) instead of linearizing it, we first do some changes of variables.

• Instead of the variation dxi, we want the variation d ln(xi), so we define yi = ln(xi).

• Instead of the variation df(x), we want the variation d ln(f(x)), so we consider the function ln(f(x)).

This way, we consider the function:

g(y) = ln(f(e^{y_1}, ..., e^{y_n}))

1 You can also see it through: d ln(x) = ln(x) − ln(x∗) = ln(x/x∗) = ln((x − x∗)/x∗ + 1) ≈ (x − x∗)/x∗ ≡ x̂.

Linearizing:

dg(y) = \sum_{i=1}^{n} [f'_i(e^{y∗_1}, ..., e^{y∗_n}) / f(e^{y∗_1}, ..., e^{y∗_n})] e^{y∗_i} dy_i

d ln(f(x)) = \sum_{i=1}^{n} [f'_i(x∗)/f(x∗)] x∗_i d ln(x_i)

Definition 4.1. The elasticity of a function f at x∗ with respect to x_i is:

ε^i_f(x∗) = [f'_i(x∗)/f(x∗)] x∗_i = ∂ ln(f(x∗))/∂ ln(x_i)

It expresses the percentage change of f(x) for a 1% increase in x_i (at x∗).
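For instance, for a Cobb-Douglas function f(A, K, L) = AK^α L^{1−α}, the elasticity with respect to K is [f'_K(x∗)/f(x∗)]K∗ = α at every point; similarly, the elasticity with respect to L is 1 − α and the elasticity with respect to A is 1.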

We finally get:

\widehat{f(x)} = ε^1_f(x∗) x̂_1 + ... + ε^n_f(x∗) x̂_n

In practice, we can rely on two approaches to loglinearize a function or an equation.

4.2 First Approach and Practical Rules

In practice, we can use the following method:

Approach 1. To loglinearize a function f:

1. Take the log of f.

2. Linearize ln(f), remembering d ln(x) = x̂.

For instance, let us loglinearize a Cobb-Douglas production function Y = AK^α L^{1−α} at any point:

ln(Y ) = ln(A) + α ln(K) + (1 − α) ln(L)

d ln(Y ) = d ln(A) + αd ln(K) + (1 − α)d ln(L)

Ŷ = Â + αK̂ + (1 − α)L̂

This example is straightforward because of the following practical rules for loglinearization, which are simply the translation of the corresponding rules for linearization:

1. \widehat{xy} = x̂ + ŷ

2. \widehat{x^α} = α x̂

3. \widehat{f(g(x))} = ε_f(g(x∗)) ε_g(x∗) x̂.

Power functions have constant elasticities; they are to loglinearization what linear functions, which have a constant derivative, are to linearization.
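For instance, combining rules 1 and 2, \widehat{x^2 y^{1/2}} = 2x̂ + (1/2)ŷ: no steady-state constants appear, which is what makes products and powers so convenient to loglinearize.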

4.3 Second approach to deal with sums

The previous rules imply that it is straightforward to loglinearize a function that contains only products and power functions using the first approach. However, with the first approach it is not so easy to deal with sums. In contrast, it was easy to deal with sums when linearizing. Hence, when a function f includes sums, it is easier to loglinearize it using the alternative method:

Approach 2. To loglinearize a function f:

1. Linearize f.

2. Replace absolute deviations dx by x∗x̂.

For instance, to loglinearize the resource constraint above:

f'(K∗)dK_t = dC_t + dK_{t+1} − (1 − δ)dK_t

f'(K∗)K∗ K̂_t = C∗ Ĉ_t + K∗ K̂_{t+1} − (1 − δ)K∗ K̂_t

f'(K∗) K̂_t = (C∗/K∗) Ĉ_t + K̂_{t+1} − (1 − δ) K̂_t
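For instance, if production were Cobb-Douglas in capital, f(K_t) = AK_t^α (a particular functional form used here only for illustration), then f'(K∗) = αY∗/K∗ with Y∗ = AK∗^α, and the last equation becomes α(Y∗/K∗)K̂_t = (C∗/K∗)Ĉ_t + K̂_{t+1} − (1 − δ)K̂_t.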

4.4 Mixing the two approaches

Which of the two approaches should we use in practice? We have seen that when a function involves products and power functions (for instance the Cobb-Douglas production function), taking the log and linearizing is the quickest way to go. But when the function is a sum (for instance a resource or budget constraint), then it is easier to linearize and replace dx = x∗x̂. Now, most functions are sums of products or products of sums. Consider for instance the following resource constraint, where the production function is Cobb-Douglas:

AK^α L^{1−α} = C + I

It is a sum, but with one of its terms (on the LHS) being itself a product. Start by seeing it as a sum—which it is—so linearize, remembering dx = x∗x̂:

(A∗K∗^α L∗^{1−α}) \widehat{AK^α L^{1−α}} = C∗ Ĉ + I∗ Î

Then treat the term \widehat{AK^α L^{1−α}} with the rules on products and power functions to get:

(A∗K∗^α L∗^{1−α})(Â + αK̂ + (1 − α)L̂) = C∗ Ĉ + I∗ Î

Â + αK̂ + (1 − α)L̂ = [C∗/(A∗K∗^α L∗^{1−α})] Ĉ + [I∗/(A∗K∗^α L∗^{1−α})] Î

Alternatively, you may face a product some of whose factors are sums. Then treat it as a product (take the log and linearize), and then deal with the sums inside each factor as above. Finally, if you run into a function whose functional form is not made explicit, use the general formula with elasticities.
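For instance (a made-up example for illustration), to loglinearize X = Z(C + I): take logs, ln(X) = ln(Z) + ln(C + I), and linearize to get X̂ = Ẑ + [C∗/(C∗ + I∗)]Ĉ + [I∗/(C∗ + I∗)]Î, where the weights come from linearizing the sum inside the log.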

4.5 When loglinearizing equations, reduce the number of constants

When we loglinearize a function f(x_1, ..., x_n) at x∗, constants may appear that depend on x∗_1, ..., x∗_n. If we are loglinearizing an equation f(x_1, ..., x_n) = 0, those constants are linked through the equation at x∗, f(x∗_1, ..., x∗_n) = 0. For instance, in the previous example, we have two constants that depend on x∗. However, they are linked through:

A∗K∗^α L∗^{1−α} = C∗ + I∗

1 = C∗/(A∗K∗^α L∗^{1−α}) + I∗/(A∗K∗^α L∗^{1−α})

Noting s_C = C∗/(A∗K∗^α L∗^{1−α}) the share of consumption in GDP, the loglinearized equation becomes:

Â + αK̂ + (1 − α)L̂ = s_C Ĉ + (1 − s_C) Î

There is some arbitrariness in the choice of the constants to keep.

4.6 A remark on interest, inflation, ... rates

Just a remark to finish: many authors use a different definition of the hat variable for interest and inflation rates. What they note r̂, which could be taken as the relative deviation of the net interest rate, is actually the relative deviation of the gross interest rate, \widehat{1 + r}. The connection between the two is:

\widehat{1 + r} = d(1 + r)/(1 + r∗) = dr/(1 + r∗) = [r∗/(1 + r∗)] r̂
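For instance, with r∗ = 4% and r_t = 5%, the relative deviation of the net rate is r̂_t = 0.01/0.04 = 25%, while \widehat{1 + r_t} = 0.01/1.04 ≈ 0.96%—roughly the one-percentage-point deviation itself.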

The advantage of doing so is that it allows us to consider the case of a steady-state value of the net rate equal to zero, which might be of particular interest for the nominal interest rate or the inflation rate. Indeed, we cannot loglinearize the net interest rate around a steady-state value of zero, but we can loglinearize the gross interest rate around 1. Note that in this case, the relative deviation of the gross interest rate is the net interest rate itself:

\widehat{1 + r_t} = dr_t = r_t
