
101A Section Notes
GSI: David Albouy

Notes on Calculus and Optimization

1 Basic Calculus

1.1 Definition of a Derivative

Let f(x) be some function of x. The derivative of f, if it exists, is given by

    df(x)/dx = lim_{h→0} [f(x + h) − f(x)] / h    (Definition of Derivative)

although often this definition is hard to apply directly. It is common to write f'(x), or df/dx to be shorter, or, if y = f(x), dy/dx for the derivative of y with respect to x.

1.2 Calculus Rules

Here are some handy formulas which can be derived using the definitions, where f(x) and g(x) are functions of x and k is a constant which does not depend on x.

    d/dx [k f(x)] = k df(x)/dx    (Constant Rule)

    d/dx [f(x) + g(x)] = df(x)/dx + dg(x)/dx    (Sum Rule)

    d/dx [f(x) g(x)] = [df(x)/dx] g(x) + f(x) [dg(x)/dx]    (Product Rule)

    d/dx [f(x)/g(x)] = { [df(x)/dx] g(x) − [dg(x)/dx] f(x) } / [g(x)]^2    (Quotient Rule)

    d/dx f[g(x)] = {d f[g(x)]/dg} · dg(x)/dx    (Chain Rule)

For specific forms of f the following rules are useful in economics:

    d/dx x^k = k x^{k−1}    (Power Rule)

    d/dx e^x = e^x    (Exponent Rule)

    d/dx ln x = 1/x    (Log Rule)

Finally, assuming that we can invert y = f(x) by solving for x in terms of y, so that x = f^{−1}(y), the following rule applies:

    df^{−1}(y)/dy = 1 / [df/dx (f^{−1}(y))]    (Inverse Rule)

Example 1 Let y = f(x) = e^{x/2}. Then using the exponent rule and the chain rule, where g(x) = x/2, we get df(x)/dx = d/dx e^{x/2} = [d/d(x/2) e^{x/2}] · [d/dx (x/2)] = e^{x/2} · (1/2) = (1/2) e^{x/2}. The inverse of f can be gotten by solving for x in terms of y: y = e^{x/2} ⇒ ln y = ln(e^{x/2}) = (x/2) ln e = x/2 ⇒ x = 2 ln y, using the facts from algebra that ln e^u = u ln e and that ln e = 1. The derivative of f^{−1}(y) can be found directly as d/dy (2 ln y) = 2 d/dy (ln y) = 2 · (1/y) = 2/y, or indirectly through the Inverse Rule as df^{−1}(y)/dy = 1 / [(1/2) e^{x/2}] = 1 / [(1/2) e^{2 ln y / 2}] = 2 / e^{ln y} = 2/y, using the fact that e^{ln y} = y.
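For readers who like to check computations, here is a minimal sketch of Example 1, assuming Python with the sympy library is available (the variable names are arbitrary):

    import sympy as sp

    x, y = sp.symbols('x y', positive=True)

    f = sp.exp(x / 2)                      # f(x) = e^(x/2)
    print(sp.diff(f, x))                   # exp(x/2)/2, i.e. (1/2) e^(x/2)

    f_inv = 2 * sp.log(y)                  # inverse: x = 2 ln y
    print(sp.diff(f_inv, y))               # 2/y

    # Inverse Rule: df^{-1}/dy = 1 / [df/dx evaluated at x = f^{-1}(y)]
    print(sp.simplify(1 / sp.diff(f, x).subs(x, f_inv)))   # also 2/y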

[Figure: plots of y = exp(x/2) with dy/dx = (1/2) exp(x/2) (left), and of the inverse x = 2 ln y with dx/dy = 2/y (right).]

1.3 Partial Differentiation

Taking derivatives of functions in several variables is very straightforward (at least with the functions used in economics). Say we have a function in two variables, x and y, which we can write as h(x, y). Partial differentiation, which uses a "∂" ("partial") instead of a "d", can be done either in x or in y, by writing either "∂/∂x" or "∂/∂y" in front of h(x, y), respectively, and then taking the derivative with respect to the chosen variable as we would in the single-variable case, treating the other variable as a constant.

Example 2 Let h(x, y) = 3x^2 ln y, shown below in the graphs for the ranges 0 ≤ x ≤ 2 and 0 ≤ y ≤ 10.

[Figure: 3-D representation of 3x^2 ln y (left); level curves 3x^2 ln y = k for k = 1, 2, ..., 27 (right).]

Note that the second plot of the level curves is like what you would find on a topographic contour map. Applying the partial differentiation rules we get the partial derivatives

    ∂/∂x (3x^2 ln y) = 3 ln y · ∂/∂x (x^2) = 3 ln y · (2x) = 6x ln y

    ∂/∂y (3x^2 ln y) = 3x^2 · ∂/∂y (ln y) = 3x^2 · (1/y) = 3x^2 / y
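These partials are easy to verify symbolically. A minimal sketch for Example 2, again assuming sympy is available:

    import sympy as sp

    x, y = sp.symbols('x y', positive=True)
    h = 3 * x**2 * sp.log(y)

    print(sp.diff(h, x))   # 6*x*log(y)
    print(sp.diff(h, y))   # 3*x**2/y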

1.4 Total Differentiation

Say that we know already that y is a function of x, say y = f(x), and we want to see how a function in both variables, say h(x, y), responds when we let x change and let y change according to f. Then we could rewrite our expression as a function in a single variable, H(x) = h(x, f(x)), and take the derivative with respect to x to get

    dH(x)/dx = ∂h(x, f(x))/∂x + [∂h(x, f(x))/∂y] · df(x)/dx    (Total Derivative)

where the second term makes use of the chain rule.

Example 3 Let y = f(x) = e^{x/2} and again h(x, y) = 3x^2 ln y. Then df(x)/dx = (1/2) e^{x/2}, and so by the above formulas we have dH(x)/dx = 6x ln y + (3x^2/y) · (1/2) e^{x/2}. Notice that this expression is in both x and y. It can be simplified to be just a function of x by substituting in y = e^{x/2}, so dH(x)/dx = 6x ln(e^{x/2}) + (3x^2/e^{x/2}) · (1/2) e^{x/2} = 3x^2 + (3/2) x^2 = (9/2) x^2. Of course, had we substituted in directly for y from the beginning we would have gotten H(x) = 3x^2 ln(e^{x/2}) = (3/2) x^3, and differentiating directly dH(x)/dx = (9/2) x^2, but this is not as general a treatment.
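A short sympy sketch confirming the total derivative of Example 3 both ways (illustrative only):

    import sympy as sp

    x, y = sp.symbols('x y', positive=True)
    h = 3 * x**2 * sp.log(y)
    f = sp.exp(x / 2)

    # Total derivative: dH/dx = dh/dx + (dh/dy) * df/dx, then substitute y = f(x)
    total = sp.diff(h, x) + sp.diff(h, y) * sp.diff(f, x)
    print(sp.simplify(total.subs(y, f)))          # 9*x**2/2

    # Direct route: substitute first, then differentiate
    print(sp.simplify(sp.diff(h.subs(y, f), x)))  # 9*x**2/2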

[Figure: the path y = exp(x/2) crossing the level curves of h(x, y) = 3x^2 ln y; the value of h(x, f(x)) is given by the level curve being crossed.]

1.5 Implicit Differentiation

Imagine that it is known that y depends on x, but that we do not have an "explicit" formula for y in terms of x; rather, we know that together x and y must satisfy some equation

    h(x, y) = k    (Equation in 2 Variables)

where h(x, y) is a function and k is just a constant (any equation in x and y can be written like this). Sometimes it is difficult to solve "explicitly" for y in terms of x, i.e. to find the function f such that y = f(x). Nevertheless, by the implicit function theorem we can still find the derivative of f by differentiating "implicitly" with respect to x, treating y as a function of x, so we can write

    H(x) = h(x, f(x)) = k

Using what we learned about total differentiation we can differentiate totally with respect to x to get

    dH(x)/dx = ∂h(x, f(x))/∂x + [∂h(x, f(x))/∂y] · df(x)/dx = 0

and then solving for df(x)/dx we get

    df(x)/dx = − [∂h(x, f(x))/∂x] / [∂h(x, f(x))/∂y]    (Implicit Derivative)

Note that this trick does not work if ∂h(x, f(x))/∂y = 0, since the derivative is then undefined.

Example 4 Assume that our equation is h(x, y) = 3x^2 ln y = 3. Then, using the derivatives we took above, we have dH(x)/dx = 6x ln y + (3x^2/y) · df(x)/dx = 0. Solving for df(x)/dx we get df(x)/dx = −(6x ln y)/(3x^2/y). This expression is in both x and y, but we can evaluate it meaningfully for any values of x and y satisfying 3x^2 ln y = 3, for instance x = 1 and y = e, which gives df(1)/dx = −(6 · 1)/(3/e) = −2e. Of course this derivative may be different for different values of x.
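The same implicit derivative can be recovered with a few lines of sympy, using nothing more than the (Implicit Derivative) formula above (a sketch, not the only way to do it):

    import sympy as sp

    x, y = sp.symbols('x y', positive=True)
    h = 3 * x**2 * sp.log(y)

    # Implicit Derivative formula: df/dx = - (dh/dx) / (dh/dy)
    dydx = -sp.diff(h, x) / sp.diff(h, y)
    print(sp.simplify(dydx))                 # -2*y*log(y)/x

    # Evaluate at the point (x, y) = (1, e), which satisfies 3 x^2 ln y = 3
    print(dydx.subs({x: 1, y: sp.E}))        # -2*E, i.e. -2e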

[Figure: the level curve 3x^2 ln y = 3; the implicit derivative at x = 1, y = e is −2e.]

2 Optimization

2.1 Increasing and Decreasing Functions

By looking at the definition of a derivative we can say, "loosely speaking," that for h small (h ≈ 0) and positive (h > 0),

    df(x)/dx ≈ [f(x + h) − f(x)] / h

or that

    [df(x)/dx] · h = f(x + h) − f(x)

Now we say that a function f is increasing at x if we have f(x + h) > f(x), or f(x + h) − f(x) > 0. Since h > 0, this implies that df(x)/dx > 0. Equivalently, a function f is decreasing at x if we have f(x + h) < f(x), which implies that df(x)/dx < 0.

2.2 Unconstrained Optimization

2.2.1 One Variable

Consider first the case with one variable x, where x ∈ R, i.e. x can take on any real value. We wish to maximize the objective function f(x) and there are no constraints. Thus we solve

    max_x f(x)

Assuming f(x) has a maximum (for example, f does not go to ∞)[1] and that it is differentiable everywhere, then any maximum x* must solve the following first-order necessary condition (FOC)

    df(x*)/dx = 0    (FOC)

This is a single equation in a single unknown, and so it should be solvable for x*.

Why does this make sense? Well, say df(x*)/dx ≠ 0; then either df(x*)/dx > 0 or df(x*)/dx < 0, i.e. f is either increasing or decreasing. If f is increasing, however, we could increase the value of f by increasing x a "little bit" to x* + h, in which case x* never gave us our maximum. Similarly, if f is decreasing, we could reduce x a little bit and increase the value of f. Either case is impossible at a maximum, and therefore we must have df(x*)/dx = 0. Remember that a necessary condition is different from a sufficient condition, in that we could have df(x)/dx = 0 and not be at a maximum, but at a minimum or a "saddle point."[2] Also, we have to be careful to check points where f is not differentiable. Third, there may be several points that satisfy the condition df(x)/dx = 0. Typically we will try to avoid these situations and deal with cases where x* is uniquely determined by (FOC), but not always.

[1] Weierstrass's Theorem says that f(x, y) will be guaranteed to have a maximum if f is continuous and x and y are defined over a closed and bounded domain. See an analysis book for more.

[2] The second-order necessary condition (SOC) states that the derivative of f should be decreasing at the maximum, which implies that the second derivative be non-positive: d^2 f(x*)/dx^2 = d/dx [df(x*)/dx] ≤ 0. This implies that f will be increasing, then flat (at the maximum), and then decreasing afterwards. If f were first decreasing, then flat, and then increasing, that would imply that x* was at a minimum.

Example 5 f(x) = ln x + ln(1 − x) + 3. The FOC implies df(x*)/dx = 1/x* − 1/(1 − x*) = 0 ⇒ 1 − x* = x* ⇒ x* = 1/2. The maximum value f achieves is then f(1/2) = 2 ln(1/2) + 3 ≈ 1.614.
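A quick sympy check of Example 5 (illustrative sketch, assuming sympy is installed):

    import sympy as sp

    x = sp.symbols('x', positive=True)
    f = sp.log(x) + sp.log(1 - x) + 3

    # Solve the FOC f'(x) = 0
    crit = sp.solve(sp.Eq(sp.diff(f, x), 0), x)
    print(crit)                              # [1/2]
    print(float(f.subs(x, crit[0])))         # about 1.614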

[Figure: graphic representation of the FOC for Example 5.]

2.2.2 Two Variables

Now consider the case with two variables x and y, where both x, y ∈ R. Here we wish to solve

    max_{x,y} f(x, y)

Under similar conditions, we can expect that the maximum (x*, y*) must solve the following first-order necessary conditions (FOC)

    ∂f(x*, y*)/∂x = 0    (FOC1)

    ∂f(x*, y*)/∂y = 0    (FOC2)

This is a system of two equations in two unknowns, and thus should usually be solvable for both x* and y*. The intuition for these conditions is similar to that above.

Example 6 Let f(x, y) = −x^2 + 9x + xy − y^2. The FOC are ∂f(x*, y*)/∂x = −2x* + y* + 9 = 0 and ∂f(x*, y*)/∂y = −2y* + x* = 0. The second equation implies x* = 2y*, which can be substituted into the first equation to yield −4y* + y* + 9 = 0 ⇒ y* = 3 and x* = 6. The maximum value attained is f(6, 3) = −6^2 + 54 + 18 − 3^2 = 27. Note that we can be sure that this is a unique and global maximum because f is everywhere concave, as the graph suggests.
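A small sympy sketch solving the two FOCs of Example 6 simultaneously (illustrative only):

    import sympy as sp

    x, y = sp.symbols('x y')
    f = -x**2 + 9*x + x*y - y**2

    # Solve FOC1 and FOC2 jointly
    sol = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y])
    print(sol)                       # {x: 6, y: 3}
    print(f.subs(sol))               # 27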

[Figure: surface of f(x, y) = −x^2 + 9x + xy − y^2 tangent to a horizontal plane at the maximum (left); the equations determined by the FOC, crossing at (6, 3) (right).]

2.3 Optimization with Equality Constraints

Say there is a constraint on x and y which we can write as g(x, y) = k. There are two ways of solving the ensuing maximization problem: the first via substitution, the second via a Lagrangean.

2.3.1 Substitution

Pursuing the first route, we use the constraint equation to solve for one variable in terms of the other, i.e. we find another function y = h(x) such that g(x, h(x)) = k. This can then be substituted into the objective function to get a maximization problem in a single variable

    max_x f(x, h(x))

which, taking the total derivative, yields the FOC

    ∂f(x*, h(x*))/∂x + [∂f(x*, h(x*))/∂y] · dh(x*)/dx = 0    (Substitution FOC)

We could use this one equation to solve for x*, and then find y* = h(x*). Differentiating the constraint implicitly, we can solve for the derivative of h(x):

    ∂g(x, h(x))/∂x + [∂g(x, h(x))/∂y] · dh(x)/dx = 0  ⇒  dh(x)/dx = − [∂g(x, h(x))/∂x] / [∂g(x, h(x))/∂y]

Plugging this into the FOC and rearranging means that at the optimum

    [∂f(x*, h(x*))/∂x] / [∂f(x*, h(x*))/∂y] = [∂g(x*, h(x*))/∂x] / [∂g(x*, h(x*))/∂y]    (Tangency)

which is a general tangency condition, versions of which we see throughout economics. It says that the constraint g and the level curve of f at f* = f(x*, y*) should be tangent at the maximum, which means the curves meet and have the same slope. A level curve of f at z is just all the points (x, y) such that f(x, y) = z (as with indifference curves).

Example 7 Let f(x, y) = x + y subject to g(x, y) = x^2 + y^2 = 1 = c. Then h(x) = y = √(1 − x^2), and so f(x, √(1 − x^2)) = x + √(1 − x^2). The FOC is 1 − x/√(1 − x^2) = 0 ⇒ √(1 − x^2) = x ⇒ 1 − x^2 = x^2 ⇒ x^2 = 1/2 ⇒ x* = 1/√2 and y* = 1/√2, achieving f(1/√2, 1/√2) = √2. Note that there is a second tangency at the minimum, x* = −1/√2 and y* = −1/√2.
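A sympy sketch of the substitution approach in Example 7 (the variable names are arbitrary):

    import sympy as sp

    x = sp.symbols('x', positive=True)
    h = sp.sqrt(1 - x**2)                 # y solved from the constraint x^2 + y^2 = 1
    objective = x + h

    crit = sp.solve(sp.Eq(sp.diff(objective, x), 0), x)
    print(crit)                                       # [sqrt(2)/2], i.e. 1/sqrt(2)
    print(sp.simplify(objective.subs(x, crit[0])))    # sqrt(2)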

[Figure: the constraint x^2 + y^2 = 1 and level curves of x + y (left); 3-D plot of the objective and constraint (right).]

2.3.2 Lagrangean

An equivalent way of solving this problem, which at first may seem more difficult but which in fact can be very useful (and sometimes easier), is to maximize a Lagrangean

    L(x, y, λ) = f(x, y) + λ[c − g(x, y)]    (Lagrangean)

where λ is a Lagrange multiplier, and we maximize just as we would an unconstrained problem, taking into account the extra variable λ. The FOC are then

    ∂L(x*, y*, λ*)/∂x = ∂f(x*, y*)/∂x − λ* ∂g(x*, y*)/∂x = 0    (Lagrange FOC1)

    ∂L(x*, y*, λ*)/∂y = ∂f(x*, y*)/∂y − λ* ∂g(x*, y*)/∂y = 0    (Lagrange FOC2)

    ∂L(x*, y*, λ*)/∂λ = c − g(x*, y*) = 0    (Lagrange FOC3)

This is a system of three equations with three unknowns, and therefore should be relatively straightforward to solve. Notice that the third FOC is just the constraint restated. Solve (Lagrange FOC1) and (Lagrange FOC2) for λ and you will get

    λ* = [∂f(x*, y*)/∂x] / [∂g(x*, y*)/∂x] = [∂f(x*, y*)/∂y] / [∂g(x*, y*)/∂y]    (Lambda)

The second equality eliminates λ for the purposes of problem solving, which combined with the constraint (Lagrange FOC3) gives a 2-equation system in two unknowns (x*, y*). λ* can then be solved for by plugging back into (Lambda). Notice that the second part of (Lambda) can be rearranged to produce the same result as (Tangency), making the equivalence of the two approaches obvious.

2.3.3 Interpreting the Lagrange Multiplier

So why all the fuss about the Lagrange multiplier? One reason is that the Lagrange multiplier has an interesting interpretation. If we were to increase c by one unit, how much higher would f go? The answer is λ*: it tells us the value of relaxing the constraint by one unit (e.g. one dollar). More formally, take c to be a parameter (like prices or income) that is fixed over the optimization. Then really all of our solutions x*, y*, λ* are dependent on c. We can then write x*(c), y*(c), λ*(c); as the parameter c changes, these solutions will change. Now plug x*(c), y*(c) into f and we will get a new function F(c) = f(x*(c), y*(c)), which gives us the maximum value of f we can get for a given c. Now take the total derivative of F with respect to c:

    dF(c)/dc = [∂f(x*(c), y*(c))/∂x] · dx*(c)/dc + [∂f(x*(c), y*(c))/∂y] · dy*(c)/dc

             = λ*(c) [∂g(x*(c), y*(c))/∂x] · dx*(c)/dc + λ*(c) [∂g(x*(c), y*(c))/∂y] · dy*(c)/dc

             = λ*(c) [ (∂g(x*(c), y*(c))/∂x) · dx*(c)/dc + (∂g(x*(c), y*(c))/∂y) · dy*(c)/dc ]

where the second equality comes from using (Lagrange FOC1) and (Lagrange FOC2). The constraint implies that g(x*(c), y*(c)) = c. Differentiating the constraint totally with respect to c gives us

    [∂g(x*(c), y*(c))/∂x] · dx*(c)/dc + [∂g(x*(c), y*(c))/∂y] · dy*(c)/dc = 1

so the entire term in brackets above equals one, leaving us with

    dF(c)/dc = λ*(c)    (Lambda Interpretation)

which is what we wanted to show: an increase of one unit in c increases F (or f) by λ*.
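As an illustration of this interpretation, the following sympy sketch (a check under my chosen example, f = x + y with constraint x^2 + y^2 = c) solves the Lagrange FOCs for a general c and confirms that dF(c)/dc = λ*(c):

    import sympy as sp

    x, y, lam, c = sp.symbols('x y lambda c', positive=True)
    f = x + y
    g = x**2 + y**2
    L = f + lam * (c - g)

    # Solve the three Lagrange FOCs for x*, y*, lambda* as functions of c
    sol = sp.solve([sp.diff(L, x), sp.diff(L, y), sp.diff(L, lam)], [x, y, lam], dict=True)[0]

    F = f.subs(sol)                                  # maximized value F(c)
    print(sp.simplify(F))                            # sqrt(2)*sqrt(c)
    print(sp.simplify(sp.diff(F, c) - sol[lam]))     # 0, so dF/dc = lambda*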

Example 8 Let f(x, y) = x + y subject to g(x, y) = x^2 + y^2 = 1 = c. Then

    L(x, y, λ) = x + y + λ(1 − x^2 − y^2)

with the FOC

    ∂L/∂x = 1 − λ · 2x = 0

    ∂L/∂y = 1 − λ · 2y = 0

    ∂L/∂λ = 1 − x^2 − y^2 = 0

Solving for λ we have λ = 1/(2x) = 1/(2y) ⇒ x = y. Putting this into the third equation we have 1 − x^2 − x^2 = 0 ⇒ 2x^2 = 1 ⇒ x^2 = 1/2 ⇒ x* = 1/√2 = y*. Also λ* = 1/(2 · (1/√2)) = 1/√2.

2.4 Optimization with Inequality Constraints

In economics it is much more common to start with inequality constraints of the form g(x, y) ≤ c. The constraint is said to be binding if at the optimum g(x*, y*) = c, and it is said to be slack if at the optimum g(x*, y*) < c. Setting up the same Lagrangean as before, the first-order conditions become the Kuhn-Tucker conditions

    ∂f(x*, y*)/∂x − λ* ∂g(x*, y*)/∂x = 0    (KT1)

    ∂f(x*, y*)/∂y − λ* ∂g(x*, y*)/∂y = 0    (KT2)

    c − g(x*, y*) ≥ 0,   λ* ≥ 0,   λ* [c − g(x*, y*)] = 0    (KT3, KT4, KT5)

The first two conditions are identical to the Lagrangean FOCs. The third condition now has three parts. The first is just a restatement of the constraint. The second part says that λ* is always non-negative. Mathematically this ensures a correct answer: if you get an (x, y) combination that implies a negative λ, then this is not a solution. Note that if we had written L(x, y, λ) = f(x, y) − λ[c − g(x, y)] by mistake, the correct λ would be non-positive. This would also occur if the constraint were written as c − g(x, y) ≤ 0 instead of c − g(x, y) ≥ 0. Although mathematically the choice of sign is a formality, economically we want the Lagrange multiplier to have a positive sign because of its economic interpretation as a shadow value. The third part (the "complementary slackness" condition) says that either λ* or c − g(x*, y*) must be zero. If λ* = 0, then we can forget about the constraint and the Kuhn-Tucker conditions turn into the unconstrained FOC considered earlier. If λ* > 0, then the constraint must be binding and the problem turns into the standard Lagrangean considered above. Thus the Kuhn-Tucker conditions provide a neat mathematical way of integrating the unconstrained and constrained cases into a single formulation. Unfortunately you typically have to check both cases, unless you know in advance which one applies, which often happens. Note that the interpretation of λ* still stands: if λ* = 0, this means that the constraint is slack, g(x*, y*) < c, and relaxing the constraint further adds nothing, dF(c)/dc = λ* = 0.

Example 9 A very common inequality constraint in economics is a non-negativity constraint on a single choice variable: maximize f(x) subject to x ≥ 0. Writing the constraint as −x ≤ 0, the Lagrangean is L(x, λ) = f(x) + λ[0 − (−x)] = f(x) + λx, and the Kuhn-Tucker conditions are

    f'(x*) + λ* = 0,   x* ≥ 0,   λ* ≥ 0,   x* λ* = 0

If the non-negativity constraint does not bind, then f'(x*) = 0. If it does bind, then f'(x*) = −λ* < 0 and x* = 0. Thus we can rephrase the FOCs without mentioning λ* as

    f'(x*) ≤ 0, with "=" if x* > 0

As a specific case, consider maximizing the function f(x) = −2x subject to the non-negativity constraint x ≥ 0. In this case the FOC is −2 + λ* = 0, so λ* = 2, and therefore, as x* λ* = 0, x* = 0.

Example 10 Let f(x, y) = −(x − 2)^2 − (y − 2)^2 + 5 and g(x, y) = x + y ≤ c, where either c = 2 or c = 6. The Lagrangean is

    L(x, y, λ) = −(x − 2)^2 − (y − 2)^2 + 5 + λ(c − x − y)

and the FOC are

    −2(x − 2) − λ = 0

    −2(y − 2) − λ = 0

    c − x − y ≥ 0,   λ ≥ 0,   λ(c − x − y) = 0

The first two conditions imply x = y = 2 − λ/2, and so x + y = 4 − λ. Plugging this into the third condition gives c ≥ 4 − λ, or λ ≥ 4 − c. If c = 2, this means that λ ≥ 2 > 0, and so the fifth condition implies that 2 − x − y = 0, i.e. the constraint binds, and so x = y = 1. If c = 6, then the third condition gives λ ≥ −2. If λ ≠ 0, then the fifth condition implies x = y = 3 = 2 − λ/2, by the first condition. But then λ = −2, which conflicts with λ ≥ 0. Therefore it must be the case that λ = 0, and the unconstrained optimum x = y = 2 is achieved.
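A sketch (with sympy, names of my own choosing) that mechanically checks the two Kuhn-Tucker cases of Example 10 for a given c:

    import sympy as sp

    x, y, lam = sp.symbols('x y lambda', real=True)
    f = -(x - 2)**2 - (y - 2)**2 + 5

    def kuhn_tucker(c):
        g = x + y
        # Case 1: slack constraint, lambda = 0, use the unconstrained FOC
        sol = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y])
        if g.subs(sol) <= c:
            return sol, 0
        # Case 2: binding constraint, solve the Lagrangean FOCs with g = c
        L = f + lam * (c - g)
        sol = sp.solve([sp.diff(L, x), sp.diff(L, y), c - g], [x, y, lam])
        return {x: sol[x], y: sol[y]}, sol[lam]

    print(kuhn_tucker(2))   # ({x: 1, y: 1}, 2)
    print(kuhn_tucker(6))   # ({x: 2, y: 2}, 0)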


2.5 More than One Constraint (Optional)

When there is more than one constraint, say g(x, y) ≤ c and h(x, y) ≤ d, we can just write a longer Lagrangean with two multipliers, λ and µ:

    L(x, y, λ, µ) = f(x, y) + λ[c − g(x, y)] + µ[d − h(x, y)]

The FOCs are derived in the same way, by taking the derivatives with respect to each variable (including the multipliers) and setting each equal to zero. If the constraints are not known to be binding, then conditions like KT3-KT5 need to be written for each constraint (see the example below). While this may get tedious, it should eventually boil down to a system of 4 equations in 4 unknowns. Moreover, if both constraints bind and are independent, then (x*, y*) will simply be whatever satisfies both constraints, g(x*, y*) = c and h(x*, y*) = d, a system of two equations in two unknowns, and the first two FOC equations are only useful for determining the multipliers. More control variables like x and y only imply more derivatives. More constraints simply imply adding more Lagrange multipliers, with or without the Kuhn-Tucker subtleties. There are other issues related to optimization that are in fact quite interesting and worth learning more about, but we won't go over them here.[3]

Example 11 ("Bang-Bang" Solution) Consider the very straightforward problem of maximizing f(x) = ax + b, a simple line, where a ≷ 0 is a constant of indeterminate sign. Without a constraint there is no solution to this problem (infinity isn't technically considered a "solution"). However, consider the constraints x ≥ 0 and x ≤ 10. The Lagrangean will then be

    L(x, λ, µ) = ax + b + λx + µ(10 − x)

with FOC

    a + λ* − µ* = 0

    x* ≥ 0,   λ* ≥ 0,   x* λ* = 0

    x* ≤ 10,   µ* ≥ 0,   (10 − x*) µ* = 0

Realize that both constraints cannot be binding at the same time, so either λ* = 0 or µ* = 0. Also, the FOC implies a = µ* − λ*. There are then 3 cases worth considering here: (1) a > 0, in which case it must be that a = µ* > 0 and λ* = 0 (otherwise a = −λ* > 0 contradicts λ* ≥ 0); the complementary slackness condition on µ* requires the constraint to bind, and so x* = 10. (2) a < 0, where by similar reasoning a = −λ*, µ* = 0, and x* = 0. (3) a = 0, in which case µ* = λ* = 0 and x* can take any value between 0 and 10, as they all yield ax* = 0. This kind of a solution is known as a "bang-bang" solution: either you "bang" x at the top or at the bottom. You go for the maximum or minimum possible value of x.
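A tiny illustrative check of the bang-bang logic for a few values of a (plain Python, mirroring the three cases above):

    # For f(x) = a*x + b on [0, 10], the maximizer is at a corner (or anywhere if a = 0)
    def bang_bang(a):
        if a > 0:
            return 10          # "bang" x at the top
        if a < 0:
            return 0           # "bang" x at the bottom
        return "any x in [0, 10]"

    for a in (1, -2, 0):
        print(a, bang_bang(a))   # 1 -> 10, -2 -> 0, 0 -> any x in [0, 10]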

[Figure: f(x) = ax + b on [0, 10] for a = 1, b = 0 (maximum at x* = 10) and for a = −2, b = 30 (maximum at x* = 0).]

[3] A typical multivariate calculus book should have a chapter on this subject for equality constraints. Simon and Blume (1994) offers a full treatment and is a good read and reference, with many economic applications.