
Lecture Notes¹

Mathematical Economics

Guoqiang TIAN
Department of Economics, Texas A&M University
College Station, Texas 77843
([email protected])

This version: August 2018

¹The lecture notes are only for the purpose of my teaching and the convenience of my students in class; please do not put them online or pass them to any others.

Contents

1 The Nature of Mathematical Economics
  1.1 Economics and Mathematical Economics
  1.2 Advantages of Mathematical Approach

2 Economic Models
  2.1 Ingredients of a Mathematical Model
  2.2 The Real-Number System
  2.3 The Concept of Sets
  2.4 Relations and Functions
  2.5 Types of Function
  2.6 Functions of Two or More Independent Variables
  2.7 Levels of Generality

3 Equilibrium Analysis in Economics
  3.1 The Meaning of Equilibrium
  3.2 Partial Market Equilibrium - A Linear Model
  3.3 Partial Market Equilibrium - A Nonlinear Model
  3.4 General Market Equilibrium
  3.5 Equilibrium in National-Income Analysis

4 Linear Models and Matrix Algebra
  4.1 Matrix and Vectors
  4.2 Matrix Operations
  4.3 Linear Dependence of Vectors
  4.4 Commutative, Associative, and Distributive Laws
  4.5 Identity Matrices and Null Matrices
  4.6 Transposes and Inverses

5 Linear Models and Matrix Algebra (Continued)
  5.1 Conditions for Nonsingularity of a Matrix
  5.2 Test of Nonsingularity by Use of Determinant
  5.3 Basic Properties of Determinants
  5.4 Finding the Inverse Matrix
  5.5 Cramer's Rule
  5.6 Application to Market and National-Income Models
  5.7 Quadratic Forms
  5.8 Eigenvalues and Eigenvectors
  5.9 Vector Spaces

6 Comparative Statics and the Concept of Derivative
  6.1 The Nature of Comparative Statics
  6.2 Rate of Change and the Derivative
  6.3 The Derivative and the Slope of a Curve
  6.4 The Concept of Limit
  6.5 Inequality and Absolute Values
  6.6 Limit Theorems
  6.7 Continuity and Differentiability of a Function

7 Rules of Differentiation and Their Use in Comparative Statics
  7.1 Rules of Differentiation for a Function of One Variable
  7.2 Rules of Differentiation Involving Two or More Functions of the Same Variable
  7.3 Rules of Differentiation Involving Functions of Different Variables
  7.4 Integration (The Case of One Variable)
  7.5 Partial Differentiation
  7.6 Applications to Comparative-Static Analysis
  7.7 Note on Jacobian Determinants

8 Comparative-Static Analysis of General Functions
  8.1 Differentials
  8.2 Total Differentials
  8.3 Rule of Differentials
  8.4 Total Derivatives
  8.5 Implicit Function Theorem
  8.6 Comparative Statics of General-Function Models
  8.7 Matrix Derivatives

9 Derivatives of Exponential and Logarithmic Functions
  9.1 The Nature of Exponential Functions
  9.2 Logarithmic Functions
  9.3 Derivatives of Exponential and Logarithmic Functions

10 Optimization: Maxima and Minima of a Function of One Variable
  10.1 Optimal Values and Extreme Values
  10.2 General Result on Maximum and Minimum
  10.3 First-Derivative Test for Relative Maximum and Minimum
  10.4 Second and Higher Derivatives
  10.5 Second-Derivative Test
  10.6 Taylor Series
  10.7 Nth-Derivative Test

11 Optimization: Maxima and Minima of a Function of Two or More Variables
  11.1 The Differential Version of Optimization Condition
  11.2 Extreme Values of a Function of Two Variables
  11.3 Objective Functions with More than Two Variables
  11.4 Second-Order Conditions in Relation to Concavity and Convexity
  11.5 Economic Applications

12 Optimization with Equality Constraints
  12.1 Effects of a Constraint
  12.2 Finding the Stationary Values
  12.3 Second-Order Condition
  12.4 General Setup of the Problem
  12.5 Quasiconcavity and Quasiconvexity
  12.6 Utility Maximization and Consumer Demand

13 Optimization with Inequality Constraints
  13.1 Non-Linear Programming
  13.2 Kuhn-Tucker Conditions
  13.3 Economic Applications

14 Linear Programming
  14.1 The Setup of the Problem
  14.2 The Simplex Method
  14.3 Duality

15 Continuous Dynamics: Differential Equations
  15.1 Differential Equations of the First Order
  15.2 Linear Differential Equations of a Higher Order with Constant Coefficients
  15.3 Systems of the First Order Linear Differential Equations
  15.4 Economic Application: General Equilibrium
  15.5 Simultaneous Differential Equations. Types of Equilibria

16 Discrete Dynamics: Difference Equations
  16.1 First-Order Linear Difference Equations
  16.2 Second-Order Linear Difference Equations
  16.3 The General Case of Order n
  16.4 Economic Application: A dynamic model of

17 Introduction to Dynamic Optimization
  17.1 The First-Order Conditions
  17.2 Present-Value and Current-Value Hamiltonians
  17.3 Dynamic Problems with Inequality Constraints
  17.4 Economic Application: The Ramsey Model

Chapter 1

The Nature of Mathematical Economics

The purpose of this course is to introduce the most fundamental aspects of the mathematical methods used in economics, such as matrix algebra, mathematical analysis, and optimization theory.

1.1 Economics and Mathematical Economics

Economics is a social science that studies how to make decisions in the face of scarce resources. Specifically, it studies individuals' economic behavior and phenomena as well as how individual agents, such as consumers, households, firms, organizations, and government agencies, make trade-off choices that allocate limited resources among competing uses.

Mathematical economics is an approach to economic analysis in which the economists make use of mathematical symbols in the statement of the problem and also draw upon known mathematical theorems to aid in reasoning. Since mathematical economics is merely an approach to economic analysis, it should not and does not differ from the nonmathematical approach to economic analysis in any fundamental way. The difference between these two approaches is that in the former, the assumptions and conclusions are stated in mathematical symbols rather than words, and in equations rather than sentences, so that the interdependent relationships among economic variables and the resulting conclusions are more rigorous and concise, by using mathematical models and mathematical statistics/econometric methods.

The study of economic and social issues cannot simply use the real world as its experiment, so it not only requires theoretical analysis based on inherent logical inference and, more often than not, vertical and horizontal comparisons from the larger perspective of history so as to draw experience and lessons, but also needs the tools of statistics and econometrics to do empirical quantitative analysis or tests; all three are indispensable. When conducting economic analysis or giving policy suggestions in the realm of modern economics, the theoretical analysis often combines theory, history, and statistics, presenting not only theoretical analysis of inherent logic and comparative analysis from the historical perspective but also empirical and quantitative analysis with statistical tools for examination and investigation. Indeed, in the final analysis, all knowledge is history, all science is logic, and all judgment is statistics.

As such, it is not surprising that mathematics and mathematical statistics/econometrics are used as the basic analytical tools, and have become the most important analytical tools in every field of economics. Therefore, it is of great necessity to master sufficient mathematical knowledge if you want to learn economics well, conduct economic research, and become a good economist.

1.2 Advantages of Mathematical Approach

The mathematical approach has the following advantages:

(1) It makes the language more precise and the statement of assumptions clearer, which can reduce many unnecessary debates resulting from inaccurate definitions.

(2) It makes the analytical logic more rigorous, and clearly states the boundary, the scope of application, and the conditions for a conclusion to hold. Otherwise, the abuse of a theory may occur.

(3) Mathematics can help obtain the results that cannot be easily attained through intuition.

(4) It helps improve and extend the existing economic theories.

It is, however, noteworthy that a good mastery of mathematics does not by itself guarantee being a good economist. One also needs to fully understand the analytical framework and research methodologies of economics, and to have good intuition and insight into real economic environments and economic issues. The study of economics not only calls for understanding terms, concepts, and results from the perspective of mathematics (including geometry); more importantly, even when these are given in mathematical language or geometric figures, we need to get at their economic meaning and the underlying profound economic thoughts and ideas. Thus we should avoid being confused by mathematical formulas or symbols in the study of economics. All in all, to become a good economist, you need an original, creative, and academic way of thinking.

Chapter 2

Economic Models

2.1 Ingredients of a Mathematical Model

An economic model is merely a theoretical framework, and there is no inherent reason why it must be mathematical. If the model is mathematical, however, it will usually consist of a set of equations designed to describe the structure of the model. By relating a number of variables to one another in certain ways, these equations give mathematical form to the set of analytical assumptions adopted. Then, through application of the relevant mathematical operations to these equations, we may seek to derive a set of conclusions which logically follow from those assumptions.

2.2 The Real-Number System

Whole numbers such as 1, 2, ··· are called positive integers; these are the numbers most frequently used in counting. Their negative counterparts −1, −2, −3, ··· are called negative integers. The number 0 (zero), on the other hand, is neither positive nor negative, and it is in that sense unique. Let us lump all the positive and negative integers and the number zero into a single category, referring to them collectively as the set of all integers.

Integers, of course, do not exhaust all the possible numbers, for we have fractions, such as $\frac{2}{3}$, $\frac{5}{4}$, and $\frac{7}{3}$, which, if placed on a ruler, would fall between the integers. Also, we have negative fractions, such as $-\frac{1}{2}$ and $-\frac{2}{5}$. Together, these make up the set of all fractions.

The common property of all fractional numbers is that each is expressible as a ratio of two integers; thus fractions qualify for the designation rational numbers (in this usage, rational means ratio-nal). But integers are also rational, because any integer n can be considered as the ratio n/1. The set of all fractions together with the set of all integers forms the set of all rational numbers. Once the notion of rational numbers is used, however, there naturally arises the concept of irrational numbers – numbers that cannot be expressed as ratios of a pair of integers. One example is $\sqrt{2} = 1.4142\cdots$. Another is $\pi = 3.1415\cdots$. Each irrational number, if placed on a ruler, would fall between two rational numbers, so that, just as the fractions fill in the gaps between the integers on a ruler, the irrational numbers fill in the gaps between rational numbers. The result of this filling-in process is a continuum of numbers, all of which are so-called "real numbers." This continuum constitutes the set of all real numbers, which is often denoted by the symbol R.

2.3 The Concept of Sets

A set is simply a collection of distinct objects. The objects may be a group of distinct numbers, or something else. Thus, all students enrolled in a particular economics course can be considered a set, just as the three integers 2, 3, and 4 can form a set. The objects in a set are called the elements of the set. There are two alternative ways of writing a set: by enumeration and by description. If we let S represent the set of three numbers 2, 3, and 4, we write, by enumeration of the elements, S = {2, 3, 4}. But if we let I denote the set of all positive integers, enumeration becomes difficult, and we may instead describe the elements and write I = {x|x is a positive integer}, which is read as follows: "I is the set of all x such that x is a positive integer." Note that braces are used to enclose the set in both cases. In the descriptive approach, a vertical bar or a colon is always inserted to separate the general symbol for the elements from the description of the elements. A set with a finite number of elements is called a finite set. Set I, with an infinite number of elements, is an example of an infinite set. Finite sets are always denumerable (or countable), i.e., their elements can be counted one by one in the sequence 1, 2, 3, ···.

Infinite sets may, however, be either denumerable (set I above) or nondenumerable (for example, J = {x|2 < x < 5}). Membership in a set is indicated by the symbol ∈ (a variant of the Greek letter epsilon ϵ for "element"), which is read: "is an element of."

If two sets S1 and S2 happen to contain identical elements, say

S1 = {1, 2, a, b} and S2 = {2, b, 1, a},

then S1 and S2 are said to be equal (S1 = S2). Note that the order of appearance of the elements in a set is immaterial. If we have two sets T = {1, 2, 5, 7, 9} and S = {2, 5, 9}, then S is a subset of T, because each element of S is also an element of T. A more formal statement of this is: S is a subset of T if and only if x ∈ S implies x ∈ T. We write S ⊆ T or T ⊇ S. It is possible for two sets to be subsets of each other. When this occurs, however, we can be sure that these two sets are equal. If a set has n elements, a total of $2^n$ subsets can be formed from those elements. For example, the subsets of {1, 2} are: ∅, {1}, {2}, and {1, 2}. If two sets have no elements in common at all, the two sets are said to be disjoint.

The union of two sets A and B is a new set containing elements belonging to A, or to B, or to both A and B. The union set is symbolized by A ∪ B (read: "A union B").

Example 2.3.1 If A = {1, 2, 3}, B = {2, 3, 4, 5}, then A ∪ B = {1, 2, 3, 4, 5}.

The intersection of two sets A and B, on the other hand, is a new set which contains those elements (and only those elements) belonging to both A and B. The intersection set is symbolized by A ∩ B (read: “A intersection B”).

Example 2.3.2 If A = {1, 2, 3}, B = {4, 5, 6}, then A ∩ B = ∅.

In a particular context of discussion, if the only numbers used are the set of the first seven positive integers, we may refer to it as the universal set U. Then, with a given set, say A = {3, 6, 7}, we can define another set A¯ (read: “the complement of A”) as the set that contains all the numbers in the universal set U which are not in the set A. That is: A¯ = {1, 2, 4, 5}.

Example 2.3.3 If U = {5, 6, 7, 8, 9}, A = {6, 5}, then A¯ = {7, 8, 9}.

Properties of unions and intersections:

A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
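These identities are easy to check mechanically. The following minimal Python sketch (an added illustration with arbitrarily chosen sets, not part of the original notes) verifies all four using Python's built-in set type:

```python
# Illustrative sets; any choice works since the identities hold in general.
A = {1, 2, 3}
B = {2, 3, 4, 5}
C = {3, 5, 6}

assert A | (B | C) == (A | B) | C        # associativity of union
assert A & (B & C) == (A & B) & C        # associativity of intersection
assert A | (B & C) == (A | B) & (A | C)  # union distributes over intersection
assert A & (B | C) == (A & B) | (A & C)  # intersection distributes over union
print("all four identities hold for these sets")
```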

2.4 Relations and Functions

An ordered pair (a, b) is a pair of mathematical objects. The order in which the objects appear in the pair is significant: the ordered pair (a, b) is different from the ordered pair (b, a) unless a = b. In contrast, a set of two elements is an unordered pair: the unordered pair {a, b} equals the unordered pair {b, a}. Similar concepts apply to sets with more than two elements: ordered triples, quadruples, quintuples, etc., are called ordered sets.

Example 2.4.1 To show the age and weight of each student in a class, we can form ordered pairs (a, w), in which the first element indicates the age (in years) and the second element indicates the weight (in pounds). Then (19, 128) and (128, 19) would obviously mean different things.

Suppose, from two given sets, x = {1, 2} and y = {3, 4}, we wish to form all the possible ordered pairs with the first element taken from set x and the second element taken from set y. The result will be the set of four ordered pairs (1,3), (1,4), (2,3), and (2,4). This set is called the Cartesian product, or direct product, of the sets x and y and is denoted by x × y (read "x cross y"). Extending this idea, we may also define the Cartesian product of three sets x, y, and z as follows: x × y × z = {(a, b, c)|a ∈ x, b ∈ y, c ∈ z}, which is the set of ordered triples.

Example 2.4.2 If the sets x, y, and z each consist of all the real numbers, the Cartesian product will correspond to the set of all points in a three-dimensional space. This may be denoted by R × R × R, or more simply, R³.
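To make the construction concrete, here is a small illustrative Python sketch (added here, not from the notes) that enumerates a Cartesian product with itertools:

```python
from itertools import product

x = [1, 2]
y = [3, 4]
# x × y: all ordered pairs (a, b) with a taken from x and b from y.
print(list(product(x, y)))              # [(1, 3), (1, 4), (2, 3), (2, 4)]
# Extending to three sets gives ordered triples, as in x × y × z.
print(len(list(product(x, y, [5, 6]))))  # 2 * 2 * 2 = 8 triples
```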

Example 2.4.3 The set {(x, y)|y = 2x} is a set of ordered pairs including, for example, (1,2), (0,0), and (-1,-2). It constitutes a relation, and its graphical counterpart is the set of points lying on the straight line y = 2x.

Example 2.4.4 The set {(x, y)|y ≤ x} is a set of ordered pairs including, for example, (1,0), (0,0), (1,1), and (1,-4). The set corresponds to the set of all points lying on or below the straight line y = x.

As a special case, however, a relation may be such that for each x value there exists only one corresponding y value. The relation in example 2.4.3 is a case in point. In that case, y is said to be a function of x, and this is denoted by y = f(x), which is read: “y equals f of x.” A function is therefore a set of ordered pairs with the property that any x value uniquely determines a y value. It should be clear that a function must be a relation, but a relation may not be a function. A function is also called a mapping, or transformation; both words denote the action of associating one thing with another. In the statement y = f(x), the functional notation f may thus be interpreted to mean a rule by which the set x is “mapped” (“transformed”) into the set y. Thus we may write f : x → y where the arrow indicates mapping, and the letter f symbolically specifies a rule of map- ping. In the function y = f(x), x is referred to as the argument of the function, and y is called the value of the function. We shall also alternatively refer to x as the independent variable and y as the dependent variable. The set of all permissible values that x can take in a given context is known as the domain of the function, which may be a subset of the set of all real numbers. The y value into which an x value is mapped is called the image of that x value. The set of all images is called the range of the function, which is the set of all values that the y variable will take. Thus the domain pertains to the independent variable x, and the range has to do with the dependent variable y.

2.5 Types of Function

A function whose range consists of only one element is called a constant function.

Example 2.5.1 The function y = f(x) = 7 is a constant function.

The constant function is actually a "degenerate" case of what are known as polynomial functions. A polynomial function of a single variable has the general form
$$y = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$
in which each term contains a coefficient as well as a nonnegative-integer power of the variable x. Depending on the value of the integer n (which specifies the highest power of x), we have several subclasses of polynomial function:

Case of n = 0: $y = a_0$ [constant function]
Case of n = 1: $y = a_0 + a_1 x$ [linear function]
Case of n = 2: $y = a_0 + a_1 x + a_2 x^2$ [quadratic function]
Case of n = 3: $y = a_0 + a_1 x + a_2 x^2 + a_3 x^3$ [cubic function]

A function such as
$$y = \frac{x - 1}{x^2 + 2x + 4},$$
in which y is expressed as a ratio of two polynomials in the variable x, is known as a rational function (again, meaning ratio-nal). According to this definition, any polynomial function must itself be a rational function, because it can always be expressed as a ratio to 1, which is a constant function.

Any function expressed in terms of polynomials and/or roots (such as square roots) of polynomials is an algebraic function. Accordingly, the functions discussed thus far are all algebraic. A function such as $y = \sqrt{x^2 + 1}$ is not rational, yet it is algebraic. However, exponential functions such as $y = b^x$, in which the independent variable appears in the exponent, are nonalgebraic. The closely related logarithmic functions, such as $y = \log_b x$, are also nonalgebraic.

Rules of Exponents:

Rule 1: $x^m \times x^n = x^{m+n}$
Rule 2: $\frac{x^m}{x^n} = x^{m-n}$  $(x \neq 0)$
Rule 3: $x^{-n} = \frac{1}{x^n}$
Rule 4: $x^0 = 1$  $(x \neq 0)$
Rule 5: $x^{1/n} = \sqrt[n]{x}$
Rule 6: $(x^m)^n = x^{mn}$
Rule 7: $x^m \times y^m = (xy)^m$
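A minimal numerical sanity check of these rules (an illustration added here, not in the original notes; the values of x, y, m, n are arbitrary):

```python
import math

x, y, m, n = 2.0, 3.0, 5, 3
assert x**m * x**n == x**(m + n)           # Rule 1
assert x**m / x**n == x**(m - n)           # Rule 2
assert x**(-n) == 1 / x**n                 # Rule 3
assert x**0 == 1                           # Rule 4
assert math.isclose(x**0.5, math.sqrt(x))  # Rule 5 with n = 2: x^(1/2) = sqrt(x)
assert (x**m)**n == x**(m * n)             # Rule 6
assert x**m * y**m == (x * y)**m           # Rule 7
print("all seven rules check out for these values")
```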

2.6 Functions of Two or More Independent Variables

Thus far, we have considered only functions of a single independent variable, y = f(x). But the concept of a function can be readily extended to the case of two or more independent variables. Given a function
$$z = g(x, y),$$
a given pair of x and y values will uniquely determine a value of the dependent variable z. Such a function is exemplified by
$$z = ax + by \quad \text{or} \quad z = a_0 + a_1 x + a_2 x^2 + b_1 y + b_2 y^2$$

Functions of more than one variable can be classified into various types, too. For instance, a function of the form
$$y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n$$
is a linear function, whose characteristic is that every variable is raised to the first power only. A quadratic function, on the other hand, involves first and second powers of one or more independent variables, but the sum of the exponents of the variables appearing in any single term must not exceed two.

Example 2.6.1 $y = ax^2 + bxy + cy^2 + dx + ey + f$ is a quadratic function.

2.7 Levels of Generality

In discussing the various types of function, we have, without explicit notice, introduced examples of functions that pertain to varying levels of generality. In certain instances, we have written functions in the form
$$y = 7, \quad y = 6x + 4, \quad y = x^2 - 3x + 1 \quad \text{(etc.)}$$
Not only are these expressed in terms of numerical coefficients, but they also indicate specifically whether each function is constant, linear, or quadratic. In terms of graphs, each such function will give rise to a well-defined unique curve. In view of the numerical nature of these functions, the solutions of the model based on them will emerge as numerical values also. The drawback is that, if we wish to know how our analytical conclusion will change when a different set of numerical coefficients comes into effect, we must go through the reasoning process afresh each time. Thus, the results obtained from specific functions have very little generality. On a more general level of discussion and analysis, there are functions in the form

$$y = a, \quad y = bx + a, \quad y = cx^2 + bx + a \quad \text{(etc.)}$$

Since parameters are used, each function represents not a single curve but a whole family of curves. With parametric functions, the outcome of mathematical operations will also be in terms of parameters. These results are more general. In order to attain an even higher level of generality, we may resort to the general function statement y = f(x), or z = g(x, y). When expressed in this form, the function is not restricted to being either linear, quadratic, exponential, or trigonometric – all of which are subsumed under the notation. The analytical result based on such a general formulation will therefore have the most general applicability.

Chapter 3

Equilibrium Analysis in Economics

3.1 The Meaning of Equilibrium

Like any economic term, equilibrium can be defined in various ways. One definition here is that an equilibrium for a specific model is a situation that is characterized by a lack of tendency to change. Or, more generally, it means that from the various feasible and available choices, one chooses the "best" one according to a certain criterion, and the one finally chosen is called an equilibrium. It is for this reason that the analysis of equilibrium is referred to as statics. The fact that an equilibrium implies no tendency to change may tempt one to conclude that an equilibrium necessarily constitutes a desirable or ideal state of affairs, but that conclusion does not necessarily follow. This chapter provides two examples of equilibrium. One is the equilibrium attained by a market model under given demand and supply conditions. The other is the equilibrium of a national-income model under given conditions of consumption and investment patterns. We will use these two models as running examples throughout the course.

3.2 Partial Market Equilibrium - A Linear Model

In a static-equilibrium model, the standard problem is that of finding the set of values of the endogenous variables which will satisfy the equilibrium conditions of the model.

Partial-Equilibrium Market Model

The partial-equilibrium market model is a model of price determination in an isolated market for a commodity. It involves three variables:

Qd = the quantity demanded of the commodity;

Qs = the quantity supplied of the commodity;

P = the price of the commodity.

The Equilibrium Condition: Qd = Qs. The model is

Qd = Qs

Qd = a − bp (a, b > 0)

Qs = −c + dp (c, d > 0)

The slope of Qd is −b, with vertical intercept a. The slope of Qs is d, with vertical intercept −c.

Note that, contrary to the usual practice, quantity rather than price has been plotted vertically in the figure. One way of finding the equilibrium is by successive elimination of variables and equations through substitution.

From Qs = Qd, we have a − bp = −c + dp

and thus (b + d)p = a + c.

Since b + d ≠ 0, the equilibrium price is
$$\bar{p} = \frac{a + c}{b + d}.$$

The equilibrium quantity can be obtained by substituting $\bar{p}$ into either Qs or Qd:
$$\bar{Q} = \frac{ad - bc}{b + d}.$$

Since the denominator (b+d) is positive, the positivity of Q¯ requires that the numerator (ad − bc) > 0. Thus, to be economically meaningful, the model should contain the additional restriction that ad > bc.
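As a quick numerical illustration (the parameter values below are chosen for this added sketch and are not from the notes), one can verify the solution formulas and the market-clearing condition:

```python
# Illustrative parameters with a, b, c, d > 0 and ad > bc.
a, b, c, d = 10.0, 2.0, 4.0, 5.0

p_bar = (a + c) / (b + d)            # equilibrium price
q_bar = (a * d - b * c) / (b + d)    # equilibrium quantity

# At p_bar the market clears: Qd(p_bar) = Qs(p_bar).
assert abs((a - b * p_bar) - (-c + d * p_bar)) < 1e-12
print(p_bar, q_bar)                  # 2.0 6.0
```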

3.3 Partial Market Equilibrium - A Nonlinear Model

The partial market model can be nonlinear. Suppose a model is given by

$$Q_d = Q_s$$
$$Q_d = 4 - p^2$$
$$Q_s = 4p - 1$$

As previously stated, this system of three equations can be reduced to a single equation by substitution:
$$4 - p^2 = 4p - 1,$$
or
$$p^2 + 4p - 5 = 0,$$
which is a quadratic equation. In general, given a quadratic equation in the form
$$ax^2 + bx + c = 0 \quad (a \neq 0),$$
its two roots can be obtained from the quadratic formula:
$$\bar{x}_1, \bar{x}_2 = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a},$$
where the "+" part of the "±" sign yields $\bar{x}_1$ and the "−" part yields $\bar{x}_2$. Thus, by applying the quadratic formula to $p^2 + 4p - 5 = 0$, we have $\bar{p}_1 = 1$ and $\bar{p}_2 = -5$, but only the first is economically admissible, as negative prices are ruled out.
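The same roots can be checked numerically; the short sketch below (an added illustration) uses numpy's polynomial root finder:

```python
import numpy as np

# Roots of p^2 + 4p - 5 = 0, the reduced form of the nonlinear model.
roots = np.roots([1, 4, -5])
print(roots)                 # [-5.  1.]

p_bar = max(roots)           # keep the economically admissible root
print(p_bar, 4 - p_bar**2)   # equilibrium price 1.0, quantity 3.0
```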

The Graphical Solution

3.4 General Market Equilibrium

In the above, we have discussed methods of analyzing an isolated market, wherein the Qd and Qs of a commodity are functions of the price of that commodity alone. In the real world, there would normally exist many substitute and complementary goods. Thus a more realistic model for the demand and supply functions of a commodity should take into account the effects not only of the price of the commodity itself but also of the prices of other commodities. As a result, the price and quantity variables of multiple commodities must enter endogenously into the model. Thus, when several interdependent commodities are simultaneously considered, equilibrium would require the absence of excess demand, which is the difference between demand and supply, for each and every commodity included in the model. Consequently, the equilibrium condition of an n-commodity market model will involve n equations, one for each commodity, in the form

$$E_i = Q_{di} - Q_{si} = 0 \quad (i = 1, 2, \cdots, n)$$

where $Q_{di} = Q_{di}(P_1, P_2, \cdots, P_n)$ and $Q_{si} = Q_{si}(P_1, P_2, \cdots, P_n)$ are the demand and supply functions of commodity i, and $(P_1, P_2, \cdots, P_n)$ are the prices of the commodities. Thus, solving the n equations for the prices:

$$E_i(P_1, P_2, \cdots, P_n) = 0,$$

we obtain the n equilibrium prices $\bar{P}_i$ – if a solution does indeed exist. And then the $\bar{Q}_i$ may be derived from the demand or supply functions.

Two-Commodity Market Model

To illustrate the problem, let us consider a two-commodity market model with linear demand and supply functions. In parametric terms, such a model can be written as

$$Q_{d1} - Q_{s1} = 0$$
$$Q_{d1} = a_0 + a_1 P_1 + a_2 P_2$$
$$Q_{s1} = b_0 + b_1 P_1 + b_2 P_2$$
$$Q_{d2} - Q_{s2} = 0$$
$$Q_{d2} = \alpha_0 + \alpha_1 P_1 + \alpha_2 P_2$$
$$Q_{s2} = \beta_0 + \beta_1 P_1 + \beta_2 P_2$$

By substituting the second and third equations into the first, and the fifth and sixth equations into the fourth, the model is reduced to two equations in two variables:
$$(a_0 - b_0) + (a_1 - b_1) P_1 + (a_2 - b_2) P_2 = 0$$
$$(\alpha_0 - \beta_0) + (\alpha_1 - \beta_1) P_1 + (\alpha_2 - \beta_2) P_2 = 0$$

If we let
$$c_i = a_i - b_i \quad (i = 0, 1, 2),$$
$$\gamma_i = \alpha_i - \beta_i \quad (i = 0, 1, 2),$$
the above two linear equations can be written as
$$c_1 P_1 + c_2 P_2 = -c_0,$$
$$\gamma_1 P_1 + \gamma_2 P_2 = -\gamma_0,$$
which can be solved by further elimination of variables. The solutions are
$$\bar{P}_1 = \frac{c_2 \gamma_0 - c_0 \gamma_2}{c_1 \gamma_2 - c_2 \gamma_1}, \qquad \bar{P}_2 = \frac{c_0 \gamma_1 - c_1 \gamma_0}{c_1 \gamma_2 - c_2 \gamma_1}.$$
For these two values to make sense, certain restrictions should be imposed on the model. First, we require the common denominator $c_1 \gamma_2 - c_2 \gamma_1 \neq 0$. Second, to assure positivity, the numerator must have the same sign as the denominator.

Numerical Example

Suppose that the demand and supply functions are numerically as follows:

$$Q_{d1} = 10 - 2 P_1 + P_2$$
$$Q_{s1} = -2 + 3 P_1$$
$$Q_{d2} = 15 + P_1 - P_2$$
$$Q_{s2} = -1 + 2 P_2$$

By substitution, we have
$$5 P_1 - P_2 = 12,$$
$$-P_1 + 3 P_2 = 16,$$
which are two linear equations. The solutions for the equilibrium prices and quantities are $\bar{P}_1 = 52/14 = 26/7$, $\bar{P}_2 = 92/14 = 46/7$, $\bar{Q}_1 = 64/7$, $\bar{Q}_2 = 85/7$.

Similarly, for the n-commodity market model, when demand and supply functions are linear in prices, we obtain n linear equations. In the above, we assumed that an equal number of equations and unknowns yields a unique solution. However, some very simple examples should convince us that an equal number of equations and unknowns does not necessarily guarantee the existence of a unique solution. For the two linear equations
$$x + y = 8,$$
$$x + y = 9,$$
we can easily see that there is no solution. The second example shows a system with an infinite number of solutions:
$$2x + y = 12,$$
$$4x + 2y = 24.$$
These two equations are functionally dependent, which means that one can be derived from the other. Consequently, one equation is redundant and may be dropped from the system. Any pair $(\bar{x}, \bar{y})$ is a solution as long as it satisfies $y = 12 - x$.

Now consider the case of more equations than unknowns. In general, there is no solution. But when the number of unknowns equals the number of functionally independent equations, the solution exists and is unique. The following example shows this fact:

2x + 3y = 58

y = 18

x + y = 20

Thus, for a simultaneous-equation model, we need systematic methods of testing the existence of a unique (or determinate) solution. These are our tasks in the following chapters.
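Returning to the two-commodity numerical example above, the reduced system can be checked with a linear solver; this is an added illustration, not part of the notes:

```python
import numpy as np

# Reduced system: 5 P1 - P2 = 12, -P1 + 3 P2 = 16.
A = np.array([[5.0, -1.0], [-1.0, 3.0]])
d = np.array([12.0, 16.0])
P = np.linalg.solve(A, d)
print(P)                   # [26/7, 46/7], i.e., about [3.714, 6.571]

Q1 = 10 - 2 * P[0] + P[1]  # = 64/7, from the demand function Qd1
Q2 = 15 + P[0] - P[1]      # = 85/7, from the demand function Qd2
print(Q1, Q2)
```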

3.5 Equilibrium in National-Income Analysis

The equilibrium analysis can also be applied to other areas of economics. As a simple example, we may cite the familiar Keynesian national-income model,

$$Y = C + I_0 + G_0 \quad \text{(equilibrium condition)}$$
$$C = a + bY \quad \text{(consumption function)}$$
where Y and C stand for the endogenous variables national income and consumption expenditure, respectively, and $I_0$ and $G_0$ represent the exogenously determined investment and government expenditures, respectively. Solving these two linear equations, we obtain the equilibrium national income and consumption expenditure:
$$\bar{Y} = \frac{a + I_0 + G_0}{1 - b},$$
$$\bar{C} = \frac{a + b(I_0 + G_0)}{1 - b}.$$
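A numerical sketch of the national-income model, with illustrative values for a, b, I0, and G0 (these numbers are assumptions made for this added example, not from the notes):

```python
a, b, I0, G0 = 100.0, 0.8, 50.0, 30.0  # b: marginal propensity to consume

Y_bar = (a + I0 + G0) / (1 - b)
C_bar = (a + b * (I0 + G0)) / (1 - b)

# The equilibrium condition Y = C + I0 + G0 holds at the solution.
assert abs(Y_bar - (C_bar + I0 + G0)) < 1e-9
print(Y_bar, C_bar)                    # 900.0 820.0
```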

Chapter 4

Linear Models and Matrix Algebra

From the last chapter we have seen that for the one-commodity model, the solutions for $\bar{P}$ and $\bar{Q}$ are relatively simple, even though a number of parameters are involved. As more and more commodities are incorporated into the model, such solution formulas quickly become cumbersome and unwieldy. We need new methods suitable for handling a large system of simultaneous equations. Such a method is found in matrix algebra. Matrix algebra enables us to do many things. (1) It provides a compact way of writing an equation system, even an extremely large one. (2) It leads to a way of testing the existence of a solution by evaluation of a determinant – a concept closely related to that of a matrix. (3) It gives a method of finding that solution if it exists.

4.1 Matrix and Vectors

In general, a system of m linear equations in n variables $(x_1, x_2, \cdots, x_n)$ can be arranged in the following form:
$$a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = d_1$$
$$a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = d_2 \qquad (3.1)$$
$$\cdots$$
$$a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = d_m$$
where the double-subscripted symbol $a_{ij}$ represents the coefficient appearing in the ith equation and attached to the jth variable $x_j$, and $d_i$ represents the constant term in the ith equation.

Example 4.1.1 The two-commodity linear market model can be written, after eliminating the quantity variables, as a system of two linear equations:

$$c_1 P_1 + c_2 P_2 = -c_0$$
$$\gamma_1 P_1 + \gamma_2 P_2 = -\gamma_0$$

Matrix as Arrays

There are essentially three types of ingredients in the equation system (3.1). The first is the set of coefficients $a_{ij}$; the second is the set of variables $x_1, x_2, \cdots, x_n$; and the last is the set of constant terms $d_1, d_2, \cdots, d_m$. If we arrange the three sets as three rectangular arrays and label them, respectively, as A, x, and d, then we have
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad d = \begin{bmatrix} d_1 \\ d_2 \\ \vdots \\ d_m \end{bmatrix} \qquad (3.2)$$

Example 4.1.2 Given the linear-equation system

$$6 x_1 + 3 x_2 + x_3 = 22$$
$$x_1 + 4 x_2 - 2 x_3 = 12$$
$$4 x_1 - x_2 + 5 x_3 = 10$$
We can write
$$A = \begin{bmatrix} 6 & 3 & 1 \\ 1 & 4 & -2 \\ 4 & -1 & 5 \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}, \quad d = \begin{bmatrix} 22 \\ 12 \\ 10 \end{bmatrix}$$
Each of the three arrays given above constitutes a matrix.

A matrix is defined as a rectangular array of numbers, parameters, or variables. As a shorthand device, the array in matrix A can be written more simply as
$$A = [a_{ij}]_{m \times n} \quad (i = 1, 2, \cdots, m; \ j = 1, 2, \cdots, n)$$

Vectors as Special Matrices

The number of rows and the number of columns in a matrix together define the dimension of the matrix. For instance, A is said to be of dimension m × n. In the special case where m = n, the matrix is called a square matrix. If a matrix contains only one column (row), it is called a column (row) vector. For notation purposes, a row vector is often distinguished from a column vector by the use of a primed symbol:
$$x' = [x_1, x_2, \cdots, x_n]$$

Remark 4.1.1 A vector is merely an ordered n-tuple and as such it may be interpreted as a point in an n-dimensional space.

With the matrices defined in (3.2), we can express the equation system (3.1) simply as Ax = d.

However, the equation Ax = d prompts at least two questions. How do we multiply two matrices A and x? What is meant by the equality of Ax and d? Since matrices involve whole blocks of numbers, the familiar algebraic operations defined for single numbers are not directly applicable, and there is a need for a new set of operation rules.

22 4.2 Matrix Operations

The Equality of Two Matrices

A = B if and only if aij = bij for all i = 1, 2, ··· , n, j = 1, 2, ··· , m.

Addition and Subtraction of Matrices

A + B = [aij] + [bij] = [aij + bij] i.e., the addition of A and B is defined as the addition of each pair of corresponding elements.

Remark 4.2.1 Two matrices can be added (or equated) if and only if they have the same dimension.

Example 4.2.1
$$\begin{bmatrix} 4 & 9 \\ 2 & 1 \end{bmatrix} + \begin{bmatrix} 2 & 0 \\ 0 & 7 \end{bmatrix} = \begin{bmatrix} 6 & 9 \\ 2 & 8 \end{bmatrix}$$

Example 4.2.2
$$\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & a_{13} + b_{13} \\ a_{21} + b_{21} & a_{22} + b_{22} & a_{23} + b_{23} \end{bmatrix}$$

The Subtraction of Matrices:

A − B is defined by

[aij] − [bij] = [aij − bij].

Example 4.2.3
$$\begin{bmatrix} 19 & 3 \\ 2 & 0 \end{bmatrix} - \begin{bmatrix} 6 & 8 \\ 1 & 3 \end{bmatrix} = \begin{bmatrix} 13 & -5 \\ 1 & -3 \end{bmatrix}$$

23 Scalar Multiplication:

λA = λ[aij] = [λaij] i.e., to multiply a matrix by a number is to multiply every element of that matrix by the given scalar.

Example 4.2.4
$$7 \begin{bmatrix} 3 & -1 \\ 0 & 5 \end{bmatrix} = \begin{bmatrix} 21 & -7 \\ 0 & 35 \end{bmatrix}$$

Example 4.2.5
$$-1 \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = \begin{bmatrix} -a_{11} & -a_{12} & -a_{13} \\ -a_{21} & -a_{22} & -a_{23} \end{bmatrix}$$

Multiplication of Matrices:

Given two matrices $A_{m \times n}$ and $B_{p \times q}$, the conformability condition for the multiplication AB is that the column dimension of A must be equal to the row dimension of B, i.e., the matrix product AB will be defined if and only if n = p. If defined, the product AB = C will have the dimension m × q, with
$$c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj} = \sum_{l=1}^{n} a_{il} b_{lj}.$$

Example 4.2.6
$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{bmatrix}$$

Example 4.2.7
$$\begin{bmatrix} 3 & 5 \\ 4 & 6 \end{bmatrix} \begin{bmatrix} -1 & 0 \\ 4 & 7 \end{bmatrix} = \begin{bmatrix} -3 + 20 & 35 \\ -4 + 24 & 42 \end{bmatrix} = \begin{bmatrix} 17 & 35 \\ 20 & 42 \end{bmatrix}$$

Example 4.2.8 Let $u' = [u_1, u_2, \cdots, u_n]$ and $v' = [v_1, v_2, \cdots, v_n]$. Then
$$u'v = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = \sum_{i=1}^{n} u_i v_i.$$
This can be described by using the concept of the inner product of two vectors u and v:
$$u \cdot v = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n = u'v.$$

Example 4.2.9
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
$$Ax = \begin{bmatrix} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \\ \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n \end{bmatrix}$$

Example 4.2.10 Given $u = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and $v' = [1, 4, 5]$,
$$uv' = \begin{bmatrix} 3 \times 1 & 3 \times 4 & 3 \times 5 \\ 2 \times 1 & 2 \times 4 & 2 \times 5 \end{bmatrix} = \begin{bmatrix} 3 & 12 & 15 \\ 2 & 8 & 10 \end{bmatrix}$$

It is important to distinguish the meaning of uv′ (a matrix larger than 1 × 1) and u′v (a 1 × 1 matrix, or a scalar).
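The distinction between uv′ and u′v is easy to see numerically; the following sketch (an added illustration using numpy) reproduces Example 4.2.10 and contrasts it with an inner product:

```python
import numpy as np

u = np.array([3, 2])
v = np.array([1, 4, 5])

# u v' is a 2 x 3 matrix: the outer product of Example 4.2.10.
print(np.outer(u, v))   # [[ 3 12 15]
                        #  [ 2  8 10]]

# For two vectors of equal length, u'v is a 1 x 1 matrix, i.e., a scalar.
w = np.array([1, 4])
print(np.dot(u, w))     # 3*1 + 2*4 = 11
```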

4.3 Linear Dependence of Vectors

A set of vectors v1, ··· , vn is said to be linearly dependent if and only if one of them can be expressed as a linear combination of the remaining vectors; otherwise they are linearly independent.

Example 4.3.1 The three vectors
$$v_1 = \begin{bmatrix} 2 \\ 7 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 1 \\ 8 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 4 \\ 5 \end{bmatrix}$$
are linearly dependent since $v_3$ is a linear combination of $v_1$ and $v_2$:
$$3 v_1 - 2 v_2 = \begin{bmatrix} 6 \\ 21 \end{bmatrix} - \begin{bmatrix} 2 \\ 16 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix} = v_3,$$
or
$$3 v_1 - 2 v_2 - v_3 = 0,$$
where $0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ represents a zero vector.

Example 4.3.2 The three vectors
$$v_1 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 1 \\ 5 \end{bmatrix}$$
are linearly dependent since $v_1$ is a linear combination of $v_2$ and $v_3$:
$$v_1 = \frac{1}{2} v_2 + \frac{1}{2} v_3.$$

An equivalent definition of linear dependence is: a set of m-vectors $v_1, v_2, \cdots, v_n$ is linearly dependent if and only if there exists a set of scalars $k_1, k_2, \cdots, k_n$ (not all zero) such that
$$\sum_{i=1}^{n} k_i v_i = 0.$$

If this equation can be satisfied only when ki = 0 for all i, these vectors are linearly independent.
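Linear dependence can also be tested numerically by checking the rank of the matrix whose columns are the vectors; here is a minimal sketch (added for illustration) for Example 4.3.1:

```python
import numpy as np

# Columns are v1, v2, v3 from Example 4.3.1.
V = np.column_stack(([2, 7], [1, 8], [4, 5]))

# Rank 2 < 3 columns, so the three vectors are linearly dependent.
print(np.linalg.matrix_rank(V))   # 2
```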

4.4 Commutative, Associative, and Distributive Laws

The commutative and associative laws of matrix addition can be stated as follows: Commutative Law: A + B = B + A.

Proof: A + B = [aij] + [bij] = [aij + bij] = [bij + aij] = [bij] + [aij] = B + A

26 Associative Law: (A + B) + C = A + (B + C)

Proof: (A + B) + C = ([aij] + [bij]) + [cij] = [aij + bij] + [cij] = [aij + bij + cij] =

[aij + (bij + cij)] = [aij] + ([bij + cij]) = [aij] + ([bij] + [cij]) = A + (B + C)

Matrix Multiplication

Matrix multiplication is not commutative, that is

AB ≠ BA

Even when AB is defined, BA may not be; but even if both products are defined, AB = BA may not still hold.       1 2 0 −1 12 13 Example 4.4.1 Let A =  , B =  , then AB =  , but BA =   3 4 6 7 24 25 −3 −4  . 27 40

The scalar multiplication of a matrix does obey the commutative law:
$$kA = Ak$$
if k is a scalar.

Associative Law: (AB)C = A(BC) if A is m × n, B is n × p, and C is p × q.

Distributive Law:

A(B + C) = AB + AC [premultiplication by A]

(B + C)A = BA + CA [postmultiplication by A]

4.5 Identity Matrices and Null Matrices

An identity matrix is a square matrix with ones in its principal diagonal and zeros everywhere else. It is denoted by I or $I_n$, in which n indicates its dimension.

27 Fact 1: Given an m × n matrix A, we have

ImA = AIn = A

Fact 2:

Am×nInBn×p = (AI)B = AB

Fact 3:

$$(I_n)^k = I_n$$

Idempotent Matrices: A matrix A is said to be idempotent if AA = A. Null Matrices: A null–or zero matrix–denoted by 0, plays the role of the number 0. A null matrix is simply a matrix whose elements are all zero. Unlike I, the zero matrix is not restricted to being square. Null matrices obey the following rules of operation.

Am×n + 0m×n = Am×n

Am×n0n×p = 0m×p

0q×mAm×n = 0q×n

Remark 4.5.1 (a) CD = CE does not imply D = E. For instance,
$$C = \begin{bmatrix} 2 & 3 \\ 6 & 9 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}, \quad E = \begin{bmatrix} -2 & 1 \\ 3 & 2 \end{bmatrix}.$$
We can see that
$$CD = CE = \begin{bmatrix} 5 & 8 \\ 15 & 24 \end{bmatrix},$$
even though D ≠ E.

(b) Even if A ≠ 0 and B ≠ 0, we can still have AB = 0.

Example 4.5.1 $A = \begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix}$, $B = \begin{bmatrix} -2 & 4 \\ 1 & -2 \end{bmatrix}$.

28 4.6 Transposes and Inverses

The transpose of a matrix A is a matrix obtained by interchanging the rows and columns of A. We say that $B = [b_{ij}]_{n \times m}$ is the transpose of $A = [a_{ij}]_{m \times n}$ if $a_{ji} = b_{ij}$ for all i = 1, ···, n and j = 1, ···, m. Usually the transpose is denoted by A′ or $A^T$.

Recipe - How to Find the Transpose of a Matrix: The transpose A′ of A is obtained by making the columns of A into the rows of A′.

Example 4.6.1 For $A = \begin{bmatrix} 3 & 8 & -9 \\ 1 & 0 & 4 \end{bmatrix}$,
$$A' = \begin{bmatrix} 3 & 1 \\ 8 & 0 \\ -9 & 4 \end{bmatrix}.$$

Thus, by definition, if a matrix A is m × n, then its transpose A′ must be n × m.

Example 4.6.2
$$D = \begin{bmatrix} 1 & 0 & 4 \\ 0 & 3 & 7 \\ 4 & 7 & 2 \end{bmatrix}.$$
Its transpose is
$$D' = \begin{bmatrix} 1 & 0 & 4 \\ 0 & 3 & 7 \\ 4 & 7 & 2 \end{bmatrix} = D.$$

A matrix A is said to be symmetric if A′ = A. A matrix A is called anti-symmetric (or skew-symmetric) if A′ = −A. A matrix A is called orthogonal if A′A = I.

Properties of Transposes:

a) (A′)′ = A

b) (A + B)′ = A′ + B′

c) (αA)′ = αA′, where α is a real number.

d) (AB)′ = B′A′

29 The property d) states that the transpose of a product is the product of the transposes in reverse order.

Inverses and Their Properties

For a given square matrix A, A′ is always derivable. On the other hand, its inverse matrix may or may not exist. The inverse of A, denoted by A−1, is defined only if A is a square matrix, in which case the inverse is the matrix that satisfies the condition

AA−1 = A−1A = I

Remark 4.6.1 The following statements are true:

1. Not every square matrix has an inverse–squareness is a necessary but not sufficient condition for the existence of an inverse. If a square matrix A has an inverse, A is said to be nonsingular. If A possesses no inverse, it is said to be a singular matrix.

2. If A is nonsingular, then A and A−1 are inverse of each other, i.e., (A−1)−1 = A.

3. If A is n × n, then A−1 is also n × n.

4. The inverse of A is unique.

Proof. Let B and C both be inverses of A. Then

B = BI = BAC = IC = C.

5. AA−1 = I implies A−1A = I.

Proof. We need to show that if $AA^{-1} = I$, and if there is a matrix B such that BA = I, then $B = A^{-1}$. Postmultiplying both sides of BA = I by $A^{-1}$, we have $BAA^{-1} = A^{-1}$ and thus $B = A^{-1}$.

Example 4.6.3 Let $A = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}$ and $B = \frac{1}{6}\begin{bmatrix} 2 & -1 \\ 0 & 3 \end{bmatrix}$. Then
$$AB = \begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix} \cdot \frac{1}{6}\begin{bmatrix} 2 & -1 \\ 0 & 3 \end{bmatrix} = \frac{1}{6}\begin{bmatrix} 6 & 0 \\ 0 & 6 \end{bmatrix} = I.$$
So B is the inverse of A.

6. Suppose A and B are nonsingular matrices of dimension n × n. Then:
(a) $(AB)^{-1} = B^{-1} A^{-1}$;
(b) $(A')^{-1} = (A^{-1})'$.

Inverse Matrix and Solution of Linear-Equation System

The application of the concept of the inverse matrix to the solution of a simultaneous-equation system is immediate and direct. Consider Ax = d.

If A is a nonsingular matrix, then premultiplying both sides of Ax = d by $A^{-1}$, we have

A−1Ax = A−1d

So, x = A−1d is the solution of Ax = d and the solution is unique since A−1 is unique. Methods of testing the existence of the inverse and of its calculation will be discussed in the next chapter.
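As an added illustration, the system of Example 4.1.2 can be solved via the inverse:

```python
import numpy as np

A = np.array([[6.0, 3.0, 1.0],
              [1.0, 4.0, -2.0],
              [4.0, -1.0, 5.0]])
d = np.array([22.0, 12.0, 10.0])

x = np.linalg.inv(A) @ d   # x = A^{-1} d
# np.linalg.solve gives the same answer without forming A^{-1} explicitly,
# and is the preferred method in numerical work.
assert np.allclose(x, np.linalg.solve(A, d))
print(x)
```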

Chapter 5

Linear Models and Matrix Algebra (Continued)

In chapter 4, it was shown that a linear-equation system can be written in a compact notation. Furthermore, such an equation system can be solved by finding the inverse of the coefficient matrix, provided the inverse exists. This chapter studies how to test for the existence of the inverse and how to find that inverse.

5.1 Conditions for Nonsingularity of a Matrix

As was pointed out earlier, the squareness condition is necessary but not sufficient for the existence of the inverse A−1 of a matrix A.

Conditions for Nonsingularity

When the squareness condition is already met, a sufficient condition for the nonsingularity of a matrix is that its rows (or equivalently, its columns) are linearly independent. In fact, the necessary and sufficient conditions for nonsingularity are that the matrix satisfies the squareness and linear independence conditions.

An n × n coefficient matrix A can be considered an ordered set of row vectors:
$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = \begin{bmatrix} v_1' \\ v_2' \\ \vdots \\ v_n' \end{bmatrix},$$
where $v_i' = [a_{i1}, a_{i2}, \cdots, a_{in}]$, $i = 1, 2, \cdots, n$. The rows are linearly independent if and only if, for any set of scalars $k_i$, $\sum_{i=1}^{n} k_i v_i = 0$ implies $k_i = 0$ for all i.

Example 5.1.1 For the given matrix
$$\begin{bmatrix} 3 & 4 & 5 \\ 0 & 1 & 2 \\ 6 & 8 & 10 \end{bmatrix},$$
since $v_3' = 2 v_1' + 0 v_2'$, the matrix is singular.

Example 5.1.2 $B = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ is nonsingular.

Example 5.1.3 $C = \begin{bmatrix} -2 & 1 \\ 6 & -3 \end{bmatrix}$ is singular.

Rank of a Matrix

Even though the concept of row independence has been discussed only with regard to square matrices, it is equally applicable to any m × n rectangular matrix. If the maximum number of linearly independent rows that can be found in such a matrix is γ, then the matrix is said to be of rank γ. The rank also tells us the maximum number of linearly independent columns in the matrix. The rank of an m × n matrix can be at most m or n, whichever is smaller. By definition, an n × n nonsingular matrix A has n linearly independent rows (or columns); consequently it must be of rank n. Conversely, an n × n matrix having rank n must be nonsingular.

33 5.2 Test of Nonsingularity by Use of Determinant

To determine whether a square matrix is nonsingular, we can make use of the concept of determinant.

Determinant and Nonsingularity

The determinant of a square matrix A, denoted by |A|, is a uniquely defined scalar associated with that matrix. Determinants are defined only for square matrices. For a 2 × 2 matrix
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},$$
its determinant is defined as follows:
$$|A| = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{12} a_{21}.$$
In view of the dimension of matrix A, |A| as defined above is called a second-order determinant.

Example 5.2.1 Given $A = \begin{bmatrix} 10 & 4 \\ 8 & 5 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & 5 \\ 0 & -1 \end{bmatrix}$, then
$$|A| = \begin{vmatrix} 10 & 4 \\ 8 & 5 \end{vmatrix} = 50 - 32 = 18, \qquad |B| = \begin{vmatrix} 3 & 5 \\ 0 & -1 \end{vmatrix} = -3 - 5 \times 0 = -3.$$

Example 5.2.2 $A = \begin{bmatrix} 2 & 6 \\ 8 & 24 \end{bmatrix}$. Then its determinant
$$|A| = \begin{vmatrix} 2 & 6 \\ 8 & 24 \end{vmatrix} = 2 \times 24 - 6 \times 8 = 48 - 48 = 0.$$

This example shows that the determinant is equal to zero if its rows are linearly dependent. As will be seen, the value of a determinant |A| can serve not only as a criterion for testing the linear independence of the rows (hence the nonsingularity) of matrix A but also as an input in the calculation of the inverse $A^{-1}$, if it exists.
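These two determinants can be recomputed numerically (an added illustration):

```python
import numpy as np

print(np.linalg.det(np.array([[10.0, 4.0], [8.0, 5.0]])))  # 18.0
print(np.linalg.det(np.array([[2.0, 6.0], [8.0, 24.0]])))  # 0.0 (up to rounding): singular
```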

34 Evaluating a Third-Order Determinant

For a 3 × 3 matrix A, its determinant has the value
$$|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$$
$$= a_{11} a_{22} a_{33} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} - a_{13} a_{22} a_{31}$$

We can also calculate a third-order determinant by cross-diagonal multiplication: add the products along the three rightward diagonals and subtract the products along the three leftward diagonals, as the following examples illustrate (diagram omitted).

Example 5.2.3
$$\begin{vmatrix} 2 & 1 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{vmatrix} = 2 \times 5 \times 9 + 1 \times 6 \times 7 + 4 \times 8 \times 3 - 3 \times 5 \times 7 - 1 \times 4 \times 9 - 6 \times 8 \times 2$$
$$= 90 + 42 + 96 - 105 - 36 - 96 = -9$$

Example 5.2.4
$$\begin{vmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \\ 6 & 7 & 8 \end{vmatrix} = 0 \times 4 \times 8 + 1 \times 5 \times 6 + 3 \times 7 \times 2 - 2 \times 4 \times 6 - 1 \times 3 \times 8 - 5 \times 7 \times 0$$
$$= 0 + 30 + 42 - 48 - 24 - 0 = 0$$

Example 5.2.5
$$\begin{vmatrix} -1 & 2 & 1 \\ 0 & 3 & 2 \\ 1 & 0 & 2 \end{vmatrix} = (-1) \times 3 \times 2 + 2 \times 2 \times 1 + 0 \times 0 \times 1 - 1 \times 3 \times 1 - 2 \times 0 \times (-1) - 2 \times 0 \times 2$$
$$= -6 + 4 + 0 - 3 - 0 - 0 = -5$$

The method of cross-diagonal multiplication provides a handy way of evaluating a third-order determinant, but unfortunately it is not applicable to determinants of orders higher than 3. For the latter, we must resort to the so-called “Laplace expansion” of the determinant.

Evaluating an nth-Order Determinant by Laplace Expansion

The minor of the element $a_{ij}$ of a determinant |A|, denoted by $|M_{ij}|$, can be obtained by deleting the ith row and jth column of the determinant |A|. For example, the minors of $a_{11}$, $a_{12}$, and $a_{13}$ are
$$|M_{11}| = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}, \quad |M_{12}| = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}, \quad |M_{13}| = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.$$
A concept closely related to the minor is that of the cofactor. A cofactor, denoted by $|C_{ij}|$, is a minor with a prescribed algebraic sign attached to it. Formally, it is defined by
$$|C_{ij}| = (-1)^{i+j} |M_{ij}| = \begin{cases} -|M_{ij}| & \text{if } i + j \text{ is odd}; \\ |M_{ij}| & \text{if } i + j \text{ is even}. \end{cases}$$

Thus, if the sum of the two subscripts i and j in $M_{ij}$ is even, then $|C_{ij}| = |M_{ij}|$. If it is odd, then $|C_{ij}| = -|M_{ij}|$. Using these new concepts, we can express a third-order determinant as
$$|A| = a_{11}|M_{11}| - a_{12}|M_{12}| + a_{13}|M_{13}| = a_{11}|C_{11}| + a_{12}|C_{12}| + a_{13}|C_{13}|.$$

The Laplace expansion of a third-order determinant serves to reduce the evaluation problem to one of evaluating only certain second-order determinants. In general, the Laplace expansion of an nth-order determinant will reduce the problem to one of evaluating n cofactors, each of which is of the (n − 1)th order, and the repeated application of the process will methodically lead to lower and lower orders of determinants, eventually culminating in the basic second-order determinants. Then the value of the original determinant can be easily calculated. Formally, the value of a determinant |A| of order n can be found by the Laplace expansion of any row or any column as follows:
$$|A| = \sum_{j=1}^{n} a_{ij} |C_{ij}| \quad \text{[expansion by the ith row]}$$
$$= \sum_{i=1}^{n} a_{ij} |C_{ij}| \quad \text{[expansion by the jth column]}$$
Even though one can expand |A| by any row or any column, as far as numerical calculation is concerned, a row or column with the largest number of 0's or 1's is always preferable for this purpose, because a 0 times its cofactor is simply 0.
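The Laplace expansion translates directly into a recursive procedure. The sketch below (an added teaching illustration; practical code should use np.linalg.det, which relies on LU factorization) expands along the first row:

```python
import numpy as np

def det_laplace(A):
    """Determinant by Laplace expansion along the first row (exponential time)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor |M_1j|: delete row 1 and column j (0-based indices here).
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        # (-1)**j equals the cofactor sign (-1)**(1 + (j+1)) in 1-based notation.
        total += (-1) ** j * A[0, j] * det_laplace(minor)
    return total

A = np.array([[2.0, 1.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
print(det_laplace(A))   # -9.0, matching Example 5.2.3
```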

Example 5.2.6 For
$$|A| = \begin{vmatrix} 5 & 6 & 1 \\ 2 & 3 & 0 \\ 7 & -3 & 0 \end{vmatrix},$$
the easiest way to expand the determinant is by the third column, which consists of the elements 1, 0, and 0. Thus,
$$|A| = 1 \times (-1)^{1+3} \begin{vmatrix} 2 & 3 \\ 7 & -3 \end{vmatrix} = -6 - 21 = -27.$$

Example 5.2.7
$$\begin{vmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 2 & 0 \\ 0 & 3 & 0 & 0 \\ 4 & 0 & 0 & 0 \end{vmatrix} = 1 \times (-1)^{1+4} \begin{vmatrix} 0 & 0 & 2 \\ 0 & 3 & 0 \\ 4 & 0 & 0 \end{vmatrix} = -1 \times (-24) = 24$$

Example 5.2.8
$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{vmatrix} = a_{11} \times (-1)^{1+1} \begin{vmatrix} a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{vmatrix}$$
$$= a_{11} \times a_{22} \times (-1)^{1+1} \begin{vmatrix} a_{33} & a_{34} & \cdots & a_{3n} \\ 0 & a_{44} & \cdots & a_{4n} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{vmatrix} = \cdots = a_{11} \times a_{22} \times \cdots \times a_{nn}$$

5.3 Basic Properties of Determinants

Property I. The determinant of a matrix A has the same value as that of its transpose A′, i.e., |A| = |A′|.

Example 5.3.1
$$|A| = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc, \qquad |A'| = \begin{vmatrix} a & c \\ b & d \end{vmatrix} = ad - bc = |A|.$$

Property II. The interchange of any two rows (or any two columns) will alter the sign, but not the numerical value of the determinant.

Example 5.3.2 $\begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$, but the interchange of the two rows yields
$$\begin{vmatrix} c & d \\ a & b \end{vmatrix} = bc - ad = -(ad - bc).$$

Property III. The multiplication of any one row (or one column) by a scalar k will change the value of the determinant k-fold, i.e.,
$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ k a_{i1} & k a_{i2} & \cdots & k a_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = k \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{i1} & a_{i2} & \cdots & a_{in} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = k|A|.$$
In contrast, the factoring of a matrix requires the presence of a common divisor for all its elements, as in
$$k \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = \begin{bmatrix} k a_{11} & k a_{12} & \cdots & k a_{1n} \\ k a_{21} & k a_{22} & \cdots & k a_{2n} \\ \vdots & \vdots & & \vdots \\ k a_{n1} & k a_{n2} & \cdots & k a_{nn} \end{bmatrix}.$$

Property IV. The addition (subtraction) of a multiple of any row (or column) to (from) another row (or column) will leave the value of the determinant unaltered.

Example 5.3.3
$$\begin{vmatrix} a & b \\ c + ka & d + kb \end{vmatrix} = a(d + kb) - b(c + ka) = ad - bc = \begin{vmatrix} a & b \\ c & d \end{vmatrix}.$$

Example 5.3.4
$$\begin{vmatrix} a & b & b & b \\ b & a & b & b \\ b & b & a & b \\ b & b & b & a \end{vmatrix} = (a + 3b)(a - b)^3.$$

Example 5.3.5
$$\begin{vmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \\ 3 & 4 & 1 & 2 \\ 4 & 1 & 2 & 3 \end{vmatrix} = 160.$$

39 Property V. If one row (or column) is a multiple of another row (or column), the value of the determinant will be zero.

Example 5.3.6
$$\begin{vmatrix} ka & kb \\ a & b \end{vmatrix} = kab - kab = 0.$$

Example 5.3.7 (a fourth-order numerical determinant whose entries are garbled in this copy; its stated value is 312)

Remark: Property V is a logical consequence of Property IV. In addition, if A and B are both square matrices, then |AB| = |A||B|.

The basic properties just discussed are useful in several ways. For one thing, they can be of great help in simplifying the task of evaluating determinants. By subtracting multiples of one row (or column) from another, for instance, the elements of the determinant may be reduced to much simpler numbers. If we can indeed apply these properties to transform some row or column into a form containing mostly 0's or 1's, Laplace expansion of the determinant will become a much more manageable task.

Recipe - How to Calculate the Determinant:

1. The multiplication of any one row (or column) by a scalar k will change the value of the determinant k-fold.

2. The interchange of any two rows (columns) will alter the sign but not the numerical value of the determinant.

3. If a multiple of any row is added to (or subtracted from) any other row it will not change the value or the sign of the determinant. The same holds true for columns (i.e. the determinant is not affected by linear operations with rows (or columns)).

4. If two rows (or columns) are proportional, i.e., they are linearly dependent, then the determinant will vanish.

40 5. The determinant of a triangular matrix is a product of its principal diagonal elements.

Using these rules, we can simplify the matrix (e.g. obtain as many zero elements as possible) and then apply Laplace expansion.

Determinantal Criterion for Nonsingularity

Our present concern is primarily to link the linear dependence of rows with the vanishing of a determinant. By property I, we can easily see that row independence is equivalent to column independence. Given a linear-equation system Ax = d, where A is an n × n coefficient matrix, we have

|A| ≠ 0 ⇔ A is row (or column) independent
⇔ rank(A) = n
⇔ A is nonsingular
⇔ $A^{-1}$ exists
⇔ a unique solution $\bar{x} = A^{-1} d$ exists.

Thus the value of the determinant of A provides a convenient criterion for testing the nonsingularity of matrix A and the existence of a unique solution to the equation system Ax = d.

Rank of a Matrix Redefined

The rank of a matrix A was earlier defined to be the maximum number of linearly inde- pendent rows in A. In view of the link between row independence and the nonvanishing of the determinant, we can redefine the rank of an m×n matrix as the maximum order of a nonvanishing determinant that can be constructed from the rows and columns of that matrix. The rank of any matrix is a unique number. Obviously, the rank can at most be m or n for a m×n matrix A, whichever is smaller, because a determinant is defined only for a square matrix. Symbolically, this fact can be expressed as follows: γ(A) ≤ min{m, n}.

41    1 3 2     Example 5.3.8 rank 2 6 4 = 2. −5 7 1

The first two rows are linearly dependent, but the last two are independent, therefore the maximum number of linearly independent rows is equal to 2.

Properties of the rank:

1) The column rank and the row rank of a matrix are equal.

2) rank(AB) ≤ min{rank(A); rank(B)}.

3) rank(A) = rank(AA′) = rank(A′A).

5.4 Finding the Inverse Matrix

If the matrix A in the linear-equation system Ax = d is nonsingular, then A−1 exists, and the solution of the system will bex ¯ = A−1d. We have learned to test the nonsingularity of A by the criterion |A| ̸= 0. The next question is how we can find the inverse A−1 if A does pass that test.

Expansion of a Determinant by Alien Cofactors

We have known that the value of a determinant |A| of order n can be found by the Laplace expansion of any row or any column as follows:
$$|A| = \sum_{j=1}^{n} a_{ij} |C_{ij}| \quad \text{[expansion by the ith row]}$$
$$= \sum_{i=1}^{n} a_{ij} |C_{ij}| \quad \text{[expansion by the jth column]}$$
Now what happens if we replace $a_{ij}$ by $a_{i'j}$ for $i' \neq i$, or by $a_{ij'}$ for $j' \neq j$? We then have the following important property of determinants.

Property VI. The expansion of a determinant by alien cofactors (the cofactors of a "wrong" row or column) always yields a value of zero. That is, we have
$$\sum_{j=1}^{n} a_{i'j} |C_{ij}| = 0 \quad (i' \neq i) \quad \text{[expansion by the } i'\text{th row and use of cofactors of the ith row]}$$
$$\sum_{i=1}^{n} a_{ij'} |C_{ij}| = 0 \quad (j' \neq j) \quad \text{[expansion by the } j'\text{th column and use of cofactors of the jth column]}$$
The reason for this outcome lies in the fact that the above formula can be considered as the regular expansion of a determinant in which the ith row (jth column) and the i′th row (j′th column) are identical; such a determinant vanishes by Property V.

Example 5.4.1 For the determinant

$$|A| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$

consider another determinant

$$|A^*| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{11} & a_{12} & a_{13} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}$$

in which the second row of |A| has been replaced by a copy of its first row. If we expand |A*| by the second row, then, since |A*| has two identical rows, we have

$$0 = |A^*| = a_{11}|C_{21}| + a_{12}|C_{22}| + a_{13}|C_{23}| = \sum_{j=1}^{3} a_{1j}|C_{2j}|,$$

which is precisely an expansion of |A| by alien cofactors: the elements of the first row paired with the cofactors of the second.

Matrix Inversion

Property VI is useful for finding the inverse of a matrix. For an n × n matrix A:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$

Since each element of A has a cofactor |Cij|, we can form a matrix of cofactors by replacing each element aij with its cofactor |Cij|. Such a cofactor matrix C = [|Cij|] is also n × n. For our present purpose, however, the transpose of C is of more interest. This transpose C′ is commonly referred to as the adjoint of A and is denoted by adj A. Thus,

$$C' \equiv \operatorname{adj} A \equiv \begin{pmatrix} |C_{11}| & |C_{21}| & \cdots & |C_{n1}| \\ |C_{12}| & |C_{22}| & \cdots & |C_{n2}| \\ \vdots & \vdots & & \vdots \\ |C_{1n}| & |C_{2n}| & \cdots & |C_{nn}| \end{pmatrix}$$

By utilizing the formula for the Laplace expansion and Property VI, we have

$$AC' = \begin{pmatrix} \sum_{j=1}^{n} a_{1j}|C_{1j}| & \sum_{j=1}^{n} a_{1j}|C_{2j}| & \cdots & \sum_{j=1}^{n} a_{1j}|C_{nj}| \\ \sum_{j=1}^{n} a_{2j}|C_{1j}| & \sum_{j=1}^{n} a_{2j}|C_{2j}| & \cdots & \sum_{j=1}^{n} a_{2j}|C_{nj}| \\ \vdots & \vdots & & \vdots \\ \sum_{j=1}^{n} a_{nj}|C_{1j}| & \sum_{j=1}^{n} a_{nj}|C_{2j}| & \cdots & \sum_{j=1}^{n} a_{nj}|C_{nj}| \end{pmatrix} = \begin{pmatrix} |A| & 0 & \cdots & 0 \\ 0 & |A| & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & |A| \end{pmatrix} = |A|\,I_n$$

Thus, dividing by the nonzero scalar |A| and using the uniqueness of the inverse of A, we know

$$A^{-1} = \frac{C'}{|A|} = \frac{\operatorname{adj} A}{|A|}$$

Now we have found a way to invert the matrix A.

Remark 5.4.1 The general procedures for finding the inverse of a square matrix A are:

(1) find |A|;

(2) find the cofactors of all elements of A and form C = [|Cij|];

(3) form C′ to get adj A; and

(4) determine A⁻¹ = adj A / |A|.

In particular, for a 2 × 2 matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, we have the following formula:

$$A^{-1} = \frac{\operatorname{adj} A}{|A|} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$

This is a very useful formula.

Example 5.4.2 Let $A = \begin{pmatrix} 3 & 2 \\ 1 & 0 \end{pmatrix}$. The inverse of A is given by

$$A^{-1} = \frac{1}{-2}\begin{pmatrix} 0 & -2 \\ -1 & 3 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ \frac{1}{2} & -\frac{3}{2} \end{pmatrix}$$

Example 5.4.3 Let $A = \begin{pmatrix} 2 & 4 & 5 \\ 0 & 3 & 0 \\ 1 & 0 & 1 \end{pmatrix}$. Then |A| = −9 and

$$A^{-1} = -\frac{1}{9}\begin{pmatrix} 3 & -4 & -15 \\ 0 & -3 & 0 \\ -3 & 4 & 6 \end{pmatrix}$$

Example 5.4.4 Find the inverse of $B = \begin{pmatrix} 4 & 1 & -1 \\ 0 & 3 & 2 \\ 3 & 0 & 7 \end{pmatrix}$.

Since |B| = 99 ≠ 0, B⁻¹ exists. The cofactor matrix is

$$C = \begin{pmatrix} \begin{vmatrix} 3 & 2 \\ 0 & 7 \end{vmatrix} & -\begin{vmatrix} 0 & 2 \\ 3 & 7 \end{vmatrix} & \begin{vmatrix} 0 & 3 \\ 3 & 0 \end{vmatrix} \\ -\begin{vmatrix} 1 & -1 \\ 0 & 7 \end{vmatrix} & \begin{vmatrix} 4 & -1 \\ 3 & 7 \end{vmatrix} & -\begin{vmatrix} 4 & 1 \\ 3 & 0 \end{vmatrix} \\ \begin{vmatrix} 1 & -1 \\ 3 & 2 \end{vmatrix} & -\begin{vmatrix} 4 & -1 \\ 0 & 2 \end{vmatrix} & \begin{vmatrix} 4 & 1 \\ 0 & 3 \end{vmatrix} \end{pmatrix} = \begin{pmatrix} 21 & 6 & -9 \\ -7 & 31 & 3 \\ 5 & -8 & 12 \end{pmatrix}$$

Then

$$\operatorname{adj} B = \begin{pmatrix} 21 & -7 & 5 \\ 6 & 31 & -8 \\ -9 & 3 & 12 \end{pmatrix}$$

Hence

$$B^{-1} = \frac{1}{99}\begin{pmatrix} 21 & -7 & 5 \\ 6 & 31 & -8 \\ -9 & 3 & 12 \end{pmatrix}$$

5.5 Cramer’s Rule

The method of matrix inversion just discussed enables us to derive a convenient way of solving a linear-equation system, known as Cramer’s rule.

Derivation of the Rule

Given an equation system Ax = d, the solution can be written as

$$\bar{x} = A^{-1}d = \frac{1}{|A|}(\operatorname{adj} A)\,d$$

provided A is nonsingular. Thus,

$$\bar{x} = \frac{1}{|A|}\begin{pmatrix} |C_{11}| & |C_{21}| & \cdots & |C_{n1}| \\ |C_{12}| & |C_{22}| & \cdots & |C_{n2}| \\ \vdots & \vdots & & \vdots \\ |C_{1n}| & |C_{2n}| & \cdots & |C_{nn}| \end{pmatrix}\begin{pmatrix} d_1 \\ d_2 \\ \vdots \\ d_n \end{pmatrix} = \frac{1}{|A|}\begin{pmatrix} \sum_{i=1}^{n} d_i|C_{i1}| \\ \sum_{i=1}^{n} d_i|C_{i2}| \\ \vdots \\ \sum_{i=1}^{n} d_i|C_{in}| \end{pmatrix}$$

That is, the jth element x̄_j is given by

$$\bar{x}_j = \frac{1}{|A|}\sum_{i=1}^{n} d_i|C_{ij}| = \frac{1}{|A|}\begin{vmatrix} a_{11} & a_{12} & \cdots & d_1 & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & d_2 & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & d_n & \cdots & a_{nn} \end{vmatrix} = \frac{|A_j|}{|A|}$$

where |A_j| is obtained by replacing the jth column of |A| with the constant terms d_1, ···, d_n. This result is the statement of Cramer's rule.

Example 5.5.1 Let us solve      

$$\begin{pmatrix} 2 & 3 \\ 4 & -1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 12 \\ 10 \end{pmatrix}$$

for x1, x2 using Cramer’s rule:

|A| = −14, |A_1| = −42, |A_2| = −28,

therefore

$$\bar{x}_1 = \frac{-42}{-14} = 3, \qquad \bar{x}_2 = \frac{-28}{-14} = 2.$$

Example 5.5.2

5x1 + 3x2 = 30

6x1 − 2x2 = 8

$$|A| = \begin{vmatrix} 5 & 3 \\ 6 & -2 \end{vmatrix} = -28, \qquad |A_1| = \begin{vmatrix} 30 & 3 \\ 8 & -2 \end{vmatrix} = -84, \qquad |A_2| = \begin{vmatrix} 5 & 30 \\ 6 & 8 \end{vmatrix} = -140.$$

Therefore, by Cramer's rule, we have

$$\bar{x}_1 = \frac{|A_1|}{|A|} = \frac{-84}{-28} = 3 \quad\text{and}\quad \bar{x}_2 = \frac{|A_2|}{|A|} = \frac{-140}{-28} = 5.$$

Example 5.5.3

x1 + x2 + x3 = 0

12x1 + 2x2 − 3x3 = 5

3x1 + 4x2 + x3 = −4

$$\begin{pmatrix} 1 & 1 & 1 \\ 12 & 2 & -3 \\ 3 & 4 & 1 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 5 \\ -4 \end{pmatrix}$$

|A| = 35, |A3| = 35, x3 = 1.

Example 5.5.4

7x1 − x2 − x3 = 0

10x1 − 2x2 + x3 = 8

6x1 + 3x2 − 2x3 = 7

|A| = −61, |A1| = −61, |A2| = −183, |A3| = −244.

x̄_1 = 1, x̄_2 = 3, x̄_3 = 4.
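Cramer's rule is straightforward to program. The following sketch (assuming NumPy; `cramer` is an illustrative helper, not a library function) reproduces Example 5.5.4:

```python
import numpy as np

def cramer(A, d):
    # x_j = |A_j| / |A|, where A_j has its jth column replaced by d.
    det_A = np.linalg.det(A)
    x = np.empty(len(d))
    for j in range(len(d)):
        A_j = A.copy()
        A_j[:, j] = d
        x[j] = np.linalg.det(A_j) / det_A
    return x

A = np.array([[ 7., -1., -1.],
              [10., -2.,  1.],
              [ 6.,  3., -2.]])
d = np.array([0., 8., 7.])
print(np.round(cramer(A, d)))  # -> [1. 3. 4.]
```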

Note on Homogeneous-Equation System

A linear-equation system Ax = d is said to be a homogeneous-equation system if d = 0, i.e., if Ax = 0. If |A| ≠ 0, then x̄ = 0 is the unique solution of Ax = 0, since x̄ = A⁻¹0 = 0. This is a "trivial solution." Thus, the only way to get a nontrivial solution from the homogeneous-equation system is to have |A| = 0, i.e., A is singular. In this case, Cramer's rule is not applicable. Of course, this does not mean that we cannot obtain solutions; it means only that we cannot get a unique solution. In fact, the system then has an infinite number of solutions.

Example 5.5.5

a11x1 + a12x2 = 0

a21x1 + a22x2 = 0

If |A| = 0, then its rows are linearly dependent. As a result, one of the two equations is redundant. By deleting, say, the second equation, we end up with one equation in two variables. The solutions are

$$\bar{x}_1 = -\frac{a_{12}}{a_{11}}\bar{x}_2 \qquad \text{if } a_{11} \neq 0,$$

with x̄_2 arbitrary.

Overview of Solution Outcomes for a Linear-Equation System

For a linear-equation system Ax = d, our discussion can be summarized as follows: a necessary and sufficient condition for the existence of a solution to Ax = d is that the rank of A and the rank of the augmented matrix [A, d] are the same, i.e., r(A) = r([A, d]).

5.6 Application to Market and National-Income Models

Market Model:

The two-commodity model described in chapter 3 can be written as follows:

c1P1 + c2P2 = −c0

γ1P1 + γ2P2 = −γ0

Thus

$$|A| = \begin{vmatrix} c_1 & c_2 \\ \gamma_1 & \gamma_2 \end{vmatrix} = c_1\gamma_2 - c_2\gamma_1, \qquad |A_1| = \begin{vmatrix} -c_0 & c_2 \\ -\gamma_0 & \gamma_2 \end{vmatrix} = c_2\gamma_0 - c_0\gamma_2, \qquad |A_2| = \begin{vmatrix} c_1 & -c_0 \\ \gamma_1 & -\gamma_0 \end{vmatrix} = c_0\gamma_1 - c_1\gamma_0.$$

Thus the equilibrium is given by

$$\bar{P}_1 = \frac{|A_1|}{|A|} = \frac{c_2\gamma_0 - c_0\gamma_2}{c_1\gamma_2 - c_2\gamma_1} \quad\text{and}\quad \bar{P}_2 = \frac{|A_2|}{|A|} = \frac{c_0\gamma_1 - c_1\gamma_0}{c_1\gamma_2 - c_2\gamma_1}$$

General Market Equilibrium Model:

$$D_1 = 5 - 2P_1 + P_2 + P_3, \qquad S_1 = -4 + 3P_1 + 2P_2$$

$$D_2 = 6 + 2P_1 - 3P_2 + P_3, \qquad S_2 = 3 + 2P_2$$

$$D_3 = 20 + P_1 + 2P_2 - 4P_3, \qquad S_3 = 3 + P_2 + 3P_3$$

where P_i is the price of good i, i = 1, 2, 3.

The equilibrium conditions are D_i = S_i, i = 1, 2, 3, that is,

$$\begin{cases} 5P_1 + P_2 - P_3 = 9 \\ -2P_1 + 5P_2 - P_3 = 3 \\ -P_1 - P_2 + 7P_3 = 17 \end{cases}$$

This system of linear equations can be solved via Cramer’s rule

$$P_1 = \frac{|A_1|}{|A|} = \frac{356}{178} = 2, \qquad P_2 = \frac{|A_2|}{|A|} = \frac{356}{178} = 2, \qquad P_3 = \frac{|A_3|}{|A|} = \frac{534}{178} = 3.$$

National-Income Model

Y = C + I0 + G0

C = a + bY (a > 0, 0 < b < 1)

These can be rearranged into the form

Y − C = I0 + G0

−bY + C = a

While we can solve Ȳ and C̄ by Cramer's rule, here we solve this model by inverting the coefficient matrix. Since

$$A = \begin{pmatrix} 1 & -1 \\ -b & 1 \end{pmatrix}, \qquad A^{-1} = \frac{1}{1-b}\begin{pmatrix} 1 & 1 \\ b & 1 \end{pmatrix},$$

we have

$$\begin{pmatrix} \bar{Y} \\ \bar{C} \end{pmatrix} = \frac{1}{1-b}\begin{pmatrix} 1 & 1 \\ b & 1 \end{pmatrix}\begin{pmatrix} I_0 + G_0 \\ a \end{pmatrix} = \frac{1}{1-b}\begin{pmatrix} I_0 + G_0 + a \\ b(I_0 + G_0) + a \end{pmatrix}$$

5.7 Quadratic Forms

Quadratic Forms

A function q of n variables is said to be a quadratic form if it can be written as

$$\begin{aligned} q(u_1, u_2, \cdots, u_n) ={}& d_{11}u_1^2 + 2d_{12}u_1u_2 + \cdots + 2d_{1n}u_1u_n \\ &+ d_{22}u_2^2 + 2d_{23}u_2u_3 + \cdots + 2d_{2n}u_2u_n \\ &+ \cdots + d_{nn}u_n^2 \end{aligned}$$

If we let d_{ji} = d_{ij} for i < j, then q(u_1, u_2, ···, u_n) can be written as

$$\begin{aligned} q(u_1, u_2, \cdots, u_n) &= d_{11}u_1^2 + d_{12}u_1u_2 + \cdots + d_{1n}u_1u_n \\ &\quad + d_{21}u_2u_1 + d_{22}u_2^2 + \cdots + d_{2n}u_2u_n \\ &\quad + \cdots + d_{n1}u_nu_1 + d_{n2}u_nu_2 + \cdots + d_{nn}u_n^2 \\ &= \sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}u_iu_j = u'Du \end{aligned}$$

where

$$D = \begin{pmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \vdots & \vdots & & \vdots \\ d_{n1} & d_{n2} & \cdots & d_{nn} \end{pmatrix}$$

which is called the quadratic-form matrix.

Since dij = dji, D is a symmetric matrix.

Example 5.7.1 A quadratic form in two variables: q = d_{11}u_1² + d_{12}u_1u_2 + d_{22}u_2².

Positive and Negative Definiteness:

′ A quadratic form q(u1, u2, ··· , un) = u Du is said to be

(a) positive definite (PD) if q(u) > 0 for all u ≠ 0;

(b) positive semidefinite (PSD) if q(u) ≥ 0 for all u ≠ 0;

(c) negative definite (ND) if q(u) < 0 for all u ≠ 0;

(d) negative semidefinite (NSD) if q(u) ≤ 0 for all u ≠ 0.

Otherwise q is called indefinite (ID). Sometimes, we say that a matrix D is, for instance, positive definite if the correspond- ing quadratic form q(u) = u′Du is positive definite.

Example 5.7.2 q = u_1² + u_2² is PD, q = (u_1 + u_2)² is PSD, q = u_1² − u_2² is ID.

Determinantal Test for Sign Definiteness:

We state without proof that, for the quadratic form q(u) = u'Du, the necessary and sufficient condition for positive definiteness is that the leading principal minors of |D| all be positive, namely,

$$|D_1| = d_{11} > 0, \qquad |D_2| = \begin{vmatrix} d_{11} & d_{12} \\ d_{21} & d_{22} \end{vmatrix} > 0, \qquad \cdots, \qquad |D_n| = \begin{vmatrix} d_{11} & d_{12} & \cdots & d_{1n} \\ d_{21} & d_{22} & \cdots & d_{2n} \\ \vdots & \vdots & & \vdots \\ d_{n1} & d_{n2} & \cdots & d_{nn} \end{vmatrix} > 0.$$

The corresponding necessary and sufficient condition for negative definiteness is that the leading principal minors alternate in sign as follows:

|D1| < 0, |D2| > 0, |D3| < 0, etc.

Two-Variable Quadratic Form

Example 5.7.3 Is q = 5u² + 3uv + 2v² positive or negative definite? The symmetric matrix is

$$D = \begin{pmatrix} 5 & 1.5 \\ 1.5 & 2 \end{pmatrix}.$$

Since the leading principal minors of |D| are |D_1| = 5 > 0 and

$$|D_2| = \begin{vmatrix} 5 & 1.5 \\ 1.5 & 2 \end{vmatrix} = 10 - 2.25 = 7.75 > 0,$$

q is positive definite.

Three-Variable Quadratic Form

Example 5.7.4 Determine whether q = u_1² + 6u_2² + 3u_3² − 2u_1u_2 − 4u_2u_3 is positive or negative definite. The matrix D corresponding to this quadratic form is

$$D = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 6 & -2 \\ 0 & -2 & 3 \end{pmatrix}$$

and the leading principal minors of |D| are |D_1| = 1 > 0,

$$|D_2| = \begin{vmatrix} 1 & -1 \\ -1 & 6 \end{vmatrix} = 6 - 1 = 5 > 0,$$

and

$$|D_3| = \begin{vmatrix} 1 & -1 & 0 \\ -1 & 6 & -2 \\ 0 & -2 & 3 \end{vmatrix} = 11 > 0.$$

Thus, the quadratic form is positive definite.

Example 5.7.5 Determine whether q = −3u_1² − 3u_2² − 5u_3² − 2u_1u_2 is positive or negative definite. The matrix D corresponding to this quadratic form is

$$D = \begin{pmatrix} -3 & -1 & 0 \\ -1 & -3 & 0 \\ 0 & 0 & -5 \end{pmatrix}$$

Leading principal minors of D are

|D1| = −3 < 0, |D2| = 8 > 0, |D3| = −40 < 0,

therefore, the quadratic form is negative definite.
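The determinantal test is also easy to compute directly. A minimal sketch (assuming NumPy; `leading_principal_minors` is our own helper) applied to the matrix of Example 5.7.5:

```python
import numpy as np

def leading_principal_minors(D):
    # |D_1|, |D_2|, ..., |D_n| from the upper-left k-by-k submatrices.
    return [np.linalg.det(D[:k, :k]) for k in range(1, D.shape[0] + 1)]

D = np.array([[-3., -1.,  0.],
              [-1., -3.,  0.],
              [ 0.,  0., -5.]])
# Signs alternate -, +, -, so the form is negative definite.
print(np.round(leading_principal_minors(D)))  # -> [ -3.   8. -40.]
```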

5.8 Eigenvalues and Eigenvectors

Consider the matrix equation: Dx = λx.

Any number λ such that the equation Dx = λx has a non-zero vector-solution x is called an eigenvalue (or a characteristic root) of the above equation. Any non-zero vector x satisfying the above equation is called an eigenvector (or characteristic vector) of D for the eigenvalue λ. Recipe - How to calculate eigenvalues: From Dx = λx, we have (D − λI)x = 0.

Since x is non-zero, the determinant of (D − λI) should vanish. Therefore all eigen- values can be calculated as roots of the equation (which is often called the characteristic equation or the characteristic polynomial of D)

|D − λI| = 0

Example 5.8.1 Let

$$D = \begin{pmatrix} 3 & -1 & 0 \\ -1 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}$$

Then

$$|D - \lambda I| = \begin{vmatrix} 3-\lambda & -1 & 0 \\ -1 & 3-\lambda & 0 \\ 0 & 0 & 5-\lambda \end{vmatrix} = (5-\lambda)(\lambda-2)(\lambda-4) = 0,$$

therefore the eigenvalues are λ1 = 2, λ2 = 4, λ3 = 5.

Properties of Eigenvalues:

A quadratic form q(u_1, u_2, ···, u_n) = u'Du is

positive definite if and only if eigenvalues λi > 0 for all i = 1, 2, ··· , n.

negative definite if and only if eigenvalues λi < 0 for all i = 1, 2, ··· , n.

positive semidefinite if and only if eigenvalues λi ≥ 0 for all i = 1, 2, ··· , n.

negative semidefinite if and only if eigenvalues λi ≤ 0 for all i = 1, 2, ··· , n.

A form is indefinite if it has at least one positive and one negative eigenvalue. A matrix A is diagonalizable if there exists a nonsingular matrix P and a diagonal matrix D such that P⁻¹AP = D.

A matrix U is an orthogonal matrix if U' = U⁻¹.

Theorem 5.8.1 (The Spectral Theorem for Symmetric Matrices) Suppose that A is a symmetric matrix of order n and λ_1, ···, λ_n are its eigenvalues. Then there exists an orthogonal matrix U such that

$$U^{-1}AU = \begin{pmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{pmatrix}.$$

Usually, U is the normalized matrix formed by eigenvectors. It has the property U ′U = I. “Normalized” means that for any column u of the matrix U, u′u = 1.

Example 5.8.2 Diagonalize the matrix

$$A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}$$

First, we need to find the eigenvalues:

$$\begin{vmatrix} 1-\lambda & 2 \\ 2 & 4-\lambda \end{vmatrix} = \lambda(\lambda - 5) = 0,$$

i.e., λ_1 = 0 and λ_2 = 5.

For λ_1 = 0 we solve

$$\begin{pmatrix} 1-0 & 2 \\ 2 & 4-0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

The eigenvector corresponding to λ_1 = 0 is v_1 = C_1 · (2, −1)', where C_1 is an arbitrary real constant. Similarly, for λ_2 = 5, v_2 = C_2 · (1, 2)'.

Let us normalize the eigenvectors, i.e., pick the constants C_i such that v_i'v_i = 1. We get

$$v_1 = \left(\frac{2}{\sqrt{5}}, \frac{-1}{\sqrt{5}}\right)', \qquad v_2 = \left(\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right)'.$$

Thus the diagonalization matrix U is

$$U = \begin{pmatrix} \frac{2}{\sqrt{5}} & \frac{1}{\sqrt{5}} \\ \frac{-1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{pmatrix}$$

You can easily check that

$$U^{-1}AU = \begin{pmatrix} 0 & 0 \\ 0 & 5 \end{pmatrix}$$
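This diagonalization can be checked with NumPy's `eigh`, which handles symmetric matrices and returns an orthogonal matrix of normalized eigenvectors (a sketch; the columns of U may differ from the hand computation by a sign, which does not affect the result):

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.]])
lam, U = np.linalg.eigh(A)         # eigenvalues in ascending order
print(np.round(lam, 10))           # -> [0. 5.]
# U is orthogonal, so U^{-1} = U'; this reproduces diag(0, 5).
print(np.round(U.T @ A @ U, 10))
```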

The trace of a square matrix of order n is the sum of the n elements on its principal diagonal, i.e., $\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}$.

Properties of the Trace:

1) tr(A) = λ1 + ··· + λn;

2) if A and B are of the same order, tr(A + B) = tr(A) + tr(B);

3) if a is a scalar, tr(aA) = atr(A);

4) tr(AB) = tr(BA), whenever AB is square;

5) tr(A') = tr(A);

6) $\operatorname{tr}(A'A) = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}^2$.

5.9 Vector Spaces

A (real) vector space is a nonempty set V of objects together with an additive operation + : V × V → V , +(u, v) = u + v and a scalar multiplicative operation · : R × V → V , ·(a, u) = au which satisfies the following axioms for any u, v, w ∈ V and any a, b ∈ R (R is the set of all real numbers):

1. (u + v) + w = u + (v + w);

2. u + v = v + u;

3. 0 + u = u;

4. u + (−u) = 0;

5. a(u + v) = au + av;

6. (a + b)u = au + bu;

7. a(bu) = (ab)u;

8. 1u = u.

The objects of a vector space V are called vectors; the operations + and · are called vector addition and scalar multiplication, respectively. The element 0 ∈ V is the zero vector and −v is the additive inverse of v.

Example 5.9.1 (The n-Dimensional Vector Space Rn)

Define R^n = {(u_1, u_2, ···, u_n)' | u_i ∈ R, i = 1, ···, n} (the apostrophe denotes the transpose). Consider u, v ∈ R^n, u = (u_1, u_2, ···, u_n)', v = (v_1, v_2, ···, v_n)', and a ∈ R. Define the additive operation and the scalar multiplication as follows:

u + v = (u_1 + v_1, ···, u_n + v_n)',

au = (au_1, ···, au_n)'.

It is not difficult to verify that Rn together with these operations is a vector space.

Let V be a vector space. An inner product or scalar product in V is a function s : V × V → R, s(u, v) = u · v which satisfies the following properties:

1. u · v = v · u,

2. u · (v + w) = u · v + u · w,

3. a(u · v) = (au) · v = u · (av),

4. u · u ≥ 0 and u · u = 0 iff u = 0.

Example 5.9.2 Let u, v ∈ R^n, u = (u_1, u_2, ···, u_n)', v = (v_1, v_2, ···, v_n)'. Then u · v = u_1v_1 + ··· + u_nv_n.

Let V be a vector space and v ∈ V. The norm or magnitude of v is the function ||·|| : V → R defined as ||v|| = √(v · v). For any u, v ∈ V and any a ∈ R, we have the following properties:

1. ||au|| = |a|||u||;

2. (Triangle inequality) ||u + v|| ≤ ||u|| + ||v||;

3. (Schwarz inequality) |u · v| ≤ ||u|| × ||v||.

The nonzero vectors u and v are parallel if there exists a ∈ R such that u = av. The vectors u and v are orthogonal or perpendicular if their scalar product is zero, that is, if u · v = 0.

The angle between vectors u and v is arccos( (u · v) / (||u|| ||v||) ). A nonempty subset S of a vector space V is a subspace of V if for any u, v ∈ S and a ∈ R, we have u + v ∈ S and au ∈ S.

Example 5.9.3 V is a subspace of itself; {0} is also a subspace of V. These two are called the trivial subspaces.

Example 5.9.4 L = {(x, y) | y = mx}, where m ∈ R, is a subspace of R². (A line y = mx + n with n ≠ 0 does not contain the origin and hence is not a subspace.)

Let u1, u2, ··· , uk be vectors in a vector space V . The set S of all linear combinations of these vectors

S = {a_1u_1 + a_2u_2 + ··· + a_ku_k | a_1, ···, a_k ∈ R} is called the subspace generated or spanned by the vectors u_1, u_2, ···, u_k and is denoted sp(u_1, u_2, ···, u_k). One can prove that S is a subspace of V.

Example 5.9.5 Let u_1 = (2, −1, 1)', u_2 = (3, 4, 0)'. Then the subspace of R³ generated by u_1 and u_2 is sp(u_1, u_2) = {(2a + 3b, −a + 4b, a)' | a, b ∈ R}.

As we discussed in Chapter 4, a set of vectors {u_1, u_2, ···, u_k} in a vector space V is linearly dependent if there exist real numbers a_1, a_2, ···, a_k, not all zero, such that a_1u_1 + a_2u_2 + ··· + a_ku_k = 0. In other words, a set of vectors in a vector space is linearly dependent if and only if one vector can be written as a linear combination of the others.

A set of vectors {u1, u2, ··· , uk} in a vector space V is linearly independent if it is not linearly dependent.

Properties: Let {u_1, u_2, ···, u_n} be n vectors in R^n. The following conditions are equivalent:

i) The vectors are independent.

ii) The matrix having these vectors as columns is nonsingular.

iii) The vectors generate Rn.

A set of vectors {u1, u2, ··· , uk} in V is a basis for V if it, first, generates V , and, second, is linearly independent.

Example 5.9.6 Consider the following vectors in R^n: e_i = (0, ···, 0, 1, 0, ···, 0)', where the 1 is in the ith position, i = 1, ···, n. The set E_n = {e_1, e_2, ···, e_n} forms a basis for R^n, which is called the standard basis.

Let V be a vector space and B = {u_1, u_2, ···, u_n} a basis for V. Since B generates V, for any u ∈ V there exist real numbers x_1, x_2, ···, x_n such that u = x_1u_1 + ··· + x_nu_n. The column vector x = (x_1, x_2, ···, x_n)' is called the vector of coordinates of u with respect to B.

Example 5.9.7 Consider the vector space R^n with the standard basis E_n. For any u = (u_1, ···, u_n)' we can represent u as u = u_1e_1 + ··· + u_ne_n; therefore, (u_1, ···, u_n)' is the vector of coordinates of u with respect to E_n.

Example 5.9.8 Consider the vector space R². Let us find the coordinate vector of (−1, 2)' with respect to the basis B = {(1, 1)', (2, −3)'} (i.e., find (−1, 2)'_B). We have to solve for a, b such that (−1, 2)' = a(1, 1)' + b(2, −3)'. Solving the system a + 2b = −1 and a − 3b = 2, we find a = 1/5, b = −3/5. Thus, (−1, 2)'_B = (1/5, −3/5)'.
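Finding a coordinate vector amounts to solving a linear system whose coefficient columns are the basis vectors. A sketch of Example 5.9.8 (assuming NumPy):

```python
import numpy as np

# Columns are the basis vectors (1, 1)' and (2, -3)'.
M = np.array([[1.,  2.],
              [1., -3.]])
u = np.array([-1., 2.])
print(np.linalg.solve(M, u))  # -> [ 0.2 -0.6], i.e. (1/5, -3/5)'
```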

The dimension of a vector space V, denoted dim(V), is the number of elements in any basis for V.

Example 5.9.9 The dimension of the vector space R^n with the standard basis E_n is n.

Let U, V be two vector spaces. A linear transformation of U into V is a mapping T : U → V such that for any u, v ∈ U and any a, b ∈ R

T (au + bv) = aT (u) + bT (v).

Example 5.9.10 Let A be an m × n real matrix. The mapping T : R^n → R^m defined by T(u) = Au is a linear transformation.

Properties:

Let U and V be two vector spaces, B = (b1, ··· , bn) a basis for U and C = (c1, ··· , cm) a basis for V .

1. Any linear transformation T can be represented by an m × n matrix A_T whose ith column is the coordinate vector of T(b_i) relative to C.

2. If x = (x_1, ···, x_n)' is the coordinate vector of u ∈ U relative to B and y = (y_1, ···, y_m)' is the coordinate vector of T(u) relative to C, then T defines the following transformation of coordinates:

y = A_T x for any u ∈ U.

The matrix AT is called the matrix representation of T relative to bases B,C. Remark. Any linear transformation is uniquely determined by a transformation of coordinates.

Example 5.9.11 Consider the linear transformation T : R³ → R², T((x, y, z)') = (x − 2y, x + z)' and bases B = {(1, 1, 1)', (1, 1, 0)', (1, 0, 0)'} for R³ and C = {(1, 1)', (1, 0)'} for R². How can we find the matrix representation of T relative to the bases B, C? We have

T((1, 1, 1)') = (−1, 2)', T((1, 1, 0)') = (−1, 1)', T((1, 0, 0)') = (1, 1)'.

The columns of A_T are formed by the coordinate vectors of T((1, 1, 1)'), T((1, 1, 0)'), T((1, 0, 0)') relative to C. Applying the procedure developed in Example 5.9.8 we find

$$A_T = \begin{pmatrix} 2 & 1 & 1 \\ -3 & -2 & 0 \end{pmatrix}$$
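The construction of A_T can be automated: apply T to each vector of B and solve for its coordinates relative to C. A minimal sketch (assuming NumPy; the lambda `T` mirrors the transformation of the example):

```python
import numpy as np

T = lambda v: np.array([v[0] - 2*v[1], v[0] + v[2]])  # T((x,y,z)') = (x-2y, x+z)'
B = [np.array([1., 1., 1.]), np.array([1., 1., 0.]), np.array([1., 0., 0.])]
C = np.column_stack([[1., 1.], [1., 0.]])             # basis C as columns

# Each column of A_T is the coordinate vector of T(b_i) relative to C.
A_T = np.column_stack([np.linalg.solve(C, T(b)) for b in B])
print(A_T)  # -> [[ 2.  1.  1.]
            #     [-3. -2.  0.]]
```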

Let V be a vector space of dimension n, B and C two bases for V, and I : V → V the identity transformation (I(v) = v for all v ∈ V). The change-of-basis matrix D relative to B, C is the matrix representation of I relative to B, C.

Example 5.9.12 For u ∈ V, let x = (x_1, ···, x_n)' be the coordinate vector of u relative to B and y = (y_1, ···, y_n)' the coordinate vector of u relative to C. If D is the change-of-basis matrix relative to B, C, then y = Dx. The change-of-basis matrix relative to C, B is D⁻¹.

Example 5.9.13 Given the following bases for R²: B = {(1, 1)', (1, 0)'} and C = {(0, 1)', (1, 1)'}, find the change-of-basis matrix D relative to B, C. The columns of D are the coordinate vectors of (1, 1)', (1, 0)' relative to C. Following Example 5.9.8 we find

$$D = \begin{pmatrix} 0 & -1 \\ 1 & 1 \end{pmatrix}.$$

Chapter 6

Comparative Statics and the Concept of Derivative

6.1 The Nature of Comparative Statics

Comparative statics is concerned with the comparison of different equilibrium states that are associated with different sets of values of parameters and exogenous variables. When the value of some parameter or exogenous variable that is associated with an initial equilibrium changes, we can get a new equilibrium. The question posed in comparative-static analysis is then: How would the new equilibrium compare with the old? It should be noted that in comparative-static analysis we are not concerned with the process of adjustment of the variables; we merely compare the initial (prechange) equilibrium state with the final (postchange) equilibrium state. We also preclude the possibility of instability of equilibrium, for we assume the equilibrium to be attainable. It should be clear that the problem under consideration is essentially one of finding a rate of change: the rate of change of the equilibrium value of an endogenous variable with respect to the change in a particular parameter or exogenous variable. For this reason, the mathematical concept of derivative takes on preponderant significance in comparative statics.

62 6.2 Rate of Change and the Derivative

We want to study the rate of change of any variable y in response to a change in another variable x, where the two variables are related to each other by the function

y = f(x)

Applied in the comparative static context, the variable y will represent the equilibrium value of an endogenous variable, and x will be some parameter.

The Difference Quotient

We use the symbol ∆ to denote the change from one point, say x0, to another point, say x1. Thus ∆x = x1 − x0. When x changes from an initial value x0 to a new value x0 + ∆x, the value of the function y = f(x) changes from f(x0) to f(x0 + ∆x). The change in y per unit of change in x can be represented by the difference quotient.

$$\frac{\Delta y}{\Delta x} = \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}$$

Example 6.2.1 y = f(x) = 3x² − 4. Then f(x_0) = 3x_0² − 4, f(x_0 + ∆x) = 3(x_0 + ∆x)² − 4, and thus

$$\frac{\Delta y}{\Delta x} = \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x} = \frac{3(x_0 + \Delta x)^2 - 4 - (3x_0^2 - 4)}{\Delta x} = \frac{6x_0\Delta x + 3(\Delta x)^2}{\Delta x} = 6x_0 + 3\Delta x$$

The Derivative

Frequently, we are interested in the rate of change of y when ∆x is very small. In particular, we want to know the rate ∆y/∆x as ∆x approaches zero. If, as ∆x → 0, the limit of the difference quotient ∆y/∆x exists, that limit is called the derivative of the function y = f(x), and the derivative is denoted by

$$\frac{dy}{dx} \equiv y' \equiv f'(x) \equiv \lim_{\Delta x\to 0}\frac{\Delta y}{\Delta x}$$

63 Remark: Several points should be noted about the derivative: (1) a derivative is a function. Whereas the difference quotient is a function of x0 and ∆x, the derivative is a function of x0 only; and (2) since the derivative is merely a limit of the difference quotient, it must also be of necessity a measure of some rate of change. Since ∆x → 0, the rate measured by the derivative is in the nature of an instantaneous rate of change.

Example 6.2.2 Referring to the function y = 3x² − 4 again: since ∆y/∆x = 6x + 3∆x, we have dy/dx = 6x.
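The convergence of the difference quotient to the derivative can be watched numerically. A sketch in plain Python using the function of Example 6.2.2:

```python
def diff_quotient(f, x0, dx):
    # (f(x0 + dx) - f(x0)) / dx, the difference quotient of the text
    return (f(x0 + dx) - f(x0)) / dx

f = lambda x: 3 * x**2 - 4
for dx in (1.0, 0.1, 0.001):
    # equals 6*x0 + 3*dx exactly, so it approaches 6*x0 = 18 as dx -> 0
    print(diff_quotient(f, 3.0, dx))  # 21.0, 18.3..., 18.003...
```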

6.3 The Derivative and the Slope of a Curve

Elementary economics tells us that, given a total-cost function C = f(Q), where C is the total cost and Q is the output, the marginal cost MC is defined as MC = ∆C/∆Q. If Q is a continuous variable, ∆Q will refer to an infinitesimal change. It is well known that MC can be measured by the slope of the total-cost curve. But the slope of the total-cost curve is nothing but the limit of the ratio ∆C/∆Q as ∆Q → 0. Thus the concept of the slope of a curve is merely the geometric counterpart of the concept of the derivative.

6.4 The Concept of Limit

In the above, we defined the derivative of a function y = f(x) as the limit of ∆y/∆x as ∆x → 0. We now study the concept of limit itself. For a given function q = g(v), the concept of limit is concerned with the question: What value does q approach as v approaches a specific value? That is, as v → N (where N can be any number, say N = 0, N = +∞, or −∞), what happens to q = g(v)? When we say v → N, the variable v can approach the number N either from values greater than N or from values less than N. If, as v → N from the left side (from values less than N), q approaches a finite number L, we call L the left-side limit of q. Similarly, we call L the right-side limit of q if v → N from the right side. The left-side limit and right-side limit of q are denoted by lim_{v→N⁻} q and lim_{v→N⁺} q, respectively. The limit of q at N is said to exist if

$$\lim_{v\to N^-} q = \lim_{v\to N^+} q$$

and is denoted by lim_{v→N} q. Note that L must be a finite number. If we have the situation of lim_{v→N} q = ∞ (or −∞), we shall consider q to possess no limit, or an "infinite limit." It is important to realize that the symbol ∞ is not a number, and therefore it cannot be subjected to the usual algebraic operations.

Graphical Illustrations

There are several possible situations regarding the limit of a function; they are illustrated in the diagrams accompanying these notes.

Evaluation of a limit

Let us now illustrate the algebraic evaluation of a limit of a given function q = g(v).

Example 6.4.1 Given q = 2 + v², find lim_{v→0} q. It is clear that lim_{v→0⁻} q = 2 and lim_{v→0⁺} q = 2, since v² → 0 as v → 0. Thus lim_{v→0} q = 2.

Note that, in evaluating lim_{v→N} q, we only let v tend to N but, as a rule, do not set v = N. Indeed, sometimes N is not even in the domain of the function q = g(v).

Example 6.4.2 q = (1 − v²)/(1 − v). For this function, N = 1 is not in the domain of the function, and we cannot set v = 1 since that would involve division by zero. Moreover, even the limit-evaluation procedure of letting v → 1 will cause difficulty since (1 − v) → 0 as v → 1. One way out of this difficulty is to transform the given ratio to a form in which v does not appear in the denominator. Since

$$q = \frac{1 - v^2}{1 - v} = \frac{(1 - v)(1 + v)}{1 - v} = 1 + v \qquad (v \neq 1)$$

and v → 1 implies v ≠ 1 and (1 + v) → 2 as v → 1, we have lim_{v→1} q = 2.

Example 6.4.3 Find lim_{v→∞} (2v + 5)/(v + 1). Since

$$\frac{2v+5}{v+1} = \frac{2(v+1)+3}{v+1} = 2 + \frac{3}{v+1}$$

and lim_{v→∞} 3/(v + 1) = 0, we have lim_{v→∞} (2v + 5)/(v + 1) = 2.

Formal View of the Limit Concept

Definition of the Limit

The number L is said to be the limit of q = g(v) as v approaches N if, for every neighborhood of L, there can be found a corresponding neighborhood of N (excluding the point v = N) in the domain of the function such that, for every value of v in that neighborhood, its image lies in the chosen L-neighborhood. Here a neighborhood of a point L is an open interval defined by

$$(L - a_1, L + a_2) = \{q \mid L - a_1 < q < L + a_2\} \qquad \text{for } a_1, a_2 > 0.$$

67 6.5 Inequality and Absolute Values

Rules of Inequalities

Transitivity: a > b and b > c implies a > c; a ≥ b and b ≥ c implies a ≥ c.

Addition and Subtraction: a > b ⟹ a ± k > b ± k; a ≥ b ⟹ a ± k ≥ b ± k.

Multiplication and Division: a > b ⟹ ka > kb (k > 0); a > b ⟹ ka < kb (k < 0).

Squaring: a > b with b ≥ 0 ⟹ a² > b².

Absolute Values and Inequalities

For any real number n, the absolute value of n is defined and denoted by

$$|n| = \begin{cases} n & \text{if } n > 0 \\ -n & \text{if } n < 0 \\ 0 & \text{if } n = 0 \end{cases}$$

Thus we can write |x| < n equivalently as −n < x < n (n > 0). Also, |x| ≤ n if and only if −n ≤ x ≤ n (n > 0). The following properties characterize absolute values:

1) |m| + |n| ≥ |m + n|;

2) |m| · |n| = |m · n|;

3) |m| / |n| = |m / n| (n ≠ 0).

Solution of an Inequality

Example 6.5.1 Find the solution of the inequality 3x − 3 > x + 1. By adding (3 − x) to both sides, we have 3x − 3 + 3 − x > x + 1 + 3 − x.

68 Thus, 2x > 4 so x > 2.

Example 6.5.2 Solve the inequality |1 − x| ≤ 3. From |1 − x| ≤ 3, we have −3 ≤ 1 − x ≤ 3, or −4 ≤ −x ≤ 2. Thus, 4 ≥ x ≥ −2, i.e., −2 ≤ x ≤ 4.

6.6 Limit Theorems

Theorems Involving a Single Equation

Theorem I: If q = av + b, then lim_{v→N} q = aN + b.

Theorem II: If q = g(v) = b, then lim_{v→N} q = b.

Theorem III: lim_{v→N} v^k = N^k.

Example 6.6.1 Given q = 5v + 7, lim_{v→2} q = 5 · 2 + 7 = 17.

Example 6.6.2 q = v³. Find lim_{v→2} q. By Theorem III, lim_{v→2} q = 2³ = 8.

Theorems Involving Two Functions

For two functions q1 = g(v) and q2 = h(v), if limv→N q1 = L1, limv→N q2 = L2, then we have the following theorems:

Theorem IV: limv→N (q1 + q2) = L1 + L2.

Theorem V: limv→N (q1q2) = L1L2.

Theorem VI: lim_{v→N} (q_1/q_2) = L_1/L_2 (L_2 ≠ 0).

Example 6.6.3 Find lim_{v→0} (1 + v)/(2 + v). Since lim_{v→0}(1 + v) = 1 and lim_{v→0}(2 + v) = 2, we have lim_{v→0} (1 + v)/(2 + v) = 1/2.

Remark. Note that L1 and L2 represent finite numbers; otherwise theorems do not apply.

Limit of a Polynomial Function

$$\lim_{v\to N}\left(a_0 + a_1v + a_2v^2 + \cdots + a_nv^n\right) = a_0 + a_1N + a_2N^2 + \cdots + a_nN^n$$

69 6.7 Continuity and Differentiability of a Function

Continuity of a Function

A function q = g(v) is said to be continuous at N if limv→N q exists and limv→N g(v) = g(N). Thus the term continuous involves no less than three requirements: (1) the point N must be in the domain of the function; (2) limv→N g(v) exists; and (3) limv→N g(v) = g(N). Remark: It is important to note that while – in discussing the limit of a function – the point (N,L) is excluded from consideration, we are no longer excluding it in the present context. Rather, as the third requirement specifically states, the point (N,L) must be on the graph of the function before the function can be considered as continuous at point N.

Polynomial and Rational Functions

From the discussion of the limit of polynomial function, we know that the limit exists and equals the value of the function at N. Since N is a point in the domain of the function, we can conclude that any polynomial function is continuous in its domain. By those theorems involving two functions, we also know any rational function is continuous in its domain.

Example 6.7.1 q = 4v²/(v² + 1). Then

$$\lim_{v\to N}\frac{4v^2}{v^2+1} = \frac{\lim_{v\to N} 4v^2}{\lim_{v\to N}(v^2+1)} = \frac{4N^2}{N^2+1}.$$

Example 6.7.2 The rational function

$$q = \frac{v^3 + v^2 - 4v - 4}{v^2 - 4}$$

is not defined at v = 2 and v = −2. Since v = 2, −2 are not in the domain, the function is discontinuous at v = 2 and v = −2, despite the fact that its limit exists as v → 2 or −2.

70 Differentiability of a Function

By the definition of the derivative of a function y = f(x) at x_0, we know that f'(x_0) exists if and only if the limit of ∆y/∆x exists at x = x_0 as ∆x → 0, i.e.,

$$f'(x_0) = \lim_{\Delta x\to 0}\frac{\Delta y}{\Delta x} \equiv \lim_{\Delta x\to 0}\frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x} \qquad \text{(differentiability condition)}$$

On the other hand, the function y = f(x) is continuous at x_0 if and only if

$$\lim_{x\to x_0} f(x) = f(x_0) \qquad \text{(continuity condition)}$$

We want to know the relationship between the continuity and differentiability of a function. We now show that the continuity of f is a necessary condition for its differentiability; but it is not sufficient.

Since the notation x → x_0 implies x ≠ x_0, so that x − x_0 is a nonzero number, it is permissible to write the following identity:

$$f(x) - f(x_0) = \frac{f(x) - f(x_0)}{x - x_0}\,(x - x_0)$$

Taking the limit of each side of the above equation as x → x_0 yields the following results:

$$\text{Left side} = \lim_{x\to x_0}\left(f(x) - f(x_0)\right) = \lim_{x\to x_0} f(x) - f(x_0)$$

$$\text{Right side} = \lim_{x\to x_0}\frac{f(x) - f(x_0)}{x - x_0}\,\lim_{x\to x_0}(x - x_0) = f'(x_0)\cdot 0 = 0$$

Thus lim_{x→x_0} f(x) − f(x_0) = 0, so lim_{x→x_0} f(x) = f(x_0), which means f(x) is continuous at x = x_0. Although differentiability implies continuity, the converse is not true. That is, continuity is a necessary, but not sufficient, condition for differentiability. The following example shows this fact.

Example 6.7.3 f(x) = |x|.

This function is clearly continuous at x = 0. Now we show that it is not differentiable at x = 0. This involves the demonstration of a disparity between the left-side and the right-side limits. In considering the right-side limit, x > 0; thus

$$\lim_{x\to 0^+}\frac{f(x) - f(0)}{x - 0} = \lim_{x\to 0^+}\frac{x}{x} = \lim_{x\to 0^+} 1 = 1$$

On the other hand, in considering the left-side limit, x < 0; thus

$$\lim_{x\to 0^-}\frac{f(x) - f(0)}{x - 0} = \lim_{x\to 0^-}\frac{|x|}{x} = \lim_{x\to 0^-}\frac{-x}{x} = \lim_{x\to 0^-}(-1) = -1$$

Thus, lim_{x→0} ∆y/∆x does not exist, which means the derivative of y = |x| does not exist at x = 0.

Chapter 7

Rules of Differentiation and Their Use in Comparative Statics

The central problem of comparative-static analysis, that of finding a rate of change, can be identified with the problem of finding the derivative of some function y = f(x), provided only a small change in x is being considered. Before going into comparative-static models, we begin with some rules of differentiation.

7.1 Rules of Differentiation for a Function of One Variable

Constant-Function Rule

If y = f(x) = c, where c is a constant, then

$$\frac{dy}{dx} \equiv y' \equiv f' = 0$$

Proof.

$$\frac{dy}{dx} = \lim_{x'\to x}\frac{f(x') - f(x)}{x' - x} = \lim_{x'\to x}\frac{c - c}{x' - x} = 0$$

We can also write dy/dx = df/dx as

$$\frac{d}{dx}y = \frac{d}{dx}f,$$

so we may consider d/dx as an operator symbol.

73 Power-Function Rule

If y = f(x) = x^a, where a is any real number (−∞ < a < ∞), then

$$\frac{d}{dx}f(x) = ax^{a-1}.$$

Remark 7.1.1 (i) If a = 0, then

$$\frac{d}{dx}x^0 = \frac{d}{dx}1 = 0.$$

(ii) If a = 1, then y = x, and thus dx/dx = 1.

For simplicity, we prove this rule only for the case where a = n, with n any positive integer. Since

$$x^n - x_0^n = (x - x_0)\left(x^{n-1} + x_0x^{n-2} + x_0^2x^{n-3} + \cdots + x_0^{n-1}\right),$$

then

$$\frac{x^n - x_0^n}{x - x_0} = x^{n-1} + x_0x^{n-2} + x_0^2x^{n-3} + \cdots + x_0^{n-1}.$$

Thus,

$$f'(x_0) = \lim_{x\to x_0}\frac{f(x) - f(x_0)}{x - x_0} = \lim_{x\to x_0}\frac{x^n - x_0^n}{x - x_0} = \lim_{x\to x_0}\left(x^{n-1} + x_0x^{n-2} + \cdots + x_0^{n-1}\right) = x_0^{n-1} + x_0^{n-1} + \cdots + x_0^{n-1} = nx_0^{n-1}$$

Example 7.1.1 Suppose y = f(x) = x⁻³. Then y' = −3x⁻⁴.

Example 7.1.2 Suppose y = f(x) = √x = x^{1/2}. Then y' = (1/2)x^{−1/2}. In particular, f'(2) = (1/2) · 2^{−1/2} = √2/4.

Power-Function Rule Generalized

If the function is given by y = cx^a, then

$$\frac{dy}{dx} = \frac{df}{dx} = acx^{a-1}.$$

Example 7.1.3 Suppose y = 2x. Then dy/dx = 2x⁰ = 2.

Example 7.1.4 Suppose y = 4x³. Then dy/dx = 4 · 3x² = 12x².

Example 7.1.5 Suppose y = 3x⁻². Then dy/dx = −6x⁻³.

Some Special Rules:

f(x) = constant ⇒ f'(x) = 0

f(x) = x^a (a constant) ⇒ f'(x) = ax^{a−1}

f(x) = e^x ⇒ f'(x) = e^x

f(x) = a^x (a > 0) ⇒ f'(x) = a^x ln a

f(x) = ln x ⇒ f'(x) = 1/x

f(x) = log_a x (a > 0, a ≠ 1) ⇒ f'(x) = (1/x) log_a e = 1/(x ln a)

f(x) = sin x ⇒ f'(x) = cos x

f(x) = cos x ⇒ f'(x) = −sin x

f(x) = tan x ⇒ f'(x) = 1/cos² x

f(x) = cot x ⇒ f'(x) = −1/sin² x

f(x) = arcsin x ⇒ f'(x) = 1/√(1 − x²)

f(x) = arccos x ⇒ f'(x) = −1/√(1 − x²)

f(x) = arctan x ⇒ f'(x) = 1/(1 + x²)

f(x) = arccot x ⇒ f'(x) = −1/(1 + x²)

7.2 Rules of Differentiation Involving Two or More Functions of the Same Variable

Let f(x) and g(x) be two differentiable functions. We have the following rules:

Sum-Difference Rule:

$$\frac{d}{dx}\left[f(x) \pm g(x)\right] = \frac{d}{dx}f(x) \pm \frac{d}{dx}g(x) = f'(x) \pm g'(x)$$

This rule can easily be extended to more functions:

$$\frac{d}{dx}\left[\sum_{i=1}^{n} f_i(x)\right] = \sum_{i=1}^{n}\frac{d}{dx}f_i(x) = \sum_{i=1}^{n} f_i'(x)$$

Example 7.2.1 $\frac{d}{dx}(ax^2 + bx + c) = 2ax + b$.

Example 7.2.2 Suppose a short-run total-cost function is given by C = Q³ − 4Q² + 10Q + 75. Then the marginal-cost function is the limit of the quotient ∆C/∆Q, or the derivative of the C function: dC/dQ = 3Q² − 8Q + 10.

In general, if a primitive function y = f(x) represents a total function, then the derivative function dy/dx is its marginal function. Since the derivative of a function is the slope of its curve, the marginal function should show the slope of the curve of the total function at each point x. Sometimes, we say a function is smooth if its derivative is continuous.

Product Rule:

$$\frac{d}{dx}\left[f(x)g(x)\right] = f(x)\frac{d}{dx}g(x) + g(x)\frac{d}{dx}f(x) = f(x)g'(x) + g(x)f'(x)$$

Proof.

$$\begin{aligned} \frac{d}{dx}\left[f(x_0)g(x_0)\right] &= \lim_{x\to x_0}\frac{f(x)g(x) - f(x_0)g(x_0)}{x - x_0} \\ &= \lim_{x\to x_0}\frac{f(x)g(x) - f(x)g(x_0) + f(x)g(x_0) - f(x_0)g(x_0)}{x - x_0} \\ &= \lim_{x\to x_0}\frac{f(x)[g(x) - g(x_0)] + g(x_0)[f(x) - f(x_0)]}{x - x_0} \\ &= \lim_{x\to x_0} f(x)\frac{g(x) - g(x_0)}{x - x_0} + \lim_{x\to x_0} g(x_0)\frac{f(x) - f(x_0)}{x - x_0} \\ &= f(x_0)g'(x_0) + g(x_0)f'(x_0) \end{aligned}$$

Since this is true for any x = x0, this proves the rule.

Example 7.2.3 Suppose y = (2x + 3)(3x²). Let f(x) = 2x + 3 and g(x) = 3x². Then f'(x) = 2, g'(x) = 6x. Hence,

$$\frac{d}{dx}\left[(2x+3)(3x^2)\right] = (2x+3)\cdot 6x + 3x^2\cdot 2 = 12x^2 + 18x + 6x^2 = 18x^2 + 18x$$

As an extension of the rule to the case of three functions, we have

$$\frac{d}{dx}\left[f(x)g(x)h(x)\right] = f'(x)g(x)h(x) + f(x)g'(x)h(x) + f(x)g(x)h'(x).$$

Finding Marginal-Revenue Function from Average-Revenue Function

Suppose the average-revenue (AR) function is specified by

AR = 15 − Q.

Then the total-revenue (TR) function is

$$TR \equiv AR\cdot Q = 15Q - Q^2.$$

Thus, the marginal-revenue (MR) function is given by

$$MR \equiv \frac{d}{dQ}TR = 15 - 2Q.$$

In general, if AR = f(Q), then

$$TR \equiv AR\cdot Q = Qf(Q).$$

Thus

$$MR \equiv \frac{d}{dQ}TR = f(Q) + Qf'(Q).$$

From this, we can tell the relationship between MR and AR. Since

$$MR - AR = Qf'(Q),$$

they will always differ by the amount Qf'(Q). Also, since

$$AR \equiv \frac{TR}{Q} = \frac{PQ}{Q} = P,$$

we can view AR as the inverse demand function for the firm's product. If the market is perfectly competitive, i.e., the firm takes the price as given, then P = f(Q) = constant, hence f'(Q) = 0, and thus MR − AR = 0, or MR = AR. Under imperfect competition, on the other hand, the AR curve is normally downward-sloping, so that f'(Q) < 0 and thus MR < AR.

Quotient Rule

$$\frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] = \frac{f'(x)g(x) - f(x)g'(x)}{g^2(x)}$$

We will come back to prove this rule after learning the chain rule.

Example 7.2.4

$$\frac{d}{dx}\left[\frac{2x-3}{x+1}\right] = \frac{2(x+1) - (2x-3)}{(x+1)^2} = \frac{5}{(x+1)^2}$$

$$\frac{d}{dx}\left[\frac{5x}{x^2+1}\right] = \frac{5(x^2+1) - 5x(2x)}{(x^2+1)^2} = \frac{5(1-x^2)}{(x^2+1)^2}$$

$$\frac{d}{dx}\left[\frac{ax^2+b}{cx}\right] = \frac{2ax(cx) - (ax^2+b)c}{(cx)^2} = \frac{c(ax^2-b)}{(cx)^2} = \frac{ax^2-b}{cx^2}$$

Relationship Between Marginal-Cost and Average-Cost Functions

As an economic application of the quotient rule, let us consider the rate of change of average cost when output varies.

78 Given a total cost function C = C(Q), the average cost (AC) function and the marginal-cost (MC) function are given by

$$AC \equiv \frac{C(Q)}{Q} \qquad (Q > 0)$$

and MC ≡ C′(Q).

The rate of change of AC with respect to Q can be found by differentiating AC:

$$\frac{d}{dQ}\left[\frac{C(Q)}{Q}\right] = \frac{C'(Q)Q - C(Q)}{Q^2} = \frac{1}{Q}\left[C'(Q) - \frac{C(Q)}{Q}\right].$$

From this it follows that, for Q > 0,

$$\frac{d}{dQ}AC > 0 \iff MC(Q) > AC(Q)$$

$$\frac{d}{dQ}AC = 0 \iff MC(Q) = AC(Q)$$

$$\frac{d}{dQ}AC < 0 \iff MC(Q) < AC(Q)$$
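The sign relation between d(AC)/dQ and MC − AC can be verified symbolically. A sketch using SymPy, taking the total-cost function of Example 7.2.2 as the test case:

```python
import sympy as sp

Q = sp.symbols('Q', positive=True)
C = Q**3 - 4*Q**2 + 10*Q + 75   # total-cost function of Example 7.2.2
AC = C / Q
MC = sp.diff(C, Q)
# d(AC)/dQ equals (MC - AC)/Q, so its sign is the sign of MC - AC.
print(sp.simplify(sp.diff(AC, Q) - (MC - AC) / Q))  # -> 0
```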

7.3 Rules of Differentiation Involving Functions of Different Variables

Now we consider cases where there are two or more differentiable functions, each of which has a distinct independent variable.

Chain Rule

If we have a function z = f(y), where y is in turn a function of another variable x, say, y = g(x), then the derivative of z with respect to x is given by

$$\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = f'(y)g'(x) \qquad \text{[chain rule]}$$

The chain rule appeals easily to intuition. Given a ∆x, there must result a corresponding ∆y via the function y = g(x), and this ∆y will in turn bring about a ∆z via the function z = f(y).

Proof. Note that

$$\frac{dz}{dx} = \lim_{\Delta x\to 0}\frac{\Delta z}{\Delta x} = \lim_{\Delta x\to 0}\frac{\Delta z}{\Delta y}\cdot\frac{\Delta y}{\Delta x}.$$

Since ∆x → 0 implies ∆y → 0, we have

$$\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = f'(y)g'(x). \qquad \text{Q.E.D.}$$

In view of the function y = g(x), we can express the function z = f(y) as z = f(g(x)), where the contiguous appearance of the two function symbols f and g indicates that this is a composite function (a function of a function). So sometimes the chain rule is also called the composite-function rule. As an application of this rule, we use it to prove the quotient rule. Let z = y⁻¹ and y = g(x). Then z = 1/g(x) and

$$\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = -\frac{1}{y^2}\,g'(x) = -\frac{g'(x)}{g^2(x)}$$

Thus,

$$\begin{aligned} \frac{d}{dx}\left[\frac{f(x)}{g(x)}\right] &= \frac{d}{dx}\left[f(x)\cdot g^{-1}(x)\right] \\ &= f'(x)g^{-1}(x) + f(x)\frac{d}{dx}\left[g^{-1}(x)\right] \\ &= f'(x)g^{-1}(x) + f(x)\left[-\frac{g'(x)}{g^2(x)}\right] \\ &= \frac{f'(x)g(x) - f(x)g'(x)}{g^2(x)}. \qquad \text{Q.E.D.} \end{aligned}$$

Example 7.3.1 If z = 3y² and y = 2x + 5, then

$$\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = 6y\cdot 2 = 12y = 12(2x + 5)$$

Example 7.3.2 If z = y − 3 and y = x³, then

$$\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = 1\cdot 3x^2 = 3x^2$$

Example 7.3.3 z = (x² + 3x − 2)¹⁷. Let z = y¹⁷ and y = x² + 3x − 2. Then

$$\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx} = 17y^{16}\cdot(2x + 3) = 17(x^2 + 3x - 2)^{16}(2x + 3).$$

Example 7.3.4 Suppose TR = f(Q), where output Q is a function of labor input L, or Q = g(L). Then, by the chain rule, the marginal revenue product of labor (MRP_L) is

$$MRP_L = \frac{dR}{dL} = \frac{dR}{dQ}\cdot\frac{dQ}{dL} = f'(Q)g'(L) = MR\cdot MPP_L,$$

where MPP_L is the marginal physical product of labor. Thus the result shown above constitutes the mathematical statement of the well-known result in economics that MRP_L = MR · MPP_L.

Inverse-Function Rule

If the function y = f(x) represents a one-to-one mapping, i.e., if the function is such that a different value of x will always yield a different value of y, then the function f will have an inverse function x = f⁻¹(y). Here, the symbol f⁻¹ is a function symbol which signifies a function related to the function f; it does not mean the reciprocal of the function f(x).

When x and y refer specifically to numbers, the property of one-to-one mapping is seen to be unique to the class of functions known as monotonic functions. A function f is said to be monotonically increasing (decreasing) if x_1 > x_2 implies f(x_1) > f(x_2) (respectively, f(x_1) < f(x_2)). In either of these cases, an inverse function f⁻¹ exists. A practical way of ascertaining the monotonicity of a given function y = f(x) is to check whether f'(x) always adheres to the same algebraic sign for all values of x. Geometrically, this means that the slope of the curve is either always upward or always downward.

Example 7.3.5 Suppose y = 5x + 25. Since y' = 5 > 0 for all x, the function is monotonic and thus the inverse function exists. In fact, it is given by x = y/5 − 5.

Generally speaking, if an inverse function exists, the original and the inverse functions must be both monotonic. Moreover, if f −1 is the inverse function of f, then f must be the inverse function of f −1. For inverse functions, the rule of differentiation is

$$\frac{dx}{dy} = \frac{1}{dy/dx}.$$

Example 7.3.6 Suppose y = x⁵ + x. Then

$$\frac{dx}{dy} = \frac{1}{dy/dx} = \frac{1}{5x^4 + 1}.$$

Example 7.3.7 Given y = ln x, its inverse is x = e^y. Therefore

$$\frac{dx}{dy} = \frac{1}{dy/dx} = \frac{1}{1/x} = x = e^y.$$

7.4 Integration (The Case of One Variable)

Let f(x) be a continuous function. The indefinite integral of f (denoted by ∫ f(x)dx) is defined as

$$\int f(x)\,dx = F(x) + C,$$

where F(x) is such that F'(x) = f(x), and C is an arbitrary constant.

Rules of Integration

$$\int\left[af(x) + bg(x)\right]dx = a\int f(x)\,dx + b\int g(x)\,dx, \quad a, b \text{ constants (linearity of the integral)};$$

$$\int f'(x)g(x)\,dx = f(x)g(x) - \int f(x)g'(x)\,dx \quad \text{(integration by parts)};$$

$$\int f(u(t))\,\frac{du}{dt}\,dt = \int f(u)\,du \quad \text{(integration by substitution)}.$$

Some Special Rules of Integration:

$$\int \frac{f'(x)}{f(x)}\,dx = \ln|f(x)| + C$$

$$\int \frac{1}{x}\,dx = \ln|x| + C$$

$$\int e^x\,dx = e^x + C$$

$$\int f'(x)e^{f(x)}\,dx = e^{f(x)} + C$$

$$\int x^a\,dx = \frac{x^{a+1}}{a+1} + C, \quad a \neq -1$$

$$\int a^x\,dx = \frac{a^x}{\ln a} + C, \quad a > 0$$

Example 7.4.1

$$\int\frac{x^2 + 2x + 1}{x}\,dx = \int x\,dx + \int 2\,dx + \int\frac{1}{x}\,dx = \frac{x^2}{2} + 2x + \ln|x| + C.$$

Example 7.4.2

$$\int xe^{-x^2}\,dx = -\frac{1}{2}\int(-2x)e^{-x^2}\,dx = -\frac{1}{2}\int e^{z}\,dz = -\frac{e^{-x^2}}{2} + C \qquad (z = -x^2).$$

Example 7.4.3

$$\int xe^x\,dx = xe^x - \int e^x\,dx = xe^x - e^x + C.$$

(The Newton-Leibniz formula) The definite integral of a continuous function f is

$$\int_a^b f(x)\,dx = F(x)\Big|_a^b = F(b) - F(a),$$

for F(x) such that F'(x) = f(x) for all x ∈ [a, b]. Remark: The indefinite integral is a function. The definite integral is a number.
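Examples 7.4.2 and 7.4.3 can be checked with a computer algebra system. A sketch using SymPy (which omits the arbitrary constant C):

```python
import sympy as sp

x = sp.symbols('x')
print(sp.integrate(x * sp.exp(-x**2), x))  # -> -exp(-x**2)/2
print(sp.integrate(x * sp.exp(x), x))      # -> (x - 1)*exp(x) = x e^x - e^x
```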

Properties of Definite Integrals:

(a, b, c, α, β are real numbers)

$$\int_a^b\left[\alpha f(x) + \beta g(x)\right]dx = \alpha\int_a^b f(x)\,dx + \beta\int_a^b g(x)\,dx$$

$$\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$$

$$\int_a^a f(x)\,dx = 0$$

$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$$

$$\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx$$

$$\int_a^b f(g(x))g'(x)\,dx = \int_{g(a)}^{g(b)} f(u)\,du, \quad u = g(x) \quad \text{(change of variable)}$$

$$\int_a^b f'(x)g(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b f(x)g'(x)\,dx$$

Some More Useful Results:

$$\frac{d}{d\lambda}\int_{a(\lambda)}^{b(\lambda)} f(x)\,dx = f(b(\lambda))\,b'(\lambda) - f(a(\lambda))\,a'(\lambda)$$

Example 7.4.4

$$\frac{d}{dx}\int_a^x f(t)\,dt = f(x).$$

(Mean Value Theorem) If f(x) is continuous on [a, b] and differentiable on (a, b), where a < b, then there exists at least one point c ∈ (a, b) such that

$$f'(c) = \frac{f(b) - f(a)}{b - a}.$$

7.5 Partial Differentiation

So far, we have considered only the derivative of functions of a single independent variable. In comparative-static analysis, however, we are likely to encounter the situation in which several parameters appear in a model, so that the equilibrium value of each endogenous

Figure 7.1: The mean value theorem implies that there exists some c in the interval (a, b) such that the secant joining the endpoints of the interval [a, b] is parallel to the tangent at c.

variable may be a function of more than one parameter. Because of this, we now consider the derivative of a function of more than one variable.

Partial Derivatives

Consider a function

$$y = f(x_1, x_2, \cdots, x_n),$$

where the variables x_i (i = 1, 2, ···, n) are all independent of one another, so that each can vary by itself without affecting the others. If the variable x_i changes by ∆x_i while the other variables remain fixed, there will be a corresponding change in y, namely ∆y. The difference quotient in this case can be expressed as

$$\frac{\Delta y}{\Delta x_i} = \frac{f(x_1, \cdots, x_{i-1}, x_i + \Delta x_i, x_{i+1}, \cdots, x_n) - f(x_1, x_2, \cdots, x_n)}{\Delta x_i}$$

If we take the limit of ∆y/∆x_i, that limit will constitute a derivative. We call it the partial derivative of y with respect to x_i. The process of taking partial derivatives is called partial differentiation. We denote the partial derivative of y with respect to x_i by ∂y/∂x_i, i.e.,

$$\frac{\partial y}{\partial x_i} = \lim_{\Delta x_i\to 0}\frac{\Delta y}{\Delta x_i}.$$

Also, we can use f_i to denote ∂y/∂x_i. If the function happens to be written in terms of unsubscripted variables, such as y = f(u, v, w), one also uses f_u, f_v, f_w to denote the partial derivatives.

Techniques of Partial Differentiation

Partial differentiation differs from the previously discussed differentiation primarily in that we must hold the other independent variables constant while allowing one variable to vary.

Example 7.5.1 Suppose y = f(x_1, x_2) = 3x_1² + x_1x_2 + 4x_2². Find ∂y/∂x_1 and ∂y/∂x_2.

$$\frac{\partial y}{\partial x_1} \equiv \frac{\partial f}{\partial x_1} = 6x_1 + x_2, \qquad \frac{\partial y}{\partial x_2} \equiv \frac{\partial f}{\partial x_2} = x_1 + 8x_2.$$

Example 7.5.2 For y = f(u, v) = (u + 4)(3u + 2v), we have

$$\frac{\partial y}{\partial u} \equiv f_u = (3u + 2v) + (u + 4)\cdot 3 = 6u + 2v + 12,$$

$$\frac{\partial y}{\partial v} \equiv f_v = 2(u + 4).$$

When u = 2 and v = 1, then f_u(2, 1) = 26 and f_v(2, 1) = 12.

Example 7.5.3 Given y = (3u − 2v)/(u² + 3v),

$$\frac{\partial y}{\partial u} = \frac{3(u^2 + 3v) - (3u - 2v)(2u)}{(u^2 + 3v)^2} = \frac{-3u^2 + 4uv + 9v}{(u^2 + 3v)^2},$$

$$\frac{\partial y}{\partial v} = \frac{-2(u^2 + 3v) - (3u - 2v)\cdot 3}{(u^2 + 3v)^2} = \frac{-u(2u + 9)}{(u^2 + 3v)^2}.$$
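Partial derivatives such as those of Example 7.5.3 can be verified symbolically. A sketch using SymPy:

```python
import sympy as sp

u, v = sp.symbols('u v')
y = (3*u - 2*v) / (u**2 + 3*v)
print(sp.simplify(sp.diff(y, u)))  # -> (-3*u**2 + 4*u*v + 9*v)/(u**2 + 3*v)**2
print(sp.simplify(sp.diff(y, v)))  # -> -u*(2*u + 9)/(u**2 + 3*v)**2
```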

86 7.6 Applications to Comparative-Static Analysis

Equipped with the knowledge of the various rules of differentiation, we can at last tackle the problem posed in comparative-static analysis: namely, how the equilibrium value of an endogenous variable will change when there is a change in any of the exogenous variables or parameters.

Market Model

For the one-commodity market model:

Qd = a − bp (a, b > 0)

Qs = −c + dp (c, d > 0) the equilibrium price and quantity are given by

$$\bar{p} = \frac{a + c}{b + d}, \qquad \bar{Q} = \frac{ad - bc}{b + d}$$

These solutions will be referred to as being in reduced form: the two endogenous variables have been reduced to explicit expressions in the four parameters a, b, c, and d. To find how an infinitesimal change in one of the parameters will affect the value of p̄ or Q̄, one has only to find its partial derivatives. If the sign of a partial derivative can be determined, we will know the direction in which p̄ will move when a parameter changes; this constitutes a qualitative conclusion. If the magnitude of the partial derivative can be ascertained, it will constitute a quantitative conclusion. To avoid misunderstanding, a clear distinction should be made between the two derivatives ∂Q̄/∂a and ∂Q_d/∂a. The latter derivative is a concept appropriate to the demand function taken alone, without regard to the supply function. The derivative ∂Q̄/∂a, on the other hand, pertains to the equilibrium quantity, which takes into account the interaction of demand and supply. To emphasize this distinction, we refer to the partial derivatives of p̄ and Q̄ with respect to the parameters as comparative-static derivatives.

For instance, for p̄, we have

$$\frac{\partial\bar{p}}{\partial a} = \frac{1}{b+d}, \qquad \frac{\partial\bar{p}}{\partial b} = -\frac{a+c}{(b+d)^2}, \qquad \frac{\partial\bar{p}}{\partial c} = \frac{1}{b+d}, \qquad \frac{\partial\bar{p}}{\partial d} = -\frac{a+c}{(b+d)^2}.$$

Thus ∂p̄/∂a = ∂p̄/∂c > 0 and ∂p̄/∂b = ∂p̄/∂d < 0.

National-Income Model

Y = C + I_0 + G_0 (equilibrium condition)

C = α + β(Y − T) (α > 0, 0 < β < 1)

T = γ + δY (γ > 0, 0 < δ < 1)

where the endogenous variables are the national income Y, consumption C, and taxes T. The equilibrium income (in reduced form) is

$$\bar{Y} = \frac{\alpha - \beta\gamma + I_0 + G_0}{1 - \beta + \beta\delta}$$

Thus,

$$\frac{\partial\bar{Y}}{\partial G_0} = \frac{1}{1 - \beta + \beta\delta} > 0 \qquad \text{(the government-expenditure multiplier)}$$

$$\frac{\partial\bar{Y}}{\partial\gamma} = \frac{-\beta}{1 - \beta + \beta\delta} < 0$$

$$\frac{\partial\bar{Y}}{\partial\delta} = \frac{-\beta(\alpha - \beta\gamma + I_0 + G_0)}{(1 - \beta + \beta\delta)^2} = \frac{-\beta\bar{Y}}{1 - \beta + \beta\delta} < 0$$
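These comparative-static derivatives can be generated mechanically from the reduced form. A sketch using SymPy (the symbol names are ours):

```python
import sympy as sp

alpha, beta, gamma, delta, I0, G0 = sp.symbols(
    'alpha beta gamma delta I_0 G_0', positive=True)
Ybar = (alpha - beta*gamma + I0 + G0) / (1 - beta + beta*delta)

print(sp.diff(Ybar, G0))      # -> 1/(1 - beta + beta*delta), the multiplier
print(sp.diff(Ybar, gamma))   # -> -beta/(1 - beta + beta*delta)
# d(Ybar)/d(delta): -beta*(alpha - beta*gamma + I0 + G0)/(1-beta+beta*delta)**2
print(sp.simplify(sp.diff(Ybar, delta)))
```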

7.7 Note on Jacobian Determinants

Partial derivatives can also provide a means of testing whether there exists functional (linear or nonlinear) dependence among a set of n variables. This is related to the notion of Jacobian determinants. Consider n differentiable functions in n variables, not necessarily linear:

$$y_1 = f^1(x_1, x_2, \cdots, x_n)$$

$$y_2 = f^2(x_1, x_2, \cdots, x_n)$$

$$\cdots$$

$$y_n = f^n(x_1, x_2, \cdots, x_n),$$

where the symbol f^i denotes the ith function. We can derive a total of n² partial derivatives

$$\frac{\partial y_i}{\partial x_j} \qquad (i = 1, 2, \cdots, n;\; j = 1, 2, \cdots, n).$$

We can arrange them into a square matrix, called a Jacobian matrix and denoted by J, and then take its determinant; the result is what is known as a Jacobian determinant (or a Jacobian, for short), denoted by |J|:

$$|J| = \frac{\partial(y_1, y_2, \cdots, y_n)}{\partial(x_1, x_2, \cdots, x_n)} = \begin{vmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \frac{\partial y_n}{\partial x_1} & \frac{\partial y_n}{\partial x_2} & \cdots & \frac{\partial y_n}{\partial x_n} \end{vmatrix}$$

Example 7.7.1 Consider two functions:

$$y_1 = 2x_1 + 3x_2$$

$$y_2 = 4x_1^2 + 12x_1x_2 + 9x_2^2$$

Then the Jacobian is

$$|J| = \begin{vmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} \end{vmatrix} = \begin{vmatrix} 2 & 3 \\ 8x_1 + 12x_2 & 12x_1 + 18x_2 \end{vmatrix}$$

Theorem 7.7.1 The Jacobian |J| defined above will be identically zero for all values of x_1, x_2, ···, x_n if and only if the n functions f^1, f^2, ···, f^n are functionally (linearly or nonlinearly) dependent.

For the above example, since

$$|J| = \frac{\partial(y_1, y_2)}{\partial(x_1, x_2)} = (24x_1 + 36x_2) - (24x_1 + 36x_2) = 0$$

for all x_1 and x_2, y_1 and y_2 are functionally dependent. In fact, y_2 is simply y_1 squared.

Example 7.7.2 Consider the linear-equation system: Ax = d, i.e.,

a11x1 + a12x2 + ··· + a1nxn = d1

a21x1 + a22x2 + ··· + a2nxn = d2 ···

an1x1 + an2x2 + ··· + annxn = dn

90 We know that the rows of the coefficient matrix A are linearly dependent if and only if |A| = 0. This result can now be interpreted as a special application of the Jacobian criterion of functional dependence. Take the left side of each equation Ax = d as a separate function of the n variables x1, x2, ··· , xn, and denote these functions by y1, y2, ··· , yn. Then we have ∂yi/∂xj = aij. In view of this, the elements of |J| will be precisely the elements of A, i.e., |J| = |A| and thus the Jacobian criterion of functional dependence among y1, y2, ··· , yn is equivalent to the criterion |A| = 0 in the present linear case.

Chapter 8

Comparative-static Analysis of General-Functions

The study of partial derivatives has enabled us, in the preceding chapter, to handle the simpler type of comparative-static problem, in which the equilibrium solution of the model can be explicitly stated in reduced form. We note that the definition of the partial derivative requires the absence of any functional relationship among the independent variables. As applied to comparative-static analysis, this means that the parameters and/or exogenous variables which appear in the reduced-form solution must be mutually independent. However, no such expediency should be expected when, owing to the inclusion of general functions in a model, no explicit reduced-form solution can be obtained. In such a case, we will have to find the comparative-static derivatives directly from the originally given equations in the model. Take, for instance, a simple national-income model with two endogenous variables Y and C:

Y = C + I_0 + G_0 (equilibrium condition)

C = C(Y,T0)(T0 : exogenous taxes) which reduces to a single equation

Y = C(Y,T0) + I0 + G0 to be solved for Y¯ . We must, therefore, find the comparative-static derivatives directly from this equation. How might we approach the problem?

Let us suppose that an equilibrium solution Ȳ does exist. We may then write

$$\bar{Y} = \bar{Y}(I_0, G_0, T_0)$$

even though we are unable to determine explicitly the form this function takes. Furthermore, in some neighborhood of Ȳ, the following identical equality will hold:

$$\bar{Y} \equiv C(\bar{Y}, T_0) + I_0 + G_0$$

Since Ȳ is a function of T_0, the two arguments of the C function are not independent. T_0 can in this case affect C not only directly, but also indirectly via Ȳ. Consequently, partial differentiation is no longer appropriate for our purposes. In this case, we must resort to total differentiation (as against partial differentiation). The process of total differentiation leads us to the related concept of the total derivative. Once we become familiar with these concepts, we shall be able to deal with functions whose arguments are not all independent, so that we can study the comparative statics of a general-function model.

8.1 Differentials

The symbol dy/dx has so far been regarded as a single entity. We shall now reinterpret it as a ratio of two quantities, dy and dx.

Differentials and Derivatives

Given a function y = f(x), we can use the difference quotient ∆y/∆x to represent the rate of change of y with respect to x. Since

$$\Delta y \equiv \left[\frac{\Delta y}{\Delta x}\right]\Delta x, \tag{8.1}$$

the magnitude of ∆y can be found once ∆y/∆x and the variation ∆x are known. If we denote the infinitesimal changes in x and y, respectively, by dx and dy, the identity (8.1) becomes

$$dy \equiv \left[\frac{dy}{dx}\right]dx \quad\text{or}\quad dy = f'(x)\,dx. \tag{8.2}$$

The symbols dy and dx are called the differentials of y and x, respectively.

Dividing the two identities in (8.2) through by dx, we have

$$\frac{(dy)}{(dx)} \equiv \frac{dy}{dx} \quad\text{or}\quad \frac{(dy)}{(dx)} \equiv f'(x).$$

This result shows that the derivative dy/dx ≡ f'(x) may be interpreted as the quotient of two separate differentials dy and dx. On the basis of (8.2), once we are given f'(x), dy can immediately be written as f'(x)dx. The derivative f'(x) may thus be viewed as a "converter" that serves to convert an infinitesimal change dx into a corresponding change dy.

Example 8.1.1 Given y = 3x2 + 7x − 5, find dy. Since f ′(x) = 6x + 7, the desired differential is dy = (6x + 7)dx

Remark. The process of finding the differential dy is called differentiation. Recall that we have used this term as a synonym for finding the derivative. To avoid confusion, we qualify the word "differentiation" with the phrase "with respect to x" when we take the derivative dy/dx. The following relations, which refer to the points A, B, C, D of the accompanying diagram, show the relationship between ∆y and dy:

$$\Delta y \equiv \left[\frac{\Delta y}{\Delta x}\right]\Delta x = \frac{CB}{AC}\cdot AC = CB$$

$$dy = \left[\frac{dy}{dx}\right]\Delta x = \frac{CD}{AC}\cdot AC = CD,$$

which differs from ∆y by the error DB.

Differentials and Point Elasticity

As an illustration of the application of differentials in economics, let us consider the elasticity of a function. For a demand function Q = f(P), for instance, the elasticity is defined as (∆Q/Q)/(∆P/P), the ratio of the percentage change in quantity to the percentage change in price. Now if ∆P → 0, ∆P and ∆Q will reduce to the differentials dP and dQ, and the elasticity becomes

$$\epsilon_d \equiv \frac{dQ/Q}{dP/P} = \frac{dQ/dP}{Q/P} = \frac{\text{marginal demand function}}{\text{average demand function}}$$

In general, for a given function y = f(x), the point elasticity of y with respect to x is

$$\epsilon_{yx} = \frac{dy/dx}{y/x} = \frac{\text{marginal function}}{\text{average function}}$$

Example 8.1.2 Find ϵ_d if the demand function is Q = 100 − 2P. Since dQ/dP = −2 and Q/P = (100 − 2P)/P, we have ϵ_d = −P/(50 − P). Thus the demand is inelastic (|ϵ_d| < 1) for 0 < P < 25, unit elastic (|ϵ_d| = 1) for P = 25, and elastic (|ϵ_d| > 1) for 25 < P < 50.

8.2 Total Differentials

The concept of differentials can easily be extended to a function of two or more independent variables. Consider a saving function

S = S(Y, i) where S is savings, Y is national income, and i is interest rate. If the function is continuous and possesses continuous partial derivatives, the total differential is defined by

$$dS = \frac{\partial S}{\partial Y}\,dY + \frac{\partial S}{\partial i}\,di$$

That is, the infinitesimal change in S is the sum of the changes resulting from the infinitesimal changes in Y and in i.

Remark. If i remains constant, the total differential will reduce to a partial differential:

$$dS = \left(\frac{\partial S}{\partial Y}\right)_{i\ \text{constant}} dY$$

Furthermore, in the general case of a function of n independent variables y = f(x_1, x_2, ···, x_n), the total differential of this function is given by

$$df = \frac{\partial f}{\partial x_1}\,dx_1 + \frac{\partial f}{\partial x_2}\,dx_2 + \cdots + \frac{\partial f}{\partial x_n}\,dx_n = \sum_{i=1}^{n} f_i\,dx_i,$$

in which each term on the right side indicates the amount of change in y resulting from an infinitesimal change in one of the independent variables. Similar to the case of one variable, the n partial elasticities can be written as

$$\epsilon_{fx_i} = \frac{\partial f}{\partial x_i}\cdot\frac{x_i}{f} \qquad (i = 1, 2, \cdots, n).$$

8.3 Rule of Differentials

Let c be a constant and u and v be two functions of the variables x_1, x_2, ···, x_n. Then the following rules are valid:

Rule I. dc = 0

Rule II. d(cu^a) = ca u^{a−1} du

Rule III. d(u ± v) = du ± dv

Rule IV. d(uv) = v du + u dv

Rule V. d(u/v) = (v du − u dv)/v²

Example 8.3.1 Find dy of the function y = 5x_1^2 + 3x_2. There are two ways to find dy. One is the straightforward method of finding ∂f/∂x_1 and ∂f/∂x_2: since ∂f/∂x_1 = 10x_1 and ∂f/∂x_2 = 3, we can write

dy = f_1 dx_1 + f_2 dx_2 = 10x_1 dx_1 + 3 dx_2.

The other way is to use the rules given above by letting u = 5x_1^2 and v = 3x_2:

dy = d(5x_1^2) + d(3x_2)   (by Rule III)
   = 10x_1 dx_1 + 3 dx_2   (by Rule II)

Example 8.3.2 Find dy of the function y = 3x_1^2 + x_1x_2^2. Since f_1 = 6x_1 + x_2^2 and f_2 = 2x_1x_2, the desired differential is

dy = (6x_1 + x_2^2) dx_1 + 2x_1x_2 dx_2.

By applying the given rules, the same result can be arrived at:

dy = d(3x_1^2) + d(x_1x_2^2)
   = 6x_1 dx_1 + x_2^2 dx_1 + 2x_1x_2 dx_2
   = (6x_1 + x_2^2) dx_1 + 2x_1x_2 dx_2

Example 8.3.3 For the function

y = (x_1 + x_2)/(2x_1^2),

we have

f_1 = −(x_1 + 2x_2)/(2x_1^3)   and   f_2 = 1/(2x_1^2),

so that

dy = −[(x_1 + 2x_2)/(2x_1^3)] dx_1 + [1/(2x_1^2)] dx_2.

The same result can also be obtained by applying the given rules:

dy = (1/(4x_1^4))[2x_1^2 d(x_1 + x_2) − (x_1 + x_2) d(2x_1^2)]   [by Rule V]
   = (1/(4x_1^4))[2x_1^2 (dx_1 + dx_2) − (x_1 + x_2) 4x_1 dx_1]
   = (1/(4x_1^4))[−2x_1(x_1 + 2x_2) dx_1 + 2x_1^2 dx_2]
   = −[(x_1 + 2x_2)/(2x_1^3)] dx_1 + [1/(2x_1^2)] dx_2

For the case of more than two functions, we have

Rule VI. d(u ± v ± w) = du ± dv ± dw
Rule VII. d(uvw) = vw du + uw dv + uv dw.
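These rules are easy to verify mechanically. As an illustration (a sketch added here, assuming sympy is available), the following reproduces the total differential of Example 8.3.2 by direct partial differentiation:

```python
# Sketch: verifying the total differential of Example 8.3.2 with sympy.
import sympy as sp

x1, x2, dx1, dx2 = sp.symbols('x1 x2 dx1 dx2')
y = 3*x1**2 + x1*x2**2

# Total differential df = f1*dx1 + f2*dx2
dy = sp.diff(y, x1)*dx1 + sp.diff(y, x2)*dx2
print(sp.expand(dy))    # 6*x1*dx1 + x2**2*dx1 + 2*x1*x2*dx2
```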

8.4 Total Derivatives

Consider any function y = f(x, w) where x = g(w)

Unlike a partial derivative, a total derivative does not require the argument x to remain constant as w varies, and can thus allow for the postulated relationship between the two variables. The variable w can affect y through two channels: (1) indirectly, via the function g and then f, and (2) directly, via the function f. Whereas the partial derivative f_w is adequate for expressing the direct effect alone, a total derivative is needed to express both effects jointly.

To get the total derivative, we first get the total differential dy = fxdx+fwdw. Dividing both sides of this equation by dw, we have the total derivative

dy/dw = f_x (dx/dw) + f_w (dw/dw)
      = (∂y/∂x)(dx/dw) + ∂y/∂w

Example 8.4.1 Find dy/dw, given the function

y = f(x, w) = 3x − w^2   where   x = g(w) = 2w^2 + w + 4.

Then

dy/dw = f_x (dx/dw) + f_w = 3(4w + 1) − 2w = 10w + 3.

As a check, we may substitute the function g into f, to get

y = 3(2w2 + w + 4) − w2 = 5w2 + 3w + 12

which is now a function of w alone. Then

dy/dw = 10w + 3,

the identical answer.
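The two channels through which w affects y can also be traced symbolically. A minimal sketch with sympy, using the functions of Example 8.4.1 (illustrative only):

```python
# Sketch: the total derivative of Example 8.4.1 via the chain rule (sympy).
import sympy as sp

w, xs = sp.symbols('w x')           # xs stands for the argument x
f = 3*xs - w**2                     # y = f(x, w)
g = 2*w**2 + w + 4                  # x = g(w)

# dy/dw = f_x * dx/dw + f_w  (indirect channel plus direct channel)
total = sp.diff(f, xs)*sp.diff(g, w) + sp.diff(f, w)
print(sp.expand(total))             # 10*w + 3, as found above
```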

A Variation on the Theme

For the function

y = f(x1, x2, w) with x1 = g(w) and x2 = h(w), the total derivative of y is given by

dy/dw = (∂f/∂x_1)(dx_1/dw) + (∂f/∂x_2)(dx_2/dw) + ∂f/∂w.

Example 8.4.2 Let the production function be

Q = Q(K, L, t),

where K is the capital input, L is the labor input, and t is time; the presence of t indicates that the production function can shift over time, reflecting technological change. Since capital and labor can also change over time, we may write

K = K(t) and L = L(t)

Thus the rate of change of output with respect to time can be denoted as

dQ/dt = (∂Q/∂K)(dK/dt) + (∂Q/∂L)(dL/dt) + ∂Q/∂t

Another Variation on the Theme

Now if a function is given,

y = f(x_1, x_2, u, v) with x_1 = g(u, v) and x_2 = h(u, v), we can find the total derivative of y with respect to u (while v is held constant). Since

dy = (∂f/∂x_1) dx_1 + (∂f/∂x_2) dx_2 + (∂f/∂u) du + (∂f/∂v) dv,

dividing both sides of the above equation by du, we have

dy/du = (∂y/∂x_1)(dx_1/du) + (∂y/∂x_2)(dx_2/du) + ∂y/∂u + (∂y/∂v)(dv/du)
      = (∂y/∂x_1)(dx_1/du) + (∂y/∂x_2)(dx_2/du) + ∂y/∂u   [dv/du = 0, since v is constant]

Since v is held constant, the above is a partial total derivative, which we denote by the following notation:

§y/§u = (∂y/∂x_1)(∂x_1/∂u) + (∂y/∂x_2)(∂x_2/∂u) + ∂y/∂u

Remark: In the cases we have discussed, the total derivative formulas can be regarded as expressions of the chain rule, or the composite-function rule. Also, the chain of derivatives does not have to be limited to only two "links"; the concept of the total derivative should be extendible to cases where there are three or more links in the composite function.

8.5 Implicit Function Theorem

The concept of total differentials can also enable us to find the derivatives of the so-called “implicit functions.”

Implicit Functions

A function given in the form y = f(x_1, x_2, ···, x_n) is called an explicit function, because the variable y is explicitly expressed as a function of x_1, x_2, ···, x_n. But in many cases y is not an explicit function of x_1, x_2, ···, x_n; instead, the relationship between y and x_1, ···, x_n is given in the form

F (y, x1, x2, ··· , xn) = 0.

Such an equation may also define an implicit function y = f(x_1, x_2, ···, x_n). Note that an explicit function y = f(x_1, x_2, ···, x_n) can always be transformed into an equation

F(y, x_1, x_2, ···, x_n) ≡ y − f(x_1, x_2, ···, x_n) = 0. The reverse transformation, however, is not always possible. In view of this uncertainty, we have to impose a certain condition under which we can be sure that a given equation F(y, x_1, x_2, ···, x_n) = 0 does indeed define an implicit function y = f(x_1, x_2, ···, x_n). Such a result is given by the so-called "implicit-function theorem."

Theorem 8.5.1 (Implicit-Function Theorem) Given F (y, x1, x2, ··· , xn) = 0, sup- pose that the following conditions are satisfied:

(a) the function F has continuous partial derivatives F_y, F_{x_1}, F_{x_2}, ···, F_{x_n};

(b) at point (y0, x10, x20, ··· , xn0) satisfying F (y0, x10, x20, ··· , xn0) = 0, Fy is nonzero.

Then there exists an n-dimensional neighborhood N of (x_{10}, x_{20}, ···, x_{n0}) in which y is an implicitly defined function of the variables x_1, x_2, ···, x_n, in the form y = f(x_1, x_2, ···, x_n), with F(y, x_1, x_2, ···, x_n) = 0 for all points in N. Moreover, the implicit function f is continuous and has continuous partial derivatives f_1, ···, f_n.

Derivatives of Implicit Functions

Differentiating F , we have dF = 0, or

Fydy + F1dx1 + ··· + Fndxn = 0.

Suppose that only y and x_1 are allowed to vary. Then the above equation reduces to

F_y dy + F_1 dx_1 = 0. Thus

dy/dx_1 |_{other variables constant} ≡ ∂y/∂x_1 = −F_1/F_y.

In the simple case where the given equation is F(y, x) = 0, the rule gives

dy/dx = −F_x/F_y.

Example 8.5.1 Suppose y − 3x^4 = 0. Then

dy/dx = −F_x/F_y = −(−12x^3)/1 = 12x^3.

In this particular case, we can easily solve the given equation for y, to get y = 3x^4, so that dy/dx = 12x^3.

Example 8.5.2 F(x, y) = x^2 + y^2 − 9 = 0. Thus,

dy/dx = −F_x/F_y = −2x/(2y) = −x/y   (y ≠ 0).

Example 8.5.3 For F(y, x, w) = y^3x^2 + w^3 + yxw − 3 = 0, we have

∂y/∂x = −F_x/F_y = −(2y^3x + yw)/(3y^2x^2 + xw).

In particular, at the point (1, 1, 1), ∂y/∂x = −3/4.
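The implicit-function rule ∂y/∂x = −F_x/F_y lends itself to direct symbolic verification. A short sympy sketch of Example 8.5.3 (added for illustration):

```python
# Sketch: Example 8.5.3 via the implicit-function rule dy/dx = -Fx/Fy.
import sympy as sp

x, y, w = sp.symbols('x y w')
F = y**3*x**2 + w**3 + y*x*w - 3

dydx = -sp.diff(F, x) / sp.diff(F, y)
print(sp.simplify(dydx))              # -(2*x*y**3 + w*y)/(3*x**2*y**2 + w*x)
print(dydx.subs({x: 1, y: 1, w: 1}))  # -3/4
```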

Example 8.5.4 Assume that the equation F(Q, K, L) = 0 implicitly defines a production function Q = f(K, L). Then we can find the marginal products MP_K and MP_L as follows:

MP_K ≡ ∂Q/∂K = −F_K/F_Q;

MP_L ≡ ∂Q/∂L = −F_L/F_Q.

In particular, we can also find the marginal rate of technical substitution MRTS_{LK}, which is given by

MRTS_{LK} ≡ |∂K/∂L| = F_L/F_K.

Extension to the Simultaneous-Equation Case

Consider a set of simultaneous equations.

F^1(y_1, ···, y_n; x_1, ···, x_m) = 0;
F^2(y_1, ···, y_n; x_1, ···, x_m) = 0;
···
F^n(y_1, ···, y_n; x_1, ···, x_m) = 0.

Suppose F^1, F^2, ···, F^n are differentiable. Taking the total differential of each equation gives, for i = 1, 2, ···, n,

(∂F^i/∂y_1) dy_1 + (∂F^i/∂y_2) dy_2 + ··· + (∂F^i/∂y_n) dy_n = −[(∂F^i/∂x_1) dx_1 + (∂F^i/∂x_2) dx_2 + ··· + (∂F^i/∂x_m) dx_m],

or, in matrix form,

J dy = −F_x dx,

where J = [∂F^i/∂y_j] is the n × n matrix of partial derivatives with respect to y_1, ···, y_n, F_x = [∂F^i/∂x_j] is the n × m matrix of partial derivatives with respect to x_1, ···, x_m, dy = (dy_1, ···, dy_n)′, and dx = (dx_1, ···, dx_m)′. Now suppose the following Jacobian determinant is nonzero:

|J| = ∂(F^1, F^2, ···, F^n)/∂(y_1, y_2, ···, y_n) =

| ∂F^1/∂y_1  ∂F^1/∂y_2  ···  ∂F^1/∂y_n |
| ∂F^2/∂y_1  ∂F^2/∂y_2  ···  ∂F^2/∂y_n |
| ···        ···        ···  ···       |
| ∂F^n/∂y_1  ∂F^n/∂y_2  ···  ∂F^n/∂y_n |   ≠ 0.

Then we can obtain the differentials dy = (dy_1, dy_2, ···, dy_n)′ by inverting J:

dy = −J^{−1} F_x dx.

If we want to obtain the partial derivatives with respect to x_i (i = 1, 2, ···, m), we can let dx_t = 0 for all t ≠ i and divide through by dx_i. Thus we have the linear system

J (∂y_1/∂x_i, ∂y_2/∂x_i, ···, ∂y_n/∂x_i)′ = −(∂F^1/∂x_i, ∂F^2/∂x_i, ···, ∂F^n/∂x_i)′.

Then, by Cramer's rule, we have

∂y_j/∂x_i = |J_j^i| / |J|   (j = 1, 2, ···, n;  i = 1, 2, ···, m),

where |J_j^i| is obtained by replacing the jth column of |J| with −F_{x_i}, where

F_{x_i} = (∂F^1/∂x_i, ∂F^2/∂x_i, ···, ∂F^n/∂x_i)′.

Of course, we can also find these derivatives by inverting the Jacobian matrix J:

(∂y_1/∂x_i, ∂y_2/∂x_i, ···, ∂y_n/∂x_i)′ = −J^{−1} F_{x_i}.

In compact notation,

∂y/∂x_i = −J^{−1} F_{x_i}.

Example 8.5.5 Let the national-income model be rewritten in the form:

Y − C − I0 − G0 = 0;

C − α − β(Y − T ) = 0;

T − γ − δY = 0.

Then

|J| = det
|  1   −1   0 |
| −β    1   β |
| −δ    0   1 |
= 1 − β + βδ,

where the rows are the partial derivatives (∂F^i/∂Y, ∂F^i/∂C, ∂F^i/∂T) for i = 1, 2, 3.

Suppose all exogenous variables and parameters are fixed except G_0. Then we have

|  1   −1   0 | | ∂Ȳ/∂G_0 |   | 1 |
| −β    1   β | | ∂C̄/∂G_0 | = | 0 |
| −δ    0   1 | | ∂T̄/∂G_0 |   | 0 |

We can solve the above equation for, say, ∂Ȳ/∂G_0, which comes out to be

∂Ȳ/∂G_0 = (1/|J|) det
| 1   −1   0 |
| 0    1   β |
| 0    0   1 |
= 1/(1 − β + βδ).
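Numerically, this comparative-static computation is just a linear solve of the system above. The sketch below checks the formula ∂Ȳ/∂G_0 = 1/(1 − β + βδ) for hypothetical parameter values β and δ (the numbers are assumptions, not from the notes):

```python
# Sketch: the national-income comparative statics as a linear solve (numpy).
import numpy as np

beta, delta = 0.8, 0.25             # hypothetical marginal propensities

J = np.array([[1.0, -1.0, 0.0],
              [-beta, 1.0, beta],
              [-delta, 0.0, 1.0]])  # Jacobian w.r.t. (Y, C, T)
d = np.array([1.0, 0.0, 0.0])       # right-hand side when only G0 varies

dY, dC, dT = np.linalg.solve(J, d)  # (dY/dG0, dC/dG0, dT/dG0)
print(dY, 1/(1 - beta + beta*delta))  # both 2.5 for these values
```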

8.6 Comparative Statics of General-Function Models

Consider a single-commodity market model:

Qd = Qs, [equilibrium condition];

Qd = D(P,Y0), [∂D/∂P < 0; ∂D/∂Y0 > 0];

Qs = S(P ), [dS/dP > 0],

where Y0 is an exogenously determined income. From this model, we can obtain a single equation:

D(P,Y0) − S(P ) = 0.

Even though this equation cannot be solved explicitly for the equilibrium price P̄, by the implicit-function theorem we know that there exists an equilibrium price P̄ that is a function of Y_0,

P̄ = P̄(Y_0),

such that

D(P̄, Y_0) − S(P̄) ≡ 0.

It then requires only a straightforward application of the implicit-function rule to produce the comparative-static derivative dP̄/dY_0:

dP̄/dY_0 = −(∂F/∂Y_0)/(∂F/∂P) = −(∂D/∂Y_0)/(∂D/∂P − dS/dP) > 0.

Since Q̄ = S(P̄), we thus have

dQ̄/dY_0 = (dS/dP)(dP̄/dY_0) > 0.

8.7 Matrix Derivatives

Matrix derivatives play an important role in economic analysis, especially in econometrics. If A is an n × n nonsingular matrix, the derivative of its determinant with respect to A is given by

∂|A|/∂A = [C_ij],

where [C_ij] is the matrix of cofactors of A.

Useful Formulas

Let a, b be k × 1 vectors and M be a k × k matrix.

d(a′b)/db = a
d(b′a)/db = a
d(Mb)/db = M′
d(b′Mb)/db = (M + M′)b

Example 8.7.1 (How to Find the Least Squares Estimator in a Multiple Regression Model) Consider the multiple regression model:

y = Xβ + ϵ,

where the n × 1 vector y is the dependent variable, X is an n × k matrix of k explanatory variables with rank(X) = k, β is a k × 1 vector of coefficients to be estimated, and ϵ is an n × 1 vector of disturbances. We assume that the matrices of observations X and y are given. Our goal is to find an estimator b for β using the least squares method. The least squares estimator of β is a vector b which minimizes the expression

E(b) = (y − Xb)′(y − Xb) = y′y − y′Xb − b′X′y + b′X′Xb.

The first-order condition for extremum is:

dE(b)/db = 0 ⇒ −2X′y + 2X′Xb = 0 ⇒ b = (X′X)^{−1}X′y.

On the other hand, by the third derivation rule above, we have:

d^2E(b)/db^2 = (2X′X)′ = 2X′X.

It will be seen in Chapter 11 that, to check whether the solution b is indeed a minimum, we need to prove the positive definiteness of the matrix X′X. First, notice that X′X is a symmetric matrix. To prove positive definiteness, we take an arbitrary k × 1 vector z, z ≠ 0 and check the following quadratic form:

z′(X′X)z = (Xz)′(Xz).

The assumptions rank(X) = k and z ≠ 0 imply Xz ≠ 0, so that z′(X′X)z > 0. Thus X′X is positive definite.
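The least squares formula, and the positive definiteness of X′X, can be illustrated on simulated data. A numpy sketch, where the data-generating values are hypothetical choices made only for this illustration:

```python
# Sketch: the OLS estimator b = (X'X)^{-1} X'y on simulated data (numpy).
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.normal(size=(n, k))                # rank(X) = k almost surely
beta = np.array([1.0, -2.0, 0.5])          # hypothetical true coefficients
y = X @ beta + 0.1*rng.normal(size=n)      # y = X*beta + eps

b = np.linalg.solve(X.T @ X, X.T @ y)      # least squares estimator
print(b)                                   # close to beta

# X'X is positive definite: all eigenvalues are positive
print(np.all(np.linalg.eigvalsh(X.T @ X) > 0))  # True
```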

Chapter 9

Derivatives of Exponential and Logarithmic Functions

Exponential functions, as well as the closely related logarithmic functions, have important applications in economics, especially in connection with growth problems, and in economic dynamics in general. This chapter gives some basic properties and derivatives of exponential and logarithmic functions.

9.1 The Nature of Exponential Functions

In its simple version, the exponential function may be represented in the form:

y = f(t) = b^t   (b > 1),

where b denotes a fixed base of the exponent. Its generalized version has the form:

y = ab^{ct}

Remark. y = ab^{ct} = a(b^c)^t. Thus we can consider b^c as the base of the exponent t; this changes the exponent from ct to t and the base from b to b^c. If the base is the irrational number e = 2.71828···, the function

y = ae^{rt}

is referred to as the natural exponential function, which can alternatively be denoted by

y = a exp(rt)

Remark. It can be proved that e may be defined as the limit:

e ≡ lim_{n→∞} f(n) = lim_{n→∞} (1 + 1/n)^n.

9.2 Logarithmic Functions

For the exponential function y = b^t and the natural exponential function y = e^t, taking the log of y to the base b (denoted by log_b y) and to the base e (denoted by log_e y), respectively, we have

t = log_b y   and   t = log_e y ≡ ln y.

The following rules are familiar to us:

Rules:

(a) ln(uv) = ln u + ln v [log of product]

(b) ln(u/v) = ln u − ln v [log of quotient]

(c) ln ua = a ln u [log of power]

(d) logb u = (logb e)(loge u) = (logb e)(ln u) [conversion of log base]

(e) logb e = 1/(loge b) = 1/ ln b [inversion of log base]

Properties of Log:

(a) log y1 = log y2 iff y1 = y2

(b) log y1 > log y2 iff y1 > y2

(c) 0 < y < 1 iff log y < 0

(d) y = 1 iff log y = 0

(e) log y → ∞ as y → ∞

(f) log y → −∞ as y → 0

Remark. t = log_b y and t = ln y are the respective inverse functions of the exponential functions y = b^t and y = e^t.

9.3 Derivatives of Exponential and Logarithmic Functions

(a) d(ln t)/dt = 1/t

(b) d(e^t)/dt = e^t

(c) d(e^{f(t)})/dt = f'(t) e^{f(t)}

(d) d(ln f(t))/dt = f'(t)/f(t)

Example 9.3.1 (a) Let y = e^{rt}. Then dy/dt = re^{rt}. (b) Let y = e^{−t}. Then dy/dt = −e^{−t}. (c) Let y = ln at. Then dy/dt = a/(at) = 1/t. (d) Let y = ln t^c. Since y = ln t^c = c ln t, dy/dt = c/t. (e) Let y = t^3 ln t^2. Then dy/dt = 3t^2 ln t^2 + 2t^3/t = 2t^2(1 + 3 ln t).

The Case of Base b

(a) d(b^t)/dt = b^t ln b
(b) d(log_b t)/dt = 1/(t ln b)
(c) d(b^{f(t)})/dt = f'(t) b^{f(t)} ln b
(d) d(log_b f(t))/dt = f'(t)/(f(t) ln b)

Proof of (a). Since b^t = e^{ln b^t} = e^{t ln b}, we have (d/dt)b^t = (d/dt)e^{t ln b} = (ln b)(e^{t ln b}) = b^t ln b.

Proof of (b). Since

log_b t = (log_b e)(log_e t) = (1/ln b) ln t,

(d/dt)(log_b t) = (d/dt)[(1/ln b) ln t] = (1/ln b)(1/t).

Example 9.3.2 Let y = 12^{1−t}. Then dy/dt = [d(1 − t)/dt] 12^{1−t} ln 12 = −12^{1−t} ln 12.

An Application

Example 9.3.3 Find dy/dx from y = x^a e^{kx−c}. Taking the natural log of both sides, we have

ln y = a ln x + kx − c.

Differentiating both sides with respect to x, we get

(1/y)(dy/dx) = a/x + k.

Thus dy/dx = (a/x + k)y = (a/x + k) x^a e^{kx−c}.
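Log-differentiation results like this are easy to confirm symbolically. A sympy sketch of Example 9.3.3 (added for illustration):

```python
# Sketch: verifying Example 9.3.3 by direct differentiation in sympy.
import sympy as sp

x, a, k, c = sp.symbols('x a k c', positive=True)
y = x**a * sp.exp(k*x - c)

direct = sp.diff(y, x)
via_log = (a/x + k) * y                 # the log-differentiation result
print(sp.simplify(direct - via_log))    # 0, so the two agree
```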

Chapter 10

Optimization: Maxima and Minima of a Function of One Variable

From now on, our attention will be turned to the study of goal equilibrium, in which the equilibrium state is defined as the optimal position for a given economic unit and in which the said economic unit will be deliberately striving for attainment of that equilibrium. Our primary focus will be on the classical techniques for locating optimal positions - those using differential calculus.

10.1 Optimal Values and Extreme Values

Economics is by and large a science of choice. When an economic project is to be carried out, there are normally a number of alternative ways of accomplishing it. One (or more) of these alternatives will, however, be more desirable than others from the standpoint of some criterion, and it is the essence of the optimization problem to choose the best alternative. The most common criterion of choice among alternatives in economics is the goal of maximizing something or of minimizing something. Economically, we may categorize such maximization and minimization problems under the general heading of optimization. From a purely mathematical point of view, the collective term for maximum and minimum is the more matter-of-fact designation extremum, meaning an extreme value.

In formulating an optimization problem, the first order of business is to delineate an objective function in which the dependent variable represents the object whose magnitude the economic unit in question can pick and choose. We shall therefore refer to the independent variables as choice variables. Consider a general-form objective function y = f(x). Three specific cases of such functions are depicted in the following figures.

Remark. The points E and F in (c) are relative (or local) extrema, in the sense that each of these points represents an extremum only in some neighborhood of the point. We shall continue our discussion mainly with reference to the search for relative extrema, since an absolute (or global) maximum must be either a relative maximum or one of the endpoints of the function. Thus, if we know all the relative maxima, it is necessary only to select the largest of these and compare it with the endpoints in order to determine the absolute maximum. Hereafter, the extreme values considered will be relative or local ones, unless indicated otherwise.

10.2 General Result on Maximum and Minimum

Definition 10.2.1 If f(x) is continuous in a neighborhood U of a point x_0, it is said to have a local or relative maximum (minimum) at x_0 if, for all x ∈ U with x ≠ x_0, f(x) ≤ f(x_0) (respectively, f(x) ≥ f(x_0)).

Proposition 10.2.1 (Weierstrass's Theorem) If f is continuous on a closed and bounded subset S of R^1 (or, in the general case, of R^n), then f attains its maximum and minimum on S; i.e., there exist points m, M ∈ S such that f(m) ≤ f(x) ≤ f(M) for all x ∈ S. Moreover, the set of maximum (minimum) points is compact.

10.3 First-Derivative Test for Relative Maximum and Minimum

Given a function y = f(x), the first derivative f ′(x) plays a major role in our search for its extreme values. For smooth functions, relative extreme values can only occur where f ′(x) = 0, which is a necessary (but not sufficient) condition for a relative extremum (either maximum or minimum).

We summarize this in the following proposition.

Proposition 10.3.1 (Fermat's Theorem on the Necessary Condition for Extremum) If f(x) is differentiable on X and has a local extremum (minimum or maximum) at an interior point x_0 of X, then f'(x_0) = 0.

Note that if the first derivative vanishes at some point, it does not imply that at this point f possesses an extremum. We can only state that f has a stationary point. We have some useful results about stationary points.

Proposition 10.3.2 (Rolle's Theorem) If f is continuous on [a, b], differentiable on (a, b), and f(a) = f(b), then there exists at least one point c ∈ (a, b) such that f'(c) = 0.

From Rolle's Theorem, we can prove the well-known and useful Lagrange's Theorem, or the Mean-Value Theorem.

Proposition 10.3.3 (Lagrange's Theorem or the Mean-Value Theorem) If f is continuous on [a, b] and differentiable on (a, b), then there exists at least one point c ∈ (a, b) such that f'(c) = (f(b) − f(a))/(b − a).

Proof. Let g(x) = f(x) − [(f(b) − f(a))/(b − a)] x. Then g is continuous on [a, b], differentiable on (a, b), and g(a) = g(b). Thus, by Rolle's Theorem, there exists one point c ∈ (a, b) such that g'(c) = 0, and therefore f'(c) = (f(b) − f(a))/(b − a).

A variation of the above mean-value theorem is in the form of integral calculus:

Theorem 10.3.1 (Mean-Value Theorem of Integral Calculus) If f : [a, b] → R is continuous on [a, b], then there exists a number c ∈ (a, b) such that

∫_a^b f(x) dx = f(c)(b − a).

The second variation of the mean-value theorem is the generalized mean-value theorem:

Proposition 10.3.4 (Cauchy’s Theorem or the Generalized Mean Value Theorem) If f and g are continuous in [a, b] and differentiable in (a, b), then there exists at least one point c ∈ (a, b) such that (f(b) − f(a))g′(c) = (g(b) − g(a))f ′(c).

To verify whether a stationary point is a maximum or a minimum, we can use the following first-derivative test for a relative extremum.

First-Derivative Test for Relative Extremum

If f'(x_0) = 0, then the value of the function at x_0, f(x_0), will be

(a) A relative maximum if f'(x) changes its sign from positive to negative from the immediate left of the point x_0 to its immediate right.

(b) A relative minimum if f'(x) changes its sign from negative to positive from the immediate left of the point x_0 to its immediate right.

(c) No extreme point if f'(x) has the same sign in some neighborhood on both the immediate left and right of x_0.

Example 10.3.1 y = (x − 1)^3. The point x = 1 is not an extreme point even though f'(1) = 0.

Example 10.3.2 y = f(x) = x^3 − 12x^2 + 36x + 8. Since f'(x) = 3x^2 − 24x + 36, to get the critical values, i.e., the values of x satisfying the condition f'(x) = 0, we set f'(x) = 0, and thus

3x^2 − 24x + 36 = 0.

Its roots are x̄_1 = 2 and x̄_2 = 6. It is easy to verify that f'(x) > 0 for x < 2 and f'(x) < 0 for 2 < x < 6. Thus x = 2 is a maximum point, and the corresponding maximum value of the function is f(2) = 40. Similarly, since f'(x) < 0 for 2 < x < 6 and f'(x) > 0 for x > 6, x = 6 is a minimum point, with f(6) = 8.

Example 10.3.3 Find the relative extremum of the average-cost function

AC = f(Q) = Q^2 − 5Q + 8.

Since f'(Q) = 2Q − 5, so that f'(2.5) = 0, with f'(Q) < 0 for Q < 2.5 and f'(Q) > 0 for Q > 2.5, Q̄ = 2.5 is a minimum point.
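The first-derivative test can be mechanized: locate the critical values, then compare the sign of f'(x) on each side. A sympy sketch of Example 10.3.2 (illustrative; the offsets ±1/2 are an arbitrary choice of nearby test points):

```python
# Sketch: first-derivative test for Example 10.3.2 (sympy).
import sympy as sp

x = sp.symbols('x')
f = x**3 - 12*x**2 + 36*x + 8
fp = sp.diff(f, x)

for c in sp.solve(fp, x):                       # critical values [2, 6]
    left = fp.subs(x, c - sp.Rational(1, 2))    # sign just left of c
    right = fp.subs(x, c + sp.Rational(1, 2))   # sign just right of c
    print(c, f.subs(x, c), left > 0, right > 0)
# x=2: f=40, f' goes + to -  -> relative maximum
# x=6: f=8,  f' goes - to +  -> relative minimum
```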

10.4 Second and Higher Derivatives

Since the first derivative f'(x) of a function y = f(x) is also a function of x, we can consider the derivative of f'(x), which is called the second derivative. Similarly, we can find derivatives of even higher orders. These will enable us to develop alternative criteria for locating the relative extrema of a function. The second derivative of the function f is denoted by f''(x) or d^2y/dx^2. If the second derivative f''(x) exists for all x values, f(x) is said to be twice differentiable; if, in addition, f''(x) is continuous, f(x) is said to be twice continuously differentiable. The higher-order derivatives of f(x) can be similarly obtained and symbolized along the same lines as the second derivative:

f'''(x), f^{(4)}(x), ···, f^{(n)}(x)

or

d^3y/dx^3, d^4y/dx^4, ···, d^ny/dx^n.

Remark. d^ny/dx^n can also be written as (d^n/dx^n)y, where the d^n/dx^n part serves as an operator symbol instructing us to take the nth derivative with respect to x.

Example 10.4.1 y = f(x) = 4x^4 − x^3 + 17x^2 + 3x − 1. Then

f'(x) = 16x^3 − 3x^2 + 34x + 3
f''(x) = 48x^2 − 6x + 34
f'''(x) = 96x − 6
f^{(4)}(x) = 96
f^{(5)}(x) = 0

Example 10.4.2 Find the first four derivatives of the function

y = g(x) = x/(1 + x)   (x ≠ −1)

g'(x) = (1 + x)^{−2}
g''(x) = −2(1 + x)^{−3}
g'''(x) = 6(1 + x)^{−4}
g^{(4)}(x) = −24(1 + x)^{−5}

Remark. A negative second derivative is consistently reflected in an inverse U-shaped curve; a positive second derivative is reflected in a U-shaped curve.

10.5 Second-Derivative Test

Recall the meaning of the first and second derivatives of a function f. The sign of the first derivative tells us whether the value of the function increases (f' > 0) or decreases (f' < 0), whereas the sign of the second derivative tells us whether the slope of the function increases (f'' > 0) or decreases (f'' < 0). This gives us an insight into how to verify whether a stationary point is a maximum or a minimum. Thus, we have the following second-derivative test for a relative extremum.

Second-Derivative Test for Relative Extremum:

If f'(x_0) = 0, then the value of the function at x_0, f(x_0), will be

(a) A relative maximum if f''(x_0) < 0.

(b) A relative minimum if f''(x_0) > 0.

This test is in general more convenient to use than the first-derivative test, since it does not require us to check the sign of the derivative on both the left and the right of x_0.

Example 10.5.1 y = f(x) = 4x^2 − x. Since f'(x) = 8x − 1 and f''(x) = 8, we know f(x) reaches its minimum at x̄ = 1/8. Indeed, since the function plots as a U-shaped curve, the relative minimum is also the absolute minimum.

Example 10.5.2 y = g(x) = x^3 − 3x^2 + 2. Then y' = g'(x) = 3x^2 − 6x and y'' = g''(x) = 6x − 6. Setting g'(x) = 0, we obtain the critical values x̄_1 = 0 and x̄_2 = 2, which in turn yield the two stationary values g(0) = 2 (a maximum, because g''(0) = −6 < 0) and g(2) = −2 (a minimum, because g''(2) = 6 > 0).

Remark. Note that when f'(x_0) = 0, the condition f''(x_0) < 0 (f''(x_0) > 0) is a sufficient condition for a relative maximum (minimum) but not a necessary condition. However, the condition f''(x_0) ≤ 0 (f''(x_0) ≥ 0) is a necessary (even though not sufficient) condition for a relative maximum (minimum).

Condition for Profit Maximization

Let R = R(Q) be the total-revenue function and let C = C(Q) be the total-cost function, where Q is the level of output. The profit function is then given by

π = π(Q) = R(Q) − C(Q).

To find the profit-maximizing output level we need to find Q¯ such that

π'(Q̄) = R'(Q̄) − C'(Q̄) = 0,

or R′(Q¯) = C′(Q¯), or MR(Q¯) = MC(Q¯).

To be sure the first-order condition leads to a maximum, we require

d^2π/dQ^2 ≡ π''(Q̄) = R''(Q̄) − C''(Q̄) < 0.

Economically, this would mean that if the rate of change of MR is less than the rate of change of MC at Q̄, then that output Q̄ will maximize profit.

Example 10.5.3 Let R(Q) = 1200Q − 2Q^2 and C(Q) = Q^3 − 61.25Q^2 + 1528.5Q + 2000. Then the profit function is

π(Q) = −Q^3 + 59.25Q^2 − 328.5Q − 2000.

Setting π'(Q) = −3Q^2 + 118.5Q − 328.5 = 0, we have Q̄_1 = 3 and Q̄_2 = 36.5. Since π''(3) = −18 + 118.5 = 100.5 > 0 and π''(36.5) = −219 + 118.5 = −100.5 < 0, the profit-maximizing output is Q̄ = 36.5.
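The whole procedure (solve the first-order condition, then apply the second-derivative test) can be checked in a few lines. A sympy sketch of Example 10.5.3, using exact rationals for the coefficients:

```python
# Sketch: Example 10.5.3 - FOC plus second-derivative test (sympy).
import sympy as sp

Q = sp.symbols('Q', positive=True)
R = 1200*Q - 2*Q**2
C = Q**3 - sp.Rational(245, 4)*Q**2 + sp.Rational(3057, 2)*Q + 2000
pi = R - C                                   # -Q^3 + 59.25*Q^2 - 328.5*Q - 2000

for q in sp.solve(sp.diff(pi, Q), Q):        # candidates [3, 73/2]
    print(q, sp.diff(pi, Q, 2).subs(Q, q))
# Q=3    -> pi'' = 201/2 > 0  (relative minimum)
# Q=73/2 -> pi'' = -201/2 < 0 (relative maximum, i.e. Q = 36.5)
```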

10.6 Taylor Series

This section considers the so-called "expansion" of a function y = f(x) into what is known as a Taylor series (an expansion around any point x = x_0). To expand a function y = f(x) around a point x_0 means to transform that function into a polynomial form, in which the coefficients of the various terms are expressed in terms of the derivative values f'(x_0), f''(x_0), etc., all evaluated at the point of expansion x_0.

Proposition 10.6.1 (Taylor's Theorem) Given an arbitrary function ϕ(x), if we know the values of ϕ(x_0), ϕ'(x_0), ϕ''(x_0), etc., then this function can be expanded around the point x_0 as follows:

ϕ(x) = ϕ(x_0) + ϕ'(x_0)(x − x_0) + (1/2!) ϕ''(x_0)(x − x_0)^2 + ··· + (1/n!) ϕ^{(n)}(x_0)(x − x_0)^n + R_n

     ≡ P_n + R_n,

where P_n represents the nth-degree polynomial and R_n denotes a remainder, which can be expressed in the so-called Lagrange form of the remainder:

R_n = [ϕ^{(n+1)}(P)/(n + 1)!] (x − x_0)^{n+1},

with P being some number between x and x_0. Here n! is "n factorial," defined as

n! ≡ n(n − 1)(n − 2) ··· (3)(2)(1).

Remark. When n = 0, the Taylor series reduces to the mean-value theorem discussed above:

ϕ(x) = P_0 + R_0 = ϕ(x_0) + ϕ'(P)(x − x_0),

or

ϕ(x) − ϕ(x_0) = ϕ'(P)(x − x_0).

This mean-value theorem states that the difference between the value of the function ϕ at x and at x_0 can be expressed as the product of the difference (x − x_0) and ϕ'(P), with P being some point between x and x_0.

Remark. If x_0 = 0, the Taylor series reduces to the so-called Maclaurin series:

ϕ(x) = ϕ(0) + ϕ'(0)x + (1/2!) ϕ''(0)x^2 + ··· + (1/n!) ϕ^{(n)}(0)x^n + [1/(n + 1)!] ϕ^{(n+1)}(P)x^{n+1},

where P is a point between 0 and x.

Example 10.6.1 Expand the function

ϕ(x) = 1/(1 + x)

around the point x_0 = 1, with n = 4. Since ϕ(1) = 1/2 and

ϕ'(x) = −(1 + x)^{−2},   ϕ'(1) = −1/4
ϕ''(x) = 2(1 + x)^{−3},   ϕ''(1) = 1/4
ϕ^{(3)}(x) = −6(1 + x)^{−4},   ϕ^{(3)}(1) = −3/8
ϕ^{(4)}(x) = 24(1 + x)^{−5},   ϕ^{(4)}(1) = 3/4,

we obtain the following Taylor series:

ϕ(x) = 1/2 − (1/4)(x − 1) + (1/8)(x − 1)^2 − (1/16)(x − 1)^3 + (1/32)(x − 1)^4 + R_4.
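sympy's built-in series expansion reproduces these coefficients. A short sketch (illustrative only):

```python
# Sketch: reproducing Example 10.6.1 with sympy's series expansion.
import sympy as sp

x = sp.symbols('x')
phi = 1/(1 + x)

# Fourth-degree Taylor polynomial around x0 = 1
P4 = sp.series(phi, x, 1, 5).removeO()
print(P4)
# 1/2 - (x-1)/4 + (x-1)**2/8 - (x-1)**3/16 + (x-1)**4/32
```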

10.7 Nth-Derivative Test

A relative extremum of the function f can be equivalently defined as follows:

A function f(x) attains a relative maximum (minimum) value at x0 if f(x) − f(x0) is nonpositive (nonnegative) for values of x in some neighborhood of x0.

Assume f(x) has finite, continuous derivatives up to the desired order at x = x_0; then the function can be expanded around x = x_0 as a Taylor series:

f(x) − f(x_0) = f'(x_0)(x − x_0) + (1/2!) f''(x_0)(x − x_0)^2 + ···
              + (1/n!) f^{(n)}(x_0)(x − x_0)^n + [1/(n + 1)!] f^{(n+1)}(P)(x − x_0)^{n+1},

with P some point between x and x_0.

From the above expansion, we have the following test:

Nth-Derivative Test

If f'(x_0) = 0, and if the first nonzero derivative value at x_0 encountered in successive derivation is that of the Nth derivative, f^{(N)}(x_0) ≠ 0, then the stationary value f(x_0) will be

(a) A relative maximum if N is an even number and f^{(N)}(x_0) < 0.

(b) A relative minimum if N is an even number and f^{(N)}(x_0) > 0.

(c) An inflection point if N is odd.

Example 10.7.1 y = (7 − x)^4. Since f'(7) = −4(7 − 7)^3 = 0, f''(7) = 12(7 − 7)^2 = 0, f'''(7) = −24(7 − 7) = 0, and f^{(4)}(7) = 24 > 0, x = 7 is a minimum point, with f(7) = 0.

Chapter 11

Optimization: Maxima and Minima of a Function of Two or More Variables

This chapter develops a way of finding the extreme values of an objective function that involves two or more choice variables. As before, our attention will be focused heavily on relative extrema, and for this reason we shall often drop the adjective "relative," with the understanding that, unless otherwise specified, the extrema referred to are relative.

11.1 The Differential Version of Optimization Condition

This section shows the possibility of equivalently expressing the derivative versions of the first- and second-order conditions in terms of differentials. Consider the function z = f(x). Recall that the differential of z = f(x) is

dz = f ′(x)dx.

Since f'(x) = 0, which implies dz = 0, is the necessary condition for extreme values, dz = 0 is also a necessary condition for extreme values. This first-order condition requires that dz = 0 as x is varied. In such a context, with dx ≠ 0, dz = 0 if and only if f'(x) = 0.

What about the sufficient conditions in terms of second-order differentials? Differentiating dz = f'(x)dx, we have

d^2z ≡ d(dz) = d[f'(x)dx] = d[f'(x)]dx = f''(x)dx^2.

Note that the symbols d^2z and dx^2 are fundamentally different: d^2z means the second-order differential of z, while dx^2 means the squaring of the first-order differential dx. Thus, from the above equation, we have d^2z < 0 (d^2z > 0) if and only if f''(x) < 0 (f''(x) > 0). Therefore, the second-order sufficient condition for a maximum (minimum) of z = f(x) is d^2z < 0 (d^2z > 0).

11.2 Extreme Values of a Function of Two Variables

For a function of one variable, an extreme value is represented graphically by the peak of a hill or the bottom of a valley in a two-dimensional graph. With two choice variables, the graph of the function z = f(x, y) becomes a surface in 3-space, but the extreme values are still associated with peaks and bottoms.

First-Order Condition

For the function z = f(x, y), the first-order necessary condition for an extremum again involves dz = 0 for arbitrary values of dx and dy: an extremum point must be a stationary point, at which z must be constant for arbitrary infinitesimal changes of the two variables x and y. In the present two-variable case, the total differential is

dz = fxdx + fydy

Thus the equivalent derivative version of the first-order condition dz = 0 is

fx = fy = 0 or ∂f/∂x = ∂f/∂y = 0

As in the earlier discussion, the first-order condition is necessary, but not sufficient. To develop a sufficient condition, we must look to the second-order total differential, which is related to the second-order partial derivatives.

Second-Order Partial Derivatives

From the function z = f(x, y), we can obtain two first-order partial derivatives, f_x and f_y. Since f_x and f_y are themselves functions of x and y, we can find four second-order partial derivatives:

f_xx ≡ ∂f_x/∂x,  or  ∂^2z/∂x^2 ≡ (∂/∂x)(∂z/∂x);
f_yy ≡ ∂f_y/∂y,  or  ∂^2z/∂y^2 ≡ (∂/∂y)(∂z/∂y);
f_xy ≡ ∂^2z/∂x∂y ≡ (∂/∂x)(∂z/∂y);
f_yx ≡ ∂^2z/∂y∂x ≡ (∂/∂y)(∂z/∂x).

The last two are called cross (or mixed) partial derivatives.

Theorem 11.2.1 (Schwarz's Theorem or Young's Theorem) If at least one of the two cross partial derivatives is continuous, then

∂^2f/∂x_j∂x_i = ∂^2f/∂x_i∂x_j,   i, j = 1, 2, ···, n.

Remark. Even though f_xy and f_yx have been separately defined, they will, according to Young's theorem, be identical with each other as long as the two cross partial derivatives are both continuous. In fact, this theorem applies also to functions of three or more variables. Given z = g(u, v, w), for instance, the mixed partial derivatives will be characterized by g_uv = g_vu, g_uw = g_wu, etc., provided these partial derivatives are continuous.

Example 11.2.1 Find all second-order partial derivatives of z = x^3 + 5xy − y^2. The first partial derivatives of this function are

f_x = 3x^2 + 5y   and   f_y = 5x − 2y.

Thus, f_xx = 6x, f_xy = f_yx = 5, and f_yy = −2. As expected, f_yx = f_xy.

Example 11.2.2 For z = x^2 e^{−y}, its first partial derivatives are

f_x = 2x e^{−y}   and   f_y = −x^2 e^{−y}.

Again, f_yx = f_xy.

Second-Order Total Differentials

From the first total differential

dz = f_x dx + f_y dy,

we can obtain the second-order total differential d^2z:

d^2z ≡ d(dz) = [∂(dz)/∂x] dx + [∂(dz)/∂y] dy
    = (∂/∂x)(f_x dx + f_y dy) dx + (∂/∂y)(f_x dx + f_y dy) dy
    = [f_xx dx + f_xy dy] dx + [f_yx dx + f_yy dy] dy
    = f_xx dx^2 + f_xy dy dx + f_yx dx dy + f_yy dy^2
    = f_xx dx^2 + 2f_xy dx dy + f_yy dy^2   [if f_xy = f_yx].

We know that if f(x, y) satisfies the conditions of Schwarz's theorem, we have f_xy = f_yx.

Example 11.2.3 Given z = x^3 + 5xy − y^2, find dz and d^2z.

dz = f_x dx + f_y dy = (3x^2 + 5y) dx + (5x − 2y) dy

d^2z = f_xx dx^2 + 2f_xy dx dy + f_yy dy^2 = 6x dx^2 + 10 dx dy − 2 dy^2

Note that the second-order total differential can be written in matrix form as

d^2z = f_xx dx^2 + 2f_xy dx dy + f_yy dy^2 = (dx, dy) H (dx, dy)′

for the function z = f(x, y), where the matrix

H = | f_xx  f_xy |
    | f_yx  f_yy |

is the Hessian matrix (or simply a Hessian). Then, by the discussion on quadratic forms in Chapter 5, we have

(a) d^2z is positive definite iff f_xx > 0 and f_xx f_yy − (f_xy)^2 > 0;

(b) d^2z is negative definite iff f_xx < 0 and f_xx f_yy − (f_xy)^2 > 0.

The inequality f_xx f_yy − (f_xy)^2 > 0 implies that f_xx and f_yy are required to take the same sign.

Example 11.2.4 Given f_xx = −2, f_xy = 1, and f_yy = −1 at a certain point on a function z = f(x, y), does d^2z have a definite sign at that point regardless of the values of dx and dy? The Hessian determinant is in this case

| −2   1 |
|  1  −1 |,

with principal minors |H_1| = −2 < 0 and

|H_2| = | −2   1 | = 2 − 1 = 1 > 0.
        |  1  −1 |

Thus d^2z is negative definite.

Second-Order Condition for Extremum

Using the concept of d2z, we can state the second-order sufficient condition for:

126 (a) A maximum of z = f(x, y) if d2z < 0 for any values of dx and dy, not both zero;

(b) A minimum of z = f(x, y) if d2z > 0 for any values of dx and dy, not both zero.


For operational convenience, second-order differential conditions can be translated into equivalent conditions on second-order derivatives. The actual translation would require a knowledge of quadratic forms, which will be discussed in the next section. But we may first introduce the main result here.

Main Result

For any values of dx and dy, not both zero,

d^2z < 0 iff f_xx < 0, f_yy < 0, and f_xx f_yy > (f_xy)^2;

d^2z > 0 iff f_xx > 0, f_yy > 0, and f_xx f_yy > (f_xy)^2.

From the first- and second-order conditions, we obtain the conditions for a relative extremum:

Conditions for Maximum:

(1) fx = fy = 0 (necessary condition);

(2) f_xx < 0, f_yy < 0, and f_xx f_yy > (f_xy)^2.

Conditions for Minimum:

(1) fx = fy = 0 (necessary condition);

(2) f_xx > 0, f_yy > 0, and f_xx f_yy > (f_xy)^2.

Example 11.2.6 Find the extreme values of z = 8x^3 + 2xy − 3x^2 + y^2 + 1. Here

f_x = 24x^2 + 2y − 6x,   f_y = 2x + 2y,
f_xx = 48x − 6,   f_yy = 2,   f_xy = 2.

Setting f_x = 0 and f_y = 0, we have

24x^2 + 2y − 6x = 0
2y + 2x = 0.

Then y = −x, and substituting into 24x^2 + 2y − 6x = 0, we have 24x^2 − 8x = 0, which yields two solutions for x: x̄_1 = 0 and x̄_2 = 1/3.

Since f_xx(0, 0) = −6 and f_yy(0, 0) = 2 have opposite signs, it is impossible to have f_xx f_yy ≥ (f_xy)^2 = 4, so the point (x̄_1, ȳ_1) = (0, 0) is not an extreme point. For the solution (x̄_2, ȳ_2) = (1/3, −1/3), we find that f_xx = 10 > 0, f_yy = 2 > 0, and f_xx f_yy − (f_xy)^2 = 20 − 4 > 0, so (x̄, ȳ, z̄) = (1/3, −1/3, 23/27) is a relative minimum point.

Example 11.2.7 z = x + 2e·y − e^x − e^{2y}, where the coefficient of y is the constant 2e. Setting f_x = 1 − e^x = 0 and f_y = 2e − 2e^{2y} = 0, we have x̄ = 0 and ȳ = 1/2. Since f_xx = −e^x, f_yy = −4e^{2y}, and f_xy = 0, we have f_xx(0, 1/2) = −1 < 0, f_yy(0, 1/2) = −4e < 0, and f_xx f_yy − (f_xy)^2 > 0, so (x̄, ȳ, z̄) = (0, 1/2, −1) is the maximum point.
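Both the first-order conditions and the Hessian test can be automated. A sympy sketch of Example 11.2.6 (illustrative; sp.hessian assembles the matrix of second-order partials):

```python
# Sketch: Example 11.2.6 - solve the FOC, then test the Hessian (sympy).
import sympy as sp

x, y = sp.symbols('x y')
z = 8*x**3 + 2*x*y - 3*x**2 + y**2 + 1

stationary = sp.solve([sp.diff(z, x), sp.diff(z, y)], [x, y], dict=True)
H = sp.hessian(z, (x, y))
for pt in stationary:
    Hp = H.subs(pt)
    print(pt, Hp[0, 0], Hp.det())   # fxx and fxx*fyy - fxy^2 at each point
# (0, 0):      det = -16 < 0            -> not an extremum (saddle point)
# (1/3, -1/3): fxx = 10 > 0, det = 16   -> relative minimum
```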

11.3 Objective Functions with More than Two Variables

When there are n choice variables, the objective function may be expressed as

z = f(x1, x2, ··· , xn)

The total differential will then be

dz = f1dx1 + f2dx2 + ··· + fndxn.

so that the necessary condition for extremum is dz = 0 for arbitrary dxi, which in turn means that all the n first-order partial derivatives are required to be zero:

f1 = f2 = ··· = fn = 0.

It can be verified that the second-order differential d^2z can be written as

d^2z = (dx_1, dx_2, ···, dx_n) | f_11  f_12  ···  f_1n | | dx_1 |
                               | f_21  f_22  ···  f_2n | | dx_2 |
                               | ···   ···   ···  ···  | | ···  |
                               | f_n1  f_n2  ···  f_nn | | dx_n |

     ≡ (dx)′ H dx.

Thus the Hessian determinant is

|H| = | f_11  f_12  ···  f_1n |
      | f_21  f_22  ···  f_2n |
      | ···   ···   ···  ···  |
      | f_n1  f_n2  ···  f_nn |,

and the second-order sufficient condition for an extremum is, as before, that all the n leading principal minors be positive for a minimum in z, and that they duly alternate in sign for a maximum in z, the first one being negative. In summary, we have

Conditions for Maximum:

(1) f1 = f2 = ··· = fn = 0 (necessary condition);

(2) |H_1| < 0, |H_2| > 0, |H_3| < 0, ···, (−1)^n |H_n| > 0.  [d^2z is negative definite]

Conditions for Minimum:

(1) f1 = f2 = ··· = fn = 0 (necessary condition);

(2) |H_1| > 0, |H_2| > 0, |H_3| > 0, ···, |H_n| > 0.  [d^2z is positive definite]

Example 11.3.1 Find the extreme values of

z = 2x_1^2 + x_1x_2 + 4x_2^2 + x_1x_3 + x_3^2 + 2.

From the first-order conditions:

f1 = 0 : 4x1 + x2 + x3 = 0

f2 = 0 : x1 + 8x2 + 0 = 0

f3 = 0 : x1 + 0 + 2x3 = 0

we can find a unique solution x̄_1 = x̄_2 = x̄_3 = 0. This means that there is only one stationary value, z̄ = 2. The Hessian determinant of this function is

|H| = | f_11  f_12  f_13 |   | 4  1  1 |
      | f_21  f_22  f_23 | = | 1  8  0 |,
      | f_31  f_32  f_33 |   | 1  0  2 |

the leading principal minors of which are all positive: |H_1| = 4, |H_2| = 31, and |H_3| = 54. Thus we can conclude that z̄ = 2 is a minimum.
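Checking the leading principal minors is a small computation once the Hessian is written down. A sympy sketch of Example 11.3.1 (added for illustration):

```python
# Sketch: leading principal minors of Example 11.3.1's Hessian (sympy).
import sympy as sp

H = sp.Matrix([[4, 1, 1],
               [1, 8, 0],
               [1, 0, 2]])

minors = [H[:r, :r].det() for r in (1, 2, 3)]
print(minors)    # [4, 31, 54]: all positive, so z_bar = 2 is a minimum
```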

11.4 Second-Order Conditions in Relation to Concavity and Convexity

Second-order conditions, which are always concerned with whether a stationary point is the peak of a hill or the bottom of a valley, are closely related to the so-called (strictly) concave or convex functions. A function that gives rise to a hill (valley) over the entire domain is said to be a concave (convex) function. If the hill (valley) pertains only to a subset S of the domain, the function is said to be concave (convex) on S. Mathematically, a function is concave (convex) if and only if, for any pair of distinct points u and v in the domain of f, and for any 0 < θ < 1,

θf(u) + (1 − θ)f(v) ≤ f(θu + (1 − θ)v)

(or θf(u) + (1 − θ)f(v) ≥ f(θu + (1 − θ)v)).

Further, if the weak inequality "≤" ("≥") is replaced by the strict inequality "<" (">"), the function is said to be strictly concave (strictly convex).

Remark. θu + (1 − θ)v traces out the line segment between the points u and v as θ takes values in 0 ≤ θ ≤ 1. Thus, geometrically, the function f is concave (convex) iff the line segment joining any two points M and N on its graph lies on or below (above) the surface. The function is strictly concave (strictly convex) iff the line segment lies entirely below (above) the surface, except at M and N.

From the definition of concavity and convexity, we have the following three theorems:

Theorem I (Linear functions). If f(x) is a linear function, then it is a concave function as well as a convex function, but not strictly so.

Theorem II (Negative of a function). If f(x) is a (strictly) concave function, then −f(x) is a (strictly) convex function, and vice versa.

Theorem III (Sum of functions). If f(x) and g(x) are both concave (convex) functions, then f(x) + g(x) is a concave (convex) function. If, in addition, either one or both of them are strictly concave (strictly convex), then f(x) + g(x) is strictly concave (strictly convex).

In view of the association of concavity (convexity) with a global hill (valley) configuration, an extremum of a concave (convex) function must be a peak, i.e., a maximum (a bottom, i.e., a minimum). Moreover, that maximum (minimum) must be an absolute maximum (minimum). Further, the maximum (minimum) is unique if the function is strictly concave (strictly convex).

In the preceding paragraph, the properties of concavity and convexity are taken to be global in scope. If they are valid only for a portion of the surface (only on a subset S of the domain), the associated maximum and minimum are relative to that subset of the domain.

We know that when z = f(x_1, ···, x_n) is twice continuously differentiable, it reaches a maximum (minimum) if d^2z is negative (positive) definite. The following proposition shows the relationship between concavity (convexity) and negative (positive) semidefiniteness.

Proposition 11.4.1 A twice continuously differentiable function z = f(x_1, x_2, ···, x_n) is concave (convex) if and only if d^2z is everywhere negative (positive) semidefinite. The said function is strictly concave (convex) if (but not only if) d^2z is everywhere negative (positive) definite.

This proposition is useful: one can easily verify whether a function is strictly concave (strictly convex) by checking whether its Hessian matrix is everywhere negative (positive) definite.

Example 11.4.1 Check z = −x^4 for concavity or convexity by the derivative condition. Since d^2z/dx^2 = −12x^2 ≤ 0 for all x, it is concave. This function, in fact, is strictly concave.

Example 11.4.2 Check z = x_1^2 + x_2^2 for concavity or convexity. Since

|H| = | f_11  f_12 | = | 2  0 |,
      | f_21  f_22 |   | 0  2 |

|H_1| = 2 > 0 and |H_2| = 4 > 0. Thus, by the proposition, the function is strictly convex.

Proposition 11.4.2 Suppose f : R → R is differentiable. Then f is concave if and only if, for any x, y ∈ R, we have

f(y) ≤ f(x) + f'(x)(y − x).   (11.1)

Indeed, for a concave function, from Figure 11.1 we have

(u(x) − u(x*))/(x − x*) ≤ u'(x*),

which implies (11.1).

Figure 11.1: Concave function

When there are two or more independent variables, the above proposition becomes:

Proposition 11.4.3 Suppose f : R^n → R is differentiable. Then f is concave if and only if, for any x, y ∈ R^n, we have

f(y) ≤ f(x) + Σ_{j=1}^n [∂f(x)/∂x_j](y_j − x_j).   (11.2)

11.5 Economic Applications

Problem of a Multiproduct Firm

Example 11.5.1 Suppose a competitive firm produces two products. Let Q_i represent the output level of the ith product, and let the prices of the products be denoted by P_1 and P_2. Since the firm is competitive, it takes the prices as given. Accordingly, the firm's revenue function will be

TR = P_1Q_1 + P_2Q_2.

The firm's cost function is assumed to be

C = 2Q_1^2 + Q_1Q_2 + 2Q_2^2.

Thus, the profit function of this hypothetical firm is given by

π = TR − C = P_1Q_1 + P_2Q_2 − 2Q_1^2 − Q_1Q_2 − 2Q_2^2.

The firm wants to maximize profit by choosing the levels of Q_1 and Q_2. For this purpose, setting

∂π/∂Q_1 = 0:  4Q_1 + Q_2 = P_1
∂π/∂Q_2 = 0:  Q_1 + 4Q_2 = P_2,

we have

Q̄_1 = (4P_1 − P_2)/15   and   Q̄_2 = (4P_2 − P_1)/15.

Also, the Hessian matrix is

H = | π_11  π_12 | = | −4  −1 |.
    | π_21  π_22 |   | −1  −4 |

Since |H_1| = −4 < 0 and |H_2| = 16 − 1 > 0, the Hessian matrix (or d^2z) is negative definite, and the solution does maximize profit. In fact, since H is everywhere negative definite, the maximum profit found above is actually a unique absolute maximum.
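The same optimization can be reproduced symbolically. A sympy sketch of Example 11.5.1 (illustrative only):

```python
# Sketch: the competitive firm's problem of Example 11.5.1 (sympy).
import sympy as sp

Q1, Q2, P1, P2 = sp.symbols('Q1 Q2 P1 P2', positive=True)
pi = P1*Q1 + P2*Q2 - 2*Q1**2 - Q1*Q2 - 2*Q2**2

sol = sp.solve([sp.diff(pi, Q1), sp.diff(pi, Q2)], [Q1, Q2])
print(sol)    # {Q1: (4*P1 - P2)/15, Q2: (4*P2 - P1)/15}

H = sp.hessian(pi, (Q1, Q2))
print(H, H.det())   # Matrix([[-4, -1], [-1, -4]]), det = 15 > 0, H[0,0] < 0
```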

Example 11.5.2 Let us now transplant the problem in the above example into the setting of a monopolistic market. Suppose that the demand functions facing the monopolist firm are as follows:

Q1 = 40 − 2P1 + P2

Q2 = 15 + P1 − P2

Again, the cost function is given by

C = Q_1^2 + Q_1Q_2 + Q_2^2.

From the monopolistic’s demand function, we can express prices P1 and P2 as functions of Q1 and Q2. That is, solving

− 2P1 + P2 = Q1 − 40

P1 − P2 = Q2 − 15

we have

P1 = 55 − Q1 − Q2

P_2 = 70 − Q_1 − 2Q_2.

Consequently, the firm's total revenue function TR can be written as

TR = P_1Q_1 + P_2Q_2
   = (55 − Q_1 − Q_2)Q_1 + (70 − Q_1 − 2Q_2)Q_2
   = 55Q_1 + 70Q_2 − 2Q_1Q_2 − Q_1^2 − 2Q_2^2.

Thus the profit function is

π = TR − C = 55Q_1 + 70Q_2 − 3Q_1Q_2 − 2Q_1^2 − 3Q_2^2,

which is an objective function with two choice variables. Setting

∂π/∂Q_1 = 0:  4Q_1 + 3Q_2 = 55
∂π/∂Q_2 = 0:  3Q_1 + 6Q_2 = 70,

we find the solution output levels

(Q̄_1, Q̄_2) = (8, 7 2/3) = (8, 23/3).

The prices and profit are

P̄_1 = 39 1/3,   P̄_2 = 46 2/3,   and   π̄ = 488 1/3.

Inasmuch as the Hessian determinant is

| −4  −3 |,
| −3  −6 |

we have |H_1| = −4 < 0 and |H_2| = 15 > 0, so that the value π̄ does represent the maximum. Also, since the Hessian matrix is everywhere negative definite, it is a unique absolute maximum.

Chapter 12

Optimization with Equality Constraints

The last chapter presented a general method for finding the relative extrema of an objective function of two or more variables. One important feature of that discussion is that all the choice variables are independent of one another, in the sense that the decision made regarding one variable does not impinge upon the choices of the remaining variables. However, in many cases optimization problems are constrained optimization problems. For example, every consumer maximizes his utility subject to his budget constraint, and a firm minimizes the cost of production subject to the constraint of its production technique. In the present chapter, we shall consider the problem of optimization with equality constraints. Our primary concern will be with relative constrained extrema.

12.1 Effects of a Constraint

In general, for a function, say z = f(x, y), the difference between a constrained extremum and a free extremum may be illustrated in Figure 12.1. The free extremum in this particular graph is the peak point of the entire domain, but the constrained extremum is at the peak of the inverse U-shaped curve situated on top of the constraint line. In general, a constrained maximum can be expected to have a lower value than the free maximum, although, by coincidence, the two maxima may happen to have the same value. But the constrained maximum can never exceed the free maximum.

Figure 12.1: Difference between a constrained extremum and a free extremum

In general, the number of constraints should be less than the number of choice variables.

12.2 Finding the Stationary Values

For illustration, let us consider the problem of a consumer who maximizes his utility

u(x_1, x_2) = x_1x_2 + 2x_1

subject to the budget constraint

4x_1 + 2x_2 = 60.

Even without any new technique of solution, the constrained maximum in this problem can easily be found. Since the budget line implies

x_2 = (60 − 4x_1)/2 = 30 − 2x_1,

we can combine the constraint with the objective function by substitution. The result is an objective function in one variable only:

u = x_1(30 − 2x_1) + 2x_1 = 32x_1 − 2x_1^2,

which can be handled with the method already learned. By setting

du/dx_1 = 32 − 4x_1 = 0,

we get the solution x̄_1 = 8 and thus, by the budget constraint, x̄_2 = 30 − 2x̄_1 = 30 − 16 = 14. Since d^2u/dx_1^2 = −4 < 0, that stationary value constitutes a (constrained) maximum. However, when the constraint is itself a complicated function, or when the constraint cannot be solved to express one variable as an explicit function of the other variables, the technique of substitution and elimination of variables could become a burdensome task or would in fact be of no avail. In such cases, we may resort to a method known as the method of Lagrange multipliers.

Lagrange-Multiplier Method

The essence of the Lagrange-multiplier method is to convert a constrained-extremum problem into a form such that the first-order condition of the free-extremum problem can still be applied. In general, given an objective function

z = f(x, y) subject to the constraint g(x, y) = c,

where c is a constant, we can write the Lagrange function as

Z = f(x, y) + λ[c − g(x, y)].

The symbol λ, representing some as yet undetermined number, is called a Lagrange multiplier. If we can somehow be assured that g(x, y) = c, so that the constraint is satisfied, then the last term of Z will vanish regardless of the value of λ. In that event, Z will be identical with z. Moreover, with the constraint out of the way, we only have to seek the free maximum of Z. The question is: how can we make the bracketed expression in Z vanish?

The tactic that will accomplish this is simply to treat λ as an additional variable, i.e., to consider Z = Z(λ, x, y). For stationary values of Z, the first-order condition for a free extremum is

Z_λ ≡ ∂Z/∂λ = c − g(x, y) = 0
Z_x ≡ ∂Z/∂x = f_x − λg_x(x, y) = 0
Z_y ≡ ∂Z/∂y = f_y − λg_y(x, y) = 0.

Example 12.2.1 Let us again consider the consumer's choice problem above. The Lagrange function is

Z = x_1x_2 + 2x_1 + λ[60 − 4x_1 − 2x_2],

for which the necessary condition for a stationary value is

Z_λ = 60 − 4x_1 − 2x_2 = 0
Z_{x_1} = x_2 + 2 − 4λ = 0
Z_{x_2} = x_1 − 2λ = 0.

Solving for the critical values of the variables, we find that x̄_1 = 8, x̄_2 = 14, and λ̄ = 4. As expected, x̄_1 = 8 and x̄_2 = 14 are the same as those obtained by the substitution method.

Example 12.2.2 Find the extremum of z = xy subject to x + y = 6. The Lagrange function is Z = xy + λ(6 − x − y)

The first-order condition is

Zλ = 6 − x − y = 0

Zx = y − λ = 0

Zy = x − λ = 0.

Thus, we find λ̄ = 3, x̄ = 3, ȳ = 3, so that z̄ = 9.

Example 12.2.3 Find the extremum of z = x_1^2 + x_2^2 subject to x_1 + 4x_2 = 2. The Lagrange function is

Z = x_1^2 + x_2^2 + λ(2 − x_1 − 4x_2).

The first-order condition (FOC) is

Z_λ = 2 − x_1 − 4x_2 = 0
Z_{x_1} = 2x_1 − λ = 0
Z_{x_2} = 2x_2 − 4λ = 0.

The stationary value of Z is defined by the solution

λ̄ = 4/17,   x̄_1 = 2/17,   x̄_2 = 8/17,   z̄ = 4/17.

To tell whether z̄ is a maximum or a minimum, we need to consider the second-order condition.
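The Lagrange-multiplier method translates directly into code: build Z, set all its partials to zero, and solve. A sympy sketch of Example 12.2.3 (illustrative only):

```python
# Sketch: Example 12.2.3 via the Lagrangian (sympy).
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam')
Z = x1**2 + x2**2 + lam*(2 - x1 - 4*x2)

foc = [sp.diff(Z, v) for v in (lam, x1, x2)]
sol = sp.solve(foc, [lam, x1, x2], dict=True)[0]
print(sol)                              # {lam: 4/17, x1: 2/17, x2: 8/17}
print((x1**2 + x2**2).subs(sol))        # 4/17, the stationary value
```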

An Interpretation of the Lagrange Multiplier

The Lagrange multiplier λ̄ measures the sensitivity of Z to changes in the constraint constant c. Suppose we can express the solution values λ̄, x̄, and ȳ all as implicit functions of the parameter c:

λ̄ = λ̄(c),   x̄ = x̄(c),   and   ȳ = ȳ(c),

all of which have continuous derivatives. Then we have the identities

c − g(x̄, ȳ) ≡ 0
f_x(x̄, ȳ) − λ̄ g_x(x̄, ȳ) ≡ 0
f_y(x̄, ȳ) − λ̄ g_y(x̄, ȳ) ≡ 0.

Thus, we can consider Z as a function of c:

Z̄ = f(x̄, ȳ) + λ̄[c − g(x̄, ȳ)].

Then

dZ̄/dc = f_x (dx̄/dc) + f_y (dȳ/dc) + (dλ̄/dc)[c − g(x̄, ȳ)] + λ̄[1 − g_x (dx̄/dc) − g_y (dȳ/dc)]
       = (f_x − λ̄g_x)(dx̄/dc) + (f_y − λ̄g_y)(dȳ/dc) + (dλ̄/dc)[c − g(x̄, ȳ)] + λ̄
       = λ̄.

That is, the solution value of the Lagrange multiplier measures the rate of change of the optimal value of Z with respect to a relaxation of the constraint.

n-Variable and Multiconstraint Cases

The generalization of the Lagrange-multiplier method to n variables can easily be carried out. The objective function is

z = f(x1, x2, ··· , xn) subject to

g(x1, x2, ··· , xn) = c.

It follows that the Lagrange function will be

Z = f(x1, x2, ··· , xn) + λ[c − g(x1, x2, ··· , xn)] for which the first-order condition will be given by

Z_λ = c − g(x_1, x_2, ···, x_n) = 0

Z_i = f_i(x_1, x_2, ···, x_n) − λ g_i(x_1, x_2, ···, x_n) = 0   [i = 1, 2, ···, n].

If the objective function has more than one constraint, say, two constraints

g(x1, x2, ··· , xn) = c and h(x1, x2, ··· , xn) = d

The Lagrange function is

Z = f(x1, x2, ··· , xn) + λ[c − g(x1, x2, ··· , xn)] + µ[d − h(x1, x2, ··· , xn)],

for which the first-order condition consists of (n + 2) equations:

Zλ = c − g(x1, x2, ··· , xn) = 0

Zµ = d − h(x1, x2, ··· , xn) = 0

Zi = fi(x1, x2, ··· , xn) − λgi(x1, x2, ··· , xn) − µhi(x1, x2, ··· , xn) = 0

12.3 Second-Order Condition

From the last section, we know that finding a constrained extremum is equivalent to finding the free extremum of the Lagrange function Z, which gives the first-order condition. This section gives the second-order condition for the constrained extremum of f.

We only consider the case where the objective function takes the form

z = f(x1, x2, ··· , xn) subject to

g(x1, x2, ··· , xn) = c.

The Lagrange function is then

Z = f(x1, x2, ··· , xn) + λ[c − g(x1, x2, ··· , xn)].

Define the bordered Hessian determinant |H̄| by

|H̄| = | 0    g_1   g_2   ···  g_n  |
      | g_1  Z_11  Z_12  ···  Z_1n |
      | g_2  Z_21  Z_22  ···  Z_2n |
      | ···  ···   ···   ···  ···  |
      | g_n  Z_n1  Z_n2  ···  Z_nn |,

where Z_ij = f_ij − λ g_ij. Note that by the first-order condition,

λ = f_1/g_1 = f_2/g_2 = ··· = f_n/g_n.

The bordered principal minors can be defined as

|H̄_2| = | 0    g_1   g_2  |        |H̄_3| = | 0    g_1   g_2   g_3  |
        | g_1  Z_11  Z_12 |                | g_1  Z_11  Z_12  Z_13 |   (etc.)
        | g_2  Z_21  Z_22 |                | g_2  Z_21  Z_22  Z_23 |
                                           | g_3  Z_31  Z_32  Z_33 |

The Conditions for Maximum:

(1) Zλ = Z1 = Z2 = ··· = Zn = 0 [necessary condition]

(2) |H̄_2| > 0, |H̄_3| < 0, |H̄_4| > 0, ···, (−1)^n |H̄_n| > 0.

The Conditions for Minimum:

(1) Z_λ = Z_1 = Z_2 = ··· = Z_n = 0 [necessary condition]

(2) |H̄_2| < 0, |H̄_3| < 0, |H̄_4| < 0, ···, |H̄_n| < 0.

Example 12.3.1 For the objective function z = xy subject to x + y = 6, we have shown that (x̄, ȳ, z̄) = (3, 3, 9) is an extremum point. Since Z_x = y − λ and Z_y = x − λ, we have Z_xx = 0, Z_xy = 1, Z_yy = 0, and g_x = g_y = 1. Thus, we find that

|H̄| = | 0  1  1 |
      | 1  0  1 | = 2 > 0,
      | 1  1  0 |

which establishes the value z̄ = 9 as a maximum.

Example 12.3.2 For the objective function z = x_1^2 + x_2^2 subject to x_1 + 4x_2 = 2, we have shown that (x̄_1, x̄_2, z̄) = (2/17, 8/17, 4/17) is the extremum point. To tell whether it is a maximum or a minimum, we check the second-order sufficient condition. Since Z_1 = 2x_1 − λ and Z_2 = 2x_2 − 4λ, as well as g_1 = 1 and g_2 = 4, we have Z_11 = 2, Z_22 = 2, and Z_12 = Z_21 = 0. It thus follows that the bordered Hessian is

|H̄| = | 0  1  4 |
      | 1  2  0 | = −34 < 0,
      | 4  0  2 |

and the value z̄ = 4/17 is a minimum.
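The bordered Hessian test is likewise a small determinant computation. A sketch for Example 12.3.2 (illustrative):

```python
# Sketch: the bordered Hessian of Example 12.3.2 evaluated in sympy.
import sympy as sp

Hbar = sp.Matrix([[0, 1, 4],
                  [1, 2, 0],
                  [4, 0, 2]])
print(Hbar.det())    # -34 < 0, so z_bar = 4/17 is a minimum
```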

12.4 General Setup of the Problem

Now consider the most general setting of the problem, with n variables and m equality constraints ("extremize" means to find either the minimum or the maximum of the objective function f):

extremize f(x1, . . . , xn) (12.1)

s.t.  g^j(x_1, . . . , x_n) = b_j,   j = 1, 2, . . . , m < n.

Again, f is called the objective function; g^1, g^2, . . . , g^m are the constraint functions; and b_1, b_2, . . . , b_m are the constraint constants. The difference n − m is the number of degrees of freedom of the problem. Note that n must be strictly greater than m; otherwise there is no freedom to choose.

If it is possible to explicitly express (from the constraint functions) m independent variables as functions of the other n − m independent variables, we can eliminate m variables in the objective function; thus the initial problem will be reduced to an unconstrained optimization problem with respect to n − m variables. However, in many cases it is not technically feasible to explicitly express one variable as a function of the others. Instead of the substitution-and-elimination method, we may resort to the easy-to-use and well-defined method of Lagrange multipliers.

Let f and g^1, . . . , g^m be C^(1)-functions and let the Jacobian J = (∂g^j/∂x_i), i = 1, 2, . . . , n, j = 1, 2, . . . , m, have full rank, i.e., rank(J) = m. We introduce the Lagrangian function

L(x_1, . . . , x_n, λ_1, . . . , λ_m) = f(x_1, . . . , x_n) + Σ_{j=1}^m λ_j (b_j − g^j(x_1, . . . , x_n)),

where λ_1, . . . , λ_m are constants (Lagrange multipliers). What are the necessary conditions for the solution of problem (12.1)?

Equate all partials of L with respect to x1, . . . , xn, λ1, . . . , λm to zero:

∂L(x_1, . . . , x_n, λ_1, . . . , λ_m)/∂x_i = ∂f(x_1, . . . , x_n)/∂x_i − Σ_{j=1}^m λ_j ∂g^j(x_1, . . . , x_n)/∂x_i = 0,   i = 1, 2, . . . , n

∂L(x_1, . . . , x_n, λ_1, . . . , λ_m)/∂λ_j = b_j − g^j(x_1, . . . , x_n) = 0,   j = 1, 2, . . . , m.

Solve these equations for x1, . . . , xn and λ1, . . . , λm. ∗ ∗ ∗ In the end we will get a set of stationary points of the Lagrangian. If x = (x1, . . . , xn) is a solution of the problem (12.1), it should be a stationary point of L. It is important to assume that rank(J ) = m, and the functions are continuously differentiable. If we need to check whether a stationary point is a maximum or minimum, the following local sufficient conditions can be applied:

Proposition 12.4.1 Let us introduce a bordered Hessian $|\bar{H}_r|$ as
$$|\bar{H}_r| = \det\begin{pmatrix} 0 & \cdots & 0 & \frac{\partial g^1}{\partial x_1} & \cdots & \frac{\partial g^1}{\partial x_r} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 0 & \cdots & 0 & \frac{\partial g^m}{\partial x_1} & \cdots & \frac{\partial g^m}{\partial x_r} \\ \frac{\partial g^1}{\partial x_1} & \cdots & \frac{\partial g^m}{\partial x_1} & \frac{\partial^2 L}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 L}{\partial x_1 \partial x_r} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ \frac{\partial g^1}{\partial x_r} & \cdots & \frac{\partial g^m}{\partial x_r} & \frac{\partial^2 L}{\partial x_r \partial x_1} & \cdots & \frac{\partial^2 L}{\partial x_r \partial x_r} \end{pmatrix}, \quad r = 1, 2, \ldots, n.$$
Let $f$ and $g^1, \ldots, g^m$ be $C^{(2)}$-functions and let $x^*$ satisfy the necessary conditions for problem (12.1).

Let $|\bar{H}_r(x^*)|$ be the bordered Hessian determinant evaluated at $x^*$. Then

(1) if $(-1)^m |\bar{H}_r(x^*)| > 0$, $r = m+1, \ldots, n$, then $x^*$ is a local minimum point for problem (12.1);

(2) if $(-1)^r |\bar{H}_r(x^*)| > 0$, $r = m+1, \ldots, n$, then $x^*$ is a local maximum point for problem (12.1).
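To make these necessary conditions concrete, here is a minimal SymPy sketch (an illustration assuming SymPy is available; not part of the original notes) that solves the stationary-point equations of the Lagrangian for Example 12.3.2:

import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam')
f = x1**2 + x2**2
g = x1 + 4*x2                          # constraint g = b with b = 2
L = f + lam * (2 - g)                  # L = f + lam*(b - g)
eqs = [sp.diff(L, v) for v in (x1, x2, lam)]
print(sp.solve(eqs, [x1, x2, lam], dict=True))
# [{x1: 2/17, x2: 8/17, lam: 4/17}] -- the stationary point found above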

12.5 Quasiconcavity and Quasiconvexity

For a problem of free extremum, we know that the concavity (convexity) of the objective function guarantees the existence of an absolute maximum (absolute minimum). For a problem of constrained optimization, we will demonstrate that the quasiconcavity (quasiconvexity) of the objective function guarantees the existence of an absolute maximum (absolute minimum).

Algebraic Characterization

Quasiconcavity and quasiconvexity, like concavity and convexity, can be either strict or nonstrict:

Definition 12.5.1 A function is quasiconcave (quasiconvex) iff, for any pair of distinct points u and v in the convex domain of f, and for 0 < θ < 1, we have

f(θu + (1 − θ)v) ≥ min{f(u), f(v)}

[f(θu + (1 − θ)v) ≤ max{f(u), f(v)}].

Note that when f(v) ≥ f(u), the above inequalities imply respectively

f(θu + (1 − θ)v) ≥ f(u)

[f(θu + (1 − θ)v) ≤ f(v)].

Furthermore, if the weak inequality “≥” (“≤”) is replaced by the strict inequality “>” (“<”), f is said to be strictly quasiconcave (strictly quasiconvex).

Remark: From the definition of quasiconcavity (quasiconvexity), we can see that quasiconcavity (quasiconvexity) is a weaker condition than concavity (convexity).

Theorem I (Negative of a function). If f(x) is quasiconcave (strictly quasiconcave), then −f(x) is quasiconvex (strictly quasiconvex).

Theorem II (concavity versus quasiconcavity). Any (strictly) concave (convex) function is (strictly) quasiconcave (quasiconvex), but the converse is not necessarily true.

Theorem III (linear function). If f(x) is linear, then it is quasiconcave as well as quasiconvex.

Theorem IV (monotone function of one variable). If f is a monotone function of one variable, then it is quasiconcave as well as quasiconvex.

Remark: Note that, unlike concave (convex) functions, a sum of two quasiconcave (quasiconvex) functions is not necessarily quasiconcave (quasiconvex).

Sometimes it may prove easier to check quasiconcavity and quasiconvexity by the following alternative (equivalent) definition. We state it as a proposition.

Proposition 12.5.1 A function f(x), where x is a vector of variables in the domain X, is quasiconcave (quasiconvex) if and only if, for any constant k, the set S≥ ≡ {x ∈ X : f(x) ≥ k} (S≤ ≡ {x ∈ X : f(x) ≤ k}) is convex.

Example 12.5.1 (1) $z = x^2$ is quasiconvex since $S^{\le}$ is convex. (2) $z = f(x, y) = xy$ is quasiconcave since $S^{\ge}$ is convex. (3) $z = f(x, y) = (x-a)^2 + (y-b)^2$ is quasiconvex since $S^{\le}$ is convex.

The above facts can be seen by looking at graphs of these functions.

Differentiable Functions

Similar to the case of concavity, when a function is differentiable we have the following result.

Proposition 12.5.2 Suppose $f : R \rightarrow R$ is differentiable. Then $f$ is quasiconcave if and only if for any $x, y \in R$ we have
$$f(y) \ge f(x) \;\Rightarrow\; f'(x)(y - x) \ge 0. \qquad (12.2)$$

When there are two or more independent variables, the above proposition becomes:

Proposition 12.5.3 Suppose $f : R^n \rightarrow R$ is differentiable. Then $f$ is quasiconcave if and only if for any $x, y \in R^n$ we have
$$f(y) \ge f(x) \;\Rightarrow\; \sum_{j=1}^n \frac{\partial f(x)}{\partial x_j}(y_j - x_j) \ge 0. \qquad (12.3)$$

If a function $z = f(x_1, x_2, \cdots, x_n)$ is twice continuously differentiable, quasiconcavity and quasiconvexity can be checked by means of the first and second order partial derivatives of the function.

Define a bordered determinant as follows:
$$|B| = \begin{vmatrix} 0 & f_1 & f_2 & \cdots & f_n \\ f_1 & f_{11} & f_{12} & \cdots & f_{1n} \\ f_2 & f_{21} & f_{22} & \cdots & f_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ f_n & f_{n1} & f_{n2} & \cdots & f_{nn} \end{vmatrix}.$$

Remark: The determinant $|B|$ is different from the bordered Hessian $|\bar{H}|$. Unlike $|\bar{H}|$, the border in $|B|$ is composed of the first derivatives of the function $f$ rather than of an extraneous constraint function $g$. We can define the successive principal minors of $|B|$ as follows:
$$|B_1| = \begin{vmatrix} 0 & f_1 \\ f_1 & f_{11} \end{vmatrix}, \quad |B_2| = \begin{vmatrix} 0 & f_1 & f_2 \\ f_1 & f_{11} & f_{12} \\ f_2 & f_{21} & f_{22} \end{vmatrix}, \quad \cdots, \quad |B_n| = |B|.$$

A necessary condition for a function $z = f(x_1, \cdots, x_n)$ defined on the nonnegative orthant to be quasiconcave is that
$$|B_1| \le 0, \quad |B_2| \ge 0, \quad |B_3| \le 0, \quad \cdots, \quad (-1)^n |B_n| \ge 0.$$

A sufficient condition for $f$ to be strictly quasiconcave on the nonnegative orthant is that
$$|B_1| < 0, \quad |B_2| > 0, \quad |B_3| < 0, \quad \cdots, \quad (-1)^n |B_n| > 0.$$

For strict quasiconvexity, the corresponding sufficient condition is that

$$|B_1| < 0, \quad |B_2| < 0, \quad \cdots, \quad |B_n| < 0.$$

Example 12.5.2 Let $z = f(x_1, x_2) = x_1 x_2$ for $x_1 > 0$ and $x_2 > 0$. Since $f_1 = x_2$, $f_2 = x_1$, $f_{11} = f_{22} = 0$, and $f_{12} = f_{21} = 1$, the relevant principal minors turn out to be
$$|B_1| = \begin{vmatrix} 0 & x_2 \\ x_2 & 0 \end{vmatrix} = -x_2^2 < 0, \quad |B_2| = \begin{vmatrix} 0 & x_2 & x_1 \\ x_2 & 0 & 1 \\ x_1 & 1 & 0 \end{vmatrix} = 2x_1 x_2 > 0.$$
Thus $z = x_1 x_2$ is strictly quasiconcave on the positive orthant.
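As a numerical sanity check of these minors at a sample point, say $(x_1, x_2) = (2, 3)$, one can evaluate the bordered determinants directly (a sketch assuming NumPy; not part of the original notes):

import numpy as np

x1, x2 = 2.0, 3.0
# For f = x1*x2: f1 = x2, f2 = x1, f11 = f22 = 0, f12 = f21 = 1.
B2 = np.array([[0.0, x2, x1],
               [x2, 0.0, 1.0],
               [x1, 1.0, 0.0]])
B1 = B2[:2, :2]                  # leading bordered principal minor
print(np.linalg.det(B1))         # -x2**2 = -9 < 0
print(np.linalg.det(B2))         # 2*x1*x2 = 12 > 0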

Example 12.5.3 Show that $z = f(x, y) = x^a y^b$ ($x, y > 0$; $a, b > 0$) is quasiconcave. Since
$$f_x = ax^{a-1}y^b, \quad f_y = bx^a y^{b-1},$$
$$f_{xx} = a(a-1)x^{a-2}y^b, \quad f_{xy} = abx^{a-1}y^{b-1}, \quad f_{yy} = b(b-1)x^a y^{b-2},$$
we have
$$|B_1| = \begin{vmatrix} 0 & f_x \\ f_x & f_{xx} \end{vmatrix} = -(ax^{a-1}y^b)^2 < 0,$$
$$|B_2| = \begin{vmatrix} 0 & f_x & f_y \\ f_x & f_{xx} & f_{xy} \\ f_y & f_{yx} & f_{yy} \end{vmatrix} = [2a^2b^2 - a(a-1)b^2 - a^2 b(b-1)]x^{3a-2}y^{3b-2} = ab(a+b)x^{3a-2}y^{3b-2} > 0,$$
and thus it is strictly quasiconcave.

Remark: When the constraint $g$ is linear, $g(x) = a_1 x_1 + \cdots + a_n x_n = c$, the second-order partial derivatives of $g$ disappear, and thus, from the first-order condition $f_i = \lambda g_i$, the bordered determinant $|B|$ and the bordered Hessian determinant have the following relationship:
$$|B| = \lambda^2 |\bar{H}|.$$

Consequently, in the linear-constraint case the two bordered determinants always have the same sign at the stationary point of $z$. The same is true for the principal minors. Thus, the first-order necessary condition is also a sufficient condition for the maximization problem.

Absolute versus Relative Extrema

If a function is quasiconcave (quasiconvex), then, by reasoning similar to that for concave (convex) functions, its relative maximum (relative minimum) is an absolute maximum (absolute minimum).

12.6 Utility Maximization and Consumer Demand

Let us now re-examine the consumer choice problem, i.e., the utility maximization problem. For simplicity, consider only the two-commodity case. The consumer wants to maximize his utility
$$u = u(x, y) \quad (u_x > 0,\ u_y > 0)$$
subject to his budget constraint
$$P_x x + P_y y = B,$$
taking the prices $P_x$ and $P_y$ as well as his income $B$ as given.

First-Order Condition

The Lagrangian function is
$$Z = u(x, y) + \lambda(B - P_x x - P_y y).$$
The first-order conditions are:
$$Z_\lambda = B - P_x x - P_y y = 0,$$
$$Z_x = u_x - \lambda P_x = 0,$$
$$Z_y = u_y - \lambda P_y = 0.$$

From the last two equations, we have
$$\frac{u_x}{P_x} = \frac{u_y}{P_y} = \lambda,$$
or
$$\frac{u_x}{u_y} = \frac{P_x}{P_y}.$$
The term $\frac{u_x}{u_y} \equiv MRS_{xy}$ is the so-called marginal rate of substitution of $x$ for $y$. Thus, we obtain the well-known equality $MRS_{xy} = \frac{P_x}{P_y}$, which is the necessary condition for an interior solution.

Second-Order Condition

If the bordered Hessian in the present problem is positive, i.e., if
$$|\bar{H}| = \begin{vmatrix} 0 & P_x & P_y \\ P_x & u_{xx} & u_{xy} \\ P_y & u_{yx} & u_{yy} \end{vmatrix} = 2P_x P_y u_{xy} - P_y^2 u_{xx} - P_x^2 u_{yy} > 0$$

(with all the derivatives evaluated at the critical values $\bar{x}$ and $\bar{y}$), then the stationary value of $u$ will assuredly be a maximum. Since the budget constraint is linear, from the result in the last section we have
$$|B| = \lambda^2 |\bar{H}|.$$
Thus, as long as $|B| > 0$, we know the second-order condition holds. Recall that $|B| > 0$ means that the utility function is strictly quasi-concave. Also, quasi-concavity of a utility function means that the indifference curves represented by the utility function are convex, i.e., the upper contour set $\{y : u(y) \ge u(x)\}$ is convex, in which case we say the preferences represented by the utility function are convex.
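To see the first- and second-order conditions at work on a concrete utility function, here is a minimal SymPy sketch; the choice $u(x, y) = xy$ is an assumption made purely for illustration (not part of the original notes):

import sympy as sp

x, y, lam, Px, Py, B = sp.symbols('x y lam P_x P_y B', positive=True)
u = x * y                                    # assumed (strictly quasiconcave) utility
Z = u + lam * (B - Px*x - Py*y)              # Lagrangian of the consumer problem
foc = [sp.diff(Z, v) for v in (x, y, lam)]
sol = sp.solve(foc, [x, y, lam], dict=True)[0]
print(sol[x], sol[y])                        # demands: B/(2*P_x), B/(2*P_y)
# At the optimum, MRS_xy = u_x/u_y equals the price ratio P_x/P_y:
print(sp.simplify((u.diff(x) / u.diff(y)).subs(sol) - Px/Py))  # 0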

Remark 12.6.1 The convexity of preferences implies that people want to diversify their consumption, and thus convexity can be viewed as the formal expression of the basic desire for diversification. Also, strict quasi-concavity implies strict convexity of $\succ_i$, which in turn implies the conventional diminishing marginal rate of substitution (MRS).

Chapter 13

Optimization with Inequality Constraints

Classical methods of optimization (the method of Lagrange multipliers) deal with optimization problems with equality constraints of the form $g(x_1, \ldots, x_n) = c$. Nonclassical optimization, also known as mathematical programming, tackles problems with inequality constraints such as $g(x_1, \ldots, x_n) \le c$. Mathematical programming includes linear programming and nonlinear programming. In linear programming, the objective function and all inequality constraints are linear. When either the objective function or an inequality constraint is nonlinear, we face a problem of nonlinear programming. In the following, we restrict our attention to nonlinear programming. A problem of linear programming, also called a linear program, is discussed in the next chapter.

13.1 Non-Linear Programming

The nonlinear programming problem is that of choosing nonnegative values of certain variables so as to maximize or minimize a given (non-linear) function subject to a given set of (non-linear) inequality constraints.

The nonlinear programming maximization problem is
$$\max f(x_1, \ldots, x_n)$$
$$\text{s.t. } g^i(x_1, \ldots, x_n) \le b_i, \quad i = 1, 2, \ldots, m,$$
$$x_1 \ge 0, \ldots, x_n \ge 0.$$
Similarly, the minimization problem is
$$\min f(x_1, \ldots, x_n)$$
$$\text{s.t. } g^i(x_1, \ldots, x_n) \ge b_i, \quad i = 1, 2, \ldots, m,$$
$$x_1 \ge 0, \ldots, x_n \ge 0.$$

First, note that there are no restrictions on the relative sizes of $m$ and $n$, unlike the case of equality constraints. Second, note that the direction of the inequalities ($\le$ or $\ge$) in the constraints is only a convention, because the inequality $g^i \le b_i$ can easily be converted to a $\ge$ inequality by multiplying it by $-1$, yielding $-g^i \ge -b_i$. Third, note that an equality constraint, say $g^k = b_k$, can be replaced by the two inequality constraints $g^k \le b_k$ and $-g^k \le -b_k$.

Definition 13.1.1 (Binding Constraint) A constraint $g^j \le b_j$ is called binding (or active) at $x^0 = (x_1^0, \ldots, x_n^0)$ if $g^j(x^0) = b_j$.

Example 13.1.1 The following is an example of a nonlinear program:

max π = x1(10 − x1) + x2(20 − x2)

s.t. 5x1 + 3x2 ≤ 40

x1 ≤ 5

x2 ≤ 10

x1 ≥ 0, x2 ≥ 0

Example 13.1.2 ( How to Solve a Nonlinear Program Graphically) An intuitive way to understand a low-dimensional non-linear program is to represent the feasible set and the objective function graphically.

For instance, we may try to represent graphically the optimization problem from Example (13.1.1), which can be re-written as
$$\max \pi = 125 - (x_1 - 5)^2 - (x_2 - 10)^2$$
$$\text{s.t. } 5x_1 + 3x_2 \le 40, \quad x_1 \le 5, \quad x_2 \le 10, \quad x_1 \ge 0, \ x_2 \ge 0,$$
or
$$\min C = (x_1 - 5)^2 + (x_2 - 10)^2$$
$$\text{s.t. } 5x_1 + 3x_2 \le 40, \quad x_1 \le 5, \quad x_2 \le 10, \quad x_1 \ge 0, \ x_2 \ge 0.$$
Region OABCD below shows the feasible set of the nonlinear program.

The value of the objective function can be interpreted as the square of the distance from the point (5, 10) to the point (x1, x2). Thus, the nonlinear program requires us to find the point in the feasible region which minimizes the distance to the point (5, 10).

As can be seen in the figure above, the optimal point $(x_1^*, x_2^*)$ lies on the line $5x_1 + 3x_2 = 40$ and is also the point of tangency with a circle centered at $(5, 10)$. These two conditions are sufficient to determine the optimal solution.

Consider $(x_1 - 5)^2 + (x_2 - 10)^2 - C = 0$. From the implicit-function theorem, we have
$$\frac{dx_2}{dx_1} = -\frac{x_1 - 5}{x_2 - 10}.$$
Thus, the tangency condition for the circle centered at $(5, 10)$ and a line of slope $-\frac{5}{3}$ is given by the equation
$$-\frac{5}{3} = -\frac{x_1 - 5}{x_2 - 10},$$
or
$$3x_1 - 5x_2 + 35 = 0.$$
Solving the system of equations
$$\begin{cases} 5x_1 + 3x_2 = 40, \\ 3x_1 - 5x_2 = -35, \end{cases}$$
we find the optimal solution $(x_1^*, x_2^*) = \left(\frac{95}{34}, \frac{295}{34}\right)$.
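This tangency solution can be cross-checked with a numerical solver; the following is a minimal sketch assuming SciPy is available (not part of the original notes):

from scipy.optimize import minimize

# min (x1-5)^2 + (x2-10)^2 subject to the constraints of Example 13.1.1
obj = lambda x: (x[0] - 5)**2 + (x[1] - 10)**2
cons = [{'type': 'ineq', 'fun': lambda x: 40 - 5*x[0] - 3*x[1]},   # 5x1 + 3x2 <= 40
        {'type': 'ineq', 'fun': lambda x: 5 - x[0]},               # x1 <= 5
        {'type': 'ineq', 'fun': lambda x: 10 - x[1]}]              # x2 <= 10
res = minimize(obj, x0=[1.0, 1.0], bounds=[(0, None), (0, None)], constraints=cons)
print(res.x)   # approximately [2.7941, 8.6765], i.e., (95/34, 295/34)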

13.2 Kuhn-Tucker Conditions

For the purpose of ruling out certain irregularities on the boundary of the feasible set, a restriction on the constraint functions is imposed. This restriction is called the constraint qualification.

Definition 13.2.1 Denote by C the constraint set. We say that the constraint qualification condition is satisfied at $x^* \in C$ if the constraints that hold at $x^*$ with equality are independent; that is, if the gradients (the vectors of partial derivatives) of the $g^j$-constraints that are binding at $x^*$, evaluated at $x^*$, are linearly independent.

Define the Lagrangian function for optimization with inequality constraints:
$$L(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_m) = f(x_1, \ldots, x_n) + \sum_{j=1}^m \lambda_j\left(b_j - g^j(x_1, \ldots, x_n)\right).$$
The following result is the necessity theorem for the Kuhn-Tucker conditions at a local optimum when the constraint qualification is satisfied.

Proposition 13.2.1 (Necessity Theorem of Kuhn-Tucker Conditions) Suppose that the constraint qualification condition is satisfied and that the objective function and constraint functions are differentiable. Then:

(1) the Kuhn-Tucker (in short, KT) necessary conditions for maximization are:
$$\frac{\partial L}{\partial x_i} \le 0, \quad x_i \ge 0, \quad x_i\frac{\partial L}{\partial x_i} = 0, \quad i = 1, \ldots, n;$$
$$\frac{\partial L}{\partial \lambda_j} \ge 0, \quad \lambda_j \ge 0, \quad \lambda_j\frac{\partial L}{\partial \lambda_j} = 0, \quad j = 1, \ldots, m.$$

(2) the Kuhn-Tucker necessary conditions for minimization are:
$$\frac{\partial L}{\partial x_i} \ge 0, \quad x_i \ge 0, \quad x_i\frac{\partial L}{\partial x_i} = 0, \quad i = 1, \ldots, n;$$
$$\frac{\partial L}{\partial \lambda_j} \le 0, \quad \lambda_j \ge 0, \quad \lambda_j\frac{\partial L}{\partial \lambda_j} = 0, \quad j = 1, \ldots, m.$$

In general, the KT conditions are neither necessary nor sufficient for a local optimum without further conditions such as the constraint qualification condition. However, if certain assumptions are satisfied, the KT conditions become necessary and even sufficient.

Example 13.2.1 The Lagrangian function of the nonlinear program in Example (13.1.1) is
$$L = x_1(10 - x_1) + x_2(20 - x_2) - \lambda_1(5x_1 + 3x_2 - 40) - \lambda_2(x_1 - 5) - \lambda_3(x_2 - 10).$$
The KT conditions are:
$$\frac{\partial L}{\partial x_1} = 10 - 2x_1 - 5\lambda_1 - \lambda_2 \le 0;$$
$$\frac{\partial L}{\partial x_2} = 20 - 2x_2 - 3\lambda_1 - \lambda_3 \le 0;$$
$$\frac{\partial L}{\partial \lambda_1} = -(5x_1 + 3x_2 - 40) \ge 0;$$
$$\frac{\partial L}{\partial \lambda_2} = -(x_1 - 5) \ge 0;$$
$$\frac{\partial L}{\partial \lambda_3} = -(x_2 - 10) \ge 0;$$
$$x_1 \ge 0, \quad x_2 \ge 0; \quad \lambda_1 \ge 0, \quad \lambda_2 \ge 0, \quad \lambda_3 \ge 0;$$
$$x_1\frac{\partial L}{\partial x_1} = 0, \quad x_2\frac{\partial L}{\partial x_2} = 0; \quad \lambda_i\frac{\partial L}{\partial \lambda_i} = 0, \quad i = 1, 2, 3.$$
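At the optimum $(95/34, 295/34)$ only the first constraint binds, so $\lambda_2 = \lambda_3 = 0$, and $\lambda_1$ can be recovered from $\partial L/\partial x_1 = 0$ (which must hold with equality since $x_1 > 0$). The following sketch checks the KT conditions exactly using Python's Fraction type (a verification aid, not part of the original notes):

from fractions import Fraction as F

x1, x2 = F(95, 34), F(295, 34)
lam2 = lam3 = F(0)                   # constraints x1 <= 5 and x2 <= 10 are slack
lam1 = (10 - 2*x1 - lam2) / 5        # from dL/dx1 = 0 (x1 > 0)
print(lam1)                          # 15/17, nonnegative as required
print(20 - 2*x2 - 3*lam1 - lam3)     # dL/dx2 = 0, consistent with x2 > 0
print(40 - 5*x1 - 3*x2)              # first constraint binds: 0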

Notice that the failure of the constraint qualification signals certain irregularities at the boundary kinks of the feasible set. Only if the optimal solution occurs at such a kink may the KT conditions fail to be satisfied. If all constraints are linear, the constraint qualification is always satisfied.

Example 13.2.2 The constraint qualification for the nonlinear program in Example (13.1.1) is satisfied since all constraints are linear. Therefore, the optimal solution $\left(\frac{95}{34}, \frac{295}{34}\right)$ must satisfy the KT conditions derived in Example (13.2.1).

Example 13.2.3 The following example illustrates a case where the Kuhn-Tucker conditions are not satisfied at the solution of an optimization problem. Consider the problem:
$$\max y$$
$$\text{s.t. } x + (y - 1)^3 \le 0,$$
$$x \ge 0, \quad y \ge 0.$$
The solution to this problem is $(0, 1)$. (If $y > 1$, then the restriction $x + (y - 1)^3 \le 0$ implies $x < 0$.) The Lagrangian function is
$$L = y + \lambda[-x - (y - 1)^3].$$
One of the Kuhn-Tucker conditions requires $\frac{\partial L}{\partial y} \le 0$, or $1 - 3\lambda(y - 1)^2 \le 0$. As can be observed, this condition is not verified at the point $(0, 1)$.

Proposition 13.2.2 (Kuhn-Tucker Sufficiency Theorem 1) Suppose the following conditions are satisfied:

(a) f is differentiable and concave in the nonnegative orthant;

(b) each constraint function is differentiable and convex in the nonnegative orthant;

(c) the point x∗ satisfies the Kuhn-Tucker necessary maximum conditions.

Then x∗ gives a global maximum of f.

Example 13.2.4 Suppose the ordering is represented by the linear utility function $u(x, y) = ax + by$.

Since the marginal rate of substitution $a/b$ and the economic rate of substitution $p_x/p_y$ are both constant, they cannot in general be equal. So the first-order condition cannot hold with equality as long as $\frac{a}{b} \neq \frac{p_x}{p_y}$. In this case the answer to the utility-maximization problem typically involves a boundary solution: only one of the two goods will be consumed. It is worthwhile presenting a more formal solution since it serves as a nice example of the Kuhn-Tucker theorem in action. The Kuhn-Tucker theorem is the appropriate tool to use here, since we will almost never have an interior solution. The Lagrangian function is
$$L(x, y, \lambda) = ax + by + \lambda(m - p_x x - p_y y),$$
and thus
$$\frac{\partial L}{\partial x} = a - \lambda p_x, \qquad (13.1)$$
$$\frac{\partial L}{\partial y} = b - \lambda p_y, \qquad (13.2)$$
$$\frac{\partial L}{\partial \lambda} = m - p_x x - p_y y. \qquad (13.3)$$
There are four cases to be considered:

Case 1. $x > 0$ and $y > 0$. Then $\frac{\partial L}{\partial x} = 0$ and $\frac{\partial L}{\partial y} = 0$. Thus, $\frac{a}{b} = \frac{p_x}{p_y}$. Since $\lambda = \frac{a}{p_x} > 0$, we have $p_x x + p_y y = m$, and thus all $x$ and $y$ that satisfy $p_x x + p_y y = m$ are optimal consumptions.

Case 2. $x > 0$ and $y = 0$. Then $\frac{\partial L}{\partial x} = 0$ and $\frac{\partial L}{\partial y} \le 0$. Thus, $\frac{a}{b} \ge \frac{p_x}{p_y}$. Since $\lambda = \frac{a}{p_x} > 0$, we have $p_x x + p_y y = m$, and thus $x = \frac{m}{p_x}$ is the optimal consumption.

Case 3. $x = 0$ and $y > 0$. Then $\frac{\partial L}{\partial x} \le 0$ and $\frac{\partial L}{\partial y} = 0$. Thus, $\frac{a}{b} \le \frac{p_x}{p_y}$. Since $\lambda = \frac{b}{p_y} > 0$, we have $p_x x + p_y y = m$, and thus $y = \frac{m}{p_y}$ is the optimal consumption.

Case 4. $x = 0$ and $y = 0$. Then $\frac{\partial L}{\partial x} \le 0$ and $\frac{\partial L}{\partial y} \le 0$. Since $\lambda \ge \frac{b}{p_y} > 0$, we have $p_x x + p_y y = m$, and thus $m = 0$ because $x = 0$ and $y = 0$.

In summary, the demand functions are given by
$$(x(p_x, p_y, m),\ y(p_x, p_y, m)) = \begin{cases} (m/p_x,\ 0) & \text{if } a/b > p_x/p_y, \\ (0,\ m/p_y) & \text{if } a/b < p_x/p_y, \\ (x,\ (m - p_x x)/p_y) & \text{if } a/b = p_x/p_y, \end{cases}$$
for all $x \in [0, m/p_x]$.

Figure 13.1: Utility maximization for linear utility function

Remark 13.2.1 In fact, the optimal solutions are easily found by comparing the relative steepness of the indifference curves and the budget line. For instance, as shown in Figure 13.1 below, when $\frac{a}{b} > \frac{p_x}{p_y}$, the indifference curves are steeper than the budget line, and thus the optimal solution is the one where the consumer spends all his income on good $x$. When $\frac{a}{b} < \frac{p_x}{p_y}$, the indifference curves are flatter, and thus the optimal solution is the one where the consumer spends all his income on good $y$. When $\frac{a}{b} = \frac{p_x}{p_y}$, the indifference curves and the budget line are parallel and coincide at the optimal solutions, and thus the optimal solutions are given by all the points on the budget line.

The concavity of the objective function and the convexity of the constraints may be weakened to quasiconcavity and quasiconvexity, respectively. To adapt this theorem for minimization problems, we need to interchange the two words "concave" and "convex" in (a) and (b) and use the Kuhn-Tucker necessary minimum conditions in (c).

Proposition 13.2.3 (Kuhn-Tucker Sufficiency Theorem 2) The KT conditions are sufficient conditions for $x^*$ to be a local optimum of a maximization (minimization) program if the following assumptions are satisfied:

(a) the objective function $f$ is differentiable and quasiconcave (or quasiconvex);

(b) each constraint $g^i$ is differentiable and quasiconvex (or quasiconcave);

(c) any one of the following is satisfied:

(c.i) there exists $j$ such that $\frac{\partial f(x^*)}{\partial x_j} < 0$ ($> 0$);

(c.ii) there exists $j$ such that $\frac{\partial f(x^*)}{\partial x_j} > 0$ ($< 0$) and $x_j$ can take a positive value without violating the constraints;

(c.iii) $f$ is concave (or convex).

The problem of finding a nonnegative vector $(x^*, \lambda^*)$, $x^* = (x_1^*, \ldots, x_n^*)$, $\lambda^* = (\lambda_1^*, \ldots, \lambda_m^*)$, which satisfies the Kuhn-Tucker necessary conditions and for which
$$L(x, \lambda^*) \le L(x^*, \lambda^*) \le L(x^*, \lambda) \quad \forall\, x = (x_1, \ldots, x_n) \ge 0,\ \lambda = (\lambda_1, \ldots, \lambda_m) \ge 0,$$
is known as the saddle point problem.

Proposition 13.2.4 If (x∗, λ∗) solves the saddle point problem then (x∗, λ∗) solves the problem (13.1).

13.3 Economic Applications

Economic Interpretation of a Nonlinear Program and Kuhn-Tucker Conditions

A maximization program in the general form, for example, is the production problem facing a firm which has to produce n goods such that it maximizes its revenue subject to

m resource (factor) constraints. The variables have the following economic interpretations:

• xj is the amount produced of the jth product;

• ri is the amount of the ith resource available;

• f is the profit (revenue) function;

• gi is a function which shows how the ith resource is used in producing the n goods.

The optimal solution to the maximization program indicates the optimal quantities of each good the firm should produce. In order to interpret the Kuhn-Tucker conditions, we first have to note the meanings of the following variables:

• $f_j = \frac{\partial f}{\partial x_j}$ is the marginal profit (revenue) of product $j$;

• $\lambda_i$ is the shadow price of resource $i$;

• $g^i_j = \frac{\partial g^i}{\partial x_j}$ is the amount of resource $i$ used in producing a marginal unit of product $j$;

• $\lambda_i g^i_j$ is the imputed cost of resource $i$ incurred in the production of a marginal unit of product $j$.

The KT condition $\frac{\partial L}{\partial x_j} \le 0$ can be written as $f_j \le \sum_{i=1}^m \lambda_i g^i_j$, and it says that the marginal profit of the $j$th product cannot exceed the aggregate marginal imputed cost of the $j$th product.

The condition $x_j\frac{\partial L}{\partial x_j} = 0$ implies that, in order to produce good $j$ ($x_j > 0$), the marginal profit of good $j$ must be equal to the aggregate marginal imputed cost ($\frac{\partial L}{\partial x_j} = 0$). The same condition shows that good $j$ is not produced ($x_j = 0$) if there is an excess imputation ($\frac{\partial L}{\partial x_j} < 0$).

The KT condition $\frac{\partial L}{\partial \lambda_i} \ge 0$ is simply a restatement of constraint $i$, which states that the total amount of resource $i$ used in producing all $n$ goods should not exceed the total amount available $r_i$.

The condition $\lambda_i\frac{\partial L}{\partial \lambda_i} = 0$ indicates that if a resource is not fully used at the optimal solution ($\frac{\partial L}{\partial \lambda_i} > 0$), then its shadow price will be zero ($\lambda_i = 0$). On the other hand, a fully used resource ($\frac{\partial L}{\partial \lambda_i} = 0$) has a strictly positive shadow price ($\lambda_i > 0$).

Example 13.3.1 Let us find an economic interpretation for the maximization program given in Example (13.1.1):

max R = x1(10 − x1) + x2(20 − x2)

s.t. 5x1 + 3x2 ≤ 40

x1 ≤ 5

x2 ≤ 10

x1 ≥ 0, x2 ≥ 0

A firm has to produce two goods using three kinds of resources available in the amounts 40, 5, 10, respectively. The first resource is used in the production of both goods: five units are necessary to produce one unit of good 1, and three units to produce one unit of good 2. The second resource is used only in producing good 1, and the third resource is used only in producing good 2. The sale prices of the two goods are given by the linear inverse demand equations $p_1 = 10 - x_1$ and $p_2 = 20 - x_2$. The problem the firm faces is how much to produce of each good in order to maximize revenue $R = x_1 p_1 + x_2 p_2$. The solution $\left(\frac{95}{34}, \frac{295}{34}\right)$ found earlier gives the optimal amounts the firm should produce.

The Sales-Maximizing Firm

Suppose that a firm's objective is to maximize its sales revenue subject to the constraint that profit is not less than a certain value. Let Q denote the amount of the good supplied to the market. If the revenue function is $R(Q) = 20Q - Q^2$, the cost function is $C(Q) = Q^2 + 6Q + 2$, and the minimum profit is 10, find the sales-maximizing quantity. The firm's problem is
$$\max R(Q) \quad \text{s.t. } R(Q) - C(Q) \ge 10, \quad Q \ge 0,$$
or
$$\max R(Q) \quad \text{s.t. } 2Q^2 - 14Q + 2 \le -10, \quad Q \ge 0.$$

The Lagrangian function is
$$Z = 20Q - Q^2 - \lambda(2Q^2 - 14Q + 12),$$
and the Kuhn-Tucker conditions are
$$\frac{\partial Z}{\partial Q} \le 0, \quad Q \ge 0, \quad Q\frac{\partial Z}{\partial Q} = 0, \qquad (7)$$
$$\frac{\partial Z}{\partial \lambda} \ge 0, \quad \lambda \ge 0, \quad \lambda\frac{\partial Z}{\partial \lambda} = 0. \qquad (8)$$
Explicitly, the first inequalities in each row above are
$$-Q + 10 - \lambda(2Q - 7) \le 0, \qquad (9)$$
$$-Q^2 + 7Q - 6 \ge 0. \qquad (10)$$
The second inequality in (7) and inequality (10) imply that $Q > 0$ (if $Q$ were 0, (10) would not be satisfied). From the third condition in (7) it follows that $\frac{\partial Z}{\partial Q} = 0$. If $\lambda$ were 0, the condition $\frac{\partial Z}{\partial Q} = 0$ would lead to $Q = 10$, which violates (10); thus, we must have $\lambda > 0$ and $\frac{\partial Z}{\partial \lambda} = 0$. The quadratic equation $-Q^2 + 7Q - 6 = 0$ has the roots $Q_1 = 1$ and $Q_2 = 6$. $Q_1 = 1$ implies $\lambda = -\frac{9}{5}$, which contradicts $\lambda > 0$. The solution $Q_2 = 6$ leads to a positive value of $\lambda$. Thus, the sales-maximizing quantity is $Q = 6$.
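The two candidate roots can be screened symbolically; a small SymPy sketch (a verification aid, not part of the original notes):

import sympy as sp

Q, lam = sp.symbols('Q lam')
dZdQ = -Q + 10 - lam * (2*Q - 7)            # condition (9), imposed with equality
for root in sp.solve(-Q**2 + 7*Q - 6, Q):   # roots 1 and 6 of condition (10)
    print(root, sp.solve(sp.Eq(dZdQ.subs(Q, root), 0), lam))
# Q = 1 gives lam = -9/5 (infeasible); Q = 6 gives lam = 4/5 > 0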

Chapter 14

Linear Programming

In this chapter, the objective and constraint functions are all linear.

14.1 The Setup of the Problem

There are two general types of linear programs: the maximization program in $n$ variables subject to $m + n$ constraints,
$$\max \pi = \sum_{j=1}^n c_j x_j$$
$$\text{s.t. } \sum_{j=1}^n a_{ij} x_j \le r_i \quad (i = 1, 2, \ldots, m),$$
$$x_j \ge 0 \quad (j = 1, 2, \ldots, n);$$
and the minimization program in $n$ variables subject to $m + n$ constraints,
$$\min C = \sum_{j=1}^n c_j x_j$$
$$\text{s.t. } \sum_{j=1}^n a_{ij} x_j \ge r_i \quad (i = 1, 2, \ldots, m),$$
$$x_j \ge 0 \quad (j = 1, 2, \ldots, n).$$
Note that without loss of generality all constraints in a linear program may be written as either $\le$ inequalities or $\ge$ inequalities, because any $\le$ inequality can be transformed into a $\ge$ inequality by multiplying both sides by $-1$.

A more concise way to express a linear program is by using matrix notation. The maximization program in $n$ variables subject to $m + n$ constraints is
$$\max \pi = c'x \quad \text{s.t. } Ax \le r, \quad x \ge 0;$$
the minimization program in $n$ variables subject to $m + n$ constraints is
$$\min C = c'x \quad \text{s.t. } Ax \ge r, \quad x \ge 0,$$
where $c = (c_1, \ldots, c_n)'$, $x = (x_1, \ldots, x_n)'$, $r = (r_1, \ldots, r_m)'$, and
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}.$$

Example 14.1.1 The following is an example of a linear program:

Max π = 10x1 + 30x2

s.t. 5x1 + 3x2 ≤ 40

x1 ≤ 5

x2 ≤ 10

x1 ≥ 0, x2 ≥ 0

Proposition 14.1.1 (The Globality Theorem) If the feasible set F of an optimization problem is a closed convex set, and if the objective function is a continuous concave (convex) function over the feasible set, then

(1) any local maximum (minimum) will also be a global maximum (minimum); and

(2) the points in F at which the objective function is optimized will constitute a convex set.

Note that any linear program satisfies the assumptions of the globality theorem. Indeed, the objective function is linear; thus, it is both a concave and a convex function. On the other hand, each inequality restriction defines a closed halfspace, which is also a convex set. Since the intersection of a finite number of closed convex sets is also a closed convex set, it follows that the feasible set F is closed and convex. The theorem says that for a linear program there is no difference between a local optimum and a global optimum, and that if a pair of optimal solutions to a linear program exists, then any convex combination of the two must also be an optimal solution.

Definition 14.1.1 An extreme point of a linear program is a point in its feasible set which cannot be derived from a convex combination of any other two points in the feasible set.

Example 14.1.2 For the problem in Example (14.1.1) the extreme points are (0, 0), (0, 10), (2, 10), (5, 5), (5, 0). They are just the corners of the feasible region represented in the $x_1 O x_2$ system of coordinates.

Proposition 14.1.2 The optimal solution of a linear program is an extreme point.

14.2 The Simplex Method

The next question is to develop an algorithm which will enable us to solve a linear program. Such an algorithm is called the simplex method. The simplex method is a systematic procedure for finding the optimal solution of a linear program. Loosely speaking, using this method we move from one extreme point of the feasible region to another until the optimal point is attained. The following two examples illustrate how the simplex method can be applied:

Example 14.2.1 Consider again the maximization problem in Example (14.1.1) .

• Step 1.

Transform each inequality constraint into an equality constraint by inserting a dummy variable (slack variable) in each constraint.

After applying this step, the problem becomes:

Max π = 10x1 + 30x2

s.t. 5x1 + 3x2 + d1 = 40

x1 + d2 = 5

x2 + d3 = 10

x1 ≥ 0, x2 ≥ 0, d1 ≥ 0, d2 ≥ 0, d3 ≥ 0

The extreme points of the new feasible set are called basic feasible solutions (BFS). For each extreme point of the original feasible set there is a BFS. For example,

the BFS corresponding to the extreme point (x1, x2) = (0, 0) is (x1, x2, d1, d2, d3) = (0, 0, 40, 5, 10).

• Step 2

Find a BFS to be able to start the algorithm.

A BFS is easy to find when (0, 0) is in the feasible set, as it is in our example. Here, the starting BFS is (0, 0, 40, 5, 10). In general, finding a BFS requires the introduction of additional artificial variables. Example 14.2.2 illustrates this situation.

• Step 3

Set up a simplex tableau like the following one:

Row  π  x1   x2   d1  d2  d3  Constant
0    1  -10  -30  0   0   0   0
1    0  5    3    1   0   0   40
2    0  1    0    0   1   0   5
3    0  0    1    0   0   1   10

Row 0 contains the coefficients of the objective function. Rows 1, 2, 3 contain the coefficients of the constraints as they appear in the matrix of coefficients.

In row 0 we can see the variables which have nonzero values in the current BFS: they correspond to the zeros in row 0. In this case, the nonzero variables are d1, d2, d3. These nonzero variables always correspond to a basis in $R^m$, where $m$ is the number of constraints (see the general form of a linear program). The current basis is formed by the column coefficient vectors which appear in the tableau in rows 1, 2, 3 under d1, d2, d3 ($m = 3$).

The rightmost column shows the nonzero values of the current BFS (in rows 1,2,3) and the value of the objective function for the current BFS (in row 0). That is,

d1 = 40, d2 = 5, d3 = 10, π = 0.

• Step 4

Choose the pivot.

The simplex method finds the optimal solution by moving from one BFS to another BFS until the optimum is reached. The new BFS is chosen such that the value of the objective function is improved (that is, it is larger for a maximization problem and lower for a minimization problem).

In order to switch to another BFS, we need to replace a variable currently in the basis by a variable which is not in the basis. This amounts to choosing a pivot element which will indicate both the new variable and the exit variable.

The pivot element is selected in the following way. The new variable (and, corre- spondingly, the pivot column) is that variable having the lowest negative value in row 0 (largest positive value for a minimization problem). Once we have determined the pivot column, the pivot row is found in this way:

– pick the strictly positive elements in the pivot column except for the element in row 0;

– select the corresponding elements in the constant column and divide them by the strictly positive elements in the pivot column;

– the pivot row corresponds to the minimum of these ratios.

In our example, the pivot column is the column of x2 (since -30 < -10) and the pivot row is row 3 (since min{40/3, 10/1} = 10/1); thus, the pivot element is the 1 which lies at the intersection of the column of x2 and row 3.

• Step 5

168 Make the appropriate transformations to reduce the pivot column to a unit vector (that is, a vector with 1 in the place of the pivot element and 0 in the rest).

Usually, first we have to divide the pivot row by the pivot element to obtain 1 in the place of the pivot element. However, this is not necessary in our case since we already have 1 in that position.

Then the new version of the pivot row is used to get zeros in the other places of the pivot column. In our example, we multiply the pivot row by 30 and add it to row 0; then we multiply the pivot row by -3 and add it to row 1.

Hence we obtain a new simplex tableau:

Row  π  x1   x2  d1  d2  d3  Constant
0    1  -10  0   0   0   30  300
1    0  5    0   1   0   -3  10
2    0  1    0   0   1   0   5
3    0  0    1   0   0   1   10

• Step 6

Check whether row 0 still contains negative values (positive values for a minimization problem). If it does, return to step 4. If it does not, the optimal solution has been found.

In our example, we still have a negative value in row 0 (-10). According to step 4, the column of x1 is the pivot column (x1 enters the basis). Since min{10/5, 5/1} = 10/5, it follows that row 1 is the pivot row. Thus, the 5 in row 1 is the new pivot element. Applying step 5, we obtain a third simplex tableau:

Row  π  x1  x2  d1    d2  d3    Constant
0    1  0   0   2     0   24    320
1    0  1   0   1/5   0   -3/5  2
2    0  0   0   -1/5  1   3/5   3
3    0  0   1   0     0   1     10

There is no negative element left in row 0, and the current basis consists of x1, x2, d2. Thus, the optimal solution can be read from the rightmost column: x1 = 2, x2 = 10, d2 = 3, π = 320. The correspondence between the optimal values and the variables in the current basis is made using the 1's in the column vectors of x1, x2, d2.
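The same optimum can be reproduced with an off-the-shelf LP solver; a minimal sketch using scipy.optimize.linprog (assuming SciPy is available; linprog minimizes, so the objective is negated):

from scipy.optimize import linprog

c = [-10, -30]                      # maximize 10*x1 + 30*x2
A_ub = [[5, 3], [1, 0], [0, 1]]
b_ub = [40, 5, 10]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)              # [2. 10.] and 320.0, matching the tableau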

Example 14.2.2 Consider the minimization problem:

Min C = x1 + x2

s.t. x1 − x2 ≥ 0

− x1 − x2 ≥ −6

x2 ≥ 1

x1 ≥ 0, x2 ≥ 0

After applying step 1 of the simplex algorithm, we obtain:

Min C = x1 + x2

s.t. x1 − x2 − d1 = 0

− x1 − x2 − d2 = −6

x2 − d3 = 1

x1 ≥ 0, x2 ≥ 0, d1 ≥ 0, d2 ≥ 0, d3 ≥ 0

In this example, (0, 0) is not in the feasible set. One way of finding a starting BFS is to transform the problem by adding 3 more artificial variables f1, f2, f3 to the linear program as follows:

Min C = x1 + x2 + M(f1 + f2 + f3)

s.t. x1 − x2 − d1 + f1 = 0

x1 + x2 + d2 + f2 = 6

x2 − d3 + f3 = 1

x1 ≥ 0, x2 ≥ 0, d1 ≥ 0, d2 ≥ 0, d3 ≥ 0, f1 ≥ 0, f2 ≥ 0, f3 ≥ 0

Before adding the artificial variables, the second constraint is multiplied by -1 to obtain a positive number on the right-hand side.

As long as M is high enough, the optimal solution will contain f1 = 0, f2 = 0, f3 = 0 and thus the optimum is not affected by including the artificial variables in the linear

program. In our case, we choose M = 100. (In maximization problems M is chosen very low.)

Now, the starting BFS is (x1, x2, d1, d2, d3, f1, f2, f3) = (0, 0, 0, 0, 0, 0, 6, 1) and the first simplex tableau is:

Row  C  x1  x2  d1  d2  d3  f1    f2    f3    Constant
0    1  -1  -1  0   0   0   -100  -100  -100  0
1    0  1   -1  -1  0   0   1     0     0     0
2    0  1   1   0   1   0   0     1     0     6
3    0  0   1   0   0   -1  0     0     1     1

In order to obtain unit vectors in the columns of f1, f2 and f3, we add 100(row1 + row2 + row3) to row 0:

Row  C  x1   x2  d1    d2   d3    f1  f2  f3  Constant
0    1  199  99  -100  100  -100  0   0   0   700
1    0  1    -1  -1    0    0     1   0   0   0
2    0  1    1   0     1    0     0   1   0   6
3    0  0    1   0     0    -1    0   0   1   1

Proceeding with the other steps of the simplex algorithm will finally lead to the optimal solution x1 = 1, x2 = 1, C = 2. Sometimes the starting BFS (or the starting basis) can be found in a simpler way than by using the general method that was just illustrated. For example, our linear program can be written as:

Min C = x1 + x2

s.t. − x1 + x2 + d1 = 0

x1 + x2 + d2 = 6

x2 − d3 = 1

x1 ≥ 0, x2 ≥ 0, d1 ≥ 0, d2 ≥ 0, d3 ≥ 0

Taking x1 = 0, x2 = 0 we get d1 = 0, d2 = 6, d3 = −1. Only d3 contradicts the constraint d3 ≥ 0. Thus, we need to add only one artificial variable f to the linear program and the variables in the starting base will be d1, d2, f. The complete sequence of simplex tableaux leading to the optimal solution is given below.

Row  C  x1  x2  d1  d2  d3  f     Constant
0    1  -1  -1  0   0   0   -100  0
1    0  -1  1   1   0   0   0     0
2    0  1   1   0   1   0   0     6
3    0  0   1   0   0   -1  1     1

Row  C  x1  x2   d1  d2  d3    f  Constant
0    1  -1  99   0   0   -100  0  100
1    0  -1  1*   1   0   0     0  0
2    0  1   1    0   1   0     0  6
3    0  0   1    0   0   -1    1  1

Row  C  x1   x2  d1   d2  d3    f  Constant
0    1  98   0   -99  0   -100  0  100
1    0  -1   1   1    0   0     0  0
2    0  2    0   -1   1   0     0  6
3    0  1*   0   -1   0   -1    1  1

Row  C  x1  x2  d1  d2  d3  f    Constant
0    1  0   0   -1  0   -2  -98  2
1    0  0   1   0   0   -1  1    1
2    0  0   0   1   1   2   -2   4
3    0  1   0   -1  0   -1  1    1

(The asterisks mark the pivot elements.)

As can be seen from the last tableau, the optimum is x1 = 1, x2 = 1, d2 = 4,C = 2.

Economic Interpretation of a Maximization Program

The maximization program in the general form is the production problem facing a firm which has to produce n goods such that it maximizes its revenue subject to m resource (factor) constraints. The variables have the following economic interpretations:

xj is the amount produced of the jth product;

ri is the amount of the ith resource available;

aij is the amount of the ith resource used in producing a unit of the jth product;

cj is the price of a unit of product j. The optimal solution to the maximization program indicates the optimal quantities of each good the firm should produce.

Example 14.2.3 Let us find an economic interpretation of the maximization program given in Example (14.1.1):

Max π = 10x1 + 30x2

s.t. 5x1 + 3x2 ≤ 40

x1 ≤ 5

x2 ≤ 10

x1 ≥ 0, x2 ≥ 0

A firm has to produce two goods using three kinds of resources available in the amounts 40, 5, 10, respectively. The first resource is used in the production of both goods: five units are necessary to produce one unit of good 1, and three units to produce one unit of good 2. The second resource is used only in producing good 1, and the third resource is used only in producing good 2. The question is, how much of each good should the firm produce in order to maximize its revenue if the sale prices of the goods are 10 and 30, respectively? The solution (2, 10) gives the optimal amounts the firm should produce.

Economics Application 10 (Economic Interpretation of a Minimization Program) The minimization program (see the general form) is also connected to the firm's production problem. This time the firm wants to produce a fixed combination of m goods using n resources (factors). The problem lies in choosing the amount of each resource such that the total cost of all resources used in the production of the fixed combination of goods is minimized. The variables have the following economic interpretations:

xj is the amount of the jth resource used in the production process;

ri is the amount of the ith good the firm decides to produce;

aij is the marginal productivity of resource j in producing good i ;

cj is the price of a unit of resource j. The optimal solution to the minimization program indicates the optimal quantities of each factor the firm should employ.

Example 14.2.4 Consider again the minimization program in example (14.2.2) :

Min C = x1 + x2

s.t. x1 − x2 ≥ 0

x1 + x2 ≤ 6

x2 ≥ 1

x1 ≥ 0, x2 ≥ 0

What economic interpretation can be assigned to this linear program? A firm has to produce two goods, at most six units of the first good and one unit of the second good. Two factors are used in the production process, and the price of both factors is one. The marginal productivity of both factors in producing good 1 is one, and the marginal productivities of the factors in producing good 2 are zero and one, respectively (good 2 uses only the second factor). The production process also requires that the amount of the second factor not exceed the amount of the first factor. The question is how much of each factor the firm should employ in order to minimize the total factor cost. The solution (1, 1) gives the optimal factor amounts.

14.3 Duality

As a matter of fact, the maximization and the minimization programs are closely related. This relationship is called duality.

Definition 14.3.1 The dual (program) of the maximization program

Max π = c′x

s.t. Ax ≤ r

x ≥ 0 is the minimization program

Min π∗ = r′y

s.t. A′y ≥ c

y ≥ 0

174 Definition 14.3.2 The dual (program) of the minimization program

Min C = c′x

s.t. Ax ≥ r

x ≥ 0

is the maximization program

Max C∗ = r′y

s.t. A′y ≤ c

y ≥ 0

The original program from which the dual is derived is called the primal (program).

Example 14.3.1 The dual of the maximization program in Example (14.1.1) is

∗ Min C = 40y1 + 5y2 + 10y3

s.t. 5y1 + y2 ≥ 10

3y1 + y3 ≥ 30

y1 ≥ 0, y2 ≥ 0, y3 ≥ 0

Example 14.3.2 The dual of the minimization program in Example (14.2.2) is

∗ Max π = −6y2 + y3

s.t. y1 − y2 ≤ 1

− y1 − y2 + y3 ≤ 1

y1 ≥ 0, y2 ≥ 0, y3 ≥ 0

The following duality theorems clarify the relationship between the dual and the pri- mal.

Proposition 14.3.1 The optimal values of the primal and the dual objective functions are always identical, if the optimal solutions exist.

Proposition 14.3.2 1. If a certain choice variable in a primal (dual) program is optimally nonzero, then the corresponding dummy variable in the dual (primal) must be optimally zero.

2. If a certain dummy variable in a primal (dual) program is optimally nonzero, then the corresponding choice variable in the dual (primal) must be optimally zero.

The last simplex tableau of a linear program offers not only the optimal solution to that program but also the optimal solution to the dual program.

Here we summarize how to solve the dual program: the optimal solution to the dual program can be read from row 0 of the last simplex tableau of the primal in the following way:

1. the absolute values of the elements in the columns corresponding to the primal dummy variables are the values of the dual choice variables.

2. the absolute values of the elements in the columns corresponding to the primal choice variables are the values of the dual dummy variables.

Example 14.3.3 From the last simplex tableau in Example 14.2.1 we can infer the optimal solution to the problem in Example 14.3.1. It is $y_1 = 2$, $y_2 = 0$, $y_3 = 24$, $\pi^* = 320$.

Example 14.3.4 The last simplex tableau in Example 14.2.2 implies that the optimal solution to the problem in Example 14.3.2 is $y_1 = 1$, $y_2 = 0$, $y_3 = 2$, $C = 2$.
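Both duals can also be solved directly to confirm these readings; a sketch for the dual in Example 14.3.1 (again assuming SciPy, with the >= constraints rewritten as <=):

from scipy.optimize import linprog

c = [40, 5, 10]                       # minimize 40*y1 + 5*y2 + 10*y3
A_ub = [[-5, -1, 0], [-3, 0, -1]]     # from 5y1 + y2 >= 10 and 3y1 + y3 >= 30
b_ub = [-10, -30]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(res.x, res.fun)                 # [2. 0. 24.] and 320.0, as read from row 0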

Chapter 15

Continuous Dynamics: Differential Equations

Definition 15.0.3 An equation $F(x, y, y', \ldots, y^{(n)}) = 0$ which assumes a functional relationship between the independent variable $x$ and the dependent variable $y$, and includes $x$, $y$ and derivatives of $y$ with respect to $x$, is called an ordinary differential equation of order $n$.

The order of a differential equation is determined by the order of the highest derivative in the equation. A function $y = y(x)$ is said to be a solution of this differential equation, in some open interval $I$, if $F(x, y(x), y'(x), \ldots, y^{(n)}(x)) = 0$ for all $x \in I$.

15.1 Differential Equations of the First Order

Let us consider first-order differential equations, $y' = f(x, y)$. The solution $y(x)$ is said to solve Cauchy's problem (or the initial value problem) with the initial values $(x_0, y_0)$ if $y(x_0) = y_0$.

Proposition 36 (The Existence Theorem) If $f$ is continuous in an open domain $D$, then for any given pair $(x_0, y_0) \in D$ there exists a solution $y(x)$ of $y' = f(x, y)$ such that $y(x_0) = y_0$.

Proposition 37 (The Uniqueness Theorem) If $f$ and $\frac{\partial f}{\partial y}$ are continuous in an open domain $D$, then for any given pair $(x_0, y_0) \in D$ there exists a unique solution $y(x)$ of $y' = f(x, y)$ such that $y(x_0) = y_0$.

Definition: A differential equation of the form $y' = f(y)$ is said to be autonomous (i.e., $y'$ is determined by $y$ alone).

Note that the differential equation gives us the slope of the solution curves at all points of the region $D$. Thus, in particular, the solution curves cross the curve $f(x, y) = k$ ($k$ a constant) with slope $k$. This curve is called the isocline of slope $k$. If we draw the set of isoclines obtained by taking different real values of $k$, we can schematically sketch the family of solution curves in the $(x, y)$-plane.

There is no general method of solving differential equations. However, in some cases the solution can be easily obtained.

The variables separable case. If $y' = f(x)g(y)$, then $\frac{dy}{g(y)} = f(x)\,dx$. By integrating both parts of the latter equation, we will get a solution.

Example 15.1.1 Solve Cauchy's problem $(x^2 + 1)y' + 2xy^2 = 0$, given $y(0) = 1$. If we rearrange the terms, this equation becomes equivalent to
$$\frac{dy}{y^2} = -\frac{2x\,dx}{x^2 + 1}.$$
The integration of both parts yields
$$\frac{1}{y} = \ln(x^2 + 1) + C,$$
or
$$y(x) = \frac{1}{\ln(x^2 + 1) + C}.$$
Since we should satisfy the condition $y(0) = 1$, we can evaluate the constant $C$: $C = 1$.
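The computation can be double-checked symbolically; a minimal SymPy sketch (a verification aid, not part of the original notes):

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
ode = sp.Eq((x**2 + 1) * y(x).diff(x) + 2*x*y(x)**2, 0)
print(sp.dsolve(ode, y(x), ics={y(0): 1}))
# Eq(y(x), 1/(log(x**2 + 1) + 1))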

Differential Equations with Homogeneous Coefficients.

Definition 15.1.1 A function $f(x, y)$ is called homogeneous of degree $n$ if, for any $\lambda$, $f(\lambda x, \lambda y) = \lambda^n f(x, y)$.

If we have the differential equation M(x, y)dx + N(x, y)dy = 0 and M(x, y),N(x, y) are homogeneous of the same degree, then the change of variable y = tx reduces this equation to the variables separable case.

Example 15.1.2 Solve $(y^2 - 2xy)dx + x^2 dy = 0$. The coefficients are homogeneous of degree 2. Note that $y \equiv 0$ is a special solution. If $y \neq 0$, let us change the variable: $y = tx$, $dy = t\,dx + x\,dt$. Thus
$$x^2(t^2 - 2t)dx + x^2(t\,dx + x\,dt) = 0.$$
Dividing by $x^2$ (since $y \neq 0$, $x \neq 0$ as well) and rearranging the terms, we get the variables-separable equation
$$\frac{dx}{x} = -\frac{dt}{t^2 - t}.$$
Taking the integrals of both parts,
$$\int \frac{dx}{x} = \ln|x|, \quad -\int \frac{dt}{t^2 - t} = \int\left(\frac{1}{t} - \frac{1}{t-1}\right)dt = \ln|t| - \ln|t-1| + C,$$
therefore
$$x = \frac{Ct}{t-1}, \quad \text{or, in terms of } y, \quad x = \frac{Cy}{y-x}.$$
We can single out $y$ as a function of $x$: $y = \frac{x^2}{x - C}$.

Exact differential equations. If we have the differential equation $M(x, y)dx + N(x, y)dy = 0$ and $\frac{\partial M(x,y)}{\partial y} \equiv \frac{\partial N(x,y)}{\partial x}$, then all the solutions of this equation are given by $F(x, y) = C$, where $C$ is a constant and $F(x, y)$ is such that $\frac{\partial F}{\partial x} = M(x, y)$, $\frac{\partial F}{\partial y} = N(x, y)$.

Example 15.1.3 Solve $\frac{y}{x}dx + (y^3 + \ln x)dy = 0$. Since $\frac{\partial}{\partial y}\left(\frac{y}{x}\right) = \frac{1}{x} = \frac{\partial}{\partial x}(y^3 + \ln x)$, we have checked that we are dealing with an exact differential equation. Therefore, we need to find $F(x, y)$ such that
$$F'_x = \frac{y}{x} \;\Rightarrow\; F(x, y) = \int \frac{y}{x}dx = y\ln x + \varphi(y)$$
and
$$F'_y = y^3 + \ln x.$$
To find $\varphi(y)$, note that
$$F'_y = \ln x + \varphi'(y) = y^3 + \ln x \;\Rightarrow\; \varphi'(y) = y^3 \;\Rightarrow\; \varphi(y) = \int y^3 dy = \frac{1}{4}y^4 + C.$$
Therefore, all solutions are given by the implicit function $4y\ln x + y^4 = C$.

Linear differential equations of the first order. If we need to solve $y' + p(x)y = q(x)$, then all the solutions are given by the formula
$$y(x) = e^{-\int p(x)dx}\left(C + \int q(x)e^{\int p(x)dx}dx\right).$$

Example 15.1.4 Solve $xy' - 2y = 2x^4$. In other words, we need to solve $y' - \frac{2}{x}y = 2x^3$. Therefore,
$$y(x) = e^{\int \frac{2}{x}dx}\left(C + \int 2x^3 e^{-\int \frac{2}{x}dx}dx\right) = e^{\ln x^2}\left(C + \int 2x^3 e^{\ln x^{-2}}dx\right) = x^2(C + x^2).$$

15.2 Linear Differential Equations of a Higher Order with Constant Coefficients

These are equations of the form
$$y^{(n)} + a_1 y^{(n-1)} + \ldots + a_{n-1}y' + a_n y = f(x).$$
If $f(x) \equiv 0$ the equation is called homogeneous; otherwise it is called non-homogeneous. Here we summarize a procedure for finding the general solution $y_g(x)$ of the homogeneous equation.

The general solution is a sum of basic solutions $y_1, \ldots, y_n$:
$$y_g(x) = C_1 y_1(x) + \ldots + C_n y_n(x),$$
where $C_1, \ldots, C_n$ are arbitrary constants. These arbitrary constants, however, can be defined in a unique way once we set up the initial value (Cauchy) problem: find $y(x)$ such that
$$y(x) = y_{0_0}, \quad y'(x) = y_{0_1}, \quad \ldots, \quad y^{(n-1)}(x) = y_{0_{n-1}} \quad \text{at } x = x_0,$$
where $x_0, y_{0_0}, y_{0_1}, \ldots, y_{0_{n-1}}$ are given initial values (real numbers). To find all basic solutions, we proceed as follows:

1. Solve the characteristic equation
$$\lambda^n + a_1\lambda^{n-1} + \ldots + a_{n-1}\lambda + a_n = 0$$
for $\lambda$. The roots of this equation are $\lambda_1, \ldots, \lambda_n$. Some of these roots may be complex numbers. We may also have repeated roots.

2. If $\lambda_i$ is a simple real root, the basic solution corresponding to this root is $y_i(x) = e^{\lambda_i x}$.

3. If $\lambda_i$ is a repeated real root of degree $k$, it generates $k$ basic solutions
$$y_{i_1}(x) = e^{\lambda_i x}, \quad y_{i_2}(x) = xe^{\lambda_i x}, \quad \ldots, \quad y_{i_k}(x) = x^{k-1}e^{\lambda_i x}.$$

4. If $\lambda_j$ is a complex root, $\lambda_j = \alpha_j + i\beta_j$, $i = \sqrt{-1}$, then the complex conjugate of $\lambda_j$, say $\lambda_{j+1} = \alpha_j - i\beta_j$, is also a root of the characteristic equation. Therefore the pair $\lambda_j, \lambda_{j+1}$ gives rise to two basic solutions:
$$y_{j_1} = e^{\alpha_j x}\cos\beta_j x, \quad y_{j_2} = e^{\alpha_j x}\sin\beta_j x.$$

5. If $\lambda_j$ is a repeated complex root of degree $l$, $\lambda_j = \alpha_j + i\beta_j$, then the complex conjugate of $\lambda_j$, say $\lambda_{j+1} = \alpha_j - i\beta_j$, is also a repeated root of degree $l$. These $2l$ roots generate $2l$ basic solutions
$$y_{j_1} = e^{\alpha_j x}\cos\beta_j x, \quad y_{j_2} = xe^{\alpha_j x}\cos\beta_j x, \quad \ldots, \quad y_{j_l} = x^{l-1}e^{\alpha_j x}\cos\beta_j x,$$
$$y_{j_{l+1}} = e^{\alpha_j x}\sin\beta_j x, \quad y_{j_{l+2}} = xe^{\alpha_j x}\sin\beta_j x, \quad \ldots, \quad y_{j_{2l}} = x^{l-1}e^{\alpha_j x}\sin\beta_j x.$$

Next, we give a procedure for finding the solution of the non-homogeneous equation.

The general solution of the nonhomogeneous equation is $y_{nh}(x) = y_g(x) + y_p(x)$, where $y_g(x)$ is the general solution of the homogeneous equation and $y_p(x)$ is a particular solution of the nonhomogeneous equation, i.e., any function which solves the nonhomogeneous equation.

There are several ways to find a particular solution of the non-homogeneous equation.

1. If $f(x) = P_k(x)e^{bx}$, where $P_k(x)$ is a polynomial of degree $k$, then a particular solution is
$$y_p(x) = x^s Q_k(x)e^{bx},$$
where $Q_k(x)$ is a polynomial of the same degree $k$. If $b$ is not a root of the characteristic equation, $s = 0$; if $b$ is a root of the characteristic polynomial of multiplicity $m$, then $s = m$.

2. If $f(x) = P_k(x)e^{px}\cos qx + Q_k(x)e^{px}\sin qx$, where $P_k(x), Q_k(x)$ are polynomials of degree $k$, then a particular solution can be found in the form
$$y_p(x) = x^s R_k(x)e^{px}\cos qx + x^s T_k(x)e^{px}\sin qx,$$
where $R_k(x), T_k(x)$ are polynomials of degree $k$. If $p + iq$ is not a root of the characteristic equation, $s = 0$; if $p + iq$ is a root of the characteristic polynomial of multiplicity $m$, then $s = m$.

3. The general method for finding a particular solution of a nonhomogeneous equation is called the variation of parameters (the method of varying arbitrary constants).

Suppose we have the general solution of the homogeneous equation
$$y_g = C_1 y_1(x) + \ldots + C_n y_n(x),$$
where the $y_i(x)$ are basic solutions. Treating the constants $C_1, \ldots, C_n$ as functions of $x$, say $u_1(x), \ldots, u_n(x)$, we can express a particular solution of the nonhomogeneous equation as
$$y_p(x) = u_1(x)y_1(x) + \ldots + u_n(x)y_n(x),$$
where $u_1(x), \ldots, u_n(x)$ are solutions of the system
$$u_1'(x)y_1(x) + \ldots + u_n'(x)y_n(x) = 0,$$
$$u_1'(x)y_1'(x) + \ldots + u_n'(x)y_n'(x) = 0,$$
$$\vdots$$
$$u_1'(x)y_1^{(n-2)}(x) + \ldots + u_n'(x)y_n^{(n-2)}(x) = 0,$$
$$u_1'(x)y_1^{(n-1)}(x) + \ldots + u_n'(x)y_n^{(n-1)}(x) = f(x).$$

4. If $f(x) = f_1(x) + f_2(x) + \ldots + f_r(x)$ and $y_{p1}(x), \ldots, y_{pr}(x)$ are particular solutions corresponding to $f_1(x), \ldots, f_r(x)$ respectively, then
$$y_p(x) = y_{p1}(x) + \ldots + y_{pr}(x).$$

Example 15.2.1 Solve $y'' - 5y' + 6y = x^2 + e^x - 5$.

The characteristic roots are $\lambda_1 = 2$, $\lambda_2 = 3$; therefore the general solution of the homogeneous equation is
$$y_g(x) = C_1 e^{2x} + C_2 e^{3x}.$$
We search for a particular solution in the form
$$y_p(x) = ax^2 + bx + c + de^x.$$
To find the coefficients $a, b, c, d$, let us substitute the particular solution into the initial equation:
$$2a + de^x - 5(2ax + b + de^x) + 6(ax^2 + bx + c + de^x) = x^2 - 5 + e^x.$$
Equating the coefficients on the left-hand side to those on the right-hand side term by term, we obtain
$$6a = 1, \quad -5 \cdot 2a + 6b = 0, \quad 2a - 5b + 6c = -5, \quad d - 5d + 6d = 1.$$
Thus $d = 1/2$, $a = 1/6$, $b = 5/18$, $c = -71/108$. Finally, the general solution of the nonhomogeneous equation is
$$y_{nh}(x) = C_1 e^{2x} + C_2 e^{3x} + \frac{x^2}{6} + \frac{5x}{18} - \frac{71}{108} + \frac{e^x}{2}.$$
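The same answer is produced by SymPy's general solver (a verification sketch, not part of the original notes):

import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')
ode = sp.Eq(y(x).diff(x, 2) - 5*y(x).diff(x) + 6*y(x), x**2 + sp.exp(x) - 5)
print(sp.dsolve(ode, y(x)))
# Eq(y(x), C1*exp(2*x) + C2*exp(3*x) + x**2/6 + 5*x/18 - 71/108 + exp(x)/2)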

Example 15.2.2 Solve $y'' - 2y' + y = \frac{e^x}{x}$.

The general solution of the homogeneous equation is $y_g(x) = C_1 xe^x + C_2 e^x$ (the characteristic equation has the repeated root $\lambda = 1$ of degree 2). We look for a particular solution of the nonhomogeneous equation in the form
$$y_p(x) = u_1(x)xe^x + u_2(x)e^x.$$
We have
$$u_1'(x)xe^x + u_2'(x)e^x = 0,$$
$$u_1'(x)(e^x + xe^x) + u_2'(x)e^x = \frac{e^x}{x}.$$
Therefore $u_1'(x) = \frac{1}{x}$, $u_2'(x) = -1$. Integrating, we find $u_1(x) = \ln|x|$, $u_2(x) = -x$. Thus $y_p(x) = xe^x\ln|x| - xe^x$ and, absorbing the term $-xe^x$ into $C_1 xe^x$,
$$y_{nh}(x) = y_g(x) + y_p(x) = xe^x\ln|x| + C_1 xe^x + C_2 e^x.$$

15.3 Systems of the First Order Linear Differential Equations

The general form of a system of the first order differential equations is:

$$\dot{x}(t) = A(t)x(t) + b(t), \quad x(0) = x_0,$$
where $t$ is the independent variable ("time"), $x(t) = (x_1(t), \ldots, x_n(t))'$ is a vector of dependent variables, $A(t) = (a_{ij}(t))_{n \times n}$ is a real $n \times n$ matrix with variable coefficients, and $b(t) = (b_1(t), \ldots, b_n(t))'$ is a time-varying $n$-vector. In what follows, we concentrate on the case where $A$ and $b$ do not depend on $t$, which is the case of systems of differential equations with constant coefficients:
$$\dot{x}(t) = Ax(t) + b, \quad x(0) = x_0. \qquad (15.1)$$
We also assume that $A$ is non-singular. The solution of system (15.1) can be determined in two steps. First, we consider the homogeneous system (corresponding to $b = 0$):
$$\dot{x}(t) = Ax(t), \quad x(0) = x_0. \qquad (15.2)$$
The solution $x_c(t)$ of this system is called the complementary solution.

Second, we find a particular solution $x_p$ of the system (15.1), which is called the particular integral. The constant vector $x_p$ is simply the solution of $Ax_p = -b$, i.e., $x_p = -A^{-1}b$.

Given $x_c(t)$ and $x_p$, the general solution of the system (15.1) is simply
$$x(t) = x_c(t) + x_p.$$

We can find the solution to the homogeneous system (15.2) in two different ways. First, we can eliminate $n - 1$ unknowns and reduce the system to one linear differential equation of order $n$.

Example 15.3.1 Let
$$\dot{x} = 2x + y,$$
$$\dot{y} = 3x + 4y.$$
Taking the derivative of the first equation and eliminating $y$ and $\dot{y}$ ($\dot{y} = 3x + 4y = 3x + 4\dot{x} - 4 \cdot 2x$), we arrive at the second-order homogeneous linear differential equation
$$\ddot{x} - 6\dot{x} + 5x = 0,$$
which has the general solution $x(t) = C_1 e^t + C_2 e^{5t}$. Since $y(t) = \dot{x} - 2x$, we find that $y(t) = -C_1 e^t + 3C_2 e^{5t}$.

The second way is to write the solution to (15.2) as

$$x(t) = e^{At}x_0,$$
where (by definition)
$$e^{At} = I + At + \frac{A^2t^2}{2!} + \ldots$$
Unfortunately, this formula is of little practical use. In order to find a workable formula for $e^{At}$, we distinguish three main cases.

Case 1: $A$ has real and distinct eigenvalues. The fact that the eigenvalues are real and distinct implies that the corresponding eigenvectors are linearly independent. Consequently, $A$ is diagonalizable, i.e.,
$$A = P\Lambda P^{-1},$$
where $P = [v_1, v_2, \ldots, v_n]$ is a matrix composed of the eigenvectors of $A$ and $\Lambda$ is a diagonal matrix whose diagonal elements are the eigenvalues of $A$. Therefore,
$$e^{At} = Pe^{\Lambda t}P^{-1}.$$
Thus, the solution of the system (15.2) can be written as
$$x(t) = Pe^{\Lambda t}P^{-1}x_0 = Pe^{\Lambda t}c = c_1 v_1 e^{\lambda_1 t} + \ldots + c_n v_n e^{\lambda_n t},$$
where $c = (c_1, c_2, \ldots, c_n)'$ is a vector of constants determined from the initial conditions ($c = P^{-1}x_0$).

Case 2: $A$ has real and repeated eigenvalues. First, consider the simpler case when $A$ has only one eigenvalue $\lambda$, which is repeated $m$ times ($m$ is called the algebraic multiplicity of $\lambda$). Generally, in this situation the maximum number of linearly independent eigenvectors corresponding to $\lambda$ is less than $m$, meaning that we cannot construct the matrix $P$ of linearly independent eigenvectors and, therefore, $A$ is not diagonalizable. The solution in this case has the form

$$x(t) = \sum_{i=1}^m c_i h_i(t),$$
where the $h_i(t)$ are quasipolynomials and the $c_i$ are constants determined by the initial conditions. For example, if $m = 3$, we have:
$$h_1(t) = e^{\lambda t}v_1,$$
$$h_2(t) = e^{\lambda t}(tv_1 + v_2),$$
$$h_3(t) = e^{\lambda t}\left(\frac{t^2}{2}v_1 + tv_2 + v_3\right),$$
where $v_1, v_2, v_3$ are determined by the conditions
$$(A - \lambda I)v_i = v_{i-1}, \quad v_0 = 0.$$
If $A$ happens to have several eigenvalues which are repeated, the solution of (15.2) is obtained by finding the solution corresponding to each eigenvalue and then adding up the individual solutions.

Case 3: $A$ has complex eigenvalues.

Since $A$ is a real matrix, complex eigenvalues always appear in pairs: if $A$ has the eigenvalue $\alpha + \beta i$, then it also must have the eigenvalue $\alpha - \beta i$.

Now consider the simpler case when $A$ has only one pair of complex eigenvalues, $\lambda_1 = \alpha + \beta i$ and $\lambda_2 = \alpha - \beta i$. Let $v_1$ and $v_2$ denote the eigenvectors corresponding to $\lambda_1$ and $\lambda_2$; then $v_2 = \bar{v}_1$, where $\bar{v}_1$ is the conjugate of $v_1$. The solution can be expressed as
$$x(t) = e^{At}x_0 = Pe^{\Lambda t}P^{-1}x_0 = Pe^{\Lambda t}c$$
$$= c_1 v_1 e^{(\alpha+\beta i)t} + c_2 v_2 e^{(\alpha-\beta i)t}$$
$$= c_1 v_1 e^{\alpha t}(\cos\beta t + i\sin\beta t) + c_2 v_2 e^{\alpha t}(\cos\beta t - i\sin\beta t)$$
$$= (c_1 v_1 + c_2 v_2)e^{\alpha t}\cos\beta t + i(c_1 v_1 - c_2 v_2)e^{\alpha t}\sin\beta t$$
$$= h_1 e^{\alpha t}\cos\beta t + h_2 e^{\alpha t}\sin\beta t,$$
where $h_1 = c_1 v_1 + c_2 v_2$ and $h_2 = i(c_1 v_1 - c_2 v_2)$ are real vectors. Note that if $A$ has more pairs of complex eigenvalues, then the solution of (15.2) is obtained by finding the solution corresponding to each pair and then adding up the individual solutions. The same remark holds true when we encounter any combination of the three cases discussed above.

Example 15.3.2 Consider
$$A = \begin{pmatrix} 5 & 2 \\ -4 & -1 \end{pmatrix}.$$
Solving the characteristic equation $\det(A - \lambda I) = \lambda^2 - 4\lambda + 3 = 0$, we find the eigenvalues of $A$: $\lambda_1 = 1$, $\lambda_2 = 3$.

From the condition $(A - \lambda_1 I)v_1 = 0$, we find an eigenvector corresponding to $\lambda_1$: $v_1 = (-1, 2)'$. Similarly, the condition $(A - \lambda_2 I)v_2 = 0$ gives an eigenvector corresponding to $\lambda_2$: $v_2 = (-1, 1)'$. Therefore, the general solution of the system $\dot{x}(t) = Ax(t)$ is
$$x(t) = c_1 v_1 e^{\lambda_1 t} + c_2 v_2 e^{\lambda_2 t},$$
or
$$x(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} = \begin{pmatrix} -c_1 e^t - c_2 e^{3t} \\ 2c_1 e^t + c_2 e^{3t} \end{pmatrix}.$$
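The eigenvalues and eigenvectors are easy to confirm numerically; note that numpy.linalg.eig normalizes eigenvectors to unit length, so they agree with $(-1, 2)'$ and $(-1, 1)'$ only up to a scalar factor (a sketch assuming NumPy; not part of the original notes):

import numpy as np

A = np.array([[5.0, 2.0],
              [-4.0, -1.0]])
eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)    # 1 and 3 (in some order)
print(eigvecs)    # columns proportional to (-1, 2)' and (-1, 1)'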

Example 15.3.3 Consider
$$A = \begin{pmatrix} 3 & 1 \\ -1 & 1 \end{pmatrix}.$$
$\det(A - \lambda I) = \lambda^2 - 4\lambda + 4 = 0$ gives the eigenvalue $\lambda = 2$, repeated twice.

From the condition $(A - \lambda I)v_1 = 0$, we find $v_1 = (-1, 1)'$, and from $(A - \lambda I)v_2 = v_1$ we obtain $v_2 = (1, -2)'$. The solution of the system $\dot{x} = Ax(t)$ is
$$x(t) = c_1 h_1(t) + c_2 h_2(t) = c_1 v_1 e^{\lambda t} + c_2(tv_1 + v_2)e^{\lambda t},$$
or
$$x(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix} = \begin{pmatrix} -c_1 + c_2 - c_2 t \\ c_1 - 2c_2 + c_2 t \end{pmatrix}e^{2t}.$$

Example 15.3.4 Consider 3 2 A =   −1 1 2 The equation det(A − λI) = λ − 4λ + 5 = 0 leads to the complex eigenvalues λ1 =

2 + i, λ2 = 2 − i.

For λ1 we set the equation (A − λ1I)v1 = 0 and obtain:     − 1 + i 1 i v1 = v2 =v ¯1 = −1 −1 The solution of the systemx ˙ = Ax(t) is

λ1t λ2t x(t) = c1v1e + c2v2e

(2+i)t (2−i)t = c1v1e + c2(tv1 + v2)e

(2+i)t (2−i)t = c1v1e + c2(tv1 + v2)e

2t 2t = c1v1e (cos t + i sin t) + c2v2e (cos t − i sin t)t

188 Performing the computations, we finally obtain:     cos t − sin t cos t + sin t 2t   2t   x(t) = (c1 + c2)e + i(c1 − c2)e − cos t − sin t Now let the vector b be different from 0. To specify the solution to the system (11), we

−1 need a particular solution xp. This is found from the conditionx ˙ = 0; thus, xp = −A b. Therefore, the general solution of the non-homogeneous system is

Λt −1 −1 x(t) = xc(t) + xp = P e P c − A b

where the vector of constants c is determined by the initial condition x(0) = x0 as c = −1 x0 + A b.

Example 15.3.5 Consider
$$A = \begin{pmatrix} 5 & 2 \\ -4 & -1 \end{pmatrix}, \quad b = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \quad x_0 = \begin{pmatrix} -1 \\ 2 \end{pmatrix}.$$
In Example (15.3.2) we obtained
$$x_c(t) = \begin{pmatrix} -c_1 e^t - c_2 e^{3t} \\ 2c_1 e^t + c_2 e^{3t} \end{pmatrix}.$$
In addition, we have
$$c = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = x_0 + A^{-1}b = \begin{pmatrix} -1 \\ 2 \end{pmatrix} + \begin{pmatrix} -5/3 \\ 14/3 \end{pmatrix} = \begin{pmatrix} -8/3 \\ 20/3 \end{pmatrix}$$
and
$$-x_p = A^{-1}b = \frac{1}{3}\begin{pmatrix} -1 & -2 \\ 4 & 5 \end{pmatrix}\begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} -5/3 \\ 14/3 \end{pmatrix}.$$
The solution of the system $\dot{x}(t) = Ax(t) + b$ is
$$x(t) = x_c(t) + x_p = \begin{pmatrix} \frac{8}{3}e^t - \frac{20}{3}e^{3t} + \frac{5}{3} \\ -\frac{16}{3}e^t + \frac{20}{3}e^{3t} - \frac{14}{3} \end{pmatrix}.$$

15.4 Economic Application: General Equilibrium

Consider a market for n goods x = (x_1, x_2, . . . , x_n)′. Let p = (p_1, p_2, . . . , p_n)′ denote the vector of prices and D_i(p), S_i(p) the demand and supply for good i, i = 1, . . . , n. The general equilibrium is obtained for a price vector p* which satisfies the conditions D_i(p*) = S_i(p*) for all i = 1, . . . , n. The dynamics of price adjustment toward the general equilibrium can be described by a system of first-order differential equations of the form:

ṗ = W[D(p) − S(p)] = WAp,   p(0) = p_0

where W = diag(w_i) is a diagonal matrix with the speeds of adjustment w_i, i = 1, . . . , n, on the diagonal; D(p) = [D_1(p), . . . , D_n(p)]′ and S(p) = [S_1(p), . . . , S_n(p)]′ are linear functions of p; and A is a constant n × n matrix. We can solve this system by applying the methods discussed above. For instance, if WA = PΛP^{−1}, then the solution is

p(t) = P e^{Λt} P^{−1} p_0

Economics Application 12 (The Dynamic IS-LM Model)

A simplified dynamic IS-LM model can be specified by the following system of differential equations:

Ẏ = w_1(I − S)

ṙ = w_2[L(Y, r) − M]

with initial conditions

Y(0) = Y_0,   r(0) = r_0

where:
Y is national income;
r is the interest rate;
I = I_0 − αr is the investment function (α is a positive constant);
S = s(Y − T) + (T − G) is national savings (T, G, and s represent taxes, government purchases, and the savings rate);
L(Y, r) = a_1Y − a_2r is money demand (a_1 and a_2 are positive constants);
M is the money supply;
w_1, w_2 are speeds of adjustment.

For simplicity, consider w_1 = 1 and w_2 = 1. The original system can be re-written as:

[ Ẏ ]  =  [ −s   −α  ] [ Y ]  +  [ I_0 − (1 − s)T + G ]
[ ṙ ]     [ a_1  −a_2 ] [ r ]     [ −M ]

Assume that A = [ −s  −α; a_1  −a_2 ] is diagonalizable, i.e. A = PΛP^{−1}, and has distinct eigenvalues λ_1, λ_2 with corresponding eigenvectors v_1, v_2 (λ_1, λ_2 are the solutions of the equation λ² + (s + a_2)λ + sa_2 + αa_1 = 0). The solution becomes:

[ Y(t) ]  =  P e^{Λt} P^{−1}(x_0 + A^{−1}b) − A^{−1}b
[ r(t) ]

where

x_0 = [ Y_0 ],   b = [ I_0 − (1 − s)T + G ]
      [ r_0 ]        [ −M ]

We have

A^{−1} = (1/(αa_1 + sa_2)) [ −a_2   α ]
                           [ −a_1  −s ]

and

A^{−1}b = (1/(αa_1 + sa_2)) [ −a_2[I_0 − (1 − s)T + G] − αM ]
                            [ −a_1[I_0 − (1 − s)T + G] + sM ]

Assume that P^{−1} = [ ṽ_1′ ]; then we have
                     [ ṽ_2′ ]

P e^{Λt} P^{−1} = [v_1, v_2] [ e^{λ_1 t}   0       ] [ ṽ_1′ ]  =  v_1ṽ_1′ e^{λ_1 t} + v_2ṽ_2′ e^{λ_2 t}
                             [ 0        e^{λ_2 t} ] [ ṽ_2′ ]

Finally, the solution is:

[ Y(t) ]  =  [v_1ṽ_1′ e^{λ_1 t} + v_2ṽ_2′ e^{λ_2 t}] [ Y_0 + (−a_2S − αM)/(αa_1 + sa_2) ]  −  [ (−a_2S − αM)/(αa_1 + sa_2) ]
[ r(t) ]                                              [ r_0 + (−a_1S + sM)/(αa_1 + sa_2) ]     [ (−a_1S + sM)/(αa_1 + sa_2) ]

where S = I_0 − (1 − s)T + G.
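A minimal simulation sketch of this system follows; all parameter values (s, α, a_1, a_2, I_0, T, G, M and the initial condition) are hypothetical choices of ours. Since tr A < 0 and det A > 0 for positive parameters, the trajectory converges to the particular solution −A^{−1}b.

```python
# A small simulation of the dynamic IS-LM system with hypothetical parameters.
import numpy as np
from scipy.integrate import solve_ivp

s, alpha, a1, a2 = 0.3, 0.4, 0.5, 0.6      # made-up behavioral parameters
I0, T, G, M = 2.0, 1.0, 1.5, 1.0           # made-up policy variables

A = np.array([[-s, -alpha], [a1, -a2]])
b = np.array([I0 - (1 - s) * T + G, -M])

sol = solve_ivp(lambda t, x: A @ x + b, (0, 60), [5.0, 1.0])  # (Y0, r0) = (5, 1)

steady_state = -np.linalg.solve(A, b)      # the particular solution -A^{-1} b
print(sol.y[:, -1], steady_state)          # (Y(t), r(t)) approaches it
```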

Economics Application 13 (The Solow Model)

The Solow model is the basic model of economic growth. It relies on the following key differential equation, which describes the dynamics of capital per unit of effective labor:

k̇(t) = s f(k(t)) − (n + g + δ)k(t),   k(0) = k_0

where k denotes capital per unit of effective labor; s is the savings rate; n is the rate of population growth; g is the rate of technological progress; δ is the depreciation rate. Let us consider two types of production function.

Case 1. The production function is linear: f(k(t)) = a k(t), where a is a positive constant. The equation for k̇ becomes:

k̇(t) = (sa − n − g − δ)k(t)

This linear differential equation has the solution

k(t) = k(0)e^{(sa−n−g−δ)t} = k_0 e^{(sa−n−g−δ)t}

Case 2. The production function takes the Cobb-Douglas form: f(k(t)) = [k(t)]^α, where α ∈ [0, 1). In this case, the equation for k̇ is a Bernoulli nonlinear differential equation:

k̇(t) + (n + g + δ)k(t) = s[k(t)]^α,   k(0) = k_0

Dividing both sides of the latter equation by [k(t)]^α and denoting m(t) = [k(t)]^{1−α}, m(0) = (k_0)^{1−α}, we obtain:

(1/(1 − α)) ṁ(t) + (n + g + δ)m(t) = s

or

ṁ(t) + (1 − α)(n + g + δ)m(t) = s(1 − α)

The solution to this first-order linear differential equation is given by:

m(t) = e^{−∫(1−α)(n+g+δ)dt} [A + ∫ s(1 − α) e^{∫(1−α)(n+g+δ)dt} dt]
     = e^{−(1−α)(n+g+δ)t} [A + s(1 − α) ∫ e^{(1−α)(n+g+δ)t} dt]
     = A e^{−(1−α)(n+g+δ)t} + s/(n + g + δ)

where A is determined from the initial condition m(0) = (k_0)^{1−α}. It follows that A = (k_0)^{1−α} − s/(n + g + δ) and

m(t) = [(k_0)^{1−α} − s/(n + g + δ)] e^{−(1−α)(n+g+δ)t} + s/(n + g + δ)

k(t) = {[(k_0)^{1−α} − s/(n + g + δ)] e^{−(1−α)(n+g+δ)t} + s/(n + g + δ)}^{1/(1−α)}
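The closed form can be checked numerically; the sketch below (with hypothetical parameter values) integrates the Bernoulli equation directly and compares the result with k(t) above.

```python
# A numerical sanity check of the Cobb-Douglas case: integrate the Bernoulli
# equation k' = s k^alpha - (n + g + delta) k and compare the result with the
# closed form for k(t) above. All parameter values here are hypothetical.
import numpy as np
from scipy.integrate import solve_ivp

s, n, g, delta, alpha, k0 = 0.2, 0.01, 0.02, 0.05, 0.33, 1.0
lam = n + g + delta

def k_closed(t):
    m = (k0**(1 - alpha) - s / lam) * np.exp(-(1 - alpha) * lam * t) + s / lam
    return m ** (1 / (1 - alpha))

sol = solve_ivp(lambda t, k: [s * k[0]**alpha - lam * k[0]], (0, 100), [k0],
                rtol=1e-10, atol=1e-12)
print(sol.y[0, -1], k_closed(100.0))   # numerical and closed-form values agree
```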

Economics Application 14 (The Ramsey-Cass-Koopmans Model)

This model takes capital per effective labor k(t) and consumption per effective labor c(t) as endogenous variables. Their behavior is described by the following system of differential equations:

k̇(t) = f(k(t)) − c(t) − (n + g)k(t)

ċ(t) = c(t) · (f′(k(t)) − ρ − θg)/θ

with the initial conditions

k(0) = k_0,   c(0) = c_0,

where, in addition to the variables already specified in the previous application, ρ stands for the discount rate and θ is the coefficient of relative risk aversion. Assume again that f(k(t)) = ak(t), where a > 0 is a constant. The initial system reduces to a system of linear differential equations:

k̇(t) = (a − n − g)k(t) − c(t)

ċ(t) = c(t)(a − ρ − θg)/θ

The second equation implies that:

c(t) = c_0 e^{bt},   where b = (a − ρ − θg)/θ.

Substituting for c(t) in the first equation, we obtain:

k̇(t) = (a − n − g)k(t) − c_0 e^{bt}

The solution of this linear differential equation takes the form:

k(t) = k_c(t) + k_p(t)

where k_c and k_p are the complementary solution and the particular integral, respectively. The homogeneous equation k̇(t) = (a − n − g)k(t) gives the complementary solution:

k_c(t) = k_0 e^{(a−n−g)t}

The particular integral should be of the form k_p(t) = m e^{bt}; the value of m can be determined by substituting k_p into the non-homogeneous equation for k̇:

m b e^{bt} = (a − n − g) m e^{bt} − c_0 e^{bt}

It follows that

m = c_0/(a − n − g − b),   k_p(t) = [c_0/(a − n − g − b)] e^{bt}

and

k(t) = k_0 e^{(a−n−g)t} + [c_0/(a − n − g − b)] e^{bt}
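This result also admits a quick numerical check (hypothetical parameters; note that for the comparison the constant of the complementary solution is pinned down by the initial condition, so it equals k_0 − c_0/(a − n − g − b) rather than k_0 itself):

```python
# Numerical check of the linear Ramsey case with hypothetical parameters.
import numpy as np
from scipy.integrate import solve_ivp

a, n, g, rho, theta = 0.08, 0.01, 0.02, 0.03, 2.0
k0, c0 = 1.0, 0.05
b = (a - rho - theta * g) / theta          # growth rate of consumption

sol = solve_ivp(lambda t, x: [(a - n - g) * x[0] - x[1], b * x[1]],
                (0, 10), [k0, c0], rtol=1e-10, atol=1e-12)

m = c0 / (a - n - g - b)                   # coefficient of the particular integral
k_closed = (k0 - m) * np.exp((a - n - g) * 10) + m * np.exp(b * 10)
print(sol.y[0, -1], k_closed)              # the two values agree
```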

15.5 Simultaneous Differential Equations. Types of Equilibria

Let x = x(t), y = y(t), where t is the independent variable, and consider a two-dimensional autonomous system of simultaneous differential equations:

dx/dt = f(x, y)
dy/dt = g(x, y)

The point (x*, y*) is called an equilibrium of this system if f(x*, y*) = g(x*, y*) = 0. Let J be the Jacobian matrix

J = [ ∂f/∂x  ∂f/∂y ]
    [ ∂g/∂x  ∂g/∂y ]

evaluated at (x*, y*), and let λ_1, λ_2 be the eigenvalues of this Jacobian. Then the equilibrium is

1. a stable (unstable) node if λ_1, λ_2 are real, distinct, and both negative (positive);

2. a saddle if the eigenvalues are real and of different signs, i.e. λ_1λ_2 < 0;

3. a stable (unstable) focus if λ_1, λ_2 are complex and Re(λ_1) < 0 (Re(λ_1) > 0);

4. a center or a focus if λ_1, λ_2 are complex and Re(λ_1) = 0;

5. a stable (unstable) improper node if λ_1, λ_2 are real, λ_1 = λ_2 < 0 (λ_1 = λ_2 > 0), and the Jacobian is not a diagonal matrix;

6. a stable (unstable) star node if λ_1, λ_2 are real, λ_1 = λ_2 < 0 (λ_1 = λ_2 > 0), and the Jacobian is a diagonal matrix.

[Figure: phase portraits illustrating the six types of equilibrium.]

Example 15.5.1 Find the equilibria of the system and classify them:

ẋ = 2xy − 4y − 8
ẏ = 4y² − x²

Solving the system

2xy − 4y − 8 = 0
4y² − x² = 0

for x, y, we find two equilibria, (−2, −1) and (4, 2). At (−2, −1) the Jacobian is

J = [ 2y   2x − 4 ]           =  [ −2  −8 ]
    [ −2x  8y     ](−2,−1)       [  4  −8 ]

The characteristic equation for J is λ² − tr(J)·λ + det(J) = 0.

(tr J)² − 4 det J < 0 ⇒ λ_{1,2} are complex.

tr(J) < 0 ⇒ λ_1 + λ_2 < 0 ⇒ Re λ_1 < 0 ⇒ the equilibrium (−2, −1) is a stable focus.

At (4, 2),

J = [  4   4 ]
    [ −8  16 ]

(tr J)² − 4 det J > 0 ⇒ λ_{1,2} are distinct and real.

tr(J) = λ_1 + λ_2 > 0, det(J) = λ_1λ_2 > 0 ⇒ λ_1 > 0, λ_2 > 0 ⇒ the equilibrium (4, 2) is an unstable node.
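The classification is easy to mechanize; the following sketch recomputes the types of the two equilibria from the eigenvalues of the Jacobian (it covers only the cases arising in this example):

```python
# A minimal sketch that classifies the two equilibria of Example 15.5.1
# from the eigenvalues of the Jacobian, following the list above.
import numpy as np

def jacobian(x, y):
    # J of the system x' = 2xy - 4y - 8, y' = 4y^2 - x^2
    return np.array([[2*y, 2*x - 4], [-2*x, 8*y]])

for (x, y) in [(-2, -1), (4, 2)]:
    lam = np.linalg.eigvals(jacobian(x, y))
    if np.iscomplex(lam).any():
        kind = "stable focus" if lam.real[0] < 0 else "unstable focus"
    else:
        kind = "saddle" if lam[0] * lam[1] < 0 else (
            "stable node" if lam.max() < 0 else "unstable node")
    print((x, y), lam, kind)   # (-2,-1): stable focus; (4,2): unstable node
```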

Chapter 16

Discrete Dynamics: Difference Equations

Let y denote a real-valued function defined over the set of natural numbers. Hereafter y_k stands for y(k), the value of y at k, where k = 0, 1, 2, . . .

Definition 16.0.1 The first difference of y evaluated at k is △y_k = y_{k+1} − y_k. The second difference of y evaluated at k is △²y_k = △(△y_k) = y_{k+2} − 2y_{k+1} + y_k. In general, the nth difference of y evaluated at k is defined as △^n y_k = △(△^{n−1} y_k), for any integer n > 1.

Definition 16.0.2 A difference equation in the unknown y is an equation relating y and any of its differences △y, △²y, . . . for each value of k, k = 0, 1, . . . Solving a difference equation means finding all functions y which satisfy the relation specified by the equation.

Example 16.0.2
2△y_k − y_k = 1 (or, equivalently, 2△y − y = 1)
△²y_k + 5y_k△y_k + (y_k)² = 2 (or △²y + 5y△y + (y)² = 2)

Note that any difference equation can be written as

f(y_{k+n}, y_{k+n−1}, . . . , y_{k+1}, y_k) = g(k),   n ∈ N

For example, the above two equations may also be expressed as follows:

2y_{k+1} − 3y_k = 1,   y_{k+2} − 2y_{k+1} + y_k + 5y_k y_{k+1} − 4(y_k)² = 2

Definition 56 A difference equation is linear if it takes the form

f_0(k)y_{k+n} + f_1(k)y_{k+n−1} + . . . + f_{n−1}(k)y_{k+1} + f_n(k)y_k = g(k),

where f_0, f_1, . . . , f_n, g are each functions of k (but not of any y_{k+i}, i ∈ N) defined for all k = 0, 1, 2, . . .

Definition 16.0.3 A linear difference equation is of order n if both f_0(k) and f_n(k) are different from zero for each k = 0, 1, 2, . . .

From now on, we focus on the problem of solving linear difference equations of order n with constant coefficients. The general form of such an equation is:

f_0y_{k+n} + f_1y_{k+n−1} + . . . + f_{n−1}y_{k+1} + f_ny_k = g_k,

where, this time, f_0, f_1, . . . , f_n are real constants and f_0, f_n are different from zero.

Dividing this equation by f_0 and denoting a_i = f_i/f_0 for i = 1, . . . , n and r_k = g_k/f_0, the general form of a linear difference equation of order n with constant coefficients becomes:

y_{k+n} + a_1y_{k+n−1} + . . . + a_{n−1}y_{k+1} + a_ny_k = r_k    (16.1)

where a_n is non-zero. The general procedure for solving a linear difference equation of order n with constant coefficients involves three steps:

• Step 1.

Consider the homogeneous equation

y_{k+n} + a_1y_{k+n−1} + . . . + a_{n−1}y_{k+1} + a_ny_k = 0,

and find its solution Y . (First we will describe and illustrate the solving procedure for equations of order 1 and 2. Following that, the theory for the case of order n is obtained as a natural extension of the theory for second-order equations.)

198 • Step 2.

Find a particular solution y∗ of the complete equation (16.1). The most useful technique for finding particular solutions is the method of undetermined coefficients which will be illustrated in the examples that follow.

• Step 3.

The general solution of equation (16.1) is:

y_k = Y + y*

16.1 First-order Linear Difference Equations

The general form of a first-order linear difference equation is:

y_{k+1} + ay_k = r_k.

The corresponding homogeneous equation is:

y_{k+1} + ay_k = 0, and has the solution Y = C(−a)^k, where C is an arbitrary constant. We now give the procedure for solving the first-order non-homogeneous linear difference equation.

First, consider the case when r_k is constant over time, i.e. r_k ≡ r for all k. Much as in the case of linear differential equations, the general solution of the non-homogeneous difference equation is the sum of the general solution of the homogeneous equation and a particular solution of the non-homogeneous equation. Therefore

y_k = A·(−a)^k + r/(1 + a),   a ≠ −1

y_k = A·(−a)^k + rk = A + rk,   a = −1

where A is an arbitrary constant.

If we have the initial condition y_k = y_0 when k = 0, the general solution of the non-homogeneous equation takes the form

y_k = (y_0 − r/(1 + a))·(−a)^k + r/(1 + a),   a ≠ −1

y_k = y_0 + rk,   a = −1

If r now depends on k, r = r_k, then the solution takes the form

y_k = (−a)^k y_0 + Σ_{i=0}^{k−1} (−a)^{k−1−i} r_i,   k = 1, 2, . . .

A particular solution to the non-homogeneous difference equation can be found using the method of undetermined coefficients.
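A minimal sketch checking this formula, with made-up values a = 0.5, y_0 = 1, and r_i = i:

```python
# Check of y_k = (-a)^k y0 + sum_{i<k} (-a)^{k-1-i} r_i against direct iteration,
# using the hypothetical values a = 0.5, y0 = 1 and r_i = i.
a, y0 = 0.5, 1.0
r = lambda i: float(i)

# Iterate the recursion y_{k+1} = -a*y_k + r_k directly
y_iter, K = y0, 6
for k in range(K):
    y_iter = -a * y_iter + r(k)

# Closed form at k = K
y_closed = (-a)**K * y0 + sum((-a)**(K - 1 - i) * r(i) for i in range(K))
print(y_iter, y_closed)   # the two values coincide
```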

Example 16.1.1 y_{k+1} − 3y_k = 2

The solution of the homogeneous equation y_{k+1} − 3y_k = 0 is Y = C·3^k. The right-hand term suggests looking for a constant function as a particular solution. Assuming y* = A and substituting into the initial equation, we obtain A = −1; hence y* = −1 and the general solution is y_k = Y + y* = C·3^k − 1.

Example 98 y_{k+1} − 3y_k = k² + k + 2

The homogeneous equation is the same as in the preceding example, but the right-hand term suggests finding a particular solution of the form:

y* = Ak² + Bk + D

Substituting y* into the non-homogeneous equation, we have:

A(k + 1)² + B(k + 1) + D − 3Ak² − 3Bk − 3D = k² + k + 2

or

−2Ak² + 2(A − B)k + A + B − 2D = k² + k + 2

In order to satisfy this equality for each k, we must have:

−2A = 1
2(A − B) = 1
A + B − 2D = 2

It follows that A = −1/2, B = −1, D = −7/4 and y* = −(1/2)k² − k − 7/4. The general solution is y_k = Y + y* = C·3^k − (1/2)k² − k − 7/4.

Example 16.1.2 y_{k+1} − 3y_k = 4e^k

This time we try a particular solution of the form y* = Ae^k. After substitution, we find A = 4/(e − 3). The general solution is y_k = Y + y* = C·3^k + 4e^k/(e − 3).

Two remarks:

The method of undetermined coefficients illustrated by these examples applies to the general case of order n as well. The trial solutions corresponding to some simple functions r_k are given in the following table:

Right-hand term r_k        Trial solution y*
k^n                        A_0 + A_1k + A_2k² + . . . + A_nk^n
sin ak or cos ak           A sin ak + B cos ak

If the function r is a combination of simple functions for which we know the trial solutions, then a trial solution for r can be obtained by combining the trial solutions for the simple functions. For example, if r_k = 5^k k³, then the trial solution would be y* = 5^k(A_0 + A_1k + A_2k² + A_3k³).

16.2 Second-Order Linear Difference Equations

The general form is:

y_{k+2} + a_1y_{k+1} + a_2y_k = r_k    (14)

The solution to the corresponding homogeneous equation

y_{k+2} + a_1y_{k+1} + a_2y_k = 0    (15)

depends on the solutions to the quadratic equation:

m² + a_1m + a_2 = 0

which is called the auxiliary (or characteristic) equation of equation (14). Let m_1, m_2 denote the roots of the characteristic equation. Then both m_1 and m_2 will be non-zero, because a_2 must be non-zero for (14) to be of second order.

Case 1. m_1 and m_2 are real and distinct.

The solution to (15) is Y = C_1m_1^k + C_2m_2^k, where C_1, C_2 are arbitrary constants.

Case 2. m_1 and m_2 are real and equal.

The solution to (15) is Y = C_1m_1^k + C_2km_1^k = (C_1 + C_2k)m_1^k.

Case 3. m_1 and m_2 are complex, with the polar forms r(cos θ ± i sin θ), r > 0, θ ∈ (−π, π].

The solution to (15) is Y = C_1r^k cos(kθ + C_2).

Any complex number a + bi admits a representation in the polar (or trigonometric) form r(cos θ ± i sin θ). The transformation is given by r = √(a² + b²); θ is the unique angle (in radians) such that θ ∈ (−π, π], cos θ = a/√(a² + b²), and sin θ = b/√(a² + b²).

Example 16.2.1 y_{k+2} − 5y_{k+1} + 6y_k = 0

The characteristic equation is m² − 5m + 6 = 0 and has the roots m_1 = 2, m_2 = 3. The difference equation has the solution y_k = C_1·2^k + C_2·3^k.

Example 16.2.2 y_{k+2} − 8y_{k+1} + 16y_k = 2^k + 3

The characteristic equation m² − 8m + 16 = 0 has the roots m_1 = m_2 = 4; therefore, the solution to the corresponding homogeneous equation is Y = (C_1 + C_2k)4^k. To find a particular solution, we consider the trial solution y* = A·2^k + B. By substituting y* into the non-homogeneous equation and performing the computations, we obtain A = 1/4, B = 1/3. Thus, the solution to the non-homogeneous equation is y_k = (C_1 + C_2k)4^k + (1/4)·2^k + 1/3.
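A quick numerical verification of this example (the constants C_1, C_2 below are chosen arbitrarily):

```python
# Check of Example 16.2.2: iterate the recursion and compare with the closed
# form y_k = (C1 + C2 k) 4^k + (1/4) 2^k + 1/3 for one choice of C1, C2.
C1, C2 = 1.0, -0.5          # arbitrary constants

def closed(k):
    return (C1 + C2 * k) * 4**k + 0.25 * 2**k + 1/3

# Start the recursion from the closed form's first two values
y = [closed(0), closed(1)]
for k in range(10):
    # y_{k+2} = 8 y_{k+1} - 16 y_k + 2^k + 3
    y.append(8 * y[k + 1] - 16 * y[k] + 2**k + 3)

print(y[10], closed(10))     # the two agree (up to floating-point error)
```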

Example 16.2.3 y_{k+2} + y_{k+1} + y_k = 0

The equation m² + m + 1 = 0 gives the roots m_1 = (−1 − i√3)/2, m_2 = (−1 + i√3)/2 or, written in polar form, m_{1,2} = cos(2π/3) ± i sin(2π/3). Therefore, the given difference equation has the solution y_k = C_1 cos(2πk/3 + C_2).

16.3 The General Case of Order n

In general, any difference equation of order n can be written as:

y_{k+n} + a_1y_{k+n−1} + . . . + a_{n−1}y_{k+1} + a_ny_k = r_k    (16.1)

The corresponding characteristic equation becomes

m^n + a_1m^{n−1} + . . . + a_{n−1}m + a_n = 0

and has exactly n roots, denoted m_1, . . . , m_n. The general solution to the homogeneous equation is a sum of terms produced by the roots m_1, . . . , m_n in the following way:

a real simple root m generates the term C_1m^k;

a real root m repeated p times generates the term

(C_1 + C_2k + C_3k² + . . . + C_pk^{p−1})m^k

each pair of simple complex conjugate roots r(cos θ ± i sin θ) generates the term

C_1r^k cos(kθ + C_2)

each pair of complex conjugate roots r(cos θ ± i sin θ) repeated p times generates the term

r^k[C_{1,1} cos(kθ + C_{1,2}) + C_{2,1}k cos(kθ + C_{2,2}) + . . . + C_{p,1}k^{p−1} cos(kθ + C_{p,2})]

The sum of terms will contain n arbitrary constants. A particular solution y∗ to the non-homogeneous equation again can be obtained by means of the method of undetermined coefficients (as was illustrated for the cases of first-order and second-order difference equations).

Example 103 y_{k+4} − 5y_{k+3} + 9y_{k+2} − 7y_{k+1} + 2y_k = 0

The characteristic equation

m⁴ − 5m³ + 9m² − 7m + 2 = 0

has the roots m_1 = 1, which is repeated 3 times, and m_2 = 2. The solution is y_k = (C_1 + C_2k + C_3k²) + C_4·2^k.

16.4 Economic Application: A Dynamic Model of Economic Growth

Example 16.4.1 (Simple and compound interest) Let S0 denote an initial sum of money. There are two basic methods for computing the interest earned in a period, for example, one year.

S0 earns simple interest at rate r if each period the interest equals a fraction r of S0.

If Sk denotes the sum accumulated after k periods, Sk is computed in the following way:

S_{k+1} = S_k + rS_0

This is a first-order difference equation, which has the solution S_k = S_0(1 + kr).

S_0 earns compound interest at rate r if each period the interest equals a fraction r of the sum accumulated at the beginning of that period. In this case, the computation formula is:

S_{k+1} = S_k + rS_k,   or   S_{k+1} = S_k(1 + r).

The solution to this homogeneous first-order difference equation is S_k = (1 + r)^k S_0.
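Both closed forms are immediate to tabulate; a small sketch with the hypothetical values S_0 = 100 and r = 5% follows:

```python
# The two closed forms side by side: S_k = S0(1 + k r) under simple interest
# and S_k = S0(1 + r)^k under compound interest (S0 = 100, r = 5% are made up).
S0, r = 100.0, 0.05

for k in (1, 10, 30):
    simple = S0 * (1 + k * r)
    compound = S0 * (1 + r) ** k
    print(k, round(simple, 2), round(compound, 2))  # compounding pulls ahead over time
```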

Example 16.4.2 Let Y_t, C_t, I_t denote national income, consumption, and investment in period t, respectively. A simple dynamic model of economic growth is described by the equations:

Y_t = C_t + I_t

C_t = c + mY_t

Y_{t+1} = Y_t + rI_t

where c is a positive constant; m is the marginal propensity to consume, 0 < m < 1; r is the growth factor. From the above system of three equations, we can express the dynamics of national income in the form of a first-order difference equation:

Y_{t+1} − Y_t = rI_t = r(Y_t − C_t) = rY_t − r(c + mY_t)

or

Y_{t+1} − [1 + r(1 − m)]Y_t = −rc

The solution to this equation is

Y_t = (Y_0 − c/(1 − m))[1 + r(1 − m)]^t + c/(1 − m)

The behavior of investment can be described in the following way:

I_{t+1} − I_t = (Y_{t+1} − C_{t+1}) − (Y_t − C_t) = rI_t − m(Y_{t+1} − Y_t) = rI_t − mrI_t

or

I_{t+1} = [1 + r(1 − m)]I_t

This homogeneous difference equation has the solution I_t = [1 + r(1 − m)]^t I_0.
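The income dynamics can be verified numerically; in the sketch below the values of c, m, r, and Y_0 are hypothetical:

```python
# Check of Example 16.4.2: iterate Y_{t+1} = [1 + r(1 - m)] Y_t - r c and
# compare with the closed form, using made-up values c = 1, m = 0.8, r = 0.1.
c, m, r, Y0 = 1.0, 0.8, 0.1, 10.0
rho = 1 + r * (1 - m)                  # growth factor of income

Y = Y0
for t in range(20):
    Y = rho * Y - r * c

Y_closed = (Y0 - c / (1 - m)) * rho**20 + c / (1 - m)
print(Y, Y_closed)                     # the two values coincide
```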

Chapter 17

Introduction to Dynamic Optimization

17.1 The First-Order Conditions

A typical dynamic optimization problem takes the following form:

max_{c(t)} V = ∫₀^T f[k(t), c(t), t] dt

s.t. k̇(t) = g[k(t), c(t), t]

k(0) = k_0 > 0

where [0, T] is the horizon over which the problem is considered (T can be finite or infinite);

V is the value of the objective function as seen from the initial moment t_0 = 0;

c(t) is the control variable, with respect to which the objective function V is maximized;

k(t) is the state variable, and the first constraint (called the equation of motion) describes the evolution of this state variable over time (k̇ denotes dk/dt).

As in static optimization, in order to solve the optimization problem we first need to find the first-order conditions. Here we give a procedure to derive the first-order conditions:

• Step 1.

Construct the Hamiltonian function

H = f(k, c, t) + λ(t)g(k, c, t)

where λ(t) is a Lagrange multiplier.

• Step 2.

Take the derivative of the Hamiltonian with respect to the control variable and set it to 0:

∂H/∂c = ∂f/∂c + λ·∂g/∂c = 0

• Step 3.

Take the derivative of the Hamiltonian with respect to the state variable and set it equal to the negative of the derivative of the Lagrange multiplier with respect to time:

∂H/∂k = ∂f/∂k + λ·∂g/∂k = −λ̇

• Step 4.

Transversality condition.

Case 1: Finite horizons.

λ(T)k(T) = 0

Case 2: Infinite horizons with f of the form f[k(t), c(t), t] = e^{−ρt}u[k(t), c(t)]:

lim_{t→∞} [λ(t)k(t)] = 0

Case 3: Infinite horizons with f in a form different from that specified in Case 2.

lim_{t→∞} H(t) = 0

It may happen that the dynamic optimization problem contains more than one control variable and more than one state variable. In that case we need an equation of motion for each state variable. To write the first-order conditions, the algorithm specified above should be modified in the following way:

206 • Step 1a.

The Hamiltonian includes the right-hand side of each equation of motion times the corresponding multiplier.

• Step 2a.

Applies for each control variable.

• Step 3a.

Applies for each state variable.
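To make Steps 1-3 concrete, here is a small symbolic sketch assuming the illustrative functional forms f = e^{−ρt} ln c and g = ak − c (these concrete forms are our own choice, not part of the general procedure):

```python
# A symbolic sketch of Steps 1-3 for a single state and control, assuming the
# illustrative (made-up) forms f = e^{-rho t} * ln c and g = a k - c.
import sympy as sp

t = sp.Symbol("t")
rho, a = sp.symbols("rho a", positive=True)
k = sp.Function("k")(t)
c = sp.Function("c")(t)
lam = sp.Function("lam")(t)

# Step 1: the Hamiltonian H = f + lambda * g
H = sp.exp(-rho * t) * sp.log(c) + lam * (a * k - c)

foc_c = sp.Eq(sp.diff(H, c), 0)                 # Step 2: dH/dc = 0
foc_k = sp.Eq(sp.diff(H, k), -sp.diff(lam, t))  # Step 3: dH/dk = -lambda-dot

print(foc_c)   # roughly: exp(-rho*t)/c(t) - lam(t) = 0
print(foc_k)   # roughly: a*lam(t) = -Derivative(lam(t), t)
```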

17.2 Present-Value and Current-Value Hamiltonians

In economic problems, the objective function is usually of the form

f[k(t), c(t), t] = e^{−ρt}u[k(t), c(t)],

where ρ is a constant discount rate and e^{−ρt} is a discount factor. The Hamiltonian

H = e^{−ρt}u(k, c) + λ(t)g(k, c, t)

is called the present-value Hamiltonian (since it represents a value at the time t0 = 0). Sometimes it is more convenient to work with the current-value Hamiltonian, which represents a value at the time t:

Ĥ = He^{ρt} = u(k, c) + µ(t)g(k, c, t)

where µ(t) = λ(t)e^{ρt}. The first-order conditions written in terms of the current-value Hamiltonian are a slight modification of the present-value versions:

∂Ĥ/∂c = 0,

∂Ĥ/∂k = ρµ − µ̇,

µ(T)k(T) = 0 (the transversality condition in the case of finite horizons).

17.3 Dynamic Problems with Inequality Constraints

Let us add an inequality constraint to the dynamic optimization problem considered above:

max_{c(t)} V = ∫₀^T f[k(t), c(t), t] dt

s.t. k̇(t) = g[k(t), c(t), t]

h(t, k, c) ≥ 0

k(0) = k0 > 0

The following algorithm allows us to write the first-order conditions with inequality constraints:

• Step 1. Construct the Hamiltonian

H = f(k, c, t) + µ(t)g(k, c, t) + w(t)h(k, c, t)

where µ(t) and w(t) are Lagrange multipliers.

• Step 2. ∂Ĥ/∂c = 0.

• Step 3. ∂Ĥ/∂k = ρµ − µ̇.

• Step 4. w(t) ≥ 0, w(t)h(k, c, t) = 0.

• Step 5. Transversality condition if T is finite:

µ(T )k(T ) = 0.

17.4 Economics Application: The Ramsey Model

This example illustrates how the tools of dynamic optimization can be applied to solve a model of economic growth, namely the Ramsey model (for more details on the economic illustrations presented in this section see, for instance, R. J. Barro and X. Sala-i-Martin, Economic Growth, McGraw-Hill, Inc., 1995).

The model assumes that the economy consists of households and firms, but here we further assume, for simplicity, that households carry out production directly. Each household chooses, at each moment t, the level of consumption that maximizes the present value of lifetime utility

U = ∫₀^∞ u[A(t)c(t)] e^{nt} e^{−ρt} dt

subject to the dynamic budget constraint

k̇(t) = f(k(t)) − c(t) − (x + n + δ)k(t)

where:
A(t) is the level of technology;
c denotes consumption per unit of effective labor (effective labor is defined as the product of the labor force L(t) and the level of technology A(t));
k denotes the amount of capital per unit of effective labor;
f(·) is the production function written in intensive form (f(k) = F(K/AL, 1));
n is the rate of population growth;
ρ is the rate at which households discount future utility;
x is the growth rate of technological progress;
δ is the depreciation rate.

The expression A(t)c(t) in the utility function represents consumption per person, and the term e^{nt} captures the effect of the growth in family size at rate n. The dynamic constraint says that the change in the stock of capital (expressed in units of effective labor) is given by the difference between the level of output (in units of effective labor) and the sum of consumption (per unit of effective labor), depreciation, and the additional effect (x + n)k(t) from expressing all variables in terms of effective labor. As can be observed, households face a dynamic optimization problem with c as a control variable and k as a state variable. The present-value Hamiltonian is

H = u(A(t)c(t)) e^{nt} e^{−ρt} + µ(t)[f(k(t)) − c(t) − (x + n + δ)k(t)]

and the first-order conditions state that

∂H/∂c = (∂u(Ac)/∂c) e^{nt} e^{−ρt} = µ

∂H/∂k = µ[f′(k) − (x + n + δ)] = −µ̇

In addition, the transversality condition requires:

lim_{t→∞} [µ(t)k(t)] = 0

Eliminating µ from the first-order conditions and assuming that the utility function takes the functional form u(Ac) = (Ac)^{1−θ}/(1 − θ) leads us to an expression for the growth rate of c:

ċ/c = (1/θ)[f′(k) − δ − ρ − θx]

This equation, together with the dynamic budget constraint, forms a system of differential equations which completely describes the dynamics of the Ramsey model. The boundary conditions are given by the initial condition k_0 and the transversality condition.

Economics Application 18 (A Model of Economic Growth with Physical and Human Capital)

This model illustrates the case when the optimization problem has two dynamic constraints and two control variables. As was the case in the previous example, we assume that households perform production directly. The model also assumes that physical capital K and human capital H enter the production function Y in a Cobb-Douglas manner:

Y = AK^α H^{1−α},

where A denotes a constant level of technology and 0 ≤ α ≤ 1. Output is divided among consumption C, investment in physical capital I_K, and investment in human capital I_H:

Y = C + I_K + I_H

The two capital stocks change according to the dynamic equations:

K̇ = I_K − δK

Ḣ = I_H − δH

where δ denotes the depreciation rate.

The households' problem is to choose C, I_K, I_H so as to maximize the present value of lifetime utility

U = ∫₀^∞ u(C) e^{−ρt} dt

subject to the above four constraints. We again assume that u(C) = C^{1−θ}/(1 − θ). By eliminating Y and I_K, the households' problem becomes

max U

s.t. K̇ = AK^α H^{1−α} − C − δK − I_H
     Ḣ = I_H − δH

and it contains two control variables (C, I_H) and two state variables (K, H). The Hamiltonian associated with this dynamic problem is:

H = u(C)e^{−ρt} + µ_1(AK^α H^{1−α} − C − δK − I_H) + µ_2(I_H − δH)

and the first-order conditions require

∂H/∂C = u′(C)e^{−ρt} − µ_1 = 0    (16)

∂H/∂I_H = −µ_1 + µ_2 = 0    (17)

∂H/∂K = µ_1(αAK^{α−1}H^{1−α} − δ) = −µ̇_1    (18)

∂H/∂H = µ_1(1 − α)AK^α H^{−α} − µ_2δ = −µ̇_2    (19)

Eliminating µ_1 from (16) and (18), we obtain a result for the growth rate of consumption:

Ċ/C = (1/θ)[αA(K/H)^{−(1−α)} − δ − ρ]

Since µ_1 = µ_2 (from (17)), conditions (18) and (19) imply

αA(K/H)^{−(1−α)} − δ = (1 − α)A(K/H)^α − δ

which enables us to find the ratio:

K/H = α/(1 − α)

After substituting this ratio into the result for Ċ/C, it turns out that consumption grows at a constant rate given by:

Ċ/C = (1/θ)[Aα^α(1 − α)^{1−α} − δ − ρ]

The substitution of K/H into the production function implies that Y is a linear function of K:

Y = AK((1 − α)/α)^{1−α}

(Therefore, such a model is also called an AK model.) The formulas for Y and the K/H ratio suggest that K, H, and Y grow at a constant rate. Moreover, it can be shown that this rate is the same as the growth rate of C found above.

Economics Application 19 (Investment-Selling Decisions)

Finally, let us exemplify the case of a dynamic problem with inequality constraints. Assume that a firm produces, at each moment t, a good which can either be sold or reinvested to expand the productive capacity y(t) of the firm. The firm's problem is to choose at each moment the proportion u(t) of the output y(t) to be reinvested so as to maximize total sales over the period [0, T]. Expressed in mathematical terms, the firm's problem is:

max ∫₀^T [1 − u(t)]y(t) dt

s.t. ẏ(t) = u(t)y(t)

y(0) = y_0 > 0,   0 ≤ u(t) ≤ 1

where u is the control and y is the state variable. The Hamiltonian takes the form:

H = (1 − u)y + λuy + w_1(1 − u) + w_2u

The optimal solution satisfies the conditions:

H_u = (λ − 1)y + w_2 − w_1 = 0

λ̇ = u − 1 − uλ

w_1 ≥ 0,   w_1(1 − u) = 0

w_2 ≥ 0,   w_2u = 0

λ(T) = 0

These conditions imply that

λ > 1, u = 1, λ̇ = −λ    (20)

or

λ < 1, u = 0, λ̇ = −1    (21)

In either case λ̇ < 0, so λ(t) is a decreasing function over [0, T]. Since λ(t) is also a continuous function, there exists t* such that (21) holds for any t in [t*, T]. Hence

u(t) = 0

λ(t) = T − t

y(t) = y(t*) for any t in [t*, T]. Here t* must satisfy λ(t*) = 1, which implies that t* = T − 1 if T ≥ 1; if T ≤ 1, we have t* = 0. If T > 1, then for any t in [0, T − 1], (20) must hold. Using the fact that y is continuous over [0, T] (and, therefore, continuous at t* = T − 1), we obtain

u(t) = 1

λ(t) = e^{T−t−1}

y(t) = y_0 e^t for any t in [0, T − 1].

In conclusion, if T > 1, then it is optimal for the firm to reinvest all the output until t = T − 1 and to sell all the output after that date. If T ≤ 1, then it is optimal to sell all the output.
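The bang-bang policy derived above is easy to simulate; the sketch below uses the hypothetical horizon T = 3 and checks that total sales approach y_0 e^{T−1}:

```python
# A small simulation of the bang-bang policy derived above: reinvest everything
# (u = 1) until t* = T - 1, then sell everything (u = 0); here T = 3, y0 = 1.
T, y0, dt = 3.0, 1.0, 1e-4
t_star = T - 1.0

t, y, sales = 0.0, y0, 0.0
while t < T:
    u = 1.0 if t < t_star else 0.0
    sales += (1 - u) * y * dt        # accumulate the objective: integral of (1-u) y
    y += u * y * dt                  # equation of motion y' = u y
    t += dt

print(sales)   # close to y0 * e^{T-1} = e^2 (about 7.389), the optimal value
```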
