Introduction to Mathematical Optimization

R. Clark Robinson

Copyright © 2013 by R. Clark Robinson
Department of Mathematics, Northwestern University, Evanston, Illinois 60208 USA

Contents

Preface

Chapter 1. Linear Programming
1.1. Basic Problem
1.2. Graphical Solution
1.3. Simplex Method
1.3.1. Slack Variables
1.3.2. Simplex Method for Resource Requirements
1.3.3. General Constraints
1.4. Duality
1.4.1. Duality for Non-standard Linear Programs
1.4.2. Duality Theorems
1.5. Sensitivity Analysis
1.5.1. Changes in Objective Function Coefficients
1.6. Theory for Simplex Method

Chapter 2. Unconstrained Extrema
2.1. Mathematical Background
2.1.1. Types of Subsets of $\mathbb{R}^n$
2.1.2. Continuous Functions
2.1.3. Existence of Extrema
2.1.4. Differentiation in Multi-Dimensions
2.1.5. Second Derivative and Taylor's Theorem
2.1.6. Quadratic Forms
2.2. Derivative Conditions
2.2.1. First Derivative Conditions
2.2.2. Second Derivative Conditions

Chapter 3. Constrained Extrema
3.1. Implicit Function Theorem
3.2. Extrema with Equality Constraints
3.2.1. Interpretation of Lagrange Multipliers
3.3. Extrema with Inequality Constraints: Necessary Conditions
3.4. Extrema with Inequality Constraints: Sufficient Conditions
3.4.1. Convex Structures
3.4.2. Karush-Kuhn-Tucker Theorem under Convexity
3.4.3. Rescaled Convex Functions
3.4.4. Global Extrema for Concave Functions
3.4.5. Proof of Karush-Kuhn-Tucker Theorem
3.5. Second-Order Conditions for Extrema of Constrained Functions

Chapter 4. Dynamic Programming
4.1. Parametric Maximization and Correspondences
4.1.1. Budget Correspondence for Commodity Bundles
4.1.2. Existence of a Nash Equilibrium
4.2. Finite-Horizon Dynamic Programming
4.2.1. Supremum and Infimum
4.2.2. General Theorems
4.3. Infinite-Horizon Dynamic Program
4.3.1. Examples
4.3.2. Theorems for Bounded Reward Function
4.3.3. Theorems for One-Sector Economy
4.3.4. Continuity of Value Function

Appendix A. Mathematical Language
Bibliography
Index

Preface

This book has been used in an upper-division undergraduate course on optimization given in the Mathematics Department at Northwestern University. Only deterministic problems with a continuous choice of options are considered, hence optimization of functions whose variables are (possibly) restricted to a subset of the real numbers or some Euclidean space. We treat both linear and nonlinear functions.

Optimization of linear functions with linear constraints is the topic of Chapter 1, linear programming. The optimization of nonlinear functions begins in Chapter 2 with a more complete treatment of the maximization of unconstrained functions than is given in calculus. Chapter 3 considers optimization with constraints. First, we treat equality constraints, which includes the Implicit Function Theorem and the method of Lagrange multipliers. Then we treat inequality constraints, which covers Karush-Kuhn-Tucker theory. Included is a consideration of convex and concave functions. Finally, Chapter 4 considers maximization over multiple time periods, or dynamic programming. Although we consider only discrete time, we treat both a finite and an infinite number of time periods.

In previous years, I have used the textbooks by Sundaram [14] and Walker [16]. Our presentation of linear programming is heavily influenced by [16] and the material on nonlinear functions by [14].
The prerequisites include courses on linear algebra and differentiation of functions of several variables. Knowledge of linear algebra is especially important. The simplex method for solving linear programs involves a special type of row reduction of matrices. We also use the concepts of the rank of a matrix and the linear independence of a collection of vectors. In terms of differentiation, we assume that the reader knows about partial derivatives and the gradient of a real-valued function. With this background, we introduce the derivative of a function between Euclidean spaces as a matrix. This approach is used to treat implicit differentiation with several constraints and the independence of constraints.

In addition, we assume exposure to the formal presentation of mathematical definitions and theorems, which our students get through a course on the foundations of higher mathematics or a calculus course that introduces formal mathematical notation, as our freshman MENU and MMSS courses at Northwestern do. Appendix A contains a brief summary of some of the mathematical language that is assumed from such a course. We do not assume the reader has had a course in real analysis. We do introduce a few concepts and some terminology that are covered in depth in such a course, but we merely require the reader to gain a working knowledge of this material so that we can state results succinctly and accurately.

I would appreciate being informed by email of any errors or confusing statements, whether typographical, numerical, grammatical, or miscellaneous.

R. Clark Robinson
Northwestern University
Evanston, Illinois
[email protected]
May 2013

Chapter 1
Linear Programming

In this chapter, we begin our consideration of optimization by considering linear programming: the maximization or minimization of linear functions over a region determined by linear inequalities. The word 'programming' does not refer to 'computer programming' but originally referred to the 'preparation of a schedule of activities'. Also, these problems can be solved by an algorithm that can be implemented on a computer, so linear programming is a widely used practical tool. We will only present the ideas and work low-dimensional examples to gain an understanding of what the computer is doing when it solves the problem. However, the method scales well to examples with many variables and many constraints.

1.1. Basic Problem

For a real-valued function $f$ defined on a set $F$, written $f : F \subset \mathbb{R}^n \to \mathbb{R}$, the set $f(F) = \{\, f(x) : x \in F \,\}$ is the set of attainable values of $f$ on $F$, or the image of $F$ by $f$.

We say that the real-valued function $f$ has a maximum on $F$ at a point $x_M \in F$ provided that $f(x_M) \ge f(x)$ for all $x \in F$. When $x_M$ exists, the maximum value $f(x_M)$ is written as $\max\{\, f(x) : x \in F \,\}$ and is called the maximum or maximal value of $f$ on $F$. We also say that $x_M$ is a maximizer of $f$ on $F$, or that $x_M$ maximizes $f(x)$ subject to $x \in F$.

In the same way, we say that $f$ has a minimum on $F$ at a point $x_m \in F$ provided that $f(x_m) \le f(x)$ for all $x \in F$. The value $f(x_m)$ is written as $\min\{\, f(x) : x \in F \,\}$ and is called the minimum or minimal value of $f$ on $F$. We also say that $x_m$ is a minimizer of $f$ on $F$, or that $x_m$ minimizes $f(x)$ subject to $x \in F$.

The function $f$ has an extremum at $x_0 \in F$ provided that $x_0$ is either a maximizer or a minimizer. The point $x_0$ is also called an optimal solution.
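To make these definitions concrete, here is a minimal sketch in Python (the function $f$ and the finite feasible set F are made-up examples, not taken from the text) that finds a maximizer and the maximal value by direct comparison:

```python
# Hypothetical example: maximize f(x, y) = 3x + 2y over a finite sample F
# of the triangle { (x, y) : x >= 0, y >= 0, x + y <= 1 }.
def f(p):
    x, y = p
    return 3 * x + 2 * y

grid = [i / 100 for i in range(101)]      # 0.00, 0.01, ..., 1.00
F = [(x, y) for x in grid for y in grid if x + y <= 1.0]

x_M = max(F, key=f)      # a maximizer of f on F, in the sense defined above
print(x_M, f(x_M))       # (1.0, 0.0) 3.0 -- the maximal value max{ f(x) : x in F }
```

When $F$ is infinite, as in the linear programs of this chapter, a maximizer cannot be found by enumeration; that is the role of methods such as the simplex method of Section 1.3.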
Optimization Questions.
• Does $f(x)$ attain a maximum (or minimum) on $F$ for some $x \in F$?
• If so, what are the points at which $f(x)$ attains a maximum (or minimum) subject to $x \in F$, and what is the maximal value (or minimal value)?

There are problems that have neither a maximum nor a minimum. For example, there is no maximizer or minimizer of $f(x) = 1/x$ on $(0, 1)$.

Notations

We denote the transpose of a matrix (or a vector) $A$ by $A^{\top}$.

For two vectors $v$ and $w$ in $\mathbb{R}^n$, $v \ge w$ means that $v_i \ge w_i$ for all $1 \le i \le n$. In the same way, $v \gg w$ means that $v_i > w_i$ for all $1 \le i \le n$. We also consider two positive "quadrants" of $\mathbb{R}^n$,
$$\mathbb{R}^n_{+} = \{\, x \in \mathbb{R}^n : x \ge 0 \,\} = \{\, x \in \mathbb{R}^n : x_i \ge 0 \text{ for } 1 \le i \le n \,\}$$
and
$$\mathbb{R}^n_{++} = \{\, x \in \mathbb{R}^n : x \gg 0 \,\} = \{\, x \in \mathbb{R}^n : x_i > 0 \text{ for } 1 \le i \le n \,\}.$$

Definition. The rank of an $m \times n$ matrix $A$ is used at various times in this course. The rank of $A$, $\operatorname{rank}(A)$, is the dimension of the column space of $A$, i.e., the largest number of linearly independent columns of $A$. This integer is the same as the number of pivots in the row reduced echelon form of $A$. (See Lay [9].)

It is possible to determine the rank by determinants of submatrices: $\operatorname{rank}(A) = k$ iff $k \ge 0$ is the largest integer for which some $k \times k$ submatrix $A_k$ of $A$, formed by selecting $k$ columns and $k$ rows, has $\det(A_k) \ne 0$. To see that this is equivalent, let $A'$ be a submatrix of $A$ whose $k$ columns are linearly independent and span the column space, so $\operatorname{rank}(A') = k$. The dimension of the row space of $A'$ is also $k$, so there are $k$ rows of $A'$ forming a $k \times k$ submatrix $A_k$ with $\operatorname{rank}(A_k) = k$, and hence $\det(A_k) \ne 0$. The submatrix $A_k$ is also a submatrix of pivot columns and rows. Note that in linear algebra, the pivot columns are usually chosen with the matrix in echelon form, with zeroes in each row to the left of the pivot position. We often choose other columns, as will be seen in the examples given in the rest of the chapter.
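The determinant characterization can be checked directly on small matrices. Below is a minimal illustrative sketch (the function name and the example matrix are hypothetical; in practice one would simply use numpy's SVD-based matrix_rank) that searches for the largest $k$ with a nonsingular $k \times k$ submatrix:

```python
import itertools
import numpy as np

def rank_via_determinants(A, tol=1e-10):
    """Largest k such that some k x k submatrix of A has nonzero determinant.

    Illustrative only: enumerating all submatrices is exponentially
    expensive, but it matches the criterion stated above.
    """
    m, n = A.shape
    for k in range(min(m, n), 0, -1):
        for rows in itertools.combinations(range(m), k):
            for cols in itertools.combinations(range(n), k):
                if abs(np.linalg.det(A[np.ix_(rows, cols)])) > tol:
                    return k
    return 0

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # twice the first row, so rank(A) < 3
              [1.0, 0.0, 1.0]])
print(rank_via_determinants(A))    # 2: e.g. first and third rows, first two columns
print(np.linalg.matrix_rank(A))    # 2, agrees
```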