APM462 Lecture Notes

Yuchen Wang

December 28, 2019

Contents

1 Matrix Calculus
  1.1 Matrix Multiplication
  1.2 Partitioned Matrices
  1.3 Matrix Differentiation
2 Second-year Calculus Review
  2.1 Mean Value Theorem in 1 Dimension
  2.2 1st Order Taylor Approximation
  2.3 2nd Order Mean Value Theorem
  2.4 Recall: Definition of Gradient
  2.5 Mean Value Theorem in n Dimensions
  2.6 1st Order Taylor Approximation in $\mathbb{R}^n$
  2.7 2nd Order Mean Value Theorem in $\mathbb{R}^n$
  2.8 2nd Order Taylor Approximation in $\mathbb{R}^n$
  2.9 Geometric Meaning of the Gradient
  2.10 Implicit Function Theorem
  2.11 Level Sets of $f$
3 Convex Sets & Functions
  3.1 Definitions
  3.2 Basic Properties of Convex Functions
  3.3 Criteria for Convexity
  3.4 Minimization and Maximization of Convex Functions
4 Basics of Unconstrained Optimization
  4.1 Extreme Value Theorem
  4.2 Unconstrained Optimization
  4.3 1st Order Necessary Condition for a Local Minimum
  4.4 2nd Order Necessary Condition for a Local Minimum
  4.5 2nd Order Sufficient Condition (for Interior Points)
5 Optimization with Equality Constraints
  5.1 Definitions of Related Spaces
  5.2 Lagrange Multipliers: 1st Order Necessary Condition for a Local Minimum
  5.3 2nd Order Necessary Condition for a Local Minimum
  5.4 2nd Order Sufficient Condition for a Local Minimum
6 Optimization with Inequality Constraints
  6.1 Kuhn-Tucker Conditions: 1st Order Necessary Condition for a Local Minimum
  6.2 2nd Order Necessary Conditions for a Local Minimum
  6.3 2nd Order Sufficient Conditions
7 Computational Methods for Finding Optima
  7.1 Newton's Method
  7.2 Method of Steepest Descent (Gradient Method)
  7.3 Method of Conjugate Directions
    7.3.1 Geometric Interpretations of the Method of Conjugate Directions
8 Calculus of Variations
  8.1 Example
  8.2 Classical Problem: the Brachistochrone
  8.3 General Class of Problems in the Calculus of Variations
  8.4 Euler-Lagrange Equations in $\mathbb{R}^n$
  8.5 Equality Constraints
    8.5.1 Isoperimetric Constraints
    8.5.2 Holonomic Constraints

1 Matrix Calculus

Row vs. Column Vectors. Our default rule is that every vector is a column vector unless explicitly stated otherwise. This convention is also known as the numerator layout. Special case: for $f : \mathbb{R}^n \to \mathbb{R}$, $Df$ is a $1 \times n$ matrix, i.e. a row vector.

1.1 Matrix Multiplication

Definition 1.1.1 Let $A$ be $m \times n$, let $B$ be $n \times p$, and let $C = AB$. Then $C$ is an $m \times p$ matrix whose element $(i, j)$ is given by
$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$
for all $i = 1, 2, \dots, m$ and $j = 1, 2, \dots, p$.

Proposition 1.1.2 Let $A$ be $m \times n$ and $x$ be $n \times 1$. Then the typical element of the product $z = Ax$ is given by
$$z_i = \sum_{k=1}^{n} a_{ik} x_k$$
for all $i = 1, 2, \dots, m$. Similarly, let $y$ be $m \times 1$. Then the typical element of the product $z^T = y^T A$ is given by
$$z_i = \sum_{k=1}^{m} a_{ki} y_k$$
for all $i = 1, 2, \dots, n$. Finally, the scalar resulting from the product $\alpha = y^T A x$ is given by
$$\alpha = \sum_{j=1}^{m} \sum_{k=1}^{n} a_{jk} y_j x_k.$$
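As a quick sanity check of these entrywise formulas, the following is a minimal NumPy sketch (an illustration added here, not part of the original notes; the dimensions and the random seed are arbitrary choices). It compares the double-sum expression for $\alpha = y^T A x$ from Proposition 1.1.2 against NumPy's built-in matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
m, n = 3, 4                      # arbitrary dimensions
A = rng.standard_normal((m, n))
x = rng.standard_normal(n)       # x is n x 1
y = rng.standard_normal(m)       # y is m x 1

# alpha = y^T A x via the double sum of Proposition 1.1.2
alpha_sum = sum(A[j, k] * y[j] * x[k] for j in range(m) for k in range(n))

# The same scalar via built-in matrix products
alpha_builtin = y @ A @ x

assert np.isclose(alpha_sum, alpha_builtin)
print(alpha_sum, alpha_builtin)
```

The same elementwise comparison works for $c_{ij}$ in Definition 1.1.1 and for $z_i$ in Proposition 1.1.2.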
1.2 Partitioned Matrices

Proposition 1.2.1 Let $A$ be a square, nonsingular matrix of order $m$. Partition $A$ as
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$
so that $A_{11}$ and $A_{22}$ are invertible. Then
$$A^{-1} = \begin{pmatrix} (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1} & -A_{11}^{-1} A_{12} (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} \\ -A_{22}^{-1} A_{21} (A_{11} - A_{12} A_{22}^{-1} A_{21})^{-1} & (A_{22} - A_{21} A_{11}^{-1} A_{12})^{-1} \end{pmatrix}$$

Proof: Direct multiplication of the proposed $A^{-1}$ with $A$ yields $A^{-1} A = I$.

1.3 Matrix Differentiation

Proposition 1.3.1
$$\frac{\partial A^T}{\partial x} = \left( \frac{\partial A}{\partial x} \right)^T$$

Proposition 1.3.2 Let $y = Ax$, where $y$ is $m \times 1$, $x$ is $n \times 1$, $A$ is $m \times n$, and $A$ does not depend on $x$. Suppose that $x$ is a function of the vector $z$, while $A$ is independent of $z$. Then
$$\frac{\partial y}{\partial z} = A \frac{\partial x}{\partial z}.$$

Proposition 1.3.3 Let the scalar $\alpha$ be defined by $\alpha = y^T A x$, where $y$ is $m \times 1$, $x$ is $n \times 1$, $A$ is $m \times n$, and $A$ is independent of $x$ and $y$. Then
$$\frac{\partial \alpha}{\partial x} = y^T A \qquad \text{and} \qquad \frac{\partial \alpha}{\partial y} = x^T A^T.$$

Proposition 1.3.4 For the special case where the scalar $\alpha$ is given by the quadratic form $\alpha = x^T A x$, where $x$ is $n \times 1$, $A$ is $n \times n$, and $A$ does not depend on $x$,
$$\frac{\partial \alpha}{\partial x} = x^T (A + A^T).$$

Proof: By definition,
$$\alpha = \sum_{j=1}^{n} \sum_{i=1}^{n} a_{ij} x_i x_j.$$
Differentiating with respect to the $k$th element of $x$, we have
$$\frac{\partial \alpha}{\partial x_k} = \sum_{j=1}^{n} a_{kj} x_j + \sum_{i=1}^{n} a_{ik} x_i$$
for all $k = 1, 2, \dots, n$, and consequently
$$\frac{\partial \alpha}{\partial x} = x^T A^T + x^T A = x^T (A^T + A).$$

Corollary (of Proposition 1.3.4) For the special case where $A$ is a symmetric matrix and $\alpha = x^T A x$, where $x$ is $n \times 1$ and $A$ does not depend on $x$,
$$\frac{\partial \alpha}{\partial x} = 2 x^T A.$$

Proposition 1.3.5 Let the scalar $\alpha$ be defined by $\alpha = y^T x$, where $y$ and $x$ are both $n \times 1$ and both are functions of the vector $z$. Then
$$\frac{\partial \alpha}{\partial z} = x^T \frac{\partial y}{\partial z} + y^T \frac{\partial x}{\partial z}.$$

Proposition 1.3.6 Let the scalar $\alpha$ be defined by $\alpha = x^T x$, where $x$ is $n \times 1$ and $x$ is a function of the vector $z$. Then
$$\frac{\partial \alpha}{\partial z} = 2 x^T \frac{\partial x}{\partial z}.$$

Proposition 1.3.7 Let the scalar $\alpha$ be defined by $\alpha = y^T A x$, where $y$ is $m \times 1$, $A$ is $m \times n$, $x$ is $n \times 1$, and both $y$ and $x$ are functions of the vector $z$, while $A$ does not depend on $z$. Then
$$\frac{\partial \alpha}{\partial z} = x^T A^T \frac{\partial y}{\partial z} + y^T A \frac{\partial x}{\partial z}.$$

Proposition 1.3.8 Let $A$ be an invertible $m \times m$ matrix whose elements are functions of the scalar parameter $\alpha$. Then
$$\frac{\partial A^{-1}}{\partial \alpha} = -A^{-1} \frac{\partial A}{\partial \alpha} A^{-1}.$$

Proof: Start with the definition of the inverse,
$$A^{-1} A = I,$$
and differentiate, yielding
$$\frac{\partial A^{-1}}{\partial \alpha} A + A^{-1} \frac{\partial A}{\partial \alpha} = 0.$$
Rearranging the terms yields
$$\frac{\partial A^{-1}}{\partial \alpha} = -A^{-1} \frac{\partial A}{\partial \alpha} A^{-1}.$$

Identities 1.3.9 (Vector-by-vector differentiation) [A table of the standard vector-by-vector differentiation identities appears here in the original notes.]

Theorem 1.3.10 (Young's Theorem, i.e. symmetry of second derivatives)
$$[\nabla_{xy} f(x, y)]^T = \nabla_{yx} f(x, y).$$

Proof: This is straightforward by writing out the elements of the matrix.

2 Second-year Calculus Review

We first consider functions $g : \mathbb{R} \to \mathbb{R}$.

2.1 Mean Value Theorem in 1 Dimension

Let $g \in C^1$ on $\mathbb{R}$. Then
$$\frac{g(x+h) - g(x)}{h} = g'(x + \theta h)$$
for some $\theta \in (0, 1)$. Equivalently,
$$g(x+h) = g(x) + h\, g'(x + \theta h).$$

2.2 1st Order Taylor Approximation

Let $g \in C^1$ on $\mathbb{R}$. Then
$$g(x+h) = g(x) + h\, g'(x) + o(h),$$
where $o(h)$ ("little o" of $h$) is the error term. Saying that a function $f$ satisfies $f(h) = o(h)$ means that
$$\lim_{h \to 0} \frac{f(h)}{h} = 0.$$
For example, $f(h) = h^2$ satisfies $f(h) = o(h)$, since
$$\lim_{h \to 0} \frac{f(h)}{h} = \lim_{h \to 0} \frac{h^2}{h} = \lim_{h \to 0} h = 0.$$

Proof (using the MVT): We want to show that $g(x+h) - g(x) - h\, g'(x) = o(h)$. Indeed,
$$\lim_{h \to 0} \frac{[g(x+h) - g(x)] - h\, g'(x)}{h} = \lim_{h \to 0} \frac{[h\, g'(x + \theta h)] - h\, g'(x)}{h} = \lim_{h \to 0} \big( g'(x + \theta h) - g'(x) \big) = 0,$$
where the last limit is $0$ by the continuity of $g'$, since $x + \theta h \to x$ as $h \to 0$.

2.3 2nd Order Mean Value Theorem

Let $g \in C^2$ on $\mathbb{R}$. Then
$$g(x+h) = g(x) + h\, g'(x) + \frac{h^2}{2} g''(x + \theta h)$$
for some $\theta \in (0, 1)$. As a consequence we obtain the 2nd order Taylor approximation.

Proof: We want to show that $g(x+h) - g(x) - h\, g'(x) - \frac{h^2}{2} g''(x) = o(h^2)$. Indeed,
$$\lim_{h \to 0} \frac{g(x+h) - g(x) - h\, g'(x) - \frac{h^2}{2} g''(x)}{h^2} = \lim_{h \to 0} \frac{[\frac{h^2}{2} g''(x + \theta h)] - \frac{h^2}{2} g''(x)}{h^2} = \lim_{h \to 0} \frac{1}{2} \big( g''(x + \theta h) - g''(x) \big) = 0,$$
again by the continuity of $g''$.
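The little-o statements of sections 2.2 and 2.3 are claims about rates of decay, so they are easy to illustrate numerically. The following sketch is an added illustration, not part of the original notes; the choices $g = \exp$ and $x = 1$ are arbitrary. The first-order remainder divided by $h$ and the second-order remainder divided by $h^2$ both tend to $0$ as $h \to 0$.

```python
import numpy as np

# Illustrates sections 2.2-2.3 for g(x) = exp(x) at x = 1.
# Convenient because g = g' = g'' = exp.
g = np.exp
x = 1.0

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    r1 = g(x + h) - g(x) - h * g(x)   # 1st order remainder, g'(x) = exp(x)
    r2 = r1 - 0.5 * h**2 * g(x)       # 2nd order remainder, g''(x) = exp(x)
    print(f"h={h:.0e}  r1/h={r1 / h:.3e}  r2/h^2={r2 / h**2:.3e}")
# Both ratios tend to 0, matching the definitions of o(h) and o(h^2).
```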
We now turn to multivariate functions $f : \mathbb{R}^n \to \mathbb{R}$.

2.4 Recall: Definition of Gradient

The gradient of $f : \mathbb{R}^n \to \mathbb{R}$ at $x \in \mathbb{R}^n$ (denoted $\nabla f(x)$), if it exists, is the vector characterized by the property
$$\lim_{v \to 0} \frac{f(x+v) - f(x) - \nabla f(x) \cdot v}{\|v\|} = 0.$$
In Cartesian coordinates,
$$\nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x), \dots, \frac{\partial f}{\partial x_n}(x) \right).$$

2.5 Mean Value Theorem in n Dimensions

Let $f \in C^1$ on $\mathbb{R}^n$. Then for any $x, v \in \mathbb{R}^n$,
$$f(x+v) = f(x) + \nabla f(x + \theta v) \cdot v$$
for some $\theta \in (0, 1)$.

Proof: Reduce to the 1-dimensional case. Define $g(t) := f(x + tv)$ for $t \in \mathbb{R}$. Then
$$g'(t) = \frac{d}{dt} f(x + tv) = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x + tv) \cdot \frac{d (x + tv)_i}{dt} \quad \text{(by the Chain Rule)}$$
$$= \sum_{i} \frac{\partial f}{\partial x_i}(x + tv) \cdot \frac{d (x_i + t v_i)}{dt} = \sum_{i} \frac{\partial f}{\partial x_i}(x + tv) \cdot v_i = \nabla f(x + tv) \cdot v. \quad (*)$$
Since $g \in C^1$ on $\mathbb{R}$, the MVT in $\mathbb{R}$ gives, for some $\theta \in (0, 1)$,
$$f(x+v) = g(1) = g(0 + 1) = g(0) + 1 \cdot g'(0 + \theta \cdot 1) = g(0) + g'(\theta) = f(x) + \nabla f(x + \theta v) \cdot v \quad \text{(by (*))}.$$

2.6 1st Order Taylor Approximation in $\mathbb{R}^n$

Let $f \in C^1$ on $\mathbb{R}^n$. Then
$$f(x+v) = f(x) + \nabla f(x) \cdot v + o(\|v\|).$$

Proof:
$$\lim_{\|v\| \to 0} \frac{[f(x+v) - f(x)] - \nabla f(x) \cdot v}{\|v\|} = \lim_{\|v\| \to 0} \frac{[\nabla f(x + \theta v) \cdot v] - \nabla f(x) \cdot v}{\|v\|} = \lim_{\|v\| \to 0} [\nabla f(x + \theta v) - \nabla f(x)] \cdot \frac{v}{\|v\|} = 0,$$
since $\frac{v}{\|v\|}$ is a unit vector (its norm remains $1$) while $\nabla f(x + \theta v) - \nabla f(x) \to 0$ by the continuity of $\nabla f$.

2.7 2nd Order Mean Value Theorem in $\mathbb{R}^n$

Let $f \in C^2$ on $\mathbb{R}^n$. Then
$$f(x+v) = f(x) + \nabla f(x) \cdot v + \frac{1}{2} v^T \nabla^2 f(x + \theta v)\, v$$
for some $\theta \in (0, 1)$.

Remark: In this course, $\nabla^2$ denotes the Hessian, not the Laplacian.
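To make the defining limit in section 2.4 concrete, here is a small numerical sketch (an added illustration, not part of the original notes; the function $f$, the base point, and the direction are arbitrary choices). Along a fixed unit direction, the first-order remainder divided by $\|v\|$ tends to $0$, exactly as sections 2.4 and 2.6 assert.

```python
import numpy as np

# f(x1, x2) = sin(x1) + x1 * x2^2, with gradient
# grad f = (cos(x1) + x2^2, 2 * x1 * x2)
def f(x):
    return np.sin(x[0]) + x[0] * x[1] ** 2

def grad_f(x):
    return np.array([np.cos(x[0]) + x[1] ** 2, 2 * x[0] * x[1]])

x = np.array([0.7, -1.3])                        # arbitrary base point
direction = np.array([1.0, 2.0]) / np.sqrt(5.0)  # fixed unit direction

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    v = t * direction
    remainder = f(x + v) - f(x) - grad_f(x) @ v
    print(f"||v||={t:.0e}  remainder/||v||={remainder / t:.3e}")
# The ratio tends to 0 as ||v|| -> 0, which is the property that
# characterizes the gradient in section 2.4.
```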
