
CLASS NOTES for CHAPTER 4

1 Space Curves and Tangent Lines

Recall that space curves are defined by a set of parametric equations,

\[
r(t) = \begin{bmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_n(t) \end{bmatrix}
\]

In Calc III, we might have written this a little differently, \vec{r}(t) = \langle x(t), y(t), z(t) \rangle, but here we want to use n dimensions rather than two or three dimensions. The derivative and the integral of r with respect to t are computed componentwise,

\[
r'(t) = \begin{bmatrix} x_1'(t) \\ x_2'(t) \\ \vdots \\ x_n'(t) \end{bmatrix}, \qquad
\int r(t)\,dt = \begin{bmatrix} \int x_1(t)\,dt \\ \int x_2(t)\,dt \\ \vdots \\ \int x_n(t)\,dt \end{bmatrix}
\]

and the local linear approximation to r(t) is also done componentwise. The tangent line (in n dimensions) can be written easily: the derivative at t = a gives the direction of the line, so the tangent line is given by:

\[
L(t) = \begin{bmatrix} x_1(a) \\ x_2(a) \\ \vdots \\ x_n(a) \end{bmatrix} + t \begin{bmatrix} x_1'(a) \\ x_2'(a) \\ \vdots \\ x_n'(a) \end{bmatrix}
\]

In Class Exercise: Use Maple to plot the curve r(t) = [cos(t), sin(t), t]^T and its tangent line at t = π/2.
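The exercise asks for Maple; for reference, here is a comparable sketch in Matlab (the language used later in these notes):

t = linspace(0, 4*pi, 400);
plot3(cos(t), sin(t), t, 'b'); hold on;
a   = pi/2;
ra  = [cos(a); sin(a); a];              % the point r(a) on the curve
rpa = [-sin(a); cos(a); 1];             % r'(a), the direction of the tangent line
s   = linspace(-2, 2, 50);
L   = repmat(ra, 1, numel(s)) + rpa*s;  % L(s) = r(a) + s*r'(a), stored columnwise
plot3(L(1,:), L(2,:), L(3,:), 'r');
grid on; xlabel('x'); ylabel('y'); zlabel('z');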

2 Gradients and Tangent Planes

Let f : R^n → R. In this case, we can write y = f(x_1, x_2, x_3, ..., x_n).

Note that any function we wish to optimize must be of this form; it would not make sense to find the maximum of a function like a space curve, since n-dimensional coordinates are not well ordered like the real line, so the following statement would be meaningless: (3, 5) > (1, 2). This type of function is sometimes called a surface, even though we cannot graph it if the domain has dimension higher than 2.

The gradient of f, ∇f, is defined by its partial derivatives. We will denote ∂f/∂x_i as f_{x_i}, and so:

\[
\nabla f(x_1, x_2, \ldots, x_n) = \begin{bmatrix} f_{x_1}(x_1, x_2, \ldots, x_n) \\ f_{x_2}(x_1, x_2, \ldots, x_n) \\ \vdots \\ f_{x_n}(x_1, x_2, \ldots, x_n) \end{bmatrix}
\]

NOTE: When writing the gradient, we will not distinguish between it being a row vector or a column vector. For some ideas it may be clearer notationally to write it as a column, but for computations it is a row vector. Normally it will be clear from the context whether it is a row or a column.

The rate of change of f at x = a in the direction of the unit vector u is defined as the directional derivative of f:

\[
D_u f(a) = \nabla f(a) \cdot u
\]

If we recall the useful relationship between any two vectors u, v ∈ R^n,

\[
u \cdot v = \|u\|_2 \, \|v\|_2 \cos(\theta),
\]

then we see that the direction in which the rate of change of f is largest is the direction of ∇f(a).

In Class Example: Find the maximum rate of change of f at the given point and the direction in which it occurs: f(x, y) = x + y/z at (4, 3).
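To see how the formula is used, here is a small sketch with a made-up function (not the exercise above): f(x, y) = x^2 y at the point a = (1, 2), in the direction of the unit vector u = [3/5, 4/5]:

gradf = @(x, y) [2*x*y; x^2];          % gradient of f(x,y) = x^2*y
a = [1; 2];
u = [3; 4]/5;                          % a unit vector
Du      = gradf(a(1), a(2)).' * u      % directional derivative: grad f(a) . u = 3.2
maxrate = norm(gradf(a(1), a(2)))      % maximum rate of change: ||grad f(a)|| = sqrt(17)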

The linearization of f at x = a is the tangent plane,

\[
L(x) = f(a) + \nabla f(a) \cdot (x - a),
\]

which is also the first-order Taylor approximation to f at a.

2.1 Surfaces and Contours

In maximizing or minimizing a function, we'll consider equations like F(x) = k, where k is a real number. From our earlier discussion on the Implicit Function Theorem, we know that this equation could implicitly define one of the variables in terms of the others. In Calculus III, for example, we saw that:

• F(x, y) = k defines a level curve in the plane.
• F(x, y, z) = k defines a level surface in three dimensions.

Now let r(t) be a curve on the surface defined by F(x) = k. Then we could write F as dependent on t, and differentiate with respect to t:

\[
\nabla F(r(t)) \cdot r'(t) = 0,
\]

which implies that the gradient is orthogonal to any curve on the surface. In particular, we can compute the tangent line or tangent plane at the point (a, b) or (a, b, c):

• F_x(a, b)(x − a) + F_y(a, b)(y − b) = 0
• F_x(a, b, c)(x − a) + F_y(a, b, c)(y − b) + F_z(a, b, c)(z − c) = 0

If x ∈ R^n, this can be written compactly as ∇F(a) · (x − a) = 0.

In Class Example: Compute the tangent line and show that the gradient is orthogonal to it, if x^2 + y^2 = 1 at (1/√2, 1/√2). In this case, the gradient of F(x, y) = x^2 + y^2 is [2x, 2y], and at the given point the gradient is [√2, √2]. Notice that this vector points directly away from the center of the circle. On the other hand, the equation of the tangent line is
\[
\sqrt{2}\left(x - \frac{1}{\sqrt{2}}\right) + \sqrt{2}\left(y - \frac{1}{\sqrt{2}}\right) = 0,
\]
or, in the more familiar form, y = −x + √2. The "slope" of the gradient is 1, while the slope of the tangent line is −1. In two dimensions such as this, a line in the direction of the gradient would have slope F_y/F_x, while the slope of the tangent line would be −F_x/F_y.

The important point in this section: given a level surface of the form F(x) = k, the gradient ∇F at x = a is orthogonal to it. Furthermore, the gradient points in the direction in which F increases the fastest (for y = F(x)).
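A quick numerical check of the In Class Example above: the gradient at the point is perpendicular to a direction vector of the tangent line.

p     = [1/sqrt(2); 1/sqrt(2)];
gradF = [2*p(1); 2*p(2)];     % gradient of F(x,y) = x^2 + y^2 at p
d     = [1; -1];              % direction vector of the line y = -x + sqrt(2)
dot(gradF, d)                 % returns 0: the gradient is orthogonal to the tangent line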

3 Derivatives we don't see in Calculus III

3.1 The Hessian Matrix

Let f : R^n → R, so that we have something of the form y = f(x_1, x_2, ..., x_n). We have already seen that the derivative of f takes the form of the gradient. We now examine the second derivative of f, and we'll see that it forms what is called the Hessian matrix.

The first derivative is in the form of the gradient, where we have a vector of functions. Let f_{x_i}(x_1, x_2, ..., x_n) = g_i(x_1, x_2, ..., x_n). Now each function g_i is also a function from R^n to R, and so the derivative of g_i is also a gradient. We could therefore write the second derivative of f in matrix form to obtain what is called the Hessian of f:

\[
Hf = \begin{bmatrix} \nabla g_1 \\ \nabla g_2 \\ \vdots \\ \nabla g_n \end{bmatrix}
   = \begin{bmatrix}
     f_{x_1 x_1} & f_{x_1 x_2} & \cdots & f_{x_1 x_n} \\
     f_{x_2 x_1} & f_{x_2 x_2} & \cdots & f_{x_2 x_n} \\
     \vdots & \vdots & & \vdots \\
     f_{x_n x_1} & f_{x_n x_2} & \cdots & f_{x_n x_n}
     \end{bmatrix}
\]

In Class Example: Let f(x, y) = x^3 + 2xy − y^2. Compute the gradient and Hessian of f.

\[
\nabla f = \begin{bmatrix} 3x^2 + 2y \\ 2x - 2y \end{bmatrix}, \qquad
Hf = \begin{bmatrix} 6x & 2 \\ 2 & -2 \end{bmatrix}
\]

You might note that, if the second partials of f are continuous, then Hf will always be a symmetric matrix (this is a consequence of Clairaut's Theorem).
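If the Symbolic Math Toolbox is available, this computation can be checked in Matlab (a sketch, not part of the original example):

syms x y
f     = x^3 + 2*x*y - y^2;
gradf = gradient(f, [x; y])   % expect [3*x^2 + 2*y; 2*x - 2*y]
Hf    = hessian(f, [x; y])    % expect [6*x, 2; 2, -2]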

In Class Example: Let f(x, y) = 2 + x^2 + x + xy + y + y^2. Show that f can be written as
\[
f(a) + \nabla f(a)(x - a) + \frac{1}{2}(x - a)^T Hf(a)(x - a),
\]
where a = (0, 0). First, we compute ∇f and Hf:
\[
\nabla f = \begin{bmatrix} 2x + 1 + y \\ x + 1 + 2y \end{bmatrix}, \qquad
Hf = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}
\]

Substituting (x, y) = (0, 0), the gradient is [1, 1], and now compute:
\[
2 + [1, 1]\begin{bmatrix} x - 0 \\ y - 0 \end{bmatrix}
  + \frac{1}{2}[x - 0, \; y - 0]\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} x - 0 \\ y - 0 \end{bmatrix}
= 2 + x + y + \frac{1}{2}[x, y]\begin{bmatrix} 2x + y \\ x + 2y \end{bmatrix}
= 2 + x + y + x^2 + xy + y^2
\]

The Second Order Taylor Approximation: If f : R^n → R, then the Taylor approximation to f at a, to second order, is
\[
f(x) \approx f(a) + \nabla f(a)(x - a) + \frac{1}{2}(x - a)^T Hf(a)(x - a).
\]

3.2 Connecting the Hessian to Calculus III

In Calc III, when we were finding local extrema, we had the "Second Derivatives Test". To generalize this result requires a little linear algebra, but the end result is: if x = a is a critical point of f and Hf(a) has all positive eigenvalues, then f has a local minimum at a. If Hf(a) has all negative eigenvalues, then f has a local maximum at a. If the eigenvalues are mixed in sign, there is a saddle point.
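As a quick illustration of this test (not from the original notes), consider the earlier example f(x, y) = x^3 + 2xy − y^2. Setting its gradient [3x^2 + 2y, 2x − 2y] to zero gives the critical points (0, 0) and (−2/3, −2/3), and the signs of the Hessian's eigenvalues classify them:

Hf = @(x, y) [6*x, 2; 2, -2];    % Hessian of f(x,y) = x^3 + 2xy - y^2 (computed above)
eig(Hf(0, 0))                    % eigenvalues of mixed sign -> saddle point at (0,0)
eig(Hf(-2/3, -2/3))              % both eigenvalues negative -> local maximum at (-2/3,-2/3)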

3.3 The Jacobian Matrix

Let f : R^n → R^m. Then we could write f using m coordinate functions, each depending on n variables,

\[
f(x_1, \ldots, x_n) = \begin{bmatrix} f_1(x_1, \ldots, x_n) \\ f_2(x_1, \ldots, x_n) \\ \vdots \\ f_m(x_1, \ldots, x_n) \end{bmatrix}
\]

Definition: The derivative of f takes the form of a matrix called the Jacobian of f, constructed by taking the partial derivatives:

\[
Jf = \begin{bmatrix}
(f_1)_{x_1} & (f_1)_{x_2} & \cdots & (f_1)_{x_n} \\
(f_2)_{x_1} & (f_2)_{x_2} & \cdots & (f_2)_{x_n} \\
\vdots & & & \vdots \\
(f_m)_{x_1} & (f_m)_{x_2} & \cdots & (f_m)_{x_n}
\end{bmatrix}
\]

In Class Example: Compute the Jacobian of:

\[
f(x, y, z) = \begin{bmatrix} x^2 + yz + z \\ y^2 + xy + 3 \\ \sin(xyz) \end{bmatrix},
\qquad
Jf = \begin{bmatrix}
2x & z & y + 1 \\
y & 2y + x & 0 \\
yz\cos(xyz) & xz\cos(xyz) & xy\cos(xyz)
\end{bmatrix}
\]

If f : R^n → R^m, then the linearization of f at x = a is given by:
\[
L(x) = f(a) + Jf(a)(x - a)
\]
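The Jacobian above can also be checked symbolically (a sketch, assuming the Symbolic Math Toolbox):

syms x y z
f  = [x^2 + y*z + z;  y^2 + x*y + 3;  sin(x*y*z)];
Jf = jacobian(f, [x, y, z])
% expect [2x, z, y+1;  y, 2y+x, 0;  y*z*cos(xyz), x*z*cos(xyz), x*y*cos(xyz)]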

4 The Extreme Value Theorem

Let f : R^n → R be continuous on a closed and bounded domain D. Then f attains a global maximum and a global minimum on D. Furthermore, these maximum and minimum values are obtained either at the critical points of f (where the gradient is zero or is not defined), or on the boundary of D.

In Class Example: Find the maximum value of f(x, y) = x^2 − 2xy + 2y, where the domain is the rectangle D = {(x, y) | x ∈ [0, 3], y ∈ [0, 2]}. The gradient is [2x − 2y, −2x + 2]. Setting these to zero, we see the only solution is (1, 1). The value of the function there is f(1, 1) = 1. Since the boundary is made up of line segments, it is easy to analyze f along them:

• For y = 0, 0 ≤ x ≤ 3: f(x, 0) = x^2, and the max of 9 occurs at x = 3.
• For x = 0, 0 ≤ y ≤ 2: f(0, y) = 2y, and the max of 4 occurs at y = 2.
• For x = 3, 0 ≤ y ≤ 2: f(3, y) = 9 − 4y, and the max of 9 occurs at y = 0.

• For y = 2, 0 ≤ x ≤ 3: f(x, 2) = x^2 − 4x + 4 = (x − 2)^2, which has a max of 4 at x = 0.

Now we see that the global max is 9 and occurs at the boundary point (3, 0).
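A brute-force numerical check of this example, simply sampling f on a grid over the rectangle (a sketch):

f = @(x, y) x.^2 - 2*x.*y + 2*y;
[X, Y] = meshgrid(linspace(0, 3, 301), linspace(0, 2, 201));
Z = f(X, Y);
[zmax, k] = max(Z(:));
fprintf('max of f is about %g at (%g, %g)\n', zmax, X(k), Y(k))   % about 9 at (3, 0)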

5 Newton's Method Revisited

We saw that, for a problem of the form f(x) = 0, we could solve for x iteratively by choosing an initial guess x_0 and iterating:

\[
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}
\]

We also saw that there could be problems with Newton's Method:

• If x* is the desired point, then we need f'(x*) ≠ 0.
• We're not sure which root we're converging to.
• We can only guarantee convergence if x_0 is sufficiently close to x*.

We got the formula for Newton's Method by linearizing the function f at x_k, then setting y = 0 and solving for x_{k+1}:

\[
y - f(x_k) = f'(x_k)(x_{k+1} - x_k) \quad \Rightarrow \quad -f(x_k) = f'(x_k)(x_{k+1} - x_k)
\]

5.1 Scalar Valued Functions

If f : R^n → R, then Newton's Method is not well defined. For example, suppose f(x, y) = x^2 + y^2 − 1. Then the zeros of this function are all the points on the unit circle; in general, the solution set is a level surface (or level curve).

5.2 Vector Valued Functions

Let g : R^n → R^n. The derivative of g is then its Jacobian matrix (n × n), and the linearization of g about x_k is given by:

\[
y - g(x_k) = Jg(x_k)(x_{k+1} - x_k)
\]

Set y = 0 and solve for x_{k+1}, and we get:

\[
x_{k+1} = x_k - (Jg(x_k))^{-1} g(x_k)
\]

Let's do a simple example just to see what is at stake here. Let g(x, y) = [x^2 − y^2, 2x + y − 1]. Then the Jacobian matrix at a point (x, y) is given by:
\[
Jg(x, y) = \begin{bmatrix} 2x & -2y \\ 2 & 1 \end{bmatrix}
\]

And since this is a 2 × 2 matrix, its inverse can be computed directly:

\[
(Jg)^{-1}(x, y) = \frac{1}{2x + 4y}\begin{bmatrix} 1 & 2y \\ -2 & 2x \end{bmatrix}
\]

Let the initial guess be (1, 1). The next iterate is then:

\[
x_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix} - \frac{1}{6}\begin{bmatrix} 1 & 2 \\ -2 & 2 \end{bmatrix}\begin{bmatrix} 0 \\ 2 \end{bmatrix} = \begin{bmatrix} 1/3 \\ 1/3 \end{bmatrix}
\]

You can double-check that this is indeed a solution to g(x) = 0, but this was because we had a lucky first guess. Other initial conditions may take longer for the algorithm to converge.
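For reference, a sketch of this iteration in Matlab; here the Newton step is computed with the backslash operator (solving Jg s = g) rather than by forming the inverse:

x = [1; 1];
for k = 1:10
    g  = [x(1)^2 - x(2)^2;  2*x(1) + x(2) - 1];
    Jg = [2*x(1), -2*x(2);  2, 1];
    x  = x - Jg \ g;        % Newton step without explicitly forming inv(Jg)
end
x                           % from this starting point, x reaches (1/3, 1/3) after one step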

5.3 Newton’s Method in Optimization

In one dimension, we look for points where f'(x) = 0. Numerically, we can use Newton's Method to find such a point by letting g(x) = f'(x):

\[
x_{k+1} = x_k - \frac{g(x_k)}{g'(x_k)} = x_k - \frac{f'(x_k)}{f''(x_k)}
\]

If f : R^n → R, then we can think of ∇f : R^n → R^n. To find the zeros of the gradient, we can use Newton's Method in a similar way. Let g(x) = ∇f. Then, from our previous computation,

\[
x_{k+1} = x_k - (Jg(x_k))^{-1} g(x_k) = x_k - (Hf(x_k))^{-1} \nabla f(x_k)
\]

Note that this last product is the n × n inverse of the Hessian times the gradient written as a column vector. If the domain of f is unbounded, we may or may not have an optimal value; recall that we are simply finding where the gradient is zero, which may correspond to a local or global maximum, a local or global minimum, or a saddle point.

EXAMPLE: Find the maximum of f(x, y) = 12 + 10y − 2x^2 − 8xy − y^4. Since this is a function of two variables, we can look at the contour plot to predict what will happen; this is given in Figure 1. After looking at the function, it is clear that there is a global maximum, since the value of the function tends to −∞ as (x, y) becomes unbounded.


Figure 1: Contours of f(x, y) = 12 + 10y − 2x^2 − 8xy − y^4. We see that the contours are encircling two points and there might be a saddle in between.

We compute the gradient and Hessian of f:

\[
\nabla f = \begin{bmatrix} -4x - 8y & 10 - 8x - 4y^3 \end{bmatrix},
\qquad
Hf = \begin{bmatrix} -4 & -8 \\ -8 & -12y^2 \end{bmatrix}
\]

We can look at the gradient directly: solutions to where the gradient is zero will lie along the line y = −x/2, from which we would have to solve 10 − 8x + x^3/2 = 0, and we get three solutions for x, which are approximately
\[
x \approx -4.519, \qquad x \approx 3.084, \qquad x \approx 1.434.
\]
Using the Matlab script below, we can numerically solve for these using Newton's Method:

x=[-8;1];
for j=1:30
   gradf=[-4*x(1)-8*x(2); 10-8*x(1)-4*x(2)^3];
   Hf=[-4 8; -8 -12*x(2)^2];
   iHf=inv(Hf);
   x=x-iHf*gradf;
end

Running this code will give a final value of approximately (−4.519, 2.2597). We can put additional things into the code so that we can track the values of x, ∇f, and Hf. We could also use a stopping criterion rather than running the algorithm 30 times (30 was chosen arbitrarily); a version along these lines is sketched at the end of this section. At this point, the functional value was approximately 49.37.

Starting Newton's Method at (3, 1) gave a different ending value of approximately (3.084, −1.542), and the value of f there was approximately 9.95. Starting Newton's Method at (1, 1) also ended at approximately (3.084, −1.542). Starting Newton's Method at (1.5, 0.5) gave us (−4.519, 2.259).

Starting very close to the third point we found by other methods, Newton's method did not want to converge to it; it is in fact a saddle point. We see some typical behavior in this example:

• If Newton's method starts close to the optimum, it will reach that point fairly quickly.

• The solution given by Newton's method may be only a local maximum (or minimum).

• Different initial conditions may give different answers.

In practice, it is common to run an optimization algorithm with many randomly chosen initial values to probe the surface.
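Here is the variant with a stopping criterion that was mentioned above. It reuses the script's gradient and matrix; the tolerance (1e-10) and the cap of 100 iterations are arbitrary choices:

x = [-8; 1];
for j = 1:100
    gradf = [-4*x(1)-8*x(2); 10-8*x(1)-4*x(2)^3];
    if norm(gradf) < 1e-10      % stop once the gradient is essentially zero
        break
    end
    Hf = [-4 8; -8 -12*x(2)^2];
    x  = x - Hf\gradf;          % equivalent to x - inv(Hf)*gradf
end
x, j                            % the point reached and the number of iterations used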

6 Lagrange Multipliers

In the last section, we considered optimizing a nonlinear function over an unbounded domain. In this section, we place constraints on the domain; in particular, we assume the constraints are of a particular form: g(x) = k.


Figure 2: Some contours associated with f(x, y) = x^3 + y^3 + 3xy (these are 0, 3.5, 20, 100, 200, 350, 400), shown as dotted curves (the higher the value, the higher the curve for this function). The constraint (x − 3)^2 + (y − 3)^2 = 9 is shown as the solid curve.

Before looking at the case in R^n, let's consider the problem where we are optimizing a function f(x, y) under the constraint that g(x, y) = k. In this case, plots of the form f(x, y) = c are level curves of f, and the plot of g(x, y) = k is a curve. For example, consider f(x, y) = x^3 + y^3 + 3xy subject to the restriction that the points lie on a circle of radius 3 centered at (3, 3). In Figure 2, we show a plot of some contours of f together with the circle. We'll recall that the gradient of f, in this case

\[
\nabla f(x, y) = [3x^2 + 3y, \; 3y^2 + 3x],
\]

is orthogonal to the level curve at a given point on the level curve. Similarly,

\[
\nabla g(x, y) = [2(x - 3), \; 2(y - 3)]
\]

is orthogonal to the circle. Now, consider f restricted to the curve, where we'll assume that g(x, y) = k has been parametrized by \vec{r}(t) = [x(t), y(t)]. In the example, this would mean

x(t) = 3 cos(t) + 3, y(t) = 3 sin(t) + 3

Note that
\[
\vec{r}\,'(t) = [-3\sin(t), \; 3\cos(t)],
\]
and this is the tangent vector to the curve. Now, back to the general case: we can write h(t) = f(\vec{r}(t)) = f(x(t), y(t)), and if h has an extreme point at t*, then h'(t*) = 0:

\[
0 = h'(t^*) = \nabla f(x(t^*), y(t^*)) \cdot \vec{r}\,'(t^*)
\]

Thus, at an extreme point, the gradient of f is orthogonal to the tangent vector of the constraint, which is in turn orthogonal to ∇g.

Method of Lagrange Multipliers

If y = f(x) has an extreme point on the surface defined by g(x) = k, then candidates for where the extrema occur are where ∇f and ∇g are parallel. That is,
\[
\nabla f(x) = \lambda \nabla g(x).
\]
The constant(s) λ are called the Lagrange multipliers.

Let us solve our previous example: Find the optimal values of f(x, y) = x^3 + y^3 + 3xy such that (x − 3)^2 + (y − 3)^2 = 9. We have
\[
\nabla f(x, y) = [3x^2 + 3y, \; 3y^2 + 3x], \qquad \nabla g(x, y) = [2(x - 3), \; 2(y - 3)],
\]
so that we solve the following system of equations:

\[
\begin{aligned}
3x^2 + 3y &= 2\lambda(x - 3) \\
3y^2 + 3x &= 2\lambda(y - 3) \\
(x - 3)^2 + (y - 3)^2 &= 9
\end{aligned}
\]

These types of systems are generally difficult to solve by hand, so we'll use a computer algebra system or Matlab to help us. In this example, we have
\[
x = 3 \pm \frac{3}{\sqrt{2}}, \qquad y = 3 \pm \frac{3}{\sqrt{2}}, \qquad \lambda = \frac{21}{2} \pm \frac{33}{2\sqrt{2}}.
\]
Note where these points occur in our figure. The extrema are found by substituting these two points back into f. Doing this, we find that the maximum occurs at
\[
x = 3 + \frac{3}{\sqrt{2}}, \qquad y = 3 + \frac{3}{\sqrt{2}}, \qquad \lambda = \frac{21}{2} + \frac{33}{2\sqrt{2}},
\]
and the maximum value is approximately 347.32, which we expected from the contour plot. The minimum is approximately 3.67, which is also expected from the contour plot.

Of course, in this example, we could actually convert f to a function of one variable by explicitly performing the parameterization. In this case, we have
\[
f(t) = 27(\cos(t) + 1)^3 + 27(\sin(t) + 1)^3 + 27(\cos(t) + 1)(\sin(t) + 1),
\]
and we can use a numerical solver (like Newton's method) to find where f'(t) = 0. In this case, we get
\[
t \approx 0.7854, \qquad t \approx 3.926,
\]
where f has its maximum at the first value of t and its minimum at the second value of t.

This method can be employed for several constraints, g_1(x) = k_1, g_2(x) = k_2, ..., g_m(x) = k_m. In this case, the gradient of f is parallel to a linear combination of the gradients of the g_i, and we have the system of equations:

\[
\begin{aligned}
\nabla f &= \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2 + \ldots + \lambda_m \nabla g_m \\
g_1 &= k_1 \\
&\;\;\vdots \\
g_m &= k_m
\end{aligned}
\]

Note that this is a system of n + m equations in n + m unknowns (if x ∈ R^n). In general, we could translate this system into

\[
G(x_1, \ldots, x_n, \lambda_1, \ldots, \lambda_m) = 0
\]

and use Newton's method to solve it numerically.
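Either route can be carried out in Matlab. For instance, the Lagrange system for the single-constraint circle example above can be handed to a computer algebra system directly; a sketch, assuming the Symbolic Math Toolbox is available:

syms x y lam real
eqs = [3*x^2 + 3*y == 2*lam*(x - 3), ...
       3*y^2 + 3*x == 2*lam*(y - 3), ...
       (x - 3)^2 + (y - 3)^2 == 9];
S  = solve(eqs, [x, y, lam], 'Real', true);    % real solutions of the Lagrange system
xs = double(S.x);  ys = double(S.y);
[xs, ys, xs.^3 + ys.^3 + 3*xs.*ys]             % candidate points and the value of f at each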

Example: Maximize (1/3)x_1^3 + (1/3)x_2^3 + (1/3)x_3^3 subject to the conditions that x_1 + x_2 + x_3 = 0 and x_1^2 + x_2^2 + x_3^2 = 3. The system of equations we need to solve is given by:

\[
\begin{aligned}
x_1^2 &= \lambda_1 + 2\lambda_2 x_1 \\
x_2^2 &= \lambda_1 + 2\lambda_2 x_2 \\
x_3^2 &= \lambda_1 + 2\lambda_2 x_3 \\
x_1 + x_2 + x_3 &= 0 \\
x_1^2 + x_2^2 + x_3^2 &= 3
\end{aligned}
\]

In this example, we have:

\[
G = \begin{bmatrix}
x_1^2 - 2\lambda_2 x_1 - \lambda_1 \\
x_2^2 - 2\lambda_2 x_2 - \lambda_1 \\
x_3^2 - 2\lambda_2 x_3 - \lambda_1 \\
x_1 + x_2 + x_3 \\
x_1^2 + x_2^2 + x_3^2 - 3
\end{bmatrix}
\]

and the Jacobian of G:

\[
JG = \begin{bmatrix}
2x_1 - 2\lambda_2 & 0 & 0 & -1 & -2x_1 \\
0 & 2x_2 - 2\lambda_2 & 0 & -1 & -2x_2 \\
0 & 0 & 2x_3 - 2\lambda_2 & -1 & -2x_3 \\
1 & 1 & 1 & 0 & 0 \\
2x_1 & 2x_2 & 2x_3 & 0 & 0
\end{bmatrix}
\]

Performing Newton's method gives a maximum where λ_1 = 1, λ_2 = √2/4, and triples of the form (−1/√2, −1/√2, √2): any two of the variables can be −1/√2, and the third is √2.
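A sketch of that Newton iteration in Matlab, using the G and JG above. The starting guess is arbitrary (here it is chosen near the candidate just described); other guesses may converge to a different candidate or not at all:

v = [-0.7; -0.7; 1.4; 1; 0.35];          % v = [x1; x2; x3; lambda1; lambda2]
for k = 1:100
    x  = v(1:3);  l1 = v(4);  l2 = v(5);
    G  = [x.^2 - 2*l2*x - l1;            % the three gradient equations
          sum(x);                        % x1 + x2 + x3 = 0
          sum(x.^2) - 3];                % x1^2 + x2^2 + x3^2 = 3
    JG = [diag(2*x - 2*l2), -ones(3,1), -2*x;
          ones(1,3),         0,          0;
          2*x.',             0,          0];
    if norm(G) < 1e-12, break; end
    v  = v - JG \ G;                     % Newton step
end
v    % should approach x = (-1/sqrt(2), -1/sqrt(2), sqrt(2)), lambda1 = 1, lambda2 = sqrt(2)/4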
