
Math 51 Chapter 11 October 18, 2019

Goals: • Compute the gradient of a function of n variables

• Use the gradient to compute the tangent plane/line to the contour sets of f for n = 2, 3.

• Carry out the first steps of gradient descent with simple numbers and represent them on a contour plot.

1 The gradient and the linear approximation

The gradient of a function f : R^n → R is the vector-valued function

∇f = (∂f/∂x_1, ..., ∂f/∂x_n).

The linear approximation to f for x near a point a is:

f(x) ≈ f(a) + (∇f(a)) · (x − a).
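This formula can be checked numerically. The sketch below is illustrative and not part of the handout: the function f, the point a, and the finite-difference helper `grad` are all assumed choices, with the gradient estimated by central differences rather than computed by hand.

```python
# Hypothetical illustration: linear approximation f(x) ≈ f(a) + ∇f(a)·(x − a),
# with the gradient estimated by central finite differences.

def grad(f, a, h=1e-6):
    """Numerically estimate the gradient of f : R^n -> R at the point a."""
    g = []
    for i in range(len(a)):
        ap, am = list(a), list(a)
        ap[i] += h
        am[i] -= h
        g.append((f(ap) - f(am)) / (2 * h))
    return g

def linear_approx(f, a, x):
    """f(a) + ∇f(a) · (x − a)."""
    g = grad(f, a)
    return f(a) + sum(gi * (xi - ai) for gi, xi, ai in zip(g, x, a))

# Illustrative choice (not from the handout): f(x, y) = x^2 + y^2 near a = (1, 1).
f = lambda v: v[0]**2 + v[1]**2
print(linear_approx(f, [1.0, 1.0], [1.1, 0.9]))  # ≈ 2.0; exact value f(1.1, 0.9) = 2.02
```

The approximation is good because (1.1, 0.9) is close to a; it degrades as x moves away from a.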

Question 1. (clicker question) Use the gradient to compute the linear approximation of a function.

2 Gradients and level sets

Example 1. Here is the contour plot of the function f(x, y) = x^3 + xy + y^2.

(i) Compute the gradient ∇f of f as a function R^2 → R^2.

(ii) Evaluate the gradient at (1, 0), and draw an arrow on the contour plot representing the direction of the gradient at that point.

(iii) Same question for (2, −2) and (−2, −1).

(iv) Compare the direction of the gradient with the tangent line to the contour line at each point. What do you notice?

(v) Give a parametric equation for the tangent line to the curve x^3 + xy + y^2 = 8 at the point (2, −2).

The gradient of the function f : R^2 → R at a point a is perpendicular to the tangent line to the level set that passes through a.
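As a quick check (an informal sketch, not part of the handout), the gradient computations in Example 1 can be verified in code, with the partial derivatives worked out by hand:

```python
# f(x, y) = x^3 + x*y + y^2, from Example 1.
def grad_f(x, y):
    # ∇f = (3x^2 + y, x + 2y), computed by hand.
    return (3 * x**2 + y, x + 2 * y)

# Gradient at the three points from parts (ii) and (iii).
print(grad_f(1, 0))    # (3, 1)
print(grad_f(2, -2))   # (10, -2)
print(grad_f(-2, -1))  # (11, -4)

# Parts (iv)-(v): at (2, -2) the tangent direction to the contour is
# perpendicular to the gradient; (2, 10) works, since (10, -2)·(2, 10) = 0.
gx, gy = grad_f(2, -2)
print(gx * 2 + gy * 10)  # 0
# So a parametric tangent line is (x, y) = (2, -2) + t(2, 10).
```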

Example 2. Consider the surface S defined by z = 2x^2 − y^2. In other words, S is the level set at 0 of the function g = 2x^2 − y^2 − z.

© 2019 Stanford University Department of Mathematics. All rights reserved.

(i) Without any computation, describe in your own words what the tangent plane at (0, 0) should look like. Convince your neighbor.

(ii) Compute the gradient ∇g of the function g(x, y, z) = 2x^2 − y^2 − z.

(iii) Evaluate the gradient at a = (0, 0, 0). Find an equation for the plane through (0, 0, 0) with normal vector (∇g)(a).

(iv) Same questions for the point b = (−1, 0, 0).

The gradient of a function f : R^3 → R at a point a is perpendicular to the tangent plane to the level set of f that passes through a.
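A sketch of the computations in Example 2 (informal, not part of the handout): the gradient of g is worked out by hand, and each plane is written as n·(x, y, z) = d with n = ∇g at the given point. Note that g(−1, 0, 0) = 2 ≠ 0, so the point b does not lie on S itself; the plane computed there is simply the plane through b with normal vector (∇g)(b), as part (iv) asks.

```python
# g(x, y, z) = 2x^2 - y^2 - z, from Example 2; ∇g = (4x, -2y, -1) by hand.
def grad_g(x, y, z):
    return (4 * x, -2 * y, -1)

def plane_through(p):
    """Return (n, d): the plane through p with normal n = ∇g(p), written n·(x,y,z) = d."""
    n = grad_g(*p)
    d = sum(ni * pi for ni, pi in zip(n, p))
    return n, d

# At a = (0, 0, 0): normal (0, 0, -1), plane -z = 0, i.e. the plane z = 0.
print(plane_through((0, 0, 0)))    # ((0, 0, -1), 0)
# At b = (-1, 0, 0): normal (-4, 0, -1), plane -4x - z = 4.
print(plane_through((-1, 0, 0)))   # ((-4, 0, -1), 4)
```

The plane z = 0 at the origin matches the part (i) intuition: near (0, 0, 0) the saddle surface is flattest against the horizontal plane.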

3 Gradient descent

Theorem. Let f : R^n → R. If the gradient (∇f)(a) is nonzero, then the vector (∇f)(a) points in the direction in which f increases most rapidly. The vector −(∇f)(a) points in the direction in which f decreases most rapidly.
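The theorem can be checked numerically. This is an illustrative sketch, not part of the handout: the function f, the point a, and the sampling of directions are all assumed choices. We estimate the directional derivative along many unit directions and confirm that the winner is (approximately) ∇f(a)/‖∇f(a)‖.

```python
import math

# Illustrative choice (not from the handout): f(x, y) = x^2 + 3y^2 at a = (1, 1),
# where ∇f(a) = (2, 6) by hand.
f = lambda x, y: x**2 + 3 * y**2
a = (1.0, 1.0)
h = 1e-4

def rate(u):
    """Central-difference estimate of the directional derivative of f at a along u."""
    return (f(a[0] + h * u[0], a[1] + h * u[1])
            - f(a[0] - h * u[0], a[1] - h * u[1])) / (2 * h)

# Sample unit directions one degree apart and pick the fastest increase.
directions = [(math.cos(math.radians(k)), math.sin(math.radians(k)))
              for k in range(360)]
best = max(directions, key=rate)

# ∇f(a)/‖∇f(a)‖ = (2, 6)/√40 ≈ (0.316, 0.949); the winning direction is nearby.
print(best)
```

The winner is only as accurate as the one-degree sampling, but it clearly singles out the gradient direction among all 360 candidates.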

Example 3. Pick a point on the contour plot above, and move orthogonally to the contour lines, in the direction in which f is decreasing. Repeat for a few different points until you’ve convinced yourself that you always end up at a local minimum. Do the same in the direction in which f is increasing, and convince yourself you will end up at a local maximum.

Example 4. Let f(x, y) = x^2 + (y − 1)^2. The function f has a local minimum at the point x = (0, 1).

(i) Compute the gradient (∇f)(a) of f at the point a = (1, 2). Compare −(∇f)(a) with the displacement vector x − a.

(ii) Compute b = a − (0.1)(∇f)(a)/‖(∇f)(a)‖.

Example 4 describes the first step of an algorithm called gradient descent, which leads us step by step to a minimum of f. If we are instead looking for a maximum, we use (∇f)(a) at each step instead of −(∇f)(a). → Examples 11.3.5, 11.3.6.
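A minimal sketch of this procedure, using the f from Example 4 and repeating the normalized update b = a − (0.1)(∇f)(a)/‖(∇f)(a)‖ from part (ii); the iteration counts below are illustrative choices:

```python
import math

# f(x, y) = x^2 + (y - 1)^2 from Example 4; ∇f = (2x, 2(y - 1)) by hand.
def grad_f(x, y):
    return (2 * x, 2 * (y - 1))

def descend(a, step=0.1, iters=1):
    """Repeat the update a ← a − step · ∇f(a)/‖∇f(a)‖ from Example 4(ii)."""
    x, y = a
    for _ in range(iters):
        gx, gy = grad_f(x, y)
        norm = math.hypot(gx, gy)
        if norm < 1e-12:          # stop at a critical point
            break
        x, y = x - step * gx / norm, y - step * gy / norm
    return x, y

# One step from a = (1, 2), as in Example 4(ii):
print(descend((1, 2)))             # ≈ (0.929, 1.929)
# Fourteen steps land close to the minimum at (0, 1):
print(descend((1, 2), iters=14))   # ≈ (0.010, 1.010)
```

With a fixed normalized step the iterates eventually overshoot and oscillate around the minimum rather than converging to it exactly, which is why practical gradient descent shrinks the step size as it proceeds.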
