
Jim Lambers MAT 280 Semester 2009-10 Lecture 5 Notes

These notes correspond to Section 11.4 in Stewart and Section 2.3 in Marsden and Tromba.

Tangent Planes, Linear Approximations and Differentiability

Now that we have learned how to compute partial derivatives of functions of several independent variables, in order to measure their instantaneous rates of change with respect to these variables, we will discuss another essential application of derivatives: the approximation of functions by linear functions. Linear functions are the simplest to work with, and for this reason, there are many instances in which functions are replaced by a linear approximation in the context of solving a problem, such as solving a differential equation.

Tangent Planes and Linear Approximations

In single-variable calculus, we learned that the graph of a function f(x) can be approximated near a point x0 by its tangent line, which has the equation

y = f(x0) + f′(x0)(x − x0).

For this reason, the function Lf (x) = f(x0) + f′(x0)(x − x0) is also referred to as the linearization, or linear approximation, of f(x) at x0. Now, suppose that we have a function of two variables, f : D ⊆ ℝ² → ℝ, and a point (x0, y0) ∈ D. Furthermore, suppose that the first partial derivatives of f, fx and fy, exist at (x0, y0). Because the graph of this function is a surface, it follows that a linear function that approximates f near (x0, y0) would have a graph that is a plane. Just as the tangent line of f(x) at x0 passes through the point (x0, f(x0)), and has a slope that is equal to f′(x0), the instantaneous rate of change of f(x) with respect to x at x0, a plane that best approximates f(x, y) at (x0, y0) must pass through the point (x0, y0, f(x0, y0)), and the slopes of the plane in the x- and y-directions should be equal to the values of fx(x0, y0) and fy(x0, y0), respectively. Since a general linear function of two variables can be described by the formula

Lf (x, y) = A(x − x0) + B(y − y0) + C,

so that Lf (x0, y0) = C, and a simple differentiation yields

∂Lf/∂x = A,   ∂Lf/∂y = B,

we conclude that the linear function that best approximates f(x, y) near (x0, y0) is the linearization

Lf (x, y) = f(x0, y0) + ∂f/∂x(x0, y0)(x − x0) + ∂f/∂y(x0, y0)(y − y0).

Furthermore, the graph of this function is called the tangent plane of f(x, y) at (x0, y0). Its equation is

z − z0 = ∂f/∂x(x0, y0)(x − x0) + ∂f/∂y(x0, y0)(y − y0),

where z0 = f(x0, y0).

Example Let f(x, y) = 2x²y + 3y², and let (x0, y0) = (1, 1). Then f(x0, y0) = 5, and the first partial derivatives at (x0, y0) are

fx(1, 1) = 4xy∣x=1,y=1 = 4,   fy(1, 1) = 2x² + 6y∣x=1,y=1 = 8.

It follows that the tangent plane at (1, 1) has the equation

z − 5 = 4(x − 1) + 8(y − 1), and the linearization of f at (1, 1) is

Lf (x, y) = 5 + 4(x − 1) + 8(y − 1).
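This linearization is easy to check numerically. The following sketch (not part of the original notes) hard-codes f and Lf from the example and compares their values near (1, 1):

```python
# f(x, y) = 2x^2 y + 3y^2 and its linearization at (1, 1), taken from the
# example above; both are evaluated near (1, 1) to compare their values.

def f(x, y):
    return 2 * x**2 * y + 3 * y**2

def L_f(x, y):
    return 5 + 4 * (x - 1) + 8 * (y - 1)

for x, y in [(1.1, 1.1), (1.01, 1.01)]:
    # Print the point, both values, and the approximation error.
    print((x, y), f(x, y), L_f(x, y), f(x, y) - L_f(x, y))
```

The printed error shrinks roughly quadratically as the evaluation point approaches (1, 1).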

Let (x, y) = (1.1, 1.1). Then f(x, y) = 6.292, while Lf (x, y) = 6.2, for an error of 6.292 − 6.2 = 0.092. However, if (x, y) = (1.01, 1.01), then f(x, y) = 5.120902, while Lf (x, y) = 5.12, for an error of 5.120902 − 5.12 = 0.000902. That is, moving 10 times as close to (1, 1) decreased the error by a factor of over 100. □

Another useful application of a linear approximation is to estimate the error in the value of a function, given estimates of the errors in its inputs. Given a function z = f(x, y) and its linearization Lf (x, y) around a point (x0, y0), if x0 and y0 are measured values and dx = x − x0 and dy = y − y0 are regarded as errors in x0 and y0, then the error in z can be estimated by computing

dz = z − z0 = Lf (x, y) − f(x0, y0)

= [f(x0, y0) + fx(x0, y0)(x − x0) + fy(x0, y0)(y − y0)] − f(x0, y0)

= fx(x0, y0) dx + fy(x0, y0) dy.

The variables dx and dy are called differentials, and dz is called the total differential, as it depends on the values of dx and dy. The total differential dz is only an estimate of the error in z; the actual error is given by Δz = f(x, y) − f(x0, y0), when the actual errors in x and y, Δx = x − x0 and Δy = y − y0, are known. Since this is rarely the case in practice, one instead estimates the error in z from estimates dx and dy of the errors in x and y.

Example Recall that the volume of a cylinder with radius r and height ℎ is V = πr²ℎ. Suppose that r = 5 cm and ℎ = 10 cm. Then the volume is V = 250π cm³. If the measurement error in r and ℎ is at most 0.1 cm, then, to estimate the error in the computed volume, we first compute

Vr = 2πrℎ = 100π,   Vℎ = πr² = 25π.

It follows that the error in V is approximately

dV = Vr dr + Vℎ dℎ = 0.1(100π + 25π) = 12.5π cm³.

If we specify Δr = 0.1 and Δℎ = 0.1, and compute the actual volume using radius r + Δr = 5.1 and height ℎ + Δℎ = 10.1, we obtain

V + ΔV = π(5.1)²(10.1) = 262.701π cm³, which yields the actual error

ΔV = 262.701π − 250π = 12.701π cm³.

Therefore, the estimate of the error, dV , is quite accurate. □
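The comparison of dV with ΔV can be reproduced in a few lines; the following sketch hard-codes the measurements from the example:

```python
import math

# Cylinder volume V = pi r^2 h with r = 5, h = 10 and measurement errors
# dr = dh = 0.1, as in the example above.
r, h, dr, dh = 5.0, 10.0, 0.1, 0.1

V = math.pi * r**2 * h                                 # exact volume, 250*pi
dV = math.pi * (2 * r * h) * dr + math.pi * r**2 * dh  # estimate V_r dr + V_h dh
actual = math.pi * (r + dr)**2 * (h + dh) - V          # the true error, Delta V

print(dV / math.pi, actual / math.pi)  # approximately 12.5 and 12.701
```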

Functions of More than Two Variables

The concepts of a tangent plane and linear approximation generalize to more than two variables in a straightforward manner. Specifically, given f : D ⊆ ℝⁿ → ℝ and p0 = (x1^(0), x2^(0), . . . , xn^(0)) ∈ D, we define the tangent space of f(x1, x2, . . . , xn) at p0 to be the n-dimensional hyperplane in ℝⁿ⁺¹ whose points (x1, x2, . . . , xn, y) satisfy the equation

y − y0 = ∂f/∂x1(p0)(x1 − x1^(0)) + ∂f/∂x2(p0)(x2 − x2^(0)) + ⋅ ⋅ ⋅ + ∂f/∂xn(p0)(xn − xn^(0)),

where y0 = f(p0). Similarly, the linearization of f at p0 is the function Lf (x1, x2, . . . , xn) defined by

Lf (x1, x2, . . . , xn) = y0 + ∂f/∂x1(p0)(x1 − x1^(0)) + ∂f/∂x2(p0)(x2 − x2^(0)) + ⋅ ⋅ ⋅ + ∂f/∂xn(p0)(xn − xn^(0)).

The Gradient Vector

It can be seen from the above definitions that writing formulas that involve the partial derivatives of functions of n variables can be cumbersome. This can be addressed by expressing collections of partial derivatives of functions of several variables using vectors and matrices, especially for vector-valued functions of several variables.

By convention, a point p0 = (x1^(0), x2^(0), . . . , xn^(0)), which can be identified with the position vector p0 = ⟨x1^(0), x2^(0), . . . , xn^(0)⟩, is considered to be a column vector

p0 = [ x1^(0), x2^(0), . . . , xn^(0) ]ᵀ.

Also, by convention, given a function of n variables, f : D ⊆ ℝⁿ → ℝ, the collection of its partial derivatives with respect to all of its variables is written as a row vector

∇f(p0) = [ ∂f/∂x1(p0)  ∂f/∂x2(p0)  ⋅ ⋅ ⋅  ∂f/∂xn(p0) ].

This vector is called the gradient of f at p0. Viewing the partial derivatives of f as a vector allows us to use vector operations to describe, much more concisely, the linearization of f. Specifically, the linearization of f at p0, evaluated at a point p = (x1, x2, . . . , xn), can be written as

Lf (p) = f(p0) + ∂f/∂x1(p0)(x1 − x1^(0)) + ∂f/∂x2(p0)(x2 − x2^(0)) + ⋅ ⋅ ⋅ + ∂f/∂xn(p0)(xn − xn^(0))

       = f(p0) + Σᵢ₌₁ⁿ ∂f/∂xi(p0)(xi − xi^(0))

       = f(p0) + ∇f(p0) ⋅ (p − p0),

where ∇f(p0) ⋅ (p − p0) is the dot product, also known as the inner product, of the vectors ∇f(p0) and p − p0. Recall that given two vectors u = ⟨u1, u2, . . . , un⟩ and v = ⟨v1, v2, . . . , vn⟩, the dot product of u and v, denoted by u ⋅ v, is defined by

u ⋅ v = Σᵢ₌₁ⁿ ui vi = u1 v1 + u2 v2 + ⋅ ⋅ ⋅ + un vn = ∥u∥ ∥v∥ cos θ,

where θ is the angle between u and v.
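The compact form Lf (p) = f(p0) + ∇f(p0) ⋅ (p − p0) translates directly into code. The sketch below is an illustration, not part of the notes; as an assumption for simplicity, the gradient is approximated by central finite differences rather than computed symbolically.

```python
# Linearization of a scalar function of n variables via the gradient,
# L_f(p) = f(p0) + grad f(p0) . (p - p0), using plain Python lists.

def dot(u, v):
    # Dot product u . v = u1 v1 + ... + un vn.
    return sum(ui * vi for ui, vi in zip(u, v))

def grad(f, p0, h=1e-6):
    # Approximate each partial derivative by a central difference.
    g = []
    for i in range(len(p0)):
        plus = [x + (h if j == i else 0.0) for j, x in enumerate(p0)]
        minus = [x - (h if j == i else 0.0) for j, x in enumerate(p0)]
        g.append((f(plus) - f(minus)) / (2 * h))
    return g

def linearization(f, p0):
    g, f0 = grad(f, p0), f(p0)
    return lambda p: f0 + dot(g, [x - x0 for x, x0 in zip(p, p0)])

# Example: f(x, y) = x^2 y near (1, 1).
L = linearization(lambda p: p[0]**2 * p[1], [1.0, 1.0])
print(L([1.1, 1.1]))
```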

Example Let f : ℝ³ → ℝ be defined by f(x, y, z) = 3x²y³z⁴.

Then

∇f(x, y, z) = [ fx  fy  fz ] = [ 6xy³z⁴  9x²y²z⁴  12x²y³z³ ].

Let (x0, y0, z0) = (1, 2, −1). Then

∇f(x0, y0, z0) = ∇f(1, 2, −1) = [ fx(1, 2, −1)  fy(1, 2, −1)  fz(1, 2, −1) ] = [ 48  36  −96 ].

It follows that the linearization of f at (x0, y0, z0) is

Lf (x, y, z) = f(1, 2, −1) + ∇f(1, 2, −1) ⋅ ⟨x − 1, y − 2, z + 1⟩
            = 24 + ⟨48, 36, −96⟩ ⋅ ⟨x − 1, y − 2, z + 1⟩
            = 24 + 48(x − 1) + 36(y − 2) − 96(z + 1)
            = 48x + 36y − 96z − 192.

At the point (1.1, 1.9, −1.1), we have f(1.1, 1.9, −1.1) ≈ 36.5, while Lf (1.1, 1.9, −1.1) = 34.8. Because f is changing rapidly in all coordinate directions at (1, 2, −1), it is not surprising that the linearization of f at this point is not highly accurate. □
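The numbers in this example can be verified directly; the following sketch hard-codes f and ∇f from the example above.

```python
# f(x, y, z) = 3 x^2 y^3 z^4 and its gradient, from the example above.

def f(x, y, z):
    return 3 * x**2 * y**3 * z**4

def grad_f(x, y, z):
    return (6 * x * y**3 * z**4, 9 * x**2 * y**2 * z**4, 12 * x**2 * y**3 * z**3)

g = grad_f(1, 2, -1)
# Linearization at (1, 2, -1), evaluated at (1.1, 1.9, -1.1):
# offsets are (0.1, -0.1, -0.1) from the base point.
L = f(1, 2, -1) + g[0] * 0.1 + g[1] * (-0.1) + g[2] * (-0.1)
print(g, f(1.1, 1.9, -1.1), L)
```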

The Jacobian

Now, let f : D ⊆ ℝⁿ → ℝᵐ be a vector-valued function of n variables, with component functions

f(p) = [ f1(p), f2(p), . . . , fm(p) ]ᵀ,

where each fi : D → ℝ. Combining the two conventions described above, the partial derivatives of these component functions at a point p0 ∈ D are arranged in an m × n matrix

Jf (p0) = [ ∂f1/∂x1(p0)  ∂f1/∂x2(p0)  ⋅ ⋅ ⋅  ∂f1/∂xn(p0) ]
          [ ∂f2/∂x1(p0)  ∂f2/∂x2(p0)  ⋅ ⋅ ⋅  ∂f2/∂xn(p0) ]
          [      ⋮             ⋮                   ⋮      ]
          [ ∂fm/∂x1(p0)  ∂fm/∂x2(p0)  ⋅ ⋅ ⋅  ∂fm/∂xn(p0) ].

This matrix is called the Jacobian matrix of f at p0. It is also referred to as the derivative of f at p0, since it reduces to the derivative f′(x0) when f is a scalar-valued function of one variable. Note that the rows of Jf (p0) correspond to component functions, and the columns correspond to independent variables. This allows us to view Jf (p0) as the following collections of rows or columns:

Jf (p0) = [ ∇f1(p0) ]
          [ ∇f2(p0) ]
          [    ⋮    ]
          [ ∇fm(p0) ]  =  [ ∂f/∂x1(p0)  ∂f/∂x2(p0)  ⋅ ⋅ ⋅  ∂f/∂xn(p0) ].

The Jacobian matrix provides a concise way of describing the linearization of a vector-valued function, just as the gradient does for a scalar-valued function. The linearization of f at p0 is the

function Lf (p), defined by

Lf (p) = f(p0) + ∂f/∂x1(p0)(x1 − x1^(0)) + ⋅ ⋅ ⋅ + ∂f/∂xn(p0)(xn − xn^(0))

       = f(p0) + Σⱼ₌₁ⁿ ∂f/∂xj(p0)(xj − xj^(0))

       = f(p0) + Jf (p0)(p − p0),

where the expression Jf (p0)(p − p0) involves matrix multiplication of the matrix Jf (p0) and the vector p − p0. Note the similarity between this definition and the definition of the linearization of a function of a single variable. In general, given an m × n matrix A, that is, a matrix A with m rows and n columns, and an n × p matrix B, the product AB is the m × p matrix C, where the entry in row i and column j of C is obtained by computing the dot product of row i of A and column j of B. When computing the linearization of a vector-valued function f at the point p0 in its domain, the ith component function of the linearization is obtained by adding the value of the ith component function at p0, fi(p0), to the dot product of ∇fi(p0) and the vector p − p0, where p is the vector at which the linearization is to be evaluated.

Example Let f : ℝ² → ℝ² be defined by

f(x, y) = [ f1(x, y), f2(x, y) ]ᵀ = [ eˣ cos y, e⁻²ˣ sin y ]ᵀ.

Then the Jacobian matrix, or derivative, of f is the 2 × 2 matrix

Jf (x, y) = [ ∇f1(x, y) ]   [ (f1)x  (f1)y ]   [ eˣ cos y       −eˣ sin y  ]
            [ ∇f2(x, y) ] = [ (f2)x  (f2)y ] = [ −2e⁻²ˣ sin y   e⁻²ˣ cos y ].

Let (x0, y0) = (0, π/4). Then we have

Jf (x0, y0) = [ √2/2   −√2/2 ]
              [ −√2     √2/2 ],

and the linearization of f at (x0, y0) is

Lf (x, y) = [ f1(x0, y0) ] + Jf (x0, y0) [ x − x0 ]
            [ f2(x0, y0) ]               [ y − y0 ]

          = [ √2/2 ] + [ √2/2   −√2/2 ] [ x       ]
            [ √2/2 ]   [ −√2     √2/2 ] [ y − π/4 ]

          = [ √2/2 + (√2/2)x − (√2/2)(y − π/4) ]
            [ √2/2 − √2 x + (√2/2)(y − π/4)    ].

At the point (x1, y1) = (0.1, 0.8), we have

f(x1, y1) ≈ [ 0.76998, 0.58732 ]ᵀ,   Lf (x1, y1) ≈ [ 0.76749, 0.57601 ]ᵀ.

Because of the relatively small partial derivatives at (x0, y0), the linearization at this point yields a fairly accurate approximation at (x1, y1). □
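As with the scalar examples, these values can be reproduced in code; the sketch below hard-codes f and Jf from the example and expands the matrix-vector product componentwise.

```python
import math

# f(x, y) = (e^x cos y, e^{-2x} sin y) and its Jacobian, from the example.

def f(x, y):
    return (math.exp(x) * math.cos(y), math.exp(-2 * x) * math.sin(y))

def J(x, y):
    return ((math.exp(x) * math.cos(y), -math.exp(x) * math.sin(y)),
            (-2 * math.exp(-2 * x) * math.sin(y), math.exp(-2 * x) * math.cos(y)))

x0, y0 = 0.0, math.pi / 4

def L(x, y):
    # L_f(p) = f(p0) + J_f(p0) (p - p0), written out componentwise.
    (f1, f2), Jm = f(x0, y0), J(x0, y0)
    dx, dy = x - x0, y - y0
    return (f1 + Jm[0][0] * dx + Jm[0][1] * dy,
            f2 + Jm[1][0] * dx + Jm[1][1] * dy)

print(f(0.1, 0.8), L(0.1, 0.8))
```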

Differentiability

Before using a linearization to approximate a function near a point p0, it is helpful to know whether this linearization is actually an accurate approximation of the function in the first place. That is, we need to know if the function is differentiable at p0, which, informally, means that its instantaneous rate of change at p0 is well-defined. In the single-variable case, a function f(x) is differentiable at x0 if f′(x0) exists; that is, if the limit

f′(x0) = lim_{x→x0} [f(x) − f(x0)] / (x − x0)

exists. In other words, we must have

lim_{x→x0} [f(x) − f(x0) − f′(x0)(x − x0)] / (x − x0) = 0.

But f(x0) + f′(x0)(x − x0) is just the linearization of f at x0, so we can say that f is differentiable at x0 if and only if

lim_{x→x0} [f(x) − Lf (x)] / (x − x0) = 0.

Note that this is a stronger statement than simply requiring that

lim_{x→x0} [f(x) − Lf (x)] = 0,

because as x approaches x0, ∣1/(x − x0)∣ approaches ∞, so the difference f(x) − Lf (x) must approach zero particularly rapidly in order for the fraction [f(x) − Lf (x)]/(x − x0) to approach zero. That is, the linearization must be a sufficiently accurate approximation of f near x0 in order for f to be differentiable at x0.

This notion of differentiability is readily generalized to functions of several variables. Given f : D ⊆ ℝⁿ → ℝᵐ, and p0 ∈ D, we say that f is differentiable at p0 if

lim_{p→p0} ∥f(p) − Lf (p)∥ / ∥p − p0∥ = 0,

where Lf (p) is the linearization of f at p0.

Example Let f(x, y) = x²y. To verify that this function is differentiable at (x0, y0) = (1, 1), we first compute fx = 2xy and fy = x². It follows that the linearization of f at (1, 1) is

Lf (x, y) = f(1, 1) + fx(1, 1)(x − 1) + fy(1, 1)(y − 1) = 1 + 2(x − 1) + (y − 1) = 2x + y − 2.

Therefore, f is differentiable at (1, 1) if

lim_{(x,y)→(1,1)} ∣x²y − (2x + y − 2)∣ / ∥(x, y) − (1, 1)∥ = lim_{(x,y)→(1,1)} ∣x²y − (2x + y − 2)∣ / √((x − 1)² + (y − 1)²) = 0.

By rewriting this expression as

∣x²y − (2x + y − 2)∣ / √((x − 1)² + (y − 1)²) = ∣x − 1∣ ∣y(x + 1) − 2∣ / √((x − 1)² + (y − 1)²),

and noting that

lim_{(x,y)→(1,1)} ∣y(x + 1) − 2∣ = 0,   0 ≤ ∣x − 1∣ / √((x − 1)² + (y − 1)²) ≤ 1,

we conclude that the limit actually is zero, and therefore f is differentiable at (1, 1). □

There are three important conclusions that we can make regarding differentiable functions:

∙ If all partial derivatives of f at p0 exist, and are continuous, then f is differentiable at p0.

∙ Furthermore, if f is differentiable at p0, then it is continuous at p0. Note that the converse is not true; for example, f(x) = ∣x∣ is continuous at x = 0, but it is not differentiable there, because f ′(x) does not exist there.

∙ If f is differentiable at p0, then its first partial derivatives exist at p0. This statement might seem redundant, because the first partial derivatives are used in the definition of the linearization, but it is important nonetheless, because the converse of this statement is not true. That is, if a function’s first partial derivatives exist at a point, it is not necessarily differentiable at that point. The notion of differentiability is related to not only partial derivatives, which only describe how a function changes as one of its variables changes, but also the instantaneous rate of change of a function as its variables change along any direction. If a function is differentiable at a point, that means its rate of change along any direction is well-defined. We will explore this idea further in Lecture 7.
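To make the last point concrete, here is a standard counterexample (not from these notes, but widely used): f(x, y) = xy/(x² + y²), with f(0, 0) = 0, has both first partial derivatives at the origin, yet is not differentiable there, because it is not even continuous at (0, 0).

```python
# f vanishes on both coordinate axes, so fx(0,0) = fy(0,0) = 0 both exist.
# But along the line y = x, f is constantly 1/2 away from the origin, so f
# has no limit at (0, 0) and therefore cannot be differentiable there.

def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x**2 + y**2)

print([f(h, 0.0) for h in (0.1, 0.01, 0.001)])  # on the x-axis: all 0
print([f(h, h) for h in (0.1, 0.01, 0.001)])    # along y = x: all 0.5
```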

Practice Problems

1. Compute the equation of the tangent plane of f(x, y) = eˣ cos y + e⁻ˣ sin y at (x0, y0) = (−1, π/2). Then, use the linearization of f at this point to approximate the value of f(−1.1, 1.6). How accurate is this approximation?

2. Compute the equation of the tangent space of f(x, y, z) = (1/2) ln ∣(x − 1)² + (y + 2)² + z²∣ at (x0, y0, z0) = (1, 1, 1). Then, use the linearization of f at this point to approximate the value of f(1.01, 0.99, 1.05). How accurate is this approximation?

3. Let f(x, y) = x² + y². Use the definition of differentiability to show that this function is differentiable at (x0, y0) = (1, 1).

4. Suppose that the coordinates of two points (x1, y1) = (2, −3) and (x2, y2) = (7, −5) are obtained by measurements, for which the maximum error in each is 0.01. Estimate the maximum error in the distance between the two points.

Additional Practice Problems

Additional practice problems from the recommended textbooks are:

∙ Stewart: Section 11.4, Exercises 1-5 odd, 15-27 odd

∙ Marsden/Tromba: Section 2.3, Exercises 5, 9, 15
