<<

How we’re going to generalize the chain rule to several variables. We’ll do it in stages start- ing today and finishing next time. First, we’re go- ing to turn x into a vector. Then, f is a scalar- f The chain rule, part 1 valued Rm → R, and x is actually a func- Math 131 Multivariate tion R →x Rm. Then, f ◦ x : R → R is defined by D Joyce, Spring 2014 (f ◦ x)(t) = f(x(t)). x RR- m The chain rule. First the one you know. @ We’ll start with the chain rule that you already @ @ f know from ordinary functions of one variable. It f ◦ x @ tells you how to find the of the com- @R ? f◦g f R position A −→ C of two functions, R → R and g df R → R. This composition is defined by (f ◦g)(x) = The question is, what’s the derivative in this f(g(x)) and illustrated by the commutative dia- dt generalization? gram After we’ve answered that question, we’ll make g RR- t into a vector t in Rn. Then x won’t be just a @ function R →x Rm, but a function Rn →x Rm, and @ n @ f f ◦ x : R → R is defined as (f ◦ x)(t) = f(x(t)). f ◦ g @ x @R ? Rn - Rm R @ @ @ f When we generalize this, we’ll use a slightly dif- f ◦ x @ ferent notation. We’ll let the independent variable @R ? be t instead of x, but we’ll use x instead of g, R so x is a function of t. Thus, f ◦ x is defined by Finally, we’ll make f into a vector-valued func- (f ◦ x)(t) = f(x(t)). f tion Rn → Rp to get full generality. Then f ◦ x : x RR- Rn → Rp is defined by (f ◦ x)(t) = f(x(t)). @ x @ Rn - Rm @ f @ f ◦ x @ @ @R ? @ f R f ◦ x @ @R ? p Using these symbols for the functions and using R the prime notation for , the chain rule says The first step in the case m = 2. To get a (f ◦ x)0(t) = f 0(x(t)) x0(t). better start on this, we’ll let m = 2. Then we’ve got a function x : R → R2. It’s made out of two We’ll have partial derivatives, so the chain rule in coordinate functions Leibniz’ notation is a better place to start. In that notation, it says x(t) = (x, y)(t) = (x(t), y(t)),

df df dx where x and y are real-valued functions of t. Then = . dt dx dt f : R2 → R is evaluated at (x(t), y(t)) to get the

1 composition f(x(t), y(t)). And we want to compute notation, dot products, and matrix multiplication. df the derivative in terms of the derivatives of f Note that he of f is dt ∂f ∂f  ∂f ∂f ∂f  (which are the two partial derivatives and ) ∇f = , ,..., , ∂x ∂y ∂x1 ∂x2 ∂xm dx dy and the derivatives of x and y, namely, and . dt dt while the derivative of x is Example 1. Let f(x, y) = x3 + xy + y4, and let dx dx dx  1 , 2 ,..., m x(t) = cos t and y(t) = sin t. Then the composition dt dt dt is which we can denote either Dx or x0. Then the f(x(t), y(t)) = cos3 t + cos t sin t + sin4 t. chain rule is the of these two vectors

df df Now, to compute , just differentiate the equation = (∇f) · x0. dt dt f(x, y) = x3 +xy +y4, but keep in mind that x and y are both functions of t. You get When treated as matrices, we can write

h ∂f ∂f ∂f i df 2 dx dx dy 3 dy Df = ... = 3x + y + x + 4y . ∂x1 ∂x2 ∂xm dt dt dt dt dt and Now, since  dx1  ∂f dt = 3x2 + y  dx2  ∂x Dx =  dt   ...  while dxm ∂f dt = x + 4y3, ∂y so we can also write the chain rule as a matrix mul- df tiplication of these last two matrices we can see that can be rewritten as dt df = Df Dx. df ∂f dx ∂f dy dt = + . dt ∂x dt ∂y dt Continued in part 2.

That last equation is the chain rule in this gen- Math 131 Home Page at eralization. The formal proof depends on the ordi- http://math.clarku.edu/~djoyce/ma131/ nary definition of derivative and the usual proper- ties of limits, but as this is a form of the chain rule, the proof has a lot of details.

The first step for general m. Of course, this generalizes to df ∂f dx ∂f dx ∂f dx = 1 + 2 + ··· + m dt ∂x1 dt ∂x2 dt ∂xm dt for general m, not just for m = 2. But let’s see how that can be written more concisely using vector

2