
Lecture 6: 2.5 The Chain Rule.

The chain rule in one variable: Suppose that y = g(x) and z = f(y), i.e. z = h(x), where h(x) = f(g(x)) = (f ◦ g)(x). Then

\[
\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx} \quad\Longleftrightarrow\quad h'(x) = f'(y)\, g'(x)
\]
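As a quick sanity check, the identity can be tested numerically. The sketch below is only an illustration: the choices g(x) = sin x and f(y) = e^{2y}, the point x = 0.7 and the helper `derivative` are assumptions made for this example, not part of the lecture.

```python
import math

def g(x):
    return math.sin(x)          # inner function y = g(x) (illustrative choice)

def f(y):
    return math.exp(2.0 * y)    # outer function z = f(y) (illustrative choice)

def h(x):
    return f(g(x))              # composition h = f o g

def derivative(func, x, step=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + step) - func(x - step)) / (2.0 * step)

x = 0.7
y = g(x)

lhs = derivative(h, x)                      # h'(x)
rhs = derivative(f, y) * derivative(g, x)   # f'(y) g'(x), with y = g(x)
exact = 2.0 * math.exp(2.0 * math.sin(x)) * math.cos(x)  # derivative by hand

print(lhs, rhs, exact)
```

The three printed numbers should agree up to the error of the finite-difference approximation.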

The intuitive way to understand this is through the linear approximation:

△z = f(y + △y) − f(y) ∼ f ′(y)△y

which is the same as saying that f is differentiable. Similarly

△y = g(x + △x) − g(x) ∼ g ′(x)△x

and if we combine the two we get

△z ∼ f ′(y) g ′(x)△x

Since this is also equal to

△z = h(x + △x) − h(x) ∼ h′(x)△x

the chain rule in one variable follows.

The chain rule in several variables: Suppose that g : R^n → R^m and f : R^m → R^p, and let h = f ◦ g : R^n → R^p (i.e. h(x) = f(g(x))). Then

Dh(x0) = Df(y0) Dg(x0), where y0 = g(x0)

and the right hand side is the p × n matrix formed by the matrix product of the p × m matrix Df(y0) by the m × n matrix Dg(x0).
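The matrix form of the chain rule can also be checked numerically. The following is a minimal sketch under stated assumptions: the maps g : R^2 → R^3 and f : R^3 → R^2, the point x0 and the helpers `jacobian` and `matmul` are all made up for illustration, and the Jacobians are approximated by central differences rather than computed exactly.

```python
import math

def g(x):
    # Illustrative map g : R^2 -> R^3
    x1, x2 = x
    return [x1 * x2, x1 + x2, math.sin(x1)]

def f(y):
    # Illustrative map f : R^3 -> R^2
    y1, y2, y3 = y
    return [y1 + y2 * y3, y1 * y3]

def h(x):
    return f(g(x))  # h = f o g : R^2 -> R^2

def jacobian(func, point, step=1e-6):
    """Central-difference Jacobian; rows are outputs, columns are inputs."""
    n_out = len(func(point))
    jac = [[0.0] * len(point) for _ in range(n_out)]
    for j in range(len(point)):
        plus = list(point)
        minus = list(point)
        plus[j] += step
        minus[j] -= step
        f_plus, f_minus = func(plus), func(minus)
        for i in range(n_out):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2.0 * step)
    return jac

def matmul(a, b):
    """Product of a (p x m) and b (m x n), both given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

x0 = [0.5, -1.2]
y0 = g(x0)

Dh = jacobian(h, x0)                                   # 2 x 2
product = matmul(jacobian(f, y0), jacobian(g, x0))     # (2 x 3)(3 x 2)

max_gap = max(abs(Dh[i][j] - product[i][j])
              for i in range(2) for j in range(2))
print(max_gap)
```

Here p = 2, m = 3 and n = 2, so the product of the 2 × 3 and 3 × 2 matrices is 2 × 2, matching the shape of Dh(x0).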

The intuitive argument above generalizes to several variables just by replacing f′ by Df, etc., since differentiability of a function of several variables says that

g(x + △x) − g(x) ∼ Dg(x)△x.

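If h(t) = f(c(t)), where f : R^3 → R and c(t) = (x(t), y(t), z(t)) is a path or curve, then by the chain rule

\[
\frac{dh}{dt} = Df\, Dc =
\begin{bmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} & \dfrac{\partial f}{\partial z} \end{bmatrix}
\begin{bmatrix} \dfrac{dx}{dt} \\[1ex] \dfrac{dy}{dt} \\[1ex] \dfrac{dz}{dt} \end{bmatrix}
= \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt}
\]

The gradient of a function f : R^n → R is given by

\[
\operatorname{grad} f = \nabla f = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \cdots & \dfrac{\partial f}{\partial x_n} \end{bmatrix}
\]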

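This can also be expressed with the gradient notation:

\[
\frac{dh}{dt}(t) = \nabla f(c(t)) \cdot c'(t),
\]

where c′(t) = (x′(t), y′(t), z′(t)) is the velocity vector of the path. Note that c′(t) is tangent to the path c(t), which follows since it can be obtained as the limit as h → 0 of the vector between two close points on the curve:

\[
\frac{c(t+h) - c(t)}{h} = \left( \frac{x(t+h) - x(t)}{h},\ \frac{y(t+h) - y(t)}{h},\ \frac{z(t+h) - z(t)}{h} \right) \to (x'(t), y'(t), z'(t)).
\]

Ex. If z = x^2 + y^2, x = cos t and y = sin t, find dz/dt.

Sol. 1

\[
\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt} = 2x(-\sin t) + 2y\cos t = 2\cos t\,(-\sin t) + 2\sin t\cos t = 0.
\]

Sol. 2 z = x^2 + y^2 = cos^2 t + sin^2 t = 1, so dz/dt = 0.

Ex. If z = h(r, θ) = f(x, y), where x = g1(r, θ) = r cos θ and y = g2(r, θ) = r sin θ, then by the chain rule

\[
\left( \frac{\partial h}{\partial r},\ \frac{\partial h}{\partial \theta} \right)
= \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y} \right)
\begin{bmatrix} \dfrac{\partial g_1}{\partial r} & \dfrac{\partial g_1}{\partial \theta} \\[1ex] \dfrac{\partial g_2}{\partial r} & \dfrac{\partial g_2}{\partial \theta} \end{bmatrix},
\]

i.e.

\[
\frac{\partial h}{\partial r} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial r},
\qquad
\frac{\partial h}{\partial \theta} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial \theta} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial \theta},
\]

or, written shorter, since h and f are the same function expressed in different coordinates,

\[
\frac{\partial}{\partial r} = \frac{\partial x}{\partial r}\frac{\partial}{\partial x} + \frac{\partial y}{\partial r}\frac{\partial}{\partial y},
\qquad
\frac{\partial}{\partial \theta} = \frac{\partial x}{\partial \theta}\frac{\partial}{\partial x} + \frac{\partial y}{\partial \theta}\frac{\partial}{\partial y}.
\]

In this case the matrix is

\[
\begin{bmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\[1ex] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} \end{bmatrix}
= \begin{bmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{bmatrix}.
\]

Note that in general ∂r/∂x ≠ (∂x/∂r)^{-1}, whereas for a function of one variable it is true that dx/dy = (dy/dx)^{-1}. That this is true in one dimension follows from differentiating the identity f(f^{-1}(x)) = x, which gives f′(f^{-1}(x)) (f^{-1})′(x) = 1. The higher-dimensional analogue of this is the corresponding statement for the matrices:

\[
\begin{bmatrix} \dfrac{\partial r}{\partial x} & \dfrac{\partial r}{\partial y} \\[1ex] \dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial y} \end{bmatrix}
= \begin{bmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial x}{\partial \theta} \\[1ex] \dfrac{\partial y}{\partial r} & \dfrac{\partial y}{\partial \theta} \end{bmatrix}^{-1}
= \frac{1}{r} \begin{bmatrix} r\cos\theta & r\sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.
\]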
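The relation between the two coordinate Jacobians in the polar example can be checked numerically. This is a sketch only: the sample point (r, θ) = (2, 0.6) and the helpers `to_cartesian`, `to_polar` and `partial` are assumptions introduced for this illustration, and the partial derivatives are approximated by central differences.

```python
import math

def to_cartesian(r, theta):
    return r * math.cos(theta), r * math.sin(theta)

def to_polar(x, y):
    return math.hypot(x, y), math.atan2(y, x)

def partial(func, args, index, component, step=1e-6):
    """Central difference of one output component w.r.t. one input."""
    plus = list(args)
    minus = list(args)
    plus[index] += step
    minus[index] -= step
    return (func(*plus)[component] - func(*minus)[component]) / (2.0 * step)

r, theta = 2.0, 0.6          # sample point, chosen away from r = 0
x, y = to_cartesian(r, theta)

# J = d(x, y)/d(r, theta) and K = d(r, theta)/d(x, y); rows index outputs.
J = [[partial(to_cartesian, (r, theta), j, i) for j in range(2)] for i in range(2)]
K = [[partial(to_polar, (x, y), j, i) for j in range(2)] for i in range(2)]

# If K is the inverse matrix of J, then K J is the 2 x 2 identity.
KJ = [[sum(K[i][k] * J[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]

dr_dx = K[0][0]                  # partial r / partial x
one_over_dx_dr = 1.0 / J[0][0]   # reciprocal of partial x / partial r

print(KJ)
print(dr_dx, one_over_dx_dr)
```

The product K J should be close to the identity matrix, while the two numbers on the last line differ: inverting the whole matrix is not the same as inverting its entries one by one.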

2.6 The gradient and directional derivatives.

The gradient of a function f : R^3 → R is the vector

\[
\nabla f = \left( \frac{\partial f}{\partial x},\ \frac{\partial f}{\partial y},\ \frac{\partial f}{\partial z} \right)
\]

i.e. it is the matrix of partial derivatives Df written as a vector.

Consider the equation of a line in space ℓ(t) = x + tv, −∞ < t < ∞. The function h(t) = (f ◦ ℓ)(t) = f(x + tv) represents the function f restricted to the line. The directional derivative of f at x in the direction of a unit vector v is given by

\[
\frac{d}{dt} f(x + tv) \Big|_{t=0} = \nabla f(x) \cdot v.
\]

Here the equality follows from the chain rule: h′ = Dh = Df Dℓ = ∇f · ℓ′ = ∇f · v.
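The formula can be compared with a direct difference quotient along the line. The sketch below assumes an illustrative function f(x, y, z) = x^2 y + sin z, a sample point and a sample direction; none of these come from the lecture, and the gradient is written out by hand for this particular f.

```python
import math

def f(p):
    # Illustrative function f : R^3 -> R
    x, y, z = p
    return x * x * y + math.sin(z)

def grad_f(p):
    # Gradient of the illustrative f, computed by hand
    x, y, z = p
    return [2.0 * x * y, x * x, math.cos(z)]

def along_line(p, v, t):
    """Point p + t v on the line through p with direction v."""
    return [a + t * b for a, b in zip(p, v)]

p = [1.0, 2.0, 0.5]
w = [1.0, -2.0, 2.0]                          # direction, not yet normalized
length = math.sqrt(sum(c * c for c in w))     # |w| = 3
v = [c / length for c in w]                   # unit vector

step = 1e-6
numeric = (f(along_line(p, v, step)) - f(along_line(p, v, -step))) / (2.0 * step)
formula = sum(g * c for g, c in zip(grad_f(p), v))   # grad f(p) . v

print(numeric, formula)
```

Both numbers approximate the rate of change of f at p per unit distance travelled in the direction v.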

The reason we choose v to be a unit vector is that we want the directional derivative to represent the rate of change in different directions. Suppose that f represents the temperature at different points in space, and that a fly flies along the line above at unit speed. Then the change of temperature per unit time (or, equivalently, per unit distance) is the directional derivative.

The gradient points in the direction along which f increases the fastest. In fact ∇f · v = |∇f| |v| cos θ, where θ is the angle between ∇f and v, and for unit vectors v the maximum is attained when cos θ = 1, i.e. when v points in the direction of ∇f.

Suppose we are lost in the woods and want to reach a high hilltop to see where we are. However, we can only see a few feet in front of us because of the tall trees. In which direction should we go in order to reach a hilltop fast? The answer is that if we go in the direction of the gradient we are likely to reach a hilltop fast.
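The hiker's strategy can be sketched as repeatedly taking a small step in the direction of the gradient. Everything specific below is an assumption made for illustration: the height function with a single hilltop at (1, −0.5), the fixed step size `rate` and the number of steps. On a landscape with several hills this procedure would only find a nearby hilltop, not necessarily the highest one.

```python
def height(p):
    # Hypothetical landscape with one hilltop at (1, -0.5)
    x, y = p
    return 5.0 - (x - 1.0) ** 2 - 2.0 * (y + 0.5) ** 2

def grad_height(p):
    # Gradient of the hypothetical height function, computed by hand
    x, y = p
    return [-2.0 * (x - 1.0), -4.0 * (y + 0.5)]

def climb(start, rate=0.1, steps=100):
    """Take repeated small steps in the direction of the gradient."""
    p = list(start)
    for _ in range(steps):
        g = grad_height(p)
        p = [p[0] + rate * g[0], p[1] + rate * g[1]]
    return p

start = [-2.0, 2.0]
top = climb(start)
print(top, height(top))
```

With these choices the walker ends up very close to the hilltop, where the gradient vanishes and no direction leads further uphill.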

The gradient is normal to the tangent plane of the level surface:

Let f : R^3 → R and let (x0, y0, z0) be a point on the level surface S defined by f(x, y, z) = k, for some constant k. Then ∇f(x0, y0, z0) is normal to the level surface in the following sense: If v = c′(0) is a tangent vector to a path c(t) in S with c(0) = (x0, y0, z0), then ∇f(x0, y0, z0) · v = 0. In fact, since f(x(t), y(t), z(t)) = k it follows that

\[
0 = \frac{d}{dt} f(x(t), y(t), z(t)) = \nabla f(x(t), y(t), z(t)) \cdot c'(t),
\]

and setting t = 0 gives ∇f(x0, y0, z0) · v = 0.

Let S be a level surface f(x, y, z) = k. The tangent plane of S at a point (x0, y0, z0) of S is defined by the equation

∇f(x0, y0, z0) · (x − x0, y − y0, z − z0) = 0

In fact, (x, y, z) is in the tangent plane if (x, y, z) − (x0, y0, z0) is parallel to the plane and hence perpendicular to the normal ∇f(x0, y0, z0).
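As a worked illustration of the tangent plane equation (the surface and the point are chosen here only as an example), take the sphere f(x, y, z) = x^2 + y^2 + z^2 = 14 and the point (x0, y0, z0) = (1, 2, 3), which lies on it since 1 + 4 + 9 = 14:

```latex
\[
\nabla f(x, y, z) = (2x,\ 2y,\ 2z), \qquad \nabla f(1, 2, 3) = (2,\ 4,\ 6),
\]
\[
\nabla f(1, 2, 3) \cdot (x - 1,\ y - 2,\ z - 3) = 2(x - 1) + 4(y - 2) + 6(z - 3) = 0
\quad\Longleftrightarrow\quad x + 2y + 3z = 14.
\]
```

The normal (2, 4, 6) is parallel to the position vector (1, 2, 3), as expected for a sphere centered at the origin.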