Summary of

William G. Faris

December 6, 2004

Chapter 1

Functions

1.1 A catalog of functions

A function f takes an input number in its domain and gives an output number in its range. If for each output number in the range there is only one corresponding number in the domain, then the function f has an inverse function f^{−1}. That is, if y = f(x), then x is defined in terms of y by x = f^{−1}(y). The domain of the inverse function is the range of the original function, and the range of the inverse function is the domain of the original function.

Sometimes a function may not have an inverse function, but by restricting it to a smaller domain it will have an inverse function. In that case, the inverse function is determined by solving y = f(x) for x, with x in this restricted domain. For example, the squaring function y = x^2 is naturally defined on the domain of all real numbers x, and it does not have an inverse function. However if we restrict the squaring function to the smaller domain of all x ≥ 0, then there is an inverse function x = y^{1/2}, that is, taking the positive square root.

In mathematics it is customary to describe a function by what it does on input values. In a few cases there are explicit names for the functions. For instance, many calculators and computer languages have notations such as square and sqrt that describe the function itself. Thus for example the square function is the function that sends x to square(x) = x^2. Similarly, the square root function sqrt sends x to sqrt(x) = √x = x^{1/2}.

A function is often described by a graph, where ordinarily the horizontal axis represents the input, and the vertical axis represents the output. For the graph to describe a function, it must have the property that every vertical line intersects the graph at most once. For the graph to describe a function with an inverse, it must have the property that every horizontal line intersects the graph at most once. The graph of the inverse function is obtained by interchanging the roles of horizontal and vertical.
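As a small computational aside (a Python sketch; the function names are illustrative, not from the text), the squaring function restricted to x ≥ 0 and the positive square root undo each other:

```python
import math

def square(x):
    # the squaring function, here restricted to the domain x >= 0
    return x * x

def square_inverse(y):
    # the positive square root, the inverse of square on that domain
    return math.sqrt(y)

# round trip on the restricted domain: x -> x^2 -> x
roundtrip = square_inverse(square(3.0))
```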
For a first example of a function, fix a number p called the power. If x is the input number, then xp is the output number. This power function may be

defined on the domain consisting of all numbers x > 0, that is, on the interval (0, +∞). If p > 0 it may be defined on a larger domain consisting of all real numbers x ≥ 0, that is, on the interval [0, +∞). In either case it sends the domain to itself. With these domains the function has an inverse function. Since y = x^p implies x = y^{1/p}, this inverse function sends y to y^{1/p}. Thus the inverse function of the pth power function is the 1/pth power function.

In some circumstances the power function may be defined on a larger domain. Say that p = n/k, where n and k are integers, and k is odd. Then the domain may be extended to include all numbers x < 0, that is, the interval (−∞, 0) is a subset of the domain. If n is odd, then the function has range equal to the domain. If n is odd, the inverse function sends y to y^{1/p}, where 1/p = k/n. If n is even, then the range is a subset of [0, ∞). In this case the function with this larger domain has no inverse function.

For a second example, fix a number a > 0 called the base. To make the function interesting, take a ≠ 1. If x is the input number, then a^x is the output number. This is the exponential function with base a. The domain consists of all real numbers, and the range consists of the interval (0, +∞). Since y = a^x is equivalent to x = log_a(y), the inverse of the exponential function with base a is the logarithm function with base a.

The most common choices of base are 2, e, and 10. The almost universal choice in calculus contexts is a = e. In this case, the exponential function is sometimes denoted exp, and the logarithm function is often denoted ln. Thus

exp^{−1} = ln. (1.1)

The reason for this choice is that e is the only base for which the exponential function satisfies the inequality

1 + x ≤ e^x (1.2)

for all x. All the other exponential functions may be defined in terms of the one with base e. This is because a^x = (e^{ln(a)})^x = e^{ln(a)x}.
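The reduction a^x = e^{ln(a)x}, and the companion base-change formula for logarithms derived next, can be checked numerically. A Python sketch (the function names are illustrative):

```python
import math

def exp_base(a, x):
    # a^x computed from the base-e exponential: a^x = e^(ln(a) x)
    return math.exp(math.log(a) * x)

def log_base(a, x):
    # log_a(x) computed from the base-e logarithm: ln(x)/ln(a)
    return math.log(x) / math.log(a)
```

All that is needed in either direction is the single constant ln(a).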
All one has to know is the numerical value of the constant ln(a). Similarly, if y = log_a(x), then x = a^y = e^{ln(a)y}, so ln(a)y = ln(x), that is, log_a(x) = ln(x)/ln(a). Again the same constant is involved. Notice that the inequality for other bases a > 1 takes the form

1 + ln(a)t ≤ a^t (1.3)

for all t. The reason for the simplicity in the case of base e is that ln(e) = 1.

The next example is that of the trigonometric functions. These functions will always be defined with radians as inputs. (Degrees may be converted to radians by multiplying by π/180.) The functions sin and cos have domain consisting of all real numbers and range consisting of the interval [−1, 1]. With these natural domains they do not have inverse functions. However if one restricts sin to [−π/2, π/2] and cos to [0, π], then the restricted functions have inverses, called arcsin and arccos. Thus

sin^{−1} = arcsin (1.4)

and

cos^{−1} = arccos. (1.5)

The function tan is defined by dividing the output of the sin function by the output of the cosine function. Since the cosine is zero at odd multiples of π/2, the domain of the tangent function must exclude these points. The tangent function does not have an inverse unless it is restricted to a smaller interval, and the natural restriction is to (−π/2, π/2). Then the inverse function is

tan^{−1} = arctan. (1.6)

A final example of a function is a constant function. This gives the constant output c for every input.

1.2 Exponentials beat powers

An exponential function will always be larger than a power function for sufficiently large input values. In fact, from 1 + x ≤ e^x we see that for x ≥ 0 we have (1 + x)^n ≤ e^{nx} for each n = 1, 2, 3, .... We may set t = nx and get

(1 + t/n)^n ≤ e^t (1.7)

for t ≥ 0. This says that the exponential function grows at least as fast as an nth degree polynomial. For example, when n = 2 we have 1 + t + (1/4)t^2 ≤ e^t for t ≥ 0. From 1 + x ≤ e^x we get 1 + ln(y) ≤ y for all y > 0. Set y = s^p for p > 0. This gives 1 + p ln(s) ≤ s^p, or

ln(s) ≤ (1/p)(s^p − 1) (1.8)

for s > 0. This says that the logarithm function grows no faster than a pth power function. For example, when p = 1/2 we have ln(s) ≤ 2(√s − 1) for s > 0.
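Inequality (1.8) can be spot-checked numerically over a spread of inputs. A Python sketch (illustrative sample points):

```python
import math

def log_upper_bound(s, p):
    # right hand side of inequality (1.8): (1/p)(s^p - 1)
    return (s ** p - 1) / p

# ln(s) <= (1/p)(s^p - 1) should hold for all s > 0 and p > 0
holds = all(
    math.log(s) <= log_upper_bound(s, p)
    for s in (0.1, 0.5, 1.0, 2.0, 100.0, 1e6)
    for p in (0.5, 1.0, 2.0)
)
```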

1.3 Combining functions

Functions may be combined by addition, subtraction, multiplication, and division. This is done by performing the corresponding operation on the output values. Thus the value of the sum or difference function f ± g at x is

(f ± g)(x) = f(x) ± g(x). (1.9)

The value of the product function f · g at x is

(f · g)(x) = f(x) · g(x). (1.10)

The value of the quotient function f/g at x is

(f/g)(x) = f(x)/g(x). (1.11)

Here the domain is restricted to those x for which the denominator g(x) ≠ 0.

Start with the constant functions and the first power function. If we repeatedly apply the sum, difference, and product operations, we obtain the polynomial functions. If in addition we apply the quotient operations we obtain the rational functions.

Another method of combining functions is composition. The composition f ∘ g is defined by

(f ∘ g)(x) = f(g(x)). (1.12)

Thus the output of g becomes the input of f. In other words, the output is obtained by first applying g and then applying f. The order in which functions are composed is important. Thus, for example, the function sin ∘ square with input x has output sin(x^2). On the other hand, the function square ∘ sin with input x has output (sin(x))^2. In general, the composition f ∘ g represents the action of g followed by the action of f.

The notion of inverse function is closely related to the notion of composition. If f^{−1} is the inverse function to f, so that y = f(x) is equivalent to x = f^{−1}(y), then f^{−1}(f(x)) = x and f(f^{−1}(y)) = y. Thus arcsin(sin(x)) = x and sin(arcsin(y)) = y, where −π/2 ≤ x ≤ π/2 and −1 ≤ y ≤ 1.

It is extremely important to avoid confusion between the notation g^{−1}(x) for inverse function and the notation g(x)^{−1} = 1/g(x) for reciprocal function. These both play an important role in calculus, and the fact that the notation is essentially the same requires constant vigilance. Thus it is often useful in division problems to note that the reciprocal (multiplicative inverse)

1/g(x) = g(x)^{−1} (1.13)

is the composition where one first applies g and then the −1 power function. This has nothing to do with the inverse function of g.

There are some rather simple compositions that are important in practice. The first is the shift. Say that f is a given function. Then we can shift it up by k if we define a new function with output y satisfying y − k = f(x), or y = f(x) + k. We can shift it right by h if we define a new function with output y satisfying y = f(x − h). A nice example of the latter is when cos is changed to sin by shifting to the right by π/2.

Another is the stretch. Say that f is a given function. Then we can stretch it vertically by c > 0 if we define a new function with output y satisfying y/c = f(x), or y = cf(x). We can stretch it horizontally by c > 0 if we define a new function with output y satisfying y = f(x/c).

Another one is the reflection. Say that f is a given function. Then we can reflect it vertically if we define a new function with output y satisfying y = −f(x). We can reflect it horizontally if we define a new function with output y satisfying y = f(−x).

A function is even if it is unchanged by horizontal reflection, that is, if f(x) = f(−x) for all x. The standard example is f = cos. Another example is the pth power where p = n/k with k odd and n even. A function is odd if horizontal reflection is the same as vertical reflection, that is, if f(−x) = −f(x) for all x. The standard example is sin. Another example is the pth power where p = n/k with k odd and n odd.
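The order-of-composition point can be made concrete. A Python sketch (the helper compose is ours, not from the text):

```python
import math

def compose(f, g):
    # (f o g)(x) = f(g(x)): apply g first, then f
    return lambda x: f(g(x))

square = lambda x: x * x
sin_after_square = compose(math.sin, square)   # x -> sin(x^2)
square_after_sin = compose(square, math.sin)   # x -> (sin(x))^2
```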

1.4 Units and dimensions

In applications many quantities are measured in units. Thus length may be measured in meters, time in seconds, and mass in kilograms. In general units of length, time, and mass are indicated by L, T, and M. The specification of length, time, mass, and so on is called the dimensions of the quantity.

The formula V = s^3 for the volume of a box of side s is a good example. In this formula the input s has dimension L of length, while the output V has dimensions L^3. Thus if s is measured in meters, then V is measured in cubic meters. Similarly, velocity has units L/T of length over time, acceleration has units L/T^2 of length over time squared, and force has units F = ML/T^2 of mass times length over time squared. Velocity could be measured in meters per second, acceleration in meters per second squared, and force in newtons. Other important physical quantities are energy with dimensions FL and power with dimensions FL/T. The unit of energy is the joule, while the unit of power is the watt. (Thus a joule is a watt-second, while 3.6 megajoules is a kilowatt-hour.)

In applications we have relations like u = Ae^{kt}, where t is time. This defines u as a function of t. Here u is some positive quantity that grows exponentially with time (if k > 0) or decays exponentially with time (if k < 0). The input to the exponential function must be independent of the units, that is, dimensionless. Therefore, if t is measured in seconds, the continuous growth rate k must be measured in inverse seconds, so that kt is a dimensionless number. That is, the growth rate has dimension 1/T of inverse time.

One modification of this scheme is to have u = Ae^{k(t−t0)}. In this case t0 is a starting time. Again this defines u as a function of t, but now A is the value of u at the starting time t0. However this reduces to the previous case, if we write

u = Ae^{k(t−t0)} = A1 e^{kt}, (1.14)

where A1 = Ae^{−kt0}. Notice that the output of the exponential function is also dimensionless, so the dimension of A1 is the same as the dimension of A. In applications it is common to pick a base for the exponential function that is adapted to the particular situation. The idea is to choose a convenient unit of time ∆t > 0. Thus ∆t could be a half hour or 30 minutes. Define

a = e^{k∆t}. (1.15)

Then

a^{t/∆t} = e^{kt}. (1.16)

The reason for the choice of base a is that it is the growth factor for the chosen time unit ∆t. Conversely, one can choose a growth factor such as a = 2; then the time unit ∆t is the doubling time. (These concepts have to be modified in an obvious way if k is negative. Then the growth factor becomes a decay factor, and the doubling time becomes a half life.) Often one writes a = 1 + r, where r is the increase over 1. If k∆t is small, then since e^{k∆t} ≈ 1 + k∆t, it follows that r ≈ k∆t. It is not surprising that people confuse these two quantities.

A more interesting case where the same ideas apply is when we have a relation like u = A cos(ωt), where t is time. This defines u as a function of t. The input to the cosine function is in radians and therefore must be independent of the units, that is, dimensionless. Therefore, if t is measured in seconds, the angular frequency ω must be measured in radians per second, so that the phase ωt is a dimensionless number. That is, the angular frequency ω has dimension 1/T of inverse time.

One modification of this scheme is to have u = A cos(ω(t − t0)). In this case t0 is the time shift. Again this defines u as a function of t, but now A is the value of u at the starting time t0. However this reduces to something more like the previous case, if we write

u = A1 cos(ωt) + B1 sin(ωt), (1.17)

where A1 = A cos(ωt0) and B1 = A sin(ωt0). Notice that the outputs of the sine and cosine functions are also dimensionless, so the dimensions of A1 and B1 are the same as the dimension of A. Sometimes we write φ = ωt0 to get a dimensionless constant called the phase shift. In this notation u = A cos(ωt − φ).

There are several variations on these notions. The frequency ν is related to the angular frequency by

ω = 2πν. (1.18)

This is because radians are being accumulated faster than complete rotations. Often the frequency is measured in hertz (another name for inverse seconds). The frequency ν is related to the period T by

ν = 1/T. (1.19)

Therefore the angular frequency ω is related to the period by

ω = 2π/T. (1.20)

The same ideas apply to a wave with variation in some direction in space rather than in time. Thus we might have a relation u = A cos(kx), where x is distance. This defines u as a function of x. The input to the cosine function must be independent of the units, that is, dimensionless. Therefore, if x is measured in meters, the wave number k must be measured in inverse meters, so that kx is a dimensionless number. That is, the wave number has dimension 1/L of inverse length. The wave number is related to the wavelength λ by

k = 2π/λ. (1.21)

1.5 Constant speed wave propagation

For light waves (and also for sound waves) there is a remarkable relation between these quantities. Let c be the speed of these waves. One possible wave has the form

cos(kx − ωt) = cos(kx) cos(ωt) + sin(kx) sin(ωt). (1.22)

For simplicity assume that ω and k are positive. (Otherwise use absolute values.) Then the speed c satisfies

ω = ck. (1.23)

This immediately gives

νλ = c. (1.24)

The frequency times the wavelength is the speed. Frequency is measured in hertz (inverse seconds), while wavelength is measured in meters. The speed c is of course measured in meters per second.

The speed of light is about c = 3 · 10^8 meters per second. Wavelengths of 3 km, 300 m, 30 m, 3 m, 30 cm correspond to radio frequencies of 100 kilohertz, 1 megahertz, 10 megahertz, 100 megahertz, and 1 gigahertz. These are what are called the LF, MF, HF, VHF, and UHF radio bands. As the frequencies get higher and the wavelengths get shorter one gets into the radar and microwave range and eventually to the infrared at about 3 · 10^{−6} meter wavelength. Then 3 · 10^{−7} meters is already in the ultraviolet.

The speed of sound is about c = 3.3 · 10^2 meters per second. Thus a frequency of 330 hertz (cycles per second) corresponds to a wave of length about one meter. This is about the size of a low string instrument. If you triple the frequency and divide the length by three, you get something more like a violin or a flute.
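The relation νλ = c reproduces the band examples above. A Python sketch (approximate speeds, as in the text):

```python
c_light = 3e8    # speed of light, meters per second (approximate)
c_sound = 330.0  # speed of sound, meters per second (approximate)

def wavelength(c, frequency):
    # frequency times wavelength equals speed, so lambda = c / nu
    return c / frequency
```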

1.6 Appendix: Metric units

Fundamental metric units for length, mass, and time are the meter, kilogram, and second, abbreviated by m, kg, and s. The units minute, hour, day are abbreviated min, hr, d. Sometimes radian is abbreviated by rad. Other units for frequency, force, energy, and power are the hertz (inverse second), newton (kilogram meter per second squared), joule (newton meter), and watt (joule per second), abbreviated by Hz, N, J, W.

The multipliers 10^{−9}, 10^{−6}, 10^{−3} are called nano, micro, and milli, and are abbreviated as n, µ, and m. Sometimes 10^{−2} is called centi and abbreviated as c. The multipliers 10^3, 10^6, 10^9 are called kilo, mega, and giga and are abbreviated by k, M, G. Thus for example a kilowatt-hour is 1 kW hr = 3.6 MJ.

The situation is quite confused in the non-metric world. The question is about the choice of the unit of mass. Some authors like to use the foot, pound-mass, poundal system. Then the unit of mass is the pound-mass, and the unit of force is the poundal. Others use the foot, slug, pound-force system. Then the unit of mass is the slug, and the unit of force is the pound-force.

A foot is 0.3048 meter. A pound-mass is 0.45359 kilogram. (A kilogram is thus about 2.2 pound-mass.) In the foot, pound-mass, second system the unit of force is the poundal, which is a pound-mass foot per second squared. This is 0.45359 kilogram times 0.3048 meter per second squared, which works out to be 0.13825 newton.

The acceleration of gravity at the earth's surface is 9.80665 meter per second squared, or 32.1740 foot per second squared. A pound-force is a pound-mass times the acceleration of gravity, that is, 32.1740 poundal. In the foot, slug, second system a slug is 32.1740 pound-mass, or 14.594 kilogram. The unit of force is the pound-force, which is 32.1740 poundal, or 4.4482 newton.

Chapter 2

The Derivative

2.1 Limits

Say that g is a function. We say that lim_{t→c} g(t) = L provided that g(t) can be made as close as one wishes to L by taking t sufficiently close to c. In other words, one can get an output of g arbitrarily close to L by taking the input close enough to c. The official definition of lim_{t→c} g(t) = L is that for every number ε > 0 (no matter how small), there is a number δ > 0 such that for every number t ≠ c, if |t − c| < δ, then |g(t) − L| < ε.

2.2 A limit for the exponential function

We have seen that 1 + x ≤ e^x for all x. It follows that 1 − x ≤ e^{−x} for all x. If x < 1, then 1 − x > 0, and so we may write this in the equivalent form e^x ≤ 1/(1 − x). This proves the result

1 + h ≤ e^h ≤ 1/(1 − h) (2.1)

for all h < 1. As a consequence

h ≤ e^h − 1 ≤ h/(1 − h) (2.2)

for all h < 1. From this we see that

1 ≤ (e^h − 1)/h ≤ 1/(1 − h) (2.3)

for 0 < h < 1 and

1/(1 − h) ≤ (e^h − 1)/h ≤ 1 (2.4)

for h < 0. In either case, if h is close to 0, then 1/(1 − h) is close to 1, and hence by the inequality (e^h − 1)/h is close to 1. This proves

lim_{h→0} (e^h − 1)/h = 1. (2.5)

Notice that one could prove in the same way that

lim_{t→0} (a^t − 1)/t = ln(a). (2.6)

The reason for the simplicity of the base e case is that ln(e) = 1.

This limit helps clarify the meaning of the continuous growth rate k. Recall that for each time interval ∆t there is a corresponding growth rate r_{∆t} such that

(1 + r_{∆t})^{t/∆t} = e^{kt}. (2.7)

It follows that

r_{∆t} = e^{k∆t} − 1 (2.8)

and so

r_{∆t}/∆t = k (e^{k∆t} − 1)/(k∆t). (2.9)

Let ∆t → 0. Then also k∆t → 0. So

lim_{∆t→0} r_{∆t}/∆t = k. (2.10)

This says that the continuous growth rate is found by comparing the growth during a small time interval to the length of the time interval. So if the growth rate is an increase by 0.003 (or 0.3 percent) in 1/100 year, then the continuous growth rate k = 0.29955 is close to 0.003/(1/100) = 0.30 per year. This should be contrasted with the actual growth in a year. This is by a factor of (1.003)^100 = 1.349. This of course is the same as e^{0.29955} = 1.349. So in a year the growth is an increase of 34.9 percent.
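The growth-rate arithmetic in this example can be reproduced directly. A Python sketch:

```python
import math

r = 0.003    # growth over one interval of 1/100 year
dt = 0.01    # interval length in years

# continuous growth rate k with e^(k dt) = 1 + r
k = math.log(1 + r) / dt

# growth over a full year: (1.003)^100, the same as e^k
yearly = (1 + r) ** 100
```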

2.3 Limits for the sine and cosine functions

Look at the circle centered at the origin with radius one. Consider h with 0 < h < π/2. Then (cos(h), sin(h)) is the point on the circle corresponding to angle h in radians. The distance along the circle from the point (1, 0) to the point (cos(h), sin(h)) is h. From this it is clear that sin(h) ≤ h for 0 < h < π/2. Now consider the area of the sector inside the circle corresponding to angle 0 to angle h. This is (1/2)h. Compare this to the area of the triangle running from (0, 0) to (1, 0) to (1, tan(h)). This area is (1/2) tan(h), so we have (1/2)h ≤ (1/2) tan(h). From the above inequalities we see that

cos(h) ≤ sin(h)/h ≤ 1 (2.11)

for 0 < |h| < π/2. This is true for both positive and negative values of h, since these are all even functions. The conclusion is the important limit result

lim_{h→0} sin(h)/h = 1. (2.12)

We can get a result for cosine from the result for sine and from cos^2(h) + sin^2(h) = 1. Write this in the form sin^2(h) = (1 + cos(h))(1 − cos(h)). From this we get

sin^2(h)/h^2 = (1 + cos(h)) (1 − cos(h))/h^2. (2.13)

Consider the limit as h → 0. The left hand side has limit 1, while the first factor on the right hand side has limit 2. This proves

lim_{h→0} (cos(h) − 1)/h^2 = −1/2. (2.14)

As a corollary we get that

lim_{h→0} (cos(h) − 1)/h = 0. (2.15)

2.4 Continuity

A function f is continuous provided that for all x in the domain we have

lim_{t→x} f(t) = f(x). (2.16)

In other words, the output of f is arbitrarily close to f(x) if the input is close enough to x. Here is another way of saying the same thing. A function f is continuous provided that for all x in the domain we have

lim_{h→0} f(x + h) = f(x). (2.17)

2.5 The derivative

A function f is differentiable provided that for all x in the domain we have

lim_{h→0} (f(x + h) − f(x))/h = f'(x). (2.18)

The value f'(x) is the derivative of the function f at the input value x. In other words, the difference quotient may be made arbitrarily close to the derivative by making the denominator sufficiently close to zero. The classic example is f(x) = x^2. We can write the difference quotient as

((x + h)^2 − x^2)/h = 2x + h. (2.19)

The left hand side is defined for h ≠ 0. However it is evident from the right hand side that

lim_{h→0} ((x + h)^2 − x^2)/h = 2x. (2.20)

So f'(x) = 2x. Another simple example is g(u) = 1/u. We can write

(1/(u + h) − 1/u)/h = −1/(u(u + h)). (2.21)

The left hand side is defined for h ≠ 0. However it is evident from the right hand side that

lim_{h→0} (1/(u + h) − 1/u)/h = −1/u^2. (2.22)

So g'(u) = −1/u^2. Say that y = f(x). Then the derivative may also be written in the form

f'(x) = lim_{∆x→0} ∆y/∆x, (2.23)

where ∆x ≠ 0 is the change in x between two points and ∆y = f(x + ∆x) − f(x) is the corresponding change in y between the same two points. This makes explicit the fact that the derivative is a limit of quotients of differences. A common notation is the Leibniz notation, in which the derivative is written

dy/dx = lim_{∆x→0} ∆y/∆x. (2.24)

Since the two points become closer and closer, the derivative dy/dx depends only on one point. The fact that there are two notations in calculus is one of its most confusing aspects. However the relation between the two notations is quite precise. If

y = f(x) (2.25)

then

dy/dx = f'(x). (2.26)

Here f and f' are functions, with f' the derivative of f, while x and y are the input and output variables. Thus the two equations above are actually equations involving the outputs of the functions. That is, dy/dx is the output of the derivative function f' at the input x. If we want to evaluate y or dy/dx at the point where the input variable x is a, then this value is y(x=a) = f(a) or dy/dx(x=a) = f'(a). The interpretation of dy/dx is as the rate of change of y with respect to change in x. Thus if dy/dx = f'(x) > 0, then y is increasing with x, while if dy/dx = f'(x) < 0, then y is decreasing with x. If the value of dy/dx at

a particular point x = a is zero, that is, if dy/dx(x=a) = f'(a) = 0, then y is hesitating at this point as x changes, undecided whether to increase or decrease. Return to the examples. Suppose that y = x^2. We can write the difference quotient as

∆y/∆x = ((x + ∆x)^2 − x^2)/∆x = (2x∆x + (∆x)^2)/∆x = 2x + ∆x. (2.27)

The left hand side is defined for ∆x ≠ 0. However it is evident from the right hand side that

lim_{∆x→0} ((x + ∆x)^2 − x^2)/∆x = 2x. (2.28)

So dy/dx = 2x. Notice in connection with this example that (∆x)^2 must be distinguished from ∆x^2 = 2x∆x + (∆x)^2. Another simple example is w = 1/u. We can write

∆w/∆u = (1/(u + ∆u) − 1/u)/∆u = −1/(u(u + ∆u)). (2.29)

The left hand side is defined for ∆u ≠ 0. However it is evident from the right hand side that

lim_{∆u→0} (1/(u + ∆u) − 1/u)/∆u = −1/u^2. (2.30)

So dw/du = −1/u^2.

Warning: The Leibniz notation should serve to remind us of the definition of derivative as a limit of quotients. Thus an expression like dy/dx or dw/du is not a quotient, but is a limit of quotients ∆y/∆x or ∆w/∆u.

Suppose that s = f(t), where t is time and s is the position at time t. Then the velocity is

v = ds/dt = f'(t). (2.31)

The classic example is the falling body, where

s = (1/2)gt^2, (2.32)

∆s/∆t = ((1/2)g(t + ∆t)^2 − (1/2)gt^2)/∆t = gt + (1/2)g∆t. (2.33)

In this case

v = ds/dt = gt. (2.34)
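The difference quotients of this section can be checked numerically. A Python sketch using the two examples above (y = x^2 and the falling body, with an illustrative g = 9.8):

```python
def diff_quotient(f, x, h):
    # (f(x + h) - f(x)) / h, defined for h != 0
    return (f(x + h) - f(x)) / h

# y = x^2: the quotient is exactly 2x + h, so near 2x for small h
quot_square = diff_quotient(lambda x: x * x, 3.0, 1e-6)

# falling body s = (1/2) g t^2: the quotient is g t + (1/2) g dt
g = 9.8
quot_fall = diff_quotient(lambda t: 0.5 * g * t * t, 2.0, 1e-6)
```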

2.6 Derivative of the exponential function

If exp(x) = e^x, then exp'(x) = e^x. The exponential function is its own derivative. This is seen by calculating

(e^{x+h} − e^x)/h = e^x (e^h − 1)/h. (2.35)

exp'(x) = lim_{h→0} (e^{x+h} − e^x)/h = e^x = exp(x). (2.36)

2.7 Derivatives of the sine and cosine functions

We have

(sin(x + h) − sin(x))/h = sin(x) (cos(h) − 1)/h + cos(x) sin(h)/h. (2.37)

Take the limit as h → 0. This gives

sin'(x) = cos(x). (2.38)

Similarly,

(cos(x + h) − cos(x))/h = cos(x) (cos(h) − 1)/h − sin(x) sin(h)/h. (2.39)

Take the limit as h → 0. This gives

cos'(x) = − sin(x). (2.40)
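Both derivative formulas can be spot-checked with a difference quotient. A Python sketch:

```python
import math

def diff_quotient(f, x, h=1e-6):
    # forward difference quotient (f(x + h) - f(x)) / h
    return (f(x + h) - f(x)) / h

x0 = 0.7
sin_deriv = diff_quotient(math.sin, x0)   # should be near cos(0.7)
cos_deriv = diff_quotient(math.cos, x0)   # should be near -sin(0.7)
```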

2.8 Differentiability implies continuity

If f is differentiable at x, then f is continuous at x. This is because when f'(x) exists, then

lim_{h→0} [f(x + h) − f(x)] = lim_{h→0} (f(x + h) − f(x))/h · lim_{h→0} h = f'(x) · 0 = 0. (2.41)

This implies that lim_{h→0} f(x + h) = f(x), which is continuity at x. If f is continuous at x, then f may or may not be differentiable at x. Here is an interesting example. Let sqrt(x) = √x be the square root function, defined for x ≥ 0. This function is continuous at zero, since if a number is small, its square root is also small. The difference quotient is

(√(x + h) − √x)/h = 1/(√(x + h) + √x). (2.42)

(This is a mildly tricky bit of algebra to invent, but to check it is simple. The underlying principle is that a^2 − b^2 = (a − b)(a + b) implies a − b = (a^2 − b^2)/(a + b). Apply this to the numerator on the left hand side.) Now take the limit as h approaches zero. If x > 0 the answer is just

sqrt'(x) = 1/(2√x). (2.43)

However if x = 0, then the difference quotient is 1/√h. This makes sense at least if h > 0. However it does not approach a real number in the limit as h → 0; in fact, it gets larger and larger. So sqrt does not have a derivative at zero. (However one could say that it has a right hand derivative with value +∞, but this would involve a more general notion of derivative.)

Another important example is f(x) = |x|. This absolute value function is continuous at zero, since if a number is close to zero, then its absolute value is also close to zero. However the difference quotient is

(|x + h| − |x|)/h = (2x + h)/(|x + h| + |x|). (2.44)

(Check the algebra; the same trick works as in the square root example.) When x ≠ 0 this has a limit f'(x) = x/|x|. However there is a problem when x = 0. The difference quotient in this case is |h|/h (or h/|h|, which is the same thing). When h > 0 this is 1, while when h < 0 this is −1. So making h close to zero does not force the difference quotient to be close to some fixed number. Thus the absolute value function is not differentiable at zero. (However one could say that it has a right hand derivative at zero with value 1 and a left hand derivative at zero with value −1.)
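The two one-sided values at zero are easy to see numerically. A Python sketch for the absolute value example:

```python
def abs_quotient(h):
    # difference quotient of f(x) = |x| at x = 0, for h != 0
    return abs(h) / h

from_right = abs_quotient(1e-9)    # 1 from the right
from_left = abs_quotient(-1e-9)    # -1 from the left
```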

2.9 The second derivative

If y = f(x) and dy/dx = f'(x), then it may well be that there is a second derivative, usually written as

d^2y/dx^2 = f''(x). (2.45)

At the end of this section is an argument that a better version of the Leibniz notation for second derivative would be

d^2y/(dx)^2 = f''(x). (2.46)

However it is common to write the second derivative as d^2y/dx^2 without the parentheses around the dx. This is in spite of the fact that dx^2 has another meaning in the context dx^2/dx = 2x. The second derivative is the rate of change of the rate of change. If it is positive, then the original function is concave up, while if it is negative, then the original function is concave down.

Consider the example w = g(u) = 1/u with dw/du = g'(u) = −1/u^2. The fact that this is negative means that the original function is decreasing for all inputs u ≠ 0. We can write

(−1/(u + h)^2 + 1/u^2)/h = (2u + h)/(u^2(u + h)^2). (2.47)

The left hand side is defined for h ≠ 0. However it is evident from the right hand side that

lim_{h→0} (−1/(u + h)^2 + 1/u^2)/h = 2/u^3. (2.48)

So d^2w/du^2 = g''(u) = 2/u^3. From this one can see that the original function is concave up for u > 0 and concave down for u < 0.

If s = f(t) is the position of a moving particle at time t, then the velocity is

v = ds/dt = f'(t). (2.49)

The acceleration is

a = dv/dt = d^2s/dt^2 = f''(t). (2.50)

The Leibniz notation for second derivative is perhaps puzzling. However we have seen that if we define ∆y = f(x + ∆x) − f(x), then

dy/dx = lim_{∆x→0} ∆y/∆x. (2.51)

This is called the forward difference definition of the derivative. A completely equivalent definition would result from defining instead ∆y = f(x) − f(x − ∆x). Then the derivative could also be defined with this backward difference definition. (From the point of view of symmetry and numerical accuracy an even nicer definition would be to take the average of the forward and backward differences.) The most elegant way to do the second difference is to first perform the forward difference and then the backward difference. This would give

∆^2y = ∆∆y = ∆[f(x + ∆x) − f(x)] = [f(x + ∆x) − f(x)] − [f(x) − f(x − ∆x)], (2.52)

where ∆y = ∆f(x) = [f(x + ∆x) − f(x)] is defined by the forward difference and ∆∆y = ∆[f(x + ∆x) − f(x)] is defined by the backward difference. (If we first did the backward difference and then the forward difference, then we would get the same result.)

Fact: The second derivative may be expressed in an alternate form as a single limit

d^2y/(dx)^2 = lim_{∆x→0} ∆^2y/(∆x)^2. (2.53)
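This single-limit formula can be checked numerically against the example w = 1/u, whose second derivative 2/u^3 was computed earlier in this section. A Python sketch:

```python
def second_difference(f, x, dx):
    # forward difference followed by backward difference, per (2.52),
    # divided by (dx)^2
    return (f(x + dx) - 2 * f(x) + f(x - dx)) / dx ** 2

# w = 1/u has second derivative 2/u^3; at u = 2 that is 0.25
approx = second_difference(lambda u: 1 / u, 2.0, 1e-4)
```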

This makes the point that the notation d^2y/(dx)^2 might be more suggestive than the more conventional d^2y/dx^2.

Chapter 3

Differentiation rules

3.1 Summary of differentiation rules

Here are the differentiation rules in the Leibniz notation. This is the notation that is most useful in mathematical modeling, that is, in applications of mathematics to the real world. The rule for differentiating a constant is

dc/dx = 0. (3.1)

The rule for differentiating a power function is

dx^p/dx = px^{p−1}. (3.2)

One may use x or u or whatever for the input variable. The rule for differentiating the exponential function is

de^u/du = e^u. (3.3)

The rule for differentiating the sine function is

d sin(θ)/dθ = cos(θ). (3.4)

The rule for differentiating the cosine function is

d cos(θ)/dθ = − sin(θ). (3.5)

Let u = f(x) and v = g(x). The sum (difference) rule is

\frac{d(u \pm v)}{dx} = \frac{du}{dx} \pm \frac{dv}{dx}. (3.6)


The product rule is

\frac{d(uv)}{dx} = \frac{du}{dx} v + u \frac{dv}{dx}. (3.7)

The quotient rule is

\frac{d\left(\frac{u}{v}\right)}{dx} = \frac{\frac{du}{dx} v - u \frac{dv}{dx}}{v^2}. (3.8)

Set y = f(u) and u = g(x). The chain rule is

\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}. (3.9)

After differentiation, substitute u = g(x) in dy/du. Take y = f(u) with u = f^{-1}(y). The inverse function rule follows from the chain rule by computing 1 = dy/dy = (dy/du)(du/dy). The result is

\frac{du}{dy} = \frac{1}{\frac{dy}{du}}. (3.10)

After differentiation, substitute u = f^{-1}(y) in dy/du.
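The product rule and the inverse function rule can be spot-checked with a symmetric difference quotient. The helper `deriv`, the sample functions, and the sample point are our illustrative choices, not the text's.

```python
import math

# Product rule check on u = x^2, v = sin(x), and inverse function rule check
# on y = e^u (so u = ln(y) and du/dy should equal 1/y).

def deriv(f, x, h=1e-6):
    # symmetric difference quotient
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.8
product_slope = deriv(lambda x: x**2 * math.sin(x), x0)
rule = 2 * x0 * math.sin(x0) + x0**2 * math.cos(x0)

y0 = math.exp(x0)
inv_slope = deriv(math.log, y0)   # should match 1/(dy/du) = 1/e^u = 1/y
print(product_slope, rule, inv_slope, 1 / y0)
```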

3.2 Differentiating inverse functions

Since de^u/du = e^u, the derivative of the logarithm function u = ln(y) is

\frac{d\ln(y)}{dy} = \frac{1}{e^u} = \frac{1}{e^{\ln(y)}} = \frac{1}{y} (3.11)

for y > 0. Since d sin(θ)/dθ = cos(θ), the derivative of the inverse sine function θ = sin^{-1}(y) with -π/2 < θ < π/2 is

\frac{d\sin^{-1}(y)}{dy} = \frac{1}{\cos(\sin^{-1}(y))} = \frac{1}{\sqrt{1 - y^2}} (3.12)

for -1 < y < 1. Parenthetical note: Since d cos(θ)/dθ = -sin(θ), the derivative of the inverse cosine function θ = cos^{-1}(y) with 0 < θ < π is

\frac{d\cos^{-1}(y)}{dy} = \frac{1}{-\sin(\cos^{-1}(y))} = -\frac{1}{\sqrt{1 - y^2}} (3.13)

for -1 < y < 1. The fact that the answer is the same up to a sign change should be no surprise, since

\cos^{-1}(y) = \frac{\pi}{2} - \sin^{-1}(y). (3.14)

Since tan(θ) = sin(θ)/cos(θ), the quotient rule gives

\frac{d\tan(\theta)}{d\theta} = \frac{\cos^2(\theta) + \sin^2(\theta)}{\cos^2(\theta)} = \frac{1}{\cos^2(\theta)}. (3.15)

Therefore the derivative of the inverse tangent function θ = tan^{-1}(y) with -π/2 < θ < π/2 is

\frac{d\tan^{-1}(y)}{dy} = \cos^2(\theta) = \cos^2(\tan^{-1}(y)) = \frac{1}{1 + y^2}. (3.16)

3.3 Chain rule examples

The sine of the square is expressed by the composition y = sin(u) with u = x^2. Thus dy/dx = (dy/du)(du/dx), or

\frac{d\sin(x^2)}{dx} = \frac{d\sin(u)}{du} \frac{du}{dx} = \cos(u) \frac{du}{dx} = \cos(x^2)\, 2x. (3.17)

The derivative of the square of the sine is given by z = w^2 with w = sin(θ). Thus dz/dθ = (dz/dw)(dw/dθ), or

\frac{d\sin^2(\theta)}{d\theta} = \frac{dw^2}{dw} \frac{dw}{d\theta} = 2w \frac{dw}{d\theta} = 2\sin(\theta)\cos(\theta). (3.18)

The answer 2 sin(θ) cos(θ) = sin(2θ) is periodic with period π. This suggests another way of doing the problem. Use the identity cos(2θ) = cos^2(θ) - sin^2(θ) = 1 - 2 sin^2(θ). This gives

\frac{d\sin^2(\theta)}{d\theta} = -\frac{1}{2} \frac{d\cos(2\theta)}{d\theta} = -\frac{1}{2} \frac{d\cos(\tau)}{d\tau} \frac{d\tau}{d\theta} = \frac{1}{2} \sin(\tau) \frac{d\tau}{d\theta} = \frac{1}{2} \sin(2\theta) \cdot 2 = \sin(2\theta). (3.19)

Some uses of the chain rule are so simple that most people do not use an intermediate variable. For instance, instead of writing θ = ωt and computing d sin(ωt)/dt = (d sin(θ)/dθ)(dθ/dt), one just writes

\frac{d\sin(\omega t)}{dt} = \omega \cos(\omega t). (3.20)

Similarly,

\frac{de^{kt}}{dt} = k e^{kt}. (3.21)

Write e^{kt} = a^{t/\Delta t}, where a is the growth factor for time step ∆t. Notice that

\frac{da^{t/\Delta t}}{dt} = \frac{\ln(a)}{\Delta t}\, a^{t/\Delta t} (3.22)

is exactly the same thing, since ln(a) = k∆t. Remember that sometimes people write a = 1 + r, where r is the growth rate for time step ∆t. Then ln(1 + r) = k∆t, and when r is small this implies that r ≈ k∆t. So the meaning of the continuous growth rate k is r/∆t (in the limit as ∆t → 0). That is, the continuous growth rate is the growth rate per time step, in the limit of very small time step. The growth rate r is dimensionless, but the continuous growth rate k has dimensions of inverse time.
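The chain-rule computations (3.17) and (3.18) can be spot-checked with a symmetric difference quotient; the helper `deriv` and the sample point here are ours, not the text's.

```python
import math

# Numerical check of d sin(x^2)/dx = 2x cos(x^2) and
# d sin^2(t)/dt = sin(2t) at an arbitrary sample point.

def deriv(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.7
d1 = deriv(lambda x: math.sin(x**2), x0)
d2 = deriv(lambda t: math.sin(t)**2, x0)
print(d1, d2)
```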

3.4 Implicit functions

When we write y = 1/x^2, we have an explicit definition of y as a function of x. However when we write yx^2 = 1 we have an implicit definition of y as a function of x. It is possible to differentiate such expressions using the ordinary rules of calculus. Thus we would get (dy/dx)x^2 + y(2x) = 0, which may be solved to get dy/dx = -2y/x. This is equivalent to the answer dy/dx = -2/x^3 that one would get by explicit differentiation.

The equation yx^2 = 1 also gives an implicit definition of x as a function of y. The explicit definition would be x = 1/\sqrt{y} = y^{-1/2}. If we differentiate the implicit equation, we get x^2 + y(2x)dx/dy = 0. This gives dx/dy = -(1/2)x/y = -(1/2)y^{-3/2}. Notice that the -2 power and the -1/2 power are inverse functions of each other. This is the reason why dy/dx and dx/dy are reciprocals of each other.

As another example, consider the implicit equation w = sin(z). This is supposed to define z as a function of w. We get 1 = cos(z)(dz/dw), so dz/dw = 1/cos(z). Since cos^2(z) + sin^2(z) = cos^2(z) + w^2 = 1, we get cos(z) = ±\sqrt{1 - w^2} and so dz/dw = 1/(±\sqrt{1 - w^2}). This is the answer we got when we did the derivative of the inverse function arcsin = sin^{-1} directly. (However one has to think about the ± sign.)
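The agreement between the implicit and explicit answers for yx^2 = 1 can be checked at any sample point (the point below is an arbitrary choice of ours):

```python
# For yx^2 = 1, implicit differentiation gives dy/dx = -2y/x; explicit
# differentiation of y = 1/x^2 gives dy/dx = -2/x^3. These should agree.

x0 = 1.3
y0 = 1.0 / x0**2
implicit = -2.0 * y0 / x0
explicit = -2.0 / x0**3
print(implicit, explicit)
```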

3.5 Proof of the product rule

As a sample of the kind of argument to prove these rules, let us prove the product rule. The idea is that if u = f(x) and v = g(x), then ∆u = f(x + ∆x) − f(x) and ∆v = g(x + ∆x) − g(x). Furthermore,

(u + ∆u)(v + ∆v) = uv + ∆u v + u ∆v + ∆u ∆v. (3.23)

This algebraic identity says that changing the sides of a rectangle gives a change that has three terms. The last term is the small postage stamp term.

\frac{\Delta(uv)}{\Delta x} = \frac{(u + \Delta u)(v + \Delta v) - uv}{\Delta x} = \frac{\Delta u}{\Delta x} v + u \frac{\Delta v}{\Delta x} + \frac{\Delta u}{\Delta x} \frac{\Delta v}{\Delta x} \Delta x. (3.24)

In the limit the postage stamp term goes away, and one gets the product rule. This is the crucial simplification that makes calculus work. Remark: The quotient rule follows from the product rule. Write u/v = uv^{-1}. Then by the power rule and the chain rule dv^{-1}/dx = -v^{-2} dv/dx. Hence by the product rule

\frac{d(uv^{-1})}{dx} = \frac{du}{dx} v^{-1} + u \frac{dv^{-1}}{dx} = \frac{du}{dx} v^{-1} - u v^{-2} \frac{dv}{dx}. (3.25)

Some people find this a helpful way to remember the quotient rule.
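The agreement of (3.25) with the quotient rule is algebraic, so it can be confirmed at any point; the sample functions u = sin(x), v = x^2 + 1 below are our illustrative choice.

```python
import math

# The quotient rule and the product-rule form (3.25) should give the same
# number for d(u/v)/dx at any sample point.

x = 0.6
u, v = math.sin(x), x**2 + 1
du, dv = math.cos(x), 2 * x

quotient_rule = (du * v - u * dv) / v**2
product_form = du * v**-1 - u * v**-2 * dv
print(quotient_rule, product_form)
```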

3.6 Proof of the chain rule

The proof of the chain rule is much simpler than the proof of the product rule, since the chain rule is valid even before one takes the limit. If y = f(u) and u = g(x), then ∆y = f(u + ∆u) − f(u) and ∆u = g(x + ∆x) − g(x). Then

\frac{\Delta y}{\Delta x} = \frac{\Delta y}{\Delta u} \frac{\Delta u}{\Delta x}. (3.26)

When you take the limit, you get the chain rule. (The only problem could be in the rare situation when ∆u = 0 even when ∆x ≠ 0, but this can be fixed.)

3.7 Differentiating power functions

Let p be an arbitrary real number. Say that f(x) = x^p. Then it is true in general that f'(x) = px^{p-1}. To prove this, we consider the cases when x > 0, x = 0, and x < 0 separately. When x > 0, then there are no restrictions on p. We can write f(x) = e^{p\ln(x)}. Hence by the chain rule

f'(x) = e^{p\ln(x)}\, p\, \frac{1}{x} = x^p\, p\, \frac{1}{x} = p x^{p-1}. (3.27)

When x = 0, we restrict to p > 1. In that case we can use the definition of the derivative to argue that the derivative is zero. When x < 0, we can define x^p when p = n/k with k odd. In that case, x^p is an even function for n even and an odd function for n odd. If f(x) is even, we have f(x) = f(-x), so by the chain rule f'(x) = -f'(-x), and so f'(x) is odd. Similarly, if f(x) is odd, then f'(x) is even. These facts show that the power rule continues to work for x < 0. In fact, if p = n/k with n even (or odd), then p - 1 = (n - k)/k has n - k odd (or even).
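The claim that the power rule persists for x < 0 can be sketched numerically with p = 1/3 (so n = 1, k = 3, k odd). The cube root helper below is ours; it is written by hand because `**` with a fractional exponent produces the complex principal root for negative bases.

```python
# Power rule check at a negative input: f(x) = x^(1/3) should have
# f'(x) = (1/3) x^(-2/3), e.g. f'(-8) = 1/12.

def cbrt(x):
    return x ** (1.0 / 3.0) if x >= 0 else -((-x) ** (1.0 / 3.0))

x0, h = -8.0, 1e-5
slope = (cbrt(x0 + h) - cbrt(x0 - h)) / (2 * h)
exact = (1.0 / 3.0) * cbrt(x0) ** -2   # p*x^(p-1), using x^(-2/3) = (x^(1/3))^(-2)
print(slope, exact)
```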

3.8 Summary of differentiation rules in a different notation

Here we restate the rules in a different notation. This notation is used mainly in theoretical discussions. We have c' = 0, ( )^p{}' = p( )^{p-1}, exp' = exp, sin' = cos, and cos' = -sin. We have (f ± g)(x) = f(x) ± g(x), so the sum and difference rules are

(f \pm g)'(x) = f'(x) \pm g'(x). (3.28)

Also, (fg)(x) = f(x)g(x), so the product rule is

(fg)'(x) = f'(x)g(x) + f(x)g'(x). (3.29)

Again, (f/g)(x) = f(x)/g(x), so the quotient rule is

\left(\frac{f}{g}\right)'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}. (3.30)

With (f ◦ g)(x) = f(g(x)) the chain rule is

(f \circ g)'(x) = f'(g(x))\, g'(x). (3.31)

With f(f^{-1}(y)) = y, the inverse function rule is

(f^{-1})'(y) = \frac{1}{f'(f^{-1}(y))}. (3.32)

3.9 Summary of differentiation rules in mixed notation

Here are the rules in a hybrid notation that may make some people comfortable. The sum and difference rules are

\frac{d(f(x) \pm g(x))}{dx} = f'(x) \pm g'(x). (3.33)

The product rule is

\frac{d(f(x)g(x))}{dx} = f'(x)g(x) + f(x)g'(x). (3.34)

The quotient rule is

\frac{d(f(x)/g(x))}{dx} = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}. (3.35)

The chain rule is

\frac{d f(g(x))}{dx} = f'(g(x))\, g'(x). (3.36)

The inverse function rule is

\frac{df^{-1}(y)}{dy} = \frac{1}{f'(f^{-1}(y))}. (3.37)

3.10 The derivative as a linear approximation

If f is differentiable at x, then it is always possible to write

f(x + h) = f(x) + f'(x)h + E_x(h), (3.38)

where

\lim_{h \to 0} \frac{E_x(h)}{h} = 0. (3.39)

There is no mystery about E_x(h); it is just E_x(h) = f(x + h) - f(x) - f'(x)h. Think of it as an error term with the limit property expressed above. So f(x + h) as a function of h is approximated by the linear function f(x) + f'(x)h with an error that is relatively small with respect to h. In the Leibniz notation the same result may be expressed by taking y = f(x) and ∆y = f(x + ∆x) - f(x) and writing

\Delta y = \frac{dy}{dx} \Delta x + E_x(\Delta x). (3.40)

This says that ∆y is proportional to ∆x up to an error that is relatively small with respect to ∆x. As an example, take sin(x + h) = sin(x) + cos(x)h + E_x(h). This says that a small change in the sine function at a particular point is given by multiplying the change in the input by the value of the cosine function at the point. In the Leibniz notation this could take the form ∆ sin(x) = cos(x)∆x + E_x(∆x).

There is another way of expressing the same ideas. If f is differentiable at a, then

f(z) = f(a) + f'(a)(z - a) + E_a(z - a), (3.41)

where

\lim_{z \to a} \frac{E_a(z - a)}{z - a} = 0. (3.42)

This says that f(z) as a function of z is approximated by the linear function f(a) + f'(a)(z - a) with an error that is relatively small with respect to z - a. That is, if one looks at f(z) on a very small range of values of inputs near a, then it looks like a linear function.

As an example, take sin(z) = sin(a) + cos(a)(z - a) + E_a(z - a). This says that the sine function near a particular point a is approximately a linear function with slope cos(a).

Contrast this with the absolute value function, which does not have a derivative at zero. If one looks at the graph of |z| near the origin, the corner is always apparent, no matter how close one looks.
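The defining property of the error term, that E_x(h)/h → 0, can be watched numerically for the sine example (the step sizes below are our arbitrary choices):

```python
import math

# E_x(h) = sin(x+h) - sin(x) - cos(x)*h should be small relative to h;
# the ratio |E_x(h)|/h should shrink as h does.

x0 = 1.0
ratios = []
for h in [1e-1, 1e-2, 1e-3]:
    E = math.sin(x0 + h) - math.sin(x0) - math.cos(x0) * h
    ratios.append(abs(E) / h)
print(ratios)
```

Each tenfold reduction of h reduces the ratio by roughly tenfold, consistent with E_x(h) being of order h^2 here.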

3.11 l’Hospital’s rule in a special case

There is a special case of l'Hospital's rule that makes sense from the point of view of linear approximation. This says that if f and g are both differentiable at a, and if f(a) = 0 and g(a) = 0, and g'(a) ≠ 0, then

\lim_{z \to a} \frac{f(z)}{g(z)} = \frac{f'(a)}{g'(a)}. (3.43)
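Before turning to the reason this holds, (3.43) can be spot-checked numerically, for instance on f(z) = e^z - e^π and g(z) = sin(z) at a = π; the offset used below is our arbitrary choice.

```python
import math

# Numerical check of (3.43): near a = pi the ratio (e^z - e^pi)/sin(z)
# should approach f'(a)/g'(a) = e^pi / cos(pi) = -e^pi.

a = math.pi
predicted = -math.exp(math.pi)
z = a + 1e-6
ratio = (math.exp(z) - math.exp(a)) / math.sin(z)
print(ratio, predicted)
```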

The reason this is true is that we can write

f(z) = f'(a)(z - a) + E_a(f, z - a), (3.44)

and

g(z) = g'(a)(z - a) + E_a(g, z - a). (3.45)

Near a both f and g look like linear functions that vanish at a. Furthermore, the ratio of the two linear functions is the ratio of their slopes. So

\lim_{z \to a} \frac{f(z)}{g(z)} = \lim_{z \to a} \frac{f'(a) + \frac{E_a(f, z-a)}{z-a}}{g'(a) + \frac{E_a(g, z-a)}{z-a}} = \frac{f'(a)}{g'(a)}. (3.46)

This used the fact that the limit of a quotient is the quotient of the limits. An example of l'Hospital's rule at work is

\lim_{x \to \pi} \frac{e^x - e^\pi}{\sin(x)} = \frac{e^\pi}{\cos(\pi)} = -e^\pi. (3.47)

Chapter 4

Change

4.1 First derivative test

Let f be a function. Then f has a local minimum at a if f(x) ≥ f(a) for all x near a. Also, f has a local maximum at a if f(x) ≤ f(a) for all x near a. The first derivative test says that if f is a differentiable function and f has either a local minimum or a local maximum at a, then

\left.\frac{dy}{dx}\right|_{x=a} = f'(a) = 0. (4.1)

4.2 Second derivative test

The second derivative test is about the situation where f is a differentiable function with f'(a) = 0. Then if

\left.\frac{d^2y}{dx^2}\right|_{x=a} = f''(a) > 0 (4.2)

then it follows that f has a local minimum at a. If instead

\left.\frac{d^2y}{dx^2}\right|_{x=a} = f''(a) < 0 (4.3)

then it follows that f has a local maximum at a.

4.3 Global minimum and global maximum

A function f has a global minimum on some domain at a if f(x) ≥ f(a) for all x in the domain. Also, f has a global maximum at a if f(x) ≤ f(a) for all x in the domain. A continuous function f defined on a closed interval [a, b] (an interval that includes the end points) always has a point where there is a global minimum

and a point where it has a global maximum. If the function is differentiable in the open interval (a, b), then at such a point either the derivative is zero, or it is an end point. A continuous function defined on an infinite interval like [a, +∞) or on an open interval like (a, b) does not necessarily have a global minimum or a global maximum.

4.4 Change of coordinates

Say that y = f(u) and u = g(x), so that y = h(x) = f(g(x)). Then we can think of y as a function f of u or as a function h of x. Suppose du/dx ≠ 0. Then the first derivative test for y as a function of u is the same as the first derivative test for y as a function of x. This is because

\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}. (4.4)

Therefore, since x = a and u = g(a) are the same point, we have

\left.\frac{dy}{dx}\right|_{x=a} = 0 \iff \left.\frac{dy}{du}\right|_{u=g(a)} = 0. (4.5)

Again suppose du/dx ≠ 0. Then the second derivative test for y as a function of u is the same as the second derivative test for y as a function of x. This is because

\frac{d^2y}{dx^2} = \frac{d^2y}{du^2} \left(\frac{du}{dx}\right)^2 + \frac{dy}{du} \frac{d^2u}{dx^2}. (4.6)

Therefore, since x = a and u = g(a) are the same point and at that point

\left.\frac{dy}{dx}\right|_{x=a} = \left.\frac{dy}{du}\right|_{u=g(a)} = 0, (4.7)

we have

\left.\frac{d^2y}{dx^2}\right|_{x=a} > 0 \iff \left.\frac{d^2y}{du^2}\right|_{u=g(a)} > 0 (4.8)

and

\left.\frac{d^2y}{dx^2}\right|_{x=a} < 0 \iff \left.\frac{d^2y}{du^2}\right|_{u=g(a)} < 0. (4.9)

4.5 Optimization

Consider the example of minimizing the area

A = 2(\pi r^2) + (2\pi r)h (4.10)

of a cylinder with fixed value of the volume

V = (\pi r^2)h. (4.11)

There are several ways to do the problem. A straightforward method is to think of A as a function of r. Then

A = 2\pi r^2 + \frac{2V}{r}. (4.12)

The first derivative is

\frac{dA}{dr} = 4\pi r - \frac{2V}{r^2}. (4.13)

The second derivative is

\frac{d^2A}{dr^2} = 4\pi + \frac{4V}{r^3}. (4.14)

The first derivative vanishes at

r_* = \left(\frac{V}{2\pi}\right)^{\frac{1}{3}}. (4.15)

At this point where

\left.\frac{dA}{dr}\right|_{r=r_*} = 0 (4.16)

the second derivative is

\left.\frac{d^2A}{dr^2}\right|_{r=r_*} = 12\pi > 0. (4.17)

This indicates a local minimum. A slightly uglier method is to think of A as a function of h. Then

A = \frac{2V}{h} + 2(\pi V h)^{\frac{1}{2}}. (4.18)

The first derivative is

\frac{dA}{dh} = -\frac{2V}{h^2} + (\pi V)^{\frac{1}{2}} h^{-\frac{1}{2}}. (4.19)

The second derivative is

\frac{d^2A}{dh^2} = \frac{4V}{h^3} - \frac{1}{2}(\pi V)^{\frac{1}{2}} h^{-\frac{3}{2}}. (4.20)

The first derivative vanishes at

h_* = \left(\frac{4V}{\pi}\right)^{\frac{1}{3}}. (4.21)

At this point where

\left.\frac{dA}{dh}\right|_{h=h_*} = 0 (4.22)

the second derivative is

\left.\frac{d^2A}{dh^2}\right|_{h=h_*} = \frac{3}{4}\pi > 0. (4.23)

This indicates a local minimum.

There is however a much nicer way to do the problem. It does not matter whether we think of A as a function of r or of h. For definiteness, let us take A as a function of r. But the idea is to differentiate implicitly. Thus

\frac{dA}{dr} = 2\pi\left(2r + h + r\frac{dh}{dr}\right). (4.24)

Furthermore, since V is constant, we have

2rh + r^2\frac{dh}{dr} = 0. (4.25)

Thus

\frac{dA}{dr} = 2\pi(2r - h). (4.26)

At the point where dA/dr = 0 we have

h = 2r. (4.27)

This version gives much more geometric insight. The second derivative is

\frac{d^2A}{dr^2} = 2\pi\left(2 - \frac{dh}{dr}\right) = 4\pi\left(1 + \frac{h}{r}\right). (4.28)

At the point where the first derivative vanishes this is 12π > 0, as before. This indicates a local minimum. At this minimum the actual value of A is 6πr^2, where r = r_*.
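The conclusion h = 2r can be sketched numerically: for a fixed volume, scan a few radii around r_* and confirm that the area is smallest there. The scan factors and the value of V are our arbitrary choices.

```python
import math

# Cylinder with fixed volume V: A(r) = 2*pi*r^2 + 2V/r should be minimized
# at r_* = (V/(2*pi))^(1/3), where the height satisfies h = 2r.

V = 10.0
r_star = (V / (2 * math.pi)) ** (1.0 / 3.0)

def area(r):
    return 2 * math.pi * r**2 + 2 * V / r

best_r = min((r_star * s for s in [0.8, 0.9, 1.0, 1.1, 1.2]), key=area)
h_star = V / (math.pi * r_star**2)
print(best_r, r_star, h_star / r_star)
```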

4.6 The mean value theorem

The mean value theorem says that if f is continuous on [a, b] and differentiable on (a, b), then there exists a number c in (a, b) with

f'(c) = \frac{f(b) - f(a)}{b - a}. (4.29)

This is a very important theoretical result, since it shows that information about the derivative gives information about the function. In the Leibniz notation y = f(x) this says that if we compute ∆y = f(x + ∆x) - f(x), then there is a c between x and x + ∆x such that

\left.\frac{dy}{dx}\right|_{x=c} = \frac{\Delta y}{\Delta x}. (4.30)

The most important consequences are the increasing function theorem and the constant function theorem. The increasing function theorem says that if f'(x) > 0 for all x in (a, b), then f is increasing on (a, b). The constant function theorem says that if f'(x) = 0 for all x in (a, b), then f is constant on [a, b]. One consequence of the constant function theorem is that if two functions have the same derivative on some interval, then they differ by a constant. That is, if F'(x) = G'(x), then F(x) = G(x) + C. This constant is usually called the constant of integration.

4.7 l'Hospital's rule

l'Hospital's rule says that if f and g are both differentiable at a, and if f(a) = 0 and g(a) = 0, then

\lim_{z \to a} \frac{f(z)}{g(z)} = \lim_{z \to a} \frac{f'(z)}{g'(z)}. (4.31)

Consider the function f(z)g(x) - g(z)f(x) as a function of x for fixed z. This function has value zero at x = a and at x = z. By the mean value theorem, there is a number c between a and z such that the derivative f(z)g'(c) - g(z)f'(c) = 0. Thus

\frac{f(z)}{g(z)} = \frac{f'(c)}{g'(c)} (4.32)

with c between a and z. As z gets close to a, then also c gets close to a. So taking the limit gives l'Hospital's rule.

4.8 The error in the linear approximation according to Cauchy

Consider a differentiable function f(z) and the linear approximation f(a) + f'(a)(z - a) near a. Fix a particular value of z and look at this as a function of a. Then

\frac{d}{da}[f(a) + f'(a)(z - a)] = f'(a) - f'(a) + f''(a)(z - a) = f''(a)(z - a). (4.33)

This formula says that as a approaches z, the change in the predicted value depends on the second derivative. Furthermore, this change gets small as a gets close to z. The reason is that the predicted value of f at z given by the linear approximation is almost exact as a gets close to z. Apply the mean value theorem to the interval from a to z. This says that there is a c between a and z such that

\frac{[f(z) + f'(z)(z - z)] - [f(a) + f'(a)(z - a)]}{z - a} = f''(c)(z - c). (4.34)

This proves that

f(z) = f(a) + f'(a)(z - a) + f''(c)(z - c)(z - a), (4.35)

where c is between a and z. This gives the Cauchy form of the error in the linear approximation. As an example, take sin(z) = sin(a) + cos(a)(z - a) + E_a(z - a), where E_a(z - a) = -sin(c)(z - c)(z - a). It is always true that |sin(c)| ≤ 1. If |z - a| ≤ 1/100, then certainly it is also the case that |z - c| ≤ 1/100. So |E_a(z - a)| ≤ 1/10000.

4.9 The error in the linear approximation according to Lagrange

There is another way of writing the error that is sometimes more convenient. The idea is to take a as a function of a parameter t. The number z remains a fixed constant. Let g(t) = f(a) + f'(a)(z - a), where a = h(t). Then by the chain rule g'(t) is given by

g'(t) = \frac{d}{dt}[f(a) + f'(a)(z - a)] = f''(a)(z - a)\frac{da}{dt} = -\frac{1}{2} f''(a) \frac{d(z - a)^2}{dt}. (4.36)

This suggests a choice of parameter. Take t ≤ 0 such that (z - a)^2 = -t. With this choice the rate of change is g'(t) = (1/2)f''(a), where a = h(t) = z ± \sqrt{-t}. What makes this work is that when t is near 0, the rate of change of a with respect to t is very large. This makes the predicted value of f at z given by the linear approximation continue to change at a rate given by the second derivative, even when a is close to z. The interval from t to 0 parameterizes the interval from a = h(t) to z = h(0). The mean value theorem on the interval from t to 0 gives

\frac{g(0) - g(t)}{0 - t} = \frac{[f(z) + f'(z)(z - z)] - [f(a) + f'(a)(z - a)]}{(z - a)^2} = g'(c) = \frac{1}{2} f''(c^*), (4.37)

where c is between t and 0, and c^* = h(c) is between a and z. This proves that

f(z) = f(a) + f'(a)(z - a) + \frac{1}{2} f''(c^*)(z - a)^2, (4.38)

where c^* is between a and z. This gives the Lagrange form of the error in the linear approximation. This last formula is particularly important. It may also be stated in the form

E_a(z - a) = \frac{1}{2} f''(c^*)(z - a)^2. (4.39)

This says that if one knows that |f''(w)| ≤ M on some interval near a, then for z in that interval

|E_a(z - a)| \le \frac{1}{2} M (z - a)^2. (4.40)

This can give a quite useful idea of how small the error is. As an example, take sin(z) = sin(a) + cos(a)(z - a) + E_a(z - a), where E_a(z - a) = -(1/2)\sin(c^*)(z - a)^2. It is always true that |sin(c^*)| ≤ 1. If |z - a| ≤ 1/100, then |E_a(z - a)| ≤ 1/20000.
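The Lagrange bound (4.40) with M = 1 for the sine function can be checked directly at a sample point (the values of a and z below are our arbitrary choices):

```python
import math

# For f = sin, |f''| <= 1, so the linear approximation error should satisfy
# |E_a(z - a)| <= (1/2)(z - a)^2 by (4.40).

a = 0.3
z = a + 0.01
E = math.sin(z) - (math.sin(a) + math.cos(a) * (z - a))
bound = 0.5 * (z - a) ** 2
print(abs(E), bound)
```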

4.10 Newton’s second law of motion

The fundamental law of physics is

F = ma. (4.41)

Here F is force, m is mass, and a = dv/dt = d^2s/dt^2 is acceleration. In metric units F is measured in newtons, m in kilograms, and a in meters per second squared. In one English system F is measured in pounds of force, m in slugs, and a in feet per second squared. A pound of force is about 4.4482 newtons. A slug is about 14.594 kilograms.

First consider the special case of constant gravitation. Distance is measured upward. Thus the force F = -mg is downward. The solution is a = -g and so v = v_0 - gt, where v_0 is the constant of integration, which is the initial velocity. Finally

s = s_0 + v_0 t - \frac{1}{2}gt^2, (4.42)

where s_0 is the next constant of integration, the initial displacement.

Next consider the special case when the only force is a friction force F = -αv. This might describe a tiny particle in a fluid. Then the Newton equation is

m\frac{dv}{dt} = -\alpha v. (4.43)

This can be written

\frac{1}{v}\frac{dv}{dt} = -\frac{\alpha}{m}. (4.44)

Thus

\ln(\pm v) = \ln(\pm v_0) - \frac{\alpha}{m}t. (4.45)

The constant of integration is the logarithm of the absolute value of the initial velocity. This violates the rule that the input to a logarithm function must be dimensionless, but fortunately the equation has the equivalent form

\ln\left(\frac{v}{v_0}\right) = -\frac{\alpha}{m}t (4.46)

without this problem. It follows that the solution is

v = v_0 e^{-\frac{\alpha}{m}t}. (4.47)

This says that the motion gets slower and slower. The solution for the displacement is

s = s_0 + \frac{m}{\alpha} v_0 \left(1 - e^{-\frac{\alpha}{m}t}\right). (4.48)

This implies that as t → ∞ the displacement s approaches a limiting value s = s_0 + (m/α)v_0.

What if there is both gravitation and friction? Then the force is F = -mg - αv. The Newton equation is

m\frac{dv}{dt} = -mg - \alpha v. (4.49)

This can be written

\frac{1}{\frac{mg}{\alpha} + v}\,\frac{dv}{dt} = -\frac{\alpha}{m}. (4.50)

Thus

\ln\left(\pm\left(v + \frac{mg}{\alpha}\right)\right) = \ln\left(\pm\left(v_0 + \frac{mg}{\alpha}\right)\right) - \frac{\alpha}{m}t. (4.51)

The solution is

v = \left(v_0 + \frac{m}{\alpha}g\right)e^{-\frac{\alpha}{m}t} - \frac{m}{\alpha}g. (4.52)

This says that the motion approaches a terminal velocity v = -(m/α)g where the acceleration is zero. The solution for the displacement is

s = s_0 + \frac{m}{\alpha}\left(v_0 + \frac{m}{\alpha}g\right)\left(1 - e^{-\frac{\alpha}{m}t}\right) - \frac{m}{\alpha}gt. (4.53)

This implies that as t → ∞ the displacement s becomes approximately linear in t, just like free motion. However the mechanism is different; it is a balance between gravity pulling down and friction pushing up.

What happens in the above formula as g → 0? It is easy to see that one recovers s = s_0 + (m/α)v_0(1 - e^{-(α/m)t}), which is the case with friction but with no gravity. What happens in the above formula as α → 0? This is a harder calculation. Write the result as

s = s_0 + \frac{m}{\alpha}v_0\left(1 - e^{-\frac{\alpha}{m}t}\right) + \frac{m^2}{\alpha^2}g\left(1 - \frac{\alpha}{m}t - e^{-\frac{\alpha}{m}t}\right). (4.54)

With the help of l'Hospital's rule the limit as α → 0 is s = s_0 + v_0 t - (1/2)gt^2, the result for motion with gravity but without friction.

Here is one final example, the frictionless harmonic oscillator. The force F = -ks is proportional to the displacement s. The constant k is the spring constant. Thus the farther the mass is from s = 0, the stronger it is pulled back in this direction. Newton's law of motion says ma = -ks, that is,

m\frac{d^2s}{dt^2} = -ks. (4.55)

The second derivative is proportional to the function. This suggests trying a sine or cosine function. One possible choice is

s = A\cos(\omega t - \phi). (4.56)

Here A is the amplitude and φ is the phase. In order to determine the angular frequency ω, substitute this back into the differential equation. The result is

-m\omega^2 = -k. (4.57)

Thus the angular frequency is

\omega = \sqrt{\frac{k}{m}}. (4.58)

The constants A and φ depend on the initial conditions. In fact, the initial displacement is s_0 = A cos(φ) and the initial velocity is v_0 = Aω sin(φ).

Chapter 5

The integral

5.1 Riemann sums

Say that f is a function. Let a and b be two real numbers. Let n be a positive integer. Set h = (b - a)/n. Write x_i = a + ih. Notice that x_0 = a and x_n = b. A left Riemann sum is a sum of the form

L_a^b(f, n) = \sum_{i=0}^{n-1} f(x_i)\, h. (5.1)

It would be nice to be able to calculate such sums. Unfortunately, this can be done in an explicit way in only a few cases. The rest of the time we need to resort to the computer. Here are a few cases when the sum can be evaluated. First, if f(x) = 1, the sum is

\sum_{i=0}^{n-1} h = nh = b - a. (5.2)

Second, if f(x) = x, we can use the trick of telescoping sums. Notice that (x + h)^2 - x^2 = 2xh + h^2. Therefore

\sum_{i=0}^{n-1} (2x_i + h)h = \sum_{i=0}^{n-1} \left((x_i + h)^2 - x_i^2\right) = b^2 - a^2. (5.3)

Therefore

2\sum_{i=0}^{n-1} x_i h + \sum_{i=0}^{n-1} h \cdot h = b^2 - a^2. (5.4)

The conclusion is

\sum_{i=0}^{n-1} x_i h = \frac{1}{2}(b^2 - a^2) - \frac{1}{2}h(b - a). (5.5)


Third, if f(x) = x^2, we can again use the trick of telescoping sums. Notice that (x + h)^3 - x^3 = 3x^2h + 3xh^2 + h^3. Therefore

\sum_{i=0}^{n-1} (3x_i^2 + 3x_i h + h^2)h = \sum_{i=0}^{n-1} \left((x_i + h)^3 - x_i^3\right) = b^3 - a^3. (5.6)

Therefore

3\sum_{i=0}^{n-1} x_i^2 h + 3h\sum_{i=0}^{n-1} x_i h + h^2 \sum_{i=0}^{n-1} h = b^3 - a^3. (5.7)

The conclusion is

\sum_{i=0}^{n-1} x_i^2 h = \frac{1}{3}(b^3 - a^3) - \frac{1}{2}h(b^2 - a^2) + \frac{1}{6}h^2(b - a). (5.8)

For a final example, fix r > 0 with r ≠ 1 and let f(x) = r^x. This is an exponential function. Use the trick of telescoping sums to evaluate the geometric series. Notice that r^{x+h} - r^x = r^x(r^h - 1). Therefore

\sum_{i=0}^{n-1} r^{x_i} h = \frac{h}{r^h - 1} \sum_{i=0}^{n-1} \left(r^{x_i + h} - r^{x_i}\right) = \frac{h}{r^h - 1}\left(r^b - r^a\right). (5.9)

Various well-known sums are obtained by setting a = 0 and b = n, so that h = 1. For instance, the last example gives the classic formula

\sum_{i=0}^{n-1} r^i = \frac{r^n - 1}{r - 1} (5.10)

for the partial sum of a geometric series.
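The closed forms obtained by telescoping, such as (5.5) and (5.10), are exact identities and can be confirmed by brute-force summation; the parameters below are our arbitrary choices.

```python
# Direct check of the telescoping results: the left sum of f(x) = x against
# (5.5), and the geometric partial sum against (5.10).

a, b, n = 1.0, 3.0, 100
h = (b - a) / n
xs = [a + i * h for i in range(n)]

left_sum = sum(x * h for x in xs)
formula = 0.5 * (b**2 - a**2) - 0.5 * h * (b - a)

r = 1.5
geom = sum(r**i for i in range(n))
geom_formula = (r**n - 1) / (r - 1)
print(left_sum, formula)
```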

5.2 The definite integral

Say that f is a function. Let a and b be two real numbers. For each positive integer n = 1, 2, 3, ... define

\Delta t = \frac{b - a}{n}. (5.11)

So as n gets large, the corresponding ∆t gets small. Also, for i = 0, 1, 2, ..., n, let

t_i = a + i\Delta t. (5.12)

Thus t_0 = a and t_n = b. Let the left Riemann sum be

L_a^b(f, n) = \sum_{i=0}^{n-1} f(t_i)\,\Delta t. (5.13)

Let the right Riemann sum be

R_a^b(f, n) = \sum_{i=1}^{n} f(t_i)\,\Delta t. (5.14)

Say that f is a continuous function. Let a and b be two numbers. The definite integral is defined by

I_a^b(f) = \lim_{n \to \infty} L_a^b(f, n) = \lim_{n \to \infty} R_a^b(f, n). (5.15)

The definite integral is a number that depends on a, b, and f. It can be computed by either the left or right sum approximations. (From the point of view of symmetry and numerical accuracy an even nicer definition would be to take the average of the left and right sums.) The usual notation for the definite integral is

I_a^b(f) = \int_a^b f(t)\, dt. (5.16)

Another variable may be used. For instance

I_a^b(f) = \int_a^b f(x)\, dx (5.17)

defines the same number. Both L_a^b(f, n) and R_a^b(f, n) turn out to be useful in the appropriate situations. For instance, if the function f(t) is increasing on the interval from a to b with a < b, then

L_a^b(f, n) \le \int_a^b f(x)\, dx \le R_a^b(f, n). (5.18)

On the other hand, if the function f(t) is decreasing on the interval from a to b with a < b, then

R_a^b(f, n) \le \int_a^b f(x)\, dx \le L_a^b(f, n). (5.19)

Example. As n → ∞ the corresponding ∆t = (b - a)/n → 0. Use the left sum to calculate

\sum_{i=0}^{n-1} t_i\,\Delta t = \frac{1}{2}(b^2 - a^2) - \frac{1}{2}\Delta t(b - a). (5.20)

Since ∆t → 0 as n → ∞, the integral is then

\int_a^b t\, dt = \lim_{n \to \infty} \sum_{i=0}^{n-1} t_i\,\Delta t = \frac{1}{2}(b^2 - a^2). (5.21)

These calculations are obvious geometrically, at least if 0 ≤ a < b. The integral is just the area (1/2)b^2 of the big triangle minus the area (1/2)a^2 of the small triangle. The sum is a little smaller, since it left out n small triangles, each of area (1/2)(∆t)^2. Their total area is (1/2)∆t · n∆t = (1/2)∆t(b - a). Example. Use the left sum to calculate

\int_a^b t^2\, dt = \lim_{n \to \infty} \sum_{i=0}^{n-1} t_i^2\,\Delta t = \lim_{\Delta t \to 0}\left[\frac{1}{3}(b^3 - a^3) - \frac{1}{2}\Delta t(b^2 - a^2) + \frac{1}{6}(\Delta t)^2(b - a)\right] = \frac{1}{3}(b^3 - a^3). (5.22)

Example. Fix r > 0 with r ≠ 1. Use the left sum to calculate

\int_a^b r^t\, dt = \lim_{n \to \infty} \sum_{i=0}^{n-1} r^{t_i}\,\Delta t = \lim_{n \to \infty} \frac{\Delta t}{r^{\Delta t} - 1}\left(r^b - r^a\right). (5.23)

By l’Hospital’s rule the limit

\lim_{h \to 0} \frac{h}{r^h - 1} = \lim_{h \to 0} \frac{1}{\ln(r)\, r^h} = \frac{1}{\ln(r)}. (5.24)

So the answer is

\int_a^b r^t\, dt = \frac{1}{\ln(r)}\left(r^b - r^a\right). (5.25)

In particular, taking r = e^k we get

\int_a^b e^{kt}\, dt = \frac{1}{k}\left(e^{kb} - e^{ka}\right). (5.26)
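The convergence of the left sums to the value in (5.26) can be watched directly; the parameters k, a, b and the sequence of n values below are our arbitrary choices.

```python
import math

# Left Riemann sums for e^{kt} on [a, b] should approach
# (e^{kb} - e^{ka})/k as n grows, per (5.26).

k, a, b = 0.5, 0.0, 2.0
exact = (math.exp(k * b) - math.exp(k * a)) / k

def left_sum(n):
    dt = (b - a) / n
    return sum(math.exp(k * (a + i * dt)) * dt for i in range(n))

errors = [abs(left_sum(n) - exact) for n in (10, 100, 1000)]
print(errors, exact)
```

The errors shrink roughly in proportion to 1/n, which is typical for one-sided Riemann sums of a smooth increasing function.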

5.3 First fundamental theorem of calculus

If f is a given function, then its antiderivative is a function F such that F'(x) = f(x). The antiderivative is only determined up to an additive constant. Suppose f is continuous on [a, b]. Suppose it has an antiderivative F with

F'(x) = f(x). (5.27)

The first fundamental theorem of calculus says that

\int_a^b f(t)\, dt = F(b) - F(a). (5.28)

If you can find an antiderivative, then this gives a practical way of computing definite integrals. It is the calculus form of the telescoping trick, but the computations are much easier. Example: Since F(x) = x^2/2 has derivative f(x) = x, it follows that

\int_a^b x\, dx = \frac{1}{2}b^2 - \frac{1}{2}a^2. (5.29)
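The first example just given can be confirmed numerically: a fine left Riemann sum for f(x) = x lands very close to F(b) - F(a) with F(x) = x^2/2. The interval and sum size are our arbitrary choices.

```python
# First fundamental theorem spot check for f(x) = x on [0, 2]:
# the left Riemann sum should approach b^2/2 - a^2/2 = 2.

a, b, n = 0.0, 2.0, 10000
dx = (b - a) / n
riemann = sum((a + i * dx) * dx for i in range(n))
ftc = b**2 / 2 - a**2 / 2
print(riemann, ftc)
```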

Example: Since F(x) = x^3/3 has derivative f(x) = x^2, it follows that

\int_a^b x^2\, dx = \frac{1}{3}b^3 - \frac{1}{3}a^3. (5.30)

Example: Since F(x) = e^x has derivative f(x) = e^x, it follows that

\int_a^b e^x\, dx = e^b - e^a. (5.31)

Example: There is a function F(x) such that F'(x) = sin(x^2). However it does not have an expression in terms of elementary functions. So while it is true that

\int_a^b \sin(x^2)\, dx = F(b) - F(a), (5.32)

this does not give a particularly convenient answer.

The first fundamental theorem of calculus is a close analog of the telescoping sum principle. Here is an example that makes this clear. Let ∆x = (b - a)/n and set x_i = a + i∆x. The summation example uses the telescoping procedure based on ∆x^2 = (x + ∆x)^2 - x^2 = 2x∆x + (∆x)^2. This gives

\sum_{i=0}^{n-1} (2x_i + \Delta x)\,\Delta x = \sum_{i=0}^{n-1} \left((x_i + \Delta x)^2 - x_i^2\right) = b^2 - a^2. (5.33)

Since n∆x = b - a, it follows that

\sum_{i=0}^{n-1} 2x_i\,\Delta x = (b^2 - a^2) - (b - a)\Delta x. (5.34)

Compare this with the calculus calculation. Since dx^2/dx = 2x, it follows that

\int_a^b 2x\, dx = \int_a^b \frac{dx^2}{dx}\, dx = b^2 - a^2. (5.35)

5.4 Second fundamental theorem of calculus

Suppose f is continuous on [a, b]. Define

F(x) = \int_a^x f(t)\, dt. (5.36)

F'(x) = f(x). (5.37)

This theorem says that the definite integral may always be used to define an antiderivative of f. This antiderivative may be very difficult to compute in terms of elementary functions.

Example: According to the second fundamental theorem of calculus the function f(x) = sin(x^2) has an antiderivative given by

F(x) = \int_a^x \sin(t^2)\, dt. (5.38)

Even though this function does not have an expression in terms of elementary functions, it is true that F'(x) = sin(x^2). The second fundamental theorem of calculus gives a way of defining new functions. For instance, define

\mathrm{Si}(x) = \int_0^x \frac{\sin(t)}{t}\, dt. (5.39)

This sine-integral function Si is used in optics. As another example, define

N(z) = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \int_0^z e^{-\frac{x^2}{2}}\, dx. (5.40)

This is called the standard normal cumulative distribution function in probability. This function has values between 0 and 1, and indeed its values represent probabilities. Its derivative is the function

n(z) = N'(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{z^2}{2}}. (5.41)

This is called the standard normal density function. In popular language it is the bell-shaped curve.
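The second fundamental theorem can be sketched numerically for N(z): build the integral in (5.40) as a Riemann sum and check that its numerical derivative matches the density (5.41). The function name and resolution below are our choices.

```python
import math

# Build N(z) from (5.40) by a left Riemann sum, then check that the
# symmetric difference quotient of N matches n(z) from (5.41).

def N(z, n=20000):
    dx = z / n
    return 0.5 + sum(math.exp(-0.5 * (i * dx) ** 2) * dx
                     for i in range(n)) / math.sqrt(2 * math.pi)

z0, h = 1.0, 1e-3
slope = (N(z0 + h) - N(z0 - h)) / (2 * h)
density = math.exp(-0.5 * z0**2) / math.sqrt(2 * math.pi)
print(slope, density)
```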

5.5 Notation for integrals

The value of a definite integral depends on the function and the end points, but not on the variable. So in principle one could write

I_a^b f = \int_a^b f(x)\, dx = \int_a^b f(t)\, dt. (5.42)

For instance, one could write

I_0^b \sin = \int_0^b \sin(x)\, dx = \int_0^b \sin(t)\, dt = 1 - \cos(b). (5.43)

The notation on the left without variables is very unusual. There is nothing mysterious about the fact that the variable of integration can be changed in this way. The situation is exactly the same for a sum

\sum_{n=0}^{p} r^n = \sum_{k=0}^{p} r^k = \frac{1 - r^{p+1}}{1 - r}. (5.44)

Chapter 6

Integration rules

6.1 Summary of integration rules

Here are the integration rules in the Leibniz notation. These are all rules for finding antiderivatives. The rule for integrating a power function with power p ≠ -1 is

\int x^p\, dx = \frac{x^{p+1}}{p + 1} + C. (6.1)

The rule for integrating the -1 power function for x ≠ 0 is

\int \frac{1}{x}\, dx = \ln(|x|) + C. (6.2)

(Often x > 0; then the absolute value is not needed.) The rule for integrating the exponential function is

\int e^x\, dx = e^x + C. (6.3)

The rule for integrating the sine function is

\int \sin(x)\, dx = -\cos(x) + C. (6.4)

The rule for integrating the cosine function is

\int \cos(x)\, dx = \sin(x) + C. (6.5)

Let u = f(x) and v = g(x). The rule for integrating a constant multiple is

\int cu\, dx = c \int u\, dx. (6.6)


The sum (difference) rule is

\int (u \pm v)\, dx = \int u\, dx \pm \int v\, dx.   (6.7)

The integration by parts rule is

\int u \frac{dv}{dx}\, dx = uv - \int v \frac{du}{dx}\, dx.   (6.8)

This rule applies to a product. However it just converts one integral into another integral. The hope is that the second one is easier. The integration by parts rule comes from the product rule for differentiation.

Suppose z = f(g(x)) and set z = f(u) and u = g(x). The substitution rule is

\int f(u) \frac{du}{dx}\, dx = \int f(u)\, du.   (6.9)

After integration, substitute u = g(x) on the right hand side. This rule only applies to a product of a rather special form. The substitution rule comes from the chain rule applied to y = F(u) = F(g(x)), where F'(u) = f(u). It can also be written

\int f(g(x)) g'(x)\, dx = \int f(u)\, du,   (6.10)

where u = g(x). That is, to perform the integral, one must express u in terms of x in such a way that the integrand has this special form.

Sometimes the substitution rule works well in conjunction with inverse functions. Then instead of expressing a new variable w in terms of x, the original variable x is expressed in terms of w, and so

\int f(x)\, dx = \int f(x) \frac{dx}{dw} \frac{dw}{dx}\, dx = \int f(x) \frac{dx}{dw}\, dw.   (6.11)

If the inverse function is x = h(w) this is

\int f(x)\, dx = \int f(h(w)) h'(w)\, dw,   (6.12)

that is, it is the substitution rule run backward. All of these variants are handled automatically when one uses differential forms, as we shall see.
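Both sides of the integration by parts rule can be evaluated numerically when the rule is applied in definite form over an interval, ∫_a^b u (dv/dx) dx = [uv]_a^b − ∫_a^b v (du/dx) dx. In this sketch the choices u = x, dv/dx = e^x, and the interval [0, 1] are arbitrary; here both sides equal e − (e − 1) = 1.

```python
import math

def midpoint(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / steps
    return sum(f(a + (k + 0.5) * dx) for k in range(steps)) * dx

# Integration by parts, equation (6.8), with u = x and dv/dx = e**x:
# integral of x*e**x dx  =  x*e**x  -  integral of e**x dx.
a, b = 0.0, 1.0
lhs = midpoint(lambda x: x * math.exp(x), a, b)
rhs = (b * math.exp(b) - a * math.exp(a)) - midpoint(math.exp, a, b)
print(abs(lhs - rhs) < 1e-8)   # True
print(abs(lhs - 1.0) < 1e-6)   # True, since e - (e - 1) = 1
```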

6.2 Definite integrals by substitution

The substitution rule for definite integrals is

\int_a^b f(g(x)) g'(x)\, dx = \int_{g(a)}^{g(b)} f(u)\, du.   (6.13)
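The two sides of the substitution rule for definite integrals can be compared numerically. In this sketch the choices f(u) = u³ and g(x) = sin(x) on [0, π/2] are arbitrary; both sides should equal ∫₀¹ u³ du = 1/4.

```python
import math

def midpoint(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / steps
    return sum(f(a + (k + 0.5) * dx) for k in range(steps)) * dx

# Equation (6.13) with f(u) = u**3 and g(x) = sin(x), so g'(x) = cos(x),
# g(0) = 0, and g(pi/2) = 1.
lhs = midpoint(lambda x: math.sin(x)**3 * math.cos(x), 0.0, math.pi / 2)
rhs = midpoint(lambda u: u**3, 0.0, 1.0)
print(abs(lhs - 0.25) < 1e-8)   # True
print(abs(rhs - 0.25) < 1e-8)   # True
```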

6.3 Differential forms

A function is an object f with a numerical input and a numerical output. However applications of calculus also deal with variable quantities that are related in several different ways. For instance, one might have w = f(u) and u = g(x). Then x and u and w are each a variable quantity. In this example they are related by the functions g and f.

In a problem there may be a variable quantity t such that all other variable quantities of interest may be expressed as differentiable functions of t, with continuous derivatives. Such a quantity is called a coordinate. If t is a coordinate, then other variable quantities such as w and u may be expressed in terms of t. Thus dw/dt and du/dt are defined.

A coordinate need not be unique. If s and t are two coordinates, then s = h(t) and t = h^{-1}(s), where h^{-1} is the inverse function of h. Thus ds/dt and dt/ds are defined, and

\frac{ds}{dt} \frac{dt}{ds} = 1.   (6.14)

In particular ds/dt ≠ 0 and dt/ds ≠ 0.

A differential form is an assignment of a variable quantity to each coordinate system, in such a way that the variable quantity associated with coordinate t is related to the variable quantity associated to coordinate s by multiplication by ds/dt.

Let u be a variable quantity, so f(u) is also a variable quantity. An example of a differential form is f(u) du. For each coordinate there is a corresponding variable quantity. With the t coordinate the variable quantity is f(u) du/dt, and with the s coordinate the variable quantity is f(u) du/ds. These variable quantities are related by

f(u) \frac{du}{dt} = f(u) \frac{du}{ds} \frac{ds}{dt}.   (6.15)

That is, one is obtained from the other by multiplication with ds/dt or dt/ds.

Note: In advanced mathematics there are various technical names for differential forms. Sometimes they are called differential 1-forms, sometimes they are called covariant vector fields. In this part of elementary calculus we are looking at these objects only in the one dimensional case, when every variable quantity may be expressed in terms of a single coordinate.

Each variable quantity w has a differential which is a differential form. Thus if w = F(u) and u = g(x), then the differential of w is dw = F'(u) du = F'(g(x)) g'(x) dx. It does not make sense to say a differential form has a particular non-zero numerical value, but it does make sense to say that it is zero at a certain point.

Example. Suppose that t is a coordinate for the quantities of interest, so we can always take derivatives with respect to t. Then it is natural to say that dt ≠ 0. If w = u^2 = \sin^2(t), then w is not a coordinate, and in fact, dw can vanish at certain points. Thus

dw = 2u\, du = 2 \sin(t) \cos(t)\, dt = \sin(2t)\, dt

vanishes where u = 0 or du = 0, that is, where t is a multiple of π/2. These are critical points of w.
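The coordinate-change relation for a differential form can be tested numerically at a point. In this sketch the coordinates t and s = e^t (so ds/dt ≠ 0), the quantity u = sin(t), and f(u) = u² are arbitrary choices, and derivatives are approximated by central differences.

```python
import math

def d(fn, x, h=1e-6):
    """Central-difference approximation to the derivative of fn at x."""
    return (fn(x + h) - fn(x - h)) / (2 * h)

t0 = 0.7
s0 = math.exp(t0)            # the same point described in the s coordinate

u_of_t = math.sin
u_of_s = lambda s: math.sin(math.log(s))   # u expressed in the s coordinate
f = lambda u: u**2

lhs = f(u_of_t(t0)) * d(u_of_t, t0)                    # f(u) du/dt
rhs = f(u_of_s(s0)) * d(u_of_s, s0) * d(math.exp, t0)  # f(u) du/ds * ds/dt
print(abs(lhs - rhs) < 1e-6)   # True
```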

6.4 Second differentials

Say that t is a coordinate. Since all variable quantities of interest may be expressed as differentiable functions of t, we may consider dt ≠ 0. Consider another coordinate s. Differentiate a variable quantity w with respect to these coordinates. The derivatives are related by

\frac{dw}{dt} = \frac{dw}{ds} \frac{ds}{dt}.   (6.16)

Differentiate again with respect to t and use the chain rule. This gives

\frac{d^2w}{dt^2} = \frac{d^2w}{ds^2} \left( \frac{ds}{dt} \right)^2 + \frac{dw}{ds} \frac{d^2s}{dt^2}.   (6.17)

In general we may have d^2w/dt^2 positive or negative or zero quite independent of whether d^2w/ds^2 is positive or negative or zero. However, at a point where dw/ds = 0 we have

\frac{d^2w}{dt^2} = \frac{d^2w}{ds^2} \left( \frac{ds}{dt} \right)^2,   (6.18)

where

\left( \frac{ds}{dt} \right)^2 > 0.   (6.19)

It does not make sense to say that the second differential d^2w has a particular numerical value, but at a particular point where dw = 0 it does make sense to say that d^2w > 0 or d^2w < 0 or d^2w = 0 at that point. This is because the numerical value of the second differential d^2w with respect to coordinate t is related to the numerical value of the second differential d^2w with respect to coordinate s by multiplication by (ds/dt)^2 > 0.

To summarize, consider a variable quantity w. Then at a point where dw = 0 (first derivative test) the second differential d^2w is defined, and d^2w can be either positive, negative, or zero (second derivative test).

Note: The second differential is defined only at a particular point where the first differential is equal to zero. In advanced mathematics this second differential is called the Hessian. Here we are only looking at the Hessian in the very special case when every variable quantity may be expressed in terms of a single coordinate.

Example. Suppose w = u^2 = \sin^2(t). Then dw = \sin(2t)\, dt = 0 where t is a multiple of π/2. At such a point where dw = 0 we have d^2w = 2\cos(2t)\,(dt)^2. This is greater than zero when t is an even multiple of π/2 and is less than zero when t is an odd multiple of π/2. In the first case these are points that give local minima of w, and in the second case they are points that give local maxima of w.

Example: Consider the example of minimizing the area

A = 2(\pi r^2) + (2\pi r) h   (6.20)

of a cylinder with fixed value of the volume

V = (\pi r^2) h.   (6.21)

Both r and h are coordinates. Calculate

dA = 2\pi \left( (2r + h)\, dr + r\, dh \right).   (6.22)

Since V is constant, we have

2rh\, dr + r^2\, dh = 0.   (6.23)

Thus

dA = 2\pi (2r - h)\, dr = \pi r \left( 1 - \frac{2r}{h} \right) dh.   (6.24)

At the point where dA = 0 we have

h = 2r. (6.25)

The second differential at this point is

d^2A = 12\pi\, (dr)^2 = \frac{3}{4} \pi\, (dh)^2 > 0.   (6.26)

This indicates a local minimum.
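The conclusion h = 2r can be confirmed by a brute-force scan over radii at fixed volume; the volume V = 1 and the grid resolution in this sketch are arbitrary choices.

```python
import math

# For fixed volume V, the area of the cylinder as a function of r alone,
# using h = V / (pi * r**2) from equation (6.21).
V = 1.0

def area(r):
    h = V / (math.pi * r**2)
    return 2 * math.pi * r**2 + 2 * math.pi * r * h

# Scan a grid of radii and take the one with the smallest area.
radii = [0.01 + 0.0001 * k for k in range(20_000)]
r_best = min(radii, key=area)
h_best = V / (math.pi * r_best**2)

# At the minimum, equation (6.25) says h = 2r.
print(abs(h_best - 2 * r_best) < 0.01)   # True
```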

6.5 Differential forms and indefinite integrals

Differential forms occur naturally as integrands. For example, if u = g(x), then du = g'(x)\, dx. This gives the substitution rule

\int f(g(x)) g'(x)\, dx = \int f(u)\, du.   (6.27)

Integration of a differential recovers the original variable quantity. Thus

\int dw = w + C.   (6.28)

Example: Since u = x^2 implies du = 2x\, dx and d\cos(u) = -\sin(u)\, du, we have

\int \sin(x^2)\, 2x\, dx = \int \sin(u)\, du = -\int d\cos(u) = -\cos(u) + C = -\cos(x^2) + C.   (6.29)
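Since the antiderivative of sin(x²)·2x is −cos(x²), the definite integral from 0 to b must equal −cos(b²) + cos(0) = 1 − cos(b²). A midpoint-rule check, with an arbitrary choice of b and step count:

```python
import math

def midpoint(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dx = (b - a) / steps
    return sum(f(a + (k + 0.5) * dx) for k in range(steps)) * dx

# The integrand of (6.29), integrated from 0 to b.
b = 1.5
numeric = midpoint(lambda x: math.sin(x**2) * 2 * x, 0.0, b)
exact = 1.0 - math.cos(b**2)
print(abs(numeric - exact) < 1e-8)   # True
```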

6.6 Differential forms and definite integrals

To use differential forms in definite integrals the limits of integration should be thought of as points rather than as numbers. A point may be specified by the value of a coordinate. If x is a coordinate, then x = a and x = b are points. Then

\int_{x=a}^{x=b} f(g(x)) g'(x)\, dx = \int_{u=p}^{u=q} f(u)\, du,   (6.30)

where p = g(a) and q = g(b). The conditions u = p and u = q may not uniquely specify points defined by x values, but if we are only concerned with variable quantities that depend on u, then they determine the value of the integral. Such a notation allows hybrid expressions in integrals like

\int_{x=a}^{x=b} f(g(x)) g'(x)\, dx = \int_{x=a}^{x=b} f(u)\, du = \int_{x=a}^{x=b} dw = w\big|_{x=b} - w\big|_{x=a},   (6.31)

where dw = f(u)\, du.

Example: Suppose x is a coordinate. Then

\int_{x=0}^{x=b} \sin(x^2)\, 2x\, dx = \int_{x=0}^{x=b} \sin(u)\, du = -\int_{x=0}^{x=b} d\cos(u) = 1 - \cos(b^2).   (6.32)

6.7 Rules for differentials

The rule for differentiating a constant is

dc = 0. (6.33)

The rule for differentiating a power function is

d x^p = p x^{p-1}\, dx.   (6.34)

The rule for differentiating the exponential function is

d e^u = e^u\, du.   (6.35)

The rule for differentiating the sine function is

d sin(θ) = cos(θ) dθ. (6.36)

The rule for differentiating the cosine function is

d cos(θ) = − sin(θ) dθ. (6.37)

The sum (difference) rule is

d(u ± v) = du ± dv.   (6.38)

The product rule is

d(uv) = du\, v + u\, dv.   (6.39)

The quotient rule is

d\frac{u}{v} = \frac{du\, v - u\, dv}{v^2}.   (6.40)

The chain rule is

df(u) = f'(u)\, du.   (6.41)

6.8 Rules for integration

The fundamental theorem of calculus says that

\int dw = w + C.   (6.42)

The rule for integrating a constant multiple is

\int c\, u\, dx = c \int u\, dx.   (6.43)

The sum (difference) rule is

\int (u \pm v)\, dx = \int u\, dx \pm \int v\, dx.   (6.44)

The integration by parts rule is

\int u\, dv = uv - \int v\, du.   (6.45)

Suppose u = g(x). Since du = g'(x)\, dx, the substitution rule is

\int f(g(x))\, g'(x)\, dx = \int f(u)\, du.   (6.46)

6.9 Integration examples

With u = x^4 + 5 we have du = 4x^3\, dx. So

\int x^3 \cos(x^4 + 5)\, dx = \frac{1}{4} \int \cos(u)\, du = \frac{1}{4} \int d\sin(u) = \frac{1}{4} \sin(u) + C = \frac{1}{4} \sin(x^4 + 5) + C.   (6.47)

Since w = 1 + \sqrt{x} is inverted by x = (w - 1)^2, we have dx = 2(w - 1)\, dw, and so

\int \sqrt{1 + \sqrt{x}}\, dx = \int \sqrt{w}\, 2(w - 1)\, dw = \frac{4}{5} w^{5/2} - \frac{4}{3} w^{3/2} + C.   (6.48)

To finish, substitute w = 1 + \sqrt{x} back in terms of x.
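The antiderivative found in (6.48) can be checked by differentiating it numerically; the test point x = 2 in this sketch is arbitrary.

```python
import math

def G(x):
    """The antiderivative from (6.48), with w = 1 + sqrt(x) substituted back."""
    w = 1 + math.sqrt(x)
    return (4 / 5) * w**2.5 - (4 / 3) * w**1.5

# A central difference of G should recover the integrand sqrt(1 + sqrt(x)).
x, h = 2.0, 1e-6
deriv = (G(x + h) - G(x - h)) / (2 * h)
print(abs(deriv - math.sqrt(1 + math.sqrt(x))) < 1e-6)   # True
```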

The preceding problem could also be done mindlessly by w = 1 + \sqrt{x} and dw = \frac{1}{2\sqrt{x}}\, dx. Then

\int \sqrt{1 + \sqrt{x}}\, dx = \int \sqrt{w}\, 2\sqrt{x}\, dw = 2 \int \sqrt{w}\, (w - 1)\, dw   (6.49)

is completed as before.

One learns from experience that quadratic expressions are simplified by trigonometric substitution. Thus x^2 + 9 is simplified by the substitution x = 3\tan(\theta). This gives

\int \frac{1}{x^2 + 9}\, dx = \int \frac{1}{9(\tan^2(\theta) + 1)} \frac{3}{\cos^2(\theta)}\, d\theta = \int \frac{1}{3}\, d\theta = \frac{1}{3} \theta + C = \frac{1}{3} \arctan\left( \frac{x}{3} \right) + C.   (6.50)

Say x > 0. Then the exponential substitution x = e^u gives

\int \frac{1}{x}\, dx = \int du = u + C = \ln(x) + C.   (6.51)

Let me conclude by thanking Ali Vafaei for help in getting errors out of these notes. Those that remain are mine.