Lecture 10 : Taylor's Theorem
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
Optimization and Gradient Descent INFO-4604, Applied Machine Learning University of Colorado Boulder
September 11, 2018. Prof. Michael Paul.
Prediction Functions: Remember: a prediction function is the function that predicts what the output should be, given the input. Linear regression: $f(x) = w^T x + b$. Linear classification (perceptron): $f(x) = 1$ if $w^T x + b \ge 0$, and $f(x) = -1$ if $w^T x + b < 0$. Need to learn what $w$ should be!
Learning Parameters: Goal is to learn to minimize error. Ideally: true error. Instead: training error. The loss function gives the training error when using parameters $w$, denoted $L(w)$. It is also called the cost function. More general: objective function (in general the objective could be to minimize or maximize; with loss/cost functions, we want to minimize). Goal is to minimize the loss function. How do we minimize a function? Let's review some math.
Rate of Change: The slope of a line is also called the rate of change of the line, for example $y = \tfrac{1}{2}x + 1$ ("rise" over "run"). For nonlinear functions, the "rise over run" formula gives you the average rate of change between two points. For $f(x) = x^2$, the average slope from $x = -1$ to $x = 0$ is $-1$. There is also a concept of rate of change at individual points (rather than two points). For $f(x) = x^2$, the slope at $x = -1$ is $-2$. The slope at a point is called the derivative at that point. Intuition: measure the slope between two points that are really close together.
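The two slopes quoted for f(x) = x^2 in this excerpt can be checked numerically. The following is a minimal sketch in Python; the helper names `average_slope` and `approx_derivative` and the step size `h` are illustrative assumptions, not part of the slides.

```python
def f(x):
    """The example function from the excerpt: f(x) = x^2."""
    return x ** 2


def average_slope(func, x0, x1):
    """Rise over run between two distinct points x0 and x1."""
    return (func(x1) - func(x0)) / (x1 - x0)


def approx_derivative(func, x, h=1e-6):
    """Slope between two points very close to x (central difference)."""
    return average_slope(func, x - h, x + h)


# Average slope from x = -1 to x = 0.
print(average_slope(f, -1.0, 0.0))
# Slope measured between two points really close to x = -1.
print(approx_derivative(f, -1.0))
```

Shrinking `h` is the "two points that are really close together" intuition: the average slope over a tiny interval approaches the derivative.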
Calculus Terminology
AP Calculus BC: Calculus Terminology.
Absolute Convergence, Absolute Maximum, Absolute Minimum, Absolutely Convergent, Acceleration, Alternating Series, Alternating Series Remainder, Alternating Series Test, Analytic Methods, Annulus, Antiderivative of a Function, Approximation by Differentials, Arc Length of a Curve, Area below a Curve, Area between Curves, Area of an Ellipse, Area of a Parabolic Segment, Area under a Curve, Area Using Parametric Equations, Area Using Polar Coordinates.
Asymptote, Average Rate of Change, Average Value of a Function, Axis of Rotation, Boundary Value Problem, Bounded Function, Bounded Sequence, Bounds of Integration, Calculus, Cartesian Form, Cavalieri's Principle, Center of Mass Formula, Centroid, Chain Rule, Comparison Test, Concave, Concave Down, Concave Up, Conditional Convergence, Constant Term.
Continued Sum, Continuous Function, Continuously Differentiable Function, Converge, Converge Absolutely, Converge Conditionally, Convergence Tests, Convergent Sequence, Convergent Series, Critical Number, Critical Point, Critical Value, Curly d, Curve, Curve Sketching, Cusp, Cylindrical Shell Method, Decreasing Function, Definite Integral, Definite Integral Rules.
Degenerate, Del Operator, Deleted Neighborhood, Derivative, Derivative of a Power Series, Derivative Rules, Difference Quotient, Differentiable, Differential
Divergent Series, e, Ellipsoid, End Behavior, Essential Discontinuity, Explicit Differentiation, Explicit Function, Exponential Decay.
Function Operations, Fundamental Theorem of Calculus, GLB, Global Maximum, Global Minimum, Golden Spiral, Graphic Methods, Greatest Lower Bound.
Maxima and Minima
Basic Mathematics: Maxima and Minima. R Horan & M Lavelle. The aim of this document is to provide a short, self-assessment programme for students who wish to be able to use differentiation to find maxima and minima of functions. Copyright © 2004. Last Revision Date: May 5, 2005. Version 1.0.
Table of Contents: 1. Rules of Differentiation; 2. Derivatives of order 2 (and higher); 3. Maxima and Minima; 4. Quiz on Max and Min; Solutions to Exercises; Solutions to Quizzes.
1. Rules of Differentiation. Throughout this package the following rules of differentiation will be assumed. (In the table of derivatives below, $a$ is an arbitrary, non-zero constant.)
$y$: $ax^n$, $\sin(ax)$, $\cos(ax)$, $e^{ax}$, $\ln(ax)$
$\frac{dy}{dx}$: $nax^{n-1}$, $a\cos(ax)$, $-a\sin(ax)$, $ae^{ax}$, $\frac{1}{x}$
If $a$ is any constant and $u$, $v$ are two functions of $x$, then $\frac{d}{dx}(u + v) = \frac{du}{dx} + \frac{dv}{dx}$ and $\frac{d}{dx}(au) = a\frac{du}{dx}$.
The stationary points of a function are those points where the gradient of the tangent (the derivative of the function) is zero.
Example 1. Find the stationary points of the functions (a) $f(x) = 3x^2 + 2x - 9$, (b) $f(x) = x^3 - 6x^2 + 9x - 2$.
Solution. (a) If $y = 3x^2 + 2x - 9$ then $\frac{dy}{dx} = 3 \times 2x^{2-1} + 2 = 6x + 2$. The stationary points are found by solving the equation $\frac{dy}{dx} = 6x + 2 = 0$. In this case there is only one solution, $x = -1/3$.
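The recipe in Example 1 (differentiate, then solve derivative = 0) can be sketched in Python for low-degree polynomials. The coefficient-list representation and the helper names `derivative_coeffs`, `real_roots`, and `stationary_points` are assumptions made for this illustration; the excerpt itself only works part (a) by hand.

```python
import math


def derivative_coeffs(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...], where c_k multiplies x^k."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def real_roots(coeffs):
    """Real roots of a linear [c0, c1] or quadratic [c0, c1, c2] polynomial."""
    if len(coeffs) == 2:
        c0, c1 = coeffs
        return [-c0 / c1]
    c0, c1, c2 = coeffs
    disc = c1 * c1 - 4 * c2 * c0
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted({(-c1 - root) / (2 * c2), (-c1 + root) / (2 * c2)})


def stationary_points(coeffs):
    """x-values where the derivative of the polynomial is zero."""
    return real_roots(derivative_coeffs(coeffs))


# (a) f(x) = 3x^2 + 2x - 9, derivative 6x + 2
print(stationary_points([-9, 2, 3]))
# (b) f(x) = x^3 - 6x^2 + 9x - 2, derivative 3x^2 - 12x + 9
print(stationary_points([-2, 9, -6, 1]))
```

The sketch only handles derivatives of degree one or two, which is enough for both parts of Example 1.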
Infinitesimal Calculus
Δy ΔxΔy and “cannot stand” Δx
• Derivative of the sum/difference of two functions: $(x + \Delta x) \pm (y + \Delta y) = (x \pm y) + (\Delta x \pm \Delta y)$ ∴ we have a change of $\Delta x \pm \Delta y$.
• Derivative of the product of two functions: $(x + \Delta x)(y + \Delta y) = xy + y\,\Delta x + x\,\Delta y + \Delta x\,\Delta y$ ∴ we have a change of $y\,\Delta x + x\,\Delta y$.
• Derivative of the product of three functions: $(x + \Delta x)(y + \Delta y)(z + \Delta z) = xyz + yz\,\Delta x + xz\,\Delta y + xy\,\Delta z + x\,\Delta y\,\Delta z + y\,\Delta x\,\Delta z + z\,\Delta x\,\Delta y + \Delta x\,\Delta y\,\Delta z$ ∴ we have a change of $yz\,\Delta x + xz\,\Delta y + xy\,\Delta z$.
• Derivative of the quotient of two functions: Let $u = \frac{x}{y}$. Then by the product rule above, $yu = x$ yields $u\,\Delta y + y\,\Delta u = \Delta x$. Substituting for $u$ its value, we have $\frac{x\,\Delta y}{y} + y\,\Delta u = \Delta x$. Finding the value of $\Delta u$, we have $\Delta u = \frac{y\,\Delta x - x\,\Delta y}{y^2}$.
• Derivative of a power function (and the “chain rule”): Let $y = x^m$. ∴ $y = x \cdot x \cdot x \cdots x$ ($m$ times). By a generalization of the product rule, $\Delta y = x^{m-1}\Delta x + x^{m-1}\Delta x + \cdots + x^{m-1}\Delta x$ ($m$ terms). ∴ we have $\Delta y = m x^{m-1}\Delta x$.
• Derivative of the logarithmic function: Let $y = x^n$, $n$ being constant. Then $\log y = n \log x$. Differentiating $y = x^n$, we have $dy = n x^{n-1}\,dx$, or $n = \frac{dy}{x^{n-1}\,dx} = \frac{dy}{\frac{y}{x}\,dx} = \frac{dy/y}{dx/x}$, since $x^{n-1} = \frac{y}{x}$. Again, whatever the differentials of $\log x$ and $\log y$ are, we have $d(\log y) = n \cdot d(\log x)$, or $n = \frac{d(\log y)}{d(\log x)}$. Placing these values of $n$ equal to each other, we obtain $\frac{dy/y}{dx/x} = \frac{d(\log y)}{d(\log x)}$.
Calculus Formulas and Theorems
Formulas and Theorems for Reference
I. Trigonometric Formulas
1. $\sin^2\theta + \cos^2\theta = 1$
2. $1 + \tan^2\theta = \sec^2\theta$
3. $1 + \cot^2\theta = \csc^2\theta$
4. $\sin(-\theta) = -\sin\theta$
5. $\cos(-\theta) = \cos\theta$
6. $\tan(-\theta) = -\tan\theta$
7. $\sin(A + B) = \sin A\cos B + \sin B\cos A$
8. $\sin(A - B) = \sin A\cos B - \sin B\cos A$
9. $\cos(A + B) = \cos A\cos B - \sin A\sin B$
10. $\cos(A - B) = \cos A\cos B + \sin A\sin B$
11. $\sin 2\theta = 2\sin\theta\cos\theta$
12. $\cos 2\theta = \cos^2\theta - \sin^2\theta = 2\cos^2\theta - 1 = 1 - 2\sin^2\theta$
13. $\tan\theta = \frac{\sin\theta}{\cos\theta} = \frac{1}{\cot\theta}$
14. $\cot\theta = \frac{\cos\theta}{\sin\theta} = \frac{1}{\tan\theta}$
15. $\sec\theta = \frac{1}{\cos\theta}$
16. $\csc\theta = \frac{1}{\sin\theta}$
17. $\cos\left(\frac{\pi}{2} - \theta\right) = \sin\theta$
18. $\sin\left(\frac{\pi}{2} - \theta\right) = \cos\theta$
II. Differentiation Formulas
$\frac{d}{dx}(x^n) = nx^{n-1}$
$\frac{d}{dx}(fg) = fg' + gf'$
$\frac{d}{dx}\left(\frac{f}{g}\right) = \frac{gf' - fg'}{g^2}$
$\frac{d}{dx}f(g(x)) = f'(g(x))\,g'(x)$
$\frac{d}{dx}(\sin x) = \cos x$
$\frac{d}{dx}(\cos x) = -\sin x$
$\frac{d}{dx}(\tan x) = \sec^2 x$
$\frac{d}{dx}(\cot x) = -\csc^2 x$
$\frac{d}{dx}(\sec x) = \sec x\tan x$
$\frac{d}{dx}(\csc x) = -\csc x\cot x$
$\frac{d}{dx}(e^x) = e^x$
$\frac{d}{dx}(a^x) = a^x\ln a$
$\frac{d}{dx}(\ln x) = \frac{1}{x}$
$\frac{d}{dx}(\operatorname{Arcsin} x) = \frac{1}{\sqrt{1 - x^2}}$
$\frac{d}{dx}(\operatorname{Arctan} x) = \frac{1}{1 + x^2}$
III. Integration Formulas
1. $\int a\,dx = ax + C$
2. $\int \frac{1}{x}\,dx = \ln|x| + C$
3. $\int e^x\,dx = e^x + C$
4. $\int a^x\,dx = \frac{a^x}{\ln a} + C$
5. $\int \ln x\,dx = x\ln x - x + C$
6. $\int \sin x\,dx = -\cos x + C$
7. $\int \cos x\,dx = \sin x + C$
8. $\int \tan x\,dx = \ln|\sec x| + C$ or $-\ln|\cos x| + C$
9.
Calculus 120, Section 7.3 Maxima & Minima of Multivariable Functions
Calculus 120, section 7.3: Maxima & Minima of Multivariable Functions. Notes by Tim Pilachowski.
A quick note to start: If you're at all unsure about the material from 7.1 and 7.2, now is the time to go back to review it, and get some help if necessary. You'll need all of it for the next three sections.
Just like we had a first-derivative test and a second-derivative test to find maxima and minima of functions of one variable, we'll use versions of first partial and second partial derivatives to determine maxima and minima of functions of more than one variable.
The first derivative test for functions of more than one variable looks very much like the first derivative test we have already used: If $f(x, y, z)$ has a relative maximum or minimum at a point $(a, b, c)$ then all partial derivatives will equal 0 at that point. That is, $\frac{\partial f}{\partial x}(a, b, c) = 0$, $\frac{\partial f}{\partial y}(a, b, c) = 0$, $\frac{\partial f}{\partial z}(a, b, c) = 0$.
Example A: Find the possible values of $x$, $y$, and $z$ at which $f(x, y, z) = x^2 + 2y^3 + 3z^2 + 4x - 6y + 9$ has a relative maximum or minimum. Answers: $(-2, -1, 0)$; $(-2, 1, 0)$.
Example B: Find the possible values of $x$ and $y$ at which $f(x, y) = x^2 + 4xy + y^2 - 12y$ has a relative maximum or minimum. Answer: $(4, -2)$; $z = 12$.
Notice that the examples above asked for possible values. The first derivative test by itself is inconclusive. The second derivative test for functions of more than one variable is a good bit more complicated than the one used for functions of one variable.
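For Example B, setting both first partial derivatives to zero gives a 2-by-2 linear system: 2x + 4y = 0 and 4x + 2y - 12 = 0. A minimal Python sketch of that calculation follows; the helper `solve_2x2` (Cramer's rule) is an assumption introduced for illustration and is not taken from the notes.

```python
def solve_2x2(a11, a12, b1, a21, a22, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("system has no unique solution")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - b1 * a21) / det
    return x, y


def f(x, y):
    """Example B: f(x, y) = x^2 + 4xy + y^2 - 12y."""
    return x ** 2 + 4 * x * y + y ** 2 - 12 * y


# f_x = 2x + 4y = 0 and f_y = 4x + 2y - 12 = 0
x, y = solve_2x2(2, 4, 0, 4, 2, 12)
print((x, y), f(x, y))
```

This only locates the candidate point; as the notes say, the first derivative test alone does not decide whether it is a maximum, a minimum, or neither.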
Mean Value, Taylor, and All That
Ambar N. Sengupta, Louisiana State University, November 2009. Careful: Not proofread!
Derivative. Recall the definition of the derivative of a function $f$ at a point $p$: $f'(p) = \lim_{w \to p} \frac{f(w) - f(p)}{w - p}$ (1).
Thus, to say that $f'(p) = 3$ means that if we take any neighborhood $U$ of 3, say the interval $(1, 5)$, then the ratio $\frac{f(w) - f(p)}{w - p}$ falls inside $U$ when $w$ is close enough to $p$, i.e. in some neighborhood of $p$. (Of course, we can't let $w$ be equal to $p$, because of the $w - p$ in the denominator.)
So if $f'(p) = 3$ then the ratio $\frac{f(w) - f(p)}{w - p}$ lies in $(1, 5)$ when $w$ is close enough to $p$, i.e. in some neighborhood of $p$, but not equal to $p$. In particular, $\frac{f(w) - f(p)}{w - p} > 0$ if $w$ is close enough to $p$, but $\neq p$.
• when $w > p$, but near $p$, the value $f(w)$ is $> f(p)$.
• when $w < p$, but near $p$, the value $f(w)$ is $< f(p)$.
From $f'(p) = 3$ we found that $\frac{f(w) - f(p)}{w - p} > 0$ if $w$ is close enough to $p$, but $\neq p$.
Integration by Parts
Integration By Parts Formula: $\int u\,dv = uv - \int v\,du$
I. Guidelines for Selecting $u$ and $dv$: (There are always exceptions, but these are generally helpful.) “L-I-A-T-E”: Choose $u$ to be the function that comes first in this list:
L: Logarithmic Function
I: Inverse Trig Function
A: Algebraic Function
T: Trig Function
E: Exponential Function
Example A: $\int x^3 \ln x\,dx$
Since $\ln x$ is a logarithmic function and $x^3$ is an algebraic function, let $u = \ln x$ (L comes before A in LIATE) and $dv = x^3\,dx$, so that $du = \frac{1}{x}\,dx$ and $v = \int x^3\,dx = \frac{x^4}{4}$. Then
$\int x^3 \ln x\,dx = uv - \int v\,du = (\ln x)\frac{x^4}{4} - \int \frac{x^4}{4}\cdot\frac{1}{x}\,dx = (\ln x)\frac{x^4}{4} - \frac{1}{4}\int x^3\,dx = (\ln x)\frac{x^4}{4} - \frac{1}{4}\cdot\frac{x^4}{4} + C = (\ln x)\frac{x^4}{4} - \frac{x^4}{16} + C$ ANSWER
www.rit.edu/asc
Example B: $\int \sin x \ln(\cos x)\,dx$
Let $u = \ln(\cos x)$ (Logarithmic Function) and $dv = \sin x\,dx$ (Trig Function; L comes before T in LIATE), so that $du = \frac{1}{\cos x}(-\sin x)\,dx = -\tan x\,dx$ and $v = \int \sin x\,dx = -\cos x$. Then
$\int \sin x \ln(\cos x)\,dx = uv - \int v\,du = (\ln(\cos x))(-\cos x) - \int (-\cos x)(-\tan x)\,dx = -\cos x \ln(\cos x) - \int (\cos x)\frac{\sin x}{\cos x}\,dx = -\cos x \ln(\cos x) - \int \sin x\,dx = -\cos x \ln(\cos x) + \cos x + C$ ANSWER
Example C: $\int \sin^{-1} x\,dx$
At first it appears that integration by parts does not apply, but let $u = \sin^{-1} x$ (Inverse Trig Function) and $dv = 1\,dx$ (Algebraic Function), so that $du = \frac{1}{\sqrt{1 - x^2}}\,dx$ and $v = \int 1\,dx = x$. Then
$\int \sin^{-1} x\,dx = uv - \int v\,du = (\sin^{-1} x)(x) - \int \frac{x}{\sqrt{1 - x^2}}\,dx = x\sin^{-1} x - \left(-\frac{1}{2}\right)\int (1 - x^2)^{-1/2}(-2x)\,dx = x\sin^{-1} x + \frac{1}{2}(1 - x^2)^{1/2}(2) + C = x\sin^{-1} x + \sqrt{1 - x^2} + C$ ANSWER
II.
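One way to sanity-check an integration-by-parts answer is to differentiate it numerically and compare with the integrand. The sketch below does this in Python for Example A; the function names and the sample points are assumptions chosen for illustration.

```python
import math


def integrand(x):
    """Integrand of Example A: x^3 * ln(x), defined for x > 0."""
    return x ** 3 * math.log(x)


def antiderivative(x):
    """Answer of Example A without the constant: (x^4 / 4) ln(x) - x^4 / 16."""
    return (x ** 4 / 4) * math.log(x) - x ** 4 / 16


def central_difference(func, x, h=1e-5):
    """Numerical derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


# The derivative of the answer should match the integrand at each sample point.
for x in (0.5, 1.0, 2.0, 3.0):
    print(x, integrand(x), central_difference(antiderivative, x))
```

Agreement at a handful of points is not a proof, but a sign error or a wrong constant factor in the answer would show up immediately.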
Math 105: Multivariable Calculus Seventeenth Lecture (3/17/10)
Daily Summary, Fast Taylor Series, Critical Points and Extrema, Lagrange Multipliers.
Math 105: Multivariable Calculus, Seventeenth Lecture (3/17/10). Steven J Miller, Williams College. http://www.williams.edu/go/math/sjmiller/public_html/341/ Bronfman Science Center, Williams College, March 17, 2010.
Summary for the day: Fast Taylor Series. Critical Points and Extrema. Constrained Maxima and Minima.
Fast Taylor Series: Formula for Taylor Series.
Notation: $f$ a twice differentiable function.
Gradient: $\nabla f = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right)$.
Hessian: $Hf = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{pmatrix}$.
Second Order Taylor Expansion at $\vec{x}_0$: $f(\vec{x}_0) + (\nabla f)(\vec{x}_0) \cdot (\vec{x} - \vec{x}_0) + \frac{1}{2}(\vec{x} - \vec{x}_0)^T (Hf)(\vec{x}_0)(\vec{x} - \vec{x}_0)$, where $(\vec{x} - \vec{x}_0)^T$ is the row vector which is the transpose of $\vec{x} - \vec{x}_0$.
Example: Let $f(x, y) = \sin(x + y) + (x + 1)^3 y$ and $(x_0, y_0) = (0, 0)$. Then
$(\nabla f)(x, y) = \left(\cos(x + y) + 3(x + 1)^2 y,\ \cos(x + y) + (x + 1)^3\right)$
and
$(Hf)(x, y) = \begin{pmatrix} -\sin(x + y) + 6(x + 1)y & -\sin(x + y) + 3(x + 1)^2 \\ -\sin(x + y) + 3(x + 1)^2 & -\sin(x + y) \end{pmatrix}$,
so
$f(0, 0) = 0$, $(\nabla f)(0, 0) = (1, 2)$, $(Hf)(0, 0) = \begin{pmatrix} 0 & 3 \\ 3 & 0 \end{pmatrix}$,
which implies the second order Taylor expansion is
$0 + (1, 2) \cdot (x, y) + \frac{1}{2}(x, y)\begin{pmatrix} 0 & 3 \\ 3 & 0 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = x + 2y + \frac{1}{2}(x, y)\begin{pmatrix} 3y \\ 3x \end{pmatrix} = x + 2y + 3xy$.
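The gradient, Hessian, and second order expansion in this example can be cross-checked with finite differences. The sketch below uses only the Python standard library; the helper names `gradient`, `hessian`, and `taylor2` and the step sizes are assumptions for illustration, not part of the lecture.

```python
import math


def f(x, y):
    """Example function: f(x, y) = sin(x + y) + (x + 1)^3 * y."""
    return math.sin(x + y) + (x + 1) ** 3 * y


def gradient(func, x, y, h=1e-5):
    """Central-difference approximation of (f_x, f_y)."""
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return fx, fy


def hessian(func, x, y, h=1e-3):
    """Central-difference approximation of [[f_xx, f_xy], [f_xy, f_yy]]."""
    fxx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h ** 2
    fyy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h ** 2
    fxy = (func(x + h, y + h) - func(x + h, y - h)
           - func(x - h, y + h) + func(x - h, y - h)) / (4 * h ** 2)
    return [[fxx, fxy], [fxy, fyy]]


def taylor2(x, y):
    """Second order Taylor expansion at (0, 0) from the example: x + 2y + 3xy."""
    return x + 2 * y + 3 * x * y


print(gradient(f, 0.0, 0.0))   # compare with (1, 2)
print(hessian(f, 0.0, 0.0))    # compare with [[0, 3], [3, 0]]
print(f(0.01, 0.02), taylor2(0.01, 0.02))
```

Near the origin the function and its expansion differ only by third order terms, so the last line prints two numbers that agree to several decimal places.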
Derivative, Gradient, and Lagrange Multipliers
EE263, Prof. S. Boyd. Derivative, Gradient, and Lagrange Multipliers.
Derivative. Suppose $f : \mathbf{R}^n \to \mathbf{R}^m$ is differentiable. Its derivative or Jacobian at a point $x \in \mathbf{R}^n$ is denoted $Df(x) \in \mathbf{R}^{m \times n}$, defined as
$(Df(x))_{ij} = \left.\frac{\partial f_i}{\partial x_j}\right|_x$, $i = 1, \ldots, m$, $j = 1, \ldots, n$.
The first order Taylor expansion of $f$ at (or near) $x$ is given by $\hat{f}(y) = f(x) + Df(x)(y - x)$. When $y - x$ is small, $f(y) - \hat{f}(y)$ is very small. This is called the linearization of $f$ at (or near) $x$.
As an example, consider $n = 3$, $m = 2$, with
$f(x) = \begin{bmatrix} x_1 - x_2^2 \\ x_1 x_3 \end{bmatrix}$.
Its derivative at the point $x$ is
$Df(x) = \begin{bmatrix} 1 & -2x_2 & 0 \\ x_3 & 0 & x_1 \end{bmatrix}$,
and its first order Taylor expansion near $x = (1, 0, -1)$ is given by
$\hat{f}(y) = \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 1 \end{bmatrix}\left(y - \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}\right)$.
Gradient. For $f : \mathbf{R}^n \to \mathbf{R}$, the gradient at $x \in \mathbf{R}^n$ is denoted $\nabla f(x) \in \mathbf{R}^n$, and it is defined as $\nabla f(x) = Df(x)^T$, the transpose of the derivative. In terms of partial derivatives, we have
$\nabla f(x)_i = \left.\frac{\partial f}{\partial x_i}\right|_x$, $i = 1, \ldots, n$.
The first order Taylor expansion of $f$ at $x$ is given by $\hat{f}(y) = f(x) + \nabla f(x)^T (y - x)$.
Gradient of affine and quadratic functions. You can check the formulas below by working out the partial derivatives. For $f$ affine, i.e., $f(x) = a^T x + b$, we have $\nabla f(x) = a$ (independent of $x$). For $f$ a quadratic form, i.e., $f(x) = x^T P x$ with $P \in \mathbf{R}^{n \times n}$, we have $\nabla f(x) = (P + P^T)x$.
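The n = 3, m = 2 example can be evaluated directly to see how close the linearization is for a nearby point. This is a minimal Python sketch; the function names and the test point `y` are assumptions picked for illustration.

```python
def f(x):
    """Example map from R^3 to R^2: f(x) = (x1 - x2^2, x1 * x3)."""
    x1, x2, x3 = x
    return [x1 - x2 ** 2, x1 * x3]


def jacobian(x):
    """Derivative Df(x) of the example, entered by hand from the partial derivatives."""
    x1, x2, x3 = x
    return [[1.0, -2.0 * x2, 0.0],
            [x3, 0.0, x1]]


def linearization(x, y):
    """First order Taylor expansion f(x) + Df(x)(y - x)."""
    fx = f(x)
    J = jacobian(x)
    d = [yi - xi for xi, yi in zip(x, y)]
    return [fx[i] + sum(J[i][j] * d[j] for j in range(len(d)))
            for i in range(len(fx))]


x = [1.0, 0.0, -1.0]
y = [1.01, 0.02, -0.97]   # a point close to x
print(f(y))
print(linearization(x, y))
```

Moving `y` closer to `x` shrinks the gap between the two printed vectors roughly quadratically, which is the sense in which f(y) minus the linearization is "very small".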
Lecture 13: Extrema
Math S21a: Multivariable calculus. Oliver Knill, Summer 2013. Lecture 13: Extrema.
An important problem in multi-variable calculus is to extremize a function $f(x, y)$ of two variables. As in one dimension, in order to look for maxima or minima, we consider points where the "derivative" is zero.
A point $(a, b)$ in the plane is called a critical point of a function $f(x, y)$ if $\nabla f(a, b) = \langle 0, 0 \rangle$.
Critical points are candidates for extrema because at critical points, all directional derivatives $D_{\vec{v}} f = \nabla f \cdot \vec{v}$ are zero. We cannot increase the value of $f$ by moving in any direction. This definition does not include points where $f$ or its derivative is not defined. We usually assume that a function is arbitrarily often differentiable. Points where the function has no derivatives are not considered part of the domain and need to be studied separately. For the function $f(x, y) = |xy|$, for example, we would have to look at the points on the coordinate axes separately.
1. Find the critical points of $f(x, y) = x^4 + y^4 - 4xy + 2$. The gradient is $\nabla f(x, y) = \langle 4(x^3 - y), 4(y^3 - x) \rangle$ with critical points $(0, 0)$, $(1, 1)$, $(-1, -1)$.
2. $f(x, y) = \sin(x^2 + y) + y$. The gradient is $\nabla f(x, y) = \langle 2x\cos(x^2 + y), \cos(x^2 + y) + 1 \rangle$. For a critical point, we must have $x = 0$ and $\cos(y) + 1 = 0$, which means $y = \pi + 2k\pi$. The critical points are at $\ldots, (0, -\pi), (0, \pi), (0, 3\pi), \ldots$
3. The graph of $f(x, y) = (x^2 + y^2)e^{-x^2 - y^2}$ looks like a volcano.
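The critical points listed for the first example can be confirmed by plugging them into the gradient. A short Python sketch follows; the function name `grad_f` is an assumption for illustration.

```python
def grad_f(x, y):
    """Gradient of f(x, y) = x^4 + y^4 - 4xy + 2."""
    return (4 * (x ** 3 - y), 4 * (y ** 3 - x))


critical_points = [(0, 0), (1, 1), (-1, -1)]

# Each listed point should make both components of the gradient vanish.
for point in critical_points:
    print(point, grad_f(*point))

# A point that is not critical, for contrast.
print((1, 0), grad_f(1, 0))
```

The check confirms that these three points are critical; it does not by itself show that there are no others.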
Chain Rule & Implicit Differentiation
INTRODUCTION. The chain rule and implicit differentiation are techniques used to differentiate equations that would otherwise be difficult to handle. Both apply the standard rules for derivatives in slightly different ways. In this presentation, both the chain rule and implicit differentiation will be shown with applications to real world problems.
DEFINITION. Chain Rule: a way to differentiate functions within functions. Implicit Differentiation: a way to take the derivative of a term with respect to another variable without having to isolate either variable.
HISTORY. The chain rule is thought to have first originated from the German mathematician Gottfried W. Leibniz. Although the memoir it was first found in contained various mistakes, it is apparent that he used the chain rule in order to differentiate a polynomial inside of a square root. Traces of the chain rule also appear in the Analyse des infiniment petits of Guillaume de l'Hôpital, a French mathematician. Implicit differentiation was developed by the famed physicist and mathematician Isaac Newton. He applied it to various physics problems he came across. In addition, the German mathematician Gottfried W. Leibniz also developed the technique independently of Newton around the same time period.
EXAMPLE 1: CHAIN RULE. Find the derivative of the following using the chain rule: $y = (x^2 + 5x^3 - 16)^{37}$.
Step 1: Define inner and outer functions. Here the inner function is $x^2 + 5x^3 - 16$ and the outer function is the 37th power.
Step 2: Differentiate the outer function via the power rule, which gives $37(x^2 + 5x^3 - 16)^{36}$.
Step 3: Differentiate the inner function and multiply by the answer from the previous step: $y' = 37(x^2 + 5x^3 - 16)^{36}(2x + 15x^2)$.
EXAMPLE 2: CHAIN RULE. A biologist must use the chain rule to determine how fast a given bacteria population is growing at a given point in time t days later.
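The result of Example 1 can be compared against a numerical derivative. The Python sketch below does so at a single sample point; the names `dy_dx` and `central_difference`, the sample point `x0`, and the step size are assumptions made for illustration.

```python
def y(x):
    """Example 1: y = (x^2 + 5x^3 - 16)^37."""
    return (x ** 2 + 5 * x ** 3 - 16) ** 37


def dy_dx(x):
    """Chain rule result: outer derivative times inner derivative."""
    outer = 37 * (x ** 2 + 5 * x ** 3 - 16) ** 36
    inner = 2 * x + 15 * x ** 2
    return outer * inner


def central_difference(func, x, h=1e-6):
    """Numerical derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


x0 = 1.5
exact = dy_dx(x0)
approx = central_difference(y, x0)
# The values are huge, so compare them through the relative error.
print(exact, approx, abs(approx - exact) / abs(exact))
```

Dropping the inner factor (2x + 15x^2) from `dy_dx` makes the relative error jump from tiny to order one, which is a quick way to see why Step 3 is needed.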