
Interpolation & Approximation

The next problem we want to investigate is that of finding a simple function, such as a polynomial, trigonometric function, or rational function, which approximates either a discrete set of (n+1) data points (x_0, f_0), (x_1, f_1), ..., (x_n, f_n) or a more complicated function f(x).

• We saw that we can approximate a set of discrete data by a linear least squares approach. In that case we make the assumption that the data behaves in a certain way, such as linearly or quadratically. Now we take a different approach: we find a function which agrees with the data at each of the given points, i.e., it interpolates the data.

• The problem we are going to consider initially is to find an interpolating polynomial for a set of discrete data (x_i, f_i). The data can be obtained from two main sources.

• First, suppose we have a continuous function f(x) and we select (n+1) distinct points x_i ∈ [a,b], and then let f_i = f(x_i). In this way we are using interpolation as a means to approximate a continuous function by a simpler function.

• Secondly, the data (x_i, f_i) may come from observations, e.g., from measurements. In this case we do not know a continuous function f(x) but rather its value at (n+1) points.

• If the data is observed then this is the fixed data we must work with. However, if the data arises from evaluating a known function f(x) at (n+1) points, then we can choose the points uniformly or by some other sampling.

Interpolation of discrete data: Given (n+1) distinct points (x_0, f_0), (x_1, f_1), ..., (x_n, f_n), find a polynomial of degree ≤ n, say p_n, that interpolates the given data, i.e., such that

p_n(x_i) = f_i,   for i = 0, 1, 2, ..., n.

The polynomial p_n(x) is called the interpolant of the data {f_i}, i = 0, 1, ..., n. The points {x_i}, i = 0, 1, ..., n, are called the interpolation points.

• This means that the graph of the polynomial passes through our (n+1) distinct data points.

• Let P_n denote the space of all polynomials of degree n or less. Then any polynomial in P_n can be written in terms of the monomial basis 1, x, x^2, ..., x^n as

a_0 + a_1 x + a_2 x^2 + ⋯ + a_{n−1} x^{n−1} + a_n x^n.

• Note that we have (n+1) coefficients to determine and (n+1) interpolation conditions to satisfy.

• Why do we choose P_n to be the space of all polynomials with degree ≤ n and not just degree equal to n?

• In the first figure we have two data points, so we find the linear polynomial which passes through these two points.

• In the second figure we have three data points, so we find the quadratic polynomial which passes through these three points.

• In the third and fourth figures we have four and five points, so we find a cubic and a quartic, respectively.

The interpolation conditions p_n(x_i) = f_i, i = 0, ..., n, can be gathered together into the linear system of algebraic equations given by

\begin{pmatrix}
1 & x_0 & x_0^2 & x_0^3 & \cdots & x_0^{n-1} & x_0^n \\
1 & x_1 & x_1^2 & x_1^3 & \cdots & x_1^{n-1} & x_1^n \\
1 & x_2 & x_2^2 & x_2^3 & \cdots & x_2^{n-1} & x_2^n \\
\vdots & & & & & & \vdots \\
1 & x_{n-1} & x_{n-1}^2 & x_{n-1}^3 & \cdots & x_{n-1}^{n-1} & x_{n-1}^n \\
1 & x_n & x_n^2 & x_n^3 & \cdots & x_n^{n-1} & x_n^n
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{n-1} \\ a_n \end{pmatrix}
=
\begin{pmatrix} f_0 \\ f_1 \\ f_2 \\ \vdots \\ f_{n-1} \\ f_n \end{pmatrix}.   (1)

The (n+1) × (n+1) coefficient matrix of this linear system is referred to as the interpolation matrix or, more precisely, the interpolation matrix corresponding to the monomial basis {x^j}_{j=0}^n. In this case, the interpolation matrix is known as the Vandermonde matrix. It is not difficult to show that the Vandermonde determinant, i.e., the determinant of the matrix in (1), is given by

∏_{0 ≤ k < j ≤ n} (x_j − x_k),

which is non-zero because the interpolation points are distinct.

Because the determinant of the coefficient matrix of the linear system (1) is non-zero, that matrix is non-singular, so there exists a unique set of numbers a_0, a_1, a_2, a_3, ..., a_{n−1}, a_n that solves the linear system. But those numbers are the coefficients of the interpolant p_n(x), so we have proved the following result.

Theorem. Existence and uniqueness of the polynomial interpolant. Given the data pairs {(x_j, f_j)}_{j=0}^n, where the points {x_j}_{j=0}^n are distinct, there exists a unique polynomial p_n(x) ∈ P_n that satisfies the interpolation conditions p_n(x_i) = f_i, i = 0, 1, 2, ..., n.
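The construction via the system (1) can be sketched directly in code. This is a minimal sketch, assuming NumPy is available; `np.vander` with `increasing=True` builds exactly the matrix in (1), and the data is from the quadratic example worked later in these notes:

```python
import numpy as np

def monomial_coefficients(x, f):
    """Assemble the Vandermonde system (1) and solve it for the
    monomial coefficients a_0, a_1, ..., a_n."""
    V = np.vander(np.asarray(x, dtype=float), increasing=True)  # rows [1, x_i, x_i^2, ...]
    return np.linalg.solve(V, np.asarray(f, dtype=float))

# the quadratic through (1, 10), (2, 19), (-1, -14):
a = monomial_coefficients([1, 2, -1], [10, 19, -14])
# a = [a_0, a_1, a_2] = [-1, 12, -1], i.e. p_2(x) = -x^2 + 12x - 1
```

For small n this works fine; the conditioning problem discussed next is what rules it out for larger n.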

It seems that we are done: given the data pairs {(x_j, f_j)}_{j=0}^n, we can construct the interpolating polynomial of degree less than or equal to n by first solving the linear system (1) for the coefficients {a_j}_{j=0}^n and then substituting them into the expression for p_n. There are two reasons why we are not done. First, the Vandermonde matrix is extremely ill conditioned; in general, its condition number grows exponentially as n increases. The second reason is that there are ways to construct the interpolant without having to solve a matrix problem.

What do we want to accomplish?

1. First, we want to determine an efficient way to determine the polynomial, i.e., determine the coefficients a_0, a_1, ..., a_n.

2. Once we have determined the coefficients, we want to determine how to efficiently evaluate the polynomial at a point.

3. If we decide to add points to our data set, then we don't want to start our calculation from scratch. Since the polynomial is unique, we are free to evaluate it any way that works and is efficient.

4. We want to decide whether this is a good approach to interpolating a set of data or a function. If not, we want to look at alternative ways to interpolate.

5. If the data is obtained by evaluating a function at points, is there an "optimal" way to choose these points?

Some Simple Examples

Example. As a first example, find the linear polynomial p_1 which passes through the two points (1, −5) and (5, 3).

Clearly if p_1 passes through these two points we have

p_1(1) = −5,   p_1(5) = 3.

So if p_1 = a_0 + a_1 x we must satisfy

a_0 + a_1(1) = −5,   a_0 + 5a_1 = 3.

Solving these two equations we get a_0 = −7 and a_1 = 2, so we have the polynomial p_1 = 2x − 7.

Another approach to find p1 is to write it as the sum of two linear polynomials ℓi(x), i =1, 2 with the special properties

ℓ_1(x) = 1 if x = x_1, 0 if x = x_2;   ℓ_2(x) = 0 if x = x_1, 1 if x = x_2.

If we determine ℓ_i(x), i = 1, 2, then the linear polynomial which interpolates the two data points is just

p_1(x) = −5 ℓ_1(x) + 3 ℓ_2(x),

because p_1(x_1) = −5 and p_1(x_2) = 3. Now we can immediately write down these two linear polynomials:

ℓ_1(x) = (x − x_2)/(x_1 − x_2) = (x − 5)/(−4),   ℓ_2(x) = (x − x_1)/(x_2 − x_1) = (x − 1)/4.

Thus the interpolating polynomial is

(5/4)(x − 5) + (3/4)(x − 1) = 2x − 7.

Of course we got the same polynomial as before because it is unique.

Example. As a second example, find the quadratic that passes through the three points (1, 10), (2, 19), (−1, −14).

Clearly if p_2 = a_0 + a_1 x + a_2 x^2 passes through these three points we have

p_2(1) = 10,   p_2(2) = 19,   p_2(−1) = −14,

or equivalently we must satisfy

a_0 + a_1(1) + a_2(1)^2 = 10,   a_0 + 2a_1 + (2)^2 a_2 = 19,   a_0 + a_1(−1) + a_2(−1)^2 = −14.

Solving these three equations we get the polynomial p_2 = −x^2 + 12x − 1.

Alternatively, we can write p_2(x) in terms of three quadratic polynomials ℓ_i(x), i = 1, 2, 3, where

p_2(x) = 10 ℓ_1(x) + 19 ℓ_2(x) − 14 ℓ_3(x)

with

ℓ_1(x) = 1 at x = x_1, 0 at x = x_2 or x_3;   ℓ_2(x) = 1 at x = x_2, 0 at x = x_1 or x_3;   ℓ_3(x) = 1 at x = x_3, 0 at x = x_1 or x_2.

Clearly

ℓ_1(x) = (x − 2)(x + 1) / ((1 − 2)(1 + 1)),   ℓ_2(x) = (x − 1)(x + 1) / ((2 − 1)(2 + 1)),   ℓ_3(x) = (x − 1)(x − 2) / ((−1 − 1)(−1 − 2)).

• As you can see, if we proceed in the manner where we directly impose the interpolation conditions, then to find a polynomial of degree n that passes through the given (n+1) points we have to solve (n+1) linear equations for the (n+1) unknowns a_0, a_1, ..., a_n.

• However, if we take the approach of writing p_n as a sum of (n+1) polynomials of degree n, then we avoid solving the linear system.

• There are basically two main efficient approaches for determining the interpolating polynomial. Each has advantages in certain circumstances.

• Since we are guaranteed that we will get the same polynomial using different approaches (since the polynomial is unique), it is just a question of which is the more efficient implementation.

Lagrange Form of the Interpolating Polynomial

• We want to generalize the procedure of the previous examples, which avoids solving a linear system to find the interpolating polynomial. This approach is called the Lagrange form of the interpolating polynomial.

• Assume we have (n+1) data points (x_i, f_i), i = 0, 1, 2, ..., n, where the x_i are distinct. Let ℓ_i(x) be the degree-n polynomial which satisfies the "delta conditions"

ℓ_i(x) = 1 if x = x_i, 0 if x = x_j for j ≠ i,   or equivalently   ℓ_i(x_j) = δ_ij.

If we can write down each of these polynomials, then the interpolating polynomial is just

p_n(x) = Σ_{i=0}^{n} f_i ℓ_i(x).

• How do we form ℓ_i(x)? Since it must be zero at each x_j, j ≠ i, the numerator is just the product of the factors (x − x_j), i.e.,

(x − x_0)(x − x_1) ⋯ (x − x_{i−1})(x − x_{i+1}) ⋯ (x − x_n),

and because it must be one at x = x_i, the denominator is just the numerator evaluated at x_i, i.e.,

(x_i − x_0)(x_i − x_1) ⋯ (x_i − x_{i−1})(x_i − x_{i+1}) ⋯ (x_i − x_n).

Lagrange form: Given (n+1) distinct points (x_0, f_0), (x_1, f_1), ..., (x_n, f_n), the Lagrange form of the interpolating polynomial is

p_n(x) = Σ_{i=0}^{n} f_i ℓ_i(x),

where

ℓ_i(x) = [(x − x_0)(x − x_1) ⋯ (x − x_{i−1})(x − x_{i+1}) ⋯ (x − x_n)] / [(x_i − x_0)(x_i − x_1) ⋯ (x_i − x_{i−1})(x_i − x_{i+1}) ⋯ (x_i − x_n)].

• Each ℓ_i(x) is a polynomial of degree n and satisfies ℓ_i(x_j) = δ_ij.

• The space P_n has dimension (n+1). The polynomials ℓ_i(x) belong to P_n; there are (n+1) of them and they are linearly independent, so they form another basis for P_n. They are called the Lagrange basis.

• The interpolation conditions

f_j = Σ_{i=0}^{n} c_i ℓ_i(x_j)   imply   c_j = f_j,

which gives the form

p_n(x) = Σ_{i=0}^{n} f_i ℓ_i(x).

Using the Lagrange form of the interpolating polynomial doesn't require solving a linear system; instead of the coefficient matrix being the ill-conditioned Vandermonde matrix, it is the perfectly conditioned identity matrix!

• The uniqueness of the Lagrange interpolating polynomial can be proved without resorting to the Vandermonde system (1) or, for that matter, resorting to any specific method for constructing that polynomial.

Suppose we have two polynomials p(x) ∈ P_n and p̄(x) ∈ P_n, both of which satisfy the interpolation conditions, i.e., we have p(x_j) = p̄(x_j) = f_j for j = 0, 1, ..., n. Then p(x) − p̄(x) is also a polynomial of degree less than or equal to n, and we have that p(x_j) − p̄(x_j) = 0 for j = 0, 1, ..., n.

Thus, p(x) − p̄(x) is a polynomial of degree less than or equal to n that vanishes at (n+1) distinct points; such a polynomial must vanish for all x, i.e., p(x) − p̄(x) = 0 for all x, so that p(x) = p̄(x).

The polynomials

ℓ_i(x) = [(x − x_0)(x − x_1) ⋯ (x − x_{i−1})(x − x_{i+1}) ⋯ (x − x_n)] / [(x_i − x_0)(x_i − x_1) ⋯ (x_i − x_{i−1})(x_i − x_{i+1}) ⋯ (x_i − x_n)]

are sometimes written as

ℓ_j(x) = W(x) / ((x − x_j) W′(x_j))   for j = 0, ..., n,

where

W(x) = ∏_{k=0}^{n} (x − x_k).

Example. Evaluate the interpolating polynomial for the data

(0, 4), (1, 5), (−2, −10), (4, 128)

at x = 2 and x = 3. In practice we will typically evaluate an interpolating polynomial at points instead of writing out the polynomial; e.g., when we plot the polynomial we simply evaluate it at many points and connect the points to get the curve.

Because we have 4 points we know that we will have a cubic (or lower degree) polynomial. The polynomial evaluated at x = 2 is

p_3(2) = 4 ℓ_1(2) + 5 ℓ_2(2) − 10 ℓ_3(2) + 128 ℓ_4(2).

The values ℓ_i(2) can be found by evaluating the numerator of each ℓ_i at x = 2 and then dividing by the denominator of ℓ_i:

ℓ_1(x) = (x − 1)(x + 2)(x − 4) / ((0 − 1)(0 + 2)(0 − 4))   ⇒   ℓ_1(2) = (1)(4)(−2)/8 = −1

ℓ_2(x) = (x − 0)(x + 2)(x − 4) / ((1 − 0)(1 + 2)(1 − 4))   ⇒   ℓ_2(2) = (2)(4)(−2)/(−9) = 16/9

ℓ_3(x) = (x − 0)(x − 1)(x − 4) / ((−2 − 0)(−2 − 1)(−2 − 4))   ⇒   ℓ_3(2) = (2)(1)(−2)/(−36) = 1/9

ℓ_4(x) = (x − 0)(x − 1)(x + 2) / ((4 − 0)(4 − 1)(4 + 2))   ⇒   ℓ_4(2) = (2)(1)(4)/72 = 8/72 = 1/9

Therefore

p_3(2) = 4(−1) + 5(16/9) − 10(1/9) + 128(1/9) = (−36 + 80 − 10 + 128)/9 = 162/9 = 18.
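This hand calculation translates directly into a short routine. A sketch (the function name is ours) that evaluates the Lagrange form at a point without ever writing out the polynomial:

```python
def lagrange_eval(x, xs, fs):
    """Evaluate p_n(x) = sum_i f_i * l_i(x) directly from the data,
    forming each l_i(x) as a product of linear factors."""
    total = 0.0
    for i, xi in enumerate(xs):
        li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                li *= (x - xj) / (xi - xj)  # one factor of l_i(x)
        total += fs[i] * li
    return total

value = lagrange_eval(2.0, [0, 1, -2, 4], [4, 5, -10, 128])  # 18, as above
```

This is the same O(n^2) work per evaluation point as the hand computation.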

Now to evaluate p_3(3), note that the denominators of the ℓ_i remain the same, as do the coefficients 4, 5, −10, 128 in the expansion; the only thing that changes is the numerator of each ℓ_i, because we are evaluating it at a different x-value:

ℓ_1(3) = (2)(5)(−1)/8 = −10/8,   ℓ_2(3) = (3)(5)(−1)/(−9) = 15/9,

ℓ_3(3) = (3)(2)(−1)/(−36) = 6/36,   ℓ_4(3) = (3)(2)(5)/72 = 30/72.

Then p_3(3) is given by

p_3(3) = (4/8)(−10) + (5/(−9))(−15) + ((−10)/(−36))(−6) + (128/72)(30) = −5 + 25/3 − 5/3 + 160/3 = 55.

For homework you are going to investigate an efficient way to implement the Lagrange interpolating polynomial assuming you just want to evaluate it at given points.

Newton Form of the Interpolating Polynomial

• This approach determines the interpolating polynomial using divided differences and allows adding data without starting over.

• Recall that when we wrote the Lagrange form of, say, the third degree interpolating polynomial, we wrote it in terms of sums of cubic polynomials. The Newton form takes a different approach.

• The Newton form of the line passing through (x_0, f_0), (x_1, f_1) is

p_1(x) = a_0 + a_1(x − x_0)

for appropriate constants a_0, a_1.

• The Newton form of the parabola passing through (x_0, f_0), (x_1, f_1), (x_2, f_2) is

p_2(x) = a_0 + a_1(x − x_0) + a_2(x − x_0)(x − x_1)

for appropriate constants a_0, a_1, a_2.

• The general Newton form of the polynomial passing through (x_i, f_i), i = 0, 1, ..., n, is

p_n(x) = a_0 + a_1(x − x_0) + a_2(x − x_0)(x − x_1) + ⋯ + a_n(x − x_0)(x − x_1) ⋯ (x − x_{n−1})

for appropriate constants a_0, a_1, ..., a_n.

• Note first that when we evaluate p_n(x_0), all terms in the expansion except the first disappear, so that

f_0 = p_n(x_0) = a_0.

• If we evaluate p_n(x_1), then all terms starting at the a_2 term disappear, giving

f_1 = p_n(x_1) = f_0 + a_1(x_1 − x_0)   ⇒   a_1 = (f_1 − f_0)/(x_1 − x_0).

• If we evaluate p_n(x_2), then all terms starting at the a_3 term disappear, giving

f_2 = p_n(x_2) = f_0 + [(f_1 − f_0)/(x_1 − x_0)](x_2 − x_0) + a_2(x_2 − x_0)(x_2 − x_1)

⇒   a_2 = [ f_2 − f_0 − ((f_1 − f_0)/(x_1 − x_0))(x_2 − x_0) ] / [ (x_2 − x_0)(x_2 − x_1) ],

which can be rearranged into the symmetric form

a_2 = [ (f_2 − f_1)/(x_2 − x_1) − (f_1 − f_0)/(x_1 − x_0) ] / (x_2 − x_0).

• The first divided difference of f with respect to x_i, x_{i+1} is

f[x_i, x_{i+1}] = (f(x_{i+1}) − f(x_i)) / (x_{i+1} − x_i),

so that

a_1 = f[x_0, x_1], the first divided difference of f with respect to x_0, x_1.

• The second divided difference of f with respect to x_i, x_{i+1}, x_{i+2} is

f[x_i, x_{i+1}, x_{i+2}] = (f[x_{i+1}, x_{i+2}] − f[x_i, x_{i+1}]) / (x_{i+2} − x_i),

so that

a_2 = f[x_0, x_1, x_2], the second divided difference of f with respect to x_0, x_1, x_2.

• In general, a_k is the kth divided difference of f with respect to x_0, x_1, x_2, ..., x_k.

• The following example illustrates the use of this approach and the ease with which we can add additional data points.

Example. Compute the coefficients for the Newton interpolating polynomial at the points (0, 4), (1, 5), (3, 15), and then evaluate the polynomial at x = 2. Then add the data point (4, 40) and repeat the calculation.

We make a divided difference table:

x_i   f[x_i]   f[x_{i−1}, x_i]        f[x_{i−2}, x_{i−1}, x_i]
0     4        -                      -
1     5        (5 − 4)/(1 − 0) = 1    -
3     15       (15 − 5)/(3 − 1) = 5   (5 − 1)/(3 − 0) = 4/3

Thus the interpolating polynomial p_2(x) is

p_2(x) = f(x_0) + f[x_0, x_1](x − x_0) + f[x_0, x_1, x_2](x − x_0)(x − x_1)
       = 4 + 1(x − 0) + (4/3)(x − 0)(x − 1)
⇒ p_2(2) = 4 + 2 + (4/3)(2)(1) = 26/3.

To add another data point we simply add another row and column to our table:

x_i   f[x_i]   f[x_{i−1}, x_i]        f[x_{i−2}, x_{i−1}, x_i]   f[x_{i−3}, x_{i−2}, x_{i−1}, x_i]
0     4        -                      -                          -
1     5        (5 − 4)/(1 − 0) = 1    -                          -
3     15       (15 − 5)/(3 − 1) = 5   (5 − 1)/(3 − 0) = 4/3      -
4     40       (40 − 15)/(4 − 3) = 25 (25 − 5)/(4 − 1) = 20/3    (20/3 − 4/3)/(4 − 0) = 16/12 = 4/3
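The table construction can be sketched in code. This is a minimal sketch (the helper names are ours): the coefficients are built column by column in place, and the Newton form is then evaluated by nested multiplication:

```python
def newton_coeffs(xs, fs):
    """Divided-difference coefficients a_k = f[x_0, ..., x_k],
    computed one column of the table at a time, in place."""
    a = [float(v) for v in fs]
    n = len(xs)
    for k in range(1, n):                      # k-th column of the table
        for i in range(n - 1, k - 1, -1):      # update bottom-up
            a[i] = (a[i] - a[i - 1]) / (xs[i] - xs[i - k])
    return a

def newton_eval(x, xs, a):
    """Evaluate the Newton form by nested (Horner-like) multiplication."""
    p = a[-1]
    for k in range(len(a) - 2, -1, -1):
        p = a[k] + (x - xs[k]) * p
    return p

a3 = newton_coeffs([0, 1, 3], [4, 5, 15])          # [4, 1, 4/3]
a4 = newton_coeffs([0, 1, 3, 4], [4, 5, 15, 40])   # [4, 1, 4/3, 4/3]
```

Note that the first three entries of `a4` coincide with `a3`: adding a point only appends one new coefficient, which is the whole attraction of the Newton form.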

Thus the interpolating polynomial p_3(x) is

p_3(x) = f(x_0) + f[x_0, x_1](x − x_0) + f[x_0, x_1, x_2](x − x_0)(x − x_1) + f[x_0, x_1, x_2, x_3](x − x_0)(x − x_1)(x − x_2)
       = 4 + 1(x − 0) + (4/3)(x − 0)(x − 1) + (4/3)(x − 0)(x − 1)(x − 3)

p_3(2) = 4 + 1(2) + (4/3)(2)(1) + (4/3)(2)(1)(−1) = 6 + 8/3 − 8/3 = 6.

Note that to add a data point we used all of our previous calculations. If we use the Lagrange form of the interpolating polynomial, then when we add a data point we have to start over with the calculations.

Measuring Errors

We have used phrases such as the "difference between two functions" or the "error in an approximation of a function." We want to quantify these notions, i.e., say precisely what we mean by these phrases; specifically, say how we are going to measure such differences and errors.

In our previous error calculations we had a vector of errors at points. Here f(x) and p(x) are functions defined everywhere on some interval [a,b], so instead of using a vector norm to measure the error we want an error measure which captures how well we are approximating f(x) over the entire interval.

In our context, we are given a function f(x) and then we construct a simpler function (the interpolating polynomial) p(x) that is hopefully "close" to f(x); we want to quantify what it means to be "close," or equivalently, quantify what it means for the difference or error f(x) − p(x) to be "small."

There are many ways to measure the closeness of two functions; we use two such ways in these notes.

Continuous norms

Let g(x) be a square integrable function defined on the interval [a,b]. We can measure its size using the L^2(a,b) or root-mean-square norm

‖g‖_{L^2(a,b)} = ( ∫_a^b |g(x)|^2 dx )^{1/2}.

Example. Let g(x) = sin πx on [0, 1]; calculate the L^2(0,1) norm of g(x).

‖g‖²_{L^2(0,1)} = ∫_0^1 |sin(πx)|² dx = (1/2) ∫_0^1 (1 − cos(2πx)) dx
              = [ x/2 − sin(2πx)/(4π) ]_{x=0}^{x=1} = 1/2
⇒ ‖g‖_{L^2(0,1)} = √2/2 ≈ 0.707.

Let g(x) be a bounded function defined on the interval [a,b]. We can measure its size using the L^∞[a,b] or uniform or maximum or pointwise norm

‖g‖_{L^∞[a,b]} = max_{x ∈ [a,b]} |g(x)|.

Example. Let g(x) = sin πx on [0, 1]; calculate the L^∞[0,1] norm of g(x).

‖sin πx‖_{L^∞[0,1]} = max_{x ∈ [0,1]} |sin πx| = 1.

When we looked at vector norms, we saw that all vector norms are equivalent. This is because they are norms on the finite dimensional space ℝ^n (i.e., a space in which we can find a finite basis). This is true for all finite dimensional spaces. However, here we are dealing with infinite dimensional spaces.
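Both norms of sin πx can be checked numerically. A crude sketch (the helper names are ours, and the midpoint rule and grid-sampling are just one convenient choice for smooth functions):

```python
import math

def l2_norm(g, a, b, n=20000):
    """Midpoint-rule approximation of the L2(a,b) norm."""
    h = (b - a) / n
    return math.sqrt(h * sum(g(a + (k + 0.5) * h) ** 2 for k in range(n)))

def linf_norm(g, a, b, n=20000):
    """Approximate the L-infinity norm by sampling on a fine grid."""
    return max(abs(g(a + (b - a) * k / n)) for k in range(n + 1))

g = lambda x: math.sin(math.pi * x)
l2 = l2_norm(g, 0.0, 1.0)      # close to sqrt(2)/2 ~ 0.7071
linf = linf_norm(g, 0.0, 1.0)  # 1 (the max is attained at x = 1/2)
```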

What is the relationship between norms in L^2 and in L^∞? For a continuous function g(x),

‖g‖²_{L^2} = ∫_a^b |g(x)|² dx ≤ ∫_a^b ( max_{x ∈ [a,b]} |g(x)| )² dx = (b − a) max_{x ∈ [a,b]} |g(x)|² = (b − a) ‖g‖²_{L^∞},

so that

‖g‖_{L^2(a,b)} ≤ (b − a)^{1/2} ‖g‖_{L^∞[a,b]}.

Now if we let g(x) = f(x) − p(x), this bound implies that, so long as the interval [a,b] is finite, closeness of p(x) to f(x) in the L^∞[a,b] norm implies closeness in the L^2(a,b) norm.

The converse, however, is not true, i.e., close approximation in the L2(a,b) norm does not imply close approximation in the L∞[a,b] norm.

Example This example illustrates that close approximation in the L2(a,b) norm does not imply close approximation in the L∞[a,b] norm.

Consider the function f(x) = 1 on the interval [0, 1] and the continuous function f_ε(x) as depicted in the figure, where 0 < ε < 1. We have that

‖f − f_ε‖_{L^2(0,1)} ≤ √ε   and   ‖f − f_ε‖_{L^∞[0,1]} = 1.

If ε ≪ 1, then ‖f − f_ε‖_{L^2(0,1)} is small but ‖f − f_ε‖_{L^∞[0,1]} = 1, i.e., a small L^2(0,1) norm does not imply a small L^∞[0,1] norm.

[Figure: the function f(x) = 1 on [0, 1] together with f_ε(x), which rises from 0 at x = 0 to 1 at x = ε and equals 1 on [ε, 1].]
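A numerical sketch of this example. The exact shape of f_ε is an assumption taken from the figure (a linear ramp from 0 at x = 0 up to 1 at x = ε, then identically 1); the qualitative conclusion does not depend on that detail:

```python
import numpy as np

eps = 0.01
x = np.linspace(0.0, 1.0, 200001)
# assumed ramp shape for f_eps, consistent with the figure:
err = 1.0 - np.minimum(x / eps, 1.0)          # f(x) - f_eps(x)

linf = float(np.abs(err).max())               # stays 1, attained at x = 0
l2 = float(np.sqrt(np.sum(err**2) * (x[1] - x[0])))  # small: O(sqrt(eps))
```

Shrinking ε drives the L^2 error to zero while the L^∞ error never moves.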

Small L∞[a,b] norm implies that the function is small at every point in the interval [a,b] whereas small L2(a,b) norm only implies that the function is small in some averaged sense.

In this example, we see that fε(x) is a good approximation to the function f(x)=1 if the error is measured in the L2(0, 1) norm, but is a bad approximation if the error is measured in the L∞(0, 1) norm.

This is why it is very important to specify how one chooses to measure error and how one interprets the smallness or lack thereof of errors.

To economize notation, from now on we will abbreviate ‖·‖_{L^∞[a,b]} by ‖·‖_∞ and ‖·‖_{L^2(a,b)} by ‖·‖_2 whenever the interval involved is not ambiguous.

Discrete norms

As we have seen, sometimes data is given in discrete form, i.e., only the pairs {(x_j, f_j)}_{j=1}^m are known. In such cases, it makes no sense to consider L^2(a,b) or L^∞[a,b] norms. Instead, we consider the data {f_j}_{j=1}^m as being a discrete function¹ for which one uses discrete norms such as our standard vector norms; e.g.,

‖{f_j}_{j=1}^m‖_2 = ( (1/m) Σ_{j=1}^m |f_j|² )^{1/2}   or   ‖{f_j}_{j=1}^m‖_∞ = max_{j=1,...,m} |f_j|.

Why have we included the factor 1/m when we defined the Euclidean error norm?

¹A discrete function is one that is defined at a discrete set of points; the more familiar functions we deal with are defined for all x in some interval [a,b], i.e., they are defined at an infinite number of points.

The Errors Incurred in Lagrange Interpolation

• Consider the case for which the data {f_j}_{j=0}^n is determined by evaluating a given function f(x) at the interpolation points {x_j}_{j=0}^n, i.e., we have f_j = f(x_j). In this case we say that "p_n(x) interpolates the function f(x) at the points x_0, x_1, ..., x_n."

• We turn to the question of how close the Lagrange interpolating polynomial is to the function f(x), where we measure closeness using the L^∞[a,b] (or uniform) norm. The answer is provided by the following theorem. Note that the theorem holds, at least in the case of infinite precision arithmetic, for any form of the Lagrange interpolating polynomial, i.e., for any basis one chooses for P_n.

Theorem. Error estimate for Lagrange interpolation. Suppose f(x) has (n+1) continuous derivatives for all x ∈ [a,b] and p_n(x) denotes the Lagrange interpolating polynomial of degree less than or equal to n that interpolates f(x) at the points {x_j}_{j=0}^n in [a,b]. Let W(x) = ∏_{j=0}^n (x − x_j). Then

‖f − p_n‖_∞ ≤ (1/(n+1)!) ‖f^{(n+1)}‖_∞ ‖W‖_∞,   (2)

where f^{(n+1)}(x) denotes the (n+1)-st derivative of f(x).

We refer to inequalities such as (2) as "error estimates." It is important to note that (2) only provides an upper bound for the error, i.e., all it says is that the error ‖f − p_n‖_∞ cannot be any larger than the right-hand side. For a specific choice of f(x), the error can be a lot smaller than the right-hand side. On the other hand, there do exist functions f(x) for which (2) holds with equality. Estimates for which the latter can be said are referred to as "sharp error estimates."

We see that the error estimate (2) depends on the interpolation points {x_j}_{j=0}^n only through the term ‖W‖_∞. Note that ‖W‖_∞ ≤ (b − a)^{n+1} because each factor in its definition has absolute value less than or equal to (b − a).

It is then natural to ask: can we select the interpolation points {x_j}_{j=0}^n in such a way that ‖W‖_∞ is minimized? For the interval [a,b] the answer to this question is given by the following classic theorem.

Theorem. Chebyshev points. Let

W(x) = ∏_{j=0}^n (x − x_j).

Then ‖W‖_∞ is minimized on [−1, 1] by the Chebyshev points given by

x_j = cos( (2j + 1)π / (2(n + 1)) )   for j = 0, ..., n.   (3)

For the Chebyshev points, we have that ‖W‖_∞ ≤ 2^{−n}.

The Chebyshev points are often defined (see Wikipedia) in the equivalent way

x_k = cos( (2k − 1)π / (2m) ),   k = 1, ..., m.

Example. Find the Chebyshev points on [−1, 1] for n = 3.

x_0 = cos(π/8) = 0.92388
x_1 = cos(3π/8) = 0.38268
x_2 = cos(5π/8) = −0.38268
x_3 = cos(7π/8) = −0.92388

The analogous results for the general case of Chebyshev points on the interval [a,b] are easily deduced by mapping from the interval [−1, 1]. The choice of points that minimizes ‖W‖_∞ is then given by

x_j = (b + a)/2 + ((b − a)/2) cos( (2j + 1)π / (2(n + 1)) )   for j = 0, ..., n,

in which case ‖W‖_∞ ≤ 2^{−2n−1} (b − a)^{n+1}.

Example. Find the Chebyshev points on [1, 3] for n = 3.

x_0 = 2 + (2/2) cos(π/8) = 2.92388
x_1 = 2 + (2/2) cos(3π/8) = 2.38268
x_2 = 2 + (2/2) cos(5π/8) = 1.61732
x_3 = 2 + (2/2) cos(7π/8) = 1.07612
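The mapping formula can be coded once and reused for any interval. A short sketch (the function name is ours) reproducing both examples:

```python
import math

def chebyshev_points(n, a=-1.0, b=1.0):
    """The Chebyshev interpolation points (3), mapped linearly to [a, b]."""
    return [(a + b) / 2 + (b - a) / 2 * math.cos((2 * j + 1) * math.pi / (2 * (n + 1)))
            for j in range(n + 1)]

pts = chebyshev_points(3)              # 0.92388, 0.38268, -0.38268, -0.92388
pts13 = chebyshev_points(3, 1.0, 3.0)  # 2.92388, 2.38268, 1.61732, 1.07612
```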

We now consider some additional ramifications of the error estimate (2). If we fix the degree n of the polynomial in (2) and we consider small intervals [a,b], the Lagrange interpolating polynomial provides quite a good approximation.

Example. Consider the two interpolation points x_0 = a and x_1 = b. If we interpolate f(x) by a linear polynomial p_1(x) at the points x_0 and x_1, we obtain the error estimate

‖f − p_1‖_∞ ≤ (h²/8) ‖f″‖_∞,

where h = (b − a).

Consider the three interpolation points x_0 = a, x_1 = (a + b)/2, and x_2 = b. If we interpolate f(x) by a quadratic polynomial p_2(x) at the points x_0, x_1, and x_2, we obtain the error estimate

‖f − p_2‖_∞ ≤ h³ ‖f‴‖_∞.

Clearly, p_1(x) and p_2(x) are good approximations (in the uniform norm) to f(x) as h → 0, i.e., for (b − a) small.

The situation changes drastically when we consider the approximating properties of Lagrange interpolating polynomials when we keep [a,b] fixed and let n, the degree of the interpolating polynomial, increase.
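Before pursuing growing n, the two-point bound above can be checked numerically. A sketch (helper name is ours): for f = sin we have |f″| ≤ 1, so the estimate guarantees a maximum error of at most h²/8 on any interval of length h:

```python
import math

def linear_interp_max_error(f, a, b, samples=2000):
    """Max deviation between f and its linear interpolant through
    (a, f(a)) and (b, f(b)), estimated on a fine sample grid."""
    fa, fb = f(a), f(b)
    err = 0.0
    for k in range(samples + 1):
        x = a + (b - a) * k / samples
        p1 = fa + (fb - fa) * (x - a) / (b - a)  # the linear interpolant
        err = max(err, abs(f(x) - p1))
    return err

# the bound h^2/8 * max|f''| holds on each shrinking interval [1, 1+h]
for h in (0.5, 0.25, 0.125):
    assert linear_interp_max_error(math.sin, 1.0, 1.0 + h) <= h * h / 8
```

Halving h cuts the observed error by roughly a factor of four, the O(h²) behavior the estimate predicts.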

Note that when we increase n we need more interpolation points. Thus, we consider a sequence, indexed by n, of interpolation point sets {x_j}_{j=0}^n for n = 0, 1, 2, ..., and the corresponding sequence of Lagrange interpolating polynomials p_n(x) ∈ P_n.

Question: Is it true that p_n(x) → f(x) as n → ∞ for all x ∈ [a,b]? This seems plausible in view of the estimate (2), at least for C^∞[a,b] functions f(x).²

²C^r[a,b] denotes the space of functions having r continuous derivatives, so that C^∞[a,b] denotes the space of functions that are infinitely differentiable.

Runge's Example

This example demonstrates that it is NOT true that p_n(x) → f(x) as n → ∞ for all x ∈ [a,b] when using equally spaced points. In fact,

‖f − p_n‖_∞

becomes arbitrarily large as n → ∞. Consider the function

f(x) = 1/(1 + 25x²),   −1 ≤ x ≤ 1,

which we want to interpolate (using only function values) with an increasing number of evenly spaced points, i.e., with a higher and higher degree polynomial.

[Figure: four panels showing interpolants of increasing degree; the black curve represents the function. As the degree grows, the interpolants develop oscillations of growing amplitude near the endpoints.]
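The phenomenon in the figure is easy to reproduce. A sketch (the helper names are ours) that measures the maximum interpolation error on a fine grid for equally spaced and for Chebyshev points:

```python
import math

def lagrange_eval(x, xs, fs):
    """Direct evaluation of the Lagrange form (no linear solve)."""
    total = 0.0
    for i, xi in enumerate(xs):
        li = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                li *= (x - xj) / (xi - xj)
        total += fs[i] * li
    return total

runge = lambda x: 1.0 / (1.0 + 25.0 * x * x)

def max_err(nodes):
    """Max interpolation error for the Runge function over a fine grid."""
    fs = [runge(t) for t in nodes]
    grid = [-1.0 + 2.0 * k / 1000 for k in range(1001)]
    return max(abs(runge(t) - lagrange_eval(t, nodes, fs)) for t in grid)

equi = lambda n: [-1.0 + 2.0 * j / n for j in range(n + 1)]
cheb = lambda n: [math.cos((2 * j + 1) * math.pi / (2 * n + 2)) for j in range(n + 1)]

err_equi = [max_err(equi(n)) for n in (6, 10, 14)]   # grows with n
err_cheb = [max_err(cheb(n)) for n in (6, 10, 14)]   # shrinks with n
```

For equally spaced points the error blows up as n increases, while for the Chebyshev points it decays, anticipating the convergence theorem below.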

One may suspect that it is the equal spacing of the interpolation points that causes these problems. To some extent this is true. For example, we have the following result.

Theorem. Convergence of the interpolating polynomial using Chebyshev points. Let f(x) have two continuous derivatives on [a,b] and let p_n(x) denote its Lagrange interpolating polynomial of degree less than or equal to n with respect to the Chebyshev points (3). Then p_n(x) converges to f(x) uniformly, i.e., ‖f − p_n‖_∞ → 0 as n → ∞; in fact, we have that

‖f − p_n‖_∞ ≤ C/√n

as n → ∞, where C is independent of n.

Nevertheless, in the general case of a continuous function f(x), it can be shown that no matter how the interpolation points {x_j}_{j=0}^n are chosen, there exists a continuous function f(x) such that ‖f − p_n‖_∞ → ∞ as n → ∞.

Since we know that it is not advisable to approximate a function by a high degree interpolating polynomial, how should we interpolate a function to get better accuracy? The answer is piecewise interpolation.

Piecewise Polynomial Interpolation

• We have seen that Lagrange interpolating polynomials can provide a good approximation for a fixed degree if (b − a) is small.

• In general, the Lagrange interpolating polynomials are not so effective if [a,b] is fixed and the degree n increases.

• In general, high degree polynomial interpolation should be avoided.

• What can we do instead?

• We use something called piecewise interpolation.

• When a graphing program plots a function, as we did in the previous examples, it connects closely spaced points with straight lines. If the points are close enough, then the result looks like a curve to our eye. This is called piecewise linear interpolation.

Examples of Piecewise Functions

Piecewise Constant Function

[Figure: a piecewise constant function on the interval [0, 3].]

Continuous Piecewise Linear Function

[Figure: a continuous piecewise linear function on the interval [0, 3].]

A continuous piecewise linear function is linear on each subinterval, and because it is continuous, the linear function on the interval [x_{i−1}, x_i] and the linear function defined on [x_i, x_{i+1}] match up at x_i.

Below we see some examples of a piecewise linear interpolant to sin x² on (−2, 2). In each plot we are using uniform subintervals and doubling the number of subintervals from one plot to the next.

[Figure: four panels of piecewise linear interpolants of sin x² on (−2, 2), with the number of uniform subintervals doubling from panel to panel.]

Below we see some examples of piecewise linear interpolation for Runge's example. Note that this form of interpolation does not have the "wiggles" we encountered when using a higher degree polynomial.

[Figure: piecewise linear interpolants of Runge's function 1/(1 + 25x²) on [−1, 1] for increasing numbers of subintervals.]

Piecewise Linear Interpolation

• Assume that we are given (x_0, f_0), (x_1, f_1), (x_2, f_2), ..., (x_n, f_n). We want to construct the piecewise linear interpolant.

[Figure: the partition points on the x-axis.]

• We divide our domain using the points x_i, i = 0, 1, ..., n.

• The linear interpolant on the interval [x_i, x_{i+1}] is just

L_i(x) = a_i + b_i(x − x_i),

where

a_i = f_i   and   f_{i+1} = f_i + b_i(x_{i+1} − x_i)   ⇒   b_i = (f_{i+1} − f_i)/(x_{i+1} − x_i).

Alternatively we could write it as

L_i(x) = f_i (x − x_{i+1})/(x_i − x_{i+1}) + f_{i+1} (x − x_i)/(x_{i+1} − x_i).

• Thus our piecewise linear interpolant is given by

L(x) = L_0(x) if x_0 ≤ x ≤ x_1;  L_1(x) if x_1 ≤ x ≤ x_2;  ...;  L_i(x) if x_i ≤ x ≤ x_{i+1};  ...;  L_{n−1}(x) if x_{n−1} ≤ x ≤ x_n.

• Note that L(x) is continuous but not differentiable everywhere.

Example. Determine the piecewise linear interpolant to sin x on [0, π] found by using 2 equal subintervals; then evaluate it at the points π/3 and 3π/5, and compute the error at those points.

• Our intervals are [0, π/2] and [π/2, π].

• On the first interval the coefficients are

a_1 = sin 0 = 0,   b_1 = (sin(π/2) − sin 0)/(π/2 − 0) = 2/π,

so we have L_0 = 0 + (2/π)(x − 0) = 2x/π.

• On the second interval the coefficients are

a_2 = sin(π/2) = 1,   b_2 = (sin π − sin(π/2))/(π − π/2) = (0 − 1)/(π/2) = −2/π,

which gives L_1 = 1 − (2/π)(x − π/2).

• The point π/3 is in the first interval, so

L_0(π/3) = (2/π)(π/3) = 2/3.

The actual value of sin(π/3) is 0.866025, so our error is approximately 0.2. The error is fairly large because the length of our subinterval is large, π/2 ≈ 1.57.

• Now the point 3π/5 is in the second interval, so

L_1(3π/5) = 1 − (2/π)(3π/5 − π/2) = 4/5.

The actual value of sin(3π/5) is 0.951057, so our error is approximately 0.151.
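The same numbers can be reproduced with NumPy's built-in piecewise linear interpolation. A sketch, assuming NumPy is available; `np.interp` evaluates exactly the interpolant L(x) constructed above:

```python
import math
import numpy as np

xs = [0.0, math.pi / 2, math.pi]                # two equal subintervals
fs = [math.sin(t) for t in xs]

# np.interp evaluates the continuous piecewise linear interpolant L(x)
v1 = float(np.interp(math.pi / 3, xs, fs))      # 2/3, as computed above
v2 = float(np.interp(3 * math.pi / 5, xs, fs))  # 4/5
e1 = abs(math.sin(math.pi / 3) - v1)            # ~0.20
e2 = abs(math.sin(3 * math.pi / 5) - v2)        # ~0.151
```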

• Of course we can also do piecewise quadratic or cubic interpolation. Usually we don't go higher than this.

• Remember that piecewise interpolation is almost always preferable to using a single higher order polynomial.

Formal Definitions

Given the interval [a,b], let Δ_h denote a partition of [a,b] into subintervals [x_{j−1}, x_j], j = 1, ..., n, determined by the points a = x_0 < x_1 < ⋯ < x_n = b.

Definition. Continuous piecewise-linear polynomials. Given the partition Δ_h of [a,b], the set L_{Δ_h} of all continuous piecewise-linear polynomials on [a,b] with respect to Δ_h is the set

L_{Δ_h} = { φ(x) : φ(x) is continuous on the interval [a,b] and is a linear polynomial on each subinterval [x_{j−1}, x_j] in Δ_h }.

Basis for Piecewise Linear Interpolation

Now consider the functions

φ_0(x) = (x_1 − x)/(x_1 − x_0) if x_0 ≤ x ≤ x_1, and 0 otherwise,

φ_i(x) = (x − x_{i−1})/(x_i − x_{i−1}) if x_{i−1} ≤ x ≤ x_i;  (x_{i+1} − x)/(x_{i+1} − x_i) if x_i ≤ x ≤ x_{i+1};  0 otherwise,   for i = 1, ..., n−1,

φ_n(x) = (x − x_{n−1})/(x_n − x_{n−1}) if x_{n−1} ≤ x ≤ x_n, and 0 otherwise.

For the functions {φ_j(x)}_{j=0}^n, we have the following.

• For j = 0, ..., n, each φ_j(x) ∈ L_{Δ_h}, i.e., each φ_j(x) is a continuous piecewise-linear function with respect to the partition Δ_h of the interval [a,b].

• The set {φ_j(x)}_{j=0}^n is a basis for L_{Δ_h}, i.e., it is linearly independent and spans L_{Δ_h}.

• L_{Δ_h} has dimension n + 1.

• The basis {φ_j(x)}_{j=0}^n has the delta property

φ_j(x_k) = δ_jk   for j, k = 0, ..., n.   (4)

• For j = 0 and j = n, φ_j(x) has support³ over one subinterval; for j = 1, ..., n−1, φ_j(x) has support over two subintervals. Specifically, we have that

φ_j(x) = 0 for x ∈ [x_{k−1}, x_k] whenever k ≠ j or j + 1.   (5)

The functions {φ_j(x)}_{j=0}^n are commonly referred to as the hat functions; the figure provides an illustration of the piecewise-linear basis functions {φ_j(x)}_{j=0}^5 for n = 5.

³The support of a function is the set of those x for which the function is not zero.

[Figure: the hat functions φ_0(x), φ_1(x), ..., φ_5(x) on the partition x_0 = a < x_1 < x_2 < x_3 < x_4 < x_5 = b.]

Given f(x) continuous on the interval [a,b], consider the function q_n(x) ∈ L_{Δ_h} defined by

q_n(x) = Σ_{j=0}^{n} f(x_j) φ_j(x) = f(x_0)φ_0(x) + f(x_1)φ_1(x) + ⋯ + f(x_{n−1})φ_{n−1}(x) + f(x_n)φ_n(x).   (6)

Evaluating this function at x_k, we have that

q_n(x_k) = Σ_{j=0}^{n} f(x_j) φ_j(x_k) = Σ_{j=0}^{n} f(x_j) δ_jk = f(x_k)   for k = 0, ..., n,

i.e., the continuous piecewise-linear function q_n(x) interpolates f(x) at the points {x_k}_{k=0}^n. Thus, we refer to q_n(x) given by (6) as the Lagrange continuous piecewise-linear interpolant of f(x) with respect to the partition Δ_h of the interval [a,b]. This interpolant exists (in fact, we have just shown that by constructing it) and it is easy to show that it is also unique.

Because of (5), i.e., because each φ_j(x) has support over at most two subintervals, (6) collapses to just two terms in each subinterval, i.e., we have that

    q_n(x) = f(x_{j−1})φ_{j−1}(x) + f(x_j)φ_j(x)   for x ∈ [x_{j−1}, x_j],  j = 1,...,n.    (7)

Thus, the value of q_n(x) for x ∈ [x_{j−1}, x_j] is determined using only the two basis functions φ_{j−1}(x) and φ_j(x) corresponding to the endpoints of that interval; for this reason, piecewise polynomial interpolants are referred to as local approximations. On the other hand, because all the Lagrange fundamental polynomials ℓ_j(x) are non-zero over the whole interval [a,b], all terms in the sum are needed whenever one evaluates the Lagrange interpolant p_n(x) at any point x ∈ [a,b]. For this reason, the Lagrange interpolant is referred to as being a global approximation. Likewise, because the basis {ℓ_j(x)} consists of globally supported functions, i.e., functions that are non-zero over all of [a,b], that basis is referred to as a global basis, whereas, because the basis {φ_j(x)} consists of locally supported functions, i.e., functions that are non-zero over only the intervals that have x_j as an endpoint, that basis is referred to as a local basis.
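The two-term formula (7) translates directly into code: first locate the subinterval containing each evaluation point, then combine the two endpoint values. The sketch below is a minimal illustration; the uniform partition of [0, 1] and the sample function e^x are our own choices, not from the notes.

```python
import numpy as np

def pw_linear_interp(x, nodes, fvals):
    """Evaluate q_n(x) via the local two-term formula (7): on [x_{j-1}, x_j]
    only phi_{j-1} and phi_j are nonzero."""
    x = np.asarray(x, dtype=float)
    # index j of the subinterval [x_{j-1}, x_j] containing each point
    j = np.clip(np.searchsorted(nodes, x, side='right'), 1, len(nodes) - 1)
    xl, xr = nodes[j - 1], nodes[j]
    # f(x_{j-1}) * phi_{j-1}(x) + f(x_j) * phi_j(x)
    return fvals[j - 1] * (xr - x) / (xr - xl) + fvals[j] * (x - xl) / (xr - xl)

nodes = np.linspace(0.0, 1.0, 6)
fvals = np.exp(nodes)                       # sample data: f(x) = e^x
q_at_nodes = pw_linear_interp(nodes, nodes, fvals)   # reproduces the data
```

This locality is exactly why evaluating q_n is cheap: each point touches two basis functions, never all n + 1.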

Example. Write the piecewise linear interpolant of f(x) = sin(πx/2) on the interval [1, 2] using the points x_0 = 1, x_1 = 4/3, x_2 = 5/3, and x_3 = 2 and the local basis given by the "hat functions".

We have n = 3 and the data

    j        0     1      2     3
    x_j      1    4/3    5/3    2        (8)
    f(x_j)   1   √3/2    1/2    0

so that

    φ_0(x) = (4/3 − x)/(4/3 − 1) = 4 − 3x     if 1 ≤ x ≤ 4/3,
    φ_0(x) = 0                                 otherwise,

    φ_1(x) = (x − 1)/(4/3 − 1) = 3x − 3        if 1 ≤ x ≤ 4/3,
    φ_1(x) = (5/3 − x)/(5/3 − 4/3) = 5 − 3x    if 4/3 ≤ x ≤ 5/3,
    φ_1(x) = 0                                  otherwise,

    φ_2(x) = (x − 4/3)/(5/3 − 4/3) = 3x − 4    if 4/3 ≤ x ≤ 5/3,
    φ_2(x) = (2 − x)/(2 − 5/3) = 6 − 3x        if 5/3 ≤ x ≤ 2,
    φ_2(x) = 0                                  otherwise,

    φ_3(x) = (x − 5/3)/(2 − 5/3) = 3x − 5      if 5/3 ≤ x ≤ 2,
    φ_3(x) = 0                                  otherwise,

so that

    q_3(x) = f(1)φ_0(x) + f(4/3)φ_1(x)   = 1·(4 − 3x) + (√3/2)(3x − 3)      if 1 ≤ x ≤ 4/3,
    q_3(x) = f(4/3)φ_1(x) + f(5/3)φ_2(x) = (√3/2)(5 − 3x) + (1/2)(3x − 4)   if 4/3 ≤ x ≤ 5/3,
    q_3(x) = f(5/3)φ_2(x) + f(2)φ_3(x)   = (1/2)(6 − 3x) + 0·(3x − 5)       if 5/3 ≤ x ≤ 2.

One can easily verify that the interpolation conditions {q_3(x_j) = f(x_j)}_{j=0}^3 are satisfied.

Error in the Piecewise-Linear Interpolant
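The interpolation conditions for this example are easy to check numerically; the sketch below simply transcribes the three branches of q_3 derived above (the function name `q3` is our own).

```python
import numpy as np

# Nodes and data for f(x) = sin(pi*x/2) on [1, 2], as in (8)
nodes = np.array([1.0, 4/3, 5/3, 2.0])
fvals = np.sin(np.pi * nodes / 2)        # 1, sqrt(3)/2, 1/2, 0

def q3(x):
    """The three branches of the piecewise-linear interpolant derived above."""
    if x <= 4/3:
        return 1.0 * (4 - 3*x) + (np.sqrt(3) / 2) * (3*x - 3)
    if x <= 5/3:
        return (np.sqrt(3) / 2) * (5 - 3*x) + 0.5 * (3*x - 4)
    return 0.5 * (6 - 3*x) + 0.0 * (3*x - 5)

# interpolation conditions q3(x_j) = f(x_j)
node_errors = [abs(q3(xj) - fj) for xj, fj in zip(nodes, fvals)]
```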

We now derive an estimate for the error in the piecewise-linear interpolant, in particular, an estimate for ‖f(x) − q_n(x)‖_{L∞[a,b]}.

On each subinterval [x_{j−1}, x_j], q_n(x) is a linear polynomial such that q_n(x_{j−1}) = f(x_{j−1}) and q_n(x_j) = f(x_j), so that q_n(x), restricted to the interval [x_{j−1}, x_j], is the Lagrange linear interpolation polynomial for f(x) over that interval. Thus, on each subinterval we can apply the error estimate (2) to conclude that

    ‖f(x) − q_n(x)‖_{L∞[x_{j−1},x_j]} ≤ (1/2) ‖f″(x)‖_{L∞[x_{j−1},x_j]} ‖(x − x_{j−1})(x − x_j)‖_{L∞[x_{j−1},x_j]},

where we have used W(x) = (x − x_{j−1})(x − x_j). Now

    ‖(x − x_{j−1})(x − x_j)‖_{L∞[x_{j−1},x_j]} = (1/4)(x_j − x_{j−1})² ≤ (1/4) h²,

where we have used the first derivative test to find the maximum value of the function (x − x_{j−1})(x_j − x) over the interval [x_{j−1}, x_j] (just like in a previous example) and also used that, by the definition of h, (x_j − x_{j−1})² ≤ h². Combining the last two results, we have that

    ‖f(x) − q_n(x)‖_{L∞[x_{j−1},x_j]} ≤ (1/8) h² ‖f″(x)‖_{L∞[x_{j−1},x_j]}   for j = 1,...,n.

Clearly,

    ‖f″(x)‖_{L∞[x_{j−1},x_j]} = max_{x∈[x_{j−1},x_j]} |f″(x)| ≤ max_{x∈[a,b]} |f″(x)| = ‖f″(x)‖_{L∞[a,b]}.

Combining the last two results, we have

    ‖f(x) − q_n(x)‖_{L∞[x_{j−1},x_j]} ≤ (1/8) h² ‖f″(x)‖_{L∞[a,b]}   for j = 1,...,n.

However, we have that

    ‖f(x) − q_n(x)‖_{L∞[a,b]} = max_{x∈[a,b]} |f(x) − q_n(x)| = max_{j=1,...,n} max_{x∈[x_{j−1},x_j]} |f(x) − q_n(x)|

                              = max_{j=1,...,n} ‖f(x) − q_n(x)‖_{L∞[x_{j−1},x_j]},

so that, combining the last two results, we have

    ‖f(x) − q_n(x)‖_{L∞[a,b]} ≤ max_{j=1,...,n} (1/8) h² ‖f″(x)‖_{L∞[a,b]} = (1/8) h² ‖f″(x)‖_{L∞[a,b]}.

We have thus proved the following result.

Theorem. Error estimate for Lagrange continuous piecewise-linear polynomial interpolation. Let f(x) denote a given function that is twice continuously differentiable over the interval [a,b] and let q_n(x) denote the Lagrange continuous piecewise-linear polynomial that interpolates f(x) at the points a = x_0 < x_1 < ··· < x_n = b. Then, with h = max_{j=1,...,n} (x_j − x_{j−1}),

    ‖f(x) − q_n(x)‖_{L∞[a,b]} ≤ (1/8) h² ‖f″(x)‖_{L∞[a,b]}.    (9)

If we increase the number of interpolation points n + 1 in such a way that h becomes smaller and smaller, i.e., the length of the longest subinterval [x_{j−1}, x_j] becomes smaller, then, so long as f(x) has two continuous derivatives, we are guaranteed that the error becomes smaller and smaller. In fact, given any ε > 0, we can guarantee that ‖f(x) − q_n(x)‖_{L∞[a,b]} < ε by choosing n large enough.

A special case is given by uniform partitions of [a,b], i.e., we have that x_j = a + j(b − a)/n for j = 0, 1,...,n. In this case all the subintervals have the same length h = (b − a)/n and (9) becomes

    ‖f(x) − q_n(x)‖_{L∞[a,b]} ≤ (1/8) ‖f″(x)‖_{L∞[a,b]} (b − a)²/n².    (10)
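The bound (10) can be sanity-checked numerically. The sketch below uses NumPy's `np.interp`, which evaluates exactly the piecewise-linear interpolant on the given nodes; the test function f(x) = sin(x) on [0, π], with |f″| ≤ 1, and the dense sampling grid are our own illustrative choices.

```python
import numpy as np

def max_interp_error(f, a, b, n):
    """Max |f - q_n| over [a, b] for the piecewise-linear interpolant on a
    uniform partition with n subintervals, estimated on a dense grid."""
    nodes = np.linspace(a, b, n + 1)
    xs = np.linspace(a, b, 20001)
    q = np.interp(xs, nodes, f(nodes))   # piecewise-linear interpolant q_n
    return np.max(np.abs(f(xs) - q))

# f(x) = sin(x) on [0, pi]: |f''(x)| <= 1, so (10) gives error <= (1/8)(pi/n)^2
a, b, n = 0.0, np.pi, 10
err = max_interp_error(np.sin, a, b, n)
bound = (1.0 / 8.0) * ((b - a) / n) ** 2
```

Doubling n should roughly quarter the error, consistent with the 1/n² rate in (10).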

In this case it is obvious that we can make ‖f(x) − q_n(x)‖_{L∞[a,b]} as small as we want by choosing n large enough.

Example. Consider the function f(x) = 1/(1 + x²) on the interval [−5, 5], for which, if one uses a uniform partition of the interval [a,b], the error for the global Lagrange polynomial interpolant becomes arbitrarily large as we increase the degree of the polynomial. A tedious but easy calculation (involving finding the extrema of the second derivative of 1/(1 + x²) over the interval [−5, 5]) yields that ‖f″(x)‖_{L∞[−5,5]} = 2. Then, because b − a = 10, we have from (10) that

    |1/(1 + x²) − q_n(x)| ≤ ‖1/(1 + x²) − q_n(x)‖_{L∞[−5,5]} ≤ (1/8)(2)(10²)/n² = 25/n²   for all x ∈ [−5, 5].

Thus, we can guarantee that |1/(1 + x²) − q_n(x)| < ε for all x ∈ [−5, 5] by choosing 25/n² < ε, i.e., choosing n to be the smallest integer larger than 5/√ε. For example, we can guarantee that the error is at most 0.01 by choosing n = 50.

Piecewise Cubic Hermite Interpolation
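For the Runge function this bound of 25/n² can be confirmed numerically as well; again `np.interp` supplies the piecewise-linear interpolant, and the dense sampling grid is an illustrative choice.

```python
import numpy as np

# Runge's function on [-5, 5]: global polynomial interpolation at uniform
# nodes diverges, but the piecewise-linear bound 25/n^2 from (10) always holds.
f = lambda x: 1.0 / (1.0 + x**2)

def runge_pw_error(n):
    """Max |f - q_n| on [-5, 5] for a uniform partition with n subintervals,
    estimated on a dense sampling grid."""
    nodes = np.linspace(-5.0, 5.0, n + 1)
    xs = np.linspace(-5.0, 5.0, 20001)
    return np.max(np.abs(f(xs) - np.interp(xs, nodes, f(nodes))))

errors = {n: runge_pw_error(n) for n in (10, 50, 100)}
```

Unlike the global interpolant, the piecewise-linear error here decreases monotonically with n, exactly as (10) predicts.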

• When we use piecewise linear, piecewise quadratic, piecewise cubic, etc., interpolation, the interpolating polynomial does not get smoother at the nodes.

• For example, for piecewise quadratic interpolation we add another interpolation point in each interval [x_i, x_{i+1}], typically the midpoint.

• If we want a smoother interpolating polynomial we can add additional interpolation conditions at the interior nodes. For example, we can require that not only is the function continuous but additionally that the first derivatives are continuous.

• When we use derivatives as interpolation conditions, the interpolating polynomial is called a Hermite interpolating polynomial.

• In the homework you will explore such a polynomial.

Cubic Spline Interpolation

• Linear interpolation is the simplest form of a spline, followed by quadratic interpolation and then cubic interpolation. It is generally assumed that cubic is the optimal degree for splines.

• So we divide our interval [a,b] into subintervals [x_{i−1}, x_i] and on each subinterval represent the function by a cubic polynomial.

• Assume we have points x_i, i = 0, 1,...,n, so we have n subintervals. On each subinterval we have four coefficients to determine the cubic, so there are 4n degrees of freedom.

• On each of the n intervals [x_{i−1}, x_i] the cubic must interpolate the data at each endpoint. This guarantees continuity at the interior nodes and gives a total of 2n constraints.

• So now we have 4n degrees of freedom and only 2n constraints, so we add additional conditions for smoothness at the n − 1 interior nodes. If we impose continuity of the first derivative at the interior nodes then we only add n − 1 conditions, so we can actually require continuity of both the first and second derivatives at the interior nodes, bringing the total up to 2n + 2(n − 1) = 4n − 2 constraints.

• How do we impose the remaining two constraints? Different conditions can be used. A common one is that the second derivative of the cubic is zero at the two endpoints; this gives the so-called natural cubic spline.

• In the lab, you will use Matlab's cubic spline interpolant to fit some data sets.

• If we determined a basis (similar to the hat functions) for cubic splines, then we would find that the support of the basis functions is no longer two subintervals but rather four subintervals.

• Here is the cubic spline basis function on the interval [−2, 2], where the subintervals have length one. To get the basis function on [x_{i−2}, x_{i+2}] we use the transformation
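The constraint count above can be realized concretely. The following NumPy sketch assembles the standard tridiagonal system for the moments M_i = s″(x_i) of the natural cubic spline (the natural end conditions M_0 = M_n = 0 supply the remaining two constraints); the data f(x) = sin(πx/2) is an illustrative choice, and in the lab you would use Matlab's built-in spline rather than writing your own.

```python
import numpy as np

def natural_cubic_spline(x, y):
    """Natural cubic spline through (x_i, y_i): s''(a) = s''(b) = 0.
    Sketch of the standard 'moment' construction; returns a callable s(t)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x) - 1
    h = np.diff(x)                       # subinterval lengths h_i = x_i - x_{i-1}
    # tridiagonal system for the moments M_i = s''(x_i), with M_0 = M_n = 0
    A = np.zeros((n + 1, n + 1))
    b = np.zeros(n + 1)
    A[0, 0] = A[n, n] = 1.0              # natural end conditions
    for i in range(1, n):                # first-derivative continuity at x_i
        A[i, i-1] = h[i-1] / 6
        A[i, i]   = (h[i-1] + h[i]) / 3
        A[i, i+1] = h[i] / 6
        b[i] = (y[i+1] - y[i]) / h[i] - (y[i] - y[i-1]) / h[i-1]
    M = np.linalg.solve(A, b)

    def s(t):
        t = np.atleast_1d(np.asarray(t, float))
        i = np.clip(np.searchsorted(x, t, side='right'), 1, n)
        hi, xl, xr = h[i-1], x[i-1], x[i]
        return (M[i-1] * (xr - t)**3 / (6 * hi) + M[i] * (t - xl)**3 / (6 * hi)
                + (y[i-1] - M[i-1] * hi**2 / 6) * (xr - t) / hi
                + (y[i]   - M[i]   * hi**2 / 6) * (t - xl) / hi)
    return s

xs = np.linspace(0.0, 2.0, 5)
s = natural_cubic_spline(xs, np.sin(np.pi * xs / 2))
```

The system is tridiagonal precisely because each smoothness condition couples only neighboring subintervals, the same locality we saw for the hat functions.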

    φ_i(x) = φ((x − x_i)/h)   for x_{i−2} ≤ x ≤ x_{i+2},
    φ_i(x) = 0                 elsewhere,

where

    φ(x) = (1/4)(x + 2)³                                   if −2 ≤ x ≤ −1,
    φ(x) = (1/4)[1 + 3(1 + x) + 3(1 + x)² − 3(1 + x)³]     if −1 ≤ x ≤ 0,
    φ(x) = (1/4)[1 + 3(1 − x) + 3(1 − x)² − 3(1 − x)³]     if 0 ≤ x ≤ 1,
    φ(x) = (1/4)(2 − x)³                                   if 1 ≤ x ≤ 2.

[Figure: the cubic spline basis function φ(x) on [−2, 2], rising from 0 at x = ±2 to its maximum value 1 at x = 0.]
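The piecewise formula for φ(x) can be transcribed directly; since the two middle pieces and the two outer pieces mirror each other, φ is even and the sketch below evaluates it via |x|. The helper names `phi` and `phi_i` are our own. Note the values φ(0) = 1 and φ(±1) = 1/4, and that φ_i(x) = φ((x − x_i)/h) is supported on the four subintervals between x_{i−2} and x_{i+2}.

```python
def phi(t):
    """Parent cubic spline basis function on [-2, 2], scaled so phi(0) = 1.
    phi is even, so we evaluate via |t|."""
    t = abs(float(t))
    if t <= 1.0:
        u = 1.0 - t
        return 0.25 * (1.0 + 3.0*u + 3.0*u**2 - 3.0*u**3)
    if t <= 2.0:
        return 0.25 * (2.0 - t)**3
    return 0.0

def phi_i(x, xi, h):
    """Basis function centered at node xi: support is [xi - 2h, xi + 2h],
    i.e., four subintervals of length h."""
    return phi((x - xi) / h)

# values at the nodes of its support (h = 1, centered at 0)
vals = [phi_i(x, 0.0, 1.0) for x in (-2, -1, 0, 1, 2)]
```

Unlike the hat functions, φ_i is not 1 at its own node and 0 at all others, so the coefficients of a spline expansion are no longer simply the data values.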
