Geometric Modeling Summer Semester 2012

Interpolation and Approximation · Least-Squares Techniques

Interpolation: The General Interpolation Problem

First approach to modeling smooth objects:
• Given a set of points along a curve or surface
• Choose basis functions that span a suitable function space
  - Smooth basis functions
  - Any linear combination will then be smooth, too
• Find a linear combination such that the curve/surface interpolates the given points

Interpolation Problem

Different types of interpolation:
• Nearest
• Linear
• Polynomial

General Formulation

Settings:
• Domain $\Omega \subseteq \mathbb{R}^{d_s}$, mapping to $\mathbb{R}$
• Looking for a function $f: \Omega \rightarrow \mathbb{R}$
• Basis set $B = \{b_1, \dots, b_n\}$, $b_i: \Omega \rightarrow \mathbb{R}$
• Represent $f$ as a linear combination of basis functions:
  $$f_\lambda(x) = \sum_{i=1}^{n} \lambda_i b_i(x), \quad \text{i.e. } f_\lambda \text{ is determined by } \lambda = (\lambda_1, \dots, \lambda_n)^T$$
• Function values: $\{(x_1, y_1), \dots, (x_n, y_n)\}$, $(x_i, y_i) \in \mathbb{R}^{d_s} \times \mathbb{R}$
• We want to find $\lambda$ such that $\forall i \in \{1, \dots, n\}: f_\lambda(x_i) = y_i$

Illustration

[Figure: 1D example, $f: \mathbb{R} \rightarrow \mathbb{R}$, with basis functions $b_1, b_2, b_3$ centered at $x_1, x_2, x_3$ and prescribed values $y_1, y_2, y_3$.]

$$f_\lambda(x) = \sum_{i=1}^{n} \lambda_i b_i(x), \qquad \forall i \in \{1, \dots, n\}: f_\lambda(x_i) = y_i$$

Solving the Interpolation Problem

Solution: a linear system of equations
• Evaluate the basis functions at the points $x_i$:
  $$\forall i \in \{1, \dots, n\}: \quad \sum_{j=1}^{n} \lambda_j b_j(x_i) = y_i$$
• Matrix form:
  $$\begin{pmatrix} b_1(x_1) & \cdots & b_n(x_1) \\ \vdots & & \vdots \\ b_1(x_n) & \cdots & b_n(x_n) \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$

Illustration

[Figure: the interpolation problem (prescribed values $y_1, y_2, y_3$ at $x_1, x_2, x_3$ with coefficients $\lambda_1, \lambda_2, \lambda_3$) and the corresponding linear system.]

Illustration

[Figure: same example as before.] Anything in between the data points does not matter for the interpolation conditions; there, the result is determined by the basis only.

Example: Polynomial Interpolation

• $B = \{1, x, x^2, x^3, \dots, x^{n-1}\}$
• Linear system to solve (a Vandermonde matrix):
  $$\begin{pmatrix} 1 & x_1 & \cdots & x_1^{n-1} \\ \vdots & & & \vdots \\ 1 & x_n & \cdots & x_n^{n-1} \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$

Example with Numbers

• Quadratic monomial basis $B = \{1, x, x^2\}$
• Function values $(x, y)$: $\{(0, 2), (1, 0), (2, 3)\}$
• Linear system to solve:
  $$\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{pmatrix} = \begin{pmatrix} 2 \\ 0 \\ 3 \end{pmatrix}$$
• Result: $\lambda_1 = 2$, $\lambda_2 = -9/2$, $\lambda_3 = 5/2$
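As a quick sanity check (not part of the slides), this system can be set up and solved with NumPy; the variable names are illustrative:

```python
import numpy as np

# Sample points (x_i, y_i) from the slide: (0,2), (1,0), (2,3)
x = np.array([0.0, 1.0, 2.0])
y = np.array([2.0, 0.0, 3.0])

# Vandermonde matrix for the monomial basis B = {1, x, x^2}
V = np.vander(x, N=3, increasing=True)   # rows: [1, x_i, x_i^2]

# Solve V @ lam = y for the coefficients lambda
lam = np.linalg.solve(V, y)
print(lam)           # expected: [ 2.  -4.5  2.5]

# Verify the interpolation conditions f(x_i) = y_i
f = lambda t: lam[0] + lam[1] * t + lam[2] * t**2
print(f(x))          # [2. 0. 3.]
```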

The interpolation problem is ill-conditioned:

• For equidistant $x_i$, the condition number of the Vandermonde matrix grows exponentially with $n$ (maximum degree $+ 1$ = number of points to interpolate).

[Plots: condition number of the Vandermonde matrix vs. number of points (0 to 25), on linear and logarithmic scales; the condition number reaches roughly $10^{17}$ to $10^{18}$ for about 20 to 25 points.]
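The trend summarized in these plots can be reproduced with a few lines of NumPy (a sketch of the experiment, with my own choice of interval):

```python
import numpy as np

for n in (5, 10, 15, 20, 25):
    x = np.linspace(0.0, 1.0, n)          # equidistant nodes x_i
    V = np.vander(x, increasing=True)     # monomial basis 1, x, ..., x^(n-1)
    print(n, np.linalg.cond(V, 2))        # 2-norm condition number sigma_max / sigma_min
```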

Why ill-conditioned?

• Solution with the inverse of the Vandermonde matrix, using the SVD $M = U D V^T$:
  $$M x = y \;\Rightarrow\; x = M^{-1} y = (V D^{-1} U^T)\, y, \qquad \text{e.g. } D = \begin{pmatrix} 2.5 & 0 & 0 & 0 \\ 0 & 1.1 & 0 & 0 \\ 0 & 0 & 0.9 & 0 \\ 0 & 0 & 0 & 0.000000001 \end{pmatrix}$$
• The condition number is defined as the ratio between the largest and the smallest singular value: $\sigma_{\max} / \sigma_{\min}$

[Plot: condition number vs. number of points, as on the previous slide.]

Why ill-conditioned?

Monomial basis:
• The functions become increasingly indistinguishable with growing degree (they are not orthogonal)
• They differ mainly in their growth rate ($x^i$ grows faster than $x^{i-1}$)
• For higher degrees, numerical precision becomes a key factor

[Plot: the monomial basis functions $1, x, x^2, \dots$ on $[0, 2]$.]

The Cure...

This problem can be fixed:
• Use an orthogonal polynomial basis
• How to get one? E.g. Gram-Schmidt orthogonalization (see assignment sheet #1)
• Legendre polynomials: orthogonal on $[-1, 1]$ (orthonormal after normalization)
• Much better condition number of the linear system (converges to 1 for an orthonormal basis)
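To illustrate the improved conditioning, the sketch below compares the Gram (normal-equation) matrices of the monomial basis and an orthonormalized Legendre basis on $[-1, 1]$; the dense sampling, the tooling, and the normalization factors are my assumptions, not part of the lecture:

```python
import numpy as np
from numpy.polynomial import legendre

deg = 9
x = np.linspace(-1.0, 1.0, 2000)           # dense, roughly uniform samples on [-1, 1]

# Monomial design matrix, and Legendre design matrix rescaled so the basis
# is orthonormal with respect to the continuous inner product on [-1, 1]
A_mono = np.vander(x, deg + 1, increasing=True)
A_leg = legendre.legvander(x, deg) * np.sqrt((2 * np.arange(deg + 1) + 1) / 2)

for name, A in (("monomial", A_mono), ("orthonormal Legendre", A_leg)):
    G = A.T @ A * (2.0 / len(x))           # Riemann-sum approximation of the Gram matrix
    print(name, np.linalg.cond(G))         # Legendre: close to 1, monomials: huge
```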

However...

This does not fix all problems:
• Polynomial interpolation is unstable
  - "Runge's phenomenon": oscillating behavior
  - Small changes in the control points can lead to very different results; the sequence of the $x_i$ matters
• Weierstraß approximation theorem:
  - Continuous functions ($C^0$) can be approximated arbitrarily well by polynomials
  - However: a carefully chosen construction is needed for convergence
  - Not useful in practice

Runge's Phenomenon

[Figure: illustration of Runge's phenomenon.]
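The phenomenon can be reproduced with a short script; the test function $1/(1 + 25x^2)$ is the classical example and an assumption about what the figure shows:

```python
import numpy as np

runge = lambda x: 1.0 / (1.0 + 25.0 * x**2)    # classical Runge example function

n = 15                                         # polynomial degree
xi = np.linspace(-1.0, 1.0, n + 1)             # equidistant interpolation nodes
coeff = np.polyfit(xi, runge(xi), n)           # interpolating polynomial of degree n

x = np.linspace(-1.0, 1.0, 400)
err = np.abs(np.polyval(coeff, x) - runge(x))
print("max error on [-1, 1]:", err.max())      # large near the boundaries, grows with n
```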

Conclusion

Conclusion: we need a better basis for interpolation.

For example, piecewise polynomials will work much better → splines.

Approximation: (Reweighted) Least Squares, Scattered Data Approximation

Common situation:
• We have many data points, and they might be noisy
• Example: scanned data
• We want to approximate the data with a smooth curve / surface

What we need:
• A criterion: what is a good approximation?
• Methods to compute this approximation

Least-Squares

We assume the following scenario:
• We have a set of function values $y_i$ at positions $x_i$ (1D → 1D for now).
• The independent variables $x_i$ are known exactly.
• The dependent variables $y_i$ are known only approximately, with some error.
• The error is normally distributed, independent, and with the same distribution at every point (normal noise).
• We know the class of functions from which the noisy samples were taken.

Situation

[Figure: noisy samples $y_1, \dots, y_n$ at positions $x_1, \dots, x_n$.]

Situation:
• Original sample points taken at the $x_i$ from the original $f$.
• Unknown Gaussian noise is added to each $y_i$.
• We want to estimate a reconstruction $\tilde{f}$.

Maximum Likelihood Estimation

Goal:
• Maximize the probability that the data originated from the reconstructed curve $\tilde{f}$
• "Maximum likelihood estimation"

$$p_\sigma(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{x^2}{2\sigma^2}\right)$$

Gaussian normal distribution

Maximum Likelihood Estimation

$$
\begin{aligned}
\arg\max_{\tilde{f}} \prod_{i=1}^{n} N_{0,\sigma}\!\big(\tilde{f}(x_i) - y_i\big)
&= \arg\max_{\tilde{f}} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(\tilde{f}(x_i) - y_i)^2}{2\sigma^2}\right) \\
&= \arg\max_{\tilde{f}} \sum_{i=1}^{n} \ln\!\left(\frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(\tilde{f}(x_i) - y_i)^2}{2\sigma^2}\right)\right) \\
&= \arg\max_{\tilde{f}} \sum_{i=1}^{n} \left( \ln\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{(\tilde{f}(x_i) - y_i)^2}{2\sigma^2} \right) \\
&= \arg\min_{\tilde{f}} \sum_{i=1}^{n} \frac{(\tilde{f}(x_i) - y_i)^2}{2\sigma^2} \\
&= \arg\min_{\tilde{f}} \sum_{i=1}^{n} \big(\tilde{f}(x_i) - y_i\big)^2
\end{aligned}
$$

Least-Squares Approximation

This shows:
• The maximum-likelihood solution in the considered scenario (errors in $y$-direction, i.i.d. Gaussian noise) minimizes the sum of squared errors.

Next: compute the optimal coefficients
• Linear ansatz: $\tilde{f}(x) := \sum_{j=1}^{k} \lambda_j b_j(x)$
• Task: determine the optimal $\lambda_j$

Maximum Likelihood Estimation

Compute the optimal coefficients:
$$
\begin{aligned}
\arg\min_{\lambda} \sum_{i=1}^{n} \big(\tilde{f}(x_i) - y_i\big)^2
&= \arg\min_{\lambda} \sum_{i=1}^{n} \left( \sum_{j=1}^{k} \lambda_j b_j(x_i) - y_i \right)^{\!2} \\
&= \arg\min_{\lambda} \sum_{i=1}^{n} \big( \lambda^T \mathbf{b}(x_i) - y_i \big)^2 \\
&= \arg\min_{\lambda} \left[ \lambda^T \left( \sum_{i=1}^{n} \mathbf{b}(x_i)\,\mathbf{b}^T(x_i) \right) \lambda \;-\; 2 \sum_{i=1}^{n} y_i\, \lambda^T \mathbf{b}(x_i) \;+\; \sum_{i=1}^{n} y_i^2 \right]
\end{aligned}
$$
This has the form $\mathbf{x}^T A \mathbf{x} + \mathbf{b}^T \mathbf{x} + c$: a quadratic optimization problem.

Critical Point

$$\lambda := \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_k \end{pmatrix} \;(k \text{ entries}), \quad
\mathbf{b}(x) := \begin{pmatrix} b_1(x) \\ \vdots \\ b_k(x) \end{pmatrix} \;(k \text{ entries}), \quad
\mathbf{b}_i := \begin{pmatrix} b_i(x_1) \\ \vdots \\ b_i(x_n) \end{pmatrix} \;(n \text{ entries}), \quad
\mathbf{y} := \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \;(n \text{ entries})$$

Setting the gradient of the quadratic objective to zero,
$$\nabla_\lambda \left[ \lambda^T \left( \sum_{i=1}^{n} \mathbf{b}(x_i)\,\mathbf{b}^T(x_i) \right) \lambda - 2 \sum_{i=1}^{n} y_i\, \lambda^T \mathbf{b}(x_i) + \sum_{i=1}^{n} y_i^2 \right]
= 2 \left( \sum_{i=1}^{n} \mathbf{b}(x_i)\,\mathbf{b}^T(x_i) \right) \lambda - 2 \begin{pmatrix} \mathbf{y}^T \mathbf{b}_1 \\ \vdots \\ \mathbf{y}^T \mathbf{b}_k \end{pmatrix} = 0,$$
we obtain a linear system of equations:
$$\left( \sum_{i=1}^{n} \mathbf{b}(x_i)\,\mathbf{b}^T(x_i) \right) \lambda = \begin{pmatrix} \mathbf{y}^T \mathbf{b}_1 \\ \vdots \\ \mathbf{y}^T \mathbf{b}_k \end{pmatrix}$$

Critical Point

This can also be written as:

$$\begin{pmatrix} \langle b_1, b_1 \rangle & \cdots & \langle b_1, b_k \rangle \\ \vdots & & \vdots \\ \langle b_k, b_1 \rangle & \cdots & \langle b_k, b_k \rangle \end{pmatrix}
\begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_k \end{pmatrix}
= \begin{pmatrix} \langle y, b_1 \rangle \\ \vdots \\ \langle y, b_k \rangle \end{pmatrix}$$
with
$$\langle b_i, b_j \rangle := \sum_{t=1}^{n} b_i(x_t)\, b_j(x_t), \qquad \langle y, b_i \rangle := \sum_{t=1}^{n} b_i(x_t)\, y_t$$
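A direct transcription of this linear system (a minimal sketch with a monomial basis and synthetic data; all names are illustrative):

```python
import numpy as np

def lsq_fit(x, y, basis):
    """Solve the normal equations  [<b_i, b_j>] lambda = [<y, b_i>]."""
    B = np.column_stack([b(x) for b in basis])   # B[t, j] = b_j(x_t)
    G = B.T @ B                                  # Gram matrix <b_i, b_j>
    rhs = B.T @ y                                # right-hand side <y, b_i>
    return np.linalg.solve(G, rhs)

# Noisy samples of a quadratic, fitted with the basis {1, x, x^2}
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.05, size=x.size)

basis = [lambda t: np.ones_like(t), lambda t: t, lambda t: t**2]
print(lsq_fit(x, y, basis))   # approximately [ 1. -2.  3.]
```

Equivalently, `np.linalg.lstsq(B, y)` solves the same problem without forming the Gram matrix explicitly.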

Summary

Statistical model yields the least-squares criterion:
$$\arg\max_{\tilde{f}} \prod_{i=1}^{n} N_{0,\sigma}\!\big(\tilde{f}(x_i) - y_i\big) \;=\; \arg\min_{\tilde{f}} \sum_{i=1}^{n} \big(\tilde{f}(x_i) - y_i\big)^2$$

A linear function space leads to a quadratic objective:
$$\tilde{f}(x) := \sum_{j=1}^{k} \lambda_j b_j(x) \quad\Rightarrow\quad \arg\min_{\lambda} \sum_{i=1}^{n} \left( \sum_{j=1}^{k} \lambda_j b_j(x_i) - y_i \right)^{\!2}$$

Critical point: a linear system
$$\begin{pmatrix} \langle b_1, b_1 \rangle & \cdots & \langle b_1, b_k \rangle \\ \vdots & & \vdots \\ \langle b_k, b_1 \rangle & \cdots & \langle b_k, b_k \rangle \end{pmatrix} \lambda = \begin{pmatrix} \langle y, b_1 \rangle \\ \vdots \\ \langle y, b_k \rangle \end{pmatrix},
\quad \text{with } \langle b_i, b_j \rangle := \sum_{t=1}^{n} b_i(x_t)\, b_j(x_t), \;\; \langle y, b_i \rangle := \sum_{t=1}^{n} b_i(x_t)\, y_t$$

Variants

Weighted least squares:
• Used when the noise has a different standard deviation $\sigma_i$ at each data point
• This gives a weighted least-squares problem
• Noisier points have a smaller influence

Same procedure as on the previous slides...

$$
\begin{aligned}
\arg\max_{\tilde{f}} \prod_{i=1}^{n} N_{0,\sigma_i}\!\big(\tilde{f}(x_i) - y_i\big)
&= \arg\max_{\tilde{f}} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left(-\frac{(\tilde{f}(x_i) - y_i)^2}{2\sigma_i^2}\right) \\
&= \arg\max_{\tilde{f}} \sum_{i=1}^{n} \ln\!\left(\frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\!\left(-\frac{(\tilde{f}(x_i) - y_i)^2}{2\sigma_i^2}\right)\right) \\
&= \arg\max_{\tilde{f}} \sum_{i=1}^{n} \left( \ln\frac{1}{\sqrt{2\pi}\,\sigma_i} - \frac{(\tilde{f}(x_i) - y_i)^2}{2\sigma_i^2} \right) \\
&= \arg\min_{\tilde{f}} \sum_{i=1}^{n} \frac{(\tilde{f}(x_i) - y_i)^2}{2\sigma_i^2} \\
&= \arg\min_{\tilde{f}} \sum_{i=1}^{n} \underbrace{\frac{1}{\sigma_i^2}}_{\text{weights}} \big(\tilde{f}(x_i) - y_i\big)^2
\end{aligned}
$$

Result

Linear system for the general (weighted) case:
$$\begin{pmatrix} \langle b_1, b_1 \rangle & \cdots & \langle b_1, b_k \rangle \\ \vdots & & \vdots \\ \langle b_k, b_1 \rangle & \cdots & \langle b_k, b_k \rangle \end{pmatrix} \lambda = \begin{pmatrix} \langle y, b_1 \rangle \\ \vdots \\ \langle y, b_k \rangle \end{pmatrix}$$
with
$$\langle b_i, b_j \rangle := \sum_{l=1}^{n} b_i(x_l)\, b_j(x_l)\, \omega^2(x_l), \qquad \langle y, b_i \rangle := \sum_{l=1}^{n} b_i(x_l)\, y_l\, \omega^2(x_l)$$
and
$$\omega^2(x_i) = \frac{1}{\sigma_i^2}, \quad \text{i.e. } \omega(x_i) = \frac{1}{\sigma_i}.$$
Larger $\omega$ → larger influence of the data point.
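The earlier sketch extends to the weighted case by scaling each row of the design matrix with $\omega_i = 1/\sigma_i$ (again an illustration, not code from the lecture):

```python
import numpy as np

def weighted_lsq_fit(x, y, sigma, basis):
    """Weighted least squares: each row is scaled by w_i = 1/sigma_i,
    so noisier points get less influence."""
    B = np.column_stack([b(x) for b in basis])
    w = 1.0 / sigma                       # weights omega_i = 1/sigma_i
    Bw = B * w[:, None]                   # scale the rows of the design matrix
    yw = y * w
    return np.linalg.solve(Bw.T @ Bw, Bw.T @ yw)

# Example: points with known, differing noise levels
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([1.0, 1.9, 3.2, 4.1, 10.0])        # last point is very noisy...
sigma = np.array([0.1, 0.1, 0.1, 0.1, 5.0])     # ...and we know it
basis = [lambda t: np.ones_like(t), lambda t: t]
print(weighted_lsq_fit(x, y, sigma, basis))     # close to the line 1 + 2x
```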

Least-Squares Linear Systems

Remark:
• We get the same result if we solve an overdetermined system for the interpolation problem in a least-squares sense.
• Least-squares solution to a linear system $A\mathbf{x} = \mathbf{b}$:
$$\arg\min_{\mathbf{x}} \|A\mathbf{x} - \mathbf{b}\|^2 = \arg\min_{\mathbf{x}} \left( \mathbf{x}^T A^T A \mathbf{x} - 2 (A^T \mathbf{b})^T \mathbf{x} + \mathbf{b}^T \mathbf{b} \right)$$
Computing the gradient and setting it to zero:
$$2 A^T A \mathbf{x} - 2 A^T \mathbf{b} = 0, \quad \text{i.e. } A^T A \mathbf{x} = A^T \mathbf{b}$$
• "System of normal equations"

SVD

Problem with the normal equations:
• The condition number of the normal equations is the square of that of $A$ itself.
• Proof (via the SVD $A = U D V^T$):
$$A^T A = V D U^T U D V^T = V D^2 V^T$$
• For "evil" (i.e. ill-conditioned) problems, the normal equations are not the best way to solve the problem.
• In that case, we can use the SVD to solve the problem directly...

Least-Squares with SVD

Compute the singular value decomposition $A = U D V^T$, then:
$$
\begin{aligned}
A^T A \mathbf{x} &= A^T \mathbf{b} \\
V D U^T U D V^T \mathbf{x} &= V D U^T \mathbf{b} \\
V D^2 V^T \mathbf{x} &= V D U^T \mathbf{b} \\
D^2 V^T \mathbf{x} &= D U^T \mathbf{b} \\
D V^T \mathbf{x} &= U^T \mathbf{b} \\
\mathbf{x} &= V D^{-1} U^T \mathbf{b}
\end{aligned}
$$
If $D$ is not invertible (not full rank), inverting only the non-zero entries yields the least-squares solution of minimal norm (the critical point with $\|\mathbf{x}\|$ minimal).
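In NumPy terms this looks as follows (a sketch; `np.linalg.pinv` applies the same truncated inversion of the singular values):

```python
import numpy as np

def lsq_svd(A, b, tol=1e-12):
    """Minimum-norm least-squares solution x = V D^+ U^T b via the SVD."""
    U, d, Vt = np.linalg.svd(A, full_matrices=False)
    d_inv = np.where(d > tol * d.max(), 1.0 / d, 0.0)   # invert only the non-zero singular values
    return Vt.T @ (d_inv * (U.T @ b))

# Rank-deficient example: the third column is the sum of the first two
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

x = lsq_svd(A, b)
print(x)
print(np.allclose(x, np.linalg.pinv(A) @ b))   # same result as the pseudoinverse
```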

One more Variant...

Function approximation:
• Given the following problem:
  - We know a function $f: \Omega \subseteq \mathbb{R}^n \rightarrow \mathbb{R}$.
  - We want to approximate $f$ in a linear subspace: $\tilde{f}(x) := \sum_{j=1}^{k} \lambda_j b_j(x)$
  - How to choose $\lambda$?
• Difference: a continuous function serves as the "data" to be matched.
• Solution: almost the same as before...

Function Approximation

Objective function:
$$\big\| \tilde{f}(x) - f \big\|^2 \rightarrow \min$$
We obtain:
$$\left\| \sum_{j=1}^{k} \lambda_j b_j - f \right\|^2 = \left\langle \sum_{j=1}^{k} \lambda_j b_j - f,\; \sum_{j=1}^{k} \lambda_j b_j - f \right\rangle
= \lambda^T \begin{pmatrix} \langle b_1, b_1 \rangle & \cdots & \langle b_1, b_k \rangle \\ \vdots & & \vdots \\ \langle b_k, b_1 \rangle & \cdots & \langle b_k, b_k \rangle \end{pmatrix} \lambda \;-\; 2 \sum_{j=1}^{k} \lambda_j \langle b_j, f \rangle \;+\; \langle f, f \rangle$$

Function Approximation

Critical point (i.e., the solution):
$$\begin{pmatrix} \langle b_1, b_1 \rangle & \cdots & \langle b_1, b_k \rangle \\ \vdots & & \vdots \\ \langle b_k, b_1 \rangle & \cdots & \langle b_k, b_k \rangle \end{pmatrix}
\begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_k \end{pmatrix}
= \begin{pmatrix} \langle b_1, f \rangle \\ \vdots \\ \langle b_k, f \rangle \end{pmatrix}$$
with
$$\langle f, g \rangle = \int_\Omega f(x)\, g(x)\, dx \quad \text{(unweighted version)}$$
$$\langle f, g \rangle = \int_\Omega f(x)\, g(x)\, \omega^2(x)\, dx \quad \text{(weighted version)}$$
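As an illustration of this continuous projection (my own example function, basis, and quadrature; not from the slides), the integrals can simply be discretized:

```python
import numpy as np
from numpy.polynomial import legendre

# Approximate f(x) = exp(x) on [-1, 1] by a quadratic polynomial, using the
# unweighted inner product <f, g> = \int f g dx (approximated by a Riemann sum).
f = np.exp
deg = 2
xs = np.linspace(-1.0, 1.0, 4001)
dx = xs[1] - xs[0]
inner = lambda u, v: np.sum(u * v) * dx

B = legendre.legvander(xs, deg)                                    # basis b_j = Legendre P_j
G = np.array([[inner(B[:, i], B[:, j]) for j in range(deg + 1)]    # Gram matrix <b_i, b_j>
              for i in range(deg + 1)])
rhs = np.array([inner(B[:, i], f(xs)) for i in range(deg + 1)])    # <b_i, f>

lam = np.linalg.solve(G, rhs)                                      # critical point
print("max error:", np.abs(B @ lam - f(xs)).max())                 # best L2 quadratic fit
```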

Summary

What we can do so far:
• Least-squares approximation:
  - Given more data points than basis functions, we can fit an approximate function from a basis function set to the data
• Variants:
  - We can solve linear systems in a least-squares sense
  - Given a function, we can fit the most similar approximation from a subspace
• Extensions:
  - Any known uncertainty in the data can be modeled by weights
  - The multi-dimensional case is similar

Remaining problems

What is missing:
• Any error in $x$-direction is ignored so far (only the $y$-direction is considered)
  - We will look at that problem next (total least squares)...
• The noise must be Gaussian
  - Can be generalized using iteratively reweighted least squares (M-estimators)

Approximation: Total Least Squares

Statistical Model

Generative model:

[Figure: original curve / surface and noisy sample points.]

Statistical Model

Generative model:
1. Determine a sample point (uniform sampling distribution)
2. Add noise (Gaussian, in space)

[Figure: sampling distribution, Gaussian noise (in space), many samples.]

Squared Distance Function

Result:
• Gaussian distribution convolved with the object
• No analytical density

Approximation:
• 1D Gaussian → minimize the squared residual
• This case → minimize the squared distance function

General Total Least Squares

General total least squares:
• Given a class of objects $\mathrm{obj}_\lambda$ with parameters $\lambda \in \mathbb{R}^k$.
• A set of $n$ sample points $d_i \in \mathbb{R}^m$ (Gaussian noise, i.i.d., isotropic covariance).
• The total least-squares solution minimizes:
$$\arg\min_{\lambda \in \mathbb{R}^k} \sum_{i=1}^{n} \mathrm{dist}\big(\mathrm{obj}_\lambda, d_i\big)^2$$
• In general: a non-linear, possibly constrained (restrictions on admissible $\lambda$) optimization problem
• Special cases can be solved exactly

Fitting Affine Subspaces

The following problems can be solved exactly:
• Best fitting line to a set of 2D or 3D points
• Best fitting plane to a set of 3D points
• In general: the affine subspace of $\mathbb{R}^m$ with dimension $d \le m$ that best approximates a set of data points $\mathbf{x}_i \in \mathbb{R}^m$.

This will lead to the famous principal component analysis (PCA).

Start: 0-dim Subspaces

Easy start: the optimal 0-dimensional affine subspace
• Given a set $X$ of $n$ data points $\mathbf{x}_i \in \mathbb{R}^m$, what is the point $\mathbf{x}_0$ with minimum least-squares error to all data points?
• Answer: just the sample mean (average):
$$\mathbf{x}_0 = m(X) := \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$$

One Dimensional Subspaces...

Next:
• What is the optimal line (1D affine subspace) that approximates a set of data points $X$?
• Two questions:
  - Optimal origin (point on the line)? This is still the average.
  - Optimal direction? We will look at that next...
• Parametric line equation:
$$\mathbf{x}(t) = \mathbf{x}_0 + t\,\mathbf{r}, \qquad \mathbf{x}_0 \in \mathbb{R}^m,\; \mathbf{r} \in \mathbb{R}^m,\; \|\mathbf{r}\| = 1$$

Best Fitting Line

Line equation ($\mathbf{r}$ of unit length):
$$\mathbf{x}(t) = \mathbf{x}_0 + t\,\mathbf{r}, \qquad \mathbf{x}_0 \in \mathbb{R}^m,\; \mathbf{r} \in \mathbb{R}^m,\; \|\mathbf{r}\| = 1$$
Best projection of a point onto the line:
$$t_i = \langle \mathbf{r}, \mathbf{x}_i - \mathbf{x}_0 \rangle$$
Objective function:
$$\sum_{i=1}^{n} \mathrm{dist}(\mathrm{line}, \mathbf{x}_i)^2 = \sum_{i=1}^{n} \| \mathbf{x}_0 + t_i \mathbf{r} - \mathbf{x}_i \|^2$$

Best Fitting Line

With the optimal parameters $t_i = \langle \mathbf{r}, \mathbf{x}_i - \mathbf{x}_0 \rangle$:
$$
\begin{aligned}
\sum_{i=1}^{n} \mathrm{dist}(\mathrm{line}, \mathbf{x}_i)^2
&= \sum_{i=1}^{n} \| \mathbf{x}_0 + t_i \mathbf{r} - \mathbf{x}_i \|^2
= \sum_{i=1}^{n} \| t_i \mathbf{r} - [\mathbf{x}_i - \mathbf{x}_0] \|^2 \\
&= \sum_{i=1}^{n} t_i^2 \|\mathbf{r}\|^2 \;-\; 2 \sum_{i=1}^{n} t_i \langle \mathbf{r}, \mathbf{x}_i - \mathbf{x}_0 \rangle \;+\; \sum_{i=1}^{n} \| \mathbf{x}_i - \mathbf{x}_0 \|^2 \\
&= \sum_{i=1}^{n} \langle \mathbf{r}, \mathbf{x}_i - \mathbf{x}_0 \rangle^2 \;-\; 2 \sum_{i=1}^{n} \langle \mathbf{r}, \mathbf{x}_i - \mathbf{x}_0 \rangle^2 \;+\; \sum_{i=1}^{n} \| \mathbf{x}_i - \mathbf{x}_0 \|^2 \\
&= -\sum_{i=1}^{n} \langle \mathbf{r}, \mathbf{x}_i - \mathbf{x}_0 \rangle^2 \;+\; \sum_{i=1}^{n} \| \mathbf{x}_i - \mathbf{x}_0 \|^2 \\
&= -\mathbf{r}^T \underbrace{\left( \sum_{i=1}^{n} (\mathbf{x}_i - \mathbf{x}_0)(\mathbf{x}_i - \mathbf{x}_0)^T \right)}_{\text{matrix} \,=:\, \mathbf{S}} \mathbf{r} \;+\; \underbrace{\sum_{i=1}^{n} \| \mathbf{x}_i - \mathbf{x}_0 \|^2}_{\text{const. w.r.t. } \mathbf{r}}
\end{aligned}
$$

Best Fitting Line

Result:
$$\sum_{i=1}^{n} \mathrm{dist}(\mathrm{line}, \mathbf{x}_i)^2 = -\mathbf{r}^T \mathbf{S}\, \mathbf{r} + \text{const.}, \qquad \mathbf{S} = \sum_{i=1}^{n} (\mathbf{x}_i - \mathbf{x}_0)(\mathbf{x}_i - \mathbf{x}_0)^T, \quad \|\mathbf{r}\| = 1$$

Eigenvalue problem:
• $\mathbf{r}^T \mathbf{S}\, \mathbf{r}$ (with $\|\mathbf{r}\| = 1$) is a Rayleigh quotient
• Minimizing the energy means maximizing the quotient
• Solution: the eigenvector of $\mathbf{S}$ with the largest eigenvalue

General Case

Fitting a $d$-dimensional affine subspace:
• $d = 1$: line
• $d = 2$: plane
• $d = 3$: 3D subspace
• ...

Simple rule:
• Use the $d$ eigenvectors with the largest eigenvalues from the spectrum of $\mathbf{S}$.
• This gives the (total) least-squares optimal subspace that approximates the data set $X$.

General Case

Procedure: Principal Component Analysis (PCA)

• Compute the average $\mathbf{x}_0 = m(D)$
• Compute the "scatter matrix" $\mathbf{S} = \sum_{i=1}^{n} (\mathbf{d}_i - \mathbf{x}_0)(\mathbf{d}_i - \mathbf{x}_0)^T$
• Let $(\lambda_1, \mathbf{v}_1), \dots, (\lambda_n, \mathbf{v}_n)$ be the sorted eigenvalue/eigenvector pairs of $\mathbf{S}$, where $\lambda_1$ is the largest and the $\mathbf{v}_i$ are of unit length.
• The subspace spanned by $p(t_1, \dots, t_d) = \mathbf{x}_0 + \sum_{i=1}^{d} t_i \mathbf{v}_i$ approximates the data optimally in terms of squared distances to a point in the subspace.
• Stronger: projecting the data into this subspace is the best $d$-dimensional (affine-subspace) data approximation.
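A compact sketch of this procedure in NumPy (using `eigh` since $\mathbf{S}$ is symmetric; the test data are synthetic):

```python
import numpy as np

def pca_subspace(X, d):
    """Fit the best d-dimensional affine subspace to the rows of X (n x m).
    Returns the mean x0 and the d principal directions (as columns)."""
    x0 = X.mean(axis=0)                          # sample mean
    C = X - x0
    S = C.T @ C                                  # scatter matrix
    evals, evecs = np.linalg.eigh(S)             # ascending eigenvalues (S is symmetric)
    order = np.argsort(evals)[::-1]              # sort descending
    return x0, evecs[:, order[:d]]

# Example: noisy samples along a line in 2D
rng = np.random.default_rng(1)
t = rng.uniform(-1.0, 1.0, 200)
X = np.stack([1.0 + 2.0 * t, 0.5 - 1.0 * t], axis=1) + rng.normal(scale=0.05, size=(200, 2))

x0, V = pca_subspace(X, 1)
print("mean:", x0, "direction:", V[:, 0])        # direction ~ +-(2, -1) / sqrt(5)
```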

Statistical Interpretation

$$\boldsymbol{\mu} = \mathbf{x}_0 = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i, \qquad
\boldsymbol{\Sigma} = \frac{1}{n-1} \mathbf{S} = \frac{1}{n-1} \sum_{i=1}^{n} (\mathbf{x}_i - \mathbf{x}_0)(\mathbf{x}_i - \mathbf{x}_0)^T$$

[Figure: Gaussian $N_{\boldsymbol{\Sigma},\boldsymbol{\mu}}$ centered at $\mathbf{x}_0$ with principal axes $\lambda_1 \mathbf{v}_1$ and $\lambda_2 \mathbf{v}_2$.]

$$N_{\boldsymbol{\Sigma},\boldsymbol{\mu}}(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} \det(\boldsymbol{\Sigma})^{1/2}} \exp\!\left( -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right)$$

Observation:
• $\frac{1}{n-1}\mathbf{S}$ is the covariance matrix of the data set $X = \{\mathbf{x}_i\}_{i=1 \dots n}$
• PCA can be interpreted as fitting a Gaussian distribution and computing its main axes

Applications

Fitting a line to a point cloud in $\mathbb{R}^2$:
• $\mathbf{x}(t) = \mathbf{x}_0 + t\,\mathbf{v}_1$: sample mean and the direction of the largest eigenvalue

Plane fitting in $\mathbb{R}^3$:
• Sample mean and the two directions of the largest eigenvalues
• The smallest eigenvalue is small:
  - Its eigenvector points in the normal direction
  - The aspect ratio ($\lambda_3 / \lambda_2$) is a measure of "flatness" (quality of the fit)

Applications

Application: normal estimation in point clouds
• Given a set of points $p_i \in \mathbb{R}^3$ that form a smooth surface.
• We want to estimate:
  - Surface normals
  - Sample spacing

Algorithm (a sketch follows below):
• For each point, compute its $k$ nearest neighbors ($k \approx 20$)
• Compute a PCA (average, main axes) of these points
  - The eigenvector with the smallest eigenvalue → normal direction
  - The other two eigenvectors → tangent vectors
  - The tangent eigenvalues give an estimate of the sample spacing
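A minimal sketch of this algorithm (brute-force nearest neighbours in NumPy; a practical implementation would use a spatial data structure such as a k-d tree, and the sign of each estimated normal is arbitrary):

```python
import numpy as np

def estimate_normals(points, k=20):
    """Estimate a normal per point as the eigenvector of the local scatter
    matrix with the smallest eigenvalue (PCA over the k nearest neighbours)."""
    normals = np.empty_like(points)
    for i, p in enumerate(points):
        d2 = np.sum((points - p) ** 2, axis=1)        # squared distances to all points
        nbrs = points[np.argsort(d2)[:k]]             # k nearest neighbours (incl. p itself)
        C = nbrs - nbrs.mean(axis=0)
        S = C.T @ C                                   # local scatter matrix
        evals, evecs = np.linalg.eigh(S)              # ascending eigenvalues
        normals[i] = evecs[:, 0]                      # smallest eigenvalue -> normal direction
    return normals
```

The two remaining eigenvectors span the local tangent plane, and their eigenvalues can be used to estimate the local sample spacing, as described above.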

Example

[Figures: input points, estimated normals, tangential frames, and elliptic splats with shading.]

Example:
- k-nearest neighbors
- PCA coordinate frames at each point
- quadratic monomials (bivariate, local coordinates)
- least-squares fit