Geometric Modeling Summer Semester 2012
Interpolation and Approximation: Interpolation · Least-Squares Techniques

Interpolation: General & Polynomial Interpolation

Interpolation Problem
First approach to modeling smooth objects:
• Given a set of points along a curve or surface
• Choose basis functions that span a suitable function space
  . Smooth basis functions
  . Any linear combination will be smooth, too
• Find a linear combination such that the curve/surface interpolates the given points
Interpolation Problem

Different types of interpolation:
• Nearest
• Linear
• Polynomial
General Formulation
Settings:
• Domain $\mathbb{R}^{d_s}$, mapping to $\mathbb{R}$.
• Looking for a function $f: \mathbb{R}^{d_s} \to \mathbb{R}$.
• Basis set: $B = \{b_1, \ldots, b_n\}$, $b_i: \mathbb{R}^{d_s} \to \mathbb{R}$.
• Represent $f$ as a linear combination of basis functions:
$$f_\lambda(x) = \sum_{i=1}^{n} \lambda_i\, b_i(x),$$
  i.e., $f_\lambda$ is determined by $\lambda = (\lambda_1, \ldots, \lambda_n)$.
• Function values: $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, $(x_i, y_i) \in \mathbb{R}^{d_s} \times \mathbb{R}$
• We want to find $\lambda$ such that: $\forall i \in \{1,\ldots,n\}:\ f_\lambda(x_i) = y_i$
Illustration
[Figure: 1D example of $f: \mathbb{R} \to \mathbb{R}$ — basis functions $b_1, b_2, b_3$ at sample positions $x_1, x_2, x_3$ with values $y_1, y_2, y_3$]

$$f_\lambda(x) = \sum_{i=1}^{n} \lambda_i\, b_i(x), \qquad \forall i \in \{1,\ldots,n\}:\ f_\lambda(x_i) = y_i$$
Solving the Interpolation Problem
Solution: linear system of equations
• Evaluate the basis functions at the points $x_i$:
$$\forall i \in \{1,\ldots,n\}: \quad \sum_{j=1}^{n} \lambda_j\, b_j(x_i) = y_i$$
• Matrix form:
$$\begin{pmatrix} b_1(x_1) & \cdots & b_n(x_1) \\ \vdots & & \vdots \\ b_1(x_n) & \cdots & b_n(x_n) \end{pmatrix} \begin{pmatrix} \lambda_1 \\ \vdots \\ \lambda_n \end{pmatrix} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$$
Illustration

[Figure: interpolation problem ⇔ linear system — the basis functions $b_1, b_2, b_3$, scaled by $\lambda_1, \lambda_2, \lambda_3$, sum up to a function that matches $y_1, y_2, y_3$ at $x_1, x_2, x_3$]

Anything in between the data points does not matter for the system (it is determined by the basis only).
Example
Example: Polynomial Interpolation
• Monomial basis $B = \{1, x, x^2, x^3, \ldots, x^{n-1}\}$
• Linear system to solve:
$$\begin{pmatrix}
1 & x_1 & \cdots & x_1^{\,n-1}\\
1 & x_2 & \cdots & x_2^{\,n-1}\\
\vdots & & & \vdots\\
1 & x_n & \cdots & x_n^{\,n-1}
\end{pmatrix}
\begin{pmatrix}\lambda_1\\ \vdots\\ \lambda_n\end{pmatrix}
=
\begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix}$$
The coefficient matrix is the "Vandermonde matrix".
Example with Numbers
Example with numbers:
• Quadratic monomial basis $B = \{1, x, x^2\}$
• Function values $(x, y)$: $\{(0,2), (1,0), (2,3)\}$
• Linear system to solve:
$$\begin{pmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{pmatrix}
\begin{pmatrix} \lambda_1 \\ \lambda_2 \\ \lambda_3 \end{pmatrix}
= \begin{pmatrix} 2 \\ 0 \\ 3 \end{pmatrix}$$
• Result: $\lambda_1 = 2$, $\lambda_2 = -9/2$, $\lambda_3 = 5/2$ (see the sketch below)
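To make the numbers concrete, here is a minimal sketch (assuming NumPy is available; the variable names are illustrative) that assembles this Vandermonde system and solves it, reproducing the coefficients above.

```python
import numpy as np

# Data points (x_i, y_i) from the example above
xs = np.array([0.0, 1.0, 2.0])
ys = np.array([2.0, 0.0, 3.0])

# Vandermonde matrix for the monomial basis {1, x, x^2}: row i is (1, x_i, x_i^2)
V = np.vander(xs, N=3, increasing=True)

# Solve V * lambda = y
lam = np.linalg.solve(V, ys)
print(lam)  # expected: [ 2.  -4.5  2.5], i.e. f(x) = 2 - 9/2 x + 5/2 x^2
```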
Condition Number...

The interpolation problem is ill-conditioned:
• For equidistant $x_i$, the condition number of the Vandermonde matrix grows exponentially with $n$ (maximum degree + 1 = number of points to interpolate)
[Plots: condition number of the Vandermonde matrix vs. number of points, on a linear and a logarithmic scale]
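The growth of the condition number can be checked directly; the following sketch (assuming NumPy; the interval [0, 1] is an arbitrary choice) prints the 2-norm condition number of the Vandermonde matrix for increasing numbers of equidistant points.

```python
import numpy as np

for n in (5, 10, 15, 20):
    xs = np.linspace(0.0, 1.0, n)               # equidistant sample positions
    V = np.vander(xs, N=n, increasing=True)     # monomial basis 1, x, ..., x^(n-1)
    print(n, np.linalg.cond(V))                 # grows roughly exponentially in n
```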
Why ill-conditioned?
• Solution via the inverse of the Vandermonde matrix, written with the SVD $M = UDV^{\mathsf T}$:
$$Mx = y \;\Rightarrow\; x = M^{-1}y = (V D^{-1} U^{\mathsf T})\,y$$
  Example of a diagonal factor: $D = \mathrm{diag}(2.5,\ 1.1,\ 0.9,\ 0.000000001)$ — tiny singular values blow up in $D^{-1}$.
• The condition number is defined as the ratio between the largest and the smallest singular value: $\mathrm{cond} = \sigma_{\max} / \sigma_{\min}$

[Plot: condition number of the Vandermonde matrix vs. number of points]

Why ill-conditioned?
Monomial Basis:
• The basis functions become increasingly indistinguishable with growing degree (they are not orthogonal)
• They only differ in their growth rate ($x^i$ grows faster than $x^{i-1}$)
• For higher degrees, numerical precision becomes a key factor

[Plot: monomial basis functions $1, x, x^2, \ldots$ on $[0, 2]$]
The Cure...
This problem can be fixed:
• Use an orthogonal polynomial basis
• How to get one? E.g., Gram–Schmidt orthogonalization (see assignment sheet #1)
• Legendre polynomials: orthogonal on $[-1, 1]$ (orthonormal after normalization)
• Much better condition of the linear system (converges to 1); compare the sketch below
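A quick way to see the effect, sketched below under the assumption that NumPy is available: compare the conditioning of the collocation matrix built from the monomial basis with the one built from the Legendre basis on [-1, 1]. The exact numbers depend on the sample positions; the point is the difference of several orders of magnitude.

```python
import numpy as np
from numpy.polynomial import legendre

n = 15
xs = np.linspace(-1.0, 1.0, n)

V_mono = np.vander(xs, N=n, increasing=True)   # monomial basis 1, x, ..., x^(n-1)
V_leg = legendre.legvander(xs, n - 1)          # Legendre basis P_0, ..., P_{n-1}

print("monomial:", np.linalg.cond(V_mono))
print("Legendre:", np.linalg.cond(V_leg))      # dramatically smaller
```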
However...
This does not fix all problems:
• Polynomial interpolation is unstable
  . "Runge's phenomenon": oscillating behavior
  . Small changes in the control points can lead to very different results; the sequence of the $x_i$ is important
• Weierstraß approximation theorem:
  . Continuous functions ($C^0$) can be approximated arbitrarily well with polynomials
  . However: convergence needs a carefully chosen construction
  . Not useful in practice
Runge's Phenomenon
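A small sketch of the phenomenon (assuming NumPy; Runge's classical example f(x) = 1/(1 + 25x²) is used): the degree-(n−1) interpolant through n equidistant nodes matches the data exactly but oscillates strongly near the interval boundaries.

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)       # Runge's example function

n = 15
nodes = np.linspace(-1.0, 1.0, n)
coeffs = np.polyfit(nodes, f(nodes), n - 1)    # interpolating polynomial of degree n-1
                                               # (NumPy may warn about conditioning -- part of the point)
xs = np.linspace(-1.0, 1.0, 400)
err = np.abs(np.polyval(coeffs, xs) - f(xs))
print("max error on [-1,1]:", err.max())       # large, despite an exact fit at the nodes
```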
Conclusion
Conclusion: Need a better basis for interpolation
For example, piecewise polynomials will work much better → splines
Approximation: (Reweighted) Least-Squares, Scattered Data Approximation
Common situation:
• We have many data points, and they might be noisy
• Example: scanned data
• We want to approximate the data with a smooth curve / surface

What we need:
• A criterion: what is a good approximation?
• Methods to compute this approximation
Least-Squares
We assume the following scenario:
• We have a set of function values $y_i$ at positions $x_i$ (1D → 1D for now).
• The independent variables $x_i$ are known exactly.
• The dependent variables $y_i$ are known approximately, with some error.
• The error is normally distributed, independent, and with the same distribution at every point (normal noise).
• We know the class of functions from which the noisy samples were taken.
Situation
[Figure: noisy sample values $y_1, y_2, \ldots, y_n$ at positions $x_1, x_2, \ldots, x_n$ around an unknown curve]

Situation:
• Original sample points taken at $x_i$ from the original $f$.
• Unknown Gaussian noise added to each $y_i$.
• We want to estimate the reconstructed $\tilde f$.
Maximum Likelihood Estimation
Goal:
• Maximize the probability that the data originated from the reconstructed curve $\tilde f$
• "Maximum likelihood estimation"
$$p_\sigma(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{x^2}{2\sigma^2}\right)$$
Gaussian normal distribution (zero mean, standard deviation $\sigma$)
Maximum Likelihood Estimation

$$\begin{aligned}
\arg\max_{\tilde f} \prod_{i=1}^{n} N_{0,\sigma}\!\big(\tilde f(x_i) - y_i\big)
&= \arg\max_{\tilde f} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\tilde f(x_i)-y_i)^2}{2\sigma^2}\right)\\
&= \arg\max_{\tilde f} \sum_{i=1}^{n} \ln\!\left[\frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(\tilde f(x_i)-y_i)^2}{2\sigma^2}\right)\right]\\
&= \arg\max_{\tilde f} \sum_{i=1}^{n} \left[\ln\frac{1}{\sqrt{2\pi}\,\sigma} - \frac{(\tilde f(x_i)-y_i)^2}{2\sigma^2}\right]\\
&= \arg\min_{\tilde f} \sum_{i=1}^{n} \frac{(\tilde f(x_i)-y_i)^2}{2\sigma^2}
 \;=\; \arg\min_{\tilde f} \sum_{i=1}^{n} \big(\tilde f(x_i)-y_i\big)^2
\end{aligned}$$
Least-Squares Approximation
This shows:
• The maximum-likelihood solution in the considered scenario (error in y-direction, iid Gaussian noise) minimizes the sum of squared errors.
Next: compute the optimal coefficients
• Linear ansatz: $\tilde f(x) := \sum_{j=1}^{k} \lambda_j\, b_j(x)$
• Task: determine the optimal $\lambda_j$
Maximum Likelihood Estimation
Compute the optimal coefficients:
$$\begin{aligned}
\arg\min_{\lambda} \sum_{i=1}^{n} \big(\tilde f(x_i)-y_i\big)^2
&= \arg\min_{\lambda} \sum_{i=1}^{n} \Big(\sum_{j=1}^{k} \lambda_j\, b_j(x_i) - y_i\Big)^{\!2}
 = \arg\min_{\lambda} \sum_{i=1}^{n} \big(\lambda^{\mathsf T}\mathbf b(x_i) - y_i\big)^2\\
&= \arg\min_{\lambda} \left[\lambda^{\mathsf T}\Big(\sum_{i=1}^{n}\mathbf b(x_i)\,\mathbf b^{\mathsf T}(x_i)\Big)\lambda
 \;-\; 2\sum_{i=1}^{n} y_i\,\lambda^{\mathsf T}\mathbf b(x_i) \;+\; \sum_{i=1}^{n} y_i^2\right]
\end{aligned}$$
This has the form $x^{\mathsf T}A x + b^{\mathsf T}x + c$: a quadratic optimization problem.
Critical Point
Notation:
$$\lambda = \begin{pmatrix}\lambda_1\\ \vdots\\ \lambda_k\end{pmatrix} \ (k \text{ entries}), \quad
\mathbf b(x) = \begin{pmatrix}b_1(x)\\ \vdots\\ b_k(x)\end{pmatrix} \ (k \text{ entries}), \quad
\mathbf b_i = \begin{pmatrix}b_i(x_1)\\ \vdots\\ b_i(x_n)\end{pmatrix} \ (n \text{ entries}), \quad
\mathbf y = \begin{pmatrix}y_1\\ \vdots\\ y_n\end{pmatrix} \ (n \text{ entries})$$
Setting the gradient of
$$\lambda^{\mathsf T}\Big(\sum_{i=1}^{n}\mathbf b(x_i)\,\mathbf b^{\mathsf T}(x_i)\Big)\lambda
 - 2\sum_{i=1}^{n} y_i\,\lambda^{\mathsf T}\mathbf b(x_i) + \sum_{i=1}^{n} y_i^2$$
to zero, i.e.
$$2\Big(\sum_{i=1}^{n}\mathbf b(x_i)\,\mathbf b^{\mathsf T}(x_i)\Big)\lambda
 - 2\begin{pmatrix}\mathbf y^{\mathsf T}\mathbf b_1\\ \vdots\\ \mathbf y^{\mathsf T}\mathbf b_k\end{pmatrix} = 0,$$
we obtain a linear system of equations:
$$\Big(\sum_{i=1}^{n}\mathbf b(x_i)\,\mathbf b^{\mathsf T}(x_i)\Big)\lambda
 = \begin{pmatrix}\mathbf y^{\mathsf T}\mathbf b_1\\ \vdots\\ \mathbf y^{\mathsf T}\mathbf b_k\end{pmatrix}$$

Critical Point
This can also be written as:
$$\begin{pmatrix}
\langle b_1,b_1\rangle & \cdots & \langle b_1,b_k\rangle\\
\vdots & & \vdots\\
\langle b_k,b_1\rangle & \cdots & \langle b_k,b_k\rangle
\end{pmatrix}
\begin{pmatrix}\lambda_1\\ \vdots\\ \lambda_k\end{pmatrix}
=
\begin{pmatrix}\langle y,b_1\rangle\\ \vdots\\ \langle y,b_k\rangle\end{pmatrix}
\quad\text{with}\quad
\langle b_i,b_j\rangle := \sum_{t=1}^{n} b_i(x_t)\,b_j(x_t), \qquad
\langle y,b_i\rangle := \sum_{t=1}^{n} b_i(x_t)\,y_t$$
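The following sketch (assuming NumPy; the test function and noise level are made up for illustration) assembles exactly this system — the Gram matrix ⟨b_i, b_j⟩ on the left, ⟨y, b_i⟩ on the right — for a quadratic monomial basis and solves for λ.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 50)
ys = 1.0 + 2.0 * xs - 3.0 * xs**2 + 0.05 * rng.standard_normal(xs.shape)  # noisy samples

# Columns of B are the basis functions b_1 = 1, b_2 = x, b_3 = x^2 evaluated at all x_t
B = np.vander(xs, N=3, increasing=True)

G = B.T @ B          # Gram matrix: G[i, j] = <b_i, b_j> = sum_t b_i(x_t) b_j(x_t)
rhs = B.T @ ys       # right-hand side: <y, b_i> = sum_t b_i(x_t) y_t
lam = np.linalg.solve(G, rhs)
print(lam)           # close to the true coefficients (1, 2, -3)
```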
Summary
Statistical model yields the least-squares criterion:
$$\arg\max_{\tilde f} \prod_{i=1}^{n} N_{0,\sigma}\!\big(\tilde f(x_i) - y_i\big)
 = \arg\min_{\tilde f} \sum_{i=1}^{n} \big(\tilde f(x_i) - y_i\big)^2$$
A linear function space leads to a quadratic objective:
$$\tilde f(x) := \sum_{j=1}^{k} \lambda_j\, b_j(x)
 \;\Rightarrow\; \arg\min_{\lambda} \sum_{i=1}^{n} \Big(\sum_{j=1}^{k} \lambda_j\, b_j(x_i) - y_i\Big)^{\!2}$$
Critical point: a linear system
$$\begin{pmatrix}
\langle b_1,b_1\rangle & \cdots & \langle b_1,b_k\rangle\\
\vdots & & \vdots\\
\langle b_k,b_1\rangle & \cdots & \langle b_k,b_k\rangle
\end{pmatrix}
\begin{pmatrix}\lambda_1\\ \vdots\\ \lambda_k\end{pmatrix}
=
\begin{pmatrix}\langle y,b_1\rangle\\ \vdots\\ \langle y,b_k\rangle\end{pmatrix}
\quad\text{with}\quad
\langle b_i,b_j\rangle := \sum_{t=1}^{n} b_i(x_t)\,b_j(x_t), \quad
\langle y,b_i\rangle := \sum_{t=1}^{n} b_i(x_t)\,y_t$$
Variants
Weighted least squares:
• In case the noise has a different standard deviation at each data point
• This gives a weighted least-squares problem
• Noisier points have a smaller influence
Same procedure as on the previous slides...
$$\begin{aligned}
\arg\max_{\tilde f} \prod_{i=1}^{n} N_{0,\sigma_i}\!\big(\tilde f(x_i) - y_i\big)
&= \arg\max_{\tilde f} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\!\left(-\frac{(\tilde f(x_i)-y_i)^2}{2\sigma_i^2}\right)\\
&= \arg\max_{\tilde f} \sum_{i=1}^{n} \log\!\left[\frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\!\left(-\frac{(\tilde f(x_i)-y_i)^2}{2\sigma_i^2}\right)\right]\\
&= \arg\max_{\tilde f} \sum_{i=1}^{n} \left[\log\frac{1}{\sqrt{2\pi}\,\sigma_i} - \frac{(\tilde f(x_i)-y_i)^2}{2\sigma_i^2}\right]\\
&= \arg\min_{\tilde f} \sum_{i=1}^{n} \frac{(\tilde f(x_i)-y_i)^2}{2\sigma_i^2}
 \;=\; \arg\min_{\tilde f} \sum_{i=1}^{n} \underbrace{\frac{1}{\sigma_i^2}}_{\text{weights}}\,\big(\tilde f(x_i)-y_i\big)^2
\end{aligned}$$
Result
Linear system for the general (weighted) case:
$$\begin{pmatrix}
\langle b_1,b_1\rangle & \cdots & \langle b_1,b_k\rangle\\
\vdots & & \vdots\\
\langle b_k,b_1\rangle & \cdots & \langle b_k,b_k\rangle
\end{pmatrix}
\begin{pmatrix}\lambda_1\\ \vdots\\ \lambda_k\end{pmatrix}
=
\begin{pmatrix}\langle y,b_1\rangle\\ \vdots\\ \langle y,b_k\rangle\end{pmatrix}
\quad\text{with}\quad
\langle b_i,b_j\rangle := \sum_{l=1}^{n} b_i(x_l)\,b_j(x_l)\,\omega^2(x_l), \quad
\langle y,b_i\rangle := \sum_{l=1}^{n} b_i(x_l)\,y_l\,\omega^2(x_l)$$
$$\omega^2(x_i) = \frac{1}{\sigma_i^2}, \quad\text{i.e.}\quad \omega(x_i) = \frac{1}{\sigma_i}$$
Larger $\omega$ ⇒ larger influence of the data point (see the sketch below).
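A sketch of the weighted variant (assuming NumPy; the split into a quiet and a noisy half is an artificial example): each point enters with weight ω_i² = 1/σ_i², so the noisier points pull less on the fit.

```python
import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(0.0, 1.0, 50)
sigma = np.where(xs > 0.5, 0.3, 0.02)          # right half of the data is much noisier
ys = 1.0 + 2.0 * xs + sigma * rng.standard_normal(xs.shape)

B = np.vander(xs, N=2, increasing=True)        # basis b_1 = 1, b_2 = x
W = np.diag(1.0 / sigma**2)                    # weights omega_i^2 = 1 / sigma_i^2

# Weighted normal equations: (B^T W B) lambda = B^T W y
lam = np.linalg.solve(B.T @ W @ B, B.T @ W @ ys)
print(lam)                                     # close to the true coefficients (1, 2)
```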
Least-Squares Linear Systems
Remark:
• We get the same result if we solve an overdetermined system for the interpolation problem in a least-squares sense.
• Least-squares solution of the linear system $Ax = b$:
$$\arg\min_{x} \|Ax - b\|^2 = \arg\min_{x} \big(x^{\mathsf T}A^{\mathsf T}A\,x - 2\,x^{\mathsf T}A^{\mathsf T}b + b^{\mathsf T}b\big)$$
  Compute the gradient and set it to zero: $2A^{\mathsf T}A\,x - 2A^{\mathsf T}b = 0$, i.e. $A^{\mathsf T}A\,x = A^{\mathsf T}b$
• "System of normal equations"
SVD
Problem with the normal equations:
• The condition number of the normal equations is the square of that of $A$ itself
• Proof (SVD): $A = UDV^{\mathsf T}$ ⇒ $A^{\mathsf T}A = V D U^{\mathsf T} U D V^{\mathsf T} = V D^2 V^{\mathsf T}$
• For "evil" (i.e., ill-conditioned) problems, the normal equations are not the best way to solve the problem
• In that case, we can use the SVD to solve the problem...
Least-Squares with SVD
Compute the singular value decomposition $A = UDV^{\mathsf T}$, then:
$$\begin{aligned}
A^{\mathsf T}A\,x &= A^{\mathsf T}b\\
V D U^{\mathsf T} U D V^{\mathsf T} x &= V D U^{\mathsf T} b\\
V D^2 V^{\mathsf T} x &= V D U^{\mathsf T} b\\
D^2 V^{\mathsf T} x &= D U^{\mathsf T} b\\
D V^{\mathsf T} x &= U^{\mathsf T} b\\
x &= V D^{-1} U^{\mathsf T} b
\end{aligned}$$
If $D$ is not invertible (not full rank), inverting only the non-zero entries yields the least-squares solution of minimal norm (the critical point with $\|x\|$ minimal), as sketched below.
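A minimal sketch of this route (assuming NumPy; the random overdetermined system is only a stand-in): solve the least-squares problem via the SVD, inverting only the non-negligible singular values, and compare against NumPy's built-in solver.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 4))   # overdetermined system: 100 equations, 4 unknowns
b = rng.standard_normal(100)

U, d, Vt = np.linalg.svd(A, full_matrices=False)        # A = U diag(d) V^T
d_inv = np.where(d > 1e-12 * d.max(), 1.0 / d, 0.0)     # invert non-zero singular values only
x_svd = Vt.T @ (d_inv * (U.T @ b))                      # x = V D^{-1} U^T b

x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(x_svd, x_lstsq))                      # True
```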
One more Variant...
Function approximation:
• Given the following problem:
  . We know a function $f: \mathbb{R}^n \to \mathbb{R}$
  . We want to approximate $f$ in a linear subspace: $\tilde f(x) := \sum_{j=1}^{k} \lambda_j\, b_j(x)$
  . How to choose $\lambda$?
• Difference: a continuous function as the "data" to be matched.
• Solution: almost the same as before...
Function Approximation
Objective function:
• $\|\tilde f(x) - f\|^2 \to \min$
• We obtain:
$$\Big\|\sum_{j=1}^{k}\lambda_j b_j(x) - f\Big\|^2
 = \Big\langle \sum_{j=1}^{k}\lambda_j b_j - f,\; \sum_{j=1}^{k}\lambda_j b_j - f \Big\rangle
 = \lambda^{\mathsf T}
\begin{pmatrix}
\langle b_1,b_1\rangle & \cdots & \langle b_1,b_k\rangle\\
\vdots & & \vdots\\
\langle b_k,b_1\rangle & \cdots & \langle b_k,b_k\rangle
\end{pmatrix}
\lambda
 \;-\; 2\sum_{j=1}^{k}\lambda_j \langle b_j, f\rangle \;+\; \langle f, f\rangle$$
Function Approximation
Critical point (i.e., solution):
$$\begin{pmatrix}
\langle b_1,b_1\rangle & \cdots & \langle b_1,b_k\rangle\\
\vdots & & \vdots\\
\langle b_k,b_1\rangle & \cdots & \langle b_k,b_k\rangle
\end{pmatrix}
\begin{pmatrix}\lambda_1\\ \vdots\\ \lambda_k\end{pmatrix}
=
\begin{pmatrix}\langle b_1, f\rangle\\ \vdots\\ \langle b_k, f\rangle\end{pmatrix}$$
with:
$$\langle f, g\rangle = \int f(x)\,g(x)\,dx \quad\text{(unweighted version)}, \qquad
\langle f, g\rangle = \int f(x)\,g(x)\,\omega^2(x)\,dx \quad\text{(weighted version)}$$
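For the continuous case, the inner products become integrals. The sketch below (assuming NumPy and SciPy's quad for numerical integration; projecting sin onto cubic monomials on [0, π] is just an illustrative choice) assembles and solves the corresponding system in the unweighted setting.

```python
import numpy as np
from scipy.integrate import quad

f = np.sin
basis = [lambda x, j=j: x**j for j in range(4)]      # b_1, ..., b_4 = 1, x, x^2, x^3
a, b = 0.0, np.pi

# <b_i, b_j> = integral b_i(x) b_j(x) dx  and  <b_i, f> = integral b_i(x) f(x) dx
G = np.array([[quad(lambda x: bi(x) * bj(x), a, b)[0] for bj in basis] for bi in basis])
rhs = np.array([quad(lambda x: bi(x) * f(x), a, b)[0] for bi in basis])

lam = np.linalg.solve(G, rhs)
print(lam)   # coefficients of the best L2 approximation of sin(x) on [0, pi]
```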
Summary
What we can do so far:
• Least-squares approximation:
  . Given more data points than basis functions, we can fit an approximating function from a basis function set to the data
• Variants:
  . We can solve linear systems in a least-squares sense
  . Given a function, we can fit the most similar approximation from a subspace
• Extensions:
  . Any known uncertainty in the data can be modeled by weights
  . The multi-dimensional case is similar
Remaining Problems
What is missing:
• Any error in x-direction is ignored so far (only y-direction)
  . We will look at that problem next (total least squares)...
• The noise must be Gaussian
  . Can be generalized using iteratively reweighted least squares (M-estimators)
Approximation: Total Least Squares

Statistical Model
Generative Model:
[Figure: original curve / surface → noisy sample points]
Statistical Model
Generative Model:
1. Determine a sample point (uniform)
2. Add noise (Gaussian)

[Figure: sampling → Gaussian noise → many samples → distribution (in space)]
Squared Distance Function
Result:
• A Gaussian distribution convolved with the object
• No analytical density

Approximation:
• 1D Gaussian → minimize the squared residuals
• This case → minimize the squared distance function
General Total Least Squares
General total least squares:
• Given a class of objects $\mathrm{obj}_\lambda$ with parameters $\lambda \in \mathbb{R}^k$.
• A set of $n$ sample points $d_i \in \mathbb{R}^m$ (Gaussian, iid, isotropic covariance).
• The total least squares solution minimizes:
$$\arg\min_{\lambda\in\mathbb{R}^k} \sum_{i=1}^{n} \mathrm{dist}\big(\mathrm{obj}_\lambda, d_i\big)^2$$
• In general: a non-linear, possibly constrained (restrictions on the admissible $\lambda$s) optimization problem
• Special cases can be solved exactly
Fitting Affine Subspaces
The following problem can be solved exactly:
• Best fitting line to a set of 2D or 3D points
• Best fitting plane to a set of 3D points
• In general: the affine subspace of $\mathbb{R}^m$ of dimension $d \le m$ that best approximates a set of data points $\mathbf x_i \in \mathbb{R}^m$.
This will lead to the famous principal component analysis (PCA).

Start: 0-dim Subspaces
Easy start: the optimal 0-dimensional affine subspace
• Given a set $\mathbf X$ of $n$ data points $\mathbf x_i \in \mathbb{R}^m$, which point $\mathbf x_0$ has minimum least-squares error to all data points?
• Answer: just the sample mean (average):
$$\mathbf x_0 = \mathrm m(\mathbf X) := \frac{1}{n}\sum_{i=1}^{n}\mathbf x_i$$

One-Dimensional Subspaces...
Next:
• What is the optimal line (1D affine subspace) that approximates a set of data points $\mathbf X$?
• Two questions:
  . Optimal origin (point on the line)? — This is still the average.
  . Optimal direction? — We will look at that next...
• Parametric line equation:
$$\mathbf x(t) = \mathbf x_0 + t\,\mathbf r, \qquad \mathbf x_0 \in \mathbb{R}^m,\ \mathbf r \in \mathbb{R}^m,\ \|\mathbf r\| = 1$$

Best Fitting Line
Line equation ($\mathbf r$ of unit length):
$$\mathbf x(t) = \mathbf x_0 + t\,\mathbf r, \qquad \mathbf x_0 \in \mathbb{R}^m,\ \mathbf r \in \mathbb{R}^m,\ \|\mathbf r\| = 1$$
Best projection onto the line:
$$t_i = \langle \mathbf r, \mathbf x_i - \mathbf x_0 \rangle$$
Objective function:
$$\sum_{i=1}^{n} \mathrm{dist}(\mathrm{line}, \mathbf x_i)^2 = \sum_{i=1}^{n} \|\mathbf x_0 + t_i\mathbf r - \mathbf x_i\|^2$$

Best Fitting Line
With the optimal parameters $t_i = \langle \mathbf r, \mathbf x_i - \mathbf x_0 \rangle$:
$$\begin{aligned}
\sum_{i=1}^{n} \mathrm{dist}(\mathrm{line}, \mathbf x_i)^2
&= \sum_{i=1}^{n} \|\mathbf x_0 + t_i\mathbf r - \mathbf x_i\|^2
 = \sum_{i=1}^{n} \|t_i\mathbf r - [\mathbf x_i - \mathbf x_0]\|^2\\
&= \sum_{i=1}^{n} t_i^2\,\|\mathbf r\|^2
 - 2\sum_{i=1}^{n} t_i\,\langle \mathbf r, \mathbf x_i - \mathbf x_0\rangle
 + \sum_{i=1}^{n} \|\mathbf x_i - \mathbf x_0\|^2\\
&= \sum_{i=1}^{n} \langle \mathbf r, \mathbf x_i - \mathbf x_0\rangle^2
 - 2\sum_{i=1}^{n} \langle \mathbf r, \mathbf x_i - \mathbf x_0\rangle^2
 + \sum_{i=1}^{n} \|\mathbf x_i - \mathbf x_0\|^2\\
&= -\sum_{i=1}^{n} \langle \mathbf r, \mathbf x_i - \mathbf x_0\rangle^2
 + \sum_{i=1}^{n} \|\mathbf x_i - \mathbf x_0\|^2\\
&= -\,\mathbf r^{\mathsf T}\underbrace{\Big(\sum_{i=1}^{n} (\mathbf x_i-\mathbf x_0)(\mathbf x_i-\mathbf x_0)^{\mathsf T}\Big)}_{\text{matrix } =: \,\mathbf S}\,\mathbf r
 + \underbrace{\sum_{i=1}^{n} \|\mathbf x_i - \mathbf x_0\|^2}_{\text{const. w.r.t. } \mathbf r}
\end{aligned}$$
Best Fitting Line
Result:
$$\sum_{i=1}^{n} \mathrm{dist}(\mathrm{line}, \mathbf x_i)^2 = -\,\mathbf r^{\mathsf T}\mathbf S\,\mathbf r + \mathrm{const.}
\qquad\text{with}\quad \mathbf S = \sum_{i=1}^{n} (\mathbf x_i-\mathbf x_0)(\mathbf x_i-\mathbf x_0)^{\mathsf T},\ \ \|\mathbf r\| = 1$$
Eigenvalue problem:
• $\mathbf r^{\mathsf T}\mathbf S\,\mathbf r$ is a Rayleigh quotient
• Minimizing the energy ⇔ maximizing the quotient
• Solution: the eigenvector with the largest eigenvalue

General Case
Fitting a d-dimensional affine subspace:
• d = 1: line
• d = 2: plane
• d = 3: 3D subspace
• ...

Simple rule:
• Use the d eigenvectors with the largest eigenvalues from the spectrum of $\mathbf S$.
• This gives the (total) least-squares optimal subspace that approximates the data set $\mathbf X$.

General Case
Procedure: Principal Component Analysis (PCA)
• Compute the average $\mathbf x_0 = \mathrm m(\mathbf D)$
• Compute the "scatter matrix" $\mathbf S = \sum_{i=1}^{n} (\mathbf d_i - \mathbf x_0)(\mathbf d_i - \mathbf x_0)^{\mathsf T}$
• Let $(\lambda_1, \mathbf v_1), (\lambda_2, \mathbf v_2), \ldots$ be the sorted eigenvalue/eigenvector pairs of $\mathbf S$, where $\lambda_1$ is the largest and the $\mathbf v_i$ are of unit length.
• The subspace spanned by $\mathbf p(t_1, \ldots, t_d) = \mathbf x_0 + \sum_{i=1}^{d} t_i \mathbf v_i$ approximates the data optimally in terms of squared distances to a point in the subspace.
• Stronger: projecting the data into this subspace is the best d-dimensional (affine subspace) approximation of the data.
A sketch of this procedure is given below.
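A compact sketch of the procedure (assuming NumPy; the function name and test data are illustrative): compute the mean, the scatter matrix, and keep the d eigenvectors with the largest eigenvalues.

```python
import numpy as np

def fit_affine_subspace(X, d):
    """X: (n, m) data matrix. Returns the origin x0 and d orthonormal directions (d, m)."""
    x0 = X.mean(axis=0)                 # sample mean = optimal 0-dimensional subspace
    C = X - x0
    S = C.T @ C                         # scatter matrix S = sum_i (x_i - x0)(x_i - x0)^T
    evals, evecs = np.linalg.eigh(S)    # symmetric eigendecomposition, ascending eigenvalues
    order = np.argsort(evals)[::-1]     # largest eigenvalues first
    return x0, evecs[:, order[:d]].T

# Example: 2D points scattered tightly around the line y = 0.5 x
rng = np.random.default_rng(3)
t = rng.standard_normal(200)
X = np.column_stack([t, 0.5 * t]) + 0.01 * rng.standard_normal((200, 2))
x0, dirs = fit_affine_subspace(X, d=1)
print(x0, dirs)   # direction close to (1, 0.5) normalized, up to sign
```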
Statistical Interpretation
$$\boldsymbol\mu = \mathbf x_0 = \frac{1}{n}\sum_{i=1}^{n}\mathbf x_i, \qquad
\boldsymbol\Sigma = \frac{1}{n-1}\,\mathbf S = \frac{1}{n-1}\sum_{i=1}^{n} (\mathbf x_i-\mathbf x_0)(\mathbf x_i-\mathbf x_0)^{\mathsf T}$$
$$N_{\boldsymbol\Sigma,\boldsymbol\mu}(\mathbf x) = \frac{1}{(2\pi)^{d/2}\,\det(\boldsymbol\Sigma)^{1/2}}
 \exp\!\left(-\tfrac12 (\mathbf x-\boldsymbol\mu)^{\mathsf T}\boldsymbol\Sigma^{-1}(\mathbf x-\boldsymbol\mu)\right)$$
Observation:
• $\frac{1}{n-1}\mathbf S$ is the covariance matrix of the data set $\mathbf X = \{\mathbf x_i\}_{i=1..n}$
• PCA can be interpreted as fitting a Gaussian distribution and computing its main axes

Applications
Fitting a line to a point cloud in $\mathbb{R}^2$:
• $\mathbf x(t) = \mathbf x_0 + t\,\mathbf v_1$
• Sample mean and the direction of the largest eigenvalue

Plane fitting in $\mathbb{R}^3$:
• Sample mean and the two directions of the largest eigenvalues
• The smallest eigenvalue is small → its eigenvector points in the normal direction
• The aspect ratio ($\lambda_3 / \lambda_2$) is a measure of "flatness" (quality of the fit)

Applications
Application: normal estimation in point clouds
• Given a set of points $p_i \in \mathbb{R}^3$ that form a smooth surface.
• We want to estimate:
  . Surface normals
  . Sample spacing
Algorithm (sketched below):
• For each point, compute the k nearest neighbors (k ≈ 20)
• Compute a PCA (average, main axes) of these points
  . Eigenvector with the smallest eigenvalue → normal direction
  . The other two eigenvectors → tangent vectors
  . The tangent eigenvalues give an estimate of the sample spacing
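A sketch of this algorithm (assuming NumPy and SciPy's cKDTree for the k-nearest-neighbor queries; the flat test surface and k = 20 are illustrative choices): for every point, run a local PCA and take the eigenvector with the smallest eigenvalue as the normal. Note that the sign of the normals is not disambiguated here.

```python
import numpy as np
from scipy.spatial import cKDTree

def estimate_normals(points, k=20):
    """points: (n, 3) array. Returns (n, 3) unit normals (sign not disambiguated)."""
    tree = cKDTree(points)
    normals = np.empty_like(points)
    for i, p in enumerate(points):
        _, idx = tree.query(p, k=k)          # indices of the k nearest neighbors
        nbrs = points[idx]
        C = nbrs - nbrs.mean(axis=0)
        S = C.T @ C                          # local scatter matrix
        evals, evecs = np.linalg.eigh(S)     # ascending eigenvalues
        normals[i] = evecs[:, 0]             # smallest eigenvalue -> normal direction
    return normals

# Example: noisy samples of the plane z = 0; estimated normals should be close to (0, 0, +-1)
rng = np.random.default_rng(4)
pts = np.column_stack([rng.uniform(-1, 1, 500),
                       rng.uniform(-1, 1, 500),
                       0.001 * rng.standard_normal(500)])
print(estimate_normals(pts)[:3])
```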
Example
[Figure panels: points · normals · tangential frames · elliptic splats with shading]
Example
Example:
- k-nearest neighbors
- PCA coordinate frames at each point
- Quadratic monomials (bivariate, local coordinates)
- Least-squares fit