Linear Models for Regression
Henrik I Christensen
Robotics & Intelligent Machines @ GT
Georgia Institute of Technology, Atlanta, GA 30332-0280
[email protected]

Outline
1 Introduction
2 Preliminaries
3 Linear Basis Function Models
4 Bayesian Linear Regression
5 Bayesian Model Comparison
6 Summary

Introduction
The objective of regression is to enable prediction of a value t based on modelling over a dataset X.
Consider a set of D observations over a space. How can we generate estimates for the future? Battery time? Time to completion? Position of doors?

Introduction (2)
Example from Chapter 1 (figure: a polynomial curve y(x, w) fitted to data points t over x).

    y(x, w) = w_0 + w_1 x + w_2 x^2 + \dots + w_m x^m = \sum_{i=0}^{m} w_i x^i

Introduction (3)
In general the functions could go beyond simple polynomials. The "components" are termed basis functions, i.e.

    y(x, w) = \sum_{i=0}^{m} w_i \phi_i(x) = w^T \phi(x)

Loss Function
For optimization we need a penalty / loss function L(t, y(x)). The expected loss is then

    E[L] = \iint L(t, y(x)) \, p(x, t) \, dx \, dt

For the squared loss function we have

    E[L] = \iint \{y(x) - t\}^2 \, p(x, t) \, dx \, dt

Goal: choose y(x) to minimize the expected loss E[L].

Loss Function (2)
Derivation of the extremum:

    \frac{\delta E[L]}{\delta y(x)} = 2 \int \{y(x) - t\} \, p(x, t) \, dt = 0

This implies that

    y(x) = \frac{\int t \, p(x, t) \, dt}{p(x)} = \int t \, p(t|x) \, dt = E[t|x]

Loss Function - Interpretation
(Figure: the regression function y(x) passes through the conditional mean; at a point x_0, the value y(x_0) is the mean of the conditional density p(t|x_0).)

Alternative
Consider a small rewrite:

    \{y(x) - t\}^2 = \{y(x) - E[t|x] + E[t|x] - t\}^2

The expected loss is then

    E[L] = \int \{y(x) - E[t|x]\}^2 \, p(x) \, dx + \iint \{E[t|x] - t\}^2 \, p(x, t) \, dx \, dt

The first term is minimized by choosing y(x) = E[t|x]; the second term is the intrinsic noise, independent of y(x).

Polynomial Basis Functions
Basic definition:

    \phi_i(x) = x^i

These are global functions: a small change in x affects all of them.
(Figure: polynomial basis functions plotted on [-1, 1].)
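The basis-function expansion above is easy to experiment with in code. Below is a minimal NumPy sketch (not from the slides) that builds a polynomial design matrix with \phi_i(x) = x^i and evaluates y(x, w) = \sum_i w_i \phi_i(x); the function names, weights, and inputs are illustrative choices.

```python
import numpy as np

def poly_design_matrix(x, m):
    """Design matrix Phi with Phi[n, i] = phi_i(x_n) = x_n**i for i = 0..m."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, m + 1, increasing=True)

def predict(x, w):
    """Evaluate y(x, w) = sum_i w_i * phi_i(x), i.e. Phi @ w."""
    w = np.asarray(w, dtype=float)
    return poly_design_matrix(x, len(w) - 1) @ w

# Toy usage: a cubic model evaluated on a few inputs in [-1, 1].
w = np.array([0.5, -1.0, 0.0, 2.0])   # w_0 .. w_3 (illustrative values)
x = np.linspace(-1.0, 1.0, 5)
print(predict(x, w))
```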
Gaussian Basis Functions
Basic definition:

    \phi_i(x) = \exp\left( -\frac{(x - \mu_i)^2}{2 s^2} \right)

A step toward Gaussian mixtures: each basis function has local impact.
They are not required to have a probabilistic interpretation.
\mu_i controls position and s controls scale.
(Figure: Gaussian basis functions plotted on [-1, 1].)

Sigmoid Basis Functions
Basic definition:

    \phi_i(x) = \sigma\left( \frac{x - \mu_i}{s} \right), \qquad \sigma(a) = \frac{1}{1 + e^{-a}}

\mu_i controls location and s controls slope.
(Figure: sigmoidal basis functions plotted on [-1, 1].)

Maximum Likelihood & Least Squares
Assume observations from a deterministic function contaminated by Gaussian noise:

    t = y(x, w) + \epsilon, \qquad p(\epsilon|\beta) = N(\epsilon|0, \beta^{-1})

The problem at hand is then

    p(t|x, w, \beta) = N(t|y(x, w), \beta^{-1})

From a series of observations we have the likelihood

    p(\mathbf{t}|X, w, \beta) = \prod_{i=1}^{N} N(t_i | w^T \phi(x_i), \beta^{-1})

Maximum Likelihood & Least Squares (2)
This results in

    \ln p(\mathbf{t}|w, \beta) = \frac{N}{2} \ln \beta - \frac{N}{2} \ln(2\pi) - \beta E_D(w)

where

    E_D(w) = \frac{1}{2} \sum_{i=1}^{N} \{t_i - w^T \phi(x_i)\}^2

is the sum of squared errors.

Maximum Likelihood & Least Squares (3)
Computing the extremum yields

    w_{ML} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{t}

where

    \Phi = \begin{pmatrix} \phi_0(x_1) & \phi_1(x_1) & \cdots & \phi_{M-1}(x_1) \\ \phi_0(x_2) & \phi_1(x_2) & \cdots & \phi_{M-1}(x_2) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_0(x_N) & \phi_1(x_N) & \cdots & \phi_{M-1}(x_N) \end{pmatrix}

Line Estimation
Least squares minimization:
Line equation: y = ax + b
Error in fit: \sum_i (y_i - a x_i - b)^2
Solution (normal equations over the data means):

    \begin{pmatrix} \overline{x^2} & \bar{x} \\ \bar{x} & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \overline{xy} \\ \bar{y} \end{pmatrix}

Minimizes vertical errors. Non-robust!

LSQ on Lasers
Line model: r_i \cos(\phi_i - \theta) = \rho
Error model: d_i = r_i \cos(\phi_i - \theta) - \rho
Optimize: \arg\min_{(\rho, \theta)} \sum_i (r_i \cos(\phi_i - \theta) - \rho)^2
The error model is derived in Deriche et al. (1992).
Well suited for "clean-up" of Hough lines.

Total Least Squares
Line equation: ax + by + c = 0
Error in fit: \sum_i (a x_i + b y_i + c)^2 where a^2 + b^2 = 1.
Solution: (a, b) satisfies

    \begin{pmatrix} \overline{x^2} - \bar{x}\bar{x} & \overline{xy} - \bar{x}\bar{y} \\ \overline{xy} - \bar{x}\bar{y} & \overline{y^2} - \bar{y}\bar{y} \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \mu \begin{pmatrix} a \\ b \end{pmatrix}

where \mu is a scale factor (the eigenvalue), and c = -a\bar{x} - b\bar{y}.

Line Representations
The line representation is crucial. Often a redundant model is adopted: line parameters vs end-points.
This matters for fusion of segments. End-points are less stable.

Sequential Adaptation
In some cases one-at-a-time estimation is more suitable. Also known as gradient descent:

    w^{(\tau+1)} = w^{(\tau)} - \eta \nabla E_n = w^{(\tau)} + \eta \, (t_n - w^{(\tau)T} \phi(x_n)) \, \phi(x_n)

Known as least-mean squares (LMS); a small sketch of the update follows below.
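A minimal sketch of the LMS update above, assuming NumPy; the function lms_fit, the learning rate, and the toy data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def lms_fit(Phi, t, eta=0.05, epochs=50, seed=0):
    """Sequential least-mean-squares: one (phi_n, t_n) pair at a time.

    Update: w <- w + eta * (t_n - w @ phi_n) * phi_n
    """
    rng = np.random.default_rng(seed)
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(epochs):
        for n in rng.permutation(N):      # visit observations in random order
            err = t[n] - w @ Phi[n]       # prediction error on sample n
            w += eta * err * Phi[n]       # gradient step on E_n
    return w

# Toy usage: noisy line t = 2x - 1 with basis phi(x) = (1, x).
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 100)
Phi = np.column_stack([np.ones_like(x), x])
t = 2.0 * x - 1.0 + 0.1 * rng.standard_normal(x.size)
print(lms_fit(Phi, t))                    # approximately [-1, 2]
```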
An open issue is how to choose the learning rate \eta.

Regularized Least Squares
As seen in lecture 2, some control of the parameters may be useful. Consider the error function

    E_D(w) + \lambda E_W(w)

which generates

    \frac{1}{2} \sum_{i=1}^{N} \{t_i - w^T \phi(x_i)\}^2 + \frac{\lambda}{2} w^T w

and which is minimized by

    w = (\lambda I + \Phi^T \Phi)^{-1} \Phi^T \mathbf{t}

Bayesian Linear Regression
Define a conjugate prior over w:

    p(w) = N(w | m_0, S_0)

Given the likelihood function, regular Bayesian analysis yields the posterior

    p(w | \mathbf{t}) = N(w | m_N, S_N)

where

    m_N = S_N (S_0^{-1} m_0 + \beta \Phi^T \mathbf{t}), \qquad S_N^{-1} = S_0^{-1} + \beta \Phi^T \Phi

Bayesian Linear Regression (2)
A common choice is the zero-mean isotropic prior

    p(w) = N(w | 0, \alpha^{-1} I)

so that

    m_N = \beta S_N \Phi^T \mathbf{t}, \qquad S_N^{-1} = \alpha I + \beta \Phi^T \Phi

Examples
(Figures: the posterior over w and sampled regression functions with no data, 1 data point, 2 data points, and 20 data points.)

Bayesian Model Comparison
How does one select an appropriate model? Assume for a minute that we want to compare a set of models M_i, i \in \{1, \dots, L\}, for a dataset D. We could compute

    p(M_i | D) \propto p(D | M_i) \, p(M_i)

Bayes factor: the ratio of the evidence for two models,

    \frac{p(D | M_i)}{p(D | M_j)}

The Mixture Distribution Approach
We could use all the models:

    p(t | x, D) = \sum_{i=1}^{L} p(t | x, M_i, D) \, p(M_i | D)

Or simply go with the most probable / best model.
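To make the Bayesian linear regression update concrete, here is a minimal NumPy sketch of the posterior N(w | m_N, S_N) under the zero-mean isotropic prior from the slides above; the data and the values of alpha and beta are illustrative assumptions, not from the slides.

```python
import numpy as np

def bayes_posterior(Phi, t, alpha, beta):
    """Posterior N(w | m_N, S_N) for the prior p(w) = N(w | 0, alpha^-1 I).

    S_N^-1 = alpha I + beta Phi^T Phi,   m_N = beta S_N Phi^T t
    """
    M = Phi.shape[1]
    S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

# Toy usage: noisy line t = 2x - 1; posterior after 2 and after 20 observations.
rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 20)
t = 2.0 * x - 1.0 + 0.1 * rng.standard_normal(x.size)
Phi = np.column_stack([np.ones_like(x), x])
for n in (2, 20):
    m_N, S_N = bayes_posterior(Phi[:n], t[:n], alpha=2.0, beta=100.0)
    print(n, m_N, np.sqrt(np.diag(S_N)))  # mean tightens toward [-1, 2]
```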