
Robust Regression
Robust Data Mining Techniques
By Boonyakorn Jantaranuson

Outline
● Introduction
○ OLS and important terminology
● Least Median of Squares (LMedS)
● M-estimator
● Penalized least squares

What is Regression?
● Fit a model to observed data
● Minimize the error between the real data and the predicted data
https://en.wikipedia.org/wiki/Regression_analysis

Outliers
● Noise: transmission errors, measurement errors
● Cause problems for the resulting regression model

Robust Regression
● More robust to outliers than ordinary regression
● Outliers are not removed, but they do not strongly affect the model

Problem formulation
● y_i = x_{i1} β_1 + … + x_{ip} β_p + e_i
● y_i is called the response variable
● x_i is called the explanatory variable, with p dimensions
● e_i is the error term
● Goal: find an estimate of each parameter with minimum error

Problem formulation (contd.)
● The estimates of the parameters are called regression coefficients
● The residual r_i is the difference between the real and the predicted value
● Formally, our goal is to find a model that fits the data with the smallest residuals

Ordinary Least Squares (OLS)
● The most common regression model
● Also called sum of least squares or least squares (LS)
● Goal: find the regression coefficients that minimize the sum of squared residuals, min Σ_i r_i²

Problem with OLS
● The regression model is sensitive to outliers

Breakdown point
● A measure of the robustness of a regression method
● The ratio of the smallest number of outliers that causes the regression model to break down to the total number of data points
● E.g. a single outlier already corrupts the OLS result
○ Its breakdown point is 1/n, which tends to 0%
● The highest possible breakdown point is 50%

Leverage points
● Outliers can occur in both the x- and y-directions
● An outlier in the x-direction is called a leverage point
● It normally yields a larger residual than an outlier in the y-direction

Least Median of Squares (LMedS)
● Introduced by Hampel in 1975
● Replace the sum in OLS with the median: min med_i r_i²
● More robust because of the median

LMedS (contd.)
● Can achieve the 50% breakdown point
● Computationally expensive for the exact solution
○ O(n^(p+1) log n) in p dimensions
● Needs an approximation algorithm (a small numerical sketch appears after the M-estimator slides)

LMedS with randomization
● Computes an approximation of LMedS
● Achieves a running time of O(n log² n) in 2-D with high probability, and O(n^(p-1) log n) in p dimensions in the worst case

LMedS with randomization (contd.)
● Goal: maintain an interval of slopes of candidate lines in order to reach the minimum median residual
● In 2-D, each candidate line is described by its slope and intercept
● Every pair of points defines an interval of slopes

LMedS with randomization (contd.)
● In each iteration, n cones are sampled at random from all (n-1)(n-2)/2 possible cones
● The median residual is tested and the interval is shrunk
● Repeat until the residual is small enough, then find the optimal solution from the intersections in the remaining interval

Reweighted Least Squares (RLS)
● A variant of LMedS
● Combines OLS with estimates from LMedS: points with large residuals relative to the LMedS fit are down-weighted
○ S is the scale estimate corresponding to LMedS

RLS (contd.)
From Robust Regression and Outlier Detection by Rousseeuw

M-estimator
● The name M comes from Maximum Likelihood
● Replace the squared residual in OLS with a symmetric, positive semi-definite function ρ: min Σ_i ρ(r_i)

M-estimator (contd.)
● To find the regression coefficients that minimize the objective function, we set its derivative to zero

M-estimator (contd.)
● The M-estimator can also be reduced to other types of regression
○ OLS: ρ(r_i) = r_i²
○ Least absolute deviations (LAD): ρ(r_i) = |r_i|
● LAD yields smaller residuals than OLS, but on high-dimensional data OLS can perform slightly better
○ But LAD still has a 0% breakdown point!
● Challenge: need to choose the right ρ function to get a good result
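As a rough numerical illustration of the contrast between OLS and LMedS described above, the sketch below fits a 2-D line both ways on synthetic data containing a few gross outliers. It approximates the LMedS fit by simple random pair sampling rather than the randomized cone-shrinking algorithm from the slides; the data, sample counts, and variable names are assumptions made only for this example.

# Minimal sketch (assumed setup): OLS vs an approximate LMedS line fit in 2-D.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2x + 1 plus small noise, with 10% of responses corrupted.
x = rng.uniform(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 0.5, 50)
y[:5] += 30

# OLS: minimize the SUM of squared residuals (closed form via least squares).
X = np.column_stack([x, np.ones_like(x)])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Approximate LMedS: minimize the MEDIAN of squared residuals by trying
# lines through randomly sampled pairs of points and keeping the best one.
best_med, beta_lmeds = np.inf, None
for _ in range(500):
    i, j = rng.choice(len(x), 2, replace=False)
    if x[i] == x[j]:
        continue
    slope = (y[i] - y[j]) / (x[i] - x[j])
    intercept = y[i] - slope * x[i]
    med = np.median((y - (slope * x + intercept)) ** 2)
    if med < best_med:
        best_med, beta_lmeds = med, (slope, intercept)

print("OLS   :", beta_ols)      # pulled toward the outliers
print("LMedS :", beta_lmeds)    # close to the true line y = 2x + 1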
Penalized Least Squares
● OLS is equivalent to finding the maximum likelihood estimate (MLE) of the data
● The MLE only looks at the training data, not at prior knowledge => overfitting
● Solution: use the maximum a posteriori (MAP) estimate

Penalized Least Squares
● With a Gaussian (normal) prior on the regression coefficients, computing the MAP estimate is equivalent to min Σ_i r_i² + λ Σ_j β_j²
● Intuitively, it is OLS with a penalty term
● This is called ridge regression or l2 regularization

Penalized Least Squares (contd.)
● Different assumptions on the data and the prior give different types of regularization
From Machine Learning: A Probabilistic Perspective by Murphy

Hard Thresholding (TORRENT)
● TORRENT = Thresholding Operator-based Robust RegrEssioN meThod
● Based on l1-regularized regression
● Iteratively maintains the active set S_t using a hard thresholding operator
○ The active set is the set of points considered clean (not outliers)
● Keeps updating the weights (regression coefficients) until the residual is less than some pre-specified error tolerance

TORRENT (contd.)
From the paper Robust Regression via Hard Thresholding by Bhatia, Jain and Kar

TORRENT (contd.)
● Offers several variants that suit different situations (a simplified sketch appears after the Remarks slide)
● Variants
○ TORRENT-FC: fully corrective least squares; converges faster but each step is expensive
○ TORRENT-GD: uses gradient descent; suitable for high-dimensional data
○ TORRENT-HYB: a hybrid of the two variants above

Self-Scaled Regularized Robust Regression
● Also based on l1-regularized regression
● Incorporates prior knowledge so that the penalty term can scale automatically
○ Prior knowledge, e.g. data occurrence

Conclusion
● OLS is sensitive to outliers
● LMedS has a high breakdown point but is slow
● The M-estimator is flexible, but it is hard to find the right ρ function to make it robust
● Penalized least squares is also robust but requires prior knowledge about the data
○ Sometimes this needs strong assumptions that are not always correct

Remarks
● Older papers tend to focus on a high breakdown point, i.e. trying to reach the 50% breakdown point
● More recent papers are interested in computational speed instead
○ Driven by the effect of high-dimensional data
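To make the TORRENT-FC idea above concrete, here is a minimal sketch of a fully corrective hard-thresholding loop: fit least squares on the current active set, then keep the points with the smallest absolute residuals as the new active set. It is a simplification of the algorithm by Bhatia, Jain and Kar; the fixed iteration count, the keep_frac parameter, and the synthetic data are assumptions for illustration (the actual method stops once the residual on the active set falls below a pre-specified tolerance, with the active-set size set from the assumed corruption fraction).

# Minimal sketch of the TORRENT-FC idea (fully corrective hard thresholding);
# assumed setup, simplified from the algorithm of Bhatia, Jain and Kar.
import numpy as np

def torrent_fc(X, y, keep_frac=0.8, iters=20):
    """Alternate: fit OLS on the active set, then keep the keep_frac fraction
    of points with the smallest absolute residuals as the new active set."""
    n = len(y)
    k = int(keep_frac * n)            # size of the active set S_t
    active = np.arange(n)             # start with all points
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        # Fully corrective step: least squares on the current active set.
        w, *_ = np.linalg.lstsq(X[active], y[active], rcond=None)
        # Hard thresholding: keep the k points best explained by w.
        residuals = np.abs(y - X @ w)
        active = np.argsort(residuals)[:k]
    return w, active

rng = np.random.default_rng(1)
n, p = 200, 3
X = rng.normal(size=(n, p))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=n)
y[:20] += 10.0                        # corrupt 10% of the responses

w_hat, clean = torrent_fc(X, y)
print("recovered w:", np.round(w_hat, 2))   # close to w_true despite outliers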