Robust Regression

Robust Data Mining Techniques
By Boonyakorn Jantaranuson

Outline
● Introduction
○ OLS and important terminology
● Least Median of Squares (LMedS)
● M-estimator
● Penalized least squares

What is Regression?
● Fit a model to observed data
● Minimize the error between the observed data and the predicted values
https://en.wikipedia.org/wiki/Regression_analysis

Outliers
● Caused by noise, e.g. transmission errors or measurement errors
● Cause problems for the resulting regression model

Robust Regression
● More robust to outliers than ordinary regression
● Outliers are not removed, but they do not strongly affect the model

Problem formulation
● $y_i = x_i^\top \theta + e_i = x_{i1}\theta_1 + \dots + x_{ip}\theta_p + e_i$, for $i = 1, \dots, n$
● $y_i$ is called the response variable
● $x_i = (x_{i1}, \dots, x_{ip})$ is called the explanatory variable, with p dimensions
● $e_i$ is the error term
● Goal: find an estimate of each parameter $\theta_j$ with minimum error

Problem formulation (contd.)
● The estimates $\hat\theta_j$ of the parameters are called regression coefficients
● The residual $r_i = y_i - x_i^\top \hat\theta$ is the difference between the observed and the predicted value
● Formally, the goal is to find a model that fits the data with the smallest residuals

Ordinary Least Squares (OLS)
● Most common regression method
● Also called least sum of squares, or simply least squares (LS)
● Goal: find the regression coefficients that minimize the sum of squared residuals, $\min_{\hat\theta} \sum_{i=1}^{n} r_i^2$

Problem with OLS
● The fitted model is sensitive to outliers: a single outlying point can pull the fit far away from the rest of the data

Breakdown point
● A measure of the robustness of a regression method
● The ratio of the smallest number of outliers that can make the regression model break down to the total number of data points
● E.g. a single outlier can already corrupt the OLS result
○ Its breakdown point is 1/n, which tends to 0% as n grows
● The highest possible breakdown point is 50%

Leverage points
● Outliers can occur in both the x- and the y-direction
● Outliers in the x-direction are called leverage points
● A leverage point normally distorts the fit more than an outlier in the y-direction

Least Median of Squares (LMedS)
● Introduced by Hampel in 1975
● Replaces the sum in OLS with the median: $\min_{\hat\theta} \operatorname{med}_i r_i^2$
● More robust, because the median of the squared residuals is not driven by a few very large residuals

LMedS (contd.)
● Can achieve a 50% breakdown point
● Computing the exact solution is expensive
○ $O(n^{p+1} \log n)$ in p dimensions
● Approximation algorithms are needed in practice

LMedS with randomization
● Computes an approximation of LMedS
● Achieves a running time of $O(n \log^2 n)$ in 2-D with high probability, and $O(n^{p-1} \log n)$ in p dimensions in the worst case

LMedS with randomization (contd.)
● Goal: maintain an interval of line slopes that contains the line with the minimum median residual
● The set of lines is defined by
● The interval of slopes (w.r.t. 2 points) is

LMedS with randomization (contd.)
● In each iteration, n cones are sampled at random from all (n-1)(n-2)/2 possible cones
● The median residual is tested and the interval is shrunk
● Repeat until the residual is small enough, then find the optimal solution among the intersections in the remaining interval

Reweighted Least Squares (RLS)
● A variant of LMedS
● Combines OLS with estimates from LMedS
○ S is the scale estimate corresponding to LMedS

RLS (contd.)
From Robust Regression and Outlier Detection by Rousseeuw and Leroy

M-estimator
● The name M comes from "maximum likelihood type" estimator
● Replaces the squared residual in OLS with a symmetric, positive semi-definite function ρ: $\min_{\hat\theta} \sum_{i=1}^{n} \rho(r_i)$

M-estimator (contd.)
● To find the regression coefficients that minimize the objective function, set its derivative to zero: with $\psi = \rho'$, solve $\sum_{i=1}^{n} \psi(r_i)\, x_i = 0$

M-estimator (contd.)
● Other types of regression are special cases of the M-estimator
○ OLS: $\rho(r_i) = r_i^2$
○ Least absolute deviations (LAD): $\rho(r_i) = |r_i|$
● LAD is less influenced by large residuals than OLS, but OLS can perform slightly better on high-dimensional data
○ Both still have a 0% breakdown point!
● Challenge: choosing the right ρ function to get a good result

Penalized Least Squares
● OLS is equivalent to finding the maximum likelihood estimate (MLE) of the coefficients under Gaussian noise
● MLE uses only the training data and ignores prior knowledge, which can lead to overfitting
● Solution: use the maximum a posteriori (MAP) estimate

Penalized Least Squares (contd.)
● With Gaussian noise and a Gaussian (normal) prior on the coefficients, computing the MAP estimate is equivalent to $\min_{\hat\theta} \sum_{i=1}^{n} (y_i - x_i^\top \hat\theta)^2 + \lambda \|\hat\theta\|_2^2$
● Intuitively, this is OLS with a penalty term
● This is called ridge regression or l2 regularization

Penalized Least Squares (contd.)
● Different assumptions on the data and the prior give different types of regularization
From Machine Learning: A Probabilistic Perspective by Murphy

Hard Thresholding (TORRENT)
● TORRENT = Thresholding Operator-based Robust RegrEssioN meThod
● Treats the outliers as a sparse set of corrupted responses and removes their influence with a hard thresholding operator rather than an l1 penalty
● Iteratively maintains the active set $S_t$ using the hard thresholding operator
○ The active set is the set of points believed to be clean (not outliers)
● Keeps updating the weights (regression coefficients) until the residual falls below a pre-specified error tolerance

TORRENT (contd.)
From the paper Robust Regression via Hard Thresholding by Bhatia, Jain and Kar

TORRENT (contd.)
● Offers several variants suited to different situations
● Variants
○ TORRENT-FC: fully corrective least squares; converges faster, but each step is expensive
○ TORRENT-GD: uses gradient descent; suitable for high-dimensional data
○ TORRENT-HYB: a hybrid of the above variants

Self-Scaled Regularized Robust Regression
● Based on l1 regularized regression
● Incorporates prior knowledge so that the penalty term scales automatically
○ e.g. a prior on data occurrence

Conclusion
● OLS is sensitive to outliers
● LMedS has a high breakdown point but is slow
● M-estimators are flexible, but it is hard to find the right ρ function to make them robust
● Penalized least squares is also robust but requires prior knowledge about the data
○ It sometimes needs strong assumptions that are not always correct

Remarks
● Older papers focus more on a high breakdown point, i.e. they try to reach a 50% breakdown point
● More recent papers are more interested in computational speed
○ This is driven by the growth of high-dimensional data
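The sketch below makes the "Problem with OLS" and "Breakdown point" slides concrete. It fits a straight line $y = a + bx$ with the closed-form OLS solution, using only the Python standard library. The helper name `ols_line` and the toy data are illustrative assumptions and do not come from the slides.

```python
from statistics import fmean


def ols_line(xs, ys):
    """Closed-form OLS fit of y = a + b*x; returns (a, b) minimizing sum of r_i^2."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b


xs = [float(i) for i in range(10)]
clean = [2.0 * x + 1.0 for x in xs]    # points exactly on y = 1 + 2x
corrupted = clean[:-1] + [1000.0]      # replace a single response with an outlier

print(ols_line(xs, clean))       # (1.0, 2.0)
print(ols_line(xs, corrupted))   # slope is pulled far away from 2
```

Every residual enters the objective squared. Moving one response far enough can therefore move the fitted slope arbitrarily far, which is why the breakdown point of OLS is 1/n.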
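The next sketch illustrates the LMedS and RLS slides for a simple line fit. It approximates LMedS by trying lines through randomly chosen pairs of points. This is a simpler random-sampling heuristic and is not the interval-shrinking randomized algorithm described on the slides. The RLS step then gives weight 1 to points whose residual is within 2.5 times a robust scale estimate and weight 0 otherwise, and refits OLS on the weight-1 points. The cutoff 2.5 and the scale factor $1.4826\,(1 + 5/(n-p))$ are assumed to follow the usual convention in Rousseeuw and Leroy's book. All function names are hypothetical.

```python
import math
import random
from statistics import fmean, median


def ols_line(xs, ys):
    """Closed-form OLS fit of y = a + b*x; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def lmeds_line(xs, ys, trials=500, seed=0):
    """Approximate LMedS: try lines through random point pairs and keep the one
    with the smallest median squared residual (random-sampling heuristic)."""
    rng = random.Random(seed)
    best, best_med = None, math.inf
    for _ in range(trials):
        i, j = rng.sample(range(len(xs)), 2)
        if xs[i] == xs[j]:
            continue  # vertical pair: slope undefined, skip it
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        med = median((y - a - b * x) ** 2 for x, y in zip(xs, ys))
        if med < best_med:
            best, best_med = (a, b), med
    return best, best_med


def rls_line(xs, ys, cutoff=2.5, **kwargs):
    """Reweighted LS: weight 1 if |r_i| <= cutoff * s, else 0; refit OLS on weight-1 points."""
    (a, b), med = lmeds_line(xs, ys, **kwargs)
    n, p = len(xs), 2  # p = number of coefficients (intercept and slope)
    # Assumed LMedS scale estimate with a finite-sample correction (Rousseeuw and Leroy)
    s = 1.4826 * (1 + 5 / (n - p)) * math.sqrt(med)
    keep = [k for k in range(n) if abs(ys[k] - a - b * xs[k]) <= cutoff * s]
    return ols_line([xs[k] for k in keep], [ys[k] for k in keep])


xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
for k in (3, 7, 11, 15, 19):  # corrupt 25% of the responses
    ys[k] = 100.0

print(ols_line(xs, ys))  # pulled toward the outliers
print(rls_line(xs, ys))  # close to (1.0, 2.0)
```

Multiplying by s instead of dividing the residual by it avoids a division by zero when more than half of the points lie exactly on the LMedS line.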
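The M-estimator slides do not commit to a particular ρ. As an example, the sketch below uses Huber's ρ, which is quadratic for small residuals and linear for large ones. It solves $\sum_i \psi(r_i)\, x_i = 0$ by iteratively reweighted least squares (IRLS), rewriting each step as a weighted LS problem with $w_i = \psi(r_i)/r_i$. The choice of Huber's ρ, the tuning constant 1.345 and the MAD-based scale are assumptions (common defaults), and the function names are hypothetical.

```python
from statistics import median


def wls_line(xs, ys, ws):
    """Closed-form weighted least squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def huber_line(xs, ys, k=1.345, max_iter=100, tol=1e-10):
    """M-estimate of a line with Huber's rho, computed by IRLS."""
    a, b = wls_line(xs, ys, [1.0] * len(xs))  # start from the OLS fit
    for _ in range(max_iter):
        rs = [y - a - b * x for x, y in zip(xs, ys)]
        # Robust scale: median absolute deviation, rescaled to estimate sigma for Gaussian noise
        m = median(rs)
        scale = median(abs(r - m) for r in rs) / 0.6745
        if scale == 0:
            break  # at least half of the points are fitted exactly
        c = k * scale
        # Huber weights w_i = psi(r_i) / r_i: 1 inside [-c, c], c / |r_i| outside
        ws = [1.0 if abs(r) <= c else c / abs(r) for r in rs]
        a_new, b_new = wls_line(xs, ys, ws)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 + (0.1 if x % 2 == 0 else -0.1) for x in xs]
ys[19] += 50.0  # one gross outlier in the y-direction

print(wls_line(xs, ys, [1.0] * len(xs)))  # plain OLS: slope pulled upward
print(huber_line(xs, ys))                 # slope close to 2
```

A large residual contributes only ψ = ±c to the estimating equation, so it cannot drag the fit arbitrarily far in the y-direction. A leverage point can still do so through $x_i$, which matches the 0% breakdown point noted for LAD and OLS.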
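For the "Penalized Least Squares" slides, here is a worked derivation of why the MAP estimate reduces to ridge regression. It assumes i.i.d. Gaussian noise with variance $\sigma^2$ and a zero-mean Gaussian prior with variance $\tau^2$ on each coefficient; the symbols $\sigma$ and $\tau$ are introduced here.

```latex
% Assumptions: y_i = x_i^\top\theta + e_i, e_i ~ N(0, \sigma^2) i.i.d., prior \theta ~ N(0, \tau^2 I)
\begin{aligned}
\hat\theta_{\mathrm{MAP}}
  &= \arg\max_{\theta}\; \Big[\sum_{i=1}^{n} \log \mathcal{N}(y_i \mid x_i^\top\theta, \sigma^2)
     + \log \mathcal{N}(\theta \mid 0, \tau^2 I)\Big] \\
  &= \arg\max_{\theta}\; \Big[-\frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - x_i^\top\theta)^2
     - \frac{1}{2\tau^2}\|\theta\|_2^2\Big] \\
  &= \arg\min_{\theta}\; \sum_{i=1}^{n} (y_i - x_i^\top\theta)^2 + \lambda\|\theta\|_2^2,
  \qquad \lambda = \frac{\sigma^2}{\tau^2}.
\end{aligned}
```

Letting $\tau \to \infty$ (a flat prior) gives $\lambda \to 0$ and recovers OLS, i.e. the MLE.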
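The TORRENT slides describe alternating between a least-squares update and a hard thresholding step. The sketch below is a simplified reading of the fully corrective variant (TORRENT-FC) for a straight line, not the authors' implementation. Each round refits OLS on the current active set, then keeps the n − k points with the smallest absolute residuals. It stops when the residual on the active set is below a tolerance or the active set stops changing. The parameter k (the assumed number of corrupted points), the tolerance and the function names are hypothetical.

```python
import math
from statistics import fmean


def ols_line(xs, ys):
    """Closed-form OLS fit of y = a + b*x; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def torrent_fc_line(xs, ys, k, tol=1e-9, max_iter=100):
    """TORRENT-FC style sketch: fully corrective OLS on the active set, then
    hard thresholding to the n - k smallest residuals (k = assumed #outliers)."""
    n = len(xs)
    active = set(range(n))  # initially every point is treated as clean
    for _ in range(max_iter):
        idx = sorted(active)
        a, b = ols_line([xs[i] for i in idx], [ys[i] for i in idx])
        rs = [abs(y - a - b * x) for x, y in zip(xs, ys)]
        # Hard thresholding operator: indices of the n - k smallest residuals
        new_active = set(sorted(range(n), key=lambda i: rs[i])[: n - k])
        small = math.sqrt(sum(rs[i] ** 2 for i in new_active)) <= tol
        stop = small or new_active == active
        active = new_active
        if stop:
            break
    return (a, b), sorted(active)


xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
for i in (3, 7, 11, 15, 19):  # corrupt 5 of the 20 responses
    ys[i] = 100.0

fit, clean = torrent_fc_line(xs, ys, k=5)
print(fit)    # close to (1.0, 2.0)
print(clean)  # indices kept in the active set
```

Unlike LMedS, each round costs only one least-squares fit plus a sort. This reflects the shift toward computational speed mentioned in the remarks.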
