Robust Methods


Statistics 203: Introduction to Regression and Analysis of Variance
Jonathan Taylor

Today's class
■ Weighted regression.
■ Robust methods.
■ Robust regression.

Heteroskedasticity
■ In our standard model we have assumed that
    \varepsilon \sim N(0, \sigma^2 I),
  that is, that the errors are independent and have the same variance (homoskedastic).
■ We have discussed graphical checks for non-constant variance (heteroskedasticity), but not "remedies" for heteroskedasticity.
■ Suppose instead that
    \varepsilon \sim N(0, \sigma^2 D)
  for some known diagonal matrix D.
■ Where does D come from? If we see that the variance increases like f(X_j), we might choose D_i = f(X_{ij}).
■ What is the "maximum likelihood" thing to do?

MLE for one sample problem
■ Consider the simpler problem
    Y_i \sim N(\mu, \sigma^2 D_i),
  with \sigma^2 and the D_i known.
■ Minus twice the log-likelihood is
    -2\log L(\mu \mid Y, \sigma) = \sum_{i=1}^n \frac{(Y_i - \mu)^2}{\sigma^2 D_i} + n\log(2\pi\sigma^2) + \sum_{i=1}^n \log(D_i).
■ Differentiating,
    -2\sum_{i=1}^n \frac{Y_i - \mu}{\sigma^2 D_i} = 0,
  implying
    \hat\mu = \left(\sum_{i=1}^n \frac{Y_i}{D_i}\right) \Big/ \left(\sum_{i=1}^n \frac{1}{D_i}\right).
■ Observations are weighted in inverse proportion to their variance.

Weighted least squares
■ For the regression model,
    -2\log L(\beta, \sigma \mid Y, D) = \sum_{i=1}^n \frac{\left(Y_i - \beta_0 - \sum_{j=1}^{p-1}\beta_j X_{ij}\right)^2}{\sigma^2 D_i}
      = \frac{1}{\sigma^2}(Y - X\beta)^t D^{-1}(Y - X\beta)
      = \frac{1}{\sigma^2}(Y - X\beta)^t W (Y - X\beta),
  with W = D^{-1}.
■ Normal equations:
    -2 X^t W (Y - X\hat\beta_W) = 0,
  or
    \hat\beta_W = (X^t W X)^{-1} X^t W Y.
■ Distribution of \hat\beta_W:
    \hat\beta_W \sim N\!\left(\beta, \sigma^2 (X^t W X)^{-1}\right).
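The weighted least squares fit is easy to compute directly. The following R sketch (the simulated heteroskedastic data and variable names are illustrative, not from the slides) computes \hat\beta_W both from the closed-form normal equations and with lm()'s weights argument, which interprets weights as inverse variances up to \sigma^2.

```r
## Weighted least squares sketch: errors have variance sigma^2 * D_i, so weights W_i = 1 / D_i.
set.seed(1)
n <- 100
x <- runif(n, 1, 10)
D <- x                              # assume the variance grows like f(X) = X
y <- 2 + 3 * x + rnorm(n, sd = sqrt(2 * D))

X <- cbind(1, x)                    # design matrix with intercept
W <- diag(1 / D)                    # weight matrix W = D^{-1}

## Closed form: beta_W = (X^t W X)^{-1} X^t W y
beta_W <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)

## Same fit via lm(): 'weights' are proportional to the inverse error variances
fit_w <- lm(y ~ x, weights = 1 / D)

cbind(closed_form = beta_W, lm_weights = coef(fit_w))
```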
Estimating σ²
■ What are the right residuals?
■ If we knew β exactly, then
    Y_i - \beta_0 - \sum_{j=1}^{p-1} X_{ij}\beta_j \sim N(0, \sigma^2 / W_i).
■ This suggests that the natural residual is
    e_{W,i} = \sqrt{W_i}\,(Y_i - \hat{Y}_{W,i}) = \sqrt{W_i}\, e_i,
  where
    \hat{Y}_W = X (X^t W X)^{-1} X^t W Y.
■ Estimate of σ²:
    \hat\sigma_W^2 = \frac{1}{n-p}\sum_{i=1}^n e_{W,i}^2 = \frac{1}{n-p}\sum_{i=1}^n w_i e_i^2 \sim \sigma^2 \,\frac{\chi^2_{n-p}}{n-p}.

Weighted regression example
[Figure: weighted regression example; no further text on this slide.]

Robust methods
■ We also discussed outlier detection, but no specific remedies.
■ One alternative is to discard potential outliers – not always a good idea.
■ Outliers can really mess up the sample mean, but have relatively little effect on the sample median.
■ Could also "downweight" outliers: this is the basis of robust techniques.
■ Another "cause" of outliers may be that the data are not really normally distributed.

Example
■ Suppose that we have a sample (Y_i)_{1 \le i \le n} from
    f(y \mid \mu, \sigma) = \frac{1}{2\sigma} e^{-|y - \mu|/\sigma}.
  This has heavier tails than the normal distribution.
■ MLE for μ:
    \hat\mu = \operatorname{argmin}_{\mu} \sum_{i=1}^n |Y_i - \mu|.
■ It can be shown that \hat\mu is the sample median (exercise in STATS 116).
■ Take-home message: if the errors are not really normally distributed, then least squares is not the MLE, and the MLE downweights large residuals relative to least squares.

M-estimators
■ Depending on the error distribution of ε (assuming i.i.d. errors), we get different optimization problems.
■ Suppose that
    f(y \mid \mu, s) \propto e^{-\rho((y-\mu)/s)}.
■ MLE:
    \hat\mu = \operatorname{argmin}_{\mu} \sum_{i=1}^n \rho\!\left(\frac{Y_i - \mu}{s}\right).
■ For every ρ we get a different estimator: an M-estimator. Let \psi = \rho'.
■ This generalizes easily to the regression problem:
    \hat\beta = \operatorname{argmin}_{\beta} \sum_{i=1}^n \rho\!\left(\frac{Y_i - \beta_0 - \sum_{j=1}^{p-1} X_{ij}\beta_j}{s}\right).

Huber's
[Figure: Huber's ψ-function, plotted as psi.huber(x) * x for -4 ≤ x ≤ 4.]

Hampel's
[Figure: Hampel's ψ-function, plotted as psi.hampel(x) * x for -10 ≤ x ≤ 10.]

Tukey's
[Figure: Tukey's bisquare ψ-function, plotted as psi.bisquare(x) * x for -4 ≤ x ≤ 4.]
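To make the ψ-functions concrete, here is a small R sketch of Huber's ψ and the weight function w(u) = ψ(u)/u it implies; the tuning constant c = 1.345 is the conventional choice, not a value stated on the slides. (The functions plotted above, psi.huber, psi.hampel and psi.bisquare, are from the MASS package.)

```r
## Huber's psi: linear near zero, constant beyond +/- c, so large residuals are capped.
huber_psi <- function(u, c = 1.345) pmax(-c, pmin(c, u))

## Implied weight w(u) = psi(u)/u: equals 1 for small |u| and decays like c/|u| for large |u|.
huber_weight <- function(u, c = 1.345) ifelse(abs(u) < c, 1, c / abs(u))

u <- seq(-4, 4, length.out = 9)
cbind(u = u, psi = huber_psi(u), weight = huber_weight(u))
```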
Solving for β̂
■ Assume for now that s is known in the M-estimator.
■ Normal equations:
    \sum_{i=1}^n \psi\!\left(\frac{Y_i - \sum_{j=0}^{p-1} X_{ij}\hat\beta_j}{s}\right) X_{ij} = 0, \qquad 0 \le j \le p-1,
  where \psi = \rho'.
■ Set
    W_i = \frac{\psi\!\left(\left(Y_i - \sum_{j=0}^{p-1} X_{ij}\hat\beta_j\right)/s\right)}{Y_i - \sum_{j=0}^{p-1} X_{ij}\hat\beta_j}.
■ Then
    \sum_{i=1}^n X_{ij} W_i \left(Y_i - \sum_{j=0}^{p-1} X_{ij}\hat\beta_j\right) = 0, \qquad 0 \le j \le p-1,
  or
    X^t W X \hat\beta - X^t W Y = 0.

Iteratively reweighted least squares (IRLS)
■ Although we are trying to estimate β above, given an initial estimate \hat\beta^0 we can compute initial weights
    W_i^0 = \frac{\psi\!\left(\left(Y_i - \sum_{j=0}^{p-1} X_{ij}\hat\beta_j^0\right)/s\right)}{Y_i - \sum_{j=0}^{p-1} X_{ij}\hat\beta_j^0}
  and solve for
    \hat\beta^1 = (X^t W^0 X)^{-1} X^t W^0 Y.
■ Now we can recompute the weights and re-estimate β.
■ In general, given weights W^j we can solve
    \hat\beta^{j+1} = (X^t W^j X)^{-1} X^t W^j Y.
■ This is very similar to a Newton-Raphson technique.
■ It is used over and over again in statistics (see the sketch at the end of this section).

Robust estimate of scale
■ In general s needs to be estimated. A popular estimate is
    \hat{s} = \mathrm{MAD}(e_1, \ldots, e_n) / 0.6745,
  where
    \mathrm{MAD}(e_1, \ldots, e_n) = \mathrm{Median}\,|e_i - \mathrm{Median}(e_1, \ldots, e_n)|.
■ Scale can be estimated at each stage of the IRLS procedure based on the current residuals, or based on some "very resistant" fit (LMS or LTS below).
■ The constant 0.6745 is chosen so that \hat{s} is asymptotically unbiased for σ if the e_i's are N(0, σ²).

Other resistant fitting methods
■ If ψ is not redescending (i.e. does not return to 0), then robust regression is still susceptible to outliers.
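Below is a minimal IRLS sketch in R, assuming Huber's ψ and re-estimating the scale from the MAD of the current residuals at each step; the data, tuning constant and convergence tolerance are illustrative. MASS::rlm implements a production version of this kind of iteration.

```r
## Iteratively reweighted least squares with Huber weights and a MAD-based scale estimate.
irls_huber <- function(X, y, c = 1.345, maxit = 50, tol = 1e-8) {
  beta <- solve(t(X) %*% X, t(X) %*% y)          # start from the ordinary least squares fit
  for (it in 1:maxit) {
    r <- as.vector(y - X %*% beta)               # current residuals
    s <- mad(r)                                  # median absolute deviation / 0.6745 (R's default constant)
    u <- r / s
    w <- ifelse(abs(u) < c, 1, c / abs(u))       # Huber weights psi(u)/u
    W <- diag(w)
    beta_new <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)   # one weighted least squares step
    if (max(abs(beta_new - beta)) < tol) { beta <- beta_new; break }
    beta <- beta_new
  }
  drop(beta)
}

## Example with a few gross outliers in y
set.seed(2)
n <- 100
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n)
y[1:5] <- y[1:5] + 25                            # contaminate a few observations
X <- cbind(1, x)

rbind(ols  = drop(solve(t(X) %*% X, t(X) %*% y)),
      irls = irls_huber(X, y))
```

The robust fit stays close to the true coefficients (1, 2), while the ordinary least squares fit is pulled by the contaminated points.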