
ALGORITHMS FOR ROBUST LINEAR REGRESSION BY EXPLOITING THE CONNECTION TO SPARSE SIGNAL RECOVERY

Yuzhe Jin and Bhaskar D. Rao

Department of Electrical and Computer Engineering, University of California at San Diego, La Jolla, CA 92093-0407, USA {yujin, brao}@ucsd.edu

ABSTRACT

In this paper, we develop algorithms for robust linear regression by leveraging the connection between the problems of robust regression and sparse signal recovery. We explicitly model the measurement noise as a combination of two terms: the first term accounts for regular measurement noise, modeled as zero mean Gaussian noise, and the second term captures the impact of outliers. The fact that the latter component is indeed a sparse vector provides the opportunity to adapt sparse signal reconstruction methods to solve the problem of robust regression. Maximum a posteriori (MAP) based and empirical Bayesian inference based algorithms are developed for this purpose. Experimental studies on simulated and real data sets are presented to demonstrate the effectiveness of the proposed algorithms.

Index Terms— robust linear regression, sparse signal recovery, outlier detection, MAP, sparse Bayesian learning

1. INTRODUCTION

Consider the linear regression problem with the model

yi = a^T xi + ei,  for i = 1, 2, ..., M,   (1)

where xi ∈ R^L is usually termed the explanatory variable, yi is the response variable, a ∈ R^L is the vector of regression coefficients, L is the model order, and ei is the measurement noise in the ith response. Model (1) can be compactly represented by

y = Xa + e,   (2)

where X = [x1, x2, ..., xM]^T, e = [e1, e2, ..., eM]^T ∈ R^M, and y = [y1, y2, ..., yM]^T ∈ R^M.
The goal is to determine the regression coefficients a, and this is often achieved by using a suitable optimization criterion. This problem has many important applications in science and engineering. An important factor that makes this problem interesting and challenging is that the response variable y may contain outliers. The popular ordinary least squares (LS) criterion is sensitive to outliers, and hence robust regression methods are of interest. Numerous approaches for robust regression have been developed [1][2][3] with the goal of extracting the model parameters reliably in the presence of outliers. (This research was supported by NSF Grants IIS-0613595 and CCF-0830612.)

1.1. Background

As a popular technique, ordinary LS estimation determines the model parameters a by minimizing Σ_{i=1}^M ẽi², where ẽi = yi − a^T xi is the fitting error. This criterion is sensitive to outliers and hence not robust. Many existing methods for robust regression follow the idea that one should de-emphasize the impact of samples with large deviation in order to obtain robustness. For example, the method of Least Absolute Value (LAV) [4] is probably a well-known representative of this kind. This method minimizes Σ_{i=1}^M |ẽi|, which can be equivalently viewed as imposing a Laplacian distribution on the measurement noise ei. Alternatively, the family of M-estimates [1] considers flexible weighting schemes on the fitting error ẽi. The weighting functions used in M-estimates aim to de-emphasize samples with large deviation, and they can also be related to certain probability densities imposed on the measurement noise. By assuming that the measurement noise ei is drawn from the Student's t-distribution [5], the impact of extreme errors is also effectively downscaled. Further, Gaussian mixture models have also been employed in robust regression, wherein samples of the measurement noise ei are assumed to be i.i.d. and drawn from a mixture of two Gaussians, with one component accounting for regular noise and the other for outliers [6].

In addition, robust procedures that aim to explicitly remove the impact of extreme errors have also been developed. For instance, the method of Least Trimmed Squares (LTS) [7] employs an optimization criterion that minimizes only the portion of the squared fitting errors with smallest magnitudes. The essence of this method can be viewed as roughly detecting outliers and removing their impact at the data fitting stage. This idea can be generalized to various outlier diagnosis techniques [7]. For an extensive survey of previous work on robust regression and outlier detection, interested readers are referred to [1][3][7] and the references therein.

It is interesting to note that the ideas underlying various robust regression methods indeed have counterparts in the context of the sparse signal recovery problem, which has recently received much attention in many application domains [8][9]. The heavy-tailed, outlier-tolerating priors imposed on the measurement noise correspond to the sparsity-inducing distributions in sparse signal recovery. The Laplacian distribution in the LAV method and its use in the corresponding ℓ1-norm minimization based sparse signal recovery algorithms serves as an excellent example of this kind. As another example, the LTS method exhibits ingredients very similar to the thresholding methods used for finding sparse solutions. Our work examines this connection more deeply. Intuitively, this connection is made possible by the fact that outliers are events that occur infrequently and are thus sparse. To make the connection more explicit, in Section 2.1 we develop a two component model for the additive noise and reformulate the regression problem such that the usefulness of sparse recovery methods is evident. Using this new formulation and connection, in Section 2.2 we develop algorithms for robust regression which are based on sparse signal recovery methods. They are then evaluated in Section 3 and demonstrated to be effective in combating the negative impact of outliers.

2. TWO COMPONENT MODEL AND SPARSE SIGNAL RECOVERY ALGORITHMS FOR ROBUST REGRESSION

2.1. The two component model

We leverage the fact that outliers occur infrequently and hence are sparse. Unfortunately, the model (1) leaves us little opportunity to take advantage of this observation, since a single measurement noise term ei deals with both the impact of outliers and regular noise. To explicitly make use of the sparsity of outliers, we suggest an alternative model obtained by splitting ei into two independent additive components, namely wi and ϵi, as follows,

yi = a^T xi + wi + ϵi,  for i = 1, 2, ..., M.   (3)

The interpretations of xi, yi and a are carried over from (1). If a response yi is not an outlier, then the corresponding wi is assumed to be zero. If yi is an outlier, then wi can be viewed as the anomalous error in yi, such that (yi − wi) appears to be a response contaminated only by regular noise. The term ϵi, on the other hand, contains the regular measurement noise in response yi, and it is modeled as i.i.d. zero mean Gaussian noise, i.e. ϵi ∼ N(0, σ²). Compactly, model (3) can be represented by

y = Xa + w + ϵ = [X, I] [a^T, w^T]^T + ϵ,   (4)

where, in addition to (2), w = [w1, w2, ..., wM]^T and ϵ = [ϵ1, ϵ2, ..., ϵM]^T. By definition, w is a sparse vector, which means the number of nonzero entries of w is (much) smaller than the length of w. As we shall see, model (4) enables the opportunity to adapt sparse signal recovery methods to robust regression.
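To make the reformulation concrete, the short Python sketch below (ours, not from the paper; the sizes, noise level, and outlier magnitudes are arbitrary illustrative choices) generates data according to (3) and forms the augmented design matrix [X, I] of (4).

    import numpy as np

    rng = np.random.default_rng(0)
    M, L = 100, 5                          # number of responses and model order (illustrative)
    X = rng.standard_normal((M, L))        # explanatory variables
    a_true = rng.standard_normal(L)        # regression coefficients

    sigma = 0.1                            # regular noise: eps_i ~ N(0, sigma^2)
    eps = sigma * rng.standard_normal(M)

    w_true = np.zeros(M)                   # sparse outlier component: most entries are zero
    idx = rng.choice(M, size=5, replace=False)
    w_true[idx] = 20.0 * rng.choice([-1.0, 1.0], size=5)

    y = X @ a_true + w_true + eps          # model (3)
    X_aug = np.hstack([X, np.eye(M)])      # augmented dictionary [X, I] in (4)

Any recovery method that promotes sparsity only in the w-block of the augmented coefficient vector can then be applied to (4).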
2.2. Sparse signal recovery algorithms for robust linear regression

Since w is sparse, one can utilize ideas from sparse signal recovery to develop robust linear regression methods. In this work, we consider Bayesian methods, namely MAP techniques and empirical Bayesian methods. For the MAP methods, we assume a super-Gaussian prior for w to encourage sparsity; they are discussed next.

2.2.1. Maximum a posteriori (MAP) based robust regression

To simultaneously estimate the regression coefficients a and the outliers w, we propose to solve the following optimization problem,

â, ŵ = arg min_{a,w} ∥y − Xa − w∥₂² + λ∥w∥_p^p,   (5)

where ∥w∥_p = (Σ_{i=1}^M |wi|^p)^{1/p}, 0 < p ≤ 1, and λ is a regularization parameter. This approach can be viewed as MAP estimation with a super-Gaussian prior distribution P(wi) ∝ exp{−λ|wi|^p}, which encourages a sparse w to be recovered. Another closely related algorithm can be immediately obtained as follows,

â, ŵ = arg min_{a,w} ∥w∥_p,  s.t.  ∥y − Xa − w∥₂ ≤ ζ,   (6)

where ζ is a regularization parameter. Note that for p = 1 these two algorithms are variants of well-known sparse signal recovery methods, namely the Lasso [10] (or Basis Pursuit denoising [8]) and the ℓ1-regularization problem [11]. By estimating w, these algorithms determine how each observation is contaminated. In contrast, the LAV method, which can be obtained by letting ζ → 0 in (6), assumes a Laplacian prior on the total noise and minimizes the sum of the absolute values of the fitting errors; as a result, it is not able to clarify the underlying mechanism of noise contamination.

Regarding solving the above optimization problems, both (5) and (6) become convex when p = 1, and (5) with p = 1 will be considered in the simulation study. Motivated by the analysis in [8, Sec. 5.2] and our experience, we choose the regularization parameter λ = σ̃ √(2 log M) / 3, where σ̃ is a proper estimate of scale.¹ Procedures can be developed for other choices of p [12].

¹In our experiments, the Least Absolute Value method is employed to obtain σ̃. Other robust techniques for scale estimation could also be used.
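For the convex case p = 1, problem (5) splits conveniently: with w fixed, a is an ordinary LS fit to (y − w); with a fixed, each wi is a soft-thresholded residual with threshold λ/2. The Python sketch below (ours, not the authors' implementation) illustrates this block-coordinate descent; the MAD-based scale estimate is only a stand-in for the LAV-based σ̃ described in footnote 1.

    import numpy as np

    def soft_threshold(r, t):
        """Entrywise soft-thresholding operator."""
        return np.sign(r) * np.maximum(np.abs(r) - t, 0.0)

    def map_robust_regression(X, y, lam=None, n_iter=100):
        """Block-coordinate descent for (5) with p = 1 (illustrative sketch)."""
        M = len(y)
        if lam is None:
            # Stand-in for lam = sigma_tilde * sqrt(2 log M) / 3, using a MAD-based
            # scale estimate of the LS residuals instead of the LAV-based one.
            r0 = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
            sigma_tilde = 1.4826 * np.median(np.abs(r0 - np.median(r0)))
            lam = sigma_tilde * np.sqrt(2.0 * np.log(M)) / 3.0
        w = np.zeros(M)
        for _ in range(n_iter):
            a = np.linalg.lstsq(X, y - w, rcond=None)[0]    # LS step on outlier-corrected data
            w = soft_threshold(y - X @ a, lam / 2.0)        # sparse outlier update
        return a, w

In practice the loop would be stopped once the change in (a, w) falls below a tolerance.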

2.2.2. Empirical Bayesian inference based robust regression

This method adopts the empirical Bayesian approach for robust regression. In particular, we utilize the sparse Bayesian learning methodology developed in [13][14]. To this end, it is assumed that wi is a random variable with prior distribution wi ∼ N(0, γi), where γi is a hyperparameter that controls the variance of each wi and has to be learnt. If γi = 0, the corresponding wi will be zero, resulting in no anomalous error being added to observation yi. If γi > 0, an anomalous noise term whose magnitude depends on γi will contaminate yi, resulting in an outlier in the measurement.

To estimate the regression coefficients, we jointly find

â, γ̂, σ̂² = arg max_{a,γ,σ²} P(y | X, a, γ, σ²),   (7)

where γ ≜ {γ1, γ2, ..., γM}. Then w can be estimated by the posterior mean, i.e.

ŵ = E[w | X, y, â, γ̂, σ̂²].   (8)

Note that the essence of this method is that the robust regression problem is cast into the framework of sparse Bayesian learning (SBL) with appropriate modifications, and this is made possible by our proposed two component noise modeling technique. The algorithm development, analysis, and experimental study of the original SBL for sparse signal recovery have been extensively discussed in [13][14][15]. Interested readers are referred to these references for more detail.
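For orientation, the updates (9) and (10) below follow from standard Gaussian conditioning on the residual r = y − Xa, with w ∼ N(0, Γ) and ϵ ∼ N(0, σ²I). A short derivation (ours, added for clarity, in LaTeX notation):

    p(w \mid r, \Gamma, \sigma^2) = \mathcal{N}(w;\, \mu, \Sigma), \qquad
    \Sigma = \left(\sigma^{-2} I + \Gamma^{-1}\right)^{-1}, \qquad
    \mu = \sigma^{-2} \Sigma\, r = \left(I + \sigma^{2} \Gamma^{-1}\right)^{-1} r .

The posterior mean μ is exactly (9), and the posterior second moment E[w w^T | r] = μ μ^T + Σ is exactly (10).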

Due to space limitations, we summarize the extended SBL based robust regression algorithm, which can be derived using the Expectation-Maximization (EM) approach, as follows.²

1. Initialize a(0), σ²(0) and γi(0) for i = 1, 2, ..., M. Denote Γ(k) ≜ diag(γ1(k), ..., γM(k)).

2. At iteration k, compute

ŵ(k) = (I + σ²(k−1) Γ(k−1)⁻¹)⁻¹ (y − Xa(k−1))   (9)

Ŵ(k) = ŵ(k) ŵ(k)^T + (σ(k−1)⁻² I + Γ(k−1)⁻¹)⁻¹   (10)

γi(k) = [Ŵ(k)]_{i,i}   (11)

σ²(k) = (1/M) ∥y − Xa(k−1)∥₂² + (1/M) tr(Ŵ(k)) − (2/M) (y − Xa(k−1))^T ŵ(k)   (12)

a(k) = (X^T X)⁻¹ X^T (y − ŵ(k)).   (13)

3. Check for convergence. If the convergence criterion is not satisfied, go to 2. If it has converged, output a(k) as the regression coefficients.

²Inspired by the SBL implementations by the authors of [13][14], we suggest that hyperparameters γi(k) smaller than a predefined threshold be pruned from future iterations.

To provide an interpretation of this algorithm: at each iteration it first computes the posterior mean ŵ(k), obtaining the current estimate of the outlier components, and then performs an ordinary LS estimation on the corrected data, i.e. (y − ŵ(k)). It is also worthwhile to note that this algorithm can be generalized to more general robust regression problems, where the model is y = f(X, a) + w + ϵ and f is a general functional relationship assumed on the data. One can similarly derive the updating rule for a(k) as

Find a  s.t.  (∂f(X, a)/∂a)^T (y − f(X, a) − ŵ(k)) = 0,   (14)

and use the general function f(X, a) in the algorithm as needed.
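Because Γ and I are diagonal, only the diagonal of Ŵ(k) is needed, and the iteration (9)-(13) maps almost line-by-line into code. The Python sketch below is ours (the initialization, iteration count, and pruning threshold in the spirit of footnote 2 are arbitrary choices), not the authors' implementation.

    import numpy as np

    def sbl_robust_regression(X, y, n_iter=200, sigma2_init=1.0, prune_tol=1e-8):
        """EM-style iteration of (9)-(13) for the two component model (sketch)."""
        M = y.shape[0]
        a = np.linalg.lstsq(X, y, rcond=None)[0]        # initialize a with ordinary LS
        gamma = np.ones(M)
        sigma2 = sigma2_init
        w_hat = np.zeros(M)
        for _ in range(n_iter):
            keep = gamma > prune_tol                    # prune tiny hyperparameters (cf. footnote 2)
            r = y - X @ a
            w_hat = np.zeros(M)
            Sigma_diag = np.zeros(M)
            w_hat[keep] = r[keep] / (1.0 + sigma2 / gamma[keep])         # posterior mean, (9)
            Sigma_diag[keep] = 1.0 / (1.0 / sigma2 + 1.0 / gamma[keep])  # posterior variances
            W_diag = w_hat ** 2 + Sigma_diag            # diagonal of (10)
            gamma = W_diag                              # (11)
            sigma2 = (r @ r + W_diag.sum() - 2.0 * (r @ w_hat)) / M      # (12)
            a = np.linalg.lstsq(X, y - w_hat, rcond=None)[0]             # (13)
        return a, w_hat, gamma, sigma2

For the generalized model y = f(X, a) + w + ϵ, the last update would be replaced by a solver for (14).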

3. EXPERIMENTAL STUDY

3.1. Robust regression with simulated data sets

To study the statistical behavior of the proposed algorithms, we consider the following multiple linear regression problem,

yi = Σ_{k=1}^5 ak xk,i + ei,  i = 1, 2, ..., M,   (15)

where a = [1, 2, −1.5, −3, 2.5]^T and M = 100. The explanatory variables are independently generated according to x1,i ∼ U(1, 31), x2,i ∼ U(−200, −150), x3,i ∼ Laplacian(1, 10), x4,i ∼ N(10, 5²), and x5,i ∼ Poisson(10).

Consider the following two cases. (i) Symmetric outlier distribution: we assume ei ∼ (1 − δ)N(0, 0.1²) + δξ, where ξ ∼ N(b, η²), b takes values equally likely on {−20, 20}, η ∼ U(0, 10), and δ controls the percentage of outlier contamination. (ii) Asymmetric outlier distribution: we assume ei ∼ (1 − δ)N(0, 0.1²) + δξ, where ξ ∼ N(−20, η²) and η ∼ U(0, 10). For each case, two different levels of outlier contamination are considered, namely δ = 5% and δ = 30%.

The MAP based robust regression algorithm (5) of Section 2.2.1 (denoted Alg1) and the empirical Bayesian inference based algorithm of Section 2.2.2 (denoted Alg2) are used to compute the regression coefficients. Additionally, the following algorithms are also employed: (i) Least Absolute Value (LAV); (ii) Least Trimmed Squares (LTS), where 75% of the squared errors are kept; (iii) M-estimate (M), where Huber's function is employed; (iv) Gaussian mixture model for the noise (GM); (v) Student's t-distribution for the noise (Stu-t), i.e. P(ei | ν, θ) ∝ [1 + θ ei²/ν]^{−(ν+1)/2} with ν fixed to 3.

For each algorithm, 5000 random data sets are processed. The performances are compared in terms of the empirical bias and the empirical variance of the estimate of each regression coefficient. The results are shown in Figs. 1 and 2. Subplots (a) and (b) in each figure show the empirical bias and the empirical variance, respectively, for 5% outlier contamination; subplots (c) and (d) are for 30% outlier contamination.

Fig. 1. Empirical bias and variance of â1, ..., â5 for LAV, LTS, M, GM, Stu-t, Alg1, and Alg2 (symmetric outlier case).

Fig. 2. Empirical bias and variance (asymmetric outlier case; legends are the same as in Fig. 1).

Analyzing the results: first, our proposed algorithms, especially Alg2, show consistent performance with lower bias and lower variance in most cases. Second, our methods serve as feasible algorithmic choices over a large range of outlier contamination percentages, as they work well with both small (5%) and large (30%) proportions of outliers; they exhibit the behavior of a higher-breakdown-point method. Third, the empirical Bayesian algorithm (Alg2) actually outperforms the MAP type algorithm (5) (Alg1) in most cases. This observation echoes the fact that empirical Bayesian inference can perform better since the posterior mean of the hyperparameter is more representative of the posterior mass [16].

3.2. Robust regression with real data sets

3.2.1. Brownlee's stackloss data set

This data set, which has been extensively studied in the statistical literature (cf. [7, p. 76]), contains 21 four-dimensional observations regarding the operation of a plant for the oxidation of ammonia to nitric acid. Due to space limitations, we select several algorithms for linear regression. Following the methodology in [7, Chap. 3], the index plots associated with the different algorithms are shown in Fig. 3.

Fig. 3. Index plots (standardized residual versus observation index) for LS, LAV, Alg1, and Alg2. The interval [−2.5, 2.5] is marked by red lines for inspecting outliers.

As seen in Fig. 3, our algorithms exhibit very similar results to the LAV method for this data set. Observations 1, 3, 4, and 21 can be identified as outliers, which is consistent with existing analyses of this data set [5][7].
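An index plot of this kind can be produced from any of the fits by standardizing the residuals and flagging the observations that fall outside [−2.5, 2.5]. The sketch below is ours; the MAD-based scale is one reasonable choice and not necessarily the one used for Fig. 3.

    import numpy as np

    def index_plot_flags(X, y, a_hat, band=2.5):
        """Standardized residuals and the (1-based) indices flagged as outliers."""
        r = y - X @ a_hat
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))   # robust (MAD-based) scale
        r_std = r / scale
        flagged = np.where(np.abs(r_std) > band)[0] + 1
        return r_std, flagged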
Furthermore, for each explanatory variable (e.v.), the estimated coefficient (âi), its standard deviation (std), and its t-value (t-v) are tabulated for LS and Alg1 as follows.

              LS                        Alg1
  e.v.     âi       std      t-v       âi       std      t-v
  rate     0.716    0.135    5.307     0.834    0.073    11.46
  temp.    1.295    0.368    3.520     0.596    0.180    3.330
  acid.   -0.152    0.156   -0.973    -0.071    0.067   -1.062
  const.  -39.92    11.90   -3.356    -39.47    5.105   -7.732

The LS estimation identifies that the regression coefficient for the acid concentration (acid.) is not significantly different from zero at the 5% level. This is confirmed by Alg1 (and also by LAV and Alg2).³ By proper treatment of the outliers, the significance of the rate and of the constant term (const.) is enhanced by Alg1 (and also by LAV and Alg2), and narrower confidence intervals can be constructed. Combining the analysis based on the index plots and the test statistics, we conclude that the robust regression methods could be more trustworthy in revealing the underlying pattern in this data set.

³For LS, use t_{17,0.05} = 2.11. For LAV and Alg1/2, use t_{13,0.05} = 2.16.

3.2.2. The Bupa liver data set

This data set [17] mainly contains blood test results regarding liver function for 345 patients. We focus on the task of using the AST and γGT levels to linearly predict the ALT level. The data is processed by a log-transformation [18]. Results are shown in the following table.

              LS                        LTS
  e.v.     âi       std      t-v       âi       std      t-v
  AST      0.693    0.066    10.57     0.788    0.061    12.87
  γGT      0.204    0.030    6.871     0.207    0.026    7.794
  const.   0.425    0.176    2.408     0.145    0.166    0.870

              Alg1                      Alg2
  e.v.     âi       std      t-v       âi       std      t-v
  AST      0.736    0.062    11.92     0.756    0.062    12.24
  γGT      0.202    0.027    7.557     0.208    0.027    7.796
  const.   0.306    0.168    1.821     0.220    0.168    1.308

The LS estimation indicates that all variables and the constant term (intercept) are significant at the 5% level. However, the robust methods we tested here inform us that the constant term is actually not significant.⁴ Given this observation, we re-perform the linear regression without the constant term. To examine the difference, we perform the F-test with null hypothesis H0: the constant term is equal to zero. The results are summarized in the following table.

⁴Specifically, df_LTS = 331 and df_Alg1 = df_Alg2 = 332. Hence, t_{∞,0.05} = 1.96 and F_{1,∞,0.05} = 3.84 can be used.

             LS       LTS      Alg1     Alg2
  F-value    5.801    1.995    2.708    1.959

We can see that, except for LS, the F-values for all the robust regression methods indicate that the constant term is not significant at the 5% level, which confirms our earlier observation.
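For the LS column, the F-value can be reproduced along the lines of the sketch below (ours); the robust columns would instead use the corresponding robust fits together with the degrees of freedom given in footnote 4.

    import numpy as np
    from scipy import stats

    def f_test_constant_term(Z, y):
        """F-test of H0: constant term = 0, based on ordinary LS fits (sketch).

        Z holds the explanatory variables without the constant column.
        """
        n, k = Z.shape
        Z1 = np.hstack([Z, np.ones((n, 1))])                                  # model with intercept
        rss1 = np.sum((y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]) ** 2)  # full model
        rss0 = np.sum((y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]) ** 2)    # no intercept
        df = n - (k + 1)
        F = (rss0 - rss1) / (rss1 / df)
        return F, stats.f.sf(F, 1, df)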
methods we tested here inform us that the constant term is actually Pure Appl. Math, vol. LIX, pp. 1027–1233, 2006. not significant.4 Having this observation, we re-perform the linear regression without constant term. To examine the difference, we [12] I. F. Gorodnitsky and B. D. Rao, “Sparse signal reconstruc- tion from limited data using focuss: A re-weighted norm mini- perform the F -test with null hypothesis H0 : the constant term is equal to zero. The results are summarized in the following table. mization algorithm,” IEEE Trans. Sig. Proc., vol. 45, no. 3, pp. 600–616, 1997. LS LTS Alg1 Alg2 [13] M. E. Tipping, “Sparse bayesian learning and the relevance F -value 5.801 1.995 2.708 1.959 vector machine,” Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001. We can see that except LS, the F -values for all the robust re- [14] D. P. Wipf and B. D. Rao, “Sparse bayesian learning for basis gression methods indicate that the constant term is not significant at selection,” IEEE Transaction Signal Processing, vol. 52, no. 8, 5% level, which confirms our earlier observation. To be complete, pp. 2153–2164, 2004. the simpler linear models learned by different algorithms are given [15] D. P. Wipf, J. A. Palmer, and B. D. Rao, “Perspectives on as follows, sparse bayesian learning,” NIPS, 2004. LTS Alg1 Alg2 [16] D. J. C. MacKay, “Comparison of approximate methods for e.v. aˆi t-v aˆi t-v aˆi t-v handling hyperparameters,” Neural Computation, vol. 11, pp. AST 0.835 29.28 0.830 28.85 0.834 28.98 1035–1068, 1999. γGT 0.205 7.752 0.204 7.630 0.202 7.555 [17] L. Breiman, “Statistical modeling: the two cultures,” Statisti- cal Sci., vol. 16, no. 3, pp. 199–231, 2001. 3 For LS, use t17,0.05 = 2.11. For LAV, Alg1/2, use t13,0.05 = 2.16. [18] G. E. P. Box and D. R. Cox, “An analysis of transformations,” 4 Specifically, dfLTS = 331, dfAlg1 = dfAlg2 = 332. Hence, t∞,0.05 = J. Royal Stat. Society, vol. 26, no. 2, pp. 211–252, 1964. 1.96, F1,∞,0.05 = 3.84 can be used.