Robust Methods
Statistics 203: Introduction to Regression and Analysis of Variance
Robust methods
Jonathan Taylor

Today's class
■ Weighted regression.
■ Robust methods.
■ Robust regression.

Heteroskedasticity
■ In our standard model, we have assumed that
$$\varepsilon \sim N(0, \sigma^2 I),$$
that is, that the errors are independent and have the same variance (homoskedastic).
■ We have discussed graphical checks for non-constant variance (heteroskedasticity), but not "remedies" for heteroskedasticity.
■ Suppose that
$$\varepsilon \sim N(0, \sigma^2 D)$$
for some known diagonal matrix $D$.
■ Where does $D$ come from? Suppose that we see that the variance increases like $f(X_j)$; then we might choose $D_i = f(X_{ij})$.
■ What is the "maximum likelihood" thing to do?

MLE for one sample problem
■ Consider the simpler problem
$$Y_i \sim N(\mu, \sigma^2 D_i)$$
with $\sigma^2$ and the $D_i$'s known.
■ Then
$$-2\log L(\mu \mid Y, \sigma) = \sum_{i=1}^n \frac{(Y_i-\mu)^2}{\sigma^2 D_i} + n\log(2\pi\sigma^2) + \sum_{i=1}^n \log D_i.$$
■ Differentiating,
$$-2\sum_{i=1}^n \frac{Y_i-\hat\mu}{\sigma^2 D_i} = 0,$$
implying
$$\hat\mu = \sum_{i=1}^n \frac{Y_i}{D_i} \Big/ \sum_{i=1}^n \frac{1}{D_i}.$$
■ Observations are weighted inversely proportional to their variance.

Weighted least squares
■ In the regression model,
$$-2\log L(\beta, \sigma \mid Y, D) = \sum_{i=1}^n \frac{\bigl(Y_i - \beta_0 - \sum_{j=1}^{p-1}\beta_j X_{ij}\bigr)^2}{\sigma^2 D_i} = \frac{1}{\sigma^2}(Y-X\beta)^t D^{-1}(Y-X\beta) = \frac{1}{\sigma^2}(Y-X\beta)^t W (Y-X\beta),$$
with $W = D^{-1}$.
■ Normal equations:
$$-2X^t W(Y - X\hat\beta_W) = 0, \quad\text{or}\quad \hat\beta_W = (X^t W X)^{-1} X^t W Y.$$
■ Distribution of $\hat\beta_W$:
$$\hat\beta_W \sim N\bigl(\beta, \sigma^2 (X^t W X)^{-1}\bigr).$$

Estimating $\sigma^2$
■ What are the right residuals?
■ If we knew $\beta$ exactly,
$$Y_i - \beta_0 - \sum_{j=1}^{p-1} X_{ij}\beta_j \sim N(0, \sigma^2 / W_i).$$
■ This suggests that the natural residual is
$$e_{W,i} = \sqrt{W_i}\,(Y_i - \hat Y_{W,i}) = \sqrt{W_i}\, e_i,$$
where
$$\hat Y_W = X(X^t W X)^{-1} X^t W Y.$$
■ Estimate of $\sigma^2$:
$$\hat\sigma_W^2 = \frac{1}{n-p}\sum_{i=1}^n e_{W,i}^2 = \frac{1}{n-p}\sum_{i=1}^n w_i e_i^2 \sim \sigma^2 \,\frac{\chi^2_{n-p}}{n-p}.$$
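To make the algebra concrete, here is a minimal R sketch of weighted least squares on simulated data. The model, sample size, and variance function $f(X) = X$ below are illustrative assumptions, not part of the original slides; the hand-computed $\hat\beta_W$ agrees with what `lm()` returns via its built-in `weights` argument.

```r
## Weighted least squares by hand on simulated heteroskedastic data.
## D, f(x) = x, and all numbers below are illustrative assumptions.
set.seed(1)
n <- 100
x <- runif(n, 0, 10)
D <- x                                        # Var(eps_i) = sigma^2 * D_i, D_i = f(x_i) = x_i
y <- 1 + 2 * x + rnorm(n, sd = sqrt(2 * D))   # true beta = (1, 2), sigma^2 = 2

X <- cbind(1, x)
W <- diag(1 / D)                              # W = D^{-1}

## Normal equations: beta_W = (X^t W X)^{-1} X^t W Y
beta.W <- solve(t(X) %*% W %*% X, t(X) %*% W %*% y)

## Weighted residuals e_{W,i} = sqrt(W_i) * e_i and the sigma^2 estimate
e.W <- sqrt(1 / D) * (y - X %*% beta.W)
sigma2.hat <- sum(e.W^2) / (n - ncol(X))

## lm() reproduces the same coefficients with weights w_i = 1/D_i
fit <- lm(y ~ x, weights = 1 / D)
coef(fit)
```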
Weighted regression example
[Figure: weighted regression example]

Robust methods
■ We also discussed outlier detection, but no specific remedies.
■ One alternative is to discard potential outliers – not always a good idea.
■ Outliers can really mess up the sample mean, but have relatively little effect on the sample median.
■ Could also "downweight" outliers: the basis of robust techniques.
■ Another "cause" of outliers may be that the data are not really normally distributed.

Example
■ Suppose that we have a sample $(Y_i)_{1 \le i \le n}$ from
$$f(y \mid \mu, \sigma) = \frac{1}{2\sigma}\, e^{-|y-\mu|/\sigma}.$$
This has heavier tails than the normal distribution.
■ MLE for $\mu$:
$$\hat\mu = \operatorname{argmin}_{\mu} \sum_{i=1}^n |Y_i - \mu|.$$
■ It can be shown that $\hat\mu$ is the sample median (exercise in STATS 116).
■ Take-home message: if the errors are not really normally distributed, then least squares is not the MLE, and the MLE downweights large residuals relative to least squares.

M-estimators
■ Depending on the error distribution of $\varepsilon$ (assuming i.i.d. errors), we get different optimization problems.
■ Suppose that
$$f(y \mid \mu, s) \propto e^{-\rho((y-\mu)/s)}.$$
■ MLE:
$$\hat\mu = \operatorname{argmin}_{\mu} \sum_{i=1}^n \rho\!\left(\frac{Y_i-\mu}{s}\right).$$
■ For every $\rho$ we get a different estimator: an M-estimator. Let $\psi = \rho'$.
■ Generalizes easily to the regression problem:
$$\hat\beta = \operatorname{argmin}_{\beta} \sum_{i=1}^n \rho\!\left(\frac{Y_i - \beta_0 - \sum_{j=1}^{p-1} X_{ij}\beta_j}{s}\right).$$

Huber's
[Figure: psi.huber(x) * x plotted against x]

Hampel's
[Figure: psi.hampel(x) * x plotted against x]

Tukey's
[Figure: psi.bisquare(x) * x plotted against x]
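The three $\psi$-function figures above appear to come from R's MASS package. Assuming that package, here is a short sketch to reproduce them (the axis ranges are guesses from the figures). Note that `psi.huber` and friends return the weight $\psi(u)/u$ with their default tuning constants, so multiplying by $u$ recovers $\psi(u)$ itself, matching the `psi.*(x) * x` axis labels.

```r
## Reproduce the psi-function figures. MASS's psi.* functions return the
## IRLS weight psi(u)/u, so psi.*(x) * x is psi(x) itself.
library(MASS)

x <- seq(-10, 10, length.out = 400)
par(mfrow = c(1, 3))
plot(x, psi.huber(x) * x,    type = "l", main = "Huber's",  ylab = "psi.huber(x) * x")
plot(x, psi.hampel(x) * x,   type = "l", main = "Hampel's", ylab = "psi.hampel(x) * x")
plot(x, psi.bisquare(x) * x, type = "l", main = "Tukey's",  ylab = "psi.bisquare(x) * x")
```

Huber's $\psi$ is monotone, while Hampel's and Tukey's redescend to 0 for large residuals, which matters below.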
Solving for $\hat\beta$
■ Assume for now that $s$ is known in the M-estimator.
■ Normal equations:
$$\sum_{i=1}^n \psi\!\left(\frac{Y_i - \sum_{j=0}^{p-1} X_{ij}\hat\beta_j}{s}\right) X_{ij} = 0, \qquad 0 \le j \le p-1,$$
where $\psi = \rho'$.
■ Set
$$W_i = \frac{\psi\!\left(\bigl(Y_i - \sum_{j=0}^{p-1} X_{ij}\hat\beta_j\bigr)/s\right)}{Y_i - \sum_{j=0}^{p-1} X_{ij}\hat\beta_j}.$$
■ Then
$$\sum_{i=1}^n X_{ij} W_i \left(Y_i - \sum_{j=0}^{p-1} X_{ij}\hat\beta_j\right) = 0, \qquad 0 \le j \le p-1,$$
or
$$X^t W X \hat\beta - X^t W Y = 0.$$

Iteratively reweighted least squares (IRLS)
■ Although we are trying to estimate $\beta$ above, given an initial estimate $\hat\beta^0$ we can compute initial weights
$$W_i^0 = \psi\!\left(\frac{Y_i - \sum_{j=0}^{p-1} X_{ij}\hat\beta_j^0}{s}\right) \Big/ \left(Y_i - \sum_{j=0}^{p-1} X_{ij}\hat\beta_j^0\right)$$
and solve for $\hat\beta^1$:
$$\hat\beta^1 = (X^t W^0 X)^{-1} X^t W^0 Y.$$
■ Now we can recompute the weights and re-estimate $\beta$.
■ In general, given weights $W^j$ we can solve
$$\hat\beta^{j+1} = (X^t W^j X)^{-1} X^t W^j Y.$$
■ This is very similar to a Newton-Raphson technique.
■ Used over and over again in statistics. (A sketch of the loop follows the next two slides.)

Robust estimate of scale
■ In general $s$ needs to be estimated. A popular estimate is
$$s = \mathrm{MAD}(e_1, \dots, e_n)/0.6745,$$
where
$$\mathrm{MAD}(e_1, \dots, e_n) = \mathrm{Median}\,|e_i - \mathrm{Median}(e_1, \dots, e_n)|.$$
■ Scale can be estimated at each stage of the IRLS procedure based on the current residuals, or based on some "very resistant" fit (LMS or LTS, below).
■ The constant 0.6745 is chosen so that $s$ is asymptotically unbiased for $\sigma$ if the $e_i$'s are $N(0, \sigma^2)$.

Other resistant fitting methods
■ If $\psi$ is not redescending (i.e. does not return to 0), then robust regression is still susceptible to outliers.
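Putting the last two slides together, here is a bare-bones IRLS sketch for a Huber M-estimator, re-estimating the scale $s$ from the current residuals at each iteration via the MAD. This is only an illustration under the setup above; the function and variable names are my own, and `MASS::rlm` is the production implementation.

```r
## Minimal IRLS for a Huber M-estimator; the scale s is re-estimated as
## MAD(e)/0.6745 from the current residuals at each step. Illustrative only.
library(MASS)   # for psi.huber

irls <- function(X, y, maxit = 50, tol = 1e-8) {
  beta <- solve(t(X) %*% X, t(X) %*% y)   # start from ordinary least squares
  for (it in 1:maxit) {
    e <- as.vector(y - X %*% beta)
    s <- mad(e)                           # R's mad() is median|e - median(e)| / 0.6745
    w <- psi.huber(e / s)                 # weights W_i = psi(u_i)/u_i
    WX <- X * w                           # same as diag(w) %*% X, but cheaper
    beta.new <- solve(t(WX) %*% X, t(WX) %*% y)
    if (max(abs(beta.new - beta)) < tol) break
    beta <- beta.new
  }
  drop(beta)
}
```

In practice one would call `rlm(y ~ x, psi = psi.huber)` rather than hand-rolling the loop; the common factor $1/s$ in the weights cancels in the normal equations, so using $\psi(u)/u$ in place of $\psi(u)/(su)$ gives the same $\hat\beta$.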