Robust Methods

Statistics 203: Introduction to Regression and Analysis of Variance
Jonathan Taylor

Today's class

■ Weighted regression.
■ Robust methods.
■ Robust regression.

Heteroskedasticity

■ In our standard model, we have assumed that
$$\varepsilon \sim N(0, \sigma^2 I),$$
that is, that the errors are independent and have the same variance (homoskedastic).
■ We have discussed graphical checks for non-constant variance (heteroskedasticity), but not "remedies" for heteroskedasticity.
■ Suppose instead that
$$\varepsilon \sim N(0, \sigma^2 D)$$
for some known diagonal matrix $D$.
■ Where does $D$ come from? If we see that the variance increases like $f(X_j)$, we might choose $D_i = f(X_{ij})$.
■ What is the "maximum likelihood" thing to do?

MLE for one sample problem

■ Consider the simpler problem
$$Y_i \sim N(\mu, \sigma^2 D_i),$$
with $\sigma^2$ and the $D_i$'s known.
■ Then
$$-2 \log L(\mu \mid Y, \sigma) = \sum_{i=1}^{n} \frac{(Y_i - \mu)^2}{\sigma^2 D_i} + n \log(2\pi\sigma^2) + \sum_{i=1}^{n} \log(D_i).$$
■ Differentiating,
$$-2 \sum_{i=1}^{n} \frac{Y_i - \widehat{\mu}}{\sigma^2 D_i} = 0,$$
implying
$$\widehat{\mu} = \frac{\sum_{i=1}^{n} Y_i / D_i}{\sum_{i=1}^{n} 1 / D_i}.$$
■ Observations are weighted inversely proportionally to their variance.

Weighted least squares

■ For the regression model, up to additive terms that do not involve $\beta$,
$$-2 \log L(\beta, \sigma \mid Y, D) = \sum_{i=1}^{n} \frac{\big(Y_i - \beta_0 - \sum_{j=1}^{p-1} \beta_j X_{ij}\big)^2}{\sigma^2 D_i} = \frac{1}{\sigma^2} (Y - X\beta)^t D^{-1} (Y - X\beta) = \frac{1}{\sigma^2} (Y - X\beta)^t W (Y - X\beta),$$
with $W = D^{-1}$.
■ Normal equations:
$$-2 X^t W (Y - X\widehat{\beta}_W) = 0, \qquad \text{or} \qquad \widehat{\beta}_W = (X^t W X)^{-1} X^t W Y.$$
■ Distribution of $\widehat{\beta}_W$:
$$\widehat{\beta}_W \sim N\big(\beta, \sigma^2 (X^t W X)^{-1}\big).$$

Estimating σ²

■ What are the right residuals?
■ If we knew $\beta$ exactly,
$$Y_i - \beta_0 - \sum_{j=1}^{p-1} X_{ij} \beta_j \sim N(0, \sigma^2 / W_i).$$
■ This suggests that the natural residual is
$$e_{W,i} = \sqrt{W_i}\,\big(Y_i - \widehat{Y}_{W,i}\big) = \sqrt{W_i}\, e_i,$$
where
$$\widehat{Y}_W = X (X^t W X)^{-1} X^t W Y.$$
■ Estimate of $\sigma^2$:
$$\widehat{\sigma}^2_W = \frac{1}{n-p} \sum_{i=1}^{n} e_{W,i}^2 = \frac{1}{n-p} \sum_{i=1}^{n} W_i e_i^2 \sim \sigma^2 \, \frac{\chi^2_{n-p}}{n-p}.$$
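
A minimal numerical sketch of the two preceding slides in R (the data, sample size, and variance function are invented for illustration): we simulate errors whose standard deviation grows with $x$, so that $D_i = x_i^2$ and $W_i = 1/x_i^2$, then check `lm()`'s built-in weighted fit against the closed-form normal equations and compute the weighted estimate of $\sigma^2$.

```r
# Weighted least squares sketch; assumes Var(eps_i) = sigma^2 * x_i^2,
# i.e. D_i = x_i^2 and W_i = 1 / x_i^2.
set.seed(203)
n <- 100
x <- runif(n, 1, 10)
y <- 3 + 0.5 * x + rnorm(n, sd = 2 * x)  # heteroskedastic: sd proportional to x
w <- 1 / x^2                             # weight inversely proportional to variance

fit <- lm(y ~ x, weights = w)            # built-in weighted fit
coef(fit)

# Same estimate from the normal equations: beta_W = (X' W X)^{-1} X' W Y
X <- cbind(1, x)
W <- diag(w)
solve(t(X) %*% W %*% X, t(X) %*% W %*% y)

# Weighted estimate of sigma^2: sum(W_i * e_i^2) / (n - p), with p = 2 here;
# should be near sigma^2 = 4
e <- y - fitted(fit)
sum(w * e^2) / (n - 2)
```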

Weighted regression example

[Figure: weighted regression example]

Robust methods

■ We also discussed outlier detection, but no specific remedies.
■ One alternative is to discard potential outliers, which is not always a good idea.
■ Outliers can really mess up the sample mean, but have relatively little effect on the sample median.
■ We could also "downweight" outliers: this is the basis of robust techniques.
■ Another "cause" of outliers may be that the data are not really normally distributed.

Example

■ Suppose that we have a sample $(Y_i)_{1 \le i \le n}$ from the double exponential (Laplace) density
$$f(y \mid \mu, \sigma) = \frac{1}{2\sigma} e^{-|y - \mu|/\sigma}.$$
This has heavier tails than the normal distribution.
■ MLE for $\mu$:
$$\widehat{\mu} = \operatorname*{argmin}_{\mu} \sum_{i=1}^{n} |Y_i - \mu|.$$
■ It can be shown that $\widehat{\mu}$ is the sample median (exercise in STATS 116).
■ Take-home message: if the errors are not really normally distributed, then least squares is not the MLE, and the MLE downweights large residuals relative to least squares.

M-estimators

■ Depending on the error distribution of $\varepsilon$ (assuming i.i.d. errors), we get different optimization problems.
■ Suppose that
$$f(y \mid \mu, s) \propto e^{-\rho((y - \mu)/s)}.$$
■ MLE:
$$\widehat{\mu} = \operatorname*{argmin}_{\mu} \sum_{i=1}^{n} \rho\left(\frac{Y_i - \mu}{s}\right).$$
■ For every $\rho$ we get a different estimator: an M-estimator. Let $\psi = \rho'$.
■ This generalizes easily to the regression problem:
$$\widehat{\beta} = \operatorname*{argmin}_{\beta} \sum_{i=1}^{n} \rho\left(\frac{Y_i - \beta_0 - \sum_{j=1}^{p-1} X_{ij}\beta_j}{s}\right).$$

Huber's

[Figure: Huber's ψ, plotted as psi.huber(x) * x for x in [−4, 4]]

Hampel's

[Figure: Hampel's ψ, plotted as psi.hampel(x) * x for x in [−10, 10]]

Tukey's

[Figure: Tukey's bisquare ψ, plotted as psi.bisquare(x) * x for x in [−4, 4]]
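
The three ψ plots can be reproduced with the weight functions from R's MASS package (the plotting grid and layout below are my choices; the tuning constants are MASS's defaults): each `psi.*` function returns the weight $\psi(u)/u$, so multiplying by $x$ recovers $\psi$ itself, exactly as in the figure labels above.

```r
# Reproducing the psi plots; psi.huber, psi.hampel and psi.bisquare return
# psi(u)/u (the IRLS weight), so psi.*(x) * x is psi(x).
library(MASS)

x  <- seq(-4, 4, length.out = 401)
xh <- seq(-10, 10, length.out = 401)

op <- par(mfrow = c(1, 3))
plot(x,  psi.huber(x) * x,    type = "l", main = "Huber's",  ylab = "psi(x)")
plot(xh, psi.hampel(xh) * xh, type = "l", main = "Hampel's", ylab = "psi(x)")
plot(x,  psi.bisquare(x) * x, type = "l", main = "Tukey's",  ylab = "psi(x)")
par(op)
```

Huber's ψ is monotone, while Hampel's and Tukey's "redescend" to zero for large residuals; this distinction returns on the last slide below.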

Solving for β̂

■ Assume for now that $s$ is known in the M-estimator.
■ Normal equations:
$$\sum_{i=1}^{n} \psi\left(\frac{Y_i - \sum_{k=0}^{p-1} X_{ik}\widehat{\beta}_k}{s}\right) X_{ij} = 0, \qquad 0 \le j \le p-1,$$
where $\psi = \rho'$.
■ Set
$$W_i = \frac{\psi\left(\big(Y_i - \sum_{k=0}^{p-1} X_{ik}\widehat{\beta}_k\big)/s\right)}{Y_i - \sum_{k=0}^{p-1} X_{ik}\widehat{\beta}_k}.$$
■ Then
$$\sum_{i=1}^{n} X_{ij} W_i \left(Y_i - \sum_{k=0}^{p-1} X_{ik}\widehat{\beta}_k\right) = 0, \qquad 0 \le j \le p-1,$$
or, in matrix form,
$$X^t W X \widehat{\beta} - X^t W Y = 0.$$

Iteratively reweighted least squares (IRLS)

■ Although we are trying to estimate $\beta$ above, given an initial estimate $\widehat{\beta}^0$ we can compute initial weights
$$W_i^0 = \psi\left(\frac{Y_i - \sum_{j=0}^{p-1} X_{ij}\widehat{\beta}_j^0}{s}\right) \bigg/ \left(Y_i - \sum_{j=0}^{p-1} X_{ij}\widehat{\beta}_j^0\right)$$
and solve for
$$\widehat{\beta}^1 = (X^t W^0 X)^{-1} X^t W^0 Y.$$
■ Now we can recompute the weights and re-estimate $\beta$.
■ In general, given weights $W^j$, we solve
$$\widehat{\beta}^{j+1} = (X^t W^j X)^{-1} X^t W^j Y.$$
■ This is very similar to a Newton-Raphson technique.
■ It is used over and over again in statistics.

Robust estimate of scale

■ In general $s$ needs to be estimated. A popular estimate is
$$\widehat{s} = \mathrm{MAD}(e_1, \ldots, e_n) / 0.6745,$$
where
$$\mathrm{MAD}(e_1, \ldots, e_n) = \operatorname{Median} |e_i - \operatorname{Median}(e_1, \ldots, e_n)|.$$
■ Scale can be estimated at each stage of the IRLS procedure, based either on the current residuals or on some "very resistant" fit (LMS or LTS below).
■ The constant 0.6745 is chosen so that $\widehat{s}$ is asymptotically unbiased for $\sigma$ if the $e_i$'s are $N(0, \sigma^2)$.

Other resistant fitting methods

■ If $\psi$ is not redescending (i.e. does not return to 0), then robust regression is still susceptible to outliers.
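
A quick numerical check of the scale estimate (simulated normal residuals with σ = 2, my choice): base R's `mad()` already applies the 1/0.6745 ≈ 1.4826 correction, so it matches the formula above.

```r
# MAD-based robust scale; mad() multiplies by 1.4826 = 1/0.6745 by default.
set.seed(1)
e <- rnorm(1e5, sd = 2)
median(abs(e - median(e))) / 0.6745  # approximately 2
mad(e)                               # same estimate, built in
```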
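
Finally, a minimal IRLS sketch for Huber's M-estimator on simulated data with a few gross outliers. The tuning constant k = 1.345 and the convergence tolerance are conventional choices rather than values from the slides, and the scale is re-estimated by MAD at each iteration.

```r
# Iteratively reweighted least squares for Huber's M-estimator.
set.seed(203)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
y[1:5] <- y[1:5] + 15                  # a few gross outliers
X <- cbind(1, x)

huber_weight <- function(u, k = 1.345) pmin(1, k / abs(u))  # psi(u)/u

beta <- solve(t(X) %*% X, t(X) %*% y)  # start from ordinary least squares
for (iter in 1:100) {
  e <- drop(y - X %*% beta)
  s <- mad(e)                          # robust scale from current residuals
  w <- huber_weight(e / s)
  beta_new <- solve(t(X) %*% (w * X), t(X) %*% (w * y))  # (X' W X)^{-1} X' W y
  if (max(abs(beta_new - beta)) < 1e-8) break
  beta <- beta_new
}
drop(beta)  # close to (1, 2); compare MASS::rlm(y ~ x)
```

Because Huber's ψ is not redescending, even this fit can be dragged by sufficiently extreme outliers, which is what motivates the resistant methods mentioned on the last slide.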
