Leverage in penalized least squares estimation

Keith Knight
University of Toronto
[email protected]

Abstract

The German mathematician Carl Gustav Jacobi showed in 1841 that least squares estimates can be written as an expected value of elemental estimates with respect to a probability measure on subsets of the predictors; this result is trivial to extend to penalized least squares where the penalty is quadratic. This representation is used to define the leverage of a subset of observations in terms of this probability measure. This definition can be applied to define the influence of the penalty term on the estimates. Applications to the Hodrick-Prescott filter, backfitting, and ridge regression as well as extensions to non-quadratic penalties are considered.

1 Introduction

We consider estimation in the linear regression model
$$Y_i = x_i^T\beta + \varepsilon_i \qquad (i = 1, \cdots, n)$$
where $\beta$ is a vector of $p$ unknown parameters. Given observations $\{(x_i, y_i) : i = 1, \cdots, n\}$, we define the penalized least squares (PLS) estimate $\widehat{\beta}$ as the minimizer of
$$\sum_{i=1}^n (y_i - x_i^T\beta)^2 + \beta^T A \beta \qquad (1)$$
where $A$ is a non-negative definite matrix with $A = L^T L$ for some $r \times p$ matrix $L$. Note that $L$ is in general not uniquely specified since for any $r \times r$ orthogonal matrix $O$, $A = (OL)^T(OL)$. In the context of this paper, we would typically choose $L$ so that its rows are interpretable in some sense. Examples of PLS estimates include ridge estimates (Hoerl and Kennard, 1970), smoothing splines (Craven and Wahba, 1978), the Hodrick-Prescott filter (Hodrick and Prescott, 1997), and (trivially when $A = 0$) ordinary least squares (OLS) estimates.
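Since (1) is a quadratic function of $\beta$, differentiating gives the normal equations $(X^T X + A)\widehat{\beta} = X^T y$, where $X$ is the $n \times p$ matrix with rows $x_i^T$. The following Python sketch (not part of the paper; the function name pls_estimate and the simulated data are illustrative only) computes $\widehat{\beta}$ from a given penalty factor $L$, with ridge regression ($L = \sqrt{\lambda}\, I_p$) and OLS ($A = 0$) as special cases.

```python
import numpy as np

def pls_estimate(X, y, L):
    """Penalized least squares estimate minimizing
    ||y - X b||^2 + b' A b with A = L'L, i.e. the solution of
    the normal equations (X'X + A) b = X'y."""
    A = L.T @ L
    return np.linalg.solve(X.T @ X + A, X.T @ y)

# Illustrative simulated data (assumed, not from the paper)
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_normal(n)

lam = 2.0
L_ridge = np.sqrt(lam) * np.eye(p)       # A = lam * I_p (ridge penalty)
print(pls_estimate(X, y, L_ridge))       # ridge estimate
print(pls_estimate(X, y, np.zeros((1, p))))  # A = 0 recovers OLS
```

Passing $OL$ for any orthogonal $O$ in place of $L$ yields the same estimate, since the code only uses $L$ through $A = L^T L$; the choice of $L$ matters for interpreting its rows, not for the fitted values.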