Nonconcave Penalized M-Estimation with a Diverging Number of Parameters

Statistica Sinica 21 (2011), 391-419

Gaorong Li (Beijing University of Technology), Heng Peng and Lixing Zhu (Hong Kong Baptist University)

Abstract: M-estimation is a widely used technique for robust statistical inference. In this paper, we investigate the asymptotic properties of a nonconcave penalized M-estimator in sparse, high-dimensional, linear regression models. Compared with classic M-estimation, the nonconcave penalized M-estimation method can perform parameter estimation and variable selection simultaneously. The proposed method is resistant to heavy-tailed errors or outliers in the response. We show that, under certain appropriate conditions, the nonconcave penalized M-estimator has the so-called "Oracle Property": it is able to select variables consistently, and the estimators of nonzero coefficients have the same asymptotic distribution as they would if the zero coefficients were known in advance. We obtain consistency and asymptotic normality of the estimators when the dimension p_n of the predictors satisfies the conditions p_n log n / n → 0 and p_n^2 / n → 0, respectively, where n is the sample size. Based on the idea of sure independence screening (SIS) and rank correlation, a robust rank SIS (RSIS) is introduced to deal with ultra-high-dimensional data. Simulation studies were carried out to assess the performance of the proposed method for finite-sample cases, and a dataset was analyzed for illustration.

Key words and phrases: Linear model, oracle property, rank correlation, robust estimation, SIS, variable selection.

1. Introduction

The modern technologies employed in many scientific fields allow production and storage of large datasets with ever-increasing sample sizes and dimensions, and these datasets often include superfluous variables or information. Effective variable selection procedures are thus required to improve both the accuracy and interpretability of learning techniques. Selecting too small a subset leads to misspecification, whereas choosing too many variables aggravates the "curse of dimensionality"; selecting the right subset of variables and excluding unimportant variables when modeling high-dimensional data is therefore of considerable importance.

In high-dimensional modeling, the classical L0 penalized variable selection methods, AIC, BIC, Cp, and so on, all suffer from a heavy computational burden, and the statistical properties of the estimators are difficult to analyze. To overcome these insufficiencies, various shrinkage or L1 penalized variable selection methods, such as Bridge Regression (Frank and Friedman (1993)), LASSO (Tibshirani (1996)), Elastic-Net (Zou and Hastie (2005)), and Adaptive LASSO (Zou (2006)), have been extensively investigated for linear and generalized linear models. Fan and Li (2001) proposed a nonconcave penalized likelihood method for variable selection in likelihood-based models, and showed that this method has some good properties compared to other penalized methods. The nonconcave likelihood approach has been further extended to Cox's model for survival data (Fan and Li (2002)), partially linear models for longitudinal data (Fan and Li (2004)), and varying coefficient partially linear models (Li and Liang (2008)). These shrinkage methods are much more computationally efficient than the classical variable selection methods, and the properties of the estimators for most shrinkage methods are easier to study.
It is easy to show that, if the tuning parameters are appropriately selected, the true model can be consistently identified. For sparse high-dimensional models, Candès and Tao (2007) proposed the Dantzig selector, which is the solution to an L1-regularization problem, and showed that this selector has certain oracle properties in sparse high-dimensional linear regression models. Fan and Peng (2004) investigated the properties of nonconcave penalized estimators using the high-dimensional likelihood approach. In addition, the asymptotic properties of penalized least squares estimators for sparse high-dimensional linear regression models have been widely investigated; see Huang, Horowitz and Ma (2008) for Bridge Regression, Zhang and Huang (2008) for LASSO, Zou and Zhang (2009) for Elastic-Net, and Huang and Xie (2007) for SCAD.

To avoid model misspecification and increase the robustness of the estimation, a number of model-free dimension reduction methods have been considered. Zhou and He (2008), for example, proposed a constrained dimension reduction method based on canonical correlation and the L1 constraint. Zhong et al. (2005) proposed a novel procedure called regularized sliced inverse regression (RSIR) to directly identify the linear combinations, and further the functional TFBS, while avoiding estimation of the link function. Although these model-free dimension reduction methods not only reduce the model dimensions but also shrink some of the coefficients of the selected linear combinations to zero, without information on the model structure the estimation efficiency is difficult to study and compare. Robust methods for some specified models have been widely investigated, but robust variable/model selection has received little attention. The seminal papers that do address this issue include those of Ronchetti (1985) and Ronchetti and Staudte (1994), introducing robust versions of the AIC and Mallows' Cp selection criteria, respectively. Ronchetti, Field, and Blanchard (1997) proposed robust model selection by cross-validation. Ronchetti (1997) discussed robust variants of the classical model selection criteria. Agostinelli (2002) used weighted likelihood to increase the robustness of model selection. Wu and Zen (1999) proposed a linear model selection procedure based on M-estimation that includes many classical model selection criteria as special cases. Zheng, Freidlin, and Gastwirth (2004) suggested two measures based on Kullback-Leibler information for choosing a model and demonstrated the robustness of their proposed methods. Müller and Welsh (2005) proposed selecting a regression model by combining a robust penalized criterion and a robust conditional expected prediction loss function that is estimated using a stratified bootstrap. Khan, Van Aelst, and Zamar (2007) suggested a robust linear model selection method based on Least Angle Regression. Recently, Salibian-Barrera and Van Aelst (2008) proposed a robust model selection method that uses a fast and robust bootstrap. Unfortunately, most of the aforementioned robust model selection methods are based on classical model selection criteria, and hence are computationally expensive. Although a number of fast robust algorithms have been proposed, their estimation properties remain little known, and it is thus difficult to investigate them in high-dimensional situations.
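To fix ideas before turning to the difficulties of high-dimensional fitting: the estimator studied in this paper combines a robust loss with a nonconcave penalty. Schematically (the display below is consistent with the abstract's description; the precise conditions on the loss and the penalty are given in Section 2), the nonconcave penalized M-estimator minimizes an objective of the form

$$
Q_n(\beta) \;=\; \sum_{i=1}^{n} \rho\bigl(Y_i - x_i^{\top}\beta\bigr) \;+\; n \sum_{j=1}^{p_n} p_{\lambda_n}\bigl(|\beta_j|\bigr), \qquad \beta \in \mathbb{R}^{p_n},
$$

where ρ is a loss function (ρ(t) = t²/2 recovers penalized least squares, while a Huber-type ρ yields robustness to heavy-tailed errors) and p_{λ_n} is a nonconcave penalty such as the SCAD penalty of Fan and Li (2001).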
When using a high-dimensional statistical regression model to fit data, several problems cannot be avoided. First, because of the high-dimensional nature of the model and the data, it is difficult to identify outlying observations by simple techniques or criteria; high dimensionality also increases the likelihood of extreme covariates in the dataset. Second, as Fan and Lv (2008) discuss, strong correlation always exists between the covariates when the model dimensions are ultra-high. Thus, even when the model dimensions are smaller than the sample size, the design matrix is close to singular. Third, most of the theoretical results on penalized least squares in a high-dimensional regression setting are based on the assumption of normality or a sub-Gaussian distribution for the white noise. This assumption seems too restrictive: the white noise distribution is difficult to substantiate, and too many superfluous variables in a model affect the estimation and the final distribution of the residuals. Accordingly, the simple and direct use of penalized least squares is not a good choice, because it is not a robust estimation method, and there is a large gap between the theoretical analysis and practical application.

The study of robust methods in a high-dimensional model setting is thus necessary. Given the robustness of M-estimation and the good properties of the nonconcave penalty, we investigate nonconcave penalized M-estimation for high-dimensional models and show that it has the so-called "Oracle Property", and that it retains its robustness properties. We relax the assumption concerning white noise and assume only the existence of moments. Of course, there is a cost to doing so: we cannot obtain theoretical results that directly support applying nonconcave penalized M-estimation when the model dimensions are ultra-high relative to the sample size. To deal with this issue, we draw on the sure independence screening (SIS) concept of Fan and Lv (2008), and on rank correlation, to propose a robust rank SIS method, the RSIS. We first use the RSIS to reduce the model dimensions to below the sample size; nonconcave penalized M-estimation is then used to obtain the final estimate. Our proposed two-step procedure should retain some robustness properties, as supported by our numerical studies; a sketch of the idea is given below.

The remainder of the paper is organized as follows. In Section 2, we define the nonconcave penalized M-estimator and provide its asymptotic properties. The RSIS method is introduced in Section 3. In Section 4, we describe the algorithm used to compute the nonconcave penalized M-estimator and the criterion used to choose the regularization parameter; a two-step robust procedure based on the RSIS and penalized M-estimation is also discussed there to deal with ultra-high-dimensional cases. In Section 5, we apply a number of simulations to assess the finite-sample performance of our penalized M-estimation method and to compare it with other variable selection procedures. A simulation study and an application are also presented in that section to compare the performance of the RSIS with that of the SIS.
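As a minimal illustrative sketch of the two-step procedure, consider the Python code below. The use of Spearman rank correlation as the screening statistic and the cutoff d = ⌈n / log n⌉ are assumptions made for this sketch (Fan and Lv (2008) use a cutoff of this order for SIS); the paper's own RSIS statistic and recommended cutoff are specified in Section 3, and the second step would apply the nonconcave penalized M-estimation of Section 2 to the retained covariates.

```python
import numpy as np
from scipy.stats import spearmanr

def rsis_screen(X, y, d):
    """Rank-based screening: keep the d covariates whose absolute rank
    correlation with the response is largest. Spearman correlation and
    the cutoff d are illustrative choices for this sketch; Section 3 of
    the paper specifies the actual RSIS statistic and cutoff."""
    omega = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        rho, _ = spearmanr(X[:, j], y)
        omega[j] = abs(rho)
    return np.argsort(omega)[::-1][:d]  # indices of the d top-ranked covariates

# Step 1: screen an ultra-high-dimensional design down to d < n covariates.
rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.standard_normal((n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_t(df=3, size=n)  # heavy-tailed errors
keep = rsis_screen(X, y, d=int(np.ceil(n / np.log(n))))

# Step 2: run the robust penalized fit (nonconcave penalized M-estimation,
# Section 2) on the reduced design X[:, keep] only.
X_reduced = X[:, keep]
```

Because the screening statistic depends on the data only through ranks, it is unaffected by monotone transformations of the response and insensitive to outliers, which is what motivates its use ahead of the robust penalized fit.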
