
Fourier Methods for Estimating the Central Subspace and the Central Mean Subspace in Regression

Yu Zhu and Peng Zeng

In regression with a high-dimensional predictor vector, it is important to estimate the central and central mean subspaces, which preserve sufficient information about the response and the mean response. Using the Fourier transform, we have derived candidate matrices whose column spaces recover the central and central mean subspaces exhaustively. Under the normality assumption on the predictors, explicit estimates of the central and central mean subspaces are derived. Bootstrap procedures are used for determining dimensionality and choosing tuning parameters. Simulation results and an application to a real dataset are reported. Our methods demonstrate competitive performance compared with SIR, SAVE, and other existing methods. The approach proposed in this article provides a novel view of sufficient dimension reduction and may lead to more powerful tools in the future.

KEY WORDS: Bootstrap; Candidate matrix; Central mean subspace; Central subspace; Fourier transform; SAVE; SIR.

1. INTRODUCTION that is the target of sufficient dimension reduction for FY|X.Un- der mild conditions, Cook (1996, 1998) showed that the central Suppose that Y is a univariate response and X is a p-dimen- subspace exists and is unique. Throughout this article, we as- sional vector of continuous predictors. Let F | denote the con- Y X sume the existence of the central subspace. ditional distribution of Y given X and let E[Y|X] denote the When only the mean response E[Y|X] is of interest, suffi- mean response at X. In full generality, the regression of Y on X cient dimension reduction can be defined for E[Y|X] in a similar is to infer about the conditional distribution FY|X, often with fashion as for FY|X. A subspace S is called a mean dimension- the mean response E[Y|X] of primary interest. When F | or Y X reduction subspace if E[Y|X] does not admit a proper parametric form, nonparametric methods such as local polynomial smoothing are usually used Y ⊥⊥ E[Y|X]|PS X. (2) for regression. Because of the curse of dimensionality, how- If the intersection of all mean dimension-reduction subspaces ever, these methods become impractical when the dimension is also a mean dimension-reduction subspace, then it is con- of X is high. To mitigate the curse of dimensionality, various sidered the central mean subspace, denoted by S [ | ] (Cook dimension reduction techniques have been proposed in the lit- E Y X and 2002). Similar to the central subspace, the central mean erature, including projection pursuit regression (Friedman and subspace exists under mild conditions, and so its existence is Stuetzle 1981) and principal component regression (Hotelling assumed throughout this article. S [ | ] is the target of suffi- 1957; Kendall 1957). One popular approach is to project X onto E Y X cient dimension reduction for the mean response E[Y|X] and a lower-dimensional subspace where the regression of Y on X is always a subspace of the central subspace S | (Cook and sufficient dimen- Y X can be performed. In this article we focus on Li 2002). Recently, Yin and Cook (2002) extended the cen- sion reduction (SDR), which further requires that the projection tral mean subspace to the central kth-moment subspace that is of X onto the lower-dimensional subspace does not result in any sufficient for the first k moments of the conditional distribu- loss of information about FY|X or E[Y|X]. tion FY|X. The theory of sufficient dimension reduction originated from Various dimension-reduction methods have been proposed the seminal works of Li (1991) and Cook and Weisberg (1991). in the literature, some of which can be used to estimate the During the past decade, much progress has been achieved in central subspace or the central mean subspace. For the cen- S SDR (see Cook 1998 for a comprehensive account). Let de- tral subspace, these include sliced inverse regression (SIR; Li Rp note a subspace of and let PS be the orthogonal projec- 1991) and sliced average variance estimation (SAVE; Cook and S S tion operator onto in the usual inner product. 
is called a Weisberg 1991); for the central mean subspace, they include dimension-reduction subspace if Y and X are independent con- principal Hessian direction (pHd; Li 1992), iterative Hessian ditioned on PS X, that is, transformation (IHT; Cook and Li 2002), the structure adap- Y ⊥⊥ X|PS X, (1) tive method (SAM; Hristache, Juditsky, Polzehl, and Spokoiny 2001), and minimum average variance estimation (MAVE; , where ⊥⊥ means “independent of.” The dimension-reduction Tong, Li, and Zhu 2002). SAM and MAVE are fundamen- subspace may not be unique. When the intersection of all tally different from the other methods mentioned earlier in dimension-reduction subspaces is also a dimension-reduction that both involve nonparametric estimation of the link func- subspace, it is defined to be the central subspace, denoted tion E[Y|X = x], which may be impractical when the dimension by SY|X (Cook 1996, 1998). The dimension of SY|X is called of X is high. All of the other aforementioned methods avoid the structural dimension of the regression of Y on X and is de- nonparametric estimation of the link function and target either noted by dim(SY|X). SY|X can be considered a metaparameter SY|X or SE[Y|X] directly. They usually follow a common proce- dure consisting of two steps. The first step is to define a p × p nonnegative definite matrix M called a candidate matrix ( Zhu is Assistant Professor, Department of Statistics, Purdue University, West Lafayette, IN 47907 (E-mail: [email protected]). Peng Zeng is As- sistant Professor, Department of Mathematics and Statistics, Auburn Univer- © 2006 American Statistical Association sity, Auburn, AL 36849 (E-mail: [email protected]). The authors thank the Journal of the American Statistical Association editor, the associate editor, and three anonymous referees for constructive com- December 2006, Vol. 101, No. 476, Theory and Methods ments and suggestions that greatly helped improve an earlier manuscript. DOI 10.1198/016214506000000140

1638 Zhu and Zeng: Fourier Subspace Estimation in Regression 1639 and Weiss 2003), whose columns span a subspace of SE[Y|X] or To facilitate our approach, we need to modify the model as- ˆ SY|X, and then propose a consistent estimate M of the candidate sumptions as follows. First, we assume that the joint distribu- | | matrix from a sample {(xi, )}1≤i≤n of (X, Y). The second step tion of (X, Y), the conditional distributions of X Y and Y X, and is to obtain the spectral decomposition of Mˆ and use the space the marginal distributions of X and Y admit densities, which are | | spanned by the eigenvectors of Mˆ corresponding to the largest denoted by fX,Y (x, y), fY|X(y x), fX|Y (x y), fX(x), and fY (y).Let = × q eigenvalues as the estimate of S [ | ] or S | , where q is the B (β1, β2,...,βq) be a p q matrix with its columns form- E Y X Y X S dimension of S [ | ] or S | . Recently, Cook and Ni (2005) ing a basis for Y|X. Then (1) can be restated in terms of con- E Y X Y X = proposed the minimum discrepancy method, which is more effi- ditional distributions as FY|X FY|Bτ X or in terms of density as cient than spectral decomposition for estimating SE[Y|X] or SY|X from a given candidate matrix. For these methods to work, some | = | τ = ; τ τ τ fY|X(y x) fY|Bτ X(y B x) h(y β1x, β2x,...,βqx), (3) distributional assumptions need to be imposed on X. For con- ; + venience, we assume that the mean of X is the origin of Rp and where h(y u1,...,uq) is a (q 1)-variate function. Similarly, let α ,...,α be a basis of S [ | ]; then (2) is equivalent to the covariance matrix of X is the standard p × p identity - 1 q E Y X [ | = ]= τ τ τ trix Ip. Then SIR and IHT require that X satisfies the linearity E Y X x g(α1x, α2x,...,αqx), (4) condition, E[X|PS X]=PS X, and SAVE and pHd require Y|X Y|X where g is a q-variate function. We assume the differentiabil- an additional condition called the constant variance condition, ity of h and g with respect to their coordinates wherever it is cov[X|PS X]=QS , where QS = I −PS . These con- Y|X Y|X Y|X p Y|X needed. X ditions are satisfied when the distribution of is multivariate The rest of the article is organized as follows. Section 2 normal. (For a detailed discussion of the conditions, see Cook derives MFM for the central mean subspace, and Section 3 1998.) derives MFC for the central subspace. Section 4 derives the Although SIR, SAVE, pHd, and IHT work well in practice, estimates of MFM and MFC under the assumption that X is nor- there has not been much study on when they can exhaustively mal and discusses the asymptotic properties of these estimates. recover the central subspace or the central mean subspace. It Section 5 focuses on implementation of the proposed meth- is known that SIR fails to capture directions along which Y is ods for estimating the central subspace and the central mean = τ 2 + symmetric about. For example, if Y (β X) ε, where β is a subspace. Section 6 compares the proposed methods with SIR, p-dimensional vector, τ denotes transpose, X follows N(0, Ip), SAVE,and other existing methods using synthetic and real data. and ε is a random error independent of X, then SIR will miss β. Section 7 contains conclusions and future work. The Appendix A sufficient condition for SAVE to exhaustively recover SY|X is gives proofs of propositions, theorems, and crucial equations. that the conditional distribution of X given Y is multivariate In this article we use S(M) to denote the linear space spanned normal, which may be restrictive in practice. 
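The symmetric-direction failure of SIR mentioned above is easy to see numerically: when Y = (β^τ X)^2 + ε with X standard normal, the within-slice means E[X | Y in slice] on which SIR is built are all essentially zero, so no linear combination of them points along β. The following sketch is our own minimal illustration (the simulation settings, function names, and the naive slicing are ours, not the article's):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 6
beta = np.zeros(p)
beta[0] = 1.0                                   # true direction

X = rng.standard_normal((n, p))
Y = (X @ beta) ** 2 + 0.1 * rng.standard_normal(n)

# Slice means E[X | Y in slice]: the basic ingredient of SIR.
n_slices = 5
order = np.argsort(Y)
slice_means = np.array([X[idx].mean(axis=0)
                        for idx in np.array_split(order, n_slices)])

# Every slice mean is close to the origin, so the SIR candidate matrix
# (a weighted sum of their outer products) carries almost no signal
# along beta, even though Y depends on X only through beta' X.
print(np.round(slice_means, 2))
```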
A potential risk of by the columns of a matrix M. applying these methods is that they may lead to the loss of in- 2. CENTRAL MEAN SUBSPACE formation regarding FY|X or E[Y|X]. Therefore, it is desirable to derive new methods that can guarantee the exhaustive recov- In this section we propose a candidate matrix MFM whose ery of the central subspace or the central mean subspace under column space is exactly the central mean subspace SE[Y|X].We general conditions. Recently, Li, Zha, and Chiaromonte (2005) follow a commonly used idea for deriving candidate matrices in made progress in this direction by proposing contour regression the literature: identify vectors that belong to SE[Y|X] and com- for sufficient dimension reduction. Contour regression assumes bine them to generate the candidate matrix. The major tool that that X follows an elliptically contoured distribution and also re- we use is the Fourier transform. quires conditions that involve X, Y, and both vectors in SY|X Let m(x) = E[Y|X = x].From(4),m(x) = g(u), where u = S⊥ τ = and Y|X (see assumptions 2.1 and 4.1 and thm. 4.2 in Li et al. A x and A (α1, α2,...,αq), whose columns form a basis of ∂ ∂ ∂ ∂ 2005). S [ | ].Let = ( , ,..., )τ denote the gradient oper- E Y X ∂x ∂x1 ∂x2 ∂xp This article represents another effort to derive methods that ator. By the chain rule of differentiation, can fully recover the central mean subspace and the central sub- ∂ ∂ space under various conditions. The primary tool that we use is m(x) = A g(u). the Fourier transform. At the population level, we have derived ∂x ∂u two candidate matrices, M and M , whose column spaces Thus the gradient of m(x) at any fixed x is a linear combination FM FC S ={ ∈ are identical to the central mean subspace and the central sub- of α1, α2,...,αq; therefore, it is in E[Y|X]. Let supp(X) x Rp } space. Given a sample of (X, Y), if consistent estimates of M : fX(x)>0 be the support of X. The collection of all of the FM ∈ and M can be found, they can be used to exhaustively recover gradients of m(x) over x supp(X) spans the central mean sub- FC S the central mean subspace and the central subspace. In fact, the space E[Y|X], as does the collection of all the gradients of m(x) weighted by fX(x), that is, consistent estimates exist, but their exact formulas or calcula-   tions depend on the amount of prior knowledge that we have ∂ S [ | ] = span m(x), x ∈ supp(X) regarding the distribution of X. Due to space limitations, in this E Y X ∂x article we fully implement our methods only for the case where    ∂ p X is normally distributed. The implementation of our method = span m(x) fX(x), x ∈ R . (5) under more general conditions is briefly described herein and ∂x is currently under further investigation, and we will report the The proof of (5) is given in the Appendix. From (5), it appears results in future work. that we can immediately use the gradient of m(x) to generate a 1640 Journal of the American Statistical Association, December 2006 candidate matrix for SE[Y|X]. However, this idea leads to inef- The first property indicates that the real and imaginary parts ficient estimation of the central mean subspace, because much of ψ(ω) are the vectors that we can use to generate candidate effort must be spent on estimating g and its derivatives non- matrices for the central mean subspace. In the second property, parametrically (Hristache et al. 2001; Xia et al. 2002). 
Recall ψ(ω) is represented as an expectation of the random function that the primary goal of sufficient dimension reduction is to re- −(ıω + G(X))Y exp{ıωτ X}. Recall that ψ(ω) was originally cover the central mean subspace SE[Y|X] only; thus we hope defined in terms of m(x) as in (6). The new expression of ψ(ω) to achieve dimension reduction while avoiding fitting the link in (8) does not include m(x) explicitly, giving us the opportu- function g(x) and its derivatives as much as possible. This can nity to estimate ψ(ω) without directly estimating m(x).This be realized by considering the Fourier transform of the gradient distinguishes our method from the methods that directly esti- ∂ of m(x) instead of using ∂x m(x) directly. Another advantage of mate m(x) and its derivatives in the literature. ∂ using the Fourier transform of ∂x m(x) is that sufficient dimen- The third property is essentially the RiemannÐLebesgue sion reduction for the central subspace and the central mean lemma for the Fourier transform (Folland 1992, pp. 217, 243). subspace can be dealt with in a unified fashion, as we demon- It indicates that when the density-weighted gradient is ab- strate in the next section. solutely integrable, ψ(ω) decays to 0 as the norm of ω goes For ω ∈ Rp,let    to infinity. The fourth property is a result from applying the τ ∂ Plancheral theorem for the Fourier transform (Folland 1992, ψ(ω) = exp{ıω x} m(x) fX(x) dx. (6) ∂ ∂x pp. 222, 224) to ( ∂x m(x))fX(x) and ψ(ω), and it establishes the connection between the expected outer product of the density- Then ψ(ω) is the Fourier transform of the density-weighted [ ∂ ∂ τ ] ∂ weighted gradient of m(x) (i.e., E ( ∂x m(X))( ∂x m(X)) fX(X) ), gradient ( ∂x m(x))fX(x). Intuitively, ψ(ω) can also be consid- ∗ = { τ } and the integral of the outer product of ψ(ω).LetMFM ered an average of the gradient of m(x) weighted by exp ıω x −p ¯ τ ∗ over x with density f (x). In particular, when ω = 0, ψ(0) = (2π) ψ(ω)ψ(ω) dω. Then the column space of MFM, X S ∗ [ ∂ ] (MFM), is exactly equal to the central mean subspace, as E ∂x m(X) , which is exactly the average gradient of m(x) (Härdle and Stoker 1989). Therefore, in some sense, ψ(ω) is a stated in the following proposition. generalized average derivative of the mean response m(x).Let Proposition 2. M∗ is a real nonnegative definite matrix, ∗ FM a(ω) and b(ω) be the real and imaginary parts of ψ(ω), that is, and S(M ) = S [ | ]. = + FM E Y X ψ(ω) a(ω) ıb(ω). Because the gradient of m(x) belongs to ∗ SE[Y|X], both a(ω) and b(ω) belong to SE[Y|X]. Intuitively, MFM can be considered the sum of the outer prod- An appealing property of ψ(ω) is that it contains all of the uct of ψ(ω) over all ω. Because of (7), ψ(ω) can be considered information of the gradient ∂ m(x), because ∂ m(x) can be the vector of coefficients of exp{ıxτ ω} in the representation of ∂x ∂x ∂ { τ } recovered from ψ(ω) through the inverse Fourier transform ∂x m(x)fX(x). For different ω’s, the exp ıx ω are basic os- ∂ cillatory functions with different frequencies. So ψ(ω) with (Folland 1992, p. 244). Assuming that ( ∂x m(x))fX(x) is inte- Rp   ∂ grable and continuous on and that ψ(ω) also is integrable, small ω captures the patterns of ∂x m(x)fX(x) with low fre-    quencies, whereas ψ(ω) with large ω captures the patterns ∂ − m(x) f (x) = (2π) p exp{−ıxτ ω}ψ(ω) dω. (7) of ∂ m(x)f (x) with high frequencies. According to the third ∂x X ∂x X property of ψ(ω), ψ(ω) goes to 0 when ω goes to infinity. 
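The second property is the one exploited for estimation: because (8) writes ψ(ω) as an expectation over (X, Y), it can be approximated by a sample average without ever estimating m(x) or its derivatives. Below is a minimal sketch of our own, assuming standardized predictors that are approximately standard normal so that G(x) = −x; the function name and the toy model are ours:

```python
import numpy as np

def psi_hat(omega, X, Y):
    """Sample version of (8): psi(omega) = -E[(i*omega + G(X)) * Y * exp(i*omega'X)],
    with G(x) = -x under (approximately) standard normal predictors."""
    phase = np.exp(1j * (X @ omega))                         # exp(i * omega' x_i)
    terms = (1j * omega[None, :] - X) * (Y * phase)[:, None]
    return -terms.mean(axis=0)                               # complex p-vector

# Toy check: the response depends on X only through its first coordinate,
# so the real and imaginary parts of psi_hat concentrate on that coordinate.
rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 4))
Y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(1000)
print(np.round(psi_hat(np.array([0.5, 0.0, 0.0, 0.0]), X, Y), 3))
```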
From (5), we know that the central mean subspace is spanned by Therefore, when patterns with various frequencies are of differ- the gradients. Considering the correspondence between ψ(ω) ent interest, ψ(ω) of different ω’s should be treated differently. ∂ and ∂x m(x) as demonstrated in (6) and (7), we can use ψ(ω) This can be realized by assigning different weights to ω when to generate a candidate matrix for the central mean subspace combining the outer product of ψ(ω).WeuseK(ω) to denote S E[Y|X]. The other properties of ψ(ω) are summarized in the the weight function and generate a more flexible candidate ma- following proposition. trix for the central mean subspace as   Proposition 1. M = Re ψ(ω)ψ¯ (ω)τ K(ω) dω 1. The central mean subspace is spanned by a(ω) and b(ω), FM p  that is, SE[Y|X] = span{a(ω), b(ω) : ω ∈ R }. τ τ 2. Suppose that log fX(x) is differentiable and m(x)fX(x) = [a(ω)a(ω) + b(ω)b(ω) ]K(ω) dω, (10) goes to 0 as x→∞, then   where Re(·) means “the real part of.” If K(ω) is a radial weight ψ(ω) =−E (ıω + G(X))Y exp{ıωτ X} , (8)  (X,Y) function, then ψ(ω)ψ¯ (ω)τ K(ω) dω itself is a real matrix. where G(x) = ∂ log f (x). ∂x X Proposition 3. If K(ω) is a positive weight function on Rp, 3. If ( ∂ m(x))f (x) is absolutely integrable for 1 ≤ i ≤ p, S = ∂xi X then MFM is a nonnegative definite matrix and (MFM) →  →∞ then ψ(ω) 0 as ω . SE[Y|X]. ∂ ≤ ≤ 4. If ( ∂x m(x))fX(x) is squared integrable for 1 i p, then  i   Proposition 3 indicates that the central mean subspace  τ ∂ ∂ 2 SE[Y|X] can be exhaustively recovered by the column space m(x) m(x) fX(x) dx ∂x ∂x of MFM. In the proposition, we have assumed that K(ω) is  positive over all Rp. This condition can be substantially weak- − = (2π) p ψ(ω)ψ¯ (ω)τ dω, (9) ened; for example, if ψ(ω) is analytic, then the proposition holds true for any weight function K(ω) with bounded sup- where ψ¯ (ω) is the conjugate of ψ(ω). port that contains an open set. In this article, we focus on Zhu and Zeng: Fourier Subspace Estimation in Regression 1641 positive weight functions only. Although in theory the proposi- sponse of T(Y, t) at X = x is  tion is true for any positive function, in practice the particular   choice will affect the performance of dimension reduction m(x, t) = E T(Y, t)|X = x = exp{ıty}fY|X(y|x) dy. based on MFM, especially when the sample size is moder- ate. For simplicity, we choose the Gaussian function K(ω) = Thus m(x, t) is the Fourier transform, or the characteristic func- − 2 p/2 {− 2 2 } tion, of the conditional density function fY|X(y|x).BothT(Y, t) (2πσW) exp ω /2σW as the weight function in the 2 and m(x, t) are complex functions. The central mean subspace rest of the article, where σW is a constant (or tuning parameter) that controls how the weight is assigned to different ψ(ω)’s. for T(Y, t) is defined as the sum of the central mean sub- spaces for its real and imaginary parts, that is, SE[T(Y,t)|X] = Furthermore, K(ω) leads to an explicit expression of MFM,   SE[sin(tY)|X] +SE[cos(tY)|X]. The following proposition states that = MFM E(U1,V1),(U2,V2) JFM((U1, V1), (U2, V2)) , (11) the central subspace SY|X is equal to the sum of SE[T(Y,t)|X] over ∈ R where t . ∂ Proposition 4. Suppose that f | (y|x) exists and is ab- JFM((U1, V1), (U2, V2)) ∂x Y X solutely integrable with respect to y. Then 2 2 −σ /2U12 = V1V2e W   SY|X = SE[T(Y,t)|X]. × 2 + − 2 + 2 τ σWIp (G(U1) σWU12)(G(U2) σWU12) , t∈R

(U1, V1) and (U2, V2) are independent and identically distrib- Because {T(·, t) : t ∈ R} is a subset of all the possible trans- uted as (X, Y), Ip is the p × p identity matrix, and U12 = formations, Proposition 4 implies (12). U1 − U2. When defining the central kth-moment subspace, Yin and Cook (2002) considered the power transformations of Y, which 3. CENTRAL SUBSPACE are Yl with 1 ≤ l ≤ k. Although the transformations that we As discussed in Section 1, the central mean subspace can consider can be considered an extension of the power transfor- only capture the information in X regarding the mean response mations, they lead to entirely different methods for recovering E[Y|X]. In applications where the entire conditional distribu- the central subspace. Proposition 4 suggests that, to recover the S tion FY|X is of interest, sufficient dimension reduction should central subspace Y|X, we can first recover the central mean ∈ R aim at the central subspace SY|X. This section focuses on the subspace for T(Y, t) at fixed t, then combine them over t . derivation of the candidate matrix MFC for SY|X.Weagainuse The methods and results developed in Section 2 for the central the Fourier transform, as well as other similar ideas from Sec- mean subspace SE[Y|X] can be directly generalized for the cen- tion 2. tral mean subspace SE[T(Y,t)|X] with Y replaced by exp{ıtY}. S First, we establish a connection between SY|X and a family Hence we can combine the candidate matrices for E[T(Y,t)|X] of central mean subspaces. As noted in Section 1, the central to generate a candidate matrix for SY|X. The idea of combin- mean subspace SE[Y|X] is always a subspace of SY|X.LetT de- ing candidate matrices to generate new ones was originally note a transformation of Y; then T(Y) is a new response. It can mentioned by Li (1991) and was further investigated by Ye be shown that the central mean subspace of T(Y), denoted by and Weiss (2003). In what follows, we start with applying the SE[T(Y)|X], is also a subspace of SY|X. For two different trans- Fourier transform to the gradient of m(x, t) as in Section 2 and formations T1(Y) and T2(Y), their corresponding central mean materialize the foregoing idea to derive a candidate matrix MFC S subspaces SE[T (Y)|X] and SE[T (Y)|X] are not necessarily identi- for Y|X. 1 2 = cal and may cover different parts of SY|X. This provides a pos- Because m(x, t) is the mean response of T(Y, t) at X x, similar to (5) in Section 2, we have sibility to recover the entire central subspace by collecting the   central mean subspace of T(Y) over all of the possible transfor- S = ∂ ∈ mations, that is, E[T(Y,t)|X] span m(x, t), x supp(X) ∂x S = S    Y|X E[T(Y)|X], (12) ∂ = span m(x, t) f (x), x ∈ Rp , all possible T ∂x X where where the spanned space of complex vectors is defined to be SE[T(Y)|X] the space spanned by their real parts and imaginary parts. Using T Proposition 4, we have   = u : There exist a finite number of transformations ∂ S | = span m(x, t), x ∈ supp(X), t ∈ R Y X ∂x T1,...,Tk, such that u = u1 +···+uk and    ∂ p ui ∈ SE[T (Y)|X] for 1 ≤ i ≤ k . = span m(x, t) f (x), x ∈ R , t ∈ R . (13) i ∂x X The foregoing equation is indeed true and is implied by Propo- As in Section 2, to derive a candidate matrix for SY|X,wedo sition 4. In fact, it is not necessary to use all of the possible ∂ transformations in (12); a family of properly chosen transfor- not use the gradient ∂x m(x, t) directly; instead we consider its Fourier transform. 
For any ω ∈ R^p and t ∈ R, define

$$\phi(\omega, t) = \int \exp\{\imath \omega^{\tau} x\} \left(\frac{\partial}{\partial x} m(x, t)\right) f_X(x)\, dx. \qquad (14)$$

∂ Then φ(ω, t) is the Fourier transform of ∂x m(x, t) weighted by where the marginal density function of X, and it preserves all of the ∂ JFC((U1, V1), (U2, V2)) information about ∂x m(x, t).Leta(ω, t) and b(ω, t) be the real   and imaginary parts of φ(ω, t). The properties of φ(ω, t) are σ 2 σ 2 = exp − W U 2 − T V2 summarized in the following proposition. 2 12 2 12   ∈ R × 2 + − 2 + 2 τ Proposition 5. For each fixed t , the following assertions σWIp (G(U1) σWU12)(G(U2) σWU12) , hold: (20) S 1. Both a(ω, t) and b(ω, t) are vectors in E[T(Y,t)|X];fur- (U1, V1) and (U2, V2) are independent and identically distrib- thermore, uted as (X, Y), Ip is the identity matrix, U12 = U1 − U2, and V = V − V . p 12 1 2 S | = { : ∈ R ∈ R} Y X span a(ω, t), b(ω, t) ω , t . (15) The derivation of MFC can also be understood from the per- spective of inverse regression used in SIR, where the candidate 2. Suppose that log fX(x) is differentiable and that matrix is defined by the first moment of the conditional distri-  →∞ m(x, t)fX(x) goes to 0 when x ; then bution of X given Y. Next we present a connection between   φ(ω, t), which is used to define M , and the conditional dis- φ(ω, t) =−E (ıω +G(X)) exp{ıωτ X+ıtY} . (16) FC (X,Y) tribution of X given Y. Define   3. If ( ∂ m(x, t))f (x) is absolutely integrable for 1 ≤ i ≤ p, η(y, ω) =−E (ıω + G(X)) exp{ıωτ x}|Y = y . ∂xi X then φ(ω, t) → 0 as ω→∞. ∂ For fixed ω, η(y, ω) can be considered the mean response vector 4. If ( m(x, t))fX(x) is squared integrable for 1 ≤ i ≤ p, τ ∂xi for inversely regressing −(ıω + G(X)) exp{ıω X} on Y.The then    properties of η(y, ω) are given in the following proposition.  τ ∂ ∂ 2 Proposition 7. Suppose for any fixed y, f (x, y) goes to 0 m(x, t) m¯ (x, t) fX(x) dx X,Y ∂x ∂x as x→∞. Then the following hold:  − = (2π) p φ(ω, t)φ¯ (ω, t)τ dω, (17) 1. For any given y and ω, the real and imaginary parts of η(y, ω) are vectors in SY|X; in particular, those of η(y, 0) =−E[G(X)|Y = y] are vectors in S | . ¯ ¯ Y X where m(x, t) and φ(ω, t) are the conjugates of m(x, t) 2. φ(ω, t) = E[η(Y, ω) exp{ıtY}]. and φ(ω, t). Recall that φ(ω, t) serves as a building block for MFC.From Proposition 5 is a direct extension of Proposition 1, with Y the second statement in Proposition 7, φ(ω, t) can be considered replaced by exp{ıtY} and m(x) replaced by m(x, t). According the Fourier transform of η(y, ω) with respect to the marginal to the first property in Proposition 5, using similar arguments as density of Y. When X is multivariate normal with mean 0 and those after Proposition 1, we can combine a(ω, t) and b(ω, t) covariance matrix Ip, G(x) =−x and η(y, 0) = E[X|Y], which to derive a candidate matrix for SY|X.LetK(ω) be a weight serve as the building blocks in SIR. This observation leads to an ∈ Rp ∈ R function for ω and let k(t) be a weight function for t . important connection between MFC and the candidate matrix Define MSIR used in SIR.   ¯ τ Proposition 8. Suppose that X follows the normal distribu- MFC = Re φ(ω, t)φ(ω, t) K(ω)k(t) dω dt tion with mean 0 and covariance matrix Ip.IfK(ω) is chosen to  be a point mass at ω = 0, then S(MFC) = S(MSIR). = [a(ω, t)a(ω, t)τ + b(ω, t)b(ω, t)τ ] Proposition 8 indicates that the column spaces of MFC and M coincide when X follows the standard normal distribution × K(ω)k(t) dω dt. (18) SIR and the weight function K(ω) is degenerate at ω = 0. Hence Proposition 6. If K(ω) is a positive weight function on Rp MSIR could be considered a special case of MFC under the normality assumption. 
Although S(MFC) and S(MSIR) are the and k(t) is a positive weight function on R, then MFC is a non- same, the matrices are generally different from each other. It is negative definite matrix, and S(M ) = S | . FC Y X known that SIR fails to capture the directions along which Y is S 2 = Proposition 6 implies that Y|X can be exhaustively recov- symmetric about, as does MFC with σW 0. In general, we do 2 ered by the column space of MFC. In this article, for conve- not use a degenerate weight function for ω. When σW > 0, the nience, both K(ω) and k(t) are chosen to be the Gaussian func- entire central subspace can be successfully recovered, as stated = 2 −p/2 {− 2 2 } in Proposition 6. tions, which are K(ω) (2πσW) exp ω /2σW and k(t) = (2πσ2)−1/2 exp{−t2/2σ 2}, where σ 2 and σ 2 are two T T W T 4. ESTIMATION OF CANDIDATE MATRICES constants that control how the weights are assigned to different φ(ω, t). Furthermore, the Gaussian weight functions lead to an In this section we derive the estimates of MFC and MFM and explicit expression of MFC, discuss their asymptotic properties. We assume that the dimen- 2 2   sionality q and the tuning parameters σW and σT are known, = MFC E(U1,V1),(U2,V2) JFC((U1, V1), (U2, V2)) , (19) and discuss their selection in the next section. Zhu and Zeng: Fourier Subspace Estimation in Regression 1643

Let {(xi, yi)}1≤i≤n be a random sample of (X, Y). Without where loss of generality, we assume that E[X]=0 and cov[X]=Ip. JFCN((u1, v1), (u2, v2)) We first consider MFC. According to (19), MFC is the expecta-   2 2 tion of JFC over (U1, V1) and (U2, V2) that are independent and σ σ = exp − W u 2 − T v2 identically distributed as (X, Y).LetF(x, y) be the cumulative 2 12 2 12 distribution function of (X, Y). Then MFC can be expressed as    × 2 + + 2 − 2 τ σWIp (u1 σWu12)(u2 σWu12) . (25) MFC = JFC((u1, v1), (u2, v2)) dF(u1, v1) dF(u2, v2). Because Mˆ is a V-statistic, it can be expanded as the sum (21) FCN of a U-statistic and a low-order term (Lee 1990). The asymp- ˆ With the given sample {(xi, yi)}1≤i≤n, a natural estimate of totic distribution of MFCN can be obtained by the theory of F(x, y) is its empirical distribution, U-statistics. n 1 Theorem 1. Suppose that X follows the standard multi- F (x, y) = I[ ≤ ≤ ], n n xi x,yi y variate normal distribution and that the covariance matrix of i=1 vec(JFCN((U1, V1), (U2, V2))) exists. As n →∞, where I[·] is the indicator function. Therefore, a proper estimate n 1 − of MFC is derived by replacing F(u1, v1) and F(u2, v2) in (21) Mˆ = M + J(1) (x , y ) − 2M + o n 1/2 , FCN FCN n FCN i i FCN p with Fn(u1, v1) and Fn(u2, v2), which is i=1 n n 1 where Mˆ = J ((x , y ), (x , y )). FC 2 FC i i j j (22)  n = = (1) i 1 j 1 J (x, y) = E(U ,V ) JFCN((x, y), (U2, V2)) FCN 2 2  The explicit expression of JFC((xi, yi), (xj, yj)) is given in (20). + J ((x, y), (U , V ))τ . ˆ FCN 2 2 Note that to make MFC a legitimate estimate, we need to esti- = ∂ ≤ ≤ (1) mate G(xi) ∂x log fX(xi) for 1 i n. Let FCN be the covariance matrix of vec(JFCN(X, Y)). Then If we know that the distribution of X belongs to a cer- √ ˆ L tain parametric family [i.e., fX(x) = f0(x; θ), where f0(·) is of n vec(MFCN) − vec(MFCN) −→ N(0, FCN), as n →∞. known form and θ is an unknown parameter], then the max- ˆ In Theorem 1, vec is an operator that transforms a matrix imum likelihood estimate θ can be calculated using {xi}1≤i≤n, ∂ ; ˆ ; ˆ to a vector by stacking up all of its columns. For instance, if and G(xi) can be estimated by ∂x f0(xi θ)/f0(xi θ).IffX(x) be- M = (m1,...,mk) is a p × k matrix and the mi’s are the col- longs to the family of elliptically contoured distributions [i.e., = τ τ τ × 2 umn vectors, then vec(M) (m1,...,mk ) is a kp 1 vector. fX(x) = g(x ), where g(·) is an unknown function], then, us- ˆ Theorem 1 asserts that MFCN converges to MFCN at the rate ing {xi}1≤i≤n, a one-dimensional nonparametric procedure √ ˆ  2 ˆ  2 of n, which implies that the eigenvalues and eigenvectors of can be used to obtain g( x ) and g ( x ), which are the es- ˆ  ∂  2 MFCN converge to those of MFCN at the same rate. timates of g and its derivative g , and ∂x log(g( x )) can be −  Although the explicit expression of M in (23) is obtained estimated by 2xgˆ 1(x2)gˆ (x2). In general, if there is no FCN under the normality assumption, in fact it remains as a candi- prior knowledge about fX(x), then nonparametric density esti- date matrix for S | under a weaker condition, as stated in the mators can be used to estimate f (x) and ∂ f (x), and also to Y X X ∂x X following proposition. obtain an estimate of G(xi). The general case is currently under = investigation. Proposition 9. Suppose that B (β1,...,βq) is an ortho- In the rest of this article, we focus on the case where X fol- S ˜ = normal basis of Y|X and that B (βq+1,...,βp) is an ortho- p τ lows a multivariate normal distribution only. 
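Under the normality assumption, (24) with kernel (25) (and likewise the analogous kernel for the central mean subspace given later in this section) is a V-statistic over all pairs of standardized observations and can be coded directly. The sketch below is our own vectorized rendering of these formulas; the function names, the vectorization, and the default tuning values σ²_W = .1 and σ²_T = 1 are our choices:

```python
import numpy as np

def m_fcn_hat(X, Y, sw2=0.1, st2=1.0):
    """V-statistic estimate (24)-(25) of the FC candidate matrix under normality.
    X: (n, p) standardized predictors; Y: (n,) standardized response."""
    n, p = X.shape
    D = X[:, None, :] - X[None, :, :]                      # u_i - u_j, shape (n, n, p)
    w = np.exp(-0.5 * sw2 * (D ** 2).sum(-1)
               - 0.5 * st2 * (Y[:, None] - Y[None, :]) ** 2)
    A = X[:, None, :] + sw2 * D                            # u_i + sigma_W^2 (u_i - u_j)
    B = X[None, :, :] - sw2 * D                            # u_j - sigma_W^2 (u_i - u_j)
    outer = np.einsum('ij,ija,ijb->ab', w, A, B)           # sum_ij w_ij A_ij B_ij'
    return (sw2 * w.sum() * np.eye(p) + outer) / n ** 2

def m_fmn_hat(X, Y, sw2=0.1):
    """Analogous V-statistic for the central mean subspace (FM), with the
    response pair replacing the Gaussian weight in t."""
    n, p = X.shape
    D = X[:, None, :] - X[None, :, :]
    w = np.outer(Y, Y) * np.exp(-0.5 * sw2 * (D ** 2).sum(-1))
    A = X[:, None, :] + sw2 * D
    B = X[None, :, :] - sw2 * D
    outer = np.einsum('ij,ija,ijb->ab', w, A, B)
    return (sw2 * w.sum() * np.eye(p) + outer) / n ** 2
```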
Normality is a normal basis of the complementary space of SY|X in R .IfB X common assumption in regression and is at least approximately and B˜ τ X are independent of each other and B˜ τ X follows the valid in many applications. For applications where the nor- standard normal distribution, then S(MFCN) ⊆ SY|X. mality assumption is not valid, variable transformation or data resampling can be considered to alleviate the violation of nor- We call the assumption in Proposition 9 the weak normal- mality, so that the methods developed herein can still be ap- ity condition. Proposition 9 implies that MFCN can be used to plied. (See Brillinger 1994 for the resampling method and Cook recover the central subspace under the weak normality assump- and Nachtsheim 1994 for the Voronoi weighting method.) tion, although it does not guarantee that the central subspace Assume that X follows the multivariate normal distribution can be recovered exhaustively as under the normality condition. τ with mean 0 and covariance matrix Ip. Then G(x) =−x.For Note that the distribution of B X can be arbitrary. The weak ˆ clarity, we use JFCN, MFCN, and MFCN to denote JFC, MFC, normality condition represents a situation in practice where ˆ and MFC under the normality assumption. Hence Y depends only on a subset of predictors and the rest of the   predictors are random noises following normal distributions. MFCN = E(U ,V ),(U ,V ) JFCN((U1, V1), (U2, V2)) (23) 1 1 2 2 For the central mean subspace SE[Y|X] and its candidate and matrix MFM, similar discussions can be used to derive the n n estimates of M under various conditions on X. In what fol- 1 FM Mˆ = J ((u , v ), (u , v )), (24) lows, we report only the results under the normality assump- FCN n2 FCN i i j j ˆ i=1 j=1 tion on X.Again,weuseJFMN, MFMN, and MFMN for JFM, 1644 Journal of the American Statistical Association, December 2006

ˆ ˆ −1/2 MFM, and MFM under the normality condition. Recall that 1. Standardize data as follows: x˜i =  (xi − x¯) and y˜i = ˆ G(x) =−x; therefore, (yi −¯y)/sy, where x¯ and  are the sample mean and the   sample covariance matrix of the xi’s and y¯ and sy are the MFMN = E(U ,V ),(U ,V ) JFMN((U1, V1), (U2, V2)) , (26) 1 1 2 2 sample mean and standard deviation of the yi’s. ˆ ˆ and the estimate of MFMN is 2. Calculate MFCN (or MFMN) using the standardized data { x˜ y˜ } ≤ ≤ n n ( i, i) 1 i n. 1 ˆ ˆ Mˆ = J ((x , y ), (x , y )), (27) 3. Obtain the spectral decomposition of MFCN (or MFMN), FMN 2 FMN i i j j ˆ n = = that is, the eigenvectorÐeigenvalue pairs (eˆ1, λ1),..., i 1 j 1 ˆ ˆ ˆ (eˆp, λp) with λ1 ≥···≥λp. where ˆ ˆ ˆ −1/2 ˆ −1/2 4. Then SY|X(or SE[Y|X]) = span{ eˆ1,..., eˆq}.
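Steps 1-4 above amount to a few lines of linear algebra once a candidate-matrix routine is available. The following wrapper is our own sketch; it assumes the hypothetical m_fcn_hat function from the earlier sketch (an implementation of (24)-(25)) and simply standardizes the data, forms the candidate matrix, and maps the leading eigenvectors back to the original predictor scale:

```python
import numpy as np
from numpy.linalg import eigh

def fc_directions(X, Y, q, sw2=0.1, st2=1.0):
    """Steps 1-4 of the FC procedure: returns a p x q basis estimate of the
    central subspace expressed on the original scale of X."""
    x_bar = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    vals, vecs = eigh(Sigma)                          # Sigma = V diag(vals) V'
    Sigma_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T
    Xs = (X - x_bar) @ Sigma_inv_half                 # step 1: standardize
    Ys = (Y - Y.mean()) / Y.std()
    M = m_fcn_hat(Xs, Ys, sw2, st2)                   # step 2: candidate matrix
    M = 0.5 * (M + M.T)                               # guard against round-off asymmetry
    evals, evecs = eigh(M)                            # step 3: spectral decomposition
    top = evecs[:, np.argsort(evals)[::-1][:q]]       # leading q eigenvectors
    return Sigma_inv_half @ top                       # step 4: back-transform
```

Replacing m_fcn_hat with the FM kernel gives the corresponding FM procedure for the central mean subspace.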

JFMN((u1, v1), (u2, v2))   The foregoing procedure is fairly standard in sufficient di- σ 2 mension reduction, except that the estimated candidate matri- = v v exp − W u 2 ˆ ˆ 1 2 2 12 ces MFCN (or MFMN) based on the Fourier method is used in   step 2. For convenience, in the rest of the article we refer to the × 2 + + 2 − 2 τ ˆ σWIp (u1 σWu12)(u2 σWu12) . (28) procedure with MFCN simply as FC, representing the Fourier ˆ method for estimating the central subspace, and the procedure The asymptotic normality of MFMN is established in the follow- ˆ with MFMN as FM, representing the Fourier method for esti- ing theorem. 2 mating the central mean subspace. Two parameters, q and σW, Theorem 2. Suppose that X follows the standard multi- need to be specified in FM, whereas in FC, an additional para- 2 variate normal distribution and that the covariance matrix of meter σT must be chosen. The parameter q is different from the vec(JFMN((U1, V1), (U2, V2))) exists. As n →∞, other two in that the former is a model parameter and the latter n are tuning parameters. The foregoing procedures assume that 1 − Mˆ = M + J(1) (x , y ) − 2M + o n 1/2 , these parameters are known. Next, we discuss the determination FMN FMN n FMN i i FMN p i=1 of dimensionality q and the choice of the tuning parameters. where 5.2 Determination of Dimensionality q  (1) J (x, y) = E(U ,V ) JFMN((x, y), (U2, V2)) S S FMN 2 2  In practice, the dimension of Y|X (or E[Y|X]) is unknown + J ((x, y), (U , V ))τ . and must be inferred from data. One informal method is to gen- FMN 2 2 ˆ ˆ erate the scree plot of the eigenvalues of MFCN (or MFMN) (1) Let FMN be the covariance matrix of vec(JFMN(X, Y)). Then as in principal components analysis, and look for an “elbow” √ pattern in the plot. The dimension q is chosen to be the num- ˆ L n vec(MFMN)−vec(MFMN) −→ N(0, FMN), as n →∞. ber of dominant eigenvalues. Several more formal methods for choosing q have been proposed in the literature. For example, 5. IMPLEMENTATION Li (1991, 1992) proposed using a chi-squared statistic to se- In this section we describe the procedures for estimat- quentially test q = 0, 1, 2, and so on, whereas Cook and Yin ing SY|X and SE[Y|X] using the estimated candidate matrices (2001) advocated using permutation tests for the same purpose. ˆ ˆ Recently, Ye and Weiss (2003) proposed using the bootstrap MFCN and MFMN. We also discuss the determination of dimen- 2 2 procedure to determine q. Next, we follow the basic idea of Ye sionality q and the choice of tuning parameters σW and σT . and Weiss (2003) to develop the bootstrap procedure for choos- 5.1 Algorithms ing q in FC (or FM). First, we introduce a distance measure for two subspaces If the dimensionality of S | (or S [ | ]) is known to be q, Y X E Y X of Rp, then use it to define the variability of an estimated sub- then the first q eigenvectors of M (or M )formanor- FCN FMN space. Let A and B be two p × q matrices of full column rank thogonal basis of SY|X (or SE[Y|X]). From the preceding sec- ˆ ˆ and S(A) and S(B) be the column spaces of A and B.Let tions, it is known that the eigenvectors of MFCN (or MFMN) τ − τ τ − τ √ PA = A(A A) A and PB = B(B B) B be the projection converge to those of M (or M ) at the rate of n. There- − FCN FMN matrices onto S(A) and S(B), where “ ” represents the gener- fore, we can use the first q eigenvectors of Mˆ (or Mˆ ) FCN FMN alized inverse of a matrix. 
We define√ the trace correlation r be- to generate a linear subspace and use it as an estimate of S | Y X tween S(A) and S(B) to be r = 1 tr(P P ). It can be verified S Sˆ Sˆ q A B (or E[Y|X]), which is denoted by Y|X (or E[Y|X]). In practice, that 0 ≤ r ≤ 1, with r equal to 0 if S(A) and S(B) are perpen- X does not necessarily have mean 0 and identity covariance ma- dicular to each other and equal to 1 if S(A) and S(B) are iden- trix, so the data first need to be standardized. Standardization tical. The larger r is, the closer S(A) is to S(B). Hence we use generally does not affect the convergence rate of the estimates, d = 1 − r as a metric of the distance between S(A) and S(B). but may change their asymptotic variances. The procedure for ˆ ˆ (See Ye and Weiss 2003 for more discussion.) deriving SY|X (or SE[Y|X]) is summarized as follows: ˆ Given {(xi, yi)}1≤i≤n,letSq be the estimate of S for a fixed q, 2 2 2 S S S Sˆ 0. Specify parameters q, σW, and σT (σT is not needed when where represents Y|X or E[Y|X]. The variability of q can estimating SE[Y|X]). be evaluated by the following bootstrap procedure: Zhu and Zeng: Fourier Subspace Estimation in Regression 1645
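The distance d = 1 − r just defined is straightforward to compute from any two basis matrices, and it is the quantity averaged in the bootstrap procedure that follows. A small helper of our own (the function name and the use of a pseudoinverse for the generalized inverse are our choices):

```python
import numpy as np

def subspace_distance(A, B):
    """d = 1 - sqrt(tr(P_A P_B) / q) for p x q full-column-rank matrices A and B."""
    q = A.shape[1]
    PA = A @ np.linalg.pinv(A.T @ A) @ A.T      # projection onto span(A)
    PB = B @ np.linalg.pinv(B.T @ B) @ B.T      # projection onto span(B)
    r = np.sqrt(np.trace(PA @ PB) / q)
    return 1.0 - r
```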

1. Randomly resample from {(xi, yi)}1≤i≤n with replacement to patterns with high frequencies, which may not be as impor- to generate N bootstrap samples each of size n, and denote tant as the patterns with low frequencies and are sensitive to { ( j) ( j) } ≤ ≤ 2 the jth sample by (xi , yi ) 1≤i≤n for 1 j N. noise. Therefore, large σW makes FC unstable, especially when 2 2. Based on each bootstrap sample [e.g., the jth sample the sample size is moderate. On the other hand, if σW is too { ( j) ( j) } S (xi , yi ) 1≤i≤n], derive the estimate of and denote small (e.g., close to 0), then the weight assigned to φ(ω, t) is ˆ( j) almost 0 except for ω in a small neighborhood of the origin. it by Sq . ˆ( j) ˆ By Proposition 8, FC is close to SIR when σ 2 is small and 3. Calculate the distance between Sq and Sq and denote it W ( j) may miss some symmetric directions. Hence we need to use a by dq . 2  value of σW that is neither too large nor too small. Based on 4. Calculate d¯(q) = 1 N d( j), which is the average dis- N j=1 q our empirical study, we have found that σW = 1/3 (or, equiv- Sˆ( j) Sˆ ≤ ≤ ¯ 2 = tance between q and q for 1 j N.Weused(q) as alently, σW .1) generally works well for standardized data. ˆ 2 = a measure of the variability of Sq. Similarly, for FM, we also recommend using σW .1 in calcu- ˆ p lating M . Repeating this procedure for q = 1,...,p results in {d¯(q)} , FMN q=1 The interpretation of σ 2 is slightly different from that of σ 2 . which are the variabilities of Sˆ for 1 ≤ q ≤ p. T W q Theoretically, we use k(t) to pool the central mean subspaces Suppose that the true dimensionality of S is equal to q0. S [ | ] together to recover the entire S | . When σ 2 = 0, When q < q , Sˆ estimates a q-dimensional proper subspace E T(Y,t) X Y X T 0 q S(M ) degenerates to the null space. When σ 2 is too large, of S. Because there are infinitely many such subspaces, Sˆ FCN T q a relatively large amount of weight is assigned to the central is expected to demonstrate large variability, and the smaller q mean subspaces S [ | ] with large t, which corresponds is, the larger the variability [or d¯(q)] is. When q is slightly E T(Y,t) X to features of the response with high frequencies. This would larger than q , Sˆ estimates S ⊕ S˜, where S˜ is a (q − q )- 0 q 0 make FC unstable and sensitive to noise. Hence we need to use dimensional space orthogonal to S. Because S˜ can be arbitrary, a value of σ 2 that is neither too large nor too small. Our exten- Sˆ is also expected to show large variability, that is, large d¯(q). T q sive empirical study suggests that σ 2 = 1 is a good choice for Sˆ T When q is growing larger and closer to p, q estimates almost standardized response. Rp Sˆ the whole space , so the variability of q starts to decrease The foregoing recommendations for σ 2 and σ 2 are based on ˆ ˆ( j) W T and eventually becomes 0. When q = q0, Sq and Sq estimate heuristics and empirical study. A more formal approach is to ˆ the same fixed space S, and hence the variability of Sq is ex- use the bootstrap procedure introduced in the previous section. ¯ 2 pected to be small. In summary, d(q) demonstrates the follow- We use the choice of σW as an illustration. First, we choose m 2 2 ing overall trend. It decreases for 1 ≤ q ≤ q0, then increases candidate values σ ,...,σ that are equally spaced in a given ¯ 1 m for q ≤ q ≤ q∗, where q∗ is a maximizer of d(q), and then de- 2 = ¯ 2 0 interval. For any σi , i 1,...,m, calculate d(σi ) using a boot- creases to 0 for q∗ ≤ q ≤ p. 
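A compact version of this bootstrap, again our own sketch built on the hypothetical fc_directions and subspace_distance helpers from the earlier sketches, computes d̄(q) for every candidate dimension; the chosen dimension is then read off the plot of d̄(q) against q as described next:

```python
import numpy as np

def dimension_variability(X, Y, n_boot=100, sw2=0.1, st2=1.0, seed=0):
    """d_bar(q) for q = 1, ..., p via the bootstrap procedure of Section 5.2."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    d_bar = np.zeros(p)
    for q in range(1, p + 1):
        B_full = fc_directions(X, Y, q, sw2, st2)       # estimate from the full data
        dists = []
        for _ in range(n_boot):
            idx = rng.integers(0, n, size=n)            # resample with replacement
            B_boot = fc_directions(X[idx], Y[idx], q, sw2, st2)
            dists.append(subspace_distance(B_boot, B_full))
        d_bar[q - 1] = np.mean(dists)
    return d_bar              # plot against q to form the dimension variability plot
```

The same loop, with q held fixed and σ²_W (or σ²_T) varied over a grid, yields the d̄(σ²_W) curve used for tuning-parameter selection in Section 5.3.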
We call q0 the valley and q∗ the strap procedure similar to that described in the previous section. peak of this trend. In real data analysis, it is possible to have lo- Note that the dimensionality q is assumed to be fixed; instead, cal fluctuations that are not consistent with the overall trend. To σ 2 is varied here. The optimal σ 2 is chosen to be the σ 2 that ¯ W W i choose q0,weplotd(q) against q and look for the overall trend minimizes d¯(σ 2). The optimal σ 2 can be obtained using a sim- in the plot, ignoring possible local deviations; then the valley is i T ¯ ilar bootstrap procedure. chosen to be q0. The plot of d(q) versus q is called the dimen- In practice, some applications may require optimally choos- sion variability plot. Examples on how to use the dimension 2 2 ing q, σW, and σT . We recommend the following procedure to variability plot to choose q0 are given in Section 6. 2 = use the bootstrap repeatedly. First, q is determined with σW .1 2 2 and σ 2 = 1.0 and denoted by q ; second, σ 2 is chosen with 5.3 Choice of σW and σT T 1 W σ 2 = 1.0 and q and denoted by σ 2 ;third,σ 2 is selected with The tuning parameters σ and σ may be considered the T 1 W1 T W T σ 2 and q and denoted by σ 2 . The steps are iterated until bandwidths of the weight functions K(ω) and k(t). However, W1 1 T1 the parameters are stabilized. Our experience suggests that one the selection of σW and σT is fundamentally different from the selection of bandwidths for kernels used in nonparametric func- iteration is usually sufficient. The second and third steps can be combined if a two-dimensional grid is used for selecting tion estimation. In the latter case, the bandwidth of a kernel 2 2 needs to decrease to 0 to ensure the asymptotic consistency of σW and σT simultaneously. the estimated function as the sample size goes to infinity. In ˆ ˆ 6. EXAMPLES FC and FM, the consistency of MFMN and MFCN hold for any 2 2 fixed positive σW and σT ; see Propositions 3 and 6. Neverthe- In this section we present four examples to demonstrate 2 2 less, given a finite sample, the choice of σW and σT affects the the performance of FC and FM and compare them with variability of the resulted estimates, so they need to be cho- other existing methods. The first three examples are based τ sen carefully. In what follows, we first discuss the heuristics for on synthetic models where X = (X1,...,X10) denotes a ran- 2 2 R10 choosing σW and σT and give their recommended values, then dom vector in , ε denotes a random error, Xi’s and ε = briefly introduce a bootstrap procedure for their optimal selec- are independent and identically distributed as N(0, 1), β1 τ = τ tion again following the idea of Ye and Weiss (2003). (1, 1, 1, 1, 0, 0, 0, 0, 0, 0) and β2 (0, 0, 0, 0, 0, 0, 1, 1, 1, 1) . 2 2 We first discuss σW. When σW is too large, φ(ω, t) with In the last example, we apply FC to a real dataset called 1985   2 large ω will receive much larger weight than when σW is Automobile Data to study how the price of car depends on its small. As explained earlier, φ(ω, t) with large ω corresponds features. 1646 Journal of the American Statistical Association, December 2006

= τ 2 + τ + Example 1. Consider the model Y (β1X) /(3 (β2X and 2)2) + .2ε. In this model the central subspace and the cen- βˆ τ = (.0381,.2085, −.0590,.0025, −.1442, tral mean subspace are identical, that is, SY|X = SE[Y|X] = 2 S S = S (β1, β2).Let (β1, β2); thus, no matter whether a −.0137,.4853,.5143,.5186,.3658). method is targeting the central mean subspace or the central Sˆ = S ˆ ˆ S subspace, it can be used to estimate S. The dimension of S, The distance between Y|X (β1, β2) and the true Y|X = 2 = is .04388. q 2, is assumed to be known. We compare FC (σW .1, σ 2 = 1.0) and FM (σ 2 = .1) with other five existing methods To compare FC with SIR (five slices) and SAVE (five slices), T W = including SIR (five slices), SAVE (five slices), y-pHd, r-pHd, we draw 500 samples of size n 500 from the model, apply and IHT as follows. We randomly generate 500 samples of size the methods to the samples to obtain the estimated central sub- n = 500 from the model. For each sample, we apply the seven spaces, and calculate the distances between the estimated cen- S methods listed earlier one by one to obtain the estimates of S. tral subspaces and Y|X. The distances are summarized by the Then we calculate the distances between these estimates and S. side-by-side boxplots included in Figure 2(b). The boxplots in- For each method, we generate a boxplot for the 500 distances. dicate that FC outperforms both SIR and SAVE in this example. This procedure results in seven boxplots that are displayed side Example 3. Consider the following model with discrete re- by side in Figure 1. From Figure 1, we conclude that both FC sponse: Y = I[ τ + ] + 2I[ τ + ], where I[·] denotes the β1 X σε>1 β2 X σε>0 and FM outperform the other methods. IHT has similar perfor- indicator function and σ = .2. So the possible values for Y mance as FC and FM, but demonstrates slightly larger variabil- are 0, 1, 2, and 3. In this example we consider only the cen- ity. A possible explanation for IHT’s good performance is that tral subspace, which is spanned by β1 and β2. It is clear that it is carefully designed to capture the monotone and U-shaped Y is not a continuous function of X. In the derivation of FC, trends in the function that links E[Y|X] and X. a few differentiability conditions are required. But in the final = τ + formula for MFCN, no differentiation is involved. So we expect Example 2. Consider the heteroscedastic model Y (β1X) τ FC would still work in this example. In fact, the involved dif- 4(β2X)ε. For this model, the central mean subspace and the S = S ferentiability in deriving FC is required only in a generalized central subspace are different, because E[Y|X] (β1) and 2 = S = S S sense. We draw a sample of 500 points and apply FC (σW .1 Y|X (β1, β2). Clearly, E[Y|X] is only a proper subspace 2 2 and σ = 3.0) to estimate SY|X.Hereσ is chosen to be larger of SY|X, so we focus our attention on SY|X and the meth- T T than the usual recommended value, because the discontinuity in ods aimed at estimating S | . We draw a sample of 500 data Y X the model represents a feature with high frequency and it could points and apply FC (σ 2 = .1 and σ 2 = 1.0) to estimate S | . W T Y X not be well captured by the transformed response exp{ıty} with Only the first two eigenvalues of the estimated candidate ma- ˆ ˆ small t. The first two eigenvalues of MFCN are relatively large, trix MFCN are relatively large. We use the bootstrap procedure indicating that the dimension of S | is 2. 
This is further con- to generate the dimension variability plot based on 500 boot- Y X firmed by the dimension variability plot included in Figure 3(a). strap samples, which is shown in Figure 2(a). Using the rule Therefore, the estimated central subspace is spanned by the first described in Section 5.2, it confirms that the dimension of SY|X ˆ two eigenvectors of MFCN, which are is equal to 2. Therefore, SY|X is estimated by the space spanned ˆ ˆ τ = − the first two eigenvectors of MFCN, which are β1 (.0603,.0927,.0567,.0521, .0432, ˆ τ = − −.0106,.4964,.3776,.4923,.4571) β1 (.5012,.4414,.4869,.4015, .0205, .2531, −.1111,.0366, −.0709,.0266) and ˆ τ = β2 (.5274,.5209,.5306,.4528,.0580, .1041, −.0735, −.0771, −.0729, −.0819). S ˆ ˆ S The distance between (β1, β2) and the true subspace Y|X is d = .007809. We use the same procedure as in Example 2 to compare 2 = 2 = FC (σW .1 and σT 3.0) with SIR (four slices) and SAVE (four slices). The three boxplots corresponding to FC, SIR, and SAVE are shown in Figure 3(b). In this example, the perfor- mances of the three methods are comparable, with SIR slightly better than the other two. One explanation for the better per- formance of SIR is that SIR can be considered an extension of linear discriminant analysis. Example 4. In this example we use FC to analyze a real dataset known as 1985 Automobile Data. The objective is to Figure 1. Side-by-Side Boxplots for the Performance Comparison Between FC, FM, SIR, SAVE, IHT, y-pHd, and r-pHd in Example 1. The study how the price of car depends on its features. The dataset y-axis represents the distance between an estimated subspace and the is available at the UCI Machine Learning Repository ( ftp:// ftp. true subspace. Each boxplot is based on 500 samples. ics.uci.edu/pub/machine-learning-databases/autos). Originally, Zhu and Zeng: Fourier Subspace Estimation in Regression 1647
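As a quick end-to-end check in the spirit of Example 1, the following simulation sketch (ours, not the article's code; it reuses the hypothetical fc_directions and subspace_distance helpers and the settings n = 500, σ²_W = .1, σ²_T = 1 from the text) draws data from the Example 1 model and measures how close the FC estimate comes to span(β1, β2):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 500, 10
b1 = np.array([1.0] * 4 + [0.0] * 6)                  # beta_1
b2 = np.array([0.0] * 6 + [1.0] * 4)                  # beta_2

X = rng.standard_normal((n, p))
eps = rng.standard_normal(n)
Y = (X @ b1) ** 2 / (3 + (X @ b2 + 2) ** 2) + 0.2 * eps

B_hat = fc_directions(X, Y, q=2, sw2=0.1, st2=1.0)
B_true = np.column_stack([b1, b2])
print(subspace_distance(B_hat, B_true))               # small values = accurate recovery
```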


Figure 2. Dimension Variability Plot of d(q)¯ versus q Based on 500 Bootstrap Samples in Example 2 (a) and Side-by-Side Boxplots for the Performance Comparison Between FC, SIR, and SAVE in Example 2 (b). The y-axis represents the distance between an estimated subspace and the true subspace. Each boxplot is based on 500 samples. there were 205 instances (or cases) and 26 attributes (or vari- better choices. Using FC with the tuned parameters, we cal- ˆ ables) in the dataset, and there were some missing values. Be- culate MFCN and derive its spectral decomposition. The first ˆ ˆ ˆ cause most current dimension-reduction methods, including two eigenvalues of MFCN, λ1 = .1071 and λ2 = .0381, are FC, can only handle continuous variables, we remove eight dominant. The bootstrap procedure is run with 1,000 boot- categorical variables from the dataset. We remove one continu- strap samples, and the dimension variability plot is generated ous variable that contains many missing values. For simplicity, and shown in Figure 5(a). The plot suggests that q = 2, so we ˆ we also discard the instances with missing values. The result- use the first two eigenvectors of MFCN to derive the estimate Sˆ = S ˆ ˆ ing dataset contains 195 instances and 14 variables: Wheelbase Y|X (β1, β2), where (x1), Length (x2), Width (x3), Height (x4), Curb weight (x5), En- ˆ τ = − − β1 (.05, .19,.08,.09,.75, .24, gine size (x6), Bore (x7), Stroke (x8), Compression ratio (x9), Horsepower (x10), Peak rpm (x11), City mpg (x12), Highway 0, −.11,.17,.51,.04, −.08,.12) mpg (x ), and Price (y). We use the logarithm of Price [log(y)] 13 and as the response and x1 to x13 as the predictors. Before we apply ˆ τ = − FC to the data, we standardize each predictor using its mean β2 (.08, .38,.09,.08,.03,.70, and standard deviation. − − − − − 2 = 2 = .17, .20, .06, .08,.06,.49, .17). We first use FC with σW .1 and σT 1.0 and the boot- ˆ τ strap procedure to choose the dimension of the central sub- Figure 4 includes the projection plots of log (y) versus β1x (a) ˆ τ space. The results show that the dimension should be two. To and log (y) versus β2x (b), with 4(a) displaying a strong linear obtain sharper views, we fix q = 2 and use the bootstrap pro- relationship and 4(b) displaying a parabolic relationship. ˆ cedure mentioned at the end of Section 5 to tune the parame- Based on the relative magnitudes of the components, β1 is 2 2 2 = 2 = ters σW and σT . We have found that σW .08 and σT .6are determined mainly by Curb weight (x5) and Horsepower (x10),


Figure 3. Dimension Variability Plot Based on 500 Bootstrap Samples in Example 3 (a) and Side-by-Side Boxplots for the Performance Compar- ison Between FC, SIR, and SAVE in Example 3 (b). The y-axis represents the distance between an estimated subspace and the true subspace. Each boxplot is based on 500 samples. 1648 Journal of the American Statistical Association, December 2006


Figure 4. Plot of log(y) versus $\hat{\beta}_1^{\tau} x$ (a) and Plot of log(y) versus $\hat{\beta}_2^{\tau} x$ (b) in Example 4.

ˆ ˆ τ ˆ τ followed by the other variables, and β2 is determined mainly plots β1x versus β2x, which exhibits a parabolic relationship by Engine size (x6) and City mpg (x12). The strong linear re- between these two directions. (See Li 1997 for a detailed dis- ˆ τ lationship between log(y) and β1x indicates that the price of cussion of nonlinear confounding between predictors.) SIR and a car can be well predicted by Curb weight and Horsepower. SAVE are also used to analyze the data. The former gives sim- ˆ τ The parabolic relationship between log(y) and β2x reveals that ilar results as FC, whereas the latter fails to recover interesting the price of a car also depends on Engine size and City mpg, directions in this particular example. but in a slightly more complicated manner. After checking the 7. CONCLUSION makes and styles of the cars reported in the original data, we have found that the points in the upper branch of the parabola Using the Fourier transform, we have derived two candidate represent high-end cars, such as the sedans or the convertibles matrices, MFC and MFM, for recovering the entire central sub- of, say, Mercedes-Benz, BMW, Jaguar, and Porsche, whereas space and central mean subspace. Under further distributional the points in the lower branch represent lower-end cars, such as assumptions, we have derived explicit estimates of the candi- the hatchbacks of Honda, Chevrolet, Plymouth, Subaru, and so date matrices that lead to estimates of the central and central ˆ τ on. For the high-end cars, the price increases as β2x increases, mean subspaces. Selection of the tuning parameters and deter- ˆ τ whereas for the lower-end cars, the price decreases as β2x in- mination of dimensionality have been discussed. Synthetic and creases. The two relationships above further imply that there ex- real examples were used to demonstrate the performance of the ˆ τ ˆ τ ists a nonlinear confounding between β1x and β2x. Figure 5(b) proposed methods in comparison with other methods. Use of


2 = 2 = Figure 5. The Dimension Variability Plot Generated From the Bootstrap Procedure With 1,000 Bootstrap Samples, σW .08 and σT .6 (a) and ˆ τ ˆ τ Plot of β1 x versus β2 x That Shows a Nonlinear Relationship (b). Both plots are for Example 4. Zhu and Zeng: Fourier Subspace Estimation in Regression 1649 the Fourier transform may provide a different approach to di- 3. The proof is standard and has been given by Folland (1992, mension reduction in general regression, which is expected to pp. 217, 243). generate more interesting results in the future. Currently we are 4. The proof is standard and has been given by Folland (1992, focused on two issues: (1) to generalize the results of this arti- pp. 222, 244). cle to the case without distributional assumptions imposed on X Proof of Proposition 2 and (2) to develop dimension reduction techniques for regres- Because ψ(ω) = a(ω) + ıb(ω),wehave sion with multiple responses. Because our approach does not involve the slicing of the response and the partition of data, it ψ(ω)ψ¯ (ω)τ =[a(ω)a(ω)τ + b(ω)b(ω)τ ] may be more appropriate than SIR and SAVE for the latter case. + ı[b(ω)a(ω)τ − a(ω)b(ω)τ ]. APPENDIX: PROOFS  By (9), we know that [b(ω)a(ω)τ − a(ω)b(ω)τ ] dω = 0. Therefore, Proof of Equation (5)  ∗ = −p ¯ τ Because A = (α1, α2,...,αq) and the columns α1,...,αq form a MFM (2π) ψ(ω)ψ(ω) dω basis for SE[Y|X] with q dimensions, to prove (5), it is sufficient to  ∈ Rp τ = τ ∂ = −p τ τ show that for β , β A 0 is equivalent to β ∂x m(x) 0forall = (2π) [a(ω)a(ω) + b(ω)b(ω) ] dω. x ∈ supp(X). ∂ = ∂ ∗ By the chain rule of differentiation, ∂x m(x) A ∂u g(u). Thus Clearly, MFM is a real nonnegative definite matrix. For any p-dimen- βτ A = 0 immediately implies that βτ ∂ m(x) = βτ A ∂ g(u) = 0for τ ∗ = τ = τ = ∂x ∂u sional vector β, β MFMβ 0 is equivalent to β a(ω) β b(ω) 0 all x ∈ supp(X). ∗ S ∗ for all ω, which implies that the column space of MFM, (MFM),is Next, we show that the converse is also true by contradiction. the same as span{a(ω), b(ω) : ω ∈ Rp}. By the first property in Propo- Assume that there exists β ∈ Rp such that βτ ∂ m(x) = 0forall ∗ 0 0 ∂x sition 1, we have S(M ) = SE[Y|X]. ∈ τ = = τ  τ  FM x supp(X),butβ0A 0.Letξ1 A β0/ A β0 . Clearly, ξ 1 is a τ ∂ = τ ∂ = Proof of Proposition 3 q-dimensional nonzero vector. So β0 ∂x m(x) β0A ∂u g(u) 0 τ ∂ = means that ξ1 ∂u g(u) 0, which implies that the directional derivative Because = τ of g as a function of u (u1,...,uq) along ξ1 is always 0. This fur-  + = τ τ ther implies that g(u) is a constant along ξ1,thatis,g(u tξ 1) g(u) MFM = [a(ω)a(ω) + b(ω)b(ω) ]K(ω) dω, ∈ R for t . We can expand ξ 1 by bringing in ξ2,...,ξq to form an q τ orthonormal basis for R .LetD = (ξ ,...,ξ q) and v = D u = p 1 MFM is a nonnegative definite matrix. Because K(ω)>0forω ∈ R , τ ∂ τ ∂ (v ,...,vq) ;theng(u) = g(Dv) and g(Dv) = ξ g(u)= 0. ∈ Rp 1 ∂v1 1 ∂u for β we have Hence g does not depend on v1, so we can rewrite g(u) = g(Dv) = ˜ =˜ τ τ τ τ S τ = ⇐⇒ τ = τ = ∈ Rp g(v2,...,vq) g(ξ 2A x,...,ξqA x), which implies that (Aξ2, β MFMβ 0 β a(ω) β b(ω) 0forallω , ...,Aξq) is also a dimension-reduction subspace for E[Y|X] and that the central mean subspace has dimension at most q − 1, which contra- where ⇐⇒ means “being equivalent to.” Therefore, S(MFM) = { ∈ Rp}=S dicts to dim(SE[Y|X]) = q. Thus the proof is completed. span a(ω), b(ω) : ω E[Y|X]. Proof of Proposition 1 Proof of Proposition 4 1. 
Proof of Proposition 2

Because $\psi(\omega) = a(\omega) + \imath b(\omega)$, we have
\[ \psi(\omega)\bar{\psi}(\omega)^\tau = [a(\omega)a(\omega)^\tau + b(\omega)b(\omega)^\tau] + \imath[b(\omega)a(\omega)^\tau - a(\omega)b(\omega)^\tau]. \]
By (9), we know that $\int [b(\omega)a(\omega)^\tau - a(\omega)b(\omega)^\tau]\, d\omega = 0$. Therefore,
\[ M_{FM}^* = (2\pi)^{-p} \int \psi(\omega)\bar{\psi}(\omega)^\tau\, d\omega = (2\pi)^{-p} \int [a(\omega)a(\omega)^\tau + b(\omega)b(\omega)^\tau]\, d\omega. \]
Clearly, $M_{FM}^*$ is a real nonnegative definite matrix. For any $p$-dimensional vector $\beta$, $\beta^\tau M_{FM}^* \beta = 0$ is equivalent to $\beta^\tau a(\omega) = \beta^\tau b(\omega) = 0$ for all $\omega$, which implies that the column space of $M_{FM}^*$, $\mathcal{S}(M_{FM}^*)$, is the same as $\operatorname{span}\{a(\omega), b(\omega) : \omega \in \mathbb{R}^p\}$. By the first property in Proposition 1, we have $\mathcal{S}(M_{FM}^*) = \mathcal{S}_{E[Y|X]}$.

Proof of Proposition 3

Because
\[ M_{FM} = \int [a(\omega)a(\omega)^\tau + b(\omega)b(\omega)^\tau] K(\omega)\, d\omega, \]
$M_{FM}$ is a nonnegative definite matrix. Because $K(\omega) > 0$ for $\omega \in \mathbb{R}^p$, for $\beta \in \mathbb{R}^p$ we have
\[ \beta^\tau M_{FM} \beta = 0 \iff \beta^\tau a(\omega) = \beta^\tau b(\omega) = 0 \text{ for all } \omega \in \mathbb{R}^p, \]
where $\iff$ means "being equivalent to." Therefore, $\mathcal{S}(M_{FM}) = \operatorname{span}\{a(\omega), b(\omega) : \omega \in \mathbb{R}^p\} = \mathcal{S}_{E[Y|X]}$.

Proof of Proposition 4

Under the model assumption, following a similar argument as in the proof of (5), the central subspace can be written as
\[ \mathcal{S}_{Y|X} = \operatorname{span}\left\{ \frac{\partial}{\partial x} f_{Y|X}(y|x) : (x, y) \in \operatorname{supp}(X, Y) \right\}, \]
where $f_{Y|X}$ is the conditional density of $Y$ given $X$. To prove this proposition, it is sufficient to show that for $\beta \in \mathbb{R}^p$ and $x \in \operatorname{supp}(X)$,
\[ \beta^\tau \frac{\partial}{\partial x} m(x, t) = 0 \text{ for all } t \in \mathbb{R} \iff \beta^\tau \frac{\partial}{\partial x} f_{Y|X}(y|x) = 0 \text{ for all } y \in \operatorname{supp}(Y). \]
This is an immediate consequence of the fact that $\frac{\partial}{\partial x} m(x, t)$ is the Fourier transform of $\frac{\partial}{\partial x} f_{Y|X}(y|x)$, that is,
\[ \frac{\partial}{\partial x} m(x, t) = \frac{\partial}{\partial x} \int \exp\{\imath t y\} f_{Y|X}(y|x)\, dy = \int \exp\{\imath t y\} \frac{\partial}{\partial x} f_{Y|X}(y|x)\, dy. \]
Thus the proposition is proved.

Proof of Proposition 5

The proof is a straightforward modification of that of Proposition 1, with $Y$ replaced by $\exp\{\imath t Y\}$ and $m(x)$ replaced by $m(x, t)$. The details are thus omitted.
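Propositions 2 and 3 represent $M_{FM}$ as an integral of $a(\omega)a(\omega)^\tau + b(\omega)b(\omega)^\tau$ against a positive weight $K(\omega)$. When $K$ is taken to be a density that can be sampled from, this integral can be approximated by Monte Carlo. The sketch below illustrates the idea; the Gaussian choice of $K$, the plug-in estimate of $\psi(\omega)$ under $G(x) = -x$, and the final eigendecomposition step are assumptions made for illustration, not the explicit estimator derived in the article.

```python
import numpy as np

def m_fm_hat(X, y, sigma_w=0.3, n_freq=200, seed=0):
    """Monte Carlo approximation of M_FM = int [a(w)a(w)' + b(w)b(w)'] K(w) dw,
    with K taken as a N(0, sigma_w^2 I) density sampled at n_freq frequencies
    and psi(w) estimated by its sample analog under G(x) = -x."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    M = np.zeros((p, p))
    for _ in range(n_freq):
        omega = rng.normal(scale=sigma_w, size=p)          # draw omega ~ K
        phase = np.exp(1j * (X @ omega))                   # exp{i omega' x_k}
        psi = -((1j * omega[None, :] - X) * (y * phase)[:, None]).mean(axis=0)
        a, b = psi.real, psi.imag                          # a(omega), b(omega)
        M += np.outer(a, a) + np.outer(b, b)
    M /= n_freq
    eigvals, eigvecs = np.linalg.eigh(M)                   # ascending order
    return M, eigvecs[:, ::-1], eigvals[::-1]              # leading directions first
```

The leading eigenvectors of the returned matrix would then serve as an estimated basis of the central mean subspace under these assumptions.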

Proof of Proposition 6

Because
\[ M_{FC} = \int\!\!\int [a(\omega, t)a(\omega, t)^\tau + b(\omega, t)b(\omega, t)^\tau] K(\omega) k(t)\, d\omega\, dt, \]
$M_{FC}$ is a nonnegative definite matrix. Because $K(\omega) > 0$ for $\omega \in \mathbb{R}^p$ and $k(t) > 0$ for $t \in \mathbb{R}$, we have, for any $\beta \in \mathbb{R}^p$,
\[ \beta^\tau M_{FC} \beta = 0 \iff \beta^\tau a(\omega, t) = \beta^\tau b(\omega, t) = 0 \text{ for all } \omega \in \mathbb{R}^p \text{ and all } t \in \mathbb{R}. \]
Therefore, $\mathcal{S}(M_{FC}) = \operatorname{span}\{a(\omega, t), b(\omega, t) : \omega \in \mathbb{R}^p, t \in \mathbb{R}\}$. Using the first property of Proposition 5, we have $\mathcal{S}(M_{FC}) = \mathcal{S}_{Y|X}$, and the proposition is proved.

Proof of Proposition 7

Applying integration by parts and the condition that $f_{(X,Y)}(x, y)$ goes to 0 as $\|x\|$ goes to infinity with $y$ fixed, we have
\[ \eta(y, \omega) = -\int (\imath\omega + G(x)) \exp\{\imath\omega^\tau x\} f_{X|Y}(x|y)\, dx = -\int \imath\omega \exp\{\imath\omega^\tau x\} f_{X|Y}(x|y)\, dx - f_Y^{-1}(y) \int \exp\{\imath\omega^\tau x\} \left[\frac{\partial}{\partial x} f_X(x)\right] f_{Y|X}(y|x)\, dx = f_Y^{-1}(y) \int \exp\{\imath\omega^\tau x\} f_X(x) \left[\frac{\partial}{\partial x} f_{Y|X}(y|x)\right] dx. \]
Because $\frac{\partial}{\partial x} f_{Y|X}(y|x) \in \mathcal{S}_{Y|X}$, we have that $\eta(y, \omega) \in \mathcal{S}_{Y|X}$. The second part of the proposition can be easily verified.

Proof of Proposition 8

When $X$ follows the normal distribution with mean 0 and covariance matrix $I_p$, $G(X) = -X$. Because $K(\omega)$ is a point mass at $\omega = 0$,
\[ M_{FC} = \int [a(0, t)a(0, t)^\tau + b(0, t)b(0, t)^\tau] k(t)\, dt, \]
where $a(0, t)$ and $b(0, t)$ are the real and imaginary parts of $\phi(0, t)$. So $\mathcal{S}(M_{FC}) = \operatorname{span}\{a(0, t), b(0, t) : t \in \mathbb{R}\}$. By (16), $\phi(0, t)$ is the Fourier transform of $E[X|Y = y] f_Y(y)$, that is,
\[ \phi(0, t) = \int\!\!\int \exp\{\imath t y\}\, x\, f_{(X,Y)}(x, y)\, dx\, dy = \int \exp\{\imath t y\}\, E[X|Y = y]\, f_Y(y)\, dy. \]
Then for any $\beta \in \mathbb{R}^p$, $\beta^\tau E[X|Y = y] f_Y(y) = 0$ for $y \in \mathbb{R}$ is equivalent to $\beta^\tau a(0, t) = \beta^\tau b(0, t) = 0$ for $t \in \mathbb{R}$. Hence $\mathcal{S}(M_{FC}) = \operatorname{span}\{E[X|Y = y] : y \in \operatorname{supp}(Y)\}$. The latter is exactly the space aimed at by SIR, which is $\mathcal{S}(M_{SIR})$. Thus, when $\sigma_W^2 = 0$, $\mathcal{S}(M_{FC}) = \mathcal{S}(M_{SIR})$, and the proposition is proved.

Proof of Theorem 1

Because $\hat{M}_{FCN}$ is a V-statistic, it can be written as
\[ \hat{M}_{FCN} = \frac{n-1}{2n} \binom{n}{2}^{-1} \sum_{i<j} \left[ J_{FCN}((x_i, y_i), (x_j, y_j)) + J_{FCN}((x_i, y_i), (x_j, y_j))^\tau \right] + O_p(n^{-1}). \]
Applying the Hoeffding decomposition of the U-statistic, we can further write $\hat{M}_{FCN}$ as
\[ \hat{M}_{FCN} = \frac{n-1}{2n} \left[ 2M_{FCN} + \frac{2}{n} \sum_{i=1}^{n} \left( J_{FCN}^{(1)}(x_i, y_i) - 2M_{FCN} \right) + o_p(n^{-1/2}) \right] + O_p(n^{-1}) = M_{FCN} + \frac{1}{n} \sum_{i=1}^{n} \left( J_{FCN}^{(1)}(x_i, y_i) - 2M_{FCN} \right) + o_p(n^{-1/2}). \]
The second term in the foregoing expression is the average of $n$ independent and identically distributed random matrices $\big( J_{FCN}^{(1)}(x_i, y_i) - 2M_{FCN} \big)$ with $1 \le i \le n$. Because $\Sigma_{FCN}$ exists, which is guaranteed by the existence of the covariance matrix of $\operatorname{vec}\!\big( J_{FCN}((U_1, V_1), (U_2, V_2)) \big)$, by the central limit theorem, as $n$ goes to infinity,
\[ \sqrt{n}\, \big( \operatorname{vec}(\hat{M}_{FCN}) - \operatorname{vec}(M_{FCN}) \big) \stackrel{L}{\longrightarrow} N(0, \Sigma_{FCN}), \]
where $\Sigma_{FCN} = \operatorname{cov}\!\big[ \operatorname{vec}\!\big( J_{FCN}^{(1)}(X, Y) \big) \big]$ is a $p^2 \times p^2$ matrix.
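The V-statistic form of $\hat{M}_{FCN}$ used in the proof of Theorem 1 averages a symmetrized matrix-valued kernel over all pairs of observations. The generic sketch below illustrates this structure; the kernel `J` is a placeholder argument, since the specific kernel $J_{FCN}$ is defined earlier in the article and is not reproduced here.

```python
import numpy as np

def v_statistic_matrix(Z, J):
    """Matrix-valued V-statistic: average of the symmetrized kernel
    J(z_i, z_j) + J(z_i, z_j)' over all ordered pairs of observations.
    Z is a sequence of observations (e.g., tuples (x_i, y_i)), and
    J(z_i, z_j) returns a p x p matrix."""
    n = len(Z)
    p = J(Z[0], Z[0]).shape[0]
    M = np.zeros((p, p))
    for i in range(n):
        for j in range(n):
            Jij = J(Z[i], Z[j])
            M += Jij + Jij.T
    return M / (2 * n ** 2)
```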
Proof of Proposition 9

Because $\phi(\omega, t) = E[\eta(Y, \omega) \exp\{\imath t Y\}]$, it is sufficient to prove that $\eta(Y, \omega) \in \mathcal{S}_{Y|X}$ under the weak normality condition. Applying integration by parts, we have
\[ \eta(y, \omega) = \int \exp\{\imath\omega^\tau x\} \frac{\partial}{\partial x} f_{X|Y}(x|y)\, dx + \int x \exp\{\imath\omega^\tau x\} f_{X|Y}(x|y)\, dx = f_Y^{-1}(y) \int \exp\{\imath\omega^\tau x\} \left[ \frac{\partial}{\partial x} f_X(x) + x f_X(x) \right] f_{Y|X}(y|x)\, dx + f_Y^{-1}(y) \int \exp\{\imath\omega^\tau x\} \left[ \frac{\partial}{\partial x} f_{Y|X}(y|x) \right] f_X(x)\, dx. \]
The second term belongs to $\mathcal{S}_{Y|X}$, so we need only focus on the first term and denote it by $I_1$. Let $u_1 = B^\tau x$, $u_2 = \tilde{B}^\tau x$, and $u = (u_1^\tau, u_2^\tau)^\tau = (B, \tilde{B})^\tau x = M^\tau x$. Then
\[ I_1 = f_Y^{-1}(y) \int \exp\{\imath\tilde{\omega}^\tau u\}\, M \left[ \frac{\partial}{\partial u} \tilde{p}(u) + u \tilde{p}(u) \right] p(y|u_1)\, du = f_Y^{-1}(y)\, B \int \exp\{\imath\tilde{\omega}^\tau u\} \left[ \frac{\partial}{\partial u_1} \tilde{p}(u) + u_1 \tilde{p}(u) \right] p(y|u_1)\, du + f_Y^{-1}(y)\, \tilde{B} \int\!\!\int \exp\{\imath\tilde{\omega}^\tau u\} \left[ \frac{\partial}{\partial u_2} \tilde{p}(u) + u_2 \tilde{p}(u) \right] p(y|u_1)\, du_1\, du_2, \]
where $\tilde{\omega} = M^\tau \omega$ and $\tilde{p}(u) = f_X(Mu)$ is the density function of $u$. Note that the first term falls in $\mathcal{S}_{Y|X}$. Under the weak normality condition, we have $\frac{\partial}{\partial u_2} \tilde{p}(u_2|u_1) + u_2 \tilde{p}(u_2|u_1) = 0$, so the second term in the foregoing expression is 0. Therefore $I_1 \in \mathcal{S}_{Y|X}$. This proves the proposition.

Proof of Theorem 2

The proof is similar to that of Theorem 1 and thus is omitted.

[Received July 2004. Revised November 2005.]

REFERENCES

Cook, R. D., and Li, B. (2002), "Dimension Reduction for Conditional Mean in Regression," The Annals of Statistics, 30, 455–474.
Cook, R. D., and Nachtsheim, C. J. (1994), "Re-Weighting to Achieve Elliptically Contoured Covariates in Regression," Journal of the American Statistical Association, 89, 592–600.
Cook, R. D., and Ni, L. (2005), "Sufficient Dimension Reduction via Inverse Regression: A Minimum Discrepancy Approach," Journal of the American Statistical Association, 100, 410–428.
Cook, R. D., and Weisberg, S. (1991), Discussion of "Sliced Inverse Regression for Dimension Reduction," by K. C. Li, Journal of the American Statistical Association, 86, 328–332.
Cook, R. D., and Yin, X. (2001), "Dimension Reduction and Visualization in Discriminant Analysis" (with discussion), Australian and New Zealand Journal of Statistics, 43, 147–199.
Folland, G. B. (1992), Fourier Analysis and Its Applications, Belmont: Brooks/Cole.
Friedman, J. H., and Stuetzle, W. (1981), "Projection Pursuit Regression," Journal of the American Statistical Association, 76, 817–823.
Härdle, W., and Stoker, T. M. (1989), "Investigating Smooth Multiple Regression by the Method of Average Derivatives," Journal of the American Statistical Association, 84, 986–995.
Hotelling, H. (1957), "The Relations of the Newer Multivariate Statistical Methods to Factor Analysis," British Journal of Statistical Psychology, 10, 69–79.
Hristache, M., Juditsky, A., Polzehl, J., and Spokoiny, V. (2001), "Structure Adaptive Approach for Dimension Reduction," The Annals of Statistics, 29, 1537–1566.
Kendall, M. G. (1957), A Course in Multivariate Analysis, London: Charles Griffin.
Lee, A. J. (1990), U-Statistics: Theory and Practice, New York: Marcel Dekker.
Li, B., Zha, H., and Chiaromonte, F. (2005), "Contour Regression: A General Approach to Dimension Reduction," The Annals of Statistics, 33, 1580–1616.
Li, K. C. (1991), "Sliced Inverse Regression for Dimension Reduction" (with discussion), Journal of the American Statistical Association, 86, 316–342.
Li, K. C. (1992), "On Principal Hessian Directions for Data Visualization and Dimension Reduction: Another Application of Stein's Lemma," Journal of the American Statistical Association, 87, 1025–1039.
Li, K. C. (1997), "Nonlinear Confounding in High-Dimensional Regression," The Annals of Statistics, 25, 577–612.
Xia, Y., Tong, H., Li, W. K., and Zhu, L.-X. (2002), "An Adaptive Estimation of Dimension Reduction Space" (with discussion), Journal of the Royal Statistical Society, Ser. B, 64, 363–410.
Ye, Z., and Weiss, R. E. (2003), "Using the Bootstrap to Select One of a New Class of Dimension Reduction Methods," Journal of the American Statistical Association, 98, 968–979.
Yin, X., and Cook, R. D. (2002), "Dimension Reduction for the Conditional kth Moment in Regression," Journal of the Royal Statistical Society, Ser. B, 64, 159–175.