High Dimensional Covariance and Precision Matrix Estimation

Wei Wang
Washington University in St. Louis
Thursday 23rd February, 2017

Outline

1 Introduction and Notation
2 Part I Covariance Matrix Estimation
    Shrinkage Estimation
    Sparse Estimation
    Factor Model-based Estimation
3 Part II Precision Matrix Estimation
    CLIME
    CONDREG

Introduction and Notation

Introduction

- Covariance matrix: marginal correlations between variables.
- Precision (inverse covariance) matrix: conditional correlations between pairs of variables given the remaining variables.
- The estimation of covariance and precision matrices is fundamental in multivariate analysis.
- In high dimensional settings, the sample covariance matrix has undesirable properties:
    - $p > n \Rightarrow$ the sample covariance matrix is singular;
    - its eigenvalues are overspread under the 'large p small n' scenario.

Eigenvalues of the sample covariance matrix under the 'large p small n' scenario

Figure: Average of the largest and smallest eigenvalues of the sample covariance matrices of i.i.d. samples from $N(0, I)$ over 100 replications, where $p$ ranges from 5 to 100 and $n = 50$.

Notation

$X_i = (X_{i1}, \ldots, X_{ip})^T$, $i = 1, \ldots, n$, are i.i.d. samples of a $p$-variate random vector $X = (X_1, \ldots, X_p)^T \in \mathbb{R}^p$ with $\mathrm{Cov}(X) = \Sigma$ and precision matrix $\Omega = \Sigma^{-1}$. Write $\Sigma = (\sigma_{ij})_{p \times p}$ and $\Omega = (\sigma^{ij})_{p \times p}$.
Sample covariance matrix:

$$S_n = (s_{jk})_{p \times p} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T, \quad \text{where } \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

Norms of a square matrix $A = (a_{ij})_{p \times p}$:

- Operator norm: $\|A\|_{op} = \sqrt{\lambda_{\max}(A^T A)}$, which equals $\lambda_{\max}(A)$ when $A$ is symmetric positive semidefinite.
- Frobenius norm: $\|A\|_F = \sqrt{\sum_i \sum_j |a_{ij}|^2}$.
- $\ell_1$-norm: $\|A\|_1 = \sum_i \sum_j |a_{ij}|$, with off-diagonal version $\|A\|_{1,\mathrm{off}} = \sum_i \sum_{j \ne i} |a_{ij}|$.
- Elementwise maximum norm: $|A|_\infty = \max_{1 \le i \le p,\, 1 \le j \le p} |a_{ij}|$.

Part I Covariance Matrix Estimation

Shrinkage Estimation

Ledoit and Wolf (2003) proposed the shrinkage estimator

$$S^* = \lambda T + (1 - \lambda) S_n,$$

where $T$ is the target matrix and $\lambda \in [0, 1]$ is the shrinkage parameter. $T$ is often chosen to be positive definite and well conditioned. Two popular target matrices are:

- the identity matrix $I$;
- $\mathrm{diag}(s_{11}, \ldots, s_{pp})$.

Warton (2008): the sample correlation matrix $R_n$ is regularized as

$$\hat{R}(\lambda) = \lambda R_n + (1 - \lambda) I,$$

where $R_n = S_d^{-1/2} S_n S_d^{-1/2}$ and $S_d = \mathrm{diag}(s_{11}, \ldots, s_{pp})$. Note that here $\lambda$ weights $R_n$ rather than the target.

Sparse Estimation: Banding, Tapering and Thresholding

Banding and tapering require a natural ordering among the variables and assume that variables farther apart in the ordering are less correlated.

1. Banding

Bickel and Levina (2008a) give the $k$-banded estimator of $\Sigma$:

$$B_k(S_n) = [\, s_{ij} \, 1(|i - j| \le k) \,]_{p \times p}.$$

Here $k$ ($0 \le k \le p$) is the banding parameter, which is usually chosen by cross-validation.

Figure: Banding of a 16 × 16 matrix whose $(i, j)$th entry is $0.8^{|i-j|}$, with $k = 5$.
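The shrinkage and banding estimators defined above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names `shrink` and `band`, the sample size n = 10, and the fixed value λ = 0.3 are hypothetical choices made here, not taken from the slides, and in practice λ and k would be tuned (for example by cross-validation).

```python
import numpy as np

def shrink(S, lam, target=None):
    """Linear shrinkage S* = lam * T + (1 - lam) * S.

    If no target is given, T = diag(s_11, ..., s_pp) is used.
    """
    if target is None:
        target = np.diag(np.diag(S))
    return lam * target + (1.0 - lam) * S

def band(S, k):
    """k-banded estimator: keep s_ij when |i - j| <= k, set it to 0 otherwise."""
    idx = np.arange(S.shape[0])
    keep = np.abs(idx[:, None] - idx[None, :]) <= k
    return np.where(keep, S, 0.0)

# True covariance with (i, j)th entry 0.8^{|i-j|}, as in the banding figure.
p, n, k = 16, 10, 5            # n < p, so the sample covariance is singular
idx = np.arange(p)
Sigma = 0.8 ** np.abs(idx[:, None] - idx[None, :])

# Draw n samples with covariance Sigma (assumed Gaussian for illustration).
rng = np.random.default_rng(0)
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(Sigma).T
S = np.cov(X, rowvar=False)    # p x p sample covariance, divisor n - 1

S_shrunk = shrink(S, lam=0.3)  # hypothetical lambda, not tuned
S_banded = band(S, k)
```

With the diagonal target and 0 < λ < 1, the shrunk matrix is positive definite even though `S` is singular, whereas the banded matrix carries no such guarantee.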
1. Banding (cont'd)

The banded estimator is consistent in the operator (spectral) norm, uniformly over the class of approximately 'bandable' matrices

$$\mathcal{U}(\alpha, \varepsilon) = \Big\{ \Sigma : 0 < \varepsilon \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le \varepsilon^{-1}, \ \max_j \sum_i \{ |\sigma_{ij}| : |i - j| > k \} \le C k^{-\alpha} \text{ for all } k > 0 \Big\},$$

under the conditions

1. $\frac{\log p}{n} \to 0$ as $p \to \infty$, $n \to \infty$;
2. $C > 0$ and $\varepsilon > 0$ are fixed and independent of $p$.

Here $\alpha > 0$ controls the rate of decay of the covariance entries $\sigma_{ij}$ as one moves away from the main diagonal.

Note that $B_k(S_n)$ is not necessarily positive definite.

2. Tapering

A tapered estimator of $\Sigma$ with a tapering matrix $W = (w_{ij})_{p \times p}$ is given by the elementwise product

$$S_W = S_n * W = (s_{ij} w_{ij})_{p \times p}.$$

A smoother positive-definite tapering matrix whose off-diagonal entries decay gradually to zero ensures positive definiteness as well as the optimal rate of convergence of the tapered estimator. For example, Cai et al. (2010) used the trapezoidal weight matrix given by

$$w_{ij} = \begin{cases} 1, & \text{if } |i - j| \le k_h, \\ 2 - \dfrac{|i - j|}{k_h}, & \text{if } k_h < |i - j| < k, \\ 0, & \text{otherwise.} \end{cases}$$

Under the autoregressive model scenario, $k_h = k/2$ is the usual choice.

Banding is a special case of tapering with

$$w_{ij} = \begin{cases} 1, & \text{if } |i - j| \le k, \\ 0, & \text{otherwise.} \end{cases}$$

Consistency under both the operator and Frobenius norms holds over a larger class of covariance matrices than for banding, one in which the smallest eigenvalue is allowed to be 0.

Comparison of Banding and Tapering

Figure: Banding and tapering of a 16 × 16 matrix whose $(i, j)$th entry is $0.8^{|i-j|}$. Upper: banded ($k = 5$). Lower: tapered ($k = 10$, $k_h = 5$).
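The trapezoidal weights and the banding special case can be written out as a short NumPy sketch that mirrors the comparison figure (k = 10, k_h = 5 for tapering; k = 5 for banding). The function names `taper_weights` and `taper` are hypothetical, and the default k_h = k/2 is the choice mentioned above rather than a requirement.

```python
import numpy as np

def taper_weights(p, k, kh=None):
    """Trapezoidal weights: 1 if |i-j| <= kh, 2 - |i-j|/kh if kh < |i-j| < k, else 0."""
    if kh is None:
        kh = k / 2                      # usual choice k_h = k / 2
    idx = np.arange(p)
    d = np.abs(idx[:, None] - idx[None, :])
    W = np.zeros((p, p))
    W[d <= kh] = 1.0                    # flat top of the trapezoid
    slope = (d > kh) & (d < k)          # linearly decaying part
    W[slope] = 2.0 - d[slope] / kh
    return W

def taper(S, k, kh=None):
    """Tapered estimator: elementwise product of S with the weight matrix."""
    return S * taper_weights(S.shape[0], k, kh)

# The 16 x 16 matrix from the figure, with (i, j)th entry 0.8^{|i-j|}.
p = 16
idx = np.arange(p)
d = np.abs(idx[:, None] - idx[None, :])
Sigma = 0.8 ** d

W_taper = taper_weights(p, k=10, kh=5)  # lower panel of the figure
W_band = taper_weights(p, k=5, kh=5)    # k_h = k leaves no slope: plain banding
Sigma_tapered = taper(Sigma, k=10, kh=5)
```

Setting `kh` equal to `k` empties the sloped region, so the weights reduce to the 0/1 banding mask, which is the special case stated above.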
3. Thresholding

Thresholding does not require the variables to be ordered, so the estimator is invariant to permutations of the variables.

Inducing sparsity, e.g. soft-thresholding. A soft-thresholded covariance matrix estimator is defined by applying the soft thresholding operator to $S_n$ elementwise,

$$\hat{\Sigma}_\lambda = S(S_n, \lambda),$$

where $S(\cdot, \lambda) = \mathrm{sign}(\cdot)\,(|\cdot| - \lambda)_+$ is the soft thresholding operator.

The soft-thresholded estimator is the solution of the following optimization problem:

$$\hat{\Sigma}_\lambda = \arg\min_{\Sigma} \Big\{ \tfrac{1}{2} \|\Sigma - S_n\|_F^2 + \lambda \|\Sigma\|_1 \Big\} = \arg\min_{\Sigma} \sum_{i=1}^{p} \sum_{j=1}^{p} \Big\{ \tfrac{1}{2} (\sigma_{ij} - s_{ij})^2 + \lambda |\sigma_{ij}| \Big\}.$$

3. Thresholding (cont'd)

Regularizing the eigenvalues of $S_n$, e.g. Liu (2014): Estimation of Covariance matrices with Eigenvalue Constraints (EC2).

The EC2 estimator of the correlation matrix is defined as

$$\hat{R}^{EC2} = \arg\min_{\Sigma} \ \tfrac{1}{2} \|S_n - \Sigma\|_F^2 + \lambda \|\Sigma\|_{1,\mathrm{off}} \quad \text{s.t. } \tau \le \lambda_{\min}(\Sigma), \ \sigma_{jj} = 1,$$

where $\tau > 0$ is a desired lower bound on the minimum eigenvalue of the estimator.

The EC2 estimator of the covariance matrix is defined as

$$\hat{\Sigma}^{EC2} = S_d^{1/2} \hat{R}^{EC2} S_d^{1/2},$$

where $S_d^{1/2} = \mathrm{diag}(\sqrt{s_{11}}, \ldots, \sqrt{s_{pp}})$.

Factor Model-based Estimation

In many applications, the more desirable assumption is conditional sparsity, i.e. conditional on the common factors, the covariance matrix of the remaining components is sparse.
Fan (2013) proposed an estimator of $\Sigma$, the principal orthogonal complement thresholding estimator (POET), which can be written as a sum of a low rank matrix and a sparse matrix.

Start with the spectral decomposition of the sample covariance matrix of the data,

$$S_n = \sum_{i=1}^{q} \hat{\lambda}_i \hat{e}_i \hat{e}_i^T + \hat{R},$$

where $q$ is the number of selected PCs and $\hat{R} = (r_{ij})$ is the matrix of residuals.

The estimator is obtained by adaptively thresholding the residual matrix after taking out the first $q$ PCs.

Choosing $q$ with data-based methods is an important, familiar, and well-studied topic in the PCA and factor analysis literature.

Part II Precision Matrix Estimation

Constrained l1-minimization for Inverse Matrix Estimation (CLIME), Cai (2011)

The CLIME estimator is the solution of the following optimization problem:

$$\min_{\Omega} \|\Omega\|_1 \quad \text{s.t. } |S_n \Omega - I|_\infty \le \lambda,$$

where $\lambda > 0$ is the tuning parameter.

The solution is usually not symmetric. Suppose $\hat{\Omega}_1 = (\hat{\omega}^1_{ij})_{p \times p}$ is the solution of the above optimization problem. The final CLIME estimator is defined as $\hat{\Omega} = (\hat{\omega}_{ij})$, where

$$\hat{\omega}_{ij} = \hat{\omega}_{ji} = \hat{\omega}^1_{ij} \, 1(|\hat{\omega}^1_{ij}| \le |\hat{\omega}^1_{ji}|) + \hat{\omega}^1_{ji} \, 1(|\hat{\omega}^1_{ij}| > |\hat{\omega}^1_{ji}|),$$

that is, of the two entries $\hat{\omega}^1_{ij}$ and $\hat{\omega}^1_{ji}$ the one with the smaller magnitude is kept. The resulting estimator is shown to be positive definite with high probability.

CONDition number REGularized estimation (CONDREG), Won (2013)

The CONDREG estimator is the solution of the following optimization problem:

$$\min_{\Omega} \ \mathrm{tr}(\Omega S_n) - \log \det \Omega \quad \text{s.t. } \lambda_{\max}(\Omega) / \lambda_{\min}(\Omega) \le k,$$

where $k \ge 1$ is the tuning parameter, an upper bound on the condition number of $\Omega$ (a condition number can never be smaller than 1).
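The constraints used in this part are easy to check numerically, and the CLIME symmetrization step is a one-liner. The sketch below does not solve either optimization problem; it only illustrates the elementwise maximum norm in the CLIME constraint, the symmetrization of a non-symmetric solution, and the condition number bounded by CONDREG. It assumes the symmetrization keeps the entry of smaller magnitude. The function names and the small matrix `omega1` are hypothetical illustrations, not output of a real CLIME fit.

```python
import numpy as np

def max_abs(A):
    """Elementwise maximum norm |A|_inf = max_ij |a_ij| used in the CLIME constraint."""
    return np.max(np.abs(A))

def clime_symmetrize(omega1):
    """Symmetrize by keeping, for each pair (i, j), the entry of smaller magnitude."""
    smaller = np.where(np.abs(omega1) <= np.abs(omega1.T), omega1, omega1.T)
    # Mirror the upper triangle so the result is exactly symmetric, even on ties.
    return np.triu(smaller) + np.triu(smaller, 1).T

def condition_number(omega):
    """lambda_max / lambda_min of a symmetric positive definite matrix."""
    ev = np.linalg.eigvalsh(omega)      # eigenvalues in ascending order
    return ev[-1] / ev[0]

# Hypothetical non-symmetric first-stage solution (made up for illustration).
omega1 = np.array([[ 2.0, -0.5,  0.1],
                   [-0.4,  2.0,  0.0],
                   [ 0.3, -0.2,  1.5]])
omega_hat = clime_symmetrize(omega1)

# Checking the CLIME constraint for a candidate Omega and tuning parameter lam.
S = np.diag([1.0, 2.0, 4.0])            # toy "sample covariance"
lam = 0.1                               # hypothetical tuning parameter
feasible = max_abs(S @ np.linalg.inv(S) - np.eye(3)) <= lam
kappa = condition_number(np.linalg.inv(S))
```

Here the exact inverse of `S` trivially satisfies the CLIME constraint. With p > n no exact inverse of the sample covariance exists, which is why the constraint is relaxed to a tolerance λ.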
Q&A

Thank you!
