High Dimensional Covariance and Precision Estimation

Wei Wang

Washington University in St. Louis

Thursday 23rd February, 2017

Wei Wang (Washington University in St. Louis) High Dimensional Covariance and Precision Matrix Estimation Thursday 23rd February, 2017

Outline

1 Introduction and Notation

2 Part I Covariance Matrix Estimation
    Shrinkage Estimation
    Sparse Estimation
    Factor Model-based Estimation

3 Part II Precision Matrix Estimation
    CLIME
    CONDREG

Introduction

Covariance matrix: marginal correlations between variables.

Precision (inverse covariance) matrix: conditional correlations between pairs of variables given the remaining variables.

The estimation of covariance and precision matrices is fundamental in multivariate analysis. In high dimensional settings, the sample covariance matrix has undesirable properties:

    p > n ⟹ the sample covariance matrix is singular;
    its eigenvalues are overspread under the ‘large p small n’ scenario.

Eigenvalues for the sample covariance matrix under the ‘large p small n’ scenario

Figure: Average, over 100 replications, of the largest and smallest eigenvalues of the sample covariance matrix of i.i.d. samples from N(0, I), where p ranges from 5 to 100 and n = 50.
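The experiment behind the figure can be approximated with a short simulation. This is a minimal sketch, assuming NumPy; the function name `extreme_eigenvalues` and the seed are illustrative choices, not taken from the talk.

```python
import numpy as np

def extreme_eigenvalues(p, n=50, reps=100, seed=0):
    """Average largest and smallest eigenvalue of the sample covariance
    matrix of n i.i.d. draws from N(0, I_p), over `reps` replications."""
    rng = np.random.default_rng(seed)
    largest, smallest = [], []
    for _ in range(reps):
        X = rng.standard_normal((n, p))   # rows are the samples X_1, ..., X_n
        S = np.cov(X, rowvar=False)       # sample covariance, divides by n - 1
        eig = np.linalg.eigvalsh(S)       # eigenvalues in ascending order
        largest.append(eig[-1])
        smallest.append(eig[0])
    return float(np.mean(largest)), float(np.mean(smallest))

# All population eigenvalues equal 1, yet the sample eigenvalues spread out
# as p grows, and the smallest one collapses to 0 once p >= n.
for p in (5, 25, 50, 100):
    hi, lo = extreme_eigenvalues(p)
    print(f"p={p:3d}  mean largest={hi:.2f}  mean smallest={lo:.2f}")
```

Since the true covariance is the identity, any departure of the extreme eigenvalues from 1 is purely an estimation effect.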

Notation

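$X_i = (X_{i1}, \ldots, X_{ip})^T$, $i = 1, \ldots, n$, are i.i.d. samples of a $p$-variate random vector $X = (X_1, \ldots, X_p)^T \in \mathbb{R}^p$ with $\mathrm{Cov}(X) = \Sigma$ and precision matrix $\Omega = \Sigma^{-1}$.

$\Sigma = (\sigma_{ij})_{p \times p}$ and $\Omega = (\sigma^{ij})_{p \times p}$.

Sample covariance matrix: $S_n = (s_{jk})_{p \times p} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T$, where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$.

Operator norm of a square matrix $A = (a_{ij})_{p \times p}$: $\|A\|_{op} = \lambda_{\max}(A)$ for symmetric positive semidefinite $A$ (in general, $\|A\|_{op} = \sqrt{\lambda_{\max}(A^T A)}$).

Frobenius norm: $\|A\|_F = \sqrt{\sum_i \sum_j |a_{ij}|^2}$.

$l_1$-norm: $\|A\|_1 = \sum_i \sum_j |a_{ij}|$, $\|A\|_{1,\mathrm{off}} = \sum_i \sum_{j \ne i} |a_{ij}|$.

$|A|_\infty = \max_{1 \le i \le p,\, 1 \le j \le p} |a_{ij}|$.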

Part I Covariance Matrix Estimation

Shrinkage Estimation

Ledoit and Wolf (2003) proposed the shrinkage estimator

$$S^* = \lambda T + (1 - \lambda) S_n,$$

where $T$ is the target matrix and $\lambda \in [0, 1]$ is the shrinkage parameter. $T$ is often chosen to be positive definite and well conditioned. There are two popular target matrices:

    the identity matrix $I$;
    $\mathrm{diag}(s_{11}, \ldots, s_{pp})$.

Warton (2008): the sample correlation matrix $R_n$ is regularized as

$$\hat{R}(\lambda) = \lambda R_n + (1 - \lambda) I,$$

where $R_n = S_d^{-1/2} S_n S_d^{-1/2}$ and $S_d = \mathrm{diag}(s_{11}, \ldots, s_{pp})$.
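Both shrinkage rules are one-line convex combinations. The sketch below assumes NumPy; the function names `shrink_covariance` and `shrink_correlation` are hypothetical, and λ is supplied by hand rather than estimated from the data. Note that λ weights the target T in the first rule but weights R_n in Warton's rule, as in the formulas above.

```python
import numpy as np

def shrink_covariance(S, lam, target="identity"):
    """S* = lam * T + (1 - lam) * S with T = I or T = diag(s_11, ..., s_pp)."""
    S = np.asarray(S, dtype=float)
    if target == "identity":
        T = np.eye(S.shape[0])
    elif target == "diagonal":
        T = np.diag(np.diag(S))
    else:
        raise ValueError("target must be 'identity' or 'diagonal'")
    return lam * T + (1.0 - lam) * S

def shrink_correlation(S, lam):
    """R_hat(lam) = lam * R_n + (1 - lam) * I, with R_n the sample correlation."""
    S = np.asarray(S, dtype=float)
    d = np.sqrt(np.diag(S))
    R = S / np.outer(d, d)                # R_n = S_d^{-1/2} S_n S_d^{-1/2}
    return lam * R + (1.0 - lam) * np.eye(S.shape[0])

# With p > n the sample covariance is singular, but any lam > 0 with T = I
# lifts the smallest eigenvalue to at least lam.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 40))         # n = 20, p = 40
S = np.cov(X, rowvar=False)
print(np.linalg.eigvalsh(S)[0], np.linalg.eigvalsh(shrink_covariance(S, 0.2))[0])
```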

Sparse Estimation: Banding, Tapering and Thresholding

Banding and tapering require a natural ordering among the variables and assume that variables farther apart in the ordering are less correlated.

1. Banding

Bickel and Levina (2008a) give the k-banded estimator of $\Sigma$:

$$B_k(S_n) = [\, s_{ij} \, 1(|i - j| \le k) \,]_{p \times p}.$$

Here, $k$ ($0 \le k \le p$) is the banding parameter, which is usually chosen by cross-validation.

Figure: Banding (k = 5) of a 16 × 16 matrix whose (i, j)th entry is $0.8^{|i-j|}$.
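The banding operator amounts to masking entries by their distance from the diagonal. A minimal sketch, assuming NumPy; the helper name `band` is hypothetical, and, as in the figure, it is applied to the matrix with entries 0.8^|i-j| rather than to a sample covariance matrix.

```python
import numpy as np

def band(S, k):
    """B_k(S): keep s_ij when |i - j| <= k and set it to zero otherwise."""
    S = np.asarray(S, dtype=float)
    idx = np.arange(S.shape[0])
    mask = np.abs(idx[:, None] - idx[None, :]) <= k
    return np.where(mask, S, 0.0)

# The 16 x 16 matrix from the figure, banded with k = 5.
idx = np.arange(16)
A = 0.8 ** np.abs(idx[:, None] - idx[None, :])
B = band(A, 5)
print(np.round(B[0, :8], 3))   # first row: entries beyond lag 5 are zero
```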

1. Banding (cont’d)

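The banded estimator is consistent in the operator (spectral) norm, uniformly over the class of approximately ‘bandable’ matrices

$$U(\alpha, \varepsilon) = \Big\{ \Sigma : 0 < \varepsilon \le \lambda_{\min}(\Sigma) \le \lambda_{\max}(\Sigma) \le \varepsilon^{-1},\ \max_j \sum_i \{ |\sigma_{ij}| : |i - j| > k \} \le C k^{-\alpha} \text{ for all } k > 0 \Big\},$$

under the conditions

1. $\frac{\log p}{n} \to 0$ as $p \to \infty$, $n \to \infty$;
2. $C > 0$ and $\varepsilon > 0$ are fixed and independent of $p$.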

α > 0 controls the rate of decay of the covariance entries σij as one moves away from the main diagonal.

Bk(Sn) is not necessarily positive definite.

2. Tapering

A tapered estimator of $\Sigma$ with a tapering matrix $W = (w_{ij})_{p \times p}$ is given by the elementwise (Schur) product

$$S_W = S_n \circ W = (s_{ij} w_{ij})_{p \times p}.$$

A smoother positive-definite tapering matrix whose off-diagonal entries decay gradually to zero ensures positive definiteness as well as the optimal rate of convergence of the tapered estimator. For example, Cai et al. (2010) used the trapezoidal weight matrix

$$w_{ij} = \begin{cases} 1, & \text{if } |i - j| \le k_h, \\ 2 - \dfrac{|i - j|}{k_h}, & \text{if } k_h < |i - j| < k, \\ 0, & \text{otherwise.} \end{cases}$$

Under the autoregressive model scenario, one usually takes $k_h = k/2$. Banding is a special case of tapering with

$$w_{ij} = \begin{cases} 1, & \text{if } |i - j| \le k, \\ 0, & \text{otherwise.} \end{cases}$$

For tapering, consistency under both the operator and Frobenius norms holds over a larger class of covariance matrices than for banding, one in which the smallest eigenvalue is allowed to be 0.

Comparison of Banding and Tapering

Figure: Banding and tapering of a 16 × 16 matrix whose (i, j)th entry is $0.8^{|i-j|}$. Upper: banded (k = 5). Lower: tapered (k = 10, k_h = 5).
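The trapezoidal weights and the banding special case can be written out directly. This is a minimal sketch, assuming NumPy; the names `taper_weights` and `taper` are hypothetical, and the default k_h = k/2 is the choice under which the linear ramp reaches exactly zero at lag k.

```python
import numpy as np

def taper_weights(p, k, kh=None):
    """Trapezoidal weights: 1 for |i-j| <= kh, 2 - |i-j|/kh for kh < |i-j| < k,
    and 0 otherwise. Assumes k > 0; kh defaults to k / 2."""
    if kh is None:
        kh = k / 2
    idx = np.arange(p)
    d = np.abs(idx[:, None] - idx[None, :]).astype(float)
    W = np.zeros((p, p))
    W[d <= kh] = 1.0                      # flat top of the trapezoid
    ramp = (d > kh) & (d < k)
    W[ramp] = 2.0 - d[ramp] / kh          # linear decay towards zero
    return W

def taper(S, k, kh=None):
    """Tapered estimator S_W: elementwise product of S with the weights."""
    S = np.asarray(S, dtype=float)
    return S * taper_weights(S.shape[0], k, kh)

# Setting from the figure: 16 x 16 matrix with entries 0.8^|i-j|, k = 10, kh = 5.
idx = np.arange(16)
A = 0.8 ** np.abs(idx[:, None] - idx[None, :])
print(np.round(taper_weights(16, 10, 5)[0, :12], 2))   # weights along the first row
print(np.round(taper(A, 10, 5)[0, :12], 3))
```

Passing `kh = k` leaves the ramp empty, so the weights reduce to the 0/1 banding mask.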

3. Thresholding

Thresholding does not require the variables to be ordered, so the estimator is invariant to permutation of the variables.

Sparsity: e.g. soft-thresholding.

A soft-thresholded covariance matrix estimator is defined by applying the soft thresholding operator to Sn elementwise,

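$$\hat{\Sigma}_\lambda = S(S_n, \lambda),$$

where $S(\cdot, \lambda) = \mathrm{sign}(\cdot)(|\cdot| - \lambda)_+$ is the soft thresholding operator. The soft thresholded estimator is the solution of the following optimization problem:

$$\hat{\Sigma}_\lambda = \operatorname*{argmin}_{\Sigma} \Big\{ \tfrac{1}{2} \|\Sigma - S_n\|_F^2 + \lambda \|\Sigma\|_1 \Big\} = \operatorname*{argmin}_{\Sigma} \sum_{i=1}^{p} \sum_{j=1}^{p} \Big\{ \tfrac{1}{2} (\sigma_{ij} - s_{ij})^2 + \lambda |\sigma_{ij}| \Big\}.$$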
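Because the objective separates over entries, the minimizer is obtained by soft-thresholding each entry of S_n on its own. The sketch below assumes NumPy; `soft_threshold` and `objective` are hypothetical helper names, and the 3 × 3 matrix is made up for illustration. As in the penalty above, the diagonal is thresholded along with the off-diagonal entries.

```python
import numpy as np

def soft_threshold(S, lam):
    """Elementwise operator sign(s) * (|s| - lam)_+ ."""
    S = np.asarray(S, dtype=float)
    return np.sign(S) * np.maximum(np.abs(S) - lam, 0.0)

def objective(Sigma, S, lam):
    """0.5 * ||Sigma - S||_F^2 + lam * ||Sigma||_1 (all entries penalized)."""
    return 0.5 * np.sum((Sigma - S) ** 2) + lam * np.sum(np.abs(Sigma))

S = np.array([[ 2.0, 0.30, -0.60],
              [ 0.3, 1.50,  0.05],
              [-0.6, 0.05,  1.00]])
lam = 0.1
Sigma_hat = soft_threshold(S, lam)
print(Sigma_hat)   # small entries (0.05) become exactly zero, the rest move by 0.1

# Numerical sanity check: random perturbations never improve the objective.
rng = np.random.default_rng(0)
base = objective(Sigma_hat, S, lam)
print(all(objective(Sigma_hat + 0.05 * rng.standard_normal(S.shape), S, lam) >= base
          for _ in range(1000)))
```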

3. Thresholding (cont’d)

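Regularizing the eigenvalues of $S_n$: e.g. Liu (2014), Estimation of Covariance matrices with Eigenvalue Constraints (EC2).

The EC2 estimator of the correlation matrix is defined as

$$\hat{R}^{EC2} = \operatorname*{argmin}_{\Sigma} \ \tfrac{1}{2} \|R_n - \Sigma\|_F^2 + \lambda \|\Sigma\|_{1,\mathrm{off}} \quad \text{s.t.} \quad \tau \le \lambda_{\min}(\Sigma), \ \sigma_{jj} = 1,$$

where $R_n$ is the sample correlation matrix and $\tau > 0$ is the desired lower bound on the smallest eigenvalue of the estimator.

The EC2 estimator of the covariance matrix is defined as

$$\hat{\Sigma}^{EC2} = S_d^{1/2} \hat{R}^{EC2} S_d^{1/2},$$

where $S_d^{1/2} = \mathrm{diag}(\sqrt{s_{11}}, \ldots, \sqrt{s_{pp}})$.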

Factor Model-based Estimation

In many applications, the more desirable assumption is conditional sparsity, i.e. conditional on the common factors, the covariance matrix of the remaining components is sparse.

Fan (2013) proposed an estimator of $\Sigma$, the principal orthogonal complement thresholding estimator (POET), which can be written as the sum of a low rank matrix and a sparse matrix. Start with the spectral decomposition of the sample covariance matrix of the data,

$$S_n = \sum_{i=1}^{q} \hat{\lambda}_i \hat{e}_i \hat{e}_i^T + \hat{R},$$

where $q$ is the number of selected PCs and $\hat{R} = (\hat{r}_{ij})$ is the matrix of residuals. The estimator is obtained by adaptively thresholding the residual matrix after taking out the first $q$ PCs. Finding $q$ using data-based methods is a familiar and well-studied topic in the literature of PCA and factor analysis.
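The low rank plus sparse construction can be sketched as follows. This is only an illustration, assuming NumPy: the function name `poet_sketch` is hypothetical, and the residual is soft-thresholded with a single constant `tau` on its off-diagonal entries, which stands in for the adaptive, entry-dependent thresholds of the actual POET estimator.

```python
import numpy as np

def poet_sketch(S, q, tau):
    """Low rank part from the top-q principal components of S, plus the
    residual with its off-diagonal entries soft-thresholded at tau."""
    S = np.asarray(S, dtype=float)
    vals, vecs = np.linalg.eigh(S)                 # ascending eigenvalues
    top = np.argsort(vals)[::-1][:q]               # indices of the q largest
    low_rank = (vecs[:, top] * vals[top]) @ vecs[:, top].T
    R = S - low_rank                               # residual matrix R_hat
    R_thr = np.sign(R) * np.maximum(np.abs(R) - tau, 0.0)
    np.fill_diagonal(R_thr, np.diag(R))            # leave the diagonal untouched
    return low_rank + R_thr, low_rank, R_thr

# Toy one-factor data: X = f b^T + noise, so S is roughly rank one plus diagonal.
rng = np.random.default_rng(2)
n, p = 200, 10
f = rng.standard_normal((n, 1))
b = rng.standard_normal((p, 1))
X = f @ b.T + rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)
Sigma_hat, L, R_thr = poet_sketch(S, q=1, tau=0.15)
print(np.count_nonzero(R_thr) - p, "nonzero off-diagonal residual entries kept")
```

With `tau = 0` the function returns S_n unchanged, which makes the decomposition easy to check.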

Part II Precision Matrix Estimation

Constrained l1-minimization for Inverse Matrix Estimation (CLIME)

Cai (2011)

The CLIME estimator is the solution of the following optimization problem:

$$\min_{\Omega} \|\Omega\|_1 \quad \text{s.t.} \quad |S_n \Omega - I|_\infty \le \lambda,$$

where $\lambda > 0$ is the tuning parameter.

The solution is usually not symmetric.

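Suppose $\hat{\Omega}_1 = (\hat{w}^1_{ij})_{p \times p}$ is the solution of the above optimization problem. The final CLIME estimator is defined as

$$\hat{\Omega} = (\hat{w}_{ij}), \quad \text{where } \hat{w}_{ij} = \hat{w}_{ji} = \hat{w}^1_{ij} \, 1(|\hat{w}^1_{ij}| \le |\hat{w}^1_{ji}|) + \hat{w}^1_{ji} \, 1(|\hat{w}^1_{ij}| > |\hat{w}^1_{ji}|),$$

that is, of the two entries $\hat{w}^1_{ij}$ and $\hat{w}^1_{ji}$, the one with the smaller magnitude is kept. The resulting estimator is shown to be positive definite with high probability.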
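The symmetrization step is easy to illustrate on its own. The sketch below assumes NumPy and assumes that the "minimum" of each pair is taken in absolute value, which is how the rule is stated in the CLIME paper; the function name `symmetrize_clime` and the 3 × 3 input are made up for illustration, and solving the l1 problem itself is not shown.

```python
import numpy as np

def symmetrize_clime(omega1):
    """For each pair (i, j), keep whichever of omega1[i, j] and omega1[j, i]
    has the smaller absolute value, and use it in both positions."""
    omega1 = np.asarray(omega1, dtype=float)
    keep_own = np.abs(omega1) <= np.abs(omega1.T)
    picked = np.where(keep_own, omega1, omega1.T)
    # Mirror the upper triangle so the result is symmetric even when the two
    # entries tie in magnitude but differ in sign.
    return np.triu(picked) + np.triu(picked, 1).T

omega1 = np.array([[ 2.0, -0.5, 0.1],
                   [-0.3,  1.5, 0.0],
                   [ 0.4,  0.2, 1.0]])
print(symmetrize_clime(omega1))
```

Here the pair (-0.5, -0.3) is replaced by -0.3, the pair (0.1, 0.4) by 0.1, and the pair (0.0, 0.2) by 0.0.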

CONDition number REGularized estimation (CONDREG)

Won (2013)

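The CONDREG estimator is the solution of the following optimization problem:

$$\min_{\Omega} \ \mathrm{tr}(\Omega S_n) - \log\det \Omega \quad \text{s.t.} \quad \lambda_{\max}(\Omega) / \lambda_{\min}(\Omega) \le k,$$

where $k \ge 1$ is the tuning parameter, an upper bound on the condition number of the estimator.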
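A rough numerical sketch of this problem follows, assuming NumPy. It relies on the fact that the optimal Ω shares its eigenvectors with S_n, so only the eigenvalues need to be chosen: each is the unconstrained value 1/l_i clipped to an interval [u, k·u]. The grid search over u and the name `condreg_precision` are simplifying assumptions of this sketch, not the exact procedure of the original paper.

```python
import numpy as np

def condreg_precision(S, k, n_grid=400):
    """Approximate minimizer of tr(Omega S) - log det(Omega) subject to
    cond(Omega) <= k, via eigenvalue clipping and a 1-D grid search."""
    S = np.asarray(S, dtype=float)
    l, V = np.linalg.eigh(S)
    l = np.clip(l, 0.0, None)                  # remove tiny negative round-off
    p = l.size
    pos = l > 1e-12 * max(l.max(), 1.0)
    inv_l = np.full(p, np.inf)                 # unconstrained optimum is 1 / l_i
    inv_l[pos] = 1.0 / l[pos]
    lo = inv_l[pos].min() / k                  # search range for the lower level u
    hi = max(inv_l[pos].max(), p / l.sum())
    best_obj, best_mu = np.inf, None
    for u in np.geomspace(lo, hi, n_grid):
        mu = np.clip(inv_l, u, k * u)          # eigenvalues of Omega in [u, k*u]
        obj = np.sum(l * mu - np.log(mu))      # tr(Omega S) - log det(Omega)
        if obj < best_obj:
            best_obj, best_mu = obj, mu
    return (V * best_mu) @ V.T

rng = np.random.default_rng(4)
X = rng.standard_normal((20, 30))              # n = 20 < p = 30, so S_n is singular
S = np.cov(X, rowvar=False)
Omega = condreg_precision(S, k=10.0)
eig = np.linalg.eigvalsh(Omega)
print(eig.max() / eig.min())                   # bounded by k, up to rounding
```

Whatever level the grid search selects, the clipping step alone guarantees the condition number bound; the search only affects how close the result is to the true optimum.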

Q&A

Thank you!
