
Accepted for publication by IEEE Access (2016)

Positive Definite Estimation of Large Covariance Matrices Using Generalized Nonconvex Penalties

Fei Wen, Member, IEEE, Yuan Yang, Peilin Liu, Member, IEEE, Robert C. Qiu, Fellow, IEEE

Abstract—This work addresses the issue of large covariance matrix estimation in high-dimensional statistical analysis. Recently, improved iterative algorithms with positive-definite guarantee have been developed. However, these algorithms cannot be directly extended to use a nonconvex penalty for sparsity inducing. Generally, a nonconvex penalty has the capability of ameliorating the bias problem of the popular convex lasso penalty, and thus is more advantageous. In this work, we propose a class of positive-definite covariance estimators using generalized nonconvex penalties. We develop a first-order algorithm based on the alternating direction method framework to solve the nonconvex optimization problem efficiently. The convergence of this algorithm has been proved. Further, the statistical properties of the new estimators have been analyzed for generalized nonconvex penalties. Moreover, extension of this algorithm to covariance estimation from sketched measurements has been considered. The performance of the new estimators has been demonstrated by both a simulation study and a gene clustering example for tumor tissues. Code for the proposed estimators is available at https://github.com/FWen/Nonconvex-PDLCE.git.

Index Terms—Covariance matrix estimation, covariance sketching, alternating direction method, positive-definite estimation, nonconvex optimization, sparse.

This work was supported in part by the National Natural Science Foundation of China (NSFC) under grants 61401501, 61573242 and 61472442, and by the Young Star Science and Technology Project in Shaanxi province under grant 2015KJXX-46. F. Wen, P. Liu and R. C. Qiu are with the Department of Electronic Engineering, Shanghai Jiao Tong University, Shanghai 200240, China (e-mail: [email protected]; [email protected]; [email protected]). F. Wen is also with the Air Control and Navigation Institution, Air Force Engineering University, Xian 710000, China. Y. Yang is with the Air Control and Navigation Institution, Air Force Engineering University, Xian 710000, China.

I. INTRODUCTION

Nowadays, the advance of information technology makes massive high-dimensional data widely available for scientific discovery, which makes Big Data a very hot research topic. In this context, effective statistical analysis for high-dimensional data is becoming increasingly important. In much statistical analysis of high-dimensional data, estimating large covariance matrices is needed; this problem has attracted significant research attention in the past decade and has found applications in many fields, such as economics and finance, social networks, smart grid, and climate studies [1-5]. In these applications, the covariance information is necessary for effective dimensionality reduction and discriminant analysis. The goal of covariance estimation is to recover the population covariance matrix of a distribution from independent and identically distributed samples. In the high-dimensional setting, the dimensionality is often comparable to (or even larger than) the sample size, in which case the standard sample covariance matrix estimator has poor performance, since the number of unknown parameters grows quadratically in the dimension [2, 6, 7].

To achieve better estimation of a large covariance matrix, intrinsic structures of the covariance matrix can be exploited by using regularization techniques, such as banding and tapering for banded structure [8-12] and thresholding for sparse structure [13-15]. The former is useful for applications where the variables have a natural ordering and variables far apart are only weakly correlated, such as in longitudinal data, spatial data, or spectroscopy. For other applications, where the variables do not have such properties but the true covariance matrix is sparse, permutation-invariant thresholding methods have been proposed in [13, 14]. These methods have good theoretical properties and are computationally efficient. It has been shown in [15] that the generalized thresholding estimators are consistent over a large class of (approximately) sparse covariance matrices. However, in practical finite-sample applications, such an estimator is not always positive-definite, although it converges to a positive-definite limit in the asymptotic setting.

Positive definiteness is desirable in many statistical learning applications such as quadratic discriminant analysis and covariance-regularized regression [16]. To simultaneously achieve sparsity and positive definiteness, iterative methods have been proposed recently in [17-20]. In [17], a positive-definite estimator has been proposed via maximizing a penalized Gaussian likelihood with a lasso penalty, and a majorize-minimize algorithm has been designed to solve the estimation problem. In [18], a logarithmic barrier term is added into the objective function of the soft-thresholding estimator to enforce positive-definiteness. Then, new positive-definite estimators have been proposed in [19, 20] by imposing an eigenvalue constraint on the optimization problem of the soft-thresholding estimator. Although these positive-definite estimators have good theoretical properties, the derived algorithms are restricted to the convex $\ell_1$-norm (lasso) penalty and cannot be directly extended to use a nonconvex penalty, since they are not guaranteed to converge in that case. In [20], in addition to the $\ell_1$-penalty, the nonconvex minimax concave (MC) penalty has also been considered, and an algorithm has been proposed based on local linear approximation of the MC penalty.

Compared with the convex $\ell_1$-penalty, a nonconvex penalty, such as the hard-thresholding or smoothly clipped absolute deviation (SCAD) penalty, is more advantageous since it can ameliorate the bias problem of the $\ell_1$ one. This work proposes a class of positive-definite covariance estimators using generalized nonconvex penalties. We use an eigenvalue constraint to ensure the positive definiteness of the estimator, similar to [19, 20], but the penalty can be nonconvex. Imposing an eigenvalue constraint on the covariance while simultaneously employing a nonconvex penalty makes the optimization problem challenging. To solve the nonconvex optimization problem efficiently, we present a first-order algorithm based on the alternating direction method (ADM) framework. It has been proved that the sequence generated by the proposed algorithm converges to a stationary point of the objective function if the penalty is a Kurdyka-Lojasiewicz (KL) function. Further, the statistical properties of the new estimator have been analyzed for a generalized nonconvex penalty. The effectiveness of the new estimators has been demonstrated via both a simulation study and a real gene clustering example.

Moreover, extension of the proposed ADM algorithm to sparse covariance estimation from sketches or compressed measurements has also been considered. Covariance sketching is an efficient approach for covariance estimation from a high-dimensional data stream [36-42]. In many practical applications that extract the covariance information from a high-dimensional data stream at a high rate, it may be infeasible to sample and store the whole stream due to memory and processing power constraints. In this case, by exploiting the structure information of the covariance matrix, it can be reliably recovered from compressed measurements of the data stream with a significantly lower dimensionality. The proposed algorithm can simultaneously achieve positive-definiteness and sparsity in estimating the covariance from compressed measurements.

The rest of this paper is organized as follows. Section II introduces the background of large covariance estimation. In Section III, we detail the new algorithm and present some analysis of its convergence property. Section IV contains the statistical properties of the new method. In Section V, we extend the new method to positive-definite covariance estimation from compressed measurements. Section VI contains experimental results. Finally, Section VII ends the paper with concluding remarks.

The following notations are used throughout the paper. For a matrix $M$, $\mathrm{diag}(M)$ is a diagonal matrix which has the same diagonal elements as $M$, whilst for a vector $v$, $\mathrm{diag}(v)$ is a diagonal matrix with diagonal elements $v$. $I(\cdot)$ denotes the indicator function. $I_m$ stands for an $m\times m$ identity matrix and $1_{m\times m}$ denotes an $m\times m$ matrix with all elements equal to one. $(\cdot)^T$ denotes the transpose operator. $\lambda_{\min}(M)$ and $\lambda_{\max}(M)$ stand for the minimum and maximum eigenvalues of $M$, respectively. $\odot$, $\oslash$ and $\otimes$ stand for the Hadamard product, Hadamard (element-wise) division and Kronecker product, respectively. $\mathrm{vec}(M)$ is the "vectorization" operator stacking the columns of the matrix one below another. $\mathrm{dist}(X,\mathcal{S}) := \inf\{\|Y-X\|_F : Y\in\mathcal{S}\}$ denotes the distance from a point $X\in\mathbb{R}^{m\times n}$ to a subset $\mathcal{S}\subseteq\mathbb{R}^{m\times n}$. $X\succeq 0$ and $X\succ 0$ mean that $X$ is positive-semidefinite and positive-definite, respectively.

II. BACKGROUND

For a vector $x\in\mathbb{R}^d$ with covariance $\Sigma^0 = E\{xx^T\}$, the goal is to estimate the covariance from $n$ observations $x_1,\dots,x_n$. In this work, we are interested in estimating the correlation matrix $\Theta^0 = \mathrm{diag}(\Sigma^0)^{-1/2}\,\Sigma^0\,\mathrm{diag}(\Sigma^0)^{-1/2}$, where $\Theta^0$ is the true correlation matrix and $\mathrm{diag}(\Sigma^0)^{1/2}$ is the diagonal matrix of true standard deviations. With the estimated correlation matrix, denoted by $\hat{\Theta}$, the estimate of the covariance matrix is $\hat{\Sigma} = \mathrm{diag}(R)^{1/2}\,\hat{\Theta}\,\mathrm{diag}(R)^{1/2}$, where $R$ denotes the sample covariance matrix. This procedure is more favorable than estimating the covariance matrix directly, since the correlation matrix retains the same sparsity structure as the covariance matrix but with all the diagonal elements known to be one. Since the diagonal elements need not be estimated, the correlation matrix can be estimated more accurately than the covariance matrix [20-22].

A. Generalized Thresholding Estimator

Given the sample correlation matrix $S$, the generalized thresholding estimator [15] solves the following problem

$\min_{\Theta} \ \frac{1}{2}\|\Theta - S\|_F^2 + g_\lambda(\Theta)$   (1)

where $g_\lambda(\Theta)$ is a generalized penalty function depending on a penalty parameter $\lambda$. The penalty function can be expressed in an element-wise form as $g_\lambda(\Theta) = \sum_{i\neq j} g_\lambda(\Theta_{ij})$, which only penalizes the off-diagonal elements since the diagonal elements of a correlation (also covariance) matrix are always positive. The thresholding operation corresponding to $g_\lambda(\cdot)$ is defined as

$T_\lambda(x) = \arg\min_{z} \left\{ \frac{1}{2}(z-x)^2 + g_\lambda(z) \right\}$.   (2)

For the popular SCAD, hard-, $\ell_q$-, and soft-thresholding, the penalties and corresponding thresholding functions are given as follows.

(i) Hard-thresholding. The penalty is given by [23]

$g_\lambda(x) = \frac{\lambda^2}{2} - \frac{(|x|-\lambda)^2}{2}\, I(|x| < \lambda)$

and the corresponding thresholding function is

T ( x ) xI (| x |  ) . (3) imposes a constant shrinkage on the parameter. In contrast, the hard-thresholding and SCAD estimators are unbiased for large Note that, the  0 -norm penalty parameter. Another idea to mitigate the bias of 2  2 soft-thresholding is to employ an adaptive penalty [28]. The g ( x ) | x |0  I (| x |  0) thresholding rule corresponding to this adaptive penalty, as 2 2 well as SCAD and  q , fall in (sandwiched) between hard- and also results in (3). soft-thresholding.

(ii) Soft-thresholding, g ( x ) | x | . The corresponding These generalized covariance estimators are computation- thresholding function is [24] ally efficient, e.g., the SCAD, hard- and soft-thresholding estimators only need an element-wise thresholding operation of T( x ) sign( x )(| x |  ) .   the sample covariance matrix. For the adaptive lasso estimator Soft-thresholding is widely used in sparse variable selection, [28], only a few iterations are sufficient to achieve satisfactory performance, and each iteration only need a soft-thresholding since the  1 -norm minimization problem is more tractable (due to its convexity) than the problems using nonconvex operation. However, these estimators are not always positive definite in practical finite sample applications, although the penalties. However, the  1 -penalty has a bias problem as lasso soft-thresholding would produce biased estimates for large positive definiteness can be guaranteed in the asymptotic set- coefficients. This bias problem can be ameliorated by using a ting with probability tending to 1. Intuitively, to deal with the nonconvex penalty, e.g., Hard-thresholding or SCAD. infiniteness problem, we can project the estimate Θˆ into a ˆ d T (iii) SCAD. The penalty is given by convex cone {Θ 0} . Specifically, let Θ = i1 iv i v i de- note the eigen-decomposition of Θˆ , where i is eigenvalue |x |, 0 | x |   corresponding to the eigenvector . A positive-semidefinite  vi  2 2  d T p ( x ) (2 a | x |  x   ) [2( a  1)],   | x |  a  estimate can be obtained as Θˆ = max(i ,0)v i v i .  i1 (a 1)2 2, | x |  a  However, this projection would destroy the sparsity pattern of  the true correlation matrix and result in a non-sparse estimate for some a2 . The corresponding thresholding function is (see [19] for a detailed example). [25]

 sign(x )(| x | ) , | x |  2   T ( x ) [( a  1) x  sign( x ) a ] ( a  2), 2   | x |  a  .  x, | x | a 

q (iv)  q -norm ( 0q  1 ), p ( x ) (  , q )| x | where (,)() q     1q q with 2(1 q )  (2  q ) . The thresholding function is [26, 31]  0, |x |  T ( x ) {0,sign( x ) }, | x |   (4)  sign(x ) , | x |   where  is the solution of h( )  q q1    x  0 over the region ( ,|x |) . Since h() is convex, when |x | ,  can be Fig. 1. Generalized thesholding\shrinkage functions for the efficiently solved using a Newton’s method, e.g., hard-, soft-,  q - and SCAD penalties. h()k k1   k  , k 0,1,2,. h()k B. Positive Definite Estimator 0 The starting point can be simply chosen as  |x |. For the To simultaneously achieve sparsity and positive-definiteness, special cases of q 1/2 or q 2/3 , the proximal mapping iterative methods have been proposed recently in [18-20], e.g., can be explicitly expressed as the solution of a cubic or quartic the constrained correlation matrix estimator which solves the equation [27]. following problem [20] Fig. 1 shows the generalized thesholding\shrinkage func- 1 tions for the hard-, soft-,  q - and SCAD penalties with a same 2 min ΘS F  Θ 1,off threshold. When the true parameter has a relatively large Θ 2 magnitude, the soft-thresholding estimator is biased since it subject to diag(Θ )Id and Θ1Id (5)

where 1 0 is the lower bound for the minimum eigenvalue. (7) are naturally separated, which makes the problem easy to A covariance matrix version of (5) has been proposed in [19]. tackle as each step of the alternating minimization is much Unlike the works [19] and [20] imposing an eigenvalue con- easier than the global problem. More specifically, using two d d straint, the method in [18] employs a logarithmic barrier term auxiliary variables VV1, 2  , the problem (7) can be to ensure the positive-definiteness of the solution. These works equivalently rewritten as focus mainly on the  1 -norm penalty, since it results in convex 1 2 optimization problems which can be efficiently solved with minΘS F g (VV1 ) 1 ( 2 ) Θ,,VV1 2 2 convergence guarantee. In [20], in addition to the  1 -norm penalty, the nonconvex minimax concave penalty has also been subject to diag(VI1 ) d , GVΘ (8) considered, which retains the global convexity of the problem with when a tuning parameter is appropriately selected. However, as mentioned by the authors, the derived augmented Lagrangian     V1  Id  method (ALM) based algorithm often fails to converge. To V   , G   . V2  Id  address this problem, an alternative algorithm using local     linear approximation has been proposed in [20]. The constrained minimization problem (8) can be attacked via Generally, these iterative algorithms cannot be directly ex- solving tended to use a nonconvex penalty, since they are not guaran- teed to converge in that case. With a nonconvex penalty, the 1 2 2 minΘSGVFFg (VV1 ) 1 ( 2 )  Θ  optimization problem is more difficult to handle than the Θ,,VV1 2 2 2 convex case. subject to diag(VI1 ) d (9)

III. PROPOSED ESTIMATORS USING GENERALIZED NONCONVEX PENALTIES

In this section, we propose a class of positive-definite covariance estimators using generalized penalties as follows

$\min_{\Theta} \ \frac{1}{2}\|\Theta - S\|_F^2 + g_\lambda(\Theta)$
subject to $\mathrm{diag}(\Theta) = I_d$ and $\Theta \succeq \epsilon_1 I_d$   (6)

where $g_\lambda$ is a generalized (possibly nonconvex) penalty, such as the SCAD, hard-, soft-, or $\ell_q$-penalty. The problem (6) can be equivalently rewritten as

$\min_{\Theta} \ \frac{1}{2}\|\Theta - S\|_F^2 + g_\lambda(\Theta) + \iota_{\mathcal{B}_1}(\Theta)$
subject to $\mathrm{diag}(\Theta) = I_d$   (7)

where $\mathcal{B}_1 := \{\Theta : \Theta \succeq \epsilon_1 I_d\}$ is a closed convex set and $\iota_{\mathcal{B}_1}(\cdot)$ denotes the indicator function defined on this set, i.e.,

$\iota_{\mathcal{B}_1}(\Theta) = \begin{cases} 0, & \Theta \in \mathcal{B}_1 \\ \infty, & \text{otherwise} \end{cases}$.

The problem (7) is generally difficult to solve since, in addition to the nonconvexity of the penalty term $g_\lambda$, both the terms $g_\lambda$ and $\iota_{\mathcal{B}_1}$ are nonsmooth. For nonconvex $g_\lambda$, the ALM algorithms proposed in [19, 20] cannot be directly used to solve (7), since these algorithms may fail to converge in this case. In the following, we propose an ADM algorithm to solve the problem (7).

ADM is a powerful optimization framework that is well suited to large-scale problems arising in machine learning and signal processing. In the ADM framework, the three terms in (7) are naturally separated, which makes the problem easy to tackle, as each step of the alternating minimization is much easier than the global problem. More specifically, using two auxiliary variables $V_1, V_2 \in \mathbb{R}^{d\times d}$, the problem (7) can be equivalently rewritten as

$\min_{\Theta, V_1, V_2} \ \frac{1}{2}\|\Theta - S\|_F^2 + g_\lambda(V_1) + \iota_{\mathcal{B}_1}(V_2)$
subject to $\mathrm{diag}(V_1) = I_d$, $\ G\Theta = V$   (8)

with

$V = \begin{bmatrix} V_1 \\ V_2 \end{bmatrix}, \qquad G = \begin{bmatrix} I_d \\ I_d \end{bmatrix}$.

The constrained minimization problem (8) can be attacked via solving

$\min_{\Theta, V_1, V_2} \ \frac{1}{2}\|\Theta - S\|_F^2 + g_\lambda(V_1) + \iota_{\mathcal{B}_1}(V_2) + \frac{\rho}{2}\|G\Theta - V\|_F^2$
subject to $\mathrm{diag}(V_1) = I_d$   (9)

where $\rho > 0$ is a penalty parameter. For sufficiently large $\rho$, e.g., $\rho \to \infty$, the solution of (9) approaches that of the problem (8). In practical applications, selecting a moderate value of $\rho$ suffices to achieve satisfactory performance. Then, ADM applied to (9) consists of the following three steps in the $(k+1)$-th iteration

$V_1^{k+1} = \arg\min_{\mathrm{diag}(V_1)=I_d} \left\{ g_\lambda(V_1) + \frac{\rho}{2}\|\Theta^k - V_1\|_F^2 + \frac{c_k}{2}\|V_1 - V_1^k\|_F^2 \right\}$   (10)

$V_2^{k+1} = \arg\min_{V_2} \left\{ \iota_{\mathcal{B}_1}(V_2) + \frac{\rho}{2}\|\Theta^k - V_2\|_F^2 + \frac{d_k}{2}\|V_2 - V_2^k\|_F^2 \right\}$   (11)

$\Theta^{k+1} = \arg\min_{\Theta} \left\{ \frac{1}{2}\|\Theta - S\|_F^2 + \frac{\rho}{2}\|G\Theta - V^{k+1}\|_F^2 \right\}$   (12)

where $c_k > 0$ and $d_k > 0$. This method considers the proximal regularization of the Gauss-Seidel scheme via coupling the Gauss-Seidel iteration scheme with a proximal term. Using this proximal regularization strategy, as will be shown later, the sequence generated via (10)-(12) is guaranteed to converge in the case of a nonconvex $g_\lambda$.

The $V_1$-subproblem (10) is a proximal minimization problem, which has the solution

$(V_1^{k+1})_{ij} = \begin{cases} T_{\frac{\lambda}{\rho+c_k}}\!\left( \dfrac{\rho\Theta_{ij}^k + c_k (V_1^k)_{ij}}{\rho + c_k} \right), & i \neq j \\ 1, & i = j \end{cases}$   (13)

where $T(\cdot)$ is the thresholding operation corresponding to the penalty function $g_\lambda$. The $V_2$-subproblem (11) is also a proximal minimization problem, whose solution is given by

$V_2^{k+1} = \mathcal{P}\!\left( \frac{\rho\Theta^k + d_k V_2^k}{\rho + d_k}, \, \epsilon_1 \right)$   (14)

where $\mathcal{P}(\cdot, \epsilon_1)$ is a spectral projection operator. Let $M = \sum_{i=1}^d \lambda_i v_i v_i^T$ denote the eigen-decomposition of $M$, where $\lambda_i$ and $v_i$ are the eigenvalues and eigenvectors; $\mathcal{P}(\cdot, \epsilon_1)$ is defined as $\mathcal{P}(M, \epsilon_1) = \sum_{i=1}^d \max(\lambda_i, \epsilon_1)\, v_i v_i^T$.

The objective function in the $\Theta$-subproblem (12) has a quadratic form, whose solution is directly given by

$\Theta^{k+1} = \frac{S + \rho G^T V^{k+1}}{2\rho + 1}$.   (15)

Next, we give a result for the convergence of the ADM algorithm (10)-(12) for generalized penalty functions. While the convergence properties of ADM have been extensively studied for the convex case, there have been only a few studies for the nonconvex case. Very recently, the convergence of ADM has been analyzed under nonconvex frameworks in [29, 30]. The following result is derived via adopting the approaches in [29, 30].

Theorem 1. Suppose that $g_\lambda$ is a closed, proper, lower semi-continuous, KL function. Let $Z^k = (\Theta^k, V_1^k, V_2^k)$; for an arbitrary starting point $Z^0$, the sequence $\{Z^k\}$ generated by the ADM algorithm via (10), (11) and (12) has finite length

$\sum_{k=1}^{\infty} \|Z^{k+1} - Z^k\|_F < \infty$.   (16)

In particular, the sequence $\{Z^k\}$ converges to a stationary point of the problem (9).

Proof: See Appendix A.

In the proposed algorithm, a large value of the penalty parameter $\rho$ is desirable in order to enforce that $\|G\Theta - V\|_F^2 \to 0$. When $\rho \to \infty$, the solution of (9) accurately approaches that of the problem (8). However, an ADM algorithm in general tends to be very slow when $\rho$ gets very large. In practical applications, a standard trick to speed up the algorithm is to adopt a continuation process for the penalty parameter. Specifically, we can use a properly small starting value of the penalty parameter and gradually increase it by iteration until reaching the target value, e.g., $\rho_0 \le \rho_1 \le \cdots \le \rho_K = \rho_{K+1} = \cdots = \rho_{\max}$. In this case, Theorem 1 still applies, as the penalty parameter becomes fixed (at $\rho_{\max}$) after a finite number of iterations. Moreover, in the case of a nonconvex penalty, the performance of the proposed algorithm is closely related to the initialization. Intensive numerical studies show that the new algorithm can achieve satisfactory performance with an initialization by a convex $\ell_1$-penalty based method.

IV. STATISTICAL PROPERTIES

In this section, we analyze the statistical properties of the proposed estimator for generalized nonconvex penalties. Similar to [20], we consider the following class of "approximately sparse" correlation matrices

$\mathcal{M}(q, M_d, \epsilon) := \left\{ \Theta : \max_i \sum_{j\neq i} |\Theta_{ij}|^q \le M_d, \ \Theta_{ii} = 1, \ \lambda_{\min}(\Theta) \ge \epsilon \right\}$   (17)

for $0 \le q < 1$. Further, we define a class of covariance matrices based on (17) as

$\mathcal{U}(q, M_d, v, \epsilon) := \left\{ \Sigma : \max_i \Sigma_{ii} \le v, \ \Sigma_d^{-1/2}\Sigma\,\Sigma_d^{-1/2} \in \mathcal{M}(q, M_d, \epsilon) \right\}$

where $\Sigma_d = \mathrm{diag}(\Sigma_{11}, \dots, \Sigma_{dd})$.

To derive the statistical properties, we assume each marginal distribution of $X = (X_1, \dots, X_d)$ is sub-Gaussian and satisfies the exponential-tail condition

$E\{\exp(tX_i^2)\} \le K < \infty$ for $|t| \le \eta$   (18)

for some constants $K$ and $\eta > 0$.

We first give a result for the special case of strictly sparse covariance matrices with $q = 0$, under the assumption that the penalty $g_\lambda$ satisfies

$\max_{(i,j)\in S} g'_\lambda(\Theta_{ij}^0) = O\!\left(\sqrt{\frac{\log d}{n}}\right)$ and $\max_{(i,j)\in S} g''_\lambda(\Theta_{ij}^0) = o(1)$   (19)

where $\Theta^0$ is the true correlation matrix and $S$ denotes the off-diagonal support of $\Theta^0$, i.e., $S := \{(i,j) : \Theta_{ij}^0 \neq 0, \ i \neq j\}$. For example, if the penalty parameter $\lambda$ is selected to satisfy

$\lambda \le \frac{1}{2}\min_{(i,j)\in S} |\Theta_{ij}^0|$

as $n \to \infty$, then for the hard-thresholding and SCAD penalties, $\max_{(i,j)\in S} g'_\lambda(\Theta_{ij}^0) = \max_{(i,j)\in S} g''_\lambda(\Theta_{ij}^0) = 0$ for sufficiently large $n$, while for the $\ell_1$-penalty, $\max_{(i,j)\in S} g'_\lambda(\Theta_{ij}^0) = \lambda$ and $\max_{(i,j)\in S} g''_\lambda(\Theta_{ij}^0) = 0$.

Theorem 2: Suppose that $\Sigma^0 \in \mathcal{U}(0, M_d, v, \epsilon_{\min})$ for some $\epsilon_{\min} > 0$, and let $s = |S|$ denote the number of nonzero off-diagonal elements in $\Sigma^0$. Under condition (18) and for a nonconvex penalty satisfying (19), if $\epsilon_1 \le \epsilon_{\min}$, $\lambda = O(\sqrt{s\log d / n})$ and $\sqrt{\log d / n} = o(1)$, then there exists at least one local minimizer $\hat{\Theta}$ of (6) such that

$\|\hat{\Theta} - \Theta^0\|_F = O_P\!\left(\sqrt{\frac{s\log d}{n}}\right)$.

Further, for the spectral norm of the covariance matrix $\hat{\Sigma} = \mathrm{diag}(R)^{1/2}\hat{\Theta}\,\mathrm{diag}(R)^{1/2}$,

$\|\hat{\Sigma} - \Sigma^0\|_2 = O_P\!\left(\sqrt{\frac{s\log d}{n}}\right)$.

In addition, for the $\ell_1$-norm penalty, we only need $\lambda = O(\sqrt{\log d / n})$.

Proof: See Appendix B.

Note that this result allows $d \gg n$ as long as $\sqrt{\log d / n} = o(1)$, and it achieves the minimax optimal rate of convergence under both the Frobenius and spectral norms [11]. This result is derived for the procedure in which the covariance matrix is obtained by first estimating the correlation matrix. If we estimate the covariance directly, the resulting rate of convergence would be $O_P(\sqrt{(d+s)\log d / n})$, where the part $\sqrt{d\log d / n}$ comes from estimating the diagonal.

Next, we give an asymptotic result for the generalized case of approximately sparse covariance matrices with $0 \le q < 1$. Similar to [15], we assume that the thresholding/shrinkage operator $T_\lambda$ corresponding to the penalty $g_\lambda$ satisfies

(i) $T_\lambda(x) = \mathrm{sign}(x)\, T_\lambda(|x|)$ and $|T_\lambda(x)| \le |x|$;

(ii) $T_\lambda(x) = 0$ for $|x| \le \lambda$;

(iii) $|T_\lambda(x) - x| \le \lambda$.   (20)

These conditions establish the sign consistency, shrinkage, thresholding, and limited shrinkage properties for the thresholding/shrinkage operator corresponding to a generalized penalty.

Theorem 3: Suppose that $\Sigma^0 \in \mathcal{U}(q, M_d, v, \epsilon_{\min})$ for some $\epsilon_{\min} > 0$, and the thresholding/shrinkage operator corresponding to $g_\lambda$ satisfies (20). Let $\hat{\Theta}$ denote the estimate given by (6). Under condition (18), there exist constants $c_0$ and $c_1$ such that, if $\epsilon_1 \le \epsilon_{\min}$, $\lambda = c_0\sqrt{\log d / n} = o(1)$ and $n \ge [(c_1 M_d)/(\epsilon_{\min} - \epsilon_1)]^{2/(1-q)}\log d$, then

$\|\hat{\Theta} - \Theta^0\|_F = O_P\!\left( M_d\left(\frac{\log d}{n}\right)^{(1-q)/2} \right)$.

Further, for the spectral norm of the covariance matrix $\hat{\Sigma} = \mathrm{diag}(R)^{1/2}\hat{\Theta}\,\mathrm{diag}(R)^{1/2}$,

$\|\hat{\Sigma} - \Sigma^0\|_2 = O_P\!\left( M_d\left(\frac{\log d}{n}\right)^{(1-q)/2} \right)$.

Theorem 3 is derived based on the results in [15]. Specifically, under the conditions in Theorem 3, it can be shown that the estimator (6) gives the same estimate as the generalized thresholding estimator (1) with overwhelming probability in the asymptotic case. The detailed proof follows similarly to that in [20] with some changes, and is omitted here for succinctness. Note that, in the strictly sparse case of $q = 0$, $M_d$ is a bound on the number of non-zero elements in each row (without counting the diagonal element). Thus, in this case $s$ is controlled via $M_d$, and the rate in Theorem 3 coincides with that in Theorem 2, even though the approaches of proof are very different.

In finite-sample cases, the generalized thresholding estimator (1) is not guaranteed to be positive-definite, but, empirically, it can be positive-definite in many situations. Thus, in practical applications, we can first check the minimum eigenvalue of the generalized thresholding estimator, and proceed to the proposed algorithm only when the feasibility condition of positive-definiteness is not satisfied.

V. EXTENSION TO COVARIANCE SKETCHING

Sketching via random linear measurements is a useful algorithmic tool for dimensionality reduction, which has been widely used in computer science [32, 33] and compressed sensing [34, 35]. Recently, covariance estimation from sketches or compressed measurements has attracted much attention in various fields of science and engineering [36-42]. An important application of covariance sketching arises in Big Data, e.g., covariance estimation from a high-dimensional data stream. To extract the covariance information from a high-dimensional real-time data stream at a high rate, it may be infeasible or undesirable to sample and store the whole stream due to memory and processing power constraints. In such a scenario, it is desirable to extract the covariance information of the data instances from compressed inputs with low memory requirements and computational complexity, and without storing the entire stream.

Let $x\in\mathbb{R}^d$ denote a zero-mean random vector with covariance matrix $\Sigma^0$, and let $A\in\mathbb{R}^{m\times d}$ be the sketching matrix with $m \ll d$. With compressed samples $y_t = Ax_t$, $t = 1, 2, \dots, n$, define the sample covariance of $y_t$ as

$Y = \frac{1}{n}\sum_{t=1}^{n} y_t y_t^T = ARA^T = A\Sigma^0 A^T + N$   (21)

where $N := A(R - \Sigma^0)A^T$ is a perturbation term. Under the assumption that $x$ is zero-mean and its covariance is finite, the perturbation $N$ in (21) satisfies $E\{N\} = 0$ and $E\{\|N\|_F^2\} = O(1/n)$. Since $m \ll d$, the reconstruction of $\Sigma^0$ from $Y$ is an ill-posed problem. However, if the covariance matrix $\Sigma^0$ has low-dimensional structures such as low-rankness and sparsity, it can be accurately reconstructed from $Y$ by exploiting such structural information. Given the sample covariance $Y$ of the compressed observations, we can find a positive definite estimate of $\Sigma^0$ via solving the following optimization problem

$\min_{\Sigma} \ \frac{1}{2}\|Y - A\Sigma A^T\|_F^2 + g_\lambda(\Sigma)$ subject to $\Sigma \succeq \epsilon_2 I_d$   (22)

where $\epsilon_2 \ge 0$ is the lower bound for the minimum eigenvalue of the covariance matrix. When $A = I_d$ and the covariance matrix is replaced by the correlation matrix, (22) reduces to the correlation estimation problem (5).
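As an illustration of the measurement model (21), here is a small Python sketch (ours; the Gaussian sketching matrix and all sizes are arbitrary choices, not prescribed by the paper) that forms the compressed-sample covariance $Y$ from a stream of sketches $y_t = Ax_t$.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 100, 40, 5000                          # ambient dim, sketch size (m << d), samples
A = rng.standard_normal((m, d)) / np.sqrt(m)     # assumed Gaussian sketching matrix

# a sparse positive-definite covariance to sample from (banded model of Section VI)
idx = np.arange(d)
Sigma0 = np.maximum(1.0 - np.abs(idx[:, None] - idx[None, :]) / 10.0, 0.0)

L = np.linalg.cholesky(Sigma0)
Y = np.zeros((m, m))
for _ in range(n):
    x = L @ rng.standard_normal(d)   # x ~ N(0, Sigma0), never stored in full
    y = A @ x                        # compressed sample y_t = A x_t, kept in R^m only
    Y += np.outer(y, y) / n          # running sample covariance of the sketches, cf. (21)

# Y approximates A Sigma0 A^T; the perturbation N = Y - A Sigma0 A^T has zero mean
print(np.linalg.norm(Y - A @ Sigma0 @ A.T, 'fro'))
```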

Very recently, an ALM algorithm has been proposed in [42] to solve (22) for convex penalties, e.g., $g_\lambda(\Sigma) = \lambda\|\Sigma\|_1$. However, this algorithm is not guaranteed to converge for a nonconvex $g_\lambda$. In the following, we show that the ADM algorithm presented in Section III can be extended to solve (22) with convergence guarantee for a nonconvex penalty.

Similar to (7), the problem (22) can be equivalently rewritten as

$\min_{\Sigma} \ \frac{1}{2}\|Y - A\Sigma A^T\|_F^2 + g_\lambda(\Sigma) + \iota_{\mathcal{B}_2}(\Sigma)$   (23)

where $\mathcal{B}_2 := \{\Sigma : \Sigma \succeq \epsilon_2 I_d\}$ is a closed convex set and $\iota_{\mathcal{B}_2}(\cdot)$ denotes the indicator function defined on this set. Using two auxiliary variables $\Gamma_1, \Gamma_2 \in \mathbb{R}^{d\times d}$, the problem (23) can be equivalently rewritten as

$\min_{\Sigma, \Gamma_1, \Gamma_2} \ \frac{1}{2}\|Y - A\Sigma A^T\|_F^2 + g_\lambda(\Gamma_1) + \iota_{\mathcal{B}_2}(\Gamma_2)$
subject to $G\Sigma = \Gamma$   (24)

with

$\Gamma = \begin{bmatrix} \Gamma_1 \\ \Gamma_2 \end{bmatrix}$.

The constrained minimization problem (24) can be attacked via solving

$\min_{\Sigma, \Gamma_1, \Gamma_2} \ \frac{1}{2}\|Y - A\Sigma A^T\|_F^2 + g_\lambda(\Gamma_1) + \iota_{\mathcal{B}_2}(\Gamma_2) + \frac{\rho}{2}\|G\Sigma - \Gamma\|_F^2$   (25)

where $\rho > 0$ is a penalty parameter. For sufficiently large $\rho$, e.g., $\rho \to \infty$, the solution of (25) accurately approaches that of the problem (24). Then, ADM applied to (25) consists of the following three steps in the $(k+1)$-th iteration

$\Gamma_1^{k+1} = \arg\min_{\Gamma_1} \left\{ g_\lambda(\Gamma_1) + \frac{\rho}{2}\|\Sigma^k - \Gamma_1\|_F^2 + \frac{c_k}{2}\|\Gamma_1 - \Gamma_1^k\|_F^2 \right\}$   (26)

$\Gamma_2^{k+1} = \arg\min_{\Gamma_2} \left\{ \iota_{\mathcal{B}_2}(\Gamma_2) + \frac{\rho}{2}\|\Sigma^k - \Gamma_2\|_F^2 + \frac{d_k}{2}\|\Gamma_2 - \Gamma_2^k\|_F^2 \right\}$   (27)

$\Sigma^{k+1} = \arg\min_{\Sigma} \left\{ \frac{1}{2}\|Y - A\Sigma A^T\|_F^2 + \frac{\rho}{2}\|G\Sigma - \Gamma^{k+1}\|_F^2 \right\}$.   (28)

The $\Gamma_1$- and $\Gamma_2$-subproblems can be respectively solved as

$(\Gamma_1^{k+1})_{ij} = \begin{cases} T_{\frac{\lambda}{\rho+c_k}}\!\left( \dfrac{\rho\Sigma_{ij}^k + c_k (\Gamma_1^k)_{ij}}{\rho + c_k} \right), & i \neq j \\ \Sigma_{ii}^k, & i = j \end{cases}$   (29)

and

$\Gamma_2^{k+1} = \mathcal{P}\!\left( \frac{\rho\Sigma^k + d_k \Gamma_2^k}{\rho + d_k}, \, \epsilon_2 \right)$.   (30)

Take the orthogonal eigen-decomposition of $A^T A$ as $A^T A = E\Lambda E^T$; the solution of the $\Sigma$-subproblem (28) is then given by (see Appendix C)

$\Sigma^{k+1} = E\left[ \left( E^T (A^T Y A + \rho\, G^T \Gamma^{k+1}) E \right) \oslash \left( aa^T + 2\rho\, 1_{d\times d} \right) \right] E^T$   (31)

where the vector $a\in\mathbb{R}^d$ contains the eigenvalues of $A^T A$, i.e., $\Lambda = \mathrm{diag}(a)$.

Theorem 4. Suppose that $g_\lambda$ is a closed, proper, lower semi-continuous, KL function. Let $Z^k = (\Sigma^k, \Gamma_1^k, \Gamma_2^k)$; for an arbitrary starting point $Z^0$, the sequence $\{Z^k\}$ generated by the ADM algorithm via (26), (27) and (28) has finite length

$\sum_{k=1}^{\infty} \|Z^{k+1} - Z^k\|_F < \infty$.   (32)

In particular, the sequence $\{Z^k\}$ converges to a stationary point of the problem (25).

Proof: See Appendix A.

In implementing the proposed algorithm, a large value of $\rho$ is desirable in order to enforce that $\|G\Sigma - \Gamma\|_F^2 \to 0$. Similar to the discussion in Section III, we can adopt a continuation process for the penalty parameter to speed up this algorithm. Moreover, for a nonconvex penalty, a good initialization, e.g., by a convex $\ell_1$-penalty based method, is crucial for this algorithm to achieve satisfactory performance.
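For concreteness, here is a short Python sketch (ours, not the released code) of the closed-form $\Sigma$-update (31); the eigendecomposition of $A^TA$ is computed once and reused at every iteration.

```python
import numpy as np

def sigma_update(Y, A, Gamma1, Gamma2, rho, E=None, a=None):
    # Closed-form minimizer of (28), cf. (31): solve A^T A S A^T A + 2*rho*S = Z
    # by transforming to the eigenbasis of A^T A = E diag(a) E^T.
    if E is None or a is None:
        a, E = np.linalg.eigh(A.T @ A)           # precompute once outside the loop
    Z = A.T @ Y @ A + rho * (Gamma1 + Gamma2)    # A^T Y A + rho * G^T Gamma
    denom = np.outer(a, a) + 2.0 * rho           # a a^T + 2*rho*1_{dxd}
    return E @ ((E.T @ Z @ E) / denom) @ E.T
```

The $\Gamma_1$- and $\Gamma_2$-steps reuse the element-wise thresholding and spectral projection helpers sketched earlier.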

VI. NUMERICAL EXPERIMENTS

We evaluate the proposed estimators using the SCAD, hard-, $\ell_q$-, and lasso penalties, termed SCAD-ADM, Hard-ADM, Lq-ADM and L1-ADM, respectively, in comparison with the ALM algorithm using the lasso penalty [20], termed L1-ALM. We chose $q = 0.5$ for Lq-ADM and set a lower bound of $10^{-3}$ for the minimum eigenvalue for each estimator. Moreover, for the proposed algorithms, a continuation process is used for the penalty parameter as $\rho_{k+1} = 1.09\rho_k$ if $\rho_k < \rho_{\max}$ and $\rho_{k+1} = \rho_{\max}$ otherwise. The thresholding parameter for each estimator is chosen by subsampling and fivefold cross-validation [13]. We conduct mainly two evaluation experiments, on simulated data sets and a real gene data set, respectively.

Matlab codes for the proposed estimators and for reproducing the results in the following experiments are available at https://github.com/FWen/Nonconvex-PDLCE.git.

A. Simulated Datasets

We consider three typical sparse covariance matrix models, which are standard test cases in the literature. Fig. 2 shows the heat maps of these three covariance models with $d = 100$.

Block matrix: Partition the indices $\{1, 2, \dots, d\}$ evenly into $K = d/20$ nonoverlapping subsets $S_1, \dots, S_K$; the covariance is given by

$(\Sigma^0)_{ij} = 0.2\, I(i = j) + 0.8\sum_{k=1}^{K} I(i \in S_k, j \in S_k)$.

Toeplitz matrix:

0 i j (Σ )ij  0.75 .

Banded matrix:

0 (Σ )ij  1  |i  j | 10 .  +


Fig. 2. Heat maps of the three simulated covariance matrices ((a) block, (b) Toeplitz, (c) banded) for $d = 100$.
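A short Python sketch (ours) that builds the three test covariance models; the block-model constants 0.2/0.8 follow the reconstructed definition above and are assumptions to be checked against the released code.

```python
import numpy as np

def block_cov(d, block=20):
    # (Sigma0)_ij = 0.2*I(i=j) + 0.8*I(i and j in the same block of size `block`)
    groups = np.repeat(np.arange(d // block), block)
    same = (groups[:, None] == groups[None, :]).astype(float)
    return 0.2 * np.eye(d) + 0.8 * same

def toeplitz_cov(d, r=0.75):
    # (Sigma0)_ij = r^{|i-j|}
    idx = np.arange(d)
    return r ** np.abs(idx[:, None] - idx[None, :])

def banded_cov(d, bandwidth=10):
    # (Sigma0)_ij = (1 - |i-j|/bandwidth)_+
    idx = np.arange(d)
    return np.maximum(1.0 - np.abs(idx[:, None] - idx[None, :]) / bandwidth, 0.0)
```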

We consider two dimension conditions, $d = 100$ and $d = 400$, and five sample-size conditions, $n \in \{100, 200, 400, 600, 800\}$. The performance is evaluated in terms of the relative error of estimation under both the Frobenius norm and the spectral norm. Each provided result is an average over 100 independent Monte Carlo runs. For each independent run, the data set (of size $n$) is generated from a $d$-dimensional Gaussian distribution with zero mean and covariance $\Sigma^0$.

Fig. 3 and Fig. 4 show the estimation performance of the compared estimators for $d = 100$ and $d = 400$, respectively. In the condition of $d = 100$, the proposed estimators with the nonconvex SCAD, hard- and $\ell_q$-penalties show considerable performance gain over the lasso penalty based L1-ALM and L1-ADM estimators, except for the case of the Toeplitz covariance model under the Frobenius norm metric. This is due to the fact that the considered Toeplitz covariance model is less sparse than the other two covariance models. For example, for the block covariance model and $n = 200$, the averaged estimation errors of the SCAD-ADM, Hard-ADM and Lq-ADM estimators under the Frobenius norm are approximately 57.3%, 59.7% and 62.3% that of the L1-ALM estimator, while those under the spectral norm are 66.4%, 69.5% and 74.2%, respectively.

Fig. 3. Estimation performance of the compared algorithms for $d = 100$ ((a) block, (b) Toeplitz, (c) banded).

The advantages of the SCAD-ADM, Hard-ADM and Lq-ADM estimators over the L1-ALM and L1-ADM estimators are more significant in the condition of $d = 400$, since the three covariance models in this condition are more sparse compared with the condition of $d = 100$. For example, in this condition, for the block covariance model and $n = 200$, the averaged estimation errors of the SCAD-ADM, Hard-ADM and Lq-ADM estimators under the Frobenius norm are approximately 59.5%, 59.1% and 61.3% that of the L1-ALM estimator, while those under the spectral norm are 66.3%, 67.5% and 67.3%, respectively. The proposed estimator with the lasso penalty (L1-ADM) performs comparably with the L1-ALM estimator.

Fig. 4. Estimation performance of the compared algorithms for $d = 400$ ((a) block, (b) Toeplitz, (c) banded).

Fig. 5 shows the eigenvalues of the compared estimators in a typical case for $d = 100$ and $n = 200$. Extensive numerical studies show that all the compared estimators can achieve a positive definiteness rate of 100%, but the generalized thresholding estimators often yield indefinite covariance matrices, similar to the results shown in [18-20].

Fig. 5. Typical plots of the eigenvalues of the compared estimators for $d = 100$ and $n = 200$.
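The evaluation protocol above is straightforward to script. The following Python sketch (ours) computes the average relative errors under the Frobenius and spectral norms over Monte Carlo runs, with a placeholder `estimate` callback standing in for any of the compared estimators.

```python
import numpy as np

def relative_errors(Sigma0, estimate, n=200, runs=100, seed=0):
    """Average relative error under the Frobenius and spectral norms.

    estimate(R) maps a sample covariance matrix R to a covariance estimate."""
    rng = np.random.default_rng(seed)
    d = Sigma0.shape[0]
    L = np.linalg.cholesky(Sigma0)
    err_f, err_2 = 0.0, 0.0
    for _ in range(runs):
        X = rng.standard_normal((n, d)) @ L.T        # n samples from N(0, Sigma0)
        R = X.T @ X / n                               # sample covariance
        Sigma_hat = estimate(R)
        err_f += np.linalg.norm(Sigma_hat - Sigma0, 'fro') / np.linalg.norm(Sigma0, 'fro')
        err_2 += np.linalg.norm(Sigma_hat - Sigma0, 2) / np.linalg.norm(Sigma0, 2)
    return err_f / runs, err_2 / runs
```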

B. Gene Clustering Example

Gene clustering based on the correlations among genes is a popular technique in gene expression data analysis [43]. Here we consider a gene clustering example using a gene expression dataset from a small round blue-cell tumors (SRBCTs) microarray experiment [44] to further evaluate the compared methods. This dataset contains 88 SRBCT tissue samples, with 63 labeled calibration samples and 25 test samples, and 2308 gene expression values are recorded for each of the samples. These 2308 genes were selected out of 6567 originally measured genes by requiring that each gene has a red intensity greater than 20 over all the tissue samples. We use the 63 labeled calibration samples and pick the top 40 and bottom 160 genes based on their F-statistics, as done in [15]. Accordingly, the top 40 genes are informative while the bottom 160 genes are non-informative, and the dependence between these two parts is weak. We apply hierarchical clustering (group average agglomerative clustering) to the genes using the correlations estimated by the compared methods; a minimal clustering script is sketched below. The generalized thresholding estimators [15], including soft-, hard- and SCAD-thresholding, are also tested in the comparison.

Fig. 6 shows the heat map of the gene expression data of the top 40 genes of the 63 samples. The genes are sorted by hierarchical clustering based on the sample correlations, and the samples are sorted by tissue class (there are four classes of tumors in the sample, including 23 EWS, 8 BL, 12 NB, and 20 RMS).

Fig. 7 shows the heat maps of the absolute values of the estimated correlations by the compared estimators for the selected 200 genes. For each estimator, the presented heat map is ordered by group average agglomerative clustering based on the estimated correlation matrix. Meanwhile, for each estimated correlation matrix, the percentage of the entries with absolute values less than $10^{-5}$ is also shown. Fig. 8 plots the bottom 80 eigenvalues (ordered in descending values) of the compared estimators. It can be clearly seen from the heat maps that, compared with the L1-ALM and L1-ADM estimators, the SCAD-ADM, Hard-ADM and Lq-ADM estimators give cleaner and more informative estimates of the sparsity pattern. Moreover, the soft-, hard- and SCAD-thresholding estimators yield cleaner estimates of the sparsity pattern than the iterative positive-definite estimators. However, from Fig. 8, while the estimates of the proposed methods and the L1-ALM method are positive-definite, the estimates of the soft-, hard- and SCAD-thresholding methods are indefinite. Specifically, these three generalized thresholding estimators contain 13, 40 and 10 negative eigenvalues, respectively.
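A minimal Python sketch (ours) of the group-average agglomerative clustering step, using SciPy and treating $1 - |\hat{\rho}_{ij}|$ as the dissimilarity between genes; the estimator used to produce `Theta_hat` is interchangeable.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_genes(Theta_hat, n_clusters=4):
    # Group-average (UPGMA) agglomerative clustering of genes from an
    # estimated correlation matrix, using 1 - |rho_ij| as dissimilarity.
    D = 1.0 - np.abs(Theta_hat)
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method='average')
    return fcluster(Z, t=n_clusters, criterion='maxclust'), Z
```

The leaf ordering implied by `Z` can then be used to sort the rows and columns of the heat maps as in Figs. 6 and 7.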

Fig. 6. Heat maps of (a) the gene expression data and (b) the sample correlations of the top 40 genes (with genes sorted by hierarchical clustering and tissue samples sorted by tissue class).

VII. CONCLUSION

This work proposed a class of positive-definite covariance estimators using generalized nonconvex penalties, and developed an alternating direction algorithm to efficiently solve the corresponding formulation. The proposed algorithm is guaranteed to converge in the case of a generalized nonconvex penalty. The established statistical properties of the new estimators indicate that, for generalized nonconvex penalties, they can attain the optimal rates of convergence in terms of both the Frobenius norm and spectral norm errors. Extension of the proposed algorithm to covariance sketching has been discussed. In the simulation study, the new estimators showed significantly lower estimation error under both the Frobenius norm and the spectral norm compared with lasso penalty based estimators. The advantage of the new estimators has also been demonstrated by a gene clustering example using a gene expression dataset from a small round blue-cell tumors microarray experiment.

Fig. 8. Plots of the bottom 80 eigenvalues of the compared estimators.

Fig. 7. Heat maps of the absolute values of the estimated correlations by the compared estimators for the selected 200 genes. The percentage of the entries with absolute values less than $10^{-5}$ is given in parentheses.

APPENDIX A

Note that the ADM algorithm (10)-(12) is a special case of the algorithm (26)-(28) when $A = I_d$ and the covariance matrix is replaced by the correlation matrix. Accordingly, we only present the proof of Theorem 4, while Theorem 1 can be derived in a similar manner. Theorem 4 is derived via adopting the approach in [29, 30]; we only sketch the proof here. In the sequel, for convenience we use the notations

$L(\Sigma, \Gamma_1, \Gamma_2) := \frac{1}{2}\|Y - A\Sigma A^T\|_F^2 + g_\lambda(\Gamma_1) + \iota_{\mathcal{B}_2}(\Gamma_2) + \frac{\rho}{2}\|G\Sigma - \Gamma\|_F^2$   (33)

and $Z^k := (\Sigma^k, \Gamma_1^k, \Gamma_2^k)$.

We first give the following lemmas used in the proof of Theorem 4.

Lemma 1. Let $\{Z^k\}$ be a sequence generated by the ADM algorithm (26)-(28). Then the sequence $\{L(Z^k)\}$ is nonincreasing,

$\sum_{k\ge 0} \|Z^{k+1} - Z^k\|_F^2 < \infty$,

and hence $\lim_{k\to\infty}\|Z^{k+1} - Z^k\|_F = 0$.

Lemma 2. Let $\{Z^k\}$ be a sequence generated by the ADM algorithm (26)-(28), and define

$B_{\Gamma_1}^k = c_{k-1}(\Gamma_1^{k-1} - \Gamma_1^k) + \rho(\Sigma^{k-1} - \Sigma^k)$,
$B_{\Gamma_2}^k = d_{k-1}(\Gamma_2^{k-1} - \Gamma_2^k) + \rho(\Sigma^{k-1} - \Sigma^k)$.

Then $(0, B_{\Gamma_1}^k, B_{\Gamma_2}^k) \in \partial L(Z^k)$ and there exists $C_1 > 0$ such that

$\|(0, B_{\Gamma_1}^k, B_{\Gamma_2}^k)\|_F \le C_1\|Z^k - Z^{k-1}\|_F$.

Lemma 3. Let $\{Z^k\}$ be a sequence generated by the ADM algorithm (26)-(28), and denote the cluster point set by $\Omega$. The following assertions hold.
(i) Any cluster point of $\{Z^k\}$ is a stationary point of $L$.
(ii) $\lim_{k\to\infty}\mathrm{dist}(Z^k, \Omega) = 0$ and $\Omega$ is a nonempty, compact and connected set.
(iii) The objective function $L$ is finite and constant on $\Omega$.

Proof of Lemma 1: From the definition of $\Gamma_1^{k+1}$ as a minimizer of the objective in (26), we have

$L(\Sigma^k, \Gamma_1^{k+1}, \Gamma_2^k) \le L(\Sigma^k, \Gamma_1^k, \Gamma_2^k) - \frac{c_k}{2}\|\Gamma_1^{k+1} - \Gamma_1^k\|_F^2$.   (34)

Similarly, it follows from (27) that

$L(\Sigma^k, \Gamma_1^{k+1}, \Gamma_2^{k+1}) \le L(\Sigma^k, \Gamma_1^{k+1}, \Gamma_2^k) - \frac{d_k}{2}\|\Gamma_2^{k+1} - \Gamma_2^k\|_F^2$.   (35)

Moreover, the Hessian of $L(\Sigma, \Gamma_1, \Gamma_2)$ with respect to $\Sigma$ satisfies

$\nabla_\Sigma^2 L(\Sigma, \Gamma_1, \Gamma_2) = (A^TA)\otimes(A^TA) + 2\rho I_{d^2} \succeq [\lambda_{\min}^2(A^TA) + 2\rho]\, I_{d^2}$   (36)

where the inequality follows from $(A^TA)\otimes(A^TA) \succeq \lambda_{\min}^2(A^TA)\, I_{d^2}$, since the eigenvalues of $A\otimes B$ are the pairwise products of the eigenvalues of $A$ and $B$. The inequality in (36) implies that the objective function in the $\Sigma$-subproblem (28) is $(\lambda_{\min}^2(A^TA) + 2\rho)$-strongly convex with respect to $\Sigma$. Then, from the definition of $\Sigma^{k+1}$ as a minimizer, i.e., $\nabla_\Sigma L(\Sigma^{k+1}, \Gamma_1^{k+1}, \Gamma_2^{k+1}) = 0$, we have

$L(\Sigma^{k+1}, \Gamma_1^{k+1}, \Gamma_2^{k+1}) \le L(\Sigma^k, \Gamma_1^{k+1}, \Gamma_2^{k+1}) - \frac{\lambda_{\min}^2(A^TA) + 2\rho}{2}\|\Sigma^{k+1} - \Sigma^k\|_F^2$.   (37)

Summing (34), (35) and (37), we obtain

$L(\Sigma^k, \Gamma_1^k, \Gamma_2^k) - L(\Sigma^{k+1}, \Gamma_1^{k+1}, \Gamma_2^{k+1}) \ge \frac{c_k}{2}\|\Gamma_1^{k+1} - \Gamma_1^k\|_F^2 + \frac{d_k}{2}\|\Gamma_2^{k+1} - \Gamma_2^k\|_F^2 + \frac{\lambda_{\min}^2(A^TA) + 2\rho}{2}\|\Sigma^{k+1} - \Sigma^k\|_F^2$   (38)

which implies that $L(\Sigma^k, \Gamma_1^k, \Gamma_2^k)$ is nonincreasing. It follows from (38) that we can find a positive constant $C_0 > 0$ such that

$L(\Sigma^k, \Gamma_1^k, \Gamma_2^k) - L(\Sigma^{k+1}, \Gamma_1^{k+1}, \Gamma_2^{k+1}) \ge \frac{C_0}{2}\|Z^{k+1} - Z^k\|_F^2$.   (39)

Let $N$ be a positive integer; summing up (39) from $k = 0$ to $N-1$ we have

$\sum_{k=0}^{N-1}\|Z^{k+1} - Z^k\|_F^2 \le \frac{2}{C_0}\left[L(Z^0) - L(Z^N)\right]$.

Since $L$ is lower semi-continuous and all its sub-functions are bounded from below, $L$ is bounded from below. Further, since $L(Z^k)$ is nonincreasing, it converges to some real number $L^*$. Taking the limit as $N\to\infty$, we obtain

$\sum_{k=0}^{\infty}\|Z^{k+1} - Z^k\|_F^2 \le \frac{2}{C_0}\left[L(Z^0) - L^*\right] < \infty$.

Proof of Lemma 2: From the definition of the iterative steps (26)-(28), $(\Sigma^k, \Gamma_1^k, \Gamma_2^k)$ given by the $k$-th iteration satisfies

$c_{k-1}(\Gamma_1^{k-1} - \Gamma_1^k) + \rho(\Sigma^{k-1} - \Gamma_1^k) \in \partial g_\lambda(\Gamma_1^k)$   (40)

$d_{k-1}(\Gamma_2^{k-1} - \Gamma_2^k) + \rho(\Sigma^{k-1} - \Gamma_2^k) \in \partial\iota_{\mathcal{B}_2}(\Gamma_2^k)$   (41)

and

$0 = \nabla_\Sigma L(Z^k)$.   (42)

Then, from (40) and (41), it is easy to see that

$B_{\Gamma_1}^k \in \partial_{\Gamma_1} L(Z^k), \qquad B_{\Gamma_2}^k \in \partial_{\Gamma_2} L(Z^k)$

which together with (42) implies $(0, B_{\Gamma_1}^k, B_{\Gamma_2}^k) \in \partial L(Z^k)$. Furthermore, it is easy to see from (33) that the sequence $\{Z^k\}$ is bounded, since all the sub-functions of $L$ are coercive. Thus, we have

$\|(0, B_{\Gamma_1}^k, B_{\Gamma_2}^k)\|_F \le \|B_{\Gamma_1}^k\|_F + \|B_{\Gamma_2}^k\|_F \le c_{k-1}\|\Gamma_1^{k-1} - \Gamma_1^k\|_F + d_{k-1}\|\Gamma_2^{k-1} - \Gamma_2^k\|_F + 2\rho\|\Sigma^{k-1} - \Sigma^k\|_F \le C_1\|Z^k - Z^{k-1}\|_F$   (43)

for some $C_1 > 0$.

Proof of Lemma 3: Since $\{Z^k\}$ is bounded, for a cluster point $Z^* := (\Sigma^*, \Gamma_1^*, \Gamma_2^*)$, there exists a subsequence $\{Z^{k_j}\}$ which converges to $Z^*$. Since $g_\lambda(\Gamma_1)$ is lower semi-continuous, it follows that

$\liminf_{j\to\infty} g_\lambda(\Gamma_1^{k_j}) \ge g_\lambda(\Gamma_1^*)$.   (44)

From the definition of $\Gamma_1^{k_j}$ as a minimizer, it follows that

$L(\Sigma^{k_j-1}, \Gamma_1^{k_j}, \Gamma_2^{k_j-1}) \le L(\Sigma^{k_j-1}, \Gamma_1^*, \Gamma_2^{k_j-1})$.

Then, taking the limit as $j\to\infty$, we obtain

$\limsup_{j\to\infty}\left[g_\lambda(\Gamma_1^{k_j}) + \frac{\rho}{2}\|\Sigma^* - \Gamma_1^{k_j}\|_F^2\right] \le g_\lambda(\Gamma_1^*) + \frac{\rho}{2}\|\Sigma^* - \Gamma_1^*\|_F^2$

which together with (44) yields $\lim_{j\to\infty} g_\lambda(\Gamma_1^{k_j}) = g_\lambda(\Gamma_1^*)$. In a similar manner, we have $\lim_{j\to\infty}\iota_{\mathcal{B}_2}(\Gamma_2^{k_j}) = \iota_{\mathcal{B}_2}(\Gamma_2^*)$ and finally get $\lim_{j\to\infty} L(Z^{k_j}) = L(Z^*)$.

From Lemma 2, $(0, B_{\Gamma_1}^k, B_{\Gamma_2}^k) \in \partial L(Z^k)$ and $(0, B_{\Gamma_1}^k, B_{\Gamma_2}^k) \to (0, 0, 0)$ as $k\to\infty$, which together with the closedness property of $\partial L$ implies that $(0, 0, 0) \in \partial L(Z^*)$. Thus, $Z^*$ is a stationary point of $L$.

Property (ii) is generic for any sequence $\{Z^k\}$ which satisfies $\lim_{k\to\infty}\|Z^{k+1} - Z^k\|_F = 0$; the proof is given in Lemma 5 in [30]. Property (iii) is straightforward since $L(Z^k)$ is convergent (see Lemma 1).

Proof of Theorem 4: Based on the above lemmas, the rest of the proof of Theorem 4 is to show that the generated sequence $\{Z^k\}$ has finite length, i.e.,

$\sum_{k=0}^{\infty}\|Z^{k+1} - Z^k\|_F < \infty$   (45)

which implies that $\{Z^k\}$ is a Cauchy sequence and thus is convergent. This property together with Lemma 3 then implies that $\{Z^k\}$ converges to a stationary point of $L$. The derivation of (45) relies heavily on the KL property of $L$, which holds if the penalty $g_\lambda$ is a KL function. This is the case for all the considered hard-thresholding, soft-thresholding, SCAD and $\ell_q$-norm (with rational $q$) penalties. With the above lemmas, the proof of (45) follows similarly to the proof of Theorem 1 in [30] with some minor changes, and is thus omitted here for succinctness.

APPENDIX B

The derivation of Theorem 2 follows similarly to [21, 22]. Consider the set

$\mathcal{B} := \left\{ \Delta : \Delta = \Delta^T, \ \Delta + \Theta^0 \succeq \epsilon_1 I_d, \ \|\Delta\|_F = C_1\sqrt{\frac{s\log d}{n}} \right\}$

where $\epsilon_1 \le \epsilon_{\min}$. Let

$f(\Theta^0 + \Delta) = \frac{1}{2}\|\Theta^0 + \Delta - S\|_F^2 + g_\lambda(\Theta^0 + \Delta)$.

If we can show that

$\inf\{f(\Theta^0 + \Delta) : \Delta\in\mathcal{B}\} > f(\Theta^0)$   (46)

then there exists at least one local minimizer $\hat{\Theta}$ such that

$\|\hat{\Theta} - \Theta^0\|_F \le C_1\sqrt{\frac{s\log d}{n}}$.   (47)

Define $G(\Delta) = f(\Theta^0 + \Delta) - f(\Theta^0)$. To show (46), it is sufficient to show that $G(\Delta) > 0$ for $\Delta\in\mathcal{B}$. $G(\Delta)$ can be expressed as

$G(\Delta) = \frac{1}{2}\|\Theta^0 + \Delta - S\|_F^2 - \frac{1}{2}\|\Theta^0 - S\|_F^2 + g_\lambda(\Theta^0+\Delta) - g_\lambda(\Theta^0) = \frac{1}{2}\|\Delta\|_F^2 - \langle S - \Theta^0, \Delta\rangle + g_\lambda(\Theta^0+\Delta) - g_\lambda(\Theta^0)$.   (48)

For $\sqrt{\log d / n} = o(1)$, with probability tending to 1 we have [9]

$\max_{i\neq j}|(S - \Theta^0)_{ij}| \le C_2\sqrt{\frac{\log d}{n}}$.

Then, it follows that

$\langle S - \Theta^0, \Delta\rangle = \sum_{i\neq j}(S - \Theta^0)_{ij}\Delta_{ij} + \sum_i (S - \Theta^0)_{ii}\Delta_{ii} \le \max_{i\neq j}|(S - \Theta^0)_{ij}|\,\|\Delta\|_{1,\mathrm{off}} \le C_2\sqrt{\frac{\log d}{n}}\|\Delta\|_{1,\mathrm{off}}$   (49)

with probability tending to 1, where $S_{ii} = \Theta^0_{ii} = 1$ is used. Let $M_S$ and $M_{S^c}$ denote the projections of the matrix $M$ onto the subspace $S$ and its complement, respectively. Under the assumption that the penalty function $g_\lambda$ is decomposable, i.e., $g_\lambda(M) = g_\lambda(M_S) + g_\lambda(M_{S^c})$, and with $\Theta^0_{S^c} = 0$, we have

$g_\lambda(\Theta^0 + \Delta) - g_\lambda(\Theta^0) = g_\lambda(\Theta^0_S + \Delta_S) + g_\lambda(\Delta_{S^c}) - g_\lambda(\Theta^0_S)$.   (50)

Plugging (49) and (50) into (48) yields

$G(\Delta) \ge \underbrace{\frac{1}{2}C_1^2\frac{s\log d}{n}}_{K_1} + \underbrace{\left[g_\lambda(\Theta^0_S + \Delta_S) - g_\lambda(\Theta^0_S) - C_2\sqrt{\frac{\log d}{n}}\|\Delta_S\|_1\right]}_{K_2} + \underbrace{\left[g_\lambda(\Delta_{S^c}) - C_2\sqrt{\frac{\log d}{n}}\|\Delta_{S^c}\|_{1,\mathrm{off}}\right]}_{K_3}$.   (51)

Next, we bound the three terms $K_1$, $K_2$ and $K_3$. Since $\|\Delta_S\|_1 \le \sqrt{s}\|\Delta_S\|_F \le \sqrt{s}\|\Delta\|_F$, we have

$C_2\sqrt{\frac{\log d}{n}}\|\Delta_S\|_1 \le C_1 C_2\frac{s\log d}{n}$.   (52)

Using Taylor's expansion of $f(t) = g_\lambda(\Theta^0_S + t\Delta_S)$ with the integral form of the remainder, it follows that

$\left|g_\lambda(\Theta^0_S + \Delta_S) - g_\lambda(\Theta^0_S)\right| \le \left|\langle g'_\lambda(\Theta^0_S), \Delta_S\rangle\right| + \left|\mathrm{vec}(\Delta_S)^T\!\left[\int_0^1 (1-t)\, g''_\lambda(\Theta^0_S + t\Delta_S)\,dt\right]\mathrm{vec}(\Delta_S)\right| \le \max_{(i,j)\in S} g'_\lambda(\Theta^0_{ij})\,\|\Delta_S\|_1 + o(1)\|\Delta_S\|_F^2 = O\!\left(C_1\frac{s\log d}{n}\right) + o(1)\,C_1^2\frac{s\log d}{n}$   (53)

where the conditions $\max_{(i,j)\in S} g'_\lambda(\Theta^0_{ij}) = O(\sqrt{\log d/n})$ and $\max_{(i,j)\in S} g''_\lambda(\Theta^0_{ij}) = o(1)$ are used. Hence, by (52) and (53), $|K_2|$ is of smaller order than $K_1$ for a sufficiently large constant $C_1$. Moreover, since

$\sup_{\Delta\in\mathcal{B}}|\Delta_{ij}| \le C_1\, O\!\left(\sqrt{\frac{s\log d}{n}}\right)$

we can find a constant $C_3 > 0$ such that $g_\lambda(\Delta_{ij}) \ge C_3\lambda|\Delta_{ij}|$ for all $(i,j)\in S^c$. Then, it follows that

$K_3 \ge \left(C_3\lambda - C_2\sqrt{\frac{\log d}{n}}\right)\|\Delta_{S^c}\|_{1,\mathrm{off}} \ge 0$   (54)

for $\lambda = O(\sqrt{s\log d/n})$ as in Theorem 2. Consequently, by taking a sufficiently large constant $C_1$, we have $G(\Delta) > 0$ with probability tending to 1, which completes the proof. For the $\ell_1$-norm penalty, we can set $\lambda = C_3\sqrt{\log d/n}$ with $C_3 > C_2$ such that (46) holds, since $g_\lambda(\Delta_{S^c}) = \lambda\|\Delta_{S^c}\|_{1,\mathrm{off}}$.

The proof of the spectral norm result follows from the same argument as Theorem 4.2 in [20], and is omitted for conciseness.

APPENDIX C

The objective function in the $\Sigma$-subproblem (28) is quadratic in $\Sigma$, and thus it has an analytical solution. Specifically, from the first-order optimality condition, the minimizer of the $\Sigma$-subproblem satisfies

$A^TA\,\Sigma^{k+1}A^TA + 2\rho\Sigma^{k+1} = A^TYA + \rho\, G^T\Gamma^{k+1}$.   (55)

We can construct a matrix $\Sigma^{k+1}$ that satisfies this condition and thus minimizes the objective. Let $Z^k = A^TYA + \rho\, G^T\Gamma^{k+1}$; it follows that

$\Lambda\, E^T\Sigma^{k+1}E\,\Lambda + 2\rho\, E^T\Sigma^{k+1}E = E^T Z^k E$

which can be equivalently written as

$(aa^T)\odot(E^T\Sigma^{k+1}E) + (2\rho\, 1_{d\times d})\odot(E^T\Sigma^{k+1}E) = E^T Z^k E$.   (56)

Then, it follows from (56) that

$E^T\Sigma^{k+1}E = \left[E^T Z^k E\right] \oslash \left(aa^T + 2\rho\, 1_{d\times d}\right)$   (57)

which finally results in (31).

REFERENCES

[1] J. Fan, F. Han, and H. Liu, "Challenges of big data analysis," National Science Review, vol. 1, pp. 293–314, 2014.
[2] R. C. Qiu and P. Antonik, Smart Grid and Big Data: Theory and Practice. John Wiley & Sons, 2015.
[3] C. Zhang and R. C. Qiu, "Massive MIMO as a big data system: Random matrix models and testbed," IEEE Access, vol. 3, pp. 837–851, 2015.
[4] Y. He, F. R. Yu, N. Zhao, H. Yin, H. Yao, and R. C. Qiu, "Big data analytics in mobile cellular networks," IEEE Access, Mar. 2016.
[5] J. Fan, Y. Liao, and H. Liu, "An overview on the estimation of large covariance and precision matrices," arXiv:1504, 2015.
[6] V. A. Marcenko and L. A. Pastur, "Distribution of eigenvalues for some sets of random matrices," Sbornik: Mathematics, vol. 1, pp. 457–483, 1967.
[7] I. Johnstone, "On the distribution of the largest eigenvalue in principal components analysis," The Annals of Statistics, vol. 29, pp. 295–327, 2001.
[8] W. Wu and M. Pourahmadi, "Nonparametric estimation of large covariance matrices of longitudinal data," Biometrika, vol. 90, pp. 831–844, 2003.
[9] P. Bickel and E. Levina, "Regularized estimation of large covariance matrices," The Annals of Statistics, vol. 36, pp. 199–227, 2008.
[10] R. Furrer and T. Bengtsson, "Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants," Journal of Multivariate Analysis, vol. 98, pp. 227–255, 2007.
[11] T. Cai, C. Zhang, and H. Zhou, "Optimal rates of convergence for covariance matrix estimation," The Annals of Statistics, vol. 38, pp. 2118–2144, 2010.
[12] T. Cai and W. Liu, "Adaptive thresholding for sparse covariance matrix estimation," Journal of the American Statistical Association, vol. 106, pp. 672–684, 2011.
[13] P. Bickel and E. Levina, "Covariance regularization by thresholding," The Annals of Statistics, vol. 36, pp. 2577–2604, 2008.
[14] N. E. Karoui, "Operator norm consistent estimation of large dimensional sparse covariance matrices," The Annals of Statistics, vol. 36, pp. 2717–2756, 2008.
[15] A. Rothman, E. Levina, and J. Zhu, "Generalized thresholding of large covariance matrices," Journal of the American Statistical Association, vol. 104, pp. 177–186, 2009.
[16] D. M. Witten and R. Tibshirani, "Covariance-regularized regression and classification for high-dimensional problems," J. R. Statist. Soc. Ser. B, vol. 71, pp. 615–636, 2009.
[17] J. Bien and R. J. Tibshirani, "Sparse estimation of a covariance matrix," Biometrika, vol. 98, no. 4, pp. 807–820, 2011.
[18] A. J. Rothman, "Positive definite estimators of large covariance matrices," Biometrika, vol. 99, no. 3, pp. 733–740, 2012.
[19] L. Xue, S. Ma, and H. Zou, "Positive definite L1 penalized estimation of large covariance matrices," Journal of the American Statistical Association, vol. 107, pp. 1480–1491, 2012.
[20] H. Liu, L. Wang, and T. Zhao, "Sparse covariance matrix estimation with eigenvalue constraints," Journal of Computational and Graphical Statistics, vol. 23, no. 2, pp. 439–459, 2014.
[21] C. Lam and J. Fan, "Sparsistency and rates of convergence in large covariance matrix estimation," The Annals of Statistics, vol. 37, pp. 4254–4278, 2009.
[22] A. J. Rothman, P. J. Bickel, E. Levina, and J. Zhu, "Sparse permutation invariant covariance estimation," Electronic Journal of Statistics, vol. 2, pp. 494–515, 2008.
[23] A. Antoniadis, "Wavelets in statistics: A review" (with discussion), Journal of the Italian Statistical Association, vol. 6, pp. 97–144, 1997.
[24] R. J. Tibshirani, "Regression shrinkage and selection via the LASSO," Journal of the Royal Statistical Society, Ser. B, vol. 58, pp. 267–288, 1996.
[25] J. Fan and R. Li, "Variable selection via nonconcave penalized likelihood and its oracle properties," Journal of the American Statistical Association, vol. 96, pp. 1348–1360, 2001.
[26] G. Marjanovic and V. Solo, "On ℓq optimization and matrix completion," IEEE Trans. Signal Process., vol. 60, no. 11, pp. 5714–5724, 2012.
[27] Z. Xu, X. Chang, F. Xu, and H. Zhang, "L1/2 regularization: A thresholding representation theory and a fast solver," IEEE Trans. Neural Networks and Learning Systems, vol. 23, no. 7, pp. 1013–1027, 2012.
[28] H. Zou, "The adaptive lasso and its oracle properties," Journal of the American Statistical Association, vol. 101, pp. 1418–1429, 2006.
[29] H. Attouch, J. Bolte, P. Redont, and A. Soubeyran, "Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Lojasiewicz inequality," Mathematics of Operations Research, vol. 35, pp. 438–457, 2010.

[30] J. Bolte, S. Sabach, and M. Teboulle, "Proximal alternating linearized minimization for nonconvex and nonsmooth problems," Mathematical Programming, vol. 146, pp. 459–494, 2014.
[31] F. Wen, P. Liu, Y. Liu, R. C. Qiu, and W. Yu, "Robust sparse recovery for compressive sensing in impulsive noise using Lp-norm model fitting," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2016, pp. 3885–3888.
[32] K. J. Ahn, S. Guha, and A. McGregor, "Graph sketches: Sparsification, spanners, and subgraphs," in Proc. 31st Symposium on Principles of Database Systems, ACM, 2012, pp. 5–14.
[33] Y. Chi, "Compressive graph clustering from random sketches," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 5466–5469.
[34] D. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[35] E. Candes and T. Tao, "Decoding by linear programming," IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[36] D. Ariananda and G. Leus, "Compressive wideband power spectrum estimation," IEEE Trans. on Signal Proc., vol. 60, no. 9, pp. 4775–4789, 2012.
[37] A. Lexa, M. Davies, J. Thompson, and J. Nikolic, "Compressive power spectral density estimation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2011, pp. 3884–3887.
[38] D. Cohen and Y. C. Eldar, "Sub-Nyquist sampling for power spectrum sensing in cognitive radios: A unified approach," CoRR, vol. abs/1308.5149, 2013.
[39] S. A. Razavi, M. Valkama, and D. Cabric, "High-resolution cyclic spectrum reconstruction from sub-Nyquist samples," IEEE SPAWC, pp. 205–254, 2013.
[40] T. Wimalajeewa, Y. C. Eldar, and P. K. Varshney, "Recovery of sparse matrices via matrix sketching," CoRR, vol. abs/1311.2448, 2013.
[41] G. Dasarathy, P. Shah, B. Bhaskar, and R. Nowak, "Sketching sparse matrices," arXiv preprint, arXiv:1303.6544, 2013.
[42] J. M. Bioucas-Dias, D. Cohen, and Y. C. Eldar, "Covalsa: Covariance estimation from compressive measurements using alternating minimization," in Proc. 22nd European Signal Processing Conference (EUSIPCO), 2014, pp. 999–1003.
[43] M. Eisen, P. Spellman, P. Brown, and D. Botstein, "Cluster analysis and display of genome-wide expression patterns," Proceedings of the National Academy of Sciences of the United States of America, vol. 95, pp. 14863–14868, 1998.
[44] J. Khan, J. Wei, M. Ringner, L. Saal, M. Ladanyi, F. Westermann, F. Berthold, M. Schwab, C. R. Antonescu, C. Peterson, and P. Meltzer, "Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks," Nature Medicine, vol. 7, pp. 673–679, 2001.