arXiv:1810.08360v1 [cs.IT] 19 Oct 2018

Linear Shrinkage Estimation of Covariance Matrices Using Low-Complexity Cross-Validation

Jun Tong, Rui Hu, Jiangtao Xi, Zhitao Xiao, Qinghua Guo, and Yanguang Yu

J. Tong, R. Hu, J. Xi, Q. Guo and Y. Yu are with the School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, NSW 2522, Australia. Email: {jtong, jiangtao, qguo, yanguang}@uow.edu.au. Z. Xiao is with the School of Electronic and Information Engineering, Tianjin Polytechnic University, Tianjin, China.

Abstract—Shrinkage can effectively improve the condition number and accuracy of covariance matrix estimation, especially for low-sample-support applications with the number of training samples smaller than the dimensionality. This paper investigates parameter choice for linear shrinkage estimators. We propose data-driven, leave-one-out cross-validation (LOOCV) methods for automatically choosing the shrinkage coefficients, aiming to minimize the Frobenius norm of the estimation error. A quadratic loss is used as the prediction error for LOOCV. The resulting solutions can be found analytically or by solving optimization problems of small sizes and thus have low complexities. Our proposed methods are compared with various existing techniques. We show that the LOOCV method achieves near-oracle performance for shrinkage designs using the sample covariance matrix (SCM) and several typical shrinkage targets. Furthermore, the LOOCV method provides low-complexity solutions for estimators that use general shrinkage targets, multiple targets, and/or ordinary least squares (OLS)-based covariance matrix estimation. We also show applications of our proposed techniques to several different problems in array signal processing.

Index Terms—Covariance matrix, cross-validation, linear shrinkage, ordinary least squares, sample covariance matrix.

I. INTRODUCTION

In statistical signal processing, one critical problem is to estimate the covariance matrix, which has extensive applications in correlation analysis, portfolio optimization, and various signal processing tasks in radar and communication systems [1]-[5]. One key challenge is that when the dimensionality of the covariance matrix is large but the sample support is relatively low, the estimate obtained using a general method such as ordinary least squares (OLS) or the sample covariance matrix (SCM) becomes ill-conditioned or even singular, and suffers from significant errors relative to the true covariance matrix Σ. Consequently, signal processing tasks that rely on covariance matrix estimation may perform poorly or fail to apply. Regularization techniques have recently attracted tremendous attention for covariance matrix estimation. By imposing structural assumptions on the true covariance matrix, regularization techniques such as banding [6], [7], thresholding, and shrinkage [8]-[18] have demonstrated great potential for improving the performance of covariance matrix estimation. See [19]-[21] for recent surveys.

This paper is concerned with the linear shrinkage estimation of covariance matrices. Given an estimate R of the covariance matrix, a linear shrinkage estimate is constructed as

  Σ̂_{ρ,τ} = ρR + τT_0,   (1)

where T_0 is the shrinkage target and ρ and τ are nonnegative shrinkage coefficients. In general, the shrinkage target T_0 is better-conditioned, more parsimonious or more structured, with lower variance but higher bias compared to the original estimate R. Shrinkage has a Bayes interpretation [2], [9]: the true covariance matrix Σ can be assumed to be within the neighborhoods of the shrinkage target T_0 [11]. There can be various different approaches for constructing T_0. For example, when a generative model about the observation exists, one may first estimate the model parameters and then construct T_0; a typical example of this is seen in different types of linear models in communication systems. Furthermore, the shrinkage target is not necessarily limited to identity or diagonal matrices, but can be set to better utilize prior knowledge. For example, knowledge-aided space-time signal processing (KA-STAP) may set shrinkage targets using knowledge about the environment [3] or past covariance matrix estimates [37]. Even multiple shrinkage targets can be applied when distinct guesses about the true covariance matrix are available [17]. With properly chosen coefficients ρ and τ, shrinkage can provide a good tradeoff between bias and variance, such that an estimate outperforming both R and T_0 is achieved and a better approximation to the true covariance matrix Σ can be obtained.

Compared to other regularized estimators such as banding and thresholding, linear shrinkage estimators can be easily designed to guarantee positive-definiteness. Such shrinkage designs have been employed in various applications which utilize covariance matrices and have demonstrated significant performance improvements. The linear shrinkage approach has also been generalized to nonlinear shrinkage estimation of covariance matrices [22], [23]. Closely related to linear shrinkage estimators are also several unitarily invariant estimators imposing constraints on the eigenvalues and condition number of the SCM [24], [25]. There is also a body of studies on shrinkage estimation of the precision matrix (the inverse of the covariance matrix) [26]-[30] and application-oriented designs of shrinkage estimators [31]-[36]. See [31]-[33] for example applications in array signal processing.

The choice of shrinkage coefficients significantly influences the performance of linear shrinkage estimators. Various criteria and methods have been studied.

Under the mean squared error (MSE) criterion, Ledoit and Wolf (LW) [2] derived closed-form solutions based on asymptotic estimates of the statistics needed for finding the optimal shrinkage coefficients, where R and T_0 are assumed to be the SCM and the identity matrix, respectively. Later the LW solution was extended to more general shrinkage targets [3], [17]. Chen et al. [4] assumed a Gaussian distribution and proposed an oracle approximating shrinkage (OAS) estimator, which achieves near-optimal parameter choice for Gaussian data even with very low sample supports. The determination of the shrinkage coefficients can also be cast as a model selection problem, and thus generic model selection techniques such as cross-validation (CV) [38]-[40] can be applied. In general, CV splits the training samples multiple times into disjoint subsets and then fits and assesses the models under different splits based on a properly chosen prediction loss. This has been explored, e.g., in [10], [13], where the Gaussian likelihood is used as the prediction loss.

All these data-driven techniques achieve near-optimal parameter choice when the underlying assumptions hold. However, there are also limitations to their applications: almost all existing analytical solutions for the shrinkage coefficients [2]-[4], [17] were derived under the assumption of the SCM and certain special forms of shrinkage targets. They need to be re-designed when applied to other cases, which is generally nontrivial. The asymptotic analysis-based methods [2], [3] may not perform well when the sample support is very low. Although the existing CV approaches [10], [13] have broader applications, they assume a Gaussian distribution and employ a grid search to determine the shrinkage coefficients. The likelihood cost of [10], [13] must be computed for multiple data splits and multiple candidates of the shrinkage coefficients, which can be time-consuming.

In this paper, we further investigate data-driven techniques that automatically tune the linear shrinkage coefficients using leave-one-out cross-validation (LOOCV). We choose a simple quadratic loss as the prediction loss for LOOCV and derive analytical, computationally efficient solutions. The solutions do not need to specify the distribution of the data. Furthermore, the LOOCV treatment is applicable to different covariance matrix estimators, including the SCM- and ordinary least squares (OLS)-based schemes. It can be used together with general shrinkage targets and can also be easily extended to incorporate multiple shrinkage targets. The numerical examples show that the proposed method can achieve oracle-approximating performance for covariance matrix estimation and can improve the performance of several array signal processing schemes.

The remainder of the paper is organized as follows. In Section 2, we present computationally efficient LOOCV methods for choosing the linear shrinkage coefficients for both SCM- and OLS-based covariance matrix estimators and also compare the proposed LOOCV methods with several existing methods which have attracted considerable attention recently. In Section 3, we extend our results to multi-target shrinkage. Section 4 reports numerical examples, and finally Section 5 gives conclusions.
II. LOOCV CHOICE OF LINEAR SHRINKAGE COEFFICIENTS

This paper deals with the estimation of covariance matrices of zero-mean signals whose fourth-order moments exist. We study the LOOCV choice of the shrinkage coefficients for the linear shrinkage covariance matrix estimator (1), i.e., Σ̂_{ρ,τ} = ρR + τT_0. The following assumptions are made:

1) The true covariance matrix Σ, the estimated covariance matrix R, and the shrinkage target T_0 are all Hermitian and positive-semidefinite (PSD).
2) T independent, identically distributed (i.i.d.) samples {y_t} of the signal are available.
3) The shrinkage coefficients are nonnegative, i.e.,

  ρ ≥ 0, τ ≥ 0.   (2)

Assumption 3 follows the treatments in [2]-[4] and is sufficient but not necessary to guarantee that the shrinkage estimate Σ̂_{ρ,τ} is PSD when Assumption 1 holds.¹ Two classes of shrinkage targets will be considered in this paper. One is constructed independent of the training samples {y_t} used for generating R, similarly to the knowledge-aided targets considered in [3]. The other is constructed from {y_t}, but is highly structured with significantly fewer free parameters as compared to R. Examples of the second class include those constructed using only the diagonal entries of R [4], [20] and the Toeplitz approximations of R [17].

¹ Imposing Assumption 3 may introduce performance loss. Alternatively, one may remove the constraint ρ ≥ 0, τ ≥ 0 and impose a constraint that Σ̂_{ρ,τ} is PSD, similar to a treatment in [5].

A. Oracle Choice

Different criteria may be used for evaluating covariance matrix estimators. In this paper, we use the squared Frobenius norm of the estimation error as the performance measure. Given Σ, R and T_0, the oracle shrinkage coefficients minimize

  J_O(ρ, τ) = ||Σ̂_{ρ,τ} − Σ||_F² = ||ρR + τT_0 − Σ||_F²,   (3)

where ||·||_F denotes the Frobenius norm. The cost function in (3) can be rewritten as a quadratic function of the shrinkage coefficients:

  J_O(ρ, τ) = [ρ, τ] A_O [ρ, τ]ᵀ − 2 [ρ, τ] b_O + tr(Σ²),   (4)

  A_O = [ tr(R²)    tr(RT_0)
          tr(RT_0)  tr(T_0²) ],   (5)

  b_O = [ tr(RΣ)
          tr(T_0Σ) ],   (6)

where tr(·) denotes the trace of a matrix. As A_O is positive-definite, we can find the minimizer of J_O(ρ, τ) by solving the above bivariate convex optimization problem. We can also apply the Karush-Kuhn-Tucker (KKT) conditions to find the solution analytically. From (4), letting ∂J_O(ρ,τ)/∂ρ = ∂J_O(ρ,τ)/∂τ = 0 leads to

  (tr(R²)/tr(RΣ)) ρ + (tr(RT_0)/tr(RΣ)) τ = 1,   (7)

  (tr(RT_0)/tr(T_0Σ)) ρ + (tr(T_0²)/tr(T_0Σ)) τ = 1.   (8)

The oracle shrinkage coefficients can be obtained by solving (7) and (8):

  [ρ_O⋆, τ_O⋆]ᵀ = A_O⁻¹ b_O.   (9)

Note that (9) may produce negative shrinkage coefficients, which may not lead to a positive-definite estimate of the covariance matrix. In this case, we clip the negative coefficient to zero and then find the other coefficient using (7) or (8) to guarantee the positive definiteness, for τ = 0 or ρ = 0, respectively. This treatment is similar to [2]-[5] and provides a suboptimal yet simple solution. The oracle estimator requires knowledge of Σ, which is unavailable in real applications, but the result serves as an upper bound on the performance given the linear shrinkage structure.

B. LOOCV Choice for General Cases

Let Σ̃ denote a positive-definite, Hermitian matrix. It can be easily verified that the following cost

  J_S(Σ̃) ≜ E[ ||Σ̃ − yy†||_F² ]   (10)

is minimized when Σ̃ = Σ, where the expectation is taken over y. In this paper, we apply LOOCV [38] to produce an estimate of J_S(Σ̃) as the proxy for measuring the accuracy of Σ̃, based on which the shrinkage coefficients can be selected. With the LOOCV method, the length-T training data Y = [y_1, y_2, ..., y_T] is repeatedly split into two sets with respect to time. For the t-th split, where 1 ≤ t ≤ T, the T − 1 samples in Y_t (with the t-th column y_t omitted from Y) are used for producing a covariance matrix estimate R_t, and the remaining sample y_t is spared for parameter validation. In total, T splits of the training data Y are used, and all the training samples are used for validation once. Assuming shrinkage estimation with given shrinkage coefficients (ρ, τ), we construct from each Y_t a shrinkage covariance matrix estimator as

  Σ̂_{t,ρ,τ} = ρR_t + τT_0.   (11)

We propose to use the following LOOCV cost function

  J_CV(ρ, τ) = (1/T) Σ_{t=1}^T ||Σ̂_{t,ρ,τ} − y_t y_t†||_F²   (12)

             = (1/T) Σ_{t=1}^T ||ρR_t + τT_0 − y_t y_t†||_F²   (13)

to approximate the cost in (10) when Σ̃ is chosen as Σ̂_{t,ρ,τ}. For notational simplicity, define

  S_t ≜ y_t y_t†.   (14)

After some manipulations, the above cost function can be written similarly to (4) as

  J_CV(ρ, τ) = [ρ, τ] A_CV [ρ, τ]ᵀ − 2 [ρ, τ] b_CV + (1/T) Σ_{t=1}^T tr(S_t²),   (15)

where

  A_CV = [ (1/T) Σ_{t=1}^T tr(R_t²)     (1/T) Σ_{t=1}^T tr(R_t T_0)
           (1/T) Σ_{t=1}^T tr(R_t T_0)  tr(T_0²) ],   (16)

  b_CV = [ (1/T) Σ_{t=1}^T tr(R_t S_t)
           (1/T) Σ_{t=1}^T tr(T_0 S_t) ].   (17)

The shrinkage coefficients can then be found by solving the above bivariate, constant-coefficient quadratic program. Analytical solutions can be obtained under different conditions, as shown below.

1) Unconstrained shrinkage: For unconstrained (ρ, τ), setting the partial derivatives ∂J_CV(ρ,τ)/∂ρ = ∂J_CV(ρ,τ)/∂τ = 0 yields

  [Σ_{t=1}^T tr(R_t²) / Σ_{t=1}^T tr(R_t S_t)] ρ + [Σ_{t=1}^T tr(R_t T_0) / Σ_{t=1}^T tr(R_t S_t)] τ = 1,   (18)

  [Σ_{t=1}^T tr(R_t T_0) / Σ_{t=1}^T tr(T_0 S_t)] ρ + [T tr(T_0²) / Σ_{t=1}^T tr(T_0 S_t)] τ = 1.   (19)

Solving (18) and (19) produces the unconstrained solution

  [ρ_CV⋆, τ_CV⋆]ᵀ = A_CV⁻¹ b_CV.   (20)

We choose (20) as the optimal shrinkage coefficients if both ρ_CV⋆ and τ_CV⋆ are nonnegative. Otherwise, we consider the optimal choices on the boundary of ρ ≥ 0, τ ≥ 0 specified by (18) or (19) for τ = 0 or ρ = 0, as

  ρ_CV⋆ = Σ_{t=1}^T tr(R_t S_t) / Σ_{t=1}^T tr(R_t²),  τ_CV⋆ = 0,   (21)

or

  ρ_CV⋆ = 0,  τ_CV⋆ = Σ_{t=1}^T tr(T_0 S_t) / (T tr(T_0²)).   (22)

2) Constrained shrinkage: For the more parsimonious design using a convex linear combination, the following constraint is imposed:

  ρ = 1 − τ.   (23)

By plugging (23) into the cost function (12) and taking the minimizer, we can also easily find the optimal shrinkage coefficient as

  ρ_CV⋆ = Σ_{t=1}^T [tr(T_0²) − tr(R_t T_0) − tr(T_0 S_t) + tr(S_t R_t)] / Σ_{t=1}^T [tr(R_t²) − 2 tr(R_t T_0) + tr(T_0²)].   (24)

In case a negative shrinkage coefficient is produced, we set it to zero and let the other be one according to (23).

Note that although the closed-form solution involves multiple matrix operations, the quantities involved need to be computed only once. Furthermore, the computational complexity may be greatly reduced given a specific method of covariance matrix estimation. In the following two subsections, we show the simplified solutions for SCM- and OLS-based covariance matrix estimation.

C. LOOCV Choice for SCM-Based Estimation

We consider in this subsection that R is the SCM estimate of Σ. In this case,

  R = (1/T) Σ_{t=1}^T y_t y_t† = (1/T) Σ_{t=1}^T R_t = (1/T) Σ_{t=1}^T S_t,   (25)

which is a sufficient statistic for Gaussian-distributed data when the mean vector is the zero vector. For the t-th split, the SCM constructed from all the samples except the t-th is

  R_t = (1/(T−1)) Σ_{j≠t} y_j y_j† = (T/(T−1)) R − (1/(T−1)) S_t.   (26)

We can then verify the following expressions for quickly computing the relevant quantities in (16) and (17):

  (1/T) Σ_{t=1}^T tr(R_t²) = (T(T−2)/(T−1)²) tr(R²) + (1/(T(T−1)²)) Σ_{t=1}^T ||y_t||_F⁴,   (27)

  (1/T) Σ_{t=1}^T tr(R_t S_t) = (T/(T−1)) tr(R²) − (1/(T(T−1))) Σ_{t=1}^T ||y_t||_F⁴,   (28)

  (1/T) Σ_{t=1}^T tr(R_t T_0) = tr(RT_0),   (29)

  (1/T) Σ_{t=1}^T tr(S_t T_0) = tr(RT_0).   (30)

Plugging these into (16) and (17) and after some manipulations, we can rewrite the LOOCV cost function (15) as

  J_CV(ρ, τ) = [ρT(ρT − 2ρ − 2T + 2)/(T−1)²] tr(R²) + 2τ(ρ − 1) tr(RT_0) + τ² tr(T_0²) + (1/T) Σ_{t=1}^T (ρ/(T−1) + 1)² ||y_t||_F⁴.   (31)

The optimal shrinkage coefficients can then be obtained analytically from the SCM R, the shrinkage target T_0, and the training samples {y_t}, as discussed below.

1) Unconstrained shrinkage: It can be verified from (19) and (30) that the optimal shrinkage coefficients (ignoring the nonnegative constraint ρ ≥ 0, τ ≥ 0) satisfy

  τ = (1 − ρ) tr(RT_0)/tr(T_0²).   (32)

The closed-form solution to ρ is given by

  ρ_CV,SCM⋆ = [ (T/(T−1)) tr(R²) − (tr(RT_0))²/tr(T_0²) − Σ_{t=1}^T ||y_t||_F⁴/(T(T−1)) ]
              / [ ((T²−2T)/(T−1)²) tr(R²) − (tr(RT_0))²/tr(T_0²) + Σ_{t=1}^T ||y_t||_F⁴/(T(T−1)²) ].   (33)

In case ρ_CV,SCM⋆ > 1 or ρ_CV,SCM⋆ < 0, we apply (21) or (22), respectively, to determine the solution, using the expressions in (27)-(30).

Note that for the typical shrinkage target T_0 = (tr(R)/N) I, (32) results in τ_CV,SCM⋆ = 1 − ρ_CV,SCM⋆. This provides another justification for the convex linear combination design with an identity target, which has been widely adopted in the literature, e.g., [4]. This also shows that for such a special target the unconstrained solution is equivalent to the constrained solution, which does not hold for more general shrinkage targets.

2) Constrained shrinkage: For the widely considered convex linear combination with the constraint ρ + τ = 1, the optimal ρ (ignoring the nonnegative constraint) is computed as

  ρ_CV,SCM⋆ = [ (T/(T−1)) tr(R²) − 2 tr(RT_0) + tr(T_0²) − Σ_{t=1}^T ||y_t||_F⁴/(T(T−1)) ]
              / [ ((T²−2T)/(T−1)²) tr(R²) − 2 tr(RT_0) + tr(T_0²) + Σ_{t=1}^T ||y_t||_F⁴/(T(T−1)²) ].   (34)

Similarly, in case a negative shrinkage coefficient is obtained, we set it to zero and let the other be one.

The above results show that the optimal shrinkage coefficients for the covariance matrix estimate (1) can be computed directly from the samples and the shrinkage target, without the need of specifying any user parameters. The constrained shrinkage design may lead to certain performance loss as compared to the unconstrained one.
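As a concrete illustration, the following sketch implements the SCM case by plugging the shortcut expressions (27)-(30) into the 2x2 system of (16)-(17), so no leave-one-out loop is needed; the function name and the clipping logic are ours.

```python
# Sketch of the SCM closed forms of Sec. II-C (unconstrained case).
import numpy as np

def loocv_scm(Y, T0):
    """Y: N x T samples (columns); T0: shrinkage target. Returns (rho, tau)."""
    N, T = Y.shape
    R = (Y @ Y.conj().T) / T                          # sample covariance matrix (25)
    trR2 = np.real(np.trace(R @ R))
    trRT0 = np.real(np.trace(R @ T0))
    trT02 = np.real(np.trace(T0 @ T0))
    q = np.sum(np.sum(np.abs(Y) ** 2, axis=0) ** 2)   # sum_t ||y_t||_F^4
    a11 = T * (T - 2) / (T - 1) ** 2 * trR2 + q / (T * (T - 1) ** 2)   # Eq. (27)
    b1 = T / (T - 1) * trR2 - q / (T * (T - 1))                        # Eq. (28)
    A = np.array([[a11, trRT0], [trRT0, trT02]])      # Eqs. (29)-(30) give tr(RT0)
    b = np.array([b1, trRT0])
    rho, tau = np.linalg.solve(A, b)                  # equivalent to Eqs. (32)-(33)
    if rho < 0:
        rho, tau = 0.0, trRT0 / trT02                 # boundary, Eq. (22)
    elif tau < 0:
        rho, tau = b1 / a11, 0.0                      # boundary, Eq. (21)
    return rho, tau
```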
D. LOOCV Choice for OLS-Based Covariance Estimation

One advantage of the LOOCV method is that it can be applied to different covariance matrix estimators. In this subsection, we discuss the LOOCV method for OLS-based covariance matrix estimation. Note that most existing analytical solutions for choosing the shrinkage coefficients assume the SCM and specific shrinkage targets and need to be re-derived for general cases. Also, in contrast to general applications of LOOCV, which require a grid search of the parameters and thus a high computational complexity, we have shown that for the SCM, fast analytical solutions can be obtained for choosing the shrinkage coefficients. This will also be the case for OLS-based covariance matrix estimation.

Consider the case with observation y ∈ C^N modeled as

  y = Hx + z,   (35)

where H ∈ C^{N×M} is a deterministic channel matrix and z ∈ C^N a zero-mean, white noise with covariance matrix σ²I, which is uncorrelated with the zero-mean input signal x ∈ C^M with covariance matrix I. If training samples of both x and y are known, we may first estimate the channel matrix H and the covariance matrix of the noise z using the ordinary least squares (OLS) approach. Let the block of training data be (X, Y), where the input signal X can be designed to have certain properties such as being orthogonal. The OLS estimates of the channel matrix and noise variance are then obtained as

  Ĥ = YX†(XX†)⁻¹,   (36)

  σ̂² = (1/(TN)) ||Y − ĤX||_F² = (1/(TN)) tr( (Y − ĤX)(Y − ĤX)† ) = (1/(TN)) tr( Y(I − X†(XX†)⁻¹X)Y† ),   (37)

where (̂·) denotes the estimate of a quantity. In this case, the covariance matrix of y can be estimated as

  R = ĤĤ† + σ̂²I.   (38)

Such OLS-based covariance matrix estimation may be useful for designing signal estimation schemes in wireless communications. We can apply the linear shrinkage design (1) to enhance its accuracy and apply the LOOCV method (12) to choose the shrinkage coefficients. Note that in this case, in the t-th split, we generate the covariance matrix estimate R_t by applying the OLS estimate to the leave-one-out samples (X_t, Y_t), which are the subset of (X, Y) with the pair (x_t, y_t) omitted. The LOOCV cost is the same as (15). In this case, the leave-one-out estimate of the covariance matrix for the t-th data split is

  R_t = Ĥ_t Ĥ_t† + σ̂_t² I,   (39)

where Ĥ_t and σ̂_t² denote the channel matrix and noise variance estimated from (X_t, Y_t), respectively. A direct computation of (16) and (17) for evaluating the LOOCV cost performs OLS estimation T times, which incurs significant complexity. The complexity can be greatly reduced by observing that the leave-one-out OLS estimate of the channel matrix is related to the OLS channel matrix estimate Ĥ in (36) by a rank-one update:

  Ĥ_t = Y_t X_t†(X_t X_t†)⁻¹ = Ĥ − e_t f_t†,   (40)

where

  e_t ≜ y_t − Ĥ x_t,   (41)

  f_t ≜ (1/(1 − Φ_t)) (XX†)⁻¹ x_t.   (42)

In the above,

  Φ_t ≜ x_t†(XX†)⁻¹ x_t   (43)

is the t-th diagonal entry of

  Φ = X†(XX†)⁻¹ X.   (44)
Similarly, the leave-one-out estimate of the noise variance can be updated as

  σ̂_t² = (1/(N(T−1))) tr( (Y_t − Ĥ_t X_t)(Y_t − Ĥ_t X_t)† ) = σ̂² − δ_t,   (45)

where

  δ_t = ||e_t||_F² / (N(T−1)(1 − Φ_t)) − σ̂²/(T−1).   (46)

Note that both updates can be achieved with low complexity when a few matrices are computed in advance and reused. In this way, the covariance matrix estimate can be computed as

  R_t = R − δ_t I − e_t φ_t† − ψ_t e_t†,   (47)

where φ_t and ψ_t are defined as

  φ_t = Ĥ f_t,   (48)

  ψ_t = φ_t − ||f_t||_F² e_t.   (49)

This shows that the leave-one-out OLS covariance matrix estimate can be obtained from R by corrections involving a scaled identity matrix and two rank-one updates. Eqn. (47) can be exploited to compute the closed-form LOOCV solution quickly. From (47), the most involved computation for finding the solution of the optimization problem (15) can be implemented as

  (1/T) Σ_{t=1}^T tr(R_t²) = tr(R²) + (N/T) Σ_{t=1}^T δ_t² − (2 tr(R)/T) Σ_{t=1}^T δ_t
    + (1/T) Σ_{t=1}^T ||e_t||_F² ( ||φ_t||_F² + ||ψ_t||_F² )
    − (2/T) Σ_{t=1}^T ℜ( e_t†(R − δ_t I)(φ_t + ψ_t) − e_t†ψ_t e_t†φ_t ),   (50)

where ℜ(·) denotes the real part of a scalar. When R is already computed, the right-hand side of (50) can be evaluated using inner products and matrix-vector products. The terms tr(T_0²) and tr(T_0 S_t) are the same as those for the SCM. For the other two terms, we have

  (1/T) Σ_{t=1}^T tr(R_t S_t) = (1/T) tr( R Y Y† ) − (1/T) Σ_{t=1}^T ℜ( δ_t ||y_t||_F² + y_t†e_t φ_t†y_t + y_t†ψ_t e_t†y_t ),   (51)

  (1/T) Σ_{t=1}^T tr(R_t T_0) = tr(RT_0) − [Σ_{t=1}^T δ_t / T] tr(T_0) − (1/T) Σ_{t=1}^T ℜ( φ_t†T_0 e_t + e_t†T_0 ψ_t ).   (52)

Note that the computational complexities of (50)-(52) are low because the major operations are matrix-vector products and inner products.
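A minimal sketch of these updates is given below, assuming our reading of (40)-(49) in which the factor 1/(1 − Φ_t) is absorbed into f_t; the returned list of R_t can be passed to the general LOOCV sketch of Section II-B. All names are ours.

```python
# Sketch of the leave-one-out OLS covariance updates, Eqs. (36)-(49).
import numpy as np

def ols_leave_one_out(X, Y):
    """X: M x T inputs, Y: N x T outputs. Returns (R, [R_1, ..., R_T])."""
    M, T = X.shape
    N = Y.shape[0]
    G = np.linalg.inv(X @ X.conj().T)                  # (XX^H)^{-1}, reused for all t
    H = Y @ X.conj().T @ G                             # OLS channel estimate (36)
    E = Y - H @ X                                      # residuals e_t as columns, Eq. (41)
    sigma2 = np.real(np.trace(E @ E.conj().T)) / (T * N)   # noise variance (37)
    R = H @ H.conj().T + sigma2 * np.eye(N)            # OLS covariance estimate (38)
    Phi = np.real(np.sum(X.conj() * (G @ X), axis=0))  # diagonal of Phi, Eq. (44)
    R_t_list = []
    for t in range(T):
        e_t = E[:, t]
        f_t = (G @ X[:, t]) / (1 - Phi[t])             # Eq. (42)
        delta_t = (np.linalg.norm(e_t) ** 2 / (N * (T - 1) * (1 - Phi[t]))
                   - sigma2 / (T - 1))                 # Eq. (46)
        phi_t = H @ f_t                                # Eq. (48)
        psi_t = phi_t - np.linalg.norm(f_t) ** 2 * e_t # Eq. (49)
        R_t = (R - delta_t * np.eye(N)
               - np.outer(e_t, phi_t.conj())
               - np.outer(psi_t, e_t.conj()))          # Eq. (47)
        R_t_list.append(R_t)
    return R, R_t_list
```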

E. Comparisons with Alternative Choices of Linear Shrinkage Coefficients

In the above, we have introduced LOOCV methods with analytical solutions for choosing the coefficients of linear shrinkage covariance matrix estimators.

We now discuss several alternative techniques which have received considerable attention recently and compare them with the LOOCV methods proposed in this paper.

In 2004, Ledoit and Wolf (LW) [2] studied estimators that shrink the SCM toward an identity target, i.e., T_0 = I. Such estimators do not alter the eigenvectors but shrink the eigenvalues of the SCM, which is well supported by the fact that sample eigenvalues tend to be more spread than population eigenvalues. The optimal shrinkage coefficients under the MMSE criterion (3) can be written as

  ρ⋆ = α²/δ²,  τ⋆ = (β²/δ²) μ,   (53)

where the parameters μ ≜ tr(Σ)/N, δ² ≜ E[||R − μI||_F²], β² = E[||Σ − R||_F²], and α² = ||Σ − μI||_F² depend on the true covariance matrix Σ and other unknown statistics. Ref. [2] shows that δ² = α² + β² and proposes to approximate these quantities by their asymptotic estimates under T → ∞, N → ∞, N/T → c < ∞, as

  μ̂ = tr(R)/N,  δ̂² = ||R − μ̂I||_F²,   (54)

  β̂² = min( δ̂², (1/T²) Σ_{t=1}^T ||y_t y_t† − R||_F² ),  α̂² = δ̂² − β̂²,   (55)

which can all be computed from the training samples. By substituting these into (53), estimators that significantly outperform the SCM are obtained, which also approach the oracle estimators when the training length is large enough.

The above LW estimator was extended by Stoica et al. in 2008 [3] to complex-valued signals with general shrinkage targets T_0, with applications to knowledge-aided space-time adaptive processing (KA-STAP) in radar. Several estimators with similar performance are derived there. For the general linear combination (GLC) design of [3], it is shown that the oracle shrinkage coefficients for (1) satisfy

  ρ⋆ = 1 − τ⋆/ν,   (56)

where

  ν = tr(T_0Σ)/||T_0||_F²,  τ⋆ = ν β² / E[||R − νT_0||_F²].   (57)

The quantity β² is estimated in the same way as (55), and a computationally efficient expression for β̂² is given by

  β̂² = (1/T²) Σ_{t=1}^T ||y_t||_F⁴ − (1/T) ||R||_F².   (58)

Furthermore, ν and E[||R − νT_0||_F²] are estimated as ν̂ = tr(T_0R)/||T_0||_F² and ||R − ν̂T_0||_F², respectively. This leads to the result given by Eqns. (34) and (35) of [3], which recovers the LW estimator [2] when the identity shrinkage target T_0 = I is assumed.

More recently, Chen et al. [4] derived the oracle approximating shrinkage (OAS) estimator, which assumes the SCM, real-valued Gaussian samples, and a scaled identity target with T_0 = (tr(R)/N) I and ρ = 1 − τ. They first derive the oracle shrinkage coefficients for the SCM obtained from i.i.d. Gaussian samples, which are determined by N, T, tr(Σ) and tr(Σ²). Then, they propose an iterative procedure to approach the oracle estimator. In the iterations, tr(Σ²) and tr(Σ) are estimated by tr(Σ̂_j R) and tr(Σ̂_j), respectively, where Σ̂_j is the covariance matrix estimate at the j-th iteration. It is further proved that Σ̂_j converges to the OAS estimator with the following analytical expression for τ:

  τ_OAS⋆ = min( 1, [ (1 − 2/N) tr(R²) + (tr(R))² ] / [ (T + 1 − 2/N)( tr(R²) − (tr(R))²/N ) ] ).   (59)

This approach achieves superior performance for the (scaled) identity target and Gaussian data and dominates the LW estimator [2] when T is small. It was later generalized by Senneret et al. [20] to a shrinkage target chosen as the diagonal entries of the SCM. Other related techniques include [14], which also assumes the SCM, Gaussian data, and identity/diagonal shrinkage targets.

All the above techniques provide analytical solutions and achieve near-oracle performance when the underlying assumptions (e.g., large dimensionality, large size of training data, identity/diagonal shrinkage targets) hold. However, they also have limitations. A common restriction is that all these analytical solutions assume the SCM and are not optimized for other types of covariance matrix estimators such as model-based estimators. In particular, the LW and GLC methods [2], [3], which employ asymptotic approximations, may exhibit a noticeable gap to the oracle choice when the sample support is low, which may be relevant in some applications. The OAS method [4] assumed an identity target, but its extensions to more general cases, e.g., with multiple/general shrinkage targets, are not trivial. By contrast, the LOOCV method proposed in this paper allows different designs and achieves near-oracle performance in general.
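For side-by-side experiments, the LW and OAS closed forms can be coded directly from (53)-(55) and (59); the sketch below follows our reading of those formulas (real-data conventions for LW, the real-Gaussian assumption behind OAS) and uses our own function names.

```python
# Sketch of the LW (53)-(55) and OAS (59) closed forms for identity-type targets.
import numpy as np

def lw_coefficients(Y):
    N, T = Y.shape
    R = (Y @ Y.conj().T) / T
    mu = np.real(np.trace(R)) / N                        # Eq. (54)
    delta2 = np.linalg.norm(R - mu * np.eye(N), 'fro') ** 2
    beta2 = 0.0
    for t in range(T):                                   # Eq. (55)
        S_t = np.outer(Y[:, t], Y[:, t].conj())
        beta2 += np.linalg.norm(S_t - R, 'fro') ** 2 / T ** 2
    beta2 = min(beta2, delta2)
    return (delta2 - beta2) / delta2, (beta2 / delta2) * mu   # (rho, tau), Eq. (53)

def oas_tau(Y):
    """Weight tau on the target T0 = tr(R)/N * I, with rho = 1 - tau."""
    N, T = Y.shape
    R = (Y @ Y.conj().T) / T
    trR, trR2 = np.real(np.trace(R)), np.real(np.trace(R @ R))
    num = (1 - 2 / N) * trR2 + trR ** 2                  # Eq. (59)
    den = (T + 1 - 2 / N) * (trR2 - trR ** 2 / N)
    return min(1.0, num / den)
```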
Cross-validation has also been applied previously for choosing shrinkage coefficients for covariance matrix estimation. The key issues in applying this generic tool include finding appropriate predictive metrics for scoring the different estimators and fast computation schemes. In [10], [13], the Gaussian likelihood was chosen as such a proxy. The computations with the likelihood are generally involved, as multiple matrix inverses/determinants are required, and a grid search is needed for finding the optimal parameters. In this paper, we use the distribution-free Frobenius norm loss in (12) as the metric, which leads to analytical solutions and is computationally more tractable.

III. MULTI-TARGET SHRINKAGE

In Section 2, we have considered linear shrinkage designs with a single target. Multiple shrinkage targets may be used to further enhance performance, which may be obtained from a priori knowledge, e.g., a past covariance matrix estimate from older training samples or from neighboring frequencies. We can easily extend our proposed LOOCV method to multiple targets.

A. Oracle choice of shrinkage coefficients

Consider the multi-target shrinkage design

  Σ̂_{ρ,τ} = ρR + Σ_{k=1}^K τ_k T_k,   (60)

where all the shrinkage coefficients are nonnegative to guarantee PSD covariance matrix estimates, i.e.,

  ρ ≥ 0;  τ_k ≥ 0, ∀k.   (61)

The oracle multi-target shrinkage minimizes the squared Frobenius norm of the estimation error

  J_O,MT(ρ, τ) = || ρR + Σ_{k=1}^K τ_k T_k − Σ ||_F²,   (62)

which can be rewritten as

  J_O,MT(ρ, τ) = [ρ, τᵀ] A_O,MT [ρ, τᵀ]ᵀ − 2 [ρ, τᵀ] b_O,MT + tr(Σ²),   (63)

where τ = [τ_1, τ_2, ..., τ_K]ᵀ,

  A_O,MT = [ tr(R²)     tr(RT_1)    ...  tr(RT_K)
             tr(T_1R)   tr(T_1²)    ...  tr(T_1T_K)
               ...         ...      ...     ...
             tr(T_KR)   tr(T_KT_1)  ...  tr(T_K²)  ],   (64)

  b_O,MT = [ tr(RΣ), tr(T_1Σ), ..., tr(T_KΣ) ]ᵀ.   (65)

The oracle shrinkage coefficients can then be obtained by solving the problem of minimizing the cost function J_O,MT(ρ, τ) of (63), which is a strictly convex quadratic program (SCQP) with K + 1 variables.

B. LOOCV choice of shrinkage coefficients

We now extend the LOOCV method in Section 2 to the multi-target shrinkage. Following the same treatment as in Section II-B, in each split of the training data, R_t and S_t are constructed to generate and validate the covariance matrix estimate, respectively. The multiple shrinkage coefficients are chosen to minimize the LOOCV cost

  J_CV,MT(ρ, τ) = (1/T) Σ_{t=1}^T || ρR_t + Σ_{k=1}^K τ_k T_k − S_t ||_F².   (66)

The above cost function can be rewritten in a form similar to (15) as

  J_CV,MT(ρ, τ) = [ρ, τᵀ] A_CV,MT [ρ, τᵀ]ᵀ − 2 [ρ, τᵀ] b_CV,MT + (1/T) Σ_{t=1}^T tr(S_t²),   (67)

with

  A_CV,MT = [ (1/T)Σ_t tr(R_t²)      (1/T)Σ_t tr(R_tT_1)  ...  (1/T)Σ_t tr(R_tT_K)
              (1/T)Σ_t tr(T_1R_t)    tr(T_1²)             ...  tr(T_1T_K)
                 ...                    ...               ...     ...
              (1/T)Σ_t tr(T_KR_t)    tr(T_KT_1)           ...  tr(T_K²)  ],   (68)

  b_CV,MT = [ (1/T)Σ_t tr(R_tS_t), (1/T)Σ_t tr(T_1S_t), ..., (1/T)Σ_t tr(T_KS_t) ]ᵀ.   (69)

The constant entries of A_CV,MT and b_CV,MT can be computed in the same way as for the single-target case. When K is small, which is typically the case, the solution that minimizes the LOOCV cost can be found quickly using standard optimization tools. Alternatively, we may first find the global optimizer that ignores the nonnegative constraint by

  [ρ_CV,MT⋆, τ_CV,MTᵀ⋆]ᵀ = A_CV,MT⁻¹ b_CV,MT,   (70)

and check whether the nonnegative condition is satisfied. If a negative shrinkage coefficient is produced, we then consider the boundaries of ρ ≥ 0, τ_k ≥ 0, k = 1, 2, ..., K, which are equivalent to removing a certain number of shrinkage targets from the shrinkage design. The solution can be found in exactly the same way as (70) but with fewer targets.

Similarly to the single-target case, we may also consider a constrained case, where the shrinkage targets {T_k} have the same trace as the estimated covariance matrix R, and

  ρ + Σ_{k=1}^K τ_k = 1.   (71)

Then the LOOCV cost function can be rewritten as

  J_CV,MT(τ) = (1/T) Σ_{t=1}^T || Σ_{k=1}^K τ_k A_kt + B_t ||_F²,   (72)

where

  A_kt ≜ T_k − R_t,  1 ≤ k ≤ K, 1 ≤ t ≤ T,   (73)

  B_t ≜ R_t − S_t,  1 ≤ t ≤ T.   (74)

The optimal shrinkage coefficients can be found similarly as for the unconstrained case by minimizing

  J_CV,MT(τ) = τᵀ A′_CV,MT τ − 2 τᵀ b′_CV,MT + (1/T) Σ_{t=1}^T tr(B_t²),   (75)

where the entries of A′_CV,MT and b′_CV,MT are defined by

  [A′_CV,MT]_{mn} ≜ (1/T) Σ_{t=1}^T tr(A_mt A_nt),  1 ≤ m, n ≤ K,   (76)

  [b′_CV,MT]_k ≜ −(1/T) Σ_{t=1}^T tr(A_kt B_t),  1 ≤ k ≤ K.   (77)

These entries may also be evaluated quickly. For example, with the SCM,

  [A′_CV,MT]_{mn} = tr(T_m T_n) − tr((T_m + T_n) R) + η,   (78)

  [b′_CV,MT]_k = η − (T/(T−1)) tr(R²) + (1/(T(T−1))) Σ_{t=1}^T ||y_t||_F⁴,   (79)

where

  η = (1/T) Σ_{t=1}^T tr(R_t²)

can be computed using (27). The solution for τ can be found as

  τ_CV,MT⋆ = A′_CV,MT⁻¹ b′_CV,MT   (80)

if the nonnegative condition is satisfied. Otherwise, find the solution in a similar way as for the unconstrained case.

Note that for multi-target shrinkage, Lancewicki and Aladjem [17] recently introduced another method for finding the shrinkage coefficients. They assume the SCM and shrinkage targets which belong to a set that can be characterized by Eqn. (21) of [17]. Then, they follow the Ledoit-Wolf (LW) framework [2] to derive unbiased estimates of the unknown coefficients needed for minimizing the expectation of the cost in (62), based on which {ρ, τ} can be optimized. By contrast, our approach resorts to a LOOCV estimate of the cost in (10), which does not rely on the aforementioned assumptions in [17]. As will be shown later, the LOOCV method can achieve similar performance as [17] for the shrinkage targets considered there. However, it can be applied to general estimators other than the SCM and to shrinkage targets which are not covered by Eqn. (21) of [17], offering wider applicability.
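The unconstrained multi-target solution (70) can be coded as a direct extension of the single-target sketch; below is a minimal version (names ours) that forms A_CV,MT and b_CV,MT by looping over the splits and leaves the boundary search, i.e., re-solving with offending targets removed, as indicated in the comment.

```python
# Sketch of the unconstrained multi-target solution, Eqs. (68)-(70).
import numpy as np

def loocv_multi_target(R_t_list, Y, targets):
    """targets: list of K shrinkage targets. Returns [rho, tau_1, ..., tau_K]."""
    T = Y.shape[1]
    K = len(targets)
    A = np.zeros((K + 1, K + 1))
    b = np.zeros(K + 1)
    for t in range(T):
        R_t = R_t_list[t]
        S_t = np.outer(Y[:, t], Y[:, t].conj())
        A[0, 0] += np.real(np.trace(R_t @ R_t)) / T
        b[0] += np.real(np.trace(R_t @ S_t)) / T
        for k, Tk in enumerate(targets, start=1):
            A[0, k] += np.real(np.trace(R_t @ Tk)) / T
            b[k] += np.real(np.trace(Tk @ S_t)) / T
    for m, Tm in enumerate(targets, start=1):
        A[m, 0] = A[0, m]
        for n, Tn in enumerate(targets, start=1):
            A[m, n] = np.real(np.trace(Tm @ Tn))   # constant entries of (68)
    coeffs = np.linalg.solve(A, b)                 # Eq. (70)
    # If any entry of coeffs is negative, drop the offending targets and
    # re-solve the reduced system, i.e., search the nonnegative boundary.
    return coeffs
```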
IV. NUMERICAL EXAMPLES

In this section, we present numerical examples to demonstrate the effectiveness of the proposed shrinkage design and compare it with alternative methods. The quality of covariance matrix estimation is measured by the MSE normalized by the average of the squared Frobenius norm ||Σ||_F², i.e.,

  NMSE_Σ ≜ E[ ||Σ̂_{ρ,τ} − Σ||_F² ] / E[ ||Σ||_F² ].   (81)

We show examples of covariance matrix estimation and its applications in array signal processing. We denote by N(μ, σ²) a real-valued Gaussian distribution with mean μ and variance σ².

Example 1: Shrinkage toward an identity target: We first consider a real-valued example with an autoregressive (AR) covariance matrix, whose (i, j)-th entry is given by

  [Σ]_{i,j} = r^{|i−j|},  1 ≤ i, j ≤ N,   (82)

which has been widely considered for evaluating covariance matrix estimation techniques [4]-[7]. Let Σ^{1/2} be the Cholesky factor of Σ. The training samples are randomly generated as y_t = Σ^{1/2} n_t, where n_t consists of i.i.d. entries drawn from N(0, 1). The typical shrinkage target T_0 = (tr(R)/N) I is considered for single-target shrinkage. Our proposed LOOCV method is compared with the widely used alternative methods [2]-[4] for choosing the shrinkage coefficients.

Figure 1. NMSE of single-target (ST) shrinkage estimates of an AR covariance matrix with N = 100, r = 0.5, T_0 = (tr(R)/N) I. "LW", "GLC" and "OAS" refer to the methods of [2], [3] and [4], respectively, which are also described in Section II-E; "CV" refers to our proposed LOOCV method; "Oracle" refers to the coefficient choice in Section II-A; and "Con" and "Unc" indicate whether the constraint ρ + τ = 1 is imposed or not, respectively.

The simulation results (averaged over 1000 repetitions for each training length T) in Fig. 1 confirm that the LOOCV methods with and without the constraint ρ + τ = 1 produce the same results for the scaled identity target, and that they achieve performance almost identical to the OAS estimator [4], which was derived by assuming Gaussian data and an identity target. The LW [2] and GLC [3] methods, which are equivalent for the scaled identity target here, do not perform well for very low sample support, but are able to approximate the oracle choice very well when more samples are available, which is consistent with the observations from [4]. All of these shrinkage designs significantly outperform the SCM, confirming the effectiveness of shrinkage for covariance matrix estimation. Recall that these methods were derived using different strategies and assumptions and have different analytical solutions.
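A reproduction sketch of this setup is given below, reusing the loocv_scm function from the SCM sketch in Section II-C; the repetition count and seed are illustrative, not the paper's exact settings.

```python
# Sketch of Example 1: AR covariance (82), Gaussian samples, identity-type target.
# Assumes loocv_scm from the Sec. II-C sketch is defined in the same session.
import numpy as np

rng = np.random.default_rng(0)
N, r, T = 100, 0.5, 20
Sigma = r ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))   # Eq. (82)
L = np.linalg.cholesky(Sigma)
nmse = []
for rep in range(100):
    Y = L @ rng.standard_normal((N, T))              # y_t = Sigma^{1/2} n_t
    R = (Y @ Y.T) / T
    T0 = np.trace(R) / N * np.eye(N)                 # scaled identity target
    rho, tau = loocv_scm(Y, T0)
    err = rho * R + tau * T0 - Sigma
    nmse.append(np.linalg.norm(err, 'fro') ** 2
                / np.linalg.norm(Sigma, 'fro') ** 2)
print(10 * np.log10(np.mean(nmse)))                  # NMSE in dB, cf. Eq. (81)
```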

2 2

0

0 -2

-4 -2 -6

-8 -4 -10

-6 Normalized MSE (dB) -12 Normalized MSE (dB) -14

-8 -16

Figure 2. NMSE of single-target (ST) shrinkage estimation of the covariance matrix for the linear model (35) with N = 50, M = 50, σ² = 0.1. The nondiagonal shrinkage target is constructed from the estimate of a past covariance matrix. The result indicated by "T0" corresponds to estimating Σ as T_0. "Identity" indicates that a scaled identity shrinkage target is used instead. We can show that imposing the constraint (23) leads to negligible change in performance for the proposed LOOCV approach.

Figure 3. MSE of covariance matrix estimation with multi-target (MT) shrinkage and LOOCV parameter choices. An AR covariance matrix with N = 50, r = 0.9 is assumed. "LA" refers to the method proposed by Lancewicki and Aladjem [17]. Note that the LOOCV methods and the LA method achieve similar performance for this example.

Example 2: Shrinkage toward a nondiagonal target: We then consider an example of the linear model given by (35). For each training length, 1000 random realizations of Σ = HH† + σ²I are generated and estimated through training, where σ² = 0.1. The entries of H are independently generated from N(0, 1) and then fixed for the whole training process. Given H, T training samples are generated by y = Hx + z, with the entries of x and z generated independently from N(0, 1) and N(0, σ²), respectively. In order to demonstrate the effectiveness of the LOOCV method for general shrinkage targets, we assume a scenario where H is slowly time-varying and the shrinkage target T_0 can be constructed as a well-conditioned estimate of a past covariance matrix

  Σ^past = H^past (H^past)† + σ²I,   (83)

where

  H^past = H + ∆,   (84)

and the entries of ∆ are independently drawn from N(0, 0.2) and are fixed for each repetition. Specifically, we construct T_0 as the shrinkage estimate of Σ^past using the SCM and the scaled identity target. This construction is similar to the knowledge-aided target considered in [3], and the resulting T_0 is not diagonal. We assume that the numbers of samples used for estimating Σ and T_0 are both equal to T. The simulation results for the normalized MSE are included in Fig. 2. It can be seen that the LOOCV methods generally achieve near-oracle performance and outperform the GLC method. Also, the nondiagonal shrinkage target achieves better performance than the scaled identity target.

Example 3: Shrinkage with multiple targets: A multi-target example is illustrated in Fig. 3. An AR covariance matrix is estimated by shrinking the SCM with three targets which can be represented by Eqn. (21) of [17]: T_1 = (tr(R)/N) I, T_2 = Diag(R), and T_3 is a symmetric, Toeplitz matrix which was considered in [17]:

  T_3 = (tr(R)/N) I + Σ_{i=1}^{N−1} [ tr(C_i R)/(2(N−i)) ] C_i,   (85)
where C_i is a symmetric, Toeplitz matrix with unit entries on the i-th sub- and super-diagonals and zeros elsewhere. It is seen that multi-target shrinkage can significantly outperform single-target shrinkage with T_0 = (tr(R)/N) I when the number of samples is large enough. For the oracle parameter choices, the unconstrained shrinkage design, which allows a larger set of shrinkage factors to be chosen, can noticeably outperform the design constrained by (71). However, when the proposed LOOCV methods are used, the gap is significantly reduced. We can show that when the number of samples is small, using a more parsimonious design with constrained shrinkage coefficients or fewer shrinkage targets may achieve better performance. It is seen that the multi-target shrinkage method of [17] (indicated by "MT-LA" in Fig. 3) performs similarly to the LOOCV method for this example. Note that the method of [17] assumes the SCM and shrinkage targets satisfying certain structures and does not apply directly to model-based covariance matrix estimation or more general shrinkage targets.

Example 4: Application to MMSE estimation of MIMO channels: A potential application of the proposed technique is the design of the MMSE estimator of MIMO channels. Consider a point-to-point MIMO system with N_t transmitting antennas and N_r receiving antennas. Let B be the length of the pilot sequence. The received signal matrix during the training stage is modelled as

  Y = HP + N,   (86)

where Y ∈ C^{N_r×B} is the received signal matrix, H ∈ C^{N_r×N_t} the channel matrix, P ∈ C^{N_t×B} the pilot matrix, and N ∈ C^{N_r×B} the noise, which is uncorrelated with H. Vectorizing Y in (86) gives

  y = P̃h + n,   (87)

where y = vec(Y), P̃ = Pᵀ ⊗ I, h = vec(H), n = vec(N), vec(·) denotes vectorization, and ⊗ denotes the Kronecker product. We assume a Rayleigh fading channel and denote by Σ_h ∈ C^{N_tN_r×N_tN_r} the covariance matrix of the channel vector h. We also assume that the disturbance n is complex Gaussian-distributed with a zero mean and an identity covariance matrix.

Given Σ_h, the MMSE estimate of h from y can be computed as [41]

  ĥ_MMSE = Σ_h P̃† ( P̃ Σ_h P̃† + I )⁻¹ y.   (88)

The covariance matrix Σ_h, which can be very large, must be estimated in order to compute ĥ_MMSE.

b 10

In communication systems, h is not directly observable and thus the SCM estimator cannot be directly applied to generate Σ̂_h. One may estimate Σ_h from least squares (LS) estimates of H, i.e.,

  Ĥ_LS = Y P†(PP†)⁻¹.   (89)

When an orthogonal training signal with P = √P̄ I is applied, where P̄ determines the power of the training signals, it can be shown that

  Ĥ_LS = (1/√P̄) Y = (1/√P̄)(HP + N) = H + (1/√P̄) N.   (90)

Denote by ĥ_LS the vectorization of Ĥ_LS. It can be shown that the covariance matrix of ĥ_LS is

  Σ_{h_LS} ≜ E[ ĥ_LS ĥ_LS† ] = Σ_h + (1/P̄) I.   (91)

Therefore, if Σ_{h_LS} is estimated as Σ̂_{h_LS}, we can then use (91) to estimate Σ_h as

  Σ̂_h = Σ̂_{h_LS} − (1/P̄) I,

which can be used in (88). The estimation of Σ_{h_LS} can be achieved using the different shrinkage estimators introduced in this paper.

An example is shown in Fig. 4. The covariance matrix is assumed to be

  Σ_h = Σ_t ⊗ Σ_r,   (92)

where Σ_t and Σ_r are, respectively, the transmitter-side and receiver-side covariance matrices, with entries given by

  [Σ_t]_{i,j} = r_t^{|i−j|} for i ≥ j,  (r_t*)^{|i−j|} for i < j,   (93)

  [Σ_r]_{i,j} = r_r^{|i−j|} for i ≥ j,  (r_r*)^{|i−j|} for i < j,   (94)

with r_t = 0.7e^{−j0.9349π} and r_r = 0.9e^{−j0.9289π}. While applying shrinkage to estimate Σ_{h_LS}, two shrinkage targets are tested: the identity matrix and the shrinkage estimate (with a scaled identity target) of a past covariance matrix. The second is considered based on the assumption that Σ_h is slowly varying in time and a well-conditioned estimate of a past covariance matrix Σ_h^past can be available. In our simulations, Σ_h^past is modeled by randomly perturbing r_t and r_r in (93) and (94) by δ_t and δ_r, whose real and imaginary parts are both randomly and uniformly generated from [−1/(10√2), 1/(10√2)]. The normalized MSE of channel estimation is defined as

  NMSE_h ≜ E[ ||ĥ_MMSE − h||_F² ] / E[ ||h||_F² ],   (95)

where ĥ_MMSE is the MMSE channel estimate obtained from (88) with the true channel covariance matrix replaced by its shrinkage estimate.
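A minimal sketch of the resulting channel estimator is shown below, assuming orthogonal pilots P = √P̄ I (so B = N_t) under which (88) collapses to a single matrix solve; Σ̂_{h_LS} is assumed to be supplied by one of the shrinkage estimators of this paper, and all names are ours.

```python
# Sketch of Eqs. (88)-(91) under orthogonal pilots P = sqrt(Pbar) I.
import numpy as np

def mmse_channel(Y_pilot, Pbar, Sigma_hls_hat):
    """Y_pilot: Nr x B received pilot block. Returns the vec(H) MMSE estimate."""
    h_ls = (Y_pilot / np.sqrt(Pbar)).reshape(-1, order='F')  # Eq. (90), vectorized
    n = Sigma_hls_hat.shape[0]
    Sigma_h = Sigma_hls_hat - np.eye(n) / Pbar               # Eq. (91) inverted
    # With P = sqrt(Pbar) I, Eq. (88) reduces to Sigma_h (Sigma_h + I/Pbar)^{-1} h_ls.
    return Sigma_h @ np.linalg.solve(Sigma_h + np.eye(n) / Pbar, h_ls)
```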
This results where Σt and Σr are, respectively, the transmitter side and receiver side covariance matrix, with entries given by in a shrunk LMMSE signal estimator 2 1 i j x = H†(ρ(HH† + σ I)+ τT0)− y. (97) rt| − |, i j [Σt]i,j = i j ≥ , (93) (r )| − |, i

[Figures 5 and 6 appear here. Figure 5 plots the normalized MSE of signal estimation (dB) versus T for the standard OLS estimator and for OLS with shrinkage under the LOOCV and oracle coefficient choices, with curve groups for N_r = N_t = 40 and N_r = 60, N_t = 40. Figure 6 plots the average output SINR (dB) versus T.]

Figure 5. Performance of the LMMSE signal estimator with the channel matrix and the received signal's covariance matrix estimated using OLS and shrinkage. The entries of H are independently generated from a complex Gaussian distribution with zero mean and variance 1/40, and the noise variance is σ² = 0.1.

Figure 6. Average output SINR for an MVDR beamformer with AoA mismatch and an estimated covariance matrix. The results labeled "SCM" are obtained by replacing Σ⁻¹ in (99) with the pseudo-inverse of the SCM. Note that the LOOCV and OAS methods achieve almost the same performance, which is slightly better than the GLC method when T is very small.

The normalized MSE of signal estimation is defined as

  NMSE_x ≜ E[ ||x̂ − x||_F² ] / E[ ||x||_F² ].   (98)

Fig. 5 presents the simulation results averaged over 1000 random realizations of H for each T. It can be seen that the shrinkage estimate of the covariance matrix can lead to a noticeable improvement in the MSE performance of signal estimation. The resulting performance can approach the oracle choice of (ρ, τ) that minimizes the MSE of estimating x [35]. Note that in contrast to the cross-validation methods in [35] and [36], which choose shrinkage factors by a grid search for optimizing the signal estimation performance, the method proposed in this paper has an analytical solution and optimizes covariance matrix estimation. It also differs from [47], which targets the design of a signal estimator that shrinks the sample LMMSE filter toward the matched filter.

Example 6: Application to MVDR beamforming: Finally, we show an example application to minimum variance distortionless response (MVDR) beamforming [31], [33]. We assume an N = 30-element uniform linear array (ULA) with half-wavelength spacing between neighboring antennas. As in [33], we assume that the desired complex Gaussian signal has an angle of arrival (AoA) of θ_0 = 0° and that there are 8 complex Gaussian interferences in the directions {θ_m} = {−8°, 15°, −23°, 21°, −46°, 44°, −85°, 74°}, all with an average power 10 dB higher than the desired signal. The noise is assumed to be additive white Gaussian noise (AWGN) with an average power 10 dB lower than the desired signal. The MVDR beamformer is given by

  w = Σ⁻¹s / (s†Σ⁻¹s),   (99)

where s is the steering vector of the desired signal and Σ is the covariance matrix of the received signal.
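A sketch of this beamformer with a shrinkage covariance estimate is given below; the half-wavelength ULA steering vector and the scaled-identity target follow our reading of the setup, and the function names are ours.

```python
# Sketch of the MVDR weights (99) with a shrinkage covariance estimate.
import numpy as np

def ula_steering(N, theta_deg):
    """Steering vector of an N-element half-wavelength-spaced ULA."""
    k = np.pi * np.sin(np.deg2rad(theta_deg))
    return np.exp(1j * k * np.arange(N))

def mvdr_weights(Y, theta_deg, rho, tau):
    """Y: N x T snapshots; (rho, tau) from, e.g., the loocv_scm sketch."""
    N, T = Y.shape
    R = (Y @ Y.conj().T) / T
    T0 = np.trace(R).real / N * np.eye(N)        # scaled identity target
    Sigma = rho * R + tau * T0                   # shrinkage estimate of Sigma
    s = ula_steering(N, theta_deg)
    w = np.linalg.solve(Sigma, s)                # Sigma^{-1} s
    return w / (s.conj() @ w)                    # Eq. (99)
```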
We consider a practical scenario where the desired signal's steering vector suffers from an AoA error uniformly distributed in [−5°, 5°], and Σ is estimated from the training samples by shrinking the SCM R toward the scaled identity matrix (tr(R)/N) I. We focus on the low-sample-support case and compare the result with an approach that uses the pseudo-inverse of the SCM for computing w. The output signal-to-interference-and-noise ratios (SINR) averaged over 1000 repetitions are plotted in Fig. 6. It is seen that though the proposed approach targets covariance matrix estimation only and is not optimized for beamformer designs, it still provides noticeable gains as compared to the pseudo-inverse approach in the low-sample-support regime.

V. CONCLUSIONS

In this paper, we have introduced a leave-one-out cross-validation (LOOCV) method for choosing the coefficients of linear shrinkage covariance matrix estimators. By employing a quadratic loss as the LOOCV prediction error, analytical expressions for the optimal shrinkage coefficients are obtained, which do not require a grid search over the parameters. As a result, the coefficients can be computed at low cost for SCM- and OLS-based estimation of the covariance matrix. The LOOCV method is generic in the sense that it can be applied to different covariance matrix estimation methods and different shrinkage targets. Numerical examples show that it can approximate the oracle parameter choices in general and has wider applicability than several existing analytical methods that have been widely applied.

Zero-mean signals have been assumed in this paper. When nonzero-mean signals are considered, our proposed approach may be applied after subtracting an estimate of the mean from the samples. However, the inaccuracy of the mean vector estimate may introduce extra errors into the covariance matrix estimation. Jointly estimating the mean and covariance matrix in a robust manner may be further explored. Other future work includes theoretical study of the properties of the proposed approach and low-complexity cross-validation schemes for choosing shrinkage factors for specific signal processing applications such as beamforming, space-time adaptive processing, correlation analysis, etc.

ACKNOWLEDGMENTS

The authors wish to thank Prof. Antonio Napolitano and the anonymous reviewers for their constructive comments, which have greatly improved the paper. This work was supported in part by an International Links grant of the University of Wollongong (UOW), Australia, and in part by NSFC under Grant 61601325.

REFERENCES

[1] L. L. Scharf, Statistical Signal Processing: Detection, Estimation, and Time Series Analysis. Addison-Wesley, Boston, 1991.
[2] O. Ledoit and M. Wolf, "A well-conditioned estimator for large-dimensional covariance matrices," J. Multivar. Anal., vol. 88, pp. 365-411, 2004.
[3] P. Stoica, J. Li, X. Zhu, and J. R. Guerci, "On using a priori knowledge in space-time adaptive processing," IEEE Trans. Sig. Process., vol. 56, no. 6, pp. 2598-2602, 2008.
[4] Y. Chen, A. Wiesel, Y. C. Eldar, and A. O. Hero, "Shrinkage algorithms for MMSE covariance estimation," IEEE Trans. Sig. Process., vol. 58, no. 10, pp. 5016-5029, 2010.
[5] L. Du, J. Li, and P. Stoica, "Fully automatic computation of diagonal loading levels for robust adaptive beamforming," IEEE Trans. Aerosp. Electron. Syst., vol. 46, no. 1, pp. 449-458, 2010.
[6] P. J. Bickel and E. Levina, "Regularized estimation of large covariance matrices," Ann. Statist., vol. 36, no. 1, pp. 199-227, 2008.
[7] P. J. Bickel and E. Levina, "Covariance regularization by thresholding," Ann. Statist., vol. 36, no. 6, pp. 2577-2604, 2008.
[8] C. Stein, "Inadmissibility of the usual estimator for the mean of a multivariate normal distribution," Proc. Third Berkeley Symp. Math. Statist. Prob., vol. 1, pp. 197-206, 1956.
[9] L. R. Haff, "Empirical Bayes estimation of the multivariate normal covariance matrix," Ann. Statist., vol. 8, no. 3, pp. 586-597, 1980.
[10] J. P. Hoffbeck and D. A. Landgrebe, "Covariance matrix estimation and classification with limited training data," IEEE Trans. Pattern Anal. Mach. Intell., vol. 18, no. 7, pp. 763-767, 1996.
[11] M. Daniels and R. Kass, "Shrinkage estimators for covariance matrices," Biometrics, vol. 57, pp. 1173-1184, 2001.
[12] J. Schäfer and K. Strimmer, "A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics," Statist. Appl. Genetics Molecular Biol., vol. 4, 2005.
[13] D. I. Warton, "Penalized normal likelihood and ridge regularization of correlation and covariance matrices," JASA, vol. 103, no. 481, pp. 340-349, 2008.
[14] T. J. Fisher and X. Sun, "Improved Stein-type shrinkage estimators for the high-dimensional multivariate normal covariance matrix," Comput. Statist. Data Anal., vol. 55, pp. 1909-1918, 2011.
[15] X. Chen, Z. Jane Wang, and M. J. McKeown, "Shrinkage-to-tapering estimation of large covariance matrices," IEEE Trans. Sig. Process., vol. 60, pp. 5640-5656, 2012.
[16] J. Theiler, "The incredible shrinking covariance estimator," Proc. SPIE, vol. 8391, p. 83910P, 2012.
[17] T. Lancewicki and M. Aladjem, "Multi-target shrinkage estimation for covariance matrices," IEEE Trans. Sig. Process., vol. 62, no. 24, pp. 6380-6390, 2014.
[18] Y. Ikeda, T. Kubokawa, and M. S. Srivastava, "Comparison of linear shrinkage estimators of a large covariance matrix in normal and non-normal distributions," Comput. Stat. Data Anal., vol. 95, pp. 95-108, 2016.
[19] T. Tong, C. Wang, and Y. Wang, "Estimation of variances and covariances for high-dimensional data: a selective review," Wiley Interdiscip. Rev. Comput. Stat., vol. 6, no. 4, pp. 255-264, 2014.
[20] M. Senneret, Y. Malevergne, P. Abry, G. Perrin, and L. Jaffrès, "Covariance versus precision matrix estimation for efficient asset allocation," IEEE J. Sel. Topics Sig. Process., vol. 10, no. 6, pp. 982-993, Sept. 2016.
[21] J. Fan, Y. Liao, and H. Liu, "An overview of the estimation of large covariance and precision matrices," Econom. J., vol. 19, no. 1, pp. C1-C32, 2016.
[22] O. Ledoit and M. Wolf, "Nonlinear shrinkage estimation of large-dimensional covariance matrices," Ann. Statist., vol. 40, no. 2, pp. 1024-1060, 2012.
[23] C. Lam, "Nonparametric eigenvalue-regularized precision or covariance matrix estimator," Ann. Statist., vol. 44, no. 3, pp. 928-953, 2016.
[24] A. Aubry, A. De Maio, L. Pallotta, and A. Farina, "Maximum likelihood estimation of a structured covariance matrix with a condition number constraint," IEEE Trans. Signal Process., vol. 60, pp. 3004-3021, 2012.
[25] J.-H. Won, J. Lim, S.-J. Kim, and B. Rajaratnam, "Condition-number-regularized covariance estimation," J. Roy. Statist. Soc. B, vol. 75, pp. 427-450, Jun. 2013.
[26] A. Kourtis, G. Dotsis, and R. N. Markellos, "Parameter uncertainty in portfolio selection: Shrinking the inverse covariance matrix," J. Bank. Financ., vol. 36, no. 9, pp. 2522-2531, 2012.
[27] M. Zhang, F. Rubio, and D. P. Palomar, "Improved calibration of high-dimensional precision matrices," IEEE Trans. Sig. Process., vol. 61, no. 6, pp. 1509-1519, 2013.
[28] C. Wang, G. Pan, T. Tong, and L. Zhu, "Shrinkage estimation of large dimensional precision matrix using random matrix theory," Statistica Sinica, vol. 25, no. 3, pp. 993-1008, 2015.
[29] T. Ito and T. Kubokawa, "Linear ridge estimator of high-dimensional precision matrix using random matrix theory," Technical Report F-995, CIRJE, Faculty of Economics, University of Tokyo, 2015.
[30] T. Bodnar, A. K. Gupta, and N. Parolya, "Direct shrinkage estimation of large dimensional precision matrix," J. Multivar. Anal., vol. 146, pp. 223-236, 2016.
[31] X. Mestre and M. A. Lagunas, "Finite sample size effect on minimum variance beamformers: Optimum diagonal loading factor for large arrays," IEEE Trans. Sig. Process., vol. 54, no. 1, pp. 69-82, 2006.
[32] C.-K. Wen, J.-C. Chen, and P. Ting, "A shrinkage linear minimum mean square error estimator," IEEE Sig. Process. Lett., vol. 20, no. 12, pp. 1179-1182, 2013.
[33] J. Serra and M. Nájar, "Asymptotically optimal linear shrinkage of sample LMMSE and MVDR filters," IEEE Trans. Sig. Process., vol. 62, no. 14, pp. 3552-3564, 2014.
[34] M. Zhang, F. Rubio, D. Palomar, and X. Mestre, "Finite-sample linear filter optimization in wireless communications and financial systems," IEEE Trans. Sig. Process., vol. 61, no. 20, pp. 5014-5025, 2013.
[35] J. Tong, P. J. Schreier, Q. Guo, S. Tong, J. Xi, and Y. Yu, "Shrinkage of covariance matrices for linear signal estimation using cross-validation," IEEE Trans. Sig. Process., vol. 64, no. 11, pp. 2965-2975, 2016.
[36] J. Tong, Q. Guo, J. Xi, Y. Yu, and P. J. Schreier, "Choosing the diagonal loading factor for linear signal estimation using cross validation," in Proc. IEEE ICASSP 2016, pp. 3956-3959, 2016.
[37] J. R. Guerci and E. J. Baranoski, "Knowledge-aided adaptive radar at DARPA: an overview," IEEE Signal Process. Mag., vol. 23, no. 1, pp. 41-50, Jan. 2006.
[38] S. Arlot and A. Celisse, "A survey of cross-validation procedures for model selection," Statist. Surv., vol. 4, pp. 40-79, 2010.
[39] G. H. Golub, M. Heath, and G. Wahba, "Generalized cross-validation as a method for choosing a good ridge parameter," Technometrics, vol. 21, no. 2, pp. 215-223, 1979.
[40] R. D. Nowak, "Optimal signal estimation using cross-validation," IEEE Sig. Process. Lett., vol. 4, no. 1, pp. 23-25, 1997.
[41] E. Björnson and B. Ottersten, "A framework for training-based estimation in arbitrarily correlated Rician MIMO channels with Rician disturbance," IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1807-1820, March 2010.
[42] N. Shariati, E. Björnson, M. Bengtsson, and M. Debbah, "Low-complexity polynomial channel estimation in large-scale MIMO with arbitrary statistics," IEEE J. Sel. Topics Signal Process., vol. 8, no. 5, pp. 815-830, Oct. 2014.
[43] W. Weichselberger, M. Herdin, H. Özcelik, and E. Bonek, "A stochastic MIMO channel model with joint correlation of both link ends," IEEE Trans. Wireless Commun., vol. 5, no. 1, pp. 90-100, 2006.
[44] J. Fang, X. Li, H. Li, and F. Gao, "Low-rank covariance-assisted downlink training and channel estimation for FDD massive MIMO systems," IEEE Trans. Wireless Commun., vol. 16, no. 3, pp. 1935-1947, March 2017.
[45] D. Tse and P. Viswanath, Fundamentals of Wireless Communications. Cambridge, U.K.: Cambridge Univ. Press, 2005.
[46] N. Kim, Y. Lee, and H. Park, "Performance analysis of MIMO system with linear MMSE receiver," IEEE Trans. Wireless Commun., vol. 7, no. 11, pp. 4474-4478, Nov. 2008.
[47] J. Tong, J. Xi, Q. Guo, and Y. Yu, "Low-complexity cross-validation design of a linear estimator," Electronics Letters, vol. 53, no. 18, pp. 1252-1254, 2017.