
Rotational invariant estimator for general noisy matrices

Joël Bun∗†‡, Romain Allez§, Jean-Philippe Bouchaud∗ and Marc Potters∗

∗Capital Fund Management, 23 rue de l'Université, 75007 Paris, France; †LPTMS, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91405 Orsay, France; ‡Léonard de Vinci Pôle Universitaire, Finance Lab, 92916 Paris La Défense, France; §Weierstrass Institute, Mohrenstr. 39, 10117 Berlin, Germany

arXiv:1502.06736v2 [cond-mat.stat-mech] 26 Oct 2016

Abstract—We investigate the problem of estimating a given real symmetric signal matrix C from a noisy observation matrix M in the limit of large dimension. We consider the case where the noisy measurement M comes either from an arbitrary additive or multiplicative rotational invariant perturbation. We establish, using the Replica method, the asymptotic global law estimate for three general classes of noisy matrices, significantly extending previously obtained results. We give exact results concerning the asymptotic deviations (called overlaps) of the perturbed eigenvectors away from the true ones, and we explain how to use these overlaps to "clean" the noisy eigenvalues of M. We provide some numerical checks for the different estimators proposed in this paper and we also make the connection with some well known results of Bayesian statistics.

I. INTRODUCTION

One of the most challenging problems in modern statistical analysis is to extract a true signal from noisy observations in data sets of very large dimensionality. Be it in physics, genomics, engineering or finance, scientists are confronted with datasets where the sample size T and the number of variables N are both very large, but with an observation ratio q = N/T that is not small compared to unity. This setting is known in the literature as the high-dimensional limit, and differs from the traditional large T, fixed N situation (i.e. q → 0), meaning that classical results of multivariate statistics do not necessarily apply.

However, when one deals with very large random matrices (such as covariance matrices), one expects the spectral measure of the matrix under scrutiny to exhibit some universal properties, which are independent of the specific realization of the matrix itself. This property is at the core of Random Matrix Theory (RMT), which provides a very precise description of the convergence of the spectral measure for a very large class of random matrices. Perhaps the two most influential results are Wigner's semicircle law [1] and Marčenko and Pastur's theorem [2]. As far as inference is concerned, the latter result is arguably the cornerstone result of RMT, in the sense that it gives theoretical tools to understand why classical results are insufficient, and it is now at the heart of many applications in this field (for reviews, see e.g. [3], [4], [5], [6] or more recently [7] and references therein).

In this paper, we consider the statistical problem of a N × N matrix C which stands for the unknown signal that one would like to estimate from the noisy measurement of a N × N matrix M, in the limit of large dimension N → ∞. A natural question in statistics is to find an estimator Ξ̂(M) of the true signal C that depends on the dataset M we have. The true matrix C is unknown and we do not have any particular insights on its components (the eigenvectors). Therefore we would like our estimator Ξ̂(M) to be constructed in a rotationally invariant way from the noisy observation M that we have. In simple terms, this means that there is no privileged direction in the N-dimensional space that would allow one to bias the eigenvectors of the estimator Ξ̂(M) in some special directions. More formally, the estimator construction must obey:

Ω Ξ̂(M) Ω† = Ξ̂(Ω M Ω†) ,   (I.1)

for any rotation matrix Ω. Any estimator satisfying Eq. (I.1) will be referred to as a Rotational Invariant Estimator (RIE). In this case, it turns out that the eigenvectors of the estimator Ξ̂(M) have to be the same as those of the noisy matrix M [8], [9].

As we will show in Section II, this implies that the best possible estimator Ξ̂(M) depends on the overlaps (i.e. the squared scalar products) between the eigenvectors of C and those of M. These overlaps turn out to be fully computable in the large N limit, using tools from Random Matrix Theory, for a wide class of noise sources, much beyond the usual Gaussian models.

The study of the eigenvectors for statistical purposes is in fact quite a recent topic in random matrices. For sample covariance matrices, such considerations have been studied in [10] and [11]. In the latter paper, the notion of overlap and optimal (oracle) estimator are treated in great detail. As far as we know, this is the only paper in the literature where the oracle estimator is related to random matrices. Beside the sample covariance matrix, the problem of the overlap for a Gaussian matrix with an external source (also named the deformed Wigner ensemble) has been treated first in [12] and then reconsidered in a more general setting in [13] using Dyson Brownian motions (for an early – but extremely brief – mention of these overlaps, see [14, Section 6.2]). However, no mention on how to clean the 'noisy' matrix was given in [12], [13]. It is this gap that we hope to fill here. We also extend these results to a much broader class of random perturbations, which, to the best of our knowledge, was not considered before.

The outline of this paper is organized as follows. We introduce in Section II-A some notations and show that the optimal (oracle) RI estimator involves the overlaps between the eigenvectors of the signal matrix C and its noisy estimate M. In Section II-B, we observe that a convergence result on the resolvent of M not only gives us all the information about the eigenvalues, but also about the eigenvectors. After motivating the study of the resolvent of the measurement matrix M, we provide in Section III explicit expressions for three different perturbation processes. The first one is the case where we add a noisy matrix that is free with respect to the signal C. The second model concerns multiplicative perturbations and includes the sample covariance matrix of (elliptically distributed) random variables. We also reconsider the case of the so-called 'Information-Plus-Noise' matrix, which deals with sample covariance matrices constructed from rectangular Gaussian matrices with an external source. The evaluation of the resolvent for each model is based on the powerful but non-rigorous replica method (which has been extremely successful in various contexts, including RMT or disordered systems – see [15], or [16] for a more recent review). We will see that the derivation of our results using replicas can be done without too much effort, and one can certainly imagine that our results can be proven rigorously, as was done in [17] for the resolvent or [11], [12], [13] for the overlaps of covariance matrices and Gaussian matrices with external sources. We relegate all these technicalities to various appendices and only give our final results and their numerical verifications in Section III. Note in passing that we obtain using replicas the multiplication law of the S-transform for products of free matrices (see Appendix B-C), a derivation that we have not seen in the literature before. In Section IV, we come back to the estimation problem and apply the results of Section III to derive the optimal RIE for each considered model. In the multiplicative case, we recover and generalize the estimator recently derived by Ledoit and Péché for covariance matrices [11]. Each estimator is illustrated by numerical simulations, and we also provide some analytical formulas that can be of particular interest for real life problems. We then conclude this work with some open problems and possible applications of our results.

Conventions. We use bold capital letters for matrices and bold lowercase letters for vectors. We denote usual RMT spectral transforms with calligraphic font. Finally, all acronyms and notations are summarized in Appendix C.

II. ROTATIONALLY INVARIANT ESTIMATORS, EIGENVECTOR OVERLAPS AND THE RESOLVENT

A. The oracle estimator and the overlaps

Throughout this work, we will consider the signal matrix C to be a symmetric matrix of dimension N, with N that goes to infinity. We denote by c_1 > c_2 > ... > c_N its eigenvalues and by |v_1⟩, |v_2⟩, ..., |v_N⟩ their corresponding eigenvectors. The perturbed matrix M will be assumed to be symmetric, with eigenvalues denoted by λ_1 > λ_2 > ... > λ_N associated to the eigenvectors |u_1⟩, |u_2⟩, ..., |u_N⟩. In the limit of large dimension, it is often more convenient to index the eigenvectors of both matrices by their corresponding eigenvalues, i.e. |u_i⟩ → |u_{λ_i}⟩ and |v_i⟩ → |v_{c_i}⟩ for any integer 1 ≤ i ≤ N, and this is the convention that we adopt henceforth.

We now attempt to construct an optimal estimator Ξ̂(M) of the true signal C that relies on the given dataset M at our disposal. We recall that our main assumption is that we have no prior insights on the eigenvectors of the matrix C, so that all estimators considered below satisfy Eq. (I.1). We conclude from the seminal work of [9] that any Rotational Invariant Estimator Ξ(M) of C shares the same eigenbasis as M, that is to say

Ξ(M) = Σ_{i=1}^N ξ_i |u_i⟩⟨u_i| ,   (II.1)

where the eigenvalues ξ_1, ..., ξ_N are the quantities we wish to estimate. Next, the optimality of an estimator is defined with respect to a specific loss function (e.g. the distance), and a standard metric is to consider the (squared) Euclidean (or Frobenius) norm, that we shall denote by

||C − Ξ(M)||²_{L2} := Tr[(C − Ξ(M))²] ,

for a given RIE Ξ(M). The best estimator with respect to this loss function is the solution of the following minimization problem

Ξ̂(M) = argmin_{RI Ξ(M)} ||C − Ξ(M)||_{L2} ,   (II.2)

considered over the set of all possible RI estimators Ξ(M). Since the only free variables in the constrained optimization problem (II.2) are the eigenvalues of Ξ(M), it is easy to find the optimal solution:

Ξ̂(M) = Σ_{i=1}^N ξ̂_i |u_i⟩⟨u_i| ,   ξ̂_i = Σ_{j=1}^N ⟨u_i|v_j⟩² c_j ,   (II.3)

where we see that the optimal eigenvalues ξ̂_i depend on the overlaps between the perturbed eigenvectors |u_i⟩, the non-perturbed eigenvectors |v_j⟩ and the eigenvalues of the true matrix C. A few comments on this estimator are in order. First, the estimator ξ̂_i is designed to construct the best RI estimator Ξ̂(M) given in (II.1). The consequence is that if we restrict our estimator to have the eigenvectors of the noisy matrix M, then the naive approach that consists in substituting the eigenvalues {ξ̂_i}_{i=1}^N with the true ones {c_i}_{i=1}^N yields a spectrum that is too wide. Indeed, it is not hard to see from (II.3) that the top eigenvalues are shrunk downward while the bottom ones are shrunk upward. In other words, the empirical spectral density (ESD) of the ξ̂_i is narrower than the true one, which shows that the RI estimator cannot be attained by the "eigenvalues substitution" procedure independently proposed in [5], [18], aside from the trivial case C = I_N.

We shall also see that the estimator ξ̂_i is self-averaging in the large N limit (in the sense that it converges almost surely, see Section II-B) and can thus be approximated by its expectation value

ξ̂_i ≈ Σ_{j=1}^N E[⟨u_i|v_j⟩²] c_j ,

where E[·] denotes the expectation value with respect to the random eigenvectors (|u_i⟩)_i of the matrix M. We will often use the following notation for the (rescaled) mean square overlaps

O(λ_i, c_j) := N E[⟨u_i|v_j⟩²] .   (II.4)

Eqs. (II.3) and (II.4) are the quantities of interest in this paper. In Statistics, the optimal solution (II.3) is sometimes called the oracle estimator, because it depends explicitly on the knowledge of the true signal C. The "miracle" is that in the large N limit, and for a large class of problems, one can actually express this oracle estimator in terms of the (observable) limiting spectral density (LSD) of M only.

B. Relation between the resolvent and the overlaps

A convenient way to work out the overlap (II.4) is to study the resolvent of M, defined as

G_M(z) := (z I_N − M)^{−1} .   (II.5)

The claim is that for z not too close to the real axis, the matrix G_M(z) is self-averaging in the large N limit, so that its value is independent of the specific realization of M. More precisely, it means that G_M(z) converges to a deterministic matrix for any fixed value (i.e. independent of N) of z ∈ C \ R when N → ∞. We will refer to this deterministic limit as the global law of G_M(z) in the following.¹

¹Remember that we have ranked the eigenvalues.

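Before exploiting the resolvent, the oracle formula (II.3) is easy to check numerically at finite N. The sketch below is our own illustration (the sizes, the Wishart signal and the GOE noise level are arbitrary choices, not parameters taken from the text): it builds C and M = C + OBO† and evaluates ξ̂_i directly from the two eigenbases.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200

# Signal C: a white Wishart matrix (aspect ratio 0.5), one of the paper's
# running examples; any symmetric C would do here.
T0 = 2 * N
H = rng.standard_normal((N, T0))
C = H @ H.T / T0

# Free additive noise: a GOE matrix with entrywise variance sigma^2 / N.
sigma = 1.0
W = rng.standard_normal((N, N))
M = C + sigma * (W + W.T) / np.sqrt(2 * N)

c, V = np.linalg.eigh(C)        # columns of V are the |v_j>
lam, U = np.linalg.eigh(M)      # columns of U are the |u_i>

# Oracle eigenvalues, Eq. (II.3): xi_i = sum_j <u_i|v_j>^2 c_j.
overlap_sq = (U.T @ V) ** 2     # entry (i, j) is <u_i|v_j>^2
xi = overlap_sq @ c

# Each xi_i is a convex combination of the c_j, so the oracle spectrum is
# squeezed relative to the noisy one (top shrunk down, bottom shrunk up).
print(lam.min(), xi.min(), xi.max(), lam.max())
```

Since V is orthogonal, each row of `overlap_sq` sums to one, which is the finite-N counterpart of the normalization of O(λ, c).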
The relation between the resolvent and the overlaps O(λ_j, c_i) is relatively straightforward. For z = λ − iη with λ ∈ R and η ≫ N^{−1}, we have

G_M(λ − iη) = Σ_{k=1}^N [(λ − λ_k + iη) / ((λ − λ_k)² + η²)] |u_k⟩⟨u_k| .

If we take the trace of the above quantity, and take the limit η → 0 (after N → ∞), one obtains the limiting "density of states" (i.e. the LSD) ρ_M (see Appendix A):

Im g_M(λ − iη) ≡ (1/N) Im Tr G_M(λ − iη) = π ρ_M(λ) .   (II.6)

Similarly, the elements of Im G_M(λ − iη) can be written for η > 0 as

⟨v_i| Im G_M(λ − iη) |v_i⟩ = Σ_{k=1}^N [η / ((λ − λ_k)² + η²)] ⟨v_i|u_k⟩² .   (II.7)

This latter quantity is also self-averaging in the large N limit (the overlaps ⟨v_i|u_k⟩², k = 1, ..., N with i fixed display asymptotic independence when N → ∞, so that the law of large numbers applies here) and we have

⟨v_i| Im G_M(λ − iη) |v_i⟩ → (N → ∞) ∫ O(µ, c_i) ρ_M(µ) [η / ((λ − µ)² + η²)] dµ ,

where the overlap function O(µ, c_i) is extended (continuously) to arbitrary values of µ inside the support of ρ_M in the large N limit. Sending η → 0 in this latter equation, we finally obtain the following formula, valid in the large N limit:

⟨v_i| Im G_M(λ − iη) |v_i⟩ ≈ π ρ_M(λ) O(λ, c_i) .   (II.8)

Eq. (II.8) will thus enable us to investigate the overlaps O(λ, c_i) in great detail through the calculation of the elements of the resolvent G_M(z). This is what we aim for in the next section. We emphasize that the different equations for the mean square overlaps O(λ, c_i) below will be expressed in the basis where C is diagonal, without loss of generality (see Appendix B for more details).

III. OVERLAPS: SOME EXACT RESULTS

A. Free additive noise

The first model of noisy measurement that we consider is the case where the true signal C is corrupted by a free additive noise, that is to say

M = C + OBO† ,   (III.1)

where B is a fixed matrix with eigenvalues b_1 > b_2 > ... > b_N with limiting spectral density ρ_B, and O is a random matrix chosen uniformly in the Orthogonal group O(N) (i.e. according to the Haar measure). This family of models has found several applications in the statistical physics of disordered systems subject to an external perturbation, where the matrix M is interpreted as the Hamiltonian of the system, given by the sum of a deterministic term and a random term [19]. A simple example is when the noisy matrix OBO† is a symmetric Gaussian random matrix with independent and identically distributed (i.i.d.) entries, corresponding to the so-called GOE (Gaussian Orthogonal Ensemble). By construction, the eigenvectors of a GOE matrix are invariant under rotation.

It is now well known that the spectral density of M can be obtained from that of C and B using free addition, see [20] and, in the language of statistical physics, [21]. The statistics of the eigenvalues of M has therefore been investigated in great detail, see [22] and [23] for instance. However, the question of the eigenvectors has been much less studied, except recently in [12], [13] in the special case where OBO† belongs to the GOE (see below).

For a general free additive noise, we show in Appendix B-2 that the global law estimate for the resolvent reads, in the large N limit:

⟨G_M(z)⟩ = G_C(Z(z)) ,   (III.2)

where the function Z(z) is given by

Z(z) = z − R_B(g_M(z)) ,   (III.3)

and R_B is the so-called R-transform of B (see Appendix A for a reminder of the definitions of the different useful spectral transforms). Note that Eq. (III.2) is a matrix relation, which simplifies when written in the basis where C is diagonal, since in this case G_C(Z) is also diagonal. Therefore, the evaluation of the overlap O(λ, c) is straightforward using Eq. (II.8). Let us define the Hilbert transform h_M(λ), which is simply the real part of the Stieltjes transform g_M(λ − iη) in the limit η → 0. Then the overlap for the free additive noise is given by:

O(λ, c) = β_1(λ) / [ (λ − c − α_1(λ))² + π² β_1(λ)² ρ_M(λ)² ] ,   (III.4)

where c is the corresponding eigenvalue of the unperturbed matrix C, and where we defined:

α_1(λ) := Re[ R_B(h_M(λ) + iπρ_M(λ)) ] ,
β_1(λ) := Im[ R_B(h_M(λ) + iπρ_M(λ)) ] / (π ρ_M(λ)) .   (III.5)

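Given numerical access to ρ_M, h_M and to the R-transform of the noise, Eqs. (III.4)-(III.5) translate into a few lines of code. The sketch below is our own; as a smoke test we feed it the GOE case R_B(x) = σ²x with C = 0, where M is a pure semicircle and the overlap must equal 1 everywhere in the bulk (each eigenvector of a GOE matrix is uniformly distributed, so N E[⟨u|v⟩²] = 1).

```python
import numpy as np

def overlap_additive(lam, c, rho_M, h_M, R_B):
    """Mean squared overlap O(lam, c) for M = C + O B O^T, Eqs. (III.4)-(III.5).

    rho_M, h_M : density and Hilbert transform of M at lam
    R_B        : R-transform of the noise, a callable accepting complex input
    """
    r = R_B(h_M + 1j * np.pi * rho_M)
    alpha1 = r.real                          # Eq. (III.5)
    beta1 = r.imag / (np.pi * rho_M)         # Eq. (III.5)
    return beta1 / ((lam - c - alpha1) ** 2
                    + np.pi ** 2 * beta1 ** 2 * rho_M ** 2)

# Smoke test: GOE noise (R_B(x) = sigma^2 x) on top of C = 0, so that M is
# a pure semicircle with known density and Hilbert transform.
sigma = 1.0
lam = 0.3
rho = np.sqrt(4 * sigma ** 2 - lam ** 2) / (2 * np.pi * sigma ** 2)  # semicircle
h = lam / (2 * sigma ** 2)                  # Hilbert transform of the semicircle
print(overlap_additive(lam, 0.0, rho, h, lambda x: sigma ** 2 * x))  # -> 1.0
```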
As a first check of these results, let us consider the normalized trace of Eq. (III.2) and then set u = g_M(z) = g_C(Z(z)). One can find, by using the Blue transform defined in (A.3), that we indeed retrieve the free addition formula R_M(u) = R_C(u) + R_B(u) when N → ∞, as it should be.

Deformed GOE: As a second verification, we specialize our result to the case where OBO† is a GOE matrix such that the entries have a variance equal to σ²/N. Then, one has R_B(z) = σ²z, meaning that Eq. (III.3) simply becomes Z(z) = z − σ²g_M(z). This allows us to get a simpler expression for the overlap:

O(λ, c) = σ² / [ (c − λ + σ²h_M(λ))² + σ⁴π²ρ_M(λ)² ] ,   (III.6)

which is exactly the result derived in [12], [13] using other methods. In Fig. 1, we illustrate this formula in the case where C is an isotropic Wishart matrix of parameter q, by taking e.g. C = T^{−1}HH†, where H is a random matrix of size N × T filled with i.i.d. standard Gaussian entries and q = N/T. We set N = 500, T = 1000, and take OBO† as a GOE matrix with variance 1/N. For a fixed C, we generate 1000 samples of M given by Eq. (III.1), for which we can measure numerically the overlap quantity. We see that the theoretical prediction (III.6) agrees remarkably with the numerical simulations.

Fig. 1. Computations of the rescaled overlap O(λ, c) as a function of c in the free addition perturbation. We chose i = 250, C a Wishart matrix with parameter q = 0.5 and B a Wigner matrix with σ² = 1. The black dotted points are computed using numerical simulations and the plain red curve is the theoretical prediction Eq. (III.4). The agreement is excellent. For i = 250, we have c_i ≈ 0.83 and we see that the peak of the curve is in that region. The same observation holds for i = 400, where c_i ≈ 1.66. The numerical curves display the empirical mean values of the overlaps over 1000 samples of M given by Eq. (III.1) with C fixed.

B. Free multiplicative noise and empirical covariance matrices

Our second model deals with multiplicative noise in the following sense: we consider that the noisy measurement matrix M can be written as

M = √C OBO† √C ,   (III.7)

where again C is the signal, B is a fixed matrix with eigenvalues b_1 > b_2 > ... > b_N with limiting density ρ_B, and O is a random matrix chosen in the Orthogonal group O(N) according to the Haar measure. Note that we implicitly require that C is positive definite in Eq. (III.7), so that the square root of C is well defined. An explicit example of such a problem is provided by sample covariance matrices (where B is a Wishart matrix [24]), which is of particular interest in multivariate statistical analysis. We shall come back later to this application. The Replica analysis leads to the following system of equations (see Appendix B-C) for the general problem of a free multiplicative noise above, Eq. (III.7):

z⟨G_M(z)⟩ = Z(z) G_C(Z(z)) ,   (III.8)

with:

Z(z) = z S_B(z g_M(z) − 1) ,   (III.9)

where S_B is the so-called S-transform of B (see Appendix A) and g_M is the normalized trace of G_M(z). The latter obeys, from Eq. (III.8), the self-consistent equation:

z g_M(z) = Z(z) g_C(Z(z)) .   (III.10)

Again, Eq. (III.8) is a matrix relation, which simplifies when written in the basis where C is diagonal. Note that Eqs. (III.9) and (III.10) allow us to retrieve the usual free multiplicative convolution, that is to say:

S_M(u) = S_C(u) S_B(u) .   (III.11)

This result is thus the analog of our result (III.2) in the multiplicative case. We refer the reader to Appendix B-C for more details. We emphasize that, for technical reasons, we restrict B to have a normalized trace that differs from zero.

With the global law estimate for the resolvent given by Eqs. (III.8) and (III.9) above, we can obtain a general overlap formula for the free multiplicative noise case.

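As a concrete consistency check of the pair (III.9)-(III.10) (our own, not from the text): for C = I_N and B a white Wishart matrix, whose S-transform S_B(x) = 1/(1 + qx) is standard, the two equations collapse to a quadratic in g_M whose solution must reproduce the Marčenko-Pastur density.

```python
import numpy as np

# For C = I_N (g_C(z) = 1/(z - 1)) and S_B(x) = 1/(1 + q x), the pair
# (III.9)-(III.10) reduces to the quadratic  q z g^2 + (1 - q - z) g + 1 = 0.
def g_sample_cov_identity(z, q):
    roots = np.roots([q * z, 1.0 - q - z, 1.0])
    # physical branch: Im g(z) > 0 when Im z < 0 (cf. Eq. (II.6))
    return roots[np.argmax(roots.imag)]

q = 0.25
lam = 1.2                        # inside the bulk [(1 - sqrt(q))^2, (1 + sqrt(q))^2]
z = lam - 1e-9j
g = g_sample_cov_identity(z, q)

# The root indeed solves (III.10): z g = Z g_C(Z) with Z = z S_B(z g - 1).
Z = z / (1.0 - q + q * z * g)
assert abs(z * g - Z / (Z - 1.0)) < 1e-8

# The density Im g / pi matches the closed-form Marcenko-Pastur density.
lp, lm = (1 + np.sqrt(q)) ** 2, (1 - np.sqrt(q)) ** 2
rho_mp = np.sqrt((lp - lam) * (lam - lm)) / (2 * np.pi * q * lam)
print(g.imag / np.pi, rho_mp)    # both ≈ 0.5299
```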
Let us define the following functions:

α_2(λ) := lim_{z→λ−i0⁺} Re[ 1 / S_B(z g_M(z) − 1) ] ,
β_2(λ) := lim_{z→λ−i0⁺} Im[ 1 / S_B(z g_M(z) − 1) ] / (π ρ_M(λ)) ;   (III.12)

then the overlap O(λ, c) between the eigenvectors of C and M is given by:

O(λ, c) = c β_2(λ) / [ (λ − c α_2(λ))² + π² c² β_2(λ)² ρ_M(λ)² ] .   (III.13)

In order to give more insight on our results, we will now specify them to some well-known applications of multiplicative models in RMT.

Empirical covariance matrix: As mentioned previously, the most famous application of a model of the form (III.7) is given by the sample covariance estimator, which we recall briefly. Let us define by R := (R_it) ∈ R^{N×T} the observation matrix whose columns represent the T collected samples, which we assume to be independently and identically distributed with zero mean. The N elements of each sample generally display some degree of interdependence, which is often represented by the true (or also population) covariance matrix C := (C_ij) ∈ R^{N×N}, defined as ⟨R_it R_jt'⟩ = C_ij δ_{t,t'}, where δ_{t,t'} is the Kronecker symbol. As the signal C is unknown, the classical way to estimate the covariances is to compute the empirical (or sample) covariance matrix thanks to the Pearson estimator

M := (1/T) RR† ≡ √C XX† √C ,   (III.14)

where X is a N × T random matrix whose elements are i.i.d. random variables with zero mean and variance T^{−1}. It is easy to see that this model is a particular case of the model (III.7), where B ≡ XX†, whose S-transform has an explicit expression [3]:

S_B(x) = 1 / (1 + qx) ,   q = N/T .   (III.15)

Using our general results Eqs. (III.8) and (III.9), we obtain

z⟨G_M(z)⟩ = Z(z) G_C(Z(z)) ,   Z(z) := z / (1 − q + q z g_M(z)) ,   (III.16)

which is exactly the result found in [25] and also in [17] at leading order. We can therefore recover the well-known Marčenko-Pastur equation [2], which gives a fixed point equation satisfied by the Stieltjes transform of M in terms of the Stieltjes transform of the true matrix C:

z g_M(z) = Z(z) g_C(Z(z)) ,   Z(z) := z / (1 − q + q z g_M(z)) .   (III.17)

The expression of the limiting overlaps can be further simplified in this particular case to

O(λ, c) = qcλ / [ (c(1 − q) − λ + qcλ h_M(λ))² + q²λ²c² π² ρ_M(λ)² ] ,   (III.18)

and we recover, as expected, the Ledoit & Péché result established in [11]. As a conclusion, our result generalizes the standard Marčenko & Pastur formalism to an arbitrary multiplicative noise term OBO†.

Elliptical ensemble: A slightly more general application of the model (III.7) is when we suppose that the entries of the observation matrix R can be written as the product of two independent sources, R_it = σ_t Y_it. The {Y_it} are characterized by the true signal, i.e. C_ij = ⟨Y_it Y_jt'⟩ δ_{t,t'}, and are generated independently from the same distribution at each time t, which will be assumed to be Gaussian in our case. The {σ_t} are such that ⟨σ²⟩ = 1, and allow one to add a time-dependent volatility, with a factor σ_t that is common to all variables at time t. This defines the class of elliptical distributions, and the most famous application is when the {σ_t} are drawn from an inverse-gamma distribution, which leads to the multivariate Student distribution [26] (see Sec. IV-B below). The corresponding empirical correlation matrix can be written as

M = √C X Σ X† √C ,   (III.19)

where Σ := diag(σ_1², σ_2², ..., σ_T²) and X is defined as in Eq. (III.14). This model has been the subject of several studies in RMT, see e.g. [27], [28], [29] or [30]. In all these works, the expression of the limiting Stieltjes transform of the spectral density is quite complex, except for the case where C is the identity matrix. We find here that we can in fact obtain a self-consistent expression for the global law estimate of the corresponding resolvent by introducing the appropriate transforms. Our result generalizes the time-independent result of [25] or [17], and also provides a tractable equation for the limiting eigenvalue density.

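Returning for a moment to the pure sample-covariance case, formula (III.18) is straightforward to evaluate numerically. As a sanity check (our own, not from the text), take C = I_N: then M is a white Wishart matrix with explicit Hilbert transform and density, and the overlap must equal 1 for every λ in the bulk, since the eigenvectors of M are then Haar-distributed.

```python
import numpy as np

def overlap_sample_cov(lam, c, q, rho_M, h_M):
    """Ledoit-Peche mean squared overlap for sample covariance matrices,
    Eq. (III.18)."""
    d1 = c * (1 - q) - lam + q * c * lam * h_M
    d2 = q * lam * c * np.pi * rho_M
    return q * c * lam / (d1 ** 2 + d2 ** 2)

# Sanity check with C = I_N: a Haar eigenvector has mean squared projection
# 1/N on any fixed direction, so the rescaled overlap must be 1.
q = 0.3
lp, lm = (1 + np.sqrt(q)) ** 2, (1 - np.sqrt(q)) ** 2
lam = 1.1                                  # any point inside [lm, lp]
rho = np.sqrt((lp - lam) * (lam - lm)) / (2 * np.pi * q * lam)   # MP density
h = (lam + q - 1) / (2 * q * lam)          # Re g_M on the support
print(overlap_sample_cov(lam, 1.0, q, rho, h))   # -> 1.0 up to rounding
```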
Before stating the result for the elliptical model (III.19), one has to be careful with the S-transform of B. Indeed, it is in fact more convenient to work with the "dual" matrix B* := √Σ X† C X √Σ in order to use the free multiplication formula. We then obtain the S-transform of B*, and finally express the Stieltjes transform g_{B*} as a function of g_B, simply by noticing that B* has the same eigenvalues as B plus an additional zero eigenvalue with multiplicity T − N. The final result reads, after elementary manipulations of the T-transform,

S_{B*}(x) = [(x + 1)/(x + q)] S_B(x/q) .   (III.20)

In a nutshell, applying the result Eq. (III.9) to the elliptical case leads to

Z(z) = [z / (1 − q + q z g_M(z))] S_Σ(q(z g_M(z) − 1)) ,   (III.21)

with Eqs. (III.8) and (III.10) unchanged. With Eq. (III.21), the general result (III.10) extends the standard Marčenko-Pastur formula to a time-dependent² framework. Note that the corresponding self-consistent equation for the Stieltjes transform g_M(z) has been obtained in previous studies [28], [29], [30], yet stated in a different form. One can easily specialize the result for the overlaps O(λ, c) to any λ and c as a function of the spectral measure of Σ. However, we do not find an expression as tractable as Eq. (III.18).

Finally, let us now show that Eq. (III.10) can be useful for practical purposes in order to construct non-trivial models. Suppose that C is an inverse-Wishart matrix (see Section IV-B for the definition of this law) with parameter κ = 0.2, and define Σ to be a Wishart matrix of size T × T with parameter q₀ = 0.6. We follow the same numerical procedure as in the free additive noise case. We compare in Fig. 2 our theoretical result Eq. (III.10) with empirical simulations, and the agreement is remarkable. The same conclusion holds for the overlap (see Fig. 3).

Fig. 2. Theoretical predictions of the density of states from Eq. (III.8) (red line) compared to simulated data when C is a 500 × 500 inverse-Wishart matrix (parameter κ = 0.2) and Σ is a white Wishart with q₀ = 0.6. The agreement of our theoretical estimate is excellent and differs strongly from the classical Marčenko-Pastur density (blue dotted curve).

Fig. 3. Rescaled overlap O(λ, c) as a function of c_j in the free multiplicative perturbation with N = 500. We chose C as an inverse-Wishart matrix with parameter κ = 0.2 and Σ a Wishart matrix with q₀ = 0.6. The black dotted points are computed using numerical simulations and the plain curves are the theoretical predictions Eq. (III.13). For i = 250 (resp. i = 400), we have c_i ≈ 0.37 (resp. c_i ≈ 1.48), and we see that the peak of the curve is in that region for both values of i.

Note that the results obtained in this section could be extended to the case where the diagonal matrix Σ is not positive definite, for applications in regression analysis (see for example [31]). They can also be used for studying Maronna's robust estimators of C, as the deterministic equivalent of such estimators at large N falls into the model (III.7) [32].

²In the sense that the volatility depends on the observation time t.

C. Information-Plus-Noise matrix

The last model we will treat here is the so-called 'Information-Plus-Noise' family of matrices [33]. In this model, we suppose that at each time t we observe a N-dimensional vector whose entries are given by R_it := A_it + σX_it for any i = 1, ..., N, where the signal is contained in the variable A_it, which is perturbed by an additive noise σX_it. We will assume that the entries of X = (X_it) ∈ R^{N×T} are i.i.d. Gaussian random variables with zero mean and unit variance. In the case where the number of samples T ≫ N, the empirical covariance matrix given by

M = (1/T) RR† = (1/T) (A + σX)(A + σX)†   (III.22)

is a good estimator of (1/T) AA† + σ² I_N.

This model is of particular interest in signal processing, in order to detect the number of sources and their direction of arrival [34]. Another example of application of this model comes from Finance, where one may want to estimate the integrated covariance matrix from high-frequency data.

As usual, in the case where T ∼ O(N), the empirical estimator cannot be fully trusted. The main assumption of the model is again the convergence of the empirical density of eigenvalues of C := T^{−1}AA† towards a limiting density ρ_C. The global law of the Information-Plus-Noise matrix reads, in a matrix sense:

⟨G_M(z)⟩ = [ (z Z(z) − σ²(1 − q)) I_N − Z(z) C ]^{−1} ,   (III.23)

where we have defined

Z(z) = 1 − q σ² g_M(z) .   (III.24)

This global law result (III.23) has already been obtained in a mathematical context by the authors of [36], for applications in wireless communications and signal processing. However, it is satisfactory to see that the replica method is able to reproduce this result. If we take the normalized trace of the above equation, we find that the Stieltjes transform reads

g_M(z) = ∫ ρ_C(c) dc / [ z(1 − qσ²g_M(z)) − σ²(1 − q) − c/(1 − qσ²g_M(z)) ] ,   (III.25)

which is the result obtained in [33]. As far as we understand, the authors of [36] did not discuss the overlaps in the present context. The final expression for O(λ, c) is quite cumbersome, but again completely explicit, and reads:

O(λ, c) = q σ² α_3(λ) (λ α_3(λ) + c) / [ χ_1(λ, c)² + χ_2(λ, c)² ] ,   (III.26)

with

α_3(λ) := ζ(λ)² + q² σ⁴ π² ρ_M(λ)² ,
χ_1(λ, c) := (λ α_3(λ) − c) ζ(λ) − α_3(λ) σ²(1 − q) ,
χ_2(λ, c) := (λ α_3(λ) + c) q σ² π ρ_M(λ) ,
ζ(λ) := 1 − q σ² h_M(λ) .

For the sake of completeness, we provide a numerical example for the overlap (III.26), where A is a Gaussian matrix of size N × T with q = 0.5, N = 500 and unit variance. The perturbation is a Gaussian noise of the same size with σ = 1. The procedure is the same as in the previous section, and we give a numerical example in Fig. 4.

Fig. 4. Rescaled overlap O(λ, c) as a function of c_j in the information-plus-noise model with N = 500. We chose A and X to be N × T Gaussian matrices with T = 2N. The black dotted points are computed using numerical simulations and the plain curves are the theoretical predictions Eq. (III.26). For i = 250 (resp. i = 400), we have c_i ≈ 0.83 (resp. c_i ≈ 1.66), and we see that the peak of the curve is in that region for both values of i.

IV. OPTIMAL ROTATIONAL INVARIANT ESTIMATOR

The above resolvent and overlap formulas for various models of random matrices are the central results of this study. Equipped with these results, we can now tackle the problem of the optimal RIE of the signal C. As we shall see, the high-dimensional limit N → ∞ allows one to reach some degree of universality. First, we rewrite the RIE (II.3) as:

ξ̂_i = (1/N) Σ_{j=1}^N O(λ_i, c_j) c_j ≈ (N → ∞) ∫ c ρ_C(c) O(λ_i, c) dc .

Quite remarkably, as we show below, the optimal RIE can be expressed, in the three above cases, as a function of the spectral measure of the observable (noisy) M only.

Let us however stress that the nonlinear "shrinkage" estimators ξ̂_i we obtain below are a priori valid in the support of M only. An interesting problem for future research would be to extend the results obtained here for the bulk eigenvalues to the spiked eigenvalues, also called outliers. Here, we assume that there are no spikes, and we perform the optimal RIE for each model.

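In practice, each shrinkage function below consumes only the Hilbert transform h_M and the density ρ_M, which can be read off the observed eigenvalues through the empirical Stieltjes transform evaluated slightly below the real axis (cf. Eq. (II.6)). The bandwidth choice η = N^{−1/2} in this sketch is our own convention; any η with N^{−1} ≪ η ≪ 1 would do.

```python
import numpy as np

def stieltjes(lam_eval, eigs, eta):
    """Empirical g_M(lam - i*eta): its real part estimates the Hilbert
    transform h_M, its imaginary part over pi the density rho_M (Eq. (II.6))."""
    return np.mean(1.0 / (lam_eval - 1j * eta - eigs))

# Example on a GOE matrix, where the limiting answers are the semicircle ones.
rng = np.random.default_rng(1)
N = 2000
W = rng.standard_normal((N, N))
eigs = np.linalg.eigvalsh((W + W.T) / np.sqrt(2 * N))   # semicircle on [-2, 2]

g = stieltjes(0.5, eigs, N ** -0.5)
h_M, rho_M = g.real, g.imag / np.pi
print(h_M, rho_M)   # limits: h = lam/2 = 0.25, rho = sqrt(4 - lam^2)/(2 pi) ≈ 0.308
```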
A. Free additive noise

We now specialize the RIE, beginning with the free additive noise case, for which the noisy measurement is given by

M = C + OBO†.

It is easy to see from Eqs. (II.8) and (III.2) that:

ξ̂_i = (1/(πρ_M(λ_i))) lim_{z→λ_i−i0⁺} Im ∫ c ρ_C(c)/(Z(z) − c) dc
    = (1/(Nπρ_M(λ_i))) lim_{z→λ_i−i0⁺} Im Tr[G_M(z)C],   (IV.1)

where Z(z) is given by Eq. (III.3). From Eq. (III.2) one also has Tr[G_M(z)C] = N(Z(z)g_M(z) − 1), and using Eqs. (III.3) and (III.5), we end up with:

lim_{z→λ−i0⁺} Im Tr[G_M(z)C] = Nπρ_M(λ)[λ − α₁(λ) − β₁(λ)h_M(λ)].

We therefore find the following optimal RIE nonlinear "shrinkage" function F₁:

ξ̂_i = F₁(λ_i);   F₁(λ) = λ − α₁(λ) − β₁(λ)h_M(λ),   (IV.2)

where α₁, β₁ are defined in Sect. III.A, Eq. (III.5). This result states that if we consider a model where the signal C is perturbed with an additive noise (that is free with respect to C), the optimal way to 'clean' the eigenvalues of M in order to get Ξ̂(M) is to keep the eigenvectors of M and apply the nonlinear shrinkage formula (IV.2). We see that the non-observable oracle estimator converges in the limit N → ∞ towards a deterministic function of the observable eigenvalues.

Deformed GOE: Let us consider the case where B is a GOE matrix. Using the definition of α₁ and β₁ given in Eq. (III.5), the nonlinear shrinkage function is given by

F₁(λ) = λ − 2σ² h_M(λ).   (IV.3)

Moreover, suppose that C is also a GOE matrix, so that M is also a GOE matrix with variance σ_M² = σ_C² + σ². As a consequence, the Hilbert transform of M can be computed straightforwardly from the Wigner semicircle law and we find

h_M(λ) = λ/(2σ_M²).

The optimal cleaning scheme to apply in this case is then given by:

F₁(λ) = λ σ_C²/(σ_C² + σ²),   (IV.4)

where one can see that the optimal cleaning is given by rescaling the empirical eigenvalues by the signal-to-noise ratio. This result is expected in the sense that we perturb a Gaussian signal by adding a Gaussian noise. We know in this case that the optimal estimator of the signal is given, element by element, by the Wiener filter [37], and this is exactly the result that we have obtained with (IV.4). We can also notice that the ESD of the cleaned matrix is narrower than the true one. Indeed, let us define the signal-to-noise ratio SNR = σ_C²/σ_M² ∈ [0, 1]; it is obvious from (IV.4) that Ξ̂(M) is a Wigner matrix with variance σ_Ξ² = σ_C² × SNR, which leads to

σ_M² > σ_C² > σ_C² × SNR,   (IV.5)

as it should be.

As a second example, we now consider a less trivial case and suppose that C is a white Wishart matrix with parameter q₀. For any q₀ > 0, it is well known that the Wishart matrix has nonnegative eigenvalues. However, we expect the noise coming from the GOE matrix to push some true eigenvalues towards the negative side of the real axis. In Fig. 5, we clearly observe this effect, and a good cleaning scheme should bring these negative eigenvalues back to positive values. In order to use Eq. (IV.3), we invoke once again the free addition formula to find the following equation for the Stieltjes transform of M:

0 = −q₀σ² g_M(z)³ + (σ² + q₀z) g_M(z)² + (1 − q₀ − z) g_M(z) + 1,

for any z = λ − iη with η → 0. It then suffices to take the real part of the Stieltjes transform g_M(z) that solves this equation³ to get the Hilbert transform.

³We take the solution which has a strictly nonnegative imaginary part.

In order to check formula Eq. (IV.2) using numerical simulations, we have generated a matrix M given by Eq. (III.1) with C a fixed white Wishart matrix with parameter q₀ and OBO† a GOE matrix with radius 1. As we know C exactly, we can compute numerically the oracle estimator as given in (II.3) for each sample. In Fig. 6, we see that our theoretical prediction in the large N limit compares very nicely with the mean values

of the empirical oracle estimator computed from the sample. We can also notice in Fig. 5 that the spectrum of the cleaned matrix (represented by the ESD in blue) is narrower than the standard Marčenko-Pastur density. This confirms the observation made in Sec. II-A.

Fig. 5. Eigenvalues of the noisy measurement M (black dotted line) compared to the true signal C drawn from a 500 × 500 Wishart matrix of parameter q₀ = 0.5 (red line). We have corrupted the signal by adding a GOE matrix with radius 1. The eigenvalue density of M allows negative values while the true one has only positive values. The blue line is the LSD of the optimally cleaned matrix. We clearly notice that the cleaned eigenvalues are all positive and its spectrum is narrower than the true one, while preserving the trace.

Fig. 6. Eigenvalues according to the optimal cleaning formula (IV.4) (red line) as a function of the observed noisy eigenvalues λ. The parameters are the same as in Fig. 5. We also provide a comparison against the naive eigenvalue substitution method (black line) and we see that the optimal cleaning scheme indeed narrows the spacing between eigenvalues.

B. Free multiplicative noise

By proceeding in the same way as in the additive case, we can derive formally a nonlinear shrinkage estimator that depends on the observed eigenvalues λ of M, defined by

M = √C OBO† √C.

Following the computations done above, we can find after some manipulations of the global law estimate (III.8):

Tr(G_M(z)C) = N(zg_M(z) − 1) S_B(zg_M(z) − 1).   (IV.6)

Using the analyticity of the S-transform, we define the functions γ_B and ω_B such that:

lim_{z→λ−i0⁺} S_B(zg_M(z) − 1) := γ_B(λ) + iπρ_M(λ)ω_B(λ),   (IV.7)

and as a result, the optimal RIE (or nonlinear shrinkage formula) for the free multiplicative noise model (III.7) reads:

ξ̂_i = F₂(λ_i);   F₂(λ) = λγ_B(λ) + (λh_M(λ) − 1)ω_B(λ),   (IV.8)

and this is the analog of the estimator (IV.2) in the multiplicative case.

Empirical covariance matrix: As a first application of the general result Eq. (IV.8), we reconsider the homogeneous Marčenko-Pastur setting where B = XX† with X defined as in Eq. (III.14). We trivially find from the definition of the S-transform (III.15) that (IV.7) yields in this case:

γ_B(λ) = (1 − q + qλh_M(λ)) / |1 − q + qλ lim_{z→λ−i0⁺} g_M(z)|²,
ω_B(λ) = −qλ / |1 − q + qλ lim_{z→λ−i0⁺} g_M(z)|².   (IV.9)

The nonlinear shrinkage function F₂ thus becomes:

F₂(λ) = λ / [(1 − q + qλh_M(λ))² + q²λ²π²ρ_M(λ)²],   (IV.10)

which is precisely the Ledoit-Péché estimator derived in [11]. Let us insist once again on the fact that this is the oracle estimator, yet it can be computed without the knowledge of C itself, only with its noisy version M. This "miracle" is of course only possible thanks to the N → ∞ limit, which allows the spectral properties of M and C to become deterministically related to one another.

Like in the additive case, we can give a pretty insightful application of the formula (IV.8) based on Bayesian statistics once again. Let us suppose that

C is a white inverse-Wishart matrix (i.e. C⁻¹ is a white Wishart matrix of parameter q). The eigenvalue distribution of C can then be computed exactly by performing the change of variable⁴ µ = ((1 − q)c)⁻¹ in the Marčenko-Pastur density function, to get

ρ_C(µ) = (κ/(πµ²)) √((µ₊ − µ)(µ − µ₋)),   µ± := (1/κ)[κ + 1 ± √(2κ + 1)],   (IV.11)

with q = (2κ + 1)⁻¹ and κ > 0 the hyper-parameter.

⁴The factor 1 − q is such that Tr C = N, which follows from the Marčenko-Pastur equation.

From there, one can compute the corresponding Stieltjes transform of C,

g_C(z) = [(1 + κ)z − κ ± κ√((z − µ₊)(z − µ₋))] / z²,   (IV.12)

and we can also compute the Stieltjes transform of the perturbed matrix M thanks to the Marčenko-Pastur equation (III.17):

g_M(z) = [z(1 + κ) − κ(1 − q) ± √ψ(z, κ)] / [z(z + 2qκ)],
ψ(z, κ) := (κ(1 − q) − z(1 + κ))² − z(z + 2qκ)(2κ + 1).   (IV.13)

The reason why we insist on this matrix ensemble is that it plays a special role in multivariate statistics, especially for estimating covariance matrices, because the famous linear shrinkage estimator [38] turns out to be exact in this case, in the sense that it corresponds to the RIE as defined in the introduction. We can recover this result within the present formalism. Indeed, the use of Eq. (IV.13) in the estimator (IV.10) leads, after some computations, to:

F₂(λ) = αλ + (1 − α),   with α := 1/(1 + 2qκ).   (IV.14)

This is the linear shrinkage estimator, which tells us to replace the noisy eigenvalues by a linear combination of the noisy eigenvalues and unity.

Elliptical Ensemble: In this subsection, we now consider the elliptical model (III.19) where the noise term is given by

B = XΣX†,

for an arbitrary diagonal T × T matrix Σ. One immediately sees that the optimal shrinkage formula (IV.8) will now depend on q = N/T and on the spectral measure of Σ, which prevents us from getting a tractable form as in the homogeneous Marčenko-Pastur case (IV.10). However, we may expect to find a nonlinear shrinkage formula even when the signal is given by an inverse-Wishart matrix. Indeed, the optimal cleaning formula is given by Eq. (IV.8), where we can compute numerically the S-transform of B using the free multiplication formula and Eq. (III.20) for any Σ. We illustrate this in Fig. 7, where the eigenvalues of Σ are generated following the Marčenko-Pastur density, and we see that the estimator (IV.8) clearly deviates from the linear shrinkage (IV.14).

Fig. 7. Eigenvalues according to the optimal cleaning formula (IV.8) (red line) as a function of the observed noisy eigenvalues λ when C is an inverse-Wishart matrix (with parameter κ = 0.2) and Σ = diag({σ_t²}_t) is distributed according to the Marčenko-Pastur density (with parameter q₀ = 0.5). We compare the result against numerical simulations (blue points) and the agreement is excellent. We furthermore compute the optimal cleaning scheme when Σ = I_T (black dotted line) and we see that Σ allows one to go from linear to nonlinear shrinkage.

As a second example, we consider the Student ensemble of correlation matrices [27], [30], which has encountered some success in quantitative finance because it allows one to construct non-Gaussian correlated data with a clear interpretation of the matrix Σ. We impose the {σ_t²}_{t=1}^T to be distributed according to an inverse-gamma distribution, i.e.

ρ_Σ(σ²) = (σ₀^µ / Γ(µ/2)) exp(−σ₀²/σ²) / (σ²)^{1+µ/2},   (IV.15)

where we set σ₀² := (µ − 2)/2 and µ > 2 in order to have ⟨σ²⟩ = 1. Within such a prescription, the sample data R_it := σ_t Y_it, with ⟨Y_it Y_jt′⟩ = C_ij δ_{t,t′}, is characterized by the multivariate Student distribution

of parameters µ and N [26]. From a financial perspective, this parametrization can be useful as a model where all individual stock returns are impacted by the same, time dependent scale factor σ_t that represents the "market volatility" (see [39] for a discussion of this assumption). From empirical studies, one possible choice that matches the data quite well is µ ≈ 3 − 5. The results above allow us to compute numerically either the LSD or the RIE for an arbitrary "true" signal C, thus generalizing the work done in [27].

We plot in Fig. 8 the RIE (IV.8) when the eigenvalues of Σ are generated following the inverse-gamma distribution with µ = 6 and C is still an inverse-Wishart matrix of parameter κ = 0.2. The numerical procedure is the same as for the previous example. The results we obtain are quite convincing, especially in the bulk. The noisy fluctuations for the largest eigenvalues in Fig. 8 can be explained by the difficulty of solving Eqs. (III.10) and (III.21) outside of the bulk, most notably due to the inversion of the T-transform of Σ. However, we see that these large eigenvalues still have the right behavior in the sense that they are shrunk downward compared to the "naive" substitution procedure.

Fig. 8. Eigenvalues according to the optimal cleaning formula (IV.8) as a function of the noisy observed eigenvalues λ when C is an inverse-Wishart matrix (with parameter κ = 0.2) and Σ = diag({σ_t²}_t) is generated according to the inverse-gamma distribution (IV.15) (with parameter µ = 6). We compare the RIE (IV.8) (red line) against numerical simulations (blue points) and the agreement is quite convincing, especially in the bulk. We compare it with the substitution procedure (black dotted line), which leads to a wider spectrum.

C. Information-Plus-Noise matrix

The derivation of the asymptotic RI estimator for the Information-Plus-Noise model is a bit more tedious compared to the previous cases, but one can follow the same route to find the desired result. We leave the complete derivation to the reader; the final formula for the corresponding shrinkage function F₃ reads:

F₃(λ) = ζ(λ)(λ − σ²(1 − q) − 2qσ²λh_M(λ)) + qσ²(1 − γ_M(λ)),   (IV.16)

where h_M(λ) is as before the Hilbert transform of the probability density ρ_M, ζ(λ) is given in (III.26), and the function γ_M is defined by

γ_M(λ) = h_M(λ)(λ − σ²(1 − q)) + qσ²λ(π²ρ_M(λ)² − h_M(λ)²).

If we consider the trivial case of zero noise (i.e. σ = 0), we have by definition that M = C, and we indeed see in Eq. (IV.16) that the optimal shrinkage formula becomes ξ̂_i = λ_i. The other limit that can be studied without much effort is when the sample size becomes much larger than the number of variables (i.e. q = 0). In this case, we know that M = C + σ²I_N by the law of large numbers. The optimal shrinkage (IV.16) gives in that case ξ̂_i = λ_i − σ², which was expected because the observation matrix M is simply a shift of the signal by a factor σ². Let us now reconsider the same numerical example as in Section (III-C) and apply the same procedure to test the RIE (IV.16) as in the last two sections. We clearly see in Fig. 9 that the agreement is remarkable even for a finite N.

Fig. 9. Eigenvalues according to the optimal cleaning formula (IV.16) as a function of the noisy observed eigenvalues λ with the same setting as in Sec. (III-C). We compare the estimator (IV.16) (red line) against numerical simulations coming from a single sample with N = 500 (blue points), and the agreement is excellent.

V. CONCLUSION AND OPEN PROBLEMS

As we recalled in the introduction, RMT is already at the heart of many significant contributions when it

comes to reconstructing a true signal matrix C of large dimension from a noisy measurement. In this paper, we revisited this statistical problem and considered the so-called oracle estimator, which is optimal with respect to the Euclidean norm. In particular, we have established the global resolvent law for three distinct ensembles of random matrices that embrace well-known models like the deformed Wigner or the sample covariance matrix. These results on the asymptotic convergence have two important applications: (i) they allow us to find exact results on the overlap between the eigenvectors of the signal matrix and the corrupted ones; (ii) most importantly, they lead to the "miracle" which allows the oracle estimator to be expressed without any knowledge of the signal C in the large N limit. This last observation, which generalizes the work of Ledoit and Péché [11], should be of particular interest in practical cases.

We emphasize that the proposed estimators are optimal (in the L₂ norm sense) when the dimension of the problem becomes very large and under no particular prior beliefs on the eigenvector structure of the true matrix C. However, it happens in practice that one could have a prior structure on the eigenvectors of C (factor models), and it would be interesting to see how we can rewrite our problem in a non-RI framework. This natural extension is left for future work. Note that the case of so-called "spiked" eigenvalues/eigenvectors in the context of empirical covariance matrices was recently solved in [40].

We also highlighted the interesting connection between our work and some famous results of Bayesian statistics. For instance, we found out that our results generalize the Wiener filter [37] (additive case) but also the linear shrinkage [38], [41] (multiplicative case), and both have encountered many successes in practical cases. Moreover, Bayesian theory has found several applications in modern statistical analysis, especially because the large amount of data may allow one to identify a pattern in the data which could be used as a prior. The different estimators we proposed are powerful in the sense that they depend only on observable quantities. Differently said, they can be used in many different contexts with no parameter to fit. This is in line with the recent work of [42] and [40], which provide efficient numerical methods to use these (limiting) results for real life applications, in the specific case of sample covariance matrices. One could also extend the above results to more realistic models of covariances that account for autocorrelation effects [28] or fat-tailed distributions [32].

Although our computations are based on the non-rigorous Replica method, the comparison between our theoretical formulas and empirical simulations demonstrates the robustness of each proposed estimator. Hence, one can certainly think of possible extensions of this work based on the same method. For instance, a natural extension of our free additive perturbation model would be given by

M = C + O_q B O_q†   (V.1)

where the law of the matrix O_q ∈ O(N) interpolates between the Haar measure on the Orthogonal group O(N) when q = 0 and a given measure on the permutation group when q = +∞. Differently said, M interpolates between the free and the classical addition. A natural prescription would be to imagine that O_q is the result of Biane's Brownian motion [43] on O(N). Another natural possibility is to assume that O_q is distributed according to the probability measure with the Harish-Chandra-Itzykson-Zuber (HCIZ) weight [44], [45]:

P_q(dO) ∝ exp(qN Tr COBO†) dO   (V.2)

which has the right limits when q → 0 (Haar measure on the orthogonal group) and q → ∞ (deterministic measure rearranging the spectrum of B in non-increasing order). Hence, considering a replica method for this specific case might give us access to the global law of the resolvent of M, which should enable us to express the correlation function of angular integrals, leading to an alternative expression of the Morozov-Shatashvili formula [46], [47] expressed in terms of the free energy of HCIZ integrals.

ACKNOWLEDGMENTS

We are indebted to M. Abeille and S. Péché for fruitful discussions. RA received funding from the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement nr. 258237 and thanks the Statslab in DPMMS, Cambridge for its hospitality.

REFERENCES

[1] E. P. Wigner, "On the statistical distribution of the widths and spacings of nuclear resonance levels," in Mathematical Proceedings of the Cambridge Philosophical Society, vol. 47, no. 04. Cambridge Univ Press, 1951, pp. 790–798.

[2] V. A. Marchenko and L. A. Pastur, "Distribution of eigenvalues for some sets of random matrices," Matematicheskii Sbornik, vol. 114, no. 4, pp. 507–536, 1967.
[3] A. M. Tulino and S. Verdú, "Random matrix theory and wireless communications," Communications and Information Theory, vol. 1, no. 1, pp. 1–182, 2004.
[4] I. M. Johnstone and D. M. Titterington, "Statistical challenges of high-dimensional data," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 367, no. 1906, pp. 4237–4253, 2009.
[5] J.-P. Bouchaud and M. Potters, "Financial applications of random matrix theory: a short review," in The Oxford Handbook of Random Matrix Theory. Oxford University Press, 2011.
[6] R. Couillet, M. Debbah et al., Random Matrix Methods for Wireless Communications. Cambridge University Press, Cambridge, MA, 2011.
[7] D. Paul and A. Aue, "Random matrix theory in statistics: a review," Journal of Statistical Planning and Inference, vol. 150, pp. 1–29, 2014.
[8] C. Stein, "Estimation of a covariance matrix," Rietz Lecture, 1975.
[9] A. Takemura, "An orthogonally invariant minimax estimator of the covariance matrix of a multivariate normal population," DTIC Document, Tech. Rep., 1983.
[10] Z. Bai, B. Miao, and G. Pan, "On asymptotics of eigenvectors of large sample covariance matrix," The Annals of Probability, vol. 35, no. 4, pp. 1532–1572, 2007.
[11] O. Ledoit and S. Péché, "Eigenvectors of some large sample covariance matrix ensembles," Probability Theory and Related Fields, vol. 151, no. 1-2, pp. 233–264, 2011.
[12] R. Allez and J.-P. Bouchaud, "Eigenvector dynamics under free addition," Random Matrices: Theory and Applications, vol. 03, no. 03, p. 1450010, 2014.
[13] R. Allez, J. Bun, and J.-P. Bouchaud, "The eigenvectors of Gaussian matrices with an external source," arXiv preprint arXiv:1412.7108, 2014.
[14] P. Biane, "Free probability for probabilists," Quantum Probability Communications, vol. 11, no. 11, pp. 55–71, 2003.
[15] M. Mézard, M. A. Virasoro, and G. Parisi, Spin Glass Theory and Beyond. World Scientific, 1987.
[16] F. Morone, F. Caltagirone, E. Harrison, and G. Parisi, "Replica theory and spin glasses," arXiv preprint arXiv:1409.2722, 2014.
[17] A. Knowles and J. Yin, "Anisotropic local laws for random matrices," arXiv preprint arXiv:1410.3516, 2014.
[18] N. El Karoui, "Spectrum estimation for large dimensional covariance matrices using random matrix theory," The Annals of Statistics, vol. 36, pp. 2757–2790, 2008.
[19] E. Brézin and A. Zee, "Correlation functions in disordered systems," Physical Review E, vol. 49, no. 4, p. 2588, 1994.
[20] D. Voiculescu, K. Dykema, and A. Nica, Free Random Variables. American Mathematical Soc., 1992, no. 1.
[21] A. Zee, "Law of addition in random matrix theory," Nuclear Physics B, vol. 474, no. 3, pp. 726–744, 1996.
[22] E. Brézin and A. Zee, "Universal relation between Green functions in random matrix theory," Nuclear Physics B, vol. 453, no. 3, pp. 531–551, 1995.
[23] E. Brézin, S. Hikami, and A. Zee, "Universal correlations for deterministic plus random Hamiltonians," Physical Review E, vol. 51, no. 6, p. 5442, 1995.
[24] J. Wishart, "The generalised product moment distribution in samples from a normal multivariate population," Biometrika, vol. 20A, pp. 32–52, 1928.
[25] Z. Burda, A. Görlich, A. Jarosz, and J. Jurkiewicz, "Signal and noise in correlation matrix," Physica A: Statistical Mechanics and its Applications, vol. 343, pp. 295–310, 2004.
[26] J.-P. Bouchaud and M. Potters, Theory of Financial Risk and Derivative Pricing: From Statistical Physics to Risk Management. Cambridge University Press, 2003.
[27] G. Biroli, J.-P. Bouchaud, and M. Potters, "The Student ensemble of correlation matrices: eigenvalue spectrum and Kullback-Leibler entropy," Acta Physica Polonica B, vol. 38, p. 4009, 2007.
[28] Z. Burda, J. Jurkiewicz, and B. Waclaw, "Spectral moments of correlated Wishart matrices," Physical Review E, vol. 71, no. 2, p. 026111, 2005.
[29] L. Zhang, "Spectral analysis of large dimensional random matrices," National University of Singapore PhD Thesis, 2006.
[30] N. El Karoui, "Concentration of measure and spectra of random matrices: applications to correlation matrices, elliptical distributions and beyond," The Annals of Applied Probability, vol. 19, no. 6, pp. 2362–2405, 2009.
[31] P.-A. Reigneron, R. Allez, and J.-P. Bouchaud, "Principal regression analysis and the index leverage effect," Physica A: Statistical Mechanics and its Applications, vol. 390, no. 17, pp. 3026–3035, 2011.
[32] R. Couillet, F. Pascal, and J. W. Silverstein, "The random matrix regime of Maronna's M-estimator with elliptically distributed samples," Journal of Multivariate Analysis, vol. 139, pp. 56–78, 2015.
[33] R. B. Dozier and J. W. Silverstein, "On the empirical distribution of eigenvalues of large dimensional information-plus-noise-type matrices," Journal of Multivariate Analysis, vol. 98, no. 4, pp. 678–694, 2007.
[34] R. O. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Transactions on Antennas and Propagation, vol. 34, no. 3, pp. 276–280, 1986.
[35] N. Xia and X. Zheng, "Integrated covariance matrix estimation for high-dimensional diffusion processes in the presence of microstructure noise," arXiv preprint arXiv:1409.2121, 2014.
[36] W. Hachem, P. Loubaton, J. Najim, and P. Vallet, "On bilinear forms based on the resolvent of large random matrices," Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, vol. 49, pp. 36–63, 2013.
[37] N. Wiener, Extrapolation, Interpolation, and Smoothing of Stationary Time Series. MIT Press, Cambridge, MA, 1949, vol. 2.
[38] L. Haff, "Empirical Bayes estimation of the multivariate normal covariance matrix," The Annals of Statistics, vol. 8, pp. 586–597, 1980.
[39] R. Chicheportiche and J.-P. Bouchaud, "The joint distribution of stock returns is not elliptical," International Journal of Theoretical and Applied Finance, vol. 15, no. 03, pp. 1–23, 2012.
[40] J. Bun, J.-P. Bouchaud, and M. Potters, "Cleaning correlation matrices," Risk Magazine, 2016.
[41] O. Ledoit and M. Wolf, "A well-conditioned estimator for large-dimensional covariance matrices," Journal of Multivariate Analysis, vol. 88, no. 2, pp. 365–411, 2004.
[42] ——, "Numerical implementation of the QuEST function," Department of Economics, University of Zurich, Tech. Rep., 2016.
[43] P. Biane, "Free Brownian motion, free stochastic calculus and random matrices," Free Probability Theory (Waterloo, ON, 1995), vol. 12, pp. 1–19, 1997.
[44] Harish-Chandra, "Differential operators on a semisimple Lie algebra," American Journal of Mathematics, vol. 79, pp. 87–120, 1957.

[45] C. Itzykson and J.-B. Zuber, "The planar approximation. II," Journal of Mathematical Physics, vol. 21, p. 411, 1980.
[46] A. Morozov, "Pair correlator in the Itzykson-Zuber integral," Modern Physics Letters A, vol. 7, no. 37, pp. 3503–3507, 1992.
[47] S. L. Shatashvili, "Correlation functions in the Itzykson-Zuber model," Communications in Mathematical Physics, vol. 154, no. 2, pp. 421–432, 1993.
[48] R. Speicher, "Free probability theory," in The Oxford Handbook of Random Matrix Theory. Oxford University Press, 2011.
[49] Z. Burda, "Free products of large random matrices - a short review of recent developments," Journal of Physics: Conference Series, vol. 473, p. 012002, 2013.
[50] G. Parisi, "A sequence of approximated solutions to the SK model for spin glasses," Journal of Physics A: Mathematical and General, vol. 13, no. 4, p. L115, 1980.
[51] E. Marinari, G. Parisi, and F. Ritort, "Replica field theory for deterministic models. II. A non-random spin glass with glassy behaviour," Journal of Physics A: Mathematical and General, vol. 27, no. 23, p. 7647, 1994.
[52] T. Tanaka, "Asymptotics of Harish-Chandra-Itzykson-Zuber integrals and free probability theory," Journal of Physics: Conference Series, vol. 95, p. 012002, 2008.
[53] A. Guionnet and M. Maïda, "A Fourier view on the R-transform and related asymptotics of spherical integrals," Journal of Functional Analysis, vol. 222, no. 2, pp. 435–490, 2005.
[54] B. Collins and P. Śniady, "Integration with respect to the Haar measure on unitary, orthogonal and symplectic group," Communications in Mathematical Physics, vol. 264, no. 3, pp. 773–795, 2006.

APPENDIX A
REMINDER ON TRANSFORMS IN RMT

We give in this first appendix a short reminder on the different transforms that are useful in the study of the statistics of eigenvalues in RMT, due to their link with free probability theory (see e.g. [48] or [49] for a review). The resolvent of M is defined by:

G_M(z) := (zI_N − M)⁻¹,   (A.1)

and the Stieltjes (or sometimes Cauchy) transform is the normalized trace of the resolvent:

g_M(z) := (1/N) Tr G_M(z) = (1/N) Σ_{k=1}^N 1/(z − λ_k) ∼_{N→∞} ∫ ρ_M(λ)/(z − λ) dλ.   (A.2)

The Stieltjes transform can be interpreted as the average law and is very convenient in order to describe the convergence of the eigenvalue density ρ_M. If we set z = λ − iη and take the limit η → 0, we have in the large N limit

g_M(λ − iη) = P.V. ∫ ρ_M(λ′)/(λ − λ′) dλ′ + iπρ_M(λ),

where the real part is often called the Hilbert transform h_M(λ) and the imaginary part leads to the eigenvalue density.

When we consider the case of adding two random matrices that are (asymptotically) free with each other, it is suitable to introduce the functional inverse of the Stieltjes transform, known as the Blue transform [21]:

B_M(g_M(z)) = z.   (A.3)

This allows us to define the so-called R-transform

R_M(z) := B_M(z) − 1/z,   (A.4)

which can be seen as the analogue in RMT of the logarithm of the Fourier transform for free additive convolution. More precisely, if A and B are two N × N independent invariant symmetric random matrices, then in the large N limit the spectral measure of M = A + B is given by

R_M(z) = R_A(z) + R_B(z),   (A.5)

known as the free addition formula [20]. In this case, we denote by ρ_{A⊞B} the eigenvalue density of M.

We can do the same for the free multiplicative convolution. In this case, we rather have to define the so-called T (or sometimes η [3]) transform, given by

T_M(z) = ∫ ρ_M(λ)λ/(z − λ) dλ ≡ zg_M(z) − 1,   (A.6)

which can be seen as the moment generating function of M. The S-transform of M is then defined as

S_M(z) := (z + 1)/(z T_M⁻¹(z)),   (A.7)

where T_M⁻¹(z) is the functional inverse of the T-transform. Before showing why the S-transform is important in RMT, one has to be careful about the notion of a product of free matrices. Indeed, if we reconsider the two N × N independent symmetric random matrices A and B, the product AB is in general not self-adjoint even if A and B are self-adjoint. However, if A is positive definite, then the product √A B √A makes sense and shares the same moments as the product AB. We can thus study the spectral measure of M = √A B √A in order to get the distribution ρ_{A⊠B} of the free multiplicative convolution. The result, first obtained in [20], reads:

S_{A⊠B}(z) := S_M(z) = S_A(z) S_B(z).   (A.8)

The S-transform is therefore the analogue of the Fourier transform for free multiplicative convolution.

APPENDIX B
DERIVATION OF THE GLOBAL LAW ESTIMATE

A. The replica method

Throughout the following, we shall use the abbreviation G(z) ≡ G_M(z) for simplicity. The starting point of our approach is to rewrite the entries of the resolvent G(z) := (G_ij(z)) ∈ R^{N×N} using the Gaussian integral representation of an inverse matrix:

G_ij(z) = [∫ η_i η_j exp{−(1/2) Σ_{k,l=1}^N η_k(zδ_kl − M_kl)η_l} Π_{k=1}^N dη_k] / [∫ exp{−(1/2) Σ_{k,l=1}^N η_k(zδ_kl − M_kl)η_l} Π_{k=1}^N dη_k].   (B.1)

We recall that the claim is that, for a complex z not too close to the real axis, we expect the resolvent to be self-averaging in the large N limit, that is to say independent of the specific realization of the matrix itself. Therefore we can study the resolvent G_M(z) through its ensemble average (denoted by ⟨·⟩ in the following), given by:

⟨G_ij(z)⟩ = ⟨(1/Z) ∫ η_i η_j exp{−(1/2) Σ_{k,l=1}^N η_k(zδ_kl − M_kl)η_l} Π_{k=1}^N dη_k⟩,   (B.2)

where Z is the partition function, i.e. the denominator in Eq. (B.1). The computation of the average value is highly non trivial in the general case. The replica method tells us that the expectation value can be handled thanks to the following identity:

⟨G_ij(z)⟩ = lim_{n→0} ⟨Z^{n−1} ∫ (Π_{k=1}^N dη_k) η_i η_j exp{−(1/2) Σ_{k,l=1}^N η_k(zδ_kl − M_kl)η_l}⟩
          = lim_{n→0} ∫ (Π_{α=1}^n Π_{k=1}^N dη_k^α) η_i¹ η_j¹ ⟨exp{−(1/2) Σ_{α=1}^n Σ_{k,l=1}^N η_k^α(zδ_kl − M_kl)η_l^α}⟩.   (B.3)

We have thus transformed our problem into the computation of n replicas of the initial system (B.1). Once we have computed the average value in (B.2), it suffices to perform an analytical continuation of the result to real values of n and finally take the limit n → 0. The main concern of this non-rigorous approach is that we assume that the analytical continuation can be done with only n different sets of points, which could lead to uncontrolled approximations in some cases [50]. We stress that the replica trick (B.3) allows one to investigate each entry i, j ∈ {1,...,N} of the resolvent G_M. The result we get is thus stronger than classical tools from free probability theory, which concern only the Stieltjes transform, i.e. the normalized trace of the resolvent.

B. Free additive noise

We consider a model of the form

M = C + OBO†,

where B is a fixed matrix with eigenvalues b₁ > b₂ > ··· > b_N with spectral density ρ_B, and O is a random matrix chosen in the Orthogonal group O(N) according to the Haar measure. Clearly, the noise term is invariant under rotation, so that we expect the resolvent of M to be in the same basis as C. We therefore set without loss of generality that C is diagonal. Throughout the following, we fix i, j ∈ {1,...,N} with possibly j ≠ i. In order to derive the global law estimate for the resolvent of the matrix M, we have to consider the ensemble average of the resolvent over the Haar measure of the O(N) group, which can be written as follows:

⟨G_ij(z)⟩ = ∫ (Π_{α=1}^n Π_{k=1}^N dη_k^α) η_i¹ η_j¹ e^{−(1/2) Σ_{α=1}^n Σ_{k=1}^N (η_k^α)²(z−c_k)} × ⟨e^{(1/2) Σ_{k,l=1}^N Σ_{α=1}^n η_k^α(OBO†)_kl η_l^α}⟩_O.   (B.4)

The evaluation of the latter equation can be done straightforwardly if we set the measure dO to be a flat measure constrained by the fact that OO† = I_N, or equivalently:

DO ∝ Π_{i,j=1}^N dO_ij Π_{i,j=1}^N δ(Σ_{k=1}^N O_ik O_jk − δ_{i,j}),

where δ(·) is the Dirac delta function and δ_{i,j} is the Kronecker delta. In the case where n is finite (and independent of N), one can notice that Eq. (B.4) is the Orthogonal low-rank version of the Harish-Chandra-Itzykson-Zuber integrals ([44], [45]). The result is known for all symmetry groups ([51], [52], or [53] for a more rigorous derivation), and reads for the rank-n case

∫ DO exp[(1/2) Tr(Σ_{α=1}^n η^α(η^α)†) OBO†]   (B.5)
  = exp[(N/2) Σ_{α=1}^n W_B((η^α)†η^α/N)],   (B.6)

with W_B the primitive of the R-transform of B. We emphasize that (B.5) can be obtained using a simple saddle-point calculation valid when n is finite, in the

The explicit structure of (B.5)–(B.6) simplifies the following computations considerably compared to standard approaches that use Weingarten calculus (see e.g. [54]).

By plugging (B.5) into (B.4), we see that we need to compute

\langle G_{ij}(z) \rangle = \int \prod_{\alpha=1}^{n} \prod_{k=1}^{N} d\eta_k^{\alpha} \, \eta_i^{1} \eta_j^{1} \, \exp\Big\{\sum_{\alpha=1}^{n} \Big[\frac{N}{2} W_B\Big(\frac{(\eta^{\alpha})^{\dagger} \eta^{\alpha}}{N}\Big) - \frac{1}{2} \sum_{k=1}^{N} (\eta_k^{\alpha})^2 (z - c_k)\Big]\Big\},

where we introduce the auxiliary variable p^{\alpha} = \frac{1}{N} (\eta^{\alpha})^{\dagger} \eta^{\alpha}, enforced by a Dirac delta function whose Fourier representation brings a Lagrange multiplier (renaming \zeta^{\alpha} = -2i\zeta^{\alpha}/N):

\langle G_{ij}(z) \rangle \propto \iiint \prod_{\alpha=1}^{n} dp^{\alpha} \prod_{\alpha=1}^{n} d\zeta^{\alpha} \prod_{\alpha=1}^{n} \prod_{k=1}^{N} d\eta_k^{\alpha} \, \eta_i^{1} \eta_j^{1} \, \exp\Big\{\frac{N}{2} \sum_{\alpha=1}^{n} \big[W_B(p^{\alpha}) - p^{\alpha} \zeta^{\alpha}\big]\Big\} \exp\Big\{-\frac{1}{2} \sum_{\alpha=1}^{n} \sum_{k=1}^{N} (\eta_k^{\alpha})^2 (z - \zeta^{\alpha} - c_k)\Big\}.

This additional constraint allows us to retrieve a Gaussian integral over the {\eta_k^{\alpha}}, which can be computed exactly. Ignoring normalization terms, we obtain

\langle G_{ij}(z) \rangle \propto \iint \frac{\delta_{i,j}}{z - \zeta^{1} - c_i} \exp\Big\{-\frac{Nn}{2} F_0(p^{\alpha}, \zeta^{\alpha})\Big\} \prod_{\alpha=1}^{n} dp^{\alpha} d\zeta^{\alpha},   (B.7)

where the 'free energy' F_0 is given by

F_0(p, \zeta) := \frac{1}{n} \sum_{\alpha=1}^{n} \Big[\frac{1}{N} \sum_{k=1}^{N} \log(z - \zeta^{\alpha} - c_k) - W_B(p^{\alpha}) + p^{\alpha} \zeta^{\alpha}\Big].   (B.8)

In the large N limit, the integral can be evaluated by considering the saddle-point of the free energy F_0, as the other term is obviously sub-leading. We now use the replica symmetric ansatz, which tells us that if the free energy is invariant under the action of a symmetry group (here, permutations of the replicas), then we expect a saddle-point which is also invariant. This implies that, at the saddle-point,

p^{\alpha} = p \quad \text{and} \quad \zeta^{\alpha} = \zeta, \quad \forall \alpha \in \{1, \dots, n\},   (B.9)

from which we find the following solution:

\zeta^{*} = R_B(p^{*}), \qquad p^{*} = g_C(z - \zeta^{*}).

The trick is to see that we can get rid of one variable by taking the normalized trace of the (average) resolvent (B.7), which gives the following relation for the Stieltjes transform: g_M(z) = g_C(z - R_B(p^{*})) = p^{*}, which is equivalent to

p^{*} = g_M(z) = g_C\big(z - R_B(g_M(z))\big),   (B.10)

and therefore

\zeta^{*} = R_B(g_M(z)).   (B.11)

In conclusion, by plugging (B.10) and (B.11) into (B.7) and then taking the limit n → 0, we obtain the global law estimate

\langle G_{ij}(z) \rangle = \big(Z(z) I_N - C\big)^{-1}_{ii} \, \delta_{i,j},   (B.12)

with

Z(z) = z - R_B(g_M(z)),   (B.13)

which gives exactly the matrix result stated in Eqs. (III.2) and (III.3), since the indexes i and j are arbitrary.

C. Free multiplicative noise

Let us set the measurement matrix M as

M = \sqrt{C} \, O B O^{\dagger} \sqrt{C},   (B.14)

where O is still a rotation matrix over the orthogonal group, C is a positive-definite matrix, and B is such that Tr B ≠ 0. Note that we can assume without loss of generality that C is diagonal, because the argument of the previous subsection still applies. As in the previous section, we may fix i, j ∈ {1,...,N} throughout the following without loss of generality, as the generalization to the matrix relation (III.8) will be immediate. Using the abbreviation G(z) ≡ G_M(z), the replica identity (B.3) allows us to write the entries of the resolvent of M as follows:

\langle G_{ij}(z) \rangle = \int \prod_{\alpha=1}^{n} \prod_{k=1}^{N} d\eta_k^{\alpha} \, \eta_i^{1} \eta_j^{1} \, e^{-\frac{z}{2} \sum_{\alpha=1}^{n} \sum_{k=1}^{N} (\eta_k^{\alpha})^2} \, \Big\langle e^{\frac{1}{2} \sum_{k,l=1}^{N} \sum_{\alpha=1}^{n} \eta_k^{\alpha} (\sqrt{C} O B O^{\dagger} \sqrt{C})_{kl} \eta_l^{\alpha}} \Big\rangle_{O}.   (B.15)

We notice that the matrix \sum_{\alpha=1}^{n} (\sqrt{C}\eta^{\alpha})(\sqrt{C}\eta^{\alpha})^{\dagger} is a symmetric rank-n matrix, with n finite and independent of N. Therefore, the expectation over the Haar measure still leads to a rank-n orthogonal version of the HCIZ integral, and the result reads [53]:

\Big\langle \exp\Big\{\frac{1}{2} \sum_{k,l=1}^{N} \sum_{\alpha=1}^{n} \eta_k^{\alpha} (\sqrt{C} O B O^{\dagger} \sqrt{C})_{kl} \eta_l^{\alpha}\Big\} \Big\rangle_{O} = \exp\Big[\frac{N}{2} \sum_{\alpha=1}^{n} W_B\Big(\frac{1}{N} \sum_{i=1}^{N} (\eta_i^{\alpha})^2 c_i\Big)\Big].   (B.16)

As in the free addition case, let us define the auxiliary variable p^{\alpha} = \frac{1}{N} \sum_{i=1}^{N} (\eta_i^{\alpha})^2 c_i, which we enforce by a Dirac delta function. Following the same steps as in the previous section, we obtain a Gaussian integral over the {\eta_k^{\alpha}} that we compute to eventually find

\langle G_{ij}(z) \rangle \propto \iint \frac{\delta_{i,j}}{z - \zeta^{1} c_i} \exp\Big\{-\frac{Nn}{2} F_0(p^{\alpha}, \zeta^{\alpha})\Big\} \prod_{\alpha=1}^{n} dp^{\alpha} d\zeta^{\alpha}.   (B.17)

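Before turning to the saddle-point analysis of this free energy, the free-additive result (B.10)–(B.13) derived above can be checked numerically: for Wigner noise the R-transform is simply R_B(g) = σ²g, so the fixed point (B.10) can be iterated directly and compared with a simulated spectrum. A minimal sketch, with all parameters (N, σ, z, the spectrum of C) chosen for illustration only:

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 1000, 0.8
c = np.concatenate([np.ones(N // 2), 3.0 * np.ones(N - N // 2)])  # spectrum of C
z = 2.0 + 0.5j

# One realization of M = C + O B O^T; for GOE noise, O B O^T ~ B in law,
# and the R-transform is R_B(g) = sigma^2 g.
X = rng.standard_normal((N, N))
B = sigma * (X + X.T) / np.sqrt(2 * N)
g_emp = np.mean(1.0 / (z - np.linalg.eigvalsh(np.diag(c) + B)))

# Damped fixed-point iteration of Eq. (B.10): g = g_C(z - R_B(g))
g = 1.0 / z
for _ in range(1000):
    g = 0.5 * g + 0.5 * np.mean(1.0 / (z - sigma**2 * g - c))

print(abs(g - g_emp))   # finite-N discrepancy; shrinks as N grows
```

The damping factor is a practical convenience: the undamped map already sends the lower half-plane into itself, so the iteration converges to the unique fixed point, but damping speeds this up near the spectrum.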
The free energy is now given by

F_0(p^{\alpha}, \zeta^{\alpha}) := \frac{1}{n} \sum_{\alpha=1}^{n} \Big[\frac{1}{N} \sum_{k=1}^{N} \log(z - \zeta^{\alpha} c_k) + \zeta^{\alpha} p^{\alpha} - W_B(p^{\alpha})\Big].   (B.18)

We now assume that the saddle-point solution can be computed using the replica symmetry ansatz (B.9), so that the free energy becomes

F_0(p^{\alpha}, \zeta^{\alpha}) \equiv F_0(p, \zeta) = \frac{1}{N} \sum_{k=1}^{N} \log(z - \zeta c_k) + \zeta p - W_B(p).   (B.19)

We first consider the derivative w.r.t. p, which leads to

\zeta^{*} = R_B(p^{*}).   (B.20)

The other derivative gives

p^{*} = \frac{1}{R_B(p^{*})} \, T_C\Big(\frac{z}{R_B(p^{*})}\Big).   (B.21)

Hence, we see that the resolvent is given, in the large N limit and the limit n → 0, by

\langle G_{ij}(z) \rangle = \frac{\delta_{i,j}}{z - R_B(p^{*}) c_i}.   (B.22)

It now remains to determine p^{*}, and to that end we can find a genuine simplification of this last expression by using the connexion with the free multiplicative convolution. Taking the normalized trace of the latter equation yields

z g_M(z) = Z g_C(Z), \quad \text{with} \quad Z = \frac{z}{R_B(p^{*})},   (B.23)

which can easily be rewritten using (A.6) as T_M(z) = T_C(Z). Next, let us define

\omega := T_M(z) = T_C(Z),   (B.24)

which implies from (B.21) that p^{*} = \omega / R_B(p^{*}). Then, we may rewrite Eq. (B.24) as

z T_M(z) = Z T_C(Z) R_B(p^{*}).

Plugging ω into this latter expression yields \omega T_M^{-1}(\omega) = \omega T_C^{-1}(\omega) R_B(p^{*}), and by the definition (A.7) of the S-transform we have

S_M(\omega) = \frac{S_C(\omega)}{R_B(p^{*})}.   (B.25)

In order to retrieve the desired result, we use the relation

\frac{1}{R_B(p^{*})} = S_B\big(p^{*} R_B(p^{*})\big),   (B.26)

which follows from the very definition of the S-transform of B. Recalling that p^{*} = \omega / R_B(p^{*}), we conclude from (B.20), (B.24) and (B.26) that

\frac{1}{\zeta^{*}} = \frac{1}{R_B(p^{*})} = S_B(T_M(z)).   (B.27)

Then, going back to (B.25), we see that the spectral density of M is given by the free multiplication

S_M(\omega) = S_C(\omega) S_B(\omega),   (B.28)

as expected, and this proves that the replica symmetry ansatz holds in this model. Finally, by plugging (B.27) into (B.22), we conclude that we can characterize the asymptotic global law of the resolvent of M by a deterministic quantity:

\langle G_{ij}(z) \rangle = \frac{\delta_{i,j}}{z - c_i / S_B(T_M(z))}.   (B.29)

All in all, the global law of the resolvent of the free multiplication M = C^{1/2} O B O^{\dagger} C^{1/2} is given by

\langle G_{ij}(z) \rangle = \delta_{i,j} \, z^{-1} Z(z) \big(Z(z) - c_i\big)^{-1}, \qquad Z(z) := z \, S_B(z g_M(z) - 1),   (B.30)

which is exactly the result stated in Eq. (III.8), since the indexes i, j were arbitrary throughout the computations.

D. Information-Plus-Noise matrix

The computation of the global law estimate for this model is quite similar to the sample covariance case [26]. The noisy measurement matrix is given by

M = \frac{1}{T} (A + \sigma X)(A + \sigma X)^{\dagger},

with A a fixed N × T matrix such that T^{-1} A A^{\dagger} = C. As we posit that X is a Gaussian matrix, we can work in the basis where C is once again diagonal. Moreover, we can in fact show that interchanging the integral and the average leads to the same result.
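The multiplicative global law (B.30) can be probed numerically as well. The sketch below assumes B is a white Wishart matrix, whose S-transform is the standard S_B(ω) = 1/(1 + q_b ω); since such a B is itself rotationally invariant, OBO† may be replaced by B in the simulation. All sizes and the point z are illustrative. Plugging the empirical Stieltjes transform into (B.30) and taking the normalized trace should return the same value, up to finite-N corrections:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T_b = 1000, 2000
q_b = N / T_b
c = np.concatenate([np.ones(N // 2), 3.0 * np.ones(N - N // 2)])  # spectrum of C

# B: white Wishart, S_B(w) = 1/(1 + q_b * w); O B O^T ~ B in law here
W = rng.standard_normal((N, T_b))
B = W @ W.T / T_b
sqrt_c = np.sqrt(c)
M = sqrt_c[:, None] * B * sqrt_c[None, :]     # sqrt(C) B sqrt(C), C diagonal

z = 4.0 + 0.5j
g_emp = np.mean(1.0 / (z - np.linalg.eigvalsh(M)))

# Eq. (B.30): Z(z) = z S_B(z g(z) - 1); the trace of the global law returns g
Z = z / (1.0 + q_b * (z * g_emp - 1.0))
g_pred = np.mean(Z / (z * (Z - c)))
print(abs(g_pred - g_emp))                    # small for large N
```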
We hence consider directly the annealed RB(p ) average and abbreviate G(z) ≡ GM(z) in order to lighten which can be easily rewrite using (A.6) as the notations. Let us compute the entries of G using (B.3)

TM(z) = TC(Z). with n = 1 and we find Z N ! Next, let us define Y − z PN η2 hGij (z)i = dηk ηiηj e 2 k=1 k × k=1 ω = TM(z) = TC(Z) (B.24)  1/2 1/2  1 PN PT σ2η c Y Y c η ∗ ∗ e 2T k,l=1 t=1 t k k k,t l,t l l , which implies from (B.21) that p = ω/RB(p ). Then, we may rewrite the Eq. (B.24) as N ∗ where we have defined Yt := At + σXt ∈ R which is zTM(z) = ZTC(Z)RB(p ) . still a Gaussian vector. One can readily compute the average It is trivial to see that plugging ω into this latter expression value over the measure of Y to find −1 −1 ∗   yields ωTM (ω) = ωTC (ω)RB(p ), and by the definition * N T +  1 X X 2 1/2 1/2  (A.7) of the S-transform, we have exp σ η c Y Y c η ∝ 2T t k k k,t l,t l l  k,l=1 t=1  SC(ω) SM(ω) = . (B.25) − T ( N ) ∗  2  2 2 −1 RB(p ) σ † 1  σ †  X 2 1 − η η exp 1 − η η c η , T 2 T k k In order to retrieve the desired result, we use the following k=1 relation 1 ∗ ∗ where we omitted all constants terms and used Sherman- ∗ = SB(p RB(p )), (B.26) RB(p ) Morrison formula in the exponential term. Rewriting p = 2 −1 † that follows from the very definition of the S-transform of σ T η η that we enforced by a Dirac delta function, we ∗ ∗ can therefore compute the integral over {ηk} to find B. But recalling that p = ω/RB(p ), we conclude from (B.20), (B.24) and (B.26) that Z Z δ  N  hG (z)i = dp dζ i,j exp − F (p, ζ) , ij 2 ci 0 1 z − qζσ − ∗ 2 = R (p∗) = S (T (z)). (B.27) 1−p ζ∗ B B M (B.31) 19

and we can once again compute the integral in the large N limit by performing the saddle-point of the following free energy:

F_0(p, \zeta) := \frac{1-q}{q} \log(1-p) - p\zeta + \frac{1}{N} \sum_{k=1}^{N} \log\big[(z - q\sigma^2\zeta)(1-p) - c_k\big].   (B.32)

The derivative with respect to ζ gives the following equation:

p^{*} = q\sigma^2 (1-p^{*}) \, g_C\big((z - q\zeta^{*}\sigma^2)(1-p^{*})\big),

and, by taking the normalized trace of the resolvent, we see that we have

p^{*} = q\sigma^2 g_M(z).   (B.33)

The other derivative leads to

\zeta^{*} = \frac{1-q}{q} + z g_M(z).   (B.34)

Therefore, plugging ζ^{*} and p^{*} into (B.31) and then sending N → ∞ leads to

\langle G_{ij}(z) \rangle = \delta_{i,j} \Big[\big(z Z(z) - \sigma^2(1-q)\big) I_N - Z(z)^{-1} C\Big]^{-1}_{ii},   (B.35)

with

Z(z) = 1 - q\sigma^2 g_M(z).   (B.36)

This is exactly the result announced in Eqs. (III.23) and (III.24), or in [36], showing that considering directly the annealed average is indeed correct in the limit N → ∞.
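The deterministic equivalent (B.35)–(B.36) lends itself to the same kind of numerical check as the previous two models. In the sketch below (the sizes, σ, the spectrum of C and the point z are again illustrative), the signal part A is built so that AA†/T = C holds exactly, and the normalized trace of (B.35), evaluated with the empirical g_M(z), is compared with the empirical value itself:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, sigma = 500, 1000, 0.5
q = N / T
c = np.concatenate([np.ones(N // 2), 3.0 * np.ones(N - N // 2)])  # spectrum of C

# Fixed signal part A with A A^T / T = C exactly (Q has orthonormal columns)
Q, _ = np.linalg.qr(rng.standard_normal((T, N)))
A = np.sqrt(T) * (np.sqrt(c)[:, None] * Q.T)

X = rng.standard_normal((N, T))
Y = A + sigma * X
M = Y @ Y.T / T                                   # information-plus-noise model

z = 3.0 + 0.5j
g_emp = np.mean(1.0 / (z - np.linalg.eigvalsh(M)))

# Eqs. (B.35)-(B.36): the trace of the deterministic equivalent returns g
Z = 1.0 - q * sigma**2 * g_emp
g_pred = np.mean(1.0 / (z * Z - sigma**2 * (1 - q) - c / Z))
print(abs(g_pred - g_emp))                        # small for large N
```

A quick consistency check of the formula itself: for C = 0 and σ = 1, (B.35)–(B.36) reduce to the Marčenko–Pastur self-consistent equation q z g² − (z + q − 1) g + 1 = 0, as they should.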

APPENDIX C
ABBREVIATIONS AND SYMBOLS

Symbol    Definition
RMT       Random Matrix Theory
ESD       Empirical Spectral Density
LSD       Limiting Spectral Density
RI        Rotational Invariance
RIE       Rotational Invariant Estimator
SNR       Signal to Noise Ratio
G_M(z)    Resolvent of M, see Eq. (A.1)
ρ_M(z)    LSD of M
g_M(z)    Stieltjes transform of ρ_M, see Eq. (A.2)
B_M(z)    Functional inverse of g_M(z), see Eq. (A.3)
R_M(z)    R-transform of M, see Eq. (A.4)
T_M(z)    Moment generating function of ρ_M, see Eq. (A.6)
S_M(z)    S-transform of M, see Eq. (A.7)