1 A Robust Approach to Minimum Liusha Yang∗, Romain Couillet†, Matthew R. McKay∗

Abstract—We study the design of portfolios under a minimum In the finance literature, several approaches have been risk criterion. The performance of the optimized portfolio relies proposed to get around the problem of the scarcity of samples. on the accuracy of the estimated matrix of the One approach is to impose some factor structure on the portfolio asset returns. For large portfolios, the number of available market returns is often of similar order to the number of the [8, 9], which reduces the of assets, so that the sample covariance matrix performs poorly as number of parameters to be estimated. A second approach is a covariance estimator. Additionally, financial market often to use as a covariance matrix estimator a weighted average of contain which, if not correctly handled, may further the sample covariance matrix and another estimator, such as corrupt the covariance estimation. We address these shortcom- the 1-factor covariance matrix or the identity matrix [10, 11]. ings by studying the performance of a hybrid covariance matrix estimator based on Tyler’s robust M-estimator and on Ledoit- A third approach is a nonlinear shrinkage estimation approach Wolf’s shrinkage estimator while assuming samples with heavy- [12], which modifies each eigenvalue of the SCM under the tailed distribution. Employing recent results from random matrix framework of Markowitz’s portfolio selection. A fourth ap- theory, we develop a consistent estimator of (a scaled version proach comprises eigenvalue clipping methods [13–15] whose of) the realized portfolio risk, which is minimized by optimizing underlying idea is to ‘clean’ the SCM by filtering noisy online the shrinkage intensity. Our portfolio optimization method is shown via simulations to outperform existing methods both for eigenvalues claimed to convey little valuable information. This synthetic and real market data. approach has also been employed recently in proposing novel vaccine design strategies for infectious diseases [16, 17], and its theoretical foundations have been examined in [18]. A I.INTRODUCTION fifth method employs a bootstrap-corrected estimator for the optimal return and its asset allocation, which reduces the error The theory of portfolio optimization is generally associated of over-prediction of the in-sample return by bootstrapping with the classical -variance optimization framework of [6]. In contrast to all of these methods (which aim to improve Markowitz [1]. The pitfalls of the mean-variance analysis are the covariance matrix estimate), alternative methods have also mainly related to its sensitivity to the estimation error of been proposed which directly impose various constraints on the and covariance matrix of the asset returns. It is the portfolio weights, such as a no-shortsale constraint [3], a nonetheless argued that estimates of the covariance matrix L norm constraint and a L norm constraint [19, 20]. By are more accurate than those of the expected returns [2, 3]. 1 2 bounding directly the portfolio-weight vector, it is demon- Thus, many studies concentrate on improving the performance strated that the estimation error can be reduced, particularly of the global minimum variance portfolio (GMVP), which when the portfolio size is large [19]. provides the lowest possible portfolio risk and involves only In addition to the problem of sample deficiency, it is often the covariance matrix estimate. the case that the return observations exhibit impulsiveness and The frequently used covariance estimator is the well-known local loss of stationarity [21], which is not addressed by the sample covariance matrix (SCM). However, covariance esti- methods mentioned above and leads to performance degrada- mates for portfolio optimization commonly involve few his- tion. The field of robust estimation [22–25] intends to deal with

arXiv:1503.08013v1 [q-fin.PM] 27 Mar 2015 torical observations of sometimes up to a thousand assets. this problem. However, classical robust covariance In such a case, the number of independent samples n may generally require n  N and do not perform well (or are be small compared to the covariance matrix dimension N, not even defined) when n ' N, making them unsuitable which suggests a poor performance of the SCM. The impact for many modern applications. Recent works [26–32] based of the estimation error on the out-of-sample performance of on random matrix theory have therefore considered robust the GMVP based on the SCM has already been analyzed in estimation in the n ' N regime. Two hybrid robust shrinkage [4–7]. covariance matrix estimates have been proposed in parallel in ∗L. Yang and M. R. McKay are with the Department of [29, 30] and in [31], respectively, both of which estimators Electronic and Computer Engineering, Hong Kong University of are built upon Tyler’s robust M-estimatior [23] and Ledoit- Science and Technology, Clear Water Bay, Kowloon, Hong Kong. Wolf’s shrinkage approach [11]. In [32], the authors show, by (email:[email protected];[email protected]). †R. Couillet is with the Laboratoire de Signaux et Systmes (L2S, means of random matrix theory, that in the large n, N regime UMR8506), CNRS-CentraleSupelec-Universit´ e´ Paris-Sud, 3 rue Joliot-Curie, and under the assumption of elliptical vector observations, the 91192 Gif-sur-Yvette, France (email:[email protected]). estimators in [29, 30] and [31] perform essentially the same Yang and McKay’s work is supported by the Hong Kong Research Grants Council under grant number 16206914. Couillet’s work is supported by the and can be analyzed thanks to their asymptotic closeness to ERC MORE EC–120133. well-known random matrix models. Therefore, in this paper, 2 we concentrate on the estimator studied in [29, 30], which we in particular the class of elliptical distributions, including the denote by Cˆ ST (ST standing for shrinkage Tyler). Namely, for multivariate , exponential distribution and N ˆ independent samples x1, ..., xn ∈ R with zero mean, CST the multivariate Student-T distribution as special cases. This is the unique solution to the fixed-point equation model for xt leads to tractable and adoptable design solutions n T and is a commonly used approximation of the impulsive nature 1 X xtxt Cˆ ST(ρ) = (1 − ρ) + ρIN of financial data [10]. n 1 T ˆ −1 N t=1 N xt CST(ρ)xt Let h ∈ R denote the portfolio selection, i.e., the vector of asset holdings in units of currency normalized by the total for any ρ ∈ (max{1 − N , 0}, 1]. It should be noted that the n outstanding wealth, satisfying hT 1 = 1. In this paper, short- shrinkage structure even allows n < N. N selling is allowed, and thus the portfolio weights may be This paper designs a novel minimum variance portfolio op- negative. Then the portfolio variance (or risk) over the invest- timization strategy based on Cˆ with a risk-minimizing (in- ST ment period of interest is defined as σ2(h) = E[|hT x |2] = stead of Frobenius norm minimizing [32]) shrinkage parameter t hT C h [1]. Accordingly, the GMVP selection problem can ρ. We first characterize the out-of-sample risk of the minimum N be formulated as the following quadratic optimization problem variance portfolio with plug-in ST for all ρ within a specified with a linear constraint: . This is done by analyzing the uniform convergence of 2 T the achieved realized risk on ρ in the double limit regime, min σ (h) s.t. h 1N = 1. h where N, n → ∞, with cN = N/n → c ∈ (0, ∞). We subsequently provide a consistent estimator of the realized This has the well-known solution portfolio risk (or, more precisely, a scaled version of it) that −1 CN 1N is defined only in terms of the observed returns. Based on hGMVP = 1T C−11 this, we obtain a risk-optimized ST covariance estimator by N N N optimizing online over ρ, and thus our optimized portfolio. and the corresponding portfolio risk is The proposed portfolio selection is shown to achieve supe- 1 2 (2) rior performance over the competing methods in [11, 31–33] σ (hGMVP) = T −1 . 1 C 1N in minimizing the realized portfolio risk under the GMVP N N framework for impulsive data. The outperformance of our Here, (2) represents the theoretical minimum portfolio risk portfolio optimization strategy compared to other methods is bound, attained upon knowing the covariance matrix CN demonstrated through Monte Carlo simulations with ellipti- exactly. In practice, CN is unknown, and instead we form an ˆ cally distributed samples, as well as with real data of historical estimate, denoted by CN . Thus, the GMVP selection based ˆ (daily) stock returns from Hong Kong’s Hang Seng Index on the plug-in estimator CN is given by (HSI). ˆ −1 ˆ CN 1N Notations: Boldface upper case letters denote matrices, hGMVP = . 1T Cˆ −11 boldface lower case letters denote column vectors, and stan- N N N T dard lower case letters denote scalars. (·) denotes transpose. The quality of hˆGMVP, implemented based on the in-sample IN denotes the N × N identity matrix and 1N denotes an N- covariance prediction Cˆ N , can be measured by its achieved dimensional vector with all entries equal to one. tr[·] denotes out-of-sample (or “realized”) portfolio risk: the matrix trace operator. and denote the real and complex R C T ˆ −1 ˆ −1 fields of dimension specified by a superscript. k·k denotes the 2   1N CN CN CN 1N σ hˆGMVP = . T ˆ −1 2 Euclidean norm for vectors and the spectral norm for matrices. (1N CN 1N ) The Dirac measure at point x is denoted by δ . The ordered x The goal is to construct a good estimator ˆ , and conse- eigenvalues of a symmetric matrix X of size N × N are CN quently ˆ , which minimizes this quantity. denoted by λ (X) ≤ ... ≤ λ (X), and the cardinality of a set hGMVP 1 N Note that, for the naive uniform diversification rule, C ⊂ is denoted by |C|. Letting U, V be symmetric N × N h = R 1 ˆ N 1N . This is equivalent to setting CN = IN , and yields the matrices, we write U < V if U − V is positive semidefinite. T 1N CN 1N realized portfolio risk: N 2 . Interestingly, this extremely II.DATA MODEL AND PROBLEM FORMULATION simple strategy has been shown in [34] to outperform numer- ous optimized models and will serve as a benchmark in our We consider a comprising x , ..., x ∈ N 1 n R work. logarithmic returns of N financial assets. We assume the xt to be independent and identically distributed (i.i.d.) with III.NOVEL COVARIANCE ESTIMATOR AND PORTFOLIO √ 1/2 xt = µ + τtCN yt, t = 1, 2, ..., n, (1) DESIGNFORMINIMIZINGRISK N A. Tyler’s robust M-estimator with linear shrinkage where µ ∈ R is the mean vector of the asset returns, τt N×N is a real, positive , CN ∈ R is positive Consider the ST covariance matrix estimate introduced in N definite and yt ∈ R is a zero mean unitarily invariant random [29, 30], built upon both Tyler’s M-estimate [23] and the 2 vector with norm kytk = N, independent of the τi’s. It is Ledoit-Wolf shrinkage estimator [11]. This estimator accounts assumed that µ and CN are time-invariant over the observation for the scarcity of samples, even allowing N > n, and exhibits 1/2 period. Denote zt = CN yt. The model (1) for xt embraces robustness to outliers or impulsive samples, e.g., elliptically 3 distributed data. It is defined as the unique solution to the The following theorem presents our first key result: a deter- following fixed-point equation for ρ ∈ (max{0, 1 − n/N}, 1]: ministic characterization of the asymptotic realized portfolio ˆ n T risk achieved with CST(ρ). 1 X x˜tx˜t Cˆ (ρ) = (1 − ρ) + ρI (3) − ST 1 T −1 N Theorem 1. Let Assumption 1 hold. For ε ∈ (0, min{1, c 1}), n x˜ Cˆ (ρ)x˜t t=1 N t ST −1 define Rε = [ε + max{0, 1 − c }, 1]. Then, as N, n → ∞, where x˜ = x − 1 Pn x . t t n i=1 i   Since with probability one the x are linearly independent, 2 ˆ 2 a.s. t sup σ hST(ρ) − σ¯ (ρ) −→ 0 (6) ρ∈R Cˆ ST(ρ) is almost surely defined for each N and n [29, ε Theorem III.1]. The corresponding GMVP selection is where −1 ˆ 2 ˆ CST(ρ)1N γ hST(ρ) = 2 × T ˆ −1 σ¯ (ρ) = 2 2 1N CST(ρ)1N γ − β(1 − ρ) −1 −1 with realized portfolio risk T 1−ρ  1−ρ  1N γ CN + ρIN CN γ CN + ρIN 1N   1T Cˆ −1(ρ)C Cˆ −1(ρ)1 . 2 ˆ N ST N ST N   −1 2 σ hST(ρ) = − . (4) T 1−ρ T ˆ 1 2 1 CN + ρIN 1N (1N CST(ρ)1N ) N γ Our goal is to optimize ρ online such that (4) is minimum. Proof: See Appendix B. However, since (4) involves CN which is unobservable, this equation cannot be optimized directly. Also note that the naive Remark 1. In Theorem 1, the set Rε excludes the region ˆ − approach of simply replacing CN with CST(ρ) in (4) would [0, ε + max{0, 1 − c 1}). As we handle the uniformity of the yield the so-called “in-sample risk”, which underestimates the convergence (6), the proof of Theorem 1 requires us to work on realized portfolio risk, leading to overly-optimistic investment sequences {ρ }∞ of ρ. It is however difficult to handle the n n=1 decisions [33]. We tackle this problem by obtaining a consis- limit σ2(hˆ (ρ )) − σ¯2(ρ ) for a sequence {ρ }∞ with tent estimator for a scaled version of the realized risk (4) as ST n n n n=1 ρ → 0. This follows from the same reasoning as that in [32] n and N go to infinity at the same rate. Contrary to classical n (see Equations (5) and (6) in Section 5.1 of [32] as well as asymptotic theory for time series analysis and mathematical Equation (12) in Appendix A where ρ → ρ > 0 is necessary statistics, which typically deal with the case of N fixed and n 0 to ensure e+ < 1). In the subsequent results, ρ ∈ R is also n → ∞, a double-limiting condition is of more relevance for ε required for the same reason. large portfolio problems, where n is comparable to N. To this end, following [33], we first derive a deterministic asymptotic Theorem 1 enables us to analyze the convergence of the equivalent of (4) and then provide a consistent estimator based realized portfolio risk in the regime of Assumption 1-a for on this. hˆST(ρ). In order to calibrate the shrinkage parameter ρ for optimum GMVP performance, only the available sample data B. Deterministic equivalent of the realized portfolio risk and certainly not the unknown CN can be used. This is the For our asymptotic analysis, we assume the following: objective of the subsequent section. Assumption 1. C. Consistent estimation of scaled realized portfolio risk a. As N, n → ∞, N/n = cN → c ∈ (0, ∞). b. The τt, t = 1, ..., n are i.i.d. τ1, ..., τn ≥ ξ a.s. for some Based on the observable data only, we can obtain an 1 ξ > 0 and E[τ1] < ∞. estimator of a scaled version of the realized portfolio risk, c. Denoting ≤ ≤ the ordered eigenvalues 2 ˆ R 0 < λ1 ... λN σ (hST(ρ))/κ, where we define κ , tν(dt). We begin with 1 PN of CN , as N, n → ∞, νN , N i=1 δλi satisfies the following lemma that provides a consistent estimator of νN → ν weakly with ν =6 δ0 almost everywhere. In γ, scaled by 1/κ, which is denoted as γˆsc (“sc” standing for addition, lim supN λN < ∞. “scaled”). We also introduce some further definitions, which will arise Lemma 1. Under the settings of Theorem 1, as N, n → ∞, in our asymptotic analysis. For ρ ∈ (max(0, 1−c−1), 1], define a.s. γ the unique positive solution to sup |γˆsc − γ/κ| −→ 0 (7) ρ∈Rε Z t 1 = ν(dt) (5) γρ + (1 − ρ)t where n T −1 and 1 1 X x˜ Cˆ (ρ)x˜t γˆ = t ST . Z cγ2t2 sc 1 − (1 − ρ)c n kx˜ k2 β = ν(dt). N t=1 t (γρ + (1 − ρ)t)2 Proof: See Appendix C. 1For technical reasons, made explicit in the appendix, we require the The following theorem provides a consistent estimator of 1 Pn q τi quantities z˜t = zt − zi to have controllable norms. This 2 ˆ 2 n i=1 τt σ (hST(ρ)), scaled by 1/κ, which is denoted as σˆsc(ρ). This imposes the constraint τt ≥ ξ > 0 which might be possible to relax at is our second main result. the expense of increased mathematical complexity. 4

  T ˆ −1 ˆ ˆ −1 γˆ 1N CST(ρ) CST(ρ) − ρIN CST(ρ)1N σˆ2 (ρ) = sc . (8) sc (1 − ρ) − (1 − ρ)2c T ˆ −1 2 N (1N CST(ρ)1N )

Theorem 2. Under the settings of Theorem 1, as N, n → ∞, 3) Cˆ C2, the oracle estimator in [31], which has the same structure as Cˆ C, but resorts to solving an approximate 2 1 2 ˆ a.s. sup σˆ (ρ) − σ (hST(ρ)) −→ 0 problem of minimizing the Frobenius distance to find the sc κ ρ∈Rε optimal shrinkage; 2 4) Cˆ , the Ledoit-Wolf shrinkage estimator in [11]; where σˆsc(ρ) is defined in (8) at the top of the page. LW 5) Cˆ R, the Rubio estimator proposed in [33], which has the Proof: See Appendix D. same structure as Cˆ LW, but with ρ calibrated based on Note that, since κ is independent of ρ, the same ρ minimizes the GMVP framework, as in the present article. 2 2 both σ (hˆST(ρ)) and σ (hˆST(ρ))/κ. The following corollary of Theorem 2 is of fundamental importance, which demonstrates that choosing ρ to minimize A. simulations 2 is asymptotically equivalent to minimizing the unob- The synthetic data are generated i.i.d. from a multivariate σˆsc(ρ) √ 2 ˆ p 2 2 servable σ (hST(ρ)). Student-T distribution, where τt = d/χd, d = 3 and χd is a Chi-square random variable with d degree of freedom. We Corollary 1. Denote ρo and ρ∗ the minimizers of σˆ2 (ρ) and sc set N = 200. The mean vector µ can be set arbitrarily since it σ2(hˆ (ρ)) over R , respectively. Then, under the settings of ST ε is discarded by the empirical mean, having no impact on the Theorem 1 and Theorem 2, as N, n → ∞, covariance estimates. We assume the population covariance

2  o  2  ∗  a.s. matrix C is based on a one-factor return structure [35]: σ hˆST (ρ ) − σ hˆST (ρ ) −→ 0. N T 2 CN = bb σ + Σ, where σ = 0.16. The factor loadings Proof: See Appendix E. b ∈ RN are evenly spread between 0.5 and 1.5. The residual With this result, the GMVP optimization problem is now variance matrix Σ ∈ RN×N is set to be diagonal and 2 2 reduced to the minimization of σˆsc(ρ), which can be done proportional to the identity matrix: Σ = σr I, where σr = 0.2. with a simple numerical search. Fig. 1 illustrates the performance of different estimation To summarise, given n past return observations of N assets, approaches in terms of the realized risk, averaged over 200 our proposed algorithm to construct a portfolio with minimal Monte Carlo simulations. The risk bound is computed by (2), risk can be described as follows: the theoretical minimum portfolio risk. Compared to other ˆ o methods, our proposed estimator CST achieves the smallest Algorithm 1 Proposed algorithm for GMVP optimization realized risk for both n ≤ N and n > N. We omit the realized 1) Compute the optimized shrinkage parameter via a numer- risks achieved by Cˆ N = IN as they are uniformly more than ical search five times as large as those achieved by the other methods. ˆ o o  2 It is interesting to compare the optimized ρ of CST and ρ = arg min σˆsc(ρ) . −1 Cˆ . They are both solutions of (3), but with ρ optimized ρ∈[ε+max{0,1−c },1] P N under different metrics: minimizing the risk and minimizing ˆ o 2) Form the risk-minimizing ST estimator CST, the unique the Frobenius distance, respectively. As shown in Fig. 2, the solution to optimal shrinkage parameter varies under different metrics. n T Interestingly, optimizing ρ under the risk function as opposed 1 X x˜tx˜ Cˆ o = (1 − ρo) t + ρoI . to the Frobenius distance leads to more aggressive shrinkage ST n 1 T ˆ o−1 N t=1 N x˜t CST x˜t (regularization) towards the identity matrix, thus producing a 3) Construct the optimized portfolio portfolio allocation which is closer to the uniform allocation policy. Cˆ o−11 hˆo = ST N . ST T ˆ o−1 1N CST 1N B. Real market data simulations We now investigate the out-of-sample portfolio performance of the different estimators with the real market data. We consider the stocks comprising the HSI. In particular, we IV. SIMULATION RESULTS use the dividend-adjusted daily closing prices downloaded We use both synthetic data and real market data to show from the Yahoo Finance database to obtain the continuously ˆ o the performance of CST compared to the following competing compounded (logarithmic) returns for the 45 constituents of methods: the HSI over L = 736 working days, from Jan. 3, 2011 to 1) Cˆ P, referred to as the Abramovich-Pascal estimate from Dec. 31, 2013 (excluding weekends and public holidays). [32]; As conventionally done in the financial literature, the out- 2) Cˆ C, referred to as the Chen estimate from [32]; of-sample evaluation is defined in terms of a rolling window 5

−3 x 10 5.5 o the different covariance estimators at the optimal estimation Cˆ ST window length of 300. We also test whether the pairwise 5 ˆ CP ˆ o differences between the portfolio variance achieved by CST Cˆ 4.5 C and each benchmark strategy are statistically different from ˆ CC2 zero. Since standard hypothesis tests are not valid when returns 4 ˆ CLW have tails heavier than the normal distribution or are correlated ˆ 3.5 CR across time, we follow the method described in [36] and

Realized risk bound 3 [37] and employ a studentized version of the circular block bootstrap [38] to do the test. The p-values are computed under 2.5 the null hypothesis that the portfolio variance achieved by

2 a particular benchmark covariance matrix estimator is equal 150 200 250 300 350 400 450 ˆ o n (N=200) to that achieved by CST. We use a block length b = 5 and base our reported p-values on 2000 bootstrap iterations. Fig. 1. The average realized portfolio risk of different covariance estimators We also compute the p-values when the block lengths are in the GMVP framework using synthetic data. b = 1 and b = 10. The interpretation of the results does not change for b = 1, b = 5, or b = 10. This implies that the 0.7 temporal correlations of the stock returns are weak and our ˆo CST i.i.d. assumption on the data is acceptable. In the row reporting 0.6 ˆ the risks, statistically significant outperformance of Cˆ o over CP ST 0.5 other methods is denoted by asterisks: ** denotes significance at the 0.01 level (p < 0.01) and * denotes significance at ρ 0.4 the 0.05 level (p < 0.05). It can be seen from Table I that the outperformance of our proposed method is statistically 0.3 significant, with p < 0.05 in all cases. Optimal As a further comparison to investigate the performance with 0.2 finer temporal resolution than that in Fig. 3, we carry out

0.1 a rolling-window analysis on the realized risks. Under the optimal estimation window length of 300, we obtain 436 out-

0 of-sample portfolio returns. From the start of the data, we use 150 200 250 300 350 400 450 n (N=200) the most recent 70 out-of-sample portfolio returns to compute the (annualized) standard deviations of the GMVP. Shifting ˆ o ˆ Fig. 2. The optimal shrinkage parameters of CST and CP in the synthetic one day forward, we repeat this procedure until the end of the data simulation. portfolio returns. For each covariance matrix estimator, this results in 367 risk measurements, which are then displayed in a time series plot, Fig. 4. We find that 69.2% of the time, method. At a particular day t, we use the previous n days (i.e., ˆ o CST achieves the lowest risk among all alternative methods. from t−n to t−1) as the training window for covariance esti- In addition, during the period of high volatility, that is, when ˆ mation and construct the portfolio selection hGMVP. We then 230 < t < 300, Cˆ o exhibits the greatest outperformance. ˆ ST use hGMVP to compute the portfolio returns in the following This justifies that our proposed GMVP optimization strategy 10 days. Next the window is shifted 10 days forward and is robust to market fluctuations and even possibly to outliers. the portfolio returns for another 10 days are computed. This procedure is repeated until the end of the data. The realized risk is computed conventionally as the annualized sample V. CONCLUSIONS standard of the corresponding GMVP returns. In our We have proposed a novel minimum-variance portfolio tests, different training window lengths are considered. optimization strategy based on a robust shrinkage covariance ˆ o Fig. 3 shows that the proposed CST achieves the smallest estimator with a shrinkage parameter calibrated to minimize realized risk. It outperforms the other methods over the entire the realized portfolio risk. Our strategy has been shown to be span of considered estimation windows. The realized risk robust to finite- effects as well as to the impulsive achieved by Cˆ N = IN is also omitted here because it is characteristics of the data. It has been demonstrated that our more than double as those achieved by the competing methods. approach outperforms more standard techniques in terms of When the estimation window is too long (e.g., greater than 320 the realized portfolio risk, both for synthetic data and for days), we observe that the performance starts to systematically real historical stock returns from Hong Kong’s HSI. Although degrade. This is presumably due to a lack of stationarity in the we base our analysis on the assumption of the absence of data over such long durations. This highlights an interesting the outliers, a recent study [39] has shown that the robust phenomenon worthy of further consideration, but a detailed covariance estimator Cˆ ST is resilient to arbitrary outliers study falls beyond the scope of the current contribution. by appropriately weighting good versus outlying data. This When the estimation window length is 300, the lowest risk is somewhat confirmed by our real data tests and is worth ˆ o is achieved by CST. Table I presents the risks obtained by investigating further. 6

TABLE I REALIZED PORTFOLIO RISKS (ANNUALIZED STANDARD DEVIATIONS) ANDTHECORRESPONDINGP-VALUES UNDER DIFFERENT COVARIANCE MATRIX ESTIMATORS.

ˆ o ˆ ˆ ˆ ˆ ˆ Dataset CST CP CC CC2 CLW CR IN Risk (n=300) 0.0419 0.0433∗∗ 0.0428∗ 0.0430∗ 0.0438∗∗ 0.0439∗∗ 0.1112∗∗ HSI p-value 1.000 0.009 0.028 0.041 0.001 0.001 0.000

0.058 portfolio optimization. These considerations are left to future o Cˆ 0.056 ST work. ˆ CP 0.054 ˆ APPENDIX A CC 0.052 ˆ PRELIMINARY RESULTS CC2 In this appendix we provide some preparatory lemmas that 0.05 Cˆ LW are essential for the proof of the main theorems. From now ˆ 0.048 CR on, for readability, we discard all unnecessary indices ρ when

0.046 no confusion is possible. We start by rewriting Cˆ ST in a more convenient form. Annualized 0.044 Denoting

0.042 50 100 150 200 250 300 350 400 450 1 r τ n (N=45) z˜t = zt − ZN , t = 1, 2, ..., n, n τt √ √ √ T Fig. 3. Realized portfolio risks achieved out-of-sample over 736 days of with τ = ( τ1, ..., τn) and ZN = [z1, ..., zn], after some HSI real market data (from Jan. 3, 2011 to Dec. 31, 2013) by a GMVP basic algebra, we obtain implemented using different covariance estimators. n T 1 X z˜tz˜t Cˆ ST = (1 − ρ) + ρIN . 0.065 n 1 T ˆ −1 t=1 N z˜t CSTz˜t Cˆ ST 0.06 T z˜tz˜ Cˆ P ˆ ˆ 1 t Denoting C(t) CST − (1 − ρ) 1 T −1 and using , n z˜ Cˆ z˜t ˆ N t ST 0.055 CC (A+rυυυυυT )−1υ = A−1υυ/(1+rυυT A−1υ) for positive definite ˆ 0.05 CC2 matrix A, vector υ and scalar r > 0, we have ˆ CLW 1 T −1 0.045 z˜ Cˆ z˜ 1 T −1 N t (t) t Cˆ R ˆ z˜t CSTz˜t = 1 T −1 z˜ Cˆ z˜t 0.04 N N t (t) 1 + (1 − ρ)cN 1 T ˆ −1 N z˜t CST z˜t 0.035 so that

Annualized standard deviation 0.03 1 T ˆ −1 1 T ˆ −1 z˜t CSTz˜t = (1 − (1 − ρ)cN ) z˜t C(t) z˜t (9) 0.025 N N 0 50 100 150 200 250 300 350 400 t (N=45,n=300) and we can rewrite Cˆ ST as n T 70 1 − ρ 1 X z˜tz˜ Fig. 4. Annualized rolling-window standard deviations of the most recent Cˆ = t + ρI . out-of-sample log returns for the GMVP based on different covariance matrix ST 1 − (1 − ρ)c n 1 T ˆ −1 N estimators. N t=1 N z˜t C(t) z˜t ˆ 1 T ˆ −1 For t ∈ {1, ..., n}, denote dt(ρ) , N z˜t C(t) z˜t. The ˆ Even though GMVP is not an optimal portfolio in terms of following lemma gives a deterministic approximation of dt(ρ), ˆ the Sharpe ratio or return maximization at a given level of risk, which later helps to show that, up to scaling, CST is somewhat Pn T many empirical studies [40, 41] has shown that an investment similar to t=1 ztzt , which is not observable. in the GMVP often yields better out-of-sample results than Lemma 2. Under the settings of Theorem 1, as N, n → ∞, other mean-variance portfolios, because of the poor estimates a.s. of the means of the asset returns. Therefore, besides the robust sup max dˆt(ρ) − γ(ρ) −→ 0. 1≤t≤n estimation of the covariance matrix, it would be of interest ρ∈Rε to take into account the robust estimation of the means and Proof: This is proved via a contradiction argument, which further develop robust approaches to the various portfolio follows along lines similar to the proof in [32]. The main dif- optimization strategies that involve both the estimates of the ference lies in that we re-center the sample data by subtracting means and the covariance matrix of the asset returns, such the sample mean, while the samples are assumed to be zero as Sharpe ratio maximization or Markowitz’s mean-variance mean in [32]. By subtracting the sample mean, the re-centered 7

√ ˆ a.s. data are correlated and some τt terms still remain in CST, Together with |me N,n − mN,n| −→ 0, we have which introduces new technical difficulties. + a.s. mN,n − m −→ 0. (13) Assuming (by relabelling) that dˆ1(ρ) ≤ ... ≤ dˆn(ρ), we e ˆ first prove that for any fixed ` > 0, dn(ρ) is bounded above It was demonstrated in [32] that m+ < 1. But this is in by γ(ρ) + ` for all large n, uniformly on ρ ∈ R . Since ε contradiction with me N,n ≥ 1. ⇒ −1 −1 U < V V < U , for positive definite matrices U and Now assume ρ0 = 1. According to [32], V, we obtain 1 −→a.s. − !−1 mN,n < 1. − n 1 T 1 + ` ˆ 1 T 1 ρ 1 X z˜tz˜t dn(ρ) = z˜n + ρIN z˜n N 1 − (1 − ρ)cN n dˆ (ρ) Then t=1 t n−1 !−1 1 a.s. − T mN,n − −→ 0, 1 T 1 ρ 1 X z˜tz˜t e 1 + ` ≤ z˜n + ρIN z˜n. N 1 − (1 − ρ)cN n dˆ (ρ) t=1 n 1 ≥ but 1+` < 1, again raising a contradiction with me N,n 1. Since z˜n =6 0 with probability one, this implies Hence, for all large n, there is no converging subsequence of ρ (and thus no subsequence of ρ ) for which dˆ (ρ ) > − !−1 n n n n − n 1 ˆ 1 T 1 ρ 1 X T ˆ γ(ρn) + ` infinitely often. Therefore dn(ρ) ≤ γ(ρ) + ` for all 1 ≤ z˜n z˜tz˜t + dn(ρ)ρIN z˜n. N 1 − (1 − ρ)cN n large n a.s., uniformly on ρ ∈ Rε. t=1 ˆ (10) The same reasoning holds for d1(ρ), which can be proved greater than γ(ρ) − ` for all large n uniformly on ρ ∈ Rε. ∞ Assume that there exists a sequence {ρn}n=1 over which Following the same arguments in [32], since ` > 0 is ˆ dn(ρn) > γ(ρn) + ` infinitely often, for some fixed ` > 0. arbitrary, from the ordering of the dˆt(ρ), we have proved that ∞ a.s. Since {ρn}n=1 is bounded, it has a limit point ρ0 ∈ Rε. Let ˆ − −→ supρ∈Rε max1≤t≤n dt(ρ) γ(ρ) 0.  us restrict ourselves to such a subsequence on which ρ → n The following three lemmas, Lemma 3, 4 and 5 show ρ0 > 0 and dˆn(ρn) > γ(ρn) + `. On this subsequence, from 1 T that functionals of Tyler’s estimator asymptotically perform (10), we have m ≥ 1, where m = z˜ M z˜ and n n e N,n e N,n N j fN,n n similar to functionals of 1 P z zT or 1 P y yT . They  −1 n t=1 t t n t=1 t t M = 1−ρn 1 Pn z˜ z˜T + (γ(ρ ) + `)ρ I . are used as an intermediate step for the development of the fN,n 1−(1−ρn)cN n t=1 t t n n N asymptotic deterministic equivalent of the risk function. Using The quadratic form me N,j is amenable to large random matrix analysis. The first step is to remove the effect of the existing results in [33], quoted as Lemma 6 in this paper, we 1 T can then obtain our main theorems. sample mean. Denote mN,j = N zj MN,jzj and MN,j =  −1 For notational convenience, we denote k = k(ρ) 1−ρn 1 , P T 1−ρ 1−(1−ρ )c n t=6 j ztzt + (γ(ρn)+ `)ρnIN . We have in n N 1−(1−ρ)c . Also recall that γ is the unique positive solution particular: R t N×N to 1 = γρ+(1−ρ)t ν(dt). Assuming AN ∈ R is a deterministic symmetric nonnegative definite matrix, for some Proposition 1. As N, n → ∞, ( a.s. [0, ∞) if lim infN λ1(AN ) > 0 max |mN,j − mN,j| −→ 0. (11) η > 0, define D = , and 1≤j≤n e [η, ∞) otherwise further define that, for ρ ∈ R and w ∈ D, Proof: See Appendix F. ε −1 n T ! Remark 2. In Proposition 1, Assumption 1-b is necessary; 1 X x˜tx˜t Re N = AN + (1 − ρ) + wIN that is, i.i.d. ≥ a.s. for some and ∞. n 1 T ˆ −1 τ1, ..., τn ξ ξ > 0 E[τ1] < t=1 N x˜t CSTx˜t It guarantees that for t = 1, ..., n, the norm of z˜ does not go −1 q t n ! off to infinity, recalling that z˜ = z − 1 Z τ . k 1 X T t t n N τt SeN = AN + z˜tz˜ + wIN γ n t t=1 a.s. By Proposition 1, we have |mN,n − mN,n| −→ 0. This n !−1 e k 1 X allows us to follow the proof in [32], which deals with data S = A + z zT + wI . N N γ n t t N with mean zero. t=1 To proceed, assume first ρ0 =6 1. From the proof of Theorem Then we introduce the following lemma. 1 in [32], N a.s. Lemma 3. Assume aN ∈ R is a deterministic vector with mN,n −→ 2 lim supN kaN k < ∞. Under the settings of Theorem 1, as 1 − (1 − ρ )c  1 − (1 − ρ )c N, n → ∞, 0 δ −(γ(ρ ) + `)ρ 0 m+, 1 − ρ 0 0 1 − ρ , 0 0 T T a.s. (12) sup aN Re N aN − aN SN aN −→ 0. (14) ρ∈Rε,w∈D where, for x < 0, δ(x) is the unique positive solution to Proof: Define Z t n δ(x) = ν(dt). ˆ k 1 X T − t BN (ρ) = z˜tz˜t , x + 1+cδ(x) γ(ρ) n t=1 8

n T 1 X x˜tx˜t For the fourth term, with Assumption 1-b, we have Dˆ N (ρ) = (1 − ρ) n 1 x˜T Cˆ −1x˜ √ √ t=1 N t ST t 1 n  1 τ   1 τ T n X T lim sup ZN √ ZN √ (a) 1 − ρ 1 X z˜tz˜ = t n n n τt n τt 1 − (1 − ρ)c n ˆ t=1 N t=1 dt(ρ) ( n ! n !) 1 T 1 X 1 X 1 ≤ lim sup ZN ZN τt <∞ a.s. where (a) uses the identity (9). Denote n n n n τ t=1 t=1 t T T ∆ a Re N aN − a SeN aN 1 Pn T , N N Therefore, lim sup z˜tz˜ < ∞ a.s. Together with   n n t=1 t (a) T ˆ ˆ Lemma 2, from (15), we have = aN Re N BN (ρ) − DN (ρ) SeN aN

sup Bˆ (ρ) − Dˆ (ρ) −→a.s. 0. (17) where (a) uses the identity that U−1 − V−1 = U−1(V − N N ρ∈Rε U)V−1 for invertible U, V matrices. We first prove that as a.s. N, n → ∞, sup |∆| −→ 0. Note that w ∈ D ensures lim sup sup RN < ρ∈Rε,w∈D N ρ∈Rε,w∈D e As N, n → ∞, using the definition of k, ∞ and lim sup sup SeN < ∞. N ρ∈Rε,w∈D ˆ ˆ 2 sup DN (ρ) − BN (ρ) Together with (17) and kaN k < ∞, we have ρ∈Rε n 2 ˆ sup |∆| ≤ kaN k sup Re N sup SeN 1 X T 1 − ρ dt(ρ) − γ(ρ) ρ∈Rε,w∈D ρ∈Rε,w∈D ρ∈Rε,w∈D ≤ z˜tz˜t sup max . n ∈R 1≤t≤n 1 − (1 − ρ)c ˆ t=1 ρ ε γ(ρ)dt(ρ) n n T k 1 X 1 X x˜tx˜ a.s. (15) × sup z˜ z˜T − (1 − ρ) t −→ 0. γ n t t n 1 T ˆ −1 ρ∈Rε x˜t CSTx˜t We will show that the RHS of (15) goes to 0 a.s. t=1 t=1 N Recalling Lemma 2, this follows upon showing that Following the same reasoning as that of Proposition 1, we 1 Pn T have lim supn n t=1 z˜tz˜t < ∞ a.s. To this end, recall that 1 q τ T T a.s. z˜t = zt − ZN . Then sup a S a − a S a −→ 0. n τt N eN N N N N ρ∈Rε,w∈D n a.s. 1 X T Together with sup |∆| −→ 0, we obtain (14). z˜tz˜ ρ∈Rε,w∈D  n t  −1 t=1 k 1 Pn T Define WN = AN + ytyt + wIN and n n √ T n √ γ n t=1     −1 1 X T 1 X 1 τ 1 X 1 τ T  T  − − n y˜ty˜ = ztzt zt ZN √ ZN √ zt − 1 P t , n n n τt n n τt WfN = AN + (1 ρ) n t=1 1 T 1/2 ˆ −1 1/2 + wIN t=1 t=1 t=1 N y˜t CN CST CN y˜t n  √   √ T where y˜ = y − 1 Pn y . We introduce the following 1 X 1 τ 1 τ t t n i=1 i + ZN √ ZN √ . (16) lemma. n n τ n τ t=1 t t Lemma 4. Under the settings of Lemma 3, as N, n → ∞, We will show that the spectral norm of each term on the RHS of (16) is bounded for all large n a.s. T T a.s. sup aN WfN aN − aN WN aN −→ 0. (18) First, from Assumption 1-c. and [42], we have ρ∈Rε,w∈D 1 Pn T lim supn n t=1 ztzt < ∞ a.s. Next, for the second and Proof: The derivation is similar to that of (14). the third terms on the RHS of (16), Lemma 5. Under the settings of Lemma 3 and assuming n  √ T n  √  1 X 1 τ 1 X 1 τ AN = 0, as N, n → ∞, √ √ T zt ZN = ZN zt n n τt n n τt n T t=1 t=1 T 1 X x˜ix˜i 2 T sup a (1 − ρ) Re aN n ! n ! N n 1 T ˆ −1 N 1 X yt 1 X √ ρ∈Rε,w∈[η,∞) x˜i CSTx˜i √ i=1 N = CN yt τt n n τt n t=1 t=1 T k 1 X T 2 a.s. −aN zizi SN aN −→ 0. (19) n !T n ! γ n 1 X yt 1 X √ i=1 ≤ kC k √ y τ . N n τ n t t Proof: We first notice that t=1 t t=1 n T By the law of large numbers, as N, n → ∞, T 1 X x˜ix˜i 2 a (1 − ρ) Re aN N 1 T ˆ −1 N !T ! n x˜ C x˜i n n √ i=1 N i ST 1 X yt 1 X a.s. " n # √ yt τt − c −→ 0. T n τ n d T 1 X x˜ix˜i t = − a (1 − ρ) Re N aN t=1 t=1 dw N n 1 T ˆ −1 i=1 N x˜i CSTx˜i According to Assumption 1-a and Assumption 1-c, we d   √ T T T n   = − a aN − wa Re N aN 1 P 1 √τ N N can see that lim supn t=1 zt ZN = dw n n τt    √  T d T 1 Pn 1 τ T = a Re N aN + w a Re N aN . lim sup ZN √ z ≤ c kCN k < ∞ a.s. N N n n t=1 n τt t dw 9

−2 n −1/2 T −1/2 ! Following similar steps, we also have 1 − 1 XC x˜tx˜ C = 1T C 1/2 (1−ρ) N t N +ρC−1 n N N 1 T −1 N N n x˜ Cˆ x˜t T k 1 X T 2 T d T  t=1 N t ST a ziz S aN = a SN aN + w a SN aN . N γ n i N N dw N × −1/2 i=1 CN 1N   d 1 T −1/2 −1/2 The almost sure convergence (14) in Lemma 3 when extended = − 1 C WfN C 1N . dw N N N N to w ∈ C is uniform on any bounded region of (C − R) ∪ D, w=0 and the functionals of w in (14) are analytic. Thus, by the −1/2 Setting A = ρC−1 and aT = √1 C 1 in (20), as N N N N N N Weierstrass convergence theorem [43], the following holds: −1 well as AN = ρCN in WN , yields d   d sup aT R a − aT S a  −→a.s. 0. N e N N N N N 1 −1/2 −1/2 ρ∈R ,w∈D dw dw T ε sup 1N CN WN CN 1N ρ∈Rε,w∈[0,∞) N Together with Lemma 3, we obtain (19).  1 T −1/2 −1/2 a.s. − 1 C JN C 1N −→ 0 (22) Lemma 6. [33, Appendix I-B] Under the settings of Lemma 3, N N N N as N, n → ∞, −1  −1  k   a.s. T T where JN = ρCN + (γ+e (w)k) + w IN and for sup aN SN aN − aN TN aN −→ 0 (20) N ρ∈Rε,w∈D each w ∈ [0, ∞), e˜N (w) is the unique positive solution to −1 the following equation:  k  where TN = AN + CN + wIN and, for (γ+eN (w)k) "   −1# each w ∈ D, e (w) is the unique positive solution to the 1 −1 k N e˜N (w) = tr ρCN + + w IN . following equation: n γ +e ˜N (w)k " # 1  k −1 Lemma 4 and the convergence (22) imply eN (w) = tr CN AN + CN + wIN . n (γ + eN (w)k) 1 T −1/2 −1/2 sup 1 C WfN C 1N N N N N Moreover, when A = 0, we have ρ∈Rε,w∈[0,∞) N n 1 T −1/2 −1/2 a.s. T k 1 X T 2 − 1N CN JN CN 1N −→ 0. sup a z z S a N N γ n t t N N ρ∈Rε,w∈[η,∞) t=1 Following the same reasoning as for the proof of Lemma 5,

kγ aT C T2 a the convergence of the derivatives holds such that at w = 0 N N N N a.s. − 2 −→ 0.  (γ + eN (w)k) k2 1 2 2 by the Weierstrass convergence theorem, 1 − 2 tr [C T ] (γ+eN (w)k) n N N   d 1 T −1/2 −1/2 sup 1N CN WfN CN 1N APPENDIX B ρ∈Rε dw N w=0   PROOFOF THEOREM 1 d 1 T −1/2 −1/2 a.s. − 1N CN JN CN 1N −→ 0. First consider the (re-scaled) realized portfolio risk: dw N w=0 1 T ˆ −1 ˆ −1 With Eq. (23) on the top of the next page and 2 N 1N CSTCN CST1N a.s. Nσ (hˆST) = . (21) |e˜ (0) − cγ| −→ 0 when N, n → ∞, we have 1 T ˆ −1 2 N ( N 1N CST1N ) 1 γ2 1 T ˆ −1 ˆ −1 − T For the denominator, Lemma 3 and Lemma 6 imply sup 1N CSTCN CST1N 2 2 1N ρ∈Rε N γ − β(1 − ρ) N −1 1 1  1 − ρ  −1 −1 sup 1T Cˆ −11 − 1T C + ρI 1 −→a.s. 0. 1 − ρ  1 − ρ  N N ST N N N γ N N N × C + ρI C C + ρI 1 ρ∈Rε N N N N N N γ γ Note that in this case, A = 0, a = √1 1 and w = ρ, a.s. N N N N −→ 0. (24) a.s. which leads to |eN (ρ) − cγ| −→ 0 when N, n → ∞. The derivation is based on Assumption 1-c and the definition of γ Equipped with the asymptotic equivalences of the denominator in (5). and numerator of (21), we prove Theorem 1. For the numerator, we rewrite it as APPENDIX C 1 T −1 −1 1 Cˆ C Cˆ 1 PROOFOF LEMMA 1 N N ST N ST N 1 T −1/2 −1/2 ˆ −1/2 −2 −1/2 First notice that = 1N CN (CN CSTCN ) CN 1N , N n T −1 1 1 1 X x˜ Cˆ (ρ)x˜t γˆ = t ST which, upon substituting the RHS of (3) for Cˆ and setting sc − − 1 2 ST 1 (1 ρ)cN N n N kx˜tk −1 t=1 AN = ρCN in WfN , yields n T −1 1 1 1 X z˜ Cˆ (ρ)z˜t = t ST . 1 T −1 −1 − − 1 2 1 Cˆ C Cˆ 1 1 (1 ρ)cN N n N kz˜tk N N ST N ST N t=1 10

−2 1 T 1/2  k  1/2   − 1 C CN + ρIN C 1N d 1 −1/2 −1/2 N N N γ+˜eN (0)k N 1T C J C 1 = . (23) N N N N N  −2 dw N w=0 k2 1 2  k   1 − 2 tr C CN + ρIN (γ+˜eN (0)k) n N γ+˜eN (0)k

It has been shown in Lemma 2 that With respect to the asymptotic equivalence in (24) and upon a.s. sup max ≤ ≤ dˆ (ρ) − γ(ρ) −→ 0, where substituting γˆ for γ/κ, we obtain ρ∈Rε 1 t n t sc dˆ (ρ) = 1 1 z˜T Cˆ −1(ρ)z˜ . Therefore, to prove 1 γˆ t 1−(1−ρ)cN N t ST t T −1 −1 sc sup 1 Cˆ CN Cˆ 1N − 1 2 a.s. κN N ST ST (1 − ρ) − (1 − ρ)2c the convergence (7), it is left to show that N kz˜tk −→ κ. ρ∈Rε N We start by writing   1 T ˆ −1 ˆ ˆ −1 a.s. × 1N CST CST − ρIN CST1N −→ 0. √ √ T N 1 1 1 τ 1 τ k k2 T − T − T z˜t = zt zt zt ZN √ √ ZN zt 1 2 ˆ N N n τt n τt Thus we obtain the consistent estimator of κ σ (hST(ρ)) in √ √ ! Theorem 2. 1 τ T τ T (25) + 2 √ ZN ZN √ . n τt τt APPENDIX E PROOFOF COROLLARY 1 Since the second and the third term on the RHS of (25) are the same, we analyze the second term only. It can be rewritten According to Theorem 2, we have as 2 1 2 ˆ a.s. √ √ sup σˆsc(ρ) − σ (hST(ρ)) −→ 0. (t) ρ∈R κ 1 1 T τ 1 1 T 1 1 T (t) τ ε zt ZN √ = zt zt + zt ZN √ N n τt N n N n τt Then, the following holds true (t) σˆ2 (ρo) ≤ σˆ2 (hˆ (ρ∗)) where ZN√is the matrix with the tth column removed from sc sc ST and τ (t) is the vector with the th entry removed 1 ∗ 1 ZN τ t √ σ2(hˆ (ρ )) ≤ σ2(hˆ (ρo)) √ 1 (t) τ (t) ST ST from τ . Since zt is independent of Z √ , we have κ κ √ n N τt (t) (t) a.s. 1 1 1 T √τ 1 T 2 o − 2 ˆ o ≤ 2 − 2 ˆ n zt ZN τ −→ 0. Together with n zt zt = O(1) a.s., we σˆsc(ρ ) σ (hST(ρ )) sup σˆsc(ρ) σ (hST(ρ)) t √ κ ρ∈R κ 1 1 T τ a.s. ε have z ZN √ −→ 0. a.s. N n t τt −→ 0 For the last term in (25), ∗ 1 ∗ 1 √ T √ 2 − 2 ˆ ≤ 2 − 2 ˆ 1 τ τ σˆsc(ρ ) σ (hST(ρ )) sup σˆsc(ρ) σ (hST(ρ)) T κ ρ∈Rε κ lim sup √ ZN ZN √ n2 τ τ a.s. n t t −→ 0. n 1 T 1 1 X These four relations together ensure that ≤ lim sup ZN ZN τi < ∞. n n τt n i=1 2 o 2 ∗ a.s. |σ (hˆST(ρ )) − σ (hˆST(ρ ))| −→ 0. √ T √ 1 1 τ T τ a.s. Thus, 2 √ Z Z √ −→ 0. N n τt N N τt Since the last three terms on the RHS of (25) vanish with APPENDIX F 1 2 2 a.s. PROOFOF PROPOSITION 1 large n, we obtain N z˜t − zt −→ 0. Therefore, as 1 2 a.s. 1 2 a.s. Denote N kztk −→ κ, we obtain N kz˜tk −→ κ and the convergence (7) unfolds. |me N,j − mN,j| = | − A − B + C − D|, where 1 ≤ j ≤ n and APPENDIX D √ T PROOFOF THEOREM 2 1 1 τ T A , √ ZN MN,j zj N n τj According to Lemma 5 and Lemma 6, in which we set √ τ A = 0, w = ρ and a = √1 1 , the convergence (26) at 1 T 1 τ N N N N B , zj MN,j ZN √ the top of the next page holds. N n τj a.s. √ T √ As |eN (ρ) − cγ| −→ 0 when N, n → ∞, we substitute cγ 1 1 τ T 1 τ C , √ ZN MN,j ZN √ for eN (ρ) in (26), giving N n τj n τj  √ √ 2 T 1   (1−ρ)−(1−ρ) c 1 T 1 − ρn 1 1 X τ τ T T ˆ −1 ˆ ˆ −1 D z˜ M Z √ √ Z sup 1 C CST −ρIN C 1N − , j fN,j  2 N N N ST ST N 1 − (1 − ρn)cN n n τt τt ρ∈Rε N γ t6=j 2  −2 √ √  γ 1 1 − ρ a.s. T × 1T C C + ρI 1 −→ 0. 1 X τ T 1 X τ T 2 2 N N N N N − zt √ ZN − ZN √ zt  MN,j z˜j . γ − β(1 − ρ) N γ n τt n τt t6=j t6=j 11

−2 1 T  k  1N CN CN + ρIN 1N 1 T −1   −1 kγ N (γ+eN (ρ)k) a.s. sup 1 Cˆ Cˆ ST − ρIN Cˆ 1N − −→ 0. (26) N ST ST 2  −2 ρ∈Rε N (γ + eN (ρ)k) k2 1 2  k  1 − 2 tr C CN + ρIN (γ+eN (ρ)k) n N (γ+eN (ρ)k)

We wish to prove that For term A1,

p Kp h i | − | ≤ (27) 1 1 p/2−1 T p E[ me N,j mN,j ] p A1 = · 4 E z MN,jzj N N p np j for some constants p ≥ 1, where K depends on p but not 1 1 p ≤ · 4p/2−1E ky k2pkM kpkkC kp . on N. Then, taking p ≥ 2, along with the union bound, the N p np j N,j N Markov inequality, and the Borel-Cantelli lemma, completes Note also that the proof of Proposition 1. 1 Using the Minkowski inequality, we have k kp ≤ MN,j p p , p (γ(ρn) + `) ρn E [|me N,j − mN,j| ] ≤ (E1/p[|A|p] + E1/p[|B|p] + E1/p[|C|p] + E1/p[|D|p])p. and by Minkowsky’s inequality,

p KpA N Thus, to prove (27), it is enough to show that E[|A| ] ≤ N p , 2p X 2 p p 2p p p KpB p KpC p KpD E[kyjk ] = E( yi,j) ≤ N E|y1,j| ≤ KpN . E[|B| ] ≤ p , E[|C| ] ≤ p and E[|D| ] ≤ p . N N N i=1 A. Moments of |A|, |B| and |C| Thus Start by noting that p p/2−1 p 1 KpkCN k 4 cN KpA1   A1 ≤ p ≤ . √ √ T p/2 N p (γ(ρ ) + `)pρ N p 1 1 τ τ n n E[|A|p]= E zT M Z ZT M z p p  j N,j N √ √ N N,j j  N n τj τj Now consider A2: " √ ! (a) 3p/2−3   p/2  (j) 2 T h p/2i 1 1 (j) τ A ≤ E z Q z − tr (Q ) + E |tr (Q )| = E zT M z + Z 2 p p j N j N N p p j N,j j N √ N n N n τj (b) K  p/4  ≤ p E Ep/4|z |4tr[Q QT ] √ T p/2 p p 1,j N N (j) ! N n (j) τ h i i × zj + Z √ MN,jzj  p T p/4 p/2 N τ  + E|z1,j | tr (QN QN ) + E |tr (QN )| j √ (j) Kp  p/4 4 p  (j) τ p √ T = E |z | + E|z | + 1 EkM Z k " (j) p p 1,j 1,j N,j N √ 1 1 τ N n τj = E zT M z zT + z Z(j)T p p j N,j j j j √ N K   N n τj ≤ p Ep/4|z |4 + E|z |p + 1 N pnp 1,j 1,j √ √ √ T ! p/2 √ p! (j) (j) (j) (j) (j) τ T (j) τ τ (j)T p p/2 (j) τ × E kMN,j k kCN k Y √ +ZN √ zj + ZN √ √ ZN MN,jzj  N τ τj τj τj j p/2  p/4 4 p  √ p KpK E |z1,j | + E|z1,j | + 1 (j) (a) 1 CN 1 τ ≤ E Y(j) ≤ A1 + A2 + A3 + A4, p p p N √ N (γ(ρn) + `) ρn n τj where (a) follows from Jensen’s inequality and p ≤ KpA2 /N , p/2−1 4 h p/2i √ √ T T T (j) τ (j) τ (j) (j)T A1 = p p E zj MN,jzjzj MN,jzj where QN = MN,jZ √ √ Z MN,j, (a) follows N n N τj τj N " √ 4p/2−1 τ (j) from Jensen’s inequality and (b) follows from the trace lemma A = E zT M Z(j) 2 p p j N,j N √ [44, Lemma B.26]. N n τj For A3, p/2 √ T  √ p (j) p/2−1 " # τ (j)T 4 τ (j) h pi × Z M z A ≤ E1/2 zT M Z(j) E1/2 zT M z . √ N N,j j  3 p p j N,j N √ j N,j j τj N n τj

 √ p/2 p p p/2−1 (j) As we have A1 ≤ KpA /N and A2 ≤ KpA /N , we 4 T (j) τ T 1 2 A3 = E z MN,jZ √ z MN,jzj p p p  j N j  obtain A3 ≤ KpA3 /N . Following the same reasoning as for N n τj p A3, we also get A4 ≤ KpA4 /N .  √ T p/2 Therefore, we obtain 4p/2−1 τ (j) A = E zT M z Z(j)T M z . 4 p p  j N,j j √ N N,j j  p p N n τj E[|A| ] ≤ A1 + A2 + A3 + A4 ≤ KpA/N . 12

√ √  T  T    1 1 τ 1 − ρ 1 1 1 1 √ √ T 1 τ − n √ √ T − D1 = zj ZN √ MfN,j 2 ZN τ τ ZN MN,j zj ZN √ N n τj 1 − (1 − ρn)cN n n τ τ n τj √ √  T  √    1 1 τ 1 − ρn 1 1 1 T T 1 τ D2 = zj − ZN √ MfN,j − ZN √ τ ZN MN,j zj − ZN √ N n τj 1 − (1 − ρn)cN n n τ n τj √ √  T  √ T    1 1 τ 1 − ρn 1 1 1 T 1 τ D3 = zj − ZN √ MfN,j − ZN τ √ ZN MN,j zj − ZN √ . N n τj 1 − (1 − ρn)cN n n τ n τj

T p p 1 1 1 √ The same reasoning holds for E[|B| ], giving E[|B| ] ≤ D = zT M √ √ Z τ p 1c j fN,j 2 N KpB/N . As for the moments of |C|, it is similar to how we n τ τ √ T dealt with A1:  1 τ  1 1 T 1 √ D = Z M √ √ Z ττ. " √ 2p # 1d N √ fN,j 2 N n τj n τ τ 1 1 τ p E[|C|p] ≤ E Y √ kC kpkM k p N N N,j 2p 2p N n τj Our aim is to prove that E[|D1b| ] ≤ Kpb, E[|D1c| ] ≤ Kpc 2p p " √ 2p# and E[|D | ] ≤ K . Following the analysis of E[|A| ] and kC kp 1 τ 1d pd ≤ N E[|C|p], we obtain E[|D |2p] ≤ K . For E[|D |2p], we p p p E YN √ 1b pb 1d N (γ(ρn) + `) ρn n τj have K ≤ pC 2p p . E[|D1d| ] N " # 1 1 4p 2p 1 √ 4p ≤ √ E M kC k2p Y τ B. Moments of |D| 2p p fN,j N N n τj τ n 1 1 1 We denote √ = ( √ , ..., √ ) and rewrite D as 2p 4p 4p τ τ1 τn kCN k 1 1 √ ≤ √ E YN τ 2p p 2p 2p D = D1 + D2 + D3 n τj (γ(ρn) + `) ρn τ n

≤ Kpd. with D1, D2 and D3 at the top of the page. We aim to p p prove E[|D| ] ≤ KpD/N , which is achieved by prov- Let us now establish the inequality for D1c. We can see | |p ≤ p | |p ≤ p ing E[ D1 ] KpD1 /N , E[ D2 ] KpD2 /N and that zj is not independent of MfN,j, thus we cannot follow p p E[|D3| ] ≤ KpD3 /N . the same procedure as for our analysis of A to determine the Let’s first analyze with the analysis of and 2p D1 D2 D3 order of E[|D1c| ]. Instead, we divide MfN,j into two parts, following similarly. We obtain one that is independent of zj and the other the remainder. P T p We first write 6 z˜tz˜t = E + F, where E and F are E[|D1| ] t=j defined at the bottom of the page. Note that E is independent (a)  p 1 1 − ρn ≤ 1/2 | |2p 1/2 | |2p of zj and F is not. Then D1c can be rewritten as (29) at the p E [ D1a ]E [ D1b ] N 1 − (1 − ρn)cN top of the next page. Using Jensen’s inequality, (b) 1  1 − ρ p ≤ n | |2p ≤ 2p−1 | |2p | |2p  p E[ D1c ] 2 E[ G ] + E[ H ] , N 1 − (1 − ρn)cN 1/2p 2p 1/2p 2p p 1/2 2p where G and H are the two terms on the RHS of (29). Next × (E [|D1c| ] + E [|D1d| ]) E [|D1b| ] (28) we can use the same technique as used in Appendix F-A to 2p 2p where (a) follows from the Cauchy-Schwarz inequality, (b) prove that E[|G| ] ≤ KpG and E[|H| ] ≤ KpH . Therefore, 2p follows from Minkowsky’s inequality, and we obtain E[|D1c| ] ≤ Kpc. 2p √ T Thus far, we have proven that E[|D | ] ≤ K ,  1 τ  1 1 T 1 √ 1b pb − √ √ E[|D |2p] ≤ K , and E[|D |2p] ≤ K . Coming back to D1a = zj ZN √ MfN,j 2 ZN τ 1c pc 1d pd n τj n τ τ p p √ (28), we obtain E[|D1| ] ≤ KpD1 /N . √   p 1 T T 1 τ Following similar arguments to our analysis of E[|D1| ], D1b = τ ZN MN,j zj − ZN √ p p p n n τj we can also obtain E[|D2| ] ≤ KpD2 /N and E[|D3| ] ≤

T T T  T      T (j) (j) 1 (j) 1 (j)p (j) 1 (j)p (j) (j) 1 1 1 1 (j)p (j) (j)p (j) E=ZN ZN − ZN √ ZN τ − ZN τ ZN √ + √ √ ZN τ ZN τ n τ (j) n τ n2 τ (j) τ (j)  T  T  T 1 (j) 1 √ T 1 √ (j) 1 τj 1 1 T 1 1 1 (j)p (j)√ T F = − ZN √ τjzj − τjzj ZN √ + √ √ zjzj + √ √ ZN τ τjzj n τ (j) n τ (j) n2 τ (j) τ (j) n2 τ (j) τ (j)  T  T 1 1 1 √ (j)p (j) + √ √ τjzj ZN τ . n2 τ (j) τ (j) 13

 1 − ρ 1 −1 1 1 T 1 √  1 − ρ 1  n √ √ τ n D1c =zj E + (γ(ρn) + `)ρnIN 2 ZN τ + zj MfN,j − F 1 − (1 − ρn)cN n n τ τ 1 − (1 − ρn)cN n  1 − ρ 1 −1 1 1 T 1 √ n √ √ τ × E + (γ(ρn)+`)ρnIN 2 ZN ττ. (29) 1−(1 − ρn)cN n n τ τ

p KpD3 /N . As D = D1+D2+D3, by Minkowsky’s inequality, [21] R. Cont, “Empirical properties of asset returns: Stylized facts and we obtain E[|D|p] ≤ K /N p. statistical issues,” Quant. Finance, vol. 1, no. 2, pp. 223–236, 2001. pD [22] P. J. Huber, “Robust estimation of a ,” Ann. Math. Statist., vol. 35, no. 1, pp. 73–101, 1964. REFERENCES [23] D. E. Tyler, “A distribution-free M-estimator of multivariate scatter,” Ann. Statist., vol. 15, no. 1, pp. 234–251, 1987. [1] H. Markowitz, “Portfolio selection,” J. Finance, vol. 7, no. 1, pp. 77–91, [24] R. A. Maronna, “Robust M-estimators of multivariate location and Mar. 1952. scatter,” Ann. Statist., vol. 4, no. 1, pp. 51–67, 1976. [2] R. C. Merton, “On estimating the expected return on the market: An [25] J. T. Kent and D. E. Tyler, “Redescending M-estimates of multivariate exploratory investigation,” J. Finan. Econ., vol. 8, no. 4, pp. 323–361, location and scatter,” Ann. Statist., vol. 19, no. 4, pp. 2102–2119, 1991. Dec. 1980. [26] R. Couillet, F. Pascal, and J. W. Silverstein, “Robust estimates of [3] R. Jagannathan and T. Ma, “Risk reduction in large portfolios: Why covariance matrices in the large dimensional regime,” IEEE Trans. Inf. imposing the wrong constraints helps,” J. Finance, vol. 58, no. 4, pp. Theory, vol. 60, no. 11, pp. 7269–7278, Sept. 2014. 1651–1684, Aug. 2003. [27] ——, “The random matrix regime of Maronna’s M-estimator with [4] N. El Karoui et al., “High-dimensionality effects in the Markowitz elliptically distributed samples,” arXiv preprint arXiv:1311.7034, 2013. problem and other quadratic programs with linear constraints: Risk [28] T. Zhang, X. Cheng, and A. Singer, “Marchenko-Pastur law for Tyler’s underestimation,” Ann. Statist., vol. 38, no. 6, pp. 3487–3566, 2010. and Maronna’s M-estimators,” arXiv preprint arXiv:1401.3424, 2014. [5] N. E. Karoui, “On the realized risk of high-dimensional Markowitz [29] F. Pascal, Y. Chitour, and Y. Quek, “Generalized robust shrinkage portfolios,” SIAM J. Finan. Math., vol. 4, no. 1, pp. 737–783, 2013. estimator and its application to STAP detection problem,” IEEE Trans. [6] Z. Bai, H. Liu, and W.-K. Wong, “Enhancement of the applicability of Signal Process., vol. 62, no. 21, pp. 5640–5651, Nov. 2014. Markowitz’s portfolio optimization by utilizing random matrix theory,” [30] Y. Abramovich and N. K. Spencer, “Diagonally loaded normalised Math. Finance, vol. 19, no. 4, pp. 639–667, 2009. sample matrix inversion (LNSMI) for -resistant adaptive filtering,” [7] J.-P. Bouchaud and M. Potters, “Financial applications of random matrix in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), theory: A short review,” in Chapter 40 of The Oxford Handbook of vol. 3, Honolulu, HI, Apr. 2007, pp. 1105–1108. Random Matrix Theory, G. Akemann, J. Baik, and P. D. Francesco, [31] Y. Chen, A. Wiesel, and A. O. Hero, “Robust shrinkage estimation of Eds. Oxford, U.K.: Oxford Univ. Press, 2011. high-dimensional covariance matrices,” IEEE Trans. Signal Process., [8] L. K. Chan, J. Karceski, and J. Lakonishok, “On portfolio optimization: vol. 59, no. 9, pp. 4097–4107, Sept. 2011. Forecasting and choosing the risk model,” Rev. Finan. [32] R. Couillet and M. R. McKay, “Large dimensional analysis and op- Studies, vol. 12, no. 5, pp. 937–974, 1999. timization of robust shrinkage covariance matrix estimators,” J. Mult. [9] J. Fan, Y. Fan, and J. Lv, “High dimensional covariance matrix estima- Anal., vol. 131, pp. 99–120, 2014. tion using a factor model,” J. , vol. 147, no. 1, pp. 186–197, [33] F. Rubio, X. Mestre, and D. P. Palomar, “Performance analysis and Nov. 2008. optimal selection of large minimum variance portfolios under estimation [10] O. Ledoit and M. Wolf, “Improved estimation of the covariance matrix risk,” IEEE J. Sel. Topics Signal Process., vol. 6, no. 4, pp. 337–350, of stock returns with an application to portfolio selection,” J. Empir. Aug. 2012. Finance, vol. 10, no. 5, pp. 603–621, Dec. 2003. [34] V. DeMiguel, L. Garlappi, and R. Uppal, “Optimal versus naive diversi- [11] ——, “A well-conditioned estimator for large-dimensional covariance fication: How inefficient is the 1/n portfolio strategy?” Rev. Finan. Stud., matrices,” J. Multivar. Anal., vol. 88, no. 2, pp. 365–411, Feb. 2004. vol. 22, no. 5, pp. 1915–1953, 2009. [12] ——, “Nonlinear shrinkage of the covariance matrix for portfolio [35] A. C. MacKinlay and L. Pastor, “Asset pricing models: Implications selection: Markowitz meets Goldilocks,” Available at SSRN 2383361, for expected returns and portfolio selection,” Rev. Finan. Stud., vol. 13, 2014. no. 4, pp. 883–916, 2000. [13] L. Laloux, P. Cizeau, J.-P. Bouchaud, and M. Potters, “Noise dressing of [36] O. Ledoit and M. Wolf, “Robust performance hypothesis testing with financial correlation matrices,” Phys. Rev. Lett., vol. 83, no. 7, p. 1467, the Sharpe ratio,” J, Empir. Finance, vol. 15, no. 5, pp. 850–859, Dec. Aug. 1999. 2008. [14] L. Laloux, P. Cizeau, M. Potters, and J.-P. Bouchaud, “Random matrix [37] ——, “Robust performances hypothesis testing with the variance,” theory and financial correlations,” Internat. J. Theoret. Appl. Finance, Wilmott, vol. 2011, no. 55, pp. 86–89, 2011. vol. 3, no. 3, pp. 391–397, July 2000. [38] D. N. Politis and J. P. Romano, “The stationary bootstrap,” J. Amer. [15] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral, T. Guhr, Statist. Assoc., vol. 89, no. 428, pp. 1303–1313, 1994. and H. E. Stanley, “Random matrix approach to cross correlations in [39] D. Morales-Jimenez, R. Couillet, and M. R. McKay, “Large dimensional financial data,” Phys. Rev., vol. 65, no. 6, p. 066126, June 2002. analysis of robust M-estimators of covariance with outliers,” arXiv [16] V. Dahirel, K. Shekhar, F. Pereyra, T. Miura, M. Artyomov, S. Talsania, preprint arXiv:1503.01245, 2015. T. M. Allen, M. Altfeld, M. Carrington, D. J. Irvine et al., “Coordinate [40] P. Jorion, “Bayesian and CAPM estimators of the means: Implications linkage of HIV evolution reveals regions of immunological vulnerabil- for portfolio selection,” J. Banking Finance, vol. 15, no. 3, pp. 717–727, ity,” Proc. Natl. Acad. Sci. USA, vol. 108, no. 28, pp. 11 530–11 535, June 1991. July 2011. [41] V. K. Chopra and W. T. Ziemba, “The effect of errors in means, [17] A. Quadeer, R. Louie, K. Shekhar, A. Chakraborty, I. Hsing, and , and covariances on optimal portfolio choice,” J. Portfolio M. McKay, “Statistical linkage of mutations in the non-structural pro- Management, vol. 19, no. 2, pp. 6–11, Dec. 1993. teins of Hepatitis C virus exposes targets for immunogen design,” J. [42] Z. Bai and J. W. Silverstein, “No eigenvalues outside the support of Virol., vol. 88, no. 13, pp. 7628–7644, Apr. 2014. the limiting spectral distribution of large-dimensional sample covariance [18] D. L. Donoho, M. Gavish, and I. M. Johnstone, “Optimal shrink- matrices,” Ann. Probab., vol. 26, no. 1, pp. 316–345, 1998. age of eigenvalues in the spiked covariance model,” arXiv preprint [43] L. Ahlfors, Complex Analysis: An Introductin to The Theory of Analytic arXiv:1311.0851, 2013. Functions of One Complex Variable. New York: McGraw-Hill, 1978. [19] J. Fan, J. Zhang, and K. Yu, “Vast portfolio selection with gross-exposure [44] Z. Bai and J. W. Silverstein, Spectral analysis of large dimensional constraints,” J. Amer. Statist. Assoc., vol. 107, no. 498, pp. 592–606, random matrices, 2nd ed. New York, USA: Springer Series in Statistics, 2012. 2009. [20] V. DeMiguel, L. Garlappi, F. J. Nogales, and R. Uppal, “A generalized approach to portfolio optimization: Improving performance by constrain- ing portfolio norms,” Manage. Sci., vol. 55, no. 5, pp. 798–812, Mar. 2009.