Information-theoretic approaches to portfolio selection
Nathan LASSANCE
Doctoral Thesis 2 | 2020
Université catholique de Louvain
LOUVAIN INSTITUTE OF DATA ANALYSIS AND MODELING IN ECONOMICS AND STATISTICS (LIDAM)
Université catholique de Louvain
Louvain School of Management
LIDAM & Louvain Finance
Thesis submitted in partial fulfillment of the requirements for the degree of Docteur en sciences économiques et de gestion
Dissertation committee:
Prof. Frédéric Vrins (UCLouvain, BE), Advisor
Prof. Kris Boudt (Ghent University, BE)
Prof. Victor DeMiguel (London Business School, UK)
Prof. Guofu Zhou (Washington University, USA)
Prof. Marco Saerens (UCLouvain, BE), President
Academic year 2019-2020
“Find a job you enjoy doing, and you will never have to work a day in your life.”
Mark Twain
Contents
Abstract
Acknowledgments
Research accomplishments
List of Figures
List of Tables
List of Notation
Introduction
1 Research background
  1.1 Mean-variance approaches
    1.1.1 Definitions
    1.1.2 Estimation risk
    1.1.3 Robust mean-variance portfolios
  1.2 Higher-moment approaches
    1.2.1 Efficient portfolios
    1.2.2 Downside-risk criteria
    1.2.3 Indirect approaches
  1.3 Risk-parity approaches
    1.3.1 Asset-risk parity
    1.3.2 Factor-risk parity
    1.3.3 Criticisms
  1.4 Information-theoretic approaches
  1.5 Thesis contributions
2 Minimum Rényi entropy portfolios
  2.1 Introduction
  2.2 The notion of entropy
    2.2.1 Shannon entropy
    2.2.2 Rényi entropy
  2.3 Rényi entropy and risk measurement
    2.3.1 Connection with deviation risk measures
    2.3.2 The subadditivity property
    2.3.3 Exponential Rényi entropy as a flexible risk measure
    2.3.4 Appeal of the case α ∈ [0, 1]
  2.4 Rényi entropy and portfolio selection
    2.4.1 Definition
    2.4.2 Entropy and higher moments: A Gram-Charlier expansion
  2.5 Robust m-spacings estimation
    2.5.1 Motivation and expression for the m-spacings estimator
    2.5.2 Properties of the m-spacings estimator
      2.5.2.1 Asymptotic bias
      2.5.2.2 Robustness to outliers
  2.6 Out-of-sample performance
    2.6.1 Data and methodology
    2.6.2 Results
  2.7 Conclusion
  2.8 Appendix
    2.8.1 Proofs of results
      2.8.1.1 Proposition 2.1
      2.8.1.2 Theorem 2.1
      2.8.1.3 Proposition 2.2
      2.8.1.4 Theorem 2.2
      2.8.1.5 Proposition 2.3
      2.8.1.6 Proposition 2.4
    2.8.2 Additional empirical results
3 Optimal portfolio diversification via independent component analysis
  3.1 Introduction
  3.2 PCA versus ICA: from decorrelation to independence
    3.2.1 Principal component analysis
    3.2.2 Independent component analysis
    3.2.3 Numerical illustration
  3.3 Factor-variance parity via uncorrelated factors
    3.3.1 Derivation of the factor-variance-parity portfolios
    3.3.2 Arbitrariness of the decorrelation criterion
  3.4 Higher-moment diversification via ICA
    3.4.1 IC-variance-parity portfolios
    3.4.2 Kurtosis diversification
    3.4.3 Data-driven shrinkage portfolio
  3.5 Factor-risk parity with higher-moment risk measures
    3.5.1 Parsimonious estimation of higher moments with ICs
    3.5.2 IC-risk parity with modified Value-at-Risk
  3.6 Out-of-sample performance
    3.6.1 Data and methodology
    3.6.2 Calibration of K and δ
    3.6.3 Results
  3.7 Conclusion
  3.8 Appendix
    3.8.1 Proofs of results
      3.8.1.1 Proposition 3.1
      3.8.1.2 Theorem 3.1
      3.8.1.3 Theorem 3.2
      3.8.1.4 Excess kurtosis of PCVP portfolios in Example 3.5
      3.8.1.5 Proposition 3.2
      3.8.1.6 Proposition 3.3
    3.8.2 FastICA algorithm
    3.8.3 Theoretical properties of IC-kurtosis-parity portfolios
    3.8.4 Long-only factor-variance-parity portfolios
4 Robust portfolio selection using sparse estimation of comoment tensors
  4.1 Introduction
  4.2 Dimension reduction
    4.2.1 Curse of dimensionality
    4.2.2 Reducing dimensionality via principal components
    4.2.3 Approximation of comoment tensors
  4.3 Sparse higher-comoment tensors
    4.3.1 Independent factor model
    4.3.2 Approximation via independent component analysis
    4.3.3 Sparse estimate of optimal portfolio
  4.4 Empirical analysis
    4.4.1 Data and methodology
    4.4.2 Independence of PCs versus ICs
    4.4.3 Results
  4.5 Conclusion
  4.6 Appendix
    4.6.1 Proofs of results
      4.6.1.1 Proposition 4.1
      4.6.1.2 Corollary 4.1
      4.6.1.3 Theorem 4.1
      4.6.1.4 Proposition 4.2
    4.6.2 Robustness test: daily returns
5 Portfolio selection: A target-distribution approach
  5.1 Introduction
  5.2 Minimum-divergence portfolio
    5.2.1 General formulation
    5.2.2 The Kullback-Leibler divergence
  5.3 Targeting a generalized-normal return distribution
    5.3.1 Minimum-divergence portfolio under a generalized-normal target
    5.3.2 The case of Gaussian asset returns
  5.4 Targeting a Gaussian return distribution
    5.4.1 Decomposition of the KL divergence
    5.4.2 The Dirac-delta target-return distribution
  5.5 A reference-portfolio approach
  5.6 Estimation of the minimum-divergence portfolio
    5.6.1 Estimation for the generalized-normal target return
    5.6.2 Estimation of the portfolio-return entropy $H(P)$
    5.6.3 Estimation of the expectation $E[|P - \hat{\alpha}|^{\gamma}]$
    5.6.4 Estimation for the Gaussian target return
  5.7 Out-of-sample performance
    5.7.1 Data and methodology
    5.7.2 Reported portfolio strategies
    5.7.3 Results
      5.7.3.1 Full sample
      5.7.3.2 Financial crisis
  5.8 Conclusion
  5.9 Appendix
    5.9.1 Proofs of results
      5.9.1.1 Proposition 5.1
      5.9.1.2 Proposition 5.2
      5.9.1.3 Theorem 5.1
      5.9.1.4 Theorem 5.2
      5.9.1.5 Proposition 5.3
      5.9.1.6 Theorem 5.3
      5.9.1.7 Proposition 5.4
      5.9.1.8 Corollary 5.1
    5.9.2 Choice of kernel bandwidth
    5.9.3 Out-of-sample performance of all considered portfolio strategies
      5.9.3.1 Comparison with reference portfolios
      5.9.3.2 Comparison with higher-moment portfolios
      5.9.3.3 Results for the generalized-normal target with γ = 4
    5.9.4 Entropy and diversification
      5.9.4.1 The case of i.i.d. asset returns
      5.9.4.2 Empirical point of view
6 Conclusion
  6.1 Summary of main results
  6.2 Open questions for future research
    6.2.1 Specific questions
    6.2.2 General questions
      6.2.2.1 Properties of independent components
      6.2.2.2 The notion of diversification
References

Abstract
Ever since modern portfolio theory was introduced by Harry Markowitz in 1952, a plethora of papers have been written on the mean-variance investment problem. However, due to the non-Gaussian nature of asset returns, the mean and variance statistics are insufficient to adequately represent their full distribution, which depends on higher moments too. Higher-moment portfolio selection is, however, more complex; a smaller literature has been dedicated to this problem and no consensus has emerged about how investors should allocate their wealth when higher moments cannot be ignored. Among the proposed alternatives, researchers have recently considered information theory, and entropy in particular, as a new framework to tackle this problem. Entropy provides an appealing criterion as it measures the amount of randomness embedded in a random variable from the shape of its density function, thus accounting for all moments. The application of information theory to portfolio selection is, however, nascent and much remains to be explored. Therefore, in this thesis, we aim to explore the portfolio-selection problem from an information-theoretic angle, accounting for higher moments.
We review the relevant literature and mathematical concepts in Chapter 1. Then, we consider in Chapter 2 a natural alternative to the popular minimum-variance portfolio strategy using Rényi entropy as the information-theoretic criterion. We show that the exponential Rényi entropy fulfills natural properties as a risk measure. However, although Rényi entropy has some nice features, we show that it can be an undesirable investment criterion because it may lead to portfolios with worse higher moments than minimizing the variance. For this reason, we turn in Chapters 3 to 5 to different ways of applying entropy, thereby revisiting two popular frameworks (risk parity and expected utility) to account for higher moments.
In Chapter 3, we investigate the factor-risk-parity portfolio, a popular strategy among practitioners, that aims to diversify the portfolio-return risk across uncorrelated factors underlying the asset returns. We show that although principal component analysis (PCA) is very useful for dimension reduction, its resulting factor-risk-parity portfolio is suboptimal. Indeed, PCA merely provides one choice of uncorrelated factors out of infinitely many others, and one would prefer to be diversified over independent factors rather than merely uncorrelated ones. We therefore propose to diversify the risk across maximally independent factors, provided by independent component analysis (ICA). We show theoretically that this solves the issues related to principal components and provides a natural way of reducing the kurtosis of portfolio returns.
In Chapter 4, we apply ICA in a different way in order to obtain robust estimates of moment-based portfolios, such as those based on expected utility. It is well known that these portfolios are difficult to estimate, particularly in high dimensions, because the number of comoments quickly explodes with the number of assets. We propose to address this curse of dimensionality by projecting the asset returns on a small set of maximally independent factors provided by ICA, and neglecting their remaining dependence. In doing so, we obtain sparse approximations of the comoment tensors of asset returns. This drastically decreases the dimensionality of the problem and leads to well-performing and computationally efficient investment strategies with low turnover.
In Chapter 5, we introduce an alternative approach to the utility function to capture investors’ preferences. The utility function is praised by academics but is difficult to specify when higher moments matter. Because investors ultimately care about the distribution of their portfolio returns, our proposal is to capture their preferences via a target-return distribution. The optimal portfolio is then the one whose distribution minimizes the Kullback-Leibler divergence with respect to the target distribution. Our theoretical exploration shows that Shannon entropy plays a central role as a higher-moment criterion in this framework, and our empirical analysis confirms that this strategy outperforms mean-variance portfolios out of sample.

Acknowledgments
When I embarked on my PhD journey more than three years ago, holding this thesis in my hands was for me the final, faraway destination. Today, while writing these lines, I see it more as the first piece of the research career I aspire to pursue. Looking back, I am thankful to many people for making me fall in love with scientific research, and making these years so enjoyable.
I could not have hoped for a better advisor and mentor than Prof. Frédéric Vrins to guide me in this journey. Working by your side from the beginning of my master’s thesis to the end of this PhD thesis was truly inspiring and entertaining. Far from being just a supervisor, you were continuously involved in my projects, and few PhD students can boast this. Many times, I went on Overleaf late in the evening and was surprised to find you scratching your head over some mathematical problem we had discussed during the day. I keep being inspired by your inane curiosity, rigor and broad knowledge; talking with you about entropy, portfolio selection, independent component analysis and the like was always captivating. I will be forever grateful for everything you have taught me; our collaboration has led to several papers that I am very proud of. I will do my best to pass on this legacy in the future.
I wish to warmly thank the members of my dissertation committee. Prof. Marco Saerens (UCLouvain), for presiding over the committee. Your course on Java programming during my bachelor’s degree was among my favorites by far. Prof. Kris Boudt (Ghent University), you are the author I cite the most in this thesis, and this says a lot about how influential your work has been to me. Your continuous feedback on my papers has definitely contributed to the quality of this work. I am thankful for that. My deepest gratitude goes to Prof. Victor DeMiguel (London Business School), a renowned expert in portfolio selection. I wouldn’t have imagined having the chance to work with someone of your standing. While one may think that top professors are inaccessible, you are actually one of the nicest scholars I have met, and I am thankful you graciously accepted my research visit at LBS. Needless to say, I learnt a lot during those three months, particularly on scientific writing during our several day-long rewriting sessions on our joint paper. I keep fond memories of those days. Finally, I am overwhelmed to say that Prof. Guofu Zhou (Washington University), another recognized expert in finance and portfolio selection, has agreed to join the committee. Thank you, I feel truly honored.
I am thankful to other people I discussed research with. Prof. Mikael Petitjean (UCLouvain), your practical point of view was very refreshing. Thank you for your motivating enthusiasm about my work. Prof. Gianluca Fusai (Cass Business School) and Domenico Mignacca (Qatar Investment Authority), for discussions on diversification measurement in portfolio selection. Prof. Massimo Guidolin (Bocconi University), for accepting my post-doctoral visit in a few months. I look forward to working with you. I also thank conference participants and anonymous reviewers for their extensive feedback on my papers. I had the pleasure to pass on some of my knowledge to master’s thesis students: Alexander Ackaert, Guillaume Dequenne, Emmanuel Masson, Loïc Parmentier and Rodolphe Vanderveken. Thank you guys, I hope you had a good time. Finally, I enjoyed sharing my office with you, Cheikh Mbaye. Witnessing your impressive mathematical skills made me stay humble about mine.
This thesis wouldn’t have been possible without financial and administrative support. The aspirant FNRS grant of the Fonds de la Recherche Scientifique (F.R.S.-FNRS) allowed me to pursue research in the best possible conditions. I am incredibly grateful for the research focus it provided me during my PhD. I also thank the Fédération Wallonie-Bruxelles for financing my research visit at the London Business School. My gratitude goes to Sandrine Delhaye, who made the process around the doctoral program so smooth. I was proud to work within the LIDAM institute and the CORE research center. Thank you for hosting me and giving me a nice office. Working in such an active environment was exciting. I particularly thank Catherine Germain for her help in all the administrative processes. Finally, it was great being a fellow of the Louvain Finance (LFIN) research center. Thank you to all those involved in organizing high-quality seminars, conferences, doctoral days and courses, among other activities.
Life wouldn’t be as fun without friends around. Les deux sacrés, Matthieu and Schouls. Thanks for giving me the motivation to run a marathon, and never failing to make me laugh. Le Vince, I look forward to your own PhD defense! Maxime and Adeline, thanks for keeping me company and cooking so many delicious meals when I was living in London. Greg, for our traditional Mexicain-Guinness-MarioKart night out, and for being my fervent partner in supporting Les Diables Rouges. Chloé, Greg, Mathou, Meg and Paul for our many board game nights. Le groupe des vrais amis, my oldest group of friends, for visiting me in London.
My family are the most important people in my life. Aubry, you often tell me that I am your source of inspiration. Believe me, I feel just the same. I am so impressed by your capacity to follow your passions, and by the fact that you finished your first book just a few weeks ago. Somehow, we both found our own way to get published :) I am proud to be your brother. Maman, I am afraid that listing all the ways in which you have supported me up to this day would make these acknowledgments longer than the thesis itself... Thank you for your unconditional love, trust and support. This thesis wouldn’t exist without you. Zorro, for your daily dose of cuteness and craziness, and for keeping me company when I was working from home. Finally, my girlfriend’s family: Eric, Yvelise and Gauthier. Thank you for always making me feel at home in Mussy.
Chloé, you know better than anyone else how much this PhD mattered to me, and I cannot thank you enough for your support, even when I was living abroad. Your smile, your love and your courage keep uplifting me every day. I feel so lucky to have you by my side; these last seven years have been wonderful. I am also indebted to you for offering me the notebooks in which I developed most of the mathematical results in this thesis. I dedicate this work to you.

Research accomplishments
International journal papers
Lassance, N., & Vrins, F. (2018). A comparison of pricing and hedging performances of equity derivatives models. Applied Economics, 50(10):1122–1137.
Lassance, N., & Vrins, F. (2019). Minimum Rényi entropy portfolios. Annals of Operations Research, https://doi.org/10.1007/s10479-019-03364-2.
Papers in progress (submitted work)
Lassance, N., DeMiguel, V., & Vrins, F. (2019). Optimal portfolio diversification via independent component analysis. London Business School working paper.
Lassance, N., & Vrins, F. (2019). Robust portfolio selection using sparse estimation of comoment tensors. Louvain Finance working paper.
Lassance, N., & Vrins, F. (2019). Portfolio selection: A target-distribution approach.
International conference and seminar presentations
“A comparison of pricing and hedging performances of equity derivatives models”:
• Internal seminar for the quant team of ING (Brussels, BE, 2019)

“Minimum Rényi entropy portfolios”:
• PhD day in math finance and financial engineering (Louvain-la-Neuve, BE, 2017)
• Actuarial and Financial Mathematics Conference (Brussels, BE, 2018)
• 35th Annual Conference of the French Finance Association (Paris, FR, 2018)
• Belgian Financial Research Forum (Brussels, BE, 2018)

“Optimal portfolio diversification via independent component analysis”:
• INFORMS Annual Meeting (Phoenix, USA, 2018)
• Louvain Finance internal seminar (Mons, BE, 2019)
• Actuarial and Financial Mathematics Conference (Brussels, BE, 2019)
• Financial Econometrics Conference (Toulouse, FR, 2019, presented by V. DeMiguel)

“Portfolio selection: A target-distribution approach”:
• 9th General AMaMeF Conference (Paris, FR, 2019)
• Louvain Finance internal seminar (Mons, BE, 2019)
• Paris Financial Management Conference (Paris, FR, 2019)
• Actuarial and Financial Mathematics Conference (Brussels, BE, 2020, accepted)

List of Figures
I.1 Minimum-variance portfolio on S&P500 and U.S. 10y Treasury Note

1.1 Sensitivity of modified VaR to volatility, skewness and kurtosis

2.1 The exponential Shannon entropy violates subadditivity for two i.i.d. Student t random variables with ν < 1
2.2 Decreasing α increases the sensitivity to tail uncertainty of the exponential Rényi entropy
2.3 Coefficients of the second-order Gram-Charlier expansion of Rényi entropy
2.4 The minimum Rényi entropy portfolio balances variance and kurtosis minimization
2.5 Impact of kurtosis on the exponential Rényi entropy
2.6 Impact of the parameter m on the robustness to outliers of the m-spacings estimator of the exponential Rényi entropy
2.7 Impact of the parameter m on the stability of the minimum Rényi entropy portfolio

3.1 Mutual information versus correlation
3.2 Mutual information of principal versus independent components
3.3 Arbitrariness of the factor-variance-parity portfolio
3.4 Excess kurtosis of PC- and IC-variance-parity portfolios

4.1 Gain of parameters for the estimation of higher moments via independent versus principal components
4.2 Time evolution of number of factors K
4.3 Non-linear correlation of principal versus independent components
4.4 Boxplots of the MMVaR, MMVaRPC and MMVaRIC portfolios for the 17Ind dataset

5.1 Schematic representation of the minimum-divergence portfolio
5.2 Generalized-normal target-return density for different values of γ
5.3 Function C(γ) as a function of γ
5.4 Minimum-divergence portfolio with Gaussian asset returns
5.5 Decomposition of the KL divergence for the Gaussian target return
5.6 Illustration of the reference-portfolio approach
5.7 Boxplots of portfolio weights for the MV and G2MV portfolios
List of Tables
2.1 List of datasets considered in the empirical study of Chapter 2
2.2 Out-of-sample performance of minimum-variance and minimum Rényi entropy portfolios
2.3 Mean, volatility and break-even transaction cost for the minimum-variance and minimum Rényi entropy portfolios

3.1 Out-of-sample performance of factor-risk-parity portfolios (1/2)
3.1 Out-of-sample performance of factor-risk-parity portfolios (2/2)
3.2 Out-of-sample performance of long-only factor-risk-parity portfolios

4.1 Comparison of cardinality of comoment tensors
4.2 Out-of-sample performance of estimation strategies
4.3 Robustness test: Out-of-sample performance for daily returns

5.1 Minimum-divergence portfolio for the Gaussian target return
5.2 Dirac-delta minimum-divergence portfolio versus mean-variance portfolio
5.3 List of datasets considered in the empirical study of Chapter 5
5.4 List of portfolio strategies considered in the empirical study of Chapter 5
5.5 Out-of-sample performance during the full sample
5.6 Out-of-sample performance during the financial crisis
5.7 Out-of-sample performance of all considered portfolios (1/3)
5.7 Out-of-sample performance of all considered portfolios (2/3)
5.7 Out-of-sample performance of all considered portfolios (3/3)
5.8 Portfolio-weight concentration of minimum-divergence versus reference portfolios
List of Notation
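Typesetting convention
$a, b, c, d, k$: Real constants
$\alpha, \beta, \gamma$: Real constants
$i, j, k$: Vector or matrix indices
$K, N$: Dimension
$t$: Time index
$\boldsymbol{x}$: Column vector
$x_i$: $i$th entry of vector $\boldsymbol{x}$
$\boldsymbol{A}$: Matrix
$\boldsymbol{A}_i$: $i$th row of matrix $\boldsymbol{A}$
$A_{ij}$: $(i, j)$ entry of matrix $\boldsymbol{A}$
$X$: Univariate random variable
$X_{(i:T)}$: $i$th order statistic of $X$ from sample of size $T$
$\boldsymbol{X}$: Multivariate random variable
$X_i$: $i$th component of $\boldsymbol{X}$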
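Vector and matrix operations
$\boldsymbol{x}'$: Transpose of vector $\boldsymbol{x}$
$\|\boldsymbol{x}\|$: Euclidean norm of vector $\boldsymbol{x}$
$\mathrm{diag}(\boldsymbol{x})$: Diagonal matrix made of entries of $\boldsymbol{x}$
$\mathrm{sign}(\boldsymbol{x})$: Vector of signs of entries of $\boldsymbol{x}$
$L(\hat{\boldsymbol{w}}, \boldsymbol{w})$: Expected quadratic-utility loss from approximating $\boldsymbol{w}$ by $\hat{\boldsymbol{w}}$
$H[\boldsymbol{p}]$: Inverse Herfindahl index of vector $\boldsymbol{p}$
$\boldsymbol{A}'$: Transpose of matrix $\boldsymbol{A}$
$\boldsymbol{A}^{-1}$: Inverse of matrix $\boldsymbol{A}$
$\|\boldsymbol{A}\|$: Frobenius norm of matrix $\boldsymbol{A}$
$\det(\boldsymbol{A})$: Determinant of matrix $\boldsymbol{A}$
$\mathrm{trace}(\boldsymbol{A})$: Trace of matrix $\boldsymbol{A}$
$\sharp(\boldsymbol{A})$: Cardinality of matrix $\boldsymbol{A}$
$\boldsymbol{A} \otimes \boldsymbol{B}$: Kronecker (tensor) product of matrices (tensors) $\boldsymbol{A}$ and $\boldsymbol{B}$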
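Statistical operators
$B(\hat{\theta})$: Bias of estimator $\hat{\theta}$
$\mu_X, E[X]$: Mean of $X$
$\hat{\mu}_X$: Sample mean of $X$
$\boldsymbol{\mu}_{\boldsymbol{X}}$: Mean vector of $\boldsymbol{X}$
$\hat{\boldsymbol{\mu}}_{\boldsymbol{X}}$: Sample mean vector of $\boldsymbol{X}$
$\sigma_X, \sigma_X^2$: Standard deviation (volatility) and variance of $X$
$\hat{\sigma}_X, \hat{\sigma}_X^2$: Sample standard deviation (volatility) and variance of $X$
$\boldsymbol{R}_{\boldsymbol{X}}$: Correlation matrix of $\boldsymbol{X}$
$\boldsymbol{\Sigma}_{\boldsymbol{X}}$: Covariance matrix of $\boldsymbol{X}$
$\hat{\boldsymbol{\Sigma}}_{\boldsymbol{X}}$: Sample covariance matrix of $\boldsymbol{X}$
$m_{i,X}$: $i$th central moment of $X$
$\zeta_X$: Skewness of $X$
$\kappa_X$: Excess kurtosis of $X$
$\boldsymbol{M}_k(\boldsymbol{X})$: Comoment tensor/matrix of $\boldsymbol{X}$ of order $k$
$X^{\star}$: Standardized (zero-mean, unit-variance) copy of $X$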
Information-theoretic operators
$H[p_X]$: Shannon entropy of discrete distribution $p_X$
$H(X), H_1(X)$: Shannon entropy of $X$
$H[f_X], H_1[f_X]$: Shannon entropy of density $f_X$
$H_1^{\exp}(X), H_1^{\exp}[f_X]$: Exponential Shannon entropy of $X$ with density $f_X$
$H_\alpha(X), H_\alpha[f_X]$: Rényi entropy of $X$ with density $f_X$
$H_\alpha^{\exp}(X), H_\alpha^{\exp}[f_X]$: Exponential Rényi entropy of $X$ with density $f_X$
$\hat{H}_\alpha(m, T)$: $m$-spacings estimator of $H_\alpha(X)$ from sample of size $T$
$\hat{H}(m, T)$: $m$-spacings estimator of $H(X)$ from sample of size $T$
$\hat{H}_\alpha^{\exp}(m, T)$: $m$-spacings estimator of $H_\alpha^{\exp}(X)$ from sample of size $T$
$H_\alpha^{GC}(X)$: Second-order Gram-Charlier expansion of $H_\alpha(X)$
$\langle f_X | f_Y \rangle$: Kullback-Leibler divergence between $X$ and $Y$
$\langle f_X | f_Y \rangle_\alpha$: Rényi divergence between $X$ and $Y$
$I(\boldsymbol{X})$: Mutual information of $\boldsymbol{X}$
Particular functions
$f_X(\cdot)$: Probability density function (pdf) of $X$
$F_X(\cdot)$: Cumulative distribution function (cdf) of $X$
$F_X^{-1}(\cdot), Q_X(\cdot)$: Quantile function (inverse cdf) of $X$
$\mathrm{VaR}_X(\cdot)$: Value-at-Risk of $X$
$\mathrm{MVaR}_X(\cdot)$: Modified Value-at-Risk of $X$
$\mathrm{CVaR}_X(\cdot)$: Conditional Value-at-Risk of $X$
$\phi(\cdot)$: Standard Gaussian density
$\Phi(\cdot)$: Standard Gaussian cdf
$z_\varepsilon$: Standard Gaussian quantile at confidence level $\varepsilon$
$P(\cdot)$: Probability measure
$L(\cdot)$: Lebesgue measure
$U(\cdot)$: Utility function
$\sigma_{EF}(\cdot)$: Efficient-frontier volatility function
$G(\cdot)$: FastICA entropy-estimation function
$\mathrm{EIF}_{\hat{\theta}}(\cdot)$: Empirical influence function of estimator $\hat{\theta}$
$\Gamma(\cdot)$: Gamma function
$\psi(\cdot)$: Digamma function
$B(\cdot, \cdot)$: Beta function
${}_1F_1(\cdot)$: Confluent hypergeometric function
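Particular random variables, vectors, matrices and sets
$\boldsymbol{X}$: $N$ asset returns
$\hat{\boldsymbol{X}}$: Projection of asset returns on first $K$ principal components
$\boldsymbol{w}$: $N$ portfolio weights
$P, P(\boldsymbol{w})$: Portfolio return
$\hat{P}$: Projection of portfolio return on first $K$ principal components
$\boldsymbol{w}_{MV}$: Minimum-variance portfolio
$\hat{\boldsymbol{w}}_{MV,K}$: Estimator of $\boldsymbol{w}_{MV}$ based on first $K$ principal components
$\boldsymbol{w}_{\mu_0}$: Mean-variance portfolio with mean $\mu_0$
$\boldsymbol{w}_\lambda$: Mean-variance-efficient portfolio with risk-aversion coefficient $\lambda$
$\boldsymbol{w}_{SR}$: Maximum-Sharpe-ratio (tangent) portfolio
$\boldsymbol{w}_{MRE}$: Minimum Rényi entropy portfolio
$\boldsymbol{w}_{IC}$: Minimum-variance IC-variance-parity portfolio
$T$: Target return or sample size
$\boldsymbol{w}_{f_T}$: Minimum-divergence portfolio for a target-return density $f_T$
$\boldsymbol{w}_{\alpha,\beta,\gamma}$: Minimum-divergence portfolio for a generalized-normal target
$\boldsymbol{w}_{\alpha,\beta}$: Minimum-divergence portfolio for a Gaussian target
$\boldsymbol{w}_\alpha$: Minimum-divergence portfolio for a Dirac-delta target
$\boldsymbol{w}_0$: Reference portfolio
$\boldsymbol{w}_\psi$: Portfolio minimizing a function $\psi$ of moments
$\hat{\boldsymbol{w}}_{\psi,K}$: Estimator of $\boldsymbol{w}_\psi$ based on $K$ principal components
$\hat{\boldsymbol{w}}_{\psi,K}^{\dagger}$: Estimator of $\boldsymbol{w}_\psi$ based on $K$ independent components
$\boldsymbol{\mu}$: Asset mean-return vector
$\boldsymbol{\Sigma}$: Asset-return covariance matrix
$\hat{\boldsymbol{\Sigma}}_\delta$: Shrinkage estimator of $\boldsymbol{\Sigma}$ with shrinkage intensity $\delta$
$\tilde{\boldsymbol{\Sigma}}$: Estimator of $\boldsymbol{\Sigma}$ based on first $K$ principal components
$\boldsymbol{V}_N$: Matrix of $N$ eigenvectors of $\boldsymbol{\Sigma}$
$\boldsymbol{V}$: Matrix of $K \le N$ eigenvectors of $\boldsymbol{\Sigma}$
$\boldsymbol{\Lambda}_N$: Diagonal matrix of $N$ eigenvalues of $\boldsymbol{\Sigma}$
$\boldsymbol{\Lambda}$: Diagonal matrix of $K \le N$ eigenvalues of $\boldsymbol{\Sigma}$
$\boldsymbol{R}$: $K \times K$ rotation matrix
$\boldsymbol{R}(\theta)$: $2 \times 2$ rotation matrix
$\boldsymbol{S}$: $K$ independent factors
$\boldsymbol{Y}^{\star}$: First $K$ principal components
$\boldsymbol{Y}$: Arbitrary rotation $\boldsymbol{R}$ of principal components
$\boldsymbol{R}^{\dagger}$: Rotation matrix minimizing the mutual information of $\boldsymbol{Y}$
$\boldsymbol{Y}^{\dagger}$: $K$ independent components
$\hat{\boldsymbol{M}}_k(\boldsymbol{Y}^{\dagger})$: Approximation of tensor $\boldsymbol{M}_k(\boldsymbol{Y}^{\dagger})$ assuming $\boldsymbol{Y}^{\dagger}$ are independent
$\hat{\boldsymbol{M}}_k(\hat{\boldsymbol{X}})$: Approximation of tensor $\boldsymbol{M}_k(\hat{\boldsymbol{X}})$ assuming $\boldsymbol{Y}^{\dagger}$ are independent
$\tilde{\boldsymbol{w}}(\boldsymbol{R})$: Exposures on factors $\boldsymbol{Y} = \boldsymbol{R}\boldsymbol{Y}^{\star}$
$\tilde{\boldsymbol{w}}, \tilde{\boldsymbol{w}}(\boldsymbol{I}_K)$: Exposures on principal components
$\tilde{\boldsymbol{w}}(\boldsymbol{R}^{\dagger})$: Exposures on independent components
$\boldsymbol{1}$: $N$-dimensional vector of ones
$\boldsymbol{1}^{\pm}$: $N$-dimensional vector with entries in $\{-1, 1\}$
$\boldsymbol{1}_K$: $K \le N$-dimensional vector of ones
$\boldsymbol{1}_K^{\pm}$: $K \le N$-dimensional vector with entries in $\{-1, 1\}$
$\boldsymbol{I}_N$: $N \times N$ identity matrix
$\Omega_X$: Support set of random variable $X$
$\mathcal{W}$: Set of allowed portfolios $\boldsymbol{w}$
$\tilde{\mathcal{W}}$: Set of allowed exposures on principal components
$\tilde{\mathcal{W}}^{\dagger}$: Set of allowed exposures on independent components
$\tilde{\mathcal{W}}(\boldsymbol{R})$: Set of allowed exposures on factors $\boldsymbol{Y} = \boldsymbol{R}\boldsymbol{Y}^{\star}$
$\mathcal{W}_{FVP}(\boldsymbol{R})$: Set of $2^{K-1}$ factor-variance-parity portfolios for factors $\boldsymbol{Y} = \boldsymbol{R}\boldsymbol{Y}^{\star}$
$SO(K)$: Set of $K \times K$ rotation matrices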
Other symbols
$\sigma_{X_i|P}$: Volatility contribution of asset $X_i$ to the portfolio return $P$
$\mathrm{MVaR}_{Y_i^{\dagger}|\hat{P}}(\varepsilon)$: Modified VaR contribution of independent component $Y_i^{\dagger}$ to the reduced portfolio return $\hat{P}$
$\hat{f}_P^{GME}$: Gaussian-mixture estimator of the density $f_P$
$\hat{f}_P^{KDE}$: Kernel-density estimator of the density $f_P$
$\mathcal{N}(\mu, \sigma)$: Gaussian law
$\mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$: Multivariate Gaussian law
$\mathcal{GN}(\alpha, \beta, \gamma)$: Generalized-normal law
$\mathcal{T}(\nu)$: Student t law with $\nu$ degrees of freedom
$\delta_\alpha$: Dirac-delta density centered at $\alpha$
$C(\boldsymbol{w}; f_T)$: Objective function for the minimum-divergence portfolio $\boldsymbol{w}_{f_T}$
$C(\boldsymbol{w}; \alpha, \beta, \gamma)$: Objective function for the minimum-divergence portfolio $\boldsymbol{w}_{\alpha,\beta,\gamma}$ given a generalized-normal target return $T \sim \mathcal{GN}(\alpha, \beta, \gamma)$
$C(\boldsymbol{w}; \alpha, \beta)$: Objective function for the minimum-divergence portfolio $\boldsymbol{w}_{\alpha,\beta}$ given a Gaussian target return $T \sim \mathcal{N}(\alpha, \beta)$
$C(\boldsymbol{w}; \alpha)$: Objective function for the minimum-divergence portfolio $\boldsymbol{w}_\alpha$ given a Dirac-delta target-return density $f_T = \delta_\alpha$
$\hat{C}(\boldsymbol{w}; \gamma)$: Estimate of $C(\boldsymbol{w}; \alpha, \beta, \gamma)$ given a reference portfolio $\boldsymbol{w}_0$
$\hat{C}(\boldsymbol{w})$: Estimate of $C(\boldsymbol{w}; \alpha, \beta)$ given a reference portfolio $\boldsymbol{w}_0$
$\delta$: Shrinkage intensity
$R$: Number of rolling windows
$O(N^k)$: Polynomial in $N$ of order $k$
$\kappa_{\min}$: Minimum portfolio-return excess kurtosis
$\kappa_{\max}$: Maximum portfolio-return excess kurtosis
$\kappa_{IC}$: Excess kurtosis of an IC-variance-parity portfolio
$\kappa_{PC}$: Excess kurtosis of a PC-variance-parity portfolio
$\rho_G(X, Y)$: Correlation of $G$-transform of $X$ and $Y$
Introduction
How should I optimally allocate my wealth? That is the investor’s question. The portfolio-selection problem is too multifaceted in nature to hope for the existence of one ideal solution, but providing a suitable framework to think about and solve this question remains of utmost importance. Indeed, portfolio selection underpins a substantial portion of the activities conducted by financial institutions, private investors, as well as individuals for whom financial markets are made increasingly accessible at a low cost via recent developments such as exchange-traded funds and digital investment platforms. In turn, the portfolio-selection rules followed by these agents have important consequences on the financial and economic system as a whole.
The 1950s marked a turning point with the 1952 seminal paper of Harry Markowitz— “Portfolio selection”—for which he was awarded the Nobel Prize in 1990 (Markowitz 1952). Markowitz was the first to provide a rigorous mathematical framework to study the portfolio-selection problem, departing from heuristic and less systematic strategies used by investors until then. He introduced in particular the concept of mean-variance efficient frontier: a portfolio of assets is mean-variance efficient if and only if no other portfolio with the same mean return can achieve a lower variance. He also recognized the central notion of diversification, realizing that one can reduce variance by combining imperfectly correlated assets. The mean-variance theory of Markowitz was the start of the so-called modern portfolio theory, which deeply influenced financial markets and led down the road to a plethora of academic papers devoted to this theory and its extensions.
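As a brief illustration of the diversification effect mentioned above, consider two assets with volatilities $\sigma_1, \sigma_2 > 0$ and correlation $\rho$, held with weights $w \in (0, 1)$ and $1 - w$. These symbols are introduced here only for the sake of the example. The portfolio variance then satisfies

\[
\sigma_P^2 = w^2 \sigma_1^2 + (1-w)^2 \sigma_2^2 + 2 w (1-w) \rho\, \sigma_1 \sigma_2 \;\le\; \bigl( w \sigma_1 + (1-w) \sigma_2 \bigr)^2 ,
\]

with equality if and only if $\rho = 1$. Whenever the two assets are imperfectly correlated, the portfolio volatility is therefore strictly lower than the weighted average of the individual volatilities.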
Markowitz’s theory exclusively relies on the mean and variance of portfolio returns to find the optimal portfolio. This makes sense if one of two sufficient conditions is met: investors’ preferences depend only on mean and variance, or the multivariate distribution of asset returns is Gaussian (more precisely, elliptical). Actually, Markowitz (2014) and others also showed that these sufficient conditions might not be necessary, because the optimal portfolio of the investor can often be well approximated by some portfolio on the mean-variance efficient frontier. Leaving that aside, the issue with Markowitz’s theory is that the two sufficient conditions above are often violated in practice. Many evidences contradict the first assumption; see the paper of Scott and Horvath (1980). It is well known, for example, that investors prefer distributions that are skewed to the right rather than skewed to the left. The second assumption is not realistic either; empirical asset-return
distributions are not Gaussian and, in particular, display probabilities for extreme returns that are much higher than predicted by the Gaussian distribution; see the early work of the mathematician Mandelbrot (1963).

Let us take an example to illustrate this non-Gaussian behavior. Consider a U.S. investor who wants to invest in the bond and equity markets. For this purpose, he considers an investment set made of the CBOE 10-year Treasury Note Yield index (TNX) and the S&P500 index (SPX). He collects daily prices from January 1962 to August 2019; see Figure I.1 for the historical returns of the two indices. He decides to follow the conservative minimum-variance portfolio strategy, and computes from the above data that the minimum-variance portfolio invests 36.93% in TNX and 63.07% in SPX. The corresponding portfolio-return annualized mean and volatility are 5.09% and 13.24%, respectively. The investor now wants to assess the historical tail risk associated with this portfolio. To do so, he computes the annualized 1% Value-at-Risk (VaR); that is, the minimum annual loss one can expect to face with a 1% probability. If the investor assumes that the asset returns are Gaussian, he can easily compute that the 1% VaR is 25.71%. However, is the Gaussian assumption reasonable? Figure I.1 shows that it is not: the minimum-variance portfolio-return density has much fatter tails than the Gaussian. In particular, the annualized 1% VaR implied by the actual historical density is 33.96%, which is 8.25 percentage points more than the Gaussian VaR. Thus, the Gaussian assumption underpinning the mean-variance portfolio-selection paradigm is dangerous. It underestimates the probability of extreme events and leads to suboptimal portfolios with respect to the higher moments of portfolio returns for investors who truly care about them.
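The Gaussian VaR quoted in this example can be checked with a few lines of arithmetic. The sketch below assumes that the annualized portfolio return is Gaussian with the stated mean and volatility and that VaR is reported as a positive loss; the function name `gaussian_var` is ours. The historical VaR of 33.96% cannot be recomputed here since it requires the return data.

```python
from statistics import NormalDist


def gaussian_var(mean: float, vol: float, level: float = 0.01) -> float:
    """Loss exceeded with probability `level` when returns are N(mean, vol^2)."""
    z = NormalDist().inv_cdf(1.0 - level)  # standard Gaussian quantile, about 2.326 at 1%
    return z * vol - mean


# Annualized mean and volatility of the minimum-variance portfolio, as given in the text.
mean, vol = 0.0509, 0.1324
var_gauss = gaussian_var(mean, vol)
print(f"Gaussian 1% VaR: {var_gauss:.2%}")
```

Under these assumptions the result agrees with the 25.71% figure in the text to two decimals.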
For this reason, extensions of Markowitz’s mean-variance portfolio theory that account for the non-Gaussianity of asset returns have been put forward. However, such extensions are challenging, for two main reasons.
1. Higher moments are difficult to estimate accurately because they are very sensitive to outliers. For example, a sample of size 100 from a unit-variance random variable containing a value of 5 will have an excess kurtosis of at least $5^4/100 - 3 = 3.25$.
2. Whereas mean-variance portfolio selection essentially boils down to the maximization of a quadratic utility function, the literature review of Chapter 1 shows that going beyond the first two moments can be accomplished in many different ways. In particular, no consensus emerges from the literature about how investors should allocate their wealth when higher moments cannot be ignored.
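The outlier arithmetic in the first item can be illustrated numerically. The sketch below builds a hypothetical sample of size 100 (our construction, not data from the thesis) that contains the value 5 and has sample mean zero and unit sample variance, and then computes its excess kurtosis with the 1/n normalization.

```python
import numpy as np

n, outlier = 100, 5.0

# The 99 remaining points are shifted so that the sample mean is zero and
# spread symmetrically so that the (1/n) sample variance is exactly one.
shift = -outlier / (n - 1)
spread = np.sqrt((n - outlier**2 - (n - 1) * shift**2) / (n - 2))
signs = np.array([1.0, -1.0] * ((n - 2) // 2) + [0.0])
x = np.concatenate(([outlier], shift + spread * signs))

mean, var = x.mean(), x.var()
excess_kurtosis = np.mean((x - mean) ** 4) / var**2 - 3.0
lower_bound = outlier**4 / n - 3.0  # contribution of the single outlier

print(mean, var, excess_kurtosis, lower_bound)
```

The single observation at 5 contributes $5^4/100$ to the fourth moment on its own, so the excess kurtosis cannot fall below 3.25 whatever the other 99 points are.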
Figure I.1: Minimum-variance portfolio on S&P500 and U.S. 10y Treasury Note. Panel (a): additive returns of TNX and SPX, 1962 to 2019. Panel (b): density of the minimum-variance portfolio and the corresponding Gaussian density, plotted against the daily return (from -5% to 5%).
Notes. The top figure depicts the additive returns of the CBOE 10 year Treasury Note Yield index (TNX) and the S&P500 index (SPX), from January 1962 to August 2019. The bottom figure depicts the return density of the minimum-variance portfolio, obtained via kernel density estimation with Gaussian kernels and the bandwidth of Silverman (1986). The Gaussian density with the same mean and variance is depicted on the bottom figure too.
Given these considerations, the purpose of this thesis is to tackle the higher-moment portfolio-selection problem. We accomplish this from a novel angle based on information theory. This branch of mathematics was born with the 1948 seminal paper of Claude Shannon: “A mathematical theory of communication” (Shannon 1948). Information theory was originally used in signal processing to answer different questions related to the information contained in data signals. In particular, Shannon (1948) showed that the limit to which a signal can be compressed without losing information is given by a quantity known as entropy. Intuitively, the higher the entropy of a signal, the more random it is, and the more difficult it is to compress. In physics, the second law of thermodynamics, according to which the entropy of an isolated system never decreases, is often rephrased as “disorder always increases.” In probability theory, the entropy of a random variable measures its randomness, or uncertainty. What is particularly interesting in our portfolio-selection context is that entropy measures the uncertainty of the whole distribution of the random variable, going beyond the variance. Thus, because investors aim to minimize the risk of their portfolio returns, several researchers have recently proposed using entropy as an investment criterion; see Section 1.4 for a detailed review of the literature. However, its application to portfolio selection is still nascent and much remains to be explored about the potential benefits of entropy in this context.
In Chapter 1, we begin by reviewing the relevant literature and mathematical concepts related to portfolio selection, namely: mean-variance approaches, higher-moment approaches, risk-parity approaches and information-theoretic approaches.
In Chapter 2, we study a natural alternative to the popular minimum-variance portfolio strategy that consists in minimizing the Rényi entropy (Rényi 1961) of portfolio returns. The Rényi entropy recovers Shannon entropy as a special case, and is more flexible because it features a parameter that controls the relative contributions of the central and tail parts of the distribution to the final entropy value. The proposed portfolio strategy coincides with the minimum-variance portfolio for Gaussian asset returns but, otherwise, also accounts for higher moments. The exponential of Rényi entropy is shown to provide a more natural risk measure than the plain Rényi entropy: it is closely connected to the class of deviation risk measures. However, it violates subadditivity in some extreme scenarios. Although Rényi entropy has some desirable features, we show via a Gram-Charlier expansion and an empirical study that minimizing Rényi entropy can yield portfolios with worse higher moments than minimizing the variance. Thus, whereas minimizing entropy may seem like a natural higher-moment alternative to minimizing the variance, it can actually be undesirable with respect to higher moments.
For this reason, we turn in Chapters 3 to 5 to different ways of applying entropy in portfolio selection that, as we show, are theoretically appealing with respect to higher moments. In particular, we consider two popular frameworks. First, risk parity, which is popular among practitioners (Chapter 3). Second, expected utility, which is praised by academics (Chapter 4 and Chapter 5). Our aim is to make them suitable in the presence of higher moments.
In Chapter 3, we study a popular heuristic approach used by practitioners, called the factor-risk-parity portfolio, introduced by Meucci (2009). Its aim is to diversify the portfolio-return risk across uncorrelated factors underlying the asset returns. A natural choice of factors proposed in the literature is the principal components. However, as we show, the factor-risk-parity problem is actually ill-posed: there exist infinitely many sets of uncorrelated factors, and they are all associated with a different factor-risk-parity portfolio. This makes the principal-components choice arbitrary. We propose to solve this arbitrariness issue by relying on factors that are maximally independent, rather than merely uncorrelated. These so-called independent components are obtained via independent component analysis (ICA), and correspond to the rotation of the principal components with minimum Shannon entropy. In addition, we show theoretically that relying on the independent components provides a natural way of reducing the portfolio-return kurtosis, which is desirable in terms of tail risk. This is confirmed on empirical data. Thus, we provide theoretical foundations for the use of factor-risk parity as a higher-moment portfolio strategy, provided one relies on the independent components as factors.
Whereas risk parity is popular among practitioners, most academics dismiss it as a heuristic approach without clear theoretical foundations. Indeed, risk parity is normatively debatable as “diversification” is not, as such, a financial objective. Instead, academics favor the expected-utility approach, such as the mean-variance portfolio, which maximizes the expected portfolio-return utility. We turn to this framework in the next two chapters.
In Chapter 4, we show that ICA provides a simple and elegant framework to obtain robust estimates of moment-based portfolios, such as those based on expected utility via a Taylor series expansion. These portfolios suffer from the curse of dimensionality because the number of comoments to estimate quickly explodes with the number of assets and the moment order considered. This severely impacts the stability and out-of-sample performance of the investment strategy. We tackle the curse of dimensionality by enhancing robustness via a sparse representation of the comoment tensors of asset returns. We achieve this by projecting the asset returns onto a small set of maximally independent factors, found via ICA. We show that this solves the curse of dimensionality: by neglecting the remaining dependence of the independent components, we drastically reduce the number of free parameters in the comoment tensors. This leads to well-performing, low-turnover investment strategies that are computationally efficient.
In Chapter 5, we revisit the expected-utility framework. A difficulty in the application of expected utility is that, as we explain in Chapter 1, specifying the utility function is arduous when higher moments matter. In particular, commonly used utility functions such as the constant-relative-risk-aversion one are locally quadratic, and lead to portfolios close to the mean-variance optimal one. Therefore, in Chapter 5, we introduce an alternative way of capturing all investors’ preferences via a single function, which is more natural than the utility function. Specifically, we recognize that investors ultimately care for the distribution of their portfolio returns, and we propose a framework whereby investors’ preferences are captured via a target-return distribution. The corresponding strategy is the minimum-divergence portfolio, whose return distribution is as close as possible to the target-return one, as measured by the Kullback-Leibler divergence (relative entropy). Relying on a generalized-normal target return, we show that we recover Markowitz’s efficient frontier when asset returns are Gaussian and when targeting a Dirac-delta distribution. For the Gaussian target return, we show that the objective function admits a natural decomposition in three terms that respectively measure the fit to the target-return mean, variance and higher moments. The latter are naturally accounted for via the Shannon entropy of standardized portfolio returns, which drives higher moments toward those of the Gaussian. Empirical results confirm that our strategy helps to improve higher moments compared to mean-variance portfolios.
The thesis contributions raise open questions to be addressed in future research, some of which are discussed in Chapter 6. All original results are stated in propositions, and the most important ones in theorems. Appendices at the end of each chapter contain proofs for all results, as well as supplementary materials.
I wish you a pleasant reading of this thesis!

Chapter 1
Research background
In this first chapter, we provide a review of the relevant literature and the mathematical concepts related to portfolio selection that are needed to contextualize, understand and appreciate the contributions of this thesis, covered in Chapters 2 to 5. We divide this chapter into five sections:
1. Mean-variance approaches (Section 1.1)
2. Higher-moment approaches (Section 1.2)
3. Risk-parity approaches (Section 1.3)
4. Information-theoretic approaches (Section 1.4)
5. Thesis contributions (Section 1.5)
Concerning the notation, we adopt the following convention:
• Deterministic scalars: lower case and plain text (a)
• Deterministic vectors: lower case and bold text (a) in column format
• Matrices: upper case and bold text (A)
• Univariate random variables: upper case and plain text (X)
• Multivariate random variables: upper case and bold text (X) in column format
1.1 Mean-variance approaches
Mean-variance portfolio theory marks the start of modern portfolio theory. It remains the baseline approach employed by both academics and practitioners. We review the related definitions in Section 1.1.1, the issue of estimation risk in Section 1.1.2, and robust mean-variance portfolios in Section 1.1.3.
1.1.1 Definitions
The so-called modern portfolio theory was born in a series of papers by Markowitz (1952, 1959), Roy (1952), Sharpe (1962) and Merton (1971, 1972) among the most important ones.¹ They together introduced a quantitative framework to formally describe and solve the investment problem of an investor who wants to optimize the mean-variance trade-off of his portfolio returns. The solution to this problem is summarized by the celebrated mean-variance efficient frontier.
¹ It is little known that Roy (1952) independently discovered the basic tenets of mean-variance portfolio theory. Markowitz (1999, p.5) states that “On the basis of Markowitz (1952), I am often called the father of modern portfolio theory (MPT), but Roy (1952) can claim an equal share of this honor.”

Let us formalize the mean-variance portfolio-selection problem. We take as given an investment set of $N$ assets among which the investor can allocate his wealth. At some time $t$, the $i$th asset is represented by its price $S_{i,t}$. Given a time frequency $\tau$ (e.g. daily, monthly), the arithmetic return of the $i$th asset from $t$ to $t + \tau$ is then

$$X_{i,t,\tau} := \frac{S_{i,t+\tau} - S_{i,t}}{S_{i,t}}. \tag{1.1}$$

In this thesis, we focus on the single-period investment problem; we do not model the dynamics of asset returns. For this reason, we assume that the asset returns $X_{i,t,\tau}$ are i.i.d. over time and we drop the index $t$. The time frequency $\tau$ will also be clear from the context, and we adopt the simpler notation

$$X_{i,t,\tau} \leftarrow X_i.$$

The $N$ asset returns together form the multivariate random variable $\mathbf{X} = (X_1, \dots, X_N)'$. In this thesis, we consider that there is no risk-free asset: the volatility $\sigma_{X_i} > 0$ for all $i$.

Define the asset mean-return vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ having entries $\mu_i := E[X_i]$ and $\Sigma_{ij} := E[(X_i - \mu_i)(X_j - \mu_j)]$, respectively. We always assume that $\boldsymbol{\Sigma}$ is of full rank and thus invertible. The investor’s portfolio problem is how to split his wealth among the $N$ assets. The portfolio is represented by a vector of weights $\mathbf{w} = (w_1, \dots, w_N)'$. Unless otherwise stated, $\mathbf{w}$ belongs to the set

$$\mathcal{W} := \{\mathbf{w} \in \mathbb{R}^N \mid \mathbf{1}'\mathbf{w} = 1\}, \tag{1.2}$$

where $\mathbf{1}$ is an $N$-dimensional vector of ones. That is, the portfolio weights sum to one. The resulting return of this portfolio is

$$P = P(\mathbf{w}) := \mathbf{w}'\mathbf{X} \tag{1.3}$$

with mean and variance

$$(\mu_P, \sigma_P^2) = (\mathbf{w}'\boldsymbol{\mu}, \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}). \tag{1.4}$$
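As a minimal sketch of (1.1) to (1.4), the snippet below computes arithmetic returns from a small table of hypothetical prices and evaluates the portfolio mean and variance. The prices and weights are made up for illustration, and the population moments of the text are replaced by their sample analogues with a 1/T normalization.

```python
import numpy as np

# Hypothetical prices: rows are dates, columns are assets.
prices = np.array([[100.0, 50.0],
                   [102.0, 49.0],
                   [101.0, 51.0],
                   [104.0, 52.0]])

# Arithmetic returns as in (1.1), one row per period.
returns = prices[1:] / prices[:-1] - 1.0

# Sample analogues of the mean vector and covariance matrix (1/T normalization).
mu = returns.mean(axis=0)
Sigma = np.cov(returns, rowvar=False, bias=True)

# A weight vector in the set W of (1.2): the weights sum to one.
w = np.array([0.6, 0.4])

# Portfolio return (1.3) and its mean and variance (1.4).
portfolio_returns = returns @ w
mu_P = w @ mu
var_P = w @ Sigma @ w
print(mu_P, var_P)
```

The quadratic form `w @ Sigma @ w` gives the same number as the variance of the portfolio-return series itself, which is the content of (1.4).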
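Having introduced the necessary notation, we report the main properties of mean-variance-efficient portfolios. We use the concise notation of Bodnar et al. (2013).

Property 1.1 (Mean-variance-efficient portfolios). Let the matrix $\mathbf{Q}$ be defined as

$$\mathbf{Q} := \boldsymbol{\Sigma}^{-1} - \frac{\boldsymbol{\Sigma}^{-1}\mathbf{1}\mathbf{1}'\boldsymbol{\Sigma}^{-1}}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}. \tag{1.5}$$

We have the following properties:

(i) The minimum-variance portfolio is given by

$$\mathbf{w}_{MV} := \operatorname*{argmin}_{\mathbf{w} \in \mathcal{W}} \sigma_P^2 \implies \mathbf{w}_{MV} = \frac{\boldsymbol{\Sigma}^{-1}\mathbf{1}}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}, \tag{1.6}$$

with return mean and variance

$$(\mu_{MV}, \sigma_{MV}^2) := \left(\frac{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}, \frac{1}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\mathbf{1}}\right). \tag{1.7}$$

(ii) Given a mean return $\mu_0$, the mean-variance portfolio is given by

$$\mathbf{w}_{\mu_0} := \operatorname*{argmin}_{\mathbf{w} \in \mathcal{W}} \sigma_P^2 \ \text{ subject to } \ \mu_P = \mu_0 \implies \mathbf{w}_{\mu_0} = \mathbf{w}_{MV} + \frac{\mu_0 - \mu_{MV}}{\boldsymbol{\mu}'\mathbf{Q}\boldsymbol{\mu}} \mathbf{Q}\boldsymbol{\mu}. \tag{1.8}$$

(iii) $\mathbf{w}_{\mu_0}$ is a mean-variance-efficient portfolio if $\mu_0 \geq \mu_{MV}$.

(iv) The mean-variance-efficient portfolio can also be defined via the quadratic utility function:

$$\mathbf{w}_{\lambda} := \operatorname*{argmax}_{\mathbf{w} \in \mathcal{W}} \mu_P - \frac{\lambda}{2}\sigma_P^2 \implies \mathbf{w}_{\lambda} = \mathbf{w}_{MV} + \frac{1}{\lambda}\mathbf{Q}\boldsymbol{\mu}, \tag{1.9}$$

with the equivalence $\mathbf{w}_{\lambda} = \mathbf{w}_{\mu_0}$ if $\lambda = \frac{\boldsymbol{\mu}'\mathbf{Q}\boldsymbol{\mu}}{\mu_0 - \mu_{MV}}$ for $\mu_0 \geq \mu_{MV}$.

(v) The tangent portfolio maximizing the Sharpe ratio (Sharpe 1994),

$$SR_P := \frac{\mu_P}{\sigma_P}, \tag{1.10}$$

is given by

$$\mathbf{w}_{SR} := \operatorname*{argmax}_{\mathbf{w} \in \mathcal{W}} SR_P \implies \mathbf{w}_{SR} = \frac{\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}{\mathbf{1}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}}. \tag{1.11}$$

(vi) The mean-variance efficient frontier is given by the function

$$\sigma_{EF}(\mu_0) := \sqrt{\sigma_{MV}^2 + \frac{(\mu_0 - \mu_{MV})^2}{\boldsymbol{\mu}'\mathbf{Q}\boldsymbol{\mu}}} \tag{1.12}$$

for all $\mu_0 \geq \mu_{MV}$.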
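The closed forms in Property 1.1 translate directly into a few lines of linear algebra. The sketch below uses a hypothetical three-asset mean vector and covariance matrix (the numbers are ours, chosen only so that the covariance matrix is positive definite) and checks numerically that (1.8), (1.9) and (1.12) are mutually consistent.

```python
import numpy as np

# Hypothetical inputs, for illustration only.
mu = np.array([0.05, 0.08, 0.11])
Sigma = np.array([[0.040, 0.006, 0.004],
                  [0.006, 0.090, 0.012],
                  [0.004, 0.012, 0.160]])
ones = np.ones(len(mu))
Si = np.linalg.inv(Sigma)

a = ones @ Si @ ones
Q = Si - np.outer(Si @ ones, Si @ ones) / a        # (1.5)
w_mv = Si @ ones / a                               # (1.6)
mu_mv, var_mv = ones @ Si @ mu / a, 1.0 / a        # (1.7)
w_sr = Si @ mu / (ones @ Si @ mu)                  # (1.11)


def w_target(mu0):
    """Minimum-variance portfolio with target mean return mu0, as in (1.8)."""
    return w_mv + (mu0 - mu_mv) / (mu @ Q @ mu) * (Q @ mu)


def w_utility(lam):
    """Quadratic-utility portfolio with risk aversion lam, as in (1.9)."""
    return w_mv + (Q @ mu) / lam


def sigma_ef(mu0):
    """Efficient-frontier volatility at mean return mu0, as in (1.12)."""
    return np.sqrt(var_mv + (mu0 - mu_mv) ** 2 / (mu @ Q @ mu))


mu0 = mu_mv + 0.02                                 # a target above the minimum-variance mean
lam = (mu @ Q @ mu) / (mu0 - mu_mv)                # equivalent risk aversion, property (iv)
w0 = w_target(mu0)
print(w_mv, w0, w_sr, sigma_ef(mu0))
```

With these inputs, the portfolio from (1.8) hits the target mean, its volatility lies on the frontier (1.12), and it coincides with the quadratic-utility portfolio (1.9) at the equivalent risk aversion.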
1.1.2 Estimation risk
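Suppose that the investor wants to determine his portfolio at a given point in time to optimize his expected quadratic utility. Clearly, the investor does not know the true mean and covariance matrix of asset returns, and thus, will need to estimate his mean-variance-efficient portfolio via historical asset-return data. The standard estimation procedure to perform this is called the sample plug-in estimate, which consists of three steps.

1. Given a fixed time frequency $\tau$ (say, monthly), collect a sample of $T$ historical return observations for each asset; for example, ten years of past monthly returns ($T = 120$). We denote by $\mathbf{X}_t = (X_{1,t}, \dots, X_{N,t})'$ the asset-return vector at each time $t = 1, \dots, T$.

2. Compute the sample (maximum-likelihood) estimators of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$:

$$\hat{\boldsymbol{\mu}} := \frac{1}{T}\sum_{t=1}^{T} \mathbf{X}_t, \tag{1.13}$$

$$\hat{\boldsymbol{\Sigma}} := \frac{1}{T}\sum_{t=1}^{T} (\mathbf{X}_t - \hat{\boldsymbol{\mu}})(\mathbf{X}_t - \hat{\boldsymbol{\mu}})'. \tag{1.14}$$

3. Compute the sample mean-variance-efficient portfolio by plugging $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Sigma}}$ in place of the true $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$:

$$\hat{\mathbf{w}}_{MV} = \frac{\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{1}}{\mathbf{1}'\hat{\boldsymbol{\Sigma}}^{-1}\mathbf{1}}, \quad \hat{\mathbf{w}}_{\lambda} = \hat{\mathbf{w}}_{MV} + \frac{1}{\lambda}\hat{\mathbf{Q}}\hat{\boldsymbol{\mu}}, \quad \hat{\mathbf{w}}_{SR} = \frac{\hat{\boldsymbol{\Sigma}}^{-1}\hat{\boldsymbol{\mu}}}{\mathbf{1}'\hat{\boldsymbol{\Sigma}}^{-1}\hat{\boldsymbol{\mu}}}, \tag{1.15}$$

where $\hat{\mathbf{Q}}$ denotes the matrix $\mathbf{Q}$ in (1.5) evaluated at $\hat{\boldsymbol{\Sigma}}$.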
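The three steps above can be sketched as follows. The returns are simulated i.i.d. Gaussian draws standing in for historical data, and the risk-aversion value is a hypothetical choice; neither comes from the thesis.

```python
import numpy as np

# Step 1: a sample of T return observations on N assets (simulated here).
rng = np.random.default_rng(0)
T, N = 120, 4
X = rng.normal(loc=0.01, scale=0.05, size=(T, N))

# Step 2: sample (maximum-likelihood) estimators (1.13) and (1.14).
mu_hat = X.mean(axis=0)
centered = X - mu_hat
Sigma_hat = centered.T @ centered / T

# Step 3: plug the estimates into the closed forms, as in (1.15).
ones = np.ones(N)
Si = np.linalg.inv(Sigma_hat)
a_hat = ones @ Si @ ones
Q_hat = Si - np.outer(Si @ ones, Si @ ones) / a_hat
lam = 3.0  # hypothetical risk-aversion coefficient

w_mv_hat = Si @ ones / a_hat
w_lam_hat = w_mv_hat + Q_hat @ mu_hat / lam
w_sr_hat = Si @ mu_hat / (ones @ Si @ mu_hat)
print(w_mv_hat, w_lam_hat, w_sr_hat)
```

Rerunning the sketch with a different seed gives an informal feel for estimation risk: the weights that depend on `mu_hat` typically move much more than `w_mv_hat`.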
The sample plug-in estimate of the mean-variance portfolio is very sensitive to estimation risk because treating $\hat{\boldsymbol{\mu}}$ and $\hat{\boldsymbol{\Sigma}}$ as if they were equal to their population counterparts ignores that sample estimators carry potentially substantial estimation errors. They can be very different from the true mean and covariance matrix of asset returns. For this reason, Michaud (1989, p.33–34) has called sample mean-variance portfolios “estimation-error maximizers”, in the sense that “mean-variance optimization significantly overweights (underweights) those securities that have large (small) estimated returns, negative (positive) correlations and small (large) variances. These securities are, of course, the ones most likely to have large estimation errors.” This severely affects the out-of-sample performance of mean-variance portfolios. Out of sample means here that the future realized performance depends on portfolio weights estimated via past return data.
For mean-variance portfolios, it is well documented that estimation risk arises mainly from the mean-return vector µ rather than the covariance matrix Σ. This can be explained by a combination of two phenomena.
1. The variance of sample means tends to be much larger than the variance of sample variances and covariances; see Merton (1980). As a result, “even if the expected return on the market were known to be a constant for all time, it would take a very long history of returns to obtain an accurate estimate” (Merton 1980, p.326). To get an idea of what “a very long history of returns” might represent, DeMiguel et al. (2009b) show that, assuming Gaussian asset returns, the estimation window required for the sample tangent portfolio to outperform the equally weighted portfolio,

$$\mathbf{w}_{EW} := \mathbf{1}/N, \tag{1.16}$$

is, for example, about 600 months (50 years) when $N = 50$ and the true Sharpe ratios of the tangent and equally weighted portfolios are 0.40 and 0.20, respectively.
2. The mean-variance portfolio is much more sensitive to changes in $\boldsymbol{\mu}$ than in $\boldsymbol{\Sigma}$; see Best and Grauer (1991) and Chopra and Ziemba (1993). In particular, for a moderate value of the risk-aversion coefficient $\lambda$, the latter find that errors in means are about ten times as important as errors in variances, which are in turn about twice as important as errors in covariances.
As a result, the minimum-variance portfolio $\mathbf{w}_{MV}$, which is the only portfolio on the efficient frontier that does not require an estimate of asset mean returns, has been shown to outperform any other mean-variance portfolio in terms of out-of-sample Sharpe ratio and turnover (the magnitude of changes in portfolio weights over time); see, among others, Jorion (1986), Chan et al. (1999), Jagannathan and Ma (2003), DeMiguel et al. (2009a), DeMiguel et al. (2009b) and Kourtis et al. (2012). In particular, Jagannathan and Ma (2003, p.1652) conclude that “the estimation error in the sample mean is so large nothing much is lost in ignoring the mean altogether”.
In order to formalize the superiority of the sample minimum-variance portfolio over the sample mean-variance portfolio, we follow Kan and Zhou (2007) and Kourtis et al. (2012). They consider the case where the investment set is made of $N$ risky assets and one risk-free asset. In that case, the mean-variance-efficient portfolio is given by the risky-asset weights

$$\mathbf{w}_{\lambda} = \frac{1}{\lambda}\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}, \tag{1.17}$$

and a weight of $1 - \mathbf{1}'\mathbf{w}_{\lambda}$ on the risk-free asset. We wish to compare this portfolio with the minimum-variance portfolio that invests in the risky assets only (otherwise, it would trivially invest 100% in the risk-free asset), given by $\mathbf{w}_{MV}$ in (1.6). To that end, we denote the quadratic utility associated with a portfolio $\mathbf{w}$ as

$$U(\mathbf{w}) := \mu_{P(\mathbf{w})} - \frac{\lambda}{2}\sigma_{P(\mathbf{w})}^2, \tag{1.18}$$

and we define the expected quadratic-utility loss resulting from approximating the true portfolio $\mathbf{w}$ via a sample estimator $\hat{\mathbf{w}}$ as

$$L(\hat{\mathbf{w}}, \mathbf{w}) := U(\mathbf{w}) - E(U(\hat{\mathbf{w}})), \tag{1.19}$$

where the expectation is taken with respect to the true distribution of asset returns $\mathbf{X}$. In this setup, assuming Gaussian asset returns, Kan and Zhou (2007) show that²

$$L(\hat{\mathbf{w}}_{\lambda}, \mathbf{w}_{\lambda}) = \frac{1}{2\lambda}\left[(1 - c_1)SR^2 + c_2\right], \tag{1.20}$$

where $SR^2 = \boldsymbol{\mu}'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}$ is the squared Sharpe ratio of the mean-variance portfolio $\mathbf{w}_{\lambda}$ and

$$c_1 = \frac{T}{T - N - 2}\left(2 - \frac{T(T - 2)}{(T - N - 1)(T - N - 4)}\right), \tag{1.21}$$

$$c_2 = \frac{NT(T - 2)}{(T - N - 1)(T - N - 2)(T - N - 4)}. \tag{1.22}$$

Moreover, Kourtis et al. (2012) show that

$$L(\hat{\mathbf{w}}_{MV}, \mathbf{w}_{\lambda}) = \frac{SR^2}{2\lambda} - \mu_{MV} + \frac{\lambda(T - 2)}{2(T - N - 1)}\sigma_{MV}^2. \tag{1.23}$$

To assess which portfolio among the sample mean-variance portfolio $\hat{\mathbf{w}}_{\lambda}$ and sample minimum-variance portfolio $\hat{\mathbf{w}}_{MV}$ yields the lowest quadratic-utility loss, Kourtis et al. (2012) compute the number of observations $T$ required to have $L(\hat{\mathbf{w}}_{\lambda}, \mathbf{w}_{\lambda}) < L(\hat{\mathbf{w}}_{MV}, \mathbf{w}_{\lambda})$. They find that $T$ is often unrealistically large. To give one example, assume monthly observations and a constant annual mean return of 10% for each asset, and fix $N = 25$, $SR = 0.2$ and $\lambda = 1$. Then, Kourtis et al. (2012) find that the number of observations required is $T = 828$; that is, 69 years of monthly observations!

² Note that because $\mathbf{w}_{\lambda}$ maximizes $U(\mathbf{w})$, $L(\hat{\mathbf{w}}, \mathbf{w}_{\lambda})$ in (1.19) is always nonnegative.
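The loss formulas (1.20) to (1.23) are simple enough to evaluate directly. The sketch below codes them as plain functions and compares the two losses for a few sample sizes. The parameter values, including the minimum-variance mean and variance, are illustrative assumptions of ours and not necessarily the calibration of Kourtis et al. (2012), so the break-even sample size it suggests need not match the figure quoted above.

```python
def c1(T, N):
    """Coefficient c1 in (1.21)."""
    return T / (T - N - 2) * (2.0 - T * (T - 2) / ((T - N - 1) * (T - N - 4)))


def c2(T, N):
    """Coefficient c2 in (1.22)."""
    return N * T * (T - 2) / ((T - N - 1) * (T - N - 2) * (T - N - 4))


def loss_plugin(T, N, sr2, lam):
    """Expected utility loss of the sample mean-variance portfolio, (1.20)."""
    return ((1.0 - c1(T, N)) * sr2 + c2(T, N)) / (2.0 * lam)


def loss_min_var(T, N, sr2, lam, mu_mv, var_mv):
    """Expected utility loss of the sample minimum-variance portfolio, (1.23)."""
    return sr2 / (2.0 * lam) - mu_mv + lam * (T - 2) / (2.0 * (T - N - 1)) * var_mv


# Hypothetical monthly inputs (assumptions for illustration).
N, lam, sr2 = 25, 1.0, 0.2**2
mu_mv = 0.10 / 12
var_mv = mu_mv**2 / sr2

for T in (120, 600, 2400):
    lp = loss_plugin(T, N, sr2, lam)
    lm = loss_min_var(T, N, sr2, lam, mu_mv, var_mv)
    print(T, round(lp, 5), round(lm, 5), "plug-in better" if lp < lm else "min-variance better")
```

Under these assumed inputs the plug-in loss is dominated by the term $c_2 \approx N/T$, which is why it only falls below the minimum-variance loss for very long samples.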
1.1.3 Robust mean-variance portfolios
As we have seen in Section 1.1.2, the sample plug-in estimation is largely suboptimal for portfolio-selection purposes. In particular, a wide body of research shows that the sample minimum-variance portfolio largely outperforms the sample mean-variance portfolios, both theoretically and empirically. Still, the sample minimum-variance portfolio is itself subject to estimation risk because
(i) it is well known that the sample covariance matrix carries substantial estimation errors in high-dimensional settings as it is made of $O(N^2)$ parameters;
(ii) the sample covariance matrix is the most efficient estimator assuming Gaussian asset returns, but its efficiency is very sensitive even to slight deviations from Gaussianity; see Huber (2004) and DeMiguel and Nogales (2009).
The second issue is particularly worrying as asset returns are well known to significantly deviate from Gaussianity; see Section 1.2.
As a result, many researchers have put forward robust minimum/mean-variance portfolios that outperform their sample counterparts out of sample. Fabozzi et al. (2010) and Section 3 of Kolm et al. (2014) provide a detailed review of such portfolios, including for downside risk measures. In this section, we review five well-known approaches that will be used later in our empirical analyses. The first three approaches are directed at the minimum-variance portfolio, and the latter two at the mean-variance portfolio:
1. Shrinkage estimation of covariance matrix
2. Robust M-estimation of portfolio weights
3. Constraints on portfolio weights
4. Shrinkage estimation of portfolio weights
5. Bayesian estimation of mean-return vector
Note that, in this thesis, we take a quite general meaning of robustness. A robust portfolio is a portfolio that is stable and well-performing out of sample and, in the same vein, a robust estimator of some statistic is an estimator that yields a robust portfolio. Depending on the context, robustness can then be achieved in different ways, such as avoiding inputs that are difficult to estimate (e.g., the equally weighted or the minimum-variance portfolio), improving the robustness to outliers (e.g., the m-spacings estimator of entropy used in Chapter 2) or reducing the number of parameters to estimate (e.g., the sparse estimation approach proposed in Chapter 4).
1. Shrinkage estimation of covariance matrix

One of the most commonly used approaches to reduce the amount of estimation error in the sample covariance matrix $\hat{\boldsymbol{\Sigma}}$ is to rely instead on a shrinkage estimator of the form

$$\hat{\boldsymbol{\Sigma}}_{\delta} := (1 - \delta)\hat{\boldsymbol{\Sigma}} + \delta\hat{\mathbf{F}}, \quad \delta \in [0, 1], \tag{1.24}$$

where $\delta$ is the shrinkage intensity and $\hat{\mathbf{F}}$ is the target covariance matrix, taken as a sparse estimator of the true covariance matrix $\boldsymbol{\Sigma}$. The rationale is to combine the unbiased but inefficient sample covariance matrix $\hat{\boldsymbol{\Sigma}}$ with the biased but efficient target covariance matrix $\hat{\mathbf{F}}$. Shrinkage is an old technique in statistics (James and Stein 1961), but has been applied in portfolio selection for covariance-matrix estimation only starting with a series of papers published by Ledoit and Wolf (2003, 2004a, 2004b). They consider as shrinkage target, respectively, the single-factor model of Sharpe (1963), a scalar multiple of the identity matrix and a constant-correlation model. They calibrate the shrinkage intensity via an estimator $\hat{\delta}^{\star}$ of the shrinkage intensity $\delta^{\star}$ minimizing the Frobenius norm:

$$\delta^{\star} := \operatorname*{argmin}_{\delta \in [0,1]} \left\|\hat{\boldsymbol{\Sigma}}_{\delta} - \boldsymbol{\Sigma}\right\|, \tag{1.25}$$

where the Frobenius norm of a matrix $\mathbf{A}$ is defined as $\|\mathbf{A}\| := \sqrt{\operatorname{trace}(\mathbf{A}'\mathbf{A})}$. In particular, $\hat{\delta}^{\star}$ can be computed in closed form. Other noteworthy approaches used in portfolio selection to shrink the covariance matrix are the shrinkage of the inverse covariance matrix of Kourtis et al. (2012) and the non-linear shrinkage approach of Ledoit and Wolf (2017). An overview of recent approaches in shrinkage of high-dimensional covariance matrices is provided in Fan et al. (2016).
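To make (1.24) and the Frobenius norm concrete, the sketch below shrinks a sample covariance matrix toward a scalar multiple of the identity with a fixed, hand-picked intensity. It does not implement the closed-form estimator $\hat{\delta}^{\star}$ of Ledoit and Wolf; the value 0.5, the simulated data and the function names are assumptions made for illustration. The true covariance matrix is itself a scaled identity here, which is the most favourable case for this target.

```python
import numpy as np


def shrink_covariance(Sigma_hat, delta):
    """Linear shrinkage (1.24) toward the target F_hat = (trace(Sigma_hat) / N) * I."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    N = Sigma_hat.shape[0]
    F_hat = np.trace(Sigma_hat) / N * np.eye(N)
    return (1.0 - delta) * Sigma_hat + delta * F_hat


def frobenius(A):
    """Frobenius norm, sqrt(trace(A'A))."""
    return np.sqrt(np.trace(A.T @ A))


# Simulated returns with few observations relative to the number of assets.
rng = np.random.default_rng(1)
T, N = 30, 10
Sigma_true = 0.05**2 * np.eye(N)
X = rng.normal(0.0, 0.05, size=(T, N))
Sigma_hat = np.cov(X, rowvar=False, bias=True)

Sigma_shrunk = shrink_covariance(Sigma_hat, delta=0.5)  # hypothetical intensity
err_sample = frobenius(Sigma_hat - Sigma_true)
err_shrunk = frobenius(Sigma_shrunk - Sigma_true)
print(err_sample, err_shrunk)
```

In this favourable setting the shrunk matrix is never further from the truth than the sample one; with a poorly chosen target or intensity that need not hold, which is why the intensity is calibrated from data in practice.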
2. Robust M-estimation of portfolio weights

Another well-known approach, made popular by DeMiguel and Nogales (2009), is the M-estimation of portfolio weights. The idea is to replace the variance, which is based on the squared loss function, by a measure of risk based on a robust loss function; that is, a loss function that grows less rapidly than the squared loss. The M-estimator of portfolio-return risk is defined as

$$s_{\rho}(\mathbf{w}, m) := \frac{1}{T}\sum_{t=1}^{T} \rho(\mathbf{w}'\mathbf{X}_t - m), \tag{1.26}$$

where $\rho$ is a convex symmetric loss function with unique minimum at zero, and $m$ is the M-estimator of portfolio mean return:

$$m := \operatorname*{argmin}_{x \in \mathbb{R}} s_{\rho}(\mathbf{w}, x). \tag{1.27}$$

This definition is a generalization of the sample mean and variance, which are obtained for the squared loss function $\rho(x) = x^2$. DeMiguel and Nogales (2009) propose to minimize $s_{\rho}(\mathbf{w}, x)$ with respect to $x$ and $\mathbf{w}$ at the same time to find the M-portfolio:

$$\mathbf{w}_{\rho} := \operatorname*{argmin}_{\mathbf{w} \in \mathcal{W},\, x \in \mathbb{R}} s_{\rho}(\mathbf{w}, x). \tag{1.28}$$

The use of a more robust (slowly growing) loss function makes the M-portfolio more robust to deviations from normality (such as jumps and fat tails) than the sample minimum-variance portfolio. DeMiguel and Nogales (2009) use Huber’s loss function defined as

$$\rho(x) := \begin{cases} x^2/2 & \text{if } |x| \leq c \\ c(|x| - c/2) & \text{if } |x| > c \end{cases} \tag{1.29}$$

for some return threshold $c$. In particular, DeMiguel and Nogales (2009) show analytically that the M-portfolio has a bounded influence function. The innovative approach of DeMiguel and Nogales (2009), compared to previous studies on M-estimation of portfolio weights such as Vaz-de Melo and Camara (2003), is that robust estimation and portfolio optimization are performed in one step, as seen in (1.28). This avoids the issues related to plug-in approaches.
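The effect of Huber’s loss can be seen on a toy series of portfolio returns with one large negative outlier. The sketch below evaluates (1.26) for given portfolio returns and solves the inner problem (1.27) by iteratively reweighted averaging, a standard scheme that we swap in for illustration. It does not perform the joint optimization over weights in (1.28), and the return series and threshold are hypothetical.

```python
import numpy as np


def huber(x, c):
    """Huber loss (1.29): quadratic for |x| <= c, linear beyond."""
    ax = np.abs(x)
    return np.where(ax <= c, 0.5 * x**2, c * (ax - 0.5 * c))


def s_rho(p, m, c):
    """M-estimator of risk (1.26), with p the vector of portfolio returns w'X_t."""
    return huber(p - m, c).mean()


def huber_location(p, c, n_iter=200):
    """Solve (1.27) for Huber's loss by iteratively reweighted averaging."""
    m = np.median(p)
    for _ in range(n_iter):
        r = np.abs(p - m)
        weights = np.minimum(1.0, c / np.maximum(r, 1e-12))  # downweight large residuals
        m = np.sum(weights * p) / np.sum(weights)
    return m


# Hypothetical portfolio returns; the last observation is a crash.
p = np.array([0.01, -0.02, 0.015, -0.005, 0.0, -0.30])
c = 0.02

m_mean = p.mean()
m_huber = huber_location(p, c)
print(m_mean, m_huber, s_rho(p, m_mean, c), s_rho(p, m_huber, c))
```

The sample mean is dragged to -5% by the single crash, whereas the Huber location estimate stays close to the bulk of the data because the outlier enters the loss only linearly.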
3. Constraints on portfolio weights

Another natural approach to improve the robustness of estimated portfolios is to add constraints on portfolio weights. By limiting the space of allowed portfolios, we expect the estimated portfolios to be less sensitive to changes in the return data and, concurrently, to require less rebalancing over time. The central reference in this area is DeMiguel et al. (2009a) who study the minimum-variance portfolio under a constraint on the norm of portfolio weights. They show that their framework nests several well-known approaches such as the equally weighted portfolio of DeMiguel et al. (2009b), the no-short-selling constraint of Jagannathan and Ma (2003) and the minimum-variance portfolio under the shrinkage estimators of the covariance matrix discussed above. In particular, they consider portfolio-weight constraints of LASSO and Ridge type; that is, constraints on the $\ell_1$ and $\ell_2$ norm, respectively. Specifically, they propose to minimize the portfolio-return variance $\mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}$ under the constraints $\mathbf{1}'\mathbf{w} = 1$ and $\|\mathbf{w}\|_1 \leq \delta$ or $\|\mathbf{w}\|_2 \leq \delta$. The $\ell_1$-norm constraint limits the amount of short-selling, while the $\ell_2$-norm constraint shrinks the minimum-variance portfolio toward the equally weighted portfolio. Moreover, DeMiguel et al. (2009a) show that norm-constrained minimum-variance portfolios can be interpreted as resulting from Bayesian estimation: they are the portfolios corresponding to the mode of the posterior distribution of minimum-variance portfolio weights given a prior distribution on portfolio weights that is either a Laplace distribution ($\ell_1$ norm) or a Gaussian distribution ($\ell_2$ norm). This Bayesian interpretation shows that norm-constrained minimum-variance portfolios account for estimation risk. One drawback of the $\ell_1$- and $\ell_2$-norm constraints, however, is that they do not account for the fact that some assets may feature higher estimation risk than others, and thus, should see their weights be relatively more constrained. Realizing this, Levy and Levy (2014) introduce the global variance-based constraint of the form

$$\sum_{i=1}^{N} \left(w_i - \frac{1}{N}\right)^2 \frac{\hat{\sigma}_{X_i}}{\frac{1}{N}\sum_{j=1}^{N}\hat{\sigma}_{X_j}} \leq \delta. \tag{1.30}$$

The rationale is “to impose more stringent constraints on stocks with relatively high standard deviations, as the estimation errors for these stocks’ parameters, and hence the potential economic loss, are larger than for stocks with relatively low standard deviations” (p.375). Finally, many other types of portfolio-weight constraints have been considered. For example, cardinality constraints (Bertsimas and Shioda 2009) that control the number of assets one can invest in, and portfolio-turnover constraints (Schreiner 1980, Olivares-Nadal and DeMiguel 2018) that limit the amount of rebalancing needed from one period to the next.
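Two elementary identities explain the roles of the norm constraints for weights that sum to one: the $\ell_1$ norm equals one plus twice the total short position, and the squared $\ell_2$ norm equals $1/N$ plus the squared distance to the equally weighted portfolio. The sketch below checks both on a hypothetical weight vector and evaluates the left-hand side of (1.30); the weights, volatilities and the helper name `gvbc` are ours.

```python
import numpy as np

# Hypothetical weights summing to one, with one short position.
w = np.array([0.7, 0.5, -0.2])
N = len(w)

short_position = -w[w < 0].sum()      # total amount sold short
l1_norm = np.abs(w).sum()             # equals 1 + 2 * short_position
l2_norm_sq = w @ w                    # equals 1/N + squared distance to equal weights
dist_to_ew_sq = np.sum((w - 1.0 / N) ** 2)


def gvbc(w, sigma_hat):
    """Left-hand side of the global variance-based constraint (1.30)."""
    n = len(w)
    return np.sum((w - 1.0 / n) ** 2 * sigma_hat / sigma_hat.mean())


sigma_hat = np.array([0.10, 0.20, 0.30])  # hypothetical estimated volatilities
print(l1_norm, short_position, l2_norm_sq, dist_to_ew_sq, gvbc(w, sigma_hat))
```

Bounding the $\ell_1$ norm by 1 therefore rules out short positions altogether, and tightening the $\ell_2$ bound toward $1/\sqrt{N}$ forces the weights to the equally weighted portfolio. With equal volatilities, (1.30) reduces to a bound on the squared distance to equal weights.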
4. Shrinkage estimation of portfolio weights Another popular approach in reducing estimation risk in optimal portfolio weights is to combine two portfolio strategies together. If the estimation errors of the two portfolios are not perfectly correlated with one another, this can yield a portfolio with less estimation risk than either of the two strategies on their own. For example, Kan and Zhou (2007) shrink the sample mean-variance portfolio toward the sample minimum-variance portfolio, DeMiguel et al. (2009b) shrink the sample minimum-variance portfolio toward the equally weighted portfolio, and Tu and Zhou (2011) shrink the sample mean-variance portfolio (among other strategies) toward the equally weighted portfolio. Let us describe Kan and Zhou (2007)’s approach in more details. They consider portfolio weights of the form
\[
w_{KZ} := \frac{1}{\lambda}\left(c\,\hat{\Sigma}^{-1}\hat{\mu} + d\,\hat{\Sigma}^{-1}\mathbf{1}\right), \tag{1.31}
\]
with $1 - \mathbf{1}'w_{KZ}$ invested in the risk-free asset. Then, they derive the coefficients $c$ and $d$ that minimize the expected quadratic-utility loss in (1.19) assuming asset returns are Gaussian. In particular, as they show, it is always optimal to invest in the sample minimum-variance portfolio; that is, $d \neq 0$ (unless $\mathbf{1}'\hat{\Sigma}^{-1}\hat{\mu} = 0$). In the case of this thesis, where the risk-free asset is not part of the investment universe, one can follow DeMiguel et al. (2009b) and Frahm and Memmel (2010) and rely on the normalized weights $w_{KZ}/(\mathbf{1}'w_{KZ})$. Finally, one may want to consider criteria other than the expected quadratic-utility loss to find the optimal combination of portfolios. To that aim, DeMiguel et al. (2013) consider the shrinkage portfolios studied by Kan and Zhou (2007), DeMiguel et al. (2009b) and Tu and Zhou (2011), and derive analytical expressions for the shrinkage coefficients under several calibration criteria such as minimum variance or maximum Sharpe ratio. They also consider non-parametric calibration via smoothed bootstrap.
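The combination in (1.31) and its normalized version can be sketched as follows. This is an illustration only: the coefficients `c` and `d` are passed in as given numbers and are not the optimal values derived by Kan and Zhou (2007), and the function names and inputs are hypothetical.

```python
import numpy as np

def kan_zhou_weights(mu_hat, sigma_hat, c, d, risk_aversion):
    """Risky-asset weights of the form (1.31) for given coefficients c and d."""
    mu_hat = np.asarray(mu_hat, dtype=float)
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    ones = np.ones_like(mu_hat)
    # Solve linear systems instead of explicitly inverting the covariance matrix.
    mean_variance_part = np.linalg.solve(sigma_hat, mu_hat)
    min_variance_part = np.linalg.solve(sigma_hat, ones)
    return (c * mean_variance_part + d * min_variance_part) / risk_aversion

def normalize(w):
    """Rescale weights to sum to one (requires a nonzero sum of weights)."""
    return w / w.sum()

# Hypothetical two-asset example with placeholder coefficients.
mu_hat = np.array([0.05, 0.08])
sigma_hat = np.array([[0.04, 0.01],
                      [0.01, 0.09]])
w_kz = kan_zhou_weights(mu_hat, sigma_hat, c=0.5, d=0.1, risk_aversion=3.0)
risk_free_share = 1.0 - w_kz.sum()   # remainder held in the risk-free asset
w_kz_normalized = normalize(w_kz)    # fully invested in the risky assets
```

Setting `c=0` leaves only the minimum-variance component, so the normalized weights then coincide with the sample minimum-variance portfolio regardless of `d` and the risk-aversion level.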
5. Bayesian estimation of mean-return vector

A final approach that has been extensively studied to improve the estimation of mean-variance portfolios consists of improving the estimation of the asset-return mean vector $\mu$. Green et al. (2013) list more than 300 papers dealing with this problem. This is a challenging problem because
(i) mean-variance portfolios are highly sensitive to µ;
(ii) the variance of the sample mean can quickly explode even for moderate deviations of the data from Gaussianity (Huber 1964);
(iii) asset returns are not stationary over time and, in particular, display very low autocorrelation (Campbell et al. 1997).
Let us describe here the Bayesian approach to this problem, reviewed by Avramov and Zhou (2010) and Vanderveken (2019). We focus in particular on the influential work of Jorion (1986), who derives a James-Stein estimator of $\mu$ via a Bayesian procedure. Stein (1955) and James and Stein (1961) consider the risk function
\[
\mathrm{Risk}(\mu) := \int_{\Omega_X} Q(\mu, \hat{\mu}(x))\, f_{X|\mu}(x)\, dx, \tag{1.32}
\]
where $Q(\mu, \hat{\mu}(x)) := (\mu - \hat{\mu}(x))'\Sigma^{-1}(\mu - \hat{\mu}(x))$ is the quadratic loss function, and show that the sample mean $\hat{\mu}$ is inadmissible in the sense that there exists another estimator of $\mu$ that achieves a lower $\mathrm{Risk}(\mu)$ for all values of $\mu$. This estimator is a shrinkage estimator of the form
\[
\hat{\mu}_{JS} := (1 - \delta)\hat{\mu} + \delta\mu_0\mathbf{1} \tag{1.33}
\]
for a given target mean return $\mu_0$ and a shrinkage intensity $\delta_{JS}$ that has a simple analytical expression. Berger (1978) then showed that considering the square of the quadratic loss function leads to an estimator that is very robust to the exact functional form of the loss. This estimator is given by (1.33) with a shrinkage intensity of the form
\[
\delta = \frac{b}{d + T(\hat{\mu} - \mu_0\mathbf{1})'\Sigma^{-1}(\hat{\mu} - \mu_0\mathbf{1})} \tag{1.34}
\]
with $b \in [0, 2(N - 2)]$ and weak conditions on $d$. The insight of Jorion (1986) was then to show that considering a conjugate prior on $\mu$ leads to an estimator of the type (1.33)–(1.34) with $b = d = N + 2$ and $\mu_0 = \mu_{MV}$. As he showed, this leads to mean-variance portfolios with larger quadratic utility than the sample mean-variance portfolio ($\delta = 0$) and the sample minimum-variance portfolio ($\delta = 1$). Whereas Jorion (1986) focused on an application of the Bayesian framework to mean-variance portfolios, we refer to Bodnar et al. (2017) for an application to the minimum-variance portfolio.
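The shrinkage estimator in (1.33)–(1.34) with $b = d = N + 2$ can be sketched as below. Two assumptions are made for the illustration: $\mu_{MV}$ is taken to be the mean return of the minimum-variance portfolio, $\mathbf{1}'\Sigma^{-1}\hat{\mu}/(\mathbf{1}'\Sigma^{-1}\mathbf{1})$, and the covariance matrix is treated as known (in practice it is replaced by an estimate). The function name and inputs are hypothetical.

```python
import numpy as np

def bayes_stein_mean(mu_hat, sigma, n_obs):
    """Shrink the sample mean toward a common target as in (1.33)-(1.34)."""
    mu_hat = np.asarray(mu_hat, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n_assets = mu_hat.size
    ones = np.ones(n_assets)

    # Assumed target: mean return of the minimum-variance portfolio.
    sigma_inv_ones = np.linalg.solve(sigma, ones)
    mu_0 = (sigma_inv_ones @ mu_hat) / (sigma_inv_ones @ ones)

    # Quadratic distance between the sample mean and the target.
    diff = mu_hat - mu_0 * ones
    distance = diff @ np.linalg.solve(sigma, diff)

    # Shrinkage intensity (1.34) with b = d = N + 2.
    b = d = n_assets + 2
    delta = b / (d + n_obs * distance)

    mu_shrunk = (1.0 - delta) * mu_hat + delta * mu_0 * ones
    return mu_shrunk, delta, mu_0

# Hypothetical inputs: three uncorrelated assets and 60 observations.
mu_hat = np.array([0.02, 0.04, 0.06])
sigma = 0.04 * np.eye(3)
mu_shrunk, delta, mu_0 = bayes_stein_mean(mu_hat, sigma, n_obs=60)
```

The intensity decreases as the sample size or the dispersion of the sample means around the target grows, so the estimator relies more on the data when they are more informative.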
The above approaches provide robust minimum-variance and mean-variance portfolios. Compared to their non-robust sample counterparts, they perform well in terms of out-of-sample mean-variance trade-off and remain more stable over time. However, these estimation strategies remain in the mean-variance framework, which is quite restrictive when asset returns are not Gaussian and investors care about higher moments. Going beyond the first two moments is the subject of the next section.
1.2 Higher-moment approaches
To proceed, let us introduce some additional notation and definitions related to higher moments. Let $m_{k,P} := \mathbb{E}[(P - \mathbb{E}[P])^k]$ be the portfolio-return $k$th central moment, and $M_3(X)$ and $M_4(X)$ be the asset-return coskewness and cokurtosis matrices defined as
\[
M_3(X) := \mathbb{E}\left[(X - \mu)(X - \mu)' \otimes (X - \mu)'\right], \tag{1.35}
\]
\[
M_4(X) := \mathbb{E}\left[(X - \mu)(X - \mu)' \otimes (X - \mu)' \otimes (X - \mu)'\right], \tag{1.36}
\]
where the Kronecker product $\otimes$ between two matrices $A$ and $B$ of size $m \times n$ and $p \times q$ is the $mp \times nq$ matrix
\[
A \otimes B := \begin{pmatrix} A_{11}B & \cdots & A_{1n}B \\ \vdots & \ddots & \vdots \\ A_{m1}B & \cdots & A_{mn}B \end{pmatrix}. \tag{1.37}
\]
Then, the third and fourth portfolio-return central moments can be written concisely as (Briec et al. 2007, Boudt et al. 2008)
\[
m_{3,P} = w' M_3(X)(w \otimes w), \tag{1.38}
\]
\[
m_{4,P} = w' M_4(X)(w \otimes w \otimes w). \tag{1.39}
\]
Further, we define the portfolio-return skewness and excess kurtosis as
\[
(\zeta_P, \kappa_P) := \left(\frac{m_{3,P}}{\sigma_P^3},\; \frac{m_{4,P}}{\sigma_P^4} - 3\right). \tag{1.40}
\]
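As an illustration of (1.35)–(1.40), the sketch below builds sample analogues of the coskewness and cokurtosis matrices with Kronecker products and uses them to compute portfolio skewness and excess kurtosis. Expectations are replaced by sample averages (dividing by the number of observations), and the simulated Gaussian returns, the weights and the function names are hypothetical.

```python
import numpy as np

def coskewness_matrix(returns):
    """Sample analogue of (1.35): an N x N^2 matrix."""
    centered = returns - returns.mean(axis=0)
    n_obs = centered.shape[0]
    # For each observation x: (x x') kron x', averaged over observations.
    return sum(np.kron(np.outer(x, x), x[None, :]) for x in centered) / n_obs

def cokurtosis_matrix(returns):
    """Sample analogue of (1.36): an N x N^3 matrix."""
    centered = returns - returns.mean(axis=0)
    n_obs = centered.shape[0]
    return sum(
        np.kron(np.kron(np.outer(x, x), x[None, :]), x[None, :]) for x in centered
    ) / n_obs

def portfolio_skewness_kurtosis(w, returns):
    """Skewness and excess kurtosis of the portfolio return, as in (1.40)."""
    centered = returns - returns.mean(axis=0)
    sigma = centered.T @ centered / centered.shape[0]
    variance = w @ sigma @ w
    m3 = w @ coskewness_matrix(returns) @ np.kron(w, w)               # (1.38)
    m4 = w @ cokurtosis_matrix(returns) @ np.kron(np.kron(w, w), w)   # (1.39)
    return m3 / variance ** 1.5, m4 / variance ** 2 - 3.0

# Hypothetical data: 200 simulated observations of 3 asset returns.
rng = np.random.default_rng(0)
returns = 0.01 * rng.standard_normal((200, 3))
w = np.array([0.5, 0.3, 0.2])
skewness, excess_kurtosis = portfolio_skewness_kurtosis(w, returns)
```

The matrix route gives the same numbers as computing the central moments directly from the portfolio-return series `returns @ w`; its benefit is that the comoment matrices are estimated once and can be reused for any weight vector.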
The mean-variance-efficient portfolio in (1.9) is optimal assuming that asset returns are Gaussian or that investors’ preferences can be fully described by a quadratic utility function irrespective of the distribution of asset returns. Neither of these two assumptions is reasonable in practice.
1. Asset returns are not Gaussian, and thus, the choice of portfolio weights impacts not only the portfolio-return mean and variance, but also its higher moments. Evidence for the non-Gaussian behavior of asset returns is abundant; see, for example, Fama (1963), Mandelbrot (1963), Simkowitz and Beedle (1980), Das and Uppal (2004) and Massacci (2017). Gormsen and Jensen (2019) provide an in-depth study of the properties of higher-moment risk in stock returns. Asset returns, and particularly equity returns, feature so-called stylized facts (Cont 2001) such as negative skewness, positive excess kurtosis and jumps that are not captured by the Gaussian distribution.
2. Investors do not make decisions according to a quadratic utility function. This was realized early on, for example by Jean (1971), Ingersoll (1975) and Scott and Horvath (1980). In particular, the latter show that risk-averse rational investors display positive preferences for odd moments (such as mean and skewness) and negative preferences for even moments (such as variance and kurtosis).

Thus, the assumption that asset returns are Gaussian or that utility functions are quadratic, while admittedly helpful for the sake of mathematical tractability, is not realistic in practice. As Hanoch and Levy (1970, p.181) put it: “In the real world, investors’ utility functions and investment probability distributions of returns may assume highly complex or irregular forms. However, most theoretical discussions of choice under risk have dealt with relatively simple forms, for example, quadratic utility functions and normal probability distributions, in order to make more manageable the description and testing of investment decision rules.” As a result, the mean-variance-efficient portfolio advocated by Markowitz (1952) and many others can be largely suboptimal, particularly in times of crisis when non-Gaussianity is very pronounced (Massacci 2017). Harvey and Siddique (2000) and Ang et al. (2006), for example, show that investors are ready to sacrifice some amount of mean return or accept higher volatility in exchange for a larger skewness or lower kurtosis, leading to less downside risk.
For this reason, researchers have put forward portfolio strategies that account for the higher moments of portfolio returns, instead of merely the mean and variance; see Briec et al. (2013) and Section 3.3.5 of Kolm et al. (2014) for reviews. The theoretical precept for mean-variance strategies is clear: choose a portfolio on the mean-variance efficient frontier by maximizing your expected quadratic utility. There is no such consensus in the literature when it comes to higher moments. Higher-moment approaches can be grouped into four classes:
1. Approaches that aim to find a portfolio on the higher-moment efficient surface via an expected-utility approach.
2. Approaches that aim to find a portfolio on the higher-moment efficient surface without specifying a utility function.
3. Approaches that optimize higher moments via alternative performance criteria that may result in portfolios outside of the efficient surface.
These first three classes are direct in the sense that they explicitly optimize higher moments and thus need to estimate them.
4. Indirect approaches that improve higher moments without the need to estimate them.
Let us now describe these four classes in turn.
1.2.1 Efficient portfolios
The first two classes concern portfolios located on the higher-moment efficient surface. That is, for a selected number of moments, no other portfolio can perform better on all the selected moments.
The first class considers extensions of the quadratic utility function that include the effect of higher moments. Typically, this is achieved by relying on cubic or quartic utility functions; see Levy (1969), Hanoch and Levy (1970) and, more recently, Jondeau and Rockinger (2006), Guidolin and Timmermann (2008), Harvey et al. (2010) and Martellini and Ziemann (2010). This approach is justified because, for an infinitely differentiable utility function $U$, a Taylor series expansion around the mean $\mu_P$ gives the expected portfolio-return utility
\[
\mathbb{E}[U(P)] = \sum_{k=0}^{\infty} \frac{U^{(k)}(\mu_P)}{k!}\, m_{k,P}. \tag{1.41}
\]
Hence, one obtains the cubic and quartic utility functions by discarding the terms of order $k > 3$ and $k > 4$, respectively. For example, Martellini and Ziemann (2010) assume that the investor has a constant relative risk aversion (CRRA) utility function, also known as power utility function, given by
\[
U(x) = \frac{x^{1-\lambda} - 1}{1 - \lambda}. \tag{1.42}
\]
In this case, the four-moment utility function is given by
\[
\mathbb{E}[U(P)] = U(\mu_P) - \frac{\lambda}{2}\, m_{2,P} + \frac{\lambda(\lambda + 1)}{6}\, m_{3,P} - \frac{\lambda(\lambda + 1)(\lambda + 2)}{24}\, m_{4,P}. \tag{1.43}
\]
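The truncation of (1.41) at order four for the CRRA utility (1.42) can be sketched as follows. The sketch uses the exact derivatives $U^{(k)}(x) = (-1)^{k-1}\lambda(\lambda+1)\cdots(\lambda+k-2)\,x^{-(\lambda+k-1)}$, which reduce to the coefficients appearing in (1.43) when the expansion point equals one; taking the expansion point equal to one (for instance, wealth normalized to one) is an assumption of this illustration, as are the function names and numerical inputs. The case $\lambda = 1$ is excluded.

```python
from math import factorial

def crra_utility(x, lam):
    """CRRA utility (1.42), for lam != 1."""
    return (x ** (1.0 - lam) - 1.0) / (1.0 - lam)

def crra_derivative(x, lam, k):
    """k-th derivative (k >= 1) of the CRRA utility at x."""
    coeff = 1.0
    for j in range(k - 1):
        coeff *= -(lam + j)
    return coeff * x ** (-(lam + k - 1))

def four_moment_utility(mu, m2, m3, m4, lam):
    """Expansion (1.41) truncated after k = 4, evaluated at the point mu."""
    # The k = 1 term vanishes because the first central moment is zero.
    moments = {2: m2, 3: m3, 4: m4}
    return crra_utility(mu, lam) + sum(
        crra_derivative(mu, lam, k) / factorial(k) * m for k, m in moments.items()
    )

# Hypothetical inputs: expansion point 1, risk aversion 3, small central moments.
approx = four_moment_utility(mu=1.0, m2=0.04, m3=-0.001, m4=0.005, lam=3.0)
```

With these inputs the coefficients on the second, third and fourth central moments are $-\lambda/2$, $\lambda(\lambda+1)/6$ and $-\lambda(\lambda+1)(\lambda+2)/24$, so variance and kurtosis lower the approximate expected utility while positive skewness raises it.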
The portfolio maximizing (1.43) is located on the mean-variance-skewness-kurtosis efficient surface in the sense that no other portfolio can dominate it on all four moments. However, several studies show that this Taylor-series-expansion approach is not adequate because, for commonly used utility functions such as CRRA and CARA, it results in portfolios that are barely impacted by the third and fourth moments; see Levy and Markowitz (1979), Cremers et al. (2005) and Markowitz (2014). The latter concludes that “a careful choice from a mean–variance efficient frontier will approximately maximize expected utility for a wide variety of concave (risk-averse) utility functions.” Thus, commonly used utility functions are said to be locally quadratic. Other utility functions that arguably better account for higher-moment preferences have been proposed, such as disappointment-aversion utility (Ang et al. 2005, Dahlquist et al. 2017) or S-shaped utility (Cremers et al. 2005). Still, it seems that specifying a utility function may not be the simplest way for investors to design optimal portfolios in the presence of higher moments.
Given the difficulties associated with expected utility mentioned above, the second class still aims to find a portfolio on the higher-moment efficient surface, but without the need to specify a utility function. Let us mention three examples. Lai et al. (1991) solve the mean-variance-skewness portfolio problem via polynomial goal programming, whose merit is that it “requires only the investor’s preferences for mean and skewness of portfolio return, whereas the latter [the utility approach] has to specify an investor’s exact utility function, which is generally unknown or too complicated to be used reliably” (p.297). Athayde and Flores (2004) minimize the portfolio-return variance for a given mean and skewness. As they put it, “the utility function approach may not seem reasonable for fund managers, especially those who need to report to their clients the criteria used in selecting the portfolios” (p.1336). Briec et al. (2007) propose a primal approach whereby one searches for the largest improvements in mean, variance and skewness relative to a benchmark portfolio. This approach is guaranteed to give a global solution and does not require a utility function either. Specifically, given a benchmark portfolio $w_b$, the mean-variance-skewness portfolio is computed by solving the problem
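\[
\begin{aligned}
\max_{\delta \in [0,1],\, w \in W} \quad & \delta \\
\text{subject to} \quad & w'\mu \geq w_b'\mu, \\
& w'\Sigma w \leq w_b'\Sigma w_b\,(1 - \delta), \\
& w' M_3(X)(w \otimes w) \geq w_b' M_3(X)(w_b \otimes w_b)\,(1 + s_b\delta),
\end{aligned}
\tag{1.44}
\]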